A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning

Somil Bansal*, Andrea Bajcsy*, Ellis Ratner*, Anca D. Dragan, Claire J. Tomlin
Abstract — Real-world autonomous systems often employ probabilistic predictive models of human behavior during planning to reason about their future motion. Since accurately modeling human behavior a priori is challenging, such models are often parameterized, enabling the robot to adapt predictions based on observations by maintaining a distribution over the model parameters. Although this enables data and priors to improve the human model, observation models are difficult to specify and priors may be incorrect, leading to erroneous state predictions that can degrade the safety of the robot motion plan. In this work, we seek to design a predictor which is more robust to misspecified models and priors, but can still leverage human behavioral data online to reduce conservatism in a safe way. To do this, we cast human motion prediction as a Hamilton-Jacobi reachability problem in the joint state space of the human and the belief over the model parameters. We construct a new continuous-time dynamical system, where the inputs are the observations of human behavior, and the dynamics include how the belief over the model parameters changes. The results of this reachability computation enable us both to analyze the effect of incorrect priors on future predictions in continuous state and time, and to make predictions of the human state in the future. We compare our approach to the worst-case forward reachable set and to a stochastic predictor which uses Bayesian inference and produces full future state distributions. Our comparisons in simulation and in hardware demonstrate how our framework can enable robust planning while not being overly conservative, even when the human model is inaccurate. Videos of our experiments can be found at the project website.

I. INTRODUCTION
Planning around humans is critical for many real-world robotic applications. To effectively reason about human motion, practitioners often couple dynamics models with human decision-making models to generate predictions that are used by the robot at planning time. Such predictors often model the human as an agent maximizing an objective [1–5] or they learn human behavioral structure from data [6]. When coupled with robot motion planners, these human decision-making models have been successfully used in a variety of domains including manipulation [7–11], autonomous driving [11, 12], and navigation [13, 14] (see [15] for a survey). Such predictive models are often parameterized to encode variations in decision-making between different people. However, since modeling human behavior and how people make decisions a priori is challenging, predictors can maintain a belief distribution over these model parameters [16, 17]. This stochastic nature of the predictor naturally incorporates the model parameter uncertainty into future human state uncertainty [4, 18, 19]. Importantly, once deployed, the robot can observe human behavior and update the distribution over the model parameters to better align its predictive model with the observations. However, there are two key challenges with such stochastic predictors.

* Indicates equal contribution. Authors are with EECS at UC Berkeley. Research supported by DARPA Assured Autonomy, NSF CPS VeHICal, SRC CONIX, an NSF CAREER award, and a NASA NSTRF. Project website: https://abajcsy.github.io/hallucinate/

Fig. 1: The robot models the human as walking towards one of the two goals shown in red. However, the human actually walks straight in between them. The predictions from our Hamilton-Jacobi reachability framework (center, in magenta) approximate the full probability distribution (right, in teal) while being more robust to model misspecification and less conservative compared to the naive full forward reachable set (left, in grey).
First, to update the distribution and to generate predictions, stochastic predictors rely on priors and on observation models. Although these two components enable data and prior knowledge to improve the model, observation models are difficult to specify and the priors may be incorrect. When either is erroneous, the state predictions will be erroneous as well, and the robot may confidently plan unsafe motions. Second, there exist limited computational tools to generate these stochastic predictions in continuous time and state, which may be important for safety-critical applications.

To address the former challenge, researchers in the robust control community have looked into full forward reachable set predictors, which compute the set of all possible states that the human could reach from their current state [12, 20]. Unfortunately, because these predictors consider worst-case human behavior, the resulting predictions are overly conservative, do not typically leverage any data to update the human model, and significantly impact the overall efficiency of the robot when used in practice.

In this work we seek a marriage between stochastic prediction and robust control: a predictor which is more robust to misspecified models and priors, but can still leverage human behavioral data online to reduce conservatism in a safe way. To do this, we cast human motion prediction as a Hamilton-Jacobi (HJ) reachability problem [20–22] in the joint state space of the human and the belief over the predictive model parameters. Observations of the human are treated as continuous "input" into our new dynamical system, and how the belief over the model parameters changes based on this data is part of the "dynamics" of the system. However, unlike stochastic predictors, we do not rely on the exact probability of an observation during the prediction.
Rather, we divide the possible human observations under the current model into two disjoint sets of likely and unlikely observations, and compute the possible future human states by treating all likely-enough observations as equally probable. Since we no longer rely on the exact observation probabilities, the resulting predictor exhibits robustness to incorrect priors, while still leveraging human behavioral data to reduce conservatism online in a safe way. In doing so, our algorithm provides a bridge between stochastic predictors and predicting the full forward reachable set. Additionally, our reachability formulation allows us to leverage the computational tools developed for reachability analysis to predict future human states in continuous time and state [23, 24].

Interestingly, because HJ reachability is rooted in optimal control, our novel formulation also allows us to analyze human models which change online based on data. For example, we can now ask (and answer) questions like "How long will it take the predictor to reach a desired level of confidence in its human model?" As long as the robot does not have enough confidence in the human model, this information can be used to plan safe maneuvers. Once the confidence in the human model is high enough, the robot can plan more aggressive maneuvers to improve performance.

To summarize, our key contributions in this work are:
• a Hamilton-Jacobi reachability-based framework for human motion prediction which provides robustness against misspecified observation models and priors;
• a demonstration of how our framework can be used to analyze human decision-making models that are updated online;
• a demonstration of our approach in simulation and on hardware for safe navigation around humans.

II. PROBLEM STATEMENT
We study the problem of safe motion planning for a mobile robot in the presence of a single human agent. In particular, our goal is to compute a control sequence for the robot which moves it from a given start state to a goal state without colliding with the human or the static obstacles in the environment. We assume that both the robot and human states can be accurately sensed at all times. Finally, we also assume that a map of the static parts of the environment is known; however, the future states of the humans are not known and need to be predicted online. Consequently, we divide the safe planning problem into two subproblems: human motion prediction and robot trajectory planning.
A. Human Motion Prediction with Online Updates
To predict future human states, we model the human as a dynamical system with state x_H ∈ R^{n_H}, control u_H ∈ R^{m_H}, and dynamics

ẋ_H = f_H(x_H, u_H).  (1)

Here, x_H could represent the position and velocity of the human, and f_H describes their evolution over time. To find the likely future states of the human, we couple this dynamics model with a model of how the human chooses their actions. In general, this is a particularly difficult modeling challenge, and many models exist in the literature (see [15]). In this work, we primarily consider stochastic control policies that are parameterized by λ^t:

u_H^t ∼ P(u_H^t | x_H^t; λ^t).  (2)

In this model, λ^t can represent many different aspects of human decision-making, from how passive or aggressive a person is [25] to the kind of visual cues they pay attention to in a scene [5]. The specific choice of parameterization is often highly problem-specific and can be hand-designed or learned from prior data [4, 26]. Nevertheless, the true value of λ^t is most often not known beforehand and instead needs to be estimated after receiving measurements of the true human behavior. Thus, at any time t, we maintain a distribution P^t(λ) over λ^t, which allows us to reason about the uncertainty in human behavior online based on the measurements of u_H.

Running example:
We now introduce a running example for illustration purposes throughout the paper. In this example, we consider a ground vehicle that needs to go to a goal position in a room where a person is walking. We consider a planar model of the human with state x_H = [h_x, h_y], control u_H = θ, and dynamics ẋ_H = [v_H cos(θ), v_H sin(θ)]. The model parameter λ^t can take two values and indicates which goal location the human is trying to navigate to. The human policy for any state and goal is given by a Gaussian with mean pointing in the goal direction and a variance representing uncertainty in the human action:

u_H^t | x_H^t ∼ N(μ_1, σ_1) if λ^t = g_1, and u_H^t | x_H^t ∼ N(μ_2, σ_2) if λ^t = g_2,  (3)

where μ_i = tan^{-1}((g_i(y) − h_y^t) / (g_i(x) − h_x^t)) and σ_i = π/4 for i ∈ {1, 2}. (g_i(x), g_i(y)) represents the position of goal g_i.

Since we are uncertain about the true value of λ^t, we update P^t(λ) online based on the measurements of u_H^t. This observed control may be used as evidence to update the robot's prior belief P^t(λ) about λ^t over time via a Bayesian update to obtain the posterior belief:

P^{t+1}(λ^t | u_H^t, x_H^t) = P(u_H^t | x_H^t; λ^t) P^t(λ) / Σ_λ̄ P(u_H^t | x_H^t; λ̄) P^t(λ̄).  (4)

Given the human state x_H^t, the dynamics f_H in (1), the control policy in (2), and the distribution P^t(λ), our goal is to find the likely human states at some future time t + τ:

K_ε^t(τ) = { x_H^{t+τ} : P(x_H^{t+τ} | x_H^t) > ε },  (5)

where ε ≥ 0 is the desired safety threshold and is a design parameter. When ε = 0, we drop the subscript ε in K^t.

(Footnote: This formulation is easily amenable to deterministic policies, where P(u_H^t | x_H^t; λ^t) is the Dirac delta function.)
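To make the update in (4) concrete for this running example, the following minimal sketch implements the two-goal Gaussian policy of (3) and the discrete Bayesian belief update; the goal positions, human start state, and speed are our own illustrative assumptions, not values from the paper.

```python
import numpy as np

def heading_to(x, goal):
    # Mean heading mu_i points from the human position toward goal g_i (Eq. (3)).
    return np.arctan2(goal[1] - x[1], goal[0] - x[0])

def action_likelihood(u, x, goal, sigma):
    # P(u | x; lambda = g_i): Gaussian over heading angle, as in Eq. (3).
    err = np.arctan2(np.sin(u - heading_to(x, goal)), np.cos(u - heading_to(x, goal)))
    return np.exp(-0.5 * (err / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def bayes_update(p, x, u, goals, sigma):
    # Eq. (4): posterior over lambda given the observed control u.
    # p is P(lambda = g1); P(lambda = g2) = 1 - p.
    lik = np.array([action_likelihood(u, x, g, sigma) for g in goals])
    prior = np.array([p, 1.0 - p])
    post = lik * prior
    return post[0] / post.sum()

# Hypothetical goals to the upper-right and lower-right of the human.
goals = [np.array([5.0, 5.0]), np.array([5.0, -5.0])]
x, p, sigma = np.array([0.0, 0.0]), 0.5, np.pi / 4

# Repeatedly observing headings aimed at g1 concentrates the belief on g1.
for _ in range(10):
    p = bayes_update(p, x, heading_to(x, goals[0]), goals, sigma)
print(round(p, 4))  # belief in g1 is near 1 after ten consistent observations
```

Each observation multiplies the odds on g_1 by the likelihood ratio of the two Gaussians, so consistent goal-directed motion drives the belief to a corner of the simplex quickly.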
Using this set of likely human states, our robot will generate a trajectory that at each future time step t + τ avoids K_ε^t(τ).

Note that the requirement to compute K_ε^t(τ) is subtly different from computing the full state distribution P(x_H^{t+τ} | x_H^t). For computing the full distribution, one can explicitly integrate over all possible future values of λ, state, and action trajectories. Alternatively, one can use the belief space to keep track of P^t(λ) over time, and compute the human state distribution using the belief. The latter computation can be thought of as branching on future observations, and keeping track of what the belief might be at each future time depending on that observation history and the intrinsic changes in the human behavior. Our insight is that this latter computation can be formulated as a stochastic Hamilton-Jacobi reachability problem in the joint state space of the human and belief, but that it can be simplified to a deterministic reachability problem. This not only leads to a more robust prediction of likely future human states when the prior P^t(λ) is not correct, but also allows us to compute an approximation of K_ε^t(τ) with lower computational complexity when the prior is correct.

B. Robot Motion Planning
We model the robot as a dynamical system with state x_R ∈ R^{n_R}, control u_R ∈ R^{m_R}, and dynamics ẋ_R = f_R(x_R, u_R). The robot's goal is to determine a sequence of controls u_R^{0:T̄} such that it does not collide with the human or the (known) static obstacles, and reaches its goal g_R by time T̄. In this work, we solve this planning problem in a receding-horizon fashion. Since the future states of the human are not known a priori, we instead plan the robot trajectory to avoid K_ε^t(·), the likely states of the human in the time interval [t, t + T], during planning at time t.

Running example:
Our ground robot is modeled as a 3D Dubins car with state given by its position and heading, x_R = [s_x, s_y, φ], and speed and angular speed as the control, u_R = [v_R, ω]. The respective dynamics are described by ẋ_R = [v_R cos φ, v_R sin φ, ω]. At any given time t, we use a third-order spline planner to compute the robot trajectory for a horizon of T (for more details, see [27]).

III. REACHABILITY-BASED MOTION PREDICTION
In this section, we first discuss how to cast the full probabilistic human motion prediction problem as a stochastic reachability problem. Next, we discuss how we can obtain a deterministic approximation of this stochastic reachability problem, and solve it using our HJ reachability framework.
A. Casting prediction as a reachability problem
We cast the problem of predicting a probability distribution over the human states at some future time as one of maintaining a time-dependent distribution over reachable states and beliefs, given a stochastic human model as in (2). Let the current time be t, with the current (known) human state x_H^t and a belief P^t(λ). Different control actions u_H^t that the human might take next will induce a change in both the human's physical state and the robot's belief over λ. This in turn affects what human action distribution the robot will predict for the following timestep, and so on. To simultaneously compute all possible future beliefs over λ and corresponding likely human states, we consider the joint dynamics of P^t(λ) and the human:

ż^t = [ẋ_H^t, Ṗ^t(λ)] = f(z^t, u_H^t).  (6)

At any state z, the distribution over the (predicted) human actions is given by

u_H ∼ P(u_H | z) = Σ_λ̄ P(u_H | x_H; λ̄) P(λ̄).  (7)

To derive the dynamics of P^t(λ) in (6), we note that the belief can change either due to new observations (via the Bayesian update in (4)) or due to changes in the human behavior (modeled via the parameter λ) over time. This continuous evolution of P^t(λ) can be described by the following equation:

Ṗ^t(λ) = γ (P^{t+1}(λ | u_H^t, x_H^t) − P^t(λ)) + k(P^t(λ)).  (8)

Here, the function k represents the intrinsic changes in the human behavior, whereas the other component captures the Bayesian change in P^t(λ) due to the observation u_H^t. Note that the time derivative in (8) is pointwise in the λ space. Typically, the Bayesian update is performed in discrete time when new observations are received; in this work, however, we reason about continuous changes in P^t(λ) and the corresponding continuous changes in the human state. We omit a detailed derivation, but intuitively, to relate the continuous-time Bayesian update to its discrete-time version, γ in (8) can be thought of as the observation frequency.
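As a sketch of the joint dynamics (6) with the belief derivative (8) (taking k ≡ 0), the system can be forward-integrated with Euler steps at a finite observation rate γ; this is our own illustrative discretization of the running example, with hypothetical goal positions and human speed, not the paper's implementation.

```python
import numpy as np

def likelihoods(u, x, goals, sigma):
    # P(u | x; lambda = g_i) for the two-goal Gaussian policy of Eq. (3).
    mus = [np.arctan2(g[1] - x[1], g[0] - x[0]) for g in goals]
    return np.array([np.exp(-0.5 * ((u - m) / sigma) ** 2) /
                     (sigma * np.sqrt(2.0 * np.pi)) for m in mus])

def p_dot(p, x, u, goals, sigma, gamma):
    # Eq. (8) with k = 0: p_dot = gamma * (Bayes posterior - p).
    lik = likelihoods(u, x, goals, sigma)
    post = lik[0] * p / (lik[0] * p + lik[1] * (1.0 - p))
    return gamma * (post - p)

def joint_step(z, u, v_H, goals, sigma, gamma, dt):
    # One Euler step of Eq. (6) for z = (h_x, h_y, p), p = P(lambda = g1).
    hx, hy, p = z
    dp = p_dot(p, np.array([hx, hy]), u, goals, sigma, gamma)
    return np.array([hx + v_H * np.cos(u) * dt,
                     hy + v_H * np.sin(u) * dt,
                     np.clip(p + dp * dt, 0.0, 1.0)])

goals = [np.array([5.0, 5.0]), np.array([5.0, -5.0])]
z = np.array([0.0, 0.0, 0.5])   # start at the origin with a uniform belief
# Integrate while the human heads toward g1: the belief drifts toward p = 1.
for _ in range(200):
    u = np.arctan2(goals[0][1] - z[1], goals[0][0] - z[0])
    z = joint_step(z, u, v_H=0.5, goals=goals, sigma=np.pi / 4, gamma=2.0, dt=0.01)
print(z)  # physical state moves toward g1 while p grows toward 1
```

The belief component relaxes toward the instantaneous Bayesian posterior at rate γ, which matches the interpretation of γ as an observation frequency.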
Indeed, as γ ↑ ∞, i.e., observations are received continuously, P^t(λ) instantaneously changes to P^{t+1}(λ | u_H^t, x_H^t). On the other hand, as γ ↓ 0, i.e., no observations are received, the Bayesian update does not play a role in the dynamics of P^t(λ).

Given the joint state z^t at time t and the control policy in (7), we are interested in computing the following set:

V(τ) = { z^τ : P(z^τ | z^t) > 0 },  τ ∈ [t, t + T].  (9)

Intuitively, V(τ) represents all possible states of the joint system, i.e., all possible human states and beliefs over λ, that are reachable under the dynamics in (6) for some sequence of human actions. We refer to this set as the Belief-Augmented Forward Reachable Set (BA-FRS) from here on. Given a BA-FRS, we can obtain K^t(τ) by projecting V(τ) on the human state space. In particular,

K^t(τ) = ⋃_{z^τ ∈ V(τ)} Π(z^τ),  τ ∈ [t, t + T],  (10)

where Π(z) denotes the human state component of z.

Fig. 2: Visualization, based on the running example, of the full BA-FRS and its projections into the human state space in the h_x–h_y plane.

Consequently, the probability of any human state can be obtained
HJ-reachability analysis [20, 22, 29] can be used forcomputing a general Forward Reachable Set (FRS) V ( τ ) given a set of starting states L . Intuitively, V ( τ ) is the setof all possible states that the system can reach at time τ starting from the states in L under some permissible controlsequence. The computation of the FRS can be formulated asa dynamic programming problem which ultimately requiressolving for the value function V ( τ, z ) in the following initialvalue Hamilton Jacobi-Bellman PDE (HJB-PDE): D τ V ( τ, z )+ H ( τ, z, ∇ V ( τ, z )) = 0 , V (0 , z ) = l ( z ) , (11)where τ ≥ . Here, D τ V ( τ, z ) and ∇ V ( τ, z ) denote the timeand space derivatives of the value function respectively. Thefunction l ( z ) is the implicit surface function representing theinitial set of states L = { z : l ( z ) ≤ } . The Hamiltonian, H ( τ, z, ∇ V ( τ, z )) , encodes the role of system dynamics andcontrol, and is given by H ( τ, z, ∇ V ( τ, z )) = max u H ∈U ∇ V ( τ, z ) · f ( z, u H ) . (12)Once the value function V ( τ, z ) is computed, the FRS isgiven by V ( τ ) = { z : V ( τ, z ) ≤ } . C. An HJ Reachability-based framework for predictionand analysis
In this section, we build on the reachability formalism for prediction in Sec. III-A to obtain a framework which we will use both to generate robust and faster predictions, and to enable planners to answer important analysis questions about Bayesian predictors. Our framework is based on one key idea: rather than using a probability distribution over human actions as in (7), we will use a deterministic set of allowable human actions at every step. Very importantly, this set will be state-dependent, and therefore belief-dependent:

u_H ∈ U(z),  U(z) = { u_H : h(u_H, z) ≥ δ },  (13)

where h is a function allowed to depend on both the control and the state z = (x_H, P(λ)), and δ is a threshold. Using a control set rather than a distribution allows us to convert the stochastic reachability problem in Sec. III-A to a deterministic reachability problem, which can be solved using the HJB-PDE formulation in Sec. III-B. We now illustrate how different instantiations of h in our framework enable both robust prediction and predictor analysis.

Prediction.
We generate a predictor using our framework by instantiating the set of allowable human actions to be only those with sufficient probability under the belief:

u_H ∈ U(z),  U(z) = { u_H : P(u_H | z) ≥ δ }.  (14)

Now, instead of associating future states with probabilities, we maintain a set of feasible states z at every time step. Over time, this set still evolves via (6), but now all actions that have too low a probability are excluded, and actions that have high probability are all treated as equally likely. Because of the coupling between future belief and allowable actions, we may approximate K_ε^t(τ) via a set K̃_δ^t(τ), using a non-zero δ. This has two advantages: (a) when the prior is correct, it allows us to compute an approximation of K_ε^t(τ) using the computational tools developed for reachability analysis, and (b) when the prior is incorrect, it allows the predictions to be robust to such inaccuracies, since the computation of K̃_δ^t(τ) no longer relies on the exact action (or observation) probabilities. We further discuss these aspects in Sec. IV.

Analysis.
Suppose we have a prior (or current belief) over λ; however, the prior might be wrong, i.e., arg max_λ' P(λ') ≠ λ*, with λ* being some hypothesized ground-truth value for the human internal state. A reasonable question to ask in such a scenario would be: "How long would it take the robot to realize that the value of the internal state is λ*, i.e., to place enough probability in its posterior on λ*?" A different instantiation of our framework can be used to answer such questions: we now compute the BA-FRS under the allowed human actions that model the hypothesized ground truth, and compute how long it takes to attain the desired property on the belief (we discuss this further in Sec. V). Thus, the allowed control set is:

u_H ∈ U(z),  U(z) = { u_H : P(u_H | x_H; λ*) ≥ δ }.  (15)

Overall, by choosing h appropriately, we can generate a range of predictors and analyses. The two examples above seem particularly useful to us, and we detail them in the following sections.

IV. A NEW HJ REACHABILITY-BASED PREDICTOR
Our reachability-based framework enables us to generate a new predictor by computing an approximation of the BA-FRS. In this section, we analyze the similarities and differences between this predictor and the one obtained by solving the full stochastic reachability problem.
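Solving the HJB-PDE (11) with level-set methods [23, 24] is the computational workhorse behind the BA-FRS. As a minimal, self-contained illustration of the idea (our own toy example, not the BA-FRS itself), consider the 1-D system ẋ = u with |u| ≤ 1: the Hamiltonian (12) becomes max_{|u| ≤ 1} V_x · u = |V_x|, and a first-order Godunov upwind scheme grows the initial set at unit speed.

```python
import numpy as np

# Grid and implicit surface l(x) for the initial set L = [-0.5, 0.5].
xs = np.linspace(-3.0, 3.0, 301)
dx = xs[1] - xs[0]
V = np.abs(xs) - 0.5          # V(0, x) = l(x); V <= 0 exactly on L

# Godunov upwind scheme for V_tau + |V_x| = 0 (front expands at unit speed).
T, dt = 1.0, 0.5 * dx         # time step chosen to satisfy the CFL condition
for _ in range(int(round(T / dt))):
    Dm = np.diff(V, prepend=V[0]) / dx    # backward difference
    Dp = np.diff(V, append=V[-1]) / dx    # forward difference
    V = V - dt * np.sqrt(np.maximum(Dm, 0.0) ** 2 + np.minimum(Dp, 0.0) ** 2)

frs = xs[V <= 0]              # forward reachable set {x : V(T, x) <= 0}
print(frs.min(), frs.max())   # analytic answer: [-1.5, 1.5]
```

The same machinery, run on the three-dimensional joint state z = (h_x, h_y, p) with the joint dynamics (6), is what produces the BA-FRS in the running example.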
Prediction as an approximation of K. Since the stochastic reachability problem needs to explicitly maintain the state probabilities, computing K can be challenging compared to K̃. However, this advantage in computational complexity is achieved at the expense of losing information about the human state distribution, which can be an important component for several robotic applications. When the full state distribution is not required, however, as is the case in this paper, K̃ provides a very good approximation of K. In fact, it can be shown that K^t(τ) = K̃^t(τ).

Running example:
For simplicity, consider the case when the intrinsic behavior of the human does not change over time, i.e., k(P^t(λ)) = 0. Since λ takes only two possible values in this case, the joint state space is three-dimensional. In particular, z = [h_x, h_y, p], where p := P(λ = g_1). P(λ = g_2) is given by 1 − P(λ = g_1), so we do not need to explicitly maintain it as a state. P(u_H | z) = p N(μ_1, σ_1) + (1 − p) N(μ_2, σ_2), which can be used to compute the set of allowable controls U for different δ's as per (14). We use the Level Set Toolbox [23] to compute the BA-FRS, starting from x_H = (0, 0). The corresponding likely human states, K̃_δ^t(T), for different initial priors and δ's are shown in Fig. 3 in magenta. For comparison purposes, we compute K_ε^t(T) (teal), as well as the "naive" FRS obtained using all possible human actions (gray). ε for K is picked to capture 95% of the area of the set.

As evident from Fig. 3, K̃^t is an over-approximation of K^t, but at the same time it is not overly conservative, unlike the naive FRS. This is primarily because even though the proposed framework does not maintain the full state distribution, it still discards the unlikely controls during the BA-FRS computation. It is also interesting to note that the BA-FRS is not too sensitive to the initial prior for low δ's. This property of the BA-FRS allows the predictions to be robust to incorrect priors, as we explain later in this section.

We also show the full 3D BA-FRS, as well as the projected K̃^t sets over time for a given initial prior, in Fig. 2. When δ is high, both the belief and the human states are biased towards the goal g_1 over time: only the actions that steer the human towards g_1 are initially contained in the control set for the BA-FRS computation. Moreover, propagating the current belief under these actions further reinforces the belief that the human is moving towards g_1.
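The δ-thresholded control set (14) that drives this behavior can be sketched directly for the running example (a toy check with hypothetical goal positions; the specific δ values are our own):

```python
import numpy as np

def mixture(u, x, p, goals, sigma):
    # Eq. (7): P(u | z) = p * N(u; mu1, sigma) + (1 - p) * N(u; mu2, sigma).
    mus = [np.arctan2(g[1] - x[1], g[0] - x[0]) for g in goals]
    dens = [np.exp(-0.5 * ((u - m) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
            for m in mus]
    return p * dens[0] + (1 - p) * dens[1]

def allowable(x, p, goals, sigma, delta, n=361):
    # Eq. (14): keep only headings whose likelihood under the belief is >= delta.
    us = np.linspace(-np.pi, np.pi, n)
    return us[mixture(us, x, p, goals, sigma) >= delta]

# Hypothetical goals at +/- 45 degrees from the human at the origin.
goals = [np.array([5.0, 5.0]), np.array([5.0, -5.0])]
x, sigma = np.array([0.0, 0.0]), np.pi / 4

U_low = allowable(x, 0.5, goals, sigma, delta=0.05)   # permissive threshold
U_high = allowable(x, 0.5, goals, sigma, delta=0.25)  # aggressive threshold
U_bias = allowable(x, 0.9, goals, sigma, delta=0.25)  # belief biased toward g1
print(len(U_low), len(U_high), len(U_bias))
```

With the larger δ, only headings near the two goal directions survive, and once the belief is biased toward g_1, the surviving set concentrates around g_1's direction alone.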
As a result, the beliefs in the BA-FRS shift towards a higher p over time. On the other hand, when δ is small, the human actions under g_2 are also contained in the control set, which leads the belief and the human state to shift in both directions (towards g_1 and g_2).

Prediction as robust to incorrect priors and misspecified models.
The set K_ε^t(T) depends heavily on the prior P^t(λ). When the initial prior for the human motion prediction is not accurate enough, using K_ε^t(T) for planning might lead to unsafe behavior, as it can be too optimistic. This issue is particularly exacerbated when the true (unknown) parameter of the human, λ*, is not within the support of λ considered in the model, i.e., when the model is misspecified; for example, when the exact goal of the human is not any of the goals specified in the model. In such scenarios, full Bayesian inference may fail to assign sufficient probabilities to the likely states of the human, which can lead to unsafe situations. On the other hand, using the full FRS, i.e., the set of all states the human can reach under any possible control, will ensure safety but can impede the efficiency of the robot plan.

Fig. 3: As δ increases, we obtain a tighter approximation of K^t; this is because a sequence of "medium"-likely actions over time results in states that are overall unlikely under a Bayesian prediction. Discarding such actions under our framework leads to a better approximation of K^t. However, choosing δ too aggressively might lead to an overly optimistic set.

In such situations, the proposed framework presents a good middle-ground alternative to the two approaches: it does not rely heavily on the exact probability of an action while computing the FRS, since it leverages action probabilities only to distinguish between likely and unlikely actions. Yet it still uses a threshold to discard highly unlikely actions under the current belief, ensuring the obtained FRS is not too conservative. This allows our framework to perform well in situations where the initial prior is not fully accurate, but accurate enough to distinguish likely actions from unlikely actions. In particular, suppose the prior at time t is such that

P(u_H | x_H^t; λ*) ≥ δ ⟹ P(u_H | z^t) ≥ δ  ∀ u_H,

where λ* is the true (unknown) human parameter, and z^t = (x_H^t, P^t(λ)).
Intuitively, the above condition states that the prior at time t is accurate enough to distinguish the set of likely actions from the unlikely actions for the true human behavior, even though we do not know the true probability distribution of the actions. In such a case, it can be shown that any human state that is reachable under a control sequence consisting of controls that are at least δ-likely will be contained within K̃_δ^t.

Running example:
Consider the scenario where the actual human goal is g_3, midway between g_1 and g_2 (see Fig. 4). Thus, the current model does not capture the true goal of the human. Even though the human walks straight towards g_3, a full Bayesian framework fails to assign sufficient probabilities to the likely human states because of its over-reliance on the model. Ultimately, this leads to a collision between the human and the robot.

Fig. 4: Visualized is a snapshot of the human predictions K_ε^t(T) and the corresponding trajectory of the robot when using the BA-FRS (left) or the Bayesian predictor (right).

In contrast, since our framework uses the model only to distinguish likely actions from unlikely actions, it recognizes that moving straight ahead is a likely human action. This is also evident from Fig. 3, where the states straight ahead of the human are contained within the BA-FRS even for a relatively high δ of 0.2. As a result, using the deterministic BA-FRS for planning leads to safe navigation around the human.

These results are confirmed in our hardware experiments performed on a TurtleBot 2 testbed. As shown in Fig. 1, we demonstrate these scenarios while navigating around a human participant. We measured human positions at 200 Hz using a VICON motion capture system and used on-board TurtleBot odometry sensors for the robot state measurement. As discussed, our framework allows us to be robust to misspecified goals while not being overly conservative.

V. PREDICTION ANALYSIS
An interesting question that our framework can answer is how long the predictor has to observe the human in order to determine the true human behavior for some prior. For simplicity, consider the scenario where λ can take two possible values, b_1 or b_2; however, the true human parameter is unknown. We also have an initial prior over λ, given by P^0(λ). Then we may pose the following questions: (1) What are the minimum and maximum possible times it will take to determine that λ* = b_1 with sufficiently high probability (denoted as T_min^1 and T_max^1)? (2) What are the corresponding sequences of observations? That is, we want to know the most and least informative sets of observations that the human can provide under λ* = b_1. Similar questions can also be posed for b_2. The overall minimum and maximum times to determine the true human behavior are then given by T_min = min{T_min^1, T_min^2} and T_max = max{T_max^1, T_max^2}. Once T_min and T_max are available, the robot trajectory can be planned to safeguard against both possibilities (λ* = b_1 and λ* = b_2) for t ≤ T_max. After a duration of T_max, we will be able to determine the true human behavior, so it is sufficient to safeguard against the likely states of the human under the belief P^{T_max}(λ) from then on.

As discussed in Sec. III-C, an instantiation of the proposed reachability-based framework can be used to determine T_min and T_max. In particular, given the initial human state x_H^0 and the initial prior P^0(λ), we can compute the BA-FRS with the control policy in (15) and the Hamiltonian H(τ, z, ∇V(τ, z)) = max_{u_H ∈ U} ∇V(τ, z) · f(z, u_H). Then, T_min^1 can be obtained as the minimum time such that P*(λ) is contained in V(·) for some human state x_H. Here, P*(λ) is a distribution that assigns a sufficiently high probability to λ = b_1.
Intuitively, this computation gives the minimum time it will take to reach the belief that λ* = b_1 if the true human parameter is indeed b_1 and the human is giving us the most informative samples to discern its behavior. We can similarly compute T_min^2 via a similar BA-FRS under the likely controls from b_2.

T_max can be computed using a similar procedure, but instead of maximizing over the control in the Hamiltonian, we minimize over the control. This computation corresponds to finding the control sequence that is least informative for inferring λ* = b_1, and is obtained by u_H*(τ, z) = arg min_{u_H ∈ U} ∇V(τ, z) · f(z, u_H). Intuitively, u_H* is the control observation that least differentiates between b_1 and b_2. Similarly, the u_H* corresponding to the computation of T_min is the control observation that differentiates most between b_1 and b_2. This is closely related to prior work on legibility [30] and deception [31]: given a fixed horizon, our framework computes a sequence of controls that is maximally informative or maximally ambiguous cumulatively across all the time steps, which in general is nontrivial to compute.

Running example:
Consider the planar pedestrian dynamics as before, but with the following human policy:

u_H^t | x_H^t ∼ N(0, σ) if λ = 0, and u_H^t | x_H^t ∼ Uniform(−π, π) if λ = 1,   (16)

where σ is small. The human walks straight with a small variance when λ = 0 and moves in a random direction when λ = 1, approximating an irrational human. We compute the minimum and maximum time to realize λ* = 0 starting from a high initial prior on irrational behavior. We assume that we can confidently conclude that λ* = 0 once all human trajectories reach a sufficiently high belief for λ = 0. The obtained T_min and T_max are 3.2s and 11.2s, respectively. We also compute the control sequences that correspond to these times. The optimal control sequence for T_min is given by u_H = 0, since that is the most likely action under the rational behavior compared to the irrational behavior. On the other hand, the optimal control sequence for T_max consists of an angle of 15 degrees, which is the least likely action that is above the δ-threshold (0.3 for this example) for λ = 0.

VI. CONCLUSION
When robots operate in complex environments around humans, they often employ probabilistic predictive models to reason about human behavior. Though powerful, such predictors can make poor predictions when the prior is incorrect or the observation model is misspecified, which in turn can cause unsafe robot behavior. In this work, we formulate human motion prediction as a Hamilton-Jacobi reachability problem. We demonstrate that the proposed framework provides more robust predictions when the prior is incorrect or the human behavior model is misspecified, performs these predictions in continuous time and state using the tools developed for reachability analysis, and supports analysis of the predictor itself.
REFERENCES

[1] H. Bai, S. Cai, N. Ye, et al. "Intention-aware online POMDP planning for autonomous driving in a crowd". International Conference on Robotics and Automation (ICRA), 2015.
[2] C. L. Baker, J. B. Tenenbaum, and R. R. Saxe. "Goal inference as inverse planning". Annual Meeting of the Cognitive Science Society, vol. 29, 2007.
[3] A. Y. Ng, S. J. Russell, et al. "Algorithms for inverse reinforcement learning". International Conference on Machine Learning (ICML), 2000.
[4] B. D. Ziebart, N. Ratliff, G. Gallagher, et al. "Planning-based prediction for pedestrians". International Conference on Intelligent Robots and Systems (IROS), 2009.
[5] K. M. Kitani, B. D. Ziebart, J. A. Bagnell, and M. Hebert. "Activity forecasting". European Conference on Computer Vision (ECCV), 2012.
[6] E. Schmerling, K. Leung, W. Vollprecht, and M. Pavone. "Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction". arXiv preprint arXiv:1710.09483, 2017.
[7] H. B. Amor, G. Neumann, S. Kamthe, et al. "Interaction primitives for human-robot cooperation tasks". International Conference on Robotics and Automation (ICRA), 2014.
[8] H. Ding, G. Reißig, K. Wijaya, et al. "Human arm motion modeling and long-term prediction for safe and efficient human-robot-interaction". International Conference on Robotics and Automation (ICRA), 2011.
[9] H. S. Koppula and A. Saxena. "Anticipating human activities for reactive robotic response". International Conference on Intelligent Robots and Systems (IROS), 2013.
[10] P. A. Lasota and J. A. Shah. "Analyzing the effects of human-aware motion planning on close-proximity human–robot collaboration". Human Factors.
[11] International Conference on Humanoid Robots (Humanoids), 2013.
[12] K. Driggs-Campbell, R. Dong, and R. Bajcsy. "Robust, Informative Human-in-the-Loop Predictions via Empirical Reachable Sets". IEEE Transactions on Intelligent Vehicles, 2018.
[13] W.-C. Ma, D.-A. Huang, N. Lee, and K. M. Kitani. "Forecasting Interactive Dynamics of Pedestrians With Fictitious Play". Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[14] A. Alahi, K. Goel, V. Ramanathan, et al. "Social LSTM: Human trajectory prediction in crowded spaces". Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[15] A. Rudenko, L. Palmieri, M. Herman, et al. "Human Motion Trajectory Prediction: A Survey". arXiv preprint arXiv:1905.06113, 2019.
[16] J. Fisac, A. Akametalu, M. Zeilinger, et al. "A general safety framework for learning-based control in uncertain robotic systems". IEEE Transactions on Automatic Control, 2018.
[17] P. A. Lasota and J. A. Shah. "A multiple-predictor approach to human motion prediction". IEEE, 2017.
[18] T. Bandyopadhyay, K. S. Won, E. Frazzoli, et al. "Intention-aware motion planning". Algorithmic Foundations of Robotics X, Springer, 2013.
[19] M. J. Kochenderfer, M. W. M. Edwards, L. P. Espindle, et al. "Airspace encounter models for estimating collision risk". Journal of Guidance, Control, and Dynamics.
[20] IEEE Transactions on Automatic Control.
[21] Automatica.
[22] Automatica, 2004.
[24] X. Chen, E. Ábrahám, and S. Sankaranarayanan. "Flow*: An analyzer for non-linear hybrid systems". International Conference on Computer Aided Verification, Springer, 2013.
[25] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. Dragan. "Information Gathering Actions over Human Internal State". IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
[26] C. Finn, S. Levine, and P. Abbeel. "Guided cost learning: Deep inverse optimal control via policy optimization". International Conference on Machine Learning (ICML), 2016.
[27] S. Bansal, V. Tolani, S. Gupta, et al. "Combining Optimal Control and Learning for Visual Navigation in Novel Environments". arXiv preprint, 2019.
[28] A. Abate, S. Amin, M. Prandini, et al. "Computational approaches to reachability analysis of stochastic hybrid systems". International Workshop on Hybrid Systems: Computation and Control, Springer, 2007.
[29] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin. "Hamilton-Jacobi Reachability: A Brief Overview and Recent Advances". Conference on Decision and Control (CDC), 2017.
[30] A. D. Dragan, K. C. Lee, and S. S. Srinivasa. "Legibility and predictability of robot motion". ACM/IEEE International Conference on Human-Robot Interaction (HRI), IEEE Press, 2013.
[31] A. D. Dragan, R. M. Holladay, and S. S. Srinivasa. "An Analysis of Deceptive Robot Motion".