Toward Safe and Efficient Human-Robot Interaction via Behavior-Driven Danger Signaling
Mehdi Hosseinzadeh
Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA. Email: [email protected]
Bruno Sinopoli
Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA. Email: [email protected]
Aaron F. Bobick
Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA. Email: [email protected]
Abstract—This paper introduces the notion of danger awareness in the context of Human-Robot Interaction (HRI), which decodes whether a human is aware of the existence of the robot, and illuminates whether the human is willing to engage in enforcing safety. This paper also proposes a method to quantify this notion as a single binary variable, the so-called danger awareness coefficient. By analyzing the effect of this coefficient on the human's actions, an online Bayesian learning method is proposed to update the belief about the value of the coefficient. It is shown that, based upon the danger awareness coefficient and the proposed learning method, the robot can build a predictive human model to anticipate the human's future actions. In order to create a communication channel between the human and the robot, to enrich the observations and obtain informative data about the human, and to improve the efficiency of the robot, the robot is equipped with a danger signaling system. A predictive planning scheme, coupled with the predictive human model, is also proposed to provide an efficient and probabilistically safe plan for the robot. The effectiveness of the proposed scheme is demonstrated through simulation studies on an interaction between a self-driving car and a pedestrian.
I. INTRODUCTION
The aim in Human-Robot Interaction (HRI) is to enable efficient and safe interaction among all participating agents. In general, this is a very challenging task, as safety enforcement techniques depend on presumptions and assumptions that will not necessarily hold in practice, which may hamper the efficiency of the robot. More precisely, robots will inevitably encounter incomplete and possibly erroneous knowledge of the environment and other agents, humans in particular. For instance, a human might be less attentive than normal when facing a self-driving car, presuming that the self-driving car will undertake the safety satisfaction. Thus, robots must be able to safely and timely reason over the uncertainty of the environment they operate in, in order to maintain their efficiency.

Reasoning in uncertain environments is an area where humans excel. Inspired by this, this paper uses models of human decision-making from cognitive science to develop a framework that enables robots to reason over the uncertainties inherent in predicting the actions of humans so as to enforce safety. In particular, this paper introduces the notion of danger awareness in HRI. This notion can be used to decode whether a human is aware of the existence of other agents and possible dangers in the environment, and to explicate whether the human is willing to engage in enforcing safety. By the term danger we refer to a situation in which the likelihood of a collision between the human and the robot is greater than a threshold value if neither the robot nor the human changes their behavior. This notion plays an important role in humans' daily interaction. For instance, a driver pays more attention to a biker in front of the car than to one behind the car, as the driver thinks that the biker in front is not aware of the car approaching from behind. Another example is a driver who slows down when driving in an area where kids are playing a game, but drives at the usual speed when adults are playing the same game, because he/she has learned that kids may jump onto the street unawares. These two examples show that humans act based on their (unconscious) reasoning about each other's danger awareness level in safety-demanding interaction. Therefore, the notion of danger awareness should be incorporated in any approach which aims at enabling robots to reach human-level intelligence, and in any safety enforcing scheme.

This paper proposes a method to quantify the notion of danger awareness. More precisely, it is shown that we can model the effect of this notion on the human's decisions via a binary variable, the so-called danger awareness coefficient. This paper also proposes a method in which the robot continually learns the value of this coefficient based upon real-time observations. A planning scheme is also proposed to provide a probabilistically safe plan for the robot. Note that by a probabilistically safe plan we mean a plan where the probability of collision between the human and the robot is less than a certain value.

Despite the current trend in the robotics literature (e.g., [1, 2, 3, 4]), we believe that it is implausible to learn the human's behavior by only passively observing his/her states and actions. Indeed, the human's trajectory may not encode sufficient information about the human. Thus, any planner developed based upon such learning might be tremendously inaccurate, leading to a conservative solution.
One possible way to address this issue, and consequently to improve the efficiency of the robot, is to enable the robot to influence the environment in order to enrich the observations. To cater to this need, we assume that the robot is equipped with a danger signaling system. This system creates a communication channel between the human and the robot, through which the robot can convey a message to the human and receive the human's reply manifested in his/her actions. More precisely, by means of the danger signal, the robot can actively and aptly perturb the environment so that the bird's-eye view of the human's behavior observed by the robot is rich enough to reason about the human's opinion on cooperative safety enforcement. It is noteworthy that the idea of perturb-and-observe is taken from human interaction as well. An example is a driver who sounds the horn to alert a pedestrian to the danger, and then reasons about the pedestrian's consideration by observing his/her actions.

Two key contributions of this paper are: 1) to introduce and quantify the notion of danger awareness, to investigate its effect on HRI safety, and to propose a real-time method to learn its value; and 2) to propose a state-of-the-art planning scheme that provides probabilistically safe plans by taking into account the notion of danger awareness. The main features of this work are: 1) it is general and can be applied to any HRI in which the human can assess the danger and can cooperate with the robot to enforce safety; 2) the robot communicates with the human, which can improve learning performance and efficiency; and 3) the proposed scheme is modular, meaning that other objective functions (possibly representing other behavioral attributes of the human) can be incorporated into the scheme without changing its structure.

The remainder of this paper is organized as follows. Section II discusses selected related work. Section III formulates the problem. Section IV discusses how to build a predictive human model, how to learn from the human's actions, and how to predict the human's state in the future. In Section V a scheme for safe and efficient planning is proposed. Section VI verifies the proposed method through intensive simulation studies. Finally, Section VII concludes the paper and discusses future work.
Notation:
We denote the set of real numbers by R, the set of positive real numbers by R_{>0}, and the set of non-negative real numbers by R_{≥0}. We use N(µ, Σ) to indicate the Gaussian distribution with mean µ and covariance Σ. We denote proportionality by ∝, and the transpose of matrix A by A^⊤.

II. RELATED WORK
In recent years, there have been several studies on predicting the actions of humans in the context of HRI. In some work (e.g., [5]), it is assumed that the robot has complete knowledge about the environment. However, this assumption may not be reasonable in real-world scenarios due to uncertainties in the human's behavior.

As a result, many researchers have focused on developing methods that enable robots to use the history of humans' actions and states to predict future actions. In [6], propagation networks have been utilized to detect partially ordered sequential actions of humans. In [7], the authors introduced the concept of constrained probabilistic Petri nets and showed how this concept can be used to predict the actions of humans. In [8], Gaussian mixture distribution techniques have been used to model the actions of humans and to predict their timing. Markov models [9, 10, 11] have been used in a variety of studies to predict the timing of the actions of humans. In [12], an interaction primitive framework for predicting the most likely future movements of a human is developed. Anticipatory temporal conditional random fields have been used in [13] to predict the future actions of the human. Some ad hoc methods have also been proposed in the literature, e.g., [14].

Extensive work in cognitive science has shown that human behavior can be well modeled by objective-driven optimization [15, 16, 17]. In this context, a goal-based planning method is proposed in [18] to predict future pedestrian trajectories. References [19] and [20] provide a Bayesian framework to reason about model confidence. The authors of [21] assume that humans are rational and try to control their actions to avoid collision. In [19], the authors assume that the humans are irrational and it is the responsibility of the robot to maintain a safe distance from the human at all times; the robot models the human as more likely to choose actions which minimize a goal objective function. We use a goal-based method to model the human's actions in this paper. However, we do not set any presumption on the rationality of the human. More precisely, we model the human's actions by means of a combination of two separate objective functions, a goal objective function and a safety objective function, where the robot learns this combination online. As will be seen later, this formulation allows us to add the capability of indirect communication between the robot and the human to understand the human's actual intention, and consequently reduce conservatism (i.e., improve the efficiency of the robot).

Once a predictive human model is developed, the robot can use this model to generate a safe and efficient plan. Several planning schemes have been proposed in the literature. In [22], the authors introduced the adaptive preferences algorithm that computes a flexible optimal policy for robot scheduling and control in assembly manufacturing. In [23], a method has been proposed to optimize the task assignment such that the cycle time is shortened, and consequently the productivity is increased. Probabilistic wait-sensitive task planning has been proposed in [24, 25] to optimize robot tasks with respect to the posterior human action distributions, reducing the total wait time of the human. In [26, 27], the authors proposed a motion planning scheme based on the human's trajectory prediction to improve efficiency. Genetic algorithms have also been utilized in planners, e.g., [28].
The notion of the virtual plane is used in [29] for path planning and navigation in dynamic environments. In [30], the authors developed a path planning framework to safely navigate robots while avoiding dynamic obstacles with uncertain motion patterns. Finally, an optimal safe planner is proposed in [31], where the impacts of carelessness and boredom of humans are taken into account. Our work builds upon the notion of danger awareness to create a scheme that can provide a safe and less-conservative plan, without degrading the efficiency of the robot.
III. PROBLEM FORMULATION
Consider a human-robot interaction in which the robot and the human are moving to two different goal locations. In the following, we formulate the problem for a general interaction, while the interaction between a self-driving car and a pedestrian is used as a running example to show the utility of the proposed method. The running example is illustrated in Fig. 1.
A. Robot Model
The robot can be modeled as

    x_R[t+1] = f_R(x_R[t], u_R[t]),    (1)

where x_R[t] ∈ R^{n_R} and u_R[t] ∈ R^{m_R} are respectively the state and control action of the robot at time t, with n_R and m_R as the dimensions of the robot state-space and the robot control action, respectively.

Let g_R ∈ R^{n_R} be the goal state of the robot. Suppose that the robot control action should belong to U_R at all times, i.e., u_R[t] ∈ U_R, ∀t ≥ 0. We assume that the robot uses a receding-horizon predictive control to reach the state g_R, while avoiding collisions with the human. In particular, considering Q_R^g(x_R[t], u_R[t], g_R) : R^{n_R} × R^{m_R} × R^{n_R} → R as the objective function corresponding to the goal state g_R, and [t, t+T_R] with T_R ∈ R_{≥0} as the prediction horizon, the robot solves the following optimization problem at time t to compute the optimal control actions over the interval [t, t+T_R]:

    u*_R[t : t+T_R], d_R = arg min_{u_R[k]}  Σ_{k=t}^{t+T_R} Q_R^g(·)
        s.t.  u_R[k] ∈ U_R, ∀k,
              the model given in (1),
              P_Coll[k] ≤ P_th, ∀k,    (2)

where k ∈ {t, ..., t+T_R} and u*_R[t : t+T_R] = [u*_R[t]^⊤ ... u*_R[t+T_R]^⊤]^⊤. The robot only implements the action u*_R[t], and then solves (2) again at the next time instance, repeatedly. In (2), d_R is the on/off status of the danger signal (to be discussed in Section III-C), P_Coll[t] ∈ [0, 1] is the probability of collision between the human and the robot at time t (to be discussed later in Section IV-C), and P_th ∈ [0, 1] is the threshold value.
Fig. 1: Running example: the interaction between a self-driving car and a pedestrian.

Running Example: The state of the self-driving car is x_R[t] = [p_R^x[t]  p_R^y[t]]^⊤ ∈ R^2, where p_R^x[t] and p_R^y[t] are the x and y positions of the self-driving car at time t, respectively. The control action of the self-driving car is u_R[t] = [u_R^x[t]  u_R^y[t]]^⊤ ∈ R^2, where u_R^x[t] and u_R^y[t] are the directional velocities along the x and y axes at time t, respectively. The dynamical model of the self-driving car is then x_R[t+1] = x_R[t] + u_R[t]. The target position for the self-driving car is g_R = [g_R^x  g_R^y]^⊤ ∈ R^2. For the sake of simplicity, we assume that the self-driving car can only move along the y axis, i.e., p_R^x[t] = g_R^x, u_R^x[t] = 0, ∀t ≥ 0. Suppose that U_R = {[0  0]^⊤, [0  v_R/2]^⊤, [0  v_R]^⊤}, where v_R is the basic velocity of the self-driving car. This means that the self-driving car can move at either zero speed, half speed, or full speed. Finally, the goal objective function of the self-driving car is Q_R^g(x_R[t], u_R[t], g_R) = θ_1 (p_R^y[t] + u_R^y[t] − g_R^y)^2 + θ_2 (u_R^y[t])^2, where θ_1 ∈ R_{>0} and θ_2 ∈ R_{≥0} are design parameters. In this objective function, the first term penalizes the Euclidean distance between the target position and the self-driving car, and the second term penalizes the distance traveled by the self-driving car within one time step.
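For concreteness, the following is a minimal Python sketch (not the MATLAB/YALMIP implementation used in Section VI) of how the receding-horizon problem (2) could be solved for this running example by brute-force enumeration over the small discrete action set. The weights and the threshold are illustrative placeholders, and the collision probability is left as a stub whose actual computation is the subject of Section IV.

```python
# Minimal sketch of the receding-horizon planner (2) for the running example.
# Illustrative values only; collision_probability() is a stub (see Section IV).
import itertools
import numpy as np

v_R = 2.0                                  # basic velocity of the car
U_R = [np.array([0.0, 0.0]),               # zero speed
       np.array([0.0, v_R / 2]),           # half speed
       np.array([0.0, v_R])]               # full speed
T_R = 5                                    # prediction horizon
P_th = 0.05                                # collision-probability threshold (illustrative)
theta_1, theta_2 = 1.0, 0.1                # goal-objective weights (illustrative)

def Q_R_goal(x_R, u_R, g_R):
    """Goal objective of the car: squared distance to the goal after the move
    plus a penalty on the distance traveled in one step."""
    return theta_1 * (x_R[1] + u_R[1] - g_R[1]) ** 2 + theta_2 * u_R[1] ** 2

def collision_probability(x_R, k):
    """Stub for P_Coll[k]; the actual value comes from (9)-(10)."""
    return 0.0

def plan(x_R, g_R):
    """Enumerate all action sequences over the horizon, keep the cheapest feasible
    one, and (receding horizon) return only its first action u_R*[t]."""
    best_cost, best_first_action = np.inf, U_R[0]
    for seq in itertools.product(U_R, repeat=T_R + 1):
        x, cost, feasible = x_R.astype(float).copy(), 0.0, True
        for k, u in enumerate(seq):
            cost += Q_R_goal(x, u, g_R)
            x = x + u                               # dynamics x_R[t+1] = x_R[t] + u_R[t]
            if collision_probability(x, k) > P_th:  # chance constraint in (2)
                feasible = False
                break
        if feasible and cost < best_cost:
            best_cost, best_first_action = cost, seq[0]
    return best_first_action

print(plan(np.array([0.0, 0.0]), np.array([0.0, 80.0])))  # -> full speed toward the goal
```

Since |U_R| = 3 and T_R = 5, the enumeration involves only 3^6 = 729 candidate sequences; a real implementation would instead pass (2) to a solver, as is done with YALMIP in Section VI.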
B. Human Model

The human can be modeled as

    x_H[t+1] = f_H(x_H[t], u_H[t]),    (3)

where x_H[t] ∈ R^{n_H} and u_H[t] ∈ R^{m_H} are respectively the human's state and action at time t, with n_H as the dimension of the human state-space and m_H as the dimension of the human's action.

Let g_H ∈ R^{n_H} be the human's goal state. Suppose that the human's action should belong to U_H at all times, i.e., u_H[t] ∈ U_H, ∀t ≥ 0.

As discussed in [15, 19, 32], human actions can be well modeled by objective-driven optimization. In the considered HRI, we can model the human's action as optimizing a combination of a goal objective function and a safety objective function. In mathematical terms, the human's action at time t is the solution of the following optimization problem:

    u*_H[t] = arg min_{u_H}  Q_H^g(x_H[t], u_H, g_H) + β Q_H^s(x_H[t], u_H, x̂_R[t])
        s.t.  the model given in (3),
              u_H ∈ U_H,    (4)

where Q_H^g(x_H[t], u_H[t], g_H) : R^{n_H} × R^{m_H} × R^{n_H} → R is the goal objective function corresponding to the goal state g_H, and Q_H^s(x_H[t], u_H[t], x̂_R[t]) : R^{n_H} × R^{m_H} × R^{n_R} → R is the safety objective function. In (4), x̂_R[t] ∈ R^{n_R} is an estimation of the state of the robot at time t computed by the human. Most importantly, β is a binary variable (i.e., β ∈ {0, 1}) that we refer to as the danger awareness coefficient. A common interpretation of β = 0 is a human who does not see the robot, is careless, or presumes that it is the responsibility of the robot to keep a safe distance. Whereas β = 1 means that the human is aware of the danger, is risk-averse, and acts properly to reduce the risk.
Remark 1: The estimation x̂_R[t] can be modeled as x̂_R[t] = x_R[t] + ε[t], with ε[t] ∈ R^{n_R} a zero-mean Gaussian random variable with covariance Σ ∈ R^{n_R × n_R}, Σ ⪰ 0, i.e., ε[t] ∼ N(0, Σ).
Running Example: The pedestrian is an adult, whose state is x_H[t] = [p_H^x[t]  p_H^y[t]]^⊤ ∈ R^2, where p_H^x[t] and p_H^y[t] are the x and y positions of the pedestrian at time t, respectively. The pedestrian's action is u_H[t] = [u_H^x[t]  u_H^y[t]]^⊤ ∈ R^2, where u_H^x[t] and u_H^y[t] are the directional velocities along the x and y axes at time t, respectively. The pedestrian's model is then x_H[t+1] = x_H[t] + u_H[t]. The target position for the pedestrian is g_H = [g_H^x  g_H^y]^⊤ ∈ R^2. For the sake of simplicity, we assume that the pedestrian can only move along the x axis, i.e., p_H^y[t] = g_H^y, u_H^y[t] = 0, ∀t ≥ 0. Suppose that U_H = {[−2v_H  0]^⊤, [−v_H  0]^⊤, [0  0]^⊤, [v_H  0]^⊤, [2v_H  0]^⊤}, where v_H is the pedestrian's basic walking velocity. This means that the pedestrian can stop, walk, or run in either direction. The pedestrian's goal objective function is Q_H^g(x_H[t], u_H[t], g_H) = θ_3 (p_H^x[t] + u_H^x[t] − g_H^x)^2 + θ_4 (u_H^x[t] − v_H)^2, where θ_3, θ_4 ∈ R_{>0} are design parameters. In this objective function, the first term penalizes the Euclidean distance between the target position and the pedestrian, and the second term encourages the pedestrian to walk toward the target position. Finally, the safety objective function is Q_H^s(x_H[t], u_H, x̂_R[t]) = θ_5 e^{−θ_6 · dist[t]}, where θ_5, θ_6 ∈ R_{>0} are design parameters, and dist[t] ∈ R_{≥0} is the distance between the self-driving car and the pedestrian as estimated by the pedestrian at time t. This safety objective function is indeed a penalty on the distance between the pedestrian and the self-driving car, such that a larger distance produces a lower penalty value. This formulation is plausible, as humans continue walking toward the target position when the distance is large.
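Likewise, the pedestrian's decision rule (4) reduces, for this example, to picking the minimizer over the five discrete actions. The sketch below (with illustrative values for θ_3-θ_6 and v_H, and a hypothetical geometry in the example call) shows how β switches the safety term on and off.

```python
# Minimal sketch of the pedestrian's decision rule (4) over the discrete action set U_H.
# All numerical values are illustrative placeholders.
import numpy as np

v_H = 0.5
U_H = [np.array([-2 * v_H, 0.0]),   # run backward
       np.array([-v_H, 0.0]),       # walk backward
       np.array([0.0, 0.0]),        # stop
       np.array([v_H, 0.0]),        # walk forward
       np.array([2 * v_H, 0.0])]    # run forward
theta_3, theta_4, theta_5, theta_6 = 2.0, 0.1, 300.0, 0.5   # illustrative weights

def Q_H_goal(x_H, u_H, g_H):
    """Goal objective: squared distance to the goal after the move, plus a term
    that encourages walking (at speed v_H) toward the goal."""
    return theta_3 * (x_H[0] + u_H[0] - g_H[0]) ** 2 + theta_4 * (u_H[0] - v_H) ** 2

def Q_H_safety(x_H, u_H, xhat_R):
    """Safety objective: penalty that decays with the (estimated) distance to the car;
    here the distance is evaluated after applying u_H, one plausible choice."""
    dist = np.linalg.norm(x_H + u_H - xhat_R)
    return theta_5 * np.exp(-theta_6 * dist)

def human_action(x_H, g_H, xhat_R, beta):
    costs = [Q_H_goal(x_H, u, g_H) + beta * Q_H_safety(x_H, u, xhat_R) for u in U_H]
    return U_H[int(np.argmin(costs))]

# With beta = 1 the pedestrian backs away from a nearby car; with beta = 0 the same
# pedestrian keeps heading for the goal.
x_H, g_H, xhat_R = np.array([0.0, 10.0]), np.array([5.0, 10.0]), np.array([1.0, 10.5])
print(human_action(x_H, g_H, xhat_R, beta=1), human_action(x_H, g_H, xhat_R, beta=0))
```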
C. Danger Signaling System

In contrast to the majority of work in the literature, we assume that humans can be influenced by the actions of the robots. Indeed, we believe that assuming that humans operate irrationally in human-robot environments and intentionally ignore the robots not only is unrealistic, but also severely harms the efficiency of the robots.

To address this issue, we assume that the robot is equipped with a proper pre-collision method which uses signals/indicators to alert the human to the danger (e.g., visual indicators [33] and auditory signals [34]). As discussed in Section I, the main goal of employing the danger signaling is to improve the robot's ability to estimate the state of the human's danger awareness, and to maintain the efficiency of the robot in reaching the goal state without being influenced by the human's possible unsafe actions. We denote the on/off status of the danger signal by the binary variable d_R, where d_R = 0 if the signal is off and d_R = 1 if it is on. The robot switches the signal on when the constraint P_Coll[k] ≤ P_th in (2) is active for any k ∈ {t, ..., t+T_R} (i.e., when this constraint affects the obtained solution).

Fig. 2: The general structure of the proposed planning scheme.
Remark 2: From a technical viewpoint, the danger signaling can actively perturb the environment so as to improve the efficiency and safety of the interaction by: 1) acquainting an unaware human or a human who underestimates the danger (i.e., changing the value of β from 0 to 1), and 2) helping the human to reduce the estimation error ε[t] (e.g., the error may be large due to bright sun glare in the human's eyes).
Running Example: The self-driving car uses its high beams to notify the pedestrian of a possible collision between the car and the pedestrian.
D. Problem Statement
At this stage we define the following problem.
Problem 1:
Consider an HRI in which the robot and the human are moving to two different goal locations. Suppose that the robot model is as in (1), and the robot uses the receding-horizon predictive control given in (2) to determine the next action. Suppose that the human's model is as in (3), and the human decides the next action via the optimization problem (4). Suppose that the robot uses the danger signaling system to alert the human to the danger. Provide a method to ensure that both agents reach their goal states, while the safety of the human and the robot is guaranteed.

In order to solve Problem 1, assuming that the robot can observe the human's location and action, we will develop a scheme that provides probabilistically safe robot actions to guide the robot to the goal state, without any collision with the human. The structure of the proposed scheme is depicted in Fig. 2. The gist of this scheme is the development of a predictive model of the human's motion, whose values are computed through posterior calculations based upon observations performed by the robot. The proposed scheme will be discussed in detail in the following sections.
IV. PREDICTIVE HUMAN MODEL
According to (2), the robot determines its actions by taking into account the future states of the human. This means that the robot must employ a predictive human model to predict the human's states in the future. The accuracy of these predictions and the method used to plan around them determine the safety of the interaction.

According to (4), the human decides the next action according to the objective functions Q_H^g(·) and Q_H^s(·), and the danger awareness coefficient β. We assume that the robot knows the objective functions Q_H^g(·) and Q_H^s(·). This assumption is reasonable, as the robot can learn these functions from prior human motions, or these functions can be explicitly provided by the system designers. However, any presumption on the value of the coefficient β will often be wrong in practice; humans may have different opinions of safety, or they may be less attentive than normal when facing robots, presuming that it is the responsibility of the robot to maintain a safe distance. Thus, the robot must be able to timely reason over the value of β to produce a reliable distribution of the human's states in the future.

In what follows, first, we will propose a probabilistic Boltzmann model to predict the human's action. Then, an update method will be proposed to update the belief on the value of the danger awareness coefficient. Finally, it will be discussed how the robot can predict the probability of a collision in the future.
A. Human Action Prediction

As mentioned above, we assume that the robot knows the objective functions Q_H^g(·) and Q_H^s(·). Under this assumption, the robot can predict the human's action as a probability distribution over actions conditioned on the human's state.
Running Example: The formulation of the goal objective function Q_H^g(·) is quite straightforward, as the pedestrian wants to walk from one sidewalk of the street to the other. For what concerns the safety objective function Q_H^s(·), it is possible to learn it for pedestrians based upon behavioral patterns [35, 36, 37, 38]. Note that since β multiplies Q_H^s(·) in (4), either β = 0 (i.e., unaware humans) or Q_H^s(·) = 0 (e.g., children who do not recognize the danger) has the same effect on the human's actions.

The robot uses the following mixture distribution to model the human's behavior:

    P(u_H | x_H, x_R; β) = (1 − ω_H) · P_d(u_H | x_H, x_R; β) + ω_H · P_r(u_H),    (5)

where P_d(u_H | x_H, x_R; β) models the human's deliberate behavior, P_r(u_H) models the human's random behavior, and ω_H ∈ [0, 1] is the mixture weight. Note that (5) should be computed for every u_H ∈ U_H and every β ∈ {0, 1}.

The function P_d(u_H | x_H, x_R; β) describes the probability distribution of the human's action if the human chooses the action according to the goal and safety objective functions. (We understand that there might be some differences between people; however, this paper does not aim to deal with such differences.) Assuming that the robot can observe the human's state, one possible way to model the human's deliberate behavior is to use the Boltzmann distribution, as follows:

    P_d(u_H | x_H, x_R; β) ∝ e^{−γ (Q_H^g(x_H, u_H, g_H) + β Q_H^s(x_H, u_H, x_R))},    (6)

where γ ≫ 1. Note that (6) should be computed for every u_H ∈ U_H and every β ∈ {0, 1}.
Remark 3: Selecting a large γ ensures that model (6) treats the human as most likely to choose the action that minimizes the cost function given in (4). More precisely, P_d(u*_H[t] | x_H[t], x_R[t]; β) ≈ 1, where u*_H[t] is as in (4), and P_d(u_H[t] | x_H[t], x_R[t]; β) ≈ 0 for any u_H[t] ≠ u*_H[t].
Remark 4: The robot uses the actual state x_R[t] in (6) to predict the human's actions, while the human uses the estimation x̂_R[t] in (4) to decide the next action. Thus, if the estimation error ε[t] is large, we may have P_d(u*_H[t] | x_H[t], x_R[t]; β) ≈ 0, where u*_H[t] is as in (4).

In reality, the human may choose a random action by completely ignoring the objective functions for any reason. The robot makes use of a uniform distribution to model the human's random behavior, as follows:

    P_r(u_H) = 1 / |U_H|,  ∀u_H ∈ U_H,    (7)

where |U_H| is the cardinality of the set U_H. Note that this uniform distribution can be given a more practical interpretation related to the accuracy of the estimation x̂_R[t] and/or the reliability of the detectors used by the robot to observe the human's state and action.
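Putting (5)-(7) together, the predictive action model is a softmax of the negative cost mixed with a uniform distribution. A minimal Python sketch is given below; it reuses Q_H_goal and Q_H_safety from the pedestrian sketch in Section III-B, and the default values of γ and ω_H are illustrative.

```python
# Sketch of the predictive action model (5)-(7) for a discrete action set.
# Reuses Q_H_goal / Q_H_safety from the earlier pedestrian sketch.
import numpy as np

def action_distribution(x_H, x_R, g_H, beta, U_H, gamma=1000.0, omega_H=0.1):
    # Deliberate behavior (6): Boltzmann distribution over actions, computed with
    # the robot's actual state x_R (cf. Remark 4). Costs are shifted by their
    # minimum before exponentiation for numerical stability.
    costs = np.array([Q_H_goal(x_H, u, g_H) + beta * Q_H_safety(x_H, u, x_R)
                      for u in U_H])
    weights = np.exp(-gamma * (costs - costs.min()))
    P_d = weights / weights.sum()
    # Random behavior (7): uniform distribution over U_H.
    P_r = np.ones(len(U_H)) / len(U_H)
    # Mixture (5).
    return (1.0 - omega_H) * P_d + omega_H * P_r
```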
Running Example: The uniform distribution P_r(u_H) models cases in which the pedestrian stops on the road or walks in the opposite direction of the target position g_H for any reason other than safety.

B. Real-Time Update of the Belief About the Coefficient β

The danger awareness coefficient β can be seen as a hidden state. Given a prior belief about the danger awareness coefficient (i.e., P_0(β), ∀β), the robot can use the observations to update the belief about the danger awareness coefficient by applying Bayes' rule. In mathematical terms, by observing the human's state and action at time t, the robot can update the belief about the danger awareness coefficient via the following Bayesian update:

    P_{t+1}(β) = P(u_H[t] | x_H[t], x_R[t]; β) P_t(β) / Σ_{β̄} P(u_H[t] | x_H[t], x_R[t]; β̄) P_t(β̄),   ∀β,    (8)

where P(u_H[t] | x_H[t], x_R[t]; β) is as in (5). Note that since the set of β values and the set of the human's possible actions U_H are small, the update rule (8) can be implemented in real time. It is noteworthy that P_t(β = 1) is the robot's belief at time t about the likelihood that the human is aware of the danger.
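The update (8) is a one-line application of Bayes' rule over the two hypotheses β ∈ {0, 1}. A minimal sketch, building on the action model sketched above:

```python
# Sketch of the Bayesian belief update (8) over the danger awareness coefficient.
def update_belief(belief, u_H_index, x_H, x_R, g_H, U_H):
    """belief: {0: P_t(beta=0), 1: P_t(beta=1)}; u_H_index: index in U_H of the
    action the human was observed to take. Returns P_{t+1}(beta)."""
    likelihood = {b: action_distribution(x_H, x_R, g_H, b, U_H)[u_H_index]
                  for b in (0, 1)}
    normalizer = sum(likelihood[b] * belief[b] for b in (0, 1))
    return {b: likelihood[b] * belief[b] / normalizer for b in (0, 1)}

belief = {0: 0.5, 1: 0.5}   # uninformative prior, as used in Section VI
```

Because the mixture (5) assigns every action a nonzero probability, the normalizer never vanishes and the belief never collapses irrevocably to 0 or 1.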
C. Computation of the Probability of Collision

Suppose that the human's state-space is divided into N_c discrete grid cells. Then, having observed x_H[t], the probability of the human's state over the time interval [t+1, t+T_R] can be predicted via the following recursive update rule:

    P(x_H[k+1]) ∝ Σ_{x_H[k], u_H[k]} Σ_β P(x_H[k+1] | x_H[k], u_H[k]) · P(u_H[k] | x_H[k], x_R[k]; β) · P_t(β),    (9)

for k = t, ..., t+T_R − 1, where P(u_H[k] | x_H[k], x_R[k]; β) can be computed via (5), P_t(β) can be computed via (8), and P(x_H[k+1] | x_H[k], u_H[k]) is equal to 1 if x_H[k+1], x_H[k], and u_H[k] satisfy (3), and is equal to zero otherwise.

Note that (9) should be computed for every grid cell in the human's state-space. In other words, for each k, the likelihood of the human's state being in each of the N_c cells should be computed. Also, note that we use P_t(β) in (9) to predict the human's state, meaning that the danger awareness coefficient is assumed to be constant within the prediction horizon. Due to the receding nature of the planner given in (2), this assumption does not hamper the performance of the planner.

Once the probability distribution of the human's state over the time interval [t+1, t+T_R] is generated, the robot should compute the probability of collision P_Coll[k], k = t+1, ..., t+T_R. Note that P_Coll[t] = 0 in (2), as there is no collision at the current time (i.e., time t).

Let π(x_H[k]), k = t+1, ..., t+T_R, be a neighborhood around the predicted human's state x_H[k]. This neighborhood should be determined according to the predefined safe distancing measures, the effect of possible modeling and tracking errors on the predictions, and the effect of gridding the human's state-space (i.e., quantization error).

By construction, the probability of a collision event at time k (for k ∈ {t+1, ..., t+T_R}) can be computed [19] as the probability that x_R[k] is inside the neighborhood π(x_H[k]), without any collision prior to k. This probability is presented in (10). Note that (10) should be computed recursively.
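A minimal sketch of the recursion (9) is given below for a gridded pedestrian state-space; the deterministic transition of the running example is used, and the nearest-cell assignment is an illustrative way of handling actions that do not land exactly on a grid point.

```python
# Sketch of the recursion (9): one-step propagation of the distribution over the
# N_c grid cells of the human's state-space, marginalizing over actions and beta.
import numpy as np

def propagate(P_xH, cells, x_R, g_H, U_H, belief):
    """P_xH[i] = P(x_H[k] = cells[i]); cells is an (N_c, 2) array of cell centers.
    Returns the distribution at time k+1."""
    P_next = np.zeros_like(P_xH)
    for i, x_cell in enumerate(cells):
        if P_xH[i] == 0.0:
            continue
        for b in (0, 1):
            P_u = action_distribution(x_cell, x_R, g_H, b, U_H)
            for u, p_u in zip(U_H, P_u):
                x_new = x_cell + u          # deterministic model (3) of the example
                j = int(np.argmin(np.linalg.norm(cells - x_new, axis=1)))
                P_next[j] += P_xH[i] * p_u * belief[b]
    return P_next / P_next.sum()
```

Iterating this map T_R times from the observed cell of x_H[t] produces the distributions P(x_H[k]) for k = t+1, ..., t+T_R needed in (10).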
Remark 5: Since the effect of gridding the human's state-space is reflected in the function π(·), the safety analyses of this paper are valid even with a small N_c (i.e., large cells).
Remark 6: According to (10), the probability of collision for k ∈ {t+1, ..., t+T_R} can be upper bounded as

    P_Coll[k] ≤ P(x_R[k] ∈ π(x_H[k])),    (11)

which amounts to treating the collision events at different time steps as if they were independent. This upper bound may lead to a conservative solution. However, it can significantly reduce the computational complexity of the optimization problem (2).

The following proposition elucidates how (10) (or (11)) can be computed according to (9).
Proposition 1: Suppose that x_R[k], k ∈ {t+1, ..., t+T_R}, is the robot trajectory within the prediction horizon. Then, P(x_R[k] ∈ π(α)), where α is a cell in the human's state-space, is equal to the probability that the human's state at time k is α, i.e., P(x_H[k] = α), which can be computed through (9).
Running Example: We define π(x_H[k]) as a circle centered on x_H[k] with radius ρ. Thus, x_R[k] ∈ π(x_H[k]) if and only if (p_H^x[k] − p_R^x[k])^2 + (p_H^y[k] − p_R^y[k])^2 ≤ ρ^2.
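Combining Proposition 1 with the circular neighborhood of the running example gives a direct way to evaluate P(x_R[k] ∈ π(x_H[k])) from the gridded distribution, which, via the bound (11), can serve as a conservative estimate of P_Coll[k]. A minimal sketch:

```python
# Sketch of Proposition 1 with the circular neighborhood of the running example:
# sum the predicted probability mass of all grid cells within distance rho of the
# planned car position. By (11) this upper-bounds P_Coll[k].
import numpy as np

def prob_in_neighborhood(x_R_k, cells, P_xH_k, rho):
    """cells: (N_c, 2) cell centers; P_xH_k: predicted distribution P(x_H[k])."""
    inside = np.linalg.norm(cells - x_R_k, axis=1) <= rho
    return float(P_xH_k[inside].sum())
```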
Remark 7: Setting P_th = 0 provides a deterministically safe trajectory. Such trajectories are usually very conservative, as they take into account the worst-case scenario. The trajectory obtained for P_th ∈ (0, 1] is probabilistically safe, which, in general, does not guarantee recursive feasibility. This is an issue in many planning schemes which are developed based upon a probabilistic human model, e.g., [3, 19, 20, 30, 39]. Indeed, since the robot actions are limited and the human model is imperfect, recursive feasibility is unsurprisingly very difficult to satisfy without rendering the solution conservative.
V. THE PROPOSED PLANNING SCHEME

The general structure of the proposed planning scheme is shown in Fig. 2. In this scheme, first, the robot observes the human's state x_H[t] and its own state x_R[t]. Based on these observations, the robot generates two probability distributions: 1) a distribution over the human's next action, and 2) a distribution over the human's states in a future time interval. The former probability distribution will be used to update the belief on the human's risk-aversion level. The latter probability distribution will be used to compute the probability of collision in a future time interval. The robot then solves an optimization problem to determine the next action and the on/off status of the danger signal. Finally, once the human takes an action, the robot observes the action and updates the belief about the danger awareness coefficient. The proposed scheme is summarized as Algorithm 1.

    P_Coll[k] = P(x_R[k] ∈ π(x_H[k]), x_R[k−1] ∉ π(x_H[k−1]), ..., x_R[t+1] ∉ π(x_H[t+1]))
              = P(x_R[k] ∈ π(x_H[k]) | x_R[k−1] ∉ π(x_H[k−1]), ..., x_R[t+1] ∉ π(x_H[t+1]))
                × P(x_R[k−1] ∉ π(x_H[k−1]) | x_R[k−2] ∉ π(x_H[k−2]), ..., x_R[t+1] ∉ π(x_H[t+1]))
                × ... × P(x_R[t+1] ∉ π(x_H[t+1]))
              = P(x_R[k] ∈ π(x_H[k]) | x_R[k−1] ∉ π(x_H[k−1]), ..., x_R[t+1] ∉ π(x_H[t+1]))
                × (1 − P(x_R[k−1] ∈ π(x_H[k−1]) | x_R[k−2] ∉ π(x_H[k−2]), ..., x_R[t+1] ∉ π(x_H[t+1])))
                × ... × (1 − P(x_R[t+1] ∈ π(x_H[t+1]))).    (10)

Fig. 3: A screenshot of the generated simulator shown in the accompanying video (see https://youtu.be/ 9UjDvZYT2U). Left figure: the interaction between a self-driving car and a pedestrian in the street; the pedestrian moves from right to left, and the car moves from south to north. Middle-left figures: the top figure shows the time profile of the robot's belief about the likelihood that the human is aware of the danger, i.e., P_t(β = 1), and the bottom figure shows the probability of collision over the prediction horizon. Middle-right figure: probability distribution of the human's position in the street over the prediction horizon, computed at each time instant (here at time t = 17); note that the pedestrian moves from right to left. Right figures: the top and bottom figures indicate the pedestrian and car actions at the current time instant, respectively.
Algorithm 1 Planning Scheme
1: Observe the human's state x_H[t] and the state of the robot x_R[t].
2: Compute the mixture distribution P(u_H | x_H[t], x_R[t]; β) for every u_H ∈ U_H and every β via (5).
3: Compute the probability distribution of the human's states P(x_H[k]) for k ∈ {t+1, ..., t+T_R} via (9).
4: Compute the probability of collision P_Coll[k] for k ∈ {t+1, ..., t+T_R} via (10) (or (11)).
5: Determine the action of the robot u*_R[t] and the on/off status of the danger signal d_R via (2).
6: Observe the human's action u_H[t].
7: Update the belief about the danger awareness coefficient via (8), i.e., compute P_{t+1}(β) for all β.
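The sketches above can be stitched into one cycle of Algorithm 1. The fragment below is again illustrative: it approximates the coupled problem (2) by replanning the car greedily along the horizon and switching the danger signal on whenever the predicted collision probability reaches the threshold, which is a simplification of the joint optimization over (u_R, d_R).

```python
# Illustrative single cycle of Algorithm 1, built from the earlier sketches
# (plan, propagate, prob_in_neighborhood, update_belief, T_R, P_th).
import numpy as np

def one_hot(x_H, cells):
    """Place all probability mass on the grid cell closest to the observed x_H[t]."""
    P = np.zeros(len(cells))
    P[int(np.argmin(np.linalg.norm(cells - x_H, axis=1)))] = 1.0
    return P

def decide(x_H, x_R, belief, cells, g_R, g_H, U_H, rho):
    """Steps 1-5: predict the pedestrian, evaluate the collision probabilities over
    the horizon, and pick the car action u_R*[t] and the signal status d_R."""
    P_xH, x_plan, P_coll = one_hot(x_H, cells), x_R.astype(float).copy(), []
    for _ in range(T_R):
        P_xH = propagate(P_xH, cells, x_plan, g_H, U_H, belief)
        x_plan = x_plan + plan(x_plan, g_R)                 # candidate car motion
        P_coll.append(prob_in_neighborhood(x_plan, cells, P_xH, rho))
    d_R = 1 if max(P_coll) >= P_th else 0                   # signal on when the chance
    return plan(x_R, g_R), d_R                              # constraint becomes active

# Steps 6-7: after acting, observe the pedestrian's action and call update_belief().
```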
VI. SIMULATION STUDY

In order to demonstrate the effectiveness of the proposed scheme, we simulate the considered running example, i.e., the interaction between a self-driving car and a pedestrian shown in Fig. 1. We assume that g_R = [0 80]^⊤, g_H = [5 10]^⊤, v_R = 2, v_H = 0., ω_H = 0., P_th = 0., T_R = 5, γ = 1000, θ_1 = 1, θ_2 = 0., θ_3 = 2., θ_4 = 8 × 10^{−}, θ_5 = 300, θ_6 = 6 × 10^{−}, Σ = 1, and P_0(β = 0) = P_0(β = 1) = 1/2. The simulations are carried out using the MATLAB/Simulink package, on an Intel(R) Core(TM) i7-7500U CPU at 2.70 GHz with 16.00 GB of RAM. We use the YALMIP toolbox [40] to solve the optimization problems.

In order to have a visual demonstration of the considered interaction between a pedestrian and a self-driving car, a simulator has been generated. Fig. 3 presents an overview of the generated simulator. A video of the operation of the simulator is available at the URL: https://youtu.be/ 9UjDvZYT2U.
A. Impact of the Danger Signaling System
As discussed in Remark 2, the danger signaling system can improve the efficiency and safety by acquainting the unaware pedestrian. This aspect is shown in Fig. 4. As seen in this figure, when the pedestrian is unconcerned (i.e., β = 0 and/or Q_H^s(·) = 0), the pedestrian keeps walking toward the target position g_H; thus, the self-driving car comes to a full stop to keep the probability of collision lower than the threshold value. Whereas, when the robot alerts a concerned but unaware pedestrian to the danger, he/she runs backward to the right sidewalk, and the self-driving car continues toward the goal position g_R without stopping.

Fig. 4: Time profile of the robot's belief about the likelihood that the pedestrian is aware of the danger. Top figure: the pedestrian is concerned, i.e., is aware of the danger and engages in the safety enforcement. Middle figure: the pedestrian is unconcerned, i.e., either is unaware of the danger or does not care. Bottom figure: the danger signaling system acquaints the concerned but unaware pedestrian.

B. Impact of the Estimation Error ε[t]

As discussed in Section III-B, the pedestrian makes use of an estimation of the position of the self-driving car. However, as discussed in Section IV-A, the self-driving car predicts the pedestrian's action based on its actual position. This means that in the presence of a large estimation error, even if the pedestrian is completely aware of the danger (i.e., β = 1), the pedestrian may take an action which increases the risk. Thus, the self-driving car cannot accurately learn the danger awareness coefficient, and may make unnecessary stops. As discussed in Remark 2, the danger signaling system can impact the interaction by decreasing the estimation error. This impact is studied in Fig. 5 (we assume that the pedestrian compensates the estimation error as ε[t] = ε_0 · e^{−η · d_R · (t − t_d)}, where ε_0 is the initial error, η > 0 is a scalar, and t_d is the time at which d_R switches from 0 to 1). As seen in this figure, even though the pedestrian is concerned, due to the estimation error the pedestrian takes a safe action late. Note that when the estimation error is small (i.e., ε[t] = 5), the pedestrian runs backward toward the sidewalk on the right, as the pedestrian realizes the danger when he/she is in the right half of the street. While, when the estimation error is large (i.e., ε[t] = 10), the pedestrian runs forward toward the sidewalk on the left, as the pedestrian realizes the danger when he/she is in the left half of the street.

Fig. 5: Time profile of the robot's belief about the likelihood that the pedestrian is aware of the danger, in the presence of a concerned pedestrian. Top figure: the estimation error is zero; the pedestrian realizes the danger in time. Middle figure: the pedestrian realizes the danger late, and runs toward the sidewalk on the right. Bottom figure: the pedestrian realizes the danger very late, and runs toward the sidewalk on the left.

C. Impact of the Mixture Weight ω_H

As discussed in Section IV-A, the mixture weight ω_H defines the relationship between the mixture components. More precisely, ω_H = 0 means that the human is driven only by the goal and safety objective functions, and ω_H = 1 means that the human chooses the actions randomly by completely ignoring the objective functions. This mixture weight affects the prediction of the pedestrian's position in the street over the prediction horizon. In particular, for a large ω_H, as the pedestrian appears random, the probability distribution over the pedestrian's position in the future will be wide, while a small ω_H leads to a tight distribution. This impact is shown in Fig. 6 for four different values of ω_H. As seen in this figure, by increasing ω_H, the probability distribution over the pedestrian's position in the street becomes wider, meaning that the prediction of the pedestrian's position becomes more uncertain.

Fig. 6: The impact of the mixture weight ω_H on the probability distribution over the human's position in the street over the prediction horizon, computed at time t. Note that the pedestrian moves from right to left.

VII. CONCLUSION AND FUTURE WORK
This paper introduced the notion of danger awareness in HRI, and accordingly the so-called danger awareness coefficient. This coefficient quantifies the human's intention to participate in and/or the human's opinion of cooperative safety enforcement. The notion of danger awareness contributes to the state of the art by revoking the presumption that humans do not intend to cooperate with robots to enforce safety, which usually leads to a conservative solution. In particular, this notion not only addresses the hidden intention of humans; this paper also revealed that it is possible to enforce safety without hindering the robots. In future work, we will investigate how the notion of danger awareness can be leveraged to improve fairness in HRI despite conflicting objectives, i.e., ensuring safety of all agents without letting them deceive each other.

REFERENCES
[1] C. Brooks and D. Szafir, "Balanced information gathering and goal-oriented actions in shared autonomy," in Proc. 14th ACM/IEEE Int. Conf. Human-Robot Interaction, Daegu, Korea, Mar. 11-14, 2019, pp. 85–94.
[2] M. El-Shamouty, X. Wu, S. Yang, M. Albus, and M. F. Huber, "Towards safe human-robot collaboration using deep reinforcement learning," in Proc. Int. Conf. Robotics and Automation, Paris, France, May 31-Aug. 31, 2020, pp. 4899–4905.
[3] D. Fridovich-Keil, A. Bajcsy, J. F. Fisac, S. L. Herbert, S. Wang, A. D. Dragan, and C. J. Tomlin, "Confidence-aware motion prediction for real-time collision avoidance," The Int. J. Robotics Research, vol. 39, no. 2-3, pp. 250–265, Mar. 2020.
[4] R. Peddi, C. D. Franco, S. Gao, and N. Bezzo, "A data-driven framework for proactive intention-aware motion planning of a robot in a human environment," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, NV, USA, Oct. 25-29, 2020, pp. 5738–5744.
[5] M. Huber, A. Knoll, T. Brandt, and S. Glasauer, "When to assist? - Modelling human behaviour for hybrid assembly systems," in Proc. 41st Int. Symp. Robotics and 6th German Conf. Robotics, Munich, Germany, Jun. 7-9, 2010.
[6] Y. Shi, Y. Huang, D. Minnen, A. Bobick, and I. Essa, "Propagation networks for recognition of partially ordered sequential action," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, Washington, DC, USA, Jun. 27-Jul. 2, 2004, pp. 862–869.
[7] M. Albanese, R. Chellappa, V. Moscato, A. Picariello, V. S. Subrahmanian, and P. Turaga, "A constrained probabilistic Petri net framework for human activity detection in video," IEEE Trans. Multimed., vol. 10, no. 6, pp. 982–996, Oct. 2008.
[8] J. Kinugawa, A. Kanazawa, S. Arai, and K. Kosuge, "Adaptive task scheduling for an assembly task coworker robot based on incremental learning of human motion patterns," IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 856–863, Apr. 2017.
[9] D. Vasquez, T. Fraichard, and C. Laugier, "Incremental learning of statistical motion patterns with growing hidden Markov models," IEEE Trans. Intell. Transp. Syst., vol. 10, no. 3, pp. 403–416, Sep. 2009.
[10] B. T. Morris and M. M. Trivedi, "Trajectory learning for activity understanding: Unsupervised, multilevel, and long-term adaptive approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2287–2301, Nov. 2011.
[11] H. Ding, G. Reibig, K. Wijaya, D. Bortot, K. Bengler, and O. Stursberg, "Human arm motion modeling and long-term prediction for safe and efficient human-robot-interaction," in Proc. IEEE Int. Conf. Robotics and Automation, Shanghai, China, May 9-13, 2011, pp. 5875–5880.
[12] H. B. Amor, G. Neumann, S. Kamthe, O. Kroemer, and J. Peters, "Interaction primitives for human-robot cooperation tasks," in Proc. IEEE Int. Conf. Robotics and Automation, Hong Kong, China, May 31-Jun. 7, 2014, pp. 2831–2837.
[13] H. S. Koppula and A. Saxena, "Anticipating human activities for reactive robotic response," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Tokyo, Japan, Nov. 3-7, 2013, p. 2071.
[14] K. Li, J. Hu, and Y. Fu, "Modeling complex temporal composition of actionlets for activity prediction," in Proc. European Conference on Computer Vision, Florence, Italy, Oct. 7-13, 2012, pp. 286–299.
[15] C. L. Baker, J. B. Tenenbaum, and R. R. Saxe, "Goal inference as inverse planning," Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 29, pp. 779–784, 2007.
[16] J. V. Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton University Press, 2007.
[17] R. D. Luce, Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, 2012.
[18] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa, "Planning-based prediction for pedestrians," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, St. Louis, MO, USA, Oct. 10-15, 2009, pp. 3931–3936.
[19] J. F. Fisac, A. Bajcsy, S. Herbert, D. Fridovich-Keil, S. Wang, C. J. Tomlin, and A. D. Dragan, "Probabilistically safe robot planning with confidence-based human predictions," in Proc. Robotics: Science and Systems, Pittsburgh, PA, USA, Jun. 26-30, 2018.
[20] A. Bajcsy, S. L. Herbert, D. Fridovich-Keil, J. F. Fisac, S. Deglurkar, A. D. Dragan, and C. J. Tomlin, "A scalable framework for real-time multi-robot, multi-human collision avoidance," in Proc. Int. Conf. Robotics and Automation, Montreal, QC, Canada, May 20-24, 2019, pp. 936–943.
[21] K. P. Hawkins and P. Tsiotras, "Anticipating human collision avoidance behavior for safe robot reaction," in Proc. IEEE Conf. Decision and Control, Miami Beach, FL, USA, Dec. 17-19, 2018, pp. 6301–6306.
[22] R. Wilcox, S. Nikolaidis, and J. Shah, "Optimization of temporal dynamics for adaptive human-robot interaction in assembly manufacturing," in Proc. Robotics: Science and Systems, Sydney, NSW, Australia, Jul. 9-13, 2012, pp. 441–448.
[23] H. Ding, M. Schipper, and B. Matthias, "Optimized task distribution for industrial assembly in mixed human-robot environments – Case study on IO module assembly," in Proc. IEEE Int. Conf. Automation Science and Engineering, Taipei, Taiwan, Aug. 18-22, 2014, pp. 19–24.
[24] K. P. Hawkins, N. Vo, S. Bansal, and A. F. Bobick, "Probabilistic human action prediction and wait-sensitive planning for responsive human-robot collaboration," in Proc. 13th IEEE-RAS Int. Conf. Humanoid Robots, Atlanta, GA, USA, Oct. 15-17, 2013, pp. 499–506.
[25] K. P. Hawkins, S. Bansal, N. N. Vo, and A. F. Bobick, "Anticipating human actions for collaboration in the presence of task and sensor uncertainty," in Proc. IEEE Int. Conf. Robotics and Automation, Hong Kong, China, May 31-Jun. 7, 2014, pp. 2215–2222.
[26] Y. Tanaka, J. Kinugawa, and K. Kosuge, "Motion planning with worker's trajectory prediction for assembly task partner robot," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vilamoura, Portugal, Oct. 7-12, 2012, pp. 1525–1532.
[27] A. Kanazawa, J. Kinugawa, and K. Kosuge, "Adaptive motion planning for a collaborative robot based on prediction uncertainty to enhance human safety and work efficiency," IEEE Trans. Robot., vol. 35, no. 4, pp. 817–832, Aug. 2019.
[28] K. Baizid, A. Yousnadj, A. Meddahi, R. Chellali, and J. Iqbal, "Time scheduling and optimization of industrial robotized tasks based on genetic algorithms," Robot. Comput.-Integr. Manuf., vol. 34, pp. 140–150, Aug. 2015.
[29] F. Belkhouche, "Reactive path planning in a dynamic environment," IEEE Trans. Robotics, vol. 25, no. 4, pp. 902–911, Aug. 2009.
[30] G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, "Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns," Autonomous Robots, vol. 35, pp. 51–76, 2013.
[31] M. Hosseinzadeh, B. Sinopoli, and A. F. Bobick, "Toward a safe and efficient human-robot interaction: When you have a careless worker!" in Proc. Int. Conf. Robotics and Automation, Xi'an, China, May 30-Jun. 5, 2021.
[32] R. D. Luce, Individual Choice Behavior: A Theoretical Analysis. Dover Publications, Inc., 2005.
[33] K. Baraka, S. Rosenthal, and M. Veloso, "Enhancing human understanding of a mobile robot's state and actions using expressive lights," in Proc. 25th IEEE Int. Symp. Robot and Human Interactive Communication, New York, NY, USA, Aug. 26-31, 2016, pp. 652–657.
[34] M. S. Wogalter, "Communication-human information processing (C-HIP) model in forensic warning analysis," in Proc. 20th Congress Int. Ergonomics Association, Florence, Italy, Aug. 26-30, 2018, pp. 761–769.
[35] J. Zacharias, "Pedestrian behavior and perception in urban walking environments," J. Planning Literature, vol. 16, no. 1, pp. 3–18, Aug. 2001.
[36] B. J. Campbell, C. V. Zegeer, H. H. Huang, and M. J. Cyneck, "A Review of Pedestrian Safety Research in the United States and Abroad," University of North Carolina, Tech. Rep., Nov. 2003.
[37] V. Mehta, "Walkable streets: pedestrian behavior, perceptions and attitudes," J. Urbanism: Int. Research on Placemaking and Urban Sustainability, vol. 1, no. 3, pp. 217–245, 2008.
[38] D. McAslan, "Walking and transit use behavior in walkable urban neighborhoods," Michigan J. Sustainability, vol. 5, no. 1, pp. 51–71, 2017.
[39] M. P. Chapman, J. Lacotte, A. Tamar, D. Lee, K. M. Smith, V. Cheng, J. F. Fisac, S. Jha, M. Pavone, and C. J. Tomlin, "A risk-sensitive finite-time reachability approach for safety of stochastic dynamic systems," in Proc. American Control Conf., Philadelphia, PA, USA, Jul. 10-12, 2019.
[40] J. Lofberg, "YALMIP: a toolbox for modeling and optimization in MATLAB," in Proc. IEEE Int. Symp. Computer Aided Control Systems Design, Taipei, Taiwan, 2004, pp. 284–289.