Toward Adaptive Trust Calibration for Level 2 Driving Automation
Kumar Akash, [email protected], Purdue University, West Lafayette, IN, USA
Neera Jain, [email protected], Purdue University, West Lafayette, IN, USA
Teruhisa Misu, [email protected], Honda Research Institute USA, Inc., San Jose, CA, USA
Figure 1: Simulation environment of our user study with augmented reality (AR)-based information presentation. The goal of this research is to calibrate a driver's trust in driving automation through AR-based information while avoiding increased driver workload.
ABSTRACT
Properly calibrated human trust is essential for successful interaction between humans and automation. However, while human trust calibration can be improved by increased automation transparency, too much transparency can overwhelm human workload. To address this tradeoff, we present a probabilistic framework using a partially observable Markov decision process (POMDP) for modeling the coupled trust-workload dynamics of human behavior in an action-automation context. We specifically consider hands-off Level 2 driving automation in a city environment involving multiple intersections where the human chooses whether or not to rely on the automation. We consider automation reliability, automation transparency, and scene complexity, along with human reliance and eye-gaze behavior, to model the dynamics of human trust and workload. We demonstrate that our model framework can appropriately vary automation transparency based on real-time human trust and workload belief estimates to achieve trust calibration.
CCS CONCEPTS
• Human-centered computing → User models; Mixed / augmented reality;
KEYWORDS
user modeling; HMI for automated driving; trust calibration
ACM Reference Format:
Kumar Akash, Neera Jain, and Teruhisa Misu. 2020. Toward Adaptive Trust Calibration for Level 2 Driving Automation. In Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI ’20), October 25–29, 2020, Virtual event, Netherlands. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3382507.3418885

ICMI ’20, October 25–29, 2020, Virtual event, Netherlands
© 2020 Association for Computing Machinery.
This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI ’20), October 25–29, 2020, Virtual event, Netherlands, https://doi.org/10.1145/3382507.3418885.
1 INTRODUCTION

Humans are increasingly becoming dependent on automation. In the driving domain, advanced driver-assistance systems (ADAS) like adaptive cruise control, lane assist, and collision avoidance have been developed and deployed extensively in vehicles on the roads today. Despite significant advancements in these technologies, though, human supervision and intervention are still required. Researchers have shown that human trust plays a critical role in interactions between humans and automated systems. For example, low levels of trust can lead to disuse of automation [12, 23], whereas excessively relying on the automation's capabilities under unsafe conditions, or in situations outside of the scope of the automation's design, can lead to overtrust and, consequently, accidents [33, 42]. Therefore, the goal of the study and resulting framework presented in this paper is to align human trust with the automation's reliability, rather than to maximize human trust.

Researchers have proposed to develop paradigms that anticipate human interaction behaviors—such as trust in automation—and influence humans to make optimal choices about automation use [1, 17, 29, 38]. Prerequisites for such an approach are the capability to quantitatively predict human behavior and an algorithm for determining the optimal intervention to influence human behavior. Chen et al. [11] modeled human trust dynamics in a table-clearing task and adjusted a manipulation robot's control behavior based on the estimated trust. While many studies have optimized the physical behavior of the system, this approach is not necessarily easy when the system performs safety-critical tasks: the system must guarantee safety while adapting its control behavior for improved performance, which complicates the optimization. Several studies have shown that optimizing the amount of information the system provides (disclosure of the system's internal states) can also achieve trust calibration without changing the system's physical behavior. Studies such as [3–5, 21, 45, 54, 56] have shown that optimizing automation transparency can also contribute to better collaboration performance. However, because more information is typically communicated to the human to achieve greater transparency, increased transparency often results in increased cognitive workload [24, 56], can distract the human from the most critical information [7], and can sacrifice one of the primary benefits of automation, i.e., the reduction of human workload. Therefore, a tradeoff between increased trust and increased workload exists when considering increased transparency [6].

Existing decision-making frameworks do not explicitly model the human workload dynamics required to address this tradeoff [11, 38], in particular for driving contexts. This is likely because most of the above-mentioned studies dealt with decision-aid contexts [4, 5] in which the automation only makes a recommendation and the final action is taken by the human; in such contexts, more transparency was usually the better policy. However, several real-life contexts can be classified as "action automation," in which the automation takes the action unless intervened upon by the human. Examples of action automation include automated power plant operation, aircraft autopilot, and self-driving cars.
Unlike decision automation, where the human's interaction with the automation is characterized by surveillance and compliance whenever the automation presents a recommendation, action automation is characterized by the human continuously monitoring and relying on the automation or intervening to take over control. Thus, the system needs to monitor the user's workload while increasing transparency so that the user's decisions are not delayed by excessive workload.

In this paper, we present 1) an interaction model of human trust-workload dynamics and 2) an optimization of the system transparency level based on the estimated human state in a hands-off SAE (Society of Automotive Engineers) Level 2 driving context. Driving automation was chosen because it is a promising application of action-automation-based systems. Level 2 driving automation manages longitudinal and lateral control but requires the driver to monitor the system. We specifically consider a city driving scenario involving multiple intersections, where humans can "take over" control, if desired. One challenge in this application is that, unlike aircraft autopilot systems that assume a trained professional operator, driving automation is expected to be the first safety-critical action-automation system used by novice users. In this setting, we believe trust calibration is especially important for two reasons: 1) the system reliability (at least the user's perceived system reliability) is affected by the scene complexity, and 2) user trust in the system might change frequently depending on system reliability and traffic conditions, given that most users are not trained experts. We develop a probabilistic model of user trust and workload dynamics using human subject data collected with a driving simulator for urban driving scenes. We then optimize the system behavior of dynamically varying automation transparency to achieve a better trust-workload tradeoff considering the automation performance. Following previous studies, we define transparency as "the descriptive quality of an interface pertaining to its abilities to afford an operator's comprehension about an intelligent agent's intent, performance, future plans, and reasoning process" [10].

The contributions can be summarized as follows:
(1) probabilistic dynamic modeling of human trust-workload behavior in action-automation contexts,
(2) explicit modeling of the coupling between human trust and workload,
(3) driver behavior analysis using time-domain analysis techniques focusing on the effect of transparency on the coupled trust and workload dynamics, and
(4) to the best of our knowledge, the first optimization of a system behavior policy designed to calibrate trust in real time for Level 2 driving automation considering automation reliability, automation transparency, and scene complexity, along with human reliance and surveillance behavior.

This paper is organized as follows. Section 2 summarizes related work. Section 3 describes the proposed framework to quantitatively model the dynamics of human trust and workload. The driving simulation study used to collect human subject data and the parameter estimation algorithm are presented in Section 4. Results and discussion about the estimated model and the corresponding policy are presented in Section 5, followed by concluding statements in Section 6.
2 RELATED WORK

Several researchers have developed a variety of human trust models. A large number of these models are qualitative models [15, 30, 40, 43], which analyze the factors that affect trust but cannot be used to make quantitative predictions. Some quantitative models, including regression models [14, 44] and time-series models of trust [2, 27, 29, 31–33, 41], fill this gap but do not account for the probabilistic nature of human behavior. Markov models, particularly hidden Markov models (HMMs) [18, 37, 39], have been used for probabilistic modeling of trust. While HMMs can incorporate the uncertainty in human behavior [34, 35, 48, 61], they do not include the effects of actions from autonomous systems that affect human behavior. An extension of HMMs, partially observable Markov decision processes (POMDPs), provides a framework that does account for these actions and enables the design and synthesis of a policy to choose optimal actions based on a desired reward function. POMDPs have been used in HMI for automatically generating robot explanations to improve performance [59] and for estimating trust in agent-agent interactions [52]. Recent work has demonstrated the use of a POMDP model with human trust dynamics to improve human-robot performance [3–5, 11]. However, these models do not capture the dynamic effect of automation transparency on human trust-workload behavior in an action-automation context, specifically in driving automation. In this work, we model human trust-workload behavior as a POMDP and optimally vary automation transparency to improve the interaction between the human and the driving automation.

Self-reported surveys are a common tool for assessing a human's trust and workload. Trust surveys include specific questions related to the corresponding experimental context, and a Likert scale is typically used for participants to report their trust in the system [30]. The NASA TLX survey is the preferred tool to assess human workload [49]. In the context of developing algorithms to calibrate human trust in real time, however, it is not practical to use surveys for trust and workload measurements because continuously querying humans is not feasible in most contexts. Alternatively, we propose to use behavioral metrics that are readily available in real time and correlate with human trust-workload behavior. In this work, we use human reliance and surveillance behavior, measured through eye gaze, to infer human trust and workload, respectively. The correlation between trust and reliance is well established [16, 33, 47]. Furthermore, increased surveillance leads to increased cognitive load on the human.

Figure 2: A simplified representation of a partially observable Markov decision process (POMDP) model.
3 TRUST-WORKLOAD MODEL FRAMEWORK

Here we propose a probabilistic model for estimating human trust and workload dynamics. We assume that these dynamics follow the Markov property [50], and therefore, we model human trust-workload behavior as a POMDP. A POMDP is a 7-tuple (S, A, O, T, E, R, γ) and can be represented as shown in Figure 2. Here, S is a finite set of states, A is a finite set of actions, and O is a set of observations. The transition from the current state s ∈ S to the next state s′ ∈ S given the action a ∈ A is characterized by the transition probability function T(s′ | s, a). The emission probability function E(o | s) characterizes the likelihood of observing o ∈ O given that the process is in state s. Finally, the optimal policy is calculated based on the reward function R(s′, s, a) and the discount factor γ. Refer to [53] for a detailed description of POMDPs. Many studies have applied POMDPs to model human-machine interaction [3, 5, 11]. A typical allocation is as follows: S is associated with the human's internal (i.e., mental) states. Since S cannot be directly observed by the system, action selection is conducted based on the system's belief over S, which is estimated through observations o ∈ O. Based on the belief, the system action strategy π is optimized to maximize the discounted cumulative reward, where the reward function R(s′, s, a) is designed based on the optimization target of the interaction.

We apply this model to an interaction between a driver and SAE Level 2 driving automation in a "hands-off" city-driving scenario. While the Level 2 automation is active, the steering, acceleration, and braking are automated. Nevertheless, the human driver has to supervise the automation and "take over" control in order to maintain safety as, and when, needed.
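Because the internal states in such a model are never observed directly, the system maintains a belief (a probability distribution over S) and updates it after each observation using the transition and emission functions. The snippet below is a minimal sketch of this standard POMDP belief update; the array shapes, variable names, and 2 × 2 example numbers are illustrative assumptions, not values from our estimated model.

```python
import numpy as np

def update_belief(belief, T, E, action, obs):
    """Standard POMDP belief update:
    b'(s') ∝ E(obs | s') * sum_s T(s' | s, action) * b(s).

    belief: (n_states,) current belief over hidden states
    T:      (n_actions, n_states, n_states) transition probabilities T[a, s, s']
    E:      (n_states, n_obs) emission probabilities E[s, o]
    """
    predicted = belief @ T[action]          # predict the next-state distribution
    unnormalized = predicted * E[:, obs]    # weight by the observation likelihood
    return unnormalized / unnormalized.sum()

# Illustrative 2-state example (e.g., Low/High Trust), 1 action, 2 observations.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])               # T[a=0, s, s']
E = np.array([[0.8, 0.2],                  # p(obs | Low)
              [0.1, 0.9]])                 # p(obs | High)
belief = np.array([0.5, 0.5])
belief = update_belief(belief, T, E, action=0, obs=1)
print(belief)                              # belief shifts toward the High state
```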
Interaction with Level 2 driving automation is characterized by the human's reliance, or lack thereof, on the automation. Furthermore, there is an associated eye-gaze behavior corresponding to the human's supervision of the automation in the environment. We assume that these characteristics of the human's behavior—i.e., reliance and gaze—are dependent on human trust and workload. In particular, we assume that trust only affects reliance, and workload only affects gaze position. This enables the trust and workload states to be identified based on the emission probabilities, which will be discussed later. It should be noted that this assumption is challenged by earlier research showing that the relationship between trust and reliance weakens under higher workload [26]. On the other hand, although recent work has shown that correlations exist between human trust and gaze behavior [25, 36], it is suspected that intuitive processes not captured by self-report measures might mediate this relationship [25]. Given the need to strike a balance between model fidelity and complexity, we assume that any coupled interactions between these particular states and observations can be captured through the coupled interaction between trust and workload; doing so facilitates parameterization of the model as described later in this section. Finally, we assume that human trust and workload are influenced by the characteristics of the automation—reliability and transparency—as well as that of the environment, i.e., scene complexity. The model structure based on these assumptions is illustrated in Figure 3. Table 1 summarizes the definitions of the model variables.

Figure 3: The structure of the proposed coupled trust-workload model.

Table 1: Definition of the trust-workload POMDP model. Human trust and workload are modeled as hidden states that are affected by actions corresponding to the characteristics of the automation and the scene. The observable characteristics of the human are modeled as the observations of the POMDP.

States s ∈ S:
  s = [Trust s_T, Workload s_W]
  s_T ∈ {Low trust T↓, High trust T↑}
  s_W ∈ {Low workload W↓, High workload W↑}

Actions a ∈ A:
  a = [Transparency a_τ, Reliability a_r, Scene complexity a_C]
  a_τ ∈ {Augmented reality cues absent AR_off, Augmented reality cues present AR_on}
  a_r ∈ {Low reliability Rel_low, Medium reliability Rel_mid, High reliability Rel_high}
  a_C ∈ Traffic density × Intersection complexity
    Traffic density := {Low traffic density Traffic_low, High traffic density Traffic_high}
    Intersection complexity := {Intersection with only cars Peds_absent, Intersection with both cars and pedestrians Peds_present}

Observations o ∈ O:
  o = [Reliance o_R, Gaze position o_G]
  o_R ∈ {Not relying on automation R−, Relying on automation R+}
  o_G ∈ {Road G_road, Vehicle G_vehi, Pedestrian G_ped, Sidewalk G_side, Others G_oth}

As human trust and workload cannot be directly observed, we define the finite set of states S of the trust-workload POMDP as consisting of tuples of the Trust state s_T and the Workload state s_W, i.e., s ∈ S and s = [s_T, s_W]^T. The Trust state s_T can either be Low Trust T↓ or High Trust T↑. Similarly, the Workload state s_W can either be Low Workload W↓ or High Workload W↑. Since these hidden states of trust and workload are influenced by the characteristics of the automation and the environment, we define the finite set of actions A as consisting of tuples a ∈ A containing the automation's transparency a_τ and reliability a_r, along with scene complexity a_C. The explicit definition of the possible values for each of the actions
depends on the specific interaction context and is therefore defined in Section 4 based on the human subject study design considered in this manuscript. The observable characteristics of the human are defined as the finite set of observations O consisting of human reliance o_R and gaze position o_G. Here, reliance o_R can either be the human driver relying on the automation, o_R = R+, or the human driver not relying on the automation and taking over control, o_R = R−. We classify the human driver's gaze position o_G at any time as belonging to one of five possible values: 1) Road G_road, 2) Vehicle G_vehi, 3) Pedestrian G_ped, 4) Sidewalk G_side, and 5) Others G_oth. Others consists of all other elements in the scene not covered in 1–4, such as the interior of the car, the sky, and buildings.

We assume that at any given time, human trust s′_T and workload s′_W are conditionally independent given the previous states s_T, s_W and actions a, i.e.,

p(s′_T, s′_W | s_T, s_W, a) = p(s′_T | s_T, s_W, a) · p(s′_W | s_T, s_W, a),

but that trust s_T and workload s_W at the current time affect the next trust state s′_T as well as the next workload state s′_W. In this way, the model captures the dynamic coupling between trust and workload behavior as it evolves over time. This assumption significantly reduces the number of model parameters and, in turn, the amount of data needed to estimate them. It also results in separate transition probability functions for trust behavior, T_T(s′_T | s_T, s_W, a), and workload behavior, T_W(s′_W | s_T, s_W, a), as well as independent emission probability functions for reliance, E_T(o_R | s_T), and gaze position, E_W(o_G | s_W).
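To make the factorization concrete, the following sketch propagates a joint belief over the four (trust, workload) combinations using separate trust and workload transition tables, mirroring T_T and T_W above for a single fixed action. The array shapes and numerical values are illustrative assumptions, not estimated parameters.

```python
import numpy as np

# Hidden-state factors: trust in {T_low, T_high}, workload in {W_low, W_high}.
# T_trust[t, w, t'] = p(next trust = t' | trust = t, workload = w, action fixed)
# T_work[t, w, w']  = p(next workload = w' | trust = t, workload = w, action fixed)
T_trust = np.array([[[0.85, 0.15], [0.70, 0.30]],
                    [[0.10, 0.90], [0.25, 0.75]]])
T_work  = np.array([[[0.80, 0.20], [0.40, 0.60]],
                    [[0.75, 0.25], [0.30, 0.70]]])

def step_joint_belief(b):
    """b[t, w] is the joint belief over (trust, workload).
    The factored update uses p(t', w' | t, w) = p(t' | t, w) * p(w' | t, w)."""
    b_next = np.zeros((2, 2))
    for t in range(2):
        for w in range(2):
            # outer product of the two conditional next-state distributions
            b_next += b[t, w] * np.outer(T_trust[t, w], T_work[t, w])
    return b_next

b = np.full((2, 2), 0.25)          # start from a uniform joint belief
b = step_joint_belief(b)
print(b, b.sum())                  # still a valid distribution (sums to 1)
```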
4 HUMAN SUBJECT STUDY AND MODEL ESTIMATION

To parameterize the human trust-workload model, we collected human subject data in an experiment designed to analyze the impact of the driving scene on the human driver's trust in automation. The experiment consisted of a series of interactions with intersections of varying scene complexity, presented with or without augmented reality (AR) graphical cues.

Stimuli and Procedure:
A within-subject study was conducted such that each participant completed driving tasks in each of eight (2 × 2 × 2) intersection conditions: two levels of traffic density (low traffic, Traffic_low, or high traffic, Traffic_high), two levels of intersection complexity (presence of cars and pedestrians, Peds_present, or presence of cars only, Peds_absent), and two levels of AR cues (annotated, AR_on, or unannotated, AR_off). The order of the conditions was counterbalanced per participant to reduce expectancy and learning effects over time. The eight driving conditions were organized along two possible routes, with each drive covering 15 blocks consisting of approximately equal numbers of left and right turns to ensure a long enough drive without any maneuver-specific responses. In each drive, three intersections were randomly chosen during the experiment design stage that consisted of different numbers of cars and pedestrians based on the drive condition.

High traffic density intersections without pedestrians (Traffic_high + Peds_absent) consisted of eight cars in total, with three from each side of the cross traffic and two oncoming cars. Low traffic density intersections without pedestrians (Traffic_low + Peds_absent) consisted of four cars in total, with one from each side of the cross traffic and two oncoming cars. High traffic density intersections with both cars and pedestrians (Traffic_high + Peds_present) consisted of four cars in total, with two from each side of the cross traffic (no oncoming), and eight pedestrians crossing the road. Low traffic density intersections with both cars and pedestrians (Traffic_low + Peds_present) consisted of two cars in total, with one from each side of the cross traffic (no oncoming), and four pedestrians crossing the road. The AR graphical cues, if present, consisted of bounding boxes surrounding each of the visible cars and pedestrians in the scene. This graphical highlighting was chosen based on the first stage of Endsley's three-stage model of situational awareness (i.e., perception) [19]. Cues highlighting pedestrians were marked in blue, while cues highlighting vehicles were marked in red (Figure 1). All AR visuals were conformally registered in space to the geospatial center of each visible car/pedestrian and could move through any part of the virtually projected forward road scene with an appearance of being superimposed onto the projected scene.
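For reference, the per-condition scene composition described above can be restated as a small configuration table. The dictionary below simply encodes those counts; the key names are illustrative and not taken from the study software.

```python
# Cross-traffic cars (per side), oncoming cars, and pedestrians for each
# (traffic density, intersection complexity) condition, as described above.
SCENE_CONFIG = {
    ("Traffic_high", "Peds_absent"):  {"cross_cars_per_side": 3, "oncoming_cars": 2, "pedestrians": 0},
    ("Traffic_low",  "Peds_absent"):  {"cross_cars_per_side": 1, "oncoming_cars": 2, "pedestrians": 0},
    ("Traffic_high", "Peds_present"): {"cross_cars_per_side": 2, "oncoming_cars": 0, "pedestrians": 8},
    ("Traffic_low",  "Peds_present"): {"cross_cars_per_side": 1, "oncoming_cars": 0, "pedestrians": 4},
}

def total_cars(condition):
    """Total cars at an intersection: two sides of cross traffic plus oncoming."""
    cfg = SCENE_CONFIG[condition]
    return 2 * cfg["cross_cars_per_side"] + cfg["oncoming_cars"]

print(total_cars(("Traffic_high", "Peds_absent")))  # 8, as described in the text
```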
Apparatus and Testbed:
The study was conducted in Virginia Tech's COGnitive Engineering for Novel Technologies (COGENT) laboratory in a room equipped with a medium-fidelity driving simulator with a fully instrumented cab and approximately 75 degrees of projected virtual canvas at approximately 3 meters in front of the driver's eye line. All simulated environments were rendered using Unreal Engine 4.18 [20] and enabled detailed visual effects, including shadow rendering, post-processing, ambient vegetation, and light scattering in high definition. The AR cues were overlaid onto the virtual scene in real time via software developed in Unity [58].
Participants:
Sixteen participants (twelve males and four females) from Virginia Polytechnic Institute and State University completed the study, ranging in age from 18 to 30 years old. Each participant was required to have held a valid driving license for at least two years and to have driven more than 5000 miles per year. None of the participants had experience with AR-based interfaces. Participants provided informed consent and were briefly given a background of the intended research.

After being equipped with Tobii Pro 2 eye-tracking glasses, participants completed a practice drive within the virtually simulated city environment. Participants were asked to monitor the autonomous driving mode as it navigated through the urban area. The automated driving was simulated by replaying a researcher's previously recorded drive via the "Wizard of Oz" technique [60]. Participants could take over from the automation to ensure continued safety by braking and steering to stop and pull over, respectively. Once a participant felt the situation was safe and released the brake, the automated driving resumed. After the participants felt comfortable with the environment, they completed all eight driving sessions. Each driving session lasted about 4–5 minutes depending on a participant's take-over trials. Data from six participants were excluded from the analysis because eye-tracking data could not be recorded completely or with sufficient quality; therefore, ten participants' data were used for estimating the trust-workload model.
Model Parameterization:
Although individual differences exist between humans, we assume that a common model can capture the dynamics of human trust and workload behavior for the general population. Therefore, we use the aggregated data from all participants to estimate the transition probability functions, observation probability functions, and prior probabilities of states for the trust-workload model. For this study, the automation transparency a_τ is defined at two levels as the absence or presence of AR annotation cues, i.e., a_τ ∈ {AR_off, AR_on}. The scene complexity a_C is characterized by both traffic density (Traffic_low or Traffic_high) and intersection complexity (Peds_absent or Peds_present). We define the automation reliability at an intersection in terms of the distance at which the driving automation stops the car before the stop line. The automation reliability is defined to be low (Rel_low) if the car stopped less than 5 meters before, or crossed, the stop line. The reliability is defined to be medium (Rel_mid) if the car stopped between 5 meters and 15 meters before the stop line, and the reliability is defined to be high (Rel_high) if the car stopped more than 15 meters away from the stop line. Such a reliability definition is similar to driving aggressiveness, which affects the perceived trustworthiness of the automation [22, 28].

Reliance o_R is defined based on the human relying on the automation, o_R = R+, or not relying on the automation and taking over, o_R = R−. Each participant's gaze position is classified as belonging to G_road, G_vehi, G_ped, G_side, or G_oth in each video frame, collected at 25 frames per second. This is achieved by first classifying fixations using Tobii's attention filter with default parameters [46, 57] and then manually annotating each fixation. Finally, we assign to all frames after the start of one fixation, and before the start of the next fixation, the annotation of the prior fixation. To estimate the model, we use the data collected during each of the three intersections in which the conditions of interest were varied, along with three seconds before and after the corresponding intersection, in each of the eight drives. We define the sequence of action-observation data for each participant at each intersection as one interaction sequence; for the ten participants' data, this yields 10 × 3 × 8 = 240 interaction sequences.

To obtain a model with the best generalizability given the available data, we find the subset of actions that directly affect the trust and workload dynamics. For example, we fix reliability to always be an action for the trust dynamics, as it has been established that reliability affects trust. We then train all possible trust-workload models with different subsets of actions for trust and workload. We conduct a 3-fold cross-validation for each possible model, with each model trained 24 times with a different division of training and testing sets to reduce uncertainty in the estimated validation likelihood. We ensure that each fold contains one intersection from each condition of the experiment for each participant to maintain uniformity of the data across the folds. Furthermore, we calculate the average validation likelihood for each of the models and select the model that minimizes the Akaike information criterion (AIC) for the average validation likelihood [8].
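A minimal sketch of this model-selection loop is shown below. It assumes a generic `fit_pomdp(train, trust_actions, workload_actions)` routine that returns a fitted model exposing `log_likelihood(sequences)` and `num_parameters`; these names are placeholders for whichever estimation code is used, and the candidate action subsets are supplied by the caller.

```python
import numpy as np

def aic(avg_log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * avg_log_likelihood

def select_action_subsets(folds, candidates, fit_pomdp):
    """Cross-validated structure selection over candidate action subsets.

    folds:      list of (train_sequences, validation_sequences) splits
    candidates: iterable of (trust_actions, workload_actions) tuples, where
                trust_actions always includes reliability
    fit_pomdp:  placeholder estimation routine (assumed interface, see above)
    """
    best_score, best_subset = np.inf, None
    for trust_actions, workload_actions in candidates:
        val_ll, n_params = [], None
        for train, validation in folds:
            model = fit_pomdp(train, trust_actions, workload_actions)
            val_ll.append(model.log_likelihood(validation))   # validation likelihood
            n_params = model.num_parameters
        score = aic(np.mean(val_ll), n_params)
        if score < best_score:
            best_score, best_subset = score, (trust_actions, workload_actions)
    return best_subset, best_score
```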
The resulting model consists of automation reliability and automation transparency as actions for the trust dynamics, and automation reliability, automation transparency, and intersection complexity as actions for the workload dynamics. The model does not include traffic density as an action, which agrees with the questionnaire-based findings in [62], in which traffic density was found to be insignificant. The resulting model structure is represented in Figure 4 and has significantly fewer parameters than a model that had not been simplified as described here. It also has the maximum average validation likelihood among all models; therefore, this model structure generalizes well.

Figure 4: The structure of the simplified trust-workload model. The model does not include traffic density as an action, and the scene complexity, based only on intersection complexity, affects only the workload state.

Finally, the entire dataset is used to estimate the parameters of the model structure represented in Figure 4. In order to avoid local minima in parameter estimation, we iterate the algorithm 1000 times, with each iteration using a different initial guess of the parameters. The estimated POMDP model of trust-workload behavior is presented and analyzed in the next section.
5 RESULTS AND DISCUSSION

The estimated model consists of initial state probabilities for trust π(s_T) and workload π(s_W), emission probability functions E_T(o_R | s_T) and E_W(o_G | s_W), and transition probability functions T_T(s′_T | s_T, s_W, a) and T_W(s′_W | s_T, s_W, a). Based on the emission probability function for trust E_T(o_R | s_T), we define the High Trust state s_T = T↑ as that in which there is a higher probability of observing the human rely on the automation, o_R = R+. Based on the emission probability function for workload E_W(o_G | s_W), we define the state with the higher entropy of emission probability as the High Workload state s_W = W↑. The entropy S(s_W) for the workload state s_W is calculated as [13]

S(s_W) = − Σ_{o_G} E_W(o_G | s_W) log(E_W(o_G | s_W)).

The estimated initial probabilities of Low Trust T↓ and High Trust T↑ are π(T↓) = . and π(T↑) = . , respectively. The emission probability function E_T(o_R | s_T) is depicted in Figure 5 and characterizes the probability of the participants' reliance on the automation given the participants' state of trust. When in a state of Low Trust, the likelihood of participants not relying on the automation is one. Similarly, when in a state of High Trust, the likelihood of participants relying on the automation is nearly one.

Figure 5: Emission probability function E_T(o_R | s_T) for reliance. Probabilities of observation are shown beside the arrows.

The estimated initial probabilities of Low Workload W↓ and High Workload W↑ are π(W↓) = . and π(W↑) = . , respectively. The emission probability function E_W(o_G | s_W) is depicted in Figure 6 and characterizes the probability of the participants' gaze position on the scene given the participants' state of workload. In the Low Workload state, participants focus more on the road and the vehicles on the road. However, in the High Workload state, participants' focus is distributed between pedestrians and the sidewalk, along with other elements in the scene.

Figure 6: Emission probability function E_W(o_G | s_W) for gaze position. Probabilities of observation are shown above the arrows.
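The entropy-based labeling of the workload states can be written directly from the formula above. The emission values below are placeholders for illustration, not the estimated E_W.

```python
import numpy as np

def entropy(p):
    """Shannon entropy S = -sum p log p (0 log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

# Rows: candidate workload states; columns: gaze targets
# (road, vehicle, pedestrian, sidewalk, others). Values are illustrative.
E_W = np.array([[0.70, 0.20, 0.04, 0.03, 0.03],
                [0.30, 0.20, 0.20, 0.15, 0.15]])

# The state whose gaze distribution has higher entropy is labeled High Workload.
high_workload_state = int(np.argmax([entropy(row) for row in E_W]))
print(high_workload_state)  # -> 1 for these illustrative numbers
```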
To analyze how the actions—transparency, reliability, and scene complexity—affect the state dynamics over time, we simulate step responses as shown in Figure 7. Here, a step response for an action a can be construed as the evolution of the probability that the human is in a state of High Trust T↑ and High Workload W↑ as the POMDP evolves under the given action. Each of the plots in Figure 7 compares the step responses of trust and workload between the two transparency levels—absence and presence of AR cues—for a given scene complexity a_C and automation reliability a_r. We observe that over time, in most cases, the probability of High Trust decayed faster to an equal or lower value when AR cues were absent (red dashed lines) than when AR cues were present (green solid lines). This is consistent with the questionnaire-based findings in [62], in which more participants thought the system with the AR cues could provide advice for their decision making compared to the system without AR cues. Also, the probability of High Workload converged to a lower or equal value when AR cues were absent (red dotted lines) than when AR cues were present (green dot-dashed lines).

Figure 7: Step responses of the human states of trust and workload for each of the actions of the POMDP model, plotted as p(T↑) and p(W↑) over time for a_τ = AR_on and a_τ = AR_off: (a) a_C = Peds_absent, a_r = Rel_low; (b) a_C = Peds_absent, a_r = Rel_mid; (c) a_C = Peds_absent, a_r = Rel_high; (d) a_C = Peds_present, a_r = Rel_low; (e) a_C = Peds_present, a_r = Rel_mid; (f) a_C = Peds_present, a_r = Rel_high.

Furthermore, Figures 7(c) and 7(f) show that high automation reliability saturates the probability of High Trust at a very high level irrespective of the scene complexity and automation transparency. This is expected given that reliability has been shown to strongly impact human trust. Nonetheless, we observe that high automation transparency (presence of AR cues) is able to maintain high trust even in low reliability cases (Figures 7(a) and 7(d)) because the participants are able to make more informed decisions. For medium reliability cases, shown in Figures 7(b) and 7(e), participants' trust decreases over time, possibly because the participants may be unsure of the trustworthiness of the system given that the car neither stops too close to the stop sign nor far enough away. Considering the workload state, we observe that high scene complexity (presence of pedestrians at the intersection) results in a higher probability of High Workload (Figures 7(d), 7(e), and 7(f)) than low scene complexity (Figures 7(a), 7(b), and 7(c)).

To summarize, the parameterized model captures the expected tradeoff: increasing transparency can increase trust but also increases cognitive workload. Moreover, the results suggest that the effect of increased transparency on human trust and workload also depends on other factors including, but not limited to, scene complexity and automation reliability.
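The step responses in Figure 7 amount to repeatedly applying the factored transition functions under one fixed action and reading off p(T↑) and p(W↑) at each step. A minimal sketch, again with illustrative (not estimated) transition tables, is shown below.

```python
import numpy as np

def step_response(T_trust, T_work, b0, n_steps=100):
    """Propagate the joint (trust, workload) distribution under one fixed action
    and return the marginal probabilities p(T_high) and p(W_high) over time."""
    b = b0.copy()
    p_T_high, p_W_high = [], []
    for _ in range(n_steps):
        b_next = np.zeros_like(b)
        for t in range(2):
            for w in range(2):
                b_next += b[t, w] * np.outer(T_trust[t, w], T_work[t, w])
        b = b_next
        p_T_high.append(b[1, :].sum())   # marginal over workload
        p_W_high.append(b[:, 1].sum())   # marginal over trust
    return np.array(p_T_high), np.array(p_W_high)

# Illustrative tables for a single fixed action (rows sum to 1); b0 is uniform.
T_trust = np.array([[[0.95, 0.05], [0.90, 0.10]],
                    [[0.02, 0.98], [0.08, 0.92]]])
T_work  = np.array([[[0.85, 0.15], [0.50, 0.50]],
                    [[0.80, 0.20], [0.45, 0.55]]])
pT, pW = step_response(T_trust, T_work, b0=np.full((2, 2), 0.25))
print(pT[-1], pW[-1])   # steady-state-like values of p(T_high) and p(W_high)
```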
Next, we obtain the optimal policy aimed at calibrating trust. The obtained trust-workload model provides the ability to estimate a human's trust and workload levels continuously, and in real time, using the belief state estimates [55]. In order to obtain a policy that can calibrate human trust, we define a reward function in terms of the human trust state s_T and the automation reliability a_r, as shown in Table 2. We allocate a penalty when the predicted trust is miscalibrated with respect to the automation reliability, and a reward when the model predicts the human is in a state of High Trust given high automation reliability and when it predicts the human is in a state of Low Trust given low automation reliability.

Table 2: Reward function used to calibrate human trust.
                  Rel_low   Rel_mid   Rel_high
Low Trust T↓         .         .        −.
High Trust T↑       −.         .         .
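As a concrete illustration of this reward structure, the sketch below encodes a reward matrix over (trust state, reliability) pairs and evaluates its expectation under a belief over trust. The ±0.5 magnitudes and the zero entries for medium reliability are assumptions made for illustration; they are not the values used in the study.

```python
import numpy as np

TRUST_STATES = ["T_low", "T_high"]
RELIABILITY = ["Rel_low", "Rel_mid", "Rel_high"]

# Reward R(s_T, a_r): positive when trust matches reliability, negative when it
# does not. Magnitudes (0.5) and the Rel_mid column are illustrative assumptions.
R = np.array([[+0.5, 0.0, -0.5],    # Low Trust
              [-0.5, 0.0, +0.5]])   # High Trust

def expected_reward(belief_trust, reliability):
    """Expected instantaneous reward under a belief over the trust state."""
    r = RELIABILITY.index(reliability)
    return float(belief_trust @ R[:, r])

print(expected_reward(np.array([0.2, 0.8]), "Rel_high"))   # mostly rewarded
print(expected_reward(np.array([0.2, 0.8]), "Rel_low"))    # mostly penalized
```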
We select the discount factor γ such that the reward after one second has a weight of e^(−1); given 25 time steps per second for our dataset, this corresponds to γ = e^(−1/25) ≈ 0.96. The resulting optimal policy of automation transparency a_τ corresponding to each combination of the uncontrollable actions (reliability a_r and scene complexity a_C) is depicted in Figure 8.

Figure 8: Optimized policy for calibrating human trust as a function of the belief in High Trust p(T↑) and High Workload p(W↑): (a) Peds_absent and Rel_low; (b) Peds_present and Rel_low; (c) Peds_absent and Rel_mid; (d) Peds_present and Rel_mid; (e) Peds_absent and Rel_high; (f) Peds_present and Rel_high. The color indicates whether the system should turn on the AR cues (green) or turn off the AR cues (red) depending on the system's belief of the user's trust and workload state.

It is worth noting that even though the reward function is defined in terms of the trust state s_T, the system action also depends on the workload state s_W. For example, the system learned to reduce the driver's workload when the system reliability is medium or high and the scene is not very complicated by adopting the absence of AR cues (AR_off) (see Figures 8(c) and 8(e)); this is due to the coupled modeling of trust and workload. For low reliability cases (Figures 8(a) and 8(b)), the optimal policy adopts the presence of AR cues (AR_on) as the transparency level. This high transparency allows the human to make an informed decision and avoid mistrust. For medium reliability cases (Figures 8(c) and 8(d)), the optimal policy adopts high transparency (AR_on) when both trust and workload are low. Providing high automation transparency in the low trust state helps to increase the human's trust, but it is avoided when the human's workload is high. Similarly, for high reliability cases (Figures 8(e) and 8(f)), high transparency is only used when the human's workload is low. Interestingly, high transparency (AR_on) is adopted even when the human's workload is high when pedestrians are present (Figures 8(b), 8(d), and 8(f)). One potential reason for this is that the presence of pedestrians may be interpreted as "higher risk" by the human, thereby leading to less trust in the automation if AR cues are absent. However, identification of potential confounding effects of risk is out of the scope of this work. Nonetheless, these results highlight the importance of including factors such as scene complexity, in addition to automation reliability, in such a model used for real-time decision making.

In summary, the trust- and workload-based feedback policy described here provides a framework for achieving adaptive transparency based on a quantitative dynamic model of human behavior. Although based on a limited sample size of human subject data, the framework provides insight into the coupled interactions between human trust and workload and how the corresponding dynamics can be exploited to optimally calibrate trust to improve human-automation interactions.
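At runtime, the resulting controller reduces to updating the trust-workload belief at each step and looking up the transparency action for the current reliability and scene condition. The sketch below assumes a precomputed policy table indexed by a discretized belief grid; the lookup interface, grid resolution, and toy policy values are illustrative assumptions rather than the solver output used in this work.

```python
import numpy as np

# policy[(reliability, scene)][i, j] in {0, 1}: 0 = AR_off, 1 = AR_on, where
# i, j index a discretized grid over p(T_high) and p(W_high).
GRID = np.linspace(0.0, 1.0, 11)
toy_policy = {
    # Toy rule: turn AR on whenever the belief in High Trust is below 0.5.
    ("Rel_low", "Peds_absent"): np.tile((GRID < 0.5).astype(int)[:, None], (1, GRID.size))
}

def choose_transparency(policy, belief_T_high, belief_W_high, reliability, scene):
    """Map the current belief and uncontrollable actions to AR_on / AR_off."""
    i = int(np.clip(np.searchsorted(GRID, belief_T_high), 0, len(GRID) - 1))
    j = int(np.clip(np.searchsorted(GRID, belief_W_high), 0, len(GRID) - 1))
    return "AR_on" if policy[(reliability, scene)][i, j] == 1 else "AR_off"

print(choose_transparency(toy_policy, 0.2, 0.7, "Rel_low", "Peds_absent"))  # AR_on
```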
6 CONCLUSION

We presented a POMDP framework to model coupled human trust and workload dynamics as they evolve during a human's interaction with hands-off SAE Level 2 driving automation. The model was trained using human subject data, collected via a medium-fidelity driving simulator with variations in scenarios that captured the effects of automation reliability, automation transparency, and scene complexity, along with reliance and eye-gaze behavior, on the dynamics of human trust and workload. Analysis of the model demonstrates that user behavior is strongly influenced by the scene complexity, which should accordingly be considered when determining the optimal transparency. Using a reward function designed to calibrate trust, we obtained an optimal policy to achieve trust calibration. The proposed algorithm can influence driver trust and workload by controlling the automation transparency level depending on the human's current trust and workload levels, along with the automation reliability and scene complexity. While we trained a "general" policy that applies to all participants, who were mostly young adults, the optimal policy might also depend on individual factors of each driver. We would like to carefully investigate the effect of these driver-dependent factors when we conduct a real-time validation of this policy with human subjects. Finally, it is worth noting that real driving is more complex than the scenarios we created; therefore, the ecological validity is limited. We believe that research exploring the effects of these attributions on driver trust behavior and influencing user trust and workload levels will provide essential and necessary steps toward developing human-aware automated system interfaces.
ACKNOWLEDGMENTS
We sincerely acknowledge the COGENT laboratory at Virginia Polytechnic Institute and State University for the collection of the human subject data used in this paper. We also thank Yuki Gorospe at Honda Research Institute USA, Inc. for helping with the eye-gaze annotation.
REFERENCES
[1] Kumar Akash, Wan-Lin Hu, Neera Jain, and Tahira Reid. 2018. A Classification Model for Sensing Human Trust in Machines Using EEG and GSR.
ACM Trans.Interact. Intell. Syst.
8, 4 (Nov. 2018), 1–20. https://doi.org/10.1145/3132743[2] Kumar Akash, Wan-Lin Hu, Tahira Reid, and Neera Jain. 2017. Dynamic Modelingof Trust in Human-Machine Interactions. In
American Control Conference (ACC),2017 . IEEE, 1542–1548.[3] Kumar Akash, Griffon McMahon, Tahira Reid, and Neera Jain. 2020. HumanTrust-Based Feedback Control: Dynamically Varying Automation Transparencyto Optimize Human-Machine Interactions. arXiv:2006.16353 [cs, eess] (June 2020).arXiv:cs, eess/2006.16353[4] Kumar Akash, Katelyn Polson, Tahira Reid, and Neera Jain. 2019. ImprovingHuman-Machine Collaboration Through Transparency-Based Feedback – PartI: Human Trust and Workload Model.
IFAC-PapersOnLine
51, 34 (Jan. 2019),315–321. https://doi.org/10.1016/j.ifacol.2019.01.028[5] Kumar Akash, Tahira Reid, and Neera Jain. 2019. Improving Human-MachineCollaboration Through Transparency-Based Feedback – Part II: Control Designand Synthesis.
IFAC-PapersOnLine
51, 34 (Jan. 2019), 322–328. https://doi.org/10.1016/j.ifacol.2019.01.026[6] Victoria Alonso and Paloma de la Puente. 2018. System Transparency in SharedAutonomy: A Mini Review.
Front. Neurorobot.
12 (Nov. 2018), 83. https://doi.org/10.3389/fnbot.2018.00083[7] Mike Ananny and Kate Crawford. 2018. Seeing without Knowing: Limitationsof the Transparency Ideal and Its Application to Algorithmic Accountability.
New Media & Society
20, 3 (March 2018), 973–989. https://doi.org/10.1177/1461444816676645[8] Kenneth P. Burnham, David Raymond Anderson, and Kenneth P. Burnham.2002.
Model Selection and Multimodel Inference: A Practical Information-TheoreticApproach (2nd ed ed.). Springer, New York.[9] Anthony R Cassandra, Leslie Pack Kaelbling, and Michael L Littman. 1994. ActingOptimally in Partially Observable Stochastic Domains. In
AAAI , Vol. 94. 1023–1028.[10] Jessie Y Chen, Katelyn Procci, Michael Boyce, Julia Wright, Andre Garcia, andMichael Barnes. 2014.
Situation Awareness-Based Agent Transparency . TechnicalReport. Army Research Lab Aberdeen Proving Ground MD Human Research andEngineering Directorate.[11] Min Chen, Stefanos Nikolaidis, Harold Soh, David Hsu, and Siddhartha Srinivasa.2018. Planning with Trust for Human-Robot Collaboration. In
Proceedings of the2018 ACM/IEEE International Conference on Human-Robot Interaction - HRI ’18 .ACM Press, Chicago, IL, USA, 307–315. https://doi.org/10.1145/3171221.3171264[12] Jong Kyu Choi and Yong Gu Ji. 2015. Investigating the Importance of Trust onAdopting an Autonomous Vehicle.
International Journal of Human–ComputerInteraction
31, 10 (Oct. 2015), 692–702. https://doi.org/10.1080/10447318.2015.1070549[13] Thomas M. Cover and Joy A. Thomas. 2006.
Elements of Information Theory 2ndEdition (2 edition ed.). Wiley-Interscience, Hoboken, N.J.[14] Peter de Vries, Cees Midden, and Don Bouwhuis. 2003. The Effects of Errors onSystem Trust, Self-Confidence, and the Allocation of Control in Route Planning.
International Journal of Human-Computer Studies
58, 6 (June 2003), 719–735.https://doi.org/10.1016/S1071-5819(03)00039-9[15] Munjal Desai. 2012.
Modeling Trust to Improve Human-Robot Interaction . Ph.D.University of Massachusetts Lowell, United States – Massachusetts.[16] Stephen R. Dixon and Christopher D. Wickens. 2006. Automation Reliability inUnmanned Aerial Vehicle Control: A Reliance-Compliance Model of AutomationDependence in High Workload.
Hum Factors
48, 3 (Sept. 2006), 474–486. https://doi.org/10.1518/001872006778606822[17] Kim Drnec and Jason S. Metcalfe. 2016. Paradigm Development for Identifyingand Validating Indicators of Trust in Automation in the Operational Environmentof Human Automation Integration. In
Foundations of Augmented Cognition:Neuroergonomics and Operational Neuroscience , Dylan D. Schmorrow and Cali M.Fidopiastis (Eds.). Vol. 9744. Springer International Publishing, Switzerland, 157–167. https://doi.org/10.1007/978-3-319-39952-2_16[18] Ehab ElSalamouny, Vladimiro Sassone, and Mogens Nielsen. 2009. HMM-BasedTrust Model. In
International Workshop on Formal Aspects in Security and Trust .Springer, Berlin, Heidelberg, 21–35.[19] Mica R. Endsley. 1995. Toward a Theory of Situation Awareness in DynamicSystems.
Hum Factors
The 8th IEEE International Conference on E-Commerce Technology and The 3rd IEEE International Conference on Enterprise Computing, E-Commerce, and E-Services (CEC/EEE’06). 37–37. https://doi.org/10.1109/CEC-EEE.2006.14
[22] Judy J. Fleiter, Alexia Lennon, and Barry Watson. 2010. How Do Other People Influence Your Driving Speed? Exploring the ‘Who’ and the ‘How’ of Social Influences on Speeding from a Qualitative Perspective.
Transportation Research Part F: Traffic Psychology and Behaviour
13, 1 (Jan. 2010), 49–62. https://doi.org/10.1016/j.trf.2009.10.002[23] Mahtab Ghazizadeh, John D. Lee, and Linda Ng Boyle. 2012. Extending theTechnology Acceptance Model to Assess Automation.
Cogn Tech Work
14, 1(March 2012), 39–49. https://doi.org/10.1007/s10111-011-0194-3[24] Tove Helldin. 2014.
Transparency for Future Semi-Automated Systems: Effects ofTransparency on Operator Performance, Workload and Trust . Ph.D. Dissertation.Örebro University, Örebro.[25] Sebastian Hergeth, Lutz Lorenz, Roman Vilimek, and Josef F. Krems. 2016. KeepYour Scanners Peeled: Gaze Behavior as a Measure of Automation Trust DuringHighly Automated Driving.
Hum Factors
58, 3 (May 2016), 509–519. https://doi.org/10.1177/0018720815625744[26] Kevin Anthony Hoff and Masooda Bashir. 2015. Trust in Automation: IntegratingEmpirical Evidence on Factors That Influence Trust.
Human Factors: The Journalof the Human Factors and Ergonomics Society
57, 3 (2015), 407–434.[27] Mark Hoogendoorn, Syed Waqar Jaffry, Peter Paul Van Maanen, and Jan Treur.2013. Modelling Biased Human Trust Dynamics.
Web Intelligence and AgentSystems
11, 1 (Aug. 2013), 21–40.[28] John M. Houston, Paul B. Harris, and Marcia Norman. 2003. The AggressiveDriving Behavior Scale: Developing a Self-Report Measure of Unsafe DrivingPractices.
North American Journal of Psychology; Winter Garden
5, 2 (June 2003),269–278.[29] W. Hu, K. Akash, T. Reid, and N. Jain. 2018. Computational Modeling of the Dy-namics of Human Trust During Human–Machine Interactions.
IEEE Transactionson Human-Machine Systems (2018), 1–13.[30] Jiun-Yin Jian, Ann M. Bisantz, and Colin G. Drury. 2000. Foundations for anEmpirically Determined Scale of Trust in Automated Systems.
InternationalJournal of Cognitive Ergonomics
4, 1 (2000), 53–71.[31] John Lee and Neville Moray. 1992. Trust, Control Strategies and Allocation ofFunction in Human-Machine Systems.
Ergonomics
35, 10 (1992), 1243–1270.[32] John D Lee and Neville Moray. 1994. Trust, Self-Confidence, and Operators’Adaptation to Automation.
International Journal of Human-Computer Studies
[33] John D. Lee and Katrina A. See. 2004. Trust in Automation: Designing for Appropriate Reliance. Human Factors: The Journal of the Human Factors and Ergonomics Society 46, 1 (2004), 50–80.
[34] Ming Li and Allison M Okamura. 2003. Recognition of Operator Motions for Real-Time Assistance Using Virtual Fixtures. In
Haptic Interfaces for VirtualEnvironment and Teleoperator Systems, 2003. HAPTICS 2003. Proceedings. 11thSymposium On . IEEE, 125–131.[35] Xin Liu and Anwitaman Datta. 2012. Modeling Context Aware Dynamic TrustUsing Hidden Markov Model. In
AAAI . 1938–1944.[36] Yidu Lu and Nadine Sarter. 2019. Eye Tracking: A Process-Oriented Method forInferring Trust in Automation as a Function of Priming and System Reliability.
IEEE Transactions on Human-Machine Systems
49, 6 (Dec. 2019), 560–568. https://doi.org/10.1109/THMS.2019.2930980[37] Zaki Malik, Ihsan Akbar, and Athman Bouguettaya. 2009. Web Services Reputa-tion Assessment Using a Hidden Markov Model. In
Service-Oriented Computing .Springer, Berlin, Heidelberg, 576–591.[38] J. S. Metcalfe, A. R. Marathe, B. Haynes, V. J. Paul, G. M. Gremillion, K. Drnec, C.Atwater, J. R. Estepp, J. R. Lukos, E. C. Carter, and W. D. Nothwang. 2017. Buildinga Framework to Manage Trust in Automation. In
Micro- and NanotechnologySensors, Systems, and Applications IX , Vol. 10194. 101941U. https://doi.org/10.1117/12.2264245[39] Marie Elisabeth Gaup Moe, Mozhgan Tavakolifard, and Svein J Knapskog. 2008.Learning Trust in Dynamic Multiagent Environments Using HMMs. In
Proceed-ings of the 13th Nordic Workshop on Secure IT Systems (NordSec 2008) .[40] Neville Moray and T. Inagaki. 1999. Laboratory Studies of Trust between Hu-mans and Machines in Automated Systems.
Transactions of the Institute ofMeasurement and Control
21, 4-5 (Oct. 1999), 203–211. https://doi.org/10.1177/014233129902100408[41] Neville Moray, Toshiyuki Inagaki, and Makoto Itoh. 2000. Adaptive Automation,Trust, and Self-Confidence in Fault Management of Time-Critical Tasks.
Journal ofExperimental Psychology: Applied
6, 1 (2000), 44–58. https://doi.org/10.1037/1076-898X.6.1.44[42] Bonnie M Muir. 1987. Trust between Humans and Machines, and the Designof Decision Aids.
International Journal of Man-Machine Studies
27, 5–6 (1987),527–539.[43] Bonnie M Muir. 1994. Trust in Automation: Part I. Theoretical Issues in the Studyof Trust and Human Intervention in Automated Systems.
Ergonomics
37, 11(1994), 1905–1922.[44] Bonnie M Muir and Neville Moray. 1996. Trust in Automation. Part II. Experi-mental Studies of Trust and Human Intervention in a Process Control Simulation.
Ergonomics
39, 3 (1996), 429–460.[45] Kazuo Okamura and Seiji Yamada. 2018. Adaptive Trust Calibration for Su-pervised Autonomous Vehicles. In
Adjunct Proceedings of the 10th InternationalConference on Automotive User Interfaces and Interactive Vehicular Applications(AutomotiveUI ’18) . ACM, New York, NY, USA, 92–97. https://doi.org/10.1145/239092.3265948[46] Anneli Olsen. 2012. The Tobii I-VT Fixation Filter.
Tobii Technology (2012), 21.[47] Raja Parasuraman and Christopher D. Wickens. 2008. Humans: Still Vital AfterAll These Years of Automation.
Human Factors: The Journal of the Human Factorsand Ergonomics Society
50, 3 (June 2008), 511–520.[48] Joelle Pineau, Geoff Gordon, Sebastian Thrun, et al. 2003. Point-Based ValueIteration: An Anytime Algorithm for POMDPs. In
IJCAI , Vol. 3. 1025–1032.[49] Robert W Proctor and Trisha Van Zandt. 2018.
Human Factors in Simple andComplex Systems (3rd ed.). CRC Press.[50] Martin L Puterman. 2014.
Markov Decision Processes: Discrete Stochastic DynamicProgramming . John Wiley & Sons.[51] L. Rabiner and B. Juang. 1986. An Introduction to Hidden Markov Models.
IEEEASSP Magazine
3, 1 (Jan. 1986), 4–16.[52] R. Seymour and G. L. Peterson. 2009. A Trust-Based Multiagent System. In , Vol. 3.109–116. https://doi.org/10.1109/CSE.2009.297[53] Olivier Sigaud and Olivier Buffet. 2013.
Markov Decision Processes in ArtificialIntelligence . John Wiley & Sons.[54] Rashmi Sinha and Kirsten Swearingen. 2002. The Role of Transparency in Recom-mender Systems. In
CHI’02 Extended Abstracts on Human Factors in ComputingSystems . ACM, 2.[55] Matthijs T. J. Spaan. 2012. Partially Observable Markov Decision Processes.In
Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 387–414. https://doi.org/10.1007/978-3-642-27645-3_12
[56] N. Tintarev and J. Masthoff. 2007. A Survey of Explanations in Recommender Systems. In
The Eleventh ACM/IEEE International Conference on Human Robot Interaction. IEEE, 109–116.
[60] P. Wang, S. Sibi, B. Mok, and W. Ju. 2017. Marionette: Enabling On-Road Wizard-of-Oz Autonomous Driving Studies. In . 234–243.
[61] Zheng Wang, Angelika Peer, and Martin Buss. 2009. An HMM Approach to Realistic Haptic Human-Robot Interaction. In
EuroHaptics Conference, 2009 andSymposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems.World Haptics 2009. Third Joint . IEEE, 374–379.[62] Xingwei Wu, Coleman Merenda, Teruhisa Misu, Kyle Tanous, Chihiro Suga, andJoseph L Gabbard. 2020. Drivers’ Attitudes and Perceptions towards A DrivingAutomation System with Augmented Reality Human-Machine Interfaces. In2020 IEEE Intelligent Vehicles Symposium (IV)