AutoPreview: A Framework for Autopilot Behavior Understanding
Yuan Shen, Niviru Wijayaratne, Peter Du, Shanduojiao Jiang, Katherine Driggs Campbell
University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
ABSTRACT
The behavior of self-driving cars may differ from people's expectations (e.g., an autopilot may unexpectedly relinquish control). This expectation mismatch can cause potential and existing users to distrust self-driving technology and can increase the likelihood of accidents. We propose a simple but effective framework, AutoPreview, to enable consumers to preview a target autopilot's potential actions in the real-world driving context before deployment. For a given target autopilot, we design a delegate policy that replicates the target autopilot's behavior with explainable action representations, which can then be queried online for comparison and to build an accurate mental model. To demonstrate its practicality, we present a prototype of AutoPreview integrated with the CARLA simulator, along with two potential use cases of the framework. We conduct a pilot study to investigate whether AutoPreview provides deeper understanding of autopilot behavior when experiencing a new autopilot policy for the first time. Our results suggest that the AutoPreview method helps users understand autopilot behavior in terms of driving style comprehension, deployment preference, and exact action timing prediction.
CCS CONCEPTS
• Human-centered computing → Human computer interaction (HCI); • Computing methodologies → Artificial intelligence.

KEYWORDS
Autonomous Vehicle, Mental Model, Human Robot Interaction, Imitation Learning, Agent Behavior Understanding, Preview
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

CHI ’21 Extended Abstracts, May 8–13, 2021, Yokohama, Japan
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8095-9/21/05...$15.00
https://doi.org/10.1145/3411763.3451591
ACM Reference Format:
Yuan Shen, Niviru Wijayaratne, Peter Du, Shanduojiao Jiang, and Katherine Driggs-Campbell. 2021. AutoPreview: A Framework for Autopilot Behavior Understanding. In CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI ’21 Extended Abstracts), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3411763.3451591
1 INTRODUCTION
Despite recent efforts towards fully autonomous vehicles (e.g., SAE Level 5 [24]), existing self-driving solutions still require human drivers to maintain situational awareness and be ready to take over control at any given moment [3]. The effectiveness of these systems requires that human drivers have developed a reasonable understanding of the autopilot's behaviors and tendencies.

However, industry currently does not provide sufficient tools to help drivers calibrate appropriate mental models of autonomous technology. We conducted a simple initial survey study to understand how potential and existing users currently explore and build an understanding of autopilot behavior. Our results showed that 59.1% of our participants expressed that they rarely, if ever, check the content of release notes (the current industry practice), while 77.3% of our participants indicated they would prefer a previewing tool prior to purchase or deployment. As a result of poor mental model calibration tools, drivers may experience unexpected behaviors when on the road and therefore disengage the autopilot [28]. For example, researchers have found 10.5 hours of YouTube videos that record how autopilot has surprised drivers [4]. Our objective is to develop a tool to help drivers become familiar with autopilot behavior, improve their understanding, and establish appropriate levels of trust.

We propose a framework, called AutoPreview, which aims to help new or existing users of autonomous vehicles preview autopilot behaviors of updated control policies prior to purchase or deployment. At a high level, AutoPreview takes advantage of a delegate model to inform drivers about the potential actions that a target autopilot would take if it were deployed. We implemented a framework prototype in the CARLA simulation environment [8]. Our preliminary findings suggest that AutoPreview is easy to use and can help users better understand autopilot behavior in terms of driving style comprehension, deployment preference, and exact action timing prediction.
2 RELATED WORK
Prior work has conducted several studies on building mental models of intelligent agents. These methods can be categorized into online interaction and offline introspection. For online interaction, explainable-AI-related systems are widely discussed and used [7, 15, 18, 23]. Through visual or verbal explanations, real-time interaction can directly respond to real-world scenarios, but it cannot protect users from the danger of unexpected agent behaviors when users have not established sufficient understanding of the agent's policies. As for offline introspection, researchers indicate that end-users can build better mental models of reinforcement learning agent policies either through checking critical states extracted from the agent's trajectories [1, 14, 25], or through actively querying trajectories which satisfy certain behavior-related conditions [2, 5, 30]. These offline methods offer targeted feedback to users' queries, but require extra effort to explore and thus oppose the principle of least effort [31]. Our method combines the best of both worlds by enabling users to safely and conveniently preview an agent's policies online through real-world interaction, via a delegate policy.

Aside from methods for mental model development, researchers have also explored factors that influence the acceptability of autonomous vehicles. Choi et al. revealed that trust and perceived usefulness strongly affect the user's desire to use autonomous vehicles [6]. In particular, trust is a widely adopted metric for the level of acceptance of autonomous systems [6, 12, 13, 20, 29]. They also suggest that system transparency, technical competence, and situation management can positively impact trust, and can therefore indirectly influence the adoption and acceptance of autonomous vehicles.

3 AUTOPREVIEW FRAMEWORK
The motivation of AutoPreview is to make autopilots transparent and understandable to new or active users with no domain knowledge. We aim to provide an easy-to-use and safe tool for these consumers to understand, evaluate, and compare autopilot models before use. Our framework design was guided by the following three design considerations:

(1) Safety: While drivers are learning autopilot behaviors, we should not put them in dangerous situations that may arise as a result of inexperience with the new autopilot system.
(2) Convenience: We avoid solutions that require humans to spend extra time reading or learning, a downside of offline introspection as discussed in Section 2.
(3) Realism: We prefer solutions that enable users to learn autopilot behavior through experiences that are as real as possible. Past work in social psychology has provided strong evidence that realistic experiences yield clearer and more accurate attitudes than those developed through unrealistic experiences [9, 21].

Figure 1: AutoPreview framework. The shaded area represents the framework logic that happens inside the end-user's vehicle; the rest of the logic happens at the self-driving company. The target autopilot, π_target, refers to the model whose behavior people are interested in. For example, the target autopilot can be a newly released but unfamiliar autopilot that end-users are not sure whether they should deploy. The delegate autopilot, π_delegate, is a delegate model whose behavior matches that of π_target. The function of π_delegate is to inform human drivers about the potential actions of π_target based on the current driving state (e.g., "π_target would force the driver to take over control now if it were deployed"). However, the proposed actions of π_delegate will not be executed to control the vehicle.

Our AutoPreview framework achieves the above three design criteria by previewing the behaviors of the target autopilot, π_target, indirectly through a delegate autopilot, π_delegate (Figure 1). The delegate autopilot is generated by imitation learning algorithms and can output control actions that match the behavior of π_target. To clarify, we define a policy, π, as a function that outputs an action based on an observation. In order to satisfy the safety criterion, a human driver must maintain full control of the vehicle during use of our framework; therefore, the control action produced by π_delegate will not actually be executed. The delegate autopilot solely informs human drivers about the potential actions of π_target based on the current driving state. Under the AutoPreview framework, drivers can manually control their vehicle to actively learn from interesting scenarios and can evaluate the target autopilot's actions under those conditions [10]. We describe the details of our framework in the next subsection.
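The delegate-policy idea can be sketched in code as follows. This is our illustrative reading of the framework, not the authors' implementation: the Observation fields, the heuristic labeling rule used to build an imitation dataset, and the DelegatePolicy class are invented for exposition, and a real delegate would be a trained imitation-learning model rather than a hand-written rule.

```python
# Sketch of the AutoPreview delegate-policy idea (names are illustrative).
from dataclasses import dataclass
from typing import Callable

# High-level action space that humans use to explain driving behavior.
HIGH_LEVEL_ACTIONS = ["follow_lane", "change_lane_left",
                      "change_lane_right", "brake", "take_over"]

@dataclass
class Observation:
    lane_id: int      # current lane index
    speed: float      # m/s
    gap_ahead: float  # distance to the leading vehicle, meters

def label_high_level(prev: Observation, nxt: Observation) -> str:
    """Heuristically map a low-level transition of the target autopilot to a
    high-level action label, to build a dataset for imitation learning."""
    if nxt.lane_id < prev.lane_id:
        return "change_lane_left"
    if nxt.lane_id > prev.lane_id:
        return "change_lane_right"
    if nxt.speed < prev.speed - 1.0:
        return "brake"
    return "follow_lane"

class DelegatePolicy:
    """Imitates the target autopilot but outputs explainable high-level
    actions; its output is *never* executed on the vehicle."""
    def __init__(self, model: Callable[[Observation], str]):
        self._model = model  # stands in for a trained imitation model

    def preview(self, o_t: Observation) -> str:
        a_ext = self._model(o_t)  # high-level action a_ext
        return f"The target autopilot would {a_ext.replace('_', ' ')} now if it were deployed."
```

In use, the framework would query `preview(o_t)` at every timestep while the human drives, and pass the result to the explanation generation module rather than to the vehicle controls.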
3.1 Framework Details
Our goal is to enable potential or existing users to preview an autopilot model before purchase or deployment. The initial step in the use of this framework would be to have the desired self-driving car company generate a delegate autopilot model, π_delegate, which imitates the autopilot behaviors of π_target. This generated π_delegate is then delivered to users who are interested in learning about the autopilot behavior of π_target.

We start by elaborating on the model training process within the self-driving car company. To achieve the previewing objective while satisfying the previously defined realism criterion, the self-driving car company must send a version of the target autopilot model, π_target, to users so that they can explore the autopilot functionality with online, real-world observations, o_t. Note that we consider π_target a black-box model and make no assumptions about its internal structure. Under this assumption, since the outputs of π_target are low-level actions (e.g., pedal, brake, steering angle), end users cannot directly map those actions to high-level behaviors (e.g., overtake, change lanes). Our proposed solution is to use imitation learning methods [16, 17, 27] to generate a delegate model, π_delegate, which matches the behavior of π_target but remaps the low-level action outputs to the high-level action space that humans use to explain and comprehend driving scenarios.

Once downloaded by the user, the delegate autopilot, π_delegate, can then output high-level actions, a_ext, to users based on the current observation, o_t. Note that a_ext is the action the target autopilot would take if deployed. As discussed previously, a_ext will not be executed to control the vehicle. Instead, a_ext is fed into the explanation generation module (Figure 1), which is responsible for preparing visual or verbal explanation outputs for the user and for deciding when to trigger these explanations to avoid counterproductive effects [11, 26].

3.2 Use Cases
To demonstrate its practicality, we present two potential use cases of our AutoPreview framework: one for existing users, another for potential consumers. (Note that we assume the delegate accurately captures the target autopilot given sufficient data and computational power; modeling errors and incorrect abstractions will be explored in future work. We also see potential utility of our framework for companies developing self-driving technology, e.g., crowd-sourcing early feedback for autopilots in beta release, but leave exploration of this application to future work.)

As discussed in Section 1, existing users of autonomous vehicles need convenient tools to preview autopilot behavior in order to decide if they should deploy a newly released autopilot model. In New Release Preview, seen in Figure 2, we illustrate how an existing user could use the framework to safely preview autopilot behaviors when a new software release is available. Directly deploying the newly released autopilot is risky since end-users are unsure about its safety and behaviors. After downloading the delegate autopilot, drivers can manually control their vehicles to actively explore the scenarios they are interested in and evaluate, in those scenarios, the newly released target autopilot's actions based on the output from the delegate autopilot. This previewing feature can enable users to make a deployment decision based on their first-hand experience through our framework.

Figure 2: Two potential use cases of our AutoPreview framework. The first panel (New Release Preview) illustrates the previewing feature during an existing self-driving car owner's trial of a new software release involving autopilot changes; while the human drives, the delegate autopilot provided by the self-driving company announces, e.g., "Next version would start to slow down now," "Next version would switch lanes now," or "Next version would force the driver to take over control now." The second panel (Autopilot Comparison) describes the experience a potential buyer of an autonomous vehicle would undergo when using the framework to compare autopilot models in the same driving scenario across brands, e.g., "BrandA would switch lanes now," "BrandB would brake now," "BrandC would consider braking now." BrandA, BrandB, and BrandC represent autopilot models from different companies.
As a second use case, we explore potential consumers' need to evaluate autopilot performance from different companies prior to making a purchase. Comparing autopilot behaviors across self-driving car companies is a challenging task. Some third-party benchmarking providers have evaluated self-driving cars from different brands based on customized metrics under several test scenarios, but this approach is hard to scale in terms of scenario coverage and car brands. Our delegate autopilot design can compare autopilot behaviors in the same real-world scenarios across different car brands (Autopilot Comparison in Figure 2). The delegate autopilot has flexible hardware requirements, since it does not need to be trained with the same sensor inputs as the target autopilot [22]. In other words, it is possible to run π_delegate on different sensor configurations, including those of non-autonomous vehicles, as long as some sensors (e.g., a camera) are equipped. In Autopilot Comparison in Figure 2, under the same accident scenario, it is easy to tell that autopilot BrandA performs better than the other two brands, since its action is the earliest and the most efficient.
4 PILOT STUDY
The proposed AutoPreview framework enables potential consumers or existing users to preview the behaviors of a target autopilot by observing the actions of a delegate autopilot that shares the same abstracted behaviors as the target autopilot. The goal of this experiment is to investigate what degree of autopilot behavior understanding our AutoPreview approach can establish. We conducted a between-subject control experiment online with 10 participants. Our participants were aged between 18 and 30 and agreed to join our study voluntarily. The study took between 30 and 45 minutes per participant. During the study, we assisted our participants online.
4.1 Prototype
We built a prototype of our AutoPreview framework in a customized CARLA simulation environment [8]. We modified a Model Predictive Control (MPC) agent provided by CARLA as our target autopilot, to control the autopilot behavior and its driving style. The modified autopilot can only perform lane-changing and lane-following operations. Moreover, we explicitly defined the trigger condition of the lane-changing operation. Since we had full access to the target autopilot, the delegate policy's behavior was also defined by explicitly engineered rules.

Our driving interface was configured to have a first-person viewing perspective together with a rear camera view, as shown in Figure 3. The actions of the delegate autopilot are presented through an action table. Inspired by prior work on mental model building via critical states [14], we considered lane-changing behaviors as critical actions, and only informed participants in real time about the left or right lane-changing operations of our delegate autopilot, through the arrow icons in the action table. The default maps provided by CARLA contain complicated driving scenarios (e.g., traffic crossings, roundabouts, and pedestrian crossings), which can be tricky for our participants to observe in a short experiment trial. To reduce the complexity of the driving environment and constrain the set of driving scenarios only to those that include lane-changing or lane-following, we generated a two-lane single-loop map using RoadRunner.

Figure 3: Our framework prototype. The left screenshot shows the driver-perspective video interface (with action table, rear camera view, and driving mode indicator) used in our control experiment. The action table differs between experiment groups. For the treatment group version (upper right), the action column describes the potential actions based on the delegate autopilot's output; for example, the correct way to interpret the example treatment action table is that the delegate autopilot suggests the corresponding target autopilot would switch to the left lane if it were deployed to control the vehicle. For the comparison group (lower right), the arrow icons in the action column reflect the lane-changing operations of the current vehicle, which is directly controlled by the target autopilot.
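Since the prototype's lane-change trigger was explicitly engineered, it can be pictured as a simple rule on the current driving state. The paper does not give the actual rule, so the time-headway formulation and all thresholds below are invented for illustration; the `aggressiveness` knob mirrors the paper's idea of controlling driving style through the trigger condition.

```python
# Plausible shape of an explicitly engineered lane-change trigger
# (thresholds and the time-headway rule are assumptions, not the paper's rule).
def should_change_lane(gap_ahead_m: float, own_speed_mps: float,
                       adjacent_lane_clear: bool,
                       aggressiveness: float = 0.5) -> bool:
    """Trigger a lane change when the time headway to the leading vehicle
    drops below a threshold; a more aggressive autopilot tolerates a
    smaller headway before overtaking."""
    if not adjacent_lane_clear or own_speed_mps <= 0:
        return False
    time_headway = gap_ahead_m / own_speed_mps          # seconds
    threshold = 3.0 - 2.0 * aggressiveness              # 1.0 s (aggressive) .. 3.0 s (cautious)
    return time_headway < threshold
```

Raising `aggressiveness` makes the trigger fire later (at smaller headways), which is one way the experimenters' "aggressive lane-changing behavior" condition could be realized.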
4.2 Experiment Design
Our experiment hypothesis is that the AutoPreview method can help end-users understand the target autopilot's behavior at least as accurately as observing the target autopilot's behavior directly. We measured the degree of autopilot behavior understanding in terms of the aggressiveness level on a 10-point Likert scale. We also quantified the degree of understanding in terms of the absolute timing error between the ground-truth and user-predicted lane-changing timesteps. Specifically, we asked participants to specify the time instance during which the target autopilot would be most likely to switch lanes, along with their level of confidence in this prediction, in eight different five-second test scenarios. We split our participants into two groups, with five participants in each group. (Although we did successfully train an imitation learning agent to replicate the MPC behavior as previously described, we did not include the trained agent in our preliminary study, since the effect of training error was difficult to control in this initial study.)

In the comparison group, we asked our participants to observe the behaviors of the target autopilot, which directly controlled the vehicle. In the treatment group, our participants were told to infer the target autopilot's behavior indirectly through the delegate autopilot. To simulate the autopilot experience, we prepared a three-minute first-person test-drive video for each experiment group (Figure 3). Participants in both groups were instructed to imagine themselves as passengers in the car in the video. For the comparison group, the car was directly controlled by the target autopilot, and participants were told that the car was in autopilot mode. For the treatment group, participants were informed that the car was in manual mode, controlled by a researcher, and the actions of the delegate autopilot were presented in the action table. In this preliminary study, we did not enable participants to actively explore the scenarios by interactively controlling the vehicle, in order to reduce experiment noise caused by the participants' exploration strategies. We also ensured consistency in manual-mode behavior across the different treatment group videos by using another autopilot to control the vehicle so as to replicate a manually controlled vehicle. Additionally, to reduce the influence of sample bias on our results, we randomly initialized traffic scenarios for each recording, such that every video was different. Finally, we explicitly set the target autopilot to have an aggressive lane-changing behavior by controlling the lane-switching trigger conditions, to ensure a reasonable effect size for our experiment.

The experiment procedure involved three stages: tutorial, virtual test-drive, and post-experiment questions. During the tutorial stage, participants learned about the video interface and their task, and signed the experiment consent form. During the virtual test-drive, the participants imagined themselves as passengers of the car in the video and watched the video without pausing or replaying. While the video was playing, the participants were tasked with figuring out the lane-changing behavior of the target autopilot based solely on the video content. The post-experiment session then involved an evaluation of the participants' understanding of the target autopilot's lane-changing behavior.
Figure 4: Quantitative results on timing prediction error, reporting the mean (M) and standard deviation (SD) for the treatment and comparison groups, Hedges's g_s, the test method, direction, test statistics, and p-value for each measure. Unweighted Timing Error: Mann-Whitney U test, U(5,5) = 21, p < 0.05. Weighted Timing Error: two-sample t-test, t(8) = 2.37, p < 0.05. Participant-report Confidence refers to how confident participants were that their predicted lane-changing timing was within 0.5 seconds of the ground truth. Unweighted Timing Error is the average L1 error (in seconds) between the ground-truth and participant-predicted lane-changing timing across eight different test scenarios. Weighted Timing Error is calculated by taking a weighted average of the timing error based on Participant-report Confidence. We calculated Hedges's g_s based on [19].
Figure 5: Visualization of participants' overall judgment of the target autopilot's behaviors for both groups after they learned the target autopilot's behaviors, based on two questions: "Would you deploy the target autopilot to your self-driving car?" and "Rate the driving style of the target autopilot (1 for the most aggressive, 10 for the most cautious)." The percent bar plot suggests that participants from both experiment groups agree that the target autopilot has an aggressive driving style.
4.3 Results
We compared the participants' responses from five perspectives: (1) overall autopilot driving style, (2) deployment preference, (3) average action timing error, (4) average prediction confidence, and (5) framework usability. As shown in Figure 5, participants in both groups believed the target autopilot had an aggressive driving style, with 3 as the majority rating. As for deployment preference, while 40% of the participants in the comparison group preferred to deploy, all participants in the treatment group decided not to deploy the target autopilot.

The error and associated confidence of lane-change timing (the absolute difference between ground truth and user label) are shown in Figure 4. We used the Mann-Whitney U test for the Unweighted Timing Error since it does not pass the normality check. Both the weighted and unweighted errors show a statistically significant difference. Thus, we concluded that the AutoPreview method can potentially help potential consumers or target users predict the target autopilot's actions more accurately than the baseline. Overall, we observed a large Hedges's g_s. Our explanation is that AutoPreview enables users to learn from driving states that are rare in the real world but nonetheless insightful in helping users understand the autopilot's behavior. We believe this advantage can be further leveraged if we enable participants to actively control the vehicle and explore driving scenarios. Finally, regarding the usability of our framework, in the treatment group, two participants said the delegate autopilot was very easy to use, one said it was easy to use, and one said it was neither easy nor difficult to use in a five-level multiple-choice question, leading us to conclude that the framework is, in fact, a viable and convenient solution for the previewing task.
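The metrics behind Figure 4 follow standard formulas, sketched below: the unweighted error is a plain mean of per-scenario L1 timing errors, the weighted error averages those errors weighted by participant-reported confidence, and Hedges's g_s is Cohen's d_s with a small-sample correction, per Lakens [19]. The data values in the test below are invented for illustration, not the study's data.

```python
# Sketch of the pilot study's error metrics and effect size (standard formulas).
from math import sqrt

def unweighted_timing_error(truth, predicted):
    """Mean L1 error (seconds) between ground-truth and predicted
    lane-changing timesteps across test scenarios."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)

def weighted_timing_error(truth, predicted, confidence):
    """Confidence-weighted average of the per-scenario timing errors."""
    num = sum(c * abs(t - p) for t, p, c in zip(truth, predicted, confidence))
    return num / sum(confidence)

def hedges_gs(group_a, group_b):
    """Hedges's g_s: Cohen's d_s times the small-sample correction factor."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    s_pooled = sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    correction = 1 - 3 / (4 * (na + nb) - 9)
    return (ma - mb) / s_pooled * correction
```

The group comparisons themselves would use, e.g., `scipy.stats.mannwhitneyu` for the non-normal unweighted errors and `scipy.stats.ttest_ind` for the weighted errors.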
5 DISCUSSION AND FUTURE WORK
Our preliminary findings suggest that AutoPreview can help users intuitively understand autopilot behavior in terms of overall driving style understanding, deployment preference, and exact action timing prediction. From our experimental results, we noticed that participants in the treatment group showed less confidence in their timing predictions and a more conservative attitude towards deploying the target autopilot model, suggesting that the action table alone is not enough to instill participant confidence in the target autopilot. We consider this a limitation of our framework, and we attribute the discrepancy in deployment preference between the comparison and treatment groups, as discussed in Section 4.3, to it.

There are several limitations to this framework. First, the delegate autopilot can potentially report actions in states that the target autopilot is unlikely to visit, since the delegate autopilot bases action notifications purely on current observations without considering state visitation frequency. Additionally, although the AutoPreview framework can protect drivers from the danger of unexpected autopilot behavior during exploration, the notification mechanism we employed might add extra mental load for the driver and can potentially increase the risk of accidents. Furthermore, our prototype can only report information at an action's triggering moment; subtle behaviors (e.g., how soft the braking would be) still require further research. As for our experiment, the small sample size as well as the use of video recordings ultimately made for a sub-optimal experiment design. We believe a larger sample size as well as the use of driving simulators or a more interactive tool could yield more conclusive results than those reported in our experiment.

For future work, we hope to explore whether active learning can improve learning quality. More concretely, we plan to research the improvement in user understanding of autopilot behavior when users are given the ability to control the car and actively create the test scenarios in which they hope to learn the autopilot's actions. Furthermore, we hope to explore verbal, textual, or augmented-reality-based notification mechanisms in the future.
6 CONCLUSION
In this paper, we propose the AutoPreview framework, which abstracts autopilot policies into explainable policies for viewing and exploring online. The main contribution of our work is highlighting a novel design space that involves using the preview stage to build or calibrate human drivers' mental models of the target autopilot. Our preliminary findings suggest that the AutoPreview method is easy to use and can help users understand autopilot behavior in terms of overall driving style understanding, deployment preference, and exact action timing prediction.
REFERENCES
[1] Dan Amir and Ofra Amir. 2018. HIGHLIGHTS: Summarizing Agent Behavior to People. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (Stockholm, Sweden) (AAMAS ’18). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 1168–1176.
[2] Serena Booth, Yilun Zhou, Ankit Shah, and Julie Shah. 2021. Bayes-TrEx: A Bayesian Sampling Approach to Model Transparency by Example. In Proceedings of the AAAI Conference on Artificial Intelligence (accepted for publication). AAAI, virtual.
[3] Shadan Sadeghian Borojeni, Frank Flemisch, Marcel Baltzer, and Susanne Boll. 2018. Automotive UI for Controllability and Safe Transitions of Control. In Adjunct Proceedings of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (Toronto, ON, Canada) (AutomotiveUI ’18). Association for Computing Machinery, New York, NY, USA, 23–29. https://doi.org/10.1145/3239092.3239559
[4] Barry Brown and Eric Laurier. 2017. The Trouble with Autopilots: Assisted and Autonomous Driving on the Social Road. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 416–429. https://doi.org/10.1145/3025453.3025462
[5] Maya Cakmak and Manuel Lopes. 2012. Algorithmic and Human Teaching of Sequential Decision Tasks. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (Toronto, Ontario, Canada) (AAAI ’12). AAAI Press, Toronto, 1536–1542.
[6] Jong Kyu Choi and Yong Gu Ji. 2015. Investigating the Importance of Trust on Adopting an Autonomous Vehicle. International Journal of Human-Computer Interaction 31, 10 (2015), 692–702.
[7] Jonathan Dodge, Sean Penney, Claudia Hilderbrand, Andrew Anderson, and Margaret Burnett. 2018. How the Experts Do It: Assessing and Explaining Agent Behaviors in Real-Time Strategy Games. Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3173574.3174136
[8] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. 2017. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 78), Sergey Levine, Vincent Vanhoucke, and Ken Goldberg (Eds.). PMLR, Mountain View, 1–16. http://proceedings.mlr.press/v78/dosovitskiy17a.html
[9] Russell H. Fazio and Mark P. Zanna. 1981. Direct Experience and Attitude-Behavior Consistency. 161–202.
[10] Richard M. Felder and Rebecca Brent. 2009. Active Learning: An Introduction. ASQ Higher Education Brief 2, 4 (2009), 1–5.
[11] Ernestine Fu, Mishel Johns, David A. B. Hyde, Srinath Sibi, Martin Fischer, and David Sirkin. 2020. Is Too Much System Caution Counterproductive? Effects of Varying Sensitivity and Automation Levels in Vehicle Collision Avoidance Systems. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376300
[12] Sebastian Hergeth, Lutz Lorenz, Roman Vilimek, and Josef F. Krems. 2016. Keep Your Scanners Peeled: Gaze Behavior as a Measure of Automation Trust During Highly Automated Driving. Human Factors 58, 3 (2016), 509–519. https://doi.org/10.1177/0018720815625744
[13] Wan-Lin Hu, Kumar Akash, Neera Jain, and Tahira Reid. 2016. Real-Time Sensing of Trust in Human-Machine Interactions. IFAC-PapersOnLine 49, 32 (2016), 48–53. https://doi.org/10.1016/j.ifacol.2016.12.188
[14] Sandy H. Huang, Kush Bhatia, Pieter Abbeel, and Anca D. Dragan. 2018. Establishing Appropriate Trust via Critical States. IEEE, Madrid, 3929–3936.
[15] Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, and Zeynep Akata. 2018. Textual Explanations for Self-Driving Vehicles. In Proceedings of the European Conference on Computer Vision (ECCV). Springer Science+Business Media, Munich, 14.
[16] Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, and Peter Battaglia. 2019. CompILE: Compositional Imitation Learning and Execution. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, Long Beach, 3418–3428. http://proceedings.mlr.press/v97/kipf19a.html
[17] Ashish Kumar, Saurabh Gupta, and Jitendra Malik. 2020. Learning Navigation Subroutines from Egocentric Videos. In Proceedings of the Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 100), Leslie Pack Kaelbling, Danica Kragic, and Komei Sugiura (Eds.). PMLR, Osaka, 617–626. http://proceedings.mlr.press/v100/kumar20a.html
[18] Moritz Körber, Lorenz Prasch, and Klaus Bengler. 2018. Why Do I Have to Drive Now? Post Hoc Explanations of Takeover Requests.
HumanFactors
60, 3 (2018), 305–323. https://doi.org/10.1177/0018720817747730arXiv:https://doi.org/10.1177/0018720817747730 PMID: 29283269.[19] Daniël Lakens. 2013. Calculating and reporting effect sizes to facilitate cumulativescience: a practical primer for t-tests and ANOVAs.
Frontiers in psychology
Human Factors
62, 2 (2020), 260–277. https://doi.org/10.1177/0018720819872672 arXiv:https://doi.org/10.1177/0018720819872672 PMID:31502885.[21] James M Olson and Gregory R Maio. 2003. Attitudes in social behavior. , 299–325 pages.[22] Yunpeng Pan, Ching-An Cheng, Kamil Saigol, Keuntaek Lee, Xinyan Yan, Evan-gelos Theodorou, and Byron Boots. 2018. Agile Autonomous Driving using End-to-End Deep Imitation Learning. In
Proceedings of Robotics: Science and Systems .RSS, Pittsburgh, Pennsylvania, 13. https://doi.org/10.15607/RSS.2018.XIV.056[23] A. Rotsidis, A. Theodorou, J. J. Bryson, and R. H. Wortham. 2019. Improving RobotTransparency: An Investigation With Mobile Augmented Reality. In
Artificial Intelligence
288 (2020), 103367.[26] Yuan Shen, Shanduojiao Jiang, Yanlin Chen, Eileen Yang, Xilun Jin, Yuliang Fan,and Katie Driggs Campbell. 2020. To Explain or Not to Explain: A Study on theNecessity of Explanations for Autonomous Vehicles.[27] Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, and IngmarPosner. 2018. Taco: Learning task decomposition via temporal alignment forcontrol. In
International Conference on Machine Learning . ACM, Vienna, 4654–4663.[28] Hanneke Hooft van Huysduynen, Jacques Terken, and Berry Eggen. 2018. WhyDisable the Autopilot?. In
Proceedings of the 10th International Conference onAutomotive User Interfaces and Interactive Vehicular Applications (Toronto, ON,Canada) (AutomotiveUI ’18) . Association for Computing Machinery, New York,NY, USA, 247–257. https://doi.org/10.1145/3239060.3239063[29] Tingru Zhang, Da Tao, Xingda Qu, Xiaoyan Zhang, Rui Lin, and Wei Zhang. 2019.The roles of initial trust and perceived risk in public’s acceptance of automatedvehicles.