Building Trust in Autonomous Vehicles: Role of Virtual Reality Driving Simulators in HMI Design
Lia Morra, Senior Member, IEEE, Fabrizio Lamberti, Senior Member, IEEE, F. Gabriele Pratticò, Salvatore La Rosa, Paolo Montuschi, Fellow, IEEE

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. XX, NO. XX, XXXX XXXX
Abstract—The investigation of factors contributing to making humans trust Autonomous Vehicles (AVs) will play a fundamental role in the adoption of such technology. The user's ability to form a mental model of the AV, which is crucial to establish trust, depends on effective user-vehicle communication; thus, the importance of Human-Machine Interaction (HMI) is poised to increase. In this work, we propose a methodology to validate the user experience in AVs based on continuous, objective information gathered from physiological signals, while the user is immersed in a Virtual Reality-based driving simulation. We applied this methodology to the design of a head-up display interface delivering visual cues about the vehicle's sensory and planning systems. Through this approach, we obtained qualitative and quantitative evidence that a complete picture of the vehicle's surroundings, despite the higher cognitive load, is conducive to a less stressful experience. Moreover, after having been exposed to a more informative interface, users involved in the study were also more willing to test a real AV. The proposed methodology could be extended by adjusting the simulation environment, the HMI and/or the vehicle's Artificial Intelligence modules to dig into other aspects of the user experience.
Index Terms—autonomous vehicles, human-machine interaction, driving simulator, user experience, virtual reality.
I. INTRODUCTION

Most research efforts in the context of intelligent vehicles (IVs) have been directed to improving the safety and effectiveness of vehicle control (autonomy) and vehicle-to-vehicle coordination (connected vehicles) [1]. To fully reap the benefits of autonomous driving (AD) systems, humans, both drivers/passengers and pedestrians alike, will need to trust their safety and reliability. Hence, there is an emerging need to support effective and reassuring communication between humans and IVs. Passengers need to feel confident, at all times, that they have sufficient information about the state of the vehicle, its environment and perceptions as well as its planned and current behavior; even more, that they possess all the appropriate information and means to take over all the aspects regarding the operation of the vehicle in due time, when needed, in a safe and appropriate manner. Despite playing a crucial role in the uptake of any system based on autonomous agents, including autonomous vehicles (AVs), trust between humans and machines is generally hard
to establish. According to a 2017 survey by the Pew Research Center on "Automation in everyday life", over half (56%) of the Americans who were interviewed said they would not want to ride in a driverless vehicle if given the opportunity [2]. However, preliminary experiments in the literature on partially autonomous driving scenarios show that these negative emotions can be reduced by adopting Human-Machine Interaction (HMI) designs that provide feedback about how the car is acting (what automated activity it is undertaking) and the reasons why the car is acting that way [3].

The role of HMI in IVs is thus profound and, for this reason, user experience (UX) should be taken into large account at any stage of the development process. By establishing a collaborative relationship between drivers/passengers and vehicles, HMI can positively affect the acceptance as well as the technological advancement of AD solutions.

Unfortunately, the application of consolidated approaches for UX design and evaluation to AD systems is not straightforward. For instance, focusing on the quantitative assessment of a particular user interface design, techniques that measure the driver's performance in specific driving tasks could not be easily reused when, due to the specific level of automation, there are no more drivers but passengers.

Copyright (c) 2019 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. The authors are with the GRAINS – GRAphics And INtelligent Systems group at the Dipartimento di Automatica e Informatica of Politecnico di Torino, 10129 Torino, Italy. e-mail: (see http://grains.polito.it/people.php). Manuscript received XXXX XX, XXXX; revised XXXX XX, XXXX.
Similarly, post-experience questionnaires (alone) could be no more appropriate when the feedback to be collected concerns the huge amount of aspects that may contribute to the perceived level of trust. Even the driving simulators that are used today for developing vehicles' intelligent behaviors may not be directly applied to UX studies, as the focus would have to be shifted, e.g., onto the vehicle's interior and the interaction with it, rather than onto the fidelity of external factors affecting its decisions (traffic, presence of pedestrians, etc.).

Moving from the above considerations, in this paper we present a methodology that is meant to support the study of HMI with IVs, and we show its helpfulness in the evaluation of the passengers' level of trust by considering the design of a possible interface for AD systems. The devised methodology relies on a simulation platform based on immersive Virtual Reality (VR), which was developed by grounding on an existing driving simulator. Although, in principle, the technology is applicable to many scenarios, from unassisted to fully autonomous systems, we focused on L4 and L5 automation levels, as they represent the configurations for which characterizing the passenger's experience from the point of view of comfort and trust is more challenging. We therefore created a virtual AD system that allows users to experience a simulated ride in a virtual urban environment, facing a number of different situations.

For the assessment of the UX, we consider both cognitive and affective factors, by integrating feedback based on subjective post-experience questionnaires with continuous, objective information gathered from physiological signals. In particular, in this paper we focused on stress level measurements to investigate the perceived degree of safety and "connection" with the vehicle.
Notwithstanding, the proposed methodology has been designed in a way to support later extensions for the detection of other emotional states. It is worth observing that, thanks to its immersive nature, VR allows measuring such states much more realistically than other, traditional simulation scenarios [4].

With the aim to evaluate the suitability of the proposed approach, the methodology was applied to the design of a head-up display (HUD)-based interface for AVs that provides visual cues about the vehicle's sensory and planning systems. As said, providing information about how and why the car is acting is crucial to elicit trust in AVs, but little experimental evidence is available to determine how such information is best presented to the passengers [5], [6]. By applying our approach to the above scenario, we obtained qualitative and quantitative evidence that a complete picture of the vehicle's surroundings, despite the higher cognitive load, is conducive to a less stressful experience. Moreover, after having been exposed to an interface delivering a higher information content, users involved in the study were also more willing to test a real AD system.

Besides offering interesting insights that may drive future HMI designs, the results confirm the effectiveness of the proposed methodology in digging into a use case that well represents possible facets of the UX which could be investigated through the experimented techniques.

II. BACKGROUND AND RELATED WORK
A. HMI in Partially and Fully Automated Vehicles
Establishing trust is important in order for users to accept, and even rely on, automated systems. McKnight & Chervany [7] identified three constructs necessary to increase trust: ability, benevolence, and integrity. When the trustee is an autonomous system, these factors translate into the system's performance and skillful execution, into the sharing of a common purpose with the user, and into the implementation of a reliable and consistent process. Trust is thus established through direct observation of the system's behavior and its underlying mechanisms. Lee & See observed that "Trust that is based on an understanding of the motives of the agent will be less fragile than trust that is based on the principle of reliability of the agent" [8]. In the context of AD systems, HMI plays a fundamental role in this respect, by providing information about the vehicle's performance. In fact, partially automated vehicles on the market allow the driver to monitor the status of the car's components. User interfaces are designed to increase the perceived ability of the system and to support predictability, thus inducing trust.

In recent years, a study by Ekman and colleagues provided a systematic review of HMI design principles that promote trust in AD systems [9]. The authors distinguish a learning phase, which starts with the first interaction and lasts until the user is familiar with the AD system, from a performance phase, which takes into account a long-term use perspective. During a testing simulation, it can be argued that the learning phase is most important, although its specific duration differs on an individual basis.
In the performance phase, trust is mainly based on the performance and dependability of the system, and is fairly stable unless an error or unexpected event occurs; in the learning phase, it is the user's ability to form a mental model of the AD system that is crucial to form a trust bond. Hence, in this work we focused our attention specifically on the four factors that, according to [9], are most relevant for the learning phase: the mental model, i.e., the ability to form an approximate representation of the AD system's skills and functions; the system's proneness to be perceived as an expert/reputable agent; the possibility to provide continuous feedback to the user, ideally addressing two or more senses; finally, the provision of how and why information regarding upcoming actions. In this context, a "how" message describes how the system solves a given task, whereas a "why" message pertains to the motivations that lead to the task itself.

A limited number of experimental studies have, so far, established that providing information to the driver/user usually increases driving performance and acceptability in partially [3], [10], [11] and fully automated driving systems [6]. For instance, Verberne et al. [10] found that Adaptive Cruise Control (ACC) systems that share the drivers' objectives, like the adoption of a relaxed and safe driving style without sudden braking and accelerations, while at the same time providing information to the user, are considered more reliable and acceptable. Koo et al. [3] explored the effect of providing "how" and "why" information in the context of an auto-braking system. Providing both information types resulted in the safest driving behavior, at the expense, however, of a high cognitive load and decreased acceptability. Drivers preferred receiving only "why" information, whereas the "how" information was often perceived as redundant.
The interfaces considered in these studies were very simple compared to the technical possibilities of current user interfaces: they consisted of brief verbal messages, with no visual cues [3], or included only information on the position of obstacles [6].

It is important to consider not only which information is provided to the users, but also how it is conveyed. The visual mode is the primary and most widely used among vehicle interfaces, and represents the most consistent communication channel. In-vehicle display devices can be grouped in three categories: head-down displays (HDDs), head-up displays (HUDs), and head-mounted displays (HMDs). HDDs offer the advantage of not blocking the users' view of the real world; the users, however, find themselves distracted from the road. HUDs make it possible to take advantage of the necessary information while keeping an eye on the external environment, but pose significant construction challenges. HMDs share the advantages of HUDs, but only a few devices are available on the market, and they suffer from some usability issues (especially for in-vehicle applications).

Studies have consistently shown that HUDs result in a better
TABLE I
INFORMATION DISPLAYED BY COMMERCIAL HUD CONCEPTS AND DEMONSTRATION VIDEOS

AR-HUD                                | Displayed information
Continental AR-HUD Concept            | Lane Departure Warning System (LDWS), Assisted Navigation, Adaptive Cruise Control (ACC)
Hyundai AR-HUD Concept                | Traffic lights, Assisted Navigation, LKA, ACC
PSA group AR-HUD concept              | Assisted Navigation, LKA, ACC, Pedestrians, Approaching obstacle warning
Daqri AR-HUD Concept                  | Assisted Navigation, Lane Keep Assistance (LKA), Lane Control, ACC, Approaching obstacle warning, Children crossing, Pedestrians
WayRay Holographic AR Display concept | Assisted Navigation
Waymo Demo video                      | Traffic signs, Cars, Pedestrians, Cyclists (bounding boxes and colored overlays with distance and speed information), Motion Prediction, Assisted Navigation
NVIDIA Drive AGX                      | Traffic signs, Cars, Traffic lights, Lanes, Pedestrians, Cyclists (bounding boxes with distance information), Motion Prediction, Lane separation lines, Route planning data

driving experience and performance than HDDs, leading to shorter reaction times [12], decreased cognitive load [13], and fewer driving errors [5], [14]; HUDs are also preferred by users against both HDDs and HMDs [5]. Augmented Reality HUDs (AR-HUDs) have been found especially effective in increasing the driver's intuitive cognition [15] and promoting a safer and more effective driving behavior, particularly in demanding driving situations [16], [17].

Given the technical difficulties in realizing AR-HUDs, current displays often come in the form of prototypes, concepts or demonstration videos. Examples in the literature often focus on specific aspects of the driving experience, such as driver assistance (DA) [5] and obstacle detection [6]. Many commercial prototypes focus on partially automated systems that extend current DA solutions, whereas Waymo and NVIDIA are more directly focused on L4 and L5 automation. The information displayed by the main commercial solutions is reported in Table I.
A tendency to adopt a common set of symbols and metaphors can be observed among vendors. For instance, information related to ACC and Lane Keep Assistance functionalities, such as the current lane, speed, and the position and speed of preceding cars, is displayed by Continental, Hyundai, PSA, and Daqri. Waymo and NVIDIA include richer information on both the path planning and the sensory capabilities of the vehicle. Through bounding boxes (i.e., parallelepipeds enclosing detected objects), colored overlays and other elements, all the factors involved in driving are highlighted. In addition, navigation information is added not only for the user's vehicle, but also for other cars, pedestrians or cyclists through motion prediction.
B. Measuring User Experience in Driving Simulators
Researchers have for a long time relied on driving simulators to cope with the difficulties and risks associated with field testing [18]. In recent years, VR simulators have elicited a lot of interest thanks to their immersive nature [5], [13], [19], [20]. Most studies investigating different aspects of driving in simulated scenarios, including HMI design [5], [21], rely on drivers' behavior and performance as a proxy for their emotional and cognitive status [3], [12], [22]. Experimental measures include standardized questionnaires as well as indicators such as driving speed, lane keeping, braking patterns, etc., for which absolute or relative validity has been generally established [22]. However, in AD systems, humans are expected to take progressively less part in driving, which makes behavioral assessment less relevant.

Physiological signals are increasingly used to measure users' affective and cognitive states in engineering in general [23]. The activity of the autonomic nervous system, which regulates affective states, can be captured non-invasively through signals such as Heart Rate (HR) and Electrocardiography (ECG), Electromyography (EMG), Respiratory Rate, and Galvanic Skin Response (GSR). In the last years, researchers have also investigated their use combined with traditional or immersive driving simulators [4], [22], [24].

In particular, the relative validity of physiological signals for traditional driving simulators is supported by several studies, albeit the available data is less abundant than for driving performance [4], [22], [25]. For instance, risk perception was found to be highly correlated with changes in GSR [22]. A comparison between on-road and simulated driving conditions established the relative validity of mean HR and mean oxygen consumption, although the HR values observed in real driving conditions were higher, probably due to the increased stress associated with driving on a real road [25].
In a pilot study, Eudave and colleagues found that the physiological response in an immersive VR environment is stronger than in a traditional driving simulator [4]. Recordings of physiological signals have also been exploited in real-life driving conditions to characterize drivers' performance and experience, from measuring stress levels to detecting drivers' drowsiness [26], [27]. Of particular interest is the study by Healey and colleagues on driving-related stress [26]. ECG, EMG and GSR were recorded while drivers followed a set route; driving sessions were videotaped and visually inspected for observable stress-induced actions, such as head turning, to be used as a reference standard. The collected signals allowed the authors to distinguish different levels of stress with high accuracy (over 97% across multiple drivers); GSR and HR metrics were most closely correlated with the drivers' stress level. Again, studies have been conducted, so far, from the point of view of an active driver, leaving the question open on whether stress-induced changes can be equally and as effectively observed in passengers.

III. PROPOSED METHODOLOGY
A. Overview
As discussed in Section II-A, trust in automated systems can be achieved from direct observation of the system's behavior, coupled with an understanding of the underlying mechanisms. To this aim, as depicted in Fig. 1, the devised methodology relies on a VR-based AV simulator. Simulation allows the user to get immersed in repeatable scenarios including a variety of both ordinary and emotionally intensive events. The user is provided with insights on the autonomous system's behavior by means of a virtual AR-HUD combined with additional audio cues. In this way, we postulate that the user can form an adequate mental model of the AD system. Assessment is performed by collecting feedback from the user in the form of subjective (questionnaire-based) ratings and objective (physiological signal-based) measurements.

Fig. 1. Proposed methodology: exemplification on the task of HMI design. A defined AV driving scenario is simulated in immersive VR. Vehicle simulation is based on the open source GENIVI platform. The simulator is integrated with a motion platform to further foster immersion. User's feedback is collected through both subjective, offline questionnaires, and objective, real-time physiological measurements reflecting cognitive and affective states.
B. Technology and Setup

The VR setup is built around an HTC Vive headset, offering a wide field of view (FOV) at a 90 Hz refresh rate. The native positional tracking leverages the IR lasers emitted by the Vive base stations (built upon Valve's Lighthouse technology) which, combined with the headset's built-in sensors, enable 6-DOF outside-in tracking of the user's head.

With the aim to foster immersion through the simulation of the motion stimuli that a driver or passenger would experience on a real vehicle, an inertial motion platform is used. The platform exploited in this work is the Atomic A3 Racing, designed by Atomic Motion Systems, which supports 2-DOF (yaw and pitch) motion simulation. To simulate the user's perceived accelerations, the so-called tilt coordination motion simulation strategy [7] was implemented. In short, this technique imitates the perceived acceleration via decomposition of the gravity acceleration vector, obtained through a coherent rotation of the platform. A motion compensation needs to be applied to the VR coordinate system (which is centered in the headset, i.e., in the user's viewpoint), based on the current platform rotation. To this purpose, a Vive Tracker was mounted on the seat and tracked together with the headset. Finally, since it has been proved that letting users see their hands in the virtual environment increases the sense of presence [28], a virtual replica of the user's hands including articulated fingers is created by tracking them using a Leap Motion Controller device attached to the headset.
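The tilt-coordination and motion-compensation steps described above can be sketched as follows. This is a minimal Python illustration, not the actual implementation (which runs inside Unity and the platform driver); the function names, the 15-degree mechanical limit, and the pitch-only compensation are assumptions made for the sake of the example.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def tilt_angle(linear_accel, max_tilt_deg=15.0):
    """Tilt coordination: map a sustained linear acceleration (m/s^2) to a
    platform tilt angle, so that the gravity component along the seat plane,
    g * sin(theta), imitates the perceived acceleration."""
    ratio = max(-1.0, min(1.0, linear_accel / G))
    theta = math.degrees(math.asin(ratio))
    # clamp to the platform's mechanical range (the limit here is illustrative)
    return max(-max_tilt_deg, min(max_tilt_deg, theta))

def compensate_headset_pitch(headset_pitch_deg, platform_pitch_deg):
    """Motion compensation: subtract the platform's rotation (as read from the
    seat-mounted tracker) so the virtual camera does not tilt with the seat."""
    return headset_pitch_deg - platform_pitch_deg
```

In practice the compensation is applied to the full 6-DOF pose, not just pitch; the sketch keeps one axis to show the principle.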
C. AV Driving Simulator
The vehicle simulator implemented is based on the open source Vehicle Simulator project by the GENIVI Alliance [29] (in the following simply referred to as GENIVI, for brevity). GENIVI was selected among several possible alternatives for multiple reasons: it was originally created to support HMI design; it allows, by design, the addition of new features; it already provides modules for intelligent traffic simulation; it includes a basic auto-drive functionality for the user's vehicle; and it provides a few driving scenes and vehicles with their own rigid body physics-based controllers.

The main activities carried out to adapt GENIVI to the purposes of this work involved: porting the available features to VR; integrating the motion platform; and implementing a custom AD controller. The latter activity was considered necessary since, in a preliminary study, the built-in controller was judged not realistic enough, especially when dealing with complex, unpredictable events (e.g., sudden pedestrian crossing, etc.).
1) VR porting:
Implementing the support for VR was facilitated by the fact that GENIVI is based on the Unity game engine, which natively allows for the creation of VR applications for the HTC Vive. Our implementation allows the user to be virtually accommodated in any seat of the virtual vehicle. The built-in vehicles, namely a Land Rover L405 and a Jaguar XJ, are designed for non-immersive simulation. Hence, a new vehicle was created with VR-based interaction in mind, i.e., by focusing on the visual fidelity of the vehicle's interior. Finally, support for the user's virtual hands was added.
2) Motion platform integration: to integrate the motion platform, an additional software module was developed. The module receives in input the acceleration values calculated at the seat's tracked point by the physics simulation engine and outputs them to the proprietary platform driver (AMS Symphinity), which remaps them to coherent tilt and pitch angles and consequently applies them to the platform. Other motion platforms may be integrated in a similar way.
3) AV simulation: within this work, our aim was to provide a methodology to study the considered domain using simulated VR-based scenarios, accompanied by suitable measurement tools, rather than to contribute to the advancement of the state of the art of AVs' control sub-systems. The basic AD functionality available in GENIVI was therefore extended to make it cope with the situations of interest. Attention was focused on reproducibility, while preserving simplicity. More sophisticated implementations, leveraging, e.g., data provided by the vehicle's virtual sensors, could nonetheless be integrated in the future.

Our implementation takes advantage of the native trajectory system, which is used in GENIVI to manage the traffic. Paths to follow are embedded in the scene description using a complex network of waypoints. The developed AD system relies on it to feed a PID-based controller, which is in charge of driving the vehicle by making it accelerate, brake, and steer. Differently than the other cars in the traffic, the AD system is affected by the full set of accurate, rigid body physics simulation variables. The PID was fine-tuned in closed loop using manual parameter adjustment targeting a maximum overshoot of 5% at step response, in order to achieve a comfortable and realistic behavior. To this aim, control command shaping and auxiliary waypoints were also used. Although different and far more sophisticated approaches could be investigated in the future (e.g., [30]), the selected control system proved to meet the simplicity-effectiveness trade-off required to cope with the issues tackled in this work. The appropriateness of the pursued approach was also confirmed by subjective observations concerning simulation quality (Section IV-E and Supplemental material).

The same approach was pursued to bind specific vehicle reactions to the pre-programmed events. Obstacle avoidance is handled by dedicated logic, which also takes into account trajectory replanning when the obstacle cannot be avoided by simply adjusting the vehicle's speed. For moving obstacles, replanning takes into account predicted motion. Further details on the implementation can be found in [31].
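The waypoint-fed PID control loop can be illustrated with a minimal sketch. This is Python rather than the simulator's Unity code, the gains and function names are hypothetical, and the actual controller also shapes commands and handles throttle and brake; only the steering principle is shown.

```python
import math

class PID:
    """Textbook PID controller; the gains are illustrative, not the tuned values."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

def steering_command(vehicle_pos, vehicle_heading, waypoint, pid, dt):
    """Heading error toward the next waypoint fed through a PID,
    as a minimal stand-in for the simulator's steering controller."""
    dx = waypoint[0] - vehicle_pos[0]
    dy = waypoint[1] - vehicle_pos[1]
    desired = math.atan2(dy, dx)
    # wrap the heading error to [-pi, pi] before feeding the PID
    error = math.atan2(math.sin(desired - vehicle_heading),
                       math.cos(desired - vehicle_heading))
    return pid.step(error, dt)
```

A pure-pursuit or model-predictive scheme (cf. [30]) would replace `steering_command` while leaving the waypoint network untouched.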
D. HUD Design
Based on the principles discussed in Section II-A, an in-vehicle user interface should continuously provide feedback addressing, whenever possible, multiple senses, highlighting "why" information that explains the vehicle's choices, and adopting a pleasant and effective communication style that presents the system as a skilled/reputable driver. These elements, while important in general, are particularly relevant in the initial learning phase, where the user is still unfamiliar with the AD system and needs to form an appropriate mental model of its inner mechanisms [9]. As will be illustrated in more detail in Section IV, subjects who participated in our study had never been exposed to a real AD system. An AR-HUD was therefore designed, as it was found in the literature to be the most effective interface under the considered conditions.

It was deemed important to ensure that the visual cues displayed by the AR-HUD are consistent with the information conveyed by commercial DA products, as users are mostly familiar with it. However, it was regarded as crucial to also provide information that illustrates the vehicle's sensory capabilities and, hence, improves the user's situational awareness. Finally, given this work's focus on L4 and L5 AD systems, information about the vehicle's planning functionalities needed to be delivered as well.

The design was based on the features reported in Table I. The HUD is capable of displaying information about all the relevant elements in the surrounding environment, including both static objects (trees, lighting poles, parked cars, traffic lights, road signs, etc.) and dynamic objects (pedestrians, animals or other cars). These elements are provided together with distance information in meters from the vehicle, absolute speed when available, and a visual warning status indicator. Lane keeping and navigation cues for the user's vehicle and other cars (assuming that they are connected) are also considered.
The color of each car is randomly assigned by GENIVI. Objects of interest are identified by means of a bounding box. This metaphor, previously validated in the literature [17], is adopted by commercial players such as Waymo and NVIDIA. In our implementation, each bounding box has a white outline and is associated with a label and an icon identifying the detected object, thus satisfying the usability principle which suggests that the adopted representation must be simple and intuitively understandable by the user [32]; the use of familiar cues, such as icons, also reduces the cognitive load in the presence of a large amount of information [33]. Bounding boxes are automatically generated in VR knowing the position, size and pose of all objects in the scene. Technically, this was implemented in Unity by associating a visible colored material with all objects having a Collider component; Colliders are not visible by default in the rendering step, because they just define the bounding volume of an object for the purposes of identifying physical collisions through the physics engine. Labels always face the vehicle and are, therefore, readable by the user.

In order to determine which situations constitute a potential danger, we relied on the definition from the ISO 15623 standard on "Forward vehicle collision warning systems" [34], counting on a previous study by Sebastian et al. [35]. A mathematical model is used to determine potential collisions based on the trajectory, speed and acceleration of the vehicle, as well as those of potential obstacles (e.g., the preceding car). Once the possibility of a collision has been established, a safety distance is calculated, which depends on the speed of the vehicle and on the reaction time of the driver, the latter estimated based on the study in [36]. The distance between the vehicle and the estimated collision point is then measured: if this distance is less than the safety distance, the passenger needs to be warned of the potential danger. We therefore defined a hazard index, ranging from 0 to 1 and calculated as the ratio between the distance from the obstacle and the warning distance defined in [35].

The objects' warning status is presented through both visual and auditory cues. In the literature, the AR-based DA system in [37] adopted an intuitive color code in which the severity of the danger posed by an obstacle detected on the road is shown by means of a color that starts from green (safety) and extends up to red (maximum state of danger). This color coding is consistent with the systems reviewed in Section II-A. Therefore, we decided to color-code the hazard index with a green-to-red gradient and use it to visually represent the warning status of the detected object by controlling the transparency and color of the bounding box. The color-code value is computed using a
perception-based equation, where the hazard index is used as an exponential factor. To signal potential dangers, the label associated with the bounding box flashes as well, so as to direct the user's attention towards the obstacle. Flashing is used in DA systems by various vendors [38], [39]. It is important to underline that, through the flashing information, the vehicle communicates to the user why it is about to perform a specific action [9]. The flashing visual cue, at a lower frequency, is also used to notify the user of a road sign or traffic light. Flashing occurs when a traffic light changes or when a new road sign is recognized. The lower frequency reduces the sense of alarm and, hence, allows the user to distinguish normal driving operations from high-risk situations.

An immediate danger is also marked by a sound alert [34], [40]. This is consistent with current DA systems, which produce audible warnings, e.g., in emergency braking conditions [39]. A more pleasant sound is played when road signs are detected, to capture the user's attention in an unalarming way.

Two variants of the HUD were designed, which in the following are referred to as omni-comprehensive (OMN) and selective (SEL).
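The hazard index and its green-to-red coding described above can be sketched as follows. This is an illustrative Python reading of the text, not the actual implementation: the exact perception-based equation and its constants are not reported here, so the gamma-style exponent below is an assumption, as are the function names.

```python
def hazard_index(distance_to_obstacle, warning_distance):
    """Hazard index in [0, 1]: the ratio between the current distance to the
    obstacle and the warning distance, clamped. A value of 1 means the obstacle
    is at or beyond the warning distance; values near 0 mean imminent danger."""
    if warning_distance <= 0:
        return 1.0
    return max(0.0, min(1.0, distance_to_obstacle / warning_distance))

def warning_color(index):
    """Green-to-red gradient driven by the hazard index. The gamma exponent is
    a stand-in for the paper's perception-based equation, not its tuned value."""
    gamma = 2.2
    danger = (1.0 - index) ** gamma  # 0 = safe (green), 1 = danger (red)
    red = int(255 * danger)
    green = int(255 * (1.0 - danger))
    return (red, green, 0)
```

The same scalar can drive the bounding box transparency, so that distant, harmless objects fade out while imminent hazards stand out.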
1) Omni-comprehensive HUD: in the OMN variant, we show information about all dynamic elements (cars and pedestrians) within a "detection" diameter, which is set to 150 meters. This threshold was firstly motivated by practical reasons: virtual objects beyond this distance would be too small to be appreciated considering the resolution of the display in the VR headset. This distance is also compatible with the equipment of current AD prototypes and the detection range of LiDAR systems. Road signs and traffic lights are always shown in the interface, except for those that regulate road sections different from the one the vehicle is currently on. Furthermore, it was decided to exclude from the display the information about static objects such as trees, parked cars, lighting poles, etc., unless they become dangerous. This exclusion is motivated by the principle of cognitive load, according to which an interface should be easily understandable by the user, simple and intuitive, avoiding excessive cluttering [32].
2) Selective HUD: in the SEL variant, only information that is deemed of specific interest to the user is displayed. The guiding principle was to select information that pertains to those elements of the environment that, at any given point, affect the behavior of the AD system. Let us consider road signs: the vehicle detects all road signs within the diameter of interest, but not all of them are necessarily useful at the time. For example, in the presence of a pedestrian crossing sign and a speed limit sign, the vehicle may decide not to show any information on the former sign, based on the fact that, at the moment, there is no pedestrian intending to cross; the latter sign may force the vehicle to slow down and, thus, it would be highlighted in the interface. More specifically, in the SEL variant only cars that precede the current vehicle or, more generally, that intersect its current trajectory, are highlighted with a bounding box. Pedestrians and other static or moving objects are identified only if and when they become dangerous, i.e., when a collision becomes possible. Navigation lines for other cars are only displayed when assessed by the vehicle (e.g., at intersections to determine priority). Traffic lights information, as well as tracing of the vehicle's navigation line and the road center line, are unchanged in this variant. A comparison of the two interfaces is provided in Fig. 2.

Fig. 2. Comparison between the OMN (a) and SEL (b) AR-HUD interfaces.
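The selection heuristics of the two variants can be summarized in code. This is a hypothetical sketch: the class, field and function names are ours, and the 150 m detection threshold is treated here as a simple range check for cars and pedestrians.

```python
from dataclasses import dataclass

DETECTION_RANGE_M = 150.0  # the "detection" threshold discussed above

@dataclass
class SceneObject:
    kind: str                       # "car", "pedestrian", "sign", "traffic_light", "static"
    distance_m: float               # distance from the ego vehicle
    intersects_path: bool = False   # precedes us or crosses our trajectory
    collision_possible: bool = False  # flagged as dangerous by the AD system
    affects_behavior: bool = False    # currently forces an AD decision (e.g., slow down)
    regulates_current_road: bool = True

def show_in_hud(obj: SceneObject, variant: str) -> bool:
    """Decide whether an element gets an overlay, for variant "OMN" or "SEL"."""
    if obj.kind in ("sign", "traffic_light"):
        if not obj.regulates_current_road:
            return False  # signs for other road sections are never shown
        # SEL shows a sign only when it affects the AD behavior;
        # traffic-light information is unchanged across variants.
        return variant == "OMN" or obj.affects_behavior or obj.kind == "traffic_light"
    if obj.kind == "static":
        return obj.collision_possible  # both variants hide static objects
    if obj.distance_m > DETECTION_RANGE_M:
        return False
    if variant == "OMN":
        return True  # all dynamic elements within range
    if obj.kind == "car":
        return obj.intersects_path or obj.collision_possible
    return obj.collision_possible  # pedestrians and other moving objects
```

For instance, `show_in_hud(SceneObject("car", 40.0), "SEL")` is `False` unless the car intersects the planned trajectory, whereas the OMN variant would display it.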
E. Simulated Scenario
In order to create a relationship of trust between a user and an AD system, the latter must show its ability in dealing with different driving scenarios [8]. Our simulated scenario is constructed to include a variety of different situations, both ordinary and challenging, that may occur in an urban setting. The urban setting is, in general, considered the most difficult to manage by AVs [41]. In fact, current L2 and L3 automated systems are mostly restricted to motorways and extra-urban routes, and the biggest challenge in the development of L4 and L5 systems is represented precisely by urban areas, where significantly more factors are at play, driving conditions are far less predictable, and the presence of pedestrians amplifies the perceived risk.

Compared to real-life driving, simulators offer the distinctive advantage of creating repeatable scenarios, where most experimental factors can be easily controlled. Therefore, it is possible to study and compare subjects' reactions to individual events, whereas in real-life driving experiences one would be mostly restricted to considering overall measures [25], [26].

The simulation was created starting from one of the scenes included in the GENIVI platform, representing a miniature version of the city of San Francisco. As said, despite the basic auto-drive functionality, GENIVI is natively meant to support mostly first-person driving experiences, with random traffic patterns, no pedestrians and no intentionally hazardous situations. By leveraging the developed AD capabilities and
the integrated waypoint system, different situations were embedded in the simulated scenario in order to showcase different AD abilities and elicit changes in the subjects' affective state. Considered abilities include: interacting with traffic and especially with other cars, e.g., maintaining safety distance, overtaking, etc.; handling road signs and traffic lights; avoiding obstacles and dealing with other potentially dangerous situations, including those where other cars or pedestrians do not behave correctly.

The subject is seated in the passenger's position (front right). The experience begins in an area with a relatively simple environment and little traffic, to allow the user to familiarize with the AD system. Then, the environment becomes populated by cars and pedestrians: the subject becomes familiar with the HUD and the way information is conveyed by the vehicle. Afterwards, riskier situations occur in which the vehicle can show its decision-making skills. In [8], it was observed that "If trust is primarily based on rules that characterize performance during normal situations, then abnormal situations might lead to the collapse of trust". This strengthens the importance of including driving situations that, while less likely, may pose significant challenges for an AD system. To simulate a typical urban context, such situations were spaced throughout the simulation and alternated with ordinary ones, as illustrated in Fig. 3. After every risky situation, the car stops for a few seconds, to ensure that the subject has enough time to understand what happened and to reflect on how the car handled that situation.
Considering the time for letting subjects get acquainted with the system, as well as the time required to achieve a suitable distribution of situations, the duration of the simulated scenario was set to 12 minutes.

Simulated events include the sudden crossing of a dog (Dog), a child on the sidewalk throwing a ball on the street (Ball), scooters and cars that split lanes while driving (Scooter, Car1 and Car2), as well as pedestrians crossing the street (Man1 and Man2). Illustrative frames are reported in Fig. 3. The Dog event corresponds to a highly hazardous situation, in which the vehicle is forced not only to slow down, but also to steer in the opposite direction to avoid a collision. The same happens in the Ball and in the Man2 events (in the latter case, a pedestrian crosses outside of a designated crosswalk while the car is at full speed). Man1 is a less risky situation, as the car is approaching a red light and is already braking when the pedestrian starts crossing. In the Scooter event, the vehicle slows down as the preceding car turns right; in the meanwhile, a scooter enters the lane from the left. The situation is not particularly dangerous, as the vehicle was already reducing its speed to deal with a traffic jam; however, from the viewpoint of vehicle-to-human communication, this interaction is complicated as it involves several vehicles. In the Car1 event, a car suddenly changes its lane when approaching road construction (which is poorly visible), forcing the vehicle to quickly reduce its speed to avoid a collision. The Car2 event is even riskier, as another car driving on the intersecting road does not stop at a red traffic light and instead passes at full speed, forcing the vehicle to brake very abruptly.

Two videos showing the simulated scenario with the OMN and the SEL interfaces are available at http://tiny.cc/p4v16y.
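A scenario of this kind can be driven by a simple trigger list attached to the route. The sketch below is ours: the trigger positions are assumptions for illustration only (the real scenario uses the GENIVI waypoint system), while the event names and order follow the text.

```python
# (name, risky, trigger position along the route in metres) -- positions are
# illustrative assumptions; the event order follows the text (cf. Fig. 3).
EVENTS = [
    ("Dog", True, 800.0), ("Ball", True, 1400.0), ("Car1", True, 2100.0),
    ("Scooter", False, 2700.0), ("Car2", True, 3400.0),
    ("Man1", False, 4000.0), ("Man2", True, 4600.0),
]
PAUSE_AFTER_RISKY_S = 4.0  # the car stops for a few seconds after risky events

def step(route_position_m: float, fired: set):
    """Fire the first pending event reached; return (event name, pause seconds)."""
    for name, risky, trigger in EVENTS:
        if name not in fired and route_position_m >= trigger:
            fired.add(name)
            return name, (PAUSE_AFTER_RISKY_S if risky else 0.0)
    return None, 0.0
```

Keeping the trigger logic in a declarative table makes it easy to re-space risky and ordinary events when tuning the 12-minute timeline.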
Fig. 3. Timeline of the test scenario with simulated events.
F. Galvanic Skin Response
As discussed in Section II-B, physiological signals related to the activity of the autonomic nervous system can provide non-invasive information about the user's affective state. While, in principle, a combination of different signals can be used, for the sake of simplicity in this work we focus on the GSR signal, which was found to effectively detect stress in both simulated and real-life driving [22], [26]. Furthermore, it is easily measured with a simple sensor placed on the fingers [23]. GSR is mostly sensitive to the dimension of arousal, going from sleepiness to excitement or stress [42]; it leaves open whether the arousal change is of a positive or negative nature (the valence dimension), which nonetheless in our specific case can be derived from the context.

The GSR can be decomposed into a slowly changing tonic component, the Skin Conductance Level (SCL), and an impulsive phasic component, the Skin Conductance Response (SCR) [42]. While the SCL reflects the overall emotional state as well as habituation to the environment, the SCR measures activation in response to a stimulus, e.g., a potentially stressful event occurring in the simulated scenario. The magnitude of the response should correlate with the perceived threat. This phenomenon was previously validated in other types of VR environments [43], with an observable effect on the GSR signal even after multiple exposures.
1) Signal processing: the SCR data was extracted using a 3rd-order Butterworth band-pass filter ranging from 0.16 Hz to 2.1 Hz [27], [42]. Normalization is required to account for the intrinsic inter-individual differences in skin conductance [23]. The most common choices are z-score standardization, in which the signal is divided by the standard deviation after subtracting the mean, and min-max normalization, in which the signal is rescaled between 0 and 1. We found that the min-max scaled signal was most useful for visualization purposes and trend analysis, whereas z-score normalization could be used for the final feature extraction, as we observed less inter-subject variability. Considering the fact that, typically, SCR peaks appear between 1 and 5 seconds from the stimulus's onset and last for about 10 seconds, we extracted the SCR waveform for a time window of ±10 s centered on each event [44], [45]. Within each window, all samples are divided by the initial signal value, to focus on relative changes.
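The preprocessing above, together with the per-window features defined next (mean, accumulated and max GSR, peak-to-peak SCR), can be sketched as follows. This is a sketch assuming the signal is available as a NumPy array; the function names are ours.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256  # sampling frequency in Hz, as in the acquisition setup

def extract_scr(gsr: np.ndarray) -> np.ndarray:
    """Phasic component via a 3rd-order Butterworth band-pass (0.16-2.1 Hz)."""
    b, a = butter(3, [0.16, 2.1], btype="bandpass", fs=FS)
    return filtfilt(b, a, gsr)  # zero-phase filtering

def zscore(x: np.ndarray) -> np.ndarray:
    """Z-score standardization: subtract the mean, divide by the std."""
    return (x - np.mean(x)) / np.std(x)

def event_window(signal: np.ndarray, event_idx: int, half_s: int = 10) -> np.ndarray:
    """±10 s window centered on an event, scaled by its initial value."""
    half = half_s * FS
    w = signal[event_idx - half:event_idx + half]
    return w / w[0]

def window_features(gsr_z: np.ndarray, scr: np.ndarray) -> dict:
    """Per-window features; the Pre/Post differences (Δ) are taken downstream."""
    return {
        "mean": float(np.mean(gsr_z)),
        "acc": float(np.sum(gsr_z)),
        "max": float(np.max(gsr_z)),
        "pp": float(np.max(scr) - np.min(scr)),
    }
```

In the analysis, each feature would be computed once on the 10 s before and once on the 10 s after the event, and the difference used as the final measure.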
2) Feature extraction: for each event time window, a set of features is extracted [44]. Let $\widehat{GSR}(k, j, i)$ be the z-score standardized data value for subject $k$, event $j$ and sample at time $i$, $SCR(k, j, i)$ the corresponding filtered signal representing the skin conductance response, and $L$ the total number of samples per time window. Features extracted include the mean GSR (Eq. 1), the accumulated GSR (Eq. 2), the max GSR (Eq. 3), and the peak-to-peak distance in SCR (Eq. 4). Each feature is calculated on the 10 s before (Pre) and the 10 s after (Post) the test event, and the difference ($\Delta$) is used as the final measure.

$$GSR_{Mean}(k, j) = \frac{\sum_{i=0}^{L} \widehat{GSR}(k, j, i)}{L} \quad (1)$$

$$GSR_{Acc}(k, j) = \sum_{i=0}^{L} \widehat{GSR}(k, j, i) \quad (2)$$

$$Max(k, j) = \max_i \big( \widehat{GSR}(k, j, i) \big) \quad (3)$$

$$PP(k, j) = \max_i \big( SCR(k, j, i) \big) - \min_i \big( SCR(k, j, i) \big) \quad (4)$$

G. Questionnaire
Subjective data about the experience can be collected through questionnaires. The questionnaire that we designed tackles factors affecting trust and, in general, HMI effectiveness [9]. Specific sections were included to test each of these factors. The questionnaire includes both general questions, which could be re-used across different driving scenarios, as well as questions that are more specific to the HMI and to the simulated scenario. We focused our attention on those aspects that are more relevant for establishing trust in an initial learning phase, where the user gets acquainted with the system. When possible, questions were borrowed from validated tools such as the Simulator Sickness Questionnaire (SSQ) [46], the Situation Awareness Rating Technique (SART) [47] and the NASA Task Load Index (NASA-TLX) [48]. Questions were organized in the following sections.
1) Health status: VR systems may induce motion sickness and other side effects; to avoid biases, health status is collected before and after the experience using the SSQ tool.
2) System competence: Inspired by standard questions for the evaluation of trust in human-robot interaction (HRI), this section evaluates the perceived system's competence across the range of driving situations explored in the simulation [49].
3) Reaction to test events: For each test event, the user is asked to rate four statements: 1) The situation was dangerous; 2) The event took me by surprise; 3) I was able to see the potential danger before it affected the vehicle's performance; and 4) The interface provided me useful information to foresee the event. These questions provide complementary information to the physiological signals and disentangle the effect of the specific event from that of the HMI.
4) Situational awareness: This section was inspired by the SART tool, focusing on the dimensions (quality, quantity and familiarity) that pertain to comprehensibility. Here, quality refers to the usefulness with respect to clarifying the system's intentions. Quality and quantity were evaluated for each element of the HMI, e.g., bounding boxes, navigation lines, etc.
5) Cognitive load: This section was adapted from the NASA-TLX evaluation tool.
6) Overall user experience: This section investigates general aspects regarding the mental model, and is concluded with a direct question about trust. Predisposition towards participating in an AD experience was also assessed before and after the simulation.
7) Immersion and presence: Immersion, presence and simulation fidelity were evaluated by adapting the relevant sections from the VRUSE questionnaire [50], an established technique to measure the usability of VR applications.

All questions were in Italian and had to be rated on a 1–5 Likert scale. Sections Reaction to test events and Situational awareness included snapshots of the test events and the HMI elements, respectively; the questionnaire was adapted for each test group with snapshots from the specific HUD version. The complete questionnaire (SEL version) is available at https://forms.gle/CpSYZc729fho7gy86.

IV. EXPERIMENTS
A. Data Acquisition
Healthy individuals (i.e., with no impairing chronic or acute illnesses at the time of the acquisition) with a valid driving license were recruited to participate in the virtual driving experiment. Participation was voluntary and no monetary compensation was provided. Study participants were randomly assigned to either the OMN or the SEL HUD group. All acquisitions were performed within one week.

The test phase began for each subject with a brief explanation of the test session. Health status, demographic information and general disposition towards AD systems were collected before starting the simulation. Two baseline signals were also acquired: one minute at rest, and one minute after placing the VR headset. After the simulation, the final questionnaire was administered and the experience debriefed.

The GSR was recorded through an ad-hoc device based on the Grove GSR Sensor [51] and a Raspberry Pi 3 board. The acquisition module was implemented in Python. An external analog-to-digital converter (MCP3008 [52]) was used to connect the output of the sensor to the board via the Raspberry Pi's Serial Peripheral Interface (SPI). The sampling frequency was set to 256 Hz in order to separate the two components of the GSR signal [44]. Due to inter-subject variability, the GSR may saturate during the analog-to-digital conversion: therefore, during the initial baseline acquisition, the converter was manually calibrated by adjusting the resistor until the output fell in the 200–512 a.u. range. The sensors were applied on the fingers of the non-dominant hand, after washing the hands. Postprocessing and feature extraction were
implemented in Python 3.6.5, using the SciPy library for filtering; all calculations were performed on an HP Pavilion laptop with an Intel Core i5-3230M CPU.
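As a sketch of the acquisition path, the MCP3008 returns its 10-bit reading in the last two bytes of a three-byte SPI transfer. The helper names below are ours, and the commented `spidev` usage is an untested assumption about the wiring.

```python
def decode_mcp3008(reply) -> int:
    """Convert the 3-byte SPI reply of an MCP3008 into a 10-bit value (0-1023).

    The request for a single-ended read is [0x01, (0x08 | channel) << 4, 0x00];
    in the reply, the two least-significant bits of byte 1 and all of byte 2
    carry the conversion result.
    """
    return ((reply[1] & 0x03) << 8) | reply[2]

def in_calibration_range(value: int) -> bool:
    """Check against the 200-512 a.u. window used for manual calibration."""
    return 200 <= value <= 512

# On the Raspberry Pi, one sample could be read roughly as follows
# (requires the spidev package and SPI enabled; not executed here):
#
#   import spidev
#   spi = spidev.SpiDev()
#   spi.open(0, 0)  # bus 0, chip-select 0
#   raw = decode_mcp3008(spi.xfer2([0x01, 0x80, 0x00]))  # channel 0
```

Sampling this at 256 Hz yields the raw trace that the filtering and normalization steps of Section III-F operate on.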
B. Statistical Analysis
A two-way factorial Analysis of Variance (ANOVA) was conducted to examine the main effect of HUD, as well as the interaction effect between event and HUD type, on each GSR feature. A mixed design was employed, with the HUD type as the between-groups factor and the event as the within-subjects factor. Post-hoc comparison between the different events and HUD types was performed applying Bonferroni correction.

Questionnaire data was analyzed separately for each group of questions. Event-related questions were analyzed using a two-way factorial ANOVA, with the same design used for the GSR features. Outcomes of the other questions were compared between the OMN and SEL groups using the Mann-Whitney U-test for categorical data. A p-value of .05 or lower was considered to indicate a statistically significant difference. Statistical analysis was performed using SPSS v20, whereas signal analysis and feature extraction were coded in Python.

C. Participants' Characteristics
Thirty-nine subjects volunteered to participate in the study. One subject with excessive motion sickness was excluded from the dataset, as symptoms would bias the physiological response [53]. A total of 38 subjects (25 male, 13 female, mean age 23.9) were included in the analysis. GSR data was not available for 8 subjects due to failures in the recording equipment. Most of the subjects reported using VR or driving simulators "never" or "rarely" (30/38 and 34/38, respectively).
D. Quantitative Measurements
The normalized GSR signals averaged over all study subjects within each group are reported in Fig. 4(a)–(b). All subjects showed an increase in baseline GSR in VR. Moreover, a noticeable peak in the GSR occurred for most events in the test scenario. Fig. 4(c)–(d) show the mean SCR curve for each event. Each curve is extracted for a time window of ±10 s centered at each event; within each window, all samples are divided by the first value to highlight changes.

Fig. 4. Normalized raw GSR signal for OMN (a) and SEL (b) interfaces; the baseline is collected prior to the experience, and red lines represent test events in the simulation. (c)–(d): average SCR curves over all subjects in the 10 s before and after each test event, for the OMN (c) and SEL (d) interfaces.

From the SCR and GSR curves, different features have been extracted, as defined in Section III-F2. We here report in detail the two-way ANOVA results for the ΔPP feature. The main effect of HUD was significant, F(1,28)=4.72, p=.039, indicating a statistically significant difference between the OMN and SEL interfaces. The main effect of event was also statistically significant, F(6,168)=13.9, p<.001. We did not find a significant interaction between HUD and event, F(6,168)=1.74, p=.115; hence, post-hoc analyses were conducted on each main effect separately.

The mean and standard error of the SCR feature for each event and for each HUD are reported in Fig. 5. At post-hoc analysis, the SEL HUD consistently showed higher emotional arousal for the Car1 (p=.022), Car2 (p=.042) and Man2 (p=.041) events. For the first two events in the timeline, a positive trend
could be observed (p=.181 and p=.409). For the Scooter and Man1 events, which elicited no emotional arousal, differences were not statistically significant (p=.759 and p=.990).

All GSR features showed a significant main effect of HUD: ΔMax, F(1,28)=8.53, p=.007; ΔGSR_Mean, F(1,28)=9.36, p=.005; and ΔGSR_Acc, F(1,28)=9.02, p=.006. Likewise, a significant main effect of event was always found (p<.001), with no significant interaction between HUD and event. At post-hoc analysis, results for the ΔMax feature were comparable to ΔPP, whereas the ΔGSR_Mean and ΔGSR_Acc features reported significant differences (p<.05) for the Ball, Car1 and Car2 events, instead of the Car1, Car2 and Man2 events.

Fig. 5. ΔPP feature, for all events, with OMN and SEL. * p-value < .05.
For each HUD and each test event, the PP values in the 10 s before (Pre) and the 10 s after (Post) the event were tested for differences using two-tailed t-tests. With the SEL HUD, all events showed a significant increase in SCR (p<.001), except for Scooter (p=.927) and Man1 (p=.920). Very similar results were obtained with the OMN HUD: the Dog (p=.014), Ball (p=.046), Car1 (p=.007), Car2 (p=.001) and Man2 (p=.003) events showed a significant effect on SCR, whereas Scooter (p=.142) and Man1 (p=.422) did not. Results for the other GSR features, here omitted for brevity, were also consistent.

E. Questionnaire Results
Only one subject was excluded from this analysis due to high motion sickness; the other subjects did not report excessive symptoms (nausea rating M=1.26, SD=.54).

Subjective ratings for the test events are reported in Fig. 6. Four statements were included for each test event, as detailed in Section III-G3; for the sake of clarity, only question 1 (which evaluates the risk) and question 3 (which evaluates the ability to detect the potential danger in advance) are included in the plots, as answers to questions 2 and 4 were very similar. At two-way ANOVA, the main effects of both HUD, F(1,36)=15.91, p<.001, and event, F(6,216)=54.05, p<.001, on the perceived risk (question 1) were statistically significant. The interaction between the two factors failed to reach statistical significance, F(6,216)=2.05, p=.060. Regarding the ability to identify dangerous situations in advance (question 3), the main effects of both HUD, F(1,36)=28.08, p<.001, and event, F(6,216)=14.78, p<.001, were statistically significant, without a significant interaction, F(6,216)=1.75.

Since the events are the same in both groups, we attribute the difference in perceived risk to the greater ability of the OMN interface to convey information about the vehicle's surroundings before critical situations occur. At post-hoc analysis, differences were statistically significant for the Car1 (p=.003), Car2 (p=.017) and Man2 (p=.008) events, and a positive trend was observed for the Dog (p=.134) and Ball (p=.872) events.

For each event, questionnaire ratings and GSR feature values were compared using multiple linear regression; by attempting to predict the average GSR outcome (ΔPP) from the average questionnaire ratings, we can infer the degree of similarity between the two measurements. A per-subject analysis was not attempted, given the limited sample size. A statistically significant regression equation was found, F(4,9)=14.34, p=.0007, with an adjusted R² of 0.804, which indicates that roughly 80% of the variance of the GSR can be explained by the questionnaires. Individual factors failed to reach statistical significance, but the strongest trends were observed for the perceived level of risk (coefficient 0.111, p=.29) and the element of surprise (coefficient 0.112, p=.29), which are presented in the scatter plots of Fig. 7.

Subjects generally found the vehicle's driving skills adequate (SEL M=4.53, SD=0.61; OMN M=4.68, SD=0.48; p=.556). In the SEL group, subjects reported more often that the vehicle faced difficulties with unexpected changes in the environment (SEL M=1.68, SD=0.75; OMN M=1.21, SD=0.42; p=.41); such differences can only be attributed to the HUD, considering that the vehicle's behavior was exactly the same in both experiences.

Displaying more information may result in an excessive cognitive load. Indeed, subjects in the OMN group more often rated the amount of information provided by the interface as excessive (OMN M=2.1, SD=0.229; SEL M=1.05, SD=0.809; p<.001), whereas comprehensibility was rated adequate for both interfaces (p=.908). On average, the UX was satisfactory for both interfaces, and the information provided by the HUD was considered useful (SEL M=4.16, SD=0.69; OMN M=4.84, SD=0.38; p=.001). Participants in the OMN group reported that the information was more useful to understand why the vehicle made a decision (SEL M=4.26, SD=1.05; OMN M=4.84, SD=0.38; p=.055) and to feel at ease in general (SEL M=3.79, SD=1.08; OMN M=4.68, SD=.59; p=.003), as well as that the vehicle seemed to have greater control over the external environment (SEL M=3.79, SD=1.08; OMN M=4.84, SD=0.38; p<.001). Overall, the OMN HUD was more helpful in anticipating potential dangers (SEL M=2.42, SD=0.61; OMN M=4.10, SD=0.57; p<.001). Subjects reported a high sense of immersion (M=4.50, SD=0.73) and presence (M=4.37, SD=0.59), with no significant difference between the two groups.

Fig. 6. Subjective measurements for questionnaire section Reaction to test events. Label 1 refers to the question which evaluates risk perception, on a scale from 1 (low risk) to 5 (high risk); label 3 refers to the question that evaluates if and how the individual noticed the dangerous situation in advance, on a scale from 1 (not previously noticed) to 5 (previously noticed). Each test event is considered separately. * p-value < .05, ** p-value < .01.

Fig. 7. Comparison of subjective vs. objective ratings. For subjective measurements, the average perceived risk (a) and the average surprise rating (b) are reported (where the latter refers to the extent to which the user was taken by surprise by the event). The mean ΔPP feature is reported as the objective rating. Each data point corresponds to a specific test event.

Finally, users in the OMN group were better disposed
towards participating in a real AD experience (OMN M=4.68, SD=0.58; SEL M=4.05, SD=0.85; p=.012). As shown in Fig. 8, prior to the experiment all participants were mildly optimistic, but after experiencing the OMN HUD, the attitude towards the technology markedly improved (M=4.0 vs. M=4.68, p=.002). Complete data is provided in the Supplemental material.

Fig. 8. Disposition towards participating in a real AD experience. On the left, the pre-test answers for the SEL and OMN interfaces; on the right, the post-test answers. Scale from 1 (absolutely negative) to 5 (absolutely positive). Mann-Whitney U-tests between pre- and post-test answers are shown; * p-value < .05, ** p-value < .01.

V. DISCUSSION
We here proposed a methodology to validate the UX in AD systems based on continuous, quantitative information gathered from physiological signals while the user is immersed in a VR driving simulation. Our methodology is exemplified by the comparison of two AR-HUD-based interfaces which differ in the amount of information displayed to the users. By controlling all aspects of the simulated environment, we were able to disentangle the effect of very specific design choices and measure their impact on the overall UX. It must be stressed that the only difference between the two groups was the information displayed by the HUD, as the simulation was otherwise identical; the study groups were also homogeneous in terms of age, sex and ethnicity.

Our results confirmed that providing "why" information is important to reassure the user of the system's competence and to promote trust and situational awareness [3], [9]. To the best of our knowledge, ours is the first contribution to evaluate a realistic HUD displaying a wide range of visual and auditory cues about the vehicle and its surroundings, as is expected in future AVs. Given the number of objects involved in realistic scenarios, an omni-comprehensive (OMN) display could lead to an excessive cognitive load. A possible way to reduce the information load, which we denoted as selective (SEL), is to display only the most relevant visual cues in the current context. Indeed, our results indicated that users found the information displayed by the OMN HUD slightly excessive, although acceptable in both cases, but this was compensated by a less stressful driving experience, as confirmed both by subjective and objective measures.

This difference is especially evident when potentially dangerous events occur, such as a pedestrian crossing the street at the last minute.
It is worth noting how the HMI influenced the perception of external events, on one hand, and of the vehicle's performance, on the other hand, despite the fact that the simulated scenario was identical in those respects. For instance, users in the OMN group perceived the vehicle as better equipped to deal with unexpected changes in the environment. We argue that this difference arose as a consequence of the mental model that users formed: as the information provided by the HUD allowed users to better anticipate dangerous situations, they projected this feeling onto the AV as well.

Our results have important implications for AI research in AD, and specifically for the sensory sub-systems, as HMI constraints need to be considered in their design. For instance, end-to-end training from sensory input to planning does not explicitly extract all the information that was included in this simulated HMI [54]. In our simulation, the information displayed by the SEL HUD was chosen based on a set of heuristics that could be further improved by exploiting a more advanced AI, such as the ability to predict the motion of objects and pedestrians to foresee potentially dangerous situations before they actually affect the vehicle's trajectory.

In this study, we have sought to be as independent as possible from specific AD systems, e.g., by simulating perfect vehicle sensing capabilities. Our conclusions are thus unaffected by potential errors or misses in the AD object detection system. The proposed methodology could certainly be employed to test other types of autonomous vehicles and their underlying AI systems, by changing the modeled interior and/or behavior. It would also be possible to investigate how possible errors may affect the UX and trust.

The proposed scenario is certainly representative of the learning phase as defined in [9]. Information display by the HUD is particularly relevant in this initial phase, when the user is still forming a mental model of how the AD system works.
Our results may not apply entirely to the performance phase, in which the user has observed the AD system for a prolonged period of time. However, the unexpected events or accidents which we simulate, while rare, can have a profound effect on trust, both at the individual and collective level. It should be noticed that trust begins to form even before the first interaction with the system, e.g., based on information from the media, or personal preferences [8], [9]. This was evident in our study where, initially, many subjects were not willing to participate in a real AD experience. However, participating in the VR experience, and being exposed to an informative interface, significantly improved their acceptance of AD systems. In a simulated setting, all AD technologies, as well as all types of events, can be recreated, opening interesting opportunities for "training" future users of AV technology.

GSR proved capable of detecting the user's stress in response to potentially dangerous events, in line with previous literature results which, however, were obtained in the context of manual or partially automated driving [4], [26]. Notably, differences in HMI design were reflected in observable changes in GSR levels, even when using consumer electronics sensors. The GSR response was correlated to the perceived risk as measured by subjective questionnaires, as well as to the "surprise" factor, which depends on the HMI. We here focused on the response to specific events, but the methodology could be extended to extract features that characterize the entire experience [26].
VI. CONCLUSIONS AND FUTURE WORK
In this work we proposed a methodology to validate the UX in AD systems based on continuous, quantitative information gathered from physiological signals while the user is immersed in a VR driving simulation. Its effectiveness was shown in the context of HMI design, and specifically applied to the comparison of HUD-based interfaces for AVs that provide visual cues about the vehicle's sensory and planning systems. We explored in this exemplification the role of HMI in eliciting a sense of trust and safety in AD systems, as this will be key for humans to relinquish control of the vehicle.

The proposed methodology relies on physiological signals (GSR in this specific embodiment) to provide continuous, quantitative and objective feedback. This is particularly relevant for the simulation of AD systems, as objective measures in driving research are traditionally based on the driver's performance and behavior. A limitation of GSR is that it measures arousal, but is a poor indicator of valence. In our specific case, the experience was engineered to elicit a sense of distress and, hence, a positive valence was excluded. In the future, this lack could be overcome by including other sensors, e.g., to measure the HR, other types of features that reflect different characteristics of the UX, as well as machine learning models to more accurately detect the passengers' affective state.

It should be noticed that the increasing adoption of wearable devices like smartwatches incorporating a growing set of health sensors will open additional opportunities for AVs' personalization; anthropomorphism, customization and adaptivity are also important factors for trustworthy HMI [9].
While the physiological response (1–5 s in the case of GSR) is too slow to be exploited for actual driving, it could be used to customize various aspects of the HMI, like the quantity and quality of information displayed, and of the overall driving experience. The proposed testing methodology could be extended to cover the above scenarios as well as other aspects of the UX (e.g., considering not just in-vehicle scenarios, but also vehicle-to-pedestrian interactions [55], long-term performance [9], etc.), by adjusting the simulation, the HMI and/or the vehicle's AI as needed.
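One way such arousal-driven HMI customization could look in practice is sketched below. This is purely hypothetical: the thresholds, mode names, and hysteresis rule are invented for illustration, not part of the study, though the two information levels mirror the Selective and Omni-comprehensive interfaces compared in the paper.

```python
import numpy as np

# Hypothetical adaptation rule: smooth the normalized arousal signal over
# several seconds (GSR responds with a 1-5 s lag, too slow for driving
# control but fast enough to tune the HMI), then switch between a
# selective (SEL-like) and an omni-comprehensive (OMN-like) display mode.
def hmi_mode(arousal_window, low=0.3, high=0.7):
    """Pick an HMI information level from recent normalized arousal samples."""
    level = float(np.mean(arousal_window))  # smooth over the recent window
    if level > high:
        return "omni"       # stressed: show the full picture of vehicle perception
    if level < low:
        return "selective"  # relaxed: show only imminent hazards
    return "keep-current"   # hysteresis band: avoid flickering between modes

print(hmi_mode([0.8, 0.9, 0.85]))  # → omni
print(hmi_mode([0.1, 0.2, 0.15]))  # → selective
```

The dead band between the two thresholds is a deliberate design choice: an interface that constantly switches detail levels would itself become a source of distraction.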
ACKNOWLEDGMENT

The authors want to thank Dario Doronzo and Antonello Laurino for their contributions to the system implementation. This research was partly supported by the VR@Polito lab.
REFERENCES

[1] W. Jiadai, L. Jiajia, and K. Nei, "Networking and communications in autonomous driving: A survey," IEEE Comm. Surveys & Tutor.
International Journal on Interactive Design and Manufacturing, vol. 9, no. 4, pp. 269–275, 2015.
[4] L. Eudave and M. Valencia, "Physiological response while driving in an immersive virtual environment," in 2017 IEEE 14th International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 2017, pp. 145–148.
[5] R. Jose, G. A. Lee, and M. Billinghurst, "A comparative study of simulated augmented reality displays for vehicle navigation," in Proceedings of the 28th Australian Conference on Computer-Human Interaction, ser. OzCHI '16. New York, NY, USA: ACM, 2016, pp. 40–48.
[6] P. Lungaro, K. Tollmar, and T. Beelen, "Human-to-AI interfaces for enabling future onboard experiences," in Proceedings of the 9th International Conference on Automotive User Interfaces and Interactive Vehicular Applications Adjunct, ser. AutomotiveUI '17. New York, NY, USA, 2017, pp. 94–98.
[7] D. Harrison McKnight and N. L. Chervany, "Trust and distrust definitions: One bite at a time," in Trust in Cyber-societies, R. Falcone, M. Singh, and Y.-H. Tan, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, pp. 27–54.
[8] J. D. Lee and K. A. See, "Trust in automation: Designing for appropriate reliance," Hum. Factors, vol. 46, no. 1, pp. 50–80, 2004.
[9] F. Ekman, M. Johansson, and J. Sochor, "Creating appropriate trust in automated vehicle systems: A framework for HMI design," IEEE Trans. Hum.-Mach. Syst., vol. 48, no. 1, pp. 95–101, 2018.
[10] F. M. F. Verberne, J. Ham, and C. J. H. Midden, "Trust in smart systems: Sharing driving goals and giving information to increase trustworthiness and acceptability of smart systems in cars," Hum. Factors, vol. 54, no. 5, pp. 799–810, 2012.
[11] R. Häuslschmid, M. von Bülow, B. Pfleging, and A. Butz, "Supporting trust in autonomous driving," in Proceedings of the 22nd International Conference on Intelligent User Interfaces. New York, NY, USA: ACM, 2017, pp. 319–329.
[12] A. Doshi, S. Y. Cheng, and M. M. Trivedi, "A novel active heads-up display for driver assistance," IEEE Trans. Syst., Man, Cybern. B, vol. 39, no. 1, pp. 85–93, 2009.
[13] Z. Medenica, A. L. Kun, T. Paek, and O. Palinko, "Augmented reality vs. street views: A driving simulator study comparing two emerging navigation aids," in Proc. 13th Int. Conf. on Human Computer Int. with Mobile Dev. and Serv. ACM, 2011, pp. 265–274.
[14] S. Kim and A. K. Dey, "Simulated augmented reality windshield display as a cognitive mapping aid for elder driver navigation," in Proceedings of the SIGCHI Conference on Hum. Factors in Computing Systems, ser. CHI '09. New York, NY, USA: ACM, 2009, pp. 133–142.
[15] B.-J. Park, J.-W. Lee, C. Yoon, and K.-H. Kim, "Augmented reality for collision warning and path guide in a vehicle," in Proceedings of the 21st ACM Symposium on Virtual Reality Software and Technology, ser. VRST '15. New York, NY, USA: ACM, 2015, pp. 195–195.
[16] R. Haeuslschmid, L. Schnurr, J. Wagner, and A. Butz, "Contact-analog warnings on windshield displays promote monitoring the road scene," in Proceedings of the 7th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, ser. AutomotiveUI '15. New York, NY, USA: ACM, 2015, pp. 64–71.
[17] M. T. Phan, I. Thouvenin, and V. Frémont, "Enhancing the driver awareness of pedestrian using augmented reality cues," in , Nov 2016, pp. 1298–1304.
[18] L. Guo, S. Manglani, Y. Liu, and Y. Jia, "Automatic sensor correction of autonomous vehicles by human-vehicle teaching-and-learning," IEEE Trans. on Vehicular Technology, vol. 67, no. 9, pp. 8085–8099, 2018.
[19] F. Bazzano, F. Gentilini, F. Lamberti, A. Sanna, G. Paravati, V. Gatteschi, and M. Gaspardone, "Immersive virtual reality-based simulation to support the design of natural human-robot interfaces for service robotic applications," in Augmented Reality, Virtual Reality, and Computer Graphics – Third International Conference, AVR 2016, Lecce, Italy, June 15-18, 2016, Proceedings, Part I, 2016, pp. 33–51.
[20] Y. Chen, C. Stout, A. Joshi, M. L. Kuang, and J. Wang, "Driver-assistance lateral motion control for in-wheel-motor-driven electric ground vehicles subject to small torque variation," IEEE Trans. on Vehicular Technology, vol. 67, no. 8, pp. 6838–6850, 2018.
[21] Y. Wang, B. Mehler, B. Reimer, V. Lammers, L. A. D'Ambrosio, and J. F. Coughlin, "The validity of driving simulation for assessing differences between in-vehicle informational interfaces: A comparison with field testing," Ergonomics, vol. 53, no. 3, pp. 404–420, 2010.
[22] N. Mullen, J. Charlton, A. Devlin, and M. Bedard, Simulator Validity: Behaviours Observed on the Simulator and on the Road, 1st ed. Australia: CRC Press, 2011, pp. 1–18.
[23] S. Balters and M. Steinert, "Capturing emotion reactivity through physiology measurement as a foundation for affective engineering in engineering design science and engineering practices," J. of Intell. Manuf., vol. 28, no. 7, pp. 1585–1607, 2017.
[24] D. Ruscio, L. Bascetta, A. Gabrielli, M. Matteucci, D. Ariansyah, M. Bordegoni et al., "Collection and comparison of driver/passenger physiologic and behavioural data in simulation and on-road driving," in 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), 2017, pp. 403–408.
[25] M. J. Johnson, T. Chahal, A. Stinchcombe, N. Mullen, B. Weaver, and M. Bedard, "Physiological responses to simulated and on-road driving," Int. J. Psychophysiol., vol. 81, no. 3, pp. 203–208, 2011.
[26] J. Healey, R. W. Picard et al., "Detecting stress during real-world driving tasks using physiological sensors," IEEE Trans. Intell. Transp. Syst., vol. 6, no. 2, pp. 156–166, 2005.
[27] R. R. Singh, S. Conjeti, and R. Banerjee, "Assessment of driver stress from physiological signals collected under real-time semi-urban driving scenarios," Int. J. of Comput. Int. Sys., vol. 7, no. 5, pp. 909–923, 2014.
[28] B. Dalgarno and M. J. Lee, "What are the learning affordances of 3-D virtual environments?" Brit. J. Ed. Tech., vol. 41, no. 1, pp. 10–32, 2010.
[29] GENIVI Vehicle Simulator. [Online]. Available: https://at.projects.genivi.org/wiki/display/PROJ/GENIVI+Vehicle+Simulator
[30] R. Marino, S. Scalzi, and M. Netto, "Nested PID steering control for lane keeping in autonomous vehicles," Control Engineering Practice, vol. 19, no. 12, pp. 1459–1467, 2011.
[31] A. Laurino, "Virtual reality-based simulation tools for evaluating user experience in autonomous vehicles," Master's thesis, 2018.
[32] P. A. Hancock, R. J. Jagacinski, R. Parasuraman, C. D. Wickens, G. F. Wilson, and D. B. Kaber, "Human-automation interaction research: Past, present, and future," Ergon. in Design, vol. 21, no. 2, pp. 9–14, 2013.
[33] S. W. A. Dekker and D. D. Woods, "MABA-MABA or abracadabra? Progress on human-automation co-ordination," Cogn. Technol. Work, vol. 4, pp. 240–244, 2002.
[34] ISO, "Intelligent transport systems – forward vehicle collision warning systems – performance requirements and test procedures," ISO 22324:2015, 2013.
[35] A. Sebastian, M. Tang, Y. Feng, and M. Looi, "Multi-vehicles interaction graph model for cooperative collision warning system," in , June 2009, pp. 929–934.
[36] G. Johansson and K. Rumar, "Drivers' brake reaction times," Hum. Factors, vol. 13, no. 1, pp. 23–27, 1971.
[37] P. George, I. Thouvenin, V. Frémont, and V. Cherfaoui, "DAARIA: Driver assistance by augmented reality for intelligent automobile," in
International Conference on Applied Hum. Factors and Ergonomics, 2017, pp. 220–228.
[41] B. Kim, D. Kim, K. Kim, and K. Yi, "High-level automated driving on complex urban roads with enhanced environment representation," in , Oct 2015, pp. 516–521.
[42] G. Valenza, A. Lanata, and E. P. Scilingo, "The role of nonlinear dyn. in affective valence and arousal recognition," IEEE Transactions on Affective Computing, vol. 3, no. 2, pp. 237–249, 2012.
[43] M. Meehan, B. Insko, M. Whitton, and F. P. Brooks Jr, "Physiological measures of presence in stressful virtual environments," in ACM Trans. Graphics (TOG), vol. 21, no. 3, 2002, pp. 645–652.
[44] B. Figner, R. O. Murphy et al., "Using skin conductance in judgment and decision making research," A Handbook of Process Tracing Methods for Decision Research, pp. 163–184, 2011.
[45] M. Slater, C. Guger, G. Edlinger, R. Leeb, G. Pfurtscheller, A. Antley et al., "Analysis of physiological responses to a social situation in an immersive virtual environment," Presence: Teleoperators and Virtual Environments, vol. 15, no. 5, pp. 553–569, 2006.
[46] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal, "Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness," Int. J. Aviat. Psychol., vol. 3, pp. 203–220, 1993.
[47] R. Taylor, "Situational awareness rating technique: The development of a tool for aircrew systems design," in Sit. Awar., 2017, pp. 111–128.
[48] NASA. [Online]. Available: https://humansystems.arc.nasa.gov/groups/TLX/
[49] K. E. Schaefer, "Measuring trust in human robot interactions: Development of the trust perception scale-HRI," in Robust Intelligence and Trust in Autonomous Systems. Springer, 2016, pp. 191–218.
[50] R. S. Kalawsky, "VRUSE – a computerised diagnostic tool for usability evaluation of virtual/synthetic environment systems," Appl. Ergon., vol. 30, no. 1, pp. 11–25, 1999.
[51] "Grove GSR sensor Seeed Wiki," 2017. [Online]. Available: http://wiki.seeedstudio.com/Grove-GSR Sensor/
[52] "Raspberry ADC," 2015. [Online]. Available: https://learn.adafruit.com/raspberry-pi-analog-to-digital-converters/mcp3008
[53] B. Patrao, S. Pedro, and P. Menezes, "How to deal with motion sickness in virtual reality," Sciences and Tech. of Int., 2015 22nd, pp. 40–46, 2015.
[54] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner et al., "End to end learning for self-driving cars," arXiv:1604.07316, 2016.
[55] R. Amir and K. T. John, "Autonomous vehicles that interact with pedestrians: A survey of theory and practice," IEEE Transactions on Intelligent Transportation Systems, in press.
Lia Morra received the M.Sc. and the Ph.D. degrees in computer engineering from Politecnico di Torino, Italy, in 2002 and 2006. Currently, she is a senior postdoctoral fellow at the Dip. di Automatica e Informatica of Politecnico di Torino. Her research interests include computer vision, pattern recognition, and machine learning.
Fabrizio Lamberti is an associate professor at the Dip. di Automatica e Informatica of Politecnico di Torino. His research interests are mainly in the areas of computer graphics, HMI and intelligent computing. He is serving as Associate Editor for the IEEE Transactions on Computers, the IEEE Transactions on Emerging Topics in Computing, the IEEE Transactions on Learning Technologies and the IEEE Transactions on Consumer Electronics. He is a Senior Member of the IEEE.
F. Gabriele Pratticò received his M.Sc. degree in computer engineering from Politecnico di Torino, Italy, in 2017. Currently, he is a Ph.D. student at Politecnico di Torino, where he carries out research in the areas of mixed reality, HMI, serious games and user experience design.
Salvatore La Rosa received the M.Sc. degree in biomedical engineering from Politecnico di Torino, Italy, in 2019. His major interests regard biosignal analysis, pattern recognition, machine learning and embedded systems.
Paolo Montuschi is a full professor at the Dip. di Automatica e Informatica and a Member of the Board of Governors of Politecnico di Torino, Italy. His research interests include computer arithmetic, computer graphics, and intelligent systems. He is serving as 2019 Acting (interim) Editor-in-Chief of the IEEE Transactions on Emerging Topics in Computing and as the 2017-19 IEEE Computer Society Awards Chair. He is an IEEE Fellow, and a life member of the International Academy of Sciences of Turin and of IEEE Eta Kappa Nu.
APPENDIX
In this Appendix, complete questionnaire results are provided. In particular, in Table II, characteristics of subjects involved in the user study are given (per group). Results concerning motion sickness are reported in Fig. 9. Feedback collected through the other sections of the questionnaire are provided in Tables III–VI. Specifically, subjective evaluations pertaining to the questionnaire sections System Competence, Cognitive Load and Overall User Experience are reported in Table III. Subjective measurements for the questionnaire section Reaction to Test Events are reported in Table IV, and complete the results presented in Fig. 6 of the main text. Subjective evaluations pertaining to the questionnaire section on Situational Awareness are reported in Table V. Quality and quantity were evaluated for each element of the HMI, e.g., bounding boxes, navigation lines, etc., where quality in this context refers to how useful the specific element was in understanding the vehicle's behaviour as well as the surrounding environment. Finally, questions related to the VR experience (questionnaire section Immersion and Presence) are reported in Table VI. All questions were in Italian and had to be rated on a 1–5 Likert scale. The Selective (SEL) and Omni-comprehensive (OMN) AR-HUD interfaces are compared using the Mann-Whitney U-test. A p-value of .05 or lower was considered to indicate a statistically significant difference; statistically significant differences are highlighted in bold in the tables.
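The per-statement SEL/OMN comparisons can be reproduced with any standard Mann-Whitney U-test implementation. A sketch using scipy follows; the Likert ratings below are made-up illustrative values, not the study's raw responses.

```python
from scipy.stats import mannwhitneyu

# Two independent groups of 1-5 Likert ratings (illustrative values only).
sel = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 4, 3, 4, 5, 4, 4]
omn = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5, 4]

# Likert responses are ordinal and heavily tied, so the rank-based
# Mann-Whitney U-test is appropriate where a t-test's normality
# assumption is not.
stat, p = mannwhitneyu(sel, omn, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
if p <= 0.05:  # the paper's significance threshold
    print("SEL and OMN ratings differ significantly")
```

Running the test once per questionnaire statement, as done in Tables III–VI, inflates the family-wise error rate; the very small p-values reported for most significant rows (.001 or below) are robust to this, but borderline rows near .05 should be read with that caveat in mind.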
TABLE II
SUBJECT CHARACTERISTICS BY STUDY GROUP

Test group | Gender | Age
SEL | Male / Female | 22.53 ± …
TABLE III
SUBJECTIVE RESULTS ON SYSTEM COMPETENCE, COGNITIVE LOAD, AND USER EXPERIENCE COLLECTED THROUGH QUESTIONNAIRES ON A 1-TO-5 (DISAGREE TO AGREE) SCALE. INSPIRED BY STANDARD QUESTIONS FOR THE EVALUATION OF TRUST IN HUMAN-ROBOT INTERACTION (HRI), THIS SECTION EVALUATES THE PERCEIVED AV'S COMPETENCE ACROSS THE RANGE OF DRIVING SITUATIONS EXPLORED IN THE SIMULATION. OVERALL USER EXPERIENCE INVESTIGATES GENERAL ASPECTS REGARDING THE MENTAL MODEL, AND THE TRUST POSED IN THE SYSTEM.

Statement | SEL µ | SEL σ | OMN µ | OMN σ | U-test p-value

System Competence
The autonomous vehicle showed adequate decision-making skills | 4.526 | .612 | 4.684 | .478 | .556
The autonomous vehicle faced difficulties during unexpected changes in the environment | 1.684 | .749 | 1.211 | .419 | .041
The autonomous vehicle was smart | 4.579 | .692 | 4.368 | .761 | .415
I appreciated the driving skills of the autonomous vehicle | 4.158 | .602 | 4.105 | .567 | .863

Cognitive Load
It was demanding to find information within the HUD | 1.000 | .000 | 1.053 | .229 | 1.000
It was stressful to find information within the HUD | 1.000 | .000 | 1.105 | .315 | .486
Generally the amount of information provided by the HUD was excessive | 1.053 | .809 | 2.105 | .229 | .000
Generally the comprehensibility of information provided by the HUD was adequate | 4.737 | .562 | 4.684 | .582 | .908

Overall User Experience
HUD information was useful to build trust in the vehicle | 4.158 | .688 | 4.842 | .375 | .001
HUD information was helpful to understand why the vehicle made a decision | 4.263 | 1.046 | 4.842 | .375 | .055
HUD information helped me feel comfortable and at ease | 3.789 | 1.084 | 4.684 | .582 | .003
Thanks to HUD information, I had the perception that the vehicle was in full control of the situation | 3.789 | 1.084 | 4.842 | .375 | .000
The HUD user interface was able to inform me before the potential danger affected driving | 2.421 | .607 | 4.105 | .567 | .000
Generally, I have found that HUD information was helpful in anticipating the dangerous situation | 2.474 | .612 | 4.421 | .607 | .000
TABLE IV
SUBJECTIVE ASSESSMENT OF TEST EVENTS COLLECTED THROUGH QUESTIONNAIRES ON A 1-TO-5 (DISAGREE TO AGREE) SCALE. THESE QUESTIONS PROVIDE COMPLEMENTARY INFORMATION TO THE PHYSIOLOGICAL SIGNALS BY INVESTIGATING THE PERCEPTION OF THE EVENT ITSELF (SENSE OF DANGER) AS WELL AS THE QUALITY OF THE HMI IN PRESENTING THE EVENT TO THE USER (SENSE OF SURPRISE).

Statement | SEL µ | SEL σ | OMN µ | OMN σ | U-test p-value

Test Event - Dog
How dangerous would you rate this situation? (not at all - very dangerous) | 3.368 | .831 | 3.053 | .911 | .269
I was surprised by this situation | 4.421 | .769 | 3.737 | .991 | .016
I detected the potential danger before it affected driving | 1.632 | 1.065 | 2.737 | 1.046 | .001
The information displayed was useful to anticipate the potential danger | 1.579 | .692 | 3.000 | .816 | .001

Test Event - Ball
How dangerous would you rate this situation? (not at all - very dangerous) | 3.053 | 1.079 | 3.000 | .943 | .872
I was surprised by this situation | 3.632 | 1.012 | 3.053 | .970 | .102
I detected the potential danger before it affected driving | 2.053 | 1.224 | 2.789 | 1.228 | .064
The information displayed was useful to anticipate the potential danger | 1.579 | .692 | 3.263 | 1.368 | .000

Test Event - Car1
How dangerous would you rate this situation? (not at all - very dangerous) | 3.579 | .961 | 2.421 | 1.170 | .003
I was surprised by this situation | 3.368 | 1.300 | 1.947 | 1.129 | .001
I detected the potential danger before it affected driving | 3.000 | 1.528 | 4.158 | 1.259 | .007
The information displayed was useful to anticipate the potential danger | 2.632 | 1.012 | 3.947 | 1.129 | .001

Test Event - Scooter
How dangerous would you rate this situation? (not at all - very dangerous) | 1.895 | 1.243 | 1.368 | .684 | .208
I was surprised by this situation | 1.842 | 1.068 | 1.684 | 1.003 | .628
I detected the potential danger before it affected driving | 2.105 | 1.286 | 2.000 | 1.291 | .773
The information displayed was useful to anticipate the potential danger | 1.211 | .713 | 1.053 | .229 | .743

Test Event - Car2
How dangerous would you rate this situation? (not at all - very dangerous) | 4.211 | .787 | 3.421 | 1.017 | .017
I was surprised by this situation | 4.105 | .994 | 3.053 | .970 | .002
I detected the potential danger before it affected driving | 1.895 | .994 | 3.316 | 1.204 | .000
The information displayed was useful to anticipate the potential danger | 2.053 | .848 | 3.158 | .958 | .000

Test Event - Man1
How dangerous would you rate this situation? (not at all - very dangerous) | 1.316 | .671 | 1.263 | .452 | 1.000
I was surprised by this situation | 1.316 | .582 | 1.158 | .501 | .390
I detected the potential danger before it affected driving | 3.474 | 1.712 | 4.474 | .772 | .093
The information displayed was useful to anticipate the potential danger | 3.421 | 1.170 | 4.316 | .885 | .018

Test Event - Man2
How dangerous would you rate this situation? (not at all - very dangerous) | 4.474 | .697 | 3.737 | .933 | .008
I was surprised by this situation | 4.526 | .841 | 3.368 | 1.012 | .000
I detected the potential danger before it affected driving | 1.842 | 1.015 | 3.053 | 1.079 | .001
The information displayed was useful to anticipate the potential danger | 1.474 | .697 | 3.158 | .958 | .001
TABLE V
SUBJECTIVE ASSESSMENT OF HMI ELEMENTS COLLECTED THROUGH QUESTIONNAIRES ON A 1-TO-5 (DISAGREE TO AGREE) SCALE. SELECTIVE (SEL) AND OMNI-COMPREHENSIVE (OMN) AR-HUD INTERFACES ARE COMPARED USING THE MANN-WHITNEY U-TEST.

Statement | SEL µ | SEL σ | OMN µ | OMN σ | U-test p-value

Situational Awareness
Bounding boxes (BB) helped me understand that the car had taken charge of the traffic lights and handled them appropriately | 4.824 | .393 | 4.765 | .437 | 1.000
Labels helped me understand that the car had taken charge of the traffic lights and handled them appropriately | 4.842 | .501 | 4.947 | .229 | .743
BB helped me understand that the car had taken charge of the road sign and handled it appropriately | 4.556 | .784 | 4.500 | .730 | .772
Labels helped me understand that the car had taken charge of the road sign and handled it appropriately | 4.684 | .478 | 4.684 | .671 | .714
BB helped me understand that the car had taken charge of the potential obstacle (pedestrian, animal) and handled it appropriately | 4.316 | 1.157 | 4.737 | .452 | .492
Labels helped me understand that the car had taken charge of the potential obstacle (pedestrian, animal) and handled it appropriately | 4.263 | 1.195 | 4.579 | .838 | .558
BB helped me to understand that the car had taken charge of the other cars likely affecting the driving and figured out how to handle them | 4.895 | .315 | 4.895 | .315 | 1.000
Labels helped me to understand that the car had taken charge of the other cars likely affecting the driving and figured out how to handle them | 4.842 | .501 | 4.684 | .749 | .500
Navigation line has been helpful in understanding the vehicle's intentions | 4.944 | .236 | 4.947 | .229 | 1.000
Other vehicles' navigation lines helped me to understand that the car had taken charge of their proximity and figured out how to handle them | 5.000 | .000 | 4.579 | .961 | .046
BB colour linked to the level of risk helped me to understand that the car had taken charge of obstacles and handled them appropriately | 4.800 | .561 | 4.600 | .737 | .555
The warning sound of traffic light/road sign helped me to understand that the car had taken charge of the situation | 4.688 | .602 | 4.250 | .775 | .106
The warning sound in case of danger helped me to understand that the car had taken charge of the situation and handled it appropriately | 4.500 | .857 | 4.588 | .618 | 1.000

Quantity/Mental Workload: how would you rate the quantity of information? (1=poor, 3=adequate, 5=excessive)
Number of bounding boxes and labels for traffic lights | 3.053 | .229 | 3.105 | .459 | .604
Number of bounding boxes and labels for road signs | 2.579 | .607 | 3.263 | .562 | .001
Number of bounding boxes and labels for potential obstacles (pedestrian, animal, etc.) | 2.474 | .697 | 3.105 | .315 | .001
Number of bounding boxes and labels for traffic cars | 2.789 | .419 | 3.474 | .697 | .001
Number of navigation lines for the traffic cars | 2.842 | .375 | 3.474 | .905 | .014
The warning sound of traffic light/road sign | 2.368 | .684 | 3.316 | .582 | .000
The warning sound in case of danger | 2.526 | .697 | 3.000 | .000 | .008
TABLE VI
SUBJECTIVE ASSESSMENT OF THE VR SIMULATOR COLLECTED THROUGH QUESTIONNAIRES ON A 1-TO-5 (DISAGREE TO AGREE) SCALE. RELEVANT SECTIONS FROM THE VRUSE QUESTIONNAIRE SELECTED TO EVALUATE THE SIMULATED ENVIRONMENT WITH RESPECT TO IMMERSION, PRESENCE AND FIDELITY.

Statement | SEL µ | SEL σ | OMN µ | OMN σ | U-test p-value

Immersion and Presence
I felt a sense of being immersed in the virtual environment | 4.526 | .697 | 4.474 | .772 | .999
The quality of the image reduced my feeling of presence | 2.105 | 1.150 | 2.474 | 1.020 | .239
I had a good sense of scale in the virtual environment | 4.947 | .229 | 4.842 | .375 | .604
The presence of my hands and legs within the VR enhanced my sense of presence | 4.632 | .684 | 4.579 | .769 | 1.000
The motion platform enhanced my sense of presence | 4.474 | 1.073 | 4.579 | .838 | .987
Overall I would rate my sense of presence as: (1) very unsatisfactory - (5) very satisfactory | 4.368 | .597 | 4.421 | .692 | .769