Comparing State-of-the-Art and Emerging Augmented Reality Interfaces for Autonomous Vehicle-to-Pedestrian Communication
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. XX, NO. XX, XXXX 2020
F. Gabriele Pratticò, Student Member, IEEE, Fabrizio Lamberti, Senior Member, IEEE, Alberto Cannavò, Student Member, IEEE, Lia Morra, Senior Member, IEEE, Paolo Montuschi, Fellow, IEEE
Abstract—Providing pedestrians and other vulnerable road users with a clear indication about a fully autonomous vehicle's status and intentions is crucial to make them coexist. In the last few years, a variety of external interfaces have been proposed, leveraging different paradigms and technologies including vehicle-mounted devices (like LED panels), short-range on-road projections, and road infrastructure interfaces (e.g., special asphalts with embedded displays). These designs were evaluated in different settings, using mockups, specially prepared vehicles, or virtual environments, with heterogeneous evaluation metrics. Promising interfaces based on Augmented Reality (AR) have been proposed too, but their usability and effectiveness have not been tested yet. This paper aims to complement such body of literature by presenting a comparison of state-of-the-art interfaces and new designs under common conditions. To this aim, an immersive Virtual Reality-based simulation was developed, recreating a well-known scenario represented by pedestrians crossing in urban environments under non-regulated conditions. A user study was then performed to investigate the various dimensions of vehicle-to-pedestrian interaction leveraging objective and subjective metrics. Even though no interface clearly stood out over all the considered dimensions, one of the AR designs achieved state-of-the-art results in terms of safety and trust, at the cost of higher cognitive effort and lower intuitiveness compared to LED panels showing anthropomorphic features. Together with rankings on the various dimensions, indications about advantages and drawbacks of the various alternatives that emerged from this study could provide important information for next developments in the field.
Index Terms—Fully autonomous vehicles, human-machine interaction, virtual reality, augmented reality, vehicle-to-pedestrian communication, pedestrian crossing.
I. INTRODUCTION

ADVANCEMENTS in the field of automation are continuous, and promise to revolutionize most of everyone's activities. Autonomous vehicles, in particular, will play a key role in this ongoing revolution. While in the early 2010s autonomous vehicles were still regarded as visionary by almost all car manufacturers, today this sector has changed into a
Manuscript received XXXX XX, XXXX; revised XXXX XX, XXXX. Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. The authors are with the GRAINS – GRAphics And INtelligent Systems group at the Dipartimento di Automatica e Informatica of Politecnico di Torino, 10129, Torino, Italy (e-mail: fi[email protected]; [email protected]; [email protected]; [email protected]; [email protected]). This article has supplementary downloadable material available at https://doi.org/10.1109/TVT.2021.3054312, provided by the authors.

multi-billion-dollar business. Huge investments are being made [1], and the expectations are that fully autonomous vehicles (FAVs) [2] will reach the market by the next decade. Given the disruptive potential of FAVs, their acceptance could be hindered by open challenges related not only to technical aspects, but also to societal factors [3], which deserve significant attention as both people with and without technical expertise will be requested to trust machines [4]. This need can be addressed from two viewpoints: that of the drivers (in the future, of the passengers) and of in-vehicle interfaces; and that of vulnerable road users (VRUs), like, e.g., pedestrians, and of interfaces external to the vehicle. In the last years, significant efforts have been devoted to designing interaction paradigms capable of raising occupants' awareness about vehicles' status and intentions, with the aim of improving trust in their autonomous decisions [5]. However, tackling the needs and expectations of VRUs is substantially more complicated. Driving is a complex social behavior based on continuous interactions between drivers and road users in uncertain and ambiguous situations.
When adding (or replacing traditional vehicles with) FAVs, the lack of human drivers may cause a communication breakdown, potentially dangerous for all road users [6]. For this reason, interfaces with VRUs (mainly pedestrians) recently started to be investigated to increase the safety and acceptability of FAVs. Several alternatives have been proposed already, focusing on the most common vehicle-to-pedestrian interactions, i.e., road crossings. Possible interaction paradigms include showing anthropomorphic features [7], or using LED strips/panels to communicate the FAV's intentions [8]–[10]. In other designs, on-vehicle visual hints were replaced by on-road projections, possibly leveraging well-known metaphors like crosswalk or stop signs [11], [12]. Some prototype implementations also introduced changes in the road infrastructure, collecting data from connected vehicles to communicate with pedestrians. Whenever a new vehicle-to-pedestrian interaction paradigm (or interface) was introduced, it was generally compared with some of the previously proposed ones in qualitative and/or quantitative terms, often working with prototype implementations or mockups. Some works resorted to Virtual Reality (VR) for comparing a number of alternatives at once [11]. Despite the great relevance of these works for next developments in the field, available experimental evaluations suffer from several drawbacks. For instance, when a representative set of interfaces is considered, not all the experimental conditions studied in other works are investigated (and vice versa). When the above requirements are met, often the analysis does not fully recreate real-world conditions or does not address the same objective and subjective dimensions which were deemed relevant in other studies. Finally, and most importantly, not all the interaction paradigms devised so far have been tested.
This is the case, for instance, of Augmented Reality (AR)-based interfaces, which proved to be particularly effective for in-vehicle interaction [5]. Some AR-based designs for vehicle-to-pedestrian interfaces have also been proposed but, at present, they have not been compared (or even tested) yet against the mentioned alternatives. In this work, a VR-based simulation system was first designed, by taking into account the above issues and endowing it with the capabilities required to support a fair comparison of multiple interface designs relying on heterogeneous technologies and offering different functionalities. This system was then exploited to run a user study aimed to compare the most relevant interfaces proposed so far from different categories. Specifically, AR-based interfaces were considered, to shed some light on their possible role in next-generation vehicle-to-pedestrian interaction paradigms. A wide set of metrics derived from relevant literature was used, providing interested readers with a comprehensive picture of advantages and drawbacks of available paradigms.

II. RELATED WORK
As discussed in the previous section, the implicit communication mechanism provided by vehicle movement alone might not be able to guarantee an efficient interaction between VRUs and FAVs. This is why, in the last years, a number of interfaces have been developed to convey vehicles' status and/or intentions.
1) Vehicle-mounted interfaces: the first interfaces proposed relied on visual hints provided by equipment mounted on the vehicle exterior. A first example is represented by the "Eyes on a car" design [7], which aims to replace eye contact between pedestrians and drivers of conventional vehicles with digital eyes placed on the front lights. As soon as the vehicle's sensors detect a pedestrian intending to cross, the eyes start staring at him or her to signal that it will stop to allow crossing. Otherwise, the eyes' gaze remains fixed on the road. Experiments performed in a Virtual Environment (VE) showed that pedestrians were faster in deciding whether to cross or not when the interface was available. However, some users found the eyes artificial, leading to an undesirable Uncanny Valley effect [11], and not sufficiently reliable or safe.

Other interfaces were inspired by the well-known traffic light metaphor. For instance, the concept in [8] is based on a LED light placed in the vehicle's front. Two lighting patterns were proposed: green, flashing yellow, red (GYR), and white, flashing red, red (WRR). For each set, the first color indicates when it is safe to cross, the last color when it is not; to refer to intermediate situations, the middle color is used. Studies performed on this design confirmed that users were able to properly associate colors to the messages the vehicle intended to communicate. However, the flashing red of WRR tended to be (mis)interpreted as a danger signal rather than a warning one, especially compared to the GYR flashing yellow. In another study dealing with color codes [13], the authors suggested to remove the intermediate warning state (collapsing it to danger), as they realized that pedestrians tended to selfishly cross even when they were putting the vehicle's passengers at risk.
The authors also stated that an indication of the approaching vehicle could be more effective than safe/unsafe-to-cross information (though without any experimental evaluation).

Another type of interface which proved to be quite intuitive for pedestrians consists in
LED strips mounted in various positions on the vehicle's exterior. An example is given in [9], [10]. In this implementation, the strip is mounted over the windshield, and only the central LEDs are lighted while driving. When a pedestrian is detected, LEDs start to light from the center to the edges of the strip. During crossing, all the LEDs are lighted, and when the vehicle resumes motion, a reversed animation from edges to center is activated. Experiments carried out with a Wizard of Oz technique on a specially prepared vehicle showed that, after a short training, users were able to properly use the interface. It is worth noting that, in [9], two different experiments were run to gather both direct and indirect measures of the pedestrians' perceived safety. In [11], a comparison between the above interface and a different design with two strips on the vehicle's sides was performed in a VE. Users preferred the latter interface, showing appreciation for its intuitiveness and unambiguity, though the continuous feedback provided was judged as not particularly useful.

Other examples of this interface category are provided in [14]–[16]. Besides the position on the vehicle's exterior, the main differences among the various designs lie in the strip's shape, in the lighting pattern, and in the color(s) used (and their meanings). In some cases, the above features are combined together. For instance, in [17], a flashing yellow light is used to indicate that the vehicle is not yielding, whereas a blue light moving from top to bottom indicates that it is going to stop; a fading blue light shows that the vehicle is waiting and, lastly, flashing yellow is used again to indicate vehicle restart.

An alternative design, still based on a LED strip but exploiting anthropomorphic features like the interface in [7], is the "Smiling car" [11]. The interface shows a horizontal yellow line in normal driving conditions, which changes to a smile when a pedestrian is detected, to inform him or her that the vehicle will yield.
Based on studies performed in a scenario involving one vehicle and one pedestrian [11], [18], users found this interface very simple to use and able to provide unambiguous information.

LED panels have also been used to provide pedestrians with written information on the vehicle's status (e.g., "Braking" [18]) or on what to do (e.g., "Cross now" [19]–[21]). Compared with other interfaces, such a direct communication approach did not prove particularly effective, due to possible readability and language issues [11]. In some configurations, interfaces request the pedestrians to notify their intention to cross with a specific gesture. In this case, LED strips have been used to inform them that the gesture was correctly recognized [22].
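The center-to-edges animation of the windshield LED strip described above can be sketched as a simple frame function; the function and parameter names below are illustrative assumptions, not the implementation from [9], [10]:

```python
def strip_frame(n_leds: int, progress: float) -> list:
    """On/off pattern for a windshield LED strip that lights from the
    center outward: progress 0.0 = only the central LED(s) lit (normal
    driving), 1.0 = all LEDs lit (vehicle stopped). Running progress
    from 1.0 back to 0.0 yields the reversed animation shown when the
    vehicle resumes motion."""
    center = (n_leds - 1) / 2.0
    radius = progress * n_leds / 2.0   # half-width of the lit region
    return [abs(i - center) <= max(radius, 0.5) for i in range(n_leds)]
```

For a 7-LED strip, `strip_frame(7, 0.0)` lights only the central LED, while `strip_frame(7, 1.0)` lights all of them.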
2) Projection-based interfaces: the main limitation of vehicle-mounted interfaces is that they communicate only by means of visual signs placed on the vehicle, which may be
poorly visible either because of adverse weather or lighting conditions, or simply because the vehicle is still too far away. A way pursued to cope with these limitations consists in projecting visual indications on the road. For instance, the prototype interface in [23] projects a pattern of parallel lines in front of the vehicle. Lines are perpendicular to the driving direction, and get closer/farther when the vehicle is decelerating/accelerating. Experiments performed with a real, non-autonomous vehicle showed that pedestrians were faster in deciding whether to cross or not, but they focused their attention on the road rather than on the vehicle itself. In [12], a more sophisticated projection pattern was used. During normal driving, a wave-shaped red pattern is projected. When a pedestrian is detected and the vehicle starts to slow down, the pattern color changes to yellow. When the vehicle has come to a complete stop and crossing is safe, the pattern changes into a green crosswalk shape, which turns to red when the vehicle is going to restart. The implementation proposed in [11], characterized by a lower number of states, was rated as very pleasant; it was also judged as futuristic, but this result could be due to the particular vehicle the interface was mounted onto. Despite the increased visibility, these interfaces may not work well with all road conditions. Moreover, even though they rely on the well-known crosswalk concept, which should be familiar to pedestrians, experiments indicated that these interfaces induce a high mental workload.
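The four-state projection pattern of [12] described above amounts to a small state mapping; the following is a paraphrased sketch, where the function signature and state names are our assumptions rather than the original implementation:

```python
def projection_state(speed: float, pedestrian_detected: bool,
                     restarting: bool) -> str:
    """Map the vehicle state to the on-road projection of [12]:
    red wave while driving, yellow wave while slowing down for a
    detected pedestrian, green crosswalk when fully stopped, red
    crosswalk when about to restart."""
    if restarting:
        return "red crosswalk"     # vehicle about to move again
    if speed == 0.0:
        return "green crosswalk"   # stopped: safe to cross
    if pedestrian_detected:
        return "yellow wave"       # pedestrian detected, decelerating
    return "red wave"              # normal driving
```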
3) Smart road interfaces: a different approach to support vehicle-to-pedestrian communication consists in using the road infrastructure itself. In the so-called "Smart roads" concept, visual hints are provided through LED panels embedded in the road pavement to indicate when it is safe to cross (e.g., by showing a white crosswalk) and when it is not (e.g., by lighting red bars on the sidewalk edge). For instance, the prototype implemented in London [24] was judged as very trustworthy. The main drawback of such interfaces is the cost associated with updating the road infrastructure [11].
4) Multi-modal interfaces: the above designs rely only on the visual communication channel, which is also the focus of the present study. It is worth noticing, however, that this choice makes these interfaces unsuitable for some VRUs, e.g., visually impaired persons. For this reason, multi-modal interfaces including a combination of visual, audio and haptic notifications have been proposed [25]. However, the audio channel proved not particularly effective in noisy traffic.
5) AR interfaces: the last category in this review is represented by AR interfaces. Thanks to technological advancements and the ever-larger availability of consumer-grade devices [26], it is possible to imagine a not-too-distant future in which pedestrians will wear their own AR glasses, enabling the development of sophisticated vehicle-to-pedestrian and, more generally, human-robot interaction paradigms [27]. However, experiments in this field are still rare. In a recent study [28], three concepts were explored. In the first one, AR visual hints are exploited to show the pedestrian, through a blue tape overlapped onto his or her field of view, the safer path to follow for crossing the road; moreover, AR-based yellow arrows are drawn in front of the vehicle to show where it could stop, if needed. The second design, referred to as "safe zones", uses AR to draw large green regions indicating where it is safe to cross, and red regions closer to the vehicle indicating where it would be dangerous to cross. The third design is a combination of the previous ones. A user study was performed, in which participants were shown static representations of the three designs. Despite the amount of visual indications provided, participants preferred the third design.

The above review, which showed that various interface categories have indeed been proposed but only a few of them have been tested under representative and/or comparable conditions, motivated the design of the simulation system and the experimental analysis that are reported in this work.

III. SIMULATION SYSTEM
In this section, the VR-based system created to support the study of the various interfaces is illustrated. In particular, the simulation environment is presented first. Afterwards, the interfaces' implementation and the scenario configuration are discussed.
A. Simulation environment
The simulation environment has been built on top of the AirSim open-source software [29]. Originally developed by Microsoft as a simulator for collecting data needed to train computer vision algorithms for unmanned vehicles, AirSim supports hardware-in-the-loop simulations and is characterized by an extremely high visual fidelity. Also, a "Windridge City" urban scenario is provided free of charge.

However, the "as-is" performance of AirSim is not suitable for an effective fruition via immersive VR. Hence, data collection from the vehicle's simulated sensors was disabled (as not required in this work), and a number of optimizations were implemented in order to target the minimum framerate required for immersive experiences (90 frames per second or more, to prevent motion sickness). To this purpose, among the graphics platforms supported by AirSim, Unity was selected; optimizations leveraged ad-hoc functionalities available in this graphics engine to improve the performance of the application logic and of graphics processing.

In particular, concerning the former aspect, the novel Data-Oriented Technology Stack (DOTS) paradigm was used, which allows developers to exploit the parallel processing capabilities and large cache availability of modern processors. The benefit in performance is especially relevant for applications that need to handle multiple instances of the same simulated object, in this case the logic (and physics) of vehicles (and their parts, like wheels) and of pedestrians. The above paradigm was implemented by relying on Unity's architectural pattern named Entity Component System (ECS) together with the C# Job System.
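The performance rationale behind DOTS (per-component data stored contiguously and updated in one batch pass) can be illustrated with a rough structure-of-arrays analogue. Unity's actual implementation is C# ECS code, so the following Python/NumPy sketch is only a conceptual parallel, with all names ours:

```python
import numpy as np

class VehicleComponents:
    """Structure-of-arrays layout: one contiguous array per component
    (position, speed, acceleration), as in an ECS, instead of one
    object per vehicle. A single vectorized pass then updates every
    vehicle instance cache-efficiently."""
    def __init__(self, n: int):
        self.pos = np.zeros(n)     # position along the path (m)
        self.speed = np.zeros(n)   # current speed (m/s)
        self.accel = np.zeros(n)   # commanded acceleration (m/s^2)

    def step(self, dt: float) -> None:
        # Integrate all vehicles at once; speed never goes negative.
        self.speed = np.maximum(self.speed + self.accel * dt, 0.0)
        self.pos += self.speed * dt
```

With up to 300 generated vehicles in the scenario described later, such batch updates avoid the per-object overhead of a naive object-oriented loop.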
Fig. 1. Simulation environment and interfaces included in the evaluation, depicted while the vehicle is braking: (a) Baseline, B; (b) Smile, S; (c) Projection, P; (d) Smart road, M; (e) Safe roads, F; (f) Safe roads extended, E.

Concerning graphics processing, the High Definition Render Pipeline (HDRP), a Scriptable Render Pipeline (SRP) available for VR applications, was used. The HDRP allowed to effectively manage the rendering of 3D elements with different levels of detail (LODs) based on the actual distance from the user, to use dynamic resolution for reducing the workload on the GPU, to enable efficient calculation of reflections, transparencies and shadows, and to activate volumetric lighting for realistic effects. Furthermore, the SRP supports the Shader Graph (SG), a visual shader programming tool that was of paramount importance to effectively back the implementation of the vehicle's interfaces.

Finally, several minor changes were applied to the original urban scenario like, among others, simplifying the colliders' shape, activating GPU instancing on materials, and enabling asynchronous map loading.
B. Interface selection and implementation
In order to provide the users with a reference implementation, a "Baseline" behavior was defined for the virtual FAVs, in which no specific interface is available (Fig. 1a). Braking was implemented as follows: as soon as the FAV detects a pedestrian intending to cross (because close enough to the sidewalk edge, looking either at the vehicle or at the road, and within the detection range), it starts braking with a constant deceleration. Deceleration is calculated by considering the current speed and aiming to reach a full stop at a certain distance from the pedestrian; in preliminary experiments, other deceleration strategies were found to largely alter the users' perception of the interface behavior, as confirmed also by [30], and hence were excluded from the evaluation. If the FAV is not able to stop in due time (too fast and/or too close when crossing starts), the horn is activated to signal the danger.

By default, AirSim's built-in vehicles are provided with a controller logic for driving them with keyboard or joystick. In this work, however, the FAVs' behavior described above was scripted using the ECS, and speed was managed using a PID controller (with output shaping) since, as will be shown later, only deceleration and acceleration on a straight path had to be handled in the designed experiments.

By building on top of the above "Baseline" (abbreviated B), one interface was implemented for each of the categories discussed in Section II, focusing specifically on visual interfaces. To this aim, we chose the interfaces that scored better in experiments reported in previous literature. Neither the FAV's behavior nor the vehicle shape was altered, to avoid biases in the interface comparison [11].

For vehicle-mounted interfaces [11], [18], the "Smiling car" concept was selected [11], later referred to as "Smile" (abbreviated S).
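The constant deceleration described earlier in this subsection follows directly from the kinematic relation v² = 2ad. A minimal sketch, with our own names rather than the authors' code (the paper's controller additionally shapes the output through a PID loop):

```python
def required_deceleration(speed: float, stop_gap: float) -> float:
    """Constant deceleration (m/s^2) that brings a vehicle travelling
    at `speed` (m/s) to a full stop in exactly `stop_gap` metres,
    derived from v^2 = 2 * a * d."""
    if stop_gap <= 0.0:
        raise ValueError("stop_gap must be positive")
    return speed ** 2 / (2.0 * stop_gap)
```

For example, at the scenario's cruise speed of 14 m/s, stopping 5 m ahead of a pedestrian detected at 60 m (i.e., in 55 m) requires 14² / (2 · 55) ≈ 1.78 m/s².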
In normal driving conditions, a horizontal straight yellow line is shown on the vehicle's front side; when a pedestrian is detected and the FAV starts braking, the line turns smoothly into a smile (Fig. 1b) to signal that it will yield.

Among projection-based interfaces [11], [12], [23], the concept originally presented in [12] was considered (later named "Projection", P). To foster comparability of results, the implementation in [11] was used (Fig. 1c), but the sound played at vehicle restart was removed in order to make the interface rely only on visual indications, like the other ones. The interface integrates a LED panel on the vehicle's front side, whose pattern changes based on the actual projection: all LEDs lighted in normal driving conditions, LEDs lighting from edges to center while decelerating, LEDs lighting in the crossing direction while stopped, then a transition to the original state at vehicle restart.

For the "Smart road" (M) category, the implementation studied in [24] was selected (Fig. 1d). In the original work, pedestrian detection is performed by the infrastructure (thus, for instance, the crosswalk marking appears even when there is no vehicle approaching); in this study, detection was moved onto the FAV's side. When there is no vehicle approaching, the interface does not provide any feedback (like with the other designs). When a vehicle is approaching, the interface's state is controlled by the vehicle itself, simulating a connection established between the two entities; if the vehicle is able to stop, a crosswalk appears together with the predicted stop position, and the white lines on the sidewalk edge are turned green (otherwise, they are turned red and no crosswalk is shown).

Lastly, for the AR-based category, the third design in [28]
leveraging AR hints for showing zones suitable/unsuitable for crossing was chosen, later referred to as "Safe roads" (F). The interface shows, in front of the vehicle, a yellow arrow terminated by a yellow bar, whose length is equal to the current estimated stopping distance (Fig. 1e). The arrow length is determined only by the vehicle's dynamics, and it is not related in any way to pedestrian detection. When accelerating or moving at cruise speed, the stopping distance is calculated by applying a reference deceleration (deemed comfortable for the passengers) of 3 m/s² [31]. When braking, the stopping distance is obtained by sampling a pre-calculated braking curve at the current velocity: the curve is selected among two alternatives (comfortable, 3 m/s², and maximum, 6 m/s²), picking the one with the lowest deceleration value still greater than the current deceleration. To reinforce the feedback provided by the interface, the region of the road between the vehicle and the arrowhead is colored red, the rest green. It is worth observing that, since the arrowhead (and the colored region) reflects solely the dynamics of the FAV, the pedestrian is not provided with a clear, immediate indication of the fact that the vehicle actually detected him or her and will stop accordingly.

Since the goal of the present work was to study, in particular, the effectiveness of AR-based vehicle-to-pedestrian interaction, a further interface was added to the analysis. This choice was motivated by the fact that, unfortunately, the interface in [28] is one of the few concepts proposed so far that has still to be tested by letting users actually experience road crossing with it. Hence, a new interface was designed, named "Safe roads extended", by combining key communication abilities that were found in some of the best interface designs but were lacking in F. The behavior of the original yellow arrow is maintained.
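The arrow-length logic described above can be sketched as follows; this is a reconstruction from the text (constants are those stated in the paper, all names are ours), not the authors' implementation:

```python
COMFORT_DECEL = 3.0   # m/s^2, comfortable reference deceleration [31]
MAX_DECEL = 6.0       # m/s^2, emergency braking limit

def arrow_length(speed: float, current_decel: float) -> float:
    """Length of the yellow 'Safe roads' arrow: the stopping distance
    d = v^2 / (2a). When cruising or accelerating (current_decel <= 0),
    the comfortable curve is applied; when braking, the lowest reference
    deceleration still greater than the current one is picked."""
    if current_decel <= COMFORT_DECEL:   # includes cruising/accelerating
        a = COMFORT_DECEL
    else:
        a = MAX_DECEL
    return speed ** 2 / (2.0 * a)
```

At the scenario's 14 m/s cruise speed this yields an arrow of 14² / 6 ≈ 32.7 m; braking harder than 3 m/s² shortens it to the emergency-curve value of 14² / 12 ≈ 16.3 m.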
However, a red tick is added on the arrow body (Fig. 1f) to show the pedestrian where the vehicle would stop in case of an emergency brake (6 m/s²): the distance between the vehicle and the tick represents the minimum stopping distance. Moreover, an additional blue arrow, with a blue bar on the head, is drawn only when a pedestrian is detected (thus avoiding unneeded visual clutter). The blue arrowhead position is fixed as the vehicle's speed decreases, showing the (estimated) point where it will stop; thus, the pedestrian is implicitly informed that he or she has been detected, and can use this information to decide whether to cross or not.

C. Virtual scenario
In order to test the interfaces, a representative case study within the selected VE had to be defined. Based on previous literature, we decided to focus on an unregulated crossing scenario [11], [20], which could become commonplace in urban contexts populated only by FAVs. Moreover, in the absence of other external signals (e.g., traffic lights), VRUs have to rely on information provided by the FAVs' interfaces only, leveling out possible environmental contributions. In particular, a one-way, 5 m wide road was chosen, as done, e.g., in [11], [12]. In this way, the users could ground their decisions to cross or not on the observation of a limited number of vehicles moving on a straight path rather than, e.g., on vehicles changing lanes or stopping at different locations; evaluation metrics can also be made independent of the crossing direction.

Following the approach adopted, e.g., in [20], [32], vehicles were organized in a pattern whose characteristics were controlled in order to ensure realism while at the same time presenting the users with a number of different situations stimulating the various interfaces in many ways (Fig. 2). Cruise speed was set to 50 km/h (14 m/s) for all the vehicles, since previous work indicates that above 40 km/h the demographics have low to no impact on the pedestrians' behavior [33]. Since the experience was expected to strongly depend on the space (hence, time) separating the users from the approaching vehicle, it was decided to vary the distances between consecutive vehicles, setting them to either 45 m, 60 m or 100 m; within the pattern, a pseudo-random sequence guaranteeing a uniform distribution of the inter-vehicle distances was used. So-called "not-yielding" vehicles were also included in the pattern: this characteristic has been used rarely in the literature, but the authors of works where it was not exploited lamented the fact that its lack negatively influenced the realism of the simulation [11].
These vehicles are essential to explore trust in human-to-vehicle interactions in the presence of faulty FAVs. The detection range of the vehicles' sensors was set to 60 m, and yielding vehicles were programmed to stop 5 m ahead of the crossing pedestrian.

IV. EXPERIMENTAL PROTOCOL
In this section, the experimental protocol devised to carry out the evaluation is illustrated in detail.
A. Experiment design and preparation
For each interface, the user is instructed to cross as quickly and as many times as possible, whenever he or she feels it is safe to cross. The testing of each experience is concluded when the following two conditions are met: the user has completed at least 15 crossings in total, and at least one per inter-vehicle distance. This choice ensures that the user experiences the full spectrum of situations within a reasonable amount of time. Additionally, as done in [20], a maximum limit to the duration of the simulation was introduced; in our case, an upper bound of 300 generated vehicles was set. FAVs are set to become visible at 140 m (Fig. 2) and are allowed to queue, though queues are prevented from growing too much in order to avoid long waiting times that could be physically and/or mentally demanding in VR. The user was informed that crossings in front of queued vehicles would not be considered (they will be later referred to as invalid crossings).

As done in almost all the works carried out so far, the user controls the only pedestrian that can interact with vehicles in the above scenario: in this way, it is possible to isolate the contribution of his or her interactions with the approaching vehicles from other possibly disturbing factors. It is worth observing that interactions in more complex scenarios could be investigated in the future using the devised software, which already integrates a traffic simulation system encompassing both vehicles and pedestrians (not used in the experiments).

Experiments were designed according to a within-subjects logic: all the subjects tested all the interfaces, starting with the Baseline. A Latin Square order was then used to define the
Fig. 2. Scenario devised for the experiment: pattern used for generating vehicles, visibility and detection ranges, safety stop distance, and crosswalk length.

sequence of the remaining tests. In the future, it is reasonable to expect that pedestrians will be accustomed to FAVs' interfaces: we sought to replicate a similar condition in our experimental setting by showing each subject a video to familiarize him or her with each interface prior to the virtual experience. In each video, a pedestrian attempts to cross the road under three different conditions: the vehicle is either too close to stop (hence, it will pass by, honking at the pedestrian), it is forced to an emergency brake, or it has sufficient time to stop smoothly. The videos shown to participants are made available online.

B. Hardware configuration and physical setup
The VR headset selected to let the users immerse themselves in the created VE was the Samsung Odyssey. It is an HMD equipped with an AMOLED display with a resolution of 1440 × 1600 pixels per eye and a field of view of about 110°. The headset supports 3D audio and relies on inside-out tracking technology, meaning that it does not require external equipment to determine the user's (better, his or her head's) and the controllers' pose in the real environment. An untethered setup was defined using the MSI VR One Backpack PC to run the simulation. The backpack integrates an Intel Core i7-7820HK and a NVIDIA GeForce GTX 1070 in less than 3.5 kg. Users could thus walk in a physical space mapping one-to-one with the selected portion of the VE where they were expected to cross. This setup was expected to lead to a more realistic experience, boosting the sense of immersion and presence [34].

C. Objective evaluation metrics
During the simulation, the user's behavior is logged by collecting several quantitative measures. For each interaction with an approaching vehicle, the system records the time at which the pedestrian is detected (and the vehicle starts braking, thus initiating the negotiation) and the time at which the pedestrian reaches the opposite sidewalk, i.e., the crossing ends (Fig. 2). Crossing time (CT), calculated as the difference between the above times, is logged; previous works speculated that it may be associated with the user's uncertainty [11]. When the user enters the road, the distance of the approaching vehicle (distance at crossing, or DAC) and its speed (speed at crossing, or SAC) are recorded. The interval between the time the user was detected and the time he or she left the sidewalk (when the negotiation ends) is defined as decision time (DT). The system also keeps track of possible collisions with the vehicles. Aborted crossings, occurring when the user gets off and back onto the same sidewalk (without reaching the opposite one), are also recorded. An efficiency metric was defined as the total number of valid crossings (i.e., non-invalid and non-aborted) divided by the time elapsed from the first to the last crossing.

Videos: https://youtu.be/RoPURY1dlZE
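As a concrete illustration, the metrics above can be derived from a per-interaction event log. The sketch below is hypothetical (the field and function names are ours, not those of the actual logger), but each formula follows the definitions given in the text; the efficiency denominator assumes "from the first to the last crossing" spans first detection to last completed crossing.

```python
from dataclasses import dataclass

@dataclass
class Crossing:
    """One pedestrian-vehicle interaction; timestamps in seconds.

    Field names are illustrative of the quantities logged in the study.
    """
    t_detected: float       # vehicle detects the pedestrian, braking starts
    t_left_sidewalk: float  # pedestrian steps onto the road (negotiation ends)
    t_reached_other: float  # pedestrian reaches the opposite sidewalk
    aborted: bool = False   # returned to the same sidewalk
    invalid: bool = False   # crossed in front of a queued vehicle

def crossing_time(c: Crossing) -> float:
    # CT: from detection to reaching the opposite sidewalk
    return c.t_reached_other - c.t_detected

def decision_time(c: Crossing) -> float:
    # DT: from detection to leaving the sidewalk
    return c.t_left_sidewalk - c.t_detected

def efficiency(crossings: list[Crossing]) -> float:
    # Valid (non-aborted, non-invalid) crossings per unit time,
    # over the interval from the first to the last crossing
    valid = [c for c in crossings if not (c.aborted or c.invalid)]
    elapsed = crossings[-1].t_reached_other - crossings[0].t_detected
    return len(valid) / elapsed
```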
D. Subjective evaluation metrics
To further evaluate users' experience, a questionnaire was developed based on previous literature on the subject. A before-experience section (BEQ) collects demographic information about the participant, as well as his or her opinion about FAVs, crossing habits, and experience with the technologies used in the experiments. Possible symptoms associated with motion sickness are recorded using the Simulation Sickness Questionnaire (SSQ) [35]. Afterwards, the testing of the individual interfaces begins. The after-video tutorial section (ATQ) aims at ensuring that the interface functioning has been fully understood prior to the actual VR experience: the user is required to answer some questions, describe the interface behavior in words, and sketch a graphical representation on paper. Then, the after-interface questionnaire (AIQ) is used to investigate various dimensions of vehicle-to-user interaction after the VR experience (lasting approximately 10 minutes). The AIQ combines questions from the Trust Scale (TS) [36], the System Usability Scale (SUS) [37], the NASA Task Load Index (NASA-TLX) [38], and the Short User Experience Questionnaire (UEQ-S) [39]. Additional, custom questions are used to, e.g., investigate the level of perceived safety associated with a given interface, including whether the participant has felt the need to wait for the vehicle to come to a full stop before attempting to cross. Finally, the participant is requested to rate a number of features of the tested interface (perceived safety, familiarity, workload, etc.) with respect to the Baseline.

After testing all the interfaces, a post-experience questionnaire (PEQ) is administered, which includes questions from the SSQ to evaluate possible discomfort due to the VE, from the VRUse tool [40] to determine the usability of the VR simulation, and from the iGroup Presence Questionnaire (IPQ) [41] to measure immersion and presence. Additional questions verify that the proper level of realism is reached in the simulation. Finally, the participant is asked to rank (without ties) all the interfaces based on individual features and overall. A final open interview concludes the experience.

Questionnaire: http://tiny.cc/6xoksz
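Among the instruments combined in the AIQ, the SUS has a fixed scoring procedure [37] that is easy to get wrong. The following is a minimal sketch of the standard computation (the helper name is ours): each of the ten items is rated 1-5, odd-numbered (positively worded) items contribute rating − 1, even-numbered (negatively worded) items contribute 5 − rating, and the sum is scaled by 2.5 to the 0-100 range.

```python
def sus_score(responses):
    """Standard SUS scoring for 10 items rated on a 1-5 Likert scale.

    Odd-numbered items (indices 0, 2, ...) contribute (rating - 1);
    even-numbered items (indices 1, 3, ...) contribute (5 - rating);
    the summed contributions are multiplied by 2.5 to span 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected 10 ratings in the range 1-5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)
        for i, r in enumerate(responses)
    )
    return total * 2.5
```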
E. Data analysis
Collected data were analyzed using MS Excel with the Real Statistics add-on (v7.1). Comparative analyses were performed using Friedman's test (the pass condition was set at p ≤ .). Post-hoc comparisons on pairwise groups were further performed using Conover's test. For non-comparative questionnaire items, Cronbach's alpha (α) was computed on clusters of related items to test internal consistency. Interface rankings in the PEQ were aggregated using a Bucklin voting system; in particular, the relative placement scoring system (RPSS) [42] was applied considering a majority of half of the sample size.

V. Results
Experimental evaluation involved 12 volunteers (8 males, 4 females) aged between 24 and 46 (M = 28., SD = 4.). Participants were selected among Politecnico di Torino students and from the authors' social networks. Based on demographic information collected with the BEQ, part of the participants were unfamiliar with immersive VR (never used, or used once), whereas others said they use this technology very often or on a daily basis. Participants reported crossing roads under non-regulated conditions either very often/daily (50%) or occasionally (50%), and were on average trustful of FAVs (M = 3., SD = 0., α = 0.). In the following, experimental results will be presented, focusing first on the participants' perception of the VE and of the simulation quality, since possible discomfort associated with the use of VR could introduce biases in the evaluation. Afterwards, their experience with the various interfaces will be compared by leveraging subjective feedback and objective measurements.

A. Virtual experience and simulation quality
No significant effect was registered on pre-post experience conditions (p-values computed using the two-tailed Mann-Whitney U-test) for the three SSQ clusters, i.e., oculomotor (O), disorientation (D), and nausea (N) symptoms, as well as overall (T): ∆O = 13. ± . (p = 0.), ∆D = 8. ± . (p = 0.), ∆N = 14. ± . (p = 0.), and ∆T = 14. ± . (p = 0.). Hence, psychophysiological alterations related to motion sickness that could have affected participants' attention level, reaction time, etc. apparently had no significant influence on the experiments.

Results from the IPQ show that participants experienced a fairly good general sense of presence in the VE (M = 4., SD = 0.), and assigned medium-high scores to the remaining indicators, i.e., spatial presence (M = 3., SD = 0., α = 0.), involvement (M = 3., SD = 0., α = 0.), and realism (M = 3., SD = 0., α = 0.). Participants were also satisfied with the operation of the locomotion technique (M = 4., SD = 0., α = 0.), as well as with the visual quality and the simulation fidelity of the VE (M = 4., SD = 0., α = 0.). Scores related to the perception of the FAVs, as measured by the PEQ, are reported in Table I (α = 0.). Participants were satisfied overall with the FAV simulation. Based on open feedback collected after the experiments, lower scores assigned to some questions were due to the non-yielding behavior of some vehicles which, as said, was intentionally introduced to simulate failures in pedestrian detection: in this respect, part of the participants stated that they felt awkward when, right in the middle of the crossing, the vehicle sometimes did not recognize them (and they ascribed this behavior to faulty, or not smart enough, FAVs). A few participants also reported that they found FAVs too polite with pedestrians compared to how a human driver would behave in such situations.

TABLE I
FAV simulation perception. Inverted items are marked with *.

Item                                                    M     SD
You think that vehicles were driven by a human (1)
  or were fully autonomous (5)                          3.75  0.72
The FAVs showed adequate decision making skills         4.17  0.56
The FAVs struggled in case of sudden, unexpected
  or abrupt pedestrian behavior*                        2.08  1.04
The FAVs made mistakes*                                 2.08  0.96
The FAVs seemed intelligent                             3.83  0.56
Overall, you are satisfied about the FAV simulation     4.00  0.59

B. Interface comparison
Once the representativeness of the simulated scenario was validated, the analysis moved to comparing the selected interfaces.
1) Subjective results:
Rankings of the features analyzed in the PEQ were examined first. Results aggregated with the RPSS are reported (together with pairwise significances) in Table II. Considering the various features, all the interfaces were ranked significantly higher than B (apart from familiarity, as could be expected); this finding suggests that the introduction of a vehicle-to-pedestrian interface was effective. Worth mentioning are the different placements of the two AR interfaces; according to participants' preferences, E significantly outperformed F for most of the features. The least significant differences were obtained for cognitive workload: only S was judged significantly less demanding in terms of mental effort than B, M, F and E. In fact, S was also ranked frequently among the two best interfaces, largely overcoming E in features regarding immediateness (like ease of use and intuitiveness).

Very similar considerations can be made based on rankings assigned in the AIQ: by performing a consistency check between the two observations (AIQ, PEQ) of the same features, all reached significant levels with high correlation (ρ ≥ .), except ambiguity (p = 0.) and latency (p = 0.). Like for mental demand, none of the NASA-TLX indicators (normalized to interface B on a per-sample basis) were found to be significant, with the exception of S, which was considered less demanding than B (p = 0.).

The best and second-best interfaces along each dimension are highlighted in Table II. It can be concluded that, while both S and E stood out from the other interfaces, S required a lower effort from the user. Comparing the AR interfaces, participants were more effective in completing the crossing task (higher efficiency, lower latency) when using E. The blue arrow included in E could justify its significantly higher ranking in terms of visibility. Importantly, both S and P were deemed less visible than the other interfaces (as expected), but for S this aspect was not considered detrimental to safety.

TABLE II
Rankings and p-values for pairwise comparisons. Aggregated ranking is calculated by combining individual feature rankings (overall excluded). Remaining rankings are obtained directly from PEQ answers. Inverted items are marked with *.

               B  S  P  M  F  E
Overall        6  2  3  4  5  1
Safety         6  2  3  4  5  1
Familiarity    1  3  4  2  5  6
Intuitiveness  6  1  3  2  5  4
Ambiguity*     6  1  2  3  5  4
Visibility     6  4  5  3  2  1
Aggregated     6  2  3  4  5  1

Fig. 3. Results of the UEQ-S: overall, as well as pragmatic and hedonic quality dimensions shown (p-values reported for significant pairwise comparisons).

Moving to the other dimensions characterizing the user experience that were investigated more in depth through the UEQ-S, as reported in Fig. 3 all the interfaces performed significantly better than B, whereas M was significantly the worst interface (although not significantly with respect to F). Analyzing separately the dimensions addressed by the questionnaire, S was considered significantly more pragmatic than the other interfaces, except E; moreover, focusing on the AR interfaces, only E was judged better than B in this respect. Concerning the hedonic dimension, M was judged worse than both E and P, E overcame F, and P was rated better than M.

With respect to overall usability, according to SUS results all the interfaces were rated from "good" to "excellent", and no significant differences were observed (p = 0.): B (M = 84., SD = 10.), S (M = 90., SD = 6.), P (M = 84., SD = 17.), M (M = 79., SD = 10.), F (M = 79., SD = 13.), E (M = 76., SD = 23.).

Items of the AIQ pertaining to trust and safety are reported in Table III. Based on the TS, all the interfaces were found to be more trustworthy than B; furthermore, S, P, and E scored significantly better than F. The same trend was observed for perceived safety. An interesting aspect to analyze is hesitancy ("I felt the need to wait for the vehicle to stop before starting to cross"): interfaces that provide an instantaneous indication of the predicted stop position (M and E) scored significantly better than those that do not offer such feedback (B, P, F). The importance of providing the VRU with some explicit feedback that the vehicle has recognized him or her (not necessarily indicating the predicted stop position) is confirmed by the fact that F did not reach significance when compared with B.

TABLE III
Trust and safety. Means (standard deviations), and p-values for pairwise comparisons. Inverted items are marked with *.

              B          S          P          M          F          E
Trust         27.0(7.1)  37.3(3.2)  34.3(7.6)  33.4(6.6)  33.2(6.2)  36.3(6.1)
Safety        2.3(1.0)   4.2(1.2)   3.9(1.1)   3.5(0.3)   3.4(0.6)   4.2(0.6)
Hesitancy*    3.3(0.8)   2.2(0.4)   3.1(0.0)   2.1(1.0)   2.8(1.0)   2.1(1.4)
Cautiousness  4.9(0.6)   5.0(0.7)   4.2(0.9)   4.5(1.3)   4.8(0.9)   4.8(0.4)
Surprisingly, S obtained intermediate scores for this factor, performing significantly better than both B and P (though not differently than M, F and E). Related to the aspects above is the item regarding cautiousness, intended as the participants' predisposition to cross when vehicles were still far away ("I felt safe to cross when FAVs were distant"): as one could expect, P performed significantly worse than other interfaces (S, F, E, and B), whereas S scored unexpectedly better than M. Based on open feedback, this result could be due to the fact that the interface induced the participants not to look at the vehicle but just at the indications shown close to their feet. Results concerning rashness, intended as the participants' predisposition to cross when vehicles were close to them, are also reported, although significance was found only for the comparisons with B.

Fig. 4. Decision time (lower is better): overall, and for the various distances considered (p-values reported for significant pairwise comparisons).

Remaining aspects analyzed through subjective metrics did not show significant differences among the various interfaces.
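The pairwise significances reported throughout this section follow the pipeline of Section IV-E: Friedman's omnibus test on the repeated measures, followed by Conover's post-hoc test. As a rough sketch of the omnibus step only (the helper name and data layout are ours; the paper ran the analysis in MS Excel with the Real Statistics add-on, and Conover's post-hoc is available in third-party packages such as scikit-posthocs), assuming per-participant scores are available as lists:

```python
from scipy.stats import friedmanchisquare

def compare_interfaces(scores_by_interface):
    """Friedman's omnibus test for repeated measures across interfaces.

    `scores_by_interface` maps an interface label to the list of
    per-participant scores (same participant order in every list).
    Returns the chi-square statistic and its p-value; a significant
    p-value warrants pairwise post-hoc comparisons (e.g., Conover's).
    """
    stat, p = friedmanchisquare(*scores_by_interface.values())
    return stat, p
```

For instance, with 12 participants and a condition that is consistently ranked best, the test reports a small p-value, licensing the post-hoc step.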
2) Objective results:
Data analysis on CT highlighted no significant difference among the various interfaces (p = 0.). Despite previous speculations, this result is not surprising, since it may suggest that, once the VRU has concluded the negotiation with the FAV (decided to cross or not), the urgency to complete the task and the trust in the FAV are predominant in his or her behavior compared to other possible information provided by the vehicle's interface.

More insights can be obtained by focusing on the distribution of DT in Fig. 4. Considering all the distances together (overall), only P did not score significantly better than B, and both of them showed significantly higher DT values compared to the other interfaces (S, M, F, E). Furthermore, M performed significantly worse than both the AR interfaces (F, E) as well as S, and no difference was spotted among them (S, F, E). Further considerations can be made by analyzing results for the various distances. Participants showed significantly lower DTs for M compared with B at 100 m and 45 m, but not at 60 m. Differences with respect to other vehicle-mounted interfaces tend to reduce at higher distances. This behavior can be observed for M against S (100 m) and for M against P (100 m, 60 m). The only interface able to outperform M at any distance was E. F was able to consistently perform better than M just at 60 m; however, at that distance, F was significantly worse than E. At intermediate distances, the only interface capable of consistently offering significantly lower DTs was E. A significantly high correlation (ρ = 0., p = 0.) was found between the overall DT and perceived latency (AIQ), confirming agreement between objective and subjective results.

Another indicator that is worth discussing is the speed at which the FAV was moving when the pedestrian started crossing the road (SAC). Even though this speed is undoubtedly linked to DT, it is also influenced by other factors, as stated in [11]. It is also less biased than other metrics like, e.g., the DAC, which depends on the vehicle distance class. As shown in Fig. 5, considering all the distances (overall), the rank from best to worst is as follows: E, F, S, M, B, and P. All pairwise comparisons were significant, except E-F (p = 0.) and B-M (p = 0.). Importantly, the fact that SAC for P was even lower than for B confirms the subjective finding concerning cautiousness. Furthermore, P is the only interface for which SACs for the three distances were not statistically different (p = 0.). For all the other interfaces, SAC values are significantly higher at 100 m than both at 60 m and 45 m (p ≤ .). Only F had significantly degraded performance at small distances (60 m with respect to 45 m).

Fig. 5. Speed at crossing (higher is better): overall, and for the various distances considered (p-values reported for significant pairwise comparisons).

Coming back to Fig. 5, digging into the behavior of SAC for the different distances, it is possible to note that the B-M, B-S, and S-M pairs gain significance merely at small distances (45 m). On the contrary, the B-F, B-P, and F-P pairs gain significance at medium and large distances (60 m and 100 m). A possible interpretation for these findings could be that the closer the vehicle is, the more important it is for the VRU to have a clear confirmation that it has been detected, and this relevance fades out as the distance grows. Moreover, although the trend for M-F could appear contradictory, it could be easily explained by the fact that participants tended to take a confirmatory look at the vehicle at small distances after receiving the feedback from the road interface, hence delaying the time they actually started the crossing. Another interesting aspect is that both the AR interfaces performed better than the other interfaces (S included) at 100 m and 60 m; then, at 45 m, F fell behind both S and E, thus providing further evidence of the importance of providing VRUs with a detection feedback, especially at small distances. A possible explanation for this result could be that AR interfaces are characterized by a higher visibility, as confirmed by a correlation analysis with PEQ subjective visibility: ρ = −. (p = 0.), ρ = −. (p = 0.), ρ = −. (p = 0.). It is worth observing that also SACs at 45 m were found to be significantly correlated (ρ = 0., p = 0.) with the AIQ rashness item, suggesting that the latter could be a good metric for subjective observations.

Concerning aborted crossings and collisions, no significant differences were found among the various interfaces. Finally, regarding efficiency (i.e., the number of valid crossings normalized to the simulation time), Fig. 6 indicates that the most inefficient interface was P, which even resulted statistically equivalent to B. S, M, and E were found to be comparable too, whereas E appeared significantly more efficient than F.

Fig. 6. Objective efficiency (higher is better): no distinction was made based on distances (p-values reported for significant pairwise comparisons).

Objective efficiency was significantly correlated with the AIQ subjective efficiency (ρ = 0., p = 0.).

C. Key findings
Hereafter, the key aspects of each interface as emerging from the collected measurements are summarized, considering also the open feedback collected through the interviews.
1) Smile (S):
It proved overall to be one of the two best interfaces. It is based on a very simple concept, which results in high efficiency and intuitiveness, and low mental effort. However, at large distances it is poorly visible (all the participants mentioned this issue); furthermore, part of the participants would have also preferred it to include an additional state ("frown") to signal when the FAV has seen them (horn activated) but cannot stop safely, and some of them found it not ideal that the predicted stop position is not indicated.
2) Projected (P):
It suffers as well from visibility issues; however, its main drawback is the limited efficiency, which can be attributed to its semantics. Even though it is considered very pleasant (hedonic quality), participants tend to wait for the green indication (which is shown only when the vehicle has come to a full stop); according to part of the participants, the red crosswalk sign is not a clear indication that the FAV has actually detected the pedestrian (they suggested using the yellow color to that purpose, mimicking traffic lights). Lastly, some of the participants found the vehicle-mounted LED panel useless or did not even notice it at all.
3) Smart roads (M):
Even though it was judged as very familiar and visible, overall it did not score well compared to other interfaces. The main negative aspect, pinpointed by part of the participants, was the fact that, differently from all the other interfaces, it is not visually entangled with the FAV: hence, once a change in the interface is noticed, participants tend to double-check the incoming vehicle to ensure that the interface feedback is coherent with the vehicle dynamics (i.e., it is braking). Moreover, some of the participants would have preferred an additional state indicating whether the vehicle was successfully connected to and communicating with the smart road, in order to distinguish the case of faulty pedestrian detection from faulty connections (which were not simulated).
4) Safe roads (F):
Although, based on objective metrics, it scored similarly to the extended version, from the subjective viewpoint it was often rated worse than the other interfaces. This result could be explained by the fact that, the two AR interfaces being very similar to each other and at the same time very different from all the others, participants may have been subject to psychological biases, such as the Weber–Fechner law [43] and the distinction bias [44], making them overestimate the actual differences. The main objective difference was in the SAC at small distances, which was a clear indication of the greater safety provided by the extended version compared to the original design. Considering also the poor ratings concerning mental workload, this interface is not particularly appealing compared to the considered alternatives. Notwithstanding, part of the participants expressed their appreciation for the hints provided by the green/red regions drawn on the road.
5) Safe roads extended (E):
It was considered one of the two best interfaces, although more complex, mentally demanding, and less intuitive compared to the first design. Participants appreciated in particular its high visibility and the visual connection to the vehicle, as well as the indication used to simultaneously provide the pedestrian detection feedback and the predicted stop position. It was the only interface deemed capable of fostering trust at any distance. It is worth remarking that part of the participants would have preferred fewer indications (only one participant would have liked more information), and some of them considered the arrows showing the vehicle's dynamics (yellow and red) not fundamental; others found the red tick useful, but not the yellow arrow (as it provides duplicated information).
D. Considerations and remarks
It is worth observing that, despite the efforts put into carrying out a fair and representative comparison, the results and commentary reported above can be considered valid only for the scenario configured for the experiments. Different settings, encompassing, e.g., multi-lane and/or two-way traffic, could introduce important challenges for the considered interfaces, which would require further investigation. For instance, one could validate speculations that the performance of interfaces providing at a glance a feedback about safe zones for the whole road width (like M, F, E) may be boosted in such a scenario. Similarly, one could study whether having multiple agents (either avatars controlled by the simulation or other user-controlled pedestrians) crossing the road simultaneously with the user-controlled pedestrian may lead to different communication patterns between the FAVs and the VRUs. In fact, this configuration could be extremely penalizing for interfaces conceived only for one-to-one communication like, e.g., S (as a VRU could hardly determine whether the smile was directed at him or her or at other agents). However, S could possibly be implemented in practice not as a vehicle-mounted interface but as an AR one: thus, the design would become similar to experimented solutions in which each VRU gets a different feedback for the same vehicle. In this case, future experiments should focus on digging into the possible impact of current technological limitations of wearable AR devices, e.g., related to their limited field of view. Still concerning the evaluation perspective, it shall be observed that the sample size of the user study reported in this work and its characteristics (e.g., the fact that participants were all Italians and accustomed to crossing in unregulated conditions, etc.) may not be fully representative of all the potential end-user categories.
Hence, future works should also consider cultural factors and personal behaviors of study participants, since it can be easily expected that they could have a non-negligible impact on users' preferences.

VI. Conclusions
In this paper, a careful selection of the most promising state-of-the-art and newly proposed interfaces for FAV-to-VRU interaction were compared through a user study considering a common experimental scenario represented by pedestrian crossing. The comparison was performed in an immersive VE, in which a single, user-controlled VRU was requested to cross a one-way road under non-regulated conditions.

Results obtained using subjective and objective metrics outlined the importance of providing the users with a clear feedback about the pedestrian detection process, and of ensuring high visibility. Moreover, the study proved the potential of AR-based interfaces in supporting effective vehicle-to-pedestrian communication (and, to the best of the authors' knowledge, represents the most extensive analysis of such interfaces). In particular, one of the newly proposed AR-based interfaces (namely, E) outperformed the other designs with respect to the above requirements; it was also characterized by a higher efficiency, and it was the only interface judged capable of inducing in the users a high sense of safety independently of the vehicle distance. However, state-of-the-art designs leveraging anthropomorphic features displayed on vehicle-mounted LED panels (like S) proved to be characterized by a lower cognitive effort and a higher intuitiveness (and ease of use, in general). Despite these findings, none of the considered interfaces stood out for all the analyzed dimensions. Nevertheless, the outcomes of this study provide precious indications that could be used to shape the interaction paradigms and technologies of future FAV ecosystems.

Acknowledgements
The authors would like to thank Edoardo Demuru for his contribution to the system implementation. This research was partially supported by the VR@POLITO initiative.

References

[1] Y. Qiao, Y. Cheng, J. Yang, J. Liu, and N. Kato, "A mobility analytical framework for big mobile data in densely populated area," IEEE Trans. on Vehic. Techn., vol. 66, no. 2, pp. 1443–1455, 2016.
[2] "Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles," in SAE Technical Paper, J3016_201806, 2018.
[3] Y. Xun, J. Liu, N. Kato, Y. Fang, and Y. Zhang, "Automobile driver fingerprinting: A new machine learning based authentication scheme," IEEE Trans. on Ind. Informatics, vol. 16, no. 2, pp. 1417–1426, 2019.
[4] J. Wang, J. Liu, and N. Kato, "Networking and communications in autonomous driving: A survey," IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1243–1274, 2018.
[5] L. Morra, F. Lamberti, F. G. Pratticò, S. La Rosa, and P. Montuschi, "Building trust in autonomous vehicles: Role of virtual reality driving simulators in HMI design," IEEE Trans. on Vehic. Techn., vol. 68, no. 10, 2019.
[6] A. Rasouli and J. K. Tsotsos, "Autonomous vehicles that interact with pedestrians: A survey of theory and practice," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 3, pp. 900–918, 2019.
[7] C.-M. Chang, K. Toda, D. Sakamoto, and T. Igarashi, "Eyes on a car: An interface design for communication between an autonomous car and a pedestrian," in Proc. 9th Int. Conf. on Automotive User Interfaces and Interactive Vehicular Applications, 2017, pp. 65–73.
[8] Y. Li, M. Dikmen, T. G. Hussein, Y. Wang, and C. Burns, "To cross or not to cross: Urgency-based external warning displays on autonomous vehicles to improve pedestrian crossing safety," in Proc. of 10th Int. Conf. on Automotive User Interfaces and Interactive Vehicular Applications, 2018, pp. 188–197.
[9] A. Habibovic et al., "Communicating intent of automated vehicles to pedestrians," Frontiers in Psychology, vol. 9, no. 1336, 2018.
[10] T. Lagström and V. M. Lundgren, "AVIP – autonomous vehicles' interaction with pedestrians – An investigation of pedestrian-driver communication and development of a vehicle external interface," Master's thesis, 2016.
[11] A. Löcken, C. Golling, and A. Riener, "How should automated vehicles interact with pedestrians? A comparative analysis of interaction concepts in virtual reality," in Proc. 11th Int. Conf. on Automotive User Interfaces and Interactive Vehicular Applications, 2019, pp. 262–274.
[12] T. T. Nguyen, K. Holländer, M. Hoggenmueller, C. Parker, and M. Tomitsch, "Designing for projection-based communication between autonomous vehicles and pedestrians," in Proc. 11th Int. Conf. on Automotive User Int. and Interactive Vehic. Appl., 2019, pp. 284–294.
[13] S. Kitayama, T. Kondou, H. Ohyabu, M. Hirose, H. Narihiro, and R. Maeda, "Display system for vehicle to pedestrian communication," in SAE Technical Paper, 2017.
[14] E. Florentine, M. A. Ang, S. D. Pendleton, H. Andersen, and M. H. Ang Jr, "Pedestrian notification methods in autonomous vehicles for multi-class mobility-on-demand service," in Proc. 4th Int. Conf. on Human Agent Interaction, 2016, pp. 387–392.
[15] Y. M. Lee et al., "Understanding the messages conveyed by automated vehicles," in Proc. 11th Int. Conf. on Automotive User Interfaces and Interactive Vehicular Applications, 2019, pp. 134–143.
[16] L. Graziano, "AutonoMI autonomous mobility interface," https://vimeo.com/99160686, [Online; accessed 1/30/2021].
[17] M.-P. Böckle, A. P. Brenden, M. Klingegård, A. Habibovic, and M. Bout, "SAV2P: Exploring the impact of an interface for shared automated vehicles on pedestrians' experience," in Proc. of 9th Int. Conf. on Automotive User Interfaces and Interactive Vehicular Applications, 2017, pp. 136–140.
[18] S. Deb, L. J. Strawderman, and D. W. Carruth, "Investigating pedestrian suggestions for external features on fully autonomous vehicles: A virtual reality experiment," Transportation Research Part F: Traffic Psychology and Behaviour, vol. 59, 2018.
[19] M. Matthews, G. Chowdhary, and E. Kieson, "Intent communication between autonomous vehicles and pedestrians," 2017. [Online]. Available: https://arxiv.org/abs/1708.07123
[20] K. De Clercq, A. Dietrich, J. P. Núñez Velasco, J. de Winter, and R. Happee, "External human-machine interfaces on automated vehicles: Effects on pedestrian crossing decisions," Human Factors, vol. 61, no. 8, 2019.
[21] E. Ackerman, "Drive.ai solves autonomous cars' communication problem," IEEE Spectrum, 2016. [Online]. Available: https://tinyurl.com/ycjw4ro2
[22] U. Gruenefeld, S. Weiß, A. Löcken, I. Virgilio, A. L. Kun, and S. Boll, "VRoad: Gesture-based interaction between pedestrians and automated vehicles in virtual reality," in Proc. 11th Int. Conf. on Automotive User Interfaces and Interactive Vehicular Applications Adjunct, 2019.
[23] C. G. Burns, L. Oliveira, P. Thomas, S. Iyer, and S. Birrell, "Pedestrian decision-making responses to external human-machine interface designs for autonomous vehicles," in IEEE Intelligent Vehicles Symp., 2019.
[24] J. Mairs, "Umbrellium develops interactive road crossing that only appears when needed," 2017. [Online]. Available: https://tinyurl.com/y4vng5p8
[25] K. Mahadevan, S. Somanath, and E. Sharlin, "Communicating awareness and intent in autonomous vehicle-pedestrian interaction," in Proc. CHI Conf. on Human Factors in Computing Systems, 2018, pp. 1–12.
[26] H. Nishiyama, T. Ngo, S. Oiyama, and N. Kato, "Relay by smart device: Innovative communications for efficient information sharing among vehicles and pedestrians," IEEE Vehicular Technology Magazine, vol. 10, no. 4, pp. 54–62, 2015.
[27] L. Cancedda, A. Cannavò, G. Garofalo, F. Lamberti, P. Montuschi, and G. Paravati, "Mixed Reality-based user interaction feedback for a hand-controlled interface targeted to robot teleoperation," in International Conference on Augmented Reality, Virtual Reality and Computer Graphics, 2017, pp. 447–463.
[28] M. Hesenius, I. Börsting, O. Meyer, and V. Gruhn, "Don't panic!: Guiding pedestrians in autonomous traffic with augmented reality," in Proc. 20th Int. Conf. on HCI with Mobile Devices and Services, 2018.
[29] S. Shah, D. Dey, C. Lovett, and A. Kapoor, "AirSim: High-fidelity visual and physical simulation for autonomous vehicles," in Field and Service Robotics, 2017, pp. 621–635.
[30] A. Pillai, "Virtual reality based study to analyse pedestrian attitude towards autonomous vehicles," 2017.
[31] G. Johansson and K. Rumar, "Drivers' brake reaction times,"
Proc. 11th Int. Conf. on Automotive UserInterfaces and Interactive Vehicular Applications Adjunct , 2019.[23] C. G. Burns, L. Oliveira, P. Thomas, S. Iyer, and S. Birrell, “Pedestriandecision-making responses to external human-machine interface designsfor autonomous vehicles,” in
IEEE Intelligent Vehicles Symp. , 2019.[24] J. Mairs, “Umbrellium develops interactive road crossing thatonly appears when needed,” 2017. [Online]. Available: https://tinyurl.com/y4vng5p8[25] K. Mahadevan, S. Somanath, and E. Sharlin, “Communicating awarenessand intent in autonomous vehicle-pedestrian interaction,” in
Proc. CHIConf. on Human Factors in Computing Systems , 2018, pp. 1–12.[26] H. Nishiyama, T. Ngo, S. Oiyama, and N. Kato, “Relay by smartdevice: Innovative communications for efficient information sharingamong vehicles and pedestrians,”
IEEE Vehicular Technology Magazine ,vol. 10, no. 4, pp. 54–62, 2015.[27] L. Cancedda, A. Cannav`o, G. Garofalo, F. Lamberti, P. Montuschi, andG. Paravati, “Mixed Reality-based user interaction feedback for a hand-controlled interface targeted to robot teleoperation,” in
InternationalConference on Augmented Reality, Virtual Reality and Computer Graph-ics , 2017, pp. 447–463.[28] M. Hesenius, I. B¨orsting, O. Meyer, and V. Gruhn, “Don’t panic!:Guiding pedestrians in autonomous traffic with augmented reality,” in
Proc. 20th Int. Conf. on HCI with Mobile Devices and Services , 2018.[29] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “Airsim: High-fidelity visualand physical simulation for autonomous vehicles,” in
Field and ServiceRobotics , 2017, pp. 621–635.[30] A. Pillai, “Virtual reality based study to analyse pedestrian attitudetowards autonomous vehicles,” 2017.[31] G. Johansson and K. Rumar, “Drivers’ brake reaction times,”
HumanFactors , vol. 13, no. 1, pp. 23–27, 1971.[32] I. Doric et al. , “A novel approach for researching crossing behavior andrisk acceptance: The pedestrian simulator,” in
Proc. 8th Int. Conf. onAutomotive User Int. and Interactive Vehic. Appl. , 2016, pp. 39–44.[33] M. Beggiato, C. Witzlack, and J. F. Krems, “Gap acceptance andtime-to-arrival estimates as basis for informal communication betweenpedestrians and vehicles,” in
Proc. 9th Int. Conf. on Automotive UserInterfaces and Interactive Vehicular Applications , 2017, pp. 50–57.[34] A. Cannav`o, D. Calandra, F. G. Prattic`o, V. Gatteschi, and F. Lamberti,“An evaluation testbed for locomotion in virtual reality,”
IEEE Trans.Vis. Comput. Graphics , vol. 27, pp. 1871–1889, 2021.[35] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal,“Simulator sickness questionnaire: An enhanced method for quantifyingsimulator sickness,”
The Int. Journal of Aviat. Psychology , vol. 3, 1993.[36] J.-Y. Jian, A. M. Bisantz, and C. G. Drury, “Foundations for anempirically determined scale of trust in automated systems,”
Int. J.Cognitive Ergonomics , vol. 4, no. 1, pp. 53–71, 2000.[37] J. Brooke, “SUS – A quick and dirty usability scale,”
Usability Evalu-ation in Industry , 1996.[38] S. G. Hart and L. E. Staveland, “Development of NASA-TLX (TaskLoad Index): Results of empirical and theoretical research,” in
Advancesin Psychology , 1988, vol. 52, pp. 139–183.[39] M. Schrepp, A. Hinderks, and J. Thomaschewski, “Design and evalua-tion of a short version of the User Experience Questionnaire (UEQ-S),”
Int. J. Interact. Multimedia Artif. Int. , vol. 4, no. 6, pp. 103–108, 2017. [40] R. S. Kalawsky, “VRUSE – A computerised diagnostic tool for us-ability evaluation of virtual/synthetic environment systems,”
AppliedErgonomics , vol. 30, no. 1, pp. 11–25, 1999.[41] T. W. Schubert, “The sense of presence in virtual environments: A three-component scale measuring spatial presence, involvement, and realness.”
Zeitschrift f¨ur Medienpsychologie , vol. 15, no. 2, pp. 69–71, 2003.[42] G. Erd´elyi, L. Piras, and J. Rothe, “Bucklin voting is broadly resistantto control,” 2010. [Online]. Available: https://arxiv.org/abs/1005.4115[43] S. Dehaene, “The neural basis of the Weber–Fechner law: A logarithmicmental number line,”
Trends in Cognitive Sciences , vol. 7, no. 4, pp.145–147, 2003.[44] C. K. Hsee and J. Zhang, “Distinction bias: Misprediction and mischoicedue to joint evaluation.”
Journal of Personality and Social Psychology ,vol. 86, no. 5, p. 680, 2004.
F. Gabriele Pratticò received the M.Sc. degree in computer engineering from Politecnico di Torino, Turin, Italy, in 2017. Currently, he is a Ph.D. student at the Dipartimento di Automatica e Informatica of Politecnico di Torino, where he carries out research in the areas of extended reality, human-machine interaction, educational and training systems, and user experience design.
Fabrizio Lamberti is a Full Professor with the Dipartimento di Automatica e Informatica of Politecnico di Torino, Turin, Italy, where he is responsible for the VR@POLITO hub. His research interests include computer graphics, human-machine interaction, and intelligent computing. He is serving as an Associate Editor for IEEE Transactions on Computers, IEEE Transactions on Learning Technologies, IEEE Transactions on Consumer Electronics, IEEE Consumer Electronics Magazine, and the International Journal of Human-Computer Studies.
Alberto Cannavò received the B.Sc. degree from the University of Messina, Italy, in 2013. He then received the M.Sc. and Ph.D. degrees in computer engineering from Politecnico di Torino, Italy, in 2015 and 2020, respectively. Currently, he is a Postdoctoral Fellow at the Dipartimento di Automatica e Informatica of Politecnico di Torino. His fields of interest include computer graphics and human-machine interaction.
Lia Morra received the M.Sc. and Ph.D. degrees in computer engineering from Politecnico di Torino, Turin, Italy, in 2002 and 2006, respectively. She is currently a Senior Postdoctoral Fellow with the Dipartimento di Automatica e Informatica of Politecnico di Torino. Her research interests include computer vision, pattern recognition, and machine learning.