Artificial Intelligence Methods in In-Cabin Use Cases: A Survey
Yao Rong*, Chao Han†, Christian Hellert†, Antje Loyal†, Enkelejda Kasneci*
*Human-Computer Interaction, University of Tübingen, Germany
{yao.rong, enkelejda.kasneci}@uni-tuebingen.de
†Continental Automotive GmbH, Germany
{chao.han, christian.hellert, antje.loyal}@continental-corporation.com

Abstract—As interest in autonomous driving increases, efforts are being made to meet the requirements for the high-level automation of vehicles. In this context, the functionality inside the vehicle cabin plays a key role in ensuring a safe and pleasant journey for driver and passengers alike. At the same time, recent advances in the field of artificial intelligence (AI) have enabled a whole range of new applications and assistance systems that solve automated problems in the vehicle cabin. This paper presents a thorough survey of existing work that utilizes AI methods for use-cases inside the driving cabin, focusing in particular on application scenarios related to (1) driving safety and (2) driving comfort. Results from the surveyed works show that AI technology has a promising future in tackling in-cabin tasks within the context of autonomous driving.
I. INTRODUCTION
Autonomous driving has been among the most widely discussed topics of the recent decade. As a new transportation technology, the autonomous vehicle is designed to surpass human drivers in many aspects, particularly in safety. However, in order to realize fully autonomous driving, different levels of autonomy are planned to be achieved successively. According to the Taxonomy and Definitions for Terms Related to On-Road Motor Vehicle Automated Driving Systems by SAE International [1], there are six different levels leading up to full autonomy, with Level 0 representing "fully manual" and Level 5 representing "fully autonomous" driving. Vehicles functioning between Level 0 and Level 5 are all regarded as semi-autonomous vehicles.

Current research and product development mainly targets Level 3 (L3) and Level 4 (L4). For L3, the presence of the driver is required to resolve driving situations that are not manageable by automation. The task for autonomous vehicles is to handle driving under certain conditions, such as driving on a highway or in a city traffic jam. Many vehicle manufacturers are now focusing on incorporating L3 automated systems into their products, e.g., the Audi
Traffic Jam Pilot [2]. From L4 on, takeover requests to a human driver are no longer necessary. The vehicle is required to analyze driving situations and make informed decisions, such as when to change lanes, turn, accelerate, or brake. Even in the case of a device failure, the autonomous system should be able to handle these actions safely and independently. In L4, however, manual intervention remains an option for particularly challenging circumstances, such as a system failure. L5 vehicles can operate under all conditions and additionally provide a more refined, higher quality of service.

Currently, autonomous vehicles have achieved L3 and are progressing towards L4. Human drivers are still the main decision makers and supervise the entire system. Consequently, one aspect of ongoing research in L3 is to find the optimal way of assisting human drivers and to provide a smooth and safe transition from human to autonomous driving and back again. Driver-related activities within the vehicle's cabin should be monitored and analyzed by the system, not only to achieve a safe and comfortable drive, but also to ensure the system's ability to smoothly handle a takeover situation. Most of the tasks in autonomous driving are related to "perception". As human beings, we receive information mostly through vision and speech; we analyze this information and respond accordingly to different events. To endow vehicles with the same capability of understanding, researchers are mounting AI technology on autonomous vehicles to automate the perception of the surroundings. Additionally, emerging technologies such as Augmented/Virtual Reality (AR/VR) have enabled new ways of personalized driving assistance, information, navigation and entertainment [3]–[5]. Given the broad application of AI technologies inside the driving cabin, we perform a thorough survey of existing studies conducted by researchers and system developers. Our motivation is to identify and highlight the similarities and differences within existing works in order to envision new applications. In this context, similarity means the identification of different applications using the same input data modality or algorithms, which often indicates an emerging trend of research. Moreover, resource efficiency can be improved if one input feature (or one piece of hardware) can be used in different applications. A diversity of algorithms can be used to solve similar problems; reviewing and referring to existing works can serve as an inspiration for readers seeking concrete solutions for specific tasks in autonomous driving. We aim to provide a clear overview of commonly used hardware and algorithms, with a strong focus on SAE L3 and L4, which will be discussed with an emphasis on considerations for safety and comfort.

The paper is organized as follows: in Section II, we discuss different applications that contribute to safety and the employed methodological approaches. Section III introduces tasks aimed at comfortable driving. In the last section, we summarize all the surveyed works and provide a brief outlook.
TABLE I
USE-CASES FOR DRIVING SAFETY. FOR EACH USE-CASE WE SUMMARIZE THE TYPICALLY EMPLOYED INPUT FEATURES AND THE AI METHODOLOGY. ROUND BRACKETS MARK THE SOURCE OF THE FEATURE: D = DRIVER, V = VEHICLE, O = OUTSIDE/ROAD VIEW.

Use-case | Feature | Method | Reference

Driver Status Monitoring:
  Emotion detection | physiological information (D); acoustic signals (D); image of driver (D) | FFNN, CNN; SVM; Fuzzy Logic System; GMM (regression model) | [15]–[22]
  Fatigue detection | eyelid movement (D); mouth movement (D); head posture (D); physiological information (D); vehicle dynamics (V) | FFNN; SVM; Fuzzy Logic System | [24], [25], [29], [30]
  Distraction detection | image of driver and road (D&O); physiological information (D); head posture (D); vehicle dynamics (V); driver behavior (D) | Semi-Supervised Learning; SVM, Random Forests; Maximal Information Coefficient; CNN; GMM (preprocessing) | [31], [36], [37], [39]–[44]
  Attention detection | eye gaze (D); head posture (D); full images of driver (D) | 3D CNN | [32]–[34]

Driving Assistance:
  Driver intention analysis | vehicle position and dynamics (V); image of driver and road (D&O); head posture (D) | SVM, Random Forest; GMM (regression model); RNN/LSTM; HMM; 3D CNN | [45]–[52]
  Traffic hazards warning | head posture (D); vehicle dynamics (V); image of road (O) | Fuzzy Logic System | [54]

Takeover Readiness:
  Takeover readiness evaluation | vehicle dynamics (V); eye gaze (D); driver behavior (D) | SVM, KNN | [60]
II. IN-CABIN USE-CASES FOR DRIVING SAFETY
According to the National Highway Traffic Safety Administration (NHTSA) in the USA, nearly all serious accidents are caused by errors behind the wheel [6]. One important task for an autonomous driving system is to ensure the safety of the driver, the passengers, and other vehicles and pedestrians on the road. Since SAE L3 and L4 require a driver's presence, the system is responsible for monitoring the driver. For instance, the system needs to assess whether or not the driver is in a proper state for driving and to assist the driver in the decision-making process. Table I presents a short overview of the use-cases discussed in the following sections.
A. Driver status monitoring
Various Driver Monitoring Systems (DMS) have been developed over the past few years. Due to the rapid development of AI technology, some mature systems are already available on the market, for example, Seeing Machines [7], Valeo Driver Monitoring [8], and the Smart Eye Driver Monitoring System [9]. These systems are usually based on image information from a camera mounted in front of the driver. They infer information about the driver from an analysis of the driver's facial expression, eye gaze, or head posture. Physiological signals, such as heart rate and skin temperature, can also contain valuable information about the driver. Utilizing this information is helpful when gauging the driver's vigilance, emotions, and level of attention or distraction.
1) Emotion detection:
The emotional status of a driver can heavily influence the decision-making process and overall behavior on the road. It is therefore important to analyze the emotional status of the driver and process this information accordingly within the automated system. In particular, "aggressive driving", as defined by NHTSA, has been researched for decades regarding its negative influence on road safety [10]. Drivers often respond to aggressive acts by another driver with anger and mirrored aggression [11]. Due to this common response, it is important to monitor driver emotions periodically. Automated recognition of a driver's emotional state can capture warnings of aggressive or distracted driving due to "road rage" before behavior escalates, resulting in a safer driving experience.

Since emotions correlate strongly with facial expressions, automated methods for emotion recognition based on images have been the focus of research over the last two decades. A few of these approaches [12], [13] use Cohn and Kanade's dataset [14], which contains a large number of facial image sequences from different people. [13] proposes a system that first locates the face in the image and then classifies the emotions based on the Gabor magnitude representations of the located faces. An approach with AdaSVM provides the best performance in this work: Gabor features chosen by AdaBoost were used as the training input for a Support Vector Machine (SVM) classifier. [12] uses Local Binary Patterns (LBP) as the discriminative features rather than Gabor features, which allows for a very fast feature extraction; similarly, an SVM is employed as the classifier for emotion recognition.

When it comes to in-cabin driver emotion detection, images, speech and physiological signals are often used. An emotion estimation system named "Affectiva" [22], which is also applied in automotive applications, uses driver facial images and speech signals. Most of the research work on this topic focuses on physiological signals due to their suitability and accuracy. In [15]–[17], biopotentials are measured by various medical techniques: electromyogram (EMG) in [16], electrocardiogram (ECG) in [16], [17], electroencephalogram (EEG) in [15] and electrodermal activity (EDA) in [16], [17]. Besides biopotentials, skin temperature is also used in [17]–[19], as is respiration in [16], [18] and heart rate in [19]. In addition to physiological features, a driver's acoustic signals are processed for the same purpose in [20], [21]. Speech may not offer as robust a result as biosignals, but the acquisition of acoustic signals is simple and unobtrusive.

A diverse selection of emotions is required to effectively train machine learning models. Over the last seven years, a massive amount of video and audio data from all over the world has been collected for the emotion AI system [22]. In [16], [18], [19], data is recorded while different affective behaviors are elicited from drivers in simulated driving scenarios in the lab. In [20], however, real-world speech clips are used, and a publicly available speech database called Emo-DB [23] is used in [21].

With the help of large amounts of real-world data, very deep Convolutional Neural Networks (CNNs) are trained for the classification of seven different emotions [22]. In [16], [17], four different classes of emotion (excited, relaxed, angry and sad) are detected by Feed-Forward Neural Networks (FFNNs). [17] uses a cellular neural network, while [16] combines an FFNN with fuzzy inference systems.
[19] also uses FFNNs but trains them with different optimizers: the Marquardt Backpropagation (MBP) and Resilient Backpropagation (RBP) algorithms. The best result, 91.9% accuracy across five different emotional states, is achieved by RBP. The authors of [18] propose a novel latent variable model and also introduce a temporal state into the model; training this model is similar to training a Gaussian Mixture Model (GMM). In [20], [21], audio streams are used as system inputs, from which acoustic features like speech intensity, pitch, and Mel-Frequency Cepstral Coefficients (MFCC) are extracted. An SVM and a Bayesian Quadratic Discriminant Classifier are trained in [20] and [21], respectively. Moreover, [20] uses speech enhancement to resist the influence of a noisy background, and shows that including gender information results in better overall recognition.
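To make the acoustic pipeline above concrete, the following sketch extracts MFCC and intensity statistics from speech clips and trains an SVM, in the spirit of [20], [21]. It is a minimal illustration, not the original implementations; the synthetic clips stand in for recorded utterances, and the feature set is deliberately reduced.

```python
import numpy as np
import librosa                      # audio feature extraction
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SR = 16000

def acoustic_features(y, sr=SR, n_mfcc=13):
    """Clip-level descriptor: MFCC and intensity (RMS) statistics over time."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    rms = librosa.feature.rms(y=y)                          # intensity proxy
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [rms.mean(), rms.std()]])

# Synthetic stand-ins for labeled speech clips (1 s each); a real system
# would load recorded driver utterances instead.
rng = np.random.default_rng(0)
clips = [rng.normal(size=SR).astype(np.float32) for _ in range(8)]
labels = ["angry", "neutral"] * 4

X = np.stack([acoustic_features(y) for y in clips])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X[:2]))
```

Averaging the frame-wise features over time yields a fixed-length descriptor per clip, which is what shallow classifiers such as the SVM in [20] require.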
2) Fatigue detection:
Drowsy driving greatly impacts the safety of those on the road; it is necessary to remind drivers to rest when fatigue is detected. The most popular feature for measuring fatigue is eyelid movement, particularly the percentage of eyelid closure (PERCLOS) [26]. Other useful information can include facial expressions, physiological information (heart rate) and vehicle data (car speed, steering wheel angle, position in the lane).

For successful fatigue detection, eye metrics are useful [24], [25], [27]–[30]. Such features can be collected simply by using a regular camera mounted in front of the driver. In [24], yawning (mouth movement) is measured along with eye closure; in [30], vehicle data is also proven to be useful. [27] uses the velocity of the eyelid to detect eye blinks for assessing drowsiness. Eye blinks and head movements are used together as input signals of logistic regression models for drowsiness state classification in [28]. [29] compares the detection accuracy obtained using behavioral data (eye and head movement), physiological information and vehicle data.

Different machine learning models can be applied to determine whether or not the driver is exhausted. [24] uses a Fuzzy Expert System to classify the state of the driver, while [25] deploys a binary SVM classifier for detecting open and closed eyes. [29], [30] show that the FFNN is also suitable for measuring levels of drowsiness; notably, the FFNN in [29] can even predict when the driver will reach a given drowsiness level.
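As an illustration of the PERCLOS measure [26], the sketch below computes it over a sliding window of per-frame eye-openness values, assuming an upstream eye-state detector (such as the SVM eye classifier of [25]) provides them; the window length and closure threshold are illustrative values, not taken from the surveyed works.

```python
from collections import deque

class Perclos:
    """Percentage of eyelid closure over a sliding time window."""
    def __init__(self, window_frames=1800, closed_threshold=0.2):
        # 1800 frames ~ 60 s at 30 fps; both values are illustrative.
        self.window = deque(maxlen=window_frames)
        self.closed_threshold = closed_threshold

    def update(self, eye_openness: float) -> float:
        """eye_openness in [0, 1], supplied by an upstream eye-state detector."""
        self.window.append(eye_openness < self.closed_threshold)
        return sum(self.window) / len(self.window)

perclos = Perclos()
for openness in [0.8, 0.1, 0.05, 0.7]:        # dummy per-frame values
    score = perclos.update(openness)
print(f"PERCLOS over window: {score:.2f}")     # fraction of closed frames
```

A rising PERCLOS score over the window is what threshold- or classifier-based systems then map to a drowsiness level.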
3) Distraction detection:
Distraction is another major threat to driving safety, motivating researchers to study activities that often lead to preoccupied driving. According to [35], distraction has four distinct categories: visual, cognitive, auditory, and bio-mechanical. Visual distraction is defined as "eyes-off-the-road" and is comparatively straightforward to detect; in this case, eye gaze is an essential feature. In [31], the proposed method estimates a 3D head pose and a 3D eye gaze direction using a low-cost CCD (charge-coupled device) camera. Estimations are measured with respect to the camera coordinate system; with the rotation matrix from the camera coordinate system to the world coordinate system, the driver's observance of the road can be measured. An SVM classifier is used first for detecting sunglasses; if sunglasses are detected, the estimation relies only on the head pose. [38] proposes a standardized framework for evaluating a system that tracks driver head movements and alerts in case the driver is distracted. Such a standard makes it possible to fairly evaluate different driver head tracking systems. In addition, this framework introduces a ground-truth data acquisition system, the Polhemus Patriot, and takes driver-related information (gender, race, age, etc.) into account. [39] uses eye movements and driving data to classify normal and distracted driving in real time, and shows that the SVM classifier is suitable for such a task.

Compared with visual distraction, cognitive distraction, such as daydreaming or becoming "lost in thought", is harder to detect. Cognitive distraction is also called "mind-off-the-road", indicating a loss of situational awareness. Facial expressions and driving performance reflect this distraction. [36] explores the effect of both distractions with the help of multi-modality features from the CAN bus, a microphone, and cameras recording the road and the driver. Classifiers employ these feature representations to discriminate between different distraction levels. The causes of cognitive distraction are variable; the driver's workload can also impact the cognitive state. To measure workload, [37] proposes a new nonlinear causality detection method called error reduction ratio causality, which identifies the important variables. The variables used here include Skin Conductance Response (SCR), hand temperature and heart rate, as well as GPS position and acceleration recorded from real-world driving. An SVM is trained afterwards to select the right model for the measurement.

[40] studies audio-cognitive distraction: the task for the driver is to count how many times each of several target sounds appears. An eye tracker records eye and head movement data, which is then used to train a Laplacian SVM and a Semi-Supervised Extreme Learning Machine. The study also shows that semi-supervised learning algorithms outperform supervised learning when more unlabeled data is given.
Bio-mechanical distraction refers to adjusting devices manually, for example, adjusting the radio. One solution is to simplify the Human-Machine Interface (HMI) in the cabin, which will be discussed in Section III.

Performing secondary tasks usually causes more than one type of distraction. Distracting secondary tasks include talking on a cell phone or drinking/eating. Deep neural networks, which are very effective in action recognition, can recognize these behaviors. For example, in [43], [44], seven activities are divided into two groups: normal driving (normal driving, right mirror checking, rear mirror checking and left mirror checking) and distraction (using an in-vehicle radio device, texting and answering the mobile phone). The dataset is collected using a Kinect, so the images and the coordinates of the head center and upper body joints are recorded. [44] uses Random Forests (RF), the Maximal Information Coefficient (MIC) and an FFNN as classifiers on the head and body features. [43] only uses images of drivers: the images are first processed by a GMM to segment the driver's body, and are then used for CNN training. The CNN backbones used in the experiments are AlexNet, GoogLeNet, and ResNet50. The best performance is achieved by AlexNet, which also surpasses the result in [44]. In [41], [42], CNNs such as AlexNet, InceptionV3 and BN-Inception are trained in an end-to-end manner; these networks achieve distracting activity recognition with high accuracy.
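The CNN-based recognition in [41]–[43] can be approximated by fine-tuning a pretrained backbone on driver images. The sketch below uses torchvision's AlexNet, one of the backbones reported in [43], with the seven-class activity setup described above; the dummy batch and training hyperparameters are placeholders, not the original experimental configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # four normal-driving and three distraction activities

# Pretrained backbone; only the final classifier layer is replaced
# to output the driver-activity classes.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for preprocessed driver images (224x224 RGB).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"training loss on dummy batch: {loss.item():.3f}")
```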
4) Attention detection:
Another important task for a DMS is to understand where the driver is looking while driving. When an event with high criticality is detected (e.g., a pedestrian crossing the street), the system warns the driver if the driver is not paying attention [61], [62]. This task is one specific use-case of visual attention modeling; visual saliency and gaze are common tools for measuring the attended area.

Eye-tracking glasses can track the precise position of the gaze, but it is impractical for the driver to wear such equipment while driving. In this case, head posture estimation assists with gaze estimation. In [32], a pipeline is proposed: facial feature detection and tracking, (3D) head posture estimation, and gaze region estimation. Besides using handcrafted features such as facial landmarks, [33] proposes a deep CNN for localizing the driver's head and shoulder position in depth images.

It is also possible to predict the focus of attention without using head posture information. For instance, in [34], the raw video, optical flow and semantic segmentation information are fed to a multi-branch 3D CNN for end-to-end training, in order to predict the focus area in the road image. In the future, attention prediction for human drivers may contribute to attention mechanisms for autonomous perception functions.
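A common way to implement the head-posture step of such pipelines is a perspective-n-point (PnP) fit between detected 2D facial landmarks and a generic 3D face model. The sketch below shows this with OpenCV's solvePnP under that assumption; the 3D model points, landmark coordinates and camera intrinsics are illustrative values, and the surveyed works may use different face models entirely.

```python
import numpy as np
import cv2

# Generic 3D face model points in millimeters (nose tip, chin, eye corners,
# mouth corners); an illustrative model, not one from the surveyed works.
model_points = np.array([
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
])

# Corresponding 2D landmarks from a face detector (dummy values).
image_points = np.array([
    [320.0, 240.0], [325.0, 340.0], [250.0, 200.0],
    [390.0, 200.0], [280.0, 300.0], [360.0, 300.0],
])

# Simple pinhole intrinsics for a 640x480 camera, no lens distortion.
focal, center = 640.0, (320.0, 240.0)
camera_matrix = np.array([[focal, 0, center[0]],
                          [0, focal, center[1]],
                          [0, 0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, None)
rotation_matrix, _ = cv2.Rodrigues(rvec)   # head rotation w.r.t. the camera
print("head rotation matrix:\n", rotation_matrix)
```

The recovered rotation can then be mapped to coarse gaze regions (e.g., road ahead, mirrors, center console), as in the gaze-region stage of [32].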
B. Driving assistance
In Section II-A, we discussed the Driver Monitoring System, which focuses on and contributes to safe driving. The Advanced Driver Assistance System (ADAS) is also designed to avoid accidents, by alerting the driver to potential problems or by taking over control of the vehicle. Over the last decades, functions such as anticipating the intention of drivers and analyzing on-road traffic have also been studied. This section introduces these functions as integrated into ADAS.
1) Driver intention analysis:
Accelerating, braking, steering, turning and lane changing are common tasks during driving. Wrong decisions can result in critical situations or trigger accidents. ADAS assists with lane keeping or changing and prevents some dangerous maneuvers. In order to assist the driver, it has to understand the driving context. [45] uses the visual gist as the image descriptor for pre-attentive perception. The images are captured by three on-board cameras. A Random Forest (RF) classifier trained with the gist features can differentiate road contexts such as single-lane, crossing, or T-junction. Furthermore, it can successfully predict driving actions in real time using driving context information.

An important driving behavior is lane changing, which is anticipated in [46]–[49], [52]. [46] predicts three classes: right/left lane change and no lane change. The features are collected by a vision- and Inertial Measurement Unit (IMU)-based lane tracker; the position of the vehicle with respect to the lane, more specifically the lateral position and the steering angle, is recorded. The proposed prediction model includes a Bayesian filter and an SVM classifier: the Bayesian filter takes the output from the SVM and produces the final prediction. [52] predicts whether or not a lane change occurs with the help of a Sparse Bayesian Learning (SBL) model. The input features are lane positional information acquired from the camera focused on the road, vehicle parameters from the CAN bus, and driver head posture obtained from the image of the driver. In [48], [49], more driving behaviors are included in addition to the three lane changing classes, i.e., right/left turns. The information sources in this dataset are various: they include videos of the driver and of the road outside the vehicle, vehicle dynamics, GPS, and street maps. [49] makes use of all this information and trains a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells. According to the results in [48], this architecture achieves the best result when compared with an SVM, RF or Hidden Markov Model (HMM); moreover, it anticipates the action 3.58 s in advance on average. Using videos of drivers, end-to-end prediction is also accurate: for instance, in [50] a 3D ResNeXt-101 with an LSTM layer on top is trained in an end-to-end style. The results in [51] show that road-facing videos carry information complementary to driver videos, which should also be considered in driver maneuver prediction. [47] takes the personalities of drivers into account, because ADAS should comply with the driver's habits to ensure overall safety; it proposes using a GMM to adjust a sinusoidal lane change kinematic model according to individual driving styles.

Last but not least, [53] provides an overview of a multi-module Driver Intention Inference (DII) system designed for lane changing intention detection. This system consists of different modules: a traffic context perception module, a vehicle dynamics module, a driver behavior recognition module and a driver intention inference module. From this work, we can see an emerging trend of multi-module fusion in ADAS.
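The recurrent architecture of [49] can be sketched as an LSTM that consumes a sequence of fused per-frame features (driver, road, and vehicle signals) and outputs a maneuver class at every time step. The feature dimension and the five-maneuver output below are placeholders for illustration, not the exact Brain4Cars configuration.

```python
import torch
import torch.nn as nn

class ManeuverAnticipator(nn.Module):
    """LSTM over fused driver/road/vehicle features, one prediction per step."""
    def __init__(self, feat_dim=64, hidden_dim=128, num_maneuvers=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_maneuvers)

    def forward(self, x):                 # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out)             # logits: (batch, time, classes)

# Dummy input: 2 drives, 30 time steps of fused 64-d features.
model = ManeuverAnticipator()
logits = model(torch.randn(2, 30, 64))
probs = logits.softmax(dim=-1)
# Anticipation: a maneuver can be emitted as soon as its probability
# at the current step exceeds a confidence threshold.
print(probs[:, -1].argmax(dim=-1))        # predicted maneuver at last step
```

Emitting a prediction per time step, rather than only at the end of the sequence, is what allows such models to anticipate the maneuver seconds before it happens.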
2) Traffic hazards warning:
ADAS should not only focus on the intention of the driver, but should also simultaneously observe on-road traffic. This can prevent some traffic accidents by correlating information and notifying the driver in a timely manner. On-road hazards include rear-end crashes, unnoticed pedestrians, speed breakers or traffic signs. One possible solution for this task is to combine driver intention prediction or driver status detection with on-road traffic detection. This requires driver monitoring, object detection/tracking, and data fusion modules to work simultaneously; Fig. 1 shows the components of such a system. Traffic detection that only uses on-road information is not related to in-cabin applications and will not be discussed.
Fig. 1. The traffic hazards warning system includes both in-cabin analysis (driver status, driver intention) and out-of-cabin analysis (traffic on road).
The system in [54] consists of the two modules shown in Fig. 1. Driver head posture estimation is a preliminary part of the driver attention analysis: a 3D face model is trained using an asymmetric face appearance model, and mapping 2D feature points onto the 3D face helps to determine the direction of the driver's attention. The second component of the driver-assistance system is road traffic detection, which uses a global Haar-like features (GHaar) classifier to detect vehicles ahead on the road. Additionally, the system can estimate the distance and the angle between the detected vehicle and the ego vehicle in relation to the right lane of the road. A fuzzy logic system extrapolates future driving risks based on the driver and on-road information.

Besides other vehicles, pedestrians and cyclists are other important factors on the road. In [55], the authors developed a pedestrian collision warning system, equipped with a volumetric head-up display (HUD) in the cabin, to identify when and where pedestrians are approaching. This work also shows that the Augmented Reality (AR) technique is both effective and intuitive for warning systems within the cabin.
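To give a flavor of how a fuzzy system like the one in [54] can fuse driver attention and traffic measurements into a risk level, the sketch below implements a tiny Mamdani-style inference with hand-picked triangular membership functions. The rules, ranges and defuzzification are invented for this example and are not those of [54].

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b on the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def driving_risk(attention: float, distance_m: float) -> float:
    """Fuse driver attention (0..1) and distance to the lead vehicle (m)
    into a risk score in [0, 1] using two illustrative fuzzy rules."""
    # Memberships (all ranges are invented for this example).
    distracted = tri(attention, -0.5, 0.0, 0.6)
    attentive = tri(attention, 0.4, 1.0, 1.5)
    close = tri(distance_m, -10.0, 0.0, 40.0)
    far = tri(distance_m, 25.0, 100.0, 200.0)

    # Rule 1: distracted AND close lead vehicle -> high risk (0.9).
    # Rule 2: attentive AND far lead vehicle    -> low risk  (0.1).
    r1 = min(distracted, close)
    r2 = min(attentive, far)
    # Weighted-average defuzzification of the two rule outputs.
    return (r1 * 0.9 + r2 * 0.1) / (r1 + r2 + 1e-9)

print(f"risk: {driving_risk(attention=0.2, distance_m=15.0):.2f}")
```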
C. Take-over readiness evaluation
As mentioned, at SAE L3 the human driver should stand by and be prepared to take over control of the vehicle. Takeover readiness defines the driver's ability to regain control of the vehicle from the automated mode. Non-driving related tasks during automated driving may interfere with a driver's ability to regain control of the vehicle [56]. Thus, it is necessary to help the driver stay prepared for a takeover. In this section, we discuss some methodologies that measure driver takeover readiness.

To study the readiness of the driver, the takeover request (TOR) time is a key term. The TOR time measures the time between the request for takeover and the critical situation (by which time the driver must have regained control). Determining when to alert the driver to a takeover situation is critical. [58] studies four different TOR times; the results show that the TOR time resulting from the performance-based method provides the shortest reaction time and the highest satisfaction for drivers. This performance-based method considers the influence of driving behaviors and was originally designed for the airborne collision avoidance system. Besides the TOR time, other factors may influence takeover behavior, such as the complexity of the traffic situation, the ego-motion of the vehicle, and the type of secondary task. [56] studies how the complexity of the driving task and the secondary task impact the takeover reaction time, and estimates the takeover reaction time with a mathematical formula fitted to experimental data. [57] creates a concept system that estimates readiness directly by using driver behavior information and biometric data: extracted eye gazes and head movements count as driver behaviors, while heart rate and respiration rate are considered biometric data.

There is relatively little research employing machine learning methods to estimate driver readiness, with the exception of [60], [63], [64]. The authors use multi-modality data to train different classifiers, such as K-Nearest Neighbors (KNN) and SVM. The studied data includes the maximum deviation from the lane center, the minimum distance to the leading vehicle, and the driver's eye gaze and behaviors. These classifiers predict the quality of takeover readiness; the best result, an accuracy of 79%, is achieved by a linear SVM.

In addition to the estimation of takeover readiness, the system is responsible for keeping the driver constantly aware of the situation both inside and outside of the vehicle. An Interactive Automation Control System (IACS) designed in [59] keeps the driver aware of the TOR on a display; experimental results show that the response time to the TOR and the total number of collisions decrease with the support of this system. [79] proposes a system which employs AR: a digital twin of the driver's car is shown in a simulation of a potential accident where the TOR is necessary, and after alerting the driver to the coming situation, the TOR is executed. This work indicates that an in-cockpit simulation can help the driver better understand traffic situations and handle the TOR more effectively.

One limitation is that all the projects presented here were conducted in driving simulators. Since the takeover task is a safety-critical issue, more experiments should be conducted in real-world driving situations.
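A minimal version of the classifier comparison in [60], [63], [64] can be set up with scikit-learn: multimodal features (lane deviation, headway, gaze statistics) feed an SVM and a KNN, and cross-validation selects the better model. The feature values below are random stand-ins, not data from the surveyed studies.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in features per takeover event: [max lane deviation (m),
# min distance to lead vehicle (m), gaze-on-road ratio].
X = rng.normal(size=(60, 3))
y = rng.integers(0, 2, size=60)        # 1 = ready, 0 = not ready (dummy)

models = {
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.2f}")
```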
III. IN-CABIN USE-CASES FOR DRIVING COMFORT

Autonomous vehicle technology should make driving not only safe but also relaxing. Improving driver and passenger comfort is another key research topic. Tasks in the comfort sector are generally non-driving related tasks. In this section, we introduce works that aim to optimize in-cabin operating systems by making vehicles more intelligent.
A. Convenience

"Convenience" describes the ability of the system to accomplish non-driving related tasks automatically according to the needs of drivers and passengers. An intelligent system should recognize these needs in an accurate and timely manner. AI methods are well suited to perceiving such needs because they can analyze human actions and the information encoded within them. A new dataset named "Drive&Act" has been collected for the purpose of driver action recognition [65]. It was collected in manual as well as automated driving mode, and the behaviors are labeled in a fine-grained way. This dataset includes many secondary-task actions, such as putting on sunglasses or reading magazines. The videos were recorded by six synchronized cameras inside the cabin in RGB, depth, infrared and body-pose modalities. Recognizing these behaviors correctly can increase comfort: for instance, the sun visor could flip down automatically when the driver is putting on sunglasses. This dataset provides a large benchmark for in-cabin action recognition. The authors of [65] also train different models on this dataset; the best performance is achieved by a 3D CNN-based model. These results indicate that AI methods have a promising future for in-cabin applications.

Listening to music can provide drivers and passengers with a more comfortable journey. Research such as [77] shows that listening to suitable music can improve the driver's mood and fatigue state, resulting in improved driving performance. [77] proposes a framework which detects the driver's mood-fatigue status and recommends music accordingly. This framework makes use of different smartphone sensors to gauge each driver's specific situation and to employ intelligent analysis; for example, the system will engage the closest algorithm to classify different music moods.
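In the spirit of the 3D CNN baseline on Drive&Act [65], the sketch below fine-tunes torchvision's r3d_18 video backbone for clip-level activity classification. The backbone choice and the class count are placeholders, since the original benchmark evaluates several architectures and its own label hierarchy.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_ACTIVITIES = 34   # placeholder count of fine-grained in-cabin activities

# A 3D CNN over short clips; r3d_18 stands in for the 3D CNN family
# reported to perform best in [65].
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_ACTIVITIES)

# Dummy clip batch: (batch, channels, frames, height, width).
clips = torch.randn(2, 3, 16, 112, 112)
logits = model(clips)
print(logits.argmax(dim=1))   # predicted activity per clip
```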
B. Human-Machine Interface

The more functional automated vehicles become, the more complex the HMI can become. Some crucial principles for designing the HMI are mentioned in [66], [67]: the HMI should both provide comfort and stimulate an appropriate level of attention from users, and it should maintain minimal content in order to reduce distraction. For instance, [68] investigates the position of the display for the haptic rotary device in a conventional vehicle HMI system. The results show that the cluster display position reduces lane position deviation during secondary tasks.

The authors of [78] propose using AR to realize a multi-layer floating user interface system in the vehicle. This system employs stereoscopic depth to arrange information on three display layers. Critical information, such as a "low fuel" warning, is shown on the nearest layer; less critical items are shifted to the back layers and blurred. This system aims to provide a large amount of information without greatly distracting drivers.

Hand gestures and speech are becoming popular means of simplifying HMI systems because they reduce visual and bio-mechanical distraction during driving. Different sensors and recognition algorithms are used for hand gesture recognition in the vehicular environment. For example, (1) [73] uses a mm-wavelength radar sensor and trains a Random Forest; on average, the system achieves above 90% accuracy over all six gesture classes. (2) In [74], multiple modalities including RGB, depth/infrared images and 3D hand joints are tested; two networks are trained, a C3D network and a Long Short-Term Memory (LSTM) network. The best model, with a recognition accuracy of 94.4% over 12 classes, is the LSTM using 3D hand joints as the input modality. In speech recognition, special uses for driving scenarios are explored. Examples include natural language analysis based on an RNN architecture for commands like "set/change destination or driving speed" in [75], and the vehicle control system's defense strategy in [76], which uses an SVM classifier to resist attacks from hidden voice commands.

Another traditional HMI element in vehicles is the HVAC (heating, ventilation, and air conditioning) system. Normally, the controls are operated by hand, requiring attention from the driver. In [71], a control system deploying a neural network architecture realizes automatic control of the cabin's thermal environment: the model first collects data while the user is adjusting the system, and after training, it has learned the user's preference and controls the thermal environment accordingly. Different machine learning techniques can be used to realize this goal. In [72], the automatic control is realized using Reinforcement Learning (RL); notably, the RL controller consumes less energy and produces a more comfortable environment than manual control approaches.

For fully autonomous cars, [67] proposes that the HMI should only contain commands for "start", "stop" and "choose the destination". Other interfaces, for entertainment or maps, should be moved to personal mobile devices. The advantage is the separation of safety-critical functions from non-critical ones, which remain personalizable.

As the SAE level increases, drivers can focus less on driving tasks and have more access to HMIs, and human factors become more influential in HMI systems. [69] introduces an HMI framework which treats human factors (of both drivers and other road users) as dynamic factors; different HMIs are chosen depending on these factors. The authors also propose an external HMI for communicating with other users on the road. One specific and important human factor for autonomous driving is trust in the vehicle. [70] focuses on how to increase human trust in an autonomous car via HMIs; the authors suggest that an HMI framework should take multiple events over a period of time into account rather than focus on one isolated event.
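To illustrate the RL formulation used for thermal comfort in [72] at toy scale, the sketch below runs tabular Q-learning on a one-dimensional cabin-temperature model with a comfort-band reward. The dynamics, discretization and reward are invented for this example; [72] works with a far richer cabin model and controller.

```python
import numpy as np

rng = np.random.default_rng(1)
TEMPS = np.arange(15, 31)            # discretized cabin temperatures (deg C)
ACTIONS = [-1, 0, +1]                # cool, hold, heat (toy dynamics)
TARGET = 22                          # comfort set point (illustrative)

q = np.zeros((len(TEMPS), len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration

def step(state_idx, action_idx):
    """Toy environment: the action shifts the temperature by one bin;
    the reward peaks at the set point and penalizes energy use."""
    nxt = int(np.clip(state_idx + ACTIONS[action_idx], 0, len(TEMPS) - 1))
    reward = -abs(TEMPS[nxt] - TARGET)           # closer to set point is better
    reward -= 0.1 * abs(ACTIONS[action_idx])     # small energy penalty
    return nxt, reward

for _ in range(5000):                            # one-step Q-learning updates
    s = rng.integers(len(TEMPS))
    a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(q[s].argmax())
    s2, r = step(s, a)
    q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])

print("action at 27 C:", ACTIONS[int(q[TEMPS.tolist().index(27)].argmax())])
```

The energy penalty in the reward is what lets such a controller trade comfort against consumption, mirroring the finding in [72] that the RL controller uses less energy than manual control.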
C. Navigation
Navigation is one of the most prominent functions in modern vehicles. Many drivers have experienced the difficulty of trying to concentrate on the road while viewing a personal navigation device. Using an AR head-up display (HUD) to show the navigational path, traffic signs, and landmarks is a practical solution. The work in [82] shows that drivers prefer navigation using an AR HUD to traditional navigation devices, namely an egocentric street view and a map view showing the vehicle within the context of its surroundings on an LCD display. On the HUD, directions are displayed on a narrow semi-transparent surface that appears suspended above the center of the road at a height of about 2 meters. Moreover, according to eye gaze measurements, drivers spend 5.7 s and 4.2 s more per minute looking at the road ahead in comparison to the LCD street view and map view, respectively.
In [80], the framework detects vehicles and traffic signs and projects them onto the AR HUD, helping drivers to avoid dangerous accidents. For detection, the framework uses the AdaBoost learning algorithm trained on Haar features of vehicles and traffic signs. The stage after detection is to find positions on the HUD for the projection of the virtual objects; for this calculation, the camera parameters and the relative position of the camera with respect to the objects are required. With the help of AR, virtual objects are attached to real objects. In this way, drivers are alerted to critical information on the road in an unobtrusive manner.

An investigation of the effectiveness of different presentations of AR-enhanced navigational instructions in [81] shows that the most effective arrangement is to use boxes that enclose a landmark, together with an instruction such as "turn right in 120 meters". The response times and success rates are enhanced by 43.1% and 26.2%, respectively, compared to the conventional representation (the sign alone).
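The projection step described for [80], placing a virtual marker over a detected object, reduces to the standard pinhole camera model once the intrinsics and the object's position in camera coordinates are known. The sketch below shows that computation with illustrative parameter values; it assumes the HUD is calibrated to the same view as the camera.

```python
import numpy as np

# Illustrative camera intrinsics (focal lengths and principal point, pixels).
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

def project_to_hud(point_cam: np.ndarray) -> tuple:
    """Project a 3D point in camera coordinates (meters) to pixel coordinates.
    A HUD calibrated to the same view can draw a marker at this position."""
    uvw = K @ point_cam
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])

# A detected vehicle 1.5 m left of and 0.2 m below the optical axis, 20 m ahead.
u, v = project_to_hud(np.array([-1.5, 0.2, 20.0]))
print(f"draw warning marker at pixel ({u:.0f}, {v:.0f})")
```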
IV. CONCLUSION
In this section, we summarize all the AI techniques implemented in the in-cabin use-cases we reviewed, as well as the corresponding features of these applications. The in-cabin use-cases can be abstracted into the following problem types: classification, regression and sensor fusion. For example, predicting whether or not the driver is tired (as in [25]) is a classification problem, while predicting the drowsiness level (as in [29]) is a regression problem. A typical case of sensor fusion is the "traffic hazards warning" system proposed in [54]. When enough data is provided, AI methods can readily tackle all three problem types. This also explains the frequency of utilization of the five techniques shown in Fig. 2.
Fig. 2. Frequency of utilization of different AI methods (Machine Learning, Deep Learning, Reinforcement Learning, Markov Decision Process, Fuzzy Logic) in in-cabin use-cases.
Fig. 2 shows the distribution of AI techniques across all examined papers. It is worth noting that "Deep Learning" refers to learning algorithms that use layered structures (artificial neural networks). Although it is a subset of "Machine Learning", it is regarded as a separate set due to its importance in computer vision research; the "Machine Learning" set refers to all algorithms with the exception of "Deep Learning". In total, 42 works utilize AI techniques. Since machine learning algorithms and deep learning networks are very effective at solving classification and regression problems, both dominate the surveyed works: concretely, 50.0% (21 papers) of the applications were solved by machine learning algorithms and 40.5% (17 papers) used deep learning networks. The Fuzzy Logic System (4.7%) is used when there are multiple inputs from different sensors, as exhibited in [24], [55].

We summarize the use-cases discussed in this paper and their relationship to SAE L3, L4 and L5 in Table II. A check mark indicates that the use-case is an important function at that level. As shown in Table II, driving assistance, takeover readiness and navigation are no longer necessary in L4 and L5, because a human driver will not intervene. The purpose of driver status monitoring also changes from L3 to L4: the L3 system focuses on driver anomalies, while L4 and L5 are concerned with passenger emotion and satisfaction.
TABLE II
USE-CASES AND THEIR IMPLEMENTATIONS IN DIFFERENT SAE LEVELS (FROM L3 TO L5)

Use-case | L3 | L4 | L5
driver status monitoring | ✓ | ✓ | ✓
driving assistance | ✓ | – | –
takeover readiness | ✓ | – | –
convenience | ✓ | ✓ | ✓
HMI | ✓ | ✓ | ✓
navigation | ✓ | – | –
Finally, Table III itemizes the different hardware employed for data acquisition in the examined works utilizing AI techniques. The hardware is categorized into eight different types, as shown in the first column. In the second column, the hardware names or models are listed; some are marked with "unknown" when the name is not mentioned in the original work.

Fig. 3 summarizes all the use-cases examined in our survey. The features are depicted as "leaves" in a tree structure, and lines of different colors represent different techniques. The use-cases are described by keywords. If a line emerges from a leaf with an open ending, the application only uses this feature as the input; typically, however, applications use more than one feature, marked with connections in the figure. The use-case is the nearest keyword above the line (or to the right of the line). From this overview, the following is apparent for in-cabin use-cases: (1) important input features are the driver's eye and head movements, full images of drivers/roads, and vehicle position and dynamics; (2) popular techniques are machine learning and deep learning; (3) research focuses are distraction detection, HMI design, and driver intention analysis.

Fig. 3 also highlights features that are used across different applications. For example, the image of the driver is used widely in distraction and intention detection, as well as for convenience purposes. For future work, a high-level module integrating different functionalities should be considered. Such a module should have a manager that can coordinate the work of the different sub-modules; in this way, vehicle resources are saved and different modules can support one another to achieve a holistic solution.

REFERENCES
Jam Session , [Online] Available: https://audi-encounter.com/en/staupilot[3] The University of Michigan,
Augmented reality at U-M improves driver-less vehicle testing , [Online] Available: https://record.umich.edu/articles/augmented-reality-u-m-improves-driverless-vehicle-testing/[4] Civil Maps,
Civil Maps , [Online] Available: https://civilmaps.com/[5] Wayray,
WayRay , [Online] Available: https://wayray.com/navion[6] Ayers, Whitlow & Dressler,
NHTSA: Nearly all car crashesare due to human error
Seeing Machines
Valeo Driver Monitoring
Driver Monitoring System Interior Sensing for vehicleintegration , [Online] Available: https://smarteye.se/automotive-solutions/[10] Aggressive Driving,
INTRODUCTION , [Online] Available: https://one.nhtsa.gov/people/injury/research/aggdrivingenf/pages/introduction.html[11] Melissa Stoppler, M.D.,
Controlling Road Rage
Robust facialexpression recognition using local binary patterns , IEEE InternationalConference on Image Processing 2005, Vol.2, 2005, IEEE[13] Bartlett, Marian Stewart and Littlewort, Gwen and Fasel, Ian andMovellan, Javier R,
Real Time Face Detection and Facial ExpressionRecognition: Development and Applications to Human Computer Inter-action. , 2003 Conference on computer vision and pattern recognitionworkshop, Vol.5, 2003, IEEE[14] Kanade, Takeo and Cohn, Jeffrey F and Tian, Yingli,
Comprehensivedatabase for facial expression analysis , Proceedings Fourth IEEE Inter-national Conference on Automatic Face and Gesture Recognition (Cat.No. PR00580), 2000, IEEE[15] Fan, Xin-An and Bi, Lu-Zheng and Chen, Zhi-Long,
Using EEGto detect drivers’ emotion with Bayesian Networks , 2010 InternationalConference on Machine Learning and Cybernetics, 2010, IEEE[16] C. D. Katsis and N. Katertsidis and G. Ganiatsas and D. I. Fotiadis,
Toward Emotion Recognition in Car-Racing Drivers: A Biosignal Pro-cessing Approach , IEEE Transactions on Systems, Man, and Cybernetics- Part A: Systems and Humans, 2008, vol.38, IEEE[17] Ali, Mouhannad and Al Machot, Fadi and Mosa, Ahmad Haj andKyamakya, Kyandoghere,
Cnn based subject-independent driver emotionrecognition system involving physiological signals for adas , AdvancedMicrosystems for Automotive, 2016, Springer[18] Wang, Jinjun and Gong, Yihong,
Recognition of multiple drivers’ emo-tional state , 2008 19th International Conference on Pattern Recognition,2008, IEEE[19] Lisetti, Christine L and Nasoz, Fatma,
Affective intelligent car interfaceswith emotion recognition , Proceedings of 11th International Conferenceon Human Computer Interaction, 2005, Citeseer[20] Tawari, Ashish and Trivedi, Mohan,
Speech based emotion classificationframework for driver assistance system , 2010 IEEE Intelligent VehiclesSymposium, 2010, IEEE[21] Al Machot, Fadi and Mosa, Ahmad Haj and Dabbour, Kosai andFasih, Alireza and Schwarzlm¨uller, Christopher and Ali, Mouhanndadand Kyamakya, Kyandoghere,
A novel real-time emotion detection systemfrom audio streams based on bayesian quadratic discriminate classifierfor ADAS , Proceedings of the Joint INDS’11 & ISTET’11, 2011, IEEE[22] Affectiva,
Multi-Modal Emotion Estimation , [Online]. Available: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9466-multimodal-affects-analysis-the-future-of-the-autonomous-vehicle-in-cabin-experience.pdf[23] Burkhardt, Felix and Paeschke, Astrid and Rolfes, Miriam andSendlmeier, Walter F and Weiss, Benjamin,
A database of Germanemotional speech , Ninth European Conference on Speech Communicationand Technology, 2005 [24] Azim, Tayyaba and Jaffar, M Arfan and Mirza, Anwar M,
Fullyautomated real time fatigue detection of drivers through fuzzy expertsystems , Applied Soft Computing, Vol.18, page 25–38, 2014, Elsevier.[25] Mandal, Bappaditya and Li, Liyuan and Wang, Gang Sam and Lin, Jie,
Towards detection of bus driver fatigue based on robust visual analysisof eye state , IEEE Transactions on Intelligent Transportation Systems,Vol.18, page 545–557, 2016, IEEE[26] Wierwille, Walter W and Wreggit, SS and Kirn, CL and Ellsworth, LAand Fairbanks, RJ,
Research on vehicle-based driver status/performancemonitoring; development, validation, and refinement of algorithms fordetection of driver drowsiness , 1994[27] M.H.Baccour and F.Driewer and E.Kasneci and W.Rosenstiel,
Camera-Based Eye Blink Detection Algorithm for Assessing Driver Drowsiness ,IEEE Transactions on Intelligent Vehicles Symposium, page 987-993,2019, IEEE[28] M.H.Baccour and F.Driewer and T.Sch¨ack and E.Kasneci,
Camera-based Driver Drowsiness State Classification Using Logistic RegressionModels , IEEE International Conference on Systems, Man, and Cybernet-ics (SMC), page 1-8, 2020, IEEE[29] de Naurois, Charlotte Jacob´e and Bourdin, Christophe and Stratulat,Anca and Diaz, Emmanuelle and Vercher, Jean-Louis,
Detection andprediction of driver drowsiness using artificial neural network models ,Accident Analysis & Prevention, 2017, Elsevier[30] Wang, Xuesong and Xu, Chuan,
Driver drowsiness detection based onnon-intrusive metrics considering individual specifics , Accident Analysis& Prevention, Vol.95, page 350–357, 2016, Elsevier[31] Vicente, Francisco and Huang, Zehua and Xiong, Xuehan and De laTorre, Fernando and Zhang, Wende and Levi, Dan,
Driver gaze trackingand eyes off the road detection system , IEEE Transactions on IntelligentTransportation Systems, Vol.16, page 2014–2027, 2015, IEEE[32] Fridman, Lex and Langhans, Philipp and Lee, Joonbum and Reimer,Bryan,
Driver gaze estimation without using eye movement , 2015[33] Borghi, Guido and Venturelli, Marco and Vezzani, Roberto and Cuc-chiara, Rita,
Poseidon: Face-from-depth for driver pose estimation ,Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, page 4661–4670, 2017[34] Palazzi, Andrea and Abati, Davide and Solera, Francesco and Cucchiara,Rita and others,
Predicting the Driver’s Focus of Attention: the DR(eye) VE Project , IEEE transactions on pattern analysis and machineintelligence, Vol.41, page 1720–1733, 2018, IEEE[35] Ranney, Thomas A and Garrott, W Riley and Goodman, Michael J,
NHTSA driver distraction research: Past, present, and future , 2001, SAETechnical Paper.[36] Li, Nanxiang and Busso, Carlos,
Predicting perceived visual and cogni-tive distractions of drivers with multimodal features , IEEE Transactionson Intelligent Transportation Systems, Vol.16, page 51–65, 2015, IEEE[37] Xing, Yang and Lv, Chen and Cao, Dongpu and Wang, Huaji andZhao, Yifan,
Driver workload estimation using a novel hybrid method oferror reduction ratio causality and support vector machine , Measurement,Vol.114, pages 390–397, 2018, Elsevier[38] Tian, Renran and Ruan, Keyu and Li, Lingxi and Le, Jialiang andGreenberg, Jeff and Barbat, Saeed,
Standardized evaluation of camera-based driver state monitoring systems ,IEEE/CAA Journal of AutomaticaSinica, Vol.6, page 716–732, 2019, IEEE[39] Liang, Yulan and Reyes, Michelle L and Lee, John D,
Real-timedetection of driver cognitive distraction using support vector machines ,IEEE transactions on intelligent transportation systems, Vol.8, page 340–350, 2007, IEEE[40] Liu, Tianchi and Yang, Yan and Huang, Guang-Bin and Yeo, Yong Kiangand Lin, Zhiping,
Driver distraction detection using semi-supervisedmachine learning , IEEE transactions on intelligent transportation systems,Vol.17, page 1108–1120, 2015, IEEE[41] Abouelnaga, Yehya and Eraqi, Hesham M and Moustafa, MohamedN,
Real-time distracted driver posture classification ,arXiv preprintarXiv:1706.09498, 2017[42] Kose, Neslihan and Kopuklu, Okan and Unnervik, Alexander and Rigoll,Gerhard,
Real-Time Driver State Monitoring Using a CNN Based Spatio-Temporal Approach , arXiv preprint arXiv:1907.08009, 2019[43] Xing, Yang and Lv, Chen and Wang, Huaji and Cao, Dongpu andVelenis, Efstathios and Wang, Fei-Yue,
Driver activity recognition forintelligent vehicles: A deep learning approach , IEEE Transactions onVehicular Technology, Vol.68, pages 5379–5390, 2019, IEEE[44] Xing, Yang and Lv, Chen and Zhang, Zhaozhong and Wang, Huajiand Na, Xiaoxiang and Cao, Dongpu and Velenis, Efstathios and Wang,Fei-Yue,
Identification and analysis of driver postures for in-vehicledriving activities and secondary tasks recognition , IEEE Transactions onComputational Social Systems, Vol.5, pages 95–108, 2017, IEEE [45] Pugeault, Nicolas and Bowden, Richard,
How much of driving ispreattentive? , IEEE Transactions on Vehicular Technology, Vol.64, page5424–5438, 2015, IEEE[46] Kumar, Puneet and Perrollaz, Mathias and Lefevre, St´ephanie andLaugier, Christian,
Learning-based approach for online lane changeintention prediction , 2013 IEEE Intelligent Vehicles Symposium (IV),page 797–802, 2013, IEEE[47] Butakov, Vadim A and Ioannou, Petros,
Personalized driver/vehicle lanechange models for ADAS , IEEE Transactions on Vehicular Technology,Vol.64, page 4422–4431, 2014, IEEE[48] Jain, Ashesh and Koppula, Hema S and Soh, Shane and Raghavan,Bharad and Singh, Avi and Saxena, Ashutosh,
Brain4cars: Car thatknows before you do via sensory-fusion deep learning architecture , arXivpreprint arXiv:1601.00740, 2016[49] Jain, Ashesh and Singh, Avi and Koppula, Hema S and Soh, Shaneand Saxena, Ashutosh,
Recurrent neural networks for driver activityanticipation via sensory-fusion architecture , 2016 IEEE InternationalConference on Robotics and Automation (ICRA), page 3118–3125, 2016,IEEE[50] Gebert, Patrick and Roitberg, Alina and Haurilet, Monica and Stiefelha-gen, Rainer,
End-to-end prediction of driver intention using 3d convolu-tional neural networks , 2019 IEEE Intelligent Vehicles Symposium (IV),page 969–974, 2019, IEEE[51] Rong, Yao and Akata, Zeynep and Kasneci, Enkelejda,
Driver IntentionAnticipation Based on In-Cabin and Driving Scene Monitoring , IEEEInternational Conference on Intelligent Transportation, 2020[52] McCall, Joel C and Wipf, David P and Trivedi, Mohan M and Rao,Bhaskar D,
Lane change intent analysis using robust operators andsparse bayesian learning , IEEE Transactions on Intelligent TransportationSystems, Vol.8, page 431–440, 2007, IEEE[53] Xing, Yang and Lv, Chen and Wang, Huaji and Wang, Hong andAi, Yunfeng and Cao, Dongpu and Velenis, Efstathios and Wang, Fei-Yue,
Driver lane change intention inference for intelligent vehicles:Framework, survey, and challenges , IEEE Transactions on VehicularTechnology, Vol.68, page 4377–4390, 2019, IEEE[54] Rezaei, Mahdi and Klette, Reinhard,
Look at the driver, look at theroad: No distraction! No accident! , Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, page 129–136, 2014[55] Kim, Hyungil and Miranda Anon, Alexandre and Misu, Teruhisa andLi, Nanxiang and Tawari, Ashish and Fujimura, Kikuo,
Look at me: Aug-mented reality pedestrian warning system using an in-vehicle volumetrichead up display , Proceedings of the 21st International Conference onIntelligent User Interfaces, page 294–298, 2016, ACM[56] Tanshi, Foghor and S¨offker, Dirk,
Modeling Drivers’ Takeover BehaviorDepending on the Criticality of Driving Situations and the Complexity ofSecondary Tasks , 2019 IEEE Conference on Cognitive and ComputationalAspects of Situation Management (CogSIMA), page 67–73, 2019, IEEE[57] Kim, HyunSuk and Kim, Woojin and Kim, Jungsook and Lee, Seung-Jun and Yoon, DaeSub,
Design of Driver Readiness Evaluation Systemin Automated Driving Environment , 2018 International Conference onInformation and Communication Technology Convergence (ICTC), page300–302, 2018, IEEE[58] Kim, Hyung Jun and Yang, Ji Hyun,
Takeover requests in simulated par-tially autonomous vehicles considering human factors , IEEE Transactionson Human-Machine Systems, vol.47, page 735–740, 2017, IEEE[59] Olaverri-Monreal, Cristina and Kumar, Satyarth and D`ıaz- `Alvarez, Al-berto,
Automated Driving: Interactive Automation Control System toEnhance Situational Awareness in Conditional Automation , 2018 IEEEIntelligent Vehicles Symposium (IV), page 1698–1703, 2018, IEEE[60] Braunagel, Christian and Rosenstiel, Wolfgang and Kasneci, Enkelejda,
Ready for take-over? A new driver assistance system for an automatedclassification of driver take-over readiness , IEEE Intelligent Transporta-tion Systems Magazine, vol. 9, page 10–22, 2017, IEEE[61] Kasneci, Enkelejda and Kasneci, Gjergji and K¨ubler, Thomas Cand Rosenstiel, Wolfgang,
Online recognition of fixations, saccades,and smooth pursuits for automated analysis of traffic hazard percep-tion ,Artificial neural networks, pages 411 434, 2015, Springer[62] Kasneci, Enkelejda and K¨ubler, Thomas and Broelemann, Klaus andKasneci, Gjergji,
Aggregating physiological and eye tracking signals topredict perception in the absence of ground truth , Computers in HumanBehavior, 68, pages 450–455, 2017, Elsevier[63] Braunagel, Christian and Kasneci, Enkelejda and Stolzmann, Wolfgangand Rosenstiel, Wolfgang,
Driver-activity recognition in the context ofconditionally autonomous driving , 2015 IEEE 18th International Con-ference on Intelligent Transportation Systems, pages 1652–1657, 2015,IEEE[64] Braunagel, Christian and Geisler, David and Rosenstiel, Wolfgangand Kasneci, Enkelejda,
Online recognition of driver-activity based on visual scanpath classification , IEEE Intelligent Transportation SystemsMagazine, 9 (4), pages 23–36, 2017, IEEE[65] Martin, Manuel and Roitberg, Alina and Haurilet, Monica and Horne,Matthias and Reiß, Simon and Voit, Michael and Stiefelhagen, Rainer,
Drive&Act: A Multi-modal Dataset for Fine-grained Driver BehaviorRecognition in Autonomous Vehicles , ICCV, 2019[66] Carsten, Oliver and Martens, Marieke H,
How can humans understandtheir automated cars? HMI principles, problems and solutions , Cognition,Technology & Work, vol.21, page 3–20, 2019, Springer[67] Benderius, Ola and Berger, Christian and Lundgren, Victor Malmsten,
The best rated human–machine interface design for autonomous vehiclesin the 2016 grand cooperative driving challenge , IEEE Transactions onintelligent transportation systems, vol.19, page 1302–1307, 2017, IEEE[68] Tian, Renran and Li, Lingxi and Rajput, Vikram S and Witt, Gerald Jand Duffy, Vincent G and Chen, Yaobin,
Study on the display positionsfor the haptic rotary device-based integrated in-vehicle infotainment in-terface ,IEEE Transactions on Intelligent Transportation Systems, Vol.15,page 1234–1245, 2014, IEEE[69] Bengler, Klaus and Rettenmaier, Michael and Fritz, Nicole and Feierle,Alexander,
From HMI to HMIs: Towards an HMI Framework for Au-tomated Driving , Information, Vol.11, page 61, 2020, MultidisciplinaryDigital Publishing Institute[70] Ekman, Fredrick and Johansson, Mikael and Sochor, Jana,
Creatingappropriate trust in automated vehicle systems: A framework for HMIdesign , IEEE Transactions on Human-Machine Systems, vol.48, page 95–101, 2017, IEEE[71] St¨ark, Marius and Backes, Damian and Kehl, Christian,
A Supervised Learning Concept for Reducing User Interaction in Passenger Cars, arXiv preprint arXiv:1711.04518, 2017
[72] Brusey, James and Hintea, Diana and Gaura, Elena and Beloe, Neil, Reinforcement learning-based thermal comfort control for vehicle cabins, Mechatronics, vol. 50, pages 413–421, 2018, Elsevier
[73] Smith, Karly A and Csech, Clément and Murdoch, David and Shaker, George, Gesture recognition using mm-wave sensor for human-car interface, IEEE Sensors Letters, vol. 2, pages 1–4, 2018, IEEE
[74] Manganaro, Fabio and Pini, Stefano and Borghi, Guido and Vezzani, Roberto and Cucchiara, Rita, Hand Gestures for the Human-Car Interaction: The Briareo Dataset, International Conference on Image Analysis and Processing, pages 560–571, 2019, Springer
[75] Okur, Eda and Kumar, Shachi H and Sahay, Saurav and Esme, Asli Arslan and Nachman, Lama, Natural Language Interactions in Autonomous Vehicles: Intent Detection and Slot Filling from Passenger Utterances, arXiv preprint arXiv:1904.10500, 2019
[76] Zhou, Man and Qin, Zhan and Lin, Xiu and Hu, Shengshan and Wang, Qian and Ren, Kui, Hidden Voice Commands: Attacks and Defenses on the VCS of Autonomous Driving Cars, IEEE Wireless Communications, 2019, IEEE
[77] Hu, Xiping and Deng, Junqi and Zhao, Jidi and Hu, Wenyan and Ngai, Edith C-H and Wang, Renfei and Shen, Johnny and Liang, Min and Li, Xitong and Leung, Victor and others, SAfeDJ: a crowd-cloud codesign approach to situation-aware music delivery for drivers, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 12, page 21, 2015, ACM
[78] Lindemann, Patrick and Rigoll, Gerhard, Exploring floating stereoscopic driver-car interfaces with wide field-of-view in a mixed reality simulation, Proceedings of the 22nd ACM Conference on Virtual Reality Software and Technology, pages 331–332, 2016, ACM
[79] Wiegand, Gesa and Mai, Christian and Liu, Yuanting and Hußmann, Heinrich, Early Take-Over Preparation in Stereoscopic 3D, Adjunct Proceedings of the 10th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pages 142–146, 2018, ACM
[80] Abdi, Lotfi and Abdallah, Faten Ben and Meddeb, Aref, In-vehicle augmented reality traffic information system: a new type of communication between driver and vehicle, Procedia Computer Science, vol. 73, pages 242–249, 2015, Elsevier
[81] Bolton, Adam and Burnett, Gary and Large, David R, An investigation of augmented reality presentations of landmark-based navigation using a head-up display, Proceedings of the 7th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pages 56–63, 2015, ACM
[82] Medenica, Zeljko and Kun, Andrew L and Paek, Tim and Palinko, Oskar,
Augmented reality vs. street views: a driving simulator study comparing two emerging navigation aids, Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, pages 265–274, 2011, ACM

TABLE III: OVERVIEW OF THE HARDWARE UTILIZATION FOR DATA ACQUISITION IN EXAMINED PAPERS (columns: hardware category, hardware, feature, specifications, dataset, reference).
Camera: near-infrared (NIR) charge-coupled device (CCD) camera and wide-angle camera (eyelid movement from facial images); Logitech webcam (head posture and eye gaze from facial images); Allied Vision Tech Guppy Pro (eye gaze from facial images); IDS camera, Microsoft Kinect, and ASUS ZenPhone rear camera (driver images in RGB, IR, and depth); GARMIN Virb X, a stereo camera, and further cameras (road images; datasets include DR(eye)VE and Brain4Cars); Pico Flexx time-of-flight camera (hand gestures).
Eye tracker: FaceLAB eye tracking device, Smart Eye Pro, SMI ETG eye tracking glasses, and Dikablis professional eye tracker (eye gaze, eyelid movement, blink duration/frequency, PERCLOS, pupil diameter, head posture).
Driving simulator: SCANeR Studio and TNO PreScan (vehicle dynamics such as lateral distance and lateral shift relative to the lane center, time to lane crossing, steering angle and angle velocity, vehicle speed, RPM, gears).
CAN-Bus (vehicle dynamics: velocity, steering wheel angle, brake value, RPM, acceleration, blinker status).
Physiological measurement: digital brainwave measurement system from the NEURO Company (EEG); Biopac MP system with Acqknowledge software (heart rate, sympathetic/vagal ratios, respiration rate).
Laser sensor: laserBird laser scanner (head posture).
Radar & Lidar sensor: radar/Lidar (relative distance, speed, and angle to surrounding vehicles); frequency-modulated continuous-wave mm-wavelength radar (hand gestures).
Microphone (driver speech).

[Overview figure: input data (eye gaze, eyelid movement, head posture, mouth movement, physiological information, hand gesture, acoustic signals, full image, vehicle dynamics, position, temperature) from driver/passenger, vehicle interior/exterior, and ambience, together with AI technologies (machine learning, deep learning, reinforcement learning, Markov decision process, fuzzy logic, VR/AR), mapped to use cases such as fatigue, distraction, attention, emotion, intention, takeover, navigation, traffic warning, convenience, and HMI.]