Corner Case Generation and Analysis for Safety Assessment of Autonomous Vehicles

Abstract

Testing and evaluation is a crucial step in the development and deployment of Connected and Automated Vehicles (CAVs). To comprehensively evaluate the performance of CAVs, it is necessary to test them in safety-critical scenarios, which rarely occur in the naturalistic driving environment. How to purposely and systematically generate these corner cases therefore becomes an important problem. Most existing studies focus on generating adversarial examples for the perception systems of CAVs, whereas limited effort has been devoted to the decision-making systems, which are the focus of this paper. As CAVs need to interact with numerous background vehicles (BVs) over a long duration, the variables that define corner cases are usually high dimensional, which makes generation challenging. In this paper, a unified framework is proposed to generate corner cases for decision-making systems. To address the challenge of high dimensionality, the driving environment is formulated as a Markov Decision Process, and deep reinforcement learning techniques are applied to learn the behavior policy of BVs. With the learned policy, BVs behave and interact with the CAVs more aggressively, resulting in more corner cases. To further analyze the generated corner cases, feature extraction and clustering techniques are utilized. By selecting representative cases of each cluster together with the outliers, the valuable corner cases can be identified among all generated corner cases. Simulation results for a highway driving environment show that the proposed methods can effectively generate and identify valuable corner cases.


Haowei Sun

Department of Civil and Environmental Engineering
University of Michigan, Ann Arbor, MI, USA
Email: [email protected]

Shuo Feng, Ph.D., Corresponding Author

Department of Civil and Environmental Engineering
University of Michigan, Ann Arbor, MI, USA
Email: [email protected]

Xintao Yan

Department of Civil and Environmental Engineering
University of Michigan, Ann Arbor, MI, USA
Email: [email protected]

Henry X. Liu, Ph.D.

Department of Civil and Environmental Engineering
University of Michigan Transportation Research Institute
University of Michigan, Ann Arbor, MI, USA
Email: [email protected]

arXiv: [cs.AI], Feb

Sun, Feng, Yan, and Liu

Keywords: Connected and Automated Vehicles, Corner Case Generation, Safety, Deep Reinforcement Learning

INTRODUCTION

Testing and evaluation of Connected and Automated Vehicles (CAVs) have been studied for years ( ). To comprehensively evaluate the performance of CAVs, it is crucial to test them in different scenarios, especially the safety-critical ones. In the naturalistic driving environment (NDE), however, safety-critical scenarios rarely happen, so it is very time-consuming and inefficient to collect corner cases from either on-road tests or simulation tests of the NDE (12, 13). Therefore, how to purposely and systematically generate corner cases becomes an important problem.

Most existing studies on corner case generation focus on the perception systems of CAVs. Utilizing methods from the domain of computer vision, researchers aim to generate adversarial examples that can fool the perception system of CAVs, such as misleading the object classification results ( ) and hiding pedestrians from the perception results ( ). Consequently, disturbed CAVs may miss safety-critical information and encounter dangerous situations.

The decision-making system is also essential to the safe driving of CAVs. Even if the perception system is perfect in the sense that every object of interest is correctly observed and recognized, a failure of the decision-making system can still cause severe accidents. However, how to generate corner cases for decision-making systems still lacks investigation. To address this issue, several methods have recently been proposed to generate corner cases for simple scenarios (e.g., cross-walking) using adversarial machine learning techniques ( ). For real-world traffic environments (e.g., the highway driving environment), however, CAVs need to interact with multiple background vehicles (BVs) for a long duration, so the variables that define the corner cases are high dimensional. To the best of our knowledge, no existing method can handle such high dimensionality for corner case generation.

In this paper, a unified framework is proposed for the high-dimensional corner case generation problem. To address the challenge brought by the high dimensionality, the Markov Decision Process (MDP) is introduced to formulate the traffic environment, which simplifies the temporal dependency between different snapshots of the simulation.
Compared with existing driving models commonly used in prevailing simulation platforms, such as IDM ( ), MOBIL ( ), and the Krauss model ( ), the MDP-based driving model can incorporate the behavioral randomness of real-world datasets, which makes the simulation more realistic and reliable. Moreover, with the learning results of the RL, the obtained simulation can purposely generate many more corner cases than the naturalistic driving environment. To further simplify the spatial dependency, the BVs in the environment are assumed to make decisions simultaneously and independently during each time step, which is commonly accepted in existing studies (22, 23). Using naturalistic driving data, the empirical distributions of a BV's actions can be obtained for every state. By sampling the actions of BVs from these distributions at each time step, the naturalistic driving environment can essentially be generated (12, 13). Based on this formulation, the corner case generation problem is equivalent to optimizing the behavior policy of BVs to increase the probability of corner cases. To achieve this optimization objective, deep reinforcement learning (DRL) ( ) techniques are utilized to learn the optimal behavior policy of BVs. With the learned policy, the BVs behave more aggressively when interacting with the CAVs and therefore systematically generate corner cases for complex driving environments, such as the highway driving environment.

Due to the diversity of corner cases, the generated cases can usually be divided into several clusters as well as outliers that do not belong to any specific cluster. Therefore, typical corner cases from each cluster and the outliers can represent the internal properties of the generated corner cases; these are referred to as "valuable" corner cases hereafter. Given the generated corner cases, it is crucial to identify the valuable ones, which are usually more useful for the evaluation and development of CAVs. To achieve this goal, feature extraction and clustering techniques are introduced. Since the generated corner cases can be high dimensional, Principal Component Analysis (PCA) ( ) is utilized to reduce their dimension and extract the principal features. Clustering methods are then applied to the extracted features to identify different clusters of corner cases as well as the outliers. Consequently, valuable corner cases can be identified.

To validate the proposed method, experiments in a highway driving environment are simulated. To generate the corner cases, one of the commonly used DRL methods, the dueling deep Q network ( ), is applied to learn the optimal behavior policy of BVs. After feature extraction by PCA, the K-means ( ) and "density-based spatial clustering of applications with noise" (DBSCAN) ( ) algorithms are utilized to analyze the corner cases.
The experimental results show that the proposed method can effectively generate and analyze valuable corner cases.

The rest of the paper is organized as follows. First, related works on corner case generation are introduced. Second, we propose a new unified framework for corner case generation using MDP and DRL techniques. Third, feature extraction and clustering techniques are applied for corner case analysis. After that, two case studies are provided to validate the proposed corner case generation and analysis methods. Finally, we conclude and discuss future work.

RELATED WORKS

In the CAV testing and evaluation domain, the ability to test the performance of CAVs under different scenarios is crucial. Researchers have therefore been working to find efficient and reasonable methods of generating corner cases for both CAV perception testing and decision-making testing.

Corner Case Generation for Vehicle Perception

In the CAV perception field, one popular approach to generating corner cases is adversarial example generation. Many autonomous vehicle manufacturers, such as Tesla and Waymo, have been leveraging neural networks for perception purposes. However, with the rapid development of deep learning and neural network training, many researchers have found that a small perturbation of the training examples can fool the machine learning system ( ). These contaminated examples are defined as adversarial examples. For example, Szegedy et al. ( ) proposed a gradient-based optimization method for generating adversarial examples. By minimizing the difference between adversarial examples and normal training examples while modifying the label predicted by the machine learning system, the proposed algorithm can automatically generate adversarial examples for specific neural networks.

In terms of autonomous vehicle testing, there are also many approaches to interfering with vehicle perception, most of which focus on attacking the object detection system. Xie et al. ( ) proposed an attack algorithm on the object recognition system that removes the pedestrian segmentation. Kurakin et al. ( ) showed that perturbations in the physical world, rather than in the image perception area, can lead to fatal errors of the CAV perception system. Furthermore, Evtimov et al. ( ) and Eykholt et al. ( ) designed a real-world stop sign with domain knowledge from adversarial examples and successfully caused the object recognition system to classify it as a speed limit sign. Several works, such as Lu et al. (33, 34), showed that adversarial stop signs in the physical world will not fool the modern CAV perception system, as the CAV is continuously moving and detecting the object at each time step. However, Chen et al. ( ) showed that the perception system of CAVs is still vulnerable to specific adversarial examples.

Corner Case Generation for Vehicle Decision-making

From another aspect, researchers have also focused on generating corner cases for CAV decision systems. Ma and Peng ( ) proposed a worst-case evaluation method, which formulated the disturbance generator and the controller as two players in a game and generated corner cases by finding the worst inputs from steering controllers and integrated chassis controllers. However, the proposed method focuses on a single vehicle without considering the influence of other traffic participants, which is crucial in the evaluation and testing of CAVs.

To generate cases with multiple traffic participants, researchers introduced the risky index and a probabilistic model of the environment to help generate critical cases. For example, Zhao et al. (36, 37) introduced importance sampling techniques and generated testing cases for car-following and lane-changing maneuvers. To reduce the overvalue problem of worst cases, Feng et al. (12, 38–41) defined the maneuver challenge and exposure frequency terms and generated cases in various environment settings, including cut-in scenarios, car-following scenarios, and the highway driving environment. Akagi et al. ( ) used a self-defined risky index and naturalistic driving data to sample critical cut-in scenarios. O'Kelly et al. ( ) utilized neural networks and imitation learning to calibrate a naturalistic driving model from the NGSIM data, and then chose a highway with 6 vehicles to generate testing cases. These critical case generation methods consider both the risky index and the naturalistic probability in the generation process. Although critical cases are significant for the systematic evaluation of CAVs, corner cases are also important, especially for the vulnerability identification of CAVs, which is complementary to the critical cases. How to generate and identify corner cases with high coverage, variability, and representativeness remains an open question.

To deal with corner cases of long duration, researchers introduced the Markov decision process (MDP) and reinforcement learning (RL) techniques to reduce the temporal complexity. Ding et al. ( ) modeled the environment as a combination of "blocks" and used the REINFORCE algorithm to generate corner cases. However, this modeling method can only be used in simplified environments and cannot deal with a large number of traffic participants. Koren et al. ( ) proposed the Adaptive Stress Testing (AST) method, which introduced Monte Carlo tree search (MCTS) and deep reinforcement learning (DRL) to solve the pedestrian-crossing problem. That study, however, also involves only one autonomous vehicle and one or two pedestrians. Karunakaran et al. ( ) utilized the deep Q-network (DQN) to generate corner cases involving one pedestrian and one autonomous vehicle.
However, the action chosen for the pedestrian is very simple, and the risk estimation concentrates only on the responsibility-sensitive safety (RSS) metric ( ), which may be misleading in predicting the crash probability.

To address the limitations of existing decision-making corner case generation methods, this paper proposes a corner case generation method over the decision domain. In most scenarios, a corner case is a sequence of snapshots of the environment with multiple traffic participants. Therefore, corner cases in real-world traffic environments are high dimensional. However, existing decision-making corner case generation methods have only been validated under simplified scenarios and cannot process highly complex environments. In this paper, we propose a corner case generation method for high-dimensional and complex traffic simulations. By modeling the environment as a Markov Decision Process (MDP), the complexity of the temporal domain is simplified. Moreover, we model the scenario as interactions between multiple traffic participants, which mitigates the curse of dimensionality in space.

CORNER CASE GENERATION

In this section, we propose a unified framework for corner case generation based on MDP formulation and DRL techniques. To address the challenge brought by high dimensionality, we formulate the traffic simulation environment as an MDP. By utilizing naturalistic driving data, we build naturalistic driving models (e.g., car-following and lane-changing models). In this way, the naturalistic driving environment can be modeled as the interactions between multiple background vehicles governed by the naturalistic driving models. To purposely generate corner cases, we formulate the generation problem as an optimization problem over the driving models of BVs, whose goal is to increase the probability of crashes. By utilizing DRL techniques, the optimization problem can be solved by learning the behavior policy of BVs. The learned behavior policy essentially leads to aggressive driving models of BVs, which can generate corner cases for the CAV under test.
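As a rough illustration of how a naturalistic driving environment can be driven by data, the sketch below samples each BV's action independently from a state-conditioned empirical distribution. The state keys, acceleration values, and probabilities are made-up placeholders and are not taken from the paper; a real model would be calibrated from naturalistic driving data.

```python
import random

# Hypothetical empirical action distributions, keyed by a discretized state.
# Each entry maps an acceleration (m/s^2) to its observed relative frequency.
EMPIRICAL_POLICY = {
    "close_gap": {-2.0: 0.5, 0.0: 0.4, 1.0: 0.1},
    "free_flow": {-1.0: 0.1, 0.0: 0.6, 1.0: 0.3},
}


def sample_action(state_key, rng):
    """Draw one action for a BV from the empirical distribution of its state."""
    dist = EMPIRICAL_POLICY[state_key]
    actions = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(actions, weights=weights, k=1)[0]


def step_all(bv_states, rng):
    """All BVs act simultaneously and independently within one time step."""
    return [sample_action(state, rng) for state in bv_states]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
actions = step_all(["close_gap", "free_flow", "free_flow"], rng)
```

Replacing `EMPIRICAL_POLICY` with a learned, more aggressive distribution is the only change needed to move from this naturalistic sketch toward a corner case generation environment.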

Problem formulation

In this paper, the problem formulation is consistent with (12, 38–41). Let $\theta$ describe the pre-determined parameters of the operational design domains (ODD), such as the number of lanes, weather, etc. The definitions of scenario and scene are adopted from ( ). Under specific ODD parameters, a scene describes a snapshot of the traffic environment, which includes the states of static elements and dynamic traffic participants (e.g., pedestrians and vehicles). A scenario describes the temporal development among a sequence of scenes. Let $X$ represent the decision variables of the scenario and $s$ denote the state of the scene. In this paper, we only consider background vehicles, and each vehicle has three parameters to describe its overall state: position $p$, velocity $v$, and heading angle $\alpha$. Therefore, the decision variable of one vehicle can be defined as $x = \{p, v, \alpha\}$. The number of vehicles in the CAV's neighborhood differs between time steps, so we define the number of observed vehicles at time step $t$ as $m_t$. Then, the scene at time step $t$ can be defined as $s_t = \{x_1^{(t)}, x_2^{(t)}, \cdots, x_{m_t}^{(t)}\}$, and the scenario can be defined as:

$$X = \{s_0, s_1, \cdots, s_n\}. \tag{1}$$

To simplify the problem, we model the traffic environment as a Markov Decision Process (MDP). Given a specific state $s_t$, the "agent" represents the background vehicles in the environment and makes a decision $u_t$ (e.g., accelerations). As $u_t$ is determined only by the state $s_t$, it can also be written as $u_t(s_t)$. Therefore, the scenario can be rewritten as follows:

$$X = \{s_0, u_0, s_1, u_1, s_2, \cdots, s_n\}. \tag{2}$$

Then, we define $A$ as the event of interest (e.g., a crash event). For a given scenario $X$, we can clearly identify whether it is the event of interest.
Therefore, $P(A \mid X)$ is known for a given $X$. Under a specific ODD, the probability of the event of interest can be written as follows:

$$P(A \mid \theta) = \sum_{X \in D} P(A \mid X)\, P(X \mid \theta), \tag{3}$$

where $P(X \mid \theta)$ denotes the probability of a specific scenario $X$ given the ODD parameter $\theta$, and $D$ represents the set of available scenarios. Using the notation of Equation (2), we can further decompose $P(X \mid \theta)$ in a factorized way as:

$$P(X \mid \theta) = P(s_0 \mid \theta) \prod_{k=0}^{m-1} P(s_{k+1} \mid u_k, s_k, \theta)\, P(u_k \mid s_k, \theta). \tag{4}$$

Here $P(s_{k+1} \mid u_k, s_k, \theta)$ is the state transition probability, i.e., the probability of the occurrence of $s_{k+1}$ given state $s_k$ and action $u_k$, and $P(u_k \mid s_k, \theta)$ denotes the probability of choosing action $u_k$ at state $s_k$.

In Equation (4), $P(s_0 \mid \theta)$ and $P(s_{k+1} \mid u_k, s_k, \theta)$ are determined by the environment, and $P(A \mid X)$ is determined by the CAV under test; these cannot be modified. Therefore, to improve the exposure frequency of corner cases ($P(A \mid \theta)$ in Equation (3)), we should modify $P(u_k \mid s_k, \theta)$ such that $P(X \mid \theta)$ is increased for those $X$ where $P(A \mid X) = 1$. $P(u_k \mid s_k, \theta)$ denotes the probability of the agent choosing action $u_k$ under state $s_k$, so it can be viewed as a stochastic policy. If the policy can be optimized to improve $P(A \mid \theta)$, corner cases can be generated more purposely. To achieve this goal, DRL techniques are utilized to train a new policy as the replacement of $P(u_k \mid s_k, \theta)$, which is elaborated as follows.

Deep Reinforcement Learning Based Method
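To make the role of the action distribution in Equation (4) concrete before the learning method is described, the following sketch evaluates the log-probability of a short scenario under two candidate action distributions. All state names, actions, and probabilities are illustrative assumptions; state transitions are taken to be deterministic, so their factors equal 1 and drop out of the product.

```python
import math


def scenario_log_prob(initial_prob, steps, policy):
    """log P(X | theta) following Equation (4), with deterministic transitions.

    steps  : list of (state, action) pairs along the scenario
    policy : dict mapping state -> {action: probability}
    """
    log_p = math.log(initial_prob)
    for state, action in steps:
        log_p += math.log(policy[state][action])
    return log_p


# Hypothetical two-state example in which the BV brakes hard twice in a row.
naturalistic = {
    "near": {"brake": 0.05, "keep": 0.95},
    "far": {"brake": 0.01, "keep": 0.99},
}
aggressive = {
    "near": {"brake": 0.60, "keep": 0.40},
    "far": {"brake": 0.30, "keep": 0.70},
}
scenario = [("far", "brake"), ("near", "brake")]

lp_nat = scenario_log_prob(1.0, scenario, naturalistic)
lp_agg = scenario_log_prob(1.0, scenario, aggressive)
```

In this toy setting the scenario probability rises from 0.0005 under the naturalistic distribution to 0.18 under the modified one. If such a scenario ends in a crash, shifting probability mass in this way is exactly what increases $P(A \mid \theta)$.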

By replacing $P(u_k \mid s_k, \theta)$ with the modified behavior policy $\pi_\rho(u_k \mid s_k, \theta)$, we can obtain a higher probability of the event of interest. The new definition is shown in the following equation:

$$P(X \mid \theta, \rho) = P(s_0 \mid \theta) \prod_{k=0}^{m-1} P(s_{k+1} \mid u_k, s_k, \theta)\, \pi_\rho(u_k \mid s_k, \theta). \tag{5}$$

The newly defined $\pi_\rho(u_k \mid s_k, \theta)$ represents a behavior policy to be learned in the DRL problem. The detailed optimization formulation is given in Equation (6):

$$\max_{\rho} P(A \mid \theta, \rho) = \sum_{X \in D} P(A \mid X)\, P(X \mid \theta, \rho). \tag{6}$$

The optimization problem can be considered as training a specific behavior policy in the simulation environment, which lets BVs interact aggressively with the CAV. Given the complexity of the environment, DRL techniques are well suited to solving this optimization problem. As an unsupervised algorithm, DRL can learn an optimal policy from experience given specific reward settings. By implementing Reinforcement Learning (RL) algorithms and using a Deep Neural Network (DNN) as the function approximator, DRL can solve problems of high complexity and derive well-behaved policies in areas such as video games ( ), robotics ( ), etc. Following the ideas of traditional RL, DRL also has three different approaches: value-based DRL, policy-based DRL, and actor-critic DRL (24, 48–53). In this paper, we mainly use value-based DRL to solve the optimization problem.

Value-based DRL, whose best-known instance is the Deep Q Network (DQN), aims at learning the optimal state-action value function $Q(s_t, u_t)$ by estimating the expected reward of action $u_t$ given state $s_t$. A DNN is introduced to represent the Q function: the neural network accepts the state and action information as input and returns the estimated state-action value. Specifically, DQN uses a neural network with parameter $\rho$ to approximate the optimal state-action value function:

$$Q^*_\rho(s, u) = \max_{\pi} \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i} \,\middle|\, s_t = s, u_t = u, \pi_\rho\right], \tag{7}$$

where $\gamma$ is the discount factor and $r_t$ denotes the immediate reward received at step $t$. With the optimal state-action value function, we can easily derive the optimal policy, which increases the probability of corner cases. The optimal policy is a deterministic policy as follows:

$$P(u_t \mid s_t) = \begin{cases} 1, & u_t = \arg\max_u Q(s_t, u), \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$

By giving a positive reward $r_A$ only to crash events, the training process of the DQN agent yields an optimal value function and an optimal policy that bring about a higher probability of crash events.

To further improve the performance of DRL techniques, the dueling network architecture ( ) is introduced. By defining the value function $V(s)$ and the advantage function $A(s, u)$, the dueling network architecture can estimate the state-action value function more precisely. The connection between the dueling network and the DQN is as follows:

$$A(s, u) = Q(s, u) - V(s). \tag{9}$$

Therefore, two neural networks are introduced to represent $A(s, u)$ and $V(s)$. These two networks often share some hidden layers, as shown in Figure 1.

FIGURE 1 Dueling Network Architecture

CORNER CASE ANALYSIS

With the learned behavior policy, a large number of corner cases can be generated by simulating the interaction between the agent and the environment. With a detailed understanding of the generated corner cases, the evaluation and development of CAVs can be more targeted and effective.

Due to the diversity of corner cases, the generated cases can usually be divided into several clusters as well as outliers that do not belong to any specific cluster. Therefore, typical corner cases from each cluster and the outliers can represent the internal properties of the generated corner cases; these are referred to as valuable corner cases. To identify the valuable corner cases, we need two techniques: feature extraction and clustering. The long duration and large number of traffic participants lead to the high dimension of the corner cases, which makes the analysis difficult. To address this difficulty, we introduce a feature extraction method to reduce the dimension and extract key information from the cases. In this paper, we utilize the Principal Component Analysis (PCA) algorithm to reduce the dimension. Moreover, because of this diversity, corner cases can usually be classified into different clusters, and different types of cases may differ significantly in frequency: some types may make up the majority of corner cases, while others form a minority. To preserve diversity, the majority and minority need to be analyzed separately. To achieve this goal, the DBSCAN method is used to separate the majority and minority of corner cases into two groups. Then, for each group, the K-means method is utilized to cluster the corner cases. As unsupervised methods, DBSCAN and K-means can be applied to cluster generic corner cases without predefined labels.
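The majority/minority split described above can be sketched with a small, self-contained DBSCAN. This is a minimal illustration rather than the authors' implementation: the two-dimensional feature points stand in for PCA-reduced corner cases, and the `eps` and `min_pts` values are arbitrary assumptions, since the paper does not report its DBSCAN settings.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D features: four similar cases and one isolated case.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
labels = dbscan(features, eps=0.3, min_pts=3)
majority = [p for p, label in zip(features, labels) if label != -1]
minority = [p for p, label in zip(features, labels) if label == -1]
```

Points labeled -1 form the minority group (outliers); all other points form the majority group that is passed on to K-means.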

Corner Case Feature Extraction

Corner cases are sequences of continuous snapshots of the traffic environment, which typically contain a large number of traffic participants. Corner cases are therefore high dimensional, which makes the analysis difficult. Feature extraction techniques can be utilized to reduce the dimension of the feature space while keeping essential information. Principal component analysis (PCA) ( ) is one of the most popular feature extraction methods. By maximizing the variance along each successive direction, PCA projects data points from a high-dimensional space to a low-dimensional space and transforms them to a new coordinate system. The first $k$ directions given by PCA are the directions along which the variance of the projected data is largest. Therefore, in most cases, keeping only a few coordinates given by PCA preserves the essential information of the original data. By introducing the PCA algorithm, we can extract several key features and simplify the subsequent analysis. Additionally, PCA can differentiate and visualize the distance and relatedness between different populations (i.e., data clusters). Therefore, the data points after the PCA projection are more suitable for the subsequent clustering analysis.

Corner Case Clustering

Among the generated corner cases, some normal scenarios may occupy a large proportion. For example, some libraries may contain a large number of homogeneous cases (e.g., many rear-end collisions). However, the value of normal scenarios is limited, as additional normal corner cases bring little new information about CAV performance. Instead, we should pay more attention to the valuable scenarios among the generated corner cases, such as the typical corner cases of a specific cluster and the outliers, whose properties differ significantly from those of the majority. In this paper, to detect the outliers, we utilize the density-based spatial clustering of applications with noise (DBSCAN) method ( ), which groups data points that are closely packed together. The outliers can therefore be identified as the data points in low-density regions, namely, the minority of corner cases.

After differentiating the majority and the minority, further analysis can be applied to each separately. Due to the diversity of the generated corner cases, different cases have different internal patterns and are distributed in different areas of the feature space. Therefore, by applying a clustering method, corner cases can be classified into different types. Additionally, some typical cases can represent a large number of corner cases in the same cluster. Clustering methods are commonly used to classify data points automatically, and the K-means method ( ) is among the most popular algorithms. As a non-parametric unsupervised learning method, K-means provides the clustering result that minimizes the in-group variance (squared Euclidean distances) for a given number of clusters, which helps differentiate clusters with different internal properties.

Valuable Corner Case Identification

After corner case clustering, the next step is valuable corner case extraction. The value of a specific corner case represents how much it can help in evaluating CAVs and improving their performance. From this perspective, two types of cases can be defined as valuable: typical cases and rare cases. Typical cases usually represent a large number of corner cases from the same cluster, and rare cases are usually the outliers, which have distinct properties compared with the majority and rarely happen in the NDE. Correspondingly, two techniques are utilized to extract valuable corner cases. First, after clustering, the typical corner cases can be selected by their distance to the center of a specific cluster. Second, by outlier detection methods powered by clustering, rare corner cases with various properties can be extracted. In this way, the identified valuable corner cases balance the coverage, variability, and representativeness of the cases.
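The selection of typical cases by distance to the cluster center can be sketched as follows. This is a minimal illustration under stated assumptions: a plain Lloyd's-algorithm K-means with hand-picked initial centers, hypothetical two-dimensional features standing in for PCA-reduced corner cases, and the single closest member taken as the typical case of each cluster (the paper does not specify how many cases are selected per cluster).

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        assign = [nearest_center(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment so labels match the returned centers.
    assign = [nearest_center(p, centers) for p in points]
    return centers, assign


def typical_cases(points, centers, assign):
    """For each cluster, the index of the member closest to its center."""
    result = {}
    for c, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            result[c] = min(members, key=lambda i: math.dist(points[i], center))
    return result


# Hypothetical 2-D features forming two well-separated groups.
features = [(0.0, 0.0), (0.3, 0.0), (0.1, 0.1),
            (4.0, 4.0), (4.3, 4.0), (4.1, 4.1)]
centers, assign = kmeans(features, centers=[features[0], features[3]])
representatives = typical_cases(features, centers, assign)
```

Here `representatives` maps each cluster index to the index of its typical case; rare cases would come from the separate outlier group rather than from this step.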

CASE STUDY

In this section, we validate our methodology using experiments in the highway simulation platform. This section consists of four parts. First, we introduce the naturalistic driving data (NDD) processing and NDD driving models. Second, we introduce the simulation platform used in the case study. Third, the proposed corner case generation method is validated in the simulation platform. Finally, by utilizing the corner case analysis method, we extract the valuable corner cases from different experiment settings.

Naturalistic Driving Data Processing

To build the naturalistic highway driving model, we implement a data-driven stochastic model from the Integrated Vehicle-Based Safety Systems (IVBSS) dataset ( ) at the University of Michigan, Ann Arbor. By integrating the forward collision warning (FCW), lane departure warning (LDW), lane change warning (LCW), and curve speed warning (CSW) functions on passenger cars and heavy trucks, the project aims to prevent rear-end and other crashes. The project collected 650,000 miles of driving data on heavy trucks and 175,000 miles of driving data on regular vehicles.

To calibrate the data-driven naturalistic driving model for the highway environment, we collected data points in which the velocity of vehicles is between 20 m/s and 40 m/s. As a result, we collected around 3 × data points and built the empirical distribution of BV actions for every state. By sampling actions from the empirical distribution, all BVs are essentially controlled by the naturalistic driving model, which forms the naturalistic driving environment (NDE).

Simulation Platform

Our simulation platform is based on the open-source simulation platform HIGHWAY-ENV ( ). This simulation environment is fully compatible with the OpenAI Gym environment ( ), so it can be used to train autonomous vehicle planning algorithms. In the HIGHWAY-ENV environment, the IDM car-following model ( ) and the MOBIL lane-changing model ( ) are used to provide continuous traffic flow and reasonable vehicle behaviors. However, the default vehicle models are deterministic and cannot reproduce the naturalistic randomness of vehicle behavior, which makes them unsuitable for the corner case generation process. To overcome this limitation, we redesign the simulation environment and add a control API to improve the controllability of BVs, so that the BVs can be controlled by the naturalistic driving model.

Corner Case Generation

We implement the DQN ( ) method using PyTorch, with SGD as the optimizer. Detailed experiment hyper-parameters are listed in Table 1. In the dueling neural network architecture ( ), there are 4 fully connected layers, each with 128 units. As shown in Figure 1, the dueling network splits into two streams of fully connected layers: the value stream and the advantage stream. Each stream has two fully connected layers with 128 units. The final output layers of the value stream and the advantage stream are also fully connected. The value stream has 1 output and the advantage stream has 33 outputs (pre-defined discrete actions). The value stream output and the advantage stream output are combined using Equation 9. During the training process, + −

FIGURE 2 DQN Training Result

TABLE 1 EXPERIMENTS HYPER-PARAMETERS

Hyper-parameter          Value
mini-batch size          16
replay memory size       1e6
discount factor          1
learning rate            1e-6
initial exploration      1
final exploration        0.1
replay start size        5000
target network update    1000

In the case study, we test one commonly used CAV model, which is constructed from the IDM car-following model and the MOBIL lane-change model. To better interpret the results, the CARLA simulator ( ) is used to visualize corner cases. We implement the Naturalistic Driving Environment (NDE) (12, 13) as our baseline, which contains the CAV model and NDD vehicles that follow the naturalistic driving model. In the NDE, the crash frequency is around 1 × − , which is rare. In the corner case generation environment, we control the nearest background vehicle around the CAV using the trained DQN. Results show that the crash frequency is about 0.286, as shown in Figure 2. Therefore, in terms of corner case generation frequency, we get about 3 × times more corner cases compared with the NDE.

To evaluate the corner case generation environment, we run about 50 miles in both the NDE and the corner case generation environment. The distributions of the bumper-to-bumper distance (BBD) and time to collision (TTC) metrics are calculated to compare the NDE with the corner case generation environment. The comparison of BBD can be seen in Figure 3(a) and Figure 3(b), and the comparison of TTC can be seen in Figure 3(c) and Figure 3(d). From these figures, we can see that the testing vehicle in the corner case generation environment has much smaller BBD and TTC with both the front car and the rear car. This suggests that the corner case generation environment is much riskier than the NDE.

To further test whether the corner case environment could fit non-surrogate CAV models, a CAV model based on reinforcement learning is introduced as another testing model. When the corner case environment is utilized for the surrogate IDM-based model, we get approximately 28 .

Corner Case Analysis

Corner Case Feature Extraction

In the generated corner cases (around 50,000 scenarios), different cases have different sizes. For the convenience of corner case analysis, we need one unified structure for selecting features from the corner cases. As discussed before, a scenario is composed of scenes, and the scene at time $t$ can be written as $\{x^{(t)}_{1}, x^{(t)}_{2}, \ldots, x^{(t)}_{m_t}\}$. Recall that the vehicle number $m_t$ is determined by the time $t$ and changes continuously at each time step. Therefore, to obtain features with the same shape from different time steps, we need to restrict the number of vehicles recorded. Based on domain knowledge, we select the most critical BV in the crash corner cases (i.e., the BV nearest to the CAV). Then we use the following vector as the extracted feature of one time step:

$[r_{lon}, r_{lat}, rr, h_{CAV}, h_{BV}]$, (10)

where $r_{lon}$ refers to the longitudinal relative distance, $r_{lat}$ refers to the lateral relative distance, $rr$ refers to the velocity difference, and $h_{CAV}$, $h_{BV}$ represent the heading angles of the CAV and the critical BV respectively. The extracted feature represents the key information in a snapshot of the traffic simulation. To characterize the long-term change of the traffic state, we continuously pick features from $k$ time steps before crashes.

FIGURE 3 Comparison between NDE and Corner Case Generation Environment: (a) BBD Front Car, (b) BBD Rear Car, (c) TTC Front Car, (d) TTC Rear Car

FIGURE 4 Illustration of Crash Types

FIGURE 5 Distributions of Crash Types

FIGURE 6 Demonstration of Each Crash Type in CARLA

In the experiment, eight values of the time period are applied, ranging from one time step to 15 consecutive time steps. For each time step, 5 features are considered, as shown in Equation 10. Therefore, for 15 time steps, 75 features are considered. To extract the crucial features from the high-dimensional data, we apply the PCA algorithm and keep the first two dimensions. After the feature extraction and projection, we implement the DBSCAN clustering method to differentiate the minority and the majority of the generated corner cases. The distribution of the projected data points on the 2-D plane can be seen in Figure 7. The data points located in the high-density area are classified as the majority (green), while the low-density ones are classified as the minority (red). From the clustering result, one obvious trend is that, as the time period increases, the data points become less concentrated. For the features of 1 s or 2 s, only several data points are identified as the minority (marked in red). However, for quite a long time period (15 s), nearly all data points are scattered over the feature space, which makes it harder to differentiate the majority and the minority. Therefore, in the following analysis, we use the 1 s data as an example to illustrate the result of the generated corner cases. This is also reasonable considering that most vehicle accidents in the real world involve only a small number of vehicles in a short period.
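The per-step feature of Equation 10 and its stacking over the last k time steps before a crash can be illustrated with a minimal Python sketch. The `VehicleState` container, the function names, and the sign convention (BV minus CAV) are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VehicleState:
    """Hypothetical per-step vehicle record (not the authors' data format)."""
    x: float        # longitudinal position (m)
    y: float        # lateral position (m)
    v: float        # speed (m/s)
    heading: float  # heading angle (rad)


def snapshot_features(cav: VehicleState, bv: VehicleState) -> List[float]:
    """Five features of one time step: [r_lon, r_lat, rr, h_CAV, h_BV]."""
    return [
        bv.x - cav.x,  # r_lon: longitudinal relative distance (assumed BV - CAV)
        bv.y - cav.y,  # r_lat: lateral relative distance (assumed BV - CAV)
        bv.v - cav.v,  # rr: velocity difference (assumed BV - CAV)
        cav.heading,   # h_CAV
        bv.heading,    # h_BV
    ]


def crash_feature_vector(
    cav_traj: Sequence[VehicleState],
    bv_traj: Sequence[VehicleState],
    k: int,
) -> List[float]:
    """Concatenate the snapshot features of the last k steps before the crash."""
    if len(cav_traj) != len(bv_traj):
        raise ValueError("trajectories must have the same length")
    if not 1 <= k <= len(cav_traj):
        raise ValueError("k must be between 1 and the trajectory length")
    features: List[float] = []
    for cav, bv in zip(cav_traj[-k:], bv_traj[-k:]):
        features.extend(snapshot_features(cav, bv))
    return features


if __name__ == "__main__":
    cav = [VehicleState(float(t), 0.0, 30.0, 0.0) for t in range(20)]
    bv = [VehicleState(float(t) + 10.0, 3.5, 28.0, 0.1) for t in range(20)]
    print(len(crash_feature_vector(cav, bv, k=15)))  # 5 features x 15 steps = 75
```

Because every crash case is reduced to the same 5k-dimensional vector regardless of how many vehicles were in the scene, the vectors can be passed directly to PCA.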

FIGURE 7 Corner Case Extracted Features
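The majority/minority split on the 2-D projected points can be sketched with a small, self-contained DBSCAN. This is a textbook version written in plain Python for illustration, not the authors' code; the `eps` and `min_pts` values in the usage example are arbitrary assumptions, and treating DBSCAN noise points as the "minority" is our reading of the description above.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for low-density points (treated here as the minority)


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def split_majority_minority(
    points: Sequence[Point], eps: float, min_pts: int
) -> Tuple[List[Point], List[Point]]:
    """Points in any dense cluster form the majority; noise forms the minority."""
    labels = dbscan(points, eps, min_pts)
    majority = [p for p, lab in zip(points, labels) if lab != NOISE]
    minority = [p for p, lab in zip(points, labels) if lab == NOISE]
    return majority, minority


if __name__ == "__main__":
    dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
    sparse = [(5.0, 5.0), (-4.0, 3.0)]
    major, minor = split_majority_minority(dense + sparse, eps=0.15, min_pts=4)
    print(len(major), minor)
```

In practice a library implementation would normally be preferred; the hand-written version is only meant to make the density criterion behind the green and red points explicit.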

Corner Case Clustering

After applying the DBSCAN method to the generated corner cases, we further cluster the minority of the generated corner cases. As shown in Figure 8, the majority (part A) is distinguished from the minority. We can see that the majority of the corner cases form a rectangle in the projected PCA feature space. In this rectangle, most cases share some common characteristics: the CAV is running straight on the road, and the background vehicle crashes directly into the CAV from different angles and different positions, as shown in Figure 9A. In this type of crash, the crash angle and the relative crash position are continuous in the parameter space, while the other parameters are the same. Therefore, even though there appear to be many different kinds of crashes in this crash type, they are closely connected in the PCA feature space.

From the 2-D projection of the feature vector, we can see that the minority can be assigned to several clusters, while the majority is closely connected. Therefore, in this case, we apply a clustering method (K-means) to the minority, resulting in 4 clusters (B, C, D, E in Figure 8). The clustering process took 0 . s on a laptop with an i7-8750H CPU and 24 GB RAM. The cases within each cluster share similar internal properties. For example, the cases of cluster B in Figure 8 and Figure 9 demonstrate the rear-end cases, in which one BV cuts into the CAV's lane and forces the CAV to change lanes, making the CAV crash into another BV in the original lane behind the CAV. Cases of cluster C indicate another crash type: the CAV and the BV change lanes simultaneously and crash into each other. Cases of cluster D show that the BV tries to change into the lane of the CAV and causes a crash while making the lane change. The only case in cluster E behaves similarly to the cases in cluster D. However, there are some slight differences: the crash in cluster E happens after the BV has changed into the lane of the CAV and can be defined as a rear-end crash.

FIGURE 8 Corner Case Clustering Results

"NDD Bounded" Case Study

We also provide the analysis results of another, "NDD bounded" generation method. In Equation 6, the controlled BV can choose any action in the pre-defined action space. However, in the NDE, the BVs generally have limited choices of actions. To address this issue, we slightly modify the optimization problem in Equation 6 by adding action constraints. By applying the constraints, we restrict the available actions of BVs, so the agent in the environment can only choose actions that are possible in the NDE. The new optimization problem can be written as follows:

$\max_{\rho} P(A \mid \theta, \rho)$, (11)

s.t. $\pi_{\rho}(u_k \mid s_k, \theta) = 0$ if $P(u_k \mid s_k, \theta) = 0$. (12)

Therefore, the agent trained on the modified optimization problem can only reproduce corner cases that are possible in the NDE. In this way, the generated corner cases will be much more realistic and valuable for the CAV evaluation. Using the same analysis method as for the previous result, we apply PCA to the crash event data and reduce the data to two dimensions. After that, we apply the DBSCAN algorithm to the PCA-projected features to get different clusters and outliers. The experiment is applied to 1627 data samples and costs 0 . s on a laptop with an i7-8750H CPU and 24 GB RAM. Detailed results can be seen in Figure 10.

From Figure 10, we can see that two main clusters and several outliers (D) are identified. To analyze the results, we split one large cluster into B and C, which are closely connected but show different data layouts. As shown in Figure 11, data points in area A share similar properties: the BV suddenly cuts in on the CAV and causes a rear-end collision. We define it as the "aggressive cut-in" cluster. B, C, and D are in nearly the same situation: the CAV and the BV change lanes at the same time and crash in the middle lane. We define these as the "lane conflict" cases. Even though the causes of the crashes are similar, these three clusters have different crash snapshots. In cluster B, the CAV and the BV are involved in a side-by-side collision. In cluster C, the CAV (BV) crashes into the rear of the BV (CAV). In cluster D, which is identified as the relatively rare events (minority) in the generated corner cases, the CAV and the BV do not crash until they have changed to the same lane, after which a rear-end collision happens. Therefore, even though the last three clusters are corner cases with the same cause, the decision variables and conditions differ case by case. In the outlier part (D), we can find some cases with extreme decision variables.

FIGURE 9 Corner Case Examples

FIGURE 10 Corner Case Clustering Results in the "NDD Bounded" Case Study

The extracted valuable corner cases can provide insight for the further improvement of the CAV model. Although some cases are inevitable, there are cases caused by flaws of the CAV model, which can be avoided by more intelligent CAVs. For example, for the CAV model in the case study, one significant limitation is the lack of behavioral competency for lateral collision avoidance, especially during the lane-changing process, as shown in clusters B, C, and D of Figure 11.
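One common way to enforce a constraint of the form in Equation 12 in a value-based agent is action masking: actions whose NDE probability is zero are removed before the greedy or exploratory choice is made. The sketch below illustrates that idea only; it is not the authors' implementation, and the function names and the assumption that the NDE probabilities are available as a list per state are hypothetical.

```python
import random
from typing import List, Sequence


def allowed_actions(ndd_probs: Sequence[float]) -> List[int]:
    """Indices of actions with non-zero probability under the NDE model."""
    allowed = [a for a, p in enumerate(ndd_probs) if p > 0.0]
    if not allowed:
        raise ValueError("no action has non-zero NDE probability in this state")
    return allowed


def ndd_bounded_greedy_action(
    q_values: Sequence[float], ndd_probs: Sequence[float]
) -> int:
    """Greedy action restricted to actions that are possible in the NDE."""
    if len(q_values) != len(ndd_probs):
        raise ValueError("q_values and ndd_probs must have the same length")
    return max(allowed_actions(ndd_probs), key=lambda a: q_values[a])


def ndd_bounded_epsilon_greedy(
    q_values: Sequence[float],
    ndd_probs: Sequence[float],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy choice in which exploration is also limited to NDE-possible actions."""
    if rng.random() < epsilon:
        return rng.choice(allowed_actions(ndd_probs))
    return ndd_bounded_greedy_action(q_values, ndd_probs)


if __name__ == "__main__":
    q = [5.0, 1.0, 3.0]       # action 0 has the highest value ...
    p_ndd = [0.0, 0.2, 0.8]   # ... but never occurs in the NDE in this state
    print(ndd_bounded_greedy_action(q, p_ndd))  # 2
```

Masking during both exploration and exploitation means the replay memory never contains transitions that the NDE could not produce, which matches the intent of restricting the agent to realistic corner cases.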

FIGURE 11 "NDD Bounded" Corner Case Examples

CONCLUSION

In this paper, we propose a decision-making corner case generation and analysis method for CAV testing and evaluation. By utilizing the MDP formulation and DRL techniques, the corner cases of the highway driving environment are purposely generated with a higher probability compared with the NDE. After the corner cases are generated, the valuable ones are further identified by the corner case analysis method, which includes feature extraction and clustering techniques. Two case studies are provided to validate the proposed methods. Results show that the valuable corner cases can be effectively generated and identified, which helps CAV evaluation and development by revealing flaws of the given CAV model. Future studies may focus on introducing more diverse traffic participants (pedestrians, traffic lights, etc.). Furthermore, corner case generation for urban driving environments deserves more investigation, including intersection and roundabout scenarios.

ACKNOWLEDGMENTS

The authors would like to thank the US Department of Transportation (USDOT) Region 5 University Transportation Center: Center for Connected and Automated Transportation (CCAT) of the University of Michigan for funding the research. The authors would like to thank Miss Miaoshiqi Liu for valuable suggestions. The views presented in this paper are those of the authors alone.

AUTHOR CONTRIBUTION STATEMENT

The authors confirm contribution to the paper as follows: study concept and design: Haowei Sun, Shuo Feng, and Henry Liu; simulation platform construction: Haowei Sun, Xintao Yan; analysis and interpretation of results: Haowei Sun, Shuo Feng, and Henry Liu; draft manuscript preparation: Haowei Sun, Shuo Feng, and Henry Liu. All authors reviewed the results and approved the final version of the manuscript.

REFERENCES

1. Kalra, N. and S. M. Paddock, Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, Vol. 94, 2016, pp. 182–193.
2. Li, L., W.-L. Huang, Y. Liu, N.-N. Zheng, and F.-Y. Wang, Intelligence testing for autonomous vehicles: A new approach. IEEE Transactions on Intelligent Vehicles, Vol. 1, No. 2, 2016, pp. 158–166.
3. Li, L., Y.-L. Lin, N.-N. Zheng, F.-Y. Wang, Y. Liu, D. Cao, K. Wang, and W.-L. Huang, Artificial intelligence test: a case study of intelligent vehicles. Artificial Intelligence Review, Vol. 50, No. 3, 2018, pp. 441–465.
4. Koren, M., S. Alsaif, R. Lee, and M. J. Kochenderfer, Adaptive stress testing for autonomous vehicles. In , IEEE, 2018, pp. 1–7.
5. Cui, L., J. Hu, B. B. Park, and P. Bujanovic, Development of a simulation platform for safety impact analysis considering vehicle dynamics, sensor errors, and communication latencies: Assessing cooperative adaptive cruise control under cyber attack. Transportation Research Part C: Emerging Technologies, Vol. 97, 2018, pp. 1–22.
6. Thorn, E., S. C. Kimmel, M. Chaka, B. A. Hamilton, et al., A framework for automated driving system testable cases and scenarios. United States, Department of Transportation, National Highway Traffic Safety Administration, 2018.
7. Li, L., X. Wang, K. Wang, Y. Lin, J. Xin, L. Chen, L. Xu, B. Tian, Y. Ai, J. Wang, et al., Parallel testing of vehicle intelligence via virtual-real interaction. Science Robotics, 2019.
8. Wang, P., Y. Chen, C. Wang, F. Liu, J. Hu, and N. N. Van, Development and verification of cooperative adaptive cruise control via LTE-V. IET Intelligent Transport Systems, Vol. 13, No. 6, 2019, pp. 991–1000.
9. Liu, P., Z. Xu, and X. Zhao, Road tests of self-driving vehicles: affective and cognitive pathways in acceptance formation. Transportation Research Part A: Policy and Practice, Vol. 124, 2019, pp. 354–369.
10. Liu, P., R. Yang, and Z. Xu, How safe is safe enough for self-driving vehicles? Risk Analysis, Vol. 39, No. 2, 2019, pp. 315–325.
11. Li, L., N. Zheng, and F.-Y. Wang, A Theoretical Foundation of Intelligence Testing and Its Application for Intelligent Vehicles. IEEE Transactions on Intelligent Transportation Systems, 2020.
12. Feng, S., X. Yan, H. Sun, Y. Feng, and H. X. Liu, Intelligent driving intelligence test for autonomous vehicles with naturalistic and adversarial environment. Nature Communications, 2021.
13. Yan, X., S. Feng, H. Sun, and H. X. Liu, Distributionally Consistent Simulation of Naturalistic Driving Environment for Autonomous Vehicle Testing. arXiv preprint arXiv:2101.02828, 2021.
14. Chen, S.-T., C. Cornelius, J. Martin, and D. H. Chau, Robust physical adversarial attack on faster r-cnn object detector. arXiv preprint arXiv:1804.05810, Vol. 2, No. 3, 2018, p. 4.
15. Evtimov, I., K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945, Vol. 2, No. 3, 2017, p. 4.
16. Eykholt, K., I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song, Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1625–1634.
17. Xie, C., J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1369–1378.
18. Ding, W., M. Xu, and D. Zhao, Learning to Collide: An Adaptive Safety-Critical Scenarios Generating Method. arXiv preprint arXiv:2003.01197, 2020.
19. Treiber, M., A. Hennecke, and D. Helbing, Congested traffic states in empirical observations and microscopic simulations. Physical Review E, Vol. 62, No. 2, 2000, p. 1805.
20. Treiber, M. and D. Helbing, Mobil: General lane-changing model for car-following models. Disponıvel Acesso Dezembro, 2016.
21. Krauß, S., P. Wagner, and C. Gawron, Metastable states in a microscopic model of traffic flow. Physical Review E, Vol. 55, No. 5, 1997, p. 5597.
22. Bareiss, D. and J. van den Berg, Generalized reciprocal collision avoidance. The International Journal of Robotics Research, Vol. 34, No. 12, 2015, pp. 1501–1514.
23. Weng, B., S. J. Rao, E. Deosthale, S. Schnelle, and F. Barickman, Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving Systems. In , IEEE, 2020, pp. 1899–1906.
24. Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning. Nature, Vol. 518, No. 7540, 2015, pp. 529–533.
25. Wold, S., K. Esbensen, and P. Geladi, Principal component analysis. Chemometrics and Intelligent Laboratory Systems, Vol. 2, No. 1-3, 1987, pp. 37–52.
26. Alsabti, K., S. Ranka, and V. Singh, An efficient k-means clustering algorithm, 1997.
27. Ester, M., H.-P. Kriegel, J. Sander, and X. Xu, Density-based spatial clustering of applications with noise. In Int. Conf. Knowledge Discovery and Data Mining, 1996, Vol. 240, p. 6.
28. Goodfellow, I. J., J. Shlens, and C. Szegedy, Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
29. Yuan, X., P. He, Q. Zhu, and X. Li, Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, Vol. 30, No. 9, 2019, pp. 2805–2824.
30. Buckner, C., Understanding adversarial examples requires a theory of artefacts for deep learning. Nature Machine Intelligence, Vol. 2, No. 12, 2020, pp. 731–736.
31. Szegedy, C., W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
32. Kurakin, A., I. Goodfellow, and S. Bengio, Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
33. Lu, J., H. Sibai, E. Fabry, and D. Forsyth, No need to worry about adversarial examples in object detection in autonomous vehicles. arXiv preprint arXiv:1707.03501, 2017.
34. Lu, J., H. Sibai, E. Fabry, and D. Forsyth, Standard detectors aren’t (currently) fooled by physical adversarial stop signs. arXiv preprint arXiv:1710.03337, 2017.
35. Ma, W.-H. and H. Peng, A worst-case evaluation method for dynamic systems, 1999.
36. Zhao, D., H. Lam, H. Peng, S. Bao, D. J. LeBlanc, K. Nobukawa, and C. S. Pan, Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques. IEEE Transactions on Intelligent Transportation Systems, Vol. 18, No. 3, 2016, pp. 595–607.
37. Zhao, D., X. Huang, H. Peng, H. Lam, and D. J. LeBlanc, Accelerated evaluation of automated vehicles in car-following maneuvers. IEEE Transactions on Intelligent Transportation Systems, Vol. 19, No. 3, 2017, pp. 733–744.
38. Feng, S., Y. Feng, C. Yu, Y. Zhang, and H. X. Liu, Testing scenario library generation for connected and automated vehicles, Part I: Methodology. IEEE Transactions on Intelligent Transportation Systems, 2020.
39. Feng, S., Y. Feng, H. Sun, S. Bao, Y. Zhang, and H. X. Liu, Testing scenario library generation for connected and automated vehicles, Part II: Case studies. IEEE Transactions on Intelligent Transportation Systems, 2020.
40. Feng, S., Y. Feng, H. Sun, Y. Zhang, and H. X. Liu, Testing Scenario Library Generation for Connected and Automated Vehicles: An Adaptive Framework. IEEE Transactions on Intelligent Transportation Systems, 2020, pp. 1–10.
41. Feng, S., Y. Feng, X. Yan, S. Shen, S. Xu, and H. X. Liu, Safety assessment of highly automated driving systems in test tracks: a new framework. Accident Analysis & Prevention, Vol. 144, 2020, p. 105664.
42. Akagi, Y., R. Kato, S. Kitajima, J. Antona-Makoshi, and N. Uchida, A risk-index based sampling method to generate scenarios for the evaluation of automated driving vehicle safety. In , IEEE, 2019, pp. 667–672.
43. O’Kelly, M., A. Sinha, H. Namkoong, R. Tedrake, and J. C. Duchi, Scalable end-to-end autonomous vehicle testing via rare-event simulation. Advances in Neural Information Processing Systems, Vol. 31, 2018, pp. 9827–9838.
44. Karunakaran, D., S. Worrall, and E. Nebot, Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles. arXiv preprint arXiv:2003.01886, 2020.
45. Shalev-Shwartz, S., S. Shammah, and A. Shashua, On a formal model of safe and scalable self-driving cars. arXiv preprint arXiv:1708.06374, 2017.
46. Ulbrich, S., T. Menzel, A. Reschka, F. Schuldt, and M. Maurer, Defining and substantiating the terms scene, situation, and scenario for automated driving. In , IEEE, 2015, pp. 982–988.
47. Tai, L., G. Paolo, and M. Liu, Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In , IEEE, 2017, pp. 31–36.
48. Sutton, R. S. and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
49. Mnih, V., K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
50. Van Hasselt, H., A. Guez, and D. Silver, Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461, 2015.
51. Hessel, M., J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, Rainbow: Combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298, 2017.
52. Wang, Z., T. Schaul, M. Hessel, H. Hasselt, M. Lanctot, and N. Freitas, Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning, PMLR, 2016, pp. 1995–2003.
53. Lillicrap, T. P., J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
54. Ference, J. J., The integrated vehicle-based safety systems initiative. National Highway Traffic Safety Administration, 2006.
55. Leurent, E., An Environment for Autonomous Driving Decision-Making. https://github.com/eleurent/highway-env, 2018.
56. Brockman, G., V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, OpenAI Gym, 2016.
57. Dosovitskiy, A., G. Ros, F. Codevilla, A. Lopez, and V. Koltun, CARLA: An open urban driving simulator. arXiv preprint arXiv:1711.03938