Scalable Autonomous Vehicle Safety Validation through Dynamic Programming and Scene Decomposition
Anthony Corso, Ritchie Lee, and Mykel J. Kochenderfer

Abstract — An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over failures of an autonomous policy using approximate dynamic programming. Knowledge of this distribution allows for the efficient discovery of many failure examples. To address the problem of scalability, we decompose complex driving scenarios into subproblems consisting of only the ego vehicle and one other vehicle. These subproblems can be solved with approximate dynamic programming and their solutions are recombined to approximate the solution to the full scenario. We apply our approach to a simple two-vehicle scenario to demonstrate the technique as well as a more complex five-vehicle scenario to demonstrate scalability. In both experiments, we observed an increase in the number of failures discovered compared to baseline approaches.
I. INTRODUCTION

One common practice for automated vehicle (AV) safety validation is to maintain a suite of challenging driving scenarios that the vehicle must successfully navigate after each update to the driving policy. Although useful, this approach will miss any failures that are not already included in the test suite. Automated testing procedures that treat the vehicle as a black box must be developed to catch unknown and unexpected failure modes of the AV, which could dramatically decrease testing time and improve the safety of autonomous vehicles.

Much of the literature on black-box testing focuses on falsification, where inputs are generated that cause a system to violate a safety specification [1]. Those inputs serve as a counterexample to the hypothesis that the system is safe. For autonomous driving, it is not feasible to create an agent that can avoid all possible accidents [2], so rather than find any failure of an AV, it is preferable to find the most likely failures. Traditional falsification techniques do not consider the probability of the failures they find and are therefore ill-suited to this goal. Adaptive stress testing [3] tries to find the most-likely failure of an autonomous system. This approach can improve the likelihood of discovered failures but does not necessarily explore the range of possible failures of the system. The goal of this work is to develop a safety validation approach that can reliably find all of the most relevant failures of an autonomous vehicle.
A. Corso and M. J. Kochenderfer are with the Aeronautics and Astronautics Department, Stanford University. e-mail: {acorso, mykel}@stanford.edu. R. Lee is with the Robust Software Engineering group at NASA Ames Research Center. e-mail: [email protected].

Our approach attempts to estimate the distribution over failures of an autonomous vehicle operating in a stochastic environment. If we assume that the vehicle's policy and simulator are Markov, then we show that the problem simplifies to estimating the probability of failure at each state, a computation which can be performed using approximate dynamic programming (DP). Approximate DP is particularly effective at finding failures because it can start at a failure and work backward to see what led to it. Unfortunately, this approach has difficulty scaling to large state spaces. To improve scalability, we use the structure of driving scenarios by decomposing the simulation into pairwise interactions between the ego vehicle and other agents on the road. These subproblems are tractable for approximate DP, and their solutions can be recombined to approximate the solution for the full problem. To account for the approximation error due to multi-agent interactions, we combine the subproblems using a learned set of weights.

We apply our approach to two driving scenarios: a simple two-vehicle scenario to demonstrate the effectiveness of DP, and a more complex five-vehicle scenario to demonstrate the favorable scaling of the approach. In both experiments, we observed increases in the number of failures discovered compared to baseline approaches, and the discovered failures had comparatively high likelihood. The main contributions of this work are:
• A safety validation approach that estimates the distribution over failures using approximate DP.
• An algorithm for problem decomposition and reconstruction to scale approximate DP to complex driving scenarios.
• Demonstration of these techniques on two realistic driving scenarios and observation of a significant increase in rates of discovered failures.

The remainder of the paper is organized as follows: section II gives an overview of related work in the field of black-box validation for autonomous driving, section III describes our proposed technique in detail, section IV outlines the two experiments and describes our results, and section V concludes and discusses future work.

II. RELATED WORK
This section discusses safety validation of autonomous systems. We first give a brief overview of black-box falsification algorithms and then discuss approaches that were developed specifically for autonomous driving.

A. Safety Validation of Black-Box Systems

Falsification of black-box systems involves finding inputs to the system that lead to violation of the system specifications. State-of-the-art approaches cast falsification as a global optimization problem over the input space [1] and try to solve it using surrogate models [4], deep reinforcement learning [5], genetic algorithms [6], Monte Carlo tree search [7], or cross-entropy optimization [8]. Adaptive stress testing (AST) [3], [9]–[11] frames the problem of falsification as a Markov decision process and uses reinforcement learning to find the most-likely failures of a system according to a prescribed probability model. The field of statistical model checking [12] deals with estimating the probability of failure, and in doing so will find inputs to the system that cause it to fail.

Several approaches rely on sampling-based methods to discover failures. Huang et al. [13] use bootstrapping and importance sampling to obtain a low-variance estimate of the probability of failure. Another approach uses importance sampling via the cross-entropy method to increase the number of failures found in simulation [14], [15]. Uesato et al. [16] use previous versions of an autonomous agent to help find failures in the final version, an approach that works when agents have learned behavior. Similar to the present work, Chryssanthacopoulos, Kochenderfer, and Williams [17] use DP to estimate the probability of failure.
B. Safety of Autonomous Vehicles
Some work has focused on falsifying components of an autonomous vehicle such as adaptive cruise control [18] or perception systems [19], [20]. Other work has focused on the generation of critical test cases. For example, Mullins et al. [21] identify regions of the input space that separate distinct types of autonomous agent behavior, and Althoff and Lutz [22] design adversarial agents to minimize the safe available driving space of the autonomous vehicle.

III. PROPOSED APPROACH

This section describes our approach to the safety validation problem. We start with the problem formulation and definition of notation. Then, we describe our technique for estimating the distribution over failures assuming we know the probability of failure from each state. Lastly, we describe how to compute that probability in a scalable way.
A. Problem Formulation
Suppose we wish to analyze the safety of a black-box autonomous system (system-under-test, or SUT) that operates in a stochastic simulated environment. The state of the SUT and the environment is s ∈ S, and the disturbances x ∈ X are stochastic elements of the environment that influence the behavior of the SUT. A state-disturbance trajectory τ = {s_0, x_1, s_1, …, x_N, s_N} has a likelihood of occurrence p(τ). We define E as the set of all failure states of the SUT, and the notation s_N ∈ E means that the trajectory τ ends in a failure. Let T be the set of all terminal states, where E ⊆ T. We would like to know the distribution over failures

    f(τ) = 𝟙{s_N ∈ E} p(τ) / 𝔼_p[𝟙{s_N ∈ E}]    (1)

where 𝟙 is the indicator function and the denominator normalizes the distribution. Note that f(τ) is the minimum-variance importance sampling distribution for estimating the probability of failure.

B. Estimating the Distribution Over Failures
The space of all trajectories is exponential in the length of the trajectory, so it will be challenging to represent the distribution f(τ) directly. To reduce the dimensionality of the distribution, we assume that the SUT and environment are Markov. The current disturbance x and next state s′ then depend only on the current state s, such that

    p(x, s′ | s) = p(x | s) p(s′ | s, x)    (2)

where p(x | s) is the disturbance model and p(s′ | s, x) is the dynamics. If we also assume that the dynamics of the SUT and the environment are deterministic (i.e., all stochasticity is controlled through disturbances), then

    p(x, s′ | s) = p(x | s).    (3)

With these assumptions, the distribution over failures depends only on p(x | s) and is given by

    f(τ) = (𝟙{s_N ∈ E} / 𝔼_p[𝟙{s_N ∈ E}]) ∏_{t=1}^{N} p(x_t | s_{t−1}).    (4)

The Markov assumption allows us to find a distribution over disturbances, or stochastic policy, π that generates sample trajectories (rollouts) distributed according to f. Let

    π(x | s) = p(x | s) v(s′) / Σ_{x′} p(x′ | s) v(s′′) = p(x | s) v(s′) / v(s)    (5)

where v(s) is the probability of failure from state s and s′′ is the state reached from s after applying disturbance x′. The second equality in eq. (5) comes from the observation that the probability of failure in the current state is a sum of the probabilities of failure over possible next states, weighted by the likelihood of reaching each state.

Proposition 1:
Trajectories generated from rollouts of the policy π will be distributed according to f.

Proof: Let f*(τ) be the distribution induced by rollouts of the policy π. We will show that for any τ, f(τ) = f*(τ). First, we define the Bellman equation that describes the probability of failure of a Markov system as

    v(s) = 1                          if s ∈ E
           0                          if s ∉ E, s ∈ T
           Σ_x p(x | s) v(s′)         otherwise.    (6)

Then consider an arbitrary trajectory τ that has probability according to f* given by

    f*(τ) = ∏_{t=1}^{N} π(x_t | s_{t−1}) = ∏_{t=1}^{N} p(x_t | s_{t−1}) v(s_t) / v(s_{t−1})    (7)

and probability according to f given by eq. (4). There are two cases to consider.

Case 1: s_N ∉ E. Due to the indicator function in eq. (4), f(τ) = 0. With final state s_N ∈ T and s_N ∉ E, we have v(s_N) = 0. The last term in the product in eq. (7) contains v(s_N), making f*(τ) = 0.

Case 2: s_N ∈ E. Considering the definition of f*(τ), we have

    f*(τ) = ∏_{t=1}^{N} p(x_t | s_{t−1}) v(s_t) / v(s_{t−1})    (8)
          = (v(s_N) / v(s_0)) ∏_{t=1}^{N} p(x_t | s_{t−1})    (9)
          = (1 / 𝔼_p[𝟙{s_N ∈ E}]) ∏_{t=1}^{N} p(x_t | s_{t−1})    (10)
          = f(τ)    (11)

where, in the second line, all of the v(s_t) terms cancel except v(s_0) and v(s_N), and eq. (6) was used to let v(s_N) = 1. In the third line, we observe that the probability of failure at the initial state, v(s_0), is equivalent to the expectation of failure 𝔼_p[𝟙{s_N ∈ E}]. Thus, in all cases, f(τ) = f*(τ). ∎

Assuming that the distribution over disturbances p(x | s) is provided (either from domain knowledge or data), the problem of computing the distribution over failures amounts to computing the probability of failure at each state, v(s). Additionally, once v(s) is known, failures can be generated from any initial condition from which a failure can be reached.

C. Computing the Probability of Failure
The feasibility of computing the probability of failure v(s) depends on the size of the state and disturbance spaces. If those spaces are discrete and relatively small, then DP can be used to compute v to any desired level of accuracy. If the state space is continuous but small enough to be discretized, then local approximation DP can be used to estimate v(s) [23]. As will be demonstrated by our experiments, this approach is feasible for interactions between two vehicles on the road. For more vehicles, discretizing the state space becomes intractable and we must rely on further approximation.

When scaling to much larger state spaces, we can leverage the structure of the problem to improve scalability. We propose to decompose a complicated driving scenario into pairwise interactions between the ego vehicle and other agents on the road, similar to the decomposition approach used by Bouton et al. [24]. Each subproblem can then be solved for the probability of failure between the i-th vehicle and the ego vehicle, yielding v_i(s^(i)), where s^(i) is the subset of the state representing only those vehicles. To combine the probabilities of failure from each of m subproblems, we use the transfer learning approach called attend, adapt and transfer (A2T) [25]. A2T combines the solutions of m problems with a solution learned from scratch, v_base, using a learned set of state-dependent attention weights w(s). The estimated probability of failure for a state s, ṽ(s), is then given by

    ṽ(s) = w_0(s) v_base(s) + Σ_{i=1}^{m} w_i(s) v_i(s^(i))    (12)

where the w_i and v_base have parameters that can be learned. The use of attention weights allows A2T to learn which solutions are most relevant in which states.

Fig. 1: The A2T network. The state feeds a base network, the subproblem solutions, and an attention network; the solutions produce values, the attention network produces weights, and their combination gives ṽ(s). Dashed lines represent backpropagation for learning parameters.
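For the small discrete case described above, v(s) can be computed by iterating the Bellman equation (eq. (6)), and eq. (5) then turns v into a failure-seeking rollout policy. The following is a minimal sketch on a toy birth-death chain; the paper's implementation uses Julia and a driving simulator, so this Python example and all of its states and probabilities are illustrative only.

```python
import random

# Toy chain: states 0..4; state 4 is a failure (s in E), state 0 is a
# safe terminal (s in T but not E). The disturbance x in {+1, -1} moves
# the state; the rare disturbance +1 (toward failure) has probability 0.1.
P_X = {+1: 0.1, -1: 0.9}
FAIL, SAFE = 4, 0

def value_iteration(n_sweeps=200):
    """Iterate the Bellman equation (eq. 6) to a fixed point:
    v(s) = 1 for failure states, 0 for safe terminals,
    sum_x p(x|s) v(s + x) otherwise."""
    v = {s: 0.0 for s in range(5)}
    v[FAIL] = 1.0
    for _ in range(n_sweeps):
        for s in range(1, 4):
            v[s] = sum(p * v[s + x] for x, p in P_X.items())
    return v

def rollout(v, s=2):
    """Sample disturbances from pi(x|s) proportional to p(x|s) v(s')
    (eq. 5) until a terminal state is reached; returns the final state."""
    while s not in (FAIL, SAFE):
        xs = list(P_X)
        weights = [P_X[x] * v[s + x] for x in xs]
        s += random.choices(xs, weights)[0]
    return s
```

Because v(SAFE) = 0, the policy places zero weight on steps toward the safe terminal, so every rollout reaches the failure state, as Proposition 1 requires. For this chain, v(2) agrees with the gambler's-ruin value (1 − 9²)/(1 − 9⁴) ≈ 0.0122.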
If none of the subproblems provides a good estimate, then the base network will learn a good estimate from scratch. The estimate from eq. (12) can be represented as the network architecture shown in fig. 1. The base network has two hidden layers with ReLU activations followed by a sigmoid activation to keep the output between 0 and 1. The solutions take the state as input and give the probability-of-failure estimate for each subproblem. The attention network has one hidden layer and a softmax layer to make sure the weights sum to 1. The base network output is concatenated to the subproblem solution outputs to create a vector of values that has m + 1 components. The dot product is taken between the values and the m + 1 weights to produce the final estimate of the probability of failure.

Algorithm 1 MC evaluation with function approximation
1: function MCPOLICYEVAL(ṽ_θ, N_iter, N_samp, α)
2:   for N_iter iterations
3:     S, G ← Rollouts(ṽ_θ, N_samp)
4:     J ← (1/N_samp) Σ_{j=1}^{N_samp} (G_j − ṽ_θ(S_j))²
5:     θ ← θ − α ∇_θ J
6:   return ṽ_θ

The network can be trained using rollouts from the full driving scenario to estimate the probability of failure. The training procedure we used is Monte Carlo policy evaluation with function approximation [26] and is shown in algorithm 1. The algorithm takes as input the network that estimates the probability of failure, ṽ_θ, with trainable parameters θ, the number of training iterations N_iter, the number of sampled transitions per iteration N_samp, and the learning rate α. On each iteration, a series of rollouts is performed (line 3). The rollout policy is π from eq. (5), where v(s) is replaced with the current estimate ṽ_θ(s). As the estimate of the probability of failure improves, the rollout policy will produce more failure examples. All of the states visited during the rollouts are concatenated into a vector S. The return is computed for each state s_j ∈ S as

    G(s_j) = 𝟙{s_N ∈ E} ∏_{t=j}^{N} p(x_t | s_{t−1}) / π(x_t | s_{t−1})    (13)

where N is the length of the episode that contained state s_j. The estimate G is a Bernoulli sample weighted by the likelihood ratio of the current sampling policy, so the expected value of G(s) is the probability of failure from state s. The cost J is the mean squared error between the estimated probability of failure ṽ_θ(s) and G(s) (line 4). The parameters of the network are updated using the gradient of the cost function to improve the estimate (line 5).

IV. EXPERIMENTS

This section describes two experimental driving scenarios: a simple scenario with two vehicles and a more complex scenario with five vehicles. The simulations were designed with AutomotiveSimulator.jl, an open-source Julia package. Both simulations rely on the same road geometry and autonomous driving policy.
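The training loop of algorithm 1 can be sketched on a toy problem as follows. This Python stand-in (the paper's code is Julia) replaces the A2T network of fig. 1 with a simple tabular estimator but keeps the essential pieces: rollouts under π ∝ p(x|s) ṽ(s′), likelihood-ratio returns as in eq. (13), and a gradient step on the squared error. The chain, its probabilities, and all hyperparameters are illustrative assumptions, not values from the paper.

```python
import random

# Toy setting: states 0..4, state 4 a failure, state 0 a safe terminal;
# the rare disturbance +1 (toward failure) has true probability 0.1.
P_X = {+1: 0.1, -1: 0.9}
FAIL, SAFE = 4, 0

def mc_policy_eval(n_iter=400, n_rollouts=20, alpha=0.5, start=2):
    """Monte Carlo policy evaluation of the probability of failure
    (algorithm 1), with a tabular estimator standing in for v_theta."""
    v = {s: 0.5 for s in range(1, 4)}   # trainable estimates
    v[FAIL], v[SAFE] = 1.0, 0.0         # boundary conditions from eq. (6)
    for _ in range(n_iter):
        batch = []                      # (state, return G) pairs
        for _ in range(n_rollouts):
            s, states, ratios = start, [], []
            while s not in (FAIL, SAFE):
                xs = list(P_X)
                w = [P_X[x] * v[s + x] for x in xs]   # unnormalized pi
                x = random.choices(xs, w)[0]
                pi = P_X[x] * v[s + x] / sum(w)
                states.append(s)
                ratios.append(P_X[x] / pi)            # p / pi, as in eq. (13)
                s += x
            g = 1.0 if s == FAIL else 0.0             # Bernoulli failure sample
            for s_j, r in zip(reversed(states), reversed(ratios)):
                g *= r            # suffix product over t >= j in eq. (13)
                batch.append((s_j, g))
        # gradient step on the mean squared error between v and G
        for s_j, g in batch:
            v[s_j] -= alpha / len(batch) * (v[s_j] - g)
    return v
```

As ṽ improves, the rollout policy approaches f, so the likelihood-weighted returns become nearly deterministic and the estimates settle close to the exact probabilities of failure.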
The SUT is an autonomous vehicle referred to as the ego vehicle, and a failure refers to any instance where the ego vehicle collides with another vehicle.

The road geometry and initial vehicle configurations are pictured in figs. 2 to 5. The driving scenario is an unprotected left turn of the ego vehicle (in blue) onto a two-lane road. Other vehicles (referred to as adversarial vehicles) are initialized on the through-road and can either continue straight or turn (the yellow dot represents a turn signal). The right-of-way rules are 1) vehicles on the through-road have right-of-way over vehicles turning onto the through-road, and 2) vehicles turning right have right-of-way over vehicles turning left.

The state of each vehicle can be described with four variables: position along the lane, velocity along the lane, a Boolean indicating whether the turn signal is on, and an integer indicating the lane. For approximate DP, the position and velocity were each discretized into a fixed number of values, and each vehicle can be in one of two lanes, so each vehicle has a finite number of discrete states.

Each vehicle on the road, including the ego vehicle, follows a modified version of the intelligent driver model (IDM) [27]. The IDM is a vehicle-following algorithm that tries to drive at a specified velocity while avoiding collisions with leading vehicles. In our experiments, the IDM is parameterized by a desired velocity of 29 m/s, together with a minimum spacing, a maximum acceleration, and a comfortable braking deceleration, and uses a simulation timestep of ∆t = 0.18 s. The IDM was modified with a rule-based algorithm (algorithm 2) for navigating the T-intersection. Each vehicle reasons about the right-of-way and turning intention of other vehicles based on the state of their blinkers, and uses current vehicle speeds to calculate whether the intersection is safe to cross.

The disturbances in the environment correspond to disturbances to the deterministic actions of all adversarial vehicles.
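The per-timestep disturbance sampling described above can be sketched as a weighted categorical draw. The numeric accelerations and probabilities of table I are not legible in this copy of the paper, so every number below is an illustrative placeholder, not a value from the paper.

```python
import random

# Per-timestep disturbance model for one adversarial vehicle.
# NOTE: all magnitudes and probabilities here are placeholders.
DELTA_A = {
    "none": 0.0,               # adversary just applies a_IDM
    "medium slowdown": -1.5,   # delta-a added to a_IDM (placeholder)
    "major slowdown": -3.0,
    "medium speedup": 1.5,
    "major speedup": 3.0,
    "toggle blinker": None,    # no acceleration change
    "toggle turn intent": None,
}
PROBS = {
    "medium slowdown": 1e-2, "medium speedup": 1e-2,   # placeholders
    "major slowdown": 1e-3, "major speedup": 1e-3,
    "toggle blinker": 1e-3, "toggle turn intent": 1e-3,
}
PROBS["none"] = 1.0 - sum(PROBS.values())  # remaining probability mass

def sample_disturbance(a_idm):
    """Sample one disturbance and return (name, actual acceleration).

    Toggle disturbances leave the acceleration at a_IDM; the four
    perturbations return a_IDM + delta-a, as described in the text."""
    name = random.choices(list(PROBS), list(PROBS.values()))[0]
    delta = DELTA_A[name]
    return name, a_idm if delta is None else a_idm + delta
```

Assigning the "no disturbance" action the leftover probability mass keeps the model normalized while making perturbations rare, matching the design intent described in the text.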
Algorithm 2 Intersection navigation algorithm
1: function COMPUTEACCELERATION(veh, scene)
2:   v ← velocity(veh)
3:   v_lead, ∆s_lead ← leading vehicle(veh, scene)
4:   acc ← IDM acceleration(∆s_lead, v, v_lead)
5:   if veh does not have right of way
6:     ttc ← time to cross intersection(veh)
7:     ∆s_int ← distance to intersection(veh)
8:     if ∆s_int < ∆s_lead
9:       for agent in scene
10:        ttenter ← time to enter intersection(agent)
11:        ttexit ← time to exit intersection(agent)
12:        if ttenter < ttc and ttexit + ε > ttc
13:          acc ← IDM acceleration(∆s_int, v, 0)
14:          break
15: return acc

TABLE I: Action space for adversarial vehicles
Action               Acceleration          MC Probability
No disturbance       a_IDM
Medium slowdown      a_IDM + δa (δa < 0)
Major slowdown       a_IDM + δa (δa < 0)
Medium speedup       a_IDM + δa (δa > 0)
Major speedup        a_IDM + δa (δa > 0)
Toggle blinker       N/A
Toggle turn intent   N/A

The disturbances and their corresponding probabilities are shown in table I. The first produces no disturbance, so the adversary accelerates by a_IDM, the acceleration computed by the modified IDM. The next four disturbances perturb the adversary's acceleration by an amount δa, so that the actual acceleration of the adversary is a_IDM + δa. The next disturbance toggles the adversary's turn signal, which is observed by other vehicles and used to determine the adversary's intention. The final disturbance changes the hidden adversary intention as to whether or not it will turn.

The choice of a disturbance probability model should be driven by real-world driving data. In the absence of that data, we chose a simple probability model that made disturbances rare in proportion to their magnitude (see MC Probability in table I): medium slowdowns and speedups were given a larger per-timestep probability of occurrence than major slowdowns and speedups, toggling the blinker, and toggling turn intention.

The two metrics we chose to evaluate our approach are the rate of failures found and the log-likelihood of adversary disturbances for failure trajectories. The failure rates are computed from rollouts, and the average log-likelihood of disturbances is computed from failure examples. The means and standard deviations are reported. We compare our approach against three baselines:
1) Monte Carlo: rollouts with the true probability distribution over disturbances.
2) Uniform importance sampling: rollouts with a uniform distribution over disturbances.
3) Cross entropy method: rollouts with a distribution over disturbances that has been optimized using the cross-entropy method [28].

Fig. 2: Normal 2-car scenario (snapshots at two times)

Fig. 3: Collision in 2-car scenario (snapshots at two times)
A. Two-Vehicle Interaction
The first scenario is an interaction between the ego vehicle and one adversarial vehicle over a range of initial conditions. Figure 2 shows one mode of expected behavior in the scenario: the ego vehicle correctly predicts that it can cross the intersection before the other driver arrives, so it proceeds with the left turn. A sample failure is shown in fig. 3. We can see that the adversary had to accelerate early in the simulation to cause a collision with the ego vehicle, which did not predict that an acceleration would occur.

Table II shows the number of failures observed with each approach. Monte Carlo sampling finds the fewest failures, but the failure trajectories have a comparatively large log-likelihood. The uniform importance sampling approach increases the number of failures found by making rare disturbances more likely, but causes the found failures to be extremely unlikely due to these rare disturbances. The cross entropy method finds slightly more failures than the Monte Carlo approach with a larger log-likelihood than uniform importance sampling. The DP approach was the most successful, achieving the highest failure rate while still retaining a large log-likelihood.

B. Five-Vehicle Interaction
The second scenario involves the interaction of the ego vehicle with four adversarial drivers. A sample of normal behavior for the scenario is shown in fig. 4, where the cars on the left and the trailing car on the right go straight, while the leading car on the right turns onto the vertical road segment. The ego vehicle gives way to all four vehicles and completes the left turn after they have passed. A sample failure is shown in fig. 5. The failure shows that the last car on the left turns its signal on while continuing straight through the intersection, tricking the ego vehicle into initiating the left turn too early.

TABLE II: 2-Car Scenario Failure Rates
Method                          Failure Rate    Log Likelihood
Monte Carlo
Uniform Importance Sampling
Cross Entropy Method
Dynamic Programming (ours)

TABLE III: 5-Car Scenario Failure Rates
Method                          Failure Rate    Log Likelihood
Monte Carlo
Uniform Importance Sampling
Cross Entropy Method
A2T + DP (ours)

The driving scenario was broken into four subproblems, one for each adversarial vehicle. The probability of failure was computed for each subproblem using approximate DP, and the solutions were combined using an A2T network trained on rollouts of the full simulator. One challenge for this approach is the exponential scaling of the disturbance space of the full system: with four subproblems, the full problem must consider the product of the per-vehicle disturbance sets at each step. To mitigate this problem, we only let one agent act at each timestep, reducing the number of possible disturbances from a product to a sum over vehicles. This design choice reduces the complexity of possible failure modes, but makes the problem tractable while still finding failures.

The results are shown in table III. We first note that the failure rate in this scenario is lower than in the previous scenario, as indicated by the failure rate of the Monte Carlo approach. The uniform importance sampling approach improves the failure rate significantly but finds failures with very low likelihood due to the increased number of rare disturbances. The cross entropy method has twice the failure rate of the Monte Carlo approach with a similar log-likelihood. Our approach (DP combined with A2T) has a much larger failure rate while finding relatively likely failures, demonstrating that scene decomposition combined with A2T is an effective strategy for finding failures of an autonomous vehicle in a complex driving scenario.

V. CONCLUSIONS

In this work, we have made progress toward the goal of automated testing of autonomous vehicles. We introduced a safety validation formulation that uses approximate dynamic programming to estimate the distribution over failures and create sequences of disturbances that cause an autonomous system to fail.
The problem of scalability was addressed by decomposing the driving scenario into pairwise interactions between the ego vehicle and other agents on the road. These subproblems were solved and recombined to estimate the probability of failure of the full system. To correct for errors in this estimate, we trained an A2T network with Monte Carlo policy evaluation to weight each subproblem based on the state. We observed an orders-of-magnitude increase in the number of failures found compared to importance sampling baselines in a two-vehicle driving scenario and a more complex five-vehicle driving scenario, demonstrating the benefit of this approach. Future work will use the calculated policy to obtain a low-variance estimate of the probability of failure, test performance on more complicated driving scenarios with many agents, and attempt to interpret the attention weight parameters to understand the cause of failures.

Fig. 4: Normal 5-car scenario (snapshots at three times)

Fig. 5: Collision in 5-car scenario (snapshots at three times)

ACKNOWLEDGMENT
The authors gratefully acknowledge the financial support from the Stanford Center for AI Safety. We also thank the NASA AOSP SWS Project.

REFERENCES
[1] A. Donzé and O. Maler, "Robust satisfaction of temporal logic over real-valued signals," in International Conference on Formal Modeling and Analysis of Timed Systems (FORMATS), 2010.
[2] S. Shalev-Shwartz, S. Shammah, and A. Shashua, "On a formal model of safe and scalable self-driving cars," arXiv, no. 1708.06374, 2017.
[3] R. Lee, M. J. Kochenderfer, O. J. Mengshoel, G. P. Brat, and M. P. Owen, "Adaptive stress testing of airborne collision avoidance systems," in Digital Avionics Systems Conference (DASC), 2015.
[4] L. Mathesen, S. Yaghoubi, G. Pedrielli, and G. Fainekos, "Falsification of cyber-physical systems with robustness uncertainty quantification through stochastic optimization with adaptive restart," in IEEE Conference on Automation Science and Engineering (CASE), Aug. 2019.
[5] T. Akazaki, S. Liu, Y. Yamagata, Y. Duan, and J. Hao, "Falsification of cyber-physical systems using deep reinforcement learning," in International Symposium on Formal Methods, 2018.
[6] Q. Zhao, B. H. Krogh, and P. Hubbard, "Generating test inputs for embedded control systems," IEEE Control Systems Magazine, vol. 23, no. 4, pp. 49–57, 2003.
[7] Z. Zhang, G. Ernst, S. Sedwards, P. Arcaini, and I. Hasuo, "Two-layered falsification of hybrid systems guided by Monte Carlo tree search," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 37, no. 11, pp. 2894–2905, 2018.
[8] S. Sankaranarayanan and G. Fainekos, "Falsification of temporal properties of hybrid systems using the cross-entropy method," in ACM International Conference on Hybrid Systems: Computation and Control (HSCC), 2012.
[9] M. Koren, S. Alsaif, R. Lee, and M. J. Kochenderfer, "Adaptive stress testing for autonomous vehicles," in IEEE Intelligent Vehicles Symposium (IV), 2018.
[10] A. Corso, P. Du, K. Driggs-Campbell, and M. J. Kochenderfer, "Adaptive stress testing with reward augmentation for autonomous vehicle validation," in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2019.
[11] M. Koren and M. Kochenderfer, "Efficient autonomy validation in simulation with adaptive stress testing," in IEEE International Conference on Intelligent Transportation Systems (ITSC), Oct. 2019.
[12] G. Agha and K. Palmskog, "A survey of statistical model checking," ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 28, no. 1, pp. 1–39, 2018.
[13] Z. Huang, M. Arief, H. Lam, and D. Zhao, "Evaluation uncertainty in data-driven self-driving testing," in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2019.
[14] Y. Kim and M. J. Kochenderfer, "Improving aircraft collision risk estimation using the cross-entropy method," Journal of Air Transportation, vol. 24, no. 2, pp. 55–62, 2016.
[15] M. O'Kelly, A. Sinha, H. Namkoong, R. Tedrake, and J. C. Duchi, "Scalable end-to-end autonomous vehicle testing via rare-event simulation," in Advances in Neural Information Processing Systems (NIPS), 2018.
[16] J. Uesato, A. Kumar, C. Szepesvari, T. Erez, A. Ruderman, K. Anderson, N. Heess, and P. Kohli, "Rigorous agent evaluation: An adversarial approach to uncover catastrophic failures," arXiv, no. 1812.01647, 2018.
[17] J. P. Chryssanthacopoulos, M. J. Kochenderfer, and R. E. Williams, "Improved Monte Carlo sampling for conflict probability estimation," in AIAA Non-Deterministic Approaches Conference, 2010.
[18] M. Koschi, C. Pek, S. Maierhofer, and M. Althoff, "Computationally efficient safety falsification of adaptive cruise control systems," in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2019.
[19] Y. Cao, C. Xiao, B. Cyr, Y. Zhou, W. Park, S. Rampazzi, Q. A. Chen, K. Fu, and Z. M. Mao, "Adversarial sensor attack on LiDAR-based perception in autonomous driving," in ACM SIGSAC Conference on Computer and Communications Security, 2019.
[20] A. Balakrishnan, A. Puranic, X. Qin, A. Dokhanchi, J. Deshmukh, H. Ben Amor, and G. Fainekos, "Specifying and evaluating quality metrics for vision-based perception systems," in Design, Automation and Test in Europe (DATE), May 2019.
[21] G. E. Mullins, P. G. Stankiewicz, R. C. Hawthorne, and S. K. Gupta, "Adaptive generation of challenging scenarios for testing and evaluation of autonomous vehicles," Journal of Systems and Software, vol. 137, pp. 197–215, 2018.
[22] M. Althoff and S. Lutz, "Automatic generation of safety-critical test scenarios for collision avoidance of road vehicles," in IEEE Intelligent Vehicles Symposium (IV), 2018.
[23] M. J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application. MIT Press, 2015.
[24] M. Bouton, K. D. Julian, A. Nakhaei, K. Fujimura, and M. J. Kochenderfer, "Decomposition methods with deep corrections for reinforcement learning," International Conference on Autonomous Agents and Multiagent Systems (AAMAS), vol. 33, no. 3, pp. 330–352, 2019.
[25] J. Rajendran, A. S. Lakshminarayanan, M. M. Khapra, P. Prasanna, and B. Ravindran, "Attend, adapt and transfer: Attentive deep architecture for adaptive transfer from multiple sources in the same domain," arXiv, no. 1510.02879, 2015.
[26] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[27] M. Treiber, A. Hennecke, and D. Helbing, "Congested traffic states in empirical observations and microscopic simulations," Physical Review E, vol. 62, pp. 1805–1824, 2000.
[28] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, "A tutorial on the cross-entropy method,"