A Mutation-based Approach for Assessing Weight Coverage of a Path Planner
Thomas Laurent, Paolo Arcaini, Fuyuki Ishikawa, Anthony Ventresque
Thomas Laurent
Lero & University College Dublin, Ireland
National Institute of Informatics, Japan
Fuyuki Ishikawa
National Institute of Informatics
Tokyo, Japan
Paolo Arcaini
National Institute of Informatics
Tokyo, Japan
Anthony Ventresque
Lero & University College Dublin
Dublin, Ireland
Abstract: Autonomous cars are subjected to several different kinds of inputs (other cars, road structure, etc.) and, therefore, testing the car under all possible conditions is impossible. To tackle this problem, scenario-based testing for automated driving defines categories of different scenarios that should be covered. Although this kind of coverage is a necessary condition, it still does not guarantee that every possible behaviour of the autonomous car is tested. In this paper, we consider the path planner of an autonomous car, which decides, at each timestep, the short-term path to follow in the next few seconds; this decision is made using a weighted cost function that considers different aspects (safety, comfort, etc.). In order to assess whether all the possible decisions that can be taken by the path planner are covered by a given test suite $T$, we propose a mutation-based approach that mutates the weights of the cost function and then checks whether at least one scenario of $T$ kills the mutant. Preliminary experiments on a manually designed test suite show that some weights are easier to cover, as they consider aspects that are more likely to occur in a scenario, and that more complicated scenarios (which generate more complex paths) are those that cover more weights.
Index Terms: software testing, mutation analysis, automated driving, path planner
I. INTRODUCTION
Automated driving is a technology currently being intensely developed that promises to impact our lives in many ways. Applications of automated vehicles range from the transport of goods (automated freight) to personal mobility, with offerings such as Tesla's or Uber's. As promising as the technology is, great care must be taken in evaluating and validating such systems to avoid tragic accidents [1], [2]. Testing automated driving systems is critical for the satisfaction and safety of all stakeholders, but it is also a very expensive operation. Therefore, it is essential to know when a system has been sufficiently tested. This is the question we focus on in this paper.
This work was supported, in part, by Science Foundation Ireland grant 13/RC/2094. T. Laurent is supported by an Irish Research Council grant (GOIPG/2017/1829). P. Arcaini and F. Ishikawa are supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST. Funding Reference number: 10.13039/501100009024 ERATO.
Fig. 1: Illustration of the path planner.

An autonomous driving system can be seen as a set of components that sense the environment of the vehicle, choose a path given an itinerary, and implement this path as concrete actions performed by actuators. In this paper, we consider a path planner component provided by our industry partner, which computes the best trajectory for the vehicle given a target destination. At every timestep, the path planner decides the short-term path that the car should follow in the next few seconds and the control commands that must be provided to implement it (such as acceleration and angle); to decide the next short-term path, an optimisation algorithm is employed. First, a set of short-term paths from the head of the car to a grid of points in front of the car is enumerated. This is shown in Fig. 1, where the white car is the ego car and the red car is another, immobile, car. The translucent cars represent the possible future positions sampled by the path planner, and the blue arrows the associated short-term paths. Then, each short-term path is evaluated according to a cost function. The cost function considers different aspects, such as safety, vehicle limitation, regulation compliance, and comfort. Given the ranking of all the short-term paths, the one with the lowest cost is taken.

The path planner is tested using a simulator. The simulator takes as input a path planner and a scenario (i.e., a road configuration, together with a starting position, direction, and speed for the automated car, or ego car, and for the other objects on the road) and runs the path planner in this particular scenario, computing the path the ego car would take. Evaluating the output of the simulation, i.e., a path, is not done with a pass/fail oracle, as there is no comprehensive definition of what a valid path is. The presence or absence of a crash, for example, is not a good enough oracle: one can drive badly and not crash, or take the best possible decision and still experience an unavoidable crash. Instead, we can define metrics that capture measures of interest (e.g., the minimum distance between the ego car and all the other objects along the path); these metrics can be used to evaluate a path or to compare the paths computed by different path planners for the same scenario.

In testing, coverage criteria are used to evaluate the quality of a test suite $T$, i.e., whether $T$ is sufficient for testing the System Under Test. For example, classical structural criteria check that the code has been sufficiently covered. In scenario-based testing [3] for automated driving (as done with our simulator), the main approaches aim at covering all the different traffic situations (e.g., number and positions of the other cars) and the manoeuvres performed by the ego car [4]. To support this kind of testing, ontologies regarding driving behaviours, road topologies, environmental conditions, etc. have been devised (see the eight documents in [5]). Although this kind of coverage is also necessary for our path planner, it still does not guarantee coverage of all the possible behaviours of the path planner.

Therefore, in this paper, we propose a definition of what it means to sufficiently test a path planner. At a very high level, we want to check that all the possible decisions that can be taken by the path planner are observed in at least one test. This is difficult because, in general, we do not know which scenarios lead to a given decision; this is why the coverage of scenario elements (as in [4]) may not be sufficient, and so we need a different criterion.
Moreover, we do not even have a proper characterisation of the different decisions, i.e., given two short-term paths computed by the path planner, we cannot say whether they have been taken for the same reasons (i.e., to respect the same aspects).

Although it is impossible to evaluate directly whether all possible decisions are covered, we propose an indirect way of assessing this. The path planner can be seen as a weighted function of the different aspects listed before (i.e., safety, comfort, etc.). For each aspect, the path planner has one or more weights set by the system designer. Such weights represent how important that aspect is in selecting a short-term path (we call this selection a decision). We claim that a minimal condition for testing the path planner is that each weight is shown to be “relevant” in at least one decision in one test. We say that a weight $w_i$ is covered by a scenario $s$ (i.e., that $w_i$ is “relevant” to at least one decision taken for $s$) if, using a different weight value $w'_i$ in the path planner, the computed path $p'$ differs from the path $p$ computed with the original weight. Indeed, if changing the weight $w_i$ in all possible ways does not affect any decision, the aspect considered by $w_i$ is irrelevant in that scenario.

Since trying all the possible alternative weights is infeasible (the weights take continuous values, so there are infinitely many alternatives), we propose a mutation-based approach [6] that estimates the weight coverage of a given test suite $T$. The approach consists in mutating a weight $w_i$ with a finite set of mutation operators; $w_i$ is considered covered by a test suite $T$ if, in at least one test in $T$, the path computed by the mutated path planner is different from the path computed by the original path planner, according to a mutation oracle.
We propose three mutation oracles that provide different guarantees in terms of coverage: the path oracle simply compares the two paths point-wise, the safety oracle compares the minimum distance of the ego car w.r.t. the other objects in the two paths, and the comfort oracle compares the smoothness of the paths (in terms of maximum acceleration). Note that some mutation oracles, such as the path oracle, are more likely to say that two paths are different, while others, such as the comfort oracle, are more demanding and require bigger differences in order to consider two paths different; in general, stronger mutation oracles provide stronger guarantees that two paths are significantly different [6].

The rest of this paper is structured as follows: Section II introduces some core definitions, and Section III presents our definition of weight coverage and a mutation-based approach to estimate it. Section IV presents the experiments we performed to evaluate our approach and discusses their results. Section V further discusses some insights from the experiments, and Section VI tackles some threats that could affect the validity of the proposed approach. Finally, Section VII reviews related work, and Section VIII concludes the paper.

II. DEFINITIONS
In the following, we provide some definitions related to the path planner and its simulator.
Definition 1 (Scenario). A scenario $s$ describes the environment in which the ego car is operating. It consists of:
• a map $M$ describing the road structure;
• an initial position, speed, acceleration, and direction of the ego car;
• a target destination of the ego car;
• a set of static objects $SO$; each static object is characterised by its position in the map and its size (length and width);
• a set of dynamic objects $DO$; each dynamic object, in addition to position and size, is also characterised by its initial speed, acceleration, and direction;
• a timeout $TO$; the scenario must be run until time $TO$.
We will use the dot notation (e.g., $s.TO$) to access a particular field of a scenario.

For the sake of conciseness, in the following, we consider static objects as dynamic objects having no velocity and no acceleration.

Definition 2 (Path). A path is a sequence of tuples $[(t_1, l_1 = (x_1, y_1), d_1, v_1, a_1), \ldots, (t_n, l_n = (x_n, y_n), d_n, v_n, a_n)]$, where each tuple $i$ identifies a timestamp $t_i$, a location $l_i = (x_i, y_i)$ in the map, a direction $d_i$, a speed $v_i$, and an acceleration $a_i$. We use the dot notation to access tuple fields at a given time $t_i$ (e.g., $p.a_i$).

Note that, for each dynamic object $do$ of a scenario $s$, we can automatically compute its path $p$ up to the timeout $s.TO$; we will write $do(s) = p$, where $p = [(t_1, l_1 = (x_1, y_1), d_1, v_1, a_1), \ldots, (t_n, l_n = (x_n, y_n), d_n, v_n, a_n)]$ and $t_n = s.TO$.

Definition 3 (Path Planner). A path planner $PP$ can be seen as a function that, given a scenario $s$, produces a path $p$ for the ego car up to simulation time $s.TO$; formally, $PP(s) = p$. We call each pair of consecutive tuples $((t_i, l_i = (x_i, y_i), d_i, v_i, a_i), (t_{i+1}, l_{i+1} = (x_{i+1}, y_{i+1}), d_{i+1}, v_{i+1}, a_{i+1}))$ (with $i \in \{1, \ldots, n-1\}$) a short-term path: it corresponds to a decision taken by the path planner.

A. Evaluation metrics (path quality metrics)
Given a scenario $s$, we can define different metrics characterising the whole path computed by the path planner. In the following, let $p_e = PP(s)$ be the path computed by the path planner for the ego car, and $p_1 = do_1(s), \ldots, p_m = do_m(s)$ be the paths of the dynamic objects $do_1, \ldots, do_m$.

a) Safety metric: The first metric provides a quantitative evaluation of how safe the chosen path is. It is defined as the minimum distance between the ego car and any other object along the path:
$$minDis(p_e, \{p_1, \ldots, p_m\}) = \min_{i \in \{1, \ldots, n\},\; j \in \{1, \ldots, m\}} dis(p_e.l_i, p_j.l_i)$$
where $dis$ is the Euclidean distance between two points.

b) Comfort metric: This metric assesses how comfortable the path has been for the driver. It is defined as the maximum absolute acceleration along the path:
$$comf(p_e) = \max_{i \in \{1, \ldots, n\}} |p_e.a_i|$$
Note that other comfort metrics could be defined in terms of, e.g., maximum torque or maximum lateral acceleration.

III. PROPOSED APPROACH
In this paper, we are interested in defining sufficiency criteria for path planner testing. A path planner can take different decisions on the basis of the different environmental and driving conditions in which it is operating; we would like to check that all the possible decisions that can be taken by the path planner are observed in at least one test. However, we do not precisely know which scenarios cause a particular decision. Moreover, we cannot even characterise all the possible decisions taken by the path planner, i.e., given two decisions (two short-term paths computed by the path planner), we do not know whether they have been selected for the same reason. Nevertheless, we can exploit the architecture of the particular path planner under test in order to create a proxy for these decisions. In this section, we describe how we propose to do this using a mutation-based approach.

Footnote: In the path planner simulator we are using, the behaviour of dynamic objects does not depend on the current situation, but only on the initial conditions specified in the scenario. For this reason, we can compute their paths offline.
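As stated in the footnote above, dynamic objects do not react to the ego car, so their paths $do(s)$ (Def. 2) can be precomputed from the initial conditions in the scenario. The following Python sketch encodes the path tuples of Def. 2, the short-term paths of Def. 3, and one way of precomputing an object path. The motion model (straight line along the initial direction, constant acceleration, explicit Euler steps of length `dt`, no reversing) and all names are our assumptions; the simulator's actual motion model is not described here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PathPoint:
    """One tuple (t_i, l_i = (x_i, y_i), d_i, v_i, a_i) of a path (Def. 2)."""
    t: float                 # timestamp
    l: Tuple[float, float]   # location (x, y) in the map
    d: float                 # direction (heading), in radians
    v: float                 # speed
    a: float                 # acceleration


def dynamic_object_path(x0: float, y0: float, d0: float, v0: float, a0: float,
                        timeout: float, dt: float) -> List[PathPoint]:
    """Precompute do(s) from the initial conditions up to s.TO.

    Hypothetical motion model: straight line along heading d0, constant
    acceleration a0, explicit Euler steps of length dt, speed floored at zero.
    """
    path: List[PathPoint] = []
    x, y, v = x0, y0, v0
    for k in range(int(round(timeout / dt)) + 1):
        # once a braking object has stopped, its effective acceleration is zero
        a = a0 if (v > 0.0 or a0 > 0.0) else 0.0
        path.append(PathPoint(k * dt, (x, y), d0, v, a))
        x += v * math.cos(d0) * dt
        y += v * math.sin(d0) * dt
        v = max(0.0, v + a0 * dt)  # no reversing
    return path


def short_term_paths(path: List[PathPoint]) -> List[Tuple[PathPoint, PathPoint]]:
    """Pairs of consecutive tuples; each pair is one decision (Def. 3)."""
    return list(zip(path, path[1:]))
```

With `v0 = a0 = 0` the object stays in place, which matches the convention of treating static objects as dynamic objects with no velocity and no acceleration.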
A. Path planner under test
The path planner provided by our industrial partner works as follows. At each timestep, it chooses which short-term path to follow in the next time period (see Def. 3). To do this, it enumerates a set of possible short-term paths and scores them using a weighted cost function that considers different aspects:
• Safety: no collision with moving or static objects must happen, and safety distances must be respected;
• Vehicle Limitation: actions that cannot be achieved by the car must be avoided (e.g., no impossible steering can be required to follow a path);
• Compliance: the car should respect road regulations as much as possible;
• Comfort: the path should be as comfortable as possible for the passengers, avoiding too much forward and/or lateral acceleration.
In particular, the cost function uses the following weights $W$ for the different aspects:
• $w_1$: a factor that is multiplied by the maximum lateral acceleration along the short-term path;
• $w_2$: a constant that is added to the total cost if the maximum lateral acceleration is over a given threshold;
• $w_3$: a constant that is added to the total cost if the speed is over a given speed limit;
• $w_4$: a constant that is added to the total cost if the maximum acceleration along the short-term path is over a given threshold;
• $w_5$: a constant that is added to the total cost if the maximum deceleration along the short-term path is over a given threshold;
• $w_6$: a constant that is added to the total cost if the curvature along the short-term path is over a given threshold.
As such, the different decisions that the system can take are tightly dictated by these weights. Note that five of the weights are related to the safety aspect, three to the comfort aspect, one to the compliance aspect, and three to the vehicle limitation aspect.
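To make the role of the weights concrete, the following Python sketch scores candidate short-term paths with the six terms listed above and picks the cheapest one. It is a hypothetical illustration, not the industrial implementation: the candidate features, the threshold values, and the names (`ShortTermPath`, `cost`, `decide`, `scale`) are our assumptions, and any further terms the real cost function may contain are omitted.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class Weights:
    w1: float  # multiplied by the maximum lateral acceleration
    w2: float  # added if the maximum lateral acceleration exceeds its threshold
    w3: float  # added if the speed exceeds the speed limit
    w4: float  # added if the maximum acceleration exceeds its threshold
    w5: float  # added if the maximum deceleration exceeds its threshold
    w6: float  # added if the curvature exceeds its threshold


@dataclass(frozen=True)
class ShortTermPath:
    """Summary features of one candidate short-term path (hypothetical)."""
    max_lat_acc: float
    max_speed: float
    max_acc: float
    max_dec: float
    max_curvature: float


# Hypothetical thresholds; the real ones are not given in the paper.
LAT_ACC_LIMIT = 2.0    # m/s^2
SPEED_LIMIT = 13.9     # m/s (about 50 km/h)
ACC_LIMIT = 2.5        # m/s^2
DEC_LIMIT = 3.0        # m/s^2
CURVATURE_LIMIT = 0.2  # 1/m


def cost(c: ShortTermPath, w: Weights) -> float:
    """Weighted cost: one linear term plus five threshold penalties."""
    total = w.w1 * c.max_lat_acc
    if c.max_lat_acc > LAT_ACC_LIMIT:
        total += w.w2
    if c.max_speed > SPEED_LIMIT:
        total += w.w3
    if c.max_acc > ACC_LIMIT:
        total += w.w4
    if c.max_dec > DEC_LIMIT:
        total += w.w5
    if c.max_curvature > CURVATURE_LIMIT:
        total += w.w6
    return total


def decide(candidates: Sequence[ShortTermPath], w: Weights) -> int:
    """The decision: index of the lowest-cost candidate (first one on ties)."""
    return min(range(len(candidates)), key=lambda k: cost(candidates[k], w))


def scale(w: Weights, name: str, k: float) -> Weights:
    """Copy of the valuation with weight `name` multiplied by k."""
    return replace(w, **{name: k * getattr(w, name)})
```

For instance, with a swerving candidate (lateral acceleration above its threshold) and a braking candidate (deceleration above its threshold), scaling $w_5$ up can flip the decision from braking to swerving. If no candidate exceeds the speed limit, however, every change of $w_3$ shifts all costs by the same amount (zero), so the decision cannot change; this is the situation described in Remark 1.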
A weight can be associated with more than one aspect; e.g., one weight is associated with both the safety and comfort aspects, and another one with all the aspects. Our industrial partner provided us with a version of the path planner that has been calibrated with a satisfactory set of weight values $\mathbf{w}$ for $W$. In the following, we denote by $PP_{\mathbf{w}}$ the path planner configured with weight values $\mathbf{w}$.

B. Weight coverage
Since the weights are strictly bound to the aspects that are considered in the decisions, we propose to map the coverage of the possible decisions to the coverage of the weights used to make the decisions.

Therefore, in this section, we propose a way to assess whether a weight is involved in a decision and, in Section III-C, a technique for measuring the sufficiency of a given test suite $T$ in testing the weights $\mathbf{w}$.

Definition 4 (Weight coverage criterion). Given a path planner $PP_{\mathbf{w}}$ with weights $\mathbf{w} = \{w_1, \ldots, w_k\}$, a test scenario $s$ covers a weight $w_i \in \mathbf{w}$ w.r.t. a metric $M$ if there exists a weight value $w'_i \neq w_i$ such that $M(PP_{\mathbf{w}}(s)) \neq M(PP_{\mathbf{w}'_i}(s))$, with $\mathbf{w}'_i = \{w_1, \ldots, w'_i, \ldots, w_k\}$.

Intuitively, a test scenario $s$ covers a weight $w_i$ if, with another value of the weight, the path planner behaves differently according to metric $M$. A good test suite $T$ should then cover all weights $w_i$.

Note that weight coverage has similarities with the MC/DC coverage criterion [7] for Boolean expressions, in which each clause must be shown to determine the value of the global predicate in a test: given an assignment of truth values, a clause $C$ determines the value of the global predicate $P$ if flipping the value of $C$ changes the value of $P$. In our case, for each weight $w_i$, we want to have a test in which the aspect considered by $w_i$ has some influence on the final decision taken by the path planner; we want to show that, by modifying the weight in some way, we can also modify the decision.

Remark 1.
The path planner, in order to decide the next short-term path in a scenario $s$, assigns a numerical cost to a set of possible short-term paths $stp_1, \ldots, stp_n$ using a cost function that depends on the weights $\mathbf{w}$; then, it selects the candidate with the lowest cost. Changing a weight $w_i$ into $w'_i$ will change the cost of a given short-term path $stp_j$ from $c_j$ to $c'_j = c_j + \Delta_j$. If the weight considers an aspect that is relevant for the scenario $s$, $\Delta_j$ will differ across the short-term paths, so their ranking (and hence the final decision) could be modified. Instead, if weight $w_i$ considers an aspect that is irrelevant for the scenario $s$, the costs of all the possible short-term paths will be modified by the same value $\Delta$; therefore, the ranking of the possible short-term paths will not be affected, and the same short-term path (i.e., the one selected with the original weights) will be selected as the final decision.

C. Mutation-based approximation of weight coverage
As we cannot exhaustively evaluate the weight coverage of a test suite $T$ (the weights have continuous values), we propose a mutation-based approach to estimate whether or not $T$ covers the different weights. In the following, we describe the mutation operators we use to generate mutants, the oracles we use to assess whether a test kills a mutant, and finally how we use these to estimate weight coverage.
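The overall estimation can be summarised by a small skeleton that anticipates Eq. (1): a weight is covered if some scenario of $T$ kills some mutant of that weight. In this Python sketch, the simulator (`run`) and the oracle (`killed`) are passed in as hypothetical callables, since their actual interfaces are not part of this paper; the function name `covered` mirrors the predicate of Eq. (1) but is our own.

```python
from typing import Any, Callable, Dict, Iterable, List

Weights = Dict[str, float]


def covered(suite: Iterable[Any],
            original: Weights,
            mutants: Iterable[Weights],
            run: Callable[[Any, Weights], Any],
            killed: Callable[[Any, Any, Any], bool]) -> bool:
    """True if at least one scenario of the suite kills at least one mutant.

    suite    -- the test suite T (scenarios)
    original -- the calibrated weight values w
    mutants  -- the mutated valuations of a single weight w_i
    run      -- hypothetical simulator call: run(s, weights) -> ego path
    killed   -- hypothetical oracle: killed(s, p_e, p_e_mut) -> bool
    """
    mutant_list: List[Weights] = list(mutants)  # reused for every scenario
    for s in suite:
        p_e = run(s, original)
        for w_mut in mutant_list:
            if killed(s, p_e, run(s, w_mut)):
                return True  # one killing (scenario, mutant) pair is enough
    return False
```

With a toy simulator such as `lambda s, w: (s * w["w1"], float(s))`, which ignores `w2`, mutants of `w1` are killed by any non-zero scenario while mutants of `w2` are never killed; a scenario `s = 0` kills no mutant at all, mirroring a scenario in which the aspect of the weight is irrelevant.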
1) Mutation operators:
In this work, we are only concerned with the coverage of the test suite $T$ w.r.t. each individual weight $w_i$. Thus, we propose a simple mutation operator: each mutant $\mathbf{w}'_i$ differs from $\mathbf{w}$ only in the value $w'_i$ of a weight $w_i$, which is multiplied by a constant $K$, i.e., $w'_i = K \cdot w_i$. In order to explore different ranges for each weight, we use the following values of $K$: 0, 0.5, 0.9, 1.1, 1.5, 2, and 10. This leads to seven versions of the operator, which we refer to as $MOs = \{MO_1, \ldots, MO_7\}$. These factors were chosen to sample the space of possible weight values. In particular, 0 and 10 produce extreme changes, with 0 completely cancelling the effect of a weight. The other values of $K$ let us explore the effect of different scales of change to the weight values. In the following, we denote by $PP_{\mathbf{w}_i^j}$ the path planner obtained from $PP_{\mathbf{w}}$ by mutating weight $w_i$ with mutation operator $MO_j$.

Remark 2.
Note that our mutation operators are not meant to be related to fault classes as in classical mutation analysis, i.e., they are not meant to replicate possible faults. They are used to artificially perturb the path planner, so that it possibly takes different decisions due to the mutated weight. As future work, we could design more targeted mutation operators based on system and domain knowledge.
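The mutation operators of this subsection can be sketched in a few lines of Python: each mutant scales exactly one weight by one of the seven factors $K$, giving $|W| \times |MOs|$ mutants. The weight names and the placeholder values used with this sketch are hypothetical, since the calibrated weight values are not reported.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]

# Factors K of the seven mutation operators MO_1..MO_7 (Sect. III-C1).
K_VALUES: Tuple[float, ...] = (0.0, 0.5, 0.9, 1.1, 1.5, 2.0, 10.0)


def mutate(weights: Weights, name: str, k: float) -> Weights:
    """Copy of the valuation in which only weight `name` is multiplied by k."""
    mutated = dict(weights)
    mutated[name] = k * weights[name]
    return mutated


def all_mutants(weights: Weights) -> List[Tuple[str, float, Weights]]:
    """One mutant per (weight, operator) pair, i.e., |W| x |MOs| mutants."""
    return [(name, k, mutate(weights, name, k))
            for name in weights for k in K_VALUES]
```

With six weights, `all_mutants` yields 42 mutants, each differing from the original valuation in exactly one weight (as long as that weight is non-zero). A design consequence of the multiplicative operator is that a weight whose calibrated value is 0 produces seven mutants identical to the original (equivalent mutants), so it could never be covered this way.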
2) Mutation oracles:
In order to assess whether a mutant has been killed, we need to compare the paths computed by the original path planner $PP_{\mathbf{w}}$ and the mutated one $PP_{\mathbf{w}_i^j}$. In the following, given a scenario $s$, let $p_e = PP_{\mathbf{w}}(s)$ and $p'_e = PP_{\mathbf{w}_i^j}(s)$ be the two paths computed by the two path planners, and $p_1 = do_1(s), \ldots, p_m = do_m(s)$ be the paths of the dynamic objects $do_1, \ldots, do_m$. The mutated path planner is considered killed by $s$ if $p_e$ and $p'_e$ are sufficiently different. To assess this, we can use different mutation oracles that differ in the characteristics of the paths they consider (e.g., safety or comfort). We devised the following oracles, each defined as a predicate $killed$ that tells whether a scenario $s$ kills the path planner $PP_{\mathbf{w}_i^j}$ obtained by mutating weight $w_i$ with mutation operator $MO_j$.

Path Oracle (PO)
Given a threshold $\theta_P$, the mutated path planner is considered killed if there is a timestep at which the distance between the positions of the ego car in the two paths is greater than $\theta_P$, i.e.,
$$killed_P(s, w_i, MO_j) = \left(\exists h \in \{1, \ldots, n\} : dis(p_e.l_h, p'_e.l_h) > \theta_P\right)$$
where $dis$ is the Euclidean distance.

Safety Oracle (SO)
Given a threshold $\theta_S$, the mutated path planner is killed if the difference between the minimum distances (with the dynamic objects) of the two paths is greater than $\theta_S$, i.e.,
$$killed_S(s, w_i, MO_j) = \left( \left| minDis(p_e, \{p_1, \ldots, p_m\}) - minDis(p'_e, \{p_1, \ldots, p_m\}) \right| > \theta_S \right)$$

Comfort Oracle (CO)
Given a threshold $\theta_C$, the mutated path planner is killed if the difference between the comfort measures of the two paths is greater than $\theta_C$, i.e.,
$$killed_C(s, w_i, MO_j) = \left( |comf(p_e) - comf(p'_e)| > \theta_C \right)$$

Thresholds $\theta_P$, $\theta_S$, and $\theta_C$ must be selected by the domain expert, who can tune how much difference must be observed in order to declare a mutant killed.

The different oracles assess different aspects of the system, but they are also more or less “strict”. For example, the Path Oracle is the “easiest” one with which to kill a mutant, and it should be subsumed by the other oracles (for equivalent thresholds); e.g., since $minDis$ depends only on positions, with zero thresholds any mutant killed by SO is also killed by PO. This idea can be related to the distinction between weak and strong mutation testing [6] in classic software testing, where weak mutation measures changes in the internal state of the program caused by mutants, while strong mutation considers only changes to the output.

TABLE I: Description of the test suite scenarios
ID | Description
$s_1$ | The ego car is proceeding on a lane and two dynamic objects cross the street closely in front of it.
$s_2$ | The ego car is proceeding on a lane, following a slowing dynamic object, with a faster dynamic object coming from behind.
$s_3$ | The ego car is proceeding on a lane and a dynamic object is proceeding in a different direction on a different lane.
$s_4$ | The ego car is proceeding on a lane, encounters a parked car, and overtakes it.
$s_5$ | Similar to $s_4$, but there is another car coming from the opposite direction. The ego car has enough time to overtake the parked car before the other car arrives.
$s_6$ | Similar to $s_5$, but the ego car must let the other car pass before overtaking the parked car (as there is not enough time before).
$s_7$ | At a crossing, the ego car must turn right, while a dynamic object crosses the intersection from the opposite direction. The ego car must let the object pass before turning.
$s_8$ | At a crossing, the ego car must turn right, while a dynamic object is approaching the intersection from the right. The ego car must slow down and let the dynamic object pass.
$s_9$ | At a crossing, the ego car turns right and, just after the turn, encounters a dynamic object coming against the flow of traffic in its target lane.
$s_{10}$ | The ego car is approaching from behind a dynamic object that is slowing down.
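The two path-quality metrics of Sect. II-A and the three mutation oracles can be sketched in Python as follows. To keep the example small, each path tuple is reduced to the location and acceleration fields the metrics use, and all paths are assumed to be sampled at the same timestamps as the ego path, as the index-wise formulas presuppose. The names (`Point`, `min_dis`, `killed_p`, `killed_s`, `killed_c`) are ours, and thresholds default to 0 as in the experiments.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Point:
    """The fields of a path tuple that the metrics need."""
    x: float
    y: float
    a: float  # acceleration


def dis(p: Point, q: Point) -> float:
    """Euclidean distance between two locations."""
    return math.hypot(p.x - q.x, p.y - q.y)


def min_dis(ego: Sequence[Point], others: Sequence[Sequence[Point]]) -> float:
    """Safety metric minDis: smallest ego-object distance over all timesteps and objects."""
    return min(dis(ego[i], obj[i]) for obj in others for i in range(len(ego)))


def comf(ego: Sequence[Point]) -> float:
    """Comfort metric: maximum absolute acceleration along the path."""
    return max(abs(p.a) for p in ego)


def killed_p(ego: Sequence[Point], ego_mut: Sequence[Point], theta_p: float = 0.0) -> bool:
    """Path Oracle: at some timestep the two ego positions are more than theta_p apart."""
    return any(dis(p, q) > theta_p for p, q in zip(ego, ego_mut))


def killed_s(ego: Sequence[Point], ego_mut: Sequence[Point],
             others: Sequence[Sequence[Point]], theta_s: float = 0.0) -> bool:
    """Safety Oracle: the minimum distances differ by more than theta_s."""
    return abs(min_dis(ego, others) - min_dis(ego_mut, others)) > theta_s


def killed_c(ego: Sequence[Point], ego_mut: Sequence[Point], theta_c: float = 0.0) -> bool:
    """Comfort Oracle: the comfort measures differ by more than theta_c."""
    return abs(comf(ego) - comf(ego_mut)) > theta_c
```

For example, if the mutated ego path drifts 0.5 m towards an object while keeping the same accelerations, it is killed by PO and SO but not by CO, because the maximum acceleration is unchanged.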
3) Estimating weight coverage:
We can now describe a way to estimate weight coverage. From an original valuation $\mathbf{w}$ of the weights $W$ of the path planner $PP$, we create mutants $\mathbf{w}'$ by applying the mutation operators described before, changing the value of each weight $w_i$ in $\mathbf{w}$ in turn, thereby obtaining mutated versions $PP_{\mathbf{w}_i^j}$ of the path planner. We then run $T$ against each $PP_{\mathbf{w}_i^j}$ and determine which mutants are killed by $T$ according to our different oracles. Finally, following our initial definition of weight coverage (see Def. 4), we estimate that a weight $w_i$ is covered w.r.t. a metric $M$ (with $M \in \{P, S, C\}$) if one of its mutants is killed in a scenario, i.e.,
$$covered(w_i, T, M) = \left( \exists s \in T, \exists MO_j \in MOs : killed_M(s, w_i, MO_j) \right) \qquad (1)$$
By Def. 4, if $covered(w_i, T, M)$ holds, then $w_i$ is also covered according to the weight coverage criterion. If $covered(w_i, T, M)$ does not hold, we can estimate that weight coverage does not hold either, assuming that the mutation operators $MOs$ are a good proxy for all the possible weight changes (see Sect. VI for a more detailed discussion of this point). In the remainder of the text, we use the phrase weight coverage for both the coverage and its mutation-based approximation.

IV. EXPERIMENTS
In order to evaluate the approach, we designed a test suite $T$ composed of 10 scenarios, whose descriptions are reported in Table I. Note that the path planner is designed to work in countries, such as Ireland and Japan, that adopt the left-hand traffic practice. While designing the test suite, we tried to cover different kinds of manoeuvres (e.g., going straight, overtaking a parked car, turning at a crossroad, giving precedence to another car, etc.). All the scenarios have been designed manually, except for one scenario, which was found using a search algorithm with the aim of producing a dangerous situation. Then, we mutated the six weights $W = \{w_1, \ldots, w_6\}$ of the original path planner (see Sect. III-A) using the seven mutation operators $MOs = \{MO_1, \ldots, MO_7\}$ described in Sect. III-C1. Therefore, in total we have $6 \times 7 = 42$ mutated versions of the path planner; as before, we denote by $PP_{\mathbf{w}_i^j}$ the path planner obtained from $PP_{\mathbf{w}}$ by mutating weight $w_i$ with mutation operator $MO_j$.

We then ran the designed test suite $T$ on the original path planner $PP_{\mathbf{w}}$ and on the 42 mutated versions $PP_{\mathbf{w}_i^j}$; we collected all the produced paths and computed the mutation oracles as specified in Sect. III-C2. For the experiments, the thresholds $\theta_P$, $\theta_S$, and $\theta_C$ of the mutation oracles have been set to 0: in this way, it is easier to compare the killing strength of each oracle. We evaluated the approach using four research questions.

TABLE II: Weight coverage (T: covered, F: not covered)
Weight | PO | SO | CO
$w_1$ | T | T | T
$w_2$ | T | T | T
$w_3$ | T | F | F
$w_4$ | T | T | T
$w_5$ | T | T | T
$w_6$ | T | F | F

RQ1:
What is the weight coverage of the designed test suite?
We are interested in assessing how much the designed test suite covers the path planner weights. Table II reports, for each weight $w_i$, its coverage (either True or False) according to the three mutation oracles (see Eq. (1)); we highlight in grey the covered cases. Weights $w_1$, $w_2$, $w_4$, and $w_5$ are covered by the test suite with all the mutation oracles; these weights are all related to (lateral) acceleration/deceleration. The fact that they are all covered means that the test suite contains tests in which the acceleration has some effect on the decision taken by the path planner. Moreover, they are covered not only with the path oracle (which is a weak criterion for declaring a mutant killed), but also with the safety and comfort oracles, which are more demanding: this means that the killed mutants change both the minimum distance to the other dynamic objects (considered in the safety oracle) and the maximum acceleration (considered in the comfort oracle).

Weights $w_3$ and $w_6$ (related to the speed limit and to sudden changes of direction), instead, are only covered by the path oracle. This means that, although the mutants can slightly change the path taken, they affect neither the minimum distance to the other cars nor the maximum acceleration.

RQ2:
What is the weight coverage provided by each single scenario?
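TABLE III: Weight coverage by scenario (T: covered, F: not covered)

(a) Mutation oracle PO
Scenario | $w_1$ | $w_2$ | $w_3$ | $w_4$ | $w_5$ | $w_6$ | Count
$s_1$ | T | T | T | T | F | F | 4/6
$s_2$ | F | T | F | T | F | T | 3/6
$s_3$ | F | F | F | F | F | F | 0/6
$s_4$ | T | T | F | F | F | F | 2/6
$s_5$ | T | T | F | T | T | F | 4/6
$s_6$ | F | T | F | T | T | F | 3/6
$s_7$ | T | T | F | T | T | F | 4/6
$s_8$ | T | T | F | T | T | F | 4/6
$s_9$ | T | T | F | T | T | F | 4/6
$s_{10}$ | T | F | F | T | T | F | 3/6
Count | 7/10 | 8/10 | 1/10 | 8/10 | 6/10 | 1/10

(b) Mutation oracle SO
Scenario | $w_1$ | $w_2$ | $w_3$ | $w_4$ | $w_5$ | $w_6$ | Count
$s_1$ | F | T | F | F | F | F | 1/6
$s_2$ | F | T | F | F | F | F | 1/6
$s_3$ | F | F | F | F | F | F | 0/6
$s_4$ | T | F | F | F | F | F | 1/6
$s_5$ | T | T | F | T | T | F | 4/6
$s_6$ | F | T | F | F | T | F | 2/6
$s_7$ | T | T | F | T | T | F | 4/6
$s_8$ | F | F | F | F | T | F | 1/6
$s_9$ | T | T | F | F | F | F | 2/6
$s_{10}$ | T | F | F | T | T | F | 3/6
Count | 5/10 | 6/10 | 0/10 | 3/10 | 5/10 | 0/10

(c) Mutation oracle CO
Scenario | $w_1$ | $w_2$ | $w_3$ | $w_4$ | $w_5$ | $w_6$ | Count
$s_1$ | F | F | F | F | F | F | 0/6
$s_2$ | F | T | F | F | F | F | 1/6
$s_3$ | F | F | F | F | F | F | 0/6
$s_4$ | F | F | F | F | F | F | 0/6
$s_5$ | F | F | F | F | F | F | 0/6
$s_6$ | F | F | F | F | F | F | 0/6
$s_7$ | T | T | F | T | F | F | 3/6
$s_8$ | F | F | F | F | F | F | 0/6
$s_9$ | F | F | F | F | F | F | 0/6
$s_{10}$ | F | F | F | F | T | F | 1/6
Count | 1/10 | 2/10 | 0/10 | 1/10 | 1/10 | 0/10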
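As a consistency check of Eq. (1), the per-scenario results of Table III can be aggregated mechanically: the count rows and columns are simple tallies, and a weight is covered by $T$ iff at least one scenario covers it, which reproduces Table II. The following Python sketch does this on the values transcribed from Table III; it also checks that, in these data, every cell covered under SO or CO is also covered under PO. The helper names are ours.

```python
from typing import Dict, List

# Rows of Table III: one string per scenario s_1..s_10, one character per weight w_1..w_6.
TABLE_III: Dict[str, List[str]] = {
    "PO": ["TTTTFF", "FTFTFT", "FFFFFF", "TTFFFF", "TTFTTF",
           "FTFTTF", "TTFTTF", "TTFTTF", "TTFTTF", "TFFTTF"],
    "SO": ["FTFFFF", "FTFFFF", "FFFFFF", "TFFFFF", "TTFTTF",
           "FTFFTF", "TTFTTF", "FFFFTF", "TTFFFF", "TFFTTF"],
    "CO": ["FFFFFF", "FTFFFF", "FFFFFF", "FFFFFF", "FFFFFF",
           "FFFFFF", "TTFTFF", "FFFFFF", "FFFFFF", "FFFFTF"],
}


def weights_per_scenario(rows: List[str]) -> List[int]:
    """'Count' column: number of weights covered by each scenario."""
    return [row.count("T") for row in rows]


def scenarios_per_weight(rows: List[str]) -> List[int]:
    """'Count' row: number of scenarios covering each weight."""
    return [sum(row[i] == "T" for row in rows) for i in range(len(rows[0]))]


def suite_coverage(rows: List[str]) -> str:
    """Eq. (1) at suite level: w_i is covered by T iff some scenario covers it."""
    return "".join("T" if c > 0 else "F" for c in scenarios_per_weight(rows))


def covered_subset(strict: List[str], loose: List[str]) -> bool:
    """True if every cell covered under `strict` is also covered under `loose`."""
    return all(b == "T" or a == "F"
               for rs, rl in zip(strict, loose) for a, b in zip(rs, rl))
```

Applied to the three panels, `suite_coverage` gives `TTTTTT` for PO and `TTFTTF` for both SO and CO, i.e., exactly the three columns of Table II.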
We want to conduct a deeper analysis of the coverage provided by each single scenario. Table III reports, for the three mutation oracles, whether a given scenario covers a given weight. Considering mutation oracle PO, we observe that scenarios $s_1$, $s_5$, $s_7$, $s_8$, and $s_9$ cover more than half of the weights (4/6); these scenarios are among the most complicated ones (see the descriptions in Table I), in which different aspects must be taken into consideration; this also partially holds for scenarios $s_2$, $s_6$, and $s_{10}$, which cover half of the weights. Scenario $s_3$ does not cover any weight, as it simply describes a situation in which the ego car is going straight, and not many factors influence the decision of the path planner in this case.

Regarding the mutation oracle SO, in general, scenarios cover fewer weights than with the mutation oracle PO: this is expected, as SO subsumes PO (i.e., if the minimum distance is different, the path must be different as well, but not the other way round). However, scenarios $s_5$, $s_7$, and $s_{10}$ cover the same weights with the two oracles; this means that, for these scenarios, the path changes caused by the mutants also affect the minimum distance to the dynamic objects (making it either smaller or larger). Indeed, using the original path planner, the ego car gets quite close to the dynamic objects, so it is reasonable that any change in the path also affects the minimum distance.

Regarding the mutation oracle CO, only three scenarios ($s_2$, $s_7$, and $s_{10}$) cover some weight. Scenario $s_7$ achieves the highest coverage, covering half of the weights; this is because, depending on the mutant, the change of the weights leads either to a greater maximum acceleration to cross before the incoming car, or to a lower maximum acceleration to let the other car pass (see the scenario description in Table I).

RQ3:
How many scenarios cover each weight?
We now want to assess how easy it is to cover a weight;we assume that the more scenarios cover a weight, the easierit is to cover it. The last rows of the tables in Table III reportthe count of scenarios covering a given weight.Using the mutation oracle PO, we observe that w , w ,and w are the weights that are easier to cover. Indeed, theyare all related to lateral/normal acceleration and very likely a decision of the path planner depends on the acceleration (andso a perturbation of the weights changes the computed path).Instead, weight w (related to the violation of the speedlimit) is only covered by scenario s in which the ego car isclose to collision with two other dynamic objects. We furtherobserve that, for w , only mutant PP w , in which the constrainton the speed limit is completely removed, is killed: in this way,the path planner can compute an even safer (and so different)path that avoids the dynamic objects faster.Also weight w (related to sudden change of direction) isonly covered by a single scenario, namely s . In s the egocar is approaching a slowing car and is followed by a fastcar that is approaching its back: by relaxing the constraint onthe sudden change of lane, the path planner can compute adifferent and safer path.Observations similar to those done for mutation oracle POcan also be done for mutation oracle SO. We only observethat w is no more covered by s : this means that, althoughthe mutated path planner can compute a different path, theminimum distance to the other dynamic objects remains thesame (the mutated path planner can simply exit from thedangerous situation faster, as the constraint on the speed limithas been relaxed). 
In the same way, scenario s no longer covers weight w : the mutated path planner can avoid the dangerous situation with a more sudden action (because the weight is relaxed), but it reaches the same minimum distance as the original path planner.

As already observed in RQ2, the comfort oracle CO is highly demanding, and it is difficult to kill mutants with this oracle (it is only possible by obtaining a path with a different maximum acceleration). As expected, only the weights related to acceleration (i.e., w , w , w , and w ) can be covered by at least one scenario.

RQ4: What is the weight coverage provided by each mutation operator?
We are interested in assessing which mutation operators produce mutants that are easier to kill. Table IV reports, for the three mutation oracles, whether a given mutation operator (identified by the constant K used in the operator) produces a mutated path planner that is covered by at least one scenario of the test suite T.

TABLE IV: Weight coverage by mutation operator (T: covered, F: not covered). (a) Mutation oracle PO; (b) Mutation oracle SO; (c) Mutation oracle CO. Each sub-table has the columns K, the six weights, and Count.

For all the mutation oracles, coverage is correlated with the degree of change of the weight: mutation operators that change the weight significantly (i.e., K = 0 and K = 10) cover the most weights (5 out of 6), while weaker mutation operators (i.e., K = 0.9 and K = 1.1) cover fewer. For mutation oracle CO, only mutants with K equal to 0, 2, or 10 can lead to the coverage of at least one weight.

The results in Table IV also provide some insights into the weights themselves. Let us consider the results of the mutation oracle PO in Table IVa. We observe that some weights, such as w and w , are covered with almost any mutation operator: this means that these weights are important in the decision making of the path planner, which is therefore sensitive to small changes in them. Conversely, if system designers knew that their test suite is strong but a weight is not covered, this could indicate that the weight has no influence on the decision making, and could highlight a fault in the system or its design.

V. DISCUSSION
We now provide more general observations about the proposed approach.

The first observation is related to the coverage of a weight. If a weight w_i is never covered by a test suite, it could mean either that the test suite is not complete enough to cover w_i, or that w_i is never relevant in the decisions taken by the path planner. In the former case, we would just need to add some scenario trying to cover w_i; in the latter case, we would need to mark w_i as an infeasible test requirement, and we could report a problem in the path planner. However, detecting infeasible test requirements is in general undecidable.

Another observation is related to the completeness of the mutation-based approach. In order to approximate the weight coverage of a weight w_i, we propose to use a set of seven mutants in which w_i is modified using seven constants of different scales. It could be that w_i is covered by a test suite T according to weight coverage (see Def. 4), but not according to the mutation-based approximation. However, we believe that this does not affect the general conclusions of our experiments regarding the relations between the scenarios and the weights: it is unlikely that, given two scenarios that do not cover any weight with the mutation-based approach, some other change of the weights (not considered by the mutants) covers one scenario and not the other. Indeed, the path planner considered in this work uses a linear cost function: assuming it selects the cheapest path among candidates that do not themselves depend on the weights, the values of w_i for which a given path remains the cheapest form an interval. Hence, for a new value to change the result of a test when no mutant does, it would need to induce a greater change than the mutants, which have already been designed to induce significant changes to the weights. Still, if such a case occurred, one could question the significance of such coverage: does a scenario meaningfully cover a weight if, for the decision of the path planner in this scenario to change, one must introduce a massive change to the weight?

A final observation is related to the mutation oracles.
We can note that, although the path oracle is a very weak criterion, it is still useful for deciding whether a scenario should be kept in the test suite: if a scenario cannot kill any weight even with regard to the path oracle, it does not challenge the path planner at all and should be removed from the test suite (as scenario s ; see Table IIIa).

VI. THREATS TO VALIDITY
We identify the following threats to the validity of the approach.

A threat to external validity [8] is that the approach may not be generalizable to other systems. As this project is driven by a collaboration with an industrial partner, the solution risks being too domain-specific. First of all, we want to point out that, in some cases, a solution to a given problem is necessarily domain-specific, and trying to achieve generalizability could even be counterproductive [9]. Moreover, we still believe that the approach could be applied to other systems similar to the path planner, i.e., systems that solve optimization problems using weights to account for different aspects. As future work, we plan to evaluate whether the approach generalizes to a broader class of systems.

A threat to internal validity [8] is that our mutation-based approach could be faulty, in which case the obtained results would not be meaningful. In order to mitigate this threat, we checked that the mutated scenarios are syntactically correct and that they are parsed correctly by the path planner simulator; moreover, we assessed that the mutation oracles are implemented correctly by verifying that some known relations between them hold: for example, if the path oracle is 0, the other two oracles must be 0 as well.

VII. RELATED WORK
In this section, we review related work concerning testing of, and testing criteria for, automated driving systems, as well as non-conventional applications of mutation analysis.

Testing of autonomous driving systems is a complex issue that includes many challenges, as highlighted by Koopman and Wagner [10]. In [11], Wachenfeld and Winner show that it is infeasible to test autonomous driving systems using only real-life test drives. Indeed, they show that, according to German highway driving data, one would have to drive 6.61 billion kilometers in order to encounter some fatal scenarios, i.e., the scenarios that should be most critically tested. Zhao and Peng [12] and Helmer et al. [13] arrive at similar conclusions, stating that billions of kilometers would need to be driven to achieve sufficient testing guarantees.

As such, many works [3], [13]–[15] focus on using simulation and particular scenarios to test autonomous driving systems, which is also the setting of this work. In this context, the question of test sufficiency, or of a test stopping criterion, becomes essential. Indeed, as Hauer et al. remark in [4], “One can always come up with another scenario type as well as with instances of those types that are different from the types and instances used before”, which means that we need a criterion to know when our test data has covered all plausible situations. Our work focuses not only on a test stopping criterion but on a more general testing criterion that lets us evaluate how much of the system’s decision space a test suite covers, rather than how many of the possible scenarios have been covered, as different scenarios could lead to the same decisions.

Mutation analysis has been applied to diverse domains [6], and recently to deep neural networks (DNNs).
DNNs share a characteristic with the path planner, in that their behavior is governed by computed numerical values rather than logical branches, and their correctness is evaluated by some metrics (e.g., accuracy) rather than with pass/fail tests. A mutation analysis method for DNNs has been proposed that considers mutations on training data, training code, and trained models [16]. The mutation score evaluates whether each mutation changes a correct classification into a misclassification on some test data. Our proposal deals with the more complex situation of a path planner. Although the mutation targets (weights) are also continuous values, we deal with complex oracles and multiple evaluation criteria, instead of the binary problem of misclassification.

VIII. CONCLUSIONS
In this paper, we proposed a mutation-based approach for assessing whether all the possible decisions that can be taken by the path planner of an autonomous car are covered by a test suite (each test is a scenario). The path planner we consider makes decisions by using a weighted function of different aspects (safety, comfort, etc.). The approach consists of mutating the weights and checking whether the test suite is able to kill the mutants. The approach has been evaluated on a manually designed test suite; we observed that some weights are easier to cover, as they consider aspects that occur more often in a scenario. Moreover, more complicated scenarios that generate more complex paths are those that allow coverage of more weights. We believe that these preliminary results confirm our intuition that the proposed coverage criterion is reasonable. However, a more rigorous and systematic evaluation is needed: as future work, we plan to perform a wider set of experiments using different test suites, both automatically generated and manually designed. Moreover, we plan to assess whether weight coverage correlates with good fault detection.

Finally, we believe that the proposed approach is applicable not only to path planners, but to any optimisation program that relies on a weighted function; as future work, we plan to give a more general definition of the weight coverage criterion and to experiment with it on a wider class of systems.

REFERENCES
[3] …, June 2018, pp. 1821–1827.
[4] F. Hauer, T. Schmidt, B. Holzmüller, and A. Pretschner, “Did we test all scenarios for automated and autonomous driving systems?” in Preprint for Proc. of IEEE Intelligent Transportation Systems Conference, 2019.
[5] K. Czarnecki, “WISE drive: Requirements analysis framework for automated driving systems,” Waterloo Intelligent Systems Engineering Lab (WISE), University of Waterloo, Tech. Rep., Jul. 2018, https://uwaterloo.ca/waterloo-intelligent-systems-engineering-lab/projects/wise-drive-requirements-analysis-framework-automated-driving.
[6] M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. Le Traon, and M. Harman, “Chapter six - Mutation testing advances: An analysis and survey,” ser. Advances in Computers. Elsevier, 2019, vol. 112, pp. 275–378.
[7] J. J. Chilenski and S. P. Miller, “Applicability of modified condition/decision coverage to software testing,” Software Engineering Journal, vol. 9, no. 5, pp. 193–200, Sep. 1994.
[8] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén, Experimentation in Software Engineering. Springer Publishing Company, Incorporated, 2012.
[9] L. C. Briand, D. Bianculli, S. Nejati, F. Pastore, and M. Sabetzadeh, “The case for context-driven software engineering research: Generalizability is overrated,” IEEE Software, vol. 34, no. 5, pp. 72–75, 2017.
[10] P. Koopman and M. Wagner, “Challenges in autonomous vehicle testing and validation,” SAE Int. J. Trans. Safety, vol. 4, pp. 15–24, Apr. 2016.
[11] W. Wachenfeld and H. Winner, The Release of Autonomous Vehicles. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016, pp. 425–449.
[12] D. Zhao and H. Peng, “From the lab to the street: Solving the challenge of accelerating automated vehicle testing,” CoRR, vol. abs/1707.04792, 2017.
[13] T. Helmer, L. Wang, K. Kompass, and R. Kates, “Safety performance assessment of assisted and automated driving by virtual experiments: Stochastic microscopic traffic simulation as knowledge synthesis,” in …, Sep. 2015, pp. 2019–2023.
[14] C. Roesener, F. Fahrenkrog, A. Uhlig, and L. Eckstein, “A scenario-based assessment approach for automated driving by using time series classification of human-driving behaviour,” in …, Nov. 2016, pp. 1360–1365.
[15] E. de Gelder and J.-P. Paardekooper, “Assessment of automated driving systems using real-life scenarios,” in …, June 2017, pp. 589–594.
[16] L. Ma, F. Zhang, J. Sun, M. Xue, B. Li, F. Juefei-Xu, C. Xie, L. Li, Y. Liu, J. Zhao, and Y. Wang, “DeepMutation: Mutation testing of deep learning systems,” in 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE)