Assisted Coverage Closure

Adam Nellis, Pascal Kesseli, Philippa Ryan Conmy, Daniel Kroening, Peter Schrammel, Michael Tautschnig

Rapita Systems Ltd, UK; University of Oxford, UK; Queen Mary University of London, UK
Abstract. The malfunction of safety-critical systems may cause damage to people and the environment. Software within those systems is rigorously designed and verified according to domain-specific guidance, such as ISO26262 for automotive safety. This paper describes academic and industrial co-operation in tool development to support one of the most stringent of these requirements: achieving full code coverage in requirements-driven testing. We present a verification workflow supported by a tool that integrates the coverage measurement tool RapiCover with the test-vector generator FShell. The tool assists in closing the coverage gap by providing the engineer with test vectors that help in debugging coverage-related code quality issues and creating new test cases, as well as justifying the presence of unreachable parts of the code, in order to finally achieve full effective coverage according to the required criteria. To illustrate the practical utility of the tool, we report on an application of the tool to a case study from the automotive industry.
Software within safety-critical systems must undergo strict design and verification procedures prior to deployment. The recently published ISO26262 standard [1] describes the safety life cycle for electrical, electronic and software components in the automotive domain. Different activities are required at different stages of the life cycle, helping ensure that system safety requirements are met by the implemented design. The rigor to which these are carried out depends on the severity of the consequences of failure of the various components. Components with automotive safety integrity level (ASIL) D have the most stringent requirements, and ASIL A the least strict. One of the key required activities for software is to demonstrate the extent to which testing has exercised the source code, also known as code coverage. This can be a challenging and expensive task [4], with much manual input required to achieve adequate coverage results.

This paper presents work undertaken within the Verification and Testing to Support Functional Safety Standards (VeTeSS) project, which is developing new tools and processes to meet ISO26262. (The research leading to these results has received funding from the ARTEMIS Joint Undertaking under grant agreement number 295311 "VeTeSS".) The main contribution of this paper is an integration of the FShell tool [8] with an industrial code coverage tool (RapiCover) in order to generate extra test cases and increase code coverage results. An additional contribution is a discussion of how this technology might be most appropriately used within the safety life cycle. Achieving 100% code coverage can be a complex and difficult task, so tools to assist the process are desirable; however, there is a need to ensure that any automatically generated tests still address system safety requirements.

Safety standards require different depths of coverage depending on the ASIL of the software. The requirements of ISO26262 are summarized in Tab. 1.

Table 1. ISO26262 Coverage Requirements (HR = Highly Recommended, R = Recommended)

Type                  | Description                                                                        | ASIL
Function (arch level) | Each function in the code is exercised at least once                               | A, B (R); C, D (HR)
Statement             | Each statement in the code is exercised at least once                              | A, B (HR); C, D (R)
Branch                | Each branch in the code has been exercised for every outcome at least once         | A (R); B, C, D (HR)
MC/DC                 | Each possible condition must be shown to independently affect a decision's outcome | A, B, C (R); D (HR)

The aim of requirements-based software testing is to ensure that the different types of coverage are achieved to 100% for each of the required categories. In practice this can be extremely difficult; e.g. defensive coding can be hard to provide test vectors for. Another example is code that may be deactivated in particular modes of operation. Sometimes there is no obvious cause for lack of coverage even after manual review. In this situation, generating test vectors automatically can be beneficial to the user, providing faster turnaround and improved coverage results.

This paper is laid out as follows. In Sec. 2 we provide background to the coverage problem being tackled, and criteria for success. In Sec. 3 we describe the specific tool integration. Sec. 4 describes an industrial automotive case study. Sec. 5 looks at both previous work and some of the lessons learned from the implementation experience, and suggests improvements. Finally, we present conclusions and further work.

The contribution of this paper is by and large of a practical nature: the integration of formal-methods-based tools with industrial testing software.
In the safety-critical domain these two areas are generally separated from one another, with formal methodology used only for small and critical sections of software to prove correctness, and viewed as an expensive procedure. In some cases the methods are seen as directly at odds with one another [6]. The tool is at a prototype stage of development, and the authors are working with industrial partners to assess future improvements to prepare its commercialization, as described in Sec. 5.
Testing has to satisfy two objectives: it has to be effective, and it has to be cost-effective. Testing is effective if it can distinguish a correct product from one that is incorrect. Testing is cost-effective if it can achieve all it needs to do at the lowest cost (which usually means the fewest tests, least amount of effort and shortest amount of time).

Safety standards like ISO26262 and DO-178B/C demand requirements-driven testing to increase confidence in the correct behavior of the implemented software. Correct behavior means that the software implements the behavior specified in the requirements and that it does not implement any unspecified behaviors. As a quality metric, these standards demand the measurement of coverage according to certain criteria, as listed in Tab. 1, for instance. The rationale behind using code coverage as a quality metric for assessing the requirements coverage achieved by a test suite is the following: suppose we have a test suite that presumably covers each case in the requirements specification. Then, obviously, missing or erroneously implemented features may be observed by failing test cases, whereas a lack of coverage, e.g. according to the MC/DC criterion, indicates that there is behavior in the software which is not exercised by any test case. This may hint at the following software and test quality problems:

(A) Some cases in the requirements specification have been forgotten. These requirements have to be covered by additional test cases.
(B) Features have been implemented that are not needed. Unspecified features are not allowed in safety-critical software and have to be removed.
(C) The requirements specification is too vague or ambiguous to describe a feature completely. The specification must be disambiguated and refined.
(D) Parts of the code are unreachable. The reasons may be:
    (1) A programming error that has to be fixed.
    (2) Code generated from high-level models often contains unreachable code if the code generator is unable to eliminate infeasible conditionals.
    (3) It may actually be intended, in the case of defensive programming and error handling.

In the latter case, fault injection testing is required to exercise these features [9]. Depending on the policy regarding unreachable code, case (2) can be handled through justification of non-coverability, tuning the model or the code generator, or post-processing of generated code.

The difficulty for the software developer consists in distinguishing the above cases. This is an extremely time-consuming and, hence, expensive task that calls for tool assistance.
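Case (D)(3) can be pictured with a small C sketch (our own illustration, not taken from the paper's case study): an exhaustive switch over an enumeration whose defensive default branch is unreachable for valid inputs, so no requirements-based test can cover it.

```c
#include <assert.h>
#include <string.h>

typedef enum { PARK, REVERSE, NEUTRAL, DRIVE } gear_t;

/* gear_name is a hypothetical helper. Its default branch is defensive
 * error handling: it is unreachable as long as only valid gear_t values
 * occur, so structural coverage of that line cannot be achieved by
 * ordinary requirements-based tests (case D(3) above). */
const char *gear_name(gear_t g)
{
    switch (g) {
    case PARK:    return "P";
    case REVERSE: return "R";
    case NEUTRAL: return "N";
    case DRIVE:   return "D";
    default:      return "?";  /* defensive: fault injection needed to exercise */
    }
}
```

A coverage tool reports the default branch as uncovered; the engineer must then decide whether to justify it as intentionally unreachable or exercise it via fault injection.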
Given
– an implementation under test (e.g. C code generated from a Simulink model),
– an initial test suite (crafted manually or generated by some other test suite generation technique), and
– a coverage criterion (e.g. MC/DC),
we aim at increasing effective test coverage by automatically
– generating test vectors that help the developer debug the software in order to distinguish the above reasons (A)-(D) for missing coverage;
– in particular, suggesting additional test vectors that help the developer create test cases to complete requirements coverage in case (A);
– proving infeasibility of non-covered code, thus giving evidence for arguing non-coverability.

Note that safety standards like DO-178C [13] allow only requirements-driven test-case generation and explicitly forbid achieving full structural code coverage by blindly applying automated test-vector generation. This can easily lead to confusion if the distinction between test-case generation and test-vector generation is not clearly made. Test-vector generation can be applied blindly to achieve full coverage, but it is without use by itself. A test vector is only a part of a test case, because it lacks the element that provides information about the correctness of the software, i.e. the expected test result. Only the requirements can tell the test engineer what the expected test result has to be. Test-case generation is thus always based on the requirements (or a formalized model thereof, if available). Our objective is to provide assistance for test-case generation to bridge the coverage gap.

Combining a test-case generator with a coverage tool provides immediate access to the test vectors needed to obtain the level of coverage required for a given qualification level.

Coverage tools determine which parts of the code have been executed by using instrumentation. Instrumentation points are automatically inserted at specific points in the code. If an instrumentation point is executed, this is recorded in its execution data.
After test completion, the coverage tool analyzes the execution data to determine which parts of the source code have been executed. The tool then computes the level of coverage achieved by the tests. We use the coverage tool RapiCover, which is part of the RVS tool suite developed by Rapita Systems Ltd.
We use the test-vector generator FShell [8] (see Sec. 3.2 for details), which is based on CBMC [3], the Software Bounded Model Checker for C programs. Viewing a program as a transition system with initial states described by the propositional formula Init and transition relation Trans, Bounded Model Checking (BMC) [2] can be used to check the existence of a path π of length k from Init to another set of states described by the formula ψ. This check is performed by deciding the satisfiability of the following formula using a SAT or SMT solver:

    Init(s_0) ∧ ⋀_{0 ≤ j < k} Trans(s_j, i_j, s_{j+1}) ∧ ψ(s_k)

If the solver returns the answer "satisfiable", it also provides a satisfying assignment to the variables (s_0, i_0, s_1, i_1, ..., s_{k-1}, i_{k-1}, s_k). The satisfying assignment represents one possible path π = ⟨s_0, s_1, ..., s_k⟩ from Init to ψ and identifies the corresponding input sequence ⟨i_0, ..., i_{k-1}⟩.

Besides being useful for refuting safety properties (where ψ defines the error states), BMC can be used for generating test vectors (where ψ defines the test goal to be covered).

The analysis performed by CBMC is bit-exact w.r.t. the machine semantics of the execution target, and CBMC provides full bit-exact support for floating-point arithmetic. Architecture-specific settings can be configured via the command line in FShell, and RapiCover supports on-target coverage measurement. We are hence guaranteed that the generated test vectors are going to cover the test goals. In addition, using BMC in a test-vector generator permits generating the shortest test vectors possible to cover a certain test goal, or even a whole group of test goals, which helps keep test suites concise and test execution fast [14]. An advantage of using a model checker is also its ability to find test vectors for corner cases ("Under which conditions can this floating-point variable take the value NaN?").
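To make the NaN question concrete, the following C sketch (our own illustration, not from the paper's case study) shows a function whose result is NaN for exactly one corner-case input pair — the kind of test vector a bit-exact model checker can derive as a witness, and that random testing is unlikely to hit.

```c
#include <assert.h>
#include <math.h>

/* ratio is a hypothetical function. Under IEEE 754 semantics the
 * division yields NaN only when both operands are zero (0.0/0.0);
 * a bit-exact BMC query "can the result be NaN?" would return the
 * input vector num = 0.0, den = 0.0 as a satisfying assignment. */
double ratio(double num, double den)
{
    return num / den;
}
```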
Moreover, in our experience, due to the high precision of the analysis, it is very likely to discover inconsistencies and holes in the requirements specification during test-vector generation.

Finally, BMC can give a proof of unreachability of a test goal if loops can be unrolled completely; otherwise, k-induction [15], a BMC-based technique for unbounded model checking, can be used to attempt a proof.

The algorithm that we implement to assist the coverage closure process is shown in Fig. 1. It proceeds as follows:

1. We start with an initial test suite that has been crafted manually or has been generated using other test-case generation techniques like directed random testing. The initial test suite may be empty, but many test goals can be easily covered using test-case generation methods that are cheaper than Bounded Model Checking. It is thus recommended to start with such a base test suite.
2. In the next step, this test suite is run using the coverage measurement tool in order to obtain a list of non-covered test goals. Coverage measurement can be performed on a developer machine to obtain approximate coverage, but final certification data has to be obtained by running the test suite on the actual target platform.
3. The test-vector generator takes the list of non-covered test goals and tries to compute input values to cover them. Ideally, the test-vector generator is parametrized with the architectural parameters of the target platform in order to obtain guarantees that the goals are indeed going to be covered. As our test-vector generator is a Bounded Model Checker, there are three possible outcomes of an attempt to cover test goals:
   (a) A test goal has been covered. In this case the new test vector is presented to the user, who has to turn it into a new test case to be added to the test suite. Note that building the new test case is the only part of the process (bold edge in Fig. 1) that is not fully automatic, since human judgment is required to identify why the corresponding test goal was not covered in the first place, i.e. distinguishing reasons (A)-(D) in Sec. 2.
   (b) It is infeasible to cover a test goal. This happens when the test-vector generator comes up with a proof of unreachability of the test goal. As mentioned above, a Bounded Model Checker can provide such proofs if the loops have been unwound completely, for instance. In this case, the corresponding test goal can be annotated in the coverage report as proven infeasible, to justify its non-coverability. This increases effective coverage by reducing the number of genuinely coverable test goals.
   (c) The goal has not been covered and we were unable to prove infeasibility of the test goal. With a Bounded Model Checker this can happen if the chosen bound k was too low. In this case the test goal remains uncovered, and one can try to cover it with a higher value of k in the next iteration of the process.
4. Coverage of the enhanced test suite is then measured again to identify test goals that remain uncovered, and the process is repeated. Generated tests typically cover more test goals than intended. Measuring coverage between test generations increases the cost-effectiveness of the process by eliminating unnecessary test-case generations.
5. If there are no more non-covered test goals, we have achieved full coverage and the process terminates.

Note that the process depicted in Fig. 1 is not specific to our tool but applies in general. In particular, it does not rely on the test-vector generator to guarantee that a generated test vector covers the test goal it has been generated for, because the coverage measurement tool checks all generated test cases for increased coverage anyway.
However, the generation of useless test cases can be avoided by using a tool such as FShell that can provide such guarantees. Then, in theory, termination of the process achieving full coverage can be guaranteed, because embedded software is finite state. In practice, however, this depends on the reachability diameter of the system [12] and the capacity of the test-vector generator to cope with the system's size and complexity.

The input to the tool is a C program with an initial test suite. The output of the tool is twofold. The first output is a set of generated test vectors that augment the initial test suite to increase its coverage. The second output is a coverage report detailing the level of coverage achieved by the initial test suite, and the extra coverage added by the generated test cases.

Fig. 2. RVS Process

FShell has been integrated into RapiCover as a context menu option, shown in Fig. 3. RapiCover can be used to select a single function, call, statement, decision or branch. The tool then uses FShell to generate a test vector for this element. Alternatively, the tool has a button to generate as much coverage as possible. When this option is chosen, the tool goes around the loop described in Fig. 1, using FShell to repeatedly generate test cases to increase the coverage as much as possible, verifying the obtained coverage with RapiCover.

There is a tension between the need to demonstrate that the activities prescribed by ISO26262 have been met in spirit as well as with quantifiable criteria. Recall that achieving 100% code coverage during testing does not ensure the code meets its intent. Consequently, the FShell plug-in would be provided as an advisory service, generating candidate test vectors which a user can examine to help them identify why their planned testing was inadequate. Values generated need to be assessed for being valid for the system under test, i.e. reflecting real-world values that could be input to a function, e.g. from a sensor.

RVS is licensed software. An evaluation version can be requested from [...]. The licensing policy disallows anonymous licenses. To compensate for this, we provide a video showing the plug-in here: https://drive.google.com/file/d/0B7xeLJ8vk3W8Y094TVc4Rmh0S0k

Fig. 3. Screenshot of RapiCover with the FShell Plug-in

RapiCover uses instrumentation to determine which program parts have been executed. Instrumentation points are automatically inserted at specific points in the code. Execution of an instrumentation point is recorded in its execution data. Upon test completion, RapiCover analyzes the execution data to determine which instrumentation points have been hit.

The first step in the RapiCover analysis process is to create an instrumented build of the application ((1) in Fig. 2). RapiCover automatically adds instrumentation points ((2) in Fig. 2) to the source code. The instrumentation code itself takes the form of very lightweight measurement code that is written for each target to ensure minimal impact on the performance of the software, and to support on-target testing for environments with limited resources. The instrumented software, and possibly an instrumentation library, are compiled and linked using the standard compiler tool chain. The executable produced is then downloaded onto the target hardware. The executable is exercised, and instrumentation data ((3) in Fig. 2) is generated and retrieved. This data is used to generate coverage metrics.

FShell is an extended testing environment for C programs supporting a rich scripting language interface. (FShell is available from: http://forsyte.at/software/fshell) FShell's interface is designed like a database engine, dispatching queries about the program to various program analysis tools. These queries are expressed in the FShell Query Language (FQL). Users formulate test specifications and coverage criteria, challenging FShell to produce test suites and input assignments covering the requested patterns. The program supports a rich and extensive interface.
The expressions used for the FShell plugin for RVS implementation are listed in Tab. 2 with syntax and examples.

Table 2. FShell expressions

Expression Name | Syntax      | Example
Function Call   | @CALL(...)  | @CALL(X)
Concatenation   | .           | @CALL(X).@CALL(Y)
Sequence        | ->          | @CALL(X)->@CALL(Y)
Negation        | "NOT(...)"  | "NOT(@CALL(X))"
Repetition      | *           | @CALL(X)*
Alternative     | +           | (@CALL(X) + @CALL(Y))

@CALL(X) requires generated test cases to call function X. This is the only primitive expression used in the module. The concatenation operator . joins two expressions, requiring them to be satisfied subsequently. As an example, a test case generated by @CALL(X).@CALL(Y) covers a call to X immediately followed by Y. This is similar to the sequence operator ->, which requires the second call to occur eventually. @CALL(X)->@CALL(Y) is thus fulfilled if a call to X is eventually followed by a call to Y. The negation "NOT(@CALL(X))" is satisfied by every statement except a call to function X. The repetition operator is implemented along the lines of its regular-expression counterpart, such that @CALL(X)* is satisfied by a series of calls to X. Finally, the alternative operator implements logical disjunction, such that (@CALL(X) + @CALL(Y)) will be satisfied if either a call to X or a call to Y occurs.

The expressions and operators above are all that is used by the FShell plug-in to generate the test vectors requested by RapiCover. Sec. 3.3 illustrates how these expressions are used to convert test goals to equivalent FQL queries.

Fig. 4. Architecture of FShell plugin for RVS

The FShell plugin for RVS translates test goals requested by RapiCover into FQL queries covering these goals in FShell, as illustrated in Fig. 4. Test goals are specified using marker elements from the RapiCover instrumentation, which can identify arbitrary statements in the source code by assigning them an instrumentation point id.
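The marker mechanism can be pictured with the following C sketch (an assumption on our part; RapiCover's real instrumentation and FShell's synthesized mocks are more elaborate): an Ipoint wrapper records that a point was hit and passes the wrapped condition value through unchanged, so instrumenting sub-conditions does not alter program behavior.

```c
#include <assert.h>

#define MAX_IPOINTS 64

/* Execution data: hits[id] records whether instrumentation point id
 * was executed. (Hypothetical mock, not RapiCover's implementation.) */
static int hits[MAX_IPOINTS];

/* Wraps a (sub-)condition: records the hit, returns the value unchanged. */
int Ipoint(int id, int value)
{
    hits[id] = 1;
    return value;
}
```

Note that short-circuit evaluation is preserved: in Ipoint(2, a == b) || Ipoint(3, b != c), point 3 is only recorded when the second condition is actually evaluated, which is exactly the information MC/DC measurement needs.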
In accordance with the MC/DC criteria, decisions and their constituent conditions are further identified using unique decision and condition point ids. Fig. 5 shows an example program before and after RapiCover instrumentation. The module supports two categories of test goals: Instrumentation Point Path Test Goals and Condition Test Goals. The former specifies a simple series of instrumentation points to be covered by FShell.

    // Before instrumentation:
    int main() {
      // ...
      if (a == b || b != c) {
        printf("%d %d\n", a, b);
      }
      return 0;
    }

    // After instrumentation:
    int main() {
      // ...
      Ipoint(1);
      if (Ipoint(4, Ipoint(2, a == b) || Ipoint(3, b != c))) {
        Ipoint(5);
        printf("%d %d\n", a, b);
      }
      Ipoint(6);
      return 0;
    }

Fig. 5. Code example before and after RapiCover instrumentation

The system also permits inclusive-or and negation operators in instrumentation point paths, allowing the user to specify a choice of instrumentation points to be covered, or to make sure that a requested instrumentation point is not covered by the provided test vector. As an example, the instrumentation point path 1 -> 5 -> 6 in Fig. 5 is only covered if the decision in the if statement evaluates to true. Conversely, the path 1 -> NOT(5) -> 6 is only covered if it evaluates to false. The former can be achieved with inputs a=1, b=1, c=2, whereas the latter could be covered using the input vector a=1, b=2, c=2. Condition Test Goals, on the other hand, are specified by a single decision point and multiple condition points, as well as the desired truth value for each decision and condition. This allows us to cover branch conditions with precise values for their sub-conditions. As an example, the condition test goal (4,true) -> (2,false) -> (3,true) would be covered by the input vector a=1, b=2, c=3.

The instrumentation elements introduced by RapiCover need to be mapped to equivalent FQL queries using the features presented in Tab. 2. For this purpose, we replace their default implementation in RapiCover by synthesized substitutions which are optimized for efficient tracking by FShell. These mock implementations are synthesized for each query and injected into the program on-the-fly at analysis time. Standard FQL queries are then enough to examine these augmented models for the specified coverage goals. Tab. 3 shows explicitly how these goals can be described using the FShell query syntax.

Table 3. Test Goal Types and FShell Queries

Category                        | Goal        | FQL
Instrumentation Point Path Goal | Simple      | @CALL(Ipoint5) -> @CALL(Ipoint6) -> ...
                                | Disjunction | (@CALL(Ipoint5) + @CALL(Ipoint6) + ...)
                                | Complement  | @CALL(Ipoint1)."NOT(@CALL(Ipoint5))*".@CALL(Ipoint6) -> ...
Condition Goal                  | Condition   | @CALL(Ipoint2f)."NOT(@CALL(Ipoint1))*".@CALL(Ipoint2t)."NOT(@CALL(Ipoint1))*".+ ...
                                | Decision    | @CALL(Ipoint4t)

The FShell plugin for RVS has been tested using an industrial automotive use case, for a software-managed controller. To illustrate the features and utility of the tool, we applied it to the software of an e-Shift Park Control Unit. (The C code was provided by Centro Ricerche Fiat under a GPL-like license and can be downloaded here: https://drive.google.com/file/d/0B22MA57MHHBKamhQMmpEQlRWVG8. The C code was generated from a Simulink model that has, unfortunately, not been disclosed.) This system is in charge of the management of the mechanical park lock that blocks or unblocks the transmission to avoid unwanted movement of the vehicle when stopped. The park mode is enabled either by command of the driver via the gear lever (PRND: park/reverse/neutral/drive) or automatically.

Fig. 6. Case Study: e-Shift Park Control Unit

Fig. 6 shows the architectural elements the e-Park system communicates with. The vehicle control unit monitors the status of the vehicle via sensors and informs the driver, in particular, about the speed of the vehicle and the status of the gears via the dashboard. The e-Park Control Unit is responsible for taking control decisions on when to actuate the mechanical park lock system.

Among many others, the following requirements have to be fulfilled:
1. Parking mode is engaged if vehicle speed is below 6 km/h and the driver presses the parking button (P) and the brake pedal.
2. If vehicle speed is above 6 km/h and the driver presses the parking button (P) and the brake pedal, then commands from the accelerator pedal are ignored; parking mode is activated as soon as speed decreases below 6 km/h.
3. If vehicle speed is below 6 km/h and the driver presses the driving button (D) and the brake pedal, then forward driving mode is enabled.
4. If vehicle speed is above 6 km/h then backward driving mode (R) is inhibited.

As is typical for embedded software, the e-Park Control Unit software consists of tasks that, after initialization of the system on start-up, execute periodically in the control loop until system shutdown. A test vector hence consists of a sequence of input values (sensor values and messages received via the communication system) that may change in each control loop iteration. We call the number of iterations the length of the test vector.

To generate valid test vectors, a model of the vehicle is required. Otherwise, the test-vector generator may produce results that are known not to occur in the running system, such as infinite vehicle velocity. For the case study this model consisted of assumptions about the input value ranges, such as "The speed of the car will not exceed 1000 km/h, or reduce below 0 km/h." These assumptions are part of the admissible operating conditions as stated in the requirements specification.

In order to evaluate the FShell plugin for RVS, we used the C source code of the e-Shift case study (approx. 4 KLOC) and started out with an initial test suite consisting of 100 random test vectors uniformly distributed over the admissible input ranges. Then we incrementally extended this test suite by additional test vectors generated by the following two approaches:
1. The FShell plugin for RVS, following the process illustrated in Fig. 1.
2. A combination of test-vector generation based on random search and greedy test-suite reduction.

Table 4. Experimental setup of the two approaches that we compare.

Step | FShell plugin for RVS                                             | Random search + reduction
1    | Start with the initial test suite.                                | (same)
2    | Compile and run the C source code with the current test suite, using RapiCover to generate a coverage report. | (same)
3    | RapiCover provides FShell with a list of non-covered test goals.  | -
4    | FShell generates a test vector for these non-covered test goals.  | Generate a random test vector, uniformly distributed over the admissible input ranges.
5    | FShell feeds back information about infeasible test goals and test vectors for feasible test goals. | -
6    | Create C test cases based on these test vectors.                  | (same)
7    | Re-compile and re-run the C code with this new test case, using RapiCover to verify that the generated test case does indeed cover the test goal. | (same)
8    | -                                                                 | If the coverage has increased then keep the test case; otherwise discard it.
9    | Repeat from step 3.                                               | (same)

Table 5. Evaluation results: comparing the FShell plugin for RVS against test vectors generated by random search.

                            | Initial    | Random search                     | FShell
                            | test suite |                                   | plug-in
Runtime (hh:mm)             | -          | 00:33 | 01:04 | 06:15 | 08:00     | 08:00
Generated test cases        | -          | 500   | 1000  | 5000  | 6092      | 7
Thereof non-redundant       | -          | 7     | 10    | 13    | 13        | 7
Total test cases            | 100        | 107   | 110   | 113   | 113       | 107
Statement coverage increase | -          | 0.1%  |       |       | 0.2%      | 0.2%
MC/DC coverage increase     | -          | 2.1%  |       |       | 2.7%      | 3.5%

We compared the achieved coverage gain and the resulting test suite sizes after running both approaches for 8 hours. Tab. 4 describes our experimental setup. The runtime of FShell is exponential in the bound of the main control loop. Choosing too high a loop bound results in FShell taking prohibitively long to run, yet setting the loop bound too low results in some branches not being coverable.
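The loop-bound sensitivity can be illustrated with a hedged C sketch (ours, not the e-Park code): a branch guarded by a debounce counter needs at least two control-loop iterations, so BMC with loop bound k = 1 cannot cover it, whereas k = 2 can.

```c
#include <assert.h>

/* Hypothetical control-loop fragment: the action engages only after the
 * input has been true for two consecutive iterations. With a single
 * loop unwinding (k = 1), the debounce counter can never reach 2, so
 * the "return 1" branch is not coverable; with k >= 2 it is. */
int park_engaged(const int below_threshold[], int iterations)
{
    int debounce = 0;
    for (int i = 0; i < iterations; i++) {
        if (below_threshold[i])
            debounce++;
        else
            debounce = 0;
        if (debounce >= 2)
            return 1;   /* needs at least two iterations to reach */
    }
    return 0;
}
```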
As mitigation, we started the experiment with a loop bound of 1, then gradually increased the loop bound to cover those branches that we were not able to cover in previous iterations. As explained in Section 2.1, step 6 in Tab. 4 is not automatic, since it needs information from the requirements specification. For the sake of our comparison, which does not care about the pass/fail status of the tests, we skipped the manual addition of the expected test outcome.

We ran the experiment for 8 hours. The first approach spent more than 99% of this time within FShell. Within this time frame, the loop bound reached 2, and thus not all branches could be covered. Nevertheless, an increased coverage was achieved, as detailed in Tab. 5, which shows the baseline coverage on the initial test suite (second column from the left) and the increase in percentage of coverage gained by the tool in this experiment (rightmost column). The code under test implements a state machine, which is mostly decisions with very few functions and calls, which is why we focused on decision and statement coverage for our evaluation.

To underpin the benefit of our tool, we compared these results to the second approach described in Tab. 4, a random search test generation strategy. Tab. 5 shows four snapshots of this search (middle columns) after exploring 500, 1000, 5000 and eventually 6092 test vectors of length 5 out of the admissible input range. (We chose length 5 because it seems a good compromise between increasing coverage and keeping test execution times short for this case study: adding 100 test vectors of length 5 increased coverage by 1.1%; 100 test vectors of length 10 increased it by only 1.3%, while test execution times would double and only half as many test vectors could be explored.) The results show that more than 99.99% of the test vectors added by the random search are redundant and do not increase the coverage of the suite. This confirms that the system under test represents a particularly challenging case for black-box test case generation, and that only very few test vectors in the input range lead to an actual coverage increase. On the other hand, the FShell plugin for RVS achieves a significantly larger increase in the same amount of time in both statement and MC/DC coverage. In addition, the plug-in achieves this increased coverage with only half as many new test vectors as the random approach, leading to an overall smaller and more efficient test suite.

This evaluation thus underlines the benefit of our tool integration in supporting the coverage closure process on an industrial case study. The expected reduction in manual work needs to be investigated in a broader industrial evaluation involving verification engineers performing the entire coverage closure process.

There is much existing work on test case generation using model checking techniques [5], but a smaller amount targeted directly at the high-criticality safety domain, where the criteria and frameworks for test case generation are restricted. A useful survey relating to MC/DC can be found in [17]. In [7], Ghani and Clark present a search-based approach to generating test frameworks. There are two issues with the approach presented: firstly, it is applied to Java, a language which is rarely used for safety-critical software, and particularly not for the most critical software. The second is more subtle: the test cases were generated to ensure that the minimal set of truth tables for MC/DC were exercised, but without consideration of the validity of any of the test data. Additionally, we emphasize that our approach takes into account existing coverage that has already been achieved and complements requirements-based testing, rather than completely replacing it.

Other work such as [11] looks at modification of the original source through mutation testing in order to assess the effectiveness of the tests. This could be considered an adjunct to our methodology, but at present mutation testing is not widely adopted by industry. Jones [10] considers test prioritization and test suite reduction, but not new test case generation.

In order to encourage wider adoption of this integrated tool, we need to consider where it would fit in users' workflows and verification processes, as well as meeting the practical requirements of the standard. As noted earlier, fully automated code coverage testing is not desirable, as it misses the intent of the requirements-based testing process. However, achieving full code coverage is a difficult task, and often requires a large amount of manual inspection of coverage results to examine what was missing. Hence, providing the user with suggested test data is potentially very valuable and could improve productivity in one of the most time-consuming and expensive parts of the safety certification process.

Another benefit of integrating test case generation and coverage measurement is test suite reduction. The coverage measurement tool returns for each test case a list of covered goals. Test suite reduction is hence the computation of a minimal set cover (an NP-complete problem). Approximate algorithms [16] may be used to achieve this in reasonable runtimes.

FShell uses a class of semantically exact, but computationally expensive,