Data-Driven Testing of Cyber-Physical Systems
Dmytro Humeniuk, Polytechnique Montréal, Montreal
Giuliano Antoniol, Polytechnique Montréal, Montreal
Foutse Khomh, Polytechnique Montréal, Montreal
Abstract: Consumer-grade cyber-physical systems are becoming an integral part of our lives, automating and simplifying everyday tasks. They are almost always able to connect to the network, which allows remote monitoring and programming. They rely on powerful programming languages, cloud infrastructures and, ultimately, complex software stacks. Because of the complex interactions between hardware, networking and software, developing and testing such systems is known to be challenging. Ensuring properties such as dependability, security or data confidentiality is far from obvious. Various quality assurance and testing strategies have been proposed. The most common approach to pre-deployment testing is to model the system and run simulations with models or software in the loop. In practice, tests are most often run for a small number of simulations, selected on the basis of the engineers' domain knowledge and experience. To improve quality assurance practices, previous works such as the ARIsTEO paper advocate the adoption of search-based techniques and models to falsify a given property. Our conjecture is that effectiveness and usefulness improve if the environment is also explicitly accounted for. In the approach presented in this paper, we seek property violations via simulation and search-based techniques. However, the environment, or more precisely a set of environmental conditions, is accounted for via models learned on real-world data. The learned models represent the interactions of cyber-physical systems with their surroundings under specific environmental conditions. The models are then fed into a combined simulation and search algorithm.
Indeed, we seek to find a model combination (i.e., an environment realization) and a sequence of system actions that violate a particular property. We implemented our approach in Python using standard frameworks. We used it to generate scenarios violating temperature constraints for a smart thermostat implemented as part of our IoT testbed. Data collected from an application managing a smart building were used to learn models of the environment under ever-changing conditions. The suggested approach allowed us to identify several pitfalls, i.e., scenarios (environmental conditions) in which the system does not behave as expected.
Index Terms: cyber-physical systems, test case generation, genetic algorithm
I. INTRODUCTION
Ensuring the reliability of cyber-physical systems (CPS) is of vital importance. Examples of such systems include cars with adaptive cruise control, robotic systems and smart buildings. Testing software for such systems is challenging, because developers need to take into account the interaction with the hardware as well as with the environment. Consider an autonomous vehicle that may drive in various conditions (fog, rain, snow) and must react to pedestrians as well as to other cars' manoeuvres. Or consider an IoT network in which, due to heavy traffic, control commands arrive with a delay or get lost.

The input search space of such systems is large. Metaheuristics and random search-based techniques are therefore often used to generate the test cases [1]. Further, the test cases are executed on a model of the system, because it is impractical to use the physical system, especially at the pre-deployment testing stage [2]. It is therefore also important to obtain an accurate system model that is not time-consuming to execute. A number of tools have been developed to verify whether a model meets specific requirements and whether any inputs violate them. Examples include S-TaLiRo, Breach, falsify, ARIsTEO and the other tools described in [3]. However, these tools require the requirements to be specified manually. This is a tedious task, and it does not guarantee that all relevant requirements are considered.

Another direction of research is the automatic search-based generation of test suites for CPS. These approaches mostly focus on finding test cases with the best requirements coverage and diversity, rather than on falsification [1], [4], [5]. They lack flexibility, because they often require external software to generate the initial test cases. They also use the Simulink API to execute the models, which can be computationally expensive.

We surmise that it is important to design test suites with high fault-revealing power, which point developers to the possible worst-case scenarios of system execution.
Moreover, the test cases should consider possible combinations of environmental conditions during system execution. For example, a car can ride on a dry or an icy road, and each condition changes the model describing its trajectory.
Motivation. We explain the motivation for our work with a case study of a wirelessly controlled thermostat. This system is described in greater detail in [6]. The thermostat automatically controls the temperature in a closed room by switching between "on" and "off" modes. The room temperature is set by a user-defined schedule. A developer writing software for the thermostat defines a number of parameters, such as the sampling rate and the hysteresis value (a small threshold before or after reaching the target temperature). In addition, the environmental conditions affect the system behaviour. The commands sent from the controller to the thermostat can be delayed or even lost due to network overload. The rate at which the temperature decreases or increases can vary with the time of day, the room humidity and similar factors. Considering all these parameters, are there scenarios in which the system cannot follow the schedule? Which combinations of input parameters and environmental conditions drive the system to an unsafe state? Finding answers to such questions motivated us to develop an automatic search-based approach to CPS test case generation.

II. PROBLEM FORMULATION
The behaviour of a hybrid system can be described with modes that have continuous output dynamics, together with discrete mode switches [7]. Each hybrid system has inputs $U_i$, outputs $Y_i$ and state variables $S_i$. The expected system behaviour $B(\tau)$ is specified over a time interval $\tau$. A mode switch occurs when the system cannot meet the expected output requirements in a particular mode. In each mode $N_i$, the dynamics of the state, input and output variables $S_i$, $U_i$, $Y_i$ are given by a corresponding mathematical model $M_i$. This model can be derived from system execution data using system identification or machine learning techniques [8]. Because they depend on the environmental conditions, such models can become very complex and take a substantial amount of time to execute. We therefore surmise that each mode $N_i$ can be represented by a series of surrogate (simplified) models $M_i$, each corresponding to certain environmental conditions $E_i$. Each series $M_i$ thus contains models $m_{ij}$, where $i$ is the model series identifier and $j$ is the model identifier.

The test case (TC) generation problem can thus be viewed as a search for a combination of models $m_{ij}$ and input values $U_i$. The goal is to maximize the difference between the simulated system behaviour $B_s(TC)$ over the time interval $\tau$ and the expected behaviour $B(TC)$, with the system variables satisfying a certain constraint $K$:

$$\max \; \delta\big(B(TC), B_s(TC)\big) \quad \text{subject to} \quad K(S_i, U_i, Y_i)$$

Here $\delta(B(TC), B_s(TC))$ is our fitness function, which computes the deviation between the expected and simulated behaviour. The search can proceed in one of two ways. For fixed scenarios, it can find a combination of models that violates the user requirements. Alternatively, by changing both the models and the system inputs, it can find the worst possible scenario.

To generate the initial test cases, we suggest using hidden Markov chains. This idea is not new. In [9], for example, Markov chains are used to generate simulation scenarios for a wireless network.
We decided to use Markov chains for two main reasons. First, by running the chain a number of times, the developer can estimate the average performance of the system. Second, in our experiments, an initial population for the genetic algorithm (GA) generated with the Markov chain provided semantically better test cases than a completely random initialization.

The parameters of the Markov chain, such as its states and the probabilities of state change, can be estimated from data on typical system usage scenarios. In this case, "states" correspond to the system modes. Each state has a set of possible output values the system can reach, a duration, and a model that accounts for the system behaviour under particular environmental conditions. Therefore, for scenario generation, operation in each state can be represented by a triplet:

$$S_i = (Y_i, \tau_i, M_i) \tag{1}$$

Here $Y_i$ is the desired system output in a particular state, $\tau_i$ is the duration of this state (the sum of all $\tau_i$ should equal $\tau$), and $M_i$ is the model used to describe the system behaviour. A test case is represented by a sequence of states:

$$TC_i = (S_1, S_2, \ldots, S_i) \tag{2}$$

Finally, we suggest using evolutionary algorithms to find the combination of $Y_i$, $\tau_i$ and $M_i$ values that maximizes the fitness function $\delta$. For this study we used a single-objective genetic algorithm.

III. CASE STUDY
In our case study we consider the wirelessly controlled thermostat described in the Introduction. Two modes of operation can be defined for the thermostat: $M_1$: "ON" and $M_2$: "OFF". The behaviour of the system can be represented by a sequence of switches between these modes, together with the time spent in each mode. The input variable $U$ is the goal temperature (expected behaviour) at a given point in time. The output variable $Y$ is the output temperature controlled by the system. The system can also have state variables such as $S_1$, the system start temperature in a certain mode, and $S_2$, the time spent in a particular mode. The input variables are constrained to temperature values between 16 and 25 degrees Celsius. The time spent in each mode can range from 15 minutes to 6 hours.

A. Model creation
To create the system model, we used a system identification technique [10], in which a model of a dynamical system is built from data. The process requires the following steps:
1) Extract the data describing the system behaviour in different modes.
2) Select a model structure.
3) Apply an estimation method to estimate values for the adjustable coefficients in the candidate model structure.
4) Evaluate the estimated model.

The wireless thermostat, which controls the temperature in a closed room, is part of our physical testbed. The testbed is an IoT network of more than 30 devices based on the Z-Wave protocol. We therefore extracted the data for creating the model from experimental measurements. We selected the series of data points corresponding to the behaviour of the thermostat after "switch on" and "switch off" commands. One model includes two equations describing the behaviour in the "on" and "off" modes. In total, we identified 15 models with different coefficients in the equations. Environmental conditions vary, for example an open door, higher or lower humidity, or heat transfer from outside. Because of this, the coefficients in the selected model structure had to be adjusted to better fit the original data. Creating one complex model with many inputs to account for the environmental conditions would make the model computationally expensive to execute.

One of the challenges is selecting the model structure. In our case, the heating and cooling of a closed space are governed by physical laws such as Newton's law of cooling [11]. This law has an exponential nature, so our experimentally selected model structure is based on increasing and decreasing exponential functions. We propose the following discrete-time model structure for the $M_1$ ("on") mode:

$$Y = k_{on1}\left(1 - e^{-k_{on2} t_i}\right) + T \tag{3}$$

and for the $M_2$ ("off") mode:

$$Y = k_{off1}\, e^{-k_{off2} t_i} + T - k_{off1} \tag{4}$$

Here $k_{on1}$, $k_{on2}$, $k_{off1}$ and $k_{off2}$ are the coefficients that uniquely define the model behaviour in a particular environment.
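Equations (3) and (4) can be sketched in Python, together with the coefficient fitting step based on SciPy's curve_fit that this section describes. This is a minimal sketch under stated assumptions, not the authors' code. The function names, the choice to fix $T$ to the first measured sample, the initial guess p0, and the noise-free synthetic segment standing in for the testbed measurements are all hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def model_on(t, k_on1, k_on2, T):
    """Eq. (3), "on" mode: exponential rise starting from temperature T at t = 0."""
    return k_on1 * (1.0 - np.exp(-k_on2 * t)) + T


def model_off(t, k_off1, k_off2, T):
    """Eq. (4), "off" mode: exponential decay starting from temperature T at t = 0."""
    return k_off1 * np.exp(-k_off2 * t) + T - k_off1


def fit_mode(model, t, y, p0):
    """Estimate the two coefficients of one mode by non-linear least squares.

    Assumption (hypothetical): T is not fitted but fixed to the first sample.
    Returns the fitted coefficients and the RMSE of the fit.
    """
    T = y[0]
    popt, _ = curve_fit(lambda tt, k1, k2: model(tt, k1, k2, T), t, y, p0=p0)
    rmse = float(np.sqrt(np.mean((model(t, *popt, T) - y) ** 2)))
    return popt, rmse


# Hypothetical, noise-free "on" segment standing in for real testbed data.
t = np.arange(0, 180, dtype=float)   # discrete time steps t_i
y = model_on(t, 4.0, 0.03, 18.0)     # made-up coefficients
coeffs, err = fit_mode(model_on, t, y, p0=(3.0, 0.02))
print(coeffs, err)
```

Fixing $T$ leaves only two free coefficients per mode, which keeps each least-squares problem small. With real, noisy measurements, the resulting RMSE would be the quantity compared with the 0.5-degree figure reported for the thermostat data.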
In both equations, $T$ is the starting temperature and $t_i$ is the discrete time step value. We keep the coefficients in a table such as Table I, which shows the coefficients of three of the obtained models as an example. To obtain the coefficients, the data points must be fitted by a curve with minimal deviation. We used the Python SciPy library, namely its curve_fit function, which is based on the non-linear least squares method. The average root mean square error between the original and approximated data did not exceed 0.5 degrees.

TABLE I: MODEL COEFFICIENTS
Model | $k_{on1}$ | $k_{on2}$ | $k_{off1}$ | $k_{off2}$

B. Generating initial test cases
To generate the test cases automatically, we represent the thermostat system as a Markov chain with two states, "on" and "off", as shown in Fig. 1. The probabilities of changing state were estimated empirically, so that most of the generated test cases are semantically correct. A change of state occurs with a probability of 0.9, and the state remains the same with a probability of 0.1. After reaching a particular state, we randomly choose three values: the temperature the system is expected to reach, the time interval to be spent in the state, and the model coefficients to use. Each state is thus represented by a triplet (temperature, duration, model), similar to (1). In this way, a test case represents a temperature schedule that a user might define.

For each execution, we specify the expected duration of the test case as well as the number of states. We chose a duration of 24 hours, representing one day, and 5 to 12 states per test case. We implemented the algorithm in a Python script, which saves the generated test cases in JSON format.
Fig. 1. Thermostat system representation with a Markov chain
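The generation procedure can be sketched as a small pure-Python script. Several values come from the text: the switching probability (0.9), the 24-hour horizon, the 5 to 12 states, the 16 to 25 °C temperature range, the 15-minute to 6-hour state durations and the 15 identified models. The following choices are our own assumptions, not the authors' script: the 15-minute duration granularity, integer temperatures, the rejection-sampling split of the day, and the JSON field names.

```python
import json
import random

# Values from the case study text.
STATES = ("on", "off")
P_SWITCH = 0.9                  # probability of changing state (0.1 to stay)
TOTAL_MIN = 24 * 60             # one day, in minutes
MIN_DUR, MAX_DUR = 15, 6 * 60   # per-state duration bounds, in minutes
TEMP_RANGE = (16, 25)           # degrees Celsius
N_MODELS = 15                   # number of identified models
# Assumption: durations are multiples of 15 minutes.
STEP_MIN = 15


def split_duration(rng, n_states):
    """Split the day into n_states durations within [MIN_DUR, MAX_DUR] (rejection sampling)."""
    units = TOTAL_MIN // STEP_MIN
    while True:
        cuts = sorted(rng.sample(range(1, units), n_states - 1))
        bounds = [0] + cuts + [units]
        parts = [(b - a) * STEP_MIN for a, b in zip(bounds, bounds[1:])]
        if all(MIN_DUR <= p <= MAX_DUR for p in parts):
            return parts


def generate_test_case(rng, min_states=5, max_states=12):
    """Walk the two-state Markov chain, attaching a (temperature, duration, model) triplet to each state."""
    durations = split_duration(rng, rng.randint(min_states, max_states))
    state = rng.choice(STATES)
    test_case = []
    for duration in durations:
        test_case.append({
            "state": state,
            "temp": rng.randint(*TEMP_RANGE),   # assumption: integer set points
            "duration": duration,
            "model": rng.randrange(N_MODELS),   # index into the model table
        })
        if rng.random() < P_SWITCH:
            state = "off" if state == "on" else "on"
    return test_case


rng = random.Random(42)
suite = [generate_test_case(rng) for _ in range(3)]
print(json.dumps(suite, indent=2))
```

Drawing all durations of a test case jointly, rather than one state at a time, guarantees that every generated schedule covers exactly one day while respecting the per-state bounds.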
C. Genetic algorithm description
To find the test cases that maximize the difference between the expected and simulated behaviour, we implemented a genetic algorithm in Python with the Pymoo framework [12]. In our configuration, the number of generations is $N_G = 90$, the mutation rate is $m_r = 0.4$, the crossover rate is $c_r = 0.9$ and the population size is $p_{size} = 100$. These values were established experimentally, following common practice.
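The objective this GA optimizes can be sketched as follows. With $N_G = 90$ generations and $p_{size} = 100$ individuals, one run amounts to 90 × 100 = 9000 fitness evaluations. In this study the deviation $\delta$ is the root mean square error between the scheduled and the simulated temperatures. Because Pymoo minimizes, the value passed to it is negated. The following are assumptions: expanding the schedule into one target value per minute, the dictionary keys, and the function names. The simulated trace would come from executing the selected models.

```python
import math


def expected_trace(test_case, step_min=1):
    """Expand a schedule (list of state dicts) into one target temperature per time step.

    Assumption: durations are in minutes and "temp" holds the scheduled temperature.
    """
    trace = []
    for state in test_case:
        trace.extend([state["temp"]] * (state["duration"] // step_min))
    return trace


def rmse(expected, simulated):
    """Deviation delta: root mean square error between expected and simulated traces."""
    if not expected or len(expected) != len(simulated):
        raise ValueError("traces must be non-empty and of equal length")
    squared = sum((e - s) ** 2 for e, s in zip(expected, simulated))
    return math.sqrt(squared / len(expected))


def objective(expected, simulated):
    """Value handed to a minimizer such as Pymoo: the negated deviation."""
    return -rmse(expected, simulated)


# Toy traces (hypothetical): an RMSE of about 1.22 degrees.
print(objective([20, 20, 22, 22], [19, 21, 22, 24]))
```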
1) Solution representation:
The solution is composed of at least one test case containing 5 to 12 states. The chromosomes are the test cases, which the software implementation represents as a dictionary (see Fig. 2). They have a variable number of genes, where each gene corresponds to a system state.
Fig. 2. GA chromosome representation, tc - test case, st - state
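Following the caption of Fig. 2 (tc for test case, st for state), a solution could be nested as a dictionary of test cases (chromosomes), each of which is a dictionary of states (genes). The key format and the helper names below are hypothetical. The figure only fixes the general nesting.

```python
def to_solution_dict(test_cases):
    """Nest test cases (chromosomes) and their states (genes) into dictionaries.

    Hypothetical key format: "tc1", "tc2", ... for test cases and "st1", ... for states.
    """
    return {
        f"tc{i}": {f"st{j}": dict(state) for j, state in enumerate(tc, start=1)}
        for i, tc in enumerate(test_cases, start=1)
    }


def n_genes(solution, tc_key):
    """Number of genes (states) in one chromosome; 5 to 12 in the case study."""
    return len(solution[tc_key])


# Toy solution with one two-state test case (made-up values).
example = to_solution_dict([[
    {"temp": 21, "duration": 360, "model": 3},
    {"temp": 17, "duration": 1080, "model": 7},
]])
print(example)
```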
2) Selection:
We use the k-way tournament selection implemented in Pymoo to choose the parents.
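Pymoo ships its own tournament selection. The sketch below is not Pymoo's code. It is a generic pure-Python illustration of k-way tournament selection for a minimized fitness, where lower is better, matching the negated deviation. The function name and signature are hypothetical.

```python
import random


def tournament_select(fitness, k, n_parents, rng):
    """k-way tournament selection for a minimized fitness (lower is better).

    Each tournament draws k distinct individuals uniformly at random and
    keeps the index of the one with the lowest fitness value.
    """
    if not 1 <= k <= len(fitness):
        raise ValueError("k must be between 1 and the population size")
    winners = []
    for _ in range(n_parents):
        contestants = rng.sample(range(len(fitness)), k)
        winners.append(min(contestants, key=lambda i: fitness[i]))
    return winners


# Hypothetical negated-RMSE values for a population of four test cases.
population_fitness = [-1.0, -7.2, -2.8, 0.5]
print(tournament_select(population_fitness, k=2, n_parents=4, rng=random.Random(0)))
```

A larger k increases the selection pressure. When k equals the population size, the best individual always wins.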
3) Crossover operators:
We implemented a crossover operator that exchanges states between two different test cases, as shown in Fig. 3. We use one-point crossover.
Fig. 3. Crossover operator for two test cases of 5 and 6 states with crossoverpoint at the third state
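One-point crossover on variable-length test cases can be sketched as below. The sketch assumes, as our reading of Fig. 3, that the cut is placed after the same state index in both parents and that everything after it is swapped. The function is hypothetical, not the authors' operator.

```python
import random


def one_point_crossover(parent_a, parent_b, rng, point=None):
    """Swap the tails of two variable-length state sequences at a common cut point.

    Assumption: the cut point indexes both parents; if not given, it is drawn so
    that each child keeps at least one state from each parent.
    """
    if point is None:
        point = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


# Two test cases of 5 and 6 states, cut after the third state as in Fig. 3.
tc_a = [f"a{i}" for i in range(1, 6)]
tc_b = [f"b{i}" for i in range(1, 7)]
print(one_point_crossover(tc_a, tc_b, random.Random(0), point=3))
```

With a common cut index, the children simply inherit the parents' lengths, so the 5 to 12 state bound is preserved. A child's total duration, however, is no longer guaranteed to equal 24 hours, so an implementation may need to repair or rescale the durations.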
4) Mutation operators:
We define two mutation operators, similar to [4]:
• exchange operator: two states of a chromosome are randomly selected and their positions are exchanged;
• change of variable operator: a state in a chromosome is randomly selected, and the value of one of its variables (temperature, duration, model) is changed according to the variable's type and its minimum and maximum values.

5) Fitness function:
In our study, the fitness function evaluates the root mean square error between the simulated and expected behaviour. The expected behaviour is specified in the test cases, which are given as input to the system simulation. The test case is executed using the specified models, and the values of the system behaviour are calculated. A fitness of 1 means that the system delivers the temperature with an average difference of 1 degree from the schedule, which might be acceptable for a typical user. Because the Pymoo framework minimizes the fitness function, our implementation multiplies its actual value by −1.

IV. RESULTS
To evaluate the performance of our GA implementation, we ran it 50 times, each run comprising 9000 evaluations. After each run, we recorded the fittest individual. We compared the GA with random search (RS). For RS, we recorded the fittest individual after generating 9000 random individuals, and repeated the process 50 times. The resulting boxplot is shown in Fig. 4, which also reports the fitness values of all randomly generated individuals. The GA always produces better results, with an average fitness of -7.2, while the average fitness of the best RS individuals is -2.8. Across all randomly generated individuals, the average fitness is around 0.93. For one of the runs, Fig. 5 also shows the convergence of the GA, which confirms its good performance.

From this evaluation, we conclude that our thermostat system performs well on average, as shown by all the randomly generated schedules: the mean deviation from the schedule is around 1 degree. However, some potential scenarios can lead to completely wrong system behaviour, with an average deviation from the schedule of 7 degrees. It is up to the developer to decide whether the test cases found are realistic. If they are not, we recommend adjusting the search parameters and constraints.
Fig. 4. Genetic algorithm and random search comparison after 50 runs
Fig. 5. Convergence of GA over 90 generations

V. DISCUSSION AND CONCLUSION

In this paper we proposed an approach for generating fault-revealing test cases for hybrid CPS that takes into account the variability of system behaviour under changing environmental conditions. The approach comprises the generation of models, the generation of initial test cases, and a genetic algorithm implemented in the Pymoo framework. The results of the wireless thermostat case study demonstrate the effectiveness of our implementation compared with random search. With our approach, we could evaluate the system performance and also generate potentially dangerous scenarios. However, it is up to developers to judge whether the test cases are pertinent and to take further action to prevent the failures.

The approach can be applied to a wide range of hybrid CPS, as we plan to demonstrate in future case studies. We also plan to implement our approach as a complete test case generation tool.

REFERENCES
[1] A. Turlea. Search based model in the loop testing for cyber physical systems. 2018, pages 22–28.
[2] Arend Aerts, M. Reniers, and Mohammad Reza Mousavi. Model-based testing of cyber-physical systems. In Cyber-Physical Systems, pages 287–304. Elsevier, 2017.
[3] Gidon Ernst, Paolo Arcaini, Ismail Bennani, Alexandre Donze, Georgios Fainekos, Goran Frehse, Logan Mathesen, Claudio Menghi, Giulia Pedrinelli, Marc Pouzet, et al. ARCH-COMP 2020 category report: Falsification. EPiC Series in Computing, 2020.
[4] Aitor Arrieta, Shuai Wang, Urtzi Markiegi, Goiuria Sagardui, and Leire Etxeberria. Search-based test case generation for cyber-physical systems. IEEE, 2017, pages 688–697.
[5] Reza Matinnejad, Shiva Nejati, Lionel C. Briand, and Thomas Bruckmann. Automated test suite generation for time-continuous Simulink models. In Proceedings of the 38th International Conference on Software Engineering, pages 595–606, 2016.
[6] Cyrine Zid, Dmytro Humeniuk, Foutse Khomh, and Giuliano Antoniol. Double cycle hybrid testing of hybrid distributed IoT system. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, pages 529–532, 2020.
[7] Rajeev Alur. Principles of Cyber-Physical Systems. MIT Press, 2015.
[8] Claudio Menghi, Shiva Nejati, Lionel Briand, and Yago Isasi Parache. Approximation-refinement testing of compute-intensive cyber-physical models: An approach based on system identification. IEEE, 2020, pages 372–384.
[9] Jing Liu and Yixu Yao. Modeling and analysis of wireless cyberphysical systems using stochastic methods. Wireless Communications and Mobile Computing, 2019, 2019.
[10] Lennart Ljung and Torkel Glad. Modeling of Dynamic Systems. Prentice-Hall, 1994.
[11] R. H. S. Winterton. Newton's law of cooling. Contemporary Physics, 40(3):205–212, 1999.
[12] J. Blank and K. Deb. Pymoo: Multi-objective optimization in Python.