A predictive safety filter for learning-based racing control
Ben Tearle, Kim P. Wabersich, Andrea Carron, Melanie N. Zeilinger
Abstract — The growing need for high-performance controllers in safety-critical applications like autonomous driving has been motivating the development of formal safety verification techniques. In this paper, we design and implement a predictive safety filter that is able to maintain vehicle safety with respect to track boundaries when paired alongside any potentially unsafe control signal, such as those found in learning-based methods. A model predictive control (MPC) framework is used to create a minimally invasive algorithm that certifies whether a desired control input is safe and can be applied to the vehicle, or that provides an alternate input to keep the vehicle in bounds. To this end, we provide a principled procedure to compute a safe and invariant set for nonlinear dynamic bicycle models using efficient convex approximation techniques. To fully support aggressive racing performance without conservative safety interventions, the safe set is extended in real-time through predictive control backup trajectories. Applications for assisted manual driving and deep imitation learning on a miniature remote-controlled vehicle demonstrate the safety filter's ability to ensure vehicle safety during aggressive maneuvers.
I. INTRODUCTION

The development of robotic systems has led to an ever increasing number of applications that go beyond the isolated task spaces found in legacy industries such as automotive or electronics production. More recent applications encompass dynamic and learning-based interactions with humans in complex task spaces, as is the case with autonomous driving, and therefore require advanced safety mechanisms [1], [2], to prevent potentially dangerous situations. Maintaining safety at the physical limits for highly dynamic systems often requires a task-specific trade-off between performance and conservatism to ensure safe system operation. As a result, there is an increasing interest in developing theoretically sound safety frameworks with a reduced degree of conservatism that enable safety in a modular fashion, independent of a task-specific objective.

While some of these methods have been demonstrated in practice, the considered applications are often small-scale or nearly linear control systems that are only operated within conservative regions of their state space [3]. Motivated by the strict safety requirements in autonomous driving, we consider the problem of safe autonomous and assisted racing as a benchmark application for deriving a practically relevant safety mechanism. Racing requires the utilization of a vehicle's full nonlinear dynamics, providing a challenging domain in which safety must be guaranteed.

To provide safety for arbitrary control policies, we rely on a modular safety framework as shown in Figure 1. This approach allows the framework to be used in conjunction with any potentially unsafe control signal, such as those from learning-based controllers. The basic idea is to design a safety filter, which analyzes the desired control signal and decides in real-time whether it can be applied to the system, or if it has to be modified to ensure safety. For the racing application considered in this work, this consists of verifying whether the vehicle is able to stay within track boundaries in the future given the current steering and drivetrain commands. This is achieved by finding safe backup control sequences that lead the vehicle towards a set of known safe states, where the first input of the sequence is as close as possible to the desired control signal. This approach allows for verifying the safety of the desired input while at the same time providing an alternative safe input otherwise.

Fig. 1. Concept of predictive safety filter: based on the current state x(k), an arbitrary control algorithm provides a desired control input u_d(k) ∈ R^m, which is processed by the safety filter, u(k) = π_S(x(k), u_d(k)), and applied to the real system.

[Footnote] The authors are with the Institute for Dynamical Systems and Control, ETH Zurich, ZH-8092, Switzerland: [email protected], [wkim|carrona|mzeilinger]@ethz.ch. This work was supported by the Swiss National Science Foundation under grant no. PP00P2 157601/1. The research of Andrea Carron was supported by the Swiss National Centre of Competence in Research NCCR Digital Fabrication. The work of Ben Tearle and Andrea Carron was supported by the ETH Career Seed Grant 19-18-2.

A. Related Work
The concept of using a safety controller in a closed-loop system was first introduced in [4], where the system can switch between an experimental controller and a reliable safety controller in the case of software faults. Developments on the theoretic use of barrier certificates for verifying system safety were later proposed in [5], which was further extended to the idea of control barrier functions (CBFs) [6]. More recent work has revisited the notion of using CBFs for safety-critical control of robotic systems, see [1] for an overview. This approach has been combined with a machine-learning framework in [7] to safely learn model discrepancies of a Segway robot while limiting the operational space during training. Although these methods build on strong theoretical results from control Lyapunov function theory, they rely on the ability to explicitly model a system's safety requirements as a CBF, which is not generally trivial to design.

Given the inherent lack of safety guarantees in traditional machine learning methods, the reinforcement learning (RL) field has become increasingly interested in enforcing constraints for training black-box control policies. A general-purpose policy search algorithm for constrained reinforcement learning is introduced in [8], which approximately enforces safety constraints at every policy update. Using a learning-based system model, [9] proposes a method for determining a safe set of system states under a specific learning-based policy. Although these methods allow for approximately safe policy training, they are limited in that they remain tied to task-specific reinforcement learning algorithms, whereas the safety filter presented in this work is able to function independently of a specific task and thereby enables modular safety.

An approach for providing system safety based on confining a system to a pre-computed set of safe states is introduced in [10].
This uses reachability-based techniques to find a safe set for a given system, together with a corresponding control policy that provides invariance within the safe set. The idea is expanded in [3] to perform online updates of the safe set using a non-parametric system dynamics estimate. These approaches suffer from limited scalability in the required offline safe-set computation. Recent work attempts to address this by approximating the reachable sets using data-based methods [11], sum-of-squares programming [12], and active learning [13].

Closely related to these ideas, a method for establishing safety using an MPC-based control law is derived in [14]. A continuously updated control policy is computed online to find backup trajectories towards safe states, resulting in an implicit representation of the safe set and corresponding safe control law via the MPC optimization problem. This method is extended to consider nonlinear stochastic systems formulated with chance constraints or parametric uncertainties in [15], [16], and provides the foundation for the task of autonomous racing considered in this work.

B. Contributions
The main contribution of this paper is the design and implementation of a permissive safety filter for autonomous racing that can be combined with any desired control signal, ensuring closed-loop vehicle safety with respect to a track for a diverse range of applications. To this end, we use the concept of predictive safety filters as presented in [14], [16]. To achieve a minimally invasive safety filter supporting aggressive maneuvers, we use a nonlinear dynamic bicycle model with a Pacejka model of the tire forces [17] to simultaneously predict and optimize accurate backup control trajectories. In addition to a high-fidelity system model, the safety filter performance can be improved by using either a longer planning horizon or a larger terminal set. As the planning horizon is typically limited by memory and processing requirements, we derive an iterative optimization-based invariant set computation using convex approximations to obtain an enlarged terminal safe set for the nonlinear dynamic bicycle model, which is valid over a range of constant road curvatures.

The physical miniature racing application demonstrates the proposed safety filter's performance with both human-in-the-loop racing and deep imitation learning. This work presents, to the best of our knowledge, the first application of a predictive safety filter to a complex and highly dynamical nonlinear system demonstrated in experimental results.

II. PROBLEM FORMULATION
Notation: The set of integers in the interval [a, b] ⊂ R is denoted by I_[a,b], and the set of integers in the interval [a, ∞) ⊂ R is I_{≥a}. The i-th row of a matrix M ∈ R^{n×m} is denoted by [M]_i.

The goal of this work is to design a safety filter that certifies whether or not a desired control input, u_d(k), is safe for a vehicle system, and provides an alternative safe control input at any time. We consider a discrete-time nonlinear system of the form

    x(k+1) = f(x(k), u(k)),  ∀k ∈ I_{≥0},   (1)

subject to state and input constraints, x(k) ∈ X, u(k) ∈ U, where the dynamics f : X × U → R^n. System safety is defined with respect to ensuring constraint satisfaction at all times, as follows.

Definition 1.
A system (1) is considered safe if

    x(k) ∈ X,  u(k) ∈ U,  ∀k ∈ I_{≥0}.   (2)

In order to guarantee this notion of safety for a given u_d(k), a safety control policy, π_S(x(k), u_d(k)), is provided that guarantees constraint satisfaction for all future timesteps if applied to the vehicle. If a safety policy exists with u_d(k) as the current input of the policy, then u_d(k) can be certified as safe and applied to the system. More formally:

Definition 2.
A desired input u_d(k̄) is certified as safe for system (1), at a given timestep k̄, if the safety control policy yields π_S(x(k̄), u_d(k̄)) = u_d(k̄), and application of u(k) = π_S(x(k), u_d(k)) to the system results in safety according to Definition 1 for all k ≥ k̄.

Using a safety policy in accordance with Definition 2 provides a safety filter that can be brought into a closed-loop system as shown in Figure 1. Since the safety policy can be updated at each time step to consider the incoming desired input, this allows the desired control signal to have control authority over the system whenever possible, i.e., π_S(x(k), u_d(k)) = u_d(k). However, if the desired control signal would put the system at risk of violating its constraints in the future, then alternate inputs, π_S(x(k), u_d(k)) ≠ u_d(k), must be available that ensure safety for the system.

The next section discusses an approach to compute π_S online using an MPC framework that minimizes interference while still ensuring safety for the system.

Fig. 2. Diagram (a) shows a possible vehicle trajectory from a safe desired input u_d(k). Diagram (b) shows the resulting vehicle trajectory from an unsafe desired input, where the vehicle ends up leaving the track. An alternate safe input u*(k) applied by the safety filter is shown along with its trajectory.

III. PREDICTIVE SAFETY FILTER

We define an implicit safety policy through a receding-horizon optimal control problem, referred to as the predictive safety filter problem [14], which allows for an efficient online computation of the desired safety filter π_S:

    min_{x_{i|k}, u_{i|k}}  J(u_{i|k}, u_d(k))            (3a)
    s.t. ∀i ∈ I_[0,N−1]:
         x_{0|k} = x(k),                                  (3b)
         x_{i+1|k} = f(x_{i|k}, u_{i|k}),                 (3c)
         x_{i|k} ∈ X,                                     (3d)
         u_{i|k} ∈ U,                                     (3e)
         x_{N|k} ∈ S_f.                                   (3f)
Problem (3) computes a discrete-time state and input backup trajectory, {x*_{i|k}, u*_{i|k}}, of length N, where x_{i|k} is the state predicted i timesteps ahead, computed at time k, initialized at x_{0|k} = x(k), and similarly for u_{i|k}. The system is predicted along the horizon according to the dynamics (3c), subject to an initial condition (3b), state and input constraints (3d) and (3e), and a terminal constraint (3f). Different from classical MPC, the objective function in (3a) is chosen to minimize the difference between the desired control input and the first input of the solution trajectory, as

    J(u_{i|k}, u_d(k)) = ‖u_d(k) − u_{0|k}‖.   (4)

The safety policy is then defined by π_S(x(k), u_d(k)) = u*_{0|k}.

The cost function in (4) can be modified to include secondary objectives beyond tracking the desired control signal. For the racing application, we include a regularization term that penalizes the rate of change of the inputs in order to encourage a smoother control trajectory:

    J(u_{i|k}, u_d(k)) = ‖u_d(k) − u_{0|k}‖_W + Σ_{i=0}^{N−1} ‖Δu_{i|k}‖_{R_S},   (5)

where Δu_{0|k} := u_{0|k} − u_{0|k−1}, Δu_{i|k} := u_{i|k} − u_{i−1|k} for i = 1, …, N−1, and W, R_S ∈ R^{m×m} are cost matrices for the input deviation and input rate, respectively. This helps to reduce rapid fluctuations between the desired input and the safety filter's input, which can occur with the system at the boundary of the state constraints in practice. To avoid unnecessary input deviations from a desired input that can be certified as safe, the weights are chosen with W much larger than R_S to ensure priority remains on tracking the desired input.

Assumption 1 (Invariant terminal set). There exists a control law κ_f : S_f → U, and a corresponding positively invariant set S_f ⊆ X, such that for all x ∈ S_f, it holds that κ_f(x) ∈ U and f(x, κ_f(x)) ∈ S_f.
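To make the structure of (3) and the cost (4) concrete, the following is a minimal sketch on a toy one-dimensional point mass, not the paper's vehicle model. Instead of solving a full nonlinear program, the backup trajectory after a candidate first input is generated here by a simple braking policy, and the first input is chosen from a grid to be as close as possible to the desired input u_d. All numbers and the grid resolution are illustrative assumptions.

```python
# Toy predictive safety filter sketch. State x = (p, v) with dynamics
# p(k+1) = p + v*dt, v(k+1) = v + u*dt, state constraint p <= 1 (a "wall"),
# input constraint |u| <= 1. The terminal set stands in for S_f in (3f):
# a near-standstill condition reached by the braking backup policy.
DT, N = 0.05, 40

def step(p, v, u):
    return p + v * DT, v + u * DT

def brake(v):
    # backup control law: brake to a standstill without overshooting zero
    return max(-1.0, min(1.0, -v / DT))

def backup_is_safe(p, v):
    """Simulate the braking backup policy from (p, v); require p <= 1 along
    the horizon and a small terminal set (|v| <= 0.05) at the end."""
    for _ in range(N - 1):
        if p > 1.0:
            return False
        p, v = step(p, v, brake(v))
    return p <= 1.0 and abs(v) <= 0.05

def safety_filter(p, v, u_d):
    """Return the admissible first input closest to u_d, mirroring cost (4)."""
    candidates = sorted((i / 10 for i in range(-10, 11)),
                        key=lambda u: abs(u - u_d))
    for u0 in candidates:
        if backup_is_safe(*step(p, v, u0)):
            return u0
    return -1.0  # no certified input found: fall back to full braking

print(safety_filter(0.5, 0.5, 1.0))   # far from the wall: u_d passes through
print(safety_filter(0.85, 0.5, 1.0))  # heading into the wall: filter intervenes
```

The first call certifies the desired input unchanged (zero objective cost, no intervention); the second returns a braking input because every forward-pushing first input leads the backup trajectory past the wall.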
As in standard MPC theory, Assumption 1 provides recursive feasibility for the safety control policy obtained from problem (3), i.e., if the problem has a feasible solution at timestep k̄, then a feasible solution also exists for all future times k > k̄. This results in constraint satisfaction at all times, meeting the requirements for a safe system put forth in Definition 1.

If we consider the case where u_d(k) is a safe input for the system, there must exist a state and input trajectory, beginning at x(k+1) = f(x(k), u_d(k)), that is feasible along the horizon and ends in S_f. An example is shown in Figure 2(a), where a vehicle is at initial state x(k), and the primary state constraint is to stay inside track limits. Application of the input u_d(k) would bring the vehicle to state x_{1|k}, from where a state and input trajectory exists that keeps the vehicle inside the boundaries before reaching S_f. The input u_d(k) can therefore be certified as safe, and the optimal solution to (3) would be u*_{0|k} = u_d(k). This achieves a minimal objective cost of zero, satisfying the desired behavior of no intervention for a safe u_d(k).

If the desired input is unsafe, then any resulting trajectory beginning at x(k+1) = f(x(k), u_d(k)) must violate the constraints at some point along the horizon. Looking at Figure 2(b), the trajectory following x_{1|k} after applying u_d(k) can be seen to leave the track. In this case, Problem (3) will provide an input, u*_{0|k} ≠ u_d(k), that is able to maintain system safety while being as close as possible to u_d(k). A backup control trajectory that can be taken instead if a safe initial input is applied is shown in the same figure.

IV. VEHICLE DYNAMICS AND CONSTRAINTS

In this section, the model used to describe the vehicle dynamics is presented, followed by the system constraints.

A. System Model
In this work we consider a miniature RC car, which is modeled using a standard dynamic bicycle model formulation [18], [19]. Using a dynamic model, as opposed to the simpler kinematic models considered in previous related work, see, e.g., [1], allows us to consider the nonlinear tire forces, which have a significant impact on vehicle motion during aggressive maneuvers. The state of the model is x = [p_x, p_y, ψ, v_x, v_y, r], with the input u = [δ, τ], where p_x, p_y are the x-y coordinates of the car and ψ is the heading angle in the global coordinate frame; v_x, v_y, and r are the longitudinal velocity, lateral velocity, and yaw rate in the vehicle's body frame. Finally, δ is the steering angle and τ is the drivetrain command. An illustration can be seen in Figure 3.

Fig. 3. Dynamic vehicle model diagram.
The system model can be described by the differential equations

    ẋ = [ v_x cos(ψ) − v_y sin(ψ)
          v_x sin(ψ) + v_y cos(ψ)
          r
          (1/m)(F_x − F_yf sin(δ) + m v_y r)
          (1/m)(F_yr + F_yf cos(δ) − m v_x r)
          (1/I_z)(F_yf l_f cos(δ) − F_yr l_r) ],   (6)

where m is the car mass, I_z is the yaw moment of inertia, and l_f, l_r are the distances between the center of gravity and the front and rear axles, respectively. The lateral tire forces F_yf and F_yr are modeled with a simplified Pacejka tire model,

    α_f = arctan((v_y + l_f r)/v_x) − δ,   α_r = arctan((v_y − l_r r)/v_x),
    F_yf/yr = D_f/r sin(C_f/r arctan(B_f/r α_f/r)),   (7)

where α_f and α_r are the tire slip angles [17]. The longitudinal force is modeled as a single force applied to the center of gravity of the vehicle, and is computed as a combination of the drivetrain command and velocity, F_x = C_1 τ + C_2 τ² + C_3 v_x + C_4 v_x² + C_5 τ v_x, with constant coefficients C_1, …, C_5. The drivetrain command τ can be positive, resulting in forward motion, or negative, resulting in braking.

The continuous-time system in (6) is discretized using Euler forward, obtaining a discrete-time nonlinear system of the form (1).
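The dynamics (6) with the Pacejka forces (7) can be written down directly. The sketch below uses illustrative placeholder parameters, not the identified values of the paper's miniature car, and for simplicity treats the longitudinal force F_x as an input in place of the drivetrain map; the sampling time 0.0125 s corresponds to the 80 Hz rate reported in Section VI.

```python
import math

# Sketch of the dynamic bicycle model (6) with simplified Pacejka tires (7).
# Parameter values are illustrative placeholders (assumed, not identified).
PAR = dict(m=0.041, Iz=27.8e-6, lf=0.029, lr=0.033,
           Bf=2.58, Cf=1.2, Df=0.192, Br=3.38, Cr=1.27, Dr=0.173)

def f_cont(x, u, p=PAR):
    """Continuous-time dynamics for x = [px, py, psi, vx, vy, r] and
    u = [delta, Fx], with Fx standing in for the drivetrain force map."""
    px, py, psi, vx, vy, r = x
    delta, Fx = u
    # tire slip angles and lateral Pacejka forces, eq. (7)
    alpha_f = math.atan2(vy + p["lf"] * r, vx) - delta
    alpha_r = math.atan2(vy - p["lr"] * r, vx)
    Fyf = p["Df"] * math.sin(p["Cf"] * math.atan(p["Bf"] * alpha_f))
    Fyr = p["Dr"] * math.sin(p["Cr"] * math.atan(p["Br"] * alpha_r))
    return [vx * math.cos(psi) - vy * math.sin(psi),
            vx * math.sin(psi) + vy * math.cos(psi),
            r,
            (Fx - Fyf * math.sin(delta) + p["m"] * vy * r) / p["m"],
            (Fyr + Fyf * math.cos(delta) - p["m"] * vx * r) / p["m"],
            (Fyf * p["lf"] * math.cos(delta) - Fyr * p["lr"]) / p["Iz"]]

def f_disc(x, u, Ts=0.0125):
    # Euler-forward discretization, yielding the discrete-time form (1)
    return [xi + Ts * dxi for xi, dxi in zip(x, f_cont(x, u))]
```

As a sanity check, straight driving (zero steering, zero lateral velocity and yaw rate) produces zero slip angles, zero lateral forces, and purely longitudinal motion.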
The system is subject to nonlinear state constraints and polyhedral input constraints of the form

    X := {x ∈ R^n | d(x) ≤ b},   U := {u ∈ R^m | Gu ≤ g},   (8)

where d : R^n → R^{n_b}, and G ∈ R^{n_g×m}. The input constraints consist of bounding the maximum and minimum commands, while the state constraints enforce the safety-critical task of keeping the car within track limits.

To keep the vehicle within the boundaries of the track, we constrain the front two corners of a bounding box around the vehicle, e_lf and e_rf, shown in Figure 4. The lateral error of the vehicle's center of gravity with respect to the track center-line is e_lat, while the yaw error of the vehicle with respect to the track orientation is µ. Given a reference center-line position and orientation, x_t, y_t, ψ_t, these states can be written as

    e_lat(k) = −sin(ψ_t)(x(k) − x_t) + cos(ψ_t)(y(k) − y_t),
    µ(k) = ψ(k) − ψ_t,
    e_lf(k) = e_lat(k) + l_f sin(µ(k)) + (w/2) cos(µ(k)),
    e_rf(k) = e_lat(k) + l_f sin(µ(k)) − (w/2) cos(µ(k)),   (9)

where w is the width of the vehicle. These two corner points of the bounding box can be bounded by half the width of the track, denoted t, as

    |e_lf| ≤ t,   |e_rf| ≤ t.   (10)

Fig. 4. Track-relative error states used to constrain the vehicle.
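The corner-constraint evaluation (9)-(10) is a direct computation; the sketch below uses illustrative geometry parameters (the axle distance, vehicle width, and track half-width are assumptions, not the paper's values).

```python
import math

# Sketch of the track-boundary constraint (9)-(10): bound the two front
# corners of the vehicle's bounding box by the track half-width t.
LF, W, T = 0.029, 0.05, 0.40  # front axle distance, car width, half track width (assumed)

def corner_errors(x, y, psi, xt, yt, psit):
    """Track-relative errors (9) for vehicle pose (x, y, psi) against a
    reference center-line pose (xt, yt, psit)."""
    e_lat = -math.sin(psit) * (x - xt) + math.cos(psit) * (y - yt)
    mu = psi - psit
    e_lf = e_lat + LF * math.sin(mu) + 0.5 * W * math.cos(mu)
    e_rf = e_lat + LF * math.sin(mu) - 0.5 * W * math.cos(mu)
    return e_lf, e_rf

def inside_track(x, y, psi, xt, yt, psit):
    # constraint (10): both front corners within the half track width
    e_lf, e_rf = corner_errors(x, y, psi, xt, yt, psit)
    return abs(e_lf) <= T and abs(e_rf) <= T

print(inside_track(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))  # on the center line: True
print(inside_track(0.0, 0.5, 0.0, 0.0, 0.0, 0.0))  # 0.5 m lateral offset: False
```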
V. TERMINAL SET COMPUTATION

The main difficulty in designing a safety filter for the considered racing application is the construction of the positively invariant set, S_f, for the nonlinear vehicle system as described in Assumption 1. A method for computing polyhedral terminal sets for autonomous driving is presented in [20], but the required simplifying assumptions in the kinematic model used are not suitable for a vehicle performing aggressive maneuvers. Approaches to terminal set design for more general nonlinear systems can be found in [21], [22], [23], where the common idea is to design a set based on a linearized system while using techniques to compensate for linearization errors such that set invariance still holds for the nonlinear system. We take a similar approach that enforces a required Lyapunov dissipation for a range of steady-states to compute the terminal safe set.

We first introduce a transformation into a track-relative coordinate frame that allows computing steady-states of the nonlinear vehicle model parameterized by the road curvature. Based on established techniques for terminal set design, we then propose to compute a linear control law capable of stabilizing the nonlinear system in a neighborhood around a specific steady-state. We consider a grid of parameter values for the linearized system and compute a positively invariant set for track segments of constant curvature. A-posteriori verification is then performed to ensure invariance holds for the nonlinear system across the full parameter range.

A. Track-Relative Coordinate Transformation and Terminal Steady-States
For the safety certification problem presented in Section III, the terminal set must contain states that are considered safe for the desired system. In a racing context, having the vehicle positioned on the center-line and oriented forwards is a safe position, provided the vehicle is able to follow the center-line closely under some control law. In order to more easily analyze the system with respect to the center-line, the global state is transformed into the track-relative state x_r = [e_lat, µ, v_x, v_y, r], similar to that used in [24]. Here, e_lat and µ are the lateral error and orientation error as described in (9), and v_x, v_y, and r remain unchanged from (6). The dynamics of e_lat and µ are described by

    ė_lat = v_x sin(µ) + v_y cos(µ),
    µ̇ = r − c (v_x cos(µ) − v_y sin(µ)) / (1 − c e_lat),   (11)

which are parameterized by the curvature of the track, c, at a given point on the center-line. We use the same dynamics for v_x, v_y, r as in (6) to describe ẋ_r, then discretize to obtain

    x_r(k+1, c) = f_r(x_r(k, c), u(k)),  ∀k ∈ I_{≥0},   (12)

with f_r : R^{n_r} × R^m → R^{n_r}. Constraints keeping the vehicle within track boundaries, |e_lat| ≤ t − w/2, and oriented forwards, |µ| ≤ π/2, can now be written in polytopic form as X_r := {x_r ∈ R^{n_r} | H x_r ≤ h}, where H ∈ R^{n_h×n_r}.

The goal is to find a terminal control law for the system (12) that can stabilize the vehicle around the track center-line, relating to e_lat = 0 and a constant velocity v_x = v̄_x. Since the track-relative dynamics are parameterized by c, different steady-state points (x_r^e(c), u^e(c)) exist depending on the current track curvature. The steady-state and corresponding input at a given curvature can be computed by solving (12) for a state and input pairing such that x_r^e(c) = f_r(x_r^e(c), u^e(c)), resulting in

    x_r^e(c) = [0, µ^e, v̄_x, v_y^e, r^e]^T,   u^e(c) = [δ^e, τ^e]^T.   (13)
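The error dynamics (11) can be checked in isolation: on the center line with e_lat = 0, µ = 0, v_y = 0 and yaw rate r = c·v_x, both error derivatives vanish, i.e., the vehicle follows a constant-curvature center line. Note that the full steady-state (13) additionally requires force balance in the v_x, v_y, r dynamics, which this kinematic check omits; the curvature and speed values below are illustrative.

```python
import math

# Sketch of the track-relative error dynamics (11), parameterized by the
# local track curvature c.
def error_dynamics(e_lat, mu, vx, vy, r, c):
    e_lat_dot = vx * math.sin(mu) + vy * math.cos(mu)
    mu_dot = r - c * (vx * math.cos(mu) - vy * math.sin(mu)) / (1 - c * e_lat)
    return e_lat_dot, mu_dot

# kinematic steady-state check: center line, zero heading error, r = c * vx
c, vx = 2.0, 1.5  # curvature 2 1/m, longitudinal speed 1.5 m/s (illustrative)
d_elat, d_mu = error_dynamics(0.0, 0.0, vx, 0.0, c * vx, c)
print(d_elat, d_mu)  # both derivatives vanish at this steady state
```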
While direct use of the steady-state (13) as a terminal constraint satisfies the invariance property, the resulting terminal constraint (3f) would become rather restrictive, leading to conservative behavior of the safety filter. To increase the feasible set of (3), and thereby the safe set of vehicle states, we propose in the following a design procedure that enlarges the terminal steady-state constraint to an invariant set.

B. Terminal Set & Control Law Synthesis
To design a terminal set for the system (6), we use a linearization around the previously introduced equilibrium points (13) to obtain a stabilizing state feedback controller. This allows us to derive a positively invariant set from a Lyapunov function for the corresponding closed-loop system.

We begin by linearizing (12) around a specific steady-state and curvature (13), resulting in

    x̄_r(k+1, c) = A(c) x̄_r(k, c) + B(c) ū(k, c),   (14)

where A(c) and B(c) are the linearization matrices evaluated at a steady-state pair (x_r^e(c), u^e(c)). The notation x̄_r(k, c) = x_r(k, c) − x_r^e(c) indicates the deviation of the state x_r(k, c) from the steady-state x_r^e(c) for a given curvature, and similarly for ū(k, c). For the local stabilizing control law, we choose a constant linear controller of the form

    κ_f(k, c) = K x̄_r(k, c),   (15)

where K ∈ R^{m×n_r}.

An ellipsoidal set is chosen for the terminal set as

    S_f(c) := { x̄_r(k, c) | x̄_r(k, c)^T P x̄_r(k, c) ≤ 1 } ⊆ X_r,   (16)

which is a sublevel set of a quadratic Lyapunov function V_f(x̄_r(k, c)) = x̄_r(k, c)^T P x̄_r(k, c), contained within the state constraints X_r. The matrix P ∈ R^{n_r×n_r} can be obtained by solving the discrete-time Lyapunov equation for the closed-loop system dynamics matrix A_cl(c) = A(c) + B(c)K, with a pre-specified dissipation rate Q_dis:

    A_cl(c)^T P A_cl(c) − P ≤ −Q_dis.   (17)

The set (16) is then guaranteed to be positively invariant for the system (14) at a given curvature when subject to the control law (15). The dissipation Q_dis provides the ability to compensate for linearization errors when stabilizing the original nonlinear system.
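As a sketch of the construction in (16)-(17), the following computes P for a generic Schur-stable 2×2 closed-loop matrix (illustrative numbers, not the linearized vehicle) via the convergent series solution P = Σ_k (A_cl^T)^k Q_dis A_cl^k of the discrete-time Lyapunov equation, and then checks the decrease condition (17), which holds with equality for this choice of P.

```python
# Pure-Python 2x2 Lyapunov equation sketch for condition (17).
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_t(A):
    return [list(col) for col in zip(*A)]

def solve_dlyap_2x2(A_cl, Q_dis, iters=200):
    """Series solution of A_cl^T P A_cl - P = -Q_dis for a stable A_cl."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    Ak = [[1.0, 0.0], [0.0, 1.0]]  # A_cl^k, starting at the identity
    for _ in range(iters):
        term = mat_mul(mat_t(Ak), mat_mul(Q_dis, Ak))
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
        Ak = mat_mul(A_cl, Ak)
    return P

A_cl = [[0.9, 0.2], [0.0, 0.7]]   # assumed Schur-stable closed loop
Q_dis = [[1.0, 0.0], [0.0, 1.0]]
P = solve_dlyap_2x2(A_cl, Q_dis)

# Check (17): the residual A_cl^T P A_cl - P + Q_dis should be (numerically) zero
R = mat_mul(mat_t(A_cl), mat_mul(P, A_cl))
residual = [[R[i][j] - P[i][j] + Q_dis[i][j] for j in range(2)] for i in range(2)]
print(max(abs(residual[i][j]) for i in range(2) for j in range(2)))
```

Sublevel sets of the resulting V_f(x̄) = x̄^T P x̄ are then invariant ellipsoids of the form (16) for the linear closed loop.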
This dissipation value is chosen as Q_dis = Q + K^T R K, where Q, R are cost matrices that can be designed to bound the linearization errors by x̄_r(k, c)^T Q x̄_r(k, c) + ū(k, c)^T R ū(k, c).

The curvature values of a track with both left and right turns fall into the range c ∈ [−c_max, c_max], where c_max is the largest curvature value on the track. We therefore want a single control law that stabilizes the system at any curvature within the given range. This is done by first introducing a set of n_c equidistant curvature values in [−c_max, c_max], and computing the corresponding equilibrium states, inputs, and linearization matrices for each: {x_{r,i}^e, u_i^e, A_i, B_i}, ∀i ∈ I_[1,n_c]. We then impose the stability condition from (17) at each steady-state for the same control matrix K, computing the control law and resulting invariant set with a semidefinite program (similarly used in [23]):

    min_{E,Y}  −log det E                                              (18a)
    s.t. ∀i ∈ I_[1,n_c]:
         E ⪰ 0,                                                        (18b)
         [ ([h]_j − [H]_j x_{r,i}^e)²   [H]_j E
           E [H]_j^T                    E       ] ⪰ 0,  ∀j ∈ I_[1,n_h], (18c)
         [ ([g]_l − [G]_l u_i^e)²   [G]_l Y
           Y^T [G]_l^T              E       ] ⪰ 0,  ∀l ∈ I_[1,n_g],     (18d)
         [ E              ⋆   ⋆   ⋆
           A_i E + B_i Y  E   ⋆   ⋆
           Q^{1/2} E      0   I   ⋆
           R^{1/2} Y      0   0   I ] ⪰ 0,                             (18e)

where E := P^{−1}, and Y := KE. The solution to (18) allows us to extract a maximal-volume ellipsoidal set (16) that is invariant for the closed-loop system A_cl(c) at each of the n_c gridded curvature values. The matrix inequalities (18c) and (18d) impose the state and input constraints for each equilibrium point.
The constraint (18e) can be derived from the Lyapunov decrease condition (17) via Schur complements; the matrix is symmetric, with ⋆ representing the corresponding transposed terms.

Since the resulting set is by design invariant only for the linearized system at the chosen curvature values, we must further verify that invariance holds for the nonlinear system across the continuous range of curvatures. This is done via an additional optimization problem that searches the set for any state and curvature pairing that leads to an invariance violation for the nonlinear system under the computed terminal control law:

    max_{x̄_r, c}  x̄_r(k+1, c)^T P x̄_r(k+1, c)              (19a)
    s.t.  x̄_r(k, c)^T P x̄_r(k, c) ≤ 1,                      (19b)
          x̄_r(k+1, c) = f_r(x̄_r(k, c), κ_f(k, c)),          (19c)
          c ∈ [−c_max, c_max].                               (19d)

If the optimal objective value (19a) is less than 1, then S_f(c) is verified as invariant for the nonlinear system; otherwise, the problem has found a state for which the set is not invariant under the nonlinear dynamics. In this case, the set can be incrementally scaled down until no violating points are found, with the limit reaching the vehicle steady-state as a feasible solution.

Note that the invariance guarantees of the proposed terminal set are valid for constant curvatures. Since we consider a track made up of connecting constant-curvature segments, the theoretical invariance property therefore holds on each individual segment. However, the guarantees do not strictly hold for the instantaneous change of curvature between segments, due to the resulting shift in the steady-state set point. Since the linearization-based control law and invariant set design inherently introduce some conservatism, we observe in practice that changing set points can still be efficiently compensated.
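In place of solving (19) to global optimality, invariance can also be probed by sampling, mirroring the randomized verification used in Section VI. The sketch below does this for a toy scalar nonlinear system; the dynamics, terminal gain K, and Lyapunov weight P are all illustrative assumptions, not the vehicle quantities.

```python
import random

# Sampling-based stand-in for the a-posteriori verification (19): draw states
# inside the candidate set x^T P x <= 1 and curvatures in the allowed range,
# propagate one step under the terminal law, and record the worst set value.
def f_nl(x, u, c):
    return 0.9 * x + 0.5 * u + 0.05 * c * x ** 3  # curvature-dependent nonlinearity

K, P = -0.4, 4.0  # assumed terminal gain and Lyapunov weight (scalar case)

random.seed(0)
worst = 0.0
for _ in range(1000):
    x = random.uniform(-1.0, 1.0) / P ** 0.5      # sample inside x*P*x <= 1
    c = random.uniform(-1.0, 1.0)                 # curvature range [-c_max, c_max]
    xp = f_nl(x, K * x, c)
    worst = max(worst, xp * P * xp)
print(worst)  # staying below 1 means invariance is not falsified on the samples
```

Sampling only falsifies; as in the paper, the optimization (19) (or many restarts of it) is what provides the actual verification, with the set scaled down if a violating point is found.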
We therefore do not explicitly account for this change in curvature, and consider invariance on the individual track segments as practically adequate for the terminal safe set.

The curvature value used for the terminal set when solving Problem (3) is taken as the curvature a certain distance ahead of the vehicle along the track. This distance is heuristically chosen as a function of the current desired torque input, u_{d,τ}(k), and the time horizon of the problem, t_N = N · T_s, to generate a reasonable look-ahead distance for the terminal set.

VI. EXPERIMENTS

To demonstrate the performance of the proposed safety filter, the scheme is implemented with a small remote-controlled vehicle on a track, where the vehicle must stay inside track boundaries. We first present an experiment showing the safety filter in a driver-assistance scenario, where the desired inputs are provided directly by a human driver. This is followed by an example of a learning-based control application using imitation learning, where a neural network policy is safely learned and deployed on the vehicle. A video of the experiments performed can be found at: https://youtu.be/Aaly_IwQmfc.

A. Problem Implementation
To ensure feasibility of the MPC problem (3), the track-width constraint (10) and the terminal set constraint (16) are implemented as soft constraints. The problem is solved online using acados [25] with a real-time iteration SQP scheme, a horizon length of N = 60, and a sampling frequency of 80 Hz. The terminal set computation in (18) is solved offline using MOSEK [26], with n_c = 21 equilibrium points spanning the curvature range [−c_max, c_max]. The verification problem in (19) is solved 1000 times from randomly selected initial conditions, and the resulting objective value never exceeds 1.

B. Experimental Platform
A Kyosho Mini-Z 1:28-scale remote-controlled vehicle is used on a 0.80 m constant-width track as the test platform for all experiments. A VICON motion capture system provides vehicle position and orientation information, which is used by an Extended Kalman Filter to produce a complete state estimate. The safe control inputs are sent via radio controller to the vehicle. The closed-loop system is implemented using ROS (Robot Operating System) running on a Lenovo ThinkPad P1 with Ubuntu 18.04, an Intel Core i7-9750H processor, and 32 GB RAM.
C. Manual Driver Assistance
By combining the safety certification with human driver inputs, a driver-assistance system is created that provides necessary intervention should the driver make a mistake that would endanger the vehicle. Since the safety certification is designed to be minimally invasive, it gives the driver free control of the vehicle as long as their actions remain safe, only intervening when required.

In this experiment, the manual driver inputs are provided by a physical joystick. Figure 5 shows the vehicle trajectory and corresponding inputs from a single lap driven with the safety certification active. In the vehicle trajectory plot, the color map shows the L2-norm of the difference between the desired and safe control input vectors, indicating the magnitude of modification by the safety filter. The input comparison plots show that the safety filter commands initially closely correspond to the driver commands up until the dashed line, indicating that the driver commands are being certified as safe and applied to the vehicle. After this, the safety certification begins to intervene in both steering and throttle inputs as the driver purposefully fails to steer around corners, or swerves the car toward the wall. The plot of the trajectory demonstrates how the safety certification is able to keep the vehicle within track boundaries at all times, while still managing to track the desired inputs whenever possible.

Fig. 5. Vehicle trajectory (top) and control inputs (middle, bottom) for a human providing the desired control signal by joystick. The safety filter intervention is shown via heat map on the trajectory. The orange dot and arrow indicate the starting point and travel direction; the dashed blue line indicates the transition from generally safe driver inputs to unsafe inputs.
D. Imitation Learning
Imitation learning is a technique for learning a policy that replicates the actions of a demonstration from another agent, typically an expert policy for the given task. We implement an iterative algorithm called DAgger (Dataset Aggregation) [27] to learn a stationary deterministic policy using imitation learning. In DAgger, a policy is first initialized using supervised learning from an expert demonstration, and is then deployed directly on the task. The expert labels all states visited by the learned policy with the optimal action, which is then added to the dataset for the policy to retrain on. This process is repeated iteratively so that the learned policy is able to improve from its previous mistakes.

Since DAgger relies on rolling out the learner policy during training, it can be combined with the proposed safety filter to provide a safe training environment for learning a racing controller on the physical vehicle. A feedforward neural network with 3 hidden layers, 64 neurons per layer, and ReLU activation functions is used as the policy architecture, which outputs a drivetrain and a steering command. The input to the network is chosen as the vehicle state in track-relative coordinates, along with 30 curvature values over the next 1.5 meters of track, i.e. x_NN = [e_lat, μ, v_x, v_y, r, c_1, ..., c_30]. Training the network consists of supervised learning to minimize the L2-norm of the difference between expert and network commands. The expert policy used is a Model Predictive Contouring Controller (MPCC), presented in [19], which maximizes track progress while staying inside track boundaries, and has proved successful in other racing applications [28]. Imitating a finite-horizon optimal policy like MPCC can be beneficial, as the states visited by the network controller in each iteration can be labeled offline using an MPCC with a longer horizon that could not be used in practice due to solve-time requirements.
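The DAgger procedure described above can be summarized as follows. This is a schematic sketch: the `expert`, `learner`, and `rollout` interfaces are hypothetical stand-ins for the MPCC expert, the neural network policy, and closed-loop data collection under the safety filter.

```python
def dagger(expert, learner, rollout, n_episodes, n_steps):
    """Dataset Aggregation (DAgger): initialize from an expert
    demonstration, then repeatedly roll out the learner, relabel the
    visited states with expert actions, and retrain on the aggregate."""
    # Initial supervised learning on an expert demonstration.
    dataset = [(x, expert(x)) for x in rollout(expert, n_steps)]
    learner.fit(dataset)
    for _ in range(n_episodes):
        # Roll out the current learner policy (on the real vehicle this
        # happens under the safety filter).
        visited = rollout(learner, n_steps)
        # The expert labels every visited state with its action; this can
        # be done offline, e.g. with a longer-horizon MPCC.
        dataset += [(x, expert(x)) for x in visited]
        learner.fit(dataset)
    return learner
```

The aggregated dataset grows with every episode, so later retraining steps see both the expert demonstration and the states the learner actually visits, which is what lets the policy correct its own mistakes.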
The resulting neural network then imitates a high-performance policy that otherwise could not be achieved by the expert in real time.

DAgger is set up on the experimental platform alongside the safety filter to allow for completely automated safe training. Safety is provided both during data collection when the neural network policy is operating, and during transition periods as the vehicle stops to retrain the policy. Figure 6 shows two trajectory plots of DAgger episodes while the neural network policy is active alongside the safety filter. The plot in 6(a) shows the trajectory over several laps from the first DAgger episode, where multiple instances of necessary safety filter intervention can be seen, as indicated by the color of the safety deviation norm. In the early stages of training, the neural network policy has only been trained on the initial expert dataset, so it struggles to bring the vehicle onto the optimal racing line without trying to cut corners. The safety filter must then deviate from applying the desired inputs to computing safe inputs that keep the vehicle in the track. The plot in 6(b) shows the trajectory from the fourth episode, which is much more consistent than the initial policy, with almost no major safety filter interventions. The trajectory is aligned more closely with the optimal trajectory from MPCC, demonstrating an improved policy over previous iterations.

VII. CONCLUSIONS

In this work, we have presented a predictive safety filter that is able to render a closed-loop vehicle system safe when subject to any unsafe control signal. A method for computing and verifying an invariant terminal set for the nonlinear vehicle system on constant-curvature track segments is presented, providing a safe operating domain that does not overly restrict the desired policy. The experiments illustrate two applications where the safety filter is able to ensure safety of the vehicle during dynamic high-speed maneuvers.
[Figure 6: panels "NN Trajectory - 1st Iteration" (top) and "NN Trajectory - 4th Iteration" (bottom), each overlaid with the MPCC converged trajectory; vertical axis y [m].]
Fig. 6. Vehicle trajectories shown during the first (top) and fourth (bottom) episodes of DAgger; safety filter intervention is shown via heat map, and the converged expert MPCC trajectory is shown in red. Initial location and direction of travel are shown in orange.

REFERENCES

[1] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, "Control barrier functions: Theory and applications," in European Control Conference, 2019.
[2] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, "Learning-based model predictive control: Toward safe learning in control," Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 269–296, 2020.
[3] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, "A general safety framework for learning-based control in uncertain robotic systems," IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2019.
[4] D. Seto, B. Krogh, L. Sha, and A. Chutinan, "The simplex architecture for safe on-line control system upgrades," in Proceedings of the American Control Conference, vol. 6, 1998, pp. 3504–3508.
[5] S. Prajna and A. Jadbabaie, "Safety verification of hybrid systems using barrier certificates," in Hybrid Systems: Computation and Control, R. Alur and G. J. Pappas, Eds. Springer, 2004, pp. 477–492.
[6] P. Wieland and F. Allgöwer, "Constructive safety using control barrier functions," IFAC Proceedings, vol. 40, no. 12, pp. 462–467, 2007.
[7] A. Taylor, A. Singletary, Y. Yue, and A. Ames, "Learning for safety-critical control with control barrier functions," in Proceedings of the 2nd Conference on Learning for Dynamics and Control, vol. 120. PMLR, 2020, pp. 708–717.
[8] J. Achiam, D. Held, A. Tamar, and P. Abbeel, "Constrained policy optimization," in Proceedings of the 34th International Conference on Machine Learning, vol. 70. PMLR, 2017, pp. 22–31.
[9] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, "Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes," 2016, pp. 4661–4666.
[10] J. H. Gillula and C. J. Tomlin, "Guaranteed safe online learning of a bounded system," in IEEE Conference on Intelligent Robots and Systems, 2011, pp. 2979–2984.
[11] K. P. Wabersich and M. N. Zeilinger, "Scalable synthesis of safety certificates from data with application to learning-based control," in European Control Conference, 2018, pp. 1691–1697.
[12] L. Wang, D. Han, and M. Egerstedt, "Permissive barrier certificates for safe stabilization using sum-of-squares," in Proceedings of the American Control Conference, 2018, pp. 585–590.
[13] A. Chakrabarty, C. Danielson, S. Di Cairano, and A. Raghunathan, "Active learning for estimating reachable sets for systems with unknown dynamics," IEEE Transactions on Cybernetics, pp. 1–12, 2020.
[14] K. P. Wabersich and M. N. Zeilinger, "Linear model predictive safety certification for learning-based control," in IEEE Conference on Decision and Control, 2018, pp. 7130–7135.
[15] K. P. Wabersich, L. Hewing, A. Carron, and M. N. Zeilinger, "Probabilistic model predictive safety certification for learning-based control," IEEE Transactions on Automatic Control, 2021.
[16] K. P. Wabersich and M. N. Zeilinger, "A predictive safety filter for learning-based control of constrained nonlinear dynamical systems," Automatica, 2021 [accepted]; arXiv:1812.05506.
[17] H. Pacejka, Tyre and Vehicle Dynamics, ser. Automotive Engineering Series. Butterworth-Heinemann, 2002.
[18] R. Rajamani, Vehicle Dynamics and Control, ser. Mechanical Engineering Series. Springer US, 2011.
[19] A. Liniger, A. Domahidi, and M. Morari, "Optimization-based autonomous racing of 1:43 scale RC cars," Optimal Control Applications and Methods, vol. 36, no. 5, pp. 628–647, Jul 2014.
[20] P. F. Lima, J. Mårtensson, and B. Wahlberg, "Stability conditions for linear time-varying model predictive control in autonomous driving," in IEEE Conference on Decision and Control, 2017, pp. 2775–2782.
[21] H. Chen and F. Allgöwer, "A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability," Automatica, vol. 34, no. 10, pp. 1205–1217, 1998.
[22] A. Carron and M. Zeilinger, "Model predictive coverage control," IFAC World Congress, 2020.
[23] C. Conte, C. N. Jones, M. Morari, and M. N. Zeilinger, "Distributed synthesis and stability of cooperative distributed model predictive control for linear systems," Automatica, vol. 69, pp. 117–125, 2016.
[24] J. L. Vazquez, M. Brühlmeier, A. Liniger, A. Rupenyan, and J. Lygeros, "Optimization-based hierarchical motion planning for autonomous racing," in IEEE International Conference on Intelligent Robots and Systems, 2020.
[25] R. Verschueren et al., "acados: a modular open-source framework for fast embedded optimal control," 2019.
[26] M. ApS, The MOSEK optimization toolbox for MATLAB, 2019.
[27] S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, vol. 15, 2011, pp. 627–635.
[28] A. Kabzan, Liniger, J. Lygeros, and R. Siegwart et al., "AMZ driverless: The full autonomous racing system,"