A predictive safety filter for learning-based racing control
Ben Tearle, Kim P. Wabersich, Andrea Carron, Melanie N. Zeilinger
Abstract — The growing need for high-performance controllers in safety-critical applications like autonomous driving has been motivating the development of formal safety verification techniques. In this paper, we design and implement a predictive safety filter that is able to maintain vehicle safety with respect to track boundaries when paired alongside any potentially unsafe control signal, such as those found in learning-based methods. A model predictive control (MPC) framework is used to create a minimally invasive algorithm that certifies whether a desired control input is safe and can be applied to the vehicle, or that provides an alternate input to keep the vehicle in bounds. To this end, we provide a principled procedure to compute a safe and invariant set for nonlinear dynamic bicycle models using efficient convex approximation techniques. To fully support aggressive racing performance without conservative safety interventions, the safe set is extended in real-time through predictive control backup trajectories. Applications for assisted manual driving and deep imitation learning on a miniature remote-controlled vehicle demonstrate the safety filter's ability to ensure vehicle safety during aggressive maneuvers.
I. INTRODUCTION

The development of robotic systems has led to an ever increasing number of applications that go beyond the isolated task spaces found in legacy industries such as automotive or electronics production. More recent applications encompass dynamic and learning-based interactions with humans in complex task spaces, as is the case with autonomous driving, and therefore require advanced safety mechanisms [1], [2], to prevent potentially dangerous situations. Maintaining safety at the physical limits for highly dynamic systems often requires a task-specific trade-off between performance and conservatism to ensure safe system operation. As a result, there is an increasing interest in developing theoretically sound safety frameworks with a reduced degree of conservatism that enable safety in a modular fashion, independent of a task-specific objective.

While some of these methods have been demonstrated in practice, the considered applications are often small-scale or nearly linear control systems that are only operated within conservative regions of their state space [3]. Motivated by the strict safety requirements in autonomous driving, we consider the problem of safe autonomous and assisted racing as a benchmark application for deriving a practically relevant safety mechanism. Racing requires the utilization of a vehicle's full nonlinear dynamics, providing a challenging domain in which safety must be guaranteed.

To provide safety for arbitrary control policies, we rely on a modular safety framework as shown in Figure 1. This approach allows the framework to be used in conjunction with any potentially unsafe control signal, such as those from learning-based controllers. The basic idea is to design a safety filter, which analyzes the desired control signal and decides in real-time whether it can be applied to the system, or if it has to be modified to ensure safety. For the racing application considered in this work, this consists of verifying whether the vehicle is able to stay within track boundaries in the future given the current steering and drivetrain commands. This is achieved by finding safe backup control sequences that lead the vehicle towards a set of known safe states, where the first input of the sequence is as close as possible to the desired control signal. This approach allows for verifying the safety of the desired input while at the same time providing an alternative safe input otherwise.

Fig. 1. Concept of predictive safety filter: based on the current state x(k), an arbitrary control algorithm provides a desired control input u_d(k) ∈ R^m, which is processed by the safety filter, u(k) = π_S(x(k), u_d(k)), and applied to the real system.

[Footnote] The authors are with the Institute for Dynamical Systems and Control, ETH Zurich, ZH-8092, Switzerland: [email protected], [wkim|carrona|mzeilinger]@ethz.ch. This work was supported by the Swiss National Science Foundation under grant no. PP00P2 157601/1. The research of Andrea Carron was supported by the Swiss National Centre of Competence in Research NCCR Digital Fabrication. The work of Ben Tearle and Andrea Carron was supported by the ETH Career Seed Grant 19-18-2.

A. Related Work
The concept of using a safety controller in a closed-loop system was first introduced in [4], where the system can switch between an experimental controller and a reliable safety controller in the case of software faults. Developments on the theoretic use of barrier certificates for verifying system safety were later proposed in [5], which was further extended to the idea of control barrier functions (CBFs) [6]. More recent work has revisited the notion of using CBFs for safety-critical control of robotic systems, see [1] for an overview. This approach has been combined with a machine-learning framework in [7] to safely learn model discrepancies of a Segway robot while limiting the operational space during training. Although these methods build on strong theoretical results from control Lyapunov function theory, they rely on the ability to explicitly model a system's safety requirements as a CBF, which is not generally trivial to design.

Given the inherent lack of safety guarantees in traditional machine learning methods, the reinforcement learning (RL) field has become increasingly interested in enforcing constraints for training black-box control policies. A general-purpose policy search algorithm for constrained reinforcement learning is introduced in [8], which approximately enforces safety constraints at every policy update. Using a learning-based system model, [9] proposes a method for determining a safe set of system states under a specific learning-based policy. Although these methods allow for approximately safe policy training, they are limited in that they remain tied to task-specific reinforcement learning algorithms, whereas the safety filter presented in this work is able to function independently of a specific task and thereby enables modular safety.

An approach for providing system safety based on confining a system to a pre-computed set of safe states is introduced in [10].
This uses reachability-based techniques to find a safe set for a given system, together with a corresponding control policy that provides invariance within the safe set. The idea is expanded in [3] to perform online updates of the safe set using a non-parametric system dynamics estimate. These approaches suffer from limited scalability in the required offline safe-set computation. Recent work attempts to address this by approximating the reachable sets using data-based methods [11], sum-of-squares programming [12], and active learning [13].

Closely related to these ideas, a method for establishing safety using an MPC-based control law is derived in [14]. A continuously updated control policy is computed online to find backup trajectories towards safe states, resulting in an implicit representation of the safe set and corresponding safe control law via the MPC optimization problem. This method is extended to consider nonlinear stochastic systems formulated with chance constraints or parametric uncertainties in [15], [16], and provides the foundation for the task of autonomous racing considered in this work.

B. Contributions
The main contribution of this paper is the design and implementation of a permissive safety filter for autonomous racing that can be combined with any desired control signal, ensuring closed-loop vehicle safety with respect to a track for a diverse range of applications. To this end, we use the concept of predictive safety filters as presented in [14], [16]. To achieve a minimally invasive safety filter supporting aggressive maneuvers, we use a nonlinear dynamic bicycle model with a Pacejka model of the tire forces [17] to simultaneously predict and optimize accurate backup control trajectories. In addition to a high-fidelity system model, the safety filter performance can be improved by using either a longer planning horizon or a larger terminal set. As the planning horizon is typically limited by memory and processing requirements, we derive an iterative optimization-based invariant set computation using convex approximations to obtain an enlarged terminal safe set for the nonlinear dynamic bicycle model, which is valid over a range of constant road curvatures.

The physical miniature racing application demonstrates the proposed safety filter's performance with both human-in-the-loop racing and deep imitation learning. This work presents, to the best of our knowledge, the first application of a predictive safety filter to a complex and highly dynamical nonlinear system demonstrated in experimental results.

II. PROBLEM FORMULATION
Notation: The set of integers in the interval [a, b] ⊂ R is denoted by I_[a,b], and the set of integers in the interval [a, ∞) ⊂ R is I_{≥a}. The i-th row of a matrix M ∈ R^{n×m} is denoted by [M]_i.

The goal of this work is to design a safety filter that certifies whether or not a desired control input, u_d(k), is safe for a vehicle system, and provides an alternative safe control input at any time. We consider a discrete-time nonlinear system of the form

    x(k+1) = f(x(k), u(k)),  ∀k ∈ I_{≥0},   (1)

subject to state and input constraints, x(k) ∈ X, u(k) ∈ U, where the dynamics f : X × U → R^n. System safety is defined with respect to ensuring constraint satisfaction at all times, as follows.

Definition 1.
A system (1) is considered safe if

    x(k) ∈ X,  u(k) ∈ U,  ∀k ∈ I_{≥0}.   (2)

In order to guarantee this notion of safety for a given u_d(k), a safety control policy, π_S(x(k), u_d(k)), is provided that guarantees constraint satisfaction for all future timesteps if applied to the vehicle. If a safety policy exists with u_d(k) as the current input of the policy, then u_d(k) can be certified as safe and applied to the system. More formally:

Definition 2.
A desired input u_d(k̄) is certified as safe for system (1), at a given timestep k̄, if the safety control policy yields π_S(x(k̄), u_d(k̄)) = u_d(k̄), and application of u(k) = π_S(x(k), u_d(k)) to the system results in safety according to Definition 1 for all k ≥ k̄.

Using a safety policy in accordance with Definition 2 provides a safety filter that can be brought into a closed-loop system as shown in Figure 1. Since the safety policy can be updated at each time step to consider the incoming desired input, this allows the desired control signal to have control authority over the system whenever possible, i.e., π_S(x(k), u_d(k)) = u_d(k). However, if the desired control signal would put the system at risk of violating its constraints in the future, then alternate inputs, π_S(x(k), u_d(k)) ≠ u_d(k), must be available that ensure safety for the system.

The next section discusses an approach to compute π_S online using an MPC framework that minimizes interference while still ensuring safety for the system.

Fig. 2. Diagram (a) shows a possible vehicle trajectory from a safe desired input u_d(k). Diagram (b) shows the resulting vehicle trajectory from an unsafe desired input, where the vehicle ends up leaving the track. An alternate safe input u*(k) applied by the safety filter is shown along with its trajectory.

III. PREDICTIVE SAFETY FILTER

We define an implicit safety policy through a receding-horizon optimal control problem, referred to as the predictive safety filter problem [14], which allows for an efficient online computation of the desired safety filter π_S:

    min_{x_{i|k}, u_{i|k}}  J(u_{i|k}, u_d(k))            (3a)
    s.t. ∀i ∈ I_[0,N−1]:
         x_{0|k} = x(k),                                  (3b)
         x_{i+1|k} = f(x_{i|k}, u_{i|k}),                 (3c)
         x_{i|k} ∈ X,                                     (3d)
         u_{i|k} ∈ U,                                     (3e)
         x_{N|k} ∈ S_f.                                   (3f)
Problem (3) computes a discrete-time state and input backup trajectory, {x*_{i|k}, u*_{i|k}}, of length N, where x_{i|k} is the state predicted i timesteps ahead, computed at time k, initialized at x_{0|k} = x(k), and similarly for u_{i|k}. The system is predicted along the horizon according to the dynamics (3c), subject to an initial condition (3b), state and input constraints (3d) and (3e), and a terminal constraint (3f). Different from classical MPC, the objective function in (3a) is chosen to minimize the difference between the desired control input and the first input of the solution trajectory, as

    J(u_{i|k}, u_d(k)) = ‖u_d(k) − u_{0|k}‖.   (4)

The safety policy is then defined by π_S(x(k), u_d(k)) = u*_{0|k}.

The cost function in (4) can be modified to include secondary objectives beyond tracking the desired control signal. For the racing application, we include a regularization term that penalizes the rate of change of the inputs in order to encourage a smoother control trajectory:

    J(u_{i|k}, u_d(k)) = ‖u_d(k) − u_{0|k}‖_W + Σ_{i=0}^{N−1} ‖Δu_{i|k}‖_{R_S},   (5)

where Δu_{0|k} := u_{0|k} − u_{0|k−1}, Δu_{i|k} := u_{i|k} − u_{i−1|k} for i = 1, …, N−1, and W, R_S ∈ R^{m×m} are cost matrices for the input deviation and input rate, respectively. This helps to reduce rapid fluctuations between the desired input and the safety filter's input, which can occur with the system at the boundary of the state constraints in practice. To avoid unnecessary input deviations from a desired input that can be certified as safe, the weights are chosen with W much larger than R_S to ensure priority remains on tracking the desired input.

Assumption 1 (Invariant terminal set). There exists a control law κ_f : S_f → U, and a corresponding positively invariant set S_f ⊆ X, such that for all x ∈ S_f, it holds that κ_f(x) ∈ U and f(x, κ_f(x)) ∈ S_f.
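To make the structure of (3) and the cost (4) concrete, the following is a minimal sketch on a toy one-dimensional point mass, not the paper's vehicle model. Instead of solving a full nonlinear program, the backup trajectory after a candidate first input is generated here by a simple braking policy, and the first input is chosen from a grid to be as close as possible to the desired input u_d. All numbers and the grid resolution are illustrative assumptions.

```python
# Toy predictive safety filter sketch. State x = (p, v) with dynamics
# p(k+1) = p + v*dt, v(k+1) = v + u*dt, state constraint p <= 1 (a "wall"),
# input constraint |u| <= 1. The terminal set stands in for S_f in (3f):
# a near-standstill condition reached by the braking backup policy.
DT, N = 0.05, 40

def step(p, v, u):
    return p + v * DT, v + u * DT

def brake(v):
    # backup control law: brake to a standstill without overshooting zero
    return max(-1.0, min(1.0, -v / DT))

def backup_is_safe(p, v):
    """Simulate the braking backup policy from (p, v); require p <= 1 along
    the horizon and a small terminal set (|v| <= 0.05) at the end."""
    for _ in range(N - 1):
        if p > 1.0:
            return False
        p, v = step(p, v, brake(v))
    return p <= 1.0 and abs(v) <= 0.05

def safety_filter(p, v, u_d):
    """Return the admissible first input closest to u_d, mirroring cost (4)."""
    candidates = sorted((i / 10 for i in range(-10, 11)),
                        key=lambda u: abs(u - u_d))
    for u0 in candidates:
        if backup_is_safe(*step(p, v, u0)):
            return u0
    return -1.0  # no certified input found: fall back to full braking

print(safety_filter(0.5, 0.5, 1.0))   # far from the wall: u_d passes through
print(safety_filter(0.85, 0.5, 1.0))  # heading into the wall: filter intervenes
```

The first call certifies the desired input unchanged (zero objective cost, no intervention); the second returns a braking input because every forward-pushing first input leads the backup trajectory past the wall.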
As in standard MPC theory, Assumption 1 provides recursive feasibility for the safety control policy obtained from problem (3), i.e., if the problem has a feasible solution at timestep k̄, then a feasible solution also exists for all future times k > k̄. This results in constraint satisfaction at all times, meeting the requirements for a safe system put forth in Definition 1.

If we consider the case where u_d(k) is a safe input for the system, there must exist a state and input trajectory, beginning at x(k+1) = f(x(k), u_d(k)), that is feasible along the horizon and ends in S_f. An example is shown in Figure 2(a), where a vehicle is at initial state x(k), and the primary state constraint is to stay inside track limits. Application of the input u_d(k) would bring the vehicle to state x_{1|k}, from where a state and input trajectory exists that keeps the vehicle inside the boundaries before reaching S_f. The input u_d(k) can therefore be certified as safe, and the optimal solution to (3) would be u*_{0|k} = u_d(k). This achieves a minimal objective cost of zero, satisfying the desired behavior of no intervention for a safe u_d(k).

If the desired input is unsafe, then any resulting trajectory beginning at x(k+1) = f(x(k), u_d(k)) must violate the constraints at some point along the horizon. Looking at Figure 2(b), the trajectory following x_{1|k} after applying u_d(k) can be seen to leave the track. In this case, Problem (3) will provide an input, u*_{0|k} ≠ u_d(k), that is able to maintain system safety while being as close as possible to u_d(k). A backup control trajectory that can be taken instead if a safe initial input is applied is shown in the same figure.

IV. VEHICLE DYNAMICS AND CONSTRAINTS

In this section, the model used to describe the vehicle dynamics is presented, followed by the system constraints.

A. System Model
In this work we consider a miniature RC car, which is modeled using a standard dynamic bicycle model formulation [18], [19]. Using a dynamic model, as opposed to the simpler kinematic models considered in previous related work, see, e.g., [1], allows us to consider the nonlinear tire forces, which have a significant impact on vehicle motion during aggressive maneuvers. The state of the model is x = [p_x, p_y, ψ, v_x, v_y, r], with the input u = [δ, τ], where p_x, p_y are the x-y coordinates of the car and ψ is the heading angle in the global coordinate frame; v_x, v_y, and r are the longitudinal velocity, lateral velocity, and yaw rate in the vehicle's body frame. Finally, δ is the steering angle and τ is the drivetrain command. An illustration can be seen in Figure 3.

Fig. 3. Dynamic vehicle model diagram.
The system model can be described by the differential equations

    ẋ = [ v_x cos(ψ) − v_y sin(ψ)
          v_x sin(ψ) + v_y cos(ψ)
          r
          (1/m)(F_x − F_yf sin(δ) + m v_y r)
          (1/m)(F_yr + F_yf cos(δ) − m v_x r)
          (1/I_z)(F_yf l_f cos(δ) − F_yr l_r) ],   (6)

where m is the car mass, I_z is the yaw moment of inertia, and l_f, l_r are the distances between the center of gravity and the front and rear axles, respectively. The lateral tire forces F_yf and F_yr are modeled with a simplified Pacejka tire model,

    α_f = arctan((v_y + l_f r)/v_x) − δ,   α_r = arctan((v_y − l_r r)/v_x),
    F_yf/yr = D_f/r sin(C_f/r arctan(B_f/r α_f/r)),   (7)

where α_f and α_r are the tire slip angles [17]. The longitudinal force is modeled as a single force applied to the center of gravity of the vehicle, and is computed as a combination of the drivetrain command and velocity, F_x = C_1 τ + C_2 τ² + C_3 v_x + C_4 v_x² + C_5 τ v_x, with constant coefficients C_1, …, C_5. The drivetrain command τ can be positive, resulting in forward motion, or negative, resulting in braking.

The continuous-time system in (6) is discretized using Euler forward, obtaining a discrete-time nonlinear system of the form (1).
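The dynamics (6) with the Pacejka forces (7) can be written down directly. The sketch below uses illustrative placeholder parameters, not the identified values of the paper's miniature car, and for simplicity treats the longitudinal force F_x as an input in place of the drivetrain map; the sampling time 0.0125 s corresponds to the 80 Hz rate reported in Section VI.

```python
import math

# Sketch of the dynamic bicycle model (6) with simplified Pacejka tires (7).
# Parameter values are illustrative placeholders (assumed, not identified).
PAR = dict(m=0.041, Iz=27.8e-6, lf=0.029, lr=0.033,
           Bf=2.58, Cf=1.2, Df=0.192, Br=3.38, Cr=1.27, Dr=0.173)

def f_cont(x, u, p=PAR):
    """Continuous-time dynamics for x = [px, py, psi, vx, vy, r] and
    u = [delta, Fx], with Fx standing in for the drivetrain force map."""
    px, py, psi, vx, vy, r = x
    delta, Fx = u
    # tire slip angles and lateral Pacejka forces, eq. (7)
    alpha_f = math.atan2(vy + p["lf"] * r, vx) - delta
    alpha_r = math.atan2(vy - p["lr"] * r, vx)
    Fyf = p["Df"] * math.sin(p["Cf"] * math.atan(p["Bf"] * alpha_f))
    Fyr = p["Dr"] * math.sin(p["Cr"] * math.atan(p["Br"] * alpha_r))
    return [vx * math.cos(psi) - vy * math.sin(psi),
            vx * math.sin(psi) + vy * math.cos(psi),
            r,
            (Fx - Fyf * math.sin(delta) + p["m"] * vy * r) / p["m"],
            (Fyr + Fyf * math.cos(delta) - p["m"] * vx * r) / p["m"],
            (Fyf * p["lf"] * math.cos(delta) - Fyr * p["lr"]) / p["Iz"]]

def f_disc(x, u, Ts=0.0125):
    # Euler-forward discretization, yielding the discrete-time form (1)
    return [xi + Ts * dxi for xi, dxi in zip(x, f_cont(x, u))]
```

As a sanity check, straight driving (zero steering, zero lateral velocity and yaw rate) produces zero slip angles, zero lateral forces, and purely longitudinal motion.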
The system is subject to nonlinear state constraints and polyhedral input constraints of the form

    X := {x ∈ R^n | d(x) ≤ b},   U := {u ∈ R^m | Gu ≤ g},   (8)

where d : R^n → R^{n_b}, and G ∈ R^{n_g×m}. The input constraints consist of bounding the maximum and minimum commands, while the state constraints enforce the safety-critical task of keeping the car within track limits.

To keep the vehicle within the boundaries of the track, we constrain the front two corners of a bounding box around the vehicle, e_lf and e_rf, shown in Figure 4. The lateral error of the vehicle's center of gravity with respect to the track center-line is e_lat, while the yaw error of the vehicle with respect to the track orientation is µ. Given a reference center-line position and orientation, x_t, y_t, ψ_t, these states can be written as

    e_lat(k) = −sin(ψ_t)(x(k) − x_t) + cos(ψ_t)(y(k) − y_t),
    µ(k) = ψ(k) − ψ_t,
    e_lf(k) = e_lat(k) + l_f sin(µ(k)) + (w/2) cos(µ(k)),
    e_rf(k) = e_lat(k) + l_f sin(µ(k)) − (w/2) cos(µ(k)),   (9)

where w is the width of the vehicle. These two corner points of the bounding box can be bounded by half the width of the track, denoted t, as

    |e_lf| ≤ t,   |e_rf| ≤ t.   (10)

Fig. 4. Track-relative error states used to constrain the vehicle.
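The corner-constraint evaluation (9)-(10) is a direct computation; the sketch below uses illustrative geometry parameters (the axle distance, vehicle width, and track half-width are assumptions, not the paper's values).

```python
import math

# Sketch of the track-boundary constraint (9)-(10): bound the two front
# corners of the vehicle's bounding box by the track half-width t.
LF, W, T = 0.029, 0.05, 0.40  # front axle distance, car width, half track width (assumed)

def corner_errors(x, y, psi, xt, yt, psit):
    """Track-relative errors (9) for vehicle pose (x, y, psi) against a
    reference center-line pose (xt, yt, psit)."""
    e_lat = -math.sin(psit) * (x - xt) + math.cos(psit) * (y - yt)
    mu = psi - psit
    e_lf = e_lat + LF * math.sin(mu) + 0.5 * W * math.cos(mu)
    e_rf = e_lat + LF * math.sin(mu) - 0.5 * W * math.cos(mu)
    return e_lf, e_rf

def inside_track(x, y, psi, xt, yt, psit):
    # constraint (10): both front corners within the half track width
    e_lf, e_rf = corner_errors(x, y, psi, xt, yt, psit)
    return abs(e_lf) <= T and abs(e_rf) <= T

print(inside_track(0.0, 0.0, 0.0, 0.0, 0.0, 0.0))  # on the center line: True
print(inside_track(0.0, 0.5, 0.0, 0.0, 0.0, 0.0))  # 0.5 m lateral offset: False
```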
V. TERMINAL SET COMPUTATION

The main difficulty in designing a safety filter for the considered racing application is the construction of the positively invariant set, S_f, for the nonlinear vehicle system as described in Assumption 1. A method for computing polyhedral terminal sets for autonomous driving is presented in [20], but the required simplifying assumptions in the kinematic model used are not suitable for a vehicle performing aggressive maneuvers. Approaches to terminal set design for more general nonlinear systems can be found in [21], [22], [23], where the common idea is to design a set based on a linearized system while using techniques to compensate for linearization errors such that set invariance still holds for the nonlinear system. We take a similar approach that enforces a required Lyapunov dissipation for a range of steady-states to compute the terminal safe set.

We first introduce a transformation into a track-relative coordinate frame that allows computing steady-states of the nonlinear vehicle model parameterized by the road curvature. Based on established techniques for terminal set design, we then propose to compute a linear control law capable of stabilizing the nonlinear system in a neighborhood around a specific steady-state. We consider a grid of parameter values for the linearized system and compute a positively invariant set for track segments of constant curvature. A-posteriori verification is then performed to ensure invariance holds for the nonlinear system across the full parameter range.

A. Track-Relative Coordinate Transformation and Terminal Steady-States
For the safety certification problem presented in Section III, the terminal set must contain states that are considered safe for the desired system. In a racing context, having the vehicle positioned on the center-line and oriented forwards is a safe position, provided the vehicle is able to follow the center-line closely under some control law. In order to more easily analyze the system with respect to the center-line, the global state is transformed into the track-relative state x_r = [e_lat, µ, v_x, v_y, r], similar to that used in [24]. Here, e_lat and µ are the lateral error and orientation error as described in (9), and v_x, v_y, and r remain unchanged from (6). The dynamics of e_lat and µ are described by

    ė_lat = v_x sin(µ) + v_y cos(µ),
    µ̇ = r − c (v_x cos(µ) − v_y sin(µ)) / (1 − c e_lat),   (11)

which are parameterized by the curvature of the track, c, at a given point on the center-line. We use the same dynamics for v_x, v_y, r as in (6) to describe ẋ_r, then discretize to obtain

    x_r(k+1, c) = f_r(x_r(k, c), u(k)),  ∀k ∈ I_{≥0},   (12)

with f_r : R^{n_r} × R^m → R^{n_r}. Constraints keeping the vehicle within track boundaries, |e_lat| ≤ t − w/2, and oriented forwards, |µ| ≤ π/2, can now be written in polytopic form as X_r := {x_r ∈ R^{n_r} | H x_r ≤ h}, where H ∈ R^{n_h×n_r}.

The goal is to find a terminal control law for the system (12) that can stabilize the vehicle around the track center-line, relating to e_lat = 0 and a constant velocity v_x = v̄_x. Since the track-relative dynamics are parameterized by c, different steady-state points (x_r^e(c), u^e(c)) exist depending on the current track curvature. The steady-state and corresponding input at a given curvature can be computed by solving (12) for a state and input pairing such that x_r^e(c) = f_r(x_r^e(c), u^e(c)), resulting in

    x_r^e(c) = [0, µ^e, v̄_x, v_y^e, r^e]^T,   u^e(c) = [δ^e, τ^e]^T.   (13)
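The error dynamics (11) can be checked in isolation: on the center line with e_lat = 0, µ = 0, v_y = 0 and yaw rate r = c·v_x, both error derivatives vanish, i.e., the vehicle follows a constant-curvature center line. Note that the full steady-state (13) additionally requires force balance in the v_x, v_y, r dynamics, which this kinematic check omits; the curvature and speed values below are illustrative.

```python
import math

# Sketch of the track-relative error dynamics (11), parameterized by the
# local track curvature c.
def error_dynamics(e_lat, mu, vx, vy, r, c):
    e_lat_dot = vx * math.sin(mu) + vy * math.cos(mu)
    mu_dot = r - c * (vx * math.cos(mu) - vy * math.sin(mu)) / (1 - c * e_lat)
    return e_lat_dot, mu_dot

# kinematic steady-state check: center line, zero heading error, r = c * vx
c, vx = 2.0, 1.5  # curvature 2 1/m, longitudinal speed 1.5 m/s (illustrative)
d_elat, d_mu = error_dynamics(0.0, 0.0, vx, 0.0, c * vx, c)
print(d_elat, d_mu)  # both derivatives vanish at this steady state
```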
While direct use of the steady-state (13) as a terminal constraint satisfies the invariance property, the resulting terminal constraint (3f) would become rather restrictive, leading to conservative behavior of the safety filter. To increase the feasible set of (3), and thereby the safe set of vehicle states, we propose in the following a design procedure that enlarges the terminal steady-state constraint to an invariant set.

B. Terminal Set & Control Law Synthesis
To design a terminal set for the system (6), we use a linearization around the previously introduced equilibrium points (13) to obtain a stabilizing state feedback controller. This allows us to derive a positively invariant set from a Lyapunov function for the corresponding closed-loop system.

We begin by linearizing (12) around a specific steady-state and curvature (13), resulting in

    x̄_r(k+1, c) = A(c) x̄_r(k, c) + B(c) ū(k, c),   (14)

where A(c) and B(c) are the linearization matrices evaluated at a steady-state pair (x_r^e(c), u^e(c)). The notation x̄_r(k, c) = x_r(k, c) − x_r^e(c) indicates the deviation of the state x_r(k, c) from the steady-state x_r^e(c) for a given curvature, and similarly for ū(k, c). For the local stabilizing control law, we choose a constant linear controller of the form

    κ_f(k, c) = K x̄_r(k, c),   (15)

where K ∈ R^{m×n_r}.

An ellipsoidal set is chosen for the terminal set as

    S_f(c) := { x̄_r(k, c) | x̄_r(k, c)^T P x̄_r(k, c) ≤ 1 } ⊆ X_r,   (16)

which is a sublevel set of a quadratic Lyapunov function V_f(x̄_r(k, c)) = x̄_r(k, c)^T P x̄_r(k, c), contained within the state constraints X_r. The matrix P ∈ R^{n_r×n_r} can be obtained by solving the discrete-time Lyapunov equation for the closed-loop system dynamics matrix A_cl(c) = A(c) + B(c)K, with a pre-specified dissipation rate Q_dis:

    A_cl(c)^T P A_cl(c) − P ≤ −Q_dis.   (17)

The set (16) is then guaranteed to be positively invariant for the system (14) at a given curvature when subject to the control law (15). The dissipation Q_dis provides the ability to compensate for linearization errors when stabilizing the original nonlinear system.
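As a sketch of the construction in (16)-(17), the following computes P for a generic Schur-stable 2×2 closed-loop matrix (illustrative numbers, not the linearized vehicle) via the convergent series solution P = Σ_k (A_cl^T)^k Q_dis A_cl^k of the discrete-time Lyapunov equation, and then checks the decrease condition (17), which holds with equality for this choice of P.

```python
# Pure-Python 2x2 Lyapunov equation sketch for condition (17).
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_t(A):
    return [list(col) for col in zip(*A)]

def solve_dlyap_2x2(A_cl, Q_dis, iters=200):
    """Series solution of A_cl^T P A_cl - P = -Q_dis for a stable A_cl."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    Ak = [[1.0, 0.0], [0.0, 1.0]]  # A_cl^k, starting at the identity
    for _ in range(iters):
        term = mat_mul(mat_t(Ak), mat_mul(Q_dis, Ak))
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
        Ak = mat_mul(A_cl, Ak)
    return P

A_cl = [[0.9, 0.2], [0.0, 0.7]]   # assumed Schur-stable closed loop
Q_dis = [[1.0, 0.0], [0.0, 1.0]]
P = solve_dlyap_2x2(A_cl, Q_dis)

# Check (17): the residual A_cl^T P A_cl - P + Q_dis should be (numerically) zero
R = mat_mul(mat_t(A_cl), mat_mul(P, A_cl))
residual = [[R[i][j] - P[i][j] + Q_dis[i][j] for j in range(2)] for i in range(2)]
print(max(abs(residual[i][j]) for i in range(2) for j in range(2)))
```

Sublevel sets of the resulting V_f(x̄) = x̄^T P x̄ are then invariant ellipsoids of the form (16) for the linear closed loop.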
This dissipation value is chosen as Q_dis = Q + K^T R K, where Q, R are cost matrices that can be designed to bound the linearization errors by x̄_r(k, c)^T Q x̄_r(k, c) + ū(k, c)^T R ū(k, c).

The curvature values of a track with both left and right turns fall into the range c ∈ [−c_max, c_max], where c_max is the largest curvature value on the track. We therefore want a single control law that stabilizes the system at any curvature within the given range. This is done by first introducing a set of n_c equidistant curvature values in [−c_max, c_max], and computing the corresponding equilibrium states, inputs, and linearization matrices for each: {x_{r,i}^e, u_i^e, A_i, B_i}, ∀i ∈ I_[1,n_c]. We then impose the stability condition from (17) at each steady-state for the same control matrix K, computing the control law and resulting invariant set with a semidefinite program (similarly used in [23]):

    min_{E,Y}  −log det E                                              (18a)
    s.t. ∀i ∈ I_[1,n_c]:
         E ⪰ 0,                                                        (18b)
         [ ([h]_j − [H]_j x_{r,i}^e)²   [H]_j E
           E [H]_j^T                    E       ] ⪰ 0,  ∀j ∈ I_[1,n_h], (18c)
         [ ([g]_l − [G]_l u_i^e)²   [G]_l Y
           Y^T [G]_l^T              E       ] ⪰ 0,  ∀l ∈ I_[1,n_g],     (18d)
         [ E              ⋆   ⋆   ⋆
           A_i E + B_i Y  E   ⋆   ⋆
           Q^{1/2} E      0   I   ⋆
           R^{1/2} Y      0   0   I ] ⪰ 0,                             (18e)

where E := P^{−1}, and Y := KE. The solution to (18) allows us to extract a maximal-volume ellipsoidal set (16) that is invariant for the closed-loop system A_cl(c) at each of the n_c gridded curvature values. The matrix inequalities (18c) and (18d) impose the state and input constraints for each equilibrium point.
The constraint (18e) can be derived from the Lyapunov decrease condition (17) via Schur complements; the matrix is symmetric, with ⋆ representing the corresponding transposed terms.

Since the resulting set is by design invariant only for the linearized system at the chosen curvature values, we must further verify that invariance holds for the nonlinear system across the continuous range of curvatures. This is done via an additional optimization problem that searches the set for any state and curvature pairing that leads to an invariance violation for the nonlinear system under the computed terminal control law:

    max_{x̄_r, c}  x̄_r(k+1, c)^T P x̄_r(k+1, c)              (19a)
    s.t.  x̄_r(k, c)^T P x̄_r(k, c) ≤ 1,                      (19b)
          x̄_r(k+1, c) = f_r(x̄_r(k, c), κ_f(k, c)),          (19c)
          c ∈ [−c_max, c_max].                               (19d)

If the optimal objective value (19a) is less than 1, then S_f(c) is verified as invariant for the nonlinear system; otherwise, the problem has found a state for which the set is not invariant under the nonlinear dynamics. In this case, the set can be incrementally scaled down until no violating points are found, with the limit reaching the vehicle steady-state as a feasible solution.

Note that the invariance guarantees of the proposed terminal set are valid for constant curvatures. Since we consider a track made up of connecting constant-curvature segments, the theoretical invariance property therefore holds on each individual segment. However, the guarantees do not strictly hold for the instantaneous change of curvature between segments, due to the resulting shift in the steady-state set point. Since the linearization-based control law and invariant set design inherently introduce some conservatism, we observe in practice that changing set points can still be efficiently compensated.
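In place of solving (19) to global optimality, invariance can also be probed by sampling, mirroring the randomized verification used in Section VI. The sketch below does this for a toy scalar nonlinear system; the dynamics, terminal gain K, and Lyapunov weight P are all illustrative assumptions, not the vehicle quantities.

```python
import random

# Sampling-based stand-in for the a-posteriori verification (19): draw states
# inside the candidate set x^T P x <= 1 and curvatures in the allowed range,
# propagate one step under the terminal law, and record the worst set value.
def f_nl(x, u, c):
    return 0.9 * x + 0.5 * u + 0.05 * c * x ** 3  # curvature-dependent nonlinearity

K, P = -0.4, 4.0  # assumed terminal gain and Lyapunov weight (scalar case)

random.seed(0)
worst = 0.0
for _ in range(1000):
    x = random.uniform(-1.0, 1.0) / P ** 0.5      # sample inside x*P*x <= 1
    c = random.uniform(-1.0, 1.0)                 # curvature range [-c_max, c_max]
    xp = f_nl(x, K * x, c)
    worst = max(worst, xp * P * xp)
print(worst)  # staying below 1 means invariance is not falsified on the samples
```

Sampling only falsifies; as in the paper, the optimization (19) (or many restarts of it) is what provides the actual verification, with the set scaled down if a violating point is found.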
We therefore do not explicitly account for this change in curvature, and consider invariance on the individual track segments as practically adequate for the terminal safe set.

The curvature value used for the terminal set when solving Problem (3) is taken as the curvature a certain distance ahead of the vehicle along the track. This distance is heuristically chosen as a function of the current desired torque input, u_{d,τ}(k), and the time horizon of the problem, t_N = N · T_s, to generate a reasonable look-ahead distance for the terminal set.

VI. EXPERIMENTS

To demonstrate the performance of the proposed safety filter, the scheme is implemented with a small remote-controlled vehicle on a track, where the vehicle must stay inside track boundaries. We first present an experiment showing the safety filter in a driver-assistance scenario, where the desired inputs are provided directly by a human driver. This is followed by an example of a learning-based control application using imitation learning, where a neural network policy is safely learned and deployed on the vehicle. A video of the experiments performed can be found at: https://youtu.be/Aaly_IwQmfc.

A. Problem Implementation
To ensure feasibility of the MPC problem (3), the track-width constraint (10) and the terminal set constraint (16) are implemented as soft constraints. The problem is solved online using acados [25] with a real-time iteration SQP scheme, a horizon length of N = 60, and a sampling frequency of 80 Hz. The terminal set computation in (18) is solved offline using MOSEK [26], with n_c = 21 equilibrium points spanning the curvature range [−c_max, c_max]. The verification problem in (19) is solved 1000 times from randomly selected initial conditions, and the resulting objective value never exceeds 1.

B. Experimental Platform
A Kyosho Mini-Z 1:28-scale remote-controlled vehicle is used on a 0.80 m constant-width track as the test platform for all experiments. A VICON motion capture system provides vehicle position and orientation information, which is used by an Extended Kalman Filter to produce a complete state estimate. The safe control inputs are sent via radio controller to the vehicle. The closed-loop system is implemented using ROS (Robot Operating System) running on a Lenovo ThinkPad P1 with Ubuntu 18.04, an Intel Core i7-9750H processor, and 32 GB RAM.
C. Manual Driver Assistance
By combining the safety certification with human driver inputs, a driver-assistance system is created that provides necessary intervention should the driver make a mistake that would endanger the vehicle. Since the safety certification is designed to be minimally invasive, it gives the driver free control of the vehicle as long as their actions remain safe, only intervening when required.

In this experiment, the manual driver inputs are provided by a physical joystick. Figure 5 shows the vehicle trajectory and corresponding inputs from a single lap driven with the safety certification active. In the vehicle trajectory plot, the color map shows the L2-norm of the difference between the desired and safe control input vectors, indicating the magnitude of modification by the safety filter. The input comparison plots show that the safety filter commands initially closely correspond to the driver commands up until the dashed line, indicating that the driver commands are being certified as safe and applied to the vehicle. After this, the safety certification begins to intervene in both steering and throttle inputs as the driver purposefully fails to steer around corners, or swerves the car toward the wall. The plot of the trajectory demonstrates how the safety certification is able to keep the vehicle within track boundaries at all times, while still managing to track the desired inputs whenever possible.

Fig. 5. Vehicle trajectory (top) and control inputs (middle, bottom) for a human providing the desired control signal by joystick. The safety filter intervention is shown via heat map on the trajectory. The orange dot and arrow indicate the starting point and travel direction; the dashed blue line indicates the transition from generally safe driver inputs to unsafe inputs.
D. Imitation Learning
Imitation learning is a technique for learning a policy that replicates the actions of a demonstration from another agent, typically an expert policy for the given task. We implement an iterative algorithm called DAgger (Dataset Aggregation) [27] to learn a stationary deterministic policy using imitation learning. In DAgger, a policy is first initialized using supervised learning from an expert demonstration, and is then deployed directly on the task. The expert labels all states visited by the learned policy with the optimal action, which is then added to the dataset for the policy to retrain on. This process is repeated iteratively so that the learned policy is able to improve from its previous mistakes.

Since DAgger relies on rolling out the learner policy during training, it can be combined with the proposed safety filter to provide a safe training environment for learning a racing controller on the physical vehicle. A feedforward neural network with 3 hidden layers, 64 neurons per layer, and ReLU activation functions is used as the policy architecture, which outputs a drivetrain and a steering command. The input to the network is chosen as the vehicle state in track-relative coordinates, along with 30 curvature values over the next 1.5 meters of track, i.e. x_NN = [e_lat, μ, v_x, v_y, r, c_1, ..., c_30]. Training the network consists of supervised learning to minimize the L2-norm of the difference between expert and network commands. The expert policy used is a Model Predictive Contouring Controller (MPCC), presented in [19], which maximizes track progress while staying inside track boundaries, and has proved successful in other racing applications [28]. Imitating a finite-horizon optimal policy like MPCC can be beneficial, as the states visited by the network controller in each iteration can be labeled offline using an MPCC with a longer horizon that could not be used in practice due to solve-time requirements.
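The DAgger procedure described above can be summarized as follows. This is a schematic sketch: the `expert`, `learner`, and `rollout` interfaces are hypothetical stand-ins for the MPCC expert, the neural network policy, and closed-loop data collection under the safety filter.

```python
def dagger(expert, learner, rollout, n_episodes, n_steps):
    """Dataset Aggregation (DAgger): initialize from an expert
    demonstration, then repeatedly roll out the learner, relabel the
    visited states with expert actions, and retrain on the aggregate."""
    # Initial supervised learning on an expert demonstration.
    dataset = [(x, expert(x)) for x in rollout(expert, n_steps)]
    learner.fit(dataset)
    for _ in range(n_episodes):
        # Roll out the current learner policy (on the real vehicle this
        # happens under the safety filter).
        visited = rollout(learner, n_steps)
        # The expert labels every visited state with its action; this can
        # be done offline, e.g. with a longer-horizon MPCC.
        dataset += [(x, expert(x)) for x in visited]
        learner.fit(dataset)
    return learner
```

The aggregated dataset grows with every episode, so later retraining steps see both the expert demonstration and the states the learner actually visits, which is what lets the policy correct its own mistakes.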
The resulting neural network then imitates a high-performance policy that otherwise could not be achieved by the expert in real time.

DAgger is set up on the experimental platform alongside the safety filter to allow for completely automated safe training. Safety is provided both during data collection when the neural network policy is operating, and during transition periods as the vehicle stops to retrain the policy. Figure 6 shows two trajectory plots of DAgger episodes while the neural network policy is active alongside the safety filter. The plot in 6(a) shows the trajectory over several laps from the first DAgger episode, where multiple instances of necessary safety filter intervention can be seen, as indicated by the color of the safety deviation norm. In the early stages of training, the neural network policy has only been trained on the initial expert dataset, so it struggles to bring the vehicle onto the optimal racing line without trying to cut corners. The safety filter must then deviate from applying the desired inputs to computing safe inputs that keep the vehicle in the track. The plot in 6(b) shows the trajectory from the fourth episode, which is much more consistent than the initial policy, with almost no major safety filter interventions. The trajectory is aligned more closely with the optimal trajectory from MPCC, demonstrating an improved policy over previous iterations.

VII. CONCLUSIONS

In this work, we have presented a predictive safety filter that is able to render a closed-loop vehicle system safe when subject to any unsafe control signal. A method for computing and verifying an invariant terminal set for the nonlinear vehicle system on constant-curvature track segments is presented, providing a safe operating domain that does not overly restrict the desired policy. The experiments illustrate two applications where the safety filter is able to ensure safety of the vehicle during dynamic high-speed maneuvers.
[Figure 6: panels "NN Trajectory - 1st Iteration" (top) and "NN Trajectory - 4th Iteration" (bottom), each overlaid with the MPCC converged trajectory; vertical axis y [m].]
Fig. 6. Vehicle trajectories shown during the first (top) and fourth (bottom) episodes of DAgger; safety filter intervention is shown via heat map, and the converged expert MPCC trajectory is shown in red. Initial location and direction of travel are shown in orange.

REFERENCES

[1] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, "Control barrier functions: Theory and applications," in European Control Conference, 2019.
[2] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, "Learning-based model predictive control: Toward safe learning in control," Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 269–296, 2020.
[3] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, "A general safety framework for learning-based control in uncertain robotic systems," IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2019.
[4] D. Seto, B. Krogh, L. Sha, and A. Chutinan, "The simplex architecture for safe on-line control system upgrades," in Proceedings of the American Control Conference, vol. 6, 1998, pp. 3504–3508.
[5] S. Prajna and A. Jadbabaie, "Safety verification of hybrid systems using barrier certificates," in Hybrid Systems: Computation and Control, R. Alur and G. J. Pappas, Eds. Springer, 2004, pp. 477–492.
[6] P. Wieland and F. Allgöwer, "Constructive safety using control barrier functions," IFAC Proceedings, vol. 40, no. 12, pp. 462–467, 2007.
[7] A. Taylor, A. Singletary, Y. Yue, and A. Ames, "Learning for safety-critical control with control barrier functions," in Proceedings of the 2nd Conference on Learning for Dynamics and Control, vol. 120. PMLR, 2020, pp. 708–717.
[8] J. Achiam, D. Held, A. Tamar, and P. Abbeel, "Constrained policy optimization," in Proceedings of the 34th International Conference on Machine Learning, vol. 70. PMLR, 2017, pp. 22–31.
[9] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, "Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes," 2016, pp. 4661–4666.
[10] J. H. Gillula and C. J. Tomlin, "Guaranteed safe online learning of a bounded system," in IEEE Conference on Intelligent Robots and Systems, 2011, pp. 2979–2984.
[11] K. P. Wabersich and M. N. Zeilinger, "Scalable synthesis of safety certificates from data with application to learning-based control," in European Control Conference, 2018, pp. 1691–1697.
[12] L. Wang, D. Han, and M. Egerstedt, "Permissive barrier certificates for safe stabilization using sum-of-squares," in Proceedings of the American Control Conference, 2018, pp. 585–590.
[13] A. Chakrabarty, C. Danielson, S. Di Cairano, and A. Raghunathan, "Active learning for estimating reachable sets for systems with unknown dynamics," IEEE Transactions on Cybernetics, pp. 1–12, 2020.
[14] K. P. Wabersich and M. N. Zeilinger, "Linear model predictive safety certification for learning-based control," in IEEE Conference on Decision and Control, 2018, pp. 7130–7135.
[15] K. P. Wabersich, L. Hewing, A. Carron, and M. N. Zeilinger, "Probabilistic model predictive safety certification for learning-based control," IEEE Transactions on Automatic Control, 2021.
[16] K. P. Wabersich and M. N. Zeilinger, "A predictive safety filter for learning-based control of constrained nonlinear dynamical systems," Automatica, 2021 [accepted]; arXiv:1812.05506.
[17] H. Pacejka, Tyre and Vehicle Dynamics, ser. Automotive Engineering Series. Butterworth-Heinemann, 2002.
[18] R. Rajamani, Vehicle Dynamics and Control, ser. Mechanical Engineering Series. Springer US, 2011.
[19] A. Liniger, A. Domahidi, and M. Morari, "Optimization-based autonomous racing of 1:43 scale RC cars," Optimal Control Applications and Methods, vol. 36, no. 5, pp. 628–647, Jul 2014.
[20] P. F. Lima, J. Mårtensson, and B. Wahlberg, "Stability conditions for linear time-varying model predictive control in autonomous driving," in IEEE Conference on Decision and Control, 2017, pp. 2775–2782.
[21] H. Chen and F. Allgöwer, "A quasi-infinite horizon nonlinear model predictive control scheme with guaranteed stability," Automatica, vol. 34, no. 10, pp. 1205–1217, 1998.
[22] A. Carron and M. Zeilinger, "Model predictive coverage control," IFAC World Congress, 2020.
[23] C. Conte, C. N. Jones, M. Morari, and M. N. Zeilinger, "Distributed synthesis and stability of cooperative distributed model predictive control for linear systems," Automatica, vol. 69, pp. 117–125, 2016.
[24] J. L. Vazquez, M. Brühlmeier, A. Liniger, A. Rupenyan, and J. Lygeros, "Optimization-based hierarchical motion planning for autonomous racing," in IEEE International Conference on Intelligent Robots and Systems, 2020.
[25] R. Verschueren et al., "acados: a modular open-source framework for fast embedded optimal control," 2019.
[26] M. ApS, The MOSEK optimization toolbox for MATLAB, 2019.
[27] S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, vol. 15, 2011, pp. 627–635.
[28] A. Kabzan, Liniger, J. Lygeros, and R. Siegwart et al., "AMZ driverless: The full autonomous racing system,"