Provably Correct Training of Neural Network Controllers Using Reachability Analysis
Xiaowu Sun, Yasser Shoukry

Abstract—In this paper, we consider the problem of training neural network (NN) controllers for cyber-physical systems (CPS) that are guaranteed to satisfy safety and liveness properties. Our approach combines model-based design methodologies for dynamical systems with data-driven approaches to achieve this target. Given a mathematical model of the dynamical system, we compute a finite-state abstract model that captures the closed-loop behavior under all possible neural network controllers. Using this finite-state abstract model, our framework identifies the subset of NN weights that are guaranteed to satisfy the safety requirements. During training, we augment the learning algorithm with a NN weight projection operator that enforces the resulting NN to be provably safe. To account for the liveness properties, the proposed framework uses the finite-state abstract model to identify candidate NN weights that may satisfy the liveness properties, and biases the NN training toward such candidates. The guarantees above cannot be ensured without correctness guarantees on the NN architecture, which controls the NN's expressiveness. Therefore, a cornerstone of the proposed framework is the ability to select provably correct NN architectures automatically.
I. INTRODUCTION
The last decade has witnessed tremendous success in using machine learning (ML) in a multitude of safety-critical cyber-physical systems domains, such as autonomous vehicles, drones, and smart cities. Indeed, end-to-end learning is attractive for the realization of feedback controllers for such complex cyber-physical systems, thanks to the appeal of designing control systems based on purely data-driven architectures. However, regardless of the explosion in the use of machine learning to design data-driven feedback controllers, providing formal safety and reliability guarantees for these ML-based controllers remains an open question. It is then unsurprising to see the recent focus in the literature on the problem of safe and trustworthy autonomy in general, and safe reinforcement learning in particular.

The literature on the safe design of ML-based controllers for dynamical and hybrid systems can be classified according to three broad approaches, namely (i) incorporating safety in the training of ML-based controllers, (ii) post-training verification of ML-based controllers, and (iii) online validation of safety and control intervention. Representative examples of the first approach include reward shaping [1], Bayesian and robust regression [2], [3], [4], and policy optimization with constraints [5], [6], [7], [8]. Unfortunately, this approach does not provide provable guarantees on the safety of the trained controller.

Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697 USA {xiaowus,yshoukry}@uci.edu. This work was partially sponsored by the NSF awards.
Other techniques in this domain include Lyapunov methods [9], [10], [11] and safe model predictive control [12], which focus mainly on providing stability guarantees rather than general safety and liveness guarantees.

To provide strong safety and reliability guarantees, several works in the literature focus on applying formal verification techniques (e.g., model checking) to verify pre-trained ML-based controllers' formal safety properties. Representative examples of this approach are the use of SMT-like solvers [13], [14], [15] and hybrid-system verification [16], [17], [18]. However, these techniques only assess a given ML-based controller's safety rather than design or train a safe agent.

Due to the lack of safety guarantees on the resulting ML-based controllers, researchers proposed several techniques to restrict the output of the ML-based controller to a set of safe control actions. Such a set of safe actions can be obtained through reachability analysis [19], [20], [21], barrier certificates [22], [23], [24], [25], [26], [27], [28], [29], [30], and online learning of uncertainties [31]. Unfortunately, methods of this type suffer from being computationally expensive, are specific to certain controller structures, or else employ training algorithms that require assumptions on the system model.

This paper proposes a principled framework combining model-based control and data-driven neural network training to achieve enhanced reliability and verifiability. Our framework bridges ideas from reachability analysis to guide and bias the neural network controller's design and training and is capable of supplying strong reliability guarantees. To that end, and starting from a nonlinear model of the system, we compute a finite-state abstract model capable of capturing the closed-loop behavior under all neural network controllers. Such a finite-state abstract model can be computed using a direct extension of existing reachability tools.
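The abstraction step just described can be sketched for a toy one-dimensional system. Everything below is an illustrative assumption (the scalar dynamics `f`, the corner-based interval over-approximation, the grid of cells, and all names); the paper itself relies on sound reachability tools for general nonlinear systems:

```python
import itertools

def f(x, u):
    # Hypothetical scalar dynamics standing in for the nonlinear model:
    # x(t+1) = x + 0.1*u. Chosen for illustration only.
    return x + 0.1 * u

def post_interval(x_lo, x_hi, k_lo, k_hi):
    """Over-approximate the one-step reachable interval of the cell
    [x_lo, x_hi] under every linear feedback u = K*x with K in
    [k_lo, k_hi]. Corner evaluation is exact here because f(x, K*x)
    is monotone on these ranges; a general nonlinear system needs a
    sound reachability tool instead."""
    corners = [f(x, k * x)
               for x, k in itertools.product((x_lo, x_hi), (k_lo, k_hi))]
    return min(corners), max(corners)

def abstract_transitions(cells, partitions):
    """Transitions of the finite-state abstract model: cell i has a
    transition to cell j under controller partition p whenever the
    over-approximated posterior of cell i intersects cell j."""
    trans = set()
    for i, (x_lo, x_hi) in enumerate(cells):
        for p, (k_lo, k_hi) in enumerate(partitions):
            lo, hi = post_interval(x_lo, x_hi, k_lo, k_hi)
            for j, (y_lo, y_hi) in enumerate(cells):
                if lo < y_hi and hi > y_lo:  # open intervals overlap
                    trans.add((i, p, j))
    return trans

cells = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]  # abstract states (grid cells)
partitions = [(-2.0, 0.0), (0.0, 2.0)]        # controller gain partitions
transitions = abstract_transitions(cells, partitions)
```

On this toy grid, cell 0 under the first gain partition can only stay inside itself, while the second partition also allows a transition into cell 1; such labeled transitions are exactly the edges of the abstract model on which unsafe controller partitions are later pruned.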
Next, our framework uses this abstract model to search for safe subsets of neural network weight assignments that are guaranteed to result in a safe controller. During the neural network training, we use a novel projection operator that projects the trained neural network weights onto the subsets found to be safe.

Unlike the safety property, satisfying the liveness property cannot be enforced by projecting the trained neural network weights. Therefore, to account for the liveness properties, our framework utilizes the abstract model further to refine the safe set of neural network weights and find sets of candidate
NN weights that may satisfy the liveness properties. The framework then ranks these candidates and biases the NN training process accordingly until a NN that satisfies the liveness property is obtained. In conclusion, the contributions of this paper can be summarized as follows:
1) An abstraction-based framework that captures the behavior of all possible neural network controllers.
2) A novel NN weight projection operator that can be integrated with any NN training procedure to ensure that the trained NN is provably safe.
3) A procedure to bias the NN training to satisfy the liveness properties.

II. PROBLEM FORMULATION
Notation: The symbols R and N denote the set of real and natural numbers, respectively. The symbols ∧ and =⇒ denote the logical AND and logical IMPLIES, respectively. We use Ψ_CPWA : X → R^m to denote a Continuous and Piece-Wise Affine (CPWA) function of the form:

Ψ_CPWA(x) = K_i x + b_i if x ∈ R_i, i = 1, . . . , L, (1)

where the polytopic sets {R_1, . . . , R_L} form a partition of the set X. We call each polytopic set R_i ⊂ X a linear region, and use L_{Ψ_CPWA} = {R_1, . . . , R_L} to denote the set of linear regions associated with Ψ_CPWA.

A. Dynamical Model, Neural Network Controller, and Specification
Consider discrete-time nonlinear dynamical systems of the form:

x(t+1) = f(x(t), u(t)), (2)

where the state vector x(t) ∈ X ⊂ R^n, the control vector u(t) ∈ U, and t ∈ N. Given a feedback control law Ψ : X → U, we use ξ_{x,Ψ} : N → X to denote the closed-loop trajectory of (2) that starts from the state x ∈ X and evolves under the control law Ψ.

In this paper, our primary focus is on controlling the nonlinear system (2) with a state-feedback neural network controller NN : X → U. A K-layer Rectified Linear Unit (ReLU) NN is specified by composing K layer functions (or just layers). A layer l with i_l inputs and o_l outputs is specified by a weight matrix W^(l) ∈ R^{o_l × i_l} and a bias vector b^(l) ∈ R^{o_l} as follows:

L_{θ^(l)} : z ↦ max{W^(l) z + b^(l), 0}, (3)

where the max function is taken element-wise, and θ^(l) ≜ (W^(l), b^(l)) for brevity. Thus, a K-layer ReLU NN is specified by K layer functions {L_{θ^(l)} : l = 1, . . . , K} whose input and output dimensions are composable: that is, they satisfy i_l = o_{l−1}, l = 2, . . . , K. Specifically:

NN_θ(x) = (L_{θ^(K)} ◦ L_{θ^(K−1)} ◦ · · · ◦ L_{θ^(1)})(x), (4)

where we index a ReLU NN function by a list of matrices θ ≜ (θ^(1), . . . , θ^(K)). Also, it is common to allow the final layer function to omit the max function altogether, and we will be explicit about this when it is the case.

Specifying the number of layers and the dimensions of the associated matrices θ^(l) = (W^(l), b^(l)) specifies the architecture of the ReLU NN. Therefore, we use:

A ≜ ((n, o_1), (i_2, o_2), . . . , (i_{K−1}, o_{K−1}), (i_K, m)) (5)

to denote the architecture of a ReLU NN. Given a NN architecture A, we denote by Θ_A the set of all lists of matrices θ that satisfy the number of layers and dimensions required by A:

Θ_A ≜ {θ = (θ^(1), . . . , θ^(K)) | θ^(l) = (W^(l), b^(l)), W^(l) ∈ R^{o_l × i_l}, b^(l) ∈ R^{o_l}}. (6)

As a typical control task, this paper considers a reach-avoid specification φ, which combines a safety property φ_safety for avoiding a set of obstacles {O_1, . . . , O_o} with O_i ⊂ X, and a liveness property φ_liveness for reaching a goal region X_goal ⊂ X in a bounded time horizon T. We use ξ_{x,Ψ} |= φ_safety and ξ_{x,Ψ} |= φ_liveness to denote that a trajectory ξ_{x,Ψ} satisfies the safety and liveness specifications, respectively, i.e.:

ξ_{x,Ψ} |= φ_safety ⟺ ∀t ∈ N, ∀i ∈ {1, . . . , o}, ξ_{x,Ψ}(t) ∉ O_i,
ξ_{x,Ψ} |= φ_liveness ⟺ ∃t′ ∈ {0, . . . , T}, ξ_{x,Ψ}(t′) ∈ X_goal.

Given a set of initial states X_init, a control law Ψ : X → R^m satisfies the specification φ (denoted by Ψ, X_init |= φ) if all trajectories starting from the set X_init satisfy the specification, i.e., ξ_{x,Ψ} |= φ, ∀x ∈ X_init.

B. Main Problem
Given the dynamical model (2) and a reach-avoid specification φ = φ_safety ∧ φ_liveness, we consider the problem of designing an NN controller with provable guarantees, as described in the next problem.

Problem 2.1: Given the nonlinear dynamical system (2) and a reach-avoid specification φ, compute (i) a ReLU NN architecture A, (ii) an assignment of weights θ ∈ Θ_A, and (iii) a set of initial states X_init ⊆ X, such that NN_θ, X_init |= φ_safety ∧ φ_liveness.

III. FRAMEWORK
Before describing our approach to solve Problem 2.1, we start by recalling the connection between ReLU neural networks and Continuous Piece-Wise Affine (CPWA) functions as follows [32]:

Proposition 3.1: Every R^n → R^m ReLU NN represents a continuous piece-wise affine function.

In this paper, we confine our attention to CPWA controllers (and hence neural network controllers) that are selected from a bounded polytopic set P_K × P_b ⊂ R^{m×n} × R^m, i.e., we assume that K_i ∈ P_K and b_i ∈ P_b.

Our solution to Problem 2.1 is to use the mathematical model of the physical system (2) to guide the design of the NN architecture and bias its training. In particular, our approach is split into two components: one addresses the safety specifications while the other addresses the liveness specifications, as described in the next two subsections.

A. Addressing Safety Specification φ_safety

Our approach to address the safety specification φ_safety is as follows:

• Step 1:
Capture the closed-loop behavior of the system under all CPWA controllers using an abstract model.
• Step 2: Identify a subset of CPWA controllers that lead to correct behavior on the abstract model.
• Step 3: Design a NN architecture that matches the structure of the CPWA controllers that are identified to be correct.
• Step 4:
Enforce the training of the NN to pick from the subset of the CPWA controllers that are identified to be correct.

Figure 1 conceptualizes our framework. To start with, we construct an abstract model by partitioning both the state space X and the set of all allowed CPWA functions P_K × P_b. In Figure 1(b), the state space is partitioned into a set of abstract states X = {q_1, q_2, q_3} such that q_i ⊂ X and X = ⋃_{i∈{1,2,3}} q_i. Similarly, the controller space P_K × P_b is partitioned into a set of abstract controller partitions P = {P_1, P_2} such that P_i ⊆ P_K × P_b and P_K × P_b = ⋃_{i∈{1,2}} P_i. The final abstract model is a non-deterministic finite transition system whose nodes represent abstract states in X and whose transitions are labeled by controller partitions in P. Transitions between the abstract states are computed based on the reachable sets of the nonlinear system (2) from each abstract state and under every controller partition of the CPWA functions.

Based on the abstract model, we compute a function P_safe that maps each abstract state q ∈ X to a subset of the controller partitions (representing a collection of subsets of CPWA functions) that are considered to be safe at that abstract state. For example, in Figure 1(b), since the transition from q_1 labeled by P_2 leads to the obstacle, the controller partition P_2 is unsafe at q_1, and hence P_safe(q_1) = {P_1}. Similarly, P_safe(q_2) = {P_1, P_2}. For the abstract state q_3, since both P_1 and P_2 can lead to the obstacle, P_safe(q_3) is empty, and hence q_3 is considered an unsafe abstract state. The set of initial states that can provide a safety guarantee is the union of the safe abstract states, i.e., X_init = q_1 ∪ q_2.

Using the set of safe controllers captured by P_safe(q), it is direct to show that any neural network whose linear regions are aligned with the abstract states, and whose weights are restricted to those in P_safe(q), will always result in a NN that satisfies the safety specifications. Therefore, we utilize this observation to construct a NN architecture that is guaranteed to be sufficient to implement a safe controller. Moreover, to ensure the safety of the resulting trained neural network, we propose a NN weight "projection" operator to enforce that the trained NN only gives rise to the CPWA functions indicated by P_safe(q).

B. Addressing Liveness Specification φ_liveness

Our approach to addressing the liveness specification φ_liveness (reaching the goal) can be summarized as follows:

• Step 1:
Use the abstract model to identify candidate controller partitions P⋆ that can lead to satisfaction of the liveness properties.
• Step 2: Construct one local neural network NN_q for each of the abstract states. A sufficient architecture for this NN_q can be derived using ideas presented in [34].
• Step 3: Train the local neural networks NN_q using collected data. The data collection can be biased using the knowledge of P⋆ to accelerate the process. We use the NN weight projection operator (discussed in the safety section) during training to ensure that the resulting NN still enjoys the safety guarantees.
• Step 4: Combine all the local neural networks NN_q into a single global NN.

In the context of the example in Figure 1(c), the controller partition P_1 is assigned to both q_1 and q_2 as the candidate controller partition P⋆ that may lead to the satisfaction of the liveness properties.

Next, we train one local neural network NN_q for each abstract state, or for a subset of abstract states assigned the same controller partition, as shown in Figure 1(d). During training of the local neural networks, we project the weights of the neural networks to enforce that the resulting NNs give rise to CPWA functions belonging to the assigned controller partition P⋆. Since the controller partition P⋆ is chosen from the collection P_safe(q), the resulting NN at q enjoys the same safety guarantees.

Finally, in Figure 1(e), we combine the NNs trained for each abstract state into a single NN controller, by using layers with fixed weights to decide which part of the NN should be activated.

C. Formal Guarantees
We highlight that the proposed framework always guarantees that the resulting NN satisfies the safety specification φ_safety, thanks to the NN weight projection operator. This is reflected in Theorem 5.2 discussed in Section V-D.

On the other hand, achieving the liveness specification φ_liveness depends on the quality of the data used to train the neural networks and hence needs an extra step of formally verifying the resulting neural networks and iteratively changing the candidate controller partition P⋆ if needed. However, we argue that the resulting NN architecture is modular and is composed of a set of local networks NN_q that are more amenable to verification. The proposed architecture leads to a direct divide-and-conquer approach in which only local networks NN_q may need to be re-designed and re-trained whenever the liveness properties are not met.

IV. SAFE CONTROLLER PARTITION
In this section, we provide details on how to construct the abstract model that captures the behavior of all CPWA controllers, along with how to choose the controller partitions that satisfy the specification φ.

A. Abstract Model
In order to capture the behavior of the system (2) under all possible CPWA controllers Ψ_CPWA, we construct a finite-state abstract model by discretizing both the state space and the set of all allowed CPWA functions. In particular, we partition the state space X ⊂ R^n into a set of abstract states, denoted by X = {q_1, . . . , q_N}, with each q_i ∈ X being an infinity-norm ball in R^n. The goal region X_goal ⊂ X is represented by a single abstract state q_goal ∈ X, and the set of obstacles X_obst = ⋃_{i=1}^{o} {q_obst_i} represents each obstacle O_i ⊂ X by an abstract state q_obst_i ∈ X, i = 1, . . . , o. Let Int(q) denote the interior of a set q; then the partitioning satisfies X = ⋃_{q∈X} q, and Int(q_i) ∩ Int(q_j) = ∅ if i ≠ j.

Similarly, we partition the controller space into polytopic sets. For simplicity of notation, we define the set of parameters P_{K×b} ⊂ R^{m×(n+1)} to be a polytope that combines P_K and P_b, and with some abuse of notation, we use K_i(x) with a single parameter K_i ∈ P_{K×b} to denote K′_i x + b′_i with the pair (K′_i, b′_i) = K_i. The controller space P_{K×b} ⊂ R^{m×(n+1)} is discretized into a set of polytopic sets in R^{m×(n+1)}, denoted by P = {P_1, . . . , P_M}, such that P_{K×b} = ⋃_{P∈P} P, and Int(P_i) ∩ Int(P_j) = ∅ if i ≠ j. We call each of the subsets P_i ∈ P a controller partition. Each controller partition P ∈ P represents a subset of CPWA functions, obtained by restricting the parameters K_i in a CPWA function to take values from P.

In order to reason about the safety property φ_safety, we introduce a posterior operator based on the dynamical model (2). The posterior of an abstract state q ∈ X under a controller partition P ∈ P is the set of states that can be reached in one step from the states x ∈ q by using an affine state feedback controller with parameters K ∈ P, i.e.:

Post(q, P) ≜ {f(x, K(x)) ∈ R^n | x ∈ q, K ∈ P}. (7)

A nonlinear system's posterior is often over-approximated in practice, and we use Post to denote the corresponding over-approximating operator. Our abstract model is defined by using the set of abstract states X, the set of controller partitions P, and the posterior operator Post.
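For intuition about how Post(q, P) in (7) can be over-approximated, consider the special case of linear dynamics: there, f(x, Kx + b) is bilinear in (x, (K, b)), so over a box-shaped q and a box-shaped partition P the posterior lies in the convex hull of the images of vertex pairs. The sketch below is purely illustrative (the matrices A, B, the box bounds, and the name post_box are all assumptions, not the paper's reachability procedure):

```python
import itertools
import numpy as np

# Hypothetical linear dynamics x(t+1) = A x + B u; linearity is assumed
# here only so that vertex enumeration is sound. The paper's setting is
# a general nonlinear f handled by reachability tools.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def post_box(q_lo, q_hi, k1_rng, k2_rng, b_rng):
    """Bounding box of Post(q, P): since (x, (K, b)) -> A x + B (K x + b)
    is bilinear, its image over a product of boxes is contained in the
    convex hull of the images of vertex pairs, so scanning all corners
    of q and of the partition P = k1_rng x k2_rng x b_rng suffices."""
    pts = []
    for x1, x2, k1, k2, b in itertools.product(
            (q_lo[0], q_hi[0]), (q_lo[1], q_hi[1]), k1_rng, k2_rng, b_rng):
        x = np.array([x1, x2])
        u = np.array([k1 * x1 + k2 * x2 + b])  # affine feedback u = K x + b
        pts.append(A @ x + B @ u)
    pts = np.array(pts)
    return pts.min(axis=0), pts.max(axis=0)

# Abstract state q = [0, 1]^2; partition P: K in [-1, 1]^2, b in [-0.5, 0.5].
lo, hi = post_box((0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-0.5, 0.5))
```

A transition of the abstract model from q to a neighboring abstract state q′ under P would then be added whenever this bounding box (or a tighter polytopic hull of the vertex images) intersects q′, matching the non-empty-intersection test used to define the abstract model's transitions.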
Intuitively, an abstract state q ∈ X has a transition to q′ ∈ X under a controller partition P ∈ P if the intersection between q′ and Post(q, P) is non-empty.

Definition 4.1 (Posterior Graph): A posterior graph is a finite transition system S_Post ≜ (X, X_0, L, →), where:
• X = X;
• X_0 = X;
• L = 2^P;
• q →_l q′, if q ∉ {q_goal} ∪ X_obst and l = {P ∈ P | q′ ∩ Post(q, P) ≠ ∅} ≠ ∅.

Since the specification φ also requires reaching a goal region X_goal ⊂ X, we introduce a predecessor operator to capture the liveness property. The predecessor of an abstract state q′ ∈ X under a controller partition P ∈ P is defined as the set of states that can reach q′ in one step by using an affine state feedback controller with some parameter K ∈ P:

Pre(q′, P) ≜ {x ∈ R^n | ∃K ∈ P : f(x, K(x)) ∈ q′}. (8)

The computation of the posterior and the predecessor operators can be done by borrowing existing techniques in reachability analysis of polytopic systems [33], with the main difference being the need to consider polytopic partitions of the parameter space P_{K×b} ⊂ R^{m×(n+1)} instead of the well-studied problem of considering polytopic partitions of the input space U ⊂ R^m.

B. Computing Function P_safe

Once the abstract model is computed, our framework identifies a set of safe controller partitions P_safe(q) ⊆ P at each abstract state q ∈ X. It is possible that the set P_safe(q) is empty at some state q ∈ X, in which case the state q is considered to be unsafe.

To start with, we introduce an operator Next on the posterior graph S_Post. Given an arbitrary abstract state q ∈ X and a controller partition P ∈ P, the set Next(q, P) ⊆ X consists of all abstract states that can be reached from q in one step such that the label of the corresponding transition in S_Post contains P:

Next(q, P) ≜ {q′ ∈ X | q′ ∩ Post(q, P) ≠ ∅}. (9)

Similar to standard reachability analysis, a state q ∈ X is considered to be safe if the posteriors computed over multiple steps do not intersect the obstacles. Unique to our problem, the parameter space P_{K×b} ⊂ R^{m×(n+1)} is discretized into a set of controller partitions P, and each state q ∈ X has the freedom to choose one of the controller partitions P ∈ P when computing the posterior. To capture this freedom in the choice of controller partitions, we identify a series of unsafe sets in a recursive manner by backtracking from the set of obstacles X_obst in the posterior graph S_Post:

X^0_unsafe = X_obst
X^1_unsafe = {q ∈ X | ∀P ∈ P : Next(q, P) ∩ X^0_unsafe ≠ ∅} ∪ X^0_unsafe
...
X^k_unsafe = {q ∈ X | ∀P ∈ P : Next(q, P) ∩ X^{k−1}_unsafe ≠ ∅} ∪ X^{k−1}_unsafe

The backtracking stops when it does not find new unsafe states, i.e., X^k_unsafe = X^{k−1}_unsafe. Intuitively, the set of unsafe states X^k_unsafe consists of abstract states that cannot avoid reaching the obstacles by choosing controller partitions from P.

Then, the set of safe states X_safe consists of all the states that can avoid transitioning to the set X^k_unsafe:

X_safe ≜ {q ∈ X \ X^k_unsafe | ∃P ∈ P : Next(q, P) ∩ X^k_unsafe = ∅}. (10)

Correspondingly, the function P_safe maps each abstract state q ∈ X_safe to the subset of controller partitions that can be used to avoid transitioning to X^k_unsafe:

P_safe : q ↦ {P ∈ P | Next(q, P) ∩ X^k_unsafe = ∅}. (11)

The following theorem summarizes the safety property:

Theorem 4.2:
Let Ψ_{q,CPWA} be a state feedback CPWA controller (as defined in (1)) corresponding to an abstract state q ∈ X_safe, with arbitrarily chosen parameters K_i that satisfy K_i ∈ P and P ∈ P_safe(q). Consider a feedback controller Ψ constructed as follows:

Ψ(x) = Ψ_{q,CPWA}(x), ∀x ∈ q, ∀q ∈ X_safe.

Then, with the set of initial states X_init = ⋃_{q∈X_safe} q, the system (2) controlled by Ψ satisfies the safety specification φ_safety, i.e., Ψ, X_init |= φ_safety.

By Theorem 4.2, the system is guaranteed to satisfy the safety specification φ_safety by applying any CPWA controller Ψ_{q,CPWA} at each abstract state q ∈ X_safe, as long as the controller parameters K_i are chosen from the safe controller partitions P ∈ P_safe(q). This allows us to conclude the safety guarantee provided by a NN controller if the CPWA functions represented by the NN are chosen from the safe controller partitions, as we show in detail in Section IV-D.

Proof: When the backtracking stops, for every abstract state q ∈ X \ X^k_unsafe, there exists a controller partition P ∈ P such that Next(q, P) ∩ X^k_unsafe = ∅. By (10), X_safe = X \ X^k_unsafe. Then, by (11), for every state q ∈ X_safe and any partition P ∈ P_safe(q), Next(q, P) ⊆ X_safe. Therefore, by applying an arbitrary controller partition P ∈ P_safe(q) at q ∈ X_safe, none of the resulting trajectories in the posterior graph S_Post can reach an obstacle state q_obst ∈ X_obst. By the construction of the posterior graph S_Post, the safety property provided by P ∈ P_safe(q) remains valid for any CPWA controller with parameters K_i ∈ P.

C. Sufficient NN Architecture for Safety Specifications
Once the set of safe abstract states X_safe and the associated map P_safe are identified, the next step is to construct a neural network architecture that is sufficient to implement a safe controller. This NN will be augmented in Section V to account for the liveness properties as well. The following lemma identifies a sufficient number of linear regions needed by a piece-wise affine function to satisfy the safety property φ_safety. In Section V-D, this number is transformed into a NN architecture that is guaranteed to respect the sufficient number of linear regions needed for the safety guarantees.

Lemma 4.3: There is a piece-wise affine function Ψ : X → U whose number of linear regions is |X_safe|, such that the system (2) controlled by Ψ satisfies the safety property φ_safety.

Proof: By Theorem 4.2, the safety property is satisfied by applying any CPWA function with K_i ∈ P and P ∈ P_safe(q). In particular, we consider the parameter K_i to be fixed at each q, i.e., each abstract state q ∈ X_safe is a linear region. Then, the sufficient number of linear regions is the same as the number of states in X_safe.

D. Safe Training Using NN Weight Projection
As described in Section III, the final NN consists of several local networks NN_q that will be trained to account for the liveness property. Nevertheless, to ensure that the safety property is met by each of the local networks NN_q, we propose a NN weight projection operator that can be incorporated in the training of these local NNs. This operator projects the weights of each NN_q to ensure that the network gives rise to CPWA functions that belong to the controller partition P⋆ ∈ P_safe(q) (the selection of P⋆ based on the liveness property is presented in Section V-A).

To that end, we recall that every local network NN_q represents a CPWA function NN_θ that partitions the state space into a set of linear regions L_{NN_θ} = {R_1, . . . , R_L}, with an affine function parametrized by K_i ∈ R^{m×(n+1)} associated with each linear region R_i, i = 1, . . . , L. During the projection phase, our algorithm identifies the subset of linear regions that intersect with the abstract state q for which the NN is trained:

L_{NN_θ} ∩ q ≜ {R ∈ L_{NN_θ} | q ∩ R ≠ ∅}. (12)

Then, for each affine function with parameter K_i at a region R_i ∈ L_{NN_θ} ∩ q, our approach enforces K_i ∈ P⋆ by adjusting the weights of the trained NN.

The projection of the NN_q weights can be done by solving a convex optimization problem. Consider a K-layer NN with currently trained weights θ̂ = (θ̂^(1), . . . , θ̂^(K)), where we use the hat notation for weights that are given by any training procedure. As a common practice, we consider there to be no ReLU activation function in the output layer of the NN, and hence the subset of intersected regions L_{NN_θ̂} ∩ q ⊆ L_{NN_θ̂} only depends on the hidden layer weights θ̂^(l), l = 1, . . . , K−1. During projection, our algorithm adjusts the output layer weights θ^(K) by minimizing their difference from the currently trained weights θ̂^(K). By fixing the hidden layer weights as given by the training, the subset of intersected regions L_{NN_θ̂} ∩ q can be determined. Controller parameters K_i corresponding to linear regions R_i ∈ L_{NN_θ̂} ∩ q are brought into the controller partition P⋆ by solving the following quadratic program:

min_{θ^(K)} ||θ^(K) − θ̂^(K)||  s.t.  K_i ∈ P⋆, ∀R_i ∈ L_{NN_θ̂} ∩ q. (13)

The weights θ^(K) solved from (13) are then used to replace the currently trained output layer weights θ̂^(K). These two processes, i.e., training and projection, can be done either in an alternating way, or by projecting only once at the end of training. In the results section, we show that the trained NNs perform well even with just a single projection at the end. Notice that the optimization problem (13) could be infeasible, which can be resolved by improving the hidden layer weights with more training effort.

Example:
We illustrate the optimization problem (13) with a toy example. Consider a neural network NN_θ : R² → R with a single hidden layer of two neurons:

h_1 = max{0, W^(1)_11 x_1 + W^(1)_12 x_2},
h_2 = max{0, W^(1)_21 x_1 + W^(1)_22 x_2},

and the output layer u = W^(2)_11 h_1 + W^(2)_12 h_2. Based on the activation pattern of the two hidden layer neurons, the NN has four linear regions R_1, . . . , R_4, and the corresponding CPWA function NN_θ can be written as:

NN_θ(x) =
  (W^(1)_11 W^(2)_11 + W^(1)_21 W^(2)_12) x_1 + (W^(1)_12 W^(2)_11 + W^(1)_22 W^(2)_12) x_2, if x ∈ R_1
  W^(1)_11 W^(2)_11 x_1 + W^(1)_12 W^(2)_11 x_2, if x ∈ R_2
  W^(1)_21 W^(2)_12 x_1 + W^(1)_22 W^(2)_12 x_2, if x ∈ R_3
  0, if x ∈ R_4

Consider the controller partition P⋆ given by K_1 ∈ [K_1^low, K_1^up] and K_2 ∈ [K_2^low, K_2^up]. Suppose the abstract state q intersects R_1 and R_2; then problem (13) can be written as:

min_{W^(2)_11, W^(2)_12} (W^(2)_11 − Ŵ^(2)_11)² + (W^(2)_12 − Ŵ^(2)_12)²
s.t. Ŵ^(1)_11 W^(2)_11 + Ŵ^(1)_21 W^(2)_12 ∈ [K_1^low, K_1^up],
     Ŵ^(1)_11 W^(2)_11 ∈ [K_1^low, K_1^up],
     Ŵ^(1)_12 W^(2)_11 + Ŵ^(1)_22 W^(2)_12 ∈ [K_2^low, K_2^up],
     Ŵ^(1)_12 W^(2)_11 ∈ [K_2^low, K_2^up].

Since all the hidden layer weights Ŵ^(1)_ij are fixed as given by the training, this yields a quadratic program.

The following result shows the safety guarantee provided by projecting weights during training:

Theorem 4.4:
Given an abstract state q ∈ X_safe and an arbitrary controller partition P⋆ ∈ P_safe(q), let the weight assignment θ of NN_q be computed by (13); then the resulting NN_q = NN_θ is guaranteed to be safe at q, i.e., NN_q, q |= φ_safety.

Proof: Notice that the subset of intersected regions L_{NN_θ̂} ∩ q only depends on the hidden layer weights, which remain unchanged during the projection. Then, the constraints in problem (13), along with Theorem 4.2, directly lead to the result.

V. EXTENSION TO LIVENESS PROPERTY
A. Controller Partition Assignment
Among all the safe controller partitions in P safe ( q ) , we as-sign one of them P (cid:63) ∈ P safe ( q ) to each abstract state q ∈ X safe ,by taking into account the liveness specification φ liveness . Theliveness property requires that the nonlinear system (2) canreach the goal X goal ⊂ X by using the trained NN controller.Unlike the safety property which can be enforced using theprojection operator, achieving the liveness property dependson practical issues, such as the amount of training data andthe effort spent on training the NNs. To that end, we showthat the safety guarantee provided by our algorithm does notimpede the collection of training data, and hence the NNs canbe trained to satisfy the liveness property by using standardlearning techniques.Since the posterior graph S Post over-approximates the be-havior of the system, a transition from q to q (cid:48) under P doesnot guarantee every state x ∈ q can reach q (cid:48) in one step,by applying input u = K ( x ) with parameter K ∈ P . To capture the liveness property, we introduce the predecessorgraph based on the predecessor operator given in Section IV-A: Definition 5.1: (Predecessor Graph) A predecessor graph isa finite transition system S Pre (cid:44) ( X, X , L, −→ ) , where: • X = X safe ∪ { q goal } ; • X = X safe ; • L = 2 P ; • q l −→ q (cid:48) , if q (cid:54) = q goal and l = {P ∈ P safe ( q ) | q ∩ Pre( q (cid:48) , P ) (cid:54) = ∅} (cid:54) = ∅ .Notice that in the construction of S Pre , we restrict transitionlabels to the safe controller partitions
P ∈ P safe ( q ) at eachstate q ∈ X safe . Let T Pre be the set of all trajectories overthe predecessor graph S Pre , then we use π ( t ) X : ω (cid:55)→ q todenote the map from a trajectory ω ∈ T Pre to the abstractstate q at time step t , and use π ( t ) L : ω (cid:55)→ l to denote the mapfrom ω ∈ T Pre to the label l ∈ P associated to the transitionfrom π ( t ) X ( ω ) to π ( t +1) X ( ω ) . By extending the formulation ofspecification to the abstract state space, a trajectory ω in S Pre satisfies the reach-avoid specification φ , denoted by ω | = φ , if ω reaches the goal state q goal in T steps. Similar to the notationintroduced for the posterior graph, let Next( q, P ) be the setof states that can be reached from q under the partition P inone step: Next( q, P ) (cid:44) { q (cid:48) ∈ X safe ∪{ q goal } | q ∩ Pre( q (cid:48) , P ) (cid:54) = ∅} . (14)At each state q ∈ X safe , our objective is to choose thecandidate controller partition P (cid:63) ∈ P safe ( q ) that can lead mostof the states x ∈ q to the goal. To that end, we restrict ourattention to trajectories in S Pre that progress towards the goal.That is, let | ω | be the length of a trajectory ω ∈ T Pre , and
Dist: X_safe → ℕ map a state q ∈ X_safe to the length of the shortest trajectory from q to the goal in the predecessor graph S_Pre. Then, we use T′_Pre ⊆ T_Pre to denote the subset of trajectories that lead to the goal:

    T′_Pre ≜ {ω ∈ T_Pre | Dist(π_X^(t)(ω)) < Dist(π_X^(t−1)(ω)), t = 1, …, |ω| − 1}.   (15)

Now we can define the subset of abstract states that progress towards the goal, denoted by Q_{q,P} ⊆ Next(q, P), as the set of abstract states along a trajectory ω ∈ T′_Pre that satisfies the given specification:

    Q_{q,P} ≜ {q′ ∈ Next(q, P) | ∃ω ∈ T′_Pre: ω ⊨ φ, π_X^(0)(ω) = q, π_X^(1)(ω) = q′, P ∈ π_L^(0)(ω)}.   (16)

Then, the intersection between the state q and the predecessors of q′ ∈ Q_{q,P} under the controller partition P is given by:

    I_{q,P} ≜ ⋃_{q′ ∈ Q_{q,P}} (q ∩ Pre(q′, P)).   (17)

Intuitively, the set I_{q,P} measures the portion of states x ∈ q that can reach the goal by applying the input u = K(x) with K ∈ P at the first step. Our algorithm then assigns to a state q ∈ X_safe the controller partition P⋆ ∈ P_safe(q) that corresponds to the largest set I_{q,P⋆} among all the sets I_{q,P} for P ∈ P_safe(q). Indeed, and as mentioned in Section III, this procedure only ranks the available choices of controller partitions, and one may need to iterate over the remaining choices using the same heuristic.

B. Selecting Architecture for Local NNs
Given the assigned controller partition P⋆ ∈ P_safe(q) at each state q ∈ X_safe, the next step is to select a NN architecture (number of layers and number of neurons per layer) for each of the local networks NN_q. Such an architecture should be sufficient to implement any CPWA function within the selected controller partition P⋆ ∈ P_safe(q). Unfortunately, such a neural network architecture may not exist without additional assumptions on the controllability of the underlying system, e.g., the existence of a Lipschitz continuous controller that satisfies the specification φ_liveness [34]. We note that the current definition of the predecessor operator Pre can be slightly modified to ensure the existence of such Lipschitz continuous controllers, thanks to the fact that P⋆ represents a set of CPWA functions, which are by definition Lipschitz continuous. For the sake of brevity and due to space constraints, we omit this discussion as it follows arguments similar to those detailed in [34]. Finally, we refer to such an architecture as A_{q,P⋆}.

C. Data Collection and Training of Local NNs
For training the local networks NN_q, we consider two scenarios: the training data are either already available or need to be collected by invoking an expert. In the first scenario, unlike most learning algorithms, which can directly use the available data, our approach needs to take into account the abstract states and controller partitions associated with the data. In case training data are not available, we assume access to an expert for collecting data, with the extra consideration that the collected data need to be aligned with the controller partition assignment. We consider training data of the form {(x, u)}, where each state x ∈ X is associated with an input label u ∈ R^m. Given a collection of data D from trajectories that satisfy the given specification, Algorithm 1 selects a subset of data D_q ⊆ D used for training the local network NN_q at q, without the need to compute predecessors or invoke an expert. Specifically, for each data point (x, u) ∈ D, it determines the corresponding controller partition P ∈ P_safe(q) by solving a linear feasibility problem under the constraints u = K(x), K ∈ P (one data point (x, u) may correspond to multiple partitions P) (line 7 in Algorithm 1). After classifying the subset of data D_{q,P} associated with each controller partition P, the algorithm selects the P⋆ ∈ P_safe(q) that corresponds to the largest number of available data points (lines 9-13 in Algorithm 1).

Assuming access to an expert, Algorithm 2 collects training data that are consistent with the assigned controller partitions. It first computes the intersection between the state q and the predecessors of the states q′ ∈ Q_{q,P} (lines 2-8 in Algorithm 2). Then, the algorithm selects the P⋆ ∈ P_safe(q) corresponding to the largest intersection with the predecessors (lines 9-12 in Algorithm 2).
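To make the classification step concrete, the following Python sketch captures the spirit of Algorithm 1 under two simplifying assumptions of ours (not part of the framework): the input u is scalar, and each controller partition is a hyperrectangle over the entries of an affine controller u = Kx + b, so the feasibility test of line 7 reduces to an interval check instead of a generic linear program.

```python
def feasible(x, u, box, tol=1e-9):
    """Check whether some affine controller u = K x + b with (K, b) in the
    hyperrectangle `box` reproduces the labeled input u (line 7 of Alg. 1).
    box = (K_lo, K_hi, b_lo, b_hi); K_lo, K_hi range over the state dims."""
    K_lo, K_hi, b_lo, b_hi = box
    # K x + b is separable in the entries of K, so its range over the box
    # is obtained term by term.
    lo = b_lo + sum(min(kl * xi, kh * xi) for kl, kh, xi in zip(K_lo, K_hi, x))
    hi = b_hi + sum(max(kl * xi, kh * xi) for kl, kh, xi in zip(K_lo, K_hi, x))
    return lo - tol <= u <= hi + tol

def classify_data(in_state, partitions, data):
    """Algorithm 1 (sketch): split D into per-partition datasets and return
    the partition with the largest number of consistent data points."""
    per_partition = {name: [] for name in partitions}
    for x, u in data:
        if not in_state(x):          # keep only data points with x in q
            continue
        for name, box in partitions.items():
            if feasible(x, u, box):  # one point may match several partitions
                per_partition[name].append((x, u))
    best = max(per_partition, key=lambda name: len(per_partition[name]))
    return per_partition[best], best
```

The interval check is exact for hyperrectangular partitions; for general polytopic partitions, line 7 would instead call an LP feasibility solver.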
With access to an expert, Algorithm 2 then samples states x ∈ q and synthesizes one-step transitions to reach abstract states in Q_{q,P⋆}, using an affine controller with parameter K ∈ P⋆ (lines 13-15 in Algorithm 2).

With the collection of training data D_q aligned with the controller partition assignment P⋆ ∈ P_safe(q) at each abstract state q ∈ X_safe, the local NNs can be trained to satisfy the liveness specification φ_liveness, along with the safety guarantee enforced by projecting the weights at the end of training.

Algorithm 1
CLASSIFY-DATA(q, P_safe(q), D)
1:  D_q = list(), P⋆ = None, current = 0
2:  for P ∈ P_safe(q) do
3:      D_{q,P} = list()
4:  for (x, u) ∈ D do
5:      if x ∈ q then
6:          for P ∈ P_safe(q) do
7:              if u = K(x) for some K ∈ P then
8:                  D_{q,P}.append((x, u))
9:  for P ∈ P_safe(q) do
10:     if |D_{q,P}| > current then
11:         current = |D_{q,P}|
12:         D_q = D_{q,P}
13:         P⋆ = P
14: return D_q, P⋆

Algorithm 2
COLLECT-DATA(q, P_safe(q), S_Pre)
1:  D_q = list(), P⋆ = None, current = 0
2:  for P ∈ P_safe(q) do
3:      I_{q,P} = set(), Q_{q,P} = set()
4:      for q′ ∈ Next(q, P) do
5:          if ∃ω ∈ T′_Pre: ω ⊨ φ, π_X^(0)(ω) = q, π_X^(1)(ω) = q′, P ∈ π_L^(0)(ω) then
6:              isect = q ∩ Pre(q′, P)
7:              I_{q,P} = I_{q,P} ∪ isect
8:              Q_{q,P} = Q_{q,P} ∪ {q′}
9:  for P ∈ P_safe(q) do
10:     if size(I_{q,P}) > current then
11:         P⋆ = P
12:         current = size(I_{q,P})
13: for sample x ∈ q do
14:     u = expert(x, P⋆, Q_{q,P⋆})
15:     D_q.append((x, u))
16: return D_q, P⋆

D. Combined NN Controller
The final step of our framework is to combine all the local networks NN_q into one global NN controller. Figure 2(a) shows the overall structure of the global NN controller obtained by combining modules [NN_q]_M that correspond to the local networks NN_q. As input to the NN controller, the state x ∈ X is fed into all the local networks, and the output of the NN controller is the summation of all the local network outputs. In the figure, we show a single output for simplicity, but the construction easily extends to multiple outputs (indeed, even in the figures, u and u_q can be thought of as vectors in R^m, and the summation and product operations then correspond to vector addition and scalar multiplication, respectively).

Each module [NN_q]_M consists of two parts: a logic component and a ReLU NN. The logic component decides whether the current state x is in the abstract state q associated with [NN_q]_M, and outputs 1 if the answer is affirmative and 0 otherwise. The ReLU NN is the neural network trained for this abstract state q. By multiplying the outputs of the logic component and the ReLU NN, the output of the module [NN_q]_M is identical to the output of the ReLU NN if x ∈ q, and zero otherwise.

Fig. 2: (a) The combined NN controller consists of one module [NN_{q_i}]_M for each abstract state q_i ∈ X_safe, where N′ = |X_safe|. (b) An example of the module [NN_q]_M for a rectangular abstract state q ⊂ R².

Figure 2(b) shows an example of the module [NN_q]_M, where we choose an arbitrary rectangular abstract state q ⊂ R². The logic component in each module [NN_q]_M can be implemented as a single-layer NN with fixed weights. Given the H-representation Ax ≤ c of the state q, the weight matrix and the bias vector associated with the single-layer NN are W^(1) = −A and b^(1) = c, respectively. Essentially, this choice of weights encodes one hyperplane inequality of the H-representation into each neuron of the single layer.
To represent whether an inequality holds, we use a step function as the nonlinear activation function of the single layer:

    Step(x) = 1 if x ≥ 0, and 0 otherwise.   (18)

The product of the outputs of all the neurons in the single layer is computed at the end (by the product operator Π), and hence the logic component returns 1 if and only if all the hyperplane inequalities are satisfied. We refer to the architecture of the logic component as A_{q,Π} and to the architecture of the whole module [NN_q]_M as A_q = [A_{q,P⋆} ‖ A_{q,Π}], where ‖ denotes the parallel composition of the ReLU NN and the logic component. Using the same notation, we can define the architecture of the global NN as A = A_{q_1} ‖ … ‖ A_{q_{N′}}. The guarantees of the combined NN controller can now be summarized as follows; the proof follows directly from the discussion above along with Theorem 4.2, Lemma 4.3, and Theorem 4.4:
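As an illustration, the gating construction above can be written out in a few lines of NumPy; the boxes, the ReLU networks, and the function names below are illustrative stand-ins for the trained components, not our implementation.

```python
import numpy as np

def logic(x, A, c):
    """Single-layer 'logic' network for the H-representation A x <= c:
    with W1 = -A and b1 = c, neuron i fires iff (A x)_i <= c_i, and the
    product over neurons is 1 iff x lies in the abstract state q."""
    pre = -A @ x + c                    # pre_i >= 0  iff  (A x)_i <= c_i
    step = (pre >= 0).astype(float)     # Step activation, Eq. (18)
    return np.prod(step)                # product operator Π

def module_output(x, A, c, relu_nn):
    """Module [NN_q]_M: the local ReLU network gated by the logic block,
    so the module outputs relu_nn(x) if x is in q and 0 otherwise."""
    return logic(x, A, c) * relu_nn(x)

def global_controller(x, modules):
    """Combined controller: the sum of all module outputs (Fig. 2a).
    modules is a list of (A, c, relu_nn) triples, one per abstract state."""
    return sum(module_output(x, A, c, nn) for (A, c, nn) in modules)
```

Since the abstract states partition X_safe, at most one logic block fires for any state, and the summation simply selects the active local network.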
Theorem 5.2:
Consider the nonlinear system (2) and the reach-avoid specification φ = φ_safety ∧ φ_liveness. Let the controller partition assignment P⋆_1, …, P⋆_{N′} and the local neural networks NN_{q_1}, …, NN_{q_{N′}} satisfy the following conditions:
• the assignment P⋆_i satisfies P⋆_i ∈ P_safe(q_i);
• the NN architecture A_{q_i} satisfies A_{q_i} = [A_{q_i,P⋆_i} ‖ A_{q_i,Π}];
• the NN weights θ_i of NN_{q_i} are projected onto P⋆_i using the projection operator (13).
Then, the neural network NN with architecture A = A_{q_1} ‖ … ‖ A_{q_{N′}} and local networks NN_{q_1}, …, NN_{q_{N′}} satisfies NN, X_init ⊨ φ_safety with X_init = ⋃_{q ∈ X_safe} q. Moreover, if all the local networks NN_{q_i} satisfy:

    Reach(q_i, NN_{q_i}) ⊆ ⋃_{q′ ∈ Q_{q_i,P⋆_i}} q′,   (19)

where Q_{q_i,P⋆_i} is defined as in (16) and Reach(q, NN_q) is defined as:

    Reach(q, NN_q) ≜ {f(x, NN_q(x)) | x ∈ q},   (20)

then: NN, X_init ⊨ φ_liveness.

In words, Theorem 5.2 guarantees that any global NN composed from provably safe local networks is still safe (i.e., satisfies φ_safety). This is a reflection of the fact that the composition of the global network respects the linear regions on which the local networks are defined. Moreover, if each of the local NNs satisfies the local reachability property in (19), then the global NN satisfies the liveness property φ_liveness.
This is a reflection of the fact that the set Q_{q,P} in (16) is defined to guarantee progress towards the goal.

In practice, combining the local NNs into a single controller allows one to repair the NN controller in a systematic way when it fails to meet the local liveness property (19). Specifically, upon observing that the behavior of the system is not as expected at a certain abstract state q ∈ X_safe, only the local neural network NN_q needs to be improved, e.g., by further training with augmented data collected at the state q, without affecting the NNs that satisfy the specification at the other abstract states.

VI. EXPERIMENTAL RESULTS
We implemented the proposed framework and evaluated both the resulting control performance and the scalability of the proposed algorithm. We used TIRA [35] to compute the reachable sets and FORCES [36], [37] to collect training data, as shown in line 14 of Algorithm 2. All experiments were executed on an Intel Core i9 2.4 GHz processor with 32 GB of memory.
A. Controller Performance Comparison: Provably Correct NN Controllers vs. Standard Training Techniques
We first present trajectories of a wheeled robot under the control of NN controllers trained by our algorithm. Let the state vector of the system be x = [ζ_x, ζ_y, θ]^⊤ ∈ X ⊂ R³, where ζ_x, ζ_y denote the coordinates of the robot, and θ is the heading direction. The discrete-time dynamics of the robot are given by:

    ζ_x^(t+Δt) = ζ_x^(t) + Δt v cos(θ^(t))
    ζ_y^(t+Δt) = ζ_y^(t) + Δt v sin(θ^(t))   (21)
    θ^(t+Δt)  = θ^(t) + Δt u^(t)

where the control input u^(t) ∈ R is determined by a ReLU NN controller, i.e., u^(t) = NN(x^(t)), NN ∈ P_{K×b}, with the controller space P_{K×b} considered to be a hyperrectangle. We choose a fixed discrete time step size Δt.

Fig. 3: Workspace 1 (upper row) and workspace 2 (lower row) are partitioned into abstract states (dashed lines), either uniformly or non-uniformly. Trajectories starting from different initial states satisfy both the safety specification φ_safety (blue areas are obstacles) and the liveness specification φ_liveness of reaching the goal (green area).

We considered two different workspaces, indexed 1 and 2 and shown in the upper and lower rows of Figure 3, respectively. As the first step of our algorithm, we discretized the state space X ⊂ R³ and the controller space P_{K×b} as described in Section IV-A. To illustrate the flexibility in the choice of partition strategies, we partitioned the state space corresponding to workspace 1 uniformly into abstract states, while partitioning the state space corresponding to workspace 2 non-uniformly. In both cases, the range of the heading direction θ is uniformly partitioned into intervals, and the partitions of the ζ_x, ζ_y dimensions are shown as the dashed lines in the workspaces in Figure 3.
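For concreteness, the discrete-time model (21) can be simulated in a few lines of Python; the speed v, the step size Δt, and the zero controller used below are illustrative placeholders, not the values used in our experiments.

```python
import math

def robot_step(state, u, v=1.0, dt=0.1):
    """One step of the discrete-time unicycle dynamics (21);
    v and dt are illustrative values, not the experimental settings."""
    zx, zy, theta = state
    return (zx + dt * v * math.cos(theta),
            zy + dt * v * math.sin(theta),
            theta + dt * u)

def rollout(state, controller, steps, v=1.0, dt=0.1):
    """Roll out a (hypothetical) state-feedback controller for T steps."""
    traj = [state]
    for _ in range(steps):
        state = robot_step(state, controller(state), v, dt)
        traj.append(state)
    return traj
```

In the closed loop of Section V, `controller` would be the combined NN controller restricted to the abstract state containing the current state.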
We uniformly partitioned P_{K×b} into hyperrectangles. By computing the reachable sets using the reachability tool TIRA [35], we constructed the posterior graph S_Post, which is then used to find the set of safe abstract states X_safe and the function P_safe. The number of safe abstract states |X_safe| for each workspace is reported in Table I. Notice that not all abstract states are safe. Indeed, abstract states that are next to the obstacles with the heading direction θ pointing towards the obstacles are inevitably unsafe, since the control input u cannot affect the coordinates ζ_x, ζ_y within one time step. The execution times to compute the posterior graph and to identify the set of safe states can also be found in Table I.

We collected training data by following Algorithm 2 in Section V-A. We used Keras [38] to train a shallow NN (one hidden layer) for each abstract state q ∈ X_safe. At the end of training, we projected the trained NN weights only once, as mentioned in Section IV-D. For each workspace, we measured the execution time to collect all the training data and to train all the local NNs, including the projection of the NN weights.

In Figure 3, we show trajectories under the NN controllers trained by our algorithm in both workspaces. Even though we choose trajectories with initial states in the set X_init = ⋃_{q ∈ X_safe} q that are close to the obstacles or initially heading towards the obstacles, all the trajectories are collision-free, as guaranteed by our algorithm. Moreover, by assigning controller partitions based on the strategies in Section V-A, all trajectories satisfy the liveness specification φ_liveness.

Next, we compare NN controllers trained by our algorithm with those trained by standard imitation learning, which minimizes the regression loss without taking the safety guarantee into account.
All NN controllers are trained using the same set of training data. We vary the NN architectures for the NNs trained by standard imitation learning to achieve better performance, and train them using enough episodes for the loss to become sufficiently low.

Fig. 4: The upper row shows trajectories resulting from NN controllers trained using standard imitation learning with three different architectures (varying the number of hidden layers and neurons per layer). The lower row shows trajectories resulting from NN controllers trained using our algorithm. With the same initial states (two sub-figures in the same column), only the NN controllers trained by our algorithm result in collision-free trajectories.

Nevertheless, as shown in Figure 4, for all the NN controllers trained by standard imitation learning, we are able to find initial states from which the resulting trajectories are unsafe. However, with the same initial states, the trajectories under the NN controllers trained by our algorithm are collision-free, as guaranteed by our framework.

B. Scalability Study
1- Scalability with respect to Partition Granularity:
Our algorithm computes the reachable sets of abstract states under different controller partitions using reachability tools, which may employ conservative over-approximations for nonlinear systems. In that case, identifying the set of safe abstract states needs finer partitioning of the state space X and the controller space P_{K×b}. To that end, we show the scalability of our algorithm with respect to the choice of partition parameters. Using the two workspaces in Figure 3, we increase the number of abstract states and controller partitions by decreasing the partition grid size, and report the execution time for each part of our framework in Table I. As shown in the table, with finer partitioning of the state space (more abstract states), the number of safe abstract states increases. In this example, finer partitioning of the controller space does not lead to more safe abstract states, since the controller partitions are already small enough and, as mentioned above, some abstract states are inevitably unsafe. Moreover, we notice that the execution time grows linearly with the number of abstract states and the number of controller partitions.

TABLE I: Scalability with respect to Partition Granularity. Columns: workspace index; number of abstract states; number of controller partitions; number of safe and reachable abstract states; and the execution times [s] to compute the reachable sets, construct the posterior graph, compute the function P_safe, and assign the controller partitions.

Although we conducted all the experiments on a single CPU core, we note that our algorithm is highly parallelizable. For example, computing the reachable sets of the abstract states, checking the intersections between the posteriors and the abstract states when constructing the posterior graph, and training the local neural networks NN_q can all be done in parallel. After training the NN controller, the execution time of the controller itself is almost instantaneous, which is a major advantage of NN controllers.
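The parallelization claim follows from the fact that the per-pair (q, P) reachability queries are mutually independent; a minimal sketch using Python's standard worker pools is shown below, where `compute_posterior` is a placeholder for a call into a reachability tool such as TIRA, not its actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_posterior(args):
    """Placeholder for one reachability query; in a real implementation
    this call would be delegated to an interval reachability tool (TIRA)."""
    q, P = args
    return ("posterior", q, P)   # stand-in for the computed reachable set

def parallel_posteriors(abstract_states, partitions, workers=8):
    """The (q, P) queries are independent, so the posterior graph
    construction can be distributed over a pool of workers."""
    jobs = [(q, P) for q in abstract_states for P in partitions]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(jobs, pool.map(compute_posterior, jobs)))
```

For CPU-bound reachability computations, a process pool (or separate tool instances) would replace the thread pool; the structure of the fan-out is the same.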
2- Scalability with respect to System Dimension:
Abstraction-based controller design is known to be computationally expensive for high-dimensional systems due to the curse of dimensionality. In Table II, we show the scalability of our algorithm with respect to the system dimension. To conveniently increase the system dimension, we consider a chain of integrators represented as the linear system x^(t+1) = Ax^(t) + Bu^(t),

TABLE II: Scalability with respect to System Dimension
Columns: system dimension n; number of abstract states; and the execution times [s] to compute the reachable sets and construct the posterior graph.

where A ∈ R^{n×n} is the identity matrix, and u^(t) ∈ R. With a fixed number of controller partitions and a fixed partition grid size for the abstract states, Table II shows that the number of abstract states and the execution time grow exponentially with the system dimension n. Nevertheless, our algorithm can handle a high-dimensional system in a reasonable amount of time.

REFERENCES

[1] W. Saunders, G. Sastry, A. Stuhlmueller, and O. Evans, "Trial without error: Towards safe reinforcement learning via human intervention," in
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018, pp. 2067–2069.
[2] A. Liu, G. Shi, S.-J. Chung, A. Anandkumar, and Y. Yue, "Robust regression for safe exploration in control," arXiv preprint arXiv:1906.05819, 2019.
[3] F. Berkenkamp, A. Krause, and A. P. Schoellig, "Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics," arXiv preprint arXiv:1602.04450, 2016.
[4] P. Pauli, A. Koch, J. Berberich, and F. Allgöwer, "Training robust neural networks using Lipschitz bounds," arXiv preprint arXiv:2005.02929, 2020.
[5] C. Gaskett, "Reinforcement learning under circumstances beyond its control," 2003.
[6] T. M. Moldovan and P. Abbeel, "Safe exploration in Markov decision processes," arXiv preprint arXiv:1205.4810, 2012.
[7] M. Turchetta, F. Berkenkamp, and A. Krause, "Safe exploration in finite Markov decision processes with Gaussian processes," in Advances in Neural Information Processing Systems, 2016, pp. 4312–4320.
[8] L. Wen, J. Duan, S. E. Li, S. Xu, and H. Peng, "Safe reinforcement learning for autonomous vehicles through parallel constrained policy optimization," arXiv preprint arXiv:2003.01303, 2020.
[9] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, "Safe model-based reinforcement learning with stability guarantees," in Advances in Neural Information Processing Systems, 2017.
[10] Y. Chow, O. Nachum, A. Faust, E. Duenez-Guzman, and M. Ghavamzadeh, "Lyapunov-based safe policy optimization for continuous control," arXiv preprint arXiv:1901.10031, 2019.
[11] Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh, "A Lyapunov-based approach to safe reinforcement learning," in Advances in Neural Information Processing Systems, 2018, pp. 8092–8101.
[12] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, "Learning-based model predictive control for safe exploration." IEEE, 2018.
[13] X. Sun, H. Khedr, and Y. Shoukry, "Formal verification of neural network controlled autonomous systems," in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019, pp. 147–156.
[14] S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari, "Output range analysis for deep feedforward neural networks," in NASA Formal Methods Symposium. Springer, 2018.
[15] C. Liu, T. Arnon, C. Lazarus, C. Barrett, and M. J. Kochenderfer, "Algorithms for verifying deep neural networks," arXiv preprint arXiv:1903.06758, 2019.
[16] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, "Efficient and accurate estimation of Lipschitz constants for deep neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 11423–11434.
[17] W. Xiang, D. M. Lopez, P. Musau, and T. T. Johnson, "Reachable set estimation and verification for neural network models of nonlinear dynamic systems," in Safe, Autonomous and Intelligent Vehicles. Springer, 2019, pp. 123–144.
[18] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee, "Verisig: verifying safety properties of hybrid systems with neural network controllers," in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019, pp. 169–178.
[19] A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, "Reachability-based safe learning with Gaussian processes." IEEE, 2014, pp. 1424–1431.
[20] V. Govindarajan, K. Driggs-Campbell, and R. Bajcsy, "Data-driven reachability analysis for human-in-the-loop systems." IEEE, 2017, pp. 2617–2622.
[21] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, "A general safety framework for learning-based control in uncertain robotic systems," IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018.
[22] J. Ferlez, M. Elnaggar, Y. Shoukry, and C. Fleming, "ShieldNN: A provably safe NN filter for unsafe NN controllers," arXiv preprint arXiv:2006.09564, 2020.
[23] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, "End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3387–3395.
[24] K. P. Wabersich and M. N. Zeilinger, "Scalable synthesis of safety certificates from data with application to learning-based control." IEEE, 2018, pp. 1691–1697.
[25] M. Srinivasan, A. Dabholkar, S. Coogan, and P. Vela, "Synthesis of control barrier functions using a supervised machine learning approach," arXiv preprint arXiv:2003.04950, 2020.
[26] A. J. Taylor, A. Singletary, Y. Yue, and A. D. Ames, "A control barrier perspective on episodic learning via projection-to-state safety," arXiv preprint arXiv:2003.08028, 2020.
[27] X. Li and C. Belta, "Temporal logic guided safe reinforcement learning using control barrier functions," arXiv preprint arXiv:1903.09885, 2019.
[28] R. Cheng, M. J. Khojasteh, A. D. Ames, and J. W. Burdick, "Safe multi-agent interaction through robust control barrier functions with learned uncertainties," arXiv preprint arXiv:2004.05273, 2020.
[29] L. Wang, E. A. Theodorou, and M. Egerstedt, "Safe learning of quadrotor dynamics using barrier certificates." IEEE, 2018, pp. 2460–2465.
[30] A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, and N. Matni, "Learning control barrier functions from expert demonstrations," arXiv preprint arXiv:2004.03315, 2020.
[31] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, "Neural lander: Stable drone landing control using learned dynamics." IEEE, 2019, pp. 9784–9790.
[32] R. Pascanu, G. Montufar, and Y. Bengio, "On the number of response regions of deep feed forward networks with piece-wise linear activations," arXiv preprint arXiv:1312.6098, 2013.
[33] B. Yordanov, J. Tumova, I. Cerna, J. Barnat, and C. Belta, "Temporal logic control of discrete-time piecewise affine systems," IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1491–1504, 2012.
[34] J. Ferlez, X. Sun, and Y. Shoukry, "Two-level lattice neural network architectures for control of nonlinear systems," arXiv preprint arXiv:2004.09628, 2020.
[35] P.-J. Meyer, A. Devonport, and M. Arcak, "TIRA: toolbox for interval reachability analysis," in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019, pp. 224–229.
[36] A. Domahidi and J. Jerez, "FORCES Professional," Embotech AG, https://embotech.com/FORCES-Pro, 2014–2019.
[37] A. Zanelli, A. Domahidi, J. Jerez, and M. Morari, "FORCES NLP: an efficient implementation of interior-point methods for multistage nonlinear nonconvex programs," International Journal of Control, pp. 1–17, 2017.
[38] F. Chollet et al., "Keras," https://keras.io, 2015.