Provably Correct Training of Neural Network Controllers Using Reachability Analysis
Xiaowu Sun, Yasser Shoukry

Abstract—In this paper, we consider the problem of training neural network (NN) controllers for cyber-physical systems (CPS) that are guaranteed to satisfy safety and liveness properties. Our approach combines model-based design methodologies for dynamical systems with data-driven approaches to achieve this target. Given a mathematical model of the dynamical system, we compute a finite-state abstract model that captures the closed-loop behavior under all possible neural network controllers. Using this finite-state abstract model, our framework identifies the subset of NN weights that are guaranteed to satisfy the safety requirements. During training, we augment the learning algorithm with a NN weight projection operator that enforces the resulting NN to be provably safe. To account for the liveness properties, the proposed framework uses the finite-state abstract model to identify candidate NN weights that may satisfy the liveness properties, and biases the NN training toward such candidates. The guarantees above cannot be ensured without correctness guarantees on the NN architecture, which controls the NN's expressiveness. Therefore, a cornerstone of the proposed framework is the ability to select provably correct NN architectures automatically.
I. INTRODUCTION
The last decade has witnessed tremendous success in using machine learning (ML) in a multitude of safety-critical cyber-physical systems domains, such as autonomous vehicles, drones, and smart cities. Indeed, end-to-end learning is attractive for the realization of feedback controllers for such complex cyber-physical systems, thanks to the appeal of designing control systems based on purely data-driven architectures. However, regardless of the explosion in the use of machine learning to design data-driven feedback controllers, providing formal safety and reliability guarantees for these ML-based controllers remains an open question. It is then unsurprising to see the recent focus in the literature on the problem of safe and trustworthy autonomy in general, and safe reinforcement learning in particular.

The literature on the safe design of ML-based controllers for dynamical and hybrid systems can be classified according to three broad approaches, namely (i) incorporating safety in the training of ML-based controllers, (ii) post-training verification of ML-based controllers, and (iii) online validation of safety and control intervention. Representative examples of the first approach include reward shaping [1], Bayesian and robust regression [2], [3], [4], and policy optimization with constraints [5], [6], [7], [8]. Unfortunately, this approach does not provide provable guarantees on the safety of the trained controller.

Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697 USA {xiaowus,yshoukry}@uci.edu. This work was partially sponsored by the NSF awards.
Other techniques in this domain include Lyapunov methods [9], [10], [11] and safe model predictive control [12], which focus mainly on providing stability guarantees rather than general safety and liveness guarantees.

To provide strong safety and reliability guarantees, several works in the literature focus on applying formal verification techniques (e.g., model checking) to verify pre-trained ML-based controllers' formal safety properties. Representative examples of this approach are the use of SMT-like solvers [13], [14], [15] and hybrid-system verification [16], [17], [18]. However, these techniques only assess a given ML-based controller's safety rather than design or train a safe agent.

Due to the lack of safety guarantees on the resulting ML-based controllers, researchers proposed several techniques to restrict the output of the ML-based controller to a set of safe control actions. Such a set of safe actions can be obtained through reachability analysis [19], [20], [21], barrier certificates [22], [23], [24], [25], [26], [27], [28], [29], [30], and online learning of uncertainties [31]. Unfortunately, methods of this type suffer from being computationally expensive, are specific to certain controller structures, or else employ training algorithms that require assumptions on the system model.

This paper proposes a principled framework combining model-based control and data-driven neural network training to achieve enhanced reliability and verifiability. Our framework bridges ideas from reachability analysis to guide and bias the neural network controller's design and training and is capable of supplying strong reliability guarantees. To that end, and starting from a nonlinear model of the system, we compute a finite-state abstract model capable of capturing the closed-loop behavior under all neural network controllers. Such a finite-state abstract model can be computed using a direct extension of existing reachability tools.
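The abstraction step just described can be sketched for a toy one-dimensional system. Everything below is an illustrative assumption (the scalar dynamics `f`, the corner-based interval over-approximation, the grid of cells, and all names); the paper itself relies on sound reachability tools for general nonlinear systems:

```python
import itertools

def f(x, u):
    # Hypothetical scalar dynamics standing in for the nonlinear model:
    # x(t+1) = x + 0.1*u. Chosen for illustration only.
    return x + 0.1 * u

def post_interval(x_lo, x_hi, k_lo, k_hi):
    """Over-approximate the one-step reachable interval of the cell
    [x_lo, x_hi] under every linear feedback u = K*x with K in
    [k_lo, k_hi]. Corner evaluation is exact here because f(x, K*x)
    is monotone on these ranges; a general nonlinear system needs a
    sound reachability tool instead."""
    corners = [f(x, k * x)
               for x, k in itertools.product((x_lo, x_hi), (k_lo, k_hi))]
    return min(corners), max(corners)

def abstract_transitions(cells, partitions):
    """Transitions of the finite-state abstract model: cell i has a
    transition to cell j under controller partition p whenever the
    over-approximated posterior of cell i intersects cell j."""
    trans = set()
    for i, (x_lo, x_hi) in enumerate(cells):
        for p, (k_lo, k_hi) in enumerate(partitions):
            lo, hi = post_interval(x_lo, x_hi, k_lo, k_hi)
            for j, (y_lo, y_hi) in enumerate(cells):
                if lo < y_hi and hi > y_lo:  # open intervals overlap
                    trans.add((i, p, j))
    return trans

cells = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]  # abstract states (grid cells)
partitions = [(-2.0, 0.0), (0.0, 2.0)]        # controller gain partitions
transitions = abstract_transitions(cells, partitions)
```

On this toy grid, cell 0 under the first gain partition can only stay inside itself, while the second partition also allows a transition into cell 1; such labeled transitions are exactly the edges of the abstract model on which unsafe controller partitions are later pruned.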
Next, our framework uses this abstract model to search for safe subsets of neural network weight assignments that are guaranteed to result in a safe controller. During the neural network training, we use a novel projection operator that projects the trained neural network weights onto the subsets found to be safe.

Unlike the safety property, satisfying the liveness property cannot be enforced by projecting the trained neural network weights. Therefore, to account for the liveness properties, our framework utilizes the abstract model further to refine the safe set of neural network weights and find sets of candidate
NN weights that may satisfy the liveness properties. The framework then ranks these candidates and biases the NN training process accordingly until a NN that satisfies the liveness property is obtained. In conclusion, the contributions of this paper can be summarized as follows:
1) An abstraction-based framework that captures the behavior of all possible neural network controllers.
2) A novel NN weight projection operator that can be integrated with any NN training procedure to ensure that the trained NN is provably safe.
3) A procedure to bias the NN training to satisfy the liveness properties.

II. PROBLEM FORMULATION
Notation: The symbols R and N denote the set of real and natural numbers, respectively. The symbols ∧ and =⇒ denote the logical AND and logical IMPLIES, respectively. We use Ψ_CPWA : X → R^m to denote a Continuous and Piece-Wise Affine (CPWA) function of the form:

Ψ_CPWA(x) = K_i x + b_i if x ∈ R_i, i = 1, . . . , L, (1)

where the polytopic sets {R_1, . . . , R_L} form a partition of the set X. We call each polytopic set R_i ⊂ X a linear region, and use L_{Ψ_CPWA} = {R_1, . . . , R_L} to denote the set of linear regions associated with Ψ_CPWA.

A. Dynamical Model, Neural Network Controller, and Specification
Consider discrete-time nonlinear dynamical systems of the form:

x(t+1) = f(x(t), u(t)), (2)

where the state vector x(t) ∈ X ⊂ R^n, the control vector u(t) ∈ U, and t ∈ N. Given a feedback control law Ψ : X → U, we use ξ_{x,Ψ} : N → X to denote the closed-loop trajectory of (2) that starts from the state x ∈ X and evolves under the control law Ψ.

In this paper, our primary focus is on controlling the nonlinear system (2) with a state-feedback neural network controller NN : X → U. A K-layer Rectified Linear Unit (ReLU) NN is specified by composing K layer functions (or just layers). A layer l with i_l inputs and o_l outputs is specified by a weight matrix W^(l) ∈ R^{o_l × i_l} and a bias vector b^(l) ∈ R^{o_l} as follows:

L_{θ^(l)} : z ↦ max{W^(l) z + b^(l), 0}, (3)

where the max function is taken element-wise, and θ^(l) ≜ (W^(l), b^(l)) for brevity. Thus, a K-layer ReLU NN is specified by K layer functions {L_{θ^(l)} : l = 1, . . . , K} whose input and output dimensions are composable: that is, they satisfy i_l = o_{l−1}, l = 2, . . . , K. Specifically:

NN_θ(x) = (L_{θ^(K)} ◦ L_{θ^(K−1)} ◦ · · · ◦ L_{θ^(1)})(x), (4)

where we index a ReLU NN function by a list of matrices θ ≜ (θ^(1), . . . , θ^(K)). Also, it is common to allow the final layer function to omit the max function altogether, and we will be explicit about this when it is the case.

Specifying the number of layers and the dimensions of the associated matrices θ^(l) = (W^(l), b^(l)) specifies the architecture of the ReLU NN. Therefore, we use:

A ≜ ((n, o_1), (i_2, o_2), . . . , (i_{K−1}, o_{K−1}), (i_K, m)) (5)

to denote the architecture of a ReLU NN. Given a NN architecture A, we denote by Θ_A the set of all lists of matrices θ that satisfy the number of layers and dimensions required by A:

Θ_A ≜ {θ = (θ^(1), . . . , θ^(K)) | θ^(l) = (W^(l), b^(l)), W^(l) ∈ R^{o_l × i_l}, b^(l) ∈ R^{o_l}}. (6)

As a typical control task, this paper considers a reach-avoid specification φ, which combines a safety property φ_safety for avoiding a set of obstacles {O_1, . . . , O_o} with O_i ⊂ X, and a liveness property φ_liveness for reaching a goal region X_goal ⊂ X in a bounded time horizon T. We use ξ_{x,Ψ} |= φ_safety and ξ_{x,Ψ} |= φ_liveness to denote that a trajectory ξ_{x,Ψ} satisfies the safety and liveness specifications, respectively, i.e.:

ξ_{x,Ψ} |= φ_safety ⟺ ∀t ∈ N, ∀i ∈ {1, . . . , o}, ξ_{x,Ψ}(t) ∉ O_i,
ξ_{x,Ψ} |= φ_liveness ⟺ ∃t′ ∈ {0, . . . , T}, ξ_{x,Ψ}(t′) ∈ X_goal.

Given a set of initial states X_init, a control law Ψ : X → R^m satisfies the specification φ (denoted by Ψ, X_init |= φ) if all trajectories starting from the set X_init satisfy the specification, i.e., ξ_{x,Ψ} |= φ, ∀x ∈ X_init.

B. Main Problem
Given the dynamical model (2) and a reach-avoid specification φ = φ_safety ∧ φ_liveness, we consider the problem of designing an NN controller with provable guarantees, as described in the next problem.

Problem 2.1: Given the nonlinear dynamical system (2) and a reach-avoid specification φ, compute (i) a ReLU NN architecture A, (ii) an assignment of weights θ ∈ Θ_A, and (iii) a set of initial states X_init ⊆ X, such that NN_θ, X_init |= φ_safety ∧ φ_liveness.

III. FRAMEWORK
Before describing our approach to solve Problem 2.1, we start by recalling the connection between ReLU neural networks and Continuous Piece-Wise Affine (CPWA) functions as follows [32]:

Proposition 3.1: Every R^n → R^m ReLU NN represents a continuous piece-wise affine function.

In this paper, we confine our attention to CPWA controllers (and hence neural network controllers) that are selected from a bounded polytopic set P_K × P_b ⊂ R^{m×n} × R^m, i.e., we assume that K_i ∈ P_K and b_i ∈ P_b.

Our solution to Problem 2.1 is to use the mathematical model of the physical system (2) to guide the design of the NN architecture and bias its training. In particular, our approach is split into two components: one addresses the safety specifications while the other addresses the liveness specifications, as described in the next two subsections.

A. Addressing Safety Specification φ_safety

Our approach to address the safety specification φ_safety is as follows:

• Step 1:
Capture the closed-loop behavior of the system under all CPWA controllers using an abstract model.
• Step 2: Identify a subset of CPWA controllers that lead to correct behavior on the abstract model.
• Step 3: Design a NN architecture that matches the structure of the CPWA controllers that are identified to be correct.
• Step 4:
Enforce the training of the NN to pick from the subset of the CPWA controllers that are identified to be correct.

Figure 1 conceptualizes our framework. To start with, we construct an abstract model by partitioning both the state space X and the set of all allowed CPWA functions P_K × P_b. In Figure 1(b), the state space is partitioned into a set of abstract states X = {q_1, q_2, q_3} such that q_i ⊂ X and X = ⋃_{i∈{1,2,3}} q_i. Similarly, the controller space P_K × P_b is partitioned into a set of abstract controller partitions P = {P_1, P_2} such that P_i ⊆ P_K × P_b and P_K × P_b = ⋃_{i∈{1,2}} P_i. The final abstract model is a non-deterministic finite transition system whose nodes represent abstract states in X and whose transitions are labeled by controller partitions in P. Transitions between the abstract states are computed based on the reachable sets of the nonlinear system (2) from each abstract state and under every controller partition of the CPWA functions.

Based on the abstract model, we compute a function P_safe that maps each abstract state q ∈ X to a subset of the controller partitions (representing a collection of subsets of CPWA functions) that are considered to be safe at that abstract state. For example, in Figure 1(b), since the transition from q_1 labeled by P_2 leads to the obstacle, the controller partition P_2 is unsafe at q_1, and hence P_safe(q_1) = {P_1}. Similarly, P_safe(q_2) = {P_1, P_2}. For the abstract state q_3, since both P_1 and P_2 can lead to the obstacle, P_safe(q_3) is empty, and hence q_3 is considered an unsafe abstract state. The set of initial states that can provide a safety guarantee is the union of the safe abstract states, i.e., X_init = q_1 ∪ q_2.

Using the set of safe controllers captured by P_safe(q), it is direct to show that any neural network whose linear regions are aligned with the abstract states, and whose weights are restricted to those in P_safe(q), will always result in a NN that satisfies the safety specifications. Therefore, we utilize this observation to construct a NN architecture that is guaranteed to be sufficient to implement a safe controller. Moreover, to ensure the safety of the resulting trained neural network, we propose a NN weight "projection" operator to enforce that the trained NN only gives rise to the CPWA functions indicated by P_safe(q).

B. Addressing Liveness Specification φ_liveness

Our approach to addressing the liveness specification φ_liveness (reaching the goal) can be summarized as follows:

• Step 1:
Use the abstract model to identify candidate controller partitions P⋆ that can lead to satisfaction of the liveness properties.
• Step 2: Construct one local neural network NN_q for each of the abstract states. A sufficient architecture for this NN_q can be derived using ideas presented in [34].
• Step 3: Train the local neural networks NN_q using collected data. The data collection can be biased using the knowledge of P⋆ to accelerate the process. We use the NN weight projection operator (discussed in the safety section) during training to ensure that the resulting NN still enjoys the safety guarantees.
• Step 4: Combine all the local neural networks NN_q into a single global NN.

In the context of the example in Figure 1(c), the controller partition P_1 is assigned to both q_1 and q_2 as the candidate controller partition P⋆ that may lead to the satisfaction of the liveness properties.

Next, we train one local neural network NN_q for each abstract state, or for a subset of abstract states assigned the same controller partition, as shown in Figure 1(d). During training of the local neural networks, we project the weights of the neural networks to enforce that the resulting NNs give rise to CPWA functions belonging to the assigned controller partition P⋆. Since the controller partition P⋆ is chosen from the collection P_safe(q), the resulting NN at q enjoys the same safety guarantees.

Finally, in Figure 1(e), we combine the NNs trained for each abstract state into a single NN controller, by using layers with fixed weights to decide which part of the NN should be activated.

C. Formal Guarantees
We highlight that the proposed framework always guarantees that the resulting NN satisfies the safety specification φ_safety, thanks to the NN weight projection operator. This is reflected in Theorem 5.2 discussed in Section V-D.

On the other hand, achieving the liveness specification φ_liveness depends on the quality of the data used to train the neural networks and hence needs an extra step of formally verifying the resulting neural networks and iteratively changing the candidate controller partition P⋆ if needed. However, we argue that the resulting NN architecture is modular and is composed of a set of local networks NN_q that are more amenable to verification. The proposed architecture leads to a direct divide-and-conquer approach in which only local networks NN_q may need to be re-designed and re-trained whenever the liveness properties are not met.

IV. SAFE CONTROLLER PARTITION
In this section, we provide details on how to construct the abstract model that captures the behavior of all CPWA controllers, along with how to choose the controller partitions that satisfy the specification φ.

A. Abstract Model
In order to capture the behavior of the system (2) under all possible CPWA controllers Ψ_CPWA, we construct a finite-state abstract model by discretizing both the state space and the set of all allowed CPWA functions. In particular, we partition the state space X ⊂ R^n into a set of abstract states, denoted by X = {q_1, . . . , q_N}, with each q_i ∈ X being an infinity-norm ball in R^n. The goal region X_goal ⊂ X is represented by a single abstract state q_goal ∈ X, and the set of obstacles X_obst = ⋃_{i=1}^{o} {q_obst_i} represents each obstacle O_i ⊂ X by an abstract state q_obst_i ∈ X, i = 1, . . . , o. Let Int(q) denote the interior of a set q; then the partitioning satisfies X = ⋃_{q∈X} q, and Int(q_i) ∩ Int(q_j) = ∅ if i ≠ j.

Similarly, we partition the controller space into polytopic sets. For simplicity of notation, we define the set of parameters P_{K×b} ⊂ R^{m×(n+1)} to be a polytope that combines P_K and P_b, and with some abuse of notation, we use K_i(x) with a single parameter K_i ∈ P_{K×b} to denote K′_i x + b′_i with the pair (K′_i, b′_i) = K_i. The controller space P_{K×b} ⊂ R^{m×(n+1)} is discretized into a set of polytopic sets in R^{m×(n+1)}, denoted by P = {P_1, . . . , P_M}, such that P_{K×b} = ⋃_{P∈P} P, and Int(P_i) ∩ Int(P_j) = ∅ if i ≠ j. We call each of the subsets P_i ∈ P a controller partition. Each controller partition P ∈ P represents a subset of CPWA functions, obtained by restricting the parameters K_i in a CPWA function to take values from P.

In order to reason about the safety property φ_safety, we introduce a posterior operator based on the dynamical model (2). The posterior of an abstract state q ∈ X under a controller partition P ∈ P is the set of states that can be reached in one step from the states x ∈ q by using an affine state feedback controller with parameters K ∈ P, i.e.:

Post(q, P) ≜ {f(x, K(x)) ∈ R^n | x ∈ q, K ∈ P}. (7)

A nonlinear system's posterior is often over-approximated in practice, and we use Post to denote the corresponding over-approximating operator. Our abstract model is defined by using the set of abstract states X, the set of controller partitions P, and the posterior operator Post.
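For intuition about how Post(q, P) in (7) can be over-approximated, consider the special case of linear dynamics: there, f(x, Kx + b) is bilinear in (x, (K, b)), so over a box-shaped q and a box-shaped partition P the posterior lies in the convex hull of the images of vertex pairs. The sketch below is purely illustrative (the matrices A, B, the box bounds, and the name post_box are all assumptions, not the paper's reachability procedure):

```python
import itertools
import numpy as np

# Hypothetical linear dynamics x(t+1) = A x + B u; linearity is assumed
# here only so that vertex enumeration is sound. The paper's setting is
# a general nonlinear f handled by reachability tools.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

def post_box(q_lo, q_hi, k1_rng, k2_rng, b_rng):
    """Bounding box of Post(q, P): since (x, (K, b)) -> A x + B (K x + b)
    is bilinear, its image over a product of boxes is contained in the
    convex hull of the images of vertex pairs, so scanning all corners
    of q and of the partition P = k1_rng x k2_rng x b_rng suffices."""
    pts = []
    for x1, x2, k1, k2, b in itertools.product(
            (q_lo[0], q_hi[0]), (q_lo[1], q_hi[1]), k1_rng, k2_rng, b_rng):
        x = np.array([x1, x2])
        u = np.array([k1 * x1 + k2 * x2 + b])  # affine feedback u = K x + b
        pts.append(A @ x + B @ u)
    pts = np.array(pts)
    return pts.min(axis=0), pts.max(axis=0)

# Abstract state q = [0, 1]^2; partition P: K in [-1, 1]^2, b in [-0.5, 0.5].
lo, hi = post_box((0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-0.5, 0.5))
```

A transition of the abstract model from q to a neighboring abstract state q′ under P would then be added whenever this bounding box (or a tighter polytopic hull of the vertex images) intersects q′, matching the non-empty-intersection test used to define the abstract model's transitions.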
Intuitively, an abstract state q ∈ X has a transition to q′ ∈ X under a controller partition P ∈ P if the intersection between q′ and Post(q, P) is non-empty.

Definition 4.1 (Posterior Graph): A posterior graph is a finite transition system S_Post ≜ (X, X_0, L, →), where:
• X = X;
• X_0 = X;
• L = 2^P;
• q →_l q′, if q ∉ {q_goal} ∪ X_obst and l = {P ∈ P | q′ ∩ Post(q, P) ≠ ∅} ≠ ∅.

Since the specification φ also requires reaching a goal region X_goal ⊂ X, we introduce a predecessor operator to capture the liveness property. The predecessor of an abstract state q′ ∈ X under a controller partition P ∈ P is defined as the set of states that can reach q′ in one step by using an affine state feedback controller with some parameter K ∈ P:

Pre(q′, P) ≜ {x ∈ R^n | ∃K ∈ P : f(x, K(x)) ∈ q′}. (8)

The computation of the posterior and the predecessor operators can be done by borrowing existing techniques in reachability analysis of polytopic systems [33], with the main difference being the need to consider polytopic partitions of the parameter space P_{K×b} ⊂ R^{m×(n+1)} instead of the well-studied problem of considering polytopic partitions of the input space U ⊂ R^m.

B. Computing Function P_safe

Once the abstract model is computed, our framework identifies a set of safe controller partitions P_safe(q) ⊆ P at each abstract state q ∈ X. It is possible that the set P_safe(q) is empty at some state q ∈ X, in which case the state q is considered to be unsafe.

To start with, we introduce an operator Next on the posterior graph S_Post. Given an arbitrary abstract state q ∈ X and a controller partition P ∈ P, the set Next(q, P) ⊆ X consists of all abstract states that can be reached from q in one step such that the label of the corresponding transition in S_Post contains P:

Next(q, P) ≜ {q′ ∈ X | q′ ∩ Post(q, P) ≠ ∅}. (9)

Similar to standard reachability analysis, a state q ∈ X is considered to be safe if the posteriors computed over multiple steps do not intersect the obstacles. Unique to our problem, the parameter space P_{K×b} ⊂ R^{m×(n+1)} is discretized into a set of controller partitions P, and each state q ∈ X has the freedom to choose one of the controller partitions P ∈ P when computing the posterior. To capture this freedom in the choice of controller partitions, we identify a series of unsafe sets in a recursive manner by backtracking from the set of obstacles X_obst in the posterior graph S_Post:

X^0_unsafe = X_obst
X^1_unsafe = {q ∈ X | ∀P ∈ P : Next(q, P) ∩ X^0_unsafe ≠ ∅} ∪ X^0_unsafe
...
X^k_unsafe = {q ∈ X | ∀P ∈ P : Next(q, P) ∩ X^{k−1}_unsafe ≠ ∅} ∪ X^{k−1}_unsafe

The backtracking stops when it does not find new unsafe states, i.e., X^k_unsafe = X^{k−1}_unsafe. Intuitively, the set of unsafe states X^k_unsafe consists of abstract states that cannot avoid reaching the obstacles by choosing controller partitions from P.

Then, the set of safe states X_safe consists of all the states that can avoid transitioning to the set X^k_unsafe:

X_safe ≜ {q ∈ X \ X^k_unsafe | ∃P ∈ P : Next(q, P) ∩ X^k_unsafe = ∅}. (10)

Correspondingly, the function P_safe maps each abstract state q ∈ X_safe to the subset of controller partitions that can be used to avoid transitioning to X^k_unsafe:

P_safe : q ↦ {P ∈ P | Next(q, P) ∩ X^k_unsafe = ∅}. (11)

The following theorem summarizes the safety property:

Theorem 4.2:
Let Ψ_{q,CPWA} be a state feedback CPWA controller (as defined in (1)) corresponding to an abstract state q ∈ X_safe, with arbitrarily chosen parameters K_i that satisfy K_i ∈ P and P ∈ P_safe(q). Consider a feedback controller Ψ constructed as follows:

Ψ(x) = Ψ_{q,CPWA}(x), ∀x ∈ q, ∀q ∈ X_safe.

Then, with the set of initial states X_init = ⋃_{q∈X_safe} q, the system (2) controlled by Ψ satisfies the safety specification φ_safety, i.e., Ψ, X_init |= φ_safety.

By Theorem 4.2, the system is guaranteed to satisfy the safety specification φ_safety by applying any CPWA controller Ψ_{q,CPWA} at each abstract state q ∈ X_safe, as long as the controller parameters K_i are chosen from the safe controller partitions P ∈ P_safe(q). This allows us to conclude the safety guarantee provided by a NN controller if the CPWA functions represented by the NN are chosen from the safe controller partitions, as we show in detail in Section IV-D.

Proof: When the backtracking stops, for every abstract state q ∈ X \ X^k_unsafe, there exists a controller partition P ∈ P such that Next(q, P) ∩ X^k_unsafe = ∅. By (10), X_safe = X \ X^k_unsafe. Then, by (11), for every state q ∈ X_safe and any partition P ∈ P_safe(q), Next(q, P) ⊆ X_safe. Therefore, by applying an arbitrary controller partition P ∈ P_safe(q) at q ∈ X_safe, none of the resulting trajectories in the posterior graph S_Post can reach an obstacle state q_obst ∈ X_obst. By the construction of the posterior graph S_Post, the safety property provided by P ∈ P_safe(q) remains valid for any CPWA controller with parameters K_i ∈ P.

C. Sufficient NN Architecture for Safety Specifications
Once the set of safe abstract states X_safe and the associated map P_safe are identified, the next step is to construct a neural network architecture that is sufficient to implement a safe controller. This NN will be augmented in Section V to account for the liveness properties as well. The following lemma identifies a sufficient number of linear regions needed by a piece-wise affine function to satisfy the safety property φ_safety. In Section V-D, this number is transformed into a NN architecture that is guaranteed to respect the sufficient number of linear regions needed for the safety guarantees.

Lemma 4.3: There is a piece-wise affine function Ψ : X → U whose number of linear regions is |X_safe|, such that the system (2) controlled by Ψ satisfies the safety property φ_safety.

Proof: By Theorem 4.2, the safety property is satisfied by applying any CPWA function with K_i ∈ P and P ∈ P_safe(q). In particular, we consider the parameter K_i to be fixed at each q, i.e., each abstract state q ∈ X_safe is a linear region. Then, the sufficient number of linear regions is the same as the number of states in X_safe.

D. Safe Training Using NN Weight Projection
As described in Section III, the final NN consists of several local networks NN_q that will be trained to account for the liveness property. Nevertheless, to ensure that the safety property is met by each of the local networks NN_q, we propose a NN weight projection operator that can be incorporated in the training of these local NNs. This operator projects the weights of each NN_q to ensure that the network gives rise to CPWA functions that belong to the controller partition P⋆ ∈ P_safe(q) (the selection of P⋆ based on the liveness property is presented in Section V-A).

To that end, we recall that every local network NN_q represents a CPWA function NN_θ that partitions the state space into a set of linear regions L_{NN_θ} = {R_1, . . . , R_L}, with an affine function parametrized by K_i ∈ R^{m×(n+1)} associated with each linear region R_i, i = 1, . . . , L. During the projection phase, our algorithm identifies the subset of linear regions that intersect with the abstract state q for which the NN is trained:

L_{NN_θ} ∩ q ≜ {R ∈ L_{NN_θ} | q ∩ R ≠ ∅}. (12)

Then, for each affine function with parameter K_i at a region R_i ∈ L_{NN_θ} ∩ q, our approach enforces K_i ∈ P⋆ by adjusting the weights of the trained NN.

The projection of the NN_q weights can be done by solving a convex optimization problem. Consider a K-layer NN with currently trained weights θ̂ = (θ̂^(1), . . . , θ̂^(K)), where we use the hat notation for weights that are given by any training procedure. As a common practice, we consider there to be no ReLU activation function in the output layer of the NN, and hence the subset of intersected regions L_{NN_θ̂} ∩ q ⊆ L_{NN_θ̂} only depends on the hidden layer weights θ̂^(l), l = 1, . . . , K−1. During projection, our algorithm adjusts the output layer weights θ^(K) by minimizing their difference from the currently trained weights θ̂^(K). By fixing the hidden layer weights as given by the training, the subset of intersected regions L_{NN_θ̂} ∩ q can be determined. Controller parameters K_i corresponding to linear regions R_i ∈ L_{NN_θ̂} ∩ q are brought into the controller partition P⋆ by solving the following quadratic program:

min_{θ^(K)} ||θ^(K) − θ̂^(K)||  s.t.  K_i ∈ P⋆, ∀R_i ∈ L_{NN_θ̂} ∩ q. (13)

The weights θ^(K) solved from (13) are then used to replace the currently trained output layer weights θ̂^(K). These two processes, i.e., training and projection, can be done either in an alternating way, or by projecting only once at the end of training. In the results section, we show that the trained NNs perform well even with just a single projection at the end. Notice that the optimization problem (13) could be infeasible, which can be resolved by improving the hidden layer weights with more training effort.

Example:
We illustrate the optimization problem (13) with a toy example. Consider a neural network NN_θ : R² → R with a single hidden layer of two neurons:

h_1 = max{0, W^(1)_11 x_1 + W^(1)_12 x_2},
h_2 = max{0, W^(1)_21 x_1 + W^(1)_22 x_2},

and the output layer u = W^(2)_11 h_1 + W^(2)_12 h_2. Based on the activation pattern of the two hidden layer neurons, the NN has four linear regions R_1, . . . , R_4, and the corresponding CPWA function NN_θ can be written as:

NN_θ(x) =
  (W^(1)_11 W^(2)_11 + W^(1)_21 W^(2)_12) x_1 + (W^(1)_12 W^(2)_11 + W^(1)_22 W^(2)_12) x_2, if x ∈ R_1
  W^(1)_11 W^(2)_11 x_1 + W^(1)_12 W^(2)_11 x_2, if x ∈ R_2
  W^(1)_21 W^(2)_12 x_1 + W^(1)_22 W^(2)_12 x_2, if x ∈ R_3
  0, if x ∈ R_4

Consider the controller partition P⋆ given by K_1 ∈ [K_1^low, K_1^up] and K_2 ∈ [K_2^low, K_2^up]. Suppose the abstract state q intersects R_1 and R_2; then problem (13) can be written as:

min_{W^(2)_11, W^(2)_12} (W^(2)_11 − Ŵ^(2)_11)² + (W^(2)_12 − Ŵ^(2)_12)²
s.t. Ŵ^(1)_11 W^(2)_11 + Ŵ^(1)_21 W^(2)_12 ∈ [K_1^low, K_1^up],
     Ŵ^(1)_11 W^(2)_11 ∈ [K_1^low, K_1^up],
     Ŵ^(1)_12 W^(2)_11 + Ŵ^(1)_22 W^(2)_12 ∈ [K_2^low, K_2^up],
     Ŵ^(1)_12 W^(2)_11 ∈ [K_2^low, K_2^up].

Since all the hidden layer weights Ŵ^(1)_ij are fixed as given by the training, this yields a quadratic program.

The following result shows the safety guarantee provided by projecting weights during training:

Theorem 4.4:
Given an abstract state q ∈ X_safe and an arbitrary controller partition P⋆ ∈ P_safe(q), let the weight assignment θ of NN_q be computed by (13); then the resulting NN_q = NN_θ is guaranteed to be safe at q, i.e., NN_q, q |= φ_safety.

Proof: Notice that the subset of intersected regions L_{NN_θ̂} ∩ q only depends on the hidden layer weights, which remain unchanged during the projection. Then, the constraints in problem (13), along with Theorem 4.2, directly lead to the result.

V. EXTENSION TO LIVENESS PROPERTY
A. Controller Partition Assignment
Among all the safe controller partitions in P safe ( q ) , we as-sign one of them P (cid:63) ∈ P safe ( q ) to each abstract state q ∈ X safe ,by taking into account the liveness specification φ liveness . Theliveness property requires that the nonlinear system (2) canreach the goal X goal ⊂ X by using the trained NN controller.Unlike the safety property which can be enforced using theprojection operator, achieving the liveness property dependson practical issues, such as the amount of training data andthe effort spent on training the NNs. To that end, we showthat the safety guarantee provided by our algorithm does notimpede the collection of training data, and hence the NNs canbe trained to satisfy the liveness property by using standardlearning techniques.Since the posterior graph S Post over-approximates the be-havior of the system, a transition from q to q (cid:48) under P doesnot guarantee every state x ∈ q can reach q (cid:48) in one step,by applying input u = K ( x ) with parameter K ∈ P . To capture the liveness property, we introduce the predecessorgraph based on the predecessor operator given in Section IV-A: Definition 5.1: (Predecessor Graph) A predecessor graph isa finite transition system S Pre (cid:44) ( X, X , L, −→ ) , where: • X = X safe ∪ { q goal } ; • X = X safe ; • L = 2 P ; • q l −→ q (cid:48) , if q (cid:54) = q goal and l = {P ∈ P safe ( q ) | q ∩ Pre( q (cid:48) , P ) (cid:54) = ∅} (cid:54) = ∅ .Notice that in the construction of S Pre , we restrict transitionlabels to the safe controller partitions
P ∈ P safe ( q ) at eachstate q ∈ X safe . Let T Pre be the set of all trajectories overthe predecessor graph S Pre , then we use π ( t ) X : ω (cid:55)→ q todenote the map from a trajectory ω ∈ T Pre to the abstractstate q at time step t , and use π ( t ) L : ω (cid:55)→ l to denote the mapfrom ω ∈ T Pre to the label l ∈ P associated to the transitionfrom π ( t ) X ( ω ) to π ( t +1) X ( ω ) . By extending the formulation ofspecification to the abstract state space, a trajectory ω in S Pre satisfies the reach-avoid specification φ , denoted by ω | = φ , if ω reaches the goal state q goal in T steps. Similar to the notationintroduced for the posterior graph, let Next( q, P ) be the setof states that can be reached from q under the partition P inone step: Next( q, P ) (cid:44) { q (cid:48) ∈ X safe ∪{ q goal } | q ∩ Pre( q (cid:48) , P ) (cid:54) = ∅} . (14)At each state q ∈ X safe , our objective is to choose thecandidate controller partition P (cid:63) ∈ P safe ( q ) that can lead mostof the states x ∈ q to the goal. To that end, we restrict ourattention to trajectories in S Pre that progress towards the goal.That is, let | ω | be the length of a trajectory ω ∈ T Pre , and
Dist: X_safe → ℕ map a state q ∈ X_safe to the length of the shortest trajectory from q to the goal in the predecessor graph S_Pre. Then, we use T′_Pre ⊆ T_Pre to denote the subset of trajectories that lead to the goal:

    T′_Pre ≜ {ω ∈ T_Pre | Dist(π_X^(t)(ω)) < Dist(π_X^(t−1)(ω)), t = 1, …, |ω| − 1}.   (15)

Now we can define the subset of abstract states that progress towards the goal, denoted by Q_{q,P} ⊆ Next(q, P), as the set of abstract states along a trajectory ω ∈ T′_Pre that satisfies the given specification:

    Q_{q,P} ≜ {q′ ∈ Next(q, P) | ∃ω ∈ T′_Pre: ω ⊨ φ, π_X^(0)(ω) = q, π_X^(1)(ω) = q′, P ∈ π_L^(0)(ω)}.   (16)

Then, the intersection between the state q and the predecessors of q′ ∈ Q_{q,P} under the controller partition P is given by:

    I_{q,P} ≜ ⋃_{q′ ∈ Q_{q,P}} (q ∩ Pre(q′, P)).   (17)

Intuitively, the set I_{q,P} measures the portion of states x ∈ q that can reach the goal by applying the input u = K(x) with K ∈ P at the first step. Our algorithm then assigns to a state q ∈ X_safe the controller partition P⋆ ∈ P_safe(q) that corresponds to the largest set I_{q,P⋆} among all the sets I_{q,P} for P ∈ P_safe(q). Indeed, and as mentioned in Section III, this procedure only ranks the available choices of controller partitions, and one may need to iterate over the remaining choices using the same heuristic.

B. Selecting Architecture for Local NNs
Given the assigned controller partition P⋆ ∈ P_safe(q) at each state q ∈ X_safe, the next step is to select a NN architecture (number of layers and number of neurons per layer) for each of the local networks NN_q. Such an architecture should be sufficient to implement any CPWA function within the selected controller partition P⋆ ∈ P_safe(q). Unfortunately, such a neural network architecture may not exist without additional assumptions on the controllability of the underlying system, e.g., the existence of a Lipschitz continuous controller that satisfies the specification φ_liveness [34]. We note that the current definition of the predecessor operator Pre can be slightly modified to ensure the existence of such Lipschitz continuous controllers, thanks to the fact that P⋆ represents a set of CPWA functions, which are by definition Lipschitz continuous. For the sake of brevity and due to space constraints, we omit this discussion as it follows arguments similar to those detailed in [34]. Finally, we refer to such an architecture as A_{q,P⋆}.

C. Data Collection and Training of Local NNs
For training the local networks NN_q, we consider two scenarios: the training data are either already available or need to be collected by invoking an expert. In the first scenario, unlike most learning algorithms, which can directly use the available data, our approach needs to take into account the abstract states and controller partitions associated with the data. In case training data are not available, we assume access to an expert for collecting data, with the extra consideration that the collected data need to be aligned with the controller partition assignment. We consider training data of the form {(x, u)}, where each state x ∈ X is associated with an input label u ∈ R^m. Given a collection of data D from trajectories that satisfy the given specification, Algorithm 1 selects a subset of data D_q ⊆ D used for training the local network NN_q at q, without the need to compute predecessors or invoke an expert. Specifically, for each data point (x, u) ∈ D, it determines the corresponding controller partition P ∈ P_safe(q) by solving a linear feasibility problem under the constraints u = K(x), K ∈ P (one data point (x, u) may correspond to multiple partitions P) (line 7 in Algorithm 1). After classifying the subset of data D_{q,P} associated with each controller partition P, the algorithm selects the P⋆ ∈ P_safe(q) that corresponds to the largest number of available data points (lines 9-13 in Algorithm 1).

Assuming access to an expert, Algorithm 2 collects training data that are consistent with the assigned controller partitions. It first computes the intersection between the state q and the predecessors of the states q′ ∈ Q_{q,P} (lines 2-8 in Algorithm 2). Then, the algorithm selects the P⋆ ∈ P_safe(q) corresponding to the largest intersection with the predecessors (lines 9-12 in Algorithm 2).
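To make the classification step concrete, the following Python sketch captures the spirit of Algorithm 1 under two simplifying assumptions of ours (not part of the framework): the input u is scalar, and each controller partition is a hyperrectangle over the entries of an affine controller u = Kx + b, so the feasibility test of line 7 reduces to an interval check instead of a generic linear program.

```python
def feasible(x, u, box, tol=1e-9):
    """Check whether some affine controller u = K x + b with (K, b) in the
    hyperrectangle `box` reproduces the labeled input u (line 7 of Alg. 1).
    box = (K_lo, K_hi, b_lo, b_hi); K_lo, K_hi range over the state dims."""
    K_lo, K_hi, b_lo, b_hi = box
    # K x + b is separable in the entries of K, so its range over the box
    # is obtained term by term.
    lo = b_lo + sum(min(kl * xi, kh * xi) for kl, kh, xi in zip(K_lo, K_hi, x))
    hi = b_hi + sum(max(kl * xi, kh * xi) for kl, kh, xi in zip(K_lo, K_hi, x))
    return lo - tol <= u <= hi + tol

def classify_data(in_state, partitions, data):
    """Algorithm 1 (sketch): split D into per-partition datasets and return
    the partition with the largest number of consistent data points."""
    per_partition = {name: [] for name in partitions}
    for x, u in data:
        if not in_state(x):          # keep only data points with x in q
            continue
        for name, box in partitions.items():
            if feasible(x, u, box):  # one point may match several partitions
                per_partition[name].append((x, u))
    best = max(per_partition, key=lambda name: len(per_partition[name]))
    return per_partition[best], best
```

The interval check is exact for hyperrectangular partitions; for general polytopic partitions, line 7 would instead call an LP feasibility solver.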
With access to an expert, Algorithm 2 then samples states x ∈ q and synthesizes one-step transitions to reach abstract states in Q_{q,P⋆}, using an affine controller with parameter K ∈ P⋆ (lines 13-15 in Algorithm 2).

With the collection of training data D_q aligned with the controller partition assignment P⋆ ∈ P_safe(q) at each abstract state q ∈ X_safe, the local NNs can be trained to satisfy the liveness specification φ_liveness, along with the safety guarantee enforced by projecting the weights at the end of training.

Algorithm 1
CLASSIFY-DATA(q, P_safe(q), D)
1:  D_q = list(), P⋆ = None, current = 0
2:  for P ∈ P_safe(q) do
3:      D_{q,P} = list()
4:  for (x, u) ∈ D do
5:      if x ∈ q then
6:          for P ∈ P_safe(q) do
7:              if u = K(x) for some K ∈ P then
8:                  D_{q,P}.append((x, u))
9:  for P ∈ P_safe(q) do
10:     if |D_{q,P}| > current then
11:         current = |D_{q,P}|
12:         D_q = D_{q,P}
13:         P⋆ = P
14: return D_q, P⋆

Algorithm 2
COLLECT-DATA(q, P_safe(q), S_Pre)
1:  D_q = list(), P⋆ = None, current = 0
2:  for P ∈ P_safe(q) do
3:      I_{q,P} = set(), Q_{q,P} = set()
4:      for q′ ∈ Next(q, P) do
5:          if ∃ω ∈ T′_Pre: ω ⊨ φ, π_X^(0)(ω) = q, π_X^(1)(ω) = q′, P ∈ π_L^(0)(ω) then
6:              isect = q ∩ Pre(q′, P)
7:              I_{q,P} = I_{q,P} ∪ isect
8:              Q_{q,P} = Q_{q,P} ∪ {q′}
9:  for P ∈ P_safe(q) do
10:     if size(I_{q,P}) > current then
11:         P⋆ = P
12:         current = size(I_{q,P})
13: for sample x ∈ q do
14:     u = expert(x, P⋆, Q_{q,P⋆})
15:     D_q.append((x, u))
16: return D_q, P⋆

D. Combined NN Controller
The final step of our framework is to combine all the local networks NN_q into one global NN controller. Figure 2(a) shows the overall structure of the global NN controller obtained by combining modules [NN_q]_M that correspond to the local networks NN_q. As input to the NN controller, the state x ∈ X is fed into all the local networks, and the output of the NN controller is the summation of all the local network outputs. In the figure, we show a single output for simplicity, but the construction easily extends to multiple outputs (indeed, even in the figures, u and u_q can be thought of as vectors in R^m, and the summation and product operations then correspond to vector addition and scalar multiplication, respectively).

Each module [NN_q]_M consists of two parts: a logic component and a ReLU NN. The logic component decides whether the current state x is in the abstract state q associated with [NN_q]_M, and outputs 1 if the answer is affirmative and 0 otherwise. The ReLU NN is the neural network trained for this abstract state q. By multiplying the outputs of the logic component and the ReLU NN, the output of the module [NN_q]_M is identical to the output of the ReLU NN if x ∈ q, and zero otherwise.

Fig. 2: (a) The combined NN controller consists of one module [NN_{q_i}]_M for each abstract state q_i ∈ X_safe, where N′ = |X_safe|. (b) An example of the module [NN_q]_M for a rectangular abstract state q ⊂ R².

Figure 2(b) shows an example of the module [NN_q]_M, where we choose an arbitrary rectangular abstract state q ⊂ R². The logic component in each module [NN_q]_M can be implemented as a single-layer NN with fixed weights. Given the H-representation Ax ≤ c of the state q, the weight matrix and the bias vector associated with the single-layer NN are W^(1) = −A and b^(1) = c, respectively. Essentially, this choice of weights encodes one hyperplane inequality of the H-representation into each neuron of the single layer.
To represent whether an inequality holds, we use a step function as the nonlinear activation function of the single layer:

    Step(x) = 1 if x ≥ 0, and 0 otherwise.   (18)

The product of the outputs of all the neurons in the single layer is computed at the end (by the product operator Π), and hence the logic component returns 1 if and only if all the hyperplane inequalities are satisfied. We refer to the architecture of the logic component as A_{q,Π} and to the architecture of the whole module [NN_q]_M as A_q = [A_{q,P⋆} ‖ A_{q,Π}], where ‖ denotes the parallel composition of the ReLU NN and the logic component. Using the same notation, we can define the architecture of the global NN as A = A_{q_1} ‖ … ‖ A_{q_{N′}}. The guarantees of the combined NN controller can now be summarized as follows; the proof follows directly from the discussion above along with Theorem 4.2, Lemma 4.3, and Theorem 4.4:
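As an illustration, the gating construction above can be written out in a few lines of NumPy; the boxes, the ReLU networks, and the function names below are illustrative stand-ins for the trained components, not our implementation.

```python
import numpy as np

def logic(x, A, c):
    """Single-layer 'logic' network for the H-representation A x <= c:
    with W1 = -A and b1 = c, neuron i fires iff (A x)_i <= c_i, and the
    product over neurons is 1 iff x lies in the abstract state q."""
    pre = -A @ x + c                    # pre_i >= 0  iff  (A x)_i <= c_i
    step = (pre >= 0).astype(float)     # Step activation, Eq. (18)
    return np.prod(step)                # product operator Π

def module_output(x, A, c, relu_nn):
    """Module [NN_q]_M: the local ReLU network gated by the logic block,
    so the module outputs relu_nn(x) if x is in q and 0 otherwise."""
    return logic(x, A, c) * relu_nn(x)

def global_controller(x, modules):
    """Combined controller: the sum of all module outputs (Fig. 2a).
    modules is a list of (A, c, relu_nn) triples, one per abstract state."""
    return sum(module_output(x, A, c, nn) for (A, c, nn) in modules)
```

Since the abstract states partition X_safe, at most one logic block fires for any state, and the summation simply selects the active local network.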
Theorem 5.2:
Consider the nonlinear system (2) and the reach-avoid specification φ = φ_safety ∧ φ_liveness. Let the controller partition assignment P⋆_1, …, P⋆_{N′} and the local neural networks NN_{q_1}, …, NN_{q_{N′}} satisfy the following conditions:
• the assignment P⋆_i satisfies P⋆_i ∈ P_safe(q_i);
• the NN architecture A_{q_i} satisfies A_{q_i} = [A_{q_i,P⋆_i} ‖ A_{q_i,Π}];
• the NN weights θ_i of NN_{q_i} are projected onto P⋆_i using the projection operator (13).
Then, the neural network NN with architecture A = A_{q_1} ‖ … ‖ A_{q_{N′}} and local networks NN_{q_1}, …, NN_{q_{N′}} satisfies NN, X_init ⊨ φ_safety with X_init = ⋃_{q ∈ X_safe} q. Moreover, if all the local networks NN_{q_i} satisfy:

    Reach(q_i, NN_{q_i}) ⊆ ⋃_{q′ ∈ Q_{q_i,P⋆_i}} q′,   (19)

where Q_{q_i,P⋆_i} is defined as in (16) and Reach(q, NN_q) is defined as:

    Reach(q, NN_q) ≜ {f(x, NN_q(x)) | x ∈ q},   (20)

then: NN, X_init ⊨ φ_liveness.

In words, Theorem 5.2 guarantees that any global NN composed from provably safe local networks is still safe (i.e., satisfies φ_safety). This is a reflection of the fact that the composition of the global network respects the linear regions on which the local networks are defined. Moreover, if each of the local NNs satisfies the local reachability property in (19), then the global NN satisfies the liveness property φ_liveness.
This is a reflection of the fact that the set Q_{q,P} in (16) is defined to guarantee progress towards the goal.

In practice, combining the local NNs into a single controller allows one to repair the NN controller in a systematic way when it fails to meet the local liveness property (19). Specifically, upon observing that the behavior of the system is not as expected at a certain abstract state q ∈ X_safe, only the local neural network NN_q needs to be improved, e.g., by further training with augmented data collected at the state q, without affecting the NNs that satisfy the specification at the other abstract states.

VI. EXPERIMENTAL RESULTS
We implemented the proposed framework and evaluated both the resulting control performance and the scalability of the proposed algorithm. We used TIRA [35] to compute the reachable sets and FORCES [36], [37] to collect training data, as shown in line 14 of Algorithm 2. All experiments were executed on an Intel Core i9 2.4 GHz processor with 32 GB of memory.
A. Controller Performance Comparison: Provably Correct NN Controllers vs. Standard Training Techniques
We first present trajectories of a wheeled robot under the control of NN controllers trained by our algorithm. Let the state vector of the system be x = [ζ_x, ζ_y, θ]^⊤ ∈ X ⊂ R³, where ζ_x, ζ_y denote the coordinates of the robot, and θ is the heading direction. The discrete-time dynamics of the robot are given by:

    ζ_x^(t+Δt) = ζ_x^(t) + Δt v cos(θ^(t))
    ζ_y^(t+Δt) = ζ_y^(t) + Δt v sin(θ^(t))   (21)
    θ^(t+Δt)  = θ^(t) + Δt u^(t)

where the control input u^(t) ∈ R is determined by a ReLU NN controller, i.e., u^(t) = NN(x^(t)), NN ∈ P_{K×b}, with the controller space P_{K×b} considered to be a hyperrectangle. We choose a fixed discrete time step size Δt.

Fig. 3: Workspace 1 (upper row) and workspace 2 (lower row) are partitioned into abstract states (dashed lines), either uniformly or non-uniformly. Trajectories starting from different initial states satisfy both the safety specification φ_safety (blue areas are obstacles) and the liveness specification φ_liveness of reaching the goal (green area).

We considered two different workspaces, indexed 1 and 2 and shown in the upper and lower rows of Figure 3, respectively. As the first step of our algorithm, we discretized the state space X ⊂ R³ and the controller space P_{K×b} as described in Section IV-A. To illustrate the flexibility in the choice of partition strategies, we partitioned the state space corresponding to workspace 1 uniformly into abstract states, while partitioning the state space corresponding to workspace 2 non-uniformly. In both cases, the range of the heading direction θ is uniformly partitioned into intervals, and the partitions of the ζ_x, ζ_y dimensions are shown as the dashed lines in the workspaces in Figure 3.
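For concreteness, the discrete-time model (21) can be simulated in a few lines of Python; the speed v, the step size Δt, and the zero controller used below are illustrative placeholders, not the values used in our experiments.

```python
import math

def robot_step(state, u, v=1.0, dt=0.1):
    """One step of the discrete-time unicycle dynamics (21);
    v and dt are illustrative values, not the experimental settings."""
    zx, zy, theta = state
    return (zx + dt * v * math.cos(theta),
            zy + dt * v * math.sin(theta),
            theta + dt * u)

def rollout(state, controller, steps, v=1.0, dt=0.1):
    """Roll out a (hypothetical) state-feedback controller for T steps."""
    traj = [state]
    for _ in range(steps):
        state = robot_step(state, controller(state), v, dt)
        traj.append(state)
    return traj
```

In the closed loop of Section V, `controller` would be the combined NN controller restricted to the abstract state containing the current state.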
We uniformly partitioned P_{K×b} into hyperrectangles. By computing the reachable sets using the reachability tool TIRA [35], we constructed the posterior graph S_Post, which is then used to find the set of safe abstract states X_safe and the function P_safe. The number of safe abstract states |X_safe| for each workspace is reported in Table I. Notice that not all abstract states are safe. Indeed, abstract states that are next to the obstacles with the heading direction θ pointing towards the obstacles are inevitably unsafe, since the control input u cannot affect the coordinates ζ_x, ζ_y within one time step. The execution times to compute the posterior graph and to identify the set of safe states can also be found in Table I.

We collected training data by following Algorithm 2 in Section V-A. We used Keras [38] to train a shallow NN (one hidden layer) for each abstract state q ∈ X_safe. At the end of training, we projected the trained NN weights only once, as mentioned in Section IV-D. For each workspace, we measured the execution time to collect all the training data and to train all the local NNs, including the projection of the NN weights.

In Figure 3, we show trajectories under the NN controllers trained by our algorithm in both workspaces. Even though we choose trajectories with initial states in the set X_init = ⋃_{q ∈ X_safe} q that are close to the obstacles or initially heading towards the obstacles, all the trajectories are collision-free, as guaranteed by our algorithm. Moreover, by assigning controller partitions based on the strategies in Section V-A, all trajectories satisfy the liveness specification φ_liveness.

Next, we compare NN controllers trained by our algorithm with those trained by standard imitation learning, which minimizes the regression loss without taking the safety guarantee into account.
All NN controllers are trained using the same set of training data. We vary the NN architectures for the NNs trained by standard imitation learning to achieve better performance, and train them using enough episodes for the loss to become sufficiently low.

Fig. 4: The upper row shows trajectories resulting from NN controllers trained using standard imitation learning with three different architectures (varying the number of hidden layers and neurons per layer). The lower row shows trajectories resulting from NN controllers trained using our algorithm. With the same initial states (two sub-figures in the same column), only the NN controllers trained by our algorithm result in collision-free trajectories.

Nevertheless, as shown in Figure 4, for all the NN controllers trained by standard imitation learning, we are able to find initial states from which the resulting trajectories are unsafe. However, with the same initial states, the trajectories under the NN controllers trained by our algorithm are collision-free, as guaranteed by our framework.

B. Scalability Study
1- Scalability with respect to Partition Granularity:
Our algorithm computes the reachable sets of abstract states under different controller partitions using reachability tools, which may employ conservative over-approximations for nonlinear systems. In that case, identifying the set of safe abstract states needs finer partitioning of the state space X and the controller space P_{K×b}. To that end, we show the scalability of our algorithm with respect to the choice of partition parameters. Using the two workspaces in Figure 3, we increase the number of abstract states and controller partitions by decreasing the partition grid size, and report the execution time for each part of our framework in Table I. As shown in the table, with finer partitioning of the state space (more abstract states), the number of safe abstract states increases. In this example, finer partitioning of the controller space does not lead to more safe abstract states, since the controller partitions are already small enough and, as mentioned above, some abstract states are inevitably unsafe. Moreover, we notice that the execution time grows linearly with the number of abstract states and the number of controller partitions.

TABLE I: Scalability with respect to Partition Granularity. Columns: workspace index; number of abstract states; number of controller partitions; number of safe and reachable abstract states; and the execution times [s] to compute the reachable sets, construct the posterior graph, compute the function P_safe, and assign the controller partitions.

Although we conducted all the experiments on a single CPU core, we note that our algorithm is highly parallelizable. For example, computing the reachable sets of the abstract states, checking the intersections between the posteriors and the abstract states when constructing the posterior graph, and training the local neural networks NN_q can all be done in parallel. After training the NN controller, the execution time of the controller itself is almost instantaneous, which is a major advantage of NN controllers.
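The parallelization claim follows from the fact that the per-pair (q, P) reachability queries are mutually independent; a minimal sketch using Python's standard worker pools is shown below, where `compute_posterior` is a placeholder for a call into a reachability tool such as TIRA, not its actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_posterior(args):
    """Placeholder for one reachability query; in a real implementation
    this call would be delegated to an interval reachability tool (TIRA)."""
    q, P = args
    return ("posterior", q, P)   # stand-in for the computed reachable set

def parallel_posteriors(abstract_states, partitions, workers=8):
    """The (q, P) queries are independent, so the posterior graph
    construction can be distributed over a pool of workers."""
    jobs = [(q, P) for q in abstract_states for P in partitions]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(jobs, pool.map(compute_posterior, jobs)))
```

For CPU-bound reachability computations, a process pool (or separate tool instances) would replace the thread pool; the structure of the fan-out is the same.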
2- Scalability with respect to System Dimension:
Abstraction-based controller design is known to be computationally expensive for high-dimensional systems due to the curse of dimensionality. In Table II, we show the scalability of our algorithm with respect to the system dimension. To conveniently increase the system dimension, we consider a chain of integrators represented as the linear system x^(t+1) = Ax^(t) + Bu^(t),

TABLE II: Scalability with respect to System Dimension
Columns: system dimension n; number of abstract states; and the execution times [s] to compute the reachable sets and construct the posterior graph.

where A ∈ R^{n×n} is the identity matrix, and u^(t) ∈ R. With a fixed number of controller partitions and a fixed partition grid size for the abstract states, Table II shows that the number of abstract states and the execution time grow exponentially with the system dimension n. Nevertheless, our algorithm can handle a high-dimensional system in a reasonable amount of time.

REFERENCES

[1] W. Saunders, G. Sastry, A. Stuhlmueller, and O. Evans, "Trial without error: Towards safe reinforcement learning via human intervention," in
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018, pp. 2067–2069.
[2] A. Liu, G. Shi, S.-J. Chung, A. Anandkumar, and Y. Yue, "Robust regression for safe exploration in control," arXiv preprint arXiv:1906.05819, 2019.
[3] F. Berkenkamp, A. Krause, and A. P. Schoellig, "Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics," arXiv preprint arXiv:1602.04450, 2016.
[4] P. Pauli, A. Koch, J. Berberich, and F. Allgöwer, "Training robust neural networks using Lipschitz bounds," arXiv preprint arXiv:2005.02929, 2020.
[5] C. Gaskett, "Reinforcement learning under circumstances beyond its control," 2003.
[6] T. M. Moldovan and P. Abbeel, "Safe exploration in Markov decision processes," arXiv preprint arXiv:1205.4810, 2012.
[7] M. Turchetta, F. Berkenkamp, and A. Krause, "Safe exploration in finite Markov decision processes with Gaussian processes," in Advances in Neural Information Processing Systems, 2016, pp. 4312–4320.
[8] L. Wen, J. Duan, S. E. Li, S. Xu, and H. Peng, "Safe reinforcement learning for autonomous vehicles through parallel constrained policy optimization," arXiv preprint arXiv:2003.01303, 2020.
[9] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, "Safe model-based reinforcement learning with stability guarantees," in Advances in Neural Information Processing Systems, 2017.
[10] Y. Chow, O. Nachum, A. Faust, E. Duenez-Guzman, and M. Ghavamzadeh, "Lyapunov-based safe policy optimization for continuous control," arXiv preprint arXiv:1901.10031, 2019.
[11] Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh, "A Lyapunov-based approach to safe reinforcement learning," in Advances in Neural Information Processing Systems, 2018, pp. 8092–8101.
[12] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, "Learning-based model predictive control for safe exploration." IEEE, 2018.
[13] X. Sun, H. Khedr, and Y. Shoukry, "Formal verification of neural network controlled autonomous systems," in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019, pp. 147–156.
[14] S. Dutta, S. Jha, S. Sankaranarayanan, and A. Tiwari, "Output range analysis for deep feedforward neural networks," in NASA Formal Methods Symposium. Springer, 2018.
[15] C. Liu, T. Arnon, C. Lazarus, C. Barrett, and M. J. Kochenderfer, "Algorithms for verifying deep neural networks," arXiv preprint arXiv:1903.06758, 2019.
[16] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, "Efficient and accurate estimation of Lipschitz constants for deep neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 11423–11434.
[17] W. Xiang, D. M. Lopez, P. Musau, and T. T. Johnson, "Reachable set estimation and verification for neural network models of nonlinear dynamic systems," in Safe, Autonomous and Intelligent Vehicles. Springer, 2019, pp. 123–144.
[18] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee, "Verisig: verifying safety properties of hybrid systems with neural network controllers," in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019, pp. 169–178.
[19] A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, and C. J. Tomlin, "Reachability-based safe learning with Gaussian processes." IEEE, 2014, pp. 1424–1431.
[20] V. Govindarajan, K. Driggs-Campbell, and R. Bajcsy, "Data-driven reachability analysis for human-in-the-loop systems." IEEE, 2017, pp. 2617–2622.
[21] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, "A general safety framework for learning-based control in uncertain robotic systems," IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018.
[22] J. Ferlez, M. Elnaggar, Y. Shoukry, and C. Fleming, "ShieldNN: A provably safe NN filter for unsafe NN controllers," arXiv preprint arXiv:2006.09564, 2020.
[23] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, "End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3387–3395.
[24] K. P. Wabersich and M. N. Zeilinger, "Scalable synthesis of safety certificates from data with application to learning-based control." IEEE, 2018, pp. 1691–1697.
[25] M. Srinivasan, A. Dabholkar, S. Coogan, and P. Vela, "Synthesis of control barrier functions using a supervised machine learning approach," arXiv preprint arXiv:2003.04950, 2020.
[26] A. J. Taylor, A. Singletary, Y. Yue, and A. D. Ames, "A control barrier perspective on episodic learning via projection-to-state safety," arXiv preprint arXiv:2003.08028, 2020.
[27] X. Li and C. Belta, "Temporal logic guided safe reinforcement learning using control barrier functions," arXiv preprint arXiv:1903.09885, 2019.
[28] R. Cheng, M. J. Khojasteh, A. D. Ames, and J. W. Burdick, "Safe multi-agent interaction through robust control barrier functions with learned uncertainties," arXiv preprint arXiv:2004.05273, 2020.
[29] L. Wang, E. A. Theodorou, and M. Egerstedt, "Safe learning of quadrotor dynamics using barrier certificates." IEEE, 2018, pp. 2460–2465.
[30] A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, and N. Matni, "Learning control barrier functions from expert demonstrations," arXiv preprint arXiv:2004.03315, 2020.
[31] G. Shi, X. Shi, M. O'Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, "Neural lander: Stable drone landing control using learned dynamics." IEEE, 2019, pp. 9784–9790.
[32] R. Pascanu, G. Montufar, and Y. Bengio, "On the number of response regions of deep feed forward networks with piece-wise linear activations," arXiv preprint arXiv:1312.6098, 2013.
[33] B. Yordanov, J. Tumova, I. Cerna, J. Barnat, and C. Belta, "Temporal logic control of discrete-time piecewise affine systems," IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1491–1504, 2012.
[34] J. Ferlez, X. Sun, and Y. Shoukry, "Two-level lattice neural network architectures for control of nonlinear systems," arXiv preprint arXiv:2004.09628, 2020.
[35] P.-J. Meyer, A. Devonport, and M. Arcak, "TIRA: toolbox for interval reachability analysis," in Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, 2019, pp. 224–229.
[36] A. Domahidi and J. Jerez, "FORCES Professional," Embotech AG, https://embotech.com/FORCES-Pro, 2014–2019.
[37] A. Zanelli, A. Domahidi, J. Jerez, and M. Morari, "FORCES NLP: an efficient implementation of interior-point methods for multistage nonlinear nonconvex programs," International Journal of Control, pp. 1–17, 2017.
[38] F. Chollet et al., "Keras," https://keras.io, 2015.