Stability Analysis using Quadratic Constraints for Systems with Neural Network Controllers
SStability Analysis using Quadratic Constraints for Systems withNeural Network Controllers
He Yin, Peter Seiler and Murat Arcak
Abstract —A method is presented to analyze the stability offeedback systems with neural network controllers. Two stabilitytheorems are given to prove asymptotic stability and to computean ellipsoidal inner-approximation to the region of attraction.The first theorem addresses linear time-invariant plant dynam-ics, and merges Lyapunov theory with local (sector) quadraticconstraints to bound the nonlinear activation functions in theneural network. The second theorem allows the plant dynamicsto include perturbations such as unmodeled dynamics, slope-restricted nonlinearities, and time delay, using integral quadraticconstraint (IQCs) to capture their input/output behavior. Bothresults rely on semidefinite programming to compute Lyapunovfunctions and inner-estimates of the region of attraction. Themethod is illustrated on systems with neural networks trained tostabilize a nonlinear inverted pendulum as well as vehicle lateraldynamics with actuator uncertainty.
I. I
NTRODUCTION
The paradigm of stabilizing dynamical systems with NeuralNetwork (NN) controllers [1] has been revived following re-cent development in deep NNs, e.g. policy gradient [2]–[5] andbehavioral cloning [6]. However, feedback systems with NNcontrollers suffer from lack of stability and safety certificatesdue to the complexity of the NN structure. Specifically, NNshave various types of nonlinear activation functions, poten-tially numerous layers, and a large number hidden neurons,making it difficult to apply classical analysis methods, e.g.Lyapunov theory [7], [8]. Monte-Carlo simulations can be usedto investigate stability but lack formal guarantees, which areimportant in safety-critical applications such as aircraft andspace launch vehicles.Several works propose using quadratic constraints (QCs) tobound the nonlinear activation functions. This approach is usedto outer-bound the outputs of a (static) NN given a set of inputsin [9] and upper-bound the Lipschitz constant of NNs in [10],[11]. The work [12] uses this idea for finite-time reachabilityanalysis of a system with a NN controller. References [13],[14] approximate NN functions with polynomials, while [15]proposes a verification method that transforms sigmoid-basedNNs into equivalent hybrid systems. The work [16] performsstability analysis by constructing QCs from the bounds ofpartial gradients of NN controllers. Reference [17] assessesglobal asymptotic stability of dynamic neural network modelsusing QCs and Lyapunov theory.This paper presents two main stability results for a feedbacksystem with a NN controller. Theorem 1 provides a condition
Funded in part by the Air Force Office of Scientific Research grant FA9550-18-1-0253, the Office of Naval Research grant N00014-18-1-2209, and theU.S. National Science Foundation grant ECCS-1906164.H. Yin and M. Arcak are with the University of California, Berkeley { he_yin, arcak } @berkeley.edu. P. Seiler is with the University of Michigan, Ann Arbor [email protected]. to prove stability and to inner-estimate the region of attractionfor a linear time-invariant (LTI) plant. It uses Lyapunov theory,and local (sector) QCs to bound the nonlinear activationfunctions in the NN. Theorem 2 allows the plant to includeperturbations such as unmodeled dynamics, slope-restrictednonlinearities, and time delay, characterizing them with inte-gral quadratic constraints (IQCs) [18], [19]. Both results relyon semidefinite programming to compute Lyapunov functionsand inner-estimates of the region of attraction.The specific contributions of this paper are two-fold. First,our nominal analysis with LTI plants and NN controllers useslocal sector QCs that provide tight regional bounds and avoidthe conservatism of the global constraints used in [9]–[12],[17]. This allows us to provide stability certificates and regionof attraction estimates when global stability tests are infeasible.Moreover, our approach derives local QCs for activationfunctions using bounds only on the input to the first layer. Thisdiffers from [16] which derives bounds using partial gradientsof the activation functions. Second, our analysis of uncertainplants and NN controllers provides robustness guarantees forthe feedback system. The uncertain plant is modeled as aninterconnection of the nominal plant and perturbations thatare described by IQCs. The use of IQCs also allows for plantsthat are not necessarily LTI.The paper is organized as follows. Section II presents theproblem formulation and the nominal stability analysis whenthe plant is LTI. Section III addresses uncertain systems usingIQCs. Section IV provides numerical examples, including anonlinear inverted pendulum example and an uncertain vehiclelateral control example. Section V summarizes the results.
Notation: S n denotes the set of n -by- n symmetric matrices. S n + and S n ++ denote the sets of n -by- n symmetric, positivesemidefinite and positive definite matrices, respectively. RL ∞ is the set of rational functions with real coefficients and nopoles on the unit circle. RH ∞ ⊂ RL ∞ contains functionsthat are analytic in the closed exterior of the unit disk in thecomplex plane. (cid:96) n x is the set of sequences x : N → R n x with (cid:107) x (cid:107) := (cid:112)(cid:80) ∞ k =0 x ( k ) (cid:62) x ( k ) < ∞ . If v, w ∈ R n then theinequality v ≤ w is element-wise, i.e. v i ≤ w i for i = 1 , . . . n .Similarly, v ≥ w is an elementwise inequality. For P ∈ S n ++ , x ∗ ∈ R n , define the ellipsoid E ( P, x ∗ ) := { x ∈ R n : ( x − x ∗ ) (cid:62) P ( x − x ∗ ) ≤ } . (1)II. N OMINAL S TABILITY A NALYSIS
A. Problem Formulation
Consider the feedback system consisting of a plant G andstate-feedback controller π as shown in Figure 1. As a first a r X i v : . [ ee ss . S Y ] J un tep, we assume the plant G is a linear, time-invariant (LTI)system defined by the following discrete-time model: x ( k + 1) = A G x ( k ) + B G u ( k ) , (2)where x ( k ) ∈ R n G is the state, u ( k ) ∈ R n u is the input, A G ∈ R n G × n G and B G ∈ R n G × n u . The controller π : R n G → R n u is an (cid:96) -layer, feed-forward neural network (NN) defined as: w ( k ) = x ( k ) , (3a) w i ( k ) = φ i (cid:0) W i w i − ( k ) + b i (cid:1) , i = 1 , . . . , (cid:96), (3b) u ( k ) = W (cid:96) +1 w (cid:96) ( k ) + b (cid:96) +1 , (3c)where w i ∈ R n i are the outputs (activations) from the i th layer and n = n G . The operations for each layer are definedby a weight matrix W i ∈ R n i × n i − , bias vector b i ∈ R n i , andactivation function φ i : R n i → R n i . The activation function φ i is applied element-wise, i.e. φ i ( v ) := (cid:2) ϕ ( v ) , · · · , ϕ ( v n i ) (cid:3) (cid:62) , (4)where ϕ : R → R is the (scalar) activation function selectedfor the NN. Common choices for the scalar activation functioninclude ϕ ( ν ) := tanh( ν ) , sigmoid ϕ ( ν ) := e − ν , ReLU ϕ ( ν ) := max(0 , ν ) , and leaky ReLU ϕ ( ν ) := max( aν, ν ) with a ∈ (0 , . We assume the activation ϕ is identical in alllayers; this can be relaxed with minor changes to the notation. G
The region of attraction (ROA) of the feedbacksystem with plant G and NN π is defined as R := { x ∈ R n G : lim k →∞ χ ( k ; x ) = x ∗ } . (6) B. NN Representation: Isolation of Nonlinearities
It is useful to isolate the nonlinear activation functions fromthe linear operations of the NN as done in [9], [17]. Define v i as the input to the activation function φ i : v i ( k ) := W i w i − ( k ) + b i , i = 1 , . . . , (cid:96). (7) The nonlinear operation of the i th layer (Equation 3b) is thusexpressed as w i ( k ) = φ i ( v i ( k )) . Gather the inputs and outputsof all activation functions: v φ := v ... v (cid:96) ∈ R n φ and w φ := w ... w (cid:96) ∈ R n φ , (8)where n φ := n + · · · + n (cid:96) , and define the combined nonlin-earity φ : R n φ → R n φ by stacking the activation functions: φ ( v φ ) := φ ( v ) ... φ (cid:96) ( v (cid:96) ) . (9)Thus w φ ( k ) = φ ( v φ ( k )) , where the scalar activation function ϕ is applied element-wise to each entry of v φ . Finally, the NNcontrol policy π defined in Equation 3 can be rewritten as: (cid:20) u ( k ) v φ ( k ) (cid:21) = N x ( k ) w φ ( k )1 (10a) w φ ( k ) = φ ( v φ ( k )) . (10b)The matrix N depends on the weights and biases as follows,where the vertical and horizontal bars partition N compatiblywith the inputs ( x, w φ , and outputs ( u, v φ ) : N := · · · W (cid:96) +1 b (cid:96) +1 W · · · b W · · · b ... ... . . . ... ... ... · · · W (cid:96) b (cid:96) (11a) := (cid:20) N ux N uw N ub N vx N vw N vb (cid:21) . (11b)This decomposition of the NN, depicted in Figure 2, isolatesthe activation functions in preparation for the stability analysis. N
Definition 2.
Let α ≤ β be given. The function ϕ : R → R lies in the (global) sector [ α, β ] if: ( ϕ ( ν ) − αν ) · ( βν − ϕ ( ν )) ≥ ∀ ν ∈ R . (13)The interpretation of the sector [ α, β ] is that ϕ lies betweenlines passing through the origin with slope α and β . Manyactivation functions are bounded in the sector [0 , , e.g. tanh and ReLU. Figure 3 illustrates ϕ ( ν ) = tanh( ν ) (blue solid)and the global sector defined by [0 , (red solid lines). -6 -4 -2 0 2 4 6-4-3-2-101234 Global SectorLocal Sector ⌫
The global sector constraint is often too coarse for stabilityanalysis, and a local sector constraint provides tighter bounds.
Definition 3.
Let α , β , ν , ¯ ν ∈ R with α ≤ β and ν ≤ ≤ ¯ ν .The function ϕ : R → R satisfies the local sector [ α, β ] if ( ϕ ( ν ) − α ν ) · ( β ν − ϕ ( ν )) ≥ ∀ ν ∈ [ ν, ¯ ν ] . (14)As an example, ϕ ( ν ) := tanh( ν ) restricted to the interval [ − ¯ ν, ¯ ν ] satisfies the local sector bound [ α, β ] with α :=tanh(¯ ν ) / ¯ ν > and β := 1 . As shown in Figure 3 (greendashed lines), the local sector provides a tighter bound than theglobal sector. These bounds are valid for a symmetric intervalaround the origin with ν = − ¯ ν ; non-symmetric intervals( ν (cid:54) = − ¯ ν ) can be handled similarly.The local and global sector constraints above were de-fined to be centered at the point ( ν, ϕ ( ν )) = (0 , . Thestability analysis will require offset sectors centered aroundan arbitrary point ( ν ∗ , ϕ ( ν ∗ )) on the function. For example, ϕ ( ν ) = tanh( ν ) satisfies the global sector bound (red solid)around the point ( ν ∗ , tanh( ν ∗ )) with [ α, β ] = [0 , , as shownin Figure 4. It satisfies a tighter local sector bound (greendashed) when the input is restricted to ν ∈ [ ν, ¯ ν ] . An explicitexpression for this local sector is β = 1 and α := min (cid:18) tanh(¯ ν ) − tanh( ν ∗ )¯ ν − ν ∗ , tanh( ν ∗ ) − tanh( ν ) ν ∗ − ν (cid:19) . The local sector upper bound β can be tightened further. Thisleads to the following definition of an offset local sector. Definition 4.
Let α , β , ν , ¯ ν , ν ∗ ∈ R be given with α ≤ β and ν ≤ ν ∗ ≤ ¯ ν . The function ϕ : R → R satisfies the offset localsector [ α, β ] around the point ν ∗ if: (∆ ϕ ( ν ) − α ∆ ν ) · ( β ∆ ν − ∆ ϕ ( ν )) ≥ ∀ ν ∈ [ ν, ¯ ν ] (15) where ∆ ϕ ( ν ) := ϕ ( ν ) − ϕ ( ν ∗ ) and ∆ ν := ν − ν ∗ . -6 -4 -2 0 2 4 6-4-3-2-101234 Global SectorLocal Sector ⌫
D. Quadratic Constraints: Combined Activation Functions
Offset local sector constraints can also be defined for thecombined nonlinearity φ , given by Equation 9. Let v, ¯ v, v ∗ ∈ R n φ be given with v ≤ v ∗ ≤ ¯ v . Assume that the activationinput v φ ∈ R n φ lies, element-wise, in the interval [ v, ¯ v ] and the i th input/output pair is w i = ϕ ( v i ) . Further assume the scalaractivation function satisfies the local sector [ α i , β i ] around thepoint v ∗ ,i with the input restricted to v i ∈ [ v i , ¯ v i ] for i =1 , . . . , n φ . The local sector bounds can be computed for ϕ onthe given interval either analytically (as above for tanh ) ornumerically. These local sectors can be stacked into vectors α φ , β φ ∈ R n φ that provide quadratic constraints satisfied bythe combined nonlinearity φ . Lemma 1.
Let α φ , β φ , v , ¯ v , v ∗ ∈ R n φ be given with α φ ≤ β φ , v ≤ v ∗ ≤ ¯ v , and w ∗ := φ ( v ∗ ) . Assume φ satisfies the offsetlocal sector [ α φ , β φ ] around the point v ∗ element-wise for all v φ ∈ [ v, ¯ v ] . If λ ∈ R n φ with λ ≥ then: (cid:20) v φ − v ∗ w φ − w ∗ (cid:21) (cid:62) Ψ (cid:62) φ M φ ( λ )Ψ φ (cid:20) v φ − v ∗ w φ − w ∗ (cid:21) ≥ ∀ v φ ∈ [ v, ¯ v ] , w φ = φ ( v φ ) , where: Ψ φ := (cid:20) diag ( β φ ) − I n φ − diag ( α φ ) I n φ (cid:21) (16) M φ ( λ ) := (cid:20) n φ diag ( λ ) diag ( λ ) 0 n φ (cid:21) . (17) Proof.
For any v φ ∈ R n φ and w φ = φ ( v φ ) : (cid:20) v φ − v ∗ w φ − w ∗ (cid:21) (cid:62) Ψ (cid:62) φ M φ ( λ )Ψ φ (cid:20) v φ − v ∗ w φ − w ∗ (cid:21) = n φ (cid:88) i =1 λ i (∆ w i − α ∆ v i ) · ( β ∆ v i − ∆ w i ) where ∆ w i := ϕ ( v i ) − ϕ ( v ∗ ,i ) and ∆ v i := v i − v ∗ ,i . If v φ ∈ [ v, ¯ v ] then each term in the sum is non-negative by the offsetlocal sector constraints and λ ≥ .The stability theorem in the next section must verify that v ∈ [ v, ¯ v ] in order to apply the sector bounds. The analysisis simplified by constructing the sector bounds based only onthe input to the first activation layer. Specifically, let v ∗ be theequilibrium value at the first NN layer. Select v , ¯ v ∈ R n with v ≤ v ∗ ≤ ¯ v . The assumed bounds [ v , ¯ v ] can beused to compute a local sector [ α , β ] around the point v ∗ or the first activation function φ , i.e. the i th input/outputpair of φ is locally sector bounded by [ α i , β i ] . The boundson v can also be used to compute an interval [ w , ¯ w ] forthe output w = φ ( v ) † which can then be used to computebounds [ v , ¯ v ] on the input v to the next activation function. ‡ The intervals computed for w and v will contain theirequilibrium value w ∗ and v ∗ . This process can be propagatedthrough all layers of the NN to obtain local sector bounds [ α i , β i ] for the nonlinearities φ i ( i = 1 , . . . , (cid:96) ). The sectors canbe stacked into vectors α φ , β φ ∈ R n φ that provide (element-wise) local sector bounds as summarized next. Property 1.
Let v ∗ ∈ R n φ be an equilibrium value of theactivation input and v ∗ ∈ R n be the corresponding valueat the first layer. Select v , ¯ v ∈ R n with v ∗ ∈ [ v , ¯ v ] . Let ˆ v φ ( v ) ∈ R n φ denote the activation input generated by the NN(Equation 3b) from the input v ∈ R n at the first layer. Thereexist α φ , β φ ∈ R n φ such that φ satisfies the offset local sectoraround the point ( v ∗ , φ ( v ∗ )) for all ˆ v φ ( v ) with v ∈ [ v , ¯ v ] .E. Lyapunov Condition This section uses a Lyapunov function and the offset localsector to compute an inner approximation for the ROA of thefeedback system of G and π . To simplify notation, the intervalbound on v is assumed to be symmetrical about v ∗ , i.e. v =2 v ∗ − ¯ v so that ¯ v − v ∗ = v ∗ − v . This can be relaxed to handlenon-symmetrical intervals with minor notational changes. Theorem 1.
Consider the feedback system of plant G in (2) and NN π in (3) with equilibrium point ( x ∗ , u ∗ , v ∗ , w ∗ ) satis-fying (12) . Let ¯ v ∈ R n , v := 2 v ∗ − ¯ v , and α φ , β φ ∈ R n φ be given vectors satisfying Property 1 for the NN. Denote the i th row of the first weight W by W i and define matrices R V := (cid:20) I n G n G × n φ N ux N uw (cid:21) , and R φ := (cid:20) N vx N vw n φ × n G I n φ (cid:21) . If there exists a matrix P ∈ S n G ++ , and vector λ ∈ R n φ with λ ≥ such that R (cid:62) V (cid:20) A (cid:62) G P A G − P A (cid:62) G P B G B (cid:62) G P A G B (cid:62) G P B G (cid:21) R V + R (cid:62) φ Ψ (cid:62) φ M φ ( λ )Ψ φ R φ < , (20) (cid:20) (¯ v i − v ∗ ,i ) W i W (cid:62) i P (cid:21) ≥ , i = 1 , · · · , n , (21) † For example, if ϕ ( ν ) = tanh( ν ) then the input bound ν ∈ [ − ¯ ν, ¯ ν ] implies the output bound ϕ ( ν ) ∈ [ − tanh(¯ ν ) , tanh(¯ ν )] . ‡ The next activation input is v := W w + b . The largest value of the i th entry of this vector is obtained by solving the following optimization: ¯ v i := max w ≤ w ≤ ¯ w y (cid:62) w + b i (18)where y (cid:62) is the i th row of W . Define c := ( ¯ w + w ) and r := ( ¯ w − w ) . The optimization can be rewritten as: ¯ v i := (cid:16) y (cid:62) c + b i (cid:17) + max − r ≤ δ ≤ r y (cid:62) δ (19)This has the explicit solution ¯ v i = y (cid:62) c + b i + (cid:80) n j =1 | y j r j | . Similarly, theminimal value is v i = y (cid:62) c + b i − (cid:80) n j =1 | y j r j | . then: (i) the feedback system consisting of G and π islocally stable around x ∗ , and (ii) the set E ( P, x ∗ ) , definedby Equation 1, is an inner-approximation to the ROA.Proof. By Schur complements, (21) is equivalent to: W i P − W (cid:62) i ≤ (¯ v i − v ∗ ,i ) , i = 1 , · · · , n . (22)It follows from Lemma 1 in [20] that: E ( P, x ∗ ) ⊆ { x ∈ R n G : v − v ∗ ≤ W ( x − x ∗ ) ≤ ¯ v − v ∗ } . Finally, use v − v ∗ = W ( x − x ∗ ) to rewrite this as: E ( P, x ∗ ) ⊆ { x : v ≤ v ≤ ¯ v } . To summarize, feasibility of (21) verifies that if x ( k ) ∈E ( P, x ∗ ) then v ( k ) ∈ [ v , ¯ v ] and hence the offset localsectors conditions are valid.Next, left / right multiply the LMI (20) by any (non-zero) (cid:2) ( x ( k ) − x ∗ ) (cid:62) ( w φ ( k ) − w ∗ ) (cid:62) (cid:3) and its transpose to obtain (cid:20) x ( k ) − x ∗ u ( k ) − u ∗ (cid:21) (cid:62) (cid:20) A (cid:62) G P A G − P A (cid:62) G P B G B (cid:62) G P A G B (cid:62) G P B G (cid:21) (cid:20) x ( k ) − x ∗ u ( k ) − u ∗ (cid:21) + (cid:20) v φ ( k ) − v ∗ w φ ( k ) − w ∗ (cid:21) (cid:62) Ψ (cid:62) φ M φ ( λ )Ψ φ (cid:20) v φ ( k ) − v ∗ w φ ( k ) − w ∗ (cid:21) < . Define the Lyapunov function V ( x ) := ( x − x ∗ ) (cid:62) P ( x − x ∗ ) and use (2) and (12) to show: V ( x ( k + 1)) − V ( x ( k ))+ (cid:20) v φ ( k ) − v ∗ w φ ( k ) − w ∗ (cid:21) (cid:62) Ψ (cid:62) φ M φ ( λ )Ψ φ (cid:20) v φ ( k ) − v ∗ w φ ( k ) − w ∗ (cid:21) < . (23)As noted above, x ( k ) ∈ E ( P, x ∗ ) implies the offset local sector [ α φ , β φ ] around v ∗ . Then, by Lemma 1, the final term in (23)is ≥ and it follows from a Lyapunov argument, e.g. Theorem4.1 in [7], that x ∗ is an asymptotically stable equilibrium pointand E ( P, x ∗ ) is an inner approximation of the ROA.This stability theorem can be made less conservative byincorporating global slope bounds on ϕ in addition to the localsector bounds. Slope bounds provide quadratic constraintsbetween two inputs ν , ν ∈ R and their corresponding outputs ϕ ( ν ) , ϕ ( ν ) . These constraints can be exploited in twodifferent ways. First, the slope bounds provide constraints thatrelate the activation at different time instances, e.g. between φ i ( v φ,i ( k )) and φ i ( v φ,i ( k + 1)) for any i = 1 , . . . , n φ . Thesetemporal relations are captured by the Zames-Falb IQC [21],[22]. Second, it provides constraints that relate the scalaractivation functions repeated in φ at the same time instant,e.g. between φ i ( v φ,i ( k )) and φ j ( v φ,j ( k )) .III. R OBUST S TABILITY A NALYSIS
A. Problem Formulation
Consider the uncertain feedback system in Figure 5, con-sisting of an uncertain plant F u ( G, ∆) and a NN controller π as defined by (3). The uncertain plant F u ( G, ∆) is aninterconnection of a nominal plant G and a perturbation ∆ .The nominal plant G is defined by the following equations: x ( k + 1) = A G x ( k ) + B G q ( k ) + B G u ( k ) , (24a) p ( k ) = C G x ( k ) + D G q ( k ) + D G u ( k ) , (24b)here x ( k ) ∈ R n G is the state, u ( k ) ∈ R n u is the controlinput, p ( k ) ∈ R n p and q ( k ) ∈ R n q are the input and outputof ∆ , A G ∈ R n G × n G , B G ∈ R n G × n q , B G ∈ R n G × n u , C G ∈ R n p × n G , D G ∈ R n p × n q , and D G ∈ R n p × n u . Theperturbation is a bounded, causal operator ∆ : (cid:96) n p e → (cid:96) n q e . Thenominal plant G and perturbation ∆ form the interconnection F u ( G, ∆) through the constraint q ( · ) = ∆( p ( · )) . (25)The vectors ( x ∗ , u ∗ , p ∗ , q ∗ ) form an equilibrium point of thefeedback system if the following conditions hold: x ∗ = A G x ∗ + B G q ∗ + B G u ∗ , (26a) u ∗ = π ( x ∗ ) , (26b) q s ∗ = ∆( p s ∗ ) , (26c)where Equation 26c states that the constant input p s ∗ := { p ∗ , p ∗ , . . . } maps to the constant output q s ∗ := { q ∗ , q ∗ , . . . } .Note that ∆ is modeled as an operator mapping inputs tooutputs. If ∆ has an internal state then there is an implicitassumption that its initial condition is such that q s ∗ = ∆( p s ∗ ) .Let χ ( k ; x ) denote the solution to the uncertain feedbacksystem at time k from the initial condition x (0) = x . § Definethe robust ROA associated with x ∗ as follows. G
The robust ROA of the feedback system with theuncertain plant F u ( G, ∆) and NN π is defined as: R := { x ∈ R n G : lim k →∞ χ ( k ; x ) = x ∗ } . (27)The objective is to prove the uncertain feedback system isasymptotically stable and, if so, to find the largest estimate ofthe robust ROA using an ellipsoidal inner approximation. B. Integral Quadratic Constraints
The perturbation can represent various types of uncertainty[18], [19], including saturation, time delay, unmodeled dy-namics, and slope-restricted nonlinearities. The input-outputrelationship of ∆ is characterized with an integral quadraticconstraint (IQC), which consists of a ‘virtual’ filter Ψ ∆ applied § An input/output model is used for the perturbation ∆ so that its internalstate and initial condition is not explicitly considered. to the input p and output q of ∆ and a constraint on the output r of Ψ ∆ . The filter Ψ ∆ is an LTI system of the form: ψ ( k + 1) = A Ψ ψ ( k ) + B Ψ1 p ( k ) + B Ψ2 q ( k ) (28a) r ( k ) = C Ψ ψ ( k ) + D Ψ1 p ( k ) + D Ψ2 q ( k ) . (28b) ψ (0) = ψ ∗ (28c)where ψ ( k ) ∈ R n ψ is the state and r ( k ) ∈ R n r is the output.The state matrices have compatible dimensions. The vectors ψ ∗ ∈ R n ψ and r ∗ ∈ R n r are equilibrium states and output of(28) with inputs p ∗ and q ∗ such that Equation 26c holds, i.e., ψ ∗ = A Ψ ψ ∗ + B Ψ1 p ∗ + B Ψ2 q ∗ (29a) r ∗ = C Ψ ψ ∗ + D Ψ1 p ∗ + D Ψ2 q ∗ . (29b)The Lyapunov analysis in the next subsection makes use oftime-domain IQCs as defined next: Definition 6.
Let Ψ ∆ ∈ RH n r × ( n p + n q ) ∞ and M ∆ ∈ S n r begiven. A bounded, causal operator ∆ : (cid:96) n p e → (cid:96) n q e satisfiesthe time domain IQC defined by (Ψ ∆ , M ∆ ) if the followinginequality holds for all p ∈ (cid:96) n p e , q = ∆( p ) and for all N ≥ N (cid:88) k =0 ( r ( k ) − r ∗ ) (cid:62) M ∆ ( r ( k ) − r ∗ ) ≥ . (30) where r ( k ) and r ∗ are defined by (28b) and (29b) . The notation ∆ ∈ IQC (Ψ ∆ , M ∆ ) indicates that ∆ satisfiesthe IQC defined by Ψ ∆ and M ∆ . Therefore, the preciserelation (25), for analysis, is replaced by the constraint (30) on r . The quadratic constraint proposed in Lemma 1 is a specialinstance of a time-domain IQCs. Specifically, Lemma 1 definesa quadratic constraints that holds at each time step k and hencethe inequality also holds summing over any finite horizons.This is referred to as the offset local sector IQC.The time-domain IQCs, as defined here, hold on any finitehorizon N ≥ . These are typically called “hard IQCs”[18]. IQCs can also be defined in the frequency domainand equivalently expressed as time-domain constraints overan infinite horizon ( N = ∞ ). These are called soft IQCs.Although this paper focuses on the use of hard IQCs, it ispossible to also incorporate soft IQCs [23]–[25]. C. Lyapunov Condition
Let ζ := [ xψ ] ∈ R n ζ define the extended state vector, n ζ = n G + n ψ , whose dynamics are ζ ( k + 1) = A ζ ( k ) + B (cid:20) q ( k ) u ( k ) (cid:21) (31a) r ( k ) = C ζ ( k ) + D (cid:20) q ( k ) u ( k ) (cid:21) (31b) u ( k ) = π ( x ( k )) (31c)where the state-space matrices are A = (cid:20) A G B Ψ1 C G A Ψ (cid:21) , B = (cid:20) B G B G B Ψ1 D G + B Ψ2 B Ψ1 D G (cid:21) , C = (cid:2) D Ψ1 C G C Ψ (cid:3) , D = (cid:2) D Ψ1 D G + D Ψ2 D Ψ1 D G (cid:3) . et ζ ∗ := (cid:2) x ∗ ψ ∗ (cid:3) define the equilibrium point of the extendedsystem (31). Since IQCs implicitly constrain the input p of the extended system (31), the response of the extendedsystem subject to IQCs “covers” the behaviors of the originaluncertain feedback system. The following theorem provides amethod for inner-approximating the robust ROA by perform-ing analysis on the extended system subject to IQCs. Theorem 2.
Consider the feedback system of an uncertainplant F u ( G, ∆) in (24) – (25) , and the NN π in (3) withequilibrium point ( ζ ∗ , u ∗ , v ∗ , w ∗ , r ∗ , p ∗ , q ∗ ) satisfying (26) and (29) . Assume ∆ ∈ IQC (Ψ ∆ , M ∆ ) with Ψ ∆ and M ∆ given. Let ¯ v ∈ R n , v := 2 v ∗ − ¯ v , and α φ , β φ ∈ R n φ be given vectorssatisfying Property 1 for the NN, and define matrices R V = I n ζ I n q N uζ N uw , N uζ = [ N ux , n u × n ψ ] ,R φ = (cid:20) N vζ N vw I n φ (cid:21) , N vζ = [ N vx , n φ × n ψ ] , W i = (cid:2) W i × n ψ (cid:3) , W i is the i th row of W . If there exists a matrix P ∈ S n ζ ++ , and vector λ ∈ R n φ with λ ≥ such that R (cid:62) V (cid:20) A (cid:62) P A − P A (cid:62) P BB (cid:62) P A B (cid:62) P B (cid:21) R V + R (cid:62) φ Ψ (cid:62) Φ M φ ( λ )Ψ φ R φ + R (cid:62) V (cid:2) C D (cid:3) (cid:62) M ∆ (cid:2) C D (cid:3) R V < , (32a) (cid:20) (¯ v i − v ∗ ,i ) W i W (cid:62) i P (cid:21) ≥ , i = 1 , · · · , n , (32b) then: (i) the feedback system comprising the uncertainplant F u ( G, ∆) and π is locally stable around x ∗ for any ∆ ∈ IQC (Ψ ∆ , M ∆ ) , and (ii) the intersection of E ( P, ζ ∗ ) withthe hyperplane ψ = ψ ∗ , i.e. E ( P x , x ∗ ) where P x ∈ R n G × n G is the upper left block of P , is an inner-approximation to therobust ROA.Proof. As in the proof of Theorem 1, feasibility of (32b)implies that if ζ ( k ) ∈ E ( P, ζ ∗ ) then v ( k ) ∈ [ v , ¯ v ] and hencethe offset local sectors conditions are valid. Since the LMI in(32a) is strict, there exists (cid:15) > such that left/right multiplica-tion of the LMI by [( ζ ( k ) − ζ ∗ ) (cid:62) , ( w φ ( k ) − w ∗ ) (cid:62) , ( q ( k ) − q ∗ ) (cid:62) ] and its transpose yields: (cid:34) (cid:63) (cid:35) (cid:62) (cid:20) A (cid:62) P A − P A (cid:62) P BB (cid:62) P A B (cid:62) P B (cid:21) ζ ( k ) − ζ ∗ q ( k ) − q ∗ u ( k ) − u ∗ + (cid:34) (cid:63) (cid:35) (cid:62) (cid:2) C D (cid:3) (cid:62) M ∆ (cid:2) C D (cid:3) ζ ( k ) − ζ ∗ q ( k ) − q ∗ u ( k ) − u ∗ + (cid:104) (cid:63) (cid:105) (cid:62) Ψ (cid:62) φ M φ ( λ )Ψ φ (cid:20) v φ ( k ) − v ∗ w φ ( k ) − w ∗ (cid:21) ≤ − (cid:15) (cid:107) ζ ( k ) − ζ ∗ (cid:107) , where the entries denoted by (cid:63) can be inferred from symmetry.Define the Lyapunov function V ( ζ ) := ( ζ − ζ ∗ ) (cid:62) P ( ζ − ζ ∗ ) ,and use (26), (29) and (31) to show: V ( ζ ( k + 1)) − V ( ζ ( k )) + ( r ( k ) − r ∗ ) (cid:62) M ∆ ( r ( k ) − r ∗ )+ (cid:104) (cid:63) (cid:105) (cid:62) Ψ (cid:62) φ M φ ( λ )Ψ φ (cid:20) v φ ( k ) − v ∗ w φ ( k ) − w ∗ (cid:21) ≤ − (cid:15) (cid:107) ζ ( k ) − ζ ∗ (cid:107) . Sum this inequality from k = 0 to any finite time N ≥ . Thethird and fourth term on the left side will be ≥ by the localsector conditions and the IQC. This yields: V ( ζ ( N + 1)) − V ( ζ (0)) ≤ − N (cid:88) k =0 (cid:15) (cid:107) ζ ( k ) − ζ ∗ (cid:107) . Thus if ζ (0) ∈ E ( P, ζ ∗ ) then ζ ( k ) ∈ E ( P, ζ ∗ ) for all k ≥ .Moreover, this inequality implies that ζ ( N ) → ζ ∗ as N → ∞ .The initial condition for the virtual filter is ψ (0) = ψ ∗ so that ζ (0) ∈ E ( P, ζ ∗ ) is equivalent to x (0) ∈ E ( P x , x ∗ ) . Hence E ( P x , x ∗ ) is an inner approximation for the ROA.For a particular perturbation ∆ there is typically a class ofvalid time-domain IQCs defined by a fixed filter Ψ ∆ and amatrix M ∆ drawn from a constraint set M ∆ . Therefore whenformulating an optimization problem, along with P and λ , wecan treat M ∆ ∈ M ∆ as an additional decision variable toreduce conservatism. In this paper, the set M ∆ is restrictedto one that is described by LMIs [19]. The “largest” inner-approximation to the ROA can be obtained by minimizingtrace ( P x ) or some other metric related to the ellipsoid volume.IV. E XAMPLES
A. Inverted pendulum
Consider the nonlinear inverted pendulum example from[26] with mass m = 0 . kg, length l = 0 . m, and frictioncoefficient µ = 0 . Nms / rad. The dynamics are: ¨ θ ( t ) = mgl sin( θ ( t )) − µ ˙ θ ( t ) + u ( t ) ml , (33)where θ is the angular position (rad) and u is the controlinput (Nm). The plant state is x = [ θ, ˙ θ ] . The controller π is obtained through a reinforcement learning process usingpolicy gradient [3]–[5]. During training, the agent decisionmaking process is characterized by a probability: u ( k ) ∼ P r ( u ( k ) = u | x ( k ) = x ) for all u ∈ R and x ∈ R wherethe probability is a Gaussian distribution with mean π ( x ( k )) ,and standard deviation σ . After training, the policy mean π is used as the deterministic controller u ( k ) = π ( x ( k )) . Thecontroller π is parameterized by a 2-layer, feedforward NNwith n = n = 32 and tanh as the activation function forboth layers. The biases in the NN are set to zero during trainingto ensure that the equilibrium point is x ∗ = 0 and u ∗ = 0 .The dynamics used for training are the discretized version of(33) with sampling time dt = 0 . s.The system (33) is nonlinear due to the trigonometric term sin( θ ) . The robust ROA analysis framework can be applied byrearranging (33) into the form: ¨ θ ( t ) = − mglq ( t ) + mglθ ( t ) − µ ˙ θ ( t ) + u ( t ) ml , (34a) q ( t ) = ∆( θ ( t )) := θ ( t ) − sin( θ ( t )) . (34b)The static nonlinearity ∆( θ ) = θ − sin( θ ) , treated as aperturbation, is slope-restricted in the interval [0, 2], and sectorbounded in the sector [0, 1] for θ ∈ [ − . , . as shownin Figure 6. Local slope and sector bounds can be derivedto provide less conservative results on shorter intervals [ θ, ¯ θ ] . .
B. Vehicle lateral control
Consider the vehicle lateral dynamics from [28]: ˙ e ¨ e ˙ e θ ¨ e θ = C αf + C αr mU − C αf + C αr m aC αf − bC αr mU aC αf − bC αr I z U − aC αf − bC αr I z a C αf + b C αr I z U e ˙ ee θ ˙ e θ + − C αf m − aC αf I z u + aC αf − bC αr m − U a C αf + b C αr I z c (35)where e is the perpendicular distance to the lane edge, and e θ is the angle between the tangent to the straight section of the road and the projection of the vehicle’s longitudinal axis. Let x = [ e, ˙ e, e θ , ˙ e θ ] (cid:62) denote the plant state. The control input u isthe steering angle of the front wheel (rad), the disturbance c isthe road curvature ( / m), and the parameters are: longitudinalspeed of the vehicle U = 28 m / s, front cornering stiffness C αf = − . × N / rad, rear cornering stiffness C αr = − . × N / rad, vehicle mass m = 1 . × kg, momentof inertia I z = 2 . × kg / m , distances from vehicle centerof gravity to front axle a = 0 . m and rear axle b = 1 . m.Again, the controller π is obtained through reinforcementlearning using policy gradient, and is parameterized by a 2-layer, feedforward NN, with n = n = 32 and tanh asthe activation function for both layers. The training processuses a discretized version of (35) with sampling time dt =0 . s and draws the curvature c ( k ) at each time step froman interval [ − / , / . The control command derivedfrom u ( k ) = π ( x ( k )) enters the vehicle dynamics through asaturation function, defined as:sat ( u ) := u max ; u > u max ,u ; u min ≤ u ≤ u max ,u min ; u < u min , (36)where in this case u max = 30 ◦ and u min = − ◦ . Let u sat := sat ( π ( x )) define the saturated control signal.The analysis is performed for a constant curvature c ≡ / , resulting in a non-zero equilibrium state x ∗ =[ − . , , . , (cid:62) . In the analysis problem, on topof saturation, we also add a norm-bounded LTI uncertainty ∆ LTI ∈ RH ∞ with (cid:107) ∆ LTI (cid:107) ∞ ≤ . to the control input.This is used to assess the robustness of the NN controlleragainst actuator uncertainty. As shown in Figure 8, the actualinput to the vehicle dynamics is u pert ( k ) = u sat ( k ) + q ( k ) , (37a) q ( · ) = ∆ LTI ( u sat ( · )) . (37b)The saturation function is static and can also be describedusing a local sector bound. Let ¯ u be the largest possible controlcommand from π induced from the assumption that v ∈ [ v , ¯ v ] where ¯ v = v ∗ +0 . × × , and v = v ∗ − . × × .Then the saturation function in (36) satisfies the local sector [ α, β ] , where α := u max ¯ u and β := 1 . G The paper presented a method to analyze stability and toinner-approximate the region of attraction of equilibria infeedback systems with NN controllers. First, LTI plants wereanalyzed using Lyapunov theory and local sector QCs forbounding nonlinear activation functions. Second, the resultswere extended to uncertain plants with perturbations describedby IQCs, such as nonlinearities and unmodeled dynamics.Finally, the method was illustrated on a nonlinear invertedpendulum example, and uncertain vehicle lateral dynamicswith stabilizing NN controllers obtained using policy gradient.Future work will address disturbance accommodation andreachability analysis in feedback systems with NN controllers.VI. A CKNOWLEDGEMENTS The authors thank Professor Andrew Packard for initiatingthis project. The authors also acknowledge useful discussionswith Mahyar Fazlyab, Abhijit Chakraborty, and Akarsh Gupta.R EFERENCES[1] A. U. Levin and K. S. Narendra, “Control of nonlinear dynamicalsystems using neural networks: controllability and stabilization,” IEEETransactions on Neural Networks , vol. 4, no. 2, pp. 192–206, 1993.[2] R. J. Williams, “Simple statistical gradient-following algorithms forconnectionist reinforcement learning,” Mach. Learn. , vol. 8, no. 3–4,p. 229–256, May 1992. [Online]. Available: https://doi.org/10.1007/BF00992696[3] J. Peters and S. Schaal, “Reinforcement learning of motor skills withpolicy gradients,” Neural networks : the official journal of the Interna-tional Neural Network Society , vol. 21 4, pp. 682–97, 2008.[4] J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “TrustRegion Policy Optimization,” arXiv e-prints , p. arXiv:1502.05477, Feb.2015.[5] S. Kakade, “A natural policy gradient,” in Proceedings of the 14thInternational Conference on Neural Information Processing Systems:Natural and Synthetic , ser. NIPS’01. Cambridge, MA, USA: MITPress, 2001, p. 1531–1538. [6] D. A. Pomerleau, ALVINN: An Autonomous Land Vehicle in a NeuralNetwork . San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.,1989, p. 305–313.[7] H. K. Khalil, Nonlinear systems; 3rd ed. Upper Saddle River, NJ:Prentice-Hall, 2002, the book can be consulted by contacting: PH-AID:Wallet, Lionel. [Online]. Available: https://cds.cern.ch/record/1173048[8] M. Vidyasagar, Nonlinear Systems Analysis . SIAM, 2002.[9] M. Fazlyab, M. Morari, and G. J. Pappas, “Safety Verification andRobustness Analysis of Neural Networks via Quadratic Constraints andSemidefinite Programming,” arXiv e-prints , p. arXiv:1903.01287, Mar.2019.[10] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficientand accurate estimation of Lipschitz constants for deep neural networks,”in Advances in Neural Information Processing Systems 32 . CurranAssociates, Inc., 2019, pp. 11 427–11 438.[11] P. Pauli, A. Koch, J. Berberich, and F. Allg¨ower, “Training robust neuralnetworks using Lipschitz bounds,” arXiv e-prints , p. arXiv:2005.02929,May 2020.[12] H. Hu, M. Fazlyab, M. Morari, and G. J. Pappas, “Reach-SDP:Reachability Analysis of Closed-Loop Systems with Neural Net-work Controllers via Semidefinite Programming,” arXiv e-prints , p.arXiv:2004.07876, Apr. 2020.[13] C. Huang, J. Fan, W. Li, X. Chen, and Q. Zhu, “ReachNN: ReachabilityAnalysis of Neural-Network Controlled Systems,” arXiv e-prints , p.arXiv:1906.10654, Jun. 2019.[14] S. Dutta, X. Chen, and S. Sankaranarayanan, “Reachability analysis forneural feedback systems using regressive polynomial rule inference,”in Proceedings of the 22nd ACM International Conference on HybridSystems: Computation and Control , ser. HSCC ’19. New York,NY, USA: Association for Computing Machinery, 2019, p. 157–168.[Online]. Available: https://doi.org/10.1145/3302504.3311807[15] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee, “Verisig:verifying safety properties of hybrid systems with neural networkcontrollers,” arXiv e-prints , p. arXiv:1811.01828, Nov. 2018.[16] M. Jin and J. Lavaei, “Stability-certified reinforcement learning: Acontrol-theoretic perspective,” arXiv e-prints , p. arXiv:1810.11505, Oct.2018.[17] K. K. Kim, E. R. Patr´on, and R. D. Braatz, “Standard representation andunified stability analysis for dynamic artificial neural network models,” Neural Networks , vol. 98, pp. 251–262, 2018.[18] A. Megretski and A. Rantzer, “System analysis via integral quadraticconstraints,” IEEE Transactions on Automatic Control , vol. 42, no. 6,pp. 819–830, June 1997.[19] J. Veenman, C. W. Scherer, and H. K¨oro˘glu, “Robust stability andperformance analysis based on integral quadratic constraints,” EuropeanJournal of Control , vol. 31, pp. 1 – 32, 2016.[20] H. Hindi and S. Boyd, “Analysis of linear systems with saturation usingconvex optimization,” in Proceedings of the 37th IEEE Conference onDecision and Control (Cat. No.98CH36171) , vol. 1, 1998, pp. 903–908vol.1.[21] G. Zames and P. L. Falb, “Stability conditions for systems withmonotone and slope-restricted nonlinearities,” SIAM Journal on Control ,vol. 6, no. 1, pp. 89–108, 1968.[22] W. P. Heath and A. G. Wills, “Zames-Falb multipliers for quadraticprogramming,” in Proceedings of the 44th IEEE Conference on Decisionand Control , 2005, pp. 963–968.[23] A. Iannelli, P. Seiler, and A. Marcos, “Region of attraction analysis withintegral quadratic constraints,” Automatica , vol. 109, p. 108543, 2019.[24] H. Yin, P. Seiler, and M. Arcak, “Backward Reachability using IntegralQuadratic Constraints for Uncertain Nonlinear Systems,” arXiv e-prints ,p. arXiv:2003.05617, Mar. 2020.[25] H. Yin, A. Packard, M. Arcak, and P. Seiler, “Reachability AnalysisUsing Dissipation Inequalities For Uncertain Nonlinear Systems,” arXive-prints , p. arXiv:1808.02585, Aug. 2018.[26] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, “Safelearning of regions of attraction for uncertain, nonlinear systems withgaussian processes,” in , 2016, pp. 4661–4666.[27] L. Lessard, B. Recht, and A. Packard, “Analysis and design of opti-mization algorithms via integral quadratic constraints,” SIAM Journalon Optimization , vol. 26, no. 1, pp. 57–95, 2016.[28] A. Alleyne, “A comparison of alternative intervention strategies forunintended roadway departure (urd) control,”