On-The-Fly Control of Unknown Smooth Systems from Limited Data
Franck Djeumou, Abraham P. Vinod, Eric Goubault, Sylvie Putot, Ufuk Topcu
OOn-The-Fly Control of Unknown Smooth Systems from Limited Data
Franck Djeumou, Abraham P. Vinod, Eric Goubault, Sylvie Putot, and Ufuk Topcu
Abstract —We investigate the problem of data-driven, on-the-fly control of systems with unknown nonlinear dynamics wheredata from only a single finite-horizon trajectory and possibly sideinformation on the dynamics are available. Such side informationmay include knowledge of the regularity of the underlying dy-namics, monotonicity, or decoupling in the dynamics between thestates. Specifically, we propose two algorithms,
DaTaReach and
DaTaControl , to over-approximate the reachable set and designcontrol signals for the system on the fly.
DaTaReach constructsa differential inclusion that contains the unknown dynamics.Then, it computes an over-approximation of the reachable setbased on interval Taylor-based methods applied to systems withdynamics described as differential inclusions.
DaTaControl en-ables convex-optimization-based, near-optimal control using thecomputed over-approximation and the receding-horizon controlframework. We provide a bound on its suboptimality and showthat more data and side information enable
DaTaControl toachieve tighter suboptimality bounds. Finally, we demonstratethe efficacy of
DaTaControl over existing approaches on theproblems of controlling a unicycle and quadrotor systems.
I. I
NTRODUCTION
Consider a scenario in which abrupt changes in the dy-namics of a system occurs. The changes in the dynamics aresuch that the a priori known model cannot be used, and thereis a need to learn the new dynamics on the fly. In such ascenario, the system needs to retain a certain degree of controlwith data from only its current trajectory. This paper considersthe problem of data-driven, on-the-fly control of systems withunknown dynamics under severely limited data.We propose data-driven algorithms,
DaTaReach and
DaTaControl , to over-approximate the reachable set andcontrol systems with unknown dynamics. Reachability analy-sis provides a framework to analyze the set of states reachedby a dynamical system. We use such a framework as thebaseline for the on-the-fly control, as it enables robust controlto perturbations on the data. When the underlying dynamicsare known, Hamilton-Jacobi-based [1] or interval Taylor-based [2]–[4] methods can approximate the reachable set.However, limited work has been reported for the case wherethe underlying dynamics are not known a priori, and evenfewer work considers settings with severely limited data.
DaTaReach and
DaTaControl can work with data froma single finite-horizon trajectory of the system and take advan-
This material is based on work partly supported by Air Force Office ofScientific Research (FA9550-19-1-0005) and National Aeronautics and SpaceAdministration (80NSSC19K0209).F. Djeumou, A. Vinod, and U. Topcu are with the Department of Electricaland Computer Engineering, Oden Institute for Computational Engineeringand Sciences, and the Department of Aerospace Engineering and EngineeringMechanics at the University of Texas at Austin, Austin, TX, USA. Email: { fdjeumou, utopcu } @utexas.edu , [email protected] .E. Goubault and S. Putot are with the LIX, CNRS, Ecole Poly-technique, Institut Polytechnique de Paris, France. Email: { goubault,putot } @lix.polytechnique.fr . GoalOver-approximationsfor various controlsfor all t ∈ [ t N , t N +1 ] State at t i > t N Sampled trajectory for t ≤ t N Over-approximations fornear-optimal controlsNear-optimal controlsynthesis for t > t N Optimal trajectory
Fig. 1. We use limited data and side information for on-the-fly control ofsystems with unknown dynamics. At each sampling time, we compute anover-approximation of the set of states the system may reach to describethe uncertainty in its unknown trajectory and construct one-step optimalcontrollers via convex optimization. tage of various forms of side information on the dynamics.The algorithms consider trajectories containing finite samplesof the states, the derivatives of the states, and the controlsignals applied. Furthermore, if available, they can make useof side information such as knowledge of the regularity ofthe dynamics, bounds on the vector field, monotonicity ofthe vector field, decoupling in the dynamics among the statesof the system, or knowledge of parts of the dynamics. Suchside information can be extracted from data, responses of thesystem to inputs, or known elementary law of physics.
DaTaReach provides closed-form expressions for over-approximations of the reachable set of unknown dynamicalsystems. It first utilizes the available data and the given sideinformation to construct differential inclusions that containthe vector field of the unknown dynamics. Unlike existingwork [5] that provides state-independent differential inclu-sions, the constructed differential inclusions account for de-pendencies on the states and control signals. Then, it pro-vides closed-form expressions for over-approximations of thereachable set associated with these differential inclusions. Toobtain such over-approximations, it utilizes interval Taylor-based methods [2]–[4] for known dynamics to compute over-approximations of the reachable set of dynamics describedby these differential inclusions. The closed-form expressionscan incorporate the side information, and more data and sideinformation provide tighter over-approximations.The closed-form expressions enable convex-optimization-based, near-optimal control of unknown dynamical systemsbased on the one-step receding-horizon control framework [6].Specifically, we seek to sequentially minimize a given one-step cost function in a discrete-time setting. The one-step costfunction, which encodes the desired behavior of the system,has to be optimized in a black-box manner since it is typicallya function of the unknown next state , the known current state,and the current input.
DaTaControl computes approximate a r X i v : . [ ee ss . S Y ] S e p olutions of such an optimization problem via approximateconvex optimization problems. We obtain such approximateproblems by replacing the unknown next state with a control-affine linearization of the corresponding over-approximationof the reachable set. Furthermore, we provide a bound on thesuboptimality of the approximate problems and show that sucha bound becomes tighter with more data and side information. Related work.
For the problem of data-driven control, re-searchers have proposed several approaches that combinesystem identification with model predictive control [7]–[11].In [7], the authors use Koopman theory to lift the unknownnonlinear dynamics to a higher-dimensional space where theyperform linear system identification.
SINDYc [8] utilizes asparse regression over a library of nonlinear functions fornonlinear system identification.
DMDc [9] uses the spectralproperties of the collected data to obtain approximate linearmodels. Thus,
DMDc can perform poorly due to the restrictionto linear dynamics. Myopic control [10] uses a finite sequenceof perturbations to learn a local linear model, which is thenused to optimize a model-driven goodness function that en-codes desirable behaviors.
DMDc , SINDYc , and approachesbased on the Koopman theory require significantly more datathan the proposed approach. Myopic control cannot incorpo-rate any of the side information mentioned above. Similarly,in their current forms, we believe that
DMDc , SINDYc , andapproaches based on the Koopman theory cannot incorporatethe side information. Furthermore, the experiments show that
DaTaControl performs significantly better than
SINDYc .Contextual optimization-based approaches tackle the data-driven control problem via surrogate optimization and skipthe system identification step [11], [12]. These approachesminimize the one-step cost function in a black-box mannerusing the data.
C2Opt [11] exploits the structure in the givenproblem and utilizes side information (e.g., smoothness) andconvex optimization to solve this problem. It overcomes thedrawbacks of the Gaussian process-based approaches [12],namely high computational costs, expensive hyperparametertuning, and inability to incorporate side information. Unlike
DaTaControl , C2Opt considers limited forms of side in-formation and relies on the knowledge of the gradient of theone-step cost, that may not be accessible. We demonstrate ona unicycle and quadrotor systems that
DaTaControl is threeorders of magnitude computationally faster and less subopti-mal than C2Opt and the Gaussian process-based algorithms.Recent work [13]–[15] has considered the problem of data-driven estimation of the reachable sets of partially unknowndynamical systems. The approaches in these work rely on asystem identification using either supervised learning algo-rithms [15] or Gaussian process-based algorithms [13], [14].Such approaches are unable to take advantage of the side infor-mation, require significantly more data than
DaTaReach , andprovide only probabilistic guarantees of the correctness of thecomputed reachable sets while
DaTaReach provides correctover-approximations at the expense of being conservative.In Appendix A, we provide the proofs of all the technicalresults of the paper. II. P
RELIMINARIES AND P ROBLEM S TATEMENT
We denote an interval by [ a, b ] = { x ∈ R | a ≤ x ≤ b } forsome a, b ∈ R such that a ≤ b , the set { i, . . . , j } by N [ i,j ] for i, j ∈ N with i ≤ j , the -norm by || · || , the k th componentof a vector x and the ( k, j ) component of a matrix X by ( x ) k and ( X ) k,j , respectively, the Lipschitz constant of f : X → R by L f = sup { L ∈ R | | f ( x ) − f ( y ) | ≤ L (cid:107) x − y (cid:107) , x, y ∈X , x (cid:54) = y } for X ⊆ R n , and the Jacobian of a function f by ∂f∂x . A function f ∈ C k ( X ) , referred as f is C k , with k ≥ if f is continuous on X ⊆ R n and all the partial derivativesof order , . . . , k exist and are continuous on X , and f is piecewise- C k with k ≥ if there exists a partition of X suchthat f is C k on each set in the partition. A. Interval Analysis
We denote the set of intervals on R by IR = {A =[ A , A ] | A , A ∈ R , A ≤ A} , the set of n -dimensional intervalvectors by IR n , and the set of n × m -dimensional intervalmatrices by IR n × m . We carry forward the definitions [16]of arithmetic operations, set inclusion, and intersections ofintervals to interval vectors and matrices by applying themcomponentwise. We use the term interval to specify an intervalvector or interval matrix when it is clear from the context.Given f : X (cid:55)→ Y with
X ⊆ R n and Y ⊆ R m (or Y ⊆ R n × m ), we define an interval extension of f as f with f ( A ) ⊇ R ( f, A ) = { f ( x ) | x ∈ A} , ∀A ⊆ X . (1)Thus, given an interval A , f ( A ) is an interval that over-approximates the range of values taken by f over A . Example 1 (I NTERVAL EXTENSION OF − NORM ) . Consider f = || · || . We compute its interval extension f via intervalextensions of α = √· and β = ( · ) . For any A = [ A , A ] ∈ IR , α ( A ) = [ (cid:112) A , (cid:112) A ] , if A ≥ , (2) β ( A ) = (cid:40) [0 , max {A , A } ] , if ∈ A [min {A , A } , max {A , A } ] , otherwise . (3) Using (2) and (3) with interval arithmetic, we have, for any S = [ S , . . . , S n ] ∈ IR n , f ( S ) = α ( (cid:80) ni =1 β ( S i )) .B. Over-Approximations of the Reachable Set Consider a nonlinear dynamical system, ˙ x = h ( x, u ) , (4)where the state x : R + (cid:55)→ X is a continuous-time signalevolving in X ∈ IR n , the control u ∈ U is a signal of timeevolving in the control set U ∈ IR m with U = { v : R + (cid:55)→U | v is piecewise- C D u } for D u ≥ , and h : X × U (cid:55)→ Y is C D h for D h ≥ and Y ∈ IR n . Given an initial state x i = x ( t i ) at time t i and a control signal u ∈ U , a trajectory of (4)is a function of time x ( · ; x i , u ) : [ t i , ∞ [ (cid:55)→ X that satisfies (4). Definition 1 (R EACHABLE SET ) . Given a set I i ⊆ X of statesat time t i and a set V ⊆ U of control signals, the reachableset of the dynamics (4) at time t ≥ t i is given by R ( t, I i , V ) = { z ∈ X | ∃ x i ∈ I i , ∃ v ∈ V , z = x ( t ; x i , v ) } . iven a set V ⊆ U and a set I ⊆ X of states at time t ,we compute over-approximations of R ( t, I , V ) at time t ≥ t using interval Taylor-based methods [3], [4], [16]. Specifically,we consider a time grid t < · · · < t N such that for all v ∈ V , v is C D u on each interval [ t i , t i +1 [ . We want to compute sets R + i ∈ IR n (with R +0 = I ) such that for all i ∈ N [0 ,N − , R i +1 = R ( t i +1 , R + i , V ) ⊆ R + i +1 . First, interval arithmeticenables to inductively define h [ d ] , the interval extensions ofthe Taylor coefficients h [ d ] given by h [1] = h, h [ d +1] = 1 d + 1 (cid:16) ∂h [ d ] ∂x h + d − (cid:88) l =0 ∂h [ d ] ∂u ( l ) u ( l +1) (cid:17) . (5)Next, we start with R +0 = I , and compute {R + i } Ni =1 by R + i +1 = R + i + D − (cid:88) d =1 (cid:0) t i +1 − t i (cid:1) d (cid:0) h [ d ] ( R + i , v ) (cid:1) ( t i )+ (cid:0) t i +1 − t i (cid:1) D (cid:0) h [ D ] ( S i , v ) (cid:1) ([ t i , t i +1 ]) , (6)where D ≤ min( D u +1 , D h ) is the order of the Taylor expan-sion, we denote v (0) ( A ) by v ( A ) and the intervals v ( d ) ( A ) for all d ∈ N [0 ,D u ] are such that ∪ v ∈ V R ( v ( d ) , A ) ⊆ v ( d ) ( A ) with A ⊆ R + . In other words, v ( d ) ( A ) over-approximatesthe range of the d th derivative of all v ∈ V on the interval A . Here, the set S i ⊆ X is an a priori rough enclosure of R ( t, R + i , V ) for all t ∈ [ t i , t i +1 ] and is a solution of R + i + [0 , t i +1 − t i ] h ( S i , v ([ t i , t i +1 ])) ⊆ S i . (7) C. Problem Statement
In this paper, we consider control-affine nonlinear dynamics, ˙ x = f ( x ) + G ( x ) u, (8)where f : X (cid:55)→ R n and G : X (cid:55)→ R n × m are unknown vector-valued and matrix-valued functions, respectively, for X ∈ IR n .Note that even though we consider control-affine dynamics, inthe general case, we can construct a control-affine model ofthe system locally and apply the results of the paper. Assumption 1 (S MOOTHNESS ) . f and G are C D functionsfor some D ≥ . Assumption 1 is common in the frameworks of reachabilityanalysis and receding-horizon control. It implies that f and G are globally Lipschitz-continuous since the domain X ∈ IR n is bounded. We exploit such Lipschitz continuity by consid-ering the following side information throughout the paper. Side information 1 (B OUNDS ON L IPSCHITZ CONSTANTS ) . Let L f ∈ R n + and L G ∈ R n × m + with ( L f ) k = L f k and ( L G ) k,l = L G k,l be known upper bounds on the Lipschitzconstants of ( f ) k and ( G ) k,l for all k ∈ N [1 ,n ] and l ∈ N [1 ,m ] . We can also utilize, if available, any of the following sideknowledge about the unknown dynamics.
Side information 2 (V ECTOR FIELD BOUNDS ) . We aregiven R f A ∈ IR n and R G A ∈ IR n × m as known over-approximations of the range of f and G , respectively, overa given set A ⊆ X . Side information 3 (G RADIENT BOUNDS ) . We are givenbounds on the gradient of some components of f and G . Suchside information may include the monotonicity of f and G . Side information 4 (D ECOUPLING AMONG STATES ) . We aregiven the knowledge that some components of the vector field ˙ x do not depend on some components of the state x . Side information 5 (P ARTIAL DYNAMICS KNOWLEDGE ) . Weare given terms of some components of the vector field as ˙ x = f kn ( x ) + f ukn ( x ) + ( G kn ( x ) + G ukn ( x )) u, where f kn and G kn are known functions while f ukn and G ukn are unknown functions. Such side information can be derivedfrom the application of elementary law of physics. Let N ∈ N , N ≥ . Let T N = { ( x i , ˙ x i , u i ) } Ni =1 denotea single finite-horizon trajectory containing N samples of thestate x i = x ( t i ) , the derivative ˙ x i = ˙ x ( t i ) of the state, andthe control signal u i = u ( t i ) from a trajectory of (8). Giventhe current state x ( t N +1 ) and the trajectory T N , we first seekto over-approximate the reachable set of the system. Problem 1 (R EACHABLE SET OVER - APPROXIMATION ) . Given a single finite-horizon trajectory T N for N ∈ N , theside information 1 and possibly any of the side information 2–5, a set V ⊆ U of admissible control signals, a time stepsize ∆ t > , and a maximum number T > N of time steps,compute an over-approximation of the reachable set at time t i = t N + ( i − N )∆ t for all i ∈ N [ N +1 ,T ] . Next, we seek to control the system by finding, at eachsampling time t i > t N , control values solutions of the one-step optimal control problem [11], [17], [18] minimize u i ∈U c ( x i , u i , x ( t i +1 ; x i , u i )) , (9)where x i is the state of the system at t i , t i +1 = t i + ∆ t , ∆ t is the constant time step size, and c : X × U × X → R is aknown convex function, which encodes the preferences over ashort interval of length ∆ t in time. Problem 2 (A PPROXIMATE ONE - STEP OPTIMAL CONTROL ) . Given a single finite-horizon trajectory T i − for some i ∈ N , i > N , the current state x i at time t i , the side informa-tion 1, and possibly any of the side information 2–5, compute,at sampling time t i , approximate solutions of the one-stepoptimal control problem (9) and characterize the suboptimalityof such approximations. We use the following example throughout the paper.
Example 2 (U NICYCLE S YSTEM ) . Consider the unicyclesystem with dynamics given by ˙ p x = v cos( θ ) , ˙ p y = v sin( θ ) , ˙ θ = ω, (10) where the components of the state x = [ p x , p y , θ ] represent,respectively, the position in the x plane, the y plane, and theheading of the unicycle. The components of the control u =[ v, ω ] represent the speed and the turning rate, respectively.e consider the constraint set U = [ − , × [ − π, π ] . In thecontrol-affine form (8) , we have f = 0 , ( G ) , = cos( θ ) , ( G ) , = sin( θ ) , ( G ) , = 1 , and ( G ) k,l = 0 otherwise. Weassume the dynamics (10) are unknown and are given the looseLipschitz bounds L f = [0 . , . , . , L G , = L G , =1 . , L G , = 0 . , and L G k,l = 0 otherwise. Furthermore, weconsider the knowledge that the vector field does not dependon its positions. That is, f ( x ) = f ( θ ) and G ( x ) = G ( θ ) . III. O
VER -A PPROXIMATIONS OF THE R EACHABLE S ET OF U NKNOWN D YNAMICAL S YSTEMS
We propose
DaTaReach to address Problem 1. It con-structs a differential inclusion that contains, at all times, thevector field of the unknown dynamics using the side informa-tion 1. Then, it utilizes the interval Taylor-based method toover-approximate the reachable set of dynamics described bythe constructed differential inclusion.
A. Differential Inclusion based on Lipschitz Continuity
First, to aid the construction of the differential inclusion,we over-approximate f and G at each data point of T N . Lemma 1 (C ONTRACTION VIA DATA ) . Given a data point ( x i , ˙ x i , u i ) , an interval F i ∈ IR n such that f ( x i ) ∈ F i , andan interval G i ∈ IR n × m such that G ( x i ) ∈ G i , the intervals C F i and C G i , defined sequentially for l = 1 , . . . , m by ( C F i ) k = ( F i ) k ∩ ( ˙ x i − G i u i ) k , ( s ) k = ( ˙ x i − C F i ) k ∩ ( G i u i ) k , ( C G i ) k,l = (cid:0) (( s l − ) k − (cid:80) p>l ( G i ) k,p u p ) ∩ (( G i ) k,l u l ) (cid:1) u l , if u l (cid:54) = 0( G i ) k,l , otherwise , ( s l ) k = (cid:0) ( s l − ) k − ( C G i ) k,l u l (cid:1) ∩ (cid:0) (cid:88) p>l ( G i ) k,p u p (cid:1) , (11) for all k ∈ N [1 ,n ] , are the smallest intervals enclosing f ( x i ) and G ( x i ) , respectively, given only the data point, F i , and G i . We provide a proof for Lemma 1 in Appendix A where weuse the constraint ˙ x i = f ( x i ) + G ( x i ) u i to remove from F i and G i values of f ( x i ) and G ( x i ) not satisfying the constraint.Next, we use the Lipschitz continuity to provide explicitfunctions that over-approximate f and G , given uncertainknowledge of f and G at the data points. Lemma 2 (O VER - APPROXIMATION OF f AND G ) . Given a set E N = { ( x i , C F i , C G i ) | f ( x i ) ∈ C F i , G ( x i ) ∈ C G i } Ni =0 andthe bounds L f and L G from side information 1, the interval-valued functions f : X → IR n and G : X → IR n × m , givenfor all k ∈ N [1 ,n ] and l ∈ N [1 ,m ] by ( f ( x )) k = (cid:92) ( x i ,C F i , · ) ∈ E N ( C F i ) k + L f k (cid:107) x − x i (cid:107) [ − , , ( G ( x )) k,l = (cid:92) ( x i , · ,C G i ) ∈ E N ( C G i ) k,l + L G k,l (cid:107) x − x i (cid:107) [ − , , (12) are such that f ( x ) ∈ f ( x ) and G ( x ) ∈ G ( x ) for all x ∈ X . Algorithm 1
Optimal over-approximation of the values of f and G at each data point of a finite-horizon trajectory. Input:
Single trajectory T N , sufficiently large M > , upperbounds on the Lipschitz constants (side information 1). Optional : The sets A , R f A and R G A (side information 2). Output: E N = { ( x i , C F i , C G i ) | f ( x i ) ∈ C F i , G ( x i ) ∈ C G i } Ni =0 if no side information 2 then A ← X , R f A ← [ − M, M ] n , R G A ← [ − M, M ] n × m end if Define x ∈ A , C F ← R f A , and C G ← R G A for i ∈ N [1 ,N ] ∧ ( x i , ˙ x i , u i ) ∈ T N do Compute F i = f ( x i ) , G i = G ( x i ) via (12) and E i − Compute C F i , C G i via (11), F i , G i , and ( x i , ˙ x i , u i ) end for while E N is not invariant do Execute lines 5–8 with E N instead of E i − on line 6 end while return E N We provide a proof for Lemma 2 in Appendix A. Notethat interval extensions of f and G can be obtained byreplacing occurences of (cid:107) · (cid:107) by its interval extension givenin Example 1. In the rest of the paper, when the input of f and G are intervals, we use such interval extensions.Finally, Theorem 1 constructs a differential inclusion for thesystem by combining Algorithm 1 and Lemma 2. Theorem 1 (D IFFERENTIAL INCLUSION ) . Given a trajectory T N , the bounds L f and L G from side information 1, andpossibly bounds on the vector field from side information 2,the dynamics (8) are contained in the differential inclusion ˙ x ∈ f ( x ) + G ( x ) u, (13) where the functions f : X → IR n and G : X → IR n × m areobtained by (12) with E N being the output of Algorithm 1. A proof of Theorem 1 is provided in Appendix A where weshow that the output E N of Algorithm 1 is such that for all ( x i , C F i , C G i ) ∈ E N , f ( x i ) ∈ C F i and G ( x i ) ∈ C G i . There-fore, Lemma 2 and interval arithmetic enable to conclude.Figure 2 shows that the differential inclusion holds on theunicycle system using a randomly generated trajectory T . . . . . -0.6-0.4-0.20.00.20.40.6 Time (s) ˙ p x f ( x ) + G ( x ) u f ( x ) + G ( x ) u T Fig. 2. Time evolution of ˙ p x and its over-approximation ( f ( x )+ G ( x ) u ) forthe unicycle system of Example 2. We generate the trajectory correspondingto ˙ x ( t ) using a randomly generated piecewise-constant input u ( t ) ∈ U for t ≤ t = 1 . s , and u ( t ) = [1 , cos(6( t − t ))] for t ∈ [ t , . emark 1. The quality of the differential inclusion of Theo-rem 1 depends on how much information on f and G can beobtained from the given trajectory T N . Thus, if the trajectoryis not diverse , it is likely impossible to obtain tight differentialinclusions. For example, consider one of the corner caseswhere u i = 0 for every data point in T N . In this case, itis impossible to retrieve any information on G . Note thatwhen the goal is to control the unknown system, control valuescan be synthesized to diversify the trajectory and obtain tightdifferential inclusions that help the future control decisions.B. Interval Taylor-Based Method for Differential Inclusions We compute an over-approximation of the reachable set ofthe dynamics described by the differential inclusion (13). Suchover-approximation is naturally an over-approximation of thereachable set of the unknown dynamical system.
Theorem 2 (R EACHABLE SET OVER - APPROXIMATION ) . Given a trajectory T N , a set V ⊆ U of piecewise- C D u control signals for D u ≥ , the bounds L f and L G from sideinformation 1, the set R + i ∈ IR n of states at time t i , and a timestep size ∆ t , an over-approximation R + i +1 of the reachable setat t i + ∆ t of dynamics described by the differential inclusion ˙ x ∈ f ( x ) + G ( x ) u with u ∈ V is given by R + i +1 = R + i + (cid:16) f ( R + i ) + G ( R + i ) v ( t i ) (cid:17) ∆ t + (cid:16) J f + J G V i (cid:17)(cid:16) f ( S i ) + G ( S i ) V i (cid:17) ∆ t G ( S i ) V (1) i ∆ t , (14) where V i = v ([ t i , t i + ∆ t ]) , V (1) i = v (1) ([ t i , t i + ∆ t ]) , andthe matrices J f ∈ IR n × n and J G ∈ IR n × m × n , intervalextensions of the Jacobian of f and G , are such that ( J f ) k,p = L f k [ − , and ( J G ) k,l,p = L G k,l [ − , for all k, p ∈ N [1 ,n ] and l ∈ N [1 ,m ] . The set S i is a solution of R + i + [0 , ∆ t ] (cid:16) f ( S i ) + G ( S i ) V i (cid:17) ⊆ S i . (15) For k, p ∈ N [1 ,n ] , we define the ( k, p ) component of J G V i as ( J G V i ) k,p = m (cid:88) l =1 ( J G ) k,l,p ( V i ) l . (16)We provide a proof for Theorem 2 in Appendix A wherewe use a Taylor expansion (6) of order D = 2 . Corollary 1.
Under the notation of Theorem 2, assume that D u = 0 . Then, an over-approximation R + i +1 of the reachableset of the dynamics described by the differential inclusion ˙ x ∈ f ( x ) + G ( x ) u of Theorem 1 with u ∈ V is given by R + i +1 = R + i + (cid:0) f ( S i ) + G ( S i ) V i (cid:1) ∆ t. Theorem 2, as it is, does not incorporate more side in-formation. We now show how to incorporate the other sideinformation to obtain tighter over-approximations.
Algorithm 2
DaTaReach : Over-approximation of the reach-able set of unknown smooth systems.
Input:
Single trajectory T N , upper bounds on the Lipschitzconstants from side information 1, set V of control signals,time step size ∆ t , maximum number of time steps T > N . Optional : Any of the side information 2–5.
Output:
Over-approximations {R + i } Ti = N +1 of the reachablesets at times t N + ( i − N )∆ t with i ∈ N [ N +1 ,T ] . Define R + N +1 ← { x ( t N +1 ) } Compute f , and G from Theorem 1 for i ∈ { N + 1 , . . . , T − } do Compute v ( t i ) , V i , and V (1) i in Theorem 2 from V Compute S i via (15), V i , f , and G Compute R + i +1 via (14), R + i , S i , f , G , J f , and J G end for return {R + i } Ti = N +1 Side information 2 (V ECTOR FIELD BOUNDS ). Given a set
S ⊆ A , tighter interval extensions of f and G over S can beobtained by the update f ( S ) ← f ( S ) ∩ R f A and G ( S ) ← G ( S ) ∩ R G A . Side information 3 (G RADIENT BOUNDS ). These bounds canbe used to provide tighter interval extensions J f and J G ofthe Jacobians of f and G . For example, if the function ( f ) k is known to be non-decreasing with respect to the variable x p on a set A ⊆ X , then we obtain a tighter R + i +1 by the update ( J f ) k,p ← ( J f ) k,p ∩ R + if S i ⊆ A . Side information 4 (D ECOUPLING AMONG STATES ). Thisknowledge constrains the interval extensions for the Jacobianmatrices. For example, if the state ( x ( t )) p does not directlyaffect ( ˙ x ( t )) k for some p, k ∈ N [1 ,n ] under any controlsignal in U , we can obtain a tighter over-approximation ofthe reachable set by setting to zero the intervals ( J f ) k,p and ( J G ) k,l,p for all l ∈ N [1 ,m ] . For the unicycle system ofExample 2, since f and G depends only on the heading θ ,the Jacobian terms ( J f ) , , ( J f ) , , ( J f ) , , ( J f ) , , ( J f ) , , ( J f ) , , ( J G ) , , , ( J G ) , , , ( J G ) , , , ( J G ) , , , ( J G ) , , ,and ( J G ) , , must all be set to zero . Side information 5 (P ARTIAL DYNAMICS KNOWLEDGE ). Thenew functions f and G are given by f = f kn + f ukn and G = G kn + G ukn , where the functions f kn and f ukn are intervalextensions of known functions f kn and G kn , respectively, andthe functions f ukn and G ukn are obtained by Theorem 1 using L f ukn , L G ukn , and the new trajectory T (cid:48) N = { ( x i , ˙ x i − ( f kn ( x i )+ G kn ( x i ) u i ) , u i ) | ( x i , ˙ x i , u i ) ∈ T N } . Furthermore, the new Jacobian terms in the computationof R + i +1 are given by J f = ∂f kn ∂x ( S i ) + J f ukn and ( J G ) k,l,p = ∂ ( G kn ) k,l ∂x p ( S i ) + ( J G ukn ) k,l,p , respectively.We demonstrate the value of additional side informationon the unicycle system of Example 2. Specifically, we use DaTaReach to compute over-approximations of the reachablesets of the unicycle under different side information. Figure 3hows that, as expected, the over-approximation becomestighter under additional side information.-2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 − − − p x p y DaTaReach (a)
DaTaReach (b) T DaTaReach (c) Reachable set
Fig. 3. Over-approximation of the reachable set of the unicycle system inthe x - y plane. The trajectory T comes from a trajectory of the systemobtained by a randomly generated piecewise-constant control signal u ( t ) ∈U for t ≤ t = 1 . s . The parameters for DaTaReach were given by ∆ t = 0 . , T = 200 , and V = { t (cid:55)→ [1 + a , cos(6( t − t )) + a ] | a ∈ [ − . , . , a ∈ [ − . , . } . The case (a) ignores the side information f ( x ) = f ( θ ) , case (b) is exactly the setting of Example 2, and case (c) assumes the extra knowledge that f = 0 and ( G ) , = 1 . IV. C
ONTROL S YNTHESIS FOR U NKNOWN D YNAMICAL S YSTEMS
We propose
DaTaControl to address Problem 2. It com-putes approximate solutions to the one-step optimal controlproblem (9) using the over-approximation of the reachableset. Specifically, it replaces the unknown next state x i +1 inproblem (9) with the corresponding over-approximation R + i +1 ,which may be computed using Theorem 2. Unfortunately, R + i +1 is a nonconvex function of the control signal u , the de-cision variable. Therefore, it constructs an over-approximationof R + i +1 , in Theorem 3, that is convex in the given constantcontrol signal u applied in the time interval [ t i , t i +1 ] . Thenew over-approximation provides tractable approximations ofthe one-step optimal control problem (9). Theorem 3.
Under the notation of Theorem 2 and given aconstant control signal u : t (cid:55)→ u i applied between t i and t i +1 = t i + ∆ t with u i ∈ U , the reachable set R i +1 satisfies R i +1 ⊆ (cid:0) B i + A + i u i (cid:1) ∩ (cid:0) B i + A − i u i (cid:1) , (17) where the intervals A − i , A + i , and B i are given by A − i = G ( R + i )∆ t + (cid:0) J f G ( S i ) + J T G ( f ( S i ) + G ( S i ) U ) (cid:1) ∆ t , A + i = G ( R + i )∆ t + (cid:0) ( J f + J G U ) G ( S i ) + J T G f ( S i ) (cid:1) ∆ t , B i = R + i + f ( R + i )∆ t + J f f ( S i ) ∆ t , (18) where ( J T G ) k,p,l = ( J G ) k,l,p for k, p ∈ N [1 ,n ] and l ∈ N [1 ,m ] ,and the a priori rough enclosure S i is a solution of R + i + [0 , ∆ t ] (cid:16) f ( S i ) + G ( S i ) U (cid:17) ⊆ S i . (19) Algorithm 3
DaTaControl : Approximate one-step optimalcontrol solution at sampling time t i > t N Input:
Single trajectory T i − of length i > N , time step size ∆ t , upper bounds on the Lipschitz constants from sideinformation 1, one-step cost c , current state x i = x ( t i ) ,constraint set U . Optional : Any of the side information 2–5.
Output:
Constant control ˆ u i ∈ U to apply between t i and t i + ∆ t that approximates a solution of (9). Define R + i ← { x i } Compute f , and G from Theorem 1 Compute S i via (19), R + i , U , f , and G Compute B i , A + i , and A − i via (18), R + i , S i , U , f , G , J f ,and J G Compute ˆ u i as the solution of either (20) or (21) return ˆ u i A proof of Theorem 3 is provided in Appendix A wherewe linearize, using the constrained set U , the quadratic termin u of the closed-form expression (14).We use the control-affine linearization of Theorem 3 topropose two convex optimization problems that approximatethe one-step optimal control problem (9). The first problem,called optimistic control problem , is given by minimize u i ∈U inf x i +1 ∈X ,x i +1 ∈ ( B i + A + i u i ) ∩ ( B i + A − i u i ) c ( x i , u i , x i +1 ) , (20)where the goal is to minimize the best possible cost valueover all possible state x i +1 in the over-approximation (17).The second problem, called idealistic control problem , is anidealistic approximation of (9) given by minimize u i ∈U c ( x i , u i , b ide i + A ide i u i ) , (21)where the goal is to minimize the cost associated to a specifictrajectory x i +1 = b ide i + A ide i u i in the over-approximation (8),idealistically considered as the unknown next state evolution.Finally, we provide, in Theorem 4, bounds on the subopti-mality of the approximate problems when c is quadratic. Assumption 2 (Q UADRATIC ONE - STEP COST ) . We considerthat the one-step cost function c is a convex quadratic function, c ( x, u, y ) = (cid:20) yu (cid:21) T (cid:20) Q SS T R (cid:21) (cid:20) yu (cid:21) + (cid:20) qr (cid:21) T (cid:20) yu (cid:21) , (22) where q ∈ R n , r ∈ R m , Q = Q T ∈ R n × n , R = R T ∈ R m × m ,and S ∈ R n × m . Theorem 4 (S UBOPTIMALITY BOUND ) . Let c (cid:63)i , c opt i , and c ide i be the optimal cost of the one-step optimal control problem (9) ,the optimistic control problem (20) , and the idealistic controlproblem (21) , respectively, at the sampling time t i . We have | c (cid:63)i − c i | ≤ max (cid:0) (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) , (cid:107) w ( B i ) + w ( A − i ) |U|(cid:107) K ( A − i ) (cid:1) , (23)
50 10010 − − − Time step C o m pu t e ti m e ( s ) − − p x p y SINDYc CGP-LCB C2Opt
Opt . traj . DaTaControl T Target set Initial state
Time step C o s t f un c ti on Fig. 4.
DaTaControl successfully drove the unicycle to the origin.
Opt . traj . corresponds to the one-step optimal control with the known dynamics. where c i is either c opt i or c ide i , w ( A ) = A − A is the width ofan interval A , and K ( A ) , for any A ∈ IR n × m , is given by K ( A ) = min (cid:0) (cid:107) | S U| + q + 2 | Q ( B i + AU ) |(cid:107) , (cid:107) | S U| + q + 2 | Q X |(cid:107) (cid:1) . We provide a proof for Theorem 4 in Appendix A wherewe use the local Lipschitz property of the cost function c tocharacterize its variations over the over-approximation set (17). Remark 2.
The main term of the suboptimality bound inTheorem 4 is directly related to the width of the over-approximation of the reachable set at time t i +∆ t . Thus, as theover-approximation of the reachable set becomes tighter, thegap with the unknown optimal cost decreases. Specifically, it isstraightforward to see that the more data and side informationare used in the computation of B i , A + i , and A − i , the tighterthe widths w ( B i ) , w ( A + i ) , and w ( A − i ) become. V. C
OMPARISON WITH E XISTING A PPROACHES
We compare
DaTaControl with existing data-drivencontrol algorithms
CGP-LCB [12], [19],
C2Opt [11], and
SINDYc [8]. These algorithms also solve the one-step optimalcontrol problem (9) approximately.
CGP-LCB assumes thatthe unknown one-step approximate cost function C ( x i , u i ) = c ( x i , u i , x ( t i +1 ; x i , u i )) to minimize is described by a Gaus-sian process, while C2Opt assumes that C has Lipschitzcontinuous gradients. On the other hand, SINDYc uses thelimited data to perform sparse identification of the dynamics,which then permits an approximate solution to (9) via numer-ical solvers. The experiments on the unicycle system and aquadrotor demonstrate that
DaTaControl outperforms thesealgorithms both on the computation time at each sampling timeand the suboptimality of the control.
A. Unicycle System
We consider the problem of driving the unicycle of Ex-ample 2 to the origin. We encode this control objectiveusing the one-step cost function c ( x i , u i , x i +1 ) = 0 . (cid:107) x i +1 (cid:107) .We chose the time step size ∆ t = 0 . s and generated arandom initial trajectory T starting from the state x =[ − , − . , π/ and such that the unicycle goes away from the target. We applied DaTaControl with the side informationdiscussed in Example 2. We used the default parameters in
GPyOpt [19] when implementing
CGP-LCB . We chose theLipschitz constant for the gradient L = 10 and trade-offhyperparameter α = 1 / for C2Opt [11]. For
SINDYc , weconsidered monomials (up to degree ), sines and cosines ofthe state, and the products of these functions with the velocity v and the turning rate ω as the library functions. To performsparse identification, we swept the regularization parameter [8] λ ∈ { p : p ∈ N [ − , } and rounded-down the coefficientssmaller than − to zero. We relaxed the target state to the . -sublevel set of the one-step cost function as the stoppingcriteria for the algorithms.Figure 4 shows the trajectories and the evolution of thecost function for the different algorithms. DaTaControl performs significantly better than
GPyOpt , SINDYc , and
C2Opt . It reaches the origin in fewer time steps and signif-icantly lower computation time. Figure 4 empirically showsthat
DaTaControl achieves near-real-time control.
B. Quadrotor System
Consider a quadrotor with control-affine dynamics [20] ˙ p x = v x , ˙ v x = − m C vD v x − T m sinφ − T m sinφ, ˙ p y = v y , ˙ v y = − m ( mg + C vD v y ) + T m cosφ + T m cosφ, ˙ φ = ω, ˙ ω = − I yy C φD ω − l I yy T + l I yy T , where the components of the state x = [ p x , v x , p y , v y , φ, ω ] represent, respectively, the horizontal position, horizontal ve-locity, vertical position, vertical velocity, pitch angle, and pitchrate, and the components of the control u = [ T , T ] representthe thrust exerted on either end of the quadrotor. We chose theconstraint set U = [0 , . × [0 , . . The constants of thedynamics are given by C vD = 0 . , C φD = 0 . , g = 9 . , m = 1 . , l = 0 . , and I yy = 0 . .We assume that the dynamics of the quadrotor are un-known and consider the problems of controlling v x to asetpoint v sp x = 5 and p y to a setpoint p sp y = 8 . We encodethese control objectives, respectively, using the cost functions
50 100 15010 − − − Time step C o m pu t e ti m e ( s ) . . − Time (s) v x T Setpoint
SINDYc CGP-LCB
Opt . traj . DaTaControl
Time (s) p y Fig. 5.
CGP-LCB and
SINDYc fail in the task of controlling the horizontal speed v x of the quadrotor whereas DaTaControl reaches both setpoints. c ( x i , u i , x i +1 ) = 0 . x i +1 ) − v sp x ) and c ( x i , u i , x i +1 ) =0 . x i +1 ) − p sp y ) . We chose the time step size ∆ t = 0 . s and generated a random initial trajectory T starting from thestate x = [0 , , , , , . We applied DaTaControl withthe side information ˙ p x = v x , ˙ p y = v y , and ˙ φ = ω . Suchextra knowledge is obtained from elementary law of physics.Furthermore, DaTaControl considered the loose Lipschitzbounds L f = L f = 0 . , L f = 0 . , L G , = L G , = 0 . , L G , = L G , = L G , = L G , = L G , = L G , = 0 . ,and uses the side information G ( x ) = G ( φ ) and f ( x ) = f ( v x , v y , w ) . We used the default parameters of GPyOpt when implementing
CGP-LCB . For
SINDYc , we consideredmonomials (up to degree ), sines and cosines of the state,and the products of these functions with the T and T as thelibrary functions. We do not make a comparison with C2Opt due to the inability to compute the gradient of C .Figure 5 shows the near-optimality of DaTaControl while
CGP-LCB and
SINDYc fail to reach the setpoints.Furthermore, the figure demonstrates that
DaTaControl canachieve near-optimal, near-real-time control of the verticalposition and horizontal speed. We justify the suboptimality of
Opt . traj . by the fact that the one-step optimal control prob-lem is highly nonlinear, which makes possible the synthesisof local optimum solutions by the numerical solvers.VI. C ONCLUSION
In this work, we proposed two data-driven algorithms,
DaTaReach and
DaTaControl , for on-the-fly over-approximation of the reachable set and constrained near-optimal control of systems with unknown dynamics. Thesealgorithms are suitable for scenarios with severely limiteddata and can take advantage of various side information.The numerical experiments demonstrate the efficacy of thealgorithms over existing approaches while suggesting that
DaTaControl is also near-real-time.R
EFERENCES[1] I. Mitchell, A. Bayen, and C. Tomlin, “A time-dependent hamilton-jacobiformulation of reachable sets for continuous dynamic games,”
IEEETrans. Autom. Control , vol. 50, no. 7, pp. 947–957, 2005. [2] M. Berz and K. Makino, “Verified integration of odes and flows usingdifferential algebraic methods on high-order taylor models,”
Reliablecomputing , vol. 4, no. 4, pp. 361–369, 1998.[3] N. S. Nedialkov, K. R. Jackson, and G. F. Corliss, “Validated solutionsof initial value problems for ordinary differential equations,”
AppliedMathematics and Computation , vol. 105, no. 1, pp. 21–68, 1999.[4] E. Goubault and S. Putot, “Inner and outer reachability for the verifica-tion of control systems,” in
Proc. Hybrid Syst.: Comput. & Ctrl. , 2019,pp. 11–22.[5] M. Ahmadi, A. Israel, and U. Topcu, “Safe controller synthesis for data-driven differential inclusions,”
IEEE Trans. Autom. Control , 2020.[6] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, “Con-strained model predictive control: Stability and optimality,”
Automatica ,vol. 36, no. 6, pp. 789–814, 2000.[7] M. Korda and I. Mezi´c, “Linear predictors for nonlinear dynamical sys-tems: Koopman operator meets model predictive control,”
Automatica ,vol. 93, pp. 149–160, 2018.[8] E. Kaiser, J. N. Kutz, and S. L. Brunton, “Sparse identification ofnonlinear dynamics for model predictive control in the low-data limit,”
Proc. of the Royal Society A , vol. 474, no. 2219, p. 20180335, 2018.[9] J. L. Proctor, S. L. Brunton, and J. N. Kutz, “Dynamic mode decom-position with control,”
SIAM Journal on Applied Dynamical Systems ,vol. 15, no. 1, pp. 142–161, 2016.[10] M. Ornik, S. P. Carr, A. Israel, and U. Topcu, “Control-oriented learningon the fly,”
IEEE Trans. Autom. Control , 2019.[11] A. P. Vinod, A. Israel, and U. Topcu, “Convexified contextual optimiza-tion for on-the-fly control of smooth systems,” in , 2020, pp. 2004–2011.[12] A. Krause and C. S. Ong, “Contextual gaussian process bandit optimiza-tion,” in
Advances in NIPS. , 2011, pp. 2447–2455.[13] A. Devonport and M. Arcak, “Data-driven reachable set computationusing adaptive gaussian process classification and monte carlo methods,”in . IEEE, 2020, pp. 2629–2634.[14] S. Haesaert, P. M. Van den Hof, and A. Abate, “Data-driven and model-based verification via bayesian identification and reachability analysis,”
Automatica , vol. 79, pp. 115–126, 2017.[15] A. Chakrabarty, A. Raghunathan, S. Di Cairano, and C. Danielson,“Data-driven estimation of backward reachable and invariant sets forunmodeled systems via active learning,” in . IEEE, 2018, pp. 372–377.[16] R. E. Moore,
Interval analysis . Prentice-Hall Englewood Cliffs, 1966,vol. 4.[17] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard, “Gaussianprocess model based predictive control,” in
Proceedings of the 2004American control conference , vol. 3. IEEE, 2004, pp. 2214–2219.[18] B.-G. Park, J.-W. Lee, and W. H. Kwon, “Robust one-step recedinghorizon control for constrained systems,”
International Journal of Robustand Nonlinear Control: IFAC-Affiliated Journal , vol. 9, no. 7, pp. 381–395, 1999.[19] The GPyOpt authors, “GPyOpt: A Bayesian optimization framework inPython,” http://github.com/SheffieldML/GPyOpt, 2016.[20] M. Chen, S. L. Herbert, M. S. Vashishtha, S. Bansal, and C. J. Tomlin,“Decomposition of reachable sets and tubes for a class of nonlinearystems,”
IEEE Transactions on Automatic Control , vol. 63, no. 11, pp.3675–3688, Nov 2018.[21] F. Benhamou, F. Goualard, L. Granvilliers, and J.-F. Puget, “Revisinghull and box consistency,” in
Proceedings of the 1999 InternationalConference on Logic Programming . USA: Massachusetts Institute ofTechnology, 1999, p. 230–244.[22] H. Federer,
Geometric measure theory , ser. Grundlehren Math. Wiss.Berlin: Springer, 1969. A PPENDIX
AIn this appendix, we provide the proofs for all lemmas andtheorems presented in this paper.P
ROOF OF L EMMA f ( x i ) ∈ F i and G ( x i ) ∈ G i , wewant to obtain tighter intervals C F i ⊆ F i and C G i ⊆ G i thatprune out some values f ( x i ) and G ( x i ) from F i and G i thatdo not satisfy the constraint ˙ x i = f ( x i ) + G ( x i ) u i . We firsthave that f ( x i ) = ˙ x i − G ( x i ) u i ∈ ( ˙ x i − G i u i ) ∩ F i = C F i . Therefore, a similar reasoning using the tighter interval C F i provides that G ( x i ) u i ∈ ( ˙ x i − C F i ) ∩ ( G i u i ) = s . It isimportant to note that plugging back s instead of G i u i inthe expression of C F i will not yield an interval tighter than C F i . Therefore, C F i and s are optimal. Next, we focus onthe term G ( x i ) u i ∈ s . For all k ∈ N [1 ,n ] , we have that ( G ( x i )) k, ( u i ) = ( G ( x i ) u i ) k − (cid:88) l> ( G ( x i )) k,l ( u i ) l ∈ (cid:0) ( s ) k − (cid:88) l> ( G i ) k,l u l (cid:1) ∩ (cid:0) ( G i ) k, u (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) ( C G i ) k, u , and we can deduce (cid:88) l> ( G ( x i )) k,l ( u i ) l ∈ (cid:0) ( s ) k − ( C G i ) k, u (cid:1) ∩ (cid:0) (cid:88) l> ( G i ) k,l u l (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) ( s ) k . Using the same argument as for the optimality of s and C F i ,we can say that s and ( C G i ) k, u are optimal. Finally, we ap-ply the previous step in a sequential manner for p = 2 , . . . , m to the equality ( G ( x i )) k,p ( u i ) p = (cid:80) l>p − ( G ( x i )) k,l ( u i ) l − (cid:80) l>p ( G ( x i )) k,l ( u i ) l in order to obtain optimal intervals s p and ( C G i ) k,p u p . The procedure described here is similar to the HC4-Revise algorithm [21] where optimality was provenwhen each variable, in our case f ( x i ) and G ( x i ) , appearsonly once in the definition of the constraint, in our case ˙ x i = f ( x i ) + G ( x i ) u i . (cid:3) P ROOF OF L EMMA f k and G k,l . Specifically, from the upper bound L f k on theLipschitz constant of f k , we have that | ( f ( x )) k − ( f ( y )) k | ≤ L f k (cid:107) x − y (cid:107) , ∀ x, y ∈ X . Hence, given a data ( x i , C F i , C G i ) ∈ E N and x ∈ X , wecan write that ( f ( x )) k ∈ ( f ( x i )) k + L f k (cid:107) x − x i (cid:107) [ − , ,and therefore ( f ( x )) k ∈ ( C F i ) k + L f k (cid:107) x − x i (cid:107) [ − , since f ( x i ) ∈ C F i . The previous belonging relation is valid forevery data ( x i , C F i , C G i ) ∈ E N and as a result ( f ( x )) k ∈ ( f ( x )) k . The same reasoning applied to L G k,l enables to showthat ( G ( x )) k,l ∈ ( G ( x )) k,l . (cid:3) P ROOF OF T HEOREM ( x i , C F i , C G i ) ∈ E N given by Algorithm 1, we have f ( x i ) ∈ C F i and G ( x i ) ∈ C G i .Specifically, as a consequence of line 6 of Algorithm 1 andLemma 2, we have that f ( x i ) ∈ F i and G ( x i ) ∈ G i . Hence, byline 7 and Lemma 1, we immediately have that f ( x i ) ∈ C F i and G ( x i ) ∈ C G i . As a consequence of lines 5–8, E N canbe used in Lemma 2 to conclude that f ( x ) ∈ f ( x ) and G ( x ) ∈ G ( x ) for all x ∈ X . Therefore, using intervalarithmetic, we have ˙ x = f ( x ) + G ( x ) u ∈ f ( x ) + G ( x ) u .The lines 9–11 enable to obtain tighter sets C F i and C G i by combining the Lipschitz continuity, the trajectory, andthe values C F i and C G i obtained during previous iterations.Hence, E N is optimal given the the trajectory and the Lipschitzcontinuity since we have the optimality of C F i and C G i byLemma 1. (cid:3) P ROOF OF T HEOREM ˙ x ∈ f ( x )+ G ( x ) u .Clearly, h ( Z , W ) = f ( Z ) + G ( Z ) W is an interval extensionof h ( x, u ) = f ( x ) + G ( x ) u . Hence, by simple inclusionbetween h and h , the set S i solution of (15) is an a priorirough enclosure as it satisfies the fixed-point equation (7). Weuse a Taylor expansion (6) of order D = 2 to obtain R + i +1 ⊆ R + i + ∆ t (cid:16) h ( R + i , v ( t i )) (cid:17) + ∆ t (cid:16) ∂h∂x h (cid:17) ( S i , V i )+ ∆ t (cid:16) ∂h∂u (cid:17) ( S i , V i ) V (1) i , (24)where ∂h∂x and ∂h∂u are interval extensions of ∂h∂x and ∂h∂u ,respectively. We have that G is an interval extension of ∂h∂u as ∂h∂u ( x, u ) = G ( x ) ∈ G ( x ) = ⇒ ∂h∂u = G . (25)Furthermore, for all k, p ∈ N [1 ,n ] , x ∈ X and u ∈ U , we have ∂h k ∂x p ( x, u ) = ∂f k ∂x p ( x, u ) + m (cid:88) l =1 ∂G k,l ∂x p ( x, u ) u l . Thus, a consequence of
Rademacher’s theorem [22, Theorem . . ] enables to write that ∂f k ∂x p ∈ L f k [ − , and ∂G k,l ∂x p ∈ L G k,l [ − , . Therefore, we have that ∂h∂x ( S i , V i ) = J f + J G V i . Finally,merging ∂h∂x and ∂h∂u into (24) provides the over-approximatingset (14). (cid:3) ROOF OF T HEOREM V = { u } . Specifically, since u is such that ˙ u ( t ) = 0 for all t ∈ [ t i , t i + ∆ t ] , we have that v ( t i ) = u i , v ([ t i , t i + ∆ t ]) = u i , and v (1) ([ t i , t i + ∆ t ]) = 0 .Applying Theorem 2, R + i +1 as a function of u i is thereforereduced to a term independent of u i , a term linear in u i , anda term quadratic in u i .Specifically, it is immediate from (14) that the term inde-pendent of u i is given by B i (18). The term linear in u i is (cid:0) G ( R + i )∆ t (cid:1) u i + (cid:0) J G u i f ( S j ) + J f G ( S j ) u i (cid:1) ∆ t . (26)The k -th component of J G u i f ( S i ) is given by n (cid:88) p =1 ( J G u i ) k,p ( f ( S i )) p = n (cid:88) p =1 (cid:0) m (cid:88) l =1 ( J G ) k,l,p ( u i ) l (cid:1) ( f ( S i )) p = m (cid:88) l =1 (cid:0) n (cid:88) p =1 ( J G ) k,l,p ( f ( S i )) p (cid:124) (cid:123)(cid:122) (cid:125) ( J T G f ( S i )) k,l (cid:1) ( u i ) l = ( J T G f ( S i ) u i ) k . (27)The quadratic term . t J G u i G ( S i ) u i in u i satisfies J G u i G ( S i ) u i ⊆ J G U G ( S i ) u i , (28) J G u i G ( S i ) u i ⊆ J G u i G ( S i ) U = J T G G ( S i ) U u i . (29)Henceforth, by combining (26), (27), and (28), the linearand quadratic terms in u i of R + i +1 can be over-approximatedby the linear term A + i u i with A + i given by (18). Similarly,using (26), (27), and (29) helps to prove that the linear andquadratic terms in u i can be also over-approximated by A − i u i with A − i given by (18). Hence, we deduce the intersectionin (17). (cid:3) P ROOF OF T HEOREM x + i +1 = b i + A + i u i ∈ B i + A + i u i and ˆ x + i +1 =ˆ b i + ˆ A + i u i ∈ B i + A + i u i with u i ∈ U , we have that | c ( · , u i , x + i +1 ) − c ( · , u i , ˆ x + i +1 ) | = (cid:12)(cid:12) (cid:20) x + i +1 u i (cid:21) T (cid:20) Q SS T R (cid:21) (cid:20) x + i +1 u i (cid:21) − (cid:20) ˆ x + i +1 u i (cid:21) T (cid:20) Q SS T R (cid:21) (cid:20) ˆ x + i +1 u i (cid:21) + (cid:20) qr (cid:21) T (cid:20) x + i +1 − ˆ x + i +1 (cid:21) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:0) x + i +1 − ˆ x + i +1 (cid:1) T (cid:0) Q (cid:0) x + i +1 + ˆ x + i +1 (cid:1) + 2 Su i + q (cid:1)(cid:12)(cid:12) ≤ (cid:0) (cid:107) ( b i − ˆ b i ) + ( A + i − ˆ A + i ) u i (cid:107) (cid:1)(cid:0) (cid:107) Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) + 2 Su i + q (cid:107) (cid:1) . (30)By definition of the width w , | b i − ˆ b i | ≤ w ( B i ) and | A + i − ˆ A + i | ≤ w ( A + i ) . Hence, we can deduce that (cid:107) ( b i − ˆ b i ) + ( A + i − ˆ A + i ) u i (cid:107) ≤ (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) . (31)Furthermore, interval arithmetic provides that ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i ∈ (cid:0) B i + 2 A + i u i (cid:1) ∩ X . Hence, we have (cid:107) Su i + q + Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) (cid:107) ≤ (cid:107) | S U| + q + 2 | Q ( B i + A + i U ) |(cid:107) , (32) (cid:107) Su i + q + Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) (cid:107) ≤ (cid:107) | S U| + q + 2 | Q X |(cid:107) . (33)Therefore, combining (33) and (32), we have that (cid:107) Su i + q + Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) (cid:107) ≤ K ( A + i ) . (34)As a consequence of (30), (31), and (34), we can write | c ( · , u i , x + i +1 ) − c ( · , u i , ˆ x + i +1 ) |≤(cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) . (35)Let u ∗ i be the optimal solution of the one-step optimalcontrol problem (9) and x i +1 ( u ∗ i ) ∈ B i + A + i u ∗ i be thecorresponding unknown next state. By optimality of u ∗ i , wehave that c ( x i , u ∗ i , x i +1 ( u ∗ i )) ≤ c ( x i , u i , x i +1 ( u i )) for all u i ∈ U . If ˆ u i is the optimal solution of the optimistic controlproblem (20) and ˆ x i +1 (ˆ u i ) ∈ B i + A + i u ∗ i is the known nextstate for which the optimum of (20) is attained, then | c (cid:63)i − c opt i | = | c ( x i , u ∗ i , x i +1 ( u ∗ i )) − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) |≤ | c ( x i , ˆ u i , x i +1 (ˆ u i )) − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) |≤ (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) . Similarly, using A − i instead of A + i , we can prove that | c (cid:63)i − c opt i | ≤ (cid:107) w ( B i ) + w ( A − i ) |U|(cid:107) K ( A − i ) . Hence we obtain thesuboptimality bounds (23) for the optimistic control problem.Finally, let ˆ u i be the optimal solution of the idealistic controlproblem (21) and ˆ x i +1 (ˆ u i ) = b ide i + A ide i ˆ u i be the next statefor which the optimum is attained. We have that | c ide i − c (cid:63)i | = (cid:40) c ide i − c ( x i , u (cid:63)i , x i +1 ( u ∗ i )) , if c ide i ≥ c (cid:63)i c (cid:63)i − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) , otherwise ≤ c ( x i , u (cid:63)i , ˆ x i +1 ( u (cid:63)i )) − c ( x i , u (cid:63)i , x i +1 ( u ∗ i )) , if c ide i ≥ c (cid:63)i c ( x i , ˆ u i , x i +1 (ˆ u i )) − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) , otherwise ≤ max (cid:0) (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) , (cid:107) w ( B i ) + w ( A − i ) |U|(cid:107) K ( A − i ) (cid:1) , where the second inequality follows from the definition ofoptimality of (9) and (21), and the last inequality followssimilarly to the inequalities obtained for the optimistic sub-optimality bounds.where the second inequality follows from the definition ofoptimality of (9) and (21), and the last inequality followssimilarly to the inequalities obtained for the optimistic sub-optimality bounds.