[PDF] On-The-Fly Control of Unknown Smooth Systems from Limited Data

Abstract

We investigate the problem of data-driven, on-the-fly control of systems with unknown nonlinear dynamics where data from only a single finite-horizon trajectory and possibly side information on the dynamics are available. Such side information may include knowledge of the regularity of the dynamics, monotonicity of the states, or decoupling in the dynamics between the states. Specifically, we develop two algorithms, DaTaReach and DaTaControl , to over-approximate the reachable set and design control signals for the system on the fly. DaTaReach constructs a differential inclusion that contains the unknown vector field. Then, it computes an over-approximation of the reachable set based on interval Taylor-based methods applied to systems with dynamics described as differential inclusions. DaTaControl enables convex-optimization-based, near-optimal control using the computed over-approximation and the receding-horizon control framework. We provide a bound on its suboptimality and show that more data and side information enable DaTaControl to achieve tighter suboptimality bounds. Finally, we demonstrate the efficacy of DaTaControl over existing approaches on the problems of controlling a unicycle and quadrotor systems.

Full PDF

OOn-The-Fly Control of Unknown Smooth Systems from Limited Data

Franck Djeumou, Abraham P. Vinod, Eric Goubault, Sylvie Putot, and Ufuk Topcu

Abstract —We investigate the problem of data-driven, on-the-ﬂy control of systems with unknown nonlinear dynamics wheredata from only a single ﬁnite-horizon trajectory and possibly sideinformation on the dynamics are available. Such side informationmay include knowledge of the regularity of the underlying dy-namics, monotonicity, or decoupling in the dynamics between thestates. Speciﬁcally, we propose two algorithms,

DaTaReach and

DaTaControl , to over-approximate the reachable set and designcontrol signals for the system on the ﬂy.

DaTaReach constructsa differential inclusion that contains the unknown dynamics.Then, it computes an over-approximation of the reachable setbased on interval Taylor-based methods applied to systems withdynamics described as differential inclusions.

DaTaControl en-ables convex-optimization-based, near-optimal control using thecomputed over-approximation and the receding-horizon controlframework. We provide a bound on its suboptimality and showthat more data and side information enable

DaTaControl toachieve tighter suboptimality bounds. Finally, we demonstratethe efﬁcacy of

DaTaControl over existing approaches on theproblems of controlling a unicycle and quadrotor systems.

I. I

NTRODUCTION

Consider a scenario in which abrupt changes in the dy-namics of a system occurs. The changes in the dynamics aresuch that the a priori known model cannot be used, and thereis a need to learn the new dynamics on the ﬂy. In such ascenario, the system needs to retain a certain degree of controlwith data from only its current trajectory. This paper considersthe problem of data-driven, on-the-ﬂy control of systems withunknown dynamics under severely limited data.We propose data-driven algorithms,

DaTaReach and

DaTaControl , to over-approximate the reachable set andcontrol systems with unknown dynamics. Reachability analy-sis provides a framework to analyze the set of states reachedby a dynamical system. We use such a framework as thebaseline for the on-the-ﬂy control, as it enables robust controlto perturbations on the data. When the underlying dynamicsare known, Hamilton-Jacobi-based [1] or interval Taylor-based [2]–[4] methods can approximate the reachable set.However, limited work has been reported for the case wherethe underlying dynamics are not known a priori, and evenfewer work considers settings with severely limited data.

DaTaReach and

DaTaControl can work with data froma single ﬁnite-horizon trajectory of the system and take advan-

This material is based on work partly supported by Air Force Ofﬁce ofScientiﬁc Research (FA9550-19-1-0005) and National Aeronautics and SpaceAdministration (80NSSC19K0209).F. Djeumou, A. Vinod, and U. Topcu are with the Department of Electricaland Computer Engineering, Oden Institute for Computational Engineeringand Sciences, and the Department of Aerospace Engineering and EngineeringMechanics at the University of Texas at Austin, Austin, TX, USA. Email: { fdjeumou, utopcu } @utexas.edu , [email protected] .E. Goubault and S. Putot are with the LIX, CNRS, Ecole Poly-technique, Institut Polytechnique de Paris, France. Email: { goubault,putot } @lix.polytechnique.fr . GoalOver-approximationsfor various controlsfor all t ∈ [ t N , t N +1 ] State at t i > t N Sampled trajectory for t ≤ t N Over-approximations fornear-optimal controlsNear-optimal controlsynthesis for t > t N Optimal trajectory

Fig. 1. We use limited data and side information for on-the-ﬂy control ofsystems with unknown dynamics. At each sampling time, we compute anover-approximation of the set of states the system may reach to describethe uncertainty in its unknown trajectory and construct one-step optimalcontrollers via convex optimization. tage of various forms of side information on the dynamics.The algorithms consider trajectories containing ﬁnite samplesof the states, the derivatives of the states, and the controlsignals applied. Furthermore, if available, they can make useof side information such as knowledge of the regularity ofthe dynamics, bounds on the vector ﬁeld, monotonicity ofthe vector ﬁeld, decoupling in the dynamics among the statesof the system, or knowledge of parts of the dynamics. Suchside information can be extracted from data, responses of thesystem to inputs, or known elementary law of physics.

DaTaReach provides closed-form expressions for over-approximations of the reachable set of unknown dynamicalsystems. It ﬁrst utilizes the available data and the given sideinformation to construct differential inclusions that containthe vector ﬁeld of the unknown dynamics. Unlike existingwork [5] that provides state-independent differential inclu-sions, the constructed differential inclusions account for de-pendencies on the states and control signals. Then, it pro-vides closed-form expressions for over-approximations of thereachable set associated with these differential inclusions. Toobtain such over-approximations, it utilizes interval Taylor-based methods [2]–[4] for known dynamics to compute over-approximations of the reachable set of dynamics describedby these differential inclusions. The closed-form expressionscan incorporate the side information, and more data and sideinformation provide tighter over-approximations.The closed-form expressions enable convex-optimization-based, near-optimal control of unknown dynamical systemsbased on the one-step receding-horizon control framework [6].Speciﬁcally, we seek to sequentially minimize a given one-step cost function in a discrete-time setting. The one-step costfunction, which encodes the desired behavior of the system,has to be optimized in a black-box manner since it is typicallya function of the unknown next state , the known current state,and the current input.

DaTaControl computes approximate a r X i v : . [ ee ss . S Y ] S e p olutions of such an optimization problem via approximateconvex optimization problems. We obtain such approximateproblems by replacing the unknown next state with a control-afﬁne linearization of the corresponding over-approximationof the reachable set. Furthermore, we provide a bound on thesuboptimality of the approximate problems and show that sucha bound becomes tighter with more data and side information. Related work.

For the problem of data-driven control, re-searchers have proposed several approaches that combinesystem identiﬁcation with model predictive control [7]–[11].In [7], the authors use Koopman theory to lift the unknownnonlinear dynamics to a higher-dimensional space where theyperform linear system identiﬁcation.

SINDYc [8] utilizes asparse regression over a library of nonlinear functions fornonlinear system identiﬁcation.

DMDc [9] uses the spectralproperties of the collected data to obtain approximate linearmodels. Thus,

DMDc can perform poorly due to the restrictionto linear dynamics. Myopic control [10] uses a ﬁnite sequenceof perturbations to learn a local linear model, which is thenused to optimize a model-driven goodness function that en-codes desirable behaviors.

DMDc , SINDYc , and approachesbased on the Koopman theory require signiﬁcantly more datathan the proposed approach. Myopic control cannot incorpo-rate any of the side information mentioned above. Similarly,in their current forms, we believe that

DMDc , SINDYc , andapproaches based on the Koopman theory cannot incorporatethe side information. Furthermore, the experiments show that

DaTaControl performs signiﬁcantly better than

SINDYc .Contextual optimization-based approaches tackle the data-driven control problem via surrogate optimization and skipthe system identiﬁcation step [11], [12]. These approachesminimize the one-step cost function in a black-box mannerusing the data.

C2Opt [11] exploits the structure in the givenproblem and utilizes side information (e.g., smoothness) andconvex optimization to solve this problem. It overcomes thedrawbacks of the Gaussian process-based approaches [12],namely high computational costs, expensive hyperparametertuning, and inability to incorporate side information. Unlike

DaTaControl , C2Opt considers limited forms of side in-formation and relies on the knowledge of the gradient of theone-step cost, that may not be accessible. We demonstrate ona unicycle and quadrotor systems that

DaTaControl is threeorders of magnitude computationally faster and less subopti-mal than C2Opt and the Gaussian process-based algorithms.Recent work [13]–[15] has considered the problem of data-driven estimation of the reachable sets of partially unknowndynamical systems. The approaches in these work rely on asystem identiﬁcation using either supervised learning algo-rithms [15] or Gaussian process-based algorithms [13], [14].Such approaches are unable to take advantage of the side infor-mation, require signiﬁcantly more data than

DaTaReach , andprovide only probabilistic guarantees of the correctness of thecomputed reachable sets while

DaTaReach provides correctover-approximations at the expense of being conservative.In Appendix A, we provide the proofs of all the technicalresults of the paper. II. P

RELIMINARIES AND P ROBLEM S TATEMENT

We denote an interval by [ a, b ] = { x ∈ R | a ≤ x ≤ b } forsome a, b ∈ R such that a ≤ b , the set { i, . . . , j } by N [ i,j ] for i, j ∈ N with i ≤ j , the -norm by || · || , the k th componentof a vector x and the ( k, j ) component of a matrix X by ( x ) k and ( X ) k,j , respectively, the Lipschitz constant of f : X → R by L f = sup { L ∈ R | | f ( x ) − f ( y ) | ≤ L (cid:107) x − y (cid:107) , x, y ∈X , x (cid:54) = y } for X ⊆ R n , and the Jacobian of a function f by ∂f∂x . A function f ∈ C k ( X ) , referred as f is C k , with k ≥ if f is continuous on X ⊆ R n and all the partial derivativesof order , . . . , k exist and are continuous on X , and f is piecewise- C k with k ≥ if there exists a partition of X suchthat f is C k on each set in the partition. A. Interval Analysis

We denote the set of intervals on R by IR = {A =[ A , A ] | A , A ∈ R , A ≤ A} , the set of n -dimensional intervalvectors by IR n , and the set of n × m -dimensional intervalmatrices by IR n × m . We carry forward the deﬁnitions [16]of arithmetic operations, set inclusion, and intersections ofintervals to interval vectors and matrices by applying themcomponentwise. We use the term interval to specify an intervalvector or interval matrix when it is clear from the context.Given f : X (cid:55)→ Y with

X ⊆ R n and Y ⊆ R m (or Y ⊆ R n × m ), we deﬁne an interval extension of f as f with f ( A ) ⊇ R ( f, A ) = { f ( x ) | x ∈ A} , ∀A ⊆ X . (1)Thus, given an interval A , f ( A ) is an interval that over-approximates the range of values taken by f over A . Example 1 (I NTERVAL EXTENSION OF − NORM ) . Consider f = || · || . We compute its interval extension f via intervalextensions of α = √· and β = ( · ) . For any A = [ A , A ] ∈ IR , α ( A ) = [ (cid:112) A , (cid:112) A ] , if A ≥ , (2) β ( A ) = (cid:40) [0 , max {A , A } ] , if ∈ A [min {A , A } , max {A , A } ] , otherwise . (3) Using (2) and (3) with interval arithmetic, we have, for any S = [ S , . . . , S n ] ∈ IR n , f ( S ) = α ( (cid:80) ni =1 β ( S i )) .B. Over-Approximations of the Reachable Set Consider a nonlinear dynamical system, ˙ x = h ( x, u ) , (4)where the state x : R + (cid:55)→ X is a continuous-time signalevolving in X ∈ IR n , the control u ∈ U is a signal of timeevolving in the control set U ∈ IR m with U = { v : R + (cid:55)→U | v is piecewise- C D u } for D u ≥ , and h : X × U (cid:55)→ Y is C D h for D h ≥ and Y ∈ IR n . Given an initial state x i = x ( t i ) at time t i and a control signal u ∈ U , a trajectory of (4)is a function of time x ( · ; x i , u ) : [ t i , ∞ [ (cid:55)→ X that satisﬁes (4). Deﬁnition 1 (R EACHABLE SET ) . Given a set I i ⊆ X of statesat time t i and a set V ⊆ U of control signals, the reachableset of the dynamics (4) at time t ≥ t i is given by R ( t, I i , V ) = { z ∈ X | ∃ x i ∈ I i , ∃ v ∈ V , z = x ( t ; x i , v ) } . iven a set V ⊆ U and a set I ⊆ X of states at time t ,we compute over-approximations of R ( t, I , V ) at time t ≥ t using interval Taylor-based methods [3], [4], [16]. Speciﬁcally,we consider a time grid t < · · · < t N such that for all v ∈ V , v is C D u on each interval [ t i , t i +1 [ . We want to compute sets R + i ∈ IR n (with R +0 = I ) such that for all i ∈ N [0 ,N − , R i +1 = R ( t i +1 , R + i , V ) ⊆ R + i +1 . First, interval arithmeticenables to inductively deﬁne h [ d ] , the interval extensions ofthe Taylor coefﬁcients h [ d ] given by h [1] = h, h [ d +1] = 1 d + 1 (cid:16) ∂h [ d ] ∂x h + d − (cid:88) l =0 ∂h [ d ] ∂u ( l ) u ( l +1) (cid:17) . (5)Next, we start with R +0 = I , and compute {R + i } Ni =1 by R + i +1 = R + i + D − (cid:88) d =1 (cid:0) t i +1 − t i (cid:1) d (cid:0) h [ d ] ( R + i , v ) (cid:1) ( t i )+ (cid:0) t i +1 − t i (cid:1) D (cid:0) h [ D ] ( S i , v ) (cid:1) ([ t i , t i +1 ]) , (6)where D ≤ min( D u +1 , D h ) is the order of the Taylor expan-sion, we denote v (0) ( A ) by v ( A ) and the intervals v ( d ) ( A ) for all d ∈ N [0 ,D u ] are such that ∪ v ∈ V R ( v ( d ) , A ) ⊆ v ( d ) ( A ) with A ⊆ R + . In other words, v ( d ) ( A ) over-approximatesthe range of the d th derivative of all v ∈ V on the interval A . Here, the set S i ⊆ X is an a priori rough enclosure of R ( t, R + i , V ) for all t ∈ [ t i , t i +1 ] and is a solution of R + i + [0 , t i +1 − t i ] h ( S i , v ([ t i , t i +1 ])) ⊆ S i . (7) C. Problem Statement

In this paper, we consider control-afﬁne nonlinear dynamics, ˙ x = f ( x ) + G ( x ) u, (8)where f : X (cid:55)→ R n and G : X (cid:55)→ R n × m are unknown vector-valued and matrix-valued functions, respectively, for X ∈ IR n .Note that even though we consider control-afﬁne dynamics, inthe general case, we can construct a control-afﬁne model ofthe system locally and apply the results of the paper. Assumption 1 (S MOOTHNESS ) . f and G are C D functionsfor some D ≥ . Assumption 1 is common in the frameworks of reachabilityanalysis and receding-horizon control. It implies that f and G are globally Lipschitz-continuous since the domain X ∈ IR n is bounded. We exploit such Lipschitz continuity by consid-ering the following side information throughout the paper. Side information 1 (B OUNDS ON L IPSCHITZ CONSTANTS ) . Let L f ∈ R n + and L G ∈ R n × m + with ( L f ) k = L f k and ( L G ) k,l = L G k,l be known upper bounds on the Lipschitzconstants of ( f ) k and ( G ) k,l for all k ∈ N [1 ,n ] and l ∈ N [1 ,m ] . We can also utilize, if available, any of the following sideknowledge about the unknown dynamics.

Side information 2 (V ECTOR FIELD BOUNDS ) . We aregiven R f A ∈ IR n and R G A ∈ IR n × m as known over-approximations of the range of f and G , respectively, overa given set A ⊆ X . Side information 3 (G RADIENT BOUNDS ) . We are givenbounds on the gradient of some components of f and G . Suchside information may include the monotonicity of f and G . Side information 4 (D ECOUPLING AMONG STATES ) . We aregiven the knowledge that some components of the vector ﬁeld ˙ x do not depend on some components of the state x . Side information 5 (P ARTIAL DYNAMICS KNOWLEDGE ) . Weare given terms of some components of the vector ﬁeld as ˙ x = f kn ( x ) + f ukn ( x ) + ( G kn ( x ) + G ukn ( x )) u, where f kn and G kn are known functions while f ukn and G ukn are unknown functions. Such side information can be derivedfrom the application of elementary law of physics. Let N ∈ N , N ≥ . Let T N = { ( x i , ˙ x i , u i ) } Ni =1 denotea single ﬁnite-horizon trajectory containing N samples of thestate x i = x ( t i ) , the derivative ˙ x i = ˙ x ( t i ) of the state, andthe control signal u i = u ( t i ) from a trajectory of (8). Giventhe current state x ( t N +1 ) and the trajectory T N , we ﬁrst seekto over-approximate the reachable set of the system. Problem 1 (R EACHABLE SET OVER - APPROXIMATION ) . Given a single ﬁnite-horizon trajectory T N for N ∈ N , theside information 1 and possibly any of the side information 2–5, a set V ⊆ U of admissible control signals, a time stepsize ∆ t > , and a maximum number T > N of time steps,compute an over-approximation of the reachable set at time t i = t N + ( i − N )∆ t for all i ∈ N [ N +1 ,T ] . Next, we seek to control the system by ﬁnding, at eachsampling time t i > t N , control values solutions of the one-step optimal control problem [11], [17], [18] minimize u i ∈U c ( x i , u i , x ( t i +1 ; x i , u i )) , (9)where x i is the state of the system at t i , t i +1 = t i + ∆ t , ∆ t is the constant time step size, and c : X × U × X → R is aknown convex function, which encodes the preferences over ashort interval of length ∆ t in time. Problem 2 (A PPROXIMATE ONE - STEP OPTIMAL CONTROL ) . Given a single ﬁnite-horizon trajectory T i − for some i ∈ N , i > N , the current state x i at time t i , the side informa-tion 1, and possibly any of the side information 2–5, compute,at sampling time t i , approximate solutions of the one-stepoptimal control problem (9) and characterize the suboptimalityof such approximations. We use the following example throughout the paper.

Example 2 (U NICYCLE S YSTEM ) . Consider the unicyclesystem with dynamics given by ˙ p x = v cos( θ ) , ˙ p y = v sin( θ ) , ˙ θ = ω, (10) where the components of the state x = [ p x , p y , θ ] represent,respectively, the position in the x plane, the y plane, and theheading of the unicycle. The components of the control u =[ v, ω ] represent the speed and the turning rate, respectively.e consider the constraint set U = [ − , × [ − π, π ] . In thecontrol-afﬁne form (8) , we have f = 0 , ( G ) , = cos( θ ) , ( G ) , = sin( θ ) , ( G ) , = 1 , and ( G ) k,l = 0 otherwise. Weassume the dynamics (10) are unknown and are given the looseLipschitz bounds L f = [0 . , . , . , L G , = L G , =1 . , L G , = 0 . , and L G k,l = 0 otherwise. Furthermore, weconsider the knowledge that the vector ﬁeld does not dependon its positions. That is, f ( x ) = f ( θ ) and G ( x ) = G ( θ ) . III. O

VER -A PPROXIMATIONS OF THE R EACHABLE S ET OF U NKNOWN D YNAMICAL S YSTEMS

We propose

DaTaReach to address Problem 1. It con-structs a differential inclusion that contains, at all times, thevector ﬁeld of the unknown dynamics using the side informa-tion 1. Then, it utilizes the interval Taylor-based method toover-approximate the reachable set of dynamics described bythe constructed differential inclusion.

A. Differential Inclusion based on Lipschitz Continuity

First, to aid the construction of the differential inclusion,we over-approximate f and G at each data point of T N . Lemma 1 (C ONTRACTION VIA DATA ) . Given a data point ( x i , ˙ x i , u i ) , an interval F i ∈ IR n such that f ( x i ) ∈ F i , andan interval G i ∈ IR n × m such that G ( x i ) ∈ G i , the intervals C F i and C G i , deﬁned sequentially for l = 1 , . . . , m by ( C F i ) k = ( F i ) k ∩ ( ˙ x i − G i u i ) k , ( s ) k = ( ˙ x i − C F i ) k ∩ ( G i u i ) k , ( C G i ) k,l = (cid:0) (( s l − ) k − (cid:80) p>l ( G i ) k,p u p ) ∩ (( G i ) k,l u l ) (cid:1) u l , if u l (cid:54) = 0( G i ) k,l , otherwise , ( s l ) k = (cid:0) ( s l − ) k − ( C G i ) k,l u l (cid:1) ∩ (cid:0) (cid:88) p>l ( G i ) k,p u p (cid:1) , (11) for all k ∈ N [1 ,n ] , are the smallest intervals enclosing f ( x i ) and G ( x i ) , respectively, given only the data point, F i , and G i . We provide a proof for Lemma 1 in Appendix A where weuse the constraint ˙ x i = f ( x i ) + G ( x i ) u i to remove from F i and G i values of f ( x i ) and G ( x i ) not satisfying the constraint.Next, we use the Lipschitz continuity to provide explicitfunctions that over-approximate f and G , given uncertainknowledge of f and G at the data points. Lemma 2 (O VER - APPROXIMATION OF f AND G ) . Given a set E N = { ( x i , C F i , C G i ) | f ( x i ) ∈ C F i , G ( x i ) ∈ C G i } Ni =0 andthe bounds L f and L G from side information 1, the interval-valued functions f : X → IR n and G : X → IR n × m , givenfor all k ∈ N [1 ,n ] and l ∈ N [1 ,m ] by ( f ( x )) k = (cid:92) ( x i ,C F i , · ) ∈ E N ( C F i ) k + L f k (cid:107) x − x i (cid:107) [ − , , ( G ( x )) k,l = (cid:92) ( x i , · ,C G i ) ∈ E N ( C G i ) k,l + L G k,l (cid:107) x − x i (cid:107) [ − , , (12) are such that f ( x ) ∈ f ( x ) and G ( x ) ∈ G ( x ) for all x ∈ X . Algorithm 1

Optimal over-approximation of the values of f and G at each data point of a ﬁnite-horizon trajectory. Input:

Single trajectory T N , sufﬁciently large M > , upperbounds on the Lipschitz constants (side information 1). Optional : The sets A , R f A and R G A (side information 2). Output: E N = { ( x i , C F i , C G i ) | f ( x i ) ∈ C F i , G ( x i ) ∈ C G i } Ni =0 if no side information 2 then A ← X , R f A ← [ − M, M ] n , R G A ← [ − M, M ] n × m end if Deﬁne x ∈ A , C F ← R f A , and C G ← R G A for i ∈ N [1 ,N ] ∧ ( x i , ˙ x i , u i ) ∈ T N do Compute F i = f ( x i ) , G i = G ( x i ) via (12) and E i − Compute C F i , C G i via (11), F i , G i , and ( x i , ˙ x i , u i ) end for while E N is not invariant do Execute lines 5–8 with E N instead of E i − on line 6 end while return E N We provide a proof for Lemma 2 in Appendix A. Notethat interval extensions of f and G can be obtained byreplacing occurences of (cid:107) · (cid:107) by its interval extension givenin Example 1. In the rest of the paper, when the input of f and G are intervals, we use such interval extensions.Finally, Theorem 1 constructs a differential inclusion for thesystem by combining Algorithm 1 and Lemma 2. Theorem 1 (D IFFERENTIAL INCLUSION ) . Given a trajectory T N , the bounds L f and L G from side information 1, andpossibly bounds on the vector ﬁeld from side information 2,the dynamics (8) are contained in the differential inclusion ˙ x ∈ f ( x ) + G ( x ) u, (13) where the functions f : X → IR n and G : X → IR n × m areobtained by (12) with E N being the output of Algorithm 1. A proof of Theorem 1 is provided in Appendix A where weshow that the output E N of Algorithm 1 is such that for all ( x i , C F i , C G i ) ∈ E N , f ( x i ) ∈ C F i and G ( x i ) ∈ C G i . There-fore, Lemma 2 and interval arithmetic enable to conclude.Figure 2 shows that the differential inclusion holds on theunicycle system using a randomly generated trajectory T . . . . . -0.6-0.4-0.20.00.20.40.6 Time (s) ˙ p x f ( x ) + G ( x ) u f ( x ) + G ( x ) u T Fig. 2. Time evolution of ˙ p x and its over-approximation ( f ( x )+ G ( x ) u ) forthe unicycle system of Example 2. We generate the trajectory correspondingto ˙ x ( t ) using a randomly generated piecewise-constant input u ( t ) ∈ U for t ≤ t = 1 . s , and u ( t ) = [1 , cos(6( t − t ))] for t ∈ [ t , . emark 1. The quality of the differential inclusion of Theo-rem 1 depends on how much information on f and G can beobtained from the given trajectory T N . Thus, if the trajectoryis not diverse , it is likely impossible to obtain tight differentialinclusions. For example, consider one of the corner caseswhere u i = 0 for every data point in T N . In this case, itis impossible to retrieve any information on G . Note thatwhen the goal is to control the unknown system, control valuescan be synthesized to diversify the trajectory and obtain tightdifferential inclusions that help the future control decisions.B. Interval Taylor-Based Method for Differential Inclusions We compute an over-approximation of the reachable set ofthe dynamics described by the differential inclusion (13). Suchover-approximation is naturally an over-approximation of thereachable set of the unknown dynamical system.

Theorem 2 (R EACHABLE SET OVER - APPROXIMATION ) . Given a trajectory T N , a set V ⊆ U of piecewise- C D u control signals for D u ≥ , the bounds L f and L G from sideinformation 1, the set R + i ∈ IR n of states at time t i , and a timestep size ∆ t , an over-approximation R + i +1 of the reachable setat t i + ∆ t of dynamics described by the differential inclusion ˙ x ∈ f ( x ) + G ( x ) u with u ∈ V is given by R + i +1 = R + i + (cid:16) f ( R + i ) + G ( R + i ) v ( t i ) (cid:17) ∆ t + (cid:16) J f + J G V i (cid:17)(cid:16) f ( S i ) + G ( S i ) V i (cid:17) ∆ t G ( S i ) V (1) i ∆ t , (14) where V i = v ([ t i , t i + ∆ t ]) , V (1) i = v (1) ([ t i , t i + ∆ t ]) , andthe matrices J f ∈ IR n × n and J G ∈ IR n × m × n , intervalextensions of the Jacobian of f and G , are such that ( J f ) k,p = L f k [ − , and ( J G ) k,l,p = L G k,l [ − , for all k, p ∈ N [1 ,n ] and l ∈ N [1 ,m ] . The set S i is a solution of R + i + [0 , ∆ t ] (cid:16) f ( S i ) + G ( S i ) V i (cid:17) ⊆ S i . (15) For k, p ∈ N [1 ,n ] , we deﬁne the ( k, p ) component of J G V i as ( J G V i ) k,p = m (cid:88) l =1 ( J G ) k,l,p ( V i ) l . (16)We provide a proof for Theorem 2 in Appendix A wherewe use a Taylor expansion (6) of order D = 2 . Corollary 1.

Under the notation of Theorem 2, assume that D u = 0 . Then, an over-approximation R + i +1 of the reachableset of the dynamics described by the differential inclusion ˙ x ∈ f ( x ) + G ( x ) u of Theorem 1 with u ∈ V is given by R + i +1 = R + i + (cid:0) f ( S i ) + G ( S i ) V i (cid:1) ∆ t. Theorem 2, as it is, does not incorporate more side in-formation. We now show how to incorporate the other sideinformation to obtain tighter over-approximations.

Algorithm 2

DaTaReach : Over-approximation of the reach-able set of unknown smooth systems.

Input:

Single trajectory T N , upper bounds on the Lipschitzconstants from side information 1, set V of control signals,time step size ∆ t , maximum number of time steps T > N . Optional : Any of the side information 2–5.

Output:

Over-approximations {R + i } Ti = N +1 of the reachablesets at times t N + ( i − N )∆ t with i ∈ N [ N +1 ,T ] . Deﬁne R + N +1 ← { x ( t N +1 ) } Compute f , and G from Theorem 1 for i ∈ { N + 1 , . . . , T − } do Compute v ( t i ) , V i , and V (1) i in Theorem 2 from V Compute S i via (15), V i , f , and G Compute R + i +1 via (14), R + i , S i , f , G , J f , and J G end for return {R + i } Ti = N +1 Side information 2 (V ECTOR FIELD BOUNDS ). Given a set

S ⊆ A , tighter interval extensions of f and G over S can beobtained by the update f ( S ) ← f ( S ) ∩ R f A and G ( S ) ← G ( S ) ∩ R G A . Side information 3 (G RADIENT BOUNDS ). These bounds canbe used to provide tighter interval extensions J f and J G ofthe Jacobians of f and G . For example, if the function ( f ) k is known to be non-decreasing with respect to the variable x p on a set A ⊆ X , then we obtain a tighter R + i +1 by the update ( J f ) k,p ← ( J f ) k,p ∩ R + if S i ⊆ A . Side information 4 (D ECOUPLING AMONG STATES ). Thisknowledge constrains the interval extensions for the Jacobianmatrices. For example, if the state ( x ( t )) p does not directlyaffect ( ˙ x ( t )) k for some p, k ∈ N [1 ,n ] under any controlsignal in U , we can obtain a tighter over-approximation ofthe reachable set by setting to zero the intervals ( J f ) k,p and ( J G ) k,l,p for all l ∈ N [1 ,m ] . For the unicycle system ofExample 2, since f and G depends only on the heading θ ,the Jacobian terms ( J f ) , , ( J f ) , , ( J f ) , , ( J f ) , , ( J f ) , , ( J f ) , , ( J G ) , , , ( J G ) , , , ( J G ) , , , ( J G ) , , , ( J G ) , , ,and ( J G ) , , must all be set to zero . Side information 5 (P ARTIAL DYNAMICS KNOWLEDGE ). Thenew functions f and G are given by f = f kn + f ukn and G = G kn + G ukn , where the functions f kn and f ukn are intervalextensions of known functions f kn and G kn , respectively, andthe functions f ukn and G ukn are obtained by Theorem 1 using L f ukn , L G ukn , and the new trajectory T (cid:48) N = { ( x i , ˙ x i − ( f kn ( x i )+ G kn ( x i ) u i ) , u i ) | ( x i , ˙ x i , u i ) ∈ T N } . Furthermore, the new Jacobian terms in the computationof R + i +1 are given by J f = ∂f kn ∂x ( S i ) + J f ukn and ( J G ) k,l,p = ∂ ( G kn ) k,l ∂x p ( S i ) + ( J G ukn ) k,l,p , respectively.We demonstrate the value of additional side informationon the unicycle system of Example 2. Speciﬁcally, we use DaTaReach to compute over-approximations of the reachablesets of the unicycle under different side information. Figure 3hows that, as expected, the over-approximation becomestighter under additional side information.-2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 − − − p x p y DaTaReach (a)

DaTaReach (b) T DaTaReach (c) Reachable set

Fig. 3. Over-approximation of the reachable set of the unicycle system inthe x - y plane. The trajectory T comes from a trajectory of the systemobtained by a randomly generated piecewise-constant control signal u ( t ) ∈U for t ≤ t = 1 . s . The parameters for DaTaReach were given by ∆ t = 0 . , T = 200 , and V = { t (cid:55)→ [1 + a , cos(6( t − t )) + a ] | a ∈ [ − . , . , a ∈ [ − . , . } . The case (a) ignores the side information f ( x ) = f ( θ ) , case (b) is exactly the setting of Example 2, and case (c) assumes the extra knowledge that f = 0 and ( G ) , = 1 . IV. C

ONTROL S YNTHESIS FOR U NKNOWN D YNAMICAL S YSTEMS

We propose

DaTaControl to address Problem 2. It com-putes approximate solutions to the one-step optimal controlproblem (9) using the over-approximation of the reachableset. Speciﬁcally, it replaces the unknown next state x i +1 inproblem (9) with the corresponding over-approximation R + i +1 ,which may be computed using Theorem 2. Unfortunately, R + i +1 is a nonconvex function of the control signal u , the de-cision variable. Therefore, it constructs an over-approximationof R + i +1 , in Theorem 3, that is convex in the given constantcontrol signal u applied in the time interval [ t i , t i +1 ] . Thenew over-approximation provides tractable approximations ofthe one-step optimal control problem (9). Theorem 3.

Under the notation of Theorem 2 and given aconstant control signal u : t (cid:55)→ u i applied between t i and t i +1 = t i + ∆ t with u i ∈ U , the reachable set R i +1 satisﬁes R i +1 ⊆ (cid:0) B i + A + i u i (cid:1) ∩ (cid:0) B i + A − i u i (cid:1) , (17) where the intervals A − i , A + i , and B i are given by A − i = G ( R + i )∆ t + (cid:0) J f G ( S i ) + J T G ( f ( S i ) + G ( S i ) U ) (cid:1) ∆ t , A + i = G ( R + i )∆ t + (cid:0) ( J f + J G U ) G ( S i ) + J T G f ( S i ) (cid:1) ∆ t , B i = R + i + f ( R + i )∆ t + J f f ( S i ) ∆ t , (18) where ( J T G ) k,p,l = ( J G ) k,l,p for k, p ∈ N [1 ,n ] and l ∈ N [1 ,m ] ,and the a priori rough enclosure S i is a solution of R + i + [0 , ∆ t ] (cid:16) f ( S i ) + G ( S i ) U (cid:17) ⊆ S i . (19) Algorithm 3

DaTaControl : Approximate one-step optimalcontrol solution at sampling time t i > t N Input:

Single trajectory T i − of length i > N , time step size ∆ t , upper bounds on the Lipschitz constants from sideinformation 1, one-step cost c , current state x i = x ( t i ) ,constraint set U . Optional : Any of the side information 2–5.

Output:

Constant control ˆ u i ∈ U to apply between t i and t i + ∆ t that approximates a solution of (9). Deﬁne R + i ← { x i } Compute f , and G from Theorem 1 Compute S i via (19), R + i , U , f , and G Compute B i , A + i , and A − i via (18), R + i , S i , U , f , G , J f ,and J G Compute ˆ u i as the solution of either (20) or (21) return ˆ u i A proof of Theorem 3 is provided in Appendix A wherewe linearize, using the constrained set U , the quadratic termin u of the closed-form expression (14).We use the control-afﬁne linearization of Theorem 3 topropose two convex optimization problems that approximatethe one-step optimal control problem (9). The ﬁrst problem,called optimistic control problem , is given by minimize u i ∈U inf x i +1 ∈X ,x i +1 ∈ ( B i + A + i u i ) ∩ ( B i + A − i u i ) c ( x i , u i , x i +1 ) , (20)where the goal is to minimize the best possible cost valueover all possible state x i +1 in the over-approximation (17).The second problem, called idealistic control problem , is anidealistic approximation of (9) given by minimize u i ∈U c ( x i , u i , b ide i + A ide i u i ) , (21)where the goal is to minimize the cost associated to a speciﬁctrajectory x i +1 = b ide i + A ide i u i in the over-approximation (8),idealistically considered as the unknown next state evolution.Finally, we provide, in Theorem 4, bounds on the subopti-mality of the approximate problems when c is quadratic. Assumption 2 (Q UADRATIC ONE - STEP COST ) . We considerthat the one-step cost function c is a convex quadratic function, c ( x, u, y ) = (cid:20) yu (cid:21) T (cid:20) Q SS T R (cid:21) (cid:20) yu (cid:21) + (cid:20) qr (cid:21) T (cid:20) yu (cid:21) , (22) where q ∈ R n , r ∈ R m , Q = Q T ∈ R n × n , R = R T ∈ R m × m ,and S ∈ R n × m . Theorem 4 (S UBOPTIMALITY BOUND ) . Let c (cid:63)i , c opt i , and c ide i be the optimal cost of the one-step optimal control problem (9) ,the optimistic control problem (20) , and the idealistic controlproblem (21) , respectively, at the sampling time t i . We have | c (cid:63)i − c i | ≤ max (cid:0) (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) , (cid:107) w ( B i ) + w ( A − i ) |U|(cid:107) K ( A − i ) (cid:1) , (23)

50 10010 − − − Time step C o m pu t e ti m e ( s ) − − p x p y SINDYc CGP-LCB C2Opt

Opt . traj . DaTaControl T Target set Initial state

Time step C o s t f un c ti on Fig. 4.

DaTaControl successfully drove the unicycle to the origin.

Opt . traj . corresponds to the one-step optimal control with the known dynamics. where c i is either c opt i or c ide i , w ( A ) = A − A is the width ofan interval A , and K ( A ) , for any A ∈ IR n × m , is given by K ( A ) = min (cid:0) (cid:107) | S U| + q + 2 | Q ( B i + AU ) |(cid:107) , (cid:107) | S U| + q + 2 | Q X |(cid:107) (cid:1) . We provide a proof for Theorem 4 in Appendix A wherewe use the local Lipschitz property of the cost function c tocharacterize its variations over the over-approximation set (17). Remark 2.

The main term of the suboptimality bound inTheorem 4 is directly related to the width of the over-approximation of the reachable set at time t i +∆ t . Thus, as theover-approximation of the reachable set becomes tighter, thegap with the unknown optimal cost decreases. Speciﬁcally, it isstraightforward to see that the more data and side informationare used in the computation of B i , A + i , and A − i , the tighterthe widths w ( B i ) , w ( A + i ) , and w ( A − i ) become. V. C

OMPARISON WITH E XISTING A PPROACHES

We compare

DaTaControl with existing data-drivencontrol algorithms

CGP-LCB [12], [19],

C2Opt [11], and

SINDYc [8]. These algorithms also solve the one-step optimalcontrol problem (9) approximately.

CGP-LCB assumes thatthe unknown one-step approximate cost function C ( x i , u i ) = c ( x i , u i , x ( t i +1 ; x i , u i )) to minimize is described by a Gaus-sian process, while C2Opt assumes that C has Lipschitzcontinuous gradients. On the other hand, SINDYc uses thelimited data to perform sparse identiﬁcation of the dynamics,which then permits an approximate solution to (9) via numer-ical solvers. The experiments on the unicycle system and aquadrotor demonstrate that

DaTaControl outperforms thesealgorithms both on the computation time at each sampling timeand the suboptimality of the control.

A. Unicycle System

We consider the problem of driving the unicycle of Ex-ample 2 to the origin. We encode this control objectiveusing the one-step cost function c ( x i , u i , x i +1 ) = 0 . (cid:107) x i +1 (cid:107) .We chose the time step size ∆ t = 0 . s and generated arandom initial trajectory T starting from the state x =[ − , − . , π/ and such that the unicycle goes away from the target. We applied DaTaControl with the side informationdiscussed in Example 2. We used the default parameters in

GPyOpt [19] when implementing

CGP-LCB . We chose theLipschitz constant for the gradient L = 10 and trade-offhyperparameter α = 1 / for C2Opt [11]. For

SINDYc , weconsidered monomials (up to degree ), sines and cosines ofthe state, and the products of these functions with the velocity v and the turning rate ω as the library functions. To performsparse identiﬁcation, we swept the regularization parameter [8] λ ∈ { p : p ∈ N [ − , } and rounded-down the coefﬁcientssmaller than − to zero. We relaxed the target state to the . -sublevel set of the one-step cost function as the stoppingcriteria for the algorithms.Figure 4 shows the trajectories and the evolution of thecost function for the different algorithms. DaTaControl performs signiﬁcantly better than

GPyOpt , SINDYc , and

C2Opt . It reaches the origin in fewer time steps and signif-icantly lower computation time. Figure 4 empirically showsthat

DaTaControl achieves near-real-time control.

B. Quadrotor System

Consider a quadrotor with control-afﬁne dynamics [20] ˙ p x = v x , ˙ v x = − m C vD v x − T m sinφ − T m sinφ, ˙ p y = v y , ˙ v y = − m ( mg + C vD v y ) + T m cosφ + T m cosφ, ˙ φ = ω, ˙ ω = − I yy C φD ω − l I yy T + l I yy T , where the components of the state x = [ p x , v x , p y , v y , φ, ω ] represent, respectively, the horizontal position, horizontal ve-locity, vertical position, vertical velocity, pitch angle, and pitchrate, and the components of the control u = [ T , T ] representthe thrust exerted on either end of the quadrotor. We chose theconstraint set U = [0 , . × [0 , . . The constants of thedynamics are given by C vD = 0 . , C φD = 0 . , g = 9 . , m = 1 . , l = 0 . , and I yy = 0 . .We assume that the dynamics of the quadrotor are un-known and consider the problems of controlling v x to asetpoint v sp x = 5 and p y to a setpoint p sp y = 8 . We encodethese control objectives, respectively, using the cost functions

50 100 15010 − − − Time step C o m pu t e ti m e ( s ) . . − Time (s) v x T Setpoint

SINDYc CGP-LCB

Opt . traj . DaTaControl

Time (s) p y Fig. 5.

CGP-LCB and

SINDYc fail in the task of controlling the horizontal speed v x of the quadrotor whereas DaTaControl reaches both setpoints. c ( x i , u i , x i +1 ) = 0 . x i +1 ) − v sp x ) and c ( x i , u i , x i +1 ) =0 . x i +1 ) − p sp y ) . We chose the time step size ∆ t = 0 . s and generated a random initial trajectory T starting from thestate x = [0 , , , , , . We applied DaTaControl withthe side information ˙ p x = v x , ˙ p y = v y , and ˙ φ = ω . Suchextra knowledge is obtained from elementary law of physics.Furthermore, DaTaControl considered the loose Lipschitzbounds L f = L f = 0 . , L f = 0 . , L G , = L G , = 0 . , L G , = L G , = L G , = L G , = L G , = L G , = 0 . ,and uses the side information G ( x ) = G ( φ ) and f ( x ) = f ( v x , v y , w ) . We used the default parameters of GPyOpt when implementing

CGP-LCB . For

SINDYc , we consideredmonomials (up to degree ), sines and cosines of the state,and the products of these functions with the T and T as thelibrary functions. We do not make a comparison with C2Opt due to the inability to compute the gradient of C .Figure 5 shows the near-optimality of DaTaControl while

CGP-LCB and

SINDYc fail to reach the setpoints.Furthermore, the ﬁgure demonstrates that

DaTaControl canachieve near-optimal, near-real-time control of the verticalposition and horizontal speed. We justify the suboptimality of

Opt . traj . by the fact that the one-step optimal control prob-lem is highly nonlinear, which makes possible the synthesisof local optimum solutions by the numerical solvers.VI. C ONCLUSION

In this work, we proposed two data-driven algorithms,

DaTaReach and

DaTaControl , for on-the-ﬂy over-approximation of the reachable set and constrained near-optimal control of systems with unknown dynamics. Thesealgorithms are suitable for scenarios with severely limiteddata and can take advantage of various side information.The numerical experiments demonstrate the efﬁcacy of thealgorithms over existing approaches while suggesting that

DaTaControl is also near-real-time.R

EFERENCES[1] I. Mitchell, A. Bayen, and C. Tomlin, “A time-dependent hamilton-jacobiformulation of reachable sets for continuous dynamic games,”

IEEETrans. Autom. Control , vol. 50, no. 7, pp. 947–957, 2005. [2] M. Berz and K. Makino, “Veriﬁed integration of odes and ﬂows usingdifferential algebraic methods on high-order taylor models,”

Reliablecomputing , vol. 4, no. 4, pp. 361–369, 1998.[3] N. S. Nedialkov, K. R. Jackson, and G. F. Corliss, “Validated solutionsof initial value problems for ordinary differential equations,”

AppliedMathematics and Computation , vol. 105, no. 1, pp. 21–68, 1999.[4] E. Goubault and S. Putot, “Inner and outer reachability for the veriﬁca-tion of control systems,” in

Proc. Hybrid Syst.: Comput. & Ctrl. , 2019,pp. 11–22.[5] M. Ahmadi, A. Israel, and U. Topcu, “Safe controller synthesis for data-driven differential inclusions,”

IEEE Trans. Autom. Control , 2020.[6] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, “Con-strained model predictive control: Stability and optimality,”

Automatica ,vol. 36, no. 6, pp. 789–814, 2000.[7] M. Korda and I. Mezi´c, “Linear predictors for nonlinear dynamical sys-tems: Koopman operator meets model predictive control,”

Automatica ,vol. 93, pp. 149–160, 2018.[8] E. Kaiser, J. N. Kutz, and S. L. Brunton, “Sparse identiﬁcation ofnonlinear dynamics for model predictive control in the low-data limit,”

Proc. of the Royal Society A , vol. 474, no. 2219, p. 20180335, 2018.[9] J. L. Proctor, S. L. Brunton, and J. N. Kutz, “Dynamic mode decom-position with control,”

SIAM Journal on Applied Dynamical Systems ,vol. 15, no. 1, pp. 142–161, 2016.[10] M. Ornik, S. P. Carr, A. Israel, and U. Topcu, “Control-oriented learningon the ﬂy,”

IEEE Trans. Autom. Control , 2019.[11] A. P. Vinod, A. Israel, and U. Topcu, “Convexiﬁed contextual optimiza-tion for on-the-ﬂy control of smooth systems,” in , 2020, pp. 2004–2011.[12] A. Krause and C. S. Ong, “Contextual gaussian process bandit optimiza-tion,” in

Advances in NIPS. , 2011, pp. 2447–2455.[13] A. Devonport and M. Arcak, “Data-driven reachable set computationusing adaptive gaussian process classiﬁcation and monte carlo methods,”in . IEEE, 2020, pp. 2629–2634.[14] S. Haesaert, P. M. Van den Hof, and A. Abate, “Data-driven and model-based veriﬁcation via bayesian identiﬁcation and reachability analysis,”

Automatica , vol. 79, pp. 115–126, 2017.[15] A. Chakrabarty, A. Raghunathan, S. Di Cairano, and C. Danielson,“Data-driven estimation of backward reachable and invariant sets forunmodeled systems via active learning,” in . IEEE, 2018, pp. 372–377.[16] R. E. Moore,

Interval analysis . Prentice-Hall Englewood Cliffs, 1966,vol. 4.[17] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard, “Gaussianprocess model based predictive control,” in

Proceedings of the 2004American control conference , vol. 3. IEEE, 2004, pp. 2214–2219.[18] B.-G. Park, J.-W. Lee, and W. H. Kwon, “Robust one-step recedinghorizon control for constrained systems,”

International Journal of Robustand Nonlinear Control: IFAC-Afﬁliated Journal , vol. 9, no. 7, pp. 381–395, 1999.[19] The GPyOpt authors, “GPyOpt: A Bayesian optimization framework inPython,” http://github.com/ShefﬁeldML/GPyOpt, 2016.[20] M. Chen, S. L. Herbert, M. S. Vashishtha, S. Bansal, and C. J. Tomlin,“Decomposition of reachable sets and tubes for a class of nonlinearystems,”

IEEE Transactions on Automatic Control , vol. 63, no. 11, pp.3675–3688, Nov 2018.[21] F. Benhamou, F. Goualard, L. Granvilliers, and J.-F. Puget, “Revisinghull and box consistency,” in

Proceedings of the 1999 InternationalConference on Logic Programming . USA: Massachusetts Institute ofTechnology, 1999, p. 230–244.[22] H. Federer,

Geometric measure theory , ser. Grundlehren Math. Wiss.Berlin: Springer, 1969. A PPENDIX

AIn this appendix, we provide the proofs for all lemmas andtheorems presented in this paper.P

ROOF OF L EMMA f ( x i ) ∈ F i and G ( x i ) ∈ G i , wewant to obtain tighter intervals C F i ⊆ F i and C G i ⊆ G i thatprune out some values f ( x i ) and G ( x i ) from F i and G i thatdo not satisfy the constraint ˙ x i = f ( x i ) + G ( x i ) u i . We ﬁrsthave that f ( x i ) = ˙ x i − G ( x i ) u i ∈ ( ˙ x i − G i u i ) ∩ F i = C F i . Therefore, a similar reasoning using the tighter interval C F i provides that G ( x i ) u i ∈ ( ˙ x i − C F i ) ∩ ( G i u i ) = s . It isimportant to note that plugging back s instead of G i u i inthe expression of C F i will not yield an interval tighter than C F i . Therefore, C F i and s are optimal. Next, we focus onthe term G ( x i ) u i ∈ s . For all k ∈ N [1 ,n ] , we have that ( G ( x i )) k, ( u i ) = ( G ( x i ) u i ) k − (cid:88) l> ( G ( x i )) k,l ( u i ) l ∈ (cid:0) ( s ) k − (cid:88) l> ( G i ) k,l u l (cid:1) ∩ (cid:0) ( G i ) k, u (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) ( C G i ) k, u , and we can deduce (cid:88) l> ( G ( x i )) k,l ( u i ) l ∈ (cid:0) ( s ) k − ( C G i ) k, u (cid:1) ∩ (cid:0) (cid:88) l> ( G i ) k,l u l (cid:1)(cid:124) (cid:123)(cid:122) (cid:125) ( s ) k . Using the same argument as for the optimality of s and C F i ,we can say that s and ( C G i ) k, u are optimal. Finally, we ap-ply the previous step in a sequential manner for p = 2 , . . . , m to the equality ( G ( x i )) k,p ( u i ) p = (cid:80) l>p − ( G ( x i )) k,l ( u i ) l − (cid:80) l>p ( G ( x i )) k,l ( u i ) l in order to obtain optimal intervals s p and ( C G i ) k,p u p . The procedure described here is similar to the HC4-Revise algorithm [21] where optimality was provenwhen each variable, in our case f ( x i ) and G ( x i ) , appearsonly once in the deﬁnition of the constraint, in our case ˙ x i = f ( x i ) + G ( x i ) u i . (cid:3) P ROOF OF L EMMA f k and G k,l . Speciﬁcally, from the upper bound L f k on theLipschitz constant of f k , we have that | ( f ( x )) k − ( f ( y )) k | ≤ L f k (cid:107) x − y (cid:107) , ∀ x, y ∈ X . Hence, given a data ( x i , C F i , C G i ) ∈ E N and x ∈ X , wecan write that ( f ( x )) k ∈ ( f ( x i )) k + L f k (cid:107) x − x i (cid:107) [ − , ,and therefore ( f ( x )) k ∈ ( C F i ) k + L f k (cid:107) x − x i (cid:107) [ − , since f ( x i ) ∈ C F i . The previous belonging relation is valid forevery data ( x i , C F i , C G i ) ∈ E N and as a result ( f ( x )) k ∈ ( f ( x )) k . The same reasoning applied to L G k,l enables to showthat ( G ( x )) k,l ∈ ( G ( x )) k,l . (cid:3) P ROOF OF T HEOREM ( x i , C F i , C G i ) ∈ E N given by Algorithm 1, we have f ( x i ) ∈ C F i and G ( x i ) ∈ C G i .Speciﬁcally, as a consequence of line 6 of Algorithm 1 andLemma 2, we have that f ( x i ) ∈ F i and G ( x i ) ∈ G i . Hence, byline 7 and Lemma 1, we immediately have that f ( x i ) ∈ C F i and G ( x i ) ∈ C G i . As a consequence of lines 5–8, E N canbe used in Lemma 2 to conclude that f ( x ) ∈ f ( x ) and G ( x ) ∈ G ( x ) for all x ∈ X . Therefore, using intervalarithmetic, we have ˙ x = f ( x ) + G ( x ) u ∈ f ( x ) + G ( x ) u .The lines 9–11 enable to obtain tighter sets C F i and C G i by combining the Lipschitz continuity, the trajectory, andthe values C F i and C G i obtained during previous iterations.Hence, E N is optimal given the the trajectory and the Lipschitzcontinuity since we have the optimality of C F i and C G i byLemma 1. (cid:3) P ROOF OF T HEOREM ˙ x ∈ f ( x )+ G ( x ) u .Clearly, h ( Z , W ) = f ( Z ) + G ( Z ) W is an interval extensionof h ( x, u ) = f ( x ) + G ( x ) u . Hence, by simple inclusionbetween h and h , the set S i solution of (15) is an a priorirough enclosure as it satisﬁes the ﬁxed-point equation (7). Weuse a Taylor expansion (6) of order D = 2 to obtain R + i +1 ⊆ R + i + ∆ t (cid:16) h ( R + i , v ( t i )) (cid:17) + ∆ t (cid:16) ∂h∂x h (cid:17) ( S i , V i )+ ∆ t (cid:16) ∂h∂u (cid:17) ( S i , V i ) V (1) i , (24)where ∂h∂x and ∂h∂u are interval extensions of ∂h∂x and ∂h∂u ,respectively. We have that G is an interval extension of ∂h∂u as ∂h∂u ( x, u ) = G ( x ) ∈ G ( x ) = ⇒ ∂h∂u = G . (25)Furthermore, for all k, p ∈ N [1 ,n ] , x ∈ X and u ∈ U , we have ∂h k ∂x p ( x, u ) = ∂f k ∂x p ( x, u ) + m (cid:88) l =1 ∂G k,l ∂x p ( x, u ) u l . Thus, a consequence of

Rademacher’s theorem [22, Theorem . . ] enables to write that ∂f k ∂x p ∈ L f k [ − , and ∂G k,l ∂x p ∈ L G k,l [ − , . Therefore, we have that ∂h∂x ( S i , V i ) = J f + J G V i . Finally,merging ∂h∂x and ∂h∂u into (24) provides the over-approximatingset (14). (cid:3) ROOF OF T HEOREM V = { u } . Speciﬁcally, since u is such that ˙ u ( t ) = 0 for all t ∈ [ t i , t i + ∆ t ] , we have that v ( t i ) = u i , v ([ t i , t i + ∆ t ]) = u i , and v (1) ([ t i , t i + ∆ t ]) = 0 .Applying Theorem 2, R + i +1 as a function of u i is thereforereduced to a term independent of u i , a term linear in u i , anda term quadratic in u i .Speciﬁcally, it is immediate from (14) that the term inde-pendent of u i is given by B i (18). The term linear in u i is (cid:0) G ( R + i )∆ t (cid:1) u i + (cid:0) J G u i f ( S j ) + J f G ( S j ) u i (cid:1) ∆ t . (26)The k -th component of J G u i f ( S i ) is given by n (cid:88) p =1 ( J G u i ) k,p ( f ( S i )) p = n (cid:88) p =1 (cid:0) m (cid:88) l =1 ( J G ) k,l,p ( u i ) l (cid:1) ( f ( S i )) p = m (cid:88) l =1 (cid:0) n (cid:88) p =1 ( J G ) k,l,p ( f ( S i )) p (cid:124) (cid:123)(cid:122) (cid:125) ( J T G f ( S i )) k,l (cid:1) ( u i ) l = ( J T G f ( S i ) u i ) k . (27)The quadratic term . t J G u i G ( S i ) u i in u i satisﬁes J G u i G ( S i ) u i ⊆ J G U G ( S i ) u i , (28) J G u i G ( S i ) u i ⊆ J G u i G ( S i ) U = J T G G ( S i ) U u i . (29)Henceforth, by combining (26), (27), and (28), the linearand quadratic terms in u i of R + i +1 can be over-approximatedby the linear term A + i u i with A + i given by (18). Similarly,using (26), (27), and (29) helps to prove that the linear andquadratic terms in u i can be also over-approximated by A − i u i with A − i given by (18). Hence, we deduce the intersectionin (17). (cid:3) P ROOF OF T HEOREM x + i +1 = b i + A + i u i ∈ B i + A + i u i and ˆ x + i +1 =ˆ b i + ˆ A + i u i ∈ B i + A + i u i with u i ∈ U , we have that | c ( · , u i , x + i +1 ) − c ( · , u i , ˆ x + i +1 ) | = (cid:12)(cid:12) (cid:20) x + i +1 u i (cid:21) T (cid:20) Q SS T R (cid:21) (cid:20) x + i +1 u i (cid:21) − (cid:20) ˆ x + i +1 u i (cid:21) T (cid:20) Q SS T R (cid:21) (cid:20) ˆ x + i +1 u i (cid:21) + (cid:20) qr (cid:21) T (cid:20) x + i +1 − ˆ x + i +1 (cid:21) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:0) x + i +1 − ˆ x + i +1 (cid:1) T (cid:0) Q (cid:0) x + i +1 + ˆ x + i +1 (cid:1) + 2 Su i + q (cid:1)(cid:12)(cid:12) ≤ (cid:0) (cid:107) ( b i − ˆ b i ) + ( A + i − ˆ A + i ) u i (cid:107) (cid:1)(cid:0) (cid:107) Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) + 2 Su i + q (cid:107) (cid:1) . (30)By deﬁnition of the width w , | b i − ˆ b i | ≤ w ( B i ) and | A + i − ˆ A + i | ≤ w ( A + i ) . Hence, we can deduce that (cid:107) ( b i − ˆ b i ) + ( A + i − ˆ A + i ) u i (cid:107) ≤ (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) . (31)Furthermore, interval arithmetic provides that ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i ∈ (cid:0) B i + 2 A + i u i (cid:1) ∩ X . Hence, we have (cid:107) Su i + q + Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) (cid:107) ≤ (cid:107) | S U| + q + 2 | Q ( B i + A + i U ) |(cid:107) , (32) (cid:107) Su i + q + Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) (cid:107) ≤ (cid:107) | S U| + q + 2 | Q X |(cid:107) . (33)Therefore, combining (33) and (32), we have that (cid:107) Su i + q + Q (cid:0) ( b i + ˆ b i ) + ( A + i + ˆ A + i ) u i (cid:1) (cid:107) ≤ K ( A + i ) . (34)As a consequence of (30), (31), and (34), we can write | c ( · , u i , x + i +1 ) − c ( · , u i , ˆ x + i +1 ) |≤(cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) . (35)Let u ∗ i be the optimal solution of the one-step optimalcontrol problem (9) and x i +1 ( u ∗ i ) ∈ B i + A + i u ∗ i be thecorresponding unknown next state. By optimality of u ∗ i , wehave that c ( x i , u ∗ i , x i +1 ( u ∗ i )) ≤ c ( x i , u i , x i +1 ( u i )) for all u i ∈ U . If ˆ u i is the optimal solution of the optimistic controlproblem (20) and ˆ x i +1 (ˆ u i ) ∈ B i + A + i u ∗ i is the known nextstate for which the optimum of (20) is attained, then | c (cid:63)i − c opt i | = | c ( x i , u ∗ i , x i +1 ( u ∗ i )) − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) |≤ | c ( x i , ˆ u i , x i +1 (ˆ u i )) − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) |≤ (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) . Similarly, using A − i instead of A + i , we can prove that | c (cid:63)i − c opt i | ≤ (cid:107) w ( B i ) + w ( A − i ) |U|(cid:107) K ( A − i ) . Hence we obtain thesuboptimality bounds (23) for the optimistic control problem.Finally, let ˆ u i be the optimal solution of the idealistic controlproblem (21) and ˆ x i +1 (ˆ u i ) = b ide i + A ide i ˆ u i be the next statefor which the optimum is attained. We have that | c ide i − c (cid:63)i | = (cid:40) c ide i − c ( x i , u (cid:63)i , x i +1 ( u ∗ i )) , if c ide i ≥ c (cid:63)i c (cid:63)i − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) , otherwise ≤  c ( x i , u (cid:63)i , ˆ x i +1 ( u (cid:63)i )) − c ( x i , u (cid:63)i , x i +1 ( u ∗ i )) , if c ide i ≥ c (cid:63)i c ( x i , ˆ u i , x i +1 (ˆ u i )) − c ( x i , ˆ u i , ˆ x i +1 (ˆ u i )) , otherwise ≤ max (cid:0) (cid:107) w ( B i ) + w ( A + i ) |U|(cid:107) K ( A + i ) , (cid:107) w ( B i ) + w ( A − i ) |U|(cid:107) K ( A − i ) (cid:1) , where the second inequality follows from the deﬁnition ofoptimality of (9) and (21), and the last inequality followssimilarly to the inequalities obtained for the optimistic sub-optimality bounds.where the second inequality follows from the deﬁnition ofoptimality of (9) and (21), and the last inequality followssimilarly to the inequalities obtained for the optimistic sub-optimality bounds.