Convergence rates of moment-sum-of-squares hierarchies for optimal control problems
Milan Korda, Didier Henrion, Colin N. Jones

Draft of October 7, 2018
Abstract
We study the convergence rate of moment-sum-of-squares hierarchies of semidefinite programs for optimal control problems with polynomial data. It is known that these hierarchies generate polynomial under-approximations to the value function of the optimal control problem and that these under-approximations converge in the $L^1$ norm to the value function as their degree $d$ tends to infinity. We show that the rate of this convergence is $O(1/\log\log d)$. We treat in detail the continuous-time infinite-horizon discounted problem and describe in brief how the same rate can be obtained for the finite-horizon continuous-time problem and for the discrete-time counterparts of both problems.

Keywords: optimal control, moment relaxations, polynomial sums of squares, semidefinite programming, approximation theory.
1 Introduction

The moment-sum-of-squares hierarchy (also known as the Lasserre hierarchy) of semidefinite programs was originally introduced in [10] in the context of polynomial optimization. It allows one to solve globally non-convex optimization problems at the price of solving a sequence, or hierarchy, of convex semidefinite programming problems, with convergence guarantees; see, e.g., [13] for an introductory survey, [11] for a comprehensive overview and [3] for control applications.

This hierarchy was extended in [12] to polynomial optimal control, and later on in [6] to global approximations of semi-algebraic sets, originally motivated by volume and integral estimation problems. The approximation hierarchy for semi-algebraic sets derived in [6] was later used in [7] to compute approximations of the region of attraction of polynomial control systems; convergence of the associated value-function approximations (in the $L^1$ norm on compact domains, or equivalently, almost uniformly) was established in [8].

The current paper is motivated by the analysis of the rate of convergence of the moment-sum-of-squares hierarchy for static polynomial optimization achieved in [14]. We show that a similar analysis can be carried out in the dynamic case, i.e., for assessing the rate of convergence of the moment-sum-of-squares hierarchy for polynomial optimal control.

(Affiliations: Milan Korda and Colin N. Jones, Laboratoire d'Automatique, École Polytechnique Fédérale de Lausanne, Station 9, CH-1015 Lausanne, Switzerland, {milan.korda,colin.jones}@epfl.ch. Didier Henrion, CNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France, [email protected]; Université de Toulouse, LAAS, F-31400 Toulouse, France; and Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, CZ-16626 Prague, Czech Republic.)
For ease of exposition, we focus on the discounted infinite-horizon continuous-time optimal control problem and briefly describe (in Section 5) how the same convergence rate can be obtained for the finite-time continuous version of the problem and for the discrete counterparts of both problems.

Our main Theorem 4 gives estimates on the rate of convergence of the polynomial under-approximations to the value function in the $L^1$ norm. As a direct outcome of this result, we derive in Corollary 2 that the rate of convergence is in $O(1/\log\log d)$, where $d$ is the degree of the polynomial approximation. As far as we know, this is the first estimate of this kind in the context of moment-sum-of-squares hierarchies for polynomial optimal control.

Notation. The set of all continuous functions on a set $X \subset \mathbb{R}^n$ is denoted by $C^0(X)$; the set of all $k$-times continuously differentiable functions is denoted by $C^k(X)$. For $h \in C^0(X)$ we denote $\|h\|_{C^0(X)} := \max_{x\in X} |h(x)|$, and for $h \in C^1(X)$ we denote $\|h\|_{C^1(X)} := \max_{x\in X} |h(x)| + \max_{x\in X} \|\nabla h(x)\|_2$, where $\nabla h$ is the gradient of $h$. The $L^1$ norm with respect to a measure $\mu$ of a measurable function $h : \mathbb{R}^n \to \mathbb{R}$ is denoted by $\|h\|_{L^1(\mu)} := \int_{\mathbb{R}^n} |h(x)|\, \mu(dx)$. The set of all multivariate polynomials in a variable $x$ of total degree no more than $d$ is denoted by $\mathbb{R}[x]_d$. The symbol $\mathbb{R}[x]_d^n$ denotes the $n$-fold cartesian product of this set, i.e., the set of all vectors with $n$ entries, where each entry is a polynomial from $\mathbb{R}[x]_d$. The interior of a set $X \subset \mathbb{R}^n$ is denoted by $\mathrm{Int}(X)$.

2 Problem setup
Consider the discounted infinite-horizon optimal control problem

\[
\begin{array}{rcl}
V^\star(x_0) := & \displaystyle\inf_{u(\cdot),\,x(\cdot)} & \displaystyle\int_0^\infty e^{-\beta t}\, l(x(t),u(t))\, dt \\[1mm]
& \mathrm{s.t.} & x(t) = x_0 + \displaystyle\int_0^t f(x(s),u(s))\, ds \quad \forall\, t \in [0,\infty) \\[1mm]
& & x(t) \in X,\;\; u(t) \in U \quad \forall\, t \in [0,\infty),
\end{array} \tag{1}
\]

where $\beta > 0$ is a discount factor, $f \in \mathbb{R}[x,u]^n_{d_f}$ and $l \in \mathbb{R}[x,u]_{d_l}$ are given multivariate polynomials, and the state and input constraint sets $X$ and $U$ are of the form

\[
X = \{x \in \mathbb{R}^n \mid g_i^X(x) \ge 0,\; i = 1,\ldots,n_X\}, \qquad
U = \{u \in \mathbb{R}^m \mid g_i^U(u) \ge 0,\; i = 1,\ldots,n_U\},
\]

where $g_i^X \in \mathbb{R}[x]_{d_i^X}$ and $g_i^U \in \mathbb{R}[u]_{d_i^U}$ are multivariate polynomials. The function $V^\star$ in (1) is called the value function of the optimal control problem (1).

Let us recall the Hamilton-Jacobi-Bellman inequality

\[
l(x,u) - \beta V(x) + \nabla V(x) \cdot f(x,u) \ge 0 \quad \forall\, (x,u) \in X \times U, \tag{2}
\]

which plays a crucial role in the derivation of the convergence rates. In particular, for any function $V \in C^1(X)$ that satisfies (2) it holds that

\[
V(x) \le V^\star(x) \quad \forall\, x \in X. \tag{3}
\]

The following polynomial sum-of-squares optimization problem provides a sequence of lower bounds to the value function indexed by the degree $d$:

\[
\begin{array}{rl}
\displaystyle\max_{V \in \mathbb{R}[x]_d} & \displaystyle\int_X V(x)\, d\mu(x) \\[1mm]
\mathrm{s.t.} & l - \beta V + \nabla V \cdot f \in Q_{d+d_f}(X \times U),
\end{array} \tag{4}
\]

where $\mu$ is a given probability measure supported on $X$ (e.g., the uniform distribution), and

\[
Q_{d+d_f}(X \times U) := \Big\{\, s_0 + \sum_{i=1}^{n_X} g_i^X s_i^X + \sum_{i=1}^{n_U} g_i^U s_i^U \;:\; s_0 \in \Sigma_{\lfloor (d+d_f)/2\rfloor},\; s_i^X \in \Sigma_{\lfloor (d+d_f-d_i^X)/2\rfloor},\; s_i^U \in \Sigma_{\lfloor (d+d_f-d_i^U)/2\rfloor} \Big\}
\]

is the truncated quadratic module associated with the sets $X$ and $U$ (see [13] or [11]), where $\Sigma_k$ is the cone of sums of squares of polynomials of degree up to $2k$. Note that whenever $V$ is feasible in (4), then $V$ satisfies Bellman's inequality (2), because polynomials in $Q_{d+d_f}(X \times U)$ are non-negative on $X \times U$ by construction.
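For concreteness, the shape of program (4) can be illustrated on a toy scalar instance. The sketch below is an illustration only, not the actual hierarchy: it uses the hypothetical data $l(x,u) = x^2$, $f(x,u) = x + u$, $\beta = 1$, $X = U = [-1,1]$, restricts $V$ to degree 2, and replaces the SOS membership constraint by sampled nonnegativity of $l - \beta V + \nabla V \cdot f$ on a grid, which turns (4) into a linear program solvable with scipy.

```python
import numpy as np
from scipy.optimize import linprog

# Toy surrogate of program (4): l(x,u) = x^2, f(x,u) = x + u,
# X = U = [-1, 1], beta = 1, mu = uniform on X.
# The SOS membership constraint is replaced by nonnegativity of
# l - beta*V + V'*f on a finite grid (a sampled LP surrogate, NOT the SDP (4)).
beta = 1.0
xs = np.linspace(-1.0, 1.0, 21)
us = np.linspace(-1.0, 1.0, 21)

# Decision variables: coefficients (c0, c1, c2) of V(x) = c0 + c1*x + c2*x^2.
# Objective: maximize  int_X V dmu = c0 + c2/3  (mu uniform on [-1,1]).
obj = -np.array([1.0, 0.0, 1.0 / 3.0])  # linprog minimizes

# One constraint per grid point (x,u):  beta*V(x) - V'(x)*f(x,u) <= l(x,u).
A, b = [], []
for x in xs:
    for u in us:
        A.append([beta, beta * x - (x + u), beta * x**2 - 2.0 * x * (x + u)])
        b.append(x**2)

res = linprog(obj, A_ub=np.array(A), b_ub=np.array(b),
              bounds=[(None, None)] * 3, method="highs")
assert res.success
c0, c1, c2 = res.x
print("V(x) = %.3f + %.3f x + %.3f x^2" % (c0, c1, c2))
```

The true hierarchy instead imposes membership in $Q_{d+d_f}(X \times U)$ via semidefinite programming; the grid version is only a cheap surrogate, and its feasible polynomials need not be genuine under-approximations between grid points.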
Therefore any polynomial $V$ feasible in (4) also satisfies (3) and hence is an under-approximation of $V^\star$ on $X$.

The truncated quadratic module is essential to the proof of convergence of the moment-sum-of-squares hierarchy in the static polynomial optimization case [10], which is based on Putinar's Positivstellensatz [15]. We recall that some polynomials of degree $d + d_f$ non-negative on $X \times U$ may not belong to $Q_{d+d_f}(X \times U)$ [11]. On the other hand, optimizing over the polynomials belonging to $Q_{d+d_f}(X \times U)$ is "simple" (it translates to semidefinite programming), while optimizing over the cone of non-negative polynomials is very difficult in general. In particular, the optimization problem (4) translates to a finite-dimensional semidefinite programming problem (SDP). The fact that the truncated quadratic module has an explicit SDP representation, and hence can be tractably optimized over, is one of the main reasons for the popularity of the moment-sum-of-squares hierarchies across many fields of science.

Throughout the paper we impose the following standing assumptions.

Assumption 1
The following conditions hold:

(a) $X \subset [-1,1]^n$ and $U \subset [-1,1]^m$.

(b) The sets of polynomials $(g_i^X)_{i=1}^{n_X}$ and $(g_i^U)_{i=1}^{n_U}$ both satisfy the Archimedian condition.

(c) $0 \in \mathrm{Int}(X)$ and $0 \in \mathrm{Int}(U)$.

(d) The function $\nabla V^\star$ is Lipschitz continuous on $X$.

(e) The set $f(x,U)$ is convex for all $x \in X$, and the function $v \mapsto \inf_{u \in U} \{\, l(x,u) \mid v = f(x,u) \,\}$ is convex for all $x \in X$.

Assumptions (a) and (b) are made without loss of generality since the sets $X$ and $U$ are assumed to be compact and hence can be scaled such that they are included in the unit ball; adding the redundant ball constraints $1 - \|x\|_2^2 \ge 0$ and $1 - \|u\|_2^2 \ge 0$ to the description of $X$ and $U$ then implies the Archimedianity condition. Assumption (c) essentially requires that the sets $X$ and $U$ have nonempty interiors (a mild assumption), since then a change of coordinates can always be carried out such that the origin is in the interior of these sets. Assumption (d) is an important regularity assumption necessary for the subsequent developments. Assumption (e) is a standard assumption ensuring that the value function of the so-called relaxed formulation of the problem (1) coincides with $V^\star$ (see, e.g., [18]); it is satisfied, e.g., for input-affine systems with input-affine cost function provided that $U$ is convex. This class of problems is by far the largest and practically most relevant one for which this assumption holds, although other problems satisfying it exist as well.

Under Assumption 1, the hierarchy of lower bounds generated by problem (4) converges from below in the $L^1$ norm to the value function $V^\star$; see, e.g., [8]:

Theorem 1
There exists a $d_0 \ge 0$ such that problem (4) is feasible for all $d \ge d_0$. In addition, $V \le V^\star$ for any $V$ feasible in (4), and $\lim_{d\to\infty} \|V^\star - V_d^\star\|_{L^1(\mu)} = 0$, where $V_d^\star$ is an optimal solution to (4).

The goal of this paper is to derive bounds on the convergence rate of $V_d^\star$ to $V^\star$.

Footnotes: A sufficient condition for a set of polynomials $(g_i)_{i=1}^{n}$ to satisfy the Archimedian condition is $g_i = N - \|x\|_2^2$ for some $i$ and some $N \ge 0$, which is a non-restrictive condition provided that the set defined by the $g_i$'s is compact and an estimate of its diameter is known; for a precise definition of this condition see Section 3.6.2 of [13]. A system is input-affine if $f(x,u) = f_x(x) + f_u(x)u$ for some functions $f_x$ and $f_u$. For example, consider $l(x,u) = x^2$, $f(x,u) = x + u$, $U = [-1,1]$.

3 Convergence rate
The convergence rate is a consequence of the following fundamental results from approximation theory and polynomial optimization.
Theorem 2 (Bagby et al. [1]) If $h : X \to \mathbb{R}$ is a function such that $\nabla h$ is Lipschitz continuous on $X$, then there exists a sequence of polynomials $(p_d)_{d=1}^{\infty}$ satisfying $\deg(p_d) \le d$ such that

\[
\|h - p_d\|_{C^1(X)} \le c_0/d
\]

for some constant $c_0 \ge 0$ depending on $h$ and $X$ only.

Now we turn to the second fundamental result. Given a polynomial $p \in \mathbb{R}[x]_d$ expressed in a multivariate monomial basis as

\[
p(x) = \sum_{\alpha \in \mathbb{N}^n,\, |\alpha| \le d} \beta_\alpha x^\alpha
\]

with $|\alpha| = \sum_{i=1}^n \alpha_i$ and $x^\alpha = \prod_{i=1}^n x_i^{\alpha_i}$, we define

\[
\|p\|_{\mathbb{R}[x]} = \max_\alpha \frac{|\beta_\alpha|}{\binom{|\alpha|}{\alpha}}, \tag{5}
\]

where the multinomial coefficient $\binom{|\alpha|}{\alpha}$ is defined by

\[
\binom{|\alpha|}{\alpha} := \frac{|\alpha|!}{\alpha_1! \cdots \alpha_n!}.
\]

Theorem 3 (Nie & Schweighofer [14])
Let $p \in \mathbb{R}[x,u]_{d_p}$ and let $p_{\min} := \min_{(x,u) \in X \times U} p(x,u)$ with $p_{\min} > 0$. Then $p \in Q_d(X \times U)$ provided that

\[
d \ge c_2 \exp\Big( \Big( d_p^2 (n+m)^{d_p} \frac{\|p\|_{\mathbb{R}[x,u]}}{p_{\min}} \Big)^{c_2} \Big), \tag{6}
\]

where the constant $c_2$ depends only on the sets $X$ and $U$.

In the following developments it will be crucial to bound the norm $\|\cdot\|_{\mathbb{R}[x]}$ of a polynomial by its supremum norm $\|\cdot\|_{C^0(X)}$. We remark that such a bound is possible only for a "generic" set $X$ such that any polynomial vanishing on $X$ necessarily vanishes everywhere. A sufficient condition for this is $\mathrm{Int}(X) \ne \emptyset$. This is the reason for Assumption 1(c).

Lemma 1 If $p \in \mathbb{R}[x]_d$, $x \in \mathbb{R}^n$, then

\[
\|p\|_{\mathbb{R}[x]} \le 3^{d+1} \|p\|_{C^0([-1,1]^n)} \tag{7}
\]

for all $d \ge 0$.

Proof: The idea is to use a multivariate Markov inequality to bound the derivatives of the polynomial at zero (and hence its coefficients) in terms of its supremum norm on $[-1,1]^n$. Let $p = \sum_\alpha \beta_\alpha x^\alpha \in \mathbb{R}[x]_d$. From [17, Theorem 6], we have

\[
\Big| \frac{\partial^{|\alpha|} p}{\partial x^\alpha}(0) \Big| \le \big| T_d^{(|\alpha|)}(0) + i\, S_d^{(|\alpha|)}(0) \big| \cdot \|p\|_{C^0([-1,1]^n)}
\]

for all multi-indices $\alpha$ satisfying $|\alpha| \le d$, where $i = \sqrt{-1}$, $T_d(y) = \cos(d \arccos(y))$, $y \in [-1,1]$, is the $d$-th univariate Chebyshev polynomial of the first kind, $S_d(y) = \sin(d \arccos(y)) = d^{-1}\sqrt{1-y^2}\, T_d'(y)$, $y \in [-1,1]$, and $h^{(k)}$ signifies the $k$-th derivative of a function $h : \mathbb{R} \to \mathbb{R}$. It is easy to see that $S_d^{(k)}(0) = d^{-1} T_d^{(k+1)}(0)$ and hence

\[
\big| T_d^{(|\alpha|)}(0) + i\, S_d^{(|\alpha|)}(0) \big| \le \big| T_d^{(|\alpha|)}(0) \big| + \frac{1}{d} \big| T_d^{(|\alpha|+1)}(0) \big| = |\alpha|!\, |t_{d,|\alpha|}| + \frac{(|\alpha|+1)!}{d}\, |t_{d,|\alpha|+1}| \le \Big[\, |\alpha|! + \frac{(|\alpha|+1)!}{d} \,\Big] \bar t_d,
\]

where $t_{d,k}$ denotes the $k$-th coefficient of $T_d$ when expressed in the monomial basis (i.e., $T_d(y) = \sum_{k=0}^d t_{d,k} y^k$) and $\bar t_d = \max_{k \in \{0,\ldots,d\}} |t_{d,k}|$. Since $\beta_\alpha = (\alpha_1! \cdots \alpha_n!)^{-1} \frac{\partial^{|\alpha|} p}{\partial x^\alpha}(0)$, we get

\[
\frac{|\beta_\alpha|}{\binom{|\alpha|}{\alpha}} = \frac{(\alpha_1! \cdots \alpha_n!)\, |\beta_\alpha|}{|\alpha|!} = \frac{1}{|\alpha|!} \Big| \frac{\partial^{|\alpha|} p}{\partial x^\alpha}(0) \Big| \le \Big[ 1 + \frac{|\alpha|+1}{d} \Big] \bar t_d\, \|p\|_{C^0([-1,1]^n)}.
\]

In view of (5) and since $|\alpha| \le d$ we get

\[
\|p\|_{\mathbb{R}[x]} \le \Big[ 2 + \frac{1}{d} \Big] \bar t_d\, \|p\|_{C^0([-1,1]^n)}.
\]

It remains to bound $\bar t_d$. From the generating recurrence $T_{d+1}(y) = 2y\, T_d(y) - T_{d-1}(y)$, starting from $T_0 = 1$ and $T_1 = y$, it follows that $\bar t_d \le \tilde t_d$, where $\tilde t_d$ solves the linear difference equation

\[
\tilde t_{d+1} = 2 \tilde t_d + \tilde t_{d-1}
\]

with the initial conditions $\tilde t_0 = 1$ and $\tilde t_1 = 1$. The solution to this equation is

\[
\tilde t_d = \tfrac12 (1+\sqrt{2})^d + \tfrac12 (1-\sqrt{2})^d \le 3^d, \quad d \ge 0.
\]

Therefore $\bar t_d \le 3^d$ for $d \ge 0$, and

\[
\|p\|_{\mathbb{R}[x]} \le \Big[ 2 + \frac{1}{d} \Big] 3^d\, \|p\|_{C^0([-1,1]^n)} \le 3^{d+1} \|p\|_{C^0([-1,1]^n)}, \quad d \ge 1.
\]

Since $\|p\|_{\mathbb{R}[x]} = \|p\|_{C^0([-1,1]^n)}$ for $d = 0$, the result follows. $\square$

In order to state an immediate corollary of this result, crucial for subsequent developments, we define

\[
r := \frac{1}{\sup\{\, s > 0 \mid [-s,s]^{n+m} \subset X \times U \,\}}, \tag{8}
\]

which is the reciprocal of the half-side length of the largest box centered at the origin included in $X \times U$. By Assumption 1(a) and (c), we have $r \in [1,\infty)$.

Corollary 1 If $p \in \mathbb{R}[x,u]_d$, then

\[
\|p\|_{\mathbb{R}[x,u]} \le k(d)\, \|p\|_{C^0(X \times U)}, \tag{9}
\]

where $k(d) = 3^{d+1} r^d$ with $r$ defined in (8).

Proof:
Set $\tilde p((x,u)) := p(r^{-1}(x,u))$. Then we have

\[
\|\tilde p\|_{C^0([-1,1]^{n+m})} = \|p\|_{C^0([-1/r,1/r]^{n+m})} \le \|p\|_{C^0(X \times U)} \tag{10}
\]

since $[-1/r,1/r]^{n+m} \subset X \times U$ by the definition (8) of $r$. In addition,

\[
\|\tilde p\|_{\mathbb{R}[x,u]} = \max_\alpha \frac{r^{-|\alpha|} |\beta_\alpha|}{\binom{|\alpha|}{\alpha}} = r^{-d} \max_\alpha \frac{r^{d-|\alpha|} |\beta_\alpha|}{\binom{|\alpha|}{\alpha}} \ge r^{-d} \|p\|_{\mathbb{R}[x,u]}. \tag{11}
\]

Combining (10), (11) and Lemma 1 we get

\[
\|p\|_{\mathbb{R}[x,u]} \le r^d \|\tilde p\|_{\mathbb{R}[x,u]} \le 3^{d+1} r^d \|\tilde p\|_{C^0([-1,1]^{n+m})} \le 3^{d+1} r^d \|p\|_{C^0(X \times U)} = k(d)\, \|p\|_{C^0(X \times U)},
\]

as desired. $\square$

Now we turn to analyzing the Bellman inequality (2). The following immediate property of this inequality will be of importance:
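Before stating it, the two coefficient bounds used in the proof of Lemma 1, namely the difference-equation majorant $\tilde t_d$ and the estimate $\bar t_d \le 3^d$, lend themselves to a quick numerical sanity check. A small sketch assuming only numpy:

```python
import numpy as np

# Check, for small degrees d, that the largest monomial coefficient
# \bar t_d of the Chebyshev polynomial T_d is dominated by the
# difference-equation majorant \tilde t_d, which in turn is at most 3^d
# (the bounds used in the proof of Lemma 1).
for d in range(15):
    e = np.zeros(d + 1)
    e[d] = 1.0                                   # T_d in the Chebyshev basis
    mono = np.polynomial.chebyshev.cheb2poly(e)  # T_d in the monomial basis
    t_bar = np.max(np.abs(mono))
    t_tilde = ((1 + np.sqrt(2))**d + (1 - np.sqrt(2))**d) / 2
    assert t_bar <= t_tilde + 1e-9 <= 3**d + 1e-9
print("coefficient bounds verified for d = 0..14")
```

Checking small degrees is of course only a sanity check, not a proof; the general statement follows from the inductive step $2 \cdot 3^d + 3^{d-1} \le 3^{d+1}$.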
Lemma 2
Let $V$ satisfy (2) and let $a \in \mathbb{R}$. Then $\tilde V := V - a$ satisfies

\[
l - \beta \tilde V + \nabla \tilde V \cdot f \ge \beta a \quad \forall\, (x,u) \in X \times U.
\]

Proof:
We have $l - \beta \tilde V + \nabla \tilde V \cdot f = l - \beta V + \nabla V \cdot f + \beta a \ge \beta a$, since $V$ satisfies (2). $\square$

We will also need a result which estimates the distance between the best polynomial approximation of a given degree to the value function and polynomials of the same degree satisfying Bellman's inequality. A similar result in discrete time and with discrete state and control spaces can be found in [5].
Lemma 3
Let $\hat V_d \in \arg\min_{V \in \mathbb{R}[x]_d} \|V - V^\star\|_{C^1(X)}$. Then there exists a polynomial $\tilde V_d \in \mathbb{R}[x]_d$ satisfying (2) and such that

\[
\|\tilde V_d - V^\star\|_{C^0(X)} \le \|\hat V_d - V^\star\|_{C^1(X)} \Big( 2 + \frac{\|f\|_{C^0}}{\beta} \Big). \tag{12}
\]

Proof:
Let $\tilde V_d := \hat V_d - a$. We will find an $a \ge 0$ such that $\tilde V_d$ satisfies the Bellman inequality. We have

\begin{align*}
l - \beta \tilde V_d + \nabla \tilde V_d \cdot f &= l - \beta \hat V_d + \nabla \hat V_d \cdot f + \beta a \\
&= l - \beta V^\star + \nabla V^\star \cdot f + \beta (V^\star - \hat V_d) + (\nabla \hat V_d - \nabla V^\star) \cdot f + \beta a \\
&\ge \beta (V^\star - \hat V_d) + (\nabla \hat V_d - \nabla V^\star) \cdot f + \beta a \\
&\ge -\beta \|\hat V_d - V^\star\|_{C^0(X)} - \|\nabla \hat V_d - \nabla V^\star\|_{C^0(X)} \|f\|_{C^0} + \beta a.
\end{align*}

Hence, setting

\[
a := \Big( 1 + \frac{\|f\|_{C^0}}{\beta} \Big) \|\hat V_d - V^\star\|_{C^1(X)},
\]

the polynomial $\tilde V_d$ satisfies Bellman's inequality (2), and the estimate (12) holds since $\|\tilde V_d - V^\star\|_{C^0(X)} \le \|\hat V_d - V^\star\|_{C^0(X)} + a$. $\square$

Now we are in a position to prove our main result, which bounds the gap, in the $L^1$ norm, between the value function $V^\star$ of the optimal control problem (1) and any optimal solution $V_d^\star$ of the sum-of-squares program (4):

Theorem 4
It holds that $\|V^\star - V_d^\star\|_{L^1(\mu)} < \epsilon$ for all integers

\[
d \;\ge\; c_2 \cdot \exp\Big[ \Big( d_p(\epsilon)^2\, \big( 3r(n+m) \big)^{d_p(\epsilon)}\, \frac{M + \beta\epsilon + \tfrac12 \epsilon \|f\|_{C^0}}{\beta \epsilon} \Big)^{c_2} \Big] \tag{13}
\]
\[
= O\Big( \exp\big[\, \epsilon^{-c_3} (3(n+m)r)^{c_3/\epsilon} \,\big] \Big), \tag{14}
\]

where $d_p(\epsilon) = \lceil c_1/\epsilon \rceil + d_f$, $M = \|l - \beta V^\star + \nabla V^\star \cdot f\|_{C^0(X \times U)} < \infty$, $r$ is defined in (8), $c_1 = 2 c_0 (2\beta + \|f\|_{C^0})/\beta$, the constant $c_0$ depends only on $V^\star$, $X$ and $U$, the constant $c_2$ depends only on the sets $X$ and $U$, and $c_3 > 0$ is a suitable constant.

Proof:
According to Theorem 2 and Lemma 3 we can find a polynomial $\tilde V_{\tilde d}$ of degree no more than

\[
\tilde d = \Big\lceil \frac{2 c_0}{\epsilon} \Big( 2 + \frac{\|f\|_{C^0}}{\beta} \Big) \Big\rceil = \lceil c_1/\epsilon \rceil
\]

such that $\|V^\star - \tilde V_{\tilde d}\|_{C^1} \le \epsilon/2$ and such that $\tilde V_{\tilde d}$ satisfies the Bellman inequality (2). Let $V$ be an arbitrary polynomial feasible in (4) for some $d \ge 0$. Then

\[
\|V^\star - V_d^\star\|_{L^1} \le \|V^\star - V\|_{L^1} \le \|V^\star - V\|_{C^0} \le \|V^\star - \tilde V_{\tilde d}\|_{C^0} + \|V - \tilde V_{\tilde d}\|_{C^0} \le \tfrac{\epsilon}{2} + \|V - \tilde V_{\tilde d}\|_{C^0}. \tag{15}
\]

Hence, the goal is to find a degree $d \ge 0$ and a polynomial $V$ feasible in (4) for that $d$ satisfying $\|V - \tilde V_{\tilde d}\|_{C^0} \le \epsilon/2$. Setting $V := \tilde V_{\tilde d} - \epsilon/2$, we clearly have $\|V - \tilde V_{\tilde d}\|_{C^0} \le \epsilon/2$; in addition, using Lemma 2 we know that

\[
l - \beta V + \nabla V \cdot f \ge \tfrac12 \beta \epsilon > 0 \quad \text{on } X \times U, \tag{16}
\]

i.e., $V$ strictly satisfies the Bellman inequality, and as a consequence of Putinar's Positivstellensatz [15] there exists a degree $d \ge 0$ such that $V$ is feasible in (4). To bound the degree $d$ we apply the bound of Theorem 3 to $p := l - \beta V + \nabla V \cdot f$. From (16) we know that $p_{\min} \ge \beta\epsilon/2$. Next, we need to bound $\|p\|_{\mathbb{R}[x,u]}$ by bounding $\|p\|_{C^0(X \times U)}$ and using Corollary 1. We have

\begin{align*}
\|p\|_{C^0} &= \|l - \beta V + \nabla V \cdot f\|_{C^0} = \|l - \beta \tilde V_{\tilde d} + \nabla \tilde V_{\tilde d} \cdot f + \tfrac12 \beta\epsilon\|_{C^0} \\
&\le \|l - \beta V^\star + \nabla V^\star \cdot f\|_{C^0} + \beta \|V^\star - \tilde V_{\tilde d}\|_{C^0} + \|\nabla V^\star - \nabla \tilde V_{\tilde d}\|_{C^0} \|f\|_{C^0} + \tfrac12 \beta\epsilon \\
&\le M + \tfrac12 \beta\epsilon + \tfrac12 \epsilon \|f\|_{C^0} + \tfrac12 \beta\epsilon = M + \beta\epsilon + \tfrac12 \epsilon \|f\|_{C^0}.
\end{align*}

It remains to bound the degree of $p$. We have (assuming without loss of generality that $\tilde d + d_f - 1 \ge \deg(l)$)

\[
\deg(p) = \deg\big( l - \beta V + \nabla V \cdot f \big) \le \tilde d + d_f - 1 \le \lceil c_1/\epsilon \rceil + d_f =: d_p(\epsilon).
\]

Using Theorem 3 and Corollary 1, we conclude that for

\[
d \ge c_2 \cdot \exp\Big[ \Big( d_p^2 (n+m)^{d_p}\, k(d_p)\, \frac{2 \big( M + \beta\epsilon + \tfrac12 \epsilon \|f\|_{C^0} \big)}{\beta\epsilon} \Big)^{c_2} \Big],
\]

the polynomial $V$ is feasible in (4). Since $\|\tilde V_{\tilde d} - V\|_{C^0} \le \epsilon/2$, we conclude from (15) that $\|V^\star - V_d^\star\|_{L^1} \le \epsilon$. Inserting the expression $k(d) = 3^{d+1} r^d$ from Corollary 1 and absorbing the constant factors into $c_2$ yields (13), and carrying out the asymptotic analysis for $\epsilon \to 0$ gives

\[
d \ge O\Big( \exp\big[\, \epsilon^{-c_3} (3(n+m)r)^{c_3/\epsilon} \,\big] \Big),
\]

which is (14). $\square$

Corollary 2
It holds that $\|V^\star - V_d^\star\|_{L^1(\mu)} = O(1/\log\log d)$.

Proof:
Follows by inverting the asymptotic expression (14), using the fact that $(3(n+m)r)^{2c_3/\epsilon} \ge \epsilon^{-c_3} (3(n+m)r)^{c_3/\epsilon}$ for small $\epsilon$. $\square$

The bound on the convergence rate $O(1/\log\log d)$ should be compared with the bound $O(1/\sqrt[c]{\log d})$ derived in [14] for static polynomial optimization problems (here $c \ge 1$ is a constant). The additional slowdown in the dynamic case stems from the polynomial approximation step: the rate $c_0/d$ of Theorem 2 is known to be tight (in the sense that there exists a Lipschitz continuous function whose best degree-$d$ polynomial approximation converges to it with the rate exactly $C/d$, $C > 0$, in the supremum norm on $[-1,1]^n$); this implies the $1/\epsilon$ dependence of $d_p$ from Theorem 4, which then propagates to a doubly exponential dependence on $1/\epsilon$ through Theorem 3.

Therefore the primary point of improvement of the bound from Theorem 4 and Corollary 2 is the fundamental bound of Theorem 3 derived in [14]. As the authors of [14] remark, this bound is far from tight, at least in two special cases: the univariate case (i.e., $n + m = 1$ in our setting) or the case of a single constraint defining $X \times U$. In these cases the exponential in (6) can be dropped, which results in an $O(1/\log d)$ asymptotic rate of convergence in Corollary 2. In the general case, however, it is unknown whether the exponential in (6) can be dropped or whether the bound (6) can be improved otherwise [14].

5 Extensions
The approach for deriving this bound can be extended to other settings. In particular, similar bounds, with identical $O(1/\log\log d)$ asymptotics, hold for the finite-horizon version of the problem, both in continuous and discrete time, as well as for the discounted discrete-time infinite-horizon variant (the former was treated using the moment-sum-of-squares approach, in continuous time, in [12] and the latter was treated in [16]). The derivation in discrete time is completely analogous and the results hold under milder assumptions (Assumption 1(d) can be replaced by $V^\star$ Lipschitz and Assumption 1(e) can be dropped completely). For the finite-horizon continuous-time problem, the only difference is in Lemma 3, where the constant shift $\tilde V(x) = \hat V(x) - a$ is replaced by the affine shift $\tilde V(t,x) = \hat V(t,x) - a - b(T - t)$ for suitable $a > 0$ and $b > 0$ chosen such that $\tilde V$ satisfies the corresponding finite-time Bellman inequality and its boundary condition (hence the two degrees of freedom).

References

[1] T. Bagby, L. Bos, N. Levenberg. Multivariate simultaneous approximation. Constructive Approximation, 18:569-577, 2002.

[2] G. Chesi. Domain of Attraction: Analysis and Control via SOS Programming. Lecture Notes in Control and Information Sciences, Vol. 415, Springer-Verlag, Berlin, 2011.

[3] G. Chesi. LMI techniques for optimization over polynomials in control: a survey. IEEE Transactions on Automatic Control, 55:2500-2510, 2010.

[4] E. de Klerk, R. Hess, M. Laurent. Improved convergence rates for Lasserre-type hierarchies of upper bounds for box-constrained polynomial optimization. arXiv:1603.03329, March 2016.

[5] D. P. de Farias, B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51:850-865, 2003.

[6] D. Henrion, J. B. Lasserre, C. Savorgnan. Approximate volume and integration for basic semialgebraic sets. SIAM Review, 51(4):722-743, 2009.

[7] D. Henrion, M. Korda.
Convex computation of the region of attraction of polynomial control systems. IEEE Transactions on Automatic Control, 59:297-312, 2014.

[8] D. Henrion, E. Pauwels. Linear conic optimization for nonlinear optimal control. arXiv:1407.1650, 2014.