Linear conic optimization for nonlinear optimal control
arXiv [math.OC], July
Didier Henrion, Edouard Pauwels
Draft of August 21, 2018
Abstract
Infinite-dimensional linear conic formulations are described for nonlinear optimal control problems. The primal linear problem consists of finding occupation measures supported on optimal relaxed controlled trajectories, whereas the dual linear problem consists of finding the largest lower bound on the value function of the optimal control problem. Various approximation results relating the original optimal control problem and its linear conic formulations are developed. As illustrated by a couple of simple examples, these results are relevant in the context of finite-dimensional semidefinite programming relaxations used to approximate numerically the solutions of the infinite-dimensional linear conic problems.
In [8, 9], J.-B. Lasserre described a hierarchy of convex semidefinite programming (SDP) problems that allow one to compute bounds and find global solutions for finite-dimensional nonconvex polynomial optimization problems. Each step in the hierarchy consists of solving a primal moment SDP problem and a dual polynomial sum-of-squares (SOS) SDP problem, corresponding to discretizations of infinite-dimensional linear conic problems, namely a primal linear programming (LP) problem on the cone of nonnegative measures and a dual LP problem on the cone of nonnegative continuous functions. The number of variables (number of moments in the primal SDP, degree of the SOS certificates in the dual SDP) increases when progressing in the hierarchy; global optimality can be ensured by checking rank conditions on the moment matrices, and global optimizers can be extracted by numerical linear algebra. For more information on the moment-SOS hierarchy and its applications, see [11].

This approach was then extended to polynomial optimal control in [10]. Whereas the key idea in [8, 9] was to reformulate a (finite-dimensional) nonconvex polynomial optimization problem on a compact semi-algebraic set into an LP in the (infinite-dimensional) space of measures …

Affiliations: CNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France; Université de Toulouse, LAAS, F-31400 Toulouse, France; Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, CZ-16626 Prague, Czech Republic.
We consider polynomial optimal control problems (POCPs) of the form

$$
\begin{array}{rll}
v^*(t_0,x_0) := \displaystyle\inf_{u} & \displaystyle\int_{t_0}^{T} l(x(t),u(t))\,dt + l_T(x(T)) & \\
\text{s.t.} & \dot x(t) = f(x(t),u(t)), \quad x(t_0) = x_0 & \\
& x(t) \in X, & t \in [t_0,T] \\
& u(t) \in U, & t \in [t_0,T] \\
& x(T) \in X_T &
\end{array}
\tag{1}
$$

where the dot denotes the time derivative, $l \in \mathbb{R}[x,u]$ is a given Lagrangian (integral cost), $l_T \in \mathbb{R}[x]$ is a given terminal cost, $f \in \mathbb{R}[x,u]^n$ is a given dynamics (vector field), $X \subset \mathbb{R}^n$ is a given compact state constraint set, $U \subset \mathbb{R}^m$ is a given compact control constraint set, and $X_T \subset X$ is a given compact terminal state constraint set. Also given are the terminal time $T \geq 0$, the initial time $t_0 \in [0,T]$ and the initial condition $x_0 \in X$. In POCP (1), the infimum is taken with respect to all control laws $u \in L^\infty([t_0,T];U)$, which are bounded functions of time $t$ with values in $U$, and the resulting state trajectories $x \in L^\infty([t_0,T];X)$, which are bounded functions of time $t$ with values in $X$.

Let $A \subset [0,T] \times X$ denote the set of values $(t_0,x_0)$ for which there is a controlled trajectory $(x,u) \in L^\infty([t_0,T]; X \times U)$ starting at $x(t_0) = x_0$ and admissible for POCP (1). The function $(t_0,x_0) \mapsto v^*(t_0,x_0)$ defined in (1) is called the value function, and its domain is $A$.

3 LP formulation
As explained in the introduction, to derive an LP formulation of POCP (1) we have to introduce measures on trajectories, the so-called occupation measures. The first step is to replace classical controls with probability measures, and for this we need additional notation.

Given a compact set $X \subset \mathbb{R}^n$, let $C(X)$ denote the space of continuous functions defined on $X$, and let $C_+(X)$ denote its nonnegative elements, the cone of nonnegative continuous functions on $X$. Let $M_+(X) = C_+(X)'$ denote its topological dual, the set of all nonnegative continuous linear functionals on $C(X)$. By a Riesz representation theorem, these are nonnegative Borel-regular measures, or Borel measures, supported on $X$. The topology in $C_+(X)$ is the strong topology of uniform convergence, whereas the topology in $M_+(X)$ is the weak-star topology. The duality bracket

$$\langle v, \mu \rangle := \int_X v(x)\,\mu(dx)$$

denotes the integration of a function $v \in C(X)$ against a measure $\mu \in M_+(X)$. For background on the weak-star topology see e.g. [12, Section 5.10] or [2, Chapter IV]. Finally, let us denote by $P(X)$ the set of probability measures supported on $X$, consisting of Borel measures $\mu \in M_+(X)$ such that $\langle 1, \mu \rangle = 1$.

In POCP (1), given $(t_0,x_0) \in A$, let $(x_k,u_k)_{k\in\mathbb{N}} \subset L^\infty([t_0,T]; X \times U)$ denote a minimizing sequence of admissible controlled trajectories, i.e. it holds

$$\dot x_k(t) = f(x_k(t),u_k(t)), \quad x_k(t_0) = x_0$$

and

$$\lim_{k\to\infty} \int_{t_0}^{T} l(x_k(t),u_k(t))\,dt + l_T(x_k(T)) = v^*(t_0,x_0).$$

In general the infimum in POCP (1) is not attained, so our next step is to assume that, at each time $t \in [t_0,T]$, the control is not a vector $u(t) \in U$, but a time-dependent probability measure $\omega(du \mid t) \in P(U)$ which rules the distribution of the control in $U$. We use the notation $\omega_t := \omega(\cdot \mid t)$ to emphasize the dependence on time.
This is called a relaxed control, or stochastic control, or Young measure in the functional analysis literature. POCP (1) is then relaxed to

$$
\begin{array}{rll}
v_R^*(t_0,x_0) := \displaystyle\min_{\omega} & \displaystyle\int_{t_0}^{T} \langle l(x(t),\cdot), \omega_t \rangle\,dt + l_T(x(T)) & \\
\text{s.t.} & \dot x(t) = \langle f(x(t),\cdot), \omega_t \rangle, \quad x(t_0) = x_0 & \\
& x(t) \in X, & t \in [t_0,T] \\
& \omega_t \in P(U), & t \in [t_0,T] \\
& x(T) \in X_T &
\end{array}
\tag{2}
$$

where the minimization is w.r.t. a relaxed control. Note that we replaced the infimum in POCP (1) with a minimum in relaxed POCP (2). Indeed, it can be proved that this minimum is always attained, using the weak-star compactness of the space of probability measures with compact support.

Since classical controls $u \in L^\infty([t_0,T];U)$ are a particular case of relaxed controls $\omega_t \in P(U)$, corresponding to the choice $\omega_t = \delta_{u(t)}$ for a.e. $t \in [t_0,T]$, the minimum in relaxed POCP (2) is smaller than or equal to the infimum in classical POCP (1), i.e.

$$v^*(t_0,x_0) \geq v_R^*(t_0,x_0).$$

Contrived optimal control problems (e.g. with overly stringent state constraints) can be cooked up such that the inequality is strict, i.e. $v^*(t_0,x_0) > v_R^*(t_0,x_0)$; see e.g. the examples in [7, Appendix C]. We do not consider these examples practically relevant, and hence the following assumption will be made.

Assumption 1 (No relaxation gap)
For any relaxed controlled trajectory $(x, \omega_t)$ admissible for relaxed POCP (2), there is a sequence of controlled trajectories $(x_k,u_k)_{k\in\mathbb{N}}$ admissible for POCP (1) such that

$$\lim_{k\to\infty} \int_{t_0}^{T} v(x_k(t),u_k(t))\,dt = \int_{t_0}^{T} \langle v(x(t),\cdot), \omega_t \rangle\,dt$$

for every function $v \in C(X \times U)$. Then it holds $v^*(t_0,x_0) = v_R^*(t_0,x_0)$ for every $(t_0,x_0) \in A$.

Note that this assumption is satisfied under the classical controllability and/or convexity conditions used in the Filippov-Ważewski Theorem with state constraints; see [6] and the discussions around Assumption I in [5] and Assumption 2 in [7]. However, let us point out that Assumption 1 does not imply that the infimum is attained in POCP (1). Conversely, if the infimum is attained, the values of POCP (1) and relaxed POCP (2) coincide, and Assumption 1 is satisfied.
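The following toy computation is a minimal sketch under our own assumptions (it is not taken from [5, 6, 7]) of the kind of approximation that Assumption 1 requires. We take the hypothetical data $\dot x = u$ with $u \in \{-1,+1\}$, $t_0 = 0$, $T = 1$, and the relaxed control $\omega_t = \tfrac12\delta_{-1} + \tfrac12\delta_{+1}$, whose relaxed trajectory is $x(t) \equiv x_0$ since $\langle u, \omega_t\rangle = 0$. The classical controls $u_k$ (a hypothetical "chattering" construction) alternate between $+1$ and $-1$ on $2k$ subintervals of equal length. For the hypothetical test function $v(x,u) = x^2 + xu$, the integrals $\int_0^T v(x_k(t),u_k(t))\,dt$ should approach $\int_0^T \langle v(x_0,\cdot), \omega_t\rangle\,dt = T x_0^2$.

```python
# Toy check of Assumption 1 (hypothetical data, not from the paper):
# dx/dt = u with u in {-1, +1}, t0 = 0, T = 1, and relaxed control
# omega_t = 0.5*delta_{-1} + 0.5*delta_{+1}, whose trajectory is x(t) = x0.


def v(x, u):
    """Hypothetical test function v in C(X x U)."""
    return x * x + x * u


def relaxed_integral(x0=0.5, T=1.0):
    """int_0^T <v(x(t), .), omega_t> dt along the relaxed trajectory x(t) = x0."""
    return T * (0.5 * v(x0, 1.0) + 0.5 * v(x0, -1.0))


def chattering_integral(k, x0=0.5, T=1.0, steps=200):
    """int_0^T v(x_k(t), u_k(t)) dt for the k-th chattering control.

    u_k alternates between +1 and -1 on 2k subintervals of equal length;
    the integral is approximated by a left Riemann sum with `steps`
    points per subinterval.
    """
    n_intervals = 2 * k
    h = T / (n_intervals * steps)
    x, total = x0, 0.0
    for i in range(n_intervals):
        u = 1.0 if i % 2 == 0 else -1.0
        for _ in range(steps):
            total += v(x, u) * h
            x += u * h  # exact update, since dx/dt = u is constant here
    return total
```

With $x_0 = 0.5$, the error $|\texttt{chattering\_integral}(k) - \texttt{relaxed\_integral}()|$ decreases roughly like $x_0/(2k)$, the order of the amplitude of the chattering trajectory. The relaxed velocity $0$ does not belong to the set $\{-1,+1\}$ of classical velocities, so no classical control reproduces $x(t) \equiv x_0$ exactly; only the limit does.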
Given initial data $(t_0,x_0) \in A$ and a relaxed control $\omega_t \in P(U)$, the unique solution of the ODE

$$\dot x(t) = \langle f(x(t),\cdot), \omega_t \rangle, \quad x(t_0) = x_0 \tag{3}$$

in relaxed POCP (2) satisfies

$$x(t) = x_0 + \int_{t_0}^{t} \langle f(x(s),\cdot), \omega_s \rangle\,ds \tag{4}$$

for every $t \in [t_0,T]$. Let us then define

$$\mu(dt,dx,du) := dt\,\delta_{x(t)}(dx)\,\omega_t(du) \in M_+([t_0,T] \times X \times U) \tag{5}$$

as the occupation measure concentrated uniformly in time on the state trajectory starting at $x_0$ at time $t_0$, for the given relaxed control $\omega_t$. An analytic interpretation is that integration w.r.t. the occupation measure is equivalent to time integration along system trajectories, i.e.

$$\int_{t_0}^{T} v(t,x(t))\,dt = \int_{t_0}^{T}\int_X\int_U v(t,x)\,\mu(dt,dx,du) = \langle v, \mu \rangle$$

given any test function $v \in C([t_0,T] \times X)$.

Let us define the linear operator $\mathcal{L} : C^1([t_0,T] \times X) \to C([t_0,T] \times X \times U)$ by

$$v \mapsto \mathcal{L} v := \frac{\partial v}{\partial t} + \sum_{i=1}^{n} \frac{\partial v}{\partial x_i} f_i = \frac{\partial v}{\partial t} + \operatorname{grad} v \cdot f.$$

Given a continuously differentiable test function $v \in C^1([t_0,T] \times X)$, notice that

$$v(T,x(T)) - v(t_0,x(t_0)) = \int_{t_0}^{T} dv(t,x(t)) = \int_{t_0}^{T} \dot v(t,x(t))\,dt = \int_{t_0}^{T} \langle \mathcal{L} v(t,x(t),\cdot), \omega_t \rangle\,dt = \langle \mathcal{L} v, \mu \rangle,$$

which can be written more concisely as

$$\langle v, \mu_T \rangle - \langle v, \mu_0 \rangle = \langle \mathcal{L} v, \mu \rangle \tag{6}$$

upon defining respectively the initial and terminal occupation measures

$$\mu_0(dt,dx) := \delta_{t_0}(dt)\,\delta_{x(t_0)}(dx), \qquad \mu_T(dt,dx) := \delta_T(dt)\,\delta_{x(T)}(dx). \tag{7}$$

Let us define the adjoint linear operator $\mathcal{L}' : C([t_0,T] \times X \times U)' \to C^1([t_0,T] \times X)'$ by the relation $\langle v, \mathcal{L}'\mu \rangle := \langle \mathcal{L} v, \mu \rangle$ for all $\mu \in M([t_0,T] \times X \times U)$ and $v \in C^1([t_0,T] \times X)$. More explicitly, this operator can be expressed as

$$\mu \mapsto \mathcal{L}'\mu = -\frac{\partial \mu}{\partial t} - \sum_{i=1}^{n} \frac{\partial (f_i \mu)}{\partial x_i} = -\frac{\partial \mu}{\partial t} - \operatorname{div} f\mu,$$

where the derivatives of measures are understood in the weak sense, i.e. via their action on smooth test functions, and the change of sign comes from integration by parts. Equation (6) can be rewritten equivalently as $\langle v, \mu_T \rangle - \langle v, \mu_0 \rangle = \langle v, \mathcal{L}'\mu \rangle$, and since this equation should hold for all test functions $v \in C^1([t_0,T] \times X)$, we obtain a linear partial differential equation (PDE) on measures, $\mathcal{L}'\mu = \mu_T - \mu_0$, which we write

$$\frac{\partial \mu}{\partial t} + \operatorname{div} f\mu + \mu_T = \mu_0. \tag{8}$$

This linear transport equation is classical in fluid mechanics, statistical physics and the analysis of PDEs. It is called the equation of conservation of mass, or the continuity equation, or the advection equation, or Liouville's equation. Under the assumption that the initial data $(t_0,x_0) \in A$ and the control law $\omega_t \in P(U)$ are given, the following result can be found e.g. in [14, Theorem 5.34] or [1].

Lemma 1 (Liouville PDE = Cauchy ODE) There exists a unique solution to the Liouville PDE (8) which is concentrated on the solution of the Cauchy ODE (3), i.e. such that (5) and (7) hold.
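As a sanity check of identity (6), here is a sketch with hypothetical data of our own choosing (not from the paper): $f(x,u) = -x + u$, the time-invariant relaxed control $\omega_t = \tfrac12\delta_0 + \tfrac12\delta_1$, $t_0 = 0$, $T = 2$, $x_0 = 1$, and the test function $v(t,x) = t x^2$. The relaxed ODE (3) then reads $\dot x = -x + \tfrac12$ and has a closed-form solution. The right-hand side $\langle \mathcal{L} v, \mu\rangle$ is computed as a time integral along the trajectory against $\omega_t$, as in (5), using composite Simpson quadrature.

```python
import math

# Hypothetical data (not from the paper): f(x, u) = -x + u, t0 = 0, T = 2,
# x0 = 1, and the time-invariant relaxed control omega = 0.5*delta_0 + 0.5*delta_1.
T0, T1, X0 = 0.0, 2.0, 1.0
CONTROLS = (0.0, 1.0)   # support points of omega
WEIGHTS = (0.5, 0.5)    # their probabilities


def f(x, u):
    return -x + u


def v(t, x):
    """C^1 test function v(t, x) = t x^2."""
    return t * x * x


def Lv(t, x, u):
    """L v = dv/dt + (dv/dx) f = x^2 + 2 t x f(x, u)."""
    return x * x + 2.0 * t * x * f(x, u)


def x_traj(t):
    """Closed-form solution of the relaxed ODE (3): dx/dt = -x + ubar."""
    ubar = sum(w * u for w, u in zip(WEIGHTS, CONTROLS))
    return ubar + (X0 - ubar) * math.exp(-(t - T0))


def occupation_integral(g, n=2000):
    """<g, mu> = int_{t0}^{T} <g(t, x(t), .), omega> dt by composite Simpson (n even)."""
    h = (T1 - T0) / n
    total = 0.0
    for i in range(n + 1):
        t = T0 + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * sum(w * g(t, x_traj(t), u) for w, u in zip(WEIGHTS, CONTROLS))
    return total * h / 3.0


lhs = v(T1, x_traj(T1)) - v(T0, x_traj(T0))  # <v, mu_T> - <v, mu_0>
rhs = occupation_integral(Lv)                 # <L v, mu>
```

Replacing $v$ by another $C^1$ test function should leave `lhs` and `rhs` equal up to quadrature error; this weak form is exactly what the Liouville equation (8) encodes.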
In our context of conic optimization, the relevance of the Liouville PDE (8) is its linearity in the occupation measures $\mu_0$, $\mu$ and $\mu_T$, whereas the Cauchy ODE (3) is nonlinear in the state trajectory $x(t)$. The cost in relaxed POCP (2) can therefore be written

$$\int_{t_0}^{T} \langle l(x(t),\cdot), \omega_t \rangle\,dt + l_T(x(T)) = \langle l, \mu \rangle + \langle l_T, \mu_T \rangle$$

and we can now define a relaxed optimal control problem as an LP in the cone of nonnegative measures:

$$
\begin{array}{rl}
p^*(t_0,x_0) := \displaystyle\min_{\mu,\mu_T} & \langle l, \mu \rangle + \langle l_T, \mu_T \rangle \\
\text{s.t.} & \dfrac{\partial \mu}{\partial t} + \operatorname{div} f\mu + \mu_T = \delta_{t_0}\delta_{x_0} \\
& \mu \in M_+([t_0,T] \times X \times U) \\
& \mu_T \in M_+(\{T\} \times X_T)
\end{array}
\tag{9}
$$

where the minimization is w.r.t. the occupation measure $\mu$ (which includes the relaxed control $\omega_t$, see (5)) and the terminal measure $\mu_T$, for a given initial measure $\mu_0 = \delta_{t_0}\delta_{x_0}$, which is the right-hand side in the Liouville equation constraint.

Note that in LP (9) the infimum is always attained, since the admissible set is weak-star compact and the functional is linear and weak-star continuous. Moreover, since every relaxed trajectory admissible for relaxed POCP (2) yields, through (5) and (7), a pair $(\mu,\mu_T)$ admissible for LP (9), the minimum in LP (9) is smaller than or equal to the minimum in relaxed POCP (2) (the latter being equal to the infimum in POCP (1), recall Assumption 1), i.e.

$$v^*(t_0,x_0) \geq p^*(t_0,x_0). \tag{10}$$

The following result, due to [15] and essentially based on convex duality, shows that no gap occurs when considering more general occupation measures than those concentrated on solutions of the ODE.

Lemma 2
It holds $v^*(t_0,x_0) = p^*(t_0,x_0)$ for all $(t_0,x_0) \in A$.

Primal measure LP (9) has a dual LP in the cone of nonnegative continuous functions:

$$
\begin{array}{rl}
d^*(t_0,x_0) := \displaystyle\sup_{v} & v(t_0,x_0) \\
\text{s.t.} & l + \dfrac{\partial v}{\partial t} + \operatorname{grad} v \cdot f \in C_+([t_0,T] \times X \times U) \\
& l_T - v(T,\cdot) \in C_+(X_T)
\end{array}
\tag{11}
$$

where the maximization is with respect to a continuously differentiable function $v \in C^1([t_0,T] \times X)$, which can be interpreted as a Lagrange multiplier of the Liouville equation in (9). In general the supremum in dual LP (11) is not attained, and weak duality with primal LP (9) holds:

$$p^*(t_0,x_0) \geq d^*(t_0,x_0),$$

but it can be shown that there is actually no duality gap:

Lemma 3 (No duality gap)
It holds $p^*(t_0,x_0) = d^*(t_0,x_0)$ for all $(t_0,x_0) \in A$.

Proof:
The proof follows along the same lines as the proof of [7, Theorem 2]. First, we observe that $(t_0,x_0) \in A$ and Assumption 1 imply that $p^*(t_0,x_0)$ is finite. Second, we use the fact that the cone

$$\left\{ \left( \langle l, \mu \rangle + \langle l_T, \mu_T \rangle,\ \frac{\partial \mu}{\partial t} + \operatorname{div} f\mu + \mu_T \right) : \mu \in M_+([t_0,T] \times X \times U),\ \mu_T \in M_+(\{T\} \times X_T) \right\}$$

is closed in the weak-star topology. This is a classical sufficient condition for the absence of a duality gap in infinite-dimensional LPs, see e.g. [2, Chapter IV, Theorem 7.2]. □

Primal LP (9) and dual LP (11) are infinite-dimensional conic problems. To solve them with a computer, one invariably has to use discretization and approximation schemes. The aim of this section is to derive various approximation results that prove useful when designing numerical methods based on moment-SOS hierarchies.
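Here is a small numerical check of the two constraints of dual LP (11) for the affine candidate $v(t,x) = a + b(T - t)$ discussed below. It is a sketch under stated assumptions: the data $l(x,u) = x^2 + u^2 + 1$, $l_T(x) = (x-2)^2$, $f(x,u) = u - x$, $X = X_T = U = [-1,1]$, $t_0 = 0$ and $T = 1$ are hypothetical, and nonnegativity is only tested on a grid. Sampling does not certify nonnegativity on a whole set in general (in the moment-SOS approach this is the role of SOS certificates); here the minimizers of $l_T$ and $l$ happen to be grid points, so the computed $a$ and $b$ are exact.

```python
import itertools

# Hypothetical toy data (not from the paper): X = X_T = U = [-1, 1], t0 = 0, T = 1.
T = 1.0
GRID = [-1.0 + 2.0 * i / 50 for i in range(51)]   # sample points of [-1, 1]
T_GRID = [T * i / 50 for i in range(51)]          # sample points of [t0, T]


def l(x, u):
    return x * x + u * u + 1.0


def l_T(x):
    return (x - 2.0) ** 2


def f(x, u):
    return u - x


# Grid minima: in general they only upper-bound the true minima; they are exact
# here because the minimizers x = 1 (for l_T) and (x, u) = (0, 0) (for l) are grid points.
a = min(l_T(x) for x in GRID)
b = min(l(x, u) for x in GRID for u in GRID)


def v(t, x):
    """Affine-in-time candidate v(t, x) = a + b (T - t)."""
    return a + b * (T - t)


def Lv(t, x, u):
    """L v = dv/dt + (dv/dx) f, with dv/dt = -b and dv/dx = 0."""
    dv_dt, dv_dx = -b, 0.0
    return dv_dt + dv_dx * f(x, u)


def is_dual_admissible(tol=1e-12):
    """Grid test of both constraints of dual LP (11)."""
    running = all(l(x, u) + Lv(t, x, u) >= -tol
                  for t, x, u in itertools.product(T_GRID, GRID, GRID))
    terminal = all(l_T(x) - v(T, x) >= -tol for x in GRID)
    return running and terminal
```

Here $a = b = 1$, so by Lemma 4 (stated below) this toy problem satisfies $v^*(0,x_0) \geq a + bT = 2$ for every $x_0 \in [-1,1]$.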
First, observe that there always exists an admissible solution for dual LP (11). For example, choose $v(t,x) := a + b(T-t)$ with $a \in \mathbb{R}$ such that $l_T(x) \geq a$ on $X_T$ and $b \in \mathbb{R}$ such that $l(x,u) \geq b$ on $X \times U$: then $l + \mathcal{L} v = l - b \geq 0$ and $l_T - v(T,\cdot) = l_T - a \geq 0$. Moreover, by construction, any admissible function for dual LP (11) gives a global lower bound on the value function:

Lemma 4 (Lower bound on value function) If $v \in C^1([t_0,T] \times X)$ is admissible for dual LP (11), then $v^* \geq v$ on $[t_0,T] \times X$.

Proof:
If $(t_0,x_0) \notin A$, then $v^*(t_0,x_0) = +\infty$ (the infimum in (1) is taken over an empty set) and the statement holds because $v(t_0,x_0)$ is finite. Let $(t_0,x_0) \in A$ be given. If $v$ is admissible for dual LP (11), then for any admissible trajectory $(x,u) \in L^\infty([t_0,T]; X \times U)$ starting at $x(t_0) = x_0$, it holds

$$\int_{t_0}^{T} \big( l(x(t),u(t)) + \mathcal{L} v(t,x(t),u(t)) \big)\,dt = \int_{t_0}^{T} l(x(t),u(t))\,dt + v(T,x(T)) - v(t_0,x_0) \geq 0$$

since $l + \mathcal{L} v \geq 0$ on $[t_0,T] \times X \times U$. Moreover

$$\int_{t_0}^{T} l(x(t),u(t))\,dt + l_T(x(T)) \geq \int_{t_0}^{T} l(x(t),u(t))\,dt + v(T,x(T))$$

since $l_T - v(T,\cdot) \geq 0$ on $X_T$. Combining the two inequalities yields $\int_{t_0}^{T} l(x(t),u(t))\,dt + l_T(x(T)) \geq v(t_0,x_0)$, and the expected inequality follows by taking the infimum over admissible trajectories. □

Lemma 5 (Maximizing sequence)
Given $(t_0,x_0) \in A$, there is a sequence $(v_k)_{k\in\mathbb{N}}$ admissible for dual LP (11) such that $v^*(t_0,x_0) \geq v_k(t_0,x_0)$ and $\lim_{k\to\infty} v_k(t_0,x_0) = v^*(t_0,x_0)$.

Proof:
From Lemma 4, it holds $v^*(t_0,x_0) \geq v_k(t_0,x_0)$ for every function $v_k$ admissible for dual LP (11). Under Assumption 1, Lemmas 2 and 3 give $d^*(t_0,x_0) = p^*(t_0,x_0) = v^*(t_0,x_0)$, and hence there exists a maximizing sequence $v_k \in C^1([t_0,T] \times X)$ for LP (11). □

In this section, we investigate the properties of the maximizing sequences given by Lemma 5, and in particular their convergence to the value function of POCP (1). We first prove the lower semicontinuity of the value of LP (9). By Assumption 1 and Lemma 2, this yields the lower semicontinuity of the value function of POCP (1). Note that lower semicontinuity is readily ensured when the set $\{(f(x,u), l(x,u) + a) : u \in U,\ a \geq 0\}$ is convex in $\mathbb{R}^{n+1}$ for all $x$, with $U$ compact; see e.g. [13, Section 6.2]. Indeed, in this case the infimum is attained in POCP (1), and Assumption 1 is readily satisfied.

Lemma 6 (Lower semicontinuity of the value of the measure LP)
The function $(t_0,x_0) \mapsto p^*(t_0,x_0)$ is lower semicontinuous.

Proof:
We need to show that, given a sequence $(t_k,x_k)_{k\in\mathbb{N}}$ such that $\lim_{k\to\infty} (t_k,x_k) = (t,x) \in \mathbb{R}^{n+1}$, it holds $\liminf_{k\to\infty} p^*(t_k,x_k) \geq p^*(t,x)$. Suppose that $(t,x)$ is such that measure LP (9) is feasible. If the left-hand side is not finite, the result holds. If the left-hand side is finite, we can consider, up to taking a subsequence, that $\liminf_{k\to\infty} p^*(t_k,x_k) = \lim_{k\to\infty} p^*(t_k,x_k) < \infty$. Since the infimum is attained in measure LP (9), we have a sequence of measures $(\mu_k,\mu_{T,k})_{k\in\mathbb{N}}$ such that $p^*(t_k,x_k) = \langle \mu_k, l \rangle + \langle \mu_{T,k}, l_T \rangle$ and $\frac{\partial \mu_k}{\partial t} + \operatorname{div} f\mu_k + \mu_{T,k} = \delta_{t_k}\delta_{x_k}$. Convergence of $(t_k,x_k)$ to $(t,x)$ implies weak-star convergence of $\delta_{t_k}\delta_{x_k}$ to $\delta_t\delta_x$. Using the same closedness argument as in the proof of Lemma 3, we can consider that, up to a subsequence, $\mu_k$ and $\mu_{T,k}$ converge to some measures $\mu$ and $\mu_T$ in the weak-star topology, and that $\frac{\partial \mu}{\partial t} + \operatorname{div} f\mu + \mu_T = \delta_t\delta_x$. Hence, we have $\liminf_{k\to\infty} p^*(t_k,x_k) = \langle \mu, l \rangle + \langle \mu_T, l_T \rangle$, and the pair $(\mu,\mu_T)$ is feasible for problem $p^*(t,x)$. Therefore $\liminf_{k\to\infty} p^*(t_k,x_k) \geq p^*(t,x)$, which proves the result when LP (9) is feasible for $(t,x)$. Using similar arguments, one can show that if $(t,x)$ is such that LP (9) is not feasible, there cannot be infinitely many $k$ such that LP (9) is feasible for $(t_k,x_k)$. □

The following result extends the convergence properties of the maximizing sequence.
Theorem 1 (Uniform convergence along relaxed trajectories)
For any maximizing sequence $(v_k)_{k\in\mathbb{N}}$ of dual LP (11), as given by Lemma 5, for any optimal solution $(x,\omega_t)$ of relaxed POCP (2), and for any $t \in [t_0,T]$, it holds

$$0 \leq v^*(t,x(t)) - v_k(t,x(t)) \leq v^*(t_0,x_0) - v_k(t_0,x_0) \underset{k\to\infty}{\longrightarrow} 0.$$

Proof: Let $(x_j,u_j)_{j\in\mathbb{N}}$ be an approximating sequence for $(x,\omega_t)$, whose existence is guaranteed by Assumption 1. For any $j \in \mathbb{N}$, $k \in \mathbb{N}$ and $t \in [t_0,T]$, we have

$$l_T(x_j(T)) - v_k(T,x_j(T)) + \int_t^T \big( l(x_j(s),u_j(s)) + \mathcal{L} v_k(s,x_j(s),u_j(s)) \big)\,ds = l_T(x_j(T)) + \int_t^T l(x_j(s),u_j(s))\,ds - v_k(t,x_j(t)).$$

In the left-hand side, both the first term and the integrand are nonnegative, by admissibility of $v_k$ for dual LP (11). Therefore the left-hand side, and hence the right-hand side, is a nonincreasing function of $t$. Moreover, the trajectory $(x_j,u_j)$ restricted to $[t,T]$ is admissible, and possibly suboptimal, for POCP (1) started at $(t,x_j(t))$, and $v_k$ is a lower bound on the value function by Lemma 4. It holds that

$$0 \leq v^*(t,x_j(t)) - v_k(t,x_j(t)) \leq l_T(x_j(T)) + \int_t^T l(x_j(s),u_j(s))\,ds - v_k(t,x_j(t)) \leq l_T(x_j(T)) + \int_{t_0}^T l(x_j(s),u_j(s))\,ds - v_k(t_0,x_0).$$

Letting $j$ tend to infinity and using the lower semicontinuity of $v^*$, we conclude that

$$0 \leq v^*(t,x(t)) - v_k(t,x(t)) \leq \liminf_{j\to\infty} \big( v^*(t,x_j(t)) - v_k(t,x_j(t)) \big) \leq \lim_{j\to\infty} \Big( l_T(x_j(T)) + \int_{t_0}^T l(x_j(s),u_j(s))\,ds - v_k(t_0,x_0) \Big) = v^*(t_0,x_0) - v_k(t_0,x_0).$$

□

It is important to notice that Theorem 1 holds for any trajectory realizing the minimum of relaxed POCP (2), and therefore for all of them simultaneously. In addition, these trajectories are identified with limiting trajectories of POCP (1) by Assumption 1.
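A minimal numerical illustration of Theorem 1, on hypothetical data of our own (not one of the examples of this paper): $\dot x = u$, $l(x,u) = x^2 + u^2$, $l_T = 0$, $X = U = [-1,1]$, $t_0 = 0$, $T = 2$, $x_0 = 0.8$. For this problem the value function is $v^*(t,x) = P(t)x^2$ with $P(t) = \tanh(T-t)$, the solution of the Riccati equation $\dot P = P^2 - 1$, $P(T) = 0$, and the optimal feedback $u = -P(t)x$ respects the constraints when $x_0 \in X$. One can check that $v_k := c_k v^*$ with $c_k = 1 - 1/k$ is admissible for dual LP (11): the minimum over $u$ of $l + \mathcal{L} v_k$ equals $(1-c_k)(1+c_kP^2)x^2 \geq 0$, and $l_T - v_k(T,\cdot) = 0$ since $P(T) = 0$. Hence $(v_k)$ is a maximizing sequence. The sketch evaluates the gap $v^* - v_k$ along the optimal trajectory and spot-checks the running dual constraint on a grid.

```python
import math

# Hypothetical toy problem (not from the paper): dx/dt = u, l = x^2 + u^2, l_T = 0,
# X = U = [-1, 1], t0 = 0, T = 2, x0 = 0.8.
T, T0, X0 = 2.0, 0.0, 0.8


def P(t):
    """Riccati solution of dP/dt = P^2 - 1, P(T) = 0."""
    return math.tanh(T - t)


def v_star(t, x):
    """Value function v*(t, x) = P(t) x^2 of the toy problem."""
    return P(t) * x * x


def v_k(t, x, k):
    """Admissible lower bound c_k v* with c_k = 1 - 1/k."""
    return (1.0 - 1.0 / k) * v_star(t, x)


def dual_residual(t, x, u, k):
    """l + L v_k = x^2 + u^2 + d(v_k)/dt + d(v_k)/dx * u (must be >= 0)."""
    c = 1.0 - 1.0 / k
    dv_dt = c * (P(t) ** 2 - 1.0) * x * x
    dv_dx = 2.0 * c * P(t) * x
    return x * x + u * u + dv_dt + dv_dx * u


def x_opt(t):
    """Closed-form optimal trajectory for the feedback u = -P(t) x."""
    return X0 * math.cosh(T - t) / math.cosh(T - T0)


def gap_along_trajectory(k, n=100):
    """Samples of t -> v*(t, x(t)) - v_k(t, x(t)) on a uniform grid of [t0, T]."""
    ts = [T0 + (T - T0) * i / n for i in range(n + 1)]
    return [v_star(t, x_opt(t)) - v_k(t, x_opt(t), k) for t in ts]
```

As Theorem 1 predicts, the sampled gap is nonincreasing in time and bounded by its initial value, which here equals $v^*(t_0,x_0)/k$ and therefore vanishes as $k \to \infty$.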
Liouville equation (8) is used as a linear equality constraint in LP (9), with a Dirac right-hand side as an initial condition. However, this right-hand side can be replaced by more general probability measures. The linearity of the constraint allows one to extend most of the results of the previous section to this setting. It leads to similar convergence guarantees for a (possibly uncountable) set of optimal control problems. These guarantees hold for solutions of a single infinite-dimensional LP.

Suppose that we are given a set of initial conditions $X_0 \subset X$ such that $(t_0,x_0) \in A$ for every $x_0 \in X_0$. Given a probability measure $\xi \in P(X_0)$, let $\mu_0(dt,dx) = \delta_{t_0}(dt)\,\xi(dx)$ and consider the following average value

$$\bar v^*(\mu_0) := \int v^*(t,x)\,\mu_0(dt,dx) = \langle v^*, \mu_0 \rangle \tag{12}$$

where $v^*$ is the value function of POCP (1). Under Assumption 1, by linearity this value is equal to the value of LP (9) with $\mu_0$ as the right-hand side of the equality constraint, namely the primal averaged LP

$$
\begin{array}{rl}
\bar p^*(\mu_0) := \displaystyle\min_{\mu,\mu_T} & \langle l, \mu \rangle + \langle l_T, \mu_T \rangle \\
\text{s.t.} & \dfrac{\partial \mu}{\partial t} + \operatorname{div} f\mu + \mu_T = \mu_0 \\
& \mu \in M_+([t_0,T] \times X \times U) \\
& \mu_T \in M_+(\{T\} \times X_T)
\end{array}
\tag{13}
$$

with dual averaged LP

$$
\begin{array}{rl}
\bar d^*(\mu_0) := \displaystyle\sup_{v} & \langle v, \mu_0 \rangle \\
\text{s.t.} & l + \dfrac{\partial v}{\partial t} + \operatorname{grad} v \cdot f \in C_+([t_0,T] \times X \times U) \\
& l_T - v(T,\cdot) \in C_+(X_T).
\end{array}
\tag{14}
$$

The absence of a duality gap is justified in the same way as in Lemma 3. Moreover, Lemma 4 also holds and, as in Lemma 5, we have the existence of maximizing lower bounds $v_k$ such that

$$\lim_{k\to\infty} \langle v_k, \mu_0 \rangle = \bar v^*(\mu_0) = \bar p^*(\mu_0) = \bar d^*(\mu_0).$$

Intuitively, primal LP (13) models a superposition of optimal control problems. The LP formulation allows one to express it as a single program over measures satisfying a transport equation. A relevant question here is the relation between solutions of averaged measure LP (13) and optimal trajectories of the original POCP (1). The intuition is that measure solutions of LP (13) represent a superposition of optimal trajectories of relaxed POCP (2).
These trajectories are themselves limiting trajectories of the original POCP (1). The superposition principle of [1, Theorem 3.2] allows one to formalize this intuition and to extend the result of Theorem 1 to this setting.
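The averaged value (12) and the averaged lower bounds $\langle v_k, \mu_0\rangle$ can be sketched numerically in a hypothetical toy setting of our own (not an example of this paper): $\dot x = u$, $l(x,u) = x^2 + u^2$, $l_T = 0$, $t_0 = 0$, $T = 2$, with value function $v^*(t,x) = \tanh(T-t)\,x^2$ (obtained from a Riccati equation), admissible lower bounds $v_k = (1 - 1/k)\,v^*$, and $\xi$ the uniform probability measure on $X_0 = [-1,1]$. Then $\bar v^*(\mu_0) = \tanh(2)/3$. The sketch also transports $\xi$ along the optimal flow $x_0 \mapsto x_0\cosh(T-t)/\cosh(T-t_0)$, in the spirit of the measures $\xi_t$ of Theorem 2 stated next.

```python
import math

# Hypothetical toy setting (not from the paper): dx/dt = u, l = x^2 + u^2, l_T = 0,
# t0 = 0, T = 2, and xi = uniform probability measure on X0 = [-1, 1].
T, T0 = 2.0, 0.0


def v_star(t, x):
    """Value function of the toy problem, v*(t, x) = tanh(T - t) x^2."""
    return math.tanh(T - t) * x * x


def v_k(t, x, k):
    """Admissible lower bounds c_k v* with c_k = 1 - 1/k."""
    return (1.0 - 1.0 / k) * v_star(t, x)


def flow(t, x0):
    """Optimal state at time t from (t0, x0), under the feedback u = -tanh(T - t) x."""
    return x0 * math.cosh(T - t) / math.cosh(T - T0)


def average(g, t, n=1000):
    """Midpoint-rule approximation of int g(t, x) xi_t(dx), where xi_t is the image
    of the uniform measure xi on [-1, 1] under the optimal flow (so xi_{t0} = xi)."""
    h = 2.0 / n
    return sum(g(t, flow(t, -1.0 + (i + 0.5) * h)) for i in range(n)) * h / 2.0
```

In this toy case the averaged gap $\int (v^* - v_k)(t,x)\,\xi_t(dx)$ equals $\frac1k \int v^*(t,x)\,\xi_t(dx)$, so it is nonincreasing in $t$ and tends to zero with $k$, which is the behavior that Theorem 2 guarantees in general.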
Theorem 2 (Uniform convergence on support of optimal measure)
For any solution $(\mu,\mu_T)$ of primal averaged LP (13), there are parametrized measures $\xi_t \in P(X)$ (for the state) and $\omega_t \in P(U)$ (for the control) such that $\mu(dt,dx,du) = dt\,\xi_t(dx)\,\omega_t(du)$, $\mu_0(dt,dx) = \delta_{t_0}(dt)\,\xi_{t_0}(dx)$ and $\mu_T(dt,dx) = \delta_T(dt)\,\xi_T(dx)$. In addition, if $(v_k)_{k\in\mathbb{N}}$ is a maximizing sequence for dual averaged LP (14), then for any $t \in [t_0,T]$ it holds

$$0 \leq \int_X \big( v^*(t,x) - v_k(t,x) \big)\,\xi_t(dx) \leq \int_X \big( v^*(t_0,x) - v_k(t_0,x) \big)\,\xi(dx) \underset{k\to\infty}{\longrightarrow} 0.$$

Proof:
The decomposition is given by Lemma 3 in [7]. It asserts the existence of a measure $\sigma \in M_+(C([t_0,T];X))$ supported on trajectories admissible for relaxed POCP (2) and such that, for any measurable function $w : X \to \mathbb{R}$, it holds $\int_X w(x)\,\xi_t(dx) = \int_{C([t_0,T];X)} w(x(t))\,\sigma(dx(\cdot))$. By Assumption 1, all trajectories in the support of $\sigma$ are pointwise limits of sequences of feasible trajectories of POCP (1). Hence $\sigma$-almost all of these sequences must be minimizing sequences for POCP (1); otherwise, this would contradict the optimality of $(\mu,\mu_T)$. The result follows by discarding the trajectories which are not limits of minimizing sequences: they form a $\sigma$-null set, so this changes neither $\sigma$ nor $\xi_t$. Theorem 1 applies to $\sigma$-almost all of the remaining trajectories, and we have

$$0 \leq \int_{C([t_0,T];X)} \big( v^*(t,x(t)) - v_k(t,x(t)) \big)\,\sigma(dx(\cdot)) = \int_X \big( v^*(t,x) - v_k(t,x) \big)\,\xi_t(dx) \leq \int_X \big( v^*(t_0,x) - v_k(t_0,x) \big)\,\xi(dx).$$

□

A remarkable practical implication of this result is that maximizing sequences of averaged dual LP (14) provide an approximation to the value function of POCP (1) that is uniform in time and almost uniform in space along limits of optimal trajectories starting from $X_0$.

In Sections 3 and 5 we reformulated nonlinear optimal control problems as abstract linear conic optimization problems that involve manipulations of measures and continuous functions in their full generality. The results presented in Section 4 are related to properties of minimizing or maximizing elements, or sequences of elements, for these problems. From a practical point of view, it is possible to construct these sequences using the same numerical tools as in static polynomial optimization. On the primal side, this allows one to approximate the minimizing elements of measure LP problems with a converging hierarchy of moment SDP problems. On the dual side, we can numerically construct maximizing sequences of polynomial SOS certificates for the continuous function LP problems.
The convergence properties that we investigated hold in particular for these solutions of the moment-SOS hierarchy.

This section illustrates convergence properties of the sequence of approximations of value functions computed using moment-SOS hierarchies. We consider simple, but widespread, optimal control problems for which the value function (or the optimal trajectories) are known.
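On the primal side, the moment SDP problems of the hierarchy are built from moment matrices. The following sketch is not taken from this paper or from [11]; it only shows the basic fact behind the rank conditions mentioned in the introduction: the univariate moment (Hankel) matrix $M_d(y) = (y_{i+j})_{0 \le i,j \le d}$ of a measure supported on $r$ points has rank at most $r$, so a Dirac measure gives rank one, whereas the uniform measure on $[-1,1]$ gives a full-rank matrix. Exact rational arithmetic (Python's standard `fractions` module) avoids choosing a numerical rank tolerance; SDP solvers work in floating point, and practical optimality certificates are more refined than a single rank computation.

```python
from fractions import Fraction


def dirac_moments(x, degree):
    """Moments y_k = x^k, k = 0..degree, of the Dirac measure at x."""
    return [Fraction(x) ** k for k in range(degree + 1)]


def uniform_moments(degree):
    """Moments of the uniform probability measure on [-1, 1]."""
    return [Fraction(1, k + 1) if k % 2 == 0 else Fraction(0) for k in range(degree + 1)]


def moment_matrix(y, d):
    """Univariate (Hankel) moment matrix M_d(y) with entries y_{i+j}, 0 <= i, j <= d."""
    return [[y[i + j] for j in range(d + 1)] for i in range(d + 1)]


def exact_rank(rows):
    """Rank by Gauss-Jordan elimination over the rationals (no rounding)."""
    a = [list(r) for r in rows]
    rank = 0
    for col in range(len(a[0])):
        pivot = next((i for i in range(rank, len(a)) if a[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on previous ones
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(len(a)):
            if i != rank and a[i][col] != 0:
                factor = a[i][col] / a[rank][col]
                a[i] = [p - factor * q for p, q in zip(a[i], a[rank])]
        rank += 1
    return rank
```

For instance, the moments of $\tfrac12\delta_{-1/2} + \tfrac12\delta_{1/2}$ (averaging two calls to `dirac_moments`) give $\operatorname{rank} M_3 = 2$, one per atom.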
Consider the one-dimensional turnpike POCP analyzed in Section 22.2 of [3]:

$$
\begin{array}{rl}
v^*(t_0,x_0) := \displaystyle\inf_{u} & \displaystyle\int_{t_0}^{T} \big( x(t) + u(t) \big)\,dt \\
\text{s.t.} & \dot x(t) = 1 + x(t) - x(t)\,u(t) \\
& x(t_0) = x_0 \\
& u(t) \in [0, \ldots]
\end{array}
\tag{15}
$$

Figure 1: Optimal trajectory $t \mapsto x^*(t)$ starting at $(t_0,x_0) = (0,0)$ for the turnpike POCP (15).

For this problem, the infimum is attained at a unique optimal control, which is piecewise constant. The optimal trajectory $t \mapsto x^*(t)$ starting at $(t_0,x_0) = (0,0)$ is presented in Figure 1. The uniform convergence of the approximate value functions $t \mapsto v_k(t,x^*(t))$ to the true value function $t \mapsto v^*(t,x^*(t))$ along this optimal trajectory, stated by Theorem 1, is illustrated in Figure 2. Moreover, the difference $t \mapsto v^*(t,x^*(t)) - v_k(t,x^*(t))$ is a nonincreasing function of time, as observed in the proof of Theorem 1.

Figure 2: Differences $t \mapsto v^*(t,x^*(t)) - v_k(t,x^*(t))$ between the actual value function and its polynomial approximations of increasing degrees $k = 3, \ldots$ along the optimal trajectory $t \mapsto x^*(t)$ starting at $(t_0,x_0) = (0,0)$ for the turnpike POCP (15). We observe uniform convergence along this trajectory, as well as the decrease of the difference over time, as predicted by the theory.

Consider the classical linear quadratic regulator (LQR) problem:

$$
\begin{array}{rl}
v^*(t_0,x_0) := \displaystyle\inf_{u} & \displaystyle\int_{t_0}^{T} \big( 10\,x(t)^2 + u(t)^2 \big)\,dt \\
\text{s.t.} & \dot x(t) = x(t) + u(t) \\
& x(t_0) = x_0.
\end{array}
\tag{16}
$$

For each $(t_0,x_0)$, the infimum is attained and the value of the problem can be computed by solving a Riccati differential equation. To illustrate Theorem 2, we are interested in the average value (12) for an initial measure $\mu_0$ concentrated at time $0$ and uniformly distributed in space in $X_0 = [-\ldots, \ldots]$. The difference between the value function $(t,x) \mapsto v^*(t,x)$ and a polynomial approximation of degree 6 $(t,x) \mapsto v_6(t,x)$ is represented in Figure 3. We also show the support of optimal trajectories starting from $X_0$. This illustrates the fact that the approximation of the value function is correct in this region, as stated by Theorem 2. It is noticeable

Figure 3: Contour lines (at $0, \ldots$) of the decimal logarithm of the difference $(t,x) \mapsto v^*(t,x) - v_6(t,x)$ between the actual value function and its polynomial approximation of degree 6 for LQR POCP (16). The dark area represents the set of optimal trajectories starting from $x \in X_0 = [-$