Optimal steering of a linear stochastic system to a final probability distribution, Part III
Yongxin Chen, Tryphon Georgiou and Michele Pavon
Abstract—The subject of this work has its roots in the so-called Schrödinger Bridge Problem (SBP), which asks for the most likely distribution of Brownian particles in their passage between observed empirical marginal distributions at two distinct points in time. Renewed interest in this problem was sparked by a reformulation in the language of stochastic control. In earlier works, presented as Part I and Part II, we explored a generalization of the original SBP that amounts to optimal steering of linear stochastic dynamical systems between state-distributions, at two points in time, under full state feedback. In these works the cost was quadratic in the control input. The purpose of the present work is to detail the technical steps in extending the framework to the case where a quadratic cost in the state is also present. In the zero-noise limit, we obtain the solution of a (deterministic) mass transport problem with general quadratic cost.
I. INTRODUCTION
In 1931/32, Erwin Schrödinger asked for the most likely evolution that a cloud of Brownian particles may have taken in between two end-point empirical marginal distributions [1], [2]. Schrödinger's insight was that the one-time marginal distributions along the most likely evolution can be represented as a product of two factors, a harmonic and a co-harmonic function, in close resemblance to the way the product of a quantum mechanical wave function and its adjoint produces the correct probability density. The 80+ year history of this so-called Schrödinger Bridge Problem (SBP) was punctuated by advances relating SBP with large deviations theory and the Hamilton-Jacobi-Bellman formalism of stochastic optimal control. More precisely, in its original formulation, SBP seeks a probability law on path space which is closest to the prior in the sense of large deviations, i.e., closest in the relative entropy sense. Alternatively, the Girsanov transformation allows seeing this Bayesian-like estimation problem as a control problem, namely, as the problem to steer a collection of dynamical systems from an initial distribution to a final one with minimal expected quadratic input cost. The solution to the control problem generates the process and the law sought in Schrödinger's question.
Historically, building on the work of Jamison, Fleming, Holland, Mitter and others, Dai Pra made the connection between SBP and stochastic control [3]. At about the same time, Blaquière and others [4], [5], [6], [7] studied the control of the Fokker-Planck equation, and more recently Brockett studied the Liouville equation [8]. The rationale for seeking to steer a stochastic, or even a deterministic, system between marginal state-distributions has most eloquently been explained by Brockett, in that "important limitations standing in the way of the wider use of optimal control [that] can be circumvented
Supported in part by the NSF under Grant ECCS-1509387, the AFOSR under Grants FA9550-12-1-0319 and FA9550-15-1-0045, the Vincentine Hermes-Luh Chair, and by the University of Padova Research Project CPDA 140897.
Y. Chen is with the Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455; [email protected]
T. T. Georgiou is with the Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA 92697; [email protected]
M. Pavon is with the Dipartimento di Matematica, Università di Padova, via Trieste 63, 35121 Padova, Italy; [email protected]
by explicitly acknowledging that in most situations the apparatus implementing the control policy will be judged on its ability to cope with a distribution of initial states, rather than a single state." Thus, the problem that comes into focus in this line of current research is to impose a "soft conditioning," in the sense that a specification for the probability distribution of the state vector is prescribed instead of initial or terminal state values. For the case of linear dynamics and quadratic input cost, the development parallels that of classical LQG regulator theory [9]. More specifically, in [10] the solution for quadratic input cost is provided and related to the solution of two nonlinearly-coupled homogeneous Riccati equations. The case where noise and control channels differ calls for a substantially different analysis, which is given in [11]. However, neither [10] nor [11] considers a penalty on state trajectories. This was discussed in [12] where, rather than having a hard constraint as in the SBP on the final marginal, the authors introduce a Wasserstein-distance terminal cost. They derive necessary conditions for optimality for this problem, but without establishing sufficiency. Stochastic control with quadratic state-cost penalty can be given a probabilistic interpretation when the uncontrolled evolution is the law of dynamical particles/systems with creation/killing in the sense of Feynman-Kac [13], [5]. This was discussed in [14], and necessary conditions for optimality were given there too, but without establishing sufficiency. In the present work, we document fully the solution of the stochastic control problem to steer a linear system between end-point Gaussian state-distributions while minimizing a quadratic state + input cost.
The solution is given in closed form by solving two matrix Riccati equations with nonlinearly coupled boundary conditions.
The paper is organized as follows. We present the problem formulation and the main results in Section II. The results are used to solve the optimal mass transport problem with losses in Section III by taking the zero-noise limit. A numerical example is presented in Section IV to highlight the results.

II. MAIN RESULTS
We consider the following optimal control problem:

  inf_{u∈U} E{ ∫_0^1 [ (1/2)‖u(t)‖² + (1/2)x(t)'Q(t)x(t) ] dt },   (1a)
  dx(t) = A(t)x(t)dt + B(t)u(t)dt + B(t)dw(t),   (1b)
  x(0) ∼ ρ_0,  x(1) ∼ ρ_1,   (1c)

where U denotes the set of finite-energy control laws adapted to the state, and ρ_0, ρ_1 are zero-mean Gaussian distributions with covariances Σ_0 and Σ_1. (The choice of the time interval [0,1] is without loss of generality, as the general case reduces to this by rescaling time.) The optimal control for nonzero-mean cases can be obtained by introducing a suitable time-varying drift, cf. [10, Remark 9]. The system is assumed to be uniformly controllable in the sense that the reachability Gramian

  M(t,s) = ∫_s^t Ψ(t,τ)B(τ)B(τ)'Ψ(t,τ)' dτ

is nonsingular for all s < t. Here Ψ(·,·) is the state transition matrix for A(·).
Sufficient conditions for optimality were given in [14, Proposition 1 and Section III] in the form of the following two Riccati equations with coupled boundary conditions:

  −dΠ(t)/dt = A(t)'Π(t) + Π(t)A(t) − Π(t)B(t)B(t)'Π(t) + Q(t),   (2a)
  −dH(t)/dt = A(t)'H(t) + H(t)A(t) + H(t)B(t)B(t)'H(t) − Q(t),   (2b)
  Σ_0^{-1} = Π(0) + H(0),   (2c)
  Σ_1^{-1} = Π(1) + H(1).   (2d)

The special case where Q(·) ≡ 0, i.e., where the state penalty is zero, is treated in [10], where a solution is given in closed form. A key contribution below is to show that the system (2a)-(2d) always has a solution. Thereby, under the stated conditions, an optimal control strategy always exists and turns out to be in the form of the state feedback

  u(t,x) = −B(t)'Π(t)x.   (3)

Theorem 1:
Consider positive definite matrices Σ_0, Σ_1 and a pair (A(·), B(·)) that is uniformly controllable. The coupled system of Riccati equations (2a)-(2d) has a unique solution, which is determined by the initial value problem consisting of (2a)-(2b) and

  Π(0) = (1/2)Σ_0^{-1} − Φ_{12}^{-1}Φ_{11} − Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2},   (4a)
  H(0) = Σ_0^{-1} − Π(0),   (4b)

where

  Φ(t,s) = [ Φ_{11}(t,s)  Φ_{12}(t,s) ; Φ_{21}(t,s)  Φ_{22}(t,s) ]   (5)

is the state transition matrix corresponding to ∂Φ(t,s)/∂t = M(t)Φ(t,s) with Φ(s,s) = I and

  M(t) = [ A(t)  −B(t)B(t)' ; −Q(t)  −A(t)' ],

and where

  [ Φ_{11}  Φ_{12} ; Φ_{21}  Φ_{22} ] := [ Φ_{11}(1,0)  Φ_{12}(1,0) ; Φ_{21}(1,0)  Φ_{22}(1,0) ].

We continue with two technical lemmas needed in the proof of the theorem.
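Before turning to the lemmas, Theorem 1 lends itself to a direct numerical check. The following minimal sketch (our illustration, not part of the paper; it assumes NumPy/SciPy and uses double-integrator data mirroring Section IV, A = [[0,1],[0,0]], BB' = diag(0,1), Q = I, Σ_0 = 2I, Σ_1 = (1/4)I) computes Π(0) and H(0) from (4a)-(4b), propagates the Riccati equations (2a)-(2b) forward on [0,1], and verifies the coupled boundary condition (2d):

```python
# Numerical sanity check of Theorem 1 (sketch, assuming NumPy/SciPy;
# the system data below are illustrative, not prescribed by the theorem).
import numpy as np
from scipy.linalg import expm, sqrtm, inv
from scipy.integrate import solve_ivp

n = 2
A = np.array([[0., 1.], [0., 0.]])
BBt = np.diag([0., 1.])                    # B B'
Q = np.eye(n)
Sig0, Sig1 = 2.0 * np.eye(n), 0.25 * np.eye(n)

# Transition matrix Phi(1,0) of the 2n x 2n matrix M(t) in (5) (here constant)
M = np.block([[A, -BBt], [-Q, -A.T]])
Phi = expm(M)
P11, P12 = Phi[:n, :n], Phi[:n, n:]        # blocks Phi_11, Phi_12

# Initial conditions (4a)-(4b)
S0h = np.real(sqrtm(Sig0))                 # Sigma_0^{1/2}
S0hi = inv(S0h)
inner = 0.25 * np.eye(n) + S0h @ inv(P12) @ Sig1 @ inv(P12.T) @ S0h
inner = 0.5 * (inner + inner.T)            # symmetrize round-off
Pi0 = 0.5 * inv(Sig0) - inv(P12) @ P11 - S0hi @ np.real(sqrtm(inner)) @ S0hi
Pi0 = 0.5 * (Pi0 + Pi0.T)
H0 = inv(Sig0) - Pi0                       # boundary condition (2c)

# Propagate the two Riccati equations (2a)-(2b) forward on [0,1]
def rhs(t, y):
    Pi, H = y[:n*n].reshape(n, n), y[n*n:].reshape(n, n)
    dPi = -(A.T @ Pi + Pi @ A - Pi @ BBt @ Pi + Q)
    dH = -(A.T @ H + H @ A + H @ BBt @ H - Q)
    return np.concatenate([dPi.ravel(), dH.ravel()])

sol = solve_ivp(rhs, (0.0, 1.0), np.concatenate([Pi0.ravel(), H0.ravel()]),
                rtol=1e-10, atol=1e-12)
Pi1 = sol.y[:n*n, -1].reshape(n, n)
H1 = sol.y[n*n:, -1].reshape(n, n)
print(np.allclose(Pi1 + H1, inv(Sig1), atol=1e-4))  # checks (2d)
```

The optimal feedback is then u(t,x) = −B(t)'Π(t)x as in (3). Had the other root of the underlying quadratic equation been used in place of (4a), the proof of the theorem shows that Π(·) would exhibit a finite escape time inside [0,1].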
Lemma 2: Given positive definite matrices X, Y,

  Y^{1/2} ( Y^{-1/2}X^{-1}Y^{-1/2} + (1/4)Y^{-1/2}X^{-1}Y^{-1}X^{-1}Y^{-1/2} )^{1/2} Y^{1/2} = X^{-1/2} ( (1/4)I + X^{1/2}YX^{1/2} )^{1/2} X^{-1/2}.   (6)

Proof:
Multiplying both sides of (6) by X^{1/2} from both left and right we obtain

  G ( (G'G)^{-1} + (1/4)(G'G)^{-2} )^{1/2} G' = ( (1/4)I + GG' )^{1/2},

where G denotes X^{1/2}Y^{1/2}. As both sides are positive definite, taking the square of both sides shows that the above is equivalent to

  G ( (G'G)^{-1} + (1/4)(G'G)^{-2} )^{1/2} G' G ( (G'G)^{-1} + (1/4)(G'G)^{-2} )^{1/2} G' = (1/4)I + GG'.

Since G'G commutes with ( (G'G)^{-1} + (1/4)(G'G)^{-2} )^{1/2}, the left-hand side is equal to

  G(G'G) ( (G'G)^{-1} + (1/4)(G'G)^{-2} ) G' = GG' + (1/4)G(G'G)^{-1}G' = GG' + (1/4)I,

since G is square and invertible, which completes the proof.

Lemma 3:
The entries of the state transition matrix in (5) satisfy:

  Φ_{11}(t,s)'Φ_{22}(t,s) − Φ_{21}(t,s)'Φ_{12}(t,s) = I,   (7a)
  Φ_{11}(t,s)'Φ_{21}(t,s) − Φ_{21}(t,s)'Φ_{11}(t,s) = 0,   (7b)
  Φ_{12}(t,s)'Φ_{22}(t,s) − Φ_{22}(t,s)'Φ_{12}(t,s) = 0,   (7c)
  Φ_{11}(t,s)Φ_{22}(t,s)' − Φ_{12}(t,s)Φ_{21}(t,s)' = I,   (7d)
  Φ_{11}(t,s)Φ_{12}(t,s)' − Φ_{12}(t,s)Φ_{11}(t,s)' = 0,   (7e)
  Φ_{21}(t,s)Φ_{22}(t,s)' − Φ_{22}(t,s)Φ_{21}(t,s)' = 0,   (7f)

for all s ≤ t. Moreover, both Φ_{11}(t,s) and Φ_{12}(t,s) are invertible for all s < t, and Φ_{11}(t,0)^{-1}Φ_{12}(t,0) is a monotonically decreasing function of t in the positive definite sense, with limit 0 as t ↘ 0.

Proof:
A direct consequence of the fact that M(t)J + JM(t)' = 0, with

  J = [ 0  I ; −I  0 ],

is that

  J(t,s) := Φ(t,s)' J Φ(t,s) ≡ J.   (8)

To see this, note that J(s,s) = J while ∂J(t,s)/∂t = 0. Likewise,

  Φ(t,s) J Φ(t,s)' ≡ J.   (9)

Then, (8) gives (7a)-(7c) and (9) gives (7d)-(7f).
We next show that both Φ_{11}(t,s) and Φ_{12}(t,s) are invertible for all s < t. Let T(t,s) = Φ_{11}(t,s)^{-1}Φ_{12}(t,s). Since Φ_{11}(s,s) = I, by continuity T(t,s) is well-defined for |t−s| sufficiently small. What's more, T(t,s) is symmetric by (7e). Taking the derivative of T with respect to s yields

  ∂T(t,s)/∂s = A(s)T(t,s) + T(t,s)A(s)' + B(s)B(s)' − T(t,s)Q(s)T(t,s).

This, together with the condition T(t,t) = 0 and the assumption that (A, B) is uniformly controllable, leads to T(t,s) < 0 for all s < t, which implies that both Φ_{11}(t,s) and Φ_{12}(t,s) are invertible for all s < t. Finally, taking the derivative of T with respect to t we obtain

  ∂T(t,s)/∂t = −Φ_{11}(t,s)^{-1} (∂Φ_{11}(t,s)/∂t) Φ_{11}(t,s)^{-1}Φ_{12}(t,s) + Φ_{11}(t,s)^{-1} ∂Φ_{12}(t,s)/∂t
    = Φ_{11}(t,s)^{-1}B(t)B(t)' ( Φ_{21}(t,s)Φ_{11}(t,s)^{-1}Φ_{12}(t,s) − Φ_{22}(t,s) )
    = Φ_{11}(t,s)^{-1}B(t)B(t)' ( Φ_{21}(t,s)Φ_{12}(t,s)'(Φ_{11}(t,s)^{-1})' − Φ_{22}(t,s) )
    = −Φ_{11}(t,s)^{-1}B(t)B(t)'(Φ_{11}(t,s)^{-1})' ≤ 0,

where we used (7d) and the fact that Φ_{11}(t,s)^{-1}Φ_{12}(t,s) is symmetric in the last two steps.
Therefore, we conclude that T(t,s) is a continuous, monotonically decreasing function of t (> s) in the positive definite sense, with limit T(s,s) = 0 as t ↘ s.

Proof of Theorem 1: The basic idea is to recast the Riccati equations (2a)-(2b) as linear differential equations in the standard manner. To this end, let [X(t)', Y(t)']' be the solution of

  [ dX/dt ; dY/dt ] = [ A(t)  −B(t)B(t)' ; −Q(t)  −A(t)' ] [ X ; Y ].   (10)

Then

  Π(t) = Y(t)X(t)^{-1}   (11)

is a solution to the Riccati equation (2a) provided that X(t) is invertible for all t. To see this, differentiate (11) to obtain

  −dΠ(t)/dt = −(dY/dt)X^{-1} + YX^{-1}(dX/dt)X^{-1}
    = (QX + A'Y)X^{-1} + YX^{-1}(AX − BB'Y)X^{-1}
    = A'YX^{-1} + YX^{-1}A − YX^{-1}BB'YX^{-1} + Q
    = A'Π(t) + Π(t)A − Π(t)BB'Π(t) + Q,

which coincides with (2a). Similarly,

  H(t) = −(X̂(t)')^{-1}Ŷ(t)'   (12)

with

  [ dX̂/dt ; dŶ/dt ] = [ A(t)  −B(t)B(t)' ; −Q(t)  −A(t)' ] [ X̂ ; Ŷ ]   (13)

is a solution to (2b) provided that X̂(t) is invertible for all t. Plugging (11) and (12) into the boundary conditions (2c) and (2d) yields

  Σ_0^{-1} = Y(0)X(0)^{-1} − (X̂(0)')^{-1}Ŷ(0)',
  Σ_1^{-1} = Y(1)X(1)^{-1} − (X̂(1)')^{-1}Ŷ(1)'.

Since [X(t)', Y(t)']' has the linear dynamics (10), we have

  [ X(1) ; Y(1) ] = [ Φ_{11}  Φ_{12} ; Φ_{21}  Φ_{22} ] [ X(0) ; Y(0) ].

Similarly,

  [ X̂(1) ; Ŷ(1) ] = [ Φ_{11}  Φ_{12} ; Φ_{21}  Φ_{22} ] [ X̂(0) ; Ŷ(0) ].

Moreover, without loss of generality, we can assume X(0) = X̂(0) = I, because their initial values can be absorbed into Y(0) and Ŷ(0) without changing the values of Π(0) and H(0). In this case, the only unknowns Y(0), Ŷ(0) are symmetric. Combining the above we obtain

  Σ_0^{-1} = Y(0) − Ŷ(0),   (14a)
  Σ_1^{-1} = (Φ_{21} + Φ_{22}Y(0))(Φ_{11} + Φ_{12}Y(0))^{-1} − (Φ_{11}' + Ŷ(0)Φ_{12}')^{-1}(Φ_{21}' + Ŷ(0)Φ_{22}').   (14b)

Multiplying (14b) with (Φ_{11}' + Ŷ(0)Φ_{12}') from the left and (Φ_{11} + Φ_{12}Y(0)) from the right yields

  (Φ_{11}' + Ŷ(0)Φ_{12}') Σ_1^{-1} (Φ_{11} + Φ_{12}Y(0))
    = (Φ_{11}' + Ŷ(0)Φ_{12}')(Φ_{21} + Φ_{22}Y(0)) − (Φ_{21}' + Ŷ(0)Φ_{22}')(Φ_{11} + Φ_{12}Y(0))
    = Φ_{11}'Φ_{21} + Φ_{11}'Φ_{22}Y(0) + Ŷ(0)Φ_{12}'Φ_{21} + Ŷ(0)Φ_{12}'Φ_{22}Y(0)
      − Φ_{21}'Φ_{11} − Φ_{21}'Φ_{12}Y(0) − Ŷ(0)Φ_{22}'Φ_{11} − Ŷ(0)Φ_{22}'Φ_{12}Y(0)
    = Y(0) − Ŷ(0),   (15)

where we use the three identities (7a)-(7c) in the last step. By (14a), Y(0) and Ŷ(0) can be parameterized by a symmetric matrix Z as

  Y(0) = Z + (1/2)Σ_0^{-1},   (16a)
  Ŷ(0) = Z − (1/2)Σ_0^{-1}.   (16b)

Plugging these into (15) yields

  Σ_0^{-1} = (Φ_{11}' − (1/2)Σ_0^{-1}Φ_{12}' + ZΦ_{12}') Σ_1^{-1} (Φ_{11} + (1/2)Φ_{12}Σ_0^{-1} + Φ_{12}Z).

Expanding this and exploiting the symmetry we obtain the quadratic equation

  ZΦ_{12}'Σ_1^{-1}Φ_{12}Z + ZΦ_{12}'Σ_1^{-1}Φ_{11} + Φ_{11}'Σ_1^{-1}Φ_{12}Z + Φ_{11}'Σ_1^{-1}Φ_{11} = Σ_0^{-1} + (1/4)Σ_0^{-1}Φ_{12}'Σ_1^{-1}Φ_{12}Σ_0^{-1}

in Z. By completion of the square the left-hand side is

  (Z + Φ_{11}'(Φ_{12}')^{-1}) Φ_{12}'Σ_1^{-1}Φ_{12} (Z + Φ_{12}^{-1}Φ_{11}).

Note that here we use the fact that Φ_{12} is invertible (see Lemma 3). By (7e), Φ_{12}^{-1}Φ_{11} is symmetric, therefore

  ( T^{-1/2} (Z + Φ_{12}^{-1}Φ_{11}) T^{-1/2} )² = T^{-1/2} ( Σ_0^{-1} + (1/4)Σ_0^{-1}T^{-1}Σ_0^{-1} ) T^{-1/2},

where T = (Φ_{12}'Σ_1^{-1}Φ_{12})^{-1}. It follows that the only solutions are

  Z_± = −Φ_{12}^{-1}Φ_{11} ± T^{1/2} ( T^{-1/2}Σ_0^{-1}T^{-1/2} + (1/4)T^{-1/2}Σ_0^{-1}T^{-1}Σ_0^{-1}T^{-1/2} )^{1/2} T^{1/2}.

Since Σ_0 and T are positive definite, we can apply Lemma 2 (with X = Σ_0, Y = T) and arrive at

  Z_± = −Φ_{12}^{-1}Φ_{11} ± Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

The unknowns Y(0) and Ŷ(0) can be obtained by plugging the above into (16).
We next show that when Z = Z_−, the solutions to (10) and (13) are such that X(t) and X̂(t) are invertible for all t ∈ [0,1], while this is not the case when Z = Z_+. This implies that when Z = Z_−, the pair (Π(·), H(·)) in (11) and (12) is well defined and solves the coupled Riccati equations (2), whereas Π(·) or H(·) would have a finite escape time when Z = Z_+.
By (10), recalling the initial condition X(0) = I,

  X(t) = Φ_{11}(t,0) + Φ_{12}(t,0)Y(0) = Φ_{11}(t,0) + Φ_{12}(t,0)((1/2)Σ_0^{-1} + Z).

Since Φ_{12}(t,0) is nonsingular for all t ∈ (0,1], it follows that

  Φ_{12}(t,0)^{-1}X(t) = Φ_{12}(t,0)^{-1}Φ_{11}(t,0) + (1/2)Σ_0^{-1} + Z.

First, when Z = Z_−, we have

  Φ_{12}(t,0)^{-1}X(t) = Φ_{12}(t,0)^{-1}Φ_{11}(t,0) − Φ_{12}^{-1}Φ_{11} + (1/2)Σ_0^{-1} − Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

By Lemma 3, Φ_{12}(t,0)^{-1}Φ_{11}(t,0) ≤ Φ_{12}(1,0)^{-1}Φ_{11}(1,0) = Φ_{12}^{-1}Φ_{11}; therefore, for any t ∈ (0,1],

  Φ_{12}(t,0)^{-1}X(t) ≤ (1/2)Σ_0^{-1} − Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2} < 0

is invertible. This indicates that X(t) is invertible for all t ∈ [0,1]. On the other hand, when Z = Z_+,

  Φ_{12}(t,0)^{-1}X(t) = Φ_{12}(t,0)^{-1}Φ_{11}(t,0) − Φ_{12}^{-1}Φ_{11} + (1/2)Σ_0^{-1} + Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

By Lemma 3, Φ_{12}(t,0)^{-1}Φ_{11}(t,0) − Φ_{12}^{-1}Φ_{11} ↘ −∞ as t ↘ 0. Thus, for small enough s > 0, Φ_{12}(s,0)^{-1}X(s) is symmetric and negative definite. But for t = 1,

  Φ_{12}(1,0)^{-1}X(1) = (1/2)Σ_0^{-1} + Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2} > 0.

Hence, by continuity of X(t), we conclude that there exists τ ∈ (s,1) such that X(τ) is singular. This implies that Π(t) grows unbounded at t = τ. An analogous argument can be carried out for X̂ and H.
Finally, setting Z = Z_− in (16) and recalling that X(0) = X̂(0) = I, we obtain

  Π(0) = (1/2)Σ_0^{-1} − Φ_{12}^{-1}Φ_{11} − Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2},
  H(0) = Σ_0^{-1} − Π(0).

This completes the proof.
The result for the case Q ≡ 0 in [10, Proposition 4, Remark 6] can be recovered as a special case of Theorem 1.

Corollary 4:
Given Σ_0, Σ_1 > 0 and a controllable pair (A(·), B(·)), the system of Riccati equations (2) with Q ≡ 0 has a unique solution, which is determined by the initial conditions

  Π(0) = (1/2)Σ_0^{-1} + Ψ(1,0)'M(1,0)^{-1}Ψ(1,0) − Σ_0^{-1/2} ( (1/4)I + Σ_0^{1/2}Ψ(1,0)'M(1,0)^{-1}Σ_1M(1,0)^{-1}Ψ(1,0)Σ_0^{1/2} )^{1/2} Σ_0^{-1/2},
  H(0) = Σ_0^{-1} − Π(0),

where Ψ is the state transition matrix of A(·) and M is the corresponding reachability Gramian.

Proof:
Simply note that when Q ≡ 0 we have Φ_{11} = Ψ(1,0) and Φ_{12} = −M(1,0)(Ψ(1,0)')^{-1}.

III. ZERO-NOISE LIMIT AND OMT WITH LOSSES
The zero-noise limit of the optimal steering problem (1) is an optimal mass transport (OMT) problem with general quadratic cost. That is, the solution of

  inf_{u∈U} E{ ∫_0^1 [ (1/2)‖u(t)‖² + (1/2)x(t)'Q(t)x(t) ] dt },   (17a)
  dx(t) = A(t)x(t)dt + B(t)u(t)dt + √ε B(t)dw(t),   (17b)
  x(0) ∼ ρ_0,  x(1) ∼ ρ_1,   (17c)

converges to the solution of

  inf_{u∈U} E{ ∫_0^1 [ (1/2)‖u(t)‖² + (1/2)x(t)'Q(t)x(t) ] dt },   (18a)
  dx(t) = A(t)x(t)dt + B(t)u(t)dt,   (18b)
  x(0) ∼ ρ_0,  x(1) ∼ ρ_1,   (18c)

as ε ↘ 0. The special case where Q ≡ 0 has been studied in [15]. See [16], [17], [18], [19] for proofs of the general cases.
By slightly modifying the results in Section II, we can readily obtain the solution to (17). The optimal control strategy for (17) is

  u(t,x) = −B(t)'Π_ε(t)x,

with Π_ε(·) satisfying the same Riccati equation (2a) with a proper initial condition Π_ε(0). The initial value is chosen in such a way that the covariance Σ_ε(·), that is, the solution to

  dΣ_ε(t)/dt = (A − BB'Π_ε)Σ_ε + Σ_ε(A − BB'Π_ε)' + εBB',   (19)

matches the two boundary values Σ_0 and Σ_1. Combining (2a), (19) and letting H_ε(t) = εΣ_ε(t)^{-1} − Π_ε(t) yields

  −dH_ε(t)/dt = A(t)'H_ε(t) + H_ε(t)A(t) + H_ε(t)B(t)B(t)'H_ε(t) − Q(t).

Therefore, to establish the optimal control for (17), we only need to solve the coupled Riccati equations (2a)-(2b) with boundary conditions

  εΣ_0^{-1} = Π_ε(0) + H_ε(0),
  εΣ_1^{-1} = Π_ε(1) + H_ε(1).
This is nothing but Theorem 1 with different boundary conditions. Therefore, the initial value for Π_ε(t) is

  Π_ε(0) = (ε/2)Σ_0^{-1} − Φ_{12}^{-1}Φ_{11} − Σ_0^{-1/2} ( (ε²/4)I + Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

Letting ε → 0 we obtain that the solution to the optimal mass transport problem (18) is

  u(t,x) = −B(t)'Π_0(t)x,

where Π_0(·) satisfies the Riccati equation (2a) with initial value

  Π_0(0) = −Φ_{12}^{-1}Φ_{11} − Σ_0^{-1/2} ( Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

Therefore, we established the following.
Theorem 5:
The solution to Problem (18) with zero-mean Gaussian marginals with covariances Σ_0, Σ_1 is

  u(t,x) = −B(t)'Π(t)x,

where Π is the solution of the Riccati equation (2a) with initial value

  Π(0) = −Φ_{12}^{-1}Φ_{11} − Σ_0^{-1/2} ( Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

(See [15] for a precise statement of this convergence, which involves weak convergence of path-space probability measures and of their initial-final joint marginals.)
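The convergence Π_ε(0) → Π(0) behind this limit can also be observed numerically. The sketch below (our illustration, assuming NumPy/SciPy, with the same illustrative double-integrator data as earlier: A = [[0,1],[0,0]], BB' = diag(0,1), Q = I, Σ_0 = 2I, Σ_1 = (1/4)I) evaluates the closed-form Π_ε(0) for decreasing ε and compares it with the limit above; the gap is dominated by the (ε/2)Σ_0^{-1} term and hence shrinks roughly linearly in ε:

```python
# Zero-noise limit, numerically (sketch; the system data are illustrative).
import numpy as np
from scipy.linalg import expm, sqrtm, inv

n = 2
A = np.array([[0., 1.], [0., 0.]])
BBt = np.diag([0., 1.])                    # B B'
Q = np.eye(n)
Sig0, Sig1 = 2.0 * np.eye(n), 0.25 * np.eye(n)

M = np.block([[A, -BBt], [-Q, -A.T]])
Phi = expm(M)                              # Phi(1,0)
P11, P12 = Phi[:n, :n], Phi[:n, n:]

S0h = np.real(sqrtm(Sig0))                 # Sigma_0^{1/2}
S0hi = inv(S0h)
K = S0h @ inv(P12) @ Sig1 @ inv(P12.T) @ S0h   # Sigma_0^{1/2} Phi_12^{-1} Sigma_1 (Phi_12')^{-1} Sigma_0^{1/2}
K = 0.5 * (K + K.T)                        # symmetrize round-off

def Pi_eps0(eps):
    # Closed-form initial condition for the eps-noise problem (17)
    inner = (eps**2 / 4.0) * np.eye(n) + K
    return (eps / 2.0) * inv(Sig0) - inv(P12) @ P11 - S0hi @ np.real(sqrtm(inner)) @ S0hi

# Limit given by Theorem 5
Pi_lim = -inv(P12) @ P11 - S0hi @ np.real(sqrtm(K)) @ S0hi
errs = [np.linalg.norm(Pi_eps0(e) - Pi_lim) for e in (1.0, 0.1, 0.01)]
print(errs)   # the gap shrinks as eps decreases
```

The resulting Riccati equation (2a) is then integrated from Π_ε(0) (or its limit) exactly as in the unit-noise case.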
Evidently, we can similarly solve the slightly more general optimal mass transport problem

  inf_{u∈U} E{ ∫_0^1 [ (1/2)u(t)'R(t)u(t) + (1/2)x(t)'Q(t)x(t) ] dt },   (20a)
  dx(t) = A(t)x(t)dt + B(t)u(t)dt,   (20b)
  x(0) ∼ ρ_0,  x(1) ∼ ρ_1,   (20c)

where R(t), 0 ≤ t ≤ 1, is positive definite, as this reduces to (18) by setting ũ(t) = R(t)^{1/2}u(t) and B̃(t) = B(t)R(t)^{-1/2}. More specifically, the solution to (20) with zero-mean Gaussian marginals having covariances Σ_0, Σ_1 is given by

  u(t,x) = −R(t)^{-1}B(t)'Π(t)x,

where Π is the solution of

  −dΠ(t)/dt = A(t)'Π(t) + Π(t)A(t) − Π(t)B(t)R(t)^{-1}B(t)'Π(t) + Q(t)

with initial value

  Π(0) = −Φ_{12}^{-1}Φ_{11} − Σ_0^{-1/2} ( Σ_0^{1/2}Φ_{12}^{-1}Σ_1(Φ_{12}')^{-1}Σ_0^{1/2} )^{1/2} Σ_0^{-1/2}.

Here

  Φ(t,s) = [ Φ_{11}(t,s)  Φ_{12}(t,s) ; Φ_{21}(t,s)  Φ_{22}(t,s) ]

is the state transition matrix corresponding to ∂Φ(t,s)/∂t = M(t)Φ(t,s) with Φ(s,s) = I and

  M(t) = [ A(t)  −B(t)R(t)^{-1}B(t)' ; −Q(t)  −A(t)' ],

and, as before,

  [ Φ_{11}  Φ_{12} ; Φ_{21}  Φ_{22} ] := [ Φ_{11}(1,0)  Φ_{12}(1,0) ; Φ_{21}(1,0)  Φ_{22}(1,0) ].

IV. EXAMPLES
Consider inertial particles modeled by

  dx(t) = v(t)dt,
  dv(t) = u(t)dt + √ε dw(t),

where u(t) is a control input (force) at our disposal, x(t) represents the position and v(t) the velocity of the particles, and w(t) represents random excitation (corresponding to "white noise" forcing). Our goal is to steer the spread of the particles from an initial Gaussian distribution with Σ_0 = 2I at t = 0 to the terminal marginal Σ_1 = (1/4)I at t = 1, in a way such that the cost functional (1a) is minimized.
Figure 1 displays typical sample paths {(x(t), v(t)) | t ∈ [0,1]} in phase space, as a function of time, attained using the optimal feedback strategy derived following (3) with Q = I. In all phase plots, the transparent blue "tube" represents the "3σ" tolerance region. More specifically, the intersection ellipsoid between the tube and the slice plane at time t is the set

  { (x, v) : [x  v] Σ(t)^{-1} [x  v]' ≤ 9 }.

The feedback gains K(t) = [k_1(t), k_2(t)] are shown in Figure 2 as a function of time. Figure 3 shows the corresponding control action for each trajectory.
For comparison, Figures 4 and 5 display typical sample paths under the optimal control strategies for Q = 10I and Q = −I, respectively. As expected, Σ(·) shrinks faster as we increase the state penalty Q, which is consistent with the reference evolution losing probability mass at a higher rate at places where x'Qx is large, while Σ(·) will expand first when Q is negative, since the particles then have a tendency to stay away from the origin to reduce the cost.
To see the zero-noise limit behavior of the problem, we take different levels of noise intensity with the same Q = I. Figures 6 and 7 depict typical sample paths for ε = 10 and ε = 0.1, respectively. As can be observed, the results converge to those of Problem (18), which is shown in Figure 8.

V. CONCLUSION
The general theme of the work that was presented in Parts I, II, [10], [11], as well as in the present one, Part III, is the control of linear stochastic dynamical systems between specified distributions of their state vectors. This type of problem represents a "soft conditioning" of terminal constraints that typically arise in LQG theory. It can also be seen as a precise variant of the rather indirect, and certainly less accurate, route of approximately regulating the distribution of the terminal state in LQG designs via a suitable choice of quadratic penalties. Although the development is reminiscent of classical LQG theory, in each case we studied, the key problem leads to an atypical two-point boundary value problem involving a pair of matrix Riccati equations nonlinearly coupled through their boundary conditions.
The earlier works [10], [11] dealt with the case where a quadratic cost penalty is imposed on the input vector alone and, respectively, where stochastic excitation and control affect the system through the same or different channels. There is a substantial difference between the two that necessitated separate treatments. The present work, Part III, details the technical issues that arise when a quadratic cost on the state vector is also present. It is important to point out that herein we assume that noise and control input enter into the system via the same channel, i.e., the same "B" matrix, very much as in the model taken in [10].
The case where this is not so is currently open.
We note that the control problems to steer a stochastic linear system between terminal distributions, for the case where stochastic excitation and control input enter in the same manner, admit a Bayesian-like interpretation in that the law of the controlled system is the closest, in the relative entropy sense, to that of the uncontrolled system (the "prior"); the presence of a state penalty is related to creation/killing in the sense of Feynman-Kac [13], [5] of the uncontrolled evolution and was discussed in [14]. Such an interpretation fails when the respective "B"-matrices differ (as in the model in [11]), because in this case the relative entropy between the two laws is infinite.
Another fruitful direction is the one taken in [12], where a further relaxation of the terminal constraints was cast as a penalty on the Wasserstein distance between the terminal distribution and a pre-specified target distribution. The work in [12] provides necessary conditions, while no probabilistic/Bayesian interpretation of this formulation is available at present. Recent related contributions include [20], where a discrete counterpart of SBP is considered, and [21], where the author brings integral quadratic constraints into the corresponding covariance control problem at hand.
In all cases considered, a natural by-product is the theory to control linear deterministic systems, i.e., without stochastic excitation, between uncertain marginals for their state vectors. The underlying problem is again one of stochastic control by virtue of the random boundary state distributions. Most importantly, it represents a variant of optimal mass transport where the "particles" to be transported from an initial distribution to a final one obey non-trivial dynamics.
Thus, the results in the present paper provide yet another generalization of optimal mass transport where the transportation cost derives from an action functional with a quadratic Lagrangian that does not satisfy the usual strict convexity assumption in the ẋ variable (see [22]).

REFERENCES

[1] E. Schrödinger, "Über die Umkehrung der Naturgesetze," Sitzungsberichte der Preuss. Akad. Wissen. Phys. Math. Klasse, Sonderausgabe, vol. IX, pp. 144–153, 1931.
[2] ——, "Sur la théorie relativiste de l'électron et l'interprétation de la mécanique quantique," in Annales de l'institut Henri Poincaré, vol. 2, no. 4. Presses universitaires de France, 1932, pp. 269–310.
[3] P. Dai Pra, "A stochastic control approach to reciprocal diffusion processes," Applied Mathematics and Optimization, vol. 23, no. 1, pp. 313–329, 1991.
[4] A. Blaquière, "Controllability of a Fokker-Planck equation, the Schrödinger system, and a related stochastic optimal control (revised version)," Dynamics and Control, vol. 2, no. 3, pp. 235–253, 1992.
[5] P. Dai Pra and M. Pavon, "On the Markov processes of Schrödinger, the Feynman–Kac formula and stochastic control," in Realization and Modelling in System Theory. Springer, 1990, pp. 497–504.
[6] M. Pavon and A. Wakolbinger, "On free energy, stochastic control, and Schrödinger processes," in Modeling, Estimation and Control of Systems with Uncertainty. Springer, 1991, pp. 334–348.
[7] R. Filliger, M.-O. Hongler, and L. Streit, "Connection between an exactly solvable stochastic optimal control problem and a nonlinear reaction-diffusion equation," Journal of Optimization Theory and Applications, vol. 137, no. 3, pp. 497–505, 2008.
[8] R. Brockett, "Notes on the control of the Liouville equation," in Control of Partial Differential Equations. Springer, 2012, pp. 101–129.
[9] W. Fleming and R. Rishel, Deterministic and Stochastic Optimal Control. Springer, 1975.
[10] Y. Chen, T. T. Georgiou, and M. Pavon, "Optimal steering of a linear stochastic system to a final probability distribution, Part I," IEEE Trans. on Automatic Control, vol. 61, no. 5, pp. 1158–1169, 2016.
[11] ——, "Optimal steering of a linear stochastic system to a final probability distribution, Part II," IEEE Trans. on Automatic Control, vol. 61, no. 5, pp. 1170–1180, 2016.
[12] A. Halder and E. D. B. Wendel, "Finite horizon linear quadratic Gaussian density regulator with Wasserstein terminal cost," in Proc. American Control Conf., 2016.
[13] A. Wakolbinger, "Schrödinger bridges from 1931 to 1991," in Proc. of the 4th Latin American Congress in Probability and Mathematical Statistics, Mexico City, 1990, pp. 61–79.
[14] Y. Chen, T. T. Georgiou, and M. Pavon, "Optimal steering of inertial particles diffusing anisotropically with losses," in Proc. American Control Conf., 2015, pp. 1252–1257.
[15] Y. Chen, T. Georgiou, and M. Pavon, "Optimal transport over a linear dynamical system," arXiv preprint arXiv:1502.01265, IEEE Trans. on Automatic Control, to appear, 2017.
[16] T. Mikami, "Monge's problem with a quadratic cost by the zero-noise limit of h-path processes," Probability Theory and Related Fields, vol. 129, no. 2, pp. 245–260, 2004.
Fig. 1: Inertial particles: state trajectories.
Fig. 2: Inertial particles: feedback gains.

[17] T. Mikami and M. Thieullen, "Optimal transportation problem by stochastic optimal control," SIAM Journal on Control and Optimization, vol. 47, no. 3, pp. 1127–1139, 2008.
[18] C. Léonard, "From the Schrödinger problem to the Monge–Kantorovich problem," Journal of Functional Analysis, vol. 262, no. 4, pp. 1879–1920, 2012.
[19] ——, "A survey of the Schrödinger problem and some of its connections with optimal transport," Discrete Contin. Dyn. Syst. A, vol. 34, no. 4, pp. 1533–1574, 2014.
[20] I. G. Vladimirov and I. R. Petersen, "State distributions and minimum relative entropy noise sequences in uncertain stochastic systems: The discrete-time case," SIAM Journal on Control and Optimization, vol. 53, no. 3, pp. 1107–1153, 2015.
[21] E. Bakolas, "Optimal covariance control for stochastic linear systems subject to integral quadratic state constraints," in Proc. IEEE Conf. on Decision and Control, 2016, pp. 7231–7236.
[22] Y. Chen, T. T. Georgiou, and M. Pavon, "On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint," Journal of Optimization Theory and Applications, vol. 169, no. 2, pp. 671–691, 2016.