Covariance Steering of Discrete-Time Stochastic Linear Systems Based on Distribution Distance Terminal Costs
Isin M. Balci and Efstathios Bakolas
Abstract — We consider a class of stochastic optimal control problems for discrete-time stochastic linear systems which seek for control policies that will steer the probability distribution of the terminal state of the system close to a desired Gaussian distribution. In our problem formulation, the closeness between the terminal state distribution and the desired (goal) distribution is measured in terms of the squared Wasserstein distance, which is associated with a corresponding terminal cost term. We recast the stochastic optimal control problem as a finite-dimensional nonlinear program and we show that its performance index can be expressed as the difference of two convex functions. This representation of the performance index allows us to find local minimizers of the original nonlinear program via the so-called convex-concave procedure [1]. Subsequently, we consider a similar problem but this time we use a terminal cost that corresponds to the KL divergence. Finally, we present non-trivial numerical simulations to demonstrate the proposed techniques and compare them in terms of computation time.
I. INTRODUCTION
We consider covariance steering problems for discrete-time stochastic linear systems in which, however, the constraints on the terminal state covariance are enforced indirectly by means of appropriate terminal costs. Specifically, we consider the problem of steering the state of a stochastic system, which is originally drawn from a given Gaussian distribution, to a terminal state whose distribution is "close" to a desired (prescribed) Gaussian distribution, where the closeness between the two distributions is measured in terms of the squared Wasserstein distance or the Kullback-Leibler divergence. We show that the resulting problems can be reduced to tractable optimization problems which can be solved efficiently if one exploits their structure.
Literature Review:
The main focus of the first attempts to study covariance steering problems [2, 3, 4] was on finding stabilizing controllers that drive the state covariance to a desired positive definite matrix asymptotically (infinite-horizon case). Finite-horizon covariance control problems for continuous-time linear systems were recently studied in [5, 6, 7, 8]. Covariance steering problems for discrete-time systems are also receiving significant attention at present. In [9], the constrained covariance steering problem is recast as a finite-dimensional convex optimization problem based on a semidefinite relaxation of the constraint on the terminal state covariance. Covariance steering problems with convex chance constraints are studied in [10].
I. M. Balci (graduate student) and E. Bakolas (Associate Professor) are with the Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, Austin, Texas 78712-1221, USA. Emails: [email protected]; [email protected]. This research has been supported in part by NSF awards ECCS-1924790 and CMMI-1937957.
In the previously discussed references, the specifications on the terminal state covariance correspond to hard constraints which often lead to difficult problems (for instance, the analytic solution to the covariance steering problem presented in [5] is only valid for the special case in which the input and noise channels coincide). An alternative problem formulation, which has inspired this paper, is presented in [11], in which a terminal cost is used as a "soft" constraint on the terminal state covariance. The latter cost corresponds to the squared Wasserstein distance between a desired state distribution and the "actual" terminal state distribution. The latter formulation leads to a standard two-point boundary value problem which can be solved by means of indirect shooting methods. It is well known that the success of such methods relies on knowledge of good initial guesses and thus, in general, a systematic process for the computation of the solution to the class of covariance steering problems proposed in [11] with soft terminal constraints is still missing.
Main Contribution:
We first formulate the covariance steering problem as a stochastic optimal control problem in which the requirement on the terminal state covariance is encoded in a terminal cost term ("soft constraint"). Similarly with [11], we consider the case in which the terminal cost corresponds to the squared Wasserstein distance between the actual terminal state distribution and the desired Gaussian distribution but, in contrast with the latter reference, we consider the discrete-time case. First, we recast this stochastic optimal control problem as a (deterministic) nonlinear program by utilizing an affine state feedback control policy parametrization (the control input at each stage is an affine function of the history of visited states). Then, we show that the performance index of the nonlinear program can be expressed as the difference of two convex functions by using a suitable bilinear transformation of the decision variables. To the best of our knowledge, this is the first paper that shows that covariance steering problems can be formulated as a difference of convex functions program (DCP). By leveraging this fact, one can find local minimizers of the nonlinear program via efficient techniques such as the so-called convex-concave procedure (CCP) [1, 12]. The CCP is an iterative procedure which can compute local minimizers of non-convex optimization problems which correspond to DCPs based on successive convexifications. Exploiting this extra structure of the problem reduces its complexity and allows us to use convex optimization solvers, which in turn leads to improved scalability and numerical efficiency. Finally, we consider the same class of problems when the terminal cost corresponds to the Kullback-Leibler divergence, which is used as a measure of the closeness between the terminal state distribution and the goal distribution (one can also consider different generalized distance metrics between the two distributions; a review of distance metrics on probability distributions can be found in [13]). Even though the resulting nonlinear program does not correspond to a DCP, we show empirically that one can compute its local minimizers by using interior-point methods for nonlinear programs.

Outline:
The rest of the paper is organized as follows. Section II presents the problem formulation. In Section III, we show that when the terminal cost is the squared Wasserstein distance, the covariance steering problem can be associated with a difference of convex functions program. In Section IV, we provide an alternative problem formulation in which the terminal cost corresponds to the KL divergence. In Section V, we present numerical simulations. Finally, Section VI concludes the paper with a summary of remarks and future research directions.

II. PROBLEM FORMULATION
A. Notation
We denote by R^n the set of n-dimensional real vectors and by R and R_+ (resp., R_{++}) the set of real numbers and non-negative (resp., strictly positive) real numbers, respectively. The sets of non-negative and strictly positive integers are denoted by Z_+ and Z_{++}, respectively. We denote by E[·] the expectation operator. Given a random vector x, we denote its mean vector and covariance matrix by E[x] and Cov[x], respectively. The space of n × n symmetric matrices is denoted by S_n and the cone of positive semi-definite (definite) symmetric matrices by S_n^+ (S_n^{++}). The trace of a square matrix is denoted as tr(·). The transpose of a matrix A ∈ R^{n×m} is denoted by A^T and its nuclear norm by ‖A‖_*, where ‖A‖_* := tr((A^T A)^{1/2}). The block diagonal matrix formed by n matrices A_1, ..., A_n is denoted by blkdiag(A_1, ..., A_n). The zero matrix is denoted as 0, whereas the identity matrix as I. We write x ∼ N(μ, S) to denote that x is a Gaussian random vector with mean μ ∈ R^n and covariance S ∈ S_n^{++}.

B. Distance Between Probability Distributions
In this paper, we will formulate stochastic optimal control problems with terminal costs that measure the closeness between the final state distribution and a desired probability distribution. In particular, we will consider two different distribution (generalized) distance functions, namely, the Wasserstein distance and the KL divergence.
1) Wasserstein Distance Between Two Distributions:
The Wasserstein distance between two probability measures is a valid distance metric (in the strict mathematical sense) because it satisfies all of the properties of a metric. Given two random vectors x_1, x_2 over R^n with probability density functions ρ_1, ρ_2, their squared Wasserstein distance is defined as follows:

W^2(ρ_1, ρ_2) := inf_{ρ ∈ P(ρ_1, ρ_2)} E_y[‖x_1 − x_2‖^2],   (1)

where y := [x_1^T, x_2^T]^T has a probability density function (pdf) ρ : R^{2n} → R_+. Furthermore, P(ρ_1, ρ_2) denotes the set of all probability distributions over R^{2n} with finite second moments and marginals ρ_1 and ρ_2 on x_1 and x_2, respectively. If x_i ∼ N(μ_i, S_i) for i = 1, 2, where μ_i ∈ R^n and S_i ∈ S_n^{++}, then the squared Wasserstein distance is given by [14]

W^2(ρ_1, ρ_2) = ‖μ_1 − μ_2‖^2 + tr(S_1 + S_2 − 2(S_2^{1/2} S_1 S_2^{1/2})^{1/2}).   (2)
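The closed-form expression (2) is straightforward to evaluate numerically. The following is a minimal numpy sketch (the helper names `sqrtm_spd` and `w2_squared_gaussian` are ours, not from the paper); the matrix square root of a symmetric positive semi-definite matrix is computed via its eigendecomposition:

```python
import numpy as np

def sqrtm_spd(S):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_squared_gaussian(mu1, S1, mu2, S2):
    """Squared Wasserstein distance between N(mu1, S1) and N(mu2, S2), Eq. (2)."""
    S2h = sqrtm_spd(S2)
    cross = sqrtm_spd(S2h @ S1 @ S2h)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))
```

For equal covariances the expression reduces to the squared distance between the means, and for equal means it depends only on the covariance mismatch.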
2) Kullback-Leibler Divergence:
The KL divergence is not a metric in the strict mathematical sense (it does not enjoy the symmetry property) but it is often used to compute the "distance" between two distributions because of its ease of computation. In particular, given two probability distributions with density functions ρ_1(x) and ρ_2(x), their KL divergence is defined as:

KL(ρ_1 ‖ ρ_2) = ∫ ρ_1(x) log(ρ_1(x)/ρ_2(x)) dx,   (3)

where ρ_2(x) > 0 over the domain of integration. When ρ_1 and ρ_2 correspond to the densities of two Gaussian distributions N(μ_1, S_1) and N(μ_2, S_2), where μ_i ∈ R^n and S_i ∈ S_n^{++}, their KL divergence is given by

KL(ρ_1 ‖ ρ_2) = (1/2)[tr(S_2^{−1} S_1) + (μ_2 − μ_1)^T S_2^{−1} (μ_2 − μ_1) − n + log(det(S_2)/det(S_1))].   (4)

C. Problem Statement
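As with (2), the Gaussian KL divergence (4) admits a direct numerical evaluation. A minimal numpy sketch (the function name `kl_gaussian` is ours), using `slogdet` for numerical stability of the log-determinant terms:

```python
import numpy as np

def kl_gaussian(mu1, S1, mu2, S2):
    """KL divergence KL(rho1 || rho2) between N(mu1, S1) and N(mu2, S2), Eq. (4)."""
    n = mu1.size
    S2inv = np.linalg.inv(S2)
    dmu = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(S1)  # log det(S1), sign discarded (S1 is PD)
    _, logdet2 = np.linalg.slogdet(S2)
    return float(0.5 * (np.trace(S2inv @ S1) + dmu @ S2inv @ dmu
                        - n + logdet2 - logdet1))
```

Note the asymmetry: swapping the arguments generally changes the value, which is why the order of ρ_N and ρ_d matters in Section IV.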
We consider an uncertain system whose dynamics is described by the following discrete-time stochastic linear state space model:

x_{k+1} = A_k x_k + B_k u_k + G_k w_k,  ∀k ∈ Z_+,   (5)

where {x_k}_{k∈Z_+} is the state (random) process over R^{n_x}, {u_k}_{k∈Z_+} is the input process over R^{n_u} and {w_k}_{k∈Z_+} is the noise (random) process over R^{n_w}. In particular, {w_k}_{k∈Z_+} corresponds to a white Gaussian noise process with E[w_k] = 0 and E[w_k w_m^T] = δ(k, m) I, where δ(k, m) = 1 when k = m and δ(k, m) = 0, otherwise. We also assume that the initial state x_0 ∼ N(μ_0, S_0) and that x_0 and {w_k} are mutually independent, which implies that E[x_0 w_k^T] = 0 for all k ∈ Z_+.

Our objective is to drive the uncertain state of the system (5) from its given initial distribution to a terminal distribution which is close to a desired terminal Gaussian probability distribution N(μ_d, S_d), where μ_d ∈ R^n and S_d ∈ S_n^{++} are given, at a given finite time, while minimizing a relevant performance index. Next, we provide the precise formulation of our problem.

Problem 1.
Let μ_0, μ_f ∈ R^{n_x}, S_0, S_f ∈ S_{n_x}^{++}, λ > 0 and N ∈ Z_{++} be given. In addition, let Π denote the set of all admissible control policies π := {m_0(·), ..., m_{N−1}(·)} for system (5), with u_k = m_k(X_k), where X_k denotes the (finite) sequence of states visited up to stage t = k, that is, X_k := {x_0, x_1, ..., x_k}, and m_k(X_k) are measurable functions of the elements of X_k, for k = 0, ..., N−1. Then, find a control policy π* ∈ Π that solves the following stochastic optimal control problem:

min_{π ∈ Π}  E[Σ_{k=0}^{N−1} u_k^T u_k] + λ φ(ρ_N, ρ_d)   (6a)
subject to  x_{k+1} = A_k x_k + B_k u_k + G_k w_k,   (6b)
            x_0 ∼ N(μ_0, S_0),   (6c)

where ρ_d is the pdf of the Gaussian probability distribution N(μ_f, S_f) (desired state distribution), ρ_N is the pdf of the terminal state x(N), and φ(ρ_N, ρ_d) denotes the (generalized) distance between the probability distributions of the desired state and the actual terminal state of the system. In particular, φ(ρ_N, ρ_d) = W^2(ρ_N, ρ_d) or φ(ρ_N, ρ_d) = KL(ρ_N ‖ ρ_d).

In order to associate Problem 1 with a tractable, finite-dimensional optimization problem, we only consider admissible control policies that correspond to sequences of control laws m_k(·) which are affine functions of the state history:

m_k(X_k) = Σ_{i=0}^{k} K(k, i)(x_i − x̄_i) + u_ff(k),   (7)

where x̄_i = E[x_i]. Next, we show the main steps for recasting Problem 1, whose decision variable corresponds to the control policy π, as an optimization problem whose decision variables are the controller parameters u_ff(k) ∈ R^{n_u} and K(k, j) ∈ R^{n_u × n_x}, for all k ≥ j with k, j ∈ {0, ..., N−1}.

III. COVARIANCE STEERING BASED ON A WASSERSTEIN DISTANCE TERMINAL COST
In this section, we will show that Problem 1 with φ(ρ_N, ρ_d) = W^2(ρ_N, ρ_d) can be associated with a difference of convex functions program (DCP), that is, a nonlinear program whose performance index is equal to the difference of two convex functions. This will allow us to efficiently compute local minimizers of Problem 1 by means of heuristic and easily implementable algorithms, such as the convex-concave procedure [1]. It is worth mentioning that the set of objective functions which can be expressed as the difference of convex functions is dense in the set of continuous functions; moreover, every twice differentiable function can be represented as the difference of convex functions [15]. However, there is no systematic process that is guaranteed to find such a representation for a given function of interest, except for special classes of functions.

Next, we recast Problem 1 as a finite-dimensional optimization problem. To this aim, we express the state x_k in terms of a finite-dimensional decision variable. In particular, by propagating forward in time the state of the discrete-time stochastic system (5) and using the control policy parametrization given in (7), we can express x_k as a function of x_0, {u_i}_{i=0}^{k−1} and {w_i}_{i=0}^{k−1} as follows:

x_k = Φ(k, 0) x_0 + Σ_{i=0}^{k−1} Φ(k, i+1) B_i u_i + Σ_{i=0}^{k−1} Φ(k, i+1) G_i w_i,   (8)

where Φ(k, n) := A_{k−1} ··· A_n and Φ(n, n) = I, with k ≥ n for k, n ∈ Z_+. Now, let us define the following quantities:

x := [x(0)^T, x(1)^T, ..., x(N)^T]^T ∈ R^{n_x(N+1)},   (9a)
u := [u(0)^T, u(1)^T, ..., u(N−1)^T]^T ∈ R^{n_u N},   (9b)
w := [w(0)^T, w(1)^T, ..., w(N−1)^T]^T ∈ R^{n_w N}.   (9c)

By using equations (8)-(9), it follows that

x = Γ x_0 + H_u u + H_w w,   (10)

where

Γ := [I, Φ(1, 0)^T, Φ(2, 0)^T, ..., Φ(N, 0)^T]^T,   (11)

H_u := [ 0           0           ...  0
         B_0         0           ...  0
         Φ(2, 1)B_0  B_1         ...  0
         ...         ...         ...  ...
         Φ(N, 1)B_0  Φ(N, 2)B_1  ...  B_{N−1} ],   (12)

and H_w is defined similarly, after replacing the matrices B_i in (12) with the matrices G_i. One can refer to [9] for the details on the derivation of (10)-(12). Because the performance index of Problem 1 consists of a terminal cost term, we will use the following equation:

x(N) = F x,  F := [0 ··· 0 I],   (13)

to recover x(N) from x.

Given the particular affine parametrization of the control policy as in (7) and the fact that the initial state is assumed to be a Gaussian (random) vector, it follows that the states of the system in the subsequent stages will also be Gaussian (random) vectors. In addition, we obtain

u = K(x − x̄) + u_ff,   (14)

where x̄ := E[x], u_ff := [u_ff(0)^T, ..., u_ff(N−1)^T]^T and

K := [ K(0, 0)    0          0          ...  0            0
       K(1, 0)    K(1, 1)    0          ...  0            0
       K(2, 0)    K(2, 1)    K(2, 2)    ...  0            0
       ...        ...        ...        ...  ...          ...
       K(N−1, 0)  K(N−1, 1)  K(N−1, 2)  ...  K(N−1, N−1)  0 ].   (15)

We proceed with the derivation of the expression of the performance index of Problem 1 in terms of the new decision variables. To this aim, we write Σ_{k=0}^{N−1} u_k^T u_k = u^T u, which in view of basic properties of the trace operator and (7) gives

E[u^T u] = E[tr(u u^T)]
         = E[tr((K(x − x̄) + u_ff)(K(x − x̄) + u_ff)^T)]
         = tr(K E[x̃ x̃^T] K^T) + ‖u_ff‖^2,   (16)

where x̃ := x − x̄ and, in the derivation of the last equality, we have used the fact that u_ff is a deterministic quantity. For the computation of Cov[x] = E[x̃ x̃^T], we first have to compute x̄ = E[x]. By taking the expectation of both sides of (10), we obtain:

E[x] = E[Γ x_0 + H_u(K(x − x̄) + u_ff) + H_w w] = Γ μ_0 + H_u u_ff.   (17)

After some simple algebraic manipulations, we get:

x̃ = (I − H_u K)^{−1}(Γ(x_0 − μ_0) + H_w w).   (18)

Let K̄ := (I − H_u K)^{−1}. We obtain:

E[x̃ x̃^T] = K̄(Γ S_0 Γ^T + H_w S_w H_w^T) K̄^T.   (19)

From (17) and (19), we can obtain the following expressions for μ_N := E[x(N)] and S_N := Cov[x(N)]:

μ_N = F(Γ μ_0 + H_u u_ff),   (20a)
S_N = F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T,   (20b)

where S̃ := Γ S_0 Γ^T + H_w S_w H_w^T and S_w := E[w w^T]. By plugging (19) into (16), we have:

E[u^T u] = tr(K(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} K^T) + ‖u_ff‖^2.   (21)

After plugging the expressions of μ_N and S_N in (20a) and (20b) into the expression of W^2(ρ_N, ρ_d) in the case of Gaussian distributions, which is given in (2), we get:

W^2(ρ_N, ρ_d) = ‖F(Γ μ_0 + H_u u_ff) − μ_d‖^2
  + tr(F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T + S_d)
  − 2 tr((S_d^{1/2} F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T S_d^{1/2})^{1/2}).   (22)

At this point, we propose to apply a variable transformation, which was first proposed in [16] and later used for covariance steering problems in [9], to convexify the optimization problem. In particular, we introduce a new transformed variable, Θ, which is defined as follows:

Θ := K(I − H_u K)^{−1} =: ϕ(K),   (23a)
K := (I + H_u Θ)^{−1} Θ =: φ(Θ).   (23b)

Furthermore, by using the identity (I + P)^{−1} = I − P(I + P)^{−1}, we obtain:

(I − H_u K)^{−1} = I + H_u K(I − H_u K)^{−1} = I + H_u Θ.   (23c)

As is shown in [16], the functions φ(·) and ϕ(·) determine a bijective transformation, that is, φ(·) = ϕ^{−1}(·) and vice versa. Therefore, the right hand sides of equations (21) and (22) can be expressed equivalently in terms of the transformed variables (23) as follows:

E[u^T u] = tr(Θ S̃ Θ^T) + u_ff^T u_ff,   (24)

W^2(ρ_N, ρ_d) = ‖F(Γ μ_0 + H_u u_ff) − μ_d‖^2
  + tr(F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T)
  − 2 tr((S_d^{1/2} F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T S_d^{1/2})^{1/2})
  + tr(S_d).   (25)

Remark 1.
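The block matrices Γ, H_u and H_w of (10)-(12) are mechanical to assemble. Below is a minimal numpy sketch (the function name `stack_dynamics` is ours), assuming the block ordering of (9) and the convention Φ(k, n) = A_{k−1} ··· A_n with Φ(n, n) = I:

```python
import numpy as np

def stack_dynamics(A_list, B_list, G_list):
    """Build Gamma, H_u, H_w of Eqs. (10)-(12) for the time-varying system
    x_{k+1} = A_k x_k + B_k u_k + G_k w_k over a horizon N = len(A_list)."""
    N = len(A_list)
    nx, nu, nw = A_list[0].shape[0], B_list[0].shape[1], G_list[0].shape[1]

    def Phi(k, n):
        # State transition matrix Phi(k, n) = A_{k-1} ... A_n, Phi(n, n) = I.
        M = np.eye(nx)
        for j in range(n, k):
            M = A_list[j] @ M
        return M

    Gamma = np.vstack([Phi(k, 0) for k in range(N + 1)])
    Hu = np.zeros((nx * (N + 1), nu * N))
    Hw = np.zeros((nx * (N + 1), nw * N))
    for k in range(1, N + 1):
        for i in range(k):
            Hu[k*nx:(k+1)*nx, i*nu:(i+1)*nu] = Phi(k, i + 1) @ B_list[i]
            Hw[k*nx:(k+1)*nx, i*nw:(i+1)*nw] = Phi(k, i + 1) @ G_list[i]
    return Gamma, Hu, Hw
```

With these matrices, S̃ = Γ S_0 Γ^T + H_w S_w H_w^T and the moment expressions (20) follow by direct matrix products.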
It should be noted that K is a block lower triangular matrix whose last n_x columns are equal to 0. If we examine equation (23b), we observe that (I − H_u K) is block lower triangular with identity diagonal blocks, since H_u is also block lower triangular, which implies that (I − H_u K)^{−1} is well defined. Finally, left multiplication of (I − H_u K)^{−1} by K gives Θ, which is also a block lower triangular matrix with the same dimensions as K. The reader can refer to [9, 16] for more details. An important observation is that the new decision variable Θ should have the same structure as K for the control policy to maintain causality.

Finally, the performance index of Problem 1 can be expressed in terms of the decision variables u_ff and Θ. Let us denote this function as J(u_ff, Θ), where

J(u_ff, Θ) = J_1(u_ff) + J_2(Θ) + J_3(Θ) − J_4(Θ),   (26)

with

J_1(u_ff) := ‖u_ff‖^2 + λ‖F(Γ μ_0 + H_u u_ff) − μ_d‖^2,   (27a)
J_2(Θ) := tr(Θ S̃ Θ^T),   (27b)
J_3(Θ) := λ[tr(F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T) + tr(S_d)],   (27c)
J_4(Θ) := 2λ tr((S_d^{1/2} F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T S_d^{1/2})^{1/2}).   (27d)

Thus, Problem 1 can be reduced to the following optimization problem:

Problem 2.
Let μ_0, μ_d ∈ R^{n_x}, S_0, S_d ∈ S_{n_x}^{++}, N ∈ Z_{++} and {A_k, B_k, G_k}_{k=0}^{N}, where A_k ∈ R^{n_x × n_x}, B_k ∈ R^{n_x × n_u} and G_k ∈ R^{n_x × n_w}, be given. Find a pair (u_ff*, Θ*), where Θ* is a block lower triangular matrix in R^{n_u N × n_x(N+1)} and u_ff* ∈ R^{n_u N}, that minimizes the objective function J(u_ff, Θ), which is defined in (26)-(27).
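For a given candidate (u_ff, Θ), the four terms of (26)-(27) can be evaluated directly; the nuclear-norm-type term J_4 reduces to the sum of the square roots of the eigenvalues of a symmetric PSD matrix. A minimal numpy sketch (the names `dc_objective` and `sqrtm_spd` are ours, not the paper's):

```python
import numpy as np

def sqrtm_spd(S):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def dc_objective(u_ff, Theta, F, Gamma, Hu, S_tilde, mu0, mu_d, S_d, lam):
    """Evaluate J(u_ff, Theta) = J1 + J2 + J3 - J4 from Eqs. (26)-(27)."""
    mu_N = F @ (Gamma @ mu0 + Hu @ u_ff)            # terminal mean, Eq. (20a)
    M = F @ (np.eye(Hu.shape[0]) + Hu @ Theta)      # F (I + Hu Theta)
    S_N = M @ S_tilde @ M.T                         # terminal covariance, Eq. (20b)
    Sd_half = sqrtm_spd(S_d)
    J1 = u_ff @ u_ff + lam * np.sum((mu_N - mu_d) ** 2)      # Eq. (27a)
    J2 = np.trace(Theta @ S_tilde @ Theta.T)                 # Eq. (27b)
    J3 = lam * (np.trace(S_N) + np.trace(S_d))               # Eq. (27c)
    # Eq. (27d): tr(M^{1/2}) of a PSD matrix = sum of square roots of eigenvalues
    J4 = 2.0 * lam * np.sum(np.sqrt(np.clip(
        np.linalg.eigvalsh(Sd_half @ S_N @ Sd_half), 0.0, None)))
    return float(J1 + J2 + J3 - J4)
```

Since J equals E[u^T u] + λ W^2(ρ_N, ρ_d), its value is nonnegative for any feasible (u_ff, Θ) whenever S̃ is PSD, which provides a simple sanity check on the implementation.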
Let λ ∈ R_{++} be given. Then, the functions J_1, J_2, J_3 and J_4, which are defined in (27), are convex and thus Problem 2 corresponds to a difference of convex functions program (DCP).

Proof. The proofs of convexity of the functions J_1(·), J_2(·) and J_3(·) can be found in [9]. For the convexity of J_4(·), we need to define the functions g(Θ) := (S_d^{1/2} F(I + H_u Θ) Ṽ D̃^{1/2})^T, where Ṽ D̃ Ṽ^T is the eigenvalue decomposition of S̃, and f(A) := tr((A^T A)^{1/2}) = ‖A‖_*. Clearly, g(·) is an affine function. In addition, f(·) corresponds to the nuclear norm, which is a valid matrix norm [17] and thus, f(·) is a convex function. Finally, J_4(Θ) is convex as the composition of the convex function f(·) with the affine function g(·).
Proposition 1 implies that Problem 1 can be reduced to a DCP, whose (local) minimizers can be found by means of the so-called convex-concave procedure [1, 12], which is known to be efficient and robust in practice.
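To illustrate the convex-concave procedure itself on a toy difference-of-convex problem (not the covariance steering objective), consider minimizing f(x) = x^4 − 2x^2 = g(x) − h(x) with g(x) = x^4 and h(x) = 2x^2, both convex. Each CCP step replaces h by its linearization at the current iterate and minimizes the resulting convex surrogate x^4 − h'(x_j)x, whose minimizer has the closed form x = cbrt(x_j) (set the derivative 4x^3 − 4x_j to zero). This is our own illustrative sketch:

```python
import math

def ccp_double_well(x0, iters=60):
    """Convex-concave procedure for f(x) = x^4 - 2x^2.
    Surrogate at iterate x_j: x^4 - 4*x_j*x (+ constant); its minimizer
    is the real cube root of x_j, so each CCP step is a closed-form update."""
    x = x0
    for _ in range(iters):
        x = math.copysign(abs(x) ** (1.0 / 3.0), x)  # argmin of the surrogate
    return x
```

Starting from any nonzero point, the iterates converge monotonically (in surrogate cost) to one of the two local minimizers x = ±1, mirroring how the CCP in Section V converges to a local minimizer of Problem 2 whose quality depends on the initial guess.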
IV. NLP FORMULATION FOR KL DIVERGENCE TERMINAL COST
If we consider Problem 1 with the terminal cost φ(ρ_N, ρ_d) = KL(ρ_N ‖ ρ_d), then we arrive at a nonlinear program (NLP) similar to Problem 2. However, using the variable transformation given in (23) will not yield a DCP as in the case with φ(ρ_N, ρ_d) = W^2(ρ_N, ρ_d). Thus, using state history feedback will not necessarily help us associate the covariance steering problem (Problem 1) with a tractable optimization problem. We will instead consider a memoryless state feedback (affine) controller of the form:

u_k = K(k)(x_k − x̄_k) + u_ff(k),   (28)

where x̄_k = E[x_k]. Independently of the choice of the controller form, we can express the running cost term of the performance index of Problem 1 as in (21) and also obtain expressions for the mean and covariance of the final state x(N) as in (20a) and (20b). The only difference will be in the matrix K, which is now defined as K := [blkdiag(K(0), K(1), ..., K(N−1)), 0], which is significantly more sparse than in the previous case. Because the closed-loop system is linear (given the structure of the controller in (28)), the final state x(N) will be a Gaussian random variable x(N) ∼ N(μ_N, S_N) and thus we can recover an NLP using the expression for the KL divergence given in (4). Since the KL divergence is not symmetric, the final objective function will depend on the order of ρ_N and ρ_d. We take φ(ρ_N, ρ_d) = KL(ρ_N ‖ ρ_d). Thus, the objective function can be expressed as follows by plugging (21) and (20) into (4):

J(K, u_ff) = ‖u_ff‖^2 + tr(K(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} K^T)
  + (λ/2)[tr(S_d^{−1} F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T)
  + (μ_d − F(Γ μ_0 + H_u u_ff))^T S_d^{−1} (μ_d − F(Γ μ_0 + H_u u_ff))
  − n_x + log(det S_d) − log(det(F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T))].   (29)

In this case, Problem 1 reduces to the following optimization problem:

Problem 3.
Let μ_0, μ_d ∈ R^{n_x}, S_0, S_d ∈ S_{n_x}^{++}, N ∈ Z_{++} and {A_k, B_k, G_k}_{k=0}^{N}, where A_k ∈ R^{n_x × n_x}, B_k ∈ R^{n_x × n_u} and G_k ∈ R^{n_x × n_w}, be given. Find a pair (K*, u_ff*), where K* := [blkdiag(K*(0), K*(1), ..., K*(N−1)), 0] with K*(i) ∈ R^{n_u × n_x} for i ∈ {0, ..., N−1}, and u_ff* ∈ R^{n_u N}, that minimizes the objective function J(K, u_ff) defined in (29).

Because the objective function given in (29) is not convex in (K, u_ff), Problem 3 corresponds, in general, to a non-convex NLP. In addition, Problem 3 does not correspond to a DCP, but local minimizers of this problem can still be computed by using nonlinear interior point methods and solvers such as IPOPT [18] and the scipy optimization package [19], which are readily available.

V. NUMERICAL EXPERIMENTS
In this section, we present numerical experiments in which we used the convex-concave procedure (CCP) with MOSEK [20] to solve Problem 2 and CVXPY [21] for the modeling of the convexified subproblems. To solve Problem 3, which is a nonlinear program, we used the scipy optimization [19] implementation of the L-BFGS-B algorithm. We consider the linear state space model (5) with A_k = [1, Δt; 0, 1], B_k = [0, Δt]^T, G_k = I, w_k ∼ N(0, γI), for all k ∈ Z_+. We also took x_0 ∼ N(μ_0, S_0), μ_0 = [0, 0]^T, S_0 = 10 I, μ_d = [10, 0]^T, S_d = I, Δt = 1. In addition, N ∈ {10, 20, 30, 40, 50} and γ ∈ {1, 0.1} were chosen for the different experiments used to compare computation times.

Figure 1 illustrates the evolution of the state distribution of the system. We use λ = 10.0 for the Wasserstein distance case and λ = 70.0 for the KL divergence case, for scaling purposes. The noise intensity parameter is γ = 1 and the problem horizon is N = 20 in both experiments. The final state covariance matrices are approximately [0.81, 0.19; 0.19, 1] for the Wasserstein distance and [0.65, 0.06; 0.06, 2] for the KL divergence. Since both problems are non-convex, the obtained solutions are expected to depend on the initial guess. However, repeating the experiments with different initial guesses did not change the final cost and the covariance matrices significantly, even though the control policy parameters did change.
Fig. 1. Evolution of 2-σ confidence ellipses (in green) for the controlled system based on KL divergence (top) and Wasserstein distance (bottom) terminal cost functions. The blue ellipses correspond to the 2-σ confidence ellipses of the initial state, whereas the red ellipses correspond to the desired distribution.

In Figure 2, sample paths of the controlled system are shown for N = 40. We observe that the optimal control policy allows the spread of trajectories (uncertainty) to "grow" in the beginning and tries to reduce it towards the end of the time horizon. This result is expected given that the state covariance is not penalized in the running cost term of the performance index in Problem 1, whereas the uncertainty in the control input is penalized by the term tr(Θ S̃ Θ^T).

In Table I, we compare the computation time of the NLP solver [19] and our CCP based approach for different problem instances with different values of the noise intensity parameter γ and the problem horizon N. In our simulations, we used the termination condition (f_k − f_{k−1})/f_k ≤ ε, where f_k is the value of the objective function at the k-th iteration and ε is the convergence tolerance. We observe that our approach reduces the computation time significantly in all cases.

Fig. 2. 15 sample paths of the controlled system with Wasserstein distance terminal cost, where the blue ellipses are the 2-σ confidence regions for the initial and final states and the red ellipse is the 2-σ confidence region of the desired distribution (γ = 1, λ = 10.0, N = 40).

TABLE I
COMPUTATION TIME (IN SECONDS) FOR DIFFERENT PROBLEM INSTANCES FOR THE WASSERSTEIN DISTANCE TERMINAL COST

γ = 1  | N=10  | N=20  | N=30   | N=40   | N=50
NLP    | 7.88  | 44.30 | 120.93 | 348.39 | 643.65
CCP    | 0.93  | 7.65  | 12.81  | 32.85  | 68.72

γ = 0.1 | N=10  | N=20  | N=30   | N=40   | N=50
NLP     | 18.01 | 28.57 | 209.93 | 510.14 | 907.40
CCP     | 2.89  | 17.11 | 53.68  | 156.29 | 314.52
VI. CONCLUSION
We have addressed the covariance steering problem with soft terminal constraints based on two different problem formulations in which the terminal cost is associated with either the squared Wasserstein distance or the KL divergence between the terminal state distribution and a desired distribution. We have shown that in the case with the squared Wasserstein distance terminal cost, the proposed covariance steering problem reduces to a DCP, which can be solved efficiently by the so-called convex-concave procedure along with convex optimization solvers. Our numerical experiments have shown that our approach reduces significantly the computation time compared to off-the-shelf solvers. In our future work, we plan to extend our approach to covariance steering problems for nonlinear stochastic systems.
REFERENCES

[1] A. L. Yuille and A. Rangarajan. "The concave-convex procedure". In: Neural Computation 15.4 (2003), pp. 915–936.
[2] In: Int. J. of Control.
[3] In: IEEE Trans. on Autom. Control.
[4] In: Int. J. of Control.
[5] In: IEEE Trans. on Autom. Control.
[6] In: IEEE Trans. on Autom. Control.
[7] In: IEEE Trans. on Autom. Control.
[8] In: ACC. 2016, pp. 7231–7236.
[9] E. Bakolas. "Finite-horizon covariance control for discrete-time stochastic linear systems subject to input constraints". In: Automatica 91 (2018), pp. 61–68.
[10] K. Okamoto, M. Goldshtein, and P. Tsiotras. "Optimal covariance control for stochastic systems under chance constraints". In: IEEE Control Systems Letters (2018).
[11] A. Halder and E. D. B. Wendel. "Finite horizon linear quadratic Gaussian density regulator with Wasserstein terminal cost". In: ACC. 2016, pp. 7249–7254.
[12] X. Shen et al. "Disciplined convex-concave programming". In: CDC. 2016, pp. 1009–1014.
[13] A. L. Gibbs and F. E. Su. "On choosing and bounding probability metrics". In: Int. Stat. Review 70.3 (2002), pp. 419–435.
[14] In: Michigan Math. J.
[15] In: J. Optim. Theory Appl.
[16] In: IEEE Trans. on Autom. Control.
[17] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
[18] A. Wächter and L. T. Biegler. "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming". In: Mathematical Programming 106 (2006), pp. 25–57.
[19] P. Virtanen et al. "SciPy 1.0: fundamental algorithms for scientific computing in Python". In: Nature Methods (2020).
[20] MOSEK ApS. The MOSEK optimization toolbox.
[21] S. Diamond and S. Boyd. "CVXPY: A Python-embedded modeling language for convex optimization". In: J. Mach. Learn. Res. 17.83 (2016), pp. 1–5.