Covariance Steering of Discrete-Time Stochastic Linear Systems Based on Distribution Distance Terminal Costs
Isin M. Balci and Efstathios Bakolas
Abstract — We consider a class of stochastic optimal control problems for discrete-time stochastic linear systems which seek for control policies that will steer the probability distribution of the terminal state of the system close to a desired Gaussian distribution. In our problem formulation, the closeness between the terminal state distribution and the desired (goal) distribution is measured in terms of the squared Wasserstein distance, which is associated with a corresponding terminal cost term. We recast the stochastic optimal control problem as a finite-dimensional nonlinear program and we show that its performance index can be expressed as the difference of two convex functions. This representation of the performance index allows us to find local minimizers of the original nonlinear program via the so-called convex-concave procedure [1]. Subsequently, we consider a similar problem but this time we use a terminal cost that corresponds to the KL divergence. Finally, we present non-trivial numerical simulations to demonstrate the proposed techniques and compare them in terms of computation time.
I. INTRODUCTION
We consider covariance steering problems for discrete-time stochastic linear systems in which, however, the constraints on the terminal state covariance are enforced indirectly by means of appropriate terminal costs. Specifically, we consider the problem of steering the state of a stochastic system, which is originally drawn from a given Gaussian distribution, to a terminal state whose distribution is "close" to a desired (prescribed) Gaussian distribution, where the closeness between the two distributions is measured in terms of the squared Wasserstein distance or the Kullback-Leibler divergence. We show that the resulting problems can be reduced to tractable optimization problems which can be solved efficiently if one exploits their structure.
Literature Review:
The main focus of the first attempts to study covariance steering problems [2, 3, 4] was on finding stabilizing controllers that drive the state covariance to a desired positive definite matrix asymptotically (infinite-horizon case). Finite-horizon covariance control problems for continuous-time linear systems were recently studied in [5, 6, 7, 8]. Covariance steering problems for discrete-time systems are also receiving significant attention at present. In [9], the constrained covariance steering problem is recast as a finite-dimensional convex optimization problem based on a semidefinite relaxation of the constraint on the terminal state covariance. Covariance steering problems with convex chance constraints are studied in [10].
I. M. Balci (graduate student) and E. Bakolas (Associate Professor) are with the Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, Austin, Texas 78712-1221, USA. Emails: [email protected]; [email protected]. This research has been supported in part by NSF awards ECCS-1924790 and CMMI-1937957.
In the previously discussed references, the specifications on the terminal state covariance correspond to hard constraints which often lead to difficult problems (for instance, the analytic solution to the covariance steering problem presented in [5] is only valid for the special case in which the input and noise channels coincide). An alternative problem formulation, which has inspired this paper, is presented in [11], in which a terminal cost is used as a "soft" constraint on the terminal state covariance. The latter cost corresponds to the squared Wasserstein distance between a desired state distribution and the "actual" terminal state distribution. The latter formulation leads to a standard two-point boundary value problem which can be solved by means of indirect shooting methods. It is well known that the success of such methods relies on knowledge of good initial guesses and thus, in general, a systematic process for the computation of the solution to the class of covariance steering problems proposed in [11] with soft terminal constraints is still missing.
Main Contribution:
We first formulate the covariance steering problem as a stochastic optimal control problem in which the requirement on the terminal state covariance is encoded in a terminal cost term ("soft constraint"). Similarly with [11], we consider the case in which the terminal cost corresponds to the squared Wasserstein distance between the actual terminal state distribution and the desired Gaussian distribution but, in contrast with the latter reference, we consider the discrete-time case. First, we recast this stochastic optimal control problem as a (deterministic) nonlinear program by utilizing an affine state feedback control policy parametrization (the control input at each stage is an affine function of the history of visited states). Then, we show that the performance index of the nonlinear program can be expressed as the difference of two convex functions by using a suitable bilinear transformation of the decision variables. To the best of our knowledge, this is the first paper that shows that covariance steering problems can be formulated as a difference of convex functions program (DCP). By leveraging this fact, one can find local minimizers of the nonlinear program via efficient techniques such as the so-called convex-concave procedure (CCP) [1, 12]. The CCP is an iterative procedure which can compute local minimizers of non-convex optimization problems which correspond to DCPs based on successive convexifications. Exploiting this extra structure of the problem reduces its complexity and allows us to use convex optimization solvers, which in turn leads to improved scalability and numerical efficiency. Finally, we consider the same class of problems when the terminal cost corresponds to the Kullback-Leibler divergence, which is used as a measure of the closeness between the terminal state distribution and the goal distribution (one can also consider different generalized distance metrics between the two distributions; a review of distance metrics on probability distributions can be found in [13]). Even though the resulting nonlinear program does not correspond to a DCP, we show empirically that one can compute its local minimizers by using interior-point methods for nonlinear programs.

Outline:
The rest of the paper is organized as follows. Section II presents the problem formulation. In Section III, we show that when the terminal cost is the squared Wasserstein distance, the covariance steering problem can be associated with a difference of convex functions program. In Section IV, we provide an alternative problem formulation in which the terminal cost corresponds to the KL divergence. In Section V, we present numerical simulations. Finally, Section VI concludes the paper with a summary of remarks and future research directions.

II. PROBLEM FORMULATION
A. Notation
We denote by R^n the set of n-dimensional real vectors and by R and R_+ (resp., R_{++}) the set of real numbers and non-negative (resp., strictly positive) real numbers, respectively. The sets of non-negative and strictly positive integers are denoted by Z_+ and Z_{++}, respectively. We denote by E[·] the expectation operator. Given a random vector x, we denote its mean vector and covariance matrix by E[x] and Cov[x], respectively. The space of n × n symmetric matrices is denoted by S_n and the cone of positive semi-definite (definite) symmetric matrices by S_n^+ (S_n^{++}). The trace of a square matrix is denoted as tr(·). The transpose of a matrix A ∈ R^{n×m} is denoted by A^T and its nuclear norm by ‖A‖_*, where ‖A‖_* := tr((A^T A)^{1/2}). The block diagonal matrix formed by n matrices A_1, ..., A_n is denoted by blkdiag(A_1, ..., A_n). The zero matrix is denoted as 0, whereas the identity matrix as I. We write x ∼ N(μ, S) to denote that x is a Gaussian random vector with mean μ ∈ R^n and covariance S ∈ S_n^{++}.

B. Distance Between Probability Distributions
In this paper, we will formulate stochastic optimal control problems with terminal costs that measure the closeness between the final state distribution and a desired probability distribution. In particular, we will consider two different distribution (generalized) distance functions, namely, the Wasserstein distance and the KL divergence.
1) Wasserstein Distance Between Two Distributions:
The Wasserstein distance between two probability measures is a valid distance metric (in the strict mathematical sense) because it satisfies all of the properties of a metric. Given two random vectors x_1, x_2 over R^n with probability density functions ρ_1, ρ_2, their squared Wasserstein distance is defined as follows:

W^2(ρ_1, ρ_2) := inf_{ρ ∈ P(ρ_1, ρ_2)} E_y[‖x_1 − x_2‖^2],   (1)

where y := [x_1^T, x_2^T]^T has a probability density function (pdf) ρ : R^{2n} → R_+. Furthermore, P(ρ_1, ρ_2) denotes the set of all probability distributions over R^{2n} with finite second moments and marginals ρ_1 and ρ_2 on x_1 and x_2, respectively. If x_i ∼ N(μ_i, S_i) for i = 1, 2, where μ_i ∈ R^n and S_i ∈ S_n^{++}, then the squared Wasserstein distance is given by [14]

W^2(ρ_1, ρ_2) = ‖μ_1 − μ_2‖^2 + tr(S_1 + S_2 − 2(S_2^{1/2} S_1 S_2^{1/2})^{1/2}).   (2)
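The closed-form expression (2) is straightforward to evaluate numerically. The following is a minimal numpy sketch (the helper names `sqrtm_spd` and `w2_squared_gaussian` are ours, not from the paper); the matrix square root of a symmetric positive semi-definite matrix is computed via its eigendecomposition:

```python
import numpy as np

def sqrtm_spd(S):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2_squared_gaussian(mu1, S1, mu2, S2):
    """Squared Wasserstein distance between N(mu1, S1) and N(mu2, S2), Eq. (2)."""
    S2h = sqrtm_spd(S2)
    cross = sqrtm_spd(S2h @ S1 @ S2h)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))
```

For equal covariances the expression reduces to the squared distance between the means, and for equal means it depends only on the covariance mismatch.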
2) Kullback-Leibler Divergence:
The KL divergence is not a metric in the strict mathematical sense (it does not enjoy the symmetry property) but it is often used to compute the "distance" between two distributions because of its ease of computation. In particular, given two probability distributions with density functions ρ_1(x) and ρ_2(x), their KL divergence is defined as:

KL(ρ_1 ‖ ρ_2) = ∫ ρ_1(x) log(ρ_1(x)/ρ_2(x)) dx,   (3)

where ρ_2(x) > 0 over the domain of integration. When ρ_1 and ρ_2 correspond to the densities of two Gaussian distributions N(μ_1, S_1) and N(μ_2, S_2), where μ_i ∈ R^n and S_i ∈ S_n^{++}, their KL divergence is given by

KL(ρ_1 ‖ ρ_2) = (1/2)[tr(S_2^{−1} S_1) + (μ_2 − μ_1)^T S_2^{−1} (μ_2 − μ_1) − n + log(det(S_2)/det(S_1))].   (4)

C. Problem Statement
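As with (2), the Gaussian KL divergence (4) admits a direct numerical evaluation. A minimal numpy sketch (the function name `kl_gaussian` is ours), using `slogdet` for numerical stability of the log-determinant terms:

```python
import numpy as np

def kl_gaussian(mu1, S1, mu2, S2):
    """KL divergence KL(rho1 || rho2) between N(mu1, S1) and N(mu2, S2), Eq. (4)."""
    n = mu1.size
    S2inv = np.linalg.inv(S2)
    dmu = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(S1)  # log det(S1), sign discarded (S1 is PD)
    _, logdet2 = np.linalg.slogdet(S2)
    return float(0.5 * (np.trace(S2inv @ S1) + dmu @ S2inv @ dmu
                        - n + logdet2 - logdet1))
```

Note the asymmetry: swapping the arguments generally changes the value, which is why the order of ρ_N and ρ_d matters in Section IV.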
We consider an uncertain system whose dynamics is described by the following discrete-time stochastic linear state space model:

x_{k+1} = A_k x_k + B_k u_k + G_k w_k,  ∀k ∈ Z_+,   (5)

where {x_k}_{k∈Z_+} is the state (random) process over R^{n_x}, {u_k}_{k∈Z_+} is the input process over R^{n_u} and {w_k}_{k∈Z_+} is the noise (random) process over R^{n_w}. In particular, {w_k}_{k∈Z_+} corresponds to a white Gaussian noise process with E[w_k] = 0 and E[w_k w_m^T] = δ(k, m) I, where δ(k, m) = 1 when k = m and δ(k, m) = 0, otherwise. We also assume that the initial state x_0 ∼ N(μ_0, S_0) and that x_0 and {w_k} are mutually independent, which implies that E[x_0 w_k^T] = 0 for all k ∈ Z_+.

Our objective is to drive the uncertain state of the system (5) from its given initial distribution to a terminal distribution which is close to a desired terminal Gaussian probability distribution N(μ_d, S_d), where μ_d ∈ R^n and S_d ∈ S_n^{++} are given, at a given finite time, while minimizing a relevant performance index. Next, we provide the precise formulation of our problem.

Problem 1.
Let μ_0, μ_f ∈ R^{n_x}, S_0, S_f ∈ S_{n_x}^{++}, λ > 0 and N ∈ Z_{++} be given. In addition, let Π denote the set of all admissible control policies π := {m_0(·), ..., m_{N−1}(·)} for system (5), with u_k = m_k(X_k), where X_k denotes the (finite) sequence of states visited up to stage t = k, that is, X_k := {x_0, x_1, ..., x_k}, and m_k(X_k) are measurable functions of the elements of X_k, for k = 0, ..., N−1. Then, find a control policy π* ∈ Π that solves the following stochastic optimal control problem:

min_{π ∈ Π}  E[Σ_{k=0}^{N−1} u_k^T u_k] + λ φ(ρ_N, ρ_d)   (6a)
subject to  x_{k+1} = A_k x_k + B_k u_k + G_k w_k,   (6b)
            x_0 ∼ N(μ_0, S_0),   (6c)

where ρ_d is the pdf of the Gaussian probability distribution N(μ_f, S_f) (desired state distribution), ρ_N is the pdf of the terminal state x(N), and φ(ρ_N, ρ_d) denotes the (generalized) distance between the probability distributions of the desired state and the actual terminal state of the system. In particular, φ(ρ_N, ρ_d) = W^2(ρ_N, ρ_d) or φ(ρ_N, ρ_d) = KL(ρ_N ‖ ρ_d).

In order to associate Problem 1 with a tractable, finite-dimensional optimization problem, we only consider admissible control policies that correspond to sequences of control laws m_k(·) which are affine functions of the state history:

m_k(X_k) = Σ_{i=0}^{k} K(k, i)(x_i − x̄_i) + u_ff(k),   (7)

where x̄_i = E[x_i]. Next, we show the main steps for recasting Problem 1, whose decision variable corresponds to the control policy π, as an optimization problem whose decision variables are the controller parameters u_ff(k) ∈ R^{n_u} and K(k, j) ∈ R^{n_u × n_x}, for all k ≥ j with k, j ∈ {0, ..., N−1}.

III. COVARIANCE STEERING BASED ON A WASSERSTEIN DISTANCE TERMINAL COST
In this section, we will show that Problem 1 with φ(ρ_N, ρ_d) = W^2(ρ_N, ρ_d) can be associated with a difference of convex functions program (DCP), that is, a nonlinear program whose performance index is equal to the difference of two convex functions. This will allow us to efficiently compute local minimizers of Problem 1 by means of heuristic and easily implementable algorithms, such as the convex-concave procedure [1]. It is worth mentioning that the set of objective functions which can be expressed as the difference of convex functions is dense in the set of continuous functions; moreover, every twice differentiable function can be represented as the difference of convex functions [15]. However, there is no systematic process that is guaranteed to find such a representation for a given function of interest, except for special classes of functions.

Next, we recast Problem 1 as a finite-dimensional optimization problem. To this aim, we express the state x_k in terms of a finite-dimensional decision variable. In particular, by propagating forward in time the state of the discrete-time stochastic system (5) and using the control policy parametrization given in (7), we can express x_k as a function of x_0, {u_i}_{i=0}^{k−1} and {w_i}_{i=0}^{k−1} as follows:

x_k = Φ(k, 0) x_0 + Σ_{i=0}^{k−1} Φ(k, i+1) B_i u_i + Σ_{i=0}^{k−1} Φ(k, i+1) G_i w_i,   (8)

where Φ(k, n) := A_{k−1} ··· A_n and Φ(n, n) = I, with k ≥ n for k, n ∈ Z_+. Now, let us define the following quantities:

x := [x(0)^T, x(1)^T, ..., x(N)^T]^T ∈ R^{n_x(N+1)},   (9a)
u := [u(0)^T, u(1)^T, ..., u(N−1)^T]^T ∈ R^{n_u N},   (9b)
w := [w(0)^T, w(1)^T, ..., w(N−1)^T]^T ∈ R^{n_w N}.   (9c)

By using equations (8)-(9), it follows that

x = Γ x_0 + H_u u + H_w w,   (10)

where

Γ := [I, Φ(1, 0)^T, Φ(2, 0)^T, ..., Φ(N, 0)^T]^T,   (11)

H_u := [ 0           0           ...  0
         B_0         0           ...  0
         Φ(2, 1)B_0  B_1         ...  0
         ...         ...         ...  ...
         Φ(N, 1)B_0  Φ(N, 2)B_1  ...  B_{N−1} ],   (12)

and H_w is defined similarly, after replacing the matrices B_i in (12) with the matrices G_i. One can refer to [9] for the details on the derivation of (10)-(12). Because the performance index of Problem 1 consists of a terminal cost term, we will use the following equation:

x(N) = F x,  F := [0 ··· 0 I],   (13)

to recover x(N) from x.

Given the particular affine parametrization of the control policy as in (7) and the fact that the initial state is assumed to be a Gaussian (random) vector, it follows that the states of the system in the subsequent stages will also be Gaussian (random) vectors. In addition, we obtain

u = K(x − x̄) + u_ff,   (14)

where x̄ := E[x], u_ff := [u_ff(0)^T, ..., u_ff(N−1)^T]^T and

K := [ K(0, 0)    0          0          ...  0            0
       K(1, 0)    K(1, 1)    0          ...  0            0
       K(2, 0)    K(2, 1)    K(2, 2)    ...  0            0
       ...        ...        ...        ...  ...          ...
       K(N−1, 0)  K(N−1, 1)  K(N−1, 2)  ...  K(N−1, N−1)  0 ].   (15)

We proceed with the derivation of the expression of the performance index of Problem 1 in terms of the new decision variables. To this aim, we write Σ_{k=0}^{N−1} u_k^T u_k = u^T u, which in view of basic properties of the trace operator and (7) gives

E[u^T u] = E[tr(u u^T)]
         = E[tr((K(x − x̄) + u_ff)(K(x − x̄) + u_ff)^T)]
         = tr(K E[x̃ x̃^T] K^T) + ‖u_ff‖^2,   (16)

where x̃ := x − x̄ and, in the derivation of the last equality, we have used the fact that u_ff is a deterministic quantity. For the computation of Cov[x] = E[x̃ x̃^T], we first have to compute x̄ = E[x]. By taking the expectation of both sides of (10), we obtain:

E[x] = E[Γ x_0 + H_u(K(x − x̄) + u_ff) + H_w w] = Γ μ_0 + H_u u_ff.   (17)

After some simple algebraic manipulations, we get:

x̃ = (I − H_u K)^{−1}(Γ(x_0 − μ_0) + H_w w).   (18)

Let K̄ := (I − H_u K)^{−1}. We obtain:

E[x̃ x̃^T] = K̄(Γ S_0 Γ^T + H_w S_w H_w^T) K̄^T.   (19)

From (17) and (19), we can obtain the following expressions for μ_N := E[x(N)] and S_N := Cov[x(N)]:

μ_N = F(Γ μ_0 + H_u u_ff),   (20a)
S_N = F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T,   (20b)

where S̃ := Γ S_0 Γ^T + H_w S_w H_w^T and S_w := E[w w^T]. By plugging (19) into (16), we have:

E[u^T u] = tr(K(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} K^T) + ‖u_ff‖^2.   (21)

After plugging the expressions of μ_N and S_N in (20a) and (20b) into the expression of W^2(ρ_N, ρ_d) in the case of Gaussian distributions, which is given in (2), we get:

W^2(ρ_N, ρ_d) = ‖F(Γ μ_0 + H_u u_ff) − μ_d‖^2
  + tr(F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T + S_d)
  − 2 tr((S_d^{1/2} F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T S_d^{1/2})^{1/2}).   (22)

At this point, we propose to apply a variable transformation, which was first proposed in [16] and later used for covariance steering problems in [9], to convexify the optimization problem. In particular, we introduce a new transformed variable, Θ, which is defined as follows:

Θ := K(I − H_u K)^{−1} =: ϕ(K),   (23a)
K := (I + H_u Θ)^{−1} Θ =: φ(Θ).   (23b)

Furthermore, by using the identity (I + P)^{−1} = I − P(I + P)^{−1}, we obtain:

(I − H_u K)^{−1} = I + H_u K(I − H_u K)^{−1} = I + H_u Θ.   (23c)

As is shown in [16], the functions φ(·) and ϕ(·) determine a bijective transformation, that is, φ(·) = ϕ^{−1}(·) and vice versa. Therefore, the right hand sides of equations (21) and (22) can be expressed equivalently in terms of the transformed variables (23) as follows:

E[u^T u] = tr(Θ S̃ Θ^T) + u_ff^T u_ff,   (24)

W^2(ρ_N, ρ_d) = ‖F(Γ μ_0 + H_u u_ff) − μ_d‖^2
  + tr(F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T)
  − 2 tr((S_d^{1/2} F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T S_d^{1/2})^{1/2})
  + tr(S_d).   (25)

Remark 1.
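The block matrices Γ, H_u and H_w of (10)-(12) are mechanical to assemble. Below is a minimal numpy sketch (the function name `stack_dynamics` is ours), assuming the block ordering of (9) and the convention Φ(k, n) = A_{k−1} ··· A_n with Φ(n, n) = I:

```python
import numpy as np

def stack_dynamics(A_list, B_list, G_list):
    """Build Gamma, H_u, H_w of Eqs. (10)-(12) for the time-varying system
    x_{k+1} = A_k x_k + B_k u_k + G_k w_k over a horizon N = len(A_list)."""
    N = len(A_list)
    nx, nu, nw = A_list[0].shape[0], B_list[0].shape[1], G_list[0].shape[1]

    def Phi(k, n):
        # State transition matrix Phi(k, n) = A_{k-1} ... A_n, Phi(n, n) = I.
        M = np.eye(nx)
        for j in range(n, k):
            M = A_list[j] @ M
        return M

    Gamma = np.vstack([Phi(k, 0) for k in range(N + 1)])
    Hu = np.zeros((nx * (N + 1), nu * N))
    Hw = np.zeros((nx * (N + 1), nw * N))
    for k in range(1, N + 1):
        for i in range(k):
            Hu[k*nx:(k+1)*nx, i*nu:(i+1)*nu] = Phi(k, i + 1) @ B_list[i]
            Hw[k*nx:(k+1)*nx, i*nw:(i+1)*nw] = Phi(k, i + 1) @ G_list[i]
    return Gamma, Hu, Hw
```

With these matrices, S̃ = Γ S_0 Γ^T + H_w S_w H_w^T and the moment expressions (20) follow by direct matrix products.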
It should be noted that K is a block lower triangular matrix whose last n_x columns are equal to 0. If we examine equation (23b), we observe that (I − H_u K) is block lower triangular with identity diagonal blocks, since H_u is also block lower triangular, which implies that (I − H_u K)^{−1} is well defined. Finally, left multiplication of (I − H_u K)^{−1} by K gives Θ, which is also a block lower triangular matrix with the same dimensions as K. The reader can refer to [9, 16] for more details. An important observation is that the new decision variable Θ should have the same structure as K for the control policy to maintain causality.

Finally, the performance index of Problem 1 can be expressed in terms of the decision variables u_ff and Θ. Let us denote this function as J(u_ff, Θ), where

J(u_ff, Θ) = J_1(u_ff) + J_2(Θ) + J_3(Θ) − J_4(Θ),   (26)

with

J_1(u_ff) := ‖u_ff‖^2 + λ‖F(Γ μ_0 + H_u u_ff) − μ_d‖^2,   (27a)
J_2(Θ) := tr(Θ S̃ Θ^T),   (27b)
J_3(Θ) := λ[tr(F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T) + tr(S_d)],   (27c)
J_4(Θ) := 2λ tr((S_d^{1/2} F(I + H_u Θ) S̃ (I + H_u Θ)^T F^T S_d^{1/2})^{1/2}).   (27d)

Thus, Problem 1 can be reduced to the following optimization problem:

Problem 2.
Let μ_0, μ_d ∈ R^{n_x}, S_0, S_d ∈ S_{n_x}^{++}, N ∈ Z_{++} and {A_k, B_k, G_k}_{k=0}^{N}, where A_k ∈ R^{n_x × n_x}, B_k ∈ R^{n_x × n_u} and G_k ∈ R^{n_x × n_w}, be given. Find a pair (u_ff*, Θ*), where Θ* is a block lower triangular matrix in R^{n_u N × n_x(N+1)} and u_ff* ∈ R^{n_u N}, that minimizes the objective function J(u_ff, Θ), which is defined in (26)-(27).
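For a given candidate (u_ff, Θ), the four terms of (26)-(27) can be evaluated directly; the nuclear-norm-type term J_4 reduces to the sum of the square roots of the eigenvalues of a symmetric PSD matrix. A minimal numpy sketch (the names `dc_objective` and `sqrtm_spd` are ours, not the paper's):

```python
import numpy as np

def sqrtm_spd(S):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def dc_objective(u_ff, Theta, F, Gamma, Hu, S_tilde, mu0, mu_d, S_d, lam):
    """Evaluate J(u_ff, Theta) = J1 + J2 + J3 - J4 from Eqs. (26)-(27)."""
    mu_N = F @ (Gamma @ mu0 + Hu @ u_ff)            # terminal mean, Eq. (20a)
    M = F @ (np.eye(Hu.shape[0]) + Hu @ Theta)      # F (I + Hu Theta)
    S_N = M @ S_tilde @ M.T                         # terminal covariance, Eq. (20b)
    Sd_half = sqrtm_spd(S_d)
    J1 = u_ff @ u_ff + lam * np.sum((mu_N - mu_d) ** 2)      # Eq. (27a)
    J2 = np.trace(Theta @ S_tilde @ Theta.T)                 # Eq. (27b)
    J3 = lam * (np.trace(S_N) + np.trace(S_d))               # Eq. (27c)
    # Eq. (27d): tr(M^{1/2}) of a PSD matrix = sum of square roots of eigenvalues
    J4 = 2.0 * lam * np.sum(np.sqrt(np.clip(
        np.linalg.eigvalsh(Sd_half @ S_N @ Sd_half), 0.0, None)))
    return float(J1 + J2 + J3 - J4)
```

Since J equals E[u^T u] + λ W^2(ρ_N, ρ_d), its value is nonnegative for any feasible (u_ff, Θ) whenever S̃ is PSD, which provides a simple sanity check on the implementation.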
Let λ ∈ R_{++} be given. Then, the functions J_1, J_2, J_3 and J_4, which are defined in (27), are convex and thus Problem 2 corresponds to a difference of convex functions program (DCP).

Proof. The proofs of convexity of the functions J_1(·), J_2(·) and J_3(·) can be found in [9]. For the convexity of J_4(·), we need to define the functions g(Θ) := (S_d^{1/2} F(I + H_u Θ) Ṽ D̃^{1/2})^T, where Ṽ D̃ Ṽ^T is the eigenvalue decomposition of S̃, and f(A) := tr((A^T A)^{1/2}) = ‖A‖_*. Clearly, g(·) is an affine function. In addition, f(·) corresponds to the nuclear norm, which is a valid matrix norm [17] and thus, f(·) is a convex function. Finally, J_4(Θ) is convex as the composition of the convex function f(·) with the affine function g(·).
Proposition 1 implies that Problem 1 can be reduced to a DCP, whose (local) minimizers can be found by means of the so-called convex-concave procedure [1, 12], which is known to be efficient and robust in practice.
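To illustrate the convex-concave procedure itself on a toy difference-of-convex problem (not the covariance steering objective), consider minimizing f(x) = x^4 − 2x^2 = g(x) − h(x) with g(x) = x^4 and h(x) = 2x^2, both convex. Each CCP step replaces h by its linearization at the current iterate and minimizes the resulting convex surrogate x^4 − h'(x_j)x, whose minimizer has the closed form x = cbrt(x_j) (set the derivative 4x^3 − 4x_j to zero). This is our own illustrative sketch:

```python
import math

def ccp_double_well(x0, iters=60):
    """Convex-concave procedure for f(x) = x^4 - 2x^2.
    Surrogate at iterate x_j: x^4 - 4*x_j*x (+ constant); its minimizer
    is the real cube root of x_j, so each CCP step is a closed-form update."""
    x = x0
    for _ in range(iters):
        x = math.copysign(abs(x) ** (1.0 / 3.0), x)  # argmin of the surrogate
    return x
```

Starting from any nonzero point, the iterates converge monotonically (in surrogate cost) to one of the two local minimizers x = ±1, mirroring how the CCP in Section V converges to a local minimizer of Problem 2 whose quality depends on the initial guess.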
IV. NLP FORMULATION FOR KL DIVERGENCE TERMINAL COST
If we consider Problem 1 with the terminal cost φ(ρ_N, ρ_d) = KL(ρ_N ‖ ρ_d), then we arrive at a nonlinear program (NLP) similar to Problem 2. However, using the variable transformation given in (23) will not yield a DCP as in the case with φ(ρ_N, ρ_d) = W^2(ρ_N, ρ_d). Thus, using state history feedback will not necessarily help us associate the covariance steering problem (Problem 1) with a tractable optimization problem. We will instead consider a memoryless state feedback (affine) controller of the form:

u_k = K(k)(x_k − x̄_k) + u_ff(k),   (28)

where x̄_k = E[x_k]. Independently of the choice of the controller form, we can express the running cost term of the performance index of Problem 1 as in (21) and also obtain expressions for the mean and covariance of the final state x(N) as in (20a) and (20b). The only difference will be in the matrix K, which is now defined as K := [blkdiag(K(0), K(1), ..., K(N−1)), 0], which is significantly more sparse than in the previous case. Because the closed-loop system is linear (given the structure of the controller in (28)), the final state x(N) will be a Gaussian random variable x(N) ∼ N(μ_N, S_N) and thus we can recover an NLP using the expression for the KL divergence given in (4). Since the KL divergence is not symmetric, the final objective function will depend on the order of ρ_N and ρ_d. We take φ(ρ_N, ρ_d) = KL(ρ_N ‖ ρ_d). Thus, the objective function can be expressed as follows by plugging (21) and (20) into (4):

J(K, u_ff) = ‖u_ff‖^2 + tr(K(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} K^T)
  + (λ/2)[tr(S_d^{−1} F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T)
  + (μ_d − F(Γ μ_0 + H_u u_ff))^T S_d^{−1} (μ_d − F(Γ μ_0 + H_u u_ff))
  − n_x + log(det S_d) − log(det(F(I − H_u K)^{−1} S̃ (I − H_u K)^{−T} F^T))].   (29)

In this case, Problem 1 reduces to the following optimization problem:

Problem 3.
Let μ_0, μ_d ∈ R^{n_x}, S_0, S_d ∈ S_{n_x}^{++}, N ∈ Z_{++} and {A_k, B_k, G_k}_{k=0}^{N}, where A_k ∈ R^{n_x × n_x}, B_k ∈ R^{n_x × n_u} and G_k ∈ R^{n_x × n_w}, be given. Find a pair (K*, u_ff*), where K* := [blkdiag(K*(0), K*(1), ..., K*(N−1)), 0] with K*(i) ∈ R^{n_u × n_x} for i ∈ {0, ..., N−1}, and u_ff* ∈ R^{n_u N}, that minimizes the objective function J(K, u_ff) defined in (29).

Because the objective function given in (29) is not convex in (K, u_ff), Problem 3 corresponds, in general, to a non-convex NLP. In addition, Problem 3 does not correspond to a DCP, but local minimizers of this problem can still be computed by using nonlinear interior point methods and solvers such as IPOPT [18] and the scipy optimization package [19], which are readily available.

V. NUMERICAL EXPERIMENTS
In this section, we present numerical experiments in which we used the convex-concave procedure (CCP) with MOSEK [20] to solve Problem 2 and CVXPY [21] for the modeling of the convexified subproblems. To solve Problem 3, which is a nonlinear program, we used the scipy optimization [19] implementation of the L-BFGS-B algorithm. We consider the linear state space model (5) with A_k = [1, Δt; 0, 1], B_k = [0, Δt]^T, G_k = I, w_k ∼ N(0, γI), for all k ∈ Z_+. We also took x_0 ∼ N(μ_0, S_0), μ_0 = [0, 0]^T, S_0 = 10 I, μ_d = [10, 0]^T, S_d = I, Δt = 1. In addition, N ∈ {10, 20, 30, 40, 50} and γ ∈ {1, 0.1} were chosen for the different experiments used to compare computation times.

Figure 1 illustrates the evolution of the state distribution of the system. We use λ = 10.0 for the Wasserstein distance case and λ = 70.0 for the KL divergence case, for scaling purposes. The noise intensity parameter is γ = 1 and the problem horizon is N = 20 in both experiments. The final state covariance matrices are approximately [0.81, 0.19; 0.19, 1] for the Wasserstein distance and [0.65, 0.06; 0.06, 2] for the KL divergence. Since both problems are non-convex, the obtained solutions are expected to depend on the initial guess. However, repeating the experiments with different initial guesses did not change the final cost and the covariance matrices significantly, even though the control policy parameters did change.
Fig. 1. Evolution of 2-σ confidence ellipses (in green) for the controlled system based on KL divergence (top) and Wasserstein distance (bottom) terminal cost functions. The blue ellipses correspond to the 2-σ confidence ellipses of the initial state, whereas the red ellipses correspond to the desired distribution.

In Figure 2, sample paths of the controlled system are shown for N = 40. We observe that the optimal control policy allows the spread of trajectories (uncertainty) to "grow" in the beginning and tries to reduce it towards the end of the time horizon. This result is expected given that the state covariance is not penalized in the running cost term of the performance index in Problem 1, whereas the uncertainty in the control input is penalized by the term tr(Θ S̃ Θ^T).

In Table I, we compare the computation time of the NLP solver [19] and our CCP based approach for different problem instances with different values of the noise intensity parameter γ and the problem horizon N. In our simulations, we used the termination condition (f_k − f_{k−1})/f_k ≤ ε, where f_k is the value of the objective function at the k-th iteration and ε is the convergence tolerance. We observe that our approach reduces the computation time significantly in all cases.

Fig. 2. 15 sample paths of the controlled system with Wasserstein distance terminal cost, where the blue ellipses are the 2-σ confidence regions for the initial and final states and the red ellipse is the 2-σ confidence region of the desired distribution (γ = 1, λ = 10.0, N = 40).

TABLE I
COMPUTATION TIME (IN SECONDS) FOR DIFFERENT PROBLEM INSTANCES FOR THE WASSERSTEIN DISTANCE TERMINAL COST

γ = 1  | N=10  | N=20  | N=30   | N=40   | N=50
NLP    | 7.88  | 44.30 | 120.93 | 348.39 | 643.65
CCP    | 0.93  | 7.65  | 12.81  | 32.85  | 68.72

γ = 0.1 | N=10  | N=20  | N=30   | N=40   | N=50
NLP     | 18.01 | 28.57 | 209.93 | 510.14 | 907.40
CCP     | 2.89  | 17.11 | 53.68  | 156.29 | 314.52
VI. CONCLUSION
We have addressed the covariance steering problem with soft terminal constraints based on two different problem formulations in which the terminal cost is associated with either the squared Wasserstein distance or the KL divergence between the terminal state distribution and a desired distribution. We have shown that in the case with the squared Wasserstein distance terminal cost, the proposed covariance steering problem reduces to a DCP, which can be solved efficiently by the so-called convex-concave procedure along with convex optimization solvers. Our numerical experiments have shown that our approach reduces significantly the computation time compared to off-the-shelf solvers. In our future work, we plan to extend our approach to covariance steering problems for nonlinear stochastic systems.
REFERENCES

[1] A. L. Yuille and A. Rangarajan. "The concave-convex procedure". In: Neural Computation 15.4 (2003), pp. 915–936.
[2] In: Int. J. of Control.
[3] In: IEEE Trans. on Autom. Control.
[4] In: Int. J. of Control.
[5] In: IEEE Trans. on Autom. Control.
[6] In: IEEE Trans. on Autom. Control.
[7] In: IEEE Trans. on Autom. Control.
[8] In: ACC. 2016, pp. 7231–7236.
[9] E. Bakolas. "Finite-horizon covariance control for discrete-time stochastic linear systems subject to input constraints". In: Automatica 91 (2018), pp. 61–68.
[10] K. Okamoto, M. Goldshtein, and P. Tsiotras. "Optimal covariance control for stochastic systems under chance constraints". In: IEEE Control Systems Letters (2018).
[11] A. Halder and E. D. B. Wendel. "Finite horizon linear quadratic Gaussian density regulator with Wasserstein terminal cost". In: ACC. 2016, pp. 7249–7254.
[12] X. Shen et al. "Disciplined convex-concave programming". In: CDC. 2016, pp. 1009–1014.
[13] A. L. Gibbs and F. E. Su. "On choosing and bounding probability metrics". In: Int. Stat. Review 70.3 (2002), pp. 419–435.
[14] In: Michigan Math. J.
[15] In: J. Optim. Theory Appl.
[16] In: IEEE Trans. on Autom. Control.
[17] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
[18] A. Wächter and L. T. Biegler. "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming". In: Mathematical Programming 106 (2006), pp. 25–57.
[19] P. Virtanen et al. "SciPy 1.0: fundamental algorithms for scientific computing in Python". In: Nature Methods (2020).
[20] MOSEK ApS. The MOSEK optimization toolbox.
[21] S. Diamond and S. Boyd. "CVXPY: A Python-embedded modeling language for convex optimization". In: J. Mach. Learn. Res. 17.83 (2016), pp. 1–5.