Regularization of time-varying covariance matrices using linear stochastic systems
Lipeng Ning
Abstract
This work focuses on modeling time-varying covariance matrices using the state covariance of linear stochastic systems. Following concepts from optimal mass transport and the Schrödinger bridge problem (SBP), we investigate several covariance paths induced by different regularizations of the system matrices. Specifically, one of the proposed covariance paths generalizes the geodesics based on the Fisher-Rao metric to the situation with stochastic input. Another type of covariance path is generated by linear system matrices with a rotating eigenspace in the noiseless situation. The main contributions of this paper include the differential equations for these covariance paths and closed-form expressions for scalar-valued covariance. We also compare these covariance paths using several examples.
Index Terms
Optimal control; linear stochastic system; Fisher-Rao metric; optimal mass transport
I. INTRODUCTION
Consider a linear stochastic system
dx_t = A_t x_t dt + σ dw_t, (1)
where A_t ∈ R^{n×n} and w_t is a standard n-dimensional Wiener process. The corresponding state covariance P_t := E(x_t x_t′) evolves according to
Ṗ_t = A_t P_t + P_t A_t′ + σ² I, (2)

This work was supported in part under grants R21MH115280 (PI: Ning), R01MH097979 (PI: Rathi), R01MH111917 (PI: Rathi), R01MH074794 (PI: Westin). L. Ning is with the Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School. Email: [email protected]
where I denotes the identity matrix. Given two positive definite matrices P_0, P_1 at t = 0, 1, respectively, we consider covariance paths that connect the two covariances, given by the solution to the following problem:
min_{P_t, A_t} { ∫_0^1 f(A_t) dt | (2) holds, and P_0, P_1 specified }, (3)
where f(A_t) denotes a quadratic function of A_t which may depend on P_t. The optimal solutions to (3) provide parametric models of covariance paths that generalize shortest lines or geodesics. These paths can be applied to fit noisy measurements in order to understand the underlying system dynamics of stochastic processes.
In this paper, we investigate the solutions to (3) corresponding to three objective functions, which were explored in [1] using linear systems without stochastic input noise. Specifically, the first type of objective function quantifies the optimal mass transport cost between multivariate Gaussian distributions [2]–[5] when there is no stochastic noise. In the presence of noise, it has recently been shown that the optimal solution becomes the covariance path induced by the Schrödinger bridge problem (SBP) [6]–[8]. The second objective function quantifies a type of weighted-mass-transport cost, which is also related to the Fisher-Rao metric from information geometry [9], [10]. The third family of objective functions is given by standard weighted least squares of the system matrices. The main contributions of this paper include the differential-equation formulations of the optimal paths induced by the last two objective functions and the closed-form expressions in the special case of scalar-valued covariance.
The organization of this paper is as follows. In Section II we revisit covariance paths based on optimal mass transport and the SBP.
In Section III, we derive the covariance paths corresponding to a weighted-mass-transport cost function, which is equivalent to the Fisher-Rao metric in the noiseless situation. In Section IV, we investigate the covariance paths corresponding to weighted-least-squares cost functions, which include the Fisher-Rao based covariance paths as a special case. In Section V, we compare these covariance paths using several examples. Section VI concludes the paper with discussions.
Regarding notation, S^n, S^n_+, S^n_++ denote the sets of symmetric, positive semidefinite, and positive definite matrices of size n×n, respectively. Small boldface letters, e.g. x, v, represent column vectors. Capital letters, e.g. P, A, denote matrices. Regular small letters, e.g. w, h, are used for scalars or scalar-valued functions.
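Although the experiments reported later use MATLAB, the propagation law (2) is easy to reproduce in any numerical environment. The following Python/NumPy sketch (our own illustrative code, not part of the original text) integrates (2) and checks it against the scalar closed form p_t = e^{2at}p_0 + σ²(e^{2at} − 1)/(2a), which solves ṗ = 2ap + σ² for a constant scalar system:

```python
import numpy as np
from scipy.integrate import solve_ivp

def propagate_cov(A, P0, sigma2, t1=1.0):
    """Integrate the Lyapunov equation (2): dP/dt = A P + P A' + sigma^2 I."""
    n = P0.shape[0]
    def rhs(t, y):
        P = y.reshape(n, n)
        return (A @ P + P @ A.T + sigma2 * np.eye(n)).ravel()
    sol = solve_ivp(rhs, (0.0, t1), P0.ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(n, n)

# scalar sanity check against the closed form of dp/dt = 2 a p + sigma^2
a, p0, sigma2 = -0.7, 2.0, 0.5
p1_num = propagate_cov(np.array([[a]]), np.array([[p0]]), sigma2)[0, 0]
p1_exact = np.exp(2 * a) * p0 + sigma2 * (np.exp(2 * a) - 1) / (2 * a)
```

Here sigma2 stands for σ², matching the convention of (2); the function name and test values are our own.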
II. MASS-TRANSPORT BASED COVARIANCE PATHS
Consider u_t = A_tx_t as the control input that steers the state covariance matrices according to (2), and define
f^omt_{P_t}(A_t) = E(||u_t||²) = tr(A_tP_tA_t′).
Then the minimum work needed to steer the covariance matrix from P_0 to P_1 is equal to
min_{P_t, A_t} { ∫_0^1 tr(A_tP_tA_t′) dt | Ṗ_t = A_tP_t + P_tA_t′ + σ²I, P_0, P_1 specified }. (4)
If the noise component vanishes, i.e. σ = 0, then (4) gives the optimal mass transport distance [2]–[5] between two zero-mean Gaussian distributions with covariance matrices P_0 and P_1, respectively. It is also related to the Bures distance between density matrices in quantum mechanics [11]–[13]. In the general situation when σ is non-zero, (4) can be viewed as the Schrödinger bridge problem (SBP) [6] between two zero-mean Gaussian probability density functions with covariances P_0 and P_1, respectively. The basic idea of the SBP is to find a stochastic system whose probability law on the diffusion paths is closest, as measured by relative entropy, to that of a reference system while satisfying the initial and final marginal probability distributions. Its relations with stochastic optimal control and mass transport have been studied extensively in recent years [7], [8]. The more general situation in which the reference model is a linear time-varying system with degenerate diffusion, i.e. noise that influences only a subspace of the random state, has also been studied recently in [14], [15]. The following proposition presents the solution to (4).
Proposition 1.
Given P_0, P_1 ∈ S^n_++ and a scalar σ. The unique solution to (4) is given by
A^omt_t = −Π_0(I − Π_0t)^{−1}, (5)
P^omt_t = (I − Π_0t)P_0(I − Π_0t) + σ²(It − Π_0t²), (6)
where
Π_0 = I − P_0^{−1/2}((P_0^{1/2}P_1P_0^{1/2} + (σ⁴/4)I)^{1/2} − (σ²/2)I)P_0^{−1/2}. (7)
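Before turning to the proof, the closed-form solution can be checked numerically. The sketch below (our own Python/NumPy code; the test matrices are illustrative) evaluates (6)–(7) and verifies the boundary conditions P^omt_0 = P_0 and P^omt_1 = P_1, as well as positive definiteness along the path:

```python
import numpy as np
from scipy.linalg import sqrtm

def sbp_path(P0, P1, sigma2, t):
    """Evaluate the covariance path of Proposition 1.

    Pi0 = I - P0^{-1/2}((P0^{1/2} P1 P0^{1/2} + sigma^4/4 I)^{1/2}
                        - sigma^2/2 I) P0^{-1/2}            # (7)
    P_t = (I - t Pi0) P0 (I - t Pi0) + sigma^2 (t I - t^2 Pi0)  # (6)
    """
    n = P0.shape[0]
    I = np.eye(n)
    S = np.real(sqrtm(P0))                 # P0^{1/2}
    Sinv = np.linalg.inv(S)
    M = np.real(sqrtm(S @ P1 @ S + 0.25 * sigma2 ** 2 * I))
    Pi0 = I - Sinv @ (M - 0.5 * sigma2 * I) @ Sinv
    B = I - t * Pi0
    return B @ P0 @ B + sigma2 * (t * I - t ** 2 * Pi0)

P0 = np.array([[2.0, 0.5], [0.5, 1.0]])
P1 = np.array([[1.0, -0.2], [-0.2, 1.5]])
```

Here sigma2 denotes σ², so sigma2**2 is the σ⁴ appearing in (7).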
Proof.
The optimization problem (4) is viewed as an optimal control problem with A_t a matrix-valued control input. A necessary condition for optimality is that the derivative of the Hamiltonian
h(P_t, A_t, Π_t) = tr(A_tP_tA_t′ + Π_t(A_tP_t + P_tA_t′ + σ²I))
with respect to A_t vanishes at the optimal control. This gives A_tP_t + Π_tP_t = 0, thus
A_t = −Π_t. (8)
Next, the optimal co-state satisfies Π̇_t = −∂h(·)/∂P_t, which leads to
Π̇_t = −A_t′A_t − A_t′Π_t − Π_tA_t. (9)
Substituting (8) into (2) and (9), we deduce that
Ṗ_t = −Π_tP_t − P_tΠ_t + σ²I, (10)
Π̇_t = Π_t². (11)
The solutions for P_t and Π_t of (10) and (11) have the form
P_t = (I − Π_0t)P_0(I − Π_0t) + σ²(It − Π_0t²), (12)
Π_t = Π_0(I − Π_0t)^{−1}. (13)
Next, we show that all eigenvalues of the initial Π_0 must be smaller than one. Assume Π_0 has an eigenvalue λ > 1; then A_t = −Π_t becomes unbounded as t increases to 1/λ. At the same time, P_t is singular at t = 1/λ and fails to be positive semidefinite for t > 1/λ. Therefore, all eigenvalues of Π_0 are smaller than one.
Next, setting t = 1 in (12) and multiplying both sides by P_0^{1/2} from the left and the right, we obtain
P_0^{1/2}P_1P_0^{1/2} = (P_0^{1/2}(I − Π_0)P_0^{1/2})² + σ²P_0^{1/2}(I − Π_0)P_0^{1/2}.
Therefore P_0^{1/2}(I − Π_0)P_0^{1/2} has the same eigenvectors as P_0^{1/2}P_1P_0^{1/2}. If y is an eigenvalue of P_0^{1/2}P_1P_0^{1/2}, then the corresponding eigenvalue x of P_0^{1/2}(I − Π_0)P_0^{1/2} satisfies x² + σ²x = y. The two solutions are given by
x_± = −σ²/2 ± (y + σ⁴/4)^{1/2}.
But x_− is negative, which contradicts the positive definiteness of I − Π_0. Thus x_+ is the only feasible solution. Therefore,
P_0^{1/2}(I − Π_0)P_0^{1/2} = (P_0^{1/2}P_1P_0^{1/2} + (σ⁴/4)I)^{1/2} − (σ²/2)I,
which gives rise to the optimal solution in (7). The proposition is proved.
We note that if Π_0 is singular, then A^omt_t is also singular for all t ∈ [0, 1], which implies free diffusion in the subspace spanned by the eigenvectors of Π_0 corresponding to zero eigenvalues. In the noiseless situation σ = 0, the covariance path in (6) is equal to the geodesic induced by the Wasserstein-2 metric.
III. INFORMATION-GEOMETRY BASED COVARIANCE PATHS
The Fisher information metric provides a well-defined distance measure between probability distributions. For zero-mean multivariate Gaussian distributions, it can be expressed as a quadratic form of the covariance matrices, which is referred to as the Fisher-Rao metric [9], [10]. Specifically, let P be a positive definite covariance matrix and let Δ be a symmetric matrix denoting a tangent direction at P on the manifold of positive definite matrices. Then the Fisher-Rao metric has the following form [10]:
g_P(Δ) = tr(P^{−1}ΔP^{−1}Δ).
The geodesic connecting P_0, P_1 induced by the Fisher-Rao metric is the optimal solution to
min_{P_t} { ∫_0^1 tr(P_t^{−1}Ṗ_tP_t^{−1}Ṗ_t) dt | P_0, P_1 specified }. (14)
The optimal solution has the following well-known expression [16]:
P_t = P_0^{1/2}(P_0^{−1/2}P_1P_0^{−1/2})^t P_0^{1/2}. (15)
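The geodesic (15) is straightforward to evaluate with matrix square roots and fractional matrix powers. The following Python sketch (our own illustrative code) does so and checks the endpoints, including the scalar case where (15) reduces to the geometric interpolation p_0^{1−t}p_1^t:

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def fr_geodesic(P0, P1, t):
    """Evaluate (15): P_t = P0^{1/2} (P0^{-1/2} P1 P0^{-1/2})^t P0^{1/2}."""
    S = np.real(sqrtm(P0))
    Sinv = np.linalg.inv(S)
    return S @ np.real(fractional_matrix_power(Sinv @ P1 @ Sinv, t)) @ S

P0 = np.array([[2.0, 0.3], [0.3, 1.0]])
P1 = np.array([[1.5, -0.4], [-0.4, 2.5]])
```

For scalars, fr_geodesic([[4]], [[9]], 0.5) gives the geometric mean 6.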
In [1], we showed that this geodesic is also the optimal solution to an optimization problem of the form (3) in the situation when the noise component σ = 0. Specifically, we denote
f^info_{P_t}(A_t) = E(||u_t||²_{P_t^{−1}}) = E(u_t′P_t^{−1}u_t) = tr(P_t^{−1}A_tP_tA_t′).
Then (15) is also the optimal solution to
min_{P_t, A_t} { ∫_0^1 f^info_{P_t}(A_t) dt | Ṗ_t = A_tP_t + P_tA_t′, P_0, P_1 specified }, (16)
with the optimal system matrix given by the constant matrix
A = (1/2) P_0^{1/2} log(P_0^{−1/2}P_1P_0^{−1/2}) P_0^{−1/2}.
Note that the optimization problem (16) provides an interesting weighted-mass-transport view of the Fisher-Rao metric. Following (3), we consider the optimization problem
min_{P_t, A_t} { ∫_0^1 tr(P_t^{−1}A_tP_tA_t′) dt | Ṗ_t = A_tP_t + P_tA_t′ + σ²I, P_0, P_1 specified }. (17)
A. On the optimal paths
Using variational analysis, we obtain the following proposition on the optimal solution to (17).
Proposition 2.
Given P_0, P_1 ∈ S^n_++ and a scalar σ. If there exists a path that satisfies the following differential equations
Ṗ_t = −2P_tΠ_tP_t + σ²I, (18)
Π̇_t = 2Π_tP_tΠ_t, (19)
with P_t equal to P_0 and P_1 at t = 0 and 1, respectively, then P_t is an optimal solution to (17). The corresponding A_t is equal to
A_t = −P_tΠ_t. (20)
Moreover, the path P_t also satisfies the following differential equation:
P̈_t − Ṗ_tP_t^{−1}Ṗ_t + σ⁴P_t^{−1} = 0. (21)
Proof.
Following the same method as in the proof of Proposition 1, we derive the solution to the optimal control problem (17) using the Hamiltonian
h(P_t, A_t, Π_t) = tr(P_t^{−1}A_tP_tA_t′ + Π_t(A_tP_t + P_tA_t′ + σ²I)).
It is necessary that −Π̇_t equals the partial derivative of h(·) with respect to P_t, which leads to
Π̇_t = −A_t′P_t^{−1}A_t + P_t^{−1}A_tP_tA_t′P_t^{−1} − Π_tA_t − A_t′Π_t. (22)
Moreover, setting the derivative of h(·) with respect to A_t to zero gives
A_t = −P_tΠ_t. (23)
Then (18) and (19) are obtained by substituting (23) into (2) and (22), respectively. From (18), we obtain the expression
Π_t = (1/2)(σ²P_t^{−2} − P_t^{−1}Ṗ_tP_t^{−1}).
Next, taking the derivative of Π_t and setting it equal to (19), we obtain
Π̇_t = (1/2)(σ²(−P_t^{−1}Ṗ_tP_t^{−2} − P_t^{−2}Ṗ_tP_t^{−1}) + 2P_t^{−1}Ṗ_tP_t^{−1}Ṗ_tP_t^{−1} − P_t^{−1}P̈_tP_t^{−1})
= (1/2)(σ²P_t^{−2} − P_t^{−1}Ṗ_tP_t^{−1}) P_t (σ²P_t^{−2} − P_t^{−1}Ṗ_tP_t^{−1}).
Then (21) can be obtained from the above equation after simplification. Thus, the proof is complete.
In the noiseless situation σ = 0, (21) becomes
P̈_t − Ṗ_tP_t^{−1}Ṗ_t = 0,
which is the geodesic equation induced by the Fisher-Rao metric [16], [17]. In this case, the closed-form expression for P_t is given by (15). A closed-form expression for the optimal solution to (17) is currently unknown to the author, except for the scalar-valued case given below.
B. The scalar case
Consider a scalar-valued covariance path p_t expressed by one of the following equations:
p_t = p_0 + σ²t, (24)
p_t = ae^{bt} − (σ⁴/(4ab²))e^{−bt}, (25)
p_t = (σ²/ω) cos(ωt + θ), (26)
with 0 < ω < π and −π/2 < θ < π/2. It is straightforward to verify that all of the above expressions satisfy
p_t p̈_t − (ṗ_t)² + σ⁴ = 0,
which is the scalar version of (21). Clearly, (24) corresponds to a covariance path purely driven by the input noise. For the covariance path in (26), the constraint 0 < ω < π ensures that p_t does not take negative values on t ∈ [0, 1]. Moreover, the constraint −π/2 < θ < π/2 guarantees that p_0 is positive. The following proposition shows that (25) and (26) connect covariances satisfying different boundary conditions. Proposition 3.
Given three positive scalars p_0, p_1 and σ. If |p_1 − p_0| > σ², then there exists a unique solution to (17) of the form (25). If |p_1 − p_0| < σ², then there exists a unique solution to (17) of the form (26).
Proof. The paths in (25) and (26) both satisfy (21). We will prove that if |p_1 − p_0| > σ², then there exists a unique pair of parameters a, b such that p_t in (25) equals p_0 and p_1 at t = 0, 1, respectively, and no path of the form (26) satisfies the boundary conditions; and that if |p_1 − p_0| < σ², then there exists a unique path of the form (26) and no path of the form (25) satisfying the boundary conditions.
To prove the first statement, we set p_t in (25) to p_0 and p_1 at t = 0, 1, respectively, to obtain
p_0 = a − σ⁴/(4ab²), (27)
p_1 = ae^b − (σ⁴/(4ab²))e^{−b}.
Then we can derive
a = (p_1 − p_0e^{−b}) / (e^b − e^{−b}).
Substituting this expression into (27) and rearranging yields
4b²(p_1 − p_0e^{−b})(p_1 − p_0e^b) − σ⁴(e^b − e^{−b})² = 0. (28)
This equation has a trivial solution at b = 0. Moreover, if b is a solution to (28), so is −b. For a non-zero b satisfying (28), the coefficient of the second term of p_t equals
−σ⁴/(4ab²) = (p_1 − p_0e^b) / (e^{−b} − e^b).
Therefore, (25) is equivalent to
p_t = ((p_1 − p_0e^{−b})/(e^b − e^{−b})) e^{bt} + ((p_1 − p_0e^b)/(e^{−b} − e^b)) e^{−bt}.
Thus, switching b to −b does not change the covariance path in (25), and we only consider positive solutions to (28). To this end, we denote the left-hand side of (28) by φ(b). It is straightforward to derive that
dφ(b)/db |_{b=0} = 0, d²φ(b)/db² |_{b=0} = 8((p_1 − p_0)² − σ⁴).
Moreover,
d³φ(b)/db³ = −4p_0p_1((e^b − e^{−b})(b² + 6) + 6b(e^b + e^{−b})) − 8σ⁴(e^{2b} − e^{−2b}),
which is negative for all b > 0. Therefore, if (p_1 − p_0)² − σ⁴ > 0, then the function φ(b) is convex and positive near b = 0, but φ(b) < 0 as b → ∞.
Because its second-order derivative d²φ(b)/db² is monotonically decreasing, we conclude that there exists a unique positive solution to φ(b) = 0.
To prove the second statement, we take the derivative of p_t given by (26) to obtain ṗ_t = −σ² sin(ωt + θ). If p_0, p_1 are the endpoints of p_t, then it is necessary that
|p_1 − p_0| = |∫_0^1 ṗ_t dt| ≤ ∫_0^1 |ṗ_t| dt ≤ σ².
To show that there exists a path of the form (26) for any p_0, p_1 satisfying |p_1 − p_0| < σ², we re-parameterize (26) as
p_t = c cos(ωt) + d sin(ωt), with (c² + d²)ω² = σ⁴.
By setting p_t to p_0, p_1 at t = 0, 1, respectively, we obtain
p_0 = c, (29)
p_1 = c cos(ω) + d sin(ω).
Therefore,
d = (p_1 − p_0 cos(ω)) / sin(ω). (30)
Next, substituting (29) and (30) into sin²(ω)(c² + d²) = sin²(ω)σ⁴/ω², we obtain
(σ⁴/ω²) sin²(ω) + 2p_0p_1 cos(ω) = p_0² + p_1². (31)
We denote the left-hand side of the above equation by ψ(ω). Then the following holds:
lim_{ω→0} ψ(ω) = σ⁴ + 2p_0p_1 > (p_1 − p_0)² + 2p_0p_1 = p_0² + p_1².
Moreover, ψ(π) = −2p_0p_1 < 0. Furthermore, it can be shown that
dψ(ω)/dω = 2((σ⁴/ω³)(ω cos(ω) − sin(ω)) − p_0p_1) sin(ω) < 0
for all ω ∈ (0, π), since ω cos(ω) − sin(ω) < 0 on this interval. Therefore, there exists a unique solution to (31) for ω ∈ (0, π), which completes the proof.
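The construction in this proof is directly computable. The sketch below (our own Python code; the parameter values are illustrative) solves (31) for ω by bracketing root-finding and builds the interpolating path p_t = c cos(ωt) + d sin(ωt) from (29)–(30):

```python
import numpy as np
from scipy.optimize import brentq

def solve_omega(p0, p1, sigma2):
    """Find the unique omega in (0, pi) solving (31); assumes |p1 - p0| < sigma^2."""
    s4 = sigma2 ** 2
    def psi(w):
        return s4 / w ** 2 * np.sin(w) ** 2 + 2 * p0 * p1 * np.cos(w) - (p0 ** 2 + p1 ** 2)
    return brentq(psi, 1e-6, np.pi - 1e-9)

p0, p1, sigma2 = 1.0, 1.5, 2.0            # oscillatory regime: |p1 - p0| < sigma^2
w = solve_omega(p0, p1, sigma2)
c = p0                                     # (29)
d = (p1 - p0 * np.cos(w)) / np.sin(w)      # (30)
p = lambda t: c * np.cos(w * t) + d * np.sin(w * t)
```

By construction p interpolates the endpoints, and at the root of (31) the extremality condition (c² + d²)ω² = σ⁴ holds.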
IV. WEIGHTED-LEAST-SQUARES BASED COVARIANCE PATHS
For scalar-valued covariance, the Fisher-Rao metric g_p(ṗ) can be viewed as the squared norm (ṗ/p)². For matrices, there is in general no unique way to define matrix division. For example, in the differential equation
Ṗ_t = A_tP_t + P_tA_t′, (32)
the matrix A_t can be viewed as a non-commutative division of Ṗ_t by P_t. For a given pair of matrices P_t and Ṗ_t there are infinitely many A_t that satisfy (32), but there can be a unique one that minimizes or maximizes a quadratic function f(A_t), whose optimal value then provides a well-defined quadratic norm of the non-commutative division of Ṗ_t by P_t. The optimal value of the quadratic function provides a way to define Riemannian metrics on the manifold of covariance matrices in order to quantify the similarity between covariance matrices. This motivates our choice of the third objective function.
Specifically, for a matrix A_t ∈ R^{n×n}, we decompose it as A_t = A_{t,s} + A_{t,a}, where
A_{t,s} := (1/2)(A_t + A_t′), A_{t,a} := (1/2)(A_t − A_t′)
are the symmetric and antisymmetric parts of A_t, respectively. For a given scalar ε > 0, we define the following weighted squared norm of A_t:
f^wls_ε(A_t) := ||A_{t,s}||² + ε||A_{t,a}||² = ((1+ε)/2) tr(A_tA_t′) + ((1−ε)/2) tr(A_tA_t).
Following (3), we consider the optimization problem
min_{P_t, A_t} { ∫_0^1 ((1+ε)/2) tr(A_tA_t′) + ((1−ε)/2) tr(A_tA_t) dt | Ṗ_t = A_tP_t + P_tA_t′ + σ²I, P_0, P_1 specified }. (33)
Proposition 4.
Given P_0, P_1 ∈ S^n_++ and two scalars ε, σ > 0. If there exists a pair of matrix-valued functions P_t, Π_t that satisfy
Ṗ_t = −((1+ε)/(2ε))(Π_tP_t² + P_t²Π_t) + ((1−ε)/ε) P_tΠ_tP_t + σ²I, (34)
Π̇_t = ((1+ε)/(2ε))(Π_t²P_t + P_tΠ_t²) − ((1−ε)/ε) Π_tP_tΠ_t, (35)
with P_t equal to P_0, P_1 at t = 0, 1, respectively, then P_t is an optimal solution to (33). Moreover, the optimal A_t is given by
A_t = −(1/2)(Π_tP_t + P_tΠ_t) + (1/(2ε))(P_tΠ_t − Π_tP_t). (36)
Proof.
Following the same method as in the proof of Proposition 2, we develop the necessary conditions for a stationary value using the Hamiltonian
h(P_t, A_t, Π_t) = tr( ((1+ε)/2)A_tA_t′ + ((1−ε)/2)A_tA_t + Π_t(A_tP_t + P_tA_t′ + σ²I) ).
Setting −Π̇_t equal to the partial derivative of h(·) with respect to P_t, we obtain
Π̇_t = −Π_tA_t − A_t′Π_t. (37)
Moreover, setting the partial derivative of h(·) with respect to A_t equal to zero, we obtain
(1+ε)A_t + (1−ε)A_t′ + 2Π_tP_t = 0. (38)
Thus, the symmetric and antisymmetric parts of A_t are equal to
A_{t,s} = −(1/2)(Π_tP_t + P_tΠ_t), A_{t,a} = (1/(2ε))(P_tΠ_t − Π_tP_t).
Therefore (36) holds. Then (34) and (35) are obtained by substituting (36) into (2) and (37), respectively.
Moreover, the path defined by (34) and (35) coincides with the path given by (18) and (19) for scalar-valued covariance. Therefore, the results of Proposition 3 can also be adapted to the solution of (33).
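The algebra above can be double-checked numerically: the right-hand sides of (34)–(35) should be consistent with the constraint (2) and the co-state equation (37) under the optimal control (36), and should reduce to (18)–(19) when ε = −1. A Python sketch (our own code; the test matrices are illustrative):

```python
import numpy as np

def wls_fields(P, Pi, eps, sigma2):
    """Right-hand sides of (34)-(35) and the optimal A_t of (36)."""
    n = P.shape[0]
    a = (1 + eps) / (2 * eps)
    b = (1 - eps) / eps
    Pdot = -a * (Pi @ P @ P + P @ P @ Pi) + b * (P @ Pi @ P) + sigma2 * np.eye(n)
    Pidot = a * (Pi @ Pi @ P + P @ Pi @ Pi) - b * (Pi @ P @ Pi)
    A = -0.5 * (Pi @ P + P @ Pi) + (P @ Pi - Pi @ P) / (2 * eps)   # (36)
    return Pdot, Pidot, A

P = np.array([[1.5, 0.2], [0.2, 0.8]])
Pi = np.array([[0.3, -0.1], [-0.1, 0.4]])
```

The consistency checks are exact matrix identities, so they hold to machine precision for any ε ≠ 0.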
Proposition 5.
Given three positive scalars p_0, p_1 and σ. If |p_1 − p_0| > σ², then there exists a unique solution to (33) of the form (25). If |p_1 − p_0| < σ², then there exists a unique solution to (33) of the form (26).
Remark.
For any given ε ≠ 0 and a pair of initial values Π_0, P_0, the pair of equations (34) and (35) provides a smooth path of covariance matrices. In the special case ε = −1, (34) and (35) are equivalent to the Fisher-Rao based path given by (18) and (19). In the noiseless situation, the existence and uniqueness of the covariance path satisfying (34) and (35) has been shown in [1] for ε taking values around −1.
To understand the influence of the input noise on the optimal covariance path, we consider a trajectory defined by (34)–(36) with given initial values P_0 and Π_0, where the symmetric and antisymmetric parts of the system matrix A_t are initialized by (36). It is then straightforward to verify that the antisymmetric part of A_t given by (36) is constant; we denote it by A_{0,a}, which is equal to (1/(2ε))(P_0Π_0 − Π_0P_0). On the other hand, by taking the derivative of the symmetric part of A_t, denoted by A_{t,s}, we obtain
Ȧ_{t,s} = (1+ε)(A_{0,a}A_{t,s} + A_{t,s}A_{0,a}′) − σ²Π_t. (39)
Next, we apply a change of variables to define
Â_t = e^{(1+ε)A_{0,a}′t}(A_{t,s} + εA_{0,a}′)e^{(1+ε)A_{0,a}t}.
Similarly, applying a time-varying change of coordinates, we define
P̂_t = e^{(1+ε)A_{0,a}′t}P_te^{(1+ε)A_{0,a}t}, Π̂_t = e^{(1+ε)A_{0,a}′t}Π_te^{(1+ε)A_{0,a}t}.
Then the derivative of the new variable Â_t is equal to
Â̇_t = −σ²Π̂_t. (40)
Moreover, the derivative of P̂_t is equal to
P̂̇_t = (1+ε)A_{0,a}′P̂_t + (1+ε)P̂_tA_{0,a} + e^{(1+ε)A_{0,a}′t}(A_tP_t + P_tA_t′ + σ²I)e^{(1+ε)A_{0,a}t} = Â_tP̂_t + P̂_tÂ_t′ + σ²I. (41)
Similarly, the derivative of Π̂_t is given by
Π̂̇_t = −Â_t′Π̂_t − Π̂_tÂ_t. (42)
Combining (40) and (42), we obtain that the covariance path is determined by the following second-order differential equation in the rotating frame:
Â̈_t + Â_t′Â̇_t + Â̇_t′Â_t = 0, (43)
with the initial values given by
Â_0 = A_{0,s} + εA_{0,a}′ = −P_0Π_0, (44)
Â̇_0 = −σ²Π_0. (45)
In the special case σ = 0, the matrix Â_t is clearly constant based on (40). Therefore, (41) becomes a linear time-invariant system whose solution is given by the standard expression P̂_t = e^{Â_0t}P_0e^{Â_0′t}. Transforming Â_t and P̂_t back to the original coordinates, we obtain
A_t = e^{(1+ε)A_{0,a}t}A_0e^{(1+ε)A_{0,a}′t}, (46)
P_t = T_{ε,t}(A_0)P_0T_{ε,t}(A_0)′, (47)
where
T_{ε,t}(A_0) = e^{(1+ε)A_{0,a}t}e^{(A_{0,s}+εA_{0,a}′)t},
A_{0,s} = −(1/2)(Π_0P_0 + P_0Π_0), A_{0,a} = (1/(2ε))(P_0Π_0 − Π_0P_0).
T_{ε,t}(A_0) is the state-transition matrix, which satisfies Ṫ_{ε,t}(A_0) = A_tT_{ε,t}(A_0). It is interesting to note that the system matrix A_t has a rotating eigenspace. If ε is sufficiently close to −1, then there exists a unique covariance path of the form (47) that connects any given P_0, P_1 ∈ S^n_++ [1].
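In the noiseless case, (46)–(47) can be verified directly, since T_{ε,t}(A_0) and A_t are explicit matrix exponentials. The following Python sketch (our own code; P_0, Π_0 and ε are illustrative choices) checks the identities Ṫ_{ε,t} = A_tT_{ε,t} and Ṗ_t = A_tP_t + P_tA_t′ by finite differences, as well as the initialization (44):

```python
import numpy as np
from scipy.linalg import expm

# illustrative initial data: a covariance P0 and a symmetric co-state Pi0
P0 = np.array([[1.5, 0.2], [0.2, 0.8]])
Pi0 = np.array([[0.3, -0.4], [-0.4, 0.1]])
eps = 0.5

A0s = -0.5 * (Pi0 @ P0 + P0 @ Pi0)        # symmetric part of A_0
A0a = (P0 @ Pi0 - Pi0 @ P0) / (2 * eps)   # antisymmetric part of A_0
A0 = A0s + A0a

def T(t):
    """State-transition matrix of (47)."""
    return expm((1 + eps) * A0a * t) @ expm((A0s + eps * A0a.T) * t)

def A(t):
    """Rotating system matrix of (46)."""
    R = expm((1 + eps) * A0a * t)
    return R @ A0 @ R.T

def P(t):
    """Noiseless covariance path of (47)."""
    return T(t) @ P0 @ T(t).T
```

Since A0a is antisymmetric, the factor e^{(1+ε)A_{0,a}t} is a rotation, which is what makes the eigenspace of A_t (and of P_t) rotate over time.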
V. EXAMPLES
A. Comparing scalar-valued covariance paths
In this example, we compare scalar-valued covariance paths given by the closed-form expressions in (6) and in (25)–(26). Figure 1 illustrates several covariance paths with initial value p_0 = 6 and σ² = 4. These plots clearly illustrate the differences between the paths obtained from OMT and from the Fisher-Rao metric, especially between the paths whose endpoints at t = 1 are far away from p_0 + σ². Fig. 1:
An illustration of the covariance paths. The dashed red lines, denoted by P^omt_t, illustrate the paths given by (6) based on OMT. The solid blue lines, denoted by P^info_t, illustrate the covariance paths given by (25)–(26) based on the Fisher-Rao metric. B. Comparing matrix-valued covariance paths
In this example, we illustrate the difference between the above covariance paths using two fixed endpoints P_0 and P_1, both 2×2 diagonal matrices. (48)
Fig. 2a shows the OMT-based paths in (6) using several different values of σ², where the ellipsoids denote isocontours of the quadratic function x′P_tx = r for a fixed r. Fig. 2b illustrates the Fisher-Rao-based paths given by (21). Since P_0 and P_1 commute, these paths are obtained by applying the closed-form expressions (25)–(26) to the diagonal entries. Though there are minor differences between the two sets of covariance paths, all the P_t's along both paths share the same eigenspace.
Regarding the covariance paths defined by (34) and (35), closed-form solutions are currently unknown. For this particular pair of endpoints, there exist multiple locally optimal paths. Specifically, we consider the solution to (33) when ε = 0. In this case, any antisymmetric system matrix of the form
A_± = [0, ±(2k+1)π; ∓(2k+1)π, 0], (49)
is an optimal solution, because the corresponding objective value is equal to zero. The corresponding covariance paths are of the form
P_{±,t} = e^{A_±t}P_0e^{A_±′t} + σ²It.
In order to obtain numerical solutions for the covariance paths given by (34) and (35) with ε > 0, we apply the lsqnonlin nonlinear optimization routine and the ode45 function in MATLAB (The MathWorks, Inc., Natick, MA) to solve for Π_0 such that the path P_t starting from P_0 has the least squared error relative to P_1 at t = 1. We first set ε to a small positive value and choose two initial values Π̂_{±,0} for Π_0, of opposite signs, so that the corresponding system matrices given by (36) approximately satisfy (49). Next, we gradually increase ε, using the optimal Π_0 from the previous step as the initial value. The left and right panels of Fig. 3 illustrate the two branches of locally optimal paths corresponding to the two different initial values of Π_0, for a fixed σ² and several different values of ε. All the numerical solutions satisfy the endpoint constraints with a negligible Frobenius norm of the residuals. Very different from the paths shown in Fig. 2, the paths in Fig. 3 have rotating eigenspaces, where the rotation direction depends on the initial choice of Π_0. Moreover, as ε increases, the paths become similar to those shown in Fig. 2.
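The same shooting procedure can be sketched outside MATLAB; below, lsqnonlin and ode45 are replaced by SciPy's least_squares and solve_ivp (our own hypothetical setup: the endpoint P_1 is generated from a known co-state and then recovered by the optimization, mirroring the initialize-near-a-guess strategy described above):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

eps, sigma2 = 0.5, 0.1                      # illustrative values
P0 = np.array([[1.5, 0.3], [0.3, 1.0]])

def rhs(t, y):
    """Right-hand sides of (34)-(35), flattened for the integrator."""
    P, Pi = y[:4].reshape(2, 2), y[4:].reshape(2, 2)
    a, b = (1 + eps) / (2 * eps), (1 - eps) / eps
    Pd = -a * (Pi @ P @ P + P @ P @ Pi) + b * P @ Pi @ P + sigma2 * np.eye(2)
    Pid = a * (Pi @ Pi @ P + P @ Pi @ Pi) - b * Pi @ P @ Pi
    return np.concatenate([Pd.ravel(), Pid.ravel()])

def endpoint(pi_params):
    """Integrate from P0 with Pi0 = [[x0, x1], [x1, x2]] and return P(1)."""
    x0, x1, x2 = pi_params
    Pi0 = np.array([[x0, x1], [x1, x2]])
    y0 = np.concatenate([P0.ravel(), Pi0.ravel()])
    sol = solve_ivp(rhs, (0.0, 1.0), y0, rtol=1e-10, atol=1e-12)
    return sol.y[:4, -1].reshape(2, 2)

# build a reachable target P1 from a known co-state, then recover it by shooting
pi_true = np.array([0.2, -0.1, 0.1])
P1 = endpoint(pi_true)
res = least_squares(lambda x: (endpoint(x) - P1).ravel(), pi_true + 0.05)
```

Only the three free entries of the symmetric Π_0 are optimized, exactly as in the shooting formulation above.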
Fig. 2:
An illustration of the covariance paths obtained using (6) (left) and (21) (right), respectively, with the two endpoints P_0 and P_1 given in (48). (a) (b) Fig. 3:
An illustration of the two branches, i.e. (a) and (b), of locally optimal covariance paths obtained using (34) and (35) with different initial values for Π_0. C. Fitting noisy measurements of functional MRI data
In this example, we apply the proposed covariance paths to fit noisy sample covariance matrices of a stochastic process based on a resting-state functional MRI (rsfMRI) dataset used in [1]. The interested reader is referred to [1] for more detailed information on the data. The problem is formulated as follows. Given K sample covariance matrices P̃_{t_1}, …, P̃_{t_K} computed from K segments of a multivariate time series, find a smooth covariance path P_t that minimizes
min_{P_t} Σ_{k=1}^K ||P_{t_k} − P̃_{t_k}||². (50)
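For the scalar version of the OMT model (6), namely p_t = (1 − tπ_0)²p_0 + σ²(t − t²π_0), the fitting problem (50) can be sketched as a small nonlinear least-squares problem (our own Python code with synthetic data; this is only an illustration of (50), not the rsfMRI experiment):

```python
import numpy as np
from scipy.optimize import least_squares

def model(theta, t):
    """Scalar OMT path of (6), parameterized by (p0, pi0, sigma^2)."""
    p0, pi0, s2 = theta
    return (1 - t * pi0) ** 2 * p0 + s2 * (t - t ** 2 * pi0)

# synthetic noisy samples at K = 10 time points (illustrative truth values)
rng = np.random.default_rng(0)
tk = np.linspace(0.0, 1.0, 10)
truth = np.array([2.0, 0.4, 0.5])
data = model(truth, tk) + 0.01 * rng.standard_normal(tk.size)

# fit (50) over (p0, pi0, sigma^2), keeping p0 and sigma^2 non-negative
fit = least_squares(lambda th: model(th, tk) - data,
                    x0=[1.5, 0.2, 0.3],
                    bounds=([0.1, -np.inf, 0.0], np.inf))
```

The matrix-valued fits reported below follow the same pattern, with the path parameterized by P_0, Π_0 and σ².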
In this example, we have K = 10 noisy sample covariances from a 7-dimensional process. We apply three parametric models of covariance paths to fit these measurements using the fminsdp function in MATLAB. The first is the closed-form expression in (6), parameterized by P_0, Π_0 and σ; the optimal solution is denoted by P̂^omt_t. The second model is based on the differential equations (18) and (19), implemented with the ode45 function in MATLAB; the estimated path is denoted by P̂^info_t. Note again that this model is the special case of (34) and (35) with ε = −1. The more general model in (34) and (35) requires estimating an additional parameter ε. Since the differential equations are highly nonlinear in terms of ε, we find that the MATLAB algorithm only provides a locally optimal value of ε depending on its initial value. In order to obtain a reliable covariance path for understanding the rotation of energy among the variables, we apply the rotating-system based covariance path in (46) and (47) to fit the measurements, setting ε = 20 as used in [1]. The estimated path is denoted by P̂^wls_t.
The discrete markers in Fig. 4 illustrate the noisy sample covariances of 6 entries of the covariance matrices. The blue, green and red plots are the estimated paths P̂^omt_t, P̂^info_t and P̂^wls_t, respectively. The normalized squared error corresponding to P̂^wls_t is much lower than those of the other two paths, because of its capability of tracking rotations in the covariance. Fig. 4:
These plots illustrate the noisy sample covariances P̃_{t_k} and the fitting results of the three covariance paths based on a rsfMRI dataset.
VI. DISCUSSION AND CONCLUSION
In this paper, we have investigated three types of covariance paths obtained from different regularizations of linear stochastic systems. The first covariance path is given by the solution to the Schrödinger bridge problem for multivariate Gaussians and is related to optimal mass transport. The second type of covariance path is based on a generalization of the Fisher-Rao metric to systems with a stochastic input term. The third type of covariance path is obtained using weighted-least-squares regularization of the system matrices. It is interesting to remark that the Fisher-Rao metric can be viewed as a weighted-mass-transport cost, and the corresponding covariance path is a special case of the weighted-least-squares based solutions. The main contributions of this work include the differential-equation formulations for the last two types of covariance paths and the closed-form expressions for the scalar-valued case.
The general theme of this paper is closely related to the work in [12], [18]–[22], which all focused on investigating smooth paths connecting positive definite matrices. The proposed approach is based on different regularization functions of the system matrices, which distinguishes this paper from earlier work. Finally, we remark that a main motivation of this paper is a neuroimaging application on analyzing functional brain networks using resting-state functional MRI data. The proposed covariance paths with rotating eigenspace may provide a useful tool to understand the oscillations and directed connections among brain networks, which will be further explored in our future work.

REFERENCES
[1] L. Ning, "Regularization of covariance matrices on Riemannian manifolds using linear systems," under review, 2018.
[2] C. Villani, Topics in Optimal Transportation. Amer. Math. Soc., 2003.
[3] S. Rachev and L. Rüschendorf, Mass Transportation Problems. Vol. I and II. Probability and its Applications. Springer, New York, 1998.
[4] M. Knott and C. S. Smith, "On the optimal mapping of distributions," Journal of Optimization Theory and Applications, vol. 43, no. 1, pp. 39–49, May 1984.
[5] A. Takatsu, "On Wasserstein geometry of the space of Gaussian measures," ArXiv e-prints, Jan. 2008.
[6] E. Schrödinger, "Über die Umkehrung der Naturgesetze," Sitzungsberichte Preuss. Akad. Wiss. Berlin. Phys. Math., pp. 144–153, 1931.
[7] C. Léonard, "A survey of the Schrödinger problem and some of its connections with optimal transport," ArXiv e-prints, Aug. 2013.
[8] Y. Chen, T. T. Georgiou, and M. Pavon, "On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint," Journal of Optimization Theory and Applications, vol. 169, no. 2, pp. 671–691, May 2016.
[9] C. Rao, "Information and the accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, pp. 81–91, 1945.
[10] S.-I. Amari and H. Nagaoka, Methods of Information Geometry. Amer. Math. Soc., 2000.
[11] A. Uhlmann, "The metric of Bures and the geometric phase," in Quantum Groups and Related Topics: Proceedings of the First Max Born Symposium, R. Gielerak, J. Lukierski, and Z. Popowicz, Eds., 1992, p. 267.
[12] L. Ning, X. Jiang, and T. Georgiou, "On the geometry of covariance matrices," IEEE Signal Processing Letters, vol. 20, no. 8, pp. 787–790, Aug. 2013.
[13] R. Bhatia, T. Jain, and Y. Lim, "On the Bures–Wasserstein distance between positive definite matrices," ArXiv e-prints, Dec. 2017.
[14] Y. Chen, T. T. Georgiou, and M. Pavon, "Optimal steering of a linear stochastic system to a final probability distribution, part I," IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1158–1169, May 2016.
[15] ——, "Optimal steering of a linear stochastic system to a final probability distribution, part II," IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1170–1180, May 2016.
[16] M. Moakher, "Means and averaging in the group of rotations," SIAM J. Matrix Anal. Appl., vol. 24, no. 1, pp. 1–16, 2002.
[17] C. Lenglet, M. Rousson, R. Deriche, and O. Faugeras, "Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing," Journal of Mathematical Imaging and Vision, vol. 25, no. 3, pp. 423–444, Oct. 2006.
[18] X. Jiang, L. Ning, and T. T. Georgiou, "Distances and Riemannian metrics for multivariate spectral densities," IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1723–1735, July 2012.
[19] L. Ning, T. T. Georgiou, and A. Tannenbaum, "On matrix-valued Monge–Kantorovich optimal mass transport," IEEE Transactions on Automatic Control, vol. 60, no. 2, pp. 373–382, 2015.
[20] Y. Chen, T. Georgiou, and A. Tannenbaum, "Matrix optimal mass transport: A quantum mechanical approach," ArXiv e-prints, Oct. 2016.
[21] Y. Chen, T. T. Georgiou, L. Ning, and A. Tannenbaum, "Matricial Wasserstein-1 distance," IEEE Control Systems Letters, vol. 1, no. 1, pp. 14–19, July 2017.
[22] K. Yamamoto, Y. Chen, L. Ning, T. T. Georgiou, and A. Tannenbaum, "Regularization and interpolation of positive matrices," IEEE Transactions on Automatic Control, vol. PP, no. 99, pp. 1–1, 2017.