Regularization of covariance matrices on Riemannian manifolds using linear systems
Lipeng Ning
Abstract
We propose an approach that uses the state covariance of linear systems to track time-varying covariance matrices of non-stationary time series. Following concepts from Riemannian geometry, we investigate three types of covariance paths obtained by using different quadratic regularizations of system matrices. The first quadratic form induces the geodesics based on the Bures-Wasserstein metric from optimal mass transport theory and quantum mechanics. The second type of quadratic form leads to the geodesics based on the Fisher-Rao metric from information geometry. In the process, we introduce a fluid-mechanics interpretation of the Fisher-Rao metric for multivariate Gaussian distributions. A main contribution of this work is the introduction of a third type of covariance path, which is steered by linear system matrices with rotating eigenspace. We provide theoretical results on the existence and uniqueness of this type of covariance path. The three types of covariance paths are compared using two examples with synthetic data and real data based on functional magnetic resonance imaging, respectively.
Index Terms
System identification; Riemannian metric; optimal mass transport; information theory; optimal control; brain networks; functional MRI
I. INTRODUCTION
The problem of tracking changes and deformations of positive definite matrices is relevant to a wide spectrum of scientific applications, including computer vision, sensor arrays, and diffusion tensor imaging, see e.g. [1]–[7]. A key motivation behind the present work is a neuroscience application on understanding functional brain connectivity using resting-state
L. Ning is with the Department of Psychiatry, Brigham and Women’s Hospital, Harvard Medical School. Email:[email protected]
functional MRI (rsfMRI) data. Specifically, rsfMRI is a widely used neuroimaging modality which acquires a sequence of three-dimensional image volumes from the whole brain to understand brain functions and activities [8]. The standard approach for analyzing the interdependence between brain regions is to compute the correlation coefficient between the underlying rsfMRI time-series data. Typically, the whole-brain functional network is characterized by the covariance matrix of multivariate time-series data obtained from different brain regions [9], [10]. It has been recently observed that the functional connectivity fluctuates over time [11], [12], implying that the static covariance matrix based on the assumption of stationary time series may be too simplistic to capture the full extent of brain activities. Thus there is an urgent need for new computational tools for understanding dynamic functional brain networks using non-stationary rsfMRI data.

The aim of this work is to use control-theoretic approaches to develop models for time-varying covariance matrices. In particular, we consider a non-stationary, discrete-time random process as observations of a zero-mean time-dependent random variable x_t ∈ R^n. We assume that the temporal change of the probability distribution p_t(x) of x_t is much slower than the rate of measurements. Therefore, the instantaneous covariance matrix

P_t := E_{p_t}(x x') = ∫_{R^n} x x' p_t(x) dx,

can be estimated by using sample covariance matrices computed from measurements in short time windows. Assume that two covariance matrices P_0 and P_1 at t = 0, 1 are known. Then geodesics connecting P_0 and P_1 on the manifold of positive-definite matrices provide natural structures to model time-varying covariance matrices P_t on the time interval t ∈ [0, 1].
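The short-window estimation step described above can be sketched as follows; the window length and the toy nonstationary series here are illustrative assumptions, not part of the paper's experiments.

```python
import numpy as np

def sliding_window_covariances(x, width):
    """Sample covariance P_t estimated on short sliding windows.

    x : (T, n) array of observations of a zero-mean process.
    width : window length, assumed short relative to the rate at
            which the underlying distribution p_t changes.
    """
    T, n = x.shape
    covs = []
    for start in range(T - width + 1):
        w = x[start:start + width]            # (width, n) slice
        covs.append(w.T @ w / width)          # zero-mean sample covariance
    return np.array(covs)                     # (T - width + 1, n, n)

# toy nonstationary series: the standard deviation grows linearly with time
rng = np.random.default_rng(0)
T, n = 2000, 3
scales = np.linspace(1.0, 2.0, T)
x = rng.standard_normal((T, n)) * scales[:, None]
P_hat = sliding_window_covariances(x, width=200)
```

Each window yields one symmetric positive-semidefinite estimate of the instantaneous covariance; the sequence of estimates is the raw material that the geodesic models below interpolate.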
As generalizations of straight lines in Euclidean space, geodesics are paths of shortest distance connecting the start to the finish on a curved manifold. The path length is measured by a Riemannian metric, which is a quadratic form of the tangent matrix Ṗ_t. Several Riemannian metrics have been proposed to derive geodesics for covariance matrices. For instance, the geodesic based on the Fisher-Rao metric for Gaussian distributions [13]–[16] from the theory of information geometry is given by

P_t^info = P_0^{1/2} ( P_0^{-1/2} P_1 P_0^{-1/2} )^t P_0^{1/2}.   (1)

Another example is the Wasserstein-2 metric for Gaussian probability density functions [17]–[20], which induces the following geodesic

P_t^omt = ( (1-t) P_0^{1/2} + t P_1^{1/2} Û ) ( (1-t) P_0^{1/2} + t P_1^{1/2} Û )',   (2)

where P_t^{1/2} denotes the unique positive-definite square root of P_t, and Û = P_1^{-1/2} P_0^{-1/2} ( P_0^{1/2} P_1 P_0^{1/2} )^{1/2} is an orthogonal matrix. These Riemannian metrics will be explained in more detail in the following sections.

In this paper, we combine concepts from Riemannian geometry and linear systems to develop smooth covariance paths. Specifically, we consider a geodesic P_t as the state covariance of the following linear system

ẋ_t = A_t x_t,   (3)

with A_t ∈ R^{n×n}. We analyze the linear systems that steer P_t along different geodesics. Based on (3), the state covariance evolves according to

Ṗ_t = A_t P_t + P_t A_t'.   (4)

Thus, the matrix A_t can be considered as a non-commutative division of Ṗ_t by P_t scaled by a factor of 1/2. Clearly, if P_t is given, the mapping from A_t to Ṗ_t induced by (4) is not injective. In this paper, we consider the A_t that is the minimizer of a quadratic function f(A_t) such that (4) holds. The optimal value f(A_t) provides an alternative way to define Riemannian metrics for measuring the length of covariance paths.
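As a concrete illustration, the two geodesics (1) and (2) can be evaluated numerically. The sketch below uses an eigendecomposition-based matrix power; the function names are our own choices for this illustration.

```python
import numpy as np

def _sym_pow(S, p):
    # X^p for a symmetric positive-definite matrix via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

def fisher_rao_geodesic(P0, P1, t):
    # P_t^info = P0^{1/2} (P0^{-1/2} P1 P0^{-1/2})^t P0^{1/2}, eq. (1)
    R0, R0i = _sym_pow(P0, 0.5), _sym_pow(P0, -0.5)
    return R0 @ _sym_pow(R0i @ P1 @ R0i, t) @ R0

def bures_wasserstein_geodesic(P0, P1, t):
    # P_t^omt = ((1-t) P0^{1/2} + t P1^{1/2} U)((1-t) P0^{1/2} + t P1^{1/2} U)'
    R0, R1 = _sym_pow(P0, 0.5), _sym_pow(P1, 0.5)
    U = _sym_pow(P1, -0.5) @ _sym_pow(P0, -0.5) @ _sym_pow(R0 @ P1 @ R0, 0.5)
    S = (1 - t) * R0 + t * R1 @ U
    return S @ S.T

# example endpoints (random well-conditioned covariance matrices)
rng = np.random.default_rng(1)
G0, G1 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
P0, P1 = G0 @ G0.T + np.eye(3), G1 @ G1.T + np.eye(3)
P_half_info = fisher_rao_geodesic(P0, P1, 0.5)
P_half_omt = bures_wasserstein_geodesic(P0, P1, 0.5)
```

Both paths agree with the endpoints at t = 0 and t = 1 but generally pass through different intermediate covariance matrices.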
To this end, we consider covariance paths that are the solutions to

min_{P_t, A_t} { ∫_0^1 f(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 given }.   (5)

Note that the optimal system matrix A_t may not be symmetric, which could provide useful information to understand directed interactions among the underlying variables. In this work, we investigate the solution to (5) with three different types of quadratic forms f(A_t). The first two quadratic forms lead to the geodesic paths P_t^omt and P_t^info, respectively. A main contribution of this work is the introduction of a third family of geodesics, which are the state covariances of linear systems with rotating eigenspace.
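Equation (4) can be inverted in many ways: given P_t and Ṗ_t, any A_t with A_t P_t + P_t A_t' = Ṗ_t is admissible. A small sketch (assuming SciPy's Lyapunov solver) recovers the symmetric choice, illustrating why the choice of penalty f(A_t) in (5) matters.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
P = G @ G.T + np.eye(n)              # a positive-definite state covariance
A = rng.standard_normal((n, n))      # some system matrix, generally asymmetric
Pdot = A @ P + P @ A.T               # covariance velocity, eq. (4)

# The symmetric solution X of X P + P X = Pdot, i.e. one particular
# "non-commutative division" of Pdot by P (scaled by 1/2):
A_sym = solve_continuous_lyapunov(P, Pdot)
```

Both A and A_sym generate the same Ṗ_t, yet they differ as matrices; the quadratic regularizations studied below single out one representative from this family.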
This paper is organized as follows. In Section II, we revisit the OMT-based geodesics P_t^omt and introduce the corresponding quadratic form f(A_t). Section III focuses on the quadratic forms that lead to the Fisher-Rao-based geodesics P_t^info. We also introduce a fluid-mechanics interpretation of the Fisher-Rao metric, which provides a point of contact between OMT and information geometry. In Section IV, we investigate the optimal solutions to (5) corresponding to a family of quadratic functions f(A_t) which are weighted square norms of the symmetric and asymmetric parts of A_t. We provide the expression of the optimal solutions and an analysis of the existence and uniqueness of the solutions. In Section V, we compare the three types of covariance paths using two examples based on synthetic data and real data from rsfMRI, respectively. Section VI includes the discussion and conclusions.

For notation, S^n, S^n_+, S^n_{++} denote the sets of symmetric, positive semidefinite, and positive definite matrices of size n × n, respectively. Small boldface letters, e.g. x, v, represent column vectors. Capital letters, e.g. P, A, denote matrices. Regular small letters, e.g. w, h, are for scalars or scalar-valued functions.

II. MASS-TRANSPORT BASED COVARIANCE PATHS
A. On optimal mass transport
Let p_0(x) and p_1(x) denote two probability density functions on R^n. The Wasserstein-2 metric between the two, denoted by w_2(p_0, p_1), is defined by

w_2(p_0, p_1)^2 = inf_{m(x,y) ≥ 0} ∫_{R^n × R^n} ||x − y||^2 m(x, y) dx dy,
s.t. ∫_{R^n} m(x, y) dx = p_1(y),  ∫_{R^n} m(x, y) dy = p_0(x),

where m(x, y) represents a probability density function on the joint space R^n × R^n with the marginals specified by p_0 and p_1, [17], [18]. A fluid-mechanics interpretation of w_2(p_0, p_1) was introduced in [25], [26], which provides a Riemannian structure on the manifold of probability density functions. To introduce this formulation, we consider the following continuity equation

∂p_t(x)/∂t + ∇_x · ( p_t(x) v_t(x) ) = 0,   (6)

where v_t(x) represents a time-varying velocity field. Then, w_2(p_0, p_1) is given by [26]

w_2(p_0, p_1)^2 = inf_{p_t, v_t} { ∫_0^1 E_{p_t}( ||v_t(x)||^2 ) dt | ṗ_t + ∇_x · (p_t v_t) = 0 }.   (7)
The optimal solution p_t(x) is the geodesic on the manifold of probability density functions that connects the endpoints p_0(x) and p_1(x).

B. The Bures-Wasserstein metric
In the special case when p_0(x) and p_1(x) are zero-mean Gaussian probability density functions with

p_i(x) = det(2π P_i)^{-1/2} e^{−x' P_i^{-1} x / 2}, for i = 0, 1,

the corresponding geodesic p_t(x) at any fixed time t is also zero-mean Gaussian with the corresponding covariance matrix given by P_t^omt [19], [20]. The geodesic distance is equal to the Wasserstein-2 metric w_2(p_0, p_1), which also induces the following distance measure on the covariance matrices

w_2(p_0, p_1) = d_w(P_0, P_1) := || P_0^{1/2} − P_1^{1/2} Û ||_F,   (8)

where || · ||_F denotes the Frobenius norm of a matrix.

The covariance path P_t^omt is also equal to the geodesic induced by the Bures metric from quantum mechanics [23], [24] on the manifold of positive definite matrices. In particular, let P ∈ S^n_{++} and ∆ ∈ S^n, which represents a tangent vector at P. The Bures metric takes the form

g_{P,Bures}(∆) = (1/2) tr(∆ M),

where M ∈ S^n and P M + M P = ∆, see e.g. [21], [22]. The trajectory P_t^omt in (2) is the shortest path connecting P_0 and P_1. Thus, it satisfies

P_t^omt = argmin_{P_t} ∫_0^1 g_{P_t,Bures}(Ṗ_t) dt,   (9)

with a given pair of endpoints P_0, P_1 ∈ S^n_{++}, see e.g. [23]. The geodesic distance, also known as the Bures distance, is equal to d_w(P_0, P_1).

The Bures metric was originally proposed in quantum mechanics to compare density matrices, which are positive definite matrices whose traces are equal to one. Density matrices are non-commutative analogues of probability vectors. In the commutative case, when P is restricted to be a diagonal matrix whose diagonal entries form a probability vector, the Bures metric g_{P,Bures}(∆) is equal to the Fisher information metric, which will be discussed in Section III.
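A minimal numerical sketch of d_w follows; the trace identity used as a cross-check, d_w^2 = tr(P_0) + tr(P_1) − 2 tr((P_0^{1/2} P_1 P_0^{1/2})^{1/2}), is the standard closed form of the Bures-Wasserstein distance between zero-mean Gaussians.

```python
import numpy as np

def _sqrtm_sym(S):
    # principal square root of a symmetric positive-definite matrix
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def bures_wasserstein_distance(P0, P1):
    # d_w(P0, P1) = || P0^{1/2} - P1^{1/2} U ||_F, eq. (8)
    R0, R1 = _sqrtm_sym(P0), _sqrtm_sym(P1)
    mid = _sqrtm_sym(R0 @ P1 @ R0)
    U = np.linalg.inv(R1) @ np.linalg.inv(R0) @ mid   # the orthogonal factor
    return np.linalg.norm(R0 - R1 @ U, "fro")

rng = np.random.default_rng(2)
G0, G1 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
P0, P1 = G0 @ G0.T + np.eye(3), G1 @ G1.T + np.eye(3)
d = bures_wasserstein_distance(P0, P1)
# equivalent trace form of the same distance
d_trace = np.sqrt(np.trace(P0) + np.trace(P1)
                  - 2 * np.trace(_sqrtm_sym(_sqrtm_sym(P0) @ P1 @ _sqrtm_sym(P0))))
```

The two expressions agree because Û is orthogonal, so tr(Û' P_1 Û) = tr(P_1) and the cross term reduces to tr((P_0^{1/2} P_1 P_0^{1/2})^{1/2}).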
C. Bures-Wasserstein-based linear systems
Here, we present an alternative expression of (9) using the linear system in (3). For this purpose, we define

f_{P_t}(A_t) := tr( A_t P_t A_t' ),   (10)

which is equal to E_{p_t}( ||ẋ_t||^2 ) if ẋ_t is given by (3). The following proposition relates the geodesic P_t^omt to a linear system.

Proposition 1.
Given P_0, P_1 ∈ S^n_{++}. Then P_t^omt given by (2) is the unique minimizer of

min_{P_t, A_t} { ∫_0^1 f_{P_t}(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 specified },   (11)

with f_{P_t}(A_t) defined by (10), and the optimal A_t is equal to

A_t^omt = Q ( tQ − I )^{-1},   (12)

where Q = I − P_0^{-1/2} ( P_0^{1/2} P_1 P_0^{1/2} )^{1/2} P_0^{-1/2}. The optimal value of the objective function in (11) is equal to d_w(P_0, P_1)^2.

Proof. The optimization problem (11) is a special form of (7) with p_0(x), p_1(x) being two zero-mean Gaussian probability density functions and the additional constraint that the velocity field is linear, v_t(x) = A_t x. Thus, d_w(P_0, P_1)^2 is a lower bound of (11). Therefore, we only need to show that A_t^omt satisfies the constraint

Ṗ_t^omt = A_t^omt P_t^omt + P_t^omt ( A_t^omt )'   (13)

and attains the optimal value. First, we rewrite (2) as P_t^omt = ( I − tQ ) P_0 ( I − tQ ). Taking the derivative of P_t^omt gives

Ṗ_t^omt = −Q P_0 ( I − tQ ) − ( I − tQ ) P_0 Q = Q ( tQ − I )^{-1} P_t^omt + P_t^omt ( tQ − I )^{-1} Q.

Since all the eigenvalues of Q are smaller than 1, the matrix tQ − I is invertible for all t ∈ [0, 1].
Therefore (13) holds. Moreover,

tr( A_t^omt P_t^omt ( A_t^omt )' ) = tr( Q P_0 Q ) = || P_1^{1/2} Û − P_0^{1/2} ||_F^2 = d_w(P_0, P_1)^2,

which completes the proof.

We note that the matrix A_t^omt is symmetric. Moreover, the matrices A_{t_1}^omt and A_{t_2}^omt commute for any t_1, t_2. Therefore the eigenspace of A_t^omt is fixed on the interval t ∈ [0, 1].

The result of Proposition 1 can be further extended to obtain the optimal solutions corresponding to the following objective function

f_{P_t}^W(A_t) = E_{p_t}( ||ẋ_t||_W^2 ) = tr( W A_t P_t A_t' ),   (14)

where ẋ_t = A_t x_t, W ∈ S^n_{++} and ||ẋ_t||_W^2 := ẋ_t' W ẋ_t. By a change of variables, we define

P_{W,t} := W^{1/2} P_t W^{1/2},   (15)
A_{W,t} := W^{1/2} A_t W^{-1/2}.   (16)

Thus, f_{P_t}^W(A_t) = tr( A_{W,t} P_{W,t} A_{W,t}' ) = f_{P_{W,t}}(A_{W,t}). Moreover, if (4) holds, then Ṗ_{W,t} = A_{W,t} P_{W,t} + P_{W,t} A_{W,t}'. Therefore, if A_t^omt is the optimal system matrix that steers P_{W,0} to P_{W,1} with respect to f_P(A) given by Proposition 1, then W^{-1/2} A_t^omt W^{1/2} is the optimal solution with respect to f_P^W(A). In the following section, we investigate a further extension of f_P^W(·) by using a time-dependent weighting matrix, which provides a point of contact between OMT and information geometry.

III. INFORMATION-GEOMETRY BASED COVARIANCE PATHS
A. The Fisher-Rao metric
For two probability density functions p(x) and p̂(x) on R^n, the Kullback-Leibler (KL) divergence

d_KL( p || p̂ ) := ∫_{R^n} p log( p / p̂ ) dx   (17)

represents a well-established notion of distance between the two [27], [28]. If p̂ = p + δ with δ representing a small perturbation, then the quadratic term of the Taylor expansion of d_KL( p || p + δ ) in terms of δ is the Fisher information metric

g_{p,Fisher}(δ) = ∫ ( δ^2 / p ) dx.

For a probability distribution p(x, θ) parameterized by a vector θ, the corresponding metric is referred to as the Fisher-Rao metric and is given by

g_{θ,Rao}(δ_θ) = δ_θ' E[ ( ∂ log p / ∂θ )( ∂ log p / ∂θ )' ] δ_θ.

If p(x) is a zero-mean Gaussian probability density function parameterized by a covariance matrix P, then the metric becomes

g_{P,Rao}(∆) = tr( P^{-1} ∆ P^{-1} ∆ ).

Given P_0, P_1 ∈ S^n_{++}, the geodesic in (1) is equal to the solution of

P_t^info = argmin_{P_t} ∫_0^1 g_{P_t,Rao}(Ṗ_t) dt,   (18)

which is the shortest path connecting P_0 and P_1. The corresponding path length is equal to

d_info(P_0, P_1) = || log( P_0^{-1/2} P_1 P_0^{-1/2} ) ||_F,   (19)

see e.g. [29, Theorem 6.1.6].

B. Fisher-Rao metric based linear systems
Following (5), we will define a positive quadratic form so that the optimal state covariance is equal to P_t^info. One choice for the quadratic form is given by

f_{info,P}^{(1)}(A) := g_{P,Rao}( A P + P A' ) = 2 tr( A A + P^{-1} A P A' ),   (20)

which satisfies f_{info,P}^{(1)}(A) ≥ 0 for all A ∈ R^{n×n} and P ∈ S^n_{++}.
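The identity in (20) is easy to confirm numerically; the quick check below (our own illustration) expands g_{P,Rao}(AP + PA') and compares it with the stated closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
G = rng.standard_normal((n, n))
P = G @ G.T + np.eye(n)                        # P in S^n_++
A = rng.standard_normal((n, n))                # an arbitrary system matrix
Pinv = np.linalg.inv(P)

Delta = A @ P + P @ A.T                        # tangent generated by A, eq. (4)
g_rao = np.trace(Pinv @ Delta @ Pinv @ Delta)  # g_{P,Rao}(Delta)
f1 = 2 * np.trace(A @ A + Pinv @ A @ P @ A.T)  # right-hand side of (20)
```

The agreement follows from expanding tr((P^{-1}AP + A')^2) and using the cyclic invariance of the trace.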
Proposition 2.
Given P_0, P_1 ∈ S^n_{++}, P_t^info from (1) is the unique minimizer of

min_{P_t, A_t} { ∫_0^1 f_{info,P_t}^{(1)}(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 specified },   (21)

with f_{info,P_t}^{(1)}(A_t) defined by (20), and the optimal A_t is the constant matrix

A^info := (1/2) P_0^{1/2} log( P_0^{-1/2} P_1 P_0^{-1/2} ) P_0^{-1/2}.   (22)

Moreover, the optimal value of the objective function is equal to d_info(P_0, P_1)^2.

Proof. Since f_{info,P_t}^{(1)}(A_t) = g_{P_t,Rao}(Ṗ_t) whenever (4) holds, problem (21) is equivalent to (18), so it suffices to show that the pair (P_t^info, A^info) is feasible. We rewrite (1) as

P_t^info = P_0^{1/2} ( P_0^{-1/2} P_1 P_0^{-1/2} )^t P_0^{1/2}
 = P_0^{1/2} e^{(1/2) log( P_0^{-1/2} P_1 P_0^{-1/2} ) t} e^{(1/2) log( P_0^{-1/2} P_1 P_0^{-1/2} ) t} P_0^{1/2}
 = e^{A^info t} P_0 e^{A^info' t},

where the last equation is obtained using e^{X Y X^{-1}} = X e^Y X^{-1}. Taking the derivative of P_t^info gives

Ṗ_t^info = A^info P_t^info + P_t^info A^info',   (23)

which completes the proof.

Note that the metric d_info(·) is invariant with respect to congruence transformations, i.e. d_info(P_0, P_1) = d_info(T P_0 T', T P_1 T') for any invertible matrix T. If A^info is the optimal solution of (21), then T A^info T^{-1} is the optimal solution corresponding to the pair T P_0 T', T P_1 T'.

C. A weighted-mass-transport view
Since the map from A_t to Ṗ_t in (4) is not injective, the quadratic form of A_t that leads to P_t^info is not unique. Here, we provide an alternative form, which reveals an interesting relation between OMT and the Fisher-Rao metric. For this purpose, we define the following weighted square norm

f_{info,P_t}^{(2)}(A_t) := 4 E_{p_t}( ||ẋ_t||_{P_t^{-1}}^2 ) = 4 tr( P_t^{-1} A_t P_t A_t' ),   (24)

which is a special form of (14) with the weighting matrix W = P_t^{-1}, scaled by a factor of 4. The following lemma draws the relation between f_{info,P}^{(1)}(A) and f_{info,P}^{(2)}(A).
Lemma 1.
Let f_{info,P}^{(1)}(·) and f_{info,P}^{(2)}(·) be defined in (20) and (24), respectively. Then,

f_{info,P}^{(2)}(A) ≥ f_{info,P}^{(1)}(A), ∀ P ∈ S^n_{++}, A ∈ R^{n×n}.

Proof.
Taking the difference,

f_{info,P}^{(2)}(A) − f_{info,P}^{(1)}(A) = 2 tr( P^{-1} A P A' − A A )
 = tr( ( P^{-1/2} A P^{1/2} − P^{1/2} A' P^{-1/2} )( P^{-1/2} A P^{1/2} − P^{1/2} A' P^{-1/2} )' ),

which is non-negative.

We note that if P^{-1/2} A P^{1/2} is symmetric, then f_{info,P}^{(2)}(A) is equal to f_{info,P}^{(1)}(A). This gives rise to the following proposition, in parallel to Proposition 2.

Proposition 3.
Given P_0, P_1 ∈ S^n_{++}. Then P_t^info and A^info given by (1) and (22), respectively, are the unique pair of minimizers of

min_{P_t, A_t} { ∫_0^1 f_{info,P_t}^{(2)}(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 specified },   (25)

with f_{info,P_t}^{(2)}(A_t) defined by (24).

Proof. From Lemma 1,

∫_0^1 f_{info,P_t}^{(2)}(A_t) dt ≥ ∫_0^1 f_{info,P_t}^{(1)}(A_t) dt ≥ d_info(P_0, P_1)^2,   (26)

for any feasible pair P_t and A_t. It is straightforward to verify that the above inequalities become equalities with the given P_t^info and A^info. Therefore, the proposition holds.

D. A fluid-mechanics interpretation
Note that f_{info,P_t}^{(2)}(A_t) is a special case of f_{P_t}^W(A_t) in (14) with W = 4 P_t^{-1}. It is equal to

f_{info,P_t}^{(2)}(A_t) = 4 E_{p_t}( ||v_t(x)||_{P_t^{-1}}^2 ),

with the velocity field given by v_t(x) = A_t x. Thus if the initial distribution p_0(x) is Gaussian, so is p_t(x) for all t ≥ 0. Proposition 3 implies that among all the trajectories generated by linear velocity fields that connect two Gaussian probability density functions p_0 and p_1, the lowest weighted-mass-transport cost is obtained by Gaussian density functions whose covariance matrices are equal to P_t^info. But this optimal trajectory is obtained under the linearity constraint on the velocity fields. Next, we remove this constraint and show that this trajectory is still optimal.

Theorem 1.
Given two zero-mean Gaussian probability density functions p_0(x), p_1(x) on R^n with covariance matrices P_0, P_1 ∈ S^n_{++}, respectively. Define p_t^info(x), v_t^info(x) as the minimizer of

inf_{p_t, v_t} { ∫_0^1 4 E_{p_t}( ||v_t(x)||_{P_t^{-1}}^2 ) dt | ṗ_t + ∇_x · (p_t v_t) = 0 }.   (27)

Then p_t^info(x) is zero-mean Gaussian whose covariance matrix is equal to P_t^info from (1), and v_t^info(x) = A^info x almost surely with A^info given by (22). Moreover, the optimal value is equal to d_info(P_0, P_1)^2.

Proof. First, we define

[ V_t  C_t ; C_t'  P_t ] := E_{p_t}( [ v_t(x) ; x ] [ v_t(x)'  x' ] ).   (28)

Then, applying integration by parts, we obtain

Ṗ_t = ∫_{R^n} x x' ṗ_t(x) dx = ∫_{R^n} − x x' ∇_x · ( p_t(x) v_t(x) ) dx = C_t + C_t'.   (29)

Therefore, the following optimization problem

min_{C_t, V_t, P_t} { ∫_0^1 4 tr( P_t^{-1} V_t ) dt | [ V_t  C_t ; C_t'  P_t ] ∈ S^{2n}_+, Ṗ_t = C_t + C_t' }   (30)

provides a lower bound of (27) because the higher-order moments of the probability density functions are not considered. On the other hand, (25) provides an upper bound of (27) because the velocity field is constrained to satisfy the linear system. Next, we show that the two bounds coincide.

Note that V_t − C_t P_t^{-1} C_t' ∈ S^n_+. Thus the optimal V_t of (30) satisfies V_t = C_t P_t^{-1} C_t'.
Therefore, (30) is equal to

min_{C_t, P_t ∈ S^n_+} { ∫_0^1 4 tr( P_t^{-1} C_t P_t^{-1} C_t' ) dt | Ṗ_t = C_t + C_t' }.   (31)

Note that the constraint P_t ∈ S^n_+ is automatically satisfied due to the inverse-barrier objective function. Therefore, we drop the constraint P_t ∈ S^n_+ in the following analysis.

Next, we consider the optimization problem (31) as an optimal control problem with C_t being a matrix-valued control. Then we derive the optimal solution using Pontryagin's minimum principle. A necessary condition for the optimal solution is that it must annihilate the variation of the Hamiltonian

h( C_t, P_t, Π_t ) := 4 tr( P_t^{-1} C_t P_t^{-1} C_t' ) + tr( Π_t ( C_t + C_t' ) )

with respect to the control C_t. Here, Π_t is a symmetric matrix representing the Lagrange multiplier, i.e. the co-state. Setting the partial derivative of h(·) with respect to C_t to zero, we obtain

C_t = −(1/4) P_t Π_t P_t,   (32)

which shows in particular that the optimal C_t is symmetric. Therefore C_t = (1/2) Ṗ_t and the objective function in (30) becomes tr( P_t^{-1} Ṗ_t P_t^{-1} Ṗ_t ) = g_{P_t,Rao}(Ṗ_t). Thus, the theorem directly follows from (18). For completeness, we finish the proof based on the Hamiltonian h(·) below.

The co-state satisfies Π̇_t = −∂h/∂P_t, which gives rise to

Π̇_t = 8 P_t^{-1} C_t P_t^{-1} C_t' P_t^{-1}.   (33)

Then, substituting (32) into (29) and (33), respectively, we obtain

Ṗ_t = −(1/2) P_t Π_t P_t,   (34)
Π̇_t = (1/2) Π_t P_t Π_t.
Note that Ṗ_t Π_t + P_t Π̇_t = 0 for all t. Hence P_t Π_t is constant. We set

−(1/4) P_t Π_t = A.   (35)

Thus (32) becomes C_t = A P_t = P_t A'. Multiplying both sides by P_t^{-1/2} gives P_t^{-1/2} C_t P_t^{-1/2} = P_t^{-1/2} A P_t^{1/2}, which is symmetric for all t. Substituting (35) into (34) gives

Ṗ_t = A P_t + P_t A'.

Therefore, P_t = e^{At} P_0 e^{A't}. Multiplying both sides by P_0^{-1/2} gives

P_0^{-1/2} P_t P_0^{-1/2} = P_0^{-1/2} e^{At} P_0 e^{A't} P_0^{-1/2} = e^{2 P_0^{-1/2} A P_0^{1/2} t},

where the last equality uses the symmetry of P_0^{-1/2} A P_0^{1/2}. Setting t = 1, we solve

A = (1/2) P_0^{1/2} log( P_0^{-1/2} P_1 P_0^{-1/2} ) P_0^{-1/2},

which is equal to A^info. Furthermore, from (23), the corresponding P_t is equal to P_t^info. Then the optimal covariance matrix in (28) is singular and has rank n. Thus the optimal velocity field v_t(x) is equal to A^info x almost surely, implying that the corresponding p_t(x) is Gaussian. Therefore, the theorem is proved.

Note that the system matrix A^info is constant. Thus both A^info and A_t^omt have fixed eigenspaces. Next, we introduce a different quadratic form of A_t which leads to system matrices with rotating eigenspace.

IV. ROTATION-LINEAR-SYSTEM BASED COVARIANCE PATHS
A. Weighted-least-squares cost functions
Note that if X is an asymmetric matrix, i.e. X' = −X, then e^{Xt} is a rotation matrix. Consequently, if the system matrix A is asymmetric, then the state covariance matrix has a rotating eigenspace. In this regard, we decompose

A = A_s + A_a,

where A_s := (1/2)(A + A') and A_a := (1/2)(A − A') are the symmetric and asymmetric parts of A, respectively. Then, we define the following weighted-least-squares (WLS) function

f_ε(A) := ||A_s||^2 + ε ||A_a||^2,   (36)

where the scalar ε > 0 weighs the relative significance of the symmetric and asymmetric parts of A. If A satisfies Ṗ = A P + P A' for a given pair Ṗ and P, then f_ε(A) is considered as a quadratic form of the non-commutative division of Ṗ by P, similar to the Fisher-Rao metric. In fact, for scalar-valued covariances, f_ε(A) is proportional to the Fisher-Rao metric.

Following (5), we consider the optimal solution to

min_{P_t, A_t} { ∫_0^1 f_ε(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 specified },   (37)

for a given pair of endpoints P_0, P_1 ∈ S^n_{++} and a scalar ε > 0.

B. Optimal covariance paths
To introduce the solution to (37), we define

T_{ε,t}(A) := e^{(1+ε) A_a t} e^{( A_s + ε A_a' ) t}.   (38)

The next lemma shows that T_{ε,t}(·) is the state transition matrix of a linear time-varying system.

Lemma 2.
Given A ∈ R^{n×n} and a scalar ε. Define A_{ε,t} := e^{(1+ε) A_a t} A e^{(1+ε) A_a' t}. Then

Ṫ_{ε,t}(A) = A_{ε,t} T_{ε,t}(A).   (39)

Proof.

Ṫ_{ε,t}(A) = e^{(1+ε) A_a t} ( (1+ε) A_a ) e^{( A_s + ε A_a' ) t} + e^{(1+ε) A_a t} ( A_s + ε A_a' ) e^{( A_s + ε A_a' ) t}
 = e^{(1+ε) A_a t} A e^{( A_s + ε A_a' ) t}
 = A_{ε,t} T_{ε,t}(A),

where the second equality uses A_a' = −A_a, so that (1+ε) A_a + A_s + ε A_a' = A_s + A_a = A.
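Lemma 2 can be checked by finite differences; the sketch below (assuming SciPy's matrix exponential) compares a central-difference estimate of Ṫ_{ε,t}(A) with A_{ε,t} T_{ε,t}(A).

```python
import numpy as np
from scipy.linalg import expm

def T_eps(A, eps, t):
    # T_{eps,t}(A) = e^{(1+eps) A_a t} e^{(A_s + eps A_a') t}, eq. (38)
    As, Aa = (A + A.T) / 2, (A - A.T) / 2
    return expm((1 + eps) * Aa * t) @ expm((As + eps * Aa.T) * t)

def A_eps(A, eps, t):
    # A_{eps,t} = e^{(1+eps) A_a t} A e^{(1+eps) A_a' t}
    Aa = (A - A.T) / 2
    return expm((1 + eps) * Aa * t) @ A @ expm((1 + eps) * Aa.T * t)

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
eps, t, h = 0.7, 0.4, 1e-6
# central finite difference of T_{eps,t}(A) with respect to t
dT = (T_eps(A, eps, t + h) - T_eps(A, eps, t - h)) / (2 * h)
rhs = A_eps(A, eps, t) @ T_eps(A, eps, t)     # right-hand side of (39)
```

The residual between the two sides is at the level of the finite-difference error, consistent with (39).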
The following corollary is a direct result of Lemma 2.
Corollary 1.
Given P_0 ∈ S^n_{++}, A ∈ R^{n×n} and a scalar ε. Define P_{ε,t} := T_{ε,t}(A) P_0 T_{ε,t}(A)'. Then the following equation holds:

Ṗ_{ε,t} = A_{ε,t} P_{ε,t} + P_{ε,t} A_{ε,t}'.   (40)

Next, we are ready to present the solution to (37).

Proposition 4.
Given P_0, P_1 ∈ S^n_{++} and a scalar ε > 0. If there exists a Π_0 ∈ S^n such that

P_{ε,t}^wls = T_{ε,t}(A_0) P_0 T_{ε,t}(A_0)',   (41)

satisfies P_{ε,1}^wls = P_1 with

A_0 = −(1/2) ( P_0 Π_0 + Π_0 P_0 ) − (1/(2ε)) ( Π_0 P_0 − P_0 Π_0 ),   (42)

and T_{ε,t}(·) given by (38), then P_{ε,t}^wls is a minimizer of (37). The corresponding optimal A_t is equal to

A_{ε,t}^wls = e^{(1+ε) A_a t} A_0 e^{(1+ε) A_a' t},   (43)

where A_a denotes the asymmetric part of A_0.

Proof.
Consider (37) as an optimal control problem with A_t being a matrix-valued control. Then, the Hamiltonian is

h( A_t, P_t, Π_t ) = (1/4) ||A_t + A_t'||^2 + (ε/4) ||A_t − A_t'||^2 + tr( Π_t ( A_t P_t + P_t A_t' ) )
 = ((1+ε)/2) tr( A_t A_t' ) + ((1−ε)/2) tr( A_t A_t ) + tr( Π_t ( A_t P_t + P_t A_t' ) ).

The co-state satisfies Π̇_t = −∂h/∂P_t, which gives rise to

Π̇_t = −Π_t A_t − A_t' Π_t.   (44)
Moreover, the partial derivative of h(·) with respect to the control A_t vanishes, which leads to

( A_t + A_t' ) + ε ( A_t − A_t' ) + 2 Π_t P_t = 0.   (45)

Solving (45) for A_t gives

A_t = −(1/2) ( P_t Π_t + Π_t P_t ) − (1/(2ε)) ( Π_t P_t − P_t Π_t ).   (46)

Then, substituting (46) into (4) and (44), respectively, gives

Ṗ_t = ((1−ε)/ε) P_t Π_t P_t − ((1+ε)/(2ε)) ( Π_t P_t^2 + P_t^2 Π_t ),
Π̇_t = ((ε−1)/ε) Π_t P_t Π_t + ((1+ε)/(2ε)) ( Π_t^2 P_t + P_t Π_t^2 ).

Next, it can be verified that d(Π_t P_t)/dt − d(P_t Π_t)/dt = 0. Thus, the asymmetric part of A_t, which is equal to

(A_t)_a = −(1/(2ε)) ( Π_t P_t − P_t Π_t ),

is constant; we denote it by A_a. Taking the derivative of the symmetric part (A_t)_s = (1/2)( A_t + A_t' ) = −(1/2)( P_t Π_t + Π_t P_t ) gives

d(A_t)_s/dt = −(1/2) ( d(P_t Π_t)/dt + d(Π_t P_t)/dt )
 = −((1+ε)/(2ε)) ( P_t Π_t^2 P_t − Π_t P_t^2 Π_t )
 = (1+ε) ( A_a (A_t)_s + (A_t)_s A_a' ).

Since A_a is constant, the solution to the above equation is

(A_t)_s = e^{(1+ε) A_a t} A_s e^{(1+ε) A_a' t}.

Therefore, the optimal A_t has the form

A_t = e^{(1+ε) A_a t} A_s e^{(1+ε) A_a' t} + A_a = e^{(1+ε) A_a t} A_0 e^{(1+ε) A_a' t},

with A_0 = A_a + A_s being the initial value of A_t. Next, we define a new variable

P̂_t := e^{(1+ε) A_a' t} P_t e^{(1+ε) A_a t},   (47)

whose derivative is equal to

dP̂_t/dt = e^{(1+ε) A_a' t} ( A_t P_t + P_t A_t' ) e^{(1+ε) A_a t} + (1+ε) A_a' P̂_t + P̂_t (1+ε) A_a
 = ( A_s + ε A_a' ) P̂_t + P̂_t ( A_s + ε A_a ).

Thus the solution for P̂_t is

P̂_t = e^{( A_s + ε A_a' ) t} P_0 e^{( A_s + ε A_a ) t}.
Substituting this solution into (47), we obtain that the optimal P_t has the form

P_t = e^{(1+ε) A_a t} e^{( A_s + ε A_a' ) t} P_0 e^{( A_s + ε A_a ) t} e^{(1+ε) A_a' t}.

In a similar way, we define Π̂_t := e^{(1+ε) A_a' t} Π_t e^{(1+ε) A_a t}. Then

dΠ̂_t/dt = ( −A_s + ε A_a' ) Π̂_t + Π̂_t ( −A_s + ε A_a ),

whose solution gives

Π_t = e^{(1+ε) A_a t} e^{( −A_s + ε A_a' ) t} Π_0 e^{( −A_s + ε A_a ) t} e^{(1+ε) A_a' t}.   (48)

If (42) holds, then A_s + ε A_a' = −P_0 Π_0. It is straightforward to show that (46) holds for all t > 0 with the provided expressions for A_t, P_t and Π_t. In this case, f_ε(A_t) = ||A_s||^2 + ε ||A_a||^2 for all t, since (A_t)_s is an orthogonal conjugation of A_s and (A_t)_a = A_a. Therefore, if A_0 is given by (42), then the proposed trajectories A_{ε,t}^wls and P_{ε,t}^wls form a local minimizer, which completes the proof.

Next, we provide an upper bound on the optimal value of (37) for all ε > 0. For this purpose, we define

Â = log( P_0^{-1/2} ( P_0^{1/2} P_1 P_0^{1/2} )^{1/2} P_0^{-1/2} ),

which is symmetric. Moreover, P_t = e^{Ât} P_0 e^{Ât} is a feasible solution to (37). Therefore, the following proposition holds.
Proposition 5.
Given P_0, P_1 ∈ S^n_{++} and a scalar ε > 0. Then the optimal value of (37) is not larger than

|| log( P_0^{-1/2} ( P_0^{1/2} P_1 P_0^{1/2} )^{1/2} P_0^{-1/2} ) ||_F^2.

C. On the existence and uniqueness of rotation-system based paths

We will analyze the existence of a covariance path of the form (41) that connects two given P_0, P_1 ∈ S^n_{++}. To simplify notation, we denote α = (1+ε)/(2ε). Furthermore, we remove the constraint ε > 0 and consider all ε ≠ 0. Based on the new parameter α, the closed-form equations (41), (42) and (43) define the following mapping h_{α,P_0}(Π) : S^n → S^n_{++}:

Π ↦ e^{α ( P_0 Π − Π P_0 )} e^{−P_0 Π} P_0 e^{−Π P_0} e^{α ( Π P_0 − P_0 Π )}.   (49)

We will analyze the existence of Π that satisfies

h_{α,P_0}(Π) = P_1,   (50)

for a given α. If there exists a Π_0 ∈ S^n that solves (50) with α ≠ 1/2, then there exists a covariance path of the form (41) with ε = 1/(2α − 1).

In the special case when α = 0, (50) becomes e^{−P_0 Π} P_0 e^{−Π P_0} = P_1, which is equivalent to

exp( −2 P_0^{1/2} Π P_0^{1/2} ) = P_0^{-1/2} P_1 P_0^{-1/2}.

Clearly, it has a unique solution given by

Π_0 = −(1/2) P_0^{-1/2} log( P_0^{-1/2} P_1 P_0^{-1/2} ) P_0^{-1/2}.   (51)

It is interesting to note that the corresponding covariance path is equal to the Fisher-Rao-based geodesic P_t^info given by (1).

Because the mapping h_{α,P_0} is continuous in α and Π, it is expected that (50) also has a solution if α is sufficiently close to zero. Specifically, assume that there exists a solution Π to (50) for a specific α, e.g. α = 0. Then, we can derive the following expression from (50):

Π = −(1/2) P_0^{-1/2} log( P_0^{-1/2} e^{α ( Π P_0 − P_0 Π )} P_1 e^{α ( P_0 Π − Π P_0 )} P_0^{-1/2} ) P_0^{-1/2}.   (52)

Next, we will apply perturbation analysis to the above equation to understand the solutions associated with different values of α. Specifically, let δ_α and ∆_Π denote perturbations to α and
Then, for small perturbations, theperturbation of both sides of (52) gives rise to ∆ Π = − P − M − Q (cid:18) αP − M U (∆ Π P − P ∆ Π ) P U (cid:48) P − + αP − U P M U (cid:48) ( P ∆ Π − ∆ Π P ) P − + δ α P − (Π P − P Π) U P U (cid:48) P − + δ α P − U P U (cid:48) ( P Π − Π P ) P − (cid:19) P − + o ( | δ α | ) + o ( (cid:107) ∆ Π (cid:107) ) , (55)where o ( | δ α | ) + o ( (cid:107) ∆ Π (cid:107) ) denotes higher order terms of the perturbations, U = e α (Π P − P Π) ,Q = P − U P U (cid:48) P − , and M − X ( · ) and M X ( · ) are defined in (53) and (54), respectively. Next, we combine all theterms containing ∆ Π on the right hand side of (55) to define the following linear mapping ˆ h α,P , Π : S n → S n , ˆ h α,P , Π : ∆ Π (cid:55)→ − P − M − Q (cid:18) P − M U (∆ Π P − P ∆ Π ) P U (cid:48) P − + P − U P M U (cid:48) ( P ∆ Π − ∆ Π P ) P − (cid:19) P − . (56)Then the terms that containing the δ α is equal to δ α ˆ h α,P , Π (Π) , Thus, (55) is equivalent to ( I − α ˆ h α,P , Π )(∆ Π ) = δ α ˆ h α,P , Π (Π) + o ( | δ α | ) + o ( (cid:107) ∆ Π (cid:107) ) , (57) For A, ∆ ∈ R n × n , e A +∆ = e A + M e A (∆) + o ( (cid:107) ∆ (cid:107) ) , where M X (∆) denotes the non-commutative multiplication of ∆ by X which is defined as M X (∆) = (cid:90) X − τ ∆ X τ dτ. (53)For positive definite matrices A, A + ∆ ∈ S n ++ , log( A + ∆) = log( A ) + M − A (∆) + o ( (cid:107) ∆ (cid:107) ) , where M − X (∆) denotes the non-commutative devision of ∆ by X which is defined as M − X (∆) = (cid:90) ∞ ( X + τ I ) − ∆( X + τ I ) − dτ. (54) DRAFT0 where I denotes the identity mapping.Let α τ = ατ denote a smooth trajectory on the interval τ ∈ [0 , for a given α . 
If the linear mapping $I - \alpha_\tau \hat h_{\alpha_\tau,P_1,\Pi_\tau}$ is invertible, then the solution to the following differential equation

$\frac{d}{d\tau}\Pi_\tau = (I - \alpha_\tau \hat h_{\alpha_\tau,P_1,\Pi_\tau})^{-1} \circ \alpha\, \hat h_{\alpha_\tau,P_1,\Pi_\tau}(\Pi_\tau), \qquad (58)$

at $\tau = 1$, with the initial value given by the $\Pi$ in (51), is equal to the unique solution to (50).

The following theorem provides a sufficient condition for the existence and uniqueness of the solution. To introduce the result, we let $\lambda_{\min}(P)$ and $\lambda_{\max}(P)$ denote the smallest and the largest eigenvalues of a matrix $P \in S^n$. Moreover, we define the following pseudo-norm of a matrix $P \in S^n$:

$\|P\|_a := \max_{\Delta \in S^n,\, \Delta \neq 0} \frac{\|\Delta P - P\Delta\|}{\|\Delta\|}. \qquad (59)$

Note that if $P$ is equal to the identity matrix scaled by any scalar, then $\|P\|_a = 0$. If $\|P\|_a = 0$, we follow the convention that $\lambda/\|P\|_a = +\infty$ for any $\lambda > 0$. Then, we obtain the following theorem.

Theorem 2.
For a pair of positive definite matrices $P_0, P_1 \in S^n_{++}$, if $\alpha$ is a scalar such that

$|\alpha| < \max\left\{ \frac{\lambda_{\min}(P_0)\,\lambda_{\min}(P_1)}{\|P_0\|_a\, \lambda_{\max}(P_1)},\; \frac{\lambda_{\min}(P_0)\,\lambda_{\min}(P_1)}{\|P_1\|_a\, \lambda_{\max}(P_0)} \right\}, \qquad (60)$

then there exists a unique $\Pi \in S^n$ that satisfies

$e^{\alpha(P_0\Pi - \Pi P_0)}\, e^{-P_0\Pi}\, P_0\, e^{-\Pi P_0}\, e^{\alpha(\Pi P_0 - P_0\Pi)} = P_1.$

Proof.
Following (58), we will prove that $I - \alpha_\tau \hat h_{\alpha_\tau,P_1,\Pi_\tau}$ is invertible if (60) holds. It is sufficient to prove that the singular values of the symmetric mapping $\hat h_{\alpha_\tau,P_1,\Pi_\tau}$ are all smaller than 1. For this purpose, we compute the norm of $\hat h_{\alpha_\tau,P_1,\Pi_\tau}(\Delta_\Pi)$ defined by (56). The norm of the first of the two terms containing $\Delta_\Pi$ satisfies

$\tfrac{1}{2}\big\| P_0^{-1/2} M_Q^{-1}\big( P_0^{-1/2} M_U(\Delta_\Pi P_0 - P_0\Delta_\Pi) P_1 U' P_0^{-1/2} \big) P_0^{-1/2} \big\|$
$= \tfrac{1}{2}\Big\| \int_0^\infty\!\!\int_0^1 P_0^{-1/2}(Q + t_1 I)^{-1} P_0^{-1/2}\, U^{1-t_2}(\Delta_\Pi P_0 - P_0\Delta_\Pi)U^{t_2}\, P_1 U' P_0^{-1/2}(Q + t_1 I)^{-1} P_0^{-1/2}\, dt_2\, dt_1 \Big\|$
$\le \tfrac{1}{2} \int_0^\infty \big\|(P_0^{1/2} Q P_0^{1/2} + t_1 P_0)^{-1}\big\|^2 dt_1\; \lambda_{\max}(P_1)\, \|P_0\|_a\, \|\Delta_\Pi\|$
$\le \tfrac{1}{2} \int_0^\infty \big(\lambda_{\min}(P_1) + t_1 \lambda_{\min}(P_0)\big)^{-2} dt_1\; \lambda_{\max}(P_1)\, \|P_0\|_a\, \|\Delta_\Pi\|$
$= \frac{\|P_0\|_a\, \lambda_{\max}(P_1)}{2\,\lambda_{\min}(P_0)\,\lambda_{\min}(P_1)}\, \|\Delta_\Pi\|.$

The same upper bound also holds for the second term containing $\Delta_\Pi$. Combining the bounds for the two terms leads to

$\|\alpha_\tau \hat h_{\alpha_\tau,P_1,\Pi_\tau}\| \le |\alpha|\, \frac{\|P_0\|_a\, \lambda_{\max}(P_1)}{\lambda_{\min}(P_0)\,\lambda_{\min}(P_1)}.$

Therefore, if

$|\alpha| < \frac{\lambda_{\min}(P_0)\,\lambda_{\min}(P_1)}{\|P_0\|_a\, \lambda_{\max}(P_1)}, \qquad (61)$

then $I - \alpha_\tau \hat h_{\alpha_\tau,P_1,\Pi_\tau}$ is invertible for all $\tau \in [0,1]$. Thus, we have proved the first term on the r.h.s. of (60).

The second term can be obtained in a similar way by considering a time-reversed path from $P_1$ to $P_0$. Specifically, if

$|\alpha| < \frac{\lambda_{\min}(P_0)\,\lambda_{\min}(P_1)}{\|P_1\|_a\, \lambda_{\max}(P_0)}, \qquad (62)$

then the bound (61), applied with the roles of $P_0$ and $P_1$ exchanged, implies that there exists a unique $\Pi_1 \in S^n$ such that the following time-reversed path

$P_{\epsilon,(1-t)} = T_{\epsilon,t}(A_1)\, P_1\, T_{\epsilon,t}(A_1)', \qquad (63)$

satisfies $P_{\epsilon,0} = P_0$, where

$A_1 = -\tfrac{1}{2}(P_1\Pi_1 + \Pi_1 P_1) - \tfrac{1}{2}(\Pi_1 P_1 - P_1\Pi_1), \qquad (64)$

i.e. the analogue of (42) with $(P_0, \Pi)$ replaced by $(P_1, \Pi_1)$.
Next, we follow (48) to define

$\Pi_{\epsilon,(1-t)} = e^{(1+\epsilon)(A_1)_a t}\, e^{(-(A_1)_s + \epsilon (A_1)_a)' t}\, \Pi_1\, e^{(-(A_1)_s + \epsilon (A_1)_a) t}\, e^{(1+\epsilon)(A_1)'_a t}, \qquad (65)$

where $(A_1)_s$ and $(A_1)_a$ denote the symmetric and antisymmetric parts of $A_1$, respectively. Let $\Pi_t = -\Pi_{\epsilon,(1-t)}$. Then $\Pi_t$ satisfies the conditions in Proposition 4. Specifically, the path from (63) satisfies the time-forward equation $P_{\epsilon,t} = T_{\epsilon,t}(A)\, P_0\, T_{\epsilon,t}(A)'$ with $A$ given by (42). Therefore, the proof is complete.

D. On the computation of local solutions

If $|\alpha|$ is large, so that $I - \alpha_\tau \hat h_{\alpha_\tau,P_1,\Pi_\tau}$ from (58) is not invertible for some $\tau \in [0,1]$, then there may exist multiple solutions to (50). Below, we propose an approach to compute local solutions of (50) based on an approximate initial value.

Specifically, we consider $\hat\Pi_0$ as an initial guess for the solution to (50). We assume that the true endpoint $P_1$ is close to $\hat P_1 = h_{\alpha,P_0}(\hat\Pi_0)$. By applying perturbation analysis, we obtain that if the pair $\hat\Pi_0 + \Delta_\Pi$ and $\hat P_1 + \Delta_P$ satisfies (50), then the perturbations should satisfy

$\Delta_\Pi - \alpha \hat h_{\alpha,\hat P_1,\hat\Pi_0}(\Delta_\Pi) + o(\|\Delta_\Pi\|) = -\tfrac{1}{2} P_0^{-1/2} M_Q^{-1}\big( P_0^{-1/2}\, U \Delta_P U'\, P_0^{-1/2} \big) P_0^{-1/2} + o(\|\Delta_P\|).$

Next, we define a path $P_\tau = (1-\tau)\hat P_1 + \tau P_1$ for $\tau \in [0,1]$. Then $P_\tau$ remains in $S^n_{++}$ and $\dot P_\tau = P_1 - \hat P_1$. If $I - \alpha \hat h_{\alpha,P_\tau,\hat\Pi_\tau}$ is invertible, then the solution to the following differential equation

$\frac{d}{d\tau}\hat\Pi_\tau = (I - \alpha \hat h_{\alpha,P_\tau,\hat\Pi_\tau})^{-1}\Big( -\tfrac{1}{2} P_0^{-1/2} M_Q^{-1}\big( P_0^{-1/2}\, U (P_1 - \hat P_1) U'\, P_0^{-1/2} \big) P_0^{-1/2} \Big),$

at $\tau = 1$ with the initial value given by $\hat\Pi_0$ provides a solution to (50). The solution may depend on the choice of the initial value $\hat\Pi_0$, as illustrated by the following example.

V. EXAMPLES
A. Interpolating covariance matrices
In this example, we highlight the difference between $P_{\epsilon,t}^{\mathrm{wls}}$ and the other two types of trajectories, i.e. $P_t^{\mathrm{omt}}$ and $P_t^{\mathrm{info}}$, using the following two matrices as the endpoints:

$P_0 = \mathrm{diag}(1,\, 2), \qquad P_1 = \mathrm{diag}(2,\, 1). \qquad (66)$
Applying Eqs. (1) and (2), we obtain

$P_t^{\mathrm{omt}} = \mathrm{diag}\big( (1 + (\sqrt{2}-1)t)^2,\; (\sqrt{2} - (\sqrt{2}-1)t)^2 \big), \qquad P_t^{\mathrm{info}} = \mathrm{diag}\big( 2^{t},\; 2^{1-t} \big),$

which are both diagonal for all $t$. On the other hand, if $\epsilon = 0$ in (42), there are infinitely many antisymmetric matrices $A_0^{\mathrm{wls}}$ of the form

$A_0^{\mathrm{wls}} = \begin{bmatrix} 0 & \pm\frac{(2k+1)\pi}{2} \\ \mp\frac{(2k+1)\pi}{2} & 0 \end{bmatrix}, \qquad k = 0, 1, 2, \ldots,$

that make the objective function equal to zero. The corresponding covariance paths are

$P_{0,t}^{\mathrm{wls}} = \begin{bmatrix} 1 + \sin^2\big(\frac{(2k+1)\pi}{2} t\big) & \pm\cos\big(\frac{(2k+1)\pi}{2} t\big)\sin\big(\frac{(2k+1)\pi}{2} t\big) \\ \pm\cos\big(\frac{(2k+1)\pi}{2} t\big)\sin\big(\frac{(2k+1)\pi}{2} t\big) & 1 + \cos^2\big(\frac{(2k+1)\pi}{2} t\big) \end{bmatrix},$

which are not diagonal.

To understand the covariance paths corresponding to nonzero $\epsilon$, we gradually increase $\epsilon$ from a small positive value and numerically compute the solution to (50) by using the fmincon function in MATLAB to search for a symmetric matrix $\Pi$ that minimizes the least-squares error $\|h_{\alpha,P_0}(\Pi) - P_1\|_F$, where $\alpha = (1+\epsilon)/(2\epsilon)$. In this procedure, we apply the minimizer corresponding to a smaller $\epsilon$ as the initial value for the next step with a larger $\epsilon$. In the first step, we choose two initial values $\hat\Pi_\pm$ with small off-diagonal entries of opposite signs, so that the initial system matrices from (42) are approximately antisymmetric. As $\epsilon$ increases, we obtain two branches of numerical minimizers whose residuals remain close to zero. Figure 1 illustrates the trajectories of the entries of $P_{\epsilon,t}^{\mathrm{wls}}$ corresponding to three positive values of $\epsilon$. The dashed and solid red lines illustrate the off-diagonal entry of the two branches of minimizers. As $\epsilon$ increases, the magnitude of the off-diagonal entry reduces. Fig. 2 illustrates the off-diagonal entry $A_a(1,2)$ of the two branches of local minimizers at different $\epsilon$. The two branches collapse into a single one when $\epsilon$ passes a threshold value.
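The closed-form paths of this example, together with the bound of Theorem 2, can be verified numerically. An illustrative sketch (assuming the endpoints $P_0 = \mathrm{diag}(1,2)$, $P_1 = \mathrm{diag}(2,1)$ of (66), and Frobenius norms in the pseudo-norm (59); both are assumptions of this sketch):

```python
import numpy as np

P0, P1 = np.diag([1.0, 2.0]), np.diag([2.0, 1.0])
s2 = np.sqrt(2.0)

# closed-form paths of the example
P_omt  = lambda t: np.diag([(1 + (s2 - 1) * t) ** 2, (s2 - (s2 - 1) * t) ** 2])
P_info = lambda t: np.diag([2.0 ** t, 2.0 ** (1 - t)])

def P_wls0(t, k=0):
    """Rotation path for epsilon = 0, steered by the antisymmetric A0^wls."""
    th = (2 * k + 1) * np.pi / 2 * t
    R = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])
    return R @ P0 @ R.T

def pseudo_norm_a(P):
    """Pseudo-norm (59), with Frobenius norms assumed on both sides."""
    n = P.shape[0]
    cols = []
    for i in range(n):                  # orthonormal basis of symmetric matrices
        for j in range(i, n):
            E = np.zeros((n, n)); E[i, j] = E[j, i] = 1.0
            E /= np.linalg.norm(E)
            cols.append((E @ P - P @ E).ravel())
    return np.linalg.norm(np.stack(cols, axis=1), 2)   # largest singular value

# all three paths connect the endpoints; only the rotation path leaves the diagonal
for path in (P_omt, P_info, P_wls0):
    assert np.allclose(path(0), P0) and np.allclose(path(1), P1)

# the bound (60) of Theorem 2 for this pair of endpoints
lmin = lambda P: np.linalg.eigvalsh(P)[0]
lmax = lambda P: np.linalg.eigvalsh(P)[-1]
bound = max(lmin(P0) * lmin(P1) / (pseudo_norm_a(P0) * lmax(P1)),
            lmin(P0) * lmin(P1) / (pseudo_norm_a(P1) * lmax(P0)))
print(bound)   # ≈ 0.5
```

For these endpoints both terms of (60) evaluate to 1/2, so Theorem 2 guarantees a unique $\Pi$ only for $|\alpha| < 1/2$.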
Fig. 1:
An illustration of the covariance paths $P_{\epsilon,t}^{\mathrm{wls}}$ connecting $P_0$ and $P_1$ in (66) at three different values of $\epsilon$.
An illustration of the off-diagonal entry $A_a(1,2)$ of the two branches of local minimizers at different $\epsilon$.
We investigate an application that uses the proposed covariance paths to fit noisy sample covariance matrices from a rsfMRI dataset. Below we provide detailed descriptions of the data, the method, and the experimental results.
1) Data:
The sample covariance matrices are computed using a rsfMRI dataset from the Human Connectome Project [31]. This dataset consists of 1200 rsfMRI image volumes measured in a 15-minute time window. The provided data has already been processed by the ICA-FIX method [32]. It is further processed using global signal regression (GSR), as suggested in [33]. Then we apply the label map from [35] to separate the brain cortical surface into 7 non-overlapping regions. The data sequences from each region are averaged into a one-dimensional time series, providing a 7-dimensional time series sampled at 1200 time points. The same dataset and preprocessing method have been used in our earlier work [34]. Next, we normalize each dimension of the time series by its standard deviation. The normalized time series is denoted by
$\{x_t,\ t = 1, \ldots, 1200\}$. Moreover, we split the entire sequence into 10 equal-length segments and compute the corresponding sample covariance matrices as

$\tilde P_k = \frac{1}{120} \sum_{i=1}^{120} x_{120k+i}\, x'_{120k+i}, \qquad k = 0, \ldots, 9.$

The time scale is then changed so that $\tilde P_{t_k}$ equals $\tilde P_k$, with $t_0 = 0$ and $t_9 = 1$. The color arrays in the first row of Fig. 3 illustrate several representative $\tilde P_{t_k}$. These figures show that $\tilde P_{t_k}$ has significant fluctuations, which is consistent with the observations from [11], [12]. The main goal of this proof-of-concept experiment is to use the proposed covariance paths to fit these sample covariances and compare their differences. The neuroscience aspects of this experiment will not be discussed in this paper.
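The segmentation step above can be sketched as follows (illustrative only; surrogate random data stands in for the HCP series, and the data is assumed zero-mean as in the definition of $\tilde P_k$):

```python
import numpy as np

def segment_covariances(X, n_seg=10):
    """Split a (T, n) series into n_seg equal segments and return the
    per-segment sample covariances (1/L) * sum_i x_i x_i' (zero mean assumed)."""
    T, n = X.shape
    L = T // n_seg
    return np.array([X[k * L:(k + 1) * L].T @ X[k * L:(k + 1) * L] / L
                     for k in range(n_seg)])

# surrogate for the 1200-sample, 7-region normalized series used in the paper
rng = np.random.default_rng(1)
X = rng.standard_normal((1200, 7))
X = X / X.std(axis=0)                  # normalize each dimension
covs = segment_covariances(X)
print(covs.shape)                      # (10, 7, 7)
```

Each of the ten $7 \times 7$ matrices is symmetric and, with 120 samples per segment, positive definite in practice.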
2) Method:
We solve optimization problems of the following form:

$\min_{P_t \in \mathcal{P}} \sum_{k=0}^{K} \|P_{t_k} - \tilde P_{t_k}\|^2, \qquad (67)$

to obtain smooth paths that fit the measurements, where $K = 9$ and $\mathcal{P}$ represents a suitable set of smooth paths. Based on the results from the previous sections, we propose three sets of parametric models for the smooth paths, described below.

Based on Proposition 1, we define

$\mathcal{P}^{\mathrm{omt}} := \big\{ P_t \mid P_t = (I - tQ)\, P_0\, (I - tQ)',\; P_0 \in S^n_{++},\; Q \in \mathbb{R}^{n \times n} \big\}.$

Note that $Q$ may be a non-symmetric matrix, so that $\mathcal{P}^{\mathrm{omt}}$ contains the OMT-based geodesics of the form (2). We use this more general family of covariance paths in order to obtain better fitting results. It is also clear from Proposition 1 that a $P_t \in \mathcal{P}^{\mathrm{omt}}$ is the state covariance of a linear time-varying system with $A_t = -Q(I - Qt)^{-1}$. We apply the fminsdp function in MATLAB to obtain an optimal solution. The initial values for $P_0$ and $Q$ are set to $\tilde P_0$ and the zero matrix, respectively. The same initial values and optimization algorithm are used to solve the subsequent optimization problems. The corresponding optimal path is denoted by $\hat P_t^{\mathrm{omt}}$.
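The fact that every path in $\mathcal{P}^{\mathrm{omt}}$ is the state covariance of a linear time-varying system with $A_t = -Q(I - Qt)^{-1}$ can be checked by differentiating $P_t$; a short numerical sketch (illustrative only, not the paper's fitting code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
P0 = B @ B.T + n * np.eye(n)            # random positive definite P0
Q = 0.3 * rng.standard_normal((n, n))   # small enough that I - tQ stays invertible

P = lambda t: (np.eye(n) - t * Q) @ P0 @ (np.eye(n) - t * Q).T
A = lambda t: -Q @ np.linalg.inv(np.eye(n) - t * Q)

# dP/dt = A_t P_t + P_t A_t' (Lyapunov differential equation), via central differences
t, h = 0.4, 1e-6
dP = (P(t + h) - P(t - h)) / (2 * h)
print(np.allclose(dP, A(t) @ P(t) + P(t) @ A(t).T, atol=1e-6))
```

The identity holds exactly; the tolerance only absorbs the finite-difference error.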
The second set of smooth paths is defined based on Proposition 2:

$\mathcal{P}^{\mathrm{info}} := \big\{ P_t \mid P_t = e^{At}\, P_0\, e^{A't},\; P_0 \in S^n_{++},\; A \in \mathbb{R}^{n \times n} \big\}.$

$\mathcal{P}^{\mathrm{info}}$ includes all geodesic paths of the form $P_t^{\mathrm{info}}$. The optimal path in this set is denoted by $\hat P_t^{\mathrm{info}}$. Clearly, a trajectory $P_t \in \mathcal{P}^{\mathrm{info}}$ is equal to the state covariance of a linear time-invariant system.

Based on Proposition 4, we define

$\mathcal{P}^{\mathrm{wls}}_{\epsilon} := \big\{ P_t \mid P_t = T_{\epsilon,t}(A)\, P_0\, T_{\epsilon,t}(A)',\; P_0 \in S^n_{++},\; A \in \mathbb{R}^{n \times n} \big\},$

for a given $\epsilon > 0$. The corresponding optimal paths are denoted by $\hat P_{\epsilon,t}^{\mathrm{wls}}$. This set includes all the trajectories that are solutions of (37). A trajectory in $\mathcal{P}^{\mathrm{wls}}_{\epsilon}$ is equal to the state covariance of a linear time-varying system with system matrices of the form $e^{(1+\epsilon)A_a t}\, A\, e^{(1+\epsilon)A'_a t}$. The system matrices corresponding to $\hat P_{\epsilon,t}^{\mathrm{wls}}$ are denoted by $\hat A_{\epsilon,t}^{\mathrm{wls}}$. The parameter $\epsilon$ is then searched over a discrete set to minimize the fitting errors. Based on the fitting results, we set the value of $\epsilon$ to 20.
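The rotating-eigenspace interpretation of $\mathcal{P}^{\mathrm{wls}}_{\epsilon}$ can be made concrete: since $A_a$ is antisymmetric, $e^{(1+\epsilon)A_a t}$ is orthogonal, so the system matrix $e^{(1+\epsilon)A_a t}\, A\, e^{(1+\epsilon)A'_a t}$ is a similarity transform of $A$ whose eigenvalues stay fixed while the eigenvectors rotate. A brief check (illustrative sketch; assumes numpy/scipy):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
Aa = 0.5 * (A - A.T)                    # antisymmetric part of A
eps = 20.0                              # the value of epsilon used in the experiment

def A_t(t):
    R = expm((1 + eps) * Aa * t)        # orthogonal, since Aa' = -Aa
    return R @ A @ R.T                  # R.T = expm((1+eps) * Aa.T * t) = R^{-1}

e0 = np.sort_complex(np.linalg.eigvals(A_t(0.0)))
e1 = np.sort_complex(np.linalg.eigvals(A_t(0.7)))
print(np.allclose(e0, e1))              # eigenvalues are invariant along the path
```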
3) Results:
Figure 4 illustrates the fitting results for 6 representative entries of $\tilde P_{t_k}$. The black stars represent the noisy measurements. The blue, green, and red plots represent the estimated paths $\hat P_t^{\mathrm{omt}}$, $\hat P_t^{\mathrm{info}}$ and $\hat P_{\epsilon,t}^{\mathrm{wls}}$, respectively. $\hat P_t^{\mathrm{omt}}$ and $\hat P_t^{\mathrm{info}}$ are very similar to each other. Clearly, $\hat P_{\epsilon,t}^{\mathrm{wls}}$ has more oscillations, which better fit the fluctuations in the measurements. Comparing the normalized square errors $\big(\sum_{k=0}^{K} \|\hat P_{t_k} - \tilde P_{t_k}\|^2\big) / \big(\sum_{k=0}^{K} \|\tilde P_{t_k}\|^2\big)$ of the three estimates confirms that $\hat P_{\epsilon,t}^{\mathrm{wls}}$ has the smallest fitting error. The overall relatively large residual is partly due to the low signal-to-noise ratio of fMRI data [36]. Therefore, the corresponding system matrices $\hat A_{\epsilon,t}^{\mathrm{wls}}$ could better explain the dynamic interdependence between brain regions. The directed networks in the second row of Fig. 3 illustrate the matrices $\hat A_{\epsilon,t}^{\mathrm{wls}}$ at four representative time points. The red and blue colors represent positive and negative values, respectively. The edge widths are weighted by the absolute values of the corresponding entries. To simplify visualization, edges with weight smaller than 0.15 are not displayed.
Fig. 3:
The first row illustrates the sample covariance matrices between 7 brain regions computed from different segments of a rsfMRI dataset from a human brain. The directed networks in the second row illustrate the estimated system matrices corresponding to the proposed weighted-least-squares trajectories.
Fig. 4:
The black stars in each image panel illustrate the noisy sample covariance matrices at different time points. The blue, green and red lines are the fitted curves using the proposed three sets of smooth paths.
VI. DISCUSSION
In this paper, we have investigated a framework to derive covariance paths on the Riemannian manifold of positive definite matrices by using quadratic forms of system matrices to regularize the path lengths. We have considered three types of quadratic forms and derived the corresponding covariance paths. The first and the second quadratic forms lead to the well-known geodesics derived from the Bures-Wasserstein metric from OMT and the Fisher-Rao metric from information geometry, respectively. In the process, we have derived a fluid-mechanics interpretation of the Fisher-Rao metric in Theorem 1, which provides an interesting weighted-mass-transport view
for the Fisher-Rao metric.

The third type of quadratic form gives rise to a general family of covariance paths that are steered by system matrices with a rotating eigenspace. The rotation velocity is related to the choice of the parameter $\epsilon$. In the special case when $\epsilon = -1$, i.e. $\alpha = 0$, the eigenspace is not rotating and the trajectories reduce to the Fisher-Rao-based geodesics. We have also analyzed the existence and uniqueness of the paths with sufficiently small $\alpha$.

We note that similar types of trajectories of positive definite matrices with rotating eigenspaces have been investigated in [37]–[39] from different angles. This work is developed along similar lines as [40], [41], which focus on the optimal steering of state covariances via linear systems using external input, but the approach for developing covariance paths used in this paper is different from this earlier work.

In a proof-of-concept example, we apply the three types of smooth paths of state covariances to fit noisy sample covariance matrices from a rsfMRI dataset. A goal of this experiment is to understand directed interactions among brain regions via the estimated system matrices. As expected, the rotation-system-based covariance path has the best performance in terms of fitting the fluctuations in the measurements. Therefore, the corresponding system matrices could provide a data-driven tool to understand the structured fluctuations of functional brain activities. In future work, we will apply this approach to analyze more complex brain networks using different path-fitting algorithms. Moreover, we will also explore the proposed covariance paths to analyze data from other neuroimaging modalities such as EEG/MEG.

ACKNOWLEDGMENT
The author would like to thank Tryphon T. Georgiou and Yogesh Rathi for insightful discussions. This work was supported in part under grants R21MH115280 (PI: Ning), R01MH097979 (PI: Rathi), R01MH111917 (PI: Rathi), and R01MH074794 (PI: Westin).

REFERENCES

[1] F. Porikli, O. Tuzel, and P. Meer, “Covariance tracking using model update based on Lie algebra,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, June 2006, pp. 728–735.
[2] Y. Wu, J. Cheng, J. Wang, H. Lu, J. Wang, H. Ling, E. Blasch, and L. Bai, “Real-time probabilistic covariance tracking with efficient model update,”
IEEE Transactions on Image Processing , vol. 21, no. 5, pp. 2824–2837, May 2012.[3] J. F. Yang and M. Kaveh, “Adaptive eigensubspace algorithms for direction or frequency estimation and tracking,”
IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 2, pp. 241–251, Feb. 1988.
[4] X. Jiang, L. Ning, and T. T. Georgiou, “Distances and Riemannian metrics for multivariate spectral densities,”
IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1723–1735, July 2012. [5] C. Lenglet, M. Rousson, R. Deriche, and O. Faugeras, “Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing,”
Journal of Mathematical Imaging and Vision , vol. 25, no. 3,pp. 423–444, Oct 2006. [Online]. Available: https://doi.org/10.1007/s10851-006-6897-z[6] I. L. Dryden, A. Koloydenko, and D. Zhou, “Non-euclidean statistics for covariance matrices, with applications todiffusion tensor imaging,”
The Annals of Applied Statistics, 2009. [7]
Adaptive Riemannian Metrics for Improved Geodesic Tracking of White Matter. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 13–24. [Online]. Available: https://doi.org/10.1007/978-3-642-22092-0_2 [8] B. Biswal, F. Zerrin Yetkin, V. M. Haughton, and J. S. Hyde, “Functional connectivity in the motor cortex of resting human brain using echo-planar MRI,”
Magnetic Resonance in Medicine , vol. 34, no. 4, pp. 537–541, 1995. [Online].Available: http://dx.doi.org/10.1002/mrm.1910340409[9] R. L. Buckner, F. M. Krienen, and B. T. T. Yeo, “Opportunities and limitations of intrinsic functional connectivity MRI,”
Nature Neuroscience , vol. 16, pp. 832–837, 2013.[10] S. M. Smith, D. Vidaurre, C. F. Beckmann, and et al., “Functional connectomics from resting-state fMRI,”
Trends in Cognitive Sciences, vol. 17, no. 12, pp. 666–682, 2013. [11] C. Chang and G. H. Glover, “Time-frequency dynamics of resting-state brain connectivity measured with fMRI,”
NeuroImage , vol. 50, no. 1, pp. 81 – 98, 2010.[12] M. G. Preti, T. A. Bolton, and D. V. D. Ville, “The dynamic functional connectome: State-of-the-art and perspectives,”
NeuroImage , vol. 160, pp. 41 – 54, 2017, functional Architecture of the Brain.[13] C. Rao, “Information and the accuracy attainable in the estimation of statistical parameters,”
Bull. Calcutta Math. Soc. ,vol. 37, pp. 81 – 91, 1945.[14] S.-I. Amari and H. Nagaoka,
Methods of information geometry . Amer. Math. Soc., 2000.[15] N. Cencov,
Statistical decision rules and optimal inference . Amer. Math. Soc., 1982.[16] R. Kass and P. Vos,
Geometrical foundations of asymptotic inference . Wiley New York, 1997.[17] C. Villani,
Topics in Optimal Transportation . Amer. Math. Soc., 2003.[18] S. Rachev and L. R¨uschendorf,
Mass transportation problems. Vol. I and II. Probability and its Applications . Springer,New York, 1998.[19] M. Knott and C. S. Smith, “On the optimal mapping of distributions,”
Journal of Optimization Theory and Applications ,vol. 43, no. 1, pp. 39–49, May 1984.[20] A. Takatsu, “On Wasserstein geometry of the space of Gaussian measures,”
ArXiv e-prints , Jan. 2008.[21] A. Uhlmann, “The metric of bures and the geometric phase,” in
Quantum Groups and Related Topics: Proceedings of theFirst Max Born Symposium , R. Gielerak, J. Lukierski, and Z. Popowicz, Eds., 1992, p. 267.[22] D. Petz, “Geometry of canonical correlation on the state space of a quantum system,”
Journal of Mathematical Physics ,vol. 35, pp. 780 – 795, 1994.[23] L. Ning, X. Jiang, and T. Georgiou, “On the geometry of covariance matrices,”
IEEE Signal Processing Letters , vol. 20,no. 8, pp. 787–790, Aug 2013.[24] R. Bhatia, T. Jain, and Y. Lim, “On the Bures-Wasserstein distance between positive definite matrices,”
ArXiv e-prints ,Dec. 2017.
[25] R. Jordan, D. Kinderlehrer, and F. Otto, “The variational formulation of the Fokker–Planck equation,”
SIAM Journal on Mathematical Analysis, vol. 29, no. 1, pp. 1–17, 1998. [26] J.-D. Benamou and Y. Brenier, “A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem,”
Numerische Mathematik , vol. 84, no. 3, pp. 375–393, Jan 2000.[27] S. Kullback and R. A. Leibler, “On information and sufficiency,”
The Annals of Mathematical Statistics , vol. 22, no. 1,pp. 79–86, 1951.[28] T. Cover and J. Thomas,
Elements of Information Theory . Wiley-Interscience, 2008.[29] R. Bhatia,
Positive definite matrices . Princeton University Press, 2007.[30] T. T. Georgiou, “Relative entropy and the multivariable multidimensional moment problem,”
IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 1052–1066, 2006. [31] D. V. Essen, K. Ugurbil, E. Auerbach, et al., “The human connectome project: A data acquisition perspective,”
NeuroImage , vol. 62, no. 4, pp. 2222 – 2231, 2012, connectivity.[32] S. M. Smith, C. F. Beckmann, J. Andersson, and et al., “Resting-state fmri in the human connectome project,”
NeuroImage ,vol. 80, pp. 144 – 168, 2013, mapping the Connectome.[33] M. Fox, D. Zhang, A. Snyder, and M. Raichle, “The global signal and observed anticorrelated resting state brain networks,”
J. Neurophysiol., vol. 101, pp. 3270–3283, 2009. [34] L. Ning and Y. Rathi, “A dynamic regression approach for frequency-domain partial coherence and causality analysis of functional brain networks,”
IEEE Transactions on Medical Imaging , vol. PP, no. 99, pp. 1–1, 2017.[35] B. T. Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari, M. Hollinshead, J. L. Roffman, J. W. Smoller, L. Z¨ollei,J. R. Polimeni, B. Fischl, and R. Liu, H a.nd Buckner, “The organization of the human cerebral cortex estimated by intrinsicfunctional connectivity,”
J. Neurophysiol. , vol. 106, pp. 1125–1165, 2011.[36] K. Murphy, J. Bodurka, and P. A. Bandettini, “How long to scan? the relationship between fmri temporal signal to noiseratio and necessary scan duration,”
NeuroImage , vol. 34, no. 2, pp. 565 – 574, 2007.[37] L. Ning, T. T. Georgiou, and A. Tannenbaum, “On matrix-valued monge-kantorovich optimal mass transport,”
IEEE Transactions on Automatic Control, vol. 60, no. 2, pp. 373–382, Feb. 2015. [38] K. Yamamoto, Y. Chen, L. Ning, T. T. Georgiou, and A. Tannenbaum, “Regularization and interpolation of positive matrices,”
IEEE Transactions on Automatic Control , vol. PP, no. 99, pp. 1–1, 2017.[39] Y. Chen, T. Georgiou, and A. Tannenbaum, “Matrix Optimal Mass Transport: A Quantum Mechanical Approach,”
ArXiv e-prints, Oct. 2016. [40] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, Part I,”
IEEE Transactions on Automatic Control , vol. 61, no. 5, pp. 1158–1169, May 2016.[41] ——, “Optimal steering of a linear stochastic system to a final probability distribution, Part II,”
IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1170–1180, May 2016. [42] J. Geweke, “Measurement of linear dependence and feedback between multiple time series,”
J. Am. Stat. Assoc., vol. 77, no. 378, pp. 304–313, 1982.