Geometries on the cone of positive-definite matrices derived from the power potential and their relation to the power means
arXiv preprint [math.DG]
Nadia Chouaieb (a), Bruno Iannazzo (b), Maher Moakher (a,∗)
(a) LAMSIN, National Engineering School of Tunis, University of Tunis El Manar, B.P. 37, 1002 Tunis-Belvédère, Tunisia
(b) Dipartimento di Matematica e Informatica, Università di Perugia, Via Vanvitelli 1, I-06123 Perugia, Italy
Abstract
We study a Riemannian metric on the cone of symmetric positive-definite matrices obtained from the Hessian of the power potential function $(1 - \det(X)^{\beta})/\beta$. We give explicit expressions for the geodesics and distance function, under suitable conditions. In the scalar case, the geodesic between two positive numbers coincides with a weighted power mean, while for matrices of size at least two it yields a notion of weighted power mean different from the ones given in the literature. As $\beta$ tends to zero, the power potential converges to the logarithmic potential, which yields a well-known metric associated with the matrix geometric mean; we show that the geodesic and the distance associated with the power potential converge to the weighted matrix geometric mean and the distance associated with the logarithmic potential, respectively.

Keywords: positive-definite matrices, potential function, Riemannian manifold, Riemannian metric, Riemannian distance, Karcher mean, matrix geometric mean, matrix power mean, Tsallis statistics, $q$-logarithm
∗ Corresponding author
Email addresses: [email protected] (Nadia Chouaieb), [email protected] (Bruno Iannazzo), [email protected] (Maher Moakher)
URL: http://sites.google.com/site/moakhersite/ (Maher Moakher)
Preprint submitted to Linear Algebra and its Applications, February 23, 2021

1. Introduction

The importance of the cone of symmetric positive-definite matrices can hardly be exaggerated. Such matrices are omnipresent and play fundamental roles in several disciplines such as abstract mathematics, numerical analysis, probability and statistics, and engineering sciences. Nowadays, as many applications deliver data that are constrained to live on this set, it has become essential to understand its geometric structure.

In recent years, a rather large number of metrics have been given on the cone of positive-definite matrices. There are both theoretical and practical reasons for this interest. Theoretically, it has been interesting to understand how to suitably extend notions such as the geometric mean or the power mean from positive scalars to positive-definite matrices [1, 2, 3, 4]. This generalization has required a profound understanding of the geometry of the cone of positive-definite matrices [5, Ch. XII], [6, Ch. 6]. Practically, having a large number of metrics allows one to choose the one that best fits the problem of averaging or measuring the nearness of data provided by applications. Metrics different from the Euclidean one have been used in several applications that range from medical imaging to machine learning and engineering, see e.g. [7, 8, 9, 10, 11, 12, 13, 14, 15, 16].

On the set of symmetric positive-definite matrices of size $n$, which we denote by $\mathcal{P}_n$, one can use the simple geometry inherited from the Euclidean geometry of symmetric matrices of size $n$, which we denote by $\mathcal{S}_n$. Indeed, it is known that $\mathcal{P}_n$ is an open convex cone in the space $\mathcal{S}_n$ and that the tangent space at a matrix $X \in \mathcal{P}_n$ can be identified with $\mathcal{S}_n$. The inner product on $\mathcal{S}_n$

$$\langle A, B \rangle = \operatorname{tr}(AB)$$

yields a Euclidean space structure on $\mathcal{S}_n$, which induces on $\mathcal{P}_n$ a natural structure of Riemannian manifold, as an open subset of a Euclidean space. However, the associated metric does not make $\mathcal{P}_n$ a complete metric space.

On the tangent space at $X \in \mathcal{P}_n$, one can also use the affine-invariant metric

$$\langle A, B \rangle_X = \operatorname{tr}(X^{-1} A X^{-1} B), \tag{1}$$

where $A, B$ are matrices in the tangent space, i.e., symmetric matrices.
Endowing $\mathcal{P}_n$ with the metric (1) induces on $\mathcal{P}_n$ a Riemannian structure that we will denote by $\mathcal{M}_n$, and $\mathcal{P}_n$ equipped with this metric is a complete metric space [3].

It has been proved that in $\mathcal{M}_n$ any set $\{A_1, \ldots, A_m\}$ of given positive-definite matrices of size $n$ has a unique barycenter, that is, a minimum over $\mathcal{P}_n$ of the function

$$f(X) := \sum_{i=1}^{m} \delta^2(X, A_i),$$

where $\delta(A, B)$ is the Riemannian distance between two given matrices $A$ and $B$ in $\mathcal{M}_n$. The barycenter on $\mathcal{M}_n$ has been understood as the legitimate generalization of the geometric mean to matrices [17], and it is often referred to as the Karcher mean.

Moreover, there exists a unique geodesic $\gamma : [0, 1] \to \mathcal{P}_n$ joining any two matrices $A, B \in \mathcal{M}_n$, and a point $\gamma(t)$ of the geodesic curve is said to be the weighted matrix geometric mean of $A$ and $B$ [18].

The Riemannian metric (1) of $\mathcal{M}_n$ can also be viewed as the Hessian of the logarithmic potential function on $\mathcal{P}_n$

$$\Phi_0(X) = -\ln(\det X). \tag{2}$$

In interior-point methods of cone programming, (2) is called the logarithmic barrier function [19, Sec. 6.3]. Note that $\ln(\det X) = \operatorname{tr} \ln X$, and hence the negative of the function (2) is called the Boltzmann entropy in classical statistical mechanics and the information potential in information theory [20].

In this paper, we consider the $\beta$-power potential function on $\mathcal{P}_n$ [21, 22]

$$\Phi_\beta(X) = \frac{1 - (\det X)^{\beta}}{\beta}, \qquad \beta \neq 0, \tag{3}$$

that generalizes the logarithmic potential in the sense that $\lim_{\beta \to 0} \Phi_\beta(X) = \Phi_0(X)$. For $0 \neq \beta < 1/n$, the Hessian of $\Phi_\beta(X)$ provides a new family of Riemannian geometries on $\mathcal{P}_n$ that generalizes $\mathcal{M}_n$ and will be denoted by $\mathcal{M}_n^\beta$.

We will show that the $\beta$-potential function (3) is intimately related to generalized logarithmic functions.
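The limit $\lim_{\beta \to 0} \Phi_\beta(X) = \Phi_0(X)$ can be observed numerically. The following is a minimal sketch, not part of the original paper; the function names and the test matrix are illustrative assumptions.

```python
import numpy as np

def log_potential(X):
    """Phi_0(X) = -ln(det X), computed through slogdet for stability."""
    _, logdet = np.linalg.slogdet(X)
    return -logdet

def power_potential(X, beta):
    """Phi_beta(X) = (1 - det(X)**beta) / beta, for beta != 0."""
    _, logdet = np.linalg.slogdet(X)
    # det(X)**beta = exp(beta * ln det X), so 1 - det(X)**beta = -expm1(beta * ln det X)
    return -np.expm1(beta * logdet) / beta

# Hypothetical symmetric positive-definite test matrix.
X = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Gap between the power potential and the logarithmic potential as beta shrinks.
gaps = [abs(power_potential(X, b) - log_potential(X)) for b in (1e-1, 1e-3, 1e-5)]
print(gaps)
```

The gaps shrink roughly in proportion to $\beta$, in agreement with the first-order expansion of $(1 - e^{\beta s})/\beta$ around $\beta = 0$ with $s = \ln \det X$.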
In fact, in his theory of non-extensive statistical mechanics, which is a generalization of the Boltzmann-Gibbs statistical mechanics, Tsallis introduced in 1994 [23] the one-parameter family, parameterized by $q \in \mathbb{R}$, of generalized logarithmic functions

$$\ln_q x = \begin{cases} \ln x, & q = 1, \\[4pt] \dfrac{x^{1-q} - 1}{1 - q}, & q \neq 1, \end{cases}$$

which are defined for all $x > 0$. The function $\ln_q$ is called the $q$-logarithm (or the Tsallis logarithm). We note that the family of $q$-logarithms can be seen as the one-parameter family of Box-Cox transformations (with deformation parameter $\lambda = 1 - q$), which was proposed by Box and Cox in 1964 [24].

By setting $\hat\beta = 1 - \beta$, the $\beta$-power potential function (3) can be compactly written as

$$\Phi_\beta(X) = -\ln_{\hat\beta}(\det X), \qquad \forall X \in \mathcal{P}_n,$$

and therefore the $\beta$-power potential function can also be called the $\hat\beta$-logarithmic potential function on $\mathcal{P}_n$.

We study the geometry of $\mathcal{M}_n^\beta$ and derive an explicit expression for the geodesics and for the Riemannian distance on $\mathcal{M}_n^\beta$, under some conditions. In the scalar case, we observe that the geodesic joining two positive numbers $a$ and $b$ is the weighted power mean $((1-t)a^{\beta/2} + t b^{\beta/2})^{2/\beta}$. This is a welcome feature, since it is known that the weighted power mean with parameter $\beta/2$, for a given $t$, converges to the weighted geometric mean $a^{1-t} b^{t}$ as $\beta \to 0$. For $n > 1$, this symmetry disappears and the geodesic on $\mathcal{M}_n^\beta$, even for two matrices, does not coincide, for any choice of the parameter, with the definitions of weighted power mean provided so far, namely the straightforward one $((1-t)A^{q} + tB^{q})^{1/q}$, for $t \in [0, 1]$, $q \in \mathbb{R} \setminus \{0\}$, and the one provided by Lim and Pálfia [4]. Thus, the barycenter in $\mathcal{M}_n^\beta$ can be seen as a third definition of power mean. The latter has the advantage of being derived from a rich geometric structure on $\mathcal{P}_n$, and it approaches the weighted geometric mean $A \#_t B = A(A^{-1}B)^{t}$ when the parameter $\beta$ goes to 0.

We recall here that for a diagonalizable matrix $C \in \mathbb{R}^{n \times n}$ with positive eigenvalues, that is, such that there exist an invertible matrix $M$ and a diagonal matrix $D = \operatorname{diag}(d_1, \ldots, d_n)$ with $C = M^{-1} D M$, the power $C^{q}$ should be understood as the primary matrix function $C^{q} = M^{-1} \operatorname{diag}(d_1^{q}, \ldots, d_n^{q}) M$. A remarkable property of this function, which will be useful in the following, is the commutativity with similarities, i.e., if $N$ is invertible, then $N C^{q} N^{-1} = (N C N^{-1})^{q}$.

The paper is structured as follows: in Section 2 we introduce the metric related to the $\beta$-power potential and study some of its properties; in Section 3 we provide explicit expressions for the geodesic joining two positive-definite matrices in the scalar, linearly dependent, and general cases; a number of properties of the mean of two positive-definite matrices associated with this metric, as well as different asymptotic results, are studied in Section 4. An explicit expression of the Riemannian distance between two positive-definite matrices is derived in Section 5; as $\beta$ goes to 0, it is shown to converge to the Riemannian distance associated with the metric (1). Finally, in Section 6 some conclusions are drawn.
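As a small numerical illustration of the scalar statements above, the following sketch (not taken from the paper; function names and sample values are assumptions) evaluates the $q$-logarithm and checks that the weighted power mean with parameter $p = \beta/2$ approaches the weighted geometric mean as $\beta \to 0$.

```python
import numpy as np

def tsallis_log(x, q):
    """q-logarithm: ln(x) for q = 1, (x**(1-q) - 1)/(1-q) otherwise."""
    if q == 1:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def weighted_power_mean(a, b, t, p):
    """((1-t) a^p + t b^p)^(1/p) for p != 0."""
    return ((1.0 - t) * a ** p + t * b ** p) ** (1.0 / p)

def weighted_geometric_mean(a, b, t):
    """a^(1-t) b^t."""
    return a ** (1.0 - t) * b ** t

# Illustrative values (assumptions): two positive numbers and a weight.
a, b, t = 2.0, 8.0, 0.25
errors = [abs(weighted_power_mean(a, b, t, beta / 2.0)
              - weighted_geometric_mean(a, b, t))
          for beta in (0.5, 0.05, 0.0005)]
print(errors)

# With hat_beta = 1 - beta, -ln_{hat_beta}(d) equals (1 - d**beta)/beta.
d, beta = 1.75, 0.3
identity_gap = abs(-tsallis_log(d, 1.0 - beta) - (1.0 - d ** beta) / beta)
```

The errors decrease with $\beta$, and `identity_gap` is zero up to rounding, which mirrors the identity $\Phi_\beta(X) = -\ln_{\hat\beta}(\det X)$ for a scalar determinant value.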
2. A geometry on $\mathcal{P}_n$ related to the $\beta$-power potential

The logarithmic potential function $\Phi_0(X) = -\ln(\det(X))$ on $\mathcal{P}_n$ is strictly related to the matrix geometric mean. Indeed, the differential of $\Phi_0(X)$ is

$$d\Phi_0(X) = -\operatorname{tr}(X^{-1} dX),$$

and the second differential of $\Phi_0(X)$ is

$$d^2\Phi_0(X) = \operatorname{tr}(X^{-1} dX\, X^{-1} dX), \tag{4}$$

which is a positive-definite quadratic form, whose associated metric is the one defined in (1). The barycenter with respect to (1) is said to be the matrix geometric mean or the Karcher mean.

The one-parameter family of power potential functions (3), which is a generalization of the logarithmic potential in the sense that $\lim_{\beta \to 0} \Phi_\beta(X) = \Phi_0(X)$, provides for all values of $\beta \in (-\infty, 0) \cup (0, 1/n)$ a family of Riemannian metrics that are thus very natural to study. Indeed, since

$$d\Phi_\beta(X) = -(\det X)^{\beta} \operatorname{tr}(X^{-1} dX),$$

we have that the second differential of $\Phi_\beta(X)$ is

$$d^2\Phi_\beta(X) = (\det X)^{\beta} \left( \operatorname{tr}(X^{-1} dX\, X^{-1} dX) - \beta \operatorname{tr}^2(X^{-1} dX) \right), \tag{5}$$

where we have used $d \det(X) = \det(X) \operatorname{tr}(X^{-1} dX)$ and $d(X^{-1}) = -X^{-1} dX\, X^{-1}$.

The quadratic form (5) is positive definite as long as $\beta < 1/n$. To prove this fact, we can write (5) as

$$d^2\Phi_\beta(X) = (\det X)^{\beta} \left( \operatorname{tr}\big((X^{-1/2} dX\, X^{-1/2})^2\big) - \beta \operatorname{tr}^2(X^{-1/2} dX\, X^{-1/2}) \right),$$

and use the following.

Lemma 1. For any $n \times n$ symmetric matrix $S \neq 0$ and $\beta < 1/n$ we have $\operatorname{tr}(S^2) - \beta \operatorname{tr}^2(S) > 0$.

Proof. For $\beta \le 0$ the claim is immediate, since $\operatorname{tr}(S^2) > 0$. For $0 < \beta < 1/n$ it follows from the Cauchy-Schwarz inequality:

$$\beta \operatorname{tr}^2(S) = \beta \langle S, I \rangle^2 \le \beta \langle I, I \rangle \langle S, S \rangle = \beta n \operatorname{tr}(S^2) < \operatorname{tr}(S^2).$$

The Riemannian metric associated with (5) at the base point $X \in \mathcal{P}_n$ is given by

$$g^\beta_X(A, B) := (\det X)^{\beta} \left( \operatorname{tr}(X^{-1} A X^{-1} B) - \beta \operatorname{tr}(X^{-1} A) \operatorname{tr}(X^{-1} B) \right), \tag{6}$$

where $A$ and $B$ are points of the tangent space to $\mathcal{P}_n$ at $X$, identified as usual with $\mathcal{S}_n$.

We will denote by $\mathcal{M}_n^\beta$ the Riemannian manifold obtained by endowing $\mathcal{P}_n$ with the metric (6). In Section 3, an explicit expression for the geodesics of $\mathcal{M}_n^\beta$ will be provided, but first we describe a wide set of isometries of $\mathcal{M}_n^\beta$.

Lemma 2.
Let $M \in GL(n, \mathbb{R})$ be such that $\det(M) = \pm 1$. The function $f : \mathcal{P}_n \to \mathcal{P}_n$ such that $f(X) = M X M^T$ is an isometry of $\mathcal{P}_n$ endowed with the metric (6).

Proof. Since $f$ is linear, we have that $Df(X)[H] = M H M^T$ for any positive-definite matrix $X$ and symmetric matrix $H$. Hence,

$$g^\beta_{f(X)}(Df(X)[A], Df(X)[B]) = g^\beta_{MXM^T}(M A M^T, M B M^T) = (\det(M)^2)^{\beta}\, g^\beta_X(A, B) = g^\beta_X(A, B),$$

for any $A, B \in \mathcal{S}_n$, and this completes the proof.

We recall a well-known result in Riemannian geometry that characterizes totally-geodesic submanifolds.

Theorem 3 (Thm. 5.1 in [25]). Let $M$ be a Riemannian manifold and $S$ any set of isometries of $M$. Let $F$ be the set of points of $M$ which are left fixed by all elements of $S$. Then each connected component of $F$ is a closed totally-geodesic submanifold of $M$.

This allows us to identify two totally-geodesic submanifolds of $\mathcal{M}_n^\beta$ that are useful in the following.

Corollary 4. The set of positive-definite diagonal matrices of size $n$ and the set $\{\alpha I : \alpha > 0\}$ of positive scalar matrices of size $n$ are totally-geodesic submanifolds of the Riemannian manifold $\mathcal{P}_n$ endowed with the metric (6).

Proof. Let $M_i \in \mathbb{R}^{n \times n}$ be the diagonal matrix such that $(M_i)_{ii} = -1$ and $(M_i)_{jj} = 1$, for $j \neq i$. By Lemma 2, the map $X \to M_i X M_i^T$ is an isometry of $\mathcal{M}_n^\beta$, and its fixed points are the positive-definite matrices whose $i$-th row and column are 0 except in the position $i$.

The common fixed points of all the isometries of the set $S = \{M_1, \ldots, M_n\}$ are the positive diagonal matrices, and by Theorem 3 they form a totally-geodesic submanifold of $\mathcal{M}_n^\beta$.

Now consider, for $i = 2, \ldots, n$, the matrix $N_i \in \mathbb{R}^{n \times n}$ that permutes the components 1 and $i$ of a vector in $\mathbb{R}^n$. We have that $(N_i)_{h\ell} = 1$ for (a) $h = \ell$, with $h \notin \{1, i\}$; (b) $h = 1$, $\ell = i$; and (c) $h = i$, $\ell = 1$; and $(N_i)_{h\ell} = 0$ elsewhere. A fixed point $X$ of the isometry $X \to N_i X N_i^T$ is such that $X_{11} = X_{ii}$.

The fixed points common to the isometries in the set $T = \{M_1, \ldots, M_n, N_2, \ldots, N_n\}$ are the diagonal matrices with constant diagonal, namely the positive scalar matrices, which by the aforementioned theorem form a totally-geodesic submanifold of $\mathcal{M}_n^\beta$.
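The metric (6) and the isometry of Lemma 2 can be checked numerically. The sketch below is an illustration rather than the authors' code: the function name `metric` and all sample matrices are assumptions. It also evaluates the form at the excluded value $\beta = 1/n$ in the direction $A = X$, where $\operatorname{tr}(I) - \tfrac{1}{n}\operatorname{tr}^2(I) = 0$, which shows that the bound $\beta < 1/n$ in Lemma 1 cannot be relaxed.

```python
import numpy as np

def metric(X, A, B, beta):
    """Metric (6): det(X)^beta (tr(X^-1 A X^-1 B) - beta tr(X^-1 A) tr(X^-1 B))."""
    XA = np.linalg.solve(X, A)  # X^{-1} A
    XB = np.linalg.solve(X, B)  # X^{-1} B
    form = np.trace(XA @ XB) - beta * np.trace(XA) * np.trace(XB)
    return np.linalg.det(X) ** beta * form

# Illustrative data (assumptions): n = 3 and a value beta < 1/n.
n, beta = 3, 0.2
X = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])     # SPD base point
A = np.array([[1.0, -2.0, 0.5], [-2.0, 0.0, 1.0], [0.5, 1.0, -3.0]])  # symmetric tangent vector
B = np.array([[0.0, 1.0, 1.0], [1.0, 2.0, 0.0], [1.0, 0.0, -1.0]])    # symmetric tangent vector

# An invertible matrix rescaled so that |det M| = 1, as required by Lemma 2.
M = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
M = M / abs(np.linalg.det(M)) ** (1.0 / n)

lhs = metric(M @ X @ M.T, M @ A @ M.T, M @ B @ M.T, beta)  # metric after congruence
rhs = metric(X, A, B, beta)                                # metric before congruence
positive = metric(X, A, A, beta)       # positive by Lemma 1
degenerate = metric(X, X, X, 1.0 / n)  # zero (up to rounding) at beta = 1/n
print(lhs, rhs, positive, degenerate)
```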
3. Geodesics of $\mathcal{M}_n^\beta$

We show that there exists a unique geodesic curve for the metric (5) joining two positive-definite matrices $A$ and $B$, under some conditions on $\beta$, and we provide an explicit expression for the geodesic.

In the scalar case, the geodesic curve can be interpreted as the weighted $p$-power mean, for $p = \beta/2$, of two positive numbers $a, b$, namely $((1-t)a^{p} + t b^{p})^{1/p}$, where $t \in [0, 1]$ is the weight and $p$ is a nonzero real number. This is a very nice feature, since the geodesic for the metric (4), corresponding to the logarithmic potential, is the weighted geometric mean $a^{1-t} b^{t}$, which, for a given $t$, is the limit for $p \to 0$ of the weighted $p$-power mean.

We show that a similar result holds when $A$ and $B$ are two linearly dependent matrices, where the geodesics turn out to be $((1-t)A^{p} + tB^{p})^{1/p}$, for $p = n\beta/2$. For linearly independent $A$ and $B$, the explicit expression is much more complicated and is obtained by solving analytically the differential equation that it satisfies.

Let $\mathcal{C}_{A,B}$ denote the space of all $C^1$-curves from the interval $[0, 1]$ to $\mathcal{P}_n$ that join two points $A$ and $B$ in $\mathcal{P}_n$, i.e.,

$$\mathcal{C}_{A,B} := \left\{ P : [0, 1] \to \mathcal{P}_n,\ P \in C^1([0, 1]) \ \middle|\ P(0) = A,\ P(1) = B \right\}.$$

A geodesic joining $A$ and $B$ in $\mathcal{M}_n^\beta$ is a curve in $\mathcal{C}_{A,B}$ that minimizes the length with respect to the metric of $\mathcal{M}_n^\beta$, that is, the functional

$$L_\beta(P) := \int_0^1 \sqrt{ g^\beta_{P(t)}(P'(t), P'(t)) }\, dt. \tag{7}$$

In the next result we derive a differential equation whose solution provides the geodesics.

Theorem 5.
Let $P : [0, 1] \to \mathcal{P}_n$ be a smooth geodesic on $\mathcal{P}_n$ equipped with the Riemannian metric (6). Then the function $G(t) = P^{-1}(t) P'(t)$ satisfies the differential equation

$$G' = \frac{\beta}{2(1 - n\beta)} \left( \operatorname{tr}(G^2) - \beta \operatorname{tr}^2(G) \right) I - \beta \operatorname{tr}(G)\, G. \tag{8}$$

Proof. It is a known fact in differential geometry (see e.g., [26, p. 17]) that the extremal curves for the length functional $L_\beta$ coincide with the extremal curves for the energy functional defined by

$$E_\beta(P) := \frac{1}{2} \int_0^1 g^\beta_{P(t)}(P'(t), P'(t))\, dt.$$

A customary technique to find the critical curves of $E_\beta$ is to solve the Euler-Lagrange equation associated with $E_\beta$, that is

$$\frac{d}{dt} \frac{\partial L}{\partial P'} - \frac{\partial L}{\partial P} = 0,$$

where $L(P, P') := g^\beta_{P(t)}(P'(t), P'(t))$ is the "Lagrangian" associated with the energy functional $E_\beta$. Direct computation gives

$$\frac{\partial L}{\partial P} = (\det P)^{\beta} \left[ \beta \left( \operatorname{tr}(G^2) - \beta \operatorname{tr}^2(G) \right) I - 2G^2 + 2\beta \operatorname{tr}(G)\, G \right] P^{-1},$$

and

$$\frac{\partial L}{\partial P'} = 2 (\det P)^{\beta} \left[ G - \beta \operatorname{tr}(G) I \right] P^{-1},$$

where $G = P^{-1} P'$. Taking the derivative of the previous equation with respect to $t$, using $\frac{d}{dt} P^{-1} = -G P^{-1}$, and after some work we obtain

$$G' - \beta \operatorname{tr}(G') I = -\beta \operatorname{tr}(G)\, G + \frac{1}{2} \beta^2 \operatorname{tr}^2(G) I + \frac{1}{2} \beta \operatorname{tr}(G^2) I. \tag{9}$$

By applying the trace to both sides of equation (9), we get

$$\operatorname{tr}(G') = \frac{\beta}{2(1 - n\beta)} \left( n \operatorname{tr}(G^2) + (\beta n - 2) \operatorname{tr}^2(G) \right),$$

which, for later convenience, we rewrite as

$$\operatorname{tr}(G') = \frac{n\beta}{2(1 - n\beta)} \left( \operatorname{tr}(G^2) - \beta \operatorname{tr}^2(G) \right) - \beta \operatorname{tr}^2(G). \tag{10}$$

Substituting (10) into (9) yields, after some simple manipulations, equation (8).

To find geodesic curves one can solve equation (8). To this end, we found it useful to decompose $G(t)$ into its isotropic part $\alpha(t) I$, where $\alpha(t) = \frac{1}{n} \operatorname{tr}(G(t))$, and its deviatoric part $\widetilde{G}(t) = G(t) - \alpha(t) I$ (recall that this decomposition is unique for any given symmetric matrix, see e.g. [27]). We then have $\operatorname{tr} G(t) = n\alpha(t)$ and $\operatorname{tr} \widetilde{G}(t) = 0$, and equation (8) can be written as

$$\alpha' = -\frac{n\beta}{2} \left( \alpha^2 - \frac{1}{n(1 - n\beta)} \operatorname{tr}(\widetilde{G}^2) \right), \qquad \widetilde{G}' = -n\beta\, \alpha\, \widetilde{G}. \tag{11}$$

We will get a closed form solution of equation (11) when $G(t)$ is a diagonal matrix. In Section 3.2 we will show that this is not a restriction.

Let $P : [0, 1] \to \mathcal{P}_n$ be a smooth curve such that $P(0) = A$ and $P(1) = B$, for $A, B \in \mathcal{P}_n$. The length of $P$ in $\mathcal{M}_n^\beta$ is defined in (7).

There exists $M \in GL(n)$ such that $M^T A M = I$ and $M^T B M = D$ is a positive-definite diagonal matrix (one can choose $M = A^{-1/2} U$, where $U$ is an orthogonal matrix such that $D := U^T A^{-1/2} B A^{-1/2} U$ is diagonal).

If we set $Q(t) = M^T P(t) M$, then we have

$$\operatorname{tr}(Q(t)^{-1} Q'(t) Q(t)^{-1} Q'(t)) = \operatorname{tr}(P(t)^{-1} P'(t) P(t)^{-1} P'(t)), \qquad \operatorname{tr}^2(Q(t)^{-1} Q'(t)) = \operatorname{tr}^2(P(t)^{-1} P'(t)),$$

and $\det(Q(t))^{\beta} = |\det(M)|^{2\beta} \det(P(t))^{\beta}$, from which we obtain

$$g^\beta_{Q(t)}(Q'(t), Q'(t)) = |\det(M)|^{2\beta}\, g^\beta_{P(t)}(P'(t), P'(t)), \qquad L(Q(t)) = |\det(M)|^{\beta}\, L(P(t)).$$
The constant $|\det(M)|^{\beta} = \det(A)^{-\beta/2}$ does not depend on $P(t)$, but just on $A$. Thus, for any smooth curve $P(t)$ joining $A$ and $B$ there is a smooth curve joining $I$ and $D$ whose length is a scalar multiple of the length of $P(t)$, and the converse is also true.

This shows that, without loss of generality, in order to find the geodesics, we can consider only curves joining the identity and a diagonal matrix with positive diagonal entries.

Before giving the general form of the geodesic, we consider the case in which the endpoints are both multiples of the same matrix or, equivalently, are linearly dependent. This includes, in particular, the case $n = 1$. For these matrices, the geodesic has a simple form and it is related to the scalar weighted power mean.

Lemma 6.
Let $A \in \mathcal{P}_n$. A ray $\{rA,\ r > 0\}$ is a totally-geodesic submanifold of $\mathcal{M}_n^\beta$ and the geodesic joining $r_1 A$ and $r_2 A$ is given by

$$P(t) = \left( (1-t)\, r_1^{n\beta/2} + t\, r_2^{n\beta/2} \right)^{\frac{2}{n\beta}} A, \qquad t \in [0, 1]. \tag{12}$$

Proof. We consider first the case $A = I$. By Corollary 4, the set of scalar matrices is totally geodesic, and then the geodesic joining $r_1 I$ and $r_2 I$ should be of the type $\xi(t) I$, where $\xi(t)$ is a positive number.

In the system (11), we can set $\widetilde{G} = 0$, and we need to solve the two-point boundary-value problem

$$\begin{cases} \alpha' = -\dfrac{n\beta}{2} \alpha^2, \\[4pt] \xi' \xi^{-1} = \alpha, \\[4pt] \xi(0) = r_1, \quad \xi(1) = r_2. \end{cases} \tag{13}$$

The corresponding initial-value problem, for given $\alpha(t_0)$ and $\xi(t_0) > 0$, has a unique local solution for $t_0 \in \mathbb{R}$. If $\alpha(t_0) = 0$, then the unique solution is $\alpha \equiv 0$ and $\xi(t) \equiv \xi(t_0)$.

We can solve the initial-value problem with $\alpha(0) = c \neq 0$ and $\xi(0) = r_1$ by separation of variables, obtaining $\xi(t) = \left( \frac{n\beta}{2} c\, t + 1 \right)^{2/(n\beta)} r_1$, defined in a right neighborhood of 0. Setting $\xi(1) = r_2$, we get $c = \frac{2}{n\beta} \left( (r_2 / r_1)^{n\beta/2} - 1 \right)$. With this choice of $c$, nonzero for $r_1 \neq r_2$, the initial-value problem with $\alpha(0) = c$ and $\xi(0) = r_1$ has a unique solution in $[0, 1]$ with $\xi(1) = r_2$, namely

$$\xi(t) = \left( (1-t)\, r_1^{n\beta/2} + t\, r_2^{n\beta/2} \right)^{\frac{2}{n\beta}},$$

that in turn is the unique solution of the boundary-value problem. For $r_1 = r_2$ the unique solution is $\alpha \equiv 0$ and $\xi(t) \equiv r_1$.

Using the argument of Section 3.2, with $M = A^{-1/2}$, we have that a ray is a totally-geodesic submanifold. In particular, we have $M r_1 A M^T = r_1 I$ and $M r_2 A M^T = r_2 I$, and thus the geodesic joining $r_1 A$ and $r_2 A$ is the curve $A^{1/2} \xi(t) A^{1/2} = \xi(t) A$, that is (12).

Lemma 6 can be restated in the following way:

Corollary 7.
Let $A$ and $B$ be linearly dependent matrices in $\mathcal{P}_n$. Then, there exists a unique geodesic in $\mathcal{M}_n^\beta$ joining $A$ and $B$ given by

$$G_\beta(A, B, t) = \left( (1-t) A^{n\beta/2} + t B^{n\beta/2} \right)^{\frac{2}{n\beta}}, \qquad t \in [0, 1]. \tag{14}$$

Thus, for linearly dependent matrices, the geodesic with respect to the metric in $\mathcal{M}_n^\beta$ is the weighted power mean with parameter $n\beta/2$.

Corollary 8. The geodesic joining $a, b \in \mathcal{P}_1$ endowed with the metric (6) is the weighted power mean with parameter $\beta/2$,

$$G_\beta(a, b, t) = \left( (1-t) a^{\beta/2} + t b^{\beta/2} \right)^{\frac{2}{\beta}}, \qquad t \in [0, 1]. \tag{15}$$

We provide our main results, that is, a general form of the geodesic on $\mathcal{M}_n^\beta$ under suitable conditions. To simplify the exposition, we first need to set some notation and introduce some variables that simplify the expression of the geodesic. For $A, B \in \mathcal{P}_n$ we set $\widetilde{A} = \det(A)^{-1/n} A$, $\widetilde{B} = \det(B)^{-1/n} B$, which are the scaled versions with determinant one of $A$ and $B$, respectively. Let $\widetilde{\mu}_1, \ldots, \widetilde{\mu}_n$ be the eigenvalues of $\widetilde{A}^{-1} \widetilde{B}$, and define

$$\zeta_i = \ln \widetilde{\mu}_i, \qquad i = 1, \ldots, n. \tag{16}$$

Recalling that the Riemannian distance on $\mathcal{M}_n$ between two matrices $M, N \in \mathcal{P}_n$ is

$$\delta(M, N) := \Big( \sum_{i=1}^{n} \ln^2 \lambda_i \Big)^{1/2}, \tag{17}$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $M^{-1} N$, we observe that the norm of the vector $\zeta = [\zeta_1, \ldots, \zeta_n]$ is nothing but the Riemannian distance on $\mathcal{M}_n$ between $\widetilde{A}$ and $\widetilde{B}$, i.e.,

$$\|\zeta\| = \Big( \sum_{i=1}^{n} \ln^2 \widetilde{\mu}_i \Big)^{1/2} = \delta(\widetilde{A}, \widetilde{B}).$$

We are now in a position to define, for $\beta \in (-\infty, 0) \cup (0, 1/n)$, the following measure of linear dependence that will play an important role in subsequent developments:

$$\gamma_\beta(A, B) := \frac{|\beta|\, \|\zeta\|}{2\sqrt{1/n - \beta}} = \frac{|\beta|\, \delta(\widetilde{A}, \widetilde{B})}{2\sqrt{1/n - \beta}} = \frac{|\beta|\, \delta\big(\det(A)^{-1/n} A, \det(B)^{-1/n} B\big)}{2\sqrt{1/n - \beta}}, \tag{18}$$

with $\delta$ as in (17). In fact, $\gamma_\beta(\cdot, \cdot)$ is a distance function on the quotient space $\mathcal{P}_n / \!\sim$, where the equivalence relation is defined as $A \sim B$ if $A$ and $B$ are on the same ray, i.e., are multiples of one another. Some interesting properties of $\gamma_\beta(A, B)$ are summarized in the following.

Lemma 9.
For any $\beta \in (-\infty, 0) \cup (0, 1/n)$, the function $\gamma_\beta : \mathcal{P}_n \times \mathcal{P}_n \to \mathbb{R}$, defined by (18), satisfies for all $A, B, C \in \mathcal{P}_n$:

Symmetry: $\gamma_\beta(A, B) = \gamma_\beta(B, A)$;
Positive definiteness: $\gamma_\beta(A, B) \ge 0$, with equality if and only if $A$ and $B$ are linearly dependent;
Triangle inequality: $\gamma_\beta(A, C) \le \gamma_\beta(A, B) + \gamma_\beta(B, C)$;
Invariance under inversion: $\gamma_\beta(A^{-1}, B^{-1}) = \gamma_\beta(A, B)$;
Invariance under congruence: if $M$ is invertible, then $\gamma_\beta(M^T A M, M^T B M) = \gamma_\beta(A, B)$.

Proof. These properties follow directly from the analogous properties of the Riemannian distance $\delta$, see e.g. [17].

For later use, we note that if $A$ and $B$ are linearly independent then $d := \delta(\widetilde{A}, \widetilde{B}) \neq 0$, and $\beta_1$ and $\beta_2$ given by

$$\beta_1 := -\pi\, \frac{\sqrt{\pi^2 n^2 + 4 n d^2} + \pi n}{2 n d^2}, \qquad \beta_2 := \pi\, \frac{\sqrt{\pi^2 n^2 + 4 n d^2} - \pi n}{2 n d^2}, \tag{19}$$

are well defined and satisfy $\beta_1 < 0 < \beta_2 < 1/n$. They are the two roots of $d^2 \beta^2 + \pi^2 \beta - \pi^2/n = 0$, that is, the values of $\beta$ at which $\gamma_\beta(A, B) = \pi/2$. We have that $0 < \gamma_\beta(A, B) < \pi/2$ for $\beta \in (\beta_1, 0) \cup (0, \beta_2)$. We also note that $\beta_1 \to -\infty$ and $\beta_2 \to 1/n$ as $d \to 0$.

For such values of $\beta$, we have the existence and the explicit expression of the geodesic joining two matrices in $\mathcal{P}_n$. But before stating the main theorem, we give the following two results in the case of diagonal matrices.

Lemma 10. Let $D_A, D_B$ be two diagonal matrices in $\mathcal{P}_n$, $\beta \in (-\infty, 0) \cup (0, 1/n)$ and $\gamma := \gamma_\beta(D_A, D_B)$ as in (18). If $0 < \gamma < \pi/2$, then there exists a unique geodesic $\widetilde{P}(t)$ on $\mathcal{M}_n^\beta$ joining $D_A$ and $D_B$ such that $\widetilde{P}(t) = \operatorname{diag}(\lambda_1(t), \ldots, \lambda_n(t))$, where

$$\lambda_i(t) = \left( \Phi_i(t, \sigma, \zeta_i)\, \lambda_i^{n\beta}(0) + \Phi_i(1-t, \sigma^{-1}, -\zeta_i)\, \lambda_i^{n\beta}(1) \right)^{\frac{1}{n\beta}}, \qquad t \in [0, 1],$$

for $i = 1, \ldots, n$, with

$$\Phi_i(t, \sigma, \zeta_i) = (1-t)(1 - t + t\sigma \cos\gamma) \exp\left( \frac{n\beta}{\gamma} \arctan\left( \frac{t\sigma \sin\gamma}{1 - t + t\sigma \cos\gamma} \right) \zeta_i \right),$$

and

$$\sigma = \det(D_A^{-1} D_B)^{\beta/2}, \qquad \zeta_i = \ln\left( \frac{\lambda_i(1)}{\lambda_i(0)}\, \sigma^{-2/(n\beta)} \right), \qquad \gamma = \frac{|\beta|\, \|\zeta\|}{2\sqrt{1/n - \beta}}.$$

For the convenience of the reader, the rather lengthy proof of this lemma is given in the Appendix. From Lemma 10 we can obtain a simpler expression for the geodesics.
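The dependence measure (18) and the thresholds (19) are simple to evaluate. The following sketch is only illustrative (the function names and sample matrices are assumptions, not from the paper); it computes $d = \delta(\widetilde{A}, \widetilde{B})$ from the eigenvalues of $A^{-1}B$ by centering their logarithms, which is equivalent to rescaling both matrices to unit determinant.

```python
import numpy as np

def unit_det_distance(A, B):
    """d = delta(A~, B~), where A~ and B~ are A and B rescaled to determinant one."""
    L = np.linalg.cholesky(A)                        # A = L L^T
    W = np.linalg.solve(L, np.linalg.solve(L, B).T)  # L^{-1} B L^{-T}, similar to A^{-1} B
    log_mu = np.log(np.linalg.eigvalsh((W + W.T) / 2.0))
    zeta = log_mu - log_mu.mean()                    # logs of the eigenvalues of A~^{-1} B~
    return np.linalg.norm(zeta)

def gamma_beta(A, B, beta):
    """Measure of linear dependence (18), for beta < 1/n and beta != 0."""
    n = A.shape[0]
    return abs(beta) * unit_det_distance(A, B) / (2.0 * np.sqrt(1.0 / n - beta))

def beta_bounds(A, B):
    """Thresholds (beta_1, beta_2) of (19); requires linearly independent A and B."""
    n = A.shape[0]
    d = unit_det_distance(A, B)
    s = np.sqrt(np.pi ** 2 * n ** 2 + 4.0 * n * d ** 2)
    return (-np.pi * (s + np.pi * n) / (2.0 * n * d ** 2),
            np.pi * (s - np.pi * n) / (2.0 * n * d ** 2))

# Hypothetical symmetric positive-definite matrices.
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
B = np.array([[1.0, 0.2, 0.1], [0.2, 3.0, 0.4], [0.1, 0.4, 2.0]])
b1, b2 = beta_bounds(A, B)
print(b1, b2, gamma_beta(A, B, b1), gamma_beta(A, B, b2))
```

At both thresholds the computed value of $\gamma_\beta$ should be $\pi/2$ up to rounding, and rescaling either matrix by a positive constant leaves $\gamma_\beta$ unchanged, as expected of a function defined on rays.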
Corollary 11. Let $D_A, D_B$ be two diagonal matrices in $\mathcal{P}_n$, $\beta \in (-\infty, 0) \cup (0, 1/n)$ and $\gamma := \gamma_\beta(D_A, D_B)$ as in (18). If $0 < \gamma < \pi/2$, then there exists a unique geodesic $\widetilde{P}(t)$ on $\mathcal{M}_n^\beta$ joining $D_A$ and $D_B$ such that $\widetilde{P}(t) = \operatorname{diag}(\lambda_1(t), \ldots, \lambda_n(t))$, where

$$\lambda_i(t) = \left( \frac{(1-t)^2 + \sigma^2 t^2 + 2\sigma t(1-t) \cos\gamma}{\sigma^{2\alpha(t)}} \right)^{\frac{1}{n\beta}} \lambda_i(0)^{1 - \alpha(t)}\, \lambda_i(1)^{\alpha(t)}, \qquad t \in [0, 1], \tag{20}$$

with $\sigma = \det(D_A^{-1} D_B)^{\beta/2}$ and

$$\alpha(t) = \frac{1}{\gamma} \arctan\left( \frac{t\sigma \sin\gamma}{1 - t + t\sigma \cos\gamma} \right).$$

Proof. The geodesic in Lemma 10 can be written as

$$\lambda_i(t) = \lambda_i(0) \left( (1-t)(1 - t + t\sigma \cos\gamma)\, (e^{\zeta_i})^{n\beta \alpha_1(t)} + t\Big(t + (1-t) \frac{1}{\sigma} \cos\gamma\Big)\, (e^{\zeta_i})^{-n\beta \alpha_2(t)} \Big( \frac{\lambda_i(1)}{\lambda_i(0)} \Big)^{n\beta} \right)^{\frac{1}{n\beta}},$$

where

$$\alpha_1(t) = \frac{1}{\gamma} \arctan \varphi_1(t), \qquad \alpha_2(t) = \frac{1}{\gamma} \arctan \varphi_2(t),$$

with

$$\varphi_1(t) = \frac{t\sigma \sin\gamma}{1 - t + t\sigma \cos\gamma}, \qquad \varphi_2(t) = \frac{(1-t) \sin\gamma}{t\sigma + (1-t) \cos\gamma}.$$

We note that $\alpha_1(t) + \alpha_2(t) = 1$. This result can be obtained by observing that the arguments of the arctangent, $\varphi_1(t)$ and $\varphi_2(t)$, are such that $\varphi_1(t) \varphi_2(t) < 1$, since $\cos\gamma > 0$, because $0 < \gamma < \pi/2$. Hence, by the addition formula for the arctangent, $\arctan \varphi_1(t) + \arctan \varphi_2(t) = \arctan \tan\gamma = \gamma$, that in turn yields $\alpha_1(t) + \alpha_2(t) = 1$. We further observe that

$$\Big( \frac{\lambda_i(1)}{\lambda_i(0)} \Big)^{n\beta} = (e^{\zeta_i})^{n\beta} \sigma^2, \qquad \text{or equivalently} \qquad e^{\zeta_i} = \frac{\lambda_i(1)}{\lambda_i(0)}\, \sigma^{-2/(n\beta)}.$$

Then, using these two observations, we obtain

$$\lambda_i(t) = \lambda_i(0) \left( (1-t)^2 + 2t(1-t)\sigma \cos\gamma + t^2 \sigma^2 \right)^{\frac{1}{n\beta}} (e^{\zeta_i})^{\alpha_1(t)} = \left( (1-t)^2 + 2t(1-t)\sigma \cos\gamma + t^2 \sigma^2 \right)^{\frac{1}{n\beta}} \sigma^{-\frac{2\alpha_1(t)}{n\beta}}\, \lambda_i(0)^{1 - \alpha(t)}\, \lambda_i(1)^{\alpha(t)},$$

where we have set $\alpha(t) = 1 - \alpha_2(t) = \alpha_1(t)$.

Based on the reduction to the diagonal case we are now in a position to give the geodesic in the general case.

Theorem 12.
Let $A, B \in \mathcal{P}_n$. If $A$ and $B$ are linearly independent, then setting $\gamma := \gamma_\beta(A, B)$, as in (18), we have that for $\beta \in (\beta_1, 0) \cup (0, \beta_2)$, where $\beta_1$ and $\beta_2$ are defined in (19), there exists a unique geodesic on $\mathcal{M}_n^\beta$ joining $A$ and $B$, whose explicit expression is

$$G_\beta(A, B, t) = \eta(t)\, (A \#_{\alpha(t)} B) = \eta(t)\, A (A^{-1} B)^{\alpha(t)}, \qquad t \in [0, 1], \tag{21}$$

where

$$\alpha(t) := \frac{1}{\gamma} \arctan\Big( \frac{t\sigma \sin\gamma}{1 - t + t\sigma \cos\gamma} \Big), \qquad \eta(t) := \Big( \frac{(1-t)^2 + 2t(1-t)\sigma \cos\gamma + t^2 \sigma^2}{\sigma^{2\alpha(t)}} \Big)^{\frac{1}{n\beta}}, \tag{22}$$

with $\sigma = \det(A^{-1} B)^{\beta/2}$.

If $A$ and $B$ are linearly dependent, then for $\beta \in (-\infty, 0) \cup (0, 1/n)$, there exists a unique geodesic on $\mathcal{M}_n^\beta$ joining $A$ and $B$, whose explicit expression is

$$G_\beta(A, B, t) = \left( (1-t) A^{n\beta/2} + t B^{n\beta/2} \right)^{\frac{2}{n\beta}}, \qquad t \in [0, 1]. \tag{23}$$

Proof. When $A$ and $B$ are linearly dependent we can use Corollary 7. Assuming that the two matrices are independent, we can use the argument of Section 3.2: in order to obtain a geodesic $G_\beta(A, B, t)$ on $\mathcal{M}_n^\beta$ joining $A$ and $B$, it is enough to find a geodesic $\widetilde{P}(t) = \operatorname{diag}(\lambda_1(t), \ldots, \lambda_n(t))$ joining the two diagonal matrices $I = M^T A M$ and $D = M^T B M$ and then apply the inverse congruence to $\widetilde{P}(t)$. If $\beta \in (\beta_1, 0) \cup (0, \beta_2)$, that is $0 < \gamma < \pi/2$, then the geodesic is obtained using Corollary 11, with $D_A = I$ and $D_B = D = \operatorname{diag}(\mu)$, where $\mu = [\mu_1, \ldots, \mu_n]$ is the vector of the eigenvalues of $A^{-1} B$.

Setting $\eta(t)$ as in (22) and using the expression provided in (20) and the fact that $\lambda_i(0) = 1$, for $i = 1, \ldots, n$, we obtain $\lambda_i(t) = \eta(t)\, \mu_i^{\alpha(t)}$, and thus

$$\begin{aligned} G_\beta(A, B, t) &= M^{-T} \widetilde{P}(t) M^{-1} = \eta(t)\, M^{-T} M^{-1} M \operatorname{diag}(\mu_1^{\alpha(t)}, \ldots, \mu_n^{\alpha(t)}) M^{-1} \\ &= \eta(t)\, M^{-T} M^{-1} \big( M \operatorname{diag}(\mu_1, \ldots, \mu_n) M^{-1} \big)^{\alpha(t)} = \eta(t)\, A (A^{-1} B)^{\alpha(t)} = \eta(t)\, (A \#_{\alpha(t)} B), \end{aligned}$$

where the third equality is obtained by using the commutativity of primary matrix functions with similarities (applied to the fractional power of matrices).

Theorem 12 provides the restrictions on $\beta$ guaranteeing that a geodesic connecting two given matrices $A$ and $B$ exists. Now, for a given $\beta$, we provide conditions on $A$ and $B$ that ensure the existence of a geodesic joining them.

Theorem 13.
Let $A, B \in \mathcal{P}_n$, $\beta \in (-\infty, 0) \cup (0, 1/n)$, and $\gamma := \gamma_\beta(A, B)$ as in (18). If $\gamma < \pi/2$, then there exists a unique geodesic $G_\beta(A, B, t)$ on $\mathcal{M}_n^\beta$ joining $A$ and $B$, whose explicit expression, for $\gamma \neq 0$, is given in (21), while, for $\gamma = 0$, it is given in (23).

Remark 14. Note that

$$\tan(\gamma \alpha(t)) = \frac{t\sigma \sin\gamma}{1 - t + t\sigma \cos\gamma}.$$

Therefore, $\gamma\alpha(t)$ can be interpreted geometrically as the angle of a right triangle with adjacent side of length $1 - t + t\sigma \cos\gamma$ and opposite side of length $t\sigma \sin\gamma$ (see Fig. 1). Let $\ell(t)$ denote the length of the hypotenuse, i.e.,

$$\ell(t) := \sqrt{(1 - t + t\sigma \cos\gamma)^2 + (t\sigma \sin\gamma)^2}.$$

Then, as $\alpha'(t) = \dfrac{\sigma \sin\gamma}{\gamma\, \ell^2(t)}$, the angle $\gamma\alpha(t)$ is monotonically increasing from 0 to $\gamma$ as $t$ varies from 0 to 1.

Figure 1: Geometric interpretation of $\alpha(t)$ (right triangle with legs $1 - t + t\sigma \cos\gamma$ and $t\sigma \sin\gamma$, hypotenuse $\ell(t)$, and angle $\gamma\alpha(t)$).

We further observe that the function $\eta(t)$ can then be compactly written as

$$\eta(t) = \left( \frac{\ell(t)}{\sigma^{\alpha(t)}} \right)^{\frac{2}{n\beta}}.$$

4. A new power mean for positive definite matrices

The geodesic given in Theorem 12 can be seen as a new possible generalization of the power mean to matrices. While for linearly dependent matrices the expression for $G_\beta(A, B, t)$ coincides with the "power-Euclidean" mean

$$P_p(A, B, t) := ((1-t) A^{p} + t B^{p})^{1/p},$$

for $p = n\beta/2$, and with the power mean of Lim and Pálfia [4]

$$Q_p(A, B, t) := A \#_{1/p} \big( (1-t) A + t (A \#_p B) \big),$$

for linearly independent matrices it is different. This is easily seen by considering $I$ and a diagonal matrix $D = \operatorname{diag}(d_{11}, \ldots, d_{nn})$ that is not a multiple of $I$. For $I$ and $D$ the power-Euclidean and the power mean of parameter $p = n\beta/2$ are diagonal matrices with diagonal entries $(1 - t + t\, d_{ii}^{p})^{1/p}$, while a diagonal entry of the proposed mean is obtained by using all diagonal entries of $D$.

Example 15.
Let $A = I$ and $B = [\,\ldots\,]$, and let $\beta = -\ldots$, $p = -\ldots$, $t = 0.\ldots$ We have

$$P_p(A, B, t) = Q_p(A, B, t) = [\,\ldots\,], \qquad G_\beta(A, B, t) \approx [\,\ldots\,],$$

where the latter has been obtained numerically, approximating the result to 6 significant digits.

Even if the proposed mean does not reduce to the straightforward power mean for commuting matrices, it fulfills a number of interesting properties of a power mean.

Before listing the properties, we recall here the following properties of the weighted geometric mean that will be used frequently in the sequel. It is remarkable that the $t$-weighted mean $G_\beta(A, B, t)$ of $A$ and $B$ is a scalar multiple of the $\alpha(t)$-weighted geometric mean of $A$ and $B$. This further exhibits the intimate relation between the proposed mean and the geometric mean.

Lemma 16. Let $A, B \in \mathcal{P}_n$ and let $A \#_t B = A (A^{-1} B)^{t}$, for $t \in [0, 1]$, be their weighted geometric mean. We have:

1. $(M^T A M) \#_t (M^T B M) = M^T (A \#_t B) M$, for $M$ invertible and $t \in [0, 1]$;
2. $A \#_t B = B \#_{1-t} A$, for $t \in [0, 1]$;
3. $A^{-1} \#_t B^{-1} = A^{-1} (A \#_{1-t} B) B^{-1} = B^{-1} (A \#_{1-t} B) A^{-1}$, for $t \in [0, 1]$;
4. $(aA) \#_t (bB) = a^{1-t} b^{t} (A \#_t B)$, for $a, b > 0$ and $t \in [0, 1]$.

The quantities $\sigma$, $\gamma$, and the functions $\alpha(t)$, $\eta(t)$ appearing in Theorem 12 depend on $A$ and $B$. When needed, we will use the notation $\sigma(A, B)$, $\gamma(A, B)$, $\alpha(A, B, t)$, $\eta(A, B, t)$ to make this dependence explicit. We give in the following lemma some of their symmetry and invariance properties that will be useful in the proof of the properties of the mean $G_\beta(A, B, t)$.

Lemma 17.
The functions $\sigma(A, B)$, $\gamma(A, B)$, $\alpha(A, B, t)$, $\eta(A, B, t)$ satisfy the following properties:

$$\begin{aligned} \sigma(A, B) &= \sigma(B, A)^{-1} = \sigma(B^{-1}, A^{-1}) = \sigma(M^T A M, M^T B M), \\ \gamma(A, B) &= \gamma(B, A) = \gamma(B^{-1}, A^{-1}) = \gamma(M^T A M, M^T B M), \\ \alpha(A, B, t) &= 1 - \alpha(B, A, 1-t) = 1 - \alpha(A^{-1}, B^{-1}, 1-t) = \alpha(M^T A M, M^T B M, t), \\ \eta(A, B, t) &= \eta(B, A, 1-t) = \eta(A^{-1}, B^{-1}, 1-t) = \eta(M^T A M, M^T B M, t), \end{aligned}$$

for any invertible matrix $M \in \mathbb{R}^{n \times n}$ and $t \in [0, 1]$. Furthermore, for $t \in [0, 1]$ and $a, b > 0$, we have

$$\begin{aligned} \sigma(aA, bB) &= q\, \sigma(A, B), \\ \gamma(aA, bB) &= \gamma(A, B), \\ \alpha(aA, bB, t) &= \alpha(A, B, \widetilde{t}\,), \\ \eta(aA, bB, t) &= \frac{\big( (1-t) a^{n\beta/2} + t b^{n\beta/2} \big)^{2/(n\beta)}}{a^{1 - \alpha(A, B, \widetilde{t}\,)}\, b^{\alpha(A, B, \widetilde{t}\,)}}\, \eta(A, B, \widetilde{t}\,), \end{aligned}$$

where $q = (b/a)^{n\beta/2}$ and $\widetilde{t} = \dfrac{tq}{1 - t + tq}$.

Proof. The first set of properties follows directly from the definitions of $\sigma$, $\gamma$, $\alpha(t)$, $\eta(t)$ (and from the proof of Corollary 11 for $\alpha(t)$).

For the second set of properties, we have

$$\frac{\widetilde{t}\, \sigma(A, B) \sin\gamma(A, B)}{1 - \widetilde{t} + \widetilde{t}\, \sigma(A, B) \cos\gamma(A, B)} = \frac{t\, \sigma(aA, bB) \sin\gamma(aA, bB)}{1 - t + t\, \sigma(aA, bB) \cos\gamma(aA, bB)},$$

from which it follows that $\alpha(A, B, \widetilde{t}\,) = \alpha(aA, bB, t)$, and

$$\frac{(1 - \widetilde{t}\,)^2 + 2\widetilde{t}(1 - \widetilde{t}\,)\, \sigma(A, B) \cos\gamma(A, B) + \widetilde{t}^{\,2} \sigma(A, B)^2}{\sigma(A, B)^{2\alpha(A, B, \widetilde{t}\,)}} = \frac{(1-t)^2 + 2t(1-t)\, \sigma(aA, bB) \cos\gamma(aA, bB) + t^2 \sigma(aA, bB)^2}{\sigma(aA, bB)^{2\alpha(aA, bB, t)}\, q^{-2\alpha(aA, bB, t)}\, (1 - t + tq)^2},$$

from which we get

$$\eta(A, B, \widetilde{t}\,) = \eta(aA, bB, t)\, \frac{(b/a)^{\widetilde{\alpha}}\, a}{\big( (1-t) a^{n\beta/2} + t b^{n\beta/2} \big)^{2/(n\beta)}},$$

where $\widetilde{\alpha} := \alpha(A, B, \widetilde{t}\,) = \alpha(aA, bB, t)$.

We will use the properties of the weighted geometric mean of two matrices listed in Lemma 16 and the results of Lemma 17 to prove some properties of the proposed mean $G_\beta(A, B, t)$; then we will discuss some asymptotic properties.
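For experimentation, the closed form (21)-(22) for linearly independent matrices can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `power_geodesic`, the sample matrices, and the use of a Cholesky factor $A = LL^T$ (so that $A \#_\alpha B = L (L^{-1} B L^{-T})^{\alpha} L^T$ by the congruence property of the weighted geometric mean) are choices made here. The linearly dependent case (23) is not covered.

```python
import numpy as np

def power_geodesic(A, B, t, beta):
    """Sketch of G_beta(A, B, t) from (21)-(22); needs 0 < gamma_beta(A, B) < pi/2."""
    n = A.shape[0]
    L = np.linalg.cholesky(A)                        # A = L L^T
    W = np.linalg.solve(L, np.linalg.solve(L, B).T)  # L^{-1} B L^{-T}, similar to A^{-1} B
    mu, U = np.linalg.eigh((W + W.T) / 2.0)          # eigenvalues of A^{-1} B
    log_mu = np.log(mu)
    zeta = log_mu - log_mu.mean()                    # vector zeta of (16)
    gamma = abs(beta) * np.linalg.norm(zeta) / (2.0 * np.sqrt(1.0 / n - beta))
    if not 0.0 < gamma < np.pi / 2.0:
        raise ValueError("formula (21) requires 0 < gamma < pi/2")
    sigma = np.exp(0.5 * beta * log_mu.sum())        # det(A^{-1} B)^(beta/2)
    x = 1.0 - t + t * sigma * np.cos(gamma)          # adjacent side in Remark 14
    y = t * sigma * np.sin(gamma)                    # opposite side in Remark 14
    alpha = np.arctan2(y, x) / gamma                 # alpha(t); x > 0 since cos(gamma) > 0
    eta = ((x * x + y * y) / sigma ** (2.0 * alpha)) ** (1.0 / (n * beta))
    V = L @ U
    return eta * (V * mu ** alpha) @ V.T             # eta * L W^alpha L^T

# Hypothetical data: two SPD matrices and a value beta in (0, 1/n).
beta = 0.1
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
B = np.array([[1.0, 0.2, 0.1], [0.2, 3.0, 0.4], [0.1, 0.4, 2.0]])
P_mid = power_geodesic(A, B, 0.5, beta)
print(P_mid)
```

With these definitions, `power_geodesic(A, B, 0.0, beta)` and `power_geodesic(A, B, 1.0, beta)` return $A$ and $B$ up to rounding, and a finite-difference evaluation of $G = P^{-1}P'$ along the curve can be compared with the right-hand side of (8) as a sanity check.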
Theorem 18.
Let $\beta \in (-\infty,0)\cup(0,1/n)$, $A, B \in \mathcal{P}_n$ such that $0 < \gamma < \pi/2$, with $\gamma := \gamma_\beta(A,B)$ as in (18). The following properties hold.

1. $G_\beta(M^TAM, M^TBM, t) = M^T G_\beta(A,B,t) M$, for any $M \in \mathbb{R}^{n\times n}$ invertible and $t \in [0,1]$.
2. $G_\beta(A,B,t) = G_\beta(B,A,1-t)$, for $t \in [0,1]$.
3. $G_\beta(A^{-1},B^{-1},t) = A^{-1}G_\beta(A,B,1-t)B^{-1} = A^{-1}G_\beta(B,A,t)B^{-1} = B^{-1}G_\beta(B,A,t)A^{-1} = B^{-1}G_\beta(A,B,1-t)A^{-1}$, for $t \in [0,1]$.
4. $\dfrac{G_\beta(A^{-1},B^{-1},t)}{\eta(A^{-1},B^{-1},t)} = \Big(\dfrac{G_\beta(A,B,\bar t)}{\eta(A,B,\bar t)}\Big)^{-1}$, where for $t \in [0,1]$, $\bar t$ is such that $\alpha(A,B,\bar t) = 1-\alpha(A,B,1-t)$.
5. $G_\beta(aA,bB,t) = \big((1-t)a^{n\beta/2}+t\,b^{n\beta/2}\big)^{2/(n\beta)}\, G_\beta\Big(A,B,\dfrac{t\,b^{n\beta/2}}{(1-t)a^{n\beta/2}+t\,b^{n\beta/2}}\Big) = \big((1-t)a^{n\beta/2}+t\,b^{n\beta/2}\big)^{2/(n\beta)}\, G_\beta\Big(B,A,\dfrac{(1-t)a^{n\beta/2}}{(1-t)a^{n\beta/2}+t\,b^{n\beta/2}}\Big)$, for $a, b$ positive reals and $t \in [0,1]$.
6. $G_\beta(A,B,1/(1+\sigma)) = \Big(\dfrac{2\sqrt{\sigma}\cos(\gamma/2)}{1+\sigma}\Big)^{2/(n\beta)} A \,\#_{1/2}\, B$, where $\sigma = \det(A^{-1}B)^{\beta/2}$.

Proof. Using the results of Lemma 17 we have:

1. $G_\beta(M^TAM, M^TBM, t) = \eta(M^TAM, M^TBM, t)\big((M^TAM) \,\#_{\alpha(M^TAM,M^TBM,t)}\, (M^TBM)\big) = \eta(A,B,t)\, M^T(A \,\#_{\alpha(A,B,t)}\, B)M = M^T G_\beta(A,B,t) M.$

2. $G_\beta(B,A,1-t) = \eta(A,B,t)(B \,\#_{1-\alpha(A,B,t)}\, A) = \eta(A,B,t)(A \,\#_{\alpha(A,B,t)}\, B) = G_\beta(A,B,t).$

3. $G_\beta(A^{-1},B^{-1},t) = \eta(B,A,t)(A^{-1} \,\#_{\alpha(B,A,t)}\, B^{-1}) = A^{-1}\big(\eta(B,A,t)(A \,\#_{1-\alpha(B,A,t)}\, B)\big)B^{-1}$, from which the proof follows using property 3 in Lemma 16.

4. Since $G_\beta(A^{-1},B^{-1},t) = A^{-1}G_\beta(A,B,1-t)B^{-1}$, we get
$$\begin{aligned}
G_\beta(A^{-1},B^{-1},t) &= A^{-1}\,\eta(A,B,1-t)\,A(A^{-1}B)^{\alpha(A,B,1-t)}B^{-1} = \eta(B,A,t)(A^{-1}B)^{1-\alpha(B,A,t)}B^{-1}\\
&= \eta(A^{-1},B^{-1},t)(B^{-1}A)^{\alpha(B,A,t)}A^{-1} = \eta(A^{-1},B^{-1},t)\big(A(A^{-1}B)^{1-\alpha(A,B,1-t)}\big)^{-1}\\
&= \eta(A^{-1},B^{-1},t)\Big(\frac{G_\beta(A,B,\bar t)}{\eta(A,B,\bar t)}\Big)^{-1}.
\end{aligned}$$
5. With the notation of Lemma 17 and its proof, we get
$$G_\beta(A,B,\tilde t) = \eta(A,B,\tilde t)(A \,\#_{\tilde\alpha}\, B) = \frac{\eta(aA,bB,t)\,a^{1-\tilde\alpha}b^{\tilde\alpha}(A \,\#_{\tilde\alpha}\, B)}{\big((1-t)a^{n\beta/2}+t\,b^{n\beta/2}\big)^{2/(n\beta)}} = \frac{G_\beta(aA,bB,t)}{\big((1-t)a^{n\beta/2}+t\,b^{n\beta/2}\big)^{2/(n\beta)}},$$
where we have used also item 4 of Lemma 16.

6. For $t = 1/(1+\sigma)$, we have $\alpha(t) = 1/2$

$Q_p(A,B,t) = (Q_{-p}(A^{-1},B^{-1},t))^{-1}$, that is very different. Moreover, it is unclear whether the new mean has some property related to the natural order of positive-definite matrices.

Next, we provide some asymptotic properties of the new mean.

Theorem 19. For all $A, B \in \mathcal{P}_n$ we have
$$\lim_{\beta\to 0} G_\beta(A,B,t) = A \,\#_t\, B.$$

Proof.
With no loss of generality we can assume that $A$ and $B$ are linearly independent. For $\beta$ sufficiently small, $G_\beta(A,B,t)$ is well defined, so that we can take the limit.

Since both $G_\beta(A,B,t)$ and the weighted geometric mean commute with congruences, we can assume that $A = I$ and $B = D = \operatorname{diag}(d_1,\dots,d_n)$ and prove that $G_\beta(I,D,t) \to I \,\#_t\, D = D^t$. In this case $G_\beta(A,B,t) = \operatorname{diag}(\lambda_1(t),\dots,\lambda_n(t))$, and it is enough to prove that $\lambda_i(t) \to d_i^t$ for $i = 1,\dots,n$.

Setting $\omega_1 = \ln\det(D)$, we observe that, as $\beta \to 0$,
$$\sigma = 1 + \omega_1\beta/2 + o(\beta), \qquad \gamma = \omega_2\beta + o(\beta),$$
where $\omega_2$ is a nonzero constant. From this we deduce that for $\beta \to 0$
$$t\sigma\sin\gamma = t\gamma + o(\beta), \quad 1-t+t\sigma\cos\gamma = 1+o(1), \quad \alpha(t) = t+o(1), \quad d_i^{\alpha} = d_i^t + o(1),$$
and, analogously,
$$(1-t)^2 + 2t(1-t)\sigma\cos\gamma + t^2\sigma^2 = 1+\omega_1\beta t + o(\beta), \quad \sigma^{2\alpha} = 1+\omega_1\beta t + o(\beta), \quad \eta(t) = 1+o(1).$$
Finally, since $\lambda_i(0) = 1$ and $\lambda_i(1) = d_i$, we have
$$\lambda_i(t) = \lambda_i(0)^{1-\alpha(t)}\lambda_i(1)^{\alpha(t)}\eta(t) = d_i^t + o(1),$$
which is what we wanted to prove.

In Theorem 12, there is a distinction between the case in which $A$ and $B$ are linearly dependent and the case in which they are not, and the form of the geodesic is different. One might ask whether this mean is continuous with respect to $A$ and $B$. For this reason, we show that, for given $t$ and $\beta$, when the two matrices approach a pair of linearly dependent matrices, the formula (21) tends to (23).

Theorem 20.
Let $M \in \mathcal{P}_n$, $N_1, N_2 \in \mathcal{S}_n$ and $\omega_1, \omega_2$ be positive constants. Let $A(\tau) = \omega_1 M + \tau N_1$ and $B(\tau) = \omega_2 M + \tau N_2$ be two curves on $\mathcal{P}_n$, defined in a neighborhood of $0$ and such that $A(\tau)$ and $B(\tau)$ are linearly independent for $\tau \neq 0$. We have that
$$\lim_{\tau\to 0} G_\beta(A(\tau),B(\tau),t) = G_\beta(A(0),B(0),t),$$
for $\beta \in (-\infty,0)\cup(0,1/n)$.

Proof. There exist continuous functions $\tilde\mu_1(\tau),\dots,\tilde\mu_n(\tau)$ that are the eigenvalues of $\tilde A(\tau)^{-1}\tilde B(\tau)$, where as usual $\tilde A(\tau) = \det(A(\tau))^{-1/n}A(\tau)$, $\tilde B(\tau) = \det(B(\tau))^{-1/n}B(\tau)$. Hence, the function
$$\gamma(\tau) := \gamma_\beta(A(\tau),B(\tau)) = \frac{|\beta|}{2\sqrt{1/n-\beta}}\Big(\sum_i \ln^2\tilde\mu_i(\tau)\Big)^{1/2}$$
is continuous and $\lim_{\tau\to 0}\gamma(\tau) = 0$, since $A(0)$ and $B(0)$ are linearly dependent. There exists a neighborhood of $0$ such that $\gamma(\tau) < \pi/2$, so that $G_\beta(A(\tau),B(\tau),t)$ is well defined in this neighborhood.

Since the mean commutes with congruences, without loss of generality, we can assume that $M = I$ and $\omega_1 = 1$. We have that
$$\sigma(\tau) = \det(A(\tau)^{-1}B(\tau))^{\beta/2}$$
is continuous and $\lim_{\tau\to 0}\sigma(\tau) = \omega_2^{n\beta/2} =: \omega$.

As $\tau$ approaches $0$, we have the power series expansions
$$\sigma(\tau) = \omega + \frac{\omega\beta}{2}\operatorname{tr}(N)\,\tau + o(\tau), \qquad \tilde\mu_i(\tau) = 1 + \hat\mu_i\tau + o(\tau),$$
where $N := N_2/\omega_2 - N_1$ and $\hat\mu_i$ are the eigenvalues of $N - \frac{1}{n}\operatorname{tr}(N)I$, and moreover
$$\gamma(\tau) = \frac{|\beta|}{2\sqrt{1/n-\beta}}\,\big\|N - \operatorname{tr}(N)I/n\big\|_F\,|\tau| + o(\tau), \qquad \alpha = \frac{t\omega}{1-t+t\omega} + o(1).$$
Thus
$$\big((1-t)^2 + 2t(1-t)\sigma\cos\gamma + t^2\sigma^2\big)^{\frac{1}{n\beta}} = \big(1-t+t\,\omega_2^{n\beta/2}\big)^{\frac{2}{n\beta}} + o(1), \qquad \sigma^{\frac{2\alpha}{n\beta}} = \omega_2^{\alpha} + o(1),$$
and
$$A \,\#_\alpha\, B = (I+\tau N_1)\big((I+\tau N_1)^{-1}(\omega_2 I+\tau N_2)\big)^{\alpha} = (I+\tau N_1)\big(\omega_2 I + \tau(N_2-\omega_2N_1) + o(\tau)\big)^{\alpha} = \omega_2^{\alpha}\big(I + \tau N_1 + \tau\alpha N + o(\tau)\big),$$
from which we finally get
$$\lim_{\tau\to 0} G_\beta(A(\tau),B(\tau),t) = \big(1-t+t\,\omega_2^{n\beta/2}\big)^{\frac{2}{n\beta}} I = G_\beta(I,\omega_2 I,t).$$
By reverting the congruence, the result follows.
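For commuting (diagonal) matrices the statements of Theorems 19 and 20 can be explored with a few lines of code. The following is a minimal sketch, not the authors' MATLAB implementation: it assumes $G_\beta(A,B,t) = \eta(t)\,(A \,\#_{\alpha(t)}\, B)$ with $\alpha$ and $\eta$ in the form used in the proofs above, it takes $\gamma$ from the expression for $\gamma(\tau)$ in the proof of Theorem 20, and it uses the closed form $(1-t+t\sigma)^{2/(n\beta)}A$ when the two matrices are linearly dependent. The function name `power_mean_diag` is hypothetical.

```python
import math

def power_mean_diag(a, b, t, beta):
    """Diagonal of G_beta(diag(a), diag(b), t) under the assumptions stated above."""
    n = len(a)
    if beta == 0 or beta >= 1.0 / n:
        raise ValueError("beta must lie in (-inf, 0) or (0, 1/n)")
    logs = [math.log(bi / ai) for ai, bi in zip(a, b)]   # ln mu_i
    mean = sum(logs) / n
    # gamma = |beta| / (2 sqrt(1/n - beta)) * (sum_i ln^2 mu~_i)^(1/2)
    dev = math.sqrt(sum((x - mean) ** 2 for x in logs))
    gamma = abs(beta) * dev / (2.0 * math.sqrt(1.0 / n - beta))
    if gamma >= math.pi / 2:
        raise ValueError("the explicit formula is only assumed for gamma < pi/2")
    sigma = math.exp(0.5 * beta * sum(logs))             # det(A^{-1} B)^(beta/2)
    quad = ((1 - t) ** 2
            + 2 * t * (1 - t) * sigma * math.cos(gamma)
            + (t * sigma) ** 2)
    if gamma == 0.0:
        # Linearly dependent case: quad = (1 - t + t*sigma)^2.
        s = quad ** (1.0 / (n * beta))
        return [ai * s for ai in a]
    alpha = math.atan2(t * sigma * math.sin(gamma),
                       1 - t + t * sigma * math.cos(gamma)) / gamma
    eta = (quad / sigma ** (2 * alpha)) ** (1.0 / (n * beta))
    return [eta * ai ** (1 - alpha) * bi ** alpha for ai, bi in zip(a, b)]

a, b, t = [1.0, 2.0, 3.0], [2.0, 1.0, 5.0], 0.3
geometric = [ai ** (1 - t) * bi ** t for ai, bi in zip(a, b)]
for beta in (0.2, 0.01, 1e-6):
    mean = power_mean_diag(a, b, t, beta)
    print(beta, max(abs(m - g) for m, g in zip(mean, geometric)))
```

The printed gap to the weighted geometric mean should shrink as $\beta \to 0$. Perturbing one entry of a linearly dependent pair, for example `[2.0, 2.0]` to `[2.0, 2.0 + 1e-9]`, gives a quick check of the continuity stated in Theorem 20.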
5. Riemannian distance
We give an explicit expression for the Riemannian distance in $\mathcal{M}^\beta_n$, in terms of the determinants of $A$ and $B$ and the classical Riemannian distance in $\mathcal{M}_n$.

Theorem 21.
The distance associated with the Riemannian metric of $\mathcal{M}^\beta_n$ between two positive-definite matrices $A$ and $B$ such that $0 < \gamma < \pi/2$, with $\gamma := \gamma_\beta(A,B)$ of (18), is given by
$$d_\beta(A,B) = \frac{2\sqrt{1/n-\beta}}{|\beta|}\Big(\big(\det(A)^{\beta/2}-\det(B)^{\beta/2}\big)^2 + 4(\det(A)\det(B))^{\beta/2}\sin^2\frac{\gamma}{2}\Big)^{1/2}. \tag{24}$$

Proof. We rely on the proof of Lemma 10 to get some useful expressions involving the geodesic $P(t)$ and $G(t) = P(t)^{-1}P'(t)$.

Using (A.5) and the fact that $\sum_{i=1}^n a_i = 0$, we obtain
$$\operatorname{tr}(G(t)) = \frac{2h(t)h'(t)}{\beta(1+h(t)^2)},$$
where $h(t) = (1-t)\varphi + t\psi$, with $\varphi$ and $\psi$ as in (A.9). Furthermore, by noting that $\sum_{i=1}^n a_i^2 = 2n(1-n\beta)a^2$, it follows from (A.5) that
$$\operatorname{tr}(G(t)^2) = \frac{4(\varphi-\psi)^2}{n\beta^2(1+h(t)^2)^2}\big(h(t)^2+1-n\beta\big).$$
From equation (A.6), we obtain that
$$\det(P(t))^\beta = \prod_{i=1}^n \lambda_i(0)^\beta\Big(\frac{1+h(t)^2}{1+\varphi^2}\Big)^{1/n}\exp\Big(\frac{b_i}{n}\big(\arctan h(t)-\arctan\varphi\big)\Big) = \det(A)^\beta\,\frac{1+h(t)^2}{1+\varphi^2}, \tag{25}$$
where we have used the fact that $\sum_i b_i = 0$.

The "square of the speed" of the geodesic curve $P(t)$ is given by
$$g^\beta_P(P',P') = \det(P(t))^\beta\big[\operatorname{tr}(G(t)^2) - \beta\operatorname{tr}^2(G(t))\big] = \frac{4\det(A)^\beta}{1+\varphi^2}\,\frac{1/n-\beta}{\beta^2}\,(\psi-\varphi)^2. \tag{26}$$
The latter is constant and hence $t$ is proportional to the arclength of $P(t)$. Using the expressions for $\varphi$ and $\psi$, we obtain the identity
$$\frac{\det(A)^\beta(\psi-\varphi)^2}{1+\varphi^2} = (\det(A)\det(B))^{\beta/2}\Big(\sigma+\frac{1}{\sigma}-2\cos\gamma\Big),$$
that in turn, as $\sigma = \det(B)^{\beta/2}/\det(A)^{\beta/2}$, provides the formula for the distance
$$\begin{aligned}
d_\beta^2(A,B) &= \frac{4(1/n-\beta)}{\beta^2}\Big(\det(A)^\beta+\det(B)^\beta-2(\det(A)\det(B))^{\beta/2}\cos\gamma\Big)\\
&= \frac{4(1/n-\beta)}{\beta^2}\Big(\big(\det(A)^{\beta/2}-\det(B)^{\beta/2}\big)^2 + 2(\det(A)\det(B))^{\beta/2}(1-\cos\gamma)\Big)\\
&= \frac{4(1/n-\beta)}{\beta^2}\Big(\big(\det(A)^{\beta/2}-\det(B)^{\beta/2}\big)^2 + 4(\det(A)\det(B))^{\beta/2}\sin^2\frac{\gamma}{2}\Big).
\end{aligned} \tag{27}$$

The explicit formula (24) involves the dimension $n$, the parameter $\beta$, the determinants of $A$ and $B$ and the classical Riemannian distance in $\mathcal{M}_n$ between $A$ and $B$, through $\gamma$; see Fig. 2 for a geometric interpretation.

[Figure 2: Geometric interpretation of $d_\beta(A,B)$. Labels: two sides of lengths $\det(A)^{\beta/2}$ and $\det(B)^{\beta/2}$, the angle $\gamma$, and a third side of length $\frac{|\beta|}{2\sqrt{1/n-\beta}}\,d_\beta(A,B)$.]

Note that when $\det A = \det B = \Delta$, then (24) reduces to
$$d_\beta(A,B) = \frac{4\sqrt{1/n-\beta}}{|\beta|}\,\Delta^{\beta/2}\sin\frac{\gamma}{2}.$$
As $\beta \to 0$, we have that $\gamma = \frac{|\beta|\sqrt{n}}{2}\,\delta(\tilde A,\tilde B) + o(|\beta|)$, which yields the expansion
$$d_\beta^2(A,B) = \frac{4}{n\beta^2}\Big(\frac{\beta^2}{4}\big(\log\det(A)-\log\det(B)\big)^2 + \frac{n\beta^2}{4}\,\delta^2(\tilde A,\tilde B) + o(\beta^2)\Big) = \delta^2(A,B) + o(1),$$
which gives
$$\lim_{\beta\to 0} d_\beta(A,B) = \delta(A,B),$$
and the new distance tends to the Riemannian distance between $A$ and $B$.

Finally, we give in the following lemma an expression for the determinant $\det P(t)$ of the point $P(t)$ along the geodesic, as a function of $\det(A)$ and $\det(B)$.

Lemma 22.
Along the geodesic curve $P(t)$ joining $A$ and $B$, the determinant $\Delta(t) := \det P(t)$ is given by
$$\begin{aligned}
\Delta(t)^\beta &= (1-t)^2\det(A)^\beta + t^2\det(B)^\beta + 2t(1-t)\det(A)^{\beta/2}\det(B)^{\beta/2}\cos\gamma\\
&= \big((1-t)\det(A)^{\beta/2}+t\det(B)^{\beta/2}\big)^2 - 4t(1-t)\det(A)^{\beta/2}\det(B)^{\beta/2}\sin^2\frac{\gamma}{2}.
\end{aligned}$$

Proof. Along the geodesic curve $P(t)$ joining $A$ and $B$, we know that the "square of the speed",
$$g^\beta_{P(t)}(P'(t),P'(t)) = \det(P(t))^\beta\big(\operatorname{tr}(G(t)^2) - \beta\operatorname{tr}^2 G(t)\big),$$
is constant and is equal to $D := d_\beta^2(A,B)$, the square of the $\beta$-Riemannian distance between $A$ and $B$.

Therefore, by using Jacobi's formula, $\Delta'(t)/\Delta(t) = \operatorname{tr}(G(t))$, and the above we obtain
$$\operatorname{tr}(G(t)^2) = D\,\Delta(t)^{-\beta} + \beta\Big(\frac{\Delta'(t)}{\Delta(t)}\Big)^2.$$
Furthermore, upon differentiation of Jacobi's formula we get
$$\Big(\frac{\Delta'(t)}{\Delta(t)}\Big)' = \operatorname{tr}(G'(t)),$$
and, using the equation of the geodesic together with the expression above,
$$\Big(\frac{\Delta'(t)}{\Delta(t)}\Big)' = \frac{n\beta D}{2(1-n\beta)}\,\Delta(t)^{-\beta} - \beta\Big(\frac{\Delta'(t)}{\Delta(t)}\Big)^2,$$
or, equivalently,
$$\frac{\Delta''(t)}{\Delta(t)} + (\beta-1)\Big(\frac{\Delta'(t)}{\Delta(t)}\Big)^2 = \frac{n\beta D}{2(1-n\beta)}\,\Delta(t)^{-\beta}.$$
By integrating this second-order ODE subject to the conditions $\Delta(0) = \det A$ and $\Delta(1) = \det B$ we get the following expression for the determinant $\Delta(t)$:
$$\Delta(t)^\beta = (1-t)\det(A)^\beta + t\det(B)^\beta - t(1-t)\,\frac{nD\beta^2}{4(1-n\beta)}.$$
The result then follows by substituting the expression of $D$.

When $A$ and $B$ are linearly dependent the above reduces to
$$\Delta(t)^\beta = \big[(1-t)\det(A)^{\beta/2}+t\det(B)^{\beta/2}\big]^2.$$
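Formula (24) and Lemma 22 only need the two determinants, the angle $\gamma$, $n$ and $\beta$, so they are easy to evaluate. The sketch below is a hedged illustration with hypothetical function names; it takes $\gamma = |\beta|\,\delta(\tilde A,\tilde B)/(2\sqrt{1/n-\beta})$, as in the expression for $\gamma(\tau)$ in the proof of Theorem 20, and compares the distance with its limit $\delta(A,B)$ for decreasing $\beta$. The numerical data are made up.

```python
import math

def gamma_from_delta(delta_tilde, n, beta):
    # gamma in terms of the classical distance between the normalized matrices.
    return abs(beta) * delta_tilde / (2.0 * math.sqrt(1.0 / n - beta))

def power_distance(det_a, det_b, gamma, n, beta):
    """d_beta(A, B) as in formula (24)."""
    ra, rb = det_a ** (beta / 2.0), det_b ** (beta / 2.0)
    inner = (ra - rb) ** 2 + 4.0 * ra * rb * math.sin(gamma / 2.0) ** 2
    return 2.0 * math.sqrt(1.0 / n - beta) / abs(beta) * math.sqrt(inner)

def det_on_geodesic(det_a, det_b, gamma, beta, t):
    """Delta(t) = det P(t) as in Lemma 22 (second form)."""
    ra, rb = det_a ** (beta / 2.0), det_b ** (beta / 2.0)
    value = (((1 - t) * ra + t * rb) ** 2
             - 4 * t * (1 - t) * ra * rb * math.sin(gamma / 2.0) ** 2)
    return value ** (1.0 / beta)

# Hypothetical data: n = 3, det A = 2, det B = 5, delta(A~, B~) = 0.8.
n, det_a, det_b, delta_tilde = 3, 2.0, 5.0, 0.8
# Limit value: delta(A, B)^2 = delta(A~, B~)^2 + (log det A - log det B)^2 / n.
limit = math.sqrt(delta_tilde ** 2 + (math.log(det_a) - math.log(det_b)) ** 2 / n)
for beta in (0.2, 0.01, 1e-4):
    g = gamma_from_delta(delta_tilde, n, beta)
    print(beta, power_distance(det_a, det_b, g, n, beta), limit)
```

Evaluating `det_on_geodesic` at `t = 0` and `t = 1` should return `det_a` and `det_b`, which is a cheap sanity check of the end conditions $\Delta(0) = \det A$ and $\Delta(1) = \det B$.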
6. Conclusions
Using the Hessian of the power potential function $(1-\det(X)^\beta)/\beta$ we have derived a Riemannian metric on the cone of positive-definite matrices. We were able to find an explicit expression for the geodesic joining two matrices, when $\beta$ is sufficiently small or when the two matrices are sufficiently near to a pair of linearly dependent matrices.

Because of its properties, the geodesic has been interpreted as a power mean of positive-definite matrices; since it differs from the power means defined so far, it represents a new mathematical object.

Its relation with information geometry and Tsallis statistics makes it potentially useful in applications where data matrices need to be averaged. An implementation of the mean can be found at the MATLAB Central repository https://tinyurl.com/tzkv3sqh

Investigating the scope of applications of the new mean and extending this mean to more than two matrices are objects of future work.
Appendix A. Proof of Lemma 10
Proof.
The set of positive-definite diagonal matrices is a totally-geodesic submanifold of M βn (seeCorollary 4), thus, we can assume that the geodesic P ( t ) we are looking for is diagonal.Under the hypotheses on γ , we will solve equation (8), with G ( t ) diagonal for t ∈ [0 ,
1] and withsuited boundary conditions.Let P ( t ) = diag( λ , . . . , λ n ) and G ( t ) = diag( γ , . . . , γ n ), where γ i = λ ′ i /λ i . Furthermore, set α ( t ) = n tr( G ( t )) and e G ( t ) = G ( t ) − n tr( G ( t )) I = diag( ν , . . . , ν n ) where ν i = γ i − α , i = 1 , . . . , n .Observing that ν + · · · + ν n = 0, the boundary-value problem to be solved can be written as(see (11)) α ′ = − nβ (cid:16) α − n (1 − nβ ) P n − i =1 (cid:16) ν i + P j
0, thesolution of (A.1) is obtained by α ′ = − nβ α ,γ ′ i /γ i = α, i = 1 , . . . , n,γ i (0) = a ii , γ i (1) = b ii , i = 1 , . . . , n, whose solution is γ i ( t ) = (cid:16) (1 − t ) a nβ ii + tb nβ ii (cid:17) nβ (see Section 3.3). The condition γ ′ i /γ i = α for i = 1 , . . . , n , implies that the quotient b ii /a ii is constant and thus D A and D B are linearly dependent,thus γ = 0.Without loss of generality, from now on, we assume that ν ℓ (0) = 0 for some ℓ . In a rightneighborhood of 0, for the indices such that ν i (0) = 0, we can write ν ′ i ν i = ν ′ ℓ ν ℓ , (A.2)that yields ν i = a i ν ℓ , for some constant a i obtained by integrating (A.2). In general, setting a i = 0when ν i (0) = 0, we can write ν i = a i ν ℓ , ≤ i ≤ n, where a ℓ = 1 and a n = − P n − i =1 a i . By the uniqueness of solution of the initial value problem, wecan further assume that ν ℓ ( t ) = 0 for any t in the domain of definition, and moreover, that ν ℓ ( t ) > P i ν i = 0. 24his reduces the problem to the system of two ordinary differential equations ( α ′ = − nβ (cid:16) α − b aν ℓ (cid:17) ,ν ′ ℓ = − nβαν ℓ , (A.3)where b a = 12 n (1 − nβ ) n X i =1 a i . The solution of (A.3) can be expressed as α ( t ) = 1 nβ h ( t ) h ′ ( t )1 + h ( t ) , (A.4a) ν ℓ ( t ) = √ nβa h ′ ( t )1 + h ( t ) , (A.4b)where h ( t ) = (1 − t ) φ + tψ with φ and ψ are integrating constants, while a is one of the squareroots of b a . Observe that we have assumed that ν ℓ (0) > ψ = φ , andsign( ψ − φ ) sign( a ) sign( β ) = 1, that will be assumed from now on.It follows that the solution of the initial-value problem associated with (A.1) is obtained byintegrating λ ′ i λ i = 1 nβ h ( t ) h ′ ( t )1 + h ( t ) + √ a i a h ′ ( t )1 + h ( t ) ! , i = 1 , . . . , n, (A.5)which yields the following formula for the geodesic curve λ nβi ( t ) = λ nβi (0) 1 + h ( t ) φ exp( b i (arctan h ( t ) − arctan φ )) , (A.6)for i = 1 , . . . , n , where b i = √ a i /a and λ i (0) = a ii . 
Notice that $\sum_{i=1}^n b_i = 0$ and $b_\ell = \sqrt{2}/a$.

The constants $(\varphi, \psi, b_1, \dots, b_n)$ are to be determined so that the $n$ end conditions ($\lambda_i(1) = b_{ii}$, $i = 1,\dots,n$) are satisfied, together with $\sum_i b_i = 0$.

Since $\nu_i(t) = \frac{\lambda_i'}{\lambda_i} - \frac{1}{n}\sum_{j=1}^n\frac{\lambda_j'}{\lambda_j}$, we have that, for $t$ in a right neighborhood of $0$, it holds that
$$\int_0^t \nu_i(s)\,ds = \ln\frac{\lambda_i(t)}{\lambda_i(0)} - \frac{1}{n}\ln\det(P(0)^{-1}P(t)).$$
If we assume that the solution exists for $t = 1$, we get
$$\int_0^1 \nu_i(s)\,ds = \ln\mu_i - \frac{1}{n}\ln\det(D_A^{-1}D_B) = \zeta_i,$$
and since $\nu_i(t) = a_i\nu_\ell(t)$, we have that $a_i = \zeta_i/\zeta_\ell$ and $b_i = \operatorname{sign}(a)\,2\sqrt{n(1-n\beta)}\,\zeta_i/\|\zeta\|$ (recall that we have assumed that $\nu_\ell(t) > 0$ for all $t$, from which it follows that $\zeta_\ell > 0$).

It remains to determine $\psi$ and $\varphi$. From (A.6) it follows that, in order to have $\lambda_i(1) = b_{ii}$, it must be
$$b_{ii}^{n\beta} = a_{ii}^{n\beta}\,\frac{1+\psi^2}{1+\varphi^2}\exp\big(b_i(\arctan\psi-\arctan\varphi)\big).$$
Taking logarithms and using $\sum_i b_i = 0$, these conditions give
$$b_i(\arctan\psi-\arctan\varphi) = \beta\ln\Big(\frac{b_{ii}^n}{a_{ii}^n}\,\frac{\det D_A}{\det D_B}\Big) = n\beta\zeta_i, \tag{A.7}$$
where $\zeta_i$ is defined in (16), and, by observing that $\sum_i b_i = 0$ and $\sum_i b_i^2 = 4n(1-n\beta)$, we get
$$\frac{1+\psi^2}{1+\varphi^2} = \Big(\prod_{i=1}^n\frac{b_{ii}}{a_{ii}}\Big)^{\beta} = \Big(\frac{\det D_B}{\det D_A}\Big)^{\beta}, \qquad (\arctan\psi-\arctan\varphi)^2 = \frac{n^2\beta^2\|\zeta\|^2}{4n(1-n\beta)} = \gamma^2. \tag{A.8}$$
With the assumption $0 < \gamma < \pi/2$, we have that $0 < |\arctan\psi-\arctan\varphi| < \pi/2$ and thus
$$\arctan\psi-\arctan\varphi = \arctan\frac{\psi-\varphi}{1+\psi\varphi},$$
that requires $\psi\varphi > -1$. The equations (A.8) can be written as the two algebraic equations for $\varphi$ and $\psi$
$$1+\psi^2 = \sigma^2(1+\varphi^2), \qquad (\varphi-\psi)^2 = (\tan\gamma)^2(1+\varphi\psi)^2, \qquad \psi\varphi > -1,$$
where $\sigma = \det(A^{-1}B)^{\beta/2} >$
0. The numbers σ and γ depend uniquely on β and on the initial andfinal data.The (real) solutions to the two equations are the four pairs ( φ, ψ ), ( ˆ φ, ˆ ψ ), ( − φ, − ψ ), ( − ˆ φ, − ˆ ψ )where φ = ξ ( σ, γ ) and ψ = ξ ( σ − , − γ ) , (A.9)and ˆ φ = ξ ( − σ, γ ) and ˆ ψ = ξ ( − σ − , − γ ) , (A.10)with ξ ( σ, γ ) = σ − p γ ) − γ = σ − − cos γ sin γ . The choices ( φ, ψ ) and ( − φ, − ψ ) are the only ones that give ψφ > − φ, ψ ) gives sign( ψ − φ ) = − sign( β ), while the choice ( − φ, − ψ ) gives sign( ψ − φ ) = sign( β ). Since sign( b i ) = sign( a ) and sign( ψ − φ ) = sign (cid:0) arctan( h ( t )) − arctan φ (cid:1) , we havethat sign( b i ) sign (cid:0) arctan( h ( t )) − arctan φ (cid:1) is the same for both choices, and the formula (A.6) forthe geodesic is the same.We note that the explicit expression (A.6) for the geodesic is not symmetric with respect to theend conditions λ i (0) and λ i (1). To obtain an expression which is symmetric with respect to the endconditions we recall that h ( t ) = (1 − t ) φ + tψ , and hence,1 + h ( t ) = 1 + (1 − t ) φ + t ψ + 2 t (1 − t ) φψ = (1 − t ) (1 + φ ) + t (1 + ψ ) + 2 t (1 − t )(1 + φψ )= (1 − t )[(1 − t )(1 + φ ) + t (1 + φψ )] + t [ t (1 + ψ ) + (1 − t )(1 + φψ )] . Therefore, 1 + h ( t ) φ = (1 − t ) (cid:20) (1 − t ) + t φψ φ (cid:21) + t ψ φ (cid:20) t + (1 − t ) 1 + φψ ψ (cid:21) . From (A.6) it follows λ nβi (1) = λ nβi (0) 1 + ψ φ exp( b i (arctan ψ − arctan φ )) , λ nβi (0) 1 + ψ φ = λ nβi (1) exp( − b i (arctan ψ − arctan φ )) . Then, (A.6) can be written as λ nβi ( t ) = λ nβi (0)(1 − t ) (cid:20) (1 − t ) + t φψ φ (cid:21) exp( b i (arctan h ( t ) − arctan φ ))+ λ nβi (1) tσ (cid:20) t + (1 − t ) 1 + φψ ψ (cid:21) exp( b i (arctan h ( t ) − arctan ψ )) ,,