Bounds for the Wasserstein mean with applications to the Lie-Trotter mean
arXiv [math.FA]
JINMI HWANG AND SEJONG KIM
Abstract.
As the least squares mean for the Riemannian trace metric on the cone of positive definite matrices, the Riemannian mean with its computational and theoretical approaches has been widely studied. Recently a new metric and the corresponding least squares mean on the cone of positive definite matrices, called the Wasserstein metric and the Wasserstein mean, respectively, have been introduced. In this paper, we explore some properties of the Wasserstein mean, such as a determinantal inequality, and find bounds for the Wasserstein mean. Using these bounds, we verify that the Wasserstein mean is the multivariate Lie-Trotter mean.
Keywords: Wasserstein mean, Riemannian mean, Lie-Trotter mean

1. Introduction
The open convex cone $\mathbb{P}_m$ of all $m \times m$ positive definite Hermitian matrices, with the inner product $\langle X, Y \rangle_A = \operatorname{tr}(A^{-1} X A^{-1} Y)$ on the tangent space $T_A(\mathbb{P}_m)$ at each point $A \in \mathbb{P}_m$, gives us a Riemannian structure. Indeed, $\mathbb{P}_m$ is a Cartan-Hadamard manifold, a simply connected complete Riemannian manifold with non-positive sectional curvature, and also a Hadamard space. The Riemannian trace metric between $A$ and $B$ is given by $\delta(A, B) = \| \log A^{-1/2} B A^{-1/2} \|_2$, where $\| \cdot \|_2$ denotes the Frobenius norm. The natural and canonical mean on a Hadamard space is the least squares mean, called the Cartan mean or Riemannian mean. For a positive probability vector $\omega = (w_1, \dots, w_n)$, the Riemannian mean of $A_1, \dots, A_n \in \mathbb{P}_m$ is defined as
\[ \Lambda(\omega; A_1, \dots, A_n) = \arg\min_{X \in \mathbb{P}_m} \sum_{j=1}^{n} w_j \delta^2(X, A_j). \tag{1.1} \]
The Riemannian mean with its computational and theoretical approaches has been widely studied. One of the important properties of the Riemannian mean is the arithmetic-geometric-harmonic mean inequalities:
\[ \Bigl[ \sum_{j=1}^{n} w_j A_j^{-1} \Bigr]^{-1} \le \Lambda(\omega; A_1, \dots, A_n) \le \sum_{j=1}^{n} w_j A_j. \tag{1.2} \]
Using (1.2) it has been verified in [7] that the Riemannian mean is the multivariate Lie-Trotter mean: for any differentiable curves $\gamma_1, \dots, \gamma_n$ on $\mathbb{P}_m$ with $\gamma_i(0) = I$ for all $i$,
\[ \lim_{s \to 0} \Lambda(\omega; \gamma_1(s), \gamma_2(s), \dots, \gamma_n(s))^{1/s} = \exp\Bigl[ \sum_{i=1}^{n} w_i \gamma_i'(0) \Bigr]. \]

Bhatia, Jain, and Lim [3] have recently introduced a new metric and the least squares mean on $\mathbb{P}_m$, called the Wasserstein metric and the Wasserstein mean, respectively. For given $A, B \in \mathbb{P}_m$, the Wasserstein metric $d(A, B)$ is given by
\[ d(A, B) = \Bigl[ \operatorname{tr}\Bigl( \frac{A + B}{2} \Bigr) - \operatorname{tr}(A^{1/2} B A^{1/2})^{1/2} \Bigr]^{1/2}. \tag{1.3} \]
In quantum information theory, the value $\operatorname{tr}(A^{1/2} B A^{1/2})^{1/2}$ is known as the fidelity, and the Wasserstein metric is known as the Bures distance of density matrices.
The geodesic passing from $A$ to $B$ is given by
\[ \gamma(t) = (1-t)^2 A + t^2 B + t(1-t)\bigl[ (AB)^{1/2} + (BA)^{1/2} \bigr] = A \diamond_t B \]
for $t \in [0, 1]$. The Wasserstein mean, denoted by $\Omega(\omega; A_1, \dots, A_n)$, is defined by
\[ \Omega(\omega; A_1, \dots, A_n) = \arg\min_{X \in \mathbb{P}_m} \sum_{j=1}^{n} w_j d^2(X, A_j), \tag{1.4} \]
and it coincides with the unique solution $X \in \mathbb{P}_m$ of the equation
\[ X = \sum_{j=1}^{n} w_j (X^{1/2} A_j X^{1/2})^{1/2}. \tag{1.5} \]
Note that $\Omega(1-t, t; A, B) = A \diamond_t B$, and it has been shown that the Wasserstein mean satisfies the arithmetic-Wasserstein mean inequality. On the other hand, it is shown that the Wasserstein mean satisfies neither the monotonicity nor the Wasserstein-harmonic mean inequality: see [3, Section 5]. So it is a natural question whether the Wasserstein mean is the multivariate Lie-Trotter mean. The main goals of this paper are to provide some properties of the Wasserstein mean and to verify that the Wasserstein mean is the multivariate Lie-Trotter mean by finding a lower bound for the Wasserstein mean.

We recall in Section 2 the Wasserstein metric with geodesic and see the Wasserstein distance between $A \diamond_t B$ and $A \diamond_t C$ for $A, B, C \in \mathbb{P}_m$ and $t \in [0, 1]$. [...] $A$ and $B$ and the positive probability vector $\omega = (1-t, t)$ has the unique solution $X = A \diamond_t B$. This naturally gives an open problem to extend the Wasserstein mean of positive definite matrices to positive invertible operators.

2. Wasserstein metric and geodesics
Let $\mathbb{M}_m$ be the set of all $m \times m$ matrices with complex entries. Let $\mathbb{H}_m$ be the real vector space of all $m \times m$ Hermitian matrices, and let $\mathbb{P}_m \subset \mathbb{H}_m$ be the open convex cone of all positive definite matrices. For $A, B \in \mathbb{H}_m$ we write $A \le B$ if and only if $B - A$ is positive semi-definite, and $A < B$ if and only if $B - A$ is positive definite.

The Frobenius norm $\| \cdot \|_2$ gives rise to the Riemannian structure on the open convex cone $\mathbb{P}_m$ with $\langle X, Y \rangle_A = \operatorname{Tr}(A^{-1} X A^{-1} Y)$, where $A \in \mathbb{P}_m$ and $X, Y \in T_A(\mathbb{P}_m) = \mathbb{H}_m$. Then $\mathbb{P}_m$ is a Cartan-Hadamard Riemannian manifold, a simply connected complete Riemannian manifold with non-positive sectional curvature (the canonical 2-tensor is non-negative). The Riemannian trace metric between $A$ and $B$ is given by
\[ \delta(A, B) = \| \log A^{-1/2} B A^{-1/2} \|_2, \]
and the unique (up to parametrization) geodesic curve on $\mathbb{P}_m$ connecting $A$ to $B$ is
\[ [0, 1] \ni t \mapsto A \#_t B := A^{1/2} (A^{-1/2} B A^{-1/2})^{t} A^{1/2}, \]
which is called the weighted geometric mean of $A$ and $B$. Note that $A \# B = A \#_{1/2} B$ is the unique midpoint of $A$ and $B$ with respect to the Riemannian trace metric, and is the unique solution $X \in \mathbb{P}_m$ of the Riccati equation $X A^{-1} X = B$. See [2] for more information.

Lemma 2.1. Let $A, B, C, D \in \mathbb{P}_m$ and let $t \in [0, 1]$. Then the following are satisfied.
(1) $A \#_t B = A^{1-t} B^{t}$ if $A$ and $B$ commute.
(2) $(aA) \#_t (bB) = a^{1-t} b^{t} (A \#_t B)$ for any $a, b > 0$.
(3) $A \#_t B = B \#_{1-t} A$.
(4) $A \#_t B \le C \#_t D$ whenever $A \le C$ and $B \le D$.
(5) The map $[0, 1] \times \mathbb{P}_m \times \mathbb{P}_m \to \mathbb{P}_m$, $(t, A, B) \mapsto A \#_t B$ is continuous.
(6) $X (A \#_t B) X^* = (X A X^*) \#_t (X B X^*)$ for any nonsingular matrix $X$.
(7) $(A \#_t B)^{-1} = A^{-1} \#_t B^{-1}$.
(8) $[(1-\lambda) A + \lambda B] \#_t [(1-\lambda) C + \lambda D] \ge (1-\lambda)(A \#_t C) + \lambda (B \#_t D)$ for any $\lambda \in [0, 1]$.
(9) $\det(A \#_t B) = \det(A)^{1-t} \det(B)^{t}$.
(10) $[(1-t) A^{-1} + t B^{-1}]^{-1} \le A \#_t B \le (1-t) A + t B$.

Bhatia, Jain, and Lim [3] have introduced a new metric on $\mathbb{P}_m$, called the Wasserstein metric, and have established that it gives us the Riemannian metric and the explicit formula of geodesic curve. For given $A, B \in \mathbb{P}_m$ the Wasserstein metric $d(A, B)$ is given by
\[ d(A, B) = \Bigl[ \operatorname{tr}\Bigl( \frac{A + B}{2} \Bigr) - \operatorname{tr}(A^{1/2} B A^{1/2})^{1/2} \Bigr]^{1/2}. \]
This metric has been of interest in quantum information, where it is called the Bures distance, and in statistics and the theory of optimal transport, where it is called the Wasserstein metric. It is the matrix version of the Hellinger distance between probability distributions: for probability vectors $p = (p_1, \dots, p_m)$ and $q = (q_1, \dots, q_m)$,
\[ d(p, q) = \Bigl[ \frac{1}{2} \sum_{i=1}^{m} (\sqrt{p_i} - \sqrt{q_i})^2 \Bigr]^{1/2}. \]

We see that the Wasserstein metric is related to the solution of an extremal problem. Let $\mathbb{U}_m$ be the compact subset of all $m \times m$ unitary matrices. For given $A \in \mathbb{P}_m$ we define the set $\mathcal{F}(A)$ as
\[ \mathcal{F}(A) = \{ M \in \mathbb{M}_m : A = M M^* \} = \{ A^{1/2} U : U \in \mathbb{U}_m \}. \]

Theorem 2.2. [3, Theorem 1]
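For any $A, B \in \mathbb{P}_m$,
\[ d(A, B) = \frac{1}{\sqrt{2}} \min_{M \in \mathcal{F}(A),\, N \in \mathcal{F}(B)} \| M - N \|_2 = \frac{1}{\sqrt{2}} \min_{U \in \mathbb{U}_m} \| A^{1/2} - B^{1/2} U \|_2. \]
The minimum in the second expression is attained at a unitary matrix $U$ occurring in the polar decomposition of $B^{1/2} A^{1/2}$:
\[ B^{1/2} A^{1/2} = U |B^{1/2} A^{1/2}| = U (A^{1/2} B A^{1/2})^{1/2}. \]

Remark 2.3.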
We check that $d(A, B)$ is indeed a metric on $\mathbb{P}_m$ by using Theorem 2.2.

(i) Obviously, $d(A, B) \ge 0$.

(ii) Assume that $A = B$. Then $U = I$ attains the minimum of $\| A^{1/2} - B^{1/2} U \|_2$ over $U \in \mathbb{U}_m$, so the minimum value is $0 = d(A, B)$. Conversely, assume that $d(A, B) = 0$. Then $\| A^{1/2} - B^{1/2} U \|_2 = 0$ when $U = B^{1/2} A^{1/2} (A^{1/2} B A^{1/2})^{-1/2}$. So
\[ A^{1/2} = B^{1/2} U = B^{1/2} B^{1/2} A^{1/2} (A^{1/2} B A^{1/2})^{-1/2} = B A^{1/2} (A^{-1/2} B^{-1} A^{-1/2})^{1/2} A^{1/2} A^{-1/2} = B (A \# B^{-1}) A^{-1/2}. \]
Set $X = A \# B^{-1}$. Then $X = B^{-1} A$. By the Riccati equation, $X A^{-1} X = B^{-1}$, and so $B^{-2} A = B^{-1}$. Thus, $A = B$.

(iii) The Frobenius norm $\| \cdot \|_2$ is unitarily invariant: see [6, Chapter 5]. So
\[ \| A^{1/2} - B^{1/2} U \|_2 = \| A^{1/2} U^* - B^{1/2} \|_2 = \| B^{1/2} - A^{1/2} V \|_2, \]
where $V = U^* \in \mathbb{U}_m$. Hence, $d(A, B) = d(B, A)$.

(iv) Let $A, B, C \in \mathbb{P}_m$. By the triangle inequality for the Frobenius norm,
\[ d(A, C) \le \frac{1}{\sqrt{2}} \| A^{1/2} - C^{1/2} U \|_2 \le \frac{1}{\sqrt{2}} \bigl( \| A^{1/2} - B^{1/2} V \|_2 + \| B^{1/2} V - C^{1/2} U \|_2 \bigr) = \frac{1}{\sqrt{2}} \bigl( \| A^{1/2} - B^{1/2} V \|_2 + \| B^{1/2} - C^{1/2} W \|_2 \bigr), \]
where $W = U V^* \in \mathbb{U}_m$. So taking the minimum over all $U, V \in \mathbb{U}_m$, we see that $d(A, C) \le d(A, B) + d(B, C)$.

At this stage we recall a theorem from Riemannian geometry. Let $(\mathcal{M}, g)$ and $(\mathcal{N}, h)$ be Riemannian manifolds with Riemannian metrics $g$ and $h$. A differentiable map $\pi : \mathcal{M} \to \mathcal{N}$ is said to be a smooth submersion if its differential $D\pi(p) : T_p \mathcal{M} \to T_{\pi(p)} \mathcal{N}$ is surjective at every point $p \in \mathcal{M}$. Let $T_p \mathcal{M} = \mathcal{V}_p \oplus \mathcal{H}_p$ be a decomposition of the tangent space $T_p \mathcal{M}$, where $\mathcal{V}_p = \ker D\pi(p)$ and $\mathcal{H}_p = (\ker D\pi(p))^{\perp}$ are called the vertical and horizontal space at $p$. Then $\pi$ is called a Riemannian submersion if it is a smooth submersion and the map $D\pi(p) : \mathcal{H}_p \to T_{\pi(p)} \mathcal{N}$ is isometric for all $p \in \mathcal{M}$.

Theorem 2.4. [5]
Let $(\mathcal{M}, g)$ be a Riemannian manifold with Riemannian metric $g$. Let $G$ be a compact Lie group of isometries of $(\mathcal{M}, g)$ acting freely on $\mathcal{M}$. Let $\mathcal{N} = \mathcal{M}/G$, and let $\pi : \mathcal{M} \to \mathcal{N}$ be the quotient map. Then there exists a unique Riemannian metric $h$ on $\mathcal{N}$ for which $\pi : (\mathcal{M}, g) \to (\mathcal{N}, h)$ is a Riemannian submersion.

Note that the general linear group $GL_m$ is a Riemannian manifold with the metric induced by the Frobenius inner product. The group $\mathbb{U}_m$ of unitary matrices is a compact Lie group of isometries for this metric. The quotient space $GL_m / \mathbb{U}_m$ is $\mathbb{P}_m$, and the metric inherited by the quotient space $\mathbb{P}_m$ is (up to a constant factor)
\[ \min_{U \in \mathbb{U}_m} \| A^{1/2} - B^{1/2} U \|_2 = \sqrt{2}\, d(A, B). \]
The map $\pi : GL_m \to \mathbb{P}_m$, $\pi(M) = M M^*$ is a smooth submersion, and by Theorem 2.4 there is a unique Riemannian metric on $\mathbb{P}_m$, which is the Wasserstein metric $d$. From this point of view, the geodesic on $\mathbb{P}_m$ joining $A$ and $B$ has been derived in [3]. The straight line segment
\[ Z(t) = (1-t) A^{1/2} + t B^{1/2} U \quad \text{for } 0 \le t \le 1, \]
where $U = B^{1/2} A^{1/2} (A^{1/2} B A^{1/2})^{-1/2}$, is a geodesic in $GL_m$, and by Theorem 2.4 $\gamma(t) = \pi(Z(t)) = Z(t) Z(t)^*$ is a geodesic in $\mathbb{P}_m$:
\[ \gamma(t) = (1-t)^2 A + t^2 B + t(1-t)\bigl[ A (A^{-1} \# B) + (A^{-1} \# B) A \bigr] = (1-t)^2 A + t^2 B + t(1-t)\bigl[ (AB)^{1/2} + (BA)^{1/2} \bigr]. \tag{2.6} \]
We denote $\gamma(t) =: A \diamond_t B$ for $t \in [0, 1]$. Since $\gamma(t)$ is the natural parametrization of the geodesic joining $A$ and $B$, it satisfies the affine property of parameters: for any $s, t, u \in [0, 1]$,
\[ (A \diamond_s B) \diamond_u (A \diamond_t B) = A \diamond_{(1-u)s + ut} B. \]

Lemma 2.5.
For any $A, B, C \in \mathbb{P}_m$ and $t \in [0, 1]$,
\[ d(A \diamond_t B, A \diamond_t C) \le t \sqrt{\frac{\lambda_1}{2}}\, \| A^{-1} \# B - A^{-1} \# C \|_2, \]
where $\lambda_1 := \lambda_1(A)$ is the largest eigenvalue of $A$.

Proof. Note that $A \diamond_t B = Z(t) Z(t)^*$, where $Z(t) = (1-t) A^{1/2} + t B^{1/2} U$ for $0 \le t \le 1$ and $U = B^{1/2} A^{1/2} (A^{1/2} B A^{1/2})^{-1/2}$. So $Z(t) = (1-t) A^{1/2} + t (A^{-1} \# B) A^{1/2} \in \mathcal{F}(A \diamond_t B)$, since
\[ U = B^{1/2} A^{1/2} (A^{-1/2} B^{-1} A^{-1/2})^{1/2} = B^{1/2} (A \# B^{-1}) A^{-1/2}, \]
and so
\[ B^{1/2} U = B (A \# B^{-1}) A^{-1/2} = (A \# B^{-1})^{-1} A^{1/2} = (A^{-1} \# B) A^{1/2} \]
by the Riccati equation and Lemma 2.1 (7). Similarly, $A \diamond_t C = Y(t) Y(t)^*$, where $Y(t) = (1-t) A^{1/2} + t (A^{-1} \# C) A^{1/2} \in \mathcal{F}(A \diamond_t C)$ for $0 \le t \le 1$. Therefore, from the first expression in Theorem 2.2,
\[ d(A \diamond_t B, A \diamond_t C) \le \frac{1}{\sqrt{2}} \| Z(t) - Y(t) \|_2 = \frac{t}{\sqrt{2}} \| (A^{-1} \# B) A^{1/2} - (A^{-1} \# C) A^{1/2} \|_2 \le \frac{t}{\sqrt{2}} \| A^{1/2} \| \cdot \| A^{-1} \# B - A^{-1} \# C \|_2 = t \sqrt{\frac{\lambda_1}{2}}\, \| A^{-1} \# B - A^{-1} \# C \|_2. \]
The second inequality follows from the bound $\| X Y \|_2 \le \| X \|_2 \| Y \|$, where $\| \cdot \|$ denotes the operator norm (for matrix norms see Section 5.6 in [6]), and the last equality follows from the fact that $\| A^{1/2} \| = \lambda_1(A)^{1/2}$, where $\lambda_1(A) \ge \cdots \ge \lambda_m(A)$ are the positive eigenvalues of $A$ in decreasing order. □

3. Wasserstein mean
Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, and let $\omega = (w_1, \dots, w_n) \in \Delta_n$, the simplex of all positive probability vectors in $\mathbb{R}^n$. We consider the following minimization problem
\[ \arg\min_{X \in \mathbb{P}_m} \sum_{j=1}^{n} w_j d^2(X, A_j). \tag{3.7} \]
By using tools from non-smooth analysis, convex duality, and the optimal transport theory, it has been proved in [1, Theorem 6.1] that the above minimization problem has a unique solution in $\mathbb{P}_m$. On the other hand, it has been shown in [3] that the objective function $f(X) = \sum_{j=1}^{n} w_j d^2(X, A_j)$ is strictly convex, by applying the strict concavity of the map $h : \mathbb{P}_m \to \mathbb{R}$, $h(X) = \operatorname{Tr}(X^{1/2})$. Therefore, we define the unique minimizer of (3.7) as the Wasserstein mean, denoted by $\Omega(\omega; \mathbb{A})$. That is,
\[ \Omega(\omega; \mathbb{A}) = \arg\min_{X \in \mathbb{P}_m} \sum_{j=1}^{n} w_j d^2(X, A_j). \tag{3.8} \]
To find the unique minimum of the objective function $f : \mathbb{P}_m \to \mathbb{R}$, we evaluate the derivative $Df(X)$ and set it equal to zero. By using matrix differential calculus, we have the following.
Theorem 3.1. [3, Theorem 8]
The Wasserstein mean $\Omega(\omega; \mathbb{A})$ is the unique solution $X \in \mathbb{P}_m$ of the nonlinear matrix equation
\[ I = \sum_{j=1}^{n} w_j (A_j \# X^{-1}), \tag{3.9} \]
equivalently,
\[ X = \sum_{j=1}^{n} w_j (X^{1/2} A_j X^{1/2})^{1/2}. \]

We see some interesting properties of the Wasserstein mean. For given $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, any permutation $\sigma$ on $\{1, \dots, n\}$, and any $M \in GL_m$, we denote
\[ \mathbb{A}_\sigma = (A_{\sigma(1)}, \dots, A_{\sigma(n)}) \in \mathbb{P}_m^n, \]
\[ M \mathbb{A} M^* = (M A_1 M^*, \dots, M A_n M^*) \in \mathbb{P}_m^n, \]
\[ \mathbb{A}^k = (A_1, \dots, A_n, \dots, A_1, \dots, A_n) \in \mathbb{P}_m^{nk}, \]
where the number of blocks in the last expression is $k$. For given $\omega = (w_1, \dots, w_n) \in \Delta_n$, we also denote
\[ \omega_\sigma = (w_{\sigma(1)}, \dots, w_{\sigma(n)}) \in \Delta_n, \]
\[ \omega^k = \frac{1}{k} (w_1, \dots, w_n, \dots, w_1, \dots, w_n) \in \Delta_{nk}. \]

Proposition 3.2.
Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, and let $\omega = (w_1, \dots, w_n) \in \Delta_n$. Then the following are satisfied.
(1) (Homogeneity) $\Omega(\omega; \alpha \mathbb{A}) = \alpha \Omega(\omega; \mathbb{A})$ for any $\alpha > 0$.
(2) (Permutation invariancy) $\Omega(\omega_\sigma; \mathbb{A}_\sigma) = \Omega(\omega; \mathbb{A})$ for any permutation $\sigma$ on $\{1, \dots, n\}$.
(3) (Repetition invariancy) $\Omega(\omega^k; \mathbb{A}^k) = \Omega(\omega; \mathbb{A})$ for any $k \in \mathbb{N}$.
(4) (Unitary congruence invariancy) $\Omega(\omega; U \mathbb{A} U^*) = U \Omega(\omega; \mathbb{A}) U^*$ for any $U \in \mathbb{U}_m$.

Proof. Items (2) and (3) follow from the definition (3.8) of the Wasserstein mean.

(1) Let $X = \Omega(\omega; \alpha \mathbb{A})$ for $\alpha > 0$. By Theorem 3.1 and Lemma 2.1 (2),
\[ I = \sum_{j=1}^{n} w_j \bigl( (\alpha A_j) \# X^{-1} \bigr) = \sum_{j=1}^{n} w_j \bigl( A_j \# (\alpha^{-1} X)^{-1} \bigr). \]
By Theorem 3.1, $\alpha^{-1} X = \Omega(\omega; \mathbb{A})$, which implies the desired identity.

(4) Let $X = \Omega(\omega; U \mathbb{A} U^*)$ for $U \in \mathbb{U}_m$. By Theorem 3.1, $I = \sum_{j=1}^{n} w_j (U A_j U^* \# X^{-1})$. Taking the congruence transformation by $U^* \in \mathbb{U}_m$ on both sides and applying Lemma 2.1 (6),
\[ I = \sum_{j=1}^{n} w_j (A_j \# U^* X^{-1} U) = \sum_{j=1}^{n} w_j \bigl( A_j \# (U^* X U)^{-1} \bigr). \]
By Theorem 3.1, we obtain $U^* X U = \Omega(\omega; \mathbb{A})$, that is, $\Omega(\omega; U \mathbb{A} U^*) = U \Omega(\omega; \mathbb{A}) U^*$. □

Remark 3.3.
Let
\[ A = \begin{bmatrix} \cdots \end{bmatrix}, \qquad B = \begin{bmatrix} \cdots \end{bmatrix}. \]
One can see easily that $A, B$ are positive definite and $AB = BA$. The Wasserstein mean $\Omega(\tfrac{1}{2}, \tfrac{1}{2}; A, B) = A \diamond_{1/2} B$ and the Riemannian mean $\Lambda(\tfrac{1}{2}, \tfrac{1}{2}; A, B) = A \# B$ of the positive definite matrices $A$ and $B$, respectively, are
\[ \Omega\Bigl( \frac{1}{2}, \frac{1}{2}; A, B \Bigr) = \frac{1}{4} \begin{bmatrix} \cdots \end{bmatrix}, \qquad \Lambda\Bigl( \frac{1}{2}, \frac{1}{2}; A, B \Bigr) = \begin{bmatrix} \cdots \end{bmatrix}. \]
Then their determinants are
\[ \det\Bigl[ \Omega\Bigl( \frac{1}{2}, \frac{1}{2}; A, B \Bigr) \Bigr] = 2.\ldots > \det\Bigl[ \Lambda\Bigl( \frac{1}{2}, \frac{1}{2}; A, B \Bigr) \Bigr]. \]
In general, $\det \Omega(\omega; \mathbb{A}) \ne \prod_{j=1}^{n} (\det A_j)^{w_j} = \det \Lambda(\omega; \mathbb{A})$. The following shows the inequality between determinants of the Wasserstein mean and the Cartan mean.

It is known from Theorem 7.6.6 in [6] that the map $f : \mathbb{P}_m \to \mathbb{R}$, $f(A) = \log \det A$ is strictly concave: for any $A, B \in \mathbb{P}_m$ and $t \in [0, 1]$,
\[ \log \det((1-t) A + t B) \ge (1-t) \log \det A + t \log \det B, \]
where equality holds if and only if $A = B$. By induction together with this property, we have

Lemma 3.4.
Let $A_1, \dots, A_n \in \mathbb{P}_m$, and let $\omega = (w_1, \dots, w_n) \in \Delta_n$. Then
\[ \log \det \Bigl( \sum_{j=1}^{n} w_j A_j \Bigr) \ge \sum_{j=1}^{n} w_j \log \det A_j, \]
where equality holds if and only if $A_1 = \cdots = A_n$.

The following shows the determinantal inequality between the Wasserstein mean and the Riemannian mean.
Theorem 3.5.
Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, and let $\omega = (w_1, \dots, w_n) \in \Delta_n$. Then
\[ \det \Omega(\omega; \mathbb{A}) \ge \prod_{j=1}^{n} (\det A_j)^{w_j}, \tag{3.10} \]
where equality holds if and only if $A_1 = \cdots = A_n$.

Proof. Let $X = \Omega(\omega; \mathbb{A})$. Then by Theorem 3.1, $I = \sum_{j=1}^{n} w_j (A_j \# X^{-1})$, and by Lemma 3.4,
\[ 0 = \log \det \sum_{j=1}^{n} w_j (A_j \# X^{-1}) \ge \sum_{j=1}^{n} w_j \log \det (A_j \# X^{-1}) = \frac{1}{2} \sum_{j=1}^{n} w_j \log \det A_j - \frac{1}{2} \log \det X. \]
The last equality follows from Lemma 2.1 (9). It implies
\[ \log \det X \ge \sum_{j=1}^{n} w_j \log \det A_j = \log \prod_{j=1}^{n} (\det A_j)^{w_j}. \]
Taking the exponential function on both sides and applying the fact that the exponential function from $\mathbb{R}$ to $(0, \infty)$ is monotone increasing, we obtain the desired inequality.

Moreover, the equality of (3.10) holds if and only if $A_i \# X^{-1} = A_j \# X^{-1}$ for all $i$ and $j$. By Lemma 2.1 (3), $X^{-1} \# A_i = X^{-1} \# A_j$, and by the definition of geometric mean it is equivalent to $A_i = A_j$ for all $i$ and $j$. □

4. Bounds for the Wasserstein mean
The Wasserstein mean satisfies the arithmetic-Wasserstein mean inequality.
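As a quick illustration added here (not part of the cited result), the scalar case $m = 1$ makes this inequality explicit. Assuming positive numbers $a_1, \dots, a_n$ in place of the matrices, equation (1.5) can be solved by hand, and the arithmetic bound reduces to the convexity of the square map:

```latex
% Scalar case m = 1: a_1, ..., a_n > 0, weights w_1, ..., w_n > 0 summing to 1.
x = \sum_{j=1}^{n} w_j (x a_j)^{1/2}
\;\Longleftrightarrow\;
x^{1/2} = \sum_{j=1}^{n} w_j a_j^{1/2},
\qquad\text{so}\qquad
\Omega(\omega; a_1, \dots, a_n)
= \Bigl( \sum_{j=1}^{n} w_j a_j^{1/2} \Bigr)^{2}
\le \sum_{j=1}^{n} w_j a_j .
```

For instance, with $a = (1, 4)$ and $\omega = (1/2, 1/2)$ the left-hand side is $(1/2 + 1)^2 = 2.25$, while the arithmetic mean is $2.5$.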
Theorem 4.1. [3, Theorem 9]
Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$ and let $\omega = (w_1, \dots, w_n) \in \Delta_n$. Then
\[ \Omega(\omega; \mathbb{A}) \le \sum_{j=1}^{n} w_j A_j. \]

Proposition 4.2. Let $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$, and let $\omega = (w_1, \dots, w_n) \in \Delta_n$. Then for the operator norm $\| \cdot \|$,
\[ \| \Omega(\omega; \mathbb{A}) \| \le \Bigl( \sum_{j=1}^{n} w_j \| A_j \|^{1/2} \Bigr)^{2}. \]
Proof.
Let $X = \Omega(\omega; \mathbb{A})$. Then by Theorem 3.1, by the triangle inequality for the operator norm, by the fact that $\| A^{t} \| = \| A \|^{t}$ for any $A \in \mathbb{P}_m$ and $t \ge 0$, and by the sub-multiplicativity of the operator norm in [6, Section 5.6],
\[ \| X \| = \Bigl\| \sum_{j=1}^{n} w_j \bigl( X^{1/2} A_j X^{1/2} \bigr)^{1/2} \Bigr\| \le \sum_{j=1}^{n} \Bigl\| w_j \bigl( X^{1/2} A_j X^{1/2} \bigr)^{1/2} \Bigr\| = \sum_{j=1}^{n} w_j \bigl\| X^{1/2} A_j X^{1/2} \bigr\|^{1/2} \le \sum_{j=1}^{n} w_j \| X \|^{1/2} \| A_j \|^{1/2}. \]
Dividing by $\| X \|^{1/2}$ and squaring, we obtain
\[ \| \Omega(\omega; \mathbb{A}) \| = \| X \| \le \Bigl( \sum_{j=1}^{n} w_j \| A_j \|^{1/2} \Bigr)^{2}. \]
□

Remark 4.3.
By Theorem 4.1 we have
\[ \| \Omega(\omega; \mathbb{A}) \| \le \Bigl\| \sum_{j=1}^{n} w_j A_j \Bigr\| \le \sum_{j=1}^{n} w_j \| A_j \|. \]
Since the square map $\mathbb{R} \ni t \mapsto t^2 \in [0, \infty)$ is convex,
\[ \Bigl( \sum_{j=1}^{n} w_j \| A_j \|^{1/2} \Bigr)^{2} \le \sum_{j=1}^{n} w_j \| A_j \|. \]
Thus, Proposition 4.2 gives a sharper upper bound for the operator norm of the Wasserstein mean than the one obtained from Theorem 4.1.

Unfortunately, the Wasserstein mean does not satisfy the Wasserstein-harmonic mean inequality: see Section 5 in [3]. However, we give a lower bound for the Wasserstein mean under a certain condition.
Theorem 4.4.
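Let $\omega = (w_1, \dots, w_n) \in \Delta_n$ and $\mathbb{A} = (A_1, \dots, A_n) \in \mathbb{P}_m^n$. Then
\[ \Omega(\omega; \mathbb{A}) \ge 2I - \sum_{j=1}^{n} w_j A_j^{-1}. \]

Proof. Let $\Omega = \Omega(\omega; \mathbb{A})$. By Theorem 3.1 and the geometric-harmonic mean inequality in Lemma 2.1 (10),
\[ I = \sum_{j=1}^{n} w_j (A_j \# \Omega^{-1}) \ge \sum_{j=1}^{n} w_j \Bigl( \frac{A_j^{-1} + \Omega}{2} \Bigr)^{-1}. \]
Taking the inverse on both sides and applying the convexity of the inversion map in (1.33) of [2] yield
\[ I \le \Bigl[ \sum_{j=1}^{n} w_j \Bigl( \frac{A_j^{-1} + \Omega}{2} \Bigr)^{-1} \Bigr]^{-1} \le \sum_{j=1}^{n} w_j \Bigl( \frac{A_j^{-1} + \Omega}{2} \Bigr) = \frac{1}{2} \sum_{j=1}^{n} w_j A_j^{-1} + \frac{1}{2} \Omega. \]
Multiplying by 2 and rearranging, we obtain the desired inequality. □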
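The two-sided bound given by Theorems 4.1 and 4.4 can be checked numerically in the simplest case. The following is a minimal sketch, not part of the paper: it assumes $n = 2$ and real symmetric $2 \times 2$ matrices, so that $\Omega(1-t, t; A, B) = A \diamond_t B$ has a closed form and matrix square roots can be computed by the Cayley-Hamilton formula. All helper names (`mul`, `comb`, `inv`, `sqrtm`, `wasserstein_geodesic`, `is_psd`) and the sample matrices are hypothetical choices made for this illustration.

```python
import math

def mul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comb(a, P, b, Q):
    """Linear combination a*P + b*Q of two 2x2 matrices."""
    return [[a * P[i][j] + b * Q[i][j] for j in range(2)] for i in range(2)]

def inv(P):
    """Inverse of an invertible 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]

def sqrtm(P):
    """Square root of a 2x2 positive definite matrix:
    P^{1/2} = (P + sqrt(det P) I) / sqrt(tr P + 2 sqrt(det P))."""
    s = math.sqrt(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    r = math.sqrt(P[0][0] + P[1][1] + 2.0 * s)
    return [[(P[0][0] + s) / r, P[0][1] / r], [P[1][0] / r, (P[1][1] + s) / r]]

def wasserstein_geodesic(A, B, t):
    """A <>_t B = A^{-1/2} [(1-t) A + t (A^{1/2} B A^{1/2})^{1/2}]^2 A^{-1/2}."""
    Ah = sqrtm(A)
    Aih = inv(Ah)
    S = comb(1.0 - t, A, t, sqrtm(mul(mul(Ah, B), Ah)))
    return mul(mul(Aih, mul(S, S)), Aih)

def is_psd(P, tol=1e-9):
    """A symmetric 2x2 matrix is PSD iff trace and determinant are >= 0
    (checked here up to a small tolerance for rounding)."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return P[0][0] + P[1][1] >= -tol and det >= -tol

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = [[2.0, 1.0], [1.0, 2.0]]   # sample positive definite matrices (assumed)
B = [[1.0, 0.0], [0.0, 3.0]]
t = 0.3

omega = wasserstein_geodesic(A, B, t)            # Omega(1-t, t; A, B)
arithmetic = comb(1.0 - t, A, t, B)              # upper bound, Theorem 4.1
lower = comb(2.0, I2, -1.0, comb(1.0 - t, inv(A), t, inv(B)))  # Theorem 4.4

upper_ok = is_psd(comb(1.0, arithmetic, -1.0, omega))   # arithmetic - Omega >= 0
lower_ok = is_psd(comb(1.0, omega, -1.0, lower))        # Omega - lower >= 0
```

Both flags should come out true for any positive definite inputs; the check only confirms the inequalities on one sample and is no substitute for the proofs.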
Remark 4.5.
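Note that
\[ 2I - \sum_{j=1}^{n} w_j A_j^{-1} \le \Bigl[ \sum_{j=1}^{n} w_j A_j^{-1} \Bigr]^{-1}. \]
Indeed, by the arithmetic-geometric mean inequality in Lemma 2.1 (10),
\[ I = \Bigl[ \sum_{j=1}^{n} w_j A_j^{-1} \Bigr] \# \Bigl[ \sum_{j=1}^{n} w_j A_j^{-1} \Bigr]^{-1} \le \frac{1}{2} \Bigl( \sum_{j=1}^{n} w_j A_j^{-1} + \Bigl[ \sum_{j=1}^{n} w_j A_j^{-1} \Bigr]^{-1} \Bigr). \]

We give another upper bound for the Wasserstein mean different from the arithmetic mean.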
Remark 4.6.
Assume that $\sum_{j=1}^{n} w_j A_j < 2I$. Let $\Omega = \Omega(\omega; \mathbb{A})$. By Theorem 3.1 and the arithmetic-geometric mean inequality in Lemma 2.1 (10),
\[ I = \sum_{j=1}^{n} w_j (A_j \# \Omega^{-1}) \le \sum_{j=1}^{n} w_j \Bigl( \frac{A_j + \Omega^{-1}}{2} \Bigr). \]
By a simple calculation, we have $0 < 2I - \sum_{j=1}^{n} w_j A_j \le \Omega^{-1}$, and so
\[ \Omega \le \Bigl[ 2I - \sum_{j=1}^{n} w_j A_j \Bigr]^{-1}. \]
This means that $\bigl[ 2I - \sum_{j=1}^{n} w_j A_j \bigr]^{-1}$ is an upper bound for $\Omega(\omega; A_1, \dots, A_n)$.

On the other hand, note that
\[ \Bigl[ 2I - \sum_{j=1}^{n} w_j A_j \Bigr]^{-1} \ge \sum_{j=1}^{n} w_j A_j. \]
Indeed,
\[ I = \Bigl[ \sum_{j=1}^{n} w_j A_j \Bigr] \# \Bigl[ \sum_{j=1}^{n} w_j A_j \Bigr]^{-1} \le \frac{1}{2} \Bigl( \sum_{j=1}^{n} w_j A_j + \Bigl[ \sum_{j=1}^{n} w_j A_j \Bigr]^{-1} \Bigr). \]
Then $2I - \sum_{j=1}^{n} w_j A_j \le \bigl[ \sum_{j=1}^{n} w_j A_j \bigr]^{-1}$, so $\bigl[ 2I - \sum_{j=1}^{n} w_j A_j \bigr]^{-1} \ge \sum_{j=1}^{n} w_j A_j$.

5. Applications to the Lie-Trotter mean
We see in this section some applications of the lower bound of the Wasserstein mean in Theorem 4.4 to the notion of Lie-Trotter means. A weighted $n$-mean $G_n$ on $\mathbb{P}_m$ for $n \ge 2$ is a map $G_n(\omega; \cdot) : \mathbb{P}_m^n \to \mathbb{P}_m$ that is idempotent, in the sense that $G_n(\omega; X, \dots, X) = X$ for all $X \in \mathbb{P}_m$. A weighted $n$-mean $G_n(\omega; \cdot) : \mathbb{P}_m^n \to \mathbb{P}_m$ is called a multivariable Lie-Trotter mean if it is differentiable and satisfies
\[ \lim_{s \to 0} G_n(\omega; \gamma_1(s), \gamma_2(s), \dots, \gamma_n(s))^{1/s} = \exp\Bigl[ \sum_{i=1}^{n} w_i \gamma_i'(0) \Bigr], \tag{5.11} \]
where for $\epsilon > 0$, $\gamma_i : (-\epsilon, \epsilon) \to \mathbb{P}_m$ are differentiable curves with $\gamma_i(0) = I$ for all $i = 1, \dots, n$. See [7] for more details and information.

Lemma 5.1.
Let $\Omega_\omega := \Omega(\omega; \cdot) : \mathbb{P}_m^n \to \mathbb{P}_m$ be the Wasserstein mean for a given probability vector $\omega = (w_1, \dots, w_n)$. Then it is differentiable at $\mathbb{I} = (I, \dots, I)$ with
\[ D\Omega_\omega(\mathbb{I})(X_1, \dots, X_n) = \sum_{j=1}^{n} w_j X_j. \]

Proof. Let $X_1, \dots, X_n \in \mathbb{H}_m$. If $X_1 = \cdots = X_n = 0$, then the statement holds obviously. Without loss of generality, we assume that at least one of $X_1, \dots, X_n$ is not zero. Set
\[ \rho := \max_{1 \le j \le n} \sigma(X_j), \]
where $\sigma(X)$ is the spectral radius of $X$. Then $\rho > 0$. Define $f(t) = 2I - \sum_{j=1}^{n} w_j (I + t X_j)^{-1}$ on $(-\tfrac{1}{\rho}, \tfrac{1}{\rho})$. Then
\[ \lambda(I + t X_j) = 1 + t \lambda(X_j) \ge 1 - |t| |\lambda(X_j)| \ge 1 - \rho |t| > 0, \]
where $\lambda(X)$ denotes an eigenvalue of $X$. So $I + t X_j \in \mathbb{P}_m$ for any $t \in (-\tfrac{1}{\rho}, \tfrac{1}{\rho})$. Thus $f$ is well-defined in a neighborhood $(-\tfrac{1}{\rho}, \tfrac{1}{\rho})$ of $0$ and $f(0) = 2I - \sum_{j=1}^{n} w_j I^{-1} = I$. Since the derivative of the map $t \mapsto (tX + I)^{-1}$ at $t = 0$ is $-X$, we have
\[ f'(0) = \lim_{t \to 0} \frac{I - \sum_{j=1}^{n} w_j (I + t X_j)^{-1}}{t} = \lim_{t \to 0} \sum_{j=1}^{n} w_j (I + t X_j)^{-1} X_j (I + t X_j)^{-1} = \sum_{j=1}^{n} w_j X_j. \]
Then by Theorems 4.1 and 4.4,
\[ \frac{\bigl[ 2I - \sum_{j=1}^{n} w_j (I + t X_j)^{-1} \bigr] - I}{t} \le \frac{\Omega(\omega; I + t X_1, \dots, I + t X_n) - I}{t} \le \frac{\sum_{j=1}^{n} w_j (I + t X_j) - I}{t} = \sum_{j=1}^{n} w_j X_j \]
for any sufficiently small $t > 0$. Since $\Omega(\omega; I, \dots, I) = I$, we have
\[ \lim_{t \to 0^+} \frac{\Omega(\omega; I + t X_1, \dots, I + t X_n) - \Omega(\omega; I, \dots, I)}{t} = \sum_{j=1}^{n} w_j X_j. \]
Similarly, for $t < 0$,
\[ \lim_{t \to 0^-} \frac{\Omega(\omega; I + t X_1, \dots, I + t X_n) - \Omega(\omega; I, \dots, I)}{t} = \sum_{j=1}^{n} w_j X_j. \]
We conclude that $\Omega_\omega$ is differentiable at $\mathbb{I}$ with $D\Omega_\omega(\mathbb{I})(X_1, \dots, X_n) = \sum_{j=1}^{n} w_j X_j$. □

Theorem 5.2.
The Wasserstein mean is the multivariate Lie-Trotter mean, that is, for given $\omega = (w_1, \dots, w_n) \in \Delta_n$,
\[ \lim_{s \to 0} \Omega(\omega; \gamma_1(s), \gamma_2(s), \dots, \gamma_n(s))^{1/s} = \exp\Bigl( \sum_{j=1}^{n} w_j \gamma_j'(0) \Bigr), \]
where for $\epsilon > 0$, $\gamma_j : (-\epsilon, \epsilon) \to \mathbb{P}_m$ are differentiable curves with $\gamma_j(0) = I$ for all $j = 1, \dots, n$.

Proof. Let $\omega = (w_1, \dots, w_{n-1}, w_n) \in \Delta_n$ and let $\gamma_1, \dots, \gamma_n : (-\epsilon, \epsilon) \to \mathbb{P}_m$ be any differentiable curves with $\gamma_j(0) = I$ for all $j = 1, \dots, n$. Then by Theorems 4.1 and 4.4,
\[ 2I - \sum_{j=1}^{n} w_j \gamma_j(s)^{-1} \le \Omega(\omega; \gamma_1(s), \dots, \gamma_n(s)) \le \sum_{j=1}^{n} w_j \gamma_j(s). \]
For sufficiently small $|s|$ the left-hand side is positive definite. Taking logarithms and using the fact that the logarithm function is operator monotone, we have
\[ \log\Bigl[ 2I - \sum_{j=1}^{n} w_j \gamma_j(s)^{-1} \Bigr] \le \log \Omega(\omega; \gamma_1(s), \dots, \gamma_n(s)) \le \log\Bigl[ \sum_{j=1}^{n} w_j \gamma_j(s) \Bigr]. \]
For $s > 0$, multiplying all terms by $1/s$, we get
\[ \frac{1}{s} \log\Bigl[ 2I - \sum_{j=1}^{n} w_j \gamma_j(s)^{-1} \Bigr] \le \log \Omega(\omega; \gamma_1(s), \dots, \gamma_n(s))^{1/s} \le \frac{1}{s} \log\Bigl[ \sum_{j=1}^{n} w_j \gamma_j(s) \Bigr]. \]
Taking the limit $s \to 0^+$ and using l'Hôpital's theorem, we obtain
\[ \lim_{s \to 0^+} \log \Omega(\omega; \gamma_1(s), \dots, \gamma_n(s))^{1/s} = \sum_{j=1}^{n} w_j \gamma_j'(0). \]
Since the logarithm map $\log : \mathbb{P}_m \to \mathbb{H}_m$ is diffeomorphic,
\[ \lim_{s \to 0^+} \Omega(\omega; \gamma_1(s), \dots, \gamma_n(s))^{1/s} = \exp\Bigl( \sum_{j=1}^{n} w_j \gamma_j'(0) \Bigr). \]
For $s < 0$, we obtain
\[ \lim_{s \to 0^-} \Omega(\omega; \gamma_1(s), \dots, \gamma_n(s))^{1/s} = \exp\Bigl( \sum_{j=1}^{n} w_j \gamma_j'(0) \Bigr) \]
by similar steps. □

Taking $\gamma_i(s) = A_i^{s}$ for each $i$ and some $A_i \in \mathbb{P}_m$, we obtain from Theorem 5.2

Corollary 5.3.
Let $A_1, \dots, A_n \in \mathbb{P}_m$ and $\omega = (w_1, \dots, w_n) \in \Delta_n$. Then
\[ \lim_{s \to 0} \Omega(\omega; A_1^{s}, \dots, A_n^{s})^{1/s} = \exp\Bigl[ \sum_{i=1}^{n} w_i \log A_i \Bigr]. \]

6. Final remarks
It is a natural question whether the Wasserstein mean can be defined on the setting $\mathbb{P}$ of positive definite operators. Since one cannot have the Wasserstein metric on $\mathbb{P}$, the definition (3.8) may not be available. One possible approach to define the operator Wasserstein mean is to show the existence and uniqueness of the solution of the equation (3.9). On the other hand, one cannot find the explicit form of the solution of the nonlinear equation (3.9), but we have seen that the solution of (3.9) for two positive definite matrices $A$ and $B$ coincides with the geodesic $\gamma(t) = A \diamond_t B$ in (2.6) with respect to the Wasserstein metric. We directly solve the nonlinear equation (3.9) for $n = 2$ by using the properties of the geometric mean of positive definite operators.

For positive definite operators $A, B \in \mathbb{P}$ and $t \in [0, 1]$ the weighted geometric mean of $A$ and $B$ is defined by
\[ A \#_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^{t} A^{1/2}. \]
Note that $A \# B = A \#_{1/2} B$ is the unique positive definite solution $X \in \mathbb{P}$ of the Riccati equation $X A^{-1} X = B$. Moreover, it satisfies most of the properties in Lemma 2.1, but we list some of them that are useful for our goal. See [4, 8, 9].

Lemma 6.1.
Let $A, B, C, D \in \mathbb{P}$ and let $t \in [0, 1]$. Then the following are satisfied.
(1) $A \#_t B = B \#_{1-t} A$.
(2) $X (A \#_t B) X^* = (X A X^*) \#_t (X B X^*)$ for any invertible operator $X$.
(3) $(A \#_t B)^{-1} = A^{-1} \#_t B^{-1}$.

Theorem 6.2.
Let $A, B \in \mathbb{P}$ and $t \in [0, 1]$. Then the nonlinear equation
\[ I = (1-t)(A \# X^{-1}) + t (B \# X^{-1}) \tag{6.12} \]
has a unique positive definite solution $X = A \diamond_t B$.

Proof. Pre- and post-multiplying all terms by $A^{-1/2}$ for $A > 0$, we get
\[ A^{-1} = (1-t)(A^{-1/2} X^{-1} A^{-1/2})^{1/2} + t \bigl( A^{-1/2} B A^{-1/2} \# A^{-1/2} X^{-1} A^{-1/2} \bigr). \]
Let $Y = A^{-1/2} X^{-1} A^{-1/2}$ and $Z = A^{-1/2} B A^{-1/2}$. Then we have
\[ A^{-1} = (1-t) Y^{1/2} + t (Z \# Y). \]
By using the Riccati equation, we get
\[ \frac{1}{t^2} \bigl[ A^{-1} - (1-t) Y^{1/2} \bigr] Y^{-1} \bigl[ A^{-1} - (1-t) Y^{1/2} \bigr] = Z. \]
Pre- and post-multiplying all terms by $A$ for $A > 0$, we get
\[ \bigl[ Y^{-1/2} - (1-t) A \bigr]^{2} = t^2 A Z A. \]
Taking the square root on both sides, we obtain $Y^{-1/2} = (1-t) A + t (A Z A)^{1/2}$. By the definitions of $Y$ and $Z$, we have
\[ (A^{1/2} X A^{1/2})^{1/2} = (1-t) A + t (A^{1/2} B A^{1/2})^{1/2}. \]
Squaring both sides and pre- and post-multiplying all terms by $A^{-1/2}$ for $A > 0$, we obtain
\[ X = A^{-1/2} \bigl[ (1-t) A + t (A^{1/2} B A^{1/2})^{1/2} \bigr]^{2} A^{-1/2} = A \diamond_t B. \]
□
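Theorem 6.2 can be sanity-checked numerically in the matrix case. The sketch below is an illustration under stated assumptions rather than part of the paper: it restricts to real symmetric $2 \times 2$ positive definite matrices, evaluates $X = A \diamond_t B$ by the closed form at the end of the proof, and measures how far $(1-t)(A \# X^{-1}) + t(B \# X^{-1})$ is from the identity. The helper names (`mul`, `comb`, `inv`, `sqrtm`, `geometric_mean`, `wasserstein_geodesic`, `residual_6_12`) and the sample matrices are hypothetical.

```python
import math

def mul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comb(a, P, b, Q):
    """Linear combination a*P + b*Q of two 2x2 matrices."""
    return [[a * P[i][j] + b * Q[i][j] for j in range(2)] for i in range(2)]

def inv(P):
    """Inverse of an invertible 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]

def sqrtm(P):
    """Square root of a 2x2 positive definite matrix (Cayley-Hamilton):
    P^{1/2} = (P + sqrt(det P) I) / sqrt(tr P + 2 sqrt(det P))."""
    s = math.sqrt(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    r = math.sqrt(P[0][0] + P[1][1] + 2.0 * s)
    return [[(P[0][0] + s) / r, P[0][1] / r], [P[1][0] / r, (P[1][1] + s) / r]]

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    Ah = sqrtm(A)
    Aih = inv(Ah)
    return mul(mul(Ah, sqrtm(mul(mul(Aih, B), Aih))), Ah)

def wasserstein_geodesic(A, B, t):
    """A <>_t B = A^{-1/2} [(1-t) A + t (A^{1/2} B A^{1/2})^{1/2}]^2 A^{-1/2}."""
    Ah = sqrtm(A)
    Aih = inv(Ah)
    S = comb(1.0 - t, A, t, sqrtm(mul(mul(Ah, B), Ah)))
    return mul(mul(Aih, mul(S, S)), Aih)

def residual_6_12(A, B, t):
    """Largest entry of |(1-t)(A # X^{-1}) + t(B # X^{-1}) - I| at X = A <>_t B."""
    Xi = inv(wasserstein_geodesic(A, B, t))
    rhs = comb(1.0 - t, geometric_mean(A, Xi), t, geometric_mean(B, Xi))
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return max(abs(rhs[i][j] - identity[i][j])
               for i in range(2) for j in range(2))

A = [[2.0, 1.0], [1.0, 2.0]]   # sample positive definite matrices (assumed)
B = [[1.0, 0.0], [0.0, 3.0]]
residuals = [residual_6_12(A, B, t) for t in (0.1, 0.3, 0.5, 0.9)]
```

The residuals should be at the level of floating-point rounding. The check says nothing about uniqueness or about the operator setting, which is where the theorem does its real work.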
Open question. For positive definite operators $A_1, \dots, A_n$ and a positive probability vector $(w_1, \dots, w_n)$, does the nonlinear equation
\[ I = \sum_{j=1}^{n} w_j (A_j \# X^{-1}) \]
have a unique positive definite solution $X$ in the setting $\mathbb{P}$ of positive definite operators? This is an interesting and challenging problem, and Theorem 6.2 gives us a positive answer for $n = 2$.

Acknowledgement
The work of S. Kim was supported by the National Research Foundation of Korea (NRF)grant funded by the Korea government (MIST) (No. NRF-2018R1C1B6001394).
References

[1] M. Agueh and G. Carlier, Barycenters in the Wasserstein space, SIAM J. Math. Anal. (2011), 904-924.
[2] R. Bhatia, Positive Definite Matrices, Princeton Series in Applied Mathematics, Princeton, 2007.
[3] R. Bhatia, T. Jain and Y. Lim, On the Bures-Wasserstein distance between positive definite matrices, to appear in Expositiones Mathematicae.
[4] G. Corach, H. Porta, and L. Recht, Convexity of the geodesic distance on spaces of positive operators, Illinois J. Math. (1994), 87-94.
[5] S. Gallot, D. Hulin, and J. Lafontaine, Riemannian Geometry, Springer, 2004.
[6] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd edition, Cambridge University Press, 2013.
[7] J. Hwang and S. Kim, Lie-Trotter means of positive definite operators, Linear Algebra Appl. (2017), 268-280.
[8] F. Kubo and T. Ando, Means of positive linear operators, Math. Ann. (1979/80), no. 3, 205-224.
[9] J. Lawson and Y. Lim, Metric convexity of symmetric cones, Osaka J. Math. (2007), 795-816.

Jinmi Hwang, Department of Mathematics, Chungbuk National University, Cheongju 28644, Korea

Sejong Kim, Department of Mathematics, Chungbuk National University, Cheongju 28644, Korea