Geometric mean of probability measures and geodesics of Fisher information metric
Mitsuhiro Itoh∗ and Hiroyasu Satoh†

∗ Institute of Mathematics, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8577, Japan
† Liberal Arts and Sciences, Nippon Institute of Technology, 4-1 Gakuendai, Miyashiro-machi, Minamisaitama-gun, Saitama 345-8501, Japan

March 18, 2019

Abstract

The space of all probability measures having positive density function on a connected compact smooth manifold $M$, denoted by $\mathcal{P}(M)$, carries the Fisher information metric $G$. We define the geometric mean of probability measures, with whose aid we investigate the information geometry of $\mathcal{P}(M)$ equipped with $G$. We show that a geodesic segment joining arbitrary probability measures $\mu_1$ and $\mu_2$ is expressed by using the normalized geometric mean of its endpoints. As an application, we show that any two points of $\mathcal{P}(M)$ can be joined by a unique geodesic. Moreover, we prove that the function $\ell$ defined by $\ell(\mu_1,\mu_2) := 2\arccos \int_M \sqrt{p_1 p_2}\, d\lambda$, $\mu_i = p_i\lambda$, $i = 1, 2$, gives the Riemannian distance function on $\mathcal{P}(M)$. It is shown that geodesics are all minimal.

1 Introduction

For positive numbers $a$ and $b$, $\sqrt{ab}$ is called the geometric mean of $a$ and $b$. The geometric mean of probability measures is defined similarly: for two probability measures with density functions $p_1$ and $p_2$, we define their geometric mean by $\sqrt{p_1 p_2}$. By normalizing it, we obtain a probability measure.

In this paper we study, from the viewpoint of the normalized geometric mean, the information geometry of the space $\mathcal{P}(M)$ of probability measures on a manifold $M$, which is equipped with the Fisher information metric $G$. With the aid of the normalized geometric mean, we give a formula describing geodesic segments and then exhibit an exact form of the distance function for the space of probability measures with respect to the metric $G$.

Let $M$ be a connected, compact smooth manifold with a smooth probability measure $\lambda$. Let $\mathcal{P}(M)$ be the space of probability measures on $M$ which are absolutely continuous with respect to the measure $\lambda$ and have positive continuous density function:
\[ \mathcal{P}(M) = \left\{ \mu \;\middle|\; \mu \text{ is a measure on } M,\ \int_M d\mu = 1,\ \mu \ll \lambda,\ \frac{d\mu}{d\lambda} \in C^0_+(M) \right\}. \tag{1.1} \]
Here $d\mu/d\lambda$ is the Radon-Nikodym derivative of $\mu$ with respect to $\lambda$ and $C^0_+(M)$ denotes the set of all positive continuous functions on $M$. The geometric mean of $\mu_1 = p_1\lambda$, $\mu_2 = p_2\lambda \in \mathcal{P}(M)$ is defined by $\sqrt{p_1 p_2}\,\lambda$. By normalizing the geometric mean, we give the definition of the normalized geometric mean.

Definition 1.1.
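The normalized geometric mean is a map $\varphi : \mathcal{P}(M) \times \mathcal{P}(M) \to \mathcal{P}(M)$ defined by
\[ \varphi(\mu_1,\mu_2) = \left( \int_{x \in M} \sqrt{\frac{d\mu_2}{d\mu_1}(x)}\; d\mu_1(x) \right)^{-1} \sqrt{\frac{d\mu_2}{d\mu_1}}\; \mu_1. \tag{1.2} \]
We remark that $\sqrt{d\mu_2/d\mu_1}\,\mu_1 = \sqrt{p_1 p_2}\,\lambda$ for $\mu_i = p_i\lambda$, $i = 1, 2$, so that $\varphi(\mu_1,\mu_2) = \varphi(\mu_2,\mu_1)$ and $\varphi(\mu,\mu) = \mu$.

Definition 1.2.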
Let $\ell : \mathcal{P}(M) \times \mathcal{P}(M) \to [0, \pi)$ be a function defined by
\[ \ell(\mu_1,\mu_2) = 2 \arccos \left( \int_{x \in M} \sqrt{\frac{d\mu_2}{d\mu_1}(x)}\; d\mu_1(x) \right). \tag{1.3} \]
The aim of this paper is to present a geometric characterization of the map $\varphi$ and the function $\ell$ from the information geometry of $\mathcal{P}(M)$.

We mention here quantities which are closely related to $\varphi$ and $\ell$. The integral
\[ C_H(\mu_1,\mu_2) := \int_{x \in M} \sqrt{\frac{d\mu_1}{d\lambda}(x)}\, \sqrt{\frac{d\mu_2}{d\lambda}(x)}\; d\lambda(x) = \int_{x \in M} \sqrt{\frac{d\mu_2}{d\mu_1}(x)}\; d\mu_1(x) \]
is called the Hellinger integral or the Hellinger coefficient; it measures the separation of two probability measures. The function $\ell$ of Definition 1.2 is then expressed as $\ell(\mu_1,\mu_2) = 2 \arccos C_H(\mu_1,\mu_2)$. The quantity
\[ d_H(\mu_1,\mu_2) := \left( \int_M \left( \sqrt{\frac{d\mu_1}{d\lambda}} - \sqrt{\frac{d\mu_2}{d\lambda}} \right)^{2} d\lambda \right)^{1/2} = \sqrt{2\,(1 - C_H(\mu_1,\mu_2))}, \]
called the Hellinger distance [22], is characterized via the 0-divergence (see [1, p.58]).

The function $\ell$ provides a Riemannian distance function with respect to a certain Riemannian metric, the Fisher information metric, as stated in Theorem 1.8.

We regard the space $\mathcal{P}(M)$ as an infinite dimensional manifold whose tangent space $T_\mu\mathcal{P}(M)$ at $\mu \in \mathcal{P}(M)$ is identified with the vector space
\[ \left\{ \tau \;\middle|\; \tau \text{ is a signed measure on } M,\ \int_M d\tau = 0,\ \frac{d\tau}{d\mu} \in C^0(M) \right\}. \tag{1.4} \]
T. Friedrich [12] defines for each $\mu \in \mathcal{P}(M)$ an inner product $G_\mu$ of $\tau_1, \tau_2 \in T_\mu\mathcal{P}(M)$ by
\[ G_\mu(\tau_1,\tau_2) = \int_M \frac{d\tau_1}{d\mu}\, \frac{d\tau_2}{d\mu}\; d\mu, \tag{1.5} \]
which is a natural extension of the Fisher information matrix for a statistical model in mathematical statistics and information theory (see [2]). We call the map $\mu \mapsto G_\mu$ the Fisher information metric on $\mathcal{P}(M)$. The metric $G$ is invariant under the push-forward transformation of probability measures, as is easily observed (see [12, Satz 1]). Namely, any homeomorphism of $M$ is an isometry with respect to the metric $G$ via the push-forward transformation of $\mathcal{P}(M)$. Remark that the group of homeomorphisms of a compact manifold $M$ acts on $\mathcal{P}(M)$ transitively via the push-forward, that is, for any $\mu \in \mathcal{P}(M)$ there exists a homeomorphism $\Phi$ of $M$ such that $\Phi_\sharp \lambda = \mu$, where $\Phi_\sharp$ means the push-forward; see [11, 24]. This fact tells us that the space $\mathcal{P}(M)$ consisting of probability measures with continuous density function admits a structure of a Riemannian homogeneous space. Refer to [4] for the uniqueness of the Fisher metric on the space of probability measures having smooth density function under push-forward invariance of diffeomorphisms.
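As an informal numerical illustration (not part of the original argument), the quantities introduced so far can be evaluated in a finite toy setting. The sketch below assumes that $M$ is replaced by a finite set with $\lambda$ the counting measure, so that measures become positive probability vectors and integrals become sums; the function names are hypothetical and chosen only for this example. It checks the identity $d_H^2 = 2(1 - C_H)$, which follows by expanding the square.

```python
import math

# Toy model (an assumption of this sketch): M is a finite set, lambda is the
# counting measure, and a measure in P(M) is a vector of positive weights
# summing to 1. Integrals over M become finite sums.

def hellinger_coefficient(p, q):
    """C_H = sum_i sqrt(p_i * q_i), the finite analogue of the Hellinger integral."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def ell(p, q):
    """ell = 2 * arccos(C_H); the clamp guards against rounding slightly above 1."""
    return 2.0 * math.acos(min(1.0, hellinger_coefficient(p, q)))

def hellinger_distance(p, q):
    """d_H = ( sum_i (sqrt(p_i) - sqrt(q_i))^2 )^(1/2)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def fisher_inner_product(p, t1, t2):
    """G_mu(tau1, tau2) = sum_i (t1_i / p_i) * (t2_i / p_i) * p_i for mu = p."""
    return sum(a * b / m for a, b, m in zip(t1, t2, p))

p = [0.2, 0.3, 0.5]
q = [0.5, 0.25, 0.25]
c = hellinger_coefficient(p, q)
# The last two printed numbers should agree: d_H^2 and 2 * (1 - C_H).
print(c, ell(p, q), hellinger_distance(p, q) ** 2, 2.0 * (1.0 - c))
```

In this finite model the tangent vectors are the vectors whose entries sum to zero, and `fisher_inner_product` is the classical Fisher information inner product on the open simplex.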
Notice that the space of probability measures with smooth density function is a dense subset of $\mathcal{P}(M)$.

An embedding $\rho : \mathcal{P}(M) \to L^2(M,\lambda)$; $\mu = p\lambda \mapsto \sqrt{p}$ provides the space $\mathcal{P}(M)$ with an $L^2$-topology. Here $L^2(M,\lambda)$ is the $L^2$-space of square-integrable functions on $M$ of finite norm $\|\cdot\|_{L^2}$, where the norm is defined by $\|f\|_{L^2} = \left( \int_M |f|^2\, d\lambda \right)^{1/2}$. Then $\mathcal{P}(M)$ is embedded onto the subset $\rho(\mathcal{P}(M)) \subset \{ f \in L^2(M,\lambda) \mid \|f\|_{L^2} = 1 \}$ of $L^2(M,\lambda)$. We equip each $\mu = p\lambda \in \mathcal{P}(M)$ with an $\varepsilon$-neighborhood of $\mu$ in the $\|\sqrt{\cdot} - \sqrt{\cdot}\|_{L^2}$-topology as $\{ \mu' = p'\lambda \in \mathcal{P}(M) \mid \|\sqrt{p'} - \sqrt{p}\|_{L^2} < \varepsilon \}$ for $\varepsilon > 0$. Notice that $\mathcal{P}(M)$ also admits the $C^0$-topology with the norm $\|p\|_{C^0} := \sup_{x \in M} |p(x)|$ for $\mu = p\lambda$. However, in this paper we employ mainly the $\|\sqrt{\cdot} - \sqrt{\cdot}\|_{L^2}$-topology. The map $\varphi$ and the function $\ell$ are continuous with respect to the product topology of $\mathcal{P}(M) \times \mathcal{P}(M)$ induced from the $\|\sqrt{\cdot} - \sqrt{\cdot}\|_{L^2}$-topology, as shown in section 3. See section 5 for an appropriate smooth structure on $\mathcal{P}(M)$, given in [25]. The tangent space $T_\mu\mathcal{P}(M)$ is an infinite dimensional vector space with the inner product $G_\mu$. The vector space $T_\mu\mathcal{P}(M)$ is not a Hilbert space, since the completion of the space $C^0(M)$ is not itself $C^0(M)$, so that $\mathcal{P}(M)$ is not a Riemannian-Hilbert manifold. Remark that the pullback of the $L^2$-inner product $(\cdot,\cdot)_{L^2}$, given by $(f, f_1)_{L^2} = \int_{x \in M} f(x) f_1(x)\, d\lambda(x)$, via $\rho$ coincides with $\tfrac14 G(\cdot,\cdot)$, since the differential of $\rho$ at $\mu = p\lambda$ sends $\tau = q\lambda$ to $q/(2\sqrt{p})$.

Remark 1.3.
The compactness of the manifold $M$ is assumed throughout this paper. When $M$ is non-compact, the argument of this paper remains almost valid after a minor change: $\mathcal{P}(M)$ is taken to be the space of all probability measures $\mu = p(x)\lambda$, $\mu \ll \lambda$, such that $\mu$ is connected with $\lambda$ by an open mixture arc (for the notion of open mixture arc see subsection 5.2 and [8, 30]) with $p = p(x) \in C^0_+(M)$. Then the $\|\sqrt{\cdot} - \sqrt{\cdot}\|_{L^2}$-topology is introduced on $\mathcal{P}(M)$ in the same way as in the compact case. The tangent space $T_\mu\mathcal{P}(M)$ is the vector space of measures $\tau = q(x)\lambda$ with $q \in C^0(M)$ such that there exists an $\varepsilon > 0$ for which $\mu + t\tau = (p + tq)\lambda$ defines a probability measure in $\mathcal{P}(M)$ for any $t \in (-\varepsilon, \varepsilon)$. Sections 2 and 3 may be valid even for a non-compact manifold $M$. We will give a relevant study of the non-compact case in the future.

Let $\nabla$ be the Levi-Civita connection of the metric $G$. Then $\nabla$ is given by
\[ \nabla_{\tau_1} \tau_2\,(\mu) = -\frac{1}{2} \left( \frac{d\tau_1}{d\mu}\, \frac{d\tau_2}{d\mu} - \int_M \frac{d\tau_1}{d\mu}\, \frac{d\tau_2}{d\mu}\; d\mu \right) \mu \tag{1.6} \]
for any $\tau_1, \tau_2 \in T_\mu\mathcal{P}(M)$ (see [12, p.276]). T. Friedrich computes the Riemannian curvature tensor of $G$ by using (1.6) and shows that the space $\mathcal{P}(M)$ equipped with the metric $G$ has constant sectional curvature $+1/4$. He also derives the condition for a curve in $\mathcal{P}(M)$ to be a geodesic with respect to $G$ for given initial data. In fact, let $\gamma : I \to \mathcal{P}(M)$ ($I \subset \mathbb{R}$ an open interval, $0 \in I$) be a geodesic, parametrized by arc-length, with initial data $\gamma(0) = p_0\lambda$, $\dot\gamma(0) = \dot p_0\lambda$, $|\dot\gamma(0)|_\mu = 1$. Then the density function $p_t = p_t(x)$ of $\gamma(t)$ with respect to $\lambda$ has the form
\[ p_t(x) = \frac{1}{1 + \tan^2(t/2)} \left\{ p_0(x) + 2 \tan\!\left(\frac{t}{2}\right) \dot p_0(x) + \tan^2\!\left(\frac{t}{2}\right) \frac{\dot p_0(x)^2}{p_0(x)} \right\}. \tag{1.7} \]
From this formula any geodesic of $\mathcal{P}(M)$ is seen to be periodic with period $2\pi$. The measure $\gamma(t) = p_t\lambda$ is indeed a probability measure for any $t$. However, it is not determined from (1.7) whether $\gamma(t) = p_t\lambda$ belongs to $\mathcal{P}(M)$.
It is also not mentioned in [12] whether $p_t \in C^0_+(M)$ at any $t$ for which $\gamma(t)$ is defined. However, this is completely solved for a geodesic segment, with the aid of the density free expression for geodesics together with the notion of normalized geometric mean.

Every geodesic is incomplete, as we see from (1.7) that $\gamma(\pm\pi) \notin \mathcal{P}(M)$, because $\gamma(t)$ at $t = \pm\pi$ has the form
\[ \gamma(\pm\pi) = \frac{\dot p_0(x)^2}{p_0(x)}\, \lambda \quad\text{and}\quad \int_M \dot p_0(x)\, d\lambda(x) = 0, \]
so that the continuous function $\dot p_0(x)$ necessarily admits a zero in $M$. It is of interest whether an interval $I \subset (-\pi, \pi)$, on which the geodesic $\gamma$ is defined, can be extended to a maximal one.

We emphasize that by relaxing the continuity of the density function of probability measures, the situation for geodesics changes drastically, as will be seen in Proposition 2.5; for example, uniqueness of the geodesic segment for given endpoints collapses.

In [15] we obtain from (1.7) a density free description of a geodesic in $\mathcal{P}(M)$, with whose aid we derive an explicit formula representing a geodesic segment $\gamma(t)$ for two given endpoints $\mu, \mu_1 \in \mathcal{P}(M)$. By using the normalized geometric mean, we obtain the following theorem stating uniqueness and existence of a geodesic segment.

Theorem 1.4.
Let $\mu, \mu_1 \in \mathcal{P}(M)$ be arbitrary distinct probability measures. Then, there exists a unique geodesic $\gamma(t)$ with respect to $G$ parametrized by arc-length, joining $\mu$ and $\mu_1$, and being expressed in the form
\[ \gamma(t) = a_1(t)\,\mu + a_2(t)\,\mu_1 + a_3(t)\,\varphi(\mu,\mu_1), \qquad t \in [0, l]. \tag{1.8} \]
Here $\gamma(l) = \mu_1$, $l = \ell(\mu,\mu_1)$ and $a_i(t)$, $i = 1, 2, 3$, are functions of $t$ satisfying $a_1(t) + a_2(t) + a_3(t) = 1$, which are given by
\[ a_1(t) = \left( \frac{\sin((l-t)/2)}{\sin(l/2)} \right)^{2}, \qquad a_2(t) = \left( \frac{\sin(t/2)}{\sin(l/2)} \right)^{2}, \qquad a_3(t) = 2 \cos(l/2) \cdot \frac{\sin(t/2) \cdot \sin((l-t)/2)}{\sin^2(l/2)}. \]

The uniqueness of a geodesic segment follows from the fact that all probability measures in $\mathcal{P}(M)$ and tangent vectors have continuous density function on $M$.

From Theorem 1.4, we find the following properties of geodesics in $\mathcal{P}(M)$.

Theorem 1.5. Let $\gamma = \gamma(t)$, $t \in [0, l]$, be a geodesic segment joining distinct probability measures $\mu, \mu_1 \in \mathcal{P}(M)$ such that $\gamma(0) = \mu$, $\gamma(l) = \mu_1$. Then,
(i) $\gamma(t)$ belongs to $\mathcal{P}(M)$ at any $t \in [0, l]$,
(ii) the geodesic segment $\gamma : [0, l] \to \mathcal{P}(M)$ is a curve lying on the plane spanned by $\mu$, $\mu_1$ and their normalized geometric mean $\varphi(\mu,\mu_1)$,
(iii) the velocity vectors of the geodesic segment at $t = 0$ and $t = l$ are respectively given by $\dot\gamma(0) = \cot(l/2)\,(\varphi(\mu,\mu_1) - \mu)$ and $\dot\gamma(l) = -\cot(l/2)\,(\varphi(\mu,\mu_1) - \mu_1)$. This implies that the two tangent lines defined at the endpoints of the geodesic segment always intersect each other at $\varphi(\mu,\mu_1)$ (see Remark 2.7), and
(iv) the midpoint of the geodesic segment $\gamma(t)$, $t \in [0, l]$, is represented by
\[ \gamma(l/2) = \frac{1}{4 \cos^2(l/4)} \left( 1 + \sqrt{\frac{d\mu_1}{d\mu}} \right)^{2} \mu. \tag{1.9} \]

The probability measure on the right hand side is viewed as the normalized $(1/2)$-power mean of $\mu, \mu_1$. Here the normalized $\alpha$-power mean $\varphi^{(\alpha)}(\mu,\mu_1)$, $\alpha \in \mathbb{R}$, of probability measures $\mu, \mu_1$ is defined by
\[ \varphi^{(\alpha)}(\mu,\mu_1) = \left[ \int_M \left\{ \frac{1}{2} \left( 1 + \left( \frac{d\mu_1}{d\mu} \right)^{\alpha} \right) \right\}^{1/\alpha} d\mu \right]^{-1} \left\{ \frac{1}{2} \left( 1 + \left( \frac{d\mu_1}{d\mu} \right)^{\alpha} \right) \right\}^{1/\alpha} \mu. \tag{1.10} \]
The normalized $\alpha$-power mean is derived from the $\alpha$-power mean of two positive numbers $a$ and $b$ defined by $\left( \frac{a^\alpha + b^\alpha}{2} \right)^{1/\alpha}$ (see [7]). In particular, the arithmetic mean, the geometric mean and the harmonic mean are $\alpha$-power means with $\alpha = +1, 0, -1$, respectively (the case $\alpha = 0$ being understood as the limit $\alpha \to 0$).
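The formulas (1.8), (1.9) and (1.10) can be checked numerically in a finite toy model. The following is a minimal sketch, assuming that $M$ is replaced by a finite set with counting measure so that $\mu$, $\mu_1$ become positive probability vectors `p`, `q`; the helper names are hypothetical. It evaluates the geodesic segment through the coefficients $a_1, a_2, a_3$ of Theorem 1.4 and compares its midpoint with the normalized $(1/2)$-power mean.

```python
import math

# Toy model (an assumption of this sketch): M is a finite set with counting
# measure; mu and mu_1 are positive probability vectors p and q, p != q.

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def ell(p, q):
    """ell(mu, mu_1) = 2 * arccos(sum_i sqrt(p_i * q_i))."""
    c = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, c))

def phi(p, q):
    """Normalized geometric mean of p and q."""
    return normalize([math.sqrt(a * b) for a, b in zip(p, q)])

def geodesic(p, q, t):
    """Formula (1.8): a1(t) * mu + a2(t) * mu_1 + a3(t) * phi(mu, mu_1)."""
    l = ell(p, q)
    s = math.sin(l / 2.0)
    a1 = (math.sin((l - t) / 2.0) / s) ** 2
    a2 = (math.sin(t / 2.0) / s) ** 2
    a3 = 2.0 * math.cos(l / 2.0) * math.sin(t / 2.0) * math.sin((l - t) / 2.0) / s ** 2
    g = phi(p, q)
    return [a1 * x + a2 * y + a3 * z for x, y, z in zip(p, q, g)]

def power_mean(p, q, alpha):
    """Normalized alpha-power mean (1.10), alpha != 0; d(mu_1)/d(mu) is q_i / p_i."""
    return normalize([((1.0 + (b / a) ** alpha) / 2.0) ** (1.0 / alpha) * a
                      for a, b in zip(p, q)])

p = [0.2, 0.3, 0.5]
q = [0.5, 0.25, 0.25]
l = ell(p, q)
midpoint = geodesic(p, q, l / 2.0)
# The two printed vectors should agree up to rounding, as in Theorem 1.5 (iv).
print(midpoint)
print(power_mean(p, q, 0.5))
```

With `alpha = 1.0` the same `power_mean` returns the arithmetic mean of `p` and `q`, which in general differs from the geodesic midpoint.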
Remark 1.6.
A. Ohara considers in [23] operator means on a symmetric cone $\Omega$ and a dualistic structure naturally introduced on it, i.e., a Riemannian metric $g$ on $\Omega$ together with affine connections $(\nabla, \nabla^*)$ adjoint to each other with respect to $g$. In particular, he constructs a family of affine connections $\{\nabla^{(\alpha)}\}$ such that $\nabla^{(-\alpha)}$ is the dual connection of $\nabla^{(\alpha)}$ and $\nabla^{(0)}$ is the Levi-Civita connection of $g$, and shows that the midpoint of a $\nabla^{(\alpha)}$-geodesic segment is the $\alpha$-power mean of its endpoints. Theorem 1.5 (iv) is inspired by his consideration.

The reader may question the difference between Ohara's results and Theorem 1.5 (iv), because Ohara asserts that the midpoint of a $\nabla^{(\alpha)}$-geodesic segment is characterized as the $\alpha$-power mean, whereas we assert that the midpoint of a $\nabla^{(0)}$-geodesic segment is characterized as the $(1/2)$-power mean. Ohara's structure, however, consists of $\alpha$-connections induced by a certain potential function. In this way, the structure which we treat is different from the structure Ohara considers.

We are able to define $\alpha$-connections on $\mathcal{P}(M)$ similarly, which also play a significant role in information geometry, and we obtain in a subsequent paper a certain relation between the midpoint of a geodesic segment of an $\alpha$-connection and the normalized $\alpha$-power mean of its endpoints.

Remark 1.7.
The authors considered in [15] a Hadamard manifold $X$, that is, a simply connected, complete Riemannian manifold having non-positive curvature, and the space $\mathcal{P}(\partial X)$ of probability measures defined on the ideal boundary $\partial X$ of $X$. Under certain assumptions, we can define a map $\mathrm{bar} : \mathcal{P}(\partial X) \to X$, called the barycenter map, which assigns to $\mu$ a critical point of the function $B_\mu : X \to \mathbb{R}$ given by $B_\mu(x) = \int_{\theta \in \partial X} B_\theta(x)\, d\mu(\theta)$, where $B_\theta(x)$ is the Busemann function associated with $\theta \in \partial X$, geometrically defined on a Hadamard manifold. The barycenter map plays an essential role in the proof of Mostow's rigidity theorem given by G. Besson et al. [5], following the idea of Douady and Earle [10]. In [15, Theorem 5], the authors show that the map $\mathrm{bar} : \mathcal{P}(\partial X) \to X$ is an onto fibration and then investigate certain conditions for a geodesic segment of $\mathcal{P}(\partial X)$ under which the endpoints of the geodesic segment are contained in a common fiber $\mathrm{bar}^{-1}(x)$, $x \in X$. For other directions in the geometry of $\mathcal{P}(\partial X)$ with respect to the Fisher information metric refer to [19, 18, 14].

The following theorem indicates that the function $\ell$ defined in (1.3) is actually the Riemannian distance function of the space $\mathcal{P}(M)$.

Theorem 1.8. $\ell(\mu,\mu_1)$ gives the Riemannian distance between $\mu$ and $\mu_1$ with respect to the Fisher information metric $G$.

This theorem is verified with the aid of three propositions familiar from finite dimensional Riemannian geometry: the Gauss lemma, the existence theorem of a totally normal neighborhood and the length minimizing property of geodesics, cf. [9, Chap. 3].

Remark 1.9.
T. Friedrich also stated that $\ell(\mu_1,\mu_2)$ is the Riemannian distance between $\mu_1$ and $\mu_2$, but without a proof (see [12, p.279, Bemerkung]).

From Theorem 1.8, the Riemannian distance satisfies $\ell(\mu_1,\mu_2) < \pi$ for all $\mu_1, \mu_2 \in \mathcal{P}(M)$. Therefore the diameter $D$ of $\mathcal{P}(M)$ with respect to the metric $G$ fulfills $D \le \pi$. The diameter is here defined by $D = \sup \{ \ell(\mu_1,\mu_2) \mid \mu_1, \mu_2 \in \mathcal{P}(M) \}$.

Theorem 1.10.
The diameter $D$ of $\mathcal{P}(M)$ with respect to the metric $G$ satisfies $D = \pi$.

This theorem can be verified by applying the parametrix of the heat kernel of a compact smooth Riemannian manifold $M$. For the details, refer to [16].

Now we briefly describe the development of information geometry and its topics related to this paper. Information geometry, the geometry of a space of probability distributions called a statistical model, began with geometrical considerations of statistical estimation. C. R. Rao [28] proposed defining a metric based on the Fisher matrix, and S. Amari gave a modern differential geometric framework, i.e., a Riemannian metric and affine connections, for this idea (see [2]). Although information geometry developed afterwards, its subject was only families of probability distributions whose parameter space has finite dimension. Since the 1990s, the information geometry of the infinite dimensional case, i.e., the geometric structure on the space of all probability distributions, has begun to be considered. In 1991, T. Friedrich extended the Fisher metric to an infinite dimensional statistical model and investigated properties of Riemannian geometric nature, for example the Riemannian curvature tensor and geodesics, and symplectic structures, without any argument on the coordinate structure of the space of probability measures. In 1995, G. Pistone and C. Sempi [27] defined the topology of the space of all positive densities of the probability measures, which is a subset of the $L^1$-space, as a Banach manifold whose model space is an Orlicz space. The geometrical and analytical properties of the mixture model $\mathcal{M}(\mu)$ and the exponential model $\mathcal{E}(\mu)$ have been studied by Pistone and his coauthors (for example, see [25, 26, 8, 13]). See also [30].

Our argument is based on Friedrich's framework. We can develop information geometry for a more general setting of probability spaces with the aid of the research of Pistone-Sempi (for their study refer to [27] and [8]).
In the final section we will outline their argument by means of Orlicz spaces. We show further in Proposition 5.5 that the Fisher information metric $G$, given at (1.5), can be represented as the covariance of random variables in a local chart representation, within the framework of Pistone-Sempi.

This paper is organized as follows. In section 2, we outline the derivation of a geodesic $\gamma(t)$ for given initial data $\gamma(0) = \mu$ and $\dot\gamma(0) = \tau$, and for boundary data $\gamma(0) = \mu$ and $\gamma(l) = \mu_1$, respectively. Moreover, we show Theorems 1.4 and 1.5, which state a geometric characterization of the normalized geometric mean in Fisher information geometry. Section 3 is devoted to showing that $\varphi$ and $\ell$ are continuous with respect to the $\|\sqrt{\cdot} - \sqrt{\cdot}\|_{L^2}$-topology. In section 4, we consider the exponential map and a totally normal neighborhood on $\mathcal{P}(M)$ and verify Theorem 1.8. In the final section, we consider the topology and the smooth structure of $\mathcal{P}(M)$. The argument of Pistone and Sempi is summarized, and the notion of being connected by an open mixture arc, together with Proposition 5.6 concerning the constant vector field argument, is given.

We outline the derivation of a formula for geodesics in $\mathcal{P}(M)$ by following the argument of T. Friedrich (see [12, §2] for details).

Let $\lambda \in \mathcal{P}(M)$ be the probability measure represented by the Riemannian volume form of $M$, associated with a Riemannian metric, provided $M$ is orientable. For non-orientable $M$, choose the double covering $\tilde M$ of $M$ and take the push-forward of the Riemannian volume form $\lambda_{\tilde M}$ via the double covering map $\pi : \tilde M \to M$.

Denote by $\gamma(t) = p_t\lambda$ a geodesic in $\mathcal{P}(M)$ which is parametrized by arc-length, and whose initial point is $\gamma(0) = \mu = p_0\lambda$ and whose initial unit velocity is $\tau = \dot p_0\lambda \in T_\mu\mathcal{P}(M)$. Here $p_t : x \mapsto p_t(x)$ is a continuous function on $M$ which is assumed to be sufficiently differentiable with respect to $t$. Since $\gamma(t)$ is a geodesic, we have
\[ G(\nabla_{\dot\gamma(t)} \dot\gamma(t), \tau) = \dot\gamma(t)\, G(\dot\gamma(t), \tau) - G(\dot\gamma(t), \nabla_{\dot\gamma(t)} \tau) = 0 \]
for any constant vector field $\tau$. Then, by using the formula (1.6) for the Levi-Civita connection with respect to $G$, we find that $p_t$ satisfies
\[ \frac{d}{dt} \left( \frac{\dot p_t}{p_t} \right) + \frac{1}{2} \left( \frac{\dot p_t}{p_t} \right)^{2} + \frac{1}{2} = 0. \tag{2.1} \]
Setting $f_t = \dot p_t / p_t$, we obtain $\dot f_t + \frac12 f_t^2 + \frac12 = 0$ and find that a solution to this differential equation is $f_t = \tan(-t/2 + A)$. Hence we have
\[ \log p_t = 2 \log \cos(-t/2 + A) + B, \quad\text{i.e.,}\quad p_t = C \cos^2\!\left( -\frac{t}{2} + A \right), \quad C = \exp B, \]
where $A$ and $C$ are functions on $M$ determined by the initial condition as follows:
\[ A = \arctan\!\left( \frac{\dot p_0}{p_0} \right), \qquad C = \frac{(p_0)^2 + (\dot p_0)^2}{p_0}. \]

Proposition 2.1 ([12]).
\[ p_t = \frac{(p_0)^2 + (\dot p_0)^2}{p_0} \cos^2\!\left( -\frac{t}{2} + \arctan\!\left( \frac{\dot p_0}{p_0} \right) \right) = \frac{1}{1 + \tan^2(t/2)} \left\{ p_0 + 2 \dot p_0 \tan\frac{t}{2} + \frac{(\dot p_0)^2}{p_0} \cdot \tan^2\frac{t}{2} \right\}. \]

The following is the density free expression of a geodesic.
Proposition 2.2.
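Let $\gamma(t)$ be a geodesic with $\gamma(0) = \mu$ and $\dot\gamma(0) = \tau$. If $\tau$ is of unit norm, i.e., $|\tau|_\mu = 1$ with respect to $G$, then $\gamma(t)$ is represented by
\[ \gamma(t) = \left( \cos\frac{t}{2} + \frac{d\tau}{d\mu} \cdot \sin\frac{t}{2} \right)^{2} \mu. \tag{2.2} \]
In fact, set $\mu = p_0\lambda$, $\tau = \dot p_0\lambda$ and obtain from Proposition 2.1
\[ \gamma(t) = p_t\lambda = \frac{1}{1 + \tan^2(t/2)} \left\{ 1 + 2\, \frac{\dot p_0}{p_0} \cdot \tan\frac{t}{2} + \left( \frac{\dot p_0}{p_0} \right)^{2} \tan^2\frac{t}{2} \right\} \mu = \frac{1}{1 + \tan^2(t/2)} \left( 1 + \frac{\dot p_0}{p_0} \cdot \tan\frac{t}{2} \right)^{2} \mu = \left( \cos\frac{t}{2} + \frac{d\tau}{d\mu} \cdot \sin\frac{t}{2} \right)^{2} \mu. \]

Remark 2.3.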
We notice from (2.2) that $\gamma(\pm\pi) = (d\tau/d\mu)^2 \mu$ is a probability measure. However, it does not admit a positive density function, as we remarked in section 1. Moreover the formula (2.2) indicates that every geodesic is periodic with period $2\pi$, since
\[ \gamma(t) = \left\{ \frac{1}{2} (1 + \cos t) + \frac{1}{2} (1 - \cos t) \left( \frac{d\tau}{d\mu} \right)^{2} \right\} \mu + \sin t \cdot \tau. \]
Therefore we are able to choose the parameter $t$, at which $\gamma(t)$ is defined, inside the open interval $(-\pi, \pi)$.

Next, we rewrite (2.2) by using the boundary data (see [15, Theorem 11]).
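The density free expression (2.2) lends itself to a quick numerical check. The sketch below is an illustration under stated assumptions, not part of the original argument: $M$ is replaced by a finite set with counting measure, $\mu$ is a positive probability vector `p`, a tangent vector is a vector with zero sum, and the helper names are hypothetical. It verifies that $\gamma(t)$ has total mass 1, that $t$ is the arc-length parameter in the sense $\ell(\mu, \gamma(t)) = t$ for small $t > 0$, and that $\gamma$ has period $2\pi$.

```python
import math

# Toy model (an assumption of this sketch): M is a finite set with counting
# measure; mu is a positive probability vector p; tangent vectors sum to zero.

def fisher_norm(p, tau):
    """|tau|_mu = sqrt( sum_i (tau_i / p_i)^2 * p_i )."""
    return math.sqrt(sum(x * x / a for x, a in zip(tau, p)))

def unit_tangent(p, v):
    """Remove the total mass of v, then rescale to unit Fisher norm at p."""
    m = sum(v)
    tau = [x - m * a for x, a in zip(v, p)]  # entries now sum to zero
    n = fisher_norm(p, tau)
    return [x / n for x in tau]

def geodesic_from(p, tau, t):
    """Formula (2.2): gamma(t) = (cos(t/2) + (dtau/dmu) * sin(t/2))^2 * mu."""
    c, s = math.cos(t / 2.0), math.sin(t / 2.0)
    return [(c + (x / a) * s) ** 2 * a for x, a in zip(tau, p)]

def ell(p, q):
    """ell = 2 * arccos(sum_i sqrt(p_i * q_i))."""
    c = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, c))

p = [0.2, 0.3, 0.5]
tau = unit_tangent(p, [1.0, 0.0, -1.0])
g = geodesic_from(p, tau, 0.5)
# Expect total mass close to 1 and distance from p close to 0.5.
print(sum(g), ell(p, g))
```

The total mass is 1 because the cross term integrates $d\tau$, which vanishes, while the squared term integrates to $|\tau|_\mu^2 = 1$.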
Theorem 2.4.
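Let $\mu, \mu_1$ be arbitrary probability measures of $\mathcal{P}(M)$. Assume $\mu \ne \mu_1$. Then there exists a unique geodesic segment $\gamma(t)$, $t \in [0, l]$, $l = \ell(\mu,\mu_1)$, such that $\gamma(0) = \mu$, $\gamma(l) = \mu_1$. In fact, $\gamma(t)$ is represented as
\[ \gamma(t) = \left( \cos\frac{t}{2} + \frac{d\tau}{d\mu} \cdot \sin\frac{t}{2} \right)^{2} \mu, \qquad \tau = \frac{1}{\sin(l/2)} \left( \sqrt{\frac{d\mu_1}{d\mu}} - \cos\frac{l}{2} \right) \mu. \]

Proof.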
If we assume that $\mu$ and $\mu_1$ are joined by (2.2), then there exists a positive number $l$ such that $\gamma(l) = \mu_1$, i.e.,
\[ \left( \cos\frac{l}{2} + \frac{d\tau}{d\mu} \cdot \sin\frac{l}{2} \right)^{2} \mu = \mu_1. \tag{2.3} \]
Solving this equation with respect to $d\tau/d\mu$, by an argument analogous to [15, p.1830, Assertion 3], we find that the initial velocity $\tau$ is uniquely determined by
\[ \tau = \frac{1}{\sin(l/2)} \left( \sqrt{\frac{d\mu_1}{d\mu}} - \cos\frac{l}{2} \right) \mu \tag{2.4} \]
as follows. In fact, from (2.3) we have
\[ \left( \cos\frac{l}{2} + \frac{d\tau}{d\mu} \cdot \sin\frac{l}{2} \right)^{2} = \frac{d\mu_1}{d\mu}, \quad\text{so}\quad \cos\frac{l}{2} + \frac{d\tau}{d\mu} \cdot \sin\frac{l}{2} = \pm \sqrt{\frac{d\mu_1}{d\mu}}. \]
Define subsets $M_+$, $M_-$ of $M$ respectively by
\[ M_+ = \left\{ x \in M \;\middle|\; \sin\frac{l}{2}\, \frac{d\tau}{d\mu}(x) = -\cos\frac{l}{2} + \sqrt{\frac{d\mu_1}{d\mu}(x)} \right\}, \qquad M_- = \left\{ x \in M \;\middle|\; \sin\frac{l}{2}\, \frac{d\tau}{d\mu}(x) = -\cos\frac{l}{2} - \sqrt{\frac{d\mu_1}{d\mu}(x)} \right\}. \]
The subsets $M_+$, $M_-$ satisfy $M_+ \cup M_- = M$ and both are closed, since the function $d\tau/d\mu$ must be continuous on $M$ and the functions on the right hand sides are also continuous. First, we have $M_+ \cap M_- = \emptyset$. This is because, if there existed $x \in M_+ \cap M_-$, then $(d\mu_1/d\mu)(x) = 0$, which is a contradiction. Thus, $M_+$ and $M_-$ turn out to be open and closed. Next, we claim that $M_- = \emptyset$. If $M_- \ne \emptyset$, then $M_- = M$ (and hence $M_+ = \emptyset$), since $M$ is connected, and hence from $\int_M d\tau = 0$ we have
\[ \cos\frac{l}{2} = -\int_M \sqrt{\frac{d\mu_1}{d\mu}}\; d\mu < 0, \]
which is impossible for $l \in (-\pi, \pi)$ (see Remark 2.3), a contradiction.

Hence we have (2.4), and from $\int_M d\tau = 0$
\[ \int_M \sqrt{\frac{d\mu_1}{d\mu}}\; d\mu = \cos\frac{l}{2}. \]
From this and by using the normalized geometric mean $\varphi$, for the given $\mu, \mu_1$ we can express (2.4) as
\[ \tau = \frac{1}{\tan(l/2)}\, (\varphi(\mu,\mu_1) - \mu). \tag{2.5} \]
We also have
\[ l = 2 \arccos\left( \int_M \sqrt{\frac{d\mu_1}{d\mu}}\; d\mu \right) = \ell(\mu,\mu_1). \tag{2.6} \]
Thus the theorem is proved.

Suppose we relax the space $\mathcal{P}(M)$ of probability measures having continuous density function to the space $\tilde{\mathcal{P}}_{(L,\lambda)}(M)$ consisting of probability measures $\mu = p\lambda$ having $L$-integrable, non-negative density function $p$. Then,
For given distinct $\mu, \mu_1 \in \mathcal{P}(M)$ there exists a geodesic segment $\tilde\gamma(t)$ which joins $\mu$ and $\mu_1$, where $\tilde\gamma(t)$ at least belongs to $\tilde{\mathcal{P}}_{(L,\lambda)}(M)$ for each $t$, such that the initial velocity vector $\dot{\tilde\gamma}(0)$ has an $L$-integrable density function which is not continuous. Furthermore $\tilde\gamma(t)$ satisfies $\tilde\gamma(0) = \mu$, $\tilde\gamma(\pi) = \mu_1$.

Proof.
We set $\mu = p\lambda$ and $\mu_1 = p_1\lambda$ with normalized geometric mean $\varphi(\mu,\mu_1)$ and set $\ell = \ell(\mu,\mu_1)$. Here $p, p_1 \in C^0_+(M)$. Consider the geometric mean of $\mu$ and $\mu_1$, namely $\cos\frac{\ell}{2}\, \varphi(\mu,\mu_1)$, which is the measure given by $\sqrt{p(x) p_1(x)}\, \lambda$. Let $q(x)$ be the density function of $\varphi(\mu,\mu_1)$ with respect to $\lambda$, a positive continuous function on $M$. Thus, $\cos\frac{\ell}{2}\, q(x) = \sqrt{p(x) p_1(x)}$. We have $\int_M q(x)\, d\lambda = 1$.

Choose a point $x_0 \in M$ and let $C_{x_0}$ be the cut locus with respect to $x_0$. Here, $\dim C_{x_0} \le \dim M - 1$, so that $C_{x_0}$ is a measure zero set with respect to $\lambda$. For the notion and geometrical properties of the cut locus refer to [29]. Via the exponential map $\exp_{x_0}$, $M \setminus C_{x_0}$ is diffeomorphic to a domain $D$ of $T_{x_0}M$. $D$ is bounded, since $M$ is compact, so that there exists $R > 0$ with $D \subset B(R)$, where $B(R)$ is the euclidean ball of radius $R$ in $T_{x_0}M$ with respect to the euclidean metric. Let $\sigma$ be the Lebesgue measure on $T_{x_0}M$ and identify $\sigma$ with $((\exp_{x_0})^{-1})^* \sigma$ on $M \setminus C_{x_0}$. Then, the measure $\lambda$ restricted to $M \setminus C_{x_0}$ is represented by $\lambda|_{M \setminus C_{x_0}} = f\, \sigma|_D$ for a positive smooth function $f$ on $D$. The integral $\int_M d\varphi(\mu,\mu_1)$ reduces to
\[ \int_M d\varphi(\mu,\mu_1) = \int_{M \setminus C_{x_0}} q(x)\, d\lambda = \int_{u \in D} q(\exp_{x_0} u)\, f(u)\, d\sigma(u) = \int_{B(R)} \tilde q(u)\, \tilde f(u)\, d\sigma(u) = 1, \]
where $\tilde q$ and $\tilde f$ are the functions on $B(R)$ which are the natural extensions of $q(\exp_{x_0} u)$ and $f(u)$, respectively, with $\tilde q \equiv \tilde f \equiv 0$ on $B(R) \setminus D$.

Consider the function $h$ of $r$ given by $h(r) := \int_{B(r)} \tilde q(u)\, \tilde f(u)\, d\sigma(u)$ for $0 \le r \le R$. It is not hard to see that $h$ is increasing and continuous and $h(0) = 0$. By the intermediate value theorem for continuous functions there exists an $r_0 > 0$ such that $h(r_0) = 1/2$. Define $\tau_1$ by
\[ \tau_1(x) = \begin{cases} \tilde q(u)\, \lambda(x); & x = \exp_{x_0} u,\ u \in B(r_0), \\ -\tilde q(u)\, \lambda(x); & x = \exp_{x_0} u,\ u \in B(R) \setminus B(r_0). \end{cases} \]
Notice
\[ \int_{\exp_{x_0} (B(R) \setminus B(r_0))} d\tau_1 = -\int_{\exp_{x_0} (B(R) \setminus B(r_0))} \tilde q(u)\, d\lambda(x) = -\left( 1 - \frac{1}{2} \right) = -\frac{1}{2}. \]
Therefore, the measure $\cos\frac{\ell}{2}\, \tau_1$ belongs to the tangent space at $\mu$ and is of unit norm, since
\[ \int_M \cos\frac{\ell}{2}\; d\tau_1 = \cos\frac{\ell}{2} \left( \int_{\exp_{x_0} B(r_0)} d\tau_1 + \int_{\exp_{x_0} (B(R) \setminus B(r_0))} d\tau_1 \right) = \cos\frac{\ell}{2} \left( \frac{1}{2} - \frac{1}{2} \right) = 0 \]
and $G_\mu(\cos\frac{\ell}{2}\, \tau_1, \cos\frac{\ell}{2}\, \tau_1)$ is given by
\[ \cos^2\frac{\ell}{2} \int_M \left( \frac{d\tau_1}{d\mu} \right)^{2} d\mu = \cos^2\frac{\ell}{2} \int_M \left( \pm\frac{q(x)}{p(x)} \right)^{2} p(x)\, d\lambda = \int_M \frac{p(x) p_1(x)}{p(x)}\, d\lambda = \int_M p_1(x)\, d\lambda = \int_M d\mu_1 = 1. \]
Set
\[ \tilde\gamma(t) = \left( \cos\frac{t}{2} + \sin\frac{t}{2}\, \cos\frac{\ell}{2}\, \frac{d\tau_1}{d\mu} \right)^{2} \mu. \]
Then $\tilde\gamma(t)$ gives a geodesic in the space $\tilde{\mathcal{P}}_{(L,\lambda)}(M)$. It satisfies $\tilde\gamma(0) = \mu$ and $\tilde\gamma(\pi) = \cos^2\frac{\ell}{2} \left( \frac{d\tau_1}{d\mu} \right)^{2} \mu = \mu_1$. In fact,
\[ \cos^2\frac{\ell}{2} \left( \frac{d\tau_1}{d\mu} \right)^{2} \mu = \cos^2\frac{\ell}{2} \left( \pm\frac{q(x)}{p(x)} \right)^{2} p(x)\, \lambda = \cos^2\frac{\ell}{2}\, \frac{q(x)^2}{p(x)}\, \lambda = \frac{p(x) p_1(x)}{p(x)}\, \lambda = \mu_1. \]
One finds easily that $\cos\frac{\ell}{2}\, \tau_1 \in L$ with respect to $\lambda$. Thus the proposition is verified.

Remark 2.6.
It is not hard to see that $\tilde\gamma(t)$ gives also a geodesic in the space $\tilde{\mathcal{P}}_{(L,\lambda)}(M)$ with initial tangent vector having $L$-integrable density function.

2.3 Proofs of Theorems 1.4 and 1.5

Now we return to our main subject. First, we prove Theorem 1.4. Substituting (2.4) into (2.2), we have
\begin{align*}
\gamma(t) &= \left\{ \cos\frac{t}{2} + \sin\frac{t}{2} \cdot \frac{1}{\sin(l/2)} \left( \sqrt{\frac{d\mu_1}{d\mu}} - \cos\frac{l}{2} \right) \right\}^{2} \mu \\
&= \left\{ \frac{\cos(t/2) \cdot \sin(l/2) - \sin(t/2) \cos(l/2)}{\sin(l/2)} + \frac{\sin(t/2)}{\sin(l/2)} \sqrt{\frac{d\mu_1}{d\mu}} \right\}^{2} \mu \\
&= \left\{ \frac{\sin((l-t)/2)}{\sin(l/2)} + \frac{\sin(t/2)}{\sin(l/2)} \sqrt{\frac{d\mu_1}{d\mu}} \right\}^{2} \mu \\
&= \left( \frac{\sin((l-t)/2)}{\sin(l/2)} \right)^{2} \mu + \frac{2 \sin(t/2) \cdot \sin((l-t)/2)}{\sin^2(l/2)} \sqrt{\frac{d\mu_1}{d\mu}}\; \mu + \left( \frac{\sin(t/2)}{\sin(l/2)} \right)^{2} \mu_1. \tag{2.7}
\end{align*}
The second term in the last line is represented as
\[ \frac{2 \sin(t/2) \cos(l/2) \sin((l-t)/2)}{\sin^2(l/2)}\, \varphi(\mu,\mu_1) = a_3(t)\, \varphi(\mu,\mu_1), \]
since from Definitions 1.1 and 1.2 one has
\[ \sqrt{\frac{d\mu_1}{d\mu}}\; \mu = \cos\left( \frac{l}{2} \right) \varphi(\mu,\mu_1). \]
On the other hand the first and third terms are written as $a_1(t)\, \mu$ and $a_2(t)\, \mu_1$, respectively. Therefore we obtain the form (1.8). Here $\gamma(0) = \mu$ and $\gamma(\ell(\mu,\mu_1)) = \mu_1$, and easy computations show
\[ a_1(t) + a_2(t) + a_3(t) = \int_M a_1(t)\, d\mu + a_2(t)\, d\mu_1 + a_3(t)\, d\varphi(\mu,\mu_1) = \int_M d\gamma(t) = 1. \]
Moreover, it is obvious that $a_i(t) \ge 0$, $i = 1, 2, 3$, for $0 \le t \le l < \pi$. Hence, we conclude that $\gamma(t)$ belongs to $\mathcal{P}(M)$ for any $t \in [0, l]$, which means that $\gamma$ is the geodesic lying inside $\mathcal{P}(M)$ and joining $\mu, \mu_1 \in \mathcal{P}(M)$. Thus, we obtain Theorem 1.5 (i) and (ii).

Remark 2.7.
The equation (2.5) implies that the tangent line at $\gamma(0) = \mu$ of the curve $\gamma(t)$ in $\mathcal{P}(M)$ passes through the normalized geometric mean $\varphi(\mu,\mu_1)$. Now, we consider the geodesic $\gamma^{-}(t) = \gamma(l - t)$, which has the inverse direction of $\gamma$. Then, $\gamma^{-}(l) = \mu$ and $\dot\gamma^{-}(0) = \frac{1}{\tan(l/2)} (\varphi(\mu_1,\mu) - \mu_1)$. Hence, similarly as for $\gamma$, the tangent line of $\gamma^{-}(t)$ at $\gamma^{-}(0) = \gamma(l) = \mu_1$ also passes through $\varphi(\mu,\mu_1)$. Thus, we obtain Theorem 1.5 (iii) and more generally the following.

Theorem 2.8. Let $\mu, \mu_1$ be points on a geodesic $\gamma$. Let $L_\mu$ and $L_{\mu_1}$ be the tangent lines, tangent to $\gamma$ at $\mu$ and $\mu_1$, respectively. Then, the lines $L_\mu$ and $L_{\mu_1}$ intersect and their intersection point is the normalized geometric mean $\varphi(\mu,\mu_1)$ of $\mu$ and $\mu_1$ (see Figure 1).

Figure 1: a geometric characterization of $\varphi(\mu,\mu_1)$.

Substituting $t = l/2$ into (2.7), we have
\[ \gamma(l/2) = \left( \frac{\sin(l/4)}{\sin(l/2)} \right)^{2} \left( \mu + 2 \sqrt{\frac{d\mu_1}{d\mu}}\; \mu + \mu_1 \right) = \left( \frac{1}{2 \cos(l/4)} \right)^{2} \left( 1 + \sqrt{\frac{d\mu_1}{d\mu}} \right)^{2} \mu, \]
from which we obtain Theorem 1.5 (iv).

Remark 2.9.
All the above arguments concerning geodesics, the map $\varphi$ and the function $\ell$ are completely valid for the space $\mathcal{P}^\infty(M)$ of probability measures with smooth density function. $\mathcal{P}^\infty(M)$ is dense in the space $\mathcal{P}(M)$ (see Lemma 4.13).

3 $\varphi$ and the function $\ell$

In this section we will show the following result.
Proposition 3.1.
Relative to the $\|\sqrt{\cdot} - \sqrt{\cdot}\|_{L^2}$-topology,
(i) $\varphi : \mathcal{P}(M) \times \mathcal{P}(M) \to \mathcal{P}(M)$ is continuous and
(ii) $\ell : \mathcal{P}(M) \times \mathcal{P}(M) \to [0, \pi)$ is continuous.

Proof.
We will show first (ii). Since the function arccosine is continuous, it suffices to verify that
\[ \cos\frac{\ell(\mu,\mu_1)}{2} = \int_M \sqrt{p(x) p_1(x)}\; d\lambda \qquad (\mu = p(x)\lambda,\ \mu_1 = p_1(x)\lambda) \tag{3.1} \]
is continuous. For this we find the following with another pair of measures $\mu' = p'(x)\lambda$, $\mu_1' = p_1'(x)\lambda$ of $\mathcal{P}(M)$, by applying the Cauchy-Schwarz inequality:
\begin{align*}
\left| \cos\frac{\ell(\mu,\mu_1)}{2} - \cos\frac{\ell(\mu',\mu_1')}{2} \right| &\le \int_M \left( \sqrt{p}\, \left| \sqrt{p_1} - \sqrt{p_1'} \right| + \sqrt{p_1'}\, \left| \sqrt{p} - \sqrt{p'} \right| \right) d\lambda \tag{3.2} \\
&\le \left\| \sqrt{p_1} - \sqrt{p_1'} \right\|_{L^2} + \left\| \sqrt{p} - \sqrt{p'} \right\|_{L^2}.
\end{align*}
From this it follows that $\cos(\ell(\mu,\mu_1)/2)$ is continuous.

Next we show (i), that $\varphi$ is continuous. As above, let $\mu = p(x)\lambda$, $\mu_1 = p_1(x)\lambda$, $\mu' = p'(x)\lambda$ and $\mu_1' = p_1'(x)\lambda \in \mathcal{P}(M)$. We write $\varphi(\mu,\mu_1) = P(x)\lambda$ and $\varphi(\mu',\mu_1') = P'(x)\lambda$, where
\[ P(x) = \frac{\sqrt{p(x) p_1(x)}}{\int_M \sqrt{p(x) p_1(x)}\, d\lambda}, \qquad P'(x) = \frac{\sqrt{p'(x) p_1'(x)}}{\int_M \sqrt{p'(x) p_1'(x)}\, d\lambda}. \tag{3.3} \]
We have then, by using the inequality $\left| \sqrt{a} - \sqrt{b} \right|^2 \le |a - b|$ for any $a, b \ge 0$,
\[ \left\| \sqrt{P} - \sqrt{P'} \right\|_{L^2}^{2} = \int_M \left( \sqrt{P} - \sqrt{P'} \right)^{2} d\lambda \le \int_M |P(x) - P'(x)|\, d\lambda. \tag{3.4} \]
Here
\[ P(x) - P'(x) = \frac{\sqrt{p(x) p_1(x)} - \sqrt{p'(x) p_1'(x)}}{\int_M \sqrt{p p_1}\, d\lambda} + \frac{\int_M \left( \sqrt{p' p_1'} - \sqrt{p p_1} \right) d\lambda}{\int_M \sqrt{p p_1}\, d\lambda \int_M \sqrt{p' p_1'}\, d\lambda} \sqrt{p'(x) p_1'(x)}, \tag{3.5} \]
so
\begin{align*}
|P(x) - P'(x)| \le{}& \frac{\sqrt{p(x)}\, \left| \sqrt{p_1(x)} - \sqrt{p_1'(x)} \right| + \left| \sqrt{p(x)} - \sqrt{p'(x)} \right| \sqrt{p_1'(x)}}{\int_M \sqrt{p p_1}\, d\lambda} \\
&+ \frac{\int_M \left\{ \sqrt{p}\, \left| \sqrt{p_1} - \sqrt{p_1'} \right| + \left| \sqrt{p} - \sqrt{p'} \right| \sqrt{p_1'} \right\} d\lambda}{\int_M \sqrt{p p_1}\, d\lambda \int_M \sqrt{p' p_1'}\, d\lambda} \sqrt{p'(x) p_1'(x)} \tag{3.6}
\end{align*}
and hence
\[ \int_M |P(x) - P'(x)|\, d\lambda \le \frac{2}{\int_M \sqrt{p p_1}\, d\lambda} \int_M \left\{ \sqrt{p(x)}\, \left| \sqrt{p_1(x)} - \sqrt{p_1'(x)} \right| + \left| \sqrt{p(x)} - \sqrt{p'(x)} \right| \sqrt{p_1'(x)} \right\} d\lambda. \]
From the Cauchy-Schwarz inequality one gets
\[ \left\| \sqrt{P} - \sqrt{P'} \right\|_{L^2}^{2} \le \frac{2}{\int_M \sqrt{p p_1}\, d\lambda} \left( \left\| \sqrt{p_1} - \sqrt{p_1'} \right\|_{L^2} + \left\| \sqrt{p} - \sqrt{p'} \right\|_{L^2} \right), \tag{3.7} \]
which indicates that $\varphi$ is continuous.

4 Riemannian distance function of $(\mathcal{P}(M), G)$

In this section we will exhibit that $\ell(\mu,\mu_1)$ is precisely the Riemannian distance of $\mu$ and $\mu_1$ in $\mathcal{P}(M)$. For this purpose we first restrict our argument to $\mathcal{P}^\infty(M)$, the space of probability measures with smooth density function. We define the exponential map over $\mathcal{P}^\infty(M)$. We then prove the Gauss lemma and the existence of a totally normal neighborhood in $\mathcal{P}^\infty(M)$ with respect to the Fisher metric $G$, and show that $\ell(\mu,\mu_1)$ gives the Riemannian distance in $\mathcal{P}^\infty(M)$ for $\mu, \mu_1 \in \mathcal{P}^\infty(M)$. We prove secondly that $\mathcal{P}^\infty(M)$ is dense in $\mathcal{P}(M)$ with respect to the $C^0$-norm (Lemma 4.13), so that the Riemannian distance of $\mu, \mu_1 \in \mathcal{P}^\infty(M)$ in the space $\mathcal{P}(M)$ is actually given by the function $\ell(\mu,\mu_1)$, by reductio ad absurdum.
Finally we verify that $\ell(\mu,\mu_1)$ is properly the Riemannian distance of $\mu,\mu_1$ in the space $\mathcal{P}(M)$.

For the sake of convenience we provide $\mathcal{P}^\infty(M)$ with an $H^a$-topology, $a > n$, $n = \dim M$. We equip the compact manifold $M$ with a Riemannian metric whose Riemannian volume form coincides with the measure $\lambda$. The Sobolev norm $\|\cdot\|_{H^a}$ is defined by $\|f\|_{H^a} := \|f\|_{L^a} + \|\nabla f\|_{L^a}$, $f \in C^\infty(M)$. From the Sobolev embedding theorem there exists a constant $C(a) > 0$ such that for any $f \in C^\infty(M)$
$$\|f\|_{C^0}\ \bigl(:= \sup_{x\in M}|f(x)|\bigr) \le C(a)\,\|f\|_{H^a}.$$
See [3]. The $\|\sqrt{\cdot}-\sqrt{\cdot}\|_{(L^2,\lambda)}$-norm is related to the $H^a$-norm by the Hölder inequality as
$$\|\sqrt{p}-\sqrt{p_1}\|_{(L^2,\lambda)} \le \|p-p_1\|_{L^1}^{1/2} \le \|p-p_1\|_{L^a}^{1/2} \le \|p-p_1\|_{H^a}^{1/2}.$$

$\mathcal{P}^\infty(M)$

Let $\mu \in \mathcal{P}^\infty(M)$. Let $\tau \in T_\mu\mathcal{P}^\infty(M)$ be a tangent vector at $\mu$ and suppose that there exists a geodesic $\gamma : [0,1] \to \mathcal{P}^\infty(M)$ satisfying $\gamma(0) = \mu$, $\dot\gamma(0) = \tau$. Then $\gamma(1) \in \mathcal{P}^\infty(M)$ will be customarily denoted by $\exp_\mu\tau$. The geodesic $\gamma$ can thus be written as $\gamma(t) = \exp_\mu t\tau$.

Lemma 4.1.
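For any $\mu_1 \in \mathcal{P}^\infty(M)$, $\mu_1 \ne \mu$, there exists a geodesic $\gamma : [0,1] \to \mathcal{P}^\infty(M)$ satisfying $\gamma(0) = \mu$, $\gamma(1) = \mu_1$, given by
$$\gamma(t) = \exp_\mu t\tau = \left(\cos\frac{l\,t}{2} + \frac{1}{l}\sin\frac{l\,t}{2}\cdot\frac{d\tau}{d\mu}\right)^2\mu, \tag{4.1}$$
where $\tau \in T_\mu\mathcal{P}^\infty(M)$ is defined by $\tau = l\,\tilde\tau$, $l = \ell(\mu,\mu_1)$, and $\tilde\tau \in T_\mu\mathcal{P}^\infty(M)$ is the unit tangent vector defined by
$$\tilde\tau = \frac{1}{\tan(l/2)}\bigl(\varphi(\mu,\mu_1)-\mu\bigr). \tag{4.2}$$

Proof. From Proposition 2.2,
$$\tilde\gamma(t) = \left(\cos\frac{t}{2} + \sin\frac{t}{2}\,\frac{d\tilde\tau}{d\mu}\right)^2\mu$$
with $\tilde\tau$ of (4.2) gives us a geodesic, parametrized by arc-length, satisfying $\tilde\gamma(0) = \mu$, $\dot{\tilde\gamma}(0) = \tilde\tau$ and $\tilde\gamma(l) = \mu_1$. Put $\tau = l\,\tilde\tau$ and $t = ls$, and set $\gamma(s) = \tilde\gamma(ls)$. Then $\gamma(s)$ is a geodesic defined over $[0,1]$ with $\gamma(0) = \mu$, $\gamma(1) = \mu_1$ and $\dot\gamma(0) = \tau$.

Let $\mu \in \mathcal{P}^\infty(M)$ be a probability measure of positive smooth density function. We fix $\mu$ for a moment. Let $\varepsilon$ be a real number satisfying $0 < \varepsilon < \pi$ and let $B(\mu;\varepsilon)$ be the set of probability measures $\mu_1 \in \mathcal{P}^\infty(M)$ satisfying $\ell(\mu,\mu_1) < \varepsilon$;
$$B(\mu;\varepsilon) := \{\mu_1 \in \mathcal{P}^\infty(M) \mid \ell(\mu,\mu_1) < \varepsilon\}. \tag{4.3}$$
Let $0 < \varepsilon < \pi$ and set
$$\mathcal{B}(\mu;\varepsilon) := \left\{\tau \in T_\mu\mathcal{P}^\infty(M) \;\middle|\; |\tau|_\mu < \varepsilon,\ \inf_{x\in M}\frac{d\tau}{d\mu}(x) > -|\tau|_\mu\cot\frac{|\tau|_\mu}{2}\right\}.$$
Note that when $|\tau|_\mu = 0$ we put $|\tau|_\mu\cot\frac{|\tau|_\mu}{2} = 2$, its limit as $|\tau|_\mu \to 0$. Take a Riemannian metric $g$ on $M$ whose Riemannian volume form $dv_g$ coincides with the measure $\mu$. Then $|\tau|_\mu \le \left\|\frac{d\tau}{d\mu}\right\|_{H^a}$ so that the map $\tau$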
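$\mapsto |\tau|_\mu$ is continuous with respect to the $H^a$-topology. Moreover, the inequality $\inf_{x\in M}\frac{d\tau}{d\mu}(x) > -|\tau|_\mu\cot\frac{|\tau|_\mu}{2}$ can be treated as follows. Set $f = \frac{d\tau}{d\mu}$ and $f_- := \frac{1}{2}(f-|f|)$. Then $f_-(x) \le 0$ for $x \in M$ and $f_- \in C^0(M)$, so that the inequality is equivalent to the $C^0$-norm inequality
$$\|f_-\|_{C^0} < |\tau|_\mu\cot\frac{|\tau|_\mu}{2}.$$
There exist functions $f_{-,s} \in C^\infty(M)$, $s > 0$, approximating $f_-$ such that $\|f_{-,s}-f_-\|_{C^0} \to 0$ as $s \to 0$. Therefore, the inequality with respect to the $H^a$-norm
$$\|f_{-,s}\|_{H^a} < \frac{1}{C(a)}\,|\tau|_\mu\cot\frac{|\tau|_\mu}{2}$$
implies
$$\|f_-\|_{C^0} \le \|f_{-,s}-f_-\|_{C^0} + \|f_{-,s}\|_{C^0} < \|f_{-,s}-f_-\|_{C^0} + |\tau|_\mu\cot\frac{|\tau|_\mu}{2},$$
where $\|f_{-,s}-f_-\|_{C^0}$ can be taken as small as desired.

Proposition 4.2.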
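The exponential map $\exp_\mu : \mathcal{B}(\mu;\varepsilon) \to B(\mu;\varepsilon)$ defined by
$$\exp_\mu\tau = \left(\cos\frac{|\tau|_\mu}{2} + \frac{1}{|\tau|_\mu}\sin\frac{|\tau|_\mu}{2}\cdot\frac{d\tau}{d\mu}\right)^2\mu \tag{4.4}$$
is a bijection.

Proof. First we will show that $\exp_\mu\tau$, which we denote by $\mu_1$, belongs to $B(\mu;\varepsilon)$ for any $\tau \in \mathcal{B}(\mu;\varepsilon)$. Since the expression in parentheses is positive by the definition of $\mathcal{B}(\mu;\varepsilon)$,
$$\sqrt{\frac{d\mu_1}{d\mu}} = \cos\frac{|\tau|_\mu}{2} + \frac{1}{|\tau|_\mu}\sin\frac{|\tau|_\mu}{2}\cdot\frac{d\tau}{d\mu},$$
and we have
$$\int_M \sqrt{\frac{d\mu_1}{d\mu}}\,d\mu = \cos\frac{|\tau|_\mu}{2}\int_M d\mu + \frac{1}{|\tau|_\mu}\sin\frac{|\tau|_\mu}{2}\int_M d\tau = \cos\frac{|\tau|_\mu}{2}.$$
Then $\cos\frac{|\tau|_\mu}{2} = \cos\frac{\ell(\mu_1,\mu)}{2}$ from (1.3), hence $|\tau|_\mu = \ell(\mu_1,\mu)$ and thus $\mu_1 \in B(\mu;\varepsilon)$.

Next we will show that the map $\exp_\mu$ is injective over $\mathcal{B}(\mu;\varepsilon)\setminus\{0\}$. Let $\tau,\tau' \in \mathcal{B}(\mu;\varepsilon)\setminus\{0\}$. Assume that $\exp_\mu\tau = \exp_\mu\tau'$, which we denote by $\mu_1$. Then from the above argument we have $\ell(\mu_1,\mu) = |\tau|_\mu = |\tau'|_\mu$. Moreover, from
$$\mu_1 = \left(\cos\frac{|\tau|_\mu}{2} + \frac{1}{|\tau|_\mu}\sin\frac{|\tau|_\mu}{2}\cdot\frac{d\tau}{d\mu}\right)^2\mu = \left(\cos\frac{|\tau'|_\mu}{2} + \frac{1}{|\tau'|_\mu}\sin\frac{|\tau'|_\mu}{2}\cdot\frac{d\tau'}{d\mu}\right)^2\mu,$$
it follows, similarly as in the proof of Theorem 2.4, that $d\tau/d\mu = d\tau'/d\mu$ on $M$ and hence $\tau = \tau'$, which means the injectivity of the map $\exp_\mu$.

The surjectivity is obtained by taking $\mu_1$ in $B(\mu;\varepsilon)$ and $\tau = l\,\tilde\tau \in T_\mu\mathcal{P}^\infty(M)$, where $\tilde\tau = \frac{1}{\tan(l/2)}\bigl(\varphi(\mu,\mu_1)-\mu\bigr)$ is the unit tangent vector at $\mu$ of (4.2) and $l = \ell(\mu,\mu_1)$. Then, from Lemma 4.1, $\mu_1$ is described as $\mu_1 = \exp_\mu\tau$, which implies the surjectivity of $\exp_\mu$.

Remark 4.3.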
From the above proposition, especially from its explicit form, the map $\exp_\mu$ is smooth over $\mathcal{B}(\mu;\varepsilon)\setminus\{0\}$ together with its smooth inverse map $\exp_\mu^{-1}$. For the smoothness refer to [20, II].

Lemma 4.4.
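Let $\mu = p(x)\lambda$ and $\mu_1 = p_1(x)\lambda$ be probability measures in $\mathcal{P}^\infty(M)$. Then,
$$\ell(\mu,\mu_1) < \varepsilon \iff \|\sqrt{p}-\sqrt{p_1}\|_{L^2} < \sqrt{2}\,\sqrt{1-\cos\frac{\varepsilon}{2}}, \tag{4.5}$$
and hence $B(\mu;\varepsilon)$ is written as
$$B(\mu;\varepsilon) = \left\{\mu_1 = p_1(x)\lambda \;\middle|\; \|\sqrt{p}-\sqrt{p_1}\|_{L^2} < \sqrt{2}\,\sqrt{1-\cos\frac{\varepsilon}{2}}\right\}. \tag{4.6}$$

Remark 4.5. From (4.6), $B(\mu;\varepsilon)$ can be regarded as a neighborhood in $\mathcal{P}^\infty(M)$ with respect to the $\|\sqrt{\cdot}-\sqrt{\cdot}\|_{L^2}$-norm around $\mu = p\lambda$. Therefore we consider each $B(\mu;\varepsilon)$ as a neighborhood of $\mu$ in $\mathcal{P}^\infty(M)$.

Proof of Lemma 4.4. Denote $\ell(\mu,\mu_1)$ by $\ell$ for short. Then, the left hand side of (4.5) is equivalent to $0 \le \ell/2 < \varepsilon/2$, that is, to $\cos(\varepsilon/2) < \cos(\ell/2) \le 1$. On the other hand, we have the identity $\|\sqrt{p}-\sqrt{p_1}\|_{L^2}^2 = 2 - 2\cos\frac{\ell}{2}$. Indeed,
$$\|\sqrt{p}-\sqrt{p_1}\|_{L^2}^2 = \int_M (\sqrt{p}-\sqrt{p_1})^2\,d\lambda = 2 - 2\int_M \sqrt{p\,p_1}\,d\lambda, \tag{4.8}$$
where $\int_M \sqrt{p\,p_1}\,d\lambda$ is represented by
$$\int_M \sqrt{\frac{p_1}{p}}\;p\,d\lambda = \int_M \sqrt{\frac{d\mu_1}{d\mu}}\,d\mu = \cos\frac{\ell(\mu,\mu_1)}{2}.$$
Then,
$$\cos\frac{\varepsilon}{2} < \cos\frac{\ell}{2} \le 1 \iff \cos\frac{\varepsilon}{2} < 1 - \frac{1}{2}\|\sqrt{p}-\sqrt{p_1}\|_{L^2}^2 \le 1 \iff 1-\cos\frac{\varepsilon}{2} > \frac{1}{2}\|\sqrt{p}-\sqrt{p_1}\|_{L^2}^2 \ge 0 \iff 2\Bigl(1-\cos\frac{\varepsilon}{2}\Bigr) > \|\sqrt{p}-\sqrt{p_1}\|_{L^2}^2 \ge 0,$$
which is the right hand side of (4.5), since $0 \le \varepsilon/2 < \pi/2$.

Let $\mu_1 = p_1(x)\lambda$, $\mu_2 = p_2(x)\lambda \in B(\mu;\varepsilon)$ be arbitrary probability measures. From Lemma 4.4 we have
$$\|\sqrt{p_i}-\sqrt{p}\|_{L^2} < \sqrt{2}\,\sqrt{1-\cos\frac{\varepsilon}{2}}, \qquad i = 1, 2.$$
From the triangle inequality with respect to the $L^2$-norm, we have then
$$\|\sqrt{p_1}-\sqrt{p_2}\|_{L^2} \le \|\sqrt{p_1}-\sqrt{p}\|_{L^2} + \|\sqrt{p}-\sqrt{p_2}\|_{L^2} < 2\sqrt{2}\,\sqrt{1-\cos\frac{\varepsilon}{2}}. \tag{4.9}$$

Lemma 4.6.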
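Let $t$ be a real number satisfying $0 < t < \pi/2$. Then, we have
$$\sqrt{2}\,\sqrt{1-\cos t} \le \sqrt{1-\cos 2t}. \tag{4.10}$$

Proof. From the obvious equality $1-\cos 2t = 2(1-\cos^2 t)$, we have $\sqrt{1-\cos 2t} = \sqrt{2}\,\sqrt{1-\cos^2 t}$. Since $1-\cos t > 0$ and $1+\cos t > 1$ for $0 < t < \pi/2$,
$$\sqrt{2}\,\sqrt{1-\cos t} < \sqrt{2}\,\sqrt{1-\cos t}\,\sqrt{1+\cos t} = \sqrt{2}\,\sqrt{1-\cos^2 t},$$
which is equal to $\sqrt{1-\cos 2t}$.

Now, let $B(\mu;\varepsilon)$ be a neighborhood around $\mu$, defined at (4.3), with $\varepsilon < \pi/4$, and let $\mu_i = p_i\lambda \in B(\mu;\varepsilon)$, $i = 1, 2$. Then, from (4.9) and (4.10) applied with $t = \varepsilon/2$ and then with $t = \varepsilon$, we have
$$\|\sqrt{p_1}-\sqrt{p_2}\|_{L^2} < 2\sqrt{2}\,\sqrt{1-\cos\frac{\varepsilon}{2}} \le 2\sqrt{1-\cos\varepsilon} \le \sqrt{2}\,\sqrt{1-\cos 2\varepsilon} = \sqrt{2}\,\sqrt{1-\cos\frac{4\varepsilon}{2}}.$$

Lemma 4.7.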
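Let $B(\mu;\varepsilon)$ be a neighborhood with $\varepsilon < \pi/4$. Then, for any $\mu_1, \mu_2 \in B(\mu;\varepsilon)$,
$$\ell(\mu_1,\mu_2) < 4\varepsilon. \tag{4.11}$$

Proof. For $\mu_1, \mu_2 \in B(\mu;\varepsilon)$ one has $\ell(\mu_i,\mu) < \varepsilon$, $i = 1, 2$, equivalently
$$\|\sqrt{p_i}-\sqrt{p}\|_{L^2} < \sqrt{2}\,\sqrt{1-\cos\frac{\varepsilon}{2}}, \qquad i = 1, 2,$$
so that, as shown above,
$$\|\sqrt{p_1}-\sqrt{p_2}\|_{L^2} < \sqrt{2}\,\sqrt{1-\cos\frac{4\varepsilon}{2}},$$
which by Lemma 4.4 is equivalent to $\ell(\mu_1,\mu_2) < 4\varepsilon$.

Proposition 4.8.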
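Let $\mu \in \mathcal{P}^\infty(M)$ be an arbitrary probability measure and $\varepsilon$ be a real number satisfying $0 < \varepsilon < \pi$. Let $W = B(\mu;\varepsilon/4)$ be a neighborhood defined at (4.3). For any $\mu_1 \in W$, let $B(\mu_1;\varepsilon)$ be a neighborhood around $\mu_1$. Then,
(i) $W \subset B(\mu_1;\varepsilon)$ and
(ii) $\exp_{\mu_1}$ is a diffeomorphism between $\mathcal{B}(\mu_1;\varepsilon)$ and $B(\mu_1;\varepsilon)$.
The neighborhood $W$ is called a totally normal neighborhood of $\mu$.

Proof. Notice $\ell(\mu,\mu_1) < \varepsilon/4$. If $\mu_2 \in W$, then $\ell(\mu,\mu_2) < \varepsilon/4$. From Lemma 4.7 we have $\ell(\mu_1,\mu_2) < 4\cdot(\varepsilon/4) = \varepsilon$ and hence $\mu_2 \in B(\mu_1;\varepsilon)$. Since $\mu_2 \in W$ is arbitrary, we see $W \subset B(\mu_1;\varepsilon)$. Assertion (ii) is shown from Proposition 4.2 together with Remark 4.3, since $0 < \varepsilon < \pi$.

Lemma 4.9 (Gauss Lemma). Denote by $f(t,\tau)$ the image of the exponential map $\exp_\mu t\tau$, $t > 0$, for $\tau \in T_\mu\mathcal{P}(M)$ of unit norm $|\tau|_\mu = G_\mu(\tau,\tau)^{1/2} = 1$. Then
$$G_{f(t,\tau)}\left(\frac{\partial f}{\partial t}, \frac{\partial f}{\partial\tau}_{*}(\delta\tau)\right) = 0,$$
where $\frac{\partial f}{\partial t}$ is the differential of $f$ with respect to $t$ and $\frac{\partial f}{\partial\tau}_{*}$ is the differential map from $T_\tau S_\mu$ to $T_{f(t,\tau)}\mathcal{P}(M)$. Here $S_\mu := \{\sigma \in T_\mu\mathcal{P}^\infty(M) \mid G_\mu(\sigma,\sigma) = 1\}$ and $\delta\tau$ is a tangent vector at $\tau$ to $S_\mu$.

Proof. While this lemma is routine in Riemannian geometry, we verify it directly. Since $f(t,\tau) = \left(\cos\frac{t}{2} + \sin\frac{t}{2}\,\frac{d\tau}{d\mu}\right)^2\mu$,
$$\frac{\partial f}{\partial t} = \left(\cos\frac{t}{2} + \sin\frac{t}{2}\,\frac{d\tau}{d\mu}\right)\left(-\sin\frac{t}{2} + \cos\frac{t}{2}\,\frac{d\tau}{d\mu}\right)\mu,$$
$$\frac{\partial f}{\partial\tau}_{*}(\delta\tau) = 2\left(\cos\frac{t}{2} + \sin\frac{t}{2}\,\frac{d\tau}{d\mu}\right)\sin\frac{t}{2}\cdot\frac{d(\delta\tau)}{d\mu}\,\mu.$$
Now we will see $G_{f(t,\tau)}(\partial f/\partial t, \partial f/\partial\tau_{*}(\delta\tau)) = 0$. Since
$$\frac{d(\partial f/\partial t)}{d f(t,\tau)} = \frac{\left(\cos\frac{t}{2} + \frac{d\tau}{d\mu}\sin\frac{t}{2}\right)\left(-\sin\frac{t}{2} + \frac{d\tau}{d\mu}\cos\frac{t}{2}\right)}{\left(\cos\frac{t}{2} + \frac{d\tau}{d\mu}\sin\frac{t}{2}\right)^2} = \frac{-\sin\frac{t}{2} + \frac{d\tau}{d\mu}\cos\frac{t}{2}}{\cos\frac{t}{2} + \frac{d\tau}{d\mu}\sin\frac{t}{2}}$$
and similarly
$$\frac{d(\partial f/\partial\tau_{*}(\delta\tau))}{d f(t,\tau)} = \frac{2\,\frac{d(\delta\tau)}{d\mu}\sin\frac{t}{2}}{\cos\frac{t}{2} + \frac{d\tau}{d\mu}\sin\frac{t}{2}},$$
we obtain
$$G_{f(t,\tau)}\left(\frac{\partial f}{\partial t}, \frac{\partial f}{\partial\tau}_{*}(\delta\tau)\right) = \int_M \frac{2\left(-\sin\frac{t}{2} + \frac{d\tau}{d\mu}\cos\frac{t}{2}\right)\frac{d(\delta\tau)}{d\mu}\sin\frac{t}{2}}{\left(\cos\frac{t}{2} + \frac{d\tau}{d\mu}\sin\frac{t}{2}\right)^2}\cdot\left(\cos\frac{t}{2} + \frac{d\tau}{d\mu}\sin\frac{t}{2}\right)^2 d\mu,$$
which reduces to zero, since
$$2\int_M \left(-\sin\frac{t}{2} + \cos\frac{t}{2}\cdot\frac{d\tau}{d\mu}\right)\sin\frac{t}{2}\cdot\frac{d(\delta\tau)}{d\mu}\,d\mu = -2\sin^2\frac{t}{2}\int_M \frac{d(\delta\tau)}{d\mu}\,d\mu + 2\sin\frac{t}{2}\cos\frac{t}{2}\,G_\mu(\tau,\delta\tau) = 0,$$
where $\int_M d(\delta\tau) = 0$ and $G_\mu(\tau,\delta\tau) = 0$ is derived by differentiating $G_\mu(\tau,\tau) = 1$ along the direction $\delta\tau$. Thus, the lemma is proved.

Proposition 4.10.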
Let $\mu \in \mathcal{P}^\infty(M)$ and $\varepsilon \in (0,\pi)$. Let $\mathcal{B}(\mu;\varepsilon)$ be an $\varepsilon$-open neighborhood in $T_\mu\mathcal{P}^\infty(M)$ such that $B(\mu;\varepsilon) = \exp_\mu(\mathcal{B}(\mu;\varepsilon))$. Let $\gamma : [0,1] \to B(\mu;\varepsilon)$ be a geodesic segment satisfying $\gamma(0) = \mu$. If $c : [0,1] \to \mathcal{P}^\infty(M)$ is any piecewise $C^1$-curve joining $\gamma(0)$ and $\gamma(1)$, then the lengths of $\gamma$ and $c$ satisfy
$$L(\gamma) \le L(c),$$
and if equality holds, then $\gamma([0,1]) = c([0,1])$. Here $\gamma([0,1])$ and $c([0,1])$ denote the images of $[0,1]$ under $\gamma$ and $c$, respectively.

Proof. We may suppose that $c([0,1]) \subset B(\mu;\varepsilon)$. Since $\exp_\mu$ is bijective on $\mathcal{B}(\mu;\varepsilon)$, $c(t)$ for $t\ (\ne 0)$ can be written uniquely as $c(t) = \exp_\mu(r(t)\tau(t))$, where $t \mapsto \tau(t)$ is a piecewise $C^1$-curve in $T_\mu\mathcal{P}^\infty(M)$ with $|\tau(t)|_{G,\mu} = 1$ and $r : (0,1] \to \mathbb{R}$ is a positive piecewise $C^1$-function. By setting $f(r,\tau) = \exp_\mu(r\tau)$, we write $c(t)$ as $c(t) = f(r(t),\tau(t))$ for any $t\ (\ne 0)$. It follows then that, except for a finite number of points,
$$\frac{dc}{dt}(t) = \frac{\partial f}{\partial r}\,\dot r(t) + \frac{\partial f}{\partial\tau}_{*}\left(\frac{d\tau}{dt}\right).$$
Here $\frac{d\tau}{dt} \in T_{\tau(t)}S_\mu$ is the velocity vector of the curve $\tau(t)$. From Lemma 4.9 the two vectors on the right hand side are orthogonal to each other with respect to the metric $G$, and $\left|\frac{\partial f}{\partial r}\right|_{c(t)} = 1$ with respect to $G$. Then,
$$\left|\frac{dc}{dt}\right|_{c(t)}^2 = |\dot r(t)|^2 + \left|\frac{\partial f}{\partial\tau}_{*}\left(\frac{d\tau}{dt}\right)\right|_{c(t)}^2 \ge |\dot r(t)|^2.$$
Therefore, for a sufficiently small positive real number $\delta$, we have
$$\int_\delta^1 \left|\frac{dc}{dt}(t)\right|_{c(t)} dt \ge \int_\delta^1 |\dot r(t)|\,dt \ge \left|\int_\delta^1 \dot r(t)\,dt\right| \ge r(1) - r(\delta).$$
Taking $\delta \to 0$, we obtain $L(c) \ge L(\gamma)$, because $r(1) = \ell(\gamma(1),\mu) = L(\gamma)$.

If $c([0,1])$ is not contained in $B(\mu;\varepsilon)$, we consider the first point $t_1 \in (0,1)$ for which $c(t_1)$ belongs to the boundary of $B(\mu;\varepsilon)$. We have then
$$L(c) \ge L(c|_{[0,t_1]}) \ge \varepsilon > L(\gamma).$$
Refer to [9, Chap. 3, sec. 3] and [21, II, §10] for a proof for a finite dimensional Riemannian manifold.
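The statements of Proposition 4.2 and Proposition 4.10 can be checked numerically in a toy setting. The following is only a sketch under a simplifying assumption that is not made in the paper: $M$ is replaced by a finite set, so a probability measure becomes a positive vector `P` summing to one, a tangent vector becomes a vector `tau` summing to zero, and $G_\mu(\tau,\tau)$ becomes $\sum_i \tau_i^2/P_i$. All function names are hypothetical.

```python
import math

def ell(P, Q):
    # ell(mu, mu1) = 2 arccos( integral of sqrt(p p1) ), here a finite sum
    s = sum(math.sqrt(a * b) for a, b in zip(P, Q))
    return 2.0 * math.acos(min(1.0, s))  # clamp guards against rounding above 1

def fisher_norm(P, tau):
    # |tau|_mu with G_mu(tau, tau) = sum tau_i^2 / P_i (finite-set analogue)
    return math.sqrt(sum(t * t / a for t, a in zip(tau, P)))

def exp_map(P, tau):
    # Finite-set analogue of the exponential map formula of Proposition 4.2.
    # The caller is assumed to respect the positivity condition on tau.
    r = fisher_norm(P, tau)
    if r == 0.0:
        return list(P)
    c, s = math.cos(r / 2), math.sin(r / 2) / r
    return [(c + s * t / a) ** 2 * a for t, a in zip(tau, P)]

def curve_length(curve, steps=2000):
    # Midpoint-rule approximation of the Fisher length of a curve on [0, 1]
    total, prev = 0.0, curve(0.0)
    for k in range(1, steps + 1):
        cur = curve(k / steps)
        mid = [(a + b) / 2 for a, b in zip(prev, cur)]
        diff = [b - a for a, b in zip(prev, cur)]
        total += fisher_norm(mid, diff)
        prev = cur
    return total
```

Under these assumptions, `ell(P, exp_map(P, tau))` agrees with `fisher_norm(P, tau)` up to rounding, and the length of the mixture segment `t -> (1 - t) * A + t * B` is not smaller than `ell(A, B)`, in line with the minimality of geodesics.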
Theorem 4.11.
Let $c : [a,b] \to \mathcal{P}^\infty(M)$ be a piecewise $C^1$-curve with a parameter proportional to arc length. If $c$ has length less than or equal to the length of any other piecewise $C^1$-curve joining $c(a)$ to $c(b)$, then $c$ is a geodesic.

Proof. Let $t \in [a,b]$ and let $W$ be a totally normal neighborhood of the point $c(t)$. Then, there exists a closed interval $I \subset [a,b]$, with non-empty interior and $t \in I$, such that $c(I) \subset W$. The restriction $c|_I : I \to W$ is a piecewise $C^1$-curve joining two points of $W$. From Proposition 4.10 together with the hypothesis, the length of $c|_I$ is equal to the length of a radial geodesic joining these two points. From Proposition 4.10 and from the fact that $c|_I$ is parametrized proportionally to arc length, $c|_I$ is a geodesic.

From this theorem we can assert that the function $\ell = \ell(\mu,\mu_1)$ gives the Riemannian distance in $\mathcal{P}^\infty(M)$ of $\mu, \mu_1 \in \mathcal{P}^\infty(M)$. Now we will achieve the final aim of this section.

Theorem 4.12. The function $\ell = \ell(\mu,\mu_1)$ is actually the Riemannian distance in $\mathcal{P}(M)$ of $\mu$ and $\mu_1$ of $\mathcal{P}(M)$.

To obtain this theorem we first show the following.

Lemma 4.13. $\mathcal{P}^\infty(M)$ is dense in $\mathcal{P}(M)$ with respect to the $C^0$-norm. More precisely, if $f$ is a continuous function on $M$, then there exists a family of smooth functions $f_\delta$, $\delta > 0$, such that $\|f_\delta - f\|_{C^0} \to 0$ as $\delta \to 0$.

Proof.
Let $\{\rho_\alpha ; \alpha \in A\}$ be a partition of unity subordinate to an open covering $\{U_\alpha ; \alpha \in A\}$ of a compact manifold $M$, $\dim M = n \ge 2$. Here $A$ is a finite set. We may assume that each $U_\alpha$ is a coordinate neighborhood diffeomorphic to a euclidean open ball in $\mathbb{R}^n$ and $\operatorname{supp}\rho_\alpha \subset V_\alpha$, where $V_\alpha$ is compact in $U_\alpha$.

Let $f$ be a continuous function on $M$. Set $f_\alpha := \rho_\alpha f$ for each $\alpha$. Then, $\operatorname{supp} f_\alpha \subset V_\alpha$. We may extend the function $f_\alpha$ outside of $V_\alpha$ by $f_\alpha(x) = 0$, $x \in \mathbb{R}^n \setminus V_\alpha$. Let $\{\psi_\delta ; \delta > 0\}$ be a family of functions which satisfies
(i) $\psi_\delta(x) \ge 0$ for $x \in \mathbb{R}^n$,
(ii) $\psi_\delta \in C^\infty(\mathbb{R}^n)$,
(iii) $\operatorname{supp}\psi_\delta = B_\delta(0)$, where $B_\delta(0) \subset \mathbb{R}^n$ is the euclidean closed ball of radius $\delta$ with center $0$, and
(iv) $\int_{\mathbb{R}^n} \psi_\delta\,dv = 1$.
We call $\{\psi_\delta\}$ a sequence of mollifiers. We define such a sequence $\{\psi_\delta\}$ for instance by $\psi_\delta(x) = \delta^{-n}\psi(x/\delta)$, $\delta > 0$, where $\psi(y)$ is the bump function given by $\psi(y) = c_n\exp\{1/(\|y\|^2 - 1)\}$ for $y \in \mathbb{R}^n$ with $\|y\| < 1$ and $\psi(y) = 0$ for $y$ with $\|y\| \ge 1$. Here $c_n$ is a normalization constant determined by (iv). The function $f_\alpha$ is mollified by the convolution with the functions $\psi_\delta$ as
$$f_{\alpha,\delta}(x) := (f_\alpha * \psi_\delta)(x) = \int_{y\in\mathbb{R}^n} f_\alpha(y)\,\psi_\delta(x-y)\,dv(y).$$
Notice $\operatorname{supp} f_{\alpha,\delta} \subset \{x + y ;\ x \in \operatorname{supp} f_\alpha,\ y \in B_\delta(0)\}$, which is contained in $U_\alpha$ for a sufficiently small $\delta > 0$. The function $f$ on $M$ is now mollified by $\psi_\delta$ as $f_\delta(x) = \sum_{\alpha\in A} f_{\alpha,\delta}(x)$, $x \in M$. It is shown that $f_\delta \in C^\infty(M)$ for a sufficiently small $\delta > 0$ and that $\|f_\delta - f\|_{C^0} \to 0$ as $\delta \to 0$. Hence $\mathcal{P}^\infty(M)$ is dense in $\mathcal{P}(M)$.

Refer to [26, 4.2], [6, 4.4] and [3, 3.46] for mollifiers on the euclidean space.

Let $\{f_t\}$ be a family of continuous functions on $M$ parametrized by $t \in I$ ($I$ is a closed interval) with $\frac{df_t}{dt} \in C^0(M)$. Then $\{f_t\}$ is mollified by $\psi_\delta$ as a family of smooth functions $\{f_{t,\delta}\}$ and hence $\left\{\frac{df_t}{dt}\right\}$ is mollified by $\left\{\frac{df_{t,\delta}}{dt}\right\}$, so that
$$\left\|\frac{df_{t,\delta}}{dt} - \frac{df_t}{dt}\right\|_{C^0} \to 0 \quad\text{as } \delta \to 0.$$

Proposition 4.14.
The Riemannian distance in $\mathcal{P}(M)$ of $\mu, \mu_1 \in \mathcal{P}^\infty(M)$ with respect to the metric $G$ is given by the Riemannian distance in $\mathcal{P}^\infty(M)$.

Proof. Let $\mu, \mu_1$ be probability measures in $\mathcal{P}^\infty(M)$. Then by definition the Riemannian distance $d(\mu,\mu_1)$ in $\mathcal{P}(M)$ is given by
$$d(\mu,\mu_1) = \inf_{c\in\mathcal{C}(\mu,\mu_1)} L(c), \tag{4.12}$$
where $\mathcal{C}(\mu,\mu_1)$ denotes the set of all piecewise $C^1$-curves $c : [0,1] \to \mathcal{P}(M)$, $c(0) = \mu$, $c(1) = \mu_1$. To show the proposition we assume $\inf_{c\in\mathcal{C}(\mu,\mu_1)} L(c) < \ell(\mu,\mu_1)$. We will see, by the aid of the mollifier argument, that there exists a piecewise $C^1$-curve $c'$ which belongs to $\mathcal{P}^\infty(M)$ and satisfies $L(c') < \ell(c'(0),c'(1))$. This causes a contradiction, since $\ell(\cdot,\cdot)$ is the Riemannian distance function in $\mathcal{P}^\infty(M)$, as shown in Theorem 4.11.

Set $\varepsilon = \frac{1}{2}\bigl(\ell(\mu,\mu_1) - \inf_{c\in\mathcal{C}(\mu,\mu_1)} L(c)\bigr)$. Then $\varepsilon > 0$ and there exists a piecewise $C^1$-curve $c$ in $\mathcal{P}(M)$ joining $\mu$ and $\mu_1$ and satisfying $L(c) < \ell(\mu,\mu_1) - \varepsilon$. Write this curve $c$ as $c(t) = \mu_t = p(x,t)\lambda$ with $c(0) = \mu$ and $c(1) = \mu_1$, represented by $p(x)\lambda$ and $p_1(x)\lambda$, respectively, so $p(x,0) = p(x)$ and $p(x,1) = p_1(x)$. By the above mollifier argument, $p(x,t)$ and $\partial p(x,t)/\partial t$ are mollified by $p_\delta(x,t)$ and $\partial p_\delta(x,t)/\partial t$, so that as $\delta \to 0$
$$G_{\mu_{t,\delta}}\left(\frac{\partial\mu_{t,\delta}}{\partial t}, \frac{\partial\mu_{t,\delta}}{\partial t}\right) \to G_{\mu_t}\left(\frac{\partial\mu_t}{\partial t}, \frac{\partial\mu_t}{\partial t}\right).$$
Here $\mu_{t,\delta} = p_\delta(x,t)\lambda$ gives us a piecewise $C^1$-curve $c_\delta$ joining $\mu_\delta = p_\delta(x,0)\lambda$ and $\mu_{1,\delta} = p_\delta(x,1)\lambda$, which both belong to $\mathcal{P}^\infty(M)$. Thus we have
$$|L(c_\delta) - L(c)| \le \int_0^1 \left|\,\left|\frac{d\mu_{t,\delta}}{dt}\right|_G - \left|\frac{d\mu_t}{dt}\right|_G\right| dt < \frac{\varepsilon}{2}$$
for sufficiently small $\delta > 0$, and hence
$$L(c_\delta) < \ell(\mu,\mu_1) - \frac{\varepsilon}{2}, \tag{4.13}$$
since $L(c_\delta) < L(c) + \varepsilon/2 < \ell(\mu,\mu_1) - \varepsilon + \varepsilon/2$.

Consider $\ell(\mu_\delta,\mu_{1,\delta})$; the value of the function $\ell$ at $\mu_\delta$ and $\mu_{1,\delta}$ is the Riemannian distance in $\mathcal{P}^\infty(M)$ of $\mu_\delta$ and $\mu_{1,\delta}$. We find from the following that there exists $\delta_0 > 0$ such that $\ell(\mu,\mu_1) - \varepsilon/4 < \ell(\mu_\delta,\mu_{1,\delta})$ holds for any $0 < \delta < \delta_0$. In fact, we may assume $\ell(\mu_\delta,\mu_{1,\delta}) \le \ell(\mu,\mu_1)$. Then
$$|\ell(\mu_\delta,\mu_{1,\delta}) - \ell(\mu,\mu_1)| \le \pi\left(\sin\frac{\ell(\mu,\mu_1)}{4}\right)^{-1}\left(\|p_\delta - p\|_{C^0}^{1/2} + \|p_{1,\delta} - p_1\|_{C^0}^{1/2}\right). \tag{4.14}$$
By (3.2) in Section 3 we have
$$\left|\cos\frac{\ell(\mu_\delta,\mu_{1,\delta})}{2} - \cos\frac{\ell(\mu,\mu_1)}{2}\right| \le \left(\int_M |\sqrt{p_\delta}-\sqrt{p}|^2\,d\lambda\right)^{1/2} + \left(\int_M |\sqrt{p_{1,\delta}}-\sqrt{p_1}|^2\,d\lambda\right)^{1/2}, \tag{4.15}$$
to which we apply the inequality $|\sqrt{a}-\sqrt{b}|^2 \le |a-b|$ for $a, b \ge 0$ to obtain
$$\left|\cos\frac{\ell(\mu_\delta,\mu_{1,\delta})}{2} - \cos\frac{\ell(\mu,\mu_1)}{2}\right| \le \left(\int_M |p_{1,\delta}(x)-p_1(x)|\,d\lambda\right)^{1/2} + \left(\int_M |p_\delta(x)-p(x)|\,d\lambda\right)^{1/2} \le \|p_{1,\delta}-p_1\|_{C^0}^{1/2} + \|p_\delta-p\|_{C^0}^{1/2}.$$
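The chain of estimates starting from (4.15) can be tested numerically. The sketch below is an illustration only and rests on an assumption that is not part of the paper: $M$ is replaced by $n$ points carrying the uniform probability measure $\lambda$, so a density is a positive list with mean one. The helper names are hypothetical.

```python
import math

def normalize(values):
    # Rescale positive values so that their mean (integral against lambda) is 1
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def affinity(p, p1):
    # Integral of sqrt(p p1) d(lambda), i.e. cos(ell/2), for uniform lambda
    return sum(math.sqrt(a * b) for a, b in zip(p, p1)) / len(p)

def sup_dist(p, q):
    # Discrete analogue of the C^0-norm of p - q
    return max(abs(a - b) for a, b in zip(p, q))

def bound_holds(p, p1, q, q1, tol=1e-12):
    # Checks |cos(ell(q,q1)/2) - cos(ell(p,p1)/2)|
    #        <= ||q1 - p1||^(1/2) + ||q - p||^(1/2)   (sup norms)
    lhs = abs(affinity(q, q1) - affinity(p, p1))
    rhs = math.sqrt(sup_dist(q1, p1)) + math.sqrt(sup_dist(q, p))
    return lhs <= rhs + tol
```

Here `q` and `q1` play the role of the mollified densities $p_\delta$ and $p_{1,\delta}$. The bound relies on all four densities being normalized, which is why `normalize` is applied before calling `bound_holds`.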
Therefore, by setting $L = \ell(\mu,\mu_1)$, $L_\delta = \ell(\mu_\delta,\mu_{1,\delta})$ for simplicity, one has
$$2\left|\sin\frac{L+L_\delta}{4}\,\sin\frac{L-L_\delta}{4}\right| = \left|\cos\frac{\ell(\mu_\delta,\mu_{1,\delta})}{2} - \cos\frac{\ell(\mu,\mu_1)}{2}\right|,$$
and since $L, L_\delta \in (0,\pi)$ and $L_\delta \le L$ from the assumption, one sees $(L+L_\delta)/4 \ge L/4$ and $0 \le (L-L_\delta)/4 \le \pi/2$. Since $(2/\pi)\cdot x \le \sin x$, $x \in [0,\pi/2]$, we have
$$2\cdot\frac{2}{\pi}\cdot\frac{L-L_\delta}{4}\,\sin\frac{L}{4} \le 2\left|\sin\frac{L+L_\delta}{4}\,\sin\frac{L-L_\delta}{4}\right|$$
and hence
$$\frac{L-L_\delta}{\pi}\,\sin\frac{L}{4} \le \|p_{1,\delta}-p_1\|_{C^0}^{1/2} + \|p_\delta-p\|_{C^0}^{1/2}, \tag{4.16}$$
from which the desired inequality (4.14) is obtained.

Now, $p$ and $p_1$ have been mollified as above by $p_\delta$, $p_{1,\delta}$, respectively, so, by the aid of (4.14), we can take $\delta_0 > 0$ such that $|L_\delta - L| = |\ell(\mu_\delta,\mu_{1,\delta}) - \ell(\mu,\mu_1)| < \varepsilon/4$ for any $\delta$ satisfying $0 < \delta < \delta_0$, so we have $\ell(\mu,\mu_1) - \varepsilon/4 < \ell(\mu_\delta,\mu_{1,\delta})$. Therefore, from (4.13) it follows that for sufficiently small $\delta$ the length of $c_\delta$ satisfies $L(c_\delta) < \ell(\mu,\mu_1) - \varepsilon/2 < \ell(\mu,\mu_1) - \varepsilon/4 < \ell(\mu_\delta,\mu_{1,\delta})$. This leads to a contradiction, since $\ell(\mu_\delta,\mu_{1,\delta})$ is the distance of $\mu_\delta$ and $\mu_{1,\delta}$ in $\mathcal{P}^\infty(M)$. Thus, we can assert that the function $\ell$ gives the Riemannian distance of two measures $\mu, \mu_1$ of $\mathcal{P}^\infty(M)$ not only in $\mathcal{P}^\infty(M)$ but also in $\mathcal{P}(M)$.

Proof of Theorem 4.12.
The Riemannian distance in $\mathcal{P}(M)$ of probability measures $\mu$ and $\mu_1$ which belong to $\mathcal{P}(M)$ is given by $\inf_{c\in\mathcal{C}(\mu,\mu_1)} L(c)$. We assume $\inf_c L(c) < \ell(\mu,\mu_1)$. Then, the proof of Theorem 4.11 also applies, with a minor modification, even though $\mu$, $\mu_1$ only admit a continuous density function. From the arguments in the proof of Theorem 4.11, we obtain $\inf_c L(c) = \ell(\mu,\mu_1)$, which implies that $\ell(\cdot,\cdot)$ gives the Riemannian distance in $\mathcal{P}(M)$ with respect to the Fisher metric $G$.

$\mathcal{P}(M)$

In this section we introduce a certain topology and a smooth structure on $\mathcal{P}(M)$ by means of the argument of Pistone and Sempi developed in [27]. For this purpose, let $(\Omega,\mathcal{B},\lambda)$ be a probability space in a more general setting and denote by $\mathcal{M}_\lambda$ the set of $L^1$-integrable density functions of all the probability measures $\mu$ equivalent to $\lambda$, i.e., $\mu \ll \lambda$, $\lambda \ll \mu$,
$$\mathcal{M}_\lambda := \left\{\mu \;\middle|\; \frac{d\mu}{d\lambda}\,(= p) \in L^1(\Omega,\lambda),\ p > 0\ \ \lambda\text{-a.s.},\ E_\lambda\left[\frac{d\mu}{d\lambda}\right] = 1\right\}. \tag{5.1}$$
$E_\lambda[\cdot]$ is the expectation with respect to $\lambda$. Let $\mu = p\lambda$ be an arbitrary probability measure of $\mathcal{M}_\lambda$. For a real valued random variable $u$, i.e., a measurable function on $(\Omega,\mathcal{B},\mu)$, we denote by $\hat u_\mu(t)$ the moment generating function of $u$, defined by $\hat u_\mu(t) := \int_\Omega \exp(tu)\,d\mu = E_\mu[\exp(tu)]$. Define for each $\mu$ a vector space consisting of certain random variables;
$$V_\mu := \left\{u \in L^1(\Omega,\mu) \mid 0 \in D(\hat u_\mu),\ E_\mu[u] = 0\right\}. \tag{5.2}$$
The first condition, $0 \in D(\hat u_\mu)$, means that the domain of $\hat u_\mu$ contains a neighborhood of $0$ in $\mathbb{R}$. Then, $V_\mu$ turns out to be a closed linear subspace of a Banach space $L^\phi(\Omega,\mu)$, the Orlicz space of the Young function $\phi = \phi(t)$;
$$L^\phi(\Omega,\mu) := \left\{u \text{ is a random variable} \;\middle|\; \exists\, a > 0,\ E_\mu\left[\phi\left(\frac{u}{a}\right)\right] < +\infty\right\} \tag{5.3}$$
with norm
$$\|u\|_{\phi,\mu} := \inf\left\{a > 0 \;\middle|\; E_\mu\left[\phi\left(\frac{u}{a}\right)\right] \le 1\right\}. \tag{5.4}$$
Note the Young function $\phi(t)$ is a real valued, convex, even function on $\mathbb{R}$ satisfying $\phi(0) = 0$, strictly increasing for $t > 0$, with $\lim_{t\to\infty} t^{-1}\phi(t) = +\infty$. In [27] $\phi(t) = \cosh t - 1$ is used. The Orlicz space $L^\phi(\Omega,\mu)$ of the Young function $\phi$ is the generalization of the space $L^p(\Omega,\mu)$ of $L^p$-integrable functions on $\Omega$, $p \ge 1$. For a precise argument refer to [27]. It is shown in [27] that $V_\mu$ coincides with the closed linear subspace
$$L_0^{(\cosh-1)}(\Omega,\mu) = \{u \in L^{(\cosh-1)}(\Omega,\mu) \mid E_\mu[u] = 0\} \subset L^{(\cosh-1)}(\Omega,\mu)$$
and the following holds;
$$L^{\infty,0}(\Omega,\mu) \hookrightarrow V_\mu\ \bigl(= L_0^{(\cosh-1)}(\Omega,\mu)\bigr) \hookrightarrow \bigcap_{p>1} L^{p,0}(\Omega,\mu), \tag{5.5}$$
where the symbol "$\hookrightarrow$" means a continuous and dense embedding. The space $\mathcal{P}(M)$, our main subject in this paper, turns out to be a dense subset of $\mathcal{M}_\lambda$ for $(\Omega = M, \mathcal{B} = \mathcal{B}(M), \lambda)$.

Let $\mathcal{V}_\mu = \{u \in L^{(\cosh-1)}(\Omega,\mu) \mid \|u\|_{\phi,\mu} < 1\} \cap V_\mu$ be the unit open ball in $V_\mu$. Then, the injective map
$$\sigma_\mu : \mathcal{V}_\mu \ni u \mapsto \exp[u - \Psi_\mu(u)]\,\mu = \frac{\exp u}{E_\mu[\exp u]}\,\mu \in \mathcal{M}_\lambda \tag{5.6}$$
together with $U_\mu = \sigma_\mu(\mathcal{V}_\mu)$, the image of $\mathcal{V}_\mu$, and $s_\mu = \sigma_\mu^{-1}$, the inverse map of $\sigma_\mu$, yields a chart of $\mathcal{M}_\lambda$ around $\mu$. Here $\Psi_\mu(u) = \log E_\mu[\exp u]$ is the cumulant generating function of $u$. Notice that $s_\mu$ has the form
$$s_\mu(\nu) = \log\left(\frac{d\nu}{d\mu}\right) - E_\mu\left[\log\left(\frac{d\nu}{d\mu}\right)\right], \qquad \nu \in U_\mu, \tag{5.7}$$
so that the transition function between $s_{\mu_1}(U_{\mu_1}\cap U_{\mu_2})$ and $s_{\mu_2}(U_{\mu_1}\cap U_{\mu_2})$ of $\mathcal{M}_\lambda$ is represented by an affine transform of the form
$$s_{\mu_2}\circ s_{\mu_1}^{-1}(u) = u + \log\left(\frac{d\mu_1}{d\mu_2}\right) - E_{\mu_2}\left[u + \log\left(\frac{d\mu_1}{d\mu_2}\right)\right].$$

Theorem 5.1 ([27, Theorem 3.3]). The collection of pairs $\{(U_\mu, s_\mu) ; \mu \in \mathcal{M}_\lambda\}$ defines an affine smooth atlas on $\mathcal{M}_\lambda$.

The atlas of $\mathcal{M}_\lambda$ necessarily induces a topology which is shown to be equivalent to the topology induced from the $e$-convergence defined in [27, Definition 1.1].

From this theorem the map $\varphi$, the normalized geometric mean given in Definition 1.1, turns out to be smooth. In fact, one can represent $\varphi$ as the arithmetic mean in terms of the local coordinate maps $\sigma_\mu$ and $s_\mu$.

Lemma 5.2.
$$s_{\bar\mu}\bigl(\varphi(\sigma_\mu(u),\sigma_{\mu'}(u'))\bigr) = \frac{1}{2}\bigl\{u - E_{\bar\mu}[u] + u' - E_{\bar\mu}[u']\bigr\}, \qquad u \in \mathcal{V}_\mu,\ u' \in \mathcal{V}_{\mu'}, \tag{5.8}$$
where one sets $\bar\mu = \varphi(\mu,\mu')$ for $\mu, \mu' \in \mathcal{M}_\lambda$.

Proof.
This is given by a short computation from the formula
$$\varphi(\sigma_\mu(u),\sigma_{\mu'}(u')) = \frac{1}{\int_M \sqrt{\exp u\,\exp u'}\,d\bar\mu}\,\sqrt{\exp u\,\exp u'}\;\bar\mu, \qquad \bar\mu = \varphi(\mu,\mu'), \tag{5.9}$$
together with (5.7). (5.9) is derived as follows. From Definition 1.1
$$\varphi(\sigma_\mu(u),\sigma_{\mu'}(u')) = \frac{1}{\int_M \sqrt{\frac{d\sigma_{\mu'}(u')}{d\sigma_\mu(u)}}\,d\sigma_\mu(u)}\,\sqrt{\frac{d\sigma_{\mu'}(u')}{d\sigma_\mu(u)}}\;\sigma_\mu(u), \tag{5.10}$$
where
$$\frac{d\sigma_{\mu'}(u')}{d\sigma_\mu(u)} = \frac{\exp u'\,(E_{\mu'}[\exp u'])^{-1}}{\exp u\,(E_\mu[\exp u])^{-1}}\,\frac{d\mu'}{d\mu}, \tag{5.11}$$
which is ensured by the Radon-Nikodym derivatives of the measures $\sigma_\mu(u)$, $\sigma_{\mu'}(u')$ with respect to the measures $\mu$, $\mu'$, respectively. Then
$$\sqrt{\frac{d\sigma_{\mu'}(u')}{d\sigma_\mu(u)}}\;\sigma_\mu(u) = \sqrt{\frac{\exp u\,\exp u'}{E_\mu[\exp u]\,E_{\mu'}[\exp u']}}\,\sqrt{\frac{d\mu'}{d\mu}}\;\mu. \tag{5.12}$$
Since $\bar\mu = \varphi(\mu,\mu')$, one finds $\sqrt{\frac{d\mu'}{d\mu}}\,\mu = \left(\int_M \sqrt{\frac{d\mu'}{d\mu}}\,d\mu\right)\bar\mu$, so
$$\int_M \sqrt{\frac{d\sigma_{\mu'}(u')}{d\sigma_\mu(u)}}\,d\sigma_\mu(u) = \frac{\int_M \sqrt{\frac{d\mu'}{d\mu}}\,d\mu}{\sqrt{E_\mu[\exp u]\,E_{\mu'}[\exp u']}}\int_M \sqrt{\exp u\,\exp u'}\,d\bar\mu \tag{5.13}$$
and
$$\sqrt{\frac{d\sigma_{\mu'}(u')}{d\sigma_\mu(u)}}\;\sigma_\mu(u) = \frac{\int_M \sqrt{\frac{d\mu'}{d\mu}}\,d\mu}{\sqrt{E_\mu[\exp u]\,E_{\mu'}[\exp u']}}\,\sqrt{\exp u\,\exp u'}\;\bar\mu, \tag{5.14}$$
from which (5.9) follows.

The smoothness of $\varphi$ at any $(\mu,\mu')$ is immediately derived from (5.8).

As the space $\mathcal{M}_\lambda$ can be treated as an affine manifold, in the rest of this section, by applying the definition of tangent vectors to $\mathcal{M}_\lambda$ given in [27], we present the Fisher information metric in local coordinate expression. Now let $c : I \to \mathcal{M}_\lambda$ be a $C^1$-curve of $\mathcal{M}_\lambda$ with $c(t) \in U_\nu$ with respect to a chart $(U_\nu, s_\nu)$ associated to $\nu \in \mathcal{M}_\lambda$, where $I$ is an open interval. We have then the $C^1$-curve $u_\nu(t) = s_\nu\circ c(t)$ in $\mathcal{V}_\nu$ in the form of
$$u_\nu(t) = \log\left(\frac{dc(t)}{d\nu}\right) - E_\nu\left[\log\left(\frac{dc(t)}{d\nu}\right)\right] \tag{5.15}$$
with velocity vector $u_\nu'(t_0) = (ds_\nu)_{c(t_0)}(c'(t_0))$ belonging to $V_\nu$;
$$u_\nu'(t_0) = \left\{\frac{d}{dt}\log\left(\frac{dc(t)}{d\nu}\right) - \frac{d}{dt}E_\nu\left[\log\left(\frac{dc(t)}{d\nu}\right)\right]\right\}\Bigg|_{t_0}. \tag{5.16}$$
When $c(t) \in U_{\nu_1}$, with respect to another chart $(U_{\nu_1}, s_{\nu_1})$, we have similarly the $C^1$-curve $u_{\nu_1}(t) = s_{\nu_1}\circ c(t)$ in $\mathcal{V}_{\nu_1}$ with velocity vector $u_{\nu_1}'(t_0) \in V_{\nu_1}$. It is shown from the affine structure of the space $\mathcal{M}_\lambda$, stated in Theorem 5.1, that the difference $u_\nu'(t_0) - u_{\nu_1}'(t_0)$ is a constant function. From this fact the tangent vector of $c(t)$ at $t = t_0$ in local coordinate expression is defined as the collection of such velocity vectors, and we denote it by $[c'(t_0)]$. The set of all tangent vectors is a vector space, denoted by $T_{c(t_0)}\mathcal{M}_\lambda$. To formulate the Fisher information metric in local coordinate expression we select a particular velocity vector from the collection $[c'(t_0)]$, namely $u_\mu'(t_0) = (ds_\mu)_\mu(c'(t_0))$, where $u_\mu(t) = s_\mu\circ c(t)$ is a curve in $\mathcal{V}_\mu$ with respect to a chart $(U_\mu, s_\mu)$ for which $c(t_0) = \mu$. Notice that $u_\mu(t_0) = 0$ and $u_\mu(t) \in V_\mu$ for any $t$, and hence $u_\mu'(t_0) \in \operatorname{Ker} E_\mu$. By using particular tangent vectors we have

Definition 5.3.
Let $\tau, \tau_1 \in T_\mu\mathcal{M}_\lambda$ be tangent vectors at $\mu$, and let $[u]$, $[u_1]$ be the corresponding tangent vectors in local coordinate expression, respectively. Then the scalar product of $[u]$, $[u_1]$ is defined by
$$\langle [u],[u_1]\rangle_\mu = \int_\Omega u_\mu'(t_0)\,u_{1,\mu}'(t_0)\,d\mu,$$
where $u_\mu'(t_0)$ and $u_{1,\mu}'(t_0)$ are the particular velocity vectors of the curves $u_\mu(t) = s_\mu\circ c(t)$, $u_{1,\mu}(t) = s_\mu\circ c_1(t)$ representing $[u]$, $[u_1]$, respectively, and $c(t) := \mu + (t-t_0)\tau$, $c_1(t) := \mu + (t-t_0)\tau_1$ are the corresponding curves in $\mathcal{M}_\lambda$.

Note 5.4. The scalar product is stated in [27] as a quadratic form on $T_\mu\mathcal{M}_\lambda$. It can be represented in the following form;
$$\langle [u],[u_1]\rangle_\mu = \int_\Omega \bigl(u_\nu'(t_0) - E_\mu[u_\nu'(t_0)]\bigr)\bigl(u_{1,\nu}'(t_0) - E_\mu[u_{1,\nu}'(t_0)]\bigr)\,d\mu \tag{5.17}$$
$$= \int_\Omega u_\nu'(t_0)\cdot u_{1,\nu}'(t_0)\,d\mu - E_\mu[u_\nu'(t_0)]\cdot E_\mu[u_{1,\nu}'(t_0)], \tag{5.18}$$
where $u_\nu'(t_0)$, $u_{1,\nu}'(t_0)$ are the vectors representing $[u]$, $[u_1]$ with respect to another chart $(U_\nu, s_\nu)$, respectively. The formula (5.17) is viewed as the covariance (5.18) of two random variables. (5.17) stems from the fact that the difference of the vector $u_\nu'(t_0)$ and the particular one $u_\mu'(t_0)$ is $u_\nu'(t_0) - u_\mu'(t_0) = E_\mu[u_\nu'(t_0)]$. It is indicated in [27, 3.4] that the cumulant 2-form has a representation as the scalar product (covariance).

Proposition 5.5.
The scalar product, thus defined, coincides with the Fisher information metric $G$, namely,
$$\langle [u],[u_1]\rangle_\mu = G_\mu(\tau,\tau_1), \qquad \tau, \tau_1 \in T_\mu\mathcal{M}_\lambda. \tag{5.19}$$
Here $[u]$, $[u_1]$ are the corresponding tangent vectors of $\tau$, $\tau_1$, respectively, in local coordinate expression.

In fact, the left hand side of (5.19) has the form of
$$\int \frac{q(x)}{p(x)}\,\frac{q_1(x)}{p(x)}\,p(x)\,d\lambda(x),$$
where $p$, $q$ and $q_1$ are the density functions of $\mu$, $\tau$ and $\tau_1$ with respect to $\lambda$, respectively. Since $p + (t-t_0)q$ is the density function of the curve $c(t)$, one finds
$$u_\mu(t) = \log\frac{p + (t-t_0)q}{p} - E_\mu\left[\log\frac{p + (t-t_0)q}{p}\right]$$
and hence
$$u_\mu'(t_0) = \frac{d}{dt}\Bigg|_{t=t_0}\left(\log\frac{p + (t-t_0)q}{p} - E_\mu\left[\log\frac{p + (t-t_0)q}{p}\right]\right) = \frac{q}{p}.$$
Similarly one has $u_{1,\mu}'(t_0) = \frac{q_1}{p}$ to obtain (5.19).

We close this section by giving a comment on constant vector fields. By using constant vector fields, Friedrich obtains in [12] the formulae of the Levi-Civita connection and geodesics without any argument on the coordinate structure of the space of probability measures.

By using the notion of being connected by an open mixture arc introduced in [8] (see also [30]), the argument of constant vector fields is well treated. Two probability measures $\mu = p\lambda$, $\mu_1 = p_1\lambda$ of $\mathcal{P}(M)$ are connected by an open mixture arc if there exists an open interval $I\ (\supset [0,1])$ such that $t\mu + (1-t)\mu_1$ belongs to $\mathcal{P}(M)$ for every $t \in I$. Here we denote by $\mathcal{P}(M)$ the space of probability measures $\mu = p\lambda$ which satisfy $\mu \ll \lambda$ with $p \in C^0(M)$, where $\lambda$ is the Riemannian volume form on a complete Riemannian manifold $M$ of unit volume. $M$ is not necessarily assumed to be compact. We easily find that this notion is an equivalence relation from [30, Theorem 4.11]. Moreover this theorem asserts that $\mu = p\lambda$ and $\mu_1 = p_1\lambda$ are connected by an open mixture arc if and only if there exist constants $c_1$, $c_2$ with $0 < c_1 < 1 < c_2$ such that $c_1 < d\mu_1/d\mu\,(x)\ (= p_1(x)/p(x)) < c_2$ for any $x \in M$.
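The open mixture arc criterion can be made concrete on a finite set. The following sketch is an illustration under an assumption not made in the paper: $M$ is replaced by finitely many points, so $\mu$ and $\mu_1$ are positive probability vectors. In that setting every pair is connected, and the code merely computes the density-ratio bounds and the largest open interval of $t$ on which $t\mu + (1-t)\mu_1$ stays positive. Function names are hypothetical.

```python
def ratio_bounds(mu, mu1):
    # Smallest and largest value of d(mu1)/d(mu) over the finite set
    ratios = [b / a for a, b in zip(mu, mu1)]
    return min(ratios), max(ratios)

def mixture_interval(mu, mu1):
    # Largest open interval (lo, hi) of t with t*mu + (1 - t)*mu1 > 0
    # componentwise; each component equals b + t*(a - b).
    lo, hi = float("-inf"), float("inf")
    for a, b in zip(mu, mu1):
        d = a - b
        if d > 0:
            lo = max(lo, -b / d)   # need t > -b/d
        elif d < 0:
            hi = min(hi, -b / d)   # need t < -b/d = b/(b - a)
    return lo, hi
```

For `mu = [0.5, 0.3, 0.2]` and `mu1 = [0.2, 0.3, 0.5]` the interval strictly contains `[0, 1]`, as required by the definition. On a non-compact $M$ the ratio can be unbounded, in which case no such interval exists.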
Therefore, let $\mathcal{P}_m(M)$ be the space of probability measures $\mu = p\lambda \in \mathcal{P}(M)$ which are connected with $\lambda$ by an open mixture arc. Notice that arbitrary $\mu, \mu_1$ belonging to $\mathcal{P}_m(M)$ are connected with each other by an open mixture arc. $\mathcal{P}_m(M)$ coincides with $\mathcal{P}(M)$, provided $M$ is compact. We define a constant vector field at every probability measure of $\mathcal{P}_m(M)$ as follows.

Proposition 5.6.
Set
$$V_m(M) := \left\{\nu = q\lambda \;\middle|\; q \in C^0(M),\ \int_M d\nu = 0,\ \lambda + t\nu \in \mathcal{P}_m(M) \text{ for any } t \in (-\varepsilon,\varepsilon)\right\}, \tag{5.20}$$
regarded as the tangent space at $\lambda$ to $\mathcal{P}_m(M)$; $T_\lambda\mathcal{P}_m(M)$. Here $\varepsilon > 0$ depends on $\nu$. Then,
(i) $V_m(M)$ is a vector space.
(ii) Every $\tau \in V_m(M)$ induces a constant vector field at every $\mu \in \mathcal{P}_m(M)$. In other words, each $\tau \in V_m(M)$ yields measures $\mu + t\tau$ in $\mathcal{P}_m(M)$, $t \in (-\varepsilon,\varepsilon)$, for any $\mu \in \mathcal{P}_m(M)$.

Proof. First we show that $V_m(M)$ is a vector space. Let $\tau = q\lambda$ and $\tau' = q'\lambda \in V_m(M)$. From the positivity of the density function of $\lambda + t\tau$ there exists $\varepsilon > 0$ such that $1 + tq(x) > 0$ for any $t \in (-\varepsilon,\varepsilon)$. Moreover, from connectedness by an open mixture arc one asserts, from [30, Theorem 4.11], that for any fixed $t \in (-\varepsilon,\varepsilon)$ there exist constants $0 < k_1 < 1 < k_2$ such that
$$0 < k_1 < \frac{d(\lambda + t\tau)}{d\lambda}(x) = 1 + tq(x) < k_2, \qquad x \in M. \tag{5.21}$$
This indicates in passing the boundedness of $|q|$, as $|q(x)| < \frac{2}{\varepsilon}\max\{k_2 - 1,\ 1 - k_1\}$, $x \in M$, by letting $t = \varepsilon/2$. It follows that $c\tau \in V_m(M)$ for any $c \in \mathbb{R}$. We see next that $\tau + \tau'$ belongs to $V_m(M)$ as follows. For $\tau'$ we have, similarly as for $\tau$, that for any fixed $t \in (-\varepsilon',\varepsilon')$ there exist constants $0 < k_1' < 1 < k_2'$ such that
$$0 < k_1' < \frac{d(\lambda + t\tau')}{d\lambda}(x) = 1 + tq'(x) < k_2', \qquad x \in M. \tag{5.22}$$
Then, from (5.21), (5.22) we have
$$\frac{1}{2}(k_1 + k_1') < \frac{1}{2}\bigl(1 + 2tq(x) + 1 + 2tq'(x)\bigr) = 1 + t\bigl(q(x) + q'(x)\bigr) < \bigl(1 + tq(x)\bigr) + \bigl(1 + tq'(x)\bigr) < k_2 + k_2'$$
for any $t$ satisfying $-\frac{1}{2}\min\{\varepsilon,\varepsilon'\} < t < \frac{1}{2}\min\{\varepsilon,\varepsilon'\}$. Hence, this shows that $\tau + \tau'$ belongs to $V_m(M)$.

(ii) is shown as follows. Let $\mu = p\lambda \in \mathcal{P}_m(M)$ and $\tau = q\lambda \in V_m(M)$ be arbitrary. Since $\mu$ is connected with $\lambda$ by an open mixture arc, there exist constants $0 < c_1 < 1 < c_2$ such that $c_1 < d\mu/d\lambda\,(x) = p(x) < c_2$, $x \in M$, and thus $c_1 + tq(x) < p(x) + tq(x) < c_2 + tq(x)$, $x \in M$. Hence
$$c_1\left(1 + \frac{t}{c_1}\cdot q(x)\right) < p(x) + tq(x) < c_2\left(1 + \frac{t}{c_2}\cdot q(x)\right), \qquad x \in M.$$
We may assume (5.21) for this $\tau$. Then, for any fixed $t$ satisfying $-\varepsilon c_1 < t < \varepsilon c_1$ one has $p(x) + tq(x) > c_1(1 + tq(x)/c_1) > c_1k_1 > 0$, $x \in M$, and similarly $p(x) + tq(x) < c_2(1 + tq(x)/c_2) < c_2k_2$. These imply that $\mu + t\tau$, $-c_1\varepsilon < t < c_1\varepsilon$, defines a probability measure in $\mathcal{P}_m(M)$, namely $\tau$ induces a tangent vector at $\mu$ and hence a constant vector field everywhere on $\mathcal{P}_m(M)$.

For any $\mu$, $\mu_1$ of $\mathcal{P}_m(M)$ their difference $\mu - \mu_1$ belongs to $V_m(M)$.

From this proposition the inner product $G_\mu(\tau,\tau')$ for $\tau, \tau' \in V_m(M)$, $\mu \in \mathcal{P}_m(M)$ is well defined, since $1/p(x)$ and $|q(x)|$, $|q'(x)|$ are bounded from above.

Remark 5.7.
By using the constant vector field technique employed by T. Friedrich in [12] together with the notion of connectedness by an open mixture arc, we can study geodesics on the space of probability measures directly, not via the local coordinate maps $\sigma_\mu$, $s_\mu$ defined in [27]. The Gaussian measure $\mu_{(c,d)}$ of mean value $c$ and variance $d\ (> 0)$ on the one-dimensional euclidean space $\mathbb{R}$ is connected with the Gaussian measure $\mu_{(c_1,d_1)}$ if and only if $(c_1,d_1) = (c,d)$. Therefore, for a space of probability measures on $\mathbb{R}$ including all Gaussian measures it is hard to use the notion of connectedness by an open mixture arc, so that the notion of open exponential arc together with the local coordinate maps $\sigma_\mu$, $s_\mu$ of [27] seems to be more suitable.

Acknowledgement
The authors would like to thank the referees for their valuable comments and for pointing out relevant references.

References

[1] S. Amari, Information Geometry and Its Applications, Appl. Math. Sci., Springer, 2016.
[2] S. Amari and H. Nagaoka, Methods of Information Geometry, Trans. Math. Monogr., AMS, Oxford, 2000.
[3] T. Aubin, Nonlinear Analysis on Manifolds. Monge-Ampère Equations, Grund. math. Wiss., Springer-Verlag, New York, 1982.
[4] M. Bauer, M. Bruveris and P.W. Michor, Uniqueness of the Fisher-Rao metric on the space of smooth densities, Bull. London Math. Soc. (2016), 499–506.
[5] G. Besson, G. Courtois and S. Gallot, Entropies et rigidités des espaces localement symétriques de courbure strictement négative, Geom. Funct. Anal. (1995), 731–799.
[6] H. Brezis, Functional analysis, Sobolev spaces and partial differential equations, Universitext, Springer, New York, 2011.
[7] P. S. Bullen, Handbook of means and their inequalities, Math. Appl., Kluwer Academic Publishers Group, Dordrecht, 2003.
[8] A. Cena and G. Pistone, Exponential statistical manifold, Annals of the Institute of Statistical Mathematics (2007), 27–56.
[9] M. P. do Carmo, Riemannian Geometry, Birkhäuser, Boston, MA, 1992.
[10] E. Douady and C. Earle, Conformally natural extension of homeomorphisms of the circle, Acta Math. (1986), 23–48.
[11] A. Fathi, Structure of the group of homeomorphisms preserving a good measure on a compact manifold, Ann. Scient. Éc. Norm. Sup. (1980), 45–93.
[12] T. Friedrich, Die Fisher-Information und symplektische Strukturen, Math. Nachr. (1991), 273–296.
[13] P. Gibilisco and G. Pistone, Connections on Non-Parametric Statistical Manifolds by Orlicz Space Geometry, Infin. Dimens. Anal. Quantum Probab. Relat. Top. (1998), 325–347.
[14] M. Itoh and H. Satoh, Information geometry of Poisson kernels on Damek-Ricci spaces, Tokyo J. Math. (2010), 129–144.
[15] M. Itoh and H. Satoh, Geometry of Fisher information metric and the barycenter map, Entropy (2015), 1814–1849.
[16] M. Itoh and H. Satoh, Riemannian distance and diameter of the space of probability measures and the parametrix, in preparation.
[17] M. Itoh and H. Satoh, Information geometry of the space of probability measures and barycenter maps, Sugaku (2017), 387–406 (in Japanese; English version to appear in Sugaku Expositions, AMS).
[18] M. Itoh, H. Satoh and Y. Shishido, A note on the Fisher information metric and heat kernels, Intern. J. Pure Applied Math. (2008), 347–353.
[19] M. Itoh and Y. Shishido, Fisher information metric and Poisson kernels, Diff. Geom. Appl. (2008), 347–356.
[20] S. Lang, Differential and Riemannian Manifolds, GTM 160, Springer-Verlag, New York, 1995.
[21] J. Milnor, Morse Theory, Princeton Univ. Press, Princeton, 1963.
[22] M.S. Nikulin, Hellinger distance, in Encyclopedia of Mathematics, Springer.
[23] A. Ohara, Geodesics for dual connections and means on symmetric cones, Integr. Equ. Oper. Theory (2004), 537–548.
[24] J. Oxtoby and S. Ulam, Measure preserving homeomorphisms and metrical transitivity, Ann. of Math. (1941), 874–920.
[25] G. Pistone, Nonparametric Information Geometry, in Geometric Science of Information, LNCS 8085, Springer, 2013, 5–6.
[26] G. Pistone, Information Geometry of the Gaussian Space, arXiv:1803.08135v1, 2018.
[27] G. Pistone and C. Sempi, An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one, Ann. Stat. 23 (1995), 1543–1561.
[28] C.R. Rao, Information and the accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc. (1945), 81–91.
[29] T. Sakai, Riemannian Geometry, Trans. Math. Mono., A.M.S., Providence, 1996.
[30] M. Santacroce, P. Siri and B. Trivellato, New results on mixture and exponential models by Orlicz spaces, Bernoulli 22