Differentiable maps between Wasserstein spaces
arXiv preprint [math.MG]
Bernadette Lessel
Max Planck Institute for the History of Science, Berlin, Germany
Thomas Schick
Mathematical Institute, University of Göttingen
October 5, 2020

ABSTRACT
A notion of differentiability is proposed for maps $F : W_2(M) \to W_2(N)$ between Wasserstein spaces of order 2, where $M$ and $N$ are smooth, connected and complete Riemannian manifolds. Due to the nature of the tangent space construction on Wasserstein spaces, we only give a global definition of differentiability, i.e. without a prior notion of pointwise differentiability. With our definition, however, we recover the expected properties of a differential. Special focus is put on differentiability properties of maps of the form $F = f_\#$, $f : M \to N$, and on convex mixing of differentiable maps, with an explicit construction of the differential.

Contents
2.1 ... $W_p(X)$
2.2 The continuity equation on $W(M)$
2.3 The tangent space $T_\mu W(M)$
3.2 ... $dF_\mu$
3.3 Differentiable maps between Wasserstein spaces
3.4 Pullbacks and formal Riemannian isometries
3.5 Convex mixing of maps
A Disintegration theorem
Fundamental work has been done on the weak Riemannian manifold structure and second order analysis on Wasserstein spaces $W_2(M)$, most notably by Felix Otto [Ott01], John Lott [Lot07] and Nicola Gigli [Gig12]. However, to our knowledge, no notion of differentiability for maps between Wasserstein spaces has been proposed in the literature yet.
We begin with a reminder of Wasserstein spaces and their weak differentiable structure, to motivate the definitions we make later on. Our notion of differentiability for maps $F : W_2(M) \to W_2(N)$ between Wasserstein spaces is a global one, in the sense that it does not use a pointwise notion of differentiability. A pointwise notion does not seem to be possible in an immediate way, due to the way tangent spaces are constructed in Wasserstein geometry: the basis for talking about tangent vectors along curves in $W_2(M)$ is the weak continuity equation $\partial_t \mu_t + \nabla\cdot(v_t\mu_t) = 0$, which can be seen as a differential characterization of absolutely continuous curves in $W_2(M)$ (see Theorem 9). The curve of minimal vector fields $v_t$ that solves the continuity equation for an absolutely continuous curve $\mu_t$ is then seen as being tangential along $\mu_t$. However, $v_t$ is only defined for almost every $t$, so that a pointwise evaluation is not meaningful, which undermines the definition of a pointwise notion of differentiability in our approach. The differential of a map is, however, defined in a pointwise manner.

Our account of differentiable maps between Wasserstein spaces begins with the definition of absolutely continuous maps, which map absolutely continuous curves to absolutely continuous curves. This definition is made in analogy to the theorem in differential geometry that a map $f : M \to N$ is differentiable if and only if it maps differentiable curves to differentiable curves. Absolutely continuous maps serve as a pre-notion to differentiability. An absolutely continuous map $F : W_2(M) \to W_2(N)$ is then said to be differentiable if for every $\mu \in W_2(M)$ there exists a bounded linear map $dF_\mu$ between the tangent space at $\mu$ and the tangent space at $F(\mu)$ such that for every absolutely continuous curve $\mu_t$ the image curve $dF_{\mu_t}(v_t)$ of the curve of tangent vector fields $v_t$ along $\mu_t$ is a curve of tangent vector fields along $F(\mu_t)$ (Definition 27).
The collection of all these $dF_\mu$, in the sense of a bundle map between tangent bundles, is then called the differential $dF$ of $F$.

We show that $dF$ is unique up to a redefinition on a negligible set. Also, the usual properties of the differential are derived, such as the expected differential of the constant and of the identity mapping, of the composition of two differentiable maps and of the inverse of a differentiable map.

Special attention is paid to maps of the form $F = f_\#$, where measures are mapped to their image measure with respect to $f : M \to N$, $f$ being smooth and proper with $\sup_{x\in M}\|df_x\| < \infty$. Maps of this kind are absolutely continuous, and an explicit formula is derived for a curve of vector fields satisfying the continuity equation together with $F(\mu_t)$, where $\mu_t$ is absolutely continuous. Unfortunately, it is not true in general that this curve of vector fields is actually tangent to $F(\mu_t)$, i.e. minimal. To enforce that, one can, however, apply a projector onto the respective tangent spaces, for almost every $t$, which in particular guarantees the existence of a differential for $F$.

Further focus is put on the differentiability properties of convex mixings of maps between Wasserstein spaces, as they provide a class of non-trivial maps which are not given by a pushforward of measures.

For background knowledge on Wasserstein geometry and optimal transport we refer to [AG13] and [Vil08].

Wasserstein geometry is a dynamical structure on Wasserstein spaces, which basically are sets of probability measures together with the Wasserstein distance.

Let thus $(X, d)$ be a Polish space, where $d$ metrizes the topology of $X$, and $\mathcal{P}(X)$ the set of all probability measures on $X$ with respect to the Borel $\sigma$-algebra $\mathcal{B}(X)$. Instead of $(X, d)$ we will often just write $X$.
A measurable map $T : X \to Y$ between two Polish spaces induces a map between the respective spaces of probability measures via the pushforward $T_\#$ of measures:
\[ T_\# : \mathcal{P}(X) \to \mathcal{P}(Y), \qquad \mu \mapsto T_\#\mu, \]
where $T_\#\mu(A) := \mu(T^{-1}(A))$ for $A \in \mathcal{B}(Y)$. The support of a measure $\mu$ is defined by $\mathrm{supp}(\mu) := \{x \in X \mid \text{every open neighbourhood of } x \text{ has positive } \mu\text{-measure}\}$. The Lebesgue measure on $\mathbb{R}^n$ is denoted by $\lambda$.

2.1 ... $W_p(X)$

We denote the set of probability measures which have finite $p$-th moment by $\mathcal{P}_p(X)$, where $p \in [1, \infty)$:
\[ \mathcal{P}_p(X) := \Big\{ \mu \in \mathcal{P}(X) \;\Big|\; \int_X d^p(x_0, x)\, d\mu(x) < \infty \Big\}. \]
Note that $\mathcal{P}_p(X)$ is independent of the choice of $x_0 \in X$. Furthermore, we define
\[ \mathrm{Adm}(\mu, \nu) := \{ \gamma \in \mathcal{P}(X \times Y) \mid \pi^X_\#\gamma = \mu,\ \pi^Y_\#\gamma = \nu \}, \]
the so-called admissible transport plans between $\mu$ and $\nu$. Here, $\pi^X : X \times Y \to X$, $\pi^X(x, y) = x$, and similarly for $\pi^Y$.
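As a small illustration that is not part of the original text, the pushforward and the set of admissible plans can be computed by hand for Dirac measures. The points $x \in X$, $y \in Y$ and the symbol $\delta_x$ for the Dirac measure at $x$ are notation assumed only for this sketch.

```latex
% Worked example (illustrative; \delta_x denotes the Dirac measure at x).
\begin{align*}
T_\#\delta_x(A) &= \delta_x\big(T^{-1}(A)\big) = \mathbf{1}_A\big(T(x)\big)
   \quad\Longrightarrow\quad T_\#\delta_x = \delta_{T(x)}, \\
\mu \otimes \nu &\in \mathrm{Adm}(\mu,\nu)
   \quad\text{(the product measure, so the set is never empty)}, \\
\mathrm{Adm}(\delta_x,\delta_y) &= \{\delta_{(x,y)}\}.
\end{align*}
% Reason for the last line: a plan \gamma with first marginal \delta_x
% gives full mass to \{x\} \times Y, and with second marginal \delta_y it
% gives full mass to X \times \{y\}; hence \gamma(\{(x,y)\}) = 1.
```

In particular, between two Dirac measures there is exactly one way to transport mass, so no optimization is needed in that case.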
Definition 1 (Wasserstein distances and Wasserstein spaces). Let $(X, d)$ be a Polish space and $p \in [1, \infty)$. Then
\[ W_p : \mathcal{P}_p(X) \times \mathcal{P}_p(X) \to \mathbb{R}, \qquad (\mu, \nu) \mapsto \Big( \inf_{\gamma \in \mathrm{Adm}(\mu,\nu)} \int_{X \times X} d^p(x, y)\, d\gamma(x, y) \Big)^{1/p} \]
is called the $p$-th Wasserstein distance, or Wasserstein distance of order $p$. The tuple $(\mathcal{P}_p(X), W_p)$ is called Wasserstein space and is denoted by the symbol $W_p(X)$.

The fact that $W_p$ is indeed a metric distance is a problem treated in optimal transport, where it is established that a minimizer for
\[ \inf_{\gamma \in \mathrm{Adm}(\mu,\nu)} \int_{X \times X} d^p(x, y)\, d\gamma(x, y) \]
actually exists. Such a minimizer is called optimal transport plan. In case a plan $\gamma \in \mathrm{Adm}(\mu, \nu)$ is induced by a measurable map $T : X \to Y$, i.e. in case $\gamma = (\mathrm{Id}, T)_\#\mu$, $T$ is called transport map. Then $T_\#\mu = \nu$.

One can show that $W_p(X)$ is complete and separable. Furthermore, $W_p$ metrizes the weak convergence in $\mathcal{P}_p(X)$.

Definition 2 (Weak convergence in $\mathcal{P}_p(X)$). A sequence $(\mu_k)_{k\in\mathbb{N}} \subset \mathcal{P}(X)$ is said to converge weakly to $\mu \in \mathcal{P}(X)$ if and only if $\int \varphi\, d\mu_k \to \int \varphi\, d\mu$ for any bounded continuous function $\varphi$ on $X$. This is denoted by $\mu_k \rightharpoondown \mu$. A sequence $(\mu_k)_{k\in\mathbb{N}} \subset \mathcal{P}_p(X)$ is said to converge weakly to $\mu \in \mathcal{P}_p(X)$ if and only if for some $x_0 \in X$:
1) $\mu_k \rightharpoondown \mu$ and
2) $\int d^p(x_0, x)\, d\mu_k(x) \to \int d^p(x_0, x)\, d\mu(x)$.
This is denoted by $\mu_k \rightharpoonup \mu$.

An important class of curves in Wasserstein space that we will need later on are constant speed geodesics.

Definition 3 (Constant speed geodesic). A curve $(\gamma_t)_{t\in[0,1]}$, $\gamma_0 \neq \gamma_1$, in a metric space $(X, d)$ is called a constant speed geodesic or metric geodesic in case that
\[ d(\gamma_t, \gamma_s) = |t - s|\, d(\gamma_0, \gamma_1) \qquad \forall\, t, s \in [0, 1]. \tag{1} \]
We will often abbreviate curves $(\gamma_t)_{t\in[0,1]}$ by writing $\gamma_t$ instead.

Definition 4 (Geodesic space).
A metric space $(X, d)$ is called geodesic if for every $x, y \in X$ with $x \neq y$ there exists a constant speed geodesic $\gamma_t$ with $\gamma_0 = x$ and $\gamma_1 = y$.

If $(X, d)$ is geodesic, then $W_2(X)$ is geodesic as well ([AG13]).

2.2 The continuity equation on $W(M)$

In the upcoming section, we will only be concerned with $W_2(M)$, where $M$ is a smooth, connected and complete Riemannian manifold with Riemannian metric tensor $h$ and associated Riemannian measure $\mu$. We will often write $W(M)$ instead of $W_2(M)$. Furthermore, we equip the set of measurable sections of $TM$, which we will denote by $\Gamma(TM)$, with an $L^2$-topology. That means, for $v \in \Gamma(TM)$ we define
\[ \|v\|_{L^2(\mu)} := \sqrt{\int_M h(v, v)\, d\mu} \]
and
\[ L^2(TM, \mu) := \{ v \in \Gamma(TM) \mid \|v\|_{L^2(\mu)} < \infty \} / \sim. \]
Here, two vector fields are considered to be equivalent in case they differ only on a set of $\mu$-measure zero. $L^2(TM, \mu)$ is a Hilbert space with the canonical scalar product. We will often write $L^2(\mu)$ if it is clear which manifold $M$ is referred to.

The (infinite dimensional) manifold structure that is commonly used on $W(M)$ is not a smooth structure in the sense of e.g. [KM97], where infinite dimensional manifolds are modeled on convenient vector spaces. The differentiable structure on $W(M)$ that will be introduced below rather consists of ad hoc definitions tailored to optimal transport and the Wasserstein metric structure, which only mimic conventional differentiable and Riemannian behavior.
Instead of starting with a smooth manifold structure, on Wasserstein spaces one starts with the notion of a tangent space. Traditionally, the basic idea of a tangent vector at a given point is that it indicates the direction in which a (smooth) curve will be going infinitesimally from that point. Then, the set of all such vectors which can be found to be tangent to some curve at a given fixed point are collected in the tangent space at that point. On $W(M)$, however, there is no notion of smooth curves. But there is a notion of metric geodesics. In case the transport plan for the optimal transport between two measures is induced by a map $T$, the interpolating geodesic on Hilbert spaces can be written as $\mu_t = ((1 - t)\mathrm{Id} + tT)_\#\mu_0$, thus being of the form $\mu_t = (F_t)_\#\mu_0$. More generally, on Riemannian manifolds optimal transport between $\mu_0$ and $\mu_t$ can be achieved by $\mu_t = (F_t)_\#\mu_0$, $F_t = \exp(t\nabla\varphi)$ (see e.g. [Vil08], Chapter 12). In these cases, $F_t$ is injective and locally Lipschitz for $0 < t < 1$ ([Vil03], Subsubsection 5.4.1). It is known from the theory of characteristics for partial differential equations that curves of this kind solve the weak continuity equation, together with the vector field whose integral lines are given by $F_t$.

Definition 5 (Continuity equation). Given a family of vector fields $(v_t)_{t\in[0,T]}$, a curve $\mu_t : [0, T] \to W(M)$ is said to solve the weak continuity equation
\[ \partial_t \mu_t + \nabla\cdot(v_t\mu_t) = 0, \tag{2} \]
if
\[ \int_0^T \int_M \Big( \frac{\partial}{\partial t}\varphi(x, t) + h\big(\nabla\varphi(x, t), v_t(x)\big) \Big)\, d\mu_t(x)\, dt = 0 \tag{3} \]
holds true for all $\varphi \in C_c^\infty((0, T) \times M)$.

Theorem 6 ([Vil03], Theorem 5.34). Let $(F_t)_{t\in[0,T)}$ be a family of maps on $M$ such that $F_t : M \to M$ is a bijection for every $t \in [0, T)$, $F_0 = \mathrm{Id}$ and both $(t, x) \mapsto F_t(x)$ and $(t, x) \mapsto F_t^{-1}(x)$ are locally Lipschitz on $[0, T) \times M$. Let further $v_t(x)$ be a family of velocity fields on $M$ such that its integral lines correspond to the trajectories $F_t$, and $\mu$ be a probability measure.
Then $\mu_t = (F_t)_\#\mu$ is the unique weak solution in $C([0, T), \mathcal{P}(M))$ of $\frac{d}{dt}\mu_t + \nabla\cdot(v_t\mu_t) = 0$ with initial condition $\mu_0 = \mu$. Here, $\mathcal{P}(M)$ is equipped with the weak topology.

It is possible to characterize the class of curves on $W(M)$ that admit a velocity in the manner of Definition 5 ([AG13]) in the following way.

Definition 7 (Absolutely continuous curve). Let $(E, d)$ be an arbitrary metric space and $I$ an interval in $\mathbb{R}$. A function $\gamma : I \to E$ is called absolutely continuous (a.c.) if there exists a function $f \in L^1(I)$ such that
\[ d(\gamma(t), \gamma(s)) \le \int_t^s f(r)\, dr \qquad \forall\, s, t \in I,\ t \le s. \tag{4} \]

Definition 8 (Metric derivative). The metric derivative $|\dot\gamma|(t)$ of a curve $\gamma : [0, 1] \to E$ at $t \in (0, 1)$ is given as the limit
\[ |\dot\gamma|(t) = \lim_{h \to 0} \frac{d(\gamma(t + h), \gamma(t))}{|h|}. \tag{5} \]
Every constant speed geodesic is absolutely continuous and $|\dot\gamma|(t) = d(\gamma(0), \gamma(1))$.

It is known that for absolutely continuous curves $\gamma$ the metric derivative exists for a.e. $t$. It is an element of $L^1(0, 1)$ and, up to sets of zero Lebesgue measure, the minimal function satisfying equation (4) for $\gamma$. In this sense absolutely continuous functions enable a generalization of the fundamental theorem of calculus to arbitrary metric spaces.

Theorem 9 (Differential characterization of a.c. curves). Let $\mu_t : [0, 1] \to W(M)$ be an a.c. curve. Then there exists a Borel family of vector fields $(v_t)_{t\in[0,1]}$ on $M$ such that the continuity equation (3) holds and
\[ \|v_t\|_{L^2(\mu_t)} \le |\dot\mu_t| \quad \text{for a.e. } t \in (0, 1). \]
Conversely, if a curve $\mu_t : [0, 1] \to W(M)$ is such that there exists a Borel family of vector fields $(v_t)_{t\in[0,1]}$ with $\|v_t\|_{L^2(\mu_t)} \in L^1(0, 1)$, together with which it satisfies (3), then there exists an a.c. curve $\tilde\mu_t$ being equal to $\mu_t$ for a.e. $t$ and satisfying
\[ |\dot{\tilde\mu}_t| \le \|v_t\|_{L^2(\tilde\mu_t)} \quad \text{for a.e. } t \in (0, 1). \]
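The following worked example is our own illustration of Definition 5, Definition 8 and Theorem 9 and does not appear in the original text. It assumes $M = \mathbb{R}^n$ with the Euclidean metric, a measure $\mu_0$ with finite second moment and a fixed vector $a \in \mathbb{R}^n$, and it considers the translated curve $\mu_t := (\mathrm{Id} + ta)_\#\mu_0$ with the constant velocity field $v_t \equiv a$.

```latex
% Worked example (assumptions: M = \mathbb{R}^n, Euclidean metric,
% \mu_0 with finite second moment, a \in \mathbb{R}^n fixed,
% \mu_t := (\mathrm{Id} + t a)_\# \mu_0, \; v_t \equiv a).
\begin{align*}
% (i) weak continuity equation: the integrand is a total t-derivative,
%     and \varphi has compact support in (0,1) \times \mathbb{R}^n
\int_0^1\!\!\int_{\mathbb{R}^n}
   \big(\partial_t\varphi(x,t) + \langle\nabla\varphi(x,t), a\rangle\big)\,d\mu_t(x)\,dt
 &= \int_{\mathbb{R}^n}\!\int_0^1 \frac{d}{dt}\big[\varphi(x+ta,\,t)\big]\,dt\,d\mu_0(x) = 0, \\
% (ii) upper bound from the plan (\mathrm{Id}+ta,\ \mathrm{Id}+sa)_\#\mu_0
W_2(\mu_t,\mu_s)^2
 &\le \int_{\mathbb{R}^n} \big|(x+ta)-(x+sa)\big|^2\,d\mu_0(x) = |t-s|^2\,|a|^2, \\
% (iii) lower bound by Jensen, valid for every \gamma \in \mathrm{Adm}(\mu_t,\mu_s)
\int |x-y|^2\,d\gamma(x,y)
 &\ge \Big|\int (x-y)\,d\gamma(x,y)\Big|^2 = |t-s|^2\,|a|^2 .
\end{align*}
% Hence W_2(\mu_t,\mu_s) = |t-s|\,|a| and
% |\dot\mu_t| = |a| = \|v_t\|_{L^2(\mu_t)} for every t.
```

In this example the norm bound of Theorem 9 holds with equality, so the constant field $a$ already has the minimal possible norm among all velocity fields accompanying this curve.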
2.3 The tangent space $T_\mu W(M)$

As seen in Theorem 9, every absolutely continuous curve in $W(M)$ admits an $L^1(dt)$-family of $L^2(\mu_t)$-vector fields $v_t$, i.e. $\|v_t\|_{L^2(\mu_t)} \in L^1(0, 1)$, together with which the continuity equation is satisfied. In the following, we will call every such pair $(\mu_t, v_t)$ an a.c. couple. We further want to call $v_t$ an accompanying vector field for $\mu_t$.
Vector fields $v_t$ satisfying the continuity equation with a given $\mu_t$ are, however, not unique: there are many vector fields which allow for the same motion of the density. Adding another family $w_t$ with the ($t$-independent) property $\nabla\cdot(w_t\mu_t) = 0$ to $v_t$ does not alter the equation. Theorem 9 provides a natural criterion to choose a unique element among the $v_t$'s. According to this theorem, there is at least one $L^1(dt)$-family $v_t$ such that $|\dot\mu_t| = \|v_t\|_{L^2(\mu_t)}$ for almost all $t$, i.e. one that is of minimal norm for almost all $t$. Linearity of the continuity equation (3) with respect to $v_t$ and the strict convexity of the $L^2$-norms ensure the uniqueness of this choice, up to sets of zero measure with respect to $t$. We want to call such a couple $(\mu_t, v_t)$, where $v_t$ is the unique minimal accompanying vector field for an a.c. curve $\mu_t$, a tangent couple.

It then seems reasonable to define the tangent space at a point $\mu$ as the set of $v \in L^2(TM, \mu)$ with $\|v\|_\mu \le \|v + w\|_\mu$ for all $w \in L^2(TM, \mu)$ such that $\nabla\cdot(w\mu) = 0$. This condition for $v \in L^2(TM, \mu)$, however, is equivalent to saying that $\int_M h(v, w)\, d\mu = 0$ for all $w \in L^2(TM, \mu)$ with $\nabla\cdot(w\mu) = 0$. This in turn is equivalent to the following, which we will take as the definition of the tangent space.

Definition 10 (Tangent space $T_\mu W(M)$). The tangent space $T_\mu W(M)$ at the point $\mu \in W(M)$ is defined as
\[ T_\mu W(M) := \overline{\{\nabla\varphi \mid \varphi \in C_c^\infty(M)\}}^{L^2(TM,\mu)} \subset L^2(TM, \mu). \tag{6} \]
We also give the definition of the normal space:
\[ T^\perp_\mu W(M) := \Big\{ w \in L^2(TM, \mu) \;\Big|\; \int h(w, v)\, d\mu = 0 \ \ \forall\, v \in T_\mu W(M) \Big\} = \{ w \in L^2(TM, \mu) \mid \nabla\cdot(w\mu) = 0 \}. \]

Remark 11. If $(\mu_t, v_t)$ is an a.c. couple, then $(\mu_t, v_t)$ is a tangent couple if and only if $v_t \in T_{\mu_t} W(M)$ for almost every $t \in (0, 1)$ ([Gig12], Proposition 1.30).

It is not difficult to see that $\dim T_\delta W(M) = \dim M$ for a Dirac measure $\delta$, whereas in most cases $\dim T_\mu W(M) = \infty$.
In general, it can be shown that as long as $\mu$ is supported on an at most countable set, $T_\mu W(M) = L^2(TM, \mu)$ (see [Gig12], Remark 1.33). Morally, the more points are contained in the support of the measure, the bigger the dimension gets. On the other hand, every probability measure can be approximated by a sequence of measures with finite support (see [Vil08], Thm 6.18), so that in each neighborhood of every measure there is an element $\mu$ with $\dim T_\mu W(M) < \infty$.

We call the disjoint union of all tangent spaces,
\[ TW(M) := \bigsqcup_{\mu \in W(M)} T_\mu W(M) = \bigcup_{\mu \in W(M)} \{ (\mu, v) \mid v \in T_\mu W(M) \}, \]
the tangent bundle of $W(M)$. Since we are not treating $W(M)$ as a traditional manifold with charts, $TW(M)$ cannot be equipped with a traditional tangent bundle topology. Also, due to the denseness of the probability measures with finite support, local triviality cannot be achieved. However, since there is a natural projection map $\pi : TW(M) \to W(M)$, $(\mu, v) \mapsto \mu$, we can in principle still talk about sections and bundle maps on the pointwise level. Whereas the notion of a vector field (in this context it would effectively be a field of equivalence classes of vector fields) has not turned out to be useful so far, we will use the concept of a bundle map later. In this spirit, a bundle map between tangent bundles of Wasserstein spaces $W(M)$ and $W(N)$ is a fiber preserving map $B : TW(M) \to TW(N)$ in the sense that together with a continuous map $F : W(M) \to W(N)$ the following diagram commutes:
\[ \begin{array}{ccc} TW(M) & \xrightarrow{\ B\ } & TW(N) \\ \downarrow{\scriptstyle \pi_M} & & \downarrow{\scriptstyle \pi_N} \\ W(M) & \xrightarrow{\ F\ } & W(N) \end{array} \]
One could ask about the meaningfulness of the condition that $F$ should be continuous, since for $B$ the concept of continuity does not make sense. It is just that we require the preservation of as much structure as possible.
In any case, we are mainly going to use this idea of a bundle map to make clear how we want to see our notion of a differential of a differentiable function $F : W(M) \to W(N)$.

On $W(M)$ one can furthermore define a (formal) Riemannian structure. Intuition comes from the following formula, which is due to J.-D. Benamou and Y. Brenier ([BB99]). It shows that the Wasserstein distance $W_2$, having been defined through the static optimal transport problem, can be recovered by a dynamic formula reminiscent of the length functional on Riemannian manifolds, which defines the Riemannian metric distance.
Theorem 12 (Benamou-Brenier formula). Let $\mu, \nu \in \mathcal{P}_2(M)$. Then
\[ W_2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \|v_t\|_{L^2(\mu_t)}\, dt, \tag{7} \]
where the infimum is taken among all a.c. couples $(\mu_t, v_t)$ such that $\mu_0 = \mu$ and $\mu_1 = \nu$.

This resemblance of formulas thus inspires the following definition.
Definition 13 (Formal Riemannian tensor on $W(M)$). The formal Riemannian metric tensor $H_\mu$ on $W(M)$ at the point $\mu \in W(M)$ is defined as
\[ H_\mu : T_\mu W(M) \times T_\mu W(M) \to \mathbb{R}, \qquad (v, w) \mapsto \int_M h_x(v, w)\, d\mu(x). \]
Indeed, since $\|v_t\|_{L^2(\mu_t)} = \sqrt{\int_M h(v_t, v_t)\, d\mu_t} = \sqrt{H_{\mu_t}(v_t, v_t)}$, we now have
\[ W_2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \sqrt{H_{\mu_t}(v_t, v_t)}\, dt. \]
The tuple $(T_\mu W(M), H_\mu)$ constitutes a Hilbert space.

Gigli [Gig08] emphasizes that Definition 10 does not allow for a traditional Riemannian structure on $W(M)$, since the natural exponential map $v \mapsto \exp_\mu(v) := (\mathrm{Id} + v)_\#\mu$ has injectivity radius $0$ for every $\mu$.

Since $W(M)$ and $W(N)$ are not manifolds in a traditional sense, to be able to talk about differentiability of maps $F : W(M) \to W(N)$ we cannot compose $F$ with charts and apply Euclidean calculus. Recall, therefore, that a map $f : M \to N$ is differentiable if and only if it maps differentiable curves to differentiable curves. Having only a notion of absolutely continuous curves, which are metrically differentiable almost everywhere and which are at the foundation of the construction of tangent spaces on Wasserstein spaces, we start with the following definition.
Definition 14 (Absolutely continuous map). A map $F : W(M) \to W(N)$ is called absolutely continuous, or a.c., if the curve $F(\mu_t) \subset W(N)$ is absolutely continuous up to redefining $t \mapsto \mu_t$ on a zero set, whenever $\mu_t \subset W(M)$ is absolutely continuous.

We want to build our notion of differentiable maps between Wasserstein spaces on this idea of absolutely continuous maps. Before we continue to do so, we first find some conditions under which maps are absolutely continuous. For this, we want to recall the notion of proper maps.

Definition 15 (Proper map). A continuous map $f : X \to Y$ between a Hausdorff space $X$ and a locally compact Hausdorff space $Y$ is called proper if for all compact subsets $K \subset Y$ the preimage $f^{-1}(K) \subset X$ is compact in $X$.

In the following we denote the operator norm of a linear map by $\|\cdot\|$.

Theorem 16.
Let $F : W(M) \to W(N)$ be given as $F(\mu) = f_\#\mu$, $f : M \to N$ being smooth and proper and such that $\sup_{x\in M}\|df_x\| < \infty$. Then $F$ is absolutely continuous and for every tangent couple $(\mu_t, v_t)$, the tuple $(F(\mu_t), dF_{\mu_t}(v_t))$ is an a.c. couple, where
\[ dF_{\mu_t}(v_t)_y := \int_{f^{-1}(y)} df_x(v_{t,x})\, d\mu_t^y(x) \tag{8} \]
for almost every $t$ and for $y \in f(M)$. Here, $df_x : T_x M \to T_{f(x)} N$ denotes the differential of $f$ at the point $x$, $v_{t,x}$ means the vector field $v_t$ at the point $x \in M$, and the probability measures $\mu_t^y(x)$ are defined through the disintegration theorem, $d\mu_t(x) = d\mu_t^y(x)\, d f_\#\mu_t(y)$ (see Appendix A). For all $y \notin f(M)$, we set $dF_{\mu_t}(v_t)_y = 0$. Note that what appears as lower index $y$ in Appendix A now appears as upper index $y$, since here we are additionally dealing with the $t$-dependence of $\mu_t$.
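To see what the fiber average in formula (8) does in the simplest non-injective situation, the following sketch evaluates it for a two-point measure. The example is ours and not from the original text. It assumes $M = N = S^1 = \mathbb{R}/2\pi\mathbb{Z}$ with angular coordinate $\theta$, the double cover $f(\theta) = 2\theta \bmod 2\pi$ (smooth, proper since $S^1$ is compact, with $\|df_\theta\| = 2$), a weight $a \in (0, 1)$ and a base angle $\theta_0$. Vectors are written as real numbers with respect to $\partial_\theta$.

```latex
% Worked example (assumptions: M = N = S^1, f(\theta) = 2\theta \bmod 2\pi,
% \mu = a\,\delta_{\theta_0} + (1-a)\,\delta_{\theta_0+\pi}, \; 0 < a < 1,
% v \in L^2(\mu) given by v_1 := v(\theta_0), \; v_2 := v(\theta_0+\pi)).
\begin{align*}
f_\#\mu &= \delta_{y}, \qquad y = 2\theta_0, \qquad
  f^{-1}(y) = \{\theta_0,\ \theta_0+\pi\}, \qquad \mu^{y} = \mu, \\
dF_\mu(v)_y &= \int_{f^{-1}(y)} df_x(v_x)\,d\mu^{y}(x)
            = 2\big(a\,v_1 + (1-a)\,v_2\big), \\
\|dF_\mu(v)\|_{L^2(f_\#\mu)}^2 &= 4\big(a\,v_1 + (1-a)\,v_2\big)^2
   \le 4\big(a\,v_1^2 + (1-a)\,v_2^2\big) = 4\,\|v\|_{L^2(\mu)}^2 .
\end{align*}
% The inequality is Jensen's inequality for the convex function s \mapsto s^2.
```

The image vector at $y$ is thus the $\mu$-weighted mean of the two pushed-forward vectors, and its norm is controlled by $\sup_\theta \|df_\theta\| = 2$ times the norm of $v$.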
Although $df_x : T_x M \to T_{f(x)} N$ is well defined for every $x$ as a mapping between tangent spaces, it is not well defined as a mapping between vector fields as long as $f$ is not injective. We thus take the mean value over all the vectors $df_x(v_{t,x})$ as the image vector $dF_{\mu_t}(v_t)_y$ of the vector field $v_t$ at the point $y$, where $x$ ranges over the elements of the fiber $f^{-1}(y)$. In case $f$ is injective, $dF_\mu(v)$ reduces to $df(v)$ for every $\mu$, which then can be regarded as a full-fledged vector field.

Our naming of the vector field $dF_{\mu_t}(v_t)$ along $F(\mu_t)$ is, of course, very suggestive. Indeed, since the map $(v, \mu) \mapsto dF_\mu(v)$ is linear in $v$, Theorem 16 supports a natural definition for a notion of differentiability for absolutely continuous maps $F$. However, before we give such a definition, we need to make some further preparatory observations. Let us first continue with proving Theorem 16.

Proof.
Let $\mu_t$ be an a.c. curve. Using Theorem 9, we want to prove that there exists a family of vector fields $(\tilde v_t)_{t\in[0,1]}$ with $\int_0^1 \|\tilde v_t\|_{L^2(F(\mu_t))}\, dt < \infty$ such that $(F(\mu_t), \tilde v_t)$ is an a.c. couple.

Let $(v_t)_{t\in[0,1]}$ be the tangent vector field of $\mu_t$. For each $t$ for which $v_t \in T_{\mu_t} W(M)$ (i.e. almost everywhere) we define $dF_{\mu_t}(v_t)$ as in equation (8). We will prove that $dF_{\mu_t}(v_t)$ is an example of the vector fields $\tilde v_t$ we are looking for.

Let us first see that $\int_0^1 \|dF_{\mu_t}(v_t)\|_{L^2(F(\mu_t))}\, dt < \infty$. Using the triangle inequality for Bochner integrals, Jensen's inequality, the disintegration theorem and Hölder's inequality (in this order), we have:
\begin{align*}
\int_0^1 \|dF_{\mu_t}(v_t)\|_{L^2(F(\mu_t))}\, dt
&= \int_0^1 \sqrt{\int_N \|dF_{\mu_t}(v_t)_y\|^2_{T_yN}\, dF(\mu_t)(y)}\; dt \\
&= \int_0^1 \sqrt{\int_N \Big\| \int_{f^{-1}(y)} df_x(v_{t,x})\, d\mu_t^y(x) \Big\|^2_{T_yN}\, d f_\#\mu_t(y)}\; dt \\
&\le \int_0^1 \sqrt{\int_N \Big( \int_{f^{-1}(y)} \|df_x(v_{t,x})\|_{T_yN}\, d\mu_t^y(x) \Big)^2 d f_\#\mu_t(y)}\; dt \\
&\le \int_0^1 \sqrt{\int_N \int_{f^{-1}(y)} \|df_x(v_{t,x})\|^2_{T_yN}\, d\mu_t^y(x)\, d f_\#\mu_t(y)}\; dt \\
&= \int_0^1 \sqrt{\int_M \|df_x(v_{t,x})\|^2_{T_{f(x)}N}\, d\mu_t(x)}\; dt \\
&\le \int_0^1 \sqrt{\int_M \|df_x\|^2 \cdot \|v_{t,x}\|^2_{T_xM}\, d\mu_t(x)}\; dt \\
&\le \int_0^1 \sqrt{\int_M \|v_{t,x}\|^2_{T_xM}\, d\mu_t \cdot \operatorname*{ess\,sup}^{\mu_t}_{x\in M} \|df_x\|^2}\; dt \\
&= \int_0^1 \sqrt{\|v_t\|^2_{L^2(\mu_t)} \cdot \operatorname*{ess\,sup}^{\mu_t}_{x\in M} \|df_x\|^2}\; dt \\
&\le C \int_0^1 \|v_t\|_{L^2(\mu_t)}\, dt < \infty.
\end{align*}
By $\operatorname{ess\,sup}^{\mu_t}_{x\in M}$ we mean the essential supremum with respect to the measure $\mu_t$, and $C := \sup_{x\in M}\|df_x\|$, which bounds $\operatorname{ess\,sup}^{\mu_t}_{x\in M}\|df_x\|$ for every $t$. The last expression is finite, since we know that $\|v_t\|_{L^2(\mu_t)} \le |\dot\mu_t|$ for almost every $t$ and that the metric derivative of an a.c. curve is integrable. (The calculation above shows in particular that $dF_{\mu_t}(v_t) \in L^2(F(\mu_t))$ for almost every $t$, as we will point out again below.)
The disintegration theorem now allows the following calculation, with $g$ being the Riemannian tensor on $N$ and $h$ the one on $M$, $\varphi \in C_c^\infty(N \times (0, 1))$ and $\nabla$ the gradient with respect to the first coordinate:
\begin{align*}
\int_N g_y\big(\nabla\varphi(y, t), dF_{\mu_t}(v_t)_y\big)\, d f_\#\mu_t(y)
&= \int_N g_y\Big(\nabla\varphi(y, t), \int_{f^{-1}(y)} df_x(v_{t,x})\, d\mu_t^y(x)\Big)\, d f_\#\mu_t(y) \\
&= \int_N \int_{f^{-1}(y)} g_y\big(\nabla\varphi(y, t), df_x(v_{t,x})\big)\, d\mu_t^y(x)\, d f_\#\mu_t(y) \\
&= \int_N \int_{f^{-1}(y)} g_{f(x)}\big(\nabla\varphi(f(x), t), df_x(v_{t,x})\big)\, d\mu_t^y(x)\, d f_\#\mu_t(y) \\
&= \int_M g_{f(x)}\big(\nabla\varphi(f(x), t), df_x(v_{t,x})\big)\, d\mu_t(x) \\
&= \int_M h_x\big(\nabla(\varphi \circ f)(x, t), v_{t,x}\big)\, d\mu_t(x).
\end{align*}
By $(\varphi \circ f)(x, t)$ we mean $(\varphi \circ (f \times \mathrm{id}))(x, t)$. For the second equality we used the continuity of the Riemannian tensor at every point $y \in N$. The last step is true because for every vector $X \in T_x M$,
\[ h_x\big(\nabla(\varphi \circ f)(x), X\big) = X(\varphi \circ f)(x) = df(X)(\varphi)(f(x)) = g_{f(x)}\big(\nabla\varphi(f(x)), df_x(X)\big). \]
With this, we can now prove our claim that $\frac{d}{dt} F(\mu_t) + \nabla\cdot\big(dF_{\mu_t}(v_t) F(\mu_t)\big) = 0$ in the weak sense. For every $\varphi \in C_c^\infty(N \times (0, 1))$ we have
\begin{align*}
&\int_0^1 \int_N \Big( \frac{\partial}{\partial t}\varphi \Big)(y, t) + g_y\big(\nabla\varphi(y, t), dF_{\mu_t}(v_t)_y\big)\, d f_\#\mu_t(y)\, dt \\
&\quad= \int_0^1 \int_M \Big( \frac{\partial}{\partial t}\varphi \Big)(f(x), t) + h_x\big(\nabla(\varphi \circ f)(x, t), v_{t,x}\big)\, d\mu_t(x)\, dt \\
&\quad= \int_0^1 \int_M \Big( \frac{\partial}{\partial t}(\varphi \circ f) \Big)(x, t) + h_x\big(\nabla(\varphi \circ f)(x, t), v_{t,x}\big)\, d\mu_t(x)\, dt = 0.
\end{align*}
Since $f$ is smooth and proper, $\varphi \circ f \in C_c^\infty(M \times (0, 1))$ and we can apply our assumption that $(\mu_t, v_t)$ is an a.c. couple.

3.2 ... $dF_\mu$

For Theorem 16 we did not need to test whether $dF_\mu(v) \in T_{F(\mu)} W(N)$ for all $v \in T_\mu W(M)$, since we only needed $(F(\mu_t), dF_{\mu_t}(v_t))$ to be an a.c. couple. But is this still true, given that $(\mu_t, v_t)$ is a tangent couple?

To begin with, the proof of Theorem 16 also guarantees that for every $\mu \in W(M)$ and $v \in T_\mu W(M)$, $dF_\mu(v) \in L^2(F(\mu))$.
Knowing this, we can consider formula (8) as the prescription for a map between $T_\mu W(M)$ and $L^2(F(\mu))$. It is also useful to know that this map is always bounded, which we will see in the next proposition. For the rest of this section, let $F : W(M) \to W(N)$ be as in Theorem 16 and $dF_\mu(v)$ as in formula (8).

Proposition 17 (Boundedness of $dF$). For each $\mu \in W(M)$, $dF_\mu : T_\mu W(M) \to L^2(F(\mu))$ is bounded with
\[ \|dF_\mu\| \le \operatorname*{ess\,sup}^{\mu}_{x\in M} \|df_x\|. \tag{9} \]
Here, $\|\cdot\|$ denotes the operator norm of the respective linear map and $\operatorname{ess\,sup}^{\mu}_{x\in M}$ the essential supremum with respect to $\mu$.

Inequality (9) can be obtained by taking similar steps as in the proof of Theorem 16. The right-hand side of inequality (9) is finite since we demanded $\sup_{x\in M}\|df_x\|$ to be finite.

Let us give an example of a function $F$ for which equality is attained in inequality (9) for every $\mu$.
Example 18.
Let $g : M \to M$ be a Riemannian isometry, i.e. $g^*h = h$, where $h$ is the Riemannian metric tensor on $M$. Then, for $F = g_\#$ and for all $\mu \in W(M)$, $\|dF_\mu\| = \operatorname{ess\,sup}^{\mu}_{x\in M}\|dg_x\| = 1$. This is because, on the one hand, $\|dg_x\| = 1$ for all $x \in M$, since $dg$ is an isometry between the tangent spaces $T_x M$ and $T_{g(x)} M$. On the other hand,
\[ \|dg\| = \sup_{\|v\|_{T_\mu W(M)} = 1} \|dg(v)\|_{T_{g_\#\mu} W(M)} = \sup_{\|v\|_{T_\mu W(M)} = 1} \|v\|_{T_\mu W(M)} = 1. \]

To come back to our question whether $dF_\mu(v)$ is always an element of $T_{F(\mu)} W(N)$, we first want to study the following simple cases.

Lemma 19.
Let $\mu = \delta_x$, for $x \in M$. Then $dF_\mu(v) \in T_{F(\mu)} W(N)$ for all $v \in T_\mu W(M)$.

Proof. This is true because $F(\delta_x) = \delta_{f(x)}$ and for every $y \in N$, $L^2(\delta_y) \cong \mathbb{R}^n \cong T_{\delta_y} W(N)$, $n = \dim N$.

Lemma 20.
Let $g : M \to M$ be a Riemannian isometry, i.e. $g^*h = h$, and $v = \nabla\varphi \in T_\mu W(M)$, $\varphi \in C_c^\infty(M)$. Then for $F = g_\#$ and every $\mu \in W(M)$, $dF_\mu(v) = dg(v) = \nabla(\varphi \circ g^{-1}) \in T_{F(\mu)} W(M)$.

Proof. For the Riemannian metric $h$ on $M$ and for every vector field $X$,
\[ h\big(\nabla(\varphi \circ g^{-1}), X\big) = d(\varphi \circ g^{-1})(X) = d\varphi\big(dg^{-1}(X)\big) = h\big(\nabla\varphi, dg^{-1}(X)\big) = h\big(dg(\nabla\varphi), X\big). \]

Since we know from Proposition 17 that $dF_\mu$ is bounded and therefore continuous for every $\mu \in W(M)$, we can infer the following more general statement.

Corollary 21.
Let $g : M \to M$ be a Riemannian isometry and $T_\mu W(M) \ni v = \lim_{n\to\infty} \nabla\varphi_n$. Then $dg(v) = \lim_{n\to\infty} \nabla(\varphi_n \circ g^{-1}) \in T_{F(\mu)} W(M)$.

However, the case in Lemma 19 is extreme and the choice of functions in Lemma 20 specific. We will now see that $dF_\mu$ does not always map into the tangent space at $F(\mu)$.

Theorem 22.
Let $M$ be a compact manifold without boundary and $f = \mathrm{id}_M : (M, h_1) \to (M, h_2)$ the identity map on $M$, where $h_2 = \nu^2 h_1$ and $\nu : M \to (0, \infty)$ is nonconstant. Then for $F = \mathrm{id}_\# : W(M, h_1) \to W(M, h_2)$ there exists a $\nabla\varphi \in T_\mu W(M, h_1)$ so that $dF_\mu(\nabla\varphi) \notin T_\mu W(M, h_2)$, where $\mu = C \cdot \mu_{h_1}$, $\mu_{h_1}$ the volume measure on $M$ with respect to $h_1$ and $C = 1/\mu_{h_1}(M)$.

Proof. It is clear that $F = \mathrm{id}_{W(M)}$ and $dF_\mu(v) = v$ for all $v \in T_\mu W(M, h_1)$. However, $v$ is not automatically a member of $T_\mu W(M, h_2)$. We will show that if $\varphi$ is chosen appropriately, $v = \nabla_{h_1}\varphi$ is not a limit of gradients with respect to $h_2$. For this, recall that on a general Riemannian manifold $(M, h)$ there is a duality between vector fields $v$ and 1-forms $v^{\flat_h}$ by the formula $v^{\flat_h}(\cdot) := h(v, \cdot)$, which maps the vector field $\nabla_h\varphi$ to the 1-form $d\varphi$. This identification gives an isomorphism between $\overline{\{\nabla_h\varphi\}}^{L^2(TM,h,\mu)}$ and $\overline{\{d\varphi\}}^{L^2(T^*M,h^*,\mu)}$. Since this isomorphism depends on the chosen metric, in general $v^{\flat_{h_1}} \neq v^{\flat_{h_2}}$, but rather $v^{\flat_{h_2}} = \nu^2 v^{\flat_{h_1}}$, as Lemma 24 below shows. And thus $(\nabla_{h_1}\varphi)^{\flat_{h_2}} = \nu^2 d\varphi$. Now $d(\nu^2 d\varphi) = d(\nu^2) \wedge d\varphi$, which one can easily arrange to be non-zero. From Lemma 23 below we can thus infer that $(\nabla_{h_1}\varphi)^{\flat_{h_2}} \notin \overline{\{d\varphi\}}^{L^2(T^*M,h_2^*,\mu_{h_2})}$. As $C\mu_{h_2} = C\nu^n\mu_{h_1}$, with $n = \dim(M)$, the topologies on $L^2(T^*M, h_2^*, C\mu_{h_1})$ and $L^2(T^*M, h_2^*, \mu_{h_2})$ coincide, so one can conclude that $\nu^2 d\varphi$ is not an element of $T_\mu W(M, h_2)$.

Lemma 23. If $\omega$ is a smooth 1-form on $M$ with $d\omega \neq 0$, then $\omega \notin \overline{\{d\varphi\}}^{L^2(T^*M,h^*,\mu_h)}$, where $\mu_h$ is the volume measure on $M$ with respect to $h$.

Proof. Assuming the opposite and using the standard inner products, one gets the following contradiction:
\[ 0 \neq (d\omega, d\omega) = (\omega, d^*d\omega) = \lim (d\varphi_n, d^*d\omega) = \lim (dd\varphi_n, d\omega) = \lim 0 = 0. \tag{10} \]

Lemma 24.
In the situation of Theorem 22 and interpreting $dF$ as a map of $L^2$-one-forms, we have $dF_\mu(\omega) = \nu^2\omega$.
Proof.
Every vector field $v$ on $M$ corresponds to the covector field $\omega$ given by $\omega(w) = h_1(v, w)$. A change of the Riemannian metric from $h_1$ to $h_2 = \nu^2 h_1$ yields $h_2(v, w) = \nu^2 h_1(v, w) = h_1(v, \nu^2 w) = \omega(\nu^2 w) = \nu^2\omega(w)$, so with respect to $h_2$, $v$ corresponds to $\nu^2\omega$.

3.3 Differentiable maps between Wasserstein spaces

As we have seen in Subsection 3.2, the conditions of Theorem 16 do not guarantee $dF_\mu(v) \in T_{F(\mu)} W(N)$, even though this property is necessary for a meaningful definition of the differential of $F$. To help us here, we use the fact that $L^2(\nu) = T_\nu W(N) \oplus T^\perp_\nu W(N)$ for every $\nu \in W(N)$ and compose $dF$ with a projection onto $T_{F(\mu)} W(N)$, so that at least $P_{F(\mu)} \circ dF_\mu : T_\mu W(M) \to T_{F(\mu)} W(N)$ is a linear and bounded map between $T_\mu W(M)$ and $T_{F(\mu)} W(N)$.

Definition 25.
We call $P_\mu$ the orthogonal linear projection
\[ P_\mu : L^2(\mu) \longrightarrow T_\mu W_2(M), \qquad v \longmapsto v^\top, \]
where $v = v^\top + v^\perp$, with $v^\top \in T_\mu W_2(M)$ and $v^\perp \in T^\perp_\mu W_2(M)$.

Proposition 26.
For every a.c. couple $(\mu_t, v_t)$, $(\mu_t, P_{\mu_t}(v_t))$ is a tangent couple.

Proof. Let $(\mu_t, v_t)$ be an a.c. couple. Since $v_t^\perp$ is orthogonal to all gradients in $L^2(\mu_t)$, we have $\nabla \cdot (v_t^\perp \mu_t) = 0$ in the distributional sense, so for $v_t = v_t^\top + v_t^\perp$ we get
\[ \frac{d}{dt}\mu_t + \nabla \cdot (v_t^\top \mu_t) = \frac{d}{dt}\mu_t + \nabla \cdot \big((v_t^\top + v_t^\perp)\mu_t\big) = 0. \]
And since $\|P_{\mu_t}(v_t)\|_{L^2(\mu_t)} \le \|v_t\|_{L^2(\mu_t)}$, we also have $\|P_{\mu_t}(v_t)\|_{L^2(\mu_t)} \in L^1(0,1)$. Thus, $(\mu_t, P_{\mu_t}(v_t))$ is an a.c. couple and, with Remark 11, a tangent couple.

With the observations we have collected so far, we can finally give our definition of a differentiable map between Wasserstein spaces.

Definition 27 (Differentiable map between Wasserstein spaces). An absolutely continuous map $F : W_2(M) \to W_2(N)$ is called differentiable in case for every $\mu \in W_2(M)$ there exists a bounded linear map $dF_\mu : T_\mu W_2(M) \to T_{F(\mu)} W_2(N)$ such that for every tangent couple $(\mu_t, v_t)$ the image curve $dF_{\mu_t}(v_t)$ is a tangent vector field of $F(\mu_t)$. In this way a bundle map (in our sense of the word “bundle”) $dF : TW_2(M) \to TW_2(N)$ is defined, which we call the differential of $F$.

When we say a map $F : W_2(M) \to W_2(N)$ is differentiable we automatically mean that it is absolutely continuous in the first place.

Remark 28. The reader might be surprised that we only give a global definition of differentiability, without having started with a pointwise definition. The latter is difficult, if at all possible, since the tangent vector fields $v_t$ are only defined for a.e. $t \in [0,1]$, so a pointwise evaluation of these is not well-defined. The situation would change if one were able to speak about continuous curves of tangent vector fields, but it does not seem easy to make this notion precise: for differing $t, t'$ the vector fields $v_t$ and $v_{t'}$ are elements of different tangent spaces, potentially even of different dimension, which is why the usual notion of continuity cannot be trivially applied.

Note again that $dF_{\mu_t}(v_t)$ is only well-defined almost everywhere, since $v_t$ is. This is not harmful to our definition, since in particular the tangent vectors of $F(\mu_t)$ are also only well-defined almost everywhere. In the same manner, however, Definition 27 does not guarantee uniqueness of $dF$ in a strict sense. (Here we mean that $dF = \widetilde{dF}$ whenever $dF_\mu(v) = \widetilde{dF}_\mu(v)$ for all $(\mu, v) \in TW_2(M)$.) But, after all, one can say that $dF$ is unique up to a “negligible” set.

Definition 29 (Negligible set). A subset $Z \subset TW_2(M)$ is called negligible whenever for every tangent couple $(\mu_t, v_t)$ the set $\{t \in (0,1) \mid (\mu_t, v_t) \in Z\}$ is of Lebesgue measure zero.

This definition respects the $L^1(dt)$-nature of the $v_t$'s in the sense that changing any $v_t$ on a set of measure zero does not change the measure of the set $\{t \in (0,1) \mid (\mu_t, v_t) \in Z\}$.

Proposition 30 (Uniqueness of the differential). The differential $dF$ of a differentiable map $F : W_2(M) \to W_2(N)$ is unique up to a redefinition on a negligible set $Z \subset TW_2(M)$.
Proof.
Let $dF$ and $\widetilde{dF}$ be two pointwise linear bundle maps, $dF$ being the differential of an a.c. map $F$. It is to show that $dF$ and $\widetilde{dF}$ are both a differential of $F$ if and only if $\{(\mu, v) \in TW_2(M) \mid dF_\mu(v) \neq \widetilde{dF}_\mu(v)\}$ is negligible. Let $dF$ and $\widetilde{dF}$ be different only on a negligible set. In this case, for each tangent couple $(\mu_t, v_t)$ the image velocities $\widetilde{dF}_{\mu_t}(v_t)$ differ from $dF_{\mu_t}(v_t)$ only on a null set and thus still equal the tangent vector fields along $F(\mu_t)$ almost everywhere. Let on the other hand $dF$ and $\widetilde{dF}$ both fulfill the conditions of Definition 27. By definition, for each tangent couple $(\mu_t, v_t)$ both $dF_{\mu_t}(v_t)$ and $\widetilde{dF}_{\mu_t}(v_t)$ are equal almost everywhere to the tangent vectors along $F(\mu_t)$. Thus, for every tangent couple $(\mu_t, v_t)$, the set $\{t \in (0,1) \mid dF_{\mu_t}(v_t) \neq \widetilde{dF}_{\mu_t}(v_t)\}$ has Lebesgue measure zero.

Let us now analyse some properties of negligible sets.

Proposition 31.
1.) $T_\mu(W_2(M)) \setminus \{0\}$ is negligible, for every $\mu \in W_2(M)$. But $T_\mu(W_2(M))$ is not.
2.) The countable union of negligible sets is negligible.
3.) Every subset of a negligible set is negligible.
4.) The following is an equivalence relation on the set of mappings between tangent bundles on Wasserstein spaces: $F \sim G :\Leftrightarrow \{(\mu, v) \in TW_2(M) \mid F(\mu, v) \neq G(\mu, v)\}$ is negligible.

Remark 32. Let $dF$ be a differential of a map $F : W_2(M) \to W_2(N)$. Then there are members of its equivalence class $[dF]$ which are not a differential of $F$, since not every member has to be pointwise linear and bounded. Restricting the equivalence relation to the subset of pointwise linear and bounded maps between tangent bundles of Wasserstein spaces, however, solves this issue. In this case $[dF]$ contains precisely all the possible differentials of $F$. Whenever we refer to a representative of $dF$, we mean an element of the latter equivalence class.

Proof.
1.) Let $(\mu_t, v_t)$ be a tangent couple, $v_t$ a fixed representative of $v_t \in L^1(dt)$ and $T_\mu := \{t \in (0,1) \mid \mu_t = \mu,\ v_t \in T_\mu W_2(M)\}$ for some $\mu \in W_2(M)$. Let us further assume that $v_t \neq 0$ for every $t \in T_\mu$, which in particular means that $|\dot\mu_t| \neq 0$ for every $t \in T_\mu$. From this we can also infer that for no $t \in T_\mu$ there exists a neighborhood on which $\mu_t$ is constant. Let $a \in T_\mu$ be a point which is not isolated, i.e. every neighborhood of $a$ contains another point of $T_\mu$. Then there is a sequence $t_n \to a$ with $\mu_{t_n} = \mu = \mu_a$, so the metric derivative at $a$ either does not exist or vanishes, both of which we excluded. So $T_\mu$ must consist of isolated points only and thus must be countable. Choosing another representative of $v_t$ changes $T_\mu$ only by a null set. $T_\mu(W_2(M))$ is not negligible since the constant curve $\mu_t = \mu$ is absolutely continuous with metric derivative $0$.
2.) This follows from the fact that any countable union of sets of measure zero is again of measure zero.
3.) Let $N$ be a subset of a negligible set and $(\mu_t, v_t)$ an a.c. curve with a fixed representative $v_t$. The set of times where $(\mu_t, v_t) \in N$ can only be a subset of a set of zero measure. Since the Lebesgue measure is a complete measure, this subset is itself measurable and in particular of measure zero.
4.) This follows from 2.) and 3.), since $\{F \neq H\} \subset \{F \neq G\} \cup \{G \neq H\}$.

The following corollary finally recovers the properties expected of a differential.

Corollary 33.
1.) If $F = f_\#$ and $f$ is as in Theorem 16, $F$ is differentiable with $dF_\mu = P_{F(\mu)} \circ \widehat{dF}_\mu$, where $P_{F(\mu)}$ is the orthogonal projection onto $T_{F(\mu)} W_2(N)$ from Proposition 26 and
\[ \widehat{dF}_\mu(v)_y := \int_{f^{-1}(y)} df(v_x)\, d\mu_y(x), \]
as in formula (8). In case $f$ is a Riemannian isometry, the additional projection $P$ is not necessary, as we have seen in Corollary 21. Then, $dF_\mu = df$ for all $\mu \in W_2(M)$.
2.) In particular, the identity mapping $F(\mu) = \mu$ is differentiable with $dF_\mu(v) = v$ up to a negligible set.
3.) Let $F : W_2(M) \to W_2(N)$ and $G : W_2(N) \to W_2(O)$ be two differentiable maps. Then also $G \circ F : W_2(M) \to W_2(O)$ is differentiable with $d(G \circ F)_\mu(v) = \big(dG_{F(\mu)} \circ dF_\mu\big)(v)$ up to a negligible set.
4.) If $F$ is differentiable and bijective with differentiable inverse $F^{-1}$, then $dF$ is also invertible with inverse $d(F^{-1})$, up to a negligible set.

Proof.
3.) It is to show that $dG_{F(\mu)} \circ dF_\mu : T_\mu W_2(M) \to T_{(G \circ F)(\mu)} W_2(O)$ is such that for every tangent couple $(\mu_t, v_t)$, also $\big((G \circ F)(\mu_t), (dG_{F(\mu_t)} \circ dF_{\mu_t})(v_t)\big)$ is a tangent couple. So let $(\mu_t, v_t)$ be a tangent couple. Since $F$ is differentiable, we know that $(F(\mu_t), dF_{\mu_t}(v_t))$ is a tangent couple. Similarly, since $G$ is differentiable, $\big(G(F(\mu_t)), dG_{F(\mu_t)}(dF_{\mu_t}(v_t))\big)$ is a tangent couple. Since $G(F(\mu_t)) = (G \circ F)(\mu_t)$ and $dG_{F(\mu_t)}(dF_{\mu_t}(v_t)) = (dG_{F(\mu_t)} \circ dF_{\mu_t})(v_t)$, we have proven the claim.
4.) This is an immediate consequence of 2.) and 3.).

Remark 34. Let us again emphasize that this type of differentiability is highly tailored to the structure given by optimal transport. It knowingly does not fit into the framework of, e.g., [KM97]. Nevertheless, let us mention that also in this reference the notion of differentiable maps between infinite dimensional manifolds is established via the property that differentiable curves should be mapped to differentiable curves.
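For maps of pushforward type, the composition rule in Corollary 33, 3.) rests on the elementary identity $(g \circ f)_\# = g_\# \circ f_\#$. The following is a minimal sketch of that identity on finitely supported measures; the helper name `pushforward`, the dictionary representation of a measure and the sample data are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(f, mu):
    """Return f_# mu for a finitely supported measure mu given as {point: mass}.

    Hypothetical helper for illustration: atoms with the same image are merged.
    """
    image = defaultdict(Fraction)  # Fraction() == 0, so masses accumulate exactly
    for x, mass in mu.items():
        image[f(x)] += mass
    return dict(image)


# Illustrative data: the uniform measure on four points of the real line.
mu = {x: Fraction(1, 4) for x in (-1, 0, 1, 2)}


def f(x):
    return x * x  # not injective: the atoms at -1 and 1 are merged


def g(y):
    return y + 1


# (g o f)_# mu versus g_# (f_# mu)
lhs = pushforward(lambda x: g(f(x)), mu)
rhs = pushforward(g, pushforward(f, mu))
print(lhs == rhs)
```

The sketch only covers the level of measures; the corresponding statement for differentials additionally involves the projections onto the tangent spaces and holds up to a negligible set, as stated in the corollary.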
3.4 Pullbacks and formal Riemannian isometries

As an application of the previous section, we propose a definition for the pullback of the formal Riemannian tensor on $W_2(M)$ and furthermore a definition for formal Riemannian isometries. As the formal Riemannian metric was defined by comparison of formulae to actual Riemannian structures (see Definition 13), performing pullbacks now gives rise to definitions of further possible formal Riemannian metrics on $W_2$-spaces, in cases where $dF_\mu$ is injective for every $\mu$, i.e. in case $F$ can be considered to be an immersion.

Definition 35 (Pullback of the formal Riemannian tensor). Let $F : W_2(N) \to W_2(M)$ be differentiable, $dF$ be a fixed differential of $F$, $\mu \in W_2(N)$ and $H_{F(\mu)}$ the formal Riemannian metric tensor on $W_2(M)$ at the point $F(\mu) \in W_2(M)$. Then, for $v, w \in T_\mu W_2(N)$, the pullback $(F^*H)_\mu$ of $H_{F(\mu)}$ is defined as
\[ (F^*H)_\mu(v, w) := H_{F(\mu)}\big(dF_\mu(v), dF_\mu(w)\big). \]
Unfortunately, this definition depends on the choice of the differential of $F$, which is, as we have seen, only unique up to a negligible set.

Definition 36 (Formal Riemannian isometry). Analogously to the finite dimensional case, we call a bijective differentiable map $F : W_2(M) \to W_2(M)$ with differentiable inverse a formal Riemannian isometry, in case there is a representative of $dF$ such that for all $\mu \in W_2(M)$
\[ (F^*H)_\mu(v, w) = H_\mu(v, w) \]
for all $(v, w) \in T_\mu W_2(M) \times T_\mu W_2(M)$.

It is straightforward to see that $F$ is a formal Riemannian isometry iff there is a representative of $dF$ such that for every $\mu \in W_2(M)$ the map $dF_\mu : T_\mu W_2(M) \to T_{F(\mu)} W_2(M)$ is a metric isometry with respect to the metrics induced by the $L^2$-norms.

Important formal Riemannian isometries are generated by the isometry group of the underlying metric space. By means of the pushforward, $\mathrm{ISO}(M)$ acts isometrically also on $P_p(M)$, and the map
\[ G \times TW_2(M) \to TW_2(M), \qquad (g, (\mu, v)) \mapsto (g_\#\mu, dg(v)) \]
defines an induced action of every subgroup $G$ of $\mathrm{ISO}(M)$ on the tangent bundle of $W_2(M)$, where we regard $dg$ as a differential of $g_\#$. It is quick to check that for $g \in \mathrm{ISO}(M)$, $g_\# : W_2(M) \to W_2(M)$ is a formal Riemannian isometry.

Lemma 37.
Let $g \in \mathrm{ISO}(M)$, then $T_{g_\#\mu} W_2(M) = dg(T_\mu W_2(M))$ for all $\mu \in W_2(M)$. Here, we again regard $dg$ as a fixed differential of $g_\#$.

Proposition 38.
Every formal Riemannian isometry is an isometry in the metric sense of its Wasserstein space.
Proof.
Let $F$ be a formal Riemannian isometry. Since by definition $F$ is bijective with differentiable inverse, every a.c. couple $(\mu_t, v_t)$ can be represented as the image of another a.c. couple $(\tilde\mu_t, \tilde v_t)$. Just choose $\tilde\mu_t := F^{-1}(\mu_t)$ and $\tilde v_t := dF^{-1}(v_t)$. Then, $\mu_t = F(\tilde\mu_t)$ and, using Corollary 33, $v_t = dF(\tilde v_t)$ almost everywhere. Conversely, every image of an a.c. couple, in the above sense, is an a.c. couple. Let $dF$ be a suitable representative. For $\mu, \nu \in W_2(M)$ and $\mu_t$ an a.c. curve connecting them, we then have according to 12:
\[ W_2(F(\mu), F(\nu)) = \inf_{(F(\mu_t),\, dF(v_t))} \int_0^1 \sqrt{H_{F(\mu_t)}\big(dF(v_t), dF(v_t)\big)}\, dt = \inf_{(\mu_t,\, v_t)} \int_0^1 \sqrt{H_{\mu_t}(v_t, v_t)}\, dt = W_2(\mu, \nu). \]

It would be interesting to find out whether the converse implication of Proposition 38 is true as well, as is the case for finite dimensional Riemannian manifolds.
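In the simplest setting $M = \mathbb{R}$, the metric statement of Proposition 38 for $F = g_\#$ with $g \in \mathrm{ISO}(M)$ can be checked numerically. The sketch below assumes uniform empirical measures with equally many atoms, for which the sorted (monotone) matching is an optimal transport plan on the real line; the function name `w2_empirical`, the isometry `g` and the sample atoms are illustrative choices, not part of the paper.

```python
import math


def w2_empirical(xs, ys):
    """W_2 between uniform empirical measures on the real line.

    Assumes both measures have the same number of atoms, each of equal mass;
    in one dimension the sorted matching is then optimal.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures need the same number of atoms")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def g(x, c=3.0):
    """An isometry of the real line: reflection followed by a translation."""
    return c - x


# Illustrative atoms of two measures mu and nu.
mu_atoms = [0.0, 1.0, 4.0]
nu_atoms = [2.0, 2.5, 7.0]

before = w2_empirical(mu_atoms, nu_atoms)
after = w2_empirical([g(x) for x in mu_atoms], [g(y) for y in nu_atoms])

# For contrast: the dilation x -> 2x is not an isometry and rescales W_2.
scaled = w2_empirical([2 * x for x in mu_atoms], [2 * y for y in nu_atoms])

print(math.isclose(before, after), math.isclose(scaled, 2 * before))
```

The comparison with the dilation is only meant to show that the invariance is a property of isometries and not of pushforwards in general.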
3.5 Convex mixing of maps

In the examples, we have so far only been concerned with maps $F : W_2(M) \to W_2(N)$ which are induced by maps $f : M \to N$. Now one could wonder what a map $F$ which is not of this type could look like and what its differentiability properties are. As a first hint, we recall that whenever there is an $f : M \to N$ such that $F = f_\#$, then for $x \in M$ it is $F(\delta_x) = \delta_{f(x)}$. Based on this, we can construct the following examples.

Example 39.
• If $F(\mu) = \mu_0$ is a constant map such that $\mu_0 \neq \delta_y$ for all $y \in N$, then there exists no map $f : M \to N$ such that $F = f_\#$. In case $F(\mu) = \delta_y$, it is $F = f_\#$ with $f(x) = y$ for all $x \in M$.
• Let $F_i : W_2(M) \to W_2(N)$, $i = 1, 2$, be such that they do not coincide on $\{\delta_x \mid x \in M\}$. The mixing of measures $F := (1 - \lambda)F_1 + \lambda F_2$ for $0 < \lambda < 1$, then, cannot be a pushforward of measures.

Remark 40. Another way to think about this issue is the following: Every map $F : W_2(M) \to W_2(N)$ has a decomposition into a map $\tilde F : W_2(M) \to P(M \times N)$ with $\pi^1_\# \tilde F(\mu) = \mu$ and the map $\pi^2_\# : P(M \times N) \to W_2(N)$, i.e. $F = \pi^2_\# \circ \tilde F$. Certainly, $\tilde F$ is not unique, but one can always choose $\tilde F(\mu) = \mu \otimes F(\mu)$. Thus, $F$ is a pushforward with respect to a map $f$ if and only if there exists a map $\tilde F$ such that $\tilde F(\mu) = (\mathrm{Id}, f)_\#\mu$. According to [AG13], Lemma 1.20, this is equivalent to saying that for every $\mu$ there exists a $\tilde F(\mu)$-measurable set $\Gamma \subset M \times N$ on which $\tilde F(\mu)$ is concentrated such that for $\mu$-a.e. $x$ there exists only one $y = f(x) \in N$ with $(x, y) \in \Gamma$. And in this case, $\tilde F(\mu) = (\mathrm{Id}, f)_\#\mu$.

It is easy to see that any constant map $F : W_2(M) \to W_2(N)$, $\mu \mapsto \mu_0$, is differentiable with $dF = 0$ up to a negligible set. In the following we will investigate whether maps of the form $F = (1 - \lambda)F_1 + \lambda F_2$ are also differentiable. Let us start with asserting that the convex mixing of a.c. maps is a.c.

Proposition 41.
Let $F_i : W_2(M) \to W_2(N)$, $i = 1, 2$, be arbitrary a.c. maps. Then, for $0 \le \lambda \le 1$, also $F := (1 - \lambda)F_1 + \lambda F_2$ is a.c.

For the proof of Proposition 41 we will use that already the convex mixing of a.c. curves is a.c.
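Lemma 42.
Let $\mu^1_t$ and $\mu^2_t$ be a.c. curves. Then also the convex mixing $\mu_t := (1 - \lambda)\mu^1_t + \lambda\mu^2_t$ with $0 \le \lambda \le 1$ is an a.c. curve.

Proof. Since the $\mu^i_t$ are a.c. curves, there is a $g_i \in L^1(0,1)$ such that for all $s \le t$ in $(0,1)$
\[ W_2\big(\mu^i_s, \mu^i_t\big) \le \int_s^t g_i(\tau)\, d\tau. \]
Now let $\gamma_i \in \mathrm{Adm}(\mu^i_s, \mu^i_t)$. Then $(1 - \lambda)\gamma_1 + \lambda\gamma_2 \in \mathrm{Adm}(\mu_s, \mu_t)$. This is because for every measurable set $A$ and $\pi^i$ the projection onto the $i$-th component,
\begin{align*}
\pi^1_\#\big((1 - \lambda)\gamma_1 + \lambda\gamma_2\big)(A) &= \big((1 - \lambda)\gamma_1 + \lambda\gamma_2\big)\big((\pi^1)^{-1}(A)\big) \\
&= (1 - \lambda)\gamma_1\big((\pi^1)^{-1}(A)\big) + \lambda\gamma_2\big((\pi^1)^{-1}(A)\big) \\
&= \big((1 - \lambda)\mu^1_s + \lambda\mu^2_s\big)(A) = \mu_s(A).
\end{align*}
Similarly for $\pi^2$. Then for $\widetilde{\mathrm{Adm}}(\mu_s, \mu_t) := \{(1 - \lambda)\gamma_1 + \lambda\gamma_2 \mid \gamma_i \in \mathrm{Adm}(\mu^i_s, \mu^i_t)\} \subset \mathrm{Adm}(\mu_s, \mu_t)$ we have
\begin{align*}
W_2^2(\mu_s, \mu_t) &= W_2^2\big((1 - \lambda)\mu^1_s + \lambda\mu^2_s,\ (1 - \lambda)\mu^1_t + \lambda\mu^2_t\big) \\
&\le \inf_{\pi \in \widetilde{\mathrm{Adm}}(\mu_s, \mu_t)} \int d^2(x, y)\, d\pi(x, y) \\
&= (1 - \lambda) \inf_{\gamma_1 \in \mathrm{Adm}(\mu^1_s, \mu^1_t)} \int d^2(x, y)\, d\gamma_1 + \lambda \inf_{\gamma_2 \in \mathrm{Adm}(\mu^2_s, \mu^2_t)} \int d^2(x, y)\, d\gamma_2 \\
&= (1 - \lambda) W_2^2(\mu^1_s, \mu^1_t) + \lambda W_2^2(\mu^2_s, \mu^2_t).
\end{align*}
This means that
\begin{align*}
W_2(\mu_s, \mu_t) &\le \sqrt{(1 - \lambda) W_2^2(\mu^1_s, \mu^1_t) + \lambda W_2^2(\mu^2_s, \mu^2_t)} \\
&\le \sqrt{1 - \lambda}\, W_2(\mu^1_s, \mu^1_t) + \sqrt{\lambda}\, W_2(\mu^2_s, \mu^2_t) \\
&\le \sqrt{1 - \lambda} \int_s^t g_1(\tau)\, d\tau + \sqrt{\lambda} \int_s^t g_2(\tau)\, d\tau = \int_s^t \big(\sqrt{1 - \lambda}\, g_1 + \sqrt{\lambda}\, g_2\big)\, d\tau.
\end{align*}

Before continuing with the proof of Proposition 41 we give this immediate corollary from the proof of Lemma 42.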
Corollary 43.
Let $(X, d)$ be a metric space and $\mu_1, \mu_2, \mu_3, \mu_4$ four probability measures on $X$. Then,
\[ W_p\big((1 - \lambda)\mu_1 + \lambda\mu_2,\ (1 - \lambda)\mu_3 + \lambda\mu_4\big) \le \sqrt[p]{1 - \lambda}\; W_p(\mu_1, \mu_3) + \sqrt[p]{\lambda}\; W_p(\mu_2, \mu_4). \]

Proof of Proposition 41.
Let $\mu_t$ be an a.c. curve. Then by definition $F_i(\mu_t)$, $i = 1, 2$, are a.c. curves. From Lemma 42 we now know that also $F(\mu_t)$ is an a.c. curve.

Theorem 44.
Let $F_i : W_2(M) \to W_2(N)$, $i = 1, 2$, be two differentiable maps. Then $F = (1 - \lambda)F_1 + \lambda F_2$ is differentiable.

Since we have already seen that under the conditions of Theorem 44 $F$ is a.c., as both $F_i$ are a.c., we know that $F$ maps a.c. curves to a.c. curves. We know further that along each of these a.c. image curves there has to be a tangent vector field. To find the tangent map, mapping curves of tangent vector fields along a.c. curves to the corresponding curves of tangent vector fields along the image a.c. curves, i.e. to prove the theorem, we first give a formula for a canonical image tangent vector field.

Lemma 45.
Let $F_i : W_2(M) \to W_2(N)$, $i = 1, 2$, be two differentiable maps. For an a.c. curve $\gamma_t$ in $W_2(M)$, we define the a.c. curves $\mu_t := F_1(\gamma_t)$, $\nu_t := F_2(\gamma_t)$ and $\alpha_t := \lambda\mu_t + (1 - \lambda)\nu_t$ in $W_2(N)$. By the Lebesgue decomposition theorem, the measures $\mu_t$ and $\nu_t$ give rise to unique measures $\tau^\mu_t$, $\tau^\nu_t$, $\beta_t$ and Radon-Nikodym derivatives $\rho_t$ such that
1. For each $t$ the measures $\tau^\mu_t$, $\tau^\nu_t$ and $\beta_t$ are mutually singular: there exist Borel subsets $A_t, B_t, C_t$ that are pairwise disjoint with union $N$ such that $B_t$ and $C_t$ are nullsets for $\tau^\mu_t$, $A_t$ and $C_t$ are nullsets for $\tau^\nu_t$, and $A_t$, $B_t$ are nullsets for $\beta_t$.
2. $\mu_t = \tau^\mu_t + \beta_t$.
3. $\nu_t = \tau^\nu_t + \rho_t\beta_t$.
4. $\rho_t$ is zero only on a nullset of $C_t$.
If furthermore $v_t$ is a tangent vector field for $\mu_t$ and $w_t$ is an accompanying vector field for $\nu_t$, we can give the formula for a canonical accompanying vector field $u_t \in L^2(N, \alpha_t)$ for $\alpha_t$ as
\[ u_t(x) := \begin{cases} v_t(x), & x \in A_t, \\ w_t(x), & x \in B_t, \\ \dfrac{\lambda v_t(x) + \rho_t (1 - \lambda) w_t(x)}{\lambda + (1 - \lambda)\rho_t}, & x \in C_t. \end{cases} \]
Proof.
Since $\frac{d}{dt}\alpha_t$ is linear in $\alpha_t$, the continuity equation for $(\alpha_t, u_t)$ is satisfied if and only if
\[ \int_0^T \int_N h\big(\nabla\varphi(x, t), u_t(x)\big)\, d\alpha_t\, dt = \int_0^T \left( \int_N h\big(\nabla\varphi(x, t), v_t(x)\big)\, \lambda\, d\mu_t + \int_N h\big(\nabla\varphi(x, t), w_t(x)\big)\, (1 - \lambda)\, d\nu_t \right) dt \tag{11} \]
for all $\varphi \in C^\infty_c((0, T) \times N)$, and $u_t \in L^2(N, \alpha_t)$.

Let us first check that $u_t \in L^2(TN, \alpha_t)$. Since $N = A_t \,\dot\cup\, B_t \,\dot\cup\, C_t$, the condition can be checked separately on $A_t$, $B_t$ and $C_t$. First,
\[ \int_{A_t} |u_t(x)|^2\, d\alpha_t = \int_{A_t} |v_t(x)|^2\, \lambda\, d\mu_t < \infty, \]
and similarly for $B_t$. To check the situation on $C_t$, we start with
\begin{align}
\int_{C_t} |u_t(x)|^2\, d\alpha_t &= \int_{C_t} \frac{|\lambda v_t(x) + (1 - \lambda)\rho_t w_t(x)|^2}{(\lambda + (1 - \lambda)\rho_t)^2}\, \big(\lambda\, d\beta_t + (1 - \lambda)\rho_t\, d\beta_t\big) \tag{12} \\
&\le 2 \int_{C_t} \left( \frac{\lambda}{\lambda + (1 - \lambda)\rho_t}\, |v_t(x)|^2\, \lambda + \frac{(1 - \lambda)\rho_t}{\lambda + (1 - \lambda)\rho_t}\, |w_t(x)|^2\, (1 - \lambda)\rho_t \right) d\beta_t. \tag{13}
\end{align}
Now it holds that $\frac{\lambda}{\lambda + (1 - \lambda)\rho_t} \le 1$ and $\int_{C_t} |v_t(x)|^2\, d\beta_t < \infty$ (as one summand in the $L^2$-norm of $v_t$ with respect to $\mu_t$). The second summand is treated similarly, so the whole expression in Equation (13) is finite.

Let us now check Equation (11). This can be done separately for (almost all) $t \in [0, T]$ and again separately for the integrals over $A_t$, $B_t$, $C_t$. On $A_t$, Equation (11) holds because here $u_t = v_t$ and $\alpha_t = \lambda\mu_t = \lambda\tau^\mu_t$, whereas $\nu_t(A_t) = 0$. A similar argument works on $B_t$. On $C_t$, formally,
\[ u_t\, d\alpha_t = \frac{\lambda v_t + (1 - \lambda)\rho_t w_t}{\lambda + (1 - \lambda)\rho_t}\, d\big(\lambda\beta_t + (1 - \lambda)\rho_t\beta_t\big) = \big(\lambda v_t + (1 - \lambda)\rho_t w_t\big)\, d\beta_t = v_t\, \lambda\, d\mu_t + w_t\, (1 - \lambda)\, d\nu_t. \]

Proof of Theorem 44.
First, we need to check that $u_t$ is indeed an accompanying vector field for $\alpha_t$, i.e. that $\|u_t\|_{L^2(\alpha_t)} \in L^1(0,1)$, so that its projection onto the tangent spaces is indeed a tangent vector field along $\alpha_t$. Since $N = A_t \,\dot\cup\, B_t \,\dot\cup\, C_t$,
\begin{align}
\|u_t\|_{L^2(\alpha_t)} &= \big\| u_t|_{A_t} + u_t|_{B_t} + u_t|_{C_t} \big\|_{L^2(\alpha_t)} \le \big\| u_t|_{A_t} \big\|_{L^2(\alpha_t)} + \big\| u_t|_{B_t} \big\|_{L^2(\alpha_t)} + \big\| u_t|_{C_t} \big\|_{L^2(\alpha_t)} \notag \\
&\le \sqrt{\lambda}\, \|v_t\|_{L^2(\mu_t)} + \sqrt{1 - \lambda}\, \|w_t\|_{L^2(\nu_t)} + \big\| u_t|_{C_t} \big\|_{L^2(\alpha_t)}. \tag{14}
\end{align}
We know of the first two summands in Equation (14) that their $L^1(0,1)$-norm is finite, as we demanded $v_t$ and $w_t$ to be accompanying vector fields. It thus suffices to show the finiteness of the $L^1(0,1)$-norm of the last summand. Here, we find with $\bar\rho_{t,\lambda} := \frac{1}{\lambda + (1 - \lambda)\rho_t}$,
\[ \big\| u_t|_{C_t} \big\|_{L^2(\alpha_t)} = \big\| (\lambda v_t + (1 - \lambda)\rho_t w_t)|_{C_t} \big\|_{L^2(\bar\rho_{t,\lambda}\, d\beta_t)} \le \big\| \lambda v_t|_{C_t} \big\|_{L^2(\bar\rho_{t,\lambda}\, d\beta_t)} + \big\| (1 - \lambda)\rho_t w_t|_{C_t} \big\|_{L^2(\bar\rho_{t,\lambda}\, d\beta_t)}. \]
We have encountered both of those last summands in the proof of Lemma 45 and, analogously to there (where we concluded the finiteness of the $L^2$-norm), we can now conclude the finiteness of the $L^1(0,1)$-norm of these summands and thus the claim that $\|u_t\|_{L^2(\alpha_t)} \in L^1(0,1)$.

Finally, observe that the construction of $u_t$ from $(v_t, w_t)$ is a linear and bounded map $A_\lambda : L^2(N, \mu_t) \oplus L^2(N, \nu_t) \to L^2(N, \alpha_t)$, as the formula in the proof of the $L^2$-property of $u_t$ shows. Composition of $A_\lambda$ with $dF_1 \oplus dF_2$ and the projection to the tangent space then defines the derivative of $\lambda F_1 + (1 - \lambda)F_2$ and shows that this convex combination is differentiable.

A Disintegration theorem
To be able to prove Theorem 16, we rely on the following statement (see [AGS08]).
Theorem 46.
Let $X$ and $Y$ be Radon spaces. Furthermore let $\mu \in P(X)$ and $f : X \to Y$ be a measurable map. Then there exists a $f_\#\mu$-almost everywhere uniquely determined family of probability measures $\{\mu_y\}_{y \in Y}$ on $X$ such that
• for every measurable set $A \subset X$ the map $y \mapsto \mu_y(A)$ is measurable,
• $\mu_y(X \setminus f^{-1}(y)) = 0$ for $f_\#\mu$-almost every $y \in Y$,
• for every measurable function $g : X \to [0, \infty]$ it is
\[ \int_X g(x)\, d\mu(x) = \int_Y \int_{f^{-1}(y)} g(x)\, d\mu_y(x)\, df_\#\mu(y). \]

This means in particular that any $\mu \in P(X \times Y)$ whose first marginal $\nu$ is given can be represented in this disintegrated way.

On the other hand, whenever there is a measurable (in the sense of the first item above) family $\mu_x \in P(Y)$ given, for any $\nu \in P(X)$ the following formula defines a unique measure $\mu \in P(X \times Y)$:
\[ \mu(f) = \int_X \left( \int_Y f(x, y)\, d\mu_x(y) \right) d\nu(x), \]
with $f : X \times Y \to \mathbb{R}$ being a nonnegative measurable function. In this sense, disintegration can be seen as an opposite procedure to the construction of a product measure.

References

[AG13] Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013.
[AGS08] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows. Birkhäuser, 2008.
[BB99] Jean-David Benamou and Yann Brenier. A numerical method for the optimal time-continuous mass transport problem and related problems. Contemporary Mathematics, 226:1–12, 1999.
[Gig08] Nicola Gigli. On the geometry of the space of measures in $\mathbb{R}^d$ endowed with the quadratic optimal transportation distance. PhD thesis, Scuola Normale Superiore, Pisa, 2008.
[Gig12] Nicola Gigli. Second Order Analysis on $(P_2(M), W_2)$. American Mathematical Society, 2012.
[KM97] Andreas Kriegl and Peter Michor. The Convenient Setting of Global Analysis. American Mathematical Society, Sep 1997.
[Lot07] John Lott. Some geometric calculations on Wasserstein space. Communications in Mathematical Physics, 277(2):423–437, Nov 2007.
[Ott01] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations, 26(1-2):101–174, Jan 2001.
[Vil03] Cédric Villani. Topics in Optimal Transportation (Graduate Studies in Mathematics, Vol. 58). American Mathematical Society, 2003.
[Vil08] Cédric Villani.