Differentiable maps between Wasserstein spaces

Abstract

A notion of differentiability is proposed for maps between Wasserstein spaces of order 2 of smooth, connected and complete Riemannian manifolds. Due to the nature of the tangent space construction on Wasserstein spaces, we only give a global definition of differentiability, i.e. without a prior notion of pointwise differentiability. With our definition, however, we recover the expected properties of a differential. Special focus is put on differentiability properties of pushforward maps induced by smooth maps between the underlying manifolds, and on convex mixing of differentiable maps, with an explicit construction of the differential.


DIFFERENTIABLE MAPS BETWEEN WASSERSTEIN SPACES

A PREPRINT (arXiv, math.MG)

Bernadette Lessel

Max Planck Institute for the History of Science, Berlin, Germany

Thomas Schick

Mathematical Institute, University of Göttingen

October 5, 2020

Contents

$W_p(X)$
2.2 The continuity equation on $W_2(M)$
2.3 The tangent space $T_\mu W(M)$
$dF_\mu$
3.3 Differentiable maps between Wasserstein spaces
3.4 Pullbacks and formal Riemannian isometries
3.5 Convex mixing of maps
A Disintegration theorem

Fundamental work has been done on the weak Riemannian manifold structure and second order analysis on Wasserstein spaces $W_2(M)$, most notably by Felix Otto [Ott01], John Lott [Lot07] and Nicola Gigli [Gig12]. However, to our knowledge, no notion of differentiability for maps between Wasserstein spaces has been proposed in the literature yet.

We begin with a reminder of Wasserstein spaces and their weak differentiable structure, to motivate the definitions we make later on. Our notion of differentiability for maps $F : W_2(M) \to W_2(N)$ between Wasserstein spaces is a global one, in the sense that it does not use a pointwise notion of differentiability. A pointwise notion does not seem to be available in an immediate way, due to the way tangent spaces are constructed in Wasserstein geometry: the basis for talking about tangent vectors along curves in $W_2(M)$ is the weak continuity equation $\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = 0$, which can be seen as a differential characterization of absolutely continuous curves in $W_2(M)$ (see Theorem 9). The curve of minimal vector fields $v_t$ that solves the continuity equation for an absolutely continuous curve $\mu_t$ is then seen as being tangential along $\mu_t$. However, $v_t$ is only defined for almost every $t$, so that a pointwise evaluation is not meaningful, which undermines the definition of a pointwise notion of differentiability in our approach. The differential of a map is, however, defined in a pointwise manner.

Our account of differentiable maps between Wasserstein spaces begins with the definition of absolutely continuous maps, which map absolutely continuous curves to absolutely continuous curves. This definition is made in analogy to the theorem in differential geometry that a map $f : M \to N$ is differentiable if and only if it maps differentiable curves to differentiable curves. Absolutely continuous maps serve as a pre-notion to differentiability. An absolutely continuous map $F : W_2(M) \to W_2(N)$ is then said to be differentiable if for every $\mu \in W_2(M)$ there exists a bounded linear map $dF_\mu$ between the tangent space at $\mu$ and the tangent space at $F(\mu)$ such that for every absolutely continuous curve $\mu_t$ the image curve $dF_{\mu_t}(v_t)$ of the curve of tangent vector fields $v_t$ along $\mu_t$ is a curve of tangent vector fields along $F(\mu_t)$ (Definition 27). The collection of all these $dF_\mu$, in the sense of a bundle map between tangent bundles, is then called the differential $dF$ of $F$.

We show that $dF$ is unique up to a redefinition on a negligible set. Also, the usual properties of the differential are derived, such as the expected differential of the constant and of the identity mapping, of the composition of two differentiable maps and of the inverse of a differentiable map.

Special attention is paid to maps of the form $F = f_\#$, where measures are mapped to their image measure with respect to $f : M \to N$, $f$ being smooth and proper and with $\sup_{x \in M} \|df_x\| < \infty$. Maps of this kind are absolutely continuous, and an explicit formula is derived for a curve of vector fields satisfying the continuity equation together with $F(\mu_t)$, where $\mu_t$ is absolutely continuous. Unfortunately, it is not true in general that this curve of vector fields is actually tangent to $F(\mu_t)$, i.e. minimal. To enforce that, one can, however, apply a projector onto the respective tangent spaces, for almost every $t$, which in particular guarantees the existence of a differential for $F$.

Further focus is put on the differentiability properties of convex mixings of maps between Wasserstein spaces, as they provide a class of non-trivial maps which are not given by a pushforward of measures.

For background knowledge on Wasserstein geometry and optimal transport we refer to [AG13] and [Vil08].

Wasserstein geometry is a dynamical structure on Wasserstein spaces, which basically are sets of probability measures together with the Wasserstein distance.

Let thus $(X, d)$ be a Polish space, where $d$ metrizes the topology of $X$, and $P(X)$ the set of all probability measures on $X$ with respect to the Borel $\sigma$-algebra $B(X)$. Instead of $(X, d)$ we will often just write $X$. A measurable map $T : X \to Y$ between two Polish spaces induces a map between the respective spaces of probability measures via the pushforward $T_\#$ of measures:
$$T_\# : P(X) \to P(Y), \quad \mu \mapsto T_\# \mu, \quad \text{where } T_\#\mu(A) := \mu(T^{-1}(A)) \text{ for } A \in B(Y).$$
The support of a measure $\mu$ is defined by $\operatorname{supp}(\mu) := \{ x \in X \mid \text{every open neighbourhood of } x \text{ has positive } \mu\text{-measure} \}$. The Lebesgue measure on $\mathbb{R}^n$ is denoted by $\lambda$.

$W_p(X)$

We denote the set of probability measures which have finite $p$-th moment by $P_p(X)$, where $p \in [1, \infty)$:
$$P_p(X) := \Big\{ \mu \in P(X) \;\Big|\; \int_X d^p(x_0, x)\, d\mu(x) < \infty \Big\}.$$
Note that $P_p(X)$ is independent of the choice of $x_0 \in X$. Furthermore, we define
$$\operatorname{Adm}(\mu, \nu) := \{ \gamma \in P(X \times Y) \mid \pi^X_\# \gamma = \mu,\ \pi^Y_\# \gamma = \nu \},$$
the so-called admissible transport plans between $\mu$ and $\nu$. Here, $\pi^X : X \times Y \to X$, $\pi^X(x, y) = x$, and similarly $\pi^Y$.

Definition 1 (Wasserstein distances and Wasserstein spaces). Let $(X, d)$ be a Polish space and $p \in [1, \infty)$, then
$$W_p : P_p(X) \times P_p(X) \to \mathbb{R}, \quad (\mu, \nu) \mapsto \Big( \inf_{\gamma \in \operatorname{Adm}(\mu, \nu)} \int_{X \times X} d^p(x, y)\, d\gamma(x, y) \Big)^{1/p}$$
is called the $p$-th Wasserstein distance, or Wasserstein distance of order $p$. The tuple $(P_p(X), W_p)$ is called Wasserstein space and is denoted by the symbol $W_p(X)$.

The fact that $W_p$ is indeed a metric distance is a problem treated in optimal transport, where it is established that a minimizer for
$$\inf_{\gamma \in \operatorname{Adm}(\mu, \nu)} \int_{X \times X} d^p(x, y)\, d\gamma(x, y)$$
actually exists. Such a minimizer is called an optimal transport plan. In case a plan $\gamma \in \operatorname{Adm}(\mu, \nu)$ is induced by a measurable map $T : X \to Y$, i.e. in case $\gamma = (\mathrm{Id}, T)_\# \mu$, $T$ is called a transport map. Then $T_\# \mu = \nu$.

One can show that $W_p(X)$ is complete and separable. Furthermore, $W_p$ metrizes the weak convergence in $P_p(X)$.

Definition 2 (Weak convergence in $P_p(X)$). A sequence $(\mu_k)_{k \in \mathbb{N}} \subset P(X)$ is said to converge weakly to $\mu \in P(X)$ if and only if $\int \varphi\, d\mu_k \to \int \varphi\, d\mu$ for any bounded continuous function $\varphi$ on $X$. This is denoted by $\mu_k \rightharpoondown \mu$. A sequence $(\mu_k)_{k \in \mathbb{N}} \subset P_p(X)$ is said to converge weakly to $\mu \in P_p(X)$ if and only if, for $x_0 \in X$,
1) $\mu_k \rightharpoondown \mu$ and
2) $\int d^p(x_0, x)\, d\mu_k(x) \to \int d^p(x_0, x)\, d\mu(x)$.
This is denoted by $\mu_k \rightharpoonup \mu$.

An important class of curves in Wasserstein space that we will need later on are constant speed geodesics.

Definition 3 (Constant speed geodesic). A curve $(\gamma_t)_{t \in [0,1]}$, $\gamma_0 \neq \gamma_1$, in a metric space $(X, d)$ is called a constant speed geodesic or metric geodesic in case that
$$d(\gamma_t, \gamma_s) = |t - s|\, d(\gamma_0, \gamma_1) \quad \forall\, t, s \in [0, 1]. \tag{1}$$
We will often abbreviate curves $(\gamma_t)_{t \in [0,1]}$ by writing $\gamma_t$ instead.

Definition 4 (Geodesic space). A metric space $(X, d)$ is called geodesic if for every $x, y \in X$ with $x \neq y$ there exists a constant speed geodesic $\gamma_t$ with $\gamma_0 = x$ and $\gamma_1 = y$.

If $(X, d)$ is geodesic, then $W_2(X)$ is geodesic as well ([AG13]).

2.2 The continuity equation on $W_2(M)$

In the upcoming section, we will only be concerned with $W_2(M)$, where $M$ is a smooth, connected and complete Riemannian manifold with Riemannian metric tensor $h$ and associated Riemannian measure $\mu$. We will often write $W(M)$ instead of $W_2(M)$. Furthermore, we equip the set of measurable sections of $TM$, which we will denote by $\Gamma(TM)$, with an $L^2$-topology. That means, for $v \in \Gamma(TM)$ we define
$$\|v\|_{L^2(\mu)} := \sqrt{\int_M h(v, v)\, d\mu} \quad \text{and} \quad L^2(TM, \mu) := \{ v \in \Gamma(TM) \mid \|v\|_{L^2(\mu)} < \infty \} / \sim.$$
Here, two vector fields are considered to be equivalent in case they differ only on a set of $\mu$-measure zero. $L^2(TM, \mu)$ is a Hilbert space with the canonical scalar product. We will often write $L^2(\mu)$ if it is clear which manifold $M$ is referred to.

The (infinite dimensional) manifold structure that is commonly used on $W(M)$ is not a smooth structure in the sense of e.g. [KM97], where infinite dimensional manifolds are modeled on convenient vector spaces. The differentiable structure on $W(M)$ that will be introduced below rather consists of ad hoc definitions tailored to optimal transport and the Wasserstein metric structure, which only mimic conventional differentiable and Riemannian behavior.
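Definitions 1 and 3 can be illustrated numerically. The following is a minimal Python sketch under strong simplifying assumptions that are not part of the paper: $M = \mathbb{R}$, and both measures are uniform empirical measures with the same number of atoms. In that special case the monotone (sorted) matching is known to be optimal, so $W_2$ reduces to a root-mean-square difference of sorted atoms. The helper names `w2_empirical` and `interpolate` are hypothetical.

```python
import math

def w2_empirical(xs, ys):
    """W_2 between uniform empirical measures on R with equally many atoms.

    Assumption: in one dimension the monotone (sorted) matching is optimal,
    so no optimisation over transport plans is needed.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures need the same number of atoms")
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def interpolate(xs, ys, t):
    """Atoms of mu_t = ((1 - t) Id + t T)_# mu_0, with T the sorted matching."""
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - t) * x + t * y for x, y in zip(xs, ys)]

mu0 = [0.0, 1.0, 4.0]
mu1 = [2.0, 3.0, 9.0]
total = w2_empirical(mu0, mu1)

# Check equation (1): W_2(mu_t, mu_s) = |t - s| * W_2(mu_0, mu_1).
for t, s in [(0.0, 0.5), (0.25, 0.75), (0.1, 1.0)]:
    lhs = w2_empirical(interpolate(mu0, mu1, t), interpolate(mu0, mu1, s))
    assert abs(lhs - abs(t - s) * total) < 1e-12
```

The interpolation used here is the displacement interpolation between the two sets of atoms, so the check confirms that this particular curve is a constant speed geodesic in the sense of Definition 3.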

Instead of starting with a smooth manifold structure, on Wasserstein spaces one starts with the notion of a tangent space. Traditionally, the basic idea of a tangent vector at a given point is that it indicates the direction in which a (smooth) curve will be going infinitesimally from that point. Then, the set of all such vectors which can be found to be tangent to some curve at a given fixed point are collected in the tangent space at that point. On $W(M)$, however, there is no notion of smooth curves. But there is a notion of metric geodesics. In case the transport plan for the optimal transport between two measures is induced by a map $T$, the interpolating geodesic on Hilbert spaces can be written as $\mu_t = ((1 - t)\mathrm{Id} + tT)_\# \mu_0$, thus being of the form $\mu_t = (F_t)_\# \mu_0$. More generally, on Riemannian manifolds optimal transport between $\mu_0$ and $\mu_t$ can be achieved by $\mu_t = (F_t)_\# \mu_0$, $F_t = \exp(t \nabla \varphi)$ (see e.g. [Vil08], Chapter 12). In these cases, $F_t$ is injective and locally Lipschitz for $0 < t < 1$ ([Vil03], Subsubsection 5.4.1). It is known from the theory of characteristics for partial differential equations that curves of this kind solve the weak continuity equation, together with the vector field whose integral lines correspond to $F_t$.

Definition 5 (Continuity equation). Given a family of vector fields $(v_t)_{t \in [0,T]}$, a curve $\mu_t : [0, T] \to W(M)$ is said to solve the weak continuity equation
$$\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = 0, \tag{2}$$
if
$$\int_0^T \int_M \Big( \frac{\partial}{\partial t} \varphi(x, t) + h(\nabla \varphi(x, t), v_t(x)) \Big)\, d\mu_t(x)\, dt = 0 \tag{3}$$
holds true for all $\varphi \in C_c^\infty((0, T) \times M)$.

Theorem 6 ([Vil03], Theorem 5.34). Let $(F_t)_{t \in [0,T)}$ be a family of maps on $M$ such that $F_t : M \to M$ is a bijection for every $t \in [0, T)$, $F_0 = \mathrm{Id}$ and both $(t, x) \mapsto F_t(x)$ and $(t, x) \mapsto F_t^{-1}(x)$ are locally Lipschitz on $[0, T) \times M$. Let further $v_t(x)$ be a family of velocity fields on $M$ such that its integral lines correspond to the trajectories $F_t$, and $\mu$ be a probability measure. Then $\mu_t = (F_t)_\# \mu$ is the unique weak solution in $C([0, T), P(M))$ of $\frac{d}{dt} \mu_t + \nabla \cdot (v_t \mu_t) = 0$ with initial condition $\mu_0 = \mu$. Here, $P(M)$ is equipped with the weak topology.

It is possible to characterize the class of curves on $W(M)$ that admit a velocity in the manner of Definition 5 ([AG13]) in the following way.

Definition 7 (Absolutely continuous curve). Let $(E, d)$ be an arbitrary metric space and $I$ an interval in $\mathbb{R}$. A function $\gamma : I \to E$ is called absolutely continuous (a.c.) if there exists a function $f \in L^1(I)$ such that
$$d(\gamma(t), \gamma(s)) \leq \int_t^s f(r)\, dr \quad \forall\, s, t \in I,\ t \leq s. \tag{4}$$

Definition 8 (Metric derivative). The metric derivative $|\dot\gamma|(t)$ of a curve $\gamma : [0, 1] \to E$ at $t \in (0, 1)$ is given as the limit
$$|\dot\gamma|(t) = \lim_{h \to 0} \frac{d(\gamma(t + h), \gamma(t))}{|h|}. \tag{5}$$

Every constant speed geodesic is absolutely continuous and $|\dot\gamma|(t) = d(\gamma(0), \gamma(1))$.

It is known that for absolutely continuous curves $\gamma$ the metric derivative exists for a.e. $t$. It is an element of $L^1(0, 1)$ and, up to sets of zero Lebesgue measure, the minimal function satisfying equation (4) for $\gamma$. In this sense absolutely continuous functions enable a generalization of the fundamental theorem of calculus to arbitrary metric spaces.

Theorem 9 (Differential characterization of a.c. curves). Let $\mu_t : [0, 1] \to W(M)$ be an a.c. curve. Then there exists a Borel family of vector fields $(v_t)_{t \in [0,1]}$ on $M$ such that the continuity equation (3) holds and $\|v_t\|_{L^2(\mu_t)} \leq |\dot\mu_t|$ for a.e. $t \in (0, 1)$. Conversely, if a curve $\mu_t : [0, 1] \to W(M)$ is such that there exists a Borel family of vector fields $(v_t)_{t \in [0,1]}$ with $\|v_t\|_{L^2(\mu_t)} \in L^1(0, 1)$, together with which it satisfies (3), then there exists an a.c. curve $\tilde\mu_t$ being equal to $\mu_t$ for a.e. $t$ and satisfying $|\dot{\tilde\mu}_t| \leq \|v_t\|_{L^2(\tilde\mu_t)}$ for a.e. $t \in (0, 1)$.

2.3 The tangent space $T_\mu W(M)$

As seen in Theorem 9, every absolutely continuous curve in $W(M)$ admits an $L^1(dt)$-family of $L^2(\mu_t)$-vector fields $v_t$, i.e. $\|v_t\|_{L^2(\mu_t)} \in L^1(0, 1)$, together with which the continuity equation is satisfied. In the following, we will call every such pair $(\mu_t, v_t)$ an a.c. couple. We further want to call $v_t$ an accompanying vector field for $\mu_t$.
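The notion of an a.c. couple can be made tangible with a small numerical check. The sketch below is an illustration under assumptions not made in the paper: $M = \mathbb{R}$, the measure $\mu_0$ is a uniform empirical measure, the flow is $F_t(x) = x e^t$ with velocity field $v_t(x) = x$, and a Gaussian stands in for a compactly supported test function. For such a smooth flow, testing (3) against products $\eta(t)\psi(x)$ amounts to the identity $\frac{d}{dt} \int \psi\, d\mu_t = \int \psi'\, v_t\, d\mu_t$, which is what the code verifies by a central difference. All function names are hypothetical.

```python
import math

def flow(x0, t):
    """Trajectory of the ODE x'(t) = v(x(t)) with v(x) = x, started at x0."""
    return x0 * math.exp(t)

def velocity(x):
    """Time-independent velocity field v_t(x) = x (an illustrative choice)."""
    return x

def psi(x):
    """Smooth, rapidly decaying spatial test function."""
    return math.exp(-x * x)

def dpsi(x):
    """Derivative of psi."""
    return -2.0 * x * math.exp(-x * x)

atoms = [-0.7, -0.1, 0.3, 0.9]  # atoms of mu_0, each carrying mass 1/4

def integral_psi(t):
    """Integral of psi against mu_t = (F_t)_# mu_0."""
    return sum(psi(flow(x, t)) for x in atoms) / len(atoms)

def transport_term(t):
    """Integral of psi' * v_t against mu_t."""
    return sum(dpsi(flow(x, t)) * velocity(flow(x, t)) for x in atoms) / len(atoms)

def continuity_residual(t, dt=1e-5):
    """d/dt of the integral of psi, minus the transport term (central difference)."""
    ddt = (integral_psi(t + dt) - integral_psi(t - dt)) / (2.0 * dt)
    return ddt - transport_term(t)

for t in (0.0, 0.3, 0.8):
    assert abs(continuity_residual(t)) < 1e-6
```

The residual vanishes up to discretization error, so the pair $(\mu_t, v_t)$ of this example behaves like an a.c. couple on the tested time points.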

Vector fields $v_t$ satisfying the continuity equation with a given $\mu_t$ are, however, not unique: there are many vector fields which allow for the same motion of the density. Adding another family $w_t$ with the ($t$-independent) property $\nabla \cdot (w_t \mu_t) = 0$ to $v_t$ does not alter the equation. Theorem 9 provides a natural criterion to choose a unique element among the $v_t$. According to this theorem, there is at least one $L^1(dt)$-family $v_t$ such that $|\dot\mu_t| = \|v_t\|_{L^2(\mu_t)}$ for almost all $t$, i.e. one that is of minimal norm for almost all $t$. Linearity of the continuity equation (3) with respect to $v_t$ and the strict convexity of the $L^2$-norms ensure the uniqueness of this choice, up to sets of zero measure with respect to $t$. We want to call such a couple $(\mu_t, v_t)$, where $v_t$ is the unique minimal accompanying vector field for an a.c. curve $\mu_t$, a tangent couple.

It then seems reasonable to define the tangent space at the point $\mu$ as the set of $v \in L^2(TM, \mu)$ with $\|v\|_\mu \leq \|v + w\|_\mu$ for all $w \in L^2(TM, \mu)$ such that $\nabla \cdot (w\mu) = 0$. This condition for $v \in L^2(TM, \mu)$, however, is equivalent to saying that $\int_M h(v, w)\, d\mu = 0$ for all $w \in L^2(TM, \mu)$ with $\nabla \cdot (w\mu) = 0$. This in turn is equivalent to the following, which we will take as the definition of the tangent space.

Definition 10 (Tangent space $T_\mu W(M)$). The tangent space $T_\mu W(M)$ at the point $\mu \in W(M)$ is defined as
$$T_\mu W(M) := \overline{\{ \nabla \varphi \mid \varphi \in C_c^\infty(M) \}}^{L^2(TM, \mu)} \subset L^2(TM, \mu). \tag{6}$$
We also give the definition of the normal space:
$$T_\mu^\perp W(M) := \Big\{ w \in L^2(TM, \mu) \;\Big|\; \int h(w, v)\, d\mu = 0\ \ \forall\, v \in T_\mu W(M) \Big\} = \{ w \in L^2(TM, \mu) \mid \nabla \cdot (w\mu) = 0 \}.$$

Remark 11. If $(\mu_t, v_t)$ is an a.c. couple, then $(\mu_t, v_t)$ is a tangent couple if and only if $v_t \in T_{\mu_t} W(M)$ for almost every $t \in (0, 1)$ ([Gig12], Proposition 1.30).

It is not difficult to see that $\dim T_\delta W(M) = \dim M$ for a Dirac measure $\delta$, whereas in most cases $\dim T_\mu W(M) = \infty$. In general, it can be shown that as long as $\mu$ is supported on an at most countable set, $T_\mu W(M) = L^2(TM, \mu)$ (see [Gig12], Remark 1.33). Morally, the more points are contained in the support of the measure, the bigger the dimension gets. On the other hand, every probability measure can be approximated by a sequence of measures with finite support (see [Vil08], Theorem 6.18), so that in each neighborhood of every measure there is an element $\mu$ with $\dim T_\mu W(M) < \infty$.

We call the disjoint union of all tangent spaces,
$$TW(M) := \bigsqcup_{\mu \in W(M)} T_\mu W(M) = \bigcup_{\mu \in W(M)} \{ (\mu, v) \mid v \in T_\mu W(M) \},$$
the tangent bundle of $W(M)$. Since we are not treating $W(M)$ as a traditional manifold with charts, $TW(M)$ cannot be equipped with a traditional tangent bundle topology. Also, due to the denseness of the probability measures with finite support, local triviality cannot be achieved. However, since there is a natural projection map $\pi : TW(M) \to W(M)$, $(\mu, v) \mapsto \mu$, we can in principle still talk about sections and bundle maps on the pointwise level. Whereas the notion of a vector field (in this context it would effectively be a field of (equivalence classes of) vector fields) has not turned out to be useful so far, we will use the concept of a bundle map later. In this spirit, a bundle map between tangent bundles of Wasserstein spaces $W(M)$ and $W(N)$ is a fiber preserving map $B : TW(M) \to TW(N)$ in the sense that, together with a continuous map $F : W(M) \to W(N)$, the following diagram commutes:
$$\begin{array}{ccc} TW(M) & \xrightarrow{\ B\ } & TW(N) \\ \downarrow{\scriptstyle \pi_M} & & \downarrow{\scriptstyle \pi_N} \\ W(M) & \xrightarrow{\ F\ } & W(N) \end{array}$$
One could ask about the meaningfulness of the condition that $F$ should be continuous, since for $B$ the concept of continuity does not make sense. It is just that we require the preservation of as much structure as possible. In any case, we are mainly going to use this idea of a bundle map to make clear how we want to see our notion of a differential of a differentiable function $F : W(M) \to W(N)$.

On $W(M)$ one can furthermore define a (formal) Riemannian structure. Intuition comes from the following formula, which is due to J.-D. Benamou and Y. Brenier ([BB99]). It shows that the Wasserstein distance $W_2$, having been defined through the static optimal transport problem, can be recovered by a dynamic formula that is reminiscent of the length functional on Riemannian manifolds, which defines the Riemannian metric distance.

Theorem 12 (Benamou-Brenier formula). Let $\mu, \nu \in P_2(M)$, then
$$W_2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \|v_t\|_{L^2(\mu_t)}\, dt, \tag{7}$$
where the infimum is taken among all a.c. couples $(\mu_t, v_t)$ such that $\mu_0 = \mu$ and $\mu_1 = \nu$.

This resemblance of formulas thus inspires the following definition.
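Before the definition, formula (7) can be sanity-checked in a toy case. The sketch below rests on assumptions that go beyond the text: $M = \mathbb{R}$, $\mu$ is a uniform empirical measure and $\nu$ is its translate by a constant $a$. Curves of the form $x \mapsto x + s(t)a$ move every atom with velocity $s'(t)a$, so $\|v_t\|_{L^2(\mu_t)} = |s'(t)a|$. The code compares the length of the straight curve $s(t) = t$ with that of a detour that overshoots and returns; the names `curve_length`, `direct` and `detour` are hypothetical.

```python
import math

def w2_empirical(xs, ys):
    """W_2 between uniform empirical measures on R (sorted matching, 1D only)."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def curve_length(speed, steps=1000):
    """Midpoint rule for the integral over [0, 1] of ||v_t||_{L^2(mu_t)}.

    For a rigid translation x -> x + s(t) * a every atom has velocity
    s'(t) * a, so the L^2(mu_t) norm equals |s'(t) * a| for any mu_t.
    `speed(t)` must return s'(t) * a.
    """
    h = 1.0 / steps
    return sum(abs(speed((k + 0.5) * h)) * h for k in range(steps))

a = 3.0
mu = [0.0, 1.0, 4.0]
nu = [x + a for x in mu]

def direct(t):
    """Velocity for s(t) = t: the constant speed geodesic."""
    return a

def detour(t):
    """Velocity for a path that runs to s = 1.5 on [0, 0.75], then back to s = 1."""
    return 2.0 * a if t < 0.75 else -2.0 * a

dist = w2_empirical(mu, nu)
assert abs(curve_length(direct) - dist) < 1e-9       # the infimum is attained
assert abs(curve_length(detour) - 2.0 * dist) < 1e-9  # a longer admissible curve
```

Both curves connect $\mu$ to $\nu$, but only the straight one realizes $W_2(\mu, \nu)$, which is the behaviour expected of a length functional.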

Definition 13 (Formal Riemannian tensor on $W(M)$). The formal Riemannian metric tensor $H_\mu$ on $W(M)$ at the point $\mu \in W(M)$ is defined as
$$H_\mu : T_\mu W(M) \times T_\mu W(M) \to \mathbb{R}, \quad (v, w) \mapsto \int_M h_x(v, w)\, d\mu(x).$$

Indeed, since $\|v_t\|_{L^2(\mu_t)} = \sqrt{\int_M h(v_t, v_t)\, d\mu_t} = \sqrt{H_{\mu_t}(v_t, v_t)}$, we now have
$$W_2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \sqrt{H_{\mu_t}(v_t, v_t)}\, dt.$$
The tuple $(T_\mu W(M), H_\mu)$ constitutes a Hilbert space.

Gigli [Gig08] emphasizes that Definition 10 does not allow for a traditional Riemannian structure on $W(M)$, since the natural exponential map $v \mapsto \exp_\mu(v) := (\mathrm{Id} + v)_\# \mu$ has injectivity radius $0$ for every $\mu$.

Since $W(M)$ and $W(N)$ are not manifolds in a traditional sense, to be able to talk about differentiability of maps $F : W(M) \to W(N)$ we cannot compose $F$ with charts and apply Euclidean calculus. Recall, therefore, that a map $f : M \to N$ is differentiable if and only if it maps differentiable curves to differentiable curves. Having only a notion of absolutely continuous curves, which are metrically differentiable almost everywhere and which are at the foundation of the construction of tangent spaces of Wasserstein spaces, we start with the following definition.

Definition 14 (Absolutely continuous map). A map $F : W(M) \to W(N)$ is called absolutely continuous, or a.c., if the curve $F(\mu_t) \subset W(N)$ is absolutely continuous up to redefining $t \mapsto \mu_t$ on a zero set, whenever $\mu_t \subset W(M)$ is absolutely continuous.

We want to build our notion of differentiable maps between Wasserstein spaces on this idea of absolutely continuous maps. Before we continue to do so, we first find some conditions under which maps are absolutely continuous. For this, we want to recall the notion of proper maps.

Definition 15 (Proper map). A continuous map $f : X \to Y$ between a Hausdorff space $X$ and a locally compact Hausdorff space $Y$ is called proper if for all compact subsets $K \subset Y$ the preimage $f^{-1}(K) \subset X$ is compact in $X$.

In the following we denote the operator norm of a linear map by $\|\cdot\|$.

Theorem 16. Let $F : W(M) \to W(N)$ be given as $F(\mu) = f_\# \mu$, $f : M \to N$ being smooth and proper and such that $\sup_{x \in M} \|df_x\| < \infty$. Then $F$ is absolutely continuous and for every tangent couple $(\mu_t, v_t)$, the tuple $(F(\mu_t), dF_{\mu_t}(v_t))$ is an a.c. couple, where
$$dF_{\mu_t}(v_t)_y := \int_{f^{-1}(y)} df_x(v_{t,x})\, d\mu_t^y(x) \tag{8}$$
for almost every $t$ and for $y \in f(M)$. Here, $df_x : T_x M \to T_{f(x)} N$ denotes the differential of $f$ at the point $x$, $v_{t,x}$ means the vector field $v_t$ at the point $x \in M$, and the probability measures $\mu_t^y(x)$ are defined through the disintegration theorem, $d\mu_t(x) = d\mu_t^y(x)\, d f_\# \mu_t(y)$ (see Appendix A). For all $y \notin f(M)$, we set $dF_{\mu_t}(v_t)_y = 0$.

Note that what appears in Appendix A as a lower index $y$ now appears as an upper index $y$, since here we are additionally dealing with the $t$-dependence of $\mu_t$.
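For a finitely supported measure, formula (8) reduces to a weighted average over each fiber of $f$. The following Python sketch illustrates this under assumptions chosen for convenience and not taken from the paper: $M = N = \mathbb{R}$ and $f(x) = \sqrt{1 + x^2}$, which is smooth, proper, non-injective and satisfies $|f'| < 1$. On each fiber the disintegration $\mu^y$ is the normalised restriction of $\mu$. The names `pushforward_field` and `l2_norm` are hypothetical.

```python
import math
from collections import defaultdict

def f(x):
    """Smooth, proper, non-injective map R -> R with |f'| < 1."""
    return math.sqrt(1.0 + x * x)

def df(x):
    """Derivative of f."""
    return x / math.sqrt(1.0 + x * x)

def pushforward_field(atoms, weights, v):
    """Return (f_# mu, dF_mu(v)) for mu = sum_i weights[i] * delta_{atoms[i]}.

    Both results are dicts keyed by the image point y. On each fiber
    f^{-1}(y) the disintegration mu^y is the normalised restriction of mu,
    so formula (8) becomes a weighted average of df_x(v_x).
    """
    mass = defaultdict(float)
    moment = defaultdict(float)
    for x, w in zip(atoms, weights):
        y = f(x)  # f(x) and f(-x) are bitwise equal, so fibers are grouped
        mass[y] += w
        moment[y] += w * df(x) * v(x)
    field = {y: moment[y] / mass[y] for y in mass}
    return dict(mass), field

def l2_norm(values, weights):
    """Norm in L^2 of a finitely supported measure."""
    return math.sqrt(sum(w * val * val for val, w in zip(values, weights)))

def v(x):
    """Illustrative vector field on R."""
    return 1.0 + x

atoms = [-2.0, -1.0, 1.0, 2.0, 3.0]
weights = [0.1, 0.3, 0.1, 0.3, 0.2]

image, field = pushforward_field(atoms, weights, v)

# Norm comparison: ||dF_mu(v)||_{L^2(f_# mu)} <= sup|f'| * ||v||_{L^2(mu)} <= ||v||_{L^2(mu)}.
lhs = math.sqrt(sum(image[y] * field[y] ** 2 for y in image))
rhs = l2_norm([v(x) for x in atoms], weights)
assert lhs <= rhs
```

The five atoms collapse onto three image points, and on the two-point fibers the image vector is the mass-weighted mean of $f'(x)\, v(x)$. The final assertion reflects the norm bound that the proof of Theorem 16 establishes in general.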

Although $df_x : T_x M \to T_{f(x)} N$ is well defined for every $x$ as a mapping between tangent spaces, it is not well defined as a mapping between vector fields as long as $f$ is not injective. We thus take the mean value over all the vectors $df_x(v_{t,x})$ as the image vector $dF_{\mu_t}(v_t)_y$ of the vector field $v_t$ at the point $y$, where $x$ ranges over the elements of the fiber $f^{-1}(y)$. In case $f$ is injective, $dF_\mu(v)$ reduces to $df(v)$ for every $\mu$, which then can be regarded as a full-fledged vector field.

Our naming of the vector field $dF_{\mu_t}(v_t)$ along $F(\mu_t)$ is, of course, very suggestive. Indeed, since the map $(v, \mu) \mapsto dF_\mu(v)$ is linear in $v$, Theorem 16 supports a natural definition for a notion of differentiability for absolutely continuous maps $F$. However, before we give such a definition, we need to make some further preparatory observations. Let us first continue with proving Theorem 16.

Proof. Let $\mu_t$ be an a.c. curve. Using Theorem 9, we want to prove that there exists a family of vector fields $(\tilde v_t)_{t \in [0,1]}$ with $\int_0^1 \|\tilde v_t\|_{L^2(F(\mu_t))}\, dt < \infty$ such that $(F(\mu_t), \tilde v_t)$ is an a.c. couple.

Let $(v_t)_{t \in [0,1]}$ be the tangent vector field of $\mu_t$. For each $t$ for which $v_t \in T_{\mu_t} W(M)$ (i.e. almost everywhere) we define $dF_{\mu_t}(v_t)$ as in equation (8). We will prove that $dF_{\mu_t}(v_t)$ is an example of the vector fields $\tilde v_t$ we are looking for.

Let us first see that $\int_0^1 \|dF_{\mu_t}(v_t)\|_{L^2(F(\mu_t))}\, dt < \infty$. Using the triangle inequality for Bochner integrals, Jensen's inequality, the disintegration theorem and Hölder's inequality (in this order), we have:
$$\begin{aligned}
\int_0^1 \|dF_{\mu_t}(v_t)\|_{L^2(F(\mu_t))}\, dt
&= \int_0^1 \sqrt{\int_N \|dF_{\mu_t}(v_t)_y\|^2_{T_y N}\, dF(\mu_t)(y)}\, dt \\
&= \int_0^1 \sqrt{\int_N \Big\| \int_{f^{-1}(y)} df_x(v_{t,x})\, d\mu_t^y(x) \Big\|^2_{T_y N}\, d f_\# \mu_t(y)}\, dt \\
&\leq \int_0^1 \sqrt{\int_N \Big( \int_{f^{-1}(y)} \|df_x(v_{t,x})\|_{T_y N}\, d\mu_t^y(x) \Big)^2\, d f_\# \mu_t(y)}\, dt \\
&\leq \int_0^1 \sqrt{\int_N \int_{f^{-1}(y)} \|df_x(v_{t,x})\|^2_{T_y N}\, d\mu_t^y(x)\, d f_\# \mu_t(y)}\, dt \\
&= \int_0^1 \sqrt{\int_M \|df_x(v_{t,x})\|^2_{T_{f(x)} N}\, d\mu_t(x)}\, dt \\
&\leq \int_0^1 \sqrt{\int_M \|df_x\|^2 \cdot \|v_{t,x}\|^2_{T_x M}\, d\mu_t(x)}\, dt \\
&\leq \int_0^1 \sqrt{\int_M \|v_{t,x}\|^2_{T_x M}\, d\mu_t \cdot \Big( \operatorname{ess\,sup}^{\mu_t}_{x \in M} \|df_x\| \Big)^2}\, dt \\
&= \int_0^1 \sqrt{\|v_t\|^2_{L^2(\mu_t)} \cdot \Big( \operatorname{ess\,sup}^{\mu_t}_{x \in M} \|df_x\| \Big)^2}\, dt \\
&\leq C \int_0^1 \|v_t\|_{L^2(\mu_t)}\, dt < \infty.
\end{aligned}$$
With $\operatorname{ess\,sup}^{\mu_t}_{x \in M}$ we mean the essential supremum with respect to the measure $\mu_t$, and $C := \sup_t \operatorname{ess\,sup}^{\mu_t}_{x \in M} \|df_x\| \leq \sup_{x \in M} \|df_x\| < \infty$. The last expression is finite, since we know that $\|v_t\|_{L^2(\mu_t)} \leq |\dot\mu_t|$ for almost every $t$ and that the metric derivative of an a.c. curve is integrable. (The calculation above shows in particular that $dF_{\mu_t}(v_t) \in L^2(F(\mu_t))$ for almost every $t$, as we will point out again below.)

The disintegration theorem now allows the following calculation, with $g$ being the Riemannian tensor on $N$ and $h$ the one on $M$, $\varphi \in C_c^\infty(N \times (0, 1))$ and $\nabla$ the gradient with respect to the first coordinate:
$$\begin{aligned}
\int_N g_y(\nabla \varphi(y, t), dF_{\mu_t}(v_t)_y)\, d f_\# \mu_t(y)
&= \int_N g_y\Big( \nabla \varphi(y, t), \int_{f^{-1}(y)} df_x(v_{t,x})\, d\mu_t^y(x) \Big)\, d f_\# \mu_t(y) \\
&= \int_N \int_{f^{-1}(y)} g_y(\nabla \varphi(y, t), df_x(v_{t,x}))\, d\mu_t^y(x)\, d f_\# \mu_t(y) \\
&= \int_N \int_{f^{-1}(y)} g_{f(x)}(\nabla \varphi(f(x), t), df_x(v_{t,x}))\, d\mu_t^y(x)\, d f_\# \mu_t(y) \\
&= \int_M g_{f(x)}(\nabla \varphi(f(x), t), df_x(v_{t,x}))\, d\mu_t(x) \\
&= \int_M h_x(\nabla(\varphi \circ f)(x, t), v_{t,x})\, d\mu_t(x).
\end{aligned}$$
By $(\varphi \circ f)(x, t)$ we mean $(\varphi \circ (f \times \mathrm{id}))(x, t)$. For the second equality we used the continuity of the Riemannian tensor at every point $y \in N$. The last step is true because for every vector $X \in T_x M$,
$$h_x(\nabla(\varphi \circ f)(x), X) = X(\varphi \circ f)(x) = df_x(X)(\varphi)(f(x)) = g_{f(x)}(\nabla \varphi(f(x)), df_x(X)).$$

With this, we can now prove our claim that $\frac{d}{dt} F(\mu_t) + \nabla \cdot (dF_{\mu_t}(v_t)\, F(\mu_t)) = 0$ in the weak sense. For every $\varphi \in C_c^\infty(N \times (0, 1))$ we have
$$\begin{aligned}
&\int_0^1 \int_N \Big[ \Big( \frac{\partial}{\partial t} \varphi \Big)(y, t) + g_y(\nabla \varphi(y, t), dF_{\mu_t}(v_t)_y) \Big]\, d f_\# \mu_t(y)\, dt \\
&\quad = \int_0^1 \int_M \Big[ \Big( \frac{\partial}{\partial t} \varphi \Big)(f(x), t) + h_x(\nabla(\varphi \circ f)(x, t), v_{t,x}) \Big]\, d\mu_t(x)\, dt \\
&\quad = \int_0^1 \int_M \Big[ \Big( \frac{\partial}{\partial t} (\varphi \circ f) \Big)(x, t) + h_x(\nabla(\varphi \circ f)(x, t), v_{t,x}) \Big]\, d\mu_t(x)\, dt \\
&\quad = 0.
\end{aligned}$$
Since $f$ is smooth and proper, $\varphi \circ f \in C_c^\infty(M \times (0, 1))$ and we can apply our assumption that $(\mu_t, v_t)$ is an a.c. couple.

$dF_\mu$

For Theorem 16 we did not need to test whether $dF_\mu(v) \in T_{F(\mu)} W(N)$ for all $v \in T_\mu W(M)$, since we only needed $(F(\mu_t), dF_{\mu_t}(v_t))$ to be an a.c. couple. But is it still true, given that $(\mu_t, v_t)$ is a tangent couple?

To begin with, the proof of Theorem 16 also guarantees that for every $\mu \in W(M)$ and $v \in T_\mu W(M)$, $dF_\mu(v) \in L^2(F(\mu))$. Knowing this, we can consider formula (8) as the prescription for a map between $T_\mu W(M)$ and $L^2(F(\mu))$. It is also useful to know that this map is always bounded, which we will see in the next proposition. For the rest of this section, let $F : W(M) \to W(N)$ be as in Theorem 16 and $dF_\mu(v)$ as in formula (8).

Proposition 17 (Boundedness of $dF$). For each $\mu \in W(M)$, $dF_\mu : T_\mu W(M) \to L^2(F(\mu))$ is bounded with
$$\|dF_\mu\| \leq \operatorname{ess\,sup}^{\mu}_{x \in M} \|df_x\|. \tag{9}$$
Here, $\|\cdot\|$ denotes the operator norm of the respective linear map and $\operatorname{ess\,sup}^{\mu}_{x \in M}$ the essential supremum with respect to $\mu$.

Inequality (9) can be obtained by taking similar steps as in the proof of Theorem 16. The right-hand side of inequality (9) is finite since we demanded $\sup_{x \in M} \|df_x\|$ to be finite.

Let us give an example of a function $F$ for which equality is attained for every $\mu$ in inequality (9).

Example 18. Let $g : M \to M$ be a Riemannian isometry, i.e. $g^* h = h$, where $h$ is the Riemannian metric tensor on $M$. Then, for $F = g_\#$ and for all $\mu \in W(M)$, $\|dF_\mu\| = \operatorname{ess\,sup}^{\mu}_{x \in M} \|dg_x\| = 1$. This is because, on the one hand, for all $x \in M$, $\|dg_x\| = 1$, since $dg_x$ is an isometry between the tangent spaces $T_x M$ and $T_{g(x)} M$. On the other hand,
$$\|dg\| = \sup_{\|v\|_{T_\mu W(M)} = 1} \|dg(v)\|_{T_{g_\# \mu} W(M)} = \sup_{\|v\|_{T_\mu W(M)} = 1} \|v\|_{T_\mu W(M)} = 1.$$

To come back to our question whether $dF_\mu(v)$ is always an element of $T_{F(\mu)} W(N)$, we first want to study the following simple cases.

Lemma 19. Let $\mu = \delta_x$ for $x \in M$. Then $dF_\mu(v) \in T_{F(\mu)} W(N)$ for all $v \in T_\mu W(M)$.

Proof. This is true because $F(\delta_x) = \delta_{f(x)}$ and for every $y \in N$, $L^2(\delta_y) \cong \mathbb{R}^n \cong T_{\delta_y} W(N)$, $n = \dim N$.

Lemma 20. Let $g : M \to M$ be a Riemannian isometry, i.e. $g^* h = h$, and $v = \nabla \varphi \in T_\mu W(M)$, $\varphi \in C_c^\infty(M)$. Then for every $\mu \in W(M)$, $dF_\mu(v) = dg(v) = \nabla(\varphi \circ g^{-1}) \in T_{F(\mu)} W(M)$.

Proof. For the Riemannian metric $h$ on $M$ and for every vector field $X$,
$$h(\nabla(\varphi \circ g^{-1}), X) = d(\varphi \circ g^{-1})(X) = d\varphi(dg^{-1}(X)) = h(\nabla \varphi, dg^{-1}(X)) = h(dg(\nabla \varphi), X).$$

Since we know from Proposition 17 that $d(g_\#)_\mu$ is bounded and therefore continuous for every $\mu \in W(M)$, we can infer the following more general statement.

Corollary 21. Let $g : M \to M$ be a Riemannian isometry and $T_\mu W(M) \ni v = \lim_{n \to \infty} \nabla \varphi_n$. Then $dg(v) = \lim_{n \to \infty} \nabla(\varphi_n \circ g^{-1}) \in T_{F(\mu)} W(M)$.

However, the case in Lemma 19 is extreme and the choice of functions in Lemma 20 specific. We will now see that it may well happen that $dF_\mu$ does not hit the tangent space at $F(\mu)$.

Theorem 22.

Let $M$ be a compact manifold without boundary and $f = \mathrm{id}_M : (M, h_1) \to (M, h_2)$ the identity map on $M$, where $h_2 = \nu^2 h_1$ and $\nu : M \to (0, \infty)$ is nonconstant. Then for $F = \mathrm{id}_\# : W(M, h_1) \to W(M, h_2)$ there exists a $\nabla \varphi \in T_\mu W(M, h_1)$ so that $dF_\mu(\nabla \varphi) \notin T_\mu W(M, h_2)$, where $\mu = C \cdot \mu_{h_1}$, $\mu_{h_1}$ is the volume measure on $M$ with respect to $h_1$ and $C = 1/\mu_{h_1}(M)$.

Proof. It is clear that $F = \mathrm{id}_{W(M)}$ and $dF_\mu(v) = v$ for all $v \in T_\mu W(M, h_1)$. However, $v$ is not automatically a member of $T_\mu W(M, h_2)$. We will show that if $\varphi$ is chosen appropriately, $v = \nabla_{h_1} \varphi$ is not a limit of gradients with respect to $h_2$. For this, recall that on a general Riemannian manifold $(M, h)$ there is a duality between vector fields $v$ and 1-forms $v^{\flat_h}$ by the formula $v^{\flat_h}(\cdot) := h(v, \cdot)$, which maps the vector field $\nabla_h \varphi$ to the 1-form $d\varphi$. This identification gives an isomorphism between $\overline{\{\nabla_h \varphi\}}^{L^2(TM, h, \mu)}$ and $\overline{\{d\varphi\}}^{L^2(T^*M, h^*, \mu)}$. Since this isomorphism depends on the chosen metric, in general $v^{\flat_{h_1}} \neq v^{\flat_{h_2}}$, but rather $v^{\flat_{h_2}} = \nu^2 v^{\flat_{h_1}}$, as Lemma 24 below shows. And thus $(\nabla_{h_1} \varphi)^{\flat_{h_2}} = \nu^2 d\varphi$. Now $d(\nu^2 d\varphi) = d(\nu^2) \wedge d\varphi$, which one can easily arrange to be non-zero. From Lemma 23 below we can thus infer that $(\nabla_{h_1} \varphi)^{\flat_{h_2}} \notin \overline{\{d\varphi\}}^{L^2(T^*M, h_2^*, \mu_{h_2})}$. As $C\mu_{h_2} = C\nu^n \mu_{h_1}$, with $n = \dim(M)$, the topologies on $L^2(T^*M, h_2^*, C\mu_{h_1})$ and $L^2(T^*M, h_2^*, \mu_{h_2})$ coincide, so one can conclude that $\nu^2 d\varphi$ is not an element of $T_\mu W(M, h_2)$.

Lemma 23. If $\omega$ is a smooth 1-form on $M$ with $d\omega \neq 0$, then $\omega \notin \overline{\{d\varphi\}}^{L^2(T^*M, h^*, \mu_h)}$, where $\mu_h$ is the volume measure on $M$ with respect to $h$.

Proof. Assuming the opposite and using the standard inner products, one gets the following contradiction:
$$0 \neq (d\omega, d\omega) = (\omega, d^* d\omega) = \lim (d\varphi_n, d^* d\omega) = \lim (d d\varphi_n, d\omega) = \lim 0 = 0. \tag{10}$$

Lemma 24. In the situation of Theorem 22 and interpreting $dF$ as a map of $L^2$-one-forms, we have $dF_\mu(\omega) = \nu^2 \omega$.

Proof. Every vector field $v$ on $M$ corresponds to the covector field $\omega$ by $\omega(w) = h_1(v, w)$. A change of the Riemannian metric $h_1$ to $h_2 = \nu^2 h_1$ yields $h_2(v, w) = \nu^2 h_1(v, w) = h_1(v, \nu^2 w) = \omega(\nu^2 w) = \nu^2 \omega(w)$, so with respect to $h_2$, $v$ corresponds to $\nu^2 \omega$.

3.3 Differentiable maps between Wasserstein spaces

As we have seen in Subsection 3.2, the conditions of Theorem 16 do not guarantee $dF_\mu(v) \in T_{F(\mu)} W(N)$, even though this property is necessary for a meaningful definition of the differential of $F$. To help us here, we use the fact that $L^2(\nu) = T_\nu W(N) \oplus T_\nu^\perp W(N)$ for every $\nu \in W(N)$ and compose $dF$ with a projection onto $T_{F(\mu)} W(N)$, so that at least $P_{F(\mu)} \circ dF_\mu : T_\mu W(M) \to T_{F(\mu)} W(N)$ is a linear and bounded map between $T_\mu W(M)$ and $T_{F(\mu)} W(N)$.

Definition 25.

We call $P_\mu$ the orthogonal linear projection
$$P_\mu : L^2(\mu) \longrightarrow T_\mu W(M), \qquad v \mapsto v^\top,$$
where $v = v^\top + v^\perp$, with $v^\top \in T_\mu W(M)$ and $v^\perp \in T_\mu^\perp W(M)$.

Proposition 26.

For every a.c. couple $(\mu_t, v_t)$, $(\mu_t, P_{\mu_t}(v_t))$ is a tangent couple.

Proof. Let $(\mu_t, v_t)$ be an a.c. couple. Then, for $v_t = v_t^\top + v_t^\perp$, we have
$$\frac{d}{dt}\mu_t + \nabla \cdot (v_t^\top \mu_t) = \frac{d}{dt}\mu_t + \nabla \cdot ((v_t^\top + v_t^\perp)\mu_t) = 0,$$
since $\nabla \cdot (v_t^\perp \mu_t) = 0$ in the distributional sense. And since $\| P_{\mu_t}(v_t) \|_{L^2(\mu_t)} \leq \| v_t \|_{L^2(\mu_t)}$, we also have $\| P_{\mu_t}(v_t) \|_{L^2(\mu_t)} \in L^1(0, 1)$. Thus, $(\mu_t, P_{\mu_t}(v_t))$ is an a.c. couple and, with Remark 11, a tangent couple.

With the observations we have collected so far, we can finally give our definition of a differentiable map between Wasserstein spaces.

Definition 27 (Differentiable map between Wasserstein spaces). An absolutely continuous map $F : W(M) \to W(N)$ is called differentiable in case for every $\mu \in W(M)$ there exists a bounded linear map $dF_\mu : T_\mu W(M) \to T_{F(\mu)} W(N)$ such that for every tangent couple $(\mu_t, v_t)$ the image curve $dF_{\mu_t}(v_t)$ is a tangent vector field of $F(\mu_t)$. In this way a bundle map $dF : TW(M) \to TW(N)$ is defined, which we call the differential of $F$. When we say that a map $F : W(M) \to W(N)$ is differentiable, we automatically mean that it is absolutely continuous in the first place.

Remark 28. The reader might be surprised that we only give a global definition of differentiability, without having started with a pointwise definition. The latter is difficult, if at all possible, since the tangent vector fields $v_t$ are only defined for a.e. $t \in [0, 1]$, so a pointwise evaluation of these is not well-defined. The situation would change if one were able to speak about continuous curves of tangent vector fields, but it does not seem to be easy to make this notion precise: for differing $t, t'$ the vector fields $v_t$ and $v_{t'}$ are elements of different tangent spaces, potentially even of different dimension, which is why the usual notion of continuity cannot be trivially applied.

Note again that $dF_{\mu_t}(v_t)$ is only well-defined almost everywhere, since $v_t$ is.
But this is not harmful to our definition, since in particular also the tangent vectors of $F(\mu_t)$ are only well-defined almost everywhere. In the same manner, however, Definition 27 does not guarantee uniqueness of $dF$ in a strict sense. (Here we mean that $dF = \widetilde{dF}$ whenever $dF_\mu(v) = \widetilde{dF}_\mu(v)$ for all $(\mu, v) \in TW(M)$.) But, after all, one can say that $dF$ is unique up to a "negligible" set.

Definition 29 (Negligible set). A subset $Z \subset TW(M)$ is called negligible whenever for every tangent couple $(\mu_t, v_t)$ the set $\{ t \in (0, 1) \mid (\mu_t, v_t) \in Z \}$ is of Lebesgue measure zero.

This definition respects the fact that the $v_t$ are only determined for almost every $t$, in the sense that changing the family $v_t$ on a set of times of measure zero does not change the measure of the set $\{ t \in (0, 1) \mid (\mu_t, v_t) \in Z \}$.

Proposition 30 (Uniqueness of the differential). The differential $dF$ of a differentiable map $F : W(M) \to W(N)$ is unique up to a redefinition on a negligible set $Z \subset TW(M)$.

(Footnote to Definition 27: "bundle map" is meant in our sense of the word "bundle".)

Proof.

Let $dF$ and $\widetilde{dF}$ be two pointwise linear bundle maps, $dF$ being the differential of an a.c. map $F$. We have to show that $dF$ and $\widetilde{dF}$ are both a differential of $F$ if and only if $\{ (\mu, v) \in TW(M) \mid dF_\mu(v) \neq \widetilde{dF}_\mu(v) \}$ is negligible.

First, let $dF$ and $\widetilde{dF}$ differ only on a negligible set. In this case, for each tangent couple $(\mu_t, v_t)$ the image velocities $\widetilde{dF}_{\mu_t}(v_t)$ differ from $dF_{\mu_t}(v_t)$ only on a null set of times and thus still equal the tangent vector fields along $F(\mu_t)$ almost everywhere. Conversely, let $dF$ and $\widetilde{dF}$ both fulfill the conditions of Definition 27. By definition, for each tangent couple $(\mu_t, v_t)$ both $dF_{\mu_t}(v_t)$ and $\widetilde{dF}_{\mu_t}(v_t)$ are equal almost everywhere to the tangent vectors along $F(\mu_t)$. Thus, for every tangent couple $(\mu_t, v_t)$, the set $\{ t \in (0, 1) \mid dF_{\mu_t}(v_t) \neq \widetilde{dF}_{\mu_t}(v_t) \}$ has Lebesgue measure zero.

Let us now analyse some properties of negligible sets.

Proposition 31.
1.) $T_\mu(W(M)) \setminus \{0\}$ is negligible, for every $\mu \in W(M)$. But $T_\mu(W(M))$ is not.
2.) The countable union of negligible sets is negligible.
3.) Every subset of a negligible set is negligible.
4.) The following is an equivalence relation on the set of mappings between tangent bundles on Wasserstein spaces: $F \sim G :\Leftrightarrow \{ (\mu, v) \in TW(M) \mid F(\mu, v) \neq G(\mu, v) \}$ is negligible.

Remark 32. Let $dF$ be a differential of a map $F : W(M) \to W(N)$. Then there are members of its equivalence class $[dF]$ which are not a differential of $F$, since not every member has to be pointwise linear and bounded. Restricting, however, the equivalence relation to the subset of pointwise linear and bounded maps between tangent bundles of Wasserstein spaces solves this issue. In this case $[dF]$ contains precisely all the possible differentials of $F$. Whenever we refer to a representative of $dF$, we mean an element of the latter equivalence class.

Proof.
1.) Let $(\mu_t, v_t)$ be a tangent couple, $v_t$ a fixed representative of the a.e.-defined family of velocities, and $T_\mu := \{ t \in (0, 1) \mid \mu_t = \mu, \ v_t \in T_\mu W(M) \}$ for some $\mu \in W(M)$. Let us further assume that $v_t \neq 0$ for every $t \in T_\mu$, which in particular means that $|\dot\mu_t| \neq 0$ for every $t \in T_\mu$. From this we can also infer that for no $t \in T_\mu$ there exists a neighborhood on which $\mu_t$ is constant. Let $a \in T_\mu$ be a point which is not isolated. This means that in every neighborhood of $a$ there is another point of $T_\mu$. The consequence of this would be that the metric derivative does not exist at that point, which we excluded in the definition of $T_\mu$. So $T_\mu$ must consist of only isolated points and thus must be countable, hence of Lebesgue measure zero. Choosing another representative of $v_t$ only changes the set $T_\mu$ by a null set. $T_\mu(W(M))$ itself is not negligible, since the constant curve $\mu_t = \mu$ is absolutely continuous with metric derivative $0$.
2.) This follows from the fact that any countable union of sets of measure zero is again of measure zero.
3.) Let $N$ be a subset of a negligible set and $(\mu_t, v_t)$ a tangent couple with a fixed representative $v_t$. The set of times where $(\mu_t, v_t) \in N$ is a subset of a set of measure zero. Since the Lebesgue measure is a complete measure, this subset is itself measurable and in particular of measure zero.
4.) This follows from 1.) and 2.)

The following corollary finally recovers the properties expected of a differential.

Corollary 33.
1.) If $F = f_\#$ and $f$ is as in Theorem 16, $F$ is differentiable with $dF_\mu = P_{F(\mu)} \circ \widehat{dF}_\mu$, where $P_{F(\mu)}$ is the orthogonal projection onto $T_{F(\mu)} W(N)$ from Proposition 26 and
$$\widehat{dF}_\mu(v)_y := \int_{f^{-1}(y)} df(v_x) \, d\mu_y(x),$$
as in formula (8). In case $f$ is a Riemannian isometry, the additional projection $P$ is not necessary, as we have seen in Corollary 21. Then, $dF_\mu = df$ for all $\mu \in W(M)$.
2.) In particular, the identity mapping $F(\mu) = \mu$ is differentiable with $dF_\mu(v) = v$ up to a negligible set.
3.)
Let $F : W(M) \to W(N)$ and $G : W(N) \to W(O)$ be two differentiable maps. Then also $G \circ F : W(M) \to W(O)$ is differentiable with $d(G \circ F)_\mu(v) = \big( dG_{F(\mu)} \circ dF_\mu \big)(v)$ up to a negligible set.
4.) If $F$ is differentiable and bijective with differentiable inverse $F^{-1}$, then $dF$ is also invertible with inverse $d(F^{-1})$, up to a negligible set.

Proof.
3.) We have to show that $dG_{F(\mu)} \circ dF_\mu : T_\mu W(M) \to T_{(G \circ F)(\mu)} W(O)$ is such that for every tangent couple $(\mu_t, v_t)$, also $\big( (G \circ F)(\mu_t), (dG_{F(\mu_t)} \circ dF_{\mu_t})(v_t) \big)$ is a tangent couple. So let $(\mu_t, v_t)$ be a tangent couple. Since $F$ is differentiable, we know that $(F(\mu_t), dF_{\mu_t}(v_t))$ is a tangent couple. Similarly, since $G$ is differentiable, also $\big( G(F(\mu_t)), dG_{F(\mu_t)}(dF_{\mu_t}(v_t)) \big)$ is a tangent couple. Since $G(F(\mu_t)) = (G \circ F)(\mu_t)$ and $dG_{F(\mu_t)}(dF_{\mu_t}(v_t)) = (dG_{F(\mu_t)} \circ dF_{\mu_t})(v_t)$, we have proven the claim.
4.) This is an immediate consequence of 2.) and 3.).

Remark 34. Let us again emphasize that this type of differentiability is highly tailored to the structure given by optimal transport. It knowingly does not fit into the framework of, e.g., [KM97]. Nevertheless, let us mention that also in this reference, the notion of differentiable maps between infinite dimensional manifolds is established via the property that differentiable curves should be mapped to differentiable curves.

3.4 Pullbacks and formal Riemannian isometries

As an application of the previous section, we propose a definition for the pullback of the formal Riemannian tensor on $W(M)$ and, furthermore, a definition for formal Riemannian isometries. As the formal Riemannian metric was defined by comparison of formulae to actual Riemannian structures (see Definition 13), taking pullbacks now gives rise to definitions of further possible formal Riemannian metrics on Wasserstein spaces, in cases where $dF_\mu$ is injective for every $\mu$, i.e. in case $F$ can be considered to be an immersion.

Definition 35 (Pullback of the formal Riemannian tensor). Let $F : W(N) \to W(M)$ be differentiable, $dF$ a fixed differential of $F$, $\mu \in W(N)$ and $H_{F(\mu)}$ the formal Riemannian metric tensor on $W(M)$ at the point $F(\mu) \in W(M)$. Then, for $v, w \in T_\mu W(N)$, the pullback $(F^* H)_\mu$ of $H_{F(\mu)}$ is defined as
$$(F^* H)_\mu(v, w) := H_{F(\mu)}(dF_\mu(v), dF_\mu(w)).$$
Unfortunately, this definition depends on the choice of the differential of $F$, which is, as we have seen, only unique up to a negligible set.

Definition 36 (Formal Riemannian isometry). Analogously to the finite dimensional case, we call a bijective differentiable map $F : W(M) \to W(M)$ with differentiable inverse a formal Riemannian isometry in case there is a representative of $dF$ such that for all $\mu \in W(M)$
$$(F^* H)_\mu(v, w) = H_\mu(v, w)$$
for all $(v, w) \in T_\mu W(M) \times T_\mu W(M)$.

It is straightforward to see that $F$ is a formal Riemannian isometry if and only if there is a representative of $dF$ such that for every $\mu \in W(M)$ the map $dF_\mu : T_\mu W(M) \to T_{F(\mu)} W(M)$ is a metric isometry with respect to the metrics induced by the $L^2$-norms.

Important formal Riemannian isometries are generated by the isometry group of the underlying metric space.
By means of the pushforward, $\mathrm{ISO}(M)$ acts isometrically also on $P_p$, and the map
$$G \times TW(M) \to TW(M), \qquad (g, (\mu, v)) \mapsto (g_\# \mu, dg(v))$$
defines an induced action of every subgroup $G$ of $\mathrm{ISO}(M)$ on the tangent bundle of $W(M)$, where we regard $dg$ as a differential of $g_\#$. It is quick to check that for $g \in \mathrm{ISO}(M)$, $g_\# : W(M) \to W(M)$ is a formal Riemannian isometry.

Lemma 37.

Let $g \in \mathrm{ISO}(M)$. Then $T_{g_\# \mu} W(M) = dg(T_\mu W(M))$ for all $\mu \in W(M)$. Here, we again regard $dg$ as a fixed differential of $g_\#$.

Proposition 38.

Every formal Riemannian isometry is an isometry in the metric sense of its Wasserstein space.

Proof.

Let $F$ be a formal Riemannian isometry. Since by definition $F$ is bijective with differentiable inverse, every a.c. couple $(\mu_t, v_t)$ can be represented as the image of another a.c. couple $(\tilde\mu_t, \tilde v_t)$: just choose $\tilde\mu_t := F^{-1}(\mu_t)$ and $\tilde v_t := d(F^{-1})(v_t)$. Then $\mu_t = F(\tilde\mu_t)$ and, using Corollary 33, $v_t = dF(\tilde v_t)$ almost everywhere. Conversely, every image of an a.c. couple, in the above sense, is an a.c. couple. Let $dF$ be a suitable representative. For $\mu, \nu \in W(M)$ and $\mu_t$ an a.c. curve connecting them, we then have according to 12:
$$W(F(\mu), F(\nu)) = \inf_{(F(\mu_t), dF(v_t))} \int_0^1 \sqrt{H_{F(\mu_t)}(dF(v_t), dF(v_t))} \, dt = \inf_{(\mu_t, v_t)} \int_0^1 \sqrt{H_{\mu_t}(v_t, v_t)} \, dt = W(\mu, \nu).$$

It would be interesting to find out whether the converse implication of Proposition 38 is true as well, as is the case for finite dimensional Riemannian manifolds.
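The isometric action of $\mathrm{ISO}(M)$ by pushforward can be illustrated numerically in the simplest setting. The following is only a minimal sketch under assumptions that are not part of the paper: it takes $M = \mathbb{R}$, restricts to finitely supported measures, and uses the fact that on the real line the monotone coupling is optimal for the cost $|x - y|^p$ with $p \geq 1$. The helper names `wasserstein_1d` and `pushforward` are hypothetical and chosen for this illustration.

```python
def wasserstein_1d(mu, nu, p=2.0):
    """W_p between two finitely supported probability measures on the real line.

    mu, nu: lists of (position, weight) pairs, weights summing to 1.
    Uses the monotone (sorted) coupling, which is optimal in one dimension.
    """
    xs, ys = sorted(mu), sorted(nu)
    i = j = 0
    wx, wy = xs[0][1], ys[0][1]
    cost = 0.0
    while i < len(xs) and j < len(ys):
        m = min(wx, wy)  # mass transported from xs[i] to ys[j]
        cost += m * abs(xs[i][0] - ys[j][0]) ** p
        wx -= m
        wy -= m
        if wx <= 1e-15:  # atom xs[i] is used up
            i += 1
            if i < len(xs):
                wx = xs[i][1]
        if wy <= 1e-15:  # atom ys[j] is used up
            j += 1
            if j < len(ys):
                wy = ys[j][1]
    return cost ** (1.0 / p)


def pushforward(g, mu):
    """Pushforward of a finitely supported measure under a map g."""
    return [(g(x), w) for x, w in mu]


mu = [(0.0, 0.5), (1.0, 0.5)]
nu = [(0.5, 0.25), (3.0, 0.75)]
g = lambda x: -x + 2.0  # a reflection followed by a translation: an isometry of R

d_before = wasserstein_1d(mu, nu)
d_after = wasserstein_1d(pushforward(g, mu), pushforward(g, nu))
print(d_before, d_after)
```

In this example the two printed distances agree up to rounding, as expected for an isometry of the base space. Replacing `g` by a non-isometric map such as `lambda x: 2.0 * x` changes the distance, here by the factor 2.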

3.5 Convex mixing of maps

In the examples so far, we have only been concerned with maps $F : W(M) \to W(N)$ which are induced by maps $f : M \to N$. Now one could wonder what a map $F$ which is not of this type could look like and what its differentiability properties are. As a first hint, we recall that whenever there is an $f : M \to N$ such that $F = f_\#$, then for $x \in M$ we have $F(\delta_x) = \delta_{f(x)}$. Based on this, we can construct the following examples.

Example 39.
• If $F(\mu) = \mu_0$ is a constant map such that $\mu_0 \neq \delta_y$ for all $y \in N$, then there exists no map $f : M \to N$ such that $F = f_\#$. In case $F(\mu) = \delta_y$, we have $F = f_\#$ with $f(x) = y$ for all $x \in M$.
• Let $F_i : W(M) \to W(N)$, $i = 1, 2$, be such that they do not coincide on $\{ \delta_x \mid x \in M \}$. Then the mixing of measures $F := (1 - \lambda) F_1 + \lambda F_2$, for $0 < \lambda < 1$, cannot be a pushforward of measures.

Remark 40. Another way to think about this issue is the following: every map $F : W(M) \to W(N)$ has a decomposition into a map $\tilde F : W(M) \to P(M \times N)$ with $\pi^1_\# \tilde F(\mu) = \mu$ and the map $\pi^2_\# : P(M \times N) \to W(N)$, i.e. $F = \pi^2_\# \circ \tilde F$. Certainly, $\tilde F$ is not unique, but one can always choose $\tilde F(\mu) = \mu \otimes F(\mu)$. Thus, $F$ is a pushforward with respect to a map $f$ if and only if there exists a map $\tilde F$ such that $\tilde F(\mu) = (\mathrm{Id}, f)_\# \mu$. According to [AG13], Lemma 1.20, this is equivalent to saying that for every $\mu$ there exists a $\tilde F(\mu)$-measurable set $\Gamma \subset M \times N$ on which $\tilde F(\mu)$ is concentrated, such that for $\mu$-a.e. $x$ there exists only one $y = f(x) \in N$ with $(x, y) \in \Gamma$. And in this case, $\tilde F(\mu) = (\mathrm{Id}, f)_\# \mu$.

It is easy to see that any constant map $F : W(M) \to W(N)$, $\mu \mapsto \mu_0$, is differentiable with $dF = 0$ up to a negligible set. In the following we will investigate whether maps of the form $F = (1 - \lambda) F_1 + \lambda F_2$ are also differentiable. Let us start by asserting that the convex mixing of a.c. maps is a.c.

Proposition 41.

Let $F_i : W(M) \to W(N)$, $i = 1, 2$, be arbitrary a.c. maps. Then, for $0 \leq \lambda \leq 1$, also $F := (1 - \lambda) F_1 + \lambda F_2$ is a.c.

For the proof of Proposition 41 we will use that already the convex mixing of a.c. curves is a.c.

Lemma 42.

Let $\mu^1_t$ and $\mu^2_t$ be a.c. curves. Then also the convex mixing $\mu_t := (1 - \lambda)\mu^1_t + \lambda\mu^2_t$ with $0 \leq \lambda \leq 1$ is an a.c. curve.

Proof. Since the $\mu^i_t$ are a.c. curves, there is a $g_i \in L^1(0, 1)$ such that for every $s \leq t \in (0, 1)$
$$W\big(\mu^i_s, \mu^i_t\big) \leq \int_s^t g_i(\tau) \, d\tau.$$
Now let $\gamma_i \in \mathrm{Adm}(\mu^i_s, \mu^i_t)$. Then $(1 - \lambda)\gamma_1 + \lambda\gamma_2 \in \mathrm{Adm}(\mu_s, \mu_t)$. This is because for every measurable set $A$, and with $\pi^i$ the projection onto the $i$-th component,
$$\pi^1_\#\big((1 - \lambda)\gamma_1 + \lambda\gamma_2\big)(A) = \big((1 - \lambda)\gamma_1 + \lambda\gamma_2\big)\big((\pi^1)^{-1}(A)\big) = (1 - \lambda)\gamma_1\big((\pi^1)^{-1}(A)\big) + \lambda\gamma_2\big((\pi^1)^{-1}(A)\big) = \big((1 - \lambda)\mu^1_s + \lambda\mu^2_s\big)(A) = \mu_s(A).$$

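Similarly for $\pi^2$. Then for $\widetilde{\mathrm{Adm}}(\mu_s, \mu_t) := \{ (1 - \lambda)\gamma_1 + \lambda\gamma_2 \mid \gamma_i \in \mathrm{Adm}(\mu^i_s, \mu^i_t) \} \subset \mathrm{Adm}(\mu_s, \mu_t)$ we have
$$W(\mu_s, \mu_t)^2 = W\big((1 - \lambda)\mu^1_s + \lambda\mu^2_s, (1 - \lambda)\mu^1_t + \lambda\mu^2_t\big)^2 \leq \inf_{\pi \in \widetilde{\mathrm{Adm}}(\mu_s, \mu_t)} \int d^2(x, y) \, d\pi(x, y)$$
$$= (1 - \lambda) \inf_{\gamma_1 \in \mathrm{Adm}(\mu^1_s, \mu^1_t)} \int d^2(x, y) \, d\gamma_1 + \lambda \inf_{\gamma_2 \in \mathrm{Adm}(\mu^2_s, \mu^2_t)} \int d^2(x, y) \, d\gamma_2 = (1 - \lambda) W(\mu^1_s, \mu^1_t)^2 + \lambda W(\mu^2_s, \mu^2_t)^2.$$
This means that
$$W(\mu_s, \mu_t) \leq \sqrt{(1 - \lambda) W(\mu^1_s, \mu^1_t)^2 + \lambda W(\mu^2_s, \mu^2_t)^2} \leq \sqrt{1 - \lambda} \, W(\mu^1_s, \mu^1_t) + \sqrt{\lambda} \, W(\mu^2_s, \mu^2_t)$$
$$\leq \sqrt{1 - \lambda} \int_s^t g_1(\tau) \, d\tau + \sqrt{\lambda} \int_s^t g_2(\tau) \, d\tau = \int_s^t \big( \sqrt{1 - \lambda} \, g_1 + \sqrt{\lambda} \, g_2 \big) \, d\tau.$$

Before continuing with the proof of Proposition 41 we give this immediate corollary from the proof of Lemma 42.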
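Before stating it, the mixing estimate obtained in the proof of Lemma 42 can be checked on a small example. The sketch below is an illustration under assumptions not made in the paper: the base space is the real line, all measures are finitely supported, and the distance is computed with the monotone coupling, which is optimal in one dimension. The function names `wasserstein_1d` and `mix` and the sample measures are hypothetical.

```python
def wasserstein_1d(mu, nu, p=2.0):
    """W_p between finitely supported probability measures on the real line.

    mu, nu: lists of (position, weight) pairs, weights summing to 1.
    The monotone coupling used here is optimal for |x - y|^p with p >= 1.
    """
    xs, ys = sorted(mu), sorted(nu)
    i = j = 0
    wx, wy = xs[0][1], ys[0][1]
    cost = 0.0
    while i < len(xs) and j < len(ys):
        m = min(wx, wy)  # mass moved from xs[i] to ys[j]
        cost += m * abs(xs[i][0] - ys[j][0]) ** p
        wx -= m
        wy -= m
        if wx <= 1e-15:
            i += 1
            if i < len(xs):
                wx = xs[i][1]
        if wy <= 1e-15:
            j += 1
            if j < len(ys):
                wy = ys[j][1]
    return cost ** (1.0 / p)


def mix(mu_a, mu_b, lam):
    """Convex mixing (1 - lam) * mu_a + lam * mu_b of two discrete measures."""
    return [(x, (1.0 - lam) * w) for x, w in mu_a] + [(x, lam * w) for x, w in mu_b]


# Two pairs of measures: (mu1_s, mu1_t) and (mu2_s, mu2_t).
mu1_s, mu1_t = [(0.0, 0.5), (1.0, 0.5)], [(0.5, 1.0)]
mu2_s, mu2_t = [(2.0, 1.0)], [(3.0, 0.25), (4.0, 0.75)]
lam, p = 0.3, 2.0

lhs = wasserstein_1d(mix(mu1_s, mu2_s, lam), mix(mu1_t, mu2_t, lam), p)
w1 = wasserstein_1d(mu1_s, mu1_t, p)
w2 = wasserstein_1d(mu2_s, mu2_t, p)
convex_bound = ((1.0 - lam) * w1 ** p + lam * w2 ** p) ** (1.0 / p)
root_bound = (1.0 - lam) ** (1.0 / p) * w1 + lam ** (1.0 / p) * w2
print(lhs, convex_bound, root_bound)
```

The three printed numbers should be non-decreasing from left to right, up to rounding: the first inequality is the convexity estimate from the proof, the second is the subadditivity of the $p$-th root.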

Corollary 43.

Let $(X, d)$ be a metric space and $\mu_1, \mu_2, \mu_3, \mu_4$ four probability measures on $X$. Then,
$$W_p\big((1 - \lambda)\mu_1 + \lambda\mu_2, (1 - \lambda)\mu_3 + \lambda\mu_4\big) \leq \sqrt[p]{1 - \lambda} \, W_p(\mu_1, \mu_3) + \sqrt[p]{\lambda} \, W_p(\mu_2, \mu_4).$$

Proof of Proposition 41.

Let $\mu_t$ be an a.c. curve. Then by definition $F_i(\mu_t)$, $i = 1, 2$, are a.c. curves. From Lemma 42 we now know that also $F(\mu_t)$ is an a.c. curve.

Theorem 44.

Let $F_i : W(M) \to W(N)$, $i = 1, 2$, be two differentiable maps. Then $F = (1 - \lambda) F_1 + \lambda F_2$ is differentiable.

Since we have already seen that under the conditions of Theorem 44 $F$ is a.c., as both $F_i$ are a.c., we know that $F$ maps a.c. curves to a.c. curves. We know further that along each of these a.c. image curves there has to be a tangent vector field. To find the tangent map, which maps curves of tangent vector fields along a.c. curves to the corresponding curves of tangent vector fields along the image a.c. curves, i.e. to prove the theorem, we first give a formula for a canonical image tangent vector field.

Lemma 45.

Let $F_i : W(M) \to W(N)$, $i = 1, 2$, be two differentiable maps. For an a.c. curve $\gamma_t$ in $W(M)$, we define the a.c. curves $\mu_t := F_1(\gamma_t)$, $\nu_t := F_2(\gamma_t)$ and $\alpha_t := \lambda\mu_t + (1 - \lambda)\nu_t$ in $W(N)$. By the Lebesgue decomposition theorem, the measures $\mu_t$ and $\nu_t$ give rise to unique measures $\tau^\mu_t$, $\tau^\nu_t$, $\beta_t$ and Radon-Nikodym derivatives $\rho_t$ such that
1. For each $t$ the measures $\tau^\mu_t$, $\tau^\nu_t$ and $\beta_t$ are mutually singular: there exist Borel subsets $A_t, B_t, C_t$ that are pairwise disjoint with union $N$ such that $B_t$ and $C_t$ are nullsets for $\tau^\mu_t$, $A_t$ and $C_t$ are nullsets for $\tau^\nu_t$, and $A_t$, $B_t$ are nullsets for $\beta_t$.
2. $\mu_t = \tau^\mu_t + \beta_t$.
3. $\nu_t = \tau^\nu_t + \rho_t \beta_t$.
4. $\rho_t$ is zero only on a nullset of $C_t$.
If furthermore $v_t$ is a tangent vector field for $\mu_t$ and $w_t$ is an accompanying vector field for $\nu_t$, we can give the formula for a canonical accompanying vector field $u_t \in L^2(N, \alpha_t)$ for $\alpha_t$ as
$$u_t(x) := \begin{cases} v_t(x), & x \in A_t, \\ w_t(x), & x \in B_t, \\ \dfrac{\lambda v_t(x) + \rho_t (1 - \lambda) w_t(x)}{\lambda + (1 - \lambda)\rho_t}, & x \in C_t. \end{cases}$$

Proof.

Since $\frac{d}{dt}\alpha_t$ is linear in $\alpha_t$, the continuity equation for $(\alpha_t, u_t)$ is satisfied if and only if
$$\int_0^T \int_N h(\nabla\varphi(x, t), u_t(x)) \, d\alpha_t \, dt = \int_0^T \left( \int_N h(\nabla\varphi(x, t), v_t(x)) \, \lambda \, d\mu_t + h(\nabla\varphi(x, t), w_t(x)) \, (1 - \lambda) \, d\nu_t \right) dt \qquad (11)$$
for all $\varphi \in C_c^\infty((0, T) \times N)$ and $u_t \in L^2(N, \alpha_t)$.

Let us first check that $u_t \in L^2(TN, \alpha_t)$. Since $N = A_t \,\dot\cup\, B_t \,\dot\cup\, C_t$, the condition can be checked separately on $A_t$, $B_t$ and $C_t$. First,
$$\int_{A_t} |u_t(x)|^2 \, d\alpha_t = \int_{A_t} |v_t(x)|^2 \, \lambda \, d\mu_t < \infty,$$
and similarly for $B_t$. To check the situation on $C_t$, we start with
$$\int_{C_t} |u_t(x)|^2 \, d\alpha_t = \int_{C_t} \frac{|\lambda v_t(x) + (1 - \lambda)\rho_t w_t(x)|^2}{(\lambda + (1 - \lambda)\rho_t)^2} \, (\lambda \, d\beta_t + (1 - \lambda)\rho_t \, d\beta_t) \qquad (12)$$
$$\leq 2 \int_{C_t} \left( \frac{\lambda}{\lambda + (1 - \lambda)\rho_t} |v_t(x)|^2 \lambda + \frac{(1 - \lambda)\rho_t}{\lambda + (1 - \lambda)\rho_t} |w_t(x)|^2 (1 - \lambda)\rho_t \right) d\beta_t. \qquad (13)$$
Now it holds that $\frac{\lambda}{\lambda + (1 - \lambda)\rho_t} \leq 1$ and $\int_{C_t} |v_t(x)|^2 \, d\beta_t < \infty$ (as one summand in the $L^2$-norm of $v_t$ with respect to $\mu_t$). The second summand is treated similarly, so we see that the whole expression in Equation (13) is finite.

Let us now check Equation (11). This can be done separately for (almost all) $t \in [0, T]$ and again separately for the integrals over $A_t$, $B_t$, $C_t$. On $A_t$, Equation (11) holds because here $u_t = v_t$ and $\alpha_t = \lambda\mu_t = \lambda\tau^\mu_t$, whereas $\nu_t(A_t) = 0$. A similar argument works on $B_t$. On $C_t$, formally,
$$u_t \, d\alpha_t = \frac{\lambda v_t + (1 - \lambda)\rho_t w_t}{\lambda + (1 - \lambda)\rho_t} \, d(\lambda\beta_t + (1 - \lambda)\rho_t\beta_t) = (\lambda v_t + (1 - \lambda)\rho_t w_t) \, d\beta_t = v_t \, \lambda \, d\mu_t + w_t \, (1 - \lambda) \, d\nu_t.$$

Proof of Theorem 44.

First, we need to check that $u_t$ is indeed an accompanying vector field for $\alpha_t$, i.e. that $\| u_t \|_{L^2(\alpha_t)} \in L^1(0, 1)$, so that its projection onto the tangent spaces is indeed a tangent vector field along $\alpha_t$. Since $N = A_t \,\dot\cup\, B_t \,\dot\cup\, C_t$,
$$\| u_t \|_{L^2(\alpha_t)} = \| u_t|_{A_t} + u_t|_{B_t} + u_t|_{C_t} \|_{L^2(\alpha_t)} \leq \| u_t|_{A_t} \|_{L^2(\alpha_t)} + \| u_t|_{B_t} \|_{L^2(\alpha_t)} + \| u_t|_{C_t} \|_{L^2(\alpha_t)}$$
$$\leq \sqrt{\lambda} \, \| v_t \|_{L^2(\mu_t)} + \sqrt{1 - \lambda} \, \| w_t \|_{L^2(\nu_t)} + \| u_t|_{C_t} \|_{L^2(\alpha_t)}. \qquad (14)$$
We know of the first two summands in Equation (14) that their $L^1(0, 1)$-norm is finite, as we demanded $v_t$ and $w_t$ to be accompanying vector fields. It thus suffices to show the finiteness of the $L^1(0, 1)$-norm of the last summand. Here, we find with $\bar\rho_{t,\lambda} := \frac{1}{\lambda + (1 - \lambda)\rho_t}$,
$$\| u_t|_{C_t} \|_{L^2(\alpha_t)} = \| (\lambda v_t + (1 - \lambda)\rho_t w_t)|_{C_t} \|_{L^2(\bar\rho_{t,\lambda} d\beta_t)} \leq \| \lambda v_t|_{C_t} \|_{L^2(\bar\rho_{t,\lambda} d\beta_t)} + \| (1 - \lambda)\rho_t w_t|_{C_t} \|_{L^2(\bar\rho_{t,\lambda} d\beta_t)}.$$
We have encountered both of those last summands in the proof of Lemma 45, and analogously to there (where we concluded the finiteness of the $L^2$-norm), we can now conclude the finiteness of the $L^1(0, 1)$-norm of these summands and thus the claim that $\| u_t \|_{L^2(\alpha_t)} \in L^1(0, 1)$.

Finally, observe that the construction of $u_t$ from $(v_t, w_t)$ is a linear and bounded map $A_\lambda : L^2(N, \mu_t) \oplus L^2(N, \nu_t) \to L^2(N, \alpha_t)$, as the formula in the proof of the $L^2$-property of $u_t$ shows. Composition of $A_\lambda$ with $dF \oplus dG$ and the projection to the tangent space then defines the derivative of $\lambda F + (1 - \lambda) G$ and shows that this convex combination is differentiable.

A Disintegration theorem

To be able to prove Theorem 16, we rely on the following statement (see [AGS08]).

Theorem 46.

Let $X$ and $Y$ be Radon spaces. Furthermore, let $\mu \in P(X)$ and let $f : X \to Y$ be a measurable map. Then there exists an $f_\#\mu$-almost everywhere uniquely determined family of probability measures $\{\mu_y\}_{y \in Y}$ on $X$ such that
• for every measurable set $A \subset X$ the map $y \mapsto \mu_y(A)$ is measurable,
• $\mu_y(X \setminus f^{-1}(y)) = 0$ for $f_\#\mu$-almost every $y \in Y$,
• for every measurable function $g : X \to [0, \infty]$ we have
$$\int_X g(x) \, d\mu(x) = \int_Y \int_{f^{-1}(y)} g(x) \, d\mu_y(x) \, df_\#\mu(y).$$
This means in particular that any $\mu \in P(X \times Y)$ whose first marginal $\nu$ is given can be represented in this disintegrated way.

On the other hand, whenever a measurable (in the sense of the first item above) family $\mu_x \in P(Y)$ is given, for any $\nu \in P(X)$ the following formula defines a unique measure $\mu \in P(X \times Y)$:
$$\mu(f) = \int_X \left( \int_Y f(x, y) \, d\mu_x(y) \right) d\nu(x),$$
with $f : X \times Y \to \mathbb{R}$ being a nonnegative measurable function. In this sense, disintegration can be seen as an opposite procedure to the construction of a product measure.

References

[AG13] Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013.

[AGS08] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows. Birkhäuser, 2008.

[BB99] Jean-David Benamou and Yann Brenier. A numerical method for the optimal time-continuous mass transport problem and related problems. Contemporary Mathematics, 226:1–12, 1999.

[Gig08] Nicola Gigli. On the geometry of the space of measures in $\mathbb{R}^d$ endowed with the quadratic optimal transportation distance. PhD thesis, Scuola Normale Superiore, Pisa, 2008.

[Gig12] Nicola Gigli. Second Order Analysis on $(P_2(M), W_2)$. American Mathematical Society, 2012.

[KM97] Andreas Kriegl and Peter Michor. The Convenient Setting of Global Analysis. American Mathematical Society, Sep. 1997.

[Lot07] John Lott. Some geometric calculations on Wasserstein space. Communications in Mathematical Physics, 277(2):423–437, Nov. 2007.

[Ott01] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations, 26(1-2):101–174, Jan. 2001.

[Vil03] Cédric Villani. Topics in Optimal Transportation (Graduate Studies in Mathematics, Vol. 58). American Mathematical Society, 2003.

[Vil08] Cédric Villani.