The dynamical Schrödinger problem in abstract metric spaces

Abstract

In this paper we introduce the dynamical Schrödinger problem on abstract metric spaces, defined for a wide class of entropy and Fisher information functionals. Under very mild assumptions we prove a generic Gamma-convergence result towards the geodesic problem as the noise parameter $\varepsilon \downarrow 0$. We also investigate the connection with geodesic convexity of the driving entropy, and study the dependence of the entropic cost on the parameter $\varepsilon$. Some examples and applications are discussed.


arXiv [math.FA], Dec

Léonard Monsaingeon ∗, Luca Tamanini †, Dmitry Vorotnikov ‡

MSC [2020]: 49J45, 49Q20, 58B20.

Keywords: Benamou-Brenier formulation, Schrödinger bridge, Fisher information, Gamma-convergence, geodesic convexity, gradient flows, optimal transport

Contents

Γ-convergence of the Schrödinger problem
4.2 Displacement convexity
Examples

∗ GFM Universidade de Lisboa, Campo Grande, Edifício C6, 1749-016 Lisboa, Portugal and IECL Université de Lorraine, F-54506 Vandoeuvre-lès-Nancy Cedex, France. email: [email protected]
† CEREMADE (UMR CNRS 7534), Université Paris Dauphine PSL, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France and INRIA-Paris, MOKAPLAN, 2 Rue Simone Iff, 75012, Paris, France. email: [email protected]
‡ University of Coimbra, CMUC, Department of Mathematics, 3001-501 Coimbra, Portugal. email: [email protected]

Gaspard Monge and Erwin Schrödinger came up with two a priori unrelated problems that are concerned with finding a preeminent way of deforming a prescribed probability distribution into another one. While Monge was interested in optimizing the cost of transportation of goods [62, 63, 58], Schrödinger's original thought experiment [59, 60] aimed at finding the most likely evolution between two subsequent observations of a cloud of independent particles. So, even if in both cases we are facing an interpolation and optimization problem, the former is deterministic in nature whereas the latter is strongly related to large deviations theory and is, at first glance, purely stochastic. We refer to the recent survey [22] for various formulations and aspects of the Schrödinger problem, and to [64, 65] for a discussion of its role in Euclidean Quantum Mechanics.

Nonetheless, several analogies and connections exist between the two problems. They can be appreciated by looking carefully at the interpolation aspects of both problems, namely at their dynamical formulations and the underlying equations governing the respective evolutions. In the case of a quadratic transportation cost over a Riemannian manifold $M$, the Monge-Kantorovich optimal transport problem is solved (at least in a weak sense) by interpolating between the source and the target distributions with a constant-speed, length-minimizing geodesic in the Otto-Wasserstein space of probability measures $\mathcal{P}(M)$. This gives a curve $(\mu_t)_{t\in[0,1]} \subset \mathcal{P}(M)$ which formally satisfies (in the sense of the celebrated Otto calculus [55, 62, 63])
\[ \nabla_{\dot\mu_t}\dot\mu_t = 0, \tag{1.1} \]
where $\nabla_{\dot\mu_t}$ is the covariant derivative along the curve $t \mapsto \mu_t$. The Schrödinger problem with parameter $\varepsilon$ (from a physical viewpoint, $\varepsilon$ can be seen as a temperature or level of noise) can also be translated into such a geometric language.
By analogy, when looking at the covariant derivative along the optimal evolution $(\mu^\varepsilon_t)_{t\in[0,1]}$, usually called Schrödinger bridge or entropic interpolation, the resulting equation is surprising and can be viewed [23] as Newton's second law
\[ \nabla_{\dot\mu^\varepsilon_t}\dot\mu^\varepsilon_t = \frac{\varepsilon^2}{8}\nabla I(\mu^\varepsilon_t), \tag{1.2} \]
where on the right-hand side $\nabla$ denotes the gradient in the Otto-Wasserstein pseudo-Riemannian sense and $I$ is the Fisher information
\[ I(\mu) = 4\int_M |\nabla\sqrt{\rho}|^2\,\mathrm{d vol} = \int_M |\nabla\log\rho|^2\rho\,\mathrm{d vol}, \qquad \text{provided } \mu = \rho\cdot\mathrm{vol}. \]
A related observation is that the (scaled) heat flow, coinciding with the (scaled) gradient flow of the Boltzmann-Shannon entropy [40, 63]
\[ H(\mu) = \int_M \rho\log\rho\,\mathrm{d vol}, \qquad \mu = \rho\cdot\mathrm{vol}, \]
is also a solution to (1.2): a simple differentiation in time of $\dot\mu_t = -\frac{\varepsilon}{2}\nabla H(\mu_t)$ and the fact that $I = |\nabla H|^2$ in the Otto-Wasserstein sense automatically yield
\[ \nabla_{\dot\mu_t}\dot\mu_t = \frac{\varepsilon}{2}\nabla^2 H(\mu_t)\cdot\frac{\varepsilon}{2}\nabla H(\mu_t) = \frac{\varepsilon^2}{8}\nabla|\nabla H(\mu_t)|^2 = \frac{\varepsilon^2}{8}\nabla I(\mu_t). \]
This shows that the Schrödinger problem lies between optimal transport and diffusion and is naturally intertwined with both deterministic behaviour and Brownian motion. It shares the same Newton's law as the gradient flow of the entropy, but unlike the heat flow it has a prescribed final configuration to match: it is up to the parameter $\varepsilon$ to tip the balance in favour of deterministic transport or diffusion. With this heuristic in mind, we see that as $\varepsilon \to 0$ the applied force $\frac{\varepsilon^2}{8}\nabla I(\mu^\varepsilon_t)$ in (1.2) vanishes, so that the Schrödinger problem may be interpreted as a noisy (entropic) counterpart of the Monge-Kantorovich optimal transport, corresponding to the unforced geodesic evolution (1.1) discussed above. This informal relationship has a rigorous counterpart, which dates back to the pioneering works on the asymptotic behavior of the Schrödinger problem as $\varepsilon \to 0$ of T. Mikami, M. Thieullen [50, 51], and C. Léonard [44, 45].
This was subsequently developed in [19, 8, 38]. Very recently [47, 30], similar small-noise results were obtained for static Monge-Kantorovich problems regularized with more general entropies.

This first connection can be investigated further, and by doing so one can remark that (1.2) is exactly the Euler-Lagrange optimality equation for the dynamical Benamou-Brenier formulation of the Schrödinger problem [18, 22, 45, 39], which consists in minimizing the Lagrangian kinetic action perturbed by the Fisher information. In more precise terms,
\[ \inf\left\{ \frac12\int_0^1\!\!\int_M |v_t|^2\,\mathrm{d}\mu_t\,\mathrm{d}t + \frac{\varepsilon^2}{8}\int_0^1 I(\mu_t)\,\mathrm{d}t \right\}, \tag{1.3} \]
where the infimum runs over all solutions of the continuity equation $\partial_t\mu_t + \mathrm{div}(v_t\mu_t) = 0$ with prescribed initial and final densities. Also from this variational standpoint the reader can see that as $\varepsilon \to 0$ the Schrödinger problem formally reduces to
\[ \inf\ \frac12\int_0^1\!\!\int_M |v_t|^2\,\mathrm{d}\mu_t\,\mathrm{d}t, \tag{1.4} \]
namely to the dynamical Benamou-Brenier formulation of the (quadratic) optimal transport problem [9]. This variational representation depicts more clearly than (1.2) the double nature of the Schrödinger problem, the competition between the determinism encoded in the kinetic energy and the unpredictability coming from the Fisher information, and the role played by $\varepsilon$ in balancing these two opposite behaviours.

The double bond of the Schrödinger problem with optimal transport on the one hand and heat flow on the other hand results in fruitful and wide-ranging applications of both theoretical and applied interest. Indeed, from the connection with the heat flow the solutions to the Schrödinger problem gain regularity properties which are not available in optimal transport, and thanks to the asymptotic behaviour of the Schrödinger problem as $\varepsilon \to 0$ entropic interpolations represent an efficient way to approximate Wasserstein geodesics with second-order accuracy [38, 25].
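As a hedged illustration of how the Fisher term in (1.3) perturbs the kinetic action (1.4), the following minimal sketch evaluates the functional along one explicit curve. Everything specific here is an assumption made for illustration and is not taken from the paper: the curve consists of centered one-dimensional Gaussians $N(0, \sigma_t^2)$ with $\sigma_t$ interpolating linearly between `s0` and `s1`, for which the Wasserstein metric speed is $|\dot\sigma_t|$ and the Fisher information is $1/\sigma_t^2$; the normalization $\varepsilon^2/8$ of the Fisher term is assumed; the function name `gaussian_action` is hypothetical. Since this curve is in general not the minimizer for $\varepsilon > 0$, the returned value is only an upper bound for the entropic cost.

```python
def gaussian_action(eps, s0=1.0, s1=2.0, n=1000):
    """Kinetic-plus-Fisher action along a curve of centered 1D Gaussians.

    Assumptions (illustrative only): the curve is N(0, sigma_t^2) with
    sigma_t = (1 - t) * s0 + t * s1, its Wasserstein metric speed is
    |d sigma_t / dt|, and its Fisher information is 1 / sigma_t^2.
    """
    dt = 1.0 / n
    speed = s1 - s0
    kinetic = 0.5 * speed ** 2            # constant-speed curve on [0, 1]
    fisher = 0.0
    for i in range(n):
        sigma = s0 + (i + 0.5) * dt * speed   # midpoint quadrature node
        fisher += dt / sigma ** 2
    return kinetic + eps ** 2 / 8.0 * fisher


if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.1, 0.0):
        print(eps, gaussian_action(eps))
```

In this toy case the value decreases to the purely kinetic action $\frac12 (s_1 - s_0)^2$ as $\varepsilon \to 0$, which is the formal reduction of (1.3) to (1.4) described above.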
This approach has already turned out to be successful in conjunction with functional inequalities [24, 34] and differential calculus along Wasserstein geodesics [38]. But the nice behaviour of Schrödinger bridges is important also for computational purposes. The impact of the Schrödinger problem and Sinkhorn algorithms (deeply related to the static formulation of the former) on the numerical methods used in optimal transportation theory has been impressive, as witnessed by several recent works (see [57] and references therein as well as [26, 10, 12, 13, 11, 29]).

As a matter of fact, neither the particular structure of the Wasserstein space nor the specific choice of the Boltzmann-Shannon functional is required to define the two problems in question (cf. a related discussion in the heuristic paper [43]): one can of course define length-minimizing geodesics in any metric space $(X, d)$, and the Schrödinger problem (or at least its Benamou-Brenier formulation described above) merely involves an entropy functional and a corresponding Fisher information. Given such a reasonable entropy functional $E$ on $X$ that generates a gradient flow in a suitable sense, the corresponding Fisher information is expected to be nothing but the dissipation rate of $E$ (along solutions of its own gradient flow), just as $I$ coincides with the rate of dissipation of the entropy $H$ along the heat flow. This observation is the starting point of the present paper, where we intend to study the abstract Schrödinger bridge problem or, in other words, the entropic approximation of geodesics in metric spaces.

Under very mild assumptions on $X$ and $E$, we will prove the solvability of the abstract $\varepsilon$-Schrödinger problem and the $\Gamma$-convergence to the corresponding geodesic problem as $\varepsilon \to 0$. We will also rigorously justify, in the metric setting, that any trajectory of a gradient flow solves an associated Schrödinger problem.
Leveraging a quantitative $AC^2$ estimate based on a straightforward chain-rule in the smooth Riemannian setting, the cornerstone of our analysis will be the systematic construction of an $\varepsilon$-regularized entropic copy $(\gamma^\varepsilon_t)_{t\in[0,1]}$ of any arbitrary curve $(\gamma_t)_{t\in[0,1]}$. These perturbed curves will provide recovery sequences for the $\Gamma$-convergence. Our construction is completely Eulerian and essentially consists in running the $E$-gradient flow for a short time $h_\varepsilon(t)$ starting at $\gamma_t$ for all $t$, for well-chosen functions $h_\varepsilon \ge 0$. The challenge here will be to reproduce the (formal, differential) Riemannian chain-rule in metric spaces. Notably, an analogous pseudo-Riemannian idea has recently been used by A. Baradat and some of the authors [8, 53] in order to prove the $\Gamma$-convergence for the classical dynamical Schrödinger problem on the Otto-Wasserstein space and for its counterpart on the non-commutative Fisher-Rao space, respectively. However, in those papers the computations were ad hoc and heavily exploited the underlying structures of the particular spaces as well as the properties of the particular flows (namely, of the classical heat flow and of its restriction to multivariate Gaussians), whereas here we derive everything from the existence of an abstract gradient flow on $X$ driven by $E$.

Another notion of paramount importance herein will be convexity. In the smooth Riemannian setting, and given $\lambda \in \mathbb{R}$, elementary calculus shows that the $\lambda$-convexity of $E$ along geodesics is of course equivalent to a uniform lower bound $\mathrm{Hess}\,E(x) \ge \lambda\,\mathrm{Id}$ as quadratic forms in the tangent space, but also, more importantly, to the $\lambda$-contractivity of the $E$-gradient flow. In the metric setting no second order calculus is available in general, and the very notion of gradient flow as well as its connection with geodesic convexity and contractivity become much more subtle. The key notion of gradient flow that we shall use throughout is that of Evolution Variational Inequality, or $EVI_\lambda$ flow [2]. Under reasonable assumptions it is well known that (a suitable variant of) convexity of $E$ generally provides existence of an $EVI_\lambda$-flow starting at any $x \in X$, see [2]. A natural question to ask is whether the converse also holds true, i.e. whether well-posedness of a reasonable gradient flow implies some convexity. This was proved in [15] for the specific case of the Euclidean Wasserstein space $X = W_2(\Omega)$, $\Omega \subset \mathbb{R}^d$, and at least for the so-called internal energies, and it is shown therein that $0$-contractivity of the gradient flow (or equivalently, of the associated nonlinear diffusion equation) implies $0$-displacement convexity in the sense of McCann [49]. In the same spirit, and building on Otto and Westdickenberg [56], Daneri and Savaré proved in a very general metric setting that the generation of an $EVI_\lambda$-flow indeed implies $\lambda$-geodesic convexity [27, Theorem 3.2]. A byproduct of our analysis for the $\Gamma$-convergence will give a new independent proof of this latter fact by a completely different approach, essentially by constructing an $\varepsilon$-entropic regularization of geodesics and carefully examining the defect of optimality at order one in $\varepsilon \to 0$.

As a main application of the $\Gamma$-convergence of the Schrödinger problem to the geodesic problem as $\varepsilon \to 0$ (and more generally of the $\varepsilon'$-Schrödinger problem to the $\varepsilon$-one as $\varepsilon' \to \varepsilon$) we investigate the behaviour of the optimal value of the dynamical Schrödinger problem, henceforth called entropic cost, as a function of the temperature parameter $\varepsilon$, with particular emphasis on the regularity and the small-noise regime. For the classical dynamical Schrödinger problem (1.3), it has recently been proved by the second author with G. Conforti [25] that the entropic cost is of class $C^1((0,\infty)) \cap C([0,\infty))$ (actually $C^1([0,\infty))$ under suitable assumptions) and twice a.e.
differentiable; once this regularity information is available, the formula for the first derivative is rather easy to guess, as by the envelope theorem it coincides with the partial derivative w.r.t. $\varepsilon$ of the functional in (1.3) evaluated at any critical point. Denoting by $C_\varepsilon(\mu, \nu)$ the value in (1.3) with marginal constraints $\mu$ and $\nu$ and by $(\mu^\varepsilon_t)_{t\in[0,1]}$ the associated Schrödinger bridge, this statement reads as
\[ \frac{\mathrm{d}}{\mathrm{d}\varepsilon} C_\varepsilon(\mu, \nu) = \frac{\varepsilon}{4}\int_0^1 I(\mu^\varepsilon_t)\,\mathrm{d}t, \qquad \forall \varepsilon > 0, \]
and in [25] this identity played an important role in the study of both the large- and small-noise behaviour of the Schrödinger problem, obtaining in particular a Taylor expansion around $\varepsilon = 0$ with $o(\varepsilon^2)$-accuracy. Since the central object in the present paper is an abstract and general formulation of (1.3), an analogous result is expected to hold. However, from a technical viewpoint the proof is much more subtle and challenging, because unlike (1.3) our metric version of the dynamical Schrödinger problem may have multiple solutions. For this reason the discussion about the regularity of the entropic cost in this paper is less concise than in [25]. Nonetheless, we are still able to deduce the same kind of Taylor expansion with the same accuracy. Given the previous interpretation of the Schrödinger problem as a noisy Monge-Kantorovich problem and the importance of quantitative estimates in approximating optimal transport by means of the Schrödinger problem, it is reasonable to expect that such a Taylor expansion (valid in a general framework for a wide choice of functionals $E$) will apply to a wide variety of examples, some of which will be discussed here.

The paper is organized as follows: In Section 2 we give a short and formal proof of our fundamental $AC^2$ estimate in the smooth Riemannian setting, and show how it can be exploited to establish $\Gamma$-convergence and convexity. Section 3 fixes the metric framework in which we work for the rest of the paper, and extends the previous estimate to this metric setting.
In Section 4 we prove the $\Gamma$-convergence as $\varepsilon \downarrow 0$ and establish the geodesic convexity. Section 5 studies the dependence of the optimal entropic cost on the temperature parameter $\varepsilon > 0$, and provides a second order expansion. Finally, we list in Section 6 several examples and applications covered by our abstract results.

Here we remain formal and the computations are carried out in a Riemannian setting, where classical calculus and chain-rules are available. (Significant work will be required later on to adapt the computations to metric spaces.) All the objects and functions in this section are therefore considered to be smooth, and we deliberately ignore any regularity issue.

Let $M$ be a Riemannian manifold with scalar product $\langle\cdot,\cdot\rangle_q$ at a point $q \in M$ and induced Riemannian distance $d$, and let $V : M \to \mathbb{R}$ be a given potential. For simplicity we assume here that $V$ is globally bounded from below on $M$, and up to replacing $V$ by $V - \min V$ we can assume that $V(q) \ge 0$. (In Section 3 we will relax this assumption and allow $V$ to be only locally bounded.) Given a small temperature parameter $\varepsilon > 0$, and following [43], the (dynamic) geometric Schrödinger problem consists in solving the optimization problem
\[ \frac12\int_0^1 \left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2 \mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1 |\nabla V|^2(q_t)\,\mathrm{d}t \longrightarrow \min; \qquad \text{s.t. } q \in C([0,1], M) \text{ with endpoints } q_0, q_1. \tag{2.1} \]
For $s \ge 0$ we denote by $\Phi(s, q)$ the semi-flow corresponding to the autonomous $V$-gradient flow started from $q \in M$,
\[ \begin{cases} \dfrac{\mathrm{d}}{\mathrm{d}s}\Phi(s, q) = -\nabla V(\Phi(s, q)), \\ \Phi(0, q) = q. \end{cases} \]
The goal of this section is to give a straightforward proof of the following two facts, assuming that the potential $V$ is well behaved:
(i) the $\varepsilon$-Schrödinger problem converges to the geodesic problem as $\varepsilon \to 0$;
(ii) $\lambda$-contractivity of the generated flow $\Phi$ can be turned into $\lambda$-convexity along geodesics.
With this goal in mind, fix any two endpoints $q_0, q_1 \in M$ and take an arbitrary curve joining them, $q \in C([0,1], M)$, $q|_{t=0} = q_0$ and $q|_{t=1} = q_1$.
Given a function $h(t) \ge 0$ with $h(0) = h(1) = 0$, we perturb $q$ by defining
\[ \tilde q_t := \Phi(h(t), q_t), \qquad t \in [0,1], \]
i.e. $\tilde q_t$ is the solution of the $V$-gradient flow at time $s = h(t) \ge 0$ starting from $q_t$ at time $s = 0$. We shall refer to $t \in [0,1]$ as a “horizontal time” and to $s \in [0, h(t)]$ as a “vertical time”, see Figure 1. Later on we will think of the curve $\tilde q$ as a “regularized” version of $q$.

[Figure 1: The perturbed curve]

Note that the endpoints remain invariant, $\tilde q_0 = q_0$ and $\tilde q_1 = q_1$. Since by definition of the flow $\partial_s\Phi(s, q_t) = -\nabla V(\Phi(s, q_t))$, the speed of the perturbed curve can be computed as
\[ \frac{\mathrm{d}\tilde q_t}{\mathrm{d}t} = \frac{\mathrm{d}}{\mathrm{d}t}\Big(\Phi(h(t), q_t)\Big) = \partial_s\Phi(h(t), q_t)\,h'(t) + \partial_q\Phi(h(t), q_t)\frac{\mathrm{d}q_t}{\mathrm{d}t} = -h'(t)\nabla V(\tilde q_t) + \partial_q\Phi(h(t), q_t)\frac{\mathrm{d}q_t}{\mathrm{d}t}. \]
Bringing the $h'(t)$ term to the left-hand side and taking the half squared norm (in the tangent space $T_{\tilde q_t}M$) gives
\[ \frac12\left|\frac{\mathrm{d}\tilde q_t}{\mathrm{d}t}\right|^2 + \frac12|h'(t)|^2|\nabla V(\tilde q_t)|^2 + h'(t)\underbrace{\left\langle\nabla V(\tilde q_t), \frac{\mathrm{d}\tilde q_t}{\mathrm{d}t}\right\rangle_{\tilde q_t}}_{=\frac{\mathrm{d}}{\mathrm{d}t}V(\tilde q_t)} = \frac12\left|\partial_q\Phi(h(t), q_t)\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2. \tag{2.2} \]
Assume now that, for whatever reason, the gradient flow satisfies the following quantified contractivity estimate w.r.t. the Riemannian distance $d$
\[ d(\Phi(s, p), \Phi(s, p')) \le e^{-\lambda s} d(p, p'), \qquad \forall s \ge 0,\ p, p' \in M \tag{2.3} \]
for some fixed $\lambda \in \mathbb{R}$. Then it is easy to check that the linear map $v \mapsto \partial_q\Phi(s, p)\cdot v$ (from $T_pM$ to $T_{\Phi(s,p)}M$) has norm less than $e^{-\lambda s}$, and therefore (2.2) gives
\[ \frac12\left|\frac{\mathrm{d}\tilde q_t}{\mathrm{d}t}\right|^2 + \frac12|h'(t)|^2|\nabla V(\tilde q_t)|^2 + h'(t)\frac{\mathrm{d}}{\mathrm{d}t}V(\tilde q_t) \le \frac12 e^{-2\lambda h(t)}\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2. \tag{2.4} \]
Integration by parts yields next
\[ \frac12\int_0^1\left|\frac{\mathrm{d}\tilde q_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \frac12\int_0^1|h'(t)|^2|\nabla V(\tilde q_t)|^2\,\mathrm{d}t - \int_0^1 h''(t)V(\tilde q_t)\,\mathrm{d}t \le \frac12\int_0^1 e^{-2\lambda h(t)}\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \Big(h'(0)V(q_0) - h'(1)V(q_1)\Big), \tag{2.5} \]
where the invariance $\tilde q_0 = q_0$, $\tilde q_1 = q_1$ was used in the last boundary terms. This fundamental estimate gives a quantified bound on the kinetic energy (namely the $L^2$ speed) of $\tilde q$ in terms of that of the original curve $q$, and will be the cornerstone of the whole analysis.

Both the convexity and the convergence of the Schrödinger problem will actually follow by setting $h(t) = \varepsilon H(t)$ for suitable choices of $H(t) \ge 0$, and then letting $\varepsilon \downarrow 0$. Note that in this case we have $h(t) = \varepsilon H(t) \downarrow 0$ uniformly, hence the perturbed curve
\[ q^\varepsilon_t := \Phi(\varepsilon H(t), q_t) \tag{2.6} \]
will converge uniformly to $q$ as $\varepsilon \downarrow 0$ too.

A first use of (2.5) will be crucial in proving the $\Gamma$-convergence of the Schrödinger functional
\[ A_\varepsilon(q) := \frac12\int_0^1\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1|\nabla V(q_t)|^2\,\mathrm{d}t \]
towards the kinetic action
\[ A(q) := \frac12\int_0^1\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t \]
as $\varepsilon \downarrow 0$.

Theorem 2.1 (formal $\Gamma$-limit). For any $q_0, q_1 \in M$ it holds
\[ A = \Gamma\text{-}\lim_{\varepsilon \to 0} A_\varepsilon \]
for the uniform convergence on the space of curves with fixed endpoints $q_0, q_1$.

Proof. We check separately the $\Gamma$-$\liminf$ and the $\Gamma$-$\limsup$ properties. As for the former, given any curve $q$ joining $q_0, q_1$ and any $q^\varepsilon \to q$ uniformly, since the kinetic energy functional $q \mapsto A(q)$ is always lower semicontinuous for the uniform convergence we get first
\[ A(q) \le \liminf_{\varepsilon \downarrow 0} A(q^\varepsilon) \le \liminf_{\varepsilon \downarrow 0} A_\varepsilon(q^\varepsilon). \]
For the $\Gamma$-$\limsup$, let $H(t) = \min\{t, 1 - t\}$ be the hat function centered at $t = 1/2$ with height $1/2$ and vanishing at the boundaries, set $h(t) = \varepsilon H(t)$, and let $q^\varepsilon$ be the regularized curve constructed in (2.6). In this simple smooth setting it is not difficult to check that $q^\varepsilon \to q$ uniformly. Moreover, our choice of $h(t)$ results in $|h'(t)| = \varepsilon$ with $h'(0) = \varepsilon$, $h'(1) = -\varepsilon$, and $h''(t) = -2\varepsilon\delta_{1/2}(t)$ in the distributional sense. Therefore (2.5) gives immediately
\[ A_\varepsilon(q^\varepsilon) + 2\varepsilon V(q^\varepsilon_{1/2}) = \frac12\int_0^1\left|\frac{\mathrm{d}q^\varepsilon_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1|\nabla V(q^\varepsilon_t)|^2\,\mathrm{d}t + 2\varepsilon V(q^\varepsilon_{1/2}) \le \frac12\int_0^1 e^{-2\varepsilon\lambda H(t)}\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \varepsilon\Big(V(q_0) + V(q_1)\Big). \]
The singularity of $h''$ at $t = 1/2$ can be easily and rigorously worked around, simply integrating by parts (2.2) separately on each interval $t \in [0, 1/2]$ and $t \in [1/2, 1]$ and keeping track of the boundary terms, resulting ultimately in the above $2\varepsilon V(q^\varepsilon_{1/2}) \ge 0$. Discarding this latter non-negative term finally gives
\[ \limsup_{\varepsilon \downarrow 0} A_\varepsilon(q^\varepsilon) \le \limsup_{\varepsilon \downarrow 0}\left\{\frac12\int_0^1 e^{-2\varepsilon\lambda H(t)}\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \varepsilon\Big(V(q_0) + V(q_1)\Big)\right\} = A(q) \]
and concludes the proof.

The second consequence of our fundamental estimate (2.5) is the quantification of the convexity of the potential $V$ in terms of the quantified contractivity (2.3). The point here is that the result can be obtained directly from (2.2), which can be established in a purely metric setting without relying on differential calculus (see the next section for details).

Theorem 2.2.

Assume that $V$ satisfies (2.3). Then $V$ is $\lambda$-geodesically convex, i.e.
\[ V(q_\theta) \le (1 - \theta)V(q_0) + \theta V(q_1) - \frac{\lambda}{2}\theta(1 - \theta)d^2(q_0, q_1), \qquad \theta \in (0,1), \]
for any geodesic $(q_\theta)_{\theta\in[0,1]}$ in $M$.

Proof. Let $(q_t)_{t\in[0,1]}$ be an arbitrary geodesic with endpoints $q_0, q_1$. For fixed $\theta \in (0,1)$ let
\[ H_\theta(t) := \begin{cases} \dfrac{1}{\theta}\,t & \text{if } t \in [0, \theta], \\[2mm] -\dfrac{1}{1-\theta}(t - 1) & \text{if } t \in [\theta, 1], \end{cases} \]
be the hat function centered at $t = \theta$ with height $1$ and vanishing at $t = 0, 1$, and for any $\varepsilon > 0$ let $q^\varepsilon$ be the regularized curve constructed in (2.6) with $h(t) = \varepsilon H_\theta(t)$. Note moreover that
\[ h'(0) = \frac{\varepsilon}{\theta}, \qquad h'(1) = -\frac{\varepsilon}{1-\theta}, \qquad h''(t) = -\varepsilon\left(\frac{1}{\theta} + \frac{1}{1-\theta}\right)\delta_\theta(t) \]
in the distributional sense. Discarding the non-negative term $\frac12|h'(t)|^2|\nabla V(\tilde q_t)|^2$ in (2.5), the optimality of the geodesic $q$ from $q_0$ to $q_1$ gives
\[ \begin{aligned} 0 &\le \frac12\int_0^1\left|\frac{\mathrm{d}q^\varepsilon_t}{\mathrm{d}t}\right|^2\mathrm{d}t - \frac12\int_0^1\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t \\ &\overset{(2.5)}{\le} \int_0^1 h''(t)V(q^\varepsilon_t)\,\mathrm{d}t + \frac12\int_0^1\Big(e^{-2\lambda h(t)} - 1\Big)\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right|^2\mathrm{d}t + \Big(h'(0)V(q_0) - h'(1)V(q_1)\Big) \\ &= -\varepsilon\left(\frac{1}{\theta} + \frac{1}{1-\theta}\right)V(q^\varepsilon_\theta) + \frac{d^2(q_0, q_1)}{2}\int_0^1\Big(e^{-2\lambda\varepsilon H_\theta(t)} - 1\Big)\mathrm{d}t + \varepsilon\left(\frac{1}{\theta}V(q_0) + \frac{1}{1-\theta}V(q_1)\right), \end{aligned} \]
where the last equality follows from the constant speed property $\left|\frac{\mathrm{d}q_t}{\mathrm{d}t}\right| = d(q_0, q_1)$ of the geodesic $(q_t)_{t\in[0,1]}$ connecting $q_0, q_1$ as well as from the explicit properties of $h(t) = \varepsilon H_\theta(t)$ listed above. Multiplying by $\frac{\theta(1-\theta)}{\varepsilon} > 0$ and rearranging gives
\[ V(q^\varepsilon_\theta) \le (1 - \theta)V(q_0) + \theta V(q_1) + \theta(1 - \theta)\frac{d^2(q_0, q_1)}{2}\underbrace{\int_0^1\frac{e^{-2\lambda\varepsilon H_\theta(t)} - 1}{\varepsilon}\,\mathrm{d}t}_{:= I_\varepsilon}. \]
Since $\int_0^1 H_\theta(t)\,\mathrm{d}t = \frac12$ for all $\theta$ we see that $I_\varepsilon \to -2\lambda\int_0^1 H_\theta(t)\,\mathrm{d}t = -\lambda$ as $\varepsilon \downarrow 0$, and the result immediately follows since $V(q^\varepsilon_\theta) \to V(q_\theta)$ as well in the left-hand side.
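The integrated estimate (2.5), with the hat function $H(t) = \min\{t, 1-t\}$ used in the proof of Theorem 2.1, can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are not part of the paper: $M = \mathbb{R}$, $V(q) = \lambda q^2/2$ with $\lambda > 0$, the explicit flow $\Phi(s, q) = e^{-\lambda s}q$, a straight-line curve from `q0` to `q1`, midpoint quadrature, and the hypothetical function name `check_estimate`. In this linear example the contraction bound (2.3) is attained, so both sides of the inequality should agree up to quadrature error.

```python
import math


def check_estimate(eps, lam=1.0, q0=1.0, q1=3.0, n=2000):
    """Return (action, lhs, rhs) for the toy case M = R, V(q) = lam * q**2 / 2.

    action = A_eps(q_eps); lhs adds the midpoint term 2*eps*V(q_eps at t=1/2);
    rhs is the contracted kinetic energy plus eps * (V(q0) + V(q1)).
    """
    V = lambda q: 0.5 * lam * q * q
    dt = 1.0 / n                       # n even, so the kink t = 1/2 is a grid node
    speed = q1 - q0                    # the unperturbed curve is a straight line
    kinetic = fisher = contracted = 0.0
    for i in range(n):
        t = (i + 0.5) * dt             # midpoint quadrature node
        H = min(t, 1.0 - t)            # hat function
        dH = 1.0 if t < 0.5 else -1.0  # its derivative away from the kink
        q = q0 + t * speed
        decay = math.exp(-lam * eps * H)             # Phi(s, q) = exp(-lam*s) * q
        q_eps = decay * q                            # perturbed curve
        dq_eps = decay * (speed - lam * eps * dH * q)  # its time derivative
        kinetic += 0.5 * dq_eps ** 2 * dt
        fisher += 0.5 * eps ** 2 * (lam * q_eps) ** 2 * dt
        contracted += 0.5 * decay ** 2 * speed ** 2 * dt
    q_mid = math.exp(-0.5 * lam * eps) * 0.5 * (q0 + q1)
    action = kinetic + fisher
    lhs = action + 2.0 * eps * V(q_mid)
    rhs = contracted + eps * (V(q0) + V(q1))
    return action, lhs, rhs


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        print(eps, check_estimate(eps))
```

As $\varepsilon$ decreases, the first returned value approaches the kinetic action $\frac12(q_1 - q_0)^2$ of the straight line, which is the recovery-sequence property used for the $\Gamma$-$\limsup$.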
Before trying to adapt the previous computations to the metric context we need to fix once and for all the framework to be used in the sequel.

• By $C([0,1], (X, d))$, or simply $C([0,1], X)$, we denote the space of continuous curves with values in the metric space $(X, d)$. The collection of absolutely continuous curves on $[0,1]$ is denoted by $AC([0,1], (X, d))$, or simply by $AC([0,1], X)$. For any curve $(\gamma_t) \in AC([0,1], X)$, its length is well defined as
\[ \ell(\gamma) := \int_0^1 |\dot\gamma_t|\,\mathrm{d}t, \]
where $|\dot\gamma_t|$ denotes the metric speed of $\gamma$. If $|\dot\gamma_t| \in L^2(0,1)$, then we shall say that $(\gamma_t) \in AC^2([0,1], X)$. For these notions of absolutely continuous curves and metric speed in a metric space, see for instance [2, Section 1.1].

• A curve $\gamma : [0,1] \to X$ is called geodesic provided $d(\gamma_t, \gamma_s) = |t - s|\,d(\gamma_0, \gamma_1)$ for all $t, s \in [0,1]$.

• The slope $|\partial E|$ of a functional $E : X \to \mathbb{R} \cup \{+\infty\}$ at a point $x \in X$ is set as $+\infty$ if $x \notin D(E)$, $0$ if $x$ is isolated, and defined as
\[ |\partial E|(x) := \limsup_{y \to x}\frac{[E(x) - E(y)]^+}{d(x, y)} \]
if $x \in D(E)$.

• A curve $(\gamma_t)_{t>0} \subset X$ is said to be a gradient flow of $E$ in the $EVI_\lambda$ sense (with $\lambda \in \mathbb{R}$) provided $(\gamma_t) \in AC_{loc}((0,\infty), X)$ and
\[ \frac12\frac{\mathrm{d}}{\mathrm{d}t}d^2(\gamma_t, y) + \frac{\lambda}{2}d^2(\gamma_t, y) + E(\gamma_t) \le E(y), \qquad \forall y \in X,\ \text{a.e. } t > 0. \tag{$EVI_\lambda$} \]
If $\gamma_t \to x$ as $t \downarrow 0$ with $x \in \overline{D(E)}$, then we say that the gradient flow $(\gamma_t)$ starts at $x$.

After this premise, let us fix the framework we shall work within.

Setting 3.1.

On the space $(X, d)$ and on the functional $E : X \to \mathbb{R} \cup \{+\infty\}$ we make the following assumptions:
(A1) $(X, d)$ is a complete and separable metric space;
(A2) $E$ is lower semicontinuous with dense domain, i.e. $\overline{D(E)} = X$, and locally bounded from below in the following sense: for any $d$-bounded set $B \subset X$ there exists $c_B \in \mathbb{R}$ such that $E(x) \ge c_B$ for all $x \in B$;
(A3) there exists $\lambda \in \mathbb{R}$ such that for any $x \in X$ there exists an $EVI_\lambda$-gradient flow of $E$ starting from $x$. In view of (3.3), the corresponding 1-parameter semigroup shall be denoted $S_t$.

Sometimes, and always explicitly indicated, we will also use the following extra hypothesis.

Assumption 3.2.

There exists a Hausdorff topology $\sigma$ on $X$ such that $d$-bounded sets are sequentially $\sigma$-compact. Moreover, the distance $d$ and the slope $|\partial E|$ are $\sigma$-sequentially lower semicontinuous.

Remark 3.3.

Assumption 3.2 is in particular valid provided $(X, d)$ is a locally compact space. Indeed, in this case the metric topology of $(X, d)$ is an admissible candidate for $\sigma$, since bounded sets are relatively compact (by [17, Proposition 2.5.22]) and the lower semicontinuity of the slope $|\partial E|$ w.r.t. the metric topology is a consequence of the forthcoming identity (3.1). ■

Remark 3.4.

Assumption 3.2 implies that $d$-converging sequences are also $\sigma$-converging. Indeed, given $(x_n) \subset X$ with $d(x, x_n) \to 0$ as $n \to \infty$ for some limit $x \in X$, by Assumption 3.2 and by the boundedness of $(x_n)_n$ there exist a subsequence $(x_{n_k})_k$ and $y \in X$ such that $x_{n_k} \overset{\sigma}{\to} y$ as $k \to \infty$. Since $d$ is $\sigma$-sequentially lower semicontinuous (again by Assumption 3.2) we deduce that
\[ d(x, y) \le \liminf_{k \to \infty} d(x, x_{n_k}) = \lim_{n \to \infty} d(x, x_n) = 0, \]
whence $x = y$. This classically implies that the whole sequence converges, $x_n \overset{\sigma}{\to} x$. ■
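To make the $EVI_\lambda$ notion entering Setting 3.1 concrete, here is a minimal sketch for the simplest possible example. The choices are assumptions made for illustration and do not come from the paper: $X = \mathbb{R}$ with the usual distance, $E(z) = \lambda z^2/2$ for an arbitrary $\lambda \in \mathbb{R}$, and the explicit semigroup $S_t x = e^{-\lambda t}x$; the function name `evi_residual` is hypothetical. The function returns the left-hand side minus the right-hand side of ($EVI_\lambda$), which for this quadratic energy vanishes identically, so the inequality holds as an equality.

```python
import math


def evi_residual(lam, x, y, t):
    """LHS minus RHS of the EVI_lambda inequality for E(z) = lam * z**2 / 2 on R.

    The flow S_t x = exp(-lam * t) * x is the explicit solution of z' = -lam * z.
    """
    gamma = math.exp(-lam * t) * x
    d_gamma = -lam * gamma                    # time derivative of the flow
    half_ddt_dist2 = (gamma - y) * d_gamma    # (1/2) d/dt |gamma_t - y|^2
    energy = lambda z: 0.5 * lam * z * z
    return (half_ddt_dist2 + 0.5 * lam * (gamma - y) ** 2
            + energy(gamma) - energy(y))


if __name__ == "__main__":
    for lam in (-1.0, 0.5, 2.0):
        print(lam, evi_residual(lam, x=1.5, y=-0.7, t=0.3))
```

Note that for $\lambda < 0$ the energy is not globally bounded from below, but it is bounded from below on bounded sets, which is all that (A2) asks for.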

We list now some useful properties of $EVI$-gradient flows, which hold true in Setting 3.1 and that we shall use extensively in the sequel. First of all, whenever $x \in X$ is the starting point of an $EVI_\lambda$ flow, the slope there (a local object, a priori) admits the global representation
\[ |\partial E|(x) = \sup_{y \ne x}\left(\frac{E(x) - E(y)}{d(x, y)} + \frac{\lambda}{2}d(x, y)\right)^+, \tag{3.1} \]
see [54, Proposition 3.6]. Since we assume that any $x \in X$ is the starting point of an $EVI_\lambda$-gradient flow, this means in particular that $|\partial E| : X \to [0, \infty]$ is lower semicontinuous, since so is the right-hand side above (as a supremum of lower semicontinuous functions). This also implies by [2, Theorem 1.2.5] that $|\partial E|$ is a strong upper gradient for $E$ in the sense of [2, Definition 1.2.1], namely: for every $(\gamma_t) \in AC([0,1], X)$, the map $t \mapsto E(\gamma_t)$ is Borel and
\[ |E(\gamma_{t_1}) - E(\gamma_{t_0})| \le \int_{t_0}^{t_1}|\partial E|(\gamma_t)\,|\dot\gamma_t|\,\mathrm{d}t, \qquad \forall\, 0 \le t_0 \le t_1 \le 1, \tag{3.2} \]
the right-hand side being possibly infinite. In addition, if $(\gamma_t)$ is an $EVI_\lambda$-gradient flow of $E$ then the following hold [54, Theorem 3.5]:

(i) If $(\gamma_t)$ starts from $x \in \overline{D(E)}$ and $(\tilde\gamma_t)$ is a second $EVI_\lambda$-gradient flow of $E$ starting from $y \in \overline{D(E)}$, then
\[ d(\gamma_t, \tilde\gamma_t) \le e^{-\lambda t}d(x, y), \qquad \forall t \ge 0. \tag{3.3} \]
This means that EVI-gradient flows are unique (provided they exist) and thus if there exists an EVI-gradient flow $(\gamma_t)$ starting from $x$, then a 1-parameter semigroup $(S_t)_{t>0}$ is unambiguously associated to it via $S_t(x) = \gamma_t$.

(ii) The maps $t \mapsto \gamma_t$ and $t \mapsto E(\gamma_t)$ are locally Lipschitz in $(0, \infty)$ with values in $X$ and $\mathbb{R}$, respectively, and satisfy the Energy Dissipation Equality
\[ -\frac{\mathrm{d}}{\mathrm{d}t}E(\gamma_t) = \frac12|\dot\gamma_t|^2 + \frac12|\partial E|^2(\gamma_t) = |\dot\gamma_t|^2 = |\partial E|^2(\gamma_t), \qquad \text{for a.e. } t > 0. \tag{3.4} \]

(iii) The map
\[ t \mapsto e^{\lambda t}|\partial E|(\gamma_t) \text{ is non-increasing.} \tag{3.5} \]

(iv) If $(\gamma_t)$ starts from $x$ and $y \in D(|\partial E|)$, then
\[ |\partial E|^2(\gamma_t) \le \frac{1}{2e^{\lambda t} - 1}|\partial E|^2(y) + \frac{1}{(I_\lambda(t))^2}d^2(x, y), \qquad \text{provided } -\lambda t < \log 2, \tag{3.6} \]
where $I_\lambda(t) := \int_0^t e^{\lambda s}\,\mathrm{d}s$.

We emphasize that these properties directly follow from the very definition ($EVI_\lambda$) of gradient flows, and a priori do not require $E$ to be geodesically $\lambda$-convex. Although analogous statements can be found in [2] and [1] under convexity assumptions on $E$, the latter are essentially needed to grant existence of EVI-gradient flows. It is important to stress this fact because in Setting 3.1 we assume that for any $x \in X$ there exists an $EVI_\lambda$-gradient flow of $E$ starting there, which from [27] is known to imply that $E$ is geodesically convex. In Section 3 we also provide an alternative proof of this latter fact, whence the necessity for us to avoid all properties of $EVI$-gradient flows which are actually a consequence of geodesic convexity.

We conclude this preliminary part with a general integrability result about $EVI_\lambda$-gradient flows, which we could not find explicitly written in the literature and which will be used later on in the proof of Lemma 3.6.

Lemma 3.5.

With the same assumptions and notations as in Setting 3.1, let $x \in X$. Then
\[ t \mapsto E(S_t x) \text{ is integrable in } [0, T], \qquad \text{for all } T > 0, \]
regardless of whether $E(x)$ is finite or not.

On intervals $[\varepsilon, T]$ the integrability is easily justified by the fact that $t \mapsto E(S_t x)$ is locally Lipschitz in $(0, \infty)$, hence locally integrable therein. But it remains true even if $\varepsilon = 0$, as we are going to see.

Proof.

Let $x \in X$ and $T > 0$ be as in our statement. Since $E$ is bounded from below on $d$-bounded sets by (A2), and because $(S_t x)_{t\in[0,T]}$ is bounded, there exists $c \in \mathbb{R}$ such that $E(S_t x) \ge c$ for all $t \in [0, T]$. Combining with ($EVI_\lambda$) this gives
\[ c \le E(S_t x) \le E(y) - \frac12\frac{\mathrm{d}}{\mathrm{d}t}d^2(S_t x, y) - \frac{\lambda}{2}d^2(S_t x, y) \]
for any $y \in D(E)$ and $t \in (0, T]$. Integrating from $t = \eta > 0$ to $t = T$ gives
\[ c(T - \eta) \le \int_\eta^T E(S_t x)\,\mathrm{d}t \le (T - \eta)E(y) - \frac12\Big(d^2(S_T x, y) - d^2(S_\eta x, y)\Big) - \frac{\lambda}{2}\int_\eta^T d^2(S_t x, y)\,\mathrm{d}t. \]
As $t \mapsto E(S_t x)$ is bounded from below on $[0, T]$ and the right-hand side has a finite limit as $\eta \downarrow 0$ (thanks to the fact that $t \mapsto S_t x$ is $d$-continuous on $[0, \infty)$ by the very definition of $EVI_\lambda$-gradient flow), we deduce the desired integrability.

In this section the formal Riemannian computations carried out at the beginning of Section 2, and more precisely (2.4), will be reproduced rigorously in the abstract Setting 3.1. To this aim, a key role will be played by the following purely metric estimate:

Lemma 3.6.

With the same assumptions and notations as in Setting 3.1, let $(\gamma_t) \in AC([0,1], X)$ with $E(\gamma_0), E(\gamma_1) < \infty$. For any fixed absolutely continuous function $h : [0,1] \to \mathbb{R}$ with $h(t) > 0$ for all $t \in (0,1)$ let
$$\tilde\gamma_t := S_{h(t)}\gamma_t, \qquad t \in [0,1],$$
and for any $0 \le t_0 < t_1 \le 1$ write
$$t_+ := \begin{cases} t_1 & \text{if } h(t_1) \ge h(t_0), \\ t_0 & \text{otherwise,} \end{cases} \qquad\text{and}\qquad t_- := \begin{cases} t_0 & \text{if } h(t_1) \ge h(t_0), \\ t_1 & \text{otherwise.} \end{cases} \tag{3.7}$$
Then we have the exact estimate
$$\frac12\left|\frac{d(\tilde\gamma_{t_0}, \tilde\gamma_{t_1})}{t_1 - t_0}\right|^2 + \frac{1}{2\lambda^2}\,|\partial E|^2(\tilde\gamma_{t_+})\,\frac{e^{\lambda(h(t_1)-h(t_0))} + e^{\lambda(h(t_0)-h(t_1))} - 2}{(t_1-t_0)^2} + \frac{1 - e^{-\lambda(h(t_+)-h(t_-))}}{\lambda\,(t_+ - t_-)} \cdot \frac{E(\tilde\gamma_{t_1}) - E(\tilde\gamma_{t_0})}{t_1 - t_0} \le \frac12\, e^{-\lambda(h(t_0)+h(t_1))}\left|\frac{d(\gamma_{t_0}, \gamma_{t_1})}{t_1 - t_0}\right|^2. \tag{3.8}$$

Here we use the convention that $(+\infty) \times 0 = 0$ whenever $|\partial E|(\tilde\gamma_{t_+}) = +\infty$ and $h(t_0) = h(t_1)$ in the second term on the left-hand side of (3.8). Since we assume that $h(t) > 0$ for $t \in (0,1)$, and because any EVI$_\lambda$-gradient flow immediately falls within $D(|\partial E|)$ by standard regularizing effects, this latter case is in fact only possible if $t_0 = 0$, $t_1 = 1$, and $h(t_0) = h(t_1) = 0$. In that case $\tilde\gamma_0 = \gamma_0$ and $\tilde\gamma_1 = \gamma_1$, the third term in the left-hand side also cancels owing to $e^{-\lambda(h(t_+)-h(t_-))} = 1$, and (3.8) then holds as a trivial equality.

We shall rely on this lemma later on in two different ways: first, fixing $t_0 = 0$ and letting $t_1 \downarrow 0$ (resp. fixing $t_1 = 1$ and letting $t_0 \uparrow 1$) to control in Lemma 3.10 the continuity of $t \mapsto E(\tilde\gamma_t)$ at the boundaries $t = 0, 1$, and second, fixing $t_0 \in (0,1)$ and letting $t_1 \to t_0$ to obtain in Proposition 3.11 a pointwise differential estimate similar to (2.4).

Remark 3.7.

The times $t_\pm$ are just a fancy notation, ordered as $h(t_-) \le h(t_+)$. Note that in our estimate (3.8) the Fisher information $|\partial E|^2(\tilde\gamma_{t_+})$ is evaluated at the time $t = t_+$ for which the "smoothing time" $s = h(t_0)$ or $s = h(t_1)$ is the largest, i.e. where the regularizing vertical flow has been run for the longest time. This is somehow natural, as this specific point is "better" than the other one in terms of regularity. ■

Proof.

By symmetry we only discuss the case $h(t_1) \ge h(t_0)$, i.e. $t_+ = t_1$ and $t_- = t_0$. As already mentioned, if $h(t_0) = h(t_1) = 0$ our statement is actually vacuous, thus it is not restrictive to further assume $h(t_1) > 0$. Let us write for simplicity $\hat\gamma_{t_1} := S_{h(t_0)}\gamma_{t_1}$. From an intuitive point of view, this corresponds to freezing a "vertical time" $s = h(t_0)$ and "translating" $\tilde\gamma_{t_0}$ in the "horizontal" $t$ direction parallel to the curve $\gamma$ until $t_1$. Here, in the "vertical" direction above $t_1$ the smoothing semigroup $S_s$ associated with $E$ has been run at least for a strictly positive time $h(t_1) > 0$, so that by (3.6) the solution of the "vertical" gradient flow at that time lies within the regular domain $D(|\partial E|) \subset D(E) \subset X$, see Figure 2.

The first step is to write (EVI$_\lambda$) for $s \mapsto S_s(\gamma_{t_1})$ with $\tilde\gamma_{t_0}$ as a reference point, namely
$$\frac12 \frac{\mathrm{d}}{\mathrm{d}s} d^2(S_s\gamma_{t_1}, \tilde\gamma_{t_0}) + \frac{\lambda}{2}\, d^2(S_s\gamma_{t_1}, \tilde\gamma_{t_0}) + E(S_s\gamma_{t_1}) \le E(\tilde\gamma_{t_0}),$$
which holds true for a.e. $s \in [0, h(t_1)]$ in the "vertical" direction. This inequality can be equivalently rewritten as

$$\frac12 \frac{\mathrm{d}}{\mathrm{d}s}\Big(e^{\lambda s}\, d^2(S_s\gamma_{t_1}, \tilde\gamma_{t_0})\Big) \le e^{\lambda s}\Big(E(\tilde\gamma_{t_0}) - E(S_s\gamma_{t_1})\Big). \tag{3.9}$$

[Figure 2: The horizontal and vertical curves]

Note that this estimate carries significant information if and only if the reference point has finite entropy, i.e. $E(\tilde\gamma_{t_0}) < \infty$ in the right-hand side. This holds true for $t_0 \in (0,1)$ because $\tilde\gamma_{t_0}$ is the EVI$_\lambda$-gradient flow of $E$ starting from $\gamma_{t_0}$ at a strictly positive time $s = h(t_0) > 0$, but also for $h(t_0) = 0$ if $t_0 = 0$ since in this case $\tilde\gamma_0 = \gamma_0$ is assumed to have finite entropy.

Integrating (3.9) from $s = h(t_0)$ to $s = h(t_1)$ gives
$$\begin{aligned}
\frac12\Big(e^{\lambda h(t_1)}\, d^2(\tilde\gamma_{t_1}, \tilde\gamma_{t_0}) - e^{\lambda h(t_0)}\, d^2(\hat\gamma_{t_1}, \tilde\gamma_{t_0})\Big)
&\le \int_{h(t_0)}^{h(t_1)} e^{\lambda s}\Big(E(\tilde\gamma_{t_0}) - E(S_s\gamma_{t_1})\Big)\,\mathrm{d}s \\
&= \int_{h(t_0)}^{h(t_1)} e^{\lambda s}\Big(\big(E(\tilde\gamma_{t_0}) - E(\tilde\gamma_{t_1})\big) + \big(E(\tilde\gamma_{t_1}) - E(S_s\gamma_{t_1})\big)\Big)\,\mathrm{d}s \\
&= \int_{h(t_0)}^{h(t_1)} e^{\lambda s}\Big(E(\tilde\gamma_{t_1}) - E(S_s\gamma_{t_1})\Big)\,\mathrm{d}s - \frac{e^{\lambda h(t_1)} - e^{\lambda h(t_0)}}{\lambda}\Big(E(\tilde\gamma_{t_1}) - E(\tilde\gamma_{t_0})\Big).
\end{aligned} \tag{3.10}$$
If $h(t_0) > 0$ this computation is legitimate because $s \mapsto S_s\gamma_{t_1}$ and $s \mapsto E(S_s\gamma_{t_1})$ are locally Lipschitz in $(0,\infty)$, hence $s \mapsto d^2(S_s\gamma_{t_1}, \tilde\gamma_{t_0})$ and $s \mapsto E(S_s\gamma_{t_1})$ are locally integrable therein. But this computation is also justified when $h(t_0) = 0$ by Lemma 3.5. More specifically $s \mapsto E(S_s\gamma_{t_1})$ is absolutely integrable on $[0,T]$ for any $T > 0$ and a fortiori so is $s \mapsto e^{\lambda s} E(S_s\gamma_{t_1})$.

Now let us estimate the terms in (3.10) to get (3.8). First, since $\hat\gamma_{t_1} = S_{h(t_0)}\gamma_{t_1}$, $\tilde\gamma_{t_0} = S_{h(t_0)}\gamma_{t_0}$, and $S_{h(t_0)}(\cdot)$ is $\lambda$-contractive by (3.3), we observe that the second term in the left-hand side of (3.10) can be controlled as
$$d^2(\hat\gamma_{t_1}, \tilde\gamma_{t_0}) = d^2(S_{h(t_0)}\gamma_{t_1}, S_{h(t_0)}\gamma_{t_0}) \le e^{-2\lambda h(t_0)}\, d^2(\gamma_{t_1}, \gamma_{t_0}). \tag{3.11}$$
In the right-hand side, let us define
$$I := \int_{h(t_0)}^{h(t_1)} e^{\lambda s}\Big(E(\tilde\gamma_{t_1}) - E(S_s\gamma_{t_1})\Big)\,\mathrm{d}s.$$
This integral is clearly non-positive by (3.4), but we need a finer analysis. To this aim, for fixed $0 < s < h(t_1)$ let us write
$$E(\tilde\gamma_{t_1}) - E(S_s\gamma_{t_1}) = E(S_{h(t_1)}\gamma_{t_1}) - E(S_s\gamma_{t_1}) = \int_s^{h(t_1)} \frac{\mathrm{d}}{\mathrm{d}\tau} E(S_\tau\gamma_{t_1})\,\mathrm{d}\tau = -\int_s^{h(t_1)} |\partial E|^2(S_\tau\gamma_{t_1})\,\mathrm{d}\tau,$$
where the second equality holds due to $\tau \mapsto E(S_\tau\gamma_{t_1})$ being Lipschitz on $[s, h(t_1)]$, and the third one stems from (3.4) for the gradient flow $\tau \mapsto S_\tau\gamma_{t_1}$. By (3.5)
$$\begin{aligned}
-\int_s^{h(t_1)} |\partial E|^2(S_\tau\gamma_{t_1})\,\mathrm{d}\tau
&\le -\int_s^{h(t_1)} |\partial E|^2(S_{h(t_1)}\gamma_{t_1})\, e^{2\lambda(h(t_1)-\tau)}\,\mathrm{d}\tau \\
&= -e^{2\lambda h(t_1)}\,|\partial E|^2(S_{h(t_1)}\gamma_{t_1}) \int_s^{h(t_1)} e^{-2\lambda\tau}\,\mathrm{d}\tau \\
&= \frac{1}{2\lambda}\Big(1 - e^{2\lambda(h(t_1)-s)}\Big)\,|\partial E|^2(S_{h(t_1)}\gamma_{t_1}),
\end{aligned}$$
so that, as a consequence,
$$\begin{aligned}
I &\le \frac{1}{2\lambda}\int_{h(t_0)}^{h(t_1)} e^{\lambda s}\Big(1 - e^{2\lambda(h(t_1)-s)}\Big)\,|\partial E|^2(S_{h(t_1)}\gamma_{t_1})\,\mathrm{d}s \\
&= \frac{1}{2\lambda}\,|\partial E|^2(S_{h(t_1)}\gamma_{t_1}) \int_{h(t_0)}^{h(t_1)} \Big(e^{\lambda s} - e^{2\lambda h(t_1)} \cdot e^{-\lambda s}\Big)\,\mathrm{d}s \\
&= -\frac{1}{2\lambda^2}\,|\partial E|^2(\tilde\gamma_{t_1})\, e^{\lambda h(t_1)}\Big(e^{\lambda(h(t_1)-h(t_0))} + e^{\lambda(h(t_0)-h(t_1))} - 2\Big).
\end{aligned}$$
Plugging this estimate together with (3.11) into (3.10), multiplying by $e^{-\lambda h(t_1)}$ and dividing by $(t_1 - t_0)^2 > 0$ entails our claim.

We also need to study the behaviour of the "regularized" curve $\tilde\gamma_t := S_{h(t)}\gamma_t$ and of the entropy $E$ along it: this is the content of the following two results.

Lemma 3.8.

With the same assumptions and notations as in Setting 3.1, if $(\gamma_t) \in AC([0,1], X)$ and $h : [0,1] \to \mathbb{R}$ is absolutely continuous with $h(t) > 0$ for all $t \in (0,1)$, then the curve $\tilde\gamma_t := S_{h(t)}\gamma_t$ belongs to $AC_{loc}((0,1), X) \cap C([0,1], X)$.

Proof. Fix $\delta \in (0, 1/2)$, $t_0, t_1 \in [\delta, 1-\delta]$ with $t_0 \le t_1$ and define
$$m_\delta := \min_{t \in [\delta, 1-\delta]} h(t), \qquad M_\delta := \max_{t \in [\delta, 1-\delta]} h(t), \tag{3.12}$$
paying attention to the fact that $m_\delta > 0$ by construction. Write as before $\hat\gamma_{t_1} := S_{h(t_0)}\gamma_{t_1}$ for the "horizontal" translation of $\tilde\gamma_{t_0}$ (see Figure 2). By triangular inequality and the contraction estimate (3.3) we get
$$d(\tilde\gamma_{t_0}, \tilde\gamma_{t_1}) \le d(\tilde\gamma_{t_0}, \hat\gamma_{t_1}) + d(\hat\gamma_{t_1}, \tilde\gamma_{t_1}) \le e^{\lambda^- M_\delta}\, d(\gamma_{t_0}, \gamma_{t_1}) + d(\hat\gamma_{t_1}, \tilde\gamma_{t_1}), \tag{3.13}$$
where $\lambda^- := \max\{-\lambda, 0\}$. Since $(\gamma_t)$ is absolutely continuous the first term in the right-hand side can be controlled as $d(\gamma_{t_0}, \gamma_{t_1}) \le \int_{t_0}^{t_1} |\dot\gamma_t|\,\mathrm{d}t$. As regards the second one, by (3.4) and up to assuming $h(t_0) \le h(t_1)$ (which is not restrictive, as otherwise it is sufficient to swap the boundary values of integration below) it holds
$$d(\hat\gamma_{t_1}, \tilde\gamma_{t_1}) = d(S_{h(t_0)}\gamma_{t_1}, S_{h(t_1)}\gamma_{t_1}) \le \int_{h(t_0)}^{h(t_1)} \Big|\frac{\mathrm{d}}{\mathrm{d}s} S_s\gamma_{t_1}\Big|\,\mathrm{d}s = \int_{h(t_0)}^{h(t_1)} |\partial E|(S_s\gamma_{t_1})\,\mathrm{d}s, \tag{3.14}$$
where, to avoid possibly ambiguous notations, $|\frac{\mathrm{d}}{\mathrm{d}s} S_s\gamma_{t_1}|$ denotes the metric speed of the "vertical" curve $s \mapsto S_s\gamma_{t_1}$. In order to control the slope in the right-most term uniformly both in $s \in [h(t_0), h(t_1)] \subset [m_\delta, M_\delta]$ and in $t_1 \in [\delta, 1-\delta]$, for fixed $\delta$, let $\varepsilon$ be such that $-\lambda\varepsilon < \log 2$ (if $\lambda \ge 0$, choose $\varepsilon = m_\delta$) and define $\varepsilon' := \min\{m_\delta, \varepsilon\}$. Then by (3.5) and the fact that $s \ge h(t_0) \ge m_\delta \ge \varepsilon'$ we have
$$|\partial E|(S_s\gamma_{t_1}) \le e^{\lambda(\varepsilon' - s)}\,|\partial E|(S_{\varepsilon'}\gamma_{t_1}) \le e^{\lambda^-(M_\delta - \varepsilon')}\,|\partial E|(S_{\varepsilon'}\gamma_{t_1}), \qquad \forall s \in [m_\delta, M_\delta],$$
and by (3.6) for any reference point $x \in D(|\partial E|)$ it holds
$$|\partial E|^2(S_{\varepsilon'}\gamma_{t_1}) \le \frac{1}{2e^{\lambda\varepsilon'} - 1}\,|\partial E|^2(x) + \frac{1}{I_\lambda(\varepsilon')^2}\, d^2(x, \gamma_{t_1}).$$
The squared distance in the right-hand side above is bounded uniformly in $t_1 \in [\delta, 1-\delta]$, since by triangular inequality
$$d(x, \gamma_{t_1}) \le d(x, \gamma_0) + d(\gamma_0, \gamma_{t_1}) \le d(x, \gamma_0) + \ell(\gamma).$$
Therefore there exists $C_\delta > 0$ such that
$$|\partial E|(S_s\gamma_t) \le C_\delta \qquad \text{for all } t \in [\delta, 1-\delta] \text{ and } s \in [m_\delta, M_\delta], \tag{3.15}$$
and plugging this bound into (3.14) yields
$$d(\hat\gamma_{t_1}, \tilde\gamma_{t_1}) \le C_\delta\,|h(t_1) - h(t_0)| \le C_\delta \int_{t_0}^{t_1} |h'(t)|\,\mathrm{d}t.$$
It is now sufficient to combine this inequality with $d(\gamma_{t_0}, \gamma_{t_1}) \le \int_{t_0}^{t_1} |\dot\gamma_t|\,\mathrm{d}t$ and (3.13) to get
$$d(\tilde\gamma_{t_0}, \tilde\gamma_{t_1}) \le \int_{t_0}^{t_1} \Big(e^{\lambda^- M_\delta}\,|\dot\gamma_t| + C_\delta\,|h'(t)|\Big)\,\mathrm{d}t. \tag{3.16}$$
As $e^{\lambda^- M_\delta}|\dot\gamma_t| + C_\delta|h'(t)| \in L^1(\delta, 1-\delta)$ and $\delta$ is arbitrary, the fact that $(\tilde\gamma_t) \in AC_{loc}((0,1), X)$ is thus proved.

Turning now to the continuity of $(\tilde\gamma_t)$ at the endpoints, let $t_0 = 0$ and $t_1 \in (0,1)$. Arguing as for (3.13) but with a crucial difference in the choice of the third point in the triangular inequality, it holds
$$d(\tilde\gamma_0, \tilde\gamma_{t_1}) \le d(\tilde\gamma_0, S_{h(t_1)}\gamma_0) + d(S_{h(t_1)}\gamma_0, \tilde\gamma_{t_1}) \le d(S_{h(0)}\gamma_0, S_{h(t_1)}\gamma_0) + e^{-\lambda h(t_1)}\, d(\gamma_0, \gamma_{t_1}).$$
The second term on the right-hand side vanishes as $t_1 \downarrow 0$ by (absolute) continuity of $\gamma$ and so does the first one, since $s \mapsto S_s\gamma_0$ is continuous in $[0,\infty)$ with values in $X$ and $h(t_1) \to h(0)$. The continuity at $t = 1$ is obtained similarly and the proof is complete.

Remark 3.9. If $h(t) > 0$ also in $t = 0, 1$, then the previous argument can be extended to the whole interval $[0,1]$ and therefore $(\tilde\gamma_t) \in AC([0,1], X)$. ■

Lemma 3.10.

With the same assumptions and notations as in Lemma 3.8, the entropy is locally absolutely continuous in $(0,1)$ along the regularized curve $\tilde\gamma_t$, i.e. $t \mapsto E(\tilde\gamma_t) \in AC_{loc}((0,1))$. If in addition $(\gamma_t) \in AC^2([0,1], X)$, $E(\tilde\gamma_0), E(\tilde\gamma_1) < \infty$ and $h$ is differentiable at $t = 0$ and $t = 1$ with $h'(0) > 0$ and $h'(1) < 0$, then $t \mapsto E(\tilde\gamma_t) \in C([0,1])$.

Note that $E(\tilde\gamma_0), E(\tilde\gamma_1) < \infty$ is automatically satisfied if $h(t) > 0$ also in $t = 0, 1$.

Proof.

Let us first prove that $t \mapsto E(\tilde\gamma_t)$ is locally absolutely continuous. Since $|\partial E|$ is a strong upper-gradient, the chain rule (3.2) holds and it suffices to show that $|\partial E|(\tilde\gamma_t)\,|\dot{\tilde\gamma}_t| \in L^1_{loc}(0,1)$, namely
$$\int_\delta^{1-\delta} |\partial E|(\tilde\gamma_t)\,|\dot{\tilde\gamma}_t|\,\mathrm{d}t < \infty, \qquad \forall \delta \in (0, 1/2), \tag{3.17}$$
as this would imply that $E \circ \tilde\gamma \in AC_{loc}((0,1))$ with
$$\left|\frac{\mathrm{d}}{\mathrm{d}t}(E \circ \tilde\gamma)(t)\right| \le |\partial E|(\tilde\gamma_t) \cdot |\dot{\tilde\gamma}_t|, \qquad \text{for a.e. } t \in (0,1).$$
To this aim, observe from (3.16) that $|\dot{\tilde\gamma}_t| \in L^1_{loc}(0,1)$ with
$$|\dot{\tilde\gamma}_t| \le e^{\lambda^- M_\delta}\,|\dot\gamma_t| + C_\delta\,|h'(t)| \qquad \text{a.e. on } [\delta, 1-\delta],$$
with $M_\delta$ defined in (3.12). Moreover from (3.15) we also know that $|\partial E|(S_s\gamma_t) \le C_\delta$ uniformly in $t \in [\delta, 1-\delta]$ and $s \in [m_\delta, M_\delta]$, so that by choosing $s = h(t)$ we get in particular $|\partial E|(\tilde\gamma_t) \le C_\delta$ for all $t \in [\delta, 1-\delta]$. This shows that $t \mapsto |\partial E|(\tilde\gamma_t)$ belongs to $L^\infty_{loc}(0,1)$, whence (3.17).

Now assume that $(\gamma_t) \in AC^2([0,1], X)$, $E(\tilde\gamma_0) < \infty$, $h$ is differentiable at $t = 0$ with $h'(0) > 0$ and let us prove that $t \mapsto E(\tilde\gamma_t)$ is continuous at $t = 0$. (The argument is identical for $t = 1$.) On the one hand, as $(\tilde\gamma_t)$ is continuous at $t = 0$ by Lemma 3.8 and $E$ is lower semicontinuous, we see that $E(\tilde\gamma_0) \le \liminf_{t \downarrow 0} E(\tilde\gamma_t)$. On the other hand, choosing $t_0 = 0$ and $t_1 = t$ in (3.8), our assumption that $h'(0) > 0$ gives $h(t) > h(0)$ for $t > 0$ small, hence $t_- = 0$ and $t_+ = t$. Discarding the first two (non-negative) terms on the left-hand side, and multiplying by $(t_1 - t_0) = t$ yield
$$\frac{1 - e^{-\lambda(h(t) - h(0))}}{\lambda\,(t - 0)} \cdot \big(E(\tilde\gamma_t) - E(\tilde\gamma_0)\big) \le \frac{t}{2}\, e^{-\lambda(h(t)+h(0))} \left|\frac{d(\gamma_t, \gamma_0)}{t}\right|^2 \le \frac{t}{2}\, e^{-\lambda(h(t)+h(0))} \Big(\frac{1}{t}\int_0^t |\dot\gamma_r|\,\mathrm{d}r\Big)^2 \le \frac12\, e^{-\lambda(h(t)+h(0))} \int_0^t |\dot\gamma_r|^2\,\mathrm{d}r.$$
Letting $t \downarrow 0$, the right-hand side vanishes owing to our assumption that $(\gamma_t) \in AC^2([0,1], X)$, and clearly the exponential difference quotient in the left-hand side converges to $h'(0)$. Rearranging gives
$$h'(0)\,\limsup_{t \downarrow 0} E(\tilde\gamma_t) \le h'(0)\,E(\tilde\gamma_0),$$
and since $h'(0) > 0$ the desired upper semicontinuity follows and the proof is complete.

Gathering the results proven so far, we deduce the following:

Proposition 3.11.

With the same assumptions and notations as in Lemma 3.8, for a.e. $t \in (0,1)$ it holds
$$\frac12\,|\dot{\tilde\gamma}_t|^2 + \frac12\,|h'(t)|^2\,|\partial E|^2(\tilde\gamma_t) + h'(t)\,\frac{\mathrm{d}}{\mathrm{d}t} E(\tilde\gamma_t) \le \frac12\, e^{-2\lambda h(t)}\,|\dot\gamma_t|^2. \tag{3.18}$$

Proof.

The argument simply consists in taking the limit $t_1 \to t_0$ in (3.8), which should clearly lead (at least formally) to (3.18) by Taylor-expanding the various exponential difference quotients. In order to make this rigorous, note that the first and third terms in the left-hand side of (3.18) are well defined for a.e. $t \in (0,1)$ by Lemma 3.8 and Lemma 3.10, respectively. The second term is also unambiguously defined because $h(t) > 0$, hence the "vertical" EVI$_\lambda$-gradient flow starting from $\gamma_t$ and defining $\tilde\gamma_t = S_{h(t)}\gamma_t$ falls immediately within the domain $D(|\partial E|)$. The right-hand side is well defined for a.e. $t$ since $\gamma \in AC([0,1], X)$.

After this premise, let $t \in (0,1)$ be any differentiation point for $h$, $t \mapsto \gamma_t$, $t \mapsto \tilde\gamma_t$ and $t \mapsto E(\tilde\gamma_t)$, choose $t_0 = t$ in (3.8) and let us take the right limit $t_1 \downarrow t$ (since we are considering a differentiability point the left and right limits exist and are equal, so there is no need to address the left limit). From the very definition (3.7) of $t_\pm$ it clearly holds $t_\pm \to t$ as $t_1 \downarrow t$, hence the convergence of the right-hand side of (3.8) to the right-hand side of (3.18) is clear and so is the convergence of the two difference quotients of $h$. By Lemma 3.8 the first term in the left-hand side also passes to the limit, as does the third one according to Lemma 3.10. The only term left to handle is the Fisher information $|\partial E|^2(\tilde\gamma_{t_+})$. From the continuity of $t \mapsto \tilde\gamma_t$ (cf. Lemma 3.8) we see that $\tilde\gamma_{t_+} \to \tilde\gamma_t$ in $(X, d)$, and the lower semicontinuity of the slope (3.1) results in
$$|\partial E|^2(\tilde\gamma_t) \le \liminf_{t_1 \downarrow t} |\partial E|^2(\tilde\gamma_{t_+}).$$
Thus rigorously taking the $\liminf$ as $t_1 \downarrow t$ in (3.8) entails (3.18) and achieves the proof.

The interesting consequence for our purpose is then:

Theorem 3.12.

With the same assumptions and notations as in Setting 3.1, fix $\varepsilon > 0$, and set $h_\varepsilon(t) := \varepsilon \min\{t, 1-t\}$. Let $(\gamma_t) \in AC^2([0,1], X)$ be such that $E(\gamma_0), E(\gamma_1) < \infty$ and define
$$\gamma^\varepsilon_t := S_{h_\varepsilon(t)}\gamma_t, \qquad t \in [0,1].$$
Then $(\gamma^\varepsilon_t) \in AC^2([0,1], X)$, $t \mapsto E(\gamma^\varepsilon_t)$ belongs to $AC([0,1])$ and it holds
$$\frac12\int_0^1 |\dot\gamma^\varepsilon_t|^2\,\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1 |\partial E|^2(\gamma^\varepsilon_t)\,\mathrm{d}t \le \frac{e^{\lambda^-\varepsilon}}{2}\int_0^1 |\dot\gamma_t|^2\,\mathrm{d}t - 2\varepsilon\,E(\gamma^\varepsilon_{1/2}) + \varepsilon\Big(E(\gamma_0) + E(\gamma_1)\Big). \tag{3.19}$$
Note here that $h_\varepsilon(0) = h_\varepsilon(1) = 0$, so that the endpoints $\gamma^\varepsilon_0 = \gamma_0$ and $\gamma^\varepsilon_1 = \gamma_1$ remain unchanged.

Proof. The strategy of proof simply consists in integrating (3.18) between $0$ and $1$ while integrating by parts the term $h'_\varepsilon(t)\,\frac{\mathrm{d}}{\mathrm{d}t} E(\gamma^\varepsilon_t)$, separately on $[0, 1/2]$ and $[1/2, 1]$. Note carefully that our specific choice gives $h'_\varepsilon = \varepsilon$ and $h'_\varepsilon = -\varepsilon$ on these two time intervals, respectively. Taking into account $e^{-2\lambda h_\varepsilon(t)} \le e^{\lambda^-\varepsilon}$, where $\lambda^- := \max\{-\lambda, 0\}$, this procedure yields
$$\frac12\int_0^1 |\dot\gamma^\varepsilon_t|^2\,\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1 |\partial E|^2(\gamma^\varepsilon_t)\,\mathrm{d}t \le \frac{e^{\lambda^-\varepsilon}}{2}\int_0^1 |\dot\gamma_t|^2\,\mathrm{d}t - 2\varepsilon\,E(\gamma^\varepsilon_{1/2}) + \varepsilon\Big(E(\gamma^\varepsilon_0) + E(\gamma^\varepsilon_1)\Big).$$
The term $2\varepsilon E(\gamma^\varepsilon_{1/2})$ simply arises from the two boundary terms at $t = 1/2$ in the two integrations by parts. (Alternatively, it can be seen as the result of $-\int_0^1 E(\gamma^\varepsilon_t)\,h''(t)\,\mathrm{d}t$ arising from the integration by parts in the whole interval $[0,1]$, with the singularity $h''(t) = -2\varepsilon\,\delta_{1/2}(t)$.) However, this argument is not fully rigorous because all the terms on the left-hand side of (3.18) are only locally integrable, hence we may not be allowed to integrate them all the way to $t = 0$ and $t = 1$.

In order to circumvent this slight issue, choose $\delta \in (0, 1/2)$ and carry out the same argument on $[\delta, 1/2]$ and $[1/2, 1-\delta]$ rather than on $[0, 1/2]$ and $[1/2, 1]$: integration by parts is now justified by Lemma 3.10 and this provides us with
$$\frac12\int_\delta^{1-\delta} |\dot\gamma^\varepsilon_t|^2\,\mathrm{d}t + \frac{\varepsilon^2}{2}\int_\delta^{1-\delta} |\partial E|^2(\gamma^\varepsilon_t)\,\mathrm{d}t \le \frac{e^{\lambda^-\varepsilon}}{2}\int_\delta^{1-\delta} |\dot\gamma_t|^2\,\mathrm{d}t + \varepsilon\Big(E(\gamma^\varepsilon_\delta) - 2E(\gamma^\varepsilon_{1/2}) + E(\gamma^\varepsilon_{1-\delta})\Big). \tag{3.20}$$
It is then sufficient to pass to the limit as $\delta \downarrow 0$. By monotonicity the left-hand side above converges to the left-hand side in (3.19) and for the same reason so does the first term on the right-hand side, while by the current choice of $h$ and by Lemma 3.10 $t \mapsto E(\gamma^\varepsilon_t)$ is continuous on the whole interval $[0,1]$, so that
$$\lim_{\delta \downarrow 0} \varepsilon\Big(E(\gamma^\varepsilon_\delta) + E(\gamma^\varepsilon_{1-\delta})\Big) = \varepsilon\Big(E(\gamma^\varepsilon_0) + E(\gamma^\varepsilon_1)\Big) = \varepsilon\Big(E(\gamma_0) + E(\gamma_1)\Big)$$
and (3.19) follows.

Finally, since the right-hand side of (3.19) is finite we see that $|\dot\gamma^\varepsilon| \in L^2(0,1)$ and $|\partial E|(\gamma^\varepsilon) \in L^2(0,1)$. As a consequence $|\dot\gamma^\varepsilon| \cdot |\partial E|(\gamma^\varepsilon) \in L^1(0,1)$ in the strong upper-gradient chain rule (3.2), and $E \circ \gamma^\varepsilon \in AC([0,1])$ as desired.

4.1 Γ-convergence of the Schrödinger problem

Relying on the results of the previous section, we can now turn to Theorem 2.1 and make it rigorous in the metric setting. To this end, let us first introduce two action functionals: the kinetic energy $\mathcal{A}$ and the (halved) Fisher information $\mathcal{I}$ along a curve, respectively defined as
$$\mathcal{A}(\gamma) := \frac12\int_0^1 |\dot\gamma_t|^2\,\mathrm{d}t \qquad\text{and}\qquad \mathcal{I}(\gamma) := \frac12\int_0^1 |\partial E|^2(\gamma_t)\,\mathrm{d}t, \qquad (\gamma_t) \in C([0,1], X),$$
where it is understood that $\mathcal{A}(\gamma) = +\infty$ whenever $\gamma$ is not absolutely continuous. Given two points $x, y \in X$ and a temperature/slowing-down parameter $\varepsilon > 0$, the (metric) Schrödinger problem reads as
$$\inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \Big\{\mathcal{A}(\gamma) + \varepsilon^2\,\mathcal{I}(\gamma)\Big\}, \tag{Sch$_\varepsilon$}$$
where $(\gamma_t) : x \rightsquigarrow y$ is a short-hand notation meaning that the infimum runs over all $(\gamma_t) \in C([0,1], X)$ such that $\gamma_0 = x$ and $\gamma_1 = y$. For sake of brevity we also introduce
$$\mathcal{A}_\varepsilon(\gamma) := \mathcal{A}(\gamma) + \varepsilon^2\,\mathcal{I}(\gamma).$$
From (Sch$_\varepsilon$) it is thus clear that the Fisher information $\mathcal{I}$ acts as a perturbation of $\mathcal{A}$ and this has a regularizing effect, since minimizers of (Sch$_\varepsilon$) live within the regular domain $D(|\partial E|)$.

Remark 4.1.

The smoothing effect is well understood for the classic Schrödinger problem in a regular setting, namely when $E$ is the Boltzmann-Shannon relative entropy and $X$ is the Wasserstein space over a smooth Riemannian manifold. In this case, under mild assumptions on the end-points, minimizers of (Sch$_\varepsilon$) are curves of absolutely continuous measures whose densities are bounded, smooth, Lipschitz, with exponentially fast decaying tails.

In the current metric framework the properties above are meaningless, but still minimizers of (Sch$_\varepsilon$) are "regular" from a metric point of view, since as just said they live within $D(|\partial E|)$. Moreover, in Proposition 4.2 we are going to see that $E$ is absolutely continuous along optimal curves. ■

Let us first deal with the solvability of (Sch$_\varepsilon$).

Proposition 4.2.

With the same assumptions and notations as in Setting 3.1 and under Assumption 3.2, for any fixed $x, y \in X$ and $\varepsilon > 0$ the Schrödinger problem (Sch$_\varepsilon$) is solvable if and only if $E(x), E(y) < \infty$ and there exists $(\gamma_t) \in AC([0,1], X)$ such that $\gamma_0 = x$ and $\gamma_1 = y$.

As the condition characterizing the solvability of the Schrödinger problem does not depend on $\varepsilon$, it is clear that if (Sch$_\varepsilon$) is solvable for some $\varepsilon > 0$, then it is actually solvable for all $\varepsilon > 0$.

Proof.

Assume that the endpoints have finite entropy and that there exists an absolutely continuous curve $\gamma$ connecting $x$ to $y$. Up to reparametrization, we can assume that $(\gamma_t) \in AC^2([0,1], X)$. Theorem 3.12 thus guarantees that $\mathcal{A}_\varepsilon$ is finite along the regularization $(\gamma^\varepsilon_t)_{t \in [0,1]}$ of this curve and therefore the variational problem (Sch$_\varepsilon$) is proper. Let then $(\gamma^n_t)$ be any minimizing sequence and observe that the kinetic action $\mathcal{A}$ is bounded uniformly in $n$, say $\mathcal{A}(\gamma^n) \le C$ for all $n$. We now observe that for any pair $0 \le t_0 < t_1 \le 1$ it holds
$$d(\gamma^n_{t_0}, \gamma^n_{t_1}) \le \int_{t_0}^{t_1} |\dot\gamma^n_t|\,\mathrm{d}t \le |t_1 - t_0|^{1/2}\Big(\int_{t_0}^{t_1} |\dot\gamma^n_t|^2\,\mathrm{d}t\Big)^{1/2} \le \sqrt{2C}\,|t_1 - t_0|^{1/2}. \tag{4.1}$$
Since the endpoints are fixed, this implies that the set of points $\gamma^n_t$ is bounded in $(X, d)$ uniformly in $n, t$, thus it is $\sigma$-relatively sequentially compact by Assumption 3.2. By the refined Arzelà-Ascoli lemma [2, Proposition 3.3.1], there exists a limiting $d$-continuous (actually $1/2$-Hölder continuous) curve $\gamma$ such that
$$\gamma^n_t \xrightarrow{\ \sigma\ } \gamma_t, \qquad \forall t \in [0,1].$$
We now observe that the kinetic action is lower semicontinuous for this pointwise-in-time convergence w.r.t. $\sigma$, cf. [3, Section 2.2] (indeed, $d$ is lower semicontinuous w.r.t. $\sigma$, hence the $2$-energies of the finite partitions of $\gamma$ are lower semicontinuous w.r.t. $\sigma$ too, whence the lower semicontinuity of the $2$-energy of $\gamma$ itself). Moreover $|\partial E|$ is also lower semicontinuous w.r.t. $\sigma$ by hypothesis, and this fact together with Fatou's lemma gives
$$\int_0^1 |\partial E|^2(\gamma_t)\,\mathrm{d}t \le \int_0^1 \liminf_{n \to \infty} |\partial E|^2(\gamma^n_t)\,\mathrm{d}t \le \liminf_{n \to \infty} \int_0^1 |\partial E|^2(\gamma^n_t)\,\mathrm{d}t.$$
Therefore $\gamma$ is a minimizer of (Sch$_\varepsilon$).

Conversely, assume that there exists a minimizer, denoted by $\gamma$ (the following argument actually works for any curve along which $\mathcal{A}_\varepsilon$ is finite and without Assumption 3.2). Then in particular $t \mapsto |\dot\gamma_t|$ and $t \mapsto |\partial E|(\gamma_t)$ belong to $L^2(0,1)$ and by (3.2) we see that $t \mapsto E(\gamma_t)$ is globally absolutely continuous with
$$\left|\frac{\mathrm{d}}{\mathrm{d}t}(E \circ \gamma)(t)\right| \le |\partial E|(\gamma_t) \cdot |\dot\gamma_t| \in L^1(0,1).$$
The fact that $(|\dot\gamma_t|) \in L^2(0,1) \subset L^1(0,1)$ trivially implies $(\gamma_t) \in AC([0,1], X)$, whereas the fact that $t \mapsto |\partial E|(\gamma_t)$ belongs to $L^2(0,1)$ also implies that $|\partial E|(\gamma_t)$ is finite for a.e. $t \in [0,1]$ and a fortiori so is $E(\gamma_t)$, since $D(|\partial E|) \subset D(E)$. Hence let $t^* \in (0,1)$ be any point satisfying $E(\gamma_{t^*}) < \infty$ and note that together with (3.2) this gives the following global upper bound valid for all $t < t^*$:
$$E(\gamma_t) \le E(\gamma_{t^*}) + \int_t^{t^*} \left|\frac{\mathrm{d}}{\mathrm{d}r}(E \circ \gamma)(r)\right|\mathrm{d}r \le E(\gamma_{t^*}) + \int_0^1 \left|\frac{\mathrm{d}}{\mathrm{d}r}(E \circ \gamma)(r)\right|\mathrm{d}r =: \overline{E} < \infty.$$
As a consequence, and taking also into account the facts that $t \mapsto \gamma_t$ is $d$-continuous and $E$ is lower semicontinuous, we get
$$E(\gamma_0) = E\big(\lim_{t \to 0} \gamma_t\big) \le \liminf_{t \to 0} E(\gamma_t) \le \overline{E}$$
and the proof is thus complete, as the same argument applies mutatis mutandis for $t = 1$ too.

We now fix $x, y \in X$ and let $C([0,1], X) \ni \gamma \mapsto \iota(\gamma)$ denote the convex indicator of the endpoint constraints, i.e.
$$\iota(\gamma) = \begin{cases} 0 & \text{if } \gamma_0 = x \text{ and } \gamma_1 = y, \\ +\infty & \text{otherwise.} \end{cases}$$
With this said, we can finally state our Γ-convergence result, where the finite-entropy assumption on the endpoints is motivated by the previous proposition.

Theorem 4.3. With the same assumptions and notations as in Setting 3.1, if $x, y \in X$ are such that $E(x), E(y) < \infty$, then
$$\Gamma\text{-}\lim_{\varepsilon \to 0} \big\{\mathcal{A}_\varepsilon + \iota\big\} = \mathcal{A} + \iota$$
for the uniform convergence on $C([0,1], X)$. If Assumption 3.2 holds, then the Γ-convergence also takes place w.r.t. the pointwise-in-time $\sigma$-topology.

Proof. The $\Gamma$-$\liminf$ inequality is rather clear, since the kinetic energy $\mathcal{A}$ is lower semicontinuous both w.r.t. uniform-in-time $d$-convergence and pointwise-in-time $\sigma$-convergence: for the former topology the fact is well known, for the latter it has been discussed in the proof of Proposition 4.2. An analogous claim is also true for the convex indicator $\iota$.
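As a consequence, we have that for any $\gamma^\varepsilon$ converging to $\gamma$ uniformly in time in the metric topology or pointwise in time in the topology $\sigma$ (if applicable) it holds
$$\begin{aligned}
\mathcal{A}(\gamma) + \iota(\gamma) &\le \liminf_{\varepsilon \downarrow 0} \mathcal{A}(\gamma^\varepsilon) + \liminf_{\varepsilon \downarrow 0} \iota(\gamma^\varepsilon) \le \liminf_{\varepsilon \downarrow 0} \big\{\mathcal{A}(\gamma^\varepsilon) + \iota(\gamma^\varepsilon)\big\} \\
&\le \liminf_{\varepsilon \downarrow 0} \big\{\mathcal{A}(\gamma^\varepsilon) + \varepsilon^2\,\mathcal{I}(\gamma^\varepsilon) + \iota(\gamma^\varepsilon)\big\} = \liminf_{\varepsilon \downarrow 0} \big\{\mathcal{A}_\varepsilon(\gamma^\varepsilon) + \iota(\gamma^\varepsilon)\big\},
\end{aligned}$$
whence the desired $\Gamma$-$\liminf$ inequality.

For the $\Gamma$-$\limsup$, take any $(\gamma_t) \in AC^2([0,1], X)$ connecting $x$ to $y$ (if it does not exist, then there is nothing to prove). Then Theorem 3.12 precisely provides a recovery sequence $\gamma^\varepsilon_t := S_{h_\varepsilon(t)}\gamma_t$ with $h_\varepsilon$ defined as therein, both for the uniform-in-time $d$-convergence and the pointwise-in-time $\sigma$-convergence (the latter is an easy consequence of the former by Remark 3.4). To prove this claim, note that for any $n \in \mathbb{N}$ there exist $t_1, \dots, t_k \in [0,1]$ such that, for any $t \in [0,1]$, $d(\gamma_t, \gamma_{t_i}) < 1/n$ for at least one $t_i$; in addition, since $\gamma^\varepsilon_t \to \gamma_t$ for all $t \in [0,1]$ there exists $\varepsilon_n$ small enough such that $d(\gamma_{t_i}, \gamma^\varepsilon_{t_i}) < 1/n$ for all $\varepsilon < \varepsilon_n$ and $i = 1, \dots, k$. As a consequence, taking (3.3) into account,
$$\begin{aligned}
d(\gamma_t, \gamma^\varepsilon_t) &\le d(\gamma_t, \gamma_{t_i}) + d(\gamma_{t_i}, S_{h_\varepsilon(t)}\gamma_{t_i}) + d(S_{h_\varepsilon(t)}\gamma_{t_i}, \gamma^\varepsilon_t) \\
&\le d(\gamma_t, \gamma_{t_i}) + d(\gamma_{t_i}, S_{h_\varepsilon(t)}\gamma_{t_i}) + e^{-\lambda h_\varepsilon(t)}\, d(\gamma_t, \gamma_{t_i}) \le \frac{1}{n}\big(2 + e^{\lambda^-\varepsilon/2}\big)
\end{aligned}$$
for all $t \in [0,1]$ and $\varepsilon < \varepsilon_n$, and by the arbitrariness of $n$ we conclude that $\gamma^\varepsilon \to \gamma$ uniformly.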
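The behaviour of the recovery sequence can be sanity-checked numerically in a one-dimensional toy model. The sketch below is only an illustration under assumptions that are not part of the paper: it takes $X = \mathbb{R}$ with the usual distance, the quadratic entropy $E(x) = \lambda x^2/2$ with $\lambda = 1$ (whose EVI$_\lambda$-gradient flow is $S_s x = e^{-\lambda s} x$ and whose slope is $\lambda|x|$), and the straight segment between two points as the curve $\gamma$. All function names are hypothetical. It evaluates the uniform distance between $\gamma$ and $\gamma^\varepsilon$ on a grid, and compares a midpoint-rule value of the perturbed action of $\gamma^\varepsilon$ with the right-hand side of (3.19).

```python
import math

LAM = 1.0  # assumed convexity constant of the toy entropy


def entropy(x):
    """Toy entropy E(x) = LAM * x^2 / 2 on the real line (an assumption)."""
    return 0.5 * LAM * x * x


def slope(x):
    """Metric slope of the toy entropy: |E'(x)| = LAM * |x|."""
    return abs(LAM * x)


def flow(s, x):
    """Gradient flow of the toy entropy: S_s x = exp(-LAM * s) * x."""
    return math.exp(-LAM * s) * x


def h_eps(eps, t):
    """Smoothing time h_eps(t) = eps * min(t, 1 - t)."""
    return eps * min(t, 1.0 - t)


def gamma(t, x, y):
    """Straight segment from x to y (constant speed |y - x|)."""
    return (1.0 - t) * x + t * y


def gamma_eps(eps, t, x, y):
    """Regularized curve S_{h_eps(t)} gamma_t."""
    return flow(h_eps(eps, t), gamma(t, x, y))


def sup_distance(eps, x, y, n=1000):
    """Grid approximation of sup_t |gamma_t - gamma_eps_t|."""
    return max(abs(gamma(i / n, x, y) - gamma_eps(eps, i / n, x, y))
               for i in range(n + 1))


def perturbed_action(eps, x, y, n=2000):
    """Midpoint-rule value of the kinetic energy plus eps^2 times the
    halved Fisher information along gamma_eps (n even, so no midpoint
    hits the kink of h_eps at t = 1/2)."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        dh = eps if t < 0.5 else -eps  # derivative of h_eps away from 1/2
        g = gamma_eps(eps, t, x, y)
        # d/dt [exp(-LAM*h) * gamma_t] = exp(-LAM*h) * (y - x) - LAM * dh * g
        speed = math.exp(-LAM * h_eps(eps, t)) * (y - x) - LAM * dh * g
        total += 0.5 * speed ** 2 + 0.5 * eps ** 2 * slope(g) ** 2
    return total / n


def upper_bound(eps, x, y):
    """Right-hand side of (3.19) for the segment from x to y."""
    lam_minus = max(-LAM, 0.0)
    kinetic = 0.5 * (y - x) ** 2
    return (math.exp(lam_minus * eps) * kinetic
            - 2.0 * eps * entropy(gamma_eps(eps, 0.5, x, y))
            + eps * (entropy(x) + entropy(y)))


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.01):
        print(eps, sup_distance(eps, 1.0, 3.0),
              perturbed_action(eps, 1.0, 3.0), upper_bound(eps, 1.0, 3.0))
```

In this toy model the semigroup is linear, so the uniform distance is controlled by $\max_t |\gamma_t|\,(1 - e^{-\lambda\varepsilon/2})$ and shrinks with $\varepsilon$, while the perturbed action stays below the bound and approaches the kinetic energy of the segment.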
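Furthermore, the $\limsup$ inequality can be proved as follows:
$$\begin{aligned}
\limsup_{\varepsilon \downarrow 0} \big\{\mathcal{A}_\varepsilon(\gamma^\varepsilon) + \iota(\gamma^\varepsilon)\big\} &= \limsup_{\varepsilon \downarrow 0} \big\{\mathcal{A}(\gamma^\varepsilon) + \varepsilon^2\,\mathcal{I}(\gamma^\varepsilon) + 0\big\} \\
&\overset{(3.19)}{\le} \limsup_{\varepsilon \downarrow 0} \big\{e^{\lambda^-\varepsilon}\,\mathcal{A}(\gamma) - 2\varepsilon\,E(\gamma^\varepsilon_{1/2}) + \varepsilon\big(E(x) + E(y)\big)\big\} \\
&\le \limsup_{\varepsilon \downarrow 0} \big\{e^{\lambda^-\varepsilon}\,\mathcal{A}(\gamma) + \varepsilon\big(E(x) + E(y)\big)\big\} - 2\liminf_{\varepsilon \downarrow 0} \varepsilon\,E(\gamma^\varepsilon_{1/2}) \\
&\le \mathcal{A}(\gamma) = \mathcal{A}(\gamma) + \iota(\gamma),
\end{aligned}$$
where the third inequality comes from the fact that, for any $\varepsilon \downarrow 0$, $(\gamma^\varepsilon_{1/2})$ is contained in a bounded set and by assumption $E$ is bounded from below on bounded sets, whence $E(\gamma^\varepsilon_{1/2}) \ge c$ for some $c \in \mathbb{R}$. The proof is thus complete.

As an easy consequence of this result we obtain the following:

Corollary 4.4.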

With the same assumptions and notations as in Setting 3.1 and under the further requirements that Assumption 3.2 holds and the Schrödinger problem (Sch$_\varepsilon$) relative to $x, y \in X$ is solvable, let $\varepsilon_k \downarrow 0$ and $\omega^k$ be a minimizer of the corresponding Schrödinger problem (Sch$_\varepsilon$) with $\varepsilon = \varepsilon_k$. Then
$$\lim_{k \to \infty} \big\{\mathcal{A}(\omega^k) + \varepsilon_k^2\,\mathcal{I}(\omega^k)\big\} = \inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \mathcal{A}(\gamma).$$
Moreover, there exists $\omega \in C([0,1], X)$ such that, up to a subsequence, $\omega^k \to \omega$ in the pointwise-in-time $\sigma$-topology and
$$\mathcal{A}(\omega) = \inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \mathcal{A}(\gamma).$$

Proof.

Recall that, under a mild equi-coercivity condition, Γ-convergence precisely guarantees that the limit of the optimal values of the approximating problems is the optimal value of the limit problem and limits of minimizers are minimizers, cf. [16, Theorem 1.21]. In view of Theorem 4.3 and [16, Theorem 1.21], for the mild equi-coercivity condition to hold it suffices to prove that the set of minimizers $\{\omega^k\}$ is relatively compact in the pointwise-in-time $\sigma$-topology. To this aim, the kinetic energies of the curves $\omega^k$ are uniformly bounded since
$$\mathcal{A}(\omega^k) \le \mathcal{A}(\omega^k) + \varepsilon_k^2\,\mathcal{I}(\omega^k) \le \mathcal{A}(\omega^{\bar\varepsilon}) + \varepsilon_k^2\,\mathcal{I}(\omega^{\bar\varepsilon}) \le \mathcal{A}(\omega^{\bar\varepsilon}) + \bar\varepsilon^2\,\mathcal{I}(\omega^{\bar\varepsilon}) < +\infty,$$
where $\omega^{\bar\varepsilon}$ is the minimizer for the problem with $\varepsilon = \bar\varepsilon := \sup_k \varepsilon_k$. Arguing as in the proof of Proposition 4.2, we deduce that there exists a continuous curve $(\omega_t)_{t \in [0,1]}$ connecting $x$ and $y$ such that, up to extracting a suitable subsequence, $\omega^k_t \to \omega_t$ w.r.t. $\sigma$ as $k \to \infty$ for all $t \in [0,1]$.

Remark 4.5.

Note that in Corollary 4.4 the curve $\omega$ is length-minimizing but not necessarily distance-minimizing, namely it need not be a geodesic between $x$ and $y$, since we only know that
$$\inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \mathcal{A}(\gamma) \ge \frac12\, d^2(x, y)$$
and the inequality might be strict, e.g. if $X$ is a non-convex subset of $\mathbb{R}^d$. However, if $(X, d)$ is a length metric space, i.e. for all $x, y \in X$ and $\varepsilon > 0$ there exists $(\gamma_t) \in AC([0,1], X)$ such that $\gamma_0 = x$, $\gamma_1 = y$ and $\ell(\gamma) \le d(x, y) + \varepsilon$, then the inequality above turns out to be an identity and, as a consequence, $\omega$ is a geodesic. This means that for any two points having finite energy there always exists a geodesic connecting them. ■

When the endpoints have infinite entropy, the following variant of Theorem 4.3 may be useful:

Theorem 4.6.

With the same assumptions and notations as in Setting 3.1, let $x, y \in X$ with possibly $E(x), E(y) = +\infty$ and for any fixed $(\varepsilon_n)_{n \in \mathbb{N}}$, $\varepsilon_n \downarrow 0$, let $(\eta_n)_{n \in \mathbb{N}}$ be converging to $0$ slowly enough so that
$$\varepsilon_n\big(E(\gamma^n_0) + E(\gamma^n_1)\big) \to 0 \qquad\text{with}\qquad \gamma^n_0 := S_{\eta_n} x, \quad \gamma^n_1 := S_{\eta_n} y.$$
Then
$$\Gamma\text{-}\lim_{n \to \infty} \big\{\mathcal{A}_{\varepsilon_n} + \iota_n\big\} = \mathcal{A} + \iota,$$
for the uniform convergence on $C([0,1], X)$. If Assumption 3.2 holds, then the Γ-convergence also takes place w.r.t. the pointwise-in-time $\sigma$-topology. Here $\iota_n$ and $\iota$ are the convex indicators of the endpoint constraints for $\gamma^n_0, \gamma^n_1$ and $x, y$, respectively.

Proof. The proof of the $\Gamma$-$\liminf$ is almost identical to that in Theorem 4.3, with the only extra observation that
$$\iota(\gamma) \le \liminf_{n \to \infty} \iota_n(\gamma^n).$$
For the $\Gamma$-$\limsup$, observe that if there does not exist $(\gamma_t) \in AC^2([0,1], X)$ joining $x$ and $y$, then there is nothing to prove. Hence let us suppose that at least one curve $(\gamma_t) \in AC^2([0,1], X)$ connecting $x$ and $y$ exists, fix it and note that Theorem 3.12 applied to the curve $S_{\eta_n}\gamma_t$ still provides a recovery sequence $\gamma^{\varepsilon_n}_t := S_{\eta_n + h_{\varepsilon_n}(t)}\gamma_t$ with the same choice $h_{\varepsilon_n}(t) = \varepsilon_n \min\{t, 1-t\}$ as before. Indeed, on the one hand
$$\begin{aligned}
\limsup_{n \to \infty} \big\{\mathcal{A}_{\varepsilon_n}(\gamma^{\varepsilon_n}) + \iota_n(\gamma^{\varepsilon_n})\big\} &= \limsup_{n \to \infty} \big\{\mathcal{A}(\gamma^{\varepsilon_n}) + \varepsilon_n^2\,\mathcal{I}(\gamma^{\varepsilon_n}) + 0\big\} \\
&\overset{(3.19)}{\le} \limsup_{n \to \infty} \big\{e^{\lambda^-\varepsilon_n}\,\mathcal{A}(S_{\eta_n}\gamma) - 2\varepsilon_n\,E(\gamma^{\varepsilon_n}_{1/2}) + \varepsilon_n\big(E(\gamma^n_0) + E(\gamma^n_1)\big)\big\} \\
&\le \limsup_{n \to \infty} \big\{e^{\lambda^-\varepsilon_n}\,\mathcal{A}(S_{\eta_n}\gamma) + \varepsilon_n\big(E(\gamma^n_0) + E(\gamma^n_1)\big)\big\} - 2\liminf_{n \to \infty} \varepsilon_n\,E(\gamma^{\varepsilon_n}_{1/2}) \\
&\le \limsup_{n \to \infty} \mathcal{A}(S_{\eta_n}\gamma) \le \limsup_{n \to \infty} e^{-2\lambda\eta_n}\,\mathcal{A}(\gamma) = \mathcal{A}(\gamma),
\end{aligned}$$
where the third inequality follows by the same argument adopted in the proof of the previous theorem and the last one is due to (3.18) with $h(t) \equiv \eta_n$. On the other hand, $\gamma^{\varepsilon_n}_t \to \gamma_t$ uniformly in $t \in [0,1]$ in the $d$-topology and, if Assumption 3.2 holds, for all $t \in [0,1]$ w.r.t. $\sigma$: the argument described in the previous proof applies also here verbatim.

As a conclusion, in the next proposition we show that any EVI-gradient flow is a solution of the Schrödinger problem with suitable endpoints. Intuitively this is clear, because up to a rescaling factor $\varepsilon$ both the trajectories of the gradient flow of $E$ and the solutions to (Sch$_\varepsilon$) must formally satisfy the same Newton equation, namely $\ddot\gamma_t = -\nabla\Phi(\gamma_t)$ where the potential $\Phi$ is given by (minus) the Fisher information $-|\partial E|^2$, cf. [33, Remark 6]. This is also in complete analogy with the standard Schrödinger problem, which includes the heat flow as a particular entropic interpolation.

Proposition 4.7.

With the same assumptions and notations as in Setting 3.1, fix $\varepsilon > 0$. Then for all $x, y \in X$ the following lower bound on the optimal value of (Sch$_\varepsilon$) holds
$$\inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \mathcal{A}_\varepsilon(\gamma) \ge \varepsilon\,\big|E(x) - E(y)\big|. \tag{4.2}$$
If either $y = S_\varepsilon x$ or $x = S_\varepsilon y$, then equality is achieved. In the former case the curve $[0,1] \ni t \mapsto \hat\gamma_t := S_{\varepsilon t} x$ is a minimizer in the Schrödinger problem and the optimal value is
$$\inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \mathcal{A}_\varepsilon(\gamma) = \varepsilon\big(E(x) - E(S_\varepsilon x)\big).$$
An analogous statement holds when $x = S_\varepsilon y$.

Proof. By (3.2) and Young's inequality it follows that for any $(\tilde\gamma_t) \in AC^2([0,\varepsilon], X)$ joining $x$ and $y$ (if it exists; if not, (4.2) is trivial) it holds
$$\big|E(\tilde\gamma_0) - E(\tilde\gamma_\varepsilon)\big| \le \frac12\int_0^\varepsilon |\dot{\tilde\gamma}_t|^2\,\mathrm{d}t + \frac12\int_0^\varepsilon |\partial E|^2(\tilde\gamma_t)\,\mathrm{d}t.$$
By setting $\gamma_t := \tilde\gamma_{\varepsilon t}$, $t \in [0,1]$, and by the arbitrariness of $\tilde\gamma$ we thus see that for all $(\gamma_t) \in AC^2([0,1], X)$ joining $x$ and $y$ we have
$$\varepsilon\,\big|E(\gamma_0) - E(\gamma_1)\big| \le \frac12\int_0^1 |\dot\gamma_t|^2\,\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1 |\partial E|^2(\gamma_t)\,\mathrm{d}t,$$
so that
$$\varepsilon\,\big|E(\gamma_0) - E(\gamma_1)\big| \le \inf_{(\gamma_t)\,:\,x \rightsquigarrow y} \mathcal{A}_\varepsilon(\gamma).$$
Now assume that $y = S_\varepsilon x$: integrating (3.4) for the EVI$_\lambda$-gradient flow $\hat\gamma$ (paying attention to the rescaling factor $\varepsilon$) between $0$ and $1$ we get
$$\mathcal{A}_\varepsilon(\hat\gamma) = \frac12\int_0^1 |\dot{\hat\gamma}_t|^2\,\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1 |\partial E|^2(\hat\gamma_t)\,\mathrm{d}t = \varepsilon\big(E(x) - E(y)\big) = \varepsilon\,\big|E(x) - E(y)\big|,$$
where the last equality comes from the fact that $t \mapsto E(S_t x)$ is non-increasing, as a consequence of (3.4). Combining this identity with (4.2) yields the conclusion.

4.2 Displacement convexity

In analogy with Section 2.2, in this short section we establish the geodesic $\lambda$-convexity of $E$. As already explained in the Introduction, although the result is known (cf. [27, Theorem 3.2]), our proof is independent and new and is a further evidence of the wide range of applications of the Schrödinger problem.
Let us stress once more that all the properties of $\mathrm{EVI}_\lambda$-gradient flows stated in Section 3.1 and used so far do not rely on geodesic $\lambda$-convexity, whence the genuine independence of our approach.

Theorem 4.8.

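With the same assumptions and notations as in Setting 3.1, the potential $\mathsf{E}$ is $\lambda$-convex along any geodesic.

Proof. Let $(\gamma_t)$ be any constant-speed geodesic. We want to prove that
\[
\mathsf{E}(\gamma_\theta)\le(1-\theta)\mathsf{E}(\gamma_0)+\theta\,\mathsf{E}(\gamma_1)-\frac{\lambda}{2}\theta(1-\theta)\,\mathsf{d}^2(\gamma_0,\gamma_1),\qquad\forall\theta\in[0,1].
\]
We will establish this inequality by carefully estimating at order one as $\varepsilon\downarrow0$ the defect of optimality, in the geodesic problem from $\gamma_0$ to $\gamma_1$, of a suitably regularized version $(\gamma^\varepsilon_t)$ of the geodesic.

If $\mathsf{E}(\gamma_0)=+\infty$ or $\mathsf{E}(\gamma_1)=+\infty$ there is nothing to prove, so we can assume without loss of generality that both endpoints have finite entropy. If $\theta=0$ or $\theta=1$ the inequality is trivial as well. Fix then an arbitrary parameter $\theta\in(0,1)$ and let
\[
H_\theta(t):=\begin{cases}\dfrac{1}{\theta}\,t & \text{if } t\in[0,\theta],\\[2mm] -\dfrac{1}{1-\theta}\,(t-1) & \text{if } t\in[\theta,1]\end{cases}
\]
be the hat function centered at $t=\theta$ with height 1 and vanishing at $t=0,1$. Setting $h(t):=\varepsilon H_\theta(t)$ for small $\varepsilon>0$, let $(\gamma^\varepsilon_t)$ be the curve constructed as in Lemma 3.6, i.e.
\[
\gamma^\varepsilon_t:=\mathsf{S}_{h(t)}\gamma_t,\qquad\text{for all } t\in[0,1].
\]
Arguing as in the proof of Theorem 3.12, it is easily verified that with the current choice of $h$ it is still true that $t\mapsto|\dot\gamma^\varepsilon_t|$ and $t\mapsto|\partial\mathsf{E}|(\gamma^\varepsilon_t)$ belong to $AC([0,1],\mathrm{X})$ and $t\mapsto\mathsf{E}(\gamma^\varepsilon_t)$ to $AC([0,1])$, so that we can integrate (3.18) in time on the whole interval $[0,1]$. Discarding the non-negative term $|h'(t)|^2|\partial\mathsf{E}|^2(\gamma^\varepsilon_t)$ and using the optimality of the geodesic $\gamma$ (namely its optimality between $\gamma_0$ and $\gamma_1$) give
\[
\begin{aligned}
0&\le\frac12\int_0^1|\dot\gamma^\varepsilon_t|^2\,\mathrm{d}t-\frac12\int_0^1|\dot\gamma_t|^2\,\mathrm{d}t\\
&\overset{(3.18)}{\le}-\int_0^1h'(t)\frac{\mathrm{d}}{\mathrm{d}t}\mathsf{E}(\gamma^\varepsilon_t)\,\mathrm{d}t+\frac12\int_0^1\Big(e^{-2\lambda h(t)}-1\Big)|\dot\gamma_t|^2\,\mathrm{d}t\\
&=-\varepsilon\int_0^1H'_\theta(t)\frac{\mathrm{d}}{\mathrm{d}t}\mathsf{E}(\gamma^\varepsilon_t)\,\mathrm{d}t+\frac{\mathsf{d}^2(\gamma_0,\gamma_1)}{2}\int_0^1\Big(e^{-2\varepsilon\lambda H_\theta(t)}-1\Big)\,\mathrm{d}t,
\end{aligned}
\]
where the last equality follows from the constant speed property of the geodesic $\gamma$, namely $|\dot\gamma_t|=\mathsf{d}(\gamma_0,\gamma_1)$. Dividing by $\varepsilon>0$ and leveraging the explicit piecewise constant values of $H'_\theta(t)$ on each interval $(0,\theta)$ and $(\theta,1)$ gives
\[
\begin{aligned}
0&\le-\int_0^1H'_\theta(t)\frac{\mathrm{d}}{\mathrm{d}t}\mathsf{E}(\gamma^\varepsilon_t)\,\mathrm{d}t+\frac{\mathsf{d}^2(\gamma_0,\gamma_1)}{2}\underbrace{\int_0^1\frac{e^{-2\varepsilon\lambda H_\theta(t)}-1}{\varepsilon}\,\mathrm{d}t}_{:=I_\varepsilon}\\
&=-\int_0^\theta\frac{1}{\theta}\frac{\mathrm{d}}{\mathrm{d}t}\mathsf{E}(\gamma^\varepsilon_t)\,\mathrm{d}t+\int_\theta^1\frac{1}{1-\theta}\frac{\mathrm{d}}{\mathrm{d}t}\mathsf{E}(\gamma^\varepsilon_t)\,\mathrm{d}t+\frac{\mathsf{d}^2(\gamma_0,\gamma_1)}{2}I_\varepsilon\\
&=\frac{1}{\theta}\Big(\mathsf{E}(\gamma_0)-\mathsf{E}(\gamma^\varepsilon_\theta)\Big)+\frac{1}{1-\theta}\Big(\mathsf{E}(\gamma_1)-\mathsf{E}(\gamma^\varepsilon_\theta)\Big)+\frac{\mathsf{d}^2(\gamma_0,\gamma_1)}{2}I_\varepsilon.
\end{aligned}
\]
Now let us multiply by $\theta(1-\theta)>0$ and rearrange the terms in order to get
\[
\mathsf{E}(\gamma^\varepsilon_\theta)\le(1-\theta)\mathsf{E}(\gamma_0)+\theta\,\mathsf{E}(\gamma_1)+\theta(1-\theta)\frac{\mathsf{d}^2(\gamma_0,\gamma_1)}{2}I_\varepsilon.
\]
It is easy to check that $\int_0^1H_\theta(t)\,\mathrm{d}t=\frac12$ for all $\theta$, so that
\[
\lim_{\varepsilon\downarrow0}I_\varepsilon=-2\lambda\int_0^1H_\theta(t)\,\mathrm{d}t=-\lambda.
\]
On the other hand, by definition of $\gamma^\varepsilon$ and since $h(\theta)=\varepsilon\to0$, it is clear that $\gamma^\varepsilon_\theta=\mathsf{S}_{h(\theta)}\gamma_\theta=\mathsf{S}_\varepsilon\gamma_\theta\to\gamma_\theta$ in $\mathrm{X}$ (an $\mathrm{EVI}_\lambda$-gradient flow is continuous up to $t=0$). By lower semicontinuity of $\mathsf{E}$ this yields
\[
\mathsf{E}(\gamma_\theta)\le\liminf_{\varepsilon\downarrow0}\mathsf{E}(\gamma^\varepsilon_\theta)\le(1-\theta)\mathsf{E}(\gamma_0)+\theta\,\mathsf{E}(\gamma_1)-\frac{\lambda}{2}\theta(1-\theta)\,\mathsf{d}^2(\gamma_0,\gamma_1),
\]
whence the conclusion.

5 Derivative of the cost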

As a main application of the $\Gamma$-convergence results contained in Theorem 4.3 and Corollary 4.4 (and, in a wider sense, of their strategy of proof), in this section we investigate the dependence of the optimal value of the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ on the regularization parameter $\varepsilon$, focusing in particular on the regularity as a function of $\varepsilon$ and on the behaviour in the small-time regime. More precisely, denoting by
\[
\mathcal{C}_\varepsilon(x,y):=\inf_{(\gamma_t)\colon x\rightsquigarrow y}\Big\{\mathcal{A}(\gamma)+\varepsilon^2\mathcal{I}(\gamma)\Big\},\qquad\forall\varepsilon\ge0
\]
the optimal entropic cost, we show that $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is (locally) absolutely continuous and admits explicit left and right derivatives in a pointwise sense. Moreover, since $\mathcal{C}_\varepsilon(x,y)\to\mathcal{C}_0(x,y)$ as $\varepsilon\downarrow0$ by Corollary 4.4, we aim at measuring the error $\mathcal{C}_\varepsilon(x,y)-\mathcal{C}_0(x,y)$ and studying the minimizers of the unperturbed problem $\mathcal{C}_0(x,y)$ selected by $\Gamma$-convergence. Since we focus here on the dependence on $\varepsilon$, we will assume throughout the whole Section 5 and without further mention the well-posedness of the $\varepsilon$-Schrödinger problem:

Assumption 5.1.

Fix $x,y\in\mathrm{X}$ and suppose that for some (hence for any, by Proposition 4.2) $\varepsilon>0$ the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ admits at least one minimizer; in other words, the infimum is attained in the definition of $\mathcal{C}_\varepsilon(x,y)$.

We accordingly denote the set of $\varepsilon$-minimizers as
\[
\Lambda_\varepsilon(x,y):=\Big\{\omega\in AC([0,1],\mathrm{X})\,:\,\omega_0=x,\ \omega_1=y\ \text{and}\ \mathcal{A}_\varepsilon(\omega)=\mathcal{C}_\varepsilon(x,y)\Big\}.
\]
Let us start the analysis with a preliminary monotonicity statement for the Fisher information and the entropic cost, which generalizes [25, Lemma 3.3].

Lemma 5.2.

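With the same assumptions and notations as in Setting 3.1 and for any $0\le\varepsilon_1<\varepsilon_2<\infty$ there holds
\[
\inf_{\Lambda_{\varepsilon_1}(x,y)}\mathcal{I}\ \ge\ \sup_{\Lambda_{\varepsilon_2}(x,y)}\mathcal{I},
\]
with possibly $\inf_{\Lambda_0(x,y)}\mathcal{I}=+\infty$. Moreover, $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is monotone non-decreasing on $[0,\infty)$.

Proof. Let $\varepsilon_1,\varepsilon_2$ be as in the statement and choose $\omega^i\in\Lambda_{\varepsilon_i}(x,y)$ for $i=1,2$, so that by optimality
\[
\mathcal{A}(\omega^1)+\varepsilon_1^2\,\mathcal{I}(\omega^1)\le\mathcal{A}(\omega^2)+\varepsilon_1^2\,\mathcal{I}(\omega^2),\qquad
\mathcal{A}(\omega^2)+\varepsilon_2^2\,\mathcal{I}(\omega^2)\le\mathcal{A}(\omega^1)+\varepsilon_2^2\,\mathcal{I}(\omega^1).
\]
Summing these inequalities and dividing by $\varepsilon_2^2-\varepsilon_1^2>0$ we obtain $\mathcal{I}(\omega^1)\ge\mathcal{I}(\omega^2)$, and since $\omega^1\in\Lambda_{\varepsilon_1}$ and $\omega^2\in\Lambda_{\varepsilon_2}$ are arbitrary the desired conclusion follows. As regards the last part of the statement, it is sufficient to note that since the $\omega^i$ are minimizers of their respective problems and $\varepsilon_1<\varepsilon_2$,
\[
\mathcal{C}_{\varepsilon_1}(x,y)=\mathcal{A}(\omega^1)+\varepsilon_1^2\,\mathcal{I}(\omega^1)\le\mathcal{A}(\omega^2)+\varepsilon_1^2\,\mathcal{I}(\omega^2)\le\mathcal{A}(\omega^2)+\varepsilon_2^2\,\mathcal{I}(\omega^2)=\mathcal{C}_{\varepsilon_2}(x,y).
\]

The next result carries the $\Gamma$-convergence statement over from $\varepsilon=0$ to any $\varepsilon\ge0$.

Proposition 5.3.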

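With the same assumptions and notations as in Setting 3.1 and under the additional Assumption 3.2, for any $\varepsilon>0$ there holds
\[
\Gamma\text{-}\lim_{\varepsilon'\to\varepsilon}\Big\{\mathcal{A}_{\varepsilon'}+\iota\Big\}=\mathcal{A}_\varepsilon+\iota \tag{5.1}
\]
for the pointwise-in-time $\sigma$-topology and
\[
\lim_{\varepsilon'\to\varepsilon}\mathcal{C}_{\varepsilon'}(x,y)=\mathcal{C}_\varepsilon(x,y).
\]
Moreover, for any $\varepsilon_k\to\varepsilon$ and any minimizer $\omega^k\in\Lambda_{\varepsilon_k}(x,y)$, there exists $\omega\in\Lambda_\varepsilon(x,y)$ such that, up to a subsequence,
\[
\omega^k_t\overset{\sigma}{\to}\omega_t,\qquad\forall t\in[0,1]
\]
as $k\to\infty$.

Proof. It is sufficient to prove (5.1), as the other properties follow by a verbatim application of the arguments in the proof of Corollary 4.4.

Fix $\varepsilon$ and take $\varepsilon'\to\varepsilon$. The $\Gamma$-$\limsup$ inequality is trivial: if $\gamma^\varepsilon$ is such that the right-hand side of (5.1) is finite (otherwise there is nothing to prove), then the constant sequence $\gamma^{\varepsilon'}\equiv\gamma^\varepsilon$ is an admissible recovery sequence. For the $\Gamma$-$\liminf$ inequality, note that the kinetic action $\mathcal{A}$ and the Fisher information $\mathcal{I}$ are lower semicontinuous w.r.t. pointwise-in-time $\sigma$-convergence (see the proof of Proposition 4.2), and clearly so is the convex indicator. Hence for any $\gamma^{\varepsilon'}$ converging to $\gamma^\varepsilon$ for the pointwise-in-time $\sigma$-topology it holds
\[
\begin{aligned}
\mathcal{A}_\varepsilon(\gamma^\varepsilon)+\iota(\gamma^\varepsilon)
&\le\liminf_{\varepsilon'\to\varepsilon}\Big\{\mathcal{A}_\varepsilon(\gamma^{\varepsilon'})+\iota(\gamma^{\varepsilon'})\Big\}
=\liminf_{\varepsilon'\to\varepsilon}\Big\{\mathcal{A}(\gamma^{\varepsilon'})+\varepsilon^2\mathcal{I}(\gamma^{\varepsilon'})+\iota(\gamma^{\varepsilon'})\Big\}\\
&=\liminf_{\varepsilon'\to\varepsilon}\Big\{\mathcal{A}(\gamma^{\varepsilon'})+(\varepsilon')^2\mathcal{I}(\gamma^{\varepsilon'})+\iota(\gamma^{\varepsilon'})\Big\}
=\liminf_{\varepsilon'\to\varepsilon}\Big\{\mathcal{A}_{\varepsilon'}(\gamma^{\varepsilon'})+\iota(\gamma^{\varepsilon'})\Big\}.
\end{aligned}
\]

As an immediate consequence of this result we deduce the following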

Lemma 5.4.

With the same assumptions and notations as in Setting 3.1 and under Assumption 3.2, the function $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is continuous on $[0,\infty)$. Moreover, if $\varepsilon\mapsto\omega^\varepsilon$ is a continuous (w.r.t. the pointwise-in-time $\sigma$-topology) selection of minimizers, then $\varepsilon\mapsto\mathcal{A}(\omega^\varepsilon)$ and $\varepsilon\mapsto\mathcal{I}(\omega^\varepsilon)$ are also continuous, on $[0,\infty)$ and $(0,\infty)$ respectively.

Note that if the minimizers are unique, then $\varepsilon\mapsto\omega^\varepsilon$ is automatically continuous w.r.t. the pointwise-in-time $\sigma$-topology, simply by Proposition 5.3, as any sequence of minimizers admits a subsequence converging to a minimizer and the limit is in fact unique. Also, the continuity of the Fisher information can be strengthened up to $\varepsilon=0$, see Theorem 5.7 later on.

Proof. The continuity of $\mathcal{C}_\varepsilon(x,y)$ for $\varepsilon>0$ is granted by Proposition 5.3, while continuity at $\varepsilon=0$ has already been proved in Corollary 4.4.

As regards the kinetic energy $\mathcal{A}$ and the Fisher information $\mathcal{I}$, recall that they are both lower semicontinuous in $[0,\infty)$ w.r.t. the pointwise-in-time $\sigma$-topology, as already discussed in the proof of Proposition 4.2. Thus, if $\varepsilon\mapsto\omega^\varepsilon$ is as in the statement, we are left to prove that $\varepsilon\mapsto\mathcal{A}(\omega^\varepsilon)$ and $\varepsilon\mapsto\mathcal{I}(\omega^\varepsilon)$ are upper semicontinuous. To this aim, it is sufficient to observe that
\[
\begin{aligned}
\limsup_{\varepsilon'\to\varepsilon}\mathcal{A}(\omega^{\varepsilon'})
&=\limsup_{\varepsilon'\to\varepsilon}\Big\{\mathcal{C}_{\varepsilon'}(x,y)-(\varepsilon')^2\mathcal{I}(\omega^{\varepsilon'})\Big\}
\le\limsup_{\varepsilon'\to\varepsilon}\mathcal{C}_{\varepsilon'}(x,y)-\liminf_{\varepsilon'\to\varepsilon}(\varepsilon')^2\mathcal{I}(\omega^{\varepsilon'})\\
&\le\mathcal{C}_\varepsilon(x,y)-\varepsilon^2\mathcal{I}(\omega^\varepsilon)=\mathcal{A}(\omega^\varepsilon),
\end{aligned}
\]
where the last inequality holds by the continuity of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ and the lower semicontinuity of $\varepsilon\mapsto\mathcal{I}(\omega^\varepsilon)$. Thus $\varepsilon\mapsto\mathcal{A}(\omega^\varepsilon)$ is upper semicontinuous in $[0,\infty)$. Interchanging $\mathcal{A}$ and $\mathcal{I}$ and writing now $\mathcal{I}=\varepsilon^{-2}(\mathcal{C}_\varepsilon-\mathcal{A})$, the same argument shows that $\varepsilon\mapsto\mathcal{I}(\omega^\varepsilon)$ is upper semicontinuous in $(0,\infty)$ (continuity at $\varepsilon=0$ will require a special treatment later).

We now have all the ingredients to discuss the regularity of the cost $\mathcal{C}_\varepsilon(x,y)$ as a function of the noise parameter $\varepsilon$ and explicitly compute its left and right derivatives.

Proposition 5.5.

With the same assumptions and notations as in Setting 3.1 and if Assumption 3.2 holds, the map $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is $AC_{loc}([0,\infty))$, left and right differentiable everywhere in $(0,\infty)$ and, for any $\varepsilon>0$, the left and right derivatives are given by
\[
\frac{\mathrm{d}^-}{\mathrm{d}\varepsilon}\mathcal{C}_\varepsilon(x,y)=2\varepsilon\max_{\Lambda_\varepsilon(x,y)}\mathcal{I},\qquad
\frac{\mathrm{d}^+}{\mathrm{d}\varepsilon}\mathcal{C}_\varepsilon(x,y)=2\varepsilon\min_{\Lambda_\varepsilon(x,y)}\mathcal{I} \tag{5.2}
\]
respectively, and the former (resp. latter) is left (resp. right) continuous. The fact that the maximum and the minimum are attained is part of the statement.

Remark 5.6.

Heuristically, (5.2) is nothing but the envelope theorem. Indeed, if $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ were differentiable, then its derivative would be given by $\partial_\varepsilon\mathcal{A}_\varepsilon=2\varepsilon\mathcal{I}$ evaluated at any critical point, i.e. at any $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$. However, since we do not know in our general metric framework that the Schrödinger problem has a unique solution, we are not able to prove pointwise differentiability as in [25] and we have to face the possibility of a gap between the left and right derivatives. In any case, for a.e. $\varepsilon>0$ this gap is zero, because $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is locally absolutely continuous in $(0,\infty)$ and therefore a.e. differentiable. This means that, up to a negligible set of temperatures, the left and right derivatives match and $\mathcal{I}$ is constant on $\Lambda_\varepsilon(x,y)$. If for whatever reason the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ were uniquely solvable (which is in particular true for the classic Schrödinger problem, as proved in [39, Theorem 4.2]), then the left and right derivatives would be trivially equal and Lemma 5.4 would give that $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is actually $C^1((0,\infty))$. Furthermore, the cost would also be twice differentiable a.e., since by Lemma 5.2 its first derivative $2\varepsilon\,\mathcal{I}(\omega^\varepsilon)$ would be the product of a linear function and of a monotone one. $\blacksquare$

Proof.

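The continuity of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ follows by Lemma 5.4, so let us focus on left and right differentiability/continuity and local absolute continuity.

Right differentiability. Fix $\varepsilon>0$, let $\delta>0$, and choose $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$, $\omega^{\varepsilon+\delta}\in\Lambda_{\varepsilon+\delta}(x,y)$. Then write
\[
\begin{aligned}
\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}
&=\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta})-\mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}\\
&=\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta})-\mathcal{A}_{\varepsilon+\delta}(\omega^\varepsilon)}{\delta}+\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^\varepsilon)-\mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}
\end{aligned} \tag{5.3}
\]
and note that the second term on the right-hand side can be rewritten via
\[
\mathcal{A}_{\varepsilon+\delta}(\omega^\varepsilon)-\mathcal{A}_\varepsilon(\omega^\varepsilon)=(2\varepsilon\delta+\delta^2)\,\mathcal{I}(\omega^\varepsilon).
\]
The first one is non-positive by optimality of $\omega^{\varepsilon+\delta}$ for $\mathcal{A}_{\varepsilon+\delta}$, hence we obtain
\[
\limsup_{\delta\downarrow0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}\le\limsup_{\delta\downarrow0}\,(2\varepsilon+\delta)\,\mathcal{I}(\omega^\varepsilon)=2\varepsilon\,\mathcal{I}(\omega^\varepsilon).
\]
As this inequality holds for any $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$, we infer that
\[
\limsup_{\delta\downarrow0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}\le2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}. \tag{5.4}
\]
On the other hand we can also write
\[
\begin{aligned}
\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}
&=\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta})-\mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}\\
&=\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta})-\mathcal{A}_\varepsilon(\omega^{\varepsilon+\delta})}{\delta}+\frac{\mathcal{A}_\varepsilon(\omega^{\varepsilon+\delta})-\mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}.
\end{aligned} \tag{5.5}
\]
Using now the optimality of $\omega^\varepsilon$ for $\mathcal{A}_\varepsilon$, we observe that the second term on the right-hand side is non-negative, whence
\[
\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta})-\mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}\ge\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta})-\mathcal{A}_\varepsilon(\omega^{\varepsilon+\delta})}{\delta}=(2\varepsilon+\delta)\,\mathcal{I}(\omega^{\varepsilon+\delta}).
\]
For any sequence $\delta_n\downarrow0$, Proposition 5.3 guarantees (up to extraction of a subsequence if needed) that $\omega^{\varepsilon+\delta_n}\to\omega^\varepsilon$ in the pointwise-in-time $\sigma$-topology for some $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$. By lower semicontinuity of $\mathcal{I}$ this implies
\[
\liminf_{n\to\infty}\frac{\mathcal{C}_{\varepsilon+\delta_n}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta_n}\ge\liminf_{n\to\infty}\,(2\varepsilon+\delta_n)\,\mathcal{I}(\omega^{\varepsilon+\delta_n})\ge2\varepsilon\,\mathcal{I}(\omega^\varepsilon)\ge2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I},
\]
and together with (5.4) this yields
\[
\exists\lim_{n\to\infty}\frac{\mathcal{C}_{\varepsilon+\delta_n}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta_n}=2\varepsilon\,\mathcal{I}(\omega^\varepsilon)=2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]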
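As the right-hand side does not depend on the particular sequence $\delta_n\downarrow0$ we conclude that
\[
\exists\lim_{\delta\downarrow0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}=2\varepsilon\min_{\Lambda_\varepsilon(x,y)}\mathcal{I};
\]
in particular $\mathcal{I}$ is minimized by any accumulation point $\omega^\varepsilon$ of $\{\omega^{\varepsilon+\delta}\}_{\delta>0}$.

Left differentiability. The argument is very similar. Indeed, if $\delta<0$, then the first term on the right-hand side of (5.3) is non-negative and the second one can be handled in the same way. Hence there holds
\[
\liminf_{\delta\uparrow0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}\ge2\varepsilon\,\mathcal{I}(\omega^\varepsilon)
\]
for any $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$, and therefore
\[
\liminf_{\delta\uparrow0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}\ge2\varepsilon\sup_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
Applying the same considerations to (5.5) and following the same argument as above we retrieve the $\limsup$ inequality, first along some subsequence $\delta_n\uparrow0$ and then along any $\delta\uparrow0$. Combining with the inequality above gives
\[
\exists\lim_{\delta\uparrow0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}=2\varepsilon\max_{\Lambda_\varepsilon(x,y)}\mathcal{I},\qquad\forall\varepsilon>0,
\]
whence the pointwise left differentiability of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$.

Left and right continuity. In order to prove the right continuity of the right derivative of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$, note that on the one hand by Lemma 5.2 for any $\varepsilon_n\downarrow\varepsilon$ it holds
\[
\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}\ge\limsup_{n\to\infty}\sup_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}\ge\limsup_{n\to\infty}\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}.
\]
On the other hand, we can assume up to a subsequence if needed that
\[
\liminf_{n\to\infty}\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}=\lim_{n\to\infty}\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}.
\]
As shown in the proof of right differentiability, $\inf_{\Lambda_{\varepsilon'}(x,y)}\mathcal{I}$ is attained for any $\varepsilon'>0$, hence in particular $\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}=\mathcal{I}(\omega^n)$ for some $\omega^n\in\Lambda_{\varepsilon_n}(x,y)$, for all $n$. Up to extracting a further subsequence, by Proposition 5.3 we can assume that $\omega^n\to\omega^\varepsilon$ w.r.t. the pointwise-in-time $\sigma$-topology for some $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$, and moreover by Lemma 5.4
\[
\lim_{n\to\infty}\mathcal{I}(\omega^n)=\mathcal{I}(\omega^\varepsilon)\ge\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
Putting all these inequalities together provides us with the right continuity of $\varepsilon\mapsto\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ and, a fortiori, of the right derivative.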
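Left continuity for the left derivative follows along an analogous reasoning.

Local absolute continuity. Let $0<\varepsilon_1<\varepsilon_2<\infty$ and, for any $0<\delta<1$, define
\[
f_\delta(\varepsilon):=\frac{\mathcal{C}_{\varepsilon+\delta}(x,y)-\mathcal{C}_\varepsilon(x,y)}{\delta}.
\]
The monotonicity of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ from Lemma 5.2 gives $f_\delta\ge0$. Arguing as in the very beginning of the proof of the right differentiability we see that $f_\delta(\varepsilon)\le(2\varepsilon+1)\,\mathcal{I}(\omega^\varepsilon)$ for any $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$, and by Lemma 5.2
\[
f_\delta(\varepsilon)\le(2\varepsilon_2+1)\sup_{\Lambda_{\varepsilon_1}(x,y)}\mathcal{I}<\infty,\qquad\forall\varepsilon\in(\varepsilon_1,\varepsilon_2].
\]
Hence $|f_\delta|\le M$ uniformly in $\delta$ and $f_\delta$ converges pointwise to the right derivative of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ as $\delta\downarrow0$, whence by the dominated convergence theorem
\[
\int_{\varepsilon_1}^{\varepsilon_2}\frac{\mathrm{d}^+}{\mathrm{d}\varepsilon}\mathcal{C}_\varepsilon(x,y)\,\mathrm{d}\varepsilon=\lim_{\delta\downarrow0}\int_{\varepsilon_1}^{\varepsilon_2}f_\delta(\varepsilon)\,\mathrm{d}\varepsilon.
\]
The right-hand side can be rewritten as
\[
\begin{aligned}
\lim_{\delta\downarrow0}\int_{\varepsilon_1}^{\varepsilon_2}f_\delta(\varepsilon)\,\mathrm{d}\varepsilon
&=\lim_{\delta\downarrow0}\Big(\frac1\delta\int_{\varepsilon_1}^{\varepsilon_2}\mathcal{C}_{\varepsilon+\delta}(x,y)\,\mathrm{d}\varepsilon-\frac1\delta\int_{\varepsilon_1}^{\varepsilon_2}\mathcal{C}_\varepsilon(x,y)\,\mathrm{d}\varepsilon\Big)\\
&=\lim_{\delta\downarrow0}\Big(\frac1\delta\int_{\varepsilon_2}^{\varepsilon_2+\delta}\mathcal{C}_\varepsilon(x,y)\,\mathrm{d}\varepsilon-\frac1\delta\int_{\varepsilon_1}^{\varepsilon_1+\delta}\mathcal{C}_\varepsilon(x,y)\,\mathrm{d}\varepsilon\Big)
=\mathcal{C}_{\varepsilon_2}(x,y)-\mathcal{C}_{\varepsilon_1}(x,y),
\end{aligned}
\]
where the last equality holds by the Lebesgue differentiation theorem for the continuous function $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ (cf. Lemma 5.4). We have thus proved that the cost belongs to $AC_{loc}((0,\infty))$, since
\[
\mathcal{C}_{\varepsilon_2}(x,y)-\mathcal{C}_{\varepsilon_1}(x,y)=\int_{\varepsilon_1}^{\varepsilon_2}\frac{\mathrm{d}^+}{\mathrm{d}\varepsilon}\mathcal{C}_\varepsilon(x,y)\,\mathrm{d}\varepsilon,\qquad\forall\,0<\varepsilon_1<\varepsilon_2.
\]
For the full $AC_{loc}([0,\infty))$ regularity it is then sufficient to let $\varepsilon_1\downarrow0$: the left-hand side converges to $\mathcal{C}_{\varepsilon_2}(x,y)-\mathcal{C}_0(x,y)$ by Lemma 5.4, and by the monotonicity $\frac{\mathrm{d}^+}{\mathrm{d}\varepsilon}\mathcal{C}_\varepsilon\ge0$ the right-hand side also converges by monotone convergence.

Relying on our previous auxiliary results and on Proposition 5.5, we are finally in a position to estimate the error $\mathcal{C}_\varepsilon(x,y)-\mathcal{C}_0(x,y)$ with $o(\varepsilon^2)$ precision. We will also significantly refine Corollary 4.4 by proving that any accumulation point of any sequence of minimizers is not only optimal for the unperturbed problem $\mathcal{C}_0(x,y)$, but also $\mathcal{I}$-minimizing among all competitors in $\Lambda_0(x,y)$.

Theorem 5.7.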

With the same assumptions and notations as in Proposition 5.5, if there exists $\omega\in\Lambda_0(x,y)$ such that $\mathcal{I}(\omega)<\infty$, then the map $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ is right differentiable also at $\varepsilon=0$ with
\[
\frac{\mathrm{d}^+}{\mathrm{d}\varepsilon}\mathcal{C}_\varepsilon(x,y)\Big|_{\varepsilon=0}=0,
\]
the right derivative is right continuous for any $\varepsilon\ge0$, and
\[
\mathcal{C}_\varepsilon(x,y)-\mathcal{C}_0(x,y)=\varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I}+o(\varepsilon^2). \tag{5.6}
\]
Moreover, for any $\varepsilon_n\downarrow0$ and any minimizer $\omega^n\in\Lambda_{\varepsilon_n}(x,y)$ there exists $\omega^*\in\Lambda_0(x,y)$ such that (up to a subsequence) $\omega^n\to\omega^*$ for the pointwise-in-time $\sigma$-topology, and $\omega^*$ has minimal Fisher information in $\Lambda_0(x,y)$:
\[
\mathcal{I}(\omega^*)=\min_{\Lambda_0(x,y)}\mathcal{I}.
\]

Proof. The right differentiability of $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ at $\varepsilon=0$ follows by the same argument carried out in Proposition 5.5. Indeed, given $\omega$ as in the statement, by (5.3) with $\varepsilon=0$ it holds
\[
\limsup_{\delta\downarrow0}\frac{\mathcal{C}_\delta(x,y)-\mathcal{C}_0(x,y)}{\delta}\le\limsup_{\delta\downarrow0}\,\delta\,\mathcal{I}(\omega)=0.
\]
The $\liminf$ inequality is straightforward, since $\mathcal{I}\ge0$ and thus by (5.5) with $\varepsilon=0$
\[
\liminf_{\delta\downarrow0}\frac{\mathcal{C}_\delta(x,y)-\mathcal{C}_0(x,y)}{\delta}\ge\liminf_{\delta\downarrow0}\,\delta\,\mathcal{I}(\omega^\delta)\ge0
\]
for any $\omega^\delta\in\Lambda_\delta(x,y)$. This also shows that the right derivative vanishes at $\varepsilon=0$.

As regards the right continuity of the right derivative, the case $\varepsilon>0$ has already been discussed in Proposition 5.5. For $\varepsilon=0$ the same strategy still works, with the only minor difference that we cannot rely on Lemma 5.4 anymore. Nonetheless, if $\omega^n\in\Lambda_{\varepsilon_n}(x,y)$ is as in Proposition 5.5, $\omega\in\Lambda_0(x,y)$ and $\omega^n\to\omega$ for the pointwise-in-time $\sigma$-topology (the existence of such $\omega$ is granted by Corollary 4.4), it is still true that
\[
\liminf_{n\to\infty}\mathcal{I}(\omega^n)\ge\mathcal{I}(\omega),
\]
simply by lower semicontinuity of $\mathcal{I}$. With this single change in the proof we deduce that $\varepsilon\mapsto\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ is right continuous and finite also at $\varepsilon=0$, thanks to the present assumptions, and so is the right derivative of the cost due to $2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}\to0$ as $\varepsilon\downarrow0$.

The last part of the statement is a slight modification of these lines of thought. Indeed, given any sequence $\varepsilon_n\downarrow0$ and $\omega^n\in\Lambda_{\varepsilon_n}(x,y)$, the existence of $\omega^*\in\Lambda_0(x,y)$ such that, up to subsequences, $\omega^n\to\omega^*$ is ensured by Corollary 4.4. The fact that $\omega^*$ has minimal Fisher information among all elements in $\Lambda_0(x,y)$ follows from
\[
\inf_{\Lambda_0(x,y)}\mathcal{I}\ge\limsup_{n\to\infty}\sup_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}\ge\limsup_{n\to\infty}\mathcal{I}(\omega^n)\ge\liminf_{n\to\infty}\mathcal{I}(\omega^n)\ge\mathcal{I}(\omega^*)\ge\inf_{\Lambda_0(x,y)}\mathcal{I},
\]
where we used once again Lemma 5.2 and the lower semicontinuity of $\mathcal{I}$.

Thus, it only remains to establish (5.6). As $\varepsilon\mapsto\mathcal{C}_\varepsilon(x,y)$ belongs to $AC_{loc}([0,\infty))$ and the right derivative coincides a.e. with the full derivative, (5.2) and the fundamental theorem of calculus yield
\[
\mathcal{C}_\varepsilon(x,y)-\mathcal{C}_0(x,y)=2\int_0^\varepsilon s\inf_{\Lambda_s(x,y)}\mathcal{I}\,\mathrm{d}s\le2\int_0^\varepsilon s\inf_{\Lambda_0(x,y)}\mathcal{I}\,\mathrm{d}s=\varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I}.
\]
Here we used the monotonicity of the Fisher information from Lemma 5.2 in the middle inequality.
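By the same monotonicity and the right continuity at $\varepsilon=0$ of $\varepsilon\mapsto\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ we also deduce that
\[
\begin{aligned}
\mathcal{C}_\varepsilon(x,y)-\mathcal{C}_0(x,y)&\ge2\int_0^\varepsilon s\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}\,\mathrm{d}s=\varepsilon^2\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}\\
&=\varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I}+\varepsilon^2\Big(\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}-\inf_{\Lambda_0(x,y)}\mathcal{I}\Big)=\varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I}+o(\varepsilon^2).
\end{aligned}
\]
Combining this lower bound with the previous upper one entails (5.6).

Remark 5.8.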

It is worth stressing that the upper bound on $\mathcal{C}_\varepsilon(x,y)-\mathcal{C}_0(x,y)$ is not asymptotic, but pointwise. A possible way to improve (5.6) would rely on a refined analysis of $\varepsilon\mapsto\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$, its derivative (which exists a.e. by monotonicity), and possibly absolute continuity. $\blacksquare$

Remark 5.9.

The $\mathcal{I}$-minimizing property of the accumulation point $\omega^*$ is not specific to the choice $\varepsilon=0$, but to the particular "backward" direction of the sequence $\varepsilon_n\downarrow0$. Repeating the argument in the proof of Theorem 5.7 it is indeed not difficult to check that, given any $\varepsilon>0$, a sequence $\varepsilon_n\downarrow\varepsilon$, and $\omega^n\in\Lambda_{\varepsilon_n}(x,y)$, there exists $\omega^\varepsilon\in\Lambda_\varepsilon(x,y)$ such that, up to a subsequence, $\omega^n\to\omega^\varepsilon$ for the pointwise-in-time $\sigma$-topology and
\[
\mathcal{I}(\omega^\varepsilon)=\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
In a symmetric fashion, a closer look into the proof of Proposition 5.5 suggests that an opposite behaviour appears in the "forward" direction. More precisely, if $\varepsilon_n\uparrow\varepsilon$ instead of $\varepsilon_n\downarrow\varepsilon$, then any accumulation point $\omega^\varepsilon$ of $(\omega^n)$ is such that
\[
\mathcal{I}(\omega^\varepsilon)=\sup_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
However, the "backward" direction and the case $\varepsilon=0$ are usually more interesting, because of the connection with the unperturbed problem $\mathcal{C}_0(x,y)$, for which there might be multiple solutions even if the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ has a unique minimizer for all $\varepsilon>0$. It is therefore natural to look for the (properties of the) solutions selected via Schrödinger regularization. $\blacksquare$

6 Examples

In this section we collect several heterogeneous situations where our abstract approach (in particular Theorem 4.3, Corollary 4.4, and Theorem 5.7) applies. We shall also comment on the novelty of the results thus obtained in comparison with the existing literature. In this perspective, it is worth discussing in more detail the role played by Assumption 3.2 so far, singling out when the sequential lower semicontinuity of $|\partial\mathsf{E}|$ w.r.t. $\sigma$ is needed and when it is not:

• to show the existence of a solution to the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ (cf. Proposition 4.2) it is crucial, in order to apply the direct method of the calculus of variations;
• in Theorem 4.3 and Corollary 4.4 it is not used;
• unlike Corollary 4.4, in Proposition 5.3 it is needed for the $\Gamma$-liminf inequality, and so it is in Lemma 5.4;
• Proposition 5.5 relies on Proposition 5.3 and Lemma 5.4, hence it is implicitly used;
• in Theorem 5.7 the continuity of $\varepsilon\mapsto\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ at $\varepsilon=0$ requires the lower semicontinuity of $|\partial\mathsf{E}|$, and Proposition 5.5 is also used in the proof of (5.6); hence the lower semicontinuity of $|\partial\mathsf{E}|$ is really needed.

This means that if one is able to show the solvability of the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ by means other than those used in Proposition 4.2, then Theorem 4.3 and Corollary 4.4 are still valid under the following weaker hypothesis.

Assumption 6.1.

There exists a Hausdorff topology $\sigma$ on $\mathrm{X}$ such that $\mathsf{d}$-bounded sequences contain $\sigma$-converging subsequences. Moreover, the distance $\mathsf{d}$ is sequentially lower semicontinuous w.r.t. $\sigma$.

Theorem 5.7, instead, requires the full validity of Assumption 3.2.
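
As a sanity check of the statements of Sections 4 and 5, one can look at a toy case that is not among the examples treated in this paper: the real line $\mathrm{X}=\mathbb{R}$ with $\mathsf{E}(x)=x^2/2$, so that $|\partial\mathsf{E}|(x)=|x|$ and $\mathsf{S}_tx=e^{-t}x$. Assuming that this case fits Setting 3.1, the Newton equation reads $\ddot\gamma=\varepsilon^2\gamma$ and both the minimizer and the cost are explicit. The closed forms and all function names below are our own illustration under this assumption, not formulas taken from the text; the sketch only checks numerically the lower bound (4.2) with its equality case, the derivative formula (5.2), the expansion (5.6), and the monotonicity of Lemma 5.2.

```python
import math


def energy(x):
    """Toy entropy E(x) = x^2 / 2 on the real line (an assumption of this sketch)."""
    return 0.5 * x * x


def cost(eps, x, y):
    """Closed-form optimal value C_eps(x, y) of
    1/2 * int |gamma'|^2 dt + eps^2/2 * int gamma^2 dt over curves from x to y."""
    if eps == 0.0:
        return 0.5 * (x - y) ** 2  # geodesic cost: half the squared distance
    return eps / (2.0 * math.sinh(eps)) * ((x * x + y * y) * math.cosh(eps) - 2.0 * x * y)


def minimizer(eps, x, y, t):
    """Unique minimizer at time t: solves gamma'' = eps^2 * gamma, gamma(0)=x, gamma(1)=y."""
    if eps == 0.0:
        return x + t * (y - x)
    return (x * math.sinh(eps * (1.0 - t)) + y * math.sinh(eps * t)) / math.sinh(eps)


def fisher(eps, x, y, n=2000):
    """I(omega) = 1/2 * int_0^1 |dE|^2(omega_t) dt along the eps-minimizer,
    approximated with the midpoint rule on n cells."""
    h = 1.0 / n
    return 0.5 * h * sum(minimizer(eps, x, y, (k + 0.5) * h) ** 2 for k in range(n))


def right_derivative(eps, x, y, h=1e-5):
    """Central finite difference of eps -> C_eps(x, y) (the cost is smooth here)."""
    return (cost(eps + h, x, y) - cost(eps - h, x, y)) / (2.0 * h)
```

For instance, `right_derivative(0.5, 2.0, 1.0)` can be compared with `2 * 0.5 * fisher(0.5, 2.0, 1.0)`, and `(cost(eps, 2.0, 1.0) - cost(0.0, 2.0, 1.0)) / eps**2` with the Fisher information of the straight segment from 2 to 1, which equals $(x^2+xy+y^2)/6=7/6$. Since minimizers are unique in this toy case, the left and right derivatives in (5.2) coincide.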

As a first example, let us consider the Boltzmann-Shannon relative entropy on the Wasserstein space built over a (locally compact) $\mathrm{RCD}(K,\infty)$ space. To this end, let $(M,\mathsf{d},\mathfrak{m})$ be a complete and separable locally compact length metric space endowed with a Radon measure and assume that it is an $\mathrm{RCD}(K,\infty)$ space [4] for some $K\in\mathbb{R}$. Let $\mathrm{X}:=\mathcal{P}_2(M)$ be the 2-Wasserstein space over $M$, namely the space of probability measures with finite second moments, and equip it with the 2-Wasserstein distance $W_2$: it turns out to be a complete and separable metric space [14] as well. The Boltzmann-Shannon relative entropy $\mathsf{E}$ on $\mathrm{X}$ is defined as
\[
\mathsf{E}(\mu):=\begin{cases}\displaystyle\int_M\rho\log(\rho)\,\mathrm{d}\mathfrak{m} & \text{if }\mu=\rho\,\mathfrak{m},\\[2mm] +\infty & \text{if }\mu\not\ll\mathfrak{m}.\end{cases}
\]
As by [61, Theorem 4.24] there exist $C>0$, $x\in M$ such that $\int_Me^{-C\mathsf{d}^2(\cdot,x)}\,\mathrm{d}\mathfrak{m}<\infty$, $\mathsf{E}$ can be equivalently rewritten as
\[
\mathsf{E}(\mu)=\underbrace{\int_M\tilde\rho\log(\tilde\rho)\,\mathrm{d}\tilde{\mathfrak{m}}}_{\ge0}-C\int_M\mathsf{d}^2(\cdot,x)\,\mathrm{d}\mu-\log Z,
\]
where $\tilde\rho$ is the Radon-Nikodym derivative of $\mu$ w.r.t. $\tilde{\mathfrak{m}}$, with the normalization
\[
Z:=\int_Me^{-C\mathsf{d}^2(\cdot,x)}\,\mathrm{d}\mathfrak{m},\qquad\tilde{\mathfrak{m}}:=\frac1Z\,e^{-C\mathsf{d}^2(\cdot,x)}\,\mathfrak{m}.
\]
From this very definition, it is easy to see that $\mathsf{E}$ is a proper lower semicontinuous functional, bounded from below on $W_2$-bounded sets. In addition, it has a dense domain, since by (one of the equivalent) definitions of RCD spaces, cf. [4, Theorem 5.1], for any $\mu\in\mathrm{X}$ there exists an $\mathrm{EVI}_K$-gradient flow of $\mathsf{E}$ starting from it. Thus Setting 3.1 holds.

As regards Assumption 6.1, note that $\mathrm{X}$ is not locally compact unless $M$ is compact, so that in general the metric topology of $\mathrm{X}$ is not an admissible candidate for $\sigma$. Nonetheless there is a natural alternative: the narrow convergence of probability measures. Indeed, $W_2$-bounded sequences in $\mathrm{X}$ are uniformly tight (the second moments are uniformly bounded and the balls in $M$ are relatively compact, so that the claim follows from [2, Remark 5.1.5]) and thus relatively compact w.r.t. the narrow topology. Moreover, $W_2$ is lower semicontinuous w.r.t. narrow convergence of measures [1, Proposition 3.5]. Therefore, given any $\mu,\nu\in\mathrm{X}$ for which the dynamical Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ is solvable, the $\Gamma$-convergence results of Section 4 are fully applicable.
This is for instance the case if $\mu,\nu\ll\mathfrak{m}$ have bounded densities and supports (in [38, 39] this is proved for $\mathrm{RCD}^*(K,N)$ spaces, $N<\infty$, but the argument can be adapted to locally compact $\mathrm{RCD}(K,\infty)$ spaces thanks to the existence of "good" cut-off functions [52]).

In the present framework, taking into account the equivalence between $W_2$-absolutely continuous curves and distributional solutions of the continuity equation (see [37]) and the fact that the slope $|\partial\mathsf{E}|$ coincides with the Fisher information [3, Theorem 9.3], $(\mathrm{Sch}_\varepsilon)$ reads as
\[
\inf\Big\{\frac12\int_0^1\!\!\int_M|v_t|^2\rho_t\,\mathrm{d}\mathfrak{m}\,\mathrm{d}t+\frac{\varepsilon^2}{2}\int_0^1\!\!\int_M|\nabla\log\rho_t|^2\rho_t\,\mathrm{d}\mathfrak{m}\,\mathrm{d}t\Big\},
\]
where the infimum runs over all couples $(\mu_t,v_t)$, $\mu_t=\rho_t\mathfrak{m}$, solving the continuity equation $\partial_t\mu_t+\mathrm{div}(v_t\mu_t)=0$ with the constraint $\mu_0=\mu$ and $\mu_1=\nu$. This is the dynamical formulation of the "classical" Schrödinger problem [45]. A thorough study of this problem and its equivalent formulations (at the static, dual, and dynamical levels) has been carried out by the second author in [39], but in the more restrictive framework of $\mathrm{RCD}^*(K,N)$ spaces, and only for $\varepsilon$ fixed. The behaviour of the (unique) minimizers as $\varepsilon\downarrow0$ is instead studied in [38, Proposition 5.1], again only in $\mathrm{RCD}^*(K,N)$ spaces, but the $\Gamma$-convergence of the corresponding variational problems is not investigated. Hence Theorem 4.3 and Corollary 4.4 are new in the RCD framework.

Under the stronger assumption that $M$ is compact (e.g. the torus, the sphere or any convex closed bounded subset of a smooth weighted Riemannian manifold), as said above we can choose the $\sigma$ topology to be the strong one induced by $W_2$, and in this case $|\partial\mathsf{E}|$ is lower semicontinuous by (3.1) and Assumption 3.2 is fully satisfied. Another interesting situation where Assumption 3.2 fully holds is represented by a convex domain in $\mathbb{R}^d$ (in this case $\sigma$ is, as before, the narrow topology; see [35, Lemma 2.4] for a proof of the narrow lower semicontinuity of $|\partial\mathsf{E}|$).
As a consequence, in these examples also the results in Section 5 hold true and this partly extends the recent work [25], where an analogue of Theorem 5.7 is proved in the Riemannian setting.

As a second class of examples, we consider generalized entropy functionals (usually called internal energies) on the Wasserstein space built over an $\mathrm{RCD}^*(0,N)$ space, $N<\infty$. Taking advantage of the non-negative curvature assumption and of the finite dimensionality we shall indeed be able to cover a wide range of functionals, including in particular Rényi entropies (naturally linked to the porous medium equation). By the discussion carried out in the previous section and by the fact that $\mathrm{RCD}^*(K,N)$ spaces are in particular locally compact $\mathrm{RCD}(K,\infty)$ spaces (the notion of $\mathrm{RCD}^*(K,N)$ space is first introduced in [36]; for the distinction between $\mathrm{RCD}$ and $\mathrm{RCD}^*$ conditions see [6] and [21]), the 2-Wasserstein space $\mathrm{X}:=\mathcal{P}_2(M)$ over an $\mathrm{RCD}^*(0,N)$ space $(M,\mathsf{d},\mathfrak{m})$ endowed with the 2-Wasserstein distance $W_2$ is a complete and separable metric space.

As regards the entropy functionals we shall consider on $\mathrm{X}$, they are of the form $\int_MU(\rho)\,\mathrm{d}\mathfrak{m}$, where $U:[0,\infty)\to\mathbb{R}$ is a continuous and convex function with $U(0)=0$ and $U'$ locally Lipschitz in $(0,\infty)$ satisfying McCann's condition [49] for some $N'\ge N$: this means that the corresponding pressure function $P(r):=rU'(r)-U(r)$ is such that $P(0):=\lim_{r\downarrow0}P(r)=0$ and $r\mapsto r^{-1+1/N'}P(r)$ is non-decreasing or, equivalently, $r\mapsto r^{N'}U(r^{-N'})$ is convex and non-increasing on $(0,\infty)$. Under these assumptions on $U$, the internal energy $\mathsf{E}$ is defined as
\[
\mathsf{E}(\mu):=\int_MU(\rho)\,\mathrm{d}\mathfrak{m}+U'(\infty)\,\mu^\perp(M),\qquad\text{if }\mu=\rho\,\mathfrak{m}+\mu^\perp,\ \mu^\perp\perp\mathfrak{m}, \tag{6.1}
\]
where $U'(\infty):=\lim_{r\to\infty}U'(r)$. In the case $U$ is chosen equal to
\[
U_{N'}(r):=-N'\big(r^{1-1/N'}-r\big),\quad N'\ge N\qquad\text{or}\qquad U_m(r):=\frac{1}{m-1}\,r^m,\quad m\ge1-\frac1N
\]
($U_{N'}$ being more linked to the Lott-Sturm-Villani theory of curvature-dimension bounds, $U_m$ to the porous medium equation of power $m$), the well-known Rényi entropy is recovered. A detailed discussion of internal energies like $\mathsf{E}$, associated non-linear diffusion semigroups and evolution variational inequalities in connection with curvature-dimension conditions is at the heart of the monograph [5] and can also be found in [63, Chapters 16 and 17].

Since $U(0)=0$, $M$ is locally compact and $U$ is continuous, it is clear that $\mathsf{E}$ is well defined and finite on all probability measures with bounded support, so that $\mathsf{E}$ is proper and has a dense domain in $\mathrm{X}$. Actually, $D(\mathsf{E})$ is dense in energy in $\mathrm{X}$, i.e. for all $\mu\in\mathrm{X}$ there exist $\mu_n\in D(\mathsf{E})$ with $W_2(\mu_n,\mu)\to0$ and $\mathsf{E}(\mu_n)\to\mathsf{E}(\mu)$ as $n\to\infty$. By the properties of $U$ it is also easy to see that $\mathsf{E}$ is lower semicontinuous [63, Theorem 30.6] and bounded from below on $W_2$-bounded sets.
Finally, from [5, Theorem 9.21] with $K=0$ (since $M$ is assumed to be an $\mathrm{RCD}^*(0,N)$ space) and the fact that $D(\mathsf{E})$ is dense in energy in $\mathrm{X}$, we see that for all $\mu\in\mathrm{X}$ there exists an $\mathrm{EVI}_0$-gradient flow of $\mathsf{E}$ starting from it.

We are therefore within Setting 3.1 and by what we said in Section 6.1 the narrow topology complies with Assumption 6.1. Hence whenever the dynamical Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ is solvable, our metric results of Section 4 can be applied. As regards the lower semicontinuity of $|\partial\mathsf{E}|$ w.r.t. $\sigma$, and thus the full validity of Assumption 3.2 and, as a consequence, of the abstract results of Section 5 too, if $M$ is compact then by (3.1) we see that the $W_2$-topology is an admissible candidate for $\sigma$. If $M=\mathbb{R}^d$ and $U$ is superlinear at $\infty$ (which is the case for $U_m$ defined above with $m>1$), then by [2, Theorem 10.4.6] the slope can be represented as
\[
|\partial\mathsf{E}|^2(\mu)=\int_{\mathbb{R}^d}|\nabla U'(\rho)|^2\,\mathrm{d}\mu,\qquad\text{if }\mu=\rho\,\mathcal{L}^d,
\]
and by [35, Proposition 2.2] it is lower semicontinuous w.r.t. narrow convergence. Hence also in this situation the results of Section 5 hold true.

To the best of our knowledge, up to now the dynamical Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ with the slope of a general internal energy in place of the slope of the Boltzmann entropy has been considered only in [33], from a purely formal point of view. Static Monge-Kantorovich problems regularized by means of the Rényi entropy or more general internal energies have recently been introduced in [32, 48, 47, 30] (see also the references therein). Remarkably, [47] establishes the $\Gamma$-convergence of the regularized problems towards the optimal transport one (cf. [30], where the convergence of the optimal values and minimizers is discussed). However, in [30] only bounded costs are considered (the quadratic cost function associated to (1.4) is thus ruled out for non-compact sample spaces), while in [47] the discussion is restricted to sample spaces which are compact subsets of $\mathbb{R}^d$.
Other questions our paper is concerned with have not been examined in these references. Note also that the issue of the equivalence between static and dynamical formulations is far from being clear at this level of generality. In view of this discussion, in all the applicability situations presented in this section our results are new.

The case of a (possibly) negatively curved base space $M$ is not discussed since, as already argued above, [5, Theorem 9.21] allows one to deduce $(\mathrm{EVI}_\lambda)$ with $\lambda = 0$ only for $K \geq 0$. Moreover, it has recently been proved [28, Theorem 2.5 and Remark 2.6] that in the hyperbolic space the porous medium equation cannot be seen as the Wasserstein gradient flow of some $\lambda$-convex functional in the EVI-sense, hence the Rényi entropy cannot generate an $\mathrm{EVI}_\lambda$-gradient flow there.

In the seminal thought experiment proposed by Schrödinger [59, 60] the physical system, whose evolution between two subsequent observations has to be determined, consists of independent Brownian particles. An important generalization has been recently proposed in [7], where particles are allowed to interact through a pair potential $W$. This leads to the so-called Mean Field Schrödinger Problem (MFSP henceforth). In order to see that this example falls within our abstract metric theory, let us first check the validity of Setting 3.1. As already said in Section 6.1, $X := \mathcal{P}_2(\mathbb{R}^d)$, the 2-Wasserstein space over $\mathbb{R}^d$, endowed with the Wasserstein distance $W_2$ is a complete and separable metric space. The role played by the Boltzmann-Shannon relative entropy in the "classical" Schrödinger problem is here taken by the functional $E : X \to \mathbb{R} \cup \{+\infty\}$ defined (up to a shift by a constant) by
\[
E(\mu) :=
\begin{cases}
H(\mu \mid \mathcal{L}^d) + \displaystyle\int_{\mathbb{R}^d} W * \rho \,\mathrm{d}\mu & \text{if } \mu = \rho\,\mathcal{L}^d, \\[4pt]
+\infty & \text{if } \mu \not\ll \mathcal{L}^d,
\end{cases}
\]
where $H(\mu \mid \mathcal{L}^d)$ is the Boltzmann-Shannon relative entropy of $\mu$ w.r.t. the Lebesgue measure $\mathcal{L}^d$, already introduced in Section 6.1, and $W$ is the pair potential, describing via convolution the interaction between the particles of the system. On such a potential the following assumptions are made: it is of class $C^2(\mathbb{R}^d, \mathbb{R})$, is symmetric, i.e. $W(x) = W(-x)$ for all $x \in \mathbb{R}^d$, and satisfies the two-sided bound
\[
\Lambda\,\mathrm{Id} \geq \nabla^2 W \geq \lambda\,\mathrm{Id}
\]
for some $\Lambda, \lambda > 0$ (actually $\lambda \in \mathbb{R}$ is enough, but in [7] the authors are interested in the ergodic behaviour of MFSP). While the upper bound is technical, the lower one is geometric and crucial. The lower semicontinuity of $E$ is easily seen to hold: the relative entropy has already been discussed, whereas the continuity of the convolution term follows from the fact that if $\mu_n \to \mu$ in $\mathcal{P}_2(\mathbb{R}^d)$, then $\mu_n \otimes \mu_n \to \mu \otimes \mu$ in $\mathcal{P}_2(\mathbb{R}^{2d})$, cf. [2, Example 9.3.4]. The fact that $E$ is proper and the density of its domain are also clear. Moreover, the assumptions on $W$ guarantee that $E$ is bounded from below on $W_2$-bounded sets. As concerns the existence of $\mathrm{EVI}_\lambda$-gradient flows starting from any $\mu \in X$, this is ensured by [2, Theorem 11.2.1] in conjunction with [2, Remark 9.2.5 and Proposition 9.3.5], granting the $\lambda$-convexity of $E$ along generalized geodesics (see [2, Definitions 9.2.2 and 9.2.4]). Therefore, Setting 3.1 holds.

As regards Assumption 3.2, for the topology $\sigma$ the natural candidate is once again the sequential topology induced by narrow convergence of probability measures. By the discussion in Section 6.1, $W_2$-bounded sets are relatively narrow compact and $W_2$ is sequentially narrow lower semicontinuous. The slope of $E$ is explicitly given by
\[
|\partial E|^2(\mu) =
\begin{cases}
\displaystyle\int_{\mathbb{R}^d} |\nabla \log\rho + 2\nabla W * \rho|^2 \,\mathrm{d}\mu & \text{if } \mu = \rho\,\mathcal{L}^d,\ \nabla\log\rho \in L^2_\mu, \\[4pt]
+\infty & \text{otherwise},
\end{cases}
\]
cf. [7, Section 1.4.2], and one can rely on [35, Proposition 2.2], the fact that $\Delta W$ is continuous and bounded (as a consequence of the boundedness of $\nabla^2 W$) and the regularization properties of the convolution to show that $|\partial E|$ is also sequentially narrowly lower semicontinuous. Hence Assumption 3.2 is fully satisfied and all the results of Sections 4 and 5 are applicable.

From the novelty standpoint, a first interesting remark is the fact that in [7] the approach is purely stochastic, while our point of view is completely analytic. For instance, in [7, Proposition 1.1] the existence of solutions to MFSP is proved under the same assumptions we have in Proposition 4.2, namely $\mu, \nu \in X$ with $E(\mu), E(\nu) < \infty$. However, already at this basic level the reader may appreciate the difference between the two approaches.

But more than anything else, our abstract results are completely new when specialized to MFSP: indeed, only the ergodic behaviour in the long time regime $\varepsilon \to \infty$ is studied in [7], so that the $\Gamma$-convergence results of Section 4 are entirely novel. The same is true for Section 5, since in [25] the derivative of the cost associated to MFSP is not investigated, nor is the Taylor expansion (5.6).

In [31] the authors studied a generalization $W_{\mathbf{m}}$ of the quadratic Wasserstein distance on $\mathcal{P}(\Omega)$, generated by Benamou-Brenier-type formulas in convex bounded domains $\Omega \subset \mathbb{R}^d$. The latter are based on nonlinear continuity equations and pseudo-Riemannian norms
\[
\|\dot\mu\|_\mu^2 = \min_v \left\{ \int_\Omega \mathbf{m}(\mu)\,|v|^2 \ \text{ s.t. } \ \dot\mu + \operatorname{div}(\mathbf{m}(\mu)\,v) = 0 \right\},
\]
where $\mathbf{m} : \mathbb{R}_+ \to \mathbb{R}_+$ is a given nonlinear mobility function satisfying suitable structural conditions (mainly concavity). The linear case $\mathbf{m}(r) = r$ corresponds to the standard Wasserstein distance.
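As a consistency check on the linear case (a sketch added for illustration, assuming sufficiently regular curves $(\mu_t)_{t \in [0,1]}$ in $\mathcal{P}(\Omega)$, and writing $\mathbf{m}$ for the mobility and $W_2$ for the quadratic Wasserstein distance): with $\mathbf{m}(r) = r$ the norm above becomes the kinetic energy of the Benamou-Brenier formulation [9], and minimizing its time integral over curves joining $\mu_0$ to $\mu_1$ recovers $W_2^2$.

```latex
% Linear mobility m(r) = r: the pseudo-Riemannian norm is the Benamou-Brenier kinetic energy.
\[
\|\dot\mu\|_\mu^2
  = \min_v \Big\{ \int_\Omega |v|^2 \,\mathrm{d}\mu
      \ \text{ s.t. } \ \dot\mu + \operatorname{div}(\mu v) = 0 \Big\},
\qquad
W_2^2(\mu_0,\mu_1)
  = \min_{(\mu_t)} \int_0^1 \|\dot\mu_t\|_{\mu_t}^2 \,\mathrm{d}t ,
\]
% where the second minimum runs over curves (\mu_t) joining \mu_0 to \mu_1.
```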
In the nonlinear case, the lack of 1-homogeneity may impose restricting to absolutely continuous measures $\mu = \rho(x)\,\mathrm{d}x$ (as in [20, Section 3], depending on cases A or B therein), thus for the ease of exposition in this section we identify measures $\mu$ with their densities $\rho$, with a slight abuse of notation.

Specifying the reference measure $\mathfrak{m}$ in (6.1) to be the (normalized) Lebesgue measure $\mathcal{L}^d$ and taking $U$ to be superlinear, $U(+\infty) = +\infty$, for convenience, let us check that our previous internal energies
\[
E(\rho) = \int_\Omega U(\rho(x)) \,\mathrm{d}x
\]
fit our setting. First of all, completeness is known from [31, Theorem 5.7], and separability readily follows from the density of compactly supported continuous functions, whence (A1). Next, from [31, Theorem 5.5] it is known that the $W_{\mathbf{m}}$ topology is at least stronger than the weak-$*$ convergence of measures, thus the lower semicontinuity in (A2) holds as soon as $z \mapsto U(z)$ is lower semicontinuous. The density of the domain $D(E)$ in (A2) should be again a simple exercise involving standard approximation arguments, provided $U$ is reasonable. As regards our more fundamental assumption (A3), the generation of an $\mathrm{EVI}_\lambda$-flow is exactly the purpose of [20] for $\lambda = 0$. More precisely, under some generalized McCann condition $GMC(\mathbf{m}, d)$ involving $U$, $\mathbf{m}$, and the ambient dimension $d$, [20, Theorem 4.10] guarantees that our internal energy functionals generate $0$-contractive gradient flows. As a consequence we can rigorously take our Setting 3.1 as applicable here. As concerns Assumption 3.2, a reasonable and natural choice for the weaker $\sigma$-topology is of course the weak-$*$ convergence of measures. Given the pseudo-Riemannian structure induced by [20, Eq. (3.2)], the (squared) metric slope is at least formally given by
\[
|\partial E|^2(\rho) = \int_\Omega \mathbf{m}(\rho)\,|\nabla U'(\rho)|^2 = \int_\Omega \frac{|\nabla P(\rho)|^2}{\mathbf{m}(\rho)}, \tag{6.2}
\]
where the pressure $P$ is defined as $P(r) = \int_0^r U''(z)\,\mathbf{m}(z)\,\mathrm{d}z$.
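The equality of the two integrals in (6.2) is a formal consequence of the chain rule. A short check, assuming $\rho$ is smooth with $\mathbf{m}(\rho) > 0$, where $\mathbf{m}$ denotes the mobility; the Boltzmann-type example at the end ($\mathbf{m}(r) = r$, $U(r) = r \log r$) is added here purely as an illustration:

```latex
% Chain rule applied to the pressure P(r) = \int_0^r U''(z) m(z) dz.
\[
\nabla P(\rho) = P'(\rho)\,\nabla\rho
  = U''(\rho)\,\mathbf{m}(\rho)\,\nabla\rho
  = \mathbf{m}(\rho)\,\nabla U'(\rho)
\quad\Longrightarrow\quad
\frac{|\nabla P(\rho)|^2}{\mathbf{m}(\rho)}
  = \mathbf{m}(\rho)\,|\nabla U'(\rho)|^2 .
\]
% Illustration: m(r) = r and U(r) = r log r give U''(r) = 1/r and P(r) = r.
\[
|\partial E|^2(\rho)
  = \int_\Omega \frac{|\nabla\rho|^2}{\rho}
  = \int_\Omega \rho\,|\nabla\log\rho|^2 .
\]
```

In this illustrative case the generalized Fisher information reduces to the classical one.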
With reasonable assumptions on $U$, $\mathbf{m}$ it should not be difficult to check the lower semicontinuity of this generalized Fisher information for this specific choice of the $\sigma$ topology, and from [31, Theorem 5.6] the distance $W_{\mathbf{m}}$ is also known to be lower semicontinuous w.r.t. the weak-$*$ convergence.

As a consequence our metric results apply to this context as well (although full proofs of the representation (6.2) for the metric slope and of its weak lower semicontinuity are still missing for a completely rigorous statement, but this is out of the scope of the paper).

Let $(X, d)$ be a complete and separable CAT(0) space (i.e. a separable Hadamard space), $x_0 \in X$ be a fixed point, and $E = \frac{1}{2} d^2(x_0, \cdot)$. Then $E$ is a $1$-convex functional. By [54, Theorem 3.14], there exists an $\mathrm{EVI}_1$-gradient flow of $E$ starting from any $x \in X$. Thus, we fit into Setting 3.1.

Although Assumption 6.1 is satisfied, existence of some "weak" Hausdorff topology $\sigma$ on $X$ is required to secure Assumption 3.2 (unless $X$ is locally compact, cf. Remark 3.3). Such a topology indeed exists, at least if $X$ satisfies a rather mild geometric $Q_4$ condition [41, 42], and is called the half-space topology. The corresponding convergence is known as the $\Delta$-convergence [46]. Indeed, $d$-bounded sequences contain $\Delta$-converging subsequences [46, 41]. Bounded closed convex sets (in particular, balls) are $\Delta$-closed [42], which easily implies that $d$ is $\Delta$-lower semicontinuous. Moreover, it is easy to see from (3.1) that $|\partial E|(x) = d(x_0, x)$, thus the slope is $\Delta$-lower semicontinuous too.

In this framework, all our results are applicable. Note that $E$ is always finite.

Acknowledgments

LM wishes to thank Jean-Claude Zambrini for numerous and fruitful discussions on the Schrödinger problem, and acknowledges support from the Portuguese Science Foundation through FCT project PTDC/MAT-STA/22812/2017 SchröMoka. LT acknowledges financial support from FSMP Fondation Sciences Mathématiques de Paris. DV was partially supported by the FCT projects UID/MAT/00324/2020 and PTDC/MAT-PUR/28686/2017.

References

[1] Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and Optimisation of Flows on Networks, Lecture Notes in Mathematics, pages 1–155. Springer Berlin Heidelberg, 2013.
[2] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, second edition, 2008.
[3] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Calculus and heat flow in metric measure spaces and applications to spaces with Ricci bounds from below. Invent. Math., 195(2):289–391, 2014.
[4] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Metric measure spaces with Riemannian Ricci curvature bounded from below. Duke Math. J., 163(7):1405–1490, 2014.
[5] Luigi Ambrosio, Andrea Mondino, and Giuseppe Savaré. Nonlinear diffusion equations and curvature conditions in metric measure spaces. Mem. Amer. Math. Soc., 262(1270):v+121, 2019.
[6] Kathrin Bacher and Karl-Theodor Sturm. Localization and tensorization properties of the curvature-dimension condition for metric measure spaces. J. Funct. Anal., 259(1):28–56, 2010.
[7] Julio Backhoff, Giovanni Conforti, Ivan Gentil, and Christian Léonard. The mean field Schrödinger problem: ergodic behavior, entropy estimates and functional inequalities. Probability Theory and Related Fields, pages 1–56, 2020.
[8] Aymeric Baradat and Léonard Monsaingeon. Small noise limit and convexity for generalized incompressible flows, Schrödinger problems, and optimal transport. Arch. Ration. Mech. Anal., 235(2):1357–1403, 2020.
[9] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, Jan 2000.
[10] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput., 37(2):A1111–A1138, 2015.
[11] Jean-David Benamou, Guillaume Carlier, Simone Di Marino, and Luca Nenna. An entropy minimization approach to second-order variational mean-field games. Preprint, arXiv:1807.09078, 2018.
[12] Jean-David Benamou, Guillaume Carlier, and Luca Nenna. A numerical method to solve multi-marginal optimal transport problems with Coulomb cost. In Splitting methods in communication, imaging, science, and engineering, Sci. Comput., pages 577–601. Springer, Cham, 2016.
[13] Jean-David Benamou, Guillaume Carlier, and Luca Nenna. Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. Preprint, arXiv:1710.08234, 2017.
[14] François Bolley. Separability and completeness for the Wasserstein distance. In Séminaire de probabilités XLI, volume 1934 of Lecture Notes in Math., pages 371–377. Springer, Berlin, 2008.
[15] François Bolley and José A Carrillo. Nonlinear diffusion: geodesic convexity is equivalent to Wasserstein contraction. Communications in Partial Differential Equations, 39(10):1860–1869, 2014.
[16] Andrea Braides. Gamma-convergence for Beginners, volume 22. Clarendon Press, 2002.
[17] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A course in metric geometry, volume 33 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2001.
[18] Eric Carlen. Stochastic mechanics: a look back and a look ahead. Diffusion, quantum theory and radically elementary mathematics, 47:117–139, 2014.
[19] Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal., 49(2):1385–1418, 2017.
[20] José Antonio Carrillo, Stefano Lisini, Giuseppe Savaré, and Dejan Slepcev. Nonlinear mobility continuity equations and generalized displacement convexity. Journal of Functional Analysis, 258(4):1273–1309, 2010.
[21] Fabio Cavalletti and Emanuel Milman. The globalization theorem for the curvature dimension condition. Preprint, arXiv:1612.07623, 2016.
[22] Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. Journal of Optimization Theory and Applications, 169(2):671–691, 2016.
[23] Giovanni Conforti. A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost. Probability Theory and Related Fields, 174(1-2):1–47, 2019.
[24] Giovanni Conforti and Luigia Ripani. Around the entropic Talagrand inequality. Bernoulli, 26(2):1431–1452, 2020.
[25] Giovanni Conforti and Luca Tamanini. A formula for the time derivative of the entropic cost and applications. Preprint, arXiv:1912.10555, 2019.
[26] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[27] Sara Daneri and Giuseppe Savaré. Eulerian calculus for the displacement convexity in the Wasserstein distance. SIAM J. Math. Anal., 40(3):1104–1122, 2008.
[28] Nicolò De Ponti, Matteo Muratori, and Carlo Orrieri. Wasserstein stability of porous medium-type equations on manifolds with Ricci curvature bounded below. Preprint, arXiv:1908.03147, 2019.
[29] Simone Di Marino and Augusto Gerolin. An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm. J. Sci. Comput., 85(2):Paper No. 27, 28, 2020.
[30] Simone Di Marino and Augusto Gerolin. Optimal Transport losses and Sinkhorn algorithm with general convex regularization. Preprint, arXiv:2007.00976, 2020.
[31] Jean Dolbeault, Bruno Nazaret, and Giuseppe Savaré. A new class of transport distances between measures. Calculus of Variations and Partial Differential Equations, 34(2):193–231, 2009.
[32] Montacer Essid and Justin Solomon. Quadratically regularized optimal transport on graphs. SIAM J. Sci. Comput., 40(4):A1961–A1986, 2018.
[33] Ivan Gentil, Christian Léonard, and Luigia Ripani. Dynamical aspects of generalized Schrödinger problem via Otto calculus - A heuristic point of view. Preprint, arXiv:1806.01553, 2018.
[34] Ivan Gentil, Christian Léonard, Luigia Ripani, and Luca Tamanini. An entropic interpolation proof of the HWI inequality. Stochastic Processes and their Applications, 2019.
[35] Ugo Gianazza, Giuseppe Savaré, and Giuseppe Toscani. The Wasserstein gradient flow of the Fisher information and the quantum drift-diffusion equation. Arch. Ration. Mech. Anal., 194(1):133–220, 2009.
[36] Nicola Gigli. On the differential structure of metric measure spaces and applications. Mem. Amer. Math. Soc., 236(1113):vi+91, 2015.
[37] Nicola Gigli and Bangxian Han. The continuity equation on metric measure spaces. Calc. Var. Partial Differential Equations, 53(1-2):149–177, 2013.
[38] Nicola Gigli and Luca Tamanini. Second order differentiation formula on $\mathrm{RCD}^*(K,N)$ spaces. Accepted at JEMS, arXiv:1802.02463, (2018).
[39] Nicola Gigli and Luca Tamanini. Benamou-Brenier and duality formulas for the entropic cost on $\mathrm{RCD}^*(K,N)$ spaces. Probability Theory and Related Fields, 176(1-2):1–34, 2020.
[40] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal., 29(1):1–17, 1998.
[41] Bijan Kakavandi. Weak topologies in complete CAT(0) metric spaces. Proceedings of the American Mathematical Society, 141(3):1029–1039, 2013.
[42] William Kirk and Naseer Shahzad. Fixed point theory in distance spaces. Springer, Cham, 2014.
[43] Flavien Léger. A geometric perspective on regularized optimal transport. Journal of Dynamics and Differential Equations, 31(4):1777–1791, 2019.
[44] Christian Léonard. From the Schrödinger problem to the Monge-Kantorovich problem. J. Funct. Anal., 262(4):1879–1920, 2012.
[45] Christian Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst., 34(4):1533–1574, 2014.
[46] Teck Cheong Lim. Remarks on some fixed point theorems. Proc. Amer. Math. Soc., 60:179–182 (1977), 1976.
[47] Dirk Lorenz and Hinrich Mahler. Orlicz space regularization of continuous optimal transport problems. Preprint, arXiv:2004.11574, 2020.
[48] Dirk A. Lorenz, Paul Manns, and Christian Meyer. Quadratically regularized optimal transport. Applied Math. Optimization, to appear, 2019.
[49] Robert J McCann. A convexity principle for interacting gases. Advances in mathematics, 128(1):153–179, 1997.
[50] Toshio Mikami. Monge's problem with a quadratic cost by the zero-noise limit of $h$-path processes. Probab. Theory Related Fields, 129(2):245–260, 2004.
[51] Toshio Mikami and Michèle Thieullen. Optimal transportation problem by stochastic optimal control. SIAM J. Control Optim., 47(3):1127–1139, 2008.
[52] Andrea Mondino and Aaron Charles Naber. Structure theory of metric measure spaces with lower Ricci curvature bounds. Journal of the European Mathematical Society, 21(6):1809–1854, 2019.
[53] Léonard Monsaingeon and Dmitry Vorotnikov. The Schrödinger problem on the non-commutative Fisher-Rao space. Calc. Var. Partial Differential Equations, to appear, 2020.
[54] Matteo Muratori and Giuseppe Savaré. Gradient flows and Evolution Variational Inequalities in metric spaces. I: Structural properties. Journal of Functional Analysis, 278(4):108347, 2020.
[55] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.
[56] Felix Otto and Michael Westdickenberg. Eulerian calculus for the contraction in the Wasserstein distance. SIAM journal on mathematical analysis, 37(4):1227–1255, 2005.
[57] Gabriel Peyré and Marco Cuturi. Computational Optimal Transport: With Applications to Data Science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[58] Filippo Santambrogio. Optimal transport for applied mathematicians. Birkhäuser/Springer, 2015.
[59] Erwin Schrödinger. Über die Umkehrung der Naturgesetze. Von E. Schrödinger. (Sonderausgabe a. d. Sitz.-Ber. d. Preuss. Akad. d. Wiss., Phys.-math. Klasse, 1931, IX.) Verlag W. de Gruyter, Berlin. Preis RM. 1,-. Angewandte Chemie, 44(30):636–636, 1931.
[60] Erwin Schrödinger. Sur la théorie relativiste de l'électron et l'interprétation de la mécanique quantique. Ann. Inst. H. Poincaré, 2(4):269–310, 1932.
[61] Karl-Theodor Sturm. On the geometry of metric measure spaces. I. Acta Math., 196(1):65–131, 2006.
[62] Cédric Villani. Topics in optimal transportation. American Mathematical Soc., 2003.
[63] Cédric Villani. Optimal transport. Old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 2009.
[64] Jean-Claude Zambrini. Variational processes and stochastic versions of mechanics. J. Math. Phys., 27(9):2307–2330, 1986.
[65] Jean-Claude Zambrini. The research program of stochastic deformation (with a view toward geometric mechanics). In Stochastic analysis: a series of lectures, volume 68 of