The dynamical Schrödinger problem in abstract metric spaces
arXiv preprint [math.FA]
Léonard Monsaingeon ∗, Luca Tamanini †, Dmitry Vorotnikov ‡

Abstract
In this paper we introduce the dynamical Schrödinger problem on abstract metric spaces, defined for a wide class of entropy and Fisher information functionals. Under very mild assumptions we prove a generic Gamma-convergence result towards the geodesic problem as the noise parameter $\varepsilon \downarrow 0$. We also investigate the connection with geodesic convexity of the driving entropy, and study the dependence of the entropic cost on the parameter $\varepsilon$. Some examples and applications are discussed.

MSC [2020] 49J45, 49Q20, 58B20.

Keywords: Benamou-Brenier formulation, Schrödinger bridge, Fisher information, Gamma-convergence, geodesic convexity, gradient flows, optimal transport
∗ GFM Universidade de Lisboa, Campo Grande, Edifício C6, 1749-016 Lisboa, Portugal and IECL Université de Lorraine, F-54506 Vandoeuvre-lès-Nancy Cedex, France.
† CEREMADE (UMR CNRS 7534), Université Paris Dauphine PSL, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France and INRIA-Paris, MOKAPLAN, 2 Rue Simone Iff, 75012, Paris, France.
‡ University of Coimbra, CMUC, Department of Mathematics, 3001-501 Coimbra, Portugal.
Gaspard Monge and Erwin Schrödinger came up with two a priori unrelated problems that are concerned with finding a preeminent way of deforming a prescribed probability distribution into another one. While Monge was interested in optimizing the cost of transportation of goods [62, 63, 58], Schrödinger's original thought experiment [59, 60] aimed at finding the most likely evolution between two subsequent observations of a cloud of independent particles. So, even if in both cases we are facing an interpolation and optimization problem, the former is deterministic in nature whereas the latter is strongly related to large deviations theory and is, at first glance, purely stochastic. We refer to a recent survey [22] for various formulations and aspects of the Schrödinger problem, and to [64, 65] for a discussion of its role in Euclidean Quantum Mechanics.

Nevertheless, several analogies and connections exist between the two problems. They can be appreciated by looking carefully at the interpolation aspects of both problems, namely at their dynamical formulations and the underlying equations governing the respective evolutions. In the case of a quadratic transportation cost over a Riemannian manifold $M$, the Monge-Kantorovich optimal transport problem is solved (at least in a weak sense) by interpolating between the source and the target distributions with a constant-speed, length-minimizing geodesic in the Otto-Wasserstein space of probability measures $\mathcal P(M)$. This gives a curve $(\mu_t)_{t\in[0,1]} \subset \mathcal P(M)$ which formally satisfies (in the sense of the celebrated Otto calculus [55, 62, 63])

$$\nabla_{\dot\mu_t}\dot\mu_t = 0, \tag{1.1}$$

where $\nabla_{\dot\mu_t}$ is the covariant derivative along the curve $t \mapsto \mu_t$. The Schrödinger problem with parameter $\varepsilon$ (from a physical viewpoint, $\varepsilon$ can be seen as a temperature or level of noise) can also be translated into such a geometric language. By analogy, when looking at the covariant derivative along the optimal evolution $(\mu^\varepsilon_t)_{t\in[0,1]}$, usually called Schrödinger bridge or entropic interpolation, the resulting equation is surprising and can be viewed [23] as Newton's second law

$$\nabla_{\dot\mu^\varepsilon_t}\dot\mu^\varepsilon_t = \frac{\varepsilon^2}{8}\nabla I(\mu^\varepsilon_t), \tag{1.2}$$

where in the right-hand side $\nabla$ denotes the gradient in the Otto-Wasserstein pseudo-Riemannian sense and $I$ is the Fisher information

$$I(\mu) = 4\int_M |\nabla\sqrt{\rho}|^2\,\mathrm{d}\mathrm{vol} = \int_M |\nabla\log\rho|^2\rho\,\mathrm{d}\mathrm{vol} \qquad \text{provided } \mu = \rho\cdot\mathrm{vol}.$$

A related observation is that the (scaled) heat flow, coinciding with the (scaled) gradient flow of the Boltzmann-Shannon entropy [40, 63]

$$H(\mu) = \int_M \rho\log\rho\,\mathrm{d}\mathrm{vol}, \qquad \mu = \rho\cdot\mathrm{vol},$$

is also a solution to (1.2): a simple differentiation in time of $\dot\mu_t = -\frac{\varepsilon}{2}\nabla H(\mu_t)$ and the fact that $I = |\nabla H|^2$ in the Otto-Wasserstein sense automatically yield

$$\nabla_{\dot\mu_t}\dot\mu_t = \frac{\varepsilon}{2}\nabla^2 H(\mu_t)\cdot\frac{\varepsilon}{2}\nabla H(\mu_t) = \frac{\varepsilon^2}{8}\nabla|\nabla H(\mu_t)|^2 = \frac{\varepsilon^2}{8}\nabla I(\mu_t).$$

This shows that the Schrödinger problem lies between optimal transport and diffusion and is naturally intertwined with both deterministic behaviour and Brownian motion. It shares the same Newton's law as the gradient flow of the entropy, but unlike the heat flow it has a prescribed final configuration to match: it is up to the parameter $\varepsilon$ to tip the balance in favour of deterministic transport or diffusion. With this heuristic in mind, we see that as $\varepsilon \to 0$ the applied force $\frac{\varepsilon^2}{8}\nabla I(\mu^\varepsilon_t)$ in (1.2) vanishes, so that the Schrödinger problem may be interpreted as a noisy (entropic) counterpart of the Monge-Kantorovich optimal transport, corresponding to the unforced geodesic evolution (1.1) discussed above. This informal relationship has a rigorous counterpart, which dates back to the pioneering works on the asymptotic behaviour of the Schrödinger problem as $\varepsilon \to 0$ of T. Mikami, M. Thieullen [50, 51], and C. Léonard [44, 45].
This was subsequently developed in [19, 8, 38]. Very recently [47, 30], similar small-noise results were obtained for static Monge-Kantorovich problems regularized with more general entropies.

This first connection can be investigated further, and by doing so one can remark that (1.2) is exactly the Euler-Lagrange optimality equation for the dynamical Benamou-Brenier formulation of the Schrödinger problem [18, 22, 45, 39], which consists in minimizing the Lagrangian kinetic action perturbed by the Fisher information. In more precise terms,

$$\inf\left\{\frac12\int_0^1\!\!\int_M |v_t|^2\,\mathrm{d}\mu_t\,\mathrm{d}t + \frac{\varepsilon^2}{8}\int_0^1 I(\mu_t)\,\mathrm{d}t\right\}, \tag{1.3}$$

where the infimum runs over all solutions of the continuity equation $\partial_t\mu_t + \operatorname{div}(v_t\mu_t) = 0$ with prescribed initial and final densities. Also from this variational standpoint the reader can see that as $\varepsilon \to 0$ the Schrödinger problem formally reduces to

$$\inf\ \frac12\int_0^1\!\!\int_M |v_t|^2\,\mathrm{d}\mu_t\,\mathrm{d}t, \tag{1.4}$$

namely to the dynamical Benamou-Brenier formulation of the (quadratic) optimal transport problem [9]. This variational representation depicts more clearly than (1.2) the double nature of the Schrödinger problem, the competition between the determinism encoded in the kinetic energy and the unpredictability coming from the Fisher information, and the role played by $\varepsilon$ in balancing these two opposite behaviours.

The double bond of the Schrödinger problem with optimal transport on the one hand and heat flow on the other hand results in fruitful and wide-ranging applications of both theoretical and applied interest. Indeed, from the connection with the heat flow the solutions to the Schrödinger problem gain regularity properties which are not available in optimal transport, and thanks to the asymptotic behaviour of the Schrödinger problem as $\varepsilon \to 0$ entropic interpolations represent an efficient way to approximate Wasserstein geodesics with second-order accuracy [38, 25].
This approach has already turned out to be successful in conjunction with functional inequalities [24, 34] and differential calculus along Wasserstein geodesics [38]. But the nice behaviour of Schrödinger bridges is important also for computational purposes. The impact of the Schrödinger problem and Sinkhorn algorithms (deeply related to the static formulation of the former) on the numerical methods used in optimal transportation theory has been impressive, as witnessed by several recent works (see [57] and references therein as well as [26, 10, 12, 13, 11, 29]).

As a matter of fact neither the particular structure of the Wasserstein space nor the specific choice of the Boltzmann-Shannon functional are required to define the two problems in question (cf. a related discussion in the heuristic paper [43]): one can of course define length-minimizing geodesics in any metric space $(\mathrm X, \mathsf d)$, and the Schrödinger problem (or at least its Benamou-Brenier formulation described above) merely involves an entropy functional and a corresponding Fisher information. Given such a reasonable entropy functional $\mathsf E$ on $\mathrm X$ that generates a gradient flow in a suitable sense, the corresponding Fisher information is expected to be nothing but the dissipation rate of $\mathsf E$ (along solutions of its own gradient flow), just as $I$ coincides with the rate of dissipation of the entropy $H$ along the heat flow. This observation is the starting point of the present paper, where we intend to study the abstract Schrödinger bridge problem or, in other words, the entropic approximation of geodesics in metric spaces.

Under very mild assumptions on $\mathrm X$ and $\mathsf E$, we will prove the solvability of the abstract $\varepsilon$-Schrödinger problem and the $\Gamma$-convergence to the corresponding geodesic problem as $\varepsilon \to 0$. We will also rigorously justify, in the metric setting, that any trajectory of a gradient flow solves an associated Schrödinger problem.
Leveraging a quantitative $AC^2$ estimate based on a straightforward chain-rule in the smooth Riemannian setting, the cornerstone of our analysis will be the systematic construction of an $\varepsilon$-regularized entropic copy $(\gamma^\varepsilon_t)_{t\in[0,1]}$ of any arbitrary curve $(\gamma_t)_{t\in[0,1]}$. These perturbed curves will provide recovery sequences for the $\Gamma$-convergence. Our construction is completely Eulerian and essentially consists in running the $\mathsf E$-gradient flow for a short time $h_\varepsilon(t)$ starting at $\gamma_t$ for all $t$, for well-chosen functions $h_\varepsilon \ge 0$. The challenge here will be to reproduce the (formal, differential) Riemannian chain-rule in metric spaces. Notably, an analogous pseudo-Riemannian idea has recently been used by A. Baradat and some of the authors [8, 53] in order to prove the $\Gamma$-convergence for the classical dynamical Schrödinger problem on the Otto-Wasserstein space and for its counterpart on the non-commutative Fisher-Rao space, respectively. However, in those papers the computations were ad hoc and heavily exploited the underlying structures of the particular spaces as well as the properties of the particular flows (namely, of the classical heat flow and of its restriction to multivariate Gaussians), whereas here we derive everything from the existence of an abstract gradient flow on $\mathrm X$ driven by $\mathsf E$.

Another notion of paramount importance herein will be convexity. In the smooth Riemannian setting, and given $\lambda \in \mathbb R$, elementary calculus shows that the $\lambda$-convexity of $\mathsf E$ along geodesics is equivalent to a uniform lower bound $\operatorname{Hess}\mathsf E(x) \ge \lambda\,\mathrm{Id}$ as quadratic forms in the tangent space, but also, more importantly, to the $\lambda$-contractivity of the $\mathsf E$-gradient flow. In the metric setting no second order calculus is available in general, and the very notion of gradient flow as well as its connection with geodesic convexity and contractivity become much more subtle. The key notion of gradient flow that we shall use throughout is that of Evolution Variational Inequality, or $\mathrm{EVI}_\lambda$ flow [2]. Under reasonable assumptions it is well known that (a suitable variant of) convexity of $\mathsf E$ generally provides existence of an $\mathrm{EVI}_\lambda$-flow starting at any $x \in \mathrm X$, see [2]. A natural question to ask is whether the converse also holds true, i.e. whether well-posedness of a reasonable gradient flow implies some convexity. This was proved in [15] for the specific case of the Euclidean Wasserstein space $\mathrm X = W_2(\Omega)$, $\Omega \subset \mathbb R^d$, and at least for the so-called internal energies: it is shown therein that $0$-contractivity of the gradient flow (or equivalently, of the associated nonlinear diffusion equation) implies $0$-displacement convexity in the sense of McCann [49]. In the same spirit, and building on Otto and Westdickenberg [56], Daneri and Savaré proved in a very general metric setting that the generation of an $\mathrm{EVI}_\lambda$-flow indeed implies $\lambda$-geodesic convexity [27, Theorem 3.2]. A byproduct of our analysis for the $\Gamma$-convergence will give a new independent proof of this latter fact by a completely different approach, essentially by constructing an $\varepsilon$-entropic regularization of geodesics and carefully examining the defect of optimality at order one in $\varepsilon \to 0$.

As a main application of the $\Gamma$-convergence of the Schrödinger problem to the geodesic problem as $\varepsilon \to 0$ (and more generally of the $\varepsilon'$-Schrödinger problem to the $\varepsilon$-one as $\varepsilon' \to \varepsilon$) we investigate the behaviour of the optimal value of the dynamical Schrödinger problem, henceforth called entropic cost, as a function of the temperature parameter $\varepsilon$, with particular emphasis on the regularity and the small-noise regime. For the classical dynamical Schrödinger problem (1.3), it has recently been proved by the second author with G. Conforti [25] that the entropic cost is of class $C^1((0,\infty)) \cap C([0,\infty))$ (actually $C^1([0,\infty))$ under suitable assumptions) and twice a.e. differentiable; once this regularity information is available, the formula for the first derivative is rather easy to guess, as by the envelope theorem it coincides with the partial derivative w.r.t. $\varepsilon$ of the functional in (1.3) evaluated at any critical point. Denoting by $C_\varepsilon(\mu,\nu)$ the value in (1.3) with marginal constraints $\mu$ and $\nu$ and by $(\mu^\varepsilon_t)_{t\in[0,1]}$ the associated Schrödinger bridge, this statement reads as

$$\frac{\mathrm d}{\mathrm d\varepsilon} C_\varepsilon(\mu,\nu) = \frac{\varepsilon}{4}\int_0^1 I(\mu^\varepsilon_t)\,\mathrm{d}t, \qquad \forall \varepsilon > 0,$$

and in [25] this identity played an important role in the study of both the large- and small-noise behaviour of the Schrödinger problem, obtaining in particular a Taylor expansion around $\varepsilon = 0$ with $o(\varepsilon^2)$-accuracy. Since the central object in the present paper is an abstract and general formulation of (1.3), an analogous result is expected to hold. However, from a technical viewpoint the proof is much more subtle and challenging, because unlike (1.3) our metric version of the dynamical Schrödinger problem may have multiple solutions. For this reason the discussion about the regularity of the entropic cost in this paper is less concise than in [25]. Nonetheless, we are still able to deduce the same kind of Taylor expansion with the same accuracy. Given the previous interpretation of the Schrödinger problem as a noisy Monge-Kantorovich problem and the importance of quantitative estimates in approximating optimal transport by means of the Schrödinger problem, it is reasonable to expect that such a Taylor expansion (valid in a general framework for a wide choice of functionals $\mathsf E$) will apply to a wide variety of examples, some of which will be discussed here.

The paper is organized as follows: In Section 2 we give a short and formal proof of our fundamental $AC^2$ estimate in the smooth Riemannian setting, and show how it can be exploited to establish $\Gamma$-convergence and convexity. Section 3 fixes the metric framework in which we work for the rest of the paper, and extends the previous estimate to this metric setting.
In Section 4 we prove the $\Gamma$-convergence as $\varepsilon \downarrow 0$ and establish the geodesic convexity. Section 5 studies the dependence of the optimal entropic cost on the temperature parameter $\varepsilon > 0$, and provides a second order expansion. Finally, we list in Section 6 several examples and applications covered by our abstract results.

Here we remain formal and the computations are carried out in a Riemannian setting, where classical calculus and chain-rules are available. (Significant work will be required later on to adapt the computations to metric spaces.) All the objects and functions in this section are therefore considered to be smooth, and we deliberately ignore any regularity issue.

Let $M$ be a Riemannian manifold with scalar product $\langle\cdot,\cdot\rangle_q$ at a point $q \in M$ and induced Riemannian distance $\mathsf d$, and let $V : M \to \mathbb R$ be a given potential. For simplicity we assume here that $V$ is globally bounded from below on $M$, and up to replacing $V$ by $V - \min V$ we can assume that $V(q) \ge 0$. (In Section 3 we will relax this assumption and allow $V$ to be only locally bounded.) Given a small temperature parameter $\varepsilon > 0$, and following [43], the (dynamic) geometric Schrödinger problem consists in solving the optimization problem

$$\frac12\int_0^1\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1 |\nabla V|^2(q_t)\,\mathrm{d}t \longrightarrow \min; \qquad \text{s.t. } q \in C([0,1],M) \text{ with endpoints } q_0, q_1. \tag{2.1}$$

For $s \ge 0$ we denote by $\Phi(s,q)$ the semi-flow corresponding to the autonomous $V$-gradient flow started from $q \in M$,

$$\begin{cases} \dfrac{\mathrm d}{\mathrm d s}\Phi(s,q) = -\nabla V(\Phi(s,q)), \\ \Phi(0,q) = q. \end{cases}$$

The goal of this section is to give a straightforward proof of the following two facts, assuming that the potential $V$ is well behaved:

(i) the $\varepsilon$-Schrödinger problem converges to the geodesic problem as $\varepsilon \to 0$;
(ii) $\lambda$-contractivity of the generated flow $\Phi$ can be turned into $\lambda$-convexity along geodesics.

With this goal in mind, fix any two endpoints $q_0, q_1 \in M$ and take an arbitrary curve joining them, $q \in C([0,1],M)$ with $q|_{t=0} = q_0$ and $q|_{t=1} = q_1$.
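To make the objects in (2.1) concrete, here is a minimal numerical sketch on the real line. Everything in it is an illustrative assumption rather than part of the paper: the quadratic potential $V(q) = \lambda q^2/2$ with `LAM = 1.0`, the closed-form flow `flow`, the midpoint-rule discretization `schrodinger_action`, and the test curve `line`.

```python
import math

LAM = 1.0  # hypothetical convexity constant of the toy potential V(q) = LAM*q^2/2


def grad_V(q):
    """Gradient of the toy potential V(q) = LAM * q^2 / 2 on the real line."""
    return LAM * q


def flow(s, q):
    """Closed-form semi-flow: d/ds Phi = -grad V(Phi), Phi(0, q) = q."""
    return math.exp(-LAM * s) * q


def schrodinger_action(curve, eps, n=2000):
    """Midpoint-rule value of 1/2 int |q'|^2 dt + eps^2/2 int |grad V(q)|^2 dt on [0, 1]."""
    dt = 1.0 / n
    total = 0.0
    for i in range(n):
        a, b = curve(i * dt), curve((i + 1) * dt)
        speed = (b - a) / dt                 # finite-difference speed on the cell
        mid = curve((i + 0.5) * dt)          # curve evaluated at the cell midpoint
        total += (0.5 * speed ** 2 + 0.5 * eps ** 2 * grad_V(mid) ** 2) * dt
    return total


def line(t):
    """Straight line joining q_0 = 1 to q_1 = 2 (a geodesic of the real line)."""
    return 1.0 + t


kinetic = schrodinger_action(line, 0.0)      # pure kinetic action (eps = 0)
action_eps = schrodinger_action(line, 0.1)   # kinetic action plus the eps^2/2 penalty
```

For this straight line the kinetic part equals $1/2$ and the penalty equals $\frac{\varepsilon^2}{2}\int_0^1 (1+t)^2\,\mathrm dt = \frac{7\varepsilon^2}{6}$, so the cost of a fixed curve differs from its kinetic action only at order $\varepsilon^2$.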
Given a function $h(t) \ge 0$ with $h(0) = h(1) = 0$, we perturb $q$ by defining

$$\tilde q_t := \Phi(h(t), q_t), \qquad t \in [0,1],$$

i.e. $\tilde q_t$ is the solution of the $V$-gradient flow at time $s = h(t) \ge 0$ starting from $q_t$ at time $s = 0$. We shall refer to $t \in [0,1]$ as a "horizontal time" and to $s \in [0,h(t)]$ as a "vertical time", see Figure 1. Later on we will think of the curve $\tilde q$ as a "regularized" version of $q$.

Figure 1: The perturbed curve

Note that the endpoints remain invariant, $\tilde q_0 = q_0$ and $\tilde q_1 = q_1$. Since by definition of the flow $\partial_s\Phi(s,q_t) = -\nabla V(\Phi(s,q_t))$, the speed of the perturbed curve can be computed as

$$\frac{\mathrm d\tilde q_t}{\mathrm d t} = \frac{\mathrm d}{\mathrm d t}\Big(\Phi(h(t),q_t)\Big) = \partial_s\Phi(h(t),q_t)\,h'(t) + \partial_q\Phi(h(t),q_t)\frac{\mathrm d q_t}{\mathrm d t} = -h'(t)\nabla V(\tilde q_t) + \partial_q\Phi(h(t),q_t)\frac{\mathrm d q_t}{\mathrm d t}.$$

Bringing the $h'(t)$ term to the left-hand side and taking the half squared norm (in the tangent space $T_{\tilde q_t}M$) gives

$$\frac12\left|\frac{\mathrm d\tilde q_t}{\mathrm d t}\right|^2 + \frac12|h'(t)|^2|\nabla V(\tilde q_t)|^2 + h'(t)\underbrace{\left\langle\nabla V(\tilde q_t),\frac{\mathrm d\tilde q_t}{\mathrm d t}\right\rangle_{\tilde q_t}}_{=\frac{\mathrm d}{\mathrm d t}V(\tilde q_t)} = \frac12\left|\partial_q\Phi(h(t),q_t)\frac{\mathrm d q_t}{\mathrm d t}\right|^2. \tag{2.2}$$

Assume now that, for whatever reason, the gradient flow satisfies the following quantified contractivity estimate w.r.t. the Riemannian distance $\mathsf d$:

$$\mathsf d(\Phi(s,p),\Phi(s,p')) \le e^{-\lambda s}\,\mathsf d(p,p'), \qquad \forall s \ge 0,\ p,p' \in M \tag{2.3}$$

for some fixed $\lambda \in \mathbb R$. Then it is easy to check that the linear map $v \mapsto \partial_q\Phi(s,p)\cdot v$ (from $T_pM$ to $T_{\Phi(s,p)}M$) has norm less than $e^{-\lambda s}$, and therefore (2.2) gives

$$\frac12\left|\frac{\mathrm d\tilde q_t}{\mathrm d t}\right|^2 + \frac12|h'(t)|^2|\nabla V(\tilde q_t)|^2 + h'(t)\frac{\mathrm d}{\mathrm d t}V(\tilde q_t) \le \frac12 e^{-2\lambda h(t)}\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2. \tag{2.4}$$

Integration by parts yields next

$$\frac12\int_0^1\left|\frac{\mathrm d\tilde q_t}{\mathrm d t}\right|^2\mathrm{d}t + \frac12\int_0^1|h'(t)|^2|\nabla V(\tilde q_t)|^2\,\mathrm{d}t - \int_0^1 h''(t)V(\tilde q_t)\,\mathrm{d}t \le \frac12\int_0^1 e^{-2\lambda h(t)}\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t + \Big(h'(0)V(q_0) - h'(1)V(q_1)\Big), \tag{2.5}$$

where the invariance $\tilde q_0 = q_0$, $\tilde q_1 = q_1$ was used in the last boundary terms. This fundamental estimate gives a quantified bound on the kinetic energy (namely the $L^2$ speed) of $\tilde q$ in terms of that of the original curve $q$, and will be the cornerstone of the whole analysis.

Both the convexity and the convergence of the Schrödinger problem will actually follow by setting $h(t) = \varepsilon H(t)$ for suitable choices of $H(t) \ge 0$, and then letting $\varepsilon \downarrow 0$. Note that in this case we have $h(t) = \varepsilon H(t) \downarrow 0$ uniformly, hence the perturbed curve

$$q^\varepsilon_t := \Phi(\varepsilon H(t), q_t) \tag{2.6}$$

will converge uniformly to $q$ as $\varepsilon \downarrow 0$ too.

A first use of (2.5) will be crucial in proving the $\Gamma$-convergence of the Schrödinger functional

$$A_\varepsilon(q) := \frac12\int_0^1\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1|\nabla V(q_t)|^2\,\mathrm{d}t$$

towards the kinetic action

$$A(q) := \frac12\int_0^1\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t$$

as $\varepsilon \downarrow 0$.

Theorem 2.1 (formal $\Gamma$-limit). For any $q_0, q_1 \in M$ it holds

$$A = \Gamma\text{-}\lim_{\varepsilon\to0} A_\varepsilon$$

for the uniform convergence on the space of curves with fixed endpoints $q_0, q_1$.

Proof. We check separately the $\Gamma$-$\liminf$ and the $\Gamma$-$\limsup$ properties. As for the former, given any curve $q$ joining $q_0, q_1$ and any $q^\varepsilon \to q$ uniformly, since the kinetic energy functional $q \mapsto A(q)$ is always lower semicontinuous for the uniform convergence we get first

$$A(q) \le \liminf_{\varepsilon\downarrow0} A(q^\varepsilon) \le \liminf_{\varepsilon\downarrow0} A_\varepsilon(q^\varepsilon).$$

For the $\Gamma$-$\limsup$, let $H(t) = \min\{t, 1-t\}$ be the hat function centered at $t = 1/2$ with height $1/2$ and vanishing at the boundaries, set $h(t) = \varepsilon H(t)$, and let $q^\varepsilon$ be the regularized curve constructed in (2.6). In this simple smooth setting it is not difficult to check that $q^\varepsilon \to q$ uniformly. Moreover, our choice of $h(t)$ results in $|h'(t)|^2 = \varepsilon^2$ with $h'(0) = \varepsilon$, $h'(1) = -\varepsilon$, and $h''(t) = -2\varepsilon\delta_{1/2}(t)$ in the distributional sense. Therefore (2.5) gives immediately

$$A_\varepsilon(q^\varepsilon) + 2\varepsilon V(q^\varepsilon_{1/2}) = \frac12\int_0^1\left|\frac{\mathrm d q^\varepsilon_t}{\mathrm d t}\right|^2\mathrm{d}t + \frac{\varepsilon^2}{2}\int_0^1|\nabla V(q^\varepsilon_t)|^2\,\mathrm{d}t + 2\varepsilon V(q^\varepsilon_{1/2}) \le \frac12\int_0^1 e^{-2\varepsilon\lambda H(t)}\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t + \varepsilon\Big(V(q_0) + V(q_1)\Big).$$

The singularity of $h''$ at $t = 1/2$ can be easily and rigorously worked around, simply integrating by parts (2.2) separately on each interval $t \in [0,1/2]$ and $t \in [1/2,1]$ and keeping track of the boundary terms, resulting ultimately in the above $2\varepsilon V(q^\varepsilon_{1/2}) \ge 0$. Discarding this latter non-negative term finally gives

$$\limsup_{\varepsilon\downarrow0} A_\varepsilon(q^\varepsilon) \le \limsup_{\varepsilon\downarrow0}\left\{\frac12\int_0^1 e^{-2\varepsilon\lambda H(t)}\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t + \varepsilon\Big(V(q_0) + V(q_1)\Big)\right\} = A(q)$$

and concludes the proof.

The second consequence of our fundamental estimate (2.5) is the quantification of the convexity of the potential $V$ in terms of the quantified contractivity (2.3). The point here is that the result can be obtained directly from (2.2), which can be established in a purely metric setting without relying on differential calculus (see the next section for details).

Theorem 2.2. Assume that $V$ satisfies (2.3). Then $V$ is $\lambda$-geodesically convex, i.e.

$$V(q_\theta) \le (1-\theta)V(q_0) + \theta V(q_1) - \frac{\lambda}{2}\theta(1-\theta)\,\mathsf d^2(q_0,q_1), \qquad \theta \in (0,1)$$

for any geodesic $(q_\theta)_{\theta\in[0,1]}$ in $M$.

Proof. Let $(q_t)_{t\in[0,1]}$ be an arbitrary geodesic with endpoints $q_0, q_1$. For fixed $\theta \in (0,1)$ let

$$H_\theta(t) := \begin{cases} \dfrac{1}{\theta}\,t & \text{if } t \in [0,\theta], \\[2mm] -\dfrac{1}{1-\theta}\,(t-1) & \text{if } t \in [\theta,1], \end{cases}$$

be the hat function centered at $t = \theta$ with height $1$ and vanishing at $t = 0,1$, and for any $\varepsilon > 0$ let $q^\varepsilon$ be the regularized curve constructed in (2.6) with $h(t) = \varepsilon H_\theta(t)$. Note moreover that

$$h'(0) = \frac{\varepsilon}{\theta}, \qquad h'(1) = -\frac{\varepsilon}{1-\theta}, \qquad h''(t) = -\varepsilon\left(\frac1\theta + \frac{1}{1-\theta}\right)\delta_\theta(t)$$

in the distributional sense. Discarding the non-negative term $\frac12|h'(t)|^2|\nabla V(\tilde q_t)|^2$ in (2.5), the optimality of the geodesic $q$ from $q_0$ to $q_1$ gives

$$\begin{aligned} 0 &\le \frac12\int_0^1\left|\frac{\mathrm d q^\varepsilon_t}{\mathrm d t}\right|^2\mathrm{d}t - \frac12\int_0^1\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t \\ &\overset{(2.5)}{\le} \int_0^1 h''(t)V(q^\varepsilon_t)\,\mathrm{d}t + \frac12\int_0^1\Big(e^{-2\lambda h(t)} - 1\Big)\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2\mathrm{d}t + \Big(h'(0)V(q_0) - h'(1)V(q_1)\Big) \\ &= -\varepsilon\left(\frac1\theta + \frac{1}{1-\theta}\right)V(q^\varepsilon_\theta) + \frac{\mathsf d^2(q_0,q_1)}{2}\int_0^1\Big(e^{-2\lambda\varepsilon H_\theta(t)} - 1\Big)\mathrm{d}t + \varepsilon\left(\frac1\theta V(q_0) + \frac{1}{1-\theta}V(q_1)\right), \end{aligned}$$

where the last equality follows from the constant speed property $\left|\frac{\mathrm d q_t}{\mathrm d t}\right|^2 = \mathsf d^2(q_0,q_1)$ of the geodesic $(q_t)_{t\in[0,1]}$ connecting $q_0, q_1$ as well as from the explicit properties of $h(t) = \varepsilon H_\theta(t)$ listed above. Multiplying by $\frac{\theta(1-\theta)}{\varepsilon} > 0$ and rearranging gives

$$V(q^\varepsilon_\theta) \le (1-\theta)V(q_0) + \theta V(q_1) + \theta(1-\theta)\frac{\mathsf d^2(q_0,q_1)}{2}\underbrace{\int_0^1\frac{e^{-2\lambda\varepsilon H_\theta(t)} - 1}{\varepsilon}\,\mathrm{d}t}_{:= I_\varepsilon}.$$

Since $\int_0^1 H_\theta(t)\,\mathrm{d}t = \frac12$ for all $\theta$ we see that $I_\varepsilon \to -2\lambda\int_0^1 H_\theta(t)\,\mathrm{d}t = -\lambda$ as $\varepsilon \downarrow 0$, and the result immediately follows since $V(q^\varepsilon_\theta) \to V(q_\theta)$ as well in the left-hand side.
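The estimate behind both proofs can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are not in the paper: $M$ is the real line, $V(q) = \lambda q^2/2$ with `LAM = 1.0`, the flow is $\Phi(s,q) = e^{-\lambda s}q$, the base curve is the straight line from `Q0` to `Q1`, and the integrals are approximated by a midpoint rule. In this linear situation the contraction (2.3) is an equality, so the two sides of the inequality displayed in the proof of Theorem 2.1 should agree up to discretization error.

```python
import math

LAM = 1.0          # hypothetical contraction rate of the toy flow
Q0, Q1 = 1.0, 2.0  # hypothetical endpoints of the base curve


def V(q):
    """Toy potential on the real line."""
    return 0.5 * LAM * q * q


def hat(t):
    """Hat function H(t) = min(t, 1 - t) from the Gamma-limsup construction."""
    return min(t, 1.0 - t)


def straight(t):
    """Base curve q_t: straight line from Q0 to Q1."""
    return Q0 + t * (Q1 - Q0)


def perturbed(t, eps):
    """Regularized curve (2.6): q^eps_t = Phi(eps * H(t), q_t)."""
    return math.exp(-LAM * eps * hat(t)) * straight(t)


def both_sides(eps, n=4000):
    """Return (lhs, rhs, action) for the hat-function version of (2.5).

    n is even, so the kink of the hat function at t = 1/2 is a grid node.
    """
    dt = 1.0 / n
    action = 0.0   # A_eps(q^eps)
    damped = 0.0   # 1/2 int exp(-2 eps LAM H) |q'|^2 dt
    for i in range(n):
        t0, t1 = i * dt, (i + 1) * dt
        tm = 0.5 * (t0 + t1)
        speed = (perturbed(t1, eps) - perturbed(t0, eps)) / dt
        grad = LAM * perturbed(tm, eps)
        action += (0.5 * speed ** 2 + 0.5 * eps ** 2 * grad ** 2) * dt
        base_speed = (straight(t1) - straight(t0)) / dt
        damped += 0.5 * math.exp(-2.0 * eps * LAM * hat(tm)) * base_speed ** 2 * dt
    lhs = action + 2.0 * eps * V(perturbed(0.5, eps))
    rhs = damped + eps * (V(Q0) + V(Q1))
    return lhs, rhs, action


lhs, rhs, _ = both_sides(0.1)
_, _, small_noise_action = both_sides(1e-3)
kinetic_action = 0.5 * (Q1 - Q0) ** 2  # A(q) for the straight line
```

With these choices `lhs` and `rhs` coincide up to quadrature error, and `small_noise_action` is close to `kinetic_action`, in line with the recovery-sequence argument.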
Before trying to adapt the previous computations to the metric context we need to fix once and for all the framework to be used in the sequel.

• By $C([0,1],(\mathrm X,\mathsf d))$, or simply $C([0,1],\mathrm X)$, we denote the space of continuous curves with values in the metric space $(\mathrm X,\mathsf d)$. The collection of absolutely continuous curves on $[0,1]$ is denoted by $AC([0,1],(\mathrm X,\mathsf d))$, or simply by $AC([0,1],\mathrm X)$. For any curve $(\gamma_t) \in AC([0,1],\mathrm X)$, its length is well defined as
$$\ell(\gamma) := \int_0^1|\dot\gamma_t|\,\mathrm{d}t,$$
where $|\dot\gamma_t|$ denotes the metric speed of $\gamma$. If $|\dot\gamma_t| \in L^2(0,1)$, then we shall say that $(\gamma_t) \in AC^2([0,1],\mathrm X)$. For these notions of absolutely continuous curves and metric speed in a metric space, see for instance [2, Section 1.1].

• A curve $\gamma : [0,1] \to \mathrm X$ is called geodesic provided $\mathsf d(\gamma_t,\gamma_s) = |t-s|\,\mathsf d(\gamma_0,\gamma_1)$ for all $t,s \in [0,1]$.

• The slope $|\partial\mathsf E|$ of a functional $\mathsf E : \mathrm X \to \mathbb R\cup\{+\infty\}$ at a point $x \in \mathrm X$ is set as $+\infty$ if $x \notin D(\mathsf E)$, $0$ if $x$ is isolated, and defined as
$$|\partial\mathsf E|(x) := \limsup_{y\to x}\frac{[\mathsf E(x) - \mathsf E(y)]^+}{\mathsf d(x,y)}$$
if $x \in D(\mathsf E)$.

• A curve $(\gamma_t)_{t>0} \subset \mathrm X$ is said to be a gradient flow of $\mathsf E$ in the $\mathrm{EVI}_\lambda$ sense (with $\lambda \in \mathbb R$) provided $(\gamma_t) \in AC_{loc}((0,\infty),\mathrm X)$ and
$$\frac12\frac{\mathrm d}{\mathrm d t}\mathsf d^2(\gamma_t,y) + \frac{\lambda}{2}\mathsf d^2(\gamma_t,y) + \mathsf E(\gamma_t) \le \mathsf E(y), \qquad \forall y \in \mathrm X,\ \text{a.e. } t > 0. \tag{$\mathrm{EVI}_\lambda$}$$
If $\gamma_t \to x$ as $t \downarrow 0$ with $x \in \overline{D(\mathsf E)}$, then we say that the gradient flow $(\gamma_t)$ starts at $x$.

After this premise, let us fix the framework we shall work within.

Setting 3.1. On the space $(\mathrm X,\mathsf d)$ and on the functional $\mathsf E : \mathrm X \to \mathbb R\cup\{+\infty\}$ we make the following assumptions:

(A1) $(\mathrm X,\mathsf d)$ is a complete and separable metric space;
(A2) $\mathsf E$ is lower semicontinuous with dense domain, i.e. $\overline{D(\mathsf E)} = \mathrm X$, and locally bounded from below in the following sense: for any $\mathsf d$-bounded set $B \subset \mathrm X$ there exists $c_B \in \mathbb R$ such that $\mathsf E(x) \ge c_B$ for all $x \in B$;
(A3) there exists $\lambda \in \mathbb R$ such that for any $x \in \mathrm X$ there exists an $\mathrm{EVI}_\lambda$-gradient flow of $\mathsf E$ starting from $x$. In view of (3.3), the corresponding 1-parameter semigroup shall be denoted $\mathsf S_t$.

Sometimes, and always explicitly indicated, we will also use the following extra hypothesis.

Assumption 3.2. There exists a Hausdorff topology $\sigma$ on $\mathrm X$ such that $\mathsf d$-bounded sets are sequentially $\sigma$-compact. Moreover, the distance $\mathsf d$ and the slope $|\partial\mathsf E|$ are $\sigma$-sequentially lower semicontinuous.

Remark 3.3. Assumption 3.2 is in particular valid provided $(\mathrm X,\mathsf d)$ is a locally compact space. Indeed, in this case the metric topology of $(\mathrm X,\mathsf d)$ is an admissible candidate for $\sigma$, since bounded sets are relatively compact (by [17, Proposition 2.5.22]) and the lower semicontinuity of the slope $|\partial\mathsf E|$ w.r.t. the metric topology is a consequence of the forthcoming identity (3.1). ■

Remark 3.4. Assumption 3.2 implies that $\mathsf d$-converging sequences are also $\sigma$-converging. Indeed, given $(x_n) \subset \mathrm X$ with $\mathsf d(x,x_n) \to 0$ as $n \to \infty$ for some limit $x \in \mathrm X$, by Assumption 3.2 and by the boundedness of $(x_n)_n$ there exist a subsequence $(x_{n_k})_k$ and $y \in \mathrm X$ such that $x_{n_k} \overset{\sigma}{\to} y$ as $k \to \infty$. Since $\mathsf d$ is $\sigma$-sequentially lower semicontinuous (again by Assumption 3.2) we deduce that

$$\mathsf d(x,y) \le \liminf_{k\to\infty}\mathsf d(x,x_{n_k}) = \lim_{n\to\infty}\mathsf d(x,x_n) = 0,$$

whence $x = y$. This classically implies that the whole sequence converges, $x_n \overset{\sigma}{\to} x$. ■
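As a sanity check of the definition ($\mathrm{EVI}_\lambda$) and of assumption (A3), the following sketch evaluates the inequality along an explicit flow. The example is an assumption chosen for illustration and does not come from the paper: $\mathrm X = \mathbb R$ with the usual distance, the convex energy $\mathsf E(x) = x^4/4$ (so $\lambda = 0$), and its gradient flow $x' = -x^3$, solved in closed form by `S`.

```python
import math


def E(x):
    """Toy convex energy on the real line (lambda = 0)."""
    return 0.25 * x ** 4


def S(t, x):
    """Closed-form gradient flow of E: solves x' = -x^3 with x(0) = x."""
    return x / math.sqrt(1.0 + 2.0 * x * x * t)


def evi_residual(t, x, y, lam=0.0):
    """Left-hand side minus right-hand side of (EVI_lambda) at time t.

    The flow started at x satisfies the inequality against the test point y
    when this value is non-positive.
    """
    g = S(t, x)
    velocity = -g ** 3                    # time derivative of the flow at time t
    ddt_half_dist_sq = (g - y) * velocity  # d/dt of (1/2) |g - y|^2
    return ddt_half_dist_sq + 0.5 * lam * (g - y) ** 2 + E(g) - E(y)


times = (0.01, 0.1, 1.0, 10.0)
starts = (-2.0, -0.5, 0.3, 1.7)
tests = (-3.0, -1.0, 0.0, 0.8, 2.5)
worst = max(evi_residual(t, x, y) for t in times for x in starts for y in tests)
```

Here the residual reduces to $\mathsf E(\gamma_t) - \mathsf E(y) + \mathsf E'(\gamma_t)(y - \gamma_t)$, which is non-positive by convexity, so `worst` should not exceed zero beyond rounding. The map `S` also satisfies the semigroup property $\mathsf S_{t+s} = \mathsf S_t\circ\mathsf S_s$.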
We list now some useful properties of $\mathrm{EVI}$-gradient flows, which hold true in Setting 3.1 and that we shall use extensively in the sequel. First of all, whenever $x \in \mathrm X$ is the starting point of an $\mathrm{EVI}_\lambda$ flow, the slope there (a local object, a priori) admits the global representation

$$|\partial\mathsf E|(x) = \sup_{y\ne x}\left(\frac{\mathsf E(x) - \mathsf E(y)}{\mathsf d(x,y)} + \frac{\lambda}{2}\mathsf d(x,y)\right)^+, \tag{3.1}$$

see [54, Proposition 3.6]. Since we assume that any $x \in \mathrm X$ is the starting point of an $\mathrm{EVI}_\lambda$-gradient flow, this means in particular that $|\partial\mathsf E| : \mathrm X \to [0,\infty]$ is lower semicontinuous, since so is the right-hand side above (as a supremum of lower semicontinuous functions). This also implies by [2, Theorem 1.2.5] that $|\partial\mathsf E|$ is a strong upper gradient for $\mathsf E$ in the sense of [2, Definition 1.2.1], namely: for every $(\gamma_t) \in AC([0,1],\mathrm X)$, the map $t \mapsto \mathsf E(\gamma_t)$ is Borel and

$$|\mathsf E(\gamma_{t_1}) - \mathsf E(\gamma_{t_0})| \le \int_{t_0}^{t_1}|\partial\mathsf E|(\gamma_t)\,|\dot\gamma_t|\,\mathrm{d}t, \qquad \forall\, 0 \le t_0 \le t_1 \le 1, \tag{3.2}$$

the right-hand side being possibly infinite. In addition, if $(\gamma_t)$ is an $\mathrm{EVI}_\lambda$-gradient flow of $\mathsf E$ then the following hold [54, Theorem 3.5]:

(i) If $(\gamma_t)$ starts from $x \in \overline{D(\mathsf E)}$ and $(\tilde\gamma_t)$ is a second $\mathrm{EVI}_\lambda$-gradient flow of $\mathsf E$ starting from $y \in \overline{D(\mathsf E)}$, then
$$\mathsf d(\gamma_t,\tilde\gamma_t) \le e^{-\lambda t}\,\mathsf d(x,y), \qquad \forall t \ge 0. \tag{3.3}$$
This means that EVI-gradient flows are unique (provided they exist) and thus if there exists an EVI-gradient flow $(\gamma_t)$ starting from $x$, then a 1-parameter semigroup $(\mathsf S_t)_{t>0}$ is unambiguously associated to it via $\mathsf S_t(x) = \gamma_t$.

(ii) The maps $t \mapsto \gamma_t$ and $t \mapsto \mathsf E(\gamma_t)$ are locally Lipschitz in $(0,\infty)$ with values in $\mathrm X$ and $\mathbb R$, respectively, and satisfy the Energy Dissipation Equality
$$-\frac{\mathrm d}{\mathrm d t}\mathsf E(\gamma_t) = \frac12|\dot\gamma_t|^2 + \frac12|\partial\mathsf E|^2(\gamma_t) = |\dot\gamma_t|^2 = |\partial\mathsf E|^2(\gamma_t), \qquad \text{for a.e. } t > 0. \tag{3.4}$$

(iii) The map
$$t \mapsto e^{\lambda t}|\partial\mathsf E|(\gamma_t) \quad \text{is non-increasing}. \tag{3.5}$$

(iv) If $(\gamma_t)$ starts from $x$ and $y \in D(|\partial\mathsf E|)$, then
$$|\partial\mathsf E|^2(\gamma_t) \le \frac{1}{2e^{\lambda t} - 1}|\partial\mathsf E|^2(y) + \frac{1}{I_\lambda(t)^2}\,\mathsf d^2(x,y), \qquad \text{provided } -\lambda t < \log 2, \tag{3.6}$$
where $I_\lambda(t) := \int_0^t e^{\lambda s}\,\mathrm{d}s$.

We emphasize that these properties directly follow from the very definition ($\mathrm{EVI}_\lambda$) of gradient flows, and a priori do not require $\mathsf E$ to be geodesically $\lambda$-convex. Although analogous statements can be found in [2] and [1] under convexity assumptions on $\mathsf E$, the latter are essentially needed to grant existence of EVI-gradient flows. It is important to stress this fact because in Setting 3.1 we assume that for any $x \in \mathrm X$ there exists an $\mathrm{EVI}_\lambda$-gradient flow of $\mathsf E$ starting there, which from [27] is known to imply that $\mathsf E$ is geodesically convex. In Section 4 we also provide an alternative proof of this latter fact, whence the necessity for us to avoid all properties of $\mathrm{EVI}$-gradient flows which are actually a consequence of geodesic convexity.

We conclude this preliminary part with a general integrability result about $\mathrm{EVI}_\lambda$-gradient flows, which we could not find explicitly written in the literature and will be used later on in the proof of Lemma 3.6.

Lemma 3.5. With the same assumptions and notations as in Setting 3.1, let $x \in \mathrm X$. Then

$$t \mapsto \mathsf E(\mathsf S_t x) \text{ is integrable in } [0,T], \text{ for all } T > 0,$$

regardless of whether $\mathsf E(x)$ is finite or not.

On intervals $[\varepsilon,T]$ this is easily justified by the fact that $t \mapsto \mathsf E(\mathsf S_t x)$ is locally Lipschitz in $(0,\infty)$, hence locally integrable therein. But the integrability holds even if $\varepsilon = 0$, as we are going to see.

Proof. Let $x \in \mathrm X$ and $T > 0$ be as in our statement. Since $\mathsf E$ is bounded from below on $\mathsf d$-bounded sets by (A2), and because $(\mathsf S_t x)_{t\in[0,T]}$ is bounded, there exists $c \in \mathbb R$ such that $\mathsf E(\mathsf S_t x) \ge c$ for all $t \in [0,T]$. Combining with ($\mathrm{EVI}_\lambda$) this gives

$$c \le \mathsf E(\mathsf S_t x) \le \mathsf E(y) - \frac12\frac{\mathrm d}{\mathrm d t}\mathsf d^2(\mathsf S_t x,y) - \frac{\lambda}{2}\mathsf d^2(\mathsf S_t x,y)$$

for any $y \in D(\mathsf E)$ and a.e. $t \in (0,T]$. Integrating from $t = \eta > 0$ to $t = T$ gives

$$c(T-\eta) \le \int_\eta^T\mathsf E(\mathsf S_t x)\,\mathrm{d}t \le (T-\eta)\mathsf E(y) - \frac12\Big(\mathsf d^2(\mathsf S_T x,y) - \mathsf d^2(\mathsf S_\eta x,y)\Big) - \frac{\lambda}{2}\int_\eta^T\mathsf d^2(\mathsf S_t x,y)\,\mathrm{d}t.$$

As $t \mapsto \mathsf E(\mathsf S_t x)$ is bounded from below on $[0,T]$ and the right-hand side has a finite limit as $\eta \downarrow 0$ (thanks to the fact that $t \mapsto \mathsf S_t x$ is $\mathsf d$-continuous on $[0,\infty)$ by the very definition of $\mathrm{EVI}_\lambda$-gradient flow), we deduce the desired integrability.

In this section the formal Riemannian computations carried out at the beginning of Section 2, and more precisely (2.4), will be reproduced rigorously in the abstract Setting 3.1. To this aim, a key role will be played by the following purely metric estimate:
Lemma 3.6.
With the same assumptions and notations as in Setting 3.1, let ( γ t ) ∈ AC ([0 , , X) with E ( γ ) , E ( γ ) < ∞ . For any fixed absolutely continuous function h : [0 , → R with h ( t ) > for all t ∈ (0 , let ˜ γ t := S h ( t ) γ t , t ∈ [0 , , and for any ≤ t < t ≤ write t + := (cid:26) t if h ( t ) ≥ h ( t ) t otherwise and t − := (cid:26) t if h ( t ) ≥ h ( t ) t otherwise . (3.7)13 hen we have the exact estimate (cid:12)(cid:12)(cid:12)(cid:12) d (˜ γ t , ˜ γ t ) t − t (cid:12)(cid:12)(cid:12)(cid:12) + 12 λ | ∂ E | (˜ γ t + ) e λ ( h ( t ) − h ( t )) + e λ ( h ( t ) − h ( t )) − t − t ) + 1 − e − λ ( h ( t + ) − h ( t − )) λ ( t + − t − ) · E (˜ γ t ) − E (˜ γ t ) t − t ≤ e − λ ( h ( t )+ h ( t )) (cid:12)(cid:12)(cid:12)(cid:12) d ( γ t , γ t ) t − t (cid:12)(cid:12)(cid:12)(cid:12) . (3.8)Here we use the convention that (+ ∞ ) × whenever | ∂ E | (˜ γ t + ) = + ∞ and h ( t ) = h ( t ) in the second term on the left-hand side of (3.8). Since we assume that h ( t ) > for t ∈ (0 , , and because any EVI λ -gradient flow immediately falls within D ( | ∂ E | ) by standard regularizing effects, this latter case is in fact only possible if t = 0 , t = 1 , and h ( t ) = h ( t ) = 0 . In that case ˜ γ = γ and ˜ γ = γ , the thirdterm in the left-hand side also cancels owing to e − λ ( h ( t + ) − h ( t − )) = 1 , and (3.8) thenholds as a trivial equality.We shall rely on this lemma later on in two different ways: First, fixing t = 0 and letting t ↓ (resp. fixing t = 1 and letting t ↑ ) to control in Lemma 3.10the continuity of t E (˜ γ t ) at the boundaries t = 0 , , and second, fixing t ∈ (0 , and letting t → t to obtain in Proposition 3.11 a pointwise differential estimatesimilar to (2.4). Remark 3.7.
The times $t_\pm$ are just a fancy notation, ordered as $h(t_-)\le h(t_+)$. Note that in our estimate (3.8) the Fisher information $|\partial\mathsf E|^2(\tilde\gamma_{t_+})$ is evaluated at the time $t=t_+$ for which the "smoothing time" $s=h(t_0)$ or $s=h(t_1)$ is the largest, i.e. where the regularizing vertical flow has been run for the longest time. This is somehow natural, as this specific point is "better" than the other one in terms of regularity. ■

Proof.
By symmetry we only discuss the case $h(t_1)\ge h(t_0)$, i.e. $t_+=t_1$ and $t_-=t_0$. As already mentioned, if $h(t_0)=h(t_1)=0$ our statement is actually vacuous, thus it is not restrictive to further assume $h(t_1)>0$. Let us write for simplicity $\hat\gamma_{t_1}:=\mathsf S_{h(t_0)}\gamma_{t_1}$. From an intuitive point of view, this corresponds to freezing a "vertical time" $s=h(t_0)$ and "translating" $\tilde\gamma_{t_0}$ in the "horizontal" $t$ direction parallel to the curve $\gamma$ until $t_1$. Here, in the "vertical" direction above $t_1$ the smoothing semigroup $\mathsf S_s$ associated with $\mathsf E$ has been run at least for a strictly positive time $h(t_1)>0$, so that by (3.6) the solution of the "vertical" gradient flow at that time lies within the regular domain $D(|\partial\mathsf E|)\subset D(\mathsf E)\subset\mathrm X$, see Figure 2.

The first step is to write ($\mathrm{EVI}_\lambda$) for $s\mapsto\mathsf S_s(\gamma_{t_1})$ with $\tilde\gamma_{t_0}$ as a reference point, namely
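$$\frac12\frac{\mathrm d}{\mathrm ds}\mathsf d^2(\mathsf S_s\gamma_{t_1},\tilde\gamma_{t_0})+\frac\lambda2\,\mathsf d^2(\mathsf S_s\gamma_{t_1},\tilde\gamma_{t_0})+\mathsf E(\mathsf S_s\gamma_{t_1})\le\mathsf E(\tilde\gamma_{t_0}),$$
which holds true for a.e. $s\in[0,h(t_1)]$ in the "vertical" direction. This inequality can be equivalently rewritten as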
$$\frac12\frac{\mathrm d}{\mathrm ds}\Big(e^{\lambda s}\,\mathsf d^2(\mathsf S_s\gamma_{t_1},\tilde\gamma_{t_0})\Big)\le e^{\lambda s}\Big(\mathsf E(\tilde\gamma_{t_0})-\mathsf E(\mathsf S_s\gamma_{t_1})\Big).\tag{3.9}$$

[Figure 2: The horizontal and vertical curves.]

Note that this estimate carries significant information if and only if the reference point has finite entropy, i.e. $\mathsf E(\tilde\gamma_{t_0})<\infty$ in the right-hand side. This holds true for $t_0\in(0,1)$ because $\tilde\gamma_{t_0}$ is the $\mathrm{EVI}_\lambda$-gradient flow of $\mathsf E$ starting from $\gamma_{t_0}$ at a strictly positive time $s=h(t_0)>0$, but also for $h(t_0)=0$ if $t_0=0$ since in this case $\tilde\gamma_0=\gamma_0$ is assumed to have finite entropy.

Integrating (3.9) from $s=h(t_0)$ to $s=h(t_1)$ gives
$$\begin{aligned}\frac12e^{\lambda h(t_1)}\mathsf d^2(\tilde\gamma_{t_1},\tilde\gamma_{t_0})-\frac12e^{\lambda h(t_0)}\mathsf d^2(\hat\gamma_{t_1},\tilde\gamma_{t_0})&\le\int_{h(t_0)}^{h(t_1)}e^{\lambda s}\Big(\mathsf E(\tilde\gamma_{t_0})-\mathsf E(\mathsf S_s\gamma_{t_1})\Big)\,\mathrm ds\\&=\int_{h(t_0)}^{h(t_1)}e^{\lambda s}\Big(\big(\mathsf E(\tilde\gamma_{t_0})-\mathsf E(\tilde\gamma_{t_1})\big)+\big(\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\mathsf S_s\gamma_{t_1})\big)\Big)\,\mathrm ds\\&=\int_{h(t_0)}^{h(t_1)}e^{\lambda s}\Big(\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\mathsf S_s\gamma_{t_1})\Big)\,\mathrm ds-\frac{e^{\lambda h(t_1)}-e^{\lambda h(t_0)}}{\lambda}\Big(\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\tilde\gamma_{t_0})\Big).\end{aligned}\tag{3.10}$$
If $h(t_0)>0$ this computation is legitimate because $s\mapsto\mathsf S_s\gamma_{t_1}$ and $s\mapsto\mathsf E(\mathsf S_s\gamma_{t_1})$ are locally Lipschitz in $(0,\infty)$, hence $s\mapsto\mathsf d^2(\mathsf S_s\gamma_{t_1},\tilde\gamma_{t_0})$ and $s\mapsto\mathsf E(\mathsf S_s\gamma_{t_1})$ are locally integrable therein. But this computation is also justified when $h(t_0)=0$ by Lemma 3.5. More specifically $s\mapsto\mathsf E(\mathsf S_s\gamma_{t_1})$ is absolutely integrable on $[0,T]$ for any $T>0$ and a fortiori so is $s\mapsto e^{\lambda s}\mathsf E(\mathsf S_s\gamma_{t_1})$.

Now let us estimate the terms in (3.10) to get (3.8). First, since $\hat\gamma_{t_1}=\mathsf S_{h(t_0)}\gamma_{t_1}$, $\tilde\gamma_{t_0}=\mathsf S_{h(t_0)}\gamma_{t_0}$, and $\mathsf S_{h(t_0)}(\cdot)$ is $\lambda$-contractive by (3.3), we observe that the second term in the left-hand side of (3.10) can be controlled as
$$\mathsf d(\hat\gamma_{t_1},\tilde\gamma_{t_0})=\mathsf d(\mathsf S_{h(t_0)}\gamma_{t_1},\mathsf S_{h(t_0)}\gamma_{t_0})\le e^{-\lambda h(t_0)}\,\mathsf d(\gamma_{t_1},\gamma_{t_0}).\tag{3.11}$$
In the right-hand side, let us define
$$I:=\int_{h(t_0)}^{h(t_1)}e^{\lambda s}\Big(\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\mathsf S_s\gamma_{t_1})\Big)\,\mathrm ds.$$
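For readers who want the two elementary manipulations behind (3.9) and (3.10) spelled out, the following LaTeX fragment is a minimal sketch of the integrating-factor step and of the explicit integral of $e^{\lambda s}$. It assumes $\lambda\neq0$ (for $\lambda=0$ the quotient is read here as its limit $h(t_1)-h(t_0)$, an interpretation adopted for this illustration and not stated in the text), and the shorthand $D(s)$ is introduced only for this purpose.

```latex
% Shorthand (illustration only, not the paper's notation):
%   D(s) := d^2(S_s gamma_{t_1}, \tilde gamma_{t_0}).
% Step 1: multiply (EVI_lambda) by the integrating factor e^{lambda s}.
\begin{align*}
\frac12\frac{\mathrm d}{\mathrm ds}\Big(e^{\lambda s}D(s)\Big)
  &= e^{\lambda s}\Big(\frac12 D'(s)+\frac\lambda2 D(s)\Big) \\
  &\le e^{\lambda s}\Big(\mathsf E(\tilde\gamma_{t_0})-\mathsf E(\mathsf S_s\gamma_{t_1})\Big),
\end{align*}
% which is (3.9).
% Step 2: the s-independent part of the integrand in (3.10), for lambda != 0.
\begin{align*}
\int_{h(t_0)}^{h(t_1)} e^{\lambda s}\,
  \big(\mathsf E(\tilde\gamma_{t_0})-\mathsf E(\tilde\gamma_{t_1})\big)\,\mathrm ds
  &= \frac{e^{\lambda h(t_1)}-e^{\lambda h(t_0)}}{\lambda}\,
     \big(\mathsf E(\tilde\gamma_{t_0})-\mathsf E(\tilde\gamma_{t_1})\big) \\
  &= -\frac{e^{\lambda h(t_1)}-e^{\lambda h(t_0)}}{\lambda}\,
     \big(\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\tilde\gamma_{t_0})\big).
\end{align*}
```

The remaining $s$-dependent part of the integrand is exactly the quantity denoted by $I$ in the proof.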
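The integral $I$ is clearly non-positive by (3.4), but we need a finer analysis. To this aim, for fixed $0<s<h(t_1)$ let us write
$$\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\mathsf S_s\gamma_{t_1})=\mathsf E(\mathsf S_{h(t_1)}\gamma_{t_1})-\mathsf E(\mathsf S_s\gamma_{t_1})=\int_s^{h(t_1)}\frac{\mathrm d}{\mathrm d\tau}\mathsf E(\mathsf S_\tau\gamma_{t_1})\,\mathrm d\tau=-\int_s^{h(t_1)}|\partial\mathsf E|^2(\mathsf S_\tau\gamma_{t_1})\,\mathrm d\tau,$$
where the second equality holds due to $\tau\mapsto\mathsf E(\mathsf S_\tau\gamma_{t_1})$ being Lipschitz on $[s,h(t_1)]$, and the third one stems from (3.4) for the gradient flow $\tau\mapsto\mathsf S_\tau\gamma_{t_1}$. By (3.5)
$$\begin{aligned}-\int_s^{h(t_1)}|\partial\mathsf E|^2(\mathsf S_\tau\gamma_{t_1})\,\mathrm d\tau&\le-\int_s^{h(t_1)}|\partial\mathsf E|^2(\mathsf S_{h(t_1)}\gamma_{t_1})\,e^{2\lambda(h(t_1)-\tau)}\,\mathrm d\tau\\&=-e^{2\lambda h(t_1)}\,|\partial\mathsf E|^2(\mathsf S_{h(t_1)}\gamma_{t_1})\int_s^{h(t_1)}e^{-2\lambda\tau}\,\mathrm d\tau\\&=\frac{1}{2\lambda}\Big(1-e^{2\lambda(h(t_1)-s)}\Big)|\partial\mathsf E|^2(\mathsf S_{h(t_1)}\gamma_{t_1}),\end{aligned}$$
so that, as a consequence,
$$\begin{aligned}I&\le\frac{1}{2\lambda}\int_{h(t_0)}^{h(t_1)}e^{\lambda s}\Big(1-e^{2\lambda(h(t_1)-s)}\Big)|\partial\mathsf E|^2(\mathsf S_{h(t_1)}\gamma_{t_1})\,\mathrm ds\\&=\frac{1}{2\lambda}\,|\partial\mathsf E|^2(\mathsf S_{h(t_1)}\gamma_{t_1})\int_{h(t_0)}^{h(t_1)}\Big(e^{\lambda s}-e^{2\lambda h(t_1)}\cdot e^{-\lambda s}\Big)\,\mathrm ds\\&=-\frac{1}{2\lambda^2}\,|\partial\mathsf E|^2(\tilde\gamma_{t_1})\,e^{\lambda h(t_1)}\Big(e^{\lambda(h(t_1)-h(t_0))}+e^{\lambda(h(t_0)-h(t_1))}-2\Big).\end{aligned}$$
Plugging this estimate together with (3.11) into (3.10), multiplying by $e^{-\lambda h(t_1)}$ and dividing by $(t_1-t_0)^2>0$ entails our claim.

We also need to study the behaviour of the "regularized" curve $\tilde\gamma_t:=\mathsf S_{h(t)}\gamma_t$ and of the entropy $\mathsf E$ along it: this is the content of the following two results.

Lemma 3.8.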
With the same assumptions and notations as in Setting 3.1, if $(\gamma_t)\in AC([0,1],\mathrm X)$ and $h:[0,1]\to\mathbb R$ is absolutely continuous with $h(t)>0$ for all $t\in(0,1)$, then the curve $\tilde\gamma_t:=\mathsf S_{h(t)}\gamma_t$ belongs to $AC_{loc}((0,1),\mathrm X)\cap C([0,1],\mathrm X)$.

Proof. Fix $\delta\in(0,1/2)$, $t_0,t_1\in[\delta,1-\delta]$ with $t_0\le t_1$ and define
$$m_\delta:=\min_{t\in[\delta,1-\delta]}h(t),\qquad M_\delta:=\max_{t\in[\delta,1-\delta]}h(t),\tag{3.12}$$
paying attention to the fact that $m_\delta>0$ by construction. Write as before $\hat\gamma_{t_1}:=\mathsf S_{h(t_0)}\gamma_{t_1}$ for the "horizontal" translation of $\tilde\gamma_{t_0}$ (see Figure 2). By triangular inequality and the contraction estimate (3.3) we get
$$\mathsf d(\tilde\gamma_{t_0},\tilde\gamma_{t_1})\le\mathsf d(\tilde\gamma_{t_0},\hat\gamma_{t_1})+\mathsf d(\hat\gamma_{t_1},\tilde\gamma_{t_1})\le e^{\lambda^-M_\delta}\,\mathsf d(\gamma_{t_0},\gamma_{t_1})+\mathsf d(\hat\gamma_{t_1},\tilde\gamma_{t_1}),\tag{3.13}$$
where $\lambda^-:=\max\{-\lambda,0\}$. Since $(\gamma_t)$ is absolutely continuous the first term in the right-hand side can be controlled as $\mathsf d(\gamma_{t_0},\gamma_{t_1})\le\int_{t_0}^{t_1}|\dot\gamma_t|\,\mathrm dt$. As regards the second one, by (3.4) and up to assuming $h(t_0)\le h(t_1)$ (which is not restrictive, as otherwise it is sufficient to swap the boundary values of integration below) it holds
$$\mathsf d(\hat\gamma_{t_1},\tilde\gamma_{t_1})=\mathsf d(\mathsf S_{h(t_0)}\gamma_{t_1},\mathsf S_{h(t_1)}\gamma_{t_1})\le\int_{h(t_0)}^{h(t_1)}\Big|\frac{\mathrm d}{\mathrm ds}\mathsf S_s\gamma_{t_1}\Big|\,\mathrm ds=\int_{h(t_0)}^{h(t_1)}|\partial\mathsf E|(\mathsf S_s\gamma_{t_1})\,\mathrm ds,\tag{3.14}$$
where, to avoid possibly ambiguous notations, $|\frac{\mathrm d}{\mathrm ds}\mathsf S_s\gamma_{t_1}|$ denotes the metric speed of the "vertical" curve $s\mapsto\mathsf S_s\gamma_{t_1}$. In order to control the slope in the right-most term uniformly both in $s\in[h(t_0),h(t_1)]\subset[m_\delta,M_\delta]$ and in $t\in[\delta,1-\delta]$, for fixed $\delta$, let $\varepsilon$ be such that $-\lambda\varepsilon<\log2$ (if $\lambda\ge0$, choose $\varepsilon=m_\delta$) and define $\varepsilon':=\min\{m_\delta,\varepsilon\}$. Then by (3.5) and the fact that $s\ge h(t)\ge m_\delta\ge\varepsilon'$ we have
$$|\partial\mathsf E|(\mathsf S_s\gamma_t)\le e^{\lambda(\varepsilon'-s)}\,|\partial\mathsf E|(\mathsf S_{\varepsilon'}\gamma_t)\le e^{\lambda^-(M_\delta-\varepsilon')}\,|\partial\mathsf E|(\mathsf S_{\varepsilon'}\gamma_t),\qquad\forall s\in[m_\delta,M_\delta],$$
and by (3.6) for any reference point $x\in D(|\partial\mathsf E|)$ it holds
$$|\partial\mathsf E|^2(\mathsf S_{\varepsilon'}\gamma_t)\le\frac{1}{2e^{\lambda\varepsilon'}-1}\,|\partial\mathsf E|^2(x)+\frac{1}{I_\lambda(\varepsilon')^2}\,\mathsf d^2(x,\gamma_t).$$
The squared distance in the right-hand side of the last inequality is bounded uniformly in $t\in[\delta,1-\delta]$, since by triangular inequality
$$\mathsf d(x,\gamma_t)\le\mathsf d(x,\gamma_0)+\mathsf d(\gamma_0,\gamma_t)\le\mathsf d(x,\gamma_0)+\ell(\gamma).$$
Therefore there exists $C_\delta>0$ such that
$$|\partial\mathsf E|(\mathsf S_s\gamma_t)\le C_\delta\qquad\text{for all }t\in[\delta,1-\delta]\text{ and }s\in[m_\delta,M_\delta],\tag{3.15}$$
and plugging this bound into (3.14) yields
$$\mathsf d(\hat\gamma_{t_1},\tilde\gamma_{t_1})\le C_\delta\,|h(t_1)-h(t_0)|\le C_\delta\int_{t_0}^{t_1}|h'(t)|\,\mathrm dt.$$
It is now sufficient to combine this inequality with $\mathsf d(\gamma_{t_0},\gamma_{t_1})\le\int_{t_0}^{t_1}|\dot\gamma_t|\,\mathrm dt$ and (3.13) to get
$$\mathsf d(\tilde\gamma_{t_0},\tilde\gamma_{t_1})\le\int_{t_0}^{t_1}\Big(e^{\lambda^-M_\delta}|\dot\gamma_t|+C_\delta|h'(t)|\Big)\,\mathrm dt.\tag{3.16}$$
As $e^{\lambda^-M_\delta}|\dot\gamma_t|+C_\delta|h'(t)|\in L^1(\delta,1-\delta)$ and $\delta$ is arbitrary, the fact that $(\tilde\gamma_t)\in AC_{loc}((0,1),\mathrm X)$ is thus proved.

Turning now to the continuity of $(\tilde\gamma_t)$ at the endpoints, let $t_0=0$ and $t_1\in(0,1)$. Arguing as for (3.13) but with a crucial difference in the choice of the third point in the triangular inequality, it holds
$$\mathsf d(\tilde\gamma_0,\tilde\gamma_{t_1})\le\mathsf d(\tilde\gamma_0,\mathsf S_{h(t_1)}\gamma_0)+\mathsf d(\mathsf S_{h(t_1)}\gamma_0,\tilde\gamma_{t_1})\le\mathsf d(\mathsf S_{h(0)}\gamma_0,\mathsf S_{h(t_1)}\gamma_0)+e^{-\lambda h(t_1)}\,\mathsf d(\gamma_0,\gamma_{t_1}).$$
The second term on the right-hand side vanishes as $t_1\downarrow0$ by (absolute) continuity of $\gamma$ and so does the first one, since $s\mapsto\mathsf S_s\gamma_0$ is continuous in $[0,\infty)$ with values in $\mathrm X$ and $h(t_1)\to h(0)$. The continuity at $t=1$ is obtained similarly and the proof is complete.

Remark 3.9. If $h(t)>0$ also in $t=0,1$, then the previous argument can be extended to the whole interval $[0,1]$ and therefore $(\tilde\gamma_t)\in AC([0,1],\mathrm X)$. ■

Lemma 3.10.
With the same assumptions and notations as in Lemma 3.8, the entropy is locally absolutely continuous in $(0,1)$ along the regularized curve $\tilde\gamma_t$, i.e. $t\mapsto\mathsf E(\tilde\gamma_t)\in AC_{loc}((0,1))$. If in addition $(\gamma_t)\in AC^2([0,1],\mathrm X)$, $\mathsf E(\tilde\gamma_0),\mathsf E(\tilde\gamma_1)<\infty$ and $h$ is differentiable at $t=0$ and $t=1$ with $h'(0)>0$ and $h'(1)<0$, then $t\mapsto\mathsf E(\tilde\gamma_t)\in C([0,1])$. Note that $\mathsf E(\tilde\gamma_0),\mathsf E(\tilde\gamma_1)<\infty$ is automatically satisfied if $h(t)>0$ also in $t=0,1$.

Proof.
Let us first prove that $t\mapsto\mathsf E(\tilde\gamma_t)$ is locally absolutely continuous. Since $|\partial\mathsf E|$ is a strong upper-gradient, the chain rule (3.2) holds and it suffices to show that $|\partial\mathsf E|(\tilde\gamma_t)\,|\dot{\tilde\gamma}_t|\in L^1_{loc}(0,1)$, namely
$$\int_\delta^{1-\delta}|\partial\mathsf E|(\tilde\gamma_t)\,|\dot{\tilde\gamma}_t|\,\mathrm dt<\infty,\qquad\forall\delta\in(0,1/2),\tag{3.17}$$
as this would imply that $\mathsf E\circ\tilde\gamma\in AC_{loc}((0,1))$ with
$$\Big|\frac{\mathrm d}{\mathrm dt}(\mathsf E\circ\tilde\gamma)(t)\Big|\le|\partial\mathsf E|(\tilde\gamma_t)\cdot|\dot{\tilde\gamma}_t|,\qquad\text{for a.e. }t\in(0,1).$$
To this aim, observe from (3.16) that $|\dot{\tilde\gamma}_t|\in L^1_{loc}(0,1)$ with
$$|\dot{\tilde\gamma}_t|\le e^{\lambda^-M_\delta}|\dot\gamma_t|+C_\delta|h'(t)|\qquad\text{a.e. on }[\delta,1-\delta],$$
with $M_\delta$ defined in (3.12). Moreover from (3.15) we also know that $|\partial\mathsf E|(\mathsf S_s\gamma_t)\le C_\delta$ uniformly in $t\in[\delta,1-\delta]$ and $s\in[m_\delta,M_\delta]$, so that by choosing $s=h(t)$ we get in particular $|\partial\mathsf E|(\tilde\gamma_t)\le C_\delta$ for all $t\in[\delta,1-\delta]$. This shows that $t\mapsto|\partial\mathsf E|(\tilde\gamma_t)$ belongs to $L^\infty_{loc}(0,1)$, whence (3.17).

Now assume that $(\gamma_t)\in AC^2([0,1],\mathrm X)$, $\mathsf E(\tilde\gamma_0)<\infty$, $h$ is differentiable at $t=0$ with $h'(0)>0$ and let us prove that $t\mapsto\mathsf E(\tilde\gamma_t)$ is continuous at $t=0$. (The argument is identical for $t=1$.) On the one hand, as $(\tilde\gamma_t)$ is continuous at $t=0$ by Lemma 3.8 and $\mathsf E$ is lower semicontinuous, we see that $\mathsf E(\tilde\gamma_0)\le\liminf_{t\downarrow0}\mathsf E(\tilde\gamma_t)$. On the other hand, choosing $t_0=0$ in (3.8), our assumption that $h'(0)>0$ gives $h(t_1)>h(0)$ for $t_1>0$ small, hence $t_-=0$ and $t_+=t_1$. Discarding the first two (non-negative) terms on the left-hand side, and multiplying by $(t_1-t_0)=t_1$ yield
$$\begin{aligned}\frac{1-e^{-\lambda(h(t_1)-h(0))}}{\lambda(t_1-0)}\cdot\big(\mathsf E(\tilde\gamma_{t_1})-\mathsf E(\tilde\gamma_0)\big)&\le\frac{t_1}{2}\,e^{-\lambda(h(t_1)+h(0))}\left|\frac{\mathsf d(\gamma_{t_1},\gamma_0)}{t_1}\right|^2\\&\le\frac{t_1}{2}\,e^{-\lambda(h(t_1)+h(0))}\Big(\frac{1}{t_1}\int_0^{t_1}|\dot\gamma_t|\,\mathrm dt\Big)^2\\&\le\frac12\,e^{-\lambda(h(t_1)+h(0))}\int_0^{t_1}|\dot\gamma_t|^2\,\mathrm dt.\end{aligned}$$
Letting $t_1\downarrow0$, the right-hand side vanishes owing to our assumption that $(\gamma_t)\in AC^2([0,1],\mathrm X)$, and clearly the exponential difference quotient in the left-hand side converges to $h'(0)$. Rearranging gives
$$h'(0)\,\limsup_{t_1\downarrow0}\mathsf E(\tilde\gamma_{t_1})\le h'(0)\,\mathsf E(\tilde\gamma_0),$$
and since $h'(0)>0$ the desired upper semicontinuity follows and the proof is complete.

Gathering the results proven so far, we deduce the following:

Proposition 3.11.
With the same assumptions and notations as in Lemma 3.8, for a.e. $t\in(0,1)$ it holds
$$\frac12\big|\dot{\tilde\gamma}_t\big|^2+\frac12|h'(t)|^2\,|\partial\mathsf E|^2(\tilde\gamma_t)+h'(t)\,\frac{\mathrm d}{\mathrm dt}\mathsf E(\tilde\gamma_t)\le\frac12\,e^{-2\lambda h(t)}\,|\dot\gamma_t|^2.\tag{3.18}$$

Proof.
The argument simply consists in taking the limit $t_1\to t_0$ in (3.8), which should clearly lead (at least formally) to (3.18) by Taylor-expanding the various exponential difference quotients. In order to make this rigorous, note that the first and third terms in the left-hand side of (3.18) are well defined for a.e. $t\in(0,1)$ by Lemma 3.8 and Lemma 3.10, respectively. The second term is also unambiguously defined because $h(t)>0$, hence the "vertical" $\mathrm{EVI}_\lambda$-gradient flow starting from $\gamma_t$ and defining $\tilde\gamma_t=\mathsf S_{h(t)}\gamma_t$ falls immediately within the domain $D(|\partial\mathsf E|)$. The right-hand side is well defined for a.e. $t$ since $\gamma\in AC([0,1],\mathrm X)$.

After this premise, let $t\in(0,1)$ be any differentiation point for $h$, $t\mapsto\gamma_t$, $t\mapsto\tilde\gamma_t$ and $t\mapsto\mathsf E(\tilde\gamma_t)$, choose $t_0=t$ in (3.8) and let us take the right limit $t_1\downarrow t$ (since we are considering a differentiability point the left and right limits exist and are equal, so there is no need to address the left limit). From the very definition (3.7) of $t_\pm$ it clearly holds $t_\pm\to t$ as $t_1\downarrow t$, hence the convergence of the right-hand side of (3.8) to the right-hand side of (3.18) is clear and so is the convergence of the two difference quotients of $h$. By Lemma 3.8 the first term in the left-hand side also passes to the limit, as does the third one according to Lemma 3.10. The only term left to handle is the Fisher information $|\partial\mathsf E|^2(\tilde\gamma_{t_+})$. From the continuity of $t\mapsto\tilde\gamma_t$ (cf. Lemma 3.8) we see that $\tilde\gamma_{t_+}\to\tilde\gamma_t$ in $(\mathrm X,\mathsf d)$, and the lower semicontinuity of the slope (3.1) results in
$$|\partial\mathsf E|^2(\tilde\gamma_t)\le\liminf_{t_1\downarrow t}|\partial\mathsf E|^2(\tilde\gamma_{t_+}).$$
Thus rigorously taking the $\liminf$ as $t_1\downarrow t$ in (3.8) entails (3.18) and achieves the proof.

The interesting consequence for our purpose is then:

Theorem 3.12.
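With the same assumptions and notations as in Setting 3.1, fix $\varepsilon>0$, and set $h_\varepsilon(t):=\varepsilon\min\{t,1-t\}$. Let $(\gamma_t)\in AC^2([0,1],\mathrm X)$ be such that $\mathsf E(\gamma_0),\mathsf E(\gamma_1)<\infty$ and define
$$\gamma^\varepsilon_t:=\mathsf S_{h_\varepsilon(t)}\gamma_t,\qquad t\in[0,1].$$
Then $(\gamma^\varepsilon_t)\in AC^2([0,1],\mathrm X)$, $t\mapsto\mathsf E(\gamma^\varepsilon_t)$ belongs to $AC([0,1])$ and it holds
$$\frac12\int_0^1|\dot\gamma^\varepsilon_t|^2\,\mathrm dt+\frac{\varepsilon^2}{2}\int_0^1|\partial\mathsf E|^2(\gamma^\varepsilon_t)\,\mathrm dt\le\frac{e^{\lambda^-\varepsilon}}{2}\int_0^1|\dot\gamma_t|^2\,\mathrm dt-2\varepsilon\,\mathsf E(\gamma^\varepsilon_{1/2})+\varepsilon\Big(\mathsf E(\gamma_0)+\mathsf E(\gamma_1)\Big).\tag{3.19}$$
Note here that $h_\varepsilon(0)=h_\varepsilon(1)=0$, so that the endpoints $\gamma^\varepsilon_0=\gamma_0$ and $\gamma^\varepsilon_1=\gamma_1$ remain unchanged.

Proof. The strategy of proof simply consists in integrating (3.18) between $0$ and $1$ while integrating by parts the term $h'_\varepsilon(t)\frac{\mathrm d}{\mathrm dt}\mathsf E(\gamma^\varepsilon_t)$, separately on $[0,1/2]$ and $[1/2,1]$. Note carefully that our specific choice gives $h'_\varepsilon=\varepsilon$ and $h'_\varepsilon=-\varepsilon$ on these two time intervals, respectively. Taking into account $e^{-2\lambda h_\varepsilon(t)}\le e^{\lambda^-\varepsilon}$, where $\lambda^-:=\max\{-\lambda,0\}$, this procedure yields
$$\frac12\int_0^1|\dot\gamma^\varepsilon_t|^2\,\mathrm dt+\frac{\varepsilon^2}{2}\int_0^1|\partial\mathsf E|^2(\gamma^\varepsilon_t)\,\mathrm dt\le\frac{e^{\lambda^-\varepsilon}}{2}\int_0^1|\dot\gamma_t|^2\,\mathrm dt-2\varepsilon\,\mathsf E(\gamma^\varepsilon_{1/2})+\varepsilon\Big(\mathsf E(\gamma^\varepsilon_0)+\mathsf E(\gamma^\varepsilon_1)\Big).$$
The term $2\varepsilon\,\mathsf E(\gamma^\varepsilon_{1/2})$ simply arises from the two boundary terms at $t=1/2$ in the two integrations by parts. (Alternatively, it can be seen as the result of $-\int_0^1\mathsf E(\gamma^\varepsilon_t)\,h''(t)\,\mathrm dt$ arising from the integration by parts in the whole interval $[0,1]$, with the singularity $h''(t)=-2\varepsilon\,\delta_{1/2}(t)$.) However, this argument is not fully rigorous because all the terms on the left-hand side of (3.18) are only locally integrable, hence we may not be allowed to integrate them all the way to $t=0$ and $t=1$.

In order to circumvent this slight issue, choose $\delta\in(0,1/2)$ and carry out the same argument on $[\delta,1/2]$ and $[1/2,1-\delta]$ rather than on $[0,1/2]$ and $[1/2,1]$: integration by parts is now justified by Lemma 3.10 and this provides us with
$$\frac12\int_\delta^{1-\delta}|\dot\gamma^\varepsilon_t|^2\,\mathrm dt+\frac{\varepsilon^2}{2}\int_\delta^{1-\delta}|\partial\mathsf E|^2(\gamma^\varepsilon_t)\,\mathrm dt\le\frac{e^{\lambda^-\varepsilon}}{2}\int_\delta^{1-\delta}|\dot\gamma_t|^2\,\mathrm dt+\varepsilon\Big(\mathsf E(\gamma^\varepsilon_\delta)-2\,\mathsf E(\gamma^\varepsilon_{1/2})+\mathsf E(\gamma^\varepsilon_{1-\delta})\Big).\tag{3.20}$$
It is then sufficient to pass to the limit as $\delta\downarrow0$.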
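As a reading aid, the boundary terms in (3.20) can be traced through the following short computation. It is only a sketch of the piecewise integration on $[\delta,1/2]$ and $[1/2,1-\delta]$, assuming (as granted by Lemma 3.10) that $t\mapsto\mathsf E(\gamma^\varepsilon_t)$ is absolutely continuous on $[\delta,1-\delta]$ so that the fundamental theorem of calculus applies on each piece.

```latex
% h_eps'(t) = +eps on (0,1/2) and h_eps'(t) = -eps on (1/2,1).
\begin{align*}
-\int_\delta^{1-\delta} h_\varepsilon'(t)\,\frac{\mathrm d}{\mathrm dt}
   \mathsf E(\gamma^\varepsilon_t)\,\mathrm dt
 &= -\varepsilon\int_\delta^{1/2}\frac{\mathrm d}{\mathrm dt}
      \mathsf E(\gamma^\varepsilon_t)\,\mathrm dt
    +\varepsilon\int_{1/2}^{1-\delta}\frac{\mathrm d}{\mathrm dt}
      \mathsf E(\gamma^\varepsilon_t)\,\mathrm dt \\
 &= -\varepsilon\Big(\mathsf E(\gamma^\varepsilon_{1/2})
      -\mathsf E(\gamma^\varepsilon_\delta)\Big)
    +\varepsilon\Big(\mathsf E(\gamma^\varepsilon_{1-\delta})
      -\mathsf E(\gamma^\varepsilon_{1/2})\Big) \\
 &= \varepsilon\Big(\mathsf E(\gamma^\varepsilon_\delta)
      -2\,\mathsf E(\gamma^\varepsilon_{1/2})
      +\mathsf E(\gamma^\varepsilon_{1-\delta})\Big).
\end{align*}
% Exponential factor: since 0 <= h_eps(t) <= eps/2 on [0,1],
\begin{align*}
e^{-2\lambda h_\varepsilon(t)}
 \le e^{2\lambda^- h_\varepsilon(t)}
 \le e^{\lambda^-\varepsilon},
 \qquad \lambda^- := \max\{-\lambda,0\}.
\end{align*}
```

Both contributions at $t=1/2$ carry the same sign, which is where the factor $2$ in front of $\mathsf E(\gamma^\varepsilon_{1/2})$ comes from.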
By monotonicity the left-hand side of (3.20) converges to the left-hand side in (3.19) and for the same reason so does the first term on the right-hand side, while by the current choice of $h$ and by Lemma 3.10 $t\mapsto\mathsf E(\gamma^\varepsilon_t)$ is continuous on the whole interval $[0,1]$, so that
$$\lim_{\delta\downarrow0}\varepsilon\Big(\mathsf E(\gamma^\varepsilon_\delta)+\mathsf E(\gamma^\varepsilon_{1-\delta})\Big)=\varepsilon\Big(\mathsf E(\gamma^\varepsilon_0)+\mathsf E(\gamma^\varepsilon_1)\Big)=\varepsilon\Big(\mathsf E(\gamma_0)+\mathsf E(\gamma_1)\Big)$$
and (3.19) follows.

Finally, since the right-hand side of (3.19) is finite we see that $|\dot\gamma^\varepsilon|\in L^2(0,1)$ and $|\partial\mathsf E|(\gamma^\varepsilon)\in L^2(0,1)$. As a consequence $|\dot\gamma^\varepsilon|\cdot|\partial\mathsf E|(\gamma^\varepsilon)\in L^1(0,1)$ in the strong upper-chain rule (3.2), and $\mathsf E\circ\gamma^\varepsilon\in AC([0,1])$ as desired.

4.1 Γ-convergence of the Schrödinger problem

Relying on the results of the previous section, we can now turn to Theorem 2.1 and make it rigorous in the metric setting. To this end, let us first introduce two action functionals: the kinetic energy $\mathcal A$ and the (halved) Fisher information $\mathcal I$ along a curve, respectively defined as
$$\mathcal A(\gamma):=\frac12\int_0^1|\dot\gamma_t|^2\,\mathrm dt\qquad\text{and}\qquad\mathcal I(\gamma):=\frac12\int_0^1|\partial\mathsf E|^2(\gamma_t)\,\mathrm dt$$
for $(\gamma_t)\in C([0,1],\mathrm X)$, where it is understood that $\mathcal A(\gamma)=+\infty$ whenever $\gamma$ is not absolutely continuous. Given two points $x,y\in\mathrm X$ and a temperature/slowing-down parameter $\varepsilon>0$, the (metric) Schrödinger problem reads as
$$\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\Big\{\mathcal A(\gamma)+\varepsilon^2\mathcal I(\gamma)\Big\},\tag{$\mathrm{Sch}_\varepsilon$}$$
where $(\gamma_t):x\rightsquigarrow y$ is a short-hand notation meaning that the infimum runs over all $(\gamma_t)\in C([0,1],\mathrm X)$ such that $\gamma_0=x$ and $\gamma_1=y$. For sake of brevity we also introduce
$$\mathcal A_\varepsilon(\gamma):=\mathcal A(\gamma)+\varepsilon^2\mathcal I(\gamma).$$
From ($\mathrm{Sch}_\varepsilon$) it is thus clear that the Fisher information $\mathcal I$ acts as a perturbation of $\mathcal A$ and this has a regularizing effect, since minimizers of ($\mathrm{Sch}_\varepsilon$) live within the regular domain $D(|\partial\mathsf E|)$.

Remark 4.1.
The smoothing effect is well understood for the classic Schrödinger problem in a regular setting, namely when $\mathsf E$ is the Boltzmann-Shannon relative entropy and $\mathrm X$ is the Wasserstein space over a smooth Riemannian manifold. In this case, under mild assumptions on the end-points, minimizers of ($\mathrm{Sch}_\varepsilon$) are curves of absolutely continuous measures whose densities are bounded, smooth, Lipschitz, with exponentially fast decaying tails.

In the current metric framework the properties above are meaningless, but still minimizers of ($\mathrm{Sch}_\varepsilon$) are "regular" from a metric point of view, since as just said they live within $D(|\partial\mathsf E|)$. Moreover, in Proposition 4.2 we are going to see that $\mathsf E$ is absolutely continuous along optimal curves. ■

Let us first deal with the solvability of ($\mathrm{Sch}_\varepsilon$).

Proposition 4.2.
With the same assumptions and notations as in Setting 3.1 and under Assumption 3.2, for any fixed $x,y\in\mathrm X$ and $\varepsilon>0$ the Schrödinger problem ($\mathrm{Sch}_\varepsilon$) is solvable if and only if $\mathsf E(x),\mathsf E(y)<\infty$ and there exists $(\gamma_t)\in AC([0,1],\mathrm X)$ such that $\gamma_0=x$ and $\gamma_1=y$.

As the condition characterizing the solvability of the Schrödinger problem does not depend on $\varepsilon$, it is clear that if ($\mathrm{Sch}_\varepsilon$) is solvable for some $\varepsilon>0$, then it is actually solvable for all $\varepsilon>0$.

Proof.
Assume that the endpoints have finite entropy and that there exists an absolutely continuous curve $\gamma$ connecting $x$ to $y$. Up to reparametrization, we can assume that $(\gamma_t)\in AC^2([0,1],\mathrm X)$. Theorem 3.12 thus guarantees that $\mathcal A_\varepsilon$ is finite along the regularization $(\gamma^\varepsilon_t)_{t\in[0,1]}$ of this curve and therefore the variational problem ($\mathrm{Sch}_\varepsilon$) is proper. Let then $(\gamma^n_t)$ be any minimizing sequence and observe that the kinetic action $\mathcal A$ is bounded uniformly in $n$, say $\mathcal A(\gamma^n)\le C$ for all $n$. We now observe that for any pair $0\le t_0<t_1\le1$ it holds
$$\mathsf d(\gamma^n_{t_0},\gamma^n_{t_1})\le\int_{t_0}^{t_1}|\dot\gamma^n_t|\,\mathrm dt\le|t_0-t_1|^{1/2}\Big(\int_{t_0}^{t_1}|\dot\gamma^n_t|^2\,\mathrm dt\Big)^{1/2}\le\sqrt{2C}\,|t_0-t_1|^{1/2}.\tag{4.1}$$
Since the endpoints are fixed, this implies that the set of points $\gamma^n_t$ is bounded in $(\mathrm X,\mathsf d)$ uniformly in $n,t$, thus it is $\sigma$-relatively sequentially compact by Assumption 3.2. By the refined Arzelà-Ascoli lemma [2, Proposition 3.3.1], there exists a limiting $\mathsf d$-continuous (actually $1/2$-Hölder continuous) curve $\gamma$ such that
$$\gamma^n_t\xrightarrow{\ \sigma\ }\gamma_t,\qquad\forall t\in[0,1].$$
We now observe that the kinetic action is lower semicontinuous for this pointwise-in-time convergence w.r.t. $\sigma$, cf. [3, Section 2.2] (indeed, $\mathsf d$ is lower semicontinuous w.r.t. $\sigma$, hence the $2$-energies of the finite partitions of $\gamma$ are lower semicontinuous w.r.t. $\sigma$ too, whence the lower semicontinuity of the $2$-energy of $\gamma$ itself). Moreover $|\partial\mathsf E|$ is also lower semicontinuous w.r.t. $\sigma$ by hypothesis, and this fact together with Fatou's lemma gives
$$\int_0^1|\partial\mathsf E|^2(\gamma_t)\,\mathrm dt\le\int_0^1\liminf_{n\to\infty}|\partial\mathsf E|^2(\gamma^n_t)\,\mathrm dt\le\liminf_{n\to\infty}\int_0^1|\partial\mathsf E|^2(\gamma^n_t)\,\mathrm dt.$$
Therefore $\gamma$ is a minimizer of ($\mathrm{Sch}_\varepsilon$).

Conversely, assume that there exists a minimizer, denoted by $\gamma$ (the following argument actually works for any curve along which $\mathcal A_\varepsilon$ is finite and without Assumption 3.2). Then in particular $t\mapsto|\dot\gamma_t|$ and $t\mapsto|\partial\mathsf E|(\gamma_t)$ belong to $L^2(0,1)$ and by (3.2) we see that $t\mapsto\mathsf E(\gamma_t)$ is globally absolutely continuous with
$$\Big|\frac{\mathrm d}{\mathrm dt}(\mathsf E\circ\gamma)(t)\Big|\le|\partial\mathsf E|(\gamma_t)\cdot|\dot\gamma_t|\in L^1(0,1).$$
The fact that $(|\dot\gamma_t|)\in L^2(0,1)\subset L^1(0,1)$ trivially implies $(\gamma_t)\in AC([0,1],\mathrm X)$, whereas the fact that $t\mapsto|\partial\mathsf E|(\gamma_t)$ belongs to $L^2(0,1)$ also implies that $|\partial\mathsf E|(\gamma_t)$ is finite for a.e. $t\in[0,1]$ and a fortiori so is $\mathsf E(\gamma_t)$, since $D(|\partial\mathsf E|)\subset D(\mathsf E)$. Hence let $t^*\in(0,1)$ be any point satisfying $\mathsf E(\gamma_{t^*})<\infty$ and note that together with (3.2) this gives the following global upper bound valid for all $t<t^*$:
$$\mathsf E(\gamma_t)\le\mathsf E(\gamma_{t^*})+\int_t^{t^*}\Big|\frac{\mathrm d}{\mathrm dt}(\mathsf E\circ\gamma)(t)\Big|\,\mathrm dt\le\mathsf E(\gamma_{t^*})+\int_0^1\Big|\frac{\mathrm d}{\mathrm dt}(\mathsf E\circ\gamma)(t)\Big|\,\mathrm dt=:\overline{\mathsf E}<\infty.$$
As a consequence, and taking also into account the facts that $t\mapsto\gamma_t$ is $\mathsf d$-continuous and $\mathsf E$ is lower semicontinuous, we get
$$\mathsf E(\gamma_0)=\mathsf E\big(\lim_{t\to0}\gamma_t\big)\le\liminf_{t\to0}\mathsf E(\gamma_t)\le\overline{\mathsf E}$$
and the proof is thus complete, as the same argument applies mutatis mutandis for $t=1$ too.

We now fix $x,y\in\mathrm X$ and let $C([0,1],\mathrm X)\ni\gamma\mapsto\iota(\gamma)$ denote the convex indicator of the endpoint constraints, i.e.
$$\iota(\gamma)=\begin{cases}0&\text{if }\gamma_0=x\text{ and }\gamma_1=y,\\+\infty&\text{otherwise.}\end{cases}$$
With this said, we can finally state our $\Gamma$-convergence result, where the finite-entropy assumption on the endpoints is motivated by the previous proposition.

Theorem 4.3. With the same assumptions and notations as in Setting 3.1, if $x,y\in\mathrm X$ are such that $\mathsf E(x),\mathsf E(y)<\infty$, then
$$\Gamma-\lim_{\varepsilon\to0}\big\{\mathcal A_\varepsilon+\iota\big\}=\mathcal A+\iota$$
for the uniform convergence on $C([0,1],\mathrm X)$. If Assumption 3.2 holds, then the $\Gamma$-convergence also takes place w.r.t. the pointwise-in-time $\sigma$-topology.

Proof. The $\Gamma-\liminf$ inequality is rather clear, since the kinetic energy $\mathcal A$ is lower semicontinuous both w.r.t. uniform-in-time $\mathsf d$-convergence and pointwise-in-time $\sigma$-convergence: for the former topology the fact is well known, for the latter it has been discussed in the proof of Proposition 4.2. An analogous claim is also true for the convex indicator $\iota$.
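As a consequence, we have that for any $\gamma^\varepsilon$ converging to $\gamma$ uniformly in time in the metric topology or pointwise in time in the topology $\sigma$ (if applicable) it holds
$$\begin{aligned}\mathcal A(\gamma)+\iota(\gamma)&\le\liminf_{\varepsilon\downarrow0}\mathcal A(\gamma^\varepsilon)+\liminf_{\varepsilon\downarrow0}\iota(\gamma^\varepsilon)\le\liminf_{\varepsilon\downarrow0}\big\{\mathcal A(\gamma^\varepsilon)+\iota(\gamma^\varepsilon)\big\}\\&\le\liminf_{\varepsilon\downarrow0}\big\{\mathcal A(\gamma^\varepsilon)+\varepsilon^2\mathcal I(\gamma^\varepsilon)+\iota(\gamma^\varepsilon)\big\}=\liminf_{\varepsilon\downarrow0}\big\{\mathcal A_\varepsilon(\gamma^\varepsilon)+\iota(\gamma^\varepsilon)\big\},\end{aligned}$$
whence the desired $\Gamma-\liminf$ inequality.

For the $\Gamma-\limsup$, take any $(\gamma_t)\in AC^2([0,1],\mathrm X)$ connecting $x$ to $y$ (if it does not exist, then there is nothing to prove). Then Theorem 3.12 precisely provides a recovery sequence $\gamma^\varepsilon_t:=\mathsf S_{h_\varepsilon(t)}\gamma_t$ with $h_\varepsilon$ defined as therein, both for the uniform-in-time $\mathsf d$-convergence and the pointwise-in-time $\sigma$-convergence (the latter is an easy consequence of the former by Remark 3.4). To prove this claim, note that for any $n\in\mathbb N$ there exist $t_1,\dots,t_k\in[0,1]$ such that, for any $t\in[0,1]$, $\mathsf d(\gamma_t,\gamma_{t_i})<1/n$ for at least one $t_i$; in addition, since $\gamma^\varepsilon_t\to\gamma_t$ for all $t\in[0,1]$ there exists $\varepsilon_n$ small enough such that $\mathsf d(\gamma_{t_i},\gamma^\varepsilon_{t_i})<1/n$ for all $\varepsilon<\varepsilon_n$ and $i=1,\dots,k$. As a consequence, taking (3.3) into account,
$$\begin{aligned}\mathsf d(\gamma_t,\gamma^\varepsilon_t)&\le\mathsf d(\gamma_t,\gamma_{t_i})+\mathsf d(\gamma_{t_i},\mathsf S_{h_\varepsilon(t)}\gamma_{t_i})+\mathsf d(\mathsf S_{h_\varepsilon(t)}\gamma_{t_i},\gamma^\varepsilon_t)\\&\le\mathsf d(\gamma_t,\gamma_{t_i})+\mathsf d(\gamma_{t_i},\mathsf S_{h_\varepsilon(t)}\gamma_{t_i})+e^{-\lambda h_\varepsilon(t)}\,\mathsf d(\gamma_t,\gamma_{t_i})\le\frac1n\big(2+e^{\lambda^-\varepsilon/2}\big)\end{aligned}$$
for all $t\in[0,1]$ and $\varepsilon<\varepsilon_n$ and by the arbitrariness of $n$ we conclude that $\gamma^\varepsilon\to\gamma$ uniformly.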
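To make the recovery sequence concrete, here is a toy illustration that is not taken from the paper: it assumes the hypothetical case $\mathrm X=\mathbb R$ with the Euclidean distance and the quadratic entropy $\mathsf E(x)=\frac\lambda2x^2$ with $\lambda>0$, whose gradient flow is $\mathsf S_sx=e^{-\lambda s}x$. Under these assumptions the regularized curve and its uniform distance to $\gamma$ can be written explicitly.

```latex
% Toy case (assumption for illustration only):
%   X = R, d(x,y) = |x-y|, E(x) = (lambda/2) x^2 with lambda > 0,
%   gradient flow S_s x = e^{-lambda s} x, h_eps(t) = eps * min{t, 1-t}.
\begin{align*}
\gamma^\varepsilon_t
  &= \mathsf S_{h_\varepsilon(t)}\gamma_t
   = e^{-\lambda\varepsilon\min\{t,1-t\}}\,\gamma_t, \\
\sup_{t\in[0,1]}\big|\gamma^\varepsilon_t-\gamma_t\big|
  &= \sup_{t\in[0,1]}\big(1-e^{-\lambda h_\varepsilon(t)}\big)\,|\gamma_t|
   \le \big(1-e^{-\lambda\varepsilon/2}\big)\max_{t\in[0,1]}|\gamma_t|
   \xrightarrow[\varepsilon\downarrow0]{} 0 .
\end{align*}
% Endpoints are untouched because h_eps(0) = h_eps(1) = 0:
%   gamma^eps_0 = gamma_0 and gamma^eps_1 = gamma_1.
```

In this toy case the uniform convergence is immediate because the flow acts by a scalar factor; the covering argument in the proof is what replaces this explicit formula in a general metric space.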
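Furthermore, the $\limsup$ inequality can be proved as follows:
$$\begin{aligned}\limsup_{\varepsilon\downarrow0}\big\{\mathcal A_\varepsilon(\gamma^\varepsilon)+\iota(\gamma^\varepsilon)\big\}&=\limsup_{\varepsilon\downarrow0}\big\{\mathcal A(\gamma^\varepsilon)+\varepsilon^2\mathcal I(\gamma^\varepsilon)+0\big\}\\&\overset{(3.19)}{\le}\limsup_{\varepsilon\downarrow0}\Big\{e^{\lambda^-\varepsilon}\mathcal A(\gamma)-2\varepsilon\,\mathsf E(\gamma^\varepsilon_{1/2})+\varepsilon\big(\mathsf E(x)+\mathsf E(y)\big)\Big\}\\&\le\limsup_{\varepsilon\downarrow0}\Big\{e^{\lambda^-\varepsilon}\mathcal A(\gamma)+\varepsilon\big(\mathsf E(x)+\mathsf E(y)\big)\Big\}-2\liminf_{\varepsilon\downarrow0}\varepsilon\,\mathsf E(\gamma^\varepsilon_{1/2})\\&\le\mathcal A(\gamma)=\mathcal A(\gamma)+\iota(\gamma),\end{aligned}$$
where the third inequality comes from the fact that, for any $\varepsilon\downarrow0$, $(\gamma^\varepsilon_{1/2})$ is contained in a bounded set and by assumption $\mathsf E$ is bounded from below on bounded sets, whence $\mathsf E(\gamma^\varepsilon_{1/2})\ge c$ for some $c\in\mathbb R$. The proof is thus complete.

As an easy consequence of this result we obtain the following:

Corollary 4.4.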
With the same assumptions and notations as in Setting 3.1 and under the further requirements that Assumption 3.2 holds and the Schrödinger problem ($\mathrm{Sch}_\varepsilon$) relative to $x,y\in\mathrm X$ is solvable, let $\varepsilon_k\downarrow0$ and $\omega^k$ be a minimizer of the corresponding Schrödinger problem ($\mathrm{Sch}_\varepsilon$) with $\varepsilon=\varepsilon_k$. Then
$$\lim_{k\to\infty}\Big\{\mathcal A(\omega^k)+\varepsilon_k^2\,\mathcal I(\omega^k)\Big\}=\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\mathcal A(\gamma).$$
Moreover, there exists $\omega\in C([0,1],\mathrm X)$ such that, up to a subsequence, $\omega^k\to\omega$ in the pointwise-in-time $\sigma$-topology and
$$\mathcal A(\omega)=\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\mathcal A(\gamma).$$

Proof.
Recall that, under a mild equi-coercivity condition, $\Gamma$-convergence precisely guarantees that the limit of the optimal values of the approximating problems is the optimal value of the limit problem and limits of minimizers are minimizers, cf. [16, Theorem 1.21]. In view of Theorem 4.3 and [16, Theorem 1.21], for the mild equi-coercivity condition to hold it suffices to prove that the set of minimizers $\{\omega^k\}$ is relatively compact in the pointwise-in-time $\sigma$-topology. To this aim, the kinetic energies of the curves $\omega^k$ are uniformly bounded since
$$\mathcal A(\omega^k)\le\mathcal A(\omega^k)+\varepsilon_k^2\,\mathcal I(\omega^k)\le\mathcal A(\omega^{\bar\varepsilon})+\varepsilon_k^2\,\mathcal I(\omega^{\bar\varepsilon})\le\mathcal A(\omega^{\bar\varepsilon})+\bar\varepsilon^2\,\mathcal I(\omega^{\bar\varepsilon})<+\infty,$$
where $\omega^{\bar\varepsilon}$ is the minimizer for the problem with $\varepsilon=\bar\varepsilon:=\sup_k\varepsilon_k$. Arguing as in the proof of Proposition 4.2, we deduce that there exists a continuous curve $(\omega_t)_{t\in[0,1]}$ connecting $x$ and $y$ such that, up to extracting a suitable subsequence, $\omega^k_t\to\omega_t$ w.r.t. $\sigma$ as $k\to\infty$ for all $t\in[0,1]$.

Remark 4.5.
Note that in Corollary 4.4 the curve $\omega$ is length-minimizing but not necessarily distance-minimizing, namely it need not be a geodesic between $x$ and $y$, since we only know that
$$\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\mathcal A(\gamma)\ge\frac12\,\mathsf d^2(x,y)$$
and the inequality might be strict, e.g. if $\mathrm X$ is a non-convex subset of $\mathbb R^d$. However, if $(\mathrm X,\mathsf d)$ is a length metric space, i.e. for all $x,y\in\mathrm X$ and $\varepsilon>0$ there exists $(\gamma_t)\in AC([0,1],\mathrm X)$ such that $\gamma_0=x$, $\gamma_1=y$ and $\ell(\gamma)\le\mathsf d(x,y)+\varepsilon$, then the inequality above turns out to be an identity and, as a consequence, $\omega$ is a geodesic. This means that for any two points having finite energy there always exists a geodesic connecting them. ■

When the endpoints have infinite entropy, the following variant of Theorem 4.3 may be useful:
Theorem 4.6.
With the same assumptions and notations as in Setting 3.1, let $x,y\in\mathrm X$ with possibly $\mathsf E(x),\mathsf E(y)=+\infty$ and for any fixed $(\varepsilon_n)_{n\in\mathbb N}$, $\varepsilon_n\downarrow0$, let $(\eta_n)_{n\in\mathbb N}$ be converging to $0$ slowly enough so that
$$\varepsilon_n\big(\mathsf E(\gamma^n_0)+\mathsf E(\gamma^n_1)\big)\to0\qquad\text{with}\qquad\gamma^n_0:=\mathsf S_{\eta_n}x,\quad\gamma^n_1:=\mathsf S_{\eta_n}y.$$
Then
$$\Gamma-\lim_{n\to\infty}\big\{\mathcal A_{\varepsilon_n}+\iota_n\big\}=\mathcal A+\iota,$$
for the uniform convergence on $C([0,1],\mathrm X)$. If Assumption 3.2 holds, then the $\Gamma$-convergence also takes place w.r.t. the pointwise-in-time $\sigma$-topology. Here $\iota_n$ and $\iota$ are the convex indicators of the endpoint constraints for $\gamma^n_0,\gamma^n_1$ and $x,y$, respectively.

Proof. The proof of the $\Gamma-\liminf$ is almost identical to that in Theorem 4.3, with the only extra observation that
$$\iota(\gamma)\le\liminf_{n\to\infty}\iota_n(\gamma^n).$$
For the $\Gamma-\limsup$, observe that if there does not exist $(\gamma_t)\in AC([0,1],\mathrm X)$ joining $x$ and $y$, then there is nothing to prove. Hence let us suppose that at least one curve $(\gamma_t)\in AC([0,1],\mathrm X)$ connecting $x$ and $y$ exists, fix it and note that Theorem 3.12 applied to the curve $\mathsf S_{\eta_n}\gamma_t$ still provides a recovery sequence $\gamma^{\varepsilon_n}_t:=\mathsf S_{\eta_n+h_{\varepsilon_n}(t)}\gamma_t$ with the same choice $h_{\varepsilon_n}(t)=\varepsilon_n\min\{t,1-t\}$ as before. Indeed, on the one hand
$$\begin{aligned}\limsup_{n\to\infty}\big\{\mathcal A_{\varepsilon_n}(\gamma^{\varepsilon_n})+\iota_n(\gamma^{\varepsilon_n})\big\}&=\limsup_{n\to\infty}\big\{\mathcal A(\gamma^{\varepsilon_n})+\varepsilon_n^2\,\mathcal I(\gamma^{\varepsilon_n})+0\big\}\\&\overset{(3.19)}{\le}\limsup_{n\to\infty}\Big\{e^{\lambda^-\varepsilon_n}\mathcal A(\mathsf S_{\eta_n}\gamma)-2\varepsilon_n\,\mathsf E(\gamma^{\varepsilon_n}_{1/2})+\varepsilon_n\big(\mathsf E(\gamma^n_0)+\mathsf E(\gamma^n_1)\big)\Big\}\\&\le\limsup_{n\to\infty}\Big\{e^{\lambda^-\varepsilon_n}\mathcal A(\mathsf S_{\eta_n}\gamma)+\varepsilon_n\big(\mathsf E(\gamma^n_0)+\mathsf E(\gamma^n_1)\big)\Big\}-2\liminf_{n\to\infty}\varepsilon_n\,\mathsf E(\gamma^{\varepsilon_n}_{1/2})\\&\le\limsup_{n\to\infty}\mathcal A(\mathsf S_{\eta_n}\gamma)\le\limsup_{n\to\infty}e^{-2\lambda\eta_n}\mathcal A(\gamma)=\mathcal A(\gamma),\end{aligned}$$
where the third inequality follows by the same argument adopted in the proof of the previous theorem and the last one is due to (3.18) with $h(t)\equiv\eta_n$. On the other hand, $\gamma^{\varepsilon_n}_t\to\gamma_t$ uniformly in $t\in[0,1]$ in the $\mathsf d$-topology and, if Assumption 3.2 holds, for all $t\in[0,1]$ w.r.t. $\sigma$: the argument described in the previous proof applies also here verbatim.

As conclusion, in the next proposition we show that any $\mathrm{EVI}_\lambda$-gradient flow is a solution of the Schrödinger problem with suitable endpoints. Intuitively this is clear, because up to a rescaling factor $\varepsilon$ both the trajectories of the gradient flow of $\mathsf E$ and the solutions to ($\mathrm{Sch}_\varepsilon$) must formally satisfy the same Newton equation, namely $\ddot\gamma_t=-\nabla\Phi(\gamma_t)$ where the potential $\Phi$ is given by (minus) the Fisher information $-\frac12|\partial\mathsf E|^2$, cf. [33, Remark 6]. This is also in complete analogy with the standard Schrödinger problem, which includes the heat flow as a particular entropic interpolation.

Proposition 4.7.
With the same assumptions and notations as in Setting 3.1, fix $\varepsilon>0$. Then for all $x,y\in\mathrm X$ the following lower bound on the optimal value of ($\mathrm{Sch}_\varepsilon$) holds:
$$\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\mathcal A_\varepsilon(\gamma)\ge\varepsilon\,\big|\mathsf E(x)-\mathsf E(y)\big|.\tag{4.2}$$
If either $y=\mathsf S_\varepsilon x$ or $x=\mathsf S_\varepsilon y$, then equality is achieved. In the former case the curve $[0,1]\ni t\mapsto\hat\gamma_t:=\mathsf S_{\varepsilon t}x$ is a minimizer in the Schrödinger problem and the optimal value is
$$\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\mathcal A_\varepsilon(\gamma)=\varepsilon\big(\mathsf E(x)-\mathsf E(\mathsf S_\varepsilon x)\big).$$
An analogous statement holds when $x=\mathsf S_\varepsilon y$.

Proof. By (3.2) and Young's inequality it follows that for any $(\tilde\gamma_t)\in AC([0,\varepsilon],\mathrm X)$ joining $x$ and $y$ (if it exists; if not, (4.2) is trivial) it holds
$$\big|\mathsf E(\tilde\gamma_0)-\mathsf E(\tilde\gamma_\varepsilon)\big|\le\frac12\int_0^\varepsilon|\dot{\tilde\gamma}_t|^2\,\mathrm dt+\frac12\int_0^\varepsilon|\partial\mathsf E|^2(\tilde\gamma_t)\,\mathrm dt.$$
By setting $\gamma_t:=\tilde\gamma_{\varepsilon t}$, $t\in[0,1]$, and by the arbitrariness of $\tilde\gamma$ we thus see that for all $(\gamma_t)\in AC([0,1],\mathrm X)$ joining $x$ and $y$ we have
$$\varepsilon\,\big|\mathsf E(\gamma_0)-\mathsf E(\gamma_1)\big|\le\frac12\int_0^1|\dot\gamma_t|^2\,\mathrm dt+\frac{\varepsilon^2}{2}\int_0^1|\partial\mathsf E|^2(\gamma_t)\,\mathrm dt,$$
so that
$$\varepsilon\,\big|\mathsf E(x)-\mathsf E(y)\big|\le\inf_{(\gamma_t)\,:\,x\rightsquigarrow y}\mathcal A_\varepsilon(\gamma).$$
Now assume that $y=\mathsf S_\varepsilon x$: integrating (3.4) for the $\mathrm{EVI}_\lambda$-gradient flow $\hat\gamma$ (paying attention to the rescaling factor $\varepsilon$) between $0$ and $1$ we get
$$\mathcal A_\varepsilon(\hat\gamma)=\frac12\int_0^1|\dot{\hat\gamma}_t|^2\,\mathrm dt+\frac{\varepsilon^2}{2}\int_0^1|\partial\mathsf E|^2(\hat\gamma_t)\,\mathrm dt=\varepsilon\big(\mathsf E(x)-\mathsf E(y)\big)=\varepsilon\,\big|\mathsf E(x)-\mathsf E(y)\big|,$$
where the last equality comes from the fact that $t\mapsto\mathsf E(\mathsf S_tx)$ is non-increasing, as a consequence of (3.4). Combining this identity with (4.2) yields the conclusion.

4.2 Displacement convexity

In analogy with Section 2.2, in this short section we establish the geodesic $\lambda$-convexity of $\mathsf E$. As already explained in the Introduction, although the result is known (cf. [27, Theorem 3.2]), our proof is independent and new and is a further evidence of the wide range of applications of the Schrödinger problem.
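To fix ideas on what geodesic $\lambda$-convexity asserts, the following worked identity treats a model case that is an assumption made here for illustration only and is not part of the paper: $\mathrm X=\mathbb R^d$ with the Euclidean distance, $\mathsf E(x)=\frac\lambda2|x|^2$, and the constant-speed geodesic $\gamma_\theta=(1-\theta)x+\theta y$. In this case the $\lambda$-convexity inequality holds with equality.

```latex
% Model case (assumption for illustration only):
%   X = R^d, E(x) = (lambda/2)|x|^2, gamma_theta = (1-theta) x + theta y.
% Elementary identity, valid for all x, y in R^d and theta in [0,1]:
\begin{align*}
\big|(1-\theta)x+\theta y\big|^2
  &= (1-\theta)^2|x|^2+2\theta(1-\theta)\,x\cdot y+\theta^2|y|^2 \\
  &= (1-\theta)|x|^2+\theta|y|^2-\theta(1-\theta)\,|x-y|^2 .
\end{align*}
% Multiplying by lambda/2 gives the lambda-convexity relation as an equality:
\begin{align*}
\mathsf E(\gamma_\theta)
  = (1-\theta)\,\mathsf E(\gamma_0)+\theta\,\mathsf E(\gamma_1)
    -\frac\lambda2\,\theta(1-\theta)\,|\gamma_0-\gamma_1|^2 .
\end{align*}
```

The quadratic case is thus the borderline one: a general $\lambda$-convex functional satisfies the same relation with "$\le$" in place of "$=$".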
Let us stress once more that all the properties of $\mathrm{EVI}_\lambda$-gradient flows stated in Section 3.1 and used so far do not rely on geodesic $\lambda$-convexity, whence the genuine independence of our approach.

Theorem 4.8.
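With the same assumptions and notations as in Setting 3.1, the potential $\mathsf E$ is $\lambda$-convex along any geodesic.

Proof. Let $(\gamma_t)$ be any constant-speed geodesic. We want to prove that
$$\mathsf E(\gamma_\theta)\le(1-\theta)\,\mathsf E(\gamma_0)+\theta\,\mathsf E(\gamma_1)-\frac\lambda2\,\theta(1-\theta)\,\mathsf d^2(\gamma_0,\gamma_1),\qquad\forall\theta\in[0,1].$$
We will establish this inequality by carefully estimating at order one as $\varepsilon\downarrow0$ the defect of optimality, in the geodesic problem from $\gamma_0$ to $\gamma_1$, of a suitably regularized version $(\gamma^\varepsilon_t)$ of the geodesic.

If $\mathsf E(\gamma_0)=+\infty$ or $\mathsf E(\gamma_1)=+\infty$ there is nothing to prove, so we can assume without loss of generality that both endpoints have finite entropy. If $\theta=0$ or $\theta=1$ the inequality is trivial as well. Fix then an arbitrary parameter $\theta\in(0,1)$ and let
$$H_\theta(t):=\begin{cases}\dfrac1\theta\,t&\text{if }t\in[0,\theta],\\[2mm]1-\dfrac{1}{1-\theta}\,(t-\theta)&\text{if }t\in[\theta,1]\end{cases}$$
be the hat function centered at $t=\theta$ with height $1$ and vanishing at $t=0,1$. Setting $h(t):=\varepsilon H_\theta(t)$ for small $\varepsilon>0$, let $(\gamma^\varepsilon_t)$ be the curve constructed as in Lemma 3.6, i.e.
$$\gamma^\varepsilon_t:=\mathsf S_{h(t)}\gamma_t,\qquad\text{for all }t\in[0,1].$$
Arguing as in the proof of Theorem 3.12, it is easily verified that with the current choice of $h$ it is still true that $(\gamma^\varepsilon_t)\in AC^2([0,1],\mathrm X)$, that $t\mapsto|\partial\mathsf E|(\gamma^\varepsilon_t)$ belongs to $L^2(0,1)$, and that $t\mapsto\mathsf E(\gamma^\varepsilon_t)$ belongs to $AC([0,1])$, so that we can integrate (3.18) in time on the whole interval $[0,1]$. Discarding the non-negative term $\frac12|h'(t)|^2|\partial\mathsf E|^2(\gamma^\varepsilon_t)$ and using the optimality of the geodesic $\gamma$ (namely its optimality between $\gamma_0$ and $\gamma_1$) give
$$\begin{aligned}0&\le\frac12\int_0^1|\dot\gamma^\varepsilon_t|^2\,\mathrm dt-\frac12\int_0^1|\dot\gamma_t|^2\,\mathrm dt\\&\overset{(3.18)}{\le}-\int_0^1h'(t)\,\frac{\mathrm d}{\mathrm dt}\mathsf E(\gamma^\varepsilon_t)\,\mathrm dt+\frac12\int_0^1\Big(e^{-2\lambda h(t)}-1\Big)|\dot\gamma_t|^2\,\mathrm dt\\&=-\varepsilon\int_0^1H'_\theta(t)\,\frac{\mathrm d}{\mathrm dt}\mathsf E(\gamma^\varepsilon_t)\,\mathrm dt+\frac{\mathsf d^2(\gamma_0,\gamma_1)}{2}\int_0^1\Big(e^{-2\varepsilon\lambda H_\theta(t)}-1\Big)\,\mathrm dt,\end{aligned}$$
where the last equality follows from the constant speed property of the geodesic $\gamma$, namely $|\dot\gamma_t|=\mathsf d(\gamma_0,\gamma_1)$. Dividing by $\varepsilon>0$ and leveraging the explicit piecewise constant values of $H'_\theta(t)$ on each interval $(0,\theta)$ and $(\theta,1)$ gives
$$\begin{aligned}0&\le-\int_0^1H'_\theta(t)\,\frac{\mathrm d}{\mathrm dt}\mathsf E(\gamma^\varepsilon_t)\,\mathrm dt+\frac{\mathsf d^2(\gamma_0,\gamma_1)}{2}\,I_\varepsilon,\qquad I_\varepsilon:=\int_0^1\frac{e^{-2\varepsilon\lambda H_\theta(t)}-1}{\varepsilon}\,\mathrm dt,\\&=-\int_0^\theta\frac1\theta\,\frac{\mathrm d}{\mathrm dt}\mathsf E(\gamma^\varepsilon_t)\,\mathrm dt+\int_\theta^1\frac{1}{1-\theta}\,\frac{\mathrm d}{\mathrm dt}\mathsf E(\gamma^\varepsilon_t)\,\mathrm dt+\frac{\mathsf d^2(\gamma_0,\gamma_1)}{2}\,I_\varepsilon\\&=\frac1\theta\Big(\mathsf E(\gamma_0)-\mathsf E(\gamma^\varepsilon_\theta)\Big)+\frac{1}{1-\theta}\Big(\mathsf E(\gamma_1)-\mathsf E(\gamma^\varepsilon_\theta)\Big)+\frac{\mathsf d^2(\gamma_0,\gamma_1)}{2}\,I_\varepsilon.\end{aligned}$$
Now let us multiply by $\theta(1-\theta)>0$ and rearrange the terms in order to get
$$\mathsf E(\gamma^\varepsilon_\theta)\le(1-\theta)\,\mathsf E(\gamma_0)+\theta\,\mathsf E(\gamma_1)+\theta(1-\theta)\,\frac{\mathsf d^2(\gamma_0,\gamma_1)}{2}\,I_\varepsilon.$$
It is easy to check that $\int_0^1H_\theta(t)\,\mathrm dt=\frac12$ for all $\theta$, so that
$$\lim_{\varepsilon\downarrow0}I_\varepsilon=-2\lambda\int_0^1H_\theta(t)\,\mathrm dt=-\lambda.$$
On the other hand, by definition of $\gamma^\varepsilon$ and since $h(\theta)=\varepsilon\to0$ it is clear that $\gamma^\varepsilon_\theta=\mathsf S_{h(\theta)}\gamma_\theta=\mathsf S_\varepsilon\gamma_\theta\to\gamma_\theta$ in $\mathrm X$ (an $\mathrm{EVI}_\lambda$-gradient flow is continuous up to $t=0$). By lower semicontinuity of $\mathsf E$ this yields
$$\mathsf E(\gamma_\theta)\le\liminf_{\varepsilon\downarrow0}\mathsf E(\gamma^\varepsilon_\theta)\le(1-\theta)\,\mathsf E(\gamma_0)+\theta\,\mathsf E(\gamma_1)-\frac\lambda2\,\theta(1-\theta)\,\mathsf d^2(\gamma_0,\gamma_1),$$
whence the conclusion.

5 Derivative of the cost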
As a main application of the $\Gamma$-convergence results contained in Theorem 4.3 and Corollary 4.4 (and, in a wider sense, of their strategy of proof), in this section we investigate the dependence of the optimal value of the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ on the regularization parameter $\varepsilon$, focusing in particular on the regularity as a function of $\varepsilon$ and on the behaviour in the small-time regime. More precisely, and denoting
\[
\mathcal{C}_\varepsilon(x,y) := \inf_{(\gamma_t)\,:\,x \rightsquigarrow y}\Big\{\mathcal{A}(\gamma) + \varepsilon^2\,\mathcal{I}(\gamma)\Big\}, \qquad \forall\,\varepsilon \ge 0
\]
the optimal entropic cost, we show that $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is (locally) absolutely continuous and admits explicit left and right derivatives in a pointwise sense. Moreover, since $\mathcal{C}_\varepsilon(x,y) \to \mathcal{C}_0(x,y)$ as $\varepsilon \downarrow 0$ by Corollary 4.4, we aim at measuring the error $\mathcal{C}_\varepsilon(x,y) - \mathcal{C}_0(x,y)$ and studying the minimizers of the unperturbed problem $\mathcal{C}_0(x,y)$ selected by $\Gamma$-convergence. Since we focus here on the dependence on $\varepsilon$ we will assume throughout the whole Section 5 and without further mention the well-posedness of the $\varepsilon$-Schrödinger problem:

Assumption 5.1.
Fix $x, y \in X$ and suppose that for some (hence for any, by Proposition 4.2) $\varepsilon > 0$ the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ admits at least one minimizer, in other words the infimum is attained in the definition of $\mathcal{C}_\varepsilon(x,y)$.

We accordingly denote the set of $\varepsilon$-minimizers as
\[
\Lambda_\varepsilon(x,y) := \Big\{\omega \in AC([0,1], X) \,:\, \omega_0 = x,\ \omega_1 = y \ \text{and}\ \mathcal{A}_\varepsilon(\omega) = \mathcal{C}_\varepsilon(x,y)\Big\}.
\]
Let us start the analysis with a preliminary monotonicity statement for the Fisher information and the entropic cost, which generalizes [25, Lemma 3.3].
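Lemma 5.2. With the same assumptions and notations as in Setting 3.1 and for any $0 \le \varepsilon_1 < \varepsilon_2 < \infty$ there holds
\[
\inf_{\Lambda_{\varepsilon_1}(x,y)} \mathcal{I} \ge \sup_{\Lambda_{\varepsilon_2}(x,y)} \mathcal{I},
\]
with possibly $\inf_{\Lambda_0(x,y)} \mathcal{I} = +\infty$. Moreover, $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is monotone non-decreasing on $[0,\infty)$.

Proof. Let $\varepsilon_1, \varepsilon_2$ be as in the statement and choose $\omega^i \in \Lambda_{\varepsilon_i}(x,y)$ for $i = 1, 2$, so that by optimality
\[
\mathcal{A}(\omega^1) + \varepsilon_1^2\,\mathcal{I}(\omega^1) \le \mathcal{A}(\omega^2) + \varepsilon_1^2\,\mathcal{I}(\omega^2),
\qquad
\mathcal{A}(\omega^2) + \varepsilon_2^2\,\mathcal{I}(\omega^2) \le \mathcal{A}(\omega^1) + \varepsilon_2^2\,\mathcal{I}(\omega^1).
\]
Summing these inequalities and dividing by $\varepsilon_2^2 - \varepsilon_1^2 > 0$ we obtain $\mathcal{I}(\omega^1) \ge \mathcal{I}(\omega^2)$, and since $\omega^1 \in \Lambda_{\varepsilon_1}$ and $\omega^2 \in \Lambda_{\varepsilon_2}$ are arbitrary the desired conclusion follows. As regards the last part of the statement, it is sufficient to note that since the $\omega^i$ are minimizers of their respective problems and $\varepsilon_1 < \varepsilon_2$,
\[
\mathcal{C}_{\varepsilon_1}(x,y) = \mathcal{A}(\omega^1) + \varepsilon_1^2\,\mathcal{I}(\omega^1) \le \mathcal{A}(\omega^2) + \varepsilon_1^2\,\mathcal{I}(\omega^2) \le \mathcal{A}(\omega^2) + \varepsilon_2^2\,\mathcal{I}(\omega^2) = \mathcal{C}_{\varepsilon_2}(x,y).
\]

The next result extends the $\Gamma$-convergence from $\varepsilon = 0$ to any $\varepsilon \ge 0$.

Proposition 5.3.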
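With the same assumptions and notations as in Setting 3.1 and under the additional Assumption 3.2, for any $\varepsilon > 0$ there holds
\[
\Gamma\text{-}\lim_{\varepsilon' \to \varepsilon}\Big\{\mathcal{A}_{\varepsilon'} + \iota\Big\} = \mathcal{A}_\varepsilon + \iota \tag{5.1}
\]
for the pointwise-in-time $\sigma$-topology and
\[
\lim_{\varepsilon' \to \varepsilon} \mathcal{C}_{\varepsilon'}(x,y) = \mathcal{C}_\varepsilon(x,y).
\]
Moreover, for any $\varepsilon_k \to \varepsilon$ and any minimizer $\omega^k \in \Lambda_{\varepsilon_k}(x,y)$, there exists $\omega \in \Lambda_\varepsilon(x,y)$ such that, up to a subsequence,
\[
\omega^k_t \overset{\sigma}{\to} \omega_t, \qquad \forall\,t \in [0,1]
\]
as $k \to \infty$.

Proof. It is sufficient to prove (5.1), as the other properties follow by a verbatim application of the arguments in the proof of Corollary 4.4.

Fix $\varepsilon$ and take $\varepsilon' \to \varepsilon$. The $\Gamma$-$\limsup$ inequality is trivial: if $\gamma^\varepsilon$ is such that the right-hand side of (5.1) is finite (otherwise there is nothing to prove), then the constant sequence $\gamma^{\varepsilon'} \equiv \gamma^\varepsilon$ is an admissible recovery sequence. For the $\Gamma$-$\liminf$ inequality, note that the kinetic action $\mathcal{A}$ and the Fisher information $\mathcal{I}$ are lower semicontinuous w.r.t. pointwise-in-time $\sigma$-convergence (see the proof of Proposition 4.2), and clearly so is the convex indicator. Hence for any $\gamma^{\varepsilon'}$ converging to $\gamma^\varepsilon$ for the pointwise-in-time $\sigma$-topology it holds
\begin{align*}
\mathcal{A}_\varepsilon(\gamma^\varepsilon) + \iota(\gamma^\varepsilon)
&\le \liminf_{\varepsilon' \to \varepsilon}\Big\{\mathcal{A}_\varepsilon(\gamma^{\varepsilon'}) + \iota(\gamma^{\varepsilon'})\Big\}\\
&= \liminf_{\varepsilon' \to \varepsilon}\Big\{\mathcal{A}(\gamma^{\varepsilon'}) + \varepsilon^2\,\mathcal{I}(\gamma^{\varepsilon'}) + \iota(\gamma^{\varepsilon'})\Big\}\\
&= \liminf_{\varepsilon' \to \varepsilon}\Big\{\mathcal{A}(\gamma^{\varepsilon'}) + (\varepsilon')^2\,\mathcal{I}(\gamma^{\varepsilon'}) + \iota(\gamma^{\varepsilon'})\Big\}\\
&= \liminf_{\varepsilon' \to \varepsilon}\Big\{\mathcal{A}_{\varepsilon'}(\gamma^{\varepsilon'}) + \iota(\gamma^{\varepsilon'})\Big\}.
\end{align*}

As an immediate consequence of this result we deduce the following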
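Lemma 5.4. With the same assumptions and notations as in Setting 3.1 and under Assumption 3.2, the function $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is continuous on $[0,\infty)$. Moreover, if $\varepsilon \mapsto \omega^\varepsilon$ is a continuous (w.r.t. the pointwise-in-time $\sigma$-topology) selection of minimizers, then $\varepsilon \mapsto \mathcal{A}(\omega^\varepsilon)$ and $\varepsilon \mapsto \mathcal{I}(\omega^\varepsilon)$ are also continuous, on $[0,\infty)$ and $(0,\infty)$ respectively.

Note that if the minimizers are unique, then $\varepsilon \mapsto \omega^\varepsilon$ is automatically continuous w.r.t. the pointwise-in-time $\sigma$-topology, simply by Proposition 5.3, as any sequence of minimizers admits a subsequence converging to a minimizer and the limit is in fact unique. Also, the continuity of the Fisher information can be strengthened up to $\varepsilon = 0$, see later on Theorem 5.7.

Proof. The continuity of $\mathcal{C}_\varepsilon(x,y)$ for $\varepsilon > 0$ is granted by Proposition 5.3, while continuity at $\varepsilon = 0$ has already been proved in Corollary 4.4.

As regards the kinetic energy $\mathcal{A}$ and the Fisher information $\mathcal{I}$, recall that they are both lower semicontinuous in $[0,\infty)$ w.r.t. the pointwise-in-time $\sigma$-topology, as already discussed in the proof of Proposition 4.2. Thus, if $\varepsilon \mapsto \omega^\varepsilon$ is as in the statement, we are left to prove that $\varepsilon \mapsto \mathcal{A}(\omega^\varepsilon)$ and $\varepsilon \mapsto \mathcal{I}(\omega^\varepsilon)$ are upper semicontinuous. To this aim, it is sufficient to observe that
\begin{align*}
\limsup_{\varepsilon' \to \varepsilon} \mathcal{A}(\omega^{\varepsilon'})
&= \limsup_{\varepsilon' \to \varepsilon}\Big\{\mathcal{C}_{\varepsilon'}(x,y) - (\varepsilon')^2\,\mathcal{I}(\omega^{\varepsilon'})\Big\}\\
&\le \limsup_{\varepsilon' \to \varepsilon} \mathcal{C}_{\varepsilon'}(x,y) - \liminf_{\varepsilon' \to \varepsilon} (\varepsilon')^2\,\mathcal{I}(\omega^{\varepsilon'})\\
&\le \mathcal{C}_\varepsilon(x,y) - \varepsilon^2\,\mathcal{I}(\omega^\varepsilon) = \mathcal{A}(\omega^\varepsilon),
\end{align*}
where the last inequality holds by the continuity of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ and the lower semicontinuity of $\varepsilon \mapsto \mathcal{I}(\omega^\varepsilon)$. Thus $\varepsilon \mapsto \mathcal{A}(\omega^\varepsilon)$ is upper semicontinuous in $[0,\infty)$. Interchanging $\mathcal{A}$ and $\mathcal{I}$ and writing now $\mathcal{I} = \varepsilon^{-2}(\mathcal{C}_\varepsilon - \mathcal{A})$, the same argument shows that $\varepsilon \mapsto \mathcal{I}(\omega^\varepsilon)$ is upper semicontinuous in $(0,\infty)$ (continuity at $\varepsilon = 0$ will require a special treatment later).

We have now all the ingredients to discuss the regularity of the cost $\mathcal{C}_\varepsilon(x,y)$ as a function of the noise parameter $\varepsilon$ and explicitly compute its left and right derivatives.

Proposition 5.5.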
With the same assumptions and notations as in Setting 3.1 and if Assumption 3.2 holds, the map $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is $AC_{loc}([0,\infty))$, left and right differentiable everywhere in $(0,\infty)$ and, for any $\varepsilon > 0$, the left and right derivatives are given by
\[
\frac{d^-}{d\varepsilon}\mathcal{C}_\varepsilon(x,y) = 2\varepsilon \max_{\Lambda_\varepsilon(x,y)} \mathcal{I},
\qquad
\frac{d^+}{d\varepsilon}\mathcal{C}_\varepsilon(x,y) = 2\varepsilon \min_{\Lambda_\varepsilon(x,y)} \mathcal{I} \tag{5.2}
\]
respectively, and the former (resp. latter) is left (resp. right) continuous. Part of the statement is that the maximum and the minimum are attained.

Remark 5.6.
Heuristically, (5.2) is nothing but the envelope theorem. Indeed, if $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ were differentiable, then its derivative would be given by $\partial_\varepsilon \mathcal{A}_\varepsilon = 2\varepsilon\,\mathcal{I}$ evaluated at any critical point, i.e. at any $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$. However, since we do not know in our general metric framework that the Schrödinger problem has a unique solution, we are not able to prove pointwise differentiability as in [25] and we have to face the possibility of a gap between the left and right derivatives. In any case, for a.e. $\varepsilon > 0$ this gap is zero, because $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is locally absolutely continuous in $(0,\infty)$ and therefore a.e. differentiable. This means that, up to a negligible set of temperatures, the left and right derivatives match and $\mathcal{I}$ is constant on $\Lambda_\varepsilon(x,y)$. If for whatever reason the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ were uniquely solvable (which is in particular true for the classic Schrödinger problem, as proved in [39, Theorem 4.2]), then the left and right derivatives would be trivially equal and Lemma 5.4 would give that $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is actually $C^1((0,\infty))$. Furthermore, the cost would also be twice differentiable a.e. since by Lemma 5.2 its first derivative $2\varepsilon\,\mathcal{I}(\omega^\varepsilon)$ would be the product of a linear function and of a monotone one. ■

Proof.
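The continuity of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ follows by Lemma 5.4, so let us focus on left and right differentiability/continuity and local absolute continuity.

Right differentiability. Fix $\varepsilon > 0$, let $\delta > 0$, and choose $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$, $\omega^{\varepsilon+\delta} \in \Lambda_{\varepsilon+\delta}(x,y)$. Then write
\begin{align*}
\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta}
&= \frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta}) - \mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}\\
&= \frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta}) - \mathcal{A}_{\varepsilon+\delta}(\omega^\varepsilon)}{\delta} + \frac{\mathcal{A}_{\varepsilon+\delta}(\omega^\varepsilon) - \mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta} \tag{5.3}
\end{align*}
and note that the second term on the right-hand side can be rewritten as
\[
\mathcal{A}_{\varepsilon+\delta}(\omega^\varepsilon) - \mathcal{A}_\varepsilon(\omega^\varepsilon) = (2\varepsilon\delta + \delta^2)\,\mathcal{I}(\omega^\varepsilon).
\]
The first one is non-positive by optimality of $\omega^{\varepsilon+\delta}$ for $\mathcal{A}_{\varepsilon+\delta}$, hence we obtain
\[
\limsup_{\delta\downarrow 0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta} \le \limsup_{\delta\downarrow 0}\,(2\varepsilon + \delta)\,\mathcal{I}(\omega^\varepsilon) = 2\varepsilon\,\mathcal{I}(\omega^\varepsilon).
\]
As this inequality holds for any $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$, we infer that
\[
\limsup_{\delta\downarrow 0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta} \le 2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}. \tag{5.4}
\]
On the other hand we can also write
\begin{align*}
\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta}
&= \frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta}) - \mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}\\
&= \frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta}) - \mathcal{A}_\varepsilon(\omega^{\varepsilon+\delta})}{\delta} + \frac{\mathcal{A}_\varepsilon(\omega^{\varepsilon+\delta}) - \mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta}. \tag{5.5}
\end{align*}
Using now the optimality of $\omega^\varepsilon$ for $\mathcal{A}_\varepsilon$, we observe that the second term on the right-hand side is non-negative, whence
\[
\frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta}) - \mathcal{A}_\varepsilon(\omega^\varepsilon)}{\delta} \ge \frac{\mathcal{A}_{\varepsilon+\delta}(\omega^{\varepsilon+\delta}) - \mathcal{A}_\varepsilon(\omega^{\varepsilon+\delta})}{\delta} = (2\varepsilon + \delta)\,\mathcal{I}(\omega^{\varepsilon+\delta}).
\]
For any sequence $\delta_n \downarrow 0$, Proposition 5.3 guarantees (up to extraction of a subsequence if needed) that $\omega^{\varepsilon+\delta_n} \to \omega^\varepsilon$ in the pointwise-in-time $\sigma$-topology for some $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$. By lower semicontinuity of $\mathcal{I}$ this implies
\[
\liminf_{n\to\infty}\frac{\mathcal{C}_{\varepsilon+\delta_n}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta_n} \ge \liminf_{n\to\infty}\,(2\varepsilon + \delta_n)\,\mathcal{I}(\omega^{\varepsilon+\delta_n}) \ge 2\varepsilon\,\mathcal{I}(\omega^\varepsilon) \ge 2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I},
\]
and together with (5.4) this yields
\[
\exists \lim_{n\to\infty}\frac{\mathcal{C}_{\varepsilon+\delta_n}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta_n} = 2\varepsilon\,\mathcal{I}(\omega^\varepsilon) = 2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]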
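As the right-hand side does not depend on the particular sequence $\delta_n \downarrow 0$ we conclude that
\[
\exists \lim_{\delta\downarrow 0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta} = 2\varepsilon\min_{\Lambda_\varepsilon(x,y)}\mathcal{I},
\]
in particular $\mathcal{I}$ is minimized by any accumulation point $\omega^\varepsilon$ of $\{\omega^{\varepsilon+\delta}\}_{\delta>0}$.

Left differentiability. The argument is very similar. Indeed, if $\delta < 0$, then the first term on the right-hand side of (5.3) is non-negative and the second one can be handled in the same way. Hence there holds
\[
\liminf_{\delta\uparrow 0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta} \ge 2\varepsilon\,\mathcal{I}(\omega^\varepsilon),
\]
for any $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$, and therefore
\[
\liminf_{\delta\uparrow 0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta} \ge 2\varepsilon\sup_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
Applying the same considerations to (5.5) and following the same argument as above we retrieve the $\limsup$ inequality, first along some subsequence $\delta_n \uparrow 0$ and then along any $\delta \uparrow 0$. Combining with the inequality above gives
\[
\exists \lim_{\delta\uparrow 0}\frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta} = 2\varepsilon\max_{\Lambda_\varepsilon(x,y)}\mathcal{I}, \qquad \forall\,\varepsilon > 0,
\]
whence the pointwise left differentiability of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$.

Left and right continuity. In order to prove the right continuity of the right derivative of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$, note that on the one hand by Lemma 5.2 for any $\varepsilon_n \downarrow \varepsilon$ it holds
\[
\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I} \ge \limsup_{n\to\infty}\sup_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I} \ge \limsup_{n\to\infty}\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}.
\]
On the other hand, we can assume up to a subsequence if needed that
\[
\liminf_{n\to\infty}\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I} = \lim_{n\to\infty}\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I}.
\]
As shown in the proof of right differentiability, $\inf_{\Lambda_{\varepsilon'}(x,y)}\mathcal{I}$ is attained for any $\varepsilon' > 0$, hence in particular $\inf_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I} = \mathcal{I}(\omega^n)$ for some $\omega^n \in \Lambda_{\varepsilon_n}(x,y)$, for all $n$. Up to extracting a further subsequence, by Proposition 5.3 we can assume that $\omega^n \to \omega^\varepsilon$ w.r.t. the pointwise-in-time $\sigma$-topology for some $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$, and moreover by Lemma 5.4
\[
\lim_{n\to\infty}\mathcal{I}(\omega^n) = \mathcal{I}(\omega^\varepsilon) \ge \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
Putting all these inequalities together provides us with the right continuity of $\varepsilon \mapsto \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ and, a fortiori, of the right derivative.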
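Left continuity for the left derivative follows along an analogous reasoning.

Local absolute continuity. Let $0 < \varepsilon_1 < \varepsilon_2 < \infty$ and, for any $0 < \delta < 1$, define
\[
f_\delta(\varepsilon) := \frac{\mathcal{C}_{\varepsilon+\delta}(x,y) - \mathcal{C}_\varepsilon(x,y)}{\delta}.
\]
The monotonicity of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ from Lemma 5.2 gives $f_\delta \ge 0$. Arguing as in the very beginning of the proof of the right differentiability we see that $f_\delta(\varepsilon) \le (2\varepsilon + 1)\,\mathcal{I}(\omega^\varepsilon)$ for any $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$, and by Lemma 5.2
\[
f_\delta(\varepsilon) \le (2\varepsilon_2 + 1)\sup_{\Lambda_{\varepsilon_1}(x,y)}\mathcal{I} < \infty, \qquad \forall\,\varepsilon \in (\varepsilon_1, \varepsilon_2].
\]
Hence $|f_\delta| \le M$ uniformly in $\delta$ and $f_\delta$ converges pointwise to the right derivative of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ as $\delta \downarrow 0$, whence by the dominated convergence theorem
\[
\int_{\varepsilon_1}^{\varepsilon_2}\frac{d^+}{d\varepsilon}\mathcal{C}_\varepsilon(x,y)\,d\varepsilon = \lim_{\delta\downarrow 0}\int_{\varepsilon_1}^{\varepsilon_2} f_\delta(\varepsilon)\,d\varepsilon.
\]
The right-hand side can be rewritten as
\begin{align*}
\lim_{\delta\downarrow 0}\int_{\varepsilon_1}^{\varepsilon_2} f_\delta(\varepsilon)\,d\varepsilon
&= \lim_{\delta\downarrow 0}\Big(\frac{1}{\delta}\int_{\varepsilon_1}^{\varepsilon_2}\mathcal{C}_{\varepsilon+\delta}(x,y)\,d\varepsilon - \frac{1}{\delta}\int_{\varepsilon_1}^{\varepsilon_2}\mathcal{C}_\varepsilon(x,y)\,d\varepsilon\Big)\\
&= \lim_{\delta\downarrow 0}\Big(\frac{1}{\delta}\int_{\varepsilon_2}^{\varepsilon_2+\delta}\mathcal{C}_\varepsilon(x,y)\,d\varepsilon - \frac{1}{\delta}\int_{\varepsilon_1}^{\varepsilon_1+\delta}\mathcal{C}_\varepsilon(x,y)\,d\varepsilon\Big)\\
&= \mathcal{C}_{\varepsilon_2}(x,y) - \mathcal{C}_{\varepsilon_1}(x,y),
\end{align*}
where the last equality holds by the Lebesgue differentiation theorem for the continuous function $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ (cf. Lemma 5.4). We have thus proved that the cost belongs to $AC_{loc}((0,\infty))$, since
\[
\mathcal{C}_{\varepsilon_2}(x,y) - \mathcal{C}_{\varepsilon_1}(x,y) = \int_{\varepsilon_1}^{\varepsilon_2}\frac{d^+}{d\varepsilon}\mathcal{C}_\varepsilon(x,y)\,d\varepsilon, \qquad \forall\,0 < \varepsilon_1 < \varepsilon_2.
\]
For the full $AC_{loc}([0,\infty))$ regularity it is then sufficient to let $\varepsilon_1 \downarrow 0$: the left-hand side converges to $\mathcal{C}_{\varepsilon_2}(x,y) - \mathcal{C}_0(x,y)$ by Lemma 5.4, and by the monotonicity $\frac{d^+}{d\varepsilon}\mathcal{C}_\varepsilon \ge 0$ the right-hand side also converges by monotone convergence.

Relying on our previous auxiliary results and on Proposition 5.5, we are finally in a position to estimate the error $\mathcal{C}_\varepsilon(x,y) - \mathcal{C}_0(x,y)$ with $o(\varepsilon^2)$ precision. We will also significantly refine Corollary 4.4 by proving that any accumulation point of any sequence of minimizers is not only optimal for the unperturbed problem $\mathcal{C}_0(x,y)$, but also $\mathcal{I}$-minimizing among all competitors in $\Lambda_0(x,y)$.

Theorem 5.7.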
With the same assumptions and notations as in Proposition 5.5, if there exists $\omega \in \Lambda_0(x,y)$ such that $\mathcal{I}(\omega) < \infty$, then the map $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ is right differentiable also at $\varepsilon = 0$ with
\[
\frac{d^+}{d\varepsilon}\mathcal{C}_\varepsilon(x,y)\Big|_{\varepsilon=0} = 0,
\]
the right derivative is right continuous for any $\varepsilon \ge 0$, and
\[
\mathcal{C}_\varepsilon(x,y) - \mathcal{C}_0(x,y) = \varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I} + o(\varepsilon^2). \tag{5.6}
\]
Moreover, for any $\varepsilon_n \downarrow 0$ and any minimizer $\omega^n \in \Lambda_{\varepsilon_n}(x,y)$ there exists $\omega^* \in \Lambda_0(x,y)$ such that (up to a subsequence) $\omega^n \to \omega^*$ for the pointwise-in-time $\sigma$-topology, and $\omega^*$ has minimal Fisher information in $\Lambda_0(x,y)$:
\[
\mathcal{I}(\omega^*) = \min_{\Lambda_0(x,y)}\mathcal{I}.
\]

Proof. The right differentiability of $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ at $\varepsilon = 0$ follows by the same argument carried out in Proposition 5.5. Indeed, given $\omega$ as in the statement, by (5.3) with $\varepsilon = 0$ it holds
\[
\limsup_{\delta\downarrow 0}\frac{\mathcal{C}_\delta(x,y) - \mathcal{C}_0(x,y)}{\delta} \le \limsup_{\delta\downarrow 0}\,\delta\,\mathcal{I}(\omega) = 0.
\]
The $\liminf$ inequality is straightforward, since $\mathcal{I} \ge 0$ and thus by (5.5) with $\varepsilon = 0$
\[
\liminf_{\delta\downarrow 0}\frac{\mathcal{C}_\delta(x,y) - \mathcal{C}_0(x,y)}{\delta} \ge \liminf_{\delta\downarrow 0}\,\delta\,\mathcal{I}(\omega^\delta) \ge 0
\]
for any $\omega^\delta \in \Lambda_\delta(x,y)$. This also shows that the right derivative vanishes at $\varepsilon = 0$.

As regards the right continuity of the right derivative, the case $\varepsilon > 0$ has already been discussed in Proposition 5.5. For $\varepsilon = 0$ the same strategy still works, with the only minor difference that we cannot rely on Lemma 5.4 anymore. Nonetheless, if $\omega^n \in \Lambda_{\varepsilon_n}(x,y)$ is as in Proposition 5.5, $\omega^0 \in \Lambda_0(x,y)$ and $\omega^n \to \omega^0$ for the pointwise-in-time $\sigma$-topology (the existence of such $\omega^0$ is granted by Corollary 4.4) it is still true that
\[
\liminf_{n\to\infty}\mathcal{I}(\omega^n) \ge \mathcal{I}(\omega^0),
\]
simply by lower semicontinuity of $\mathcal{I}$. With this single change in the proof we deduce that $\varepsilon \mapsto \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ is right continuous and finite also at $\varepsilon = 0$, thanks to the present assumptions, and so is the right derivative of the cost due to $2\varepsilon\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I} \to 0$ as $\varepsilon \downarrow 0$.

The last part of the statement is a slight modification of these lines of thought. Indeed, given any sequence $\varepsilon_n \downarrow 0$ and $\omega^n \in \Lambda_{\varepsilon_n}(x,y)$, the existence of $\omega^* \in \Lambda_0(x,y)$ such that, up to subsequences, $\omega^n \to \omega^*$ is ensured by Corollary 4.4. The fact that $\omega^*$ has minimal Fisher information among all elements in $\Lambda_0(x,y)$ follows from
\[
\inf_{\Lambda_0(x,y)}\mathcal{I} \ge \limsup_{n\to\infty}\sup_{\Lambda_{\varepsilon_n}(x,y)}\mathcal{I} \ge \limsup_{n\to\infty}\mathcal{I}(\omega^n) \ge \liminf_{n\to\infty}\mathcal{I}(\omega^n) \ge \mathcal{I}(\omega^*) \ge \inf_{\Lambda_0(x,y)}\mathcal{I},
\]
where we used once again Lemma 5.2 and the lower semicontinuity of $\mathcal{I}$.

Thus, it only remains to establish (5.6). As $\varepsilon \mapsto \mathcal{C}_\varepsilon(x,y)$ belongs to $AC_{loc}([0,\infty))$ and the right derivative coincides a.e. with the full derivative, (5.2) and the fundamental theorem of calculus yield
\[
\mathcal{C}_\varepsilon(x,y) - \mathcal{C}_0(x,y) = 2\int_0^\varepsilon s\inf_{\Lambda_s(x,y)}\mathcal{I}\,ds \le 2\int_0^\varepsilon s\inf_{\Lambda_0(x,y)}\mathcal{I}\,ds = \varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I}.
\]
Here we used the monotonicity of the Fisher information from Lemma 5.2 in the middle inequality.
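By the same monotonicity and the right continuity at $\varepsilon = 0$ of $\varepsilon \mapsto \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ we also deduce that
\begin{align*}
\mathcal{C}_\varepsilon(x,y) - \mathcal{C}_0(x,y)
&\ge 2\int_0^\varepsilon s\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}\,ds = \varepsilon^2\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}\\
&= \varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I} + \varepsilon^2\Big(\inf_{\Lambda_\varepsilon(x,y)}\mathcal{I} - \inf_{\Lambda_0(x,y)}\mathcal{I}\Big) = \varepsilon^2\inf_{\Lambda_0(x,y)}\mathcal{I} + o(\varepsilon^2).
\end{align*}
Combining this lower bound with the previous upper one entails (5.6).

Remark 5.8.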
It is worth stressing that the upper bound on $\mathcal{C}_\varepsilon(x,y) - \mathcal{C}_0(x,y)$ is not asymptotic, but pointwise. A possible way to improve (5.6) would rely on a refined analysis of $\varepsilon \mapsto \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$, its derivative (which exists a.e. by monotonicity), and possibly absolute continuity. ■

Remark 5.9.
The $\mathcal{I}$-minimizing property of the accumulation point $\omega^*$ is not specific to the choice $\varepsilon = 0$, but to the particular "backward" direction of the sequence $\varepsilon_n \downarrow 0$. Repeating the argument in the proof of Theorem 5.7 it is indeed not difficult to check that, given any $\varepsilon > 0$, a sequence $\varepsilon_n \downarrow \varepsilon$, and $\omega^n \in \Lambda_{\varepsilon_n}(x,y)$ there exists $\omega^\varepsilon \in \Lambda_\varepsilon(x,y)$ such that, up to a subsequence, $\omega^n \to \omega^\varepsilon$ for the pointwise-in-time $\sigma$-topology and
\[
\mathcal{I}(\omega^\varepsilon) = \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
In a symmetric fashion, a closer look into the proof of Proposition 5.5 suggests that an opposite behaviour appears in the "forward" direction. More precisely, if $\varepsilon_n \uparrow \varepsilon$ instead of $\varepsilon_n \downarrow \varepsilon$, then any accumulation point $\omega^\varepsilon$ of $(\omega^n)$ is such that
\[
\mathcal{I}(\omega^\varepsilon) = \sup_{\Lambda_\varepsilon(x,y)}\mathcal{I}.
\]
However, the "backward" direction and the case $\varepsilon = 0$ are usually more interesting, because of the connection with the unperturbed problem $\mathcal{C}_0(x,y)$ for which there might be multiple solutions even if the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ has a unique minimizer for all $\varepsilon > 0$. It is therefore natural to look for the (properties of the) solutions selected via Schrödinger regularization. ■

In this section we collect several heterogeneous situations where our abstract approach (in particular Theorem 4.3, Corollary 4.4, and Theorem 5.7) applies. We shall also comment on the novelty of the results thus obtained in comparison with the existing literature. In this perspective, it is worth discussing in more detail the role played by Assumption 3.2 so far, singling out when the sequential lower semicontinuity of $|\partial E|$ w.r.t. $\sigma$ is needed and when it is not:
• to show the existence of a solution to the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ (cf. Proposition 4.2) it is crucial, in order to apply the direct method of the calculus of variations;
• in Theorem 4.3 and Corollary 4.4 it is not used;
• unlike Corollary 4.4, in Proposition 5.3 it is needed for the $\Gamma$-$\liminf$ inequality and so it is in Lemma 5.4;
• Proposition 5.5 relies on Proposition 5.3 and Lemma 5.4, hence it is implicitly used;
• in Theorem 5.7 the continuity of $\varepsilon \mapsto \inf_{\Lambda_\varepsilon(x,y)}\mathcal{I}$ at $\varepsilon = 0$ requires the lower semicontinuity of $|\partial E|$ and also Proposition 5.5 is used in the proof of (5.6); hence the lower semicontinuity of $|\partial E|$ is really needed.

This means that if one is able to show the solvability of the Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ by means other than those used in Proposition 4.2, then Theorem 4.3 and Corollary 4.4 are still valid under the following weaker hypothesis.

Assumption 6.1.
There exists a Hausdorff topology $\sigma$ on $X$ such that $d$-bounded sequences contain $\sigma$-converging subsequences. Moreover, the distance $d$ is sequentially lower semicontinuous w.r.t. $\sigma$.

Theorem 5.7, instead, requires the full validity of Assumption 3.2.
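Before turning to the examples, the abstract statements can be sanity-checked by hand in the simplest possible case. The computation below is only an illustrative sketch under assumptions that are not taken from the text: we choose $X = \mathbb{R}$ with the Euclidean distance and $E(x) = x^2/2$, so that $S_t x = e^{-t}x$, $|\partial E|(x) = |x|$ and $\lambda = 1$. With this choice the minimizer of $(\mathrm{Sch}_\varepsilon)$ is explicit, and both the lower bound (4.2) and the expansion (5.6) can be verified directly.

```latex
% Toy model (an assumption made for illustration only): X = R, E(x) = x^2/2.
% Action: A_eps(gamma) = 1/2 \int_0^1 |gamma'_t|^2 dt + eps^2/2 \int_0^1 |gamma_t|^2 dt.
\begin{align*}
% Euler--Lagrange equation and its solution with gamma_0 = x, gamma_1 = y
\ddot\gamma_t &= \varepsilon^2\gamma_t,
\qquad
\gamma_t = \frac{x\sinh(\varepsilon(1-t)) + y\sinh(\varepsilon t)}{\sinh\varepsilon},\\
% optimal value, obtained from A_eps(gamma) = 1/2 [gamma_t gamma'_t]_0^1 after integrating by parts
\mathcal{C}_\varepsilon(x,y) &= \frac{\varepsilon}{2\sinh\varepsilon}\Big((x^2+y^2)\cosh\varepsilon - 2xy\Big),\\
% check of the lower bound (4.2), using cosh(eps) - sinh(eps) = e^{-eps}
\mathcal{C}_\varepsilon(x,y) - \varepsilon\big(E(x)-E(y)\big)
&= \frac{\varepsilon}{2\sinh\varepsilon}\Big(x\,e^{-\varepsilon/2} - y\,e^{\varepsilon/2}\Big)^2 \;\ge\; 0,
\quad\text{with equality iff } y = e^{-\varepsilon}x = S_\varepsilon x,\\
% check of the expansion (5.6): the unique geodesic is omega_t = (1-t)x + ty
\mathcal{I}(\omega) &= \frac12\int_0^1 \big((1-t)x + ty\big)^2\,dt = \frac{x^2+xy+y^2}{6},\\
\mathcal{C}_\varepsilon(x,y) &= \frac{(x-y)^2}{2} + \varepsilon^2\,\frac{x^2+xy+y^2}{6} + O(\varepsilon^4)
= \mathcal{C}_0(x,y) + \varepsilon^2\,\mathcal{I}(\omega) + o(\varepsilon^2).
\end{align*}
```

Exchanging the roles of $x$ and $y$ in the third line gives the symmetric equality case $x = S_\varepsilon y$. The last line uses the Taylor expansions $\varepsilon\coth\varepsilon = 1 + \varepsilon^2/3 + O(\varepsilon^4)$ and $\varepsilon/\sinh\varepsilon = 1 - \varepsilon^2/6 + O(\varepsilon^4)$.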
As a first example, let us consider the Boltzmann-Shannon relative entropy on the Wasserstein space built over a (locally compact) $\mathrm{RCD}(K,\infty)$ space. To this end, let $(M, d, m)$ be a complete and separable locally compact length metric space endowed with a Radon measure and assume that it is an $\mathrm{RCD}(K,\infty)$ space [4] for some $K \in \mathbb{R}$. Let $X := \mathcal{P}_2(M)$ be the 2-Wasserstein space over $M$, namely the space of probability measures with finite second moments, and equip it with the 2-Wasserstein distance $W_2$: it turns out to be a complete and separable metric space [14] as well. The Boltzmann-Shannon relative entropy $E$ on $X$ is defined as
\[
E(\mu) :=
\begin{cases}
\displaystyle\int_M \rho\log(\rho)\,dm & \text{if } \mu = \rho\,m,\\[2mm]
+\infty & \text{if } \mu \not\ll m.
\end{cases}
\]
As by [61, Theorem 4.24] there exist $C > 0$, $x \in M$ such that $\int_M e^{-C d^2(\cdot,x)}\,dm < \infty$, $E$ can be equivalently rewritten as
\[
E(\mu) = \underbrace{\int_M \tilde\rho\log(\tilde\rho)\,d\tilde m}_{\ge 0} - C\int_M d^2(\cdot,x)\,d\mu - \log Z,
\]
where $\tilde\rho$ is the Radon-Nikodym derivative of $\mu$ w.r.t. $\tilde m$, with the normalization
\[
Z := \int_M e^{-C d^2(\cdot,x)}\,dm, \qquad \tilde m := \frac{1}{Z}\,e^{-C d^2(\cdot,x)}\,m.
\]
From this very definition, it is easy to see that $E$ is a proper lower semicontinuous functional, bounded from below on $W_2$-bounded sets. In addition, it has a dense domain, since by (one of the equivalent) definitions of RCD spaces, cf. [4, Theorem 5.1], for any $\mu \in X$ there exists an $\mathrm{EVI}_K$-gradient flow of $E$ starting from it. Thus Setting 3.1 holds.

As regards Assumption 6.1, note that $X$ is not locally compact unless $M$ is compact, so that in general the metric topology of $X$ is not an admissible candidate for $\sigma$. Nonetheless there is a natural alternative: the narrow convergence of probability measures. Indeed, $W_2$-bounded sequences in $X$ are uniformly tight (the second moments are uniformly bounded and the balls in $M$ are relatively compact, so that the claim follows from [2, Remark 5.1.5]) and thus relatively compact w.r.t. the narrow topology. Moreover, $W_2$ is lower semicontinuous w.r.t. narrow convergence of measures [1, Proposition 3.5]. Therefore, given any $\mu, \nu \in X$ for which the dynamical Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ is solvable, the $\Gamma$-convergence results of Section 4 are fully applicable.
This is for instance the case if $\mu, \nu \ll m$ have bounded densities and supports (in [38, 39] this is proved for $\mathrm{RCD}^*(K,N)$ spaces, $N < \infty$, but the argument can be adapted to locally compact $\mathrm{RCD}(K,\infty)$ spaces thanks to the existence of "good" cut-off functions [52]).

In the present framework, taking into account the equivalence between $W_2$-absolutely continuous curves and distributional solutions of the continuity equation (see [37]) and the fact that the slope $|\partial E|^2$ coincides with the Fisher information [3, Theorem 9.3], $(\mathrm{Sch}_\varepsilon)$ reads as
\[
\inf\bigg\{\frac12\int_0^1\!\!\int_M |v_t|^2\rho_t\,dm\,dt + \frac{\varepsilon^2}{2}\int_0^1\!\!\int_M |\nabla\log\rho_t|^2\rho_t\,dm\,dt\bigg\},
\]
where the infimum runs over all couples $(\mu_t, v_t)$, $\mu_t = \rho_t\,m$, solving the continuity equation $\partial_t\mu_t + \mathrm{div}(v_t\mu_t) = 0$ with the constraint $\mu_0 = \mu$ and $\mu_1 = \nu$. This is the dynamical formulation of the "classical" Schrödinger problem [45]. A thorough study of this problem and its equivalent formulations (at the static, dual, and dynamical levels) has been carried out by the second author in [39], but in the more restrictive framework of $\mathrm{RCD}^*(K,N)$ spaces, and only for $\varepsilon$ fixed. The behaviour of the (unique) minimizers as $\varepsilon \downarrow 0$ is instead studied in [38, Proposition 5.1], again only in $\mathrm{RCD}^*(K,N)$ spaces, but the $\Gamma$-convergence of the corresponding variational problems is not investigated. Hence Theorem 4.3 and Corollary 4.4 are new in the RCD framework.

Under the stronger assumption that $M$ is compact (e.g. the torus, the sphere or any convex closed bounded subset of a smooth weighted Riemannian manifold), as said above we can choose the $\sigma$ topology to be the strong one induced by $W_2$, and in this case $|\partial E|$ is lower semicontinuous by (3.1) and Assumption 3.2 is fully satisfied. Another interesting situation where Assumption 3.2 fully holds is represented by a convex domain in $\mathbb{R}^d$ (in this case $\sigma$ is, as before, the narrow topology; see [35, Lemma 2.4] for a proof of the narrow lower semicontinuity of $|\partial E|$).
As a consequence, in these examples also the results in Section 5 hold true and this partly extends the recent work [25], where an analogue of Theorem 5.7 is proved in the Riemannian setting.

As a second class of examples, we consider generalized entropy functionals (usually called internal energies) on the Wasserstein space built over an $\mathrm{RCD}^*(0,N)$ space, $N < \infty$. Taking advantage of the non-negative curvature assumption and of the finite dimensionality we shall indeed be able to cover a wide range of functionals, including in particular Rényi entropies (naturally linked to the porous medium equation). By the discussion carried out in the previous section and by the fact that $\mathrm{RCD}^*(K,N)$ spaces are in particular locally compact $\mathrm{RCD}(K,\infty)$ spaces (the notion of $\mathrm{RCD}^*(K,N)$ space is first introduced in [36]; for the distinction between $\mathrm{RCD}$ and $\mathrm{RCD}^*$ conditions see [6] and [21]), the 2-Wasserstein space $X := \mathcal{P}_2(M)$ over an $\mathrm{RCD}^*(0,N)$ space $(M, d, m)$ endowed with the 2-Wasserstein distance $W_2$ is a complete and separable metric space.

As regards the entropy functionals we shall consider on $X$, they are of the form $\int_M U(\rho)\,dm$, where $U : [0,\infty) \to \mathbb{R}$ is a continuous and convex function with $U(0) = 0$ and $U'$ locally Lipschitz in $(0,\infty)$ satisfying McCann's condition [49] for some $N' \ge N$: this means that the corresponding pressure function $P(r) := rU'(r) - U(r)$ is such that $P(0) := \lim_{r\downarrow 0}P(r) = 0$ and $r \mapsto r^{-1+1/N'}P(r)$ is non-decreasing or, equivalently,
\[
r \mapsto r^{N'}U(r^{-N'}) \quad \text{is convex and non-increasing on } (0,\infty).
\]
Under these assumptions on $U$, the internal energy $E$ is defined as
\[
E(\mu) := \int_M U(\rho)\,dm + U'(\infty)\,\mu^\perp(M), \qquad \text{if } \mu = \rho\,m + \mu^\perp,\ \mu^\perp \perp m \tag{6.1}
\]
where $U'(\infty) := \lim_{r\to\infty}U'(r)$. In the case $U$ is chosen equal to
\[
U_{N'}(r) := -N'\big(r^{1-1/N'} - r\big),\quad N' \ge N
\qquad\text{or}\qquad
U_m(r) := \frac{1}{m-1}\,r^m,\quad m \ge 1 - \frac{1}{N}
\]
($U_{N'}$ being more linked to the Lott-Sturm-Villani theory of curvature-dimension bounds, $U_m$ to the porous medium equation of power $m$), the well-known Rényi entropy is recovered. A detailed discussion of internal energies like $E$, associated non-linear diffusion semigroups and evolution variational inequalities in connection with curvature-dimension conditions is at the heart of the monograph [5] and can also be found in [63, Chapters 16 and 17].

Since $U(0) = 0$, $M$ is locally compact and $U$ is continuous, it is clear that $E$ is well defined and finite on all probability measures with bounded support, so that $E$ is proper and has a dense domain in $X$. Actually, $D(E)$ is dense in energy in $X$, i.e. for all $\mu \in X$ there exist $\mu_n \in D(E)$ with $W_2(\mu_n,\mu) \to 0$ and $E(\mu_n) \to E(\mu)$ as $n \to \infty$. By the properties of $U$ it is also easy to see that $E$ is lower semicontinuous [63, Theorem 30.6] and bounded from below on $W_2$-bounded sets.
Finally, from [5, Theorem 9.21] with $K = 0$ (since $M$ is assumed to be an $\mathrm{RCD}^*(0,N)$ space) and the fact that $D(E)$ is dense in energy in $X$, we see that for all $\mu \in X$ there exists an $\mathrm{EVI}_0$-gradient flow of $E$ starting from it.

We are therefore within Setting 3.1 and by what we said in Section 6.1 the narrow topology complies with Assumption 6.1. Hence whenever the dynamical Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ is solvable, our metric results of Section 4 can be applied. As regards the lower semicontinuity of $|\partial E|$ w.r.t. $\sigma$, and thus the full validity of Assumption 3.2 and, as a consequence, of the abstract results of Section 5 too, if $M$ is compact then by (3.1) we see that the $W_2$-topology is an admissible candidate for $\sigma$. If $M = \mathbb{R}^d$ and $U$ is superlinear at $\infty$ (which is the case for $U_m$ defined above with $m > 1$), then by [2, Theorem 10.4.6] the slope can be represented as
\[
|\partial E|^2(\mu) = \int_{\mathbb{R}^d}|\nabla U'(\rho)|^2\,d\mu, \qquad \text{if } \mu = \rho\,\mathcal{L}^d
\]
and by [35, Proposition 2.2] it is lower semicontinuous w.r.t. narrow convergence. Hence also in this situation the results of Section 5 hold true.

To the best of our knowledge, up to now the dynamical Schrödinger problem $(\mathrm{Sch}_\varepsilon)$ with the slope of a general internal energy in place of the slope of the Boltzmann entropy has been considered only in [33] from a purely formal point of view. Static Monge-Kantorovich problems regularized by means of the Rényi entropy or more general internal energies have recently been introduced in [32, 48, 47, 30] (see also the references therein). Remarkably, [47] establishes the $\Gamma$-convergence of the regularized problems towards the optimal transport one (cf. [30] where the convergence of the optimal values and minimizers is discussed). However, in [30] only bounded costs are considered (the quadratic cost function associated to (1.4) is thus ruled out for non-compact sample spaces), while in [47] the discussion is restricted to sample spaces which are compact subsets of $\mathbb{R}^d$.
Other questions our paper is con-cerned with have not been examined in these references. Note also that the issue ofthe equivalence between static and dynamical formulations is far from being clear atthis level of generality. In view of this discussion, in all the applicability situationspresented in this section our results are new.The case of a (possibly) negatively curved base space M is not discussed since,as already argued above, [5, Theorem 9.21] allows to deduce ( EVI λ ) with λ = 0 onlyfor K ≥ . Moreover, it has recently been proved [28, Theorem 2.5 and Remark2.6] that in the hyperbolic space the porous medium equation cannot be seen as theWasserstein gradient flow of some λ -convex functional in the EVI -sense, hence theRényi entropy cannot generate an
EVI λ -gradient flow there. In the seminal thought experiment proposed by Schrödinger [59, 60] the physicalsystem, whose evolution between two subsequent observations has to be determined,consists of independent Brownian particles. An important generalization has beenrecently proposed in [7], where particles are allowed to interact through a pair po-tential W . This leads to the so-called Mean Field Schrödinger Problem (MFSPhenceforth). In order to see that this example falls within our abstract metric the-ory, let us first check the validity of Setting 3.1. As already said in Section 6.1, X := P ( R d ) , the 2-Wasserstein space over R d , endowed with the Wasserstein distace W is a complete and separable metric space. The role played by the Boltzmann-Shannon relative entropy in the “classical” Schrödinger problem is here taken by thefunctional E : X → R defined (up to a shift by a constant) by E ( µ ) := H ( µ | L d ) + Z R d W ∗ ρ d µ if µ = ρ L d + ∞ if µ
6≪ L d where H ( µ | L d ) is the Boltzmann-Shannon relative entropy of µ w.r.t. the Lebesguemeasure L d , already introduced in Section 6.1, and W is the pair potential, describingvia convolution the interaction between the particles of the system. On such apotential the following assumptions are made: it is of class C ( R d , R ) , is symmetric,i.e. W ( x ) = W ( − x ) for all x ∈ R d , and satisfies the two-sided bound ΛId ≥ ∇ W ≥ λ Id for some Λ , λ > (actually λ ∈ R is enough, but in [7] the authors are interested inthe ergodic behaviour of MFSP). While the upper bound is technical, the lower oneis geometric and crucial. The lower semicontinuity of E is easily seen to hold: therelative entropy has already been discussed, whereas the continuity of the convolutionterm follows from the fact that if µ n → µ in P ( R d ) , then µ n ⊗ µ n → µ ⊗ µ in P ( R d ) ,39f. [2, Example 9.3.4]. The fact that E is proper and the density of its domain arealso clear. Moreover, the assumptions on W guarantee that E is bounded from belowon W -bounded sets. As concerns the existence of EVI λ -gradient flows starting fromany µ ∈ X , this is ensured by [2, Theorem 11.2.1] in conjunction with [2, Remark9.2.5 and Proposition 9.3.5], granting the λ -convexity of E along generalized geodesics(see [2, Definitions 9.2.2 and 9.2.4]). Therefore, Setting 3.1 holds.As regards Assumption 3.2, for the topology σ the natural candidate is once againthe sequential topology induced by narrow convergence of probability measures. Bythe discussion in Section 6.1, W -bounded sets are relatively narrow compact and W is sequentially narrow lower semicontinuous. The slope of E is explicitly givenby | ∂ E | ( µ ) = Z R d |∇ log ρ + 2 ∇ W ∗ ρ | d µ if µ = ρ L d , ∇ log ρ ∈ L µ , + ∞ otherwise , cf. 
[7, Section 1.4.2], and one can rely on [35, Proposition 2.2], the fact that $\Delta W$ is continuous and bounded (as a consequence of the boundedness of $\nabla^2 W$) and the regularization properties of the convolution to show that $|\partial E|$ is also sequentially narrowly lower semicontinuous. Hence Assumption 3.2 is fully satisfied and all the results of Sections 4 and 5 are applicable.

From the novelty standpoint, a first interesting remark is the fact that in [7] the approach is purely stochastic, while our point of view is completely analytic. For instance, in [7, Proposition 1.1] the existence of solutions to MFSP is proved under the same assumptions we have in Proposition 4.2, namely $\mu, \nu \in X$ with $E(\mu), E(\nu) < \infty$. However, already at this basic level the reader may appreciate the difference between the two approaches.

But more than anything else, our abstract results are completely new when specialized to MFSP: indeed, only the ergodic behaviour in the long time regime $\varepsilon \to \infty$ is studied in [7], so that the $\Gamma$-convergence results of Section 4 are entirely novel. The same is true for Section 5, since in [25] the derivative of the cost associated to MFSP is not investigated nor is the Taylor expansion (5.6).

In [31] the authors studied a generalization $\mathcal{W}_m$ of the quadratic Wasserstein distance on $\mathcal{P}(\Omega)$, generated by Benamou-Brenier-type formulas in convex bounded domains $\Omega \subset \mathbb{R}^d$. The latter are based on nonlinear continuity equations and pseudo-Riemannian norms
\[
\|\dot\mu\|^2_\mu = \min_v \left\{ \int_\Omega m(\mu)\, |v|^2 \quad \text{s.t.} \quad \dot\mu + \operatorname{div}(m(\mu)\, v) = 0 \right\},
\]
where $m : \mathbb{R}_+ \to \mathbb{R}_+$ is a given nonlinear mobility function satisfying suitable structural conditions (mainly concavity). The linear case $m(r) = r$ corresponds to the standard Wasserstein distance.
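To make the linear case concrete, here is a short worked reduction. It is only a sketch in partly assumed notation: $v$ is a velocity field, $(\mu_t)_{t \in [0,1]}$ is a curve joining $\mu_0$ to $\mu_1$, and the action formula for $\mathcal{W}_m$ is our reading of the Benamou-Brenier-type definition of [31]. With $m(r) = r$ the continuity equation becomes linear in $\mu$ and the norm is the usual kinetic energy,
\[
\|\dot\mu\|^2_\mu = \min_v \left\{ \int_\Omega |v|^2 \, \mathrm{d}\mu \quad \text{s.t.} \quad \dot\mu + \operatorname{div}(\mu\, v) = 0 \right\},
\]
so that minimizing the action over curves recovers the Benamou-Brenier formula [9]:
\[
\mathcal{W}_m^2(\mu_0, \mu_1) = \inf_{(\mu_t)} \int_0^1 \|\dot\mu_t\|^2_{\mu_t} \, \mathrm{d}t = \mathcal{W}_2^2(\mu_0, \mu_1).
\]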
In the nonlinear case, the lack of 1-homogeneity may impose restricting to absolutely continuous measures $\mu = \rho(x)\, \mathrm{d}x$ (as in [20, Section 3], depending on cases A or B therein); thus, for ease of exposition, in this section we identify measures $\mu$ with their densities $\rho$, with a slight abuse of notation.

Specifying the reference measure $\mathfrak{m}$ in (6.1) to be the (normalized) Lebesgue measure $\mathcal{L}^d$ and taking $U$ to be superlinear, $U(+\infty) = +\infty$, for convenience, let us check that our previous internal energies
\[
E(\rho) = \int_\Omega U(\rho(x)) \, \mathrm{d}x
\]
fit our setting. First of all, completeness is known from [31, Theorem 5.7], and separability readily follows from the density of compactly supported continuous functions, whence (A1). Next, from [31, Theorem 5.5] it is known that the $\mathcal{W}_m$ topology is at least stronger than the weak-$*$ convergence of measures, thus the lower semicontinuity in (A2) holds as soon as $z \mapsto U(z)$ is lower semicontinuous. The density of the domain $D(E)$ in (A2) should again be a simple exercise involving standard approximation arguments, provided $U$ is reasonable. As regards our more fundamental assumption (A3), the generation of an $\mathrm{EVI}_\lambda$-flow is exactly the purpose of [20] for $\lambda = 0$. More precisely, under some generalized McCann condition $GMC(m, d)$ involving $U$, $m$, and the ambient dimension $d$, [20, Theorem 4.10] guarantees that our internal energy functionals generate 0-contractive gradient flows. As a consequence, Setting 3.1 rigorously applies here. As concerns Assumption 3.2, a reasonable and natural choice for the weaker $\sigma$-topology is of course the weak-$*$ convergence of measures. Given the pseudo-Riemannian structure induced by [20, Eq. (3.2)], the (squared) metric slope is at least formally given by
\[
|\partial E|^2(\rho) = \int_\Omega m(\rho)\, |\nabla U'(\rho)|^2 = \int_\Omega \frac{|\nabla P(\rho)|^2}{m(\rho)}, \tag{6.2}
\]
where the pressure $P$ is defined as $P(r) = \int_0^r U''(z)\, m(z) \, \mathrm{d}z$.
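The second equality in (6.2) is a one-line chain-rule computation, recorded here as a formal sketch under the assumption that $\rho$ and $U$ are smooth enough and that $m(\rho) > 0$. Since $P'(r) = U''(r)\, m(r)$,
\[
\nabla P(\rho) = U''(\rho)\, m(\rho)\, \nabla \rho = m(\rho)\, \nabla U'(\rho),
\qquad \text{hence} \qquad
\frac{|\nabla P(\rho)|^2}{m(\rho)} = m(\rho)\, |\nabla U'(\rho)|^2 .
\]
As an illustrative special case (our choice of example, not taken from [20, 31]), the linear mobility $m(r) = r$ with the Boltzmann entropy $U(r) = r \log r$ gives $U''(r) = 1/r$ and $P(r) = r$, so that
\[
|\partial E|^2(\rho) = \int_\Omega \frac{|\nabla \rho|^2}{\rho} = \int_\Omega \rho\, |\nabla \log \rho|^2 ,
\]
which is the classical Fisher information.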
With reasonable assumptions on $U, m$ it should not be difficult to check the lower semicontinuity of this generalized Fisher information for this specific choice of the $\sigma$ topology, and from [31, Theorem 5.6] the distance $\mathcal{W}_m$ is also known to be lower semicontinuous w.r.t. the weak-$*$ convergence.

As a consequence, our metric results apply to this context as well (although full proofs of the representation (6.2) for the metric slope and of its weak lower semicontinuity are still missing for a completely rigorous statement, this is beyond the scope of the paper).

Let $(X, d)$ be a complete and separable CAT(0) space (i.e. a separable Hadamard space), $x_0 \in X$ be a fixed point, and $E = \frac{1}{2} d^2(x_0, \cdot)$. Then $E$ is a 1-convex functional. By [54, Theorem 3.14], there exists an $\mathrm{EVI}_1$-gradient flow of $E$ starting from any $x \in X$. Thus, we fit into Setting 3.1.

Although Assumption 6.1 is satisfied, existence of some "weak" Hausdorff topology $\sigma$ on $X$ is required to secure Assumption 3.2 (unless $X$ is locally compact, cf. Remark 3.3). Such a topology indeed exists, at least if $X$ satisfies a rather mild geometric Q condition [41, 42], and is called the half-space topology. The corresponding convergence is known as the $\Delta$-convergence [46]. Indeed, $d$-bounded sequences contain $\Delta$-converging subsequences [46, 41]. Bounded closed convex sets (in particular, balls) are $\Delta$-closed [42], which easily implies that $d$ is $\Delta$-lower semicontinuous. Moreover, it is easy to see from (3.1) that $|\partial E|(x) = d(x_0, x)$, thus the slope is $\Delta$-lower semicontinuous too.

In this framework, all our results are applicable. Note that $E$ is always finite.

Acknowledgments
LM wishes to thank Jean-Claude Zambrini for numerous and fruitful discussions on the Schrödinger problem, and acknowledges support from the Portuguese Science Foundation through FCT project PTDC/MAT-STA/22812/2017 SchröMoka. LT acknowledges financial support from FSMP Fondation Sciences Mathématiques de Paris. DV was partially supported by the FCT projects UID/MAT/00324/2020 and PTDC/MAT-PUR/28686/2017.
References

[1] Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and Optimisation of Flows on Networks, Lecture Notes in Mathematics, pages 1–155. Springer Berlin Heidelberg, 2013.
[2] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, second edition, 2008.
[3] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Calculus and heat flow in metric measure spaces and applications to spaces with Ricci bounds from below. Invent. Math., 195(2):289–391, 2014.
[4] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Metric measure spaces with Riemannian Ricci curvature bounded from below. Duke Math. J., 163(7):1405–1490, 2014.
[5] Luigi Ambrosio, Andrea Mondino, and Giuseppe Savaré. Nonlinear diffusion equations and curvature conditions in metric measure spaces. Mem. Amer. Math. Soc., 262(1270):v+121, 2019.
[6] Kathrin Bacher and Karl-Theodor Sturm. Localization and tensorization properties of the curvature-dimension condition for metric measure spaces. J. Funct. Anal., 259(1):28–56, 2010.
[7] Julio Backhoff, Giovanni Conforti, Ivan Gentil, and Christian Léonard. The mean field Schrödinger problem: ergodic behavior, entropy estimates and functional inequalities. Probability Theory and Related Fields, pages 1–56, 2020.
[8] Aymeric Baradat and Léonard Monsaingeon. Small noise limit and convexity for generalized incompressible flows, Schrödinger problems, and optimal transport. Arch. Ration. Mech. Anal., 235(2):1357–1403, 2020.
[9] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, Jan 2000.
[10] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput., 37(2):A1111–A1138, 2015.
[11] Jean-David Benamou, Guillaume Carlier, Simone Di Marino, and Luca Nenna. An entropy minimization approach to second-order variational mean-field games. Preprint, arXiv:1807.09078, 2018.
[12] Jean-David Benamou, Guillaume Carlier, and Luca Nenna. A numerical method to solve multi-marginal optimal transport problems with Coulomb cost. In Splitting methods in communication, imaging, science, and engineering, Sci. Comput., pages 577–601. Springer, Cham, 2016.
[13] Jean-David Benamou, Guillaume Carlier, and Luca Nenna. Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. Preprint, arXiv:1710.08234, 2017.
[14] François Bolley. Separability and completeness for the Wasserstein distance. In Séminaire de probabilités XLI, volume 1934 of Lecture Notes in Math., pages 371–377. Springer, Berlin, 2008.
[15] François Bolley and José A Carrillo. Nonlinear diffusion: geodesic convexity is equivalent to Wasserstein contraction. Communications in Partial Differential Equations, 39(10):1860–1869, 2014.
[16] Andrea Braides. Gamma-convergence for Beginners, volume 22. Clarendon Press, 2002.
[17] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A course in metric geometry, volume 33 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2001.
[18] Eric Carlen. Stochastic mechanics: a look back and a look ahead. Diffusion, quantum theory and radically elementary mathematics, 47:117–139, 2014.
[19] Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal., 49(2):1385–1418, 2017.
[20] José Antonio Carrillo, Stefano Lisini, Giuseppe Savaré, and Dejan Slepcev. Nonlinear mobility continuity equations and generalized displacement convexity. Journal of Functional Analysis, 258(4):1273–1309, 2010.
[21] Fabio Cavalletti and Emanuel Milman. The globalization theorem for the curvature dimension condition. Preprint, arXiv:1612.07623, 2016.
[22] Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. Journal of Optimization Theory and Applications, 169(2):671–691, 2016.
[23] Giovanni Conforti. A second order equation for Schrödinger bridges with applications to the hot gas experiment and entropic transportation cost. Probability Theory and Related Fields, 174(1-2):1–47, 2019.
[24] Giovanni Conforti and Luigia Ripani. Around the entropic Talagrand inequality. Bernoulli, 26(2):1431–1452, 2020.
[25] Giovanni Conforti and Luca Tamanini. A formula for the time derivative of the entropic cost and applications. Preprint, arXiv:1912.10555, 2019.
[26] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
[27] Sara Daneri and Giuseppe Savaré. Eulerian calculus for the displacement convexity in the Wasserstein distance. SIAM J. Math. Anal., 40(3):1104–1122, 2008.
[28] Nicolò De Ponti, Matteo Muratori, and Carlo Orrieri. Wasserstein stability of porous medium-type equations on manifolds with Ricci curvature bounded below. Preprint, arXiv:1908.03147, 2019.
[29] Simone Di Marino and Augusto Gerolin. An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm. J. Sci. Comput., 85(2):Paper No. 27, 28, 2020.
[30] Simone Di Marino and Augusto Gerolin. Optimal Transport losses and Sinkhorn algorithm with general convex regularization. Preprint, arXiv:2007.00976, 2020.
[31] Jean Dolbeault, Bruno Nazaret, and Giuseppe Savaré. A new class of transport distances between measures. Calculus of Variations and Partial Differential Equations, 34(2):193–231, 2009.
[32] Montacer Essid and Justin Solomon. Quadratically regularized optimal transport on graphs. SIAM J. Sci. Comput., 40(4):A1961–A1986, 2018.
[33] Ivan Gentil, Christian Léonard, and Luigia Ripani. Dynamical aspects of generalized Schrödinger problem via Otto calculus - A heuristic point of view. Preprint, arXiv:1806.01553, 2018.
[34] Ivan Gentil, Christian Léonard, Luigia Ripani, and Luca Tamanini. An entropic interpolation proof of the HWI inequality. Stochastic Processes and their Applications, 2019.
[35] Ugo Gianazza, Giuseppe Savaré, and Giuseppe Toscani. The Wasserstein gradient flow of the Fisher information and the quantum drift-diffusion equation. Arch. Ration. Mech. Anal., 194(1):133–220, 2009.
[36] Nicola Gigli. On the differential structure of metric measure spaces and applications. Mem. Amer. Math. Soc., 236(1113):vi+91, 2015.
[37] Nicola Gigli and Bangxian Han. The continuity equation on metric measure spaces. Calc. Var. Partial Differential Equations, 53(1-2):149–177, 2013.
[38] Nicola Gigli and Luca Tamanini. Second order differentiation formula on $\mathrm{RCD}^*(K, N)$ spaces. Accepted at JEMS, arXiv:1802.02463, (2018).
[39] Nicola Gigli and Luca Tamanini. Benamou-Brenier and duality formulas for the entropic cost on $\mathrm{RCD}^*(K, N)$ spaces. Probability Theory and Related Fields, 176(1-2):1–34, 2020.
[40] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal., 29(1):1–17, 1998.
[41] Bijan Kakavandi. Weak topologies in complete CAT(0) metric spaces. Proceedings of the American Mathematical Society, 141(3):1029–1039, 2013.
[42] William Kirk and Naseer Shahzad. Fixed point theory in distance spaces. Springer, Cham, 2014.
[43] Flavien Léger. A geometric perspective on regularized optimal transport. Journal of Dynamics and Differential Equations, 31(4):1777–1791, 2019.
[44] Christian Léonard. From the Schrödinger problem to the Monge-Kantorovich problem. J. Funct. Anal., 262(4):1879–1920, 2012.
[45] Christian Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst., 34(4):1533–1574, 2014.
[46] Teck Cheong Lim. Remarks on some fixed point theorems. Proc. Amer. Math. Soc., 60:179–182 (1977), 1976.
[47] Dirk Lorenz and Hinrich Mahler. Orlicz space regularization of continuous optimal transport problems. Preprint, arXiv:2004.11574, 2020.
[48] Dirk A. Lorenz, Paul Manns, and Christian Meyer. Quadratically regularized optimal transport. Applied Math. Optimization, to appear, 2019.
[49] Robert J McCann. A convexity principle for interacting gases. Advances in mathematics, 128(1):153–179, 1997.
[50] Toshio Mikami. Monge's problem with a quadratic cost by the zero-noise limit of $h$-path processes. Probab. Theory Related Fields, 129(2):245–260, 2004.
[51] Toshio Mikami and Michèle Thieullen. Optimal transportation problem by stochastic optimal control. SIAM J. Control Optim., 47(3):1127–1139, 2008.
[52] Andrea Mondino and Aaron Charles Naber. Structure theory of metric measure spaces with lower Ricci curvature bounds. Journal of the European Mathematical Society, 21(6):1809–1854, 2019.
[53] Léonard Monsaingeon and Dmitry Vorotnikov. The Schrödinger problem on the non-commutative Fisher-Rao space. Calc. Var. Partial Differential Equations, to appear, 2020.
[54] Matteo Muratori and Giuseppe Savaré. Gradient flows and Evolution Variational Inequalities in metric spaces. I: Structural properties. Journal of Functional Analysis, 278(4):108347, 2020.
[55] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.
[56] Felix Otto and Michael Westdickenberg. Eulerian calculus for the contraction in the Wasserstein distance. SIAM journal on mathematical analysis, 37(4):1227–1255, 2005.
[57] Gabriel Peyré and Marco Cuturi. Computational Optimal Transport: With Applications to Data Science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[58] Filippo Santambrogio. Optimal transport for applied mathematicians. Birkhäuser/Springer, 2015.
[59] Erwin Schrödinger. Über die Umkehrung der Naturgesetze. Von E. Schrödinger. (Sonderausgabe a. d. Sitz.-Ber. d. Preuss. Akad. d. Wiss., Phys.-math. Klasse, 1931, IX.) Verlag W. de Gruyter, Berlin. Preis RM. 1,-. Angewandte Chemie, 44(30):636–636, 1931.
[60] Erwin Schrödinger. Sur la théorie relativiste de l'électron et l'interprétation de la mécanique quantique. Ann. Inst. H. Poincaré, 2(4):269–310, 1932.
[61] Karl-Theodor Sturm. On the geometry of metric measure spaces. I. Acta Math., 196(1):65–131, 2006.
[62] Cédric Villani. Topics in optimal transportation. American Mathematical Soc., 2003.
[63] Cédric Villani. Optimal transport. Old and new, volume 338 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 2009.
[64] Jean-Claude Zambrini. Variational processes and stochastic versions of mechanics. J. Math. Phys., 27(9):2307–2330, 1986.
[65] Jean-Claude Zambrini. The research program of stochastic deformation (with a view toward geometric mechanics). In
Stochastic analysis: a series of lectures ,volume 68 of