Entropy-Transport distances between unbalanced metric measure spaces

Abstract

Inspired by the recent theory of Entropy-Transport problems and by the $\mathbb{D}$-distance of Sturm on normalised metric measure spaces, we define a new class of complete and separable distances between metric measure spaces of possibly different total mass. We provide several explicit examples of such distances, where a prominent role is played by a geodesic metric based on the Hellinger-Kantorovich distance. Moreover, we discuss some limiting cases of the theory, recovering the "pure transport" $\mathbb{D}$-distance and introducing a new class of "pure entropic" distances. We also study in detail the topology induced by such Entropy-Transport metrics, showing some compactness and stability results for metric measure spaces satisfying Ricci curvature lower bounds in a synthetic sense.


arXiv preprint [math.MG], September

NICOLÒ DE PONTI AND ANDREA MONDINO

Contents

Introduction
1. Preliminaries and notation
1.1. Metric and measure setting
1.2. Curvature-Dimension condition
1.3. Entropy functionals
2. Entropy-Transport problem and distances
2.1. Regular Entropy-Transport distances
3. Sturm-Entropy-Transport distance
3.1. Topology
4. Limiting cases
4.1. Pure entropy distances
4.2. Sturm's distances
4.3. Piccoli-Rossi distance
4.4. Bounds between distances
References

Introduction

With motivations from pure mathematics as well as from the applied sciences, over the last decades growing attention has been paid to the problem of "comparing objects" which come naturally endowed with a distance (metric) and a weight (volume form, measure). From the mathematical point of view, such objects are formalised as metric measure spaces (m.m.s. for short) $(X,d,\mu)$, where the metric structure $(X,d)$ describes the geometry and the mutual distance of points, and the measure $\mu$ "weights" the relative importance of different parts of the object.

Nicolò De Ponti: Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy. Andrea Mondino: Mathematical Institute, University of Oxford, UK.

The flexibility of such a framework allows one to treat in a unified way a series of problems stemming from various fields of science and technology, e.g. regression problems in quantum chemistry [23], computer vision [39], language processing [1, 24], graph [44] and surface [6] matching, and machine learning [45]. The theory of metric measure spaces has been flourishing in pure mathematics as well, providing a unified setting to investigate concentration of measure phenomena [27, 40], the theory of Ricci limit spaces [19, 9] and, more generally, synthetic notions of Ricci curvature lower bounds [41, 42, 31, 4].

In order to "quantify the similarities and differences between two such objects", it is thus natural to investigate appropriate notions of distance between metric measure spaces. This idea has its roots in the work of Gromov [25, Chapter 3½], who first recognized the importance of studying the "space of spaces" $\mathbb{X}$ as a metric space in its own right. Formally, $\mathbb{X}$ denotes the set of equivalence classes of metric measure spaces $(X,d,\mu)$, where $(X,d)$ is a complete and separable metric space and $\mu$ is a finite, nonnegative Borel measure; we naturally identify two m.m.s. $(X_1,d_1,\mu_1)$ and $(X_2,d_2,\mu_2)$ if there exists an isometry $\psi\colon\mathrm{supp}(\mu_1)\to\mathrm{supp}(\mu_2)$ such that $\psi_\sharp\mu_1=\mu_2$.
Here by $\mathrm{supp}(\mu)$ we denote the support of the measure $\mu$ (see the preliminary section for more details).

In recent years, the theory has been pushed forward by the works of Sturm [41, 43] and Mémoli [33], who realized that ideas from mass transportation can be used to produce new relevant distances between metric measure spaces. Such distances have been successfully applied in different fields, but suffer from a major restriction which is intrinsic to the Wasserstein distances coming from optimal transport: they can only be used to compare spaces with the same total mass.

The goal of the present paper is to overcome this limitation by taking advantage of the theory of optimal Entropy-Transport problems [30]. In contrast with the classical transport setting, these problems allow the description of phenomena where mass may not be conserved; for this reason they are also known in the literature as "unbalanced optimal transport problems". The corresponding theory is fairly recent and is becoming increasingly popular in applications, e.g. gradient flows to train neural networks [10, 36], supervised learning [18], and medical imaging [17] and video [28] registration. Indeed, the Entropy-Transport relaxation seems to outperform classical optimal transport in problems where the input data is noisy or a normalization procedure is not appropriate. We refer to [37] and references therein for more applications of unbalanced optimal transport.

As we are going to explain in detail below, inspired by the construction of the $\mathbb{D}$-distance of Sturm [41], we produce a new class of complete and separable distances between metric measure spaces by replacing the Wasserstein distance with an Entropy-Transport distance. Such metric structures on $\mathbb{X}$ also turn out to be geodesic (resp. length) when the underlying Entropy-Transport distance is geodesic (resp. length).

Optimal transport and Sturm distances.

Let $(X,d)$ be a metric space and $c\colon X\times X\to[0,+\infty]$ a lower semicontinuous cost function. The optimal transport problem between two probability measures $\mu_1,\mu_2$ is the minimization problem
$$\mathrm{T}(\mu_1,\mu_2) := \inf_{\gamma\in\Pi(\mu_1,\mu_2)} \int_{X\times X} c(x_1,x_2)\,\mathrm{d}\gamma(x_1,x_2). \tag{1}$$

Here $\Pi(\mu_1,\mu_2)$ denotes the set of measures $\gamma$ on the product space $X\times X$ whose marginals satisfy the constraint $\pi^i_\sharp\gamma=\mu_i$, where $\pi^i$ denotes the projection map $\pi^i(x_1,x_2)=x_i$. A typical choice for the cost function is $c(x_1,x_2)=d^p(x_1,x_2)$, $p\ge1$. In this situation, the transport cost $\mathrm{T}$ is the $p$-th power of the celebrated $p$-Wasserstein distance $W_p$, a metric on the set $\mathcal{P}_p(X)$ of probability measures over $X$ with finite $p$-moment. Starting from the seminal work of Kantorovich, the metric space $(\mathcal{P}_p(X),W_p)$ has been thoroughly studied: it inherits many geometric properties of the underlying space $(X,d)$ (such as completeness, separability and the geodesic property) and induces the weak topology of probability measures together with convergence of $p$-moments. We refer to the monograph [46] for a detailed overview of the topic.

As observed by Sturm [41], one can lift the metric $W_p$ to a distance between metric measure spaces by defining
$$\mathbb{D}_p\big((X_1,d_1,\mu_1),(X_2,d_2,\mu_2)\big) := \inf W_p(\psi_{1\sharp}\mu_1,\psi_{2\sharp}\mu_2), \tag{2}$$
where the infimum is taken over all complete and separable metric spaces $(\hat X,\hat d)$ and all isometric embeddings $\psi_i\colon\mathrm{supp}(\mu_i)\to\hat X$, $i=1,2$. It is proved in [41, Theorem 3.6] that $\mathbb{D}_p$ is a complete, separable and geodesic distance on the set $\mathbb{X}_{1,p}:=\{(X,d,\mu)\in\mathbb{X} : \mu\in\mathcal{P}_p(X,d)\}$. The $\mathbb{D}_p$ metric appears in several applications, e.g. computation of barycenters for graphs or more general shapes [45], generative learning for objects immersed in different spaces [7], and natural language processing for unsupervised translation learning [1, 24].

Entropy-Transport problems and Sturm-Entropy-Transport distances.

The idea at the core of Entropy-Transport problems is to relax the marginal constraints of the classical Kantorovich formulation (1) by adding suitable penalizing functionals which keep track of the deviation of the marginals $\gamma_i:=\pi^i_\sharp\gamma$ from the data $\mu_i$, $i=1,2$. Following the approach of Liero, Mielke and Savaré [30], given a superlinear, convex function $F\colon[0,+\infty)\to[0,+\infty]$ such that $F(1)=0$ (for simplicity we assume here that $F$ is superlinear; see definition (20) for the general case), one considers the entropy functional (also called Csiszár $F$-divergence [13]) $D_F\colon\mathcal{M}(X)\times\mathcal{M}(X)\to[0,+\infty]$,
$$D_F(\gamma\,\|\,\mu) := \begin{cases} \int_X F\big(\frac{\mathrm{d}\gamma}{\mathrm{d}\mu}\big)\,\mathrm{d}\mu & \text{if } \gamma\ll\mu,\\ +\infty & \text{otherwise}.\end{cases} \tag{3}$$
Here $\mathcal{M}(X)$ denotes the set of finite, nonnegative Borel measures over $X$. A classical example is given by the choice $F=U_1(s):=s\ln(s)-s+1$, which corresponds to the celebrated Kullback-Leibler divergence (note that when $\gamma$ and $\mu$ are probability measures, $D_{U_1}$ coincides with the Boltzmann-Shannon entropy $\mathrm{Ent}(\rho\mu\,|\,\mu)=\int\rho\log\rho\,\mathrm{d}\mu$).

Given $\mu_1,\mu_2\in\mathcal{M}(X)$, the Entropy-Transport problem induced by the entropy function $F$ and the cost function $c$ is then defined as
$$\mathrm{ET}(\mu_1,\mu_2) := \inf_{\gamma\in\mathcal{M}(X\times X)} \Big\{ \sum_{i=1}^2 D_F(\gamma_i\,\|\,\mu_i) + \int_{X\times X} c(x_1,x_2)\,\mathrm{d}\gamma(x_1,x_2) \Big\}. \tag{4}$$
We emphasize that problem (4) makes perfect sense even when $\mu_1(X)\neq\mu_2(X)$. As in the case of optimal transport problems, it is natural to consider cost functions of the form $c(x_1,x_2)=\ell(d(x_1,x_2))$, where $d$ is a distance on $X$ and $\ell\colon[0,\infty)\to[0,\infty]$ is a general function.
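As a sanity check of (3) and (4), the following minimal Python sketch treats the simplest unbalanced case: two Dirac masses $a\delta_x$ and $b\delta_y$ with $a,b>0$, a finite cost value $c=c(x,y)$ and $F=U_1$. Since $U_1$ is superlinear, a plan $\gamma$ with finite cost must satisfy $\gamma_1\ll a\delta_x$ and $\gamma_2\ll b\delta_y$, hence $\gamma=\theta\delta_{(x,y)}$ for some $\theta\ge0$, and (4) becomes a one-dimensional minimisation. Setting the derivative in $\theta$ to zero gives $\theta=\sqrt{ab}\,e^{-c/2}$ and
$$\mathrm{ET}(a\delta_x,b\delta_y)=a+b-2\sqrt{ab}\,e^{-c/2},$$
which is finite even when $a\neq b$ (it is the marginal perspective cost $H_c(a,b)$ of Remark 1 for $F=U_1$). The function names and the brute-force grid search are illustrative choices of ours, not taken from the paper.

```python
import math


def U1(s: float) -> float:
    """Kullback-Leibler entropy function U_1(s) = s log s - s + 1 (with U_1(0) = 1)."""
    if s < 0:
        raise ValueError("U_1 is defined on [0, +inf)")
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0


def divergence(F, gamma, mu):
    """D_F(gamma || mu) as in (3) for measures on a finite set, given as lists of weights.

    F is assumed superlinear, so any mass of gamma where mu vanishes gives +inf.
    """
    total = 0.0
    for g, m in zip(gamma, mu):
        if m == 0:
            if g > 0:
                return math.inf  # gamma is not absolutely continuous w.r.t. mu
            continue
        total += F(g / m) * m
    return total


def et_two_diracs(a, b, c, F=U1, steps=100_000):
    """Brute-force ET(a*delta_x, b*delta_y) for a finite cost value c = c(x, y).

    Admissible plans are theta * delta_(x, y); theta is scanned on a grid in
    [0, a + b], which contains the minimiser sqrt(ab) * exp(-c/2) when F = U_1.
    """
    best = math.inf
    for k in range(steps + 1):
        theta = (a + b) * k / steps
        value = F(theta / a) * a + F(theta / b) * b + theta * c
        best = min(best, value)
    return best


def et_two_diracs_closed_form(a, b, c):
    """Closed form for F = U_1 obtained from the first-order condition in theta."""
    return a + b - 2.0 * math.sqrt(a * b) * math.exp(-c / 2.0)


# a != b: no coupling with these marginals exists, yet the ET cost is finite.
print(et_two_diracs(2.0, 0.5, 1.0), et_two_diracs_closed_form(2.0, 0.5, 1.0))
print(divergence(U1, [0.2, 0.8], [0.5, 0.5]))  # KL divergence of two probability vectors
```

The grid search is deliberately naive: it makes the reduction of (4) to a scalar problem visible without relying on an optimisation library.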
With a careful choice of the functions $F$ and $\ell$ (see [14] for a discussion of the metric properties of Entropy-Transport problems), one is able to produce a distance $D_{\mathrm{ET}}$ on the space $\mathcal{M}(X)$ by taking a suitable power of the Entropy-Transport cost $\mathrm{ET}$, namely $D_{\mathrm{ET}}=\mathrm{ET}^a$ for a certain $a\in(0,1]$. In this paper we introduce the class of regular Entropy-Transport distances, defined as follows:

Definition 1.

We say that $D_{\mathrm{ET}}$ is a regular Entropy-Transport distance if
(a) there exist $a\in(0,1]$, a function $F\colon[0,+\infty)\to[0,+\infty)$ and a function $\ell\colon[0,\infty)\to[0,\infty]$ such that, for every complete and separable metric space $(X,d)$, setting $c(x_1,x_2):=\ell(d(x_1,x_2))$, the function $D_{\mathrm{ET}}$ coincides with the power $a$ of the Entropy-Transport cost $\mathrm{ET}$ induced by $(F,c)$ as in (4);
(b) the function $\ell$ is continuous, convex, and $\ell(s)=0$ if and only if $s=0$;
(c) $F$ is convex, superlinear and $F(1)=0$;
(d) for every complete and separable metric space $(X,d)$, $D_{\mathrm{ET}}$ is a complete and separable metric on $\mathcal{M}(X)$ inducing the weak topology.

For any regular Entropy-Transport distance, the Sturm-Entropy-Transport distance $\mathbb{D}_{\mathrm{ET}}$ between the (equivalence classes of) m.m.s. $(X_1,d_1,\mu_1)$, $(X_2,d_2,\mu_2)$ is then defined as
$$\mathbb{D}_{\mathrm{ET}}\big((X_1,d_1,\mu_1),(X_2,d_2,\mu_2)\big) := \inf D_{\mathrm{ET}}(\psi_{1\sharp}\mu_1,\psi_{2\sharp}\mu_2), \tag{5}$$
where the infimum is taken over all complete and separable metric spaces $(\hat X,\hat d)$ and all isometric embeddings $\psi_1\colon\mathrm{supp}(\mu_1)\to\hat X$ and $\psi_2\colon\mathrm{supp}(\mu_2)\to\hat X$.

The main result of the paper (Theorem 2) is that every Sturm-Entropy-Transport distance defines a complete and separable metric structure on $\mathbb{X}$. Moreover, it satisfies the geodesic (resp. length) property if the distance $D_{\mathrm{ET}}$ satisfies the geodesic (resp. length) property on the space of measures.

We also study in detail the notion of convergence induced by such distances, showing that it corresponds to the weak measured-Gromov convergence introduced in [21]. As a consequence, we obtain a compactness result for the class of m.m.s. $(X,d,\mu)$ satisfying the $\mathrm{CD}(K,N)$ condition, having bounded diameter and satisfying $0<v\le\mu(X)\le V$. We refer to Theorem 4 for the precise statement and to the preliminaries for the definition of the curvature-dimension condition $\mathrm{CD}(K,N)$.

At a technical level, the proofs of our results are inspired by the corresponding ones given by Sturm in [41], but they require new ideas in order to deal with general cost functions and with the entropic part of the problem.
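To see the infimum in (5) at work in the simplest unbalanced case, one can take two one-point spaces $(\{p\},0,m_1\delta_p)$ and $(\{q\},0,m_2\delta_q)$ with masses $m_1,m_2>0$, together with the Hellinger-Kantorovich choice $a=1/2$, $F=U_1$, $\ell(d)=-\log(\cos^2 d)$ for $d<\pi/2$ and $+\infty$ otherwise. A pair of isometric embeddings is then determined, up to isometry, by the distance $\delta\ge0$ between the two image points, and every $\delta\ge0$ occurs (e.g. in $\hat X=\mathbb{R}$). Minimising over plans $\theta\delta_{(\hat p,\hat q)}$ gives $\mathrm{ET}=m_1+m_2-2\sqrt{m_1m_2}\,e^{-\ell(\delta)/2}$, where $e^{-\ell(\delta)/2}=\cos\delta$ for $\delta<\pi/2$ and $0$ otherwise. Since this is nondecreasing in $\delta$, the infimum is attained at $\delta=0$ and $\mathbb{D}_{\mathrm{ET}}=|\sqrt{m_1}-\sqrt{m_2}|$, which is positive as soon as the masses differ. The Python sketch below only confirms this by scanning $\delta$ on a grid; the helper names are hypothetical and the approach does not extend to general spaces.

```python
import math


def hk_cost(dist: float) -> float:
    """Hellinger-Kantorovich cost l(d) = -log(cos^2 d) for d < pi/2, +inf otherwise."""
    return -math.log(math.cos(dist) ** 2) if dist < math.pi / 2 else math.inf


def hk_two_diracs(m1: float, m2: float, dist: float) -> float:
    """D_ET = ET^(1/2) between m1*delta_x and m2*delta_y with d(x, y) = dist (F = U_1)."""
    c = hk_cost(dist)
    decay = 0.0 if math.isinf(c) else math.exp(-c / 2.0)  # equals cos(dist) when finite
    return math.sqrt(max(m1 + m2 - 2.0 * math.sqrt(m1 * m2) * decay, 0.0))


def sturm_hk_one_point(m1: float, m2: float, d_max: float = 3.0, samples: int = 3000) -> float:
    """Grid approximation of the infimum (5) for two one-point spaces.

    An embedding pair is encoded by the distance between the two image points,
    scanned on [0, d_max]; the grid contains 0, where the minimum is attained.
    """
    return min(hk_two_diracs(m1, m2, d_max * k / samples) for k in range(samples + 1))


print(sturm_hk_one_point(4.0, 1.0))  # compare with |sqrt(4) - sqrt(1)|
```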
Two key results of independent interest are contained in Proposition 2 and Lemma 6, where we show that the infimum on the right-hand side of (5) is actually a minimum, and we give an explicit formulation of the Sturm-Entropy-Transport distance, namely
$$\mathbb{D}_{\mathrm{ET}}^{1/a}\big((X_1,d_1,\mu_1),(X_2,d_2,\mu_2)\big) = \sum_{i=1}^2 D_F(\gamma_i\,\|\,\mu_i) + \int_{X_1\times X_2} \ell\big(\hat d(x,y)\big)\,\mathrm{d}\gamma, \tag{6}$$
for some optimal measure $\gamma\in\mathcal{M}(X_1\times X_2)$ and optimal pseudo-metric coupling $\hat d$ between $d_1$ and $d_2$ (see the preliminaries for the definition of pseudo-metric coupling).

The class of regular Entropy-Transport distances includes some of the main examples of Entropy-Transport distances known in the literature, including:
• The Hellinger-Kantorovich geodesic distance [30, 29, 11, 26], induced by the choices $a=1/2$, $F(s)=U_1(s)$, and $\ell(d)=-\log\big(\cos^2(d)\big)$ if $d<\pi/2$, $\ell(d)=+\infty$ otherwise.
• The so-called Gaussian Hellinger-Kantorovich distance [30], which corresponds to the choices $a=1/2$, $F(s)=U_1(s)$, $\ell(d)=d^2$.
• The quadratic power-like distances studied in [14], corresponding to $a=1/2$, $F(s)=U_p(s):=\frac{s^p-p(s-1)-1}{p(p-1)}$, $\ell(d)=d^2$, $1<p\le\ldots$

Moreover, our analysis is not restricted to regular Entropy-Transport distances. By a limit procedure we also discuss some singular cases covering:
• The "pure entropy" setting, which corresponds to the choice $c(x_1,x_2)=0$ if $x_1=x_2$ and $c(x_1,x_2)=+\infty$ otherwise. In this situation we construct a family of distances between metric measure spaces inducing a notion of strong convergence (see Theorems 5 and 6 for the details).
• The "pure transport" setting, corresponding to $a=1/p$, $F(s)=0$ if $s=1$ and $F(s)=+\infty$ otherwise, $\ell(d)=d^p$, where we recover the $\mathbb{D}_p$-distances introduced by Sturm.
• The Piccoli-Rossi distance $\mathrm{BL}$ [34, 35] (also known as bounded-Lipschitz distance), induced by the choices $a=1$, $F(s)=|s-1|$, $\ell(d)=d$. By a procedure analogous to the one described in (5), in Theorem 8 we show that the distance $\mathrm{BL}$ can be lifted to a complete distance $\mathbb{BL}$ on the set $\mathbb{X}$.

Note on the preparation.

Some of the results of the paper (often under additional assumptions) have been presented at different seminars and included in the PhD thesis of the first named author [15, Chapter 5], where the construction of the Sturm-Entropy-Transport distances induced by the Hellinger-Kantorovich and the quadratic power-like distances is developed.

Only during the final stage of preparation of the present manuscript (September 2020) did we become aware of the independent work [38], which defines a class of distances between unbalanced metric measure spaces starting from the conical formulation of Entropy-Transport problems (see [30, 11, 14] and Remark 1 for a discussion of the "cone geometry" of Entropy-Transport problems). The paper [38] also provides some interesting numerical discussions on the topic, but it does not contain a study of the analytic and geometric properties of this class of distances (such as completeness, separability, the length and geodesic properties, and compactness).


For these reasons, we believe that our results and [38] are of independent interest and complement each other.

Acknowledgements.

The project started when N.D.P. was visiting A.M. in fall 2018 at the Mathematics Institute of the University of Warwick, and took advantage of a second visit of N.D.P. to the Mathematical Institute of the University of Oxford in March 2020. The authors wish to thank both institutions for the inspiring atmosphere and the excellent working conditions.

A.M. is supported by the European Research Council (ERC), under the European Union's Horizon 2020 research and innovation programme, via the ERC Starting Grant "CURVATURE", grant agreement No. 802689.

The authors wish to warmly thank Giuseppe Savaré for valuable discussions on the topics of the paper.

1. Preliminaries and notation

1.1. Metric and measure setting.

A function $d\colon X\times X\to[0,\infty]$ is a pseudo-metric on the set $X$ if $d$ is symmetric, satisfies the triangle inequality and $d(x,x)=0$ for every $x\in X$. We say that $d$ is a metric possibly attaining the value $+\infty$ if it is a pseudo-metric such that $d(x,y)=0$ implies $x=y$. When $d$ is also finite-valued, we simply say that $d$ is a metric. A pseudo-metric space (resp. metric space) will be a couple $(X,d)$, where $d$ is a pseudo-metric (resp. metric) on the set $X$.

On a pseudo-metric space we will always consider the topology induced by the open balls $B_r(x):=\{y\in X : d(x,y)<r\}$. A Polish space is a separable, completely metrizable topological space. We will denote by $\mathrm{diam}(X)$ the diameter of a metric space $X$.

An isometry between two metric spaces $(X_1,d_1)$, $(X_2,d_2)$ is a map $\psi\colon X_1\to X_2$ such that for every $x,y\in X_1$ we have
$$d_1(x,y)=d_2(\psi(x),\psi(y)). \tag{7}$$
Let $\{(X_\alpha,d_\alpha) \mid \alpha\in A\}$ be an indexed family of metric spaces; we define its disjoint union as
$$\bigsqcup_\alpha X_\alpha := \bigcup\big\{X_\alpha\times\{\alpha\} \mid \alpha\in A\big\},$$
endowed with a pseudo-metric $\hat d$, called a pseudo-metric coupling between $\{d_\alpha\}$, such that $\hat d((x,\alpha),(y,\alpha))=d_\alpha(x,y)$ for every $x,y\in X_\alpha$. The inclusion map
$$\iota_\alpha\colon X_\alpha\to\bigsqcup_\alpha X_\alpha,\qquad \iota_\alpha(x):=(x,\alpha),$$
is thus an isometry with image $X_\alpha\times\{\alpha\}$. With a slight abuse of notation, we will often identify the space $X_\alpha$ with $X_\alpha\times\{\alpha\}$.

Lemma 1.

Let $(X_1,d_1)$, $(X_2,d_2)$ be two complete and separable metric spaces. Let $\hat d$ be a finite-valued pseudo-metric coupling between $d_1$ and $d_2$. Then the space
$$\tilde X := (X_1\sqcup X_2)/\sim,\qquad\text{where } x\sim x' \iff \hat d(x,x')=0, \tag{8}$$
endowed with the distance
$$\tilde d([x],[x']) := \hat d(x,x'),$$
is a complete and separable metric space. Here $[x]\in\tilde X$ denotes the equivalence class of the point $x\in X_1\sqcup X_2$.

Proof.

We first notice that $\tilde d$ is well defined on $\tilde X$. Indeed, if $x\sim\tilde x$ and $x'\sim\tilde x'$ we have
$$\hat d(x,x') \le \hat d(x,\tilde x)+\hat d(\tilde x,\tilde x')+\hat d(\tilde x',x') = \hat d(\tilde x,\tilde x'),$$
$$\hat d(\tilde x,\tilde x') \le \hat d(\tilde x,x)+\hat d(x,x')+\hat d(x',\tilde x') = \hat d(x,x'),$$
which implies $\hat d(x,x')=\hat d(\tilde x,\tilde x')$, so that $\tilde d([x],[x'])$ does not depend on the chosen representatives. It is clear that $\tilde d$ is a metric on $(X_1\sqcup X_2)/\sim$. Separability follows from the fact that $(X_1\sqcup X_2,\hat d)$ is separable, being the union of two separable spaces (recall that $\hat d=d_i$ on $X_i$, $i=1,2$).

To prove completeness, let us consider a Cauchy sequence $\{y_j\}\subset\tilde X$. It is sufficient to show that a subsequence converges with respect to $\tilde d$. Let $p\colon X_1\sqcup X_2\to\tilde X$ be the quotient map and, recalling that $X_1\sqcup X_2=X_1\times\{1\}\cup X_2\times\{2\}$, we can suppose without loss of generality that there exists a subsequence $\{y_{j_k}\}$ with representatives $x_k\in p^{-1}(y_{j_k})\cap(X_1\times\{1\})$ (the case of $X_2\times\{2\}$ being analogous). Identifying $(X_1\times\{1\},\hat d)$ with $(X_1,d_1)$, we infer that $\{x_k\}$ is a Cauchy sequence in the complete space $(X_1,d_1)$, and thus it converges to some $x\in X_1$. It is immediate to check that $y_{j_k}\to[x]$ in $\tilde X$ with respect to $\tilde d$, and the proof is complete. □

Starting from a metric space $(X,d)$, we define the cone over $X$ as the space
$$\mathfrak{C}(X) := (X\times[0,+\infty))/\sim,\qquad\text{where } (x_1,r_1)\sim(x_2,r_2) \iff r_1=r_2=0 \ \text{ or } \ (r_1=r_2 \text{ and } x_1=x_2).$$

If $(X,d)$ is a pseudo-metric space, we denote by $\mathcal{M}(X)$ the space of finite, nonnegative measures on the Borel $\sigma$-algebra $\mathcal{B}(X)$, and by $\mathcal{P}(X)\subset\mathcal{M}(X)$ the space of probability measures. We endow $\mathcal{M}(X)$ with the weak topology, inducing the following notion of convergence:
$$\mu_n\rightharpoonup\mu \iff \int_X f\,\mathrm{d}\mu_n\to\int_X f\,\mathrm{d}\mu \quad\text{for any } f\in C_b(X), \tag{9}$$
where $C_b(X)$ denotes the set of real, continuous and bounded functions defined on $X$.

A subset $\mathcal{K}\subset\mathcal{M}(X)$ is bounded if $\sup_{\mu\in\mathcal{K}}\mu(X)<\infty$, and it is equally tight if
$$\forall\varepsilon>0\ \exists K_\varepsilon\subset X \text{ compact}:\ \forall\mu\in\mathcal{K},\ \mu(X\setminus K_\varepsilon)\le\varepsilon. \tag{10}$$
Compactness properties with respect to the weak topology on $\mathcal{M}(X)$ are guaranteed by the following version of Prokhorov's Theorem:

Theorem 1.

Let $X$ be a Polish space. A subset $\mathcal{K}\subset\mathcal{M}(X)$ is bounded and equally tight if and only if it is relatively compact with respect to the weak topology.

We recall that the set of measures of the form
$$\mu = M\sum_{n=1}^N \delta_{x_n}, \tag{11}$$
where $M\in\mathbb{R}_+$, $N\in\mathbb{N}$ and $x_n\in X$, is dense in $\mathcal{M}(X)$. Moreover, if $X$ is separable, the measures of the form (11) with $M\in\mathbb{Q}_+$ and $x_n$ in a countable dense subset of $X$ form a countable dense subset of $\mathcal{M}(X)$, proving that the latter is also a separable space.

A metric measure space will be a triple $(X,d,\mu)$ where $(X,d)$ is a complete, separable metric space and $\mu\in\mathcal{M}(X)$. If there exists a point $x_0\in X$ such that
$$\int_X d^p(x,x_0)\,\mathrm{d}\mu(x)<\infty, \tag{12}$$
we will say that the measure $\mu\in\mathcal{P}(X)$ has finite $p$-moment. We denote by $\mathcal{P}_p(X)$ the space of measures $\nu\in\mathcal{P}(X)$ with finite $p$-moment.

The support of the measure $\mu$ is the smallest closed set $X_0:=\mathrm{supp}(\mu)$ such that $\mu(X\setminus X_0)=0$. We notice that the set $\mathrm{supp}(\mu)$ has a natural structure of metric measure space with the induced distance, $\sigma$-algebra and measure (which will be denoted in the same way).

We say that $\varphi$ is a curve connecting $x,y\in X$ if $\varphi\colon[a,b]\to X$ is a continuous map such that $\varphi(a)=x$ and $\varphi(b)=y$. The length of a curve is defined as
$$\mathrm{Length}(\varphi) := \sup\sum_{i=1}^n d\big(\varphi(t_{i-1}),\varphi(t_i)\big), \tag{13}$$
where the supremum is taken over all partitions $a=t_0<t_1<\dots<t_n=b$. We will always assume that a curve of finite length is parametrized with constant speed, i.e.
$$\mathrm{Length}(\varphi|_{[s,t]}) = \frac{t-s}{b-a}\,\mathrm{Length}(\varphi). \tag{14}$$
A metric space $(X,d)$ is called a length space if for all $x,y\in X$
$$d(x,y) = \inf\big\{\mathrm{Length}(\varphi) : \varphi \text{ curve connecting } x \text{ and } y\big\}. \tag{15}$$
A geodesic is a curve $\varphi\colon[a,b]\to X$ such that
$$d(\varphi(s),\varphi(t)) = \frac{t-s}{b-a}\,d(\varphi(a),\varphi(b)) \quad\text{for all } a\le s\le t\le b.$$
Notice in particular that if $\varphi$ is a geodesic then $\mathrm{Length}(\varphi)=d(\varphi(a),\varphi(b))$.
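The supremum in (13) can be explored numerically: by the triangle inequality, refining a partition can only increase the polygonal sum. The Python sketch below (helper names are ours) evaluates these sums for nested uniform partitions of the quarter circle $\varphi(t)=(\cos t,\sin t)$, $t\in[0,\pi/2]$, in the Euclidean plane; they increase towards $\pi/2$. Since the chord between the endpoints has length $\sqrt2<\pi/2$, this also illustrates why the unit circle with the restricted Euclidean (chordal) distance is not a length space in the sense of (15): curves inside the circle joining $(1,0)$ and $(0,1)$ have length at least $\pi/2$.

```python
import math


def polygonal_length(curve, a, b, n, dist):
    """Sum in (13) for the uniform partition a = t_0 < t_1 < ... < t_n = b."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(dist(curve(ts[i - 1]), curve(ts[i])) for i in range(1, n + 1))


def euclidean(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def quarter_circle(t):
    """Constant-speed parametrisation of a quarter of the unit circle."""
    return (math.cos(t), math.sin(t))


lengths = [polygonal_length(quarter_circle, 0.0, math.pi / 2, n, euclidean)
           for n in (1, 2, 4, 8, 1024)]
print(lengths)                       # increasing, approaching pi/2
print(math.pi / 2, math.sqrt(2.0))   # arc length vs chordal distance of the endpoints
```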
A metric space $(X,d)$ is geodesic if any pair of points $x,y\in X$ is connected by a geodesic.

For a metric space $(X,d)$, the Kantorovich-Wasserstein distance $W_p$ of order $p\ge1$ is defined as follows: for $\mu_1,\mu_2\in\mathcal{M}(X)$ we set
$$W_p^p(\mu_1,\mu_2) := \inf_\gamma \int_{X\times X} d^p(x,y)\,\mathrm{d}\gamma, \tag{16}$$
where the infimum is taken over all $\gamma\in\mathcal{M}(X\times X)$ with $\mu_1$ and $\mu_2$ as first and second marginals, i.e. $(\pi_i)_\sharp\gamma=\mu_i$, where $\pi_i\colon X\times X\to X$ denotes the projection map $\pi_i(x_1,x_2)=x_i$, $i=1,2$. A measure $\gamma\in\mathcal{M}(X\times X)$ with the given marginals achieving the minimum in (16) is called a $W_p$-optimal coupling for $(\mu_1,\mu_2)$. Since every admissible $\gamma$ satisfies $\mu_1(X)=\gamma(X\times X)=\mu_2(X)$, we have $W_p(\mu_1,\mu_2)=+\infty$ when $\mu_1(X)\neq\mu_2(X)$.

If $(X,d)$ is complete and separable, $(\mathcal{P}_p(X),W_p)$ is a complete and separable metric space. It is geodesic when $(X,d)$ is geodesic. Moreover, for any sequence $\mu_n\in\mathcal{P}_p(X)$ we have
$$\lim_{n\to\infty}W_p(\mu_n,\mu)=0 \iff \begin{cases}\mu_n \text{ weakly converges to } \mu,\\ \mu_n \text{ has uniformly } p\text{-integrable moments},\end{cases} \tag{17}$$
where the latter means that for some (thus any) $x_0$
$$\lim_{R\to\infty}\limsup_{n}\int_{X\setminus B_R(x_0)} d^p(x,x_0)\,\mathrm{d}\mu_n(x)=0. \tag{18}$$
For a proof of these last facts, see [46, Theorem 6.18].
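Definition (16) becomes explicit on the real line. For two measures of the same total mass $M$, each equal to $M$ times the uniform distribution on $n$ points, and for $p\ge1$, a classical fact of one-dimensional optimal transport is that the monotone (sorted) matching is an optimal coupling, so that $W_p^p=\frac{M}{n}\sum_i|x_{(i)}-y_{(i)}|^p$, where $x_{(i)}$ and $y_{(i)}$ are the sorted points. The Python sketch below is a hypothetical helper restricted to equal sample sizes; it returns $+\infty$ as soon as the total masses differ, which is exactly the limitation of $W_p$ that Entropy-Transport problems are designed to remove.

```python
import math


def wasserstein_1d(xs, ys, p=2.0, mass_x=1.0, mass_y=1.0):
    """W_p in (16) between mass_x * uniform(xs) and mass_y * uniform(ys) on R.

    Sketch restricted to samples of equal size: for p >= 1 the sorted
    (monotone) matching is an optimal coupling.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if mass_x != mass_y:
        return math.inf  # no coupling with these marginals exists
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch needs two non-empty samples of equal size")
    n = len(xs)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (mass_x * cost / n) ** (1.0 / p)


print(wasserstein_1d([0.0, 1.0, 2.0], [5.0, 6.0, 7.0], p=1))  # a pure translation by 5
print(wasserstein_1d([0.0], [1.0], mass_x=1.0, mass_y=2.0))   # different total masses
```

Scaling both masses by $M$ scales every admissible coupling by $M$, which is why the cost is simply multiplied by `mass_x` before taking the $p$-th root.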

1.2. Curvature-Dimension condition.

It is beyond the scope of this brief section to give a full account of the curvature-dimension condition and its properties; we limit ourselves to schematically recalling the basic definitions involved. The interested reader is referred to the original papers [31, 41, 42, 4, 3, 20, 21, 16, 5, 8], the survey [2] and the monograph [46].
• For any $K\in\mathbb{R}$, $N\in(1,\infty)$, $\theta>0$ and $t\in[0,1]$, define the distortion coefficients by
$$\tau^{(t)}_{K,N}(\theta) := t^{\frac1N}\,\sigma^{(t)}_{K,N-1}(\theta)^{\frac{N-1}{N}},$$
where
$$\sigma^{(t)}_{K,N}(\theta) := \begin{cases} \infty & \text{if } K\theta^2\ge N\pi^2,\\ \dfrac{\sin(t\theta\sqrt{K/N})}{\sin(\theta\sqrt{K/N})} & \text{if } 0<K\theta^2<N\pi^2,\\ t & \text{if } K\theta^2=0,\\ \dfrac{\sinh(t\theta\sqrt{-K/N})}{\sinh(\theta\sqrt{-K/N})} & \text{if } K\theta^2<0.\end{cases}$$
• For every $N\in(1,\infty)$, define the $N$-Rényi entropy functional relative to $\mu$, $\mathcal{U}_N(\cdot\,|\,\mu)\colon\mathcal{P}(X)\to[-\infty,0]$, as
$$\mathcal{U}_N(\nu\,|\,\mu) := -\int_X\rho^{1-\frac1N}\,\mathrm{d}\mu,\qquad\text{where } \nu=\rho\mu+\nu^s \text{ and } \nu^s\perp\mu.$$
• Define also the Boltzmann-Shannon entropy functional relative to $\mu$, $\mathrm{Ent}(\cdot\,|\,\mu)\colon\mathcal{P}(X)\to(-\infty,+\infty]$, as
$$\mathrm{Ent}(\nu\,|\,\mu) := \int_X\rho\log(\rho)\,\mathrm{d}\mu$$
if $\nu=\rho\mu\ll\mu$ and $\rho\log\rho\in L^1(X,\mu)$, and $+\infty$ otherwise.
• $\mathrm{CD}(K,\infty)$ condition: given $K\in\mathbb{R}$, we say that $(X,d,\mu)$ verifies the $\mathrm{CD}(K,\infty)$ condition if for any pair of probability measures $\nu_0,\nu_1\in\mathcal{P}(X)$ with $\mathrm{Ent}(\nu_0|\mu),\mathrm{Ent}(\nu_1|\mu)<+\infty$ there exists a $W_2$-geodesic $(\nu_t)_{t\in[0,1]}$ from $\nu_0$ to $\nu_1$ such that
$$\mathrm{Ent}(\nu_t|\mu)\le(1-t)\,\mathrm{Ent}(\nu_0|\mu)+t\,\mathrm{Ent}(\nu_1|\mu)-\frac K2\,t(1-t)\,W_2^2(\nu_0,\nu_1)\quad\text{for any } t\in[0,1].$$
• $\mathrm{CD}(K,N)$ condition: given $K\in\mathbb{R}$ and $N\in(1,\infty)$, we say that $(X,d,\mu)$ verifies the $\mathrm{CD}(K,N)$ condition if for any pair of probability measures $\nu_0,\nu_1\in\mathcal{P}(X)$ with bounded support and $\nu_0,\nu_1\ll\mu$, there exist a $W_2$-geodesic $(\nu_t)_{t\in[0,1]}$ from $\nu_0$ to $\nu_1$ with $\nu_t\ll\mu$, and a $W_2$-optimal coupling $\gamma\in\mathcal{P}(X\times X)$, such that
$$\mathcal{U}_{N'}(\nu_t|\mu)\le-\int\Big[\tau^{(1-t)}_{K,N'}(d(x,y))\,\rho_0^{-\frac1{N'}}(x)+\tau^{(t)}_{K,N'}(d(x,y))\,\rho_1^{-\frac1{N'}}(y)\Big]\,\mathrm{d}\gamma(x,y)$$
for any $N'\ge N$ and $t\in[0,1]$, where $\rho_i$ denotes the density of $\nu_i$ with respect to $\mu$, $i=0,1$.
• Consistency property: a smooth Riemannian manifold $M$ (resp. weighted Riemannian manifold) satisfies the $\mathrm{CD}(K,N)$ condition for some $K\in\mathbb{R}$, $N\in(1,\infty)$ if and only if $\dim(M)\le N$ and the Ricci curvature is bounded below by $K$ (resp. if and only if the $N$-Bakry-Émery-Ricci tensor is bounded below by $K$).
• Define the slope of a real-valued function $u\colon X\to\mathbb{R}$ at the point $x\in X$ as
$$|\nabla u|(x) := \begin{cases}\limsup_{y\to x}\dfrac{|u(x)-u(y)|}{d(x,y)} & \text{if } x \text{ is not isolated},\\ 0 & \text{otherwise}.\end{cases}$$
We denote by $\mathrm{LIP}(X)$ the space of Lipschitz functions on $(X,d)$.
• Let $f\in L^2(X,\mu)$. The Cheeger energy of $f$ is defined as
$$\mathrm{Ch}(f) := \inf\Big\{\liminf_{n\to\infty}\int|\nabla f_n|^2\,\mathrm{d}\mu \;\Big|\; f_n\in\mathrm{LIP}(X)\cap L^2(X,\mu),\ \|f_n-f\|_{L^2}\to0\Big\}.$$
One can check that the Cheeger energy $\mathrm{Ch}\colon L^2(X,\mu)\to[0,\infty]$ is convex and lower semicontinuous. Thus it admits an $L^2$-gradient flow, called the heat flow.
• The metric measure space $(X,d,\mu)$ is said to be infinitesimally Hilbertian if $\mathrm{Ch}$ is a quadratic form, i.e. it satisfies the parallelogram identity. One can check that $(X,d,\mu)$ is infinitesimally Hilbertian if and only if the heat flow is, for every positive time, a linear map from $L^2(X,\mu)$ to $L^2(X,\mu)$. If $(X,d,\mu)$ is the metric measure space associated to a smooth Finsler manifold, one can check that $(X,d,\mu)$ is infinitesimally Hilbertian if and only if the manifold is actually Riemannian.
• Given $K\in\mathbb{R}$ and $N\in(1,\infty]$, we say that $(X,d,\mu)$ verifies the $\mathrm{RCD}(K,N)$ condition if it satisfies the $\mathrm{CD}(K,N)$ condition and it is infinitesimally Hilbertian.
• Pointed measured Gromov-Hausdorff convergence: let $(X_n,d_n,\mu_n)$, $n\in\mathbb{N}\cup\{\infty\}$, be a sequence of metric measure spaces and let $\bar x_n\in X_n$, $n\in\mathbb{N}\cup\{\infty\}$, be a sequence of reference points. We say that $(X_n,d_n,\mu_n,\bar x_n)\to(X_\infty,d_\infty,\mu_\infty,\bar x_\infty)$ in the pointed measured Gromov-Hausdorff (pmGH) sense provided that for any $\varepsilon,R>0$ there exists $N(\varepsilon,R)\in\mathbb{N}$ such that for all $n\ge N(\varepsilon,R)$ there exists a Borel map $f^{R,\varepsilon}_n\colon B_R(\bar x_n)\to X_\infty$ such that
– $f^{R,\varepsilon}_n(\bar x_n)=\bar x_\infty$,
– $\sup_{x,y\in B_R(\bar x_n)}\big|d_n(x,y)-d_\infty(f^{R,\varepsilon}_n(x),f^{R,\varepsilon}_n(y))\big|\le\varepsilon$,
– the $\varepsilon$-neighbourhood of $f^{R,\varepsilon}_n(B_R(\bar x_n))$ contains $B_{R-\varepsilon}(\bar x_\infty)$,
– $(f^{R,\varepsilon}_n)_\sharp(\mu_n\llcorner B_R(\bar x_n))$ weakly converges to $\mu_\infty\llcorner B_R(\bar x_\infty)$ as $n\to\infty$, for a.e. $R>0$.
If in addition there exists $\bar R>0$ such that $\mathrm{diam}(X_n)\le\bar R$ for every $n\in\mathbb{N}\cup\{\infty\}$, then we say that $(X_n,d_n,\mu_n)\to(X_\infty,d_\infty,\mu_\infty)$ in the measured Gromov-Hausdorff (mGH for short) sense. In this case it is enough to consider only $R=\bar R$ in the above requirements.
• Stability: let $K\in\mathbb{R}$ and $N\in(1,\infty]$ be given. Assume that $(X_n,d_n,\mu_n)$ satisfies $\mathrm{CD}(K,N)$ (resp. $\mathrm{RCD}(K,N)$) for every $n\in\mathbb{N}$, and that $(X_n,d_n,\mu_n,\bar x_n)\to(X_\infty,d_\infty,\mu_\infty,\bar x_\infty)$ in the pmGH sense. Then $(X_\infty,d_\infty,\mu_\infty)$ satisfies $\mathrm{CD}(K,N)$ (resp. $\mathrm{RCD}(K,N)$) as well.

1.3. Entropy functionals.

In this section we assume that $X$ is a Polish space.

A function $F\colon[0,+\infty)\to[0,+\infty]$ belongs to the class $\Gamma_0(\mathbb{R}_+)$ of admissible entropy functions if $F$ is convex, lower semicontinuous and $F(1)=0$. We define the recession constant as
$$F'_\infty := \lim_{s\to\infty}\frac{F(s)}{s},$$
and we say that $F$ is superlinear if $F'_\infty=+\infty$.
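The recession constant separates the entropy functions used in the Introduction. The short numerical sketch below (function names are ours, and the limit is only probed at a few large values of $s$, not computed) evaluates $F(s)/s$ for $U_1$, for $F(s)=|s-1|$ (the Piccoli-Rossi choice, with $F'_\infty=1$), and for the power-like functions $U_p(s)=\frac{s^p-p(s-1)-1}{p(p-1)}$. For $p>1$ the ratio grows like $s^{p-1}/(p(p-1))$, so $U_p$ is superlinear, while for $p\in(0,1)$ one finds the finite value $F'_\infty=\frac{1}{1-p}$.

```python
import math


def U1(s):
    """U_1(s) = s log s - s + 1, the Kullback-Leibler entropy function."""
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0


def make_Up(p):
    """Power-like entropy function U_p(s) = (s^p - p(s - 1) - 1) / (p(p - 1))."""
    if p in (0, 1):
        raise ValueError("use U1 for p = 1; p = 0 is excluded in this sketch")
    return lambda s: (s ** p - p * (s - 1.0) - 1.0) / (p * (p - 1.0))


def total_variation(s):
    """F(s) = |s - 1|, the entropy function of the Piccoli-Rossi distance."""
    return abs(s - 1.0)


def recession_ratio(F, s):
    """F(s)/s; its limit as s -> +inf is the recession constant F'_inf."""
    return F(s) / s


examples = {"U_1": U1, "|s-1|": total_variation, "U_2": make_Up(2.0), "U_1/2": make_Up(0.5)}
for name, F in examples.items():
    # F(1) = 0 for every admissible entropy function; the ratios hint at F'_inf.
    print(name, F(1.0), [recession_ratio(F, s) for s in (1e2, 1e4, 1e8)])
```

For $U_1$ the ratio equals $\log s-1+1/s$, so superlinearity is visible only logarithmically, which is why several orders of magnitude of $s$ are sampled.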

We also define the perspective function induced by $F\in\Gamma_0(\mathbb{R}_+)$ as the function $\hat F\colon[0,+\infty)\times[0,+\infty)\to[0,+\infty]$ given by
$$\hat F(r,t) := \begin{cases} F\big(\frac rt\big)\,t & \text{if } t>0,\\ F'_\infty\,r & \text{if } t=0.\end{cases} \tag{19}$$
Let $F\in\Gamma_0(\mathbb{R}_+)$ be an admissible entropy function. The $F$-divergence (also called Csiszár's divergence or relative entropy) is the functional $D_F\colon\mathcal{M}(X)\times\mathcal{M}(X)\to[0,+\infty]$ defined by
$$D_F(\gamma\,\|\,\mu) := \int_X F(\sigma)\,\mathrm{d}\mu + F'_\infty\,\gamma^\perp(X), \tag{20}$$
where $\gamma=\sigma\mu+\gamma^\perp$ is the Lebesgue decomposition of the measure $\gamma$ with respect to $\mu$. When $F$ is superlinear, $D_F(\gamma\|\mu)=+\infty$ if $\gamma$ has a nonzero singular part with respect to $\mu$. Moreover, it is clear that $D_F(\mu\|\mu)=0$.

We now collect some useful properties of the relative entropies. For the proofs see [30, Section 2.4].

Lemma 2.

The functional $D_F$ is jointly convex and lower semicontinuous in $\mathcal{M}(X)\times\mathcal{M}(X)$. More generally, if $F\in\Gamma_0(\mathbb{R}_+)$ is the pointwise limit of an increasing sequence $(F_n)\subset\Gamma_0(\mathbb{R}_+)$ and $(\gamma,\mu)\in\mathcal{M}(X)\times\mathcal{M}(X)$ is the weak limit of a sequence $(\gamma_n,\mu_n)\subset\mathcal{M}(X)\times\mathcal{M}(X)$, then
$$\liminf_{n\to\infty} D_{F_n}(\gamma_n\|\mu_n)\ge D_F(\gamma\|\mu).$$

Lemma 3. If $\mathcal{K}\subset\mathcal{M}(X)$ is bounded and $F'_\infty>0$, then the set
$$\mathcal{K}_C := \{\gamma\in\mathcal{M}(X) : D_F(\gamma\|\mu)\le C \text{ for some } \mu\in\mathcal{K}\} \tag{21}$$
is bounded for every $C\ge0$. Moreover, if $\mathcal{K}$ is also equally tight and $F$ is superlinear, then $\mathcal{K}_C$ is equally tight for every $C\ge0$.

The last lemma of this section shows an invariance property of the $F$-divergences.

Lemma 4.

Let $F\in\Gamma_0(\mathbb{R}_+)$ be an admissible entropy function, let $X,Y$ be two Polish spaces and let $f\colon X\to Y$ be a Borel injective map. Then, for any $\gamma,\mu\in\mathcal{M}(X)$ it holds
$$D_F(\gamma\|\mu)=D_F(f_\sharp\gamma\|f_\sharp\mu). \tag{22}$$

Proof.

Let us consider the Lebesgue decompositions $\gamma=\sigma\mu+\gamma^\perp$ and $f_\sharp\gamma=\tilde\sigma\,f_\sharp\mu+\tilde\gamma^\perp$. Since $f_\sharp\gamma$ and $f_\sharp\mu$ are concentrated on $f(X)$, we can suppose without loss of generality that $f$ is bijective.

For any Borel set $A\subset X$ we have
$$\int_A\sigma\,\mathrm{d}\mu+\gamma^\perp(A)=\gamma(A)=\gamma\big(f^{-1}(f(A))\big)=f_\sharp\gamma(f(A)) = \int_{f(A)}\tilde\sigma\,\mathrm{d}f_\sharp\mu+\tilde\gamma^\perp(f(A))=\int_A\tilde\sigma\circ f\,\mathrm{d}\mu+\tilde\gamma^\perp(f(A)). \tag{23}$$
By the uniqueness of the Lebesgue decomposition (see [30, Lemma 2.3]) it follows that $\sigma=\tilde\sigma\circ f$ up to $(\mu+\gamma)$-negligible sets and $\gamma^\perp(X)=\tilde\gamma^\perp(f(X))=\tilde\gamma^\perp(Y)$. In particular
$$D_F(f_\sharp\gamma\|f_\sharp\mu) = \int_Y F(\tilde\sigma)\,\mathrm{d}f_\sharp\mu+F'_\infty\,\tilde\gamma^\perp(Y) = \int_X F(\tilde\sigma\circ f)\,\mathrm{d}\mu+F'_\infty\,\gamma^\perp(X)=D_F(\gamma\|\mu). \qquad\square$$

2. Entropy-Transport problem and distances

Let $\gamma\in\mathcal{M}(X\times X)$. In the sequel we denote by $\gamma_i:=(\pi_i)_\sharp\gamma$ the marginals of $\gamma$. We are now ready to define the Entropy-Transport problem.

Definition 2.

Let $F\in\Gamma_0(\mathbb{R}_+)$ and let $c\colon X\times X\to[0,+\infty]$ be a lower semicontinuous function. The Entropy-Transport functional between the measures $\mu_1,\mu_2\in\mathcal{M}(X)$ is the functional $\mathcal{ET}(\cdot\,\|\,\mu_1,\mu_2)\colon\mathcal{M}(X\times X)\to[0,+\infty]$,
$$\mathcal{ET}(\gamma\,\|\,\mu_1,\mu_2) := D_F(\gamma_1\|\mu_1)+D_F(\gamma_2\|\mu_2)+\int_{X\times X}c(x_1,x_2)\,\mathrm{d}\gamma(x_1,x_2). \tag{24}$$
We define the Entropy-Transport problem between $\mu_1$ and $\mu_2$ as the minimization problem
$$\mathrm{ET}(\mu_1,\mu_2) := \inf_{\gamma\in\mathcal{M}(X\times X)}\mathcal{ET}(\gamma\,\|\,\mu_1,\mu_2). \tag{25}$$
To highlight the role of the entropy function $F$ and the cost function $c$, we also say that $\mathrm{ET}$ is the cost of the Entropy-Transport problem induced by $(F,c)$.

We are particularly interested in cost functions of the form $c(x_1,x_2)=\ell(d(x_1,x_2))$ for a certain function $\ell\colon[0,\infty)\to[0,\infty]$.

In the next proposition we recall some properties of Entropy-Transport problems (for a proof see [30]).

Proposition 1.

Let us suppose that the Entropy-Transport problem between the measures $\mu_1,\mu_2\in\mathcal{M}(X)$ is feasible, i.e. there exists $\gamma\in\mathcal{M}(X\times X)$ such that $\mathcal{ET}(\gamma\,\|\,\mu_1,\mu_2)<\infty$, and that $F$ is superlinear. Then the infimum in (25) can be replaced by a minimum and the set of minimizers is a compact convex subset of $\mathcal{M}(X\times X)$. Moreover, the map $(\mu_1,\mu_2)\mapsto\mathrm{ET}(\mu_1,\mu_2)$ is convex and positively 1-homogeneous (thus subadditive).

Remark 1.

An important role in the theory of Entropy-Transport problems is played by the marginal perspective cost $H$, which we now define. Given a number $c\in[0,+\infty)$ and an admissible entropy function $F$, we first introduce the function $H_c:[0,+\infty)\times[0,+\infty)\to[0,+\infty]$ as the lower semicontinuous envelope of the function
$$\tilde H_c(r_1,r_2) := \inf_{\theta>0}\Big[ F\Big(\frac{\theta}{r_1}\Big) r_1 + F\Big(\frac{\theta}{r_2}\Big) r_2 + \theta c \Big].$$
If $c=+\infty$ we set $H_\infty(r_1,r_2) = F(0)r_1 + F(0)r_2$. When $c:X\times X\to[0,+\infty]$ is a cost function, the induced marginal perspective cost $H : X\times[0,+\infty)\times X\times[0,+\infty)\to[0,+\infty]$ is defined as
$$H(x_1,r;x_2,t) := H_{c(x_1,x_2)}(r,t). \tag{26}$$
One can give an equivalent formulation of problem (25) in terms of the marginal perspective cost (see [30, Theorem 5.8]). Moreover, the metric properties of the Entropy-Transport cost $\mathrm{ET}$ defined in (25) can be read in terms of the properties of $H$, studied as a function on the space $\mathfrak{C}(X)\times\mathfrak{C}(X)$, where $\mathfrak{C}(X)$ denotes the cone over $X$. This point of view, which links the Entropy-Transport structure with the conical geometry of the problem, has been deeply investigated by Liero,

Mielke and Savaré for the Hellinger-Kantorovich distance [30, Section 7] (see also [11, 14] and [15, Chapters 3, 4] for general marginal perspective functions). For brevity, we do not enter into the details of these formulations; we only remark that, for $a\in(0,1]$, the cost $\mathrm{ET}^a$ induces a distance on the space of measures if $H^a$ is a distance on the cone. In general, it is not difficult to identify conditions on $F$ and $c$ for which the induced function $H$ is nonnegative, symmetric and satisfies $H(x_1,r;x_2,t)=0$ if and only if $(x_1,r)=(x_2,t)$ as points on the cone (see [14, Proposition 4]); on the contrary, proving the triangle inequality for (a power of) $H$ is a much more challenging problem.

2.1. Regular Entropy-Transport distances.

In the next definition we introduce the class of regular Entropy-Transport distances.

Definition 3.

We say that $D_{ET}$ is a regular Entropy-Transport distance if
• there exist $a\in(0,1]$, $F\in\Gamma(\mathbb{R}_+)$ and a function $\ell:[0,\infty)\to[0,\infty]$ such that for every complete and separable metric space $(X,\mathsf d)$, setting $c(x_1,x_2):=\ell(\mathsf d(x_1,x_2))$, the function $D_{ET}$ coincides with the power $a$ of the Entropy-Transport cost $\mathrm{ET}$ induced by $(F,c)$, namely
$$D_{ET}(\mu_1,\mu_2) = \mathrm{ET}^a(\mu_1,\mu_2) \quad \text{for every } \mu_1,\mu_2\in\mathcal{M}(X); \tag{27}$$
• the function $\ell$ is continuous, convex and $\ell(s)=0$ if and only if $s=0$;
• $F$ is superlinear and finite valued;
• for every complete and separable metric space $(X,\mathsf d)$, the related Entropy-Transport distance $D_{ET}$ is a complete and separable metric on $\mathcal{M}(X)$ inducing the weak topology.

We also say that the distance $D_{ET}$ is induced by $(a,F,\ell)$. We notice that if $D_{ET}$ is a regular Entropy-Transport distance induced by $(a,F,\ell)$, then $\ell$ is an increasing function and $\lim_{d\to+\infty}\ell(d)=+\infty$: indeed, convexity together with $\ell(0)=0$ gives $\ell(s)\le\frac{s}{t}\,\ell(t)$ for $0<s<t$, and in particular $\ell(t)\ge t\,\ell(1)$ for $t\ge1$, with $\ell(1)>0$.

We conclude the section with a list of examples of regular Entropy-Transport distances.

Examples. (1)

Hellinger-Kantorovich:

Let $F(s)=U_1(s):=s\log s-s+1$ and
$$\ell_{HK}(d) := \begin{cases} -\log\big(\cos^2(d)\big) & \text{if } d<\pi/2,\\ +\infty & \text{otherwise}.\end{cases}$$
It is proved in [30, Section 7] that $(1/2, U_1, \ell_{HK})$ induces a regular Entropy-Transport distance, called the Hellinger-Kantorovich distance. We refer also to [29] for a discussion on "weighted versions" of the Hellinger-Kantorovich distance.

(2)

Gaussian Hellinger-Kantorovich:

Let $F(s)=U_1(s)=s\log s-s+1$ and $\ell(d):=d^2$. The triple $(1/2,U_1,\ell)$ induces a regular Entropy-Transport distance, as discussed in [30, Section 7.8]. It is called the Gaussian Hellinger-Kantorovich distance.

(3)

Quadratic power-like distances:

Let $F(s)=U_p(s):=\dfrac{s^p-p(s-1)-1}{p(p-1)}$, $p>1$, and $\ell(d)=d^2$. Then, for every $p>1$ not exceeding the threshold established in [14, Theorem 6 and Corollary 1], the triple $(1/2,U_p,\ell)$ induces a regular Entropy-Transport distance. We notice that the class of entropy functions $\{U_p\}$ satisfies $\lim_{p\to1}U_p(s)=U_1(s)$, which justifies the notation we have used (see also [30, Example 2.5]).

(4) Linear power-like distances:

Let $F(s)=U_p(s):=\dfrac{s^p-p(s-1)-1}{p(p-1)}$, $p>1$, and $\ell(d):=d$. For every $p>1$, $(1/2,U_p,\ell)$ induces a regular Entropy-Transport distance (see again [14, Theorem 6 and Corollary 1]).

3. Sturm-Entropy-Transport distance

We say that two metric measure spaces $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ are isomorphic if there exists an isometry $\psi:\mathrm{supp}(\mu_1)\to\mathrm{supp}(\mu_2)$ such that $\psi_\sharp\mu_1=\mu_2$, where $\psi_\sharp$ denotes the push-forward through the map $\psi$. A necessary condition for being isomorphic is that $\mu_1(X_1)=\mu_2(X_2)$. The family of all isomorphism classes of metric measure spaces will be denoted by $\mathbb{X}$. From now on, we will identify a metric measure space with its class.

We now recall the definition of the $\mathbb{D}_p$-distance due to Sturm.

Definition 4 ([41]). Fix $p\ge1$. Let $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ be two metric measure spaces; the Sturm $\mathbb{D}_p$-distance is defined as
$$\mathbb{D}_p\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big) := \inf W_p\big((\psi_1)_\sharp\mu_1,(\psi_2)_\sharp\mu_2\big), \tag{28}$$
where the infimum is taken over all complete and separable metric spaces $(\hat X,\hat{\mathsf d})$ with isometric embeddings $\psi_1:\mathrm{supp}(\mu_1)\to\hat X$ and $\psi_2:\mathrm{supp}(\mu_2)\to\hat X$.

It is proved in [41, Theorem 3.6] that $\mathbb{D}_p$ is a complete, separable and geodesic metric on the set $\mathbb{X}_{1,p}:=\{(X,\mathsf d,\mu)\in\mathbb{X}:\ \mu\in\mathcal{P}_p(X,\mathsf d)\}$. We are now going to define the Sturm-Entropy-Transport distance in a similar way.
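The definition below is built on one of the triples $(a,F,\ell)$ listed among the examples of regular Entropy-Transport distances above. As an illustration only, the following Python sketch (our own; the names `U`, `ell_HK`, `golden_min` and `H_numeric` are hypothetical) evaluates the entropies $U_p$ and the cost $\ell_{HK}$, and computes the function $\tilde H_c$ of Remark 1 by a one-dimensional convex minimisation in $\theta$. For $F=U_1$ a direct computation, which we add here and which is not taken from the text, gives the minimiser $\theta=\sqrt{r_1r_2}\,e^{-c/2}$ and the value $r_1+r_2-2\sqrt{r_1r_2}\,e^{-c/2}$; with $c=\ell_{HK}(d)$ and $d<\pi/2$ this becomes the law-of-cosines expression $r_1+r_2-2\sqrt{r_1r_2}\cos d$, one way to see the conical geometry mentioned in Remark 1.

```python
import math

def U(p, s):
    """Power-like entropy U_p(s) = (s**p - p*(s - 1) - 1) / (p*(p - 1)) for p > 1;
    p == 1 gives U_1(s) = s*log(s) - s + 1, extended by U_1(0) = 1."""
    if p == 1:
        return 1.0 if s == 0 else s * math.log(s) - s + 1.0
    return (s**p - p * (s - 1) - 1) / (p * (p - 1))

def ell_HK(d):
    """Hellinger-Kantorovich cost: -log(cos(d)**2) if d < pi/2, +inf otherwise."""
    return -math.log(math.cos(d) ** 2) if d < math.pi / 2 else math.inf

def golden_min(g, a, b, iters=200):
    """Approximate minimum value of a convex function g on [a, b] (golden-section search)."""
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return g((a + b) / 2)

def H_numeric(F, c, r1, r2):
    """inf over theta > 0 of F(theta/r1)*r1 + F(theta/r2)*r2 + theta*c, for r1, r2 > 0.
    For c = +inf we return F(0)*(r1 + r2), as prescribed in Remark 1."""
    if math.isinf(c):
        return F(0.0) * (r1 + r2)
    g = lambda t: F(t / r1) * r1 + F(t / r2) * r2 + t * c
    # Assuming F is convex with minimum at 1 and c >= 0, g is convex and nondecreasing
    # for theta >= max(r1, r2), so the search interval [0, max(r1, r2)] suffices.
    return golden_min(g, 0.0, max(r1, r2))

F1 = lambda s: U(1, s)
r1, r2, d = 2.0, 0.5, 0.7
print(H_numeric(F1, ell_HK(d), r1, r2), r1 + r2 - 2 * math.sqrt(r1 * r2) * math.cos(d))
```

The golden-section search is used only because the function of $\theta$ is convex; any one-dimensional convex minimiser would serve.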

Deﬁnition 5.

Let $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ be two metric measure spaces. We define the Sturm-Entropy-Transport distance induced by the regular Entropy-Transport distance $D_{ET}$ as
$$\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big) := \inf D_{ET}\big((\psi_1)_\sharp\mu_1,(\psi_2)_\sharp\mu_2\big), \tag{29}$$
where the infimum is taken over all complete and separable metric spaces $(\hat X,\hat{\mathsf d})$ with isometric embeddings $\psi_1:\mathrm{supp}(\mu_1)\to\hat X$ and $\psi_2:\mathrm{supp}(\mu_2)\to\hat X$.

It is not difficult to prove that the definition is well posed. Indeed, let us suppose $(X_i',\mathsf d_i',\mu_i')$ is isomorphic to $(X_i,\mathsf d_i,\mu_i)$ through the map $\varphi_i$, $i=1,2$. Then, for every metric space $\hat X$ and every isometric embedding $\psi_i:\mathrm{supp}(\mu_i)\to\hat X$, $i=1,2$, we have that
$$D_{ET}\big((\psi_1\circ\varphi_1)_\sharp\mu_1',(\psi_2\circ\varphi_2)_\sharp\mu_2'\big) = D_{ET}\big((\psi_1)_\sharp\mu_1,(\psi_2)_\sharp\mu_2\big).$$
It is often convenient to work with explicit realisations of the ambient space $(\hat X,\hat{\mathsf d})$; a particularly useful one is given by the disjoint union, which we now discuss. Given two metric measure spaces $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$, let $X_1\sqcup X_2$ be their disjoint union. We say that a (resp. pseudo-)metric $\hat{\mathsf d}$ on $X_1\sqcup X_2$ is a (resp. pseudo-)metric coupling between $\mathsf d_1$ and $\mathsf d_2$ if $\hat{\mathsf d}(x,y)=\mathsf d_1(x,y)$ when $x,y\in X_1$ and $\hat{\mathsf d}(x,y)=\mathsf d_2(x,y)$ when $x,y\in X_2$.
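For finite spaces the notion of (pseudo-)metric coupling can be tested directly, and the two constructions (30) and (31) given next can be written out explicitly. The sketch below is our own illustration, not code from the paper, and all function names are hypothetical. A distance matrix on $X_1\sqcup X_2$ lists the points of $X_1$ first; `bridge_coupling` implements (30) with base points `i1`, `i2` and bridge length `c`, and `shift_cross` implements (31). The usage lines show that with a zero bridge the construction may only produce a pseudo-metric coupling, which becomes a metric after the shift by $\delta$.

```python
from itertools import product

def is_pseudometric(D, tol=1e-12):
    """Zero diagonal, nonnegativity, symmetry and triangle inequality of a square matrix."""
    idx = range(len(D))
    if any(abs(D[i][i]) > tol for i in idx):
        return False
    if any(D[i][j] < -tol or abs(D[i][j] - D[j][i]) > tol for i, j in product(idx, idx)):
        return False
    return all(D[i][k] <= D[i][j] + D[j][k] + tol for i, j, k in product(idx, idx, idx))

def is_metric(D, tol=1e-12):
    """A pseudo-metric that separates points."""
    n = len(D)
    return is_pseudometric(D, tol) and all(D[i][j] > tol for i in range(n) for j in range(n) if i != j)

def is_coupling(D, d1, d2, tol=1e-12):
    """D restricts to d1 on the first len(d1) points and to d2 on the remaining ones."""
    n1, n2 = len(d1), len(d2)
    return (is_pseudometric(D, tol)
            and all(abs(D[i][j] - d1[i][j]) <= tol for i in range(n1) for j in range(n1))
            and all(abs(D[n1 + i][n1 + j] - d2[i][j]) <= tol for i in range(n2) for j in range(n2)))

def bridge_coupling(d1, d2, c, i1=0, i2=0):
    """Construction (30): join the base points X1[i1] and X2[i2] by a bridge of length c."""
    n1, n2 = len(d1), len(d2)
    D = [[0.0] * (n1 + n2) for _ in range(n1 + n2)]
    for i, j in product(range(n1), range(n1)):
        D[i][j] = d1[i][j]
    for i, j in product(range(n2), range(n2)):
        D[n1 + i][n1 + j] = d2[i][j]
    for i, j in product(range(n1), range(n2)):
        D[i][n1 + j] = D[n1 + j][i] = d1[i][i1] + c + d2[i2][j]
    return D

def shift_cross(D, n1, delta):
    """Construction (31): add delta to every distance between a point of X1 and one of X2."""
    n = len(D)
    return [[D[i][j] + (delta if (i < n1) != (j < n1) else 0.0) for j in range(n)]
            for i in range(n)]

d1 = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]    # three points 0, 1, 3 on a line
D0 = bridge_coupling(d1, [[0.0]], c=0.0)  # a single point glued onto X1[0]
print(is_coupling(D0, d1, [[0.0]]), is_metric(D0), is_metric(shift_cross(D0, 3, 0.1)))
```

The brute-force triangle check is cubic in the number of points, which is adequate for the small configurations used here.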

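A finite valued metric coupling $\hat{\mathsf d}$ between $\mathsf d_1$ and $\mathsf d_2$ always exists: to construct it, fix two points $\bar x_1\in X_1$, $\bar x_2\in X_2$, a number $c>0$, and define $\hat{\mathsf d}$ as
$$\hat{\mathsf d}(x,y) := \begin{cases} \mathsf d_1(x,y) & \text{if } x,y\in X_1,\\ \mathsf d_2(x,y) & \text{if } x,y\in X_2,\\ \mathsf d_1(x,\bar x_1)+c+\mathsf d_2(\bar x_2,y) & \text{if } x\in X_1,\ y\in X_2,\\ \mathsf d_1(y,\bar x_1)+c+\mathsf d_2(\bar x_2,x) & \text{if } y\in X_1,\ x\in X_2. \end{cases} \tag{30}$$
Moreover, from any finite valued pseudo-metric coupling $\hat{\mathsf d}$ of $\mathsf d_1$ and $\mathsf d_2$ and any $\delta>0$ we can obtain a complete and separable metric $\hat{\mathsf d}_\delta$ which is again a coupling of $\mathsf d_1$ and $\mathsf d_2$ in the following way:
$$\hat{\mathsf d}_\delta := \begin{cases} \hat{\mathsf d} & \text{on } (X_1\times X_1)\sqcup(X_2\times X_2),\\ \hat{\mathsf d}+\delta & \text{on } (X_1\times X_2)\sqcup(X_2\times X_1). \end{cases} \tag{31}$$
We say that a measure $\gamma\in\mathcal{M}(X_1\times X_2)$ is a measure coupling between $\mu_1$ and $\mu_2$ if
$$\gamma(A\times X_2)=\mu_1(A) \quad\text{and}\quad \gamma(X_1\times B)=\mu_2(B) \tag{32}$$
for all Borel sets $A\subset X_1$ and $B\subset X_2$. We keep the notation $\gamma_i$ for the marginals of the measure $\gamma\in\mathcal{M}(X_1\times X_2)$, $i=1,2$. A more explicit formulation of the function $\mathbb{D}_{ET}$ is given in the following Proposition.

Proposition 2.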

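Let $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ be two metric measure spaces and $D_{ET}$ a regular Entropy-Transport distance induced by $(a,F,\ell)$.
(i) In Definition 5 we can suppose without loss of generality that $\hat X=X_1\sqcup X_2$, that $\psi_1=\iota_1$, $\psi_2=\iota_2$ are respectively the inclusions of $X_1$ and $X_2$ in $X_1\sqcup X_2$, and that the infimum is taken over all the pseudo-metric couplings $\hat{\mathsf d}$ between $\mathsf d_1$ and $\mathsf d_2$.
(ii) In the situation of (i) we identify $\mu_k$ with $(\iota_k)_\sharp\mu_k$, $k=1,2$, and it holds
$$\mathbb{D}_{ET}^{1/a}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big) = \inf_{\mathcal{C}}\Big\{\sum_{i=1}^2\mathcal{D}_F(\gamma_i\,\|\,\mu_i)+\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\gamma\Big\}, \tag{33}$$
where
$$\mathcal{C} := \big\{(\gamma,\hat{\mathsf d}):\ \gamma\in\mathcal{M}(X_1\times X_2),\ \hat{\mathsf d} \text{ finite valued pseudo-metric coupling for } \mathsf d_1,\mathsf d_2\big\}. \tag{34}$$

Proof. (i)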

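We first show that the infimum as in (i) is less than or equal to the infimum as in Definition 5. Let $(\hat X,\hat{\mathsf d})$ be a complete and separable metric space with isometric embeddings $\psi_1:\mathrm{supp}(\mu_1)\to\hat X$, $\psi_2:\mathrm{supp}(\mu_2)\to\hat X$, and let $\hat\gamma\in\mathcal{M}(\hat X\times\hat X)$. It is immediate to check that
$$\tilde{\mathsf d}(x_1,x_2) := \begin{cases} \mathsf d_1(x_1,x_2) & \text{if } (x_1,x_2)\in X_1\times X_1,\\ \mathsf d_2(x_1,x_2) & \text{if } (x_1,x_2)\in X_2\times X_2,\\ \inf_{y_1\in\mathrm{supp}(\mu_1),\,y_2\in\mathrm{supp}(\mu_2)} \mathsf d_1(x_1,y_1)+\hat{\mathsf d}(\psi_1(y_1),\psi_2(y_2))+\mathsf d_2(y_2,x_2) & \text{if } (x_1,x_2)\in X_1\times X_2,\\ \inf_{y_1\in\mathrm{supp}(\mu_1),\,y_2\in\mathrm{supp}(\mu_2)} \mathsf d_2(x_1,y_2)+\hat{\mathsf d}(\psi_2(y_2),\psi_1(y_1))+\mathsf d_1(y_1,x_2) & \text{if } (x_1,x_2)\in X_2\times X_1 \end{cases} \tag{35}$$
defines a pseudo-metric on $X_1\sqcup X_2$, which is a coupling between $\mathsf d_1$ and $\mathsf d_2$. Moreover, setting
$$\Psi:\psi_1(\mathrm{supp}(\mu_1))\cup\psi_2(\mathrm{supp}(\mu_2))\subset\hat X\to X_1\sqcup X_2,\qquad \Psi(x) := \begin{cases} \iota_1\big(\psi_1^{-1}(x)\big) & \text{if } x\in\psi_1(\mathrm{supp}(\mu_1)),\\ \iota_2\big(\psi_2^{-1}(x)\big) & \text{if } x\in\psi_2(\mathrm{supp}(\mu_2)),\ x\notin\psi_1(\mathrm{supp}(\mu_1)), \end{cases}$$
and using Lemma 4, it is immediate to check that $\tilde\gamma:=(\Psi,\Psi)_\sharp\hat\gamma\in\mathcal{M}\big((X_1\sqcup X_2)\times(X_1\sqcup X_2)\big)$ satisfies
$$\begin{aligned} &\sum_{i=1}^2\mathcal{D}_F\big(\tilde\gamma_i\,\|\,(\iota_i)_\sharp\mu_i\big)+\int_{\iota_1(X_1)\times\iota_2(X_2)}\ell\big(\tilde{\mathsf d}(x,y)\big)\,\mathrm{d}\tilde\gamma(x,y)\\ &\quad=\sum_{i=1}^2\mathcal{D}_F\big(\hat\gamma_i\,\|\,(\psi_i)_\sharp\mu_i\big)+\int_{\psi_1(X_1)\times\psi_2(X_2)}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\hat\gamma(x,y)\\ &\quad\le\sum_{i=1}^2\mathcal{D}_F\big(\hat\gamma_i\,\|\,(\psi_i)_\sharp\mu_i\big)+\int_{\hat X\times\hat X}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\hat\gamma(x,y). \end{aligned} \tag{36}$$
This yields that the infimum as in (i) is less than or equal to the infimum as in Definition 5. To show that the infimum as in Definition 5 is less than or equal to the infimum as in (i), it is sufficient to notice that for every pseudo-metric coupling $\hat{\mathsf d}$, for every measure $\gamma\in\mathcal{M}(X_1\times X_2)$ and for every $\epsilon>0$ there is $\delta>0$ such that the complete and separable metric $\hat{\mathsf d}_\delta$ defined in (31) is a coupling between $\mathsf d_1$ and $\mathsf d_2$ satisfying
$$\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}_\delta(x,y)\big)\,\mathrm{d}\gamma \le \int_{X_1\times X_2}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\gamma+\epsilon, \tag{37}$$
as a consequence of the finiteness of the measure $\gamma$ and the continuity of $\ell$.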
(ii) If the infimum runs over the pairs $(\gamma,\hat{\mathsf d})\in\mathcal{C}$ such that $\hat{\mathsf d}$ is a complete and separable metric, the inequality "$\le$" in (33) is a simple consequence of the explicit formulation of the Entropy-Transport problem together with the fact that the superlinearity of $F$ allows us to consider only measures $\gamma\in\mathcal{M}\big((X_1\sqcup X_2)\times(X_1\sqcup X_2)\big)$ with support contained in $X_1\times X_2$. The fact that "$\le$" holds in (33) even if the infimum is taken over the larger set $\mathcal{C}$ is a consequence of (37). The proof of the inequality "$\ge$" in (33) is analogous to the first part of the proof of (i), see in particular (36). $\square$

In the next Lemma we collect some of the basic properties of the function $\mathbb{D}_{ET}$.

Lemma 5.

Let $D_{ET}$ be a regular Entropy-Transport distance induced by $(a,F,\ell)$.
(i) For any $M\ge0$ it holds
$$\mathbb{D}_{ET}\big((X_1,\mathsf d_1,M\mu_1),(X_2,\mathsf d_2,M\mu_2)\big) = M^a\,\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big). \tag{38}$$
(ii) If $(X_1,\mathsf d_1)=(X_2,\mathsf d_2)$ then
$$\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_1,\mathsf d_1,\mu_2)\big) \le D_{ET}(\mu_1,\mu_2). \tag{39}$$
(iii) The set
$$\mathbb{X}^* := \Big\{(X,\mathsf d,\mu)\in\mathbb{X}:\ \mathrm{supp}(\mu)=\{x_1,\dots,x_n\},\ n\in\mathbb{N},\ \mu=M\sum_{i=1}^n\delta_{x_i},\ M\in\mathbb{R}_+\Big\} \tag{40}$$
is dense in $(\mathbb{X},\mathbb{D}_{ET})$.
(iv) If
$$\mu=M\sum_{i=1}^n\delta_{x_i}\quad\text{and}\quad\mu'=M\sum_{i=1}^n\delta_{x_i'}, \tag{41}$$
then
$$\mathbb{D}_{ET}^{1/a}\big((X,\mathsf d,\mu),(X',\mathsf d',\mu')\big) \le Mn\,\ell\Big(\sup_{i,j}|d_{ij}-d'_{ij}|\Big), \tag{42}$$
where we put $d_{ij}=\mathsf d(x_i,x_j)$ and $d'_{ij}=\mathsf d'(x_i',x_j')$.
(v) For any $N>1$ there exists a constant $C$ such that for every $M$ with $1/N<M<N$ we have
$$\mathbb{D}_{ET}^{1/a}\big((X,\mathsf d,\mu),(X,\mathsf d,M\mu)\big) \le C\,\mu(X)\,|M-1|. \tag{43}$$

Proof. (i) This is a consequence of the 1-homogeneity of the cost $\mathrm{ET}$ (Proposition 1) and of the push-forward map, together with the definitions of $D_{ET}$ and $\mathbb{D}_{ET}$.
(ii) The result follows from the definition of $\mathbb{D}_{ET}$, since $(\hat X,\hat{\mathsf d})=(X_1,\mathsf d_1)$ with $\psi_1=\psi_2=\mathrm{Id}$ is an admissible competitor for the infimum.
(iii) The result follows from point (ii) of the present Lemma, the fact that $D_{ET}$ metrizes the weak convergence and the density in $\mathcal{M}(X)$, with respect to weak convergence, of the measures of the form $M\sum_{i=1}^n\delta_{x_i}$.
(iv) Let us assume without loss of generality that $X=\{x_1,\dots,x_n\}$ and $X'=\{x_1',\dots,x_n'\}$. We put $\delta=\sup_{i,j}|d_{ij}-d'_{ij}|$. We construct the following pseudo-metric coupling: on $X\times X$ we define $\hat{\mathsf d}=\mathsf d$, on $X'\times X'$ we put $\hat{\mathsf d}=\mathsf d'$, on $X\times X'$ we define
$$\hat{\mathsf d}(x_i,x_j') := \inf_{k\in\{1,\dots,n\}}\mathsf d(x_i,x_k)+\mathsf d'(x_k',x_j')+\delta,$$
and finally on $X'\times X$ we put
$$\hat{\mathsf d}(x_i',x_j) := \inf_{k\in\{1,\dots,n\}}\mathsf d(x_j,x_k)+\mathsf d'(x_k',x_i')+\delta,$$
so that $\hat{\mathsf d}(x_i,x_i')=\hat{\mathsf d}(x_i',x_i)=\delta$.
We then define the measure coupling $\gamma=M\sum_{i=1}^n\delta_{(x_i,x_i')}$. It is straightforward to see that $\hat{\mathsf d}$ and $\gamma$ are couplings between $\mathsf d,\mathsf d'$ and $\mu,\mu'$, respectively. Then, using Proposition 2 and recalling that $\ell$ is an increasing function, we have that
$$\mathbb{D}_{ET}^{1/a}\big((X,\mathsf d,\mu),(X',\mathsf d',\mu')\big) \le \int_{X\times X'}\ell(\delta)\,\mathrm{d}\gamma = Mn\,\ell(\delta),$$
and the thesis follows.
(v) We can take $\mathsf d$ itself as metric coupling. Then, by point (ii) of the present Lemma, we have
$$\mathbb{D}_{ET}^{1/a}\big((X,\mathsf d,\mu),(X,\mathsf d,M\mu)\big) \le D_{ET}^{1/a}(\mu,M\mu) = \mathrm{ET}(\mu,M\mu).$$
By replacing the cost $c$ with the cost
$$c_\infty(x_1,x_2) := \begin{cases} 0 & \text{if } x_1=x_2,\\ +\infty & \text{otherwise},\end{cases}$$
we obtain that $\mathrm{ET}(\mu,M\mu)\le\mathrm{ET}_\infty(\mu,M\mu)$, where we have denoted by $\mathrm{ET}_\infty$ the Entropy-Transport problem induced by the entropy function $F$ and the cost $c_\infty$. Observe that, by convexity and since $F(1)=0$, every admissible entropy function satisfies
$$F(s)\le C|s-1| \quad\text{for every } 1/N<s<N, \tag{44}$$
where
$$C := \max\Big\{\frac{F(1/N)}{1-1/N},\ \frac{F(N)}{N-1}\Big\}.$$
The conclusion now follows from an explicit computation of $\mathrm{ET}_\infty$ together with the bound (44). Indeed, we have (see [30, Example E.5]), with $\theta$ ranging over the interval with endpoints $1$ and $M$,
$$\mathrm{ET}_\infty(\mu,M\mu) \le \min_{\theta}\int_X C|\theta-1|+CM|\theta/M-1|\,\mathrm{d}\mu = C\,\mu(X)\,|M-1|. \tag{45}$$
$\square$

The next Lemma shows the existence of the optimal couplings.
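Before moving to Lemma 6, the computation behind (45) can be checked numerically. The sketch below is an illustration under stated assumptions, with hypothetical function names: it takes $F=U_1(s)=s\log s-s+1$, restricts to constant densities $\theta$ as in (45), and minimises $F(\theta)+M\,F(\theta/M)$ over $\theta$ between $1$ and $M$, which bounds $\mathrm{ET}_\infty(\mu,M\mu)/\mu(X)$ from above. For this $F$ a direct computation (ours, not from the text) gives the minimiser $\theta=\sqrt M$ and the value $(\sqrt M-1)^2$, which indeed stays below $C|M-1|$ with $C$ as in (44).

```python
import math

def U1(s):
    """Boltzmann entropy U_1(s) = s*log(s) - s + 1, extended by U_1(0) = 1."""
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0

def diagonal_cost(M, F=U1, iters=200):
    """min over theta between 1 and M of F(theta) + M*F(theta/M): an upper bound for
    ET_inf(mu, M*mu) / mu(X) using constant densities. The function is convex in theta,
    so a golden-section search is enough."""
    g = lambda t: F(t) + M * F(t / M)
    a, b = min(1.0, M), max(1.0, M)
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return g((a + b) / 2)

def constant_C(N, F=U1):
    """The constant C of (44): max{F(1/N) / (1 - 1/N), F(N) / (N - 1)}, for N > 1."""
    return max(F(1 / N) / (1 - 1 / N), F(N) / (N - 1))

N = 3.0
for M in (0.5, 1.5, 2.5):
    # numerical minimum, closed form (sqrt(M) - 1)**2, and the bound C*|M - 1| of (45)
    print(M, diagonal_cost(M), (math.sqrt(M) - 1) ** 2, constant_C(N) * abs(M - 1))
```

The closed form $(\sqrt M-1)^2$ is specific to $U_1$; for other admissible entropies only the linear bound of (45) is claimed in the text.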

Lemma 6.

Let $D_{ET}$ be a regular Entropy-Transport distance induced by $(a,F,\ell)$. Let $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ be two metric measure spaces. Then:
(i) There exist a measure $\gamma\in\mathcal{M}(X_1\times X_2)$ and a pseudo-metric coupling $\hat{\mathsf d}$ between $\mathsf d_1$ and $\mathsf d_2$ such that
$$\mathbb{D}_{ET}^{1/a}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big) = \sum_{i=1}^2\mathcal{D}_F(\gamma_i\,\|\,\mu_i)+\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\gamma. \tag{46}$$
(ii) There exist a complete and separable metric space $(\tilde X,\tilde{\mathsf d})$ and isometric embeddings $\psi_1:\mathrm{supp}(\mu_1)\to\tilde X$, $\psi_2:\mathrm{supp}(\mu_2)\to\tilde X$ such that
$$\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big) = (D_{ET})_{\tilde{\mathsf d}}\big((\psi_1)_\sharp\mu_1,(\psi_2)_\sharp\mu_2\big), \tag{47}$$
where we have denoted by $(D_{ET})_{\tilde{\mathsf d}}$ the Entropy-Transport distance computed in the space $(\tilde X,\tilde{\mathsf d})$.

Proof. (i) Step 1: tightness of the plans. By Proposition 2 there exist a sequence $\gamma_n\in\mathcal{M}(X_1\times X_2)$ and pseudo-metric couplings $\hat{\mathsf d}_n$ of $\mathsf d_1,\mathsf d_2$ such that
$$\sum_{i=1}^2\mathcal{D}_F\big((\gamma_n)_i\,\|\,\mu_i\big)+\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}_n(x,y)\big)\,\mathrm{d}\gamma_n < \mathbb{D}_{ET}^{1/a}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)+\frac1n. \tag{48}$$
Since the entropy functionals from the fixed measures $\mu_1$ and $\mu_2$ are bounded, we can apply Theorem 1 and Lemma 3 in order to obtain the existence of subsequences (from now on we will not relabel them) such that $(\gamma_n)_i$ converges weakly to some $\gamma_i\in\mathcal{M}(X_i)$, $i=1,2$. Since the $(\gamma_n)_i$ are the marginals of $\gamma_n$, the tightness of $(\gamma_n)_i$ implies the tightness of $\gamma_n$, so that, by Prokhorov's theorem, the sequence $\gamma_n\in\mathcal{M}(X_1\times X_2)$ converges weakly (up to a further subsequence) to some $\gamma$. Moreover, by the continuity of the operator $\pi^i_\sharp$ with respect to the weak topology, the marginals of $\gamma$ coincide with $\gamma_i$, $i=1,2$. We notice that if $\gamma$ is the null measure the proof is concluded by taking any pseudo-metric coupling $\hat{\mathsf d}$ between $\mathsf d_1$ and $\mathsf d_2$.

Step 2: pre-compactness of the pseudo-metric couplings. Regarding the sequence $\hat{\mathsf d}_n$, by the triangle inequality we have that
$$|\hat{\mathsf d}_n(x_1,y_1)-\hat{\mathsf d}_n(x_2,y_2)| \le \mathsf d_1(x_1,x_2)+\mathsf d_2(y_1,y_2).$$
In particular, $\hat{\mathsf d}_n$ is uniformly 1-Lipschitz with respect to the complete and separable metric $\mathsf d_1+\mathsf d_2$ on $X_1\times X_2$. We claim that it is also uniformly bounded at a point. To see this, take $(\bar x,\bar y)\in\mathrm{supp}(\gamma)$: since $\gamma_n$ weakly converges to $\gamma$, for every $r,\epsilon>0$ and for all $n$ sufficiently large we have
$$\gamma_n\big(B_r(\bar x)\times B_r(\bar y)\big)\ge\gamma\big(B_r(\bar x)\times B_r(\bar y)\big)-\epsilon.$$
Fix $r>0$ and suppose by contradiction that there exists a subsequence (not relabeled) such that $2r\le\hat{\mathsf d}_n(\bar x,\bar y)\to+\infty$. For $\epsilon=\epsilon(r)$ small enough, from (48), the fact that $(\bar x,\bar y)\in\mathrm{supp}(\gamma)$ and the monotonicity of $\ell$, we infer the existence of some positive constants $C,c$ such that for all $n$ sufficiently large
$$\begin{aligned} C &> \int_{X_1\times X_2}\ell\big(\hat{\mathsf d}_n(x,y)\big)\,\mathrm{d}\gamma_n(x,y) \ge \int_{B_r(\bar x)\times B_r(\bar y)}\ell\big(\hat{\mathsf d}_n(\bar x,\bar y)-2r\big)\,\mathrm{d}\gamma_n(x,y)\\ &\ge \ell\big(\hat{\mathsf d}_n(\bar x,\bar y)-2r\big)\big[\gamma\big(B_r(\bar x)\times B_r(\bar y)\big)-\epsilon\big] \ge c\,\ell\big(\hat{\mathsf d}_n(\bar x,\bar y)-2r\big). \end{aligned}$$
Since $\ell$ has bounded sublevels, this implies that there exists a constant $K$ such that $\hat{\mathsf d}_n(\bar x,\bar y)<K$ for every $n$, which leads to a contradiction.

We can thus apply the Ascoli-Arzelà theorem to infer the existence of a limit function $\hat{\mathsf d}:X_1\times X_2\to[0,\infty)$ such that $\hat{\mathsf d}_n$ converges (up to a subsequence) pointwise to $\hat{\mathsf d}$, and the convergence is uniform on compact sets. We can extend $\hat{\mathsf d}$ to $(X_1\sqcup X_2)\times(X_1\sqcup X_2)$ in order to get a limit pseudo-metric coupling, which we denote in the same way.

Step 3: passing to the limit. Next, we pass to the limit in the expression
$$\sum_{i=1}^2\mathcal{D}_F\big((\gamma_n)_i\,\|\,\mu_i\big)+\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}_n(x,y)\big)\,\mathrm{d}\gamma_n.$$
By Lemma 2, the entropy is jointly lower semicontinuous and thus $\liminf_n\mathcal{D}_F\big((\gamma_n)_i\,\|\,\mu_i\big)\ge\mathcal{D}_F(\gamma_i\,\|\,\mu_i)$. So, it is sufficient to prove that
$$\liminf_n\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}_n(x,y)\big)\,\mathrm{d}\gamma_n\ge\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\gamma. \tag{49}$$
Using the equi-tightness of $\{\gamma_k\}$ we can find a sequence of compact sets $K_{1,n}\subset X_1$ and $K_{2,n}\subset X_2$ such that $\gamma_k\big(X_1\times X_2\setminus(K_{1,n}\times K_{2,n})\big)\le 1/n$ for every $k$. We define $\ell_m(r):=\min(\ell(r),m)$, so that the sequence of functions $(x,y)\mapsto\ell_m(\hat{\mathsf d}_n(x,y))$ converges uniformly on compact subsets of $X_1\times X_2$ as $n\to\infty$. Possibly taking a further subsequence via a diagonal argument, we can infer that $\|\ell_m(\hat{\mathsf d})-\ell_m(\hat{\mathsf d}_n)\|_{\infty;n}\to0$ as $n\to\infty$, where we denote by $\|\cdot\|_{\infty;n}$ the supremum norm on the set $K_n:=K_{1,n}\times K_{2,n}$. Let $M$ be a positive constant such that $\gamma_n(X_1\times X_2)\le M$ for every $n$. We can bound the integral on the left-hand side of (49) in the following way:
$$\begin{aligned} \int_{X_1\times X_2}\ell(\hat{\mathsf d}_n)\,\mathrm{d}\gamma_n &\ge \int_{X_1\times X_2}\ell_m(\hat{\mathsf d}_n)\,\mathrm{d}\gamma_n \ge \int_{K_n}\ell_m(\hat{\mathsf d}_n)\,\mathrm{d}\gamma_n\\ &\ge \int_{K_n}\ell_m(\hat{\mathsf d})\,\mathrm{d}\gamma_n - M\|\ell_m(\hat{\mathsf d})-\ell_m(\hat{\mathsf d}_n)\|_{\infty;n}\\ &\ge \int_{X_1\times X_2}\ell_m(\hat{\mathsf d})\,\mathrm{d}\gamma_n - M\|\ell_m(\hat{\mathsf d})-\ell_m(\hat{\mathsf d}_n)\|_{\infty;n} - m/n. \end{aligned}$$
Now we can pass to the limit with respect to $n$ using the weak convergence of $\{\gamma_n\}$, and we obtain
$$\liminf_n\int_{X_1\times X_2}\ell(\hat{\mathsf d}_n)\,\mathrm{d}\gamma_n\ge\int_{X_1\times X_2}\ell_m(\hat{\mathsf d})\,\mathrm{d}\gamma;$$
we then conclude using Beppo Levi's monotone convergence theorem with respect to $m$.
(ii) Without loss of generality we assume $\mathrm{supp}(\mu_i)=X_i$. By the previous point we know the existence of an optimal measure $\gamma\in\mathcal{M}(X_1\times X_2)$ and an optimal pseudo-metric coupling $\hat{\mathsf d}$ between $\mathsf d_1$ and $\mathsf d_2$. We consider the complete and separable metric space $(\tilde X,\tilde{\mathsf d})$ constructed as in Lemma 1. Denoting by $p:X_1\sqcup X_2\to\tilde X$ the projection to the quotient and using the identification $X_1\sqcup X_2=X_1\times\{1\}\cup X_2\times\{2\}$, we notice that $X_1\times X_2\hookrightarrow\tilde X\times\tilde X$ via the injective Borel map
$$\psi(x_1,x_2)=\big(\psi_1(x_1),\psi_2(x_2)\big):=\big(p(x_1,1),p(x_2,2)\big).$$
Moreover, $\psi_i$ is an isometry of $(X_i,\mathsf d_i)$ onto its image in $(\tilde X,\tilde{\mathsf d})$, $i=1,2$. Thus, denoting by $\gamma_i$ the marginals of $\gamma$, we can consider the measure $\psi_\sharp\gamma$, whose marginals are $(\psi_1)_\sharp\gamma_1$ and $(\psi_2)_\sharp\gamma_2$.
Using Lemma 4 we know that
$$\mathcal{D}_F(\gamma_i\,\|\,\mu_i)=\mathcal{D}_F\big((\psi_i)_\sharp\gamma_i\,\|\,(\psi_i)_\sharp\mu_i\big),\qquad i=1,2. \tag{50}$$
By recalling the definition of $\tilde{\mathsf d}$, we also have
$$\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\gamma=\int_{\tilde X\times\tilde X}\ell\big(\tilde{\mathsf d}(x,y)\big)\,\mathrm{d}(\psi_\sharp\gamma). \tag{51}$$
Thus, as a consequence of (50), (51) and the optimality of $\gamma$ and $\hat{\mathsf d}$, equality (47) holds on $(\tilde X,\tilde{\mathsf d})$ (with optimal measure $\psi_\sharp\gamma$). $\square$

Remark 2.

It is clear that the optimal coupling $\hat{\mathsf d}$ whose existence is proven in the previous Lemma is in general only a pseudo-metric and not a metric on $X_1\sqcup X_2$. To see this, it is sufficient to consider two isomorphic metric measure spaces $(X_1,\mathsf d_1,\mu_1)$, $(X_2,\mathsf d_2,\mu_2)$. If we denote by $\psi:X_1\to X_2$ the isometry between $(X_1,\mathsf d_1)$ and $(X_2,\mathsf d_2)$, the optimal coupling $\hat{\mathsf d}$ satisfies $\hat{\mathsf d}(x,\psi(x))=0$ for $\mu_1$-a.e. $x$.

The next theorem is the main result of the paper.

Theorem 2.

Let $D_{ET}$ be a regular Entropy-Transport distance induced by $(a,F,\ell)$. Then $(\mathbb{X},\mathbb{D}_{ET})$ is a complete and separable metric space. It is also a length (resp. geodesic) space if $D_{ET}$ is a length (resp. geodesic) metric on $\mathcal{M}(X)$ whenever $(X,\mathsf d)$ is a length (resp. geodesic) space.

Proof.

Step 1: $\mathbb{D}_{ET}$ defines a metric. It is clear that $\mathbb{D}_{ET}$ is symmetric, finite valued, nonnegative and $\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)=0$ if $(X_1,\mathsf d_1,\mu_1)=(X_2,\mathsf d_2,\mu_2)$. We claim that $\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)=0$ implies that the metric measure spaces $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ are isomorphic. By Lemma 6 there exist a measure $\gamma\in\mathcal{M}(X_1\times X_2)$ and a pseudo-metric coupling $\hat{\mathsf d}$ such that
$$\sum_{i=1}^2\mathcal{D}_F\big(\gamma_i\,\|\,\mu_i\big)+\int_{X_1\times X_2}\ell\big(\hat{\mathsf d}(x,y)\big)\,\mathrm{d}\gamma=0.$$
All the terms are nonnegative, so that $\gamma_i=\mu_i$. Moreover, since $\ell(d)=0$ if and only if $d=0$, we have that $\hat{\mathsf d}(x,y)=0$ for $\gamma$-a.e. $(x,y)$ and thus (using the triangle inequality and the fact that $\hat{\mathsf d}$ is a pseudo-metric coupling between $\mathsf d_1$ and $\mathsf d_2$)
$$\hat{\mathsf d}(x,y)=0\quad\text{for all }(x,y)\in\mathrm{supp}(\gamma). \tag{52}$$
Since $\mathsf d_1$ and $\mathsf d_2$ are metrics, it follows that for every $x_1\in\mathrm{supp}(\mu_1)$ there exists a unique $x_2\in\mathrm{supp}(\mu_2)$ such that $(x_1,x_2)\in\mathrm{supp}(\gamma)$. Switching the roles of $X_1$ and $X_2$ in the argument above, we obtain the existence of a bijection $\psi:\mathrm{supp}(\mu_1)\to\mathrm{supp}(\mu_2)$ such that $\gamma=(\mathrm{Id},\psi)_\sharp\mu_1$ (in particular $\psi_\sharp\mu_1=\gamma_2=\mu_2$) and, in virtue of (52),
$$\hat{\mathsf d}(x,\psi(x))=0\quad\text{for all }x\in\mathrm{supp}(\mu_1). \tag{53}$$
Let $x,y\in\mathrm{supp}(\mu_1)$; from (53) and the triangle inequality it follows
$$\begin{aligned} \mathsf d_1(x,y)&=\hat{\mathsf d}(x,y)\le\hat{\mathsf d}(x,\psi(x))+\hat{\mathsf d}(\psi(x),\psi(y))+\hat{\mathsf d}(y,\psi(y))=\mathsf d_2(\psi(x),\psi(y)),\\ \mathsf d_2(\psi(x),\psi(y))&=\hat{\mathsf d}(\psi(x),\psi(y))\le\hat{\mathsf d}(x,\psi(x))+\hat{\mathsf d}(x,y)+\hat{\mathsf d}(y,\psi(y))=\mathsf d_1(x,y), \end{aligned}$$
which implies that $\psi:\mathrm{supp}(\mu_1)\to\mathrm{supp}(\mu_2)$ is an isometry. Hence $(X_1,\mathsf d_1,\mu_1)$ and $(X_2,\mathsf d_2,\mu_2)$ are isomorphic, as claimed.

Regarding the triangle inequality, let $(X_i,\mathsf d_i,\mu_i)$, $i=1,2,3$, be three metric measure spaces.
From the definition of $\mathbb{D}_{ET}$ and Proposition 2, for every $\epsilon>0$ we find a pseudo-metric coupling $\mathsf d_{12}$ between $\mathsf d_1$ and $\mathsf d_2$, and a pseudo-metric coupling $\mathsf d_{23}$ between $\mathsf d_2$ and $\mathsf d_3$, such that
$$\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)\ge(D_{ET})_{\mathsf d_{12}}(\mu_1,\mu_2)-\epsilon,\qquad \mathbb{D}_{ET}\big((X_2,\mathsf d_2,\mu_2),(X_3,\mathsf d_3,\mu_3)\big)\ge(D_{ET})_{\mathsf d_{23}}(\mu_2,\mu_3)-\epsilon,$$
where we have denoted by $(D_{ET})_{\mathsf d}$ the Entropy-Transport distance induced by the pseudo-metric $\mathsf d$. Set $X:=X_1\sqcup X_2\sqcup X_3$ and define a pseudo-metric $\mathsf d$ on $X$ in the following way:
$$\mathsf d(x,y):=\begin{cases} \mathsf d_{12}(x,y) & \text{if } x,y\in X_1\sqcup X_2,\\ \mathsf d_{23}(x,y) & \text{if } x,y\in X_2\sqcup X_3,\\ \inf_{z\in X_2}\big[\mathsf d_{12}(x,z)+\mathsf d_{23}(z,y)\big] & \text{if } x\in X_1 \text{ and } y\in X_3,\\ \inf_{z\in X_2}\big[\mathsf d_{23}(x,z)+\mathsf d_{12}(z,y)\big] & \text{if } x\in X_3 \text{ and } y\in X_1. \end{cases}$$
We notice that $\mathsf d$ coincides with $\mathsf d_i$ when restricted to $X_i$. By applying Proposition 2, point (ii) of Lemma 5 and the triangle inequality of $(D_{ET})_{\mathsf d}$ we obtain
$$\begin{aligned} \mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_3,\mathsf d_3,\mu_3)\big) &\le(D_{ET})_{\mathsf d}(\mu_1,\mu_3)\le(D_{ET})_{\mathsf d}(\mu_1,\mu_2)+(D_{ET})_{\mathsf d}(\mu_2,\mu_3)\\ &=(D_{ET})_{\mathsf d_{12}}(\mu_1,\mu_2)+(D_{ET})_{\mathsf d_{23}}(\mu_2,\mu_3)\\ &\le\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)+\mathbb{D}_{ET}\big((X_2,\mathsf d_2,\mu_2),(X_3,\mathsf d_3,\mu_3)\big)+2\epsilon. \end{aligned}$$
The conclusion follows since $\epsilon>0$ is arbitrary.

Step 2: completeness of $\mathbb{D}_{ET}$. In order to prove completeness, let $\{(X_n,\mathsf d_n,\mu_n)\}_{n\in\mathbb{N}}$ be a Cauchy sequence in the space $(\mathbb{X},\mathbb{D}_{ET})$. In order to have convergence of the full sequence, it is enough to prove that there exists a converging subsequence. Let us consider a subsequence such that
$$\mathbb{D}_{ET}^{1/a}\big((X_{n_k},\mathsf d_{n_k},\mu_{n_k}),(X_{n_{k+1}},\mathsf d_{n_{k+1}},\mu_{n_{k+1}})\big)<2^{-(k+1)}.$$
By definition of $\mathbb{D}_{ET}$ and Proposition 2, we can find a measure $\gamma_{k+1}\in\mathcal{M}(X_{n_k}\times X_{n_{k+1}})$ and a complete and separable metric coupling $\hat{\mathsf d}_{k+1}$ between $\mathsf d_{X_{n_k}}$ and $\mathsf d_{X_{n_{k+1}}}$ such that
$$\int_{X_{n_k}}F(\sigma_{n_k})\,\mathrm{d}\mu_{n_k}+\int_{X_{n_{k+1}}}F(\sigma_{n_{k+1}})\,\mathrm{d}\mu_{n_{k+1}}+\int_{X_{n_k}\times X_{n_{k+1}}}\ell\big(\hat{\mathsf d}_{k+1}\big)\,\mathrm{d}\gamma_{k+1}<2^{-k}, \tag{54}$$
where $\sigma_{n_k}$ (resp. $\sigma_{n_{k+1}}$) is the Radon-Nikodym derivative of the first (resp. second) marginal of $\gamma_{k+1}$ with respect to $\mu_{n_k}$ (resp. $\mu_{n_{k+1}}$).

Now we want to define a sequence $\{(X_k',\mathsf d_k')\}_{k=1}^\infty$ of metric spaces such that $X_{n_k}\subset X_k'$ and $X_k'\subset X_{k+1}'$. We proceed in the following way: we set $(X_1',\mathsf d_1'):=(X_{n_1},\mathsf d_{X_{n_1}})$ and $X_{k+1}':=X_k'\sqcup X_{n_{k+1}}/\sim$, where $x\sim y$ if $\mathsf d_{k+1}'(x,y)=0$ and the latter is defined as
$$\mathsf d_{k+1}'(x,y):=\begin{cases} \mathsf d_k'(x,y) & \text{if } x,y\in X_k',\\ \hat{\mathsf d}_{k+1}(x,y) & \text{if } x,y\in X_{n_k}\sqcup X_{n_{k+1}},\\ \inf_{z\in X_{n_k}}\mathsf d_k'(x,z)+\hat{\mathsf d}_{k+1}(z,y) & \text{if } x\in X_k',\ y\in X_{n_{k+1}},\\ \inf_{z\in X_{n_k}}\mathsf d_k'(y,z)+\hat{\mathsf d}_{k+1}(z,x) & \text{if } y\in X_k',\ x\in X_{n_{k+1}}. \end{cases}$$
From the definition of $\mathsf d_k'$, it is clear that we can endow the space $X':=\bigcup_{k=1}^\infty X_k'$ with a limit metric $\mathsf d'$. Now we consider the completion $(X,\mathsf d)$ of $(X',\mathsf d')$ and we notice that $(X_{n_k},\mathsf d_{X_{n_k}})$ is isometrically embedded in this space for every $k$. Using the embedding, we can also define a measure $\bar\mu_{n_k}$ as the push-forward of the measure $\mu_{n_k}$. Combining the construction above with (54) gives
$$(D_{ET})_{\mathsf d}^{1/a}(\bar\mu_{n_k},\bar\mu_{n_{k+1}})\le\int_{X_{n_k}}F(\sigma_{n_k})\,\mathrm{d}\mu_{n_k}+\int_{X_{n_{k+1}}}F(\sigma_{n_{k+1}})\,\mathrm{d}\mu_{n_{k+1}}+\int_{X_{n_k}\times X_{n_{k+1}}}\ell\big(\hat{\mathsf d}_{k+1}\big)\,\mathrm{d}\gamma_{k+1}<2^{-k}, \tag{55}$$
where $(D_{ET})_{\mathsf d}$ is the regular Entropy-Transport distance computed in the space $(X,\mathsf d)$. In particular, since $(D_{ET})_{\mathsf d}(\bar\mu_{n_k},\bar\mu_{n_{k+1}})<2^{-ak}$ is summable, (55) implies that $(\bar\mu_{n_k})_{k\in\mathbb{N}}$ is a Cauchy sequence in $(\mathcal{M}(X),(D_{ET})_{\mathsf d})$. Since $(D_{ET})_{\mathsf d}$ is complete, there exists $\mu\in\mathcal{M}(X)$ such that $(D_{ET})_{\mathsf d}^{1/a}(\bar\mu_{n_k},\mu)\to0$.

Using again that $(X_{n_k},\mathsf d_{X_{n_k}})$ is isometrically embedded in $(X,\mathsf d)$ and point (ii) of Lemma 5, we can conclude that
$$\mathbb{D}_{ET}\big((X_{n_k},\mathsf d_{n_k},\mu_{n_k}),(X,\mathsf d,\mu)\big)\le(D_{ET})_{\mathsf d}(\bar\mu_{n_k},\mu)\to0. \tag{56}$$

Step 3: separability of $\mathbb{D}_{ET}$. Thanks to (iii) of Lemma 5 it is enough to show that the set $\mathbb{X}^*$, defined in (40), is separable. To this aim, we notice that $\mathbb{X}^*$ can be written as $\bigsqcup_{n\in\mathbb{N}}\tilde K_n$, where $\tilde K_n:=\{(X,\mathsf d,\mu)\in\mathbb{X}^*:\ \mathrm{supp}(\mu)\text{ has } n \text{ points}\}$. Since the set of all $(D,M)=(D_{ij},M)\in\mathbb{R}_+^{n\times n}\times\mathbb{R}_+$ such that
$$D_{ij}=D_{ji},\qquad D_{ij}=0\iff i=j,\qquad D_{ij}\le D_{ik}+D_{kj} \tag{57}$$
is separable (as a subset of Euclidean space), using (iv) of Lemma 5 we get that
$$\tilde K_{n,M}:=\{(X,\mathsf d,\mu)\in\mathbb{X}^*:\ \mathrm{supp}(\mu)\text{ has } n \text{ points and } \mu(X)=nM\}$$
is separable for every fixed $n\in\mathbb{N}$, $M>0$. The separability of $\tilde K_n$ follows from the separability of $\tilde K_{n,M}$ combined with (v) of Lemma 5.

Step 4: length/geodesic property of $\mathbb{D}_{ET}$. Let us start by proving the length property. Let $(X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\in\mathbb{X}$. By definition of $\mathbb{D}_{ET}$, for every $\varepsilon>0$ we can find a complete and separable metric space $(X,\mathsf d)$ and isometric embeddings $\psi_i:\mathrm{supp}(\mu_i)\to X$, $i=1,2$, such that
$$\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)\ge(D_{ET})_{\mathsf d}(\mu_1,\mu_2)-\varepsilon, \tag{58}$$
where, as before, we identify $\mathrm{supp}(\mu_i)$ with its isometric image $\psi_i(\mathrm{supp}(\mu_i))$, and correspondingly $\mu_i$ with $(\psi_i)_\sharp\mu_i$, $i=1,2$, in order to keep notation short.

Recall that, by slightly modifying the classical Kuratowski embedding, one can show that every complete and separable metric space can be isometrically embedded in a complete, separable and geodesic metric space (see for instance [25, Exercise 1c. Ch. 3 . ] or [22, Proposition 1.2.12]).
Thus, recalling also Lemma 4, without loss of generality we can assume that the complete and separable metric space $(X,\mathsf d)$ above is also geodesic. By assumption $(D_{ET})_{\mathsf d}$ is a length distance on $\mathcal{M}(X)$, since $(X,\mathsf d)$ is a length space, so that we can find a curve $(\mu_t)_{t\in[1,2]}\subset(\mathcal{M}(X),(D_{ET})_{\mathsf d})$ from $\mu_1$ to $\mu_2$ satisfying
$$\mathrm{Length}_{(D_{ET})_{\mathsf d}}\big((\mu_t)_{t\in[1,2]}\big)\le(D_{ET})_{\mathsf d}(\mu_1,\mu_2)+\varepsilon. \tag{59}$$
Now, it is easy to check that the $\mathbb{D}_{ET}$-length of the curve of metric measure spaces $((X,\mathsf d,\mu_t))_{t\in[1,2]}\subset\mathbb{X}$ satisfies
$$\mathrm{Length}_{\mathbb{D}_{ET}}\big(((X,\mathsf d,\mu_t))_{t\in[1,2]}\big)\le\mathrm{Length}_{(D_{ET})_{\mathsf d}}\big((\mu_t)_{t\in[1,2]}\big). \tag{60}$$
Indeed, the length of a curve is by definition the supremum of the sums of mutual distances over finite partitions (13), and for every partition $(t_i)$ of $[1,2]$ it holds
$$\sum_i\mathbb{D}_{ET}\big((X,\mathsf d,\mu_{t_{i+1}}),(X,\mathsf d,\mu_{t_i})\big)\le\sum_i(D_{ET})_{\mathsf d}(\mu_{t_{i+1}},\mu_{t_i})\le\mathrm{Length}_{(D_{ET})_{\mathsf d}}\big((\mu_t)_{t\in[1,2]}\big),$$
where the first inequality follows from point (ii) of Lemma 5. The combination of (58), (59) and (60) gives

$$\mathrm{Length}_{\mathbb{D}_{ET}}\big(((X,\mathsf d,\mu_t))_{t\in[1,2]}\big)\le\mathrm{Length}_{(D_{ET})_{\mathsf d}}\big((\mu_t)_{t\in[1,2]}\big)\le(D_{ET})_{\mathsf d}(\mu_1,\mu_2)+\varepsilon\le\mathbb{D}_{ET}\big((X_1,\mathsf d_1,\mu_1),(X_2,\mathsf d_2,\mu_2)\big)+2\varepsilon,$$
as desired. To prove the geodesic property in the case $D_{ET}$ is a geodesic distance, we notice that we can follow verbatim the argument given above with $\varepsilon=0$. Here one has to notice that the existence of an optimal complete and separable metric space on which (58) holds with $\varepsilon=0$ follows from (ii) of Lemma 6. $\square$

Remark 3.

It is proved in [30, Proposition 8.3] that $(\mathcal{M}(X),\mathrm{HK})$ is a geodesic space when the underlying space $(X,\mathsf d)$ is geodesic. In particular, the last claim of Theorem 2 can be applied to the Hellinger-Kantorovich distance. To the best of our knowledge, up to now this is the only known example of a regular Entropy-Transport geodesic distance (with the trivial exception of weighted variants of HK [29]).

3.1. Topology.

Let us introduce a notion of convergence for sequences of (equivalence classes of) metric measure spaces (see [21, Definition 3.9] for the corresponding notion in the context of pointed metric measure spaces).

Deﬁnition 6.

We say that a sequence $(X_n,\mathsf d_n,\mu_n)_{n\in\mathbb{N}}$ weakly measured-Gromov converges to $(X_\infty,\mathsf d_\infty,\mu_\infty)$ if there exist a complete and separable metric space $(X,\mathsf d)$ and isometric embeddings $\iota_n:X_n\to X$, $n\in\bar{\mathbb{N}}:=\mathbb{N}\cup\{\infty\}$, such that $(\iota_n)_\sharp\mu_n\to(\iota_\infty)_\sharp\mu_\infty$ weakly in $\mathcal{M}(X)$. In the next Theorem we see that this notion of convergence actually coincides with the convergence induced by any Sturm-Entropy-Transport distance.

Theorem 3.

Let $D_{ET}$ be a regular Entropy-Transport distance induced by $(a,F,\ell)$. A sequence $(X_n,\mathsf d_n,\mu_n)_{n\in\mathbb{N}}$ weakly measured-Gromov converges to $(X_\infty,\mathsf d_\infty,\mu_\infty)$ if and only if
$$\mathbb{D}_{ET}\big((X_n,\mathsf d_n,\mu_n),(X_\infty,\mathsf d_\infty,\mu_\infty)\big)\to0\quad\text{as }n\to\infty. \tag{61}$$
Proof.

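Let us suppose that (61) holds. By definition of $\mathbb{D}_{ET}$ we know that there exist a complete and separable metric space $(Y_n,\mathsf d_{Y_n})$ and isometric embeddings $\psi_n$, $\psi_n^\infty$ of $(X_n,\mathsf d_n)$, $(X_\infty,\mathsf d_\infty)$ respectively, in $Y_n$ such that
$$D_{ET}\big((\psi_n)_\sharp\mu_n,(\psi_n^\infty)_\sharp\mu_\infty\big)<\mathbb{D}_{ET}\big((X_n,\mathsf d_n,\mu_n),(X_\infty,\mathsf d_\infty,\mu_\infty)\big)+\frac1n, \tag{62}$$
where $D_{ET}$ is computed in the space $Y_n$. We now define $Y:=\bigsqcup_{n\in\bar{\mathbb{N}}}X_n$ endowed with the pseudo-metric $\mathsf d_Y$
$$\mathsf d_Y(y,y'):=\begin{cases} \mathsf d_n(y,y') & \text{if } y,y'\in X_n,\ n\in\bar{\mathbb{N}},\\ \mathsf d_{Y_n}\big(\psi_n(y),\psi_n^\infty(y')\big) & \text{if } y\in X_n,\ y'\in X_\infty,\\ \mathsf d_{Y_n}\big(\psi_n^\infty(y),\psi_n(y')\big) & \text{if } y\in X_\infty,\ y'\in X_n,\\ \inf_{x\in X_\infty}\mathsf d_{Y_n}\big(\psi_n(y),\psi_n^\infty(x)\big)+\mathsf d_{Y_m}\big(\psi_m(y'),\psi_m^\infty(x)\big) & \text{if } y\in X_n,\ y'\in X_m,\ n\ne m. \end{cases}$$
We now consider the space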

Y / ∼ deﬁned as the quotient of Y with respect to the equivalencerelation y ∼ y ′ ⇔ d Y ( y, y ′ ) = 0 , (63)and we then deﬁne the completion of this space, that we still denote by ( Y, d Y ) . It is easyto see that Y is separable. By construction we notice that the set ψ n ( X n ) ∪ ψ ∞ n ( X ∞ ) ⊂ Y n endowed with the distance d Y n is canonically isometrically embedded in ( Y, d Y ) , so that everyspace X n , n ∈ ¯ N , is canonically isometrically embedded into Y by a map ψ ′ n . We claim nowthat Y and ψ ′ n provide a realization of the weakly measured Gromov convergence. To seethis, it is enough to notice that ( ψ ′ n ) ♯ µ n → ( ψ ′∞ ) ♯ µ ∞ weakly in M ( Y ) which is a consequenceof the construction of ψ ′ n , (62) and the fact that D ET induces the weak topology. NTROPY-TRANSPORT DISTANCES BETWEEN UNBALANCED METRIC MEASURE SPACES 25

For the converse, let us suppose that $(X_n, \mathsf d_n, \mu_n)_{n \in \mathbb N}$ weakly measured-Gromov converges to $(X_\infty, \mathsf d_\infty, \mu_\infty)$. By definition there exist a complete and separable metric space $(X, \mathsf d)$ and isometric embeddings $\iota_n : X_n \to X$, $n \in \bar{\mathbb N}$, such that $(\iota_n)_\sharp \mu_n \to (\iota_\infty)_\sharp \mu_\infty$ weakly in $\mathcal M(X)$. Since $\mathsf D_{\mathrm{ET}}$ metrizes weak convergence on $\mathcal M(X)$, we know that $\mathsf D_{\mathrm{ET}}\big((\iota_n)_\sharp \mu_n, (\iota_\infty)_\sharp \mu_\infty\big) \to 0$ as $n \to \infty$, and the result follows from the very definition of $\mathbf D_{\mathrm{ET}}$, noticing that $(X, \mathsf d)$ is a possible competitor. □

Let us denote by $\mathbb X(K, N, L, v, V)$ the family of isomorphism classes of metric measure spaces $(X, \mathsf d, \mu) \in \mathsf{CD}(K, N)$ such that
$$\mathrm{diam}(X) \le L \quad \text{and} \quad 0 < v \le \mu(X) \le V.$$
Let $\tilde{\mathbb X}(K, N, L, v, V)$ be the family of isomorphism classes of spaces $(X, \mathsf d, \mu) \in \mathbb X(K, N, L, v, V)$ such that $\mu$ has full support.

Theorem 4.

Fix $K \in \mathbb R$, $N \in (1, \infty)$, $L \in (0, \infty)$ and $0 < v \le V < \infty$. Let $\mathsf D_{\mathrm{ET}}$ be a regular Entropy-Transport distance. Then
• $\mathbb X(K, N, L, v, V)$ is compact with respect to $\mathbf D_{\mathrm{ET}}$;
• $\tilde{\mathbb X}(K, N, L, v, V)$ is compact with respect to mGH. Moreover, on such a family the $\mathbf D_{\mathrm{ET}}$-topology and the mGH-topology coincide.

Proof. By [21, Corollary 3.22] we have precompactness of $\mathbb X(K, N, L, v, V)$ with respect to the weak measured-Gromov convergence, and thus also precompactness with respect to $\mathbf D_{\mathrm{ET}}$-convergence by Theorem 3. From [42, Theorem 3.1] (see also [21, Theorem 4.9]) we know that the condition $\mathsf{CD}(K, N)$ is stable with respect to the weak measured-Gromov convergence, and thus the first statement follows. For the second statement we observe that the spaces in $\tilde{\mathbb X}(K, N, L, v, V)$ are uniformly doubling, and thus the weak measured-Gromov convergence is equivalent to the mGH-convergence (see [21, Theorems 3.30 and 3.33]). □

Remark 4.

From Theorem 3:
• combined with [21, Theorem I and pp. 29-30], it follows that $\mathsf{CD}(K, N)$ with $N \in (1, \infty]$ is stable under $\mathbf D_{\mathrm{ET}}$-convergence;
• combined with [21, Theorem 5.7], it follows that heat flows converge under $\mathbf D_{\mathrm{ET}}$-convergence of $\mathsf{CD}(K, \infty)$ spaces;
• combined with [21, Theorem IV], it follows that $\mathsf{RCD}(K, N)$ with $N \in (1, \infty]$ is stable under $\mathbf D_{\mathrm{ET}}$-convergence;
• combined with [21, Theorem V], it follows that the spectrum of the Laplacian is stable under $\mathbf D_{\mathrm{ET}}$-convergence of $\mathsf{CD}(K, \infty)$ spaces.
We refer to [21] for the precise statements.

4. Limiting cases

4.1. Pure entropy distances.

In the setting of Entropy-Transport problems, we call pure entropy problems the ones induced by the choices
$$F \in \Gamma(\mathbb R_+), \qquad c(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 = x_2,\\ +\infty & \text{otherwise.} \end{cases}$$
In this situation one can prove (see [30, Example E.5]) that for any $\mu_1, \mu_2 \in \mathcal M(X)$ we have
$$\mathrm{ET}(\mu_1, \mu_2) = \inf_{\gamma \in \mathcal M(X)} D_F(\gamma \| \mu_1) + D_F(\gamma \| \mu_2) = \int_X H\Big(\frac{d\mu_1}{d\lambda}, \frac{d\mu_2}{d\lambda}\Big)\, d\lambda, \tag{64}$$
where $\lambda \in \mathcal M(X)$ is any measure dominating $\mu_1$ and $\mu_2$, and $H$ is defined as the lower semicontinuous envelope of the function
$$\tilde H(r, t) := \inf_{\theta > 0} \hat F(\theta, r) + \hat F(\theta, t). \tag{65}$$
In particular, in this situation the functional $\mathrm{ET}$ corresponds to the Csiszár divergence induced by the function $s \mapsto H(1, s) \in \Gamma(\mathbb R_+)$ (see [14, Lemma 3]), which justifies the name pure entropy problem.

For some entropy functions $F$ one can prove that a power $a$ of the induced pure entropy cost $\mathrm{ET}$ is a distance. For instance, when $a = 1$ and $F(s) = |s - 1|$ we obtain the celebrated total variation (denoted by $\mathrm{TV}$ in the sequel), a distance on the space of measures inducing a strong topology. Actually, thanks to [14, Lemma 8] and the explicit bounds in [32, Theorem 2.5], every pure entropy distance induces the same topology as the total variation.

As shown in [14, Propositions 2, 3], another class of pure entropy distances is obtained by choosing $a = 1/2$ and the power-like entropy $F = U_p$, $p \ge 1$, defined in Example 3. In this situation we have
$$H(r, t) = \frac1p \Big[ r + t - 2^{\frac{p}{p-1}} \big( r^{1-p} + t^{1-p} \big)^{\frac{1}{1-p}} \Big] \quad \text{if } p > 1, \tag{66}$$
$$H(r, t) = \big( \sqrt r - \sqrt t \big)^2 \quad \text{if } p = 1, \tag{67}$$
and we recognize some well-known functionals, like the Hellinger distance (case $p = 1$) and the triangular discrimination (case $p = 2$). We will denote these distances by $\mathsf{PL}_p$.

We start with a useful lemma, valid for any pure entropy problem.

Lemma 7.

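Fix $a \in (0, 1]$ and let us consider an entropy function $F \in \Gamma(\mathbb R_+)$ such that $F'_\infty = +\infty$ and the cost
$$c(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 = x_2,\\ +\infty & \text{otherwise.} \end{cases}$$
Let us denote by $\mathsf{PE}$ the power $a$ of the Entropy-Transport cost induced by $F$ and $c$. For any $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2) \in \mathbb X$ let us define
$$\mathbf{PE}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) := \inf \mathsf{PE}\big((\psi_1)_\sharp \mu_1, (\psi_2)_\sharp \mu_2\big), \tag{68}$$
where the infimum on the right-hand side is taken over all complete and separable metric spaces $(\hat X, \hat{\mathsf d})$ with isometric embeddings $\psi_1 : \mathrm{supp}(\mu_1) \to \hat X$ and $\psi_2 : \mathrm{supp}(\mu_2) \to \hat X$. Then
$$\mathbf{PE}^{1/a}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = \inf_{\mathcal C} \big\{ D_F(\gamma_1 \| \mu_1) + D_F(\gamma_2 \| \mu_2) \big\}, \tag{69}$$
where
$$\mathcal C := \big\{ (\gamma, \hat{\mathsf d}) : \gamma \in \mathcal M(X_1 \times X_2),\ \hat{\mathsf d} \text{ pseudo-metric coupling for } \mathsf d_1, \mathsf d_2,\ \mathrm{supp}(\gamma) \subset \{\hat{\mathsf d} = 0\} \big\}.$$

Proof.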

Setting ℓ ( d ) := ( if d = 0+ ∞ otherwise , NTROPY-TRANSPORT DISTANCES BETWEEN UNBALANCED METRIC MEASURE SPACES 27 we can prove that the inﬁmum of (cid:26) D F ( γ || µ ) + D F ( γ || µ ) + Z X × X ℓ (cid:0) ˆ d ( x, y ) (cid:1) d γ (cid:27) a over the set ˜ C := { ( γ , ˆ d ) : γ ∈ M ( X × X ) , ˆ d pseudo-metric coupling for d , d } is less or equal to the inﬁmum in the right hand side of (68) by reasoning as in the ﬁrst partof the proof of Proposition 2. The fact that the power a of the right hand side of (69) isless or equal to the inﬁmum as in (68) follows by noticing that Z X × X ℓ (cid:0) ˆ d ( x, y ) (cid:1) d γ = ( + ∞ if supp ( γ )

6⊂ { ˆ d = 0 } otherwise . (70)For the converse inequality, we reason in a similar way as in the proof of the point (ii)of Lemma 6. For any pseudo-metric coupling ˆ d of d and d let us consider the space (( X ⊔ X ) / ∼ , ˆ d ) , where x ∼ x ⇐⇒ ˆ d ( x , x ) = 0 . It is a complete and separable metricspace as proved in Lemma 1. Denoting by p : X ⊔ X → ( X ⊔ X ) / ∼ the projection tothe quotient and using the identiﬁcation X ⊔ X = X × { } ∪ X × { } , we notice that X × X ֒ → (cid:0) ( X ⊔ X ) / ∼ (cid:1) × (cid:0) ( X ⊔ X ) / ∼ (cid:1) via the injective map ψ ( x , x ) = ( ψ ( x ) , ψ ( x )) := ( p ( x , , p ( x , . Moreover, we also have that ψ i is an isomorphism of ( X i , d i ) into its image in (( X ⊔ X ) / ∼ , ˆ d ) , i = 1 , . Thus, for any measure γ ∈ M ( X × X ) such that supp ( γ ) ⊂ { ˆ d = 0 } ,denoting by γ i the marginals of γ , we can consider the measures ψ ♯ γ whose projections are ( ψ ) ♯ γ and ( ψ ) ♯ γ . Using Lemma 4 we know that D F ( γ i || µ i ) = D F (( ψ i ) ♯ γ i || ( ψ i ) ♯ µ i ) , i = 1 , and the proof is completed by noticing that supp ( ψ ♯ γ ) is contained in the diagonal of themetric space (( X ⊔ X ) / ∼ , ˆ d ) . (cid:3) In the next theorem we prove that some pure entropy problems can be recovered as alimiting case of regular Entropy-Transport problems.
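Both the pure entropy integrand and the limit stated in the next theorem can be evaluated by hand on finite sets. The Python sketch below is illustrative only and rests on several assumptions. It assumes the power-like entropy of Example 3 has the usual form $U_p(s) = (s^p - p(s-1) - 1)/(p(p-1))$ for $p > 1$ and $U_1(s) = s\log s - s + 1$; with these choices, minimizing $rU_p(\theta/r) + tU_p(\theta/t)$ over $\theta$ reproduces (66) and (67). It also takes $\lambda$ in (64) to be the counting measure on a common finite support. All function names are hypothetical. The helper `regular_cost_diracs` places two unit Dirac masses in one fixed metric space, so it shows the measure-level monotonicity behind (73), not the infimum over embeddings.

```python
import math


def U(p, s):
    """Power-like entropy U_p(s) for s > 0 (assumed form, see the lead-in)."""
    if p == 1:
        return s * math.log(s) - s + 1
    return (s ** p - p * (s - 1) - 1) / (p * (p - 1))


def H_closed(p, r, t):
    """Pure entropy integrand H(r, t) for F = U_p, following (66)-(67)."""
    if p == 1:
        return (math.sqrt(r) - math.sqrt(t)) ** 2
    if r == 0 or t == 0:
        # Limit of (66) when one density vanishes: H(r, 0) = r / p.
        return (r + t) / p
    power_mean = (r ** (1 - p) + t ** (1 - p)) ** (1 / (1 - p))
    return (r + t - 2 ** (p / (p - 1)) * power_mean) / p


def ternary_min(f, lo, hi, iters=200):
    """Minimum value of a convex function f on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)


def H_numeric(p, r, t):
    """(65) for r, t > 0: inf over theta of r U_p(theta / r) + t U_p(theta / t).

    The minimizer is a power mean of r and t, so it lies in [min(r, t), max(r, t)].
    """
    return ternary_min(lambda th: r * U(p, th / r) + t * U(p, th / t), min(r, t), max(r, t))


def pl(p, mu1, mu2):
    """PL_p = ET^(1/2) for masses on common points, lambda = counting measure."""
    return math.sqrt(sum(H_closed(p, r, t) for r, t in zip(mu1, mu2)))


def regular_cost_diracs(p, n, dist):
    """ET cost for (F, c) = (U_p, n * d) between unit Diracs at distance dist > 0.

    U_p is superlinear, so the only plans are theta * delta_(x, y) and the cost
    is the minimum over theta in [0, 1] of 2 U_p(theta) + n * dist * theta.
    """
    return ternary_min(lambda th: 2 * U(p, th) + n * dist * th, 0.0, 1.0)
```

For two Diracs at distance 1 and $p = 3/2$, `regular_cost_diracs(1.5, n, 1.0)` increases with $n$. Once $n \ge 4$ it equals $2/p = H(1,0) + H(0,1)$, which is the squared pure entropy distance $\mathsf{PL}_p^2$ between the two Diracs.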

Theorem 5.

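Fix $p \ge 1$ and let us consider the sequence of cost functions $c_n = n\,\mathsf d$ and the entropy function $F := U_p$. Let us denote by $\mathsf D_{p,n}$ the Entropy-Transport distance induced by $a = 1/2$, $c_n = n\,\mathsf d$ and $F := U_p$. Then, for all metric measure spaces $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2) \in \mathbb X$, the limit
$$\mathbf{PL}_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) := \lim_{n \to \infty} \mathbf D_{p,n}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \ \text{ is well defined}, \tag{71}$$
where $\mathbf D_{p,n}$ denotes the function defined as in Definition 5 upon replacing $\mathsf D_{\mathrm{ET}}$ by $\mathsf D_{p,n}$. Moreover, $\mathsf D_{p,n}$ is a regular Entropy-Transport distance and $\mathbf{PL}_p$ defines a metric on $\mathbb X$ such that
$$\mathbf{PL}_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = \inf \mathsf{PL}_p\big((\psi_1)_\sharp \mu_1, (\psi_2)_\sharp \mu_2\big), \tag{72}$$
where the infimum on the right-hand side is taken over all complete and separable metric spaces $(\hat X, \hat{\mathsf d})$ with isometric embeddings $\psi_1 : \mathrm{supp}(\mu_1) \to \hat X$ and $\psi_2 : \mathrm{supp}(\mu_2) \to \hat X$.

Proof. The first assertion follows by noticing that for any metric $\mathsf d$ we have
$$n\,\mathsf d(x_1, x_2) \uparrow c(x_1, x_2) = \begin{cases} 0 & \text{if } x_1 = x_2,\\ +\infty & \text{otherwise,} \end{cases} \quad \text{as } n \to \infty, \text{ for every } x_1, x_2 \in X, \tag{73}$$
so that $n \mapsto \mathbf D_{p,n}$ is nondecreasing. The fact that $\mathsf D_{p,n}$ is a regular Entropy-Transport distance is a consequence of [14, Theorem 6], noticing the obvious fact that $n\,\mathsf d$ is a complete and separable metric for any fixed $n$. In particular, since for every fixed $n$ we know that $\mathbf D_{p,n}$ is a metric on $\mathbb X$ by Theorem 2, $\mathbf{PL}_p$ is nonnegative and symmetric, satisfies the triangle inequality, and $\mathbf{PL}_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = 0$ if $(X_1, \mathsf d_1, \mu_1) = (X_2, \mathsf d_2, \mu_2)$. We claim that $\mathbf{PL}_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = 0$ only if $(X_1, \mathsf d_1, \mu_1) = (X_2, \mathsf d_2, \mu_2)$ (as equivalence classes).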
Indeed, since $\mathbf D_{p,n}$ is nonnegative and nondecreasing in $n$, the fact that $\lim_{n \to \infty} \mathbf D_{p,n}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = 0$ implies $\mathbf D_{p,n}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = 0$ for every $n$, and the result follows because $\mathbf D_{p,n}$ is a distance on $\mathbb X$. At this level we do not know that $\mathbf{PL}_p$ is finite-valued; this is a consequence of (72) together with the fact that $\mathsf{PL}_p$ is a (finite-valued) distance on the space of measures, as recalled above.

In order to prove (72), we first notice that the monotonicity (73) easily implies that
$$\mathbf{PL}_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \le \inf \mathsf{PL}_p\big((\psi_1)_\sharp \mu_1, (\psi_2)_\sharp \mu_2\big),$$
which gives the finiteness of $\mathbf{PL}_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big)$.

For the converse inequality, thanks to Lemma 6 we know that for all $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)$ and every $n \in \mathbb N$ there exist a measure $\gamma_n \in \mathcal M(X_1 \times X_2)$ and a pseudo-metric coupling $\hat{\mathsf d}_n$ between $\mathsf d_1$ and $\mathsf d_2$ such that
$$\infty > \mathbf{PL}_p^2\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \ge \mathbf D_{p,n}^2\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = \sum_{i=1}^2 D_{U_p}(\gamma_{n,i} \| \mu_i) + \int_{X_1 \times X_2} n\,\hat{\mathsf d}_n(x, y)\, d\gamma_n \quad \text{for every } n \in \mathbb N. \tag{74}$$
By the superlinearity of the entropy functionals we can infer the existence (up to a subsequence) of a weak limit $\gamma \in \mathcal M(X_1 \times X_2)$ of the sequence $(\gamma_n)_{n \in \mathbb N}$. We also know that
$$\liminf_{n \to \infty} \sum_{i=1}^2 D_{U_p}(\gamma_{n,i} \| \mu_i) \ge \sum_{i=1}^2 D_{U_p}(\gamma_i \| \mu_i), \tag{75}$$
where we have used the usual notation for the marginal measures. If $\gamma$ is the null measure, the result follows trivially. Otherwise, since the integral $\int_{X_1 \times X_2} \hat{\mathsf d}_n(x, y)\, d\gamma_n$ is bounded from above, we can argue as in the corresponding step of the proof of Lemma 6 and deduce the existence of a pseudo-metric coupling $\hat{\mathsf d}$ between $\mathsf d_1$ and $\mathsf d_2$ such that $\hat{\mathsf d}_n$ converges (up to a subsequence) pointwise to $\hat{\mathsf d}$, uniformly on compact sets.

By recalling the explicit formulation of the right-hand side of (72) given in Lemma 7, the proof is completed if we show that $\mathrm{supp}(\gamma) \subset \{\hat{\mathsf d} = 0\}$.
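Let us suppose by contradiction the existence of a point $(\bar x, \bar y) \in \mathrm{supp}(\gamma)$ such that $\hat{\mathsf d}(\bar x, \bar y) = k > 0$. Fix $k/2 > r > 0$: for every sufficiently small $\epsilon > 0$ there exists $m \in \mathbb N$ such that for every $n > m$ we have
$$\gamma_n\big(B_r(\bar x) \times B_r(\bar y)\big) \ge \gamma\big(B_r(\bar x) \times B_r(\bar y)\big) - \epsilon > 0, \qquad \hat{\mathsf d}_n(\bar x, \bar y) - 2r > \hat{\mathsf d}(\bar x, \bar y) - 2r - \epsilon > 0. \tag{76}$$
Starting from the bound in (74) we have
$$\begin{aligned} \infty > \mathbf{PL}_p^2\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) &\ge n \int_{X_1 \times X_2} \hat{\mathsf d}_n(x, y)\, d\gamma_n \ge n \int_{B_r(\bar x) \times B_r(\bar y)} \hat{\mathsf d}_n(x, y)\, d\gamma_n \\ &\ge n \big( \hat{\mathsf d}_n(\bar x, \bar y) - 2r \big)\, \gamma_n\big(B_r(\bar x) \times B_r(\bar y)\big) \\ &\ge n \big( \hat{\mathsf d}(\bar x, \bar y) - 2r - \epsilon \big) \big[ \gamma\big(B_r(\bar x) \times B_r(\bar y)\big) - \epsilon \big], \end{aligned}$$
which leads to a contradiction for $n$ sufficiently large thanks to (76). □

Definition 7.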

We say that a sequence of metric measure spaces $(X_n, \mathsf d_n, \mu_n)_{n \in \mathbb N}$ strongly measured-Gromov converges to the metric measure space $(X_\infty, \mathsf d_\infty, \mu_\infty)$ if there exist a complete and separable metric space $(X, \mathsf d)$ and isometric embeddings $\iota_n : X_n \to X$, $n \in \bar{\mathbb N}$, such that $(\iota_n)_\sharp \mu_n \to (\iota_\infty)_\sharp \mu_\infty$ in $\mathcal M(X)$ with respect to the total variation topology.

In the next theorem we see that this notion of convergence coincides with the convergence induced by the distance $\mathbf{PL}_p$ for every $p \ge 1$.

Theorem 6.

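Let $p \ge 1$. A sequence $(X_n, \mathsf d_n, \mu_n)_{n \in \mathbb N}$ strongly measured-Gromov converges to $(X_\infty, \mathsf d_\infty, \mu_\infty)$ if and only if
$$\mathbf{PL}_p\big((X_n, \mathsf d_n, \mu_n), (X_\infty, \mathsf d_\infty, \mu_\infty)\big) \to 0 \quad \text{as } n \to \infty. \tag{77}$$

Proof.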

The proof is analogous to that of Theorem 3. Let us suppose the validity of (77). By definition of $\mathbf{PL}_p$ there exist a complete and separable metric space $(Y_n, \mathsf d_{Y_n})$ and isometric embeddings $\psi_n$, $\psi^\infty_n$ of $(X_n, \mathsf d_n)$ and $(X_\infty, \mathsf d_\infty)$ in $Y_n$ such that
$$\mathsf{PL}_p\big((\psi_n)_\sharp \mu_n, (\psi^\infty_n)_\sharp \mu_\infty\big) < \frac1n, \tag{78}$$
where $\mathsf{PL}_p$ is computed in the space $Y_n$. We now define $Y := \bigsqcup_{n \in \bar{\mathbb N}} X_n$ endowed with the pseudo-metric $\mathsf d_Y$ given by
$$\mathsf d_Y(y, y') := \begin{cases} \mathsf d_n(y, y') & \text{if } y, y' \in X_n,\ n \in \bar{\mathbb N},\\ \mathsf d_{Y_n}\big(\psi_n(y), \psi^\infty_n(y')\big) & \text{if } y \in X_n,\ y' \in X_\infty,\\ \mathsf d_{Y_n}\big(\psi^\infty_n(y), \psi_n(y')\big) & \text{if } y \in X_\infty,\ y' \in X_n,\\ \inf_{x \in X_\infty} \mathsf d_{Y_n}\big(\psi_n(y), \psi^\infty_n(x)\big) + \mathsf d_{Y_m}\big(\psi_m(y'), \psi^\infty_m(x)\big) & \text{if } y \in X_n,\ y' \in X_m,\ n \neq m. \end{cases}$$
We consider the space $Y/\!\sim$ defined as the quotient of $Y$ with respect to the equivalence relation
$$y \sim y' \iff \mathsf d_Y(y, y') = 0, \tag{79}$$
and we then define the completion of this space, which we still denote by $(Y, \mathsf d_Y)$. It is easy to see that $Y$ is separable. By construction, the set $\psi_n(X_n) \cup \psi^\infty_n(X_\infty) \subset Y_n$ endowed with the distance $\mathsf d_{Y_n}$ is canonically isometrically embedded in $(Y, \mathsf d_Y)$, so that every space $X_n$, $n \in \bar{\mathbb N}$, is canonically isometrically embedded into $Y$ by a map $\psi'_n$. We claim that $Y$ and $\psi'_n$ provide a realization of the strong measured-Gromov convergence. To see this, it is enough to notice that $(\psi'_n)_\sharp \mu_n \to (\psi'_\infty)_\sharp \mu_\infty$ in $\mathcal M(Y)$ with respect to the topology induced by the total variation, which is a consequence of the construction of $\psi'_n$, of (78) and of the fact that $\mathsf{PL}_p$ induces the topology of the total variation.

For the converse, let us suppose that $(X_n, \mathsf d_n, \mu_n)_{n \in \mathbb N}$ strongly measured-Gromov converges to the metric measure space $(X_\infty, \mathsf d_\infty, \mu_\infty)$. By definition there exist a complete and separable metric space $(X, \mathsf d)$ and isometric embeddings $\iota_n : X_n \to X$, $n \in \bar{\mathbb N}$, such that $(\iota_n)_\sharp \mu_n \to (\iota_\infty)_\sharp \mu_\infty$ in $\mathcal M(X)$ with respect to the topology of the total variation. Since $\mathsf{PL}_p$ metrizes this topology on $\mathcal M(X)$, we know that $\mathsf{PL}_p\big((\iota_n)_\sharp \mu_n, (\iota_\infty)_\sharp \mu_\infty\big) \to 0$ as $n \to \infty$, and the result follows by noticing that $(X, \mathsf d)$ is a possible competitor in the characterization (72) of $\mathbf{PL}_p$ given in Theorem 5. □

In the next easy proposition we show that strong measured-Gromov convergence implies weak measured-Gromov convergence.
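The pseudo-metric $\mathsf d_Y$ in the proofs of Theorems 3 and 6 glues each pair $X_n$, $X_m$ through the common copy of $X_\infty$. As a sanity check, here is a minimal Python sketch on hypothetical finite data: every $Y_n$ is taken to be the real line, and all names are illustrative. It evaluates the four cases in the definition of $\mathsf d_Y$ and verifies symmetry and the triangle inequality by brute force.

```python
from itertools import product

# Hypothetical toy data: every Y_n is the real line, X_inf = {0, 1} is embedded
# by psi_n^inf at the same coordinates, and psi_n(X_n) is listed for each n.
X_INF = [0.0, 1.0]
X_N = {1: [0.3, 1.3], 2: [-0.1, 0.9], 3: [0.05, 1.05]}


def coords(n):
    return X_INF if n == "inf" else X_N[n]


def glued_distance(p, q):
    """Pseudo-metric d_Y between labelled points p = (n, i) and q = (m, j)."""
    (n, i), (m, j) = p, q
    if n == m:                      # same space: intrinsic distance d_n
        return abs(coords(n)[i] - coords(n)[j])
    if n == "inf":                  # reorder so that the first point lies in X_n
        (n, i), (m, j) = (m, j), (n, i)
    if m == "inf":                  # X_n against X_inf: measured inside Y_n
        return abs(X_N[n][i] - X_INF[j])
    # X_n against X_m with n != m: route through the common copy of X_inf
    return min(abs(X_N[n][i] - x) + abs(X_N[m][j] - x) for x in X_INF)


POINTS = [(n, i) for n in ["inf", *X_N] for i in range(2)]


def is_pseudometric(points, d, tol=1e-12):
    """Check d(p, p) = 0, symmetry and the triangle inequality on all triples."""
    for p, q in product(points, repeat=2):
        if d(p, p) != 0 or abs(d(p, q) - d(q, p)) > tol:
            return False
    return all(d(p, q) <= d(p, r) + d(r, q) + tol
               for p, q, r in product(points, repeat=3))
```

In this toy example no two distinct points are at distance zero, so passing to the quotient (63) or (79) changes nothing. The quotient only matters when the embeddings make points of different spaces coincide.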

Proposition 3.

Let $(X_n, \mathsf d_n, \mu_n)_{n \in \mathbb N}$ be a sequence of metric measure spaces strongly measured-Gromov converging to $(X_\infty, \mathsf d_\infty, \mu_\infty)$. Then $(X_n, \mathsf d_n, \mu_n)_{n \in \mathbb N}$ weakly measured-Gromov converges to $(X_\infty, \mathsf d_\infty, \mu_\infty)$.

Proof. By definition there exist a complete and separable metric space $(X, \mathsf d)$ and isometric embeddings $\iota_n : X_n \to X$, $n \in \bar{\mathbb N}$, such that $(\iota_n)_\sharp \mu_n \to (\iota_\infty)_\sharp \mu_\infty$ in $\mathcal M(X)$ with respect to the total variation topology, which implies that $(\iota_n)_\sharp \mu_n \to (\iota_\infty)_\sharp \mu_\infty$ weakly. The result follows from the very definition of weak measured-Gromov convergence. □

We conclude the section with a list of examples of convergences.
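The converse of Proposition 3 fails, and the first example below shows why. As a numerical illustration, the following Python sketch embeds the discrete spaces of Example (1) into $X = [0,1]$ by inclusion. It then computes the Wasserstein distance $W_1(\mu_n, \mathcal L_{[0,1]})$, which metrizes weak convergence of probability measures on the compact interval, via the one-dimensional formula $W_1 = \int_0^1 |F_n(x) - x|\,dx$, where $F_n$ is the distribution function of $\mu_n$. The helper names are hypothetical.

```python
def seg(a, b, c):
    """Exact value of the integral of |c - x| over [a, b], for a <= b."""
    if c <= a:
        return ((b - c) ** 2 - (a - c) ** 2) / 2
    if c >= b:
        return ((c - a) ** 2 - (c - b) ** 2) / 2
    return ((c - a) ** 2 + (b - c) ** 2) / 2


def w1_to_lebesgue(n):
    """W_1 between mu_n = (1/n) sum_m delta_{m/n} and Lebesgue measure on [0, 1].

    On [m/n, (m+1)/n) the distribution function of mu_n equals (m+1)/n,
    while that of the Lebesgue measure is x, so W_1 is a sum of n exact pieces.
    """
    return sum(seg(m / n, (m + 1) / n, (m + 1) / n) for m in range(n))
```

The result is $1/(2n)$, so $\mu_n \to \mathcal L_{[0,1]}$ weakly. The total variation gap, by contrast, stays equal to 1, because $\mu_n$ is atomic and $\mathcal L_{[0,1]}$ is diffuse.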

Examples.

(1) Let us consider the metric measure space $(X_\infty, \mathsf d_\infty, \mu_\infty)$ defined as the unit interval $X_\infty = [0, 1]$ endowed with the Euclidean distance and the Lebesgue measure. We know that $(X_\infty, \mathsf d_\infty, \mu_\infty)$ can be approximated in the weak measured-Gromov sense by a sequence of discrete spaces. Take for instance $X_n = \{m/n\}_{m=0}^{n-1}$, endowed with the distance $\mathsf d_n$ inherited from the ambient 1-dimensional Euclidean space and the measure $\mu_n$ such that $\mu_n(\{m/n\}) = 1/n$ for every $m = 0, \ldots, n-1$.
We next claim that $(X_n, \mathsf d_n, \mu_n)$ does not converge to $(X_\infty, \mathsf d_\infty, \mu_\infty)$ in the strong measured-Gromov sense. Indeed, for any metric space $(X, \mathsf d)$ such that $X_n$ is isometrically embedded in $X$ via $\iota_n$, $n \in \bar{\mathbb N}$, we have
$$\mathrm{TV}\big((\iota_n)_\sharp \mu_n, (\iota_\infty)_\sharp \mu_\infty\big) = \sup_{A \in \mathcal B(X)} \big| (\iota_\infty)_\sharp \mu_\infty(A) - (\iota_n)_\sharp \mu_n(A) \big| \ge \mu_\infty\Big( [0, 1] \setminus \bigcup_{m=0}^{n-1} \{m/n\} \Big) = 1 \quad \text{for any } n \in \mathbb N.$$

(2) Let us consider the metric measure space $(X_\infty, \mathsf d_\infty, \mu_\infty)$ defined as the unit interval $X_\infty = [0, 1]$ endowed with the Euclidean distance and the measure $\mu_\infty = f\, d\mathcal L_{[0,1]}$, where $\mathcal L_{[0,1]}$ is the Lebesgue measure on $[0, 1]$. Let us define the sequence of metric measure spaces $(X_n, \mathsf d_n, \mu_n)$, where $X_n = [0, 1 - 1/n]$, $\mathsf d_n$ is the Euclidean distance and $\mu_n = f_n\, d\mathcal L_{[0, 1-1/n]}$. Let us suppose that $\tilde f_n \to f$ in $L^1([0, 1])$, where
$$\tilde f_n(x) = \begin{cases} f_n(x) & \text{if } 0 \le x \le 1 - 1/n,\\ 0 & \text{if } 1 - 1/n < x \le 1. \end{cases}$$
Then $(X_n, \mathsf d_n, \mu_n) \to (X_\infty, \mathsf d_\infty, \mu_\infty)$ in the strong measured-Gromov sense. To see this, it is enough to notice that for every $n \in \bar{\mathbb N}$ the map $\iota_n : X_n \to X_\infty$ defined by $\iota_n(x) = x$ is an isometric embedding, and the convergence $(\iota_n)_\sharp \mu_n \to (\iota_\infty)_\sharp \mu_\infty$ with respect to the total variation distance is exactly equivalent to $\tilde f_n \to f$ in $L^1([0, 1])$.

(3) Let $(X_n, \mathsf d_n, \mu_n)$ be the sequence of collapsing flat tori $S^1 \times \frac1n S^1 \subset \mathbb R^4$ endowed with the normalized measures $\mu_n := \frac{n}{4\pi^2}\, \mathrm{dvol}_{S^1 \times \frac1n S^1}$.
It is a standard fact that $(X_n, \mathsf d_n, \mu_n)$ converges to $(X_\infty, \mathsf d_\infty, \mu_\infty) = (S^1, \mathsf d_{S^1}, (2\pi)^{-1} \mathcal L^1)$ in the weak measured-Gromov sense; this is a classical example of a collapsing sequence. We claim that the convergence cannot be improved to strong measured-Gromov convergence. Indeed, for any metric space $(X, \mathsf d)$ such that $X_n$ is isometrically embedded in $X$ via $\iota_n$, $n \in \bar{\mathbb N}$, we have
$$\mathrm{TV}\big((\iota_n)_\sharp \mu_n, (\iota_\infty)_\sharp \mu_\infty\big) = \sup_{A \in \mathcal B(X)} \big| (\iota_\infty)_\sharp \mu_\infty(A) - (\iota_n)_\sharp \mu_n(A) \big| \ge \mu_n\Big( \big( S^1 \times \tfrac1n S^1 \big) \setminus \gamma_n(S^1) \Big) = 1 \quad \text{for any } n \in \mathbb N,$$
where $\gamma_n : S^1 \to S^1 \times \frac1n S^1$ is an arbitrary isometric immersion.

4.2. Sturm's distances.

We notice that the classical $p$-Wasserstein distance $W_p$, $p \ge 1$, can be recovered as a particular case of an Entropy-Transport problem with the choices
$$F(s) = I(s) := \begin{cases} 0 & \text{if } s = 1,\\ +\infty & \text{otherwise,} \end{cases} \qquad c(x_1, x_2) = \mathsf d^p(x_1, x_2). \tag{80}$$
It is clear that $W_p$ is not a regular Entropy-Transport distance. However, we now show that the $\mathbb D_p$-distance of Sturm (defined in Definition 4) can be recovered as a limiting case of our framework.

Theorem 7.

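Fix $p \ge 1$ and let us consider the cost function $\ell(\mathsf d) := \mathsf d^p$, the entropy function $F := U_1$, and the power $a := 1/p$. Let us denote by $\mathsf D_{\mathrm{ET},n}$ the power $a$ of the Entropy-Transport cost induced by $F_n := nU_1$ and $c = \ell(\mathsf d)$. Then, for all metric measure spaces $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2) \in \mathbb X$,
$$\mathbf D_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) := \lim_{n \to \infty} \mathbf D_{\mathrm{ET},n}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \ \text{ is well defined}, \tag{81}$$
where $\mathbf D_{\mathrm{ET},n}$ denotes the function defined as in Definition 5 upon replacing $\mathsf D_{\mathrm{ET}}$ by $\mathsf D_{\mathrm{ET},n}$. Moreover, for all metric measure spaces $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2) \in \mathbb X_{1,p}$ we have
$$\mathbf D_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = \mathbb D_p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big), \quad p \ge 1. \tag{82}$$

Proof.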

We start by proving that the limit (81) exists on the set $\mathbb X$. To see this, we notice that $nU_1(s) \uparrow I(s)$ for every $s \in [0, \infty)$. In particular, using the explicit formulation of $\mathbf D_{\mathrm{ET},n}$ proved in Proposition 2 (we remark that the fact that $\mathsf D_{\mathrm{ET}}$ is a distance has not been used in the proof of that proposition), we can infer that $\mathbf D_{\mathrm{ET},n}$ is nondecreasing in $n$, so that the limit exists.

It remains to prove that for every $p \ge 1$ we have $\mathbf D_p = \mathbb D_p$ on the set $\mathbb X_{1,p}$. Since for every $n \in \mathbb N$, every complete and separable metric space $(X, \mathsf d)$ and every $\mu, \gamma \in \mathcal M(X)$ we have $D_{F_n}(\gamma \| \mu) \le D_I(\gamma \| \mu)$, it is clear that $\mathbf D_p \le \mathbb D_p$. For the converse inequality, we know that for all $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2) \in \mathbb X_{1,p}$ we have
$$\mathbb D_p^p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \ge \mathbf D_p^p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \ge \mathbf D_{\mathrm{ET},n}^p\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = \sum_{i=1}^2 D_{F_n}(\gamma_{n,i} \| \mu_i) + \int_{X_1 \times X_2} \hat{\mathsf d}_n^p(x, y)\, d\gamma_n, \tag{83}$$
for some $\gamma_n \in \mathcal M(X_1 \times X_2)$ and metric coupling $\hat{\mathsf d}_n$ of $\mathsf d_1$ and $\mathsf d_2$, whose existence is a consequence of Lemma 6 (notice that only the properties of the cost and the entropy are used in the proof of that lemma, while the fact that $\mathsf D_{\mathrm{ET}}$ is a metric plays no role). Since $D_{F_n}$ is bounded from above by the superlinear entropy $D_I$, by using Lemma 2 and Lemma 3 we can infer that $\gamma_n$ converges weakly (up to a subsequence) to a limit $\gamma \in \mathcal M(X_1 \times X_2)$ and
$$\liminf_{n \to \infty} \sum_{i=1}^2 D_{F_n}(\gamma_{n,i} \| \mu_i) \ge \sum_{i=1}^2 D_I(\gamma_i \| \mu_i).$$
By reasoning as in Steps 2 and 3 of Lemma 6, we know that there exists a pseudo-metric coupling $\hat{\mathsf d}$ of $\mathsf d_1$ and $\mathsf d_2$ such that $\hat{\mathsf d}_n$ converges (up to a subsequence) pointwise to $\hat{\mathsf d}$, uniformly on compact sets. Moreover, we have
$$\liminf_{n \to \infty} \int_{X_1 \times X_2} \hat{\mathsf d}_n^p(x, y)\, d\gamma_n \ge \int_{X_1 \times X_2} \hat{\mathsf d}^p(x, y)\, d\gamma, \tag{84}$$
and the result follows since $\hat{\mathsf d}$ and $\gamma$ are competitors in the explicit formulation of $\mathbb D_p$ (see [41, Lemma 3.3]), as a consequence of the fact that
$$D_I(\gamma_i \| \mu_i) < \infty \iff \gamma_i = \mu_i, \quad i = 1, 2. \qquad □$$

Remark 5.

We point out that we are not claiming that the sequence $\mathsf D_{\mathrm{ET},n}$ defined in Theorem 7 is a sequence of regular Entropy-Transport distances. This is actually the case for $p = 2$, as a consequence of [30, Theorem 7.25]: the cost of the Entropy-Transport problem induced by $(nU_1, \mathsf d^2)$ is $n$ times the cost of the Entropy-Transport problem induced by $(U_1, (\mathsf d/\sqrt n)^2)$, and $\mathsf d/\sqrt n$ is trivially a complete and separable distance. In this situation, by reasoning as in the proof of Theorem 5, one can show that $\mathbf D_2$ defines a metric on the whole set $\mathbb X$, possibly attaining the value $+\infty$.
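The monotone limit behind Theorem 7 can be made explicit on the simplest pair of measures. Assume, as an illustration, the logarithmic entropy $U_1(s) = s\log s - s + 1$, and take two unit Dirac masses $\delta_x$, $\delta_y$ at distance $\rho > 0$ in a fixed metric space. Since $nU_1$ is superlinear, the only admissible plans are $\theta\,\delta_{(x,y)}$. Minimizing $2nU_1(\theta) + \theta\rho^p$ gives $\theta^* = e^{-\rho^p/(2n)}$ and the cost $2n(1 - e^{-\rho^p/(2n)})$, which increases to $\rho^p = W_p^p(\delta_x, \delta_y)$ as $n \to \infty$. The Python sketch below (hypothetical helper names) checks this closed form against a direct minimization. It works at the level of measures in one space, not with the infimum over embeddings.

```python
import math


def et_cost_diracs(n, dist, p):
    """ET cost for F_n = n U_1 and c = d^p between unit Diracs at distance dist.

    Closed form 2 n (1 - exp(-dist^p / (2 n))), written with expm1 for accuracy.
    """
    return -2 * n * math.expm1(-dist ** p / (2 * n))


def et_cost_diracs_numeric(n, dist, p, iters=200):
    """Same cost by ternary search over theta in (0, 1] of 2 n U_1(theta) + theta dist^p."""
    def f(th):
        return 2 * n * (th * math.log(th) - th + 1) + th * dist ** p

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)
```

Raising the cost to the power $a = 1/p$ gives $\mathsf D_{\mathrm{ET},n}(\delta_x, \delta_y)$. For instance, with $\rho = 1.5$ and $p = 2$, the square root of `et_cost_diracs(n, 1.5, 2)` increases with $n$ towards $1.5 = W_2(\delta_x, \delta_y)$.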

4.3. Piccoli-Rossi distance.

A natural extension of the $W_1$-metric in the context of Entropy-Transport problems is the Piccoli-Rossi generalized Wasserstein distance $\mathsf{BL}$ [34, 35], induced by the choices
$$F(s) = |s - 1|, \qquad c(x_1, x_2) = \mathsf d(x_1, x_2). \tag{85}$$
We notice that the entropy function is not superlinear.

It is proved in [34] that $\mathsf{BL}$ is a complete distance on $\mathcal M(X)$ for every Polish space $(X, \mathsf d)$ ([34] works in the Euclidean setting, but the proof for a Polish space can be performed verbatim). By exploiting the dual formulation of this distance, we know that $\mathsf{BL}$ corresponds to the so-called flat metric or bounded Lipschitz distance (see [35, Theorem 2]), namely
$$\mathsf{BL}(\mu_1, \mu_2) = \sup\Big\{ \int_X f\, d(\mu_1 - \mu_2) : \|f\|_\infty \le 1,\ \|f\|_{\mathrm{Lip}} \le 1 \Big\} \quad \text{for any } \mu_1, \mu_2 \in \mathcal M(X). \tag{86}$$
We also recall the following useful lemma, which is proved in [34, Proposition 1].

Lemma 8.

Given $\mu_1, \mu_2 \in \mathcal M(X)$, let us consider the Entropy-Transport problem induced by $(F, c)$ defined in (85). Then the infimum of problem (25) is attained by a measure $\gamma \in \mathcal M(X \times X)$ such that $\gamma_i := (\pi^i)_\sharp \gamma \le \mu_i$, $i = 1, 2$.

We have the following result.

Theorem 8.

Fix $a = 1$, $\ell(\mathsf d) := \mathsf d$, and let us consider the sequence $(F_n)_{n \ge 1}$ defined by
$$F_n(s) := \begin{cases} |s - 1| & \text{if } 0 \le s \le n,\\ (s - n - \ \cdots) & \text{if } s > n. \end{cases}$$
Let us denote by $\mathsf D_{\mathrm{ET},n}$ the Entropy-Transport cost induced by $a$, $F_n$ and $c = \ell(\mathsf d)$. Then, for all metric measure spaces $(X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2) \in \mathbb X$, the quantity
$$\mathbf{BL}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) := \mathbf D_{\mathrm{ET},n}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) \ \text{ is well defined}, \tag{87}$$
where $\mathbf D_{\mathrm{ET},n}$ denotes the function defined as in Definition 5 upon replacing $\mathsf D_{\mathrm{ET}}$ by $\mathsf D_{\mathrm{ET},n}$. Moreover, $\mathbf{BL}$ defines a complete metric on $\mathbb X$ such that
$$\mathbf{BL}\big((X_1, \mathsf d_1, \mu_1), (X_2, \mathsf d_2, \mu_2)\big) = \inf \mathsf{BL}\big((\psi_1)_\sharp \mu_1, (\psi_2)_\sharp \mu_2\big), \tag{88}$$
where the infimum on the right-hand side is taken over all complete and separable metric spaces $(\hat X, \hat{\mathsf d})$ with isometric embeddings $\psi_1 : \mathrm{supp}(\mu_1) \to \hat X$ and $\psi_2 : \mathrm{supp}(\mu_2) \to \hat X$.

Proof. We notice that $(F_n)_{n \ge 1}$ is a sequence of continuous superlinear entropy functions. We also know that $F_n(s) = |s - 1|$ in $[0, 1]$ and $F_n(s) \ge |s - 1|$ in $[0, \infty)$ for every $n \ge 1$, which implies that $\mathsf D_{\mathrm{ET},n}$ coincides with $\mathsf{BL}$ thanks to Lemma 8. In particular, $\mathbf D_{\mathrm{ET},n}$ does not depend on $n$, and the identity (88) follows. The fact that $\mathbf{BL}$ is a complete distance on $\mathbb X$ is a consequence of the completeness of $\mathsf{BL}$ (and thus of $\mathsf D_{\mathrm{ET},n}$) on the set of measures $\mathcal M(X)$, and can be proved along the lines of Step 2 in the proof of Theorem 2. □

Remark 6.

We observe that the sequence $\mathsf D_{\mathrm{ET},n}$ defined in Theorem 8 is not a sequence of regular Entropy-Transport distances. The problem here is that the topology induced by the distance $\mathsf{BL}$ does not coincide with the weak topology, but requires an additional tightness condition (see [34, Theorem 3] for all the details).

4.4. Bounds between distances.

The aim of this last short section is to give some explicitbounds between the distances discussed in the paper.

Proposition 4.

Let us denote by $\mathsf{HK}$, $\mathsf{GHK}$, $\mathsf{QPL}_p$ (for < p ≤ ) and $\mathsf{LPL}_p$ (for p > ) the regular Entropy-Transport distances defined in Examples (1), (2), (3) and (4), respectively. Accordingly, we denote by $\mathbf D_{\mathsf{HK}}$, $\mathbf D_{\mathsf{GHK}}$, $\mathbf D_{\mathsf{QPL}_p}$ and $\mathbf D_{\mathsf{LPL}_p}$ the induced Sturm-Entropy-Transport distances. The following inequalities hold:
(1) $\mathbf D_{\mathsf{GHK}} \le \mathbf D_{\mathsf{HK}}$.
(2) $\mathbf D_{\mathsf{QPL}_p} \le \mathbf D_{\mathsf{GHK}} \le \sqrt p\, \mathbf D_{\mathsf{QPL}_p}$, < p ≤ .
(3) $\mathbf D_{\mathsf{LPL}_p} \le \mathbf{PL}_p$, p > .
Moreover, for every regular Entropy-Transport distance $\mathsf D_{\mathrm{ET}}$ induced by $(1/p, F, \ell)$, where $p \ge 1$, $F \in \Gamma(\mathbb R_+)$ and $\ell(\mathsf d) = \mathsf d^p$, we have:
(4) $\mathbf D_{\mathrm{ET}} \le \mathbb D_p$, $p \ge 1$.

Proof. (1) is a consequence of the bound proved in [30, Section 7.8].
(2) follows from the corresponding inequality proved in [14, Proposition 7].
(3) has been shown along the lines of the proof of Theorem 5 (notice that $\mathbf D_{\mathsf{LPL}_p}$ equals $\mathbf D_{p,1}$ in the notation of that theorem).
(4) is a consequence of the explicit formulations of $\mathbb D_p$ and $\mathbf D_{\mathrm{ET}}$, by noticing that for any $F \in \Gamma(\mathbb R_+)$ we have $F \le I$, where $I$ has been defined in (80). □

References

[1] D. Alvarez-Melis, T.S. Jaakkola, “Gromov-Wasserstein Alignment of Word Embedding Spaces”, Proc. 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1881–1890, (2018).
[2] L. Ambrosio, “Calculus, heat flow and curvature-dimension bounds in metric measure spaces”, Proceedings of the ICM 2018, Rio de Janeiro, Vol. 1, pp. 301–340.
[3] L. Ambrosio, N. Gigli, A. Mondino, T. Rajala, “Riemannian Ricci curvature lower bounds in metric measure spaces with σ-finite measure”, Trans. Amer. Math. Soc., Vol. 367, (2015), no. 7, pp. 4661–4701.
[4] L. Ambrosio, N. Gigli, G. Savaré, “Metric measure spaces with Riemannian Ricci curvature bounded from below”, Duke Math. J., Vol. 163, (2014), pp. 1405–1490.
[5] L. Ambrosio, A. Mondino, G. Savaré, “Nonlinear diffusion equations and curvature conditions in metric measure spaces”, Mem. Amer. Math. Soc., Vol. 262, no. 1270, (2019).
[6] A. M. Bronstein, M. M. Bronstein, R. Kimmel, “Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching”, Proc. Nat. Acad. of Sciences, Vol. 103, no. 5, (2006), pp. 1168–1172.
[7] C. Bunne, D. Alvarez-Melis, A. Krause, S. Jegelka, “Learning generative models across incomparable spaces”, Proc. Intern. Conference on Machine Learning, Long Beach, California, PMLR 97, (2019).
[8] F. Cavalletti, E. Milman, “The Globalization Theorem for the Curvature Dimension Condition”, preprint arXiv:1612.07623.
[9] J. Cheeger, T. Colding, “On the structure of spaces with Ricci curvature bounded below I”, J. Diff. Geom. (1997), pp. 406–480.
[10] L. Chizat, F. Bach, “On the global convergence of gradient descent for over-parameterized models using optimal transport”, Advances in Neural Information Processing Systems, pp. 3036–3046, (2018).
[11] L. Chizat, G. Peyré, B. Schmitzer, F-X. Vialard, “Unbalanced optimal transport: dynamic and Kantorovich formulations”, J. Funct. Anal. 274.11, (2018), pp. 3090–3123.
[12] S. Chowdhury, F. Mémoli, “The Gromov–Wasserstein distance between networks and stable network invariants”, Information and Inference: A Journal of the IMA, Vol. 8, no. 4, (2019), pp. 757–787.
[13] I. Csiszár, “Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten”, Magyar Tud. Akad. Mat. Kutató Int. Közl., 8, (1963), pp. 85–108.
[14] N. De Ponti, “Metric properties of homogeneous and spatially inhomogeneous F-divergences”, IEEE Transactions on Information Theory, 5, Vol. 66, (2020).
[15] N. De Ponti, “Optimal transport: entropic regularizations, geometry and diffusion PDEs”, PhD Thesis, http://cvgmt.sns.it/paper/4525/, (2019).
[16] M. Erbar, K. Kuwada, K.T. Sturm, “On the Equivalence of the Entropic Curvature-Dimension Condition and Bochner’s Inequality on Metric Measure Space”, Invent. Math., Vol. 201, no. 3, (2015), pp. 993–1071.
[17] J. Feydy, P. Roussillon, A. Trouvé, P. Gori, “Fast and scalable optimal transport for brain tractograms”, Intern. Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 636–644, Springer, (2019).
[18] C. Frogner, C. Zhang, H. Mobahi, M. Araya, T.A. Poggio, “Learning with a Wasserstein loss”, Advances in Neural Information Processing Systems, pp. 2053–2061, (2015).
[19] K. Fukaya, “Collapsing of Riemannian manifolds and eigenvalues of Laplace operator”, Invent. Math., Vol. 87, no. 3, (1987), pp. 517–547.
[20] N. Gigli, “On the differential structure of metric measure spaces and applications”, Mem. Amer. Math. Soc. 236 (2015), no. 1113, vi+91.
[21] N. Gigli, A. Mondino, G. Savaré, “Convergence of pointed non-compact metric measure spaces and stability of Ricci curvature bounds and heat flows”, Proc. London Math. Soc. (3), Vol. 111, (2015), pp. 1071–1129.
[22] N. Gigli, E. Pasqualetto, “Lectures on Nonsmooth Differential Geometry”, SISSA Springer Series, 2, Springer International Publishing, (2020).
[23] J. Gilmer, S.S. Schoenholz, P.F. Riley, O. Vinyals, G.E. Dahl, “Neural message passing for quantum chemistry”, Proc. Intern. Conference on Machine Learning, Vol. 70, pp. 1263–1272, (2017).
[24] E. Grave, A. Joulin, Q. Berthet, “Unsupervised alignment of embeddings with Wasserstein procrustes”, Intern. Conference on Artificial Intelligence and Statistics, pp. 1880–1890, (2019).
[25] M. Gromov, “Metric structures for Riemannian and non-Riemannian spaces”, Progress in Mathematics, 152, Birkhäuser Boston, Inc., Boston, (1999).
[26] S. Kondratyev, L. Monsaingeon, D. Vorotnikov, “A new optimal transport distance on the space of finite Radon measures”, Adv. Differential Equations, Vol. 21, no. 11/12, (2016), pp. 1117–1164.
[27] M. Ledoux, “The Concentration of Measure Phenomenon”, Math. Surveys and Monographs, Vol. 89, American Math. Soc., (2001).
[28] J. Lee, N.P. Bertrand, C.J. Rozell, “Parallel unbalanced optimal transport regularization for large scale imaging problems”, preprint arXiv:1909.00149.
[29] M. Liero, A. Mielke, G. Savaré, “Optimal Transport in Competition with Reaction: The Hellinger–Kantorovich Distance and Geodesic Curves”, SIAM J. Math. Analysis, 48(4), (2016), pp. 2869–2911.
[30] M. Liero, A. Mielke, G. Savaré, “Optimal entropy-transport problems and a new Hellinger-Kantorovich distance between positive measures”, Inventiones Mathematicae, 3, Vol. 211, (2018), pp. 969–1117.
[31] J. Lott, C. Villani, “Ricci curvature for metric-measure spaces via optimal transport”, Ann. of Math., 169 (2009), pp. 903–991.
[32] G. Luise, G. Savaré, “Contraction and regularizing properties of heat flows in metric measure spaces”, Discrete and Continuous Dynamical Systems Series S, early access, 10.3934/dcdss.2020327, (2020).
[33] F. Mémoli, “Gromov-Wasserstein distances and the metric approach to object matching”, Foundations of Computational Mathematics, 4, Vol. 11, (2011), pp. 417–487.
[34] B. Piccoli, F. Rossi, “Generalized Wasserstein distance and its application to transport equations with source”, Arch. Ration. Mech. Anal., 211, (2014), pp. 335–358.
[35] B. Piccoli, F. Rossi, “On properties of the Generalized Wasserstein distance”, Arch. Ration. Mech. Anal. 222, (2016), pp. 1339–1365.
[36] G. Rotskoff, S. Jelassi, J. Bruna, E. Vanden-Eijnden, “Global convergence of neuron birth-death dynamics”, Proc. Intern. Conference on Machine Learning, Long Beach, California, PMLR 97, (2019).
[37] T. Séjourné, J. Feydy, F-X. Vialard, A. Trouvé, G. Peyré, “Sinkhorn Divergences for Unbalanced Optimal Transport”, preprint arXiv:1910.12958.
[38] T. Séjourné, F.X. Vialard, G. Peyré, “The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation”, preprint arXiv:2009.04266.
[39] B. Schmitzer, C. Schnörr, “Modelling convex shape priors and matching based on the Gromov-Wasserstein distance”, Journal of Math. Imaging and Vision, 1, Vol. 46, (2013), pp. 143–159.
[40] T. Shioya, “Metric measure geometry”, Vol. 25, IRMA Lectures in Math. and Theoretical Phys., EMS Publishing House, Zürich, (2016), pp. xi+182.