The ultrametric Gromov-Wasserstein distance

Abstract

In this paper, we investigate compact ultrametric measure spaces, which form a subset $\mathcal{U}^w$ of the collection of all metric measure spaces $\mathcal{M}^w$. In analogy with the ultrametric Gromov-Hausdorff distance on the collection of ultrametric spaces $\mathcal{U}$, we define ultrametric versions of two metrics on $\mathcal{U}^w$, namely of Sturm's distance of order $p$ and of the Gromov-Wasserstein distance of order $p$. We study the basic topological and geometric properties of these distances as well as their relation, and derive for $p=\infty$ a polynomial time algorithm for their calculation. Further, several lower bounds for both distances are derived and some of our results are generalized to the case of finite ultra-dissimilarity spaces.


FACUNDO MÉMOLI, AXEL MUNK, ZHENGCHAO WAN, AND CHRISTOPH WEITKAMP

Contents

1. Introduction
1.1. The proposed approach
1.2. Overview of our results
1.3. Related work
2. Preliminaries
2.1. Ultrametric spaces and dendrograms
2.2. The ultrametric Gromov-Hausdorff distance
2.3. Wasserstein distance on ultrametric spaces
3. Ultrametric Gromov-Wasserstein distances
3.1. Sturm's ultrametric Gromov-Wasserstein distance
3.2. The ultrametric Gromov-Wasserstein distance
3.3. The relation between $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$
… $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$
4.2. Completeness and separability
4.3. Geodesic property
5. Lower bounds of $u_{\mathrm{GW},p}$
… $u_{\mathrm{GW},p}$ on ultra-dissimilarity spaces
… $u_{\mathrm{GW},p}$ and $\mathbf{SLB}^{\mathrm{ult}}_p$
8.1. $\mathbf{SLB}^{\mathrm{ult}}_1$ based phylogenetic tree shape comparison
8.2. $\mathbf{SLB}_p$ based phylogenetic tree shape comparison
9. Concluding remarks
Acknowledgements
References

1. Introduction

Over the last decade, the acquisition of ever more complex data, structures and shapes has increased drastically. Consequently, the need to develop meaningful methods for comparing general objects has become more and more apparent. In numerous applications in molecular biology (Holm and Sander, 1993; Kufareva and Abagyan, 2011), computer vision (Lowe, 2001; Jain and Dorai, 2000) and electrical engineering (Papazov et al., 2012; Kuo et al., 2014) it is important to distinguish between different objects, but to regard the same object in different spatial orientations as equal. Furthermore, the comparison of graphs, trees and networks, where mainly the underlying connectivity structure matters, has also grown in importance (Chen and Safro, 2011; Dong and Sawin, 2020). One way to compare two general objects in a pose-invariant manner is to model them as metric spaces $(X,d_X)$ and $(Y,d_Y)$ and to regard them as elements of the collection of isometry classes of compact metric spaces, denoted by $\mathcal{M}$ (i.e., two compact metric spaces $(X,d_X)$ and $(Y,d_Y)$ are in the same class if and only if they are isometric to each other, which we denote by $X\cong Y$). Then, it is possible to compare $(X,d_X)$ and $(Y,d_Y)$ via the Gromov-Hausdorff distance, which defines a distance on $\mathcal{M}$. The Gromov-Hausdorff distance between $(X,d_X)$ and $(Y,d_Y)$ is defined as
$$d_{\mathrm{GH}}(X,Y) := \inf_{Z,\phi,\psi} d_{\mathrm{H}}^{(Z,d_Z)}\big(\phi(X),\psi(Y)\big), \tag{1}$$
where $\phi:X\to Z$ and $\psi:Y\to Z$ are isometric embeddings into a metric space $(Z,d_Z)$ and $d_{\mathrm{H}}^{(Z,d_Z)}$ denotes the Hausdorff distance on $Z$. The Hausdorff distance is a metric on the collection of compact subsets of a metric space $(Z,d_Z)$, which is denoted by $\mathcal{S}(Z)$, and is defined for $A,B\in\mathcal{S}(Z)$ as
$$d_{\mathrm{H}}^{(Z,d_Z)}(A,B) := \max\Big(\sup_{a\in A}\inf_{b\in B} d_Z(a,b),\ \sup_{b\in B}\inf_{a\in A} d_Z(a,b)\Big). \tag{2}$$
While the Gromov-Hausdorff distance has been applied successfully to various shape and data analysis tasks (see e.g. Mémoli and Sapiro (2004); Bronstein et al. (2006a,b, 2009a,b); Chazal et al. (2009); Bronstein et al. (2010); Carlsson and Mémoli (2010)), it turns out that it is generally convenient to equip the modelled objects with more structure and to model them as metric measure spaces (Mémoli, 2007, 2011). A metric measure space $\mathcal{X}=(X,d_X,\mu_X)$ is a triple, where $(X,d_X)$ denotes a metric space and $\mu_X$ stands for a Borel probability measure on $X$ with full support. This additional probability measure can be thought of as signalling the importance of different regions in the modelled object. Moreover, two metric measure spaces $\mathcal{X}=(X,d_X,\mu_X)$ and $\mathcal{Y}=(Y,d_Y,\mu_Y)$ are considered isomorphic (denoted by $\mathcal{X}\cong_w\mathcal{Y}$) if and only if there exists an isometry $\varphi:(X,d_X)\to(Y,d_Y)$ such that $\varphi_\#\mu_X=\mu_Y$. Here, $\varphi_\#$ denotes the pushforward map. From now on, $\mathcal{M}^w$ denotes the collection of all (isomorphism classes of) compact metric measure spaces.

The additional structure of metric measure spaces allows us to regard the modelled objects as probability measures instead of compact sets. Hence, it is possible to replace the Hausdorff component in (1) by a relaxed notion of proximity, namely the Wasserstein distance. This distance is fundamental to a variety of mathematical developments and is also known as the Kantorovich distance (Kantorovich, 1942), the Kantorovich-Rubinstein distance (Kantorovich and Rubinstein, 1958), the Mallows distance (Mallows, 1972) or the Earth Mover's distance (Rubner et al., 2000). Given a compact metric space $(Z,d_Z)$, let $\mathcal{P}(Z)$ denote the space of probability measures on $Z$ and let $\alpha,\beta\in\mathcal{P}(Z)$.
Then, the Wasserstein distance of order $p$, for $1\le p<\infty$, between $\alpha$ and $\beta$ is defined as
$$d_{W,p}^{(Z,d_Z)}(\alpha,\beta) := \Big(\inf_{\mu\in\mathcal{C}(\alpha,\beta)}\int_{Z\times Z} d_Z^p(x,y)\,\mu(dx\times dy)\Big)^{1/p}, \tag{3}$$
and for $p=\infty$ as
$$d_{W,\infty}^{(Z,d_Z)}(\alpha,\beta) := \inf_{\mu\in\mathcal{C}(\alpha,\beta)}\ \sup_{(x,y)\in\mathrm{supp}(\mu)} d_Z(x,y), \tag{4}$$
where $\mathcal{C}(\alpha,\beta)$ denotes the set of all couplings of $\alpha$ and $\beta$, i.e., the set of all measures $\mu$ on the product space $Z\times Z$ such that $\mu(A\times Z)=\alpha(A)$ and $\mu(Z\times B)=\beta(B)$ for all measurable sets $A$ and $B$ of $Z$. Since the space $Z$ is compact, $d_{W,p}^{(Z,d_Z)}$ defines a metric on $\mathcal{P}(Z)$. It is well known (Villani, 2003) that the Wasserstein distance between probability measures on the real line admits a closed form solution (cf. Remark 2.18).

Sturm (2006) has shown that replacing the Hausdorff distance in (1) with the Wasserstein distance indeed yields a meaningful metric on $\mathcal{M}^w$. Let $\mathcal{X}=(X,d_X,\mu_X)$ and $\mathcal{Y}=(Y,d_Y,\mu_Y)$ be two metric measure spaces. Then, Sturm's Gromov-Wasserstein distance of order $p$, $1\le p\le\infty$, is defined as
$$d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) := \inf_{Z,\phi,\psi} d_{W,p}^{(Z,d_Z)}(\phi_\#\mu_X,\psi_\#\mu_Y), \tag{5}$$
where $\phi:X\to Z$ and $\psi:Y\to Z$ are isometric embeddings into the metric space $(Z,d_Z)$. Based on similar ideas but a different representation of the Gromov-Hausdorff distance, Mémoli (2007, 2011) derived a computationally more tractable and topologically equivalent metric on $\mathcal{M}^w$, namely the Gromov-Wasserstein distance. Let $\mathcal{C}(\mu_X,\mu_Y)$ denote the set of couplings of $\mu_X$ and $\mu_Y$. For $p\in[1,\infty)$, the $p$-distortion of $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ is defined as
$$\mathrm{dis}_p(\mu) := \Big(\iint_{X\times Y\times X\times Y}\big|d_X(x,x')-d_Y(y,y')\big|^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\Big)^{1/p} \tag{6}$$
and for $p=\infty$ it is given as
$$\mathrm{dis}_\infty(\mu) := \sup_{\substack{x,x'\in X,\ y,y'\in Y\\ (x,y),(x',y')\in\mathrm{supp}(\mu)}}\big|d_X(x,x')-d_Y(y,y')\big|,$$
where $\mathrm{supp}(\mu)$ denotes the support of $\mu$.
The Gromov-Wasserstein distance of order $p$, $1\le p\le\infty$, is defined as
$$d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) := \frac{1}{2}\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\mathrm{dis}_p(\mu). \tag{7}$$
Although both $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$, $1\le p\le\infty$, are in general considered NP-hard to compute (Mémoli, 2011), it is possible to efficiently approximate local minima of $d_{\mathrm{GW},p}$ via conditional gradient descent (Mémoli, 2011; Peyré et al., 2016). This has led to numerous applications and extensions of this distance (Alvarez-Melis and Jaakkola, 2018; Titouan et al., 2019; Bunne et al., 2019; Chowdhury and Needham, 2020).

Clearly, the set $\mathcal{M}^w$ contains various extremely general spaces. However, in many applications one has prior knowledge about the metric measure spaces under consideration, and it is often reasonable to restrict oneself to a specific subset $\mathcal{O}^w\subseteq\mathcal{M}^w$. For instance, it could be known that the metrics of the spaces considered are induced by the shortest path metric on some underlying trees, in which case it is unnecessary to consider $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$, $1\le p\le\infty$, on all of $\mathcal{M}^w$. The potential advantages of focusing on a specific subset $\mathcal{O}^w$ are twofold. On the one hand, it might be possible to use the features of $\mathcal{O}^w$ to gain computational benefits. On the other hand, it might be possible to refine the definitions of $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$, $1\le p\le\infty$, to obtain a more informative comparison on $\mathcal{O}^w$. Naturally, it is of interest to identify and study these subclasses and the corresponding refinements. This approach has been pursued to study (variants of) the Gromov-Hausdorff distance on compact ultrametric spaces by Zarichnyi (2005) and Qiu (2009), and on compact $p$-metric spaces in Mémoli et al. (2019). Here, a metric space $(X,d_X)$ is called a $p$-metric space ($1\le p<\infty$) if for all $x,x',x''\in X$ it holds that $d_X(x,x'')\le\big(d_X(x,x')^p+d_X(x',x'')^p\big)^{1/p}$.
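As a small illustration of the $p$-metric condition just stated, the following Python sketch checks it by brute force over all triples of a finite space given by a symmetric distance matrix. The function name `satisfies_p_metric` and the floating-point tolerance are illustrative assumptions, not part of the paper; passing `p = math.inf` replaces the $p$-sum by a maximum, which is the ultrametric inequality (8).

```python
import itertools
import math


def satisfies_p_metric(D, p):
    """Check d(x, x'') <= (d(x, x')^p + d(x', x'')^p)^(1/p) for all triples.

    D is a symmetric distance matrix (list of lists) of a finite space.
    For p = math.inf the right-hand side becomes max(d(x, x'), d(x', x'')),
    i.e. the ultrametric inequality.
    """
    n = len(D)
    tol = 1e-12  # hypothetical slack for floating-point round-off
    for i, j, k in itertools.product(range(n), repeat=3):
        a, b = D[i][j], D[j][k]
        if math.isinf(p):
            bound = max(a, b)
        else:
            bound = (a ** p + b ** p) ** (1.0 / p)
        if D[i][k] > bound + tol:
            return False
    return True


# Points 0, 1, 2 on the real line: a metric (p = 1) but not a 2-metric.
line = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Two points at distance 1 and a third point at distance 2 from both: an ultrametric.
ultra = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
```

Because $\max(a,b)\le(a^p+b^p)^{1/p}$ for every $p$, a matrix that passes the $p=\infty$ check passes every finite $p$ as well, which is one way to read the statement that ultrametrics are the limiting case of $p$-metrics.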
Further, a metric space $(X,u_X)$ is called an ultrametric space if for all $x,x',x''\in X$ it holds that
$$u_X(x,x'')\le\max\big(u_X(x,x'),u_X(x',x'')\big), \tag{8}$$
i.e., ultrametrics can be considered as the limiting case of $p$-metrics as $p\to\infty$. In particular, Mémoli et al. (2019) derived a polynomial time algorithm for the calculation of the ultrametric Gromov-Hausdorff distance between two compact ultrametric spaces $(X,u_X)$ and $(Y,u_Y)$ (see Section 2.2), which is defined as
$$u_{\mathrm{GH}}(X,Y) := \inf_{Z,\phi,\psi} d_{\mathrm{H}}^{(Z,u_Z)}\big(\phi(X),\psi(Y)\big), \tag{9}$$
where $\phi:X\to Z$ and $\psi:Y\to Z$ are isometric embeddings into an ultrametric space $(Z,u_Z)$ and $d_{\mathrm{H}}^{(Z,u_Z)}$ denotes the Hausdorff distance on $Z$.

A further motivation to study (surrogates of) the distances $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$ restricted to a subset $\mathcal{O}^w$ comes from the idea of slicing, which originated as a method to efficiently estimate the Wasserstein distance $d_{W,p}^{\mathbb{R}^d}(\alpha,\beta)$ between probability measures $\alpha$ and $\beta$ supported in a high dimensional Euclidean space $\mathbb{R}^d$. The original idea is that, given any line $\ell$ in $\mathbb{R}^d$, one first obtains $\alpha_\ell$ and $\beta_\ell$, the respective pushforwards of $\alpha$ and $\beta$ under the orthogonal projection map $\pi_\ell:\mathbb{R}^d\to\ell$, and then invokes the explicit formula for the Wasserstein distance between probability measures on $\mathbb{R}$ (see Remark 2.18) to obtain a lower bound for $d_{W,p}^{\mathbb{R}^d}(\alpha,\beta)$ without incurring the possibly high computational cost of solving an optimal transport problem. This lower bound is improved via repeated (often random) selections of the line $\ell$ (Rubner et al., 2000; Bonneel et al., 2015; Kolouri et al., 2019).
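The slicing idea above rests on the closed form for the Wasserstein distance on $\mathbb{R}$. The text defers that formula to Remark 2.18, outside this excerpt; the sketch below assumes the standard quantile-function form $d_{W,p}^{\mathbb{R}}(\alpha,\beta)^p=\int_0^1|F_\alpha^{-1}(s)-F_\beta^{-1}(s)|^p\,ds$ for discretely supported measures and uses it to compute the single-slice lower bound for one direction $\ell$. The function names `wasserstein_1d` and `projected_lower_bound` are hypothetical.

```python
import numpy as np


def wasserstein_1d(x, a, y, b, p=1.0):
    """W_p between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j} on the real line.

    Evaluates int_0^1 |F^{-1}(s) - G^{-1}(s)|^p ds, whose integrand is constant
    between consecutive jump levels of the two cumulative distribution functions.
    """
    x, a = np.asarray(x, float), np.asarray(a, float)
    y, b = np.asarray(y, float), np.asarray(b, float)
    ix, iy = np.argsort(x), np.argsort(y)
    x, a, y, b = x[ix], a[ix] / a.sum(), y[iy], b[iy] / b.sum()
    Fa, Fb = np.cumsum(a), np.cumsum(b)
    Fa[-1] = Fb[-1] = 1.0  # guard against round-off in the total mass
    # Merge all jump levels; on each sub-interval both quantile functions are constant.
    s = np.unique(np.concatenate(([0.0], Fa, Fb)))
    mid = 0.5 * (s[:-1] + s[1:])
    qa = x[np.minimum(np.searchsorted(Fa, mid), len(x) - 1)]
    qb = y[np.minimum(np.searchsorted(Fb, mid), len(y) - 1)]
    return float(np.sum(np.diff(s) * np.abs(qa - qb) ** p) ** (1.0 / p))


def projected_lower_bound(X, a, Y, b, direction, p=1.0):
    """One slice: project point clouds in R^d onto a unit direction, then use the 1D formula.

    The projection is 1-Lipschitz, so the result never exceeds W_p in R^d.
    """
    u = np.asarray(direction, float)
    u = u / np.linalg.norm(u)
    return wasserstein_1d(np.asarray(X, float) @ u, a, np.asarray(Y, float) @ u, b, p)
```

Taking the average or the maximum of `projected_lower_bound` over many (e.g., random) directions gives the repeated-slicing estimate described above; each slice costs a sort rather than an optimal transport solve.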
Recently, Le et al. (2019b) pointed out that, thanks to the fact that the 1-Wasserstein distance also admits an explicit formula when the underlying metric space is a tree (Do Ba et al., 2011; Evans and Matsen, 2012; McGregor and Stubbs, 2013), one can also devise tree-slicing estimates of the distance between two given probability measures by suitably projecting them onto tree-like structures. Most likely, the same strategy is successful for suitable projections onto random ultrametric spaces, as on these there is also an explicit formula for the Wasserstein distance (Kloeckner, 2015). The same line of work has also recently been explored in the Gromov-Wasserstein setting (Vayer et al., 2019; Le et al., 2019a) and could be extended based on efficiently computable restrictions (or surrogates) of $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$.

Inspired by the results of Mémoli et al. (2019) on the ultrametric Gromov-Hausdorff distance and the results of Kloeckner (2015), who derived an explicit representation of the Wasserstein distance on ultrametric spaces, we study in this paper the collection of compact ultrametric measure spaces $\mathcal{U}^w\subseteq\mathcal{M}^w$, where $\mathcal{X}=(X,u_X,\mu_X)\in\mathcal{U}^w$ whenever the underlying metric space $(X,u_X)$ is a compact ultrametric space. Ultrametric spaces (and thus also ultrametric measure spaces) arise naturally in statistics as metric encodings of dendrograms (Carlsson and Mémoli, 2010), in the context of phylogenetic trees (Semple et al., 2003) and in the probabilistic approximation of finite metric spaces (Bartal, 1996). Especially for dendrograms and phylogenetic trees, it is important to have a meaningful method of comparison, i.e., it is essential to have a meaningful metric on $\mathcal{U}^w$. However, it is evident from the definition of $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and the relation between $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$ (see Mémoli (2011)) that the ultrametric structure of $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ is lost in the computation of $d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ and $d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1\le p\le\infty$.
Hence, just as for the ultrametric Gromov-Hausdorff distance, we suggest adapting the definition of $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ (see (5)) as well as that of $d_{\mathrm{GW},p}$ (see (7)), and we verify in the following that this makes the comparison of ultrametric measure spaces more sensitive and, for $p=\infty$, leads to a polynomial time algorithm for the computation of the proposed metrics.

1.1. The proposed approach.

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be ultrametric measure spaces. Reconsidering the definition of Sturm's Gromov-Wasserstein distance in (5), it is clear that if we embed the ultrametric spaces $(X,u_X)$ and $(Y,u_Y)$ into an arbitrary metric space, then the ultrametric structure of the spaces $\mathcal{X}$ and $\mathcal{Y}$ may be lost in the embedding. Hence, in order to preserve the ultrametric structure in the comparison, we propose to take the infimum only over ultrametric spaces $(Z,u_Z)$ in (5). Thus, we define for $p\in[1,\infty]$ Sturm's ultrametric Gromov-Wasserstein distance of order $p$ as
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) := \inf_{Z,\phi,\psi} d_{W,p}^{(Z,u_Z)}(\phi_\#\mu_X,\psi_\#\mu_Y), \tag{10}$$
where $\phi:X\to Z$ and $\psi:Y\to Z$ are isometric embeddings into an ultrametric space $(Z,u_Z)$. In the subsequent sections of this paper, we will establish many theoretically appealing properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$. Unfortunately, we will also see that, although an explicit formula for the Wasserstein distance of order $p$ on ultrametric spaces exists (Kloeckner, 2015), for $p\in[1,\infty)$ the calculation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ yields a highly non-trivial combinatorial optimization problem (see Section 3.1.1). Therefore, we demonstrate that an adaptation of the Gromov-Wasserstein distance defined in (7) yields a topologically equivalent and easily approximable distance on $\mathcal{U}^w$. In order to define this adaptation, we need to introduce some notation. For $a,b\ge 0$ and $1\le q<\infty$ let $\Lambda_q(a,b) := |a^q-b^q|^{1/q}$. Further, define $\Lambda_\infty(a,b) := \max(a,b)$ whenever $a\ne b$ and $\Lambda_\infty(a,b)=0$ if $a=b$. In particular, note that $\Lambda_q(a,0)=a$ for any $a\ge 0$ and $1\le q\le\infty$. Then, we can rewrite $d_{\mathrm{GW},p}$, $1\le p<\infty$, as follows:
$$d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \frac{1}{2}\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\Big(\iint_{X\times Y\times X\times Y}\big(\Lambda_1(d_X(x,x'),d_Y(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\Big)^{1/p}. \tag{11}$$
Considering the derivation of $d_{\mathrm{GW},p}$ in Mémoli (2011) and the results on the closely related ultrametric Gromov-Hausdorff distance studied in Mémoli et al. (2019), this suggests replacing $\Lambda_1$ in (11) by $\Lambda_\infty$ in order to incorporate the ultrametric structures of $(X,u_X,\mu_X)$ and $(Y,u_Y,\mu_Y)$ into the comparison. Hence, we define the $p$-ultra-distortion of a coupling $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ for $1\le p<\infty$ as
$$\mathrm{dis}^{\mathrm{ult}}_p(\mu) := \Big(\iint_{X\times Y\times X\times Y}\big(\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\Big)^{1/p} \tag{12}$$
and for $p=\infty$ as
$$\mathrm{dis}^{\mathrm{ult}}_\infty(\mu) := \sup_{\substack{x,x'\in X,\ y,y'\in Y\\ (x,y),(x',y')\in\mathrm{supp}(\mu)}}\Lambda_\infty\big(u_X(x,x'),u_Y(y,y')\big),$$
where $\mathrm{supp}(\mu)$ denotes the support of $\mu$. Then, the ultrametric Gromov-Wasserstein distance of order $p\in[1,\infty]$ is given as
$$u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) := \inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\mathrm{dis}^{\mathrm{ult}}_p(\mu). \tag{13}$$
Due to the structural similarity of $d_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$, we can expect (and later verify) that many properties of $d_{\mathrm{GW},p}$ extend to $u_{\mathrm{GW},p}$. In particular, we will establish that $u_{\mathrm{GW},p}$ can also be approximated via conditional gradient descent and admits several polynomial time computable lower bounds that are useful in applications.

1.2. Overview of our results.

We give a brief overview of the results obtained.

Section 2.

We slightly generalize the results of Carlsson and Mémoli (2010) on the relation between ultrametric spaces and dendrograms and establish a bijection between compact ultrametric spaces and proper dendrograms (see Definition 2.3). After recalling some results on the ultrametric Gromov-Hausdorff distance (see (9)), we use these results to reformulate the explicit formula for the $p$-Wasserstein distance ($1\le p<\infty$) on ultrametric spaces derived by Kloeckner (2015) in terms of proper dendrograms. This allows us to derive a formulation of the $\infty$-Wasserstein distance on ultrametric spaces and to study the Wasserstein distance on compact subspaces of the ultrametric space $(\mathbb{R}_{\ge 0},\Lambda_\infty)$, which will be relevant when studying lower bounds of $u_{\mathrm{GW},p}$, $1\le p\le\infty$.

Section 3.

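We will demonstrate that $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1\le p\le\infty$, are $p$-metrics on the collection of ultrametric measure spaces $\mathcal{U}^w$, which induce topologies on $\mathcal{U}^w$ that differ from those induced by $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$, $1\le p\le\infty$. We derive several alternative representations of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and begin to study the relation between the two metrics $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$. In particular, we show that, while for $1\le p<\infty$ in general $u_{\mathrm{GW},p}\ne u^{\mathrm{sturm}}_{\mathrm{GW},p}$, both metrics coincide for $p=\infty$ (i.e., $u_{\mathrm{GW},\infty}=u^{\mathrm{sturm}}_{\mathrm{GW},\infty}$). Furthermore, we show how this equality, in combination with an alternative representation of $u_{\mathrm{GW},\infty}$, leads to a polynomial time algorithm for the calculation of $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}=u_{\mathrm{GW},\infty}$.

Section 4.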

We study the topological properties of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ and $(\mathcal{U}^w,u_{\mathrm{GW},p})$, $1\le p\le\infty$. Most importantly, we show that, just as $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$, the two considered metrics are topologically equivalent. While we prove that the metric spaces $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ and $(\mathcal{U}^w,u_{\mathrm{GW},p})$, $1\le p<\infty$, are neither complete nor separable, we demonstrate that the ultrametric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},\infty})$, which coincides with $(\mathcal{U}^w,u_{\mathrm{GW},\infty})$, is complete. Further, we establish that $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$ is a geodesic space.

Section 5.

Unfortunately, it does not seem to be possible to derive a polynomial time algorithm for the calculation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$, $1\le p<\infty$. Consequently, we derive several polynomial time computable lower bounds for $u_{\mathrm{GW},p}$, $1\le p\le\infty$, in Section 5. Due to the structural similarity of $d_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$, these are in a certain sense analogous to those derived in Mémoli (2007, 2011) for $d_{\mathrm{GW},p}$. Among other things, we show that
$$u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\ \ge\ \mathbf{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}) := \inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\ \mu_Y\otimes\mu_Y)}\big\|\Lambda_\infty(u_X,u_Y)\big\|_{L^p(\gamma)}. \tag{14}$$
We verify that the lower bound $\mathbf{SLB}^{\mathrm{ult}}_p$ can be reformulated in terms of the Wasserstein distance on the ultrametric space $(\mathbb{R}_{\ge 0},\Lambda_\infty)$ (we derive an explicit formula for $d_{W,p}^{(\mathbb{R}_{\ge 0},\Lambda_\infty)}$ in Section 2.3). This allows us to efficiently calculate $\mathbf{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$ in $O((m\vee n)^2)$ time, where $m$ denotes the cardinality of $X$ and $n$ that of $Y$.

Here, “approximation” is meant in the sense that one can write code which will locally minimize the functional. We do not currently have any theoretical guarantees.
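The footnote above describes “approximation” as running code that locally minimizes the functional behind (12) and (13). As a minimal sketch, assuming finite spaces given by distance matrices and a coupling given as a nonnegative matrix with the prescribed marginals, the Python code below only evaluates that objective for one fixed coupling; it does not perform the minimization. The names `lambda_1`, `lambda_inf` and `distortion` are hypothetical. Swapping the cost $\Lambda_1$ for $\Lambda_\infty$ turns the $p$-distortion (6) into the $p$-ultra-distortion (12), exactly as in the passage from (11) to (12).

```python
import numpy as np


def lambda_1(a, b):
    """Lambda_1(a, b) = |a - b| (entrywise), the cost appearing in (11)."""
    return np.abs(a - b)


def lambda_inf(a, b):
    """Lambda_inf(a, b) = max(a, b) if a != b, else 0 (entrywise)."""
    return np.where(a == b, 0.0, np.maximum(a, b))


def distortion(DX, DY, mu, p, cost):
    """(sum over x, y, x', y' of cost(DX[x,x'], DY[y,y'])^p * mu[x,y] * mu[x',y'])^(1/p).

    DX (m x m) and DY (n x n) are distance matrices, mu (m x n) is a coupling.
    cost=lambda_1 gives dis_p from (6) (no factor 1/2); cost=lambda_inf gives dis^ult_p from (12).
    """
    DX, DY, mu = (np.asarray(M, float) for M in (DX, DY, mu))
    # C[x, y, x', y'] = cost(DX[x, x'], DY[y, y']) via broadcasting to shape (m, n, m, n).
    C = cost(DX[:, None, :, None], DY[None, :, None, :])
    total = np.einsum("ijkl,ij,kl->", C ** p, mu, mu)
    return float(total ** (1.0 / p))


# Two-point spaces with distances 1 and 2, uniform weights, matched by the identity coupling.
DX_example = [[0, 1], [1, 0]]
DY_example = [[0, 2], [2, 0]]
mu_example = [[0.5, 0.0], [0.0, 0.5]]
```

For this example the pairs of distinct points carry mass $1/2$, so $\mathrm{dis}_1=0.5$ while $\mathrm{dis}^{\mathrm{ult}}_1=1.0$; since $\Lambda_\infty(a,b)\ge\Lambda_1(a,b)$ for $a,b\ge 0$, the ultra-distortion never falls below the ordinary one. The $(m,n,m,n)$ cost tensor makes this sketch suitable only for small spaces.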


Section 6.

As the requirement that the induced metric spaces of the considered metric measure spaces are ultrametric is somewhat restrictive (especially in the context of phylogenetic trees, see Semple et al. (2003)), we prove in Section 6 that the results on $u_{\mathrm{GW},p}$ can be extended to the more general ultra-dissimilarity spaces (see Definition 6.1). In particular, we prove that $u_{\mathrm{GW},p}$, $1\le p\le\infty$, is a metric on the isomorphism classes of ultra-dissimilarity spaces (see Definition 6.6).

Section 7.

We illustrate the behaviour and relation of $u_{\mathrm{GW},1}$ (which can be approximated via conditional gradient descent) and $\mathbf{SLB}^{\mathrm{ult}}_1$ in a set of toy examples. Additionally, we carefully illustrate the differences between $u_{\mathrm{GW},1}$ and $\mathbf{SLB}^{\mathrm{ult}}_1$ on the one hand and $d_{\mathrm{GW},1}$ and $\mathbf{SLB}_1$ (see Section 5 for a definition) on the other hand.

Section 8.

Finally, we apply our results to phylogenetic tree shape comparison. To this end, we compare two sets of phylogenetic tree shapes based on the HA protein sequences from human influenza collected in different regions with the lower bound $\mathbf{SLB}^{\mathrm{ult}}_1$. In particular, we compare our results to the ones obtained by Colijn and Plazzotta (2018) for the same comparison.

1.3. Related work.

In order to better contextualize our contribution, we now describe related work, both in applied and computational geometry and in phylogenetics (where notions of distance between trees have arisen naturally).

Metrics between trees: The phylogenetics perspective.

In phylogenetics, the need to measure distances between different trees arises from the fact that the reconstruction of a phylogenetic tree may depend on the set of genes being considered. At the same time, even for the same set of genes, different reconstruction methods could be applied, which would result in different trees. This has led to the development of many different metrics for measuring the distance between phylogenetic trees. Examples include the Robinson-Foulds metric (Robinson and Foulds, 1981), the subtree prune and regraft distance (Hein, 1990) and the nearest-neighbor interchange distance (Robinson, 1971).

As pointed out in Owen and Provan (2010), many of these distances tend to quantify differences between tree topologies and often do not take edge lengths into account. A phylogenetic tree metric space which encodes edge lengths was proposed in Billera et al. (2001) and studied algorithmically in Owen and Provan (2010). This tree space assumes that all trees have the same set of taxa. An extension to the case of trees over different underlying sets is given in Grindstaff and Owen (2018).

Lafond et al. (2019) considered one type of metric on possibly multilabeled phylogenetic trees with a fixed number of leaves. As the authors point out, a multilabeled phylogenetic tree in which no leaves are repeated is just a standard phylogenetic tree, whereas a multilabeled phylogenetic tree in which all labels are equal defines a tree shape. The authors then proceed to study the computational complexity associated with generalizations of some of the usual metrics for phylogenetic trees (such as the Robinson-Foulds distance) to the multilabeled case.

Colijn and Plazzotta (2018) studied a metric between (binary) phylogenetic tree shapes based on a bottom-to-top enumeration of specific connectivity structures. The authors applied their metric to compare evolutionary trees based on the HA protein sequences from human influenza collected in different regions.


Metrics between trees: The applied geometry perspective.

From a different perspective, ideas from applied geometry and applied and computational topology have been applied to the comparison of tree shapes in applications in probability, clustering and applied and computational topology.

Metric trees are also considered in probability theory in the study of models for random trees, together with the need to quantify their distance; Evans (2007) describes some variants of the Gromov-Hausdorff distance between metric trees. See also Greven et al. (2009) for the case of metric measure space representations of trees and a certain Gromov-Prokhorov type of metric on the collection thereof.

Trees, in the form of dendrograms, are abundant in the realm of hierarchical clustering methods. In their study of the stability of hierarchical clustering methods, Carlsson and Mémoli (2010) utilized the Gromov-Hausdorff distance between the ultrametric representations of dendrograms.

Schmiedl (2017) proved that computing the Gromov-Hausdorff distance between tree metric spaces is NP-hard. Liebscher (2018) suggests some variants of the Gromov-Hausdorff distance which are applicable in the context of phylogenetic trees.

As mentioned before, Zarichnyi (2005) introduced the ultrametric Gromov-Hausdorff distance $u_{\mathrm{GH}}$ between compact ultrametric spaces (a special type of tree metric spaces). Certain theoretical properties of $u_{\mathrm{GH}}$, such as precompactness, have been studied in Qiu (2009). In contrast with the NP-hardness of computing $d_{\mathrm{GH}}$, Mémoli et al. (2019) devised a polynomial time algorithm for computing $u_{\mathrm{GH}}$.

In computational topology, merge trees arise through the study of the sublevel sets of a given function (Adelson-Velskii and Kronrod, 1945; Reeb, 1946) with the goal of shape simplification. Morozov et al. (2013) develop the notion of interleaving distance between merge trees, which is related to the Gromov-Hausdorff distance between trees through bi-Lipschitz bounds. In Agarwal et al. (2018), exploiting the connection between the interleaving distance and the Gromov-Hausdorff distance between metric trees, the authors approach the computation of the Gromov-Hausdorff distance between metric trees in general and provide certain approximation algorithms.

Touli and Wang (2018) devise fixed-parameter tractable (FPT) algorithms for computing the interleaving distance between metric trees. From their methods one can derive an FPT algorithm to compute a 2-approximation of the Gromov-Hausdorff distance between ultrametric spaces. Mémoli et al. (2019) devise an FPT algorithm for computing the exact value of the Gromov-Hausdorff distance between ultrametric spaces.

2. Preliminaries

In this section we briefly summarize the basic notions and concepts required throughout the paper.

2.1. Ultrametric spaces and dendrograms.

We begin by describing finite ultrametric spaces in terms of dendrograms (for more details see Carlsson and Mémoli (2010)). To this end, we introduce some definitions and some notation. Given a finite set $X$, a partition of $X$ is a set $P_X=\{X_1,\ldots,X_k\}$ where $\emptyset\ne X_i\subseteq X$ for $1\le i\le k$, $X_i\cap X_j=\emptyset$ for all $i\ne j\in\{1,\ldots,k\}$ and $\bigcup_{i=1}^k X_i=X$. We call each element $X_i$ a block of the given partition $P_X$ and denote by $\mathrm{Part}(X)$ the collection of all partitions of $X$. For two partitions $P_X$ and $P'_X$ we say that $P_X$ is finer than $P'_X$ if for every block $X_i\in P_X$ there exists a block $X'_j\in P'_X$ such that $X_i\subseteq X'_j$.

Definition 2.1 (Dendrogram). A dendrogram $\theta_X:[0,\infty)\to\mathrm{Part}(X)$ is a map parameterizing a nested family of partitions over the same set $X$ that satisfies the following conditions:
(1) $\theta_X(s)$ is finer than $\theta_X(t)$ for any $0\le s<t<\infty$;
(2) $\theta_X(0)$ is the finest partition, consisting only of singleton sets;
(3) There exists $t_X>0$ such that for any $t\ge t_X$, $\theta_X(t)=\{X\}$ is the trivial partition;
(4) For each $t\ge 0$, there exists $\varepsilon>0$ such that $\theta_X(t)=\theta_X(t')$ for all $t'\in[t,t+\varepsilon]$.

The following lemma gives an alternative representation of finite ultrametric spaces in terms of dendrograms.

Lemma 2.2 (Carlsson and Mémoli (2010)). Given a finite set $X$, denote by $\mathcal{U}(X)$ the collection of all ultrametrics on $X$ and by $\mathcal{D}(X)$ the collection of all dendrograms over $X$. Then, there exists a bijection $\Delta_X:\mathcal{U}(X)\to\mathcal{D}(X)$.

In this paper, we are not only concerned with finite ultrametric spaces, but mainly with compact ultrametric spaces. We show that a statement similar to Lemma 2.2 holds for compact ultrametric spaces. More precisely, we prove that there is a bijection between compact ultrametric spaces and the so-called proper dendrograms.

Definition 2.3 (Proper dendrogram). Given a set $X$ (not necessarily finite), a proper dendrogram $\theta_X:[0,\infty)\to\mathrm{Part}(X)$ is a map satisfying the following conditions:
(1) $\theta_X(s)$ is finer than $\theta_X(t)$ for any $0\le s<t<\infty$;
(2) $\theta_X(0)$ is the finest partition, consisting only of singleton sets;
(3) There exists $T>0$ such that for any $t\ge T$, $\theta_X(t)=\{X\}$ is the trivial partition;
(4) For each $t>0$, there exists $\varepsilon>0$ such that $\theta_X(t)=\theta_X(t')$ for all $t'\in[t,t+\varepsilon]$;
(5) For any distinct points $x,x'\in X$, there exists $T_{xx'}>0$ such that $x$ and $x'$ belong to different blocks in $\theta_X(T_{xx'})$;
(6) For each $t>0$, $\theta_X(t)$ consists of only finitely many blocks;
(7) Let $\{t_n\}_{n\in\mathbb{N}}$ be a decreasing sequence such that $\lim_{n\to\infty}t_n=0$ and let $X_n\in\theta_X(t_n)$. If $X_m\subseteq X_n$ for any $1\le n<m$, then $\bigcap_{n\in\mathbb{N}}X_n\ne\emptyset$.

It is obvious that any dendrogram $\theta_X$ over a finite set $X$ is a proper dendrogram. Let $\theta_X$ be a proper dendrogram over a set $X$. For any $x\in X$ and $t\ge 0$, we denote by $[x]^X_t$ the block in $\theta_X(t)$ that contains $x$ and abbreviate $[x]^X_t$ to $[x]_t$ when the underlying set $X$ is clear from the context.
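For a finite ultrametric, the map $\Delta_X$ of Lemma 2.2 can be made concrete: $x$ and $x'$ lie in the same block of $\theta_X(t)$ exactly when $u_X(x,x')\le t$, so the blocks $[x]_t$ are the closed balls of radius $t$ (the description made explicit in Remark 2.5). The Python sketch below computes these blocks from a distance matrix and checks condition (1) of Definition 2.1; the function names are hypothetical, and the input is assumed, not verified, to be an ultrametric.

```python
from typing import List, Sequence


def dendrogram_blocks(U: Sequence[Sequence[float]], t: float) -> List[List[int]]:
    """Blocks of theta_X(t) for a finite ultrametric given as a distance matrix U.

    Assumes U is an ultrametric, so closed balls B_t(x) = {x' : U[x][x'] <= t}
    are either equal or disjoint; the distinct balls form the partition theta_X(t).
    """
    n = len(U)
    unassigned = set(range(n))
    blocks = []
    for x in range(n):
        if x in unassigned:
            ball = [y for y in range(n) if U[x][y] <= t]
            blocks.append(ball)
            unassigned -= set(ball)
    return blocks


def is_finer(P, Q) -> bool:
    """True if every block of partition P is contained in some block of partition Q."""
    return all(any(set(B) <= set(C) for C in Q) for B in P)


# Points 0 and 1 merge at height 1; point 2 joins them at height 3.
U_example = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
```

Using `<= t` rather than `< t` is what makes $\theta_X$ constant on intervals of the form $[t,t+\varepsilon]$, matching condition (4); the partition only changes at the distinct values of $u_X$.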

The subsequent theorem extends Lemma 2.2 to compact ultrametric spaces. Since its proof depends on some results not yet introduced, we postpone it to Appendix A.3.

Theorem 2.4.

Given a set $X$, denote by $\mathcal{U}(X)$ the collection of all compact ultrametrics on $X$ and by $\mathcal{D}(X)$ the collection of all proper dendrograms over $X$. Then, there exists a canonical bijective map $\Delta_X:\mathcal{U}(X)\to\mathcal{D}(X)$.

Remark 2.5.

From now on, we denote by $\theta_X$ the proper dendrogram corresponding to a given compact ultrametric $u_X$ on $X$ under the bijection given above. Note that a block $[x]_t$ in $\theta_X(t)$ is actually the closed ball $B_t(x)$ in $X$ centered at $x$ with radius $t$. So for each $t\ge 0$, $\theta_X(t)$ partitions $X$ into a union of several closed balls in $X$ with respect to $u_X$.

2.2. The ultrametric Gromov-Hausdorff distance.

Both $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d_{\mathrm{GW},p}$, $1\le p\le\infty$, are by construction closely related to the Gromov-Hausdorff distance. In a recent paper, Mémoli et al. (2019) studied an ultrametric version of this distance, namely the ultrametric Gromov-Hausdorff distance (denoted by $u_{\mathrm{GH}}$). Since we will demonstrate several connections between $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $u_{\mathrm{GW},p}$, $1\le p\le\infty$, and this distance, we briefly summarize some of the results in Mémoli et al. (2019). We start by recalling the formal definition of $u_{\mathrm{GH}}$.

Definition 2.6.

Let $(X,u_X)$ and $(Y,u_Y)$ be two compact ultrametric spaces. Then, the ultrametric Gromov-Hausdorff distance between $X$ and $Y$ is defined as
$$u_{\mathrm{GH}}(X,Y)=\inf_{Z,\phi,\psi} d_{\mathrm{H}}^{Z}\big(\phi(X),\psi(Y)\big),$$
where $\phi:X\to Z$ and $\psi:Y\to Z$ are isometric embeddings (distance preserving transformations) into the ultrametric space $(Z,u_Z)$.

Zarichnyi (2005) has shown that $u_{\mathrm{GH}}$ is indeed an (ultra)metric on $\mathcal{U}$, and Mémoli et al. (2019) identified a structural theorem (cf. Theorem 2.8) that gives rise to a polynomial time algorithm for the calculation of $u_{\mathrm{GH}}$. More precisely, it was proven in Mémoli et al. (2019) that $u_{\mathrm{GH}}$ can be calculated via so-called quotient ultrametric spaces, which we define next. Let $(X,u_X)$ be an ultrametric space and let $t\ge 0$. We define an equivalence relation $\sim_t$ on $X$ as follows: $x\sim_t x'$ if and only if $u_X(x,x')\le t$. We denote by $[x]^X_t$ (resp. $[x]_t$) the equivalence class of $x$ under $\sim_t$ and by $X_t$ the set of all such equivalence classes. Note that $[x]^X_t=\{x'\in X\mid u_X(x,x')\le t\}$ is actually the closed ball centered at $x$ with radius $t$. We define an ultrametric $u_{X_t}$ on $X_t$ as follows:
$$u_{X_t}([x]_t,[x']_t) := \begin{cases} u_X(x,x'), & [x]_t\ne[x']_t,\\ 0, & [x]_t=[x']_t.\end{cases}$$
Then, $(X_t,u_{X_t})$ is an ultrametric space and we call $(X_t,u_{X_t})$ the quotient of $(X,u_X)$ at level $t$ (see Figure 1 for an illustration). It is straightforward to prove that the quotient of a compact ultrametric space at level $t>0$ is a finite space.

Lemma 2.7.

Let $X$ be a complete ultrametric space. Then, $X$ is a compact ultrametric space if and only if for any $t > 0$, $X_t$ is a finite space.

Figure 1. Metric quotient: An ultrametric space (black) and its quotient at level $t$ (red).

Proof. Wan (2020, Lemma 2.4) proves that whenever $X$ is compact, $X_t$ is finite for any $t > 0$. Conversely, assume that $X_t$ is finite for any $t > 0$. Since $X$ is complete, we only need to prove that $X$ is totally bounded. For any $\varepsilon > 0$, $X_\varepsilon$ is a finite set and thus there exist $x_1,\ldots,x_n \in X$ such that $X_\varepsilon = \{[x_1]_\varepsilon,\ldots,[x_n]_\varepsilon\}$. Now, for any $x \in X$, there exists $x_i$ for some $i = 1,\ldots,n$ such that $x \in [x_i]_\varepsilon$. This implies that $u_X(x,x_i) \le \varepsilon$. Therefore, the set $\{x_1,\ldots,x_n\} \subseteq X$ is an $\varepsilon$-net of $X$. Hence, $X$ is totally bounded and thus compact. ∎

The following structural theorem characterizes $u_{\mathrm{GH}}$ via quotients of ultrametric spaces.

Theorem 2.8 (Structural theorem for $u_{\mathrm{GH}}$; Mémoli et al., 2019, Theorem 5.7). Let $(X,u_X)$ and $(Y,u_Y)$ be two compact ultrametric spaces. Then,
$$u_{\mathrm{GH}}(X,Y) = \inf\{t \ge 0 \mid X_t \cong Y_t\}.$$

Remark 2.9.

The quotient spaces $X_t$ and $Y_t$ can be considered as vertex weighted, rooted trees (Mémoli et al., 2019). Hence, it is possible to check whether $X_t \cong Y_t$ in polynomial time (Aho and Hopcroft, 1974) by considering $X_t$ and $Y_t$ as labeled trees. Consequently, Theorem 2.8 induces a simple, polynomial time algorithm to calculate $u_{\mathrm{GH}}$ between two finite ultrametric spaces.

Remark 2.10.

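It follows from Theorem 2.8 that $u_{\mathrm{GH}}(X,Y) \ge \Delta(\mathrm{diam}(X),\mathrm{diam}(Y))$ (indeed, if $\mathrm{diam}(X) \neq \mathrm{diam}(Y)$, then $X_t$ and $Y_t$ have different diameters for every $t < \max(\mathrm{diam}(X),\mathrm{diam}(Y))$). This is analogous to the bound $d_{\mathrm{GH}}(X,Y) \ge \frac{1}{2}|\mathrm{diam}(X) - \mathrm{diam}(Y)|$ for the Gromov-Hausdorff distance (cf. Mémoli (2012, Theorem 3.4)).

2.3. Wasserstein distance on ultrametric spaces.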

Kloeckner (2015) uses the representation of ultrametric spaces as so-called synchronized rooted trees to derive an explicit formula for the Wasserstein distance on ultrametric spaces. By the construction of the dendrograms and of the synchronized rooted trees (see Appendix A.1), it is immediately clear how to reformulate the results of Kloeckner (2015) on compact ultrametric spaces in terms of proper dendrograms. To this end, we need to introduce some notation. For a compact ultrametric space $X$, let $\theta_X$ be the associated proper dendrogram. Then, we let
$$V(X) := \bigcup_{t > 0} \theta_X(t) = \{[x]_t \mid x \in X,\ t > 0\}.$$

Figure 2. Illustration of $(\mathbb{R}_{\ge 0},\Delta)$: This is the dendrogram for a subspace of $(\mathbb{R}_{\ge 0},\Delta)$ consisting of 5 arbitrary distinct points of $\mathbb{R}_+$.

The following is a characterization of $V(X)$:

Lemma 2.11. $V(X)$ is the collection of all closed balls in $X$ except for singletons $\{x\}$ such that $x$ is a cluster point in $X$. In particular, $X \in V(X)$ and for any $x \in X$, if $x$ is not a cluster point, then $\{x\} \in V(X)$.

The proof is relegated to Appendix A.3. We sometimes denote by $B$ an element in $V(X)$ to avoid mentioning explicitly the center and the radius of the closed ball $B$. For $B \in V(X)\setminus\{X\}$, we denote by $B^*$ the smallest (under inclusion) element in $V(X)$ such that $B \subsetneq B^*$ (for the existence and uniqueness of $B^*$ see Lemma A.1).

Lemma 2.12.

Let $(X,u_X)$ be a compact ultrametric space. For all $\alpha,\beta \in \mathcal{P}(X)$ and $1 \le p < \infty$, we have
$$\big(d^X_{W,p}\big)^p(\alpha,\beta) = 2^{-1}\sum_{B \in V(X)\setminus\{X\}} \big(\mathrm{diam}(B^*)^p - \mathrm{diam}(B)^p\big)\,|\alpha(B) - \beta(B)|. \tag{15}$$
While Lemma 2.12 is only valid for $p < \infty$, it can be extended to the case $p = \infty$.

Lemma 2.13.

Let $X$ be a compact ultrametric space. Then, for any $\alpha,\beta \in \mathcal{P}(X)$, we have
$$d^X_{W,\infty}(\alpha,\beta) = \max_{B \in V(X)\setminus\{X\},\ \alpha(B)\neq\beta(B)} \mathrm{diam}(B^*), \tag{16}$$
where the maximum over the empty set is understood as $0$.

The proof of this lemma is somewhat technical and we postpone it to Appendix A.3.

2.3.1. Wasserstein distance on $(\mathbb{R}_{\ge 0},\Delta)$. The non-negative half real line $\mathbb{R}_{\ge 0}$ endowed with $\Delta$ turns out to be an ultrametric space (see Lemma A.3). Compact subspaces of $(\mathbb{R}_{\ge 0},\Delta)$ are of particular interest in this paper. These spaces possess a particular structure (see Figure 2) and the computation of the Wasserstein distance on them can be further simplified. For simplicity, we first present an explicit formula for computing $d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,p}$ between finitely supported measures.

(A cluster point $x$ in a topological space $X$ is a point such that any neighborhood of $x$ contains infinitely many points of $X$.)

Theorem 2.14 ($d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,p}$ between finitely supported measures). Suppose $\alpha,\beta$ are supported on a finite subset $\{x_1,\ldots,x_n\}$ of $\mathbb{R}_{\ge 0}$ such that $0 \le x_1 < x_2 < \cdots < x_n$. Denote $\alpha_i := \alpha(\{x_i\})$ and $\beta_i := \beta(\{x_i\})$. Then we have for $p \in [1,\infty)$ that
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,p}(\alpha,\beta) = 2^{-\frac{1}{p}}\left(\sum_{i=1}^{n-1}\Big|\sum_{j=1}^{i}(\alpha_j-\beta_j)\Big|\cdot|x_{i+1}^p - x_i^p| + \sum_{i=1}^{n}|\alpha_i-\beta_i|\cdot x_i^p\right)^{\frac{1}{p}}. \tag{17}$$
Let $F_\alpha$ and $F_\beta$ denote the cumulative distribution functions of $\alpha$ and $\beta$, respectively. Then, for the case $p = \infty$ we obtain
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,\infty}(\alpha,\beta) = \max\Big(\max_{1\le i\le n-1,\ F_\alpha(x_i)\neq F_\beta(x_i)} x_{i+1},\ \max_{1\le i\le n,\ \alpha_i\neq\beta_i} x_i\Big).$$

Proof.

Clearly, $V(X) = \{\{x_1,\ldots,x_i\} \mid i = 1,\ldots,n\} \cup \{\{x_i\} \mid i = 1,\ldots,n\}$ (recall that each set corresponds to a closed ball). Thus, we conclude the proof by applying Lemma 2.12 and Lemma 2.13. ∎

Remark 2.15 (The case $p = 1$). Note that when $p = 1$, for any finitely supported probability measures $\alpha,\beta \in \mathcal{P}(\mathbb{R}_{\ge 0})$,
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,1}(\alpha,\beta) = \frac{1}{2}\left(d^{(\mathbb{R},|\cdot|)}_{W,1}(\alpha,\beta) + \int_{\mathbb{R}} x\,|\alpha-\beta|(dx)\right),$$
which follows from (17), since for $p = 1$ the first sum in (17) equals $\int_{\mathbb{R}}|F_\alpha(x) - F_\beta(x)|\,dx = d^{(\mathbb{R},|\cdot|)}_{W,1}(\alpha,\beta)$. The formula indicates that the 1-Wasserstein distance on $(\mathbb{R}_{\ge 0},\Delta)$ is the average of the usual 1-Wasserstein distance on $(\mathbb{R},|\cdot|)$ and a "weighted total variation distance". The latter term is sensitive to position changes: for example, if $\alpha = \delta_{x_1}$ and $\beta = \delta_{x_2}$ with $x_1 \neq x_2$, then $\int_{\mathbb{R}} x\,|\alpha-\beta|(dx) = x_1 + x_2$.

Next, we demonstrate that Theorem 2.14 extends naturally to the case of compactly supported probability measures in $(\mathbb{R}_{\ge 0},\Delta)$. For this purpose, it is important to note that compact subsets of $(\mathbb{R}_{\ge 0},\Delta)$ have a very particular structure, as shown by the subsequent lemma. For its proof, we refer to Appendix A.3.

Lemma 2.16.

Let $X \subseteq (\mathbb{R}_{\ge 0},\Delta)$. Then $X$ is a compact subset if and only if $X$ is either a finite set or a countable set with $0$ being the unique cluster point.

Based on the special structure of compact subsets of $(\mathbb{R}_{\ge 0},\Delta)$, we derive the following extension of Theorem 2.14.

Theorem 2.17 ($d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,p}$ between compactly supported measures). Suppose $\alpha,\beta$ are supported on a countable subset $X := \{0\}\cup\{x_i \mid i \in \mathbb{N}\}$ of $\mathbb{R}_{\ge 0}$ such that $0 < \cdots < x_n < x_{n-1} < \cdots < x_1$ and $0$ is the only cluster point with respect to the usual Euclidean distance. Let $\alpha_i := \alpha(\{x_i\})$ for $i \in \mathbb{N}$ and $\alpha_0 := \alpha(\{0\})$. Similarly, let $\beta_i := \beta(\{x_i\})$ and $\beta_0 := \beta(\{0\})$. Then for $p \in [1,\infty)$,
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,p}(\alpha,\beta) = 2^{-\frac{1}{p}}\left(\sum_{i=1}^{\infty}\Big|\sum_{j=1}^{i}(\alpha_j-\beta_j)\Big|\cdot|x_i^p - x_{i+1}^p| + \sum_{i=1}^{\infty}|\alpha_i-\beta_i|\cdot x_i^p\right)^{\frac{1}{p}}. \tag{18}$$

Let $F_\alpha$ and $F_\beta$ denote the cumulative distribution functions of $\alpha$ and $\beta$, respectively. Then, for $p = \infty$ we obtain
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,\infty}(\alpha,\beta) = \max\Big(\sup_{i\ge 1,\ F_\alpha(x_{i+1})\neq F_\beta(x_{i+1})} x_i,\ \sup_{i\ge 1,\ \alpha_i\neq\beta_i} x_i\Big).$$

Proof.

Note that $V(X) = \{\{0\}\cup\{x_j \mid j \ge i\} \mid i \in \mathbb{N}\} \cup \{\{x_i\} \mid i \in \mathbb{N}\}$ (recall that each set corresponds to a closed ball). Thus, we conclude the proof by applying Lemma 2.12 and Lemma 2.13. ∎

Remark 2.18 (Closed-form solution for $d^{(\mathbb{R}_{\ge 0},\Delta_q)}_{W,p}$). It is well known that there is a closed-form solution for the Wasserstein distance on $\mathbb{R}$ with the usual Euclidean distance:
$$d^{(\mathbb{R},|\cdot|)}_{W,p}(\alpha,\beta) = \left(\int_0^1 |F^{-1}_\alpha(t) - F^{-1}_\beta(t)|^p\,dt\right)^{\frac{1}{p}},$$
where $F_\alpha$ and $F_\beta$ are the cumulative distribution functions of $\alpha$ and $\beta$, respectively, and $F^{-1}_\alpha$, $F^{-1}_\beta$ denote their generalized inverses (quantile functions). We have also obtained a closed-form solution for $d^{(\mathbb{R}_{\ge 0},\Delta)}_{W,p}$ in Theorem 2.17. We generalize these formulas to $d^{(\mathbb{R}_{\ge 0},\Delta_q)}_{W,p}$ for suitable exponents $q$ with $q \le p$ in Appendix A.2.

3. Ultrametric Gromov-Wasserstein distances

In this section, we establish various metric and topological properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ as well as $u_{\mathrm{GW},p}$, $1 \le p \le \infty$, and study the relation between them. Throughout this section, let $\mathcal{P}(X)$ denote the set of Borel probability measures on the compact space $X$.

3.1. Sturm's ultrametric Gromov-Wasserstein distance.

We begin this section by establishing several basic properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, including a proof that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is indeed a metric (or, more precisely, a $p$-metric) on the collection of ultrametric measure spaces $\mathcal{U}^w$. We start with the following obvious observation:

Lemma 3.1.

For any $p \in [1,\infty]$, we always have that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$.

The definition of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ given in (10) is clunky, technical and in general not easy to work with. Hence, the first observation to make is that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, shares a further property with $d^{\mathrm{sturm}}_{\mathrm{GW},p}$: as for $d^{\mathrm{sturm}}_{\mathrm{GW},p}$, $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ can be calculated by minimizing over pseudo-ultrametrics instead of isometric embeddings.

Lemma 3.2.

Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be two ultrametric measure spaces. Let $\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ denote the collection of all pseudo-ultrametrics $u$ on the disjoint union $X \sqcup Y$ such that $u|_{X\times X} = u_X$ and $u|_{Y\times Y} = u_Y$. Let $p \in [1,\infty]$. Then, it holds that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)} d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y), \tag{19}$$
where $d^{(X\sqcup Y,u)}_{W,p}$ denotes the Wasserstein pseudometric of order $p$ defined in (33) (resp. in (34) for $p = \infty$).

Proof. The above lemma follows by the same arguments as Lemma 3.3 (iii) in Sturm (2006). ∎

Remark 3.3 (Wasserstein pseudometric). The Wasserstein pseudometric is a natural extension of the Wasserstein distance to pseudometric spaces and has already been studied in Thorsley and Klavins (2008). In Appendix B.1, we carefully show that it is closely related to the Wasserstein distance on a canonically induced metric space. We further establish that the Wasserstein distance and the Wasserstein pseudometric share many relevant properties. Hence, we do not notationally distinguish between these two concepts.

The representation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, given by the above lemma is much more accessible, and we first use it to prove the following basic properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$:

Proposition 3.4.

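Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$. Then, the following holds:
(1) For any $1 \le p \le q \le \infty$, we have that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \le u^{\mathrm{sturm}}_{\mathrm{GW},q}(\mathcal{X},\mathcal{Y})$.
(2) It holds that $\lim_{p\to\infty} u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$.

Proof. (1) This simply follows from Jensen's inequality.
(2) By (1), $L := \lim_{n\to\infty} u^{\mathrm{sturm}}_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y})$ exists and $L \le u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. To prove the opposite inequality, let $u_n \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\int_{X\times Y}(u_n(x,y))^n\,\mu_n(dx\times dy)\right)^{\frac{1}{n}} = u^{\mathrm{sturm}}_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y}).$$
By Lemma B.2 and Lemma B.4, $\{u_n\}_{n\in\mathbb{N}}$ uniformly converges to some $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\{\mu_n\}_{n\in\mathbb{N}}$ weakly converges to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ (after taking appropriate subsequences of both sequences). Let $M := \sup_{(x,y)\in\mathrm{supp}(\mu)} u(x,y)$. Let $\varepsilon > 0$ and $U := \{(x,y) \in X\times Y \mid u(x,y) > M - \varepsilon\}$. Then, $\mu(U) > 0$. Since $U$ is open, it follows that there exists a small $\varepsilon' > 0$ such that $\mu_n(U) > \mu(U) - \varepsilon' > 0$ for $n$ large enough. Moreover, by uniform convergence of the sequence $\{u_n\}_{n\in\mathbb{N}}$, we have $|u(x,y) - u_n(x,y)| \le \varepsilon$ for all $x,y \in X\sqcup Y$ and $n$ large enough. Therefore,
$$\left(\int_{X\times Y}(u_n(x,y))^n\,\mu_n(dx\times dy)\right)^{\frac{1}{n}} \ge (\mu_n(U))^{\frac{1}{n}}(M - 2\varepsilon) \ge (\mu(U) - \varepsilon')^{\frac{1}{n}}(M - 2\varepsilon).$$
Letting $n \to \infty$, we obtain $L \ge M - 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, we have $L \ge M \ge u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. ∎

Moreover, we use Lemma 3.2 to prove that $(\mathcal{U}^w, u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is indeed a metric space:

Theorem 3.5. $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is a $p$-metric on the collection $\mathcal{U}^w$ of compact ultrametric measure spaces. In particular, when $p = \infty$, $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}$ is an ultrametric.

We postpone the proof until after introducing several auxiliary results. In particular, we will verify the existence of optimal metrics and optimal couplings in (19).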
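For finite ultrametric spaces, the ball formula (15) of Lemma 2.12 makes the Wasserstein distance directly computable, which allows a quick numerical sanity check of the $p$-triangle inequality on which the proof of Theorem 3.5 relies (it is established in Lemma 3.8). The Python sketch below is illustrative only: it assumes the space is given by a finite distance matrix `D` that is already an ultrametric, and the helper names (`closed_balls`, `wasserstein_ultrametric`, `random_measure`) are our own, not from the paper.

```python
import random

def closed_balls(D):
    """All closed balls of a finite ultrametric space with distance matrix D."""
    n = len(D)
    balls = set()
    for x in range(n):
        for t in set(D[x]):
            balls.add(frozenset(y for y in range(n) if D[x][y] <= t))
    return balls

def diam(D, B):
    """Diameter of a subset B (a frozenset of indices)."""
    return max(D[x][y] for x in B for y in B)

def wasserstein_ultrametric(D, alpha, beta, p):
    """d_{W,p}(alpha, beta) for 1 <= p < inf via formula (15).

    A finite space has no cluster points, so V(X) is the set of all
    closed balls (Lemma 2.11).
    """
    balls = closed_balls(D)
    whole = frozenset(range(len(D)))
    total = 0.0
    for B in balls:
        if B == whole:
            continue
        # B* is the smallest ball strictly containing B; balls are nested or disjoint
        parent = min((C for C in balls if B < C), key=len)
        gap = abs(sum(alpha[i] for i in B) - sum(beta[i] for i in B))
        total += (diam(D, parent) ** p - diam(D, B) ** p) * gap
    return (0.5 * total) ** (1.0 / p)

def random_measure(n, rng):
    """A random probability vector on n points."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

if __name__ == "__main__":
    # Three points with d(0,1) = 1 and d(0,2) = d(1,2) = 3.
    D = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
    rng = random.Random(0)
    for p in (1, 2, 5):
        for _ in range(500):
            a, b, c = (random_measure(3, rng) for _ in range(3))
            lhs = wasserstein_ultrametric(D, a, c, p) ** p
            rhs = (wasserstein_ultrametric(D, a, b, p) ** p
                   + wasserstein_ultrametric(D, b, c, p) ** p)
            assert lhs <= rhs + 1e-9, (p, lhs, rhs)
    print("p-triangle inequality held on all samples")
```

Because closed balls of an ultrametric space are either nested or disjoint, the strict supersets of a ball form a chain, so taking the smallest one by cardinality recovers $B^*$ uniquely.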


Proposition 3.6 (Existence of optimal couplings). Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, there always exist $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ such that, for $1 \le p < \infty$,
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \left(\int_{X\times Y}(u(x,y))^p\,\mu(dx\times dy)\right)^{\frac{1}{p}},$$
and such that, for $p = \infty$, $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y}) = \sup_{(x,y)\in\mathrm{supp}(\mu)} u(x,y)$.

Proof. The following proof is a suitable adaptation of the proof of Lemma 3.3 in Sturm (2006). We only prove the claim for the case $p < \infty$ since the case $p = \infty$ can be shown in a similar manner. Let $u_n \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\int_{X\times Y}(u_n(x,y))^p\,\mu_n(dx\times dy)\right)^{\frac{1}{p}} \le u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) + \frac{1}{n}.$$
By Lemma B.2, $\{\mu_n\}_{n\in\mathbb{N}}$ weakly converges (after taking an appropriate subsequence) to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. By Lemma B.4, $\{u_n\}_{n\in\mathbb{N}}$ uniformly converges (after taking an appropriate subsequence) to some $u \in \mathcal{D}(u_X,u_Y)$ such that
$$\left(\int_{X\times Y}(u(x,y))^p\,\mu(dx\times dy)\right)^{\frac{1}{p}} \le u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}).$$
Hence, it only remains to verify that $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ (the reverse inequality then holds by (19)). In fact, for any $z_1,z_2,z_3 \in X\sqcup Y$, we have
$$\max(u(z_1,z_2),u(z_2,z_3)) = \max\Big(\lim_{n\to\infty}u_n(z_1,z_2),\lim_{n\to\infty}u_n(z_2,z_3)\Big) = \lim_{n\to\infty}\max(u_n(z_1,z_2),u_n(z_2,z_3)) \ge \lim_{n\to\infty}u_n(z_1,z_3) = u(z_1,z_3).$$
Therefore, $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. ∎

As a direct consequence of the proposition, we have the following result:

Corollary 3.7.

Fix $1 \le p \le \infty$. Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, there exist a compact ultrametric space $Z$ and isometric embeddings $\phi: X \hookrightarrow Z$ and $\psi: Y \hookrightarrow Z$ such that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = d^Z_{W,p}(\phi_\#\mu_X,\psi_\#\mu_Y).$$

Next, we ensure that the Wasserstein pseudometric of order $p$ on a compact pseudo-ultrametric space $(X,u_X)$ is a $p$-pseudometric for $p \in [1,\infty)$ and a pseudo-ultrametric for $p = \infty$, i.e., we prove for $1 \le p < \infty$ that for all $\mu_1,\mu_2,\mu_3 \in \mathcal{P}(X)$
$$d^{(X,u_X)}_{W,p}(\mu_1,\mu_3) \le \left(\big(d^{(X,u_X)}_{W,p}(\mu_1,\mu_2)\big)^p + \big(d^{(X,u_X)}_{W,p}(\mu_2,\mu_3)\big)^p\right)^{1/p}$$
and for $p = \infty$ that for all $\mu_1,\mu_2,\mu_3 \in \mathcal{P}(X)$
$$d^{(X,u_X)}_{W,\infty}(\mu_1,\mu_3) \le \max\left(d^{(X,u_X)}_{W,\infty}(\mu_1,\mu_2),\ d^{(X,u_X)}_{W,\infty}(\mu_2,\mu_3)\right).$$

Lemma 3.8.

Let $(X,u_X)$ be a compact pseudo-ultrametric space. Then, for $1 \le p \le \infty$, the $p$-Wasserstein pseudometric $d^{(X,u_X)}_{W,p}$ is a $p$-pseudometric on $\mathcal{P}(X)$. In particular, when $p = \infty$, it is a pseudo-ultrametric on $\mathcal{P}(X)$.

Proof. We prove the statement by adapting the proof of the triangle inequality for the $p$-Wasserstein distance (see e.g. Villani (2003, Theorem 7.3)). We only prove the case $p < \infty$, whereas the case $p = \infty$ follows by analogous arguments. Let $\alpha_1,\alpha_2,\alpha_3 \in \mathcal{P}(X)$, denote by $\mu_{12}$ an optimal transport plan between $\alpha_1$ and $\alpha_2$ and by $\mu_{23}$ an optimal transport plan between $\alpha_2$ and $\alpha_3$ (see Villani (2008, Theorem 4.1) for the existence of $\mu_{12}$ and $\mu_{23}$). Furthermore, let $X_i$ be the support of $\alpha_i$, $1 \le i \le 3$. Then, by the Gluing Lemma (Villani, 2003, Lemma 7.6), there exists a measure $\mu \in \mathcal{P}(X_1\times X_2\times X_3)$ with marginals $\mu_{12}$ on $X_1\times X_2$ and $\mu_{23}$ on $X_2\times X_3$. Clearly, we obtain
$$\big(d^{(X,u_X)}_{W,p}(\alpha_1,\alpha_3)\big)^p \le \int_{X_1\times X_2\times X_3} u_X^p(x,z)\,d\mu(x,y,z) \le \int_{X_1\times X_2\times X_3}\big(u_X^p(x,y) + u_X^p(y,z)\big)\,d\mu(x,y,z).$$
Here, we used that $u_X$ is a (pseudo-)ultrametric, i.e., in particular a $p$-(pseudo)metric (Mémoli et al., 2019, Proposition 1.16). With this we obtain that
$$\big(d^{(X,u_X)}_{W,p}(\alpha_1,\alpha_3)\big)^p \le \int_{X_1\times X_2} u_X^p(x,y)\,d\mu_{12}(x,y) + \int_{X_2\times X_3} u_X^p(y,z)\,d\mu_{23}(y,z) = \big(d^{(X,u_X)}_{W,p}(\alpha_1,\alpha_2)\big)^p + \big(d^{(X,u_X)}_{W,p}(\alpha_2,\alpha_3)\big)^p.$$
∎

With Proposition 3.6 and Lemma 3.8 at our disposal, we are now ready to prove Theorem 3.5, which states that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is indeed a $p$-metric on $\mathcal{U}^w$.

Proof of Theorem 3.5.

It is clear that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is symmetric and that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if $\mathcal{X} \cong_w \mathcal{Y}$. Furthermore, we remark that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ by Lemma 3.1. Since $d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if and only if $\mathcal{X} \cong_w \mathcal{Y}$ (Sturm (2012)), we have that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if and only if $\mathcal{X} \cong_w \mathcal{Y}$. It remains to verify the $p$-triangle inequality. To this end, we only prove the case $p < \infty$, whereas the case $p = \infty$ follows by analogous arguments.

Let $\mathcal{X},\mathcal{Y},\mathcal{Z} \in \mathcal{U}^w$. Suppose $u_{XY} \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $u_{YZ} \in \mathcal{D}^{\mathrm{ult}}(u_Y,u_Z)$ are optimal metric couplings (which exist by Proposition 3.6) such that
$$\big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\big)^p = \big(d^{(X\sqcup Y,u_{XY})}_{W,p}(\mu_X,\mu_Y)\big)^p \quad\text{and}\quad \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{Y},\mathcal{Z})\big)^p = \big(d^{(Y\sqcup Z,u_{YZ})}_{W,p}(\mu_Y,\mu_Z)\big)^p.$$
Further, define $u_{XYZ}$ on $X\sqcup Y\sqcup Z$ as
$$u_{XYZ}(x_1,x_2) = \begin{cases} u_{XY}(x_1,x_2), & x_1,x_2 \in X\sqcup Y,\\ u_{YZ}(x_1,x_2), & x_1,x_2 \in Y\sqcup Z,\\ \inf\{\max(u_{XY}(x_1,y),u_{YZ}(y,x_2)) \mid y \in Y\}, & x_1 \in X,\ x_2 \in Z,\\ \inf\{\max(u_{XY}(x_2,y),u_{YZ}(y,x_1)) \mid y \in Y\}, & x_1 \in Z,\ x_2 \in X.\end{cases}$$
Then, by Lemma 1.1 of Zarichnyi (2005), $u_{XYZ}$ is a pseudo-ultrametric on $X\sqcup Y\sqcup Z$ that coincides with $u_{XY}$ on $X\sqcup Y$ and with $u_{YZ}$ on $Y\sqcup Z$. With this we obtain by Lemma 3.8 that
$$\begin{aligned}\big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Z})\big)^p &\le \big(d^{(X\sqcup Y\sqcup Z,u_{XYZ})}_{W,p}(\mu_X,\mu_Z)\big)^p\\ &\le \big(d^{(X\sqcup Y\sqcup Z,u_{XYZ})}_{W,p}(\mu_X,\mu_Y)\big)^p + \big(d^{(X\sqcup Y\sqcup Z,u_{XYZ})}_{W,p}(\mu_Y,\mu_Z)\big)^p\\ &= \big(d^{(X\sqcup Y,u_{XY})}_{W,p}(\mu_X,\mu_Y)\big)^p + \big(d^{(Y\sqcup Z,u_{YZ})}_{W,p}(\mu_Y,\mu_Z)\big)^p\\ &= \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\big)^p + \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{Y},\mathcal{Z})\big)^p.\end{aligned}$$
This gives the claim for $p < \infty$. ∎

It is important to note that the topology induced on $\mathcal{U}^w$ by $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, is different from the one induced by $d^{\mathrm{sturm}}_{\mathrm{GW},p}$. This is well illustrated in the following example.

Example 3.9 ($u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ induce different topologies). This example is an adaptation of Mémoli et al. (2019, Example 3.14). For each $a > 0$, denote by $\Delta(a)$ the two-point metric space with interpoint distance $a$. Endow $\Delta(a)$ with the uniform probability measure $\mu_a$ and denote the corresponding ultrametric measure space by $\hat\Delta(a)$. Now, let $\mathcal{X} := \hat\Delta(1)$ and let $\mathcal{X}_n := \hat\Delta(1 + \frac{1}{n})$ for $n \in \mathbb{N}$. It is easy to check that for any $1 \le p \le \infty$,
$$d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{X}_n) = \frac{1}{2n} \quad\text{and}\quad u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{X}_n) = 2^{-\frac{1}{p}}\Big(1 + \frac{1}{n}\Big),$$
where we adopt the convention that $1/\infty = 0$. Hence, as $n$ goes to infinity, $\mathcal{X}_n$ converges to $\mathcal{X}$ in the sense of $d^{\mathrm{sturm}}_{\mathrm{GW},p}$, but not in the sense of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, for any $1 \le p \le \infty$.

3.1.1. Alternative representations of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$. In this subsection, we derive alternative representations for $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ defined in (10) for finite ultrametric measure spaces. We mainly focus on the case $p < \infty$; however, it turns out that the results also hold for $p = \infty$. In Section 3.3, we will further prove that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty} = u_{\mathrm{GW},\infty}$ (see Theorem 3.33) and study the implications of this in Section 3.2.

Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces and recall the original definition of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $p \in [1,\infty]$, given in (10), i.e.,
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{Z,\phi,\psi} d^{(Z,u_Z)}_{W,p}(\phi_\#\mu_X,\psi_\#\mu_Y),$$
where $\phi: X \to Z$ and $\psi: Y \to Z$ are isometric embeddings into an ultrametric space $(Z,u_Z)$. It turns out that we only need to consider relatively few possibilities of mapping two ultrametric spaces into a common ultrametric space. This is exemplified in Figure 3, where we see two ultrametric spaces and two possibilities for a common ultrametric space $Z$. Indeed, it is straightforward to write down all reasonable embeddings and target spaces. We define the set
$$\mathcal{A} := \{(A,\varphi) \mid A \subseteq X \text{ and } \varphi: A \hookrightarrow Y \text{ is an isometric embedding}\}. \tag{20}$$
Clearly, $\mathcal{A} \neq \emptyset$, as it holds for each $x \in X$ that $\{(\{x\},\varphi_y)\}_{y\in Y} \subseteq \mathcal{A}$, where $\varphi_y$ is the map sending $x$ to $y \in Y$. The following example of elements in $\mathcal{A}$ is used in the sequel.

Figure 3.

Common ultrametric spaces: Representation of the two kinds of ultrametric spaces $Z$ (middle and right) into which we can isometrically embed the spaces $X$ and $Y$ (left).

Example 3.10.

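Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces and let $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. Write $u^{-1}(0) := \{(x,y) \in X\times Y \mid u(x,y) = 0\}$. If $u^{-1}(0) \neq \emptyset$, we define $A := \pi_X(u^{-1}(0)) \subseteq X$. Then, the map $\varphi: A \to Y$ defined by sending $x \in A$ to the (unique) $y \in Y$ such that $u(x,y) = 0$ is an isometric embedding, and hence $(A,\varphi) \in \mathcal{A}$.

Let $\mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ denote the collection of all admissible pseudo-ultrametrics on $X\sqcup Y$, where $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ is called admissible if there exists no $u^* \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ such that $u^* \neq u$ and $u^*(x,y) \le u(x,y)$ for all $x,y \in X\sqcup Y$.

Lemma 3.11.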

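For any $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$, $\mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y) \neq \emptyset$. Moreover,
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)} d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y).$$

Combined with Example 3.10, the following result implies that each $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ gives rise to an element in $\mathcal{A}$.

Lemma 3.12.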

Given $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$, for each $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$, we have $u^{-1}(0) \neq \emptyset$.

Proofs of Lemma 3.11 and Lemma 3.12 are given in Appendix B.2.

Now, fix two finite spaces $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$. Let $(A,\varphi) \in \mathcal{A}$ and let $Z_A := X \sqcup (Y\setminus\varphi(A)) \subseteq X\sqcup Y$. Furthermore, define $u_{Z_A}: Z_A\times Z_A \to \mathbb{R}_{\ge 0}$ as follows:
(1) $u_{Z_A}|_{X\times X} := u_X$ and $u_{Z_A}|_{(Y\setminus\varphi(A))\times(Y\setminus\varphi(A))} := u_Y|_{(Y\setminus\varphi(A))\times(Y\setminus\varphi(A))}$;
(2) for any $x \in A$ and $y \in Y\setminus\varphi(A)$, define $u_{Z_A}(x,y) := u_Y(y,\varphi(x))$;
(3) for $x \in X\setminus A$ and $y \in Y\setminus\varphi(A)$, let $u_{Z_A}(x,y) := \inf\{\max(u_X(x,a),u_Y(\varphi(a),y)) \mid a \in A\}$;
(4) for any $x \in X$ and $y \in Y\setminus\varphi(A)$, set $u_{Z_A}(y,x) := u_{Z_A}(x,y)$.
Then, $(Z_A,u_{Z_A})$ is an ultrametric space such that $X$ and $Y$ can be mapped isometrically into $Z_A$ (see Zarichnyi (2005, Lemma 1.1)). Let $\phi^X_{(A,\varphi)}$ and $\psi^Y_{(A,\varphi)}$ denote the corresponding isometric embeddings of $X$ and $Y$, respectively.
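For finite spaces, rules (1)-(4) can be carried out mechanically. The Python sketch below is a hedged illustration under our own conventions: points of $X$ and $Y$ are indices into distance matrices `DX` and `DY`, the pair $(A,\varphi)$ is passed as a dict `phi` (assumed to describe an isometric embedding with $A \neq \emptyset$), and the function names are hypothetical. It also checks numerically that the result is an ultrametric and that both embeddings are isometric, in line with Zarichnyi (2005, Lemma 1.1).

```python
from itertools import product

def build_Z_A(DX, DY, phi):
    """Distance matrix of Z_A = X ⊔ (Y \\ phi(A)) following rules (1)-(4).

    phi: dict {a: phi(a)} describing the pair (A, phi); assumed non-empty and
    isometric. Returns (DZ, emb_X, emb_Y), where emb_* send original indices
    to indices in Z_A (each phi(a) is identified with its preimage a).
    """
    if not phi:
        raise ValueError("A must be non-empty")
    m, k = len(DX), len(DY)
    image = set(phi.values())
    rest = [y for y in range(k) if y not in image]               # Y \ phi(A)
    emb_X = {x: x for x in range(m)}
    emb_Y = {y: a for a, y in phi.items()}
    emb_Y.update({y: m + i for i, y in enumerate(rest)})
    N = m + len(rest)
    DZ = [[0] * N for _ in range(N)]
    for x1, x2 in product(range(m), repeat=2):                   # rule (1), X part
        DZ[x1][x2] = DX[x1][x2]
    for (i, y1), (j, y2) in product(enumerate(rest), repeat=2):  # rule (1), Y part
        DZ[m + i][m + j] = DY[y1][y2]
    for x, (i, y) in product(range(m), enumerate(rest)):
        if x in phi:                                              # rule (2)
            d = DY[y][phi[x]]
        else:                                                     # rule (3)
            d = min(max(DX[x][a], DY[phi[a]][y]) for a in phi)
        DZ[x][m + i] = DZ[m + i][x] = d                           # rule (4)
    return DZ, emb_X, emb_Y

def is_ultrametric(D):
    n = len(D)
    return all(D[i][j] <= max(D[i][l], D[l][j]) for i, j, l in product(range(n), repeat=3))

def is_isometric(D_src, D_tgt, emb):
    return all(D_src[a][b] == D_tgt[emb[a]][emb[b]] for a in emb for b in emb)

if __name__ == "__main__":
    DX = [[0, 1, 4], [1, 0, 4], [4, 4, 0]]
    DY = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
    phi = {0: 0, 1: 1}  # A = {x0, x1} mapped isometrically onto {y0, y1}
    DZ, emb_X, emb_Y = build_Z_A(DX, DY, phi)
    print(DZ)  # [[0, 1, 4, 3], [1, 0, 4, 3], [4, 4, 0, 4], [3, 3, 4, 0]]
    assert is_ultrametric(DZ)
    assert is_isometric(DX, DZ, emb_X) and is_isometric(DY, DZ, emb_Y)
```

In this example the point $y_2 \notin \varphi(A)$ receives distance $3$ to the matched points $x_0, x_1$ via rule (2), and distance $\max(4,3) = 4$ to the unmatched point $x_2$ via rule (3).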

Theorem 3.13.

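Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces. Then, we have for each $p \in [1,\infty]$ that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{(A,\varphi)\in\mathcal{A}} d^{Z_A}_{W,p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big). \tag{21}$$

Proof.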

By Lemma 3.11, it is sufficient to prove that each $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ induces some $(A,\varphi) \in \mathcal{A}$ such that
$$d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y) \ge d^{Z_A}_{W,p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big).$$
For $(A,\varphi) \in \mathcal{A}$ and $(x,y) \in X\times Y$, write $\hat u_{(A,\varphi)}(x,y) := u_{Z_A}\big(\phi^X_{(A,\varphi)}(x),\psi^Y_{(A,\varphi)}(y)\big)$. Since both sides of the above inequality only depend on the respective distances between points of $X$ and points of $Y$, the inequality holds as soon as $u(x,y) \ge \hat u_{(A,\varphi)}(x,y)$ for all $(x,y) \in X\times Y$.

Let $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$. We define $A_0 := \{x \in X \mid \exists\, y \in Y \text{ such that } u(x,y) = 0\}$ ($A_0 \neq \emptyset$ by Lemma 3.12). By Example 3.10, the map $\varphi_0: A_0 \to Y$ defined by taking $x$ to $y$ such that $u(x,y) = 0$ is an isometric embedding, i.e., $(A_0,\varphi_0) \in \mathcal{A}$. If $u(x,y) \ge \hat u_{(A_0,\varphi_0)}(x,y)$ holds for all $(x,y) \in X\times Y$, then we set $A := A_0$ and $\varphi := \varphi_0$, which gives the desired inequality. Otherwise, there exists $(x,y) \in (X\setminus A_0)\times(Y\setminus\varphi_0(A_0))$ such that $u(x,y) < \hat u_{(A_0,\varphi_0)}(x,y)$ (if $x \in A_0$ or $y \in \varphi_0(A_0)$, then we must have $u(x,y) \ge \hat u_{(A_0,\varphi_0)}(x,y)$). Let $(x_1,y_1) \in (X\setminus A_0)\times(Y\setminus\varphi_0(A_0))$ be such that
$$u(x_1,y_1) = \min\Big\{u(x,y) \;\Big|\; (x,y) \in (X\setminus A_0)\times(Y\setminus\varphi_0(A_0)) \text{ and } u(x,y) < \hat u_{(A_0,\varphi_0)}(x,y)\Big\} > 0.$$
The existence of $(x_1,y_1)$ follows from the finiteness of $X$ and $Y$. It is easy to check that $\varphi_0$ extends to an isometry from $A_0\cup\{x_1\}$ to $\varphi_0(A_0)\cup\{y_1\}$ by taking $x_1$ to $y_1$. We denote the new isometry by $\varphi_1$ and set $A_1 := A_0\cup\{x_1\}$. If $u(x,y) \ge \hat u_{(A_1,\varphi_1)}(x,y)$ for all $(x,y) \in X\times Y$, then we define $A := A_1$ and $\varphi := \varphi_1$. Otherwise, we continue the process to obtain $A_2, A_3, \ldots$. This process eventually stops since we are considering finite spaces. Suppose the process stops at $A_n$; then $A := A_n$ and $\varphi := \varphi_n$ satisfy $u(x,y) \ge \hat u_{(A,\varphi)}(x,y)$ for all $(x,y) \in X\times Y$. Therefore,
$$d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y) \ge d^{Z_A}_{W,p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big).$$
Since $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ is arbitrary, this gives the claim. ∎

In fact, the possibilities of pairs in $\mathcal{A}$ in (21) can be further reduced.
We call a pair $(A,\varphi) \in \mathcal{A}$ maximal if for all pairs $(B,\varphi') \in \mathcal{A}$ with $A \subseteq B$ and $\varphi'|_A = \varphi$ it holds that $A = B$. We denote by $\mathcal{A}^* \subseteq \mathcal{A}$ the collection of all maximal pairs. The subsequent direct consequence of Theorem 3.13 demonstrates that in order to calculate $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, it is sufficient to consider only spaces $(Z_A,u_{Z_A})$ induced by maximal pairs in $\mathcal{A}^*$.

Corollary 3.14.

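Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces. Then, we have for each $p \in [1,\infty]$ that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{(A,\varphi)\in\mathcal{A}^*} d^{Z_A}_{W,p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big). \tag{22}$$

Remark 3.15.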

Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite ultrametric measure spaces. The representation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1 \le p \le \infty$, given by Theorem 3.13 is very explicit and recasts the computation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1 \le p \le \infty$, as a combinatorial problem. Using the ultrametric Gromov-Hausdorff distance (see (9)), it is possible to determine whether two ultrametric spaces are isometric in polynomial time (Mémoli et al., 2019, Theorem 5.7). However, this is clearly not sufficient to identify all $(A,\varphi) \in \mathcal{A}^*$ in polynomial time, especially since for a given, viable $A \subseteq X$ there are usually multiple ways to define the corresponding map $\varphi$. Furthermore, for $1 \le p < \infty$ we have been able neither to further restrict the set $\mathcal{A}^*$ nor to identify the optimal pair $(A^*,\varphi^*)$. This leaves only a brute-force approach, which is computationally not feasible. On the other hand, for $p = \infty$ we are able to explicitly construct the optimal pair $(A^*,\varphi^*)$ (see Theorem 3.34).

3.2. The ultrametric Gromov-Wasserstein distance.

Next, we consider basic properties of $u_{\mathrm{GW},p}$ and prove the analogue of Theorem 3.5, i.e., we verify that $u_{\mathrm{GW},p}$ is also a $p$-metric on the collection of ultrametric measure spaces, $1 \le p \le \infty$. We start with the following obvious observation:

Lemma 3.16.

For any $p \in [1,\infty]$, we always have that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$.

The subsequent proposition collects two basic properties of $u_{\mathrm{GW},p}$ which are also shared by $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ (cf. Proposition 3.4).

Proposition 3.17.

Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$. Then, the following holds:
(1) For any $1 \le p \le q \le \infty$, it holds that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \le u_{\mathrm{GW},q}(\mathcal{X},\mathcal{Y})$;
(2) We have that $\lim_{p\to\infty} u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$.

Proof. (1) By Jensen's inequality, we have that $\mathrm{dis}^{\mathrm{ult}}_p(\mu) \le \mathrm{dis}^{\mathrm{ult}}_q(\mu)$ for any $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. Therefore, $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \le u_{\mathrm{GW},q}(\mathcal{X},\mathcal{Y})$.
(2) By (1), $L := \lim_{n\to\infty} u_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y})$ exists and $L \le u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. To prove the opposite inequality, let $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^n\,\mu_n(dx\times dy)\,\mu_n(dx'\times dy')\right)^{\frac{1}{n}} = u_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y}).$$
By Lemma B.2, $\{\mu_n\}_{n\in\mathbb{N}}$ weakly converges (after taking an appropriate subsequence) to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. Let $M := \sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)} \Delta(u_X(x,x'),u_Y(y,y'))$ and, given any $\varepsilon > 0$, let
$$U := \{((x,y),(x',y')) \in X\times Y\times X\times Y \mid \Delta(u_X(x,x'),u_Y(y,y')) > M - \varepsilon\}.$$
Then, we have $\mu\otimes\mu(U) > 0$. As $\mu_n$ weakly converges to $\mu$, we have that $\mu_n\otimes\mu_n$ weakly converges to $\mu\otimes\mu$. Since $U$ is open, there exists a small $\varepsilon' > 0$ such that $\mu_n\otimes\mu_n(U) > \mu\otimes\mu(U) - \varepsilon' > 0$ for $n$ large enough. Therefore,
$$\left(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^n\,\mu_n(dx\times dy)\,\mu_n(dx'\times dy')\right)^{\frac{1}{n}} \ge (\mu_n\otimes\mu_n(U))^{\frac{1}{n}}(M - \varepsilon) \ge (\mu\otimes\mu(U) - \varepsilon')^{\frac{1}{n}}(M - \varepsilon).$$
For $n \to \infty$, we obtain $L \ge M - \varepsilon$. Since $\varepsilon > 0$ is arbitrary, we have $L \ge M \ge u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. ∎

Furthermore, it is possible to write down $u_{\mathrm{GW},p}$, $1 \le p \le \infty$, explicitly in some simple settings. Let $\mathcal{X} = (X,u_X,\mu_X)$ be an ultrametric measure space. Its $p$-diameter (see e.g. Mémoli (2011)) is defined for $1 \le p < \infty$ as
$$\mathrm{diam}_p(\mathcal{X}) := \left(\iint_{X\times X}\big(u_X(x,x')\big)^p\,\mu_X(dx)\,\mu_X(dx')\right)^{1/p}$$
and for $p = \infty$ as $\mathrm{diam}_\infty(\mathcal{X}) := \sup_{x,x'\in\mathrm{supp}(\mu_X)} u_X(x,x')$. Then, one can show the subsequent proposition.
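For a finite ultrametric measure space, both $\mathrm{diam}_p$ and the value of $u_{\mathrm{GW},p}(\cdot,*)$ on the unique coupling $\mu_X\otimes\delta_*$ reduce to finite sums, so the identity stated next can be checked directly. A minimal Python sketch, assuming the space is given by a distance matrix `D` and a probability vector `mu` (all names are ours, not from the paper):

```python
import math

def Delta(a, b):
    """The ultrametric on R_{>=0}: max(a, b) if a != b, and 0 otherwise."""
    return max(a, b) if a != b else 0

def diam_p(D, mu, p):
    """p-diameter of the finite ultrametric measure space (D, mu); p may be math.inf."""
    n = len(D)
    if p == math.inf:
        supp = [i for i in range(n) if mu[i] > 0]
        return max(D[i][j] for i in supp for j in supp)
    s = sum(D[i][j] ** p * mu[i] * mu[j] for i in range(n) for j in range(n))
    return s ** (1.0 / p)

def ugw_to_point(D, mu, p):
    """u_GW,p(X, *) evaluated on the unique coupling mu ⊗ delta_*, where u_*(*, *) = 0."""
    n = len(D)
    if p == math.inf:
        supp = [i for i in range(n) if mu[i] > 0]
        return max(Delta(D[i][j], 0) for i in supp for j in supp)
    s = sum(Delta(D[i][j], 0) ** p * mu[i] * mu[j] for i in range(n) for j in range(n))
    return s ** (1.0 / p)

if __name__ == "__main__":
    D = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
    mu = [0.25, 0.25, 0.5]
    for p in (1, 2, 4, math.inf):
        print(p, diam_p(D, mu, p), ugw_to_point(D, mu, p))
```

The two functions agree because $\Delta(a,0) = a$ for every $a \ge 0$, which is exactly the step used in the proof below.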

Proposition 3.18.

Let $* \in \mathcal{U}^w$ be the one-point space. Then, it holds for any $1 \le p \le \infty$ that $u_{\mathrm{GW},p}(\mathcal{X},*) = \mathrm{diam}_p(\mathcal{X})$.

Proof.

Denote by $\mu$ the unique coupling $\mu_X\otimes\delta_*$ between $\mu_X$ and $\delta_*$. Then, for any $p < \infty$ we have
$$u_{\mathrm{GW},p}(\mathcal{X},*) = \left(\iint_{X\times\{*\}\times X\times\{*\}}\big(\Delta(u_X(x,x'),u_*(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\right)^{1/p} = \left(\iint_{X\times X}\big(u_X(x,x')\big)^p\,\mu_X(dx)\,\mu_X(dx')\right)^{1/p} = \mathrm{diam}_p(\mathcal{X}),$$
where we used that $\Delta(a,0) = a$ for all $a \ge 0$. The case $p = \infty$ follows by analogous arguments. ∎

Next, we verify that $u_{\mathrm{GW},p}$ is indeed a metric on the collection of ultrametric measure spaces.

Theorem 3.19.

The ultrametric Gromov-Wasserstein distance $u_{\mathrm{GW},p}$ is a $p$-metric on the collection $\mathcal{U}^w$ of compact ultrametric measure spaces. In particular, when $p = \infty$, $u_{\mathrm{GW},\infty}$ is an ultrametric.

The proof relies on the existence of optimal couplings in (13), which is established in Proposition 3.20.
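For finite spaces, the objective $\mathrm{dis}^{\mathrm{ult}}_p(\mu)$ in (13) is a finite double sum over pairs of points of $\mathrm{supp}(\mu)$ and is quadratic in the entries of the coupling matrix; minimizing it over couplings is the quadratic program referred to in Remark 3.21. The Python sketch below only evaluates this objective for a given coupling and does not solve the minimization. The inputs (distance matrices `DX`, `DY` and a coupling matrix `mu` assumed to have the right marginals) and all names are our own assumptions.

```python
import math
from itertools import product

def Delta(a, b):
    """The ultrametric on R_{>=0}: max(a, b) if a != b, and 0 otherwise."""
    return max(a, b) if a != b else 0

def dis_ult(DX, DY, mu, p):
    """dis^ult_p(mu) for a coupling matrix mu of two finite ultrametric measure spaces."""
    supp = [(i, j) for i, j in product(range(len(DX)), range(len(DY))) if mu[i][j] > 0]
    if p == math.inf:
        return max(Delta(DX[i][k], DY[j][l]) for (i, j), (k, l) in product(supp, repeat=2))
    s = sum(Delta(DX[i][k], DY[j][l]) ** p * mu[i][j] * mu[k][l]
            for (i, j), (k, l) in product(supp, repeat=2))
    return s ** (1.0 / p)

if __name__ == "__main__":
    DX = [[0, 1], [1, 0]]              # two points at distance 1, uniform measure
    diagonal = [[0.5, 0.0], [0.0, 0.5]]
    product_coupling = [[0.25, 0.25], [0.25, 0.25]]
    for p in (1, 2, math.inf):
        print(p, dis_ult(DX, DX, diagonal, p), dis_ult(DX, DX, product_coupling, p))
```

Comparing a space with itself, the diagonal coupling matches points one to one and gives $0$, whereas the product coupling gives a strictly positive value; this is consistent with $u_{\mathrm{GW},p}$ being defined as an infimum over couplings.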

Proposition 3.20.

Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, for any $p \in [1,\infty]$, there always exists an optimal coupling $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ such that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \mathrm{dis}^{\mathrm{ult}}_p(\mu)$.

Proof. We only prove the claim for the case $p < \infty$ since the case $p = \infty$ can be proven in a similar manner. Let $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^p\,d\mu_n(x,y)\,d\mu_n(x',y')\right)^{\frac{1}{p}} \le u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) + \frac{1}{n}.$$
By Lemma B.2, $\{\mu_n\}_{n\in\mathbb{N}}$ weakly converges to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ (after taking an appropriate subsequence). Then, by the boundedness and continuity of $\Delta(u_X,u_Y)$ on $X\times Y\times X\times Y$ (cf. Lemma B.5) as well as the weak convergence of $\mu_n\otimes\mu_n$, we have that
$$\mathrm{dis}^{\mathrm{ult}}_p(\mu) = \lim_{n\to\infty}\mathrm{dis}^{\mathrm{ult}}_p(\mu_n) \le u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}).$$
Hence, $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \mathrm{dis}^{\mathrm{ult}}_p(\mu)$. ∎

With Proposition 3.20 at our disposal, we can demonstrate the analogue of Theorem 3.5 for $u_{\mathrm{GW},p}$, $1 \le p \le \infty$.

Proof of Theorem 3.19.

It is clear that $u_{\mathrm{GW},p}$ is symmetric and that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if $\mathcal{X} \cong_w \mathcal{Y}$. Furthermore, we remark that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ by Lemma 3.16. Since $d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if and only if $\mathcal{X} \cong_w \mathcal{Y}$ (see Mémoli (2011)), we have that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if and only if $\mathcal{X} \cong_w \mathcal{Y}$. It remains to verify the $p$-triangle inequality. To this end, we only prove the case $p < \infty$, whereas the case $p = \infty$ follows by analogous arguments.

Now let $\mathcal{X},\mathcal{Y},\mathcal{Z}$ be three ultrametric measure spaces. Let $\mu_{XY} \in \mathcal{C}(\mu_X,\mu_Y)$ and $\mu_{YZ} \in \mathcal{C}(\mu_Y,\mu_Z)$ be optimal couplings (which exist by Proposition 3.20). By the Gluing Lemma (Villani, 2003, Lemma 7.6), there exists a measure $\mu_{XYZ} \in \mathcal{P}(X\times Y\times Z)$ with marginals $\mu_{XY}$ on $X\times Y$ and $\mu_{YZ}$ on $Y\times Z$. Further, we define $\mu_{XZ} := (\pi_{XZ})_\#\mu_{XYZ} \in \mathcal{P}(X\times Z)$. Then,
$$\begin{aligned}(u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Z}))^p &\le \iint_{X\times Z\times X\times Z}\big(\Delta(u_X(x,x'),u_Z(z,z'))\big)^p\,\mu_{XZ}(dx\times dz)\,\mu_{XZ}(dx'\times dz')\\ &= \iint_{(X\times Y\times Z)^2}\big(\Delta(u_X(x,x'),u_Z(z,z'))\big)^p\,\mu_{XYZ}(dx\times dy\times dz)\,\mu_{XYZ}(dx'\times dy'\times dz')\\ &\le \iint_{(X\times Y\times Z)^2}\big(\Delta(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu_{XYZ}(dx\times dy\times dz)\,\mu_{XYZ}(dx'\times dy'\times dz')\\ &\quad + \iint_{(X\times Y\times Z)^2}\big(\Delta(u_Y(y,y'),u_Z(z,z'))\big)^p\,\mu_{XYZ}(dx\times dy\times dz)\,\mu_{XYZ}(dx'\times dy'\times dz')\\ &= \iint_{X\times Y\times X\times Y}\big(\Delta(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu_{XY}(dx\times dy)\,\mu_{XY}(dx'\times dy')\\ &\quad + \iint_{Y\times Z\times Y\times Z}\big(\Delta(u_Y(y,y'),u_Z(z,z'))\big)^p\,\mu_{YZ}(dy\times dz)\,\mu_{YZ}(dy'\times dz')\\ &= (u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}))^p + (u_{\mathrm{GW},p}(\mathcal{Y},\mathcal{Z}))^p,\end{aligned}$$
where the second inequality follows from the fact that $\Delta$ is an ultrametric on $\mathbb{R}_{\ge 0}$ (see Lemma A.3) and the observation that an ultrametric is automatically a $p$-metric for any $p \in [1,\infty)$ (Mémoli et al., 2019, Proposition 1.16). ∎

Figure 4. Weighted quotient: An ultrametric measure space (black) and its weighted quotient at level $t$ (red).

Remark 3.21.

By the same arguments as for $d_{\mathrm{GW},p}$, $1 \le p < \infty$ (Mémoli, 2011, Sec. 7), it follows that for two finite ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ the computation of $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1 \le p < \infty$, boils down to solving a (non-convex) quadratic program. This is in general NP-hard (Pardalos and Vavasis, 1991). On the other hand, for $p = \infty$, we will derive a polynomial time algorithm to determine $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$ (cf. Section 3.2.1).

3.2.1. Alternative representations of $u_{\mathrm{GW},\infty}$. In the following, we derive an alternative representation of $u_{\mathrm{GW},\infty}$ that resembles the one of $u_{\mathrm{GH}}$ derived in Mémoli et al. (2019, Theorem 5.7). It also leads to a polynomial time algorithm for the computation of $u_{\mathrm{GW},\infty}$. For this purpose, we define the weighted quotient of an ultrametric measure space. Let $\mathcal{X} = (X,u_X,\mu_X) \in \mathcal{U}^w$ and let $t \ge 0$. Then, the weighted quotient of $\mathcal{X}$ at level $t$ is given as $\mathcal{X}_t = (X_t,u_{X_t},\mu_{X_t})$, where $(X_t,u_{X_t})$ is the quotient of the ultrametric space $(X,u_X)$ at level $t$ (see Section 2.2) and $\mu_{X_t} \in \mathcal{P}(X_t)$ is the pushforward of $\mu_X$ under the canonical quotient map $(X,u_X) \to (X_t,u_{X_t})$, $x \mapsto [x]_t$. Figure 4 illustrates the weighted quotient in a simple example. Based on this definition, we show the following theorem.

Theorem 3.22.

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two compact ultrametric measure spaces. Then, it holds that $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=\min\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$.

Remark 3.23.

The weighted quotients $\mathcal{X}_t$ and $\mathcal{Y}_t$ can be considered as vertex-weighted, rooted trees, and thus it is possible to verify $\mathcal{X}_t\cong_w\mathcal{Y}_t$ in polynomial time (Aho and Hopcroft, 1974). In consequence, we obtain a polynomial time algorithm for the calculation of $u_{\mathrm{GW},\infty}$. See Section 7.1.2 for details.

Proof of Theorem 3.22.

We first prove that
$$u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}\tag{23}$$
and then show that the infimum is attained.

Since $\mathcal{X}_0\cong_w\mathcal{X}$ and $\mathcal{Y}_0\cong_w\mathcal{Y}$, if $\mathcal{X}_0\cong_w\mathcal{Y}_0$, then $\mathcal{X}\cong_w\mathcal{Y}$ and thus, by Theorem 3.19, $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=0=\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$. Now, assume that $\mathcal{X}_t\cong_w\mathcal{Y}_t$ for some $t>0$. By Lemma 2.7, for some $n\in\mathbb{N}$ we can write $X_t=\{[x_1]_t,\dots,[x_n]_t\}$ and $Y_t=\{[y_1]_t,\dots,[y_n]_t\}$ such that $u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)$ and $\mu_X([x_i]_t)=\mu_Y([y_i]_t)$. Let $\mu^i_X:=\mu_X|_{[x_i]_t}$ and $\mu^i_Y:=\mu_Y|_{[y_i]_t}$ for all $i=1,\dots,n$, and let
$$\mu:=\sum_{i=1}^n\frac{1}{\mu_X([x_i]_t)}\,\mu^i_X\otimes\mu^i_Y.$$
It is easy to check that $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ and $\mathrm{supp}(\mu)=\bigcup_{i=1}^n[x_i]_t\times[y_i]_t$. Assume $(x,y)\in[x_i]_t\times[y_i]_t$ and $(x',y')\in[x_j]_t\times[y_j]_t$. If $i\ne j$, then $u_X(x,x')=u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)=u_Y(y,y')$ and thus $\Delta_\infty(u_X(x,x'),u_Y(y,y'))=0$. If $i=j$, then $u_X(x,x'),u_Y(y,y')\le t$ and thus $\Delta_\infty(u_X(x,x'),u_Y(y,y'))\le t$. In either case, we have that
$$u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\le\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\Delta_\infty(u_X(x,x'),u_Y(y,y'))\le t.$$
Therefore, $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\le\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$.

Conversely, suppose $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ and let $t:=\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\Delta_\infty(u_X(x,x'),u_Y(y,y'))$. By Mémoli (2011, Lemma 2.2), we know that $\mathrm{supp}(\mu)$ is a correspondence between $X$ and $Y$. It is easy to check that $\mathrm{dis}(\mathrm{supp}(\mu))=t$. We define a map $f_t:X_t\to Y_t$ by taking $[x]^X_t\in X_t$ to $[y]^Y_t\in Y_t$ such that $(x,y)\in\mathrm{supp}(\mu)$. It is easy to check that $f_t$ is well defined and, moreover, that $f_t$ is an isometry (see for example the proof of Mémoli et al. (2019, Theorem 5.7)). Next, we prove that $f_t$ is actually an isomorphism between $\mathcal{X}_t$ and $\mathcal{Y}_t$. For any $[x]^X_t\in X_t$, let $y\in Y$ be such that $(x,y)\in\mathrm{supp}(\mu)$, so that $[y]^Y_t=f_t([x]^X_t)$. If there exists $(x',y')\in\mathrm{supp}(\mu)$ such that $x'\in[x]^X_t$ and $y'\notin[y]^Y_t$, then $\Delta_\infty(u_X(x,x'),u_Y(y,y'))=u_Y(y,y')>t=\mathrm{dis}(\mathrm{supp}(\mu))$, which is impossible; the symmetric case $x'\notin[x]^X_t$, $y'\in[y]^Y_t$ is excluded in the same way. Consequently, $\mu\big([x]^X_t\times(Y\setminus[y]^Y_t)\big)=\mu\big((X\setminus[x]^X_t)\times[y]^Y_t\big)=0$ and thus
$$\mu_X([x]^X_t)=\mu([x]^X_t\times Y)=\mu([x]^X_t\times[y]^Y_t)=\mu(X\times[y]^Y_t)=\mu_Y([y]^Y_t).$$
Therefore, $f_t$ is an isomorphism between $\mathcal{X}_t$ and $\mathcal{Y}_t$. Consequently, we have that $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\ge\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$ and hence $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$.

Now, we show that the infimum in $\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$ is attained. Let $\delta:=\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$. If $\delta>0$, then both $X_\delta$ and $Y_\delta$ are finite spaces. Then, if $\{t_n\}$ is a decreasing sequence converging to $\delta$, for $n$ large enough we actually have that $\mathcal{X}_{t_n}=\mathcal{X}_\delta$ and $\mathcal{Y}_{t_n}=\mathcal{Y}_\delta$. If we moreover assume that $\mathcal{X}_{t_n}\cong_w\mathcal{Y}_{t_n}$ for all $t_n$, then we have that $\mathcal{X}_\delta\cong_w\mathcal{Y}_\delta$. If $\delta=0$, then by (23) we have that $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=\delta=0$. By Theorem 3.19, $\mathcal{X}\cong_w\mathcal{Y}$, which is equivalent to $\mathcal{X}_\delta\cong_w\mathcal{Y}_\delta$. Therefore, the infimum in $\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$ is always attained. ∎
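Theorem 3.22 and Remark 3.23 suggest a direct procedure for finite spaces: scan candidate levels $t$ and test $\mathcal{X}_t\cong_w\mathcal{Y}_t$. The Python sketch below is our own illustration, not the implementation referred to in Section 7.1.2. It assumes a finite ultrametric measure space is given as a symmetric distance matrix `D` plus a list `w` of point masses (exact `Fraction`s, to avoid floating-point ties), and it tests weighted isomorphism by comparing recursively built canonical encodings of the weighted dendrograms, in the spirit of the rooted-tree isomorphism test cited in Remark 3.23. Since $\mathcal{X}_t$ and $\mathcal{Y}_t$ only change when $t$ crosses a value of $u_X$ or $u_Y$, it suffices to scan $t\in\{0\}\cup u_X(X\times X)\cup u_Y(Y\times Y)$ in increasing order. The names `weighted_quotient`, `canonical_form` and `u_gw_inf` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def weighted_quotient(D, w, t):
    """Weighted quotient X_t: merge points at distance <= t and add their masses.

    For an ultrametric D, "D[i][j] <= t" is an equivalence relation, so
    comparing against one representative per class is enough.
    """
    classes = []
    for i in range(len(D)):
        for cls in classes:
            if D[i][cls[0]] <= t:
                cls.append(i)
                break
        else:
            classes.append([i])
    k = len(classes)
    # Distances between distinct classes do not depend on the representatives.
    Dq = [[0 if a == b else D[classes[a][0]][classes[b][0]] for b in range(k)]
          for a in range(k)]
    wq = [sum(w[i] for i in cls) for cls in classes]
    return Dq, wq


def canonical_form(D, w, idx=None):
    """Canonical encoding of the weighted dendrogram of (D, w) restricted to idx.

    A leaf is (0, mass, ()); a block of diameter d > 0 is
    (d, total mass, sorted children), the children being the classes of D < d.
    Equal encodings correspond to a mass-preserving isometry.
    """
    if idx is None:
        idx = list(range(len(D)))
    if len(idx) == 1:
        return (0, w[idx[0]], ())
    d = max(D[i][j] for i, j in combinations(idx, 2))
    groups = []
    for i in idx:
        for g in groups:
            if D[i][g[0]] < d:
                g.append(i)
                break
        else:
            groups.append([i])
    children = tuple(sorted(canonical_form(D, w, g) for g in groups))
    return (d, sum(w[i] for i in idx), children)


def u_gw_inf(DX, wX, DY, wY):
    """Smallest level t with X_t and Y_t weighted-isomorphic (Theorem 3.22)."""
    levels = sorted({0} | {v for row in DX for v in row} | {v for row in DY for v in row})
    for t in levels:
        if canonical_form(*weighted_quotient(DX, wX, t)) == \
                canonical_form(*weighted_quotient(DY, wY, t)):
            return t
    raise ValueError("total masses differ; expected probability measures")
```

For two uniform two-point spaces with diameters 1 and 2, this scan returns 2, which agrees with Corollary 3.25 below.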

The representations of $u_{\mathrm{GH}}$ in Theorem 2.8 and of $u_{\mathrm{GW},\infty}$ in Theorem 3.22 strongly resemble each other. As a direct consequence of Theorem 2.8 and Theorem 3.22, we obtain the following comparison between the two metrics.

Corollary 3.24.

Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Then, it holds that
$$u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\ge u_{\mathrm{GH}}(X,Y).\tag{24}$$
The inequality in (24) is sharp, as we illustrate next. By Mémoli et al. (2019, Corollary 5.8) we know that if the considered ultrametric spaces $(X,u_X)$ and $(Y,u_Y)$ have different diameters (w.l.o.g. $\mathrm{diam}(X)<\mathrm{diam}(Y)$), then $u_{\mathrm{GH}}(X,Y)=\mathrm{diam}(Y)$. The same statement also holds for $u_{\mathrm{GW},\infty}$.

Corollary 3.25.

Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ be such that $\mathrm{diam}(X)<\mathrm{diam}(Y)$. Then, $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=\mathrm{diam}(Y)=u_{\mathrm{GH}}(X,Y)$.

Proof.

The rightmost equality follows directly from Corollary 5.8 of Mémoli et al. (2019). As for the leftmost equality, let $t:=\mathrm{diam}(Y)$; then it is obvious that $\mathcal{X}_t\cong_w *\cong_w\mathcal{Y}_t$. Let $s<\mathrm{diam}(Y)$. Then $\mathrm{diam}(Y_s)=\mathrm{diam}(Y)$, whereas $\mathrm{diam}(X_s)\le\mathrm{diam}(X)<\mathrm{diam}(Y)$; in particular, for $s\in(\mathrm{diam}(X),\mathrm{diam}(Y))$ we have $\mathcal{X}_s\cong_w *$ whereas $\mathcal{Y}_s\not\cong_w *$. Hence $\mathcal{X}_s\not\cong_w\mathcal{Y}_s$ for all $s<t$ and, by Theorem 3.22, $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=t=\mathrm{diam}(Y)$. ∎

3.3. The relation between $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$. In this subsection, we study the relation between $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$, $1\le p\le\infty$. For this purpose, we first demonstrate that it is sufficient to consider the two cases $p=1$ and $p=\infty$.

For each $\alpha>0$, we define a function $S_\alpha:\mathbb{R}_{\ge0}\to\mathbb{R}_{\ge0}$ by $x\mapsto x^\alpha$. Given an ultrametric space $(X,u_X)$ and $\alpha>0$, we abuse the notation and denote by $S_\alpha(X)$ the new space $(X,S_\alpha\circ u_X)$. It is obvious that $S_\alpha(X)$ is still an ultrametric space. This transformation of metric spaces is also known as the snowflake transform (David et al., 1997). An important observation is that the snowflake transform relates the $p$-Wasserstein pseudometric on the pseudo-ultrametric space $X$ with the 1-Wasserstein pseudometric on the space $S_p(X)$, $1\le p<\infty$.

Lemma 3.26.

Given a pseudo-ultrametric space $(X,u_X)$ and $p\ge1$, we have for any $\alpha,\beta\in\mathcal{P}(X)$ that
$$d^{(X,u_X)}_{W,p}(\alpha,\beta)=\left(d^{S_p(X)}_{W,1}(\alpha,\beta)\right)^{1/p}.$$

Remark 3.27.

Since $S_p\circ u_X$ and $u_X$ induce the same topology and thus the same Borel sets on $X$, we have that $\mathcal{P}(X)=\mathcal{P}(S_p(X))$ and thus the expression $d^{S_p(X)}_{W,1}(\alpha,\beta)$ in the lemma is well defined.

Proof.

Suppose $\mu_1,\mu_2\in\mathcal{C}(\alpha,\beta)$ are optimal for $d^{X}_{W,p}(\alpha,\beta)$ and $d^{S_p(X)}_{W,1}(\alpha,\beta)$, respectively (see Appendix B.1 for the existence of $\mu_1$ and $\mu_2$). Then,
$$\left(d^{(X,u_X)}_{W,p}(\alpha,\beta)\right)^p=\iint_{X\times X}(u_X(x,y))^p\,\mu_1(dx\times dy)=\iint_{X\times X}S_p(u_X)(x,y)\,\mu_1(dx\times dy)\ge d^{S_p(X)}_{W,1}(\alpha,\beta),$$
and
$$d^{S_p(X)}_{W,1}(\alpha,\beta)=\iint_{X\times X}S_p(u_X)(x,y)\,\mu_2(dx\times dy)=\iint_{X\times X}(u_X(x,y))^p\,\mu_2(dx\times dy)\ge\left(d^{(X,u_X)}_{W,p}(\alpha,\beta)\right)^p.$$
Therefore, $d^{(X,u_X)}_{W,p}(\alpha,\beta)=\left(d^{S_p(X)}_{W,1}(\alpha,\beta)\right)^{1/p}$. ∎

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ denote two ultrametric measure spaces and let $1\le p<\infty$. We denote by $S_p(\mathcal{X})$ the ultrametric measure space $(X,S_p\circ u_X,\mu_X)$. Given Lemma 3.26, it is not very surprising that the snowflake transform can also be used to relate $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ as well as $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ with $u_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$ and $u^{\mathrm{sturm}}_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$, respectively.

Theorem 3.28.

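Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and let $p\in[1,\infty)$. Then,
$$\left(u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\right)^p=u_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))\quad\text{and}\quad\left(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\right)^p=u^{\mathrm{sturm}}_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y})).$$

Proof.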

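Let $\mu\in\mathcal{C}(\mu_X,\mu_Y)$. Since $x\mapsto x^p$ is strictly increasing on $\mathbb{R}_{\ge0}$, we have $\big(\Delta_\infty(a,b)\big)^p=\Delta_\infty(a^p,b^p)$ and thus
$$\iint_{X\times Y\times X\times Y}\big(\Delta_\infty(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')=\iint_{X\times Y\times X\times Y}\Delta_\infty\big(u_X(x,x')^p,u_Y(y,y')^p\big)\,\mu(dx\times dy)\,\mu(dx'\times dy').$$
Taking the infimum over $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ on both sides, we obtain that $\left(u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\right)^p=u_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$. In order to prove the second part of the claim, let $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. Then, we have that
$$\iint_{X\times Y}(u(x,y))^p\,\mu(dx\times dy)=\iint_{X\times Y}(S_p\circ u)(x,y)\,\mu(dx\times dy).$$
Taking the infimum over $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ on both sides, we obtain that $\left(d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y)\right)^p=d^{S_p(u)}_{W,1}(\mu_X,\mu_Y)$. Finally, taking the infimum over $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$, we have that $\left(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\right)^p=u^{\mathrm{sturm}}_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$. ∎

As a direct consequence, we obtain the following relation between $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$ and $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ for $p\in[1,\infty)$: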

Corollary 3.29.

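For each $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$ is isometric to the snowflake transform of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$, i.e., $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$.

Proof.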

Consider the snowflake transform map $S_p:\mathcal{U}^w\to\mathcal{U}^w$ sending $\mathcal{X}\in\mathcal{U}^w$ to $S_p(\mathcal{X})\in\mathcal{U}^w$. It is obvious that $S_p$ is bijective. By Theorem 3.28, $S_p$ is an isometry from $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ to $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$. Therefore, $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$. ∎

Theorem 3.28 suggests that in order to study the relation between $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ we only need to examine the cases $p=1$ and $p=\infty$.

3.3.1. The case $p=1$. We first study the relation between $u^{\mathrm{sturm}}_{\mathrm{GW},1}$ and $u_{\mathrm{GW},1}$ and afterwards use Theorem 3.28 to relate $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$, $1\le p<\infty$. We start by showing the following theorem.

Theorem 3.30.

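Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Then, we have $u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})\ge\frac12\,u_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})$.

Proof.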

Let $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ be such that $u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})=\int u(x,y)\,\mu(dx\times dy)$. The existence of $u$ and $\mu$ follows from Proposition 3.6.

Claim 1:

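For any $(x,y),(x',y')\in X\times Y$, we have
$$\Delta_\infty(u_X(x,x'),u_Y(y,y'))\le\max(u(x,y),u(x',y'))\le u(x,y)+u(x',y').$$
Assuming Claim 1, we have
$$
\begin{aligned}
\iint_{X\times Y\times X\times Y}\Delta_\infty(u_X(x,x'),u_Y(y,y'))\,\mu(dx\times dy)\,\mu(dx'\times dy')
&\le\iint_{X\times Y\times X\times Y}u(x,y)\,\mu(dx\times dy)\,\mu(dx'\times dy')\\
&\quad+\iint_{X\times Y\times X\times Y}u(x',y')\,\mu(dx\times dy)\,\mu(dx'\times dy')\\
&=\iint_{X\times Y}u(x,y)\,\mu(dx\times dy)+\iint_{X\times Y}u(x',y')\,\mu(dx'\times dy')\\
&=2\,u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y}).
\end{aligned}
$$
Since the left-hand side is at least $u_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})$, we obtain $u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})\ge\frac12\,u_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})$. Now, we finalize the proof by verifying Claim 1.

Proof of Claim 1: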

We only need to show that $\Delta_\infty(u_X(x,x'),u_Y(y,y'))\le\max(u(x,y),u(x',y'))$. If $u_X(x,x')=u_Y(y,y')$, then there is nothing to prove. Otherwise, we assume without loss of generality that $u_X(x,x')<u_Y(y,y')$, so that $\Delta_\infty(u_X(x,x'),u_Y(y,y'))=u_Y(y,y')$. If $\max(u(x,y),u(x',y'))<u_Y(y,y')$, then by the strong triangle inequality we must have $u(x,y')=u_Y(y,y')=u(x',y)$. However, $u(x,y')\le\max(u_X(x,x'),u(x',y'))<u_Y(y,y')$, which leads to a contradiction. Therefore, $\Delta_\infty(u_X(x,x'),u_Y(y,y'))\le\max(u(x,y),u(x',y'))$. ∎

The following example verifies that the coefficient $\frac12$ in Theorem 3.30 is tight.

Example 3.31.

For each $n\ge2$, let $X_n$ be the three-point space $\Delta_3(1)$ (labeled by $\{x_1,x_2,x_3\}$) with a probability measure $\mu^n_X$ such that $\mu^n_X(x_1)=\mu^n_X(x_2)=\frac1n$ and $\mu^n_X(x_3)=1-\frac2n$. Let $Y=*$ and let $\mu_Y$ be the only probability measure on $Y$. Then, it is routine to check that $u_{\mathrm{GW},1}(\mathcal{X}_n,\mathcal{Y})=\frac2n\left(2-\frac3n\right)$ and $u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X}_n,\mathcal{Y})=\frac2n$. Therefore, we have
$$\lim_{n\to\infty}\frac{u_{\mathrm{GW},1}(\mathcal{X}_n,\mathcal{Y})}{u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X}_n,\mathcal{Y})}=2.$$
In Section 4, we will furthermore verify that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$, $1\le p<\infty$, are also topologically equivalent. For the moment, however, we conclude the direct comparison of the two metrics by deriving their relation for general $p$ on the basis of Theorem 3.28 and Theorem 3.30.

Corollary 3.32.

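Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and $p\in[1,\infty)$. Then, we have $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\ge2^{-\frac1p}\,u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$.

Proof.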

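Applying Theorem 3.28 and Theorem 3.30, we have that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})=\left(u^{\mathrm{sturm}}_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))\right)^{\frac1p}\ge\left(\frac12\,u_{\mathrm{GW},1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))\right)^{\frac1p}=2^{-\frac1p}\,u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}).$$ ∎

3.3.2. The case $p=\infty$. Now, we consider the relation between $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}$ and $u_{\mathrm{GW},\infty}$. By taking the limit $p\to\infty$ in Corollary 3.32, one might expect that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}\ge u_{\mathrm{GW},\infty}$. In fact, we prove that equality holds.

Theorem 3.33.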

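Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Then, it holds that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$.

Proof.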

First we prove that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\ge u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. Indeed, for any $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu\in\mathcal{C}(\mu_X,\mu_Y)$, we have that
$$
\begin{aligned}
\sup_{(x,y)\in\mathrm{supp}(\mu)}u(x,y)&=\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\max(u(x,y),u(x',y'))\\
&\ge\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\Delta_\infty(u_X(x,x'),u_Y(y,y'))\\
&\ge u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y}),
\end{aligned}
$$
where the first inequality follows from Claim 1 in the proof of Theorem 3.30. Then, by a standard limit argument, we conclude that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\ge u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$.

Next, we prove that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\le\min\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$. Suppose $t>0$ is such that $\mathcal{X}_t\cong_w\mathcal{Y}_t$ and let $\varphi:\mathcal{X}_t\to\mathcal{Y}_t$ be such an isomorphism. Then, we define a function $u:(X\sqcup Y)\times(X\sqcup Y)\to\mathbb{R}_{\ge0}$ as follows:
(1) $u|_{X\times X}:=u_X$ and $u|_{Y\times Y}:=u_Y$;
(2) for any $(x,y)\in X\times Y$,
$$u(x,y):=\begin{cases}u_{Y_t}\big(\varphi([x]^X_t),[y]^Y_t\big), & \text{if }\varphi([x]^X_t)\ne[y]^Y_t,\\ t, & \text{if }\varphi([x]^X_t)=[y]^Y_t;\end{cases}$$
(3) for any $(y,x)\in Y\times X$, $u(y,x):=u(x,y)$.
Then, it is easy to verify that $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and that $u$ is actually an ultrametric. Let $Z:=(X\sqcup Y,u)$. By Lemma 2.13, we have
$$u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\le d^Z_{W,\infty}(\mu_X,\mu_Y)=\max_{B\in V(Z)\setminus\{Z\},\ \mu_X(B)\ne\mu_Y(B)}\mathrm{diam}(B^*).$$
We verify that $d^Z_{W,\infty}(\mu_X,\mu_Y)\le t$ as follows. It is obvious that $Z_t\cong X_t\cong Y_t$. Write $X_t=\{[x_i]^X_t\}_{i=1}^n$ and $Y_t=\{[y_i]^Y_t\}_{i=1}^n$ such that $[y_i]^Y_t=\varphi([x_i]^X_t)$ for each $i=1,\dots,n$. Then, $[x_i]^Z_t=[y_i]^Z_t$ and $Z_t=\{[x_i]^Z_t\mid i=1,\dots,n\}$. Since $\varphi$ is an isomorphism, for any $i=1,\dots,n$ we have that $\mu_X([x_i]^X_t)=\mu_Y([y_i]^Y_t)$ and thus $\mu_X([x_i]^Z_t)=\mu_Y([y_i]^Z_t)=\mu_Y([x_i]^Z_t)$ when $\mu_X$ and $\mu_Y$ are regarded as pushforward measures under the inclusion maps $X\hookrightarrow Z$ and $Y\hookrightarrow Z$, respectively. Now for any $B\in V(Z)$ (cf. Section 2.3), if $\mathrm{diam}(B)\ge t$, then $B$ is the union of certain classes $[x_i]^Z_t$ in $Z_t$ and thus $\mu_X(B)=\mu_Y(B)$.
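If $\mathrm{diam}(B)<t$ and $\mathrm{diam}(B^*)>t$, then there exists some $x_i$ such that $B=[x_i]^Z_s$ and $[x_i]^Z_s=[x_i]^Z_t$, where $s:=\mathrm{diam}(B)$. This implies that $\mu_X(B)=\mu_Y(B)$. Hence every block $B$ with $\mu_X(B)\ne\mu_Y(B)$ satisfies $\mathrm{diam}(B^*)\le t$, so that $d^Z_{W,\infty}(\mu_X,\mu_Y)\le t$ and thus $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\le d^{(X\sqcup Y,u)}_{W,\infty}(\mu_X,\mu_Y)\le t$. Therefore, $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\le\inf\{t\ge0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$. Finally, by invoking Theorem 3.22, we conclude that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. ∎

One application of Theorem 3.33 is to explicitly derive the minimizing pair $(A,\phi)\in\mathcal{A}^*$ in (22) for $p=\infty$.

Theorem 3.34.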

Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Let $s:=u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$ and assume that $s>0$. Then, there exists $(A,\phi)\in\mathcal{A}$ defined in (20) such that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=d^{Z_A}_{W,\infty}(\mu_X,\mu_Y)$, where $Z_A$ denotes the ultrametric space defined in Section 3.1.1.

Proof. We prove the result via an explicit construction. By Theorem 3.33, we have $s=u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})=u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. By Theorem 3.22, there exists an isomorphism $\varphi:\mathcal{X}_s\to\mathcal{Y}_s$. Since $s>0$, by Lemma 2.7, both $X_s$ and $Y_s$ are finite spaces. We then let $X_s=\{[x_1]^X_s,\dots,[x_n]^X_s\}$ and $Y_s=\{[y_1]^Y_s,\dots,[y_n]^Y_s\}$ and assume $[y_i]^Y_s=\varphi([x_i]^X_s)$ for each $i=1,\dots,n$. Let $A=\{x_1,\dots,x_n\}$ and define $\phi:A\to Y$ by sending $x_i$ to $y_i$ for each $i=1,\dots,n$. Then, we prove that $(A,\phi)$ satisfies the conditions in the statement.

Since $\varphi$ is an isomorphism, for any $1\le i<j\le n$,
$$u_Y(y_i,y_j)=u_{Y_s}([y_i]^Y_s,[y_j]^Y_s)=u_{Y_s}\big(\varphi([x_i]^X_s),\varphi([x_j]^X_s)\big)=u_{X_s}([x_i]^X_s,[x_j]^X_s)=u_X(x_i,x_j).$$
This implies that $\phi:A\to Y$ is an isometric embedding and thus $(A,\phi)\in\mathcal{A}$.

It is obvious that $(Z_A)_s$ is isometric to both $X_s$ and $Y_s$. In fact, $[x_i]^{Z_A}_s=[y_i]^{Z_A}_s$ in $Z_A$ for each $i=1,\dots,n$ and $(Z_A)_s=\{[x_i]^{Z_A}_s\mid i=1,\dots,n\}$. Since $\varphi$ is an isomorphism, for any $i=1,\dots,n$ we have that $\mu_X([x_i]^X_s)=\mu_Y([y_i]^Y_s)$ and thus $\mu_X([x_i]^{Z_A}_s)=\mu_Y([y_i]^{Z_A}_s)=\mu_Y([x_i]^{Z_A}_s)$ when $\mu_X$ and $\mu_Y$ are regarded as pushforward measures under the inclusion maps $X\to Z_A$ and $Y\to Z_A$, respectively. Now for any $B\in V(Z_A)$, if $\mathrm{diam}(B)\ge s$, then $B$ is the union of certain classes $[x_i]^{Z_A}_s$ and thus $\mu_X(B)=\mu_Y(B)$. If otherwise $\mathrm{diam}(B)<s$ and $\mathrm{diam}(B^*)>s$, then there exists $x_i$ such that $B=[x_i]^{Z_A}_t$ and $[x_i]^{Z_A}_t=[x_i]^{Z_A}_s$, where $t:=\mathrm{diam}(B)$. This implies that $\mu_X(B)=\mu_Y(B)$. Then, by Lemma 2.13, we have $d^{Z_A}_{W,\infty}(\mu_X,\mu_Y)\le s$ and thus $d^{Z_A}_{W,\infty}(\mu_X,\mu_Y)=s$. ∎

4. Topological and geodesic properties

In this section, we study the topology induced by $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ on $\mathcal{U}^w$ and discuss the geodesic properties of both $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ for $1\le p\le\infty$.

4.1. Topological equivalence between $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$. Mémoli (2011) proved the topological equivalence between $d_{\mathrm{GW},p}$ and $d^{\mathrm{sturm}}_{\mathrm{GW},p}$. We establish an analogous result for $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$. To this end, we recall the modulus of mass distribution.

Definition 4.1 (Greven et al. (2009, Def. 2.9)). Given $\delta>0$ and $\mathcal{X}\in\mathcal{U}^w$, we define the modulus of mass distribution of $\mathcal{X}$ as
$$v_\delta(\mathcal{X}):=\inf\Big\{\varepsilon>0\ \Big|\ \mu_X\big(\{x:\mu_X(B^\circ_\varepsilon(x))\le\delta\}\big)\le\varepsilon\Big\},\tag{25}$$
where $B^\circ_\varepsilon(x)$ denotes the open ball centered at $x$ with radius $\varepsilon$.

We note that, as a function of $\delta$, $v_\delta(\mathcal{X})$ is non-decreasing, right-continuous and bounded by 1. Furthermore, it holds that $\lim_{\delta\searrow0}v_\delta(\mathcal{X})=0$.

Theorem 4.2.

Let X , Y P U w , p P r , and δ P ` , ˘ . Then, whenever u GW ,p p X , Y q ă δ we have u sturmGW ,p p X , Y q ď p ¨ min p v δ p X q , v δ p Y qq ` δ q p ¨ M, where M : “ ¨ max p diam p X q , diam p Y qq ` . Remark 4.3.

Since it holds that $\lim_{\delta\searrow0}v_\delta(\mathcal{X})=0$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}\ge2^{-1/p}\,u_{\mathrm{GW},p}$ (see Corollary 3.32), the above theorem gives the topological equivalence of $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1\le p<\infty$ (by Theorem 3.33 it holds that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}=u_{\mathrm{GW},\infty}$). The proof of Theorem 4.2 follows the same strategy used for proving Proposition 5.3 in Mémoli (2011), and we refer to Appendix C.2 for the details.

4.2. Completeness and separability.

In this subsection, we study completeness and separability of the two metrics $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1\le p\le\infty$, on $\mathcal{U}^w$.

Theorem 4.4.
(1) For $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u_{\mathrm{GW},p})$ is neither complete nor separable.
(2) For $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is neither complete nor separable.
(3) $(\mathcal{U}^w,u_{\mathrm{GW},\infty})=(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},\infty})$ is complete but not separable.

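Proof. (1) We first prove that $(\mathcal{U}^w,u_{\mathrm{GW},p})$ is non-separable for each $p\in[1,\infty]$. Recall the notations in Example 3.9 and consider the family $\{\hat\Delta_2(a)\}_{a\in[1,2]}$.

Claim 1: For all $a\ne b\in[1,2]$, $u_{\mathrm{GW},p}\big(\hat\Delta_2(a),\hat\Delta_2(b)\big)=2^{-\frac1p}\,\Delta_\infty(a,b)\ge2^{-\frac1p}$, where we let $2^{-\frac1\infty}:=1$.

Consequently, $\{\hat\Delta_2(\alpha)\}_{\alpha\in[1,2]}$ is an uncountable subset of $\mathcal{U}^w$ with pairwise distances at least $2^{-\frac1p}$, which implies that $(\mathcal{U}^w,u_{\mathrm{GW},p})$ is non-separable.

Now for $p\in[1,\infty)$, we show that $u_{\mathrm{GW},p}$ is not complete. Consider the family $\{\Delta_{2^n}(1)\}_{n\in\mathbb{N}}$ of $2^n$-point spaces with unit interpoint distances. Endow each space $\Delta_{2^n}(1)$ with the uniform measure $\mu_n$ and denote the corresponding ultrametric measure space by $\hat\Delta_{2^n}(1)$. It is proven in Sturm (2012, Example 2.2) that $\{\hat\Delta_{2^n}(1)\}_{n\in\mathbb{N}}$ is a Cauchy sequence with respect to $d_{\mathrm{GW},p}$ without a compact metric measure space as limit. Since all interpoint distances lie in $\{0,1\}$, where $\Delta_\infty$ coincides with $\Delta_1(a,b)=|a-b|$, it is not hard to check that
$$u_{\mathrm{GW},p}\big(\hat\Delta_{2^m}(1),\hat\Delta_{2^n}(1)\big)=2\,d_{\mathrm{GW},p}\big(\hat\Delta_{2^m}(1),\hat\Delta_{2^n}(1)\big),\quad\forall n,m\in\mathbb{N}.$$
Therefore, $\{\hat\Delta_{2^n}(1)\}_{n\in\mathbb{N}}$ is a Cauchy sequence with respect to $u_{\mathrm{GW},p}$; since $u_{\mathrm{GW},p}\ge d_{\mathrm{GW},p}$, any $u_{\mathrm{GW},p}$-limit would also be a $d_{\mathrm{GW},p}$-limit, so the sequence has no limit in $\mathcal{U}^w$. This implies that $(\mathcal{U}^w,u_{\mathrm{GW},p})$ is not complete.

(2) By Corollary 3.32 and (1), we have that $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is not separable. As for completeness, consider the subset $X:=\{2^{-n}\}_{n\in\mathbb{N}}\subseteq(\mathbb{R}_{\ge0},\Delta_\infty)$. By Lemma 2.16, $X$ is not a compact ultrametric space. Let $\mu\in\mathcal{P}(X)$ be the probability measure defined by
$$\mu\big(\{2^{-n}\}\big):=2^{-n},\quad\forall n\in\mathbb{N}.$$
For each $N\in\mathbb{N}$, let $X_N:=\{2^{-n}\mid n=1,\dots,N\}$. Since each $X_N$ is finite, $X_N$ is a compact ultrametric space. Let $\mu_N\in\mathcal{P}(X_N)$ be the probability measure defined by
$$\mu_N\big(\{2^{-n}\}\big):=\begin{cases}2^{-n}, & 1\le n<N,\\ 2^{-N+1}, & n=N.\end{cases}$$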

Then, it is easy to verify (e.g. via Theorem 3.34) that $\{(X_N,\Delta_\infty,\mu_N)\}$ is a $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ Cauchy sequence with $(X,\Delta_\infty,\mu)$ being the limit. Since $X$ is not compact, $(X,\Delta_\infty,\mu)\notin\mathcal{U}^w$ and thus $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is not complete.

(3) That $(\mathcal{U}^w,u_{\mathrm{GW},\infty})$ is non-separable is already proved in (1). Given a Cauchy sequence $\{\mathcal{X}_n=(X_n,u_n,\mu_n)\}_{n\in\mathbb{N}}$ with respect to $u_{\mathrm{GW},\infty}$, the underlying ultrametric spaces $\{X_n\}_{n\in\mathbb{N}}$ form a Cauchy sequence with respect to $u_{\mathrm{GH}}$ due to Corollary 3.24. Since $(\mathcal{U},u_{\mathrm{GH}})$ is complete (see Zarichnyi (2005, Proposition 2.1)), there exists a compact ultrametric space $(X,u_X)$ such that
$$\lim_{n\to\infty}u_{\mathrm{GH}}(X_n,X)=0.$$
For each $n\in\mathbb{N}$, let $\delta_n:=u_{\mathrm{GH}}(X_n,X)$. By Theorem 2.8, we have that $(X_n)_{\delta_n}\cong X_{\delta_n}$. Denote by $\hat\mu_n\in\mathcal{P}(X_{\delta_n})$ the pushforward of $(\mu_n)_{\delta_n}$ under this isometry. Furthermore, by Lemma 2.7, $X_{\delta_n}$ is finite and we write $X_{\delta_n}=\{[x_1]_{\delta_n},\dots,[x_k]_{\delta_n}\}$ for $x_1,\dots,x_k\in X$. Based on this, we define
$$\nu_n:=\sum_{i=1}^k\hat\mu_n([x_i]_{\delta_n})\cdot\delta_{x_i}\in\mathcal{P}(X),$$
where $\delta_{x_i}$ is the Dirac measure at $x_i$. Since $X$ is compact, $\mathcal{P}(X)$ is weakly compact. Therefore, the sequence $\{\nu_n\}_{n\in\mathbb{N}}$ has a cluster point $\nu\in\mathcal{P}(X)$.

Now we show that $\mathcal{X}:=(X,u_X,\nu)$ is a $u_{\mathrm{GW},\infty}$ cluster point of $\{\mathcal{X}_n\}$, and thus the limit of $\{\mathcal{X}_n\}$ since $\{\mathcal{X}_n\}$ is a Cauchy sequence. Without loss of generality, we assume that $\{\nu_n\}$ converges weakly to $\nu$. Fix any $\varepsilon>0$; we need to show that $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{X}_n)\le\varepsilon$ when $n$ is large enough. For any fixed $x_*\in X$, $[x_*]_\varepsilon$ is both an open and a closed ball in $X$. Therefore, $\nu([x_*]_\varepsilon)=\lim_{n\to\infty}\nu_n([x_*]_\varepsilon)$. Since $\delta_n\to0$ as $n\to\infty$, there exists $N_1>0$ such that $\delta_n<\varepsilon$ for all $n>N_1$. We fix an isometry $\varphi_n:(X_n)_{\delta_n}\to X_{\delta_n}$ that gives rise to the construction of $\nu_n$. Then, we let $\psi_n:(X_n)_\varepsilon\to X_\varepsilon$ be the isometry such that the following diagram commutes:
$$
\begin{array}{ccc}
(X_n)_{\delta_n} & \xrightarrow{\ \varphi_n\ } & X_{\delta_n}\\
\big\downarrow{\scriptstyle\varepsilon\text{-quotient}} & & \big\downarrow{\scriptstyle\varepsilon\text{-quotient}}\\
(X_n)_\varepsilon & \xrightarrow{\ \psi_n\ } & X_\varepsilon
\end{array}
$$
Assume that $[x_*]^X_\varepsilon=\bigcup_{i=1}^l[x_i]^X_{\delta_n}$. Let $x^n_*\in X_n$ be such that $\psi_n([x^n_*]^{X_n}_\varepsilon)=[x_*]^X_\varepsilon$ and let $x^n_1,\dots,x^n_l\in X_n$ be such that $\varphi_n([x^n_i]^{X_n}_{\delta_n})=[x_i]^X_{\delta_n}$ for each $i=1,\dots,l$. Then, $[x^n_*]^{X_n}_\varepsilon=\bigcup_{i=1}^l[x^n_i]^{X_n}_{\delta_n}$. Therefore,
$$\nu_n([x_*]^X_\varepsilon)=\sum_{i=1}^l\nu_n([x_i]^X_{\delta_n})=\sum_{i=1}^l\hat\mu_n([x_i]^X_{\delta_n})=\sum_{i=1}^l\mu_n([x^n_i]^{X_n}_{\delta_n})=\mu_n([x^n_*]^{X_n}_\varepsilon).$$
Since $\{\mathcal{X}_n\}$ is a Cauchy sequence, there exists $N_2>0$ such that $u_{\mathrm{GW},\infty}(\mathcal{X}_n,\mathcal{X}_m)<\varepsilon$ when $n,m>N_2$. Then, by Theorem 3.22, $(\mathcal{X}_n)_\varepsilon\cong_w(\mathcal{X}_m)_\varepsilon$ for all $n,m>N_2$. By Lemma 2.7, $(X_n)_\varepsilon$ is finite, and its cardinality is independent of $n$ when $n>N_2$. Moreover, for all $n>N_2$, the finite set $A_n:=\{\mu_n(B)\mid B\in(X_n)_\varepsilon\}$ of masses of the $\varepsilon$-classes is independent of $n$, and thus $\mu_n([x^n_*]^{X_n}_\varepsilon)$ only takes values in a finite set ($A_n$). Combining this with the fact that $\lim_{n\to\infty}\mu_n([x^n_*]^{X_n}_\varepsilon)=\lim_{n\to\infty}\nu_n([x_*]^X_\varepsilon)=\nu([x_*]^X_\varepsilon)$ exists, there exists $N_3>0$ such that $\mu_n([x^n_*]^{X_n}_\varepsilon)\equiv C$ for all $n>N_3$ and some constant $C$. This implies that $\nu([x_*]^X_\varepsilon)=\mu_n([x^n_*]^{X_n}_\varepsilon)$ when $n>\max(N_1,N_2,N_3)$. Since $X_\varepsilon$ is finite, there exists a common $N>0$ such that for all $n>N$ and all $[x_*]_\varepsilon\in X_\varepsilon$ we have $\nu([x_*]^X_\varepsilon)=\mu_n([x^n_*]^{X_n}_\varepsilon)$, where $[x^n_*]^{X_n}_\varepsilon=\psi_n^{-1}([x_*]^X_\varepsilon)\in(X_n)_\varepsilon$. This shows that $\nu_\varepsilon=(\psi_n)_\#(\mu_n)_\varepsilon$ when $n>N$. Therefore, $\mathcal{X}_\varepsilon\cong_w(\mathcal{X}_n)_\varepsilon$ and thus $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{X}_n)\le\varepsilon$.
∎
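Claim 1 in part (1) of the proof above is stated without its computation. As a hedged sanity check for finite $p$, and assuming that $\hat\Delta_2(a)$ denotes the two-point space with interpoint distance $a$ and uniform measure, note that every coupling of two uniform two-point measures has the form $\pi(x_1,y_1)=\pi(x_2,y_2)=c$, $\pi(x_1,y_2)=\pi(x_2,y_1)=\frac12-c$ with $c\in[0,\frac12]$. A short calculation shows that the cost $\iint\Delta_\infty(u_X,u_Y)^p\,d\pi\,d\pi$ equals $(M-A)(4c^2-2c)+\frac M2$ with $M=\max(a,b)^p$ and $A=a^p+b^p$, which is minimized at $c\in\{0,\frac12\}$; this gives $u_{\mathrm{GW},p}=2^{-1/p}\max(a,b)$ for $a\ne b$. The Python sketch below (function names hypothetical) scans a grid of $c$ that contains both endpoints, using exact `Fraction` arithmetic for integer $p$.

```python
from fractions import Fraction


def delta_inf(s, t):
    """Delta_inf(s, t) = max(s, t) if s != t, else 0."""
    return 0 if s == t else max(s, t)


def coupling_cost(a, b, c, p):
    """Sum of pi(x,y) pi(x',y') Delta_inf(u_X(x,x'), u_Y(y,y'))^p for two uniform
    two-point spaces with interpoint distances a and b."""
    half = Fraction(1, 2)
    uX = [[0, a], [a, 0]]
    uY = [[0, b], [b, 0]]
    # General coupling of two uniform two-point measures, parametrized by c.
    pi = [[c, half - c], [half - c, c]]
    return sum(pi[x][y] * pi[xp][yp] * delta_inf(uX[x][xp], uY[y][yp]) ** p
               for x in range(2) for y in range(2)
               for xp in range(2) for yp in range(2))


def min_cost_on_grid(a, b, p, steps=20):
    """Minimum of coupling_cost over c = k / (2 * steps), k = 0, ..., steps."""
    return min(coupling_cost(a, b, Fraction(k, 2 * steps), p) for k in range(steps + 1))


a, b = Fraction(1), Fraction(3, 2)
for p in (1, 2, 3):
    # Compare the grid minimum with max(a, b)^p / 2 predicted by Claim 1.
    print(p, min_cost_on_grid(a, b, p), max(a, b) ** p / 2)
```

Taking the $p$-th root of the minimal cost recovers $2^{-1/p}\max(a,b)$; the grid is only a check, since the minimizers are already known to be the endpoints.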

4.3. Geodesic property.

A geodesic in a metric space $(X,d_X)$ is a continuous function $\gamma:[0,1]\to X$ such that for each $s,t\in[0,1]$, $d_X(\gamma(s),\gamma(t))=|s-t|\cdot d_X(\gamma(0),\gamma(1))$. We say a metric space is geodesic if for any two distinct points $x,x'\in X$, there exists a geodesic $\gamma:[0,1]\to X$ such that $\gamma(0)=x$ and $\gamma(1)=x'$.

For any $p\in[1,\infty)$, the notion of $p$-geodesic is introduced in Mémoli et al. (2019): a $p$-geodesic in a metric space $(X,d_X)$ is a continuous function $\gamma:[0,1]\to X$ such that for each $s,t\in[0,1]$, $d_X(\gamma(s),\gamma(t))=|s-t|^{1/p}\cdot d_X(\gamma(0),\gamma(1))$. Similarly, we say a metric space is $p$-geodesic if for any two distinct points $x,x'\in X$, there exists a $p$-geodesic $\gamma:[0,1]\to X$ such that $\gamma(0)=x$ and $\gamma(1)=x'$. Note that a 1-geodesic is a usual geodesic and a 1-geodesic space is a usual geodesic space.

Lemma 4.5 (Mémoli et al. (2019, Proposition 7.10)). Given $p\in[1,\infty)$, if $X$ is a $p$-metric space, then $X$ is not $q$-geodesic for all $1\le q<p$.

Lemma 4.6 (Mémoli et al. (2019, Theorem 7.7)). Let $X$ be a geodesic metric space. Then, for any $p\ge1$, $S_{1/p}(X)$ is $p$-geodesic, where $S_\alpha$ denotes the snowflake transform for $\alpha>0$ (cf. Section 3.3).

Now, we start to establish ($p$-)geodesic properties of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ for $p\in[1,\infty)$.

Proposition 4.7.

The metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$ is geodesic.

The proof is based on the following property of the $\mathcal{W}_1$ space.

Lemma 4.8 (Bottou et al. (2018, Theorem 5.1)). Let $X$ be a compact metric space. Then, the space $\mathcal{W}_1(X):=(\mathcal{P}(X),d^X_{W,1})$ is a geodesic space.

Proof of Proposition 4.7. Let $\mathcal{X}$ and $\mathcal{Y}$ be two compact ultrametric measure spaces. By Corollary 3.7, there exist a compact ultrametric space $(Z,u_Z)$ and isometric embeddings $\phi:X\hookrightarrow Z$ and $\psi:Y\hookrightarrow Z$ such that $u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})=d^Z_{W,1}(\phi_\#\mu_X,\psi_\#\mu_Y)$. Then, the space $\mathcal{W}_1(Z)$ is geodesic (cf. Lemma 4.8). Therefore, there exists a Wasserstein geodesic $\tilde\gamma:[0,1]\to\mathcal{W}_1(Z)$ connecting $\phi_\#\mu_X$ and $\psi_\#\mu_Y$. This induces a curve $\gamma:[0,1]\to\mathcal{U}^w$ where for each $t\in[0,1]$,
$$\gamma(t):=\big(\mathrm{supp}(\tilde\gamma(t)),\,u_Z|_{\mathrm{supp}(\tilde\gamma(t))\times\mathrm{supp}(\tilde\gamma(t))},\,\tilde\gamma(t)\big).$$
Note that $\gamma(0)\cong_w\mathcal{X}$ and $\gamma(1)\cong_w\mathcal{Y}$, and hence we simply replace $\gamma(0)$ and $\gamma(1)$ with $\mathcal{X}$ and $\mathcal{Y}$, respectively. Now, for each $s,t\in[0,1]$, we have that
$$u^{\mathrm{sturm}}_{\mathrm{GW},1}(\gamma(s),\gamma(t))\le d^Z_{W,1}(\tilde\gamma(s),\tilde\gamma(t))=|s-t|\,d^Z_{W,1}(\tilde\gamma(0),\tilde\gamma(1))=|s-t|\,u^{\mathrm{sturm}}_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y}).$$
Combined with the triangle inequality along $0\le s\le t\le1$, this inequality is in fact an equality. Therefore, $\gamma$ is a geodesic connecting $\mathcal{X}$ and $\mathcal{Y}$, and thus $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$ is geodesic. ∎

Theorem 4.9.

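For each $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is $p$-geodesic.

Proof. By Corollary 3.29, $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})$. This implies that $S_{1/p}(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},1})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$. Then, by Proposition 4.7 and Lemma 4.6, we have that $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is $p$-geodesic. ∎

Remark 4.10.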

By Lemma 4.5, $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is not geodesic for all $p>1$.

Remark 4.11.

Though the geodesic properties of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{\mathrm{GW},p})$, $1\le p\le\infty$, are clear, we remark that the geodesic properties of $(\mathcal{U}^w,u_{\mathrm{GW},p})$, $1\le p<\infty$, still remain unknown to us.

5. Lower bounds of $u_{\mathrm{GW},p}$

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two ultrametric measure spaces. The metrics $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$ respect the ultrametric structure of the spaces $\mathcal{X}$ and $\mathcal{Y}$. Thus, one would hope that comparing ultrametric measure spaces with $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ or $u_{\mathrm{GW},p}$ is more meaningful than doing it with the usual Gromov-Wasserstein distance or Sturm's distance. Unfortunately, for $p<\infty$, the computation of both $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$ is complicated, and for $p=\infty$ both metrics are extremely sensitive to differences in the diameters of the considered spaces (see Corollary 3.25). Thus, it is not feasible to use these metrics in many applications. However, we can derive meaningful lower bounds for $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $u_{\mathrm{GW},p}$ that resemble those of the Gromov-Wasserstein distance. Naturally, the question arises whether these lower bounds are better (sharper) than the ones of the usual Gromov-Wasserstein distance in this setting. This question is addressed throughout this section and will be revisited in Section 7 as well as Section 8.

In Mémoli (2011), the author introduced three lower bounds for $d_{\mathrm{GW},p}$ that are computationally less expensive than the calculation of $d_{\mathrm{GW},p}$. We briefly review these three lower bounds and then define the corresponding lower bounds for $u_{\mathrm{GW},p}$. In the following, we always assume $p\in[1,\infty]$.

First lower bound.

Let $s_{X,p}:X\to\mathbb{R}_{\ge0}$, $x\mapsto\|u_X(x,\cdot)\|_{L^p(\mu_X)}$. Then, the first lower bound $\mathrm{FLB}_p(\mathcal{X},\mathcal{Y})$ for $d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ is defined as follows:
$$\mathrm{FLB}_p(\mathcal{X},\mathcal{Y}):=\frac12\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Delta_1(s_{X,p}(\cdot),s_{Y,p}(\cdot))\big\|_{L^p(\mu)},$$
where $\Delta_1(a,b):=|a-b|$. Following our intuition of replacing $\Delta_1$ by $\Delta_\infty$, we define the ultrametric version of FLB as
$$\mathrm{FLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Delta_\infty(s_{X,p}(\cdot),s_{Y,p}(\cdot))\big\|_{L^p(\mu)}.$$

Second lower bound.

The second lower bound $\mathrm{SLB}_p(\mathcal{X},\mathcal{Y})$ for $d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ is given as
$$\mathrm{SLB}_p(\mathcal{X},\mathcal{Y}):=\frac12\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\big\|\Delta_1(u_X,u_Y)\big\|_{L^p(\gamma)}.$$
Thus, we define the ultrametric second lower bound between two ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ as follows:
$$\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}):=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\big\|\Delta_\infty(u_X,u_Y)\big\|_{L^p(\gamma)}.$$
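To make $\mathrm{SLB}^{\mathrm{ult}}_p$ concrete for finite spaces, the Python sketch below anticipates the reduction of Proposition 5.4 further down: it pushes $\mu_X\otimes\mu_X$ and $\mu_Y\otimes\mu_Y$ forward to distributions of distance values and evaluates a $p$-Wasserstein distance on $(\mathbb{R}_{\ge0},\Delta_\infty)$, with $\Delta_\infty(a,b)=\max(a,b)$ for $a\ne b$ and $0$ otherwise. We do not reproduce the formula of Theorem 2.14; instead we assume the tree-metric formula $W_p^p(\alpha,\beta)=\frac12\sum_B\big(\mathrm{diam}(B^*)^p-\mathrm{diam}(B)^p\big)|\alpha(B)-\beta(B)|$ over non-root blocks $B$ with parent $B^*$, which follows from the tree-Wasserstein formula for $p=1$ combined with the snowflake identity of Lemma 3.26. On a finite set of values $s_0<\dots<s_K$, the blocks of $\Delta_\infty$ are the singletons and the initial segments $\{s_0,\dots,s_k\}$. All function names are hypothetical, and only finite $p$ is handled.

```python
from collections import defaultdict
from fractions import Fraction


def distance_distribution(D, w):
    """(u_X)_#(mu_X ⊗ mu_X) as a dict {distance value: mass}."""
    dist = defaultdict(Fraction)
    for i, wi in enumerate(w):
        for j, wj in enumerate(w):
            dist[D[i][j]] += wi * wj
    return dict(dist)


def wp_power_delta_inf(alpha, beta, p):
    """W_p^p on (R_{>=0}, Delta_inf) between finitely supported alpha and beta,
    via 1/2 * sum_B (diam(B*)^p - diam(B)^p) * |alpha(B) - beta(B)|."""
    s = sorted(set(alpha) | set(beta))
    if len(s) == 1:
        return Fraction(0)
    a = [alpha.get(v, 0) for v in s]
    b = [beta.get(v, 0) for v in s]
    total = s[1] ** p * abs(a[0] - b[0])      # block {s_0}, parent {s_0, s_1}
    for k in range(1, len(s)):                 # blocks {s_k}, parent {s_0, ..., s_k}
        total += s[k] ** p * abs(a[k] - b[k])
    ca, cb = a[0], b[0]
    for k in range(1, len(s) - 1):             # blocks {s_0, ..., s_k}, parent adds s_{k+1}
        ca, cb = ca + a[k], cb + b[k]
        total += (s[k + 1] ** p - s[k] ** p) * abs(ca - cb)
    return Fraction(total) / 2


def slb_ult(DX, wX, DY, wY, p=1):
    """SLB^ult_p through the distance-distribution reduction (finite p only)."""
    wpp = wp_power_delta_inf(distance_distribution(DX, wX),
                             distance_distribution(DY, wY), p)
    return wpp if p == 1 else float(wpp) ** (1.0 / p)
```

For two uniform two-point spaces with diameters 1 and 2 the sketch returns $1$ for $p=1$; a direct computation over all couplings shows that $u_{\mathrm{GW},1}$ also equals $1$ for this pair, so the bound is attained there.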

Third lower bound.

Before we introduce the final lower bound, we have to define several functions. First, let $\Gamma^1_{X,Y}:X\times Y\times X\times Y\to\mathbb{R}_{\ge0}$, $(x,y,x',y')\mapsto\Delta_1(u_X(x,x'),u_Y(y,y'))$, and let $\Omega^1_p:X\times Y\to\mathbb{R}_{\ge0}$, $p\in[1,\infty]$, be given by
$$\Omega^1_p(x,y):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Gamma^1_{X,Y}(x,y,\cdot,\cdot)\big\|_{L^p(\mu)}.$$
Then, the third lower bound $\mathrm{TLB}_p$ is given as
$$\mathrm{TLB}_p(\mathcal{X},\mathcal{Y}):=\frac12\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Omega^1_p(\cdot,\cdot)\big\|_{L^p(\mu)}.$$
Analogously to the previous ultrametric versions, we define $\Gamma^\infty_{X,Y}:X\times Y\times X\times Y\to\mathbb{R}_{\ge0}$, $(x,y,x',y')\mapsto\Delta_\infty(u_X(x,x'),u_Y(y,y'))$. Further, for $p\in[1,\infty]$, let $\Omega^\infty_p:X\times Y\to\mathbb{R}_{\ge0}$ be given by
$$\Omega^\infty_p(x,y):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Gamma^\infty_{X,Y}(x,y,\cdot,\cdot)\big\|_{L^p(\mu)}.$$
Then, the ultrametric third lower bound between two ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ is defined as
$$\mathrm{TLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Omega^\infty_p(\cdot,\cdot)\big\|_{L^p(\mu)}.$$

Properties and computation of the lower bounds.

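Next, we examine $\mathrm{FLB}^{\mathrm{ult}}$, $\mathrm{SLB}^{\mathrm{ult}}$ and $\mathrm{TLB}^{\mathrm{ult}}$ more closely. Since $\Delta_\infty(a,b)\ge\Delta_1(a,b)=|a-b|$ for any $a,b\ge0$, it is easy to conclude that $\mathrm{FLB}^{\mathrm{ult}}_p\ge\mathrm{FLB}_p$, $\mathrm{SLB}^{\mathrm{ult}}_p\ge\mathrm{SLB}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p\ge\mathrm{TLB}_p$. Moreover, the three ultrametric lower bounds satisfy the following theorem.

Theorem 5.1.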

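Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and let $p\in[1,\infty]$.
(1) $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\ge\mathrm{FLB}^{\mathrm{ult}}_\infty(\mathcal{X},\mathcal{Y})$.
(2) $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\ge\mathrm{TLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$.
(3) $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\ge\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$.

Proof. The proofs of $d_{\mathrm{GW},p}\ge\mathrm{SLB}_p$ and $d_{\mathrm{GW},p}\ge\mathrm{TLB}_p$ in Mémoli (2011) apply here without any change for proving items (3) and (2), respectively.

As for item (1), we first observe that for any point $x$ in a compact ultrametric space $X$, there exists a point $x'\in X$ such that $u_X(x,x')=\mathrm{diam}(X)$. Then, as long as $\mu_X$ is fully supported on $X$, as assumed throughout the paper, we have that $s_{X,\infty}\equiv\mathrm{diam}(X)$ is a constant function. Therefore,
$$\Delta_\infty(s_{X,\infty}(x),s_{Y,\infty}(y))\equiv\Delta_\infty(\mathrm{diam}(X),\mathrm{diam}(Y)),\quad\forall x\in X,\ y\in Y.$$
This implies that $\mathrm{FLB}^{\mathrm{ult}}_\infty(\mathcal{X},\mathcal{Y})=\Delta_\infty(\mathrm{diam}(X),\mathrm{diam}(Y))$. By Remark 2.10 and Corollary 3.24, we have that
$$u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})\ge u_{\mathrm{GH}}(X,Y)\ge\Delta_\infty(\mathrm{diam}(X),\mathrm{diam}(Y))=\mathrm{FLB}^{\mathrm{ult}}_\infty(\mathcal{X},\mathcal{Y}).$$ ∎

Remark 5.2.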

Interestingly, it turns out that $\mathrm{FLB}^{\mathrm{ult}}_p$ is not a lower bound of $u_{\mathrm{GW},p}$ in general when $p<\infty$. For example, let $X=\{x_1,x_2,\dots,x_n\}$ and $Y=\{y_1,\dots,y_n\}$, and define $u_X$ such that $u_X(x_1,x_2)=\frac12$ and $u_X(x_i,x_j)=\delta_{i\ne j}$ for $(i,j)\ne(1,2),(2,1)$, $i,j=1,\dots,n$. Let $u_Y(y_i,y_j)=\delta_{i\ne j}$, $i,j=1,\dots,n$, and let $\mu_X$ and $\mu_Y$ be the uniform measures on $X$ and $Y$, respectively. Then, $u_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})\le\frac2{n^2}$ whereas $\mathrm{FLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})=\frac2n-\frac2{n^2}$, which is greater than $u_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})$ as long as $n>2$. Moreover, in this case we have $\mathrm{FLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})=O\left(\frac1n\right)$ whereas $u_{\mathrm{GW},1}(\mathcal{X},\mathcal{Y})=O\left(\frac1{n^2}\right)$. Hence, there exists no constant $C>0$ such that $\mathrm{FLB}^{\mathrm{ult}}_1\le C\cdot u_{\mathrm{GW},1}$ in general.

From the structure of $\mathrm{SLB}^{\mathrm{ult}}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p$ it is obvious that their computation leads to different optimal transport problems (see e.g. Villani (2003)). However, we can rewrite $\mathrm{SLB}^{\mathrm{ult}}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p$ in order to further simplify their computation. To this end, we first introduce the following auxiliary result, which is Lemma 28 in Chowdhury and Mémoli (2019).

Lemma 5.3.

Let $X,Y$ be two Polish metric spaces and let $f:X\to\mathbb{R}$ and $g:Y\to\mathbb{R}$ be measurable maps. Denote by $f\times g:X\times Y\to\mathbb{R}\times\mathbb{R}$ the map $(x,y)\mapsto(f(x),g(y))$. Then,
$$(f\times g)_\#\,\mathcal{C}(\mu_X,\mu_Y)=\mathcal{C}(f_\#\mu_X,g_\#\mu_Y)$$
for any $\mu_X\in\mathcal{P}(X)$ and $\mu_Y\in\mathcal{P}(Y)$.

Based on Lemma 5.3, the optimization problem for $\mathrm{SLB}^{ult}_p$ can be transformed into the computation of a Wasserstein distance on $(\mathbb{R}_{\geq0},\Lambda_\infty)$, and $\Omega_p$ in the definition of $\mathrm{TLB}^{ult}_p$ can be calculated efficiently.

Proposition 5.4.

Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and let $p\in[1,\infty]$. Then,
(1) $\mathrm{SLB}^{ult}_p(\mathcal{X},\mathcal{Y})=d^{(\mathbb{R}_{\geq0},\Lambda_\infty)}_{W,p}\big((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y)\big)$;
(2) for each $(x,y)\in X\times Y$, $\Omega_p(x,y)=d^{(\mathbb{R}_{\geq0},\Lambda_\infty)}_{W,p}\big(u_X(x,\cdot)_\#\mu_X,u_Y(y,\cdot)_\#\mu_Y\big)$.
Proof. We only prove (1) in the case $p\in[1,\infty)$. The case $p=\infty$ and item (2) can be proven in a similar manner.
By directly using the change-of-variables formula, we have
$$\mathrm{SLB}^{ult}_p(\mathcal{X},\mathcal{Y})^p=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\iint_{X\times X\times Y\times Y}\big(\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\big)^p\,\gamma\big(d(x,x')\times d(y,y')\big)$$
$$=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\iint_{\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}}\big(\Lambda_\infty(s,t)\big)^p\,(u_X\times u_Y)_\#\gamma(ds\times dt),$$
where $u_X\times u_Y:X\times X\times Y\times Y\to\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}$ maps $(x,x',y,y')$ to $(u_X(x,x'),u_Y(y,y'))$. By Lemma 5.3, we have that
$$(u_X\times u_Y)_\#\,\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)=\mathcal{C}\big((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y)\big).$$

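Therefore,
$$\mathrm{SLB}^{ult}_p(\mathcal{X},\mathcal{Y})^p=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\iint_{\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}}\big(\Lambda_\infty(s,t)\big)^p\,(u_X\times u_Y)_\#\gamma(ds\times dt)$$
$$=\inf_{\tilde\gamma\in\mathcal{C}((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y))}\iint_{\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}}\big(\Lambda_\infty(s,t)\big)^p\,\tilde\gamma(ds\times dt)$$
$$=\Big(d^{(\mathbb{R}_{\geq0},\Lambda_\infty)}_{W,p}\big((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y)\big)\Big)^p.\qquad\square$$

Remark 5.5.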

Since Theorem 2.14 provides an explicit formula for the Wasserstein distance on $(\mathbb{R}_{\geq0},\Lambda_\infty)$ between finitely supported probability measures, these alternative representations of the lower bound $\mathrm{SLB}^{ult}_p$ and of the cost functional $\Omega_p$ drastically reduce the computation time of $\mathrm{SLB}^{ult}_p$ and $\mathrm{TLB}^{ult}_p$, respectively. In particular, this allows us to compute $\mathrm{SLB}^{ult}_p$, $1\leq p\leq\infty$, between finite ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ with $|X|=m$ and $|Y|=n$ in $O((m\vee n)^2)$ steps.
Moreover, Proposition 5.4 allows us to directly compare the two lower bounds $\mathrm{SLB}^{ult}_1$ and $\mathrm{SLB}_1$.

Corollary 5.6.

For any finite ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$, we have that
$$\mathrm{SLB}^{ult}_1(\mathcal{X},\mathcal{Y})=\mathrm{SLB}_1(\mathcal{X},\mathcal{Y})+\int_{\mathbb{R}}t\,\big|(u_X)_\#(\mu_X\otimes\mu_X)-(u_Y)_\#(\mu_Y\otimes\mu_Y)\big|(dt).\qquad(26)$$
Proof. The claim follows directly from Proposition 5.4 and Remark 2.15. $\square$
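Proposition 5.4 (1) reduces $\mathrm{SLB}^{ult}_1$ to a 1-Wasserstein distance on $(\mathbb{R}_{\geq0},\Lambda_\infty)$ between the distance distributions $(u_X)_\#(\mu_X\otimes\mu_X)$ and $(u_Y)_\#(\mu_Y\otimes\mu_Y)$. The Python sketch below illustrates this reduction. It is a minimal illustration under stated assumptions, not the authors' implementation. The closed form in `w1_lambda_inf`, $W_1=\tfrac12\big(\int_0^\infty|F_\alpha-F_\beta|\,dt+\sum_s s\,|\alpha\{s\}-\beta\{s\}|\big)$, is our own specialization of the ball decomposition of $W_1$ on a finite ultrametric space, and we do not claim it is the literal statement of Theorem 2.14. The helper `flb_ult_1` additionally assumes that $\mathrm{FLB}^{ult}_1$ admits the analogous push-forward representation via the eccentricity $s_{X,1}(x)=\sum_{x'}u_X(x,x')\mu_X(x')$, which follows from Lemma 5.3 in the same way. All function names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction


def w1_lambda_inf(alpha, beta):
    """W_1 on (R_{>=0}, Lambda_inf) between finitely supported probability
    measures given as dicts {atom: mass}, computed as
    (int_0^inf |F_alpha - F_beta| dt + sum_s s * |alpha{s} - beta{s}|) / 2."""
    atoms = sorted(set(alpha) | set(beta))
    gap = Fraction(0)        # current value of F_alpha - F_beta
    integral = Fraction(0)   # int |F_alpha - F_beta| dt (zero outside the atoms)
    atom_term = Fraction(0)  # sum over atoms s of s * |alpha{s} - beta{s}|
    for idx, s in enumerate(atoms):
        diff = Fraction(alpha.get(s, 0)) - Fraction(beta.get(s, 0))
        atom_term += s * abs(diff)
        gap += diff
        if idx + 1 < len(atoms):
            integral += abs(gap) * (atoms[idx + 1] - s)
    return (integral + atom_term) / 2


def pushforward_pairs(u, mu):
    """(u)_#(mu x mu): law of u(x, x') for x, x' drawn independently from mu."""
    out = defaultdict(Fraction)
    for i, wi in enumerate(mu):
        for j, wj in enumerate(mu):
            out[u[i][j]] += Fraction(wi) * Fraction(wj)
    return dict(out)


def pushforward_ecc(u, mu):
    """(s_1)_# mu with eccentricity s_1(x) = sum_x' u(x, x') mu(x')."""
    out = defaultdict(Fraction)
    for i, wi in enumerate(mu):
        s = sum(Fraction(u[i][j]) * Fraction(w) for j, w in enumerate(mu))
        out[s] += Fraction(wi)
    return dict(out)


def slb_ult_1(uX, muX, uY, muY):
    """SLB^ult_1 via the push-forward representation of Proposition 5.4 (1)."""
    return w1_lambda_inf(pushforward_pairs(uX, muX), pushforward_pairs(uY, muY))


def flb_ult_1(uX, muX, uY, muY):
    """FLB^ult_1, assuming the analogous push-forward representation."""
    return w1_lambda_inf(pushforward_ecc(uX, muX), pushforward_ecc(uY, muY))


# Two-point spaces with distances 1 and 3/2 and weights (1/3, 2/3).
mu = [Fraction(1, 3), Fraction(2, 3)]
print(slb_ult_1([[0, 1], [1, 0]], mu,
                [[0, Fraction(3, 2)], [Fraction(3, 2), 0]], mu))
```

For these two-point spaces the sketch returns $2/3=2\cdot\frac13\cdot\frac23\cdot\frac32$. Applied to the spaces of Remark 5.2 with $n=5$ and $a=\frac12$, `flb_ult_1` returns $8/25=\frac{2}{n}-\frac{2}{n^2}$.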

This corollary implies that $\mathrm{SLB}^{ult}_1$ is more rigid than $\mathrm{SLB}_1$, since the second summand on the right-hand side of Equation (26) is sensitive to distance perturbations. The following example illustrates this.

Example 5.7.

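Recall the notation from Example 3.9. For any $d,d'>0$, we let $X:=\Delta_2(d)$ and $Y:=\Delta_2(d')$. Assume that $X$ and $Y$ have underlying sets $\{x_1,x_2\}$ and $\{y_1,y_2\}$, respectively. Define $\mu_X\in\mathcal{P}(X)$ and $\mu_Y\in\mathcal{P}(Y)$ as follows. Let $\alpha_1,\alpha_2\geq0$ with $\alpha_1+\alpha_2=1$, and let $\mu_X(x_1)=\mu_Y(y_1):=\alpha_1$ and $\mu_X(x_2)=\mu_Y(y_2):=\alpha_2$. Then, it is easy to verify that $\mathrm{SLB}^{ult}_1(\mathcal{X},\mathcal{Y})=2\alpha_1\alpha_2\Lambda_\infty(d,d')$. In particular, for $d\neq d'$ this value equals $2\alpha_1\alpha_2\max(d,d')$ and does not tend to $0$ as $d'\to d$. This example illustrates that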

$\mathrm{SLB}^{ult}_1$ (and hence $u_{GW,1}$) is rigid with respect to distance perturbations.

6. $u_{GW,p}$ on ultra-dissimilarity spaces
Ultra-dissimilarity spaces are a natural generalization of ultrametric spaces. They arise when working with symmetric ultranetworks (see Smith et al. (2016)) or with phylogenetic tree data (see Semple et al. (2003)). In this section, we introduce these spaces and briefly illustrate to what extent the results for $u_{GW,p}$ can be adapted to ultra-dissimilarity measure spaces. We start by formally introducing ultra-dissimilarity spaces.

Definition 6.1 (Ultra-dissimilarity spaces). An ultra-dissimilarity space is a pair $(X,u_X)$ consisting of a set $X$ and a function $u_X:X\times X\to\mathbb{R}_{\geq0}$ satisfying the following conditions for any $x,y,z\in X$:
(1) $u_X(x,y)=u_X(y,x)$;
(2) $u_X(x,y)\leq\max(u_X(x,z),u_X(z,y))$;
(3) $\max(u_X(x,x),u_X(y,y))\leq u_X(x,y)$, and equality holds if and only if $x=y$.

Remark 6.2.

Note that when $(X,u_X)$ is an ultrametric space, condition (3) is trivially satisfied.
In the following, we restrict ourselves to finite ultra-dissimilarity spaces to avoid technical issues in topology. For a finite space $X$, the function $u_X$ is obviously continuous with respect to the discrete topology, and thus the following counterpart to Lemma B.5 naturally holds (see Chowdhury (2019); Chowdhury and Mémoli (2019) for a more complete treatment of infinite spaces).

Lemma 6.3.

Let $X,Y$ be finite ultra-dissimilarity spaces. Then $\Lambda_\infty(u_X,u_Y):X\times Y\times X\times Y\to\mathbb{R}_{\geq0}$ is continuous with respect to the discrete topology.

One important aspect of ultra-dissimilarity spaces is their connection with the so-called treegrams (Smith et al., 2016; Mémoli et al., 2019), which are a generalization of dendrograms. For a finite set $X$, let $\mathrm{SubPart}(X)$ denote the collection of all subpartitions of $X$: any partition $P'$ of a non-empty subset $X'\subseteq X$ is called a subpartition of $X$. Given two subpartitions $P_1,P_2$, we say that $P_1$ is coarser than $P_2$ if each block in $P_2$ is contained in some block in $P_1$.

Definition 6.4 (Treegrams). A treegram $T_X:[0,\infty)\to\mathrm{SubPart}(X)$ is a map parametrizing a nested family of subpartitions over the same set $X$ and satisfying the following conditions:
(1) for any $0\leq s<t<\infty$, $T_X(t)$ is coarser than $T_X(s)$;
(2) there exists $t_X>0$ such that $T_X(t)=\{X\}$ for all $t\geq t_X$;
(3) for each $t\geq0$, there exists $\varepsilon>0$ such that $T_X(t)=T_X(t')$ for all $t'\in[t,t+\varepsilon]$;
(4) for each $x\in X$, there exists $t_x\geq0$ such that $\{x\}$ is a block in $T_X(t_x)$.
Similar to Lemma 2.2, which relates ultrametrics to dendrograms, there is an equivalence between ultra-dissimilarity functions and treegrams on a finite set (see Figure 5 for an illustration).

Figure 5.

Treegrams:

Relation between ultra-dissimilarity functions and treegrams.
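The correspondence illustrated in Figure 5 can be sketched in a few lines of Python. This is a minimal sketch under an explicitly stated assumption: we take the treegram of $u_X$ to be the map sending $t$ to the partition of the points already born at time $t$ (that is, $u_X(x,x)\leq t$) into the classes of the relation $u_X(x,x')\leq t$, by analogy with the dendrogram of an ultrametric. We do not reproduce the explicit form of the bijection $\Delta_X$ from the cited works. The example dissimilarity and all function names are hypothetical.

```python
def is_ultra_dissimilarity(points, u):
    """Check conditions (1)-(3) of Definition 6.1 on a finite set."""
    for x in points:
        for y in points:
            if u[x][y] != u[y][x]:                          # (1) symmetry
                return False
            if x != y and max(u[x][x], u[y][y]) >= u[x][y]:
                return False                                # (3) strict if x != y
            if any(u[x][y] > max(u[x][z], u[z][y]) for z in points):
                return False                                # (2) strong triangle
    return True


def treegram_at(points, u, t):
    """Blocks at time t: points with u(x, x) <= t, grouped by u(x, x') <= t."""
    blocks = []
    for x in points:
        if u[x][x] > t:          # x is not yet born at time t
            continue
        for block in blocks:
            # by condition (2), comparing with one representative suffices
            if u[x][block[0]] <= t:
                block.append(x)
                break
        else:
            blocks.append([x])
    return {frozenset(b) for b in blocks}


# Hypothetical example: a and c are born at 0, b at 1; a and b merge at 2.
pts = ["a", "b", "c"]
u = {"a": {"a": 0, "b": 2, "c": 3},
     "b": {"a": 2, "b": 1, "c": 3},
     "c": {"a": 3, "b": 3, "c": 0}}
assert is_ultra_dissimilarity(pts, u)
for t in (0, 1, 2, 3):
    print(t, sorted(sorted(b) for b in treegram_at(pts, u, t)))
```

In this example, $a$ and $c$ appear as singletons at time $0$ and $b$ appears at time $1$. The points $a$ and $b$ merge at time $2$, and all points lie in one block from time $3$ on.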

Proposition 6.5.

Given a finite set $X$, denote by $\mathcal{U}^{dis}(X)$ the collection of all ultra-dissimilarity functions on $X$ and by $\mathcal{T}(X)$ the collection of all treegrams over $X$. Then, there exists a bijection $\Delta_X:\mathcal{U}^{dis}(X)\to\mathcal{T}(X)$.

An ultra-dissimilarity measure space is a triple $\mathcal{X}=(X,u_X,\mu_X)$, where $(X,u_X)$ is an ultra-dissimilarity space and $\mu_X$ is a probability measure fully supported on $X$. In the following, we denote by $\mathcal{U}^w_{dis}$ the collection of all finite ultra-dissimilarity measure spaces.
For any finite ultra-dissimilarity measure spaces $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ and any coupling $\mu\in\mathcal{C}(\mu_X,\mu_Y)$, we define the $p$-ultra-distortion of $\mu$ for $1\leq p<\infty$ as
$$\mathrm{dis}^{ult}_p(\mu):=\left(\iint_{X\times Y\times X\times Y}\big(\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\right)^{1/p},$$
and for $p=\infty$ as
$$\mathrm{dis}^{ult}_\infty(\mu):=\sup_{\substack{x,x'\in X,\ y,y'\in Y\\ \text{s.t. }(x,y),(x',y')\in\mathrm{supp}(\mu)}}\Lambda_\infty(u_X(x,x'),u_Y(y,y')),$$
where $\mathrm{supp}(\mu)$ denotes the support of $\mu$. Based on this, $u_{GW,p}$ extends naturally to $\mathcal{U}^w_{dis}$:
$$u_{GW,p}(\mathcal{X},\mathcal{Y}):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\mathrm{dis}^{ult}_p(\mu).\qquad(27)$$
Furthermore, one can adapt the notion of a $p$-distortion (see (6)) to ultra-dissimilarity measure spaces in the same manner, and hence $d_{GW,p}$ also generalizes to $\mathcal{U}^w_{dis}$.
Just as for metric spaces or metric measure spaces, it is important to have a notion of isomorphism between ultra-dissimilarity spaces.

Definition 6.6 (Isomorphism). Given $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{dis}$, we say that they are isomorphic, denoted $\mathcal{X}\cong_w\mathcal{Y}$, if there is a bijection $f:X\to Y$ such that
(1) $f_\#\mu_X=\mu_Y$;
(2) for any $x,x'\in X$, $u_Y(f(x),f(x'))=u_X(x,x')$.
Next, we show that $u_{GW,p}$, $1\leq p\leq\infty$, is a metric on the isomorphism classes of $\mathcal{U}^w_{dis}$. The first step is to verify the existence of an optimal coupling in (27).

Proposition 6.7.

Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{dis}$. Then, for any $p\in[1,\infty]$, there always exists an optimal coupling $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ such that $u_{GW,p}(\mathcal{X},\mathcal{Y})=\mathrm{dis}^{ult}_p(\mu)$.
Proof. The proof is essentially the same as the one for Proposition 3.20; we only replace Lemma B.5 with Lemma 6.3. The details are left to the reader. $\square$
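For a given coupling of two finite ultra-dissimilarity measure spaces, the $p$-ultra-distortion that enters (27) can be evaluated directly. The Python sketch below is illustrative only: the names are hypothetical, and couplings are stored as $m\times n$ matrices of masses. It shows that, unlike for metric spaces, the self-dissimilarities $u_X(x,x)$ contribute through the pairs with $x=x'$. Swapping the two points of a space whose points are born at different times gives a positive distortion, even though the off-diagonal values agree.

```python
import math
from fractions import Fraction


def lambda_inf(a, b):
    """Lambda_inf(a, b) = max(a, b) if a != b, else 0."""
    return 0 if a == b else max(a, b)


def dis_ult(uX, uY, coupling, p):
    """p-ultra-distortion of a coupling (matrix of masses); p may be math.inf.
    Diagonal entries u(x, x) enter through the pairs with x = x', y = y'."""
    support = [(i, j) for i, row in enumerate(coupling)
               for j, w in enumerate(row) if w > 0]
    if math.isinf(p):
        return max(lambda_inf(uX[i][k], uY[j][l])
                   for i, j in support for k, l in support)
    total = sum(coupling[i][j] * coupling[k][l]
                * lambda_inf(uX[i][k], uY[j][l]) ** p
                for i, j in support for k, l in support)
    return float(total) ** (1.0 / p)


half = Fraction(1, 2)
uX = [[0, 2], [2, 1]]          # x1 born at 0, x2 born at 1, merged at 2
identity = [[half, 0], [0, half]]
swap = [[0, half], [half, 0]]
print(dis_ult(uX, uX, identity, 1), dis_ult(uX, uX, swap, 1),
      dis_ult(uX, uX, swap, math.inf))
```

Here the identity coupling has distortion $0$, whereas the swap has $\mathrm{dis}^{ult}_1=\frac12$ and $\mathrm{dis}^{ult}_\infty=1$.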

As a second step, we show that $u_{GW,p}$ is a pseudo-metric on $\mathcal{U}^w_{dis}$.
Theorem 6.8. $u_{GW,p}$ is a $p$-pseudo-metric on $\mathcal{U}^w_{dis}$.
Proof. The proof of Theorem 3.19 only uses the strong triangle inequality and Proposition 3.20. The same strategy applies here if Proposition 3.20 is replaced by Proposition 6.7. Again, we leave the details to the reader. $\square$
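For very small spaces, the isomorphism relation of Definition 6.6 can be checked by brute force over all bijections; by Theorem 6.9, this relation characterizes $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$. The following Python sketch does this with exact rational weights. The function name is hypothetical, and the factorial running time limits it to toy examples.

```python
from fractions import Fraction
from itertools import permutations


def isomorphic(uX, muX, uY, muY):
    """Brute-force check of Definition 6.6: search for a bijection f with
    f_# mu_X = mu_Y and u_Y(f(x), f(x')) = u_X(x, x') for all x, x'."""
    n = len(uX)
    if n != len(uY):
        return False
    for f in permutations(range(n)):
        if all(muY[f[i]] == muX[i] for i in range(n)) and all(
                uY[f[i]][f[k]] == uX[i][k] for i in range(n) for k in range(n)):
            return True
    return False


third = Fraction(1, 3)
uX, muX = [[0, 2], [2, 1]], [third, 2 * third]
uY, muY = [[1, 2], [2, 0]], [2 * third, third]
print(isomorphic(uX, muX, uY, muY), isomorphic(uX, muX, uY, muX))
```

The first call finds the bijection that swaps the two points. The second fails, because with the weights $(1/3,2/3)$ on $Y$ no bijection preserves both the measure and the self-dissimilarities.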

Finally, we prove that, after identifying isomorphic spaces, we indeed obtain a metric space.

Theorem 6.9.

Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{dis}$. Then $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$ if and only if $\mathcal{X}\cong_w\mathcal{Y}$.
Proof. If $\mathcal{X}\cong_w\mathcal{Y}$, then obviously $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$. Conversely, suppose that $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$. By Proposition 6.7, there exists $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ such that $u_{GW,p}(\mathcal{X},\mathcal{Y})=\mathrm{dis}^{ult}_p(\mu)=0$. We now define a map $\varphi:X\to Y$ as follows. For any $x\in X$, we have $\mu_X(x)>0$, since $\mu_X$ has full support and $X$ is finite. As a result, $\mu(x,y)>0$ for some $y\in Y$, and we let $\varphi(x):=y$. This map is well-defined. Indeed, suppose there are $x\in X$ and $y,y'\in Y$ such that $\mu(x,y),\mu(x,y')>0$. Then $\mathrm{dis}^{ult}_p(\mu)=0$ yields
$$\Lambda_\infty(u_X(x,x),u_Y(y,y))=\Lambda_\infty(u_X(x,x),u_Y(y,y'))=\Lambda_\infty(u_X(x,x),u_Y(y',y'))=0.$$
This implies that $u_Y(y,y)=u_Y(y,y')=u_Y(y',y')=u_X(x,x)$. Since $u_Y$ is an ultra-dissimilarity, we have $y=y'$ (cf. condition (3) in Definition 6.1). Essentially the same argument shows that $\varphi:X\to Y$ is injective. Consequently, for any $x\in X$, $\mu_X(x)=\mu(x,\varphi(x))\leq\mu_Y(\varphi(x))$. Since $1=\sum_{x\in X}\mu_X(x)\leq\sum_{x\in X}\mu_Y(\varphi(x))\leq1$, we have $\mu_X(x)=\mu_Y(\varphi(x))$ for all $x\in X$. Since $\mu_Y$ is fully supported, this implies that $\varphi$ is a bijective, measure-preserving map. Now, for any $x,x'\in X$, $\mathrm{dis}^{ult}_p(\mu)=0$ implies $\Lambda_\infty(u_X(x,x'),u_Y(\varphi(x),\varphi(x')))=0$, i.e., $u_X(x,x')=u_Y(\varphi(x),\varphi(x'))$. Therefore, $\varphi$ is an isometry and thus an isomorphism, so $\mathcal{X}\cong_w\mathcal{Y}$. $\square$

Lower bounds.

The lower bounds of $d_{GW,p}$ or $u_{GW,p}$ on $\mathcal{U}^w$ defined in Section 5 can be extended to $\mathcal{U}^w_{dis}$ in the same way as $d_{GW,p}$ and $u_{GW,p}$ themselves. For example, for $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{dis}$, we define the ultrametric second lower bound $\mathrm{SLB}^{ult}_p$ as follows:
$$\mathrm{SLB}^{ult}_p(\mathcal{X},\mathcal{Y}):=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\big\|\Lambda_\infty(u_X,u_Y)\big\|_{L^p(\gamma)}.$$
In the sequel, we mainly focus on $\mathrm{SLB}^{ult}_p$ and $\mathrm{TLB}^{ult}_p$. In particular, they are indeed lower bounds of $u_{GW,p}$ on $\mathcal{U}^w_{dis}$:

Theorem 6.10.

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two finite ultra-dissimilarity measure spaces and let $p\in[1,\infty]$.
(1) $u_{GW,p}(\mathcal{X},\mathcal{Y})\geq\mathrm{TLB}^{ult}_p(\mathcal{X},\mathcal{Y})$.
(2) $u_{GW,p}(\mathcal{X},\mathcal{Y})\geq\mathrm{SLB}^{ult}_p(\mathcal{X},\mathcal{Y})$.
Proof. The proofs follow directly from the proof of Chowdhury (2019, Theorem 24). $\square$

7. Computational aspects

In this section, we investigate algorithms for approximating or calculating $u_{GW,p}$, $1\leq p\leq\infty$. Furthermore, for $p<\infty$, we evaluate the performance of the lower bounds introduced in Section 5 and compare our findings with the results for the classical Gromov-Wasserstein distance $d_{GW,p}$ (see (7)). Matlab implementations of the presented algorithms and comparisons are available at https://github.com/ndag/uGW.

7.1. Algorithms.

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two finite ultrametric measure spaces with cardinalities $m$ and $n$, respectively. We have already noted in Remark 3.21 that calculating $u_{GW,p}(\mathcal{X},\mathcal{Y})$ for $p<\infty$ leads to a non-convex quadratic program, which is NP-hard in general (Pardalos and Vavasis, 1991). Solving it exactly is typically not feasible in practice. However, in many practical applications, good approximations are sufficient. Therefore, we propose to approximate $u_{GW,p}(\mathcal{X},\mathcal{Y})$ for $p<\infty$ via conditional gradient descent and refer to Section 7.1.1 for the details. On the other hand, $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$ can be calculated in polynomial time. The corresponding algorithm is based on Theorem 3.22 as well as on ideas from Mémoli et al. (2019) and is presented in Section 7.1.2.

7.1.1. The case $p<\infty$. As already mentioned, we propose to estimate the ultrametric Gromov-Wasserstein distance between $\mathcal{X}$ and $\mathcal{Y}$ for $p<\infty$ via conditional gradient descent. To this end, note that in the present setting the gradient $G$ arising from (12) is given by the following partial derivatives with respect to $\mu$:
$$G_{i,j}=\sum_{k=1}^{m}\sum_{l=1}^{n}\big(\Lambda_\infty(u_X(x_i,x_k),u_Y(y_j,y_l))\big)^p\mu_{kl},\quad\forall\,1\leq i\leq m,\ 1\leq j\leq n.\qquad(28)$$
Furthermore, since we deal with a non-convex minimization problem, the performance of the gradient descent strongly depends on the starting coupling $\mu^{(0)}$. Therefore, we follow the suggestion of Chowdhury and Needham (2020) and employ a Markov Chain Monte Carlo Hit-And-Run sampler to obtain multiple random starting couplings. Running gradient descent from each point in this ensemble greatly improves the approximation in many cases.
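The following Python sketch illustrates the conditional gradient (Frank-Wolfe) loop of Algorithm 1. It is a simplified, hypothetical version and not the Matlab code at https://github.com/ndag/uGW. It makes three assumptions. First, $m=n$ and both measures are uniform, so that the linear OT subproblem with ground loss $G$ from (28) has an optimal permutation coupling (Birkhoff-von Neumann); this coupling is found by brute force, which is only practical for tiny $n$. Second, it runs from a single starting coupling instead of a Hit-And-Run ensemble. Third, it uses the step size $2/(j+2)$.

```python
import itertools


def lambda_inf(a, b):
    """Lambda_inf(a, b) = max(a, b) if a != b, else 0 (as a float)."""
    return 0.0 if a == b else float(max(a, b))


def gradient(uX, uY, mu, p):
    """G[i][j] = sum_{k,l} Lambda_inf(uX[i][k], uY[j][l])**p * mu[k][l], cf. (28)."""
    m, n = len(uX), len(uY)
    return [[sum(lambda_inf(uX[i][k], uY[j][l]) ** p * mu[k][l]
                 for k in range(m) for l in range(n))
             for j in range(n)]
            for i in range(m)]


def dis_ult_p(uX, uY, mu, p):
    """dis^ult_p(mu) = (sum_{i,j} mu[i][j] * G[i][j]) ** (1/p)."""
    G = gradient(uX, uY, mu, p)
    total = sum(mu[i][j] * G[i][j] for i in range(len(uX)) for j in range(len(uY)))
    return total ** (1.0 / p)


def linear_ot_uniform(G):
    """Minimize sum_ij G[i][j] * pi[i][j] over couplings of two uniform measures
    on n points; some permutation coupling is optimal, found by brute force."""
    n = len(G)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: sum(G[i][perm[i]] for i in range(n)))
    return [[1.0 / n if best[i] == j else 0.0 for j in range(n)] for i in range(n)]


def conditional_gradient(uX, uY, mu0, p=1, L=10):
    """Inner loop of Algorithm 1 for a single starting coupling mu0."""
    mu = [row[:] for row in mu0]
    for j in range(1, L + 1):
        G = gradient(uX, uY, mu, p)        # gradient (28) at mu^(j-1)
        target = linear_ot_uniform(G)      # OT with ground loss G
        gamma = 2.0 / (j + 2)              # step size gamma^(j)
        mu = [[(1 - gamma) * a + gamma * b for a, b in zip(row, trow)]
              for row, trow in zip(mu, target)]
    return mu, dis_ult_p(uX, uY, mu, p)


# Hypothetical three-point ultrametric space compared with itself.
uX = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
product = [[1.0 / 9] * 3 for _ in range(3)]
mu, value = conditional_gradient(uX, uX, product, p=1, L=10)
print(dis_ult_p(uX, uX, product, 1), value)
```

Starting from the product coupling, the iterates move towards a zero-distortion permutation coupling of this space with itself. The returned value of $\mathrm{dis}^{ult}_1$ is therefore far below its value at the product coupling. For realistic sizes, the brute-force assignment would have to be replaced by a proper OT solver.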
For a precise description of the proposed procedure, we refer to Algorithm 1. To understand how $u_{GW,p}$ (or at least its approximation) is influenced by small changes in the structure of the considered ultrametric measure spaces, we consider as an example the ultrametric measure spaces $\mathcal{X}_i=(X_i,u_{X_i},\mu_{X_i})$, $1\leq i\leq4$, displayed in Figure 6. These ultrametric measure spaces differ only in one characteristic (e.g., one side length or the equipped measure). For $p=1$, the values of the comparisons $u_{GW,1}(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$ (approximated by Algorithm 1 with $L=N=40$), are shown in Table 1. Remarkably, a change in diameter seems to influence $u_{GW,1}$ the most.

Algorithm 1: $u_{GW,p}(\mathcal{X},\mathcal{Y},p,N,L)$
    /* Create a list of random couplings */
    couplings = CreateRandomCouplings(N);
    stat_points = cell(N);
    for i = 1:N do
        $\mu^{(0)}$ = couplings{i};
        for j = 1:L do
            $G$ = gradient from (28) w.r.t. $\mu^{(j-1)}$;
            $\tilde\mu^{(j)}$ = solve OT with ground loss $G$;
            $\gamma^{(j)}=\frac{2}{j+2}$;
            /* Alternatively, find $\gamma\in[0,1]$ that minimizes $\mathrm{dis}^{ult}_p\big(\mu^{(j-1)}+\gamma(\tilde\mu^{(j)}-\mu^{(j-1)})\big)$ */
            $\mu^{(j)}=(1-\gamma^{(j)})\,\mu^{(j-1)}+\gamma^{(j)}\,\tilde\mu^{(j)}$;
        end
        stat_points{i} = $\mu^{(L)}$;
    end
    Find $\mu^*$ in stat_points that minimizes $\mathrm{dis}^{ult}_p(\mu)$;
    result = $\mathrm{dis}^{ult}_p(\mu^*)$;

Figure 6.

Ultrametric measure spaces:

Four non-isomorphic ultrametric measure spaces, denoted (from left to right) as $\mathcal{X}_i=(X_i,u_{X_i},\mu_{X_i})$, $1\leq i\leq4$.

7.1.2. The case $p=\infty$. For $p=\infty$, it follows from Theorem 3.22 that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})=\inf\{t\geq0\,|\,\mathcal{X}_t\cong_w\mathcal{Y}_t\}.\qquad(29)$$
This identity allows us to construct a polynomial time algorithm for $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$ based on the ideas of Mémoli et al. (2019, Sec. 8.2.2). More precisely, let $\mathrm{spec}(X):=\{u_X(x,x')\,|\,x,x'\in X\}$ denote the spectrum of $X$. To find the infimum in (29), it suffices to check whether $\mathcal{X}_t\cong_w\mathcal{Y}_t$ for each $t\in\mathrm{spec}(X)\cup\mathrm{spec}(Y)$, from the largest to the smallest value. Then $u_{GW,\infty}$ is the smallest such $t$ for which $\mathcal{X}_t\cong_w\mathcal{Y}_t$. Each check can be done in polynomial time by considering $\mathcal{X}_t$ and $\mathcal{Y}_t$ as labeled, weighted trees (e.g., by using a slight modification of the algorithm in Example 3.2 of Aho and Hopcroft (1974)). This induces

Table 1.

Comparison of different ultrametric measure spaces I: The values of $u_{GW,1}(\mathcal{X}_i,\mathcal{X}_j)$ (approximated by Algorithm 1) and $u_{GW,\infty}(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$, where $\mathcal{X}_i$, $1\leq i\leq4$, denote the ultrametric measure spaces displayed in Figure 6.

the following simple algorithm to calculate $u_{GW,\infty}$.

Algorithm 2: $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$
    spec = sort(spec(X) ∪ spec(Y), 'descend');
    for i = 1:length(spec) do
        t = spec(i);
        if $\mathcal{X}_t\ncong_w\mathcal{Y}_t$ then
            return spec(i-1);
        end
    end
    return 0;

To compare $u_{GW,1}$ and $u_{GW,\infty}$, we reconsider the ultrametric measure spaces displayed in Figure 6 and repeat the comparisons from Section 7.1.1. The results, summarized in Table 1, show that $u_{GW,\infty}$ is extremely sensitive to the changes made. For almost all comparisons, it attains the maximal possible value. Only the comparison of the two spaces that differ solely in their small-scale structure yields a value smaller than the maximum of the diameters of the considered spaces.

7.2. The relationship between $u_{GW,p}$ and $\mathrm{SLB}^{ult}_p$. Due to its low computational complexity, the tightness of the lower bound

$\mathrm{SLB}^{ult}_p$ is of particular interest for practical applications. Hence, we study it empirically. To this end, we first consider the ultrametric measure spaces in Figure 6 and redo the comparisons from Section 7.1.1 using $\mathrm{SLB}^{ult}_1$ instead of $u_{GW,1}$. The results are reported in Table 2. Only for the comparisons of the spaces $\mathcal{X}_i$, $1\leq i\leq4$, with one particular space from Figure 6 are the values of $\mathrm{SLB}^{ult}_1$ clearly distinct from the corresponding values of $u_{GW,1}$. This suggests that changes in the metric influence $\mathrm{SLB}^{ult}_1$ in a similar fashion as $u_{GW,1}$, while changes in the measure have less impact on $\mathrm{SLB}^{ult}_1$. To further investigate the differences and similarities in the behavior of $u_{GW,1}$ and $\mathrm{SLB}^{ult}_1$, we consider two ways to generate random ultrametric measure spaces, namely independent sampling and subsampling.

Independent sampling:

To obtain independent random ultrametric spaces, one draws independent samples from a probability distribution. Each sample is turned into an ultrametric measure space by considering it as a fully connected graph, weighted by the distances between the points in the sample, and then defining a new (ultra-)metric $u$ between the points. More precisely, for $x$ and $x'$ in the random sample, we define $u(x,x')$ as the weight of the minimax path between $x$ and $x'$, that is, the largest edge weight on a path chosen to minimize this largest weight. Alternatively, a linkage algorithm can be employed to create a dendrogram, which then induces an ultrametric space. The obtained ultrametric spaces can then be equipped with a (random) measure.

Table 2. Comparison of different ultrametric measure spaces II: The values of $\mathrm{SLB}^{ult}_1(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$, where $\mathcal{X}_i$, $1\leq i\leq4$, denote the ultrametric measure spaces displayed in Figure 6.

Subsampling:

To ensure that the structure of the sampled spaces is preserved, one can create one large ultrametric space via independent sampling and then subsample a number of its points (e.g., uniformly). Once the corresponding probability weights are normalized, these points induce a random ultrametric measure space.
In the following, we first compare $u_{GW,1}$ and $\mathrm{SLB}^{ult}_1$ on independently sampled ultrametric measure spaces; afterwards, we redo the comparisons for a collection of subsampled ultrametric measure spaces. We begin by independently sampling ultrametric measure spaces. For each $k=1,\ldots,5$ and $n=10,20,30$, we draw three independent samples of size $n$ from an equally weighted mixture of $k$ uniform distributions $\mathcal{U}[a,b]$ on equally spaced intervals of equal length, where $\mathcal{U}[a,b]$ denotes the uniform distribution on $[a,b]$. We transform them into ultrametric measure spaces as described previously and equip each with its uniform measure. In the following, these spaces are denoted by $\mathcal{Y}^i_{n,k}=(Y^i_{n,k},u_{Y^i_{n,k}},\mu_{Y^i_{n,k}})$, $1\leq i\leq3$. We remark that $k$ can be regarded as the number of blocks in the dendrogram representation of the obtained ultrametric measure spaces (see Figure 7 for a visualization of three 3-block spaces). Next, we calculate $u_{GW,1}(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (approximated with Algorithm 1, $L=10$) and $\mathrm{SLB}^{ult}_1(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ for $i,i'\in\{1,2,3\}$, $k,k'\in\{1,\ldots,5\}$ and $n,n'\in\{10,20,30\}$. Then, we visualize the results, ordered by $k$ and $n$, as heatmaps (see Figure 8, top row). In the heatmaps in Figure 8 (and in all subsequent heatmaps), the spaces $\{\mathcal{Y}^i_{n,k}\}_{k,n,i}$ are sorted in increasing lexicographical order of $(k,n,i)$ and labeled $1,\ldots,45$ accordingly. Figure 8 demonstrates that, at least in this setting, $\mathrm{SLB}^{ult}_1$ is close to $u_{GW,1}$. Both $u_{GW,1}$ and $\mathrm{SLB}^{ult}_1$ strongly discriminate between the different independently sampled spaces, and both increase with the number of blocks $k$.
Next, we investigate how the results change if we consider subsampled instead of independent ultrametric measure spaces. For $k=1,\ldots,5$,

Figure 7.

Randomly sampled ultrametric measure spaces:

Illustration of three independently sampled ultrametric measure spaces with three blocks (top row) and three subsampled ultrametric measure spaces with three blocks (bottom row) as dendrograms.

we first create one large ultrametric space with $k$ blocks, where each block contains 100 points, as previously described. Then, we subsample 9 subspaces from each large space (three 10-point, three 20-point and three 30-point subspaces) and equip them with the uniform measure. These ultrametric measure spaces are denoted as $\mathcal{Z}^i_{n,k}=(Z^i_{n,k},u_{Z^i_{n,k}},\mu_{Z^i_{n,k}})$, $1\leq i\leq3$, $n\in\{10,20,30\}$ and $k\in\{1,\ldots,5\}$. Unlike the independently sampled ultrametric measure spaces, these subsampled spaces have the same large scale structure with high probability (see Figure 7 for an illustration). With the new spaces, we repeat the comparisons done for the independently sampled spaces. The results are summarized in Figure 8 (bottom row, same visualization) and, interestingly, differ quite drastically from those for the independently sampled ultrametric measure spaces. While $u_{GW,1}$ (approximated by Algorithm 1, $L=10$) and $\mathrm{SLB}^{ult}_1$ are again close and display the same behavior, the differences between spaces with the same number of blocks are much less pronounced than before.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. The reason why $\mathrm{SLB}^{ult}_1$ behaves so differently for subsampled ultrametric measure spaces becomes evident from the reformulation of $\mathrm{SLB}^{ult}_1$ given by Corollary 5.6:
$$\mathrm{SLB}^{ult}_1(\mathcal{X},\mathcal{Y})=\mathrm{SLB}_1(\mathcal{X},\mathcal{Y})+\int_{\mathbb{R}}t\,\big|(u_X)_\#(\mu_X\otimes\mu_X)-(u_Y)_\#(\mu_Y\otimes\mu_Y)\big|(dt).\qquad(30)$$
Independently sampled ultrametric measure spaces usually have slightly different diameters and different distance distribution profiles around their diameter (cf. Figure 7). This causes the second summand in (30) to become large. Subsampled ultrametric measure spaces, in contrast, will have the same diameter and large


Figure 8.

Randomly sampled ultrametric measure spaces I:

Heatmap representation of $u_{GW,1}(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top left), of $\mathrm{SLB}^{ult}_1(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top right), of $u_{GW,1}(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom right) and of $\mathrm{SLB}^{ult}_1(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom left), $1\leq i,i'\leq3$, $n,n'\in\{10,20,30\}$ and $k,k'\in\{1,\ldots,5\}$.

scale structure with high probability (see Figure 7). Hence, the second term in (30) is almost negligible in this case. Therefore, $\mathrm{SLB}^{ult}_1$ is extremely sensitive to small perturbations of the large scale structure of ultrametric measure spaces, and our simulations suggest that the same holds true for $u_{GW,1}$.

7.3. Relation to the Gromov-Wasserstein distance.

Next, we demonstrate the differences between the Gromov-Wasserstein distance $d_{GW,1}$, the lower bound $\mathrm{SLB}_1$, $u_{GW,1}$ and $\mathrm{SLB}^{ult}_1$. To this end, we compare the ultrametric measure spaces displayed in Figure 6 and repeat the comparisons between the randomly sampled ultrametric measure spaces from Section 7.2. The results are displayed in Table 3 and Figure 9, respectively. The values in Table 3 show that $d_{GW,1}$ and $\mathrm{SLB}_1$ are hardly influenced by the differences between the ultrametric measure spaces $\mathcal{X}_i=(X_i,u_{X_i},\mu_{X_i})$, $1\leq i\leq4$. Remarkably, $d_{GW,1}$ is affected most by the changes made to the measure rather than to the metric structure. Interestingly, this is not true for $\mathrm{SLB}_1$. Comparing the results in Figure 8 and Figure 9 further stresses the completely different behavior of $u_{GW,1}/\mathrm{SLB}^{ult}_1$ and $d_{GW,1}/\mathrm{SLB}_1$. Since both $d_{GW,1}$ and $\mathrm{SLB}_1$ are robust to small changes of the metric structure of the considered ultrametric measure spaces, the results for comparing the independently sampled and subsampled ultrametric measure spaces differ only

Table 3.

Comparison of different ultrametric measure spaces III: The values of $d_{GW,1}(\mathcal{X}_i,\mathcal{X}_j)$ (approximated by a version of Algorithm 1) and $\mathrm{SLB}_1(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$, where $\mathcal{X}_i$, $1\leq i\leq4$, denote the ultrametric measure spaces displayed in Figure 6.


Figure 9.

Randomly sampled ultrametric measure spaces II:

Heatmap representation of $d_{GW,1}(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top left), of $\mathrm{SLB}_1(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top right), of $d_{GW,1}(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom right) and of $\mathrm{SLB}_1(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom left), $1\leq i,i'\leq3$, $n,n'\in\{10,20,30\}$ and $k,k'\in\{1,\ldots,5\}$.

slightly. Furthermore, there seems to be almost no discrimination between the 4-block and 5-block spaces considered, although they are structurally quite different. In conclusion, $d_{GW,1}$ and $\mathrm{SLB}_1$ have trouble picking up crucial structural differences between the randomly created ultrametric measure spaces.

8. Phylogenetic tree shapes

Rooted phylogenetic trees (for a formal definition see e.g. Semple et al. (2003)) are a common tool to visualize and analyze the evolutionary relationships between different organisms. In combination with DNA sequencing, they are an important tool for studying the rapid evolution of pathogens. It is well known that the (unweighted) shape of a phylogenetic tree, i.e., the tree's connectivity structure without reference to its labels or branch lengths, carries important information about macroevolutionary processes (see e.g. Mooers and Heard (1997); Blum and François (2006); Dayarian and Shraiman (2014); Wu and Choi (2016)). To study the evolution of and the relation between different pathogens, it is of great interest to compare the shapes of phylogenetic trees created from different data sets. Currently, the number of tools for phylogenetic tree shape comparison is quite limited, and the development of new methods is an active field of research (Colijn and Plazzotta, 2018; Morozov, 2018; Kim et al., 2019; Liu et al., 2020). It is well known that certain classes of phylogenetic trees (as well as their respective tree shapes) can be identified with ultrametric spaces (Semple et al., 2003, Sec. 7). On the other hand, general phylogenetic trees are closely related to treegrams (see Definition 6.4). In the following, we use this connection and demonstrate in some preliminary, illustrative examples that the computationally efficient lower bound $\mathrm{SLB}^{ult}_1$ has some potential for comparing phylogenetic tree shapes. In particular, we contrast it with the metric defined for this application in Colijn and Plazzotta (2018) and study the behavior of $\mathrm{SLB}_1$ in this framework.

8.1. $\mathrm{SLB}^{ult}_1$ based phylogenetic tree shape comparison.

In this subsection, we reconsider phylogenetic tree shape comparisons from Colijn and Plazzotta (2018) and thereby study HA protein sequences from human influenza A (H3N2) (data downloaded from NCBI on 22 January 2016). More precisely, we investigate the relation between two samples, each consisting of 200 phylogenetic tree shapes with 500 tips. Trees in the first sample are based on random subsamples of size 500 of 2168 HA-sequences collected in the USA between March 2010 and September 2015. Trees in the second sample are based on random subsamples of size 500 of 1388 HA-sequences gathered in the tropics between January 2000 and October 2015 (for the exact construction of the trees see Colijn and Plazzotta (2018)). Although both samples of phylogenetic trees are based on HA protein sequences from human influenza A, we expect them to be quite different. On the one hand, influenza A is highly seasonal outside the tropics, with the majority of cases occurring in winter, whereas this seasonal variation is absent in the tropics (Russell et al., 2008). On the other hand, it is well known that the ongoing evolution of the HA protein causes a 'ladder-like' shape of long-term influenza phylogenetic trees (Koelle et al., 2010; Volz et al., 2013; Westgeest et al., 2012; Łuksza and Lässig, 2014), which is typically less developed in short-term data sets. Thus, the different collection periods of the two data sets will most likely also influence the respective phylogenetic tree shapes.
To compare the (unweighted) phylogenetic tree shapes of the resulting 400 trees, we transform them into ultra-dissimilarity measure spaces $\mathcal{X}_i=(X_i,u_{X_i},\mu_{X_i})$, $1\leq i\leq400$. To this end, we denote by $X_i$ the set of tips of the $i$-th phylogenetic tree and refer to the corresponding (unweighted) tree shape as $T_i$. Next, we define the ultra-dissimilarities $u_{X_i}$ on $X_i$, $1\leq i\leq400$, as follows. Let $x_i^1,x_i^2\in X_i$ and let $d_{i,12}$ be the length of a shortest path between $x_i^1$ and $x_i^2$. Let $d_{ij}$ be the

Figure 10.

Transforming a phylogenetic tree shape into an ultra-dissimilarity space:

In this figure, we illustrate the treegram corresponding to the ultra-dissimilarity space generated by (31) from the phylogenetic tree shape on the left. Note that the treegram preserves the tree structure and that the smallest birth time of points is exactly 0.

length of the shortest path from $x_i^j$ to the root, $1\leq j\leq2$, and let $d_i$ be the length of the longest shortest path from any tip to the root. Then, for any $x_i^1,x_i^2\in X_i$, we define
$$u_{X_i}(x_i^1,x_i^2)=\begin{cases}d_{i,12}, & \text{if } x_i^1\neq x_i^2,\\ d_i-d_{i1}, & \text{if } x_i^1=x_i^2,\end{cases}\qquad(31)$$
and weight all tips in $X_i$ equally (i.e., $\mu_{X_i}$ is the uniform measure on $X_i$). This naturally transforms the phylogenetic tree shapes $T_i$, $1\leq i\leq400$, into ultra-dissimilarity measure spaces, and we use $\mathrm{SLB}^{ult}_1$ to compare them (once again, we choose $p=1$ as an example). In addition, we compare the tree shapes $T_i$, $1\leq i\leq400$, by means of the metric $d_{CP}$ of Colijn and Plazzotta (2018). The top row of Figure 11 visualizes the dissimilarity matrix obtained by applying $\mathrm{SLB}^{ult}_1$ to all 400 phylogenetic tree shapes, as a heat map (left) and as a multidimensional scaling (MDS) plot (right). The first 200 entries correspond to the tree shapes from the US influenza data and the second 200 to those from the tropical influenza data. The heat map shows that the collection of US trees is divided into two parts. A large group $G_1$ is well separated from the phylogenetic tree shapes based on tropical data, $G_3:=(T_i)_{201\leq i\leq400}$. A smaller subgroup $G_2$ seems to be more similar (in the sense of $\mathrm{SLB}^{ult}_1$) to the tropical phylogenetic tree shapes. In the following, $G_1$ and $G_2$ are referred to as the US main group and the US secondary group, respectively. This division is even more evident in the MDS-plot on the right (black points represent tree shapes from the US main group, blue points tree shapes from the US secondary group and red points tree shapes based on the tropical data).


Figure 11.

Phylogenetic tree shape comparison:

Visualization of the dissimilarity matrices for the comparison of the phylogenetic tree shapes $T_i$, $1\leq i\leq400$, based on $\mathrm{SLB}^{ult}_1$ (top row) and $d_{CP}$ (bottom row) as heat maps (left) and MDS-plots (right).

We remark that, to highlight the subgroups, the US tree shapes have been reordered according to the output permutation of a single linkage dendrogram (w.r.t. $\mathrm{SLB}^{ult}_1$) of the US tree submatrix, created with MATLAB (2019). The tropical tree shapes have been reordered analogously.
The second row of Figure 11 displays the analogous plots for $d_{CP}$. Note that the MDS-plot on the right uses the same coloring, i.e., $T\in G_1$ is represented by a black point, $T\in G_2$ by a blue one and $T\in G_3$ by a red one. Interestingly, the analysis based on these plots differs from the previous one. Using $d_{CP}$ to compare the phylogenetic tree shapes at hand, we can split the data into two clusters with only a small overlap: one corresponds to the US data and the other to the tropical data (see the MDS-plot in the second row of Figure 11 on the right). In particular, $d_{CP}$ does not clearly distinguish between the US groups $G_1$ and $G_2$.
To analyze the differences between $\mathrm{SLB}^{ult}_1$ and $d_{CP}$ displayed in Figure 11, we collect different characteristics of the tree shapes in the groups $G_i$, $1\leq i\leq3$, such as $\frac{1}{|G_i|}\sum_{T_i\in G_i}\sum_{x,x'\in X_i}u_{X_i}(x,x')$ ("mean average distance") or

                              USA (main group)   USA (secondary group)   Tropics
Mean average distance                    28.38                   38.61     38.19
Mean maximal distance                    61.03                   89.33     95.65
Mean number of 4-structures              15.61                   14.08      7.81
Mean number of 5-structures              28.04                   27.97     35.82

Table 4. Tree shape characteristics: The means of several metric and connectivity characteristics of the ultra-dissimilarity spaces $X_i$ and the corresponding phylogenetic tree shapes $T_i$, $1 \le i \le 400$, in the groups $G_i$, $1 \le i \le 3$.

$\frac{1}{|G_i|}\sum_{T_i \in G_i} \max\{u_{X_i}(x,x') \mid x, x' \in X_i\}$ (“mean maximal distance”), $1 \le i \le 3$ (these influence $\mathrm{SLB}^{\mathrm{ult}}_1$ strongly), as well as the mean numbers of certain connectivity structures, like the 4- and 5-structures (these influence $d_{\mathrm{CP}}$; for a formal definition see Colijn and Plazzotta (2018)). The values in Table 4 show that the mean average distance and the mean maximal distance differ drastically between the two groups of US tree shapes. From a metric perspective, the tree shapes in these two groups are markedly different, and the values for the secondary US group strongly resemble those of the tropical tree shapes. On the other hand, the connectivity characteristics do not change much between the US main and secondary group. Hence, the metric $d_{\mathrm{CP}}$ does not clearly divide the US trees into two groups, although the differences are certainly present. When carefully checking the phylogenetic trees, the reasons for the differences between trees in the US main group and the US secondary group are not immediately apparent. Nevertheless, it is remarkable that trees from the secondary US cluster generally contain more samples from California and Florida (on average 1.92 and 0.88 more) and fewer from Maryland, Kentucky and Washington (on average 0.73, 0.83 and 0.72 fewer).

So far we have only considered unweighted phylogenetic tree shapes. However, the branch lengths of the considered phylogenetic trees are relevant in many examples, because they can, for instance, reflect the (inferred) genetic distance between evolutionary events (Colijn and Plazzotta, 2018).
While the branch lengths cannot easily be included in the metric $d_{\mathrm{CP}}$, the modeling of phylogenetic tree shapes as ultra-dissimilarity spaces is extremely flexible. It is straightforward to include branch lengths into the comparisons or to put emphasis on specific features (via weights on the corresponding tips). However, this is beyond the scope of our preliminary data analysis.

8.2. $\mathrm{SLB}_p$ based phylogenetic tree shape comparison. To conclude this section, we illustrate how the results change if we compare the phylogenetic tree shapes, or more precisely the corresponding ultra-dissimilarity spaces $X_i$, $1 \le i \le 400$, by $\mathrm{SLB}$ (cf. Section 5) instead of $\mathrm{SLB}^{\mathrm{ult}}_1$. The results of these comparisons are summarized in Figure 12 (for additional details see Figure 14 in Appendix D). It illustrates the corresponding dissimilarity matrix based on $\mathrm{SLB}$ for the comparison of the $X_i$, $1 \le i \le 400$, […] $\mathrm{SLB}^{\mathrm{ult}}_1$, and the MDS-plot shows that $\mathrm{SLB}$ also identifies the two groups $G_1$ and $G_2$ inside the collection of trees based on US data. Moreover, it suggests that $\mathrm{SLB}$ discriminates less strictly than $\mathrm{SLB}^{\mathrm{ult}}_1$ between tree shapes inside one of the groups $G_i$.
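The two metric characteristics reported in Table 4 (mean average distance and mean maximal distance) are simple functions of the dissimilarity matrices $u_{X_i}$. The following Python sketch computes them for one group of spaces. It rests on our own assumptions: each $u_{X_i}$ is given as a square list of lists, and the average distance of a single space is taken over all ordered pairs $(x, x') \in X_i \times X_i$, diagonal included. The matrices in the usage lines are hypothetical toy data, not the influenza trees.

```python
from typing import List

Matrix = List[List[float]]


def average_distance(u: Matrix) -> float:
    """Average of u(x, x') over all ordered pairs, diagonal included (assumed normalization)."""
    n = len(u)
    return sum(sum(row) for row in u) / (n * n)


def maximal_distance(u: Matrix) -> float:
    """max{u(x, x') | x, x' in X}."""
    return max(max(row) for row in u)


def mean_average_distance(group: List[Matrix]) -> float:
    """(1/|G|) times the sum of the average distances of the spaces in G."""
    return sum(average_distance(u) for u in group) / len(group)


def mean_maximal_distance(group: List[Matrix]) -> float:
    """(1/|G|) times the sum of the maximal distances of the spaces in G."""
    return sum(maximal_distance(u) for u in group) / len(group)


# Hypothetical toy group of two ultra-dissimilarity matrices (diagonal entries may be positive).
toy_group = [
    [[0, 2], [2, 1]],
    [[0, 4, 4], [4, 0, 2], [4, 2, 0]],
]
print(mean_average_distance(toy_group))
print(mean_maximal_distance(toy_group))  # 3.0
```

Since $u_{X_i}(x,x)$ may be positive on an ultra-dissimilarity space, whether the diagonal enters the average changes the value; excluding it would be an equally reasonable convention.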


Figure 12. Phylogenetic tree shape comparison based on $\mathrm{SLB}$: Representation of the dissimilarity matrices for the comparisons of the ultra-dissimilarity spaces $X_i$, $1 \le i \le 400$, based on $\mathrm{SLB}$ as heat maps (left) and MDS-plots (right).

In conclusion, we find that both $\mathrm{SLB}^{\mathrm{ult}}_1$ and $\mathrm{SLB}$ give comparable results for the unweighted phylogenetic tree shape comparison. However, the discrimination based on $\mathrm{SLB}$ is in some cases less strict.

9. Concluding remarks

Since we suspect that computing $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ for finite $p$ leads to NP-hard problems, it seems interesting to identify suitable collections of ultrametric measure spaces where these distances can be computed in polynomial time, as done for the Gromov-Hausdorff distance in Mémoli et al. (2019).

Acknowledgements.

We are grateful to Prof. Colijn for sharing the data from Colijn and Plazzotta (2018) with us. F.M. and Z.W. acknowledge funding from the NSF under grants NSF CCF 1740761, NSF DMS 1723003, and NSF RI 1901360. A.M. and C.W. gratefully acknowledge support by the DFG Research Training Group 2088 and Cluster of Excellence MBExC 2067. F.M. and A.M. thank the Mathematisches Forschungsinstitut Oberwolfach. Conversations which eventually led to this project were initiated during the 2019 workshop “Statistical and Computational Aspects of Learning with Complex Structure”.

References

GM Adelson-Velskii and AS Kronrod. About level sets of continuous functions with partial derivatives. In Dokl. Akad. Nauk SSSR, volume 49, pages 239–241, 1945.
Pankaj K Agarwal, Kyle Fox, Abhinandan Nath, Anastasios Sidiropoulos, and Yusu Wang. Computing the Gromov-Hausdorff distance for metric trees. ACM Transactions on Algorithms (TALG), 14(2):1–20, 2018.
Alfred V. Aho and John E. Hopcroft. The design and analysis of computer algorithms. Pearson Education India, 1974.


David Alvarez-Melis and Tommi S. Jaakkola. Gromov-Wasserstein alignment of word embedding spaces. arXiv preprint arXiv:1809.00013, 2018.
Yair Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In Proceedings of 37th Conference on Foundations of Computer Science, pages 184–193. IEEE, 1996.
Louis J. Billera, Susan P. Holmes, and Karen Vogtmann. Geometry of the space of phylogenetic trees. Advances in Applied Mathematics, 27(4):733–767, 2001.
Michael G.B. Blum and Olivier François. Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Systematic Biology, 55(4):685–691, 2006.
Nicolas Bonneel, Julien Rabin, Gabriel Peyré, and Hanspeter Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015.
L. Bottou, M. Arjovsky, D. Lopez-Paz, and M. Oquab. Geometrical insights for implicit generative modeling. In Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, pages 229–268. Springer, 2018.
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Efficient computation of isometry-invariant distances between surfaces. SIAM Journal on Scientific Computing, 28(5):1812–1836, 2006a.
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Generalized multidimensional scaling: A framework for isometry-invariant partial surface matching. Proceedings of the National Academy of Sciences, 103(5):1168–1172, 2006b.
Alexander M. Bronstein, Michael M. Bronstein, Alfred M Bruckstein, and Ron Kimmel. Partial similarity of objects, or how to compare a centaur to a horse. International Journal of Computer Vision, 84(2):163, 2009a.
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Topology-invariant similarity of nonrigid shapes. International Journal of Computer Vision, 81(3):281, 2009b.
Alexander M. Bronstein, Michael M. Bronstein, Ron Kimmel, Mona Mahmoudi, and Guillermo Sapiro. A Gromov-Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching. International Journal of Computer Vision, 89(2-3):266–286, 2010.
Charlotte Bunne, David Alvarez-Melis, Andreas Krause, and Stefanie Jegelka. Learning generative models across incomparable spaces. arXiv preprint arXiv:1905.05461, 2019.
Gunnar Carlsson and Facundo Mémoli. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11(Apr):1425–1470, 2010.
Frédéric Chazal, David Cohen-Steiner, Leonidas J Guibas, Facundo Mémoli, and Steve Y Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. In Computer Graphics Forum, volume 28, pages 1393–1403. Wiley Online Library, 2009.
Jie Chen and Ilya Safro. Algebraic distance on graphs. SIAM Journal on Scientific Computing, 33(6):3468–3490, 2011.
Samir Chowdhury. Metric and Topological Approaches to Network Data Analysis. PhD thesis, The Ohio State University, 2019.
Samir Chowdhury and Facundo Mémoli. The Gromov-Wasserstein distance between networks and stable network invariants. Information and Inference: A Journal of the IMA, 8(4):757–787, 2019.
Samir Chowdhury and Tom Needham. Generalized spectral clustering via Gromov-Wasserstein learning. arXiv preprint arXiv:2006.04163, 2020.
Caroline Colijn and Giacomo Plazzotta. A metric on phylogenetic tree shapes. Systematic Biology, 67(1):113–126, 2018.
Guy David, Stephen W. Semmes, Stephen Semmes, and Guy Rene Pierre Pierre. Fractured fractals and broken dreams: Self-similar geometry through metric and measure, volume 7. Oxford University Press, 1997.
Adel Dayarian and Boris I. Shraiman. How to infer relative fitness from a sample of genomic sequences. Genetics, 197(3):913–923, 2014.
Khanh Do Ba, Huy L. Nguyen, Huy N. Nguyen, and Ronitt Rubinfeld. Sublinear time algorithms for Earth Mover’s distance. Theory of Computing Systems, 48(2):428–442, 2011.
Yihe Dong and Will Sawin. COPT: Coordinated optimal transport on graphs. arXiv preprint arXiv:2003.03892, 2020.
Richard M Dudley. Real analysis and probability. CRC Press, 2018.
Steven N. Evans. Probability and Real Trees: École d’Été de Probabilités de Saint-Flour XXXV-2005. Springer, 2007.
Steven N. Evans and Frederick A. Matsen. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):569–592, 2012.
Andreas Greven, Peter Pfaffelhuber, and Anita Winter. Convergence in distribution of random metric measure spaces (λ-coalescent measure trees). Probability Theory and Related Fields, 145(1-2):285–322, 2009.
Gillian Grindstaff and Megan Owen. Geometric comparison of phylogenetic trees with different leaf sets. arXiv preprint arXiv:1807.04235, 2018.
Jotun Hein. Reconstructing evolution of sequences subject to recombination using parsimony. Mathematical Biosciences, 98(2):185–200, 1990.
Liisa Holm and Chris Sander. Protein structure comparison by alignment of distance matrices. Journal of Molecular Biology, 233(1):123–138, 1993.
Norman R. Howes. Modern analysis and topology. Springer Science & Business Media, 2012.
Anil K. Jain and Chitra Dorai. 3D object recognition: Representation and matching. Statistics and Computing, 10(2):167–182, 2000.
Leonid V. Kantorovich. On the translocation of masses, C. R. (Dokl.) Acad. Sci. URSS (N.S.), 37:199, 1942.
Leonid V. Kantorovich and G Rubinstein. On a space of completely additive functions (Russ.). Vestnik Leningrad Univ, 13:52–59, 1958.
Jaehee Kim, Noah A. Rosenberg, and Julia A. Palacios. A metric space of ranked tree shapes and ranked genealogies. bioRxiv, 2019.
Benoît R Kloeckner. A geometric study of Wasserstein spaces: Ultrametrics. Mathematika, 61(1):162–178, 2015.
Katia Koelle, Priya Khatri, Meredith Kamradt, and Thomas B. Kepler. A two-tiered model for simulating the ecological and evolutionary dynamics of rapidly evolving viruses, with an application to influenza. Journal of The Royal Society Interface, 7(50):1257–1274, 2010.
Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, and Gustavo Rohde. Generalized sliced Wasserstein distances. In Advances in Neural Information Processing Systems, pages 261–272, 2019.


Irina Kufareva and Ruben Abagyan. Methods of protein structure comparison. In Homology Modeling, pages 231–257. Springer, 2011.
Hao-Yuan Kuo, Hong-Ren Su, Shang-Hong Lai, and Chin-Chia Wu. 3D object detection and pose estimation from depth image for robotic bin picking. In , pages 1264–1269. IEEE, 2014.
Manuel Lafond, Nadia El-Mabrouk, Katharina T Huber, and Vincent Moulton. The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics. Theoretical Computer Science, 760:15–34, 2019.
Tam Le, Nhat Ho, and Makoto Yamada. Fast tree variants of Gromov-Wasserstein. arXiv preprint arXiv:1910.04462, 2019a.
Tam Le, Makoto Yamada, Kenji Fukumizu, and Marco Cuturi. Tree-sliced variants of Wasserstein distances. In Advances in Neural Information Processing Systems, pages 12304–12315, 2019b.
Volkmar Liebscher. New Gromov-inspired metrics on phylogenetic tree space. Bulletin of Mathematical Biology, 80(3):493–518, 2018.
Pengyu Liu, Matthew Gould, and Caroline Colijn. Polynomial phylogenetic analysis of tree shapes. bioRxiv, 2020.
David G. Lowe. Local feature view clustering for 3D object recognition. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I. IEEE, 2001.
Marta Łuksza and Michael Lässig. A predictive fitness model for influenza. Nature, 507(7490):57–61, 2014.
Colin L. Mallows. A note on asymptotic joint normality. The Annals of Mathematical Statistics, pages 508–515, 1972.
MATLAB. MATLAB: Accelerating the pace of engineering and science. The MathWorks, Inc., 2019. URL .
Andrew McGregor and Daniel Stubbs. Sketching Earth-Mover distance on graph metrics. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 274–286. Springer, 2013.
Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison. 2007.
Facundo Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
Facundo Mémoli. Some properties of Gromov-Hausdorff distances. Discrete & Computational Geometry, 48(2):416–440, 2012.
Facundo Mémoli and Guillermo Sapiro. Comparing point clouds. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pages 32–40, 2004.
Facundo Mémoli, Zane Smith, and Zhengchao Wan. Gromov-Hausdorff distances on p-metric spaces and ultrametric spaces. arXiv preprint arXiv:1912.00564, 2019.
Arne O. Mooers and Stephen B. Heard. Inferring evolutionary process from phylogenetic tree shape. The Quarterly Review of Biology, 72(1):31–54, 1997.
Alexey Anatolievich Morozov. Extension of Colijn-Plazzotta tree shape distance metric to unrooted trees. bioRxiv, page 506022, 2018.
Dmitriy Morozov, Kenes Beketayev, and Gunther Weber. Interleaving distance between merge trees. Discrete and Computational Geometry, 49(22-45):52, 2013.
Megan Owen and J Scott Provan. A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(1):
The International Journal of Robotics Research, 31(4):538–553, 2012.
Panos M Pardalos and Stephen A Vavasis. Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1(1):15–22, 1991.
Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In International Conference on Machine Learning, pages 2664–2672, 2016.
Derong Qiu. Geometry of non-Archimedean Gromov-Hausdorff distance. P-Adic Numbers, Ultrametric Analysis, and Applications, 1(4):317, 2009.
Georges Reeb. Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique [On the singular points of a completely integrable Pfaff form or of a numerical function]. Comptes Rendus Acad. Sciences Paris, 222:847–849, 1946.
David F. Robinson. Comparison of labeled trees with valency three. Journal of Combinatorial Theory, Series B, 11(2):105–119, 1971.
David F. Robinson and Leslie R. Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2):131–147, 1981.
Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The Earth Mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.
Colin A. Russell, Terry C. Jones, Ian G. Barr, Nancy J. Cox, Rebecca J. Garten, Vicky Gregory, Ian D. Gust, Alan W. Hampson, Alan J. Hay, Aeron C. Hurt, et al. The global circulation of seasonal influenza A (H3N2) viruses. Science, 320(5874):340–346, 2008.
Felix Schmiedl. Computational aspects of the Gromov-Hausdorff distance and its application in non-rigid shape matching. Discret. Comput. Geom., 57(4):854–880, 2017. doi: 10.1007/s00454-017-9889-4. URL https://doi.org/10.1007/s00454-017-9889-4.
Charles Semple, Mike Steel, et al. Phylogenetics, volume 24. Oxford University Press on Demand, 2003.
Zane Smith, Samir Chowdhury, and Facundo Mémoli. Hierarchical representations of network data with optimal distortion bounds. In , pages 1834–1838. IEEE, 2016.
Karl-Theodor Sturm. On the geometry of metric measure spaces. Acta Mathematica, 196(1):65–131, 2006.
Karl-Theodor Sturm. The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces. arXiv preprint arXiv:1208.0434, 2012.
David Thorsley and Eric Klavins. Model reduction of stochastic processes using Wasserstein pseudometrics. In , pages 1374–1381. IEEE, 2008.
Vayer Titouan, Nicolas Courty, Romain Tavenard, and Rémi Flamary. Optimal transport for structured data with application on graphs. In International Conference on Machine Learning, pages 6275–6284, 2019.
Elena Farahbakhsh Touli and Yusu Wang. FPT-algorithms for computing Gromov-Hausdorff and interleaving distances between trees. arXiv preprint arXiv:1811.02425, 2018.
Sergei S Vallender. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications, 18(4):784–786, 1974.
Titouan Vayer, Rémi Flamary, Romain Tavenard, Laetitia Chapel, and Nicolas Courty. Sliced Gromov-Wasserstein. arXiv preprint arXiv:1905.10124, 2019.


Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
Cédric Villani. Optimal transport: Old and new, volume 338. Springer Science & Business Media, 2008.
Erik M. Volz, Katia Koelle, and Trevor Bedford. Viral phylodynamics. PLoS Comput Biol, 9(3):e1002947, 2013.
Zhengchao Wan. A novel construction of Urysohn universal ultrametric space via the Gromov-Hausdorff ultrametric. arXiv preprint arXiv:2007.08105, 2020.
Kim B. Westgeest, Miranda de Graaf, Mathieu Fourment, Theo M. Bestebroer, Ruud van Beek, Monique I.J. Spronken, Jan C. de Jong, Guus F Rimmelzwaan, Colin A. Russell, Albert D.M.E. Osterhaus, et al. Genetic evolution of the neuraminidase of influenza A (H3N2) viruses from 1968 to 2009 and its correspondence to haemagglutinin evolution. The Journal of General Virology, 93(Pt 9):1996, 2012.
Taoyang Wu and Kwok Pui Choi. On joint subtree distributions under two evolutionary models. Theoretical Population Biology, 108:13–23, 2016.
Ihor Zarichnyi. Gromov-Hausdorff ultrametric. arXiv preprint math/0511437, 2005.

Appendix A. Missing details from Section 2

A.1. Synchronized rooted trees. A synchronized rooted tree is a combinatorial tree $T = (V, E)$ with a root $o \in V$ and a height function $h: V \to [0,\infty)$ such that $h^{-1}(0)$ coincides with the leaf set and $h(v) < h(v^*)$ for each $v \in V \setminus \{o\}$, where $v^*$ is the parent of $v$. Similar to Lemma 2.2, which establishes a correspondence between ultrametric spaces and dendrograms, an ultrametric space $X$ uniquely determines a synchronized rooted tree $T_X$ (Kloeckner, 2015). Now, given a compact ultrametric space $(X, u_X)$, we construct the corresponding synchronized rooted tree $T_X$ via the dendrogram $\theta_X$ associated with $u_X$. Recall from Section 2.3 that $V(X) := \bigcup_{t>0} \theta_X(t)$. For each $B \in V(X) \setminus \{X\}$, denote by $B^*$ the smallest element of $V(X)$ such that $B \subsetneq B^*$, whose existence is guaranteed by the following lemma.

Lemma A.1.

Let $X$ be a compact ultrametric space and let $V(X) = \bigcup_{t>0} \theta_X(t)$, where $\theta_X$ is as defined in Remark 2.5. For each $B \in V(X)$ such that $B \neq X$, there exists $B^* \in V(X)$ such that $B \subsetneq B^*$ and $B^* \subseteq B'$ for all $B' \in V(X)$ with $B \subsetneq B'$.

Proof. Let $\delta := \operatorname{diam}(B)$ and let $x \in B$; then $B = [x]_\delta$. By Lemma 2.7, $X_\delta$ is a finite set. Consider $\delta^* := \min\{u_{X_\delta}([x]_\delta, [x']_\delta) \mid [x']_\delta \neq [x]_\delta\}$, a minimum over a non-empty set since $B \neq X$. Let $B^* := [x]_{\delta^*}$; then $B^*$ is the smallest element of $V(X)$ containing $B$ under inclusion. Indeed, $B \subsetneq B^*$, and if $B \subseteq B'$ for some $B' \in V(X)$, then $B' = [x]_r$ for some $r \ge \delta$. It is easy to see that $[x]_r = [x]_\delta$ for all $\delta \le r < \delta^*$. Therefore, if $B' \neq B$, we must have $r \ge \delta^*$, and thus $B^* = [x]_{\delta^*} \subseteq [x]_r = B'$. □

Now, we define a combinatorial tree $T_X = (V_X, E_X)$ as follows: we let $V_X := V(X)$, and for any distinct $B, B' \in V(X)$, we let $(B, B') \in E_X$ iff either $B = (B')^*$ or $B' = B^*$. We choose $X \in V_X$ to be the root of $T_X$; then any $B \neq X$ in $V_X$ has a unique parent $B^*$. We define $h_X: V_X \to [0,\infty)$ by $h_X(B) := \operatorname{diam}(B)$ for any $B \in V_X$. Now, $T_X$ endowed with the root $X$ and the height function $h_X$ is a synchronized rooted tree. It is easy to see that $X$ can be isometrically identified with $h_X^{-1}(0)$ in the so-called metric completion of $T_X$ (see (Kloeckner, 2015, Section 2.3) for details). With this construction, Lemma 2.12 follows directly from (Kloeckner, 2015, Lemma 3.1).

A.2.

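Closed-form solution for $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}$.

Theorem A.2. Given $1 \le p, q < \infty$, we have that
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha, \beta) \le \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p\, dt\right)^{1/p}.$$
When $q \le p$, equality holds, whereas when $q > p$, equality does not hold in general.

Proof. Note that $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha,\beta) = \inf_{(\xi,\eta)} \big(\mathbb{E}(\Delta_q(\xi,\eta)^p)\big)^{1/p}$, where $\xi$ and $\eta$ are random variables with marginal distributions $\alpha$ and $\beta$, respectively. Moreover, let $\zeta$ be a random variable uniformly distributed on $[0,1]$; then $F_\alpha^{-1}(\zeta)$ has distribution function $F_\alpha$ and $F_\beta^{-1}(\zeta)$ has distribution function $F_\beta$ (see for example Vallender (1974)). Let $\xi = F_\alpha^{-1}(\zeta)$ and $\eta = F_\beta^{-1}(\zeta)$; then we have
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha,\beta) \le \big(\mathbb{E}(\Delta_q(\xi,\eta)^p)\big)^{1/p} = \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p\, dt\right)^{1/p}.$$
Now we assume $q \le p$. Denote by $S_q: [0,\infty) \to [0,\infty)$ the map taking $x \ge 0$ to $x^q$. Then,
$$\begin{aligned}
\Big(d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha,\beta)\Big)^p &= \inf_{\mu \in \mathcal{C}(\alpha,\beta)} \iint_{\mathbb{R}_{\ge 0}\times\mathbb{R}_{\ge 0}} \big(\Delta_q(x,y)\big)^p\, \mu(dx\times dy)\\
&= \inf_{\mu \in \mathcal{C}(\alpha,\beta)} \iint_{\mathbb{R}_{\ge 0}\times\mathbb{R}_{\ge 0}} |S_q(x) - S_q(y)|^{p/q}\, \mu(dx\times dy)\\
&= \inf_{\mu \in \mathcal{C}(\alpha,\beta)} \iint_{\mathbb{R}_{\ge 0}\times\mathbb{R}_{\ge 0}} |s - t|^{p/q}\, (S_q\times S_q)_\#\mu(ds\times dt)\\
&= \Big(d_{W,p/q}^{(\mathbb{R}_{\ge 0}, \Delta_1)}\big((S_q)_\#\alpha, (S_q)_\#\beta\big)\Big)^{p/q},
\end{aligned}$$
where we use that $S_q$ is a bijection and that $p/q \ge 1$. By the closed-form expression of the Wasserstein distance on the real line,
$$\Big(d_{W,p/q}^{(\mathbb{R}_{\ge 0}, \Delta_1)}\big((S_q)_\#\alpha, (S_q)_\#\beta\big)\Big)^{p/q} = \int_0^1 \big|F_{\alpha,q}^{-1}(t) - F_{\beta,q}^{-1}(t)\big|^{p/q}\, dt,$$
where $F_{\alpha,q}$ and $F_{\beta,q}$ are the distribution functions of $(S_q)_\#\alpha$ and $(S_q)_\#\beta$, respectively. It is easy to verify that $F_{\alpha,q}^{-1}(t) = (F_\alpha^{-1}(t))^q$ and $F_{\beta,q}^{-1}(t) = (F_\beta^{-1}(t))^q$. Therefore,
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha,\beta) = \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p\, dt\right)^{1/p}.$$
Now consider the case $p < q$. We first consider the extreme case $p = 1$ and $q = \infty$ (though we require $q < \infty$ in the assumptions of the theorem, we relax this for now). Choose $\alpha$ and $\beta$, each a weighted sum of two Dirac measures (where $\delta_x$ denotes the Dirac measure at the point $x \in \mathbb{R}_{\ge 0}$), such that
$$d_{W,1}^{(\mathbb{R}_{\ge 0}, \Delta)}(\alpha,\beta) < \int_0^1 \Delta\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)\, dt.$$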

It is not hard to see that both $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha,\beta)$ and $\big(\int_0^1 \Delta_q(F_\alpha^{-1}(t), F_\beta^{-1}(t))^p\, dt\big)^{1/p}$ are continuous with respect to $p \in [1,\infty)$ and $q \in [1,\infty]$. Hence, for $p$ close to $1$ and large enough $q < \infty$ (in particular, $p < q$), we have that
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha,\beta) < \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p\, dt\right)^{1/p}.$$ □

A.3.

Relegated results and proofs from Section 2.

In this section we give the proofs of various results from Section 2.

Lemma A.3. $\Delta$ defines an ultrametric on $\mathbb{R}_+$.

Proof.

It is evident that $\Delta$ is non-negative and symmetric. Let $a, b, c \in \mathbb{R}_+$. By definition, $\Delta(a,b) = 0$ if and only if $a = b$. Furthermore, we have to distinguish several cases to demonstrate that
$$\Delta(a,c) \le \max\{\Delta(a,b), \Delta(b,c)\}. \qquad (32)$$
As (32) is trivial for $a = c$, we consider the three cases $a \neq b \neq c$, $a = b \neq c$ and $a \neq b = c$ (always with $a \neq c$).
(1) $a \neq b \neq c$: In this case we have that $\Delta(a,c) = \max(a,c) \le \max\{\max(a,b), \max(b,c)\} = \max\{\Delta(a,b), \Delta(b,c)\}$.
(2) $a = b \neq c$: It follows that $\Delta(a,c) = \max(a,c) = \max\{0, \max(a,c)\} = \max\{0, \max(b,c)\} = \max\{\Delta(a,b), \Delta(b,c)\}$.
(3) $a \neq b = c$: We obtain that $\Delta(a,c) = \max(a,c) = \max\{\max(a,c), 0\} = \max\{\max(a,b), 0\} = \max\{\Delta(a,b), \Delta(b,c)\}$.
This yields the claim. □
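The ultrametric $\Delta$ (i.e., $\Delta(a,b) = \max(a,b)$ for $a \neq b$ and $\Delta(a,a) = 0$) is easy to experiment with numerically. The Python sketch below is our own illustration, not part of the proofs: it (i) checks inequality (32) on all triples of a small random sample, and (ii) compares, in the setting $p = 1$, $q = \infty$ from the proof of Theorem A.2, the exact $d_{W,1}^{(\mathbb{R}_{\ge 0},\Delta)}$ with the cost of the quantile (monotone) coupling. The measures $\alpha = \tfrac12\delta_1 + \tfrac12\delta_2$ and $\beta = \tfrac12\delta_2 + \tfrac12\delta_3$ are a hypothetical choice of ours. For two uniform measures with the same number of atoms, an optimal coupling can be taken to be a permutation (Birkhoff–von Neumann), so brute force over permutations gives the exact value for such tiny inputs.

```python
import itertools
import random


def Delta(a: float, b: float) -> float:
    """The ultrametric of Lemma A.3: max(a, b) if a != b, and 0 if a == b."""
    return 0.0 if a == b else max(a, b)


def satisfies_ultrametric_inequality(points) -> bool:
    """Check Delta(a, c) <= max(Delta(a, b), Delta(b, c)) on all triples, i.e. (32)."""
    return all(
        Delta(a, c) <= max(Delta(a, b), Delta(b, c))
        for a, b, c in itertools.product(points, repeat=3)
    )


def w1_uniform(xs, ys, dist) -> float:
    """Exact W_1 between uniform measures on xs and ys (same size), by brute force over permutations."""
    n = len(xs)
    return min(
        sum(dist(x, ys[j]) for x, j in zip(xs, perm)) / n
        for perm in itertools.permutations(range(n))
    )


def quantile_cost(xs, ys, dist) -> float:
    """Cost of the monotone coupling t -> (F_alpha^{-1}(t), F_beta^{-1}(t)) for uniform measures."""
    return sum(dist(x, y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


random.seed(0)
# Rounding creates repeated values, so the case a == b is exercised as well.
sample = [round(random.uniform(0, 10), 1) for _ in range(25)] + [0.0, 1.0, 1.0]
print(satisfies_ultrametric_inequality(sample))  # True

xs, ys = [1.0, 2.0], [2.0, 3.0]  # alpha = (delta_1 + delta_2)/2, beta = (delta_2 + delta_3)/2
print(w1_uniform(xs, ys, Delta))     # 1.5
print(quantile_cost(xs, ys, Delta))  # 2.5
```

The gap $1.5 < 2.5$ is an instance of the strict inequality used in the proof: under $\Delta$, matching the two equal atoms at $2$ (cost $0$) and sending $1$ to $3$ is cheaper than the monotone matching $1 \mapsto 2$, $2 \mapsto 3$.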

Proof of Theorem 2.4. Given $\theta \in \mathcal{D}(X)$, we define $u_\theta: X \times X \to \mathbb{R}_{\ge 0}$ as follows:
$$u_\theta(x,x') := \inf\{t \ge 0 \mid x \text{ and } x' \text{ belong to the same block of } \theta(t)\}.$$
It is straightforward to verify that $u_\theta$ is an ultrametric. For any Cauchy sequence $\{x_n\}_{n\in\mathbb{N}}$ in $(X, u_\theta)$, let $D_i := \sup_{m,n \ge i} u_\theta(x_m, x_n)$ for each $i \in \mathbb{N}$. Since the sequence is Cauchy (and because of condition (3)), each $D_i < \infty$ and $\lim_{i\to\infty} D_i = 0$. By definition of $u_\theta$, for each $i \in \mathbb{N}$ the set $\{x_n\}_{n=i}^\infty$ is contained in the block $[x_i]_{D_i} \in \theta(D_i)$. Let $X_i := [x_i]_{D_i}$ for each $i \in \mathbb{N}$. Then, obviously $X_j \subseteq X_i$ for any $1 \le i < j$. By condition (7) in Definition 2.3, we have that $\bigcap_{i\in\mathbb{N}} X_i \neq \emptyset$. Choose $x^* \in \bigcap_{i\in\mathbb{N}} X_i$; then it is easy to verify that $x^* = \lim_{n\to\infty} x_n$, and thus $(X, u_\theta)$ is complete. To prove that $(X, u_\theta)$ is compact, we need to verify that $X_t$ is finite for each $t > 0$ (cf. Lemma 2.7). In fact, the equivalence relation $\sim_t$ with respect to $u_\theta$ is the same as the one induced by the partition $\theta(t)$. Since $\theta(t)$ is finite by condition (6) in Definition 2.3, $X_t$ is finite and thus $X$ is compact. This proves that $u_\theta \in \mathcal{U}(X)$. Based on this, we define a map $\Upsilon_X: \mathcal{D}(X) \to \mathcal{U}(X)$ by $\theta \mapsto u_\theta$.

Now, given $u \in \mathcal{U}(X)$, we define a map $\theta_u: [0,\infty) \to \mathrm{Part}(X)$ as follows: for each $t \ge 0$, consider the equivalence relation $\sim_t$ with respect to $u$, i.e., $x \sim_t x'$ if and only if $u(x,x') \le t$. We set $\theta_u(t)$ to be the partition induced by $\sim_t$, i.e., $\theta_u(t) = X_t$. It is not hard to show that $\theta_u$ satisfies conditions (1)–(5) in Definition 2.3. Since $X$ is compact, $\theta_u(t) = X_t$ is finite for each $t > 0$, and thus $\theta_u$ satisfies condition (6) in Definition 2.3. Now, let $\{t_n\}_{n\in\mathbb{N}}$ be a decreasing sequence such that $\lim_{n\to\infty} t_n = 0$, and let $X_n \in \theta_u(t_n)$ be such that $X_m \subseteq X_n$ for any $1 \le n < m$. Since each $X_n = [x_n]_{t_n}$ for some $x_n \in X$, $X_n$ is a compact subset of $X$. Since $X$ is also complete, we have that $\bigcap_{n\in\mathbb{N}} X_n \neq \emptyset$. Therefore, $\theta_u$ satisfies condition (7) in Definition 2.3, and thus $\theta_u \in \mathcal{D}(X)$. Then, we define the map $\Delta_X: \mathcal{U}(X) \to \mathcal{D}(X)$ by $u \mapsto \theta_u$. It is easy to check that $\Delta_X$ is the inverse of $\Upsilon_X$, and thus we have established a bijection between $\mathcal{D}(X)$ and $\mathcal{U}(X)$. □

Proof of Lemma 2.11.

Given any $t > 0$ and $x \in X$, we have $[x]_t = B_t(x) = \{x' \in X \mid u_X(x,x') \le t\}$. Therefore, $V(X)$ is a collection of closed balls in $X$. Conversely, any closed ball $B_t(x)$ with positive radius $t > 0$ coincides with $[x]_t \in \theta_X(t)$ and thus belongs to $V(X)$. Now, consider a singleton $\{x\} = B_0(x)$. If $x$ is not a cluster point, then there exists $t > 0$ such that $B_t(x) = \{x\}$, which implies that $\{x\} \in V(X)$. If $x$ is a cluster point, then $\{x\} \subsetneq B_t(x) = [x]_t$ for any $t > 0$, and thus $\{x\} \notin V(X)$. In conclusion, $V(X)$ is the collection of all closed balls in $X$ except for the singletons $\{x\}$ such that $x$ is a cluster point of $X$.

If $X$ is a one-point space, then obviously $X \in V(X) = \{X\}$. Otherwise, let $\delta := \operatorname{diam}(X) > 0$. Then, for any $x \in X$ we have that $X = [x]_\delta \in V(X)$. As for singletons $\{x\}$ where $x \in X$ is not a cluster point, we have proved above that $\{x\} \in V(X)$. □

Proof of Lemma 2.13.

First of all, we show that the right hand side of (16) is well defined. More precisely, we employ Lemma 2.7 to prove that the supremum
$$\sup_{B \in V(X)\setminus\{X\},\ \alpha(B) \neq \beta(B)} \operatorname{diam}(B^*)$$
is attained. Fix any $B_0 \in V(X)\setminus\{X\}$ such that $\alpha(B_0) \neq \beta(B_0)$; then $\operatorname{diam}(B_0^*) > 0$. By Lemma 2.7 the spaces $X_t$ are finite for $t > 0$. Since $V(X) = \{[x]_t \mid x \in X, t > 0\} = \bigcup_{t>0} X_t$, there are only finitely many elements of $V(X)$ with diameter at least $\operatorname{diam}(B_0^*)$, each of which has only finitely many children; hence there are only finitely many $B \in V(X)\setminus\{X\}$ such that $\operatorname{diam}(B^*) \ge \operatorname{diam}(B_0^*)$. This implies that the supremum is attained, and thus
$$\sup_{B \in V(X)\setminus\{X\},\ \alpha(B) \neq \beta(B)} \operatorname{diam}(B^*) = \max_{B \in V(X)\setminus\{X\},\ \alpha(B) \neq \beta(B)} \operatorname{diam}(B^*).$$
Let $\hat B$ be a maximizer and let $\delta := \operatorname{diam}(\hat B^*)$. It is easy to see that $\alpha([x]_\delta) = \beta([x]_\delta)$ for any $x \in X$. By Strassen's theorem (see for example (Dudley, 2018, Theorem 11.6.2)),
$$d_{W,\infty}(\alpha,\beta) = \inf\{r \ge 0 \mid \alpha(A) \le \beta(A^r) \text{ for any closed subset } A \subseteq X\},$$
where $A^r := \{x \in X \mid u_X(x, A) \le r\}$. First, we reconsider the closed subset $\hat B$. Since $\alpha(\hat B) \neq \beta(\hat B)$, we assume without loss of generality that $\alpha(\hat B) > \beta(\hat B)$. By definition of $\hat B^*$, it is obvious that $\hat B^\delta = \hat B^*$ and $\hat B^r = \hat B$ for all $0 \le r < \delta$. Therefore, $\alpha(\hat B) \le \beta(\hat B^r)$ only when $r \ge \delta$. This implies that $d_{W,\infty}(\alpha,\beta) \ge \delta$. Conversely, for any closed set $A$, we have that $A^r = \bigcup_{x\in A} [x]_r$. For two closed balls in an ultrametric space, either one includes the other or they do not intersect. Therefore, there exists a subset $S \subseteq A$ such that $A^r = \bigsqcup_{x\in S} [x]_r$. Then, $\alpha(A) \le \alpha(A^\delta) = \sum_{x\in S} \alpha([x]_\delta) = \sum_{x\in S} \beta([x]_\delta) = \beta(A^\delta)$. Hence, $d_{W,\infty}(\alpha,\beta) \le \delta$, and thus
$$d_{W,\infty}(\alpha,\beta) = \max_{B \in V(X)\setminus\{X\},\ \alpha(B) \neq \beta(B)} \operatorname{diam}(B^*).$$ □

Proof of Lemma 2.16. If $X$ is finite, then obviously $X$ is compact. Assume that $X$ is a countable set with $0$ being the unique cluster point. If $\{x_n\} \subseteq X$ is a Cauchy sequence with respect to $\Delta$, then either $x_n$ is constant for large $n$ or $\lim_{n\to\infty} x_n = 0$. In either case, the limit of $\{x_n\}$ belongs to $X$, and thus $X$ is complete. Now, for any $\varepsilon > 0$, $X_\varepsilon$ is a finite set by Lemma 2.7. Write $X_\varepsilon = \{[x_1]_\varepsilon, \ldots, [x_n]_\varepsilon\}$. Then, $\{x_1, \ldots, x_n\}$ is a finite $\varepsilon$-net of $X$. Therefore, $X$ is totally bounded and thus compact.

Now, assume that $X$ is compact. Then, for any $\varepsilon > 0$, $X_\varepsilon$ is a finite set. Suppose $X_\varepsilon = \{[x_1]_\varepsilon, \ldots, [x_n]_\varepsilon\}$ where $0 \le x_1 < x_2 < \ldots < x_n$. Further, $\Delta(x_i, x_j) = x_j$ whenever $1 \le i < j \le n$. This implies that
(1) $x_i > \varepsilon$ for all $2 \le i \le n$;
(2) $[x_i]_\varepsilon = \{x_i\}$ for all $2 \le i \le n$.
Therefore, $X \cap (\varepsilon, \infty) \subseteq \{x_1, \ldots, x_n\}$ is a finite set. Since $\varepsilon > 0$ is arbitrary, $X$ is an at most countable set and has no cluster point other than $0$. If $X$ is countable, then $0$ must be a cluster point, and by compactness of $X$, we have that $0 \in X$. □

Appendix B. Missing details from Section 3

B.1.

The Wasserstein pseudometric.

Given a set $X$, a pseudometric is a symmetric function $d_X: X \times X \to \mathbb{R}_{\ge 0}$ satisfying the triangle inequality and $d_X(x,x) = 0$ for all $x \in X$. Note that if, moreover, $d_X(x,y) = 0$ implies $x = y$, then $d_X$ is a metric. There is a canonical identification on pseudometric spaces $(X, d_X)$: define $x \sim x'$ if $d_X(x,x') = 0$. Then, $\sim$ is an equivalence relation, and we define the quotient space $\tilde X = X/\sim$. Define a function $\tilde d_X: \tilde X \times \tilde X \to \mathbb{R}_{\ge 0}$ as follows:
$$\tilde d_X([x],[x']) := \begin{cases} d_X(x,x') & \text{if } d_X(x,x') \neq 0,\\ 0 & \text{otherwise.} \end{cases}$$
Then, $(\tilde X, \tilde d_X)$ is a metric space, called the metric space induced by the pseudometric space $(X, d_X)$. Note that $\tilde d_X$ preserves the induced topology (see e.g. Howes (2012)), and thus the quotient map $\Psi: X \to \tilde X$ is continuous.

Analogously to the Wasserstein distance, which is defined for probability measures on metric spaces, we define the Wasserstein pseudometric for measures on compact pseudometric spaces as done in Thorsley and Klavins (2008). Let $\alpha, \beta \in \mathcal{P}(X)$. Then, we define for $p \in [1,\infty)$ the Wasserstein pseudometric of order $p$ as
$$d_{W,p}^{(X,d_X)}(\alpha,\beta) := \left(\inf_{\mu \in \mathcal{C}(\alpha,\beta)} \int_{X\times X} d_X^p(x,y)\, \mu(dx\times dy)\right)^{1/p} \qquad (33)$$
and for $p = \infty$ as
$$d_{W,\infty}^{(X,d_X)}(\alpha,\beta) := \inf_{\mu \in \mathcal{C}(\alpha,\beta)} \sup_{(x,y) \in \operatorname{supp}(\mu)} d_X(x,y). \qquad (34)$$
It is easy to see that the Wasserstein pseudometric is closely related to the Wasserstein distance on the induced metric space. More precisely, one can show the following.

Lemma B.1.

Let $(X, d_X)$ denote a compact pseudometric space and let $\alpha, \beta \in \mathcal{P}(X)$. Then, for $p \in [1,\infty]$,
$$d_{W,p}^{(X,d_X)}(\alpha,\beta) = d_{W,p}^{(\tilde X,\tilde d_X)}(\Psi_\#\alpha, \Psi_\#\beta), \qquad (35)$$
and in particular the infimum in (33) (resp. in (34) if $p = \infty$) is attained for some $\mu \in \mathcal{C}(\alpha,\beta)$.

Proof. In the course of this proof we focus on the case $p < \infty$ and remark that the case $p = \infty$ follows by similar arguments. The quotient map allows us to define the map $\theta: \mathcal{C}(\alpha,\beta) \to \mathcal{C}(\Psi_\#\alpha, \Psi_\#\beta)$ via $\mu \mapsto (\Psi\times\Psi)_\#\mu$. It is easy to see that $\theta$ is well defined and surjective. Furthermore, it holds by construction that
$$\int_{X\times X} d_X^p(x,y)\, \mu(dx\times dy) = \int_{\tilde X\times\tilde X} \tilde d_X^p(x,y)\, \theta(\mu)(dx\times dy)$$
for all $\mu \in \mathcal{C}(\alpha,\beta)$. Hence, (35) follows.

We come to the second part of the claim. By (Villani, 2008, Sec. 4) there exists an optimal coupling $\tilde\mu^* \in \mathcal{C}(\Psi_\#\alpha, \Psi_\#\beta)$ such that
$$d_{W,p}^{(\tilde X,\tilde d_X)}(\Psi_\#\alpha, \Psi_\#\beta) = \left(\int_{\tilde X\times\tilde X} \tilde d_X^p(x,y)\, \tilde\mu^*(dx\times dy)\right)^{1/p}.$$
In consequence, using our previous results, we find that for any $\mu^* \in \theta^{-1}(\tilde\mu^*)$ it holds
$$d_{W,p}^{(\tilde X,\tilde d_X)}(\Psi_\#\alpha, \Psi_\#\beta) = \left(\int_{\tilde X\times\tilde X} \tilde d_X^p(x,y)\, \tilde\mu^*(dx\times dy)\right)^{1/p} = \left(\int_{X\times X} d_X^p(x,y)\, \mu^*(dx\times dy)\right)^{1/p} = d_{W,p}^{(X,d_X)}(\alpha,\beta).$$
This yields the claim. □
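The identity at the heart of the proof of Lemma B.1, $\int d_X^p\, d\mu = \int \tilde d_X^p\, d\theta(\mu)$ with $\theta(\mu) = (\Psi\times\Psi)_\#\mu$, can be checked on a finite example. The Python sketch below is a hypothetical illustration of ours: it forms the quotient $\tilde X$ by merging points at pseudodistance $0$, pushes a coupling forward along the quotient map, and compares the two transport costs in exact rational arithmetic. The three-point pseudometric, the measures and the (product) coupling are toy choices, not taken from the text.

```python
from fractions import Fraction


def quotient(d):
    """Classes of x ~ x' iff d[x][x'] == 0, plus the quotient map psi as a list of class indices.

    Assumes d is a pseudometric, so comparing with one representative per class suffices.
    """
    classes, psi = [], []
    for x in range(len(d)):
        for k, cls in enumerate(classes):
            if d[x][cls[0]] == 0:
                cls.append(x)
                psi.append(k)
                break
        else:
            psi.append(len(classes))
            classes.append([x])
    return classes, psi


def quotient_metric(d, classes):
    """d~([x], [y]) := d(x, y) on representatives; well defined by the triangle inequality."""
    return [[d[a[0]][b[0]] for b in classes] for a in classes]


def push_coupling(mu, psi, m):
    """(Psi x Psi)_# mu as an m x m matrix."""
    nu = [[Fraction(0)] * m for _ in range(m)]
    for i, row in enumerate(mu):
        for j, w in enumerate(row):
            nu[psi[i]][psi[j]] += w
    return nu


def cost(d, mu, p):
    """Discrete version of the integral of d^p against the coupling mu."""
    return sum(w * d[i][j] ** p for i, row in enumerate(mu) for j, w in enumerate(row))


# Toy pseudometric: points 0 and 1 are at pseudodistance 0.
d = [[0, 0, 2], [0, 0, 2], [2, 2, 0]]
alpha = [Fraction(1, 3)] * 3
beta = [Fraction(1, 2), Fraction(0), Fraction(1, 2)]
mu = [[a * b for b in beta] for a in alpha]  # product coupling, an element of C(alpha, beta)

classes, psi = quotient(d)
d_tilde = quotient_metric(d, classes)
nu = push_coupling(mu, psi, len(classes))
print(classes)                               # [[0, 1], [2]]
print(cost(d, mu, 1), cost(d_tilde, nu, 1))  # 1 1
```

The pushed-forward coupling `nu` has marginals $\Psi_\#\alpha = (2/3, 1/3)$ and $\Psi_\#\beta = (1/2, 1/2)$, so it is an element of $\mathcal{C}(\Psi_\#\alpha, \Psi_\#\beta)$, as the map $\theta$ in the proof requires.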

B.2. Relegated results and proofs from Section 3.

Proof of Lemma 3.11. If $\{u_n\}_{n\in\mathbb N}\subseteq\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ is a decreasing sequence (with respect to pointwise inequality), it is easy to verify that $u := \inf_{n\in\mathbb N}u_n\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$, and thus $u$ is a lower bound of $\{u_n\}_{n\in\mathbb N}$. Then, by Zorn's lemma, $\mathcal D^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)\neq\emptyset$. Therefore, we obtain that

$$u^{\mathrm{sturm}}_{GW,p}(\mathcal X,\mathcal Y) = \inf_{u\in\mathcal D^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)} d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y). \quad □$$

Proof of Lemma 3.12.

Assume otherwise that $u^{-1}(0)=\emptyset$. Then, $u$ is a metric (instead of a pseudometric). Let $(x_0,y_0)\in X\times Y$ be such that $u(x_0,y_0)=\min_{x\in X,\,y\in Y}u(x,y)$. The existence of $(x_0,y_0)$ is guaranteed by the compactness of $X$ and $Y$. We define $u^{(x_0,y_0)}:(X\sqcup Y)\times(X\sqcup Y)\to\mathbb R_{\ge0}$ as follows:
(1) $u^{(x_0,y_0)}|_{X\times X} := u_X$ and $u^{(x_0,y_0)}|_{Y\times Y} := u_Y$;
(2) for $(x,y)\in X\times Y$, $u^{(x_0,y_0)}(x,y) := \min\big(u(x,y),\max(u_X(x,x_0),u_Y(y,y_0))\big)$;
(3) for any $(y,x)\in Y\times X$, $u^{(x_0,y_0)}(y,x) := u^{(x_0,y_0)}(x,y)$.
It is easy to verify that $u^{(x_0,y_0)}\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$. Then, it is obvious that $u^{(x_0,y_0)}(x_0,y_0)=0<u(x_0,y_0)$ and that $u^{(x_0,y_0)}(x,y)\le u(x,y)$ for all $x,y\in X\sqcup Y$, which contradicts $u\in\mathcal D^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$. Therefore, $u^{-1}(0)\neq\emptyset$. □

Lemma B.2.

Let $\mathcal X=(X,u_X,\mu_X)$ and $\mathcal Y=(Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, the set $\mathcal C(\mu_X,\mu_Y)\subseteq\mathcal P(X\times Y)$, where $X\times Y$ is equipped with the ultrametric $\max(u_X,u_Y)$, is compact with respect to weak convergence.

Proof. The proof follows directly from Chowdhury and Mémoli (2019, Lemma 10). □
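For finite $X$ and $Y$, the modification $u^{(x_0,y_0)}$ used in the proof of Lemma 3.12 above can be carried out explicitly. In the illustrative Python sketch below, the helper names are hypothetical; points of $X\sqcup Y$ are indexed $0,\dots,n_X-1$ for $X$ and $n_X,\dots,n_X+n_Y-1$ for $Y$, and `u` is assumed to be the matrix of some $u\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ without zeros on $X\times Y$. The code picks a minimizing pair $(x_0,y_0)$, applies rules (1) to (3) of the proof, and lets one check the strong triangle inequality, $u^{(x_0,y_0)}\le u$ and $u^{(x_0,y_0)}(x_0,y_0)=0$.

```python
def is_ultrametric(u, tol=1e-12):
    """Zero diagonal, symmetry and the strong triangle inequality on a finite matrix."""
    n = len(u)
    return all(
        u[i][i] == 0
        and u[i][j] == u[j][i]
        and u[i][k] <= max(u[i][j], u[j][k]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )


def lower_at_minimizer(u, nX):
    """Apply the modification u^{(x0, y0)} from the proof of Lemma 3.12."""
    n = len(u)
    # (x0, y0): a pair in X x Y minimizing u (it exists since X and Y are finite here).
    _, x0, y0 = min((u[x][y], x, y) for x in range(nX) for y in range(nX, n))
    v = [row[:] for row in u]          # rule (1): the u_X and u_Y blocks are kept
    for x in range(nX):
        for y in range(nX, n):
            # rule (2): min(u(x, y), max(u_X(x, x0), u_Y(y, y0))); rule (3): symmetry
            v[x][y] = v[y][x] = min(u[x][y], max(u[x][x0], u[y][y0]))
    return v, (x0, y0)


# X = {x0, x1}, Y = {y0, y1}; within-space distances 1, cross distances 0.5 or 1.
u = [[0, 1, 0.5, 1],
     [1, 0, 1, 0.5],
     [0.5, 1, 0, 1],
     [1, 0.5, 1, 0]]
v, (x0, y0) = lower_at_minimizer(u, nX=2)
print((x0, y0), v[x0][y0])   # (0, 2) 0
```

The example has no zero on $X\times Y$, so it matches the situation assumed (for contradiction) in the proof; the modified matrix `v` is again an ultrametric coupling of $u_X$ and $u_Y$, lies pointwise below `u`, and vanishes at $(x_0,y_0)$.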

Lemma B.3.

Let $\mathcal X,\mathcal Y\in\mathcal U^w$. Let $\mathcal D\subseteq\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ be a non-empty subset satisfying the following: there exist $(x_0,y_0)\in X\times Y$ and $C>0$ such that $u(x_0,y_0)\le C$ for all $u\in\mathcal D$. Then, $\mathcal D$ is pre-compact with respect to uniform convergence.

Proof. Let $\{u_n\}_{n\in\mathbb N}\subseteq\mathcal D$ be a sequence. Note that $X\times Y\subseteq(X\sqcup Y)\times(X\sqcup Y)$. Let $v_n := u_n|_{X\times Y}$. For any $n\in\mathbb N$ and any $(x,y),(x',y')\in X\times Y$, we have that

$$|u_n(x,y)-u_n(x',y')|\le u_X(x,x')+u_Y(y,y')\le 2\max(u_X,u_Y)\big((x,y),(x',y')\big).$$

Then, $\{v_n\}_{n\in\mathbb N}$ is equicontinuous. Now, since $u_n(x_0,y_0)\le C$, we have for any $(x,y)\in X\times Y$ that

$$u_n(x,y)\le 2\max(u_X,u_Y)\big((x,y),(x_0,y_0)\big)+u_n(x_0,y_0)\le 2\max(\operatorname{diam}(X),\operatorname{diam}(Y))+C.$$

Then, $\{v_n\}_{n\in\mathbb N}$ is uniformly bounded. By the Arzelà-Ascoli theorem, $\{v_n\}_{n\in\mathbb N}$ has a uniformly convergent subsequence. Without loss of generality, we assume that $v: X\times Y\to\mathbb R_{\ge0}$ is the limit of the sequence $\{v_n\}_{n\in\mathbb N}$. Now, we define $u:(X\sqcup Y)\times(X\sqcup Y)\to\mathbb R_{\ge0}$ as follows:
(1) $u|_{X\times X} := u_X$ and $u|_{Y\times Y} := u_Y$;
(2) $u|_{X\times Y} := v$;
(3) for $(y,x)\in Y\times X$, we let $u(y,x) := u(x,y)$.
Then, it is easy to verify that $u\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ and that $u$ is a cluster point of the sequence $\{u_n\}_{n\in\mathbb N}$. Therefore, $\mathcal D$ is pre-compact. □

Lemma B.4.

Let $\mathcal X=(X,u_X,\mu_X)$ and $\mathcal Y=(Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Let $\{\mu_n\}_{n\in\mathbb N}\subseteq\mathcal C(\mu_X,\mu_Y)$ be a sequence converging weakly to some limit $\mu$. Let $\{u_n\}_{n\in\mathbb N}\subseteq\mathcal D^{\mathrm{ult}}(u_X,u_Y)$. Suppose there exist a non-decreasing sequence $\{p_n\}_{n\in\mathbb N}\subseteq[1,\infty)$ and $C>0$ such that

$$\left(\int_{X\times Y}(u_n(x,y))^{p_n}\,\mu_n(dx\times dy)\right)^{1/p_n}\le C\quad\text{for all } n\in\mathbb N.$$

Then, up to taking a subsequence, $\{u_n\}_{n\in\mathbb N}$ converges uniformly to some $u\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$.

Proof. The following argument is adapted from Lemma 3.3 in Sturm (2006). For any $(x_0,y_0)\in\operatorname{supp}(\mu)$, there exist $\varepsilon,\delta>0$ with $\mu(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0))>\delta$ such that, for all sufficiently large $n$,

$$C\ge\left(\int_{X\times Y}(u_n(x,y))^{p_n}\,\mu_n(dx\times dy)\right)^{1/p_n}\ge\int_{X\times Y}u_n(x,y)\,\mu_n(dx\times dy)\ge\int_{B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)}u_n(x,y)\,\mu_n(dx\times dy)$$
$$\ge\int_{B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)}(u_n(x_0,y_0)-\varepsilon)\,\mu_n(dx\times dy)\ge(u_n(x_0,y_0)-\varepsilon)\big(\mu(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0))-\delta\big).$$

Therefore, $\{u_n(x_0,y_0)\}_{n\in\mathbb N}$ is uniformly bounded. By Lemma B.3, $\{u_n\}_{n\in\mathbb N}$ has a uniformly convergent subsequence. □

Lemma B.5.

Let $X, Y$ be ultrametric spaces. Then, $\Delta(u_X,u_Y): X\times Y\times X\times Y\to\mathbb R_{\ge0}$ is continuous with respect to the product topology or, equivalently, with respect to $\max(u_X,u_Y,u_X,u_Y)$.

Proof. Fix $(x,y,x',y')\in X\times Y\times X\times Y$ and $\varepsilon>0$. Choose $0<\delta<\varepsilon$ such that $\delta<u_X(x,x')$ if $x\neq x'$ and $\delta<u_Y(y,y')$ if $y\neq y'$. Then, consider any point $(x_1,y_1,x_1',y_1')\in X\times Y\times X\times Y$ such that $u_X(x,x_1),\,u_Y(y,y_1),\,u_X(x',x_1'),\,u_Y(y',y_1')\le\delta$. For $u_X(x_1,x_1')$, we have the following two situations:
(1) $x=x'$: $u_X(x_1,x_1')\le\max(u_X(x_1,x),u_X(x,x_1'))\le\delta<\varepsilon$;
(2) $x\neq x'$: $u_X(x_1,x_1')\le\max(u_X(x_1,x),u_X(x,x'),u_X(x',x_1'))=u_X(x,x')$. Similarly, $u_X(x,x')\le u_X(x_1,x_1')$ and thus $u_X(x,x')=u_X(x_1,x_1')$.
A similar result holds for $u_Y(y_1,y_1')$. This leads to four cases for $\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))$:
(1) $x=x'$, $y=y'$: In this case we have $u_X(x_1,x_1'),\,u_Y(y_1,y_1')<\varepsilon$. Then,
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))-\Delta(u_X(x,x'),u_Y(y,y'))|=\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))\le\varepsilon;$$
(2) $x=x'$, $y\neq y'$: Now $u_X(x_1,x_1')<\varepsilon$ and $u_Y(y_1,y_1')=u_Y(y,y')$. If $u_Y(y,y')\ge\varepsilon>u_X(x_1,x_1')$, then
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))-\Delta(u_X(x,x'),u_Y(y,y'))|=|u_Y(y,y')-u_Y(y,y')|=0.$$
If $u_Y(y,y')<\varepsilon$, then $\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))\le\varepsilon$ and $\Delta(u_X(x,x'),u_Y(y,y'))=u_Y(y,y')\le\varepsilon$. Therefore, $|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))-\Delta(u_X(x,x'),u_Y(y,y'))|\le\varepsilon$;
(3) $x\neq x'$, $y=y'$: Similarly to (2), we have $|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))-\Delta(u_X(x,x'),u_Y(y,y'))|\le\varepsilon$;
(4) $x\neq x'$, $y\neq y'$: Now $u_X(x_1,x_1')=u_X(x,x')$ and $u_Y(y_1,y_1')=u_Y(y,y')$. Therefore, $|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))-\Delta(u_X(x,x'),u_Y(y,y'))|=0$.
In conclusion, whenever $u_X(x,x_1),\,u_Y(y,y_1),\,u_X(x',x_1'),\,u_Y(y',y_1')\le\delta$, we have that
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))-\Delta(u_X(x,x'),u_Y(y,y'))|\le\varepsilon.$$
Therefore, $\Delta(u_X,u_Y)$ is continuous. □

Appendix C. Missing proofs from Section 4

C.1. Missing details of the proof of Theorem 4.4.

Proof of Claim 1 (Theorem 4.4).

First note by Theorem 5.1 that $u_{GW,p}(\hat\Delta(a),\hat\Delta(b))\ge\mathrm{SLB}^{\mathrm{ult}}_p(\hat\Delta(a),\hat\Delta(b))$. It is easy to verify that $\mathrm{SLB}^{\mathrm{ult}}_p(\hat\Delta(a),\hat\Delta(b))=2^{-1/p}\Delta(a,b)$. On the other hand, consider the diagonal coupling between $\mu_a$ and $\mu_b$. Then, for $p\in[1,\infty)$,

$$u_{GW,p}(\hat\Delta(a),\hat\Delta(b))\le\left(2\cdot\Delta(a,b)^p\cdot\tfrac12\cdot\tfrac12\right)^{1/p}=2^{-1/p}\Delta(a,b),$$

and for $p=\infty$,

$$u_{GW,\infty}(\hat\Delta(a),\hat\Delta(b))\le\Delta(a,b).$$

Therefore, $u_{GW,p}(\hat\Delta(a),\hat\Delta(b))=2^{-1/p}\Delta(a,b)$. □

C.2. Proof of Theorem 4.2.

In this section, we prove Theorem 4.2 by slightly modifying the proof of Proposition 5.3 in Mémoli (2011).

Lemma C.1.

Let $X$ and $Y$ be compact ultrametric spaces and let $S\subseteq X\times Y$ be non-empty. Assume that

$$\sup_{(x,y),(x',y')\in S}\Delta(u_X(x,x'),u_Y(y,y'))<\eta.$$

Define $u_S:(X\sqcup Y)\times(X\sqcup Y)\to\mathbb R_{\ge0}$ as follows:
(1) $u_S|_{X\times X} := u_X$ and $u_S|_{Y\times Y} := u_Y$;
(2) for any $(x,y)\in X\times Y$, $u_S(x,y) := \inf_{(x',y')\in S}\max(u_X(x,x'),u_Y(y,y'),\eta)$;
(3) for any $(x,y)\in X\times Y$, $u_S(y,x) := u_S(x,y)$.
Then, $u_S\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ and $u_S(x,y)\le\eta$ for all $(x,y)\in S$.

Proof. That $u_S\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ essentially follows from Zarichnyi (2005, Lemma 1.1). For any $(x,y)\in S$, we let $(x',y') := (x,y)$; then

$$u_S(x,y)\le\max(u_X(x,x),u_Y(y,y),\eta)=\max(0,0,\eta)=\eta. \quad □$$
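The construction of $u_S$ in Lemma C.1 is explicit enough to check on small examples. The Python sketch below is illustrative: the names are hypothetical, finite $X$ and $Y$ are given by distance matrices `uX` and `uY`, the set $S$ by index pairs, and $\Delta(a,b)$ is taken to be $\max(a,b)$ for $a\neq b$ and $0$ for $a=b$ (an assumption about the convention fixed in the main text). It verifies the hypothesis of the lemma, builds $u_S$ on $X\sqcup Y$ (with $Y$-indices shifted by $|X|$), and lets one check the strong triangle inequality and $u_S\le\eta$ on $S$.

```python
def delta(a, b):
    # Assumed convention: Delta(a, b) = max(a, b) if a != b, and 0 if a == b.
    return 0 if a == b else max(a, b)


def build_u_S(uX, uY, S, eta):
    """Return the matrix of u_S on X ⊔ Y from Lemma C.1 (Y-indices shifted by len(uX))."""
    nX, nY = len(uX), len(uY)
    # Hypothesis of Lemma C.1: sup over pairs in S of Delta(u_X(x, x'), u_Y(y, y')) < eta.
    worst = max(delta(uX[x][xp], uY[y][yp]) for (x, y) in S for (xp, yp) in S)
    if not worst < eta:
        raise ValueError("hypothesis of Lemma C.1 fails: sup Delta >= eta")
    n = nX + nY
    u = [[0] * n for _ in range(n)]
    for i in range(nX):                 # (1) u_S restricted to X x X is u_X
        for j in range(nX):
            u[i][j] = uX[i][j]
    for i in range(nY):                 # (1) u_S restricted to Y x Y is u_Y
        for j in range(nY):
            u[nX + i][nX + j] = uY[i][j]
    for x in range(nX):                 # (2) cross distances, (3) symmetry
        for y in range(nY):
            val = min(max(uX[x][xp], uY[y][yp], eta) for (xp, yp) in S)
            u[x][nX + y] = u[nX + y][x] = val
    return u


def is_ultrametric(u):
    """Symmetry and the strong triangle inequality on a finite matrix."""
    n = len(u)
    return all(u[i][j] == u[j][i] and u[i][k] <= max(u[i][j], u[j][k])
               for i in range(n) for j in range(n) for k in range(n))


uX = [[0, 1], [1, 0]]
uY = [[0, 1], [1, 0]]
S = [(0, 0), (1, 1)]                    # pairs (x, y) with x in X and y in Y
u_S = build_u_S(uX, uY, S, eta=0.5)
print(u_S)   # [[0, 1, 0.5, 1], [1, 0, 1, 0.5], [0.5, 1, 0, 1], [1, 0.5, 1, 0]]
```

In this example the matched pairs in $S$ end up at distance exactly $\eta$, while all other cross distances are pushed up to the common diameter $1$, which is the mechanism used later in the proof of Theorem 4.2 to control $u_S$ on the sets $S_i$.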

Let $\mu\in\mathcal C(\mu_X,\mu_Y)$ be a coupling such that $\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)}<\delta^4$. Set $\varepsilon := v_\delta(X)$. Then, there exist $1\le N\le\lceil1/\delta\rceil$ and points $x_1,\dots,x_N$ in $X$ such that $\min_{i\neq j}u_X(x_i,x_j)\ge\varepsilon$, $\min_i\mu_X(B^X_\varepsilon(x_i))>\delta$ and $\mu_X\big(\bigcup_{i=1}^N B^X_\varepsilon(x_i)\big)\ge1-\varepsilon$.

Claim 1:

For each $i=1,\dots,N$ there exists $y_i\in Y$ such that

$$\mu\big(B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y_i)\big)\ge(1-\delta)\,\mu_X\big(B^X_\varepsilon(x_i)\big).$$

Proof.

Assume the claim is false for some $i$ and let $Q_i(y) := B^X_\varepsilon(x_i)\times\big(Y\setminus B^Y_{2(\varepsilon+\delta)}(y)\big)$. Then, as $\mu\in\mathcal C(\mu_X,\mu_Y)$, it holds that

$$\mu_X(B^X_\varepsilon(x_i))=\mu(B^X_\varepsilon(x_i)\times Y)=\mu\big(B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y)\big)+\mu\big(B^X_\varepsilon(x_i)\times(Y\setminus B^Y_{2(\varepsilon+\delta)}(y))\big).$$

Consequently, we have that $\mu(Q_i(y))\ge\delta\,\mu_X(B^X_\varepsilon(x_i))$ for all $y\in Y$. Further, let

$$Q_i := \big\{(x,y,x',y')\in X\times Y\times X\times Y \;\big|\; x,x'\in B^X_\varepsilon(x_i)\text{ and }u_Y(y,y')\ge2(\varepsilon+\delta)\big\}.$$

Clearly, it holds for $(x,y,x',y')\in Q_i$ that

$$\Gamma_{X,Y}(x,y,x',y')=\Delta(u_X(x,x'),u_Y(y,y'))=u_Y(y,y')\ge\delta.$$

Further, we have that $\mu\otimes\mu(Q_i)\ge\delta^3$. Indeed, it holds

$$\mu\otimes\mu(Q_i)=\iint_{B^X_\varepsilon(x_i)\times Y}\iint_{Q_i(y)}\mu(dx'\times dy')\,\mu(dx\times dy)=\iint_{B^X_\varepsilon(x_i)\times Y}\mu(Q_i(y))\,\mu(dx\times dy)$$
$$=\mu_X(B^X_\varepsilon(x_i))\int_Y\mu(Q_i(y))\,\mu_Y(dy)\ge\big(\mu_X(B^X_\varepsilon(x_i))\big)^2\delta\ge\delta^3.$$

However, this yields that

$$\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)}\ge\|\Gamma_{X,Y}\|_{L^1(\mu\otimes\mu)}\ge\|\Gamma_{X,Y}\mathbb 1_{Q_i}\|_{L^1(\mu\otimes\mu)}\ge\delta\cdot\mu\otimes\mu(Q_i)\ge\delta^4,$$

which contradicts $\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)}<\delta^4$. □

Define for each $i=1,\dots,N$

$$S_i := B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y_i).$$

Then, by Claim 1, $\mu(S_i)\ge\delta(1-\delta)$ for all $i=1,\dots,N$.

Claim 2: $\Gamma_{X,Y}(x_i,y_i,x_j,y_j)\le2(\varepsilon+\delta)$ for all $i,j=1,\dots,N$.

Proof.

Assume the claim fails for some $(i,j)$, i.e.,

$$\Delta(u_X(x_i,x_j),u_Y(y_i,y_j))>2(\varepsilon+\delta)>0.$$

Then, $\Delta(u_X(x_i,x_j),u_Y(y_i,y_j))=\max(u_X(x_i,x_j),u_Y(y_i,y_j))$. We assume without loss of generality that $u_X(x_i,x_j)=\Delta(u_X(x_i,x_j),u_Y(y_i,y_j))>u_Y(y_i,y_j)$. Consider any $(x,y)\in S_i$ and $(x',y')\in S_j$. By the strong triangle inequality and the fact that $u_X(x_i,x_j)>2(\varepsilon+\delta)>\varepsilon$, it is easy to verify that $u_X(x,x')=u_X(x_i,x_j)$. Moreover,

$$u_Y(y,y')\le\max(u_Y(y,y_i),u_Y(y_i,y_j),u_Y(y_j,y'))<\max(2(\varepsilon+\delta),u_X(x_i,x_j),2(\varepsilon+\delta))=u_X(x_i,x_j)=u_X(x,x').$$

Therefore,

$$\Gamma_{X,Y}(x,y,x',y')=u_X(x,x')=u_X(x_i,x_j)=\Gamma_{X,Y}(x_i,y_i,x_j,y_j)>2(\varepsilon+\delta)>2\delta.$$

Consequently, we have that

$$\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)}\ge\|\Gamma_{X,Y}\|_{L^1(\mu\otimes\mu)}\ge\|\Gamma_{X,Y}\mathbb 1_{S_i\times S_j}\|_{L^1(\mu\otimes\mu)}\ge2\delta\,\mu(S_i)\,\mu(S_j)\ge2\delta\big(\delta(1-\delta)\big)^2.$$

However, for $\delta\le1/2$, $2\delta\big(\delta(1-\delta)\big)^2\ge\delta^4$. This leads to a contradiction. □

Consider $S\subseteq X\times Y$ given by $S := \{(x_i,y_i)\mid i=1,\dots,N\}$. Let $u_S$ be the ultrametric on $X\sqcup Y$ given by Lemma C.1. By Claim 2, $\sup_{(x,y),(x',y')\in S}\Gamma_{X,Y}(x,y,x',y')\le2(\varepsilon+\delta)$. Then, for all $i=1,\dots,N$ we have that $u_S(x_i,y_i)\le2(\varepsilon+\delta)$, and for any $(x,y)\in X\times Y$ we have that

$$u_S(x,y)\le\max\big(\operatorname{diam}(X),\operatorname{diam}(Y),2(\varepsilon+\delta)\big)\le M,$$

where $M$ is the maximum of $\operatorname{diam}(X)$, $\operatorname{diam}(Y)$ and a numerical upper bound for $2(\varepsilon+\delta)$; this bound follows from the assumption on $\delta$ and the corresponding bound on $\varepsilon=v_\delta(X)$.

Claim 3:

Fix $i\in\{1,\dots,N\}$. Then, for all $(x,y)\in S_i$, it holds that $u_S(x,y)\le2(\varepsilon+\delta)$.

Proof.

Let $(x,y)\in S_i$. Then, $u_X(x,x_i)\le\varepsilon$ and $u_Y(y,y_i)\le2(\varepsilon+\delta)$. Then, by the strong triangle inequality for $u_S$ we obtain

$$u_S(x,y)\le\max\{u_X(x,x_i),u_Y(y,y_i),u_S(x_i,y_i)\}\le\max\{\varepsilon,2(\varepsilon+\delta),2(\varepsilon+\delta)\}\le2(\varepsilon+\delta). \quad □$$

Let $L := \bigcup_{i=1}^N S_i$. The next step is to estimate the mass of $\mu$ in the complement of $L$.

Claim 4: $\mu(X\times Y\setminus L)\le\varepsilon+\delta$.

Proof.

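For each $i=1,\dots,N$, let $A_i := B^X_\varepsilon(x_i)\times\big(Y\setminus B^Y_{2(\varepsilon+\delta)}(y_i)\big)$. Then,

$$A_i=\big(B^X_\varepsilon(x_i)\times Y\big)\setminus\big(B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y_i)\big)=\big(B^X_\varepsilon(x_i)\times Y\big)\setminus S_i.$$

Hence,

$$\mu(A_i)=\mu(B^X_\varepsilon(x_i)\times Y)-\mu(S_i)=\mu_X(B^X_\varepsilon(x_i))-\mu(S_i),$$

where the last equality follows from the fact that $\mu\in\mathcal C(\mu_X,\mu_Y)$. By Claim 1, we have that $\mu(S_i)\ge\mu_X(B^X_\varepsilon(x_i))(1-\delta)$. Consequently, we obtain $\mu(A_i)\le\mu_X(B^X_\varepsilon(x_i))\,\delta$. Notice that

$$X\times Y\setminus L\subseteq\Big(\big(X\setminus\textstyle\bigcup_{i=1}^N B^X_\varepsilon(x_i)\big)\times Y\Big)\cup\Big(\textstyle\bigcup_{i=1}^N A_i\Big).$$

Hence,

$$\mu(X\times Y\setminus L)\le\mu_X\Big(X\setminus\bigcup_{i=1}^N B^X_\varepsilon(x_i)\Big)+\sum_{i=1}^N\mu(A_i)\le1-\mu_X\Big(\bigcup_{i=1}^N B^X_\varepsilon(x_i)\Big)+\sum_{i=1}^N\delta\,\mu_X(B^X_\varepsilon(x_i))\le\varepsilon+\delta.$$

Here, the last inequality follows from the construction of the $x_i$'s at the beginning of this section and from the fact that $N\le\lceil1/\delta\rceil$. □

Now,

$$\iint_{X\times Y}u_S^p(x,y)\,\mu(dx\times dy)=\Big(\iint_L+\iint_{X\times Y\setminus L}\Big)u_S^p(x,y)\,\mu(dx\times dy)\le\big(2(\varepsilon+\delta)\big)^p+M^p\cdot(\varepsilon+\delta).$$

Since $a^{1/p}+b^{1/p}\ge(a+b)^{1/p}$ for any $a,b\ge0$ and $p\ge1$, we obtain

$$u^{\mathrm{sturm}}_{GW,p}(\mathcal X,\mathcal Y)\le(\varepsilon+\delta)^{1/p}\Big(2(\varepsilon+\delta)^{1-1/p}+M\Big)\le(v_\delta(X)+\delta)^{1/p}\cdot M',$$

where we used $\varepsilon=v_\delta(X)$ together with the numerical bound on $\varepsilon+\delta$ that enters the definition of $M$, and where $M'$ denotes $2\max(\operatorname{diam}(X),\operatorname{diam}(Y))$ plus a numerical constant, chosen such that $M'\ge M+27$. Since the roles of $X$ and $Y$ are symmetric, we have that

$$u^{\mathrm{sturm}}_{GW,p}(\mathcal X,\mathcal Y)\le\big(\max(v_\delta(X),v_\delta(Y))+\delta\big)^{1/p}\cdot M'.$$

This concludes the proof. □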
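The hypothesis $\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)}<\delta^4$ and Claims 1 and 2 are all statements about the distortion $\Gamma_{X,Y}(x,y,x',y')=\Delta(u_X(x,x'),u_Y(y,y'))$. For finite spaces and a coupling with finitely many atoms, its $L^p(\mu\otimes\mu)$ norm is a finite double sum, and for $p=\infty$ a maximum over pairs of atoms. The Python sketch below computes it under the assumed convention $\Delta(a,b)=\max(a,b)$ for $a\neq b$ and $\Delta(a,a)=0$; the function names are hypothetical, and couplings are passed as dictionaries from index pairs to masses. As a sanity check, it uses two two-point spaces with distances $a=1$ and $b=2$, each carrying uniform masses (an assumption made for illustration), together with the diagonal coupling.

```python
import itertools
import math


def delta(a, b):
    # Assumed convention: Delta(a, b) = max(a, b) if a != b, and 0 if a == b.
    return 0 if a == b else max(a, b)


def gamma_norm(uX, uY, mu, p):
    """||Gamma_{X,Y}||_{L^p(mu ⊗ mu)} for a coupling mu given as {(x, y): mass}."""
    atoms = [(xy, w) for xy, w in mu.items() if w > 0]
    pairs = itertools.product(atoms, repeat=2)
    if p == math.inf:
        # Essential supremum w.r.t. mu ⊗ mu: the maximum over pairs of atoms.
        return max(delta(uX[x][xp], uY[y][yp]) for ((x, y), _), ((xp, yp), _) in pairs)
    total = sum(w * wp * delta(uX[x][xp], uY[y][yp]) ** p
                for ((x, y), w), ((xp, yp), wp) in pairs)
    return total ** (1.0 / p)


# Two-point spaces with distances a = 1 and b = 2, diagonal coupling with masses 1/2.
uX = [[0, 1], [1, 0]]
uY = [[0, 2], [2, 0]]
mu = {(0, 0): 0.5, (1, 1): 0.5}
for p in (1, 2, math.inf):
    print(p, gamma_norm(uX, uY, mu, p))
```

For $p=1,2,\infty$ this yields $1$, $\sqrt2$ and $2$, i.e. $2^{-1/p}\Delta(1,2)$, and the values increase with $p$, in line with the inequality $\|\Gamma_{X,Y}\|_{L^1(\mu\otimes\mu)}\le\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)}$ used in Claims 1 and 2.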


Appendix D. Missing details from Section 8

D.1. Phylogenetic tree shape comparison based on $\mathrm{SLB}^{\mathrm{ult}}_1$. In this subsection, we additionally illustrate, as dendrograms, the results obtained for the comparisons of the phylogenetic tree shapes $T_i$ (defined in Section 8.1) based on $\mathrm{SLB}^{\mathrm{ult}}_1$ and $d_{\mathrm{CP}}$.

Figure 13. Phylogenetic tree shape comparison: Visualization of the results for the comparison of the phylogenetic tree shapes $T_i$ based on $\mathrm{SLB}^{\mathrm{ult}}_1$ (top row) and $d_{\mathrm{CP}}$ (bottom row) as single linkage (left) and complete linkage dendrograms (right).

D.2. Phylogenetic tree shape comparison based on $\mathrm{SLB}$. Here, we additionally illustrate, as dendrograms, the results obtained for the comparisons of the ultra-dissimilarity spaces $X_i$ (induced by the phylogenetic tree shapes $T_i$ defined in Section 8.1) based on $\mathrm{SLB}$.

Figure 14. Phylogenetic tree shape comparison based on $\mathrm{SLB}$: Representation of the results for the comparisons of the ultra-dissimilarity spaces $X_i$ based on $\mathrm{SLB}$ as single linkage (left) and complete linkage dendrograms (right).

Department of Mathematics and Department of Computer Science and Engineering, The Ohio State University

Institute for Mathematical Stochastics, University of Göttingen

Department of Mathematics, The Ohio State University

Institute for Mathematical Stochastics, University of Göttingen