The ultrametric Gromov-Wasserstein distance
Facundo Mémoli, Axel Munk, Zhengchao Wan, Christoph Weitkamp
Abstract.
In this paper, we investigate compact ultrametric measure spaces, which form a subset $\mathcal{U}^w$ of the collection of all metric measure spaces $\mathcal{M}^w$. Similarly to the ultrametric Gromov-Hausdorff distance on the collection of ultrametric spaces $\mathcal{U}$, we define ultrametric versions of two metrics on $\mathcal{U}^w$, namely of Sturm's distance of order $p$ and of the Gromov-Wasserstein distance of order $p$. We study the basic topological and geometric properties of these distances as well as their relation, and derive for $p = \infty$ a polynomial time algorithm for their calculation. Further, several lower bounds for both distances are derived, and some of our results are generalized to the case of finite ultra-dissimilarity spaces.

1. Introduction
Over the last decade, the acquisition of ever more complex data, structures and shapes has increased drastically. Consequently, the need to develop meaningful methods for comparing general objects has become more and more apparent. In numerous applications in molecular biology (Holm and Sander, 1993; Kufareva and Abagyan, 2011), computer vision (Lowe, 2001; Jain and Dorai, 2000) and electrical engineering (Papazov et al., 2012; Kuo et al., 2014), it is important to distinguish between different objects while regarding the same object in different spatial orientations as equal. Furthermore, the comparison of graphs, trees and networks, where mainly the underlying connectivity structure matters, has grown in importance (Chen and Safro, 2011; Dong and Sawin, 2020). One possibility to compare two general objects in a pose invariant manner is to model them as metric spaces $(X, d_X)$ and $(Y, d_Y)$ and to regard them as elements of the collection of isometry classes of compact metric spaces, denoted by $\mathcal{M}$ (i.e., two compact metric spaces $(X, d_X)$ and $(Y, d_Y)$ are in the same class if and only if they are isometric to each other, which we denote by $X \cong Y$). Then, it is possible to compare $(X, d_X)$ and $(Y, d_Y)$ via the Gromov-Hausdorff distance, which defines a distance on $\mathcal{M}$. The Gromov-Hausdorff distance between $(X, d_X)$ and $(Y, d_Y)$ is defined as

$$d_{GH}(X, Y) := \inf_{Z, \phi, \psi} d_H^{(Z, d_Z)}\big(\phi(X), \psi(Y)\big), \tag{1}$$

where $\phi : X \to Z$ and $\psi : Y \to Z$ are isometric embeddings into a metric space $(Z, d_Z)$ and $d_H^{(Z, d_Z)}$ denotes the Hausdorff distance on $Z$. The Hausdorff distance is a metric on the collection of compact subsets of a metric space $(Z, d_Z)$, which is denoted by $\mathcal{S}(Z)$, and for $A, B \in \mathcal{S}(Z)$ it is defined as

$$d_H^{(Z, d_Z)}(A, B) := \max\Big(\sup_{a \in A} \inf_{b \in B} d_Z(a, b),\ \sup_{b \in B} \inf_{a \in A} d_Z(a, b)\Big). \tag{2}$$

While the Gromov-Hausdorff distance has been applied successfully for various shape and data analysis tasks (see e.g.
Mémoli and Sapiro (2004); Bronstein et al. (2006a,b, 2009a,b);
Chazal et al. (2009); Bronstein et al. (2010); Carlsson and Mémoli (2010)), it turns out that it is generally convenient to equip the modelled objects with more structure and to model them as metric measure spaces (Mémoli, 2007, 2011). A metric measure space $\mathcal{X} = (X, d_X, \mu_X)$ is a triple, where $(X, d_X)$ denotes a metric space and $\mu_X$ stands for a Borel probability measure on $X$ with full support. This additional probability measure can be thought of as signalling the importance of different regions in the modelled object. Moreover, two metric measure spaces $\mathcal{X} = (X, d_X, \mu_X)$ and $\mathcal{Y} = (Y, d_Y, \mu_Y)$ are considered isomorphic (denoted by $\mathcal{X} \cong_w \mathcal{Y}$) if and only if there exists an isometry $\varphi : (X, d_X) \to (Y, d_Y)$ such that $\varphi_\# \mu_X = \mu_Y$. Here, $\varphi_\#$ denotes the pushforward map. From now on, $\mathcal{M}^w$ denotes the collection of all (isomorphism classes of) compact metric measure spaces.

The additional structure of metric measure spaces allows us to regard the modelled objects as probability measures instead of compact sets. Hence, it is possible to substitute the Hausdorff component in (1) by a relaxed notion of proximity, namely the Wasserstein distance. This distance is fundamental to a variety of mathematical developments and is also known as the Kantorovich distance (Kantorovich, 1942), the Kantorovich-Rubinstein distance (Kantorovich and Rubinstein, 1958), the Mallows distance (Mallows, 1972) or the Earth Mover's distance (Rubner et al., 2000). Given a compact metric space $(Z, d_Z)$, let $\mathcal{P}(Z)$ denote the space of probability measures on $Z$ and let $\alpha, \beta \in \mathcal{P}(Z)$.
Then, the Wasserstein distance of order $p$, for $1 \le p < \infty$, between $\alpha$ and $\beta$ is defined as

$$d_{W,p}^{(Z, d_Z)}(\alpha, \beta) := \left(\inf_{\mu \in \mathcal{C}(\alpha, \beta)} \int_{Z \times Z} d_Z^p(x, y)\, \mu(dx \times dy)\right)^{1/p}, \tag{3}$$

and for $p = \infty$ as

$$d_{W,\infty}^{(Z, d_Z)}(\alpha, \beta) := \inf_{\mu \in \mathcal{C}(\alpha, \beta)} \sup_{(x, y) \in \mathrm{supp}(\mu)} d_Z(x, y), \tag{4}$$

where $\mathcal{C}(\alpha, \beta)$ denotes the set of all couplings of $\alpha$ and $\beta$, i.e., the set of all measures $\mu$ on the product space $Z \times Z$ such that $\mu(A \times Z) = \alpha(A)$ and $\mu(Z \times B) = \beta(B)$ for all measurable sets $A, B \subseteq Z$. Since $Z$ is compact, $d_{W,p}^{(Z, d_Z)}$ defines a metric on $\mathcal{P}(Z)$. It is well known (Villani, 2003) that the Wasserstein distance between probability measures on the real line admits a closed form solution (cf. Remark 2.18).

Sturm (2006) has shown that replacing the Hausdorff distance in (1) with the Wasserstein distance indeed yields a meaningful metric on $\mathcal{M}^w$. Let $\mathcal{X} = (X, d_X, \mu_X)$ and $\mathcal{Y} = (Y, d_Y, \mu_Y)$ be two metric measure spaces. Then, Sturm's Gromov-Wasserstein distance of order $p$, $1 \le p \le \infty$, is defined as

$$d^{\mathrm{sturm}}_{GW,p}(\mathcal{X}, \mathcal{Y}) := \inf_{Z, \phi, \psi} d_{W,p}^{(Z, d_Z)}(\phi_\# \mu_X, \psi_\# \mu_Y), \tag{5}$$

where $\phi : X \to Z$ and $\psi : Y \to Z$ are isometric embeddings into the metric space $(Z, d_Z)$.

Based on similar ideas but a different representation of the Gromov-Hausdorff distance, Mémoli (2007, 2011) derived a computationally more tractable and topologically equivalent metric on $\mathcal{M}^w$, namely the Gromov-Wasserstein distance. Let $\mathcal{C}(\mu_X, \mu_Y)$ denote the set of couplings of $\mu_X$ and $\mu_Y$. For $p \in [1, \infty)$, the $p$-distortion of $\mu \in \mathcal{C}(\mu_X, \mu_Y)$ is defined as

$$\mathrm{dis}_p(\mu) := \left(\iint_{X \times Y \times X \times Y} \big|d_X(x, x') - d_Y(y, y')\big|^p \, \mu(dx \times dy)\, \mu(dx' \times dy')\right)^{1/p}, \tag{6}$$

and for $p = \infty$ it is given as

$$\mathrm{dis}_\infty(\mu) := \sup_{\substack{x, x' \in X,\ y, y' \in Y \\ (x, y), (x', y') \in \mathrm{supp}(\mu)}} \big|d_X(x, x') - d_Y(y, y')\big|,$$

where $\mathrm{supp}(\mu)$ denotes the support of $\mu$.
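On the real line, the closed form mentioned above becomes especially simple for two uniform empirical measures with the same number of atoms: matching the sorted samples is an optimal coupling, both for (3) and for (4). The following is a minimal sketch under exactly this assumption (equal sample sizes, uniform weights); the function name `wasserstein_1d` is ours and is not taken from the paper or from any library.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p (1 <= p <= inf) between the uniform empirical measures on xs and ys.

    Assumption (ours, for illustration): equal sample sizes and uniform weights.
    Then the monotone (sorted) matching is an optimal coupling on the real line,
    so no transport problem has to be solved.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty samples of equal size")
    gaps = [abs(a - b) for a, b in zip(sorted(xs), sorted(ys))]
    if p == float("inf"):
        # Formula (4): the largest displacement under the optimal matching.
        return max(gaps)
    # Formula (3): every matched pair carries mass 1/n.
    return (sum(g ** p for g in gaps) / len(gaps)) ** (1.0 / p)


print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))        # 1.0
print(wasserstein_1d([0.0, 0.0], [0.0, 2.0], p=2))              # 1.4142135623730951
print(wasserstein_1d([0.0, 0.0], [0.0, 2.0], p=float("inf")))  # 2.0
```

Sorting costs $O(n \log n)$. General weights would require matching quantile functions instead, which this sketch does not attempt.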
The Gromov-Wasserstein distance of order $p$, $1 \le p \le \infty$, is defined as

$$d_{GW,p}(\mathcal{X}, \mathcal{Y}) := \frac{1}{2} \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \mathrm{dis}_p(\mu). \tag{7}$$

Although both $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$, $1 \le p \le \infty$, are in general believed to be NP-hard to compute (Mémoli, 2011), it is possible to efficiently approximate local minima of $d_{GW,p}$ via conditional gradient descent (Mémoli, 2011; Peyré et al., 2016). This has led to numerous applications and extensions of this distance (Alvarez-Melis and Jaakkola, 2018; Titouan et al., 2019; Bunne et al., 2019; Chowdhury and Needham, 2020).

Clearly, the set $\mathcal{M}^w$ contains various, extremely general spaces. However, in many applications one has prior knowledge about the metric measure spaces under consideration, and it is often reasonable to restrict oneself to a specific subset $\mathcal{O}^w \subseteq \mathcal{M}^w$. For instance, it could be known that the metrics of the spaces considered are induced by the shortest path metric on some underlying trees, and hence it is unnecessary to consider the calculation of $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$, $1 \le p \le \infty$, for all of $\mathcal{M}^w$. The potential advantages of focusing on a specific subset $\mathcal{O}^w$ are twofold. On the one hand, it might be possible to use the features of $\mathcal{O}^w$ to gain computational benefits. On the other hand, it might be possible to refine the definitions of $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$, $1 \le p \le \infty$, to obtain a more informative comparison on $\mathcal{O}^w$. Naturally, it is of interest to identify and study these subclasses and the corresponding refinements. This approach has been pursued to study (variants of) the Gromov-Hausdorff distance on compact ultrametric spaces by Zarichnyi (2005) and Qiu (2009), and on compact $p$-metric spaces in Mémoli et al. (2019). Here, the metric space $(X, d_X^{(p)})$ is called a $p$-metric space ($1 \le p < \infty$) if for all $x, x', x'' \in X$ it holds that

$$d_X^{(p)}(x, x'') \le \Big(d_X^{(p)}(x, x')^p + d_X^{(p)}(x', x'')^p\Big)^{1/p}.$$
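For finite metric measure spaces, a coupling $\mu \in \mathcal{C}(\mu_X, \mu_Y)$ is a nonnegative matrix whose row sums are $\mu_X$ and whose column sums are $\mu_Y$, and the $p$-distortion (6) becomes a finite double sum. The minimal sketch below (the helper names `support`, `distortion` and `product_coupling` are ours, for illustration only) evaluates (6) for a given coupling. It does not solve the non-convex minimization in (7). Since (7) is an infimum, every admissible coupling, for instance the product coupling $\mu_X \otimes \mu_Y$, only yields the upper bound $d_{GW,p} \le \frac{1}{2}\mathrm{dis}_p(\mu)$.

```python
from itertools import product


def support(coupling):
    """Index pairs (i, j) carrying positive mass, i.e. supp(mu)."""
    return [(i, j) for i, row in enumerate(coupling) for j, w in enumerate(row) if w > 0]


def distortion(dX, dY, coupling, p=1.0):
    """p-distortion (6) of a coupling between finite metric measure spaces.

    Assumption (ours): dX, dY are distance matrices and coupling[i][j] is the
    mass sent from x_i to y_j (row sums mu_X, column sums mu_Y).
    """
    pairs = support(coupling)
    if p == float("inf"):
        # sup over (x, y), (x', y') in supp(mu) of |d_X(x, x') - d_Y(y, y')|
        return max(abs(dX[i][k] - dY[j][l]) for (i, j), (k, l) in product(pairs, repeat=2))
    total = sum(
        coupling[i][j] * coupling[k][l] * abs(dX[i][k] - dY[j][l]) ** p
        for (i, j), (k, l) in product(pairs, repeat=2)
    )
    return total ** (1.0 / p)


def product_coupling(mu_x, mu_y):
    """The independent coupling mu_X (x) mu_Y, which is always admissible in (7)."""
    return [[a * b for b in mu_y] for a in mu_x]


# Two-point spaces with interpoint distances 1 and 3, uniform measures.
dX, dY = [[0, 1], [1, 0]], [[0, 3], [3, 0]]
identity = [[0.5, 0.0], [0.0, 0.5]]
print(distortion(dX, dY, identity))                                  # 1.0
print(distortion(dX, dY, product_coupling([0.5, 0.5], [0.5, 0.5])))  # 1.5
print(distortion(dX, dY, identity, p=float("inf")))                  # 2
```

In this toy example, the matching coupling gives $d_{GW,1} \le 0.5$, which is smaller than the bound $0.75$ obtained from the product coupling.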
Further, the metric space $(X, u_X)$ is called an ultrametric space if $u_X$ fulfills, for all $x, x', x'' \in X$,

$$u_X(x, x'') \le \max\big(u_X(x, x'), u_X(x', x'')\big), \tag{8}$$

i.e., ultrametrics can be considered as the limiting case of $p$-metrics as $p \to \infty$. In particular, Mémoli et al. (2019) derived a polynomial time algorithm for the calculation of the ultrametric Gromov-Hausdorff distance between two compact ultrametric spaces $(X, u_X)$ and $(Y, u_Y)$ (see Section 2.2), which is defined as

$$u_{GH}(X, Y) := \inf_{Z, \phi, \psi} d_H^{(Z, u_Z)}\big(\phi(X), \psi(Y)\big), \tag{9}$$

where $\phi : X \to Z$ and $\psi : Y \to Z$ are isometric embeddings into an ultrametric space $(Z, u_Z)$ and $d_H^{(Z, u_Z)}$ denotes the Hausdorff distance on $Z$.

A further motivation to study (surrogates of) the distances $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$ restricted to a subset $\mathcal{O}^w$ comes from the idea of slicing, which originated as a method to efficiently estimate the Wasserstein distance $d_{W,p}^{\mathbb{R}^d}(\alpha, \beta)$ between probability measures $\alpha$ and $\beta$ supported in a high dimensional Euclidean space $\mathbb{R}^d$. The original idea is that, given any line $\ell$ in $\mathbb{R}^d$, one first obtains $\alpha_\ell$ and $\beta_\ell$, the respective pushforwards of $\alpha$ and $\beta$ under the orthogonal projection map $\pi_\ell : \mathbb{R}^d \to \ell$, and then invokes the explicit formula for the Wasserstein distance between probability measures on $\mathbb{R}$ (see Remark 2.18) to obtain a lower bound for $d_{W,p}^{\mathbb{R}^d}(\alpha, \beta)$ without incurring the possibly high computational cost associated with solving an optimal transportation problem. This lower bound is improved via repeated (often random) selections of the line $\ell$ (Rubner et al., 2000; Bonneel et al., 2015; Kolouri et al., 2019). Recently, Le et al. (2019b) pointed out that, thanks to the fact that the 1-Wasserstein distance also admits an explicit formula when the underlying metric space is a tree (Do Ba et al., 2011; Evans and Matsen, 2012; McGregor and Stubbs, 2013), one can also devise tree slicing estimates of the distance between two given probability measures by suitably projecting them onto tree-like structures. Presumably, the same strategy also succeeds for suitable projections onto random ultrametric spaces, as on these there is also an explicit formula for the Wasserstein distance (Kloeckner, 2015). The same line of work has also recently been explored in the Gromov-Wasserstein setting (Vayer et al., 2019; Le et al., 2019a) and could be extended based on efficiently computable restrictions (or surrogates) of $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$.

Inspired by the results of Mémoli et al. (2019) on the ultrametric Gromov-Hausdorff distance and the results of Kloeckner (2015), who derived an explicit representation of the Wasserstein distance on ultrametric spaces, we study in the course of this paper the collection of compact ultrametric measure spaces $\mathcal{U}^w \subseteq \mathcal{M}^w$, where $\mathcal{X} = (X, u_X, \mu_X) \in \mathcal{U}^w$ whenever the underlying metric space $(X, u_X)$ is a compact ultrametric space. Ultrametric spaces (and thus also ultrametric measure spaces) arise naturally in statistics as metric encodings of dendrograms (Carlsson and Mémoli, 2010), in the context of phylogenetic trees (Semple et al., 2003) and in the probabilistic approximation of finite metric spaces (Bartal, 1996). Especially for dendrograms and phylogenetic trees, it is important to have a meaningful method of comparison, i.e., it is essential to have a meaningful metric on $\mathcal{U}^w$. However, it is evident from the definition of $d^{\mathrm{sturm}}_{GW,p}$ and the relation between $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$ (see Mémoli (2011)) that the ultrametric structure of $\mathcal{X}, \mathcal{Y} \in \mathcal{U}^w$ is lost in the computation of $d^{\mathrm{sturm}}_{GW,p}(\mathcal{X}, \mathcal{Y})$ and $d_{GW,p}(\mathcal{X}, \mathcal{Y})$, $1 \le p \le \infty$.
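Returning to the slicing idea described above, the following minimal sketch makes it concrete for two uniform empirical measures with the same number of atoms in $\mathbb{R}^d$. Several choices are our own assumptions rather than those of the cited works: Gaussian random directions, keeping the largest projected value, and a brute-force exact $W_p$ over permutations that is feasible only for a handful of points. Orthogonal projection onto a line is 1-Lipschitz, so every projected value, and therefore the maximum over sampled lines, is a lower bound for $d_{W,p}^{\mathbb{R}^d}(\alpha, \beta)$.

```python
import math
import random
from itertools import permutations


def wasserstein_1d(xs, ys, p):
    # Closed form on the real line: match the sorted samples (weights 1/n).
    return (sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)) ** (1 / p)


def project(points, direction):
    # Coordinate of each point along the line spanned by a unit vector.
    return [sum(c * v for c, v in zip(pt, direction)) for pt in points]


def sliced_lower_bound(X, Y, p=2, n_lines=100, seed=0):
    """Largest projected W_p over random lines (our choice); a lower bound for W_p in R^d."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_lines):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]))]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            continue  # degenerate draw; practically never happens
        v = [c / norm for c in v]
        best = max(best, wasserstein_1d(project(X, v), project(Y, v), p))
    return best


def exact_wasserstein(X, Y, p=2):
    # For uniform measures with n atoms each, some optimal coupling is a
    # permutation (Birkhoff-von Neumann), so brute force works for tiny n.
    n = len(X)
    cost = min(sum(math.dist(X[i], Y[s[i]]) ** p for i in range(n)) for s in permutations(range(n)))
    return (cost / n) ** (1 / p)


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]  # X translated by (2, 2)
print(sliced_lower_bound(X, Y), "<=", exact_wasserstein(X, Y))
```

Averaging over lines instead of maximizing, as is common for the sliced Wasserstein distance, also yields a lower bound, but typically a weaker one.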
To retain the ultrametric structure, we suggest, just as for the ultrametric Gromov-Hausdorff distance, adapting the definition of $d^{\mathrm{sturm}}_{GW,p}$ (see (5)) as well as that of $d_{GW,p}$ (see (7)). We verify in the following that this makes the comparison of ultrametric measure spaces more sensitive and leads, for $p = \infty$, to a polynomial time algorithm for the computation of the proposed metrics.

1.1. The proposed approach.
Let $\mathcal{X} = (X, u_X, \mu_X)$ and $\mathcal{Y} = (Y, u_Y, \mu_Y)$ be ultrametric measure spaces. Reconsidering the definition of Sturm's Gromov-Wasserstein distance in (5), it is clear that if we embed the ultrametric spaces $(X, u_X)$ and $(Y, u_Y)$ into an arbitrary metric space, then the ultrametric structure of the spaces $X$ and $Y$ may be lost in the embedding. Hence, in order to preserve the ultrametric structure in the comparison, we propose to infimize only over ultrametric spaces $(Z, u_Z)$ in (5). Thus, for $p \in [1, \infty]$, we define Sturm's ultrametric Gromov-Wasserstein distance of order $p$ as

$$u^{\mathrm{sturm}}_{GW,p}(\mathcal{X}, \mathcal{Y}) := \inf_{Z, \phi, \psi} d_{W,p}^{(Z, u_Z)}(\phi_\# \mu_X, \psi_\# \mu_Y), \tag{10}$$

where $\phi : X \to Z$ and $\psi : Y \to Z$ are isometric embeddings into an ultrametric space $(Z, u_Z)$.

In the subsequent sections of this paper, we will establish many theoretically appealing properties of $u^{\mathrm{sturm}}_{GW,p}$. Unfortunately, we will verify that, although an explicit formula for the Wasserstein distance of order $p$ on ultrametric spaces exists (Kloeckner, 2015), for $p \in [1, \infty)$ the calculation of $u^{\mathrm{sturm}}_{GW,p}$ yields a highly non-trivial combinatorial optimization problem (see Section 3.1.1). Therefore, we demonstrate that an adaptation of the Gromov-Wasserstein distance defined in (7) yields a topologically equivalent and easily approximable distance on $\mathcal{U}^w$. In order to define this adaptation, we need to introduce some notation. For $a, b \ge 0$ and $1 \le q < \infty$, let

$$\Delta_q(a, b) := |a^q - b^q|^{1/q}.$$

Further, define $\Delta_\infty(a, b) := \max(a, b)$ whenever $a \ne b$, and $\Delta_\infty(a, b) := 0$ if $a = b$. In particular, note that $\Delta_\infty(a, 0) = a$ for any $a \ge 0$. With this notation, we can rewrite $d_{GW,p}$, $1 \le p < \infty$, as follows:

$$d_{GW,p}(\mathcal{X}, \mathcal{Y}) = \frac{1}{2} \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \left(\iint_{X \times Y \times X \times Y} \big(\Delta_1(d_X(x, x'), d_Y(y, y'))\big)^p \, \mu(dx \times dy)\, \mu(dx' \times dy')\right)^{1/p}. \tag{11}$$

Considering the derivation of $d_{GW,p}$ in Mémoli (2011) and the results on the closely related ultrametric Gromov-Hausdorff distance studied in Mémoli et al. (2019), this suggests replacing $\Delta_1$ in (11) by $\Delta_\infty$ in order to incorporate the ultrametric structures of $(X, u_X, \mu_X)$ and $(Y, u_Y, \mu_Y)$ into the comparison. Hence, we define the $p$-ultra-distortion of a coupling $\mu \in \mathcal{C}(\mu_X, \mu_Y)$ for $1 \le p < \infty$ as

$$\mathrm{dis}^{\mathrm{ult}}_p(\mu) := \left(\iint_{X \times Y \times X \times Y} \big(\Delta_\infty(u_X(x, x'), u_Y(y, y'))\big)^p \, \mu(dx \times dy)\, \mu(dx' \times dy')\right)^{1/p}, \tag{12}$$

and for $p = \infty$ as

$$\mathrm{dis}^{\mathrm{ult}}_\infty(\mu) := \sup_{\substack{x, x' \in X,\ y, y' \in Y \\ (x, y), (x', y') \in \mathrm{supp}(\mu)}} \Delta_\infty\big(u_X(x, x'), u_Y(y, y')\big),$$

where $\mathrm{supp}(\mu)$ denotes the support of $\mu$. Then, the ultrametric Gromov-Wasserstein distance of order $p \in [1, \infty]$ is given as

$$u_{GW,p}(\mathcal{X}, \mathcal{Y}) := \inf_{\mu \in \mathcal{C}(\mu_X, \mu_Y)} \mathrm{dis}^{\mathrm{ult}}_p(\mu). \tag{13}$$

Due to the structural similarity of $d_{GW,p}$ and $u_{GW,p}$, we can expect (and later verify) that many properties of $d_{GW,p}$ extend to $u_{GW,p}$. In particular, we will establish that $u_{GW,p}$ can also be approximated via conditional gradient descent and admits several polynomial time computable lower bounds that are useful in applications.

1.2. Overview of our results.
We give a brief overview of the results obtained.
Section 2.
We slightly generalize the results of Carlsson and Mémoli (2010) on the relation between ultrametric spaces and dendrograms and establish a bijection between compact ultrametric spaces and proper dendrograms (see Definition 2.3). After recalling some results on the ultrametric Gromov-Hausdorff distance (see (9)), we use these results to reformulate the explicit formula for the $p$-Wasserstein distance ($1 \le p < \infty$) on ultrametric spaces derived by Kloeckner (2015) in terms of proper dendrograms. This allows us to derive a formulation of the $\infty$-Wasserstein distance on ultrametric spaces and to study the Wasserstein distance on compact subspaces of the ultrametric space $(\mathbb{R}_{\ge 0}, \Delta_\infty)$, which will be relevant when studying lower bounds of $u_{GW,p}$, $1 \le p \le \infty$.

Section 3.
We will demonstrate that $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$, $1 \le p \le \infty$, are $p$-metrics on the collection of ultrametric measure spaces $\mathcal{U}^w$, which induce topologies on $\mathcal{U}^w$ different from those induced by $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$, $1 \le p \le \infty$. We derive several alternative representations for $u^{\mathrm{sturm}}_{GW,p}$ and begin to study the relation between the two metrics $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$. In particular, we show that, while for $1 \le p < \infty$ the two metrics differ in general, they coincide for $p = \infty$ (i.e., $u_{GW,\infty} = u^{\mathrm{sturm}}_{GW,\infty}$). Furthermore, we show how this equality, in combination with an alternative representation of $u_{GW,\infty}$, leads to a polynomial time algorithm for the calculation of $u^{\mathrm{sturm}}_{GW,\infty} = u_{GW,\infty}$.

Section 4.
We study the topological properties of $(\mathcal{U}^w, u^{\mathrm{sturm}}_{GW,p})$ and $(\mathcal{U}^w, u_{GW,p})$, $1 \le p \le \infty$. Most importantly, we show that, just like $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$, the two considered metrics are topologically equivalent. While we prove that the metric spaces $(\mathcal{U}^w, u^{\mathrm{sturm}}_{GW,p})$ and $(\mathcal{U}^w, u_{GW,p})$, $1 \le p < \infty$, are neither complete nor separable, we demonstrate that the ultrametric space $(\mathcal{U}^w, u^{\mathrm{sturm}}_{GW,\infty})$, which coincides with $(\mathcal{U}^w, u_{GW,\infty})$, is complete. Further, we establish that $(\mathcal{U}^w, u^{\mathrm{sturm}}_{GW,1})$ is a geodesic space.

Section 5.
Unfortunately, it does not seem to be possible to derive a polynomial time algorithm for the calculation of $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$, $1 \le p < \infty$. Consequently, in Section 5 we derive several polynomial time computable lower bounds for $u_{GW,p}$, $1 \le p \le \infty$. Due to the structural similarity of $d_{GW,p}$ and $u_{GW,p}$, these are in a certain sense analogous to those derived in Mémoli (2007, 2011) for $d_{GW,p}$. Among other things, we show that

$$u_{GW,p}(\mathcal{X}, \mathcal{Y}) \ge \mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X}, \mathcal{Y}) := \inf_{\gamma \in \mathcal{C}(\mu_X \otimes \mu_X,\ \mu_Y \otimes \mu_Y)} \big\|\Delta_\infty(u_X, u_Y)\big\|_{L^p(\gamma)}. \tag{14}$$

We verify that the lower bound $\mathrm{SLB}^{\mathrm{ult}}_p$ can be reformulated in terms of the Wasserstein distance on the ultrametric space $(\mathbb{R}_{\ge 0}, \Delta_\infty)$ (we derive an explicit formula for $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_\infty)}$ in Section 2.3). This allows us to calculate $\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X}, \mathcal{Y})$ efficiently, in time polynomial in $m \vee n$, where $m$ stands for the cardinality of $X$ and $n$ for that of $Y$.

Note that the approximation of $u_{GW,p}$ via conditional gradient descent mentioned in Section 1.1 is meant in the sense that one can write code which locally minimizes the functional; we do not currently have any theoretical guarantees.
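To see how replacing $\Delta_1$ by $\Delta_\infty$ changes the comparison, the following minimal sketch implements $\Delta_q$, $\Delta_\infty$ and the $p$-ultra-distortion (12) for finite spaces, with couplings given as matrices (our assumption; the helper names are also ours). As with (7), evaluating (12) at any admissible coupling only gives an upper bound on $u_{GW,p}$ in (13). Neither the infimum nor $\mathrm{SLB}^{\mathrm{ult}}_p$ is computed here.

```python
from itertools import product


def delta_q(a, b, q):
    """Delta_q(a, b) = |a^q - b^q|^(1/q) for a, b >= 0 and 1 <= q < inf."""
    return abs(a ** q - b ** q) ** (1.0 / q)


def delta_inf(a, b):
    """Delta_inf(a, b) = max(a, b) if a != b, and 0 if a == b."""
    return 0.0 if a == b else max(a, b)


def ultra_distortion(uX, uY, coupling, p=1.0):
    """p-ultra-distortion (12) of a coupling between finite ultrametric measure spaces.

    Assumption (ours): uX, uY are ultrametric matrices and coupling[i][j] is the
    mass sent from x_i to y_j.
    """
    pairs = [(i, j) for i, row in enumerate(coupling) for j, w in enumerate(row) if w > 0]
    terms = [
        (coupling[i][j] * coupling[k][l], delta_inf(uX[i][k], uY[j][l]))
        for (i, j), (k, l) in product(pairs, repeat=2)
    ]
    if p == float("inf"):
        return max(t for _, t in terms)  # sup over the support, as in dis_inf^ult
    return sum(w * t ** p for w, t in terms) ** (1.0 / p)


uX, uY = [[0, 1], [1, 0]], [[0, 3], [3, 0]]
identity = [[0.5, 0.0], [0.0, 0.5]]
print(delta_q(3, 1, 1), delta_inf(3, 1), delta_inf(2, 2))  # 2.0 3 0.0
print(ultra_distortion(uX, uY, identity))                  # 1.5, so u_GW,1 <= 1.5
```

For the same two spaces and the same coupling, the integrand of (11) contributes $|1 - 3| = 2$ wherever the distances disagree, whereas $\Delta_\infty$ charges the full larger value $3$. This is the sense in which the ultrametric variant is more sensitive to mismatched merge heights.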
Section 6.
As the requirement that the induced metric spaces of the considered metric measure spaces be ultrametric is somewhat restrictive (especially in the context of phylogenetic trees, see Semple et al. (2003)), we prove in Section 6 that the results on $u_{GW,p}$ can be extended to the more general ultra-dissimilarity spaces (see Definition 6.1). In particular, we prove that $u_{GW,p}$, $1 \le p \le \infty$, is a metric on the isomorphism classes of ultra-dissimilarity spaces (see Definition 6.6).

Section 7.
We illustrate the behaviour and relation of $u_{GW,1}$ (which can be approximated via conditional gradient descent) and $\mathrm{SLB}^{\mathrm{ult}}_1$ in a set of toy examples. Additionally, we carefully illustrate the differences between $u_{GW,1}$ and $d_{GW,1}$, and between $\mathrm{SLB}^{\mathrm{ult}}_1$ and SLB (see Section 5 for a definition).

Section 8.
Finally, we apply our results to phylogenetic tree shape comparison. To this end, we compare two sets of phylogenetic tree shapes based on the HA protein sequences from human influenza collected in different regions with the lower bound $\mathrm{SLB}^{\mathrm{ult}}_1$. In particular, we compare our results to those obtained by Colijn and Plazzotta (2018) for the same comparison.

1.3. Related work.
In order to better contextualize our contribution, we now describerelated work, both in applied and computational geometry, and in phylogenetics (wherenotions of distance between trees have arisen naturally).
Metrics between trees: The phylogenetics perspective.
In phylogenetics, the need to measure the distance between different trees arises from the fact that the reconstruction of a phylogenetic tree may depend on the set of genes being considered. At the same time, even for the same set of genes, different reconstruction methods could be applied, which would result in different trees. This has led to the development of many different metrics for measuring the distance between phylogenetic trees. Examples include the Robinson-Foulds metric (Robinson and Foulds, 1981), the subtree prune and regraft distance (Hein, 1990), and the nearest-neighbor interchange distance (Robinson, 1971).

As pointed out in Owen and Provan (2010), many of these distances tend to quantify differences between tree topologies and often do not take edge lengths into account. A phylogenetic tree metric space which encodes edge lengths was proposed in Billera et al. (2001) and studied algorithmically in Owen and Provan (2010). This tree space assumes that all trees have the same set of taxa. An extension to the case of trees over different underlying sets is given in Grindstaff and Owen (2018).

Lafond et al. (2019) considered one type of metric on possibly multilabeled phylogenetic trees with a fixed number of leaves. As the authors point out, a multilabeled phylogenetic tree in which no leaves are repeated is just a standard phylogenetic tree, whereas a multilabeled phylogenetic tree in which all labels are equal defines a tree shape. The authors then proceed to study the computational complexity associated with generalizations of some of the usual metrics for phylogenetic trees (such as the Robinson-Foulds distance) to the multilabeled case.

Colijn and Plazzotta (2018) studied a metric between (binary) phylogenetic tree shapes based on a bottom-to-top enumeration of specific connectivity structures. The authors applied their metric to compare evolutionary trees based on the HA protein sequences from human influenza collected in different regions.
Metrics between trees: The applied geometry perspective.
From a different perspective, ideas from applied geometry and from applied and computational topology have been used to compare tree shapes in applications in probability, clustering, and applied and computational topology.

Metric trees are also considered in probability theory, in the study of models for random trees, together with the need to quantify their distance; Evans (2007) describes some variants of the Gromov-Hausdorff distance between metric trees. See also Greven et al. (2009) for the case of metric measure space representations of trees and a certain Gromov-Prokhorov type of metric on the collection thereof.

Trees, in the form of dendrograms, are abundant in the realm of hierarchical clustering methods. In their study of the stability of hierarchical clustering methods, Carlsson and Mémoli (2010) utilized the Gromov-Hausdorff distance between the ultrametric representations of dendrograms.

Schmiedl (2017) proved that computing the Gromov-Hausdorff distance between tree metric spaces is NP-hard. Liebscher (2018) suggests some variants of the Gromov-Hausdorff distance which are applicable in the context of phylogenetic trees.

As mentioned before, Zarichnyi (2005) introduced the ultrametric Gromov-Hausdorff distance $u_{GH}$ between compact ultrametric spaces (a special type of tree metric spaces). Certain theoretical properties of $u_{GH}$, such as precompactness, have been studied in Qiu (2009). In contrast with the NP-hardness of computing $d_{GH}$, Mémoli et al. (2019) devised a polynomial time algorithm for computing $u_{GH}$.

In computational topology, merge trees arise through the study of the sublevel sets of a given function (Adelson-Velskii and Kronrod, 1945; Reeb, 1946) with the goal of shape simplification. Morozov et al. (2013) develop the notion of interleaving distance between merge trees, which is related to the Gromov-Hausdorff distance between trees through bi-Lipschitz bounds. In Agarwal et al.
(2018), exploiting the connection between the interleaving distance and the Gromov-Hausdorff distance between metric trees, the authors approach the computation of the Gromov-Hausdorff distance between metric trees in general and provide certain approximation algorithms.

Touli and Wang (2018) devise fixed-parameter tractable (FPT) algorithms for computing the interleaving distance between metric trees. From their methods, one can derive an FPT algorithm to compute a 2-approximation of the Gromov-Hausdorff distance between ultrametric spaces. Mémoli et al. (2019) devise an FPT algorithm for computing the exact value of the Gromov-Hausdorff distance between ultrametric spaces.

2. Preliminaries
In this section, we briefly summarize the basic notions and concepts required throughout the paper.

2.1. Ultrametric spaces and dendrograms.
We begin by describing finite ultrametric spaces in terms of dendrograms (for more details see Carlsson and Mémoli (2010)). To this end, we introduce some definitions and notation. Given a finite set $X$, a partition of $X$ is a set $P_X = \{X_1, \dots, X_k\}$ where $\emptyset \ne X_i \subseteq X$ for $1 \le i \le k$, $X_i \cap X_j = \emptyset$ for all $i \ne j \in \{1, \dots, k\}$, and $\bigcup_{i=1}^k X_i = X$. We call each element $X_i$ a block of the given partition $P_X$ and denote by $\mathrm{Part}(X)$ the collection of all partitions of $X$. For two partitions $P_X$ and $P'_X$, we say that $P_X$ is finer than $P'_X$ if for every block $X_i \in P_X$ there exists a block $X'_j \in P'_X$ such that $X_i \subseteq X'_j$.

Definition 2.1 (Dendrogram). A dendrogram $\theta_X : [0, \infty) \to \mathrm{Part}(X)$ is a map parameterizing a nested family of partitions over the same set $X$ that satisfies the following conditions:
(1) $\theta_X(s)$ is finer than $\theta_X(t)$ for any $0 \le s < t < \infty$;
(2) $\theta_X(0)$ is the finest partition, consisting only of singleton sets;
(3) there exists $t_X > 0$ such that for any $t \ge t_X$, $\theta_X(t) = \{X\}$ is the trivial partition;
(4) for each $t \ge 0$, there exists $\varepsilon > 0$ such that $\theta_X(t) = \theta_X(t')$ for all $t' \in [t, t + \varepsilon]$.

The following lemma gives an alternative representation for finite ultrametric spaces in terms of dendrograms.

Lemma 2.2 (Carlsson and Mémoli (2010)). Given a finite set $X$, denote by $\mathcal{U}(X)$ the collection of all ultrametrics on $X$ and by $\mathcal{D}(X)$ the collection of all dendrograms over $X$. Then, there exists a bijection $\Delta_X : \mathcal{U}(X) \to \mathcal{D}(X)$.

In this paper, we are not only concerned with finite ultrametric spaces, but mainly with compact ultrametric spaces. However, we show that a statement similar to Lemma 2.2 holds for compact ultrametric spaces. More precisely, we prove that there is a bijection between compact ultrametric spaces and the so-called proper dendrograms.

Definition 2.3 (Proper dendrogram). Given a set $X$ (not necessarily finite), a proper dendrogram $\theta_X : [0, \infty) \to \mathrm{Part}(X)$ is a map satisfying the following conditions:
(1) $\theta_X(s)$ is finer than $\theta_X(t)$ for any $0 \le s < t < \infty$;
(2) $\theta_X(0)$ is the finest partition, consisting only of singleton sets;
(3) there exists $T > 0$ such that for any $t \ge T$, $\theta_X(t) = \{X\}$ is the trivial partition;
(4) for each $t > 0$, there exists $\varepsilon > 0$ such that $\theta_X(t) = \theta_X(t')$ for all $t' \in [t, t + \varepsilon]$;
(5) for any distinct points $x, x' \in X$, there exists $T_{xx'} > 0$ such that $x$ and $x'$ belong to different blocks in $\theta_X(T_{xx'})$;
(6) for each $t > 0$, $\theta_X(t)$ consists of only finitely many blocks;
(7) let $\{t_n\}_{n \in \mathbb{N}}$ be a decreasing sequence such that $\lim_{n \to \infty} t_n = 0$, and let $X_n \in \theta_X(t_n)$. If $X_m \subseteq X_n$ for any $1 \le n < m$, then $\bigcap_{n \in \mathbb{N}} X_n \ne \emptyset$.

It is obvious that any dendrogram $\theta_X$ over a finite set $X$ is a proper dendrogram. Let $\theta_X$ be a proper dendrogram over a set $X$. For any $x \in X$ and $t \ge 0$, we denote by $[x]^X_t$ the block in $\theta_X(t)$ that contains $x \in X$, and abbreviate $[x]^X_t$ to $[x]_t$ when the underlying set $X$ is clear from the context.
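For a finite ultrametric space, the dendrogram can be read off level by level, since the blocks $[x]_t$ of $\theta_X(t)$ are the closed balls $\{x' \in X : u_X(x, x') \le t\}$ (cf. Remark 2.5). The minimal sketch below assumes a finite space given by a distance matrix on the points $0, \dots, n-1$; the function names are ours. It first checks the strong triangle inequality (8), because for a non-ultrametric matrix, closed balls of equal radius need not partition $X$.

```python
def is_ultrametric(u):
    """Check that u is a metric satisfying the strong triangle inequality (8)."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            # symmetry, u(x, x') = 0 exactly on the diagonal, nonnegativity
            if u[i][j] != u[j][i] or (u[i][j] == 0) != (i == j) or u[i][j] < 0:
                return False
            if any(u[i][k] > max(u[i][j], u[j][k]) for k in range(n)):
                return False
    return True


def blocks_at(u, t):
    """theta_X(t): the partition of {0, ..., n-1} into closed balls of radius t."""
    blocks, seen = set(), set()
    for x in range(len(u)):
        if x not in seen:
            ball = frozenset(y for y in range(len(u)) if u[x][y] <= t)
            seen |= ball  # in an ultrametric space, balls of radius t are disjoint or equal
            blocks.add(ball)
    return blocks


def dendrogram(u):
    """List of (t, theta_X(t)) at the levels where the partition can change."""
    if not is_ultrametric(u):
        raise ValueError("distance matrix is not an ultrametric")
    levels = sorted({v for row in u for v in row})  # includes t = 0
    return [(t, blocks_at(u, t)) for t in levels]


u = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
for t, part in dendrogram(u):
    print(t, sorted(sorted(b) for b in part))
# 0 [[0], [1], [2]]
# 1 [[0, 1], [2]]
# 3 [[0, 1, 2]]
```

Between two consecutive levels the partition stays constant, which matches the right-continuity condition (4) of Definition 2.1.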
The subsequent theorem extends Lemma 2.2 to compact ultrametric spaces. Since its proof depends on some results not yet introduced, we postpone it to Appendix A.3.
Theorem 2.4.
Given a set $X$, denote by $\mathcal{U}(X)$ the collection of all compact ultrametrics on $X$ and by $\mathcal{D}(X)$ the collection of all proper dendrograms over $X$. Then, there exists a canonical bijective map $\Delta_X : \mathcal{U}(X) \to \mathcal{D}(X)$.

Remark 2.5.
From now on, we denote by $\theta_X$ the proper dendrogram corresponding to a given compact ultrametric $u_X$ on $X$ under the bijection given above. Note that a block $[x]_t$ in $\theta_X(t)$ is actually the closed ball $B_t(x)$ in $X$ centered at $x$ with radius $t$. So, for each $t \ge 0$, $\theta_X(t)$ partitions $X$ into a union of several closed balls in $X$ with respect to $u_X$.

2.2. The ultrametric Gromov-Hausdorff distance.
Both $d^{\mathrm{sturm}}_{GW,p}$ and $d_{GW,p}$, $1 \le p \le \infty$, are by construction closely related to the Gromov-Hausdorff distance. In a recent paper, Mémoli et al. (2019) studied an ultrametric version of this distance, namely the ultrametric Gromov-Hausdorff distance (denoted by $u_{GH}$). Since we will demonstrate several connections between $u^{\mathrm{sturm}}_{GW,p}$, $u_{GW,p}$, $1 \le p \le \infty$, and this distance, we briefly summarize some of the results in Mémoli et al. (2019). We start by recalling the formal definition of $u_{GH}$.

Definition 2.6.
Let $(X, u_X)$ and $(Y, u_Y)$ be two compact ultrametric spaces. Then, the ultrametric Gromov-Hausdorff distance between $X$ and $Y$ is defined as

$$u_{GH}(X, Y) = \inf_{Z, \phi, \psi} d_H^{(Z, u_Z)}\big(\phi(X), \psi(Y)\big),$$

where $\phi : X \to Z$ and $\psi : Y \to Z$ are isometric embeddings (distance preserving transformations) into the ultrametric space $(Z, u_Z)$.

Zarichnyi (2005) has shown that $u_{GH}$ is indeed an (ultra)metric on $\mathcal{U}$, and Mémoli et al. (2019) identified a structural theorem (cf. Theorem 2.8) that gives rise to a polynomial time algorithm for the calculation of $u_{GH}$. More precisely, it was proven in Mémoli et al. (2019) that $u_{GH}$ can be calculated via so-called quotient ultrametric spaces, which we define next. Let $(X, u_X)$ be an ultrametric space and let $t \ge 0$. We define an equivalence relation $\sim_t$ on $X$ as follows: $x \sim_t x'$ if and only if $u_X(x, x') \le t$. We denote by $[x]^X_t$ (resp. $[x]_t$) the equivalence class of $x$ under $\sim_t$ and by $X_t$ the set of all such equivalence classes. Note that $[x]^X_t = \{x' \in X \mid u_X(x, x') \le t\}$ is actually the closed ball centered at $x$ with radius $t$. We define an ultrametric $u_{X_t}$ on $X_t$ as follows:

$$u_{X_t}([x]_t, [x']_t) := \begin{cases} u_X(x, x'), & [x]_t \ne [x']_t, \\ 0, & [x]_t = [x']_t. \end{cases}$$

Then, $(X_t, u_{X_t})$ is an ultrametric space, and we call it the quotient of $(X, u_X)$ at level $t$ (see Figure 1 for an illustration). It is straightforward to prove that the quotient of a compact ultrametric space at level $t > 0$ is a finite space.

Lemma 2.7.
Let $X$ be a complete ultrametric space. Then, $X$ is a compact ultrametric space if and only if for any $t > 0$, $X_t$ is a finite space.

Figure 1.
Metric quotient:
An ultrametric space (black) and its quotient at level $t$ (red).

Proof.
Wan (2020, Lemma 2.4) proves that whenever $X$ is compact, $X_t$ is finite for any $t > 0$. Conversely, assume that $X_t$ is finite for any $t > 0$. We only need to prove that $X$ is totally bounded. For any $\varepsilon > 0$, $X_\varepsilon$ is a finite set, and thus there exist $x_1, \dots, x_n \in X$ such that $X_\varepsilon = \{[x_1]_\varepsilon, \dots, [x_n]_\varepsilon\}$. Now, for any $x \in X$, there exists $x_i$ for some $i \in \{1, \dots, n\}$ such that $x \in [x_i]_\varepsilon$. This implies that $u_X(x, x_i) \le \varepsilon$. Therefore, the set $\{x_1, \dots, x_n\} \subseteq X$ is an $\varepsilon$-net of $X$. Hence, $X$ is complete and totally bounded, and thus compact. $\square$

The following structural theorem characterizes $u_{GH}$ via quotients of ultrametric spaces.

Theorem 2.8 (Structural theorem for $u_{GH}$, (Mémoli et al., 2019, Theorem 5.7)). Let $(X, u_X)$ and $(Y, u_Y)$ be two compact ultrametric spaces. Then,

$$u_{GH}(X, Y) = \inf\{t \ge 0 \mid X_t \cong Y_t\}.$$

Remark 2.9.
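The quotient spaces $X_t$ and $Y_t$ can be considered as vertex weighted, rooted trees (Mémoli et al., 2019). Hence, it is possible to check whether $X_t \cong Y_t$ in polynomial time (Aho and Hopcroft, 1974) (consider $X_t$ and $Y_t$ as labeled trees). Consequently, Theorem 2.8 induces a simple, polynomial time algorithm to calculate $u_{\mathrm{GH}}$ between two finite ultrametric spaces: since $X_t$ and $Y_t$ only change at the values taken by $u_X$ and $u_Y$, it suffices to test the finitely many levels $t \in \{0\} \cup u_X(X \times X) \cup u_Y(Y \times Y)$.

Remark 2.10.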
Obviously, it follows from Theorem 2.8 that $u_{\mathrm{GH}}(X,Y) \ge \Delta(\mathrm{diam}(X),\mathrm{diam}(Y))$. This is analogous to the bound $d_{\mathrm{GH}}(X,Y) \ge \frac{1}{2}|\mathrm{diam}(X) - \mathrm{diam}(Y)|$ for the Gromov-Hausdorff distance (cf. Mémoli (2012, Theorem 3.4)).

2.3. Wasserstein distance on ultrametric spaces.
Kloeckner (2015) uses the representation of ultrametric spaces as so-called synchronized rooted trees to derive an explicit formula for the Wasserstein distance on ultrametric spaces. By the construction of the dendrograms and of the synchronized rooted trees (see Appendix A.1), it is immediately clear how to reformulate the results of Kloeckner (2015) on compact ultrametric spaces in terms of proper dendrograms. To this end, we need to introduce some notation. For a compact ultrametric space $X$, let $\theta_X$ be the associated proper dendrogram. Then, we let $V(X) := \bigcup_{t>0} \theta_X(t) = \{[x]_t \mid x \in X,\ t > 0\}$. The following is a characterization of $V(X)$:

Lemma 2.11. $V(X)$ is the collection of all closed balls in $X$ except for singletons $\{x\}$ such that $x$ is a cluster point in $X$. In particular, $X \in V(X)$ and, for any $x \in X$, if $x$ is not a cluster point, then $\{x\} \in V(X)$.

The proof is relegated to Appendix A.3. We sometimes denote by $B$ an element in $V(X)$ to avoid mentioning explicitly the center and the radius of the closed ball $B$. For $B \neq X$, we denote by $B^*$ the smallest (under inclusion) element in $V(X)$ such that $B \subsetneq B^*$ (for the existence and uniqueness of $B^*$ see Lemma A.1).

Figure 2.
Illustration of $(\mathbb{R}_{\ge 0},\Delta)$: This is the dendrogram for a subspace of $(\mathbb{R}_{\ge 0},\Delta)$ consisting of 5 arbitrary distinct points of $\mathbb{R}_+$.

Lemma 2.12.
Let $(X,u_X)$ be a compact ultrametric space. For all $\alpha,\beta \in \mathcal{P}(X)$ and $1 \le p < \infty$, we have
$$\big(d^X_{\mathrm{W},p}\big)^p(\alpha,\beta) = \frac{1}{2}\sum_{B \in V(X)\setminus\{X\}} \big(\mathrm{diam}(B^*)^p - \mathrm{diam}(B)^p\big)\,|\alpha(B) - \beta(B)|. \tag{15}$$
While Lemma 2.12 is only valid for $p < \infty$, it can be extended to the case $p = \infty$.

Lemma 2.13.
Let $X$ be a compact ultrametric space. Then, for any $\alpha,\beta \in \mathcal{P}(X)$, we have
$$d^X_{\mathrm{W},\infty}(\alpha,\beta) = \max_{B \in V(X)\setminus\{X\},\ \alpha(B) \neq \beta(B)} \mathrm{diam}(B^*), \tag{16}$$
where the maximum over the empty set is understood to be $0$.

The proof of this lemma is somewhat technical and we postpone it to Appendix A.3.

2.3.1. Wasserstein distance on $(\mathbb{R}_{\ge 0},\Delta)$. The non-negative half line $\mathbb{R}_{\ge 0}$ endowed with $\Delta$ turns out to be an ultrametric space (see Lemma A.3). Compact subspaces of $(\mathbb{R}_{\ge 0},\Delta)$ are of particular interest in this paper. These spaces possess a particular structure (see Figure 2) and the computation of the Wasserstein distance on them can be further simplified. For simplicity, we first present an explicit formula for computing $d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},p}$ between finitely supported measures.

(Here and in Lemma 2.11, a cluster point $x$ of a topological space $X$ is a point such that every neighborhood of $x$ contains infinitely many points of $X$.)

Theorem 2.14 ($d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},p}$ between finitely supported measures). Suppose $\alpha,\beta$ are supported on a finite subset $\{x_1,\ldots,x_n\}$ of $\mathbb{R}_{\ge 0}$ such that $0 \le x_1 < x_2 < \cdots < x_n$. Denote $\alpha_i := \alpha(\{x_i\})$ and $\beta_i := \beta(\{x_i\})$. Then, for $p \in [1,\infty)$, we have
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},p}(\alpha,\beta) = 2^{-\frac{1}{p}}\left(\sum_{i=1}^{n-1}\Big|\sum_{j=1}^{i}(\alpha_j - \beta_j)\Big|\cdot\big|x_{i+1}^p - x_i^p\big| + \sum_{i=1}^{n}|\alpha_i - \beta_i|\cdot x_i^p\right)^{\frac{1}{p}}. \tag{17}$$
Let $F_\alpha$ and $F_\beta$ denote the cumulative distribution functions of $\alpha$ and $\beta$, respectively. Then, for the case $p = \infty$, we obtain
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},\infty}(\alpha,\beta) = \max\Big(\max_{1 \le i \le n-1,\ F_\alpha(x_i) \neq F_\beta(x_i)} x_{i+1},\ \max_{1 \le i \le n,\ \alpha_i \neq \beta_i} x_i\Big),$$
with the same convention for maxima over empty sets.

Proof.
Clearly, $V(X) = \{\{x_1,x_2,\ldots,x_i\} \mid i = 1,\ldots,n\} \cup \{\{x_i\} \mid i = 1,\ldots,n\}$ (recall that each set corresponds to a closed ball). Thus, we conclude the proof by applying Lemma 2.12 and Lemma 2.13. ∎

Remark 2.15 (The case $p = 1$). Note that when $p = 1$, for any finitely supported probability measures $\alpha,\beta \in \mathcal{P}(\mathbb{R}_{\ge 0})$,
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},1}(\alpha,\beta) = \frac{1}{2}\left(d^{(\mathbb{R},\Delta_1)}_{\mathrm{W},1}(\alpha,\beta) + \int_{\mathbb{R}} x\,|\alpha - \beta|(dx)\right).$$
The formula indicates that the $\ell_1$-Wasserstein distance on $(\mathbb{R}_{\ge 0},\Delta)$ is the average of the usual $\ell_1$-Wasserstein distance on $(\mathbb{R}_{\ge 0},\Delta_1)$ and a "weighted total variation distance". This weighted total variation term is sensitive to changes of position. For example, let $\alpha = \delta_{x_1}$ and $\beta = \delta_{x_2}$; then $\int_{\mathbb{R}} x\,|\alpha - \beta|(dx) = x_1 + x_2$ if $x_1 \neq x_2$.

Next, we demonstrate that Theorem 2.14 extends naturally to compactly supported probability measures on $(\mathbb{R}_{\ge 0},\Delta)$. For this purpose, it is important to note that compact subsets of $(\mathbb{R}_{\ge 0},\Delta)$ have a very particular structure, as shown by the subsequent lemma. For its proof, we refer to Appendix A.3.

Lemma 2.16.
Let $X \subseteq (\mathbb{R}_{\ge 0},\Delta)$. Then, $X$ is a compact subset if and only if $X$ is either a finite set or a countable set with $0$ being its unique cluster point.

Based on the special structure of compact subsets of $(\mathbb{R}_{\ge 0},\Delta)$, we derive the following extension of Theorem 2.14.

Theorem 2.17 ($d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},p}$ between compactly supported measures). Suppose $\alpha,\beta$ are supported on a countable subset $X := \{0\} \cup \{x_i \mid i \in \mathbb{N}\}$ of $\mathbb{R}_{\ge 0}$ such that $0 < \cdots < x_n < x_{n-1} < \cdots < x_1$ and $0$ is the only cluster point with respect to the usual Euclidean distance. Let $\alpha_i := \alpha(\{x_i\})$ for $i \in \mathbb{N}$ and $\alpha_0 := \alpha(\{0\})$. Similarly, let $\beta_i := \beta(\{x_i\})$ and $\beta_0 := \beta(\{0\})$. Then, for $p \in [1,\infty)$,
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},p}(\alpha,\beta) = 2^{-\frac{1}{p}}\left(\sum_{i=1}^{\infty}\Big|\sum_{j=1}^{i}(\alpha_j - \beta_j)\Big|\cdot\big|x_i^p - x_{i+1}^p\big| + \sum_{i=1}^{\infty}|\alpha_i - \beta_i|\cdot x_i^p\right)^{\frac{1}{p}}. \tag{18}$$
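Let $F_\alpha$ and $F_\beta$ denote the cumulative distribution functions of $\alpha$ and $\beta$, respectively. Then, for $p = \infty$ we obtain
$$d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},\infty}(\alpha,\beta) = \max\Big(\max_{i \in \mathbb{N},\ F_\alpha(x_{i+1}) \neq F_\beta(x_{i+1})} x_i,\ \max_{i \in \mathbb{N},\ \alpha_i \neq \beta_i} x_i\Big),$$
where the maximum over the empty set is understood to be $0$.

Proof.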
Note that $V(X) = \{\{0\} \cup \{x_j \mid j \ge i\} \mid i \in \mathbb{N}\} \cup \{\{x_i\} \mid i \in \mathbb{N}\}$ (recall that each set corresponds to a closed ball). Thus, we conclude the proof by applying Lemma 2.12 and Lemma 2.13. ∎

Remark 2.18 (Closed-form solution for $d^{(\mathbb{R}_{\ge 0},\Delta_q)}_{\mathrm{W},p}$). It is well known that there is a closed-form solution for the Wasserstein distance on $\mathbb{R}$ with the usual Euclidean distance $\Delta_1$:
$$d^{(\mathbb{R},\Delta_1)}_{\mathrm{W},p}(\alpha,\beta) = \left(\int_0^1 \big|F_\alpha^{-1}(t) - F_\beta^{-1}(t)\big|^p\,dt\right)^{\frac{1}{p}},$$
where $F_\alpha$ and $F_\beta$ are the cumulative distribution functions of $\alpha$ and $\beta$, respectively, and $F^{-1}$ denotes the generalized inverse (quantile function). We have also obtained a closed-form solution for $d^{(\mathbb{R}_{\ge 0},\Delta)}_{\mathrm{W},p}$ in Theorem 2.17. In Appendix A.2, we generalize these formulas to $d^{(\mathbb{R}_{\ge 0},\Delta_q)}_{\mathrm{W},p}$ for $q \le p$ (see Appendix A.2 for the precise range of $q$).

3. Ultrametric Gromov-Wasserstein distances
In this section we establish various metric and topological properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ as well as $u_{\mathrm{GW},p}$, $1 \le p \le \infty$, and study the relation between them. Throughout this section, let $\mathcal{P}(X)$ denote the set of Borel probability measures on the compact space $X$.

3.1. Sturm's ultrametric Gromov-Wasserstein distance.
We begin this section by establishing several basic properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, including a proof that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is indeed a metric (or, more precisely, a $p$-metric) on the collection $\mathcal{U}^w$ of ultrametric measure spaces. We start with the following obvious observation:

Lemma 3.1.
For any $p \in [1,\infty]$, we always have that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$.

The definition of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ given in (10) is rather technical and in general not easy to work with. Hence, the first observation to make is that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, shares a further property with $d^{\mathrm{sturm}}_{\mathrm{GW},p}$: like $d^{\mathrm{sturm}}_{\mathrm{GW},p}$, it can be calculated by minimizing over pseudo-ultrametrics instead of isometric embeddings.

Lemma 3.2.
Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be two ultrametric measure spaces. Let $\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ denote the collection of all pseudo-ultrametrics $u$ on the disjoint union $X \sqcup Y$ such that $u|_{X\times X} = u_X$ and $u|_{Y\times Y} = u_Y$. Let $p \in [1,\infty]$. Then, it holds that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)} d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y), \tag{19}$$
where $d^{(X\sqcup Y,u)}_{\mathrm{W},p}$ denotes the Wasserstein pseudometric of order $p$ defined in (33) (resp. in (34) for $p = \infty$).

Proof. The above lemma follows by the same arguments as Lemma 3.3 (iii) in Sturm (2006). ∎

Remark 3.3 (Wasserstein pseudometric). The
Wasserstein pseudometric is a natural extension of the Wasserstein distance to pseudometric spaces and has already been studied in Thorsley and Klavins (2008). In Appendix B.1 we carefully show that it is closely related to the Wasserstein distance on a canonically induced metric space. We further establish that the Wasserstein distance and the Wasserstein pseudometric share many relevant properties. Hence, we do not notationally distinguish between these two concepts.

The representation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, given by the above lemma is much more accessible, and we first use it to prove the following basic properties of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$:

Proposition 3.4.
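Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$. Then, the following hold:
(1) For any $1 \le p \le q \le \infty$, we have that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \le u^{\mathrm{sturm}}_{\mathrm{GW},q}(\mathcal{X},\mathcal{Y})$.
(2) It holds that $\lim_{p\to\infty} u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$.

Proof. (1) This simply follows from Jensen's inequality.
(2) By (1), $L := \lim_{n\to\infty} u^{\mathrm{sturm}}_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y})$ exists and $L \le u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. To prove the opposite inequality, let $u_n \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\int_{X\times Y}\big(u_n(x,y)\big)^n\,\mu_n(dx\times dy)\right)^{\frac{1}{n}} = u^{\mathrm{sturm}}_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y}).$$
By Lemma B.2 and Lemma B.4, after passing to appropriate subsequences, $\{u_n\}_{n\in\mathbb{N}}$ converges uniformly to some $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\{\mu_n\}_{n\in\mathbb{N}}$ converges weakly to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. Let $M = \sup_{(x,y)\in\mathrm{supp}(\mu)} u(x,y)$. Let $\varepsilon > 0$ and let $U = \{(x,y) \in X \times Y \mid u(x,y) > M - \varepsilon\}$. Then, $\mu(U) > 0$. Since $U$ is open and $\mu_n$ converges weakly to $\mu$, there exists $\varepsilon' > 0$ such that $\mu_n(U) > \mu(U) - \varepsilon' > 0$ for $n$ large enough. Moreover, by uniform convergence of $\{u_n\}_{n\in\mathbb{N}}$, we have $|u(x,y) - u_n(x,y)| \le \varepsilon$ for all $(x,y) \in X \times Y$ and $n$ large enough, so that $u_n > M - 2\varepsilon$ on $U$. Therefore,
$$\left(\int_{X\times Y}\big(u_n(x,y)\big)^n\,\mu_n(dx\times dy)\right)^{\frac{1}{n}} \ge \big(\mu_n(U)\big)^{\frac{1}{n}}(M - 2\varepsilon) \ge \big(\mu(U) - \varepsilon'\big)^{\frac{1}{n}}(M - 2\varepsilon).$$
Letting $n \to \infty$, we obtain $L \ge M - 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, $L \ge M \ge u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. ∎

Moreover, we use Lemma 3.2 to prove that $(\mathcal{U}^w, u^{\mathrm{sturm}}_{\mathrm{GW},p})$ is indeed a metric space:

Theorem 3.5. $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is a $p$-metric on the collection $\mathcal{U}^w$ of compact ultrametric measure spaces. In particular, when $p = \infty$, $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}$ is an ultrametric.

We postpone the proof until after we have introduced several auxiliary results. In particular, we will verify the existence of optimal metrics and optimal couplings in (19).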
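For a fixed coupling, the limit argument in the proof of Proposition 3.4 reduces to the familiar behaviour of power means: the $p$-th power mean of the cost is nondecreasing in $p$ (Jensen's inequality) and tends to the largest cost on the support of the coupling as $p \to \infty$. The following Python sketch checks both facts numerically, assuming a hypothetical discrete coupling described by a probability vector of atom weights and the cost values $u(x,y)$ on those atoms. The function name `power_mean_cost` and the toy data are illustrative assumptions, and the sketch does not compute $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ itself.

```python
import math
from typing import Sequence


def power_mean_cost(weights: Sequence[float], costs: Sequence[float], p: float) -> float:
    """(sum_k w_k * c_k**p)**(1/p) for a discrete coupling with atom weights w_k and costs c_k.

    For p = math.inf this returns the largest cost over the support {k : w_k > 0},
    i.e. the sup-type cost used for the case p = infinity.
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must form a probability vector")
    support = [(w, c) for w, c in zip(weights, costs) if w > 0]
    if math.isinf(p):
        return max(c for _, c in support)
    return sum(w * c ** p for w, c in support) ** (1.0 / p)


if __name__ == "__main__":
    w = [0.5, 0.3, 0.2, 0.0]  # the last atom carries no mass, so it is not in the support
    c = [1.0, 2.0, 3.0, 10.0]
    for p in [1, 2, 4, 16, 64, 256, math.inf]:
        # the values increase with p and approach 3.0, the maximal cost on the support
        print(p, round(power_mean_cost(w, c, p), 4))
```

Note that the atom with cost 10 is ignored for every $p$, mirroring the fact that only $\mathrm{supp}(\mu)$ enters the definition of $M$ in the proof.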
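Proposition 3.6 (Existence of optimal couplings). Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces and let $p \in [1,\infty]$. Then, there always exist $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ such that, for $1 \le p < \infty$,
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \left(\int_{X\times Y}\big(u(x,y)\big)^p\,\mu(dx\times dy)\right)^{\frac{1}{p}},$$
and, for $p = \infty$, $u^{\mathrm{sturm}}_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y}) = \sup_{(x,y)\in\mathrm{supp}(\mu)} u(x,y)$.

Proof.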
The following proof is a suitable adaptation of the proof of Lemma 3.3 in Sturm (2006). We only prove the claim for $p < \infty$, since the case $p = \infty$ can be shown in a similar manner. Let $u_n \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\int_{X\times Y}\big(u_n(x,y)\big)^p\,\mu_n(dx\times dy)\right)^{\frac{1}{p}} \le u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) + \frac{1}{n}.$$
By Lemma B.2, $\{\mu_n\}_{n\in\mathbb{N}}$ converges weakly (after taking an appropriate subsequence) to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. By Lemma B.4, $\{u_n\}_{n\in\mathbb{N}}$ converges uniformly (after taking an appropriate subsequence) to some $u \in \mathcal{D}(u_X,u_Y)$ such that
$$\left(\int_{X\times Y}\big(u(x,y)\big)^p\,\mu(dx\times dy)\right)^{\frac{1}{p}} \le u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}).$$
Hence, it only remains to verify that $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. In fact, for any $z_1,z_2,z_3 \in X \sqcup Y$, we have
$$\max\big(u(z_1,z_2),u(z_2,z_3)\big) = \max\Big(\lim_{n\to\infty}u_n(z_1,z_2),\lim_{n\to\infty}u_n(z_2,z_3)\Big) = \lim_{n\to\infty}\max\big(u_n(z_1,z_2),u_n(z_2,z_3)\big) \ge \lim_{n\to\infty}u_n(z_1,z_3) = u(z_1,z_3).$$
Therefore, $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. ∎

As a direct consequence of the proposition, we have the following result:
Corollary 3.7.
Fix $1 \le p \le \infty$. Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, there exist a compact ultrametric space $Z$ and isometric embeddings $\phi: X \hookrightarrow Z$ and $\psi: Y \hookrightarrow Z$ such that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = d^Z_{\mathrm{W},p}(\phi_\#\mu_X,\psi_\#\mu_Y)$.

Next, we show that the Wasserstein pseudometric of order $p$ on a compact pseudo-ultrametric space $(X,u_X)$ is a $p$-pseudometric for $p \in [1,\infty)$ and a pseudo-ultrametric for $p = \infty$, i.e., we prove for $1 \le p < \infty$ that for all $\mu_1,\mu_2,\mu_3 \in \mathcal{P}(X)$
$$d^{(X,u_X)}_{\mathrm{W},p}(\mu_1,\mu_3) \le \Big(\big(d^{(X,u_X)}_{\mathrm{W},p}(\mu_1,\mu_2)\big)^p + \big(d^{(X,u_X)}_{\mathrm{W},p}(\mu_2,\mu_3)\big)^p\Big)^{1/p},$$
and for $p = \infty$ that for all $\mu_1,\mu_2,\mu_3 \in \mathcal{P}(X)$
$$d^{(X,u_X)}_{\mathrm{W},\infty}(\mu_1,\mu_3) \le \max\Big(d^{(X,u_X)}_{\mathrm{W},\infty}(\mu_1,\mu_2),\ d^{(X,u_X)}_{\mathrm{W},\infty}(\mu_2,\mu_3)\Big).$$

Lemma 3.8.
Let $(X,u_X)$ be a compact ultrametric space. Then, for $1 \le p \le \infty$ the $p$-Wasserstein metric $d^{(X,u_X)}_{\mathrm{W},p}$ is a $p$-pseudometric on $\mathcal{P}(X)$. In particular, when $p = \infty$, it is a pseudo-ultrametric on $\mathcal{P}(X)$.

Proof. We prove the statement by adapting the proof of the triangle inequality for the $p$-Wasserstein distance (see e.g. Villani (2003, Theorem 7.3)). We only prove the case $p < \infty$, whereas the case $p = \infty$ follows by analogous arguments. Let $\alpha_1,\alpha_2,\alpha_3 \in \mathcal{P}(X)$, denote by $\mu_{12}$ an optimal transport plan between $\alpha_1$ and $\alpha_2$, and by $\mu_{23}$ an optimal transport plan between $\alpha_2$ and $\alpha_3$ (see Villani (2008, Theorem 4.1) for the existence of $\mu_{12}$ and $\mu_{23}$). Furthermore, let $X_i$ be the support of $\alpha_i$, $1 \le i \le 3$. Then, by the Gluing Lemma (Villani, 2003, Lemma 7.6), there exists a measure $\mu \in \mathcal{P}(X_1 \times X_2 \times X_3)$ with marginals $\mu_{12}$ on $X_1 \times X_2$ and $\mu_{23}$ on $X_2 \times X_3$. Clearly, we obtain
$$\Big(d^{(X,u_X)}_{\mathrm{W},p}(\alpha_1,\alpha_3)\Big)^p \le \int_{X_1\times X_2\times X_3} u_X^p(x,z)\,d\mu(x,y,z) \le \int_{X_1\times X_2\times X_3}\big(u_X^p(x,y) + u_X^p(y,z)\big)\,d\mu(x,y,z).$$
Here, we used that $u_X$ is an ultrametric, i.e., in particular a $p$-metric (Mémoli et al., 2019, Proposition 1.16). With this, we obtain that
$$\Big(d^{(X,u_X)}_{\mathrm{W},p}(\alpha_1,\alpha_3)\Big)^p \le \int_{X_1\times X_2} u_X^p(x,y)\,d\mu_{12}(x,y) + \int_{X_2\times X_3} u_X^p(y,z)\,d\mu_{23}(y,z) = \Big(d^{(X,u_X)}_{\mathrm{W},p}(\alpha_1,\alpha_2)\Big)^p + \Big(d^{(X,u_X)}_{\mathrm{W},p}(\alpha_2,\alpha_3)\Big)^p.$$
∎

With Proposition 3.6 and Lemma 3.8 at our disposal, we are now ready to prove Theorem 3.5, which states that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is indeed a $p$-metric on $\mathcal{U}^w$.

Proof of Theorem 3.5.
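It is clear that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is symmetric and that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if $\mathcal{X} \cong_w \mathcal{Y}$. Furthermore, we remark that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ by Lemma 3.1. Since $d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ implies $\mathcal{X} \cong_w \mathcal{Y}$ (Sturm, 2012), we obtain that $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ implies $\mathcal{X} \cong_w \mathcal{Y}$. It remains to verify the $p$-triangle inequality. To this end, we only prove the case $p < \infty$, whereas the case $p = \infty$ follows by analogous arguments.

Let $\mathcal{X},\mathcal{Y},\mathcal{Z} \in \mathcal{U}^w$. By Proposition 3.6, there exist optimal metric couplings $u_{XY} \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $u_{YZ} \in \mathcal{D}^{\mathrm{ult}}(u_Y,u_Z)$ such that
$$\big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\big)^p = \Big(d^{(X\sqcup Y,u_{XY})}_{\mathrm{W},p}(\mu_X,\mu_Y)\Big)^p \quad\text{and}\quad \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{Y},\mathcal{Z})\big)^p = \Big(d^{(Y\sqcup Z,u_{YZ})}_{\mathrm{W},p}(\mu_Y,\mu_Z)\Big)^p.$$
Further, define $u_{XYZ}$ on $X \sqcup Y \sqcup Z$ as
$$u_{XYZ}(x_1,x_2) = \begin{cases} u_{XY}(x_1,x_2), & x_1,x_2 \in X \sqcup Y,\\ u_{YZ}(x_1,x_2), & x_1,x_2 \in Y \sqcup Z,\\ \inf\{\max(u_{XY}(x_1,y),u_{YZ}(y,x_2)) \mid y \in Y\}, & x_1 \in X,\ x_2 \in Z,\\ \inf\{\max(u_{XY}(x_2,y),u_{YZ}(y,x_1)) \mid y \in Y\}, & x_1 \in Z,\ x_2 \in X. \end{cases}$$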
Then, by Lemma 1.1 of Zarichnyi (2005), $u_{XYZ}$ is a pseudo-ultrametric on $X \sqcup Y \sqcup Z$ that coincides with $u_{XY}$ on $X \sqcup Y$ and with $u_{YZ}$ on $Y \sqcup Z$. With this, we obtain by Lemma 3.8 that
$$\begin{aligned} \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Z})\big)^p &\le \Big(d^{(X\sqcup Y\sqcup Z,u_{XYZ})}_{\mathrm{W},p}(\mu_X,\mu_Z)\Big)^p\\ &\le \Big(d^{(X\sqcup Y\sqcup Z,u_{XYZ})}_{\mathrm{W},p}(\mu_X,\mu_Y)\Big)^p + \Big(d^{(X\sqcup Y\sqcup Z,u_{XYZ})}_{\mathrm{W},p}(\mu_Y,\mu_Z)\Big)^p\\ &= \Big(d^{(X\sqcup Y,u_{XY})}_{\mathrm{W},p}(\mu_X,\mu_Y)\Big)^p + \Big(d^{(Y\sqcup Z,u_{YZ})}_{\mathrm{W},p}(\mu_Y,\mu_Z)\Big)^p\\ &= \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\big)^p + \big(u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{Y},\mathcal{Z})\big)^p. \end{aligned}$$
This gives the claim for $p < \infty$. ∎

It is important to note that the topology induced on $\mathcal{U}^w$ by $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, is different from the one induced by $d^{\mathrm{sturm}}_{\mathrm{GW},p}$. This is well illustrated by the following example.

Example 3.9 ($u^{\mathrm{sturm}}_{\mathrm{GW},p}$ and $d^{\mathrm{sturm}}_{\mathrm{GW},p}$ induce different topologies). This example is an adaptation of Mémoli et al. (2019, Example 3.14). For each $a > 0$, denote by $\Delta(a)$ the two-point metric space with interpoint distance $a$. Endow $\Delta(a)$ with the uniform probability measure $\mu_a$ and denote the corresponding ultrametric measure space by $\hat{\Delta}(a)$. Now, let $\mathcal{X} := \hat{\Delta}(1)$ and let $\mathcal{X}_n := \hat{\Delta}(1 + \frac{1}{n})$ for $n \in \mathbb{N}$. It is easy to check that for any $1 \le p \le \infty$,
$$d^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{X}_n) = \frac{1}{2n} \quad\text{and}\quad u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{X}_n) = 2^{-\frac{1}{p}}\Big(1 + \frac{1}{n}\Big),$$
where we adopt the convention that $1/\infty = 0$. Hence, as $n$ goes to infinity, $\mathcal{X}_n$ converges to $\mathcal{X}$ in the sense of $d^{\mathrm{sturm}}_{\mathrm{GW},p}$, but not in the sense of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, for any $1 \le p \le \infty$.

3.1.1. Alternative representations of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$. In this subsection we derive alternative representations of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, as defined in (10), for finite ultrametric measure spaces. We mainly focus on the case $p < \infty$; however, it turns out that the results also hold for $p = \infty$. In Section 3.3, we will further prove that $u^{\mathrm{sturm}}_{\mathrm{GW},\infty} = u_{\mathrm{GW},\infty}$ (see Theorem 3.33) and study the implications of this in Section 3.2.

Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces and recall the original definition of $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $p \in [1,\infty]$, given in (10), i.e.,
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{Z,\phi,\psi} d^{(Z,u_Z)}_{\mathrm{W},p}(\phi_\#\mu_X,\psi_\#\mu_Y),$$
where $\phi: X \to Z$ and $\psi: Y \to Z$ are isometric embeddings into an ultrametric space $(Z,u_Z)$. It turns out that we only need to consider relatively few possibilities of mapping two ultrametric spaces into a common ultrametric space. This is illustrated in Figure 3, which shows two ultrametric spaces and two possibilities for a common ultrametric space $Z$. Indeed, it is straightforward to write down all reasonable embeddings and target spaces. We define the set
$$\mathcal{A} := \{(A,\varphi) \mid A \subseteq X \text{ and } \varphi: A \hookrightarrow Y \text{ is an isometric embedding}\}. \tag{20}$$
Clearly, $\mathcal{A} \neq \emptyset$, since for each $x \in X$ we have $\{(\{x\},\varphi_y)\}_{y\in Y} \subseteq \mathcal{A}$, where $\varphi_y$ is the map sending $x$ to $y \in Y$. The following example of elements in $\mathcal{A}$ is used in the sequel.

Figure 3.
Common ultrametric spaces:
Representation of the two kinds of ultrametric spaces $Z$ (middle and right) into which we can isometrically embed the spaces $X$ and $Y$ (left).

Example 3.10.
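Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces and let $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$, and let $u^{-1}(0)$ denote the set $\{(x,y) \in X \times Y \mid u(x,y) = 0\}$. If $u^{-1}(0) \neq \emptyset$, we define $A := \pi_X(u^{-1}(0)) \subseteq X$. Then, the map $\varphi: A \to Y$ defined by sending $x \in A$ to the (unique) $y \in Y$ such that $u(x,y) = 0$ is an isometric embedding, and thus $(A,\varphi) \in \mathcal{A}$.

Let $\mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ denote the collection of all admissible pseudo-ultrametrics on $X \sqcup Y$, where $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ is called admissible if there exists no $u^* \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ such that $u^* \neq u$ and $u^*(x,y) \le u(x,y)$ for all $x,y \in X \sqcup Y$.

Lemma 3.11.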
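For any $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$, $\mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y) \neq \emptyset$. Moreover,
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)} d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y).$$
Combined with Example 3.10, the following result implies that each $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ gives rise to an element in $\mathcal{A}$.

Lemma 3.12.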
Given $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$, for each $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ we have $u^{-1}(0) \neq \emptyset$.

Proofs of Lemma 3.11 and Lemma 3.12 are given in Appendix B.2.

Now, fix two finite spaces $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$. Let $(A,\varphi) \in \mathcal{A}$ and let $Z_A = X \sqcup (Y \setminus \varphi(A)) \subseteq X \sqcup Y$. Furthermore, define $u_{Z_A}: Z_A \times Z_A \to \mathbb{R}_{\ge 0}$ as follows:
(1) $u_{Z_A}|_{X\times X} := u_X$ and $u_{Z_A}|_{(Y\setminus\varphi(A))\times(Y\setminus\varphi(A))} := u_Y|_{(Y\setminus\varphi(A))\times(Y\setminus\varphi(A))}$;
(2) for any $x \in A$ and $y \in Y \setminus \varphi(A)$, define $u_{Z_A}(x,y) := u_Y(y,\varphi(x))$;
(3) for $x \in X \setminus A$ and $y \in Y \setminus \varphi(A)$, let $u_{Z_A}(x,y) := \inf\{\max(u_X(x,a),u_Y(\varphi(a),y)) \mid a \in A\}$;
(4) for any $x \in X$ and $y \in Y \setminus \varphi(A)$, set $u_{Z_A}(y,x) := u_{Z_A}(x,y)$.
Then, $(Z_A,u_{Z_A})$ is an ultrametric space such that $X$ and $Y$ can be mapped isometrically into $Z_A$ (see Zarichnyi (2005, Lemma 1.1)). Let $\phi^X_{(A,\varphi)}$ and $\psi^Y_{(A,\varphi)}$ denote the corresponding isometric embeddings of $X$ and $Y$, respectively.
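To make rules (1) to (4) concrete, the following Python sketch builds the distance matrix of $(Z_A,u_{Z_A})$ from the distance matrices of finite spaces $X$ and $Y$ and a partial isometry $\varphi$, and checks the strong triangle inequality. It is an illustration under stated assumptions, not the authors' implementation: points are integer indices, $\varphi$ is passed as a dictionary `phi` from indices of $A$ to indices of $Y$, $A$ is assumed to be nonempty (so that the infimum in rule (3) is a minimum over a finite nonempty set), and the helper names `build_z_a` and `is_ultrametric` are hypothetical.

```python
from itertools import product
from typing import Dict, List

Matrix = List[List[float]]


def is_ultrametric(u: Matrix) -> bool:
    """Zero diagonal, symmetry, and u[i][k] <= max(u[i][j], u[j][k]) for all i, j, k."""
    n = len(u)
    if any(u[i][i] != 0 for i in range(n)):
        return False
    return all(
        u[i][j] == u[j][i] and u[i][k] <= max(u[i][j], u[j][k])
        for i, j, k in product(range(n), repeat=3)
    )


def build_z_a(uX: Matrix, uY: Matrix, phi: Dict[int, int]) -> Matrix:
    """Distance matrix of Z_A, following rules (1)-(4).

    Points of X keep their indices 0..n-1; the points of Y outside phi(A) are
    appended after them in increasing order. `phi` maps the indices of A
    (a nonempty subset of X) to indices of Y.
    """
    if not phi:
        raise ValueError("this sketch assumes that A is nonempty")
    if any(uX[a][b] != uY[phi[a]][phi[b]] for a in phi for b in phi):
        raise ValueError("phi is not an isometric embedding of A into Y")
    n = len(uX)
    image = set(phi.values())
    rest = [y for y in range(len(uY)) if y not in image]  # points of Y outside phi(A)
    size = n + len(rest)
    z: Matrix = [[0.0] * size for _ in range(size)]
    for i, j in product(range(n), repeat=2):  # rule (1) on X x X
        z[i][j] = uX[i][j]
    for (a, ya), (b, yb) in product(enumerate(rest), repeat=2):  # rule (1) on the rest of Y
        z[n + a][n + b] = uY[ya][yb]
    for x, (a, y) in product(range(n), enumerate(rest)):
        if x in phi:  # rule (2): x lies in A
            d = uY[y][phi[x]]
        else:  # rule (3): x lies outside A; minimum over the finite set A
            d = min(max(uX[x][a2], uY[phi[a2]][y]) for a2 in phi)
        z[x][n + a] = z[n + a][x] = d  # rule (4): symmetry
    return z


if __name__ == "__main__":
    uX = [[0, 1], [1, 0]]
    uY = [[0, 2], [2, 0]]
    z = build_z_a(uX, uY, {0: 0})  # A = {x_0}, phi(x_0) = y_0
    print(z, is_ultrametric(z))  # prints [[0, 1, 2], [1, 0, 2], [2, 2, 0]] True
```

In the example, $y_0$ is identified with $x_0$ and $y_1$ becomes the third point of $Z_A$, at distance $2$ from both points of $X$; enumerating all pairs in $\mathcal{A}$ with such a routine is exactly the brute-force approach discussed in Remark 3.15.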
Theorem 3.13.
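Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces. Then, we have for each $p \in [1,\infty]$ that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{(A,\varphi)\in\mathcal{A}} d^{Z_A}_{\mathrm{W},p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big). \tag{21}$$

Proof.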
By Lemma 3.11 it is sufficient to prove that each $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ induces some $(A,\varphi) \in \mathcal{A}$ such that
$$d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y) \ge d^{Z_A}_{\mathrm{W},p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big).$$
Let $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$. We define $A_0 := \{x \in X \mid \exists\, y \in Y \text{ such that } u(x,y) = 0\}$ ($A_0 \neq \emptyset$ by Lemma 3.12). By Example 3.10, the map $\varphi_0: A_0 \to Y$ taking $x$ to the $y$ with $u(x,y) = 0$ satisfies $(A_0,\varphi_0) \in \mathcal{A}$. For $k \ge 0$, we write $\phi_k := \phi^X_{(A_k,\varphi_k)}$ and $\psi_k := \psi^Y_{(A_k,\varphi_k)}$.

If $u(x,y) \ge u_{Z_{A_0}}(\phi_0(x),\psi_0(y))$ holds for all $(x,y) \in X \times Y$, then we set $A := A_0$ and $\varphi := \varphi_0$. Since the cost of every coupling of $\mu_X$ and $\mu_Y$ with respect to $u$ is then at least its cost with respect to $u_{Z_A}(\phi_0(\cdot),\psi_0(\cdot))$, this gives
$$d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y) \ge d^{Z_A}_{\mathrm{W},p}\big((\phi_0)_\#\mu_X,(\psi_0)_\#\mu_Y\big).$$
Otherwise, there exists $(x,y) \in (X \setminus A_0) \times (Y \setminus \varphi_0(A_0))$ such that $u(x,y) < u_{Z_{A_0}}(\phi_0(x),\psi_0(y))$ (if $x \in A_0$ or $y \in \varphi_0(A_0)$, then we must have $u(x,y) \ge u_{Z_{A_0}}(\phi_0(x),\psi_0(y))$). Let $(x_1,y_1) \in (X \setminus A_0) \times (Y \setminus \varphi_0(A_0))$ be such that
$$u(x_1,y_1) = \min\Big\{u(x,y) \;\Big|\; (x,y) \in (X\setminus A_0)\times(Y\setminus\varphi_0(A_0)) \text{ and } u(x,y) < u_{Z_{A_0}}\big(\phi_0(x),\psi_0(y)\big)\Big\} > 0.$$
The existence of $(x_1,y_1)$ follows from the finiteness of $X$ and $Y$. It is easy to check that $\varphi_0$ extends to an isometry from $A_0 \cup \{x_1\}$ to $\varphi_0(A_0) \cup \{y_1\}$ by taking $x_1$ to $y_1$. We denote this new isometry by $\varphi_1$ and set $A_1 := A_0 \cup \{x_1\}$. If $u(x,y) \ge u_{Z_{A_1}}(\phi_1(x),\psi_1(y))$ for all $(x,y) \in X \times Y$, then we define $A := A_1$ and $\varphi := \varphi_1$. Otherwise, we continue the process to obtain $A_2, A_3, \ldots$. This process eventually stops since we are considering finite spaces. Suppose the process stops at $A_n$; then $A := A_n$ and $\varphi := \varphi_n$ satisfy $u(x,y) \ge u_{Z_A}\big(\phi^X_{(A,\varphi)}(x),\psi^Y_{(A,\varphi)}(y)\big)$ for all $(x,y) \in X \times Y$. Therefore,
$$d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y) \ge d^{Z_A}_{\mathrm{W},p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big).$$
Since $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$ is arbitrary, this gives the claim. ∎

In fact, the possibilities of pairs in $\mathcal{A}$ in (21) can be further reduced.
We call a pair $(A,\varphi) \in \mathcal{A}$ maximal if for all pairs $(B,\varphi') \in \mathcal{A}$ with $A \subseteq B$ and $\varphi'|_A = \varphi$ it holds that $A = B$. We denote by $\mathcal{A}^* \subseteq \mathcal{A}$ the collection of all maximal pairs. The subsequent direct consequence of Theorem 3.13 demonstrates that in order to calculate $u^{\mathrm{sturm}}_{\mathrm{GW},p}$, $1 \le p \le \infty$, it is sufficient to consider only spaces $(Z_A,u_{Z_A})$ induced by maximal pairs in $\mathcal{A}^*$.

Corollary 3.14.
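Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$ be finite spaces. Then, we have for each $p \in [1,\infty]$ that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \inf_{(A,\varphi)\in\mathcal{A}^*} d^{Z_A}_{\mathrm{W},p}\Big(\big(\phi^X_{(A,\varphi)}\big)_\#\mu_X,\big(\psi^Y_{(A,\varphi)}\big)_\#\mu_Y\Big). \tag{22}$$

Remark 3.15.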
Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite ultrametric measure spaces. The representation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1 \le p \le \infty$, given by Theorem 3.13 is very explicit and recasts the computation of $u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1 \le p \le \infty$, as a combinatorial problem. Using the ultrametric Gromov-Hausdorff distance (see (9)), it is possible to determine whether two ultrametric spaces are isometric in polynomial time (Mémoli et al., 2019, Theorem 5.7). However, this is clearly not sufficient to identify all $(A,\varphi) \in \mathcal{A}^*$ in polynomial time, especially since for a given viable $A \subseteq X$ there are usually multiple ways to define the corresponding map $\varphi$. Furthermore, for $1 \le p < \infty$ we have neither been able to further restrict the set $\mathcal{A}^*$ nor to identify the optimal pair $(A^*,\varphi^*)$. This leaves only a brute-force approach, which is computationally not feasible. On the other hand, for $p = \infty$ we are able to explicitly construct the optimal pair $(A^*,\varphi^*)$ (see Theorem 3.34).

3.2. The ultrametric Gromov-Wasserstein distance.
Next, we consider basic properties of $u_{\mathrm{GW},p}$ and prove the analogue of Theorem 3.5, i.e., we verify that $u_{\mathrm{GW},p}$ is also a $p$-metric on the collection of ultrametric measure spaces, $1 \le p \le \infty$. We start with the following obvious observation:

Lemma 3.16.
For any $p \in [1,\infty]$, we always have that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$.

The subsequent proposition collects two basic properties of $u_{\mathrm{GW},p}$ which are also shared by $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ (cf. Proposition 3.4).

Proposition 3.17.
Let $\mathcal{X},\mathcal{Y} \in \mathcal{U}^w$. Then, the following hold:
(1) For any $1 \le p \le q \le \infty$, it holds that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \le u_{\mathrm{GW},q}(\mathcal{X},\mathcal{Y})$;
(2) We have that $\lim_{p\to\infty} u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$.

Proof. (1) By Jensen's inequality we have that $\mathrm{dis}^{\mathrm{ult}}_p(\mu) \le \mathrm{dis}^{\mathrm{ult}}_q(\mu)$ for any $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. Therefore, $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \le u_{\mathrm{GW},q}(\mathcal{X},\mathcal{Y})$.
(2) By (1), $L := \lim_{n\to\infty} u_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y})$ exists and $L \le u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. To prove the opposite inequality, let $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^n\,\mu_n(dx\times dy)\,\mu_n(dx'\times dy')\right)^{\frac{1}{n}} = u_{\mathrm{GW},n}(\mathcal{X},\mathcal{Y}).$$
By Lemma B.2, $\{\mu_n\}_{n\in\mathbb{N}}$ converges weakly (after taking an appropriate subsequence) to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. Let $M = \sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)} \Delta(u_X(x,x'),u_Y(y,y'))$ and, given any $\varepsilon > 0$, let
$$U = \big\{((x,y),(x',y')) \in X\times Y\times X\times Y \mid \Delta(u_X(x,x'),u_Y(y,y')) > M - \varepsilon\big\}.$$
Then, we have $\mu \otimes \mu(U) > 0$. As $\mu_n$ converges weakly to $\mu$, we have that $\mu_n \otimes \mu_n$ converges weakly to $\mu \otimes \mu$. Since $U$ is open, there exists $\varepsilon' > 0$ such that $\mu_n \otimes \mu_n(U) > \mu \otimes \mu(U) - \varepsilon' > 0$ for $n$ large enough. Therefore,
$$\left(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^n\,\mu_n(dx\times dy)\,\mu_n(dx'\times dy')\right)^{\frac{1}{n}} \ge \big(\mu_n \otimes \mu_n(U)\big)^{\frac{1}{n}}(M - \varepsilon) \ge \big(\mu \otimes \mu(U) - \varepsilon'\big)^{\frac{1}{n}}(M - \varepsilon).$$
Letting $n \to \infty$, we obtain $L \ge M - \varepsilon$. Since $\varepsilon > 0$ is arbitrary, $L \ge M \ge u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$. ∎

Furthermore, it is possible to write down $u_{\mathrm{GW},p}$, $1 \le p \le \infty$, explicitly in some simple settings. Let $\mathcal{X} = (X,u_X,\mu_X)$ be an ultrametric measure space. For $1 \le p < \infty$, its $p$-diameter (see e.g. Mémoli (2011)) is defined as
$$\mathrm{diam}_p(\mathcal{X}) := \left(\iint_{X\times X}\big(u_X(x,x')\big)^p\,\mu_X(dx)\,\mu_X(dx')\right)^{1/p},$$
and for $p = \infty$ as $\mathrm{diam}_\infty(\mathcal{X}) := \sup_{x,x'\in\mathrm{supp}(\mu_X)} u_X(x,x')$. Then, one can show the subsequent proposition.
Proposition 3.18.
Let $\ast \in \mathcal{U}^w$ be the one-point space. Then, for any $1 \le p \le \infty$, it holds that $u_{\mathrm{GW},p}(\mathcal{X},\ast) = \mathrm{diam}_p(\mathcal{X})$.

Proof.
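Denote by $\mu$ the unique coupling $\mu_X \otimes \delta_\ast$ between $\mu_X$ and $\delta_\ast$. Then, for any $p < \infty$, since $u_\ast \equiv 0$ and $\Delta(a,0) = a$ for all $a \ge 0$, we have
$$u_{\mathrm{GW},p}(\mathcal{X},\ast) = \left(\iint_{X\times\ast\times X\times\ast}\big(\Delta(u_X(x,x'),u_\ast(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\right)^{1/p} = \left(\iint_{X\times X}\big(u_X(x,x')\big)^p\,\mu_X(dx)\,\mu_X(dx')\right)^{1/p} = \mathrm{diam}_p(\mathcal{X}).$$
The case $p = \infty$ follows by analogous arguments. ∎

Next, we verify that $u_{\mathrm{GW},p}$ is indeed a metric on the collection of ultrametric measure spaces.

Theorem 3.19.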
The ultrametric Gromov-Wasserstein distance $u_{\mathrm{GW},p}$ is a $p$-metric on the collection $\mathcal{U}^w$ of compact ultrametric measure spaces. In particular, when $p = \infty$, $u_{\mathrm{GW},\infty}$ is an ultrametric. The proof is based on the following result about the existence of optimal couplings in (13).
Proposition 3.20.
Let $\mathcal{X} = (X,u_X,\mu_X)$ and $\mathcal{Y} = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, for any $p \in [1,\infty]$, there always exists an optimal coupling $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ such that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \mathrm{dis}^{\mathrm{ult}}_p(\mu)$.

Proof. We only prove the claim for $p < \infty$, since the case $p = \infty$ can be proven in a similar manner. Let $\mu_n \in \mathcal{C}(\mu_X,\mu_Y)$ be such that
$$\left(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^p\,d\mu_n(x,y)\,d\mu_n(x',y')\right)^{\frac{1}{p}} \le u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) + \frac{1}{n}.$$
By Lemma B.2, $\{\mu_n\}_{n\in\mathbb{N}}$ converges weakly to some $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ (after taking an appropriate subsequence). Then, by the boundedness and continuity of $\Delta(u_X,u_Y)$ on $X\times Y\times X\times Y$ (cf. Lemma B.5) as well as the weak convergence of $\mu_n \otimes \mu_n$, we have that $\mathrm{dis}^{\mathrm{ult}}_p(\mu) = \lim_{n\to\infty}\mathrm{dis}^{\mathrm{ult}}_p(\mu_n) \le u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$. Hence, $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = \mathrm{dis}^{\mathrm{ult}}_p(\mu)$. ∎

With Proposition 3.20 at our disposal, we can demonstrate the analogue of Theorem 3.5 for $u_{\mathrm{GW},p}$, $1 \le p \le \infty$.

Proof of Theorem 3.19.
It is clear that $u_{\mathrm{GW},p}$ is symmetric and that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ if $\mathcal{X} \cong_w \mathcal{Y}$. Furthermore, we remark that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) \ge d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$ by Lemma 3.16. Since $d_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ implies $\mathcal{X} \cong_w \mathcal{Y}$ (see Mémoli (2011)), we obtain that $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y}) = 0$ implies $\mathcal{X} \cong_w \mathcal{Y}$. It remains to verify the $p$-triangle inequality. To this end, we only prove the case $p < \infty$, whereas the case $p = \infty$ follows by analogous arguments.

Now let $\mathcal{X},\mathcal{Y},\mathcal{Z}$ be three ultrametric measure spaces. Let $\mu_{XY} \in \mathcal{C}(\mu_X,\mu_Y)$ and $\mu_{YZ} \in \mathcal{C}(\mu_Y,\mu_Z)$ be optimal couplings (which exist by Proposition 3.20). By the Gluing Lemma (Villani, 2003, Lemma 7.6), there exists a measure $\mu_{XYZ} \in \mathcal{P}(X\times Y\times Z)$ with marginals $\mu_{XY}$ on $X\times Y$ and $\mu_{YZ}$ on $Y\times Z$. Further, we define $\mu_{XZ} := (\pi_{XZ})_\#\mu_{XYZ} \in \mathcal{P}(X\times Z)$. Then,
$$\begin{aligned}
\big(u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Z})\big)^p &\le \iint_{X\times Z\times X\times Z}\big(\Delta(u_X(x,x'),u_Z(z,z'))\big)^p\,\mu_{XZ}(dx\times dz)\,\mu_{XZ}(dx'\times dz')\\
&= \iint_{(X\times Y\times Z)^2}\big(\Delta(u_X(x,x'),u_Z(z,z'))\big)^p\,\mu_{XYZ}(dx\times dy\times dz)\,\mu_{XYZ}(dx'\times dy'\times dz')\\
&\le \iint_{(X\times Y\times Z)^2}\big(\Delta(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu_{XYZ}(dx\times dy\times dz)\,\mu_{XYZ}(dx'\times dy'\times dz')\\
&\quad + \iint_{(X\times Y\times Z)^2}\big(\Delta(u_Y(y,y'),u_Z(z,z'))\big)^p\,\mu_{XYZ}(dx\times dy\times dz)\,\mu_{XYZ}(dx'\times dy'\times dz')\\
&= \iint_{X\times Y\times X\times Y}\big(\Delta(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu_{XY}(dx\times dy)\,\mu_{XY}(dx'\times dy')\\
&\quad + \iint_{Y\times Z\times Y\times Z}\big(\Delta(u_Y(y,y'),u_Z(z,z'))\big)^p\,\mu_{YZ}(dy\times dz)\,\mu_{YZ}(dy'\times dz')\\
&= \big(u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})\big)^p + \big(u_{\mathrm{GW},p}(\mathcal{Y},\mathcal{Z})\big)^p,
\end{aligned}$$
where the second inequality follows from the fact that $\Delta$ is an ultrametric on $\mathbb{R}_{\ge 0}$ (see Lemma A.3) and the observation that an ultrametric is automatically a $p$-metric for any $p \in [1,\infty)$ (Mémoli et al., 2019, Proposition 1.16). ∎

Figure 4.
Weighted Quotient:
An ultrametric measure space (black) and its weighted quotient at level $t$ (red).

Remark 3.21.
By the same arguments as for $d_{\mathrm{GW},p}$, $1 \le p < \infty$ (Mémoli, 2011, Sec. 7), it follows that for two finite ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ the computation of $u_{\mathrm{GW},p}(\mathcal{X},\mathcal{Y})$, $1 \le p < \infty$, boils down to solving a (non-convex) quadratic program. This is in general NP-hard (Pardalos and Vavasis, 1991). On the other hand, for $p = \infty$, we will derive a polynomial time algorithm to determine $u_{\mathrm{GW},\infty}(\mathcal{X},\mathcal{Y})$ (cf. Section 3.2.1).

3.2.1. Alternative representations of $u_{\mathrm{GW},\infty}$. In the following, we derive an alternative representation of $u_{\mathrm{GW},\infty}$ that resembles the one of $u_{\mathrm{GH}}$ derived in Mémoli et al. (2019, Theorem 5.7). It also leads to a polynomial time algorithm for the computation of $u_{\mathrm{GW},\infty}$. For this purpose, we define the weighted quotient of an ultrametric measure space. Let $\mathcal{X} = (X,u_X,\mu_X) \in \mathcal{U}^w$ and let $t \ge 0$. Then, the weighted quotient of $\mathcal{X}$ at level $t$ is given as $\mathcal{X}_t = (X_t,u_{X_t},\mu_{X_t})$, where $(X_t,u_{X_t})$ is the quotient of the ultrametric space $(X,u_X)$ at level $t$ (see Section 2.2) and $\mu_{X_t} \in \mathcal{P}(X_t)$ is the pushforward of $\mu_X$ under the canonical quotient map $(X,u_X) \to (X_t,u_{X_t})$. Figure 4 illustrates the weighted quotient in a simple example. Based on this definition, we show the following theorem.

Theorem 3.22.
Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two compact ultrametric measure spaces. Then, it holds that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})=\min\{t\geq 0\mid \mathcal{X}_t\cong_w\mathcal{Y}_t\}.$$

Remark 3.23.
The weighted quotients $\mathcal{X}_t$ and $\mathcal{Y}_t$ can be considered as vertex-weighted, rooted trees and thus it is possible to verify $\mathcal{X}_t\cong_w\mathcal{Y}_t$ in polynomial time (Aho and Hopcroft, 1974). In consequence, we obtain a polynomial time algorithm for the calculation of $u_{GW,\infty}$. See Section 7.1.2 for details.

Proof of Theorem 3.22.
We first prove that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})=\inf\{t\geq 0\mid \mathcal{X}_t\cong_w\mathcal{Y}_t\}\tag{23}$$
and then show that the infimum is attained. Since $\mathcal{X}_0\cong_w\mathcal{X}$ and $\mathcal{Y}_0\cong_w\mathcal{Y}$, if $\mathcal{X}_0\cong_w\mathcal{Y}_0$, then $\mathcal{X}\cong_w\mathcal{Y}$ and thus by Theorem 3.19
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})=0=\inf\{t\geq 0\mid \mathcal{X}_t\cong_w\mathcal{Y}_t\}.$$
Now, assume that $\mathcal{X}_t\cong_w\mathcal{Y}_t$ for some $t>0$. By Lemma 2.7, for some $n\in\mathbb{N}$ we can write $X_t=\{[x_1]_t,\dots,[x_n]_t\}$ and $Y_t=\{[y_1]_t,\dots,[y_n]_t\}$ such that $u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)$ and $\mu_X([x_i]_t)=\mu_Y([y_i]_t)$. Let $\mu_X^i:=\mu_X|_{[x_i]_t}$ and $\mu_Y^i:=\mu_Y|_{[y_i]_t}$ for all $i=1,\dots,n$, and let
$$\mu:=\sum_{i=1}^n\frac{1}{\mu_X([x_i]_t)}\,\mu_X^i\otimes\mu_Y^i.$$
Since $\mu_X([x_i]_t)=\mu_Y([y_i]_t)>0$, it is easy to check that $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ and $\mathrm{supp}(\mu)=\bigcup_{i=1}^n[x_i]_t\times[y_i]_t$. Assume $(x,y)\in[x_i]_t\times[y_i]_t$ and $(x',y')\in[x_j]_t\times[y_j]_t$. If $i\neq j$, then $u_{X_t}([x_i]_t,[x_j]_t)=u_{Y_t}([y_i]_t,[y_j]_t)$ and thus
$$\Lambda_\infty(u_X(x,x'),u_Y(y,y'))=\Lambda_\infty\big(u_{X_t}([x_i]_t,[x_j]_t),u_{Y_t}([y_i]_t,[y_j]_t)\big)=0.$$
If $i=j$, then $u_X(x,x'),u_Y(y,y')\leq t$ and thus $\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\leq t$. In either case, we have that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})\leq\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\leq t.$$
Therefore, $u_{GW,\infty}(\mathcal{X},\mathcal{Y})\leq\inf\{t\geq 0\mid \mathcal{X}_t\cong_w\mathcal{Y}_t\}$.

Conversely, suppose $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ and let $t:=\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\Lambda_\infty(u_X(x,x'),u_Y(y,y'))$. By Mémoli (2011, Lemma 2.2), we know that $\mathrm{supp}(\mu)$ is a correspondence between $X$ and $Y$. It is easy to check that $\mathrm{dis}(\mathrm{supp}(\mu))=t$. We define a map $f_t:X_t\to Y_t$ by taking $[x]^X_t\in X_t$ to $[y]^Y_t\in Y_t$ such that $(x,y)\in\mathrm{supp}(\mu)$. It is easy to check that $f_t$ is well-defined and moreover that $f_t$ is an isometry (see for example the proof of Mémoli et al. (2019, Theorem 5.7)). Next, we prove that $f_t$ is actually an isomorphism between $\mathcal{X}_t$ and $\mathcal{Y}_t$. For any $[x]^X_t\in X_t$, choose $y\in Y$ with $(x,y)\in\mathrm{supp}(\mu)$, so that $[y]^Y_t=f_t([x]^X_t)$. If there exists $(x',y')\in\mathrm{supp}(\mu)$ such that $x'\in[x]^X_t$ and $y'\notin[y]^Y_t$, then $\Lambda_\infty(u_X(x,x'),u_Y(y,y'))=u_Y(y,y')>t=\mathrm{dis}(\mathrm{supp}(\mu))$, which is impossible. Consequently, $\mu\big([x]^X_t\times(Y\setminus[y]^Y_t)\big)=\mu\big((X\setminus[x]^X_t)\times[y]^Y_t\big)=0$ (the second equality follows by the symmetric argument), and thus
$$\mu_X([x]^X_t)=\mu([x]^X_t\times Y)=\mu([x]^X_t\times[y]^Y_t)=\mu(X\times[y]^Y_t)=\mu_Y([y]^Y_t).$$
Therefore, $f_t$ is an isomorphism between $\mathcal{X}_t$ and $\mathcal{Y}_t$. Consequently, we have that $u_{GW,\infty}(\mathcal{X},\mathcal{Y})\geq\inf\{t\geq 0\mid \mathcal{X}_t\cong_w\mathcal{Y}_t\}$ and hence (23) holds.

Now, we show that the infimum in (23) is attained. Let $\delta:=\inf\{t\geq 0\mid \mathcal{X}_t\cong_w\mathcal{Y}_t\}$. If $\delta>0$, then both $X_\delta$ and $Y_\delta$ are finite spaces. Then, if $\{t_n\}$ is a decreasing sequence converging to $\delta$, for $n$ large enough we actually have that $\mathcal{X}_{t_n}=\mathcal{X}_\delta$ and $\mathcal{Y}_{t_n}=\mathcal{Y}_\delta$. Choosing the sequence such that $\mathcal{X}_{t_n}\cong_w\mathcal{Y}_{t_n}$ for all $n$ (which is possible by the definition of $\delta$), we obtain that $\mathcal{X}_\delta\cong_w\mathcal{Y}_\delta$. If $\delta=0$, then by (23) we have that $u_{GW,\infty}(\mathcal{X},\mathcal{Y})=\delta=0$. By Theorem 3.19, $\mathcal{X}\cong_w\mathcal{Y}$, which is equivalent to $\mathcal{X}_\delta\cong_w\mathcal{Y}_\delta$. Therefore, the infimum in (23) is always attained. ∎
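Theorem 3.22 and Remark 3.23 suggest a direct procedure for finite spaces: compute the weighted quotients at increasing levels and stop at the first level at which they are isomorphic. The Python sketch below is one possible implementation under stated assumptions, not the algorithm of Section 7.1.2. It assumes that a finite ultrametric measure space is given as a symmetric distance matrix `D` plus a list of positive point masses `mu` summing to 1, stored as exact `Fraction`s so that masses can be compared for equality. Weighted isomorphism is tested by comparing recursive canonical forms of the dendrograms, in the spirit of the rooted-tree isomorphism test of Aho and Hopcroft. All function names (`weighted_quotient`, `canonical_form`, `u_gw_inf`) are hypothetical. Only the level $0$ and the distance values of $X$ and $Y$ need to be tested, since $\mathcal{X}_t$ and $\mathcal{Y}_t$ do not change between consecutive such values.

```python
from fractions import Fraction


def weighted_quotient(D, mu, t):
    """Weighted quotient of a finite ultrametric measure space at level t.

    Points with D[i][j] <= t are merged (closed t-balls partition an
    ultrametric space) and the masses are pushed forward to the classes.
    """
    reps, label = [], []
    for i in range(len(mu)):
        for k, r in enumerate(reps):
            if D[i][r] <= t:
                label.append(k)
                break
        else:  # i is not within distance t of any representative
            label.append(len(reps))
            reps.append(i)
    # Distinct classes lie at distance > t; this is the quotient distance.
    Dq = [[D[a][b] if a != b else 0 for b in reps] for a in reps]
    muq = [sum((mu[i] for i in range(len(mu)) if label[i] == k), 0)
           for k in range(len(reps))]
    return Dq, muq


def canonical_form(D, mu, idx=None):
    """Canonical form of the dendrogram with leaf masses.

    Two finite ultrametric measure spaces (positive masses) are isomorphic
    iff their canonical forms are equal.
    """
    if idx is None:
        idx = list(range(len(mu)))
    if len(idx) == 1:
        return ("leaf", mu[idx[0]])
    d = max(D[i][j] for i in idx for j in idx)
    groups = []  # maximal sub-balls: classes of the relation D < d
    for i in idx:
        for g in groups:
            if D[i][g[0]] < d:
                g.append(i)
                break
        else:
            groups.append([i])
    children = sorted(canonical_form(D, mu, g) for g in groups)
    return ("node", d, tuple(children))


def u_gw_inf(DX, muX, DY, muY):
    """u_{GW,infty} via Theorem 3.22: smallest t with X_t, Y_t isomorphic."""
    levels = {0} | {d for row in DX for d in row} | {d for row in DY for d in row}
    for t in sorted(levels):
        cx = canonical_form(*weighted_quotient(DX, muX, t))
        cy = canonical_form(*weighted_quotient(DY, muY, t))
        if cx == cy:
            return t
    # Not reached for probability measures: at the largest level both
    # quotients are a single point of mass 1.
    raise ValueError("both mass vectors must sum to 1")
```

For example, two-point spaces with uniform masses and interpoint distances 1 and 2 give the value 2, in agreement with Corollary 3.25 below; the three-point space with distances $(1,2,2)$ and masses $(1/4,1/4,1/2)$ against the uniform two-point space at distance 2 gives 1, since the two closest points merge at level 1. Exact fractions are used because floating-point masses could make isomorphic quotients look different.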
The representations of $u_{GH}$ in Theorem 2.8 and $u_{GW,\infty}$ in Theorem 3.22 strongly resemble each other. As a direct consequence of both Theorem 2.8 and Theorem 3.22, we obtain the following comparison between the two metrics.

Corollary 3.24.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Then, it holds that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})\geq u_{GH}(X,Y).\tag{24}$$
The inequality in (24) is sharp and we illustrate this as follows. By Mémoli et al. (2019, Corollary 5.8) we know that if the considered ultrametric spaces $(X,u_X)$ and $(Y,u_Y)$ have different diameters (w.l.o.g. $\mathrm{diam}(X)<\mathrm{diam}(Y)$), then $u_{GH}(X,Y)=\mathrm{diam}(Y)$. The same statement also holds for $u_{GW,\infty}$.

Corollary 3.25.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ be such that $\mathrm{diam}(X)<\mathrm{diam}(Y)$. Then,
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})=\mathrm{diam}(Y)=u_{GH}(X,Y).$$

Proof.
The rightmost equality follows directly from Corollary 5.8 of Mémoli et al. (2019). As for the leftmost equality, let $t:=\mathrm{diam}(Y)$; then it is obvious that $\mathcal{X}_t\cong_w *\cong_w\mathcal{Y}_t$. For any $s<\mathrm{diam}(Y)$, we have $\mathrm{diam}(Y_s)=\mathrm{diam}(Y)>\mathrm{diam}(X)\geq\mathrm{diam}(X_s)$, so $\mathcal{X}_s\not\cong_w\mathcal{Y}_s$; in particular, for $s\in(\mathrm{diam}(X),\mathrm{diam}(Y))$ we have $\mathcal{X}_s\cong_w *$ whereas $\mathcal{Y}_s\not\cong_w *$. By Theorem 3.22, $u_{GW,\infty}(\mathcal{X},\mathcal{Y})=t=\mathrm{diam}(Y)$. ∎

3.3. The relation between $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$. In this subsection, we study the relation between $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$, $1\leq p\leq\infty$. For this purpose, we first demonstrate that it is sufficient to consider the two cases $p=1$ and $p=\infty$.

For each $\alpha>0$, we define a function $S_\alpha:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ by $x\mapsto x^\alpha$. Given an ultrametric space $(X,u_X)$ and $\alpha>0$, we abuse the notation and denote by $S_\alpha(X)$ the new space $(X,S_\alpha\circ u_X)$. It is obvious that $S_\alpha(X)$ is still an ultrametric space. This transformation of metric spaces is also known as the snowflake transform (David et al., 1997). An important observation is that the snowflake transform relates the $p$-Wasserstein pseudometric on the pseudo-ultrametric space $X$ with the 1-Wasserstein pseudometric on the space $S_p(X)$, $1\leq p<\infty$.

Lemma 3.26.
Given a pseudo-ultrametric space $(X,u_X)$ and $p\geq 1$, we have for any $\alpha,\beta\in\mathcal{P}(X)$ that
$$d^{(X,u_X)}_{W,p}(\alpha,\beta)=\Big(d^{S_p(X)}_{W,1}(\alpha,\beta)\Big)^{1/p}.$$

Remark 3.27.
Since $S_p\circ u_X$ and $u_X$ induce the same topology and thus the same Borel sets on $X$, we have that $\mathcal{P}(X)=\mathcal{P}(S_p(X))$ and thus the expression $d^{S_p(X)}_{W,1}(\alpha,\beta)$ in the lemma is well defined.

Proof.
Suppose $\mu_1,\mu_2\in\mathcal{C}(\alpha,\beta)$ are optimal for $d^{(X,u_X)}_{W,p}(\alpha,\beta)$ and $d^{S_p(X)}_{W,1}(\alpha,\beta)$, respectively (see Appendix B.1 for the existence of $\mu_1$ and $\mu_2$). Then,
$$\Big(d^{(X,u_X)}_{W,p}(\alpha,\beta)\Big)^p=\iint_{X\times X}(u_X(x,y))^p\,\mu_1(dx\times dy)=\iint_{X\times X}S_p(u_X)(x,y)\,\mu_1(dx\times dy)\geq d^{S_p(X)}_{W,1}(\alpha,\beta),$$
and
$$d^{S_p(X)}_{W,1}(\alpha,\beta)=\iint_{X\times X}S_p(u_X)(x,y)\,\mu_2(dx\times dy)=\iint_{X\times X}(u_X(x,y))^p\,\mu_2(dx\times dy)\geq\Big(d^{(X,u_X)}_{W,p}(\alpha,\beta)\Big)^p.$$
Therefore, $d^{(X,u_X)}_{W,p}(\alpha,\beta)=\big(d^{S_p(X)}_{W,1}(\alpha,\beta)\big)^{1/p}$. ∎

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ denote two ultrametric measure spaces and let $1\leq p<\infty$. We denote by $S_p(\mathcal{X})$ the ultrametric measure space $(X,S_p\circ u_X,\mu_X)$. Given Lemma 3.26, it is not very surprising that the snowflake transform can also be used to relate $u_{GW,p}(\mathcal{X},\mathcal{Y})$ as well as $u^{\mathrm{sturm}}_{GW,p}(\mathcal{X},\mathcal{Y})$ with $u_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$ and $u^{\mathrm{sturm}}_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$, respectively.

Theorem 3.28.
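Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and let $p\in[1,\infty)$. Then,
$$\big(u_{GW,p}(\mathcal{X},\mathcal{Y})\big)^p=u_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))\quad\text{and}\quad\big(u^{\mathrm{sturm}}_{GW,p}(\mathcal{X},\mathcal{Y})\big)^p=u^{\mathrm{sturm}}_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y})).$$

Proof.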
Let $\mu\in\mathcal{C}(\mu_X,\mu_Y)$. Then,
$$
\begin{aligned}
&\iint_{X\times Y\times X\times Y}\big(\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\\
&\qquad=\iint_{X\times Y\times X\times Y}\Lambda_\infty\big(u_X(x,x')^p,u_Y(y,y')^p\big)\,\mu(dx\times dy)\,\mu(dx'\times dy').
\end{aligned}
$$
Infimizing over $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ on both sides, we obtain that $\big(u_{GW,p}(\mathcal{X},\mathcal{Y})\big)^p=u_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$. In order to prove the second part of the claim, let $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. Then, we have that
$$\iint_{X\times Y}(u(x,y))^p\,\mu(dx\times dy)=\iint_{X\times Y}(S_p\circ u)(x,y)\,\mu(dx\times dy).$$
Infimizing over $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ on both sides, we obtain that $\big(d^{(X\sqcup Y,u)}_{W,p}(\mu_X,\mu_Y)\big)^p=d^{(X\sqcup Y,S_p\circ u)}_{W,1}(\mu_X,\mu_Y)$. Finally, infimizing over $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$, we have that $\big(u^{\mathrm{sturm}}_{GW,p}(\mathcal{X},\mathcal{Y})\big)^p=u^{\mathrm{sturm}}_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))$. ∎

As a direct consequence, we obtain the following relation between $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$ and $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ for $p\in[1,\infty)$:
Corollary 3.29.
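For each $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$ is isometric to the snowflake transform of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$, i.e.,
$$S_p\big(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p}\big)\cong\big(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1}\big).$$

Proof.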
Consider the snowflake transform map $S_p:\mathcal{U}^w\to\mathcal{U}^w$ sending $\mathcal{X}\in\mathcal{U}^w$ to $S_p(\mathcal{X})\in\mathcal{U}^w$. It is obvious that $S_p$ is bijective. By Theorem 3.28, $S_p$ is an isometry from $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ to $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$. Therefore, $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$. ∎

Theorem 3.28 suggests that in order to study the relation between $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$ we only need to examine the cases $p=1$ and $p=\infty$.

3.3.1. The case $p=1$. We first study the relation between $u^{\mathrm{sturm}}_{GW,1}$ and $u_{GW,1}$ and afterwards use Theorem 3.28 to relate $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$, $1\leq p<\infty$. We start by showing the following theorem.

Theorem 3.30.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Then, we have
$$u^{\mathrm{sturm}}_{GW,1}(\mathcal{X},\mathcal{Y})\geq\tfrac{1}{2}\,u_{GW,1}(\mathcal{X},\mathcal{Y}).$$

Proof.
Let $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ be such that $u^{\mathrm{sturm}}_{GW,1}(\mathcal{X},\mathcal{Y})=\int u(x,y)\,\mu(dx\times dy)$. The existence of $u$ and $\mu$ follows from Proposition 3.6.

Claim 1:
For any $(x,y),(x',y')\in X\times Y$, we have
$$\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\leq\max(u(x,y),u(x',y'))\leq u(x,y)+u(x',y').$$
Assuming Claim 1, we have
$$
\begin{aligned}
&\iint_{X\times Y\times X\times Y}\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\,\mu(dx\times dy)\,\mu(dx'\times dy')\\
&\quad\leq\iint_{X\times Y\times X\times Y}u(x,y)\,\mu(dx\times dy)\,\mu(dx'\times dy')+\iint_{X\times Y\times X\times Y}u(x',y')\,\mu(dx\times dy)\,\mu(dx'\times dy')\\
&\quad=\int_{X\times Y}u(x,y)\,\mu(dx\times dy)+\int_{X\times Y}u(x',y')\,\mu(dx'\times dy')=2\,u^{\mathrm{sturm}}_{GW,1}(\mathcal{X},\mathcal{Y}).
\end{aligned}
$$
Therefore, $u^{\mathrm{sturm}}_{GW,1}(\mathcal{X},\mathcal{Y})\geq\frac{1}{2}\,u_{GW,1}(\mathcal{X},\mathcal{Y})$. Now, we finalize the proof by verifying Claim 1.

Proof of Claim 1:
We only need to show that $\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\leq\max(u(x,y),u(x',y'))$. If $u_X(x,x')=u_Y(y,y')$, then there is nothing to prove. Otherwise, we assume without loss of generality that $u_X(x,x')<u_Y(y,y')$, so that $\Lambda_\infty(u_X(x,x'),u_Y(y,y'))=u_Y(y,y')$. If $\max(u(x,y),u(x',y'))<u_Y(y,y')$, then by the strong triangle inequality we must have $u(x,y')=u_Y(y,y')=u(x',y)$. However, $u(x,y')\leq\max(u_X(x,x'),u(x',y'))<u_Y(y,y')$, which leads to a contradiction. Therefore, $\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\leq\max(u(x,y),u(x',y'))$. ∎

The following example verifies that the coefficient in Theorem 3.30 is tight.
Example 3.31.
For each $n\in\mathbb{N}$ with $n\geq 2$, let $\mathcal{X}_n$ be the three-point space $\Delta_3(1)$ (labeled by $\{x_1,x_2,x_3\}$) with a probability measure $\mu_{X_n}$ such that $\mu_{X_n}(x_1)=\mu_{X_n}(x_2)=\frac{1}{n}$ and $\mu_{X_n}(x_3)=1-\frac{2}{n}$. Let $Y=*$ and let $\mu_Y$ be the only probability measure on $Y$. Then, it is routine to check that $u_{GW,1}(\mathcal{X}_n,\mathcal{Y})=\frac{2}{n}\big(2-\frac{3}{n}\big)$ and $u^{\mathrm{sturm}}_{GW,1}(\mathcal{X}_n,\mathcal{Y})=\frac{2}{n}$. Therefore, we have
$$\lim_{n\to\infty}\frac{u_{GW,1}(\mathcal{X}_n,\mathcal{Y})}{u^{\mathrm{sturm}}_{GW,1}(\mathcal{X}_n,\mathcal{Y})}=2.$$
In Section 4, we will furthermore verify that $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$, $1\leq p<\infty$, are also topologically equivalent. However, for the moment we conclude the direct comparison of the two metrics by deriving their relation for general $p$ on the basis of Theorem 3.28 and Theorem 3.30.

Corollary 3.32.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and $p\in[1,\infty)$. Then, we have
$$u^{\mathrm{sturm}}_{GW,p}(\mathcal{X},\mathcal{Y})\geq 2^{-1/p}\,u_{GW,p}(\mathcal{X},\mathcal{Y}).$$

Proof.
Applying Theorem 3.28 and Theorem 3.30, we have that
$$u^{\mathrm{sturm}}_{GW,p}(\mathcal{X},\mathcal{Y})=\big(u^{\mathrm{sturm}}_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))\big)^{1/p}\geq\Big(\tfrac{1}{2}\,u_{GW,1}(S_p(\mathcal{X}),S_p(\mathcal{Y}))\Big)^{1/p}=2^{-1/p}\,u_{GW,p}(\mathcal{X},\mathcal{Y}).$$
∎

3.3.2. The case $p=\infty$. Now, we consider the relation between $u^{\mathrm{sturm}}_{GW,\infty}$ and $u_{GW,\infty}$. By taking the limit $p\to\infty$ in Corollary 3.32, one might expect that $u^{\mathrm{sturm}}_{GW,\infty}\geq u_{GW,\infty}$. In fact, we prove that equality holds.

Theorem 3.33.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Then, it holds that
$$u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})=u_{GW,\infty}(\mathcal{X},\mathcal{Y}).$$

Proof.
First we prove that $u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})\geq u_{GW,\infty}(\mathcal{X},\mathcal{Y})$. Indeed, for any $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and $\mu\in\mathcal{C}(\mu_X,\mu_Y)$, we have that
$$
\begin{aligned}
\sup_{(x,y)\in\mathrm{supp}(\mu)}u(x,y)&=\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\max(u(x,y),u(x',y'))\\
&\geq\sup_{(x,y),(x',y')\in\mathrm{supp}(\mu)}\Lambda_\infty(u_X(x,x'),u_Y(y,y'))\\
&\geq u_{GW,\infty}(\mathcal{X},\mathcal{Y}),
\end{aligned}
$$
where the first inequality follows from Claim 1 in the proof of Theorem 3.30. Then, by a standard limit argument, we conclude that $u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})\geq u_{GW,\infty}(\mathcal{X},\mathcal{Y})$.

Next, we prove that $u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})\leq\min\{t\geq 0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$. Suppose $t>0$ is such that $\mathcal{X}_t\cong_w\mathcal{Y}_t$ and let $\varphi:\mathcal{X}_t\to\mathcal{Y}_t$ be such an isomorphism. Then, we define a function $u:(X\sqcup Y)\times(X\sqcup Y)\to\mathbb{R}_{\geq 0}$ as follows:
(1) $u|_{X\times X}:=u_X$ and $u|_{Y\times Y}:=u_Y$;
(2) for any $(x,y)\in X\times Y$,
$$u(x,y):=\begin{cases}u_{Y_t}\big(\varphi([x]^X_t),[y]^Y_t\big),&\text{if }\varphi([x]^X_t)\neq[y]^Y_t,\\ t,&\text{if }\varphi([x]^X_t)=[y]^Y_t;\end{cases}$$
(3) for any $(y,x)\in Y\times X$, $u(y,x):=u(x,y)$.
Then, it is easy to verify that $u\in\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ and that $u$ is actually an ultrametric. Let $Z:=(X\sqcup Y,u)$. By Lemma 2.13, we have
$$u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})\leq d^Z_{W,\infty}(\mu_X,\mu_Y)=\max_{B\in V(Z)\setminus\{Z\},\ \mu_X(B)\neq\mu_Y(B)}\mathrm{diam}(B^*).$$
We verify that $d^Z_{W,\infty}(\mu_X,\mu_Y)\leq t$ as follows. It is obvious that $Z_t\cong X_t\cong Y_t$. Write $X_t=\{[x_i]^X_t\}_{i=1}^n$ and $Y_t=\{[y_i]^Y_t\}_{i=1}^n$ such that $[y_i]^Y_t=\varphi([x_i]^X_t)$ for each $i=1,\dots,n$. Then, $[x_i]^Z_t=[y_i]^Z_t$ and $Z_t=\{[x_i]^Z_t\mid i=1,\dots,n\}$. Since $\varphi$ is an isomorphism, for any $i=1,\dots,n$ we have that $\mu_X([x_i]^X_t)=\mu_Y([y_i]^Y_t)$ and thus $\mu_X([x_i]^Z_t)=\mu_Y([y_i]^Z_t)=\mu_Y([x_i]^Z_t)$ when $\mu_X$ and $\mu_Y$ are regarded as pushforward measures under the inclusion maps $X\hookrightarrow Z$ and $Y\hookrightarrow Z$, respectively. Now for any $B\in V(Z)$ (cf. Section 2.3), if $\mathrm{diam}(B)\geq t$, then $B$ is the union of certain classes $[x_i]^Z_t$ in $Z_t$ and thus $\mu_X(B)=\mu_Y(B)$. If $\mathrm{diam}(B)<t$ and $\mathrm{diam}(B^*)>t$, then there exists some $x_i$ such that $B=[x_i]^Z_s$ and $[x_i]^Z_s=[x_i]^Z_t$, where $s:=\mathrm{diam}(B)$. This implies that $\mu_X(B)=\mu_Y(B)$. Then, we have that $d^Z_{W,\infty}(\mu_X,\mu_Y)\leq t$ and thus $u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})\leq d^{(X\sqcup Y,u)}_{W,\infty}(\mu_X,\mu_Y)\leq t$. Therefore, $u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})\leq\inf\{t\geq 0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}$. Finally, by invoking Theorem 3.22, we conclude that $u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})=u_{GW,\infty}(\mathcal{X},\mathcal{Y})$. ∎

One application of Theorem 3.33 is to explicitly derive the minimizing pair $(A,\phi)\in\mathcal{A}^*$ in (22) for $p=\infty$.

Theorem 3.34.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. Let $s:=u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})$ and assume that $s>0$. Then, there exists $(A,\phi)\in\mathcal{A}$ defined in (20) such that
$$u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})=d^{Z_A}_{W,\infty}(\mu_X,\mu_Y),$$
where $Z_A$ denotes the ultrametric space defined in Section 3.1.1.

Proof. We prove the result via an explicit construction. By Theorem 3.33, we have $s=u^{\mathrm{sturm}}_{GW,\infty}(\mathcal{X},\mathcal{Y})=u_{GW,\infty}(\mathcal{X},\mathcal{Y})$. By Theorem 3.22, there exists an isomorphism $\varphi:\mathcal{X}_s\to\mathcal{Y}_s$. Since $s>0$, by Lemma 2.7, both $X_s$ and $Y_s$ are finite spaces. We then let $X_s=\{[x_1]^X_s,\dots,[x_n]^X_s\}$ and $Y_s=\{[y_1]^Y_s,\dots,[y_n]^Y_s\}$ and assume $[y_i]^Y_s=\varphi([x_i]^X_s)$ for each $i=1,\dots,n$. Let $A=\{x_1,\dots,x_n\}$ and define $\phi:A\to Y$ by sending $x_i$ to $y_i$ for each $i=1,\dots,n$. Then, we prove that $(A,\phi)$ satisfies the conditions in the statement.

Since $\varphi$ is an isomorphism, for any $1\leq i<j\leq n$,
$$u_Y(y_i,y_j)=u_{Y_s}([y_i]^Y_s,[y_j]^Y_s)=u_{Y_s}\big(\varphi([x_i]^X_s),\varphi([x_j]^X_s)\big)=u_{X_s}([x_i]^X_s,[x_j]^X_s)=u_X(x_i,x_j).$$
This implies that $\phi:A\to Y$ is an isometric embedding and thus $(A,\phi)\in\mathcal{A}$.

It is obvious that $(Z_A)_s$ is isometric to both $X_s$ and $Y_s$. In fact, $[x_i]^{Z_A}_s=[y_i]^{Z_A}_s$ in $Z_A$ for each $i=1,\dots,n$ and $(Z_A)_s=\{[x_i]^{Z_A}_s\mid i=1,\dots,n\}$. Since $\varphi$ is an isomorphism, for any $i=1,\dots,n$ we have that $\mu_X([x_i]^X_s)=\mu_Y([y_i]^Y_s)$ and thus $\mu_X([x_i]^{Z_A}_s)=\mu_Y([y_i]^{Z_A}_s)=\mu_Y([x_i]^{Z_A}_s)$ when $\mu_X$ and $\mu_Y$ are regarded as pushforward measures under the inclusion maps $X\to Z_A$ and $Y\to Z_A$, respectively. Now for any $B\in V(Z_A)$, if $\mathrm{diam}(B)\geq s$, then $B$ is the union of certain classes $[x_i]^{Z_A}_s$ and thus $\mu_X(B)=\mu_Y(B)$. If otherwise $\mathrm{diam}(B)<s$ and $\mathrm{diam}(B^*)\geq s$, then there exists $x_i$ such that $B=[x_i]^{Z_A}_t$ and $[x_i]^{Z_A}_t=[x_i]^{Z_A}_s$, where $t:=\mathrm{diam}(B)$. This implies that $\mu_X(B)=\mu_Y(B)$. Then, by Lemma 2.13, we have $d^{Z_A}_{W,\infty}(\mu_X,\mu_Y)\leq s$ and thus $d^{Z_A}_{W,\infty}(\mu_X,\mu_Y)=s$. ∎

4. Topological and geodesic properties
In this section, we study the topology induced by $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$ on $\mathcal{U}^w$ and discuss the geodesic properties of both $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$ for $1\leq p\leq\infty$.

4.1. Topological equivalence between $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$. Mémoli (2011) proved the topological equivalence between $d_{GW,p}$ and $d^{\mathrm{sturm}}_{GW,p}$. We establish an analogous result for $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$. To this end, we recall the modulus of mass distribution.

Definition 4.1 (Greven et al. (2009, Def. 2.9)). Given $\delta>0$ and $\mathcal{X}\in\mathcal{U}^w$, we define the modulus of mass distribution of $\mathcal{X}$ as
$$v_\delta(\mathcal{X}):=\inf\Big\{\varepsilon>0\;\Big|\;\mu_X\big(\{x:\mu_X(B^\circ_\varepsilon(x))\leq\delta\}\big)\leq\varepsilon\Big\},\tag{25}$$
where $B^\circ_\varepsilon(x)$ denotes the open ball centered at $x$ with radius $\varepsilon$.

We note that $v_\delta(\mathcal{X})$ is non-decreasing, right-continuous and bounded by 1. Furthermore, it holds that $\lim_{\delta\searrow 0}v_\delta(\mathcal{X})=0$.

Theorem 4.2.
Let X , Y P U w , p P r , and δ P ` , ˘ . Then, whenever u GW ,p p X , Y q ă δ we have u sturmGW ,p p X , Y q ď p ¨ min p v δ p X q , v δ p Y qq ` δ q p ¨ M, where M : “ ¨ max p diam p X q , diam p Y qq ` . Remark 4.3.
Since it holds that $\lim_{\delta\searrow 0}v_\delta(\mathcal{X})=0$ and $u^{\mathrm{sturm}}_{GW,p}\geq 2^{-1/p}\,u_{GW,p}$ (see Corollary 3.32), the above theorem gives the topological equivalence of $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$, $1\leq p<\infty$ (by Theorem 3.33 it holds that $u^{\mathrm{sturm}}_{GW,\infty}=u_{GW,\infty}$). The proof of Theorem 4.2 follows the same strategy used for proving Proposition 5.3 in Mémoli (2011) and we refer to Appendix C.2 for the details.

4.2. Completeness and separability.
In this subsection, we study completeness and separability of the two metrics $u_{GW,p}$ and $u^{\mathrm{sturm}}_{GW,p}$, $1\leq p\leq\infty$, on $\mathcal{U}^w$.

Theorem 4.4.
(1) For $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u_{GW,p})$ is neither complete nor separable.
(2) For $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ is neither complete nor separable.
(3) $(\mathcal{U}^w,u_{GW,\infty})=(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,\infty})$ is complete but not separable.
Proof. (1) We first prove that $(\mathcal{U}^w,u_{GW,p})$ is non-separable for each $p\in[1,\infty]$. Recall the notation of Example 3.9 and consider the family $\{\hat\Delta(a)\}_{a\in[1,2]}$.

Claim 1: For all $a\neq b\in[1,2]$, $u_{GW,p}\big(\hat\Delta(a),\hat\Delta(b)\big)=2^{-1/p}\,\Lambda_\infty(a,b)\geq 2^{-1/p}$, where we let $2^{-1/\infty}:=1$.

Hence, $\{\hat\Delta(\alpha)\}_{\alpha\in[1,2]}$ is an uncountable subset of $\mathcal{U}^w$ with pairwise distances at least $2^{-1/p}$, which implies that $(\mathcal{U}^w,u_{GW,p})$ is non-separable.

Now for $p\in[1,\infty)$, we show that $u_{GW,p}$ is not complete. Consider the family $\{\Delta_n(1)\}_{n\in\mathbb{N}}$ of $2^n$-point spaces with unit interpoint distances. Endow each space $\Delta_n(1)$ with the uniform measure $\mu_n$ and denote the corresponding ultrametric measure space by $\hat\Delta_n(1)$. It is proven in Sturm (2012, Example 2.2) that $\{\hat\Delta_n(1)\}_{n\in\mathbb{N}}$ is a Cauchy sequence with respect to $d_{GW,p}$ without a compact metric measure space as limit. Since $\Lambda_\infty$ and $\Lambda_1$ coincide on $\{0,1\}$, it is not hard to check that
$$u_{GW,p}\big(\hat\Delta_m(1),\hat\Delta_n(1)\big)=2\,d_{GW,p}\big(\hat\Delta_m(1),\hat\Delta_n(1)\big),\quad\forall n,m\in\mathbb{N}.$$
Therefore, $\{\hat\Delta_n(1)\}_{n\in\mathbb{N}}$ is a Cauchy sequence with respect to $u_{GW,p}$ without limit in $\mathcal{U}^w$. This implies that $(\mathcal{U}^w,u_{GW,p})$ is not complete.

(2) By Corollary 3.32 and (1), we have that $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ is not separable. As for completeness, consider the subset $X:=\{2^{-n}\}_{n\in\mathbb{N}}\subseteq(\mathbb{R}_{\geq 0},\Lambda_\infty)$. By Lemma 2.16, $X$ is not a compact ultrametric space. Let $\mu\in\mathcal{P}(X)$ be the probability measure defined as follows:
$$\mu\big(\{2^{-n}\}\big):=2^{-n},\quad\forall n\in\mathbb{N}.$$
For each $N\in\mathbb{N}$, let $X_N:=\{2^{-n}\mid n=1,\dots,N\}$. Since each $X_N$ is finite, $X_N$ is a compact ultrametric space. Let $\mu_N\in\mathcal{P}(X_N)$ be the probability measure defined as follows:
$$\mu_N\big(\{2^{-n}\}\big):=\begin{cases}2^{-n},&1\leq n<N,\\ 2^{-N+1},&n=N.\end{cases}$$
Then, it is easy to verify (e.g. via Theorem 3.34) that $\{(X_N,\Lambda_\infty,\mu_N)\}$ is a $u^{\mathrm{sturm}}_{GW,p}$ Cauchy sequence with $(X,\Lambda_\infty,\mu)$ being the limit. Since $X$ is not compact, $(X,\Lambda_\infty,\mu)\notin\mathcal{U}^w$ and thus $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ is not complete.

(3) That $(\mathcal{U}^w,u_{GW,\infty})$ is non-separable is already proved in (1). Given a Cauchy sequence $\{\mathcal{X}_n=(X_n,u_n,\mu_n)\}_{n\in\mathbb{N}}$ with respect to $u_{GW,\infty}$, the underlying ultrametric spaces $\{X_n\}_{n\in\mathbb{N}}$ form a Cauchy sequence with respect to $u_{GH}$ due to Corollary 3.24. Since $(\mathcal{U},u_{GH})$ is complete (see Zarichnyi (2005, Proposition 2.1)), there exists a compact ultrametric space $(X,u_X)$ such that
$$\lim_{n\to\infty}u_{GH}(X_n,X)=0.$$
For each $n\in\mathbb{N}$, let $\delta_n:=u_{GH}(X_n,X)$. By Theorem 2.8, we have that $(X_n)_{\delta_n}\cong X_{\delta_n}$. Denote by $\hat\mu_n\in\mathcal{P}(X_{\delta_n})$ the pushforward of $(\mu_n)_{\delta_n}$ under the isometry. Furthermore, by Lemma 2.7, $X_{\delta_n}$ is finite and we let $X_{\delta_n}=\{[x_1]_{\delta_n},\dots,[x_k]_{\delta_n}\}$ for $x_1,\dots,x_k\in X$. Based on this, we define
$$\nu_n:=\sum_{i=1}^k\hat\mu_n([x_i]_{\delta_n})\cdot\delta_{x_i}\in\mathcal{P}(X),$$
where $\delta_{x_i}$ is the Dirac measure at $x_i$. Since $X$ is compact, $\mathcal{P}(X)$ is weakly compact. Therefore, the sequence $\{\nu_n\}_{n\in\mathbb{N}}$ has a cluster point $\nu\in\mathcal{P}(X)$.

Now we show that $\mathcal{X}:=(X,u_X,\nu)$ is a $u_{GW,\infty}$ cluster point of $\{\mathcal{X}_n\}$ and thus the limit of $\{\mathcal{X}_n\}$, since $\{\mathcal{X}_n\}$ is a Cauchy sequence. Without loss of generality, we assume that $\{\nu_n\}$ weakly converges to $\nu$. Fix any $\varepsilon>0$; we need to show that $u_{GW,\infty}(\mathcal{X},\mathcal{X}_n)\leq\varepsilon$ when $n$ is large enough. For any fixed $x_*\in X$, $[x_*]_\varepsilon$ is both an open and a closed ball in $X$. Therefore, $\nu([x_*]_\varepsilon)=\lim_{n\to\infty}\nu_n([x_*]_\varepsilon)$. Since $\delta_n\to 0$ as $n\to\infty$, there exists $N_1>0$ such that $\delta_n<\varepsilon$ for all $n>N_1$. We specify an isometry $\varphi_n:(X_n)_{\delta_n}\to X_{\delta_n}$ that gives rise to the construction of $\nu_n$. Then, we let $\psi_n:(X_n)_\varepsilon\to X_\varepsilon$ be the isometry such that the following diagram commutes:
$$\begin{array}{ccc}(X_n)_{\delta_n}&\xrightarrow{\ \varphi_n\ }&X_{\delta_n}\\ \big\downarrow{\scriptstyle\varepsilon\text{-quotient}}&&\big\downarrow{\scriptstyle\varepsilon\text{-quotient}}\\ (X_n)_\varepsilon&\xrightarrow{\ \psi_n\ }&X_\varepsilon\end{array}$$
Assume that $[x_*]^X_\varepsilon=\bigcup_{i=1}^l[x_i]^X_{\delta_n}$. Let $x^n_*\in X_n$ be such that $\psi_n([x^n_*]^{X_n}_\varepsilon)=[x_*]^X_\varepsilon$ and let $x^n_1,\dots,x^n_l\in X_n$ be such that $\varphi_n([x^n_i]^{X_n}_{\delta_n})=[x_i]^X_{\delta_n}$ for each $i=1,\dots,l$. Then, $[x^n_*]^{X_n}_\varepsilon=\bigcup_{i=1}^l[x^n_i]^{X_n}_{\delta_n}$. Therefore,
$$\nu_n([x_*]^X_\varepsilon)=\sum_{i=1}^l\nu_n([x_i]^X_{\delta_n})=\sum_{i=1}^l\hat\mu_n([x_i]^X_{\delta_n})=\sum_{i=1}^l\mu_n([x^n_i]^{X_n}_{\delta_n})=\mu_n([x^n_*]^{X_n}_\varepsilon).$$
Since $\{\mathcal{X}_n\}$ is a Cauchy sequence, there exists $N_2>0$ such that $u_{GW,\infty}(\mathcal{X}_n,\mathcal{X}_m)<\varepsilon$ when $n,m>N_2$. Then, by Theorem 3.22, $(\mathcal{X}_n)_\varepsilon\cong_w(\mathcal{X}_m)_\varepsilon$ for all $n,m>N_2$. By Lemma 2.7, $(X_n)_\varepsilon$ is finite, and hence its cardinality does not depend on $n$ when $n>N_2$. Moreover, for all $n>N_2$, the finite set $A:=\{\mu_n(B)\mid B\in(X_n)_\varepsilon\}$ of masses of $\varepsilon$-classes is independent of $n$, and thus $\mu_n([x^n_*]^{X_n}_\varepsilon)$ only takes values in the finite set $A$. Combining this with the fact that $\lim_{n\to\infty}\mu_n([x^n_*]^{X_n}_\varepsilon)=\lim_{n\to\infty}\nu_n([x_*]^X_\varepsilon)=\nu([x_*]^X_\varepsilon)$ exists, there exists $N_3>0$ such that $\mu_n([x^n_*]^{X_n}_\varepsilon)\equiv C$ for all $n>N_3$ and some constant $C$. This implies that $\nu([x_*]^X_\varepsilon)=\mu_n([x^n_*]^{X_n}_\varepsilon)$ when $n>\max(N_1,N_2,N_3)$. Since $X_\varepsilon$ is finite, there exists a common $N>0$ such that for all $n>N$ and all $[x_*]_\varepsilon\in X_\varepsilon$ we have $\nu([x_*]^X_\varepsilon)=\mu_n([x^n_*]^{X_n}_\varepsilon)$, where $[x^n_*]^{X_n}_\varepsilon=\psi_n^{-1}([x_*]^X_\varepsilon)\in(X_n)_\varepsilon$. This indicates that $\nu_\varepsilon=(\psi_n)_\#(\mu_n)_\varepsilon$ when $n>N$. Therefore, $\mathcal{X}_\varepsilon\cong_w(\mathcal{X}_n)_\varepsilon$ and thus $u_{GW,\infty}(\mathcal{X},\mathcal{X}_n)\leq\varepsilon$. ∎
4.3. Geodesic property.
A geodesic in a metric space $(X,d_X)$ is a continuous function $\gamma:[0,1]\to X$ such that for each $s,t\in[0,1]$, $d_X(\gamma(s),\gamma(t))=|s-t|\cdot d_X(\gamma(0),\gamma(1))$. We say a metric space is geodesic if for any two distinct points $x,x'\in X$, there exists a geodesic $\gamma:[0,1]\to X$ such that $\gamma(0)=x$ and $\gamma(1)=x'$.

For any $p\in[1,\infty)$, the notion of $p$-geodesic is introduced in Mémoli et al. (2019): A $p$-geodesic in a metric space $(X,d_X)$ is a continuous function $\gamma:[0,1]\to X$ such that for each $s,t\in[0,1]$, $d_X(\gamma(s),\gamma(t))=|s-t|^{1/p}\cdot d_X(\gamma(0),\gamma(1))$. Similarly, we say a metric space is $p$-geodesic if for any two distinct points $x,x'\in X$, there exists a $p$-geodesic $\gamma:[0,1]\to X$ such that $\gamma(0)=x$ and $\gamma(1)=x'$. Note that a 1-geodesic is a usual geodesic and a 1-geodesic space is a usual geodesic space.

Lemma 4.5 (Mémoli et al. (2019, Proposition 7.10)). Given $p\in[1,\infty)$, if $X$ is a $p$-metric space, then $X$ is not $q$-geodesic for all $1\leq q<p$.

Lemma 4.6 (Mémoli et al. (2019, Theorem 7.7)). Let $X$ be a geodesic metric space. Then, for any $p\geq 1$, $S_{1/p}(X)$ is $p$-geodesic, where $S_\alpha$ denotes the snowflake transform for $\alpha>0$ (cf. Section 3.3).

Now, we start to establish ($p$-)geodesic properties of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ for $p\in[1,\infty)$.

Proposition 4.7.
The metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$ is geodesic.

The proof is based on the following property of the Wasserstein space $\mathcal{W}_1$.

Lemma 4.8 (Bottou et al. (2018, Theorem 5.1)). Let $X$ be a compact metric space. Then, the space $\mathcal{W}_1(X):=(\mathcal{P}(X),d^X_{W,1})$ is a geodesic space.

Proof of Proposition 4.7. Let $\mathcal{X}$ and $\mathcal{Y}$ be two compact ultrametric measure spaces. By Corollary 3.7, there exist a compact ultrametric space $Z$ and isometric embeddings $\phi:X\hookrightarrow Z$ and $\psi:Y\hookrightarrow Z$ such that
$$u^{\mathrm{sturm}}_{GW,1}(\mathcal{X},\mathcal{Y})=d^Z_{W,1}(\phi_\#\mu_X,\psi_\#\mu_Y).$$
Then, the space $\mathcal{W}_1(Z)$ is geodesic (cf. Lemma 4.8). Therefore, there exists a Wasserstein geodesic $\tilde\gamma:[0,1]\to\mathcal{W}_1(Z)$ connecting $\phi_\#\mu_X$ and $\psi_\#\mu_Y$. This induces a curve $\gamma:[0,1]\to\mathcal{U}^w$ where for each $t\in[0,1]$,
$$\gamma(t):=\Big(\mathrm{supp}(\tilde\gamma(t)),\,u_Z|_{\mathrm{supp}(\tilde\gamma(t))\times\mathrm{supp}(\tilde\gamma(t))},\,\tilde\gamma(t)\Big).$$
Note that $\gamma(0)\cong_w\mathcal{X}$ and $\gamma(1)\cong_w\mathcal{Y}$ and hence we simply replace $\gamma(0)$ and $\gamma(1)$ with $\mathcal{X}$ and $\mathcal{Y}$, respectively. Now, for each $s,t\in[0,1]$, we have that
$$u^{\mathrm{sturm}}_{GW,1}(\gamma(s),\gamma(t))\leq d^Z_{W,1}(\tilde\gamma(s),\tilde\gamma(t))=|s-t|\,d^Z_{W,1}(\tilde\gamma(0),\tilde\gamma(1))=|s-t|\,u^{\mathrm{sturm}}_{GW,1}(\mathcal{X},\mathcal{Y}).$$
Combined with the triangle inequality, this forces equality. Therefore, $\gamma$ is a geodesic connecting $\mathcal{X}$ and $\mathcal{Y}$ and thus $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$ is geodesic. ∎

Theorem 4.9.
For each $p\in[1,\infty)$, the metric space $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ is $p$-geodesic.

Proof. By Corollary 3.29, $S_p(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})$. This implies that $S_{1/p}(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,1})\cong(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$. Then, by Proposition 4.7 and Lemma 4.6, we have that $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ is $p$-geodesic. ∎

Remark 4.10.
By Lemma 4.5, $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$ is not geodesic for all $p>1$.

Remark 4.11.
Though the geodesic properties of $(\mathcal{U}^w,u^{\mathrm{sturm}}_{GW,p})$, $1\leq p\leq\infty$, are clear, we remark that the geodesic properties of $(\mathcal{U}^w,u_{GW,p})$, $1\leq p<\infty$, still remain unknown to us.

5. Lower bounds of $u_{GW,p}$

Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two ultrametric measure spaces. The metrics $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$ respect the ultrametric structure of the spaces $\mathcal{X}$ and $\mathcal{Y}$. Thus, one would hope that comparing ultrametric measure spaces with $u^{\mathrm{sturm}}_{GW,p}$ or $u_{GW,p}$ is more meaningful than doing it with the usual Gromov-Wasserstein distance or Sturm's distance. Unfortunately, for $p<\infty$, the computation of both $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$ is complicated, and for $p=\infty$ both metrics are extremely sensitive to differences in the diameter of the considered spaces (see Corollary 3.25). Thus, it is not feasible to use these metrics in many applications. However, we can derive meaningful lower bounds for $u^{\mathrm{sturm}}_{GW,p}$ and $u_{GW,p}$ that resemble those of the Gromov-Wasserstein distance. Naturally, the question arises whether these lower bounds are better/sharper than the ones of the usual Gromov-Wasserstein distance in this setting. This question is addressed throughout this section and will be readdressed in Section 7 as well as Section 8.

In Mémoli (2011), the author introduced three lower bounds for $d_{GW,p}$ that are computationally less expensive than the calculation of $d_{GW,p}$. We will briefly review these three lower bounds and then define the corresponding lower bounds for $u_{GW,p}$. In the following, we always assume $p\in[1,\infty]$.

First lower bound.
Let $s_{X,p}:X\to\mathbb{R}_{\geq 0}$, $x\mapsto\|u_X(x,\cdot)\|_{L^p(\mu_X)}$. Then, the first lower bound $\mathrm{FLB}_p(\mathcal{X},\mathcal{Y})$ for $d_{GW,p}(\mathcal{X},\mathcal{Y})$ is defined as follows:
$$\mathrm{FLB}_p(\mathcal{X},\mathcal{Y}):=\frac{1}{2}\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Lambda_1(s_{X,p}(\cdot),s_{Y,p}(\cdot))\big\|_{L^p(\mu)}.$$
Following our intuition of replacing $\Lambda_1$ by $\Lambda_\infty$, we define the ultrametric version of FLB as
$$\mathrm{FLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\big\|\Lambda_\infty(s_{X,p}(\cdot),s_{Y,p}(\cdot))\big\|_{L^p(\mu)}.$$

Second lower bound.
The second lower bound $\mathrm{SLB}_p(\mathcal{X},\mathcal{Y})$ for $d_{GW,p}(\mathcal{X},\mathcal{Y})$ is given as
$$\mathrm{SLB}_p(\mathcal{X},\mathcal{Y}):=\frac{1}{2}\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\|\Lambda_1(u_X,u_Y)\|_{L^p(\gamma)}.$$
Thus, we define the ultrametric second lower bound between two ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ as follows:
$$\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}):=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)}\|\Lambda_\infty(u_X,u_Y)\|_{L^p(\gamma)}.$$
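Proposition 5.4 below reduces $\mathrm{SLB}^{\mathrm{ult}}_p$ to a Wasserstein distance on $(\mathbb{R}_{\geq 0},\Lambda_\infty)$ between the laws of the distance values. As an illustration, here is a minimal Python sketch for finite spaces and $1\leq p<\infty$. It rests on two facts that we state as assumptions of the sketch rather than as formulas taken from this paper: (i) since $\Lambda_\infty(a,b)^p=\Lambda_\infty(a^p,b^p)$, the snowflake argument of Lemma 3.26 turns $W_p$ into $W_1$ after mapping distance values $t\mapsto t^p$; (ii) on a finite ultrametric space, $W_1$ equals one half of the sum over non-root balls $B$ of $(\mathrm{diam}(B^*)-\mathrm{diam}(B))\,|\alpha(B)-\beta(B)|$, the usual tree formula for optimal transport on a dendrogram. On finitely many points $a_1<\dots<a_k$ of $(\mathbb{R}_{\geq 0},\Lambda_\infty)$, the non-root balls are the singletons and the prefixes $\{a_1,\dots,a_i\}$, $2\leq i<k$. The function names are hypothetical, spaces are given as distance matrices with exact `Fraction` masses, and $p=\infty$ is not covered.

```python
from collections import defaultdict
from fractions import Fraction


def distance_law(D, mu, p=1):
    """Law of u_X(x, x')**p under mu_X (x) mu_X, as {value: mass}."""
    law = defaultdict(Fraction)
    for i, mi in enumerate(mu):
        for j, mj in enumerate(mu):
            law[D[i][j] ** p] += mi * mj
    return dict(law)


def w1_lambda_inf(alpha, beta):
    """W_1 on (R_{>=0}, Lambda_infty) between finitely supported measures.

    Tree formula: half the sum over non-root balls B of
    (diam(B*) - diam(B)) * |alpha(B) - beta(B)|.
    """
    pts = sorted(set(alpha) | set(beta))
    if len(pts) <= 1:
        return 0
    diff = [alpha.get(a, 0) - beta.get(a, 0) for a in pts]
    # Singleton {a_1}: parent ball {a_1, a_2} has diameter a_2.
    total = pts[1] * abs(diff[0])
    # Singletons {a_i}, i >= 2: parent ball {a_1, ..., a_i} has diameter a_i.
    total += sum(pts[i] * abs(diff[i]) for i in range(1, len(pts)))
    # Prefix balls {a_1, ..., a_i}, 2 <= i < k: parent has diameter a_{i+1}.
    cum = diff[0]
    for i in range(1, len(pts) - 1):
        cum += diff[i]
        total += (pts[i + 1] - pts[i]) * abs(cum)
    return total / 2


def slb_ult(DX, muX, DY, muY, p=1):
    """SLB^ult_p (1 <= p < infinity) via Proposition 5.4 and snowflaking."""
    w = w1_lambda_inf(distance_law(DX, muX, p), distance_law(DY, muY, p))
    return w if p == 1 else float(w) ** (1.0 / p)
```

For a one-point space against two points at distance 1 with uniform masses, the sketch gives $\mathrm{SLB}^{\mathrm{ult}}_1=1/2$. This coincides with $u_{GW,1}$ in that case, because the only coupling with a one-point space is the product coupling, so the bound of Theorem 5.1 (3) is attained there.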
Third lower bound.
Before we introduce the final lower bound, we have to define severalfunctions. First, let Γ X,Y : X ˆ Y ˆ X ˆ Y Ñ R ě , p x, y, x , y q ÞÑ ∆ p u X p x, x q , u Y p y, y qq and let Ω p : X ˆ Y Ñ R ě , p P r , , be given byΩ p p x, y q : “ inf µ P C p µ X ,µ Y q (cid:13)(cid:13) Γ X,Y p x, y, ¨ , ¨q (cid:13)(cid:13) L p p µ q . Then, the third lower bound
TLB p is given as TLB p p X , Y q : “
12 inf µ P C p µ X ,µ Y q (cid:13)(cid:13) Ω p p¨ , ¨q (cid:13)(cid:13) L p p µ q . Analogue to the definition of previous ultrametric versions, we define Γ X,Y : X ˆ Y ˆ X ˆ Y Ñ R ě , p x, y, x , y q ÞÑ ∆ p u X p x, x q , u Y p y, y qq . Further, for p P r , , let Ω p : X ˆ Y Ñ R ě be given by Ω p p x, y q : “ inf µ P C p µ X ,µ Y q (cid:13)(cid:13) Γ X,Y p x, y, ¨ , ¨q (cid:13)(cid:13) L p p µ q . Then, the ultrametric third lower bound between two ultrametric measure spaces X and Y is defined as TLB ult p p X , Y q : “ inf µ P C p µ X ,µ Y q (cid:13)(cid:13) Ω p p¨ , ¨q (cid:13)(cid:13) L p p µ q . Properties and computation of the lower bounds.
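Next, we examine $\mathrm{FLB}^{\mathrm{ult}}$, $\mathrm{SLB}^{\mathrm{ult}}$ and $\mathrm{TLB}^{\mathrm{ult}}$ more closely. Since $\Lambda_\infty(a,b)\geq\Lambda_1(a,b)=|a-b|$ for any $a,b\geq 0$, it is easy to conclude that $\mathrm{FLB}^{\mathrm{ult}}_p\geq\mathrm{FLB}_p$, $\mathrm{SLB}^{\mathrm{ult}}_p\geq\mathrm{SLB}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p\geq\mathrm{TLB}_p$. Moreover, the three ultrametric lower bounds satisfy the following theorem.

Theorem 5.1.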
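Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and let $p\in[1,\infty]$.
(1) $u_{GW,\infty}(\mathcal{X},\mathcal{Y})\geq\mathrm{FLB}^{\mathrm{ult}}_\infty(\mathcal{X},\mathcal{Y})$.
(2) $u_{GW,p}(\mathcal{X},\mathcal{Y})\geq\mathrm{TLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$.
(3) $u_{GW,p}(\mathcal{X},\mathcal{Y})\geq\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$.

Proof. The proofs of $d_{GW,p}\geq\mathrm{SLB}_p(\mathcal{X},\mathcal{Y})$ and $d_{GW,p}\geq\mathrm{TLB}_p(\mathcal{X},\mathcal{Y})$ in Mémoli (2011) apply here without any change for proving item (3) and item (2), respectively.

As for item (1), we first observe that for any point $x$ in an ultrametric space $X$, there exists a point $x'\in X$ such that $u_X(x,x')=\mathrm{diam}(X)$. Then, as long as $\mu_X$ is fully supported on $X$, as assumed throughout the paper, we have that $s_{X,\infty}\equiv\mathrm{diam}(X)$ is a constant function. Therefore,
$$\Lambda_\infty(s_{X,\infty}(x),s_{Y,\infty}(y))\equiv\Lambda_\infty(\mathrm{diam}(X),\mathrm{diam}(Y)),\quad\forall x\in X,\ y\in Y.$$
This implies that $\mathrm{FLB}^{\mathrm{ult}}_\infty(\mathcal{X},\mathcal{Y})=\Lambda_\infty(\mathrm{diam}(X),\mathrm{diam}(Y))$. By Remark 2.10 and Corollary 3.24, we have that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})\geq u_{GH}(X,Y)\geq\Lambda_\infty(\mathrm{diam}(X),\mathrm{diam}(Y))=\mathrm{FLB}^{\mathrm{ult}}_\infty(\mathcal{X},\mathcal{Y}).$$
∎

Remark 5.2.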
Interestingly, it turns out that $\mathrm{FLB}^{\mathrm{ult}}_p$ is not a lower bound of $u_{GW,p}$ in general when $p<\infty$. For example, let $X=\{x_1,x_2,\dots,x_n\}$ and $Y=\{y_1,\dots,y_n\}$, fix $a\in(0,1)$, and define $u_X$ such that $u_X(x_1,x_2)=a$ and $u_X(x_i,x_j)=\delta_{i\neq j}$ for $(i,j)\neq(1,2)$, $(i,j)\neq(2,1)$ and $i,j=1,\dots,n$. Let $u_Y(y_i,y_j)=\delta_{i\neq j}$, $i,j=1,\dots,n$, and let $\mu_X$ and $\mu_Y$ be the uniform measures on $X$ and $Y$, respectively. Then, $u_{GW,1}(\mathcal{X},\mathcal{Y})\leq\frac{2}{n^2}$ whereas $\mathrm{FLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})=\frac{2n-2}{n^2}$, which is greater than $u_{GW,1}(\mathcal{X},\mathcal{Y})$ as long as $n>2$. Moreover, in this case $\mathrm{FLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})$ is of order $\frac{1}{n}$, whereas $u_{GW,1}(\mathcal{X},\mathcal{Y})=O\big(\frac{1}{n^2}\big)$. Hence, there exists no constant $C>0$ such that $\mathrm{FLB}^{\mathrm{ult}}_1\leq C\cdot u_{GW,1}$ in general.

From the structure of $\mathrm{SLB}^{\mathrm{ult}}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p$ it is obvious that their computation leads to different optimal transport problems (see e.g. Villani (2003)). However, we can rewrite $\mathrm{SLB}^{\mathrm{ult}}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p$ in order to further simplify their computation. To this end, we first introduce the following auxiliary result, which is Lemma 28 in Chowdhury and Mémoli (2019).

Lemma 5.3.
Let $X,Y$ be two Polish metric spaces and let $f:X\to\mathbb{R}$ and $g:Y\to\mathbb{R}$ be measurable maps. Denote by $f\times g:X\times Y\to\mathbb{R}\times\mathbb{R}$ the map $(x,y)\mapsto(f(x),g(y))$. Then
$$(f\times g)_\#\,\mathcal{C}(\mu_X,\mu_Y)=\mathcal{C}(f_\#\mu_X,g_\#\mu_Y)$$
for any $\mu_X\in\mathcal{P}(X)$ and $\mu_Y\in\mathcal{P}(Y)$.
Based on Lemma 5.3, it is possible to transform the optimization problem for $\mathrm{SLB}^{\mathrm{ult}}_p$ into the computation of a Wasserstein distance on $(\mathbb{R}_{\geq0},\Delta)$ and to efficiently calculate $\Omega_p$ in the definition of $\mathrm{TLB}^{\mathrm{ult}}_p$.

Proposition 5.4.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$ and let $p\in[1,\infty]$. Then:
(1) $\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})=d^{(\mathbb{R}_{\geq0},\Delta)}_{W,p}\big((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y)\big)$;
(2) for each $(x,y)\in X\times Y$, $\Omega_p(x,y)=d^{(\mathbb{R}_{\geq0},\Delta)}_{W,p}\big(u_X(x,\cdot)_\#\mu_X,\,u_Y(y,\cdot)_\#\mu_Y\big)$.
Proof. We only prove (1) in the case $p\in[1,\infty)$. The case $p=\infty$ and item (2) can be proven in a similar manner.
By the change-of-variables formula, we have
$$\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\,\mu_Y\otimes\mu_Y)}\Big(\iint_{X\times X\times Y\times Y}\big(\Delta(u_X(x,x'),u_Y(y,y'))\big)^p\,\gamma(dx\times dx'\times dy\times dy')\Big)^{1/p}$$
$$=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\,\mu_Y\otimes\mu_Y)}\Big(\iint_{\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}}\big(\Delta(s,t)\big)^p\,(u_X\times u_Y)_\#\gamma(ds\times dt)\Big)^{1/p},$$
where $u_X\times u_Y:X\times X\times Y\times Y\to\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}$ maps $(x,x',y,y')$ to $(u_X(x,x'),u_Y(y,y'))$. By Lemma 5.3 (applied to the Polish spaces $X\times X$ and $Y\times Y$), we have that
$$(u_X\times u_Y)_\#\,\mathcal{C}(\mu_X\otimes\mu_X,\mu_Y\otimes\mu_Y)=\mathcal{C}\big((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y)\big).$$
Therefore,
$$\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})=\inf_{\tilde\gamma\in\mathcal{C}((u_X)_\#(\mu_X\otimes\mu_X),\,(u_Y)_\#(\mu_Y\otimes\mu_Y))}\Big(\iint_{\mathbb{R}_{\geq0}\times\mathbb{R}_{\geq0}}\big(\Delta(s,t)\big)^p\,\tilde\gamma(ds\times dt)\Big)^{1/p}=d^{(\mathbb{R}_{\geq0},\Delta)}_{W,p}\big((u_X)_\#(\mu_X\otimes\mu_X),(u_Y)_\#(\mu_Y\otimes\mu_Y)\big).\ \square$$

Remark 5.5.
Since Theorem 2.14 provides an explicit formula for the Wasserstein distance on $(\mathbb{R}_{\geq0},\Delta)$ between finitely supported probability measures, these alternative representations of the lower bound $\mathrm{SLB}^{\mathrm{ult}}_p$ and of the cost functional $\Omega_p$ drastically reduce the computation time of $\mathrm{SLB}^{\mathrm{ult}}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p$, respectively. In particular, this allows us to compute $\mathrm{SLB}^{\mathrm{ult}}_p$, $1\leq p\leq\infty$, between finite ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$ with $|X|=m$ and $|Y|=n$ in $O((m\vee n)^2)$ steps.
Moreover, Proposition 5.4 allows us to directly compare the two lower bounds $\mathrm{SLB}^{\mathrm{ult}}_1$ and $\mathrm{SLB}_1$.

Corollary 5.6.
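For any finite ultrametric measure spaces $\mathcal{X}$ and $\mathcal{Y}$, we have that
$$\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})=\mathrm{SLB}_1(\mathcal{X},\mathcal{Y})+\frac12\int_{\mathbb{R}}t\,\big|(u_X)_\#(\mu_X\otimes\mu_X)-(u_Y)_\#(\mu_Y\otimes\mu_Y)\big|(dt).\qquad(26)$$
Proof. The claim follows directly from Proposition 5.4 and Remark 2.15. $\square$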
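For finite spaces, Proposition 5.4 and Corollary 5.6 reduce $\mathrm{SLB}^{\mathrm{ult}}_1$ to a one-dimensional computation on the two distance distributions. The following Python sketch illustrates this for $p=1$ only and rests on assumptions that should be checked against Section 2: we take $\Delta(a,b)=\max(a,b)$ for $a\neq b$ and $\Delta(a,a)=0$; we use the $p=1$ closed form $d^{(\mathbb{R}_{\geq0},\Delta)}_{W,1}(\alpha,\beta)=\tfrac12 d^{(\mathbb{R},|\cdot|)}_{W,1}(\alpha,\beta)+\tfrac12\int t\,|\alpha-\beta|(dt)$, which can be checked directly on Dirac masses $\delta_a,\delta_b$ with $a<b$ (both sides equal $b$); and we assume that $\mathrm{SLB}_1$ carries the factor $\tfrac12$ as in Mémoli (2011). All function names are hypothetical and unrelated to the authors' Matlab code. The usage lines take the two-point spaces of Example 5.7 with $\alpha_1=\alpha_2=\tfrac12$ and $d=1$.

```python
from collections import defaultdict


def delta(a, b):
    """Assumed ultrametric on R_{>=0}: max(a, b) if a != b, else 0."""
    return 0.0 if a == b else max(a, b)


def distance_distribution(u, mu):
    """Push-forward (u)_#(mu x mu) of a finite space as a dict {distance: mass}."""
    dist = defaultdict(float)
    for i, wi in enumerate(mu):
        for j, wj in enumerate(mu):
            dist[u[i][j]] += wi * wj
    return dict(dist)


def w1_real(alpha, beta):
    """W_1 on (R, |.|): integral of |F_alpha - F_beta| for finitely supported measures."""
    points = sorted(set(alpha) | set(beta))
    total = cdf_a = cdf_b = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a += alpha.get(left, 0.0)
        cdf_b += beta.get(left, 0.0)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def atom_term(alpha, beta):
    """Integral of t against |alpha - beta| (second summand of (26) without the 1/2)."""
    support = set(alpha) | set(beta)
    return sum(t * abs(alpha.get(t, 0.0) - beta.get(t, 0.0)) for t in support)


def w1_delta(alpha, beta):
    """W_1 on (R_{>=0}, delta) for p = 1: half of W_1 on R plus half of the atom term."""
    return 0.5 * w1_real(alpha, beta) + 0.5 * atom_term(alpha, beta)


def slb_ult_1(u_x, mu_x, u_y, mu_y):
    """SLB^ult_1 via Proposition 5.4 (1): W_1 on (R_{>=0}, delta) of the distance distributions."""
    return w1_delta(distance_distribution(u_x, mu_x), distance_distribution(u_y, mu_y))


def slb_1(u_x, mu_x, u_y, mu_y):
    """SLB_1, assuming the convention SLB_1 = 1/2 * W_1 on R of the distance distributions."""
    return 0.5 * w1_real(distance_distribution(u_x, mu_x), distance_distribution(u_y, mu_y))


# Two-point spaces as in Example 5.7 with alpha_1 = alpha_2 = 1/2 and d = 1.
mu = [0.5, 0.5]
u_x = [[0.0, 1.0], [1.0, 0.0]]
for d_prime in (2.0, 1.1, 1.01):
    u_y = [[0.0, d_prime], [d_prime, 0.0]]
    print(d_prime, slb_ult_1(u_x, mu, u_y, mu), slb_1(u_x, mu, u_y, mu))
```

As $d'$ approaches $d$, the printed values of $\mathrm{SLB}_1$ shrink towards 0, while $\mathrm{SLB}^{\mathrm{ult}}_1$ stays at $\tfrac12\max(d,d')$; this is the rigidity discussed in the remainder of this section.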
This corollary implies that $\mathrm{SLB}^{\mathrm{ult}}_p$ is more rigid than $\mathrm{SLB}_p$, since the second summand on the right-hand side of Equation (26) is sensitive to distance perturbations. This is also illustrated well by the following example.

Example 5.7.
Recall the notation from Example 3.9. For any $d,d'>0$, we let $X:=\Delta_2(d)$ and $Y:=\Delta_2(d')$. Assume that $X$ and $Y$ have underlying sets $\{x_1,x_2\}$ and $\{y_1,y_2\}$, respectively. Define $\mu_X\in\mathcal{P}(X)$ and $\mu_Y\in\mathcal{P}(Y)$ as follows. Let $\alpha_1,\alpha_2\geq0$ be such that $\alpha_1+\alpha_2=1$, and let $\mu_X(x_1)=\mu_Y(y_1):=\alpha_1$ and $\mu_X(x_2)=\mu_Y(y_2):=\alpha_2$. Then it is easy to verify that $\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})=2\alpha_1\alpha_2\,\Delta(d,d')$. This example illustrates that $\mathrm{SLB}^{\mathrm{ult}}_1$ (and hence $u_{GW,1}$) is rigid with respect to distance perturbations: for $d'\neq d$ it equals $2\alpha_1\alpha_2\max(d,d')$, no matter how close $d'$ is to $d$.

6. $u_{GW,p}$ on ultra-dissimilarity spaces
A natural generalization of ultrametric spaces are the so-called ultra-dissimilarity spaces. These spaces naturally occur when working with symmetric ultranetworks (see Smith et al. (2016)) or phylogenetic tree data (see Semple et al. (2003)). In this section, we introduce these spaces and briefly illustrate to what extent the results for $u_{GW,p}$ can be adapted to ultra-dissimilarity measure spaces.
We start by formally introducing ultra-dissimilarity spaces.
Definition 6.1 (Ultra-dissimilarity spaces). An ultra-dissimilarity space is a pair $(X,u_X)$ consisting of a set $X$ and a function $u_X:X\times X\to\mathbb{R}_{\geq0}$ satisfying the following conditions for any $x,y,z\in X$:
(1) $u_X(x,y)=u_X(y,x)$;
(2) $u_X(x,y)\leq\max(u_X(x,z),u_X(z,y))$;
(3) $\max(u_X(x,x),u_X(y,y))\leq u_X(x,y)$, and equality holds if and only if $x=y$.

Remark 6.2.
Note that when $(X,u_X)$ is an ultrametric space, condition (3) is trivially satisfied.
In the following, we restrict ourselves to finite ultra-dissimilarity spaces to avoid technical issues in topology: for a finite space $X$, the function $u_X$ is obviously continuous with respect to the discrete topology, and thus the following counterpart to Lemma B.5 naturally holds (see Chowdhury (2019); Chowdhury and Mémoli (2019) for a more complete treatment of infinite spaces).

Lemma 6.3.
Let $X,Y$ be finite ultra-dissimilarity spaces. Then $\Delta(u_X,u_Y):X\times Y\times X\times Y\to\mathbb{R}_{\geq0}$, $(x,y,x',y')\mapsto\Delta(u_X(x,x'),u_Y(y,y'))$, is continuous with respect to the discrete topology.
One important aspect of ultra-dissimilarity spaces is their connection with the so-called treegrams (Smith et al., 2016; Mémoli et al., 2019), which are a generalization of dendrograms. For a finite set $X$, let $\mathrm{SubPart}(X)$ denote the collection of all subpartitions of $X$: any partition $P'$ of a non-empty subset $X'\subseteq X$ is called a subpartition of $X$. Given two subpartitions $P_1,P_2$, we say that $P_1$ is coarser than $P_2$ if each block in $P_2$ is contained in some block in $P_1$.
Definition 6.4 (Treegrams). A treegram $T_X:[0,\infty)\to\mathrm{SubPart}(X)$ is a map parametrizing a nested family of subpartitions over the same set $X$ and satisfying the following conditions:
(1) for any $0\leq s<t<\infty$, $T_X(t)$ is coarser than $T_X(s)$;
(2) there exists $t_X>0$ such that for all $t\geq t_X$, $T_X(t)=\{X\}$;
(3) for each $t\geq0$, there exists $\varepsilon>0$ such that $T_X(t)=T_X(t')$ for all $t'\in[t,t+\varepsilon]$;
(4) for each $x\in X$, there exists $t_x\geq0$ such that $\{x\}$ is a block in $T_X(t_x)$.
Similar to Lemma 2.2, which relates ultrametrics to dendrograms, there is an analogous correspondence between ultra-dissimilarity functions and treegrams on a finite set (see Figure 5 for an illustration).
Figure 5.
Treegrams:
Relation between ultra-dissimilarity functions and treegrams.
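The correspondence illustrated in Figure 5 can be made concrete with a few lines of Python. The sketch below assumes one natural construction (our reading, not a restatement of Proposition 6.5): at level $t$, the points $x$ with $u_X(x,x)\leq t$ are present, and two present points share a block if and only if $u_X(x,x')\leq t$. This relation is reflexive on present points, symmetric by condition (1) and transitive by condition (2) of Definition 6.1, so it indeed yields a subpartition; by condition (3), each point $x$ forms a singleton block at its birth time $u_X(x,x)$. The helper names are hypothetical.

```python
from itertools import product


def is_ultra_dissimilarity(u):
    """Check nonnegativity and conditions (1)-(3) of Definition 6.1 for a square matrix u."""
    n = len(u)
    pairs = list(product(range(n), repeat=2))
    if any(u[x][y] < 0 for x, y in pairs):
        return False
    for x, y, z in product(range(n), repeat=3):
        if u[x][y] != u[y][x] or u[x][y] > max(u[x][z], u[z][y]):
            return False
    # Condition (3): for x != y we need max(u(x,x), u(y,y)) < u(x,y).
    return all(max(u[x][x], u[y][y]) < u[x][y] for x, y in pairs if x != y)


def treegram(u, t):
    """Subpartition at level t: points with u[x][x] <= t, grouped by u[x][y] <= t."""
    blocks = []
    for x in range(len(u)):
        if u[x][x] > t:
            continue  # x is not yet "born" at level t
        for block in blocks:
            if u[x][block[0]] <= t:  # transitivity: one representative suffices
                block.append(x)
                break
        else:
            blocks.append([x])
    return sorted(tuple(block) for block in blocks)


# Three points with birth times u[x][x] = 1, 0, 2.
u = [[1, 2, 3],
     [2, 0, 3],
     [3, 3, 2]]
assert is_ultra_dissimilarity(u)
for t in (0, 1, 2, 3):
    print(t, treegram(u, t))
# 0 [(1,)]
# 1 [(0,), (1,)]
# 2 [(0, 1), (2,)]
# 3 [(0, 1, 2)]
```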
Proposition 6.5.
Given a finite set $X$, denote by $\mathcal{U}^{\mathrm{dis}}(X)$ the collection of all ultra-dissimilarity functions on $X$ and by $\mathcal{T}(X)$ the collection of all treegrams over $X$. Then there exists a bijection $\Delta_X:\mathcal{U}^{\mathrm{dis}}(X)\to\mathcal{T}(X)$.
An ultra-dissimilarity measure space is a triple $\mathcal{X}=(X,u_X,\mu_X)$, where $(X,u_X)$ is an ultra-dissimilarity space and $\mu_X$ is a probability measure fully supported on $X$. In the following, we denote by $\mathcal{U}^w_{\mathrm{dis}}$ the collection of all finite ultra-dissimilarity measure spaces.
For any finite ultra-dissimilarity measure spaces $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ and any coupling $\mu\in\mathcal{C}(\mu_X,\mu_Y)$, we define the $p$-ultra-distortion of $\mu$ for $1\leq p<\infty$ to be
$$\mathrm{dis}^{\mathrm{ult}}_p(\mu):=\Big(\iint_{X\times Y\times X\times Y}\Delta\big(u_X(x,x'),u_Y(y,y')\big)^p\,\mu(dx\times dy)\,\mu(dx'\times dy')\Big)^{1/p}$$
and for $p=\infty$ as
$$\mathrm{dis}^{\mathrm{ult}}_\infty(\mu):=\sup_{\substack{x,x'\in X,\ y,y'\in Y\\ \text{s.t. }(x,y),(x',y')\in\mathrm{supp}(\mu)}}\Delta\big(u_X(x,x'),u_Y(y,y')\big),$$
where $\mathrm{supp}(\mu)$ denotes the support of $\mu$. Based on this, $u_{GW,p}$ extends naturally to $\mathcal{U}^w_{\mathrm{dis}}$:
$$u_{GW,p}(\mathcal{X},\mathcal{Y}):=\inf_{\mu\in\mathcal{C}(\mu_X,\mu_Y)}\mathrm{dis}^{\mathrm{ult}}_p(\mu).\qquad(27)$$
Furthermore, one can adapt the notion of $p$-distortion (see (6)) to ultra-dissimilarity measure spaces in the same manner, and hence $d_{GW,p}$ also generalizes to $\mathcal{U}^w_{\mathrm{dis}}$.
Just as for metric spaces or metric measure spaces, it is important to have a notion of isomorphism between ultra-dissimilarity spaces.
Definition 6.6 (Isomorphism). Given $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{\mathrm{dis}}$, we say that they are isomorphic, denoted $\mathcal{X}\cong_w\mathcal{Y}$, if there is a bijection $f:X\to Y$ such that
(1) $f_\#\mu_X=\mu_Y$;
(2) for any $x,x'\in X$, $u_Y(f(x),f(x'))=u_X(x,x')$.
Next, we show that $u_{GW,p}$, $1\leq p\leq\infty$, is a metric on the isomorphism classes of $\mathcal{U}^w_{\mathrm{dis}}$. The first step is to verify the existence of an optimal coupling in (27).

Proposition 6.7.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{\mathrm{dis}}$. Then, for any $p\in[1,\infty]$, there always exists an optimal coupling $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ such that $u_{GW,p}(\mathcal{X},\mathcal{Y})=\mathrm{dis}^{\mathrm{ult}}_p(\mu)$.
Proof. The proof is essentially the same as that of Proposition 3.20; we only replace Lemma B.5 with Lemma 6.3. The details are left to the reader. $\square$
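For finite spaces, the $p$-ultra-distortion in (27) is a finite double sum (or a maximum, for $p=\infty$) over the support of the coupling. The Python sketch below evaluates it for a coupling stored as a matrix with entries $\mu(x_i,y_j)$, again assuming $\Delta(a,b)=\max(a,b)$ for $a\neq b$ and $\Delta(a,a)=0$; the function name is hypothetical. The example also shows that the diagonal values $u_X(x,x)$ matter for ultra-dissimilarities: two copies of the same space with their points listed in different orders have zero distortion under the matching coupling, but not under the naive identity coupling.

```python
import math


def delta(a, b):
    """Assumed ultrametric on R_{>=0}: max(a, b) if a != b, else 0."""
    return 0.0 if a == b else max(a, b)


def dis_ult(u_x, u_y, coupling, p):
    """p-ultra-distortion of a coupling given as coupling[i][j] = mu(x_i, y_j)."""
    support = [(i, j, w) for i, row in enumerate(coupling)
               for j, w in enumerate(row) if w > 0]
    if math.isinf(p):
        # sup over pairs (x, y), (x', y') in the support of the coupling
        return max(delta(u_x[i][k], u_y[j][l])
                   for i, j, _ in support for k, l, _ in support)
    total = sum(w * v * delta(u_x[i][k], u_y[j][l]) ** p
                for i, j, w in support for k, l, v in support)
    return total ** (1.0 / p)


# Birth times 1 and 0, merging at level 2.
u_x = [[1, 2], [2, 0]]
u_y = [[0, 2], [2, 1]]  # the same space with its two points listed in reverse order
matching = [[0.0, 0.5], [0.5, 0.0]]
identity = [[0.5, 0.0], [0.0, 0.5]]
for p in (1, 2, math.inf):
    print(p, dis_ult(u_x, u_y, matching, p), dis_ult(u_x, u_y, identity, p))
```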
As a second step, we demonstrate that $u_{GW,p}$ is a pseudo-metric on $\mathcal{U}^w_{\mathrm{dis}}$.
Theorem 6.8. $u_{GW,p}$ is a $p$-pseudo-metric on $\mathcal{U}^w_{\mathrm{dis}}$.
Proof. The proof of Theorem 3.19 only uses the strong triangle inequality and Proposition 3.20. The same strategy applies here, with Proposition 3.20 replaced by Proposition 6.7. Again, we leave the details to the reader. $\square$
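Theorem 6.9 below characterizes zero distance through the isomorphism notion of Definition 6.6. For very small spaces, Definition 6.6 can be checked directly by enumerating all bijections, as in the following Python sketch. The helper is hypothetical, its running time is exponential in $|X|$, and it uses exact rational weights (the Python class fractions.Fraction) to avoid floating-point equality issues. The second call shows that a distance-preserving bijection is not enough when the weights do not match.

```python
from fractions import Fraction
from itertools import permutations


def find_isomorphism(u_x, mu_x, u_y, mu_y):
    """Brute-force search for a bijection f as in Definition 6.6.

    Returns f as a tuple (f[x] is the image of x) or None. Exponential in |X|,
    so only suitable for tiny spaces; weights should be exact (e.g. Fractions).
    """
    n = len(u_x)
    if n != len(u_y):
        return None
    for f in permutations(range(n)):
        if all(mu_y[f[x]] == mu_x[x] for x in range(n)) and all(
            u_y[f[x]][f[z]] == u_x[x][z] for x in range(n) for z in range(n)
        ):
            return f
    return None


half, third = Fraction(1, 2), Fraction(1, 3)
u_x = [[1, 2], [2, 0]]
u_y = [[0, 2], [2, 1]]  # the same space with its two points listed in reverse order
print(find_isomorphism(u_x, [half, half], u_y, [half, half]))                # (1, 0)
print(find_isomorphism(u_x, [third, 2 * third], u_y, [third, 2 * third]))  # None
```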
Finally, we prove that, after identifying isomorphic spaces, we indeed obtain a metric space.
Theorem 6.9.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{\mathrm{dis}}$. Then $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$ if and only if $\mathcal{X}\cong_w\mathcal{Y}$.
Proof. If $\mathcal{X}\cong_w\mathcal{Y}$, then obviously $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$. Conversely, assume that $u_{GW,p}(\mathcal{X},\mathcal{Y})=0$. By Proposition 6.7 there exists $\mu\in\mathcal{C}(\mu_X,\mu_Y)$ such that $u_{GW,p}(\mathcal{X},\mathcal{Y})=\mathrm{dis}^{\mathrm{ult}}_p(\mu)=0$. Now, we define a map $\varphi:X\to Y$ as follows. For any $x\in X$ we have $\mu_X(x)>0$, since $\mu_X$ has full support and $X$ is finite. As a result, $\mu(x,y)>0$ for some $y\in Y$, and we let $\varphi(x):=y$. This map is well defined. Indeed, if there are $x\in X$ and $y,y'\in Y$ such that $\mu(x,y),\mu(x,y')>0$, then $\mathrm{dis}^{\mathrm{ult}}_p(\mu)=0$ implies that
$$\Delta\big(u_X(x,x),u_Y(y,y)\big)=\Delta\big(u_X(x,x),u_Y(y,y')\big)=\Delta\big(u_X(x,x),u_Y(y',y')\big)=0.$$
This implies that $u_Y(y,y)=u_Y(y,y')=u_Y(y',y')=u_X(x,x)$. Since $u_Y$ is an ultra-dissimilarity, we have that $y=y'$ (cf. condition (3) in Definition 6.1). Essentially the same argument shows that $\varphi:X\to Y$ is injective. Consequently, for any $x\in X$, $\mu_X(x)=\mu(x,\varphi(x))\leq\mu_Y(\varphi(x))$. Since $1=\sum_{x\in X}\mu_X(x)\leq\sum_{x\in X}\mu_Y(\varphi(x))\leq1$, we have that $\mu_X(x)=\mu_Y(\varphi(x))$ for all $x\in X$. Since $\mu_Y$ is fully supported, this implies that $\varphi$ is a bijective measure-preserving map. Now, for any $x,x'\in X$, $\mathrm{dis}^{\mathrm{ult}}_p(\mu)=0$ implies that $\Delta\big(u_X(x,x'),u_Y(\varphi(x),\varphi(x'))\big)=0$, i.e. $u_X(x,x')=u_Y(\varphi(x),\varphi(x'))$. Therefore, $\varphi$ is an isometry and thus an isomorphism, i.e. $\mathcal{X}\cong_w\mathcal{Y}$. $\square$

Lower bounds.
Lower bounds of $d_{GW,p}$ or $u_{GW,p}$ on $\mathcal{U}^w$ defined in Section 5 can be extended to $\mathcal{U}^w_{\mathrm{dis}}$ in the same way as $d_{GW,p}$ and $u_{GW,p}$ themselves. For example, for $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w_{\mathrm{dis}}$ we define the ultrametric second lower bound $\mathrm{SLB}^{\mathrm{ult}}_p$ as follows:
$$\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y}):=\inf_{\gamma\in\mathcal{C}(\mu_X\otimes\mu_X,\,\mu_Y\otimes\mu_Y)}\big\|\Delta(u_X,u_Y)\big\|_{L^p(\gamma)}.$$
In the sequel, we mainly focus on $\mathrm{SLB}^{\mathrm{ult}}_p$ and $\mathrm{TLB}^{\mathrm{ult}}_p$. In particular, they are indeed lower bounds of $u_{GW,p}$ on $\mathcal{U}^w_{\mathrm{dis}}$:
Theorem 6.10.
Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two finite ultra-dissimilarity measure spaces and let $p\in[1,\infty]$.
(1) $u_{GW,p}(\mathcal{X},\mathcal{Y})\geq\mathrm{TLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$.
(2) $u_{GW,p}(\mathcal{X},\mathcal{Y})\geq\mathrm{SLB}^{\mathrm{ult}}_p(\mathcal{X},\mathcal{Y})$.
Proof. The proofs follow directly from the proof of Chowdhury (2019, Theorem 24). $\square$

7. Computational aspects
In this section, we investigate algorithms for approximating or calculating $u_{GW,p}$, $1\leq p\leq\infty$. Furthermore, for $p<\infty$ we evaluate the performance of the lower bounds introduced in Section 5 and compare our findings to the results for the classical Gromov-Wasserstein distance $d_{GW,p}$ (see (7)). Matlab implementations of the presented algorithms and comparisons are available at https://github.com/ndag/uGW.

7.1. Algorithms.
Let $\mathcal{X}=(X,u_X,\mu_X)$ and $\mathcal{Y}=(Y,u_Y,\mu_Y)$ be two finite ultrametric measure spaces with cardinalities $m$ and $n$, respectively. We have already noted in Remark 3.21 that calculating $u_{GW,p}(\mathcal{X},\mathcal{Y})$ for $p<\infty$ amounts to solving a non-convex quadratic program (which is NP-hard in general (Pardalos and Vavasis, 1991)). Solving this problem exactly is not feasible in practice. However, in many practical applications it is sufficient to work with good approximations. Therefore, we propose to approximate $u_{GW,p}(\mathcal{X},\mathcal{Y})$ for $p<\infty$ via gradient descent and refer to Section 7.1.1 for the details. On the other hand, it is possible to calculate $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$ in polynomial time. The corresponding algorithm is based on Theorem 3.22 as well as on ideas from Mémoli et al. (2019) and is presented in Section 7.1.2.
7.1.1. The case $p<\infty$. As already mentioned, we propose to estimate the ultrametric Gromov-Wasserstein distance between $\mathcal{X}$ and $\mathcal{Y}$ for $p<\infty$ via conditional gradient descent. To this end, we note that the gradient $G$ that arises from (12) can, in the present setting, be expressed through the following partial derivatives with respect to $\mu$:
$$G_{i,j}=\sum_{k=1}^{m}\sum_{l=1}^{n}\Big(\Delta\big(u_X(x_i,x_k),u_Y(y_j,y_l)\big)\Big)^p\mu_{kl},\qquad\forall\,1\leq i\leq m,\ 1\leq j\leq n.\qquad(28)$$
Furthermore, since we deal with a non-convex minimization problem, the performance of the gradient descent strongly depends on the starting coupling $\mu^{(0)}$. Therefore, we follow the suggestion of Chowdhury and Needham (2020) and employ a Markov Chain Monte Carlo Hit-And-Run sampler to obtain multiple random starting couplings. Running gradient descent from each point in this ensemble greatly improves the approximation in many cases.
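Equation (28) and the conditional gradient loop of Algorithm 1 can be sketched in Python as follows. To keep the example self-contained, we make simplifying assumptions that are ours and not the paper's: both spaces have the same number $n$ of points and uniform weights, so the linear subproblem is minimized by a scaled permutation matrix and can be solved by brute force over all $n!$ permutations (instead of a general optimal transport solver), and we run a single start instead of the Hit-And-Run ensemble. We also assume $\Delta(a,b)=\max(a,b)$ for $a\neq b$ and $\Delta(a,a)=0$, and use the standard Frank-Wolfe step size $2/(j+2)$. Since the objective $\mathrm{dis}^{\mathrm{ult}}_p(\mu)^p=\langle G(\mu),\mu\rangle$ is quadratic in $\mu$, (28) is its gradient up to a factor 2, which does not affect the linear subproblem. The example uses spaces in the spirit of Remark 5.2 with $n=4$ and the value $u_X(x_1,x_2)=\tfrac12$; all function names are hypothetical.

```python
import itertools


def delta(a, b):
    """Assumed ultrametric on R_{>=0}: max(a, b) if a != b, else 0."""
    return 0.0 if a == b else max(a, b)


def gradient(u_x, u_y, mu, p):
    """G[i][j] = sum_{k,l} delta(u_x[i][k], u_y[j][l])**p * mu[k][l], as in (28)."""
    m, n = len(u_x), len(u_y)
    return [[sum(delta(u_x[i][k], u_y[j][l]) ** p * mu[k][l]
                 for k in range(m) for l in range(n))
             for j in range(n)]
            for i in range(m)]


def dis_p_power(u_x, u_y, mu, p):
    """dis_p^ult(mu)**p, which equals the inner product <G(mu), mu>."""
    g = gradient(u_x, u_y, mu, p)
    return sum(g[i][j] * mu[i][j] for i in range(len(mu)) for j in range(len(mu[0])))


def linear_oracle(g):
    """argmin of <g, nu> over couplings nu of two uniform measures on n points.

    Simplifying assumption: equal sizes and uniform weights, so a scaled
    permutation matrix is optimal (Birkhoff); we brute-force all permutations.
    """
    n = len(g)
    sigma = min(itertools.permutations(range(n)),
                key=lambda s: sum(g[i][s[i]] for i in range(n)))
    return [[1.0 / n if sigma[i] == j else 0.0 for j in range(n)] for i in range(n)]


def conditional_gradient(u_x, u_y, mu0, p, iterations=20):
    """Frank-Wolfe iterations with step 2 / (j + 2); returns (coupling, dis_p^ult)."""
    mu = [row[:] for row in mu0]
    for j in range(1, iterations + 1):
        target = linear_oracle(gradient(u_x, u_y, mu, p))
        step = 2.0 / (j + 2.0)
        mu = [[(1 - step) * a + step * b for a, b in zip(row, trow)]
              for row, trow in zip(mu, target)]
    return mu, dis_p_power(u_x, u_y, mu, p) ** (1.0 / p)


n = 4
u_y = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
u_x = [row[:] for row in u_y]
u_x[0][1] = u_x[1][0] = 0.5
identity = [[1.0 / n if i == j else 0.0 for j in range(n)] for i in range(n)]
print(dis_p_power(u_x, u_y, identity, 1))  # 0.125, i.e. 2 / n**2
mu, value = conditional_gradient(u_x, u_y, identity, 1, iterations=10)
print(value)
```

The first printed value reproduces the bound $u_{GW,1}\leq 2/n^2$ for the coupling that matches $x_i$ with $y_i$; the output of the conditional gradient loop is only an approximation of $u_{GW,1}$, since the underlying problem is non-convex.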
For a precise description of the proposed procedure, we refer to Algorithm 1.
In order to understand how $u_{GW,p}$ (or at least its approximation) is influenced by small changes in the structure of the considered ultrametric measure spaces, we consider as an example the ultrametric measure spaces $\mathcal{X}_i=(X_i,d_{X_i},\mu_{X_i})$, $1\leq i\leq4$, displayed in Figure 6. These ultrametric measure spaces differ only in one characteristic (e.g. one side length or the equipped measure). For $p=1$, the values of the comparisons $u_{GW,1}(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$ (computed with Algorithm 1, $L=\dots$, $N=40$), are shown in Table 1. It is remarkable that a change in diameter seems to influence $u_{GW,1}$ the most.
Algorithm 1: $u_{GW,p}(\mathcal{X},\mathcal{Y},p,N,L)$
/* Create a list of random couplings */
couplings = CreateRandomCouplings(N);
stat_points = cell(N);
for i = 1:N do
    μ^(0) = couplings{i};
    for j = 1:L do
        G = gradient from (28) w.r.t. μ^(j−1);
        μ̃^(j) = solve OT with ground loss G;
        γ^(j) = 2/(j+2);
        /* Alternatively, find γ ∈ [0,1] that minimizes dis^ult_p(μ^(j−1) + γ(μ̃^(j) − μ^(j−1))) */
        μ^(j) = (1 − γ^(j)) μ^(j−1) + γ^(j) μ̃^(j);
    end
    stat_points{i} = μ^(L);
end
Find μ* in stat_points that minimizes dis^ult_p(μ);
result = dis^ult_p(μ*);

Figure 6.
Ultrametric measure spaces:
Four non-isomorphic ultrametric measure spaces denoted (from left to right) as $\mathcal{X}_i=(X_i,d_{X_i},\mu_{X_i})$, $1\leq i\leq4$.
7.1.2. The case $p=\infty$. For $p=\infty$, it follows from Theorem 3.22 that
$$u_{GW,\infty}(\mathcal{X},\mathcal{Y})=\inf\{t\geq0\mid\mathcal{X}_t\cong_w\mathcal{Y}_t\}.\qquad(29)$$
This identity allows us to construct a polynomial-time algorithm for $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$ based on the ideas of Mémoli et al. (2019, Sec. 8.2.2). More precisely, let $\mathrm{spec}(X):=\{u_X(x,x')\mid x,x'\in X\}$ denote the spectrum of $X$. Then, in order to find the infimum in (29), it suffices to check whether $\mathcal{X}_t\cong_w\mathcal{Y}_t$ for each $t\in\mathrm{spec}(X)\cup\mathrm{spec}(Y)$, from the largest to the smallest value; $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$ is given by the smallest $t$ such that $\mathcal{X}_t\cong_w\mathcal{Y}_t$. This can be done in polynomial time by considering $\mathcal{X}_t$ and $\mathcal{Y}_t$ as labeled, weighted trees (e.g. by using a slight modification of the algorithm in Example 3.2 of Aho and Hopcroft (1974)). This induces

Table 1.
Comparison of different ultrametric measure spaces I:
The values of $u_{GW,1}(\mathcal{X}_i,\mathcal{X}_j)$ (approximated by Algorithm 1) and $u_{GW,\infty}(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$, where $\mathcal{X}_i$, $1\leq i\leq4$, denote the ultrametric measure spaces displayed in Figure 6.
the subsequent simple algorithm to calculate $u_{GW,\infty}$.
Algorithm 2: $u_{GW,\infty}(\mathcal{X},\mathcal{Y})$
spec = sort(spec(X) ∪ spec(Y), 'descend');
for i = 2:length(spec) do
    t = spec(i);
    if X_t ≇_w Y_t then
        return spec(i−1);
    end
end
return 0;
To compare $u_{GW,1}$ and $u_{GW,\infty}$, we reconsider the ultrametric measure spaces displayed in Figure 6 and repeat the comparisons from Section 7.1.1. The results summarized in Table 1 show that $u_{GW,\infty}$ is extremely sensitive to the changes made: for almost all comparisons it attains the maximal possible value. Only the comparison between the two spaces that differ solely in their small-scale structure yields a value that is smaller than the maximum of the diameters of the considered spaces.
7.2. The relationship between $u_{GW,p}$ and $\mathrm{SLB}^{\mathrm{ult}}_p$. Due to its low computational complexity, the tightness of the lower bound
$\mathrm{SLB}^{\mathrm{ult}}_p$ is of particular interest for practical applications. Hence, we study it empirically. To this end, we first consider the ultrametric measure spaces in Figure 6 and redo the comparisons from Section 7.1.1 using $\mathrm{SLB}^{\mathrm{ult}}_1$ instead of $u_{GW,1}$. The results are reported in Table 2. We see that only the values $\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{X}_i,\mathcal{X}_k)$, $1\leq i\leq4$, for a single space $\mathcal{X}_k$ are clearly distinct from the corresponding values $u_{GW,1}(\mathcal{X}_i,\mathcal{X}_k)$, $1\leq i\leq4$. This suggests that changes in the metric influence $\mathrm{SLB}^{\mathrm{ult}}_1$ in a similar fashion as $u_{GW,1}$, while changes in the measure have less impact on $\mathrm{SLB}^{\mathrm{ult}}_1$. In order to further investigate the differences and similarities in the behavior of $u_{GW,1}$ and $\mathrm{SLB}^{\mathrm{ult}}_1$, we consider two ways to generate random ultrametric measure spaces, namely independent sampling and subsampling.

Independent sampling:
In order to obtain independent random ultrametric spaces, one draws independent samples from a probability distribution. Each sample is turned into an ultrametric measure space by considering it as a fully connected graph weighted by the distances between the points in the sample and then defining a new (ultra-)metric $u$ between the

Table 2.
Comparison of different ultrametric measure spaces II:
Thevalues of
SLB ult1 p X i , X j q , 1 ď i ď j ď
4, where X i , 1 ď i ď
4, denote theultrametric measure spaces displayed in Figure 6.points. More precisely, for x and x in random sample, we define u p x , x q as the weight ofthe minimax path (that is, the largest weight of an edge, on a path chosen to minimize thislargest weight) between x and x . Alternatively, it is possible to employ a linkage algorithmto create a dendrogram, which then induces an ultrametric space. The obtained ultrametricspaces can then be equipped with a (random) measure. Subsampling:
In order to ensure that the structure of the sampled spaces is preserved, it is possible to create one large ultrametric space via independent sampling and then to subsample (e.g. uniformly) a number of points. Once the corresponding probability weights are normalized, these points induce a random ultrametric measure space.
In the following, we first consider independently sampled ultrametric measure spaces to compare $u_{GW,1}$ and $\mathrm{SLB}^{\mathrm{ult}}_1$; afterwards, we redo the comparisons for a collection of subsampled ultrametric measure spaces. We begin by independently sampling ultrametric measure spaces. For each $k=1,\dots,5$ and $n=10,20,30$, we draw three samples of size $n$ from the mixture distribution
$$\sum_{i=1}^{k}\frac1k\,U[\dots],$$
where $U[a,b]$ denotes the uniform distribution on $[a,b]$, and transform them into ultrametric measure spaces as described previously (the obtained ultrametric spaces are equipped with their respective uniform measures). In the following, these spaces are denoted by $\mathcal{Y}^i_{n,k}=\big(Y^i_{n,k},u_{Y^i_{n,k}},\mu_{Y^i_{n,k}}\big)$, $1\leq i\leq3$. We remark that $k$ can be regarded as the number of blocks in the dendrogram representation of the obtained ultrametric measure spaces (see Figure 7 for a visualization of three 3-block spaces). Next, we calculate $u_{GW,1}(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (approximated with Algorithm 1, $N=\dots$, $L=10$) and $\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$, $i,i'\in\{1,2,3\}$, $k,k'\in\{1,\dots,5\}$ and $n,n'\in\{10,20,30\}$. Then, we visualize the results, ordered by $k$ and $n$, as heatmaps (see Figure 8, top row). In the heatmaps in Figure 8 (and all subsequent heatmaps), the spaces $\{\mathcal{Y}^i_{n,k}\}_{k,n,i}$ are sorted in increasing lexicographical order of $(k,n,i)$ and are labeled $1,\dots,45$ according to this order. Figure 8 demonstrates that, at least in this setting, $\mathrm{SLB}^{\mathrm{ult}}_1$ is close to $u_{GW,1}$. Both $u_{GW,1}$ and $\mathrm{SLB}^{\mathrm{ult}}_1$ strongly discriminate between the different independently sampled spaces, and the larger the number of blocks $k$, the larger the values of $u_{GW,1}$ and $\mathrm{SLB}^{\mathrm{ult}}_1$.
Next, we investigate how the results change if we consider subsampled instead of independent ultrametric measure spaces. For $k=1,\dots,5$,
Figure 7.
Randomly sampled ultrametric measure spaces:
Illustration of three independently sampled ultrametric measure spaces with three blocks (top row) and three subsampled ultrametric measure spaces with three blocks (bottom row) as dendrograms.
we generate one large ultrametric space with $k$ blocks, where each block contains 100 points, as previously described. Then, from each large space we subsample 9 subspaces (three 10-point subspaces, three 20-point subspaces and three 30-point subspaces) and equip them with the uniform measure. These ultrametric measure spaces are denoted by $\mathcal{Z}^i_{n,k}=\big(Z^i_{n,k},u_{Z^i_{n,k}},\mu_{Z^i_{n,k}}\big)$, $1\leq i\leq3$, $n\in\{10,20,30\}$ and $k\in\{1,\dots,5\}$. Unlike the independently sampled ultrametric measure spaces, these subsampled ultrametric measure spaces have the same large-scale structure with high probability (see Figure 7 for an illustration). With the new spaces, we repeat the comparisons done for the independently sampled spaces. The results are summarized in Figure 8 (bottom row, same visualization) and, interestingly, differ quite drastically from those for the independently sampled ultrametric measure spaces. While $u_{GW,1}$ (approximated by Algorithm 1, $N=\dots$, $L=10$) and $\mathrm{SLB}^{\mathrm{ult}}_1$ are again close and display the same behavior, the differences between spaces with the same number of blocks are much less pronounced than before.
Let $\mathcal{X},\mathcal{Y}\in\mathcal{U}^w$. The reason why $\mathrm{SLB}^{\mathrm{ult}}_1$ behaves so differently for subsampled ultrametric measure spaces becomes evident when we consider the reformulation of $\mathrm{SLB}^{\mathrm{ult}}_1$ given by Corollary 5.6:
$$\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{X},\mathcal{Y})=\mathrm{SLB}_1(\mathcal{X},\mathcal{Y})+\frac12\int_{\mathbb{R}}t\,\big|(u_X)_\#(\mu_X\otimes\mu_X)-(u_Y)_\#(\mu_Y\otimes\mu_Y)\big|(dt).\qquad(30)$$
If we independently sample the ultrametric measure spaces, then these spaces usually have slightly different diameters and different distance distribution profiles around their diameters (cf. Figure 7). This causes the second summand in (30) to become large. If we subsample the ultrametric measure spaces, then they will have the same diameter and large
Figure 8.
Randomly sampled ultrametric measure spaces I:
Heatmap representation of $u_{GW,1}(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top left), of $\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top right), of $u_{GW,1}(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom right) and of $\mathrm{SLB}^{\mathrm{ult}}_1(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom left), $1\leq i,i'\leq3$, $n,n'\in\{10,20,30\}$ and $k,k'\in\{1,\dots,5\}$.
scale structure with high probability (see Figure 7). Hence, the second term in (30) is almost negligible in this case. Therefore, $\mathrm{SLB}^{\mathrm{ult}}_1$ is extremely sensitive to small perturbations of the large-scale structure of ultrametric measure spaces, and our simulations suggest that the same holds true for $u_{GW,1}$.

7.3. Relation to the Gromov-Wasserstein distance.
Next, we demonstrate the differences between the Gromov-Wasserstein distance $d_{GW,1}$, the lower bound $\mathrm{SLB}_1$, $u_{GW,1}$ and $\mathrm{SLB}^{\mathrm{ult}}_1$. To this end, we compare the ultrametric measure spaces displayed in Figure 6 and repeat the comparisons between the randomly sampled ultrametric measure spaces done in Section 7.2. The results are displayed in Table 3 and Figure 9, respectively. Considering the values in Table 3, we observe that $d_{GW,1}$ and $\mathrm{SLB}_1$ are hardly influenced by the differences between the ultrametric measure spaces $\mathcal{X}_i=(X_i,u_{X_i},\mu_{X_i})$, $1\leq i\leq4$. In particular, it is remarkable that $d_{GW,1}$ is affected the most by the changes made to the measure and not to the metric structure. Interestingly, this is not true for $\mathrm{SLB}_1$. The comparison of the results displayed in Figure 8 and Figure 9 further stresses the completely different behavior of $u_{GW,1}/\mathrm{SLB}^{\mathrm{ult}}_1$ and $d_{GW,1}/\mathrm{SLB}_1$. Since both $d_{GW,1}$ and $\mathrm{SLB}_1$ are robust to small changes of the metric structure of the considered ultrametric measure spaces, the results for comparing the independently sampled and subsampled ultrametric measure spaces differ only

Table 3.
Comparison of different ultrametric measure spaces III:
The values of $d_{GW,1}(\mathcal{X}_i,\mathcal{X}_j)$ (approximated by a version of Algorithm 1) and $\mathrm{SLB}_1(\mathcal{X}_i,\mathcal{X}_j)$, $1\leq i\leq j\leq4$, where $\mathcal{X}_i$, $1\leq i\leq4$, denote the ultrametric measure spaces displayed in Figure 6.
Figure 9.
Randomly sampled ultrametric measure spaces II:
Heatmap representation of $d_{GW,1}(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top left), of $\mathrm{SLB}_1(\mathcal{Y}^i_{n,k},\mathcal{Y}^{i'}_{n',k'})$ (top right), of $d_{GW,1}(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom right) and of $\mathrm{SLB}_1(\mathcal{Z}^i_{n,k},\mathcal{Z}^{i'}_{n',k'})$ (bottom left), $1\leq i,i'\leq3$, $n,n'\in\{10,20,30\}$ and $k,k'\in\{1,\dots,5\}$.
slightly. Furthermore, there seems to be almost no discrimination between the 4- and 5-block spaces considered, although they are structurally quite different. In conclusion, we find that $d_{GW,1}$ and $\mathrm{SLB}_1$ have trouble picking up crucial structural differences between the randomly created ultrametric measure spaces.

8. Phylogenetic tree shapes
Rooted phylogenetic trees (for a formal definition see e.g. Semple et al. (2003)) are a common tool to visualize and analyze the evolutionary relationship between different organisms. In combination with DNA sequencing, they are an important tool to study the rapid evolution of different pathogens. It is well known that the (unweighted) shape of a phylogenetic tree, i.e., the tree's connectivity structure without reference to its labels or the lengths of its branches, carries important information about macroevolutionary processes (see e.g. Mooers and Heard (1997); Blum and François (2006); Dayarian and Shraiman (2014); Wu and Choi (2016)). In order to study the evolution of and the relation between different pathogens, it is of great interest to compare the shapes of phylogenetic trees created on the basis of different data sets. Currently, the number of tools for performing phylogenetic tree shape comparison is quite limited, and the development of new methods for this task is an active field of research (Colijn and Plazzotta, 2018; Morozov, 2018; Kim et al., 2019; Liu et al., 2020). It is well known that certain classes of phylogenetic trees (as well as their respective tree shapes) can be identified with ultrametric spaces (Semple et al., 2003, Sec. 7). On the other hand, general phylogenetic trees are closely related to treegrams (see Definition 6.4). In the following, we use this connection and demonstrate in a preliminary, illustrative example that the computationally efficient lower bound $\mathrm{SLB}^{\mathrm{ult}}_1$ has some potential for comparing phylogenetic tree shapes. In particular, we contrast it with the metric defined for this application in Colijn and Plazzotta (2018) and study the behavior of $\mathrm{SLB}_1$ in this framework.

8.1. $\mathrm{SLB}^{\mathrm{ult}}_1$-based phylogenetic tree shape comparison.
In this subsection, we reconsider phylogenetic tree shape comparisons from Colijn and Plazzotta (2018) and thereby study HA protein sequences from human influenza A (H3N2) (data downloaded from NCBI on 22 January 2016). More precisely, we investigate the relation between two samples of size 200 of phylogenetic tree shapes with 500 tips. Phylogenetic trees from the first sample are based on a random subsample of size 500 of 2168 HA sequences that were collected in the USA between March 2010 and September 2015, while trees from the second sample are based on a random subsample of size 500 of 1388 HA sequences gathered in the tropics between January 2000 and October 2015 (for the exact construction of the trees see Colijn and Plazzotta (2018)). Although both samples of phylogenetic trees are based on HA protein sequences from human influenza A, we expect them to be quite different. On the one hand, influenza A is highly seasonal outside the tropics (where this seasonal variation is absent), with the majority of cases occurring in winter (Russell et al., 2008). On the other hand, it is well known that the ongoing evolution of the HA protein causes a 'ladder-like' shape of long-term influenza phylogenetic trees (Koelle et al., 2010; Volz et al., 2013; Westgeest et al., 2012; Łuksza and Lässig, 2014) that is typically less developed in short-term data sets. Thus, the different collection periods of the two data sets will most likely also influence the respective phylogenetic tree shapes.
In order to compare the (unweighted) phylogenetic tree shapes of the resulting 400 trees, we have to transform them into ultra-dissimilarity measure spaces $\mathcal{X}_i=(X_i,u_{X_i},\mu_{X_i})$, $1\leq i\leq400$. Here, $X_i$ denotes the set of tips of the $i$-th phylogenetic tree, and we refer to the corresponding (unweighted) tree shape as $T_i$. Next, we define the ultra-dissimilarities $u_{X_i}$ on $X_i$, $1\leq i\leq400$, as follows: let $x^i_1,x^i_2\in X_i$ and let $d^i_{1,2}$ be the length of a shortest path between $x^i_1$ and $x^i_2$.
Let $d^i_j$ be the
Figure 10.
Transforming a phylogenetic tree shape into an ultra-dissimilarity space:
In this figure, we illustrate the treegram corresponding to the ultra-dissimilarity space generated by (31) with respect to the phylogenetic tree shape on the left. Note that the treegram preserves the tree structure and that the smallest birth time of points is exactly 0.
length of the shortest path from $x^i_j$ to the root, $1\leq j\leq2$, and let $d^i$ be the length of the longest shortest path from any tip to the root. Then, for any $x^i_1,x^i_2\in X_i$ we define
$$u_{X_i}(x^i_1,x^i_2)=\begin{cases}d^i_{1,2}, & \text{if } x^i_1\neq x^i_2,\\ d^i-d^i_1, & \text{if } x^i_1=x^i_2,\end{cases}\qquad(31)$$
and weight all tips in $X_i$ equally (i.e. $\mu_{X_i}$ is the uniform measure on $X_i$). This naturally transforms the collection of phylogenetic tree shapes $T_i$, $1\leq i\leq400$, into ultra-dissimilarity measure spaces, which we compare by means of $\mathrm{SLB}^{\mathrm{ult}}_1$ (once again, we choose $p=1$ as an example); for reference, we also compare the tree shapes $T_i$, $1\leq i\leq400$, with the metric $d_{CP}$ of Colijn and Plazzotta (2018). The top row of Figure 11 visualizes the dissimilarity matrix for the comparisons of all 400 phylogenetic tree shapes obtained by applying $\mathrm{SLB}^{\mathrm{ult}}_1$ (the first 200 entries correspond to the tree shapes from the US influenza data and the second 200 to those from the tropical influenza data) as a heat map (left) and as a multidimensional scaling (MDS) plot (right). The heat map shows that the collection of US tree shapes is divided into a large group $G_1$ that is well separated from the phylogenetic tree shapes based on tropical data, $G_3:=(T_i)_{201\leq i\leq400}$, and a smaller subgroup $G_2$ that seems to be more similar (in the sense of $\mathrm{SLB}^{\mathrm{ult}}_1$) to the tropical phylogenetic tree shapes. In the following, $G_1$ and $G_2$ are referred to as the US main group and the US secondary group, respectively. This division is even more evident in the MDS plot on the right (black points represent tree shapes from the US main group, blue points tree shapes from the US secondary group and red points tree shapes based on the tropical data).
Figure 11.
Phylogenetic tree shape comparison:
Visualization of the dissimilarity matrices for the comparison of the phylogenetic tree shapes $T_i$, $1\leq i\leq400$, based on $\mathrm{SLB}^{\mathrm{ult}}_1$ (top row) and $d_{CP}$ (bottom row) as heat maps (left) and MDS plots (right).
We remark that, in order to highlight the subgroups, the US tree shapes have been reordered according to the output permutation of a single-linkage dendrogram (w.r.t. $\mathrm{SLB}^{\mathrm{ult}}_1$) based on the US tree submatrix, created with MATLAB (2019), and that the tropical tree shapes have been reordered analogously.
The second row of Figure 11 displays the analogous plots for $d_{CP}$. Note that the coloring in the MDS plot on the right is the same, i.e., $T\in G_1$ is represented by a black point, $T\in G_2$ by a blue one and $T\in G_3$ by a red one. Interestingly, the analysis based on these plots differs from the previous one. Using $d_{CP}$ to compare the phylogenetic tree shapes at hand, we can split the data into two clusters, where one corresponds to the US data and the other one to the tropical data, with only a small overlap (see the MDS plot in the second row of Figure 11 on the right). In particular, we notice that $d_{CP}$ does not clearly distinguish between the US groups $G_1$ and $G_2$.
In order to analyze the differences between $\mathrm{SLB}^{\mathrm{ult}}_1$ and $d_{CP}$ displayed in Figure 11, we collect different characteristics of the tree shapes in the groups $G_i$, $1\leq i\leq3$, such as $\frac{1}{|G_i|}\sum_{T_j\in G_i}\sum_{x,x'\in X_j}u_{X_j}(x,x')\,\mu_{X_j}(x)\,\mu_{X_j}(x')$ ("mean average distance") or
USA (main group) USA (secondary group) TropicsMean Avg. Dist. 28.38 38.61 38.19Mean Max. Dist. 61.03 89.33 95.65Mean Num. of 4-Struc. 15.61 14.08 7.81Mean Num. of 5-Struc. 28.04 27.97 35.82
Table 4.
Tree shape characteristics:
The means of several metric andconnectivity characteristics of the ultra-dissimilarity spaces X i and the corre-sponding phylogenetic tree shapes T i , 1 ď i ď G i ,1 ď i ď | G i | ř T i P G i max t u X i p x, x q| x, x P X i u (“mean maximal distance”), 1 ď i ď SLB ult1 strongly) as well as the mean numbers of certain connectivity structures, like the4- and 5-structures (these influence d CP , , for a formal definition see Colijn and Plazzotta(2018)). The values in Table 4 show that the mean average distance and the mean maximaldistance differ drastically between the two groups of the US tree shapes. The tree shapes inthese two groups are completely different from a metric perspective and the values for thesecondary US group strongly resemble those of the tropic tree shapes. On the other hand,the connectivity characteristics do not change too much between the US main and secondarygroup. Hence, the metric d CP , does not clearly divide the US trees into two groups, althoughthe differences are certainly present. When carefully checking the phylogenetic trees the rea-sons for the differences between trees in the US main group and US secondary group arenot immediately apparent. Nevertheless, it is remarkable that trees from the secondary UScluster generally contain more samples from California and Florida (on average 1.92 and 0.88more) and less from Maryland, Kentucky and Washington (on average 0.73, 0.83 and 0.72less).So far we have only considered unweighted phylogenetic tree shapes. However, the branchlengths of the considered phylogenetic trees are relevant in many examples, because they canfor instance reflect the (inferred) genetic distance between evolutionary events (Colijn andPlazzotta, 2018). 
While the branch lengths cannot easily be included in the metric $d_{\mathrm{CP}}$, the modeling of phylogenetic tree shapes as ultra-dissimilarity spaces is extremely flexible. It is straightforward to include branch lengths in the comparisons or to put emphasis on specific features (via weights on the corresponding tips). However, this is beyond the scope of our preliminary data analysis.

8.2. $\mathrm{SLB}_p$ based phylogenetic tree shape comparison. To conclude this section, we illustrate how the results change if we compare the phylogenetic tree shapes, or more precisely the corresponding ultra-dissimilarity spaces $X_i$, $1 \le i \le 400$, by means of SLB (cf. Section 5) instead of $\mathrm{SLB}^{\mathrm{ult}}_1$. The results of these comparisons are summarized in Figure 12 (for additional details see Figure 14 in Appendix D). The figure illustrates the corresponding dissimilarity matrix based on SLB for the comparison of the $X_i$, $1 \le i \le 400$, … $\mathrm{SLB}^{\mathrm{ult}}_1$, and the MDS plot shows that SLB also identifies the two groups $G_1$ and $G_2$ inside the collection of trees based on US data. Moreover, it suggests that SLB discriminates less strictly between tree shapes inside $G_{\ldots}$ than $\mathrm{SLB}^{\mathrm{ult}}_1$.
Figure 12.
Phylogenetic tree shape comparison based on SLB: Representation of the dissimilarity matrices for the comparisons of the ultra-dissimilarity spaces $X_i$, $1 \le i \le 400$, based on SLB as heat maps (left) and MDS plots (right).

In conclusion, we find that $\mathrm{SLB}^{\mathrm{ult}}_1$ and SLB give comparable results for the unweighted phylogenetic tree shape comparison. However, the discrimination based on SLB is in some cases less strict.

9. Concluding remarks
Since we suspect that computing $u_{\mathrm{GW},p}$ and $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ for finite $p$ leads to NP-hard problems, it seems interesting to identify suitable collections of ultrametric measure spaces on which these distances can be computed in polynomial time, as was done for the Gromov-Hausdorff distance in Mémoli et al. (2019).

Acknowledgements.
We are grateful to Prof. Colijn for sharing the data from Colijn and Plazzotta (2018) with us. F.M. and Z.W. acknowledge funding from the NSF under grants NSF CCF 1740761, NSF DMS 1723003, and NSF RI 1901360. A.M. and C.W. gratefully acknowledge support by the DFG Research Training Group 2088 and Cluster of Excellence MBExC 2067. F.M. and A.M. thank the Mathematisches Forschungsinstitut Oberwolfach. Conversations which eventually led to this project were initiated during the 2019 workshop "Statistical and Computational Aspects of Learning with Complex Structure".
References
GM Adelson-Velskii and AS Kronrod. About level sets of continuous functions with partial derivatives. In Dokl. Akad. Nauk SSSR, volume 49, pages 239–241, 1945.
Pankaj K. Agarwal, Kyle Fox, Abhinandan Nath, Anastasios Sidiropoulos, and Yusu Wang. Computing the Gromov-Hausdorff distance for metric trees. ACM Transactions on Algorithms (TALG), 14(2):1–20, 2018.
Alfred V. Aho and John E. Hopcroft. The design and analysis of computer algorithms. Pearson Education India, 1974.
David Alvarez-Melis and Tommi S. Jaakkola. Gromov-Wasserstein alignment of word embedding spaces. arXiv preprint arXiv:1809.00013, 2018.
Yair Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In Proceedings of 37th Conference on Foundations of Computer Science, pages 184–193. IEEE, 1996.
Louis J. Billera, Susan P. Holmes, and Karen Vogtmann. Geometry of the space of phylogenetic trees. Advances in Applied Mathematics, 27(4):733–767, 2001.
Michael G. B. Blum and Olivier François. Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Systematic Biology, 55(4):685–691, 2006.
Nicolas Bonneel, Julien Rabin, Gabriel Peyré, and Hanspeter Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015.
L. Bottou, M. Arjovsky, D. Lopez-Paz, and M. Oquab. Geometrical insights for implicit generative modeling. In Braverman Readings in Machine Learning. Key Ideas from Inception to Current State, pages 229–268. Springer, 2018.
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Efficient computation of isometry-invariant distances between surfaces. SIAM Journal on Scientific Computing, 28(5):1812–1836, 2006a.
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Generalized multidimensional scaling: A framework for isometry-invariant partial surface matching. Proceedings of the National Academy of Sciences, 103(5):1168–1172, 2006b.
Alexander M. Bronstein, Michael M. Bronstein, Alfred M. Bruckstein, and Ron Kimmel. Partial similarity of objects, or how to compare a centaur to a horse. International Journal of Computer Vision, 84(2):163, 2009a.
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Topology-invariant similarity of nonrigid shapes. International Journal of Computer Vision, 81(3):281, 2009b.
Alexander M. Bronstein, Michael M. Bronstein, Ron Kimmel, Mona Mahmoudi, and Guillermo Sapiro. A Gromov-Hausdorff framework with diffusion geometry for topologically-robust non-rigid shape matching. International Journal of Computer Vision, 89(2-3):266–286, 2010.
Charlotte Bunne, David Alvarez-Melis, Andreas Krause, and Stefanie Jegelka. Learning generative models across incomparable spaces. arXiv preprint arXiv:1905.05461, 2019.
Gunnar Carlsson and Facundo Mémoli. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11(Apr):1425–1470, 2010.
Frédéric Chazal, David Cohen-Steiner, Leonidas J. Guibas, Facundo Mémoli, and Steve Y. Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. In Computer Graphics Forum, volume 28, pages 1393–1403. Wiley Online Library, 2009.
Jie Chen and Ilya Safro. Algebraic distance on graphs. SIAM Journal on Scientific Computing, 33(6):3468–3490, 2011.
Samir Chowdhury. Metric and Topological Approaches to Network Data Analysis. PhD thesis, The Ohio State University, 2019.
Samir Chowdhury and Facundo Mémoli. The Gromov-Wasserstein distance between networks and stable network invariants. Information and Inference: A Journal of the IMA, 8(4):757–787, 2019.
Samir Chowdhury and Tom Needham. Generalized spectral clustering via Gromov-Wasserstein learning. arXiv preprint arXiv:2006.04163, 2020.
Caroline Colijn and Giacomo Plazzotta. A metric on phylogenetic tree shapes. Systematic Biology, 67(1):113–126, 2018.
Guy David and Stephen Semmes. Fractured fractals and broken dreams: Self-similar geometry through metric and measure, volume 7. Oxford University Press, 1997.
Adel Dayarian and Boris I. Shraiman. How to infer relative fitness from a sample of genomic sequences. Genetics, 197(3):913–923, 2014.
Khanh Do Ba, Huy L. Nguyen, Huy N. Nguyen, and Ronitt Rubinfeld. Sublinear time algorithms for Earth Mover's distance. Theory of Computing Systems, 48(2):428–442, 2011.
Yihe Dong and Will Sawin. Copt: Coordinated optimal transport on graphs. arXiv preprint arXiv:2003.03892, 2020.
Richard M. Dudley. Real analysis and probability. CRC Press, 2018.
Steven N. Evans. Probability and Real Trees: École d'Été de Probabilités de Saint-Flour XXXV-2005. Springer, 2007.
Steven N. Evans and Frederick A. Matsen. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):569–592, 2012.
Andreas Greven, Peter Pfaffelhuber, and Anita Winter. Convergence in distribution of random metric measure spaces (λ-coalescent measure trees). Probability Theory and Related Fields, 145(1-2):285–322, 2009.
Gillian Grindstaff and Megan Owen. Geometric comparison of phylogenetic trees with different leaf sets. arXiv preprint arXiv:1807.04235, 2018.
Jotun Hein. Reconstructing evolution of sequences subject to recombination using parsimony. Mathematical Biosciences, 98(2):185–200, 1990.
Liisa Holm and Chris Sander. Protein structure comparison by alignment of distance matrices. Journal of Molecular Biology, 233(1):123–138, 1993.
Norman R. Howes. Modern analysis and topology. Springer Science & Business Media, 2012.
Anil K. Jain and Chitra Dorai. 3D object recognition: Representation and matching. Statistics and Computing, 10(2):167–182, 2000.
Leonid V. Kantorovich. On the translocation of masses. C. R. (Dokl.) Acad. Sci. URSS (N.S.), 37:199, 1942.
Leonid V. Kantorovich and G. Rubinstein. On a space of completely additive functions (Russian). Vestnik Leningrad Univ, 13:52–59, 1958.
Jaehee Kim, Noah A. Rosenberg, and Julia A. Palacios. A metric space of ranked tree shapes and ranked genealogies. bioRxiv, 2019.
Benoît R. Kloeckner. A geometric study of Wasserstein spaces: Ultrametrics. Mathematika, 61(1):162–178, 2015.
Katia Koelle, Priya Khatri, Meredith Kamradt, and Thomas B. Kepler. A two-tiered model for simulating the ecological and evolutionary dynamics of rapidly evolving viruses, with an application to influenza. Journal of The Royal Society Interface, 7(50):1257–1274, 2010.
Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, and Gustavo Rohde. Generalized sliced Wasserstein distances. In Advances in Neural Information Processing Systems, pages 261–272, 2019.
Irina Kufareva and Ruben Abagyan. Methods of protein structure comparison. In Homology Modeling, pages 231–257. Springer, 2011.
Hao-Yuan Kuo, Hong-Ren Su, Shang-Hong Lai, and Chin-Chia Wu. 3D object detection and pose estimation from depth image for robotic bin picking. In , pages 1264–1269. IEEE, 2014.
Manuel Lafond, Nadia El-Mabrouk, Katharina T. Huber, and Vincent Moulton. The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics. Theoretical Computer Science, 760:15–34, 2019.
Tam Le, Nhat Ho, and Makoto Yamada. Fast tree variants of Gromov-Wasserstein. arXiv preprint arXiv:1910.04462, 2019a.
Tam Le, Makoto Yamada, Kenji Fukumizu, and Marco Cuturi. Tree-sliced variants of Wasserstein distances. In Advances in Neural Information Processing Systems, pages 12304–12315, 2019b.
Volkmar Liebscher. New Gromov-inspired metrics on phylogenetic tree space. Bulletin of Mathematical Biology, 80(3):493–518, 2018.
Pengyu Liu, Matthew Gould, and Caroline Colijn. Polynomial phylogenetic analysis of tree shapes. bioRxiv, 2020.
David G. Lowe. Local feature view clustering for 3D object recognition. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I. IEEE, 2001.
Marta Łuksza and Michael Lässig. A predictive fitness model for influenza. Nature, 507(7490):57–61, 2014.
Colin L. Mallows. A note on asymptotic joint normality. The Annals of Mathematical Statistics, pages 508–515, 1972.
MATLAB. MATLAB: Accelerating the pace of engineering and science. The MathWorks, Inc., 2019. URL .
Andrew McGregor and Daniel Stubbs. Sketching Earth-Mover distance on graph metrics. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 274–286. Springer, 2013.
Facundo Mémoli. On the use of Gromov-Hausdorff distances for shape comparison. 2007.
Facundo Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
Facundo Mémoli. Some properties of Gromov-Hausdorff distances. Discrete & Computational Geometry, 48(2):416–440, 2012.
Facundo Mémoli and Guillermo Sapiro. Comparing point clouds. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pages 32–40, 2004.
Facundo Mémoli, Zane Smith, and Zhengchao Wan. Gromov-Hausdorff distances on p-metric spaces and ultrametric spaces. arXiv preprint arXiv:1912.00564, 2019.
Arne O. Mooers and Stephen B. Heard. Inferring evolutionary process from phylogenetic tree shape. The Quarterly Review of Biology, 72(1):31–54, 1997.
Alexey Anatolievich Morozov. Extension of Colijn-Plazotta tree shape distance metric to unrooted trees. bioRxiv, page 506022, 2018.
Dmitriy Morozov, Kenes Beketayev, and Gunther Weber. Interleaving distance between merge trees. Discrete and Computational Geometry, 49(22-45):52, 2013.
Megan Owen and J. Scott Provan. A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(1):
The International Journal of Robotics Research, 31(4):538–553, 2012.
Panos M. Pardalos and Stephen A. Vavasis. Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1(1):15–22, 1991.
Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In International Conference on Machine Learning, pages 2664–2672, 2016.
Derong Qiu. Geometry of non-Archimedean Gromov-Hausdorff distance. P-Adic Numbers, Ultrametric Analysis, and Applications, 1(4):317, 2009.
Georges Reeb. Sur les points singuliers d'une forme de Pfaff complètement intégrable ou d'une fonction numérique [On the singular points of a completely integrable Pfaff form or of a numerical function]. Comptes Rendus Acad. Sciences Paris, 222:847–849, 1946.
David F. Robinson. Comparison of labeled trees with valency three. Journal of Combinatorial Theory, Series B, 11(2):105–119, 1971.
David F. Robinson and Leslie R. Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2):131–147, 1981.
Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. The Earth Mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.
Colin A. Russell, Terry C. Jones, Ian G. Barr, Nancy J. Cox, Rebecca J. Garten, Vicky Gregory, Ian D. Gust, Alan W. Hampson, Alan J. Hay, Aeron C. Hurt, et al. The global circulation of seasonal influenza A (H3N2) viruses. Science, 320(5874):340–346, 2008.
Felix Schmiedl. Computational aspects of the Gromov-Hausdorff distance and its application in non-rigid shape matching. Discret. Comput. Geom., 57(4):854–880, 2017. doi: 10.1007/s00454-017-9889-4. URL https://doi.org/10.1007/s00454-017-9889-4.
Charles Semple, Mike Steel, et al. Phylogenetics, volume 24. Oxford University Press on Demand, 2003.
Zane Smith, Samir Chowdhury, and Facundo Mémoli. Hierarchical representations of network data with optimal distortion bounds. In , pages 1834–1838. IEEE, 2016.
Karl-Theodor Sturm. On the geometry of metric measure spaces. Acta Mathematica, 196(1):65–131, 2006.
Karl-Theodor Sturm. The space of spaces: Curvature bounds and gradient flows on the space of metric measure spaces. arXiv preprint arXiv:1208.0434, 2012.
David Thorsley and Eric Klavins. Model reduction of stochastic processes using Wasserstein pseudometrics. In , pages 1374–1381. IEEE, 2008.
Vayer Titouan, Nicolas Courty, Romain Tavenard, and Rémi Flamary. Optimal transport for structured data with application on graphs. In International Conference on Machine Learning, pages 6275–6284, 2019.
Elena Farahbakhsh Touli and Yusu Wang. FPT-algorithms for computing Gromov-Hausdorff and interleaving distances between trees. arXiv preprint arXiv:1811.02425, 2018.
Sergei S. Vallender. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications, 18(4):784–786, 1974.
Titouan Vayer, Rémi Flamary, Romain Tavenard, Laetitia Chapel, and Nicolas Courty. Sliced Gromov-Wasserstein. arXiv preprint arXiv:1905.10124, 2019.
Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.
Cédric Villani. Optimal transport: Old and new, volume 338. Springer Science & Business Media, 2008.
Erik M. Volz, Katia Koelle, and Trevor Bedford. Viral phylodynamics. PLoS Comput Biol, 9(3):e1002947, 2013.
Zhengchao Wan. A novel construction of Urysohn universal ultrametric space via the Gromov-Hausdorff ultrametric. arXiv preprint arXiv:2007.08105, 2020.
Kim B. Westgeest, Miranda de Graaf, Mathieu Fourment, Theo M. Bestebroer, Ruud van Beek, Monique I. J. Spronken, Jan C. de Jong, Guus F. Rimmelzwaan, Colin A. Russell, Albert D. M. E. Osterhaus, et al. Genetic evolution of the neuraminidase of influenza A (H3N2) viruses from 1968 to 2009 and its correspondence to haemagglutinin evolution. The Journal of General Virology, 93(Pt 9):1996, 2012.
Taoyang Wu and Kwok Pui Choi. On joint subtree distributions under two evolutionary models. Theoretical Population Biology, 108:13–23, 2016.
Ihor Zarichnyi. Gromov-Hausdorff ultrametric. arXiv preprint math/0511437, 2005.
Appendix A. Missing details from Section 2
A.1.
Synchronized rooted trees. A synchronized rooted tree is a combinatorial tree $T = (V, E)$ with a root $o \in V$ and a height function $h: V \to [0, \infty)$, such that $h^{-1}(0)$ coincides with the leaf set and $h(v) < h(v^*)$ for each $v \in V \setminus \{o\}$, where $v^*$ is the parent of $v$. Lemma 2.2 establishes a correspondence between ultrametric spaces and dendrograms. Similarly, an ultrametric space $X$ uniquely determines a synchronized rooted tree $T_X$ (Kloeckner, 2015). Given a compact ultrametric space $(X, u_X)$, we construct the corresponding synchronized rooted tree $T_X$ via the dendrogram $\theta_X$ associated with $u_X$. Recall from Section 2.3 that $V(X) := \bigcup_{t > 0} \theta_X(t)$. For each $B \in V(X) \setminus \{X\}$, denote by $B^*$ the smallest element in $V(X)$ such that $B \subsetneq B^*$. Its existence is guaranteed by the following lemma.

Lemma A.1.
Let $X$ be a compact ultrametric space and let $V(X) = \bigcup_{t > 0} \theta_X(t)$, where $\theta_X$ is as defined in Remark 2.5. For each $B \in V(X)$ such that $B \neq X$, there exists $B^* \in V(X)$ such that $B \subsetneq B^*$ and $B^* \subseteq B'$ for all $B' \in V(X)$ with $B \subsetneq B'$.

Proof. Let $\delta := \operatorname{diam}(B)$ and let $x \in B$; then $B = [x]_\delta$. By Lemma 2.7, $X_\delta$ is a finite set. Consider $\delta^* := \min\{u_{X_\delta}([x]_\delta, [x']_\delta) \mid [x']_\delta \neq [x]_\delta\}$. Let $B^* := [x]_{\delta^*}$. Then $B^*$ is the smallest element in $V(X)$ containing $B$ under inclusion. Indeed, $B^* \neq B$, and if $B \subseteq B'$ for some $B' \in V(X)$, then $B' = [x]_r$ for some $r > \delta$. It is easy to see that $[x]_r = [x]_\delta$ for all $\delta < r < \delta^*$. Therefore, if $B' \neq B$, we must have that $r \ge \delta^*$, and thus $B^* = [x]_{\delta^*} \subseteq [x]_r = B'$. $\square$

Now, we define a combinatorial tree $T_X = (V_X, E_X)$ as follows. We let $V_X := V(X)$. For any distinct $B, B' \in V(X)$, we let $(B, B') \in E_X$ if and only if either $B = (B')^*$ or $B' = B^*$. We choose $X \in V_X$ to be the root of $T_X$; then any $B \neq X$ in $V_X$ has a unique parent $B^*$. We define $h_X: V_X \to [0, \infty)$ by $h_X(B) := \operatorname{diam}(B)$ for any $B \in V_X$. Now, $T_X$ endowed with the root $X$ and the height function $h_X$ is a synchronized rooted tree. It is easy to see that $X$ can be isometrically identified with $h_X^{-1}(0)$ of the so-called metric completion of $T_X$ (see (Kloeckner, 2015, Section 2.3) for details). With this construction, Lemma 2.12 follows directly from (Kloeckner, 2015, Lemma 3.1).

A.2.
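Closed-form solution for $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}$.

Theorem A.2. Given $1 \le p, q < \infty$, we have that
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha, \beta) \le \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p \, dt\right)^{1/p}.$$
When $q \le p$, equality holds. When $q > p$, equality does not hold in general.

Proof. Note that $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha, \beta) = \inf_{(\xi, \eta)} \big(\mathbb{E}[\Delta_q(\xi, \eta)^p]\big)^{1/p}$, where $\xi$ and $\eta$ are two random variables with marginal distributions $\alpha$ and $\beta$, respectively. Moreover, let $\zeta$ be a random variable uniformly distributed on $[0, 1]$. Then $F_\alpha^{-1}(\zeta)$ has distribution function $F_\alpha$, and $F_\beta^{-1}(\zeta)$ has distribution function $F_\beta$ (see for example Vallender (1974)). Letting $\xi = F_\alpha^{-1}(\zeta)$ and $\eta = F_\beta^{-1}(\zeta)$, we obtain
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha, \beta) \le \big(\mathbb{E}[\Delta_q(\xi, \eta)^p]\big)^{1/p} = \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p \, dt\right)^{1/p}.$$
Now assume $q \le p$. Denote by $S_q: [0, \infty) \to [0, \infty)$ the map $x \mapsto x^q$. Then,
$$\begin{aligned}
\Big(d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha, \beta)\Big)^p &= \inf_{\mu \in \mathcal{C}(\alpha, \beta)} \iint_{\mathbb{R}_{\ge 0} \times \mathbb{R}_{\ge 0}} \big(\Delta_q(x, y)\big)^p \, \mu(dx \times dy) \\
&= \inf_{\mu \in \mathcal{C}(\alpha, \beta)} \iint_{\mathbb{R}_{\ge 0} \times \mathbb{R}_{\ge 0}} |S_q(x) - S_q(y)|^{p/q} \, \mu(dx \times dy) \\
&= \inf_{\mu \in \mathcal{C}(\alpha, \beta)} \iint_{\mathbb{R}_{\ge 0} \times \mathbb{R}_{\ge 0}} |s - t|^{p/q} \, (S_q \times S_q)_\# \mu(ds \times dt) \\
&= \Big(d_{W,p/q}^{(\mathbb{R}_{\ge 0}, \Delta_1)}\big((S_q)_\# \alpha, (S_q)_\# \beta\big)\Big)^{p/q},
\end{aligned}$$
where we use $p/q \ge 1$. Moreover, since $p/q \ge 1$,
$$\Big(d_{W,p/q}^{(\mathbb{R}_{\ge 0}, \Delta_1)}\big((S_q)_\# \alpha, (S_q)_\# \beta\big)\Big)^{p/q} = \int_0^1 \big|F_{\alpha,q}^{-1}(t) - F_{\beta,q}^{-1}(t)\big|^{p/q} \, dt,$$
where $F_{\alpha,q}$ and $F_{\beta,q}$ are the distribution functions of $(S_q)_\# \alpha$ and $(S_q)_\# \beta$, respectively. It is easy to verify that $F_{\alpha,q}^{-1}(t) = \big(F_\alpha^{-1}(t)\big)^q$ and $F_{\beta,q}^{-1}(t) = \big(F_\beta^{-1}(t)\big)^q$. Therefore,
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha, \beta) = \left(\int_0^1 \Delta_q\big(F_\alpha^{-1}(t), F_\beta^{-1}(t)\big)^p \, dt\right)^{1/p}.$$
Now consider the case $p < q$. We first consider the extreme case $p = 1$, $q = \infty$ (though we require $q < \infty$ in the assumptions of the theorem, we relax this for now). Let $\alpha_1$ and $\beta_1$ each be a combination of two Dirac measures $\delta_x$, where $\delta_x$ denotes the Dirac measure at the point $x \in \mathbb{R}_{\ge 0}$, such that
$$d_{W,1}^{(\mathbb{R}_{\ge 0}, \Delta_\infty)}(\alpha_1, \beta_1) = \ldots < \ldots = \int_0^1 \Delta_\infty\big(F_{\alpha_1}^{-1}(t), F_{\beta_1}^{-1}(t)\big) \, dt.$$
It is not hard to see that both $d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha_1, \beta_1)$ and $\left(\int_0^1 \Delta_q\big(F_{\alpha_1}^{-1}(t), F_{\beta_1}^{-1}(t)\big)^p \, dt\right)^{1/p}$ are continuous with respect to $p \in [1, \infty)$ and $q \in [1, \infty]$. Then, for $p$ close to 1 and large enough $q < \infty$ (in particular, $p < q$), we have that
$$d_{W,p}^{(\mathbb{R}_{\ge 0}, \Delta_q)}(\alpha_1, \beta_1) < \left(\int_0^1 \Delta_q\big(F_{\alpha_1}^{-1}(t), F_{\beta_1}^{-1}(t)\big)^p \, dt\right)^{1/p}. \qquad \square$$

A.3.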
Relegated results and proofs from Section 2.
In this section we give the proofs of various results from Section 2.
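Several of the proofs below (Theorem 2.4, Lemmas 2.11 and 2.13) work with the dendrogram $\theta_u(t) = X_t$ of an ultrametric and with the inverse map $\theta \mapsto u_\theta$. For a finite ultrametric space both maps can be computed directly, because $\theta_u$ only changes at the finitely many values of $u$. The Python sketch below illustrates the bijection of Theorem 2.4 in that finite case. The function names are hypothetical, and a dendrogram is represented by its list of critical levels together with the corresponding partitions.

```python
import numpy as np


def dendrogram_from_ultrametric(u):
    """Critical levels of theta_u: for each distinct value t of u, the partition of X
    into the classes of x ~_t x' <=> u[x, x'] <= t (theta_u is constant in between).
    For an ultrametric, {y : u[x, y] <= t} is exactly the class of x."""
    u = np.asarray(u, dtype=float)
    n = u.shape[0]
    levels = []
    for t in np.unique(u):                     # sorted distinct values, starting at 0
        blocks = {frozenset(np.flatnonzero(u[x] <= t).tolist()) for x in range(n)}
        levels.append((float(t), sorted(blocks, key=min)))
    return levels


def ultrametric_from_dendrogram(levels, n):
    """u_theta(x, x') = inf{t >= 0 : x and x' lie in the same block of theta(t)},
    evaluated over the finitely many critical levels."""
    u = np.full((n, n), np.inf)
    for t, partition in levels:
        for block in partition:
            idx = sorted(block)
            u[np.ix_(idx, idx)] = np.minimum(u[np.ix_(idx, idx)], t)
    return u


u = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 2.0],
              [2.0, 2.0, 0.0]])
levels = dendrogram_from_ultrametric(u)
for t, partition in levels:
    print(t, [sorted(b) for b in partition])
# 0.0 [[0], [1], [2]]
# 1.0 [[0, 1], [2]]
# 2.0 [[0, 1, 2]]
print(np.array_equal(ultrametric_from_dendrogram(levels, 3), u))  # True
```

The round trip $u \mapsto \theta_u \mapsto u_{\theta_u}$ returns the original matrix, mirroring the statement that $\Delta_X$ and $\Upsilon_X$ are mutually inverse.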
Lemma A.3. $\Delta$ defines an ultrametric on $\mathbb{R}_+$.

Proof. It is evident that $\Delta$ is non-negative and symmetric. Let $a, b, c \in \mathbb{R}_+$. Then, by definition, $\Delta(a, b) = 0$ if and only if $a = b$. Furthermore, we have to distinguish several cases to demonstrate that
$$\Delta(a, c) \le \max\{\Delta(a, b), \Delta(b, c)\}. \tag{32}$$
As (32) is trivial for $a = c$, we assume $a \neq c$ and consider the three cases $a \neq b \neq c$, $a = b \neq c$ and $a \neq b = c$.
(1) $a \neq b \neq c$: In this case we have that $\Delta(a, c) = \max(a, c) \le \max\{\max(a, b), \max(b, c)\} = \max\{\Delta(a, b), \Delta(b, c)\}$.
(2) $a = b \neq c$: It follows that $\Delta(a, c) = \max(a, c) = \max\{0, \max(a, c)\} = \max\{0, \max(b, c)\} = \max\{\Delta(a, b), \Delta(b, c)\}$.
(3) $a \neq b = c$: We obtain that $\Delta(a, c) = \max(a, c) = \max\{\max(a, c), 0\} = \max\{\max(a, b), 0\} = \max\{\Delta(a, b), \Delta(b, c)\}$.
This yields the claim. $\square$
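The case analysis above corresponds to $\Delta(a, b) = \max(a, b)$ for $a \neq b$ and $\Delta(a, a) = 0$. This is also the pointwise limit as $q \to \infty$ of $\Delta_q(x, y) = |x^q - y^q|^{1/q}$, the form of $\Delta_q$ used in the proof of Theorem A.2, and it can be checked numerically. The Python sketch below (hypothetical function names) verifies the strong triangle inequality (32) on a few points. It also illustrates the extreme case $p = 1$, $q = \infty$ of Theorem A.2. The two-point measures $\alpha = \tfrac12(\delta_0 + \delta_1)$ and $\beta = \tfrac12(\delta_1 + \delta_2)$ are an illustrative choice, not the measures of the original proof. For these measures the quantile coupling is optimal for $\Delta_1(x, y) = |x - y|$, but not for $\Delta$.

```python
import itertools


def delta(a, b):
    """The ultrametric of Lemma A.3: max(a, b) if a != b, and 0 if a == b."""
    return 0.0 if a == b else max(a, b)


def euclid(a, b):
    """Delta_1(a, b) = |a - b|."""
    return abs(a - b)


def satisfies_strong_triangle(points, dist=delta):
    """Check dist(a, c) <= max(dist(a, b), dist(b, c)) for all triples from points."""
    return all(dist(a, c) <= max(dist(a, b), dist(b, c))
               for a, b, c in itertools.product(points, repeat=3))


def w1_two_atoms(xs, ys, cost):
    """Exact W_1 between (delta_{x1} + delta_{x2})/2 and (delta_{y1} + delta_{y2})/2.

    Every coupling has the form [[s, 1/2 - s], [1/2 - s, s]] with 0 <= s <= 1/2, and the
    transport cost is linear in s, so the optimum sits at s = 1/2 or at s = 0."""
    (x1, x2), (y1, y2) = xs, ys
    return min(0.5 * (cost(x1, y1) + cost(x2, y2)),
               0.5 * (cost(x1, y2) + cost(x2, y1)))


def quantile_cost_two_atoms(xs, ys, cost):
    """Cost of the quantile coupling (F_alpha^{-1}(t), F_beta^{-1}(t)), t uniform on [0, 1]."""
    (x1, x2), (y1, y2) = sorted(xs), sorted(ys)
    return 0.5 * (cost(x1, y1) + cost(x2, y2))


print(satisfies_strong_triangle([0.0, 0.5, 1.0, 2.0, 3.5]))      # True
print(satisfies_strong_triangle([0.0, 1.0, 2.0], dist=euclid))   # False
xs, ys = (0.0, 1.0), (1.0, 2.0)
print(w1_two_atoms(xs, ys, delta), quantile_cost_two_atoms(xs, ys, delta))    # 1.0 1.5
print(w1_two_atoms(xs, ys, euclid), quantile_cost_two_atoms(xs, ys, euclid))  # 1.0 1.0
```

With the cost $\Delta$, sending the mass at 1 to 1 (cost 0) and the mass at 0 to 2 (cost 2) beats the monotone quantile coupling. With $|x - y|$, both couplings cost 1, in line with the equality case $q \le p$.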
Proof of Theorem 2.4.
Given $\theta \in D(X)$, we define $u_\theta: X \times X \to \mathbb{R}_{\ge 0}$ as follows:
$$u_\theta(x, x') := \inf\{t \ge 0 \mid x \text{ and } x' \text{ belong to the same block of } \theta(t)\}.$$
It is straightforward to verify that $u_\theta$ is an ultrametric. For any Cauchy sequence $\{x_n\}_{n \in \mathbb{N}}$ in $(X, u_\theta)$, let $D_i := \sup_{m, n \ge i} u_\theta(x_m, x_n)$ for each $i \in \mathbb{N}$. Since the sequence is Cauchy (and because of (3)), each $D_i < \infty$ and $\lim_{i \to \infty} D_i = 0$. By definition of $u_\theta$, for each $i \in \mathbb{N}$ the set $\{x_n\}_{n \ge i}$ is contained in the block $[x_i]_{D_i} \in \theta(D_i)$. Let $X_i := [x_i]_{D_i}$ for each $i \in \mathbb{N}$. Then, obviously, $X_j \subseteq X_i$ for any $1 \le i < j$. By condition (7) in Definition 2.3, we have that $\bigcap_{i \in \mathbb{N}} X_i \neq \emptyset$. Choose $x^* \in \bigcap_{i \in \mathbb{N}} X_i$. It is easy to verify that $x^* = \lim_{n \to \infty} x_n$, and thus $(X, u_\theta)$ is complete.

To prove that $(X, u_\theta)$ is compact, we need to verify that $X_t$ is finite for each $t > 0$ (cf. Lemma 2.7). In fact, the equivalence relation $\sim_t$ with respect to $u_\theta$ is the same as the one induced by the partition $\theta(t)$. Since $\theta(t)$ is finite by condition (6) in Definition 2.3, $X_t$ is finite, and thus $X$ is compact. We have now proved that $u_\theta \in U(X)$. Based on this, we define a map $\Upsilon_X: D(X) \to U(X)$ by $\theta \mapsto u_\theta$.

Now, given $u \in U(X)$, we define a map $\theta_u: [0, \infty) \to \mathrm{Part}(X)$ as follows. For each $t \ge 0$, consider the equivalence relation $\sim_t$ with respect to $u$, i.e., $x \sim_t x'$ if and only if $u(x, x') \le t$. We set $\theta_u(t)$ to be the partition induced by $\sim_t$, i.e., $\theta_u(t) = X_t$. It is not hard to show that $\theta_u$ satisfies conditions (1)–(5) in Definition 2.3. Since $X$ is compact, $\theta_u(t) = X_t$ is finite for each $t > 0$, and thus $\theta_u$ satisfies condition (6) in Definition 2.3. Now, let $\{t_n\}_{n \in \mathbb{N}}$ be a decreasing sequence such that $\lim_{n \to \infty} t_n = 0$, and let $X_n \in \theta_u(t_n)$ be such that $X_m \subseteq X_n$ for any $1 \le n < m$. Each $X_n = [x_n]_{t_n}$ for some $x_n \in X$, so each $X_n$ is a non-empty closed, hence compact, subset of $X$. The $X_n$ form a decreasing sequence of non-empty compact sets, so $\bigcap_{n \in \mathbb{N}} X_n \neq \emptyset$. Therefore, $\theta_u$ satisfies condition (7) in Definition 2.3, and thus $\theta_u \in D(X)$. Then, we define the map $\Delta_X: U(X) \to D(X)$ by $u \mapsto \theta_u$. It is easy to check that $\Delta_X$ is the inverse of $\Upsilon_X$, and thus we have established a bijection between $D(X)$ and $U(X)$. $\square$

Proof of Lemma 2.11.
Given any $t > 0$ and $x \in X$, $[x]_t = B_t(x) = \{x' \in X \mid u_X(x, x') \le t\}$. Therefore, $V(X)$ is a collection of closed balls in $X$. Conversely, any closed ball $B_t(x)$ with positive radius $t > 0$ equals $[x]_t \in \theta_X(t)$ and thus belongs to $V(X)$. Now, consider a singleton $\{x\} = B_0(x)$. If $x$ is not a cluster point, then there exists $t > 0$ such that $B_t(x) = \{x\}$, which implies that $\{x\} \in V(X)$. If $x$ is a cluster point, then $\{x\} \subsetneq B_t(x) = [x]_t$ for any $t > 0$, and thus $\{x\} \notin V(X)$. In conclusion, $V(X)$ is the collection of all closed balls in $X$ except for the singletons $\{x\}$ such that $x$ is a cluster point of $X$.

If $X$ is a one-point space, then obviously $X \in V(X) = \{X\}$. Otherwise, let $\delta := \operatorname{diam}(X) > 0$; then $X = [x]_\delta \in V(X)$ for any $x \in X$. As for singletons $\{x\}$ where $x \in X$ is not a cluster point, we have proved above that $\{x\} \in V(X)$. $\square$

Proof of Lemma 2.13.
First of all, we show that the right-hand side of (16) is well defined. More precisely, we employ Lemma 2.7 to prove that the supremum
$$\sup_{\substack{B \in V(X) \setminus \{X\} \\ \alpha(B) \neq \beta(B)}} \operatorname{diam}(B^*)$$
is attained. For any given $B_1 \in V(X) \setminus \{X\}$ such that $\alpha(B_1) \neq \beta(B_1)$, we have that $\operatorname{diam}(B_1^*) > 0$. By Lemma 2.7, the spaces $X_t$ are finite for $t > 0$. Since $V(X) = \{[x]_t \mid x \in X, \ t > 0\} = \bigcup_{t > 0} X_t$, there are only finitely many $B \in V(X)$ such that $\operatorname{diam}(B) \ge \operatorname{diam}(B_1^*)$. Hence there are only finitely many values $\operatorname{diam}(B^*) \ge \operatorname{diam}(B_1^*)$ with $B \in V(X) \setminus \{X\}$. This implies that the supremum is attained, and thus
$$\sup_{\substack{B \in V(X) \setminus \{X\} \\ \alpha(B) \neq \beta(B)}} \operatorname{diam}(B^*) = \max_{\substack{B \in V(X) \setminus \{X\} \\ \alpha(B) \neq \beta(B)}} \operatorname{diam}(B^*).$$
Let $B_0$ be a maximizer and let $\delta := \operatorname{diam}(B_0^*)$. It is easy to see that $\alpha([x]_\delta) = \beta([x]_\delta)$ for any $x \in X$. By Strassen's theorem (see for example (Dudley, 2018, Theorem 11.6.2)),
$$d_{W,\infty}(\alpha, \beta) = \inf\{r \ge 0 \mid \alpha(A) \le \beta(A^r) \text{ for any closed subset } A \subseteq X\},$$
where $A^r := \{x \in X \mid u_X(x, A) \le r\}$.

First, we reconsider the closed subset $B_0$. Since $\alpha(B_0) \neq \beta(B_0)$, we assume without loss of generality that $\alpha(B_0) > \beta(B_0)$. By definition of $B_0^*$, it is obvious that $(B_0)^\delta = B_0^*$ (recall that $\delta = \operatorname{diam}(B_0^*)$) and that $(B_0)^r = B_0$ for all $0 \le r < \delta$. Therefore, $\alpha(B_0) \le \beta((B_0)^r)$ only when $r \ge \delta$. This implies that $d_{W,\infty}(\alpha, \beta) \ge \delta$.

Conversely, for any closed set $A$ and $r > 0$, we have that $A^r = \bigcup_{x \in A} [x]_r$. For two closed balls in an ultrametric space, either one includes the other or they are disjoint. Therefore, there exists a subset $S \subseteq A$ such that $A^\delta = \bigsqcup_{x \in S} [x]_\delta$. Then,
$$\alpha(A) \le \alpha(A^\delta) = \sum_{x \in S} \alpha([x]_\delta) = \sum_{x \in S} \beta([x]_\delta) = \beta(A^\delta).$$
Hence, $d_{W,\infty}(\alpha, \beta) \le \delta$, and thus
$$d_{W,\infty}(\alpha, \beta) = \max_{\substack{B \in V(X) \setminus \{X\} \\ \alpha(B) \neq \beta(B)}} \operatorname{diam}(B^*). \qquad \square$$

Proof of Lemma 2.16. If $X$ is finite, then obviously $X$ is compact. Assume that $X$ is a countable set with 0 being the unique cluster point. If $\{x_n\} \subseteq X$ is a Cauchy sequence with respect to $\Delta$, then either $x_n$ is constant for large $n$ or $\lim_{n \to \infty} x_n = 0$. In either case, the limit of $\{x_n\}$ belongs to $X$, and thus $X$ is complete. Now, for any $\varepsilon > 0$, by Lemma 2.7, $X_\varepsilon$ is a finite set. Denote $X_\varepsilon = \{[x_1]_\varepsilon, \ldots, [x_n]_\varepsilon\}$. Then, $\{x_1, \ldots, x_n\}$ is a finite $\varepsilon$-net of $X$. Therefore, $X$ is totally bounded, and thus $X$ is compact.

Now, assume that $X$ is compact. Then, for any $\varepsilon > 0$, $X_\varepsilon$ is a finite set. Suppose $X_\varepsilon = \{[x_1]_\varepsilon, \ldots, [x_n]_\varepsilon\}$ where $0 \le x_1 < x_2 < \cdots < x_n$. Further, we have that $\Delta(x_i, x_j) = x_j$ whenever $1 \le i < j \le n$. This implies that
(1) $x_i > \varepsilon$ for all $2 \le i \le n$;
(2) $[x_i]_\varepsilon = \{x_i\}$ for all $2 \le i \le n$.
Therefore, $X \cap (\varepsilon, \infty) \subseteq \{x_1, \ldots, x_n\}$ is a finite set. Since $\varepsilon > 0$ is arbitrary, $X$ is an at most countable set and has no cluster point other than 0. If $X$ is countable, then 0 must be a cluster point, and by compactness of $X$ we have that $0 \in X$. $\square$

Appendix B. Missing details from Section 3
B.1.
The Wasserstein pseudometric.
Given a set $X$, a pseudometric is a symmetric function $d_X: X \times X \to \mathbb{R}_{\ge 0}$ satisfying the triangle inequality and $d_X(x, x) = 0$ for all $x \in X$. Note that if, moreover, $d_X(x, y) = 0$ implies $x = y$, then $d_X$ is a metric. There is a canonical identification on pseudometric spaces $(X, d_X)$: define $x \sim x'$ if $d_X(x, x') = 0$. Then, $\sim$ is an equivalence relation, and we define the quotient space $\tilde{X} = X / \sim$. Define a function $\tilde{d}_X: \tilde{X} \times \tilde{X} \to \mathbb{R}_{\ge 0}$ as follows:
$$\tilde{d}_X([x], [x']) := \begin{cases} d_X(x, x') & \text{if } d_X(x, x') \neq 0, \\ 0 & \text{otherwise}. \end{cases}$$
Then, $(\tilde{X}, \tilde{d}_X)$ is a metric space, called the metric space induced by the pseudometric space $(X, d_X)$. Note that $\tilde{d}_X$ preserves the induced topology (see e.g. Howes (2012)), and thus the quotient map $\Psi: X \to \tilde{X}$ is continuous.

The Wasserstein distance is defined for probability measures on metric spaces. Analogously, we define the Wasserstein pseudometric for measures on compact pseudometric spaces, as done in Thorsley and Klavins (2008). Let $\alpha, \beta \in \mathcal{P}(X)$. Then, for $p \in [1, \infty)$, we define the Wasserstein pseudometric of order $p$ as
$$d_{W,p}^{(X, d_X)}(\alpha, \beta) := \left(\inf_{\mu \in \mathcal{C}(\alpha, \beta)} \int_{X \times X} d_X^p(x, y) \, \mu(dx \times dy)\right)^{1/p} \tag{33}$$
and for $p = \infty$ as
$$d_{W,\infty}^{(X, d_X)}(\alpha, \beta) := \inf_{\mu \in \mathcal{C}(\alpha, \beta)} \sup_{(x, y) \in \operatorname{supp}(\mu)} d_X(x, y). \tag{34}$$
It is easy to see that the Wasserstein pseudometric is closely related to the Wasserstein distance on the induced metric space. More precisely, one can show the following.

Lemma B.1.
Let $(X, d_X)$ denote a compact pseudometric space and let $\alpha, \beta \in \mathcal{P}(X)$. Then, for $p \in [1, \infty]$, it follows that
$$d_{W,p}^{(X, d_X)}(\alpha, \beta) = d_{W,p}^{(\tilde{X}, \tilde{d}_X)}(\Psi_\# \alpha, \Psi_\# \beta), \tag{35}$$
and in particular that the infimum in (33) (resp. in (34) if $p = \infty$) is attained for some $\mu \in \mathcal{C}(\alpha, \beta)$.

Proof. In the course of this proof we focus on the case $p < \infty$; the case $p = \infty$ follows by similar arguments. The quotient map allows us to define the map $\theta: \mathcal{C}(\alpha, \beta) \to \mathcal{C}(\Psi_\# \alpha, \Psi_\# \beta)$ via $\mu \mapsto (\Psi \times \Psi)_\# \mu$. It is easy to see that $\theta$ is well defined and surjective. Furthermore, by construction,
$$\int_{X \times X} d_X^p(x, y) \, \mu(dx \times dy) = \int_{\tilde{X} \times \tilde{X}} \tilde{d}_X^p(x, y) \, \theta(\mu)(dx \times dy)$$
for all $\mu \in \mathcal{C}(\alpha, \beta)$. Hence, (35) follows.

We come to the second part of the claim. By (Villani, 2008, Sec. 4) there exists an optimal coupling $\tilde{\mu}^* \in \mathcal{C}(\Psi_\# \alpha, \Psi_\# \beta)$ such that
$$d_{W,p}^{(\tilde{X}, \tilde{d}_X)}(\Psi_\# \alpha, \Psi_\# \beta) = \left(\int_{\tilde{X} \times \tilde{X}} \tilde{d}_X^p(x, y) \, \tilde{\mu}^*(dx \times dy)\right)^{1/p}.$$
Consequently, using our previous results, we find that for any $\mu^* \in \theta^{-1}(\tilde{\mu}^*)$
$$d_{W,p}^{(\tilde{X}, \tilde{d}_X)}(\Psi_\# \alpha, \Psi_\# \beta) = \left(\int_{\tilde{X} \times \tilde{X}} \tilde{d}_X^p(x, y) \, \tilde{\mu}^*(dx \times dy)\right)^{1/p} = \left(\int_{X \times X} d_X^p(x, y) \, \mu^*(dx \times dy)\right)^{1/p} = d_{W,p}^{(X, d_X)}(\alpha, \beta).$$
This yields the claim. $\square$
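On a finite pseudometric space, the identification $x \sim x'$ iff $d_X(x, x') = 0$ is easy to compute, and so are the push-forwards $\Psi_\# \alpha$ and $\Psi_\# \beta$ appearing in (35). The Python sketch below (hypothetical function name `metric_quotient`; distances as a NumPy matrix, measures as weight vectors) returns the induced metric $\tilde{d}_X$, the pushed-forward weights and the class labels. It assumes that the input really is a pseudometric, so that the zero-distance relation is transitive.

```python
import numpy as np


def metric_quotient(d, measures):
    """Identify points at pseudo-distance 0 (x ~ x' iff d[x, x'] == 0).

    d: symmetric (n, n) pseudometric matrix; measures: list of length-n weight vectors.
    Returns (d_tilde, pushforwards, labels), where labels[i] is the class of point i,
    d_tilde is the induced metric between classes and pushforwards are Psi_# of the measures.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    labels = np.full(n, -1, dtype=int)
    reps = []                                   # one representative point per class
    for i in range(n):
        if labels[i] < 0:
            labels[d[i] == 0] = len(reps)       # the class of i; transitivity of ~
            reps.append(i)                      # follows from the triangle inequality
    d_tilde = d[np.ix_(reps, reps)]             # independent of the chosen representatives
    pushforwards = [np.bincount(labels, weights=np.asarray(m, dtype=float),
                                minlength=len(reps)) for m in measures]
    return d_tilde, pushforwards, labels


# Points 0 and 1 are at pseudo-distance 0; point 2 is at distance 2 from both.
d = np.array([[0.0, 0.0, 2.0],
              [0.0, 0.0, 2.0],
              [2.0, 2.0, 0.0]])
alpha = np.array([0.25, 0.25, 0.5])
beta = np.array([0.5, 0.0, 0.5])
d_tilde, (a_t, b_t), labels = metric_quotient(d, [alpha, beta])
print(labels, a_t, b_t)  # [0 0 1] [0.5 0.5] [0.5 0.5]
```

In this example $\alpha \neq \beta$, but $\Psi_\# \alpha = \Psi_\# \beta$. The right-hand side of (35) therefore vanishes, so by Lemma B.1 the Wasserstein pseudometric between $\alpha$ and $\beta$ is 0.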
B.2. Relegated results and proofs from Section 3.
Proof of Lemma 3.11. If $\{u_n\}_{n\in\mathbb N} \subseteq \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ is a decreasing sequence (with respect to pointwise inequality), it is easy to verify that $u := \inf_{n\in\mathbb N} u_n \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$, and thus $u$ is a lower bound of $\{u_n\}_{n\in\mathbb N}$. Then, by Zorn's lemma, $\mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y) \neq \emptyset$. Therefore, we obtain that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal X,\mathcal Y) = \inf_{u\in\mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)} d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y).$$ □
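Assuming, as the displayed identity suggests, that $u^{\mathrm{sturm}}_{\mathrm{GW},p}$ is an infimum of $d^{(X\sqcup Y,u)}_{\mathrm{W},p}(\mu_X,\mu_Y)$ over ultrametrics $u \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$, every explicit $u$ yields an upper bound. The sketch below evaluates this objective for one simple family chosen here for illustration: the ultrametric obtained by gluing $X$ and $Y$ at a pair $(x_0,y_0)$, i.e. $u(x,y) := \max(u_X(x,x_0), u_Y(y,y_0))$ on $X\times Y$ (one checks from the strong triangle inequality that this is an ultrametric on $X\sqcup Y$ vanishing at $(x_0,y_0)$). It further assumes uniform $\mu_X$, $\mu_Y$ on equally many points, so brute force over permutations is exact. The function names are hypothetical.

```python
from itertools import permutations


def glued_cross_distances(uX, uY, x0, y0):
    """Cross block u(x, y) = max(uX(x, x0), uY(y, y0)) of the ultrametric on X ⊔ Y
    obtained by gluing x0 to y0 (on X x X and Y x Y it is uX and uY)."""
    return [[max(uX[x][x0], uY[y][y0]) for y in range(len(uY))]
            for x in range(len(uX))]


def sturm_objective_uniform(cross, p):
    """d_{W,p}^{(X ⊔ Y, u)}(mu_X, mu_Y) for uniform mu_X, mu_Y with |X| = |Y|.

    Every coupling of mu_X and mu_Y lives on X x Y, so only the cross block of u
    enters; brute force over permutations is exact for uniform weights.
    """
    n = len(cross)
    best = min(sum(cross[i][j] ** p for i, j in enumerate(sigma))
               for sigma in permutations(range(n)))
    return (best / n) ** (1.0 / p)


# Two three-point ultrametric spaces, given by distance matrices.
uX = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
uY = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]

# Each gluing point gives an upper bound; keep the smallest one for p = 1.
bound = min(
    sturm_objective_uniform(glued_cross_distances(uX, uY, x0, y0), p=1)
    for x0 in range(len(uX))
    for y0 in range(len(uY))
)
print(bound)
```

Gluing at a single pair is only one slice of $\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$, so the printed value is an upper bound rather than the distance itself.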
Proof of Lemma 3.12.
Assume otherwise that $u^{-1}(0) = \emptyset$. Then, $u$ is a metric (instead of a pseudometric). Let $(x_0,y_0) \in X\times Y$ be such that $u(x_0,y_0) = \min_{x\in X,\,y\in Y} u(x,y)$; the existence of $(x_0,y_0)$ is guaranteed by the compactness of $X$ and $Y$. We define $u_{(x_0,y_0)}\colon (X\sqcup Y)\times(X\sqcup Y) \to \mathbb{R}_{\geq 0}$ as follows:
(1) $u_{(x_0,y_0)}|_{X\times X} := u_X$ and $u_{(x_0,y_0)}|_{Y\times Y} := u_Y$;
(2) for $(x,y) \in X\times Y$, $u_{(x_0,y_0)}(x,y) := \min\big(u(x,y), \max(u_X(x,x_0), u_Y(y,y_0))\big)$;
(3) for any $(y,x) \in Y\times X$, $u_{(x_0,y_0)}(y,x) := u_{(x_0,y_0)}(x,y)$.
It is easy to verify that $u_{(x_0,y_0)} \in \mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$. Clearly, $u_{(x_0,y_0)}(x_0,y_0) = 0 < u(x_0,y_0)$ and $u_{(x_0,y_0)}(x,y) \leq u(x,y)$ for all $x, y \in X\sqcup Y$, which contradicts $u \in \mathcal{D}^{\mathrm{ult}}_{\mathrm{adm}}(u_X,u_Y)$. Therefore, $u^{-1}(0) \neq \emptyset$. □

Lemma B.2.
Let $\mathcal X = (X,u_X,\mu_X)$ and $\mathcal Y = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Then, the set of couplings $\mathcal{C}(\mu_X,\mu_Y) \subseteq \mathcal{P}(X\times Y)$, where $X\times Y$ is equipped with the metric $\max(u_X,u_Y)$, is compact with respect to weak convergence.

Proof. The proof follows directly from Chowdhury and Mémoli (2019, Lemma 10). □
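The modification $u_{(x_0,y_0)}$ from the proof of Lemma 3.12 can be checked by brute force on small examples. The sketch below assumes finite spaces given as distance matrices and stores an element of $\mathcal{D}^{\mathrm{ult}}(u_X,u_Y)$ as a full matrix on $X\sqcup Y$ (points of $X$ first). As input it uses, purely for illustration, the ultrametric with constant cross distance $\max(\operatorname{diam}(X),\operatorname{diam}(Y))$, which has no zero on $X\times Y$. It then verifies that $u_{(x_0,y_0)}$ is again an ultrametric, lies pointwise below $u$, and vanishes at $(x_0,y_0)$. All helper names are hypothetical.

```python
from itertools import product


def is_ultrametric(D, tol=1e-12):
    """Symmetry plus the strong triangle inequality D[i][k] <= max(D[i][j], D[j][k])."""
    n = len(D)
    symmetric = all(D[i][j] == D[j][i] for i, j in product(range(n), repeat=2))
    return symmetric and all(D[i][k] <= max(D[i][j], D[j][k]) + tol
                             for i, j, k in product(range(n), repeat=3))


def glue_matrix(uX, uY, cross):
    """Full matrix on X ⊔ Y (X first) from uX, uY and the cross block cross[x][y]."""
    m, n = len(uX), len(uY)
    top = [list(uX[i]) + list(cross[i]) for i in range(m)]
    bottom = [[cross[i][j] for i in range(m)] + list(uY[j]) for j in range(n)]
    return top + bottom


def lemma_3_12_modification(uX, uY, cross):
    """Cross block min(u(x, y), max(uX(x, x0), uY(y, y0))) with (x0, y0) a minimiser of u on X x Y."""
    m, n = len(uX), len(uY)
    x0, y0 = min(product(range(m), range(n)), key=lambda xy: cross[xy[0]][xy[1]])
    new = [[min(cross[x][y], max(uX[x][x0], uY[y][y0])) for y in range(n)]
           for x in range(m)]
    return new, (x0, y0)


uX = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
uY = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
c = max(max(map(max, uX)), max(map(max, uY)))  # max(diam X, diam Y) = 3
cross = [[c] * 3 for _ in range(3)]             # constant cross distance, an element of D^ult

new_cross, (x0, y0) = lemma_3_12_modification(uX, uY, cross)
u, u_new = glue_matrix(uX, uY, cross), glue_matrix(uX, uY, new_cross)
print(is_ultrametric(u), is_ultrametric(u_new), new_cross[x0][y0])  # True True 0
```

Since the constant-cross ultrametric is strictly improved at $(x_0,y_0)$, it cannot be minimal, which is exactly the mechanism behind the contradiction in the proof.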
Lemma B.3.
Let $\mathcal X, \mathcal Y \in \mathcal U^w$. Let $\mathcal D \subseteq \mathcal D^{\mathrm{ult}}(u_X,u_Y)$ be a non-empty subset satisfying the following: there exist $(x_0,y_0)\in X\times Y$ and $C > 0$ such that $u(x_0,y_0) \leq C$ for all $u \in \mathcal D$. Then, $\mathcal D$ is pre-compact with respect to uniform convergence.

Proof. Let $\{u_n\}_{n\in\mathbb N} \subseteq \mathcal D$ be a sequence. Note that $X\times Y \subseteq (X\sqcup Y)\times(X\sqcup Y)$. Let $v_n := u_n|_{X\times Y}$. For any $n\in\mathbb N$ and any $(x,y),(x',y')\in X\times Y$, we have that
$$|u_n(x,y) - u_n(x',y')| \leq u_X(x,x') + u_Y(y,y') \leq 2\max(u_X,u_Y)\big((x,y),(x',y')\big).$$
Then, $\{v_n\}_{n\in\mathbb N}$ is equicontinuous. Now, since $u_n(x_0,y_0) \leq C$, we have for any $(x,y)\in X\times Y$ that
$$u_n(x,y) \leq 2\max(u_X,u_Y)\big((x,y),(x_0,y_0)\big) + u_n(x_0,y_0) \leq 2\max(\operatorname{diam}(X),\operatorname{diam}(Y)) + C.$$
Then, $\{v_n\}_{n\in\mathbb N}$ is uniformly bounded. By the Arzelà-Ascoli theorem, $\{v_n\}_{n\in\mathbb N}$ has a uniformly convergent subsequence. Without loss of generality, we assume that $v\colon X\times Y\to\mathbb R_{\geq 0}$ is the limit of the sequence $\{v_n\}_{n\in\mathbb N}$. Now, we define $u\colon (X\sqcup Y)\times(X\sqcup Y)\to\mathbb R_{\geq 0}$ as follows:
(1) $u|_{X\times X} := u_X$ and $u|_{Y\times Y} := u_Y$;
(2) $u|_{X\times Y} := v$;
(3) for $(y,x)\in Y\times X$, we let $u(y,x) := u(x,y)$.
Then, it is easy to verify that $u \in \mathcal D^{\mathrm{ult}}(u_X,u_Y)$ and that $u$ is a cluster point of the sequence $\{u_n\}_{n\in\mathbb N}$. Therefore, $\mathcal D$ is pre-compact. □

Lemma B.4.
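Let $\mathcal X = (X,u_X,\mu_X)$ and $\mathcal Y = (Y,u_Y,\mu_Y)$ be compact ultrametric measure spaces. Let $\{\mu_n\}_{n\in\mathbb N}\subseteq\mathcal C(\mu_X,\mu_Y)$ be a sequence converging weakly to some $\mu$, and let $\{u_n\}_{n\in\mathbb N}\subseteq\mathcal D^{\mathrm{ult}}(u_X,u_Y)$. Suppose there exist a non-decreasing sequence $\{p_n\}_{n\in\mathbb N}\subseteq[1,\infty)$ and $C > 0$ such that
$$\left(\int_{X\times Y}(u_n(x,y))^{p_n}\,\mu_n(dx\times dy)\right)^{1/p_n} \leq C \quad\text{for all } n\in\mathbb N.$$
Then, up to taking a subsequence, $\{u_n\}_{n\in\mathbb N}$ converges uniformly to some $u\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$.

Proof. The following argument is adapted from Lemma 3.3 in Sturm (2006). Fix $(x_0,y_0)\in\operatorname{supp}(\mu)$ and $\varepsilon > 0$. Then $\mu(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)) > 0$, so we can choose $0 < \delta < \mu(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0))$; since the product of balls is open, weak convergence gives $\mu_n(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)) \geq \mu(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)) - \delta$ for all sufficiently large $n$. For all such $n$ with $u_n(x_0,y_0) \geq 2\varepsilon$, we obtain
$$\begin{aligned}
C &\geq \left(\int_{X\times Y}(u_n(x,y))^{p_n}\,\mu_n(dx\times dy)\right)^{1/p_n} \geq \int_{X\times Y}u_n(x,y)\,\mu_n(dx\times dy)\\
&\geq \int_{B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)}u_n(x,y)\,\mu_n(dx\times dy) \geq \int_{B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)}\big(u_n(x_0,y_0)-2\varepsilon\big)\,\mu_n(dx\times dy)\\
&\geq \big(u_n(x_0,y_0)-2\varepsilon\big)\Big(\mu\big(B^X_\varepsilon(x_0)\times B^Y_\varepsilon(y_0)\big)-\delta\Big).
\end{aligned}$$
Therefore, $\{u_n(x_0,y_0)\}_{n\in\mathbb N}$ is bounded. By Lemma B.3, $\{u_n\}_{n\in\mathbb N}$ has a uniformly convergent subsequence. □

Lemma B.5.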
Let $X, Y$ be ultrametric spaces. Then, $\Delta(u_X,u_Y)\colon X\times Y\times X\times Y\to\mathbb R_{\geq 0}$, $(x,y,x',y')\mapsto\Delta(u_X(x,x'),u_Y(y,y'))$, is continuous with respect to the product topology, or equivalently with respect to the metric $\max(u_X,u_Y,u_X,u_Y)$.

Proof. Fix $(x,y,x',y')\in X\times Y\times X\times Y$ and $\varepsilon > 0$. Choose $0 < \delta < \varepsilon$ such that $\delta < u_X(x,x')$ if $x\neq x'$ and $\delta < u_Y(y,y')$ if $y\neq y'$. Then, consider any point $(x_1,y_1,x_1',y_1')\in X\times Y\times X\times Y$ such that $u_X(x,x_1), u_Y(y,y_1), u_X(x',x_1'), u_Y(y',y_1') \leq \delta$. For $u_X(x_1,x_1')$, we have the following two situations:
(1) $x = x'$: $u_X(x_1,x_1') \leq \max(u_X(x_1,x),u_X(x,x_1')) \leq \delta < \varepsilon$;
(2) $x \neq x'$: $u_X(x_1,x_1') \leq \max(u_X(x_1,x),u_X(x,x'),u_X(x',x_1')) = u_X(x,x')$. Similarly, $u_X(x,x') \leq u_X(x_1,x_1')$, and thus $u_X(x,x') = u_X(x_1,x_1')$.
A similar result holds for $u_Y(y_1,y_1')$. This leads to four cases for $\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1'))$:
(1) $x = x'$, $y = y'$: In this case we have $u_X(x_1,x_1'), u_Y(y_1,y_1') < \varepsilon$. Then,
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) - \Delta(u_X(x,x'),u_Y(y,y'))| = \Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) \leq \varepsilon;$$
(2) $x = x'$, $y\neq y'$: Now $u_X(x_1,x_1') < \varepsilon$ and $u_Y(y_1,y_1') = u_Y(y,y')$. If $u_Y(y,y') \geq \varepsilon > u_X(x_1,x_1')$, then
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) - \Delta(u_X(x,x'),u_Y(y,y'))| = |u_Y(y,y') - u_Y(y,y')| = 0.$$
If $u_Y(y,y') < \varepsilon$, then $\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) \leq \varepsilon$ and $\Delta(u_X(x,x'),u_Y(y,y')) = u_Y(y,y') \leq \varepsilon$. Therefore,
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) - \Delta(u_X(x,x'),u_Y(y,y'))| \leq \varepsilon;$$
(3) $x\neq x'$, $y = y'$: Similarly to (2), we have $|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) - \Delta(u_X(x,x'),u_Y(y,y'))| \leq \varepsilon$;
(4) $x\neq x'$, $y\neq y'$: Now $u_X(x_1,x_1') = u_X(x,x')$ and $u_Y(y_1,y_1') = u_Y(y,y')$. Therefore, $|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) - \Delta(u_X(x,x'),u_Y(y,y'))| = 0$.
In conclusion, whenever $u_X(x,x_1), u_Y(y,y_1), u_X(x',x_1'), u_Y(y',y_1') \leq \delta$, we have that
$$|\Delta(u_X(x_1,x_1'),u_Y(y_1,y_1')) - \Delta(u_X(x,x'),u_Y(y,y'))| \leq \varepsilon.$$
Therefore, $\Delta(u_X,u_Y)$ is continuous. □

Appendix C. Missing proofs from Section 4
C.1. Missing details of the proof of Theorem 4.4.
Proof of Claim 1 (Theorem 4.4).
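First note by Theorem 5.1 that $u_{\mathrm{GW},p}(\hat\Delta(a),\hat\Delta(b)) \geq \mathrm{SLB}^{\mathrm{ult}}_p(\hat\Delta(a),\hat\Delta(b))$. It is easy to verify that $\mathrm{SLB}^{\mathrm{ult}}_p(\hat\Delta(a),\hat\Delta(b)) = 2^{-1/p}\Delta(a,b)$. On the other hand, consider the diagonal coupling between $\mu_a$ and $\mu_b$. Then, for $p\in[1,\infty)$,
$$u_{\mathrm{GW},p}(\hat\Delta(a),\hat\Delta(b)) \leq \left(2\cdot\Delta(a,b)^p\cdot\tfrac12\cdot\tfrac12\right)^{1/p} = 2^{-1/p}\Delta(a,b),$$
and for $p = \infty$,
$$u_{\mathrm{GW},\infty}(\hat\Delta(a),\hat\Delta(b)) \leq \Delta(a,b).$$
Therefore, $u_{\mathrm{GW},p}(\hat\Delta(a),\hat\Delta(b)) = 2^{-1/p}\Delta(a,b)$, with the convention $2^{-1/\infty} = 1$. □

C.2.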
Proof of Theorem 4.2.
In this section, we prove Theorem 4.2 by slightly modifying the proof of Proposition 5.3 in Mémoli (2011).
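The distortion $\Gamma_{X,Y}(x,y,x',y') = \Delta(u_X(x,x'),u_Y(y,y'))$ that drives this section (and Claim 1 of Theorem 4.4 above) is easy to evaluate for finitely supported couplings. The sketch below uses the convention $\Delta(a,b) = 0$ if $a = b$ and $\max(a,b)$ otherwise, which is consistent with how $\Delta$ is used in the proofs above. It also assumes, for illustration only, that $\hat\Delta(a)$ is the two-point space with interpoint distance $a$ and uniform measure; the precise definition is in Section 4 and is not reproduced here. It reproduces the upper bound $2^{-1/p}\Delta(a,b)$ obtained from the diagonal coupling. Function names are hypothetical.

```python
def Delta(a, b):
    """Delta(a, b) = 0 if a == b and max(a, b) otherwise."""
    return 0.0 if a == b else max(a, b)


def distortion_p(uX, uY, coupling, p):
    """||Gamma_{X,Y}||_{L^p(mu x mu)} for a finitely supported coupling {(x, y): mass},
    where Gamma(x, y, x', y') = Delta(uX(x, x'), uY(y, y'))."""
    total = 0.0
    for (x, y), m in coupling.items():
        for (x2, y2), m2 in coupling.items():
            total += Delta(uX[x][x2], uY[y][y2]) ** p * m * m2
    return total ** (1.0 / p)


def two_point(a):
    """Distance matrix of a two-point space with interpoint distance a (assumed model of the hat-Delta(a) space)."""
    return [[0.0, a], [a, 0.0]]


a, b, p = 1.0, 3.0, 2
diagonal = {(0, 0): 0.5, (1, 1): 0.5}  # diagonal coupling of the two uniform measures
value = distortion_p(two_point(a), two_point(b), diagonal, p)
expected = 2 ** (-1.0 / p) * Delta(a, b)
print(value, expected)
```

Only the two off-diagonal pairs of support points contribute, each with weight $\tfrac12\cdot\tfrac12$, which is where the factor $2\cdot\tfrac14 = \tfrac12$ inside the $p$-th root comes from.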
Lemma C.1.
Let $X$ and $Y$ be compact ultrametric spaces and let $S\subseteq X\times Y$ be non-empty. Assume that
$$\sup_{(x,y),(x',y')\in S}\Delta(u_X(x,x'),u_Y(y,y')) \leq \eta.$$
Define $u_S\colon (X\sqcup Y)\times(X\sqcup Y)\to\mathbb R_{\geq 0}$ as follows:
(1) $u_S|_{X\times X} := u_X$ and $u_S|_{Y\times Y} := u_Y$;
(2) for any $(x,y)\in X\times Y$, $u_S(x,y) := \inf_{(x',y')\in S}\max(u_X(x,x'),u_Y(y,y'),\eta)$;
(3) for any $(x,y)\in X\times Y$, $u_S(y,x) := u_S(x,y)$.
Then, $u_S\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ and $u_S(x,y)\leq\eta$ for all $(x,y)\in S$.

Proof. That $u_S\in\mathcal D^{\mathrm{ult}}(u_X,u_Y)$ essentially follows from Zarichnyi (2005, Lemma 1.1). For any $(x,y)\in S$, choosing $(x',y') := (x,y)$ in the infimum gives
$$u_S(x,y) \leq \max(u_X(x,x),u_Y(y,y),\eta) = \max(0,0,\eta) = \eta.$$ □

Proof of Theorem 4.2.
Let $\mu \in \mathcal{C}(\mu_X,\mu_Y)$ be a coupling such that $\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)} < \delta^4$. Set $\varepsilon := v_\delta(X)$. Then there exist $1 \leq N \leq \lfloor 1/\delta\rfloor$ and points $x_1,\dots,x_N$ in $X$ such that $\min_{i\neq j} u_X(x_i,x_j) \geq \varepsilon$, $\min_i \mu_X(B^X_\varepsilon(x_i)) > \delta$ and $\mu_X\big(\bigcup_{i=1}^N B^X_\varepsilon(x_i)\big) \geq 1-\varepsilon$.

Claim 1:
For each $i = 1,\dots,N$ there exists $y_i \in Y$ such that
$$\mu\big(B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y_i)\big) \geq (1-\delta)\,\mu_X\big(B^X_\varepsilon(x_i)\big).$$
Proof.
Assume the claim is false for some $i$, and for $y\in Y$ let $Q_i(y) := B^X_\varepsilon(x_i)\times\big(Y\setminus B^Y_{2(\varepsilon+\delta)}(y)\big)$. Then, as $\mu \in \mathcal{C}(\mu_X,\mu_Y)$, it holds that
$$\mu_X\big(B^X_\varepsilon(x_i)\big) = \mu\big(B^X_\varepsilon(x_i)\times Y\big) = \mu\big(B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y)\big) + \mu\big(Q_i(y)\big).$$
Since the claim fails for $i$, we have $\mu(Q_i(y)) \geq \delta\,\mu_X(B^X_\varepsilon(x_i))$ for every $y\in Y$. Further, let
$$Q_i := \big\{(x,y,x',y') \in X\times Y\times X\times Y \mid x,x' \in B^X_\varepsilon(x_i) \text{ and } u_Y(y,y') \geq 2(\varepsilon+\delta)\big\}.$$
Clearly, for $(x,y,x',y') \in Q_i$ we have $u_X(x,x') \leq \varepsilon < u_Y(y,y')$ and thus
$$\Gamma_{X,Y}(x,y,x',y') = \Delta(u_X(x,x'),u_Y(y,y')) = u_Y(y,y') \geq \delta.$$
Further, we have that $\mu\otimes\mu(Q_i) \geq \delta^3$. Indeed,
$$\begin{aligned}
\mu\otimes\mu(Q_i) &= \iint_{B^X_\varepsilon(x_i)\times Y}\iint_{Q_i(y)} \mu(dx'\times dy')\,\mu(dx\times dy) = \iint_{B^X_\varepsilon(x_i)\times Y}\mu\big(Q_i(y)\big)\,\mu(dx\times dy)\\
&\geq \delta\,\mu_X\big(B^X_\varepsilon(x_i)\big)\,\mu\big(B^X_\varepsilon(x_i)\times Y\big) = \delta\,\big(\mu_X(B^X_\varepsilon(x_i))\big)^2 \geq \delta^3.
\end{aligned}$$
However, this yields that
$$\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)} \geq \|\Gamma_{X,Y}\|_{L^1(\mu\otimes\mu)} \geq \|\Gamma_{X,Y}\mathbb{1}_{Q_i}\|_{L^1(\mu\otimes\mu)} \geq \delta\cdot\mu\otimes\mu(Q_i) \geq \delta^4,$$
which contradicts $\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)} < \delta^4$. □

Define for each $i = 1,\dots,N$
$$S_i := B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y_i).$$
Then, by Claim 1, $\mu(S_i) \geq \delta(1-\delta)$ for all $i = 1,\dots,N$.

Claim 2: $\Gamma_{X,Y}(x_i,y_i,x_j,y_j) \leq 2(\varepsilon+\delta)$ for all $i,j = 1,\dots,N$.
Proof.
Assume the claim fails for some $(i,j)$, i.e.,
$$\Delta(u_X(x_i,x_j),u_Y(y_i,y_j)) > 2(\varepsilon+\delta) > 0.$$
Then, $\Delta(u_X(x_i,x_j),u_Y(y_i,y_j)) = \max(u_X(x_i,x_j),u_Y(y_i,y_j))$. We assume without loss of generality that $u_X(x_i,x_j) = \Delta(u_X(x_i,x_j),u_Y(y_i,y_j)) > u_Y(y_i,y_j)$. Consider any $(x,y) \in S_i$ and $(x',y') \in S_j$. By the strong triangle inequality and the fact that $u_X(x_i,x_j) > 2(\varepsilon+\delta) > \varepsilon$, it is easy to verify that $u_X(x,x') = u_X(x_i,x_j)$. Moreover,
$$u_Y(y,y') \leq \max\big(u_Y(y,y_i),u_Y(y_i,y_j),u_Y(y_j,y')\big) < \max\big(2(\varepsilon+\delta),u_X(x_i,x_j),2(\varepsilon+\delta)\big) = u_X(x_i,x_j) = u_X(x,x').$$
Therefore,
$$\Gamma_{X,Y}(x,y,x',y') = u_X(x,x') = u_X(x_i,x_j) = \Gamma_{X,Y}(x_i,y_i,x_j,y_j) > 2(\varepsilon+\delta) > 2\delta.$$
Consequently, we have that
$$\|\Gamma_{X,Y}\|_{L^p(\mu\otimes\mu)} \geq \|\Gamma_{X,Y}\|_{L^1(\mu\otimes\mu)} \geq \|\Gamma_{X,Y}\mathbb{1}_{S_i\times S_j}\|_{L^1(\mu\otimes\mu)} \geq 2\delta\,\mu(S_i)\,\mu(S_j) > 2\delta\big(\delta(1-\delta)\big)^2.$$
However, for $\delta \leq 1/2$ we have $2\delta\big(\delta(1-\delta)\big)^2 \geq \delta^4$. This leads to a contradiction. □

Consider $S \subseteq X\times Y$ given by $S := \{(x_i,y_i) \mid i = 1,\dots,N\}$, and let $u_S$ be the ultrametric on $X\sqcup Y$ given by Lemma C.1 with $\eta := 2(\varepsilon+\delta)$. By Claim 2,
$$\sup_{(x,y),(x',y')\in S}\Gamma_{X,Y}(x,y,x',y') \leq 2(\varepsilon+\delta).$$
Then, for all $i = 1,\dots,N$ we have that $u_S(x_i,y_i) \leq 2(\varepsilon+\delta)$, and for any $(x,y)\in X\times Y$ we have that
$$u_S(x,y) \leq \max\big(\operatorname{diam}(X),\operatorname{diam}(Y),2(\varepsilon+\delta)\big) \leq \max\big(\operatorname{diam}(X),\operatorname{diam}(Y),3\big) =: M.$$
Here, in the second inequality we use the assumption that $\delta < 1/2$ and the fact that $\varepsilon = v_\delta(X) \leq 1$.

Claim 3:
Fix $i \in \{1,\dots,N\}$. Then, for all $(x,y)\in S_i$, it holds that $u_S(x,y) \leq 2(\varepsilon+\delta)$.
Proof.
Let $(x,y)\in S_i$. Then, $u_X(x,x_i) \leq \varepsilon$ and $u_Y(y,y_i) \leq 2(\varepsilon+\delta)$. By the strong triangle inequality for $u_S$, we obtain
$$u_S(x,y) \leq \max\{u_X(x,x_i),u_Y(y,y_i),u_S(x_i,y_i)\} \leq \max\{\varepsilon,2(\varepsilon+\delta),2(\varepsilon+\delta)\} \leq 2(\varepsilon+\delta).$$ □

Let $L := \bigcup_{i=1}^N S_i$. The next step is to estimate the mass of $\mu$ on the complement of $L$.

Claim 4: $\mu(X\times Y\setminus L) \leq \varepsilon+\delta$.
Proof.
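For each $i = 1,\dots,N$, let $A_i := B^X_\varepsilon(x_i)\times\big(Y\setminus B^Y_{2(\varepsilon+\delta)}(y_i)\big)$. Then,
$$A_i = \big(B^X_\varepsilon(x_i)\times Y\big)\setminus\big(B^X_\varepsilon(x_i)\times B^Y_{2(\varepsilon+\delta)}(y_i)\big) = \big(B^X_\varepsilon(x_i)\times Y\big)\setminus S_i.$$
Hence,
$$\mu(A_i) = \mu\big(B^X_\varepsilon(x_i)\times Y\big) - \mu(S_i) = \mu_X\big(B^X_\varepsilon(x_i)\big) - \mu(S_i),$$
where the last equality follows from the fact that $\mu \in \mathcal{C}(\mu_X,\mu_Y)$. By Claim 1, $\mu(S_i) \geq (1-\delta)\,\mu_X(B^X_\varepsilon(x_i))$. Consequently, $\mu(A_i) \leq \delta\,\mu_X(B^X_\varepsilon(x_i))$. Notice that
$$X\times Y\setminus L \subseteq \Big(\Big(X\setminus\bigcup_{i=1}^N B^X_\varepsilon(x_i)\Big)\times Y\Big) \cup \Big(\bigcup_{i=1}^N A_i\Big).$$
Hence,
$$\mu(X\times Y\setminus L) \leq \mu_X\Big(X\setminus\bigcup_{i=1}^N B^X_\varepsilon(x_i)\Big) + \sum_{i=1}^N\mu(A_i) \leq 1-\mu_X\Big(\bigcup_{i=1}^N B^X_\varepsilon(x_i)\Big) + \delta\sum_{i=1}^N\mu_X\big(B^X_\varepsilon(x_i)\big) \leq \varepsilon+\delta.$$
Here, the last inequality follows from the construction of the $x_i$'s at the beginning of this proof: $\mu_X(\bigcup_{i} B^X_\varepsilon(x_i)) \geq 1-\varepsilon$, and the balls $B^X_\varepsilon(x_i)$ are pairwise disjoint because $\min_{i\neq j}u_X(x_i,x_j) \geq \varepsilon$, so that $\sum_i \mu_X(B^X_\varepsilon(x_i)) \leq 1$. □

Now,
$$\iint_{X\times Y} u_S^p(x,y)\,\mu(dx\times dy) = \Big(\iint_L + \iint_{X\times Y\setminus L}\Big)\,u_S^p(x,y)\,\mu(dx\times dy) \leq \big(2(\varepsilon+\delta)\big)^p + M^p\cdot(\varepsilon+\delta).$$
Since $a^{1/p}+b^{1/p} \geq (a+b)^{1/p}$ for any $a,b\geq 0$ and $p\geq 1$, we obtain
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal X,\mathcal Y) \leq (\varepsilon+\delta)^{1/p}\Big(2(\varepsilon+\delta)^{1-1/p}+M\Big) \leq (\varepsilon+\delta)^{1/p}(3+M) \leq \big(v_\delta(X)+\delta\big)^{1/p}\cdot\bar M,$$
where we used $\varepsilon = v_\delta(X)$ and $\bar M := 2\max(\operatorname{diam}(X),\operatorname{diam}(Y)) + \dots \geq M + 3$. Since the roles of $\mathcal X$ and $\mathcal Y$ are symmetric, we have that
$$u^{\mathrm{sturm}}_{\mathrm{GW},p}(\mathcal X,\mathcal Y) \leq \big(\max(v_\delta(X),v_\delta(Y))+\delta\big)^{1/p}\cdot\bar M.$$
This concludes the proof. □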
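The ultrametric $u_S$ of Lemma C.1, which the proof above applies to $S = \{(x_i,y_i)\}$, is explicit enough to be checked by brute force. The sketch below assumes finite spaces given as distance matrices (points of $X$ first, then $Y$), uses the convention $\Delta(a,b) = 0$ if $a = b$ and $\max(a,b)$ otherwise, and picks an illustrative $S$ and $\eta$ that are not taken from the text. It confirms the hypothesis of the lemma, that $u_S$ satisfies the strong triangle inequality, and that $u_S \leq \eta$ on $S$. Helper names are hypothetical.

```python
from itertools import product


def Delta(a, b):
    """Delta(a, b) = 0 if a == b, otherwise max(a, b)."""
    return 0 if a == b else max(a, b)


def u_S_matrix(uX, uY, S, eta):
    """Matrix of u_S on X ⊔ Y (X first) following items (1)-(3) of Lemma C.1."""
    m, n = len(uX), len(uY)
    cross = [[min(max(uX[x][a], uY[y][b], eta) for a, b in S) for y in range(n)]
             for x in range(m)]
    top = [list(uX[x]) + cross[x] for x in range(m)]
    bottom = [[cross[x][y] for x in range(m)] + list(uY[y]) for y in range(n)]
    return top + bottom


def is_ultrametric(D, tol=1e-12):
    """Strong triangle inequality D[i][k] <= max(D[i][j], D[j][k]) for all triples."""
    n = len(D)
    return all(D[i][k] <= max(D[i][j], D[j][k]) + tol
               for i, j, k in product(range(n), repeat=3))


uX = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
uY = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
S = [(0, 0), (1, 1)]
eta = 0.5

# Hypothesis of Lemma C.1: the Delta-distortion of S is at most eta.
sup_gamma = max(Delta(uX[a][a2], uY[b][b2]) for (a, b), (a2, b2) in product(S, S))
assert sup_gamma <= eta

D = u_S_matrix(uX, uY, S, eta)
m = len(uX)
print(is_ultrametric(D), [D[a][m + b] for a, b in S])  # True [0.5, 0.5]
```

Replacing `eta` by a value below the distortion of `S` (for instance choosing a pair whose distances disagree) is a quick way to see why the hypothesis of the lemma is needed.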
Appendix D. Missing details from Section 8
D.1.
Phylogenetic tree shape comparison based on $\mathrm{SLB}^{\mathrm{ult}}_1$. In this subsection, we additionally illustrate the results obtained for the comparisons of the phylogenetic tree shapes $T_i$ (defined in Section 8.1), based on $\mathrm{SLB}^{\mathrm{ult}}_1$ and $d_{\mathrm{CP}}$, as dendrograms. Figure 13.
Phylogenetic tree shape comparison:
Visualization of the results for the comparison of the phylogenetic tree shapes $T_i$ based on $\mathrm{SLB}^{\mathrm{ult}}_1$ (top row) and $d_{\mathrm{CP}}$ (bottom row) as single linkage (left) and complete linkage dendrograms (right). D.2.
Phylogenetic tree shape comparison based on SLB. Here, we additionally illustrate the results obtained for the comparisons of the ultra-dissimilarity spaces $\mathcal X_i$ (induced by the phylogenetic tree shapes $T_i$ defined in Section 8.1), based on SLB, as dendrograms. Figure 14.
Phylogenetic tree shape comparison based on SLB: Representation of the results for the comparisons of the ultra-dissimilarity spaces $\mathcal X_i$ based on SLB as single linkage (left) and complete linkage dendrograms (right).

Department of Mathematics and Department of Computer Science and Engineering, The Ohio State University
Institute for Mathematical Stochastics, University of Göttingen
Department of Mathematics, The Ohio State University
Institute for Mathematical Stochastics, University of Göttingen