The mergegram of a dendrogram and its stability

Abstract

This paper extends the key concept of persistence within Topological Data Analysis (TDA) in a new direction. TDA quantifies topological shapes hidden in unorganized data such as clouds of unordered points. In the 0-dimensional case the distance-based persistence is determined by a single-linkage (SL) clustering of a finite set in a metric space. Equivalently, the 0D persistence captures only edge-lengths of a Minimum Spanning Tree (MST). Both SL dendrogram and MST are unstable under perturbations of points. We define the new stable-under-noise mergegram, which outperforms previous isometry invariants on a classification of point clouds by PersLay.


Yury Elkin

Materials Innovation Factory and Computer Science department, University of Liverpool

Vitaliy Kurlin

Materials Innovation Factory and Computer Science department, University of Liverpool


Theory of computation → Computational geometry

Keywords and phrases clustering dendrogram, topological data analysis, persistence, stability


Funding: The authors were supported by the £3.5M EPSRC grant EP/R018472/1 (2018-2023).

TDA is now expanding towards machine learning and statistics due to the stability of persistence, which was proved in a very general form by Chazal et al. [3]. The key idea of TDA is to view a given cloud of points across all scales s, e.g. by blurring given points to balls of a variable radius s. The resulting evolution of topological shapes is summarized by a persistence diagram.

Example 1.1.

Fig. 1 illustrates the key concepts (before formal definitions) for the point set A = {0, 4, 6, 9, 10} in the real line R. Imagine that we gradually blur original data points by growing balls of the same radius s around the given points. The balls of the closest points 9, 10 start overlapping at the scale s = 0.5 and merge into one cluster {9, 10}. This merger is shown by blue arcs joining at the node at s = 0.5 in the single-linkage dendrogram.

Figure 1

Top: the 5-point cloud A = {0, 4, 6, 9, 10} ⊂ R. Bottom from left to right: the single-linkage dendrogram ∆_SL(A) from Definition 2.1, the 0D persistence diagram PD from Definition 4.4 and the new mergegram MG from Definition 3.4, where the red color shows dots of multiplicity 2.

© Yury Elkin, Vitaliy Kurlin; licensed under Creative Commons License CC-BY. 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Editors: Javier Esparza and Daniel Král’; Article No. 56; pp. 56:1–56:13. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

The persistence diagram PD in the bottom middle picture of Fig. 1 represents this merger by the dot (0, 0.5), meaning that a singleton cluster of (say) point 9 was born at the scale s = 0 and then died later at s = 0.5 by merging with the cluster of point 10. When the clusters {4, 6} and {9, 10} merge at s = 1.5, this event was previously encoded in the persistence diagram by the single dot (0, 1.5), meaning that one cluster inherited from (say) point 10 was born at s = 0 and has died at s = 1.5. The new mergegram in the bottom right picture of Fig. 1 represents the same merger by two dots (0.5, 1.5) and (1, 1.5). The dot (0.5, 1.5) means that the cluster {9, 10} merged at the current scale s = 1.5 was previously formed at s = 0.5. The dot (1, 1.5) means that the cluster {4, 6} merged at the current scale s = 1.5 was previously formed at s = 1. Every arc in the single-linkage dendrogram between nodes at scales b and d contributes one dot (b, d) to the mergegram, e.g. both singleton sets {9}, {10} merging at s = 0.5 contribute two dots (0, 0.5), or one dot of multiplicity 2 shown in red, see Fig. 1.

Example 1.1 shows that the mergegram MG retains more geometric information of a set A than the persistence diagram PD. It turns out that this new intermediate object (larger than PD and smaller than a full dendrogram) enjoys the stability of persistence, which makes MG useful for analysing noisy data in all cases when distance-based 0D persistence is used. Here is the summary of new contributions to Topological Data Analysis.

• Definition 3.4 introduces the concept of a mergegram for any dendrogram of clustering.
• Theorem 5.3 and Example 5.4 justify that the mergegram of a single-linkage dendrogram is strictly stronger than the 0D persistence of a distance-based filtration of sublevel sets.
• Theorem 7.4 proves that the mergegram of any single-linkage dendrogram is stable in the bottleneck distance under perturbations of a finite set in the Hausdorff distance.
• Theorem 8.2 shows that the mergegram can be computed in near-linear time.
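The mergegram of the 5-point set A = {0, 4, 6, 9, 10}, whose 9 dots are discussed in Example 1.1, can be reproduced in a few lines. The sketch below is our own illustration, not the authors' C++ code: it assumes points on the real line, where a Minimum Spanning Tree simply joins consecutive sorted points, and it uses the radius-s convention of the example, so neighbours at distance g merge at the scale s = g/2. The function name `mergegram_1d` is hypothetical.

```python
import math


def mergegram_1d(points):
    """Mergegram of the single-linkage dendrogram of a finite set of reals.

    Assumes the radius-s convention of Example 1.1: neighbours at distance g
    merge at the scale s = g / 2. On the line, the MST joins consecutive
    sorted points, so processing the gaps in increasing order replays all merges.
    """
    xs = sorted(points)
    parent = list(range(len(xs)))  # union-find forest over point indices
    birth = [0.0] * len(xs)        # birth scale of the cluster rooted at each index

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    dots = []
    for gap, i in sorted((xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)):
        s = gap / 2.0
        a, b = find(i), find(i + 1)
        # Both clusters end their lives at s; their union is born at s.
        dots += [(birth[a], s), (birth[b], s)]
        parent[b] = a
        birth[a] = s
    dots.append((birth[find(0)], math.inf))  # the whole set never dies
    # Drop diagonal dots (s, s), created when 3+ clusters merge at one scale.
    return sorted(d for d in dots if d[0] < d[1])


print(mergegram_1d([0, 4, 6, 9, 10]))
# → [(0.0, 0.5), (0.0, 0.5), (0.0, 1.0), (0.0, 1.0), (0.0, 2.0), (0.5, 1.5), (1.0, 1.5), (1.5, 2.0), (2.0, inf)]
```

Each arc of the dendrogram becomes one dot, so the pairs (0, 0.5) and (0, 1) appear twice, matching the red dots of multiplicity 2 in Fig. 1.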

The aim of clustering is to split a given set of points into clusters such that points within one cluster are more similar to each other than points from different clusters. A clustering problem can be made exact by specifying a distance between given points and restrictions on outputs, e.g. a number of clusters or a cost function to minimize. All hierarchical clustering algorithms can output a hierarchy of clusters or a dendrogram visualising mergers of clusters, as explained later in Definition 3.2. Here we introduce only the simplest single-linkage clustering, which plays the central role in the paper.

Definition 2.1 (single-linkage clustering). Let A be a finite set in a metric space X with a distance d : X × X → [0, +∞). Given a distance threshold, which will be called a scale s, any points a, b ∈ A should belong to one SL cluster if and only if there is a finite sequence a = a_1, ..., a_m = b ∈ A such that any two successive points have a distance at most 2s, i.e. d(a_i, a_{i+1}) ≤ 2s for i = 1, ..., m − 1 (in R^n this means that closed balls of radius s around successive points intersect, matching the scales in Example 1.1). Let ∆_SL(A; s) denote the collection of SL clusters at the scale s. For s = 0, any point a ∈ A forms a singleton cluster {a}. Representing each cluster from ∆_SL(A; s) over all s ≥ 0 as a node, we get the single-linkage dendrogram ∆_SL(A) visualizing how clusters merge, see the first bottom picture in Fig. 1.

Another way to visualize SL clusters is to build a Minimum Spanning Tree, see Definition 2.2.
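A direct reading of Definition 2.1 can be sketched as follows. The function name `sl_clusters` is hypothetical; the sketch links two points when their distance is at most 2s, i.e. it follows the radius-s ball convention of Example 1.1, where the points 9 and 10 merge at s = 0.5, and it returns the connected components of the resulting graph. It checks all pairs, so it takes quadratic time and is only meant for small examples.

```python
def sl_clusters(points, dist, s):
    """Single-linkage clusters of a finite list of points at the scale s.

    Two points are linked when dist(a, b) <= 2 * s (balls of radius s touch);
    clusters are the connected components of these links, found by union-find.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= 2 * s:
                parent[find(i)] = find(j)  # join the two components

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


A = [0, 4, 6, 9, 10]
for s in (0, 0.5, 1, 1.5, 2):
    print(s, sl_clusters(A, lambda a, b: abs(a - b), s))
```

The printed partitions grow coarser as s increases, and each change is one node of the dendrogram ∆_SL(A) in Fig. 1.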

Definition 2.2 (Minimum Spanning Tree MST(A)). The Minimum Spanning Tree MST(A) of a finite set A in a metric space X with a distance d is a tree (a connected graph without cycles) that has the vertex set A and the minimum total length of edges. We assume that the length of any edge between vertices a, b ∈ A is measured as d(a, b).

A review of the relevant past work on persistence diagrams is postponed to section 4, which introduces more auxiliary notions. A persistence diagram consists of dots (b, d) ∈ R^2 whose birth/death coordinates represent a life interval [b, d) of a homology class, e.g. a connected component in a Vietoris-Rips filtration, see the bottom middle picture in Fig. 1.

Persistence diagrams are isometry invariants that are stable under noise in the sense that a topological space and its noisy point sample have close persistence diagrams. This stability under noise allows us to classify continuous shapes by using only their discrete samples.

Imagine that several rigid shapes are sparsely represented by a few salient points, e.g. corners or local maxima of a distance function. Translations and rotations of these point clouds do not change the underlying shapes. Hence clouds should be classified modulo isometries that preserve distances between points. The important problem is to recognize a shape, e.g. within a given set of representatives, from its sparse point sample with noise. This paper solves the problem by computing isometry invariants, namely the new mergegram, the 0D persistence and the pair-set of distances to two nearest neighbors for each point.

Since all dots in a persistence diagram are unordered, our experimental section 8 uses a neural network whose output is invariant under permutations of input points by construction. PersLay [2] is a collection of permutation invariant neural network layers, i.e. functions on sets of points in R^n that give the same output regardless of the order in which the points are inserted. PersLay extends the neural network layers introduced in Deep Sets [10].
PersLay introduces new layers to specially handle persistence diagrams, as well as a new form of representing such layers. Each layer is a combination of a coefficient layer ω(p) : R^n → R, a point transformation φ(p) : R^n → R^q and a permutation invariant layer op that retrieves the final output

PersLay(diagram) = op({ω(p) φ(p)}), where p ∈ diagram (any set of points in R^n).

This section introduces a merge module (a family of vector spaces with consistent linear maps) and a mergegram (a diagram of points in R^2 representing a merge module).

Definition 3.1 (partition set P(A)). For any set A, a partition of A is a finite collection of non-empty disjoint subsets A_1, ..., A_k ⊂ A whose union is A. The single-block partition of A consists of the set A itself. The partition set P(A) consists of all partitions of A.

If A = {1, 2, 3}, then ({1, 2}, {3}) is a partition of A, but ({1}, {2}) and ({1, 2}, {2, 3}) are not. In this case the partition set P(A) consists of 5 partitions
({1}, {2}, {3}), ({1, 2}, {3}), ({1, 3}, {2}), ({2, 3}, {1}), ({1, 2, 3}).

Definition 3.2 below extends the concept of a dendrogram from [1, section 3.1] to arbitrary (possibly, infinite) sets A. Since every partition of A is finite by Definition 3.1, we don't need to add that an initial partition of A is finite. Non-singleton sets are now allowed in the initial partition.

Figure 2 (left: the partitions ∆(A; 0) = {{1}, {2}, {3}}, ∆(A; 1) = {{1, 2}, {3}} and ∆(A; 2) = {{1, 2, 3}} at the scales s = 0, 1, 2, linked by the maps ∆_0^1 and ∆_1^2; right: the mergegram with birth and death axes)

The dendrogram ∆ on A = {1, 2, 3} and its mergegram MG(∆) from Definition 3.4.

Definition 3.2 (dendrogram of merge sets). A dendrogram over any set A is a function ∆ : [0, ∞) → P(A) of a scale s ≥ 0 satisfying the following conditions.
(3.2a) There is a scale r ≥ 0 such that ∆(A; s) is the single-block partition for all s ≥ r.
(3.2b) If s ≤ t, then ∆(A; s) refines ∆(A; t), i.e. any set from ∆(A; s) is a subset of some set from ∆(A; t). These inclusions of subsets of A induce the natural map ∆_s^t : ∆(A; s) → ∆(A; t).
(3.2c) There are finitely many merge scales s_i such that s_0 = 0 and s_{i+1} = sup{s | the map ∆_{s_i}^t is the identity for all t ∈ [s_i, s)}, i = 0, ..., m − 1.
Since ∆(A; s_i) → ∆(A; s_{i+1}) is not an identity map, there is a subset B ∈ ∆(A; s_{i+1}) whose preimage consists of at least two subsets from ∆(A; s_i). This subset B ⊂ A is called a merge set and its birth scale is s_{i+1}. All sets of ∆(A; 0) are merge sets at the birth scale 0. The life(B) is the interval [s_{i+1}, t) from its birth scale s_{i+1} to its death scale t = sup{s | ∆_{s_{i+1}}^s(B) = B}.

Dendrograms are usually represented as trees whose nodes correspond to all sets from the partitions ∆(A; s_i) at merge scales. Edges of such a tree connect any set B ∈ ∆(A; s_{i+1}) with its preimages under ∆(A; s_i) → ∆(A; s_{i+1}). Fig. 2 shows the dendrogram on A = {1, 2, 3}.

In the dendrogram above, the partition ∆(A; 1) consists of {1, 2} and {3}. The maps ∆_s^t induced by inclusions respect compositions in the sense that ∆_s^t ∘ ∆_r^s = ∆_r^t for any r ≤ s ≤ t, e.g. ∆_0^1({1}) = {1, 2} = ∆_0^1({2}) and ∆_0^1({3}) = {3}, i.e. ∆_0^1 is a well-defined map from the partition ∆(A; 0) of 3 singleton sets to ∆(A; 1), but is not an identity.

At the scale s = 0 the merge sets {1}, {2} have life = [0, 1) and {3} has life = [0, 2). At the scale s = 1 the only merge set {1, 2} has life = [1, 2). At the scale s = 2 the only merge set {1, 2, 3} has life = [2, +∞).
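The merge sets and life intervals of Definition 3.2 can be read off mechanically once a dendrogram is stored by its partitions at the merge scales s_0 = 0 < s_1 < ... < s_m. The sketch below assumes that representation (a list of (scale, partition) pairs, each partition a list of frozensets, the last one being the single-block partition); the function name `merge_set_lives` is ours. It is applied to the dendrogram of Fig. 2.

```python
import math


def merge_set_lives(levels):
    """Merge sets of a dendrogram and their lives [birth, death).

    `levels` lists (scale, partition) pairs at all merge scales in increasing
    order, starting at scale 0; each partition is a list of frozensets.
    A block is a merge set at the first scale where it appears (every block
    at scale 0, later a union of at least two earlier blocks by (3.2b)); it
    dies at the first later merge scale where it is no longer a block.
    """
    lives = []
    for i, (birth, blocks) in enumerate(levels):
        earlier = set(levels[i - 1][1]) if i > 0 else set()
        for block in blocks:
            if block in earlier:
                continue  # already born at an earlier merge scale
            death = math.inf
            for scale, later_blocks in levels[i + 1:]:
                if block not in later_blocks:
                    death = scale
                    break
            lives.append((set(block), (birth, death)))
    return lives


# The dendrogram of Fig. 2 on A = {1, 2, 3}.
fig2 = [
    (0, [frozenset({1}), frozenset({2}), frozenset({3})]),
    (1, [frozenset({1, 2}), frozenset({3})]),
    (2, [frozenset({1, 2, 3})]),
]
for merge_set, life in merge_set_lives(fig2):
    print(merge_set, life)
```

For Fig. 2 this yields the lives [0, 1) for {1} and {2}, [0, 2) for {3}, [1, 2) for {1, 2} and [2, +∞) for {1, 2, 3}; collecting these pairs as a multiset gives the mergegram of Definition 3.4.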
The notation ∆ is motivated as the first (Greek) letter in the word dendrogram and by the ∆-shape of a typical tree above.

Condition (3.2a) means that the partition of A is trivial (a single block) for all large scales s. Condition (3.2b) says that, as the scale s increases, sets from a partition ∆(A; s) can only merge with each other, but cannot split. Condition (3.2c) implies that there are only finitely many mergers, when two or more subsets of A merge into a larger merge set.

Lemma 3.3 (single-linkage dendrogram). Given a metric space (X, d) and a finite set A ⊂ X, the single-linkage dendrogram ∆_SL(A) from Definition 2.1 satisfies Definition 3.2.

Proof. Since A is finite, there are only finitely many inter-point distances within A, which implies conditions (3.2a) and (3.2c). Let f(p) : X → R be the distance from a point p ∈ X to (the closest point of) A. Condition (3.2b) follows from the inclusions f^{-1}[0, s) ⊆ f^{-1}[0, t) for s ≤ t.

A mergegram represents lives of merge sets by dots with two coordinates (birth, death).

Definition 3.4 (mergegram MG(∆)). The mergegram of a dendrogram ∆ from Definition 3.2 has the dot (birth, death) ∈ R^2 for each merge set A of ∆ with life(A) = [birth, death). If any life interval appears k times, the dot (birth, death) has the multiplicity k in MG(∆).

For simplicity, this paper considers vector spaces with coefficients (of linear combinations of vectors) only in Z_2 = {0, 1}, which can be replaced by any field.

Definition 3.5 (merge module M(∆)). For any dendrogram ∆ on a set X from Definition 3.2, the merge module M(∆) consists of the vector spaces M_s(∆), s ∈ R, and linear maps m_s^t : M_s(∆) → M_t(∆), s ≤ t. For any s ∈ R and A ∈ ∆(s), the space M_s(∆) has the generator (basis vector) [A] ∈ M_s(∆). For s < t and any set A ∈ ∆(s), if the image of A under ∆_s^t coincides with A ⊂ X, i.e. ∆_s^t(A) = A, then m_s^t([A]) = [A], else m_s^t([A]) = 0.

Figure 3 (the spaces M_0(∆) = Z_2 ⊕ Z_2 ⊕ Z_2 with basis [{1}], [{2}], [{3}], M_1(∆) = Z_2 ⊕ Z_2 and M_2(∆) = Z_2 at the scales s = 0, 1, 2, linked by the maps m)

The merge module M(∆) of the dendrogram ∆ on the set X = {1, 2, 3} in Fig. 2.

Example 3.6. Fig. 4 shows the metric space X = {a, b, c, p, q} with distances defined by the shortest path metric induced by the specified edge-lengths, see the distance matrix.

     a  b  c  p  q
  a  0  6  7  9  9
  b  6  0  3  5  5
  c  7  3  0  6  6
  p  9  5  6  0  4
  q  9  5  6  4  0

Figure 4

The set X = {a, b, c, p, q} has the distance matrix defined by the shortest path metric.

Figure 5

Left: the dendrogram ∆ for the single-linkage clustering of the 5-point set X = {a, b, c, p, q} in Fig. 4. Right: the mergegram MG(∆), red dots have multiplicity 2.
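The dots of Example 3.6 can be recomputed from the distance matrix of Fig. 4. The sketch below is our illustration in the spirit of Algorithm 8.1, not the authors' implementation: it runs Kruskal's algorithm over all pairs of a distance matrix (so it is quadratic in the number of points) and halves every edge length to follow the scale convention of Example 1.1. The name `mergegram_from_matrix` is hypothetical.

```python
import math
from itertools import combinations


def mergegram_from_matrix(dist):
    """Mergegram of the single-linkage dendrogram of a finite metric space.

    `dist` is a symmetric distance matrix. Pairs are processed in increasing
    distance (Kruskal); each union of two clusters at scale s = length / 2
    ends both clusters' lives with dots (birth, s), and the union is born at s.
    """
    n = len(dist)
    parent = list(range(n))
    birth = [0.0] * n  # birth scale of the cluster rooted at each index

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dots = []
    for i, j in sorted(combinations(range(n), 2), key=lambda e: dist[e[0]][e[1]]):
        a, b = find(i), find(j)
        if a == b:
            continue  # both ends already in one cluster: not an MST edge
        s = dist[i][j] / 2.0
        dots += [(birth[a], s), (birth[b], s)]
        parent[b] = a
        birth[a] = s
    dots.append((birth[find(0)], math.inf))
    # Diagonal dots (s, s) arise only when 3+ clusters merge at the same scale.
    return sorted(d for d in dots if d[0] < d[1])


# Rows and columns in the order a, b, c, p, q (Fig. 4).
D = [[0, 6, 7, 9, 9],
     [6, 0, 3, 5, 5],
     [7, 3, 0, 6, 6],
     [9, 5, 6, 0, 4],
     [9, 5, 6, 4, 0]]
print(mergegram_from_matrix(D))
# → [(0.0, 1.5), (0.0, 1.5), (0.0, 2.0), (0.0, 2.0), (0.0, 3.0), (1.5, 2.5), (2.0, 2.5), (2.5, 3.0), (3.0, inf)]
```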


The dendrogram ∆ in the first picture of Fig. 5 generates the mergegram as follows:
• each of the singleton sets {b} and {c} has the dot (0, 1.5), so its multiplicity is 2;
• each of the singleton sets {p} and {q} has the dot (0, 2), so its multiplicity is 2;
• the singleton set {a} has the dot (0, 3);
• the merge set {b, c} has the dot (1.5, 2.5);
• the merge set {p, q} has the dot (2, 2.5);
• the merge set {b, c, p, q} has the dot (2.5, 3);
• the merge set {a, b, c, p, q} has the dot (3, +∞).

This section introduces the key concepts from the thorough review by Chazal et al. [3]. As will become clear soon, the merge module of any dendrogram belongs to the wider class of persistence modules below.

Definition 4.1 (persistence module V). A persistence module V over the real numbers R is a family of vector spaces V_t, t ∈ R, with linear maps v_s^t : V_s → V_t, s ≤ t, such that v_t^t is the identity map on V_t and the composition is respected: v_s^t ∘ v_r^s = v_r^t for any r ≤ s ≤ t.

The set of real numbers can be considered as a category R in the following sense. The objects of R are all real numbers. Any two real numbers such that a ≤ b define a single morphism a → b. The composition of morphisms a → b and b → c is the morphism a → c. In this language, a persistence module is a functor from R to the category of vector spaces.

A basic example of V is an interval module. An interval J between points p < q in the line R can be one of the following types: closed [p, q], open (p, q) and half-open or half-closed [p, q) and (p, q]. It is convenient to encode types of endpoints by ± superscripts as follows:
[p^−, q^+] := [p, q], [p^+, q^−] := (p, q), [p^+, q^+] := (p, q], [p^−, q^−] := [p, q).
The endpoints p, q can also take the infinite values ±∞, but without superscripts.

Example 4.2 (interval module I(J)). For any interval J ⊂ R, the interval module I(J) is the persistence module defined by the following vector spaces I_s and linear maps i_s^t : I_s → I_t:
I_s = Z_2 for s ∈ J, and I_s = 0 otherwise; i_s^t = id for s, t ∈ J, and i_s^t = 0 otherwise, for any s ≤ t.

The direct sum W = U ⊕ V of persistence modules U, V is defined as the persistence module with the vector spaces W_s = U_s ⊕ V_s and linear maps w_s^t = u_s^t ⊕ v_s^t.

We illustrate the abstract concepts above using geometric constructions of Topological Data Analysis. Let f : X → R be a continuous function on a topological space. Its sublevel sets X_s^f = f^{−1}((−∞, s]) form nested subspaces X_s^f ⊂ X_t^f for any s ≤ t. The inclusions of the sublevel sets respect compositions similarly to a dendrogram ∆ in Definition 3.2.

On a metric space X with a distance function d : X × X → [0, +∞), a typical example of a function f : X → R is the distance to a finite set of points A ⊂ X. More specifically, for any point p ∈ X, let f(p) be the distance from p to (a closest point of) A. For any r ≥ 0, the preimage X_r^f = f^{−1}((−∞, r]) = {q ∈ X | d(q, A) ≤ r} is the union of closed balls that have the radius r and centers at all points p ∈ A. For example, X_0^f = f^{−1}((−∞, 0]) = A and X_{+∞}^f = f^{−1}(R) = X.

If we consider any continuous function f : X → R, we have the inclusion X_s^f ⊂ X_r^f for any s ≤ r. Hence all sublevel sets X_s^f form a nested sequence of subspaces within X. The above construction of a filtration {X_s^f} can be considered as a functor from R to the category of topological spaces. Below we discuss the most practically used case of dimension 0.

Example 4.3 (persistent homology). For any topological space X, the 0-dimensional homology H_0(X) is the vector space (with coefficients Z_2) generated by all connected components of X. Let {X_s} be any filtration of nested spaces, e.g. sublevel sets X_s^f based on a continuous function f : X → R. The inclusions X_s ⊂ X_r for s ≤ r induce the linear maps between homology groups H_0(X_s) → H_0(X_r) and define the persistent homology {H_0(X_s)}, which satisfies the conditions of a persistence module from Definition 4.1.

If X is a finite set of m points, then H_0(X) is the direct sum Z_2^m of m copies of Z_2.

The persistence modules that can be decomposed as direct sums of interval modules can be described in a very simple combinatorial way by persistence diagrams of dots in R^2.

Definition 4.4 (persistence diagram PD(V)). Let a persistence module V be decomposed as a direct sum of interval modules from Example 4.2: V ≅ ⊕_{l ∈ L} I(p_l^∗, q_l^∗), where ∗ is + or −. The persistence diagram PD(V) is the multiset PD(V) = {(p_l, q_l) | l ∈ L} \ {p = q} ⊂ R^2.

The persistence diagram of the 0-dimensional persistent homology of a space X with a continuous function f : X → R will be denoted by PD{H_0(X_s^f)}. Lemma 7.1 will prove that the merge module M(∆) of any dendrogram ∆ is also decomposable into interval modules. Hence the mergegram MG(∆) from Definition 3.4 can be interpreted as the persistence diagram of the merge module M(∆).

Let f : X → R be the distance function to a finite subset A of a metric space (X, d). The persistent homology {H_k(X_s^f)} in any dimension k is invariant under isometries of X. Moreover, the persistence diagrams of very different shapes, e.g. topological spaces and their discrete samples, can be easily compared by the bottleneck distance in Definition 6.3. Practical applications of persistence are justified by Stability Theorem 6.4, saying that the persistence diagram changes continuously under perturbations of a given filtration or an initial point set. A similar stability of mergegrams will be proved in Theorem 7.4.

This section shows that the mergegram MG(∆_SL(A)) has more isometry information about the subset A ⊂ X than the 0-dimensional persistent homology {H_0(X_s^f)}. Theorem 5.3 shows how to obtain the 0D persistence PD{H_0(X_s^f)} from MG(∆_SL(A)), where f : X → R is the distance to a finite subset A ⊂ X. Example 5.4 builds two 4-point sets in R whose persistence diagrams are identical, but their mergegrams are different.

We start from folklore Claims 5.1-5.2, which interpret the 0D persistence PD{H_0(X_s^f)} using the classical concepts of the single-linkage dendrogram and Minimum Spanning Tree.

Claim 5.1 (0D persistence from ∆_SL). For a finite set A in a metric space (X, d), let f : X → R be the distance to A.
In the single-linkage dendrogram ∆_SL(A), let 0 < s_1 < ··· < s_m < s_{m+1} = +∞ be all distinct merge scales. If k ≥ 2 subsets of A merge into a larger subset of A at a scale s_i, the multiplicity of s_i is µ_i = k − 1 (and µ_{m+1} = 1 for the infinite scale). Then the persistence diagram PD{H_0(X_s^f)} consists of the dots (0, s_i) with multiplicities µ_i, i = 1, ..., m + 1.

Claim 5.2 (0D persistence from MST). For a set A of n points in a metric space (X, d), let f : X → R be the distance to A. Let a Minimum Spanning Tree MST(A) have edge-lengths l_1 ≤ ··· ≤ l_{n−1}. The persistence diagram PD{H_0(X_s^f)} consists of the n − 1 dots (0, 0.5 l_i), counted with multiplicities if some edge-lengths are equal, plus the infinite dot (0, +∞).
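Claim 5.2 turns the 0D persistence into an MST computation. The sketch below assumes points in R^m given as coordinate tuples and uses Prim's algorithm in O(n^2) time, a simple choice of ours rather than the fast MST of [9]; halving each MST edge length gives the dots (0, 0.5 l_i) of Claim 5.2. The function names are hypothetical.

```python
import math


def mst_edge_lengths(points):
    """Edge lengths of a Euclidean Minimum Spanning Tree (Prim, O(n^2) time)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from the current tree to each point
    best[0] = 0.0          # start the tree at the first point
    lengths = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(lengths[1:])  # lengths[0] is the 0 used for the start point


def persistence_0d(points):
    """0D persistence diagram of the distance to the points, via Claim 5.2."""
    dots = [(0.0, length / 2.0) for length in mst_edge_lengths(points)]
    return dots + [(0.0, math.inf)]


print(persistence_0d([(0,), (4,), (6,), (9,), (10,)]))
# → [(0.0, 0.5), (0.0, 1.0), (0.0, 1.5), (0.0, 2.0), (0.0, inf)]
```

For the set of Example 1.1 the MST edges have lengths 1, 2, 3, 4, which reproduces the persistence diagram PD in Fig. 1.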

Theorem 5.3 (0D persistence from a mergegram). For a finite set A in a metric space (X, d), let f : X → R be the distance to A. Let the mergegram MG(∆_SL(A)) be a multiset {(b_i, d_i)}_{i=1}^k, where some dots can be repeated. Then the persistence diagram PD{H_0(X_s^f)} is the difference of the multisets {(0, d_i)}_{i=1}^k − {(0, b_i)}_{i=1}^k, containing each dot (0, s) exactly #d(s) − #b(s) times, where #b(s) is the number of births b_i = s and #d(s) is the number of deaths d_i = s. All trivial dots (0, 0) are ignored; alternatively, we take {(0, b_i)}_{i=1}^k only with b_i > 0.

Proof. In the language of Claim 5.1, let exactly µ + 1 subsets merge into a set B ∈ ∆_SL(A; s) at a scale s > 0 of multiplicity µ. By Claim 5.1 this set B contributes µ dots (0, s) to the persistence diagram PD{H_0(X_s^f)}. By Definition 3.4 the same set B contributes µ + 1 dots of the form (b_i, s), i = 1, ..., µ + 1, corresponding to the µ + 1 sets that merge into B at the scale s. Moreover, the set B itself will merge later into a larger set, which creates one extra dot (s, d) ∈ MG(∆_SL(A)). The exceptional case B = A corresponds to d = +∞. If we remove one dot (0, s) from the µ + 1 dots counted above, as expected in the difference {(0, d_i)}_{i=1}^k − {(0, b_i)}_{i=1}^k of multisets, we get exactly µ dots (0, s) ∈ PD{H_0(X_s^f)}. The required formula has been proved for contributions of any merge set B ⊂ A.

In Example 1.1 the mergegram in the last picture of Fig. 1 is the multiset of 9 dots:
MG(∆_SL(A)) = {(0, 0.5), (0, 0.5), (0, 1), (0, 1), (0.5, 1.5), (1, 1.5), (0, 2), (1.5, 2), (2, +∞)}.
Taking the difference of multisets and ignoring trivial dots (0, 0), we get
PD{H_0(X_s^f)} = {(0, 0.5), (0, 0.5), (0, 1), (0, 1), (0, 1.5), (0, 1.5), (0, 2), (0, 2), (0, +∞)} − {(0, 0.5), (0, 1), (0, 1.5), (0, 2)} = {(0, 0.5), (0, 1), (0, 1.5), (0, 2), (0, +∞)}, as in Fig. 1.

Example 5.4 (the mergegram is stronger than 0D persistence). Fig. 6 and 7 show the dendrograms, identical 0D persistence diagrams and different mergegrams for two 4-point sets A and B in R. This example together with Theorem 5.3 justifies that the new mergegram is strictly stronger than 0D persistence as an isometry invariant.
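Theorem 5.3 and the claim of Example 5.4 can be checked on small inputs with a multiset difference. The sketch assumes points on the real line and the scale convention of Example 1.1; the sets A = {0, 1, 3, 7} and B = {0, 1, 5, 7} below are our own hypothetical choice of two 4-point sets with equal 0D persistence but different mergegrams, not necessarily those drawn in Fig. 6 and 7. The helper names are ours.

```python
import math
from collections import Counter


def mergegram_1d(points):
    """Mergegram of the single-linkage dendrogram of reals (merge at half the gap)."""
    xs = sorted(points)
    parent, birth = list(range(len(xs))), [0.0] * len(xs)

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    dots = []
    for gap, i in sorted((xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)):
        a, b = find(i), find(i + 1)
        dots += [(birth[a], gap / 2.0), (birth[b], gap / 2.0)]
        parent[b], birth[a] = a, gap / 2.0
    dots.append((birth[find(0)], math.inf))
    return sorted(d for d in dots if d[0] < d[1])


def pd_from_mergegram(mergegram):
    """0D persistence from a mergegram by the multiset difference of Theorem 5.3."""
    deaths = Counter(d for _, d in mergegram)
    births = Counter(b for b, _ in mergegram if b > 0)  # skip trivial dots (0, 0)
    diff = deaths - births  # Counter subtraction keeps positive counts only
    return sorted((0.0, s) for s, k in diff.items() for _ in range(k))


print(pd_from_mergegram(mergegram_1d([0, 4, 6, 9, 10])))
# → [(0.0, 0.5), (0.0, 1.0), (0.0, 1.5), (0.0, 2.0), (0.0, inf)]

# Two hypothetical 4-point sets chosen for illustration.
A, B = [0, 1, 3, 7], [0, 1, 5, 7]
print(pd_from_mergegram(mergegram_1d(A)) == pd_from_mergegram(mergegram_1d(B)))  # True
print(mergegram_1d(A) == mergegram_1d(B))  # False
```

In A the points 3 and 7 join one growing cluster, giving the dots (0.5, 1) and (1, 2); in B two pairs form first, giving (0.5, 2) and (1, 2), while both sets share the MST edge-lengths 1, 2, 4.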

Figure 6

Left: single-linkage dendrogram ∆_SL(A) for the 4-point set A ⊂ R. Middle: the 0D persistence diagram for the sublevel filtration of the distance to A. Right: the mergegram MG(∆_SL(A)).

Figure 7

Left: single-linkage dendrogram ∆_SL(B) for the 4-point set B ⊂ R. Middle: the 0D persistence diagram for the sublevel filtration of the distance to B. Right: the mergegram MG(∆_SL(B)).

Definition 6.1 introduces homomorphisms between persistence modules, which are needed to state the stability of persistence diagrams PD{H_0(X_s^f)} under perturbations of a function f : X → R. This result will imply a similar stability for the mergegram MG(∆_SL(A)) of the dendrogram ∆_SL(A) of the single-linkage clustering of a set A within a metric space X.

Definition 6.1 (a homomorphism of a degree δ between persistence modules). Let U and V be persistence modules over R. A homomorphism U → V of degree δ ∈ R is a collection of linear maps φ_t : U_t → V_{t+δ}, t ∈ R, such that φ_t ∘ u_s^t = v_{s+δ}^{t+δ} ∘ φ_s for all s ≤ t, i.e. the square formed by the maps u_s^t, φ_s, φ_t and v_{s+δ}^{t+δ} commutes. Let Hom^δ(U, V) be all homomorphisms U → V of degree δ. Persistence modules U, V are isomorphic if they have inverse homomorphisms U → V and V → U of degree δ = 0.

For a persistence module V with maps v_s^t : V_s → V_t, the simplest example of a homomorphism of a degree δ ≥ 0 is the shift 1_V^δ : V → V defined by the maps v_t^{t+δ}, t ∈ R. So the maps v_s^t defining the structure of V shift all vector spaces V_s by the difference of scales δ = t − s.

The concept of interleaved modules below is an algebraic generalization of a geometric perturbation of a set X in terms of (the homology of) its sublevel sets X_s.

Definition 6.2 (interleaving distance ID). Persistence modules U and V are δ-interleaved if there are homomorphisms φ ∈ Hom^δ(U, V) and ψ ∈ Hom^δ(V, U) such that φ ∘ ψ = 1_V^{2δ} and ψ ∘ φ = 1_U^{2δ}. The interleaving distance is ID(U, V) = inf{δ ≥ 0 | U and V are δ-interleaved}.
If f, g : X → R are continuous functions such that ||f − g||_∞ ≤ δ in the L∞-distance, the persistence modules H_k{f^{−1}(−∞, s]} and H_k{g^{−1}(−∞, s]} are δ-interleaved for any k [4]. This conclusion extends to persistence diagrams in terms of the bottleneck distance below.

Definition 6.3 (bottleneck distance BD). Let multisets C, D contain finitely many points (p, q) ∈ R^2, p < q, of finite multiplicity and all diagonal points (p, p) ∈ R^2 of infinite multiplicity. For δ ≥ 0, a δ-matching is a bijection h : C → D such that |h(a) − a|_∞ ≤ δ in the L∞-distance on the plane for any point a ∈ C. The bottleneck distance between persistence modules U, V is BD(U, V) = inf{δ | there is a δ-matching between PD(U) and PD(V)}.

The original stability of persistence for sequences of sublevel sets was extended as Theorem 6.4 to q-tame persistence modules. Intuitively, a persistence module V is q-tame if any non-diagonal square in the persistence diagram PD(V) contains only finitely many points, see [3, section 2.8]. Any finitely decomposable persistence module is q-tame.

Theorem 6.4 (stability of persistence modules). [3, isometry theorem 4.11] Let U and V be q-tame persistence modules. Then ID(U, V) = BD(PD(U), PD(V)), where ID is the interleaving distance and BD is the bottleneck distance between persistence modules.
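The bottleneck distance of Definition 6.3 can be computed exactly for small diagrams. The sketch below is our own illustration (function names are hypothetical, and it is not the algorithm of any library): it adds one diagonal slot per dot of the other diagram, binary-searches over all candidate matching costs, and tests each threshold for a perfect matching with Kuhn's augmenting paths. Dots with infinite death are assumed to be matched only among themselves, where pairing them in sorted order of birth is optimal on the line.

```python
import math


def _linf(p, q):
    """L-infinity distance between two finite dots of the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def _to_diagonal(p):
    """L-infinity distance from a dot (b, d) to the closest diagonal point."""
    return (p[1] - p[0]) / 2.0


def _has_perfect_matching(cost, delta):
    """Kuhn's augmenting-path test: can all rows be matched via entries <= delta?"""
    size = len(cost)
    row_of_col = [-1] * size  # row currently matched to each column, or -1

    def augment(row, seen):
        for col in range(size):
            if cost[row][col] <= delta and not seen[col]:
                seen[col] = True
                if row_of_col[col] == -1 or augment(row_of_col[col], seen):
                    row_of_col[col] = row
                    return True
        return False

    return all(augment(row, [False] * size) for row in range(size))


def bottleneck(C, D):
    """Bottleneck distance between two finite multisets of dots (birth, death)."""
    C_inf = sorted(b for b, d in C if math.isinf(d))
    D_inf = sorted(b for b, d in D if math.isinf(d))
    if len(C_inf) != len(D_inf):
        return math.inf
    inf_part = max((abs(x - y) for x, y in zip(C_inf, D_inf)), default=0.0)

    Cf = [p for p in C if not math.isinf(p[1]) and p[0] < p[1]]
    Df = [p for p in D if not math.isinf(p[1]) and p[0] < p[1]]
    n, m = len(Cf), len(Df)
    if n + m == 0:
        return inf_part
    # Rows: dots of C, then diagonal slots for D; columns: dots of D, then
    # diagonal slots for C. Diagonal-to-diagonal pairs cost nothing.
    cost = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n + m):
        for j in range(n + m):
            if i < n and j < m:
                cost[i][j] = _linf(Cf[i], Df[j])
            elif i < n:
                cost[i][j] = _to_diagonal(Cf[i])
            elif j < m:
                cost[i][j] = _to_diagonal(Df[j])

    # Binary search for the smallest cost admitting a perfect matching.
    candidates = sorted({c for row in cost for c in row})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if _has_perfect_matching(cost, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return max(inf_part, candidates[lo])


print(bottleneck([(0, 1)], [(0, 1.2)]))  # about 0.2: match the two dots
print(bottleneck([(0, 1)], []))          # 0.5: match (0, 1) to (0.5, 0.5)
```

The cubic-per-test matching keeps the code short; it is meant for checking small examples, not for large diagrams.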

In a dendrogram ∆ from Definition 3.2, any merge set A of ∆ has a life interval life(A) = [b, d) from its birth scale b to its death scale d. Lemmas 7.1 and 7.3 are proved in appendices.

Lemma 7.1 (merge module decomposition). For any dendrogram ∆ in the sense of Definition 3.2, the merge module M(∆) ≅ ⊕_A I(life(A)) decomposes over all merge sets A.

Lemma 7.1 will allow us to use the stability of persistence in Theorem 6.4 for merge modules and also Lemma 7.3. Stability of the mergegram MG(∆_SL(A)) will be proved under perturbations of A in the Hausdorff distance defined below.

Definition 7.2 (Hausdorff distance HD). For any subsets A, B of a metric space (X, d), the Hausdorff distance HD(A, B) is max{sup_{a ∈ A} inf_{b ∈ B} d(a, b), sup_{b ∈ B} inf_{a ∈ A} d(a, b)}.

Lemma 7.3 (merge modules interleaved). If any subsets A, B of a metric space (X, d) have HD(A, B) = δ, then the merge modules M(∆_SL(A)) and M(∆_SL(B)) are δ-interleaved.

Theorem 7.4 (stability of a mergegram). Any finite subsets A, B of a metric space (X, d) have mergegrams satisfying BD(MG(∆_SL(A)), MG(∆_SL(B))) ≤ HD(A, B). Hence any small perturbation of A in the Hausdorff distance yields a similarly small perturbation in the bottleneck distance for the mergegram MG(∆_SL(A)) of the single-linkage clustering dendrogram ∆_SL(A).

Proof. The given subsets A, B with HD(A, B) = δ have δ-interleaved merge modules by Lemma 7.3, i.e. ID(M(∆_SL(A)), M(∆_SL(B))) ≤ δ. Since any merge module M(∆) is finitely decomposable by Lemma 7.1, hence q-tame, the corresponding mergegram MG(∆) satisfies Theorem 6.4, i.e. BD(MG(∆_SL(A)), MG(∆_SL(B))) ≤ δ as required.

Theorem 7.4 is confirmed by the following experiment on cloud perturbations in Fig. 8.

Figure 8 (bottleneck distance plotted against the noise bound)

The bottleneck distances (average on the left, maximum on the right) between mergegrams of sampled point clouds and their perturbations. Both graphs are below the line y = 2x.

We uniformly generate N = 100 black points in a cube [0, …]. Then we generate a random number of red points such that the ε-ball of every black point randomly has 1, 2 or 3 red points, for noise bounds ε up to 10 taken with a step size 0.1. We compute the bottleneck distance between the mergegrams of black and red points, repeat the experiment K = 100 times, and plot the average and maximum in Fig. 8.

Algorithm 8.1 computes the mergegram of the SL dendrogram for any finite set A ⊂ R^m.

Algorithm 8.1.

Input: a finite point cloud A ⊂ R^m.
Compute MST(A) and sort all edges of MST(A) in increasing order of length.
Initialize a Union-Find structure U over A, where every point of A is its own component.
Initialize the function prev : Components[U] → R by setting prev[t] = 0 for all components t.
Initialize the vector Output that will consist of pairs in R × R.
for each edge e = (a, b) of MST(A) in increasing order of length do
    Find the components c_1 and c_2 of a and b, respectively, in U.
    Add the pairs (prev[c_1], length(e)) and (prev[c_2], length(e)) to Output.
    Merge the components c_1 and c_2 in U and denote the merged component by t.
    Set prev[t] = length(e).
end for
Add the pair (prev[t], +∞) for the last remaining component t of U to Output.
Discard pairs (x, x) with equal coordinates, which arise when three or more clusters merge at the same scale.
return Output

Let α(n) be the inverse Ackermann function. Other constants below are defined in [9].

Theorem 8.2 (a fast mergegram computation). For any cloud A ⊂ R^m of n points, the mergegram MG(∆_SL(A)) can be computed in time O(K n log n α(n)), where the factor K depends only on the constants c, c_p, c_l from [9, Theorem 5.1].

Proof. A Minimum Spanning Tree MST(A) needs O(K n log n α(n)) time by [9, Theorem 5.1]. Sorting the n − 1 edges of MST(A) takes O(n log n) time, and the rest of Algorithm 8.1 consists of O(n) Union-Find operations taking O(n α(n)) time. Hence the full algorithm has the same computational complexity as the MST.

The experiments summarized in Fig. 10 show that the mergegram curve in blue outperforms other isometry invariants on the isometry classification by the state-of-the-art PersLay. We generated 10 classes of 100-point clouds within the unit ball in R^m for m = 2, 3, 4, 5. For each class, we made 100 copies of each cloud and perturbed every point by a uniform random shift in a cube with side length 2ε, where ε is called a noise bound. For each of 100 perturbed clouds, we added 25 points such that every new point is ε-close to an original point. Within each of 10 classes all 100 clouds were randomly rotated within the unit ball around the origin, see Fig. 9. For each of the resulting 1000 clouds, we computed the mergegram, the 0D persistence diagram and the diagram of pairs of distances to two nearest neighbors for every point.

Figure 9

Left: an initial random cloud with 100 blue points. Middle: all blue points are perturbed and 25 extra orange points are added. Right: the cloud is rotated through a random angle. Can we recognize that the initial and final clouds are in the same isometry class modulo small noise?
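The data generation described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' experimental code: the helper names random_point_in_ball, random_orthogonal and noisy_copy are hypothetical, "$\epsilon$-close" is read as Euclidean distance at most $\epsilon$, and the random orthogonal matrix built by Gram-Schmidt may be a reflection rather than a proper rotation (both are isometries, so the isometry class is unchanged).

```python
import math
import random


def random_point_in_ball(m, rng, radius=1.0):
    """Uniform random point in the ball of the given radius in R^m."""
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]      # Gaussian vector: uniform direction
    norm = math.sqrt(sum(x * x for x in v))
    r = radius * rng.random() ** (1.0 / m)           # radius with density proportional to r^(m-1)
    return [r * x / norm for x in v]


def random_orthogonal(m, rng):
    """Random orthogonal m x m matrix (list of rows) via Gram-Schmidt on Gaussian vectors.
    The determinant is not fixed, so this may be a reflection rather than a rotation."""
    rows = []
    while len(rows) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for u in rows:                               # remove components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:                              # retry on (near) linear dependence
            rows.append([x / norm for x in v])
    return rows


def noisy_copy(cloud, eps, n_extra, rng):
    """Shift each point uniformly within the cube [-eps, eps]^m around it, add n_extra
    points within Euclidean distance eps of random original points, then apply a
    random orthogonal map about the origin."""
    m = len(cloud[0])
    noisy = [[x + rng.uniform(-eps, eps) for x in p] for p in cloud]
    for _ in range(n_extra):
        base = rng.choice(cloud)
        shift = random_point_in_ball(m, rng, radius=eps)
        noisy.append([x + s for x, s in zip(base, shift)])
    q = random_orthogonal(m, rng)
    return [[sum(q[i][j] * p[j] for j in range(m)) for i in range(m)] for p in noisy]


rng = random.Random(0)
cloud = [random_point_in_ball(3, rng) for _ in range(100)]
copy = noisy_copy(cloud, eps=0.01, n_extra=25, rng=rng)
print(len(copy))  # 125
```

Each cube shift moves a point by at most $\epsilon\sqrt{m}$ in Euclidean distance, so pairwise distances among the 100 original points change by at most $2\epsilon\sqrt{m}$ after perturbation and the distance-preserving orthogonal map.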

MFCS 2020

[Figure 10 plots: success rate against noise bound for the invariants mergegram, PD0, NN(2) and cloud, in panels for 2, 3, 4 and 5 dimensions.]

Figure 10

Success rates of PersLay in identifying isometry classes of 100-point clouds uniformly sampled in a unit ball, averaged over 5 different clouds and 5 cross-validations with 20/80 splits.

The machine learning part used the obtained diagrams as input data for PersLay [2]. Each dataset was split into learning and test subsets in the ratio 4:1. The learning loops iterated over mini-batches of 128 elements and passed through the full dataset for a given number of epochs. The success rate was measured on the test subset. The original PersLay module was rewritten in TensorFlow v2, and an RTX 2080 graphics card was used to run the experiments. We used the following PersLay settings, whose technical concepts are explained in [2]:
Adam(Epochs = 300, Learning rate = 0.01)
Coefficients = Linear coefficients
Functional layer = [PeL(dim=50), PeL(dim=50, operationalLayer=PermutationMaxLayer)]
Operation layer = TopK(50)

The PersLay training used the following invariants, compared in Fig. 10:
cloud: the initial cloud A of points, corresponding to the baseline curve in black;
PD0: the 0D persistence diagram PD for distance-based filtrations of sublevel sets, the red curve;
NN(2): for each point $a \in A$, the distances to its two nearest neighbors, the brown curve;
mergegram: the mergegram $\mathrm{MG}(\Delta_{SL}(A))$ of the SL dendrogram, the blue curve above the others.

Fig. 10 shows that the new mergegram outperformed all other invariants on the isometry classification problem. The 0D persistence turned out to be weaker than the pairs of distances to two neighbors.

The topological persistence has found applications in data skeletonization with theoretical guarantees [8, 5]. We plan to extend the experiments in Section 8 to classifying rigid shapes by combining the new mergegram with the 1D persistence, which can be computed in fast $O(n \log n)$ time for any 2D cloud of $n$ points [7, 6]. In conclusion, the paper has extended the 0D persistence to a stronger isometry invariant that keeps the celebrated stability under noise, which is important for applications to noisy data. The initial C++ code for the mergegram is at https://github.com/YuryUoL/Mergegram and will be updated.
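For small clouds, the invariants compared in Fig. 10 can be sketched in pure Python. This is a minimal illustration, not the linked C++ code: all function names are hypothetical, the MST is built by Prim's $O(n^2)$ algorithm instead of the fast dual-tree method of [9], and pd0 uses MST edge lengths as death values (a filtration by balls of radius s would halve them) and omits the infinite pair of the last component. The mergegram function follows Algorithm 8.1 step by step, so several edges of equal length may produce diagonal pairs (d, d).

```python
import math


def mst_edges(points):
    """Prim's algorithm, O(n^2): returns the n-1 MST edges as (length, i, j).
    A simple stand-in for the fast dual-tree MST of [9]."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # distance from each point to the current tree
    link = [-1] * n         # tree point realising best[i]
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges


class UnionFind:
    """Disjoint sets with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Merge the sets of x and y; return the root of the merged set."""
        x, y = self.find(x), self.find(y)
        if self.size[x] < self.size[y]:
            x, y = y, x
        self.parent[y] = x
        self.size[x] += self.size[y]
        return x


def mergegram(points):
    """Algorithm 8.1: one pair (birth, merge level) for every cluster that merges."""
    uf = UnionFind(len(points))
    prev = [0.0] * len(points)   # prev[root] = level at which that cluster was born
    output = []
    for length, a, b in sorted(mst_edges(points)):
        c1, c2 = uf.find(a), uf.find(b)
        output.append((prev[c1], length))
        output.append((prev[c2], length))
        t = uf.union(c1, c2)
        prev[t] = length
    return output


def pd0(points):
    """0D persistence: all points born at 0, one death per MST edge length."""
    return [(0.0, length) for length, _, _ in sorted(mst_edges(points))]


def nn2(points):
    """NN(2): distances from each point to its two nearest neighbours (needs n >= 3)."""
    pairs = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        pairs.append((d[0], d[1]))
    return pairs


# Two collinear clouds with consecutive gaps (1, 2, 1.5) and (1, 1.5, 2).
A = [(0.0,), (1.0,), (3.0,), (4.5,)]
B = [(0.0,), (1.0,), (2.5,), (4.5,)]
print(sorted(pd0(A)) == sorted(pd0(B)))              # True
print(sorted(mergegram(A)) == sorted(mergegram(B)))  # False
```

Both clouds in the example have the same 0D persistence diagram {(0, 1), (0, 1.5), (0, 2)}, but their mergegrams differ: the first contains (0, 1.5) twice, because its two right-hand points merge with each other, whereas the second contains (1, 1.5), because its third point joins a cluster born at level 1.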
We thank all the reviewers for their valuable time and helpful suggestions.

References

[1] Gunnar Carlsson and Facundo Memoli. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425–1470, 2010.
[2] Mathieu Carriere, Frederic Chazal, Yuichi Ike, Theo Lacombe, Martin Royer, and Yuhei Umeda. PersLay: A neural network layer for persistence diagrams and new graph topological signatures. AISTATS, arXiv:1904.09378, 2020.
[3] Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules. Springer, 2016.
[4] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.
[5] Sara Kalisnik, Vitaliy Kurlin, and Davorin Lesnik. A higher-dimensional homologically persistent skeleton. Adv. App. Maths, 102:113–142, 2019.
[6] Vitaliy Kurlin. Auto-completion of contours in sketches, maps and sparse 2d images based on topological persistence. In Proceedings of SYNASC 2014 workshop CTIC: Computational Topology in Image Context, pages 594–601. IEEE, 2014.
[7] Vitaliy Kurlin. A fast and robust algorithm to count topologically persistent holes in noisy clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1458–1463, 2014.
[8] Vitaliy Kurlin. A homologically persistent skeleton is a fast and robust descriptor of interest points in 2d images. In LNCS, Proceedings of CAIP: Computer Analysis of Images and Patterns, volume 9256, pages 606–617, 2015.
[9] William B. March, Parikshit Ram, and Alexander G. Gray. Fast Euclidean minimum spanning tree: algorithm, analysis, and applications. In Proceedings of SIGKDD: Knowledge Discovery and Data Mining, pages 603–612, 2010.
[10] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R. Salakhutdinov, and Alexander J. Smola. Deep sets. In Advances in Neural Information Processing Systems, pages 3391–3401, 2017.