The mergegram of a dendrogram and its stability
Yury Elkin
Materials Innovation Factory and Computer Science department, University of Liverpool, [email protected]
Vitaliy Kurlin
Materials Innovation Factory and Computer Science department, University of Liverpool, [email protected]
Abstract
This paper extends the key concept of persistence within Topological Data Analysis (TDA) in a new direction. TDA quantifies topological shapes hidden in unorganized data such as clouds of unordered points. In the 0-dimensional case the distance-based persistence is determined by a single-linkage (SL) clustering of a finite set in a metric space. Equivalently, the 0D persistence captures only edge-lengths of a Minimum Spanning Tree (MST). Both SL dendrogram and MST are unstable under perturbations of points. We define the new stable-under-noise mergegram, which outperforms previous isometry invariants on a classification of point clouds by PersLay.
Theory of computation → Computational geometry
Keywords and phrases clustering dendrogram, topological data analysis, persistence, stability
Digital Object Identifier
Funding
The authors were supported by the £3.5M EPSRC grant EP/R018472/1 (2018-2023)
TDA is now expanding towards machine learning and statistics due to stability that was proved in a very general form by Chazal et al. [3]. The key idea of TDA is to view a given cloud of points across all scales $s$, e.g. by blurring given points to balls of a variable radius $s$. The resulting evolution of topological shapes is summarized by a persistence diagram.

Example 1.1. Fig. 1 illustrates the key concepts (before formal definitions) for the point set $A = \{0, 4, 6, 9, 10\}$ in the real line $\mathbb{R}$. Imagine that we gradually blur original data points by growing balls of the same radius $s$ around the given points. The balls of the closest points 9, 10 start overlapping at the scale $s = 0.5$, when these points merge into one cluster $\{9, 10\}$. This merger is shown by blue arcs joining at the node at $s = 0.5$ in the dendrogram of Fig. 1.

Figure 1. Top: the 5-point cloud $A = \{0, 4, 6, 9, 10\} \subset \mathbb{R}$. Bottom from left to right: single-linkage dendrogram $\Delta_{SL}(A)$ from Definition 2.1, the 0D persistence diagram PD from Definition 4.4 and the new mergegram MG from Definition 3.4, where the red color shows dots of multiplicity 2.

© Yury Elkin, Vitaliy Kurlin; licensed under Creative Commons License CC-BY. 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Editors: Javier Esparza and Daniel Král'; Article No. 56; pp. 56:1–56:13. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

The persistence diagram PD in the bottom middle picture of Fig. 1 represents this merger by the dot $(0, 0.5)$ meaning that a singleton cluster of (say) point 9 was born at the scale $s = 0$ and then died later at $s = 0.5$. When the clusters $\{4, 6\}$ and $\{9, 10\}$ merge at $s = 1.5$, this event was previously encoded in the persistence diagram by the single dot $(0, 1.5)$ meaning that one cluster inherited from (say) point 10 was born at $s = 0$ and has died at $s = 1.5$. The new mergegram represents the same merger by two dots. The dot $(0.5, 1.5)$ means that the cluster $\{9, 10\}$ merged at the current scale $s = 1.5$ was previously formed at the smaller scale $s = 0.5$. The dot $(1, 1.5)$ means that another cluster $\{4, 6\}$ merged at the current scale $s = 1.5$ was formed at the scale $s = 1$.

Every arc in the single-linkage dendrogram between nodes at scales $b$ and $d$ contributes one dot $(b, d)$ to the mergegram, e.g. both singleton sets $\{9\}$, $\{10\}$ merging at $s = 0.5$ contribute two dots $(0, 0.5)$ or one dot of multiplicity 2 shown in red, see Fig. 1.

Example 1.1 shows that the mergegram MG retains more geometric information of a set $A$ than the persistence diagram PD. It turns out that this new intermediate object (larger than PD and smaller than a full dendrogram) enjoys the stability of persistence, which makes MG useful for analysing noisy data in all cases when distance-based 0D persistence is used.

Here is the summary of new contributions to Topological Data Analysis.
• Definition 3.4 introduces the concept of a mergegram for any dendrogram of clustering.
• Theorem 5.3 and Example 5.4 justify that the mergegram of a single-linkage dendrogram is strictly stronger than the 0D persistence of a distance-based filtration of sublevel sets.
• Theorem 7.4 proves that the mergegram of any single-linkage dendrogram is stable in the bottleneck distance under perturbations of a finite set in the Hausdorff distance.
• Theorem 8.2 shows that the mergegram can be computed in a near linear time.
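The blurring process of Example 1.1 can be replayed with a few lines of Python. This is only a minimal sketch under two assumptions: the point set is taken to be $A = \{0, 4, 6, 9, 10\}$, and two points on the line are placed in one cluster as soon as their closed balls of radius $s$ overlap, i.e. when their gap is at most $2s$. The helper name `clusters_at_scale` is hypothetical and does not come from the paper.

```python
def clusters_at_scale(points, s):
    """Clusters of points on the real line at the scale s.

    Assumption: neighbours share a cluster when their closed balls of
    radius s overlap, i.e. when their gap is at most 2 * s.
    """
    pts = sorted(points)
    clusters = [[pts[0]]]
    for prev, cur in zip(pts, pts[1:]):
        if cur - prev <= 2 * s:
            clusters[-1].append(cur)   # balls overlap: extend the current cluster
        else:
            clusters.append([cur])     # gap too large: start a new cluster
    return clusters


A = [0, 4, 6, 9, 10]  # assumed point set of Example 1.1
for s in (0, 0.5, 1, 1.5, 2):
    print(s, clusters_at_scale(A, s))
```

The printed partitions change exactly at the scales 0.5, 1, 1.5 and 2, which are the merge scales discussed in the example. On a line it suffices to compare consecutive points; in a general metric space one would need a proper single-linkage computation.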
The aim of clustering is to split a given set of points into clusters such that points within one cluster are more similar to each other than points from different clusters.

A clustering problem can be made exact by specifying a distance between given points and restrictions on outputs, e.g. a number of clusters or a cost function to minimize.

All hierarchical clustering algorithms can output a hierarchy of clusters or a dendrogram visualising mergers of clusters as explained later in Definition 3.2. Here we introduce only the simplest single-linkage clustering, which plays the central role in the paper.
Definition 2.1 (single-linkage clustering). Let $A$ be a finite set in a metric space $X$ with a distance $d : X \times X \to [0, +\infty)$. Given a distance threshold, which will be called a scale $s$, any points $a, b \in A$ should belong to one SL cluster if and only if there is a finite sequence $a = a_1, \dots, a_m = b \in A$ such that any two successive points have a distance at most $s$, i.e. $d(a_i, a_{i+1}) \le s$ for $i = 1, \dots, m - 1$. Let $\Delta_{SL}(A; s)$ denote the collection of SL clusters at the scale $s$. For $s = 0$, any point $a \in A$ forms a singleton cluster $\{a\}$. Representing each cluster from $\Delta_{SL}(A; s)$ over all $s \ge 0$ yields the single-linkage dendrogram $\Delta_{SL}(A)$ visualizing how clusters merge, see the first bottom picture in Fig. 1.

Another way to visualize SL clusters is to build a Minimum Spanning Tree below.
Definition 2.2 (Minimum Spanning Tree MST($A$)). The Minimum Spanning Tree MST($A$) of a finite set $A$ in a metric space $X$ with a distance $d$ is a tree (a connected graph without cycles) that has the vertex set $A$ and the minimum total length of edges. We assume that the length of any edge between vertices $a, b \in A$ is measured as $d(a, b)$.

A review of the relevant past work on persistence diagrams is postponed to section 4, which introduces more auxiliary notions. A persistence diagram consists of dots $(b, d) \in \mathbb{R}^2$ whose birth/death coordinates represent a life interval $[b, d)$ of a homology class, e.g. a connected component in a Vietoris-Rips filtration, see the bottom middle picture in Fig. 1.

Persistence diagrams are isometry invariants that are stable under noise in the sense that a topological space and its noisy point sample have close persistence diagrams. This stability under noise allows us to classify continuous shapes by using only their discrete samples.

Imagine that several rigid shapes are sparsely represented by a few salient points, e.g. corners or local maxima of a distance function. Translations and rotations of these point clouds do not change the underlying shapes. Hence clouds should be classified modulo isometries that preserve distances between points. The important problem is to recognize a shape, e.g. within a given set of representatives, from its sparse point sample with noise. This paper solves the problem by computing isometry invariants, namely the new mergegram, the 0D persistence and the pair-set of distances to two nearest neighbors for each point.

Since all dots in a persistence diagram are unordered, our experimental section 8 uses a neural network whose output is invariant under permutations of input points by construction. PersLay [2] is a collection of permutation invariant neural network layers, i.e. functions on sets of points in $\mathbb{R}^n$ that give the same output regardless of the order in which the points are inserted. PersLay extends the neural network layers introduced in Deep Sets [10]. PersLay introduces new layers to specially handle persistence diagrams, as well as a new form of representing such layers. Each layer is a combination of a coefficient layer $\omega(p) : \mathbb{R}^n \to \mathbb{R}$, a point transformation $\phi(p) : \mathbb{R}^n \to \mathbb{R}^q$ and a permutation invariant layer op to retrieve the final output

$$\mathrm{PersLay}(\mathrm{diagram}) = \mathrm{op}(\{\omega(p)\phi(p)\}), \text{ where } p \in \mathrm{diagram} \text{ (any set of points in } \mathbb{R}^n).$$

The section introduces a merge module (a family of vector spaces with consistent linear maps) and a mergegram (a diagram of points in $\mathbb{R}^2$ representing a merge module).

Definition 3.1 (partition set $P(A)$). For any set $A$, a partition of $A$ is a finite collection of non-empty disjoint subsets $A_1, \dots, A_k \subset A$ whose union is $A$. The single-block partition of $A$ consists of the set $A$ itself. The partition set $P(A)$ consists of all partitions of $A$.

If $A = \{1, 2, 3\}$, then $(\{1, 2\}, \{3\})$ is a partition of $A$, but $(\{1\}, \{2\})$ and $(\{1, 2\}, \{1, 3\})$ are not. In this case the partition set $P(A)$ consists of 5 partitions

$$(\{1\}, \{2\}, \{3\}), \quad (\{1, 2\}, \{3\}), \quad (\{1, 3\}, \{2\}), \quad (\{2, 3\}, \{1\}), \quad (\{1, 2, 3\}).$$

Definition 3.2 below extends the concept of a dendrogram from [1, section 3.1] to arbitrary (possibly, infinite) sets $A$. Since every partition of $A$ is finite by Definition 3.1, we don't need to add that an initial partition of $A$ is finite. Non-singleton sets are now allowed.

Figure 2. The dendrogram $\Delta$ on $A = \{1, 2, 3\}$ and its mergegram MG($\Delta$) from Definition 3.4. The partition $\Delta(A; 0)$ at the scale $s = 0$ consists of $\{1\}, \{2\}, \{3\}$, the partition $\Delta(A; 1)$ at the scale $s = 1$ consists of $\{1, 2\}, \{3\}$, the partition $\Delta(A; 2)$ at the scale $s = 2$ consists of $\{1, 2, 3\}$; they are connected by the maps $\Delta_0^1 : \Delta(A; 0) \to \Delta(A; 1)$ and $\Delta_1^2 : \Delta(A; 1) \to \Delta(A; 2)$.

Definition 3.2 (dendrogram of merge sets). A dendrogram over any set $A$ is a function $\Delta : [0, \infty) \to P(A)$ of a scale $s \ge 0$ satisfying the following conditions.
(3.2a) There exists a scale $r \ge 0$ such that $\Delta(A; s)$ is the single block partition for all $s \ge r$.
(3.2b) If $s \le t$, then $\Delta(A; s)$ refines $\Delta(A; t)$, i.e. any set from $\Delta(A; s)$ is a subset of some set from $\Delta(A; t)$. These inclusions of subsets of $A$ induce the natural map $\Delta_s^t : \Delta(A; s) \to \Delta(A; t)$.
(3.2c) There are finitely many merge scales $s_i$ such that $s_0 = 0$ and $s_{i+1} = \sup\{ s \mid \text{the map } \Delta_{s_i}^{t} \text{ is the identity for } t \in [s_i, s) \}$, $i = 0, \dots, m - 1$.

Since $\Delta(A; s_i) \to \Delta(A; s_{i+1})$ is not an identity map, there is a subset $B \in \Delta(A; s_{i+1})$ whose preimage consists of at least two subsets from $\Delta(A; s_i)$. This subset $B \subset A$ is called a merge set and its birth scale is $s_{i+1}$. All sets of $\Delta(A; 0)$ are merge sets at the birth scale 0. The life($B$) of a merge set $B$ with a birth scale $b$ is the interval $[b, t)$ from its birth scale $b$ to its death scale $t = \sup\{ s \mid \Delta_b^s(B) = B \}$.

Dendrograms are usually represented as trees whose nodes correspond to all sets from the partitions $\Delta(A; s_i)$ at merge scales. Edges of such a tree connect any set $B \in \Delta(A; s_{i+1})$ with its preimages under $\Delta(A; s_i) \to \Delta(A; s_{i+1})$. Fig. 2 shows the dendrogram on $A = \{1, 2, 3\}$.

In the dendrogram above, the partition $\Delta(A; 1)$ consists of $\{1, 2\}$ and $\{3\}$. The maps $\Delta_s^t$ induced by inclusions respect the compositions in the sense that $\Delta_s^t \circ \Delta_r^s = \Delta_r^t$ for any $r \le s \le t$, e.g. $\Delta_0^1(\{1\}) = \{1, 2\} = \Delta_0^1(\{2\})$ and $\Delta_0^1(\{3\}) = \{3\}$, i.e. $\Delta_0^1$ is a well-defined map from the partition $\Delta(A; 0)$ in 3 singleton sets to $\Delta(A; 1)$, but isn't an identity.

At the scale $s = 0$ the merge sets $\{1\}, \{2\}$ have life $= [0, 1)$, the merge set $\{3\}$ has life $= [0, 2)$. At the scale $s = 1$ the only merge set $\{1, 2\}$ has life $= [1, 2)$. At the scale $s = 2$ the only merge set $\{1, 2, 3\}$ has life $= [2, +\infty)$.
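The lives listed for the dendrogram on $\{1, 2, 3\}$ can be recovered mechanically from its partitions at the merge scales. The sketch below is illustrative only: the input format (a list of `(scale, partition)` pairs sorted by scale) and the name `mergegram_from_partitions` are assumptions made for this example, not notation from the paper. Each life $[b, d)$ is returned as a pair $(b, d)$.

```python
import math


def mergegram_from_partitions(dendrogram):
    """Return the sorted list of (birth, death) pairs of all merge sets.

    dendrogram: list of (scale, partition) pairs sorted by scale, with one
    entry per merge scale; a partition is a list of sets (assumed format).
    """
    birth = {}   # merge set that is currently alive -> its birth scale
    dots = []
    for scale, partition in dendrogram:
        current = {frozenset(block) for block in partition}
        # A set that is no longer a block of the partition dies at this scale.
        for block in list(birth):
            if block not in current:
                dots.append((birth.pop(block), scale))
        # A block that was not alive before is born at this scale.
        for block in current:
            birth.setdefault(block, scale)
    # Sets that never merge again live forever.
    dots.extend((b, math.inf) for b in birth.values())
    return sorted(dots)


dendro = [(0, [{1}, {2}, {3}]), (1, [{1, 2}, {3}]), (2, [{1, 2, 3}])]
print(mergegram_from_partitions(dendro))
```

For the dendrogram of Fig. 2 this gives two copies of $(0, 1)$, then $(0, 2)$, $(1, 2)$ and $(2, +\infty)$, matching the lives above.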
The notation $\Delta$ is motivated as the first (Greek) letter in the word dendrogram and by a $\Delta$-shape of a typical tree above.

Condition (3.2a) means that a partition of $A$ is trivial for all large scales $s$. Condition (3.2b) says that when the scale $s$ is increasing, sets from a partition $\Delta(A; s)$ can only merge with each other, but cannot split. Condition (3.2c) implies that there are only finitely many mergers, when two or more subsets of $A$ merge into a larger merge set.

Lemma 3.3 (single-linkage dendrogram). Given a metric space $(X, d)$ and a finite set $A \subset X$, the single-linkage dendrogram $\Delta_{SL}(A)$ from Definition 2.1 satisfies Definition 3.2.

Proof. Since $A$ is finite, there are only finitely many inter-point distances within $A$, which implies conditions (3.2a,c). Let $f(p) : X \to \mathbb{R}$ be the distance from a point $p \in X$ to (the closest point of) $A$. Condition (3.2b) follows from the inclusions $f^{-1}[0, s) \subseteq f^{-1}[0, t)$ for $s \le t$.

A mergegram represents lives of merge sets by dots with two coordinates (birth, death).

Definition 3.4 (mergegram MG($\Delta$)). The mergegram of a dendrogram $\Delta$ from Definition 3.2 has the dot (birth, death) in $\mathbb{R}^2$ for each merge set $A$ of $\Delta$ with life($A$) = [birth, death). If any life interval appears $k$ times, the dot (birth, death) has the multiplicity $k$ in MG($\Delta$).

For simplicity, this paper considers vector spaces with coefficients (of linear combinations of vectors) only in $\mathbb{Z}_2 = \{0, 1\}$, which can be replaced by any field.

Definition 3.5 (merge module $M(\Delta)$). For any dendrogram $\Delta$ on a set $X$ from Definition 3.2, the merge module $M(\Delta)$ consists of the vector spaces $M_s(\Delta)$, $s \in \mathbb{R}$, and linear maps $m_s^t : M_s(\Delta) \to M_t(\Delta)$, $s \le t$. For any $s \in \mathbb{R}$ and $A \in \Delta(s)$, the space $M_s(\Delta)$ has the generator or a basis vector $[A] \in M_s(\Delta)$. For $s < t$ and any set $A \in \Delta(s)$, if the image of $A$ under $\Delta_s^t$ coincides with $A \subset X$, i.e. $\Delta_s^t(A) = A$, then $m_s^t([A]) = [A]$, else $m_s^t([A]) = 0$.

Figure 3. The merge module $M(\Delta)$ of the dendrogram $\Delta$ on the set $X = \{1, 2, 3\}$ in Fig. 2. The space $M_s(\Delta)$ is $\mathbb{Z}_2 \oplus \mathbb{Z}_2 \oplus \mathbb{Z}_2$ with the generators $[\{1\}], [\{2\}], [\{3\}]$ at the scale $s = 0$, $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ at the scale $s = 1$ and $\mathbb{Z}_2$ at the scale $s = 2$; these spaces are connected by the maps $m_0^1$, $m_1^2$ and $m_2^{+\infty}$.

Example 3.6. Fig. 4 shows the metric space $X = \{a, b, c, p, q\}$ with distances defined by the shortest path metric induced by the specified edge-lengths, see the distance matrix.

      a   b   c   p   q
  a   0   6   7   9   9
  b   6   0   3   5   5
  c   7   3   0   6   6
  p   9   5   6   0   4
  q   9   5   6   4   0

Figure 4. The set $X = \{a, b, c, p, q\}$ has the distance matrix defined by the shortest path metric.

Figure 5. Left: the dendrogram $\Delta$ for the single-linkage clustering of the 5-point set $X = \{a, b, c, p, q\}$ in Fig. 4. Right: the mergegram MG($\Delta$), red dots have multiplicity 2.
The dendrogram $\Delta$ in the first picture of Fig. 5 generates the mergegram as follows:
each of the singleton sets $\{b\}$ and $\{c\}$ has the dot $(0, 1.5)$, so its multiplicity is 2;
each of the singleton sets $\{p\}$ and $\{q\}$ has the dot $(0, 2)$, so its multiplicity is 2;
the singleton set $\{a\}$ has the dot $(0, 3)$;
the merge set $\{b, c\}$ has the dot $(1.5, 2.5)$;
the merge set $\{p, q\}$ has the dot $(2, 2.5)$;
the merge set $\{b, c, p, q\}$ has the dot $(2.5, 3)$;
the merge set $\{a, b, c, p, q\}$ has the dot $(3, +\infty)$.

This section introduces the key concepts from the thorough review by Chazal et al. [3]. As will become clear soon, the merge module of any dendrogram belongs to a wider class below.

Definition 4.1 (persistence module $V$). A persistence module $V$ over the real numbers $\mathbb{R}$ is a family of vector spaces $V_t$, $t \in \mathbb{R}$, with linear maps $v_s^t : V_s \to V_t$, $s \le t$, such that $v_t^t$ is the identity map on $V_t$ and the composition is respected: $v_s^t \circ v_r^s = v_r^t$ for any $r \le s \le t$.

The set of real numbers can be considered as a category $\mathbb{R}$ in the following sense. The objects of $\mathbb{R}$ are all real numbers. Any two real numbers such that $a \le b$ define a single morphism $a \to b$. The composition of morphisms $a \to b$ and $b \to c$ is the morphism $a \to c$. In this language, a persistence module is a functor from $\mathbb{R}$ to the category of vector spaces.

A basic example of $V$ is an interval module. An interval $J$ between points $p < q$ in the line $\mathbb{R}$ can be one of the following types: closed $[p, q]$, open $(p, q)$ and half-open or half-closed $[p, q)$ and $(p, q]$. It is convenient to encode types of endpoints by $\pm$ superscripts as follows:

$$[p^-, q^+] := [p, q], \quad [p^+, q^-] := (p, q), \quad [p^+, q^+] := (p, q], \quad [p^-, q^-] := [p, q).$$

The endpoints $p, q$ can also take the infinite values $\pm\infty$, but without superscripts.

Example 4.2 (interval module $I(J)$). For any interval $J \subset \mathbb{R}$, the interval module $I(J)$ is the persistence module defined by the following vector spaces $I_s$ and linear maps $i_s^t : I_s \to I_t$:

$$I_s = \begin{cases} \mathbb{Z}_2, & \text{for } s \in J, \\ 0, & \text{otherwise}; \end{cases} \qquad i_s^t = \begin{cases} \mathrm{id}, & \text{for } s, t \in J, \\ 0, & \text{otherwise} \end{cases} \quad \text{for any } s \le t.$$

The direct sum $W = U \oplus V$ of persistence modules $U$, $V$ is defined as the persistence module with the vector spaces $W_s = U_s \oplus V_s$ and linear maps $w_s^t = u_s^t \oplus v_s^t$.

We illustrate the abstract concepts above using geometric constructions of Topological Data Analysis. Let $f : X \to \mathbb{R}$ be a continuous function on a topological space. Its sublevel sets $X_s^f = f^{-1}((-\infty, s])$ form nested subspaces $X_s^f \subset X_t^f$ for any $s \le t$.
The inclusions of the sublevel sets respect compositions similarly to a dendrogram $\Delta$ in Definition 3.2.

On a metric space $X$ with a distance function $d : X \times X \to [0, +\infty)$, a typical example of a function $f : X \to \mathbb{R}$ is the distance to a finite set of points $A \subset X$. More specifically, for any point $p \in X$, let $f(p)$ be the distance from $p$ to (a closest point of) $A$. For any $r \ge 0$, the preimage $X_r^f = f^{-1}((-\infty, r]) = \{ q \in X \mid d(q, A) \le r \}$ is the union of closed balls that have the radius $r$ and centers at all points $p \in A$. For example, $X_0^f = f^{-1}((-\infty, 0]) = A$ and $X_{+\infty}^f = f^{-1}(\mathbb{R}) = X$.

If we consider any continuous function $f : X \to \mathbb{R}$, we have the inclusion $X_s^f \subset X_r^f$ for any $s \le r$. Hence all sublevel sets $X_s^f$ form a nested sequence of subspaces within $X$. The above construction of a filtration $\{X_s^f\}$ can be considered as a functor from $\mathbb{R}$ to the category of topological spaces. Below we discuss the most practically used case of dimension 0.

Example 4.3 (persistent homology). For any topological space $X$, the 0-dimensional homology $H_0(X)$ is the vector space (with coefficients $\mathbb{Z}_2$) generated by all connected components of $X$. Let $\{X_s\}$ be any filtration of nested spaces, e.g. sublevel sets $X_s^f$ based on a continuous function $f : X \to \mathbb{R}$. The inclusions $X_s \subset X_r$ for $s \le r$ induce the linear maps between homology groups $H_0(X_s) \to H_0(X_r)$ and define the persistent homology $\{H_0(X_s)\}$, which satisfies the conditions of a persistence module from Definition 4.1.

If $X$ is a finite set of $m$ points, then $H_0(X)$ is the direct sum $\mathbb{Z}_2^m$ of $m$ copies of $\mathbb{Z}_2$.

The persistence modules that can be decomposed as direct sums of interval modules can be described in a very simple combinatorial way by persistence diagrams of dots in $\mathbb{R}^2$.

Definition 4.4 (persistence diagram PD($V$)). Let a persistence module $V$ be decomposed as a direct sum of interval modules from Example 4.2: $V \cong \bigoplus_{l \in L} I(p_l^*, q_l^*)$, where $*$ is $+$ or $-$. The persistence diagram PD($V$) is the multiset $\mathrm{PD}(V) = \{ (p_l, q_l) \mid l \in L \} \setminus \{ p = q \} \subset \mathbb{R}^2$.

The 0-dimensional persistent homology of a space $X$ with a continuous function $f : X \to \mathbb{R}$ will be denoted by $\mathrm{PD}\{H_0(X_s^f)\}$. Lemma 7.1 will prove that the merge module $M(\Delta)$ of any dendrogram $\Delta$ is also decomposable into interval modules. Hence the mergegram MG($\Delta$) from Definition 3.4 can be interpreted as the persistence diagram of the merge module $M(\Delta)$.

Let $f : X \to \mathbb{R}$ be the distance function to a finite subset $A$ of a metric space $(X, d)$. The persistent homology $\{H_k(X_s^f)\}$ in any dimension $k$ is invariant under isometries of $X$.

Moreover, the persistence diagrams of very different shapes, e.g. topological spaces and their discrete samples, can be easily compared by the bottleneck distance in Definition 6.3.

Practical applications of persistence are justified by Stability Theorem 6.4 saying that the persistence diagram continuously changes under perturbations of a given filtration or an initial point set. A similar stability of mergegrams will be proved in Theorem 7.4.

This section shows that the mergegram $\mathrm{MG}(\Delta_{SL}(A))$ has more isometry information about the subset $A \subset X$ than the 0-dimensional persistent homology $\{H_0(X_s^f)\}$.

Theorem 5.3 shows how to obtain the 0D persistence $\mathrm{PD}\{H_0(X_s^f)\}$ from $\mathrm{MG}(\Delta_{SL}(A))$, where $f : X \to \mathbb{R}$ is the distance to a finite subset $A \subset X$. Example 5.4 builds two 4-point sets in $\mathbb{R}$ whose persistence diagrams are identical, but their mergegrams are different.

We start from folklore Claims 5.1-5.2, which interpret the 0D persistence $\mathrm{PD}\{H_0(X_s^f)\}$ using the classical concepts of the single-linkage dendrogram and Minimum Spanning Tree.

Claim 5.1 (0D persistence from $\Delta_{SL}$). For a finite set $A$ in a metric space $(X, d)$, let $f : X \to \mathbb{R}$ be the distance to $A$.
In the single-linkage dendrogram $\Delta_{SL}(A)$, let $0 < s_1 < \dots < s_m < s_{m+1} = +\infty$ be all distinct merge scales. If $k \ge 2$ subsets of $A$ merge into a larger subset of $A$ at a scale $s_i$, the multiplicity of $s_i$ is $\mu_i = k - 1$. Then the persistence diagram $\mathrm{PD}\{H_0(X_s^f)\}$ consists of the dots $(0, s_i)$ with multiplicities $\mu_i$, $i = 1, \dots, m + 1$.

Claim 5.2 (0D persistence from MST). For a set $A$ of $n$ points in a metric space $(X, d)$, let $f : X \to \mathbb{R}$ be the distance to $A$. Let a Minimum Spanning Tree MST($A$) have edge-lengths $l_1 \le \dots \le l_{n-1}$. The persistence diagram $\mathrm{PD}\{H_0(X_s^f)\}$ consists of the $n - 1$ dots $(0, 0.5\, l_i)$ counted with multiplicities if some edge-lengths are equal, plus the infinite dot $(0, +\infty)$.
Theorem 5.3 (0D persistence from a mergegram). For a finite set $A$ in a metric space $(X, d)$, let $f : X \to \mathbb{R}$ be the distance to $A$. Let the mergegram $\mathrm{MG}(\Delta_{SL}(A))$ be a multiset $\{(b_i, d_i)\}_{i=1}^k$, where some dots can be repeated. Then the persistence diagram $\mathrm{PD}\{H_0(X_s^f)\}$ is the difference of the multisets $\{(0, d_i)\}_{i=1}^k - \{(0, b_i)\}_{i=1}^k$ containing each dot $(0, s)$ exactly $\#d - \#b$ times, where $\#b$ is the number of births $b_i = s$ and $\#d$ is the number of deaths $d_i = s$. All trivial dots $(0, 0)$ are ignored, alternatively we take $\{(0, b_i)\}_{i=1}^k$ only with $b_i > 0$.

Proof.
In the language of Claim 5.1, let at a scale $s > 0$ of multiplicity $\mu$ exactly $\mu + 1$ subsets merge into a set $B \in \Delta_{SL}(A; s)$. By Claim 5.1 this set $B$ contributes $\mu$ dots $(0, s)$ to the persistence diagram $\mathrm{PD}\{H_0(X_s^f)\}$. By Definition 3.4 the same set $B$ contributes $\mu + 1$ dots of the form $(b_i, s)$, $i = 1, \dots, \mu + 1$, corresponding to the $\mu + 1$ sets that merge into $B$ at the scale $s$. Moreover, the set $B$ itself will merge later into a larger set, which creates one extra dot $(s, d) \in \mathrm{MG}(\Delta_{SL}(A))$. The exceptional case $B = A$ corresponds to $d = +\infty$.

If we remove one dot $(0, s)$ from the $\mu + 1$ dots counted above as expected in the difference $\{(0, d_i)\}_{i=1}^k - \{(0, b_i)\}_{i=1}^k$ of multisets, we get exactly $\mu$ dots $(0, s) \in \mathrm{PD}\{H_0(X_s^f)\}$. The required formula has been proved for contributions of any merge set $B \subset A$.

In Example 1.1 the mergegram in the last picture of Fig. 1 is the multiset of 9 dots:

$$\mathrm{MG}(\Delta_{SL}(A)) = \{(0, 0.5), (0, 0.5), (0, 1), (0, 1), (0.5, 1.5), (1, 1.5), (0, 2), (1.5, 2), (2, +\infty)\}.$$

Taking the difference of multisets and ignoring trivial dots $(0, 0)$, we get

$$\mathrm{PD}\{H_0(X_s^f)\} = \{(0, 0.5), (0, 0.5), (0, 1), (0, 1), (0, 1.5), (0, 1.5), (0, 2), (0, 2), (0, +\infty)\} - \{(0, 0.5), (0, 1), (0, 1.5), (0, 2)\} = \{(0, 0.5), (0, 1), (0, 1.5), (0, 2), (0, +\infty)\}$$

as in Fig. 1.

Example 5.4 (the mergegram is stronger than 0D persistence). Fig. 6 and 7 show the dendrograms, identical 0D persistence diagrams and different mergegrams for the two 4-point sets $A$ and $B$ in $\mathbb{R}$ drawn in these figures. This example together with Theorem 5.3 justifies that the new mergegram is strictly stronger than 0D persistence as an isometry invariant.

Figure 6. Left: single-linkage dendrogram $\Delta_{SL}(A)$ for the 4-point set $A \subset \mathbb{R}$. Middle: the 0D persistence diagram for the sublevel filtration of the distance to $A$. Right: mergegram $\mathrm{MG}(\Delta_{SL}(A))$.

Figure 7. Left: single-linkage dendrogram $\Delta_{SL}(B)$ for the 4-point set $B \subset \mathbb{R}$. Middle: the 0D persistence diagram for the sublevel filtration of the distance to $B$. Right: mergegram $\mathrm{MG}(\Delta_{SL}(B))$.

Definition 6.1 introduces homomorphisms between persistence modules, which are needed to state the stability of persistence diagrams $\mathrm{PD}\{H_0(X_s^f)\}$ under perturbations of a function $f : X \to \mathbb{R}$. This result will imply a similar stability for the mergegram $\mathrm{MG}(\Delta_{SL}(A))$ for the dendrogram $\Delta_{SL}(A)$ of the single-linkage clustering of a set $A$ within a metric space $X$.

Definition 6.1 (a homomorphism of a degree $\delta$ between persistence modules). Let $U$ and $V$ be persistence modules over $\mathbb{R}$. A homomorphism $U \to V$ of degree $\delta \in \mathbb{R}$ is a collection of linear maps $\phi_t : U_t \to V_{t+\delta}$, $t \in \mathbb{R}$, such that the following diagram commutes for all $s \le t$, i.e. $\phi_t \circ u_s^t = v_{s+\delta}^{t+\delta} \circ \phi_s$:

$$\begin{array}{ccc} U_s & \xrightarrow{\; u_s^t \;} & U_t \\ \downarrow \phi_s & & \downarrow \phi_t \\ V_{s+\delta} & \xrightarrow{\; v_{s+\delta}^{t+\delta} \;} & V_{t+\delta} \end{array}$$

Let $\mathrm{Hom}^\delta(U, V)$ be all homomorphisms $U \to V$ of degree $\delta$. Persistence modules $U$, $V$ are isomorphic if they have inverse homomorphisms $U \to V$ and $V \to U$ of degree $\delta = 0$.

For a persistence module $V$ with maps $v_s^t : V_s \to V_t$, the simplest example of a homomorphism of a degree $\delta \ge 0$ is $1_V^\delta : V \to V$ defined by the maps $v_s^{s+\delta}$, $s \in \mathbb{R}$. So the maps $v_s^t$ defining the structure of $V$ shift all vector spaces $V_s$ by the difference of scale $\delta = t - s$.

The concept of interleaved modules below is an algebraic generalization of a geometric perturbation of a set $X$ in terms of (the homology of) its sublevel sets $X_s$.

Definition 6.2 (interleaving distance ID). Persistence modules $U$ and $V$ are $\delta$-interleaved if there are homomorphisms $\phi \in \mathrm{Hom}^\delta(U, V)$ and $\psi \in \mathrm{Hom}^\delta(V, U)$ such that $\phi \circ \psi = 1_V^{2\delta}$ and $\psi \circ \phi = 1_U^{2\delta}$. The interleaving distance is $\mathrm{ID}(U, V) = \inf\{ \delta \ge 0 \mid U \text{ and } V \text{ are } \delta\text{-interleaved} \}$.
If $f, g : X \to \mathbb{R}$ are continuous functions such that $\|f - g\|_\infty \le \delta$ in the $L_\infty$-distance, the persistence modules $H_k\{f^{-1}(-\infty, s]\}$, $H_k\{g^{-1}(-\infty, s]\}$ are $\delta$-interleaved for any $k$ [4]. The last conclusion was extended to persistence diagrams in terms of the bottleneck distance below.

Definition 6.3 (bottleneck distance BD). Let multisets $C, D$ contain finitely many points $(p, q) \in \mathbb{R}^2$, $p < q$, of finite multiplicity and all diagonal points $(p, p) \in \mathbb{R}^2$ of infinite multiplicity. For $\delta \ge 0$, a $\delta$-matching is a bijection $h : C \to D$ such that $|h(a) - a|_\infty \le \delta$ in the $L_\infty$-distance on the plane for any point $a \in C$. The bottleneck distance between persistence modules $U$, $V$ is $\mathrm{BD}(U, V) = \inf\{ \delta \mid \text{there is a } \delta\text{-matching between } \mathrm{PD}(U) \text{ and } \mathrm{PD}(V) \}$.

The original stability of persistence for sequences of sublevel sets was extended as Theorem 6.4 to q-tame persistence modules. Intuitively, a persistence module $V$ is q-tame if any non-diagonal square in the persistence diagram PD($V$) contains only finitely many points, see [3, section 2.8]. Any finitely decomposable persistence module is q-tame.

Theorem 6.4 (stability of persistence modules). [3, isometry theorem 4.11] Let $U$ and $V$ be q-tame persistence modules. Then $\mathrm{ID}(U, V) = \mathrm{BD}(\mathrm{PD}(U), \mathrm{PD}(V))$, where ID is the interleaving distance, BD is the bottleneck distance between persistence modules.
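For very small diagrams the bottleneck distance of Definition 6.3 can be evaluated by brute force, which may help to check examples by hand. The sketch below is not the method used in the experiments of this paper: it tries all bijections, so its running time is factorial in the number of dots, and practical software uses matching algorithms instead. The names `linf` and `bottleneck` are hypothetical. The infinitely many diagonal points are modelled in the usual way: each diagram is padded with one diagonal slot per dot of the other diagram, a dot $(p, q)$ sent to the diagonal costs $(q - p)/2$, and matching two diagonal slots costs 0.

```python
import math
from itertools import permutations


def linf(a, b):
    """L-infinity distance between two dots; equal infinite coordinates differ by 0."""
    return max(0.0 if x == y else abs(x - y) for x, y in zip(a, b))


def bottleneck(C, D):
    """Brute-force bottleneck distance between two small lists of off-diagonal dots."""
    C, D = list(C), list(D)
    n, m = len(C), len(D)

    def cost(i, j):
        # Rows: dots of C followed by m diagonal slots.
        # Columns: dots of D followed by n diagonal slots.
        if i < n and j < m:
            return linf(C[i], D[j])
        if i < n:                         # a dot of C goes to the diagonal
            return (C[i][1] - C[i][0]) / 2
        if j < m:                         # a dot of D goes to the diagonal
            return (D[j][1] - D[j][0]) / 2
        return 0.0                        # diagonal slot matched to diagonal slot

    size = n + m
    best = math.inf
    for perm in permutations(range(size)):
        worst = max((cost(i, perm[i]) for i in range(size)), default=0.0)
        best = min(best, worst)
    return best


print(bottleneck([(0, 1)], [(0, 1.5)]))
print(bottleneck([(0, 4)], [(0, 1)]))
```

In the second call it is cheaper to send both dots to the diagonal (cost 2) than to match them with each other (cost 3), which shows why the diagonal points are needed in the definition.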
In a dendrogram $\Delta$ from Definition 3.2, any merge set $A$ of $\Delta$ has a life interval life($A$) $= [b, d)$ from its birth scale $b$ to its death scale $d$. Lemmas 7.1 and 7.3 are proved in appendices.

Lemma 7.1 (merge module decomposition). For any dendrogram $\Delta$ in the sense of Definition 3.2, the merge module $M(\Delta) \cong \bigoplus_A I(\mathrm{life}(A))$ decomposes over all merge sets $A$.

Lemma 7.1 will allow us to use the stability of persistence in Theorem 6.4 for merge modules and also Lemma 7.3. Stability of the mergegram $\mathrm{MG}(\Delta_{SL}(A))$ will be proved under perturbations of $A$ in the Hausdorff distance defined below.

Definition 7.2 (Hausdorff distance HD). For any subsets $A, B$ of a metric space $(X, d)$, the Hausdorff distance $\mathrm{HD}(A, B)$ is $\max\{ \sup_{a \in A} \inf_{b \in B} d(a, b), \ \sup_{b \in B} \inf_{a \in A} d(a, b) \}$.

Lemma 7.3 (merge modules interleaved). If any subsets $A, B$ of a metric space $(X, d)$ have $\mathrm{HD}(A, B) = \delta$, then the merge modules $M(\Delta_{SL}(A))$ and $M(\Delta_{SL}(B))$ are $\delta$-interleaved.

Theorem 7.4 (stability of a mergegram). Any finite subsets $A, B$ of a metric space $(X, d)$ have the mergegrams satisfying $\mathrm{BD}(\mathrm{MG}(\Delta_{SL}(A)), \mathrm{MG}(\Delta_{SL}(B))) \le \mathrm{HD}(A, B)$. Hence any small perturbation of $A$ in the Hausdorff distance yields a similarly small perturbation in the bottleneck distance for its mergegram $\mathrm{MG}(\Delta_{SL}(A))$ of the single-linkage clustering dendrogram $\Delta_{SL}(A)$.

Proof.
The given subsets $A, B$ with $\mathrm{HD}(A, B) = \delta$ have $\delta$-interleaved merge modules by Lemma 7.3, i.e. $\mathrm{ID}(M(\Delta_{SL}(A)), M(\Delta_{SL}(B))) \le \delta$. Since any merge module $M(\Delta)$ is finitely decomposable, hence q-tame, by Lemma 7.1, the corresponding mergegram MG($\Delta$), which is the persistence diagram of $M(\Delta)$, satisfies Theorem 6.4, i.e. $\mathrm{BD}(\mathrm{MG}(\Delta_{SL}(A)), \mathrm{MG}(\Delta_{SL}(B))) \le \delta$ as required.

Theorem 7.4 is confirmed by the following experiment on cloud perturbations in Fig. 8.

Figure 8. The bottleneck distances (average on the left, maximum on the right) between mergegrams of sampled point clouds and their perturbations. Both graphs are below the line $y = 2x$.

We uniformly generate $N = 100$ black points in a cube. Then we generate a random number of red points such that the $\epsilon$-ball of every black point randomly has 1, 2 or 3 red points for a noise bound $\epsilon \in [0.1, 10]$ taken with a step size 0.1. We compute the bottleneck distance between the mergegrams of black and red points. We repeat the experiment $K = 100$ times and plot the average and maximum in Fig. 8.

Algorithm 8.1 computes the mergegram of the SL dendrogram for any finite set $A \subset \mathbb{R}^m$.

Algorithm 8.1.
Input: a finite point cloud $A \subset \mathbb{R}^m$
Compute MST($A$) and sort all edges of MST($A$) in increasing order of length
Initialize Union-Find structure $U$ over $A$. Set all points of $A$ to be their components.
Initialize the function prev: Components[$U$] $\to \mathbb{R}$ by setting prev($t$) $= 0$ for all $t$
Initialize the vector Output that will consist of pairs in $\mathbb{R} \times \mathbb{R}$
for Edge $e = (a, b)$ in the set of edges (increasing order) do
    Find components $c_1$ and $c_2$ of $a$ and $b$ respectively in Union-Find $U$
    Add pairs (prev[$c_1$], length($e$)), (prev[$c_2$], length($e$)) $\in \mathbb{R}^2$ to Output
    Merge components $c_1$ and $c_2$ in Union-Find $U$ and denote the component by $t$
    Set prev[$t$] = length($e$)
end for
return Output

Let $\alpha(n)$ be the inverse Ackermann function. Other constants below are defined in [9].

Theorem 8.2 (a fast mergegram computation). For any cloud $A \subset \mathbb{R}^m$ of $n$ points, the mergegram $\mathrm{MG}(\Delta_{SL}(A))$ can be computed in time $O(\max\{c, c_p c_l\}\, c\, n \log n \, \alpha(n))$.

Proof.
A Minimum Spanning Tree MST($A$) needs $O(\max\{c, c_p c_l\}\, c\, n \log n \, \alpha(n))$ time by [9, Theorem 5.1]. The rest of Algorithm 8.1 is dominated by $O(n \alpha(n))$ Union-Find operations. Hence the full algorithm has the same computational complexity as the MST.

The experiments summarized in Fig. 10 show that the mergegram curve in blue outperforms other isometry invariants on the isometry classification by the state-of-the-art PersLay. We generated 10 classes of 100-point clouds within the unit ball in $\mathbb{R}^m$ for $m = 2, 3, 4, 5$. For each class, we made 100 copies of each cloud and perturbed every point by a uniform random shift in a cube of the size $2 \times \epsilon$, where $\epsilon$ is called a noise bound. For each of 100 perturbed clouds, we added 25 points such that every new point is $\epsilon$-close to an original point. Within each of 10 classes all 100 clouds were randomly rotated within the unit ball around the origin, see Fig. 9. For each of the resulting 1000 clouds, we computed the mergegram, 0D persistence diagram and the diagram of pairs of distances to two nearest neighbors for every point.

Figure 9. Left: an initial random cloud with 100 blue points. Middle: all blue points are perturbed, 25 extra orange points are added. Right: a cloud is rotated through a random angle. Can we recognize that the initial and final clouds are in the same isometry class modulo small noise?
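The three steps of Fig. 9 can be imitated in the plane with the standard library only. This is a rough sketch and not the code behind the reported experiments: it assumes $m = 2$, interprets both the random shift and the $\epsilon$-closeness of added points per coordinate (a cube of side $2\epsilon$), and uses the hypothetical helper names `sample_unit_disc`, `noisy_copy` and `rotate`.

```python
import math
import random


def sample_unit_disc(n, rng):
    """Rejection-sample n points uniformly from the unit disc in the plane."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            pts.append((x, y))
    return pts


def noisy_copy(cloud, eps, extra, rng):
    """Shift every point within the cube [-eps, eps]^2 and append `extra`
    points, each within eps (per coordinate) of a random original point."""
    def shift(p):
        return tuple(c + rng.uniform(-eps, eps) for c in p)
    perturbed = [shift(p) for p in cloud]
    added = [shift(rng.choice(cloud)) for _ in range(extra)]
    return perturbed + added


def rotate(cloud, angle):
    """Rotate a planar cloud about the origin through `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in cloud]


rng = random.Random(0)                      # fixed seed for reproducibility
original = sample_unit_disc(100, rng)       # Fig. 9, left
noisy = noisy_copy(original, 0.05, 25, rng) # Fig. 9, middle
sample = rotate(noisy, rng.uniform(0, 2 * math.pi))  # Fig. 9, right
print(len(sample))  # 125
```

The rotation preserves all pairwise distances, so any isometry invariant of `noisy` and `sample` coincides up to floating-point error; the classification task is hard only because of the perturbation and the added points.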
MFCS 2020
Figure 10 panels: success rate (vertical axis) against noise bound (horizontal axis) for the curves mergegram, PD0, NN(2) and cloud, in panels titled 2 dimensions, 2 dimensions, 4 dimensions and 5 dimensions.
Figure 10. Success rates of PersLay in identifying isometry classes of 100-point clouds uniformly sampled in a unit ball, averaged over 5 different clouds and 5 cross-validations with 20/80 splits.
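The mergegram behind the blue curve in Fig. 10 is computed by Algorithm 8.1, which can be sketched in Python as follows. This is a minimal illustration and not the authors' C++ implementation: the function name mergegram is hypothetical, and the fast MST of [9] is replaced by a brute-force Kruskal scan over all pairs of points, so this sketch needs O(n² log n) time rather than the bound of Theorem 8.2.

```python
import math
from itertools import combinations


def mergegram(points):
    """Return the pairs (prev, length) produced by Algorithm 8.1 for a list of points."""
    n = len(points)
    parent = list(range(n))  # Union-Find forest: every point is its own component
    prev = [0.0] * n         # prev[root] = scale at which the component of root was formed

    def find(i):
        # Find the root of i, shortening the path on the way (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: brute-force Kruskal. Scanning all pairs by increasing distance
    # meets the edges of a Minimum Spanning Tree in increasing order of length.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    output = []
    for length, i, j in edges:
        c1, c2 = find(i), find(j)
        if c1 == c2:
            continue  # both endpoints are already merged, so this is not an MST edge
        output.append((prev[c1], length))
        output.append((prev[c2], length))
        parent[c2] = c1    # merge the two components
        prev[c1] = length  # the merged component is born at this scale
    return output
```

For a hypothetical toy cloud [(0.0,), (4.0,), (6.0,), (9.0,), (10.0,)] on the real line, the first merger joins the points 9 and 10 at distance 1 and contributes the pair (0, 1) twice. The sketch follows Algorithm 8.1 in using the edge length itself as the scale, and the second coordinates of the output are exactly the MST edge lengths, each appearing twice.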
The machine learning part used the obtained diagrams as the input data for PersLay [2]. Each dataset was split into learning and test subsets in the ratio 4:1. The learning loops iterated over mini-batches of 128 elements and went through the full dataset for a given number of epochs. The success rate was measured on the test subset. The original PersLay module was rewritten in TensorFlow v2, and an RTX 2080 graphics card was used to run the experiments. The technical concepts of PersLay are explained in [2]:
Adam(Epochs = 300, Learning rate = 0.01)
Coefficients = Linear coefficients
Functional layer = [PeL(dim=50), PeL(dim=50, operationalLayer=PermutationMaxLayer)]
Operation layer = TopK(50)

The PersLay training used the following invariants compared in Fig. 10:
cloud: the initial cloud A of points corresponds to the baseline curve in black;
PD0: the 0D persistence diagram PD for distance-based filtrations of sublevel sets, in red;
NN(2), brown curve: for each point a ∈ A, the distances to its two nearest neighbors;
the mergegram MG(∆_SL(A)) of the SL dendrogram has the blue curve above the others.

Fig. 10 shows that the new mergegram has outperformed all other invariants on the isometry classification problem. The 0D persistence turned out to be weaker than the pairs of distances to two neighbors. The topological persistence has found applications in data skeletonization with theoretical guarantees [8, 5]. We are planning to extend the experiments in section 8 to classifying rigid shapes by combining the new mergegram with the 1D persistence, which has the fast O(n log n) time for any 2D cloud of n points [7, 6].

In conclusion, the paper has extended the 0D persistence to a stronger isometry invariant, which has kept the celebrated stability under noise important for applications to noisy data. The initial C++ code for the mergegram is at https://github.com/YuryUoL/Mergegram and will be updated.
We thank all the reviewers for their valuable time and helpful suggestions.

References
[1] Gunnar Carlsson and Facundo Memoli. Characterization, stability and convergence of hierarchical clustering methods. Journal of machine learning research, 11:1425–1470, 2010.
[2] Mathieu Carriere, Frederic Chazal, Yuichi Ike, Theo Lacombe, Martin Royer, and Yuhei Umeda. Perslay: A neural network layer for persistence diagrams and new graph topological signatures. AISTATS, arXiv:1904.09378, 2020.
[3] Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules. Springer, 2016.
[4] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.
[5] Sara Kalisnik, Vitaliy Kurlin, and Davorin Lesnik. A higher-dimensional homologically persistent skeleton. Adv. App. Maths, 102:113–142, 2019.
[6] Vitaliy Kurlin. Auto-completion of contours in sketches, maps and sparse 2d images based on topological persistence. In Proceedings of SYNASC 2014 workshop CTIC: Computational Topology in Image Context, pages 594–601. IEEE, 2014.
[7] Vitaliy Kurlin. A fast and robust algorithm to count topologically persistent holes in noisy clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1458–1463, 2014.
[8] Vitaliy Kurlin. A homologically persistent skeleton is a fast and robust descriptor of interest points in 2d images. In LNCS, Proceedings of CAIP: Computer Analysis of Images and Patterns, volume 9256, pages 606–617, 2015.
[9] William B March, Parikshit Ram, and Alexander G Gray. Fast euclidean minimum spanning tree: algorithm, analysis, and applications. In Proceedings of SIG KDD: Knowledge discovery and data mining, pages 603–612, 2010.
[10] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in neural information processing systems, pages 3391–3401, 2017.