A Notion of Harmonic Clustering in Simplicial Complexes
Stefania Ebli and Gard Spreemann

October 17, 2019
Abstract
We outline a novel clustering scheme for simplicial complexes that produces clusters of simplices in a way that is sensitive to the homology of the complex. The method is inspired by, and can be seen as a higher-dimensional version of, graph spectral clustering. The algorithm involves only sparse eigenproblems, and is therefore computationally efficient. We believe that it has broad application as a way to extract features from simplicial complexes that often arise in topological data analysis.
An important objective in modern machine learning, and part of many scientific and data analysis pipelines, is clustering [4]. By clustering, we generally mean the separation of data into groups, in a way that is somehow meaningful for the domain-specific relationships that govern the underlying data and problem in question. However, the demands that the clustering scheme should satisfy are of course inherently vague.

For data that form a point cloud in Euclidean space, and where one expects k clusters to exist, one may employ elementary methods such as k-means clustering [37]. For data in a more abstract "similarity space", for which no obviously meaningful Euclidean embedding exists, researchers invented the schemes [35, 34] that we today refer to as hierarchical clustering. Alternatively, one can derive a graph structure from some notion of similarity between the data points. Treating the data points as vertices of a graph allows one to exploit the popular and highly successful spectral clustering techniques [41, 30], which developed from the field of spectral graph theory [9].

Although the graph structure provides us with additional information about the data, graphs are intrinsically limited to modeling pairwise interactions. The success of topological methods in studying data, and the parallel establishment of topological data analysis (TDA) as a field [13, 42] (see also [6, 7, 12, 15] for modern introductions and surveys), have confirmed the usefulness of viewing data through a higher-dimensional analog of graphs [28, 32]. Such a higher-dimensional analog is called a simplicial complex, a mathematical object whose structure can describe n-fold interactions between points. Their ability to capture hidden patterns in the data has led to various applications, from biology [16, 33] to materials science [21].
Recent work has also expanded classical graph-centric results, such as Cheeger inequalities [17, 5], isoperimetric inequalities [31] and spectral methods [22], to simplicial complexes. This leads naturally towards a novel domain of "spectral TDA" methods.

In this paper we present the harmonic clustering algorithm, a novel clustering scheme inspired by the well-known spectral clustering algorithm for graphs. Our method, like spectral clustering, does not require any parameter optimization and involves only computing the smallest-eigenvalue eigenvectors of a sparse matrix. The harmonic clustering algorithm is applied directly to a simplicial complex, and it outputs a clustering of the simplices (of a fixed degree) that is sensitive to the homological structure of the complex, something that is highly relevant in TDA. Moreover, since simplices can encode interactions of higher order than just the pairs captured by graphs, our algorithm allows us to cluster complex community structures rather than just the entities they comprise. Our method can be seen as complementary to the one presented in [5].

The method we present in this paper does not require many formal results from spectral graph theory. The notions relevant for our purposes are described below for the sake of completeness.

In its simplest form, the Laplacian of an undirected and unweighted finite graph G is taken to be the positive-semidefinite matrix L = D − A, where A is the adjacency matrix of G and D its diagonal degree matrix (i.e. the row/column sums of A). The normalized Laplacian is then defined as L̄ = D^{−1/2} L D^{−1/2}. For reasons that will become clear later on (see 2.1), we will write C_0(G) for the free real vector space generated by the vertices of G, and consider L as the matrix of a linear map C_0(G) → C_0(G) in this basis.

Already in the middle of the 19th century it was clear that the eigenvalue spectrum of L has a lot to say about G, as is evident from a historic theorem of Kirchhoff relating the eigenvalues of the Laplacian to the number of spanning trees in the graph [25]. From the 1950s, graph theorists and quantum chemists were independently discovering more relationships between a graph and the eigenspectrum of its Laplacian. However, the publication of the book [10] may be said to mark the start of spectral graph theory as a field in its own right. A modern introduction to the field and references to the results listed below can be found in [9].

The spectrum of L encodes information about the connectivity of the graph. For instance, the number of connected components of the graph is equal to the dimension of the kernel of L. Moreover, the eigenvectors associated to the zero eigenvalues, also called harmonic representatives, take constant values on connected components. A perhaps more interesting result is given by the Cheeger constant [8], a measure of how far away a connected graph is from being disconnected, which bounds the smallest non-zero eigenvalue of L.

Theorem 1 (Cheeger, 1969 [8]; see also e.g. [9]). Let G = (V, E) be a finite, connected, undirected, unweighted graph. Write cut(G) for the triples (S, S̄, ∂S) with S, S̄ ⊆ V and ∂S ⊆ E such that S ⊔ S̄ = V and ∂S = {(u, w) ∈ E : u ∈ S, w ∈ S̄}. Define the Cheeger constant of G as

h(G) = min_{(S, S̄, ∂S) ∈ cut(G)} |∂S| / min(Σ_{u ∈ S} deg(u), Σ_{w ∈ S̄} deg(w)).

Then the first non-zero eigenvalue λ_1 of the graph's normalized Laplacian satisfies 2h(G) ≥ λ_1 ≥ h(G)²/2.

A partition of V as S ⊔ S̄ that attains the Cheeger constant is called a Cheeger cut. It is known that finding an exact Cheeger cut is an NP-hard problem [38]. One of the best-known approaches to approximating the Cheeger cut is the spectral clustering method, which takes the eigenvector of the first non-zero eigenvalue of the graph Laplacian as a relaxed real-valued solution of the original discrete optimization problem [41]. Namely, the eigenvector of the smallest non-zero eigenvalue of L̄, also called the Fiedler vector or the connectivity vector [14], can be exploited to find the best partition of the graph into two "almost disconnected" components. The Cheeger cut can be easily generalized to find k + 1 "almost disconnected" components using the k first non-zero eigenvectors of the graph Laplacian [41]. The Fiedler vector being a relaxed solution of the Cheeger cut has implications for clustering the vertices of a graph into "almost disconnected" components [41, 3]. For the remainder of this section we will assume that the graph under consideration is connected.

Graph spectral clustering of a graph G = (V, E) with Laplacian L works in two steps. First, one uses the information encoded in the lowest-eigenvalue eigenvectors of L to map V into low-dimensional Euclidean space. One thereafter uses standard k-means or any applicable Euclidean clustering technique on the points in the image of this map, before pulling back to G. Specifically, we will write e_1, e_2, ..., e_n for the eigenvectors associated with the n first non-zero eigenvalues of L. One defines a function, also called a spectral embedding,

φ : C_0(G) → R^n, φ(v) = (⟨v, e_1⟩, ⟨v, e_2⟩, ..., ⟨v, e_n⟩),   (1)

where ⟨•, •⟩ is the inner product on C_0(G) that makes V orthonormal. As a finite Euclidean point cloud, im φ is then clustered in R^n by standard k-means or any suitable clustering algorithm. The clustering obtained is then pulled back to V. Figure 1 shows an example. Observe that in this case, mapping into the real line using only the Fiedler vector would suffice (i.e. n = 1).

Figure 1: Graph spectral clustering of the nodes of a graph with two well-connected components weakly interconnected. Clustering using the Fiedler vector produces as clusters the well-connected components.
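As a concrete illustration, the two-step procedure just described can be sketched in a few lines of dense linear algebra. This minimal sketch is ours, not the paper's: the example graph (two triangles joined by a single bridge edge) and the sign-based split of the Fiedler embedding stand in for the general k-means step.

```python
import numpy as np

def fiedler_clustering(A):
    """Minimal graph spectral clustering with n = 1: embed the vertices
    using the Fiedler vector of L = D - A and split them by sign.
    Assumes a connected graph given by a dense symmetric 0/1 adjacency
    matrix; a sketch, not a production implementation."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = vecs[:, 1]             # eigenvector of the smallest non-zero eigenvalue
    return fiedler >= 0              # boolean cluster labels

# Two triangles joined by a single weak bridge edge (vertex 2 -- vertex 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
labels = fiedler_clustering(A)
```

Because the overall sign of an eigenvector is arbitrary, only the partition (not which side is `True`) is meaningful.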
As pointed out in [41], spectral clustering is one of the standard approaches to identifying groups of "similar behavior" in empirical data. It is therefore not surprising that it has been successfully employed in many fields, ranging from computer science and statistics to biology and social science. Moreover, compared to other approaches, such as Gaussian mixture model clustering, spectral clustering does not require any parameter optimization and can be carried out efficiently by standard linear algebra methods.
Our method is inspired by spectral clustering in graphs, but applies instead to a higher-dimensional analog, namely simplicial complexes. Instead of clustering only vertices, which are the zero-dimensional building blocks of graphs and simplicial complexes, the method independently clusters building blocks of any dimension.

This section outlines the prerequisite basic constructions from algebraic topology before describing our method. A reader interested in more background on algebraic topology is directed to standard textbooks [18]. Those wishing a quick overview of the method can view it in algorithmic form in figure 3.

A simplicial complex is a collection of finite sets closed under taking subsets. We refer to a set in a simplicial complex as a simplex of dimension p if it has cardinality p + 1. Such a p-simplex has p + 1 faces of dimension p − 1, namely the sets omitting one element, which we will denote as (v_0, ..., v̂_i, ..., v_p) when omitting the i'th element. While this definition is entirely combinatorial, we will soon see that there is a geometric interpretation, and it will make sense to refer to and think of 0-simplices as vertices, 1-simplices as edges, 2-simplices as triangles, 3-simplices as tetrahedra, and so forth.

Let C_p(K) be the free real vector space with basis K_p, the set of p-simplices in a simplicial complex K. The elements of C_p(K) are called p-chains. These vector spaces come equipped with boundary maps, namely linear maps

∂_p : C_p(K) → C_{p−1}(K),
∂_p((v_0, ..., v_p)) = Σ_{i=0}^{p} (−1)^i (v_0, ..., v̂_i, ..., v_p),

with the convention that C_{−1}(K) = 0 and ∂_0 = 0 for convenience. Figure 2 shows how the boundary maps give a geometric interpretation of simplicial complexes. One readily verifies that ∂_p ∘ ∂_{p+1} = 0, and so C_•(K) is a real chain complex.
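For readers who prefer code, the boundary maps have a direct matrix realization in the simplex bases. The helper below is a minimal sketch of our own (not from the paper), using the convention that each simplex is an increasingly sorted tuple of vertex labels; it also verifies ∂_1 ∘ ∂_2 = 0 on a filled triangle.

```python
import numpy as np

def boundary_matrix(simplices_p, simplices_pm1):
    """Matrix of the boundary map from degree-p chains to degree-(p-1)
    chains, in the bases given by the listed simplices.  Each simplex is
    an increasingly sorted tuple of vertex labels; the i'th face gets
    sign (-1)^i, matching the formula for the boundary map."""
    index = {s: i for i, s in enumerate(simplices_pm1)}
    B = np.zeros((len(simplices_pm1), len(simplices_p)))
    for j, s in enumerate(simplices_p):
        for i, v in enumerate(s):
            face = s[:i] + s[i + 1:]   # omit the i'th vertex
            B[index[face], j] = (-1) ** i
    return B

# A filled triangle on vertices 0, 1, 2.
V = [(0,), (1,), (2,)]
E = [(0, 1), (0, 2), (1, 2)]
T = [(0, 1, 2)]
B1 = boundary_matrix(E, V)   # matrix of the boundary map on 1-chains
B2 = boundary_matrix(T, E)   # matrix of the boundary map on 2-chains
assert np.allclose(B1 @ B2, 0)   # the composite boundary vanishes
```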
By the p'th homology vector space of K we will mean the p'th homology vector space of this chain complex, namely

H_p(K) = H_p(C_•(K)) = ker ∂_p / im ∂_{p+1}.

The elements of ker ∂_p are called p-cycles, while those of im ∂_{p+1} are called p-boundaries, as can be seen geometrically in figure 2. The Betti numbers are the dimensions of the homology vector spaces, and we write β_p(K) = dim H_p(K). Intuitively, the Betti numbers count connected components, non-bounding loops, non-bounding cavities, and so forth. We emphasize again that this is homology with real coefficients, not integer or finite field coefficients, as is common in TDA.

Figure 2: A simplicial complex K with 20 0-simplices, 38 1-simplices (the edges) and 22 2-simplices (the filled triangles), with some highlighted 1-chains. The highlighted simplices represent the edges with non-zero coefficient in each chain (the unfamiliar reader is invited to fill in possible values for these coefficients). The red 1-chain consists of a single 1-simplex, and is neither a cycle nor a boundary. The orange 1-chain has trivial boundary, and is therefore a cycle. It is not a representative of any non-trivial homology class, for it is the boundary of the 2-chain consisting of the three 2-simplices it encloses. The green and the blue 1-chains are cycles that represent the same homology class (intuitively the 1-dimensional hole in the middle). H_0(K) is 1-dimensional, corresponding to K's single connected component, while H_1(K) is 1-dimensional due to the central hole.

We are in this paper concerned with finite simplicial complexes, and assume that they are built in a way that encodes useful information about the data being studied. We will briefly discuss the case where each simplex in K comes equipped with extra data, including, but not limited to, the filtration/weighting information that is ubiquitous in TDA, or with a normalization factor derived from the complex's structure, in the form of a function w : K → R_+. The latter is analogous to the various normalization schemes that are often used in spectral graph theory. Our computational experiments, however, will only consider the case w = 1.

The weights are encoded into the chain complex by endowing each degree with the inner product that makes all simplices orthogonal, and gives each simplex the norm determined by its weight, i.e.

⟨•, •⟩_i : C_i(K) × C_i(K) → R,
⟨σ, τ⟩_i = w(σ) if σ = τ, and 0 otherwise.

Further discussions of weighting schemes can be found in [22]. We place no further assumptions on the simplicial complex that we take as input.
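Such a weighted inner product is represented by a diagonal Gram matrix, and taking adjoints of linear maps with respect to it (as the text does next for the boundary maps) is a one-line formula. The sketch below is ours; the weights are illustrative stand-ins, and with w = 1 the adjoint reduces to the plain transpose.

```python
import numpy as np

def weighted_adjoint(B, w_target, w_source):
    """Adjoint of the linear map with matrix B with respect to the
    weighted inner products <x, y>_j = x^T diag(w_j) y.  Here B maps
    the space weighted by w_source into the space weighted by w_target;
    the adjoint is diag(w_source)^{-1} B^T diag(w_target).  A sketch
    with illustrative, made-up weights."""
    return np.diag(1.0 / w_source) @ B.T @ np.diag(w_target)

# Boundary matrix of a hollow triangle's edges, plus made-up weights.
B1 = np.array([[-1., -1.,  0.],
               [ 1.,  0., -1.],
               [ 0.,  1.,  1.]])
w0 = np.array([1.0, 2.0, 3.0])   # vertex weights (illustrative)
w1 = np.array([0.5, 1.0, 4.0])   # edge weights (illustrative)
D1star = weighted_adjoint(B1, w0, w1)
```

The defining adjoint identity ⟨∂*x, y⟩ = ⟨x, ∂y⟩ becomes the matrix identity D1star.T @ diag(w1) == diag(w0) @ B1, which is easy to check numerically.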
In particular, it is not necessary for it to come equipped with an embedding into Euclidean space, nor do we demand that it triangulates a Riemannian manifold. Therefore dualities like the Hodge star, which is used to construct the Hodge–de Rham Laplacian in the smooth setting [26] that motivates us, are unavailable for our method. The same is true for discrete versions of the Hodge star, such as that of Hirani [20]. Instead of dualizing with respect to a Hodge star, to define a discrete version of the Laplacian for simplicial complexes, we simply take the linear adjoint of the boundary operator with respect to the inner product, defining ∂*_i : C_{i−1}(K) → C_i(K) by

⟨∂*_i σ, τ⟩_i = ⟨σ, ∂_i τ⟩_{i−1}   for all σ ∈ K_{i−1}, τ ∈ K_i.

In analogy with Hodge–de Rham theory, we then define the degree-i simplicial Laplacian of a simplicial complex K as the linear operator L_i : C_i(K) → C_i(K) such that

L_i = L_i^up + L_i^down,
L_i^up = ∂_{i+1} ∘ ∂*_{i+1} : C_i(K) → C_i(K),
L_i^down = ∂*_i ∘ ∂_i : C_i(K) → C_i(K).

The harmonics are defined as ℋ_i(K) = ker L_i. Observe that there are p + 1 Laplacians for a complex of dimension p. In most practical applications, the matrices for the Laplacians are very sparse and can easily be computed as products of sparse boundary matrices and their transposes. The following discrete version of the important Hodge decomposition theorem is a simple exercise in linear algebra in the current setting.
Theorem 2 (Eckmann, 1944 [11]). The vector spaces of chains decompose orthogonally as

C_i(K) ≅ ℋ_i(K) ⊕ im ∂_{i+1} ⊕ (ker ∂_i)^⊥.

Moreover,

1. ℋ_i(K) ≅ H_i(K);
2. the harmonics are both cycles and cocycles (i.e. cycles with respect to ∂*_{i+1});
3. the harmonics are the norm-minimal representatives of their (co)homology classes, i.e. if h ∈ ℋ_i(K) and h ~ z ∈ ker ∂_i are homologous, then ⟨h, h⟩_i ≤ ⟨z, z⟩_i.

The first detailed work on the spectral properties of this kind of simplicial Laplacian was carried out by Horak and Jost [22]. Recently, Steenbergen et al. [36] provided a notion of a higher-dimensional Cheeger constant for simplicial complexes. At the same time, Gundert and Szedlák [17] proved a lower bound for a modified version of the higher-dimensional Cheeger constant for simplicial complexes, which was later generalized to weighted complexes by Braxton et al. Mukherjee and Steenbergen [29] developed an appropriate notion of random walks on simplicial complexes, and related the asymptotic properties of these walks to the simplicial Laplacians and harmonics. It is worth mentioning that, to the best of our knowledge, no connection between the eigenvectors of the simplicial Laplacian and an optimal cut for simplices in higher dimensions is known.

Our contribution is a notion of spectral clustering for simplicial complexes using the harmonics.
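With trivial weights (w = 1, as in the experiments), the adjoints are plain transposes and each Laplacian is a product of boundary matrices with their transposes. A dense sketch of our own, checked on a hollow triangle, where item 1 of Theorem 2 predicts dim ker L_1 = β_1 = 1; in practice one would use sparse matrices and iterative eigensolvers.

```python
import numpy as np

def simplicial_laplacian(B_i, B_ip1):
    """Degree-i simplicial Laplacian with trivial weights (w = 1), so
    adjoints are transposes: L_i = B_{i+1} B_{i+1}^T + B_i^T B_i.
    B_i is the matrix of the boundary map out of degree-i chains; pass a
    suitably shaped zero matrix at the ends of the complex."""
    return B_ip1 @ B_ip1.T + B_i.T @ B_i

# Hollow triangle: 3 vertices, 3 edges, no filled triangle, so B_2 is a
# 3x0 matrix and the kernel of L_1 should be the single non-bounding cycle.
B1 = np.array([[-1., -1.,  0.],
               [ 1.,  0., -1.],
               [ 0.,  1.,  1.]])
B2 = np.zeros((3, 0))
L1 = simplicial_laplacian(B1, B2)
beta1 = B1.shape[1] - np.linalg.matrix_rank(L1)   # dim ker L_1
```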
Observe that the ordinary graph Laplacian, as described in section 1.1, is just the matrix of L_0 = L_0^up in the standard basis for C_0(G). The function φ in equation (1) can thus be seen as projecting the 0-simplices onto a subspace spanned by low-but-nonzero-eigenvalue eigenvectors of L_0. The zero part of the spectrum is not used. Theorem 2 makes the reason clear: harmonics in ℋ_0(G) have the same coefficient for every vertex in a connected component of G. As connectivity information is easy to obtain anyway, there is little use in adding these eigenvectors to the subspace that φ projects onto. This is not so for the higher Laplacians. In fact, our method primarily uses the harmonics, and only optionally ventures into the non-zero part of the eigenspectrum.

In what follows, K is a fixed simplicial complex arising from data. The particulars of how K was built from data are outside the scope of this paper, and this is a topic that is well studied in the field of TDA in general. Our goal is to obtain a useful clustering of K_p for some chosen p. We assume that K is of low "homological complexity" in degree p, by which we mean that β_p(K) is small.

Analogously to φ above, we define the harmonic embedding

ψ : K_p → R^{β_p(K)},
ψ = ξ ∘ proj_{ℋ_p(K)} ∘ i,

where i : K_p ↪ C_p(K) is the inclusion, proj_{ℋ_p(K)} : C_p(K) → ℋ_p(K) is orthogonal projection, and ξ : ℋ_p(K) → R^{β_p(K)} is any vector space isomorphism. In practice, we simply pick an orthonormal basis h_1, ..., h_{β_p(K)} for ℋ_p(K) and let

ψ(σ) = (⟨σ, h_1⟩_p, ⟨σ, h_2⟩_p, ..., ⟨σ, h_{β_p(K)}⟩_p).

In many situations of practical use, it turns out that many points in im ψ lie along one-dimensional subspaces of R^{β_p(K)}. The membership of a point ψ(σ) in such a subspace is what is used to cluster the p-simplex σ (or to leave it unclustered in case it is not judged to be sufficiently close to lying in one of the subspaces).
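In code, with trivial weights, ψ amounts to reading off the rows of an orthonormal basis matrix for ker L_p. The dense SVD-based sketch below is ours (iterative sparse eigensolvers would be used in practice); on the hollow triangle, every edge embeds to ±1/√3, reflecting the single harmonic cycle.

```python
import numpy as np

def harmonic_embedding(L_p):
    """Sketch of the harmonic embedding psi with trivial weights: each
    p-simplex (a standard basis vector of the chain space) is sent to its
    inner products with an orthonormal basis of ker L_p, here obtained
    from the SVD of the (symmetric, PSD) Laplacian."""
    U, s, _ = np.linalg.svd(L_p)
    tol = 1e-10 * s.max()         # crude numerical-zero threshold
    H = U[:, s <= tol]            # orthonormal basis h_1, ..., h_{beta_p}
    return H                      # row j of H is psi(sigma_j)

# Hollow triangle again: L_1 = B_1^T B_1, whose kernel is the 1-cycle.
B1 = np.array([[-1., -1.,  0.],
               [ 1.,  0., -1.],
               [ 0.,  1.,  1.]])
L1 = B1.T @ B1
psi = harmonic_embedding(L1)      # one column, since beta_1 = 1
```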
This amounts to clustering K_p by performing Euclidean subspace clustering of im ψ. A variety of Euclidean subspace clustering methods are available, but they are outside the scope of this paper. Examples include independent component analysis [23], SUBCLU [24], and density maximization on S^{β_p(K)−1} (or, more precisely, on RP^{β_p(K)−1}), which itself has a multitude of approaches, including purely TDA-based ones by means of persistent homology of sublevel sets.

We point out that the choice of the isomorphism ξ : ℋ_p(K) → R^{β_p(K)} does not matter on a theoretical level. It may, however, have practical implications for how easy it is to perform subspace clustering. In experiments we typically choose ξ to be the isomorphism that sends h_i to the standard basis vector e_i. Choosing a different orthonormal basis for ℋ_p(K) then just amounts to an element of SO(β_p(K)) acting on im ψ.

Figure 3 summarizes our method in algorithmic form.

Require: Integer p ≥ 0; simplicial complex K with β_p = dim ℋ_p(K) small, K_p = {σ_1, ..., σ_N}, and inner products ⟨•, •⟩_p on C_p(K).
  L_p ← matrix for L_p
  (h_1, ..., h_{β_p}) ← orthonormal basis for ker L_p (computed using iterative methods [19] on L_p)
  for i = 1 to i = N do
    x_i ← (⟨σ_i, h_1⟩_p, ..., ⟨σ_i, h_{β_p}⟩_p)
  end for
  (a_1, ..., a_k) ← subcluster(x_1, ..., x_N)
  for i = 1 to i = k do
    c_i ← {σ_j ∈ K_p : j ∈ a_i}
  end for
Ensure: Homologically sensitive clustering c_1, ..., c_k of p-simplices in K.

Figure 3: Our method in algorithmic form. The subroutine subcluster refers to any Euclidean subspace clustering scheme, such as independent component analysis [23], SUBCLU [24], or density maximization on S^{β_p(K)−1} (or, more precisely, on RP^{β_p(K)−1}). The latter can be done using methods from TDA, for example by means of persistent homology of certain sublevel sets. Note that there may be unclustered simplices, i.e. it may happen that ∪_{i=1}^k c_i ≠ K_p.

In this section we present experimental results for the harmonic clustering algorithm on synthetic data. Specifically, we focus on clustering the edges of various constructed simplicial complexes. The outcomes of our experiments suggest that the harmonic clustering algorithm provides clusters sensitive to the homology of the complex. Comparing our results with those of the traditional spectral clustering algorithm applied to the graph underlying the simplicial complex reinforces the idea that our algorithm reveals substantially different patterns in data compared to the classical method.

Below, we consider four simplicial complexes. Three of them are complexes built from Euclidean point clouds by standard methods from TDA, while one is a triangulation of a torus. We reiterate that our method works with abstract simplicial complexes without utilizing any embedding of these into an ambient space.
Euclidean point clouds just happen to be a good and common source of simplicial complexes in TDA, and allow for visualization of the obtained clustering in a way that easily relates to the original data.

An important step in preprocessing many kinds of input data in TDA is constructing a simplicial complex satisfying certain theoretical properties. In particular, if the input data come from points sampled from a topological space X ⊂ R^n, one may wish for the homology of the complex to coincide with the homology of X. Two constructions for which some such guarantees exist are the alpha complex [1] and the Vietoris–Rips (VR) complex [40]. Both can be seen as taking a point cloud and producing a filtered simplicial complex K, i.e. a sequence (K_t)_{t ∈ R_+} with the property that K_s ⊆ K_t whenever s ≤ t. We wish to work with a single simplicial complex, not a filtration, so we use persistent homology (see e.g. [15]) to find the filtration scale t for which K_t has the appropriate homology. Of course, since in practice one probably has little or no knowledge of X itself, one cannot necessarily know the "correct" t to consider. However, it is often the case in TDA that long-lived homological features, that is to say, homology classes that remain non-trivial under the induced maps H_p(K_s) → H_p(K_t) for large t − s, express interesting properties of the underlying space. We therefore choose a K_t to consider by looking for a scale t within the range of a manageable number of long-lived features and few short-lived ones in the degree under consideration.

In the following experiments, we simplify the setup in the algorithm in figure 3 by performing the subroutine subcluster in a somewhat ad hoc, semi-manual way. Specifically, all the images of the ψ's lie in R² or R³ in these experiments, so we manually pick out the subspaces V_1, ..., V_k in question.
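As a point of reference, the VR complex at a single fixed scale t admits a very short, if inefficient, implementation: a simplex is present when all of its pairwise distances are at most t. This brute-force sketch is ours; real TDA software builds the whole filtration far more efficiently and should be used for anything beyond tiny examples.

```python
import itertools
import numpy as np

def vietoris_rips(points, t, max_dim=2):
    """Brute-force Vietoris-Rips complex at a single scale t: a simplex is
    included when every pairwise distance among its vertices is <= t.
    Returns simplices as sorted vertex-index tuples, grouped by dimension."""
    n = len(points)
    close = lambda i, j: np.linalg.norm(points[i] - points[j]) <= t
    simplices = {0: [(i,) for i in range(n)]}
    for dim in range(1, max_dim + 1):
        simplices[dim] = [
            s for s in itertools.combinations(range(n), dim + 1)
            if all(close(i, j) for i, j in itertools.combinations(s, 2))
        ]
    return simplices

# Three nearby points forming a filled triangle, plus one far-away point.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
K = vietoris_rips(pts, t=1.5)
```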
Then, the points in im ψ are orthogonally projected onto each of the subspaces. A point ψ(σ) is determined to lie on subspace V_i if proj_{V_i}(ψ(σ)/‖ψ(σ)‖) has sufficiently large norm, while its projection onto every other subspace has small norm. The simplex σ is then said to be in cluster number i. If the above is not true for any of the subspaces, σ is considered unclustered.

In many of the experiments that follow, many points in im ψ end up determined as "unclustered" because they project well onto none of the 1-dimensional subspaces, or project too well onto multiple of them, as described in 2.3. This is not necessarily a problem, as the parts that are clustered contain a lot of useful information. Moreover, the problem can be reduced by choosing less ad hoc subspace clustering methods than we are currently employing. To ease visualization, we focus on simplicial complexes that naturally live in R² or R³ because they arise from point clouds.

In this experiment, we consider a noisy sampling of X = S² ∨ S¹ ∨ S¹, realized as a central unit sphere with unit circles wedged onto antipodal points. We sampled points uniformly at random from the central sphere, adding radial uniform noise of small amplitude, and sampled the circles in the same way. The resulting point cloud is shown in figure 4. The VR complex is certainly a suboptimal choice of simplicial complex to build on this kind of data, but we chose it to demonstrate that our method works well even for such an overly dense complex. The complex, constructed at a fixed scale and denoted K within this section, has Betti numbers β_0(K) = 1, β_1(K) = 2, β_2(K) = 1, as for X itself.

Figure 4: The point sample under consideration in 3.1.
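The ad hoc line-assignment rule described above can be sketched as follows. This is our own illustration: the candidate lines are given as unit direction vectors, and the thresholds `tau_in` and `tau_out` are illustrative placeholders rather than the exact values used in the experiments.

```python
import numpy as np

def assign_to_lines(points, directions, tau_in=0.9, tau_out=0.4):
    """Assign each point to one of several candidate lines through the
    origin: normalize the point, project onto every line (unit direction
    vectors), and accept cluster i when the projection onto line i is
    large while the projection onto every other line is small.  Returns
    -1 for unclustered points.  Thresholds are illustrative placeholders."""
    labels = []
    for x in points:
        norm = np.linalg.norm(x)
        if norm == 0.0:
            labels.append(-1)
            continue
        p = np.array([abs((x / norm) @ d) for d in directions])
        order = np.argsort(p)[::-1]   # best-matching line first
        if p[order[0]] >= tau_in and p[order[1]] < tau_out:
            labels.append(int(order[0]))
        else:
            labels.append(-1)
    return np.array(labels)

dirs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
pts = np.array([[2.0, 0.1], [0.05, -1.0], [1.0, 1.0]])
labels = assign_to_lines(pts, dirs)
```

The third point projects comparably onto both lines and is therefore left unclustered, mirroring the gray points in the figures below.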
We focus on clustering the 1-simplices of the complex. The image of ψ in R² is shown in figure 5. The points are colored according to which of the two one-dimensional subspaces they are deemed to belong to. The determination was made by a simple criterion of projecting well enough onto one of the lines, but not the other. Points that project well onto both or neither are considered unclustered and shown as red. Figure 6 shows this clustering pulled back to the complex itself, excluding the unclustered edges. Observe how the method separates the 1-simplices of the VR complex in a manner that is sensitive to the two non-bounding cycles that generate homology in degree 1 (the two circles).

Figure 5: The image of ψ for the 1-simplices in the VR complex from the experiment in 3.1. The dashed lines indicate the subspaces used for clustering. The inset shows a detailed view near the origin, where one can see a large number of points in gray that are unclustered due to them projecting too well onto both subspaces.

Figure 6: The clustering from figure 5 pulled back to the 1-simplices of the VR complex from 3.1, which is here drawn in R³ using the coordinates of the points for visualization purposes only. The unclustered 1-simplices are not drawn.

We also repeated the experiment with one of the circles in X moved to be attached to the other circle instead of the sphere. This space is obviously homotopy equivalent to X, but is geometrically very different. Figure 7 shows the result. Observe that the sphere is now captured by its adjacent circle, and that the unclustered edges tend to be those near where the two circles intersect.

Figure 7: The result of clustering the rearranged point cloud from 3.1. Again the 1-simplices of the VR complex are clustered in a way respecting the generators of 1-homology. The unclustered simplices are drawn in gray. (That the sphere appears solid is only a visualization artifact; the 2-simplices are not drawn.)
In this experiment we uniformly randomly sample points from a unit square in R² with three disks cut out. The points are seen as faint dots in figure 8. We construct the alpha complex at a fixed parameter, and denote it by K in this section. It has Betti numbers β_0(K) = 1, β_1(K) = 3, and β_i(K) = 0 for i > 1. We again focus on the 1-simplices for clustering. The codomain of ψ is now R³, and the image is clustered according to three 1-dimensional linear subspaces. The result is shown in figure 8, and we again observe how the obtained clustering occurs with respect to the punctures of the square.

Figure 8: The point cloud of the experiment in 3.2 is shown as faint dots. The three punctures are visible as holes in the sample, and one observes that the clustered 1-simplices (blue, green, orange, respectively) follow the punctures. The gray 1-simplices are unclustered. The 2-simplices have not been drawn.

We next perform clustering of the edges of two different tori.
We uniformly randomly sampled points from the unit square and mapped these under

(ϕ, θ) ↦ ((2 + sin(2πϕ)) cos(2πθ), (2 + sin(2πϕ)) sin(2πθ), cos(2πϕ))

to produce a point sample of a torus in R³. The points were then given a uniformly random noise of small amplitude in both radii. Again a VR complex K was built, at a fixed scale. It has the homology of a torus, i.e. β_0(K) = 1, β_1(K) = 2, β_2(K) = 1. VR was chosen in order for the clustering task to be more complicated than in a more orderly alpha complex.

Figure 9 shows the image in R² of K_1 under ψ. The subspaces for clustering are somewhat harder to make out than before, but they can still be found. The result of clustering by them can be seen in figure 10. Observe that the two clusters respect the two independent unfilled loops of the torus.

Figure 9: The clustering of the 1-simplices from the simplicial complex obtained from the sampled torus in the experiment in 3.3.1. The unclustered points are shown in gray.

Figure 10: The clustering in figure 9 pulled back to the 1-simplices of the torus from the experiment in 3.3.1. The unclustered ones are not shown, something which may make the torus appear broken.

As a smaller, more abstract and noise-free example, we consider a triangulation of a flat torus. The considered triangulation consists of a simplicial complex with 9 vertices, 27 1-simplices and 18 2-simplices. The image of its 1-simplices in R² under ψ is shown in figure 11. The arrangement into a perfect hexagon means that there are in fact three subspaces that can be chosen for clustering. The clusters are shown in figure 12. The arrangement into a hexagon, and therefore the result of three instead of the expected two clusters, disappears if one breaks some of the symmetry in the triangulation, for example by having some of the diagonal edges go the opposite direction.

Figure 11: The clustering of the 1-simplices from the simplicial complex obtained as a triangulation of the flat torus from the experiment in 3.3.2. Note that many points overlap. Three clusters are given by points lying on three different linear subspaces.

Figure 12: A triangulation of the flat torus represented as a rectangle with pairs of opposing edges identified. The 1-simplices are clustered into three groups (orange, blue and green). Those in orange and blue are representatives of the two 1-homology classes of the complex, whereas the green ones are a linear combination of the others.

We have illustrated our method only on 1-simplices so far, for ease of visualization. To point out that it also performs well in other dimensions, we sampled points from two spheres centered at (−1, 0, 0) and (1, 0, 0), each with a radial uniform random noise of small amplitude. We computed the alpha complex K at a parameter chosen so as to create a rather messy region between the spheres. We have β_0(K) = 1, β_1(K) = 0 and β_2(K) = 2, as expected. Our clustering method performs as expected, producing clusters of K_2 that correspond to homological features, as is shown in figure 13.

Figure 13: The output of our method when clustering the 2-simplices from the complex in the experiment in 3.4 are the blue and orange clusters. The gray simplices are unclustered.
It is worth comparing the clustering obtained from our method with the ones obtained by clustering the nodes of the graph underlying each simplicial complex using the graph spectral clustering algorithm. Figure 14 shows the results of graph spectral clustering on the nodes of the graph underlying the complex in figure 7. The two first graph Laplacian eigenvectors were used to map the nodes into R², and then k-means was used to find two clusters. Similarly, figure 15 displays three clusters on the nodes of the graph underlying the complex representing a plane with three punctures in figure 8. The nodes are mapped to R³ using the three first graph Laplacian eigenvectors, after which k-means was used to find three clusters. In both cases we see that the clusters do not reflect any obviously meaningful property of the underlying data, unlike our method, which clusters in a way sensitive to homology.

Figure 14: Graph spectral clustering of the vertices of the graph underlying the VR complex of figure 7.

Figure 15: Graph spectral clustering of the vertices of the graph underlying the alpha complex from figure 8.
Conclusions

In this paper we have presented a novel clustering method for simplicial complexes, one that is sensitive to the homology of the complex. We see the method as a contribution to an emerging field of spectral TDA [27, 2]. Our results suggest that the algorithm can be used to extract homological features for simplices of any degree. Experiments on various simplicial complexes demonstrate the ability of the method to accurately detect edges belonging to different non-bounding cycles. Similar results, not shown in this article for practical considerations of visualization, have been obtained by clustering simplices in higher dimensions. The sub-problem of the structure of the linear subspaces in the image of ψ, and how to accurately cluster based on demand, requires further investigation of both a mathematical and an algorithmic nature.

While our method seems robust to noise in the underlying data, a more thorough investigation into the output's dependence on noise, and on the scale at which a point-cloud-derived simplicial complex is built, is warranted.

Moreover, it has not eluded us that the method as outlined is not restricted to clustering just simplices. Other finitely generated chain complexes, such as discrete Morse complexes or cubical complexes, naturally lend themselves to the same analysis. One may also want to consider whether there are theoretical implications even in the smooth case.

Further development will also include enlarging the target of the projection ψ to include non-zero eigenvectors of L_p, as in graph spectral clustering. Preliminary results indicate that this yields a further refinement of the homologically sensitive clusters into "fair" subdivisions.

Finally, further work needs to explore the effects of weighting: both structural weighting, i.e. deriving weights from the local connectivity properties of the complex, as is often done with graph spectral clustering, and weighting originating from the underlying data itself, as is common in TDA.

A potential future application that we suspect fits our method well is collaboration networks [32], where n-fold collaborations clearly cannot accurately be encoded as the corresponding (n choose 2) pairwise ones.

Acknowledgments
Alpha complexes and persistent homology were computed using GUDHI [39]. Eigenvector computations were done with SLEPc [19].

We would like to thank K. Hess for valuable discussions. Both authors were supported by the Swiss National Science Foundation grant number 200021_172636.
References

[1] Nataraj Akkiraju, Herbert Edelsbrunner, Michael Facello, Ping Fu, E. P. Mücke, and Carlos Varela. "Alpha shapes: definition and software". In: Proceedings of the 1st International Computational Geometry Software Workshop. Vol. 63. 1995, p. 66.
[2] Sergio Barbarossa, Stefania Sardellitti, and Elena Ceci. "Learning from signals defined over simplicial complexes". In: 2018 IEEE Data Science Workshop (DSW). IEEE, 2018, pp. 51–55.
[3] Mikhail Belkin and Partha Niyogi. "Laplacian eigenmaps and spectral techniques for embedding and clustering". In: Advances in Neural Information Processing Systems. 2002, pp. 585–591.
[4] Pavel Berkhin. "A Survey of Clustering Data Mining Techniques". In: Grouping Multidimensional Data. Springer, Berlin, Heidelberg, 2006.
[5] Braxton Osting, Sourabh Palande, and Bei Wang. Towards Spectral Sparsification of Simplicial Complexes Based on Generalized Effective Resistance. 2017. arXiv preprint.
[6] Gunnar Carlsson. "Topology and Data". In: Bulletin of the American Mathematical Society 46.2 (2009), pp. 255–308.
[7] Frédéric Chazal and Bertrand Michel. An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists. 2017. arXiv preprint.
[8] Jeff Cheeger. "A lower bound for the smallest eigenvalue of the Laplacian". In: Proceedings of the Princeton conference in honor of Professor S. Bochner. 1969, pp. 195–199.
[9] Fan R. K. Chung. Spectral graph theory. Vol. 92. Regional Conference Series in Mathematics. American Mathematical Society, 1997.
[10] Dragoš M. Cvetković, Michael Doob, and Horst Sachs. Spectra of Graphs. Theory and Application. Pure and Applied Mathematics. Academic Press, 1979.
[11] Beno Eckmann. "Harmonische Funktionen und Randwertaufgaben in einem Komplex". In: Commentarii Mathematici Helvetici 17 (1945), pp. 240–255.
[12] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Society, 2010.
[13] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. "Topological persistence and simplification". In: Proceedings 41st Annual Symposium on Foundations of Computer Science. IEEE, 2000, pp. 454–463.
[14] Miroslav Fiedler. "Algebraic connectivity of graphs". In: Czechoslovak Mathematical Journal 23 (1973), pp. 298–305.
[15] Robert Ghrist. "Barcodes: the persistence of topology in data analysis". In: Bulletin of the American Mathematical Society 45.1 (2008), pp. 61–75.
[16] Chad Giusti, Eva Pastalkova, Carina Curto, and Vladimir Itskov. "Clique topology reveals intrinsic geometric structure in neural correlations". In: Proceedings of the National Academy of Sciences 112.44 (2015), pp. 13455–13460.
[17] In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry. 2014, p. 181.
[18] Allen Hatcher. Algebraic Topology. Cambridge University Press, 2002.
[19] Vicente Hernandez, Jose E. Roman, and Vicente Vidal. "SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems". In: ACM Transactions on Mathematical Software 31.3 (2005), pp. 351–362.
[21] Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, and Yasumasa Nishiura. "Hierarchical structures of amorphous solids characterized by persistent homology". In: Proceedings of the National Academy of Sciences 113.26 (2016), pp. 7035–7040.
[22] Danijela Horak and Jürgen Jost. "Spectra of combinatorial Laplace operators on simplicial complexes". In: Advances in Mathematics 244 (2013), pp. 303–336.
[23] Aapo Hyvärinen and Erkki Oja. "Independent component analysis: algorithms and applications". In: Neural Networks 13.4–5 (2000), pp. 411–430.
[24] Karin Kailing, Hans-Peter Kriegel, and Peer Kröger. "Density-connected subspace clustering for high-dimensional data". In: Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 2004, pp. 246–256.
[25] Gustav Kirchhoff. "Ueber die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird". In: Annalen der Physik (1847), pp. 497–508.
[26] Ib Madsen and Jørgen Tornehave. From calculus to cohomology: de Rham cohomology and characteristic classes. Cambridge University Press, 1997.
[27] Joshua L. Mike and Jose A. Perea. Geometric Data Analysis Across Scales via Laplacian Eigenvector Cascading. 2018. arXiv preprint.
[28] Terrence J. Moore, Robert J. Drost, Prithwish Basu, Ram Ramanathan, and Ananthram Swami. "Analyzing collaboration networks using simplicial complexes: A case study". In: Proceedings IEEE INFOCOM Workshops (2012), pp. 238–243.
[29] Sayan Mukherjee and John Steenbergen. "Random walks on simplicial complexes and harmonics". In: Random Structures & Algorithms 49.2 (2016), pp. 379–405.
[30] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. "On spectral clustering: analysis and an algorithm". In: Advances in Neural Information Processing Systems. MIT Press, 2001, pp. 849–856.
[31] Ori Parzanchevski, Ron Rosenthal, and Ran J. Tessler. "Isoperimetric inequalities in simplicial complexes". In: Combinatorica 36.2 (2016), pp. 195–227.
[32] Alice Patania, Giovanni Petri, and Francesco Vaccarino. "The shape of collaborations". In: EPJ Data Science 6 (2017), p. 18.
[33] Michael W. Reimann et al. "Cliques of neurons bound into cavities provide a missing link between structure and function". In: Frontiers in Computational Neuroscience 11 (2017), p. 48.
[34] Peter H. A. Sneath. "The application of computers to taxonomy". In: Journal of General Microbiology 17 (1957), pp. 201–226.
[35] Thorvald Sørensen. "A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons". In: Biologiske Skrifter 5 (1948), pp. 1–34.
[36] John Steenbergen, Caroline Klivans, and Sayan Mukherjee. "A Cheeger-type inequality on simplicial complexes". In: Advances in Applied Mathematics 56 (2014), pp. 56–77.
[37] Hugo Steinhaus. "Sur la division des corps matériels en parties". In: Bull. Acad. Polon. Sci. 4 (1956), pp. 801–804.
[38] In: Proceedings of the 27th International Conference on Machine Learning. 2010, pp. 1039–1046.
[39] The GUDHI Project. GUDHI User and Reference Manual. GUDHI Editorial Board, 2015. URL: http://gudhi.gforge.inria.fr/doc/latest/.
[40] Leopold Vietoris. "Über den höheren Zusammenhang kompakter Räume und eine Klasse von zusammenhangstreuen Abbildungen". In: Mathematische Annalen 97 (1927), pp. 454–472.
[41] Ulrike von Luxburg. "A tutorial on spectral clustering". In: Statistics and Computing 17.4 (2007), pp. 395–416.
[42] Afra Zomorodian and Gunnar Carlsson. "Computing persistent homology". In: Discrete & Computational Geometry 33.2 (2005), pp. 249–274.