ε-net Induced Lazy Witness Complexes on Graphs

Abstract

Computation of persistent homology of simplicial representations such as the Rips and the Čech complexes does not scale efficiently to large point clouds. It is, therefore, meaningful to devise approximate representations and evaluate the trade-off between their efficiency and effectiveness. The lazy witness complex economically defines such a representation using only a few selected points, called landmarks. Topological data analysis traditionally considers a point cloud in a Euclidean space. In many situations, however, data is available in the form of a weighted graph. A graph along with the geodesic distance defines a metric space. This metric space of a graph is amenable to topological data analysis. We discuss the computation of persistent homologies on a weighted graph. We present a lazy witness complex approach leveraging the notion of ε-net that we adapt to weighted graphs and their geodesic distance to select landmarks. We show that the value of the ε parameter of the ε-net provides control on the trade-off between choice and number of landmarks and the quality of the approximate simplicial representation. We present three algorithms for constructing an ε-net of a graph. We comparatively and empirically evaluate the efficiency and effectiveness of the choice of landmarks that they induce for the topological data analysis of different real-world graphs.


Naheed Anjum Arafat, Debabrota Basu, Stéphane Bressan
School of Computing, National University of Singapore, Singapore
Data Science and AI Division, Chalmers University of Technology, Sweden


Topological data analysis (TDA) [5,24] involves computation of topological features of datasets, such as persistent homology classes, and the representation of these topological features using topological descriptors such as persistence barcodes [13]. In this section, we elaborate the computational blocks of topological data analysis as shown in Figure 1.

Simplicial Complex.

Topological data analysis computes the topological features of a dataset, such as persistent homology classes, by computing topological objects called simplicial complexes. A simplicial complex is constructed from simplices. Formally, a $k$-simplex is the convex hull of $(k + 1)$ data points. For instance, a 0-simplex $[v_0]$ is a single point, a 1-simplex $[v_0 v_1]$ is an edge, and a 2-simplex $[v_0 v_1 v_2]$ is a filled triangle. A $k$-homology class is an equivalence class of such $k$-simplicial complexes that cannot be reduced to a lower-dimensional simplicial complex [13].

arXiv preprint [cs.CG]

Fig. 1: Components of topological data analysis. [Figure blocks: Input Data; Computation of Topological Representations (Topological Representations, Filtration); Computation of Topological Features (Persistent Homology Classes); Representation of Topological Features (Topological Descriptors); Aggregation of Representations; Applications.]

In order to compute the $k$-homology classes, a practitioner does not have direct access to the underlying space of the point cloud, and it is combinatorially hard to compute the exact simplicial representation given by the Čech complex [31]. Thus, different approximations of the exact simplicial representation have been proposed: the Vietoris-Rips complex [29] and the lazy witness complex [10].

Approximate Simplicial Representations.

The Vietoris-Rips complex $R_\alpha(D)$, for a given dataset $D$ and real number $\alpha > 0$, is an abstract simplicial complex consisting of those $k$-simplices in which any two points $u, v$ are at distance at most $\alpha$. The Vietoris-Rips complex is the best possible ($\sqrt{2}$-)approximation of the Čech complex, and we use it as the baseline representation in this paper. In the worst case, the number of simplices in the Vietoris-Rips complex grows exponentially with the number of data points [31].

The lazy witness complex [10] approximates the Vietoris-Rips complex by constructing the simplicial complexes over a subset of data points $L$, referred to as the landmarks. Formally, given a positive integer $\nu$ and a real number $\alpha > 0$, the lazy witness complex $LW_\alpha(D, L, \nu)$ of a dataset $D$ is a simplicial complex over a landmark set $L$ where, for any two points $v_i, v_j$ of a $k$-simplex $[v_0 v_1 \cdots v_k]$, there is a point $w \in D$ whose $(d_\nu(w) + \alpha)$-neighbourhood contains $v_i$ and $v_j$. Here $d_\nu(w)$ is the geodesic distance from the point $w$ to its $\nu$-th nearest point in the landmark set $L$. In the worst case, the size of the lazy witness complex grows exponentially with the number of landmarks. Fewer landmarks accelerate the computation but yield a poorer approximation of the Vietoris-Rips complex, with loss of topological features. Thus, the trade-off between the approximation of topological features and the available computational resources dictates the choice of landmarks. We provide a quantification of this loss of topological features, which was absent in the literature.
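The two definitions above can be sketched on the level of edges (1-simplices). The following is a minimal illustration, not the implementation used in this paper: it assumes the pairwise distances are already available as a matrix `dist`, and the function names `rips_edges` and `lazy_witness_edges` are hypothetical. Higher-dimensional simplices of both complexes are the cliques of these edge sets.

```python
from itertools import combinations


def rips_edges(dist, points, alpha):
    """Edges of the Vietoris-Rips complex R_alpha on `points`:
    pairs of points at distance at most alpha."""
    return {(u, v) for u, v in combinations(sorted(points), 2)
            if dist[u][v] <= alpha}


def lazy_witness_edges(dist, data, landmarks, alpha, nu=1):
    """Edges of the lazy witness complex LW_alpha(D, L, nu).

    A pair of landmarks forms an edge if some witness w in `data` has both
    landmarks inside its (d_nu(w) + alpha)-neighbourhood, where d_nu(w) is
    the distance from w to its nu-th nearest landmark (nu >= 1).
    """
    landmarks = sorted(landmarks)
    edges = set()
    for w in data:
        d_nu = sorted(dist[w][l] for l in landmarks)[nu - 1]
        near = [l for l in landmarks if dist[w][l] <= d_nu + alpha]
        edges.update(combinations(near, 2))
    return edges


# Toy example (assumption): a path graph 0-1-2-3-4 with unit weights,
# so the geodesic distance between i and j is |i - j|.
dist = [[abs(i - j) for j in range(5)] for i in range(5)]
landmarks = [0, 2, 4]
print(sorted(rips_edges(dist, landmarks, alpha=1)))
print(sorted(lazy_witness_edges(dist, range(5), landmarks, alpha=1)))
```

On this toy path the witnesses 1 and 3 connect neighbouring landmarks at a filtration value where the Vietoris-Rips complex on the landmarks still has no edge, which illustrates why the two filtrations are compared only up to a multiplicative factor.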

Filtration and Representation of Topological Features.

As the value of the filtration parameter $\alpha$ increases, new simplices arrive and the topological features, i.e. the homology classes, start to appear. Some of the homology classes merge with the existing classes in a subsequent simplicial complex, and some of them persist indefinitely [13]. In order to capture the evolution of topological structure with scale, topological data analysis techniques construct a sequence of simplicial complex representations, called a filtration [13], for an increasing sequence of $\alpha$'s. In a given filtration, the persistence interval of a homology class is denoted by $[\alpha_b, \alpha_d)$, where $\alpha_b$ and $\alpha_d$ are the filtration values of its appearance and merging respectively. The persistence interval of an indefinitely persisting homology class is denoted as $[\alpha_b, \infty)$. Topological descriptors, such as barcodes [9], persistence diagrams [13], and persistence landscapes [4], represent persistence intervals in order to draw qualitative and quantitative inferences about the topological features. Distance measures between persistence diagrams, such as the $q$-Wasserstein and bottleneck distances [13], are often used to draw quantitative inferences.

Graph Topological Data Analysis.

Topological data analysis (TDA) [5] traditionally considers a point cloud in a Euclidean space. In many situations, however, data is available in the form of a weighted graph. Conveniently, the vertices of the graph with the geodesic distance define a metric space. This metric space is amenable to topological data analysis. This fact does not depend on whether the graph is embeddable in a Euclidean space or not. Though the metric space induced by a graph provides a different structure than point clouds to investigate, this metric space raises similar issues of scalable construction of representations and admits similar approximate representations, filtrations, and topological descriptors. This motivated us to exploit the generalisability of topological data analysis and to extend the ε-net induced lazy witness complexes [1] to graphs.

Our Contributions.

We investigate the computation of persistent homologies on a weighted graph. In Section 3, we present a lazy witness complex approach leveraging the notion of ε-net that we adapt to weighted graphs and their geodesic distance to select landmarks. We show that the ε parameter of the ε-net gives control over the trade-off between choice and number of landmarks and the quality of the approximate simplicial representation.

In Section 3.2, we prove that an ε-net is an ε-approximate representation of the point cloud with respect to the Hausdorff distance. We prove that the lazy witness complex induced by an ε-net, as a choice of landmarks, is a 3-approximation of the induced Vietoris-Rips complex.

In Section 4, we present three algorithms, namely Greedy-ε-net, Iterative-ε-net, and SPTpruning-ε-net, for constructing an ε-net of a graph. In Section 5, we comparatively and empirically evaluate the efficiency and effectiveness of the choice of landmarks that they induce for the topological data analysis of several real-world graphs.

In Section 6, we summarise the findings and the future directions of research that ε-net opens up for graph topological data analysis.

Graph TDA.

Existing applications of TDA to graphs focus on characterizing networks using features computed from persistent homology classes. [7] and [25] computed persistent homology at dimensions 0, 1, and 2 of the clique filtration to study weighted collaboration networks (size ∼ [...]) [...] between Vietoris-Rips persistence diagrams computed on each network. [21] studied the Vietoris-Rips filtration of functional brain networks computed on ∼100 regions of interest (points) in human brains with different clinical disorders. They focus on homology classes at dimensions 0 and 1. A related line of work regarding the topology of graphs involves graphs derived as a representation of point-cloud data (e.g. the neighbourhood graph) and their usage in data clustering [8] and inference of global topology from local information [28].

Approximate Simplicial Complexes.

The computational infeasibility of constructing the Čech complex and the Vietoris-Rips complex motivates the development of approximate simplicial representations such as lazy witness complexes, the sparse-Rips complex [26], and the graph induced complex (GIC) [11].

Applications of ε-net.

The concept of ε-net is a standard concept in analysis and topology [18], originating from the idea of $(\delta, \epsilon)$-limits formulated by Cauchy. ε-nets are sets in a metric space that cover the whole space and are well-separated. Nets have been used in nearest-neighbour search [20]. [15] used ε-nets for manifold reconstruction. The graph induced complex [11] uses the cliques in the neighbourhood graph to construct a simplicial complex over an ε-net. [17] proposed the net-tree data structure to represent ε-nets at all scales of ε. The net-tree is used to construct approximate well-separated pair decompositions [17] and approximate geometric spanners [17]. The sparse-Rips filtration [26] constructs a net-tree on the point cloud to decide which neighbouring points to delete. Contrary to Sheehy [26], we use an ε-net to select a fixed subset of points, called landmarks, and compute persistent homology using them.

ε-net of Graphs: Definition and Analysis

[1] proposes the ε-net induced lazy witness complex for a point cloud embedded in a Euclidean space for efficient computation of topological data analysis. In practice, the datasets may not be represented as a point cloud in a Euclidean space. The data may have different representations and non-Euclidean geometry. For instance, a dataset with contextual and relational structure is often represented using graphs. In a graphical representation of data, the vertices represent the data objects, the edges represent relations among the data objects, and the weights on the edges quantify the amplitude of each relation with respect to the others.

In this paper, we study both weighted and unweighted simple graphs. Since unweighted graphs are a special case of weighted graphs, we construct the definitions for weighted simple graphs. A weighted simple graph [3] $G(V, E, W)$ is a graph with a vertex set $V$, an edge set $E$, and a weight function $W : V \times V \to \mathbb{R}^+$, that does not contain any self-edge or multiple edge. The geodesic distance $d_G(u, v)$ between a pair $u, v$ of vertices in a graph is defined as the length of the shortest path between $u$ and $v$, where the path length is defined as the sum of the weights of the edges on the path connecting $u$ and $v$ [23]. In this paper we treat a graph $G = (V, E, W)$ as a set $V$ endowed with the canonical metric $d_G : V \times V \to \mathbb{R}^+$. Thus, a weighted simple graph $G$ transforms into a metric space $(V, d_G)$.

As we substitute the points in the point cloud with the vertices of the graph and the Euclidean distance between points with the geodesic distance between the vertices, we adapt the components of topological data analysis for simple weighted graphs.

ε-cover is a construction used in topology to compute inherent properties of a given space [18]. In this paper, we import the concept of ε-cover to define the ε-net of a graph. We use the ε-net of a graph as landmarks for constructing the lazy witness complex on that graph.

We show that the ε-net, as a choice of landmarks, has guarantees such as being an ε-approximate representation of the graph, its induced lazy witness complex being a 3-approximation of the corresponding Vietoris-Rips complex, and also bounding the number of landmarks for a given ε. These guarantees are absent for the other existing landmark selection algorithms such as the random and maxmin algorithms [27].

ε-net of a Graph

In this paper, we consider a graph $G = (V, E, W)$ as a finite metric space $(V, d_G)$. Since many graphs are neither Euclidean nor admit a Euclidean embedding, we extend the definitions of neighbourhood and cover from point-set topology as follows before defining the ε-net.

Definition 1 (ε-neighbourhood).

The ε-neighbourhood $\mathcal{N}_\epsilon(u)$ of a vertex $u \in V$ is the subset of the vertex set $V$ consisting of all vertices $v$ whose distance to $u$ satisfies $d_G(u, v) \le \epsilon$.

The notion of ε-cover for a graph generalises the geometric notion of cover using the set-theoretic notion of ε-neighbourhood in Definition 1.
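As an illustration of Definition 1, the geodesic distance and the ε-neighbourhood can be computed with a standard single-source shortest-path search. The sketch below uses Dijkstra's algorithm, which is our choice for illustration (the paper's own procedure, ε-BFS, appears later as Algorithm 2); the adjacency-dictionary format and the function names `geodesic_distances` and `eps_neighbourhood` are assumptions.

```python
import heapq


def geodesic_distances(adj, source):
    """Dijkstra's algorithm: geodesic distance d_G(source, v) for every
    vertex v reachable from `source`.

    `adj` maps each vertex to a dict {neighbour: positive edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def eps_neighbourhood(adj, u, eps):
    """N_eps(u): all vertices within geodesic distance eps of u."""
    return {v for v, d in geodesic_distances(adj, u).items() if d <= eps}


# Toy weighted graph (assumption): the direct edge a-c has weight 4,
# but the path a-b-c has length 3, so d_G(a, c) = 3.
adj = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 4.0, "b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}
print(geodesic_distances(adj, "a"))
print(sorted(eps_neighbourhood(adj, "a", 3.0)))
```

The example shows that the geodesic distance may differ from the weight of a direct edge, which is why the neighbourhood is defined through shortest paths and not through adjacency.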

Definition 2 (ε-cover).

An ε-cover of $G$ is a finite collection $\{\mathcal{N}_{\epsilon/2}(u_i)\}$ of $\epsilon/2$-neighbourhoods of vertices in $G$ such that $\cup_i \mathcal{N}_{\epsilon/2}(u_i) = V$.

By the triangle inequality, any set in the ε-cover has the property that any two vertices $u, v$ in it have geodesic distance $d_G(u, v) \le \epsilon$. We differentiate an ε-cover from the set of vertices whose ε-neighbourhoods determine that cover by defining the ε-sample.

Definition 3 (ε-sample).

A set $L = \{u_1, u_2, \ldots, u_{|L|}\} \subseteq V$ is an ε-sample of graph $G$ if the collection $\{\mathcal{N}_\epsilon(u_i) : u_i \in L\}$ of ε-neighbourhoods covers $G$, i.e. $\cup_i \mathcal{N}_\epsilon(u_i) = V$.

ε-neighbourhoods may intersect, which is not an intended property if we want to decrease the size of the ε-sample. We combine the notion of ε-cover with the notion of ε-sparsity to define the ε-net.

Definition 4 (ε-sparse).

A set $L = \{u_1, u_2, \ldots, u_{|L|}\} \subset V$ is ε-sparse if for any distinct $u_i, u_j \in L$, $d_G(u_i, u_j) > \epsilon$ in graph $G$.

An ε-net of graph $G$ is a subset of $V$ that is both an ε-sample of $G$ and ε-sparse. The ε-net, when considered as the landmark set $L$, induces a metric subspace $(L, d_L)$ of the metric space $(V, d_G)$ of the graph, where $d_L$ is the metric induced on the set $L$ by the geodesic metric $d_G$.

Definition 5 (ε-net).

A subset $L \subset V$ is an ε-net of $G$ if $L$ is ε-sparse and an ε-sample of $G$.

Relating ε-net and Other Graph Theoretic Concepts.

The definition of ε-net on graphs generalises the notions of independent set and dominating set for undirected graphs [3]. Any 1-net of an undirected graph $G = (V, E)$ is an independent set of $G$ and vice versa. Any minimal cardinality 1-net of $G$ is a dominating set of $G$, and vice versa.

ε-net of a Graph

An ε-net of a simple, weighted, connected graph comes with approximation guarantees irrespective of its algorithmic construction. In this section, we provide the following three analyses of an ε-net:

1. An ε-net of a connected graph is an ε-approximation of its set of vertices in Hausdorff distance.
2. The lazy witness complex induced by an ε-net is a 3-approximation of the Vietoris-Rips complex induced by the same set.
3. For a graph of diameter $\Delta$, there exists an ε-net of size at most $(\Delta/\epsilon)^{O(\log(|V|/\epsilon))}$.

Graph Approximation Guarantee of an ε-net.

We use Lemma 1 to prove that the ε-net of a graph $G = (V, E, W)$ is an ε-approximation of $V$ in the Hausdorff metric (Theorem 1). Lemma 1 follows from the ε-sample property of an ε-net.

Lemma 1.

Let $L$ be an ε-net of graph $G$. For any vertex $v \in V$, there exists a point $u \in L \subseteq V$ such that the geodesic distance $d_G(u, v) \le \epsilon$.

Proof. Since $V = \cup_{u \in L} \mathcal{N}_\epsilon(u)$, for any vertex $v \in V$ there exists a $u \in L$ such that $v \in \mathcal{N}_\epsilon(u)$. As $v \in \mathcal{N}_\epsilon(u)$, by definition of the ε-neighbourhood, the length of the shortest path from $v$ to $u$ is at most $\epsilon$, i.e. $d_G(u, v) \le \epsilon$.

Theorem 1.

The Hausdorff distance between $(V, d_G)$ and its ε-net induced subspace $(L, d_L)$ is at most $\epsilon$.

Proof. For any $u \in L \subseteq V$, there exists a vertex $v \in V$ such that $d_G(u, v) \le \epsilon$, by definition of the ε-neighbourhood. Hence, $\max_L \min_V d_G(u, v) \le \epsilon$. By Lemma 1, $\max_V \min_L d_G(u, v) \le \epsilon$. Since the Hausdorff distance $d_H(V, L)$ is defined as the maximum of $\max_L \min_V d_G(u, v)$ and $\max_V \min_L d_G(u, v)$, $d_H(V, L)$ is upper bounded by $\epsilon$.

Topological Approximation Guarantee of an ε-net induced Lazy witness complex on graphs.

In addition to an ε-net being an ε-approximation of the space $(V, d_G)$, we prove that the lazy witness complex induced by the ε-net, as landmarks, is a good approximation (Theorem 2) to the Vietoris-Rips complex on the same set of vertices. This approximation ratio is independent of the algorithm constructing the ε-net. As a step towards Theorem 2, we state Lemma 2. Lemma 2 is implied by the definition of the lazy witness complex and of the ε-sample. Lemma 2 establishes the relation between the 1-nearest neighbours of points in an ε-net.

Lemma 2. If $L$ is an ε-net of $(V, d_G)$, the distance $d_G(u, v)$ from any vertex $u \in L$ to its 1-nearest neighbour $v \in V$ is at most $\epsilon$.

Theorem 2 shows that a lazy witness complex induced by ε-net landmarks is a 3-approximation of the Vietoris-Rips complex on the landmarks beyond a certain value of the filtration parameter.
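Definitions 3 to 5 and the Hausdorff bound of Theorem 1 can be checked numerically on a small graph. The sketch below is only an illustration under stated assumptions: the geodesic distances are given as a precomputed matrix `dist`, and the helper names (`is_eps_sample`, `is_eps_sparse`, `is_eps_net`, `hausdorff`) are hypothetical.

```python
def is_eps_sample(dist, vertices, landmarks, eps):
    """Definition 3: every vertex lies in the eps-neighbourhood of a landmark."""
    return all(any(dist[l][v] <= eps for l in landmarks) for v in vertices)


def is_eps_sparse(dist, landmarks, eps):
    """Definition 4: distinct landmarks are more than eps apart."""
    ls = list(landmarks)
    return all(dist[a][b] > eps
               for i, a in enumerate(ls) for b in ls[i + 1:])


def is_eps_net(dist, vertices, landmarks, eps):
    """Definition 5: eps-sample and eps-sparse."""
    return (is_eps_sample(dist, vertices, landmarks, eps)
            and is_eps_sparse(dist, landmarks, eps))


def hausdorff(dist, vertices, landmarks):
    """Hausdorff distance between the vertex set and the landmark set."""
    vertices, landmarks = list(vertices), list(landmarks)
    v_to_l = max(min(dist[v][l] for l in landmarks) for v in vertices)
    l_to_v = max(min(dist[l][v] for v in vertices) for l in landmarks)
    return max(v_to_l, l_to_v)


# Toy example (assumption): path graph 0-1-2-3-4 with unit weights.
dist = [[abs(i - j) for j in range(5)] for i in range(5)]
V = range(5)
print(is_eps_net(dist, V, [0, 2, 4], eps=1))   # sample and sparse
print(is_eps_net(dist, V, [0, 1], eps=1))      # neither sample nor sparse
print(hausdorff(dist, V, [0, 2, 4]))           # bounded by eps (Theorem 1)
```

For the landmark set {0, 2, 4} and ε = 1 the Hausdorff distance equals 1, so the bound of Theorem 1 is attained on this example.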

Theorem 2. If $L$ is an ε-net of the point cloud $V$ for $\epsilon \in \mathbb{R}^+$, $LW_\alpha(V, L, \nu = 1)$ is the lazy witness complex of $L$ at filtration value $\alpha$, and $R_\alpha(L)$ is the Vietoris-Rips complex of $L$ at filtration value $\alpha$, then $R_{\alpha/3}(L) \subseteq LW_\alpha(V, L, 1) \subseteq R_{3\alpha}(L)$ for $\alpha \ge 2\epsilon$.

Proof. In order to prove the first inclusion, consider a $k$-simplex $\sigma_k = [x_0 \cdots x_k] \in R_{\alpha/3}(L)$. For any edge $[x_i x_j] \in \sigma_k$, let $w_t$ be the point in $V$ that is nearest to the vertices of $[x_i x_j]$. Without loss of generality, let that vertex be $x_j$. Since $w_t$ is the nearest neighbour of $x_j$, by Lemma 2, $d_G(w_t, x_j) \le \epsilon \le \alpha/2$. Since $[x_i x_j] \in R_{\alpha/3}(L)$, $d_G(x_i, x_j) \le \alpha/3 < \alpha/2$. By the triangle inequality, $d_G(w_t, x_i) \le \alpha/2 + \alpha/2 \le \alpha$. Hence, $x_i$ is within distance $\alpha$ from $w_t$. The $\alpha$-neighbourhood of point $w_t$ contains both $x_i$ and $x_j$. Since $d_1(w_t) \ge 0$, the $(d_1(w_t) + \alpha)$-neighbourhood of $w_t$ also contains $x_i, x_j$. Therefore, $[x_i x_j]$ is an edge in $LW_\alpha(V, L, 1)$. Since the argument is true for any $x_i, x_j \in \sigma_k$, the $k$-simplex $\sigma_k \in LW_\alpha(V, L, 1)$.

In order to prove the second inclusion, consider a $k$-simplex $\sigma_k = [x_0 x_1 \cdots x_k] \in LW_\alpha(V, L, 1)$. By definition of the lazy witness complex, for any edge $[x_i x_j]$ of $\sigma_k$ there is a witness $w \in V$ such that the $(d_1(w) + \alpha)$-neighbourhood of $w$ contains both $x_i$ and $x_j$. Hence, $d_G(w, x_i) \le d_1(w) + \alpha \le \epsilon + \alpha$ (by Lemma 2) $\le 3\alpha/2$. Similarly, $d_G(w, x_j) \le 3\alpha/2$. By the triangle inequality, $d_G(x_i, x_j) \le 3\alpha$. Therefore, $[x_i x_j]$ is an edge in $R_{3\alpha}(L)$. Since the argument is true for any $x_i, x_j \in \sigma_k$, the $k$-simplex $\sigma_k \in R_{3\alpha}(L)$.

Discussion.

Theorem 2 implies that the interleaving of the lazy witness filtration $LW = LW_\alpha(L)$ and the Vietoris-Rips filtration $R = R_\alpha(L)$ occurs when $\alpha > 2\epsilon$. As a consequence, their corresponding partial persistence diagrams $Dgm_{>2\epsilon}(LW)$ and $Dgm_{>2\epsilon}(R)$ are $3 \log 3$-approximations of each other in log-scale, by the persistence approximation lemma [26]. In Section 5, we empirically validate this bound for the lazy witness complex induced by the ε-net landmarks.

Size of an ε-net.

We prove an upper bound on the size of an ε-net of a connected unweighted graph using the doubling dimension.

The doubling dimension [16] of a metric space $M = (X, d)$ is the smallest positive number $D$ such that any ε-neighbourhood in $M$ can be covered by $2^D$ $\epsilon/2$-neighbourhoods. A metric space is called doubling if its doubling dimension is bounded. The space $(V, d_G)$ is a doubling metric space.

Gupta et al. [16] showed that the doubling dimension $D(G)$ of an unweighted connected graph $G$ is related to its local density. The local density of an unweighted connected graph $G$, denoted $\beta(G)$, is the smallest value $\beta$ such that $|\mathcal{N}_\epsilon(v)| \le \beta\epsilon$ for all $v \in V$ and $\epsilon \in \mathbb{N}$. To be precise, the doubling dimension $D(G) = O(\log \beta(G))$ [16]. We use this result along with the following lemma to prove a bound on the size of an ε-net of an unweighted graph in Theorem 3.

Lemma 3.

For any connected unweighted graph $G$ of diameter $\Delta$ and doubling dimension $D(G)$, there exists an ε-net of size at most $(\Delta/\epsilon)^{D(G)}$, where $\Delta \ge \epsilon \ge 1$.

Proof. Let $\Delta$ be the diameter of an unweighted graph $G$ of doubling dimension $D(G)$. Thus, $V$ (a $\Delta$-neighbourhood) can be covered by $2^{D(G)}$ $\Delta/2$-neighbourhoods. Each of the $\Delta/2$-neighbourhoods can in turn be covered by $2^{D(G)}$ $\Delta/4$-neighbourhoods. Thus, $V$ can be covered by $2^{2D(G)}$ $\Delta/4$-neighbourhoods. Repeating this step $\log_2(\Delta/\epsilon)$ times, we get that $V$ can be covered by $2^{D(G) \log_2(\Delta/\epsilon)}$ ε-neighbourhoods. Each of the ε-neighbourhoods contains at most one ε-net landmark. Hence, there exists an ε-net of size $(\Delta/\epsilon)^{D(G)}$ for any connected, unweighted graph $G$.

Theorem 3.

For any connected unweighted graph $G = (V, E)$ of diameter $\Delta$, there exists an ε-net of size at most $(\Delta/\epsilon)^{O(\log(|V|/\epsilon))}$.

Proof. In a connected unweighted graph $G$, the size $|\mathcal{N}_\epsilon(v)|$ of the ε-neighbourhood of a vertex $v$ is greater than $\epsilon$ for $\Delta \ge \epsilon \ge 1$. Thus, the local density $\beta(G) > 1$. Moreover, $\max_v |\mathcal{N}_\epsilon(v)| < |V|$. Thus, $\beta(G) = \max_{v, \epsilon} |\mathcal{N}_\epsilon(v)| / \epsilon$ is at most $|V|/\epsilon$. Applying the result from Gupta et al. [16], the doubling dimension $D(G) = O(\log(|V|/\epsilon))$. The rest follows from Lemma 3.

ε-Net

In this section, we propose and elaborate three algorithms, namely Greedy-ε-Net, Iterative-ε-Net, and SPTpruning-ε-net, for computing an ε-net on graphs.

ε-Net Algorithm

We propose a greedy algorithm, namely Greedy-ε-Net, to compute a minimal-cardinality ε-net of a graph. Greedy-ε-Net (Algorithm 1) maintains a hash table with vertices as keys and the number of vertices in their ε-cover as values. At each step, Greedy-ε-Net selects a vertex with the largest ε-cover, marks the covered vertices, and updates the ε-cover counts of the other vertices, until all the vertices are marked as covered.
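The greedy selection described above can be sketched in a few lines. This is a simplified illustration and not the authors' C++ implementation: it replaces ε-BFS with a truncated Dijkstra search, recomputes the number of still-uncovered vertices with set intersections instead of maintaining a hash table of counts, and uses the hypothetical names `eps_cover` and `greedy_eps_net`. Being a greedy heuristic, it returns an ε-net but does not certify minimal cardinality.

```python
import heapq


def eps_cover(adj, u, eps):
    """Vertices within geodesic distance eps of u (truncated Dijkstra).

    `adj` maps each vertex to a dict {neighbour: positive edge weight}.
    """
    dist = {u: 0.0}
    heap = [(0.0, u)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x].items():
            nd = d + w
            if nd <= eps and nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return set(dist)


def greedy_eps_net(adj, eps):
    """Greedily pick the unmarked vertex covering most unmarked vertices."""
    cover = {u: eps_cover(adj, u, eps) for u in adj}
    unmarked = set(adj)
    landmarks = []
    while unmarked:
        # Ties are broken by vertex order; max() keeps the first maximum.
        u = max(sorted(unmarked), key=lambda x: len(cover[x] & unmarked))
        landmarks.append(u)
        unmarked -= cover[u]  # mark every vertex covered by u
    return landmarks


# Toy example (assumption): path graph 0-1-2-3-4 with unit weights.
adj = {i: {j: 1 for j in (i - 1, i + 1) if 0 <= j < 5} for i in range(5)}
print(greedy_eps_net(adj, eps=1))
```

Choosing each new landmark among the unmarked vertices keeps the landmark set ε-sparse, because an unmarked vertex is farther than ε from every landmark chosen so far; the loop ends only when every vertex is covered, which gives the ε-sample property.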

Algorithm 1 Greedy-ε-Net

Input: Graph G = (V, E, W), parameter ε
Output: Set of landmarks L

Initialise L = ∅
Let nc be the hash table with vertices as keys and the number of vertices in their ε-cover as values.
for all u ∈ V do
    nc[u] = |ε-BFS(G, u, ε)|
end for
Initialise all vertices u ∈ V as unmarked.
repeat
    Sort nc in descending order of its values and let u be its first key.
    if marked[u] = False then
        L.insert(u)
        for all vertices u′ in u's ε-cover, including u itself, do
            Mark u′ as True.
            Delete key u′ from nc.
            for all v ∈ V do
                if u′ is in the ε-cover of v then
                    Decrease nc[v] by 1.
                end if
            end for
        end for
    end if
    Delete key u from nc.
until all vertices are marked

Algorithm 2 ε-BFS

Input: Graph G = (V, E, W), vertex u, parameter ε.
Output: Set of vertices in u's ε-cover, C_ε

Initialise Queue Q = {u}
Initialise C_ε = ∅
u.d = 0
while Q ≠ ∅ do
    v = DEQUEUE(Q)
    v.marked = True
    for all v′ ∈ G.Adj[v] do
        if v′.marked = False then
            v′.d = v.d + W[v, v′]
            if v′.d ≤ ε then
                C_ε = C_ε ∪ {v′}
                ENQUEUE(Q, v′)
            end if
        end if
    end for
end while

ε-Net Algorithm

Iterative-ε-Net (Algorithm 3) is a diffusive algorithm that maintains a set $C_{(\epsilon, 2\epsilon)}$ containing the unmarked vertices that are within a ring of $(\epsilon, 2\epsilon]$ distance from the current set of landmarks. We call them the ring vertices. Iterative-ε-Net also maintains another set $C_{2\epsilon}$ containing the unmarked vertices that are at distance at least $2\epsilon$ but are adjacent to the bordering vertices of the cover. We call them the enveloping vertices.

At each iteration, Iterative-ε-Net selects a vertex $u$ uniformly at random from the current set of enveloping vertices as the next landmark, and runs PartialBFS (Algorithm 4) starting at $u$ to mark the vertices in its ε-cover, as well as to update the enveloping and ring vertex sets. If the enveloping vertex set is empty, it selects a ring vertex uniformly at random as the next landmark.

The Iterative-ε-Net algorithm has the property that some vertex in the ε-cover of landmark $l_{i+1}$ is always adjacent to some vertex in the ε-cover of landmark $l_i$. Thus the two covers are adjacent as sets.

Algorithm 3 Iterative-ε-Net

Input: Graph G = (V, E, W), parameter ε
Output: Set of landmarks L

Initialise L = ∅
i = 1
Select initial landmark l_i = u uniformly at random from V.
L = L ∪ {u}
Let C^u_(ε,2ε) be the set of unmarked vertices v such that ε < d_G(u, v) ≤ 2ε.
Let C^u_2ε be the set of vertices v such that d_G(u, v) > 2ε and are the closest.
repeat
    C^u_(ε,2ε), C^u_2ε = PartialBFS(G, l_i, C^u_(ε,2ε), C^u_2ε)
    if C^u_2ε is empty then
        select l_(i+1) uniformly at random from C^u_(ε,2ε).
    else
        select l_(i+1) uniformly at random from C^u_2ε.
    end if
    L = L ∪ {l_(i+1)}
    u = l_(i+1)
    i = i + 1
until all vertices are marked

Algorithm 4 PartialBFS

Input: Graph G, vertex u, set C^u_(ε,2ε), set C^u_2ε.

Initialise Queue Q = {u}
u.marked = True
while Q ≠ ∅ do
    v = DEQUEUE(Q)
    for all v′ ∈ G.Adj[v] do
        if v′.marked = False then
            v′.d = v.d + W[v, v′]
            if v′.d ≤ ε then
                v′.marked = True
                Remove v′ from C^u_(ε,2ε) and C^u_2ε if it exists.
                ENQUEUE(Q, v′)
            else if ε < v′.d ≤ 2ε then
                C^u_(ε,2ε) = C^u_(ε,2ε) ∪ {v′}
                ENQUEUE(Q, v′)
            else
                C^u_2ε = C^u_2ε ∪ {v′}
            end if
        end if
    end for
end while

We propose the SPTpruning-ε-Net algorithm (Algorithm 5), which constructs a shortest path tree of the graph and uses the tree to compute an ε-net. Algorithm 5 computes a shortest path tree rooted at a vertex chosen uniformly at random. The algorithm uses ε-BFS (Algorithm 2) to construct a preliminary BFS spanning tree of the graph (line 2). Then the algorithm constructs an ε-net of the BFS-tree (lines 4-28). It does so by traversing the tree in level order starting from the root, running ε-BFS in the tree to mark covered vertices, and adding the set of unmarked vertices at level $\epsilon + 1$ as candidates for landmarks.

An ε-net $L_{SPT}$ of a BFS-tree $SPT(G)$ of a graph $G$ has the property that any vertex $v \in SPT(G) \subset G$ that is covered by some vertex $u \in L_{SPT}$ is also covered by $u \in V$ in the graph. This property follows from the fact that the distance $d_G(u, v)$ between any vertex pair in the graph is at most their distance $d_{SPT}(u, v)$ in the BFS-tree.

Unless one of the vertices is the root, the distance between a pair of vertices in the tree is not guaranteed to be the shortest in the graph. Thus an ε-net of a BFS-tree is not guaranteed to be ε-sparse in the graph containing the BFS-tree as a subgraph. As a remedy, Algorithm 6 prunes from the candidate landmarks the vertices that are covered by another candidate landmark.

In this section, we experimentally and comparatively analyse the performance of the three proposed algorithms to compute an ε-net of graphs. We discuss the effectiveness and efficiency of the algorithms and also validate that the lazy witness complex induced by the ε-net computed by any of these algorithms is a 3-approximation of the Vietoris-Rips complex.

We evaluate the performance of our algorithms using two real-world datasets. The dataset Power [30] is an unweighted graph of the US power grid (4941 vertices, 6594 edges, diameter 46). Celegans [2] is a weighted graph of the frontal neural network of the C. elegans worm (297 vertices, 2148 edges, diameter 1.333).

Algorithm 5 SPTpruning-ε-net

Input: Graph G, diameter ∆, parameter ε
Output: Set of landmarks C

  Select a vertex u uniformly at random from V.
  Run ε-BFS(G, u, ∆) to construct a BFS spanning tree rooted at u.
  Let SPT be the tree.
  Initialise Queue Q = {u}
  u.marked = True
  Let ε-net of the SPT C_ε = φ
  repeat
    while Q ≠ φ do
      u′ = DEQUEUE(Q)
      u′.marked = True
      u′.d = 0
      Initialise Queue Q′ = {u′}
      while Q′ ≠ φ do
        v = DEQUEUE(Q′)
        for all v′ ∈ SPT.Adj[v] do
          if v′.marked = False then
            v′.d = v.d + W[v, v′]
            if v′.d ≤ ε then
              v′.marked = True
              ENQUEUE(Q′, v′)
            else
              C_ε = C_ε ∪ {v′}
              ENQUEUE(Q, v′)
            end if
          end if
        end for
      end while
    end while
  until all vertices in SPT are marked
  C = Prune(G, C_ε, ε)

Algorithm 6 Prune

Input: Graph G, a set of landmarks C_ε, parameter ε
Output: Set of landmarks C

  C = C_ε
  for all vertex v ∈ C_ε do
    if v.marked = False then
      Run ε-BFS(G, v, ε)
    else
      C = C \ {v}
    end if
  end for

[Figure 2 (Celegans): three panels plotting the number of landmarks, the landmark selection time (ms) and the total computation time (ms) against ε, for Greedy-ε-net, Iterative-ε-net and SPTpruning-ε-net.]

Fig. 2: Performance of the algorithms on Celegans dataset.

[Figure 3 (Power): three panels plotting the number of landmarks, the landmark selection time (ms) and the total computation time (ms) against ε, for Greedy-ε-net, Iterative-ε-net and SPTpruning-ε-net.]

Fig. 3: Performance of the algorithms on Power dataset.
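The two-stage idea of Algorithms 5 and 6 (collect candidate landmarks on a spanning tree, then prune them using distances in the graph) can be illustrated with a short Python sketch. This is one possible reading and not the authors' C++ implementation. The following are assumptions made for illustration: the graph is a connected, undirected adjacency dictionary with non-negative weights and comparable vertex labels; Dijkstra's algorithm stands in for the ε-BFS routine, which is defined outside this section; the root is passed in rather than drawn at random; the root is itself treated as a candidate, which the pseudocode as printed does not state; and pruning tracks coverage in a fresh set rather than reusing the `marked` flags. The names `dijkstra` and `spt_pruning_eps_net` are hypothetical.

```python
import heapq
from collections import deque


def dijkstra(graph, source, radius=float("inf")):
    """Shortest-path distances and tree parents for vertices within `radius`."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, weight in graph[v].items():
            nd = d + weight
            if nd <= radius and nd < dist.get(w, float("inf")):
                dist[w] = nd
                parent[w] = v
                heapq.heappush(heap, (nd, w))
    return dist, parent


def spt_pruning_eps_net(graph, root, eps):
    """Candidate landmarks from a shortest-path tree, pruned in the graph."""
    # Shortest-path tree rooted at `root` (assumed stand-in for the ε-BFS tree).
    _, parent = dijkstra(graph, root)
    tree = {v: {} for v in graph}
    for v, p in parent.items():
        if p is not None:
            tree[v][p] = tree[p][v] = graph[v][p]

    # Stage 1 (Algorithm 5): walk the tree from each centre; a vertex farther
    # than eps from the current centre becomes a new candidate and centre.
    marked = {root}
    queued = {root}
    candidates = [root]  # assumption: the root is also a candidate
    pending = deque([root])
    while pending:
        centre = pending.popleft()
        marked.add(centre)
        depth = {centre: 0.0}
        frontier = deque([centre])
        while frontier:
            v = frontier.popleft()
            for w, weight in tree[v].items():
                if w in marked or w in queued:
                    continue
                depth[w] = depth[v] + weight
                if depth[w] <= eps:
                    marked.add(w)
                    frontier.append(w)
                else:
                    candidates.append(w)
                    queued.add(w)
                    pending.append(w)

    # Stage 2 (Algorithm 6): keep a candidate only if no previously kept
    # landmark lies within eps of it in the graph metric.
    covered = set()
    landmarks = []
    for c in candidates:
        if c in covered:
            continue
        landmarks.append(c)
        ball, _ = dijkstra(graph, c, radius=eps)
        covered.update(ball)
    return landmarks
```

On a cycle of seven vertices with unit weights, root 0 and ε = 2, the tree walk proposes vertices 0, 3 and 4. Vertices 3 and 4 are far apart in the tree but adjacent in the graph, so pruning drops vertex 4 and the sketch returns the landmarks 0 and 3.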

We implement the experimental workflow in C++. We use the SNAP library [22] for graph processing and ε-net computations. We modify the Ripser library (https://github.com/Ripser/ripser) to compute lazy witness complexes and their persistence intervals. We use the R-TDA package [14] to compute bottleneck distances. All experiments are run on a machine with an Intel(R) Xeon(R) CPU at 2.20 GHz and an 80 GB memory limit. We set the lazy witness parameter ν = 1 in all computations. We set the maximum value of the filtration parameter to the diameter of the corresponding dataset. We compute persistence intervals in dimensions 0 and 1.

We measure the efficiency of our algorithms in terms of the CPU time (in ms) required to select ε-net landmarks for a given ε and to complete the overall computation of the persistent homology of each graph. The total computation time includes the time an algorithm spends constructing the ε-net and the time spent computing persistence intervals in dimensions 0 and 1. We measure the effectiveness of the algorithms using the number of landmarks selected by the corresponding ε-net construction.

[Figure 4: bottleneck distances for Iterative-ε-net, Greedy-ε-net and SPTpruning-ε-net, with one panel per algorithm for dimension 0 and for dimension 1.]

Fig. 4: Validation of the approximation guarantee of the ε-net induced lazy witness complexes for the Celegans dataset. The bottleneck distance between the partial persistence diagrams of the corresponding Vietoris-Rips and lazy witness filtrations is less than 3 log 3 in both dimensions 0 and 1.

Figures 2 and 3 illustrate the experimental results for the Celegans and the Power datasets respectively. We observe that the number of landmarks selected by all the algorithms decreases as ε increases. The landmark selection time of Greedy-ε-net increases as ε increases, independent of the dataset. For the other two algorithms, the landmark selection time varies depending on the density of the graph. The landmark selection times of Iterative-ε-net and SPTpruning-ε-net are almost invariant with ε.

We observe that Iterative-ε-net takes longer (> 10 ms) to select landmarks than the SPTpruning-ε-net algorithm, but it selects fewer landmarks. Thus, the overall runtime of lazy witness complex construction using Iterative-ε-net is smaller than that of SPTpruning-ε-net. The empirical performance analysis indicates that Iterative-ε-net is the practical and efficient choice for constructing ε-net induced lazy witness complexes on graphs.
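The efficiency measure used in these experiments, CPU time in milliseconds for landmark selection and for the whole pipeline, could be collected along the following lines. The authors' workflow is in C++; this Python helper is only an illustrative sketch, and the name `cpu_time_ms` is hypothetical. It relies on `time.process_time`, which reports the CPU time of the current process and excludes time spent sleeping.

```python
import time


def cpu_time_ms(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, CPU time in milliseconds)."""
    start = time.process_time()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.process_time() - start) * 1000.0
    return result, elapsed_ms
```

Timing the landmark selection call and the persistence computation separately, then adding the two, would give the total computation time in the sense used above.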

In order to validate the approximation guarantee in Theorem 2, we construct the Vietoris-Rips complex and lazy witness complexes using ε-nets for different values of ε and different algorithms. We compare the corresponding complexes by computing bottleneck distances between the persistence diagrams in dimensions 0 and 1. For this purpose, we retain only the partial diagram consisting of the points born after 2ε. Figure 4 validates the guarantee on the Celegans dataset. We omit the validation on the Power dataset for the sake of brevity.
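The restriction to the partial diagram can be sketched in a few lines of Python. The representation of a persistence diagram as a list of (birth, death) pairs and the name `partial_diagram` are assumptions made for illustration; the bottleneck distance itself would then be computed on the filtered diagrams by a separate routine, such as the R-TDA package used in the experiments.

```python
def partial_diagram(diagram, eps):
    """Keep only the (birth, death) points born strictly after 2 * eps."""
    threshold = 2 * eps
    return [(birth, death) for (birth, death) in diagram if birth > threshold]


# Example: with eps = 0.25 only points born after 0.5 are retained.
example = [(0.0, 1.0), (0.5, 2.0), (0.75, 1.5)]
filtered = partial_diagram(example, 0.25)
```

The strict inequality reflects the phrase "born after 2ε"; whether a point born exactly at 2ε should be kept is not specified in the text.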

We investigate the computation of persistent homology on weighted graphs. We extend the notion of ε-net from point clouds to weighted graphs. We further propose an ε-net induced lazy witness complex that uses an ε-net computed under the geodesic distance of the graph to select landmarks.

We prove that an ε-net of a connected graph is an ε-approximation of its set of vertices in Hausdorff distance. We also prove that the lazy witness complex induced by an ε-net is a 3-approximation of the Vietoris-Rips complex induced by the same set. We prove the existence of an ε-net (of a graph) of size at most (∆/ε)^O(log(|V|/ε)), where ∆ is the diameter of the graph.

We present three algorithms for constructing an ε-net of a graph. We comparatively and empirically evaluate the efficiency and effectiveness of the choice of landmarks that they induce for the topological data analysis of real-world graphs. The empirical performance analysis indicates that Iterative-ε-net is the practical and efficient choice for constructing ε-net induced lazy witness complexes on graphs.

An interesting direction for future work is to leverage the ε-net induced lazy witness complex for faster and more scalable computation in machine learning problems such as clustering [8], deep learning [19] and kernel density estimation [6], for both graphs and point clouds.

Acknowledgement

This work is partially supported by the National University of Singapore Institute for Data Science project WATCHA and by the Singapore Ministry of Education project Janus.

References

1. Arafat, N.A., Basu, D., Bressan, S.: Topological data analysis with ε-net induced lazy witness complex. arXiv preprint arXiv:1906.06122 (2019)
2. Badhwar, R., Bagler, G.: Control of neuronal network in Caenorhabditis elegans. PloS one 10(9), e0139204 (2015)
3. Berge, C.: Graphs and hypergraphs. North-Holland, New York (1976)
4. Bubenik, P.: Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research 16(1), 77–102 (2015)
5. Carlsson, G.: Topology and data. Bulletin of the American Mathematical Society 46(2), 255–308 (2009)
6. Carriere, M., Cuturi, M., Oudot, S.: Sliced Wasserstein kernel for persistence diagrams. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. pp. 664–673. JMLR.org (2017)
7. Carstens, C.J., Horadam, K.J.: Persistent homology of collaboration networks. Mathematical Problems in Engineering 2013(6), 1–7 (2013)
8. Chazal, F., Guibas, L.J., Oudot, S.Y., Skraba, P.: Persistence-based clustering in Riemannian manifolds. Journal of the ACM (JACM) 60(6), 41 (2013)
9. Collins, A., Zomorodian, A., Carlsson, G., Guibas, L.J.: A barcode shape descriptor for curve point cloud data. Computers & Graphics 28(6), 881–894 (2004)
10. De Silva, V., Carlsson, G.: Topological estimation using witness complexes. In: Proceedings of the First Eurographics Conference on Point-Based Graphics. pp. 157–166. Eurographics Association (2004)
11. Dey, T.K., Fan, F., Wang, Y.: Graph induced complex on point data. Computational Geometry 48(8), 575–588 (2015)
12. Duman, A.N., Pirim, H.: Gene coexpression network comparison via persistent homology. International Journal of Genomics 2018 (2018)
13. Edelsbrunner, H., Harer, J.: Computational topology: an introduction. American Mathematical Soc. (2010)
14. Fasy, B.T., Kim, J., Lecci, F., Maria, C.: Introduction to the R package TDA. arXiv preprint arXiv:1411.1830 (2014)
15. Guibas, L.J., Oudot, S.Y.: Reconstruction using witness complexes. Discrete & Computational Geometry 40(3), 325–356 (2008)
16. Gupta, A., Krauthgamer, R., Lee, J.R.: Bounded geometries, fractals, and low-distortion embeddings. In: 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. pp. 534–543. IEEE (2003)
17. Har-Peled, S., Mendel, M.: Fast construction of nets in low-dimensional metrics and their applications. SIAM Journal on Computing 35(5), 1148–1184 (2006)
18. Heinonen, J.: Lectures on analysis on metric spaces. Springer Science & Business Media (2012)
19. Hofer, C., Kwitt, R., Niethammer, M., Dixit, M.: Connectivity-optimized representation learning via persistent homology. In: Proceedings of the 36th International Conference on Machine Learning. vol. 97, pp. 2751–2760. PMLR (2019)
20. Krauthgamer, R., Lee, J.R.: Navigating nets: simple algorithms for proximity search. In: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 798–807 (2004)
21. Lee, H., Chung, M.K., Kang, H., Kim, B.N., Lee, D.S.: Discriminative persistent homology of brain networks. In: 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. pp. 841–844. IEEE (2011)
22. Leskovec, J., Sosič, R.: SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8(1), 1 (2016)
23. Newman, M.E.: Analysis of weighted networks. Physical Review E 70(5) (2004)
24. Otter, N., Porter, M.A., Tillmann, U., Grindrod, P., Harrington, H.A.: A roadmap for the computation of persistent homology. EPJ Data Science 6(1), 17 (2017)
25. Petri, G., Scolamiero, M., Donato, I., Vaccarino, F.: Topological strata of weighted complex networks. PloS one 8(6), e66506 (2013)
26. Sheehy, D.R.: Linear-size approximations to the Vietoris–Rips filtration. Discrete & Computational Geometry 49(4), 778–796 (2013)
27. Silva, J., Marques, J., Lemos, J.: Selecting landmark points for sparse manifold learning. In: Advances in Neural Information Processing Systems. pp. 1241–1248 (2006)