Estimating Descriptors for Large Graphs
Zohair Raza Hassan, Mudassir Shabbir, Imdadullah Khan, Waseem Abbas
Information Technology University of the Punjab, Pakistan ([email protected], [email protected])
Lahore University of Management Sciences, Pakistan ([email protected])
Vanderbilt University, USA ([email protected])
Abstract.
Embedding networks into a fixed-dimensional feature space, while preserving their essential structural properties, is a fundamental task in graph analytics. These feature vectors (graph descriptors) are used to measure pairwise similarity between graphs. This enables applying data mining algorithms (e.g., classification, clustering, or anomaly detection) to graph-structured data, which has numerous applications in multiple domains. State-of-the-art algorithms for computing descriptors require the entire graph to be in memory, entailing a huge memory footprint, and thus do not scale well to the increasing sizes of real-world networks. In this work, we propose streaming algorithms to efficiently approximate descriptors by estimating counts of sub-graphs of order k ≤ 4, and thereby devise extensions of two existing graph comparison paradigms: the Graphlet Kernel and NetSimile. Our algorithms require a single scan over the edge stream, have space complexity that is a fraction of the input size, and approximate embeddings via a simple sampling scheme. Our design exploits the trade-off between available memory and estimation accuracy to provide a method that works well under limited memory requirements. We perform extensive experiments on real-world networks and demonstrate that our algorithms scale well to massive graphs.
Keywords: Graph Descriptor · Edge Stream · Graph Classification
⋆ The first two authors have been supported by the grant received to establish CIPL, and the third author has been supported by the grant received to establish SEIL, both associated with the National Center in Big Data and Cloud Computing, funded by the Planning Commission of Pakistan.

1 Introduction

Evaluating similarity or distance between a pair of graphs is a building block of many fundamental data analysis tasks on graphs, such as classification and clustering. These tasks have numerous applications in social network analysis, bioinformatics, computational chemistry, and graph theory in general. Unfortunately, large orders (number of vertices) and massive sizes (number of edges) prove to be challenging when applying general-purpose data mining techniques on graphs. Moreover, in many real-world scenarios, graphs in a dataset have varying orders and sizes, hindering the application of data mining algorithms devised for vector spaces. Thus, devising a framework to compare graphs with different orders and sizes would allow for rich analysis and knowledge discovery in many practical domains.

However, graph comparison is a difficult task; the best-known solution for determining whether two graphs are structurally the same takes quasi-polynomial time [1], and determining the minimum number of steps to convert one graph to another is NP-Hard [16]. In a more practical approach, graphs are first mapped into fixed-dimensional feature vectors, on which vector space-based algorithms are then employed. In a supervised setting, these feature vectors are learned through neural networks [14,25,26]. In unsupervised settings, the feature vectors are descriptive statistics of the graph, such as the average degree, the eigenspectrum, or spectra of sub-graphs of order at most k contained in the graph [7,11,18,17,22,23]. The runtimes and memory costs of these methods depend directly on the magnitude (order and size) of the graphs and the dimensionality (dependent on the number of statistics) of the feature space. While computing a larger number of statistics would result in richer representations, these algorithms do not scale well to the increasing magnitudes of real-world graphs [9].

A promising approach is to process graphs as streams, one edge at a time, without storing the whole graph in memory. In this setting, the graph descriptors are approximated from a representative sample, achieving practical time and space complexity [6,15,19,20,21].

In this work we propose gabe (Graphlet Amounts via Budgeted Estimates) and maeve (Moments of Attributes Estimated on Vertices Efficiently), stream-based extensions of the Graphlet Kernel [17] and NetSimile [3], respectively. Our contributions can be summarized as follows:

- We propose two simple and intuitive descriptors for graph comparison that run in the streaming setting.
- We provide analytical bounds on the time and space complexity of our feature vector generation; for a fixed budget, the runtime and space cost of our algorithms are linear.
- We perform extensive empirical analysis on benchmark graph classification datasets of varying magnitudes. We demonstrate that gabe and maeve are comparable to the state-of-the-art in terms of classification accuracy, and scale to networks with millions of nodes and edges.

The rest of the paper is organized as follows.
We discuss the related work in Section 2. Section 3 covers the preliminaries required to read the text. We present gabe and maeve in Section 4. We report our experimental findings in Section 5 and conclude the paper in Section 6.
2 Related Work

Methods for comparing a pair of graphs can broadly be categorized into direct approaches, kernel methods, descriptors, and neural models. Direct approaches for evaluating the similarity/distance between a pair of graphs preserve the entire structure of both graphs. The most prominent method under this approach is the Graph Edit Distance (ged), which counts the number of edit operations (insertion/deletion of vertices/edges) required to convert a given graph to another [16]. Although intuitive, ged is stymied by its computational intractability. Computing distance based on the vertex permutation that minimizes the "error" between the adjacency representations of two graphs is a difficult task [1], and proposed relaxations of these distances are not robust to permutation [2]. An efficient algorithm for large network comparison, DeltaCon, is proposed in [9], but it is only feasible when there is a valid one-to-one correspondence between the vertices of the two graphs.

In the kernel-based approach, graphs are mapped to a fixed-dimensional vector space based on various substructures in the graphs. A kernel function is then defined, which serves as a pairwise similarity measure that takes as input a pair of graphs and outputs a non-negative real number. Typically, the kernel value is the inner product between the two feature vectors corresponding to the two graphs. This so-called kernel trick has been used successfully to evaluate pairwise similarity of other structures such as images and sequences [4,12,10]. Several graph kernels based on sub-structural patterns have been proposed, such as the Shortest-Path [5] and Graphlet [17] kernels. More recently, a hierarchical kernel based on propagating spectral information within the graph [11] was introduced. The WL-Kernel [18], based on the Weisfeiler-Lehman isomorphism test, has been shown to provide excellent results for classification and is used as a benchmark in the graph representation learning literature. Kernels require expensive computation and typically necessitate storing the adjacency matrices, making them infeasible for massive graphs.

Graph Neural Networks (gnns) learn graph-level embeddings by aggregating node representations learned via convolving neighborhood information throughout the neural network's layers. This idea has been the basis of many popular neural networks and is as powerful as WL-Kernels for classification [14,26]. We refer interested readers to a comprehensive survey of these models [25]. Unfortunately, these models also require expensive computation and storing large matrices, hindering scalability to real-world graphs.

Graph descriptors, like the above two paradigms, attempt to map graphs to a vector space such that similar graphs are mapped close to each other in the Euclidean space. Generally, the dimensionality of these vectors is small, allowing efficient algorithms for graph embeddings. NetSimile [3] describes graphs by computing moments of vertex features, while SGE [7] uses random walks and hashing to capture the presence of different sub-structures in a graph. State-of-the-art descriptors are based on spectral information; [23] proposed a family of graph spectral distances, embedding the information as histograms on the multiset of distances in a graph, and NetLSD [22] computes the heat (or wave) trace over the eigenvalues of a graph's normalized Laplacian to construct embeddings.

The fundamental limitation of all the above approaches is the requirement that the entire graph be available in memory. This limits their applicability to graphs of small magnitude. To the best of our knowledge, this work is the first graph comparison method that does not make this assumption.

Streaming algorithms assume an online setting; the input is streamed one element at a time, and the amount of space we are allowed is limited. This allows one to design scalable approximation algorithms for the underlying problems. There has been extensive work on estimating triangles (cycles of length three) in graphs [19,21], butterflies (cycles of length four) in bipartite graphs [15], and anomaly detection [8] when the graph is input as a stream of edges. A framework for estimating the number of connected induced sub-graphs on three and four vertices is presented in [6].
3 Preliminaries

Let G = (V_G, E_G) be an undirected, unweighted, simple graph, where V_G is the set of vertices and E_G is the set of edges. For v ∈ V_G, let N_G(v) = {u : (u, v) ∈ E_G} be the set of neighbors of v, and d_G^v := |N_G(v)| the degree of v. A graph is connected if and only if there exists a path between all pairs of vertices in V_G.

A sub-graph of G is a graph G′ = (V_G′, E_G′) such that V_G′ ⊆ V_G and E_G′ is a subset of edges in E_G that are incident only on the vertices present in V_G′, i.e., E_G′ ⊆ {(u, v) : (u, v) ∈ E_G ∧ u, v ∈ V_G′}. If equality holds (E_G′ contains all such edges from the original graph), then G′ is called an induced sub-graph of G.

Two graphs, G_1 and G_2, are isomorphic if and only if there exists a bijection π : V_{G_1} → V_{G_2} such that E_{G_2} = {(π(u), π(v)) : (u, v) ∈ E_{G_1}}. For a graph F = (V_F, E_F), let H_G^F (resp. Ĥ_G^F) be the set of sub-graphs (resp. induced sub-graphs) of G that are isomorphic to F.

We assume vertices in V_G are denoted by integers in the range [0, |V_G| − 1]. Let S = e_1, e_2, ..., e_{|E_G|} be a sequence of edges in an arbitrary but fixed order, i.e., e_t = (u_t, v_t) is the t-th edge. Let b be the maximum number of edges (budget) one can store in our sample, referred to as Ẽ_G. We now formally define the graph descriptor problem:
Problem 1 (Constructing Graph Descriptors).
Let 𝒢 be the set of all possible undirected, unweighted, simple graphs. We wish to find a function, ϕ : 𝒢 → R^d, that can map any given graph to a d-dimensional vector.

Existing work [3,22] on graph descriptors asserts that the underlying algorithms should be able to run on any graph, regardless of order or size, and should output the same representation for different vertex permutations. Moreover, the descriptors should capture features that can be compared across graphs of varying orders; directly comparing sub-graph counts is illogical, as bigger graphs will naturally have more sub-graphs. The descriptors we propose are based on graph comparison methods that meet these requirements due to their graph-theoretic nature and feature scaling based on the graph's magnitude. We consider an online setting and model the input graph as a stream of edges. We impose the following constraints on our algorithms:

C1 (Single Pass): The algorithm is only allowed to receive the stream once.
C2 (Limited Space): The algorithm can store a maximum of b edges at once.
C3 (Linear Complexity): Space and time complexity of the algorithms should be linear (for fixed b) with respect to the order and size of the graph.

Problem 2 (Connected Sub-graph Estimation on Streams).
Let S be a stream of edges e_1, e_2, ..., e_{|E_G|} for some graph G = (V_G, E_G). Let F = (V_F, E_F) be a small connected graph such that |V_F| ≪ |V_G|. Produce an estimate, N_G^F, of |H_G^F| while storing a maximum of b edges at any given instant.

Based on previous works on sub-graph estimation [20,19,6,21], the underlying recipe for algorithms that solve Problem 2 consists of the following steps:

- For each edge e_t ∈ S, count the instances of F incident on e_t. For example, if F is a triangle, this amounts to counting the number of triangles that the edge e_t is part of.
- Use a sampling scheme through which we can compute the probability of detecting an instance of F in our sample, denoted by p_t^F, at the arrival of the t-th edge.

At the arrival of e_t, we increment our estimate of |H_G^F| by 1/p_t^F for each instance of F in our sample Ẽ_G that e_t belongs to. The pseudocode is provided in Algorithm 1. This simple methodology allows one to compute estimates whose expected values are equal to |H_G^F|:

Theorem 1.
Algorithm 1 provides unbiased estimates: E[N_G^F] = |H_G^F|.

Proof. For a sub-graph h ∈ H_G^F, let X_h be a random variable such that X_h = 1/p_t^F if h is detected at the arrival of its last edge in the stream, e_t, and 0 otherwise. Clearly, N_G^F = Σ_{h ∈ H_G^F} X_h, and E[X_h] = (1/p_t^F) × p_t^F = 1. Therefore,

E[N_G^F] = E[ Σ_{h ∈ H_G^F} X_h ] = Σ_{h ∈ H_G^F} E[X_h] = |H_G^F|.

At the arrival of e_t, counting only the sub-graphs that e_t belongs to ensures that we do not count the same sub-graph twice. In this work, we employ reservoir sampling [24], which has been shown to be effective for sub-graph estimation [6,20,21]. Using reservoir sampling, the probability of detecting an instance of F that e_t belongs to at the arrival of e_t is the probability that the other |E_F| − 1 edges of that instance are all present in the sample:

p_t^F = min( 1, ∏_{i=0}^{|E_F|−2} (b − i) / (t − 1 − i) ).

Algorithm 1: Sub-graph Estimation on Streams
Input: Stream of edges S = e_1, e_2, ..., e_{|E_G|}, budget b, and a graph F
Output: N_G^F (estimate of |H_G^F|)
    Ẽ_G ← ∅, N_G^F ← 0        /* initialize sample of edges, and estimate */
    for e_t ∈ S do
        Find all instances of F that e_t belongs to in Ẽ_G ∪ {e_t}
        Increment N_G^F by 1/p_t^F for each instance of F detected
        Sample e_t into Ẽ_G, based on b
    end

We now derive an upper bound for the variance. Note that while the bound is loose, it is sufficient to show that we obtain better results with greater b, and it applies to any connected graph F.

Theorem 2.
When using reservoir sampling, the variance of N_G^F in Algorithm 1 is bounded as follows:

Var[N_G^F] ≤ |H_G^F|² ∏_{i=0}^{|E_F|−2} (|E_G| − 1 − i) / (b − i).

Proof. The theorem is trivially true when b ≥ |E_G| − 1. We now explore the case when b < |E_G| − 1. Let X_h be a random variable as defined in the proof of Theorem 1. Note that p_t^F ≥ p_{|E_G|}^F, and Var[X_h] = E[X_h²] − E[X_h]² = 1/p_t^F − 1 ≤ 1/p_{|E_G|}^F. We bound the total variance using the Cauchy-Schwarz inequality:

Var[N_G^F] = Σ_{h ∈ H_G^F} Σ_{h′ ∈ H_G^F} Cov[X_h, X_h′] ≤ Σ_{h ∈ H_G^F} Σ_{h′ ∈ H_G^F} √(Var[X_h] Var[X_h′]) ≤ Σ_{h ∈ H_G^F} Σ_{h′ ∈ H_G^F} 1/p_{|E_G|}^F = |H_G^F|² ∏_{i=0}^{|E_F|−2} (|E_G| − 1 − i) / (b − i).

Note that this methodology is also applicable for estimating the number of sub-graphs that each vertex is incident on, and simple modifications to the proofs of Theorems 1 and 2 yield the same results for estimates of vertex-level counts.
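As an illustration, the scheme of Algorithm 1 instantiated for F = triangle can be sketched as follows. This is a minimal single-machine sketch under our own naming and data-structure choices; the paper's actual implementation is a distributed C++ one built on Tri-Fly [20].

```python
import random

def estimate_triangles(stream, b):
    """Estimate the global triangle count from an edge stream with a
    reservoir of at most b edges (Algorithm 1 with F = triangle).

    Assumes a simple stream (no duplicate edges or self-loops), as in
    the paper's preprocessing.
    """
    sample = set()   # reservoir of edges
    adj = {}         # adjacency view of the reservoir
    estimate = 0.0
    for t, (u, v) in enumerate(stream, start=1):
        # Triangles closed by e_t = (u, v) within the current sample.
        common = adj.get(u, set()) & adj.get(v, set())
        if common:
            # p_t = min(1, b/(t-1) * (b-1)/(t-2)): probability that both
            # other edges of such a triangle are in the reservoir.
            p = 1.0 if t - 1 <= b else (b / (t - 1)) * ((b - 1) / (t - 2))
            estimate += len(common) / p
        # Standard reservoir-sampling update [24].
        if len(sample) < b:
            keep = True
        else:
            keep = random.random() < b / t
            if keep:
                x, y = random.choice(tuple(sample))
                sample.discard((x, y))
                adj[x].discard(y)
                adj[y].discard(x)
        if keep:
            sample.add((u, v))
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return estimate
```

When b is at least the stream length, the reservoir holds every edge and the estimate equals the exact triangle count; for smaller b the output is an unbiased estimate in the sense of Theorem 1.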
4 Proposed Descriptors

In this section, we discuss our two proposed descriptors: Graphlet Amounts via Budgeted Estimates (gabe), which is based on the Graphlet Kernel, and Moments of Attributes Estimated on Vertices Efficiently (maeve), which is based on NetSimile.
4.1 gabe: Graphlet Amounts via Budgeted Estimates

Let F_k be the set of graphs of order k. For two given graphs, G_1 and G_2, Shervashidze et al. [17] propose counting all graphlets (induced sub-graphs) of order k in both graphs, and computing similarity based on the inner product ⟨φ_k(G_1), φ_k(G_2)⟩, where, for a given k and graphs F_i ∈ F_k:

φ_k(G) := (1 / C(|V_G|, k)) [ |Ĥ_G^{F_1}|  |Ĥ_G^{F_2}|  ···  |Ĥ_G^{F_{|F_k|}}| ]ᵀ.

Their algorithm runs in O(|V_G| d^{k−1}) time (where d = max_{v ∈ V_G} d_G^v) for k ∈ {3, 4, 5}, and uses adjacency matrices. We use the methodology of [6] to estimate the sub-graph counts as in Section 3.3, then compute induced sub-graph counts based on the overlap of graphs of the same order. We follow this procedure for estimating sub-graph counts of order k ∈ {2, 3, 4}, then concatenate the resultant φ_k(G)'s into a single vector. The 17 graphs we enumerate are shown in Figure 1. Note that unlike [6], we also estimate the counts of disconnected induced sub-graphs.

Fig. 1: The graphs counted by gabe, and their corresponding overlap matrix O (best viewed when zoomed in). Zeros have been omitted for readability.
Induced Sub-graph Counts. Let F = {F_1, F_2, ..., F_17} be the set of graphs we enumerate. Let H_G^F (resp. Ĥ_G^F) be the vector whose i-th entry corresponds to |H_G^{F_i}| (resp. |Ĥ_G^{F_i}|). Let O be a |F| × |F| matrix such that O(i, j) is the number of sub-graphs of F_j isomorphic to F_i when |V_{F_i}| = |V_{F_j}|, and 0 otherwise. One can clearly see that H_G^F = O Ĥ_G^F, as this accounts for the sub-graph counts that are disregarded when only considering induced sub-graphs. Since O is an upper-triangular matrix with unit diagonal, it is invertible. Thereby, given H_G^F, one can retrieve the induced sub-graph counts by computing O^{−1} H_G^F. By linearity of expectation, Theorem 1 implies that the induced sub-graph counts are unbiased as well.

While processing the stream, we store the degree of each vertex, incrementing the degrees of u_t and v_t when e_t = (u_t, v_t) arrives. We use edge-centric algorithms (as described in Section 3.3) to compute estimates for the six connected graphs whose counts cannot be obtained from degree information alone, and use intuitive combinatorial formulas, listed in Table 1, to compute the counts of the remaining 11 sub-graphs. We can compute |E_G| and |V_G| by keeping track of how many edges have been received and of the maximum vertex label received, respectively.
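The correction from plain sub-graph counts to induced counts can be illustrated on the four graphs of order 3 alone (the paper's O is 17 × 17 and covers orders 2-4); the helper below is a hypothetical sketch that recovers induced counts by back-substitution instead of an explicit inverse:

```python
def induced_counts(O, H):
    """Solve O @ x = H by back-substitution, where O is upper
    triangular with non-zero diagonal (hence invertible)."""
    n = len(H)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(O[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (H[i] - s) / O[i][i]
    return x

# Order-3 graphs: F1 = empty, F2 = single edge, F3 = path, F4 = triangle.
# O3[i][j] = number of same-order sub-graphs of F_j isomorphic to F_i
# (e.g., a triangle contains three paths on three vertices).
O3 = [[1, 1, 1, 1],
      [0, 1, 2, 3],
      [0, 0, 1, 3],
      [0, 0, 0, 1]]

# Plain sub-graph counts for G = K3: one empty triple, three single
# edges, three paths, one triangle.
H3 = [1, 3, 3, 1]
print(induced_counts(O3, H3))  # -> [0.0, 0.0, 0.0, 1.0]
```

The result says that the only induced order-3 sub-graph of K3 is the triangle itself, as expected.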
Time and Space Complexity. An array of size |V_G| is used to store degrees, which can be accessed in O(1) time; hence the degree-based counts in Table 1 can be updated in O(1) each time an edge arrives. Let G′ denote the graph represented by Ẽ_G, stored as an adjacency list. Determining whether two vertices are adjacent takes O(log b) time when using a tree data structure within the stored adjacency list. At the arrival of e_t = (u_t, v_t), we need to visit only the vertices two hops away from u_t (resp. v_t), then perform at most three adjacency checks; since Σ_{w ∈ N_{G′}(u_t)} d_{G′}^w + Σ_{w ∈ N_{G′}(v_t)} d_{G′}^w = O(b), this takes O(b log b) operations for one edge, and O(b log b · |E_G|) in total. Storing an adjacency list with b edges and an array for degrees takes O(b + |V_G|) space.
Table 1: Graphs and their corresponding sub-graph count formulas.

Graph | Formula
Empty graph on 2 vertices | C(|V_G|, 2)
Empty graph on 3 vertices | C(|V_G|, 3)
Empty graph on 4 vertices | C(|V_G|, 4)
Single edge | |E_G|
Edge + isolated vertex | |E_G| (|V_G| − 2)
Edge + two isolated vertices | |E_G| C(|V_G| − 2, 2)
Two disjoint edges | C(|E_G|, 2) − |H_G^{P_3}|
Path on 3 vertices (P_3) | Σ_{v ∈ V_G} C(d_G^v, 2)
Star on 4 vertices | Σ_{v ∈ V_G} C(d_G^v, 3)
Triangle + isolated vertex | |H_G^{K_3}| (|V_G| − 3)
P_3 + isolated vertex | |H_G^{P_3}| (|V_G| − 3)
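The combinatorial formulas of Table 1 can be sketched as a single helper. This is a hypothetical illustration: the function name and dictionary keys are ours, and in the streaming setting the triangle count passed in would itself be the estimate produced by Algorithm 1.

```python
from math import comb

def formula_counts(n, m, degrees, n_triangles):
    """Sub-graph counts derivable from |V_G| = n, |E_G| = m, the degree
    sequence, and a (possibly estimated) triangle count, cf. Table 1."""
    p3 = sum(comb(d, 2) for d in degrees)      # paths on 3 vertices
    star4 = sum(comb(d, 3) for d in degrees)   # stars on 4 vertices
    return {
        "empty_2": comb(n, 2),
        "empty_3": comb(n, 3),
        "empty_4": comb(n, 4),
        "edge": m,
        "edge_plus_vertex": m * (n - 2),
        "edge_plus_2_vertices": m * comb(n - 2, 2),
        "two_disjoint_edges": comb(m, 2) - p3,
        "path_3": p3,
        "star_4": star4,
        "triangle_plus_vertex": n_triangles * (n - 3),
        "path_3_plus_vertex": p3 * (n - 3),
    }
```

For example, on K4 (n = 4, m = 6, all degrees 3, four triangles) the helper reports three pairs of disjoint edges, which matches the three perfect matchings of K4.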
Table 2: Features extracted for each vertex v ∈ V_G for maeve, their formulas, and a figure highlighting the relevant edges. The filled-in vertex depicts v.

Feature | Formula
Degree | d_G^v
Clustering coefficient | |T_G(v)| / C(d_G^v, 2)
Avg. degree of N_G(v) | (d_G^v + |P_G(v)|) / d_G^v
Edges in I_G(v) | d_G^v + |T_G(v)|
Edges leaving I_G(v) | |P_G(v)| − 2|T_G(v)|

4.2 maeve: Moments of Attributes Estimated on Vertices Efficiently

NetSimile [3] proposes extracting features for each vertex and aggregating them by taking moments over their distribution. Similarly, we propose extracting a subset of those features, listed in Table 2, and computing four moments for each feature: mean, standard deviation, skewness, and kurtosis.
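The moment-aggregation step can be sketched as follows. The exact conventions (population standard deviation, standardized non-excess kurtosis) are our assumptions, since the text only names the four moments:

```python
from statistics import mean, pstdev

def four_moments(values):
    """Aggregate a per-vertex feature distribution into
    (mean, standard deviation, skewness, kurtosis)."""
    mu = mean(values)
    sigma = pstdev(values)          # population standard deviation
    if sigma == 0:
        return (mu, 0.0, 0.0, 0.0)  # constant feature: no shape moments
    n = len(values)
    skew = sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)
    kurt = sum((x - mu) ** 4 for x in values) / (n * sigma ** 4)
    return (mu, sigma, skew, kurt)
```

With five features and four moments each, a graph is mapped to a 20-dimensional vector regardless of its order or size, which is what makes descriptors of different graphs directly comparable.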
Extracting Vertex Features. For a graph G and a vertex v ∈ V_G, we use I_G(v) to denote the sub-graph of G induced by v and its neighbors. Let T_G(v) be the set of triangles that v belongs to, and P_G(v) be the set of three-paths (paths on three vertices) on which v is an end-point. We compute the features in Table 2 by applying their formulas to estimates of |T_G(v)|, |P_G(v)|, and d_G^v computed for each v ∈ V_G as in Sections 3.3 and 4.1.

Theorem 3. For a vertex v ∈ V_G, all vertex features used in maeve can be expressed in terms of d_G^v, |T_G(v)|, and |P_G(v)|.

Proof. The first two features are already expressed in terms of d_G^v and |T_G(v)|.

Average degree of neighbors: For each u ∈ N_G(v), there is exactly one edge connecting u to v, accounting for d_G^v edges. The remaining edges incident on the neighbors are part of three-paths on which v is an end-point. Therefore, Σ_{u ∈ N_G(v)} d_G^u = d_G^v + |P_G(v)|.

Edges in I_G(v): There are two types of edges in E_{I_G(v)}: (1) edges incident on v, of which there are d_G^v, and (2) edges not incident on v. The latter must belong to a pair of vertices which form a triangle with v, and for each such edge there is exactly one triangle. Therefore, |E_{I_G(v)}| = d_G^v + |T_G(v)|.

Edges leaving I_G(v): Consider a three-path h ∈ P_G(v). Let u be the other end-point of h, and w be the center vertex. When u ∉ N_G(v), the edge (u, w) leaves the induced sub-graph of v. Now, consider u ∈ N_G(v). Clearly, the edge (u, w) forms a triangle with v, and is incident on exactly two such three-paths: {(v, u), (u, w)} and {(v, w), (u, w)}. Therefore, accounting for the three-paths that lie within N_G(v), the number of edges leaving I_G(v) is |P_G(v)| − 2|T_G(v)|.
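Following Theorem 3, the five features of Table 2 can be computed from the three per-vertex quantities alone; a minimal sketch (the function name is ours, and the inputs may be exact counts or streaming estimates):

```python
def vertex_features(d, t, p):
    """Features of Table 2 for one vertex v.

    d: degree of v, t: |T_G(v)| (triangles containing v),
    p: |P_G(v)| (three-paths with v as an end-point).
    """
    clustering = t / (d * (d - 1) / 2) if d > 1 else 0.0
    avg_nbr_degree = (d + p) / d if d > 0 else 0.0
    ego_edges = d + t          # edges inside I_G(v)
    leaving_edges = p - 2 * t  # edges leaving I_G(v)
    return (d, clustering, avg_nbr_degree, ego_edges, leaving_edges)
```

For instance, a vertex of K4 has d = 3, t = 3, and p = 6, which yields a clustering coefficient of 1, an average neighbor degree of 3, six ego-net edges, and no leaving edges, as expected.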
Time and Space Complexity. As in Section 4.1, we assume an adjacency list with an underlying tree structure and refer to the sampled graph as G′. At the arrival of an edge e_t = (u_t, v_t), one can traverse the neighborhoods of u_t and v_t to obtain the triangle and three-path counts in O(|N_{G′}(u_t)| + |N_{G′}(v_t)|) = O(b) time. We store three arrays of size |V_G| for the degrees, triangle counts, and three-path counts. We can compute the moments in at most two passes over these arrays, giving a total of O(b|E_G| + |V_G|) time. Storing an adjacency list of size b and arrays of size |V_G| gives us O(b + |V_G|) space.
Improving Estimation Quality with Multiple Workers. Multiple worker machines can be used in parallel to independently estimate triangle counts before aggregating them [20]. Using W worker machines decreases the variance by a factor of 1/W. Their methodology can be adopted mutatis mutandis in our algorithms to improve the estimation quality.

5 Experiments

In this section, we perform experiments to show how the approximation quality changes with respect to b, explore how the descriptors perform on classification tasks, and showcase the scalability of the algorithms. As in [3], we found from extensive experiments that the Canberra distance, d(x, y) := Σ_{i=1}^{d} |x_i − y_i| / (|x_i| + |y_i|), performs best when comparing the descriptors. We refer to the approximation error as the distance between the true vectors and their approximations.
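The Canberra distance can be sketched as follows; the convention of skipping terms where both coordinates are zero (to avoid 0/0) is our assumption, as the formula above does not specify that edge case:

```python
def canberra(x, y):
    """Canberra distance between two descriptor vectors of equal length."""
    total = 0.0
    for xi, yi in zip(x, y):
        denom = abs(xi) + abs(yi)
        if denom > 0:  # skip 0/0 terms
            total += abs(xi - yi) / denom
    return total
```

Because every term lies in [0, 1], each descriptor dimension contributes at most one unit to the distance, which keeps features on very different scales (e.g., raw counts vs. ratios) from dominating the comparison.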
Implementation. All experiments were performed on a machine with 48 Intel Xeon E5-2680 v3 2.50GHz processors and 125 GB RAM. The algorithms are implemented in C++ using MPICH 3.2 on the base code provided by the authors of Tri-Fly [20]. We use 25 processes to simulate 1 master and 24 workers. Each descriptor is computed exactly once under this setting.
Datasets. We evaluate our models on randomly sampled REDDIT graphs, on five benchmark classification datasets with large graphs: D&D, COLLAB, REDDIT-BINARY, REDDIT-MULTI-5K, and REDDIT-MULTI-12K [27] (Table 3), and on massive networks from KONECT [13] (Table 4). For each graph, we remove duplicate edges and self-loops, and convert to edge-list format with vertex labels in the range [0, |V_G| − 1].

Code: https://github.com/zohair-raza/estimating-graph-descriptors/
REDDIT data: https://dynamics.cs.washington.edu/data.html

Table 3: Details of classification datasets. The number of graphs, classes, and maximum number of vertices/edges in a graph have been provided.
Dataset | |𝒢| | Classes | max |V_G| | max |E_G|
D&D | 1,178 | 2 | 5,748 | 14,267
COLLAB | 5,000 | 3 | 492 | 40,120
REDDIT-BINARY | 2,000 | 2 | 3,782 | 4,071
REDDIT-MULTI-5K | 4,999 | 5 | 3,648 | 4,783
REDDIT-MULTI-12K | 11,929 | 11 | 3,782 | 5,171
Table 4: Massive networks with their order, size, and what they represent.

Graph | |V_G| | |E_G| | Network Type
Patent | 3,774,768 | 16,518,937 | Citation
Flickr | 2,302,925 | 22,838,276 | Friendship
Full USA | 23,947,347 | 28,854,312 | Road
UK Domain 2002 | 18,483,186 | 261,787,258 | Hyperlink
Table 5: Classification accuracy on the datasets described in Table 3. Results within 1% of the best have been boldfaced.

Descriptor | DD | COLLAB | RDT-2 | RDT-5 | RDT-12
NetLSD [22] | | | | |
gabe (b = |E_G|/4) | 65.23% | 63.62% | 84.65% | |
gabe (b = |E_G|/2) | 69.08% | 65.23% | | 40.63% |
maeve (b = |E_G|/4) | 59.44% | 68.42% | 85.04% | 41.15% | 32.57%
maeve (b = |E_G|/2) | 61.26% | 70.95% | | |

We uniformly sampled 1,000 graphs of size 10,000 to 50,000 from REDDIT, representing interactions in various "sub-reddits". In Figure 2(a) we show how the average approximation error, taken over all the sampled graphs, decreases as b (a fraction of the number of edges) increases.
Fig. 2: Approximation error and runtime of gabe and maeve (best viewed in color): (a) average approximation error vs. b (budget as a percentage of |E_G|); (b) distance vs. wall-clock time in minutes for b = 100,000; (c) distance vs. wall-clock time for b = 500,000. PT, FL, US, and UD denote the Patent, Flickr, Full USA, and UK Domain 2002 networks, respectively.
We computed descriptors for the graphs in Table 3 from samples of 25% and 50% of all the edges and examined their classification accuracy. We used the state-of-the-art descriptor NetLSD [22] as a benchmark, despite the fact that our methods have no direct competitors. As in [22], we used a simple 1-Nearest-Neighbor classifier. We performed 10-fold cross-validation for 10 different random splits of the dataset (i.e., 100 different folds are tested on), and report the average accuracy in Table 5. Note that despite using only a fraction of the edges, gabe and maeve give results competitive with the state of the art.
We ran our algorithms on the massive networks of Table 4 and estimated descriptors by setting b to 100,000 and 500,000. We are able to compute descriptors for a graph with ≈ 260 million edges in under 20 minutes, with relatively low error. Note that when b = 500,000, gabe takes 102 minutes to compute the descriptor for Flickr, implying that we must take the density of the graph into account when setting the value of b for efficient computation.

6 Conclusion

In this work, we present single-pass streaming algorithms to construct graph descriptors using a fixed amount of memory. We show that these descriptors provide better approximations with increasing b, are comparable with the state-of-the-art descriptors in terms of classification accuracy, and scale well to networks with millions of vertices and edges.

References
1. Babai, L.: Graph isomorphism in quasipolynomial time. In: STOC. pp. 684–697 (2016)
2. Bento, J., Ioannidis, S.: A family of tractable graph distances. In: SDM. pp. 333–341 (2018)
3. Berlingerio, M., Koutra, D., Eliassi-Rad, T., Faloutsos, C.: Network similarity via multiple social theories. In: ASONAM. pp. 1439–1440 (2013)
4. Bo, L., Ren, X., Fox, D.: Kernel descriptors for visual recognition. In: NIPS. pp. 244–252 (2010)
5. Borgwardt, K., Kriegel, H.: Shortest-path kernels on graphs. In: ICDM. pp. 74–81 (2005)
6. Chen, X., Lui, J.: A unified framework to estimate global and local graphlet counts for streaming graphs. In: ASONAM. pp. 131–138 (2017)
7. Dutta, A., Sahbi, H.: Stochastic graphlet embedding. IEEE Trans. Neural Netw. Learning Syst. 30(8), 2369–2382 (2019)
8. Eswaran, D., Faloutsos, C.: SedanSpot: Detecting anomalies in edge streams. In: ICDM. pp. 953–958 (2018)
9. Faloutsos, C., Koutra, D., Vogelstein, J.: DELTACON: A principled massive-graph similarity function. In: SDM. pp. 162–170 (2013)
10. Farhan, M., Tariq, J., Zaman, A., Shabbir, M., Khan, I.: Efficient approximation algorithms for strings kernel based sequence classification. In: NIPS. pp. 6935–6945 (2017)
11. Kondor, R., Pan, H.: The multiscale Laplacian graph kernel. In: NeurIPS. pp. 2982–2990 (2016)
12. Kuksa, P., Khan, I., Pavlovic, V.: Generalized similarity kernels for efficient sequence classification. In: SDM. pp. 873–882 (2012)
13. Kunegis, J.: KONECT: the Koblenz network collection. In: WWW. pp. 1343–1350 (2013)
14. Morris, C., et al.: Weisfeiler and Leman go neural: Higher-order graph neural networks. In: AAAI. pp. 4602–4609 (2019)
15. Sanei-Mehri, S., Zhang, Y., Sariyüce, A.E., Tirthapura, S.: FLEET: butterfly estimation from a bipartite graph stream. In: CIKM. pp. 1201–1210 (2019)
16. Sanfeliu, A., Fu, K.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. 13(3), 353–362 (1983)
17. Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: AISTATS. pp. 488–495 (2009)
18. Shervashidze, N., et al.: Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 12, 2539–2561 (2011)
19. Shin, K.: WRS: waiting room sampling for accurate triangle counting in real graph streams. In: ICDM. pp. 1087–1092 (2017)
20. Shin, K., et al.: Tri-Fly: Distributed estimation of global and local triangle counts in graph streams. In: PAKDD. pp. 651–663 (2018)
21. Stefani, L.D., et al.: Trièst: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 11(4), 43:1–43:50 (2017)
22. Tsitsulin, A., Mottin, D., Karras, P., Bronstein, A.M., Müller, E.: NetLSD: Hearing the shape of a graph. In: KDD. pp. 2347–2356 (2018)
23. Verma, S., Zhang, Z.: Hunt for the unique, stable, sparse and fast feature learning on graphs. In: NeurIPS. pp. 88–98 (2017)
24. Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)
25. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. CoRR abs/1901.00596 (2019)
26. Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: ICLR (2019)
27. Yanardag, P., Vishwanathan, S.V.N.: Deep graph kernels. In: KDD. pp. 1365–1374 (2015)