A Neighborhood-preserving Graph Summarization
Abd Errahmane Kiouche, Julien Baste, Mohammed Haddad, Hamida Seba
A Preprint

Univ Lyon, Université Lyon 1, LIRIS UMR CNRS 5205, F-69621, Lyon, France. E-mail: {abd-errahmane.kiouche,mohammed.haddad,hamida.seba}@univ-lyon1.fr
Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France. E-mail: [email protected]
LCSI, École nationale Supérieure d'Informatique (ESI), Algeria.

January 28, 2021

Abstract
We introduce in this paper a new summarization method for large graphs. Our summarization approach retains only a user-specified proportion of the neighbors of each node in the graph. Our main aim is to simplify large graphs so that they can be analyzed and processed effectively while preserving as many of the node neighborhood properties as possible. Since many graph algorithms are based on the neighborhood information available for each node, the idea is to produce a smaller graph which can be used to allow these algorithms to handle large graphs and run faster while providing good approximations. Moreover, our compression allows users to control the size of the compressed graph by adjusting the amount of information loss that can be tolerated. The experiments conducted on various real and synthetic graphs show that our compression reduces considerably the size of the graphs. Moreover, we conducted several experiments on the obtained summaries using various graph algorithms and applications, such as node embedding, graph classification and shortest path approximations. The obtained results show interesting trade-offs between the algorithms' runtime speed-up and the precision loss.

Keywords: Graph compression · Graph summarization · Algorithm speed-up · Node embedding · Graph embedding
1 Introduction

Graphs are widely used in data modeling because of their ability to represent, in a simple and intuitive way, complex relations and interactions between objects: social interactions, protein-protein interactions, chemical molecule bonds, transport networks, etc. We recall that a graph G = (V, E) is a data modeling tool consisting of a set V of vertices, also called nodes, and a set E of edges that connect vertices. Vertices represent objects, while edges represent relationships between them. Edges can be directed, and both vertices and edges can have labels. As we are witnessing an explosion in the amount of data generated and processed by our applications, it becomes important to deal efficiently with large graphs, the processing of which remains a challenging issue. In fact, the amount of generated data is continuously increasing. Our basic and simple daily activities, such as sending emails, surfing websites, purchasing online, and interacting via social networks, generate, on their own, a huge amount of data each day. For example, in 2019 the Facebook social network had more than 2.4 billion monthly active users with an average of 155 friendship links per user. This large volume of data makes graph querying and analysis a very challenging task. However, a viable solution seems to emerge from the possibilities offered by graph summarization.

Graph summarization, also known as graph compression or simplification, is a solution that tackles scalability and performance issues when dealing with massive graph data. Beyond the reduction of the volume of data, which is the main aim of compression, graph summarization looks for significant summaries that can be used in graph analysis without decompression.
In fact, using graph summaries helps to speed up graph algorithms so that they can efficiently run on large graphs. Compression algorithms produce smaller graphs or simpler graph representations, which can be maintained in main memory and queried and analyzed in reasonable time.

Many graph algorithms, such as node embedding, node classification, recommendation, shortest path approximation and graph comparison, are based on the neighborhood information available for each node. Finding this information may be difficult in practice, as dealing with all the neighbors of each node requires all the edges (links) of the graph to be processed, which is time and space consuming. This motivated us to introduce a graph compression that controls the size of the preserved neighborhood of vertices in the computed summary. So, in this paper, we propose a new graph compression which retains only a user-specified proportion of the neighbors of each node to reduce the size of the graph while preserving neighborhood queries.

The main idea is to sparsify the graph by removing edges, while ensuring that a predefined proportion of the neighbors of each node is included in the set of t-hop neighbors of the node in the compressed graph (t ≥ 1).

The main advantages of our neighborhood-preserving compression are:

• Reduction in storage space of the graph: our compression can decrease drastically the number of edges in the graph, thus allowing the compressed graph to be loaded into main memory. The size of the compressed graph can be controlled by adjusting the proportion of the preserved neighborhood information.
• Fast approximation of graph algorithms: many graph algorithms, such as community detection, shortest path lengths and graph comparison, are mainly based on the neighborhood information of the graph nodes. These algorithms cannot efficiently run on large graphs since they require all the edges of the original graph to be loaded in main memory. Since our compression produces a smaller graph that maintains the principal neighborhood information, it can be used to allow these algorithms to handle large graphs and run faster while providing good approximations of the original results.
• User-controlled trade-off between compression ratio and information loss: with our compression, the user can control the size of the compressed graph by adjusting the amount of information loss that can be tolerated. This is a very useful property, since the amount of tolerated information loss differs significantly from one application to another, and the desired size of the compressed graph depends mainly on the available memory.

The remainder of this paper is organized as follows: Section 2 reviews related work on graph compression methods and their applications. Section 3 formally defines the problem of neighborhood-preserving graph compression and studies its complexity. Then, Section 4 provides a description of the algorithms used to implement this compression. Section 5 presents the results obtained through the extensive experiments we undertook to evaluate the compression approach, as well as the usefulness of the obtained summaries. Finally, Section 6 concludes the paper and points out some research perspectives.
2 Related Work

Graph compression is attracting increasing interest in various domains and applications [1]. The aim of graph compression, considered here, is to compute a graph summary that retains all or part of the original graph properties, thus allowing use of the summary instead of the original graph in certain applications. The obtained summary can be either a graph that is simpler or smaller than the original graph, or any other data structure that is more compact or simpler to use than the original graph. Compression algorithms can be classified into three main categories according to how they simplify the input graph: (1) sampling, (2) sparsification, and (3) regularity encoding. Sampling and sparsification based methods generate lossy graph compressions, and their result is generally a graph. Regularity encoding based methods allow lossy, as well as lossless, graph summaries, either as graphs or as other data structures:

1. Graph sampling [2, 3] consists in using a fraction of the graph to make inferences about the whole dataset. It is generally used for dynamic graphs, for which a sample at time t is a likely representation of the graph. It is also used with very large graphs, such as protein-protein interaction networks, where dealing with the whole graph is too slow. Several graph sampling methods are proposed in the literature. They generally start with a set of initial vertices (and/or edges), which can be empty, and expand the sample based on a specific algorithm such as graph exploration and traversal algorithms. As examples, breadth-first sampling is used for social network analysis [4] and graph mining [5]. In [6], the authors apply a traversal based sampling that utilizes only the local information of nodes, combined with estimated values of a set of properties, to guide the sampling process and extract tiny samples that preserve the properties of the graph and closely approximate their distributions in the original graph.
Random walk based methods are also widely applied in large-scale graph analysis [7, 8, 9]. Frontier sampling, an edge sampling method using multidimensional random walkers, is used to estimate the degree distributions and the global clustering coefficient in [10]. In [11], the authors approximate betweenness centrality based on a sampled set of shortest paths.

2. Graph sparsification stands for the methods that compute a sparse subgraph of the input graph which preserves some of its properties, such as cuts or shortest paths [12]. Graph sparsification methods can also rely on sampling as a tool to achieve sparsification. Given a social graph and a log of past propagations, the authors of [13] prune the network to a prefixed extent, while maximizing the likelihood of generating the propagation traces in the log. A similar work is described in [14]. It tackles the problem of simplifying a graph while maintaining the connectivity recorded in a given set of observed activity traces represented by a set of DAGs (or trees) with specified roots. The problem consists in selecting a subset of arcs in the graph so as to maximize the number of nodes reachable in all DAGs from the corresponding DAG roots. This is a cover-maximization problem that the authors reduce to a problem of minimizing a submodular function under size constraints, using an algorithm introduced in [15] to solve it.

3. Regularity encoding based methods search for regularities within the graph structure, i.e., particular patterns or just repetitive patterns, and then encode these regularities so as to obtain a compact representation of the graph. Several approaches are proposed in the literature and differ by both the kind of regularities considered and how these regularities are encoded within the computed summary. Some methods of this class consist in merging or combining similar nodes or subgraphs into super-nodes, and similar edges into super-edges. Others work directly on the adjacency matrix of the graph using, for example, k²-trees [16].
In [17], the authors propose a summarizing approach that iteratively aggregates similar nodes, i.e., those that have the greatest number of common neighbors. This aggregation is controlled by an objective function that represents the cost of the compressed output graph and is defined according to the principle of Minimum Description Length (MDL) [18]. The graph is encoded with a summary and a set of correcting edges. These corrections, applied to the summary, enable the initial graph to be reconstructed. Identifying vertices with a similar neighborhood is a well-studied topic known as modular decomposition of graphs [19, 20], which aims to highlight groups of vertices that have the same neighbors outside the group. These subsets of vertices are called modules. Modular decomposition is used in [21] to compress a graph and compute its exact list of triangles using solely the computed summary. The compression consists in considering each module as a super-node. Several works, such as [22] and [23], take advantage of the regularities of the web graph structure, such as locality and similarity properties, to compress its adjacency lists and reduce the number of bits needed to encode a link. In [24, 25], the authors compress graphs using MDL on a predefined vocabulary of substructures. In [26], graphs are compressed by recursively detecting repeated substructures and representing them through grammar rules. In [27], the authors use a clustering algorithm to partition the original set of vertices into a number of clusters, which become super-nodes connected by super-edges to form a complete weighted graph. The super-edge weights are the edge densities between vertices in the corresponding super-nodes. The goal is to produce a summary that minimizes the reconstruction error of the original graph.
In [28], the authors merge into super-nodes graph vertices that have common neighbors, so that the obtained compression ensures that the efficiency of a given task does not drop below a user-specified threshold. In [29], the authors accelerate node grouping using a divide-and-conquer approach that allows parallel node merging. In [30], the authors use tensor decomposition to group nodes of an evolving graph into super-nodes according to their connectivity patterns. In [31], the authors address the problem of preserving node attributes while summarizing diffusion networks. They propose a sub-quadratic parallelizable algorithm that finds the best set of candidate nodes and merges them to construct a smaller network of super-nodes that ensures diffusion properties similar to those of the original graph. In [32], the authors use MDL to measure motif relevance based on motif capacity to compress a graph. In [33], the authors incrementally compute a summary of an evolving graph using frequent patterns and the MDL principle, combined with a set of operations (merge, split, etc.) on the patterns, in order to capture the changes that have occurred in the data since the previous state.

In [34], the authors combine regularity encoding with sparsification by using both node grouping and edge sparsification. This allows optimizing both the size of the obtained summary and the graph reconstruction error.

It is interesting to note that few works explore the usefulness of the computed summaries beyond simple neighborhood or reachability queries. Summaries obtained by graph sampling are used to estimate graph parameters and are rarely used as input for graph applications. Most regularity-encoding based methods do not investigate this issue at all. Graph sparsification methods are generally designed for specific applications, because it is difficult to obtain lossy summaries that can be used in several kinds of graph applications.
By targeting neighborhood information in our compression and allowing the amount of information loss in the computed summary to be controlled, we aim to be able to use our summaries in a variety of graph applications. In fact, several graph applications, such as node embedding, node classification, recommendations, etc., are based on the availability of node neighborhood information. In the remainder of the paper, we show that controlling the amount of this information in the computed summary allows good trade-offs to be reached between algorithm speed-up and precision loss when using the summary as input instead of the original graph.
3 Neighborhood-preserving Graph Compression

In this section, we explore a new graph sparsification method that aims to control the amount of neighborhood information available for each node in the graph. Our goal is to compute a graph summary that can be used instead of the original graph in several applications.
Let t ≥ 1 be a positive integer. The main idea of neighborhood-preserving graph compression is to sparsify the input graph by removing edges, while ensuring that, for all 1 ≤ i ≤ t, a proportion p(i) of the neighbors of each node v is included in the set of the i-hop neighbors of v in the compressed graph. We denote such a compression by (p, t)-compression, where:

• p : N* → [0, 1] is a monotonically increasing function, which represents the proportion of each node's original neighbors that must be retrieved in its i-hop neighborhood in the compressed graph.
• t is the minimum value for which p reaches its maximal value, i.e., p(x) = p(t), ∀x ≥ t.

More formally, given an undirected graph G = (V, E), where V is the set of vertices and E is the set of edges, a (p, t)-compression of G is defined as follows:

Definition 1
Let t be an integer and p : N → [0, 1] be a monotonically increasing function satisfying p(x) = p(t), ∀x > t. A (p, t)-compression of a graph G = (V, E) consists in finding a subgraph G_c = (V_c, E_c) of G such that V_c = V, E_c ⊆ E, and, for each 0 < x ≤ t and each v ∈ V,

|N_G(v) ∩ N^x_{G_c}(v)| ≥ |N_G(v)| p(x),

where N^x_G(v) is the set of all x-hop neighbors of v in G.

In other words, the compressed graph G_c contains fewer edges than G and preserves a given amount (equal to p(t)) of the original neighbors of each node. In fact, it is required that, for each 0 < x ≤ t, a proportion p(x) of the original neighbors be accessible within a maximum of x hops in G_c using a simple BFS traversal with depth x.

Figure 1 illustrates an example of our compression in which a proportion p(1) of the original neighbors of each vertex is preserved directly in the compressed graph, and all of them are reachable within at most 2 hops. The resulting compressed graph is smaller than the original one.

(a) Original graph (b) Compressed graph
Figure 1: (p, 2)-compression of Zachary's karate club network [35], where p(1) < 1 and p(2) = 1 for each node.

With (p, t)-compression, the function p aims to control the loss of neighborhood information at each neighborhood depth. It is obvious that the smaller the preserved proportions, the better the compression ratio, and vice-versa. As regards parameter t, the higher it is, the bigger the stretch factor of the compressed graph, and vice-versa. The following property gives a lower bound on the size of the compressed graph.

Property 1. For any (p, t)-compression, the number of edges |E_c| of the compressed graph satisfies the following inequality: |E| p(1) ≤ |E_c|.
Proof 1.
According to the handshaking theorem, we have Σ_{v∈V} deg(v) = 2|E|. Since at least a proportion p(1) of the original neighbors of each node must be kept in the compressed graph, we have Σ_{v'∈V_c} deg(v') ≥ Σ_{v∈V} deg(v) p(1) = 2|E| p(1), thus |E_c| ≥ p(1)|E|.

It is interesting to note that spanners [36] are special cases of (p, t)-compression. Given a graph G, possibly edge-weighted, a graph spanner (or spanner for short) is a subgraph G' which preserves the lengths of shortest paths in G up to a multiplicative and/or additive error. A t-spanner is a subgraph G' such that the distance between two vertices in G' is at most t times the distance between the same two vertices in G. Thus, a t-spanner is a particular (p, t)-compression that could be defined such that p(i) = 0 for every positive integer i < t and p(i) = 1 for i ≥ t. In other words, a t-spanner is a (p, t)-compression whose proportion function p(x) is the shifted unit step function u(x − t).

From a structural point of view, Figure 2 gives an illustration of a 2-spanner of the Diamond graph (Figure 2.(b)), a 2-spanner of the Diamond graph which is also a (p, 2)-compression with p(2) = 1 (Figure 2.(c)), and a (p, 2)-compression which is not a 2-spanner (Figure 2.(d)).

Figure 2: a Diamond graph and some of its compressions.
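The constraint of Definition 1 can be verified directly with one depth-limited BFS per vertex and depth. The following is a minimal Python sketch, assuming graphs are given as adjacency dictionaries; the names `khop_neighbors` and `is_pt_compression` are our own, not from the paper:

```python
def khop_neighbors(adj, v, k):
    """Vertices reachable from v within at most k hops (v excluded),
    computed by a breadth-first search truncated at depth k."""
    seen = {v}
    frontier = [v]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    seen.discard(v)
    return seen

def is_pt_compression(adj_g, adj_gc, p, t):
    """Check Definition 1: for every vertex v and every 0 < x <= t,
    |N_G(v) ∩ N^x_{Gc}(v)| >= |N_G(v)| * p(x)."""
    for v in adj_g:
        orig = set(adj_g[v])
        for x in range(1, t + 1):
            if len(orig & khop_neighbors(adj_gc, v, x)) < len(orig) * p(x):
                return False
    return True
```

For instance, on the Diamond graph (K4 minus one edge), deleting the chord yields a subgraph that this check accepts for p(1) = 0.5, p(2) = 1, in line with Figure 2.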
We can see in this figure that with a 2-spanner, the left-most vertex is connected to only one of its original neighbors in the Diamond graph (see Figure 2 (b)), while all the vertices keep at least half of their original neighbors with the (p, t)-compression (see Figure 2 (c)).

Peleg and Schäffer [36] showed that, given an unweighted graph G and integers t ≥ 2 and m ≥ 1, determining whether G has a t-spanner containing m or fewer edges is NP-complete, even when t is fixed to 2. The reduction is from the edge dominating set problem on bipartite graphs. Since a t-spanner is a particular (p, t)-compression of the graph G, we can deduce the following result:

Theorem 1. Finding the optimal (smallest) compressed graph satisfying the (p, t)-compression constraints for t ≥ 2 is an NP-hard problem.

Another work giving an alternative proof is Cai's [37]. He showed that for any fixed t ≥ 2, the minimum t-spanner problem is NP-hard, and for t ≥ 3, the problem is NP-hard even when restricted to bipartite graphs. The reduction is from the 3-SAT problem.

Dinitz et al. [38] showed that, for all ε > 0, the t-spanner problem cannot be approximated within a ratio better than 2^{(log^{1−ε} n)/t} unless NP ⊆ BPTIME(2^{polylog(n)}). This implies the same inapproximability result for (p, t)-compression.

Concerning the best known approximation ratio, Elkin and Peleg [39] propose approximation algorithms with a sublinear approximation ratio, and study certain classes of graphs for which logarithmic approximation is feasible. It is also shown in [40] that for t = 2, the t-spanner problem admits an O(log(n)) approximation. All these results strongly indicate that finding better or even equivalent approximations for the (p, t)-compression problem will be a hard task. In particular, finding a better result should begin with finding a better approximation than O(log(n)) for the t-spanner problem.
In the following, we provide an integer linear programming formulation of our problem.

Given an input graph G = (V, E), we denote by W the set of all paths in G. Given e = {u, v} ∈ E, we denote by W_uv the set of all paths in W from u to v. Note that the graph is undirected, so W_uv also corresponds to the paths from v to u. Moreover, given i ∈ N, we denote by W^i_uv the set of all paths of W_uv of size at most i.

We then define the variables used:

• x_e, for each e ∈ E, denotes whether e is selected (x_e = 1) or deleted (x_e = 0).
• f_w, for each w ∈ W, is such that f_w = 0 if at least one edge e of w is such that x_e = 0. Note that we can have a path w ∈ W such that every edge e of the path is such that x_e = 1 but still have f_w = 0.

We can now write the integer linear program:

min Σ_{e∈E} x_e                                         (1)
s.t.  f_w ≤ x_e                ∀ w ∈ W, e ∈ E : e ∈ w   (2)
      Σ_{w∈W_uv} f_w ≤ 1       ∀ uv ∈ E                 (3)
      Σ_{v∈N(u)} Σ_{w∈W^i_uv} f_w ≥ p(i) |N(u)|   ∀ u ∈ V, i ∈ N   (4)
      x_e ∈ {0, 1}             ∀ e ∈ E                  (5)
      f_w ∈ {0, 1}             ∀ w ∈ W                  (6)

The variable f_w can be seen as a flow from the source to the sink. Constraint (2) ensures that if a path w ∈ W uses a removed edge e (i.e., e is such that x_e = 0), then the flow f_w = 0. Using (3), we know that for each uv ∈ E there exists at most one path w ∈ W_uv such that f_w = 1. By intuition, we assume that we took the path w ∈ W_uv of shortest length that is still available in the remaining graph after removing the edges e ∈ E such that x_e = 0. Then constraint (4) ensures that the number of neighbors of a vertex u which are now at distance at most i in the new graph is at least the proportion given by p(i).

4 Compression Algorithms

In this section, we present four algorithms for computing a (p, t)-compression of an input graph G. Since finding the optimal (p, t)-compression is NP-hard and cannot be solved in polynomial time, we propose polynomial-time approximations (sub-optimal algorithms). Algorithm 1 gives the basic implementation of our compression. It has the advantage of simplicity and speed. Algorithm 1 takes as input a simple graph to compress G = (V, E), the compression parameters p and t, and an order E_o for processing the edges of the input graph. E_o is by default a random ordering of the edges. The idea is to process the edges of the initial graph in the order E_o.
The algorithm processes the edges of G incrementally as follows: if an edge e can be removed from G without violating the neighborhood preservation constraints, the algorithm does not keep this edge in the summary. Otherwise, the algorithm keeps the edge in the summary. Assuming that the average branching factor (out/in degree) of the graph is equal to b, the average time complexity of Algorithm 1 is O(|E| b^t).

We note that different edge orderings lead to different compression performances. Therefore, in order to improve the compression performance of our algorithm, we propose, in the three following subsections, three sub-optimal algorithms which are based on the basic algorithm and try to find a near-optimal edge processing order.

We provided in Section 3.3 an exact integer linear programming formulation of (p, t)-compression. However, this program is NP-hard to solve, so we use the standard trick consisting in relaxing the problem. For this, we keep the same formulation but allow the values x_e, e ∈ E, and f_w, w ∈ W, to be any real values between 0 and 1. As we only have to consider paths of length at most t, we have a polynomial number of variables (the degree of which depends on the fixed value t). The average number of variables is of the order O(|E| + |V| b^t), where b is the average branching factor of the graph. Solving the relaxation provides a value for each x_e, e ∈ E. The interpretation we give to this resolution is that the higher the value of x_e, the more likely we want to keep e in our solution. Conversely, the lower the value of x_e, the more likely we want to remove the edge e. Thus, we can use the values of x_e, e ∈ E, to obtain an ordering of the edges and give this ordering to the basic compression algorithm (see Algorithm 2). Since this linear program is solvable in polynomial time, the time complexity of Algorithm 2 is O(poly(|E| + |V| b^t)).
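Both the ILP variable set W^i_uv and the path counts σ_t(u, v | e) used later for edge centrality rest on enumerating simple paths of bounded length. A pure-Python sketch by depth-limited DFS; `bounded_paths` and `ec_score` are illustrative names of our own, and this exhaustive enumeration is only practical for small t:

```python
def bounded_paths(adj, u, v, t):
    """All simple paths from u to v (u != v) with at most t edges,
    enumerated by depth-limited DFS over an adjacency dict."""
    out = []
    def dfs(node, path):
        if node == v:
            out.append(tuple(path))
            return
        if len(path) - 1 >= t:  # already t edges, cannot extend further
            return
        for w in adj[node]:
            if w not in path:
                dfs(w, path + [w])
    dfs(u, [u])
    return out

def ec_score(adj, e, t):
    """Relaxed local edge betweenness: number of simple paths of length
    <= t between the endpoints of any edge that pass through edge e."""
    target = frozenset(e)
    score = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                for path in bounded_paths(adj, u, v, t):
                    if target in (frozenset(pair) for pair in zip(path, path[1:])):
                        score += 1
    return score
```

On the Diamond graph, for example, the chord lies on paths between the endpoints of every one of the five edges when t = 2, so it receives the highest score.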
Computing the LP-based order is time-consuming for large graphs, as indicated by its time complexity. So, we propose in this subsection another edge ordering that can be computed much faster than the LP order. The idea is to first process the edges with a high centrality value. The centrality we consider here is a relaxation of the local edge betweenness
Data: G = (V, E) a simple graph, t an integer, p a monotonically increasing function p : N → [0, 1], E_o an order of the graph edges
Result: G_c = (V_c, E_c) a compressed graph
// Initialization step
G_c = (V_c, E_c) ← (V, ∅);
G' = (V', E') ← (V, ∅);
for e = (u, v) ∈ E_o do
    E' ← E' ∪ {(u, v)};
    N_{G'}(u) ← direct neighbors of node u in G';
    N_{G'}(v) ← direct neighbors of node v in G';
    insert ← False;
    for i = 1 to t do
        N^i_{G_c}(u) ← neighbors of node u in G_c within at most i hops;
        N^i_{G_c}(v) ← neighbors of node v in G_c within at most i hops;
        if |N^i_{G_c}(u) ∩ N_{G'}(u)| < p(i)|N_{G'}(u)| or |N^i_{G_c}(v) ∩ N_{G'}(v)| < p(i)|N_{G'}(v)| then
            insert ← True;
            break;
        end
    end
    if insert then
        E_c ← E_c ∪ {(u, v)};
    end
end
Algorithm 1: Basic Algorithm
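Algorithm 1 can be sketched compactly in Python, assuming the graph is given as a vertex list and an edge sequence already in the desired order E_o; the helper `within_hops` and the dict-of-sets representation are our own choices, not the authors' implementation:

```python
def within_hops(adj, v, k):
    """Vertices reachable from v within at most k hops (v excluded)."""
    seen = {v}
    frontier = [v]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    seen.discard(v)
    return seen

def basic_compress(vertices, edge_order, p, t):
    """Algorithm 1 sketch: scan edges in the order E_o and keep an edge
    only if dropping it would violate a neighborhood constraint.
    adj_seen plays the role of G' (edges processed so far); adj_c is the
    summary G_c under construction."""
    adj_seen = {v: set() for v in vertices}
    adj_c = {v: set() for v in vertices}
    kept = []
    for u, v in edge_order:
        adj_seen[u].add(v)
        adj_seen[v].add(u)
        insert = False
        for i in range(1, t + 1):
            if (len(within_hops(adj_c, u, i) & adj_seen[u]) < p(i) * len(adj_seen[u])
                    or len(within_hops(adj_c, v, i) & adj_seen[v]) < p(i) * len(adj_seen[v])):
                insert = True
                break
        if insert:
            adj_c[u].add(v)
            adj_c[v].add(u)
        if insert:
            kept.append((u, v))
    return kept
```

As a small illustration, on K4 with p(1) = 0.5, p(2) = 1, t = 2 and the edge order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3), the edge (1,2) is the only one dropped.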
Data: G = (V, E) a simple graph, t an integer, p a monotonically increasing function p : N → [0, 1]
Result: G_c = (V_c, E_c) a compressed graph
// Computing the greedy edge order E_go
Solve the LP-relaxed problem to compute the edge scores x_e;
E_go ← sort the edges of E in descending order according to their score x_e;
G_c ← Basic Algorithm(G, t, p, E_go);
Algorithm 2: LP Algorithm

defined in [41]. An edge with a high edge betweenness centrality represents a bridge-like connector between two parts of a network, the removal of which may affect the shortest paths between them. The local edge betweenness of an edge e is the number of shortest paths running along e whose length is less than or equal to some constant t. In our relaxation, we consider all simple paths of length at most t, i.e., not necessarily shortest paths. Thus, we compute for every edge e a centrality score s(e) according to Equation 7, where σ_t(u, v | e) is the number of simple paths from u to v of length ≤ t that pass through the edge e:

s(e) = Σ_{(u,v)∈E} σ_t(u, v | e)    (7)

Once all scores are computed, we sort the edges in descending order according to their score s(e) and pass the obtained order as input to the basic algorithm (see Algorithm 3). The average time complexity of Algorithm 3 is O((|E| + |V| b^t) log(|E| + |V| b^t)).

In the previous two subsections, we proposed two greedy edge orderings to improve compression performance. However, the drawback of these two solutions is that they are more time-consuming than Algorithm 1 with a random edge ordering, as the experimental evaluation in the next section will show. Moreover, the computation time
Data: G = (V, E) a simple graph, t an integer, p a monotonically increasing function p : N → [0, 1]
Result: G_c = (V_c, E_c) a compressed graph
// Computing the greedy edge order E_go
for e ∈ E do
    compute the score s(e) using Equation 7;
end
E_go ← sort the edges of G in descending order according to their score s(e);
G_c ← Basic Algorithm(G, t, p, E_go);
Algorithm 3: Greedy Algorithm based on edge connectivity (EC)

cannot be controlled by the user, since the computation of both orderings, i.e., the LP ordering and the greedy ordering based on edge connectivity, cannot be suspended: we need to go to the end of the calculation. To overcome this problem, we propose a third algorithm based on Simulated Annealing (SA) [42]. The advantage of this solution is that the computation time can be controlled by the user by adjusting the number of SA iterations. SA is an optimization scheme that allows efficient search-space exploration by accepting, with a given probability, worse solutions in order to avoid premature convergence [42]. SA for (p, t)-compression is illustrated in Algorithm 4. The initial state of the algorithm is a random order of edges. Then, in each iteration, the algorithm makes a slight modification to the edge order by swapping two elements in the vector representing the order, and recomputes the cost of the new solution. If the new order is better, the algorithm keeps it. Otherwise, the algorithm keeps it with a probability that decreases over the iterations as the temperature is lowered (see line 19 of Algorithm 4).

5 Experimental Evaluation

In this section, we present an experimental analysis of our compression. First, we evaluate the approximation algorithms provided to compute the compression. Then, we provide an analysis of the sensitivity of the compression to parameters p and t. Finally, we evaluate its effectiveness on several tasks such as graph property estimation, node embedding and whole-graph embedding. All the experiments are carried out on an Intel Core i processor with Gigabytes of memory.
In this subsection, we present a comparative experimental study of the four proposed approximations of (p, t)-compression. For this, we ran the four approximations, i.e., the basic algorithm with random edge ordering (Algorithm 1), the LP approximation (Algorithm 2), the EC approximation (Algorithm 3), and the SA approximation (Algorithm 4), on three families of synthetic graphs whose properties are given in Table 1. We use the compression parameters t = 2 with fixed proportions p(1), p(2) < 1. For a reliable and accurate comparison, we carried out around thirty tests on each family of graphs for each algorithm. The results of the comparison are depicted in Table 2. Note that the user configuration of the SA is T = 10, N = 1000 and α < 1. We notice that the two greedy algorithms, LP and EC, and the SA algorithm outperform the basic algorithm with a random ordering of edges in terms of compression performance. The results clearly show that the greedy (EC) and the SA algorithms are the best. The greedy (EC) algorithm seems really interesting and offers the best trade-off between compression performance and runtime. However, all the approximations are still much slower than the basic algorithm with a random order of edges. Therefore, we recommend using the basic algorithm for large graphs.

In this series of experiments, we study the effect of parameters p and t on compression performance. To this end, we evaluate our compression using two metrics: the compression runtime, measured in seconds, and the compression ratio, which represents the ratio of the number of deleted edges over the total number of edges (see Equation 8):

compression ratio = (|E| − |E'|) / |E|    (8)

Note that the higher the compression ratio, the better the storage space gain ensured by the compression. For these experiments and all the following ones, we use real graph datasets. Table 3 gives the main characteristics of these datasets.
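Equation 8 translates directly into code; a trivial helper (the function name is ours):

```python
def compression_ratio(num_edges_original, num_edges_kept):
    """Equation 8: fraction of the original edges deleted by the
    compression; 0.0 means nothing removed, values near 1.0 mean an
    almost empty summary."""
    return (num_edges_original - num_edges_kept) / num_edges_original
```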
Data: G = (V, E) a simple graph, t an integer, p : N → [0, 1] a monotonically increasing function, N an integer (number of iterations), T a double (initial temperature), α a double (decreasing factor)
Result: G_c = (V_c, E_c) a compressed graph
S ← random order of E;
G_t(V_t, E_t) ← BasicAlgorithm(G, t, p, S);
C_best ← |E_t|; C_S ← |E_t|; S_best ← S;
for i = 1 to N do
    S' ← S with the positions of two random edges swapped;
    G_t(V_t, E_t) ← BasicAlgorithm(G, t, p, S');
    if |E_t| < C_best then
        S_best ← S'; C_best ← |E_t|;
    end
    if |E_t| < C_S then
        S ← S'; C_S ← |E_t|;
    else
        r ← random number between 0 and 1;
        if exp((C_S − |E_t|) / T) > r then
            S ← S'; C_S ← |E_t|;
        end
    end
    T ← α · T;
end
G_c ← BasicAlgorithm(G, t, p, S_best);
Algorithm 4: (p, t)-compression based on simulated annealing
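The annealing loop of Algorithm 4 can be sketched in runnable form as follows. The cost function here is a mock stand-in for the paper's basic algorithm (whose implementation is not shown in this excerpt); the function and parameter names are illustrative.

```python
import math
import random

def edge_order_cost(order):
    """Mock cost: stands in for running the basic algorithm on an edge
    ordering and returning |E_t|. Here it simply rewards placing heavy
    edges early, so the annealing loop has something to optimize."""
    return sum(i * w for i, (_, _, w) in enumerate(order))

def simulated_annealing(edges, n_iter=1000, t0=10.0, alpha=0.99, seed=0):
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)                  # initial state: random edge order
    temp = t0
    cost = edge_order_cost(order)
    best_order, best_cost = list(order), cost
    for _ in range(n_iter):
        i, j = rng.randrange(len(order)), rng.randrange(len(order))
        candidate = list(order)
        candidate[i], candidate[j] = candidate[j], candidate[i]  # swap two edges
        cand_cost = edge_order_cost(candidate)
        if cand_cost < best_cost:
            best_order, best_cost = list(candidate), cand_cost
        # Accept improvements always; accept worse orders with
        # probability exp((cost - cand_cost) / temp), which shrinks as
        # the temperature cools.
        if cand_cost < cost or rng.random() < math.exp((cost - cand_cost) / temp):
            order, cost = candidate, cand_cost
        temp *= alpha                   # cooling schedule
    return best_order, best_cost
```

Replacing `edge_order_cost` with a call to the actual basic algorithm yields the scheme of Algorithm 4; the number of iterations `n_iter` is exactly the knob that lets the user bound the computation time.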
Table 1: Characteristics of the synthetic graph families
Name           number of graphs    |V|    |E|
SYNTHETIC 1    30                  20     60
SYNTHETIC 2    30                  50     350
SYNTHETIC 3    30                  100    1. K

Table 4 gives the compression ratio obtained by our compression on the CA-AstroPh dataset while varying the neighborhood preservation proportion p. As expected, the compression ratio decreases as the preserved proportion of the neighborhood increases, and vice versa. Most of the compression ratio values obtained with the various combinations of parameters are satisfactory. In addition, we remark that the compression ratio range is wide, which confirms that the trade-off between information loss and compression ratio can be controlled effectively using the parameters p and t. Furthermore, we set p(t) = 1 in all experiments, which means that the whole initial neighborhood of each node can be retrieved in a neighborhood of radius at most r = t. This ensures that reachability queries are fully preserved for all vertices. The choice of the best combination of parameters depends essentially on the nature of the graph to be compressed and on the user's needs. For this example in particular, two of the tested combinations stand out as especially interesting.

The curves depicted in Figure 3 show the runtime and the compression ratio as a function of the value of t, where p(x) = 0 for 0 < x < t and p(t) = 1. Note that this combination of parameters gives a particular type of subgraphs called t-spanners [36]. We notice that the compression ratio grows slowly and starts to level off from t = 5. However, the execution time increases exponentially. This is due to the complexity of the compression, which is of the
Table 2: Evaluation of the approximation algorithms
Dataset        Basic (Avg |E_c| / time)    Greedy (LP) (Avg |E_c| / time)    Greedy (EC) (Avg |E_c| / time)    SA (Avg |E_c| / time)
SYNTHETIC 1    28

Table 3: Characteristics of the real datasets used in our experiments
Name            number of graphs    |V|     |E|
BLOG-CATALOG                        .K      .K
CA-ASTROPH                          .K      .K
CA-HEPTH                            .K      .K
COLLAB                              .K      .M
ENZYMES         600                 19.K    .K
FLICKR                              .K      .M
PROTEINS                            .K      .K

order O(|E| b^t) in the average case. Although spanners give good compression ratios on this dataset, they do not allow good control of the trade-off between information loss and compression ratio. Indeed, for this dataset, spanners give a control margin of (85% − 58% = 27%) for a maximal stretch factor t = 6, which represents a significant loss of neighborhood information. However, with (p, t)-compression we get a larger control margin of (75% − 8% = 67%) with a maximal stretch factor t = 3. This confirms once again the efficiency and usefulness of our compression and of its parameter p when compared to spanners.
Figure 3: Compression performance of t-spanners on the Ca-AstroPh dataset

Several graph algorithms rely on the availability of neighborhood information for the nodes. Our first motivation is to be able to use such algorithms directly on the compressed graphs. The purpose of these experiments is therefore to show the effectiveness of our compression in speeding up such graph algorithms while handling large graphs and providing good approximations of the original results. For this, and for all the following experiments, we use the datasets presented in Table 3 and compute two new metrics in addition to the compression ratio, namely:
• Speed-up factor: the ratio between the algorithm's runtime on the original graph and its runtime on the compressed graph. The higher the speed-up factor, the faster the graph algorithm runs on the compressed graph.
• Performance loss: the difference between the performance metric value on the original graph and the performance metric value on the compressed graph. For example, for a classification task, the performance loss is the difference between the accuracy on the original graph and the accuracy on the compressed graph. The smaller the performance loss, the better the approximation of the graph properties on the compressed graph.
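These two metrics can be computed directly from measured runtimes and metric values; the helper names below are illustrative:

```python
def speed_up_factor(runtime_original: float, runtime_compressed: float) -> float:
    """Ratio of runtimes: a value > 1 means the algorithm is faster on the summary."""
    return runtime_original / runtime_compressed

def performance_loss(metric_original: float, metric_compressed: float) -> float:
    """Drop in a quality metric (e.g. classification accuracy) caused by compression."""
    return metric_original - metric_compressed

print(speed_up_factor(120.0, 30.0))  # 4.0
print(performance_loss(0.72, 0.70))
```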
Table 4: Compression ratio of the Ca-AstroPh dataset with different combinations of parameters p and t

t    p(1)    p(2)    p(3)    compression ratio

The most suitable application for our compression is the approximation of the shortest paths between all nodes, since every (p, t)-compression where p(t) = 1 preserves all the connectivity properties between the nodes of the graph, stretching the connecting paths by a factor of at most t in the worst case. In this experimental phase, we compressed three unweighted undirected graphs with the following combination of parameters: t = 2, p(1) = 0. , and p(2) = 1. Then, we applied the BFS (Breadth-First Search) algorithm to compute the shortest paths between all pairs of nodes. Table 5 summarizes the compression ratio and the shortest path speed-up obtained on three datasets: CA-ASTROPH, CA-HEPTH and BLOG-CATALOG. We notice that our compression saves a considerable amount of storage space while approximating the shortest path lengths faster for the 3 chosen datasets. Indeed, reducing the number of edges of a graph reduces the runtime of the shortest path computation by BFS, which is of complexity O(|V|(|E| + |V|)). This speed-up is more noticeable for denser graphs.

Table 5: Speeding up the computation of all shortest paths
Dataset         Space gain    Speed-up
CA-ASTROPH      .82%          1.
BLOG-CATALOG    .52%          1.
CA-HEPTH        .08%          1.

Figure 4 shows the distribution of the shortest path lengths in the original and compressed graphs for the three datasets. We note that the two curves have almost the same shape, which shows that our compression preserves the distribution of the shortest path lengths in the 3 datasets. However, the curves of the compressed graphs are slightly stretched and shifted with respect to the original curves. This is due to the stretching of the paths caused by the compression. The stretch remains limited because a large proportion of the direct neighbors of each vertex is preserved in the graph. In addition, unlike t-spanners, (p, t)-compression controls the shift between the two curves by adjusting the value of the parameter p.

Many whole-graph embedding methods are based on the local neighborhood information of the nodes. These methods learn graph representations by exploring the node neighborhoods and extracting features such as walks, shortest paths, and local substructures. Since our compression preserves the local neighborhood within radius t, it is worth running these algorithms on compressed graphs to see whether our compression speeds them up, and to evaluate the performance loss. For this, we compressed three different datasets, COLLAB, ENZYMES, and PROTEINS, with t = 3, p(x < t) = 0. and p(t) = 1, and we ran three graph embedding algorithms on the compressed graphs: the shortest path delta kernel [43], the graphlet kernel [44] and Graph2vec [45]. We evaluated the performance of these algorithms on both the original and the compressed graphs in graph classification tasks as follows: we train an SVM classifier on a randomly chosen subset of the graphs and then compute the classification accuracy on the test set composed of the remaining graphs. For Graph2vec, we used the best configuration of parameters given in the original paper [45]. Table 6 shows the performance of the graph kernels on the compressed graphs.
We notice that the three kernels run faster on the compressed graphs in all experiments. This kernel computation speed-up is more noticeable on denser datasets, especially for the graphlet kernels. Indeed, we notice that the 4-graphlet kernel exceeded the time limit (10 hours) on COLLAB's original graphs, while it takes less than one hour on the compressed dataset. Regarding performance loss, we notice a small loss in accuracy for the shortest path kernels. The loss is more noticeable for the graphlet kernel but remains acceptable in all experiments. This is
Figure 4: Distribution of shortest path lengths of the original and compressed graphs: (a) Ca-AstroPh dataset, (b) Blog-Catalog dataset, (c) Ca-HepTh dataset.

Table 6: Graph kernel performance on the compressed graphs (cr: compression ratio, pl: performance loss)
Dataset     Kernel        cr     Speed-up           Original accuracy    Accuracy    pl
COLLAB      SP delta      79%    4.52               65.78%               64.78%      1.
            3-graphlet           .39                64.62%               53.80%      10.
            4-graphlet           > (out of time)    .                    -           -
PROTEINS    SP delta      40%    1.123              71.91%               72.18%      0.
            3-graphlet           .49                71.60%               71.18%      0.
            4-graphlet           .02                71.58%               71.35%      0.
ENZYMES     SP delta      39%    1.06               29.31%               24.71%      4.
            3-graphlet           .                  .58%                 19.58%      5.
            4-graphlet           .87                30.03%               19.70%      10.

due to the fact that our (p, t)-compression does not preserve all graphlets; for example, only some of the graphlets of size 4 are preserved. Despite this, the kernel computation speed-up is very satisfactory.

Table 7: Graph2vec performance on compressed datasets
Dataset     Compression ratio    Speed-up    Performance loss
COLLAB      79%                  2.54        0.
PROTEINS    40%                  1.004       2.
ENZYMES     39%                  1.27        5.

Table 7 shows the performance of Graph2vec on the compressed datasets. Globally, the algorithm runs faster on the compressed graphs, and the speed-up factor is larger on denser datasets. The performance loss is acceptable on the ENZYMES dataset and very small on the first two datasets. This is due to the fact that Graph2vec considers graphs as sets of Weisfeiler-Lehman relabeled subgraphs [46] that encompass higher-order neighborhoods of the graph nodes; these subgraphs are highly preserved by our compression. Figure 5 depicts the distribution of the classification accuracy obtained with Graph2vec on the 3 datasets (compressed and original) by running 10 experiments on each dataset and for each type of graph (compressed or original). We notice that the performances on compressed and original graphs are nearly equivalent. The performance on the original graphs is
slightly better for the ENZYMES and PROTEINS datasets. Moreover, the loss of performance due to the compression is small.
Figure 5: Graph2vec performance boxplot
In this series of experiments, we use two algorithms, Node2vec [47] and DeepWalk [48], to learn node representations for both the compressed and the original graphs. We use the BLOG-CATALOG and FLICKR datasets. The compressed graphs are obtained using (p, t)-compression with t = 2, p(1) = 0. and p(2) = 1; the resulting space gains are reported in Table 8. Node2vec and DeepWalk are run using the best parameter combinations given in the original papers. To evaluate their performance on both compressed and original graphs, we run multiple multilabel classification tasks on the obtained representations. To this end, a sample P_tr ∈ {0.1, ..., 0.9} of the labeled nodes is used as training data, the remaining nodes are used for testing, and this process is repeated several times. We use Macro-F1 and Micro-F1 as performance metrics.

Table 8 shows the performance of Node2vec and DeepWalk on the compressed graphs for a fixed training rate. We notice that DeepWalk runs at the same speed on both the original and the compressed graphs. This is because the time complexity of DeepWalk depends only on the number of nodes in the graph, which remains the same after our compression. However, Node2vec runs much faster on the compressed graphs than on the original graphs. This is explained by the fact that Node2vec's time complexity depends on the square of the branching factor b of the graph [49], which is implicitly related to the number of edges. The Micro-F1 and Macro-F1 scores obtained on the compressed graphs are nearly equivalent to the original scores: the performance loss rates are low for both methods, and insignificant on the FLICKR dataset.

For more fine-grained results, we also compared the performance of the two algorithms on compressed and original graphs while varying the size of the training sample P_tr from 0.1 to 0.9.
We summarize the results graphically for the Micro-F1 and Macro-F1 scores of both methods, i.e., Node2vec and DeepWalk, in Figures 6 and 7, respectively. Here we make the same observations: the performance of the two algorithms on the compressed graphs is almost identical to their performance on the original graphs for all training rates. With the BLOG-CATALOG dataset, the performance drops slightly on the compressed graphs in the worst case, while the performance curves on the original and compressed graphs are almost identical for the FLICKR dataset.
Table 8: Performance of node embedding algorithms on compressed datasets
Dataset        Method      Space gain    Speed-up    Loss (Micro-F1)    Loss (Macro-F1)
BlogCatalog    DeepWalk    .52%          0.99        3.6%               3.
               Node2vec                  .87         3.4%               2.
Flickr         DeepWalk    .59%          0.99        0.9%               1.
               Node2vec                  .32         0.3%               0.
Figure 6: Performance of Node2vec on compressed graphs (Micro-F1 and Macro-F1 scores on the BlogCatalog and Flickr datasets)

Figure 7: Performance of DeepWalk on compressed graphs (Micro-F1 and Macro-F1 scores on the BlogCatalog and Flickr datasets)
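The Micro-F1 and Macro-F1 scores used in this evaluation can be computed from per-label true/false positive counts. A self-contained sketch (the label sets per node are illustrative):

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(true_labels, pred_labels, all_labels):
    """Micro-/Macro-F1 for multilabel predictions given as sets of labels per node."""
    counts = {l: [0, 0, 0] for l in all_labels}   # per-label [tp, fp, fn]
    for truth, pred in zip(true_labels, pred_labels):
        for l in all_labels:
            if l in pred and l in truth:
                counts[l][0] += 1
            elif l in pred:
                counts[l][1] += 1
            elif l in truth:
                counts[l][2] += 1
    # Micro-F1 pools the counts over all labels; Macro-F1 averages per-label F1.
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    macro = sum(f1(*c) for c in counts.values()) / len(all_labels)
    return micro, macro

true = [{"a"}, {"a", "b"}, {"b"}]
pred = [{"a"}, {"a"}, {"b"}]
print(micro_macro_f1(true, pred, {"a", "b"}))
```

Micro-F1 is dominated by frequent labels, while Macro-F1 weights all labels equally, which is why both are reported in Table 8 and Figures 6 and 7.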
In this paper, we presented a graph summarization approach designed to control the amount of neighborhood information preserved in the computed summary. This approach relies on two parameters: a function p that gives the proportion of each node's original neighbors to be preserved in its i-hop neighborhood in the compressed graph, and a threshold t at which p reaches its maximal value.

We presented algorithms to compute this compression at minimum cost, and showed their effectiveness in compressing input graphs through an experimental evaluation on multiple real-life and synthetic graph datasets. We also showed that the summaries computed by the proposed approach can be used, without any decompression, as input to multiple graph applications such as node embedding, graph classification, and shortest path approximation. The results show interesting trade-offs between algorithm runtime speed-up and precision loss.

As future work, we plan a more thorough analysis of the impact of (p, t)-compression on walk-based graph learning algorithms such as Node2vec and DeepWalk. In fact, we observed some situations where learning accuracy increased when the graph was compressed, which was quite unexpected. While we conjecture that walks are biased in the right direction by the removal of edges, characterizing such edges remains an open question. Another important open question is to find an efficient method to order graph edges, which would allow us to significantly improve the time complexity of the approach. In addition, we aim to design an incremental version of our compression to deal with dynamic graphs and graph streams.

We note also that our approach can be used on both directed and undirected graphs. However, our compression does not consider edge labels. To compress edge-labelled graphs, a new model needs to be defined to take these labels into account, for example when defining the edge ordering.
Acknowledgement: This work is funded by ANR under grant ANR-20-CE23-0002.
References

[1] Y. Liu, T. Safavi, A. Dighe, and D. Koutra. Graph summarization methods and applications: A survey. ACM Comput. Surv., 51(3):62:1–62:34, June 2018.
[2] A.R. Bloemena. Sampling from a Graph. Mathematical Centre Tracts. Mathematisch Centrum, 1976.
[3] L.-C. Zhang and M. Patone. Graph sampling. METRON, 75(3):277–299, Dec 2017.
[4] Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Sue Moon, and Hawoong Jeong. Analysis of topological characteristics of huge online social networking services. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 835–844, New York, NY, USA, 2007. Association for Computing Machinery.
[5] Irma Ravkic, Martin Žnidaršič, Jan Ramon, and Jesse Davis. Graph sampling with applications to estimating the number of pattern embeddings and the parameters of a statistical relational model. Data Mining and Knowledge Discovery, 32(4):913–948, July 2018.
[6] Muhammad Irfan Yousuf and Suhyun Kim. Guided sampling for large graphs. Data Mining and Knowledge Discovery, 34(4):905–948, July 2020.
[7] Azade Nazi, Zhuojie Zhou, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das. Walk, not wait: Faster sampling over online social networks. Proc. VLDB Endow., 8(6):678–689, February 2015.
[8] Y. Li, Z. Wu, S. Lin, H. Xie, M. Lv, Y. Xu, and J. C. S. Lui. Walking with perception: Efficient random walk sampling via common neighbor awareness. In , pages 962–973, 2019.
[9] Junzhou Zhao, Pinghui Wang, John C. S. Lui, Don Towsley, and Xiaohong Guan. Sampling online social networks by random walk with indirect jumps. Data Mining and Knowledge Discovery, 33(1):24–57, January 2019.
[10] Bruno Ribeiro and Don Towsley. Estimating and sampling graphs with multidimensional random walks. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, IMC '10, pages 390–403, New York, NY, USA, 2010. Association for Computing Machinery.
[11] Matteo Riondato and Evgenios M. Kornaropoulos. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery, 30(2):438–475, March 2016.
[12] L. Paul Chew. There are planar graphs almost as good as the complete graph. Journal of Computer and System Sciences, 39(2):205–219, 1989.
[13] Michael Mathioudakis, Francesco Bonchi, Carlos Castillo, Aristides Gionis, and Antti Ukkonen. Sparsification of influence networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 529–537, New York, NY, USA, 2011. Association for Computing Machinery.
[14] Francesco Bonchi, Gianmarco De Francisci Morales, Aristides Gionis, and Antti Ukkonen. Activity preserving graph simplification. Data Mining and Knowledge Discovery, 27(3):321–343, November 2013.
[15] Kiyohito Nagano, Yoshinobu Kawahara, and Kazuyuki Aihara. Size-constrained submodular minimization through minimum norm base. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pages 977–984, 2011.
[16] Nieves R. Brisaboa, Susana Ladra, and Gonzalo Navarro. k2-trees for compact web graph representation. In Jussi Karlgren, Jorma Tarhio, and Heikki Hyyrö, editors, String Processing and Information Retrieval, pages 18–30, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[17] Saket Navlakha, Rajeev Rastogi, and Nisheeth Shrivastava. Graph summarization with bounded error. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08, pages 419–432, New York, NY, USA, 2008. Association for Computing Machinery.
[18] J. Rissanen. Modeling by shortest data description. Automatica, 14(5):465–471, 1978.
[19] T. Gallai. Transitiv orientierbare Graphen. Acta Mathematica Hungarica, 18:25–66, 1967.
[20] Michel Habib and Christophe Paul. A survey of the algorithmic aspects of modular decomposition. Computer Science Review, 4(1):41–59, 2010.
[21] Sofiane Lagraa and Hamida Seba. An efficient exact algorithm for triangle listing in large graphs. Data Mining and Knowledge Discovery, 30(5):1350–1369, September 2016.
[22] T. Suel and Jun Yuan. Compressing the graph structure of the web. In Proceedings DCC 2001, Data Compression Conference, pages 213–222, 2001.
[23] P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In Proceedings of the 13th International Conference on World Wide Web, WWW '04, pages 595–602, New York, NY, USA, 2004. Association for Computing Machinery.
[24] Danai Koutra, U Kang, Jilles Vreeken, and Christos Faloutsos. Summarizing and understanding large graphs. Stat. Anal. Data Min., 8(3):183–202, June 2015.
[25] Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, and Christos Faloutsos. TimeCrunch: Interpretable dynamic graph summarization. In KDD '15, pages 1055–1064, New York, NY, USA, 2015. Association for Computing Machinery.
[26] Sebastian Maneth and Fabian Peternek. Grammar-based graph compression. Information Systems, 76:19–45, 2018.
[27] Matteo Riondato, David García-Soriano, and Francesco Bonchi. Graph summarization with quality guarantees. Data Mining and Knowledge Discovery, 31(2):314–349, March 2017.
[28] K. Ashwin Kumar and Petros Efstathopoulos. Utility-driven graph summarization. Proc. VLDB Endow., 12(4):335–347, December 2018.
[29] Kijung Shin, Amol Ghoting, Myunghwan Kim, and Hema Raghavan. SWeG: Lossless and lossy summarization of web-scale graphs. In The World Wide Web Conference, WWW '19, pages 1679–1690, New York, NY, USA, 2019. Association for Computing Machinery.
[30] Sofia Fernandes, Hadi Fanaee-T, and João Gama. Dynamic graph summarization: a tensor decomposition approach. Data Mining and Knowledge Discovery, 32(5):1397–1420, September 2018.
[31] Sorour E. Amiri, Liangzhe Chen, and B. Aditya Prakash. Efficiently summarizing attributed diffusion networks. Data Mining and Knowledge Discovery, 32(5):1251–1274, September 2018.
[32] Peter Bloem and Steven de Rooij. Large-scale network motif analysis using compression. Data Mining and Knowledge Discovery, 34(5):1421–1453, September 2020.
[33] Sarang Kapoor, Dhish Kumar Saxena, and Matthijs van Leeuwen. Online summarization of dynamic graphs using subjective interestingness for sequential data. Data Mining and Knowledge Discovery, September 2020.
[34] Kyuhan Lee, Hyeonsoo Jo, Jihoon Ko, Sungsu Lim, and Kijung Shin. SSumM: Sparse summarization of massive graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '20, pages 144–154, New York, NY, USA, 2020. Association for Computing Machinery.
[35] Wayne W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
[36] David Peleg and Alejandro A. Schäffer. Graph spanners. Journal of Graph Theory, 13(1):99–116, 1989.
[37] Leizhen Cai. NP-completeness of minimum spanner problems. Discrete Applied Mathematics, 48(2):187–194, 1994.
[38] Michael Dinitz, Guy Kortsarz, and Ran Raz. Label cover instances with large girth and the hardness of approximating basic k-spanner. ACM Trans. Algorithms, 12(2), December 2016.
[39] Michael Elkin and David Peleg. Approximating k-spanner problems for k > 2. Theoretical Computer Science, 337(1):249–277, 2005.
[40] G. Kortsarz and D. Peleg. Generating sparse 2-spanners. Journal of Algorithms, 17(2):222–236, 1994.
[41] Steve Gregory. A fast algorithm to find overlapping communities in networks. In Walter Daelemans, Bart Goethals, and Katharina Morik, editors, Machine Learning and Knowledge Discovery in Databases, pages 408–423, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
[42] Peter J.M. Van Laarhoven and Emile H.L. Aarts. Simulated annealing. In Simulated Annealing: Theory and Applications, pages 7–15. Springer, 1987.
[43] Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Fifth IEEE International Conference on Data Mining (ICDM'05), 8 pp. IEEE, 2005.
[44] Nino Shervashidze, S.V.N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics, pages 488–495, 2009.
[45] Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. graph2vec: Learning distributed representations of graphs. arXiv preprint arXiv:1707.05005, 2017.
[46] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.
[47] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016.
[48] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In