Network comparison and the within-ensemble graph distance
Harrison Hartle, Brennan Klein, Stefan McCabe, Alexander Daniels, Guillaume St-Onge, Charles Murphy, Laurent Hébert-Dufresne
Network Science Institute, Northeastern University, Boston, MA, USA
Laboratory for the Modeling of Biological and Socio-Technical Systems, Northeastern University, Boston, MA, USA
Vermont Complex Systems Center, University of Vermont, Burlington, VT, USA
Département de Physique, de Génie Physique et d'Optique, Université Laval, Québec, Canada
Centre Interdisciplinaire de Modélisation Mathématique, Université Laval, Québec, Canada
Department of Computer Science, University of Vermont, Burlington, VT, USA
(Dated: August 5, 2020)

Quantifying the differences between networks is a challenging and ever-present problem in network science. In recent years a multitude of diverse, ad hoc solutions to this problem have been introduced. Here we propose that simple and well-understood ensembles of random networks, such as Erdős-Rényi graphs, random geometric graphs, Watts-Strogatz graphs, the configuration model, and preferential attachment networks, are natural benchmarks for network comparison methods. Moreover, we show that the expected distance between two networks independently sampled from a generative model is a useful property that encapsulates many key features of that model. To illustrate our results, we calculate this within-ensemble graph distance and related quantities for classic network models (and several parameterizations thereof) using 20 distance measures commonly used to compare graphs. The within-ensemble graph distance provides a new framework for developers of graph distances to better understand their creations and for practitioners to better choose an appropriate tool for their particular task.
I. INTRODUCTION
Quantifying the extent to which two finite graphs structurally differ from one another is a common, important problem in the study of networks. We see attempts to quantify the dissimilarity of graphs in both theoretical and applied contexts, ranging from the comparison of social networks [1–3], to time-evolving networks [4–8], biological networks [5], power grids and infrastructure networks [9], object recognition [10], video indexing [11], and much more. Together, these network comparison studies all seek to define a notion of dissimilarity or distance between two networks and to then use such a measure to gain insights about the networks in question.

However, it is often unclear which network features a given graph distance will or will not capture. For this reason, rigorous benchmarks must be established in order to better understand the tendencies and biases of these distances. We adopt the perspective that random graph ensembles are the appropriate tool to achieve this task. Specifically, by sampling pairs of graphs from within a given random ensemble with the same parameterization and measuring the graph distance between them, we create a benchmark that allows us to better understand the sensitivity of a given graph distance to known statistical features of an ensemble. Ultimately, a good benchmark would characterize the behavior of graph distances between graphs sampled both from within an ensemble and between different ensembles. We tackle the former in this paper, noting a rich diversity of behaviors among commonly used graph distance measures. Even though this work focuses on within-ensemble graph distances, these results guide our understanding of how any two sets of networks structurally differ from each other, regardless of whether those sets are generated by the same random ensemble or another network-generating process.

∗ correspondence: [email protected]
Put simply, the approach introduced in this work is general and can be used to develop a number of graph distance benchmarks.

There are many approaches used to quantify the dissimilarity between two graphs, and we highlight 20 different ones here. Given the large number of algorithms considered in this work, we find it useful to systematically characterize each of these measures. We do so by breaking them down into "description-distance" pairs. That is, every graph distance measure can be thought of as 1) computing some description or property of two graphs and 2) quantifying the difference between those descriptions using some distance metric.

A. Formalism of Graph Distances
Graph Descriptors
Definition 1.
A graph description Ψ is a mapping from a set of graphs G to a space D,

Ψ : G → D.   (1)

The set G is that of all finite labeled simple graphs, and the space D is known as the graph descriptor space. Typically, D is R^(l×m) for integers l, m, or is a space of probability distributions. Given a description Ψ, the descriptor of graph G, denoted ψ_G, is the element of D to which G is mapped; ψ_G = Ψ(G).

Descriptor Distances
Definition 2.
A distance maps a pair of descriptors to a nonnegative real value,

d : D × D → R+   (2)

and satisfies the following properties for all x, y ∈ D:

1. d(x, y) = d(y, x) (Symmetry)
2. d(x, x) = 0 (Identity Law)

The properties listed in this definition are general; they do not restrict the large space of measures we might use, while still providing a clean separation between how we choose to describe graphs and how we calculate the differences between those descriptions. A common property when considering distance measures is the triangle inequality; however, we have not included it in the list above, as not all commonly used graph distances obey this property [12]. As in the case of pseudometrics, d(x, y) = 0 does not always imply x = y [7, 13].

Graph Distances
Definition 3.
Given a set of graphs M ⊆ G, a graph description Ψ, its descriptor space D, and a distance d on D, the associated graph distance measure D : M × M → R+ is a function defined by

D(G, G′) = d(ψ_G, ψ_G′).   (3)

Every graph distance quantifies some notion of dissimilarity between two graphs [14].

Network spaces
Definition 4.
Given a distance d and description Ψ on descriptor space D and a set of graphs M ⊆ G, the associated network space, denoted (d, Ψ, M), is the set of descriptors mapped to by Ψ from graphs in M, equipped with d as a distance measure.

The network space (d, Ψ, M) consists of |M| points in D, namely {ψ_G}_{G ∈ M} ⊆ D, giving rise to |M|(|M| + 1)/2 distance values, one for each pair of descriptions of elements of M.

Fundamental questions naturally arise. Does a network space capture known properties of a given ensemble of graphs? We can begin to answer this question by considering sets of graphs with known properties: i.e., random graph models.

Models
Definition 5.
A model M_α is a process which generates a probability distribution P_α over a set of graphs M ⊆ G, where α is a vector of parameters needed by the model to generate the distribution.
Models are typically stochastic processes that take some parameters as inputs and generate sets of graphs. The probability distribution of model M_α is then defined over the set of graphs that have non-zero probability of being generated given the model and its parameters α. For many well-known models, we have a deep understanding of how the structure of sampled graphs is influenced by the parameter values. Using our knowledge of how parameters affect graph structure, we can see how well the expected features of a given model are reflected by the structure of each network space.

B. This study
Herein, we apply a variety of graph distances to pairs of independently and identically sampled networks from a variety of random network models, over a range of parameter values for each, and consider the within-ensemble distance distribution as a function of the type of graph and model parameters. While our focus is on the means of the distance distributions, we also include the standard deviations in each figure. Ultimately, we report the within-ensemble graph distances for 20 different graph distances from the software package netrd [15]. To our knowledge, this is the largest systematic comparison of graph distances to date.
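To make the description-distance decomposition concrete, the following minimal sketch (our own illustration using networkx and scipy, not netrd's actual API) takes the degree distribution as the description Ψ and the Jensen-Shannon distance as the descriptor distance d:

```python
import networkx as nx
import numpy as np
from scipy.spatial.distance import jensenshannon

def degree_distribution(G, n_bins):
    """Psi(G): the empirical degree distribution of G over bins 0..n_bins-1."""
    degrees = [d for _, d in G.degree()]
    counts = np.bincount(degrees, minlength=n_bins)
    return counts / counts.sum()

def graph_distance(G1, G2):
    """D(G, G') = d(psi_G, psi_G'), with d the Jensen-Shannon distance (base 2)."""
    n_bins = max(G1.number_of_nodes(), G2.number_of_nodes())
    return jensenshannon(degree_distribution(G1, n_bins),
                         degree_distribution(G2, n_bins), base=2)

# One within-ensemble distance sample: two independent draws from G(n, p).
G1 = nx.erdos_renyi_graph(200, 0.05, seed=1)
G2 = nx.erdos_renyi_graph(200, 0.05, seed=2)
d = graph_distance(G1, G2)
```

By construction this measure satisfies the symmetry and identity properties of Definition 2, and, because it only sees the degree sequence, it is invariant under graph isomorphism.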
II. METHODS

A. Ensembles
We study the behavior of (d, Ψ, M) for sets of graphs sampled from M_α under a variety of parameterizations. There are many graph ensembles that one could use to compute within-ensemble graph distances, and we begin by focusing on two broad classes: ensembles that produce graphs with homogeneous degree distributions and those that produce graphs with heterogeneous degree distributions. In total, we study the within-ensemble graph distance for five different ensembles.
1. Erdős-Rényi random graphs
Graphs sampled from the Erdős-Rényi model (ER), also known as G(n,p), have (undirected) edges among n nodes, with each pair being connected with probability p [16, 17]. This model is commonly used as a benchmark or a null model to compare with observed properties of real-world network data from nature and society. In our case, it allows us to explore the behavior of graph distance measures on dense and homogeneous graphs without any structure. In fact, this model maximizes entropy subject to a global constraint on expected edge density, p.

One well-studied construction of this ensemble is when p = ⟨k⟩/n, in which n nodes are connected uniformly at random such that nodes in the resulting graph have an average degree of ⟨k⟩. This ensemble is particularly useful for identifying which graph distance measures are able to capture key structural transitions that happen as the average degree increases. For convenience, we will refer to this ensemble as G(n,⟨k⟩).
2. Random geometric graphs
We work with random geometric graphs (RGG) of n nodes and edge density p, generated by sprinkling n coordinates uniformly onto a one-dimensional ring of circumference 2 and connecting all pairs of nodes whose coordinate distance (arc length) is less than or equal to p. Compared to G(n,p), this model produces graphs that have a high average local clustering coefficient, which is a property commonly found in real network data. Note that setting the connection distance to p means that p parameterizes the edge density exactly as in G(n,p) [18, 19].
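A minimal construction of this ensemble (our own sketch; the circumference-2 convention is implied by the requirement that an arc-length threshold of p yield edge density p, as stated above):

```python
import itertools
import networkx as nx
import numpy as np

def ring_rgg(n, p, seed=None):
    """1D ring RGG: n points uniform on a circumference-2 ring; arc length <= p connects."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0, size=n)        # positions along the ring
    G = nx.empty_graph(n)
    for i, j in itertools.combinations(range(n), 2):
        gap = abs(theta[i] - theta[j])
        arc = min(gap, 2.0 - gap)                # shorter arc between the two points
        if arc <= p:
            G.add_edge(i, j)
    return G

G = ring_rgg(400, 0.1, seed=0)
density = nx.density(G)                  # concentrates near p = 0.1
clustering = nx.average_clustering(G)    # much higher than an ER graph of equal density
```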
3. Watts-Strogatz graphs
Watts-Strogatz (WS) graphs allow us to study the effects that random, long-range connections have on otherwise large-world regular lattices. A WS graph is initialized as a one-dimensional regular ring lattice, parameterized by the number of nodes n and the even-integer degree of every node, ⟨k⟩ (each node connects to the ⟨k⟩/2 closest other nodes on either side). Each edge in the network is then randomly rewired with probability p_r, which generates graphs with both relatively high average clustering and relatively short average path lengths for a wide range of p_r ∈ (0, 1] [20].
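networkx ships a generator for this model; the sketch below (our own illustration) computes the two hallmark quantities, relative clustering and relative path length, for one parameterization:

```python
import networkx as nx

n, k = 500, 8
lattice = nx.watts_strogatz_graph(n, k, p=0.0, seed=0)    # p_r = 0: pure ring lattice
rewired = nx.connected_watts_strogatz_graph(n, k, 0.01, tries=100, seed=0)

C0 = nx.average_clustering(lattice)              # C: clustering before rewiring
L0 = nx.average_shortest_path_length(lattice)    # L: path length before rewiring
C = nx.average_clustering(rewired)
L = nx.average_shortest_path_length(rewired)
# Small-world signature: at small p_r, L/L0 has already dropped well below 1
# while C/C0 remains close to 1.
```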
4. (Soft) Configuration model with power-law degree distribution
We generate expected degree sequences from distributions with power-law tails with a mean of ⟨k⟩. We construct an instance of a "soft" configuration model (SCM), the maximum-entropy network ensemble with a given sequence of expected degrees, by connecting node pairs with probabilities determined via the method of Lagrange multipliers [21–23]. Through this method, we are able to construct networks with a tunable degree exponent, γ. The degree exponents that we test range from those that skew the distribution heavily, resulting in a highly heterogeneous ultra-small-world network (γ ∈ (2, 3)), to those that generate more homogeneous networks (γ > 3). In contrast to the ensembles we tested above, all of which have homogeneous degree distributions, the requirement of heterogeneity in these graphs constrains the possible edge densities to be vanishingly small. Otherwise, in the high-edge-density regime, degrees cannot fluctuate to appreciably larger-than-average values, and a natural degree scale is imposed by the network size.
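A hedged sketch of this construction: rather than solving for the exact Lagrange multipliers, we use the simpler Chung-Lu connection probability p_ij = min(1, w_i w_j / Σ_l w_l), which approximates the soft configuration model in the sparse regime; the expected degrees w_i are drawn from a Pareto-tailed distribution with exponent γ and rescaled to mean ⟨k⟩:

```python
import itertools
import networkx as nx
import numpy as np

def soft_cm(n, k_mean, gamma, seed=None):
    rng = np.random.default_rng(seed)
    # Pareto-tailed expected degrees: P(w > x) ~ x^-(gamma - 1), so the
    # resulting degree distribution has a power-law tail with exponent gamma.
    w = (1.0 - rng.random(n)) ** (-1.0 / (gamma - 1.0))
    w *= k_mean / w.mean()                       # rescale to target mean degree
    norm = w.sum()
    G = nx.empty_graph(n)
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < min(1.0, w[i] * w[j] / norm):
            G.add_edge(i, j)
    return G

G = soft_cm(1000, 12, gamma=2.5, seed=0)
k_emp = 2 * G.number_of_edges() / G.number_of_nodes()   # near 12, minus truncation
k_max = max(d for _, d in G.degree())                   # heavy tail: k_max >> k_emp
```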
5. Nonlinear preferential attachment
The final ensemble of networks included here is grown under a degree-based nonlinear preferential attachment mechanism (PA) [24–26]. A network of n nodes is grown as follows: each new node is added to the network sequentially, connecting its m edges to nodes v_i ∈ V already in the network with probability Π_i = k_i^α / Σ_j k_j^α, where k_i is the degree of node v_i and α modulates the probability that a given node already in the network will collect new edges. When α = 1, this model generates networks with a power-law degree distribution (with degree exponent γ = 3), and a condensation regime emerges as n → ∞ when α > 1, producing a star network with O(n) nodes all connected to a main hub node [26].

B. Graph distance measures
The study of network similarity and graph distance has yielded many approaches for comparing two graphs [5]. Typically, these methods involve comparing simple descriptors based on either aggregate statistical properties of two graphs, such as their degree or average path length distributions [4], or intrinsic spectral properties of the two graphs, such as the eigenvalues of their adjacency matrices or of other matrix representations [27]. The description distances also tend to fall into two broad categories: either classic definitions of norms or distances based on statistical divergence. While different approaches are better suited for capturing differences between certain types of graphs, they are nonetheless expected to share several properties.

The simplest graph distances aggregate element-wise comparisons between the adjacency matrices of two graphs [28–31], and extensions thereof [32]; these methods depend explicitly on the node labeling scheme (and hence are not invariant under graph isomorphism [33]), which may limit their utility when comparing graphs with unknown labels (e.g., graphs sampled from random graph ensembles, as we do here). Several measures collect empirical distributions [34] or a "signature" vector [1] from each graph and take the distance between them (using the Jensen-Shannon divergence, Canberra distance, earth mover's distance, etc. [35]), which, among other things,
 #   Graph distance                                       Label
 1   Jaccard dissimilarity                                JAC
 2   Hamming distance                                     HAM
 3   Hamming-Ipsen-Mikhailov                              HIM
 4   Frobenius norm                                       FRO
 5   Polynomial dissimilarity                             POD
 6   Degree distribution Jensen-Shannon divergence        DJS
 7   Portrait divergence                                  POR
 8   Quantum density matrix Jensen-Shannon divergence     QJS
 9   Communicability sequence entropy                     CSE
10   Graph diffusion distance [42]                        GDD
11   Resistance-perturbation [8]                          REP
12   NetLSD [3]                                           LSD
13   Lap. spectrum; Gauss. kernel, JSD [27]               LGJ
14   Lap. spectrum; Loren. kernel, Euc. [27]              LLE
15   Ipsen-Mikhailov [43]                                 IPM
16   Non-backtracking eigenvalue [7]                      NBD
17   Distributional non-backtracking [38]                 DNB
18   D-measure distance [9]                               DMD
19   DeltaCon [2]                                         DCN
20   NetSimile [1]                                        NES
TABLE I. Graph distances. Distance measures used to systematically compare graphs in this work, as well as their abbreviated labels and their sources. Abbreviations: Lap. = Laplacian, Gauss. = Gaussian, Loren. = Lorenzian, JSD = Jensen-Shannon divergence, Euc. = Euclidean distance.

facilitates comparison of differently sized graphs [4, 36]. Another family of approaches compares spectral properties of certain matrices characterized by the graphs [37], such as the non-backtracking matrix [7, 38] or the Laplacian matrix [27]. The relevant spectral properties associated with these distances are invariant under graph isomorphism [33, 39]. Some graph distances have been shown to be metrics (i.e., they satisfy properties such as the triangle inequality) [12], whereas others have not. These are not exhaustive descriptions of every graph distance in use today, but they represent coarse similarities between the various methods. We summarize the 20 graph distances we consider in Table I and define them more extensively in Supplemental Information (SI) B.
C. Description of experiments
See Table II for the full parameterization of these sampled graphs. In each experiment, we generate N pairs of graphs for every combination of parameters. With these sampled random graphs, we measure the distance between pairs from the same parameterization of the same model, M_α, and report statistical properties of the resulting vectors of distances. In other words, our experiments consist of calculating mean within-ensemble graph distances,

⟨D⟩ = Σ_{G,G′ ∈ G} D(G, G′) P_α(G) P_α(G′),   (4)

where P_{M,α} : G → [0, 1] (or P_α when its meaning is unambiguous) is the graph probability distribution for model M_α. This is estimated by sampling N ≫ 1 graph pairs {(G_i, G′_i)}_{i=1}^N and computing

⟨D⟩ ≈ (1/N) Σ_{i=1}^N D(G_i, G′_i).   (5)

We then study the behavior of ⟨D⟩ for various M_α. The error on the mean within-ensemble graph distance is estimated from the standard error of the mean, σ_⟨D⟩ ≈ σ_D/√N, where σ_D is the standard deviation of the within-ensemble graph distance D, also estimated by sampling. For all experiments, this number of sampled pairs is sufficient in general, as can be seen from the small standard error relative to the mean in all figures. In each plot, we also include the standard deviations σ_D of the within-ensemble graph distances, and we highlight when the standard deviation offers particularly notable insights into the behavior of certain distances.

Lastly, there are several distances that assume alignment in the node labels of G and G′. Because we are sampling from random graph ensembles, the networks we study here are not node-aligned, and as such, care should be taken when interpreting the output of these graph distances. For every description of graph distances in SI B, we note if node alignment is assumed.

TABLE II. Experiment parameterization. Here we report the ensembles that were used in these experiments, as well as their parameterizations.

Ensemble    Fixed parameter(s)        Key parameter
G(n,p)      n = 500                   p ∈ {…}
RGG         n = 500                   p ∈ {…}
G(n,⟨k⟩)    n = 500                   ⟨k⟩ ∈ {…, n}
WS          n = 500, ⟨k⟩ = 8          p_r ∈ {…}
SCM         n = 1000, ⟨k⟩ = 12        γ ∈ {…}
PA          n = 500, ⟨k⟩ = 4          α ∈ {…}

For the G(n,⟨k⟩) and WS key parameters, we span 100 values, spaced logarithmically, between the values above. Parameter labels: n = network size, p = density, ⟨k⟩ = average degree, p_r = probability that a random edge is randomly rewired, γ = power-law degree exponent, α = preferential attachment kernel. Note: in SI A, we show how the within-ensemble graph distance changes as n increases.

III. RESULTS
In the following sections, we broadly describe the behavior of the mean within-ensemble graph distance (in general denoted ⟨D⟩) for the distance measures tested. The general structure of this section is motivated by critical properties of the ensembles studied here. We highlight features of the within-ensemble graph distance for two broad characterizations of networks: homogeneous and heterogeneous graph ensembles, focusing on specific ensembles within each category.

All of the main results from the experiments described below are summarized in Table III, which practitioners may find especially useful when considering which tools to use for comparing networks with particular structures. When relevant, we highlight certain distance measures to emphasize interesting within-ensemble graph distance behaviors.

A. Results for homogeneous graph ensembles
1. Dense graph ensembles
Here, we present our results for the two models that produce homogeneous and dense graphs.

The G(n,p) model possesses three notable features that we might expect graph distance measures to recover. Note that while we might expect graph distances to recover these features, we are not asserting that every graph distance measure should capture these properties.

1. The size of the ensemble shrinks to a single isomorphic class in the limits p → 0 and p → 1, corresponding respectively to an empty and a complete graph of size n. In both limits, we might therefore expect ⟨D(M_{n,p})⟩ to go to zero for any method that considers unlabelled graphs.

2. The G(n,p) model creates ensembles of graphs and graph complements symmetric under the change of variable p′ = 1 − p. By definition, every graph G has a complement Ḡ such that every edge that does (or does not) exist in G does not (or does) exist in Ḡ. Therefore, for every graph in G(n,p), one can expect to find its complement occurring with the same probability in G(n, 1−p). We might expect ⟨D(M_{n,p})⟩ = ⟨D(M_{n,1−p})⟩ if graph distances can capture this symmetry.

3. A density of p = 1/2 produces the G(n,p) ensemble with maximal entropy (all graph configurations have an equal probability). As a result, we might also expect ⟨D(M_{n,p})⟩ to have a global maximum at p = 1/2.

The RGG model shares features 1 and 3 with the G(n,p) model, but not feature 2. Moreover, the most significant difference between the two models is that edges are not independent in the RGG model. Correlations between edges lead to local structure (i.e., higher-order structures like triangles) and to correlations in the joint degree distribution. We therefore do not expect distance measures focused on the degree distribution to produce exactly the same mean within-ensemble distance curve in RGG as in G(n,p). Conversely, any distance measure that does produce the exact same within-ensemble distance curve for RGG and G(n,p) either fails to account for these correlations, or the effect of these correlations on the overall distance between two graphs drawn from the ensemble is negligible. This is the case for HAM, HIM, and FRO.

Our results for homogeneous graph ensembles are shown in Figure 1. Only 5 out of 20 graph distances capture all features discussed above, namely: HAM, HIM, FRO, POD, and DJS. Notably, these are some of the simplest methods considered. In fact, these include two for which theoretical predictions for ER graphs precisely match the observed results for both ER graphs and RGGs, despite no consideration of RGGs having been included in such calculations. In one such case (FRO), ER graphs and RGGs behave identically, yet there is also an n-dependence (see SI Figure 6).
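A numeric sanity check of feature 2 using the labeled Hamming distance (our own sketch, not netrd's implementation): for G(n,p) the expected Hamming distance between two independent samples is 2p(1 − p), which is symmetric under p ↦ 1 − p.

```python
import networkx as nx
import numpy as np

def hamming(G1, G2, n):
    """Fraction of node pairs whose edge/non-edge status differs."""
    A1 = nx.to_numpy_array(G1, nodelist=range(n))
    A2 = nx.to_numpy_array(G2, nodelist=range(n))
    return np.abs(A1 - A2).sum() / (n * (n - 1))

def mean_within_ensemble(n, p, N=50, seed=0):
    """<D> estimated as in Eq. (5), averaging over N independently sampled pairs."""
    vals = [hamming(nx.erdos_renyi_graph(n, p, seed=seed + 2 * i),
                    nx.erdos_renyi_graph(n, p, seed=seed + 2 * i + 1), n)
            for i in range(N)]
    return float(np.mean(vals))

n, p = 200, 0.2
d_p = mean_within_ensemble(n, p)
d_q = mean_within_ensemble(n, 1 - p, seed=10_000)
# Both estimates concentrate around 2 * 0.2 * 0.8 = 0.32.
```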
2. Sparse graph ensembles
While the previous section highlighted dense RGG and ER networks, we now turn to the within-ensemble graph distance of sparse homogeneous graphs sampled from G(n,p), such that p = ⟨k⟩/n. In the case of sparse graphs, the edge density decays to zero in the n → ∞ limit as the mean degree ⟨k⟩ remains fixed. We found it important to distinguish this case from dense G(n,p) because of critical transitions that take place as ⟨k⟩ increases. For network scientists, these early transition points in sparse networks are foundational, with implications for a number of network phenomena (e.g., the occurrence of outbreaks in disease models [44]).

In fact, the presence of such critical transitions in random graph models underscores the utility of this approach for studying graph distance measures. That is, a sudden change in the within-ensemble graph distance signals abrupt changes in the probability distribution over the set of graphs in the ensemble (i.e., the emergence of novel graph structures that are markedly different from the greater population of graphs in the ensemble). This may show up as a local or global maximum in the within-ensemble graph distance near parameter values at which this transition occurs. Conversely, if a sudden decrease in the within-ensemble graph distance is observed, there may be a sudden disappearance or reduction of largely dissimilar graphs in the ensemble.

In the case of G(n,p) where p = ⟨k⟩/n, which we will refer to with the shorthand G(n,⟨k⟩), the following critical transition emerges:

4. At ⟨k⟩ = 1, we see the emergence of a giant component in ER networks (likewise, a 2-core emerges at ⟨k⟩ = 2). We might expect, for example, a within-G(n,⟨k⟩) graph distance to have a local maximum at such values.

Ultimately, we observe that distance measures that
[Table III spans here; its check-mark grid could not be preserved in this text rendering, so only its structure is listed. Columns (distances): JAC, HAM, HIM, FRO, POD, DJS, POR, QJS, CSE, GDD, REP, LSD, LGJ, LLE, IPM, NBD, DNB, DMD, DCN, NES. Rows (model, property): G(n,p), complement symmetry; G(n,p), derivative with network size n; RGG, maximum at p ≈ 1/2; G(n,⟨k⟩), detects the giant 1-core; G(n,⟨k⟩), detects the giant 2-core; G(n,⟨k⟩), derivative with network size n; WS, small-world > random; WS, path length sensitivity; WS, clustering sensitivity; SCM, maximum at 2 < γ < 3; SCM, monotonic decay as γ grows; PA, heterogeneous > homogeneous; PA, maximum at α ≈ 0 (uniform); PA, maximum at α ≈ 1 (linear); PA, maximum at 0 < α ≤ 1.]

Legend: ✓ = captures a given property through a global maximum/minimum in its within-ensemble graph distance curve; ∼ = non-monotonic relationship between network size and within-ensemble graph distance; ✓* = potentially captures a given property (via local maxima in the mean or standard deviation, change in slope, etc.); ✓† = monotonic decay beyond a very small value of γ (γ ≈ 2) where there is an apparent maximum (for SCM).

TABLE III.
Summary of key within-ensemble graph distance properties for different ensembles. Each of the ensembles included in this work has characteristic properties that a within-ensemble graph distance may be able to capture. Here we consolidate these various properties into a single table that classifies whether each distance has a given property. Models considered are dense Erdős-Rényi graphs (G(n,p)), random geometric graphs (RGG), sparse Erdős-Rényi graphs (G(n,⟨k⟩)), the Watts-Strogatz model (WS), the soft configuration model with power-law degree distribution (SCM), and general preferential attachment with kernel α (PA). Clarifications: In the WS model, we look at three properties: 1) the mean within-ensemble graph distance is larger for intermediate "small-world" values of p_r than it is when p_r = 1; 2) the within-ensemble graph distance is sensitive to values of p_r where the magnitude of the slope of the L_p/L curve is largest ("path length sensitivity" above); 3) the within-ensemble graph distance is sensitive to values of p_r where the magnitude of the slope of the C_p/C curve is largest ("clustering sensitivity" above). In the PA model, we look at whether high, positive values of α produce greater mean within-ensemble graph distances than low, negative values of α, and at where the maximum within-ensemble distance occurs.

are fundamentally associated with flow-based properties of the network (i.e., distance measures based on a graph's Laplacian matrix, communicability, or other properties important to diffusion, such as path-length distributions) are the ones most sensitive for picking up on this property (Figure 2) [45].

What Figure 2 highlights, which the dense ensembles in Figure 1 could not, is the rich and varied behavior characteristic of sparse graphs. For example, the distance measures with maxima at p = 1/2 (HAM, HIM, FRO, POD, DJS, etc.) are still seen in Figure 2, but the emphasis is instead on the degree as opposed to the edge density; given that most real-world networks are sparse [46], this view of the same parameter is especially informative.

Importantly, while the qualitative behaviors discussed here are general features of the models and distances, the quantitative value of the average within-ensemble graph distance also depends on network size. There are no specific structural transitions to discuss around this dependency, but it can be an important problem when comparing networks of different sizes without a good understanding of how network distances might behave.
Interested readers can find our results in SI A, where we use G(n,⟨k⟩) to vary network size while keeping all other features fixed.
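The transition at ⟨k⟩ = 1 that drives these local maxima can be seen directly in the relative size of the largest component (our own illustration, independent of any particular distance measure):

```python
import networkx as nx

def giant_fraction(n, k_mean, samples=20, seed=0):
    """Mean fraction of nodes in the largest component of G(n, <k>/n)."""
    total = 0.0
    for s in range(samples):
        G = nx.fast_gnp_random_graph(n, k_mean / n, seed=seed + s)
        total += max(len(c) for c in nx.connected_components(G)) / n
    return total / samples

below = giant_fraction(2000, 0.5)   # subcritical: largest component is vanishing
above = giant_fraction(2000, 2.0)   # supercritical: giant component spans most nodes
```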
3. Small-world graphs
The final homogeneous graph ensemble studied here is the Watts-Strogatz model. This model generates networks that are initialized as lattices, whose edges are then randomly rewired with probability p_r. At certain values of p_r, we see two key phenomena occur:

5. "Entry" into the small-world regime: Even as the edges in the network are minimally rewired, the average path length quickly decreases relative to its initial (longer) value. This is highlighted by the blue curve in Figure 3, corresponding to L_p/L, where L is the average path length before any edges have been rewired. For the parameterizations used in this study, the largest (negative) slope of this curve is at p_r ≈ …. We might expect a within-ensemble graph distance to be sensitive to this or nearby values of p_r, as this region corresponds to changes in the graphs' common structural features.

6. "Exit" from the small-world regime: After enough edges have been rewired, the network loses whatever clustering it had from originally being a lattice, reducing to approximately the clustering of an ER graph. This is highlighted by the violet curve in Figure 3, corresponding to C_p/C, where C is the average clustering before any edges have been rewired. For the parameterizations used in this study, the largest (negative) slope of this curve is at p_r ≈ …. Again, we might expect a within-ensemble graph distance to be sensitive to this large decrease in clustering.

Together, the above features characterize Watts-Strogatz networks. Importantly, we are interested in whether a distance measure is sensitive to these "entry" and "exit" values of p_r; "sensitive" here is deliberately broadly defined. For instance, as in the case of CSE, we observe a reduction in the within-ensemble graph distance at a rate that almost exactly resembles the rate at which C_p/C decays. Alternatively, a distance measure can be sensitive to these critical points by having a local maximum at or around the critical point.

[Figure 1 spans here: "Within-ensemble graph distances: G(n,p) and RGG (n=500)"; one panel per distance measure, JAC through NES, each showing ⟨D⟩ versus p.]

FIG. 1. Mean and standard deviations of the within-ensemble distances for G(n,p) and RGG. By repeatedly measuring the distance between pairs of G(n,p) and RGG networks of the same size and density, we begin to see characteristic behavior in both the graph ensembles as well as the graph distance measures themselves. In each subplot, the mean within-ensemble graph distance is plotted as a solid line with a shaded region around it for the standard error (⟨D⟩ ± σ_⟨D⟩; note that in most subplots, the standard error is too small to see), while the dashed lines are the standard deviations.
In the case of POR, we see that the within-ensemble graph distance is maximized at approximately the same point as the largest (negative) slope of the L(p_r)/L(0) curve.

Here, insensitivity to these critical points is also an informative property to highlight in a distance measure. As one example, HAM appears to be otherwise unaffected by the "exit" from the small-world regime, with distances increasing steadily despite the model generating networks with dramatic structural differences.

Within-ensemble graph distances: G(n, ⟨k⟩) (n = 500)
FIG. 2. Mean and standard deviations of the within-ensemble distances for G(n, ⟨k⟩) networks. Here, we generate pairs of ER networks with a given average degree, ⟨k⟩, and measure the distance between them with each distance measure. In each subplot, we highlight ⟨k⟩ = 1 and ⟨k⟩ = 2. The mean within-ensemble graph distance is plotted as a solid line with a shaded region around it for the standard error (⟨D⟩ ± σ_⟨D⟩; note that in most subplots, the standard error is too small to see), while the dashed lines are the standard deviations.

Lastly, we ask whether the within-ensemble graph distance of random networks (i.e., as p_r → 1) is greater than that of small-world networks; this is indicated by a within-ensemble graph distance curve that is higher at p_r = 1 than at intermediate values of p_r in Figure 3. This property holds for distance measures that depend on node labeling (e.g., JAC, HAM, HIM, FRO, POD) but also for
DJS (which is intuitive, since more rewiring noise increases the variance of the degree distribution) as well as for a few puzzling distances:
QJS, DCN, and the two based on the non-backtracking matrix,
NBD and
DNB.

B. Results for sparse heterogeneous ensembles
The sparse graph setting is much closer to that of real networks, which often also have heavy-tailed degree distributions [47]. This motivated the selection of the following two heterogeneous, sparse ensembles.
1. Soft configuration model: heavy-tailed degree distribution
We study these graphs using a (soft) configuration model with a power-law expected degree distribution; i.e., the expected degree κ of a node is drawn with probability proportional to κ^(−γ). From this model, we expect two important features that graph distance measures could recover:

7. For γ < 3, we know the variance of the degree distribution diverges in the limit of large graph size n [47]. Since there should be large variations in the degree sequences of two finite instances, we might also expect the graph distances to produce a maximal distance ⟨D⟩.

Within-ensemble graph distances: Watts-Strogatz (n = 500, k = 8)
FIG. 3.
Mean and standard deviations of the within-ensemble distances for Watts-Strogatz networks.
Here, we generate pairs of Watts-Strogatz networks with a fixed size and average degree but a variable probability of rewiring random edges, p_r. In each subplot we also plot the clustering and path-length curves, C(p_r)/C(0) and L(p_r)/L(0), as in the original Watts-Strogatz paper [20] to accentuate the "small-world" regime with high clustering and low path lengths. The mean within-ensemble graph distance is plotted as a solid line with a shaded region around it for the standard error (⟨D⟩ ± σ_⟨D⟩; note that in most subplots, the standard error is too small to see), while the dashed lines are the standard deviations.
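The diverging degree variance described in item 7 can be checked with a quick inverse-transform sampling experiment. This is an illustrative sketch only: the cutoff kappa_min = 1, the specific γ values, and the function names are assumptions, not the paper's simulation setup.

```python
import random
import statistics

def sample_expected_degrees(gamma, n, kappa_min=1.0, seed=1):
    """Inverse-transform sampling of n i.i.d. expected degrees from a
    Pareto density p(kappa) ~ kappa**(-gamma) for kappa >= kappa_min,
    using kappa = kappa_min * (1 - u)**(-1 / (gamma - 1))."""
    rng = random.Random(seed)
    return [kappa_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]

# The distributional variance diverges for gamma <= 3, so a heavy-tailed
# sample (gamma = 2.5) is far more variable than a homogeneous one (gamma = 5).
var_heavy = statistics.pvariance(sample_expected_degrees(2.5, 100_000))
var_homog = statistics.pvariance(sample_expected_degrees(5.0, 100_000))
```

The gap between the two sample variances widens further with sample size, which is the finite-size signature of the divergence underlying item 7.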
8. We might also expect a monotonic decay in the within-ensemble graph distance as γ increases. For large γ, most expected node-degrees will be approximately the average degree, making the network as a whole structurally similar to an ER graph. On the other hand, when γ is small (especially when γ ≤ 3), there is a wide diversity in the degrees of nodes within the graph, and of the expected degrees of nodes across graphs (since expected degrees are i.i.d. sampled from a Pareto distribution).

Of the 20 distances studied, most capture both of these features. Since γ tunes the degree heterogeneity (larger γ yielding more homogeneous graphs), a reasonable expectation is that pairs of graphs on average become farther apart as γ is decreased. This is observed in many distances, with the exceptions of QJS and
REP, which each instead exhibit maxima at certain finite values of γ. Additionally, several distances (HAM, POR, NBD, and
NES) appear to decay monotonically beyond some very small value of γ, below which they have a slightly smaller value. This could have arisen as a finite-size effect or from other details of the implementation, since fluctuations become highly pronounced at the smallest values of γ.
Within-ensemble graph distances: soft configuration model (n=1000, k=12)
FIG. 4.
Mean and standard deviations of the within-ensemble distances for soft configuration model networks with varying degree exponent.
Here, we generate pairs of networks from a (soft) configuration model, varying the degree exponent, γ, while keeping ⟨k⟩ constant (n = 1000). In each subplot we highlight γ = 3. The mean within-ensemble graph distance is plotted as a solid line with a shaded region around it for the standard error (⟨D⟩ ± σ_⟨D⟩; note that in most subplots, the standard error is too small to see), while the dashed lines are the standard deviations.

Only one graph distance produces completely unexpected behavior:
DCN yields a ⟨D⟩ that monotonically increases with the scale exponent γ of the degree distribution, and its standard deviation is minimized at an intermediate value of γ. We will expand upon this in the following section.
2. Nonlinear preferential attachment
Within-ensemble graph distances: preferential attachment networks (n = 500, m = 2)
FIG. 5. Mean and standard deviations of the within-ensemble distances for preferential attachment networks. Here, we generate pairs of preferential attachment networks, varying the preferential attachment kernel, α, while keeping the size and average degree constant. As α → ∞, the networks become more and more star-like, and at α = 1, this model generates networks with power-law degree distributions. The mean within-ensemble graph distance is plotted as a solid line with a shaded region around it for the standard error (⟨D⟩ ± σ_⟨D⟩; note that in most subplots, the standard error is too small to see), while the dashed lines are the standard deviations.

The final ensemble we include here is the nonlinear preferential attachment growth model. By varying the preferential attachment kernel, parameterized by α, we can capture a range of network properties:

9. As α → −∞, this model generates networks with maximized average path lengths, whereby each new node connects its m links to the nodes with the smallest degree; conversely, α → ∞ generates star-like networks [48], an effect known as condensation.

10. At α = 1, linear preferential attachment, we see the emergence of scale-free networks [24], whereas uniform attachment (α = 0) gives each node an equal chance of receiving the incoming node's links.

When α = 1, this ensemble theoretically generates networks with power-law degree distributions (with degree exponent γ = 3 [25]), which is reminiscent of the results in Figure 4, where we measure the within-ensemble graph distances while varying γ.

Various mean within-ensemble distances are maximized in the range α ∈ [1, 2], which is indicative of the diversity of possible graphs that can be produced by the preferential attachment mechanism in the small-α regime. For α ≪ 0, newly arriving nodes connect primarily to the lowest-degree existing nodes (for example, leading to long chains of degree-2 nodes when m = 1), making many distance measures record i.i.d. pairs of graphs as similar. For α ≫ 1, new nodes tend to connect to the highest-degree existing node, leaving a star-like network; then, likewise, many graph-pairs are deemed very similar. In the intermediate range (e.g., linear preferential attachment, α = 1), a much wider variety of possible graphs can arise. Thus, on average, i.i.d. pairs are (usually) measured as farthest apart in that range.

For preferential attachment networks, we again see curious behavior for DCN where, unlike most other distance measures, heterogeneous graphs (α near 1) have smaller within-ensemble graph distances than more homogeneous graphs (smaller α). Upon closer examination, we know why this happens, and to conclude this section, we will walk through the anatomy of DCN and show why its behavior is often different from that of the other distance measures studied here, especially for heterogeneous networks. The descriptor ψ_G that DCN is based on is an affinity matrix of the graph (constructed from a belief propagation algorithm; see SI B 18 for the full methodology), while the distance is calculated using the Matusita distance (similar to the Euclidean distance).
The authors note that they selected this distance because they found that it gave more desirable results: "...it 'boosts' the node affinities and, therefore, detects even small changes in the graphs (other distance measures, including [Euclidean distance], suffer from high similarity scores no matter how much the graphs differ)" [2]. What the choice of the Matusita distance has apparently obscured, however, is a greater specificity for distinguishing heterogeneous networks. We know this because of preliminary experiments in which the Matusita distance is swapped out for a Jensen-Shannon divergence (as in, for example,
CSE); the resulting within-ensemble graph distance is maximized for heterogeneous networks (intermediate values of α). Finally, as we note in Section III A 1, we are not asserting that a graph distance measure should detect the unique behavior of linear preferential attachment (α = 1). Nor are we advocating for practitioners to abandon the use of DCN. What we are claiming, however—and why we chose to focus on
DCN in this section—is that we need useful benchmarks for understanding the effects of choosing one descriptor-distance pairing over another. Furthermore, this benchmark should be based on the within-ensemble graph distances of well-known ensembles.
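The nonlinear preferential attachment mechanism discussed in this section can be sketched in a few lines. This is a simplified toy implementation for building intuition (the seed clique, sampling scheme, and parameter values are assumptions, not the paper's generator): each new node attaches m edges to distinct existing nodes chosen with probability proportional to degree^α.

```python
import random

def nonlinear_pa_graph(n, m, alpha, seed=0):
    """Grow a graph by nonlinear preferential attachment: each new node
    attaches m edges to distinct existing nodes chosen with probability
    proportional to degree**alpha."""
    rng = random.Random(seed)
    # seed network: a small clique on m + 1 nodes, each with degree m
    degree = {i: m for i in range(m + 1)}
    edges = {frozenset((i, j))
             for i in range(m + 1) for j in range(i + 1, m + 1)}
    for new in range(m + 1, n):
        targets = []
        for _ in range(m):
            candidates = [v for v in degree if v not in targets]
            weights = [degree[v] ** alpha for v in candidates]
            r = rng.random() * sum(weights)
            chosen, acc = candidates[-1], 0.0
            for v, w in zip(candidates, weights):
                acc += w
                if acc >= r:
                    chosen = v
                    break
            targets.append(chosen)
        for t in targets:
            edges.add(frozenset((new, t)))
            degree[t] += 1
        degree[new] = m
    return edges, degree

# alpha >> 1: condensation onto a hub; alpha << 0: degrees stay nearly uniform
_, deg_star = nonlinear_pa_graph(200, 2, 4.0)
_, deg_flat = nonlinear_pa_graph(200, 2, -4.0)
```

Comparing the two degree dictionaries shows the condensation effect of item 9 directly: a dominant hub emerges for large positive α, while large negative α keeps all degrees close to m.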
IV. DISCUSSION
Graph ensembles are core to the characterization and broader study of networks. Graphs sampled from a given ensemble will highlight certain observable features of the ensemble itself, and in this work we have used the notion of graph distance to further characterize several commonly studied graph ensembles. The present study focused on one of the simplest quantities to construct given a distance measure and a graph ensemble, namely the mean within-ensemble distance ⟨D⟩. Note, however, that there are many ensembles for which the present methods could be repeated, as well as more graph distance measures, and infinitely many other statistics that could be examined from the within-ensemble distance distribution. Despite examining the within-ensemble graph distances for only five different ensembles, we observed a richness and variety of behaviors among the various distance measures tested. We view this work as the starting point for more inquiries into the relationship between graph ensembles and graph distances.

One promising future direction for the study of within-ensemble graph distances is the prospect of deriving functional forms for various distance measures, as we do for JAC, HAM, and
FRO in SI C 1, C 2, and C 3. Other distance measures, such as
DJS, likely have approximate analytical expressions for certain graph ensembles.

We have here only studied the behavior of graphs within a given ensemble and parameterization, which is essentially the simplest possible choice. This leaves wide open any questions regarding distances between graphs sampled from different ensembles, or even from two different parameterizations of the same ensemble. These will be the topic of follow-up works. Nevertheless, such follow-ups will likewise only cover a very small fraction of all possible combinations.

We hope that our approach will provide a foundation for researchers to clarify several aspects of the network comparison problem. First, we expect that practitioners will be able to use the within-ensemble graph distance to rule out sub-optimal distance measures that do not pick up on meaningful differences between networks in their domain of interest (e.g., what is an informative descriptor-distance comparison between brain networks may not be as informative when comparing, for example, infection trees in epidemiology). Second, we expect that this work will provide a foundation for researchers looking to develop new graph distance measures (or hybrid distance measures, such as
HIM) that are more appropriate for their particular application areas.

There were 20 different graph distances used in this work, with undoubtedly more that we have not included. Each of these measures seeks to address the same problem: quantifying the dissimilarity of pairs of networks. We see the current work as an attempt to consolidate all such methods into a coherent framework, namely, casting each distance measure as a mapping of two graphs into a common descriptor space, followed by the application of a distance measure within that space. Beyond that, we also suggest that stochastic generative graph models, because of their known structural properties and certain critical transition points in their parameter space, are the ideal tool for characterizing and benchmarking graph distance measures.

Classic random graph models can fill an important gap by providing well-understood benchmarks on which to test distance measures before using them in applications. Much like in other domains of network science, having effective and well-calibrated comparison procedures is vital, especially given the great diversity of graph ensembles under study and of networks in nature.
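The descriptor-space framing above can be made concrete in a few lines: a graph distance is the composition of a descriptor map (graph → vector) with a metric on the descriptor space. The sketch below is a toy instance of that framing (the degree-histogram descriptor and Euclidean metric are illustrative choices, not a specific measure from the paper).

```python
import math
from collections import Counter

def degree_histogram(edges, n):
    """Descriptor: map a graph (edge set on n labeled nodes) to its
    empirical degree distribution."""
    deg = Counter()
    for e in edges:
        u, v = tuple(e)
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg[i] for i in range(n))
    return [counts[k] / n for k in range(n)]

def euclidean(p, q):
    """Metric applied within the descriptor space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def make_graph_distance(descriptor, metric, n):
    """Compose a descriptor map with a metric to get a graph distance."""
    return lambda e1, e2: metric(descriptor(e1, n), descriptor(e2, n))

dist = make_graph_distance(degree_histogram, euclidean, 4)
triangle = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))}
path = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
```

Swapping either component, e.g. replacing the histogram with a spectral descriptor, or the Euclidean metric with a Jensen-Shannon divergence, yields a different member of the family of measures compared in this work.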
SOFTWARE AND DATA AVAILABILITY
All the experiments in this paper were conducted using the netrd
Python package, https://github.com/netsiphd/netrd . A repository with replication materials can be found at https://github.com/jkbren/wegd .

ACKNOWLEDGEMENTS
The authors thank Tina Eliassi-Rad, Dima Krioukov, and Leo Torres for helpful comments about this work throughout. This work was supported in part by the Network Science Institute at Northeastern University and the Vermont Complex Systems Center. B.K. acknowledges support from the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program. G.S. and C.M. acknowledge support from the Natural Sciences and Engineering Research Council of Canada and the Sentinel North program, financed by the Canada First Research Excellence Fund. L.H.D. and A.D. acknowledge support from the National Science Foundation's Grant No. DMS-1829826.
AUTHOR CONTRIBUTIONS
All authors contributed to the conception of the project. H.H., A.D., & L.H.D. devised the formalism used in this work. H.H., B.K., and S.M. conducted simulations of the within-ensemble distances. S.M. led the development of the netrd software package that was used to perform the analyses. All authors contributed to writing the manuscript. H.H. & B.K. contributed equally.

[1] M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos, arXiv preprint arXiv:1209.2684 (2012).
[2] D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos, ACM Transactions on Knowledge Discovery from Data (TKDD), 1 (2016).
[3] A. Tsitsulin, D. Mottin, P. Karras, A. Bronstein, and E. Müller, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2347 (2018).
[4] J. P. Bagrow and E. M. Bollt, Applied Network Science, 1 (2019).
[5] C. Donnat and S. Holmes, Annals of Applied Statistics, 971 (2018).
[6] N. Masuda and P. Holme, Scientific Reports, 1 (2019).
[7] L. Torres, P. Suárez-Serrato, and T. Eliassi-Rad, Applied Network Science, 41 (2019).
[8] N. D. Monnig and F. G. Meyer, Discrete Applied Mathematics, 347 (2018).
[9] T. A. Schieber, L. C. Carpi, A. Díaz-Guilera, P. M. Pardalos, C. Masoller, and M. G. Ravetti, Nature Communications, 1 (2017).
[10] R. C. Wilson and P. Zhu, Pattern Recognition, 2833 (2008).
[11] H. Bunke and K. Shearer, Pattern Recognition Letters, 255 (1998).
[12] J. Bento and S. Ioannidis, Applied Network Science, 1 (2019).
[13] For example, two cospectral but non-identical graphs would have distance zero according to any spectral distance measure.
[14] Throughout this paper, we use the term "graph distance" or "distance" to refer to a dissimilarity measure between two graphs satisfying the properties we detail in Section I A. This language is somewhat imprecise from a mathematical perspective; many graph distances do not meet all the criteria of distance metrics. We have chosen to keep the term "graph distance" at the cost of some informality to maintain consistency with much of the existing literature we draw upon. We thank an anonymous reviewer for their help in clarifying this matter.
[15] https://github.com/netsiphd/netrd/ . Note: this software package includes several more distances that were not included in these analyses, and as it is an open-source project, we anticipate that it will be updated with new distance measures as they continue to be developed.
[16] P. Erdős and A. Rényi, Publicationes Mathematicae, 290 (1959).
[17] B. Bollobás, European Journal of Combinatorics, 311 (1980).
[18] J. Dall and M. Christensen, Physical Review E, 016121 (2002).
[19] M. Penrose, Random Geometric Graphs (Oxford University Press, 2003).
[20] D. J. Watts and S. H. Strogatz, Nature, 440 (1998).
[21] J. Park and M. E. J. Newman, Phys. Rev. E, 066117 (2004).
[22] D. Garlaschelli and M. I. Loffredo, Phys. Rev. E, 015101 (2008).
[23] G. Cimini, T. Squartini, F. Saracco, D. Garlaschelli, A. Gabrielli, and G. Caldarelli, Nature Reviews Physics, 58 (2019).
[24] A.-L. Barabási and R. Albert, Science, 509 (1999).
[25] R. Albert and A.-L. Barabási, Reviews of Modern Physics, 47 (2002).
[26] P. L. Krapivsky, S. Redner, and F. Leyvraz, Physical Review Letters, 4629 (2000).
[27] G. Jurman, R. Visintainer, and C. Furlanello, in Neural Nets WIRN10: Proceedings of the 20th Italian Workshop on Neural Nets, Vol. 226, edited by N. N. W. B. Apolloni et al. (IOS Press, 2011) pp. 227–234.
[28] G. H. Golub and C. F. van Loan, Matrix Computations, 4th ed. (JHU Press, 2013).
[29] P. Jaccard, Bulletin de la Societe Vaudoise des Sciences Naturelles, 547 (1901).
[30] R. W. Hamming, Bell System Technical Journal, 147 (1950).
[31] X. Gao, B. Xiao, D. Tao, and X. Li, Pattern Analysis and Applications, 113 (2010).
[32] W. Wallis, P. Shoubridge, M. Kraetz, and D. Ray, Pattern Recognition Letters, 701 (2001).
[33] S. Chowdhury and F. Mémoli, arXiv (2017), arXiv:1708.04727.
[34] L. C. Carpi, O. A. Rosso, P. M. Saco, and M. G. Ravetti, Physics Letters A, 801 (2011).
[35] From our preliminary analyses, the particular choice of metric can dramatically change the distance values, though we do not report this here. For an extensive description of distance metrics in general, see [49, 50].
[36] M. Meilă, Journal of Multivariate Analysis, 873 (2007).
[37] G. Jurman, R. Visintainer, M. Filosi, S. Riccadonna, and C. Furlanello, Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, 1 (2015).
[38] A. Mellor and A. Grusovin, Physical Review E, 052309 (2019).
[39] M. van Steen, Graph Theory and Complex Networks. An Introduction (Maarten van Steen, 2010).
[40] M. De Domenico and J. Biamonte, Physical Review X, 34 (2016).
[41] D. Chen, D. D. Shi, M. Qin, S. M. Xu, and G. J. Pan, Physical Review E, 1 (2018).
[42] D. K. Hammond, Y. Gur, and C. R. Johnson, 2013 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013 - Proceedings, 419 (2013).
[43] M. Ipsen and A. S. Mikhailov, Physical Review E, 4 (2002).
[44] M. Molloy and B. Reed, Random Structures & Algorithms, 161 (1995).
[45] Note that the two distance measures based on the non-backtracking matrix (NBD & DNB) are undefined in graphs without a 2-core, restricting their range in Figure 2.
[46] C. I. Del Genio, T. Gross, and K. E. Bassler, Physical Review Letters, 1 (2011).
[47] M. Newman, Contemporary Physics, 323 (2005).
[48] P. L. Krapivsky, S. Redner, and F. Leyvraz, Physical Review Letters, 4629 (2000).
[49] M. M. Deza and E. Deza, in Encyclopedia of Distances (Springer, 2009) pp. 1–583.
[50] F. Emmert-Streib, M. Dehmer, and Y. Shi, Information Sciences, 180 (2016).
[51] S. McCabe, L. Torres, T. LaRock, S. Haque, C.-H. Yang, H. Hartle, and B. Klein, "netrd: Network reconstruction and graph distance measures in python," (2019).
[52] V. Y. Pan and Z. Q. Chen, in Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, STOC '99 (Association for Computing Machinery, New York, NY, USA, 1999) p. 507–516.
[53] G. T. Cantwell and M. E. J. Newman, Proceedings of the National Academy of Sciences, 23398 (2019).
[54] J. Lin, IEEE Transactions on Information Theory, 145 (1991).
[55] J. P. Bagrow, E. M. Bollt, J. D. Skufca, and D. ben-Avraham, EPL (Europhysics Letters), 68004 (2008).
[56] W. A. Sutherland, Introduction to Metric and Topological Spaces (Oxford University Press, 2009).
[57] L. Bai, L. Rossi, A. Torsello, and E. R. Hancock, Pattern Recognition, 344 (2015).
[58] L. Rossi, A. Torsello, E. R. Hancock, and R. C. Wilson, Physical Review E, 032806 (2013).
[59] L. Rossi, A. Torsello, and E. R. Hancock, Physical Review E, 022815 (2015).
[60] C. Moler and C. Van Loan, SIAM Review, 3 (2003).
[61] P. Wills and F. G. Meyer, PLoS ONE, 1 (2020).
[62] P. Bonacich, American Journal of Sociology, 1170 (1987).
[63] K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos, in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11 (Association for Computing Machinery, New York, NY, USA, 2011) p. 663–671.

Supplemental Information A: Within-ensemble graph distance as network size increases
In Figures 1, 2, 3, 4, and 5, we plot the within-ensemble graph distances of networks with a fixed size. However, one important behavior of graph distance measures is how they change as networks increase in size. As an example, the Jensen-Shannon divergence between the degree distributions (
DJS) of two ER graphs will decrease as n → ∞, since the empirical degree distributions get closer and closer to a binomial distribution. On the other hand, for graph distances that are explicitly accompanied by a size-normalizing term (e.g., HAM), we would expect that the mean within-ensemble graph distance does not change as network size increases. In Figure 6, we show how the within-ensemble graph distance changes as n increases, both for a fixed density in G(n, p) as well as for a fixed average degree in G(n, ⟨k⟩).

Supplemental Information B: Descriptions of graph distance measures
Throughout the appendix, we assume graphs G and G′ are undirected and unweighted, so that the adjacency matrices are binary and symmetric. We first consider several projections for distances whose description is the full adjacency sequence or matrix, followed by projections involving statistical and ad hoc descriptions. The list of graph distances used in this work is {JAC, HAM, HIM, FRO, POD, DJS, POR, QJS, CSE, GDD, REP, LSD, LGJ, LLE, IPM, NBD, DNB, DMD, DCN, NES}.
1. Jaccard Distance
The Jaccard measure is computed using the adjacency matrix ψ_G = A ∈ {0, 1}^(n×n). For two vertex-labeled graphs G and G′,

D_JAC(G, G′) = d_JAC(A, A′) = 1 − |S| / |T|,   (B1)

where S_ij = A_ij A′_ij represents the intersection of edge sets between graphs G and G′, while T_ij = S_ij + (1 − A′_ij) A_ij + (1 − A_ij) A′_ij represents the union of edge sets between graphs. Here, |S| is the sum over the S_ij, and similarly for |T|. The computational complexity of the Jaccard distance is O(|E| + |E′|) when using unordered sets to obtain the union and intersection sets and their cardinalities; this is what is done in the netrd package [51]. Since two nearly empty graphs likely have almost no edges in common, the ratio |S| / |T| will be nearly zero for p close to 0, so that d_JAC approaches 1 at low p.
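Eq. (B1) reduces to a set operation on the two edge sets. A minimal sketch (representing each undirected edge as a frozenset; this toy representation is an assumption, not the netrd data structure):

```python
def jaccard_distance(edges1, edges2):
    """D_JAC = 1 - |intersection| / |union| of the two edge sets, Eq. (B1)."""
    union = edges1 | edges2
    if not union:          # two empty graphs: define the distance as 0
        return 0.0
    return 1.0 - len(edges1 & edges2) / len(union)

g1 = {frozenset(("a", "b")), frozenset(("b", "c"))}
g2 = {frozenset(("b", "c")), frozenset(("c", "d"))}
# shared edge {b, c}: intersection size 1, union size 3, so D_JAC = 2/3
```

Using Python sets makes the O(|E| + |E′|) complexity noted above explicit: both the intersection and union are built in a single pass over the two edge lists.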
2. Hamming Distance
Similarly, the Hamming measure may also be computed using the adjacency matrix A ∈ {0, 1}^(n×n). For two vertex-labeled graphs G and G′, the Hamming distance counts the number of elementwise differences between ψ_G = A and ψ_G′ = A′:

D_HAM(G, G′) := (1 / C(n, 2)) Σ_{1 ≤ i < j ≤ n} |A_ij − A′_ij|.   (B2)

3. Frobenius

The Frobenius distance d_FRO is simply the entrywise norm of the difference of the adjacency matrices, so that:

D_FRO(G, G′) := sqrt( Σ_{i,j} |A_ij − A′_ij|² ).   (B3)

Note that for binary adjacency matrices, |A_ij − A′_ij|² = |A_ij − A′_ij|, and A_ii = A′_ii = 0 for all i, given that there are no self-loops. Note also that, because the distance operates on the adjacency matrices directly, it implicitly assumes the graphs are vertex-labeled. FRO has the same computational complexity as the Hamming distance due to their similarity: it is O(n²) if one compares all entries, as is done in the netrd package, but it could be improved to O(|E| + |E′|).

4. Polynomial Dissimilarity

The polynomial dissimilarity, POD, between two unweighted, vertex-labeled graphs is based on the eigenvalue decompositions of the adjacency matrices of the two graphs, G and G′ [5]. To compute the polynomial dissimilarity between two graphs, first decompose A as Q_A Λ_A Q_A^T, where Q_A is an orthogonal matrix and Λ_A is the diagonal matrix of eigenvalues. Second, construct matrices P(A) and P(A′) for each graph, where P(A) = Q_A W_A Q_A^T and W_A = Λ_A + n^(−α) Λ_A² + ... + n^(−α(K−1)) Λ_A^K. The polynomial dissimilarity, then, is calculated as the (normalized) Frobenius norm of the difference between P(A) and P(A′):

D_POD(G, G′) = (1/n²) ||P(A) − P(A′)||.   (B4)
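Working from edge sets, Eqs. (B2) and (B3) become one symmetric-difference count. A minimal sketch (the edge-set representation is an illustrative assumption; note that each differing undirected edge contributes two differing entries of the symmetric adjacency matrices, hence the factor of 2 under the square root):

```python
import math

def hamming_distance(edges1, edges2, n):
    """D_HAM: fraction of node pairs whose adjacency differs, Eq. (B2)."""
    return len(edges1 ^ edges2) / math.comb(n, 2)

def frobenius_distance(edges1, edges2):
    """D_FRO over full symmetric matrices: sqrt(2 * |symmetric difference|),
    valid for binary adjacency matrices, Eq. (B3)."""
    return math.sqrt(2 * len(edges1 ^ edges2))

e1 = {frozenset((0, 1)), frozenset((1, 2))}
e2 = {frozenset((1, 2)), frozenset((2, 3))}
# symmetric difference has 2 edges: D_HAM = 2/6 on n = 4 nodes, D_FRO = 2
```

This edge-set formulation is the O(|E| + |E′|) variant mentioned above, as opposed to the O(n²) all-entries comparison.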
Within-ensemble graph distances: G(n, p = 0.1) and G(n, ⟨k⟩ = 6), varying n
FIG. 6. Mean and standard deviations of the within-ensemble distances for G(n, p) and G(n, ⟨k⟩) as n increases. Here, we generate pairs of ER networks with either a fixed density, p, or a fixed average degree, ⟨k⟩, as we increase the network size, n. In each subplot, the mean within-ensemble graph distance is plotted as a solid line with a shaded region around it for the standard error (⟨D⟩ ± σ_⟨D⟩; note that in most subplots, the standard error is too small to see), while the dashed lines are the standard deviations.

In this work, we consider a default value of K = 5 in order to accommodate potentially informative higher-order interactions in each of the graphs. Here, α = 1 by default, though smaller values of α are also commonly considered in [5]. The computational complexity of POD is O(n³) in practice, which arises from it requiring two n × n matrix eigendecompositions, each of which is O(n³) for general matrices with a method based on the QR algorithm [52], as used in the netrd package. Note that recent techniques based on message passing can give fast and exact results for sparse networks with short loops in O(n log n) [53] and could be used to reduce the computational complexity of spectral graph distances.

5.
Degree Distribution Jensen-Shannon Divergence

A simple graph distance measure is the Jensen-Shannon divergence [54] between the empirical degree distributions of two graphs. In this case, for an n-node graph G, the descriptor ψ_G is the empirical degree distribution encoded in the set of numbers {p_k(G)}_{k≥0} := p, given by p_k(G) := n_k(G)/n, where n_k(G) = Σ_{i=1}^n 1{k_i = k}, with 1{·} being the indicator function and k_i = Σ_{j=1}^n A_ij being the degree of node i in terms of the adjacency matrix A of G. The Jensen-Shannon divergence between two such distributions [34] is the degree Jensen-Shannon divergence, or DJS distance, between the graphs:

D_DJS(G, G′) = H[p⁺] − (1/2)(H[p] + H[p′]),   (B5)

where p⁺ = {(p_k + p′_k)/2}_{k≥0} is a mixture distribution and H[p] = −Σ_k p_k ln p_k is the Shannon entropy. The computational complexity of DJS is O(n), which arises from computing two degree distributions (which is O(n)) and then comparing them (which is O(k⁺), with k⁺ < n being the maximum degree in either network).

6. Portrait Divergence

The portrait divergence, POR, compares, using the JSD, a description of each of the two graphs called their network portrait [55]. The network portrait is a matrix B with elements B_{l,k} such that

B_{l,k} ≡ the number of nodes with k nodes at distance l.   (B6)

Alternatively stated, B_{l,k} is the kth entry of the empirical histogram of lth-neighborhood sizes. These elements are computed using a breadth-first search or a similar method. The portrait divergence of G and G′ is the JSD of probability distributions associated with their portraits, B and B′ [4]. Note that each row in B can be
Note that each row of $B$ can be interpreted as the probability distribution that there will be $k$ nodes at a distance $l$ away from a randomly chosen node:

$$P(k \,|\, l) = \frac{B_{l,k}}{N}, \qquad {\rm (B7)}$$

which can be combined with the number of paths of length $l$, normalized so that the resulting distribution is the probability that two randomly selected nodes are at a distance $l$ from each other:

$$P(l) = \frac{\sum_{k=0}^{n} k B_{l,k}}{\sum_c n_c^2}, \qquad {\rm (B8)}$$

where $n_c$ is the number of nodes within a connected component, $c$. The joint probability of choosing a pair of nodes at a distance $l$ from each other, such that one of the nodes has $k$ nodes in total at distance $l$, is then:

$$P(k, l) = P(k \,|\, l)\, P(l) = \left( \frac{\sum_{k'=0}^{n} k' B_{l,k'}}{\sum_c n_c^2} \right) \frac{B_{l,k}}{N}. \qquad {\rm (B9)}$$

There is now a $P_B(k,l)$ and a $P_{B'}(k,l)$ for each portrait, $B$ and $B'$, as well as a "mixed" distribution for both, specified as $P^* = \frac{1}{2}\left(P_B(k,l) + P_{B'}(k,l)\right)$. The portrait divergence between $G$ and $G'$ is the JSD between their portraits:

$$D_{\rm POR}(G, G') = {\rm JSD}\left(P_B(k,l), P_{B'}(k,l)\right) = \frac{1}{2}\left( D_{\rm KL}(P_B(k,l) \,\|\, P^*) + D_{\rm KL}(P_{B'}(k,l) \,\|\, P^*) \right), \qquad {\rm (B10)}$$

where $D_{\rm KL}$ is the Kullback-Leibler divergence. Note that $\sqrt{D_{\rm POR}}$ satisfies the properties of a metric (it satisfies the triangle inequality and is positive-definite and symmetric) [56].

The computational complexity of POR is $O(n(n + |E|)\log n)$, which comes from the requirement of computing shortest paths between all pairs of nodes in the network. In our implementation, computing the shortest paths between a source and all other nodes is done with Dijkstra's algorithm with a binary heap, which takes $O((n + |E|)\log n)$ operations in the worst case. Constructing the portrait and calculating the JSD between the associated distributions has a lower computational complexity.

7.
Quantum Spectral Jensen-Shannon Divergence

This method compares graphs via the Jensen-Shannon divergence (JSD) between probability distributions associated with the density matrices of two graphs $G$ and $G'$ [6, 57–59], denoted $\rho$ and $\rho'$ respectively, defined by

$$\rho = \frac{e^{-\beta L(G)}}{Z}, \qquad {\rm (B11)}$$

where $L(G)$ is the Laplacian matrix of graph $G$, and the constant $Z \equiv \sum_{i=1}^{n} e^{-\beta \lambda_i(L)}$, with $\lambda_i(L)$ being the $i$th eigenvalue of $L$. The description-distance pair $(\rho, {\rm JSD})$ yields the "Quantum Spectral Jensen-Shannon Divergence" (QJS) [40], which compares two graphs by the entropy of the eigenvalue spectra of their density matrices $\rho$. Treating the spectrum $\{\lambda_i\}_{i=1}^{n}$ as a normalized probability distribution, the spectral Rényi entropy of order $q$ is given by

$$S_q = \frac{1}{1 - q} \log \sum_{i=1}^{n} \lambda_i(\rho)^q, \qquad {\rm (B12)}$$

which, as $q \to 1$, reduces to the Von Neumann entropy:

$$S_1 = -\sum_{i=1}^{n} \lambda_i(\rho) \log \lambda_i(\rho). \qquad {\rm (B13)}$$

The QJS distance between two graphs is defined to be:

$$D_{\rm QJS}(G, G') = S_q\!\left( \frac{\rho + \rho'}{2} \right) - \frac{1}{2}\left[ S_q(\rho) + S_q(\rho') \right]. \qquad {\rm (B14)}$$

For the default values of $\beta$ and $q$, we follow the explanations in [40]. QJS requires computation of the Laplacian matrix spectra of the two graphs, and a comparison thereof, which yields a computational complexity of $O(n^3)$ (see Appendix B 4).

8. Communicability Sequence Entropy Divergence

The communicability sequence entropy divergence CSE between two graphs, $G$ and $G'$, is the JSD between the communicability distributions of $G$ and $G'$. To obtain a communicability distribution, we first construct the communicability matrix, an $n \times n$ matrix whose entries give the communicability between pairs of nodes $v_i$ and $v_j$:

$$C = e^{A} = \sum_{k=0}^{\infty} \frac{1}{k!} A^k. \qquad {\rm (B15)}$$

In other words, the communicability matrix, $C$, is computed as a matrix exponential of the adjacency matrix.
The elements $C_{ij}$, $i \leq j$, are stored in a vector and normalized to create the communicability sequences, $P$ and $P'$, for each graph. The Shannon entropy of $P$ is $H[P] = -\sum_i P_i \log P_i$, and the communicability sequence entropy divergence is calculated as the JSD between $P$ and $P'$:

$$D_{\rm CSE}(G, G') = {\rm JSD}(P, P') = H[M] - \frac{1}{2}\left( H[P] + H[P'] \right), \qquad {\rm (B16)}$$

where $M = (P + P')/2$ is the mixture of $P$ and $P'$.

The computational complexity of CSE is $O(n^3)$, with the computationally intensive step being the computation of the matrix exponential of both adjacency matrices $A$ and $A'$. Our implementation uses Padé approximants through the SciPy package to perform this step, which takes $O(n^3)$ operations to get an approximation [60].

9. Graph Diffusion Distance

The graph diffusion distance [42] GDD between two graphs, $G$ and $G'$, is a distance measure based on the notion of flow within each graph. As such, this measure uses the unnormalized Laplacian matrices of both graphs, $L$ and $L'$, and uses them to construct time-varying Laplacian exponential diffusion kernels, $e^{-tL}$ and $e^{-tL'}$, by effectively simulating a diffusion process for $t$ timesteps (as a default, $t = 1000$), creating a column vector of node-level activity at each timestep.

The distance $D_{\rm GDD}(G, G')$ is defined as the Frobenius norm between the two diffusion kernels at the timestep $t^*$ where the two kernels are maximally different:

$$D_{\rm GDD}(G, G') = \sqrt{ \left\| e^{-t^* L} - e^{-t^* L'} \right\|_F^2 }. \qquad {\rm (B17)}$$

The computational complexity is $O(n^3)$ since a spectral decomposition of the Laplacian matrices is used (see Appendix B 4).

10. Resistance Perturbation Distance

The resistance perturbation distance RES between two vertex-labeled graphs, $G$ and $G'$, is the $p$-norm of the difference between the two graphs' resistance matrices [8].
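A minimal sketch of this distance (the effective-resistance matrix from the Laplacian pseudoinverse, then an elementwise $p$-norm of the difference; the helper names are ours):

```python
import numpy as np
import networkx as nx

def resistance_matrix(G):
    """Effective-resistance matrix via the Moore-Penrose
    pseudoinverse of the graph Laplacian."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    Ld = np.linalg.pinv(L)
    d = np.diag(Ld)
    return d[:, None] + d[None, :] - 2 * Ld  # R_ij = Ld_ii + Ld_jj - 2 Ld_ij

def rep_distance(G1, G2, p=2):
    R1, R2 = resistance_matrix(G1), resistance_matrix(G2)
    return (np.abs(R1 - R2) ** p).sum() ** (1.0 / p)

C6 = nx.cycle_graph(6)
# adjacent nodes on a 6-cycle: 1 ohm in parallel with 5 ohms -> 5/6
assert abs(resistance_matrix(C6)[0, 1] - 5 / 6) < 1e-9
assert rep_distance(C6, C6) == 0
```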
The resistance perturbation distance changes if either graph is relabeled (it is not invariant under graph isomorphism), so node labels should be consistent between the two graphs being compared. The distance is not normalized.

The resistance matrix of a graph $G$ is calculated as

$$R = {\rm diag}(L^{\dagger})\, \mathbf{1}^{T} + \mathbf{1}\, {\rm diag}(L^{\dagger})^{T} - 2 L^{\dagger}, \qquad {\rm (B18)}$$

where $L^{\dagger}$ is the Moore-Penrose pseudoinverse of the Laplacian of $G$ and $\mathbf{1}$ is the all-ones column vector.

The resistance perturbation graph distance of $G$ and $G'$ is calculated as the $p$-norm (the $p$th root of the sum of the $p$th powers of elements) of the difference in their resistance matrices, $R$ and $R'$:

$$D_{\rm REP}(G, G') = \left( \sum_{i,j \in V} |R_{ij} - R'_{ij}|^p \right)^{1/p}. \qquad {\rm (B19)}$$

The default value chosen in experiments is $p = 2$. The computational complexity of RES is $O(n^3)$ for our implementation, since we need to compute the Moore-Penrose pseudoinverse of the Laplacian matrix of both graphs, which is $O(n^3)$. Note that low-rank approximations can be used to reduce the computational complexity [8].

11. NetLSD

The NetLSD distance LSD between two graphs, $G$ and $G'$, is the Frobenius norm between the heat trace signatures of the normalized Laplacians $L$ and $L'$ [3]. The heat kernel matrix is calculated as

$$H_t = e^{-tL} = \sum_{j=1}^{n} e^{-t\lambda_j} \phi_j \phi_j^{T}. \qquad {\rm (B20)}$$

The $ij$th element of $H_t$ contains the amount of heat transferred from node $v_i$ to node $v_j$ at time $t$ (as a default, 256 log-spaced values of $t$ between $10^{-2}$ and $10^{2}$). From the heat kernel matrix $H_t$, the heat trace, $h_t$, is defined as

$$h_t = {\rm Tr}(H_t) = \sum_{j=1}^{n} e^{-t\lambda_j}. \qquad {\rm (B21)}$$

The heat trace signature of graph $G$ is the set $\{h_t\}_{t \geq 0}$. Upon computing heat trace signatures of both $G$ and $G'$, they are compared via a Frobenius norm:

$$D_{\rm LSD}(G, G') = d_{\rm FRO}\left( \{h_t\}_{t \geq 0}, \{h'_t\}_{t \geq 0} \right). \qquad {\rm (B22)}$$

The computational complexity of LSD is $O(n^3)$ due to the spectral decomposition of the Laplacian matrices of both graphs (see Appendix B 4).

12.
Laplacian Spectrum Distances

Many distances between two graphs, $G$ and $G'$, use a direct comparison of their Laplacian spectra. For all the methods below, we use the eigenvalues $\{\lambda_1 = 0 \leq \lambda_2 \leq \cdots \leq \lambda_n\}$ of the normalized Laplacian matrices $L$ and $L'$. To perform the comparison, a subset of the whole spectrum can be used, e.g., the $k$ smallest [61] or largest [6, 27] in magnitude. Unless specified, we used all eigenvalues for comparison ($k = n$).

These distances compare the continuous spectra $\rho(\lambda)$ and $\rho'(\lambda)$ associated with the graphs $G$ and $G'$. A continuous spectrum is obtained by the convolution of the discrete spectrum $\sum_i \delta(\lambda - \lambda_i)$ with a kernel $g(\lambda, \lambda^*)$:

$$\rho(\lambda) = \frac{1}{Z} \sum_{i=1}^{n} \int g(\lambda, \lambda^*)\, \delta(\lambda^* - \lambda_i)\, d\lambda^*, \qquad {\rm (B23)}$$

where $Z$ is a normalization factor. Different types of distribution can be used for the kernel, for instance a Lorentzian distribution [43],

$$g(\lambda, \lambda^*) = \frac{\gamma}{\pi\left[ \gamma^2 + (\lambda - \lambda^*)^2 \right]}, \qquad {\rm (B24)}$$

or a normal distribution,

$$g(\lambda, \lambda^*) = \frac{\exp\left[ -(\lambda - \lambda^*)^2 / 2\sigma^2 \right]}{\sqrt{2\pi}\, \sigma}. \qquad {\rm (B25)}$$

Different types of metrics can then be used to compare the spectra, such as the Euclidean metric,

$$d(\rho, \rho') = \sqrt{ \int \left[ \rho(\lambda) - \rho'(\lambda) \right]^2 d\lambda }, \qquad {\rm (B26)}$$

or the square root of the JSD, $d(\rho, \rho') = \sqrt{{\rm JSD}(\rho, \rho')}$, written as

$$ {\rm JSD}(\rho, \rho') = \frac{1}{2} D_{\rm KL}(\rho \,\|\, \bar{\rho}) + \frac{1}{2} D_{\rm KL}(\rho' \,\|\, \bar{\rho}), \qquad {\rm (B27)}$$

where $\bar{\rho} = (\rho + \rho')/2$. Various combinations of kernels and metrics yield the following distinct distance measures:

• Laplacian spectrum: Gaussian kernel, JSD distance LGJ
• Laplacian spectrum: Lorentzian kernel, Euclidean distance LLE

For both kernels, we use the same half width at half maximum (for the Gaussian kernel, the corresponding standard deviation is ${\rm HWHM}/\sqrt{2 \ln 2}$).

While we only focus on the two specific distances above, we note again that there is a world of possible combinations of descriptor-distance pairs to use for comparing graphs.
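On a discrete grid, the Gaussian-kernel smoothing of Eq. (B23) and the Euclidean comparison of Eq. (B26) can be sketched as follows (the grid resolution and $\sigma$ here are illustrative choices of ours):

```python
import numpy as np
import networkx as nx

def smoothed_spectrum(G, grid, sigma=0.05):
    """Normalized-Laplacian eigenvalues convolved with a Gaussian
    kernel, then normalized on the grid (the factor Z of Eq. B23)."""
    L = nx.normalized_laplacian_matrix(G).toarray()
    lam = np.linalg.eigvalsh(L)
    rho = np.exp(-(grid[:, None] - lam[None, :]) ** 2 / (2 * sigma**2)).sum(axis=1)
    dx = grid[1] - grid[0]
    return rho / (rho.sum() * dx)

grid = np.linspace(0, 2, 2001)  # normalized-Laplacian spectra lie in [0, 2]
r1 = smoothed_spectrum(nx.cycle_graph(20), grid)
r2 = smoothed_spectrum(nx.star_graph(19), grid)
dx = grid[1] - grid[0]
d = np.sqrt(((r1 - r2) ** 2).sum() * dx)  # Euclidean metric, Eq. (B26)
assert d > 0
```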
We selected the two above because their within-ensemble graph distance curves differed the most (e.g., as opposed to including the Gaussian kernel / Euclidean distance or the Lorentzian kernel / JSD). The computational complexity of this suite of graph distances is $O(n^3)$ due to the spectral decomposition of the Laplacian matrices of both graphs (see Appendix B 4).

13. Ipsen-Mikhailov

The Ipsen-Mikhailov distance [43] IPM between two graphs, $G$ and $G'$, is a spectral comparison of their Laplacian matrices, $L$ and $L'$. This approach treats the nodes of $G$ and $G'$ as molecules with elastic connections between them, which casts the distance measurement between $G$ and $G'$ as the solution to a set of differential equations for the vibrational frequencies of the nodes. The vibrational frequency $\omega_i$ of each mode is related to the eigenvalue $\lambda_i$ of $L$ by $\lambda_i = \omega_i^2$.

With this, one can construct a spectral density for each graph as a sum of Lorentz distributions:

$$\rho(\omega) = \frac{1}{Z} \sum_{i=1}^{n-1} \frac{\gamma}{(\omega - \omega_i)^2 + \gamma^2}, \qquad {\rm (B28)}$$

where $Z$ is a normalization term, and $\gamma$ is a fixed scaling term that controls the width of the Lorentz distributions (as in [43], we use $\gamma = 0.08$ as a default). The distance between $G$ and $G'$ is then calculated as

$$D_{\rm IPM}(G, G') = d(\rho, \rho') = \sqrt{ \int_0^{\infty} \left[ \rho(\omega) - \rho'(\omega) \right]^2 d\omega }. \qquad {\rm (B29)}$$

The computational complexity of IPM is $O(n^3)$ due to the spectral decomposition of the Laplacian matrices of both graphs (see Appendix B 4).

14. Hamming-Ipsen-Mikhailov

The Hamming-Ipsen-Mikhailov distance HIM between two vertex-labeled graphs, $G$ and $G'$, is expressed as a weighted combination of the IPM (Section B 13) distance and a normalized HAM (Section B 2) distance [37]. The parameter $\gamma$ for the IPM is fixed such that $D_{\rm IPM}(E_n, F_n) = 1$, where $E_n$ and $F_n$ are the empty and complete graphs on $n$ nodes.
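The IPM density of Eq. (B28) and the comparison of Eq. (B29) can likewise be evaluated on a grid (the frequency cutoff and grid are our illustrative choices):

```python
import numpy as np
import networkx as nx

def ipm_density(G, omega, gamma=0.08):
    """Sum of Lorentz distributions over vibrational frequencies
    omega_i = sqrt(lambda_i), skipping the zero mode lambda_1 = 0."""
    lam = np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray().astype(float))
    w = np.sqrt(np.clip(lam[1:], 0, None))  # guard tiny negative round-off
    rho = (gamma / ((omega[:, None] - w[None, :]) ** 2 + gamma**2)).sum(axis=1)
    dx = omega[1] - omega[0]
    return rho / (rho.sum() * dx)  # normalization term Z

omega = np.linspace(0, 10, 5001)
r1 = ipm_density(nx.cycle_graph(10), omega)
r2 = ipm_density(nx.complete_graph(10), omega)
dx = omega[1] - omega[0]
d_ipm = np.sqrt(((r1 - r2) ** 2).sum() * dx)
assert d_ipm > 0
```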
The HIM distance is defined as follows:

$$D_{\rm HIM}(G, G') = \frac{1}{\sqrt{1 + \xi}} \sqrt{ D_{\rm IPM}(G, G')^2 + \xi\, D_{\rm HAM}(G, G')^2 }. \qquad {\rm (B30)}$$

We default to $\xi = 1$, as in [37]. The computational complexity of HIM is $O(n^3)$, with the computationally intensive part being the computation of the IPM distance.

15. Non-backtracking Spectral Distance

The non-backtracking spectral distance NBD between two graphs, $G$ and $G'$, is a method that compares the eigenvalues of the non-backtracking matrix of each graph, $B$ and $B'$ [7]. This distance is based on the length spectrum and the set of non-backtracking cycles of a graph (i.e., closed walks that never immediately return to the node they just left) and is calculated as the earth mover's distance (EMD) between the eigenvalues of $B$ and $B'$. The eigenvalues of $B$ and $B'$ are expressed as $\lambda_k = a_k + i b_k$ and $\lambda'_k = a'_k + i b'_k$, respectively, and ${\rm EMD}(\lambda_B, \lambda_{B'})$ is the solution to an optimization problem finding the minimum amount of work required to move the coordinates of $\lambda_B$ to the positions of $\lambda_{B'}$:

$$D_{\rm NBD}(G, G') = {\rm EMD}(\lambda_B, \lambda_{B'}). \qquad {\rm (B31)}$$

Note that the Ihara determinant formula can be used to obtain the non-backtracking eigenvalues different from $\pm 1$ using a $2n \times 2n$ matrix [7]. If one uses the whole non-backtracking spectrum to compute the distance, the computational complexity is $O(n^3)$ [7]. Instead of using the whole spectrum of the non-backtracking matrices, for graph $G$ we compute only the $r$ eigenvalues larger in magnitude than $\sqrt{\lambda_1}$, where $\lambda_1$ is the largest eigenvalue of $B$ [7]. The computational complexity of our implementation of NBD is $O(\max(r, r')\, n^2)$ for general graphs, where $r$ and $r'$ are the number of eigenvalues larger in magnitude than $\sqrt{\lambda_1}$ and $\sqrt{\lambda'_1}$, respectively, for graphs $G$ and $G'$. To compute these eigenvalues, an implicitly restarted Arnoldi method is used.
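For intuition, the non-backtracking matrix itself can be built explicitly on the $2|E|$ directed edges (a dense sketch for small graphs; practical implementations use the $2n \times 2n$ Ihara form or sparse eigensolvers):

```python
import numpy as np
import networkx as nx

def nonbacktracking_matrix(G):
    """Hashimoto matrix on directed edges:
    B[(u->v), (v->w)] = 1 whenever w != u."""
    edges = [(u, v) for u, v in G.edges()] + [(v, u) for u, v in G.edges()]
    index = {e: i for i, e in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for u, v in edges:
        for w in G.neighbors(v):
            if w != u:  # forbid immediate backtracking
                B[index[(u, v)], index[(v, w)]] = 1
    return B

B = nonbacktracking_matrix(nx.karate_club_graph())
eigs = np.linalg.eigvals(B)
lead = max(eigs, key=abs)
# for a connected graph with cycles, the leading eigenvalue is real and > 1
assert abs(lead.imag) < 1e-8 and lead.real > 1
```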
For sparse graphs the computation is even more efficient.

16. Distributional Non-backtracking Distance

Similar to the NBD distance [38], the DNB distance leverages spectral properties of the non-backtracking matrices, $B$ and $B'$, of two graphs, $G$ and $G'$, in order to calculate their dissimilarity. Unlike the NBD distance, the DNB involves a comparison of the (re-scaled) distributions of eigenvalues of $B$ and $B'$, which are compared using either the Euclidean distance or the Chebyshev distance (here, we use the Euclidean distance). We also use the whole spectrum for this distance. Therefore, the computational complexity of DNB is $O(n^3)$ due to the spectral decomposition of the two $2n \times 2n$ matrices (see Appendix B 15).

17. D-measure Distance

The D-measure distance [9] DMD between two graphs, $G$ and $G'$, involves a combination of three properties of the graphs being compared: the network node dispersion ($NND$), the node distance distribution ($\mu$), and the $\alpha$-centrality. For a full explanation and justification of each of the components involved in this distance, we refer the reader to the original article [9], but we briefly summarize it below.

In order to compute the $NND$ of a graph, each node $v_i$ is assigned a probability vector, $P_i$, whose elements are the fraction of nodes connected to $v_i$ at each distance $j \leq d$, where $d$ is the diameter of the network. The $NND$ is then defined as

$$NND(G) = \frac{{\rm JSD}\left( P_1, P_2, \ldots, P_n \right)}{\log(d + 1)}, \qquad {\rm (B32)}$$

where ${\rm JSD}\left( P_1, P_2, \ldots, P_n \right)$ is the Jensen-Shannon divergence of each $P_i$ from the whole network's average node-distance distribution at every distance $j$, which we will denote $\mu_j$.
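A simplified reading of Eq. (B32) for connected graphs, using the generalized JSD written as the entropy of the mean distribution minus the mean of the entropies (the helper names are ours, not the reference implementation):

```python
import numpy as np
import networkx as nx

def node_distance_dists(G):
    """P_i[j-1] = fraction of the other nodes at distance j from node i."""
    n = G.number_of_nodes()
    P = np.zeros((n, nx.diameter(G)))
    for i, dists in nx.all_pairs_shortest_path_length(G):
        for j in dists.values():
            if j > 0:
                P[i, j - 1] += 1 / (n - 1)
    return P

def shannon(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def nnd(G):
    P = node_distance_dists(G)
    mu = P.mean(axis=0)  # average node-distance distribution
    jsd = shannon(mu) - np.mean([shannon(row) for row in P])
    return jsd / np.log(nx.diameter(G) + 1)

# every node of a cycle has the same distance profile -> zero dispersion
assert nnd(nx.cycle_graph(8)) < 1e-9
assert nnd(nx.star_graph(7)) > 0
```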
The average of the $\mu_j$ over all distances $j \leq d$ in a graph $G$ we will denote $\mu_G$.

The final step before the calculation of the D-measure distance is to find the $\alpha$-centrality [62] of each network, $G$ and $G'$, as well as the $\alpha$-centrality of the complement of each network, $G^c$ and $G'^c$. The $\alpha$-centralities of the original networks are denoted $P_{\alpha G}$ and $P_{\alpha G'}$, while the $\alpha$-centralities of their complements are $P_{\alpha G^c}$ and $P_{\alpha G'^c}$. Ultimately, the D-measure distance, $D_{\rm DMD}$, between two graphs is as follows:

$$D_{\rm DMD}(G, G') = w_1 \sqrt{\frac{{\rm JSD}(\mu_G, \mu_{G'})}{\log 2}} + w_2 \left| \sqrt{NND(G)} - \sqrt{NND(G')} \right| + \frac{w_3}{2} \left( \sqrt{\frac{{\rm JSD}(P_{\alpha G}, P_{\alpha G'})}{\log 2}} + \sqrt{\frac{{\rm JSD}(P_{\alpha G^c}, P_{\alpha G'^c})}{\log 2}} \right), \qquad {\rm (B33)}$$

where $w_1 + w_2 + w_3$ must equal 1. To calculate the final distance value, we adopt the convention used in [9], namely $w_1 = w_2 = 0.45$ and $w_3 = 0.1$.

According to Ref. [9], the computational complexity of DMD is $O(|E| + n \log n)$. However, one needs to compute the shortest paths between all pairs of nodes, which suggests a more computationally intensive calculation. We instead find a computational complexity of $O(n(n + |E|) \log n)$ with our implementation, which uses Dijkstra's algorithm with a binary heap (see Appendix B 6).

18. DeltaCon

The DeltaCon distance DCN between two graphs, $G$ and $G'$, is the Matusita distance between the affinity matrices, $S$ and $S'$, of $G$ and $G'$. The affinity matrices are constructed using Fast Belief Propagation, which is expressed as

$$\left[ I + \epsilon^2 D - \epsilon A \right] \vec{s}_i = \vec{e}_i, \qquad {\rm (B34)}$$

where $I$ is the $n \times n$ identity matrix, $D$ is the diagonal degree matrix, $A$ is the adjacency matrix, $\vec{e}_i$ is a vector indicating the initial node $v_i$ from which a random walk process is initiated, and $\vec{s}_i$ is a column vector consisting of the $s_{ij}$, where $s_{ij}$ is the affinity of node $v_j$ with respect to node $v_i$.
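Solving Eq. (B34) for every $\vec{e}_i$ at once amounts to one matrix inversion, after which the Matusita comparison is elementwise (a dense sketch; our $\epsilon$ is an illustrative choice, whereas DeltaCon typically sets it from the maximum degree):

```python
import numpy as np
import networkx as nx

def deltacon(G1, G2, eps=0.1):
    """Exact DeltaCon sketch: FBP affinity matrices + Matusita distance.
    Assumes both graphs share the same labeled node set."""
    def affinity(G):
        A = nx.to_numpy_array(G)
        D = np.diag(A.sum(axis=1))
        I = np.eye(len(A))
        S = np.linalg.inv(I + eps**2 * D - eps * A)
        return np.clip(S, 0, None)  # guard tiny negative round-off
    S1, S2 = affinity(G1), affinity(G2)
    return np.sqrt(((np.sqrt(S1) - np.sqrt(S2)) ** 2).sum())

G = nx.karate_club_graph()
H = G.copy()
H.remove_edge(0, 1)
assert deltacon(G, G) == 0
assert deltacon(G, H) > 0
```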
The affinity matrices, $S$ and $S'$, are given in closed form by $S = \left[ I + \epsilon^2 D - \epsilon A \right]^{-1}$. The distance between $G$ and $G'$ is then

$$D_{\rm DCN}(G, G') = d(S, S') = \sqrt{ \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \sqrt{s_{ij}} - \sqrt{s'_{ij}} \right)^2 }. \qquad {\rm (B35)}$$

The computational complexity of our implementation of DCN is $O(n^3)$, since we obtain $S$ by direct matrix inversion. However, note that it is possible to improve the algorithm with a power method, or even to reach $O(|E|)$ by approximating the distance [2].

19. NetSimile

NetSimile NES is a method for comparing two graphs, $G$ and $G'$, that is based on statistical features of the two graphs. It is invariant to graph labels and is able to compare graphs of different sizes [1]. It is calculated as the Canberra distance between the feature signature vectors, $p$ and $p'$, of the two graphs. To construct $p$ and $p'$, first a $7 \times n$ feature matrix is built for each graph, with each column, $j$, consisting of the following seven node-level quantities:

1. degree, $k_j = \sum_i A_{ij}$
2. clustering coefficient, $c_j = (A^3)_{jj} / \left( k_j (k_j - 1) \right)$
3. average neighbor degree, $k^{(nn)}_j = \frac{1}{k_j} \sum_i k_i A_{ij}$
4. average clustering coefficient of the nodes in the ego network, $c^{(ego)}_j = \frac{1}{k_j} \sum_i c_i A_{ij}$
5. number of edges within the ego network, $T_j = \sum_{l,m} A_{jl} A_{lm} A_{mj}$
6. number of outgoing edges from the ego network, $O_j = \sum_i A_{ij} k_i - T_j = k_j k^{(nn)}_j - T_j$
7. number of neighbors of the ego network, $nn^{(ego)}_j = \sum_i \mathbb{1}\{\exists\, l \in \mathcal{N}_j : i \sim l,\ i \not\sim j\}$

These features are then summarized into $p$ and $p'$, which are $7 \times 5 = 35$-element signature vectors consisting of the median, mean, standard deviation, skewness, and kurtosis of each feature. NetSimile uses the Canberra distance to arrive at a final scalar distance.
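The aggregation step can be sketched as follows (a reduced example using only three of the seven node-level features, with hypothetical helper names):

```python
import numpy as np
import networkx as nx
from scipy import stats

def signature(G):
    """Median, mean, std, skewness, kurtosis of each node-level feature."""
    deg = np.array([d for _, d in G.degree()], dtype=float)
    clust = np.array(list(nx.clustering(G).values()))
    nbr = np.array(list(nx.average_neighbor_degree(G).values()))
    sig = []
    for f in (deg, clust, nbr):
        sig += [np.median(f), f.mean(), f.std(), stats.skew(f), stats.kurtosis(f)]
    return np.array(sig)

def canberra(p, q):
    mask = (np.abs(p) + np.abs(q)) > 0  # skip 0/0 terms
    return (np.abs(p - q)[mask] / (np.abs(p) + np.abs(q))[mask]).sum()

p1 = signature(nx.karate_club_graph())
p2 = signature(nx.erdos_renyi_graph(34, 0.3, seed=1))
assert canberra(p1, p1) == 0
assert canberra(p1, p2) > 0
```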
$$D_{\rm NES}(G, G') = d(p, p') = \sum_{i=1}^{35} \frac{|p_i - p'_i|}{|p_i| + |p'_i|}. \qquad {\rm (B36)}$$

The computational complexity of NES depends on two parts: feature extraction and feature aggregation. The features are all locally defined, hence their extraction takes $O(qn)$, where $q$ is the average degree of a node found by selecting a random edge and choosing one of its endpoints [63]. Feature aggregation is $O(n \log n)$ [1], hence the overall complexity is $O(qn + n \log n)$.

Supplemental Information C: Analytical derivation of within-ensemble graph distances

1. Jaccard Distance

We can directly calculate $\langle d_{\rm JAC}(A, A') \rangle_{G(n,p)}$, the expected Jaccard distance between two graphs sampled from $G(n,p)$. Both $|S|$ and $|T|$ are distributed binomially, as they are sums of $\binom{n}{2}$ Bernoulli variables arising with probabilities $p^2$ and $2p(1-p) + p^2$, respectively. Since binomial distributions are sharply peaked (for large values of $n$), we can approximate the expected value of the ratio $|S|/|T|$ by the ratio of the expected values of $|S|$ and $|T|$. Thus we have

$$\langle d_{\rm JAC}(A, A') \rangle_{G(n,p)} = 1 - \left\langle \frac{|S|}{|T|} \right\rangle \approx 1 - \frac{\langle |S| \rangle}{\langle |T| \rangle} = 1 - \frac{p^2 \binom{n}{2}}{\left( 2p(1-p) + p^2 \right) \binom{n}{2}} = 1 - \frac{p}{2 - p}, \qquad {\rm (C1)}$$

which agrees precisely with simulations. Note that in the limit $p \approx 1$ we have, by Taylor expansion,

$$\langle d_{\rm JAC}(A, A') \rangle_{G(n,p \approx 1)} = \left. \left( 1 - \frac{p}{2-p} \right) \right|_{p=1} + (p - 1) \left. \frac{d}{dp} \left( 1 - \frac{p}{2-p} \right) \right|_{p=1} + \ldots = 0 + (p - 1) \left. \left( \frac{-2}{(2-p)^2} \right) \right|_{p=1} + \ldots$$
$$= 2(1 - p) + \ldots \qquad {\rm (C2)}$$

Similarly, as we show in SI C 2, the Hamming distance ($d_{\rm HAM}$) behaves in this region as

$$\langle d_{\rm HAM}(A, A') \rangle_{G(n,p \approx 1)} = \left. 2p(1-p) \right|_{p=1} + (p - 1) \left. (2 - 4p) \right|_{p=1} + \ldots = 0 + (p - 1)(-2) + \ldots = 2(1 - p) + \ldots, \qquad {\rm (C3)}$$

which is exactly the same. Indeed, we observe this equivalence in Figure 1 in the region $p \approx 1$. This finding makes intuitive sense because in the region $p \approx 1$, the "union graph," $T$, is likely an essentially complete graph, and $d_{\rm JAC}$ simply measures the fraction of edges/non-edges that are not in agreement between $G$ and $G'$, which is precisely what $d_{\rm HAM}$ does for all $p$ given an adjacency description.

2. Hamming Distance

The Hamming measure is simply the fraction of mismatched entries between $A$ and $A'$. Due to this simplicity, we again can analytically predict the mean within-ensemble graph distance for graphs sampled from $G(n,p)$:

$$\langle d_{\rm HAM}(A, A') \rangle_{G(n,p)} = \frac{1}{\binom{n}{2}} \sum_{1 \leq i < j \leq n} \left\langle |A_{ij} - A'_{ij}| \right\rangle = 2p(1 - p). \qquad {\rm (C4)}$$

3. Frobenius

As a back-of-the-envelope calculation, note that the sum of elementwise differences is binomially distributed with mean $\langle \sum_{i,j} |A_{ij} - A'_{ij}| \rangle = 2n(n-1)p(1-p)$. Using sharp-peakedness, we can thus state approximately

$$\langle d_{\rm FRO}(A, A') \rangle_{G(n,p)} = \left\langle \sqrt{ \sum_{i,j} |A_{ij} - A'_{ij}|^2 } \right\rangle \approx \sqrt{ \left\langle \sum_{i,j} |A_{ij} - A'_{ij}|^2 \right\rangle } \simeq n \sqrt{2p(1-p)}, \qquad {\rm (C5)}$$

which exhibits a maximum at $p = 1/2$ for any given $n$, but grows linearly with $n$.
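These closed-form expectations are easy to verify numerically; a quick Monte Carlo sketch (sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 60, 0.3, 200
m = n * (n - 1) // 2  # number of node pairs
jac, ham = [], []
for _ in range(trials):
    # upper-triangular edge indicators of two independent G(n, p) draws
    a = rng.random(m) < p
    b = rng.random(m) < p
    jac.append(1 - (a & b).sum() / (a | b).sum())  # Jaccard distance
    ham.append((a != b).mean())                    # Hamming distance

assert abs(np.mean(jac) - (1 - p / (2 - p))) < 0.01  # Eq. (C1)
assert abs(np.mean(ham) - 2 * p * (1 - p)) < 0.01    # Eq. (C4)
```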