On the Similarity between von Neumann Graph Entropy and Structural Information: Interpretation, Computation, and Applications
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Xuecheng Liu, Luoyi Fu, Xinbing Wang, and Chenghu Zhou
Abstract: The von Neumann graph entropy is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by networked data. However, it is computationally demanding and hard to interpret using simple structural patterns. Due to the close relation between the Laplacian spectrum and the degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might be a good approximation of the von Neumann graph entropy that is both scalable and interpretable.

In this work, we thereby study the difference between the structural information and the von Neumann graph entropy, named the entropy gap. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we for the first time prove that the entropy gap is between 0 and log e in any undirected unweighted graph. Consequently, we certify that the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. This approximation is further applied to two entropy-related tasks: network design and graph similarity measure, where a novel graph similarity measure and fast algorithms are proposed. We further show empirically and theoretically that maximizing the von Neumann graph entropy can effectively hide the community structure, and we propose an alternative metric called spectral polarization. Our experimental results on graphs of various scales and types show that the very small entropy gap readily applies to a wide range of graphs, including weighted graphs. As an approximation of the von Neumann graph entropy, the structural information is the only one that achieves both high efficiency and high accuracy among the prominent methods. It is at least two orders of magnitude faster than SLaQ [2] with comparable accuracy. Our structural information based methods also exhibit superior performance in two entropy-related tasks.
Index Terms: spectral graph theory, graph entropy, Laplacian spectrum, spectral polarization, community obfuscation.
I. INTRODUCTION
Evidence has rapidly grown in the past few years that graphs are ubiquitous in our daily life; online social networks, metabolic networks, transportation networks, and collaboration networks are just a few examples that could be represented precisely by graphs. One important issue in graph analysis is to measure the complexity of these graphs [3], [4], which refers to the level of organization of structural features such as the scaling behavior of the degree distribution, community structure, etc. In order to capture the inherent structural complexity of graphs, many entropy based graph measures [4], [5], [6], [7], [8], [9] have been proposed, each of which is a specific form of the Shannon entropy for a different type of distribution extracted from the graphs.

As one of the aforementioned entropy based graph complexity measures, the von Neumann graph entropy, defined as the Shannon entropy of the spectrum of the trace rescaled Laplacian matrix of a graph (see Definition 1), is of special interest to scholars and practitioners [10], [2], [11], [12], [13], [14], [15], [16]. This spectral based entropy measure distinguishes between different graph structures. For instance, it is maximal for complete graphs, minimal for graphs with only a single edge, and takes on intermediate values for ring graphs. Actually, the entropy measure originates from quantum information theory, where it is used to describe the mixedness of a quantum system. It is Braunstein et al. who first used the von Neumann entropy to measure the complexity of graphs, by viewing each pure state of a quantum system as one of the edges of a graph [9].

Built upon the Laplacian spectrum, the von Neumann graph entropy is a natural choice to capture graph complexity, since the Laplacian spectrum is well known to contain rich information about the multi-scale structure of graphs [17], [18]. As a result, it has recently found applications in downstream tasks of complex network analysis and pattern recognition. For example, the von Neumann graph entropy facilitates the measurement of graph similarity via the Jensen-Shannon divergence, which can be used to compress multilayer networks [14] and detect anomalies in graph streams [10]. As another example, the von Neumann graph entropy can be used to measure edge centrality [16] and design entropy-driven networks [19].

This is an extended version of the conference paper [1] published in the Web Conference 2021. Xuecheng Liu and Xinbing Wang are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240 China (email: [email protected], [email protected]). Luoyi Fu is with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240 China (email: [email protected]). Chenghu Zhou is with the Institute of Geographical Science and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101 China (email: [email protected]).
A. Motivations
However, despite the popularity received in applications, the main obstacle encountered in practice is the computational inefficiency of the exact von Neumann graph entropy. Indeed, as a spectral based entropy measure, the von Neumann graph entropy suffers from computational inefficiency, since the computational complexity of the graph spectrum is cubic in the number of nodes. Meanwhile, the existing approximation approaches [10], [11], [2], such as the quadratic approximation, fail to capture the presence of non-trivial structural patterns that seem to interpret the spectral based entropy measure. Therefore, there is a strong desire to find a good approximation that achieves accuracy, scalability, and interpretability simultaneously.

Instead of starting from scratch, we are inspired by the well-known knowledge that there is a close relationship between the combinatorial characteristics of a graph and the algebraic properties of its associated matrices [20]. To illustrate, we plot the Laplacian spectrum and the degree sequence together in the same figure for four representative real-world graphs and four synthetic graphs. As shown in Fig. 1, the sorted spectrum sequence and the sorted degree sequence almost coincide with each other. A similar phenomenon can also be observed in larger scale-free graphs, which indicates that it is possible to reduce the approximation of the von Neumann graph entropy to the time-efficient computation of simple node degree statistics. Therefore, we ask without hesitation the first research question,
RQ1:
Does there exist some non-polynomial function $\varphi$ such that $\sum_{i=1}^n \varphi\!\left(d_i / \sum_{j=1}^n d_j\right)$ is close to the von Neumann graph entropy, where $d_i$ is the degree of node $i$ in a graph of order $n$?

We emphasize the non-polynomial property of the function $\varphi$, since most previous works based on polynomial approximations fail to fulfill interpretability. The challenges from scalability and interpretability translate directly into two requirements on the function $\varphi$ to be determined. First, the explicit expression of $\varphi$ must exist and stay simple to ensure the interpretability of the sum over degree statistics. Second, the function $\varphi$ should be graph-agnostic to meet the scalability requirement, that is, $\varphi$ should be independent of the graph to be analyzed. One natural choice for the non-polynomial function $\varphi$, suggested by the entropy nature of the graph complexity measure, is $\varphi(x) = -x \log x$. The sum $-\sum_{i=1}^n \left(d_i / \sum_{j=1}^n d_j\right) \log\left(d_i / \sum_{j=1}^n d_j\right)$ has been named the one-dimensional structural information by Li et al. [4] for a connected graph, since it has an entropy form and captures the information of a classic random walker in a graph. We extend this notion to arbitrary undirected graphs. Following the question RQ1, we raise the second research question,
RQ2:
Is the structural information an accurate proxy of the von Neumann graph entropy?
To address the second question, we conduct, to our knowledge, a first study of the difference between the structural information and the von Neumann graph entropy, which we name the entropy gap.

B. Contributions
To study the entropy gap, we build on a fundamental relationship between the Laplacian spectrum λ and the degree sequence d in undirected graphs: d is majorized by λ. In other words, there is a doubly stochastic matrix P such that Pλ = d. Leveraging the majorization and the classic Jensen's inequality, we prove that the entropy gap is no less than 0 in arbitrary undirected graphs. By exploiting the Jensen's gap [21], which is an inverse version of the classic Jensen's inequality, we further prove that the entropy gap is no more than log e in arbitrary unweighted undirected graphs. The constant lower and upper bounds on the entropy gap are further sharpened using more advanced knowledge about the Laplacian spectrum and degree sequence, such as the Grone-Merris majorization [22]. We also apply a similar technique to bound the entropy gap in weighted graphs.

In a nutshell, our paper makes the following contributions:

• Theory and interpretability:
Inspired by the close relation between the Laplacian spectrum and the degree sequence, we for the first time bridge the gap between the von Neumann graph entropy and the structural information by proving that the entropy gap is between 0 and log e in any unweighted graph. To the best of our knowledge, the constant bounds on the approximation error in unweighted graphs are sharper than those of any existing approach with provable accuracy, such as FINGER [10]. Therefore, the answers to both RQ1 and RQ2 are YES! As shown in Table I, the relative approximation error is around 1% for small graphs, which is practically good. Besides, the structural information provides a simple geometric interpretation of the von Neumann graph entropy as a measure of degree heterogeneity. Thus, the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously.

• Applications and efficient algorithms:
Using the structural information as a proxy of the von Neumann graph entropy with bounded error (the entropy gap), we develop fast algorithms for two entropy based applications: network design and graph similarity measure. For the network design aiming to maximize the von Neumann entropy, we combine a greedy method and a pruning strategy to speed up the searching process. For the graph similarity measure, we propose a new distance measure based on the structural information and the Jensen-Shannon divergence. We further show that the proposed measure is a pseudometric and devise a fast incremental algorithm to compute the similarity between adjacent graphs in a graph stream.

• Connection with community structure:
We find empirically that both the von Neumann graph entropy and the structural information are uninformative of the community structure. However, they are effective in adversarial attacks on community detection, since maximizing the von Neumann graph entropy makes the Laplacian spectrum uninformative of the community structure. Using the same idea, we propose an alternative metric called spectral polarization, which is both effective and efficient in hiding the community structure.

• Extensive experiments and evaluations:
We use 3 random graph models, 9 real-world static graphs, and 2 real-world temporal graphs to evaluate the properties of the entropy gap and the proposed algorithms. The results show that the entropy gap is small in a wide range of graphs, including weighted graphs, and that it is insensitive to the change of graph
size. Compared with prominent methods for approximating the von Neumann graph entropy, the structural information is superior in both accuracy and computational speed. It is at least 2 orders of magnitude faster than the accurate SLaQ [2] algorithm with comparable accuracy. Our proposed algorithms based on structural information also exhibit superb performance in two entropy based applications.

Fig. 1: The close relation between Laplacian spectra and degree sequence in four representative real-world graphs ((a) Zachary's karate club, (b) Dolphins, (c) Email, (d) Celegans) and four common synthetic graphs ((e) ER graph, (f) BA graph, (g) complete graph, (h) ring graph). Both the Laplacian spectra and the degree sequence are sorted in non-increasing order. The x-axis represents the index of the sorted sequences, and the y-axis represents the value of the Laplacian spectrum and degree.

TABLE I: Structural information H, von Neumann graph entropy Hvn, entropy gap ∆H, and relative error ∆H/Hvn of the graphs in Fig. 1 (Zachary, Dolphins, Email, Celegans, ER, BA, Complete, Ring).

TABLE II: Comparison of methods for approximating the von Neumann graph entropy in terms of fulfilled (✓) and missing (✗) properties.

                      [10]   [2]   [11]   Structural Information (Ours)
Provable accuracy      ✓      ✗     ✗      ✓
Scalability            ✓      ✓     ✗      ✓
Interpretability       ✗      ✗     ✗      ✓
Roadmap:
The remainder of this paper is organized as follows. We review three related issues in Section II. In Section III we introduce the definitions of the von Neumann graph entropy, the structural information, and the notion of the entropy gap. Section IV shows the close relationship between the von Neumann graph entropy and the structural information by bounding the entropy gap. Section V presents efficient algorithms for two graph entropy based applications. In Section VI we discuss the connection between the von Neumann graph entropy and community structure. Section VII provides experimental results. Section VIII offers some conclusions and directions for future research.

II. RELATED WORK
We review three issues related to the von Neumann graph entropy: computation, interpretation, and connection with community structure. The first two issues arise from the broad applications [16], [19], [14], [23], [24], [25], [15], [26] of the von Neumann graph entropy, whereas the last issue comes from spectral clustering.
A. Approximate Computation of the von Neumann Graph Entropy
In an effort to overcome the computational inefficiency of the von Neumann graph entropy, past works have resorted to various numerical approximations. Chen et al. [10] first compute a quadratic approximation of the entropy via Taylor expansion, then derive two finer approximations with accuracy guarantees by spectrum-based and degree-based rescaling, respectively. Before Chen's work, Taylor expansion was widely adopted to give computationally efficient approximations [27], but there was no theoretical guarantee on the approximation accuracy. Following Chen's work, Choi et al. [11] propose several more complex quadratic approximations based on advanced polynomial approximation methods, whose superiority is verified through experiments.

Besides, there is a trend to approximate spectral sums using stochastic trace estimation based approximations [28], the merit of which is the provable error-bounded estimation of the spectral sums. For example, Kontopoulou et al. [12] propose three randomized algorithms based on Taylor series, Chebyshev polynomials, and random projection matrices to approximate the von Neumann entropy of density matrices. As another example, based on the stochastic Lanczos quadrature technique [29], Tsitsulin et al. [2] propose an efficient and effective approximation technique called SLaQ to estimate the von Neumann entropy and other spectral descriptors for web-scale graphs. However, the approximation error bound of
SLaQ for the von Neumann graph entropy is not provided. The disadvantages of such stochastic approximations are also obvious; their computational efficiency depends on the number of random vectors used in the stochastic trace estimation, and they are not suitable for applications like anomaly detection in graph streams and entropy-driven network design.

The comparison of methods for approximating the von Neumann graph entropy is presented in Table II. One of the common drawbacks of the aforementioned methods is the lack of interpretability; that is, none of these methods provides enough evidence to interpret this spectral based entropy measure in terms of structural patterns. By contrast, as a good proxy of the von Neumann graph entropy, the structural information offers us the intuition that the spectral based entropy measure is closely related to the degree heterogeneity of graphs.
B. Spectral Descriptor of Graphs and Its Structural Counterpart
Researchers in spectral graph theory have always been interested in establishing connections between the combinatorial characteristics of a graph and the algebraic properties of its associated matrices. For example, the algebraic connectivity (also known as the Fiedler eigenvalue), defined as the second smallest eigenvalue of the graph Laplacian matrix, has been used to measure the robustness [17] and synchronizability [30] of graphs. The magnitude of the algebraic connectivity has also been found to reflect how well connected the overall graph is [18]. As another example, the Fiedler vector, defined as the eigenvector corresponding to the Fiedler eigenvalue of the graph Laplacian matrix, has been found to be a good indicator of the bi-partition structure of a graph [31]. However, there are some other spectral descriptors that have found applications in graph analytics but require more structural interpretations, such as the heat kernel trace [32], [33] and the von Neumann graph entropy.

Simmons et al. [34] suggest interpreting the von Neumann graph entropy as the centralization of graphs, which is very similar to our interpretation using the structural information. They derive both upper and lower bounds on the von Neumann graph entropy in terms of graph centralization, under some hard assumptions on the range of the von Neumann graph entropy. Therefore, their results cannot be directly converted into accuracy guaranteed approximations of the von Neumann graph entropy for arbitrary simple graphs. By contrast, our work shows that the structural information is an accurate, scalable, and interpretable proxy of the von Neumann graph entropy for arbitrary simple graphs. Besides, the techniques used in our proof are also quite different from [34].
C. Spectrum, Detectability, and Significance of Community Structure
Starting from the Fiedler vector, spectral algorithms have been widely analyzed and applied in the task of community detection [35], [36] because of their simplicity and theoretical guarantees. Cheeger's inequality $\lambda_2/2 \leq h_G \leq \sqrt{2\lambda_2}$ bounds the conductance $h_G$ of a graph $G$ using the second smallest eigenvalue $\lambda_2$ of the normalized Laplacian matrix. This is later generalized to multiway spectral partitioning [37], yielding the higher-order Cheeger inequalities $\lambda_k/2 \leq \rho_G(k) \leq O(k^2)\sqrt{\lambda_k}$ for each $k$, where $\lambda_k$ is the $k$-th smallest eigenvalue of the normalized Laplacian matrix and $\rho_G(k)$ is the $k$-way expansion constant. Since both $h_G$ and $\rho_G(k)$ measure the significance of community structure, the graph spectrum is closely related to the community structure.

The coupling between graph spectrum and community structure has been empirically validated in [35], where Newman found that if the second smallest eigenvalue $\lambda_2$ of the Laplacian matrix is well separated from the eigenvalues above it, spectral clustering based on the Laplacian matrix often does very well. However, community detection by spectral algorithms in sparse graphs often fails, because the spectrum contains no clear evidence of community structure. This is exemplified under the sparse stochastic block model with two clusters of equal size [38], [36], where the second largest eigenvalue of the adjacency matrix gets lost in the bulk of uninformative eigenvalues. Our experiments complement the correlation between graph spectrum and community structure by showing that the spikes in a sequence of spectral gaps are good indicators of the community structure.

The significance and detectability of community structure has found application in an emerging area called community obfuscation [39], [40], [41], [42], [43], [44], [45], where the graph structure is minimally perturbed to protect its community structure from being detected.
None of these practical algorithms exploits the correlation between graph spectrum and community structure, except for the structural entropy proposed by Liu et al. [43]. Our work bridges the one-dimensional structural entropy in [43] with the spectral entropy, elaborates both empirically and theoretically that maximizing the spectral entropy is effective in community obfuscation, and thus provides a theoretical foundation for the success of the structural entropy [43].

III. PRELIMINARIES
In this paper, we study undirected graphs $G = (V, E, A)$ with positive edge weights, where $V = \{1, \ldots, n\}$ is the node set, $E$ is the edge set, and $A \in \mathbb{R}_+^{n \times n}$ is the symmetric weight matrix with positive entry $A_{ij}$ denoting the weight of an edge $(i, j) \in E$. If the node pair $(i, j) \notin E$, then $A_{ij} = 0$. If the graph $G$ is unweighted, the weight matrix $A \in \{0, 1\}^{n \times n}$ is called the adjacency matrix of $G$. The degree of node $i \in V$ in graph $G$ is defined as $d_i = \sum_{j=1}^n A_{ij}$. The Laplacian matrix of graph $G$ is defined as $L = D - A$, where $D = \mathrm{diag}(d_1, \ldots, d_n)$ is the degree matrix. Let $\{\lambda_i\}_{i=1}^n$ be the sorted eigenvalues of $L$ such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n = 0$, called the Laplacian spectrum. We define $\mathrm{vol}(G) = \sum_{i=1}^n d_i$ as the volume of graph $G$; then $\mathrm{vol}(G) = \mathrm{tr}(L) = \sum_{i=1}^n \lambda_i$, where $\mathrm{tr}(\cdot)$ is the trace operator. For the convenience of delineation, we define a special function $f(x) \triangleq x \log x$ on the support $[0, \infty)$, where $f(0) \triangleq \lim_{x \downarrow 0} f(x) = 0$ by convention. In the following, we present formal definitions of the von Neumann graph entropy, the structural information, and the entropy gap. Slightly different from the one-dimensional structural information proposed by Li et al. [4], our definition of structural information does not require the graph $G$ to be connected.

Definition 1 (von Neumann graph entropy). The von Neumann graph entropy of an undirected graph $G = (V, E, A)$ is defined as $H_{\mathrm{vn}}(G) = -\sum_{i=1}^n f(\lambda_i / \mathrm{vol}(G))$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n = 0$ are the eigenvalues of the Laplacian matrix $L = D - A$ of the graph $G$, and $\mathrm{vol}(G) = \sum_{i=1}^n \lambda_i$ is the volume of $G$.

Definition 2 (Structural information). The structural information of an undirected graph $G = (V, E, A)$ is defined as $H(G) = -\sum_{i=1}^n f(d_i / \mathrm{vol}(G))$, where $d_i$ is the degree of node $i$ in $G$ and $\mathrm{vol}(G) = \sum_{i=1}^n d_i$ is the volume of $G$.

Definition 3 (Entropy gap). The entropy gap of an undirected graph $G = (V, E, A)$ is defined as $\Delta H(G) = H(G) - H_{\mathrm{vn}}(G)$.

The von Neumann graph entropy and the structural information are well-defined for all undirected graphs except those with an empty edge set, for which $\mathrm{vol}(G) = 0$. When $E = \emptyset$, we take it for granted that $H(G) = H_{\mathrm{vn}}(G) = 0$.

IV. APPROXIMATION ERROR ANALYSIS
In this section we bound the entropy gap in undirected graphs of order $n$. Since nodes with degree 0 have no contribution to the structural information and the von Neumann graph entropy, without loss of generality we assume that $d_i > 0$ for any node $i \in V$.

A. Bounds on the Approximation Error
We first provide the additive approximation errors in Theorem 1, Corollary 1, and Corollary 2, then obtain the multiplicative approximation error in Theorem 2.
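Before stating the bounds, the three quantities of Definitions 1-3 can be computed directly from the degree sequence and the Laplacian spectrum. The sketch below is illustrative only (numpy, base-2 logarithms; the path graph $P_4$ is an arbitrary toy example, not one of the paper's benchmarks):

```python
import numpy as np

def entropies(A):
    """Return (H, H_vn, entropy gap) of a symmetric weight matrix A,
    following Definitions 1-3 (logarithms base 2, 0*log 0 := 0)."""
    d = A.sum(axis=1)                          # degree sequence d_i
    lam = np.linalg.eigvalsh(np.diag(d) - A)   # Laplacian spectrum
    vol = d.sum()                              # vol(G) = sum d_i = sum lambda_i
    def shannon(v):
        p = v[v > 1e-12] / vol                 # drop zero terms (0*log 0 = 0)
        return -np.sum(p * np.log2(p))
    H, H_vn = shannon(d), shannon(lam)
    return H, H_vn, H - H_vn

# Toy example: the path graph P4 (1-2-3-4)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
H, H_vn, gap = entropies(A)
print(H, H_vn, gap)  # gap lies in [0, log2(e)], as the bounds below guarantee
```

For $P_4$ the degree sequence is $(1, 2, 2, 1)$ and the Laplacian eigenvalues are $0$, $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$, both summing to $\mathrm{vol}(G) = 6$, so the two entropies are close but not equal.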
Theorem 1 (Bounds on the absolute approximation error). For any undirected graph $G = (V, E, A)$, the inequality
$$0 \leq \Delta H(G) \leq \frac{\log e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)} \quad (1)$$
holds, where $\delta = \min\{d_i \mid d_i > 0\}$ is the minimum positive degree.

Before proving Theorem 1, we introduce two techniques: majorization and Jensen's gap. The former is a preorder on vectors of reals, while the latter is an inverse version of Jensen's inequality; their definitions are presented as follows.
Definition 4 (Majorization [46]). For a vector $x \in \mathbb{R}^d$, we denote by $x^{\downarrow} \in \mathbb{R}^d$ the vector with the same components, but sorted in descending order. Given $x, y \in \mathbb{R}^d$, we say that $x$ majorizes $y$ (written as $x \succ y$) if and only if $\sum_{i=1}^k x_i^{\downarrow} \geq \sum_{i=1}^k y_i^{\downarrow}$ for $k = 1, \ldots, d$ and $x^{\top}\mathbf{1} = y^{\top}\mathbf{1}$.

Lemma 1 (Jensen's gap [21]). Let $X$ be a one-dimensional random variable with mean $\mu$ and support $\Omega$. Let $\psi(x)$ be a twice differentiable function on $\Omega$ and define the function
$$h(x) = \frac{\psi(x) - \psi(\mu)}{(x - \mu)^2} - \frac{\psi'(\mu)}{x - \mu};$$
then $\mathbb{E}[\psi(X)] - \psi(\mathbb{E}[X]) \leq \sup_{x \in \Omega}\{h(x)\} \cdot \mathrm{var}(X)$. Additionally, if $\psi'(x)$ is convex, then $h(x)$ is monotonically increasing in $x$, and if $\psi'(x)$ is concave, then $h(x)$ is monotonically decreasing in $x$.

Lemma 2.
The function $f(x) = x \log x$ is convex, and its first order derivative $f'(x) = \log x + \log e$ is concave.

Proof. The second order derivative $f''(x) = (\log e)/x > 0$; thus $f(x) = x \log x$ is convex. Moreover, the third order derivative $f'''(x) = -(\log e)/x^2 < 0$, so $f'(x)$ is concave. ∎

We can see that majorization characterizes the degree of concentration between two vectors: $x \succ y$ means that the entries of $y$ are more concentrated on their common mean $\mathbf{1}^{\top} y / d$ than the entries of $x$. An equivalent definition of majorization [46] using linear algebra says that $x \succ y$ if and only if there exists a doubly stochastic matrix $P$ such that $Px = y$. As a famous example of majorization, the Schur-Horn theorem [46] says that the diagonal elements of a positive semidefinite Hermitian matrix are majorized by its eigenvalues. Since $x^{\top} L x = \sum_{(i,j) \in E} A_{ij}(x_i - x_j)^2 \geq 0$ for any vector $x \in \mathbb{R}^n$, the Laplacian matrix $L$ is a positive semidefinite symmetric matrix whose diagonal elements form the degree sequence $d$ and whose eigenvalues form the spectrum $\lambda$. Therefore, $\lambda \succ d$, implying that there exists some doubly stochastic matrix $P = (p_{ij}) \in [0, 1]^{n \times n}$ such that $P\lambda = d$.

Using the fact that $P\lambda = d$ and the convexity of $f(x)$ in Lemma 2, we can now proceed to prove Theorem 1.

Proof of Theorem 1.
For each $i \in V$, we define a discrete random variable $X_i$ with probability mass function $\Pr[X_i = x] = \sum_{j=1}^n p_{ij} \delta_{\lambda_j}(x)$, where $\delta_a(x)$ is the Kronecker delta function. Then the expectation $\mathbb{E}[X_i] = \sum_{j=1}^n p_{ij} \lambda_j = d_i$ and the variance $\mathrm{var}(X_i) = \sum_{j=1}^n p_{ij}(\lambda_j - d_i)^2 = \sum_{j=1}^n p_{ij} \lambda_j^2 - d_i^2$.

First, we express the entropy gap in terms of the Laplacian spectrum and the degree sequence. Since
$$H(G) = -\sum_{i=1}^n \frac{d_i}{\mathrm{vol}(G)} \log\frac{d_i}{\mathrm{vol}(G)} = -\frac{1}{\mathrm{vol}(G)}\left(\sum_{i=1}^n f(d_i) - \sum_{i=1}^n d_i \log(\mathrm{vol}(G))\right) = \log(\mathrm{vol}(G)) - \frac{\sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)}, \quad (2)$$
and similarly
$$H_{\mathrm{vn}}(G) = \log(\mathrm{vol}(G)) - \frac{\sum_{i=1}^n f(\lambda_i)}{\mathrm{vol}(G)}, \quad (3)$$
we have
$$\Delta H(G) = H(G) - H_{\mathrm{vn}}(G) = \frac{\sum_{i=1}^n f(\lambda_i) - \sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)}. \quad (4)$$

Second, we use Jensen's inequality to prove $\Delta H(G) \geq 0$. Since $f(x)$ is convex, $f(d_i) = f(\mathbb{E}[X_i]) \leq \mathbb{E}[f(X_i)]$ for any $i \in \{1, \ldots, n\}$. By summing over $i$, we have
$$\sum_{i=1}^n f(d_i) \leq \sum_{i=1}^n \mathbb{E}[f(X_i)] = \sum_{i=1}^n \sum_{j=1}^n p_{ij} f(\lambda_j) = \sum_{j=1}^n f(\lambda_j),$$
where the last equality uses the fact that $P$ is doubly stochastic. Therefore, $\Delta H(G) \geq 0$ for any undirected graph.
Finally, we use Jensen's gap to prove the upper bound on $\Delta H(G)$ in (1). Applying Jensen's gap to $X_i$ and $f(x)$,
$$\mathbb{E}[f(X_i)] - f(\mathbb{E}[X_i]) \leq \sup_{x \in [0, \mathrm{vol}(G)]}\{h_i(x)\} \cdot \mathrm{var}(X_i), \quad (5)$$
where $h_i(x) = \frac{f(x) - f(\mathbb{E}[X_i])}{(x - \mathbb{E}[X_i])^2} - \frac{f'(\mathbb{E}[X_i])}{x - \mathbb{E}[X_i]}$. Since $f'(x)$ is concave, $h_i(x)$ is monotonically decreasing in $x$. Therefore, $\sup_{x \in [0, \mathrm{vol}(G)]}\{h_i(x)\} = h_i(0)$. Since
$$h_i(0) = \frac{f(0) - f(d_i)}{d_i^2} + \frac{f'(d_i)}{d_i} = \frac{\log e}{d_i} \leq \frac{\log e}{\delta},$$
the inequality in (5) can be simplified as
$$\sum_{j=1}^n p_{ij} f(\lambda_j) - f(d_i) \leq \frac{\log e}{\delta} \cdot \left(\sum_{j=1}^n p_{ij} \lambda_j^2 - d_i^2\right). \quad (6)$$
By summing both sides of the inequality (6) over $i$, we get an upper bound UB on $\sum_{j=1}^n f(\lambda_j) - \sum_{i=1}^n f(d_i)$ as
$$\mathrm{UB} = \frac{\log e}{\delta} \sum_{i=1}^n \left(\sum_{j=1}^n p_{ij} \lambda_j^2 - d_i^2\right) = \frac{\log e}{\delta}\left(\sum_{j=1}^n \lambda_j^2 - \sum_{i=1}^n d_i^2\right) = \frac{\log e}{\delta}\left(\mathrm{tr}(L^2) - \mathrm{tr}(D^2)\right) = \frac{\log e}{\delta}\left(\mathrm{tr}(A^2) - \mathrm{tr}(AD) - \mathrm{tr}(DA)\right) = \frac{\log e}{\delta} \cdot \mathrm{tr}(A^2),$$
where the last equality holds because $A$ has zero diagonal, so $\mathrm{tr}(AD) = \mathrm{tr}(DA) = 0$. As a result, $\Delta H(G) = \frac{\sum_{i=1}^n f(\lambda_i) - \sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)} \leq \frac{\log e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)}$. ∎

To illustrate the tightness of the bounds in Theorem 1, we further derive bounds on the entropy gap for unweighted graphs, especially regular graphs. Via multiplicative error analysis, we show that the structural information converges to the von Neumann graph entropy as the graph size grows.

Corollary 1 (Constant bounds on the entropy gap). For any unweighted, undirected graph $G$, $0 \leq \Delta H(G) \leq \log e$ holds.

Proof. In an unweighted graph $G$,
$$\mathrm{tr}(A^2) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} A_{ji} = \sum_{i=1}^n \sum_{j=1}^n A_{ij}^2 = \sum_{i=1}^n d_i = \mathrm{vol}(G)$$
and $\delta \geq 1$; therefore $0 \leq \Delta H(G) \leq \frac{\log e}{\delta} \cdot \frac{\mathrm{tr}(A^2)}{\mathrm{vol}(G)} = \frac{\log e}{\delta} \leq \log e$. ∎

Corollary 2 (Entropy gap of regular graphs). For any unweighted, undirected, regular graph $G_d$ of degree $d$, the inequality $0 \leq \Delta H(G_d) \leq \frac{\log e}{d}$ holds.

Proof sketch.
In any unweighted, regular graph $G_d$, $\delta = d$. ∎

Theorem 2 (Convergence of the multiplicative approximation error). For almost all unweighted graphs $G$ of order $n$, $\frac{H(G)}{H_{\mathrm{vn}}(G)} - 1 \geq 0$ and decays to 0 at the rate of $O(1/\log n)$.

Proof. Dairyko et al. [47] proved that for almost all unweighted graphs $G$ of order $n$, $H_{\mathrm{vn}}(G) \geq H_{\mathrm{vn}}(K_{1,n-1})$, where $K_{1,n-1}$ stands for the star graph. Since $H_{\mathrm{vn}}(K_{1,n-1}) = \log(2n-2) - \frac{n}{2n-2}\log n = \frac{1}{2}\log n + 1 + o(1)$,
$$\frac{H(G)}{H_{\mathrm{vn}}(G)} - 1 = \frac{\Delta H(G)}{H_{\mathrm{vn}}(G)} \leq \frac{\log e}{H_{\mathrm{vn}}(K_{1,n-1})} = O(1/\log n). \qquad ∎$$

B. Sharpened Bounds on the Entropy Gap
Though the constant bounds on the entropy gap are tight enough for applications, we can still sharpen the bounds on the entropy gap in unweighted graphs using more advanced majorizations.
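Both majorization relations used in this section, the Schur-Horn relation $\lambda \succ d$ behind Theorem 1 and the Grone-Merris relation $(d_1^*, \ldots, d_n^*) \succ \lambda$ used below, can be spot-checked numerically. The following is a sketch on a random unweighted graph (the graph size, density, and seed are arbitrary choices):

```python
import numpy as np

def majorizes(x, y, tol=1e-8):
    """Check x majorizes y: every prefix sum of the sorted-descending x
    dominates the corresponding prefix sum of y, and the totals coincide."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return bool(np.all(np.cumsum(xs) >= np.cumsum(ys) - tol)
                and abs(x.sum() - y.sum()) < tol)

rng = np.random.default_rng(0)
n = 30
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric 0/1 adjacency, no self-loops
d = A.sum(axis=1)
lam = np.linalg.eigvalsh(np.diag(d) - A)     # Laplacian spectrum
d_conj = np.array([(d >= k).sum() for k in range(1, n + 1)], dtype=float)

print(majorizes(lam, d))       # Schur-Horn: the spectrum majorizes the degrees
print(majorizes(d_conj, lam))  # Grone-Merris: conjugate degrees majorize the spectrum
```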
Theorem 3 (Sharpened lower bound on the entropy gap). For any unweighted, undirected graph $G$, $\Delta H(G)$ is lower bounded by $\big(f(d_{\max}+1) - f(d_{\max}) + f(\delta - 1) - f(\delta)\big)/\mathrm{vol}(G)$, where $d_{\max}$ is the maximum degree and $\delta$ is the minimum positive degree.

Proof. The proof is based on the advanced majorization [48]: $\lambda \succ (d_1 + 1, d_2, \ldots, d_{n-1}, d_n - 1)$ holds for any unweighted, undirected graph $G$, where $d_1 \geq d_2 \geq \cdots \geq d_n$ is the sorted degree sequence of $G$. Similar to the proof of Theorem 1, we have $\sum_{i=1}^n f(\lambda_i) \geq f(d_1 + 1) + f(d_n - 1) + \sum_{i=2}^{n-1} f(d_i)$. Then the sharpened lower bound follows from equation (4), since $d_1 = d_{\max}$ and $d_n = \delta$. ∎

Theorem 4 (Sharpened upper bound on the entropy gap). For any unweighted, undirected graph $G = (V, E)$, $\Delta H(G)$ is upper bounded by $\min\{\log e, b_1, b_2\}$, where
$$b_1 = \frac{\sum_{i=1}^n f(d_i^*)}{\mathrm{vol}(G)} - \frac{\sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)} \quad \text{and} \quad b_2 = \log\left(1 + \frac{\sum_{i=1}^n d_i^2}{\mathrm{vol}(G)}\right) - \frac{\sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)}.$$
Here $(d_1^*, \ldots, d_n^*)$ is the conjugate degree sequence of $G$, where $d_k^* = |\{i \mid d_i \geq k\}|$.

Proof. We first prove $\Delta H(G) \leq b_1$ using the Grone-Merris majorization [22]: $(d_1^*, \ldots, d_n^*) \succ \lambda$. Similar to the proof of Theorem 1, we have $\sum_{i=1}^n f(d_i^*) \geq \sum_{i=1}^n f(\lambda_i)$; thus
$$b_1 \geq \frac{\sum_{i=1}^n f(\lambda_i) - \sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)} = \Delta H(G).$$
We then prove $\Delta H(G) \leq b_2$. Since
$$\frac{\sum_{i=1}^n f(\lambda_i)}{\mathrm{vol}(G)} = \sum_{i=1}^n \left(\frac{\lambda_i}{\sum_{j=1}^n \lambda_j}\right)\log\lambda_i \leq \log\left(\frac{\sum_{i=1}^n \lambda_i^2}{\sum_{j=1}^n \lambda_j}\right)$$
by the concavity of the logarithm, and
$$\frac{\sum_{i=1}^n \lambda_i^2}{\sum_{i=1}^n \lambda_i} = \frac{\mathrm{tr}(L^2)}{\mathrm{vol}(G)} = 1 + \frac{\sum_{i=1}^n d_i^2}{\mathrm{vol}(G)},$$
we have $\Delta H(G) = \frac{\sum_{i=1}^n f(\lambda_i) - \sum_{i=1}^n f(d_i)}{\mathrm{vol}(G)} \leq b_2$. ∎
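Theorems 3 and 4 can likewise be checked numerically. The sketch below computes the entropy gap via Eq. (4) together with the sharpened bounds on a random unweighted graph (the graph size, density, and seed are arbitrary choices):

```python
import numpy as np

def f(x):
    """f(x) = x * log2(x), with f(0) = 0 by convention."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x * np.log2(np.maximum(x, 1e-300)), 0.0)

rng = np.random.default_rng(1)
n = 20
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                  # random unweighted graph
d = A.sum(axis=1)
vol = d.sum()
lam = np.linalg.eigvalsh(np.diag(d) - A)
gap = (f(lam).sum() - f(d).sum()) / vol         # entropy gap via Eq. (4)

# Theorem 3: lower bound from the maximum and minimum positive degrees
dmax, delta = d.max(), d[d > 0].min()
lb = (f(dmax + 1) - f(dmax) + f(delta - 1) - f(delta)) / vol

# Theorem 4: b1 from the Grone-Merris majorization, b2 from Jensen's inequality
d_conj = np.array([(d >= k).sum() for k in range(1, n + 1)], dtype=float)
b1 = (f(d_conj).sum() - f(d).sum()) / vol
b2 = np.log2(1 + (d ** 2).sum() / vol) - f(d).sum() / vol

print(bool(lb <= gap <= min(np.log2(np.e), b1, b2)))
```

On any unweighted graph the printed check should hold, since it restates Theorems 1, 3, and 4.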
C. Entropy Gap of Specific Graphs
The entropy gap of the complete graph K_n, complete bipartite graph K_{a,b}, path graph P_n, and ring graph R_n is summarized in Table III; the proofs can be found in Appendix B.

V. APPLICATIONS AND ALGORITHMS
As a measure of the structural complexity of a graph, the von Neumann entropy has been applied in a variety of applications. For example, the von Neumann graph entropy is exploited to measure the importance of an edge [16]. As another example, the von Neumann graph entropy can also be used to measure the distance between graphs for graph classification and anomaly detection [10], [15]. In addition, the von Neumann graph entropy is used in the context of network embedding [26] to learn low-dimensional feature representations of nodes. We observe that, in these applications, the von Neumann graph entropy is used to address the following primitive tasks:
• Entropy-based network design: Change the existing graph into a new graph such that the entropy requirement is attained with minimal perturbations of the existing graph. For example, Minello et al. [19] use the von Neumann entropy to explore the potential network growth model via experiments.
• Graph similarity measure: Compute a real positive number to reveal the similarity between two graphs. For example, Domenico et al. [14] use the von Neumann graph entropy to compute the Jensen-Shannon distance between graphs for the purpose of compressing multilayer networks.
Resolving both tasks requires computing the von Neumann graph entropy exactly. To reduce the computational cost and preserve the interpretability, we can instead use the accurate proxy, structural information, to approximately solve these tasks.
A. Entropy-based network design
Network design aims to minimally perturb the network to fulfill some goal. Consider the goal of maximizing the von Neumann entropy of a graph; pursuing it helps us understand how different structural patterns influence the entropy value. The entropy-based network design problem is formulated as follows.
Problem 1 (MaxEntropy). Given an unweighted, undirected graph G = (V, E) of order n and an integer budget k, find a set F of non-existing edges of G whose addition to G creates the largest increase of the von Neumann graph entropy and |F| ≤ k.

Due to the spectral nature of the von Neumann graph entropy, it is not easy to find an effective strategy to perturb the graph, especially since there is an exponential number of candidate subsets F. If we use the structural information as a proxy of the von Neumann entropy, Problem 1 reduces to maximizing H(G′), where G′ = (V, E ∪ F) such that |F| ≤ k. To further alleviate the computational pressure rooted in the exponential size of the search space for F, we adopt a greedy method in which the new edges are added one by one until either the structural information attains its maximum value log n or k new edges have already been added.

Algorithm 1: EntropyAug
Input: The graph G = (V, E) of order n, the budget k
Output: A set of node pairs F*
1   F ← ∅, H* ← 0;
2   while |F| < k do
3       V_s : list ← sort V in non-decreasing degree order;
4       head ← 0, tail ← |V_s| − 1, T ← +∞;
5       while head < tail do
6           for i = head + 1, head + 2, . . . , tail do
7               if EC(V_s[head], V_s[i]) ≥ T then
8                   tail ← i − 1; break;
9               if (V_s[head], V_s[i]) ∉ E then
10                  u ← V_s[head], v ← V_s[i], T ← EC(u, v);
11                  tail ← i − 1; break;
12          head ← head + 1;
13      E ← E ∪ {(u, v)}, F ← F ∪ {(u, v)};
14      if H(G) > H* then H* ← H(G), F* ← F;
15      if H* = log n then break;
16  return F*.

We denote the graph with l new edges as G_l = (V, E_l); then G_0 = G. Now suppose that we have G_l whose structural information is less than log n; we want to find a new edge e_{l+1} = (u, v) such that H(G_{l+1}) is maximized, where G_{l+1} = (V, E_l ∪ {e_{l+1}}). Since H(G_{l+1}) can be rewritten as

log(2|E_l| + 2) − (f(d_u + 1) + f(d_v + 1) + Σ_{i ≠ u,v} f(d_i))/(2|E_l| + 2),

the edge e_{l+1} maximizing H(G_{l+1}) should also minimize the edge centrality EC(u, v) = f(d_u + 1) − f(d_u) + f(d_v + 1) − f(d_v), where d_i is the degree of node i in G_l.

We present the pseudocode of our fast algorithm EntropyAug in Algorithm 1, which leverages a pruning strategy to accelerate the process of finding a single new edge that creates the largest increase of the von Neumann entropy. EntropyAug starts by initiating an empty set F used to contain the node pairs to be found and an entropy value H* used to record the maximum structural information in the graph evolution process (line 1). In each following iteration, it sorts the set of nodes V in non-decreasing degree order (line 3). Note that the edge centrality EC(u, v) has a nice monotonic property: EC(u_1, v_1) ≤ EC(u_2, v_2) if min{d_{u_1}, d_{v_1}} ≤ min{d_{u_2}, d_{v_2}} and max{d_{u_1}, d_{v_1}} ≤ max{d_{u_2}, d_{v_2}}. With the sorted list of nodes V_s, the monotonicity of EC(u, v) can be translated into EC(V_s[i_1], V_s[j_1]) ≤ EC(V_s[i_2], V_s[j_2]) if the indices satisfy i_1 < j_1, i_2 < j_2, i_1 ≤ i_2, and j_1 ≤ j_2. Thus, using the two pointers {head, tail} and a threshold T, it can prune the search space and find the desired non-adjacent node pair as fast as possible (lines 4-12). It then adds the non-adjacent node pair minimizing EC(u, v) into F and updates the graph G (line 13). The structural information of the updated graph is computed to determine whether F is the optimal subset up to the current iteration (lines 14-15).
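For illustration, here is a minimal Python sketch of the greedy step: it selects the same EC-minimizing non-edge described above, but via an exhaustive O(n²) scan instead of the sorted two-pointer pruning of Algorithm 1 (helper names are ours):

```python
import math
from itertools import combinations

def f(x):
    # f(x) = x * log2(x), with f(0) = 0
    return 0.0 if x <= 0 else x * math.log2(x)

def structural_information(deg):
    # H(G) = log2(vol) - sum_i f(d_i) / vol
    vol = sum(deg.values())
    return math.log2(vol) - sum(f(d) for d in deg.values()) / vol

def entropy_aug_naive(nodes, edges, k):
    # greedily add k non-edges minimizing EC(u,v) = f(d_u+1)-f(d_u)+f(d_v+1)-f(d_v)
    E = {frozenset(e) for e in edges}
    deg = {u: 0 for u in nodes}
    for e in E:
        for u in e:
            deg[u] += 1
    F = []
    for _ in range(k):
        candidates = [p for p in combinations(nodes, 2) if frozenset(p) not in E]
        if not candidates:
            break
        u, v = min(candidates,
                   key=lambda p: f(deg[p[0]] + 1) - f(deg[p[0]])
                               + f(deg[p[1]] + 1) - f(deg[p[1]]))
        E.add(frozenset((u, v)))
        deg[u] += 1
        deg[v] += 1
        F.append((u, v))
    return F, structural_information(deg)

# star on 6 nodes: center 0, leaves 1..5
nodes = list(range(6))
star_edges = [(0, i) for i in range(1, 6)]
F, H_after = entropy_aug_naive(nodes, star_edges, k=2)
```

On the star, the greedy step links low-degree leaves rather than the hub, which is exactly what minimizing EC favors, and the structural information rises toward its maximum log₂ n.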
TABLE III: Structural information, von Neumann graph entropy, and entropy gap of specific graphs.
Graph type                        | Structural information H | von Neumann graph entropy H_vn                                  | Entropy gap ∆H
Complete graph K_n                | log n                    | log(n − 1)                                                      | log(1 + 1/(n − 1))
Complete bipartite graph K_{a,b}  | 1 + log(ab)/2            | 1 + log(ab)/2 − log(1 + b/a)/(2b) − log(1 + a/b)/(2a)           | log(1 + b/a)/(2b) + log(1 + a/b)/(2a)
Path P_n                          | log(n − 1) + 1/(n − 1)   | log(n − 1) + 1 − log e                                          | log e − 1 + 1/(n − 1)
Ring R_n                          | log n                    | log n + 1 − log e                                               | log e − 1

B. Graph Similarity Measure
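Before turning to graph similarity, the closed forms of Table III for K_n and K_{a,b} can be verified from the known Laplacian spectra (K_n: eigenvalue n with multiplicity n − 1; K_{a,b}: eigenvalues a with multiplicity b − 1, b with multiplicity a − 1, and a + b). A small self-check in base-2 logarithms, with helper names of our choosing:

```python
import math

def shannon_bits(values):
    # entropy (in bits) of the distribution obtained by normalizing `values`
    total = sum(values)
    return -sum(v / total * math.log2(v / total) for v in values if v > 0)

# complete graph K_n: degrees n-1 everywhere; nonzero Laplacian spectrum n^(n-1)
n = 50
gap_Kn = shannon_bits([n - 1] * n) - shannon_bits([n] * (n - 1))

# complete bipartite K_{a,b}: a nodes of degree b, b nodes of degree a;
# nonzero Laplacian spectrum a^(b-1), b^(a-1), a+b
a, b = 6, 10
gap_ab = (shannon_bits([b] * a + [a] * b)
          - shannon_bits([a] * (b - 1) + [b] * (a - 1) + [a + b]))

pred_Kn = math.log2(1 + 1 / (n - 1))
pred_ab = math.log2(1 + b / a) / (2 * b) + math.log2(1 + a / b) / (2 * a)
```

Both gaps match the table entries to machine precision.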
Entropy-based graph similarity measures aim to compare graphs using the Jensen-Shannon divergence. The Jensen-Shannon divergence, a symmetrized and smoothed version of the Kullback-Leibler divergence, is defined formally in the following Definition 5.
Definition 5 (Jensen-Shannon divergence). Let P and Q be two probability distributions on the same support set Ω_N = {1, . . . , N}. The Jensen-Shannon divergence between P and Q is defined as

D_JS(P, Q) = H((P + Q)/2) − H(P)/2 − H(Q)/2,

where H(P) = −Σ_{i=1}^N p_i log p_i is the entropy of the distribution P.

Endres et al. [49] prove that √(D_JS(P, Q)) is a bounded metric on the space of distributions over Ω_N, with its maximum value √(log 2) being attained when min{p_i, q_i} = 0 for every i ∈ Ω_N. Since the von Neumann graph entropy is the entropy of a spectrum-based distribution, Lamberti et al. [50] define a quantum Jensen-Shannon distance between two graphs which is closely related to the von Neumann graph entropy, given in the following Definition 6.

Definition 6 (Quantum Jensen-Shannon distance). The quantum Jensen-Shannon distance between two weighted, undirected graphs G_1 = (V, E_1, A_1) and G_2 = (V, E_2, A_2) is defined as

D_QJS(G_1, G_2) = √( H_vn(Ḡ) − (H_vn(G_1) + H_vn(G_2))/2 ),

where Ḡ = (V, E_1 ∪ E_2, Ā) is a weighted graph with Ā = (A_1/vol(G_1) + A_2/vol(G_2))/2.

Based on the quantum Jensen-Shannon distance, we consider the following problem that can be applied in anomaly detection and multiplex network compression.
Problem 2.
Compute the quantum Jensen-Shannon distance between adjacent graphs in a stream of graphs {G_k = (V, E_k, t_k)}_{k=1}^K, where t_k is the timestamp of the graph G_k and t_k < t_{k+1}.

As a distance measure between graphs, D_QJS is typically required to be a pseudometric [33], that is, it should be symmetric and satisfy the triangle inequality. However, to the best of our knowledge, it is still an open problem whether D_QJS fulfills the triangle inequality [50]. Meanwhile, the quantum Jensen-Shannon distance inherits the computational inefficiency of the von Neumann graph entropy. Therefore, to solve Problem 2 efficiently we propose a new distance measure based on structural information as a surrogate for D_QJS.

Definition 7 (Structural information distance). The structural information distance between two weighted, undirected graphs G_1 = (V, E_1, A_1) and G_2 = (V, E_2, A_2) is defined as

D_SI(G_1, G_2) = √( H(Ḡ) − (H(G_1) + H(G_2))/2 ),

where Ḡ = (V, E_1 ∪ E_2, Ā) is a weighted graph with Ā = (A_1/vol(G_1) + A_2/vol(G_2))/2.

It is a little surprising to find that D_SI is a pseudometric; the details are stated in Theorem 5.

Theorem 5 (Properties of the distance measure D_SI). The distance measure D_SI(G_1, G_2) is a pseudometric on the space of undirected graphs:
• D_SI is symmetric, i.e., D_SI(G_1, G_2) = D_SI(G_2, G_1);
• D_SI is non-negative, i.e., D_SI(G_1, G_2) ≥ 0, where the equality holds if and only if d_{i,1}/Σ_{k=1}^n d_{k,1} = d_{i,2}/Σ_{k=1}^n d_{k,2} for every node i ∈ V, where d_{i,j} is the degree of node i in G_j;
• D_SI obeys the triangle inequality, i.e., D_SI(G_1, G_2) + D_SI(G_2, G_3) ≥ D_SI(G_1, G_3);
• D_SI is upper bounded by 1, i.e., D_SI(G_1, G_2) ≤ 1, where the equality holds if and only if min{d_{i,1}, d_{i,2}} = 0 for every node i ∈ V, where d_{i,j} is the degree of node i in G_j.

To establish a connection between D_SI and D_QJS, we study their extreme values and present the results in Theorem 6.
Theorem 6 (Connection between D_QJS and D_SI). Both D_QJS(G_1, G_2) and D_SI(G_1, G_2) attain the same maximum value of 1 under the identical condition that min{d_{i,1}, d_{i,2}} = 0 for every node i ∈ V, where d_{i,j} is the degree of node i in G_j.

In order to compute the structural information distance between adjacent graphs in the graph stream {G_k}_{k=1}^K, where G_k = (V, E_k, t_k), we first compute the structural information H(G_k) for each k ∈ {1, . . . , K}, which takes Θ(Kn) time. Then we compute the structural information of Ḡ_k, whose adjacency matrix is Ā_k = (A_k/vol(G_k) + A_{k+1}/vol(G_{k+1}))/2, for each k ∈ {1, . . . , K − 1}. Since the degree of node i in Ḡ_k is d̄_{i,k} = (d_{i,k}/vol(G_k) + d_{i,k+1}/vol(G_{k+1}))/2 and Σ_{i=1}^n d̄_{i,k} = 1, the structural information of Ḡ_k is H(Ḡ_k) = −Σ_{i=1}^n f(d̄_{i,k}), which takes Θ(n) time for each k. Therefore, the total computational cost is Θ((2K − 1)n).
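The direct Θ((2K − 1)n) computation above, and the incremental formulas of Lemma 3 used by Algorithm 2 below, can both be sketched from degree sequences alone. In the following sketch (our helper names; base-2 logarithms; f(x) = x log₂ x), lemma3_update reproduces the directly computed H(G_{k+1}) and H(Ḡ_k):

```python
import math

def f(x):
    # f(x) = x * log2(x), with f(0) = 0
    return 0.0 if x <= 0 else x * math.log2(x)

def H_direct(deg):
    # structural information: H(G) = log2(vol) - sum_i f(d_i) / vol
    vol = sum(deg)
    return math.log2(vol) - sum(f(d) for d in deg) / vol

def H_avg_direct(deg_old, deg_new):
    # averaged graph: degrees (d_i/vol_old + d'_i/vol_new)/2, summing to 1
    v1, v2 = sum(deg_old), sum(deg_new)
    bar = [(a / v1 + b / v2) / 2 for a, b in zip(deg_old, deg_new)]
    return -sum(f(x) for x in bar)

def lemma3_update(deg, H_k, delta):
    # deg: degree list of G_k; H_k = H(G_k); delta: {node: signed degree change}
    m = sum(deg) / 2
    dm = sum(delta.values()) / 2
    a = sum(f(deg[i] + delta[i]) - f(deg[i]) for i in delta)
    H_next = (f(2 * (m + dm)) - a - f(2 * m) + 2 * m * H_k) / (2 * (m + dm))
    y = sum(deg[i] for i in delta)
    z = sum(f(deg[i]) for i in delta)
    c = (2 * m + dm) / (4 * m * (m + dm))
    b = sum(f(deg[i] / (4 * m) + (deg[i] + delta[i]) / (4 * (m + dm)))
            for i in delta)
    H_bar = -(b + (2 * m - y) * f(c) + c * (f(2 * m) - 2 * m * H_k - z))
    return H_next, H_bar

deg = [2, 3, 1, 2, 2]            # degree sequence of G_k
delta = {0: 1, 2: 1}             # one edge (0, 2) inserted
deg_new = [3, 3, 2, 2, 2]        # degree sequence of G_{k+1}

H_k = H_direct(deg)
H_next, H_bar = lemma3_update(deg, H_k, delta)
dist = math.sqrt(max(0.0, H_bar - (H_k + H_next) / 2))   # D_SI(G_k, G_{k+1})
```

Note that lemma3_update only iterates over the perturbed nodes, which is what gives the incremental algorithm its Θ(|V_k|) per-step cost.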
Algorithm 2:
IncreSim
Input: G_1 and {∆G_k}_{k=1}^{K−1}
Output: {D_SI(G_k, G_{k+1})}_{k=1}^{K−1}
1   d ← the degree sequence of the graph G_1;
2   m ← Σ_{i=1}^n d_i / 2;
3   H(G_1) ← log(2m) − (1/(2m)) Σ_{i=1}^n f(d_i);
4   for k = 1, . . . , K − 1 do
5       ∆d ← the degree sequence of the signed graph ∆G_k;
6       ∆m ← Σ_{i∈V_k} ∆d_i / 2;
7       compute a, b, y, z in Lemma 3 via iterating over V_k;
8       compute H(G_{k+1}) and H(Ḡ_k) based on Lemma 3;
9       D_SI(G_k, G_{k+1}) ← √( H(Ḡ_k) − (H(G_k) + H(G_{k+1}))/2 );
10      m ← m + ∆m;
11      foreach i ∈ V_k do d_i ← d_i + ∆d_i;
12  return {D_SI(G_k, G_{k+1})}_{k=1}^{K−1}

In practice, the graph stream is fully dynamic, so it is more efficient to represent it as a stream of edge insertions and deletions over time rather than as a sequence of graphs. Suppose that the graph stream is represented as an initial base graph G_1 = (V, E_1, t_1) and a sequence of graph changes {∆G_k = (V_k, E_{+,k}, E_{−,k}, t_k)}_{k=1}^{K−1}, where t_k is the timestamp of the set E_{+,k} of edge insertions and the set E_{−,k} of edge deletions, and V_k is the subset of nodes covered by E_{+,k} ∪ E_{−,k}. We can view the graph change ∆G_k as a signed network in which each edge in E_{+,k} has positive weight +1 and each edge in E_{−,k} has negative weight −1. The degree of node i ∈ V_k in the graph change ∆G_k refers to Σ_{j∈V_k} (I{(i, j) ∈ E_{+,k}} − I{(i, j) ∈ E_{−,k}}). Using the information about the previous graph G_k and the current graph change ∆G_k, we can compute the entropy statistics of the current graph G_{k+1} incrementally and efficiently via the following lemma, whose proof can be found in the appendix.

Lemma 3.
Using the degree sequence d of the graph G_k, the structural information H(G_k), and the degree sequence ∆d of the signed graph ∆G_k, the structural information of the graph G_{k+1} can be efficiently computed as

H(G_{k+1}) = (f(2(m + ∆m)) − a − f(2m) + 2m H(G_k)) / (2(m + ∆m)),

where m = Σ_{i=1}^n d_i/2, ∆m = Σ_{i∈V_k} ∆d_i/2, and a = Σ_{i∈V_k} (f(d_i + ∆d_i) − f(d_i)). Moreover, the structural information of the averaged graph Ḡ_k between G_k and G_{k+1} can be efficiently computed as

H(Ḡ_k) = −b − (2m − y) f(c) − c (f(2m) − 2m H(G_k) − z),

where y = Σ_{i∈V_k} d_i, z = Σ_{i∈V_k} f(d_i), c = (2m + ∆m)/(4m(m + ∆m)), and b = Σ_{i∈V_k} f( d_i/(4m) + (d_i + ∆d_i)/(4(m + ∆m)) ).

The pseudocode of our fast algorithm IncreSim for computing the structural information distance in a graph stream is shown in Algorithm 2. It starts by computing the structural information of the base graph G_1 (lines 1-3), which takes Θ(n) time. In each following iteration, it first computes the values of a, b, c, y, z (lines 5-7), then calculates the structural information distance between two adjacent graphs (lines 8-9), and finally updates the edge count m and the degree sequence d (lines 10-11). The time cost of each iteration is Θ(|V_k|); consequently, the total time complexity is Θ(n + Σ_{k=1}^{K−1} |V_k|).

VI. CONNECTIONS WITH COMMUNITY STRUCTURE
In this section, we discuss the connections between thegraph entropy and community structures in graphs.
A. Empirical Analysis of Stochastic Block Model

1) Preparations:
To study the connections between graph entropy and community structure in a specific ensemble of graphs, suppose that a graph G is generated by the stochastic block model. There are q groups of nodes, and each node v has a group label g_v ∈ {1, . . . , q}. Edges are generated independently according to a matrix P ∈ [0, 1]^{q×q} of probabilities, with Pr(A_uv = 1) = P[g_u, g_v]. In the sparse case, we have P[a, b] = C[a, b]/n, where the affinity matrix C stays constant in the limit n → ∞. For simplicity we make the common assumption that the affinity matrix C has two distinct entries: C[a, b] = c_in if a = b and C[a, b] = c_out if a ≠ b. For any graph generated from the stochastic block model with two (three) groups, we use the spectral algorithm in Algorithm 3 (Algorithm 4) to detect the community structure.

Algorithm 3: 2-Spectral Clustering
Input: The graph G = (V, E) of order n
Output: A cluster membership vector cl ∈ {0, 1}^n
1   cl ← 0;
2   L ← Laplacian matrix of the graph G;
3   v_2 ← eigenvector corresponding to λ_2 of L;
4   for i = 1, . . . , n do
5       if v_2[i] < 0 then cl[i] = 1;
6   return cl

Algorithm 4: 3-Spectral Clustering
Input: The graph G = (V, E) of order n
Output: A cluster membership vector cl ∈ {0, 1, 2}^n
1   L ← Laplacian matrix of the graph G;
2   v_2 ← eigenvector corresponding to λ_2 of L;
3   v_3 ← eigenvector corresponding to λ_3 of L;
4   cl ← k-means clustering of [v_2, v_3] with k = 3;
5   return cl
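The sign-based clustering of Algorithm 3 can be sketched in a few lines of NumPy. On a toy graph with two planted communities (two 4-cliques joined by one bridge edge) it recovers the ground truth exactly; variable names are ours:

```python
import numpy as np

def two_spectral_clustering(A):
    # Algorithm 3: the sign of the Fiedler vector (eigenvector of the second
    # smallest Laplacian eigenvalue) gives the 2-way cluster membership
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    _, vecs = np.linalg.eigh(L)        # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]
    return (fiedler < 0).astype(int)

# two 4-cliques joined by a single bridge edge (3, 4)
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0
cl = two_spectral_clustering(A)
```

The two cliques receive opposite labels regardless of the eigenvector's sign convention.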
2) Evaluation Metrics:
For each synthetic graph, we compute the structural information, the von Neumann graph entropy, the Laplacian eigenvalues λ_1, λ_2, λ_3, . . . of small magnitude, the spectral gaps λ_{k+1} − λ_k, and the detection error. Let P = {P_1, . . . , P_k} and Q = {Q_1, . . . , Q_k} be two k-partitions of V. We view P as the ground-truth community structure and Q as the detected community structure; the detection error is then

min_σ Σ_{i=1}^k |P_i △ Q_{σ(i)}|,

where σ ranges over all bijections σ : {1, . . . , k} → {1, . . . , k} and △ represents the symmetric difference.
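The detection error can be implemented by brute force over the k! label bijections, which is adequate for the small k used here (function name ours):

```python
from itertools import permutations

def detection_error(P, Q):
    # min over label bijections sigma of sum_i |P_i symmetric-difference Q_sigma(i)|
    k = len(P)
    P = [set(s) for s in P]
    Q = [set(s) for s in Q]
    return min(sum(len(P[i] ^ Q[sig[i]]) for i in range(k))
               for sig in permutations(range(k)))

perfect = detection_error([{1, 2, 3}, {4, 5, 6}], [{4, 5, 6}, {1, 2, 3}])
off_by_one = detection_error([{1, 2, 3}, {4, 5, 6}], [{1, 2}, {3, 4, 5, 6}])
```

A relabeled but otherwise correct partition scores 0; moving one node across the boundary costs 2 (it appears in both symmetric differences).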
3) Empirical Results:
The results are shown in Fig. 2 and Fig. 3, from which we have the following observations: • Observation 1 (Dynamics of graph entropy):
Both the von Neumann graph entropy and the structural information are stationary with small fluctuations and linearly correlated as c_out varies. • Observation 2 (Dynamics of eigenvalues):
In the stochastic block model with two clusters of equal size, the second smallest eigenvalue λ_2 linearly increases and finally reaches a steady state as c_out increases, while the eigenvalues λ_3 and λ_4 above it are stationary all the time. In the stochastic block model with three clusters of equal size, the second smallest eigenvalue λ_2 and the third smallest eigenvalue λ_3 linearly increase and finally reach a steady state as c_out increases, while the eigenvalues λ_4 and λ_5 above them are stationary all the time. • Observation 3 (Phase transition):
In the stochastic block model with two clusters of equal size, both the detection error of the spectral algorithm and the spectral gap λ_3 − λ_2 undergo the same phase transition as c_out varies, while the spectral gap λ_4 − λ_3 is stationary and close to 0 all the time. For example, in Fig. 2(b), when c_out is below the transition point, the spectral algorithm discovers the true clusters correctly and λ_3 − λ_2 is significantly larger than λ_4 − λ_3; above the transition point, the spectral algorithm works like a random guess and λ_3 − λ_2 is mixed with λ_4 − λ_3. In the stochastic block model with three clusters of equal size, both the detection error of the spectral algorithm and the spectral gap λ_4 − λ_3 undergo the same phase transition as c_out varies, while the spectral gap λ_5 − λ_4 is stationary and close to 0 all the time. Empirically, we conclude that: 1) Graph entropy and community structure:
Both the vonNeumann graph entropy and structural information re-veals nothing about the community structure and assor-tativity/disassortativity of graphs.2)
Spectral gaps and community structure:
If a graph has significant community structure with k clusters, then the spectral gap λ_{k+1} − λ_k should be significantly larger than λ_{k+2} − λ_{k+1} and λ_k − λ_{k−1}. Conversely, if there is a significant spike in the sequence of spectral gaps {λ_{i+1} − λ_i}_{i=1}^{n−1} of a graph, the graph should have significant community structure that can be easily detected by some algorithms.

B. Adversarial Attacks on Community Detection
The empirical findings provide the intuition that the ground-truth community structure cannot be easily detected if the spikes in the sequence of spectral gaps are suppressed. Therefore, we are interested in solving the following community obfuscation problem by exploiting the Laplacian spectrum.
Problem 3 (Community Obfuscation). Minimally perturb the graph G = (V, E) with community structure P such that P cannot be easily detected by algorithms.

Unlike graphs generated from the stochastic block model, real-world graphs have an unknown number of clusters with varying sizes. Therefore, it is hard to predict where the spike lies in the sequence of spectral gaps, and it is computationally expensive to obtain the full sequence of spectral gaps. Since the spikes represent an uneven distribution of the spectrum, we can alternatively hide the community structure by maximizing some homogeneity measure of the Laplacian spectrum. Besides the von Neumann graph entropy H_vn(G), we propose another homogeneity measure called spectral polarization.

Definition 8 (Spectral polarization). The spectral polarization P(G) of a graph G = (V, E) of order n is defined as

P(G) = Σ_{i=1}^n ( λ_i/vol(G) − λ̄/vol(G) )²,

where λ_1 ≤ λ_2 ≤ · · · ≤ λ_n are the eigenvalues of the Laplacian matrix of the graph G, λ̄ = (1/n) Σ_{i=1}^n λ_i is the average eigenvalue, and vol(G) = Σ_{i=1}^n λ_i is the volume of G.

Lemma 4. P(G) = 1/vol(G) − 1/n + Σ_{i=1}^n d_i² / vol²(G).

Proof.

P(G) = (1/vol²(G)) Σ_{i=1}^n (λ_i − λ̄)²
     = (1/vol²(G)) ( Σ_{i=1}^n λ_i² − 2λ̄ Σ_{i=1}^n λ_i + nλ̄² )
     = (1/vol²(G)) ( Σ_{i=1}^n d_i² + Σ_{i=1}^n d_i − nλ̄² )
     = 1/vol(G) − 1/n + Σ_{i=1}^n d_i² / vol²(G).

Now suppose that we are allowed to add at most k new edges to G to hide the community structure. We can use Algorithm 1 either to approximately maximize the spectral entropy H_vn(G) or, by resetting the edge centrality to EC(u, v) = d_u + d_v, to minimize the spectral polarization P(G).

C. Effectiveness of von Neumann Graph Entropy and Spectral Polarization in Community Obfuscation
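As groundwork for this section, the degree-based identity of Lemma 4, which lets spectral polarization be tracked from degrees alone without any eigendecomposition, can be verified numerically (helper names ours):

```python
import numpy as np

def polarization_spectral(A):
    # definition: P(G) = sum_i (lambda_i/vol - mean(lambda)/vol)^2
    deg = A.sum(axis=1)
    lam = np.linalg.eigvalsh(np.diag(deg) - A)
    vol = lam.sum()                      # tr(L) = sum of degrees
    return float(((lam / vol - lam.mean() / vol) ** 2).sum())

def polarization_degrees(A):
    # Lemma 4: P(G) = 1/vol - 1/n + sum(d_i^2)/vol^2
    deg = A.sum(axis=1)
    vol = deg.sum()
    return float(1.0 / vol - 1.0 / len(deg) + (deg ** 2).sum() / vol ** 2)

# a small random undirected graph
rng = np.random.default_rng(7)
n = 12
U = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
A = U + U.T
```

The degree-based form is what makes the EC(u, v) = d_u + d_v greedy rule cheap: adding edge (u, v) changes Σ d_i² by exactly 2(d_u + d_v + 1).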
We use differential analysis to show that both maximizing the von Neumann graph entropy H_vn(G) and minimizing the spectral polarization P(G) are effective for community obfuscation.

Theorem 7.
Minimally perturbing the graph G = (V, E) by greedily maximizing the von Neumann graph entropy can effectively hide the community structure.

Fig. 2: The structural information, von Neumann graph entropy, Laplacian spectrum, spectrum gap, and detection error of synthetic graphs from the stochastic block model with two clusters of equal size. Panels: (a) d = 5, c_in + c_out = 10; (b) d = 15, c_in + c_out = 30; (c) d = 25, c_in + c_out = 50.

Fig. 3: The structural information, von Neumann graph entropy, Laplacian spectrum, spectrum gap, and detection error of synthetic graphs from the stochastic block model with three clusters of equal size. Panels: (a) d = 5, c_in + 2c_out = 15; (b) d = 15, c_in + 2c_out = 45; (c) d = 25, c_in + 2c_out = 75.

Theorem 8.
Minimally perturbing the graph G = (V, E) by greedily minimizing the spectral polarization can effectively hide the community structure.

Since the proofs of Theorem 7 and Theorem 8 are similar, we only prove Theorem 7 for reference.
Proof of Theorem 7.
Suppose that we minimally perturb the graph G by adding a new edge e. The Laplacian spectrum of the original graph G is denoted by λ(G) = (λ_1, . . . , λ_n), and the Laplacian spectrum of the perturbed graph G′ = (V, E ∪ {e}) is denoted by λ(G′) = (λ′_1, . . . , λ′_n). According to classic matrix perturbation theory, λ′_i = λ_i + δλ_i for any i ∈ {1, . . . , n}, where δλ_i ≥ 0 is a very small increment. The sum of these increments is

Σ_{i=1}^n δλ_i = Σ_{i=1}^n λ′_i − Σ_{i=1}^n λ_i = 2.

Since both G and G′ are assumed to be connected, λ′_1 = λ_1 = δλ_1 = 0 and λ′_2 > λ_2 > 0.
According to (3), maximizing H_vn(G′) is equivalent to minimizing

Σ_{i=1}^n f(λ′_i) = Σ_{i=1}^n f(λ_i + δλ_i) = Σ_{i=2}^n (f(λ_i) + f′(λ_i) · δλ_i) (to first order) = Σ_{i=2}^n f′(λ_i) · δλ_i + Σ_{i=2}^n f(λ_i). (7)

Therefore, the optimal edge e can be found by minimizing Σ_{i=2}^n f′(λ_i) · δλ_i subject to the constraints Σ_{i=2}^n δλ_i = 2, λ′_i ≤ λ′_{i+1} for any i ∈ {1, . . . , n − 1}, and δλ_i ≥ 0 for any i ∈ {2, . . . , n}. Since f′(λ_2) ≤ f′(λ_3) ≤ · · · ≤ f′(λ_n), the optimal edge e maximizing H_vn(G′) assigns larger values to δλ_2, δλ_3, . . . than to δλ_n, δλ_{n−1}, . . .. Therefore, the spectral gaps indicating the community structure should disappear very quickly if we greedily maximize H_vn(G) by adding edges one by one.

Corollary 3.
Minimally perturbing the graph G = (V, E) by maximizing the structural information H(G) can effectively hide the community structure.

Corollary 4.
Detecting the community structure in a d-regular graph G_d is hard.

Proof. According to Corollary 2, 0 ≤ ∆H(G_d) ≤ (log e)/d. Since H(G_d) = log n, we have

H_vn(G_d) = H(G_d) − ∆H(G_d) ∈ [ log n − (log e)/d, log n ].

Therefore, H_vn(G_d) is close to its maximum value log(n − 1), implying that the spectral gaps λ_{i+1} − λ_i → 0 for any i. According to the relation between spectral gaps and the significance of community structure, G_d has no significant community structure.

VII. EXPERIMENTS AND EVALUATIONS
We conduct extensive experiments over both synthetic and real-world datasets to answer the following questions:
Q1. Universality of the entropy gap over arbitrary simple graphs: Is the entropy gap close to 0 for a wide range of graphs? Is the structural information a good proxy of the von Neumann graph entropy for a wide range of graphs?
Q2. Sensitivity of the entropy gap to graph properties: How do graph properties affect the value of the entropy gap?
Q3. Accuracy of the approximation: As a proxy of the von Neumann graph entropy, is the structural information more accurate than its prominent competitors?
Q4. Speed of the computation: Is the computation of the structural information faster than that of its prominent competitors?
Q5. Extensibility of the entropy gap to weighted graphs: Is the entropy gap sensitive to the change of edge weights? Is the entropy gap still close to 0 for weighted graphs?
Q6. Performance analysis (Appendix A): What is the performance of EntropyAug (Algorithm 1) in maximizing the von Neumann graph entropy? What is the performance of IncreSim (Algorithm 2) in analyzing graph streams? Can the structural information distance be further used to detect anomalies in a graph stream? Are maximizing the von Neumann graph entropy and minimizing the spectral polarization effective in hiding community structure?

TABLE IV: Real-world datasets used in our experiments.

Name | Nodes | Edges | Category | Statistics
Static graphs without timestamps (statistic: avg. degree)
Zachary (ZA) | 34 | 78 | Friendship | 4.59
Dolphins (DO) | 62 | 159 | Animal | 5.13
Jazz (JA) | 198 | 2,742 | Contact | 27.70
Skitter (SK) | 1,696,415 | 11,095,298 | Internet | 13.08
Brightkite (BK) | 58,228 | 214,078 | Friendship | 7.35
Caida (CA) | 26,475 | 53,381 | Internet | 4.03
YouTube (YT) | 1,134,890 | 2,987,624 | Friendship | 5.27
LiveJournal (LJ) | 3,997,962 | 34,681,189 | Friendship | 17.35
Pokec (PK) | 1,632,803 | 22,301,964 | Friendship | 27.32
Dynamic graphs with timestamps (statistic: snapshots)
Wiki-IT (WK) | 1,204,009 | 34,826,283 | Hyperlink | 100
Facebook (FB) | 61,096 | 788,135 | Friendship | 29
A. Experimental Settings
Datasets: We consider both synthetic and real-world graphs. The synthetic graphs are generated from three well-known random graph models: the Erdős–Rényi (ER) model, the Barabási–Albert (BA) model [51], and the Watts–Strogatz (WS) model [52]. The real-world graphs [53], [54], [55] used in our experiments are listed in Table IV; they contain both static graphs with varying size and average degree, and temporal graphs with varying size and time span. In every static graph, we ignore the direction and weight of all edges and remove both self-loops and multiple edges. We treat every temporal graph as a stream of undirected weighted edges with timestamps. For the convenience of analysis, we partition these edges into several groups, where each group falls within a certain time interval.
Hardware: The experiments have been performed on a server with an Intel(R) Xeon(R) CPU at 2.40 GHz (32 virtual cores) and 256 GB RAM, averaging 10 runs for random algorithms and random inputs unless stated otherwise.
Implementation : All of the proposed algorithms and baselinesare implemented in Python.
B. Q1. Universality (Fig. 4)
To evaluate the universality of the entropy gap, we measure the structural information and the exact von Neumann entropy on a set of synthetic graphs with 2,000 nodes. For the ER and BA models, we generate graphs with varying average degree; for the WS model, we generate graphs with varying edge rewiring probability for each of several average degrees. We additionally measure the sharpened lower and upper bounds of the entropy gap. The results are shown in Fig. 4.

The observations are threefold. First, the entropy gap is close to 0 for a wide range of graphs: the entropy gap of each synthetic graph is a small constant, whereas the exact von Neumann entropy is far larger. Second, the entropy gap is negatively correlated with the average degree; dense graphs tend to have very small entropy gaps. Third, the structural information is linearly correlated with the von Neumann graph entropy, with only a few exceptions. There is no exception for the ER synthetic graphs; for the BA synthetic graphs, the exceptions are graphs with extremely small average degree, and for the WS synthetic graphs, the exceptions are graphs with extremely small edge rewiring probability.
To evaluate the sensitivity of the entropy gap to graph properties such as average degree, graph size, and rewiring probability, we further measure the entropy gap of synthetic graphs of varying size for each random graph model. The average degree is chosen from {2, 5, 10, 20, 50, 100} for the ER model and from {2, 6, 10, 20, 50, 100} for the BA model, and the edge rewiring probability for the WS model (with d̄ = 20) varies between 0 and 1.

The observations from Fig. 4 and Fig. 5 are threefold. First, the entropy gap decreases as the average degree increases for all three random graph models. Second, the entropy gap decreases as the edge rewiring probability increases for the WS model. Third, the entropy gap is nearly insensitive to the change of graph size.
To evaluate the accuracy of the structural information as an approximation of the von Neumann graph entropy, we measure the structural information, the exact von Neumann entropy (when the graph size is small), and three prominent approximations (as competitors) on 9 real-world static graphs. The competitors are 1) FINGER-Ĥ [10], defined as Ĥ_F(G) = −Q log(λ_max/tr(L)), where Q = 1 − tr(L²)/tr²(L); 2) FINGER-H̃ [10], defined as H̃_F(G) = −Q log(2 d_max/tr(L)); and 3) SLaQ [2]. The results in Fig. 6 show that the structural information is an accurate approximation of the von Neumann graph entropy. The approximation error of the structural information is obviously much smaller than those of Ĥ_F and H̃_F, and it is comparable to the approximation error of SLaQ, with only a few exceptions such as YT and SK, where the structural information is slightly better.
To evaluate the computational speed of the structural information, we measure the running time of the structural information and its three competitors on 9 real-world static graphs. The results in Fig. 7 show that the computation of the structural information is fast: it is about 2 orders of magnitude faster than Ĥ_F, at least 2 orders of magnitude faster than SLaQ, and comparable to H̃_F. Combining Fig. 6 and Fig. 7, we conclude that the structural information is the only approximation that achieves both high efficiency and high accuracy among the prominent methods.
To evaluate the extensibility of the entropy gap to weighted graphs, we measure the entropy gap of synthetic weighted graphs. Specifically, we choose 3 small real-world graphs (ZA, DO, JA), a complete graph, and a ring graph, the latter two each with 1,000 nodes. The weight of each edge is set uniformly at random in the range [1, w], and we repeat the experiments for a range of values of w. The results in Fig. 8 show that the entropy gap is insensitive to the change of edge weights in these graphs. Therefore, with high probability the entropy gap is still very small for a wide range of weighted graphs.

VIII. CONCLUSIONS AND FUTURE WORK
In this work, we suggest using the structural information as a proxy of the von Neumann graph entropy such that provable accuracy, scalability, and interpretability are achieved at the same time. Since the experimental results show that the entropy gap is insensitive to the graph size, we can estimate the entropy gap of a very large graph using small graphs generated from the same generative random graph model. We believe that our idea also provides new insights into approximations of graph spectral descriptors: besides function approximation, we can try to approximate the graph spectrum using simple and easily available graph statistics, such as the degree sequence.

There are multiple tangible research fronts we can pursue. First, in some access-limited scenarios such as the World Wide Web, the complete degree sequence is often not available; we therefore need to develop sampling-based methods to estimate the structural information. Second, both the von Neumann graph entropy and the structural information can be viewed as functions on the edge set; their properties, such as submodularity and monotonicity, are still under exploration. Last, the approximation of the von Neumann entropy defined on the eigenvalues of the normalized Laplacian matrix is still in its infancy.

APPENDIX A
ADDITIONAL EXPERIMENTS
A. Performance of EntropyAug (Fig. 9)
To evaluate the performance of EntropyAug (Algorithm 1) in maximizing the von Neumann graph entropy, we measure the running time and the dynamics of the von Neumann graph entropy for EntropyAug and two competitors on three small real-world graphs: ZA, DO, and JA. The two baselines are 1) "random", which adds $k$ non-existing edges at random, and 2) "algebraic" [18], which greedily adds the $k$ non-existing edges that lead to the largest increase of the algebraic connectivity $\lambda_{n-1}$. We believe the "algebraic" algorithm is a competent competitor, since maximizing $\lambda_{n-1}$ would make the Laplacian spectrum concentrate around its mean, thereby maximizing the von Neumann entropy. The results in Fig. 9 show that EntropyAug is the only one that achieves both high efficiency and large increments of the von Neumann graph entropy.
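As a point of reference, the greedy baseline described above can be prototyped in a few lines. The sketch below (our own helper names; a brute-force stand-in that recomputes the exact spectrum per candidate edge, not the structural-information-accelerated EntropyAug of Algorithm 1) illustrates greedy entropy-maximizing edge addition:

```python
import numpy as np
from itertools import combinations

def vn_entropy(A):
    # Von Neumann graph entropy: Shannon entropy (base 2) of the
    # Laplacian spectrum normalized by the volume vol(G) = sum of degrees.
    L = np.diag(A.sum(axis=1)) - A
    lam = np.linalg.eigvalsh(L)
    p = lam / lam.sum()
    p = p[p > 1e-12]  # drop zero (and numerically tiny) eigenvalues
    return float(-(p * np.log2(p)).sum())

def greedy_entropy_add(A, k):
    # Greedily add k non-existing edges, at each step picking the edge whose
    # addition yields the largest von Neumann entropy.  Exhaustive O(n^2)
    # candidate scan per step -- for illustration on small graphs only;
    # assumes the graph has at least k non-edges.
    A = A.copy()
    n = A.shape[0]
    for _ in range(k):
        best_edge, best_h = None, -np.inf
        for i, j in combinations(range(n), 2):
            if A[i, j] == 0:
                A[i, j] = A[j, i] = 1.0       # tentatively add the edge
                h = vn_entropy(A)
                if h > best_h:
                    best_edge, best_h = (i, j), h
                A[i, j] = A[j, i] = 0.0       # undo
        i, j = best_edge
        A[i, j] = A[j, i] = 1.0               # commit the best edge
    return A
```

Each greedy step costs a full eigendecomposition per candidate here, which is exactly the overhead that a degree-based (structural information) proxy avoids.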
Fig. 4: The structural information, von Neumann graph entropy, and entropy gap of synthetic graphs generated from three random graph models, (a) ER, (b) BA, and (c) WS, with varying average degree and edge rewiring probability.

Fig. 5: Effects of input graph properties (number of nodes, average degree $\bar{d}$, edge rewiring probability $p$) on the entropy gap for three random graph models: (a) ER, (b) BA, (c) WS ($\bar{d} = 20$).
Fig. 6: Structural information is an accurate proxy of the von Neumann graph entropy on the nine real-world graphs (ZA, DO, JA, CA, BK, YT, SK, LJ, PK). The exact von Neumann graph entropy lies in the red dotted box whose height is $\log_2 e$.

Fig. 7: The computation of structural information is fast.
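The speed advantage is easy to see from the definition: the structural information needs only a single pass over the degree sequence, with no spectral computation at all. A minimal sketch (our own function name):

```python
import math

def structural_information(degrees):
    # Structural information: Shannon entropy (base 2) of the degree sequence
    # normalized by the volume vol(G) = sum of degrees.  One O(n) pass over
    # the degrees -- no eigendecomposition, no matrix at all.
    vol = float(sum(degrees))
    return -sum((d / vol) * math.log2(d / vol) for d in degrees if d > 0)
```

For example, a ring of $n$ nodes (all degrees 2) gives exactly $\log_2 n$, matching the closed form $H(R_n) = \log_2 n$ derived in Appendix B.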
B. Performance of IncreSim (Fig. 10)
To evaluate the performance of IncreSim (Algorithm 2) and its relation with the VEO score, we measure the distance between adjacent graphs in two real-world temporal graphs. We choose three methods (IncreSim, the VEO score, and deltaCon) along with two simple measures (the number of added edges and the number of deleted edges). The VEO score [56] between two adjacent graphs $G_t$ and $G_{t+1}$ is defined as
$$1 - \frac{2\left(|V_t \cap V_{t+1}| + |E_t \cap E_{t+1}|\right)}{|V_t| + |V_{t+1}| + |E_t| + |E_{t+1}|},$$
which measures the rate of change of the node set and edge set. DeltaCon [57] is a prominent method that measures graph similarity based on fast belief propagation. The results are shown in Fig. 10.

Fig. 8: The entropy gap is insensitive to the edge weights.

Fig. 9: Compared with the other two methods, our structural information based method is the only one that achieves both high efficiency and large increments of the von Neumann graph entropy: (a) Zachary (ZA), (b) Dolphins (DO), (c) Jazz (JA).

Fig. 10: Distance between adjacent graphs in graph streams. The number of added/deleted edges is divided by the total number of added/deleted edges.

The observations are twofold. First, the structural information distance is linearly correlated with the VEO score, indicating that the structural information distance is not dominated by local information only, but is rather a global measure on the graphs. On both the FB and WK temporal graphs, the Pearson and Spearman rank-order correlation coefficients of $D_{SI}$ with the VEO score are much higher than those of deltaCon with the VEO score. Second, all three methods effectively capture the dynamics of the graph streams. For the FB temporal graph, the trends of the three distance measures are similar. For the WK temporal graph, the distance measures change dramatically in the beginning and then gradually flatten, which implies that the structure of the WK temporal graph gradually becomes stable.
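Both distance measures used above are straightforward to compute. The sketch below (our own function names, assuming the two degree sequences are given over a common node set) follows the definition of $D_{SI}$ as the square root of the Jensen-Shannon divergence between the volume-normalized degree distributions (Appendix C) and the VEO score of [56]:

```python
import math

def _entropy(p):
    # Shannon entropy in bits of a probability vector.
    return -sum(x * math.log2(x) for x in p if x > 0)

def d_si(deg1, deg2):
    # Structural information distance:
    # D_SI(G1, G2) = sqrt(D_JS(P1, P2)), where P_j is the degree
    # sequence of G_j normalized by its volume.
    v1, v2 = float(sum(deg1)), float(sum(deg2))
    p = [d / v1 for d in deg1]
    q = [d / v2 for d in deg2]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution
    return math.sqrt(max(0.0, _entropy(m) - (_entropy(p) + _entropy(q)) / 2))

def veo_score(V1, E1, V2, E2):
    # VEO distance: 1 - 2(|V1 ∩ V2| + |E1 ∩ E2|) / (|V1|+|V2|+|E1|+|E2|).
    overlap = len(V1 & V2) + len(E1 & E2)
    return 1.0 - 2.0 * overlap / (len(V1) + len(V2) + len(E1) + len(E2))
```

Identical graphs give distance 0 under both measures; completely disjoint graphs give VEO distance 1.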
C. Performance in Anomaly Detection (Fig. 11)
We further evaluate the effectiveness of the structural information distance in detecting distributed denial-of-service (DDoS) attacks in a graph stream. We first generate a stream of synthetic graphs $G = \{G_t\}_{t=1}^{T}$ from the BA model, each of which has 100 nodes and average degree $\bar{d} = 4$. We believe that the synthetic graph stream $G$ is a good representative of real-world scale-free graph streams. We then model a DDoS attack of strength $k$ as follows: (1) Randomly select a graph $G_{t^*}$ from $G$. (2) Transform $G_{t^*}$ into an anomalous graph $G'_{t^*}$: randomly select a target node $v$, then randomly select $k$ source nodes $S = \{s_i\}_{i=1}^{k}$, and finally connect the target node $v$ with the source node $s_i$ for each $i \in \{1, \ldots, k\}$. (3) Generate the anomalous graph stream $G'$ by replacing the graph $G_{t^*}$ in $G$ with $G'_{t^*}$.

We use a graph distance measure to rank the anomalous graph in a graph stream. Suppose that the distance between $G_t$ and $G_{t+1}$ is $\theta_{t,t+1}$; then the anomaly score for $G_t$ is $\theta_{t-1,t} + \theta_{t,t+1}$. We rank the graphs by their anomaly scores in descending order and use the rank of the true synthetic anomalous graph to measure the effectiveness of the graph distance measure in detecting DDoS attacks. We choose four candidate graph similarity measures, $D_{SI}$, $D_{QJS}$, the VEO score, and deltaCon, and repeat the random DDoS attacks 100 times for each attack strength $k \in \{5, 10, 20, 30, 40, 50\}$. The results are shown in Fig. 11.

The observations are twofold. First, $D_{SI}$ and $D_{QJS}$ behave similarly in analyzing graph streams; their trends on the synthetic graph stream $G$ are nearly identical. Second, the structural information distance $D_{SI}$ is well suited for detecting DDoS attacks in a graph stream: it performs better than the other competitors at the larger attack strengths.
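The ranking scheme above can be sketched as follows (function name is ours; `theta` holds the distances between adjacent graphs in the stream):

```python
def anomaly_ranks(theta):
    # theta[t] = distance between adjacent graphs G_t and G_{t+1} in a stream
    # of T graphs (len(theta) == T - 1).  Each interior graph G_t is scored
    # by theta[t-1] + theta[t]; graph indices are returned sorted by score,
    # most anomalous first.
    scores = {t: theta[t - 1] + theta[t] for t in range(1, len(theta))}
    return sorted(scores, key=scores.get, reverse=True)
```

An attacked graph $G_{t^*}$ is far from both of its neighbors, so it receives a large score and rises to the top of the ranking.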
At the two smallest attack strengths, the performance of all the distance measures is mainly affected by the properties of the original normal graph stream.

D. Performance in Community Obfuscation
To evaluate the performance of maximizing the von Neumann graph entropy (denoted $A_1$) and minimizing the spectral polarization (denoted $A_2$) in community obfuscation, we measure the dynamics of the spectral gaps, detection error, graph entropy, and spectral polarization in the greedy edge addition process. We evenly allocate the budget of edge additions among all the community pairs. The results are shown in Fig. 12 and Fig. 13.

The observations are fourfold. First, the graph entropy is monotonically increasing w.r.t. the number of added edges. Second, the spectral polarization is monotonically decreasing w.r.t. the number of added edges. Third, the detection error is monotonically increasing w.r.t. the number of added edges; therefore, both $A_1$ and $A_2$ are effective in community obfuscation. Fourth, the spectral gap, which indicates the existence of community structure, slightly increases in the beginning and then drops quickly as more edges are added.

APPENDIX B
PROOF OF TABLE III
A. Preliminaries: Several Integrals
Lemma 5.
The integral $I_1 \triangleq \int_0^{\pi} \log_2(1-\cos(x))\,\mathrm{d}x = -\pi$.

Proof. Let $x = t + \pi/2$; then $\mathrm{d}x = \mathrm{d}t$, and since $\cos(t+\pi/2) = -\sin(t)$,
$$\int_{\pi/2}^{\pi} \log_2(1-\cos(x))\,\mathrm{d}x = \int_0^{\pi/2} \log_2(1+\sin(t))\,\mathrm{d}t = \int_0^{\pi/2} \log_2(1+\cos(\pi/2-t))\,\mathrm{d}t.$$
(a) Distances between adjacent graphs in a synthetic graph stream $G$ from the BA model with 100 nodes and $\bar{d} = 4$. (b) Rank of the anomalous graph under DDoS attacks of strength 5, 10, 20, 30, 40, and 50 in the synthetic graph stream.
Fig. 11: Structural information distance is well suited for detecting DDoS attacks in a graph stream.

Fig. 12: Community obfuscation on two graphs generated from the stochastic block model, with two clusters of equal size: (a) $c_{in} = 28$, $c_{out} = 2$; (b) $c_{in} = 10$, $c_{out} = 40$.

Fig. 13: Community obfuscation on graphs generated from the stochastic block model, with three clusters of equal size: (a) $c_{in} = 35$, $c_{out} = 5$; (b) $c_{in} = 55$, $c_{out} = 10$.

Let $z = \pi/2 - t$; then $\mathrm{d}z = -\mathrm{d}t$ and
$$\int_0^{\pi/2} \log_2(1+\cos(\pi/2-t))\,\mathrm{d}t = -\int_{\pi/2}^{0} \log_2(1+\cos(z))\,\mathrm{d}z = \int_0^{\pi/2} \log_2(1+\cos(z))\,\mathrm{d}z.$$
Therefore,
$$\int_{\pi/2}^{\pi} \log_2(1-\cos(x))\,\mathrm{d}x = \int_0^{\pi/2} \log_2(1+\cos(x))\,\mathrm{d}x.$$
Then
$$I_1 = \int_0^{\pi/2} \log_2(1-\cos(x))\,\mathrm{d}x + \int_{\pi/2}^{\pi} \log_2(1-\cos(x))\,\mathrm{d}x = \int_0^{\pi/2} \log_2(\sin^2(x))\,\mathrm{d}x = 2\int_0^{\pi/2} \log_2(\sin(x))\,\mathrm{d}x.$$
Moreover, the substitution $x = \pi/2 - t$ gives
$$\int_0^{\pi/2} \log_2(\sin(x))\,\mathrm{d}x = \int_0^{\pi/2} \log_2(\sin(\pi/2-t))\,\mathrm{d}t = \int_0^{\pi/2} \log_2(\sin(\pi/2+t))\,\mathrm{d}t = \int_{\pi/2}^{\pi} \log_2(\sin(x))\,\mathrm{d}x,$$
where the middle step uses $\sin(\pi/2-t) = \sin(\pi/2+t)$.
Therefore,
$$I_1 = \int_0^{\pi} \log_2(\sin(x))\,\mathrm{d}x = \int_0^{\pi/2} \log_2(\sin(2t))\,\mathrm{d}(2t) = 2\int_0^{\pi/2} \log_2(2\sin(t)\cos(t))\,\mathrm{d}t$$
$$= 2\left(\frac{\pi}{2} + \int_0^{\pi/2} \log_2(\sin(t))\,\mathrm{d}t + \int_0^{\pi/2} \log_2(\cos(t))\,\mathrm{d}t\right) = \pi + I_1 + 2\int_0^{\pi/2} \log_2(\cos(t))\,\mathrm{d}t.$$
On the other hand, substituting $z = \pi/2 - x$,
$$2\int_0^{\pi/2} \log_2(\cos(x))\,\mathrm{d}x = 2\int_0^{\pi/2} \log_2(\sin(\pi/2-x))\,\mathrm{d}x = 2\int_0^{\pi/2} \log_2(\sin(z))\,\mathrm{d}z = I_1.$$
The first identity gives $2\int_0^{\pi/2}\log_2(\cos(t))\,\mathrm{d}t = -\pi$, while the second gives $2\int_0^{\pi/2}\log_2(\cos(t))\,\mathrm{d}t = I_1$; therefore $I_1 = -\pi$. $\Box$

Lemma 6.
The integral $I_2 \triangleq \int_0^{\pi} \cos(x)\log_2(1-\cos(x))\,\mathrm{d}x = -\pi\log_2 e$.

Proof. Let $t = \sin(x)$; then $\mathrm{d}t = \cos(x)\,\mathrm{d}x$, with $\cos(x) = \sqrt{1-t^2}$ for $x \in (0, \pi/2)$ and $\cos(x) = -\sqrt{1-t^2}$ for $x \in (\pi/2, \pi)$. Therefore
$$I_2 = \int_0^{\pi/2} \cos(x)\log_2(1-\cos(x))\,\mathrm{d}x + \int_{\pi/2}^{\pi} \cos(x)\log_2(1-\cos(x))\,\mathrm{d}x$$
$$= \int_0^{1} \log_2(1-\sqrt{1-t^2})\,\mathrm{d}t - \int_0^{1} \log_2(1+\sqrt{1-t^2})\,\mathrm{d}t = \int_0^{1} \log_2\!\left(\frac{1-\sqrt{1-t^2}}{1+\sqrt{1-t^2}}\right)\mathrm{d}t$$
$$= \int_0^{1} \log_2\!\left(\frac{(1-\sqrt{1-t^2})^2}{t^2}\right)\mathrm{d}t = 2\int_0^{1} \log_2(1-\sqrt{1-t^2})\,\mathrm{d}t - 2\int_0^{1} \log_2(t)\,\mathrm{d}t.$$
Define $I_4 \triangleq \int_0^1 \log_2(1-\sqrt{1-t^2})\,\mathrm{d}t$, $I_5 \triangleq \int_0^1 \log_2(t)\,\mathrm{d}t$, and a new function $G(t) \triangleq t\ln(1-\sqrt{1-t^2}) - t - \sin^{-1}(t)$. Then
$$\frac{\mathrm{d}G(t)}{\mathrm{d}t} = \ln(1-\sqrt{1-t^2}) + \frac{t^2}{\sqrt{1-t^2}\,(1-\sqrt{1-t^2})} - 1 - \frac{1}{\sqrt{1-t^2}} = \ln(1-\sqrt{1-t^2}),$$
therefore
$$I_4 = \frac{1}{\ln 2}\int_0^1 \ln(1-\sqrt{1-t^2})\,\mathrm{d}t = \log_2 e\,(G(1) - G(0)) = -\left(1 + \frac{\pi}{2}\right)\log_2 e,$$
$$I_5 = \frac{1}{\ln 2}\int_0^1 \ln(t)\,\mathrm{d}t = \log_2 e\left(t\ln t\,\Big|_0^1 - \int_0^1 t\,\mathrm{d}\ln t\right) = -\log_2 e\int_0^1 t\cdot\frac{1}{t}\,\mathrm{d}t = -\log_2 e.$$
Finally, $I_2 = 2I_4 - 2I_5 = -\pi\log_2 e$. $\Box$

Define a new function $g(x) \triangleq f(2(1-\cos(x)))$; then we have the following corollary.

Corollary 5.
The integral $I_3 \triangleq \int_0^{\pi} g(x)\,\mathrm{d}x = 2\pi\log_2 e$.

Proof.
$$I_3 = \int_0^{\pi} 2(1-\cos(x))\log_2(2(1-\cos(x)))\,\mathrm{d}x = 2\int_0^{\pi} \log_2(2(1-\cos(x)))\,\mathrm{d}x - 2\int_0^{\pi} \cos(x)\log_2(2(1-\cos(x)))\,\mathrm{d}x$$
$$= 2(\pi + I_1) - 2\left(\int_0^{\pi} \cos(x)\,\mathrm{d}x + I_2\right) = 2(\pi + I_1 - I_2) = 2\pi\log_2 e. \qquad \Box$$

Corollary 6.
$$\int_0^{2\pi} g(x)\,\mathrm{d}x = 2\int_0^{\pi} g(x)\,\mathrm{d}x = 2\int_{\pi}^{2\pi} g(x)\,\mathrm{d}x.$$

Proof. Let $x = \pi - t$; then $\mathrm{d}x = -\mathrm{d}t$ and
$$\int_0^{\pi} g(x)\,\mathrm{d}x = -\int_{\pi}^{0} g(\pi-t)\,\mathrm{d}t = \int_0^{\pi} g(\pi-t)\,\mathrm{d}t.$$
Let $x = \pi + t$; then $\mathrm{d}x = \mathrm{d}t$ and
$$\int_{\pi}^{2\pi} g(x)\,\mathrm{d}x = \int_0^{\pi} g(\pi+t)\,\mathrm{d}t.$$
Since $\cos(\pi-t) = \cos(\pi+t)$, we have $g(\pi-t) = g(\pi+t)$. Thus
$$\int_0^{\pi} g(\pi-t)\,\mathrm{d}t = \int_0^{\pi} g(\pi+t)\,\mathrm{d}t,$$
and therefore
$$\int_0^{\pi} g(x)\,\mathrm{d}x = \int_{\pi}^{2\pi} g(x)\,\mathrm{d}x = \frac{1}{2}\int_0^{2\pi} g(x)\,\mathrm{d}x. \qquad \Box$$
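The three closed-form values above can be checked numerically. The sketch below (our own helper; any quadrature that tolerates the integrable logarithmic singularity at $x = 0$ works) approximates $I_1$, $I_2$, and $I_3$ with a composite midpoint rule:

```python
import math

def midpoint(func, a, b, n=200_000):
    # Composite midpoint rule; the midpoints never touch the endpoint
    # singularities, so the integrable log-singularity at x = 0 is handled.
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

f = lambda y: y * math.log2(y)              # f(y) = y log2(y)
g = lambda x: f(2.0 * (1.0 - math.cos(x)))  # g(x) = f(2(1 - cos x))

I1 = midpoint(lambda x: math.log2(1 - math.cos(x)), 0, math.pi)
I2 = midpoint(lambda x: math.cos(x) * math.log2(1 - math.cos(x)), 0, math.pi)
I3 = midpoint(g, 0, math.pi)
```

With this many subintervals, the three estimates agree with $-\pi \approx -3.1416$, $-\pi\log_2 e \approx -4.5324$, and $2\pi\log_2 e \approx 9.0647$ to roughly three decimal places.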
B. Complete Graph
The Laplacian spectrum of the complete graph $K_n$ is $\lambda = (n, \ldots, n, 0)$ with $n-1$ copies of $n$, and the degree sequence of $K_n$ is $d = (n-1, \ldots, n-1)$; thus $H(K_n) = \log_2 n$ and $H_{vn}(K_n) = \log_2(n-1)$, yielding
$$\Delta H(K_n) = H(K_n) - H_{vn}(K_n) = \log_2\!\left(1 + \frac{1}{n-1}\right).$$

C. Complete Bipartite Graph
The Laplacian spectrum of the complete bipartite graph $K_{a,b}$ is
$$\lambda = (a+b, \underbrace{a, \ldots, a}_{b-1}, \underbrace{b, \ldots, b}_{a-1}, 0),$$
and the degree sequence of $K_{a,b}$ is $d = (\underbrace{a, \ldots, a}_{b}, \underbrace{b, \ldots, b}_{a})$; therefore
$$H(K_{a,b}) = \log_2(2ab) - \frac{ab\log_2 a + ab\log_2 b}{2ab} = 1 + \frac{1}{2}\log_2(ab),$$
and
$$H_{vn}(K_{a,b}) = \log_2(2ab) - \frac{ab\log_2 a + ab\log_2 b}{2ab} - \frac{(a+b)\log_2(a+b) - a\log_2 a - b\log_2 b}{2ab} = 1 + \frac{1}{2}\log_2(ab) - \frac{\log_2(1+\frac{b}{a})}{2b} - \frac{\log_2(1+\frac{a}{b})}{2a}.$$
The entropy gap is
$$\Delta H(K_{a,b}) = H(K_{a,b}) - H_{vn}(K_{a,b}) = \frac{\log_2(1+\frac{b}{a})}{2b} + \frac{\log_2(1+\frac{a}{b})}{2a}.$$

D. Path Graph
The Laplacian spectrum of the path graph $P_n$ is
$$\lambda = \left(2 - 2\cos\!\left(\frac{\pi k}{n}\right),\; k = 0, \ldots, n-1\right),$$
and the degree sequence of $P_n$ is $d = (1, \underbrace{2, \ldots, 2}_{n-2}, 1)$; therefore
$$H(P_n) = \log_2(2n-2) - \frac{(n-2)\cdot 2}{2n-2} = \log_2(n-1) + \frac{1}{n-1},$$
and
$$H_{vn}(P_n) = \log_2(2n-2) - \frac{\sum_{k=0}^{n-1} f\!\left(2 - 2\cos\!\left(\frac{\pi k}{n}\right)\right)}{2n-2}.$$
Then $H_{vn}(P_n) - \log_2(2n-2)$ can be expressed as
$$-\frac{\sum_{k=0}^{n-1} g\!\left(\frac{\pi k}{n}\right)}{2n-2} = \left[\frac{\pi}{n}\sum_{k=0}^{n-1} g\!\left(\frac{\pi k}{n}\right)\right]\cdot\frac{n}{\pi}\cdot\frac{-1}{2n-2} \xrightarrow{n\to\infty} -\frac{1}{2\pi}\int_0^{\pi} g(x)\,\mathrm{d}x = -\log_2 e.$$
Therefore, $H_{vn}(P_n) - \log_2(2n-2) \to -\log_2 e$ as $n \to \infty$.

E. Ring
The Laplacian spectrum of the ring graph $R_n$ is
$$\lambda = \left(2 - 2\cos\!\left(\frac{2\pi k}{n}\right),\; k = 0, \ldots, n-1\right),$$
and the degree sequence of $R_n$ is $d = (2, \ldots, 2)$; therefore $H(R_n) = \log_2 n$ and
$$H_{vn}(R_n) = \log_2(2n) - \frac{\sum_{k=0}^{n-1} f\!\left(2 - 2\cos\!\left(\frac{2\pi k}{n}\right)\right)}{2n}.$$
Then $H_{vn}(R_n) - \log_2(2n)$ can be expressed as
$$-\frac{\sum_{k=0}^{n-1} g\!\left(\frac{2\pi k}{n}\right)}{2n} = \left[\frac{2\pi}{n}\sum_{k=0}^{n-1} g\!\left(\frac{2\pi k}{n}\right)\right]\cdot\frac{n}{2\pi}\cdot\frac{-1}{2n} \xrightarrow{n\to\infty} -\frac{1}{4\pi}\int_0^{2\pi} g(x)\,\mathrm{d}x = -\log_2 e.$$
Therefore, $H_{vn}(R_n) - \log_2(2n) \to -\log_2 e$ as $n \to \infty$.

APPENDIX C
PROOF OF THEOREM

We first relate $D_{SI}$ to the Jensen-Shannon divergence $D_{JS}$; the pseudometric properties of $D_{SI}$ then simply follow from the metric properties of $\sqrt{D_{JS}}$. The structural information
$$H(G_j) = -\sum_{i=1}^{n} f\!\left(\frac{d_{i,j}}{\mathrm{vol}(G_j)}\right) = H(P_j),$$
where $P_j = \left(\frac{d_{1,j}}{\mathrm{vol}(G_j)}, \ldots, \frac{d_{n,j}}{\mathrm{vol}(G_j)}\right)$ is a distribution on the node set $V$. In the graph $G = (V, E_1 \cup E_2, A)$ with $A = \frac{A(G_1)}{2\mathrm{vol}(G_1)} + \frac{A(G_2)}{2\mathrm{vol}(G_2)}$, the degree $d_i$ of node $i$ is
$$d_i = \sum_{j=1}^{n} A_{ij} = \sum_{j=1}^{n} \left(\frac{A_{ij}(G_1)}{2\mathrm{vol}(G_1)} + \frac{A_{ij}(G_2)}{2\mathrm{vol}(G_2)}\right) = \frac{d_i(G_1)}{2\mathrm{vol}(G_1)} + \frac{d_i(G_2)}{2\mathrm{vol}(G_2)}.$$
Then the volume of $G$ is $\mathrm{vol}(G) = \sum_{i=1}^{n} d_i = 1$. Therefore the structural information of $G$ is
$$H(G) = -\sum_{i=1}^{n} f\!\left(\frac{d_i}{\mathrm{vol}(G)}\right) = -\sum_{i=1}^{n} f\!\left(\frac{1}{2}\left(\frac{d_i(G_1)}{\mathrm{vol}(G_1)} + \frac{d_i(G_2)}{\mathrm{vol}(G_2)}\right)\right),$$
which is the entropy of the mixture distribution $(P_1 + P_2)/2$. As a result, $D_{SI}(G_1, G_2) = \sqrt{D_{JS}(P_1, P_2)}$.

APPENDIX D
PROOF OF THEOREM

The pseudometric properties of $D_{QJS}$ follow as for $D_{SI}$. It remains to prove that $D_{QJS}(G_1, G_2) \le 1$, and that if $\min\{d_{i,1}, d_{i,2}\} = 0$ for every node $i \in V$, then $D_{QJS}(G_1, G_2) = 1$.

We prove $D_{QJS}(G_1, G_2) \le 1$ using the following inequality [58], [23] for the von Neumann entropy: if $\rho = \sum_i p_i \rho_i$ is a mixture of density matrices $\rho_i$, with $\{p_i\}$ a set of positive real numbers such that $\sum_i p_i = 1$, then $H_{vn}\!\left(\sum_i p_i \rho_i\right) \le \sum_i p_i H_{vn}(\rho_i) + H(\{p_i\})$.
Here the density matrix $\rho_i$ can be viewed as the scaled Laplacian matrix $\tilde{L}_i \triangleq L_i/\mathrm{tr}(L_i)$ of the graph $G_i$. Then
$$D_{QJS}(G_1, G_2) = \sqrt{H_{vn}(G) - \frac{H_{vn}(G_1) + H_{vn}(G_2)}{2}} = \sqrt{H_{vn}\!\left(\frac{\tilde{L}_1 + \tilde{L}_2}{2}\right) - \frac{H_{vn}(\tilde{L}_1) + H_{vn}(\tilde{L}_2)}{2}}$$
$$\le \sqrt{H_{vn}\!\left(\frac{\tilde{L}_1 + \tilde{L}_2}{2}\right) - H_{vn}\!\left(\frac{\tilde{L}_1 + \tilde{L}_2}{2}\right) + 1} = 1.$$

We denote by $S_j$ the set of singletons in the graph $G_j$ for $j \in \{1, 2\}$. Since $\min\{d_{i,1}, d_{i,2}\} = 0$ for every node $i \in V$, we have $S_1 \cup S_2 = V$, which implies that $(V \setminus S_1) \cap (V \setminus S_2) = \emptyset$ by De Morgan's laws. Therefore, the node set $V$ can be partitioned into three disjoint subsets $V \setminus S_1$, $V \setminus S_2$, and $S_1 \cap S_2$. Notice that each singleton contributes one zero eigenvalue to the Laplacian spectrum, and the Laplacian spectrum of a graph is composed of the Laplacian spectra of its connected components. We denote by $\lambda_{j,1}, \ldots, \lambda_{j,n-s_j}, 0, \ldots, 0$ the Laplacian spectrum of $G_j$, where $s_j = |S_j|$ for $j \in \{1, 2\}$. It follows that $\sum_{i=1}^{n-s_j} \lambda_{j,i} = \mathrm{vol}(G_j)$. Since $A = \frac{A_1}{2\mathrm{vol}(G_1)} + \frac{A_2}{2\mathrm{vol}(G_2)}$, we have $L = \frac{L_1}{2\mathrm{vol}(G_1)} + \frac{L_2}{2\mathrm{vol}(G_2)}$. Then the Laplacian spectrum of $G$ is composed of the Laplacian spectrum of $G_j$ divided by $2\mathrm{vol}(G_j)$ for $j \in \{1, 2\}$, together with zeros. As a result,
$$D_{QJS}^2(G_1, G_2) = -\sum_{j=1}^{2}\sum_{i=1}^{n-s_j} f\!\left(\frac{\lambda_{j,i}}{2\mathrm{vol}(G_j)}\right) + \frac{1}{2}\sum_{j=1}^{2}\sum_{i=1}^{n-s_j} f\!\left(\frac{\lambda_{j,i}}{\mathrm{vol}(G_j)}\right) = \sum_{j=1}^{2}\sum_{i=1}^{n-s_j} \frac{\lambda_{j,i}}{2\mathrm{vol}(G_j)}\log_2\frac{2\mathrm{vol}(G_j)}{\mathrm{vol}(G_j)} = 1.$$

APPENDIX E
PROOF OF LEMMA

Denote by $\tilde{d}$ the degree sequence of $G_{k+1}$; then
$$H(G_{k+1}) = -\sum_{i=1}^{n} f\!\left(\frac{\tilde{d}_i}{2(m+\Delta m)}\right) = \frac{f(2(m+\Delta m)) - \sum_{i=1}^{n} f(\tilde{d}_i)}{2(m+\Delta m)}$$
$$= \frac{f(2(m+\Delta m)) - \sum_{i \in V_k} f(d_i + \Delta d_i) - \sum_{i \notin V_k} f(d_i)}{2(m+\Delta m)} = \frac{f(2(m+\Delta m)) - a - \sum_{i=1}^{n} f(d_i)}{2(m+\Delta m)} = \frac{f(2(m+\Delta m)) - a - f(2m) + 2m\,H(G_k)}{2(m+\Delta m)}.$$
The structural information of the mixture graph of $G_k$ and $G_{k+1}$ is equal to
$$-\sum_{i=1}^{n} f\!\left(\frac{d_i}{4m} + \frac{\tilde{d}_i}{4(m+\Delta m)}\right) = -b - \sum_{i \notin V_k} f\!\left(\frac{2m+\Delta m}{4m(m+\Delta m)}\,d_i\right)$$
$$= -b - \sum_{i \notin V_k} c\,d_i\,(\log_2 c + \log_2 d_i) = -b - f(c)\sum_{i \notin V_k} d_i - c\sum_{i \notin V_k} f(d_i) = -b - f(c)(2m - y) - c\,(f(2m) - 2m\,H(G_k) - z),$$
where $c \triangleq \frac{2m+\Delta m}{4m(m+\Delta m)}$, and $b$, $y = \sum_{i \in V_k} d_i$, and $z = \sum_{i \in V_k} f(d_i)$ collect the contributions of the nodes in $V_k$ whose degrees change.

ACKNOWLEDGMENT
The authors would like to thank...

REFERENCES
[1] X. Liu, L. Fu, and X. Wang, "Bridging the gap between von Neumann graph entropy and structural information: Theory and applications," in WWW. ACM, 2021.
[2] A. Tsitsulin, M. Munkhoeva, and B. Perozzi, "Just SLaQ when you approximate: Accurate spectral distances for web-scale graphs," in Proceedings of The Web Conference 2020, 2020, pp. 2697–2703.
[3] D. Bonchev and G. A. Buck, Quantitative Measures of Network Complexity. Boston: Springer, 2005, pp. 191–235.
[4] A. Li and Y. Pan, "Structural information and dynamical complexity of networks," IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3290–3339, Jun 2016.
[5] N. Rashevsky, "Life, information theory, and topology," Bull. Math. Biophys., vol. 17, no. 3, pp. 229–235, 1955.
[6] C. Raychaudhury, S. K. Ray, J. J. Ghosh, and A. B. Basak, "Discrimination of isomeric structures using information theoretic topological indices," J. Comput. Chem., vol. 5, no. 6, pp. 581–588, 1984.
[7] E. Konstantinova and A. A. Paleev, "Sensitivity of topological indices of polycyclic graphs," Vychisl. Sistemy, vol. 136, pp. 38–48, 1990.
[8] M. Dehmer, "Information processing in complex networks: Graph entropy and information functionals," Appl. Math. Comput., vol. 201, pp. 82–94, 2008.
[9] S. L. Braunstein, S. Ghosh, and S. Severini, "The Laplacian of a graph as a density matrix: A basic combinatorial approach to separability of mixed states," Annals of Combinatorics, no. 10, pp. 291–317, 2006.
[10] P.-Y. Chen, L. Wu, S. Liu, and I. Rajapakse, "Fast incremental von Neumann graph entropy computation: Theory, algorithm, and applications," in ICML, Long Beach, California, USA, Jun 2019, pp. 1091–1101.
[11] H. Choi, J. He, H. Hu, and Y. Shi, "Fast computation of von Neumann entropy for large-scale graphs via quadratic approximations," Linear Algebra and its Applications, vol. 585, pp. 127–146, 2020.
[12] E.-M. Kontopoulou, G.-P. Dexter, W. Szpankowski, A. Grama, and P. Drineas, "Randomized linear algebra approaches to estimate the von Neumann entropy of density matrices," IEEE Transactions on Information Theory, vol. 66, no. 8, pp. 5003–5021, 2020.
[13] M. De Domenico and J. Biamonte, "Spectral entropies as information-theoretic tools for complex network comparison," Phys. Rev. X, vol. 6, p. 041062, Dec 2016.
[14] M. D. Domenico, V. Nicosia, A. Arenas, and V. Latora, "Structural reducibility of multilayer networks," Nature Communications, vol. 6, no. 6864, 2015.
[15] L. Bai and E. R. Hancock, "Graph clustering using the Jensen-Shannon kernel," in Computer Analysis of Images and Patterns. Berlin: Springer, 2011, pp. 394–401.
[16] J. Lockhart, G. Minello, L. Rossi, S. Severini, and A. Torsello, "Edge centrality via the Holevo quantity," in Structural, Syntactic, and Statistical Pattern Recognition. Springer, 2016, pp. 143–152.
[17] A. Jamakovic and S. Uhlig, "On the relationship between the algebraic connectivity and graph's robustness to node and link failures," in , 2007, pp. 96–102.
[18] A. Ghosh and S. Boyd, "Growing well-connected graphs," in IEEE CDC, 2006.
[19] G. Minello, L. Rossi, and A. Torsello, "On the von Neumann entropy of graphs," Journal of Complex Networks, vol. 7, no. 4, pp. 491–514, 11 2018.
[20] F. R. K. Chung, Spectral Graph Theory. American Mathematical Society, 1997.
[21] J. Liao and A. Berg, "Sharpening Jensen's inequality," The American Statistician, vol. 73, no. 3, pp. 278–281, 2019.
[22] H. Bai, "The Grone-Merris conjecture," Transactions of the American Mathematical Society, vol. 363, no. 8, pp. 4463–4474, Aug 2011.
[23] A. P. Majtey, P. W. Lamberti, and D. P. Prato, "Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states," Phys. Rev. A, vol. 72, p. 052310, Nov 2005.
[24] P. W. Lamberti, M. Portesi, and J. Sparacino, "Natural metric for quantum information theory," arXiv preprint arXiv:0807.0583, 2008.
[25] J. Briët and P. Harremoës, "Properties of classical and quantum Jensen-Shannon divergence," Phys. Rev. A, vol. 79, p. 052311, May 2009.
[26] G. Dasoulas, G. Nikolentzos, K. Scaman, A. Virmaux, and M. Vazirgiannis, "Ego-based entropy measures for structural representations," arXiv preprint arXiv:2003.00553, 2020.
[27] C. Ye, R. C. Wilson, C. H. Comin, L. d. F. Costa, and E. R. Hancock, "Approximate von Neumann entropy for directed graphs," Phys. Rev. E, vol. 89, p. 052804, May 2014.
[28] I. Han, D. Malioutov, H. Avron, and J. Shin, "Approximating spectral sums of large-scale matrices using stochastic Chebyshev approximations," SIAM Journal on Scientific Computing, vol. 39, no. 4, pp. A1558–A1585, 2017.
[29] S. Ubaru, J. Chen, and Y. Saad, "Fast estimation of tr(f(A)) via stochastic Lanczos quadrature," SIAM Journal on Matrix Analysis and Applications, vol. 38, no. 4, pp. 1075–1099, 2017.
[30] W. Yu, G. Chen, and M. Cao, "Consensus in directed networks of agents with nonlinear dynamics," IEEE Transactions on Automatic Control, vol. 56, no. 6, pp. 1436–1441, 2011.
[31] C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon, "A min-max cut algorithm for graph partitioning and data clustering," in ICDM, 2001, pp. 107–114.
[32] B. Xiao, E. R. Hancock, and R. C. Wilson, "Graph characteristics from the heat kernel trace," Pattern Recognition, vol. 42, no. 11, pp. 2589–2606, 2009.
[33] A. Tsitsulin, D. Mottin, P. Karras, A. Bronstein, and E. Müller, "NetLSD: Hearing the shape of a graph," in ACM SIGKDD, 2018, pp. 2347–2356.
[34] D. E. Simmons, J. P. Coon, and A. Datta, "The von Neumann Theil index: Characterizing graph centralization using the von Neumann index," Journal of Complex Networks, vol. 6, no. 6, pp. 859–876, Jan 2018.
[35] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Phys. Rev. E, vol. 74, p. 036104, Sep 2006.
[36] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences, vol. 110, no. 52, pp. 20935–20940, 2013.
[37] J. R. Lee, S. O. Gharan, and L. Trevisan, "Multiway spectral partitioning and higher-order Cheeger inequalities," J. ACM, vol. 61, no. 6, Dec. 2014.
[38] R. R. Nadakuditi and M. E. J. Newman, "Graph spectra and the detectability of community structure in networks," Phys. Rev. Lett., vol. 108, p. 188701, May 2012.
[39] S. Nagaraja, "The impact of unlinkability on adversarial community detection: Effects and countermeasures," Privacy Enhancing Technologies, vol. 6205, 2010.
[40] Y. Chen, Y. Nadji, A. Kountouras, F. Monrose, R. Perdisci, M. Antonakakis, and N. Vasiloglou, "Practical attacks against graph-based clustering," in CCS. ACM, 2017, pp. 1125–1142.
[41] V. Fionda and G. Pirrò, "Community deception or: How to stop fearing community detection algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 4, pp. 660–673, 2018.
[42] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan, "Hiding individuals and communities in a social network," Nature Human Behavior, vol. 2, pp. 139–147, 2018.
[43] Y. Liu, J. Liu, Z. Zhang, L. Zhu, and A. Li, "REM: From structural entropy to community structure deception," in Advances in Neural Information Processing Systems, vol. 32, 2019, pp. 12938–12948.
[44] J. Chen, L. Chen, Y. Chen, M. Zhao, S. Yu, Q. Xuan, and X. Yang, "GA-based Q-attack on community detection," IEEE Transactions on Computational Social Systems, vol. 6, no. 3, pp. 491–503, 2019.
[45] J. Jia, B. Wang, X. Cao, and N. Z. Gong, "Certified robustness of community detection against adversarial structural perturbation via randomized smoothing," in WWW. ACM, 2020, pp. 2718–2724.
[46] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications, 2nd ed. Springer Series in Statistics, 2011.
[47] M. Dairyko, L. Hogben, J. C.-H. Lin, J. Lockhart, D. Roberson, S. Severini, and M. Young, "Note on von Neumann and Rényi entropies of a graph," Linear Algebra and its Applications, vol. 521, pp. 240–253, 2017.
[48] R. D. Grone, "Eigenvalues and the degree sequences of graphs," Linear and Multilinear Algebra, vol. 39, no. 1–2, pp. 133–136, 1995.
[49] D. M. Endres and J. E. Schindelin, "A new metric for probability distributions," IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1858–1860, 2003.
[50] P. W. Lamberti, A. P. Majtey, A. Borras, M. Casas, and A. Plastino, "Metric character of the quantum Jensen-Shannon divergence," Phys. Rev. A, vol. 77, p. 052311, May 2008.
[51] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509–512, 1999.
[52] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, no. 393, pp. 440–442, 1998.
[53] J. Leskovec and A. Krevl, "SNAP Datasets: Stanford large network dataset collection," http://snap.stanford.edu/data, Jun. 2014.
[54] J. Kunegis, "The KONECT Project: Koblenz network collection," http://konect.cc/networks, 2013.
[55] B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, "On the evolution of user interaction in Facebook," in WOSN, 2009, pp. 37–42.
[56] P. Papadimitriou, A. Dasdan, and H. Garcia-Molina, "Web graph similarity for anomaly detection," Journal of Internet Services and Applications, vol. 1, pp. 19–30, 2010.
[57] D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos, "DeltaCon: Principled massive-graph similarity function with attribution," ACM Trans. Knowl. Discov. Data, vol. 10, no. 3, Feb. 2016.
[58] M. A. Nielsen and I. L. Chuang,