Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications
Pin-Yu Chen, Lingfei Wu, Sijia Liu, Indika Rajapakse
Abstract
The von Neumann graph entropy (VNGE) facilitates measurement of information divergence and distance between graphs in a graph sequence. It has been successfully applied to various learning tasks driven by network-based data. While effective, VNGE is computationally demanding as it requires the full eigenspectrum of the graph Laplacian matrix. In this paper, we propose a new computational framework, Fast Incremental von Neumann Graph EntRopy (FINGER), which approaches VNGE with a performance guarantee. FINGER reduces the cubic complexity of VNGE to linear complexity in the number of nodes and edges, and thus enables online computation based on incremental graph changes. We also show asymptotic equivalence of FINGER to the exact VNGE, and derive its approximation error bounds. Based on FINGER, we propose efficient algorithms for computing Jensen-Shannon distance between graphs. Our experimental results on different random graph models demonstrate the computational efficiency and the asymptotic equivalence of FINGER. In addition, we apply FINGER to two real-world applications and one synthesized anomaly detection dataset, and corroborate its superior performance over seven baseline graph similarity methods.
1. Introduction
In recent years, graph-based learning has become an active research field (Shuman et al., 2013; Kalofolias, 2016; Luo et al., 2012; Shivanna & Bhattacharyya, 2014; Wang et al., 2016; Kipf & Welling, 2017; Wu et al., 2018a;b; Xu et al., 2018). Its success is rooted in the advanced capability of summarizing and representing phenomenal structural features embedded in graphs. In particular, evaluating similarity between graphs is crucial to network analysis and graph-based anomaly detection (Papadimitriou et al., 2010; Akoglu et al., 2015; Ranshous et al., 2015). For example, Yanardag and Vishwanathan used graph similarity for learning novel graph kernels (Yanardag & Vishwanathan, 2015), and Sharpnack et al. proposed the Lovasz extended scan statistic for anomaly detection in connected graphs (Sharpnack et al., 2013). Koutra et al. proposed DeltaCon, a state-of-the-art similarity algorithm in terms of its scalability and capability of handling weighted graphs using fast belief propagation (Koutra et al., 2016). However, these methods are sensitive to heuristic metrics and presumed models, and thus provide limited understanding of the general notion of variations between graphs. On the other hand, model-agnostic approaches such as graph entropy have been used to quantify the structural complexity of a single graph, which relates to the Shannon entropy of a probability distribution over a function of enumerated subgraphs in a graph (Simonyi, 1995; Shetty & Adibi, 2005; Li & Pan, 2016). However, graph entropy can be computationally demanding due to its use of exhaustive subgraph search.

IBM Research; University of Michigan, Ann Arbor, USA. Correspondence to: Pin-Yu Chen <[email protected]>.
Proceedings of the International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Different from the aforementioned approaches and inspired by quantum information theory, the von Neumann graph entropy (VNGE) (Braunstein et al., 2006; Passerini & Severini, 2008; 2009) facilitates the measure of (quantum) Jensen-Shannon divergence and distance (Endres & Schindelin, 2003; Briët & Harremoës, 2009) between graphs. It is associated with a model-agnostic information measure for quantifying variation between two quantum density matrices. In addition, the VNGE has been shown to be linearly correlated with classical graph entropy measures (Anand & Bianconi, 2009; Anand et al., 2011). The VNGE and the Jensen-Shannon distance have been successfully applied to structural reduction in multiplex networks (De Domenico et al., 2015), depth analysis in image processing (Han et al., 2012; Bai & Hancock, 2014), structure-function analysis in genetic networks (Seaman et al., 2017; Liu et al., 2018b), and network-ensemble comparison (Li et al., 2018). However, despite its effectiveness, the computation of VNGE requires (at most) cubic complexity in the number of nodes, thereby impeding its applicability to machine learning and data mining tasks involving a sequence of large graphs.

Contributions.
To overcome the computational inefficiency of VNGE, we propose a Fast Incremental von Neumann Graph EntRopy (FINGER) framework to approximate VNGE with a performance guarantee, reducing its cubic complexity to linear complexity in the number of nodes and edges. FINGER is a generic tool that applies to both batch and online graph sequences. It enables fast entropy computation when every single graph in a graph sequence is presented (e.g., a snapshot of a dynamic network, or a single-layer connectivity pattern of a multiplex network). For applications where changes in a graph (e.g., addition and deletion of nodes or edges over time) are continuously reported (e.g., streaming graphs), FINGER also allows online computation based on incremental graph changes. We prove that FINGER maintains an approximation guarantee and is asymptotically equivalent to the exact VNGE under some eigenspectrum conditions, which is further validated on different synthetic random graphs. We then apply FINGER to developing efficient algorithms for the computation of Jensen-Shannon distance between graphs. Compared to state-of-the-art graph similarity methods and two alternative approximate VNGE, FINGER yields superior and robust performance for anomaly detection in evolving Wikipedia networks and router communication networks, as well as bifurcation analysis in dynamic genomic networks. These applications show the effectiveness and potential of Jensen-Shannon distance for network learning in a wide range of domains, which has not been rigorously explored owing to its high computation complexity in the absence of FINGER.

The contributions of this paper and the proposed framework (FINGER) are summarized as follows.
• Two types of approximate VNGE reducing its cubic complexity to linear complexity are proposed to support fast and incremental computation of VNGE. We derive their approximation error bounds and show asymptotic equivalence relative to the exact VNGE under mild conditions.
• FINGER achieves nearly 100% reduction in computation time for VNGE of different graphs and enables scalable Jensen-Shannon graph distance computation.
• On two real-world applications (anomaly detection and cellular bifurcation analysis) and one synthesized dataset, FINGER exhibits outstanding and robust performance over 7 baseline and state-of-the-art methods.
Related Work.
The VNGE was first defined based on the combinatorial graph Laplacian matrix (Braunstein et al., 2006; Passerini & Severini, 2008; 2009; De Domenico et al., 2015; Li et al., 2018). Variants of VNGE and their approximations have been proposed in the literature, including the normalized graph Laplacian matrix (Shi & Malik, 2000) proposed in (Han et al., 2012) and the generalized graph Laplacian matrix of directed graphs (Chung, 2005) proposed in (Ye et al., 2014). However, these alternatives lack approximation justification and are shown to be suboptimal in Section 4. To the best of our knowledge, this paper is the first work that provides fast VNGE computation with a provable approximation analysis.
2. FINGER: Theory and Algorithms
Using terminology from quantum statistical mechanics, a density matrix $\Phi$ describing a quantum system in a mixed state can be cast as a statistical ensemble of several quantum states. The $n \times n$ matrix $\Phi$ is symmetric, positive semidefinite, and satisfies $\mathrm{trace}(\Phi) = 1$. The von Neumann entropy of a quantum system is defined as $H = -\mathrm{trace}(\Phi \ln \Phi)$ (Von Neumann, 1955), where $\ln \Phi$ denotes the matrix logarithm. Let $\{\lambda_i\}_{i=1}^n$ be the sorted eigenvalues of $\Phi$ such that $0 \le \lambda_n \le \ldots \le \lambda_1$. The definition of von Neumann entropy is equivalent to $H = -\sum_{i=1}^n \lambda_i \ln \lambda_i$, where the convention $0 \ln 0 = 0$ is used due to $\lim_{x \to 0^+} x \ln x = 0$. Moreover, since $\sum_i \lambda_i = 1$ and $\lambda_i \ge 0$ for all $i$, the von Neumann entropy can be viewed as the Shannon entropy associated with the eigenspectrum $\{\lambda_i\}_{i=1}^n$.

We consider the class of undirected weighted simple non-empty graphs with nonnegative edge weights, denoted by $\mathcal{G}$. Let $G = (\mathcal{V}, \mathcal{E}, W) \in \mathcal{G}$ denote a single graph, where $\mathcal{V}$ and $\mathcal{E}$ denote its node and edge set with cardinality $|\mathcal{V}| = n$ and $|\mathcal{E}| = m$, respectively, and $W$ is an $n \times n$ matrix with entry $[W]_{ij} = w_{ij}$ denoting the weight of an edge $(i, j) \in \mathcal{E}$. A graph sequence $\{G_t\}_{t=1}^T$ refers to a set of $T$ graphs indexed by $t \in \{1, \ldots, T\}$ with known node-to-node correspondence, where $G_t \in \mathcal{G}$ for all $t$. The combinatorial graph Laplacian matrix of $G$ is defined as $L = S - W$ (Luxburg, 2007), where $S = \mathrm{diag}(s_1, \ldots, s_n)$ is a diagonal matrix and its diagonal entry $s_i = \sum_{j=1}^n w_{ij}$ is the nodal strength (weighted degree) of a node $i \in \mathcal{V}$. Connecting the von Neumann entropy to graphs, the VNGE, denoted by $H(G)$, is defined by replacing $\Phi$ with $L_N = c \cdot L$ (Braunstein et al., 2006; Passerini & Severini, 2008; 2009), where $c = 1/\mathrm{trace}(L)$ is a trace normalization factor. It has been proved in (Passerini & Severini, 2008) that for any $G \in \mathcal{G}$, $H(G) \le \ln(n-1)$, where the equality holds when $G$ is a complete graph.
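The definition above can be sketched directly in code. The following is a minimal, dependency-free illustration, not the authors' implementation: the function names, the edge-dictionary graph format, and the tolerance `eps` are assumptions made for this example, and the eigenvalues are obtained with a textbook cyclic Jacobi iteration only to avoid external libraries (a practical implementation would call an optimized dense eigensolver instead).

```python
import math

def normalized_laplacian(n, edges):
    """Return L_N = L / trace(L) as a dense list of lists.

    `edges` maps node pairs (i, j), i != j, to nonnegative weights w_ij
    (hypothetical input format; the graph must have at least one edge).
    """
    lap = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    trace = sum(lap[i][i] for i in range(n))
    return [[x / trace for x in row] for row in lap]

def jacobi_eigenvalues(mat, tol=1e-22, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in mat]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:  # off-diagonal mass is negligible: diagonal holds eigenvalues
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def vnge_exact(n, edges, eps=1e-12):
    """Exact VNGE H(G) = -sum_i lambda_i ln lambda_i over the spectrum of L_N."""
    eigenvalues = jacobi_eigenvalues(normalized_laplacian(n, edges))
    # Convention 0 ln 0 = 0: skip eigenvalues that are numerically zero.
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > eps)

# Complete graph on 6 nodes with unit weights; the text states H = ln(n - 1).
complete = {(i, j): 1.0 for i in range(6) for j in range(i + 1, 6)}
print(vnge_exact(6, complete), math.log(5))
```

The full eigendecomposition is the cubic-cost step that the remainder of the section sets out to avoid.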
Note that since computing VNGE requires the entire eigenspectrum $\{\lambda_i\}_{i=1}^n$ of $L_N$, it incurs full eigenvalue decomposition on $L_N$ and has cubic complexity $O(n^3)$ (Horn & Johnson, 1990), making it computationally infeasible for large graphs. (Here $f(n) = O(h(n))$, $f(n) = o(h(n))$ and $f(n) = \Omega(h(n))$ mean $\limsup_{n \to \infty} |f(n)/h(n)| < \infty$, $\lim_{n \to \infty} f(n)/h(n) = 0$, and $\limsup_{n \to \infty} |f(n)/h(n)| > 0$, respectively. For computing all eigenvalues of large matrices, a viable solution is direct methods, possibly with parallel eigensolvers for acceleration; the complexity for computing $\{\lambda_i\}_{i=1}^n$ of $L_N$ is typically cubic in $n$ (Bai et al., 2000).)

In what follows, we propose two types of approximate VNGE ($\hat{H}$ and $\tilde{H}$) for the exact VNGE $H$, where $\hat{H}$ and $\tilde{H}$ possess linear computation complexity and satisfy $\tilde{H} \le \hat{H} \le H$. Depending on the data format and problem setup, $\hat{H}$ is designed for fast computation of $H$ for a single graph, and $\tilde{H}$ is designed for online computation of $H$ based on incremental graph changes. Furthermore, we derive approximation error bounds and prove asymptotic equivalence relative to $H$ under mild conditions on the eigenspectrum $\{\lambda_i\}_{i=1}^n$ of $L_N$. Our proofs are given in the supplementary material.

Recall that computing $H = -\sum_{i=1}^n \lambda_i \ln \lambda_i$ requires $O(n^3)$ computation complexity. For computation acceleration, we first reduce its computation complexity by using the quadratic approximation of the term $\lambda_i \ln \lambda_i$ in $H$ via Taylor series expansion, leading to the following lemma.

Lemma 1 (Quadratic approximation $Q$ of $H$). For any $G \in \mathcal{G}$, the quadratic approximation $Q$ of the von Neumann graph entropy $H$ via Taylor series expansion is equivalent to
$$Q = 1 - c^2 \Big( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \Big),$$
where $c = 1/S$ and $S = \mathrm{trace}(L) = \sum_{i \in \mathcal{V}} s_i = 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}$.

It is clear from Lemma 1 that $Q$ only depends on the edge weights in $G = (\mathcal{V}, \mathcal{E}, W)$, resulting in linear computation complexity $O(n + m)$, where $|\mathcal{V}| = n$ and $|\mathcal{E}| = m$. (The complexity becomes $O(n^2)$ when $m = O(n^2)$, i.e., dense graphs. In sparse graphs $m$ could be $O(n)$.) We note that higher-order (beyond quadratic) approximation of $H$ is plausible at the price of less computational efficiency and possibly excessive subgraph pattern searching. For example, the cubic approximation of $H$ involves the computation of $\mathrm{trace}(W^3)$, which relates to the sum of edge weights of every triangle in $G$. To identify the approximation accuracy and equivalence of $Q$ with respect to $H$, the following theorem shows the approximation bounds on $H$ in terms of $Q$ and the eigenspectrum $\{\lambda_i\}_{i=1}^n$ of $L_N$.

Theorem 1 (Approximation bounds on $H$). For any $G \in \mathcal{G}$, let $\lambda_{\max}$ and $\lambda_{\min}$ be the largest and smallest positive eigenvalue of $L_N$, respectively. If $\lambda_{\max} < 1$, then
$$-\frac{Q \ln \lambda_{\max}}{1 - \lambda_{\min}} \le H \le -\frac{Q \ln \lambda_{\min}}{1 - \lambda_{\max}}.$$
The bounds become exact and $H = \ln(n-1)$ when $G$ is a complete graph with identical edge weight.

Note that Theorem 1 excludes the extreme case when $\lambda_{\max} = 1$, as the resulting VNGE is trivial ($H = 0$). The condition $\lambda_{\max} < 1$ holds for any graph $G \in \mathcal{G}$ having a connected subgraph with at least 3 nodes. In addition to the approximation bounds presented in Theorem 1, the corollary below further shows asymptotic equivalence between $Q$ and $H / \ln n$ under mild conditions on $\lambda_{\max}$ and $\lambda_{\min}$.

Corollary 1 (Asymptotic equivalence of $Q$). For any $G \in \mathcal{G}$, let $n_+$ denote the number of positive eigenvalues of $L_N$. If $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, then $\frac{H}{\ln n} - Q \to 0$ as $n \to \infty$.
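Lemma 1 and Theorem 1 can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' code: the function names and the edge-dictionary input format are hypothetical, and the eigenvalues passed to the bound are supplied by hand for a star graph, whose Laplacian spectrum $\{0, 1, 1, 4\}$ on 4 nodes is a standard result.

```python
import math

def quadratic_q(n, edges):
    """Q = 1 - c^2 (sum_i s_i^2 + 2 sum_(i,j) w_ij^2), computed in O(n + m).

    `edges` maps each undirected edge (i, j) once to its weight (assumed format).
    """
    strength = [0.0] * n
    sum_sq_w = 0.0
    for (i, j), w in edges.items():
        strength[i] += w
        strength[j] += w
        sum_sq_w += w * w
    c = 1.0 / sum(strength)  # c = 1 / trace(L), since trace(L) = sum_i s_i
    return 1.0 - c * c * (sum(s * s for s in strength) + 2.0 * sum_sq_w)

def theorem1_bounds(q, lam_min, lam_max):
    """Lower and upper bounds on H from Theorem 1 (requires lam_max < 1)."""
    lower = -q * math.log(lam_max) / (1.0 - lam_min)
    upper = -q * math.log(lam_min) / (1.0 - lam_max)
    return lower, upper

# Star graph with center 0: L has eigenvalues {0, 1, 1, 4} and trace 6,
# so the positive eigenvalues of L_N are {1/6, 1/6, 4/6}.
star = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0}
q = quadratic_q(4, star)
lower, upper = theorem1_bounds(q, 1.0 / 6.0, 4.0 / 6.0)
print(q, lower, upper)
```

Only the strengths and squared edge weights are touched, which is where the $O(n + m)$ cost comes from; no eigenvalue is needed to evaluate $Q$ itself.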
Corollary 1 suggests that the VNGE of large graphs with balanced eigenspectrum (i.e., $\lambda_{\min} = \Omega(\lambda_{\max})$) can be well approximated by $Q$ and a factor $\ln n$. The condition of balanced eigenspectrum holds in regular and homogeneous random graphs (Passerini & Severini, 2008; Du et al., 2010). Furthermore, since $n_+$ equals $n - g$, where $g$ is the number of connected components in $G$ (Merris, 1994), the condition $n_+ = \Omega(n)$ holds when $g = o(n)$.

2.3. FINGER-$\hat{H}$: Approximate von Neumann Graph Entropy $\hat{H}$ Using $Q$ and $\lambda_{\max}$

Based on the derived lower bound of $H$ in Theorem 1, we propose the first type of approximate VNGE $\hat{H}$ using $Q$ and $\lambda_{\max}$ for any $G \in \mathcal{G}$, which is defined as
$$\hat{H}(G) = -Q \ln \lambda_{\max}. \tag{1}$$
Compared to the lower bound $-\frac{Q \ln \lambda_{\max}}{1 - \lambda_{\min}}$ in Theorem 1, $\hat{H}$ is a looser lower bound on $H$ since $1 - \lambda_{\min} < 1$. Here we use $1 - \lambda_{\min} \approx 1$ when approximating $H$, since $\mathrm{trace}(L_N) = \sum_{i=1}^n \lambda_i = 1$ and hence $\lambda_{\min}$ is negligible, especially for large graphs.

More importantly, since $\lambda_{\max}$ is the largest eigenvalue of $L_N$ and by definition $L_N$ has $n + 2m$ nonzero entries, the computation of $\lambda_{\max}$ only requires $O(m + n)$ operations via power iteration methods (Horn & Johnson, 1990; Wu et al., 2017; Liao et al., 2019), leading to the same complexity as $Q$. Consequently, by only acquiring $\lambda_{\max}$ instead of the entire eigenspectrum $\{\lambda_i\}_{i=1}^n$, the computation of $\hat{H}$ has linear complexity $O(m + n)$, resulting in significant computation reduction when compared with the exact VNGE $H$, which requires cubic complexity $O(n^3)$. In addition to computational efficiency, the following corollary shows that the approximation error of $\hat{H}$, defined as $H - \hat{H}$, decays at the rate of $o(\ln n)$ under the same conditions as in Corollary 1. We note that the $o(\ln n)$ approximation error rate is nontrivial since $H \le \ln(n-1)$ for any $G \in \mathcal{G}$ (Passerini & Severini, 2008; Du et al., 2010).

Corollary 2 ($o(\ln n)$ approximation error of $\hat{H}$). For any $G \in \mathcal{G}$, if $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, then the scaled approximation error (SAE) $\frac{H - \hat{H}}{\ln n} \to 0$ as $n \to \infty$, implying $H - \hat{H} = o(\ln n)$.

2.4. FINGER-$\tilde{H}$: Approximate von Neumann Graph Entropy $\tilde{H}$ Using $Q$ and $s_{\max}$

The proxy $\hat{H}$ in Section 2.3 enables fast computation of VNGE for a single graph. As the exact online update of the eigenvalue $\lambda_{\max}$ in $\hat{H}$ based on incremental graph changes is challenging, we propose the second type of approximate VNGE $\tilde{H}$ using $Q$ and the largest nodal strength $s_{\max} = \max_{i \in \mathcal{V}} s_i$ in a graph, which allows simple incremental update of $\tilde{H}$ based on graph changes but at the price of larger approximation error than that of $\hat{H}$. The approximate VNGE $\tilde{H}$ is defined as
$$\tilde{H}(G) = -Q \ln(2c \cdot s_{\max}), \tag{2}$$
where $c$ is the trace normalization constant. Using the definition $L_N = c \cdot L$ and the upper bound on the largest eigenvalue of $L$ in (Anderson Jr & Morley, 1985), we obtain $\tilde{H} \le \hat{H} \le H$ since $\lambda_{\max} \le 2c \cdot s_{\max}$, implying $\tilde{H}$ is a looser lower bound on $H$ when compared with $\hat{H}$. Nonetheless, the following corollary shows the approximation error of $\tilde{H}$ also decays at the same rate $o(\ln n)$ as $\hat{H}$.

Corollary 3 ($o(\ln n)$ approximation error of $\tilde{H}$). For any $G \in \mathcal{G}$, if $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, then the scaled approximation error (SAE) $\frac{H - \tilde{H}}{\ln n} \to 0$ as $n \to \infty$, implying $H - \tilde{H} = o(\ln n)$.

To enable incremental computation of VNGE using $\tilde{H}$, let $G = (\mathcal{V}, \mathcal{E}, W)$ and $G' = (\mathcal{V}', \mathcal{E}', W')$ be any two graphs from a graph sequence. Without loss of generality we assume $G$ and $G'$ have a common node set $\mathcal{V}_c$ with $|\mathcal{V}_c| = n$ nodes.
(If $G$ and $G'$ have different nodes, the set $\mathcal{V}_c$ can be constructed by the set union $\mathcal{V}_c = \mathcal{V} \cup \mathcal{V}'$.) In particular, the graph $\Delta G = (\Delta\mathcal{V}, \Delta\mathcal{E}, \Delta W)$ with $|\Delta\mathcal{V}| = \Delta n$ and $|\Delta\mathcal{E}| = \Delta m$ is introduced to represent the changes made from converting $G$ to $G'$, denoted by $G' = G \oplus \Delta G$. (The notation $\oplus$ denotes the set additions $\mathcal{V}' = \mathcal{V} \uplus \Delta\mathcal{V}$, $\mathcal{E}' = \mathcal{E} \uplus \Delta\mathcal{E}$ and the matrix addition $W' = W + \Delta W$.) The terms $\{\Delta s_i\}_{i \in \Delta\mathcal{V}}$ and $\{\Delta w_{ij}\}_{(i,j) \in \Delta\mathcal{E}}$ denote the nodal strengths and edge weights of $\Delta G$, respectively, and $\Delta S = \sum_{i \in \Delta\mathcal{V}} \Delta s_i$. Let $Q'$ be the quadratic approximation of $H(G')$. The theorem below shows that $Q'$ can be efficiently updated based on $Q$ of $H(G)$, the values of $s_{\max}$ and $c$ from $G$, and $\Delta G$, yielding an efficient complexity of $O(\Delta n + \Delta m)$.

Theorem 2 (Incremental update of $Q'$). For any $G, G' \in \mathcal{G}$ such that $G' = G \oplus \Delta G$, given $Q$, $G$ and $\Delta G$, the term $Q'$ can be efficiently updated by incremental graph changes as
$$Q' = \frac{Q - 1}{(1 + c \Delta S)^2} - \Big( \frac{c}{1 + c \Delta S} \Big)^2 \Delta Q + 1,$$
where
$$\Delta Q = 2 \sum_{i \in \Delta\mathcal{V}} s_i \Delta s_i + \sum_{i \in \Delta\mathcal{V}} \Delta s_i^2 + 4 \sum_{(i,j) \in \Delta\mathcal{E}} w_{ij} \Delta w_{ij} + 2 \sum_{(i,j) \in \Delta\mathcal{E}} \Delta w_{ij}^2,$$
and $\Delta c = -\frac{c^2 \Delta S}{1 + c \Delta S}$.

Furthermore, by the definition of $\tilde{H}$ in (2), $\tilde{H}(G \oplus \Delta G)$ can be efficiently updated by
$$\tilde{H}(G \oplus \Delta G) = -Q' \ln[2 (c + \Delta c)(s_{\max} + \Delta s_{\max})] \tag{3}$$
given $Q$, $s_{\max}$ and $c$ from $G$, and graph changes $\Delta G$, where $\Delta c$ is defined in Theorem 2, and $\Delta s_{\max}$ is the maximum value of $0$ and $\max_{i \in \Delta\mathcal{V}} (s_i + \Delta s_i) - s_{\max}$. The computation complexity of $\tilde{H}(G \oplus \Delta G)$ is $O(\Delta n + \Delta m)$ since the incremental update formula of $Q'$ in Theorem 2 and the computation of $\Delta s_{\max}$ only take $O(\Delta n + \Delta m)$ operations.

As summarized in Algorithms 1 and 2, one major utility of VNGE is the computation of Jensen-Shannon distance (JSdist) between any two graphs from a graph sequence. Consider two graphs $G = (\mathcal{V}_c, \mathcal{E}, W) \in \mathcal{G}$ and $G' = (\mathcal{V}_c, \mathcal{E}', W') \in \mathcal{G}$, and let $\bar{G} = (\mathcal{V}_c, \bar{\mathcal{E}}, \bar{W}) = \frac{G \oplus G'}{2}$ denote their averaged graph such that $\bar{W} = \frac{W + W'}{2}$. Then the Jensen-Shannon divergence between $G$ and $G'$ can be computed by $\mathrm{JSdiv}(G, G') = H(\bar{G}) - \frac{1}{2}[H(G) + H(G')]$ (De Domenico et al., 2015). Furthermore, the Jensen-Shannon distance between $G$ and $G'$ is defined as $\mathrm{JSdist}(G, G') = \sqrt{\mathrm{JSdiv}(G, G')}$, which has been proved to be a valid distance metric in (Endres & Schindelin, 2003; Briët & Harremoës, 2009). The exact computation of JSdist requires $O(n^3)$ computation complexity by the definition of $H$, where $|\mathcal{V}_c| = n$, which is computationally cumbersome for large graphs. To overcome its computational inefficiency, we apply the developed FINGER-$\hat{H}$ and FINGER-$\tilde{H}$ to the computation of JSdist. If each graph $G_t$ in a graph sequence $\{G_t\}_{t=1}^T$ is given, then FINGER-JSdist (Fast) allows fast computation of JSdist and features linear computation complexity inherited from $\hat{H}$. If a graph sequence is presented by sequential graph changes $\{\Delta G_t\}_{t=1}^{T-1}$ such that $G_{t+1} = G_t \oplus \Delta G_t$, then FINGER-JSdist (Incremental) allows online computation of JSdist relative to the incremental graph changes. Their superior performance will be discussed in Section 4.

Algorithm 1 FINGER-JSdist (Fast)
Input: Two graphs $G$ and $G'$ from a graph sequence
Output: $\mathrm{JSdist}(G, G')$
1. Obtain $\bar{G} = \frac{G \oplus G'}{2}$ and compute $\hat{H}(G)$, $\hat{H}(G')$, and $\hat{H}(\bar{G})$ via FINGER-$\hat{H}$ from (1)
2. $\mathrm{JSdist}(G, G') = \big( \hat{H}(\bar{G}) - \frac{1}{2}[\hat{H}(G) + \hat{H}(G')] \big)^{1/2}$

Algorithm 2 FINGER-JSdist (Incremental)
Input: A graph $G$, graph changes $\Delta G$, and $\tilde{H}(G)$
Output: $\mathrm{JSdist}(G, G \oplus \Delta G)$
1. Compute $\tilde{H}(G \oplus \frac{\Delta G}{2})$ and $\tilde{H}(G \oplus \Delta G)$ via FINGER-$\tilde{H}$ from (3) and Theorem 2
2. $\mathrm{JSdist}(G, G \oplus \Delta G) = \big( \tilde{H}(G \oplus \frac{\Delta G}{2}) - \frac{1}{2}[\tilde{H}(G) + \tilde{H}(G \oplus \Delta G)] \big)^{1/2}$

Codes: https://github.com/pinyuchen/FINGER

[Figure 1: panels (a) ER model, (b) BA model, (c) WS model (Watts-Strogatz graphs, $d \in \{6, 10, 20, 50\}$). Panel (c) axes: edge rewiring probability versus graph entropy approximation error (dashed line: $H - \tilde{H}$; solid line: $H - \hat{H}$) and computation time reduction ratio (%) of $\hat{H}$ and $\tilde{H}$ w.r.t. $H$ for $d = 50$.]
Figure 1. Performance evaluation of von Neumann graph entropy approximation in different random graph models with $n = 2000$ nodes under varying average degree $d$ and edge rewiring probability $p_{WS}$. The approximation error of FINGER decays as $d$ increases or $p_{WS}$ decreases. FINGER achieves nearly 100% speed-up relative to the exact entropy computation.
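Equations (1) and (2) and the steps of Algorithm 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the released FINGER code: the function names, the edge-dictionary format (each undirected edge stored once under a consistently ordered key), the fixed iteration count and start vector of the power iteration, and the clamp at zero before the square root are all choices made for this example.

```python
import math

def strengths(n, edges):
    """Nodal strengths s_i from an edge dictionary {(i, j): w_ij}."""
    s = [0.0] * n
    for (i, j), w in edges.items():
        s[i] += w
        s[j] += w
    return s

def quadratic_q(n, edges):
    """Q from Lemma 1 in O(n + m)."""
    s = strengths(n, edges)
    c = 1.0 / sum(s)
    sum_sq_w = sum(w * w for w in edges.values())
    return 1.0 - c * c * (sum(x * x for x in s) + 2.0 * sum_sq_w)

def lambda_max(n, edges, iters=200):
    """Largest eigenvalue of L_N by plain power iteration on L = S - W.

    Each iteration costs O(n + m). The fixed iteration count is an
    assumption; convergence speed depends on the spectral gap.
    """
    s = strengths(n, edges)
    x = [math.sin(i + 1.0) for i in range(n)]  # non-constant start vector
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]
    lam = 0.0
    for _ in range(iters):
        y = [s[i] * x[i] for i in range(n)]  # y = L x, using sparsity
        for (i, j), w in edges.items():
            y[i] -= w * x[j]
            y[j] -= w * x[i]
        lam = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient x^T L x
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return lam / sum(s)  # scale by c = 1 / trace(L)

def h_hat(n, edges):
    """Eq. (1): H-hat = -Q ln(lambda_max)."""
    return -quadratic_q(n, edges) * math.log(lambda_max(n, edges))

def h_tilde(n, edges):
    """Eq. (2): H-tilde = -Q ln(2 c s_max)."""
    s = strengths(n, edges)
    return -quadratic_q(n, edges) * math.log(2.0 * max(s) / sum(s))

def js_dist_fast(n, edges_a, edges_b):
    """Algorithm 1 with H-hat; the averaged graph has W-bar = (W + W') / 2."""
    avg = {}
    for edges in (edges_a, edges_b):
        for e, w in edges.items():
            avg[e] = avg.get(e, 0.0) + 0.5 * w
    div = h_hat(n, avg) - 0.5 * (h_hat(n, edges_a) + h_hat(n, edges_b))
    return math.sqrt(max(div, 0.0))  # clamp: an added guard, not in the source

star = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0}
path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
print(h_hat(4, star), h_tilde(4, star), js_dist_fast(4, star, path))
```

Because $\hat{H}$ is only a lower bound on $H$, the approximate divergence is not guaranteed to be nonnegative, which is why this sketch clamps it before taking the square root.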
[Figure 2: panels (a) ER model (Erdos-Renyi graphs), (b) BA model (Barabasi-Albert graphs), (c) WS model (Watts-Strogatz graphs, $d = 50$). Axes: number of nodes (500 to 5000) versus scaled approximation error and computation time reduction ratio (%). Curves: $d \in \{2, 5, 10, 20, 50, 100, 200\}$ in (a) and (b); $p_{WS}$ ranging from 0 to 1 in (c).]
Figure 2. Scaled approximation error (SAE) and computation time reduction ratio (CTRR) of $\hat{H}$ via FINGER for different random graph models and varying number of nodes $n$. The SAE of ER and WS graphs validates the $o(\ln n)$ approximation error analysis in Corollaries 2 and 3. The CTRR attains nearly 100% speed-up relative to $H$ for moderate-size graphs ($n \ge$ ).
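The incremental update of Theorem 2 and Eq. (3) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the state dictionary, the function names, and the edge-dictionary format are hypothetical, and $\Delta s_i$ is derived here from the edge-weight changes $\Delta w_{ij}$.

```python
import math

def init_state(edges):
    """Compute Q, c, s_max and bookkeeping for G from scratch in O(n + m)."""
    strength = {}
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
    c = 1.0 / sum(strength.values())
    q = 1.0 - c * c * (sum(s * s for s in strength.values())
                       + 2.0 * sum(w * w for w in edges.values()))
    return {"Q": q, "c": c, "s_max": max(strength.values()),
            "strength": strength, "weight": dict(edges)}

def apply_changes(state, delta_edges):
    """Update the state in place for G' = G (+) Delta G; return H-tilde(G').

    `delta_edges` maps edges to weight changes Delta w_ij (negative values
    model weight decreases). The cost is O(Delta n + Delta m).
    """
    strength, weight, c = state["strength"], state["weight"], state["c"]
    delta_s = {}
    for (i, j), dw in delta_edges.items():
        delta_s[i] = delta_s.get(i, 0.0) + dw
        delta_s[j] = delta_s.get(j, 0.0) + dw
    delta_total = sum(delta_s.values())  # Delta S
    delta_q = 0.0
    for i, ds in delta_s.items():
        delta_q += 2.0 * strength.get(i, 0.0) * ds + ds * ds
    for e, dw in delta_edges.items():
        delta_q += 4.0 * weight.get(e, 0.0) * dw + 2.0 * dw * dw
    denom = 1.0 + c * delta_total
    q_new = (state["Q"] - 1.0) / denom ** 2 - (c / denom) ** 2 * delta_q + 1.0
    delta_c = -c * c * delta_total / denom
    # Commit the changes to the stored strengths and weights.
    for i, ds in delta_s.items():
        strength[i] = strength.get(i, 0.0) + ds
    for e, dw in delta_edges.items():
        weight[e] = weight.get(e, 0.0) + dw
    # Rule from Eq. (3): Delta s_max = max(0, max_i (s_i + Delta s_i) - s_max),
    # so the tracked s_max never decreases.
    touched_max = max((strength[i] for i in delta_s), default=state["s_max"])
    delta_s_max = max(0.0, touched_max - state["s_max"])
    state.update(Q=q_new, c=c + delta_c, s_max=state["s_max"] + delta_s_max)
    return -q_new * math.log(2.0 * state["c"] * state["s_max"])

star = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0}
state = init_state(star)
h_new = apply_changes(state, {(1, 2): 2.0, (0, 1): 0.5})
print(state["Q"], state["c"], state["s_max"], h_new)
```

Only the nodes and edges named in `delta_edges` are visited, so the update cost is independent of the size of $G$; comparing the result against `init_state` on the merged graph is a simple consistency check.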
3. Experiments
In this section we conduct extensive experiments on the VNGE of three kinds of synthetic random graphs to study the effects of graph size, average degree, and graph regularity on the approximation error of FINGER and its computational efficiency. The three random graph models are: (i) the Erdos-Renyi (ER) model (Erdös & Rényi, 1959), in which every node pair is connected independently with probability $p_{ER}$; (ii) the Barabasi-Albert (BA) model (Barabási & Albert, 1999), in which the degree distribution follows a power-law distribution; and (iii) the Watts-Strogatz (WS) model (Watts & Strogatz, 1998), an initially regular ring network with independent edge rewiring probability $p_{WS}$ for simulating small-world networks. The parameter $p_{WS}$ controls the regularity of graph connectivity, and smaller $p_{WS}$ gives more regular graphs.

Since $\tilde{H} \le \hat{H} \le H$, the approximation error (AE) is defined as $H - \hat{H}$ and $H - \tilde{H}$, respectively. The scaled approximation error (SAE) is defined as $\frac{\mathrm{AE}}{\ln n}$, which is a proper scaling according to our error analysis in Section 2, and it also makes a fair comparison of graphs with different number of nodes. The computation time reduction ratio (CTRR) is defined as $\frac{\mathrm{Time}(H) - \mathrm{Time}(X)}{\mathrm{Time}(H)}$, where $X \in \{\hat{H}, \tilde{H}\}$ and $\mathrm{Time}(Y)$ denotes the computation time for $Y \in \{H, \hat{H}, \tilde{H}\}$. All experiments (including Section 4) were conducted using Matlab R2016 on a 16-core machine with 128 GB RAM. The results in this section are averaged over 10 random trials. Additional results are reported in the supplementary material.

The effect of average degree $d$ and graph regularity parameter $p_{WS}$. Figures 1 (a) and 1 (b) display the exact and the two approximate VNGE of ER and BA graphs and the corresponding CTRR under varying $d$. When fixing the number of nodes $n$, both $\hat{H}$ and $\tilde{H}$ better match $H$ as $d$ increases, suggesting their AE decays with $d$. Comparing their CTRR, the computation of $\hat{H}$ and $\tilde{H}$ enjoys at least speed-up relative to $H$. The drastic reduction in computation time can be explained by the efficient linear complexity of FINGER, as opposed to the high complexity in computing the entire eigenspectrum for calculating $H$. The CTRR of $\hat{H}$ slightly decays with $d$ due to the growing number of nonzero entries (edges) in $L_N$, resulting in increasing operations for computing $\lambda_{\max}$. Although the AE of $\hat{H}$ is always smaller than that of $\tilde{H}$ due to the fact that $\tilde{H} \le \hat{H} \le H$, the CTRR of $\tilde{H}$ has nearly 100% speed-up relative to $H$ by simply requiring the information of $s_{\max}$ instead of $\lambda_{\max}$ from a graph.

Figure 1 (c) displays the AE and CTRR of $\hat{H}$ and $\tilde{H}$ under varying edge rewiring probability $p_{WS}$ and different average degree $d \in \{6, 10, 20, 50\}$ of the WS model. Similar to ER and BA graphs, when fixing $n$ and $p_{WS}$, the AE of $\hat{H}$ and $\tilde{H}$ decays as $d$ increases. When $n$ and $d$ are fixed, smaller $p_{WS}$ yields less AE for both $\hat{H}$ and $\tilde{H}$, suggesting that FINGER attains better approximation when graphs are more regular. Since the curves of CTRR for different $d$ in the WS model have similar behavior, here we only report the results when $d = 50$. Consistent with the observations in ER and BA graphs, in WS graphs the CTRR of $\hat{H}$ and $\tilde{H}$ achieves nearly 100% improvement relative to $H$, and $\tilde{H}$ attains slightly better CTRR than $\hat{H}$ at the price of larger AE.

The effect of graph size $n$. Figure 2 displays the SAE of FINGER under the three random graph models when varying the number of nodes $n$. Since the results of $\hat{H}$ and $\tilde{H}$ are similar, we show the SAE of $\hat{H}$ in Figure 2 and report the SAE of $\tilde{H}$ in the supplementary material. By the fact that ER and WS graphs have balanced eigenspectrum (Van Mieghem, 2010), for ER and WS models the SAE of both $\hat{H}$ and $\tilde{H}$ decays as $n$ increases, which verifies the $o(\ln n)$ approximation error as stated in Corollaries 2 and 3. On the other hand, the SAE of BA graphs is observed to grow logarithmically in $n$ due to the existence of extreme eigenvalues (imbalanced eigenspectrum) (Van Mieghem, 2010; Goh et al., 2001). Similar to the observations from fixed-size graphs, for a fixed $n$ the SAE decays with $d$ and graph regularity in all cases. In addition, the CTRR attains nearly 100% speed-up relative to $H$ for moderate-size graphs ($n \ge$ ).
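The evaluation metrics defined in this section can be sketched as below. This is an illustrative helper set, not the authors' Matlab code: the function names, the seeded ER sampler, and the `timed` wrapper are assumptions made for the example, and no experimental numbers from the paper are reproduced.

```python
import math
import random
import time

def erdos_renyi(n, p, seed=0):
    """Sample an unweighted ER graph: each node pair is an edge with probability p."""
    rng = random.Random(seed)
    return {(i, j): 1.0
            for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}

def scaled_approx_error(h_exact, h_approx, n):
    """SAE = AE / ln n, with AE = H - H_approx."""
    return (h_exact - h_approx) / math.log(n)

def ctrr(time_exact, time_approx):
    """CTRR = (Time(H) - Time(X)) / Time(H)."""
    return (time_exact - time_approx) / time_exact

def timed(fn, *args):
    """Return (value, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    value = fn(*args)
    return value, time.perf_counter() - start

# Closed-form check on the complete graph K_5: H = ln 4, and Eq. (1) gives
# H-hat = -Q ln(lambda_max) with Q = 3/4 and lambda_max = 1/4.
sae_k5 = scaled_approx_error(math.log(4.0), 0.75 * math.log(4.0), 5)
print(sae_k5)
```

In an actual experiment, `timed` would wrap the exact and approximate entropy routines on the same sampled graph, and the two elapsed times would be passed to `ctrr`.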
4. Applications
Here we apply FINGER to the computation of Jensen-Shannon (JS) distance between graphs (Section 2.5) in two applications and one synthesized dataset and demonstrate its outstanding performance over seven baseline and state-of-the-art methods in terms of efficiency and effectiveness.
Anomaly detection in evolving Wikipedia hyperlink networks.
Wikipedia is an online encyclopedia that allows editing and referencing between articles. By viewing an article as a node and a hyperlink as an edge, the evolution of Wikipedia forms a graph sequence $\{G_t\}_{t=1}^T$ over time. Table 1 summarizes four evolving Wikipedia networks of different language settings collected in (Mislove, 2009; Preusse et al., 2013), where each graph $G_t = (\mathcal{V}_t, \mathcal{E}_t, W_t)$ corresponds to a monthly snapshot of a hyperlink network. These datasets are presented in terms of addition and deletion of nodes or edges with timestamps (i.e., continuous graph changes $\{\Delta G_t\}_{t=1}^{T-1}$), which directly applies to incremental JS distance computation via FINGER (Algorithm 2). Fast JS distance computation via FINGER (Algorithm 1) can also be applied by computing $G_{t+1} = G_t \oplus \Delta G_t$ to obtain $\{G_t\}_{t=1}^T$. The task of anomaly detection is to identify noticeable changes (relative to the bulk network) in the consecutive monthly snapshots of these massive Wikipedia hyperlink networks.

Bifurcation detection in dynamic genomic networks.
The genome-wide chromosome conformation capture (Hi-C) contact maps (Beloqui et al., 2009) for studying cell reprogramming from human fibroblasts to skeletal muscle can be viewed as a graph sequence consisting of 12 sampled spatial measurements, in which the cell reprogramming undergoes a space-time bifurcation at the 6th measurement as verified in (Liu et al., 2018a). The task is to identify this bifurcation instance based on the dynamic Hi-C contact maps. Additional descriptions of this dataset are given in the supplementary material.
Evaluation.
We note that there are two major differences be-tween these two applications: (i) unweighted v.s. weightedgraphs and (ii) with v.s. without ground truth.In the Wikipedia case (unweighted graphs), our main goalis to use these large datasets to demonstrate the efficientcomputation of JS distance via FINGER owing to its lin-ear complexity. Additionally, since there are no labelsfor verifying the detected changes, we conduct an ex postfacto correlation analysis using an explicit and explainableanomaly metric – the vertex/edge overlapping (VEO) score(Papadimitriou et al., 2010). VEO is a properly normal-ized metric reflecting topological differences between twounweighted graphs, defined as − |V t ∩V t +1 | + |E t ∩E t +1 | ) |V t | + |V t +1 | + |E t | + |E t +1 | ,which is between [0 , and relates to the SorensenDice co-efficient (Dice, 1945; Sørensen, 1948) for comparing thesimilarity of two samples. In the Wikipedia experiments, ahigh VEO score directly pinpoints the month when articlesare edited by a relatively significant amount. Consequently,VEO can be used as an anomaly proxy for ex post factoanalysis in our setting.In the genome case (weighted graphs), the ground-truthbifurcation instance was verified. Moreover, unlike theWikipedia case, the genome dataset contains nonnegativeedge weights indicating cell interaction strengths. Therefore,in this case VEO is not an appropriate anomaly proxy be-cause by definition it is insensitive to edge weight changes. Comparative methods.
We compare the proposed method with the following baseline methods:
• DeltaCon (Koutra et al., 2016): DeltaCon uses the idea of fast belief propagation to compute graph similarity and outputs a similarity score $\mathrm{Sim}_{DC} \in [0, 1]$. We use $1 - \mathrm{Sim}_{DC}$ as the anomaly score.
• RMD (Koutra et al., 2016): RMD is the Matusita distance deduced from DeltaCon, which is defined as $\frac{1}{\mathrm{Sim}_{DC}} - 1$.
• λ distance (Bunke et al., 2007; Wilson & Zhu, 2008): The Euclidean distance between two sets of top $k$ eigenvalues of a matrix. Here we use the weight matrix $\mathbf{W}$ (Adj.) and the graph Laplacian matrix $\mathbf{L}$ (Lap.), and set $k = 6$.
• GED (Bunke et al., 2007): The graph edit distance (GED) for undirected unweighted graphs is the number of operations (node/edge additions and removals) required to convert a graph $G_t$ to another graph $G_{t+1}$.
• VNGE-NL (Han et al., 2012) / VNGE-GL (Ye et al., 2014): Two VNGE heuristics using the normalized/generalized graph Laplacian matrix. Unlike FINGER, they lack an approximation error guarantee.

Table 1. Summary of four evolving Wikipedia hyperlink networks.
Datasets (graph sequence) maximum

Table 2. Computation time (seconds) and Pearson correlation coefficient (PCC) between the anomaly proxy and different methods. FINGER attains the best PCC and time efficiency. The Spearman's rank correlation analysis is given in Table S1 of the supplement.
Datasets FINGER-JS (Fast) FINGER-JS (Inc.) DeltaCon RMD λ dist. (Adj.) λ dist. (Lap.) GED VNGE-NL VNGE-GL
Wiki(sEN) PCC
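As a concrete illustration of the anomaly proxy behind the PCC comparison in Table 2, the VEO score from the evaluation paragraph can be computed with plain set operations. The sketch below is illustrative only: the function name `veo_score` and the node-list/edge-list input format are assumptions, not part of the original experiments.

```python
def veo_score(nodes_t, edges_t, nodes_next, edges_next):
    """VEO dissimilarity between two unweighted, undirected graphs.

    Returns 0.0 for identical graphs and 1.0 for graphs sharing no node or edge.
    """
    v_t, v_next = set(nodes_t), set(nodes_next)
    # Store undirected edges as frozensets so that (u, v) and (v, u) coincide.
    e_t = {frozenset(e) for e in edges_t}
    e_next = {frozenset(e) for e in edges_next}

    overlap = len(v_t & v_next) + len(e_t & e_next)
    total = len(v_t) + len(v_next) + len(e_t) + len(e_next)
    if total == 0:
        return 0.0  # two empty graphs are treated as identical (assumption)
    return 1.0 - 2.0 * overlap / total


# One of two edges is rewired between consecutive snapshots.
print(veo_score([1, 2, 3], [(1, 2), (2, 3)], [1, 2, 3], [(2, 1), (1, 3)]))
```

Because VEO only looks at which nodes and edges are present, rescaling edge weights leaves the score unchanged, which is why it is not used as a proxy in the weighted genome case.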
Wikipedia results.
We compute the dissimilarity metrics of each method and compare them with the anomaly proxy in terms of the Pearson correlation coefficient (PCC). A higher PCC suggests a better match to the anomaly proxy for detecting abnormal monthly edit changes relative to the bulk network. The PCC and computation time of each method are reported in Table 2. For illustration, the dissimilarity metrics of Wikipedia-EN are shown in Figure 3. The plots of the other Wikipedia networks are given in the supplementary material. The statistics of the anomaly proxy meet the intuition that in the earlier stage the monthly evolution of Wikipedia is more drastic, and in the later stage it becomes stable (i.e., less anomalous) since the changes are subtle relative to the entire network. In Table 2, FINGER-JSdist (Fast) attains the best PCC (0.9029) and competitive computation time. This suggests that the computation of JS distance can be made efficient by FINGER, and its ex post facto analysis is highly correlated with the anomaly proxy. For example, in Figure 3 their top 10 flagged anomalies have 9 months in common. On the other hand, the other dissimilarity metrics are either implicitly defined, unnormalized or lacking approximation guarantees, making the detected anomalies less explainable. FINGER-JSdist (Incremental) has the least computation time by leveraging online computation, and it achieves the second best PCC due to the looser approximation error of $\widetilde{H}$ compared with $\widehat{H}$. Nonetheless, FINGER-JSdist (Incremental) is roughly 3 times faster than GED, 20 times faster than VNGE-GL, 50 times faster than FINGER-JSdist (Fast), 100 times faster than DeltaCon, RMD and VNGE-NL, and 200-300 times faster than λ distance. In addition to PCC, we also report the rank correlation coefficients in the supplementary material to show the high correlation between FINGER and the anomaly proxy.

As discussed in the “Evaluation” paragraph, the main purpose of the Wikipedia experiments (without ground truths) is to show the efficiency in fast JS distance computation of large real-world graphs, enabled by FINGER. Additionally, our ex post facto analysis shows high correlation of FINGER with an explainable anomaly proxy. Beyond efficiency, we use the next two sets of experiments (with ground truths) to demonstrate the effectiveness of FINGER.

Figure 3. Dissimilarity (anomaly) metrics of consecutive monthly Wikipedia-English hyperlink networks. The ex post facto analysis shows FINGER-JSdist (Fast) is highly correlated with the anomaly proxy (0.9029 PCC in Table 2 and 0.7973 SRCC in Table S1). FINGER-JSdist (Incremental) has efficient computation time and attains the second best PCC and SRCC among all methods.

Table 3. Detection rate on synthesized anomalous events in the dynamic communication networks.
DoS attack (X%) FINGER-JS (Fast) FINGER-JS (Inc.) DeltaCon RMD λ dist. (Adj.) λ dist. (Lap.) GED VNGE-NL VNGE-GL
1%
24%
10% 14% 14% 10%
14% 22% 22%
3%
62% 58% 58% 12% 23% 36% 39% 39%
5%
90% 90%
12% 28% 41% 67% 67%
10%
91% 91% 91% 91% 91% 91%
91% 91%

Figure 4. Bifurcation detection of cell reprogramming in dynamic genomic networks via the temporal difference score (TDS) of different methods (y-axis). The red squares indicate the detected bifurcation points. Among all the compared methods, FINGER-JSdist (Algorithm 1) is the only method that correctly detects the ground-truth bifurcation point (index 6), and its TDS resembles the shape of the ground-truth statistic.
Bifurcation detection results.
Using the ground-truth statistic provided by (Liu et al., 2018a), we compare the performance of detecting the critical bifurcation point by each method. Let $\theta_{t,t'}$ denote a dissimilarity metric between two graphs $G_t$ and $G_{t'}$ from $\{G_t\}_{t=1}^{T}$. For each method, the temporal difference score (TDS) proposed in (Liu et al., 2018a) is used for bifurcation detection, which is defined as $\mathrm{TDS}(t) = \frac{1}{2}\left[\theta_{t,t-1} + \theta_{t,t+1}\right]$ when $t \in \{2, \ldots, T-1\}$, with $\mathrm{TDS}(1) = \theta_{1,2}$ and $\mathrm{TDS}(T) = \theta_{T,T-1}$. The measurement(s) corresponding to a local minimum in TDS is detected as a bifurcation instance. The ground-truth statistic and TDS of each method are shown in Figure 4. Among all the compared methods, FINGER-JSdist (Algorithm 1) is the only method that correctly detects the bifurcation point (index 6), and its TDS based on JS distance also resembles the shape of the ground-truth statistic.

Synthesized anomaly detection results.
For further validation, we use another real-world dynamic peering network dataset at the autonomous system (AS) level (the Oregon-1 dataset (Leskovec et al., 2005)) to synthesize anomalous connectivity patterns that mimic denial-of-service (DoS) attacks. Here each graph represents the router connectivity over a certain time period, leading to 9 such graphs. We synthesize anomalous events by first selecting one graph from the first 8 graphs at random, and then connecting $X\%$ of nodes to a randomly chosen node in the selected graph. This synthesized connection pattern mimics that of a DoS attack, in which multiple nodes (e.g., a botnet) aim to connect to the target node simultaneously. The task is to detect this synthesized anomalous event by comparing the dissimilarity metric between consecutive graphs. Table 3 reports the detection rate of different methods, where the detection rate is defined as the fraction of 100 random instances in which the anomalous event appears in the top-2 ranking based on a dissimilarity metric. Tested on $X \in \{1, 3, 5, 10\}\%$, FINGER-JS (Fast) consistently attains the best detection rate among all methods, suggesting the stability and superiority of the proposed method. On the other hand, the compared methods are not as robust as FINGER. Notably, when $X$ is small (i.e., the more challenging case for detection as the attack becomes stealthier), the detection performance of FINGER is better than that of the other methods. As $X$ becomes large, which means the DoS attack pattern is more apparent, the detection performance of all methods becomes similar.
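The two ground-truth evaluations above can be sketched in a few lines of Python. The block assumes a symmetric dissimilarity (so that the score between $G_t$ and $G_{t-1}$ equals the one between $G_{t-1}$ and $G_t$, as holds for the JS distance) supplied as a list of consecutive-graph scores, and that the interior TDS averages its two neighboring dissimilarities. The helper names (`temporal_difference_score`, `local_minima`, `inject_dos`), the rounding of $X\%$ to a node count, and the exclusion of the target from the attacker set are all assumptions made for illustration.

```python
import random


def temporal_difference_score(theta):
    """TDS for T graphs, where theta[t] is the dissimilarity between
    graphs t and t + 1 (0-based), so len(theta) == T - 1."""
    num_graphs = len(theta) + 1
    tds = []
    for t in range(num_graphs):
        if t == 0:                       # first graph: only a successor exists
            tds.append(theta[0])
        elif t == num_graphs - 1:        # last graph: only a predecessor exists
            tds.append(theta[-1])
        else:                            # interior: average over both neighbors
            tds.append(0.5 * (theta[t - 1] + theta[t]))
    return tds


def local_minima(scores):
    """Indices of strict interior local minima (candidate bifurcation points)."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] < scores[t - 1] and scores[t] < scores[t + 1]]


def inject_dos(nodes, edges, percent, rng):
    """Connect `percent`% of the nodes to one randomly chosen target node."""
    nodes = list(nodes)
    target = rng.choice(nodes)
    candidates = [v for v in nodes if v != target]
    num_attackers = min(len(candidates), round(len(nodes) * percent / 100))
    attackers = rng.sample(candidates, num_attackers)
    attacked = {frozenset(e) for e in edges}          # undirected edge set
    attacked.update(frozenset((a, target)) for a in attackers)
    return attacked, target, attackers


tds = temporal_difference_score([4.0, 2.0, 2.0, 6.0])
print(tds, local_minima(tds))
```

Any dissimilarity can be plugged in to produce `theta`; the sketch deliberately leaves the metric itself (for example a JS distance) outside the block.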
5. Conclusion
In this paper, we proposed FINGER, a novel framework for efficiently computing von Neumann graph entropy (VNGE). FINGER reduces the computation of VNGE from cubic complexity to linear complexity for a given graph, and allows online computation based on incremental graph changes. In addition to bounded approximation error, our theory shows that FINGER is guaranteed to have asymptotic equivalence to the exact VNGE under mild conditions, which has been validated by extensive experiments on three different random graph models. The high efficiency of FINGER also leads to scalable network learning algorithms for computing Jensen-Shannon distance between graphs. Furthermore, we use two domain-specific applications and one synthesized dataset to corroborate the efficiency and effectiveness of FINGER compared to 7 baseline graph similarity methods. The results demonstrate the power of FINGER in tackling large network analysis and (unsupervised) learning problems in different domains. Our future work includes extension to directed graphs and negative edge weights.
Acknowledgment
Pin-Yu Chen, Lingfei Wu and Sijia Liu acknowledge the support from MIT-IBM Watson AI Lab. Indika Rajapakse is supported in part by the Lifelong Learning Machines program from DARPA/MTO.
References
Akoglu, L., Tong, H., and Koutra, D. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3):626–688, 2015.
Anand, K. and Bianconi, G. Entropy measures for networks: Toward an information theory of complex topologies. Physical Review E, 80(4):045102, 2009.
Anand, K., Bianconi, G., and Severini, S. Shannon and von Neumann entropy of random networks with heterogeneous expected degree. Physical Review E, 83(3):036109, 2011.
Anderson Jr, W. N. and Morley, T. D. Eigenvalues of the Laplacian of a graph. Linear and Multilinear Algebra, 18(2):141–145, 1985.
Bai, L. and Hancock, E. R. Depth-based complexity traces of graphs. Pattern Recognition, 47(3):1172–1186, 2014.
Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., and van der Vorst, H. Templates for the solution of algebraic eigenvalue problems: a practical guide. SIAM, 2000.
Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286(5439):509–512, October 1999.
Beloqui, A., Guazzaroni, M.-E., Pazos, F., Vieites, J. M., Godoy, M., Golyshina, O. V., Chernikova, T. N., Waliczek, A., Silva-Rocha, R., Al-ramahi, Y., et al. Reactome array: forging a link between metabolome and genome. Science, 326(5950):252–257, 2009.
Braunstein, S. L., Ghosh, S., and Severini, S. The Laplacian of a graph as a density matrix: a basic combinatorial approach to separability of mixed states. Annals of Combinatorics, 10(3):291–317, 2006.
Briët, J. and Harremoës, P. Properties of classical and quantum Jensen-Shannon divergence. Physical Review A, 79(5):052311, 2009.
Bunke, H., Dickinson, P. J., Kraetzl, M., and Wallis, W. D. A graph-theoretic approach to enterprise network dynamics, volume 24. Springer Science & Business Media, 2007.
Chen, P.-Y. and Hero, A. O. Node removal vulnerability of the largest component of a network. In IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 587–590, 2013.
Chen, P.-Y., Choudhury, S., and Hero, A. O. Multi-centrality graph spectral decompositions and their application to cyber intrusion detection. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4553–4557, 2016.
Chung, F. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1–19, 2005.
De Domenico, M., Nicosia, V., Arenas, A., and Latora, V. Structural reducibility of multilayer networks. Nature Communications, 6, 2015.
Del Vecchio, D., Abdallah, H., Qian, Y., and Collins, J. J. A blueprint for a synthetic genetic feedback controller to reprogram cell fate. Cell Systems, 2017.
Dice, L. R. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
Du, W., Li, X., Li, Y., and Severini, S. A note on the von Neumann entropy of random graphs. Linear Algebra and its Applications, 433(11-12):1722–1725, 2010.
Endres, D. M. and Schindelin, J. E. A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7):1858–1860, 2003.
Erdős, P. and Rényi, A. On random graphs, I. Publicationes Mathematicae (Debrecen), 6:290–297, 1959.
Fiedler, M. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.
Goh, K.-I., Kahng, B., and Kim, D. Spectra and eigenvectors of scale-free networks. Physical Review E, 64(5):051903, 2001.
Han, L., Escolano, F., Hancock, E. R., and Wilson, R. C. Graph characterizations from von Neumann entropy. Pattern Recognition Letters, 33(15):1958–1967, 2012.
Horn, R. A. and Johnson, C. R. Matrix Analysis. Cambridge University Press, 1990.
Kalofolias, V. How to learn a graph from smooth signals. In International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 920–929, 2016.
Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. ICLR, 2017.
Koutra, D., Shah, N., Vogelstein, J. T., Gallagher, B., and Faloutsos, C. DeltaCon: Principled massive-graph similarity function with attribution. ACM Transactions on Knowledge Discovery from Data, 10(3):28, 2016.
Leskovec, J., Kleinberg, J., and Faloutsos, C. Graphs over time: densification laws, shrinking diameters and possible explanations. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), pp. 177–187, 2005.
Li, A. and Pan, Y. Structural information and dynamical complexity of networks. IEEE Transactions on Information Theory, 62(6):3290–3339, 2016.
Li, Z., Mucha, P. J., and Taylor, D. Network-ensemble comparisons with stochastic rewiring and von Neumann entropy. SIAM Journal on Applied Mathematics, 78(2):897–920, 2018.
Liao, R., Zhao, Z., Urtasun, R., and Zemel, R. S. LanczosNet: Multi-scale deep graph convolutional networks. In International Conference on Learning Representations (ICLR), 2019.
Liu, S., Chen, H., Ronquist, S., Seaman, L., Ceglia, N., Meixner, W., Chen, P.-Y., Higgins, G., Baldi, P., Smale, S., et al. Genome architecture mediates transcriptional control of human myogenic reprogramming. iScience, 6:232–246, 2018a.
Liu, S., Chen, P.-Y., Hero, A., and Rajapakse, I. Dynamic network analysis of the 4D nucleome. bioRxiv, pp. 268318, 2018b.
Luo, D., Huang, H., Nie, F., and Ding, C. H. Forging the graphs: A low rank and positive semidefinite graph learning approach. In Advances in Neural Information Processing Systems, pp. 2960–2968, 2012.
Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, December 2007.
Merris, R. Laplacian matrices of graphs: a survey. Linear Algebra and its Applications, 197-198:143–176, 1994.
Mislove, A. E. Online social networks: measurement, analysis, and applications to distributed information systems. PhD thesis, Rice University, 2009.
Papadimitriou, P., Dasdan, A., and Garcia-Molina, H. Web graph similarity for anomaly detection. Journal of Internet Services and Applications, 1(1):19–30, 2010.
Passerini, F. and Severini, S. The von Neumann entropy of networks. arXiv preprint arXiv:0812.2597, 2008.
Passerini, F. and Severini, S. Quantifying complexity in networks: the von Neumann entropy. International Journal of Agent Technologies and Systems (IJATS), 1(4):58–67, 2009.
Preusse, J., Kunegis, J., Thimm, M., Staab, S., and Gottron, T. Structural dynamics of knowledge networks. In International AAAI Conference on Weblogs and Social Media, 2013.
Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C., and Samatova, N. F. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, 7(3):223–247, 2015.
Seaman, L., Chen, H., Brown, M., Wangsa, D., Patterson, G., Camps, J., Omenn, G. S., Ried, T., and Rajapakse, I. Nucleome analysis reveals structure-function relationships for colon cancer. Molecular Cancer Research, pp. molcanres–0374, 2017.
Sharpnack, J. L., Krishnamurthy, A., and Singh, A. Near-optimal anomaly detection in graphs using Lovasz extended scan statistic. In Advances in Neural Information Processing Systems, pp. 1959–1967, 2013.
Shetty, J. and Adibi, J. Discovering important nodes through graph entropy the case of Enron email database. In Proceedings of the 3rd international workshop on Link discovery, pp. 74–81. ACM, 2005.
Shi, J. and Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905, 2000.
Shivanna, R. and Bhattacharyya, C. Learning on graphs using orthonormal representation is statistically consistent. In Advances in Neural Information Processing Systems, pp. 3635–3643, 2014.
Shuman, D., Narang, S., Frossard, P., Ortega, A., and Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag., 30(3):83–98, 2013.
Simonyi, G. Graph entropy: A survey. Combinatorial Optimization, 20:399–441, 1995.
Sørensen, T. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Kongelige Danske Videnskabernes Selskab, 5:1–34, 1948.
Van Mieghem, P. Graph Spectra for Complex Networks. Cambridge University Press, 2010.
Von Neumann, J. Mathematical foundations of quantum mechanics. Number 2. Princeton University Press, 1955.
Wang, Y., Wang, Y.-X., and Singh, A. Graph connectivity in noisy sparse subspace clustering. In Artificial Intelligence and Statistics, pp. 538–546, 2016.
Watts, D. J. and Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, June 1998.
Weintraub, H. The MyoD family and myogenesis: redundancy, networks, and thresholds. Cell, 75(7):1241–1244, 1993.
Weintraub, H., Tapscott, S. J., Davis, R. L., Thayer, M. J., Adam, M. A., Lassar, A. B., and Miller, A. D. Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD. Proceedings of the National Academy of Sciences, 86(14):5434–5438, 1989.
Wilson, R. C. and Zhu, P. A study of graph spectra for comparing graphs and trees. Pattern Recognition, 41(9):2833–2841, 2008.
Wu, L., Romero, E., and Stathopoulos, A. Primme SVDS: A high-performance preconditioned SVD solver for accurate large-scale computations. SIAM Journal on Scientific Computing, 39(5):S248–S271, 2017.
Wu, L., Chen, P.-Y., Yen, I. E.-H., Xu, F., Xia, Y., and Aggarwal, C. Scalable spectral clustering using random binning features. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2506–2515, 2018a.
Wu, L., Yen, I. E.-H., Xu, F., Ravikumar, P., and Witbrock, M. D2KE: From distance to kernel and embedding. arXiv preprint arXiv:1802.04956, 2018b.
Xu, K., Wu, L., Wang, Z., Feng, Y., Witbrock, M., and Sheinin, V. Graph2seq: Graph to sequence learning with attention-based neural networks. arXiv preprint arXiv:1804.00823, 2018.
Yanardag, P. and Vishwanathan, S. A structural smoothing framework for robust graph comparison. In Advances in Neural Information Processing Systems, pp. 2134–2142, 2015.
Ye, C., Wilson, R. C., Comin, C. H., Costa, L. d. F., and Hancock, E. R. Approximate von Neumann entropy for directed graphs. Physical Review E, 89(5):052804, 2014.
Supplementary Material
A. Proof of Lemma 1
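Before the derivation, a small numerical check may help fix the notation. Lemma 1 expresses the quadratic approximation $Q = \sum_i \lambda_i (1 - \lambda_i)$ through nodal strengths and edge weights only. The sketch below compares that closed form with the value obtained from the eigenvalues of the trace-normalized Laplacian on a random weighted graph. It assumes NumPy and a dense symmetric weight matrix `W` with zero diagonal; the function names are illustrative and not taken from the paper.

```python
import numpy as np


def q_from_eigenvalues(W):
    """Q = sum_i lambda_i (1 - lambda_i), from the spectrum of L_N = L / trace(L)."""
    L = np.diag(W.sum(axis=1)) - W            # combinatorial Laplacian L = S - W
    lam = np.linalg.eigvalsh(L / np.trace(L))
    return float(np.sum(lam * (1.0 - lam)))


def q_from_lemma1(W):
    """Q = 1 - c^2 (sum_i s_i^2 + 2 sum_{(i,j) in E} w_ij^2), with c = 1 / sum_i s_i."""
    s = W.sum(axis=1)                         # nodal strengths s_i
    c = 1.0 / s.sum()                         # c = 1 / trace(L)
    sum_w2 = np.sum(np.triu(W, k=1) ** 2)     # each undirected edge counted once
    return float(1.0 - c ** 2 * (np.sum(s ** 2) + 2.0 * sum_w2))


rng = np.random.default_rng(0)
A = np.triu(rng.random((30, 30)) * (rng.random((30, 30)) < 0.2), k=1)
W = A + A.T                                   # symmetric weights, zero diagonal
print(q_from_eigenvalues(W), q_from_lemma1(W))
```

The second function never forms an eigendecomposition, which is the source of the linear-time claim: it needs one pass over the node strengths and one pass over the edge weights.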
For any real $x$ such that $0 < x < 1$, it is easy to show that the Taylor series expansion of $-x \ln x$ at $1$ is $\sum_{z=1}^{\infty} \frac{(-1)^z}{z} x (x-1)^z$. Applying this result to the term $-\lambda_i \ln \lambda_i$ in $H$ and taking the quadratic approximation of the series expansion (i.e., keeping only the $z = 1$ term) gives

$$Q = \sum_{i=1}^{n} \lambda_i (1 - \lambda_i) = 1 - \sum_{i=1}^{n} \lambda_i^2 \tag{S1}$$

since by definition $\sum_{i=1}^{n} \lambda_i = \mathrm{trace}(\mathbf{L}_N) = 1$. The term $\sum_{i=1}^{n} \lambda_i^2$ in (S1) can be expressed as

$$\sum_{i=1}^{n} \lambda_i^2 = \mathrm{trace}(\mathbf{L}_N^2) \tag{S2}$$

$$= \sum_{i=1}^{n} \sum_{j=1}^{n} [\mathbf{L}_N]_{ij} [\mathbf{L}_N]_{ji} \tag{S3}$$

$$\overset{(a)}{=} \sum_{i=1}^{n} \sum_{j=1}^{n} [\mathbf{L}_N]_{ij}^2 \overset{(b)}{=} c^2 \left( \sum_{i=1}^{n} [\mathbf{L}]_{ii}^2 + \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} [\mathbf{L}]_{ij}^2 \right) \tag{S4}$$

$$\overset{(c)}{=} c^2 \left( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \right), \tag{S5}$$

where (a) is due to the matrix symmetry of $\mathbf{L}_N$, (b) is due to the definition that $\mathbf{L}_N = c \cdot \mathbf{L}$, and (c) is due to the definition of $\mathbf{L}$ such that $[\mathbf{L}]_{ii} = s_i$, and $[\mathbf{L}]_{ij} = -w_{ij}$ when $(i,j) \in \mathcal{E}$ and $[\mathbf{L}]_{ij} = 0$ otherwise. Furthermore, define

$$S = \mathrm{trace}(\mathbf{L}) = \sum_{i=1}^{n} [\mathbf{L}]_{ii} = \sum_{i \in \mathcal{V}} s_i = 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}. \tag{S6}$$

Using the relation $c = \frac{1}{\mathrm{trace}(\mathbf{L})}$, we obtain the expression $Q = 1 - c^2 \left( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \right)$, where $c = \frac{1}{S}$ and $S = \sum_{i \in \mathcal{V}} s_i = 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}$.

B. Proof of Theorem 1
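The two-sided bound established in this proof, $-Q \frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} \le H \le -Q \frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}$ with $\lambda_{\min}$ the smallest positive eigenvalue, is easy to probe numerically. The following sketch assumes NumPy, a dense symmetric weight matrix, and a small tolerance `tol` for deciding which eigenvalues count as zero; the tolerance and the function name are assumptions made for illustration.

```python
import numpy as np


def vnge_with_bounds(W, tol=1e-12):
    """Return (lower, H, upper) for the exact VNGE H of the graph with weights W.

    Requires lambda_max < 1, i.e. a connected subgraph with at least 3 nodes.
    """
    L = np.diag(W.sum(axis=1)) - W
    lam = np.linalg.eigvalsh(L / np.trace(L))
    pos = lam[lam > tol]                      # keep positive eigenvalues only
    H = -np.sum(pos * np.log(pos))            # exact VNGE
    Q = np.sum(pos * (1.0 - pos))             # quadratic approximation
    lam_min, lam_max = pos.min(), pos.max()
    lower = -Q * np.log(lam_max) / (1.0 - lam_min)
    upper = -Q * np.log(lam_min) / (1.0 - lam_max)
    return float(lower), float(H), float(upper)


# Complete graph with identical edge weight x: both bounds should meet ln(n - 1).
n, x = 6, 2.5
complete = x * (np.ones((n, n)) - np.eye(n))
print(vnge_with_bounds(complete), np.log(n - 1))
```

On the complete graph all positive eigenvalues coincide, so the lower bound, the exact entropy, and the upper bound agree up to floating-point error, which matches the tightness statement at the end of the proof.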
The assumption $\lambda_{\max} < 1$ implies $0 < \lambda_i \le \lambda_{\max} < 1$ for all nonzero eigenvalues $\lambda_i$. Following the definition of $H$, we can rewrite $H$ as

$$H = -\sum_{i=1}^{n} \lambda_i \ln \lambda_i \tag{S7}$$

$$= -\sum_{i: \lambda_i > 0} \lambda_i \ln \lambda_i \tag{S8}$$

$$= -\sum_{i: \lambda_i > 0} \lambda_i (1 - \lambda_i) \frac{\ln \lambda_i}{1 - \lambda_i}. \tag{S9}$$

Since for all $\lambda_i > 0$, $\ln \lambda_{\min} \le \ln \lambda_i \le \ln \lambda_{\max} < 0$ and $0 < 1 - \lambda_{\max} \le 1 - \lambda_i \le 1 - \lambda_{\min} < 1$, we obtain the relation

$$-\frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} \le -\frac{\ln \lambda_i}{1 - \lambda_i} \le -\frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}. \tag{S10}$$

Using $Q = \sum_{i=1}^{n} \lambda_i (1 - \lambda_i) = \sum_{i: \lambda_i > 0} \lambda_i (1 - \lambda_i)$ in (S1) and applying (S10) to (S9) yields

$$-Q \frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} \le H \le -Q \frac{\ln \lambda_{\min}}{1 - \lambda_{\max}}. \tag{S11}$$

When $G$ is a complete graph with identical edge weight $x > 0$, it can be shown that $\mathbf{L}$ has 1 eigenvalue at $0$ and $n - 1$ identical eigenvalues at $nx$ (Merris, 1994). Since the trace normalization constant is $c = \frac{1}{\mathrm{trace}(\mathbf{L})} = \frac{1}{(n-1)nx}$, the eigenvalues of $\mathbf{L}_N = c \cdot \mathbf{L}$ are $\lambda_n = 0$ and $\lambda_i = \frac{nx}{(n-1)nx} = \frac{1}{n-1}$ for all $1 \le i \le n - 1$, which implies $H = \ln(n-1)$. It is easy to see that in this case $Q = 1 - \frac{1}{n-1} = 1 - \lambda_{\min} = 1 - \lambda_{\max}$ and $-\ln \lambda_{\max} = -\ln \lambda_{\min} = \ln(n-1)$. Consequently, the bounds in (S11) become exact and $H = \ln(n-1)$ when $G$ is a complete graph with identical edge weight.

C. On the condition $\lambda_{\max} < 1$ in Theorem 1

Here we show that the condition $\lambda_{\max} < 1$ is always satisfied by any graph $G \in \mathcal{G}$ having a connected subgraph with at least 3 nodes. By definition, $\lambda_{\max} \le 1$ since it is the largest eigenvalue of the scaled matrix $\mathbf{L}_N = \mathbf{L} / \mathrm{trace}(\mathbf{L})$. Since any connected subgraph with at least 3 nodes will contribute at least 2 positive eigenvalues of $\mathbf{L}_N$ (Van Mieghem, 2010; Chen & Hero, 2013) and all eigenvalues of $\mathbf{L}_N$ sum to 1, we have $\lambda_{\max} < 1$.

D. Proof of Corollary 1
Since $\sum_{i=1}^{n} \lambda_i = 1$, the condition $\lambda_{\min} = \Omega(\lambda_{\max})$ implies $\lambda_{\max}$ and $\lambda_{\min}$ are of the same order $\frac{1}{n_+}$, where $n_+$ is the number of positive eigenvalues of $\mathbf{L}_N$. When the condition $n_+ = \Omega(n)$ also holds, then $\lambda_{\max} = \frac{a}{n}$ and $\lambda_{\min} = \frac{b}{n}$ for some constants $a, b$ such that $a \ge b > 0$, and we obtain

$$\lim_{n \to \infty} -\frac{1}{\ln n} \cdot \frac{\ln \lambda_{\max}}{1 - \lambda_{\min}} = \lim_{n \to \infty} \frac{1}{\ln n} \cdot \frac{\ln n - \ln a}{1 - \frac{b}{n}} = 1. \tag{S12}$$

Similarly,

$$\lim_{n \to \infty} -\frac{1}{\ln n} \cdot \frac{\ln \lambda_{\min}}{1 - \lambda_{\max}} = 1. \tag{S13}$$

Taking the limit of $\frac{H}{\ln n}$ and applying (S12) and (S13) to the bounds in (S11), we obtain

$$\lim_{n \to \infty} \left( \frac{H}{\ln n} - Q \right) = 0, \tag{S14}$$

which completes the proof.

E. Proof of Corollary 2
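Following the proof of Corollary 1, if $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, then $\lambda_{\max} = \frac{a}{n}$ and $\lambda_{\min} = \frac{b}{n}$ for some constants $a, b$ such that $a \ge b > 0$. We have

$$\lim_{n \to \infty} \frac{H - \widehat{H}}{\ln n} = \lim_{n \to \infty} \left( \frac{H}{\ln n} - Q + Q - \frac{\widehat{H}}{\ln n} \right) \tag{S15}$$

$$\overset{(a)}{=} \lim_{n \to \infty} \left( Q - \frac{\widehat{H}}{\ln n} \right) \tag{S16}$$

$$\overset{(b)}{=} \lim_{n \to \infty} \left( Q - Q \cdot \frac{\ln n - \ln a}{\ln n} \right) \tag{S17}$$

$$= 0, \tag{S18}$$

where (a) uses (S14) and (b) uses the definition of $\widehat{H}$ in (1) and $\lambda_{\max} = \frac{a}{n}$. This implies that the approximation error $H - \widehat{H}$, scaled by $\ln n$, vanishes as $n \to \infty$. That is, $H - \widehat{H} = o(\ln n)$.

F. Proof of Corollary 3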
Let $\mu_{\max}$ denote the largest eigenvalue of the graph Laplacian matrix $\mathbf{L}$ of a graph $G \in \mathcal{G}$. Then it is known that $\frac{n}{n-1} s_{\max} \le \mu_{\max} \le 2 s_{\max}$, where the lower bound is proved in (Fiedler, 1973) and the upper bound is proved in (Anderson Jr & Morley, 1985). These bounds suggest that $\mu_{\max}$ has asymptotically the same order as $s_{\max}$. Moreover, since by definition $\mathbf{L}_N = c \cdot \mathbf{L}$, it holds that $\lambda_{\max} = c \cdot \mu_{\max}$ and hence $\lambda_{\max} = O(c \cdot s_{\max})$. Following the proof of Corollary 1, if $n_+ = \Omega(n)$ and $\lambda_{\min} = \Omega(\lambda_{\max})$, then $\lambda_{\max} = \frac{a}{n}$ and $\lambda_{\min} = \frac{b}{n}$ for some constants $a, b$ such that $a \ge b > 0$, and $c \cdot s_{\max} = \frac{\gamma}{n}$ for some $\gamma > 0$ since $\lambda_{\max} = O(c \cdot s_{\max})$. Similar to the proof of Corollary 2, we have

$$\lim_{n \to \infty} \frac{H - \widetilde{H}}{\ln n} = \lim_{n \to \infty} \left( \frac{H}{\ln n} - Q + Q - \frac{\widetilde{H}}{\ln n} \right) \tag{S19}$$

$$\overset{(a)}{=} \lim_{n \to \infty} \left( Q - \frac{\widetilde{H}}{\ln n} \right) \tag{S20}$$

$$\overset{(b)}{=} \lim_{n \to \infty} \left( Q - Q \cdot \frac{\ln n - \ln \gamma}{\ln n} \right) \tag{S21}$$

$$= 0, \tag{S22}$$

where (a) uses (S14) and (b) uses the definition of $\widetilde{H}$ in (2) and $c \cdot s_{\max} = \frac{\gamma}{n}$. This implies that the approximation error $H - \widetilde{H}$, scaled by $\ln n$, vanishes as $n \to \infty$. That is, $H - \widetilde{H} = o(\ln n)$.

G. Proof of Theorem 2
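The incremental update that this proof arrives at, $Q' = \frac{Q - 1}{(1 + c \Delta S)^2} - \left( \frac{c}{1 + c \Delta S} \right)^2 \Delta Q + 1$, can be checked against direct recomputation. The sketch below assumes NumPy and, purely for readability, dense symmetric matrices `W` (current weights) and `dW` (weight changes) over a fixed node set. An actual online implementation would cache $Q$ and $c$ and touch only the changed nodes and edges. The function names are illustrative.

```python
import numpy as np


def quadratic_q(W):
    """Direct evaluation of Q = 1 - c^2 (sum_i s_i^2 + 2 sum_{(i,j)} w_ij^2)."""
    s = W.sum(axis=1)
    c = 1.0 / s.sum()
    return float(1.0 - c ** 2 * (np.sum(s ** 2) + 2.0 * np.sum(np.triu(W, 1) ** 2)))


def incremental_q(Q, W, dW):
    """Update a cached Q for the changed graph with weights W + dW.

    Dense arrays are used here only for clarity; every sum below is zero
    outside the changed nodes and edges.
    """
    s, ds = W.sum(axis=1), dW.sum(axis=1)     # strengths and their changes
    c = 1.0 / s.sum()                         # c = 1 / S
    dS = ds.sum()                             # Delta S = 2 * sum of weight changes
    w, dw = np.triu(W, 1), np.triu(dW, 1)     # count each undirected edge once
    dQ = (2.0 * np.sum(s * ds) + np.sum(ds ** 2)
          + 4.0 * np.sum(w * dw) + 2.0 * np.sum(dw ** 2))
    scale = 1.0 + c * dS
    return float((Q - 1.0) / scale ** 2 - (c / scale) ** 2 * dQ + 1.0)


rng = np.random.default_rng(0)
A = np.triu(rng.random((15, 15)), k=1)
W = A + A.T
dW = np.zeros_like(W)
dW[0, 1] = dW[1, 0] = 0.7                     # strengthen one edge
dW[2, 5] = dW[5, 2] = -0.5 * W[2, 5]          # weaken another edge
print(incremental_q(quadratic_q(W), W, dW), quadratic_q(W + dW))
```

The two printed values should agree up to floating-point error, since the update is an exact algebraic identity rather than a further approximation.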
Let $\mathbf{L}$ and $\mathbf{L}'$ denote the graph Laplacian matrix of $G$ and $G'$, respectively, and let $\mathbf{L}_N = c \cdot \mathbf{L}$ and $\mathbf{L}'_N = c' \cdot \mathbf{L}'$ be the corresponding trace-normalized matrices. Since $S = \mathrm{trace}(\mathbf{L}) = 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}$ and $\Delta S = 2 \sum_{(i,j) \in \Delta\mathcal{E}} \Delta w_{ij}$, it is easy to show that $\mathrm{trace}(\mathbf{L}') = S + \Delta S = 1/c'$. We have

$$c' - c = \frac{1}{S + \Delta S} - \frac{1}{S} = \frac{-\Delta S}{(S + \Delta S) S} = -c c' \Delta S \tag{S23}$$

since $c' = 1/\mathrm{trace}(\mathbf{L}')$ and $c = 1/\mathrm{trace}(\mathbf{L})$. This then implies

$$c' = \frac{c}{1 + c \Delta S} \quad \text{and} \quad \Delta c = c' - c = \frac{-c^2 \Delta S}{1 + c \Delta S}. \tag{S24}$$

Using the expression of quadratic approximation for VNGE in Lemma 1 and the relation that $G' = G \oplus \Delta G$, we have

$$Q - Q' = (c + \Delta c)^2 \left( \sum_{i \in \mathcal{V}} (s_i + \Delta s_i)^2 + 2 \sum_{(i,j) \in \mathcal{E}} (w_{ij} + \Delta w_{ij})^2 \right) - c^2 \left( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \right) \tag{S25}$$

$$= (2 c \Delta c + \Delta c^2) \left( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 + \Delta Q \right) + c^2 \Delta Q, \tag{S26}$$

where $\Delta Q = 2 \sum_{i \in \Delta\mathcal{V}} s_i \Delta s_i + \sum_{i \in \Delta\mathcal{V}} \Delta s_i^2 + 4 \sum_{(i,j) \in \Delta\mathcal{E}} w_{ij} \Delta w_{ij} + 2 \sum_{(i,j) \in \Delta\mathcal{E}} \Delta w_{ij}^2$, and we use the convention $\Delta s_i = 0$ and $\Delta w_{ij} = 0$ when there are no changes made in the nodal strength of node $i$ and in the weight of edge $(i, j)$ from $G$ to $G'$, respectively. Since $Q = 1 - c^2 \left( \sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2 \right)$, replacing $\sum_{i \in \mathcal{V}} s_i^2 + 2 \sum_{(i,j) \in \mathcal{E}} w_{ij}^2$ with $\frac{1 - Q}{c^2}$ in (S26) and using the relation $c' = c + \Delta c$ yields

$$Q' = \left( \frac{c'}{c} \right)^2 Q - c'^2 \Delta Q - \frac{2 c \Delta c + \Delta c^2}{c^2}. \tag{S27}$$

Using the result from (S24) that $\frac{c'}{c} = \frac{1}{1 + c \Delta S}$, we can further simplify (S27) as

$$Q' = \frac{Q}{(1 + c \Delta S)^2} - \left( \frac{c}{1 + c \Delta S} \right)^2 \Delta Q - \frac{1}{(1 + c \Delta S)^2} + 1 \tag{S28}$$

$$= \frac{Q - 1}{(1 + c \Delta S)^2} - \left( \frac{c}{1 + c \Delta S} \right)^2 \Delta Q + 1, \tag{S29}$$

which completes the proof.

Figure S1. Approximation error and computation time reduction ratio (CTRR) of FINGER under different average degree $d$ of WS model. Panels: (a) approximation error and (b) CTRR, each plotted against the edge rewiring probability for $d \in \{2, 4, 6, 10, 20, 50, 100, 200\}$. The red solid line and blue dashed line refer to the results of $\widehat{H}$ and $\widetilde{H}$, respectively. Both $\widehat{H}$ and $\widetilde{H}$ achieve at least 97% speed-up relative to the computation of $H$ in all cases. It is observed that $\widetilde{H}$ has larger approximation error than $\widehat{H}$ but better CTRR.

H. Finite-size analysis and asymptotic equivalence of JS distance using FINGER
Beyond asymptotic analysis, we believe our results can provide new insights to finite-size analysis, especially based on the facts that: (i) our entropy inequality $\widetilde{H} \le \widehat{H} \le H$ is a finite-size result; (ii) the VNGE approximation error rate $o(\ln n)$ is in fact optimal in $n$ for any finite-size analysis, since Theorem 1 shows that the rate is tight for complete graphs with identical edge weights.

Furthermore, based on the asymptotic equivalence results of VNGE, it is straightforward to establish asymptotic equivalence of JS distance using FINGER as described in Algorithms 1 and 2. Let $\mathrm{JS}$ denote the exact JS distance and $\mathrm{JS}_{\mathrm{FINGER}}$ denote the approximate JS distance using the VNGE computation from FINGER (either $\widehat{H}$ or $\widetilde{H}$). Using Corollaries 2 and 3, the properly scaled absolute approximation error (SAAE) of JS distance, $\frac{|\mathrm{JS} - \mathrm{JS}_{\mathrm{FINGER}}|}{\sqrt{\ln n}}$, converges to $0$ as $n \to \infty$, which proves $|\mathrm{JS} - \mathrm{JS}_{\mathrm{FINGER}}| = o(\sqrt{\ln n})$ and that $\frac{\mathrm{JS}_{\mathrm{FINGER}}}{\sqrt{\ln n}}$ is asymptotically a distance metric.

I. Additional experimental results on synthetic random graphs
The effect of average degree d on Watts-Strogatz graphs. Figure S1 displays the approximation error and computationtime reduction ratio (CTRR) of FINGER- (cid:98) H and FINGER- (cid:101) H under different average degree d of WS model, whichis defined as H − (cid:98) H and H − (cid:101) H , respectively. It can beobserved that when fixing d , the approximation error decayswith the edge rewiring probability for both (cid:98) H and (cid:101) H . Inaddition, for the same edge rewiring probability, larger d yields less approximation error. Using FINGER, both (cid:98) H and (cid:101) H achieve at least 97% speed-up relative to the computationof H in all cases. The approximate VNGE (cid:101) H always attainsbetter CTRR than (cid:98) H but at the price of larger approximationerror due to the fact that (cid:101) H ≤ (cid:98) H ≤ H .Figure S2 displays the scaled approximation error (SAE)and computation time reduction ratio of (cid:98) H via FINGERfor WS model under varying number of nodes n and twodifferent settings of the average degree d . Their behaviorsare similar to the case of d = 50 as displayed in Figure 2(c). The effect of graph size n on FINGER- (cid:101) H . In compari-son to (cid:98) H via FINGER in Figure 2, Figure S3 displays theSAE and CTRR of (cid:101) H for the three different random graphmodels and varying number of nodes n . Consistent withthe findings in Section 3, the SAE of (cid:101) H for ER and WSgraphs obeys the o (ln n ) approximation error analysis asestablished in Corollary 3 since they have balanced eigen-spectrum. On the other hand, the SAE of BA graphs growslogarithmically with n due to imbalanced eigenspectrum. ast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 number of nodes sc a l ed app r o x . e rr o r Watts-Strogatz graphs number of nodes c o m pu t a t i on t i m e r edu c t i on r a t i o ( % ) p WS = 0 p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 1 (a) WS model ( d = 20 )
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 number of nodes sc a l ed app r o x . e rr o r Watts-Strogatz graphs number of nodes c o m pu t a t i on t i m e r edu c t i on r a t i o ( % ) p WS = 0 p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 1 (b) WS model ( d = 100 ) Figure S2.
Scaled approximation error (SAE) and computation time reduction ratio (CTRR) of (cid:98) H via FINGER for WS model undervarying number of nodes n . Their behaviors are similar to the case of d = 50 as displayed in Figure 2 (c).
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 number of nodes sc a l ed app r o x . e rr o r Erdos-Renyi graphs number of nodes c o m pu t a t i on t i m e r edu c t i on r a t i o ( % ) d = 2 d = 5 d = 10 d = 20 d = 50 d = 100 d = 200 (a) ER model
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 number of nodes sc a l ed app r o x . e rr o r Barabasi-Albert graphs number of nodes c o m pu t a t i on t i m e r edu c t i on r a t i o ( % ) d = 2 d = 5 d = 10 d = 20 d = 50 d = 100 d = 200 (b) BA model
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 number of nodes sc a l ed app r o x . e rr o r Watts-Strogatz graphs number of nodes c o m pu t a t i on t i m e r edu c t i on r a t i o ( % ) p WS = 0 p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 1 (c) WS model ( d = 50 )
500 1000 1500 2000 2500 3000 3500 4000 4500 5000 number of nodes sc a l ed app r o x . e rr o r Watts-Strogatz graphs number of nodes c o m pu t a t i on t i m e r edu c t i on r a t i o ( % ) p WS = 0 p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 0 . p WS = 1 (d) WS model ( d = 100 ) Figure S3.
Scaled approximation error (SAE) and computation time reduction ratio (CTRR) of (cid:101) H via FINGER for different random graphmodels and varying number of nodes n . The SAE of ER and WS graphs validates the o (ln n ) approximation error analysis in Corollary 3,whereas the SAE of BA graphs grows logarithmically with n due to imbalanced eigenspectrum. The CTRR attains nearly 100% speed-uprelative to H for moderate-size graphs ( n ≥ ). ast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications (a) Dissimilarity (anomaly) metrics of Wikipedia-sEN (b) Dissimilarity (anomaly) metrics of Wikipedia-FR(c) Dissimilarity (anomaly) metrics of Wikipedia-GE Figure S4.
Anomaly detection in consecutive monthly Wikipedia hyperlink networks via different dissimilarity metrics. The corresponding computation time and Pearson correlation coefficient (PCC) are reported in Table 2. Similar to the observations in Figure 3, FINGER-JSdist (Fast) best aligns with the anomaly proxy in all datasets. FINGER-JSdist (Incremental) has efficient computation time but less consistency (second best PCC among all methods).
Fixing n, a larger average degree or more graph regularity leads to less approximation error. Compared to $\widehat{H}$, the CTRR of $\widetilde{H}$ attains nearly 100% speed-up relative to H for relatively small-size graphs ( n ≥ ).
J. Implementation details for VNGE-NL and VNGE-GL
We note that in the Wikipedia application, we omit the edge direction for all methods except VNGE-GL since the resulting performance is almost identical. The implementation of VNGE-GL indeed considers the edge direction. We also note that in these two applications, the Jensen-Shannon distances of VNGE-NL and VNGE-GL are ineffective. Therefore, we use the consecutive difference of their approximate VNGE as the anomaly score, and take the absolute value of the anomaly score for anomaly ranking.
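The scoring rule described above can be illustrated with a minimal sketch. It assumes the approximate VNGE values of the graph sequence are already available as a list; the function name `anomaly_ranking` and the sample values are hypothetical and are not taken from the paper.

```python
def anomaly_ranking(entropies):
    """Score each transition t -> t+1 by the consecutive entropy difference.

    `entropies` holds one (approximate) VNGE value per graph in the sequence.
    Returns the signed scores and the transition indices sorted by
    decreasing absolute score.
    """
    # Anomaly score: consecutive difference of the entropy values.
    scores = [b - a for a, b in zip(entropies, entropies[1:])]
    # Ranking uses the absolute value of the score, largest change first.
    order = sorted(range(len(scores)), key=lambda t: abs(scores[t]), reverse=True)
    return scores, order


# Hypothetical entropy values for a five-graph sequence (illustration only).
vnge = [3.10, 3.12, 2.60, 2.65, 3.40]
scores, order = anomaly_ranking(vnge)
```

Here `order[0]` is the index of the transition with the largest absolute entropy change, which would be ranked as the most anomalous under this rule.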
K. Additional results for anomaly detection in evolving Wikipedia hyperlink networks
Additional Wikipedia network plots.
The plots of dissimilarity (anomaly) metrics of different methods in Section 4 for consecutive monthly hyperlink networks of Wikipedia-sEN, Wikipedia-FR, and Wikipedia-GE are shown in Figure S4. Their performance in terms of the computation time and Pearson correlation coefficient is reported in Table 2. Similar to the observations in Figure 3, FINGER-JSdist (Fast) best aligns with the anomaly proxy in all datasets. FINGER-JSdist (Incremental) has efficient computation time but less consistency (still attains second best PCC among all methods).
Rank correlation coefficients.
In addition to PCC, we further use the Spearman’s rank correlation coefficient (SRCC) to evaluate the consistency of each method with the anomaly proxy in this task. The results are summarized in Table S1. Similar to the results using PCC, FINGER-JS (Fast) attains the best SRCC among all the compared methods in the four Wikipedia networks. This result again confirms that the JS distance via FINGER indeed learns a notion of anomaly similar to that indicated by the anomaly proxy.
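To make the difference between PCC and SRCC concrete, the following is a minimal standard-library sketch, assuming the usual definition of SRCC as the Pearson correlation of average ranks. The helper names (`pearson`, `average_ranks`, `spearman`) and the sample values are hypothetical; this is not the evaluation code used in the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have nonzero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical anomaly proxy and dissimilarity scores (illustration only).
proxy = [1.0, 4.0, 2.0, 8.0, 3.0]
score = [0.1, 0.5, 0.2, 9.0, 0.3]
pcc, srcc = pearson(proxy, score), spearman(proxy, score)
```

In this toy example the two sequences are ordered identically, so the rank correlation is perfect even though the relationship is not linear, which is why SRCC is a useful complement to PCC.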
L. Additional descriptions for bifurcation detection of cell reprogramming in dynamic genomic networks
Genome architecture is important in studying cell development, but its dynamics and role in determining cell identity are not well understood. Myogenic differentiation 1 (MYOD1) is a master transcription factor that directly converts human fibroblasts to myogenic cells, as studied in (Weintraub et al., 1989; Weintraub, 1993). Very recently, Liu et al. (Liu et al., 2018a) studied the chromatin contact map (genome-wide structure) through chromosome conformation capture (Hi-C) during the conversion of human fibroblasts to myogenic cells. To understand cell reprogramming, one major question is detecting when the phase transition for cell identity conversion occurs. Liu et al. conducted experiments and constructed a 1Mb binned chromatin contact matrix (namely, Hi-C matrix) of dimension 2894 over a 6-day time course, leading to 12 sampled measurements. It was found that there exists a bifurcation point at the 6th sample (the measurement at 32 hr), suggesting that cell reprogramming can be interpreted as a genome-wide dynamic system (Del Vecchio et al., 2017) (i.e., a graph sequence) as displayed in Figure S5, where the bifurcation occurs when a small structural change made to the cellular system causes a significant system-wide change for the genome. Liu et al. further used complex graph analysis techniques involving the temporal difference score (TDS) and multiple graph centrality features (Chen et al., 2016) to construct a representative statistic for expressing the states of the studied dynamic genomic contact network as displayed in Figure 4, which is used in this paper as the ground-truth statistic for comparing the performance of detecting the bifurcation point using different dissimilarity and distance metrics. In particular, given the TDS of a graph dissimilarity method over T measurements, a bifurcation point is defined as the saddle point of the TDS curve excluding the first and last measurements (i.e., t = 1 and t = T). The detected bifurcation point(s) of each method are displayed in Figure 4.
Table S1.
Performance comparison of Spearman’s rank correlation coefficient (SRCC) between the anomaly proxy and each method in the Wikipedia application. FINGER attains the best SRCC across all datasets.
Datasets | FINGER-JS (Fast) | FINGER-JS (Inc.) | DeltaCon | RMD | λ dist. (Adj.) | λ dist. (Lap.) | GED | VNGE-NL | VNGE-GL
Wiki(sEN)
Figure S5.
Chromatin contact matrix from Hi-C over a time course of 12 samples, which correspond to -48 hour (hr), 0 hr, 8 hr, ..., 80 hr over 6 days.
Table S2.
Detection rate on synthesized anomalous events in the dynamic communication networks.
DoS attack (X%) | FINGER-JS (Fast) | FINGER-JS (Inc.) | DeltaCon | RMD | λ dist. (Adj.) | λ dist. (Lap.) | GED | VNGE-NL | VNGE-GL | VEO | Cosine distance | Bhattacharyya distance | Hellinger distance
1%: 24% 10% 14% 14% 10% 14% 22% 22% 14% 12% 10% 12%
3%: 62% 58% 58% 12% 23% 36% 39% 39% 36% 35% 14% 16%
5%: 90% 90% 12% 28% 41% 67% 67% 41% 37% 37% 34%
10%: 91% 91% 91% 91% 91% 91% 91% 91% 46% 46% 67% 71%
M. Additional results using VEO as a baseline