Optimality of Graph Scanning Statistic for Online Community Detection

Abstract

Sequential change-point detection for graphs is a fundamental problem for streaming network data types and has wide applications in social networks and power systems. Given fixed vertices and a sequence of random graphs, the objective is to detect the change-point where the underlying distribution of the random graph changes. In particular, we focus on the local change that only affects a subgraph. We adopt the classical Erdős-Rényi model and revisit the generalized likelihood ratio (GLR) detection procedure. The scan statistic is computed by sequentially estimating the most-likely subgraph where the change happens. We provide theoretical analysis for the asymptotic optimality of the proposed procedure based on the GLR framework. We demonstrate the efficiency of our detection algorithm using simulations.


Liyan Xie and Yao Xie

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States

I. INTRODUCTION

Change-point detection is a fundamental problem for network data, such as power systems [1], [2], sensor networks [3], and social networks [4]–[6]. Network data can be modeled as graphs. For instance, in a social network, each node represents a user, and each edge represents the connectivity between two users. We consider the Erdős-Rényi model [7], which is parameterized by the probability of having an edge between two nodes.

In this paper, we consider the detection of a local change in the graph, which only affects the distribution of a subgraph. The detection procedure forms a scan statistic based on the Erdős-Rényi model. In particular, we treat the affected subgraph as unknown anomaly information and apply the generalized likelihood ratio test [8] to form the scan statistic, utilizing graph scanning techniques [9], [10].

To obtain a computationally efficient algorithm and a theoretical calibration, we assume that the size of the subgraph affected by the change is known. More specifically, when the change happens, only a subset of the graph, of known size, is affected by the change and has a different distribution, while the distribution of the rest of the graph remains the same. The problem of local change-point detection is challenging for two reasons: (i) it is not clear whether there is a change; (ii) if there is a change at some time, it is not clear which subgraph contains the change.

The major motivating application of our study is community detection. In particular, we are interested in detecting the emergence of a community in a network that is homogeneous in the beginning. Such a problem is essential for dynamic networks. For example, a social network can start from a homogeneous state and then evolve over time to form a community. Usually, the interactions between nodes within the community are denser than in other parts of the network. Another example is ambient noise monitoring in seismic sensor networks. More specifically, the cross-correlation function between the sensors affected by the change will have a significant peak at the time of the change. Meanwhile, such a waveform does not exist for cross-correlation functions between affected and unaffected sensors, or among unaffected sensors. Therefore, this problem, mathematically, becomes detecting a local change in a sequence of graphs [11].

In this paper, we focus on the parametric approach for constructing scan statistics to detect a local change in a sequence of graphs. For simplicity, we adopt the Erdős-Rényi model (while the analysis can be generalized to more complicated models), where each edge exists with probability $p_0 \in (0, 1)$, independently of the others. After the change, the affected subgraph still follows the Erdős-Rényi model, but with a different parameter $p_1 \in (0, 1)$. We consider a sequential detection setting and prove the optimality of the online detection procedure in the sense that the detection delay matches the well-known lower bound. The main idea of the proof is adopted from the seminal work on the generalized likelihood ratio test [8].

This paper is related to works on community detection and graph scan statistics. In [12], three likelihood-ratio-based algorithms were developed for detecting communities in the Erdős-Rényi graph: the exhaustive search, the mixture, and the hierarchical mixture methods. A theoretical approximation was also given in [12] to characterize the false alarms. In [13], community detection for Erdős-Rényi graphs was considered as a hypothesis testing problem, where the goal is to find a test function that takes the random graph as input and decides whether there is a community or not. The detectability of this problem in the asymptotically dense regime, when the connection probability $p_0$ is large enough, was established in [13]. Later on, information-theoretic lower bounds for the asymptotically sparse regime, when the connection probability $p_0$ is small enough, were studied in [14].

The rest of the paper is organized as follows. We present the problem setup in Section II. The detection procedure is detailed in Section III. We present the optimality study in Section IV. Numerical examples are presented in Section V to support the theoretical findings. Finally, Section VI contains our concluding remarks.

II. PROBLEM SETUP

Given a network with $N$ sensors (nodes) numbered as $\{1, \ldots, N\}$, the dynamic graphical structure is observed as a sequence of undirected adjacency matrices $G^{(1)}, G^{(2)}, \ldots$, where $G^{(t)} \in \{0, 1\}^{N \times N}$ characterizes the edge or interaction information between different nodes, i.e., $G^{(t)}_{ij} = 1$ if and only if there is an edge between node $i$ and node $j$ at time $t$. Consider the Erdős-Rényi model, denoted as $\mathrm{ER}(N, p_0)$, where the graph is constructed by connecting nodes randomly. Each edge is included with probability $p_0$ independently, i.e., $P(G^{(t)}_{ij} = 1) = p_0$. The observations $G^{(1)}, G^{(2)}, \ldots$ are independent realizations of the Erdős-Rényi model.

Assume that there is a change-point at an unknown time $\tau$ that changes the distribution of a subgraph with nodes indexed by $V^* \subset \{1, \ldots, N\}$. Before the change, the full graph follows the model $\mathrm{ER}(N, p_0)$ with connection probability $p_0$. After the change, the subgraph $V^*$ follows $\mathrm{ER}(|V^*|, p_1)$, i.e., the connection probability inside the subgraph $V^*$ becomes $p_1$, with everything else the same. Here $|V^*|$ denotes the cardinality of the set $V^*$. Usually, the true subgraph $V^*$ is unknown as it represents the anomaly information. We assume that the cardinality of $V^*$ equals a known constant $n$ with $n < N$. In most applications, we have $n \ll N$, which means that we are only interested in local graphical changes.

Although the true subgraph where the change happens is unknown, there are only a finite number of possible subgraphs when the total number of nodes $N$ is fixed. Denote all possible subgraphs as

$$\mathcal{V} = \{V^{(1)}, \ldots, V^{(d)}\}.$$

Note that the number of all possible subgraphs is upper bounded by $\binom{N}{n}$, and we further have

$$d \ll \binom{N}{n}$$

if we are only interested in locally connected subgraphs, which is a reasonable assumption for community detection tasks where the change tends to happen within a small neighborhood.

In summary, the problem of detecting a local change for the underlying Erdős-Rényi model becomes the following hypothesis testing problem:

$$
\begin{aligned}
H_0:\;& P(G^{(t)}_{ij} = 1) = p_0, \quad \forall\, i, j;\ t = 1, 2, \ldots \\
H_1:\;& P(G^{(t)}_{ij} = 1) = p_0, \quad \forall\, i, j;\ t = 1, 2, \ldots, \tau - 1 \\
& P(G^{(t)}_{ij} = 1) = p_1, \quad \forall\, i, j \in V^*;\ t = \tau, \tau + 1, \ldots \\
& P(G^{(t)}_{ij} = 1) = p_0, \quad \forall\, i \text{ or } j \notin V^*;\ t = \tau, \tau + 1, \ldots
\end{aligned}
\tag{1}
$$

where $\tau$ represents the change-point. This hypothesis testing problem is illustrated in Fig. 1, where the post-change subgraph contains only three nodes and is shown in highlight.

Fig. 1. Graphs prior to the change-point at time $\tau$ follow the Erdős-Rényi model with connection probability $p_0$. After the change-point $\tau$, the subgraph (shown in highlight) follows the Erdős-Rényi model with connection probability $p_1 \neq p_0$, with everything else the same. We are particularly interested in detecting the local change in the subgraph.

Given access to a sequence of graph observations $G^{(1)}, G^{(2)}, \ldots$, if they are sampled from the hypothesis $H_1$ in the model (1), our objective is to design a stopping time that can detect the unknown change-point $\tau$ as quickly as possible. Meanwhile, if the data are sampled from the hypothesis $H_0$ in the model (1), it is desired to have as few false alarms as possible. Here we assume the connection probabilities $p_0, p_1$ are known, but the subgraph where the change happens is unknown.

III. DETECTION PROCEDURE
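As a concrete reference for the procedure developed in this section, the observation model (1) can be simulated in a few lines. The following is a minimal Python sketch, not the authors' code; the function names `sample_er_graph` and `sample_sequence` and all parameter names are illustrative assumptions, nodes are 0-indexed, and only the standard library is used.

```python
import random


def sample_er_graph(num_nodes, p_base, changed=None, p_changed=None, rng=random):
    """Draw one symmetric 0/1 adjacency matrix without self-loops.

    Edges with both endpoints in `changed` appear with probability
    `p_changed`; every other edge appears with probability `p_base`.
    """
    changed = set(changed or ())
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            inside = i in changed and j in changed
            p = p_changed if inside else p_base
            edge = 1 if rng.random() < p else 0
            adj[i][j] = adj[j][i] = edge
    return adj


def sample_sequence(num_nodes, p0, p1, subgraph, tau, horizon, seed=0):
    """Return graphs G^(1), ..., G^(horizon) following model (1).

    Time is 1-indexed: for t >= tau the edges inside `subgraph` use p1.
    """
    rng = random.Random(seed)
    graphs = []
    for t in range(1, horizon + 1):
        if t >= tau:
            graphs.append(sample_er_graph(num_nodes, p0, subgraph, p1, rng))
        else:
            graphs.append(sample_er_graph(num_nodes, p0, rng=rng))
    return graphs
```

For example, `sample_sequence(20, 0.2, 0.8, {0, 1, 2, 3, 4}, tau=50, horizon=100)` would draw 100 graphs on 20 nodes in which a 5-node subgraph becomes denser from time 50 onward; these probabilities are arbitrary illustrative values.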

The detection problem (1) can be solved based on statistical change detection methodology, which we describe in this section. We start by introducing the basic cumulative sum (CUSUM) procedure for change detection when the subgraph $V^*$ is known, and then study the generalized likelihood ratio (GLR) test for unknown subgraphs.

The log-likelihood ratio between the pre- and post-change distributions plays a key role in sequential change detection. For the Erdős-Rényi model $\mathrm{ER}(N, p_0)$ before the change, the likelihood function of $G^{(t)}$ is

$$L(G^{(t)}, p_0) = \prod_{1 \le i < j \le N} p_0^{G^{(t)}_{ij}} (1 - p_0)^{1 - G^{(t)}_{ij}}.$$

After the change, only the edges inside $V^*$ follow a different distribution, so the log-likelihood ratio of $G^{(t)}$ between the post- and pre-change distributions is

$$\ell_{V^*}(G^{(t)}) = \sum_{i, j \in V^*,\, i < j} \left[ G^{(t)}_{ij} \log \frac{p_1}{p_0} + \left(1 - G^{(t)}_{ij}\right) \log \frac{1 - p_1}{1 - p_0} \right].$$

The CUSUM statistic [15] is defined recursively, with $S_0 = 0$, as

$$S_t = (S_{t-1})^+ + \ell_{V^*}(G^{(t)}), \tag{2}$$

where $(x)^+ = \max\{x, 0\}$, and the CUSUM procedure stops at

$$T_{\mathrm{C}} = \inf\{t : S_t > b\}, \tag{3}$$

where the threshold $b$ is a pre-set constant to control the false alarm rate.

By Jensen's inequality, it is easy to show that the expectation of the increment term $\ell_{V^*}(G^{(t)})$ in (2) is negative under the pre-change regime and positive in the post-change regime. Therefore, the CUSUM statistic $S_t$ will have a positive drift after the change happens, enabling efficient detection of the change-point. The CUSUM procedure was shown to have strong optimality properties in [8], [16]–[18]. In particular, it attains the minimal worst-case detection delay among all testing procedures that satisfy a given false alarm constraint.

However, the CUSUM statistic (2) cannot be used directly when the changed subgraph $V^*$ is unknown. Therefore, we adopt the GLR framework [8], [12], also known as the scan test [11], [13]. The GLR test was originally developed for change detection in parametric families when the post-change parameter is unknown, by substituting maximum-likelihood-type estimators. Here, instead of estimating the post-change parameters, we estimate the unknown post-change subgraph by maximizing the likelihood function of past samples. More specifically, the GLR statistic at time $t$ is defined as

$$S_t = \max_{1 \le k \le t} \max_{V \in \mathcal{V}} R_{t,k,V},$$

where $R_{t,k,V}$ is the log-likelihood ratio of samples $G^{(k)}, \ldots, G^{(t)}$, assuming the changed subgraph is $V$ and the change-point is $\tau = k$, which can be derived as

$$R_{t,k,V} = \sum_{i, j \in V,\, i < j} \sum_{s=k}^{t} \left[ G^{(s)}_{ij} \log \frac{p_1}{p_0} + \left(1 - G^{(s)}_{ij}\right) \log \frac{1 - p_1}{1 - p_0} \right]. \tag{4}$$

The window-limited GLR detection procedure is the stopping time

$$T_G = \inf\left\{ t : \max_{t - m_\alpha \le k \le t - m'_\alpha} \max_{V \in \mathcal{V}} R_{t,k,V} > b \right\}, \tag{5}$$

where $m_\alpha$ is the window size, $m'_\alpha$ is the smallest lag considered, $b$ is the threshold, and $R_{t,k,V}$ is the log-likelihood ratio as defined in (4).

We are further interested in knowing which subgraph contains the change in the graph structure. Once we have detected the change-point as $k$, we can choose a post-change interval $(k, t)$.
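The statistics above can be evaluated directly on small graphs. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation: it computes the per-graph log-likelihood ratio for a candidate subgraph, runs a CUSUM recursion for a known subgraph, and scans a user-supplied list of candidate subgraphs over a sliding window. The function names, the 1-indexed time convention, the specific recursion `max(S, 0) + increment`, and the simplification that every $k$ in the window up to $k = t$ is scanned are assumptions made for illustration.

```python
import math
from itertools import combinations


def llr_increment(adj, nodes, p0, p1):
    """Log-likelihood ratio of one graph, restricted to edges inside `nodes`."""
    w_edge = math.log(p1 / p0)
    w_no_edge = math.log((1 - p1) / (1 - p0))
    total = 0.0
    for i, j in combinations(sorted(nodes), 2):
        total += w_edge if adj[i][j] else w_no_edge
    return total


def cusum_stopping_time(graphs, nodes, p0, p1, threshold):
    """CUSUM for a known subgraph.

    Returns the first 1-indexed time t with S_t > threshold, or None.
    Assumed recursion: S_t = max(S_{t-1}, 0) + increment, with S_0 = 0.
    """
    s = 0.0
    for t, adj in enumerate(graphs, start=1):
        s = max(s, 0.0) + llr_increment(adj, nodes, p0, p1)
        if s > threshold:
            return t
    return None


def glr_statistic(graphs, t, candidates, p0, p1, window):
    """Maximize R_{t,k,V} over k in [max(t - window, 1), t] and V in candidates.

    Returns (best value, maximizing k, maximizing V). Exhaustive over the
    candidate list, so only practical for small candidate sets.
    """
    best, best_k, best_v = -math.inf, None, None
    for nodes in candidates:
        r = 0.0
        # Extend the segment backwards in time: k = t, t-1, ...
        for k in range(t, max(t - window, 1) - 1, -1):
            r += llr_increment(graphs[k - 1], nodes, p0, p1)
            if r > best:
                best, best_k, best_v = r, k, nodes
    return best, best_k, best_v
```

A detection rule in the spirit of (5) would call `glr_statistic` at each new time `t` and stop the first time the returned value exceeds the threshold `b`.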
The test statistic $R_{t,k,V}$ is useful in localizing the change, as the subgraph that maximizes $R_{t,k,V}$ over all possible subgraphs in $\mathcal{V}$ is the maximum likelihood estimate (MLE) of the subgraph containing the change,

$$\widehat{V}_{k,t} = \arg\max_{V \in \mathcal{V}} R_{t,k,V}. \tag{6}$$

It is worth mentioning that the solution to (6) is the so-called "densest $n$-subgraph" [19], which is an NP-hard problem; no constant-ratio approximation algorithm is known for it, owing to its hardness. We use the greedy procedure in [19] to approximate the maximum likelihood estimate $\widehat{V}_{k,t}$. For completeness, we restate the procedure here: "Sort the vertices by order of their degree. Let $H$ denote the $n/2$ vertices with highest degrees in the graph $G$. Sort the remaining vertices by the number of neighbors they have in $H$. Let $C$ denote the $n/2$ vertices in $G \setminus H$ with the largest number of neighbors in $H$. Return $H \cup C$."

IV. OPTIMALITY
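The greedy procedure quoted at the end of Section III can be sketched as follows. This is a hedged Python illustration rather than the implementation used in the paper. Two choices are assumptions: the graph passed to the heuristic is the edge-count matrix aggregated over the samples $G^{(k)}, \ldots, G^{(t)}$, and an odd $n$ is split as $\lfloor n/2 \rfloor$ and $n - \lfloor n/2 \rfloor$. The aggregation is motivated by the definition of $R_{t,k,V}$: when $p_1 > p_0$, it is an increasing function of the total number of edges observed inside $V$ over the interval, so maximizing it amounts to a densest-$n$-subgraph problem on the aggregated graph. The function names are hypothetical.

```python
def aggregate_graphs(graphs, k, t):
    """Sum the adjacency matrices G^(k), ..., G^(t) (1-indexed, inclusive)."""
    num_nodes = len(graphs[0])
    agg = [[0] * num_nodes for _ in range(num_nodes)]
    for adj in graphs[k - 1:t]:
        for i in range(num_nodes):
            for j in range(num_nodes):
                agg[i][j] += adj[i][j]
    return agg


def greedy_dense_subgraph(adj, n):
    """Greedy heuristic: top-degree vertices H plus best-connected vertices C.

    `adj` may be a 0/1 adjacency matrix or a matrix of edge counts, in
    which case "degree" means weighted degree. Ties are broken by index.
    """
    num_nodes = len(adj)
    half = n // 2  # assumption for odd n: |H| = floor(n/2), |C| = n - |H|
    degree = [sum(row) for row in adj]
    by_degree = sorted(range(num_nodes), key=lambda v: (-degree[v], v))
    h = by_degree[:half]
    h_set = set(h)
    rest = [v for v in range(num_nodes) if v not in h_set]
    # Number (or weight) of links from each remaining vertex into H.
    links_to_h = {v: sum(adj[v][u] for u in h) for v in rest}
    c = sorted(rest, key=lambda v: (-links_to_h[v], v))[: n - half]
    return sorted(h + c)
```

A localization step would then be `greedy_dense_subgraph(aggregate_graphs(graphs, k, t), n)`, which returns a sorted list of $n$ node indices as the approximate $\widehat{V}_{k,t}$.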

We first introduce two metrics commonly used to characterize the performance of detection procedures in sequential change detection.

The average run length (ARL) is defined as the average time between false alarms when there is no change; it can be denoted as $E_\infty[T]$, where $E_\infty$ is the expectation under the pre-change measure (i.e., the change-point is at $\infty$). The expected detection delay (EDD) refers to the expected delay in detecting the change. There are two common definitions for EDD, introduced in [16] and [20]. We adopt the one in [16]:

$$\bar{E}(T) = \sup_{\tau \ge 1} \operatorname{ess\,sup} E_\tau\left[ (T - \tau)^+ \mid G^{(1)}, \ldots, G^{(\tau - 1)} \right], \tag{7}$$

where the supremum and essential supremum are taken over all possible change-points $\tau$ and realizations $G^{(1)}, \ldots, G^{(\tau - 1)}$ before the change, and $E_\tau$ is the expectation under the probability measure for which the change-point equals $\tau$.

The goal is to minimize the EDD subject to the ARL constraint $E_\infty[T] \ge \gamma$ for a positive constant $\gamma$.

A lower bound on the worst-case EDD $\bar{E}(T)$ was given in [8, Theorem 1]. We restate this lower bound in our setting (1).

Theorem 1 ([8]). As $\gamma \to \infty$, we have

$$\inf\left\{ \bar{E}(T) : E_\infty(T) \ge \gamma \right\} \ge \left(I^{-1} + o(1)\right) \log \gamma,$$

where

$$I = \binom{n}{2} \left( p_1 \log \frac{p_1}{p_0} + (1 - p_1) \log \frac{1 - p_1}{1 - p_0} \right)$$

is the Kullback-Leibler (KL) divergence between the graphical distributions $\mathrm{ER}(n, p_1)$ and $\mathrm{ER}(n, p_0)$ on the changed subgraph $V^*$ with $|V^*| = n$.

Theorem 1 means that for any detection procedure with ARL greater than $\gamma$, the minimal detection delay is of the order of $\log \gamma / I$. Therefore, a detection procedure is called first-order asymptotically optimal if its EDD equals $(\log \gamma / I)(1 + o(1))$ as $\gamma \to \infty$. Below, we prove that the detection rule (5) achieves the lower bound in Theorem 1.

We first consider the pre-change regime and state the following lemma.

Lemma 2. For the stopping time (5), we have

$$\sup_{\tau \ge 1} P_\infty(\tau \le T_G < \tau + m_\alpha) \le m_\alpha e^{-b} \binom{N}{n}. \tag{8}$$

Proof. First note that

$$P_\infty(\tau \le T_G \le \tau + m_\alpha) \le \sum_{\tau - m_\alpha \le k \le \tau + m_\alpha} P_\infty(\tau_k \le k + m_\alpha), \tag{9}$$

where

$$\tau_k := \inf\left\{ t \ge k + m'_\alpha : \widehat{V}_{k,t} \in \mathcal{V}, \text{ and } R_{t,k,\widehat{V}_{k,t}} \ge b \right\}. \tag{10}$$

To analyze $P_\infty(\tau_k \le k + m_\alpha)$, we use a change-of-measure argument. Let $P^k_V$ denote the probability measure under which the distribution of $G^{(i)}$ is $\mathrm{ER}(N, p_0)$ for $i < k$, and the distribution of the subgraph $V$ becomes $\mathrm{ER}(n, p_1)$ for $i \ge k$. Define a measure

$$Q_k = \sum_{V \in \mathcal{V}} P^k_V.$$

Since $\mathcal{V}$ is a finite set, $Q_k$ is a finite measure. For $t \ge k$, let $\mathcal{F}_{k,t}$ denote the sigma-algebra generated by $G^{(k)}, \ldots, G^{(t)}$. The Radon-Nikodym derivative of the restriction of $Q_k$ to $\mathcal{F}_{k,t}$ relative to the restriction of $P_\infty$ to $\mathcal{F}_{k,t}$ is

$$L_t = \sum_{V \in \mathcal{V}} \exp\{R_{t,k,V}\}.$$

Hence, by Wald's likelihood ratio identity, we have

$$P_\infty(\tau_k \le k + m_\alpha) = \int_{\{\tau_k \le k + m_\alpha\}} L_{\tau_k}^{-1} \, dQ_k = \sum_{V \in \mathcal{V}} \int_{\{\tau_k \le k + m_\alpha\}} L_{\tau_k}^{-1} \, dP^k_V.$$

Since

$$L_{\tau_k} = \sum_{V \in \mathcal{V}} \exp\{R_{\tau_k,k,V}\} \ge \exp\left\{ R_{\tau_k,k,\widehat{V}_{k,\tau_k}} \right\} \ge e^{b}, \tag{11}$$

where the last inequality is due to the definition of $\tau_k$ in (10), we have

$$\sup_k P_\infty(\tau_k \le k + m_\alpha) = \sup_k \sum_{V \in \mathcal{V}} \int_{\{\tau_k \le k + m_\alpha\}} L_{\tau_k}^{-1} \, dP^k_V \le e^{-b} |\mathcal{V}| \le e^{-b} \binom{N}{n}.$$

Substituting into (9), we have

$$P_\infty(\tau \le T_G \le \tau + m_\alpha) \le m_\alpha e^{-b} \binom{N}{n}.$$

Given the condition (8) in Lemma 2, by [8, Theorem 4], we have the following optimality results for our setup (1).

Theorem 3. For the stopping time (5), if the window size $m_\alpha$ satisfies

$$\liminf_{\alpha \to 0} \frac{m_\alpha}{|\log \alpha|} > I^{-1}, \qquad \log m_\alpha = o(\log \alpha),$$

and the threshold $b$ satisfies

$$m_\alpha e^{-b} \binom{N}{n} = \alpha, \tag{12}$$

then we have

$$E_\infty[T_G] \ge \left(\tfrac{1}{2} - \alpha\right) m_\alpha \alpha^{-1}, \tag{13}$$

and as $\alpha \to 0$,

$$\bar{E}(T_G) \le \left(I^{-1} + o(1)\right) b, \quad \text{as } b \sim |\log \alpha| \to \infty.$$

Therefore the stopping rule (5) is asymptotically optimal.

Proof. For the true subgraph $V^*$, define the window-limited CUSUM rule as

$$\widetilde{T} = \inf\left\{ t : \max_{t - m_\alpha \le k \le t - m'_\alpha} R_{t,k,V^*} > b \right\}.$$

It is obvious that

$$\bar{E}(T_G) \le \bar{E}(\widetilde{T})$$

due to the definition of $T_G$ in (5). Note that for any finite values of $N, n$, the constraint (12) implies that we can choose the threshold as $b \sim |\log \alpha|$. By [8, Theorem 4], we have, as $b \sim |\log \alpha| \to \infty$,

$$\bar{E}(\widetilde{T}) \le \left(I^{-1} + o(1)\right) b,$$

if $m'_\alpha = o(|\log \alpha|)$. The ARL bound (13) is proved in [8] whenever $T_G$ satisfies the condition (8). Further note that

$$\log E_\infty[T_G] \sim |\log \alpha|.$$

Therefore, $T_G$ matches the lower bound in Theorem 1. In other words, $T_G$ is first-order asymptotically optimal.

It is worth mentioning that the condition (12) only yields $b \sim |\log \alpha|$ for finite values of $N, n$. It does not hold for $N$ that diverges to infinity. The analysis here differs from the original GLR framework considered in [8] and can be viewed as a special case, since the unknown subgraph has finitely many possibilities while the unknown parameter in parametric models can vary in a continuous space.

Remark 4 (Generalization to unknown $p_1$). When the post-change probability $p_1$ is unknown, we can estimate it using the maximum likelihood estimator when formulating the GLR detection statistic. More specifically, given $G^{(k)}, \ldots, G^{(t)}$ and a subgraph $V$, the MLE of $p_1$ is given by

$$\widehat{p}_1^{(k,t,V)} = \frac{\sum_{i, j \in V,\, i < j} \sum_{s=k}^{t} G^{(s)}_{ij}}{\binom{n}{2} (t - k + 1)}.$$

The corresponding GLR stopping rule is

$$\inf\left\{ t : \max_{k} \max_{V \in \mathcal{V}} U_{t,k,V} > b \right\}, \tag{14}$$

where $U_{t,k,V}$ is defined as

$$\sum_{i, j \in V,\, i < j} \sum_{s=k}^{t} \left[ G^{(s)}_{ij} \log \frac{\widehat{p}_1^{(k,t,V)}}{p_0} + \left(1 - G^{(s)}_{ij}\right) \log \frac{1 - \widehat{p}_1^{(k,t,V)}}{1 - p_0} \right].$$
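Remark 4 can be illustrated with a short sketch. The Python below is an illustration under assumptions, not the authors' code: it computes the MLE of $p_1$ as the fraction of edges present inside a candidate subgraph over the samples $G^{(k)}, \ldots, G^{(t)}$, and the resulting plug-in log-likelihood ratio against $p_0$, using the convention $0 \log 0 = 0$. The function names and the 1-indexed time arguments are hypothetical.

```python
import math
from itertools import combinations


def _edge_counts(graphs, k, t, nodes):
    """Return (edges present, total edge slots) inside `nodes` over G^(k..t)."""
    pairs = list(combinations(sorted(nodes), 2))
    edges = sum(graphs[s - 1][i][j] for s in range(k, t + 1) for i, j in pairs)
    return edges, len(pairs) * (t - k + 1)


def mle_p1(graphs, k, t, nodes):
    """Maximum likelihood estimate of the post-change edge probability."""
    edges, trials = _edge_counts(graphs, k, t, nodes)
    return edges / trials


def plug_in_llr(graphs, k, t, nodes, p0):
    """Log-likelihood ratio with p1 replaced by its MLE (always >= 0)."""
    edges, trials = _edge_counts(graphs, k, t, nodes)
    p_hat = edges / trials

    def term(count, p_num, p_den):
        # Convention 0 * log(0) = 0 avoids log(0) when p_hat is 0 or 1.
        return 0.0 if count == 0 else count * math.log(p_num / p_den)

    return term(edges, p_hat, p0) + term(trials - edges, 1 - p_hat, 1 - p0)
```

Because the estimate is fitted to the same samples it is evaluated on, the plug-in statistic is never negative, so a threshold calibrated for known $p_1$ should not be assumed to carry over unchanged.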

V. NUMERICAL EXAMPLES

We present simulation examples using the Erdős-Rényi graphical models to visualize the detection procedure (5). The size of the network $N$, i.e., the total number of nodes, is set as 20 and 50, respectively. In both cases, we are interested in detecting the change that happens only in a much smaller subgraph consisting of $n = 5$ nodes. The pre-change edge-forming probability is set as $p_0 = 0.$[…], and the post-change probability is $p_1 = 0.$[…], i.e., the change increases the intensity of edges within the changed subgraph.

For numerical reasons, we do not compute the worst-case detection delay (7), which takes the supremum over all possible past observations and over all possible change-points. Instead, we compute an alternative formulation $E_1[T_G]$ that can be conveniently evaluated by setting the change-point as $\tau = 1$, i.e., the change happens before we take any sample.

In Fig. 2, we compare the EDD of the GLR procedure in (5) and the CUSUM procedure in (3). The CUSUM statistic serves as a baseline since it is the optimal detection procedure with the smallest detection delay. The detection delay of the GLR approach indeed matches that of CUSUM to first order (i.e., in the slope). Moreover, the detection delay of the GLR approach tends to increase as we increase the network size $N$, since it becomes more difficult to scan for the right subgraph containing the change.
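To make the first-order slope concrete, the information number $I$ of Theorem 1 and the approximation $\log \gamma / I$ can be computed directly. The sketch below is illustrative Python; the function names are hypothetical, and the probabilities in the usage note are arbitrary values, not the settings of the experiments above.

```python
import math


def information_number(n, p0, p1):
    """I = C(n, 2) * KL(Bernoulli(p1) || Bernoulli(p0)), as in Theorem 1."""
    pairs = n * (n - 1) // 2
    kl = p1 * math.log(p1 / p0) + (1 - p1) * math.log((1 - p1) / (1 - p0))
    return pairs * kl


def first_order_delay(arl, n, p0, p1):
    """First-order approximation log(gamma) / I of the detection delay."""
    return math.log(arl) / information_number(n, p0, p1)
```

For instance, `first_order_delay(1000, 5, 0.2, 0.8)` evaluates $\log(1000) / I$ for a 5-node subgraph with illustrative probabilities. Since $I$ grows with $\binom{n}{2}$, larger changed subgraphs give a flatter EDD curve against $\log \gamma$.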

Fig. 2. The EDD/ARL tradeoff for the window-limited (WL) GLR by graph scanning and the optimal CUSUM when the subgraph is known. Left: $N = 20$; right: $N = 50$.

VI. CONCLUSION

We have revisited sequential change detection for Erdős-Rényi graphs. The problem setup can be applied to community detection problems. The graph scanning statistic considered in this paper is formed by scanning all possible subgraphs over the whole graph. The detection procedure matches the well-known GLR test and is asymptotically optimal.

Future directions include extending the proposed method to more complicated graphical models and to sequences with dependency. Moreover, the framework can be applied to the goodness-of-fit test for local regions. The global null is that a known graphical distribution $P$ (e.g., $\mathrm{ER}(N, p_0)$) is a good fit for all local regions (the whole graph). The alternative is that there is a subgraph whose underlying distribution differs significantly from $P$. For each local region, we can compute a local test statistic based on the GLR and compare it with a threshold, which can be set by simulation or from the limiting distribution of the test statistic.

ACKNOWLEDGMENT

The work of Liyan Xie and Yao Xie is partially supported by an NSF CAREER Award CCF-1650913, DMS-1938106, DMS-1830210, CCF-1442635, and CMMI-1917624.

REFERENCES

[1] G. Rovatsos, X. Jiang, A. D. Domínguez-García, and V. V. Veeravalli, “Statistical power system line outage detection under transient dynamics,” IEEE Transactions on Signal Processing, vol. 65, no. 11, pp. 2787–2797, 2017.
[2] Y. C. Chen, T. Banerjee, A. D. Dominguez-Garcia, and V. V. Veeravalli, “Quickest line outage detection and identification,” IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 749–758, 2015.
[3] L. Xie, Y. Xie, and G. V. Moustakides, “Sequential subspace change point detection,” Sequential Analysis, vol. 39, no. 3, pp. 307–335, 2020.
[4] Y. Wang, A. Chakrabarti, D. Sivakoff, and S. Parthasarathy, “Fast change point detection on dynamic social networks,” arXiv preprint arXiv:1705.07325, 2017.
[5] L. Peel and A. Clauset, “Detecting change points in the large-scale structure of evolving networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1, 2015.
[6] S. Li, Y. Xie, M. Farajtabar, A. Verma, and L. Song, “Detecting changes in dynamic events over networks,” IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 2, pp. 346–359, 2017.
[7] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publ. Math. Inst. Hung. Acad. Sci, vol. 5, no. 1, pp. 17–60, 1960.
[8] T. L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2917–2929, 1998.
[9] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park, “Scan statistics on Enron graphs,” Computational & Mathematical Organization Theory, vol. 11, no. 3, pp. 229–247, 2005.
[10] J. Sharpnack, A. Rinaldo, and A. Singh, “Detecting anomalous activity on networks with the graph Fourier scan statistic,” IEEE Transactions on Signal Processing, vol. 64, no. 2, pp. 364–379, 2016.
[11] X. He, Y. Xie, S.-M. Wu, and F.-C. Lin, “Sequential graph scanning statistic for change-point detection,” in […]. IEEE, 2018, pp. 1317–1321.
[12] D. Marangoni-Simonsen and Y. Xie, “Sequential changepoint approach for online community detection,” IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1035–1039, 2015.
[13] E. Arias-Castro and N. Verzelen, “Community detection in dense random networks,” Annals of Statistics, vol. 42, no. 3, pp. 940–969, 2014.
[14] N. Verzelen and E. Arias-Castro, “Community detection in sparse random networks,” Annals of Applied Probability, vol. 25, no. 6, pp. 3465–3510, 2015.
[15] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1/2, pp. 100–115, 1954.
[16] G. Lorden, “Procedures for reacting to a change in distribution,” Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, 1971.
[17] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” Annals of Statistics, vol. 14, no. 4, pp. 1379–1387, 1986.
[18] Y. Ritov, “Decision theoretic optimality of the CUSUM procedure,” Annals of Statistics, vol. 18, no. 3, pp. 1464–1469, 1990.
[19] U. Feige, D. Peleg, and G. Kortsarz, “The dense k-subgraph problem,” Algorithmica, vol. 29, no. 3, pp. 410–421, 2001.
[20] M. Pollak, “Optimal detection of a change in distribution,”