Optimality of Graph Scanning Statistic for Online Community Detection
Liyan Xie and Yao Xie
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States
Email: [email protected], [email protected]
Abstract: Sequential change-point detection for graphs is a fundamental problem for streaming network data types and has wide applications in social networks and power systems. Given fixed vertices and a sequence of random graphs, the objective is to detect the change-point where the underlying distribution of the random graph changes. In particular, we focus on the local change that only affects a subgraph. We adopt the classical Erdős-Rényi model and revisit the generalized likelihood ratio (GLR) detection procedure. The scan statistic is computed by sequentially estimating the most-likely subgraph where the change happens. We provide theoretical analysis for the asymptotic optimality of the proposed procedure based on the GLR framework. We demonstrate the efficiency of our detection algorithm using simulations.
I. INTRODUCTION
Change-point detection is a fundamental problem for network data, such as power systems [1], [2], sensor networks [3], and social networks [4]–[6]. Network data can be modeled as graphs. For instance, in social networks, each node represents a user, and each edge represents the connectivity between two users. We consider the Erdős-Rényi model [7], which is parameterized by the probability of having an edge between two nodes.

In this paper, we consider the detection of a local change in the graph, which only affects the distribution of a subgraph. The detection procedure is to form the scan statistic based on the Erdős-Rényi model. In particular, we treat the affected subgraph as unknown anomaly information, and apply the generalized likelihood ratio test [8] to form the scan statistic, utilizing the graph scanning techniques [9], [10].

In order to give a computationally efficient algorithm and theoretical calibration, we assume the size of the subgraph affected by the change is known. More specifically, when the change happens, only a subset of the graph, of known size, is affected by the change and has a different distribution, while the distribution for the rest of the graph remains the same. The problem of local change-point detection is challenging for two reasons: (i) it is not clear whether there is a change; (ii) if there is a change at some time, it is not clear which subgraph contains the change.

The major motivating application of our study is community detection. In particular, we are interested in detecting the emergence of a community in a network that is homogeneous in the beginning. Such a problem is essential for dynamic networks. For example, a social network can start from a homogeneous state, and then evolve over time and form a community. Usually, the interactions between nodes within the community are denser than in other parts of the network. Another example is ambient noise monitoring in seismic sensor networks.
More specifically, the cross-correlation function between the sensors affected by the change will have a significant peak at the time of the change. Meanwhile, such a waveform does not exist for cross-correlation functions between affected sensors and unaffected sensors, or among unaffected sensors. Therefore, this problem, mathematically, becomes detecting a local change in a sequence of graphs [11].

In this paper, we focus on the parametric approach for constructing scan statistics to detect a local change in a sequence of graphs. For simplicity, we adopt the Erdős-Rényi model (while the analysis can be generalized to more complicated models), where each edge exists with probability $p_0 \in (0, 1)$, independently of the others. After the change, the affected subgraph still follows the Erdős-Rényi model, but with a different parameter $p_1 \in (0, 1)$. We consider a sequential detection setting and prove the optimality of the online detection procedure in the sense that the detection delay matches the well-known lower bound. The main idea of the proof is adapted from the seminal work on the generalized likelihood ratio test [8].

This paper is related to works in community detection and graph scan statistics. In [12], three likelihood-ratio-based algorithms were developed for detecting communities in the Erdős-Rényi graph, including the exhaustive search, the mixture, and the hierarchical mixture methods. Theoretical approximations were also given in [12] to characterize the false alarms. In [13], community detection for Erdős-Rényi graphs was considered as a hypothesis testing problem, where the goal is to find a test function that takes the random graph as input and claims whether there is a community or not. The detectability of this problem in the asymptotic dense regime, when the connection probability $p$ is large enough, was provided in [13].
Later on, information-theoretic lower bounds for the asymptotically sparse regime, when the connection probability $p$ is small enough, were studied in [14].

The rest of the paper is organized as follows. We present the problem setup in Section II. The detection procedure is detailed in Section III. We present the optimality study in Section IV. Numerical examples are presented in Section V to support the theoretical findings. Finally, Section VI contains our concluding remarks.

II. PROBLEM SETUP
Given a network with $N$ sensors (nodes) numbered as $\{1, \ldots, N\}$, the dynamic graphical structure is observed as a sequence of undirected adjacency matrices $G^{(1)}, G^{(2)}, \ldots$, where $G^{(t)} \in \{0, 1\}^{N \times N}$ characterizes the edge or interaction information between different nodes, i.e., $G^{(t)}_{ij} = 1$ if and only if there is an edge between node $i$ and node $j$ at time $t$. Consider the Erdős-Rényi model, denoted as $\mathrm{ER}(N, p)$, where the graph is constructed by connecting nodes randomly. Each edge is included with probability $p$ independently, i.e., $P(G^{(t)}_{ij} = 1) = p$. The sequence of observations $G^{(1)}, G^{(2)}, \ldots$ are independent realizations of the Erdős-Rényi model.

Assume that there is a change-point at an unknown time $\tau$ that changes the distribution of a subgraph with nodes indexed by $V^* \subset \{1, \ldots, N\}$. Before the change, the full graph follows the model $\mathrm{ER}(N, p_0)$ with connection probability $p_0$. After the change, the subgraph $V^*$ follows $\mathrm{ER}(|V^*|, p_1)$, i.e., the connection probability inside the subgraph $V^*$ becomes $p_1$, with everything else the same. Here $|V^*|$ denotes the cardinality of the set $V^*$. Usually, the true subgraph $V^*$ is unknown as it represents the anomaly information. We assume that the cardinality of $V^*$ equals a known constant $n$ with $n < N$. In most applications, we have $n \ll N$, which means that we are only interested in local graphical changes.

Although the true subgraph where the change happens is unknown, there are only a finite number of possible subgraphs when the total number of nodes $N$ is fixed. Denote all possible subgraphs as
$$\mathcal{V} = \{V^{(1)}, \ldots, V^{(d)}\}.$$
Note that the number of all possible subgraphs can be upper bounded by $\binom{N}{n}$, and we further have
$$d \ll \binom{N}{n}$$
if we are only interested in locally connected subgraphs, which is a reasonable assumption for community detection tasks where the change tends to happen within a small neighborhood.

In summary, the problem of detecting a local change for the underlying Erdős-Rényi model becomes the following hypothesis testing problem:
$$
\begin{aligned}
H_0:\ & P(G^{(t)}_{ij} = 1) = p_0, \quad \forall i, j;\ t = 1, 2, \ldots \\
H_1:\ & P(G^{(t)}_{ij} = 1) = p_0, \quad \forall i, j;\ t = 1, 2, \ldots, \tau - 1, \\
& P(G^{(t)}_{ij} = 1) = p_1, \quad \forall i, j \in V^*;\ t = \tau, \tau + 1, \ldots \\
& P(G^{(t)}_{ij} = 1) = p_0, \quad \forall i \text{ or } j \notin V^*;\ t = \tau, \tau + 1, \ldots
\end{aligned}
\tag{1}
$$
where $\tau$ represents the change-point. This hypothesis testing problem is illustrated in Fig. 1, where the post-change subgraph contains only three nodes and is shown in highlight.

Fig. 1. Graphs prior to the change-point at time $\tau$ follow the Erdős-Rényi model with connection probability $p_0$. After the change-point $\tau$, the subgraph (shown in highlight) follows the Erdős-Rényi model with connection probability $p_1 \neq p_0$, with everything else the same. We are particularly interested in detecting the local change in the subgraph.

Given access to a sequence of graph observations $G^{(1)}, G^{(2)}, \ldots$, if they are sampled from the hypothesis $H_1$ in the model (1), our objective is to design a stopping time that can detect the unknown change-point $\tau$ as quickly as possible. Meanwhile, if the data are sampled from the hypothesis $H_0$ in the model (1), it is desired to have as few false alarms as possible. Here we assume the connection probabilities $p_0, p_1$ are known, but the subgraph where the change happens is unknown.

III. DETECTION PROCEDURE
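To make the observation model (1) concrete, the following is a minimal simulation sketch rather than the authors' code. The function name `sample_er_sequence`, the use of NumPy, and the example values ($N = 20$, $p_0 = 0.2$, $p_1 = 0.8$, a five-node subgraph, $\tau = 11$) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sample_er_sequence(N, p0, p1, subgraph, tau, T, rng):
    """Return T symmetric 0/1 adjacency matrices drawn from model (1).

    Times are 1-indexed as in the text: graphs t = 1..tau-1 follow ER(N, p0);
    from t = tau on, edges with both endpoints in `subgraph` use p1 instead.
    (Hypothetical helper; names and signature are assumptions.)
    """
    iu = np.triu_indices(N, k=1)               # all node pairs with i < j
    in_sub = np.zeros(N, dtype=bool)
    in_sub[list(subgraph)] = True
    both_in = in_sub[iu[0]] & in_sub[iu[1]]    # pairs lying inside the subgraph
    graphs = np.zeros((T, N, N), dtype=np.int8)
    for t in range(1, T + 1):
        probs = np.full(iu[0].size, p0)
        if t >= tau:
            probs[both_in] = p1                # local change only
        edges = (rng.random(iu[0].size) < probs).astype(np.int8)
        G = np.zeros((N, N), dtype=np.int8)
        G[iu] = edges                          # fill the upper triangle
        graphs[t - 1] = G + G.T                # symmetrize, zero diagonal
    return graphs

# Illustrative values only (assumed, not from the paper).
rng = np.random.default_rng(0)
graphs = sample_er_sequence(N=20, p0=0.2, p1=0.8,
                            subgraph=[0, 1, 2, 3, 4], tau=11, T=30, rng=rng)
```

Each edge is drawn independently, so only the upper triangle is sampled and then mirrored to obtain an undirected graph.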
The detection problem (1) can be solved based on statistical change detection methodology, which we describe in this section. We start by introducing the basic cumulative sum (CUSUM) procedure for change detection when the subgraph $V^*$ is known, and then study the generalized likelihood ratio (GLR) test for unknown subgraphs.

The log-likelihood ratio between the pre- and post-change distributions plays a key role in sequential change detection. For the Erdős-Rényi model $\mathrm{ER}(N, p_0)$ before the change, the likelihood function of $G^{(t)}$ is
$$L(G^{(t)}, p_0) = \prod_{1 \le i < j \le N} p_0^{G^{(t)}_{ij}} (1 - p_0)^{1 - G^{(t)}_{ij}}.$$
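Because edges are independent Bernoulli variables, the log-likelihood ratio of a post-change model on a candidate node set $V$ against the pre-change model involves only the edges inside $V$: each present edge contributes $\log(p_1/p_0)$ and each absent edge contributes $\log((1-p_1)/(1-p_0))$. The sketch below illustrates this together with a window-limited scan over candidate subgraphs. It is written under stated assumptions, not as the paper's procedure: the function names, the inclusive index convention for the sum over times, the omission of a minimal delay, and the exhaustive enumeration of candidates are all choices made here for illustration.

```python
from itertools import combinations
import numpy as np

def edge_llr(G, V, p0, p1):
    """Log-likelihood ratio (p1 vs. p0) of one graph G, restricted to edges inside V."""
    V = np.asarray(V)
    sub = G[np.ix_(V, V)]                              # adjacency restricted to V
    m = sub[np.triu_indices(len(V), k=1)].sum()        # edges present inside V
    total = len(V) * (len(V) - 1) // 2                 # possible edges inside V
    return m * np.log(p1 / p0) + (total - m) * np.log((1 - p1) / (1 - p0))

def window_scan_statistic(graphs, candidates, p0, p1, window):
    """Maximize, over candidate V and start time k within the last `window`
    graphs, the summed log-likelihood ratio from k up to the latest graph.
    (Assumed form of a window-limited scan; `window` is a hypothetical name.)
    """
    recent = graphs[-window:]
    best_val, best_V = -np.inf, None
    for V in candidates:
        llr = np.array([edge_llr(G, V, p0, p1) for G in recent])
        # suffix[j] = sum of llr[j:], i.e. the statistic for start index j
        suffix = np.cumsum(llr[::-1])[::-1]
        if suffix.max() > best_val:
            best_val, best_V = suffix.max(), tuple(V)
    return best_val, best_V
```

A detection rule of this type would raise an alarm the first time the returned value exceeds a threshold $b$. Passing `combinations(range(N), n)` as `candidates` enumerates all $\binom{N}{n}$ node sets, which is only practical for small $N$; restricting to locally connected subgraphs shrinks the candidate set.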
We first introduce two metrics commonly used to characterize the performance of detection procedures in sequential change detection.

The average run length (ARL) is defined as the average time between false alarms when there is no change; it can be denoted as $\mathbb{E}_\infty[T]$, where $\mathbb{E}_\infty$ is the expectation under the pre-change measure (i.e., the change-point is at $\infty$). The expected detection delay (EDD) refers to the expected delay in detecting the change. There are two common definitions for EDD as introduced in [16] and [20]. We adopt the one in [16] as follows:
$$\overline{\mathbb{E}}(T) = \sup_{\tau \ge 1} \operatorname{ess\,sup} \mathbb{E}_\tau\left[(T - \tau)^+ \mid G^{(1)}, \ldots, G^{(\tau - 1)}\right], \tag{7}$$
where the essential supremum is taken over all possible change-points $\tau$ and realizations $G^{(1)}, \ldots, G^{(\tau - 1)}$ before the change, and $\mathbb{E}_\tau$ is the expectation under the probability measure that the change-point equals $\tau$.

The goal is to minimize EDD subject to the ARL constraint that $\mathbb{E}_\infty[T] \ge \gamma$ for a positive constant $\gamma$.

The lower bound to the worst-case EDD $\overline{\mathbb{E}}(T)$ was given in [8, Theorem 1]. More specifically, we restate this lower bound in our setting (1).

Theorem 1 ([8]). As $\gamma \to \infty$, we have
$$\inf\left\{\overline{\mathbb{E}}(T) : \mathbb{E}_\infty(T) \ge \gamma\right\} \ge (I^{-1} + o(1)) \log \gamma,$$
where
$$I = \binom{n}{2}\left(p_1 \log\frac{p_1}{p_0} + (1 - p_1)\log\frac{1 - p_1}{1 - p_0}\right)$$
is the Kullback–Leibler (KL) divergence between the graphical distributions $\mathrm{ER}(n, p_1)$ and $\mathrm{ER}(n, p_0)$ on the changed subgraph $V^*$ with $|V^*| = n$.

Theorem 1 means that for any detection procedure with ARL greater than $\gamma$, the minimal detection delay is of the order of $\log \gamma / I$. Therefore, a detection procedure is called first-order asymptotically optimal if its EDD equals $(\log \gamma / I)(1 + o(1))$ as $\gamma \to \infty$. Below, we prove that the detection rule (5) achieves the lower bound in Theorem 1.

We first consider the pre-change regime and state the following lemma:

Lemma 2. For the stopping time (5), we have
$$\sup_{\tau \ge 1} P_\infty(\tau \le T_G < \tau + m_\alpha) \le m_\alpha e^{-b}\binom{N}{n}. \tag{8}$$

Proof. First note that
$$P_\infty(\tau \le T_G \le \tau + m_\alpha) \le \sum_{\tau - m_\alpha \le k \le \tau + m_\alpha} P_\infty(\tau_k \le k + m_\alpha), \tag{9}$$
where
$$\tau_k := \inf\left\{t \ge k + m'_\alpha : \widehat{V}_{k,t} \in \mathcal{V}, \text{ and } R_{t,k,\widehat{V}_{k,t}} \ge b\right\}. \tag{10}$$
To analyze $P_\infty(\tau_k \le k + m_\alpha)$, we use a change-of-measure argument. Let $P^k_V$ denote the probability measure under which the distribution of $G^{(i)}$ is $\mathrm{ER}(N, p_0)$ for $i < k$, and the distribution of the subgraph $V$ becomes $\mathrm{ER}(n, p_1)$ for $i \ge k$. Define a measure
$$Q_k = \sum_{V \in \mathcal{V}} P^k_V.$$
Since $\mathcal{V}$ is a finite set, $Q_k$ is a finite measure. For $t \ge k$, let $\mathcal{F}_{k,t}$ denote the sigma-algebra generated by $G^{(k)}, \ldots, G^{(t)}$. The Radon-Nikodym derivative of the restriction of the measure $Q_k$ to $\mathcal{F}_{k,t}$ relative to the restriction of $P_\infty$ to $\mathcal{F}_{k,t}$ is
$$L_t = \sum_{V \in \mathcal{V}} \exp\{R_{t,k,V}\}.$$
Hence by Wald's likelihood ratio identity, we have
$$P_\infty(\tau_k \le k + m_\alpha) = \int_{\{\tau_k \le k + m_\alpha\}} L_{\tau_k}^{-1}\, dQ_k = \sum_{V \in \mathcal{V}} \int_{\{\tau_k \le k + m_\alpha\}} L_{\tau_k}^{-1}\, dP^k_V.$$
Since
$$L_{\tau_k} = \sum_{V \in \mathcal{V}} \exp\{R_{\tau_k,k,V}\} \ge \exp\left\{R_{\tau_k,k,\widehat{V}_{k,t}}\right\} \ge e^b, \tag{11}$$
where the last inequality is due to the definition of $\tau_k$ in (10), we have that
$$\sup_k P_\infty(\tau_k \le k + m_\alpha) = \sup_k \sum_{V \in \mathcal{V}} \int_{\{\tau_k \le k + m_\alpha\}} L_{\tau_k}^{-1}\, dP^k_V \le e^{-b}|\mathcal{V}| \le e^{-b}\binom{N}{n}.$$
Substituting into (9), we have
$$P_\infty(\tau \le T_G \le \tau + m_\alpha) \le m_\alpha e^{-b}\binom{N}{n}.$$

Given the condition (8) in Lemma 2, by [8, Theorem 4], we have the following optimality results for our setup (1).
Theorem 3. For the stopping time (5), if the window size $m_\alpha$ satisfies
$$\liminf_{\alpha \to 0} \frac{m_\alpha}{|\log \alpha|} > I^{-1}, \qquad \log m_\alpha = o(\log \alpha),$$
and the threshold $b$ satisfies
$$m_\alpha e^{-b}\binom{N}{n} = \alpha, \tag{12}$$
then we have
$\mathbb{E}_\infty[T_G] \ge$ ( 12 − α )( m α α − , (13)
and as $\alpha \to 0$,
$$\overline{\mathbb{E}}(T_G) \le (I^{-1} + o(1))\, b, \quad \text{as } b \sim |\log \alpha| \to \infty.$$
Therefore the stopping rule (5) is asymptotically optimal.

Proof. For the true subgraph $V^*$, define the window-limited CUSUM rule as
$$\widetilde{T} = \inf\left\{t : \max_{t - m_\alpha \le k \le t - m'_\alpha} R_{t,k,V^*} > b\right\}.$$
It is obvious that
$$\overline{\mathbb{E}}(T_G) \le \overline{\mathbb{E}}(\widetilde{T})$$
due to the definition of $T_G$ in (5). Note that for any finite values of $N, n$, the constraint (12) implies that we can choose the threshold as $b \sim |\log \alpha|$. By [8, Theorem 4], we have, as $b \sim |\log \alpha| \to \infty$,
$$\overline{\mathbb{E}}(\widetilde{T}) \le (I^{-1} + o(1))\, b,$$
if $m'_\alpha = o(|\log \alpha|)$. The ARL bound (13) is proved in [8] whenever $T_G$ satisfies the condition (8). Further note that
$$\log \mathbb{E}_\infty[T_G] \sim |\log \alpha|.$$
Therefore, $T_G$ matches the lower bound in Theorem 1. In other words, $T_G$ is first-order asymptotically optimal.

It is worth mentioning that the condition (12) only yields $b \sim |\log \alpha|$ for finite values of $N, n$. It does not hold for $N$ that diverges to infinity. The analysis here differs from the original GLR framework considered in [8] and can be viewed as a special case, since the unknown subgraph has finitely many possibilities while the unknown parameter in parametric models can vary in a continuous space.

Remark 4 (Generalization to unknown $p_1$). When the post-change probability $p_1$ is unknown, we can estimate it using the maximum likelihood estimator when formulating the GLR detection statistic. More specifically, given $G^{(k)}, \ldots, G^{(t)}$ and a subgraph $V$, the MLE of $p_1$ is given by
$$\widehat{p}_1^{(k,t,V)} = \sum_{i,j \in V,\ i < j} \cdots$$

V. NUMERICAL EXAMPLES

We present simulation examples using the Erdős-Rényi graphical models to visualize the detection procedure (5).

The size of the network $N$, i.e., the total number of nodes, is set as 20 and 50, respectively. In both cases, we are interested in detecting the change that happens only in a much smaller subgraph consisting of $n = 5$ nodes. The pre-change edge-forming probability is set as $p_0$ = 0.…, and the post-change probability is $p_1$ = 0.…, i.e., the change increases the intensity of edges within the changed subgraph.

For numerical reasons, we do not compute the worst-case detection delay (7), which takes the supremum over all possible past observations and over all possible change-points. Instead, we compute an alternative formulation $\mathbb{E}[T_G]$ that can be conveniently evaluated by setting the change-point as $\tau = 1$, i.e., the change happens before we take any sample.

In Fig. 2, we compare the EDD of the GLR procedure in (5) and the CUSUM procedure in (3). The CUSUM statistic serves as a baseline since, with the subgraph known, it is the optimal detection procedure with the smallest detection delay. It is shown that the detection delay of the GLR approach indeed matches the detection delay of CUSUM to first order (i.e., in the slope). Moreover, it can be seen that the detection delay of the GLR approach tends to increase as we increase the network size $N$, since it becomes more difficult to scan for the right subgraph containing the change.

Fig. 2. The EDD/ARL tradeoff for window-limited GLR by graph scanning, and the optimal CUSUM when the subgraph is known. Left: $N = 20$; Right: $N = 50$. (Both panels plot EDD against ARL, with curves labeled CUSUM and WL GLR.)

VI. CONCLUSION

We have revisited sequential change detection for Erdős-Rényi graphs. The problem setup can be applied to community detection problems. The graph scanning statistic considered in this paper is formed by scanning all possible subgraphs over the whole graph. The detection procedure matches the well-known GLR test and is asymptotically optimal.

Future directions include extending the proposed method to more complicated graphical models and to sequences with dependency. Moreover, the framework can be applied to the goodness-of-fit test for local regions. The global null is that a known graphical distribution $P$ (e.g., $\mathrm{ER}(N, p)$) is a good fit for all local regions (the whole graph). The alternative is that there is a subgraph on which the underlying distribution is significantly distinct from $P$.
For each local region, we can compute a local test statistic based on GLR, and compare it with a threshold, which can be set by simulation or from the limiting distribution of the test statistic.

ACKNOWLEDGMENT

The work of Liyan Xie and Yao Xie is partially supported by an NSF CAREER Award CCF-1650913, DMS-1938106, DMS-1830210, CCF-1442635, and CMMI-1917624.

REFERENCES

[1] G. Rovatsos, X. Jiang, A. D. Domínguez-García, and V. V. Veeravalli, “Statistical power system line outage detection under transient dynamics,” IEEE Transactions on Signal Processing, vol. 65, no. 11, pp. 2787–2797, 2017.
[2] Y. C. Chen, T. Banerjee, A. D. Dominguez-Garcia, and V. V. Veeravalli, “Quickest line outage detection and identification,” IEEE Transactions on Power Systems, vol. 31, no. 1, pp. 749–758, 2015.
[3] L. Xie, Y. Xie, and G. V. Moustakides, “Sequential subspace change point detection,” Sequential Analysis, vol. 39, no. 3, pp. 307–335, 2020.
[4] Y. Wang, A. Chakrabarti, D. Sivakoff, and S. Parthasarathy, “Fast change point detection on dynamic social networks,” arXiv preprint arXiv:1705.07325, 2017.
[5] L. Peel and A. Clauset, “Detecting change points in the large-scale structure of evolving networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1, 2015.
[6] S. Li, Y. Xie, M. Farajtabar, A. Verma, and L. Song, “Detecting changes in dynamic events over networks,” IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 2, pp. 346–359, 2017.
[7] P. Erdős and A. Rényi, “On the evolution of random graphs,” Publ. Math. Inst. Hung. Acad. Sci., vol. 5, no. 1, pp. 17–60, 1960.
[8] T. L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2917–2929, 1998.
[9] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park, “Scan statistics on Enron graphs,” Computational & Mathematical Organization Theory, vol. 11, no. 3, pp. 229–247, 2005.
[10] J. Sharpnack, A. Rinaldo, and A. Singh, “Detecting anomalous activity on networks with the graph Fourier scan statistic,” IEEE Transactions on Signal Processing, vol. 64, no. 2, pp. 364–379, 2016.
[11] X. He, Y. Xie, S.-M. Wu, and F.-C. Lin, “Sequential graph scanning statistic for change-point detection,” in . IEEE, 2018, pp. 1317–1321.
[12] D. Marangoni-Simonsen and Y. Xie, “Sequential changepoint approach for online community detection,” IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1035–1039, 2015.
[13] E. Arias-Castro and N. Verzelen, “Community detection in dense random networks,” Annals of Statistics, vol. 42, no. 3, pp. 940–969, 2014.
[14] N. Verzelen and E. Arias-Castro, “Community detection in sparse random networks,” Annals of Applied Probability, vol. 25, no. 6, pp. 3465–3510, 2015.
[15] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1/2, pp. 100–115, 1954.
[16] G. Lorden, “Procedures for reacting to a change in distribution,” Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, 1971.
[17] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” Annals of Statistics, vol. 14, no. 4, pp. 1379–1387, 1986.
[18] Y. Ritov, “Decision theoretic optimality of the CUSUM procedure,” Annals of Statistics, vol. 18, no. 3, pp. 1464–1469, 1990.
[19] U. Feige, D. Peleg, and G. Kortsarz, “The dense k-subgraph problem,” Algorithmica, vol. 29, no. 3, pp. 410–421, 2001.
[20] M. Pollak, “Optimal detection of a change in distribution,”