Change-point detection for multivariate and non-Euclidean data with local dependency
Hao Chen
Department of Statistics, University of California, Davis, One Shields Avenue, Davis, California 95616, USA. e-mail: [email protected]
Abstract:
In a sequence of multivariate observations or non-Euclidean data objects, such as networks, local dependence is common and could lead to false change-point discoveries. We propose a new way of permutation, circular block permutation with a random starting point, to address this problem. This permutation scheme is studied on a non-parametric change-point detection framework based on a similarity graph constructed on the observations, leading to a general framework for change-point detection for data with local dependency. Simulation studies show that this new framework retains the same level of power when there is no local dependency, while it controls type I error correctly for sequences with and without local dependency. We also derive an analytic $p$-value approximation under this new framework. The approximation works well for sequences with length in the hundreds and above, making this approach fast to apply to long data sequences.

AMS 2000 subject classifications: Primary 62G32.

Keywords and phrases: graph-based tests, circular block permutation, non-parametric, scan statistic, high-dimensional data, non-Euclidean data, tail probability, analytic p-value approximation.
1. Introduction
Change-point detection is a widely studied problem in statistics with applications in many fields. In the typical formulation, we have observations $\{y_t : t = 1, \dots, n\}$ over time (or some other meaningful ordering, such as a one-dimensional spatial domain), and test whether there exists $\tau \in \{1, \dots, n-1\}$ such that the underlying distribution of $y_t$ changes at $\tau$. There is a rich literature on this model when the $y_t$'s are real or integer valued scalars (see Carlstein, Müller and Siegmund (1994); Csörgő and Horváth (1997) for a survey).

As we enter the big data era, change-point analysis for sequences of multivariate observations or non-Euclidean data objects is gaining more and more attention. For example, in text or sequence analysis, each observation in the sequence could be a vector of word counts over a large dictionary of words (Girón, Ginebra and Riba, 2005; Tsirigos and Rigoutsos, 2005). Network data is ubiquitous nowadays as well. Email, phone and online chat records can be used to construct networks of social interactions among individuals (Kossinets and Watts, 2006; Eagle, Pentland and Lazer, 2009). A large part of these studies characterizes how the network evolves through time. Here, each observation is a network and one might ask whether there is an abrupt shift in network connectivity at any point in time. In these sequences of complicated data types, it is common that observations are autocorrelated. For example, relationships among people last over an extended time period, so the social networks have serial correlations.

A closely related field is time series data analysis. The ARCH model proposed by Engle (1982) and the GARCH model proposed by Bollerslev (1986), and their variants, have been widely used for studying one-dimensional time series data.
There are many generalizations to accommodate multivariate time series data (see, for example, Bauwens, Laurent and Rombouts (2006); Silvennoinen and Teräsvirta (2009); Aue et al. (2009) and references therein). These models are useful for low-dimensional data and/or for detecting specific types of changes. For high-dimensional data, tests based on these parametric models cannot be applied or lack power unless strong assumptions are made on the data to avoid estimating the large number of nuisance parameters.

In this work, we restrict the change-point detection problem to locally dependent data, for which we are able to develop a general framework for high-dimensional data and non-Euclidean data objects. We leave the problem of long-range dependence for future studies. The proposed framework builds upon an earlier work by Chen and Zhang (2015), in which the authors developed a nonparametric framework for change-point detection for generic data types under the assumption that the observations are independent. When there is local dependence in the data, the method in Chen and Zhang (2015) could result in more false discoveries than it should (see Section 3 for details). To address this problem, we propose a new way of permutation: circular block permutation with a random starting point. This permutation scheme retains the local structure and can control the type I error correctly when the sequence is locally dependent. Moreover, simulation studies show that it retains the same level of power when the observations in the sequence are independent.

In the following, we use $\{y_1, y_2, \dots, y_n\}$ to denote the data sequence, where $y_t$ could lie in a high-dimensional or a non-Euclidean space. We focus on the single change-point alternative to illustrate the idea, i.e., there exists at most one change-point. The proposed procedure can be extended to the changed interval alternative and to multiple change-points.
These extensions are discussed in Section 6.2.

The rest of the paper is organized as follows. Section 2 briefly reviews the method introduced in Chen and Zhang (2015), which uses a similarity graph constructed on the observations for change-point detection. Readers familiar with this method can skip this section. Section 3 discusses the issues of the method in Chen and Zhang (2015) when the sequence is locally dependent, and proposes a new permutation framework to address them. Section 4 discusses the proposed test statistic in more detail and derives analytic formulas to calculate it. Section 5 derives analytic $p$-value approximations for the test statistic and checks how the approximations work for finite samples. We conclude the paper in Section 6 by discussing the choice of the block size in the new framework and the extension of the proposed framework to multiple change-points.
2. A brief review of Chen and Zhang (2015)
In Chen and Zhang (2015), the authors assume that $y_1, y_2, \dots, y_n$ are independent. They adapted the edge-count test to the scan statistic framework: each $t$ divides the observations into two samples, $\{y_1, \dots, y_t\}$ and $\{y_{t+1}, \dots, y_n\}$, and the edge-count test is conducted to test whether these two samples are from the same distribution. Then, the maximum of the scan statistics over $t$ is used as the test statistic.

The edge-count test, introduced in Friedman and Rafsky (1979), is a two-sample test based on a similarity graph constructed on the pooled observations of the two samples. The similarity graph can be a given graph that reflects the similarity between observations (Chen and Zhang, 2013). More generally, it can be constructed from a similarity measure through a certain criterion, such as a minimum spanning tree (MST) (Friedman and Rafsky, 1979), which is a tree connecting all observations with the total distance across edges minimized, a minimum distance pairing (Rosenbaum, 2005), or a nearest neighbor graph (Henze, 1988).

The edge-count test counts the number of edges in the graph that connect observations from different samples and rejects the null of equal distribution when the count is significantly smaller than its null expectation. When the two samples are from the same distribution, they are well mixed and this count is relatively large, so a small count is evidence against the null of equal distribution. Let $G$ be the similarity graph on all observations in the sequence. The edge-count test statistic at time $t$ is

$$R_G(t) = \sum_{(i,j) \in G} I_{g_i(t) \neq g_j(t)}, \qquad g_i(t) = I_{i > t},$$

where $I_A$ is the indicator function that takes value 1 if event $A$ is true and 0 otherwise.
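The definition of $R_G(t)$ can be made concrete with a small sketch. It assumes Euclidean points, builds the MST with a plain Prim's algorithm, and uses 0-based labels (label `i` stands for $y_{i+1}$); the function names are illustrative and do not come from the paper or from any package.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances; returns n - 1 edges (i, j), 0-based."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each node
    best_from = [-1] * n         # tree node that realises that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v], best_from[v] = d, u
    return edges


def edge_count(edges, t):
    """R_G(t): number of edges joining {y_1..y_t} to {y_{t+1}..y_n}."""
    return sum((i < t) != (j < t) for i, j in edges)


# A sequence with a shift after the third observation, and a well-mixed one.
shifted = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
mixed = [(0, 0), (10, 0), (1, 0), (11, 0), (2, 0), (12, 0)]
for name, pts in (("shifted", shifted), ("mixed", mixed)):
    G = minimum_spanning_tree(pts)
    print(name, [edge_count(G, t) for t in range(1, len(pts))])
```

In the shifted sequence only one MST edge crosses any split, while the mixed sequence has three crossing edges at the middle splits, which is the contrast the edge-count test exploits.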
Figure 1 illustrates the computation of $R_G(t)$ on a small artificial data set (the observations are 2-dimensional for illustration purposes).

Under the null hypothesis of no change-point and the assumption that the $y_i$'s are independent, the joint distribution of $\{y_i : i = 1, \dots, n\}$ is invariant under permutation. Hence, the null distribution of $R_G(t)$ is defined to be the permutation distribution, which places probability $1/n!$ on each of the $n!$ permutations of $\{y_i : i = 1, \dots, n\}$. When there is no further specification, we denote by $P_P$, $E_P$, $\mathrm{Var}_P$ probability, expectation, and variance, respectively, under the permutation null distribution.

Fig 1. The computation of $R_G(t)$ for 4 different $t$'s on a small artificial data set of length $n = 20$, with $G$ being the MST on the Euclidean distance. The index of each observation is beside each point. The first 10 points are randomly drawn from $N(\mathbf{0}, I)$ and the next 10 points are randomly drawn from $N((2, \ldots)^T, I)$. Each $t$ divides the observations into two groups, one group for observations before and at $t$ (shown as circles) and the other group for observations after $t$ (shown as triangles). Edges that connect observations from the two different groups are emboldened in the graph. $G$ does not change as $t$ changes, but the group identities of some observations change, causing $R_G(t)$ to change.

The authors standardized $R_G(t)$ so that it is comparable across $t$. Let

$$Z_G(t) = -\frac{R_G(t) - E_P(R_G(t))}{\sqrt{\mathrm{Var}_P(R_G(t))}}. \tag{2.1}$$

The sign is flipped so that a large $Z_G(t)$ indicates a change-point. The analytic expressions for $E_P(R_G(t))$ and $\mathrm{Var}_P(R_G(t))$ are given in Chen and Zhang (2015). The null hypothesis of no change-point is rejected if the scan statistic

$$\max_{n_0 \le t \le n_1} Z_G(t), \quad (n_0, n_1 \text{ prespecified}) \tag{2.2}$$

is greater than a threshold.
When $n$ is small, this threshold can be determined by performing random permutations directly; when $n$ is large, Chen and Zhang (2015) provided accurate analytic formulas to approximate the permutation $p$-value, allowing fast application of the method. The authors also showed through simulations that this graph-based testing framework has better power than likelihood-based methods when the dimension of the data is moderate to high.
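For small $n$, the direct-permutation route can be sketched as follows. This is an illustration only: it assumes the graph is given as an edge list with 0-based labels, and it replaces the analytic $E_P$ and $\mathrm{Var}_P$ of Chen and Zhang (2015), which are not reproduced here, with Monte Carlo estimates from the same permutation sample. The function names and the defaults `n_perm` and `seed` are hypothetical.

```python
import random
import statistics


def edge_counts(edges, position, n):
    """R_G(t) for t = 1..n-1; position[i] is the 1-based time index of observation i."""
    return [sum((position[i] <= t) != (position[j] <= t) for i, j in edges)
            for t in range(1, n)]


def permutation_scan_test(edges, n, n0, n1, n_perm=500, seed=0):
    """Scan statistic (2.2) and a permutation p-value; requires 1 <= n0 <= n1 <= n - 1."""
    rng = random.Random(seed)
    identity = list(range(1, n + 1))
    observed = edge_counts(edges, identity, n)
    permuted = []
    for _ in range(n_perm):
        position = identity[:]
        rng.shuffle(position)            # graph stays fixed, time indices are permuted
        permuted.append(edge_counts(edges, position, n))
    # Monte Carlo stand-ins for E_P(R_G(t)) and SD_P(R_G(t)) at each t.
    mean = [statistics.fmean(col) for col in zip(*permuted)]
    sd = [statistics.pstdev(col) for col in zip(*permuted)]

    def scan(counts):
        # Sign flipped as in (2.1): a small edge count gives a large Z.
        return max(-(counts[t - 1] - mean[t - 1]) / sd[t - 1]
                   for t in range(n0, n1 + 1) if sd[t - 1] > 0)

    obs_stat = scan(observed)
    p_value = sum(scan(c) >= obs_stat for c in permuted) / n_perm
    return obs_stat, p_value


# Chain graph on 20 observations: every split cuts exactly one edge.
chain = [(i, i + 1) for i in range(19)]
print(permutation_scan_test(chain, n=20, n0=3, n1=17, n_perm=200, seed=1))
```

Reusing the same draws for standardization and for the null distribution keeps the sketch short; a careful implementation would use the analytic moments instead.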
3. A circular block permutation framework for locally dependent data
The method in Chen and Zhang (2015) assumes that the observations are independent. Under the independence assumption, we can permute the order of the observations to get a pool of sequences that have the same distribution as the original sequence under the null hypothesis of no change-point. However, when there is dependence structure within the sequence, such as autocorrelation, the above argument no longer holds. For example, if we apply the scan statistic (2.2) in Chen and Zhang (2015) to an autocorrelated sequence and use the $p$-value calculated based on permutation, it rejects the null hypothesis more often than it should (Figure 2, right panel).

Fig 2. Histograms of $p$-values using the method in Chen and Zhang (2015) in testing homogeneity of 10,000 sequences with no change-point. Left panel: the observations in each sequence are independently generated from a multivariate normal distribution ($y_t \overset{iid}{\sim} N(\mathbf{0}, \Sigma)$, $d = 10$, $\Sigma(i,j)$ depending on $|i-j|$, $n = 200$). Right panel: each sequence is generated from the multivariate autoregressive model ($y_t = \rho y_{t-1} + \varepsilon_t$, $t = 1, \dots, n$, with $y_0 \sim N(\mathbf{0}, \frac{1}{1-\rho^2}\Sigma)$, $\varepsilon_1, \dots, \varepsilon_n \overset{iid}{\sim} N(\mathbf{0}, \Sigma)$, $d = 10$, $n = 200$).

When the sequence has dependency over time, the permutation null distribution is no longer a good surrogate for the true null distribution, as permutation destroys the local structure. If the dependency structure can be removed from the sequence, the remaining sequence with independent observations can be analyzed through the method in Chen and Zhang (2015). This is, however, not realistic for many applications with high-dimensional/non-Euclidean data sequences. To address the local dependence issue, we propose a new null distribution that serves as a better surrogate for the true null distribution than the permutation null.
The block-resampling bootstrap was proposed by Künsch (1989) and independently by Liu and Singh (1992) as a resampling procedure for weakly dependent stationary observations. The idea is that the dependency structure is preserved within the blocks. This was extended to the circular block resampling bootstrap by wrapping the data around in a circle before blocking them (Politis and Romano, 1992). For change-point analysis, block permutation with fixed blocks starting from the first observation and the circular block bootstrap were studied on dependent data for one-dimensional observations (Kirch, 2006).

In light of these studies, we propose to use circular block permutation with a random starting point to generate a pool of sequences representing realizations from approximate distributions of the original sequence with local dependency under the null of no change. The recipe with block size $L$ on a sequence of length $n$ is as follows:

(1) The starting point is chosen uniformly from the $n$ observations and is denoted by $k$. If $k > 1$, the first $k - 1$ observations are moved to the end of the sequence, giving $\{y_k, \dots, y_n, y_1, \dots, y_{k-1}\}$.
(2) The new sequence is blocked into $[n/L]$ blocks of size $L$ starting from the first observation $y_k$. It is possible that the last block has fewer than $L$ observations.
(3) The $[n/L]$ blocks are permuted.

For the above recipe, it is easy to see that the resulting sequence from the circular block permutation with a random starting point is one permutation of the original sequence: each observation appears in the resulting sequence exactly once. Hence the similarity graph on the resulting sequence is the same as that on the original sequence, which makes theoretical analysis of this framework tractable. In addition, the randomized starting point ensures that the probability that any observation $y_i$ appears at any location $j$ in the resulting sequence is $1/n$, ensuring unbiasedness.

To make the theoretical treatment more tractable, we work under the following variant of the framework. We first augment the sequence with $x$ ($0 \le x < L$) pseudo observations added to the end of the sequence so that $n + x$ is divisible by $L$. These $x$ augmented observations have no edge connected to any other observation. Then the recipe described above is applied to this augmented sequence with $n + x$ observations. This variant is the same as the original version when $n$ is divisible by $L$ and works similarly when it is not. In this variant, all the blocks are of size $L$, so the theoretical treatment is much more tractable. In the following, we work under this variant and refer to the framework as 'circular block permutation' or 'CBP' for short. We also use $n$ to denote $n + x$ for simplicity. We use $P_{\mathrm{CBP}}$, $E_{\mathrm{CBP}}$, $\mathrm{Var}_{\mathrm{CBP}}$ to denote probability, expectation, and variance, respectively, under this framework. We consider $L$ to be fixed for the rest of the paper. A discussion on how to choose $L$ in a data-driven way is in Section 6.1.

The standardized edge-count statistic under the circular block permutation can be defined as

$$Z_{G,\mathrm{CBP}}(t) = -\frac{R_G(t) - E_{\mathrm{CBP}}(R_G(t))}{\sqrt{\mathrm{Var}_{\mathrm{CBP}}(R_G(t))}}. \tag{3.1}$$

The scan statistic is then defined as

$$\max_{n_0 \le t \le n_1} Z_{G,\mathrm{CBP}}(t). \tag{3.2}$$

In the following, with no further specification, $n_0$ is set to a fixed fraction of $n$ rounded down to an integer, and $n_1 = n - n_0$. For a real value $s$, we use $[s]$ to denote the largest integer that is no larger than $s$.

Figure 3 shows the histograms of $p$-values in testing the homogeneity of sequences with no change-point. For each sequence, the $p$-value of the test is obtained from 100,000 CBPs. Now we see that the type I error is correctly controlled for sequences of autocorrelated data.

Fig 3. Histograms of $p$-values under CBP (with block size $L = 5$) in testing homogeneity of the same set of sequences as in Figure 2.

An immediate follow-up question is whether the improvement in controlling the type I error under the circular block permutation framework comes at the cost of power. To get an idea of this issue, we compare the power of the two frameworks. Table 1 shows the estimated power based on 1,000 simulation runs. In each simulation run, the sequence is generated in the same way as in Figure 2, except that there is a mean shift in the middle of the sequence with the $L_2$ distance between the means before and after the change being 2. This specific alternative is chosen so that the tests have moderate power. We see that when the data are independent, the power under CBP is similar to that under permutation. When the sequence is autocorrelated, the permutation framework cannot be used, while the CBP framework has power on par with the independent scenario.

Table 1. Estimated power based on 1,000 simulation runs. Significance level set to be 0.05.

                        permutation    CBP, L = 5
  Independent data      0.791          0.779
  Autocorrelated data   -              0.775

These simulation results show the ability of CBP to control the type I error rate while keeping substantial power for sequences with local dependency. In the above simulation runs, the expectation and variance of $R_G(t)$ under CBP, as well as the $p$-value of the test, are calculated by randomly sampling from the circular block permutation distribution. This is very time consuming if one wants good estimates. In the following, we work on analytic expressions (or approximate analytic expressions when the exact analytic expression is hard to obtain) for these quantities to make this framework easy to use in practice.
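The three-step recipe of this section can be sketched in a few lines. The sketch assumes 0-based observation labels and, for the exhaustive helper, that $n$ is divisible by $L$ (the variant used in the theory); the function names are illustrative and not taken from the paper. The helper `exact_mean_edge_count` enumerates every starting point and every block order, which is only feasible for toy sizes and shows why sampling or analytic formulas are needed in practice.

```python
import random
from fractions import Fraction
from itertools import permutations


def blocks_from_start(n, L, k):
    """Steps (1)-(2): rotate labels 0..n-1 to start at k, then cut into blocks of size L."""
    rotated = [(k + s) % n for s in range(n)]
    return [rotated[s:s + L] for s in range(0, n, L)]


def sample_cbp(n, L, rng):
    """One circular block permutation with a random starting point (steps (1)-(3))."""
    blocks = blocks_from_start(n, L, rng.randrange(n))
    rng.shuffle(blocks)
    return [i for block in blocks for i in block]


def edge_count(edges, order, t):
    """R_G(t) when observation order[s] is placed at time s + 1."""
    first = set(order[:t])
    return sum((i in first) != (j in first) for i, j in edges)


def exact_mean_edge_count(edges, n, L, t):
    """E_CBP(R_G(t)) by brute-force enumeration (tiny n only)."""
    total, count = Fraction(0), 0
    for k in range(n):
        for arrangement in permutations(blocks_from_start(n, L, k)):
            order = [i for block in arrangement for i in block]
            total += edge_count(edges, order, t)
            count += 1
    return total / count


rng = random.Random(0)
print(sample_cbp(10, 5, rng))                       # one CBP of 10 observations
print(exact_mean_edge_count([(0, 1)], 4, 2, 1))     # one edge, n = 4, L = 2, t = 1
```

Because blocks are kept intact, neighbouring observations stay neighbours within a block, while the graph itself is unchanged; only the time positions of the observations move.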
4. Analytic expressions under CBP
Setting $m \equiv n/L$, there are in total $L \times m!$ CBPs, and it is very time consuming to draw all of them when $m$ is moderate to large. In this section, we derive an exact analytic expression for $E_{\mathrm{CBP}}(R_G(t))$ (Section 4.1) and an approximate analytic expression for $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ (Section 4.2). In the following, for any scalar $s$, we use $(s)_+$ to denote $\max(0, s)$; and for a set $S$, we use $|S|$ to denote the number of elements in the set.

4.1. $E_{\mathrm{CBP}}(R_G(t))$

Let $\pi_{\mathrm{CBP}}(i)$ be the index of $y_i$ under the circular block permutation. Then

$$E_{\mathrm{CBP}}(R_G(t)) = \sum_{(i,j) \in G} P\big(g_{\pi_{\mathrm{CBP}}(i)}(t) \neq g_{\pi_{\mathrm{CBP}}(j)}(t)\big) \tag{4.1}$$
$$= 2 \sum_{(i,j) \in G} P\big(\pi_{\mathrm{CBP}}(i) \le t,\ \pi_{\mathrm{CBP}}(j) > t\big).$$

The design of the circular block permutation ensures that, besides $t$, $n$ and the block size $L$, $P(\pi_{\mathrm{CBP}}(i) \le t, \pi_{\mathrm{CBP}}(j) > t)$ depends only on $\delta_{ij} = \min(|i-j|, n - |i-j|)$, which is the smaller index difference between $y_i$ and $y_j$ in the circle formed by linking the end of the sequence to its start. In particular, the probability depends on which one of the following categories $\delta_{ij}$ belongs to: $\{\delta_{ij} = 1\}, \dots, \{\delta_{ij} = L-1\}, \{\delta_{ij} \ge L\}$. The probability is the same within each category and can be calculated exactly for each of them. Hence, we can classify the edges in $G$ according to these categories and get Theorem 4.1.

Theorem 4.1.
For each $t \in \{1, \dots, n\}$, we write $t$ in the form $t = aL + b$ where $a = [t/L]$ and $b = t - aL$. Then

$$E_{\mathrm{CBP}}(R_G(t)) = \sum_{k=1}^{L} p(k, a, b)\, |\mathcal{E}_k|,$$

where

$$p(k, a, b) = \frac{2(k-b)_+\, a(m-a)}{n(m-1)} + \frac{2(b-(L-k))_+\, (a+1)(m-a-1)}{n(m-1)} + \frac{2(\min(b, L-k) - (b-k)_+)\,(a(m-a-1) + m - 1)}{n(m-1)},$$

$$\mathcal{E}_k = \{(i,j) \in G : \delta_{ij} = k\},\ k = 1, \dots, L-1, \qquad \mathcal{E}_L = \{(i,j) \in G : \delta_{ij} \ge L\}.$$

Remark 4.2.
When $t$ is divisible by $L$ ($t = aL$), we have $p(k, a, 0) = \frac{2k\, a(m-a)}{n(m-1)}$. Then $E_{\mathrm{CBP}}(R_G(t)) = \frac{2a(m-a)}{n(m-1)} \sum_{k=1}^{L} k\, |\mathcal{E}_k|$.

Proof of Theorem 4.1.
We compute the probability

$$P\big(\pi_{\mathrm{CBP}}(i) \le t,\ \pi_{\mathrm{CBP}}(j) > t\big) \tag{4.2}$$

under different scenarios.

When $\delta_{ij} \ge L$, $y_i$ and $y_j$ are always in two different blocks. If $b = 0$, then $\pi_{\mathrm{CBP}}(i) \le t$ only if the block containing $y_i$ is placed in the first $a$ blocks after the circular block permutation, and $\pi_{\mathrm{CBP}}(j) > t$ only if the block containing $y_j$ is placed in the last $m - a$ blocks. So the probability (4.2) is

$$\frac{a}{m}\,\frac{m-a}{m-1} = \frac{La(m-a)}{n(m-1)}.$$

If $b > 0$, we need to discuss whether the block containing either $y_i$ or $y_j$ sits on $t$, whether $y_i$ is in the first $b$ observations in the block, and whether $y_j$ is in the last $L - b$ observations in the block. Enumerating all possibilities, the probability (4.2) is

$$\frac{a}{m}\left(\frac{m-a-1}{m-1} + \frac{1}{m-1}\,\frac{L-b}{L}\right) + \frac{1}{m}\,\frac{b}{L}\,\frac{m-a-1}{m-1} = \frac{La(m-a) + b(m-2a-1)}{n(m-1)}.$$

When $\delta_{ij} < L$, we also discuss the two scenarios $b = 0$ and $b > 0$. If $b = 0$, $y_i$ and $y_j$ need to be in different blocks to have $(\pi_{\mathrm{CBP}}(i) \le t, \pi_{\mathrm{CBP}}(j) > t)$. Among the $L$ possible ways of blocking the sequence, $\delta_{ij}$ of them have $y_i$ and $y_j$ in different blocks, so the probability (4.2) is

$$\frac{\delta_{ij}}{L}\,\frac{a(m-a)}{m(m-1)} = \frac{\delta_{ij}\, a(m-a)}{n(m-1)}.$$

If $b > 0$, $y_i$ and $y_j$ could be in different blocks or in the same block to satisfy $(\pi_{\mathrm{CBP}}(i) \le t, \pi_{\mathrm{CBP}}(j) > t)$. If they are in the same block, we denote that particular block by $B_1$. If they are in different blocks, the two blocks must be adjacent. Among the two blocks, we denote the left block by $B_1$ (to make this argument consistent, the first block of the sequence and the last block of the sequence are considered to be adjacent, and the last block is considered to be on the left of the first block). We then let the adjacent block on the right of $B_1$ be $B_2$. For block $B_1$, we further divide it into two sub-regions, with $B_{1,l}$ denoting the first $b$ location(s) of the block and $B_{1,r}$ denoting the remaining $L - b$ location(s). We define $B_{2,l}$ and $B_{2,r}$ similarly for block $B_2$.

Then, there are four configurations for the placements of $i$ and $j$ for each of the two scenarios: (i) $i$ on the left of $j$ within $B_1 \cup B_2$, and (ii) $i$ on the right of $j$ within $B_1 \cup B_2$. The four configurations are listed in Tables 2 and 3 for these two scenarios, respectively. The tables also give the probability of having each configuration out of the $L$ different ways of doing the blocking (Prob. 1 in the tables) and the proportion of block permutations such that $(\pi_{\mathrm{CBP}}(i) \le t, \pi_{\mathrm{CBP}}(j) > t)$ given the configuration (Prob. 2 in the tables). For each of the two scenarios, summing over the product of the two probabilities (Prob. 1 and Prob. 2 in the tables) gives (4.2).

Table 2
Four configurations of the placement of $i$ and $j$ when $\delta_{ij} < L$, $b > 0$ and $i$ is on the left of $j$ within $B_1 \cup B_2$. For each configuration, "Prob. 1" is the probability of having the configuration out of the $L$ different ways of doing the blocking, and "Prob. 2" is the proportion of block permutations such that $(\pi_{\mathrm{CBP}}(i) \le t, \pi_{\mathrm{CBP}}(j) > t)$ given the configuration. (In this table, $\delta_{ij}$ is shortened to $\delta$ for brevity; "-" marks an empty cell.)

  B_{1,l}  B_{1,r}  B_{2,l}  B_{2,r}  Prob. 1                                    Prob. 2
  i        j        -        -        $(\min(b, L-\delta) - (b-\delta)_+)/L$     $1/m$
  i        -        j        -        $(b - (L-\delta))_+/L$                     $(a+1)(m-a-1)/(m(m-1))$
  -        i        j        -        $(\min(b, L-\delta) - (b-\delta)_+)/L$     $a(m-a-1)/(m(m-1))$
  -        i        -        j        $(\delta - b)_+/L$                         $a(m-a)/(m(m-1))$

Table 3
Four configurations of the placement of $i$ and $j$ when $\delta_{ij} < L$, $b > 0$ and $i$ is on the right of $j$ within $B_1 \cup B_2$. Other notations follow Table 2.

  B_{1,l}  B_{1,r}  B_{2,l}  B_{2,r}  Prob. 1                                    Prob. 2
  j        i        -        -        $(\min(b, L-\delta) - (b-\delta)_+)/L$     $0$
  j        -        i        -        $(b - (L-\delta))_+/L$                     $(a+1)(m-a-1)/(m(m-1))$
  -        j        i        -        $(\min(b, L-\delta) - (b-\delta)_+)/L$     $(a(m-a) + (m-a-1))/(m(m-1))$
  -        j        -        i        $(\delta - b)_+/L$                         $a(m-a)/(m(m-1))$

Since $\frac{1}{m} + \frac{a(m-a-1)}{m(m-1)} = \frac{a(m-a) + (m-a-1)}{m(m-1)}$, both summations give, for $\delta_{ij} < L$,

$$\frac{(\min(b, L-\delta_{ij}) - (b-\delta_{ij})_+)\,(a(m-a-1) + m - 1)}{n(m-1)} + \frac{(b-(L-\delta_{ij}))_+\,(a+1)(m-a-1)}{n(m-1)} + \frac{(\delta_{ij}-b)_+\, a(m-a)}{n(m-1)}.$$

Then, the theorem follows as the result for $\delta_{ij} \ge L$ is a special case of the above expression with $\delta_{ij}$ replaced by $L$.

4.2. $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$

For the variance, we need to figure out

$$E_{\mathrm{CBP}}(R_G^2(t)) = \sum_{(i,j),(u,v) \in G} P\big(g_{\pi_{\mathrm{CBP}}(i)}(t) \neq g_{\pi_{\mathrm{CBP}}(j)}(t),\ g_{\pi_{\mathrm{CBP}}(u)}(t) \neq g_{\pi_{\mathrm{CBP}}(v)}(t)\big).$$

Then, $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ follows as $E_{\mathrm{CBP}}(R_G^2(t)) - (E_{\mathrm{CBP}}(R_G(t)))^2$, with the analytic expression for $E_{\mathrm{CBP}}(R_G(t))$ provided in Section 4.1.

When $i, j, u, v$ are all different, we need to consider $\binom{4}{2} = 6$ index-pairs and whether they could be within a block or not. This is much more complicated than the calculation of $E_{\mathrm{CBP}}(R_G(t))$, where only one index-pair is considered. In the derivation of $E_{\mathrm{CBP}}(R_G(t))$, the case where $t$ is divisible by $L$ ($b = 0$ in the proof) is much easier. Therefore, we work out the exact analytic expression for $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ for $t$ divisible by $L$, which is already very complicated, and extrapolate for other $t$'s. We then compare the result obtained in this way with doing circular block permutation directly.

The following theorem gives the exact analytic expression for $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ when $t = aL$, $a = 0, \dots, m$.

Theorem 4.3. For $t = aL$, $a = 1, \dots, m$, we have

$$\mathrm{Var}_{\mathrm{CBP}}(R_G(t)) = c_1 p_1(a) + c_2 p_2(a) + c_3 p_3(a) - c_0^2\, p_1^2(a),$$

where

$$p_1(a) = \frac{2a(m-a)}{m(m-1)}, \qquad p_2(a) = \frac{1}{2}\, p_1(a), \qquad p_3(a) = \frac{4a(a-1)(m-a)(m-a-1)}{m(m-1)(m-2)(m-3)},$$

$$c_0 = \frac{1}{L} \sum_{k=1}^{L} k\, |\mathcal{E}_k|,$$

$$c_1 = \frac{1}{L} \sum_{k=1}^{L} k\, |\mathcal{E}_k| + \frac{1}{L} \sum_{(i,j),(i,u) \in G;\ j \neq u} \big\{ (L - \delta_{ju})\, I_{\{h(i,j,u) < \cdots,\ \delta_{ju} \cdots\}} \cdots \big\} \cdots$$

[…]

Remark 4.4. For $c_0$, a high level explanation is that it is the average number of edges whose end nodes appear in different blocks over all $L$ possible ways of blocking. (Based on the recipe of the circular block permutation, the blocks resulting from a random starting point at $y_t$ are the same as the blocks resulting from a random starting point at $y_{t+L}$. Hence, there are only $L$ different ways of blocking.) Let $\omega$ represent one such blocking and $\Omega$ be the set of all $L$ blockings; then $c_0 = \frac{1}{L} \sum_{\omega \in \Omega} c_0(\omega)$. The other three coefficients, $c_1$, $c_2$, and $c_3$, involve two edges. The two edges do not need to be distinct, i.e., they could degenerate to the same edge, or the two edges could share a node. Then, under a certain blocking $\omega$, $c_1(\omega)$ is the number of pairs of edges whose end nodes appear in only two distinct blocks with both edges having their end nodes in different blocks, $c_2(\omega)$ is the number of pairs of edges whose end nodes appear in three distinct blocks with both edges having their end nodes in different blocks, and $c_3(\omega)$ is the number of pairs of edges whose end nodes appear in four distinct blocks. It is not hard to see that $c_1(\omega) + c_2(\omega) + c_3(\omega) = c_0^2(\omega)$. Then,

$$c_1 + c_2 + c_3 = \frac{1}{L} \sum_{\omega \in \Omega} c_0^2(\omega) \ge \left( \frac{1}{L} \sum_{\omega \in \Omega} c_0(\omega) \right)^2 = c_0^2.$$

When $L = 1$, the equality always holds.
Indeed, when $L = 1$, the coefficients can be simplified to $c_0 = c_1 = |G|$, $c_2 = \sum_{i=1}^{n} |G_i|^2 - 2|G|$ and $c_3 = |G|^2 - \sum_{i=1}^{n} |G_i|^2 + |G|$, where $|G|$ is the number of edges in the graph $G$, and $G_i$ is the subgraph of $G$ consisting of the edges that connect to node $y_i$, so $|G_i|$ is the degree of node $y_i$. It is clear that $c_1 + c_2 + c_3 = |G|^2 = c_0^2$. However, for $L > 1$, the equality usually does not hold except in the very special case that $c_0(\omega)$ is the same for all $\omega \in \Omega$.

In Theorem 4.3, we provide the exact analytic expression for $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ when $t$ is divisible by $L$. Unless $L = 1$, in which case the coefficients $c_1$, $c_2$ and $c_3$ can be greatly simplified, the expressions for these coefficients are in general fairly complicated for $L > 1$. It can be imagined that the exact analytic expression for $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ when $t$ is not divisible by $L$ would be much more complicated. In addition, the computation time for these coefficients when $L >$ […] $G$, the analytic expression could not be further simplified, as two edges are involved in computing the variance and we need to consider all the possible combinations of whether the $\binom{4}{2} = 6$ pairwise index differences are smaller than $L$ or not when the four end points of the two edges are distinct. In a typical run for a 1,000-length sequence with local dependence, the computation time for getting all the coefficients with $L = 5$ is about 8 seconds on a 12-inch MacBook (2015), which is acceptable; the not-derived analytic expressions for the variance under CBP at $t = aL + b$, $0 < b < L$, would be much more complicated (the difference in complexity can be inferred from the exact analytic expression of $E_{\mathrm{CBP}}(R_G(t))$ in Section 4.1 under $b = 0$ and $b > 0$) and would require much more time to compute. Combining all the factors, we propose to fill in $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ at $t$ not divisible by $L$ by extrapolating from the values at $t = aL$, $a = 0, 1, \dots, m$.

Fig 4. Standard deviation of $R_G(t)$ under circular block permutation from the analytic expression given in Theorem 4.3 (the values at $t = aL + b$, $0 < b < L$, are extrapolated) and from doing circular block permutation directly. The left panels plot the whole line and the right panels plot the middle part of the sequence. In each plot, the blue line is based on the analytic formula and the black line is based on directly doing circular block permutation, with 900 CBPs for the top panels and 90,000 CBPs for the bottom panels.

Figure 4 shows the standard deviation of $R_G(t)$ under circular block permutation ($\mathrm{SD}_{\mathrm{CBP}}(R_G(t))$) for a 1000-length sequence with local dependence. The blue line in each plot is based on the analytic formula provided in Theorem 4.3, with the values at $t = aL + b$, $0 < b < L$, filled in by extrapolation, and the black line in each plot is based on 900 CBPs (top panels) and 90,000 CBPs (bottom panels). Here, 900 CBPs were chosen because they take a similar amount of time as computing the standard deviation based on Theorem 4.3 and extrapolation for this sequence. We can see clearly from the top panels that 900 CBPs are not enough, as the results fluctuate widely. It is important to have a good estimate of $\mathrm{SD}_{\mathrm{CBP}}(R_G(t))$ as it standardizes the raw statistic $R_G(t)$, and a bad estimate could lead to a bad estimate of the change-point location. When we increase the number of CBPs to 90,000 (taking 100 times as long as the analytic results), the values based on CBPs directly are much better (bottom left panel). However, if we zoom into the middle part of the sequence, we can still see the black line wiggling around, which could cause an inaccurate estimate of the change-point location.
This toy example shows that we would need even more CBPs to get an estimate that is as good as the one from the analytic formula with extrapolation.

Therefore, we recommend using the analytic formula given in Theorem 4.3 to get exact values at $t = aL$, $a = 0, 1, \dots, m$, and using extrapolation to get values at $t = aL + b$, $0 < b < L$. This approach gives an accurate enough estimate of $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ with a reasonably fast computing time. In the following, $Z_{G,\mathrm{CBP}}(t)$ is defined with $\mathrm{Var}_{\mathrm{CBP}}(R_G(t))$ computed in this recommended way.

5. Analytic $p$-value approximations

Now, we have a relatively fast analytic way to compute the standardized statistic $Z_{G,\mathrm{CBP}}(t)$. The next question is how large the scan statistic

$$\max_{n_0 \le t \le n_1} Z_{G,\mathrm{CBP}}(t)$$

needs to be to constitute sufficient evidence against the null hypothesis of homogeneity, i.e., we are concerned with the tail probability of the scan statistic under $H_0$:

$$P\left( \max_{n_0 \le t \le n_1} Z_{G,\mathrm{CBP}}(t) > b \right). \tag{5.1}$$

To obtain this tail probability, we can directly perform the circular block permutation, which would be time consuming if a reasonably accurate estimate is needed. Therefore, we seek to derive an analytic expression for this tail probability. In the rest of this section, we first study the asymptotic distribution of the process $\{Z_{G,\mathrm{CBP}}(t)\}$. We then derive an approximate analytic expression for the tail probability for the limiting process and refine the approximation to work for finite $n$.

Here, we derive the limiting distribution of $\{Z_{G,\mathrm{CBP}}([mw]L) : \epsilon \le w \le 1 - \epsilon\}$ for any $0 < \epsilon < 0.5$. Let the two end nodes of an edge $e$ be $e_-$ and $e_+$ with $e_- < e_+$.
For node i and edge e = ( e − , e + ), let A e,L, = { e ∗ : min( δ ( e ∗− , e − ) , δ ( e ∗− , e + ) , δ ( e ∗ + , e − ) , δ ( e ∗ + , e + )) < L } ,A e,L, = A e,L, ∪ (cid:91) { e (cid:48) : e (cid:48) ∈ G e ∗− ∪ G e ∗ + , ∀ e ∗ ∈ A e,L, } A e (cid:48) ,L, ,A e,L, = A e,L, ∪ (cid:91) { e (cid:48) : e (cid:48) ∈ G e ∗− ∪ G e ∗ + , ∀ e ∗ ∈ A e,L, } A e (cid:48) ,L, ,A i,L, = { e ∗ : min( δ ( e ∗− , i ) , δ ( e ∗ + , i )) < L } ,A i,L, = A i,L, ∪ (cid:91) { e (cid:48) : e (cid:48) ∈ G e ∗− ∪ G e ∗ + , ∀ e ∗ ∈ A e,L, } A e (cid:48) ,L, ,A i,L, = A i,L, ∪ (cid:91) { e (cid:48) : e (cid:48) ∈ G e ∗− ∪ G e ∗ + , ∀ e ∗ ∈ A e,L, } A e (cid:48) ,L, . Here, A e,L, is the set of edges whose end nodes could be within the same blockwith any of the end nodes in e under some circular block permutations. This canbe viewed as the neighbors of e . Then, A e,L, is the set of edges whose end nodescould be within the same block with any of the end nodes of the edges in A e,L, ,so A e,L, can be viewed as the set containing neighbors of A e,L, . Similarly, A e,L, can be viewed as the set containing neighbors of A e,L, . The other threesets, A i,L, , A i,L, and A i,L, , are defined similarly but initiated from a node i . ao Chen/Change-point detection for data with local dependency For a certain block i , let D i be the number of edges in G that connect anode in block i to another node not in block i under the blocking that block i exists. Let E Ω be the expectation that places probability L on each ω ∈ Ω withΩ defined in Remark 4.4. In the following, we write a n = O ( b n ) when a n has thesame order as b n , and write a n = o ( b n ) when a n has order smaller than b n . Thelimiting distribution of the stochastic process needs the following conditions. Condition 1. | G | = O ( n α ) , ≤ α < ; (cid:80) e ∈ G | A e,L, || A e,L, | = o ( n | G | ) ; (cid:80) ni =1 | A i,L, || A i,L, | = o ( n . ) . Condition 2. 
E Ω ( (cid:80) mi =1 D i ) − m ( E Ω ( (cid:80) mi =1 D i )) = O ( E Ω ( (cid:80) mi =1 D i )) . Condition 3. E Ω ( (cid:80) mi =1 D i ) = O ( | G | ) . Theorem 5.1. Under Conditions 1 and 2, or under Conditions 1 and 3, as n → ∞ , ∀ (cid:15) ∈ (0 , . , { Z CBP ([ mw ] L ) : (cid:15) < w < − (cid:15) } converges in finitedimensional distributions to a Gaussian process, which we denote as { Z (cid:63) CBP ( w ) : (cid:15) < w < − (cid:15) } . The complete proof of the theorem is in Appendix A.2. It utilizes a prooftechnique that is similar to that in Chen and Zhang (2015). Remark 5.2. Condition 1 under L = 1 are relaxed versions of the conditionsfor the limiting distribution under permutation null distribution in Chen andZhang (2015) . For Conditions 2 and 3, we only need one of them to hold.Condition 2 says that the graph shall not be flat: D i ’s shall not be of the sameorder across i ’s. However, when | G | = O ( n ) , such a flat graph would also beacceptable as Condition 3 holds. We now examine the asymptotic behavior of the tail probability (5.1). Our ap-proximation require the function ν ( x ) defined as ν ( x ) = 2 x − exp (cid:8) − (cid:80) ∞ i =1 1 i Φ (cid:0) − x √ i (cid:1)(cid:9) , x > ν ( x ) ≈ (2 /x )(Φ( x/ − . x/ x/ φ ( x/ . Based on Theorem 5.1, under Conditions 1 and 2 (or 3), following similararguments in Chen and Zhang (2015), we have as n → ∞ , for b = O ( √ n ), n , n = O ( n ), P ( max n ≤ t ≤ n Z CBP ( t ) > b ) ∼ bφ ( b ) (cid:88) n ≤ t ≤ n C ( t ) ν (cid:16)(cid:112) b C ( t ) (cid:17) , (5.2)where C ( t ) = L ∂ρ ( s,t ) ∂s (cid:12)(cid:12)(cid:12) s = t , with ρ ( s, t ) ≡ Cov CBP ( Z CBP ( s ) , Z CBP ( t )) . The fol-lowing theorem gives an analytic expression for Cov CBP ( R G ( t ) , R G ( t )) at t and t divisible by L . We do not consider | G | = O ( n α ) , < α < ao Chen/Change-point detection for data with local dependency Theorem 5.3. For t = a L < t = a L where a , a ∈ { , , . . . 
, m } , we have Cov CBP ( R G ( t ) , R G ( t ))= c q ( a , a ) + c q ( a , a ) + c q ( a , a ) − c p ( a ) p ( a ) , with c , c , c , c provided in Theorem 4.3, and q ( a , a ) = a ( m − a ) m ( m − ,q ( a , a ) = a ( m − a )( m − a +2 a − m ( m − m − ,q ( a , a ) = a ( m − a ) { ( a − m − a − a − a )( m − a − } m ( m − m − m − . The proof of this theorem is in Appendix A.3.Then, for t = aL , a ∈ { , . . . , m − } , we have C ( t ) = m ( m − h ( a,m ) c + h ( a,m ) c + h ( a,m ) c )2 La ( m − a )( h ( a,m )(2 c + c )+ h ( a,m ) c + h ( a,m ) c ) where h ( a, m ) = 2 m ( m − m − ,h ( a, m ) = ( m − m − a ) − m ) ,h ( a, m ) = − m − a ) + 4 m,h ( a, m ) = m ( m − m − m − h ( a, m ) = 4 m ( m − a − m − a − ,h ( a, m ) = − a ( m − a )( m − m − . Based on the above results, the p -value approximation is reasonably wellfor low to moderate dimension, but not that well when the dimension is high(details see in Section 5.3, Table 4). The reason is that when the dimension ishigh, the convergence of Z CBP ( t ) to the Gaussian distribution is low when t closes to the two ends and, for finite samples, Z CBP ( t ) could be quite skewed.Figure 5 plots the skewness of Z CBP ( t ) estimated from 100,000 random circularblock permutations for two data sequences generated based on model M2 underscenarios 2 and 3 provided in Section 5.3. We can see that the skewness is quitesevere when t is close to the two ends of the sequence and when the dimensionis high. Chen and Zhang (2015) discussed the same issue under the permutationframework, here, we adopt a similar treatment for the circular block permutationby adding an extra term S ( t ) to correct for the skewness. This term varies across t to address for the different extend of the skewness across t . 
\[
P\Big( \max_{n_0 \le t \le n_1} Z_{\mathrm{CBP}}(t) > b \Big) \approx b\,\phi(b) \sum_{n_0 \le t \le n_1} S(t)\, C(t)\, \nu\Big( \sqrt{2 b^2 C(t)} \Big), \tag{5.3}
\]
where
\[
S(t) = \frac{\exp\Big( \tfrac{1}{2} \big(b - \hat\theta_b(t)\big)^2 + \tfrac{1}{6}\, \gamma(t)\, \hat\theta_b(t)^3 \Big)}{\sqrt{1 + \gamma(t)\, \hat\theta_b(t)}},
\]
with $\gamma(t) = E_{\mathrm{CBP}}(Z_{\mathrm{CBP}}^3(t))$ and $\hat\theta_b(t) = \big({-1} + \sqrt{1 + 2\gamma(t) b}\big) / \gamma(t)$.

To get an exact analytic expression for $E_{\mathrm{CBP}}(Z_{\mathrm{CBP}}^3(t))$, one needs to enumerate all possible configurations of three edges and determine whether any of the six end nodes fall within the same block. Thus, even when computing only at $t = aL$, $a = 1, 2, \dots, m$, the analytic expression is very complicated and takes considerable time to run in R. It turns out that $\gamma(t)$ can be approximated reasonably well by $E_{\mathrm{P}}(Z_{\mathrm{P}}^3(t))$, which can be computed instantly in R from the exact analytic expression provided in Chen and Zhang (2015). Figure 5 plots $E_{\mathrm{P}}(Z_{\mathrm{P}}^3(t))$ (red line) on top of the estimated $E_{\mathrm{CBP}}(Z_{\mathrm{CBP}}^3(t))$ (black dots), and we can see that $E_{\mathrm{P}}(Z_{\mathrm{P}}^3(t))$ provides a good estimate of $E_{\mathrm{CBP}}(Z_{\mathrm{CBP}}^3(t))$. Hence, when we apply (5.3) to approximate the p-value, we use $E_{\mathrm{P}}(Z_{\mathrm{P}}^3(t))$ to estimate $\gamma(t)$ in computing $S(t)$.

Fig 5. Skewness $E_{\mathrm{CBP}}(Z_{\mathrm{CBP}}^3(t))$ estimated from 100,000 random circular block permutations (black dots), and $E_{\mathrm{P}}(Z_{\mathrm{P}}^3(t))$ computed from its exact analytic expression (red line). Panels: $d = 100$ and $d = 1{,}000$.

p-value approximations

Here, we check how accurate the analytic formulas provided in Section 5.2 are in approximating the p-values. We compare the analytic p-value approximations obtained through the asymptotic results (5.2) (denoted by "A1") and after skewness correction (5.3) (denoted by "A2") with the p-value estimated from 100,000 random circular block permutations (denoted by "CBP").
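As an illustration of how the approximations (5.2) and (5.3) could be evaluated numerically, here is a minimal sketch. It assumes that the values $C(t)$ and the skewness estimates $\gamma(t)$ over $n_0 \le t \le n_1$ have already been computed and are passed in as plain sequences, and it uses the closed-form approximation to $\nu(x)$ quoted earlier in this section. The function names are hypothetical, and the case $1 + 2\gamma(t) b \le 0$, where $\hat\theta_b(t)$ is not defined, is simply rejected rather than handled.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def nu(x):
    """Closed-form approximation to nu(x) for x > 0."""
    h = x / 2.0
    return (2.0 / x) * (std_normal_cdf(h) - 0.5) / (h * std_normal_cdf(h) + std_normal_pdf(h))


def skewness_factor(b, gamma, eps=1e-10):
    """Correction term S(t) for threshold b and skewness gamma = gamma(t)."""
    if abs(gamma) < eps:
        return 1.0                               # limit of S(t) as gamma -> 0
    disc = 1.0 + 2.0 * gamma * b
    if disc <= 0.0:
        raise ValueError("theta_hat is undefined when 1 + 2*gamma*b <= 0")
    theta = (-1.0 + math.sqrt(disc)) / gamma
    log_num = 0.5 * (b - theta) ** 2 + gamma * theta ** 3 / 6.0
    return math.exp(log_num) / math.sqrt(1.0 + gamma * theta)


def tail_probability(b, C, gamma=None):
    """Approximate P(max Z(t) > b); gamma=None gives the uncorrected version."""
    total = 0.0
    for k, c in enumerate(C):
        s = 1.0 if gamma is None else skewness_factor(b, gamma[k])
        total += s * c * nu(math.sqrt(2.0 * b * b * c))
    return b * std_normal_pdf(b) * total
```

Calling `tail_probability(b, C)` corresponds to "A1" and `tail_probability(b, C, gamma)` to "A2"; when the skewness is negative the factor is below one, so the corrected tail probability is smaller than the uncorrected one.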
We generate data from autoregressive and/or moving average models and consider the following three scenarios:

Scenario 1: $d = 10$, noises generated from the Gaussian distribution.
Scenario 2: $d = 100$, noises generated from a $t$ distribution.
Scenario 3: $d = 1{,}000$ [...]

[...] arima.sim to first generate $d$ independent time series of length $n = 1{,}000$, giving $z_1, z_2, \dots, z_n \in \mathbb{R}^d$. Then, let $y_t = \Sigma^{1/2} z_t$, where the $(i, j)$ element of $\Sigma$ is $0.[\dots]^{|i-j|}$. The methods are applied to the data sequence $\{y_1, y_2, \dots, y_n\}$. We consider five autoregressive and/or moving average models in generating the $z_t$'s.

• M1: AR(1) with parameter 0.1.
• M2: AR(2) with parameters 0.1 and 0.05.
• M3: MA(1) with parameter 0.1.
• M4: MA(2) with parameters 0.1 and 0.05.
• M5: ARMA(1,1) with parameters 0.1 and 0.1.

Table 4. Critical values at the 0.05 significance level based on the asymptotic results ("A1"), the skewness corrected version ("A2"), and 100,000 circular block permutations ("CBP").
                 Scenario 1 (d = 10)     Scenario 2 (d = 100)    Scenario 3 (d = 1000)
Model   L        A1     A2     CBP       A1     A2     CBP       A1     A2     CBP
M1      2        3.05   2.94   2.92      2.99   2.72   2.71      2.95   2.38   2.44
        5        3.05   2.94   2.92      2.99   2.72   2.71      2.95   2.38   2.40
        10       3.05   2.94   2.92      2.99   2.72   2.71      2.95   2.37   2.38
        20       3.04   2.94   2.94      2.99   2.72   2.71      2.95   2.37   2.36
M2      2        3.05   2.94   2.92      3.00   2.77   2.73      2.96   2.47   2.52
        5        3.05   2.94   2.91      3.00   2.77   2.74      2.96   2.47   2.51
        10       3.05   2.94   2.91      3.00   2.77   2.72      2.96   2.47   2.51
        20       3.05   2.95   2.88      3.00   2.77   2.69      2.96   2.46   2.50
M3      2        3.05   2.95   2.94      3.00   2.73   2.74      2.96   2.40   2.48
        5        3.05   2.95   2.95      3.01   2.74   2.76      2.96   2.40   2.48
        10       3.06   2.96   2.96      3.01   2.74   2.75      2.96   2.40   2.48
        20       3.06   2.96   2.98      3.01   2.74   2.77      2.96   2.40   2.47
M4      2        3.05   2.94   2.91      3.00   2.71   2.72      2.97   2.52   2.55
        5        3.05   2.94   2.90      2.99   2.71   2.69      2.97   2.52   2.55
        10       3.05   2.94   2.91      3.00   2.71   2.72      2.97   2.53   2.57
        20       3.05   2.94   2.89      3.00   2.71   2.72      2.98   2.53   2.59
M5      2        3.05   2.96   2.93      3.01   2.78   2.76      2.98   2.59   2.60
        5        3.05   2.96   2.93      3.01   2.78   2.77      2.98   2.59   2.58
        10       3.05   2.96   2.93      3.01   2.78   2.78      2.97   2.58   2.56
        20       3.06   2.96   2.95      3.01   2.78   2.80      2.97   2.58   2.56

We checked several block sizes: $L = 2, 5, 10, 20$. The critical values based on "A1", "A2", and "CBP" are provided in Table 4. Here, results under "CBP" can be viewed as good estimates of the true critical values, even though they are time-consuming to obtain. We compare results under "A1" and "A2" with those under "CBP". Under scenario 1, where the dimension is low, both analytic p-value approximations work reasonably well, with those under "A2" very close to the corresponding ones under "CBP". Under scenarios 2 and 3, where the dimension is higher, the analytic p-value approximation based only on the asymptotic results performs poorly, while the analytic p-value approximation after skewness correction is still close to the values obtained through 100,000 circular block permutations.
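Table 4 reports critical values rather than p-values: each entry is the threshold $b$ at which the corresponding tail probability equals 0.05. Given any approximation of the tail probability as a function of $b$, such a threshold can be found by numerical inversion. The sketch below uses plain bisection and assumes that the supplied function is decreasing in $b$ on the search bracket; this is an illustrative choice, not necessarily how the values in Table 4 were obtained, and the function name is hypothetical.

```python
import math


def critical_value(tail_prob, alpha=0.05, lo=1.0, hi=10.0, tol=1e-8):
    """Find b with tail_prob(b) = alpha by bisection.

    Assumes tail_prob is decreasing on [lo, hi] and brackets alpha there.
    """
    if not (tail_prob(lo) > alpha > tail_prob(hi)):
        raise ValueError("alpha is not bracketed by tail_prob(lo) and tail_prob(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail_prob(mid) > alpha:
            lo = mid      # tail probability still too large: move threshold up
        else:
            hi = mid      # tail probability at or below alpha: move threshold down
    return 0.5 * (lo + hi)


def gaussian_upper_tail(b):
    """Stand-in tail function used only to exercise the routine."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))
```

In practice `tail_prob` would be replaced by an implementation of (5.2) or (5.3) for fixed $C(t)$ and $\gamma(t)$; the Gaussian tail here is only a stand-in for checking the inversion.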
The same pattern holds across different models and different choices of $L$. Based on these simulation studies, we see that the analytic p-value approximation after skewness correction (5.3) provides reasonably accurate estimates of the p-value under the circular block permutation framework.

6. Discussion

In this section, we discuss a data-driven way of choosing $L$, and the extension of the proposed framework to accommodate multiple change-points.

Choice of L

One practical question in applying this circular block permutation framework is the choice of $L$. Here, we provide a data-driven way to choose $L$. Figure 6 plots $Z_{G,\mathrm{CBP}}(t)$ under different choices of $L$.

In the top panel, the observations are independent: $y_1, \dots, y_n \overset{\text{iid}}{\sim} N(0, \Sigma)$, and the $(i, j)$th element of $\Sigma$ is $0.[\dots]^{|i-j|}$. The dimension of the data is 100 and the length of the sequence is 1,000. We see that the scan statistics $Z_{G,\mathrm{CBP}}(t)$ under $L = 1$ (permutation), $L = 2$, and $L = 5$ are almost the same. On the other hand, in the middle and bottom panels, the data sequence is from a multivariate autoregressive model: $y_t = \rho\, y_{t-1} + \varepsilon_t$, $t = 1, \dots, n$, with $y_0 \sim N\big(0, \tfrac{1}{1-\rho^2} \Sigma\big)$ and $\varepsilon_1, \dots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \Sigma)$. Here, $\rho = 0.[\dots]$ [...] $\rho = 0.[\dots]$ [...] $L = 2$ no longer overlaps with that under permutation ($L = 1$). The stronger the autocorrelation is, the larger the discrepancy between the two curves under $L = 1$ and $L = 2$. Secondly, as $L$ increases, the curves become more similar. For example, in the middle panel, the two curves under $L = 5$ and $L = 6$ almost overlap with each other; in the bottom panel, the curve under $L = 6$ is also close to that under $L = 5$, showing that $L$ in this range is close to large enough to take care of the local dependence in the sequence.
Hence, we could choose $L$ to be the value beyond which the scan statistic no longer changes, or changes by a negligible amount.

To be more specific, we plot the maximum scan statistic $\max_{n_0 \le t \le n_1} Z_{G,\mathrm{CBP}}(t)$ over $L$ for the three data sequences; the results are shown in Figure 7 (left panels). We see that the maximum scan statistic scatters around under the independent case (top panel), and that it decreases as $L$ goes from 1 to 10 under the autocorrelated cases (middle and bottom panels). In the right panels, the ratio of the maximum scan statistic under $L + 1$ over that under $L$ is plotted. In each ratio plot, a horizontal line at 0.99 is added, which appears to be a reasonable threshold to use in choosing $L$. We could set $L$ to be the smallest value such that the ratio goes above 0.99. Under this criterion, we could set $L = 2$, $L = 6$, and $L = 8$ for the three data sequences, respectively.

We focused on the single change-point alternative so far. The proposed framework could be extended to the changed-interval alternative with some modifications: for any pair of times $t_1 < t_2$, we could construct the test statistic $Z_{G,\mathrm{CBP}}(t_1, t_2)$ to test $\{y_{t_1}, \dots, y_{t_2 - 1}\}$ against $\{y_{t_2}, \dots, y_n, y_1, \dots, y_{t_1 - 1}\}$ in a similar way as $Z_{G,\mathrm{CBP}}(t)$, which tests $\{y_1, \dots, y_t\}$ against $\{y_{t+1}, \dots, y_n\}$. The scan statistic for the changed-interval alternative can then be defined as max ≤ t [...]

Fig 6. Plots of the scan statistic $Z_{G,\mathrm{CBP}}(t)$ under different choices of $L$. Data generated from the multivariate autoregressive model $y_t = \rho\, y_{t-1} + \varepsilon_t$, $t = 1, \dots, n$, with $y_0 \sim N\big(0, \tfrac{1}{1-\rho^2} \Sigma\big)$, $\varepsilon_1, \dots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \Sigma)$, $d = 100$, $n = 1{,}000$. The $(i, j)$th element of $\Sigma$ is $0.[\dots]^{|i-j|}$. Top panel: $\rho = 0$; middle panel: $\rho = 0.[\dots]$; bottom panel: $\rho = 0.[\dots]$.

Fig 7. Plots of the maximum scan statistic $\max_{n_0 \le t \le n_1} Z_{G,\mathrm{CBP}}(t)$ under different choices of $L$, based on the same data sets as in Figure 6. Top panel: $\rho = 0$; middle panel: $\rho = 0.[\dots]$; bottom panel: $\rho = 0.[\dots]$. In the right panels, the ratio of $\max_{n_0 \le t \le n_1} Z_{G,\mathrm{CBP}}(t)$ under $L + 1$ over that under $L$ is plotted. The horizontal line is at 0.99.

Acknowledgments

Hao Chen is supported in part by NSF award DMS-1513653.

References

Aue, A., Hörmann, S., Horváth, L., Reimherr, M. et al. (2009). Break detection in the covariance structure of multivariate time series models. The Annals of Statistics.
Bauwens, L., Laurent, S. and Rombouts, J. V. (2006). Multivariate GARCH models: a survey. Journal of Applied Econometrics.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics.
Carlstein, E. G., Müller, H. G. and Siegmund, D. (1994). Change-point problems. Institute of Mathematical Statistics.
Chen, L. H. Y. and Shao, Q. M. (2005). Stein's method for normal approximation. An Introduction to Stein's Method.
Chen, H. and Zhang, N. R. (2013). Graph-based tests for two-sample comparisons of categorical data. Statistica Sinica.
Chen, H. and Zhang, N. (2015). Graph-based change-point detection. The Annals of Statistics.
Csörgö, M. and Horváth, L. (1997). Limit theorems in change-point analysis. John Wiley & Sons Inc.
Eagle, N., Pentland, A. S. and Lazer, D. (2009). Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society.
Friedman, J. H. and Rafsky, L. C. (1979). Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics.
Girón, J., Ginebra, J. and Riba, A. (2005). Bayesian analysis of a multinomial sequence and homogeneity of literary style. The American Statistician.
Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics.
Kirch, C. (2006). Resampling methods for the change analysis of dependent data. PhD thesis, University of Cologne.
Kossinets, G. and Watts, D. J. (2006). Empirical analysis of an evolving social network. Science.
Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics.
Liu, R. Y. and Singh, K. (1992). Moving blocks bootstrap and jackknife capture weak dependence. Exploring the Limits of Bootstrap.
Olshen, A. B., Venkatraman, E., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics.
Politis, D. N. and Romano, J. P. (1992). A circular block-resampling procedure for stationary data. Exploring the Limits of Bootstrap.
Rosenbaum, P. R. (2005). An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Siegmund, D. and Yakir, B. (2007). The statistics of gene mapping. Springer Science & Business Media.
Silvennoinen, A. and Teräsvirta, T. (2009). Multivariate GARCH models. Handbook of Financial Time Series (T. G. Andersen, R. A. Davis, J.-P. Kreiss and T. Mikosch, eds.).
Tsirigos, A. and Rigoutsos, I. (2005). A new computational method for the detection of horizontal gene transfer events. Nucleic Acids Research.
Vostrikova, L. J. (1981). Detecting disorder in multidimensional random processes. In Soviet Mathematics Doklady.

Appendix A: Proofs for theorems

A.1.
Proof of Theorem 4.3

We have
\[
\begin{aligned}
E_{\mathrm{CBP}}(R_G^2(t))
&= \sum_{(i,j),(u,v) \in G} P\big( g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(j)}(t),\ g_{\pi_{\mathrm{CBP}}(u)}(t) \ne g_{\pi_{\mathrm{CBP}}(v)}(t) \big) \\
&= \sum_{(i,j) \in G} P\big( g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(j)}(t) \big) \\
&\quad + \sum_{(i,j),(i,u) \in G;\ j \ne u} P\big( g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(j)}(t),\ g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(u)}(t) \big) \\
&\quad + \sum_{\substack{(i,j),(u,v) \in G \\ i, j, u, v \text{ all different}}} P\big( g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(j)}(t),\ g_{\pi_{\mathrm{CBP}}(u)}(t) \ne g_{\pi_{\mathrm{CBP}}(v)}(t) \big).
\end{aligned}
\]
The first part of the summation is $E_{\mathrm{CBP}}(R_G(t))$. In the following, we work out the second and third parts of the summation.

A.1.1. $P\big( g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(j)}(t),\ g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(u)}(t) \big)$, $j \ne u$

When $\delta_{ij}, \delta_{iu}, \delta_{ju} \ge L$, the indices $i, j, u$ are all in different blocks, and the probability is $p_2(a)$. When at least one of $\delta_{ij}, \delta_{iu}, \delta_{ju}$ is less than $L$, we need to consider scenarios in which some of the indices could be in the same block. In the following, we consider when the event $\{ g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(j)}(t),\ g_{\pi_{\mathrm{CBP}}(i)}(t) \ne g_{\pi_{\mathrm{CBP}}(u)}(t) \}$ could happen.

When $\delta_{ij} < L$ and $\delta_{iu}, \delta_{ju} \ge L$: since $\pi_{\mathrm{CBP}}(i)$ and $\pi_{\mathrm{CBP}}(j)$ need to be in different blocks, $i$ and $j$ need to be in different blocks, and the probability is
\[
\frac{\delta_{ij}}{L}\, p_2(a).
\]
Similarly, when $\delta_{iu} < L$ and $\delta_{ij}, \delta_{ju} \ge L$, the probability is
\[
\frac{\delta_{iu}}{L}\, p_2(a).
\]
When $\delta_{ju} < L$ and $\delta_{ij}, \delta_{iu} \ge L$: since $\pi_{\mathrm{CBP}}(j)$ and $\pi_{\mathrm{CBP}}(u)$ can be in the same block, the probability is
\[
\frac{L - \delta_{ju}}{L}\, p_1(a) + \frac{\delta_{ju}}{L}\, p_2(a).
\]
When $\delta_{ij}, \delta_{iu} < L$ and $\delta_{ju} \ge L$, we have $\delta_{ju} = \delta_{ij} + \delta_{iu}$, so $j$ and $u$ are always in different blocks. For the probability to be positive, $i$ needs to be in a block different from both $j$'s block and $u$'s block. So the probability is
\[
\frac{\delta_{ju} - L}{L}\, p_2(a).
\]
When δ ij , δ ju < L , δ iu ≥ L , then δ iu = δ ij + δ ju . i and j needs to be indifferent blocks, so the probability is δ ij − ( δ iu − L ) L p ( a ) + δ iu − LL p ( a ) = L − δ ju L p ( a ) + δ iu − LL p ( a ) . Similarly for δ iu , δ ju < L .When δ ij , δ iu , δ ju < L , i needs to be in different block from j and u . So theprobability is min( δ ij ,δ iu ) L I (max( δ ij , δ iu , δ ju ) (cid:54) = δ ju ) p ( a ) . ao Chen/Change-point detection for data with local dependency A.1.2. P ( g π CBP ( i ) ( t ) (cid:54) = g π CBP ( j ) ( t ) , g π CBP ( u ) ( t ) (cid:54) = π π CBP ( v ) ( t )) , i, j, u, v alldifferent When i, j, u, v are all different, there are (cid:0) (cid:1) = 6 index-pairs among them. Ifall pairwise index distances are greater than or equal to L , this probability is p ( a ). In the following, we consider scenarios that at least one of the six indexdistances is less than L .1) One distance < L .If δ ij < L and all other pairwise distances are ≥ L , then this probability is δ ij L p ( a ) . Similarly, if only δ uv < L , the probability is δ uv L p ( a ) . If only δ iu < L , the probability is L − δ iu L p ( a ) + δ iu L p ( a ) . Similar for only δ iv < L , or δ ju < L , or δ jv < L .2) Two distances < L .If only δ ij , δ uv < L , then the probability is (min( δ ij ,b ij,uv + δ uv ) − b ij,uv ) + +(min( δ ij ,b ij,uv + δ uv − l )) + L p ( a ) . If only δ ij , δ iu < L , we must have δ ui + δij = δ uj , the probability is L − δ ui L p ( a ) + δ ij + δ iu − LL p ( a ) . Similar for other 7 similar cases.If only δ iu , δ iv < L , then we have δ uv = δ ui + δ iv , the probability is L − δ uv L p ( a ) + δ uv − LL p ( a ) . Similar for other 3 similar cases.If only δ iu , δ jv < L , the probability is l − δ iu − δ jv + x ( iu,jv ) L p ( a ) + δ iu + δ jv − x ( iu,jv ) L p ( a ) + x ( iu,jv ) L p ( a ) . 
Similar for the case that only δ iv , δ ju < L .3) Three distances < L .If only δ ij , δ uv , δ iu < L , then the order of the four indices must be ( j, i, u, v )or the reverse, and δ jv = δ ji + δ iu + δ uv . The probability is L − δ iu L p ( a ) + ( δ jv − L ) + L p ( a ) . Similar for other 3 similar cases: replace δ iu by one of ( δ iv , δ ju , δ jv ). ao Chen/Change-point detection for data with local dependency If only δ ij , δ iu , δ jv < L , then the order of the four indices must be ( u, i, j, v )or the reverse, and δ uv = δ ui + δ ij + δ jv . The probability is L − δ uv +( δ uv − L ) + L p ( a ) + δ uv + δ ij − L − δ uv − L ) + L p ( a ) + ( δ uv − L ) + L p ( a ) . Similar for other 3 similar cases: only δ ij , δ iv , δ ju < L ; only δ uv , δ ui , δ vj < L ;only δ uv , δ uj , δ vi < L .If only δ iu , δ iv , δ ju < L , then the order of the four indices must be ( j, u, i, v )or the reverse, and δ jv = δ ju + δ ui + δ iv . The probability is L − δ jv +( δ jv − L ) + L p ( a ) + δ jv − L − δ jv − L ) + L p ( a ) + ( δ jv − L ) + L p ( a ) . Similar for other 3 similar cases: choose 3 out of ( δ iu , δ iv , δ ju , δ jv ).If only δ ij , δ iu , δ ju < L , the probability is δ ij L p ( a ) . Similar for other 3 similar cases: only δ ij , δ iv , δ jv < L ; only δ uv , δ iu , δ iv < L ;only δ uv , δ ju , δ jv < L .It’s not possible for 4 cases with only three distances smaller than l thatshare one index. For example, only δ ij , δ iu , δ iv < L .4) Four distances < L .For scenarios that 4 distances < L and 2 distances ≥ L , it is impossible for3 cases: only δ ij , δ uv ≥ L ; only δ iu , δ jv ≥ L ; only δ iv , δ ju ≥ L .If only δ ij , δ iu ≥ L , it can be ( i, v, u, j ) or ( i, v, j, u ) or their reverses. Forboth orders, the probability is L − δ iv L p ( a ) + δ iu − LL p ( a ) . Similar for 7 other similar cases.If only δ iu , δ iv ≥ L , it can be ( i, j, u, v ) or ( i, j, v, u ) or their reverses. Theprobability is δ uv L p ( a ) . 
Similar for 3 other similar cases.5) Five distances < L .If only δ ij ≥ L , the probability is δ uv L p ( a ) . Similar for if only δ uv ≥ L .If only δ iu ≥ L , it can be ( i, j, v, u ) or ( i, v, j, u ) or their reverses. The prob-ability is δ iu − LL p ( a ) + δ vj L I ( δ ij = δ iv + δ vj ) p ( a ) . Similar for other 3 similar cases.6) Six distances < L .If all 6 distances are < L . ao Chen/Change-point detection for data with local dependency If δ ij is the maximum distance, the probability is δ uv L p ( a ) . Similar for the case that δ uv is the maximum distance.If δ iu is the maximum distance, the probability is δ jv L I ( δ ij = δ iv + δ jv ) p ( a ) . Similar for 3 other similar cases.Summing all possible scenarios, we get the result stated in Lemma 4.3. A.2. Proof of Theorem 5.1 To prove { Z G, CBP ([ mw ] L ) : (cid:15) ≤ w ≤ − (cid:15) } converges to a Gaussian process infinite dimensional distributions, we only need to show that( Z G, CBP ([ mw ] L ) , Z G, CBP ([ mw ] L ) , . . . , Z G, CBP ([ mw K ] L ))converges to multivariate Gaussian as n → ∞ for any (cid:15) ≤ w < w < · · · CBB to denote the probability, expectation, and variance, re-spectively.Let Z G, CBB ( t ) = − R G ( t ) − E CBB ( R G ( t )) √ Var CBB ( R G ( t )) ,X CBB ( t ) = n CBB ( t ) − ( t/L ) √ (1 − t/n ) t/L , where n CBB ( t ) = m (cid:88) i =1 I ˜ π ( i ) ≤ t/L . Then following the similar arguments for obtaining Theorems 4.1 and 4.3, wehave that, for t = aL, a ∈ { , . . . , m } , E CBB ( R G ( t )) = c p , CBB ( a ) , Var CBB ( R G ( t )) = c p , CBB ( a ) + c p , CBB ( a ) + c p , CBB ( a ) − c p , CBB ( a ):= ( σ CBB ( t )) , where p , CBB ( a ) = a ( m − a ) m , p , CBB ( a ) = a ( m − a ) m , p , CBB ( a ) = a ( m − a ) m . We next prove the following two lemmas. ao Chen/Change-point detection for data with local dependency Lemma A.1. Under Condition 1, as n → ∞ , ( Z G, CBB ( t ) , . . . , Z G, CBB ( t K ) , X CBB ( t ) , . . . 
, X CBB ( t K )) (A.1) converges to a multivariate Gaussian distribution under CBB and the covariancematrix of ( X CBB ( t ) , X CBB ( t ) , . . . , X CBB ( t K )) is positive definite. Lemma A.2. Under Condition 2 or Condition 3, we have for k = 1 , . . . , K ,1. Var CBB ( R G ( t k )) Var CBP ( R G ( t k )) → r ([ mw k ]) , with r ( a ) a constant only depends on a .2. E CBB ( R G ( t k )) − E CBP ( R G ( t k )) √ Var CBB ( R G ( t k )) → . From Lemma A.1, ( Z G, CBB ( t ) , Z G, CBB ( t ) , . . . , Z G, CBB ( t K )) conditioning on( X CBB ( t ) , X CBB ( t ) , . . . , X CBB ( t K )) converges to multivariate normal underCBB. Since ( Z G, CBB ( t ) , Z G, CBB ( t ) , . . . , Z G, CBB ( t K )) | X CBB ( t ) = 0 , X CBB ( t ) =0 , . . . , X CBB ( t K ) = 0) under CBB has the same distribution as ( Z G, CBB ( t ) , Z G, CBB ( t ) , . . . , Z G, CBB ( t K ))under CBP, and notice that Z G, CBP ( t ) = Var CBB ( R G ( t )) Var CBP ( R G ( t )) (cid:18) Z G, CBB ( t ) − E CBB ( R G ( t )) − E CBP ( R G ( t )) √ Var CBB ( R G ( t )) (cid:19) . Given Lemma A.2, we conclude that ( Z G, CBP ( t ) , Z G, CBP ( t ) , . . . , Z G, CBP ( t K ))converges to a multivariate Gaussian distribution under CBP. Next, we provethe two lemmas. A.2.1. Proof for Lemma A.1 To show that (A.1) converges to a multivariate Gaussian distribution, we onlyneed to show that (cid:80) Kk =1 ( c k Z G, CBB ( t k ) + c k X CBB ( t k )) converges to a normaldistribution for any fixed { c k } and { c k } . If Var G, CBB ( (cid:80) Kk =1 ( c k Z G, CBB ( t k ) + c k X CBB ( t k )) = 0, (cid:80) Kk =1 ( c k Z G, CBB ( t k ) + c k X CBB ( t k ) is degenerating. Fornon-degenerating case, let σ = Var G, CBB ( K (cid:88) k =1 ( c k Z G, CBB ( t k ) + c k X CBB ( t k )) . We prove the Gaussianity of (cid:80) Kk =1 ( c k Z G, CBB ( t k ) + c k X CBB ( t k )) by theStein’s method. Consider sums of the form W = (cid:80) i ∈J ξ i , where J is an indexset and ξ are random variables with E ξ i = 0, and E ( W ) = 1. 
The followingassumption restricts the dependence between { ξ i : i ∈ J } . Assumption A.3. (Chen and Shao, 2005, p. 17) For each i ∈ J there exists K i ⊂ L i ⊂ J such that ξ i is independent of ξ K ci and ξ K i is independent of ξ L ci . We will use the following existing theorem in proving Theorem 5.1. ao Chen/Change-point detection for data with local dependency Theorem A.4. (Chen and Shao, 2005, Theorem 3.4) Under Assumption A.3,we have sup h ∈ Lip (1) | E ( h ( W )) − E ( h ( Z )) | ≤ δ where Lip (1) = { h : R → R ; (cid:107) h (cid:48) (cid:107) ≤ } , Z has N (0 , distribution and δ = 2 (cid:88) i ∈J ( E | ξ i η i θ i | + | E ( ξ i η i ) | E | θ i | ) + (cid:88) i ∈J E | ξ i η i | with η i = (cid:80) j ∈ K i ξ j and θ i = (cid:80) j ∈ L i ξ j , where K i and L i are defined in Assump-tion A.3. We adopt the same notations with the index set J = { e ∈ G } ∪ { , . . . , n } .Denote the end nodes of an edge e to be e − and e + . Let ξ e,k = I g ˜ π ( e − )( tk ) (cid:54) = g ˜ π ( e +)( tk ) − p , CBB ( t k ) σ CBB ( t k ) . Since I g ˜ π ( e − ) ( t k ) (cid:54) = g ˜ π ( e +) ( t k ) ∈ { , } and p , CBB ( t k ) ∈ [0 , | ξ e,k | ≤ σ CBB ( t k ) . Let ξ i,k = I ˜ π ( i ) ≤ tk/L − w k √ mw k (1 − w k ) . Similarly, we have | ξ i,k | ≤ √ mw k (1 − w k ) . Let ξ e = (cid:80) Kk =1 c k ξ e,k /σ , ξ i = (cid:80) Kk =1 c k ξ i,k /σ , and W = (cid:80) j ∈J ξ j = (cid:80) Kk =1 ( c k Z G, CBB ( t k ) + c k X CBB ( t k )) /σ . Then E CBB ( W ) = 0 , E CBB ( W ) = 1.Let c M = max( (cid:80) Kk =1 | c k | , (cid:80) Kk =1 | c k | ), σ = σ × min k σ CBB ( t k ), σ = σ × min k (cid:112) mw k (1 − w k ), then | ξ e | ≤ c M σ , ∀ e ∈ G ; | ξ i | ≤ c M σ , ∀ i ∈ { , . . . , n } . 
Based on Remark 4.4, c + c + c ≥ c , we have, with a k = [ mw k ], Var CBB ( R G ( t k )) ≥ c ( p , CBB ( a k ) − p , CBB ( a k )) + c ( p , CBB ( a k ) − p , CBB ( a k ))= c a k ( m − a k ) m ( m − + ( c + c ) a k ( m − a k ) m ( m − (cid:16) − a k ( m − a k ) m ( m − (cid:17) ≥ c a k ( m − a k ) m ( m − − ( c + c ) a k ( m − a k ) m ( m − . It can be shown that c = o ( mc ) (detailed arguments seen in the proof forLemma A.2). So Var CBB ( R G ( t k )) is at least of order c , which is at least oforder | G | . It is clear that σ = O ( n ).For any A a set of edge(s), let V ( A ) to be the set that contains all nodesbeing connected by at least one edge in A . Notice that | V ( A ) | ≤ | A | as theworst scenario occurs when all edge(s) in A are disconnected. ao Chen/Change-point detection for data with local dependency With A e,L, and A e,L, defined in Section 5.1, for e ∈ G , let S e = A e,L, ∪ V ( A e,L, ) ,T e = A e,L, ∪ V ( A e,L, ) , Then S e and T e satisfy Assumption A.3.With A i,L, and A i,L, defined in Section 5.1, for i = 1 , . . . , n , let S i = A i,L, ∪ V ( A i,L, ) ,T i = A i,L, ∪ V ( A i,L, ) , Then S i and T i satisfy Assumption A.3.By Theorem A.4, we have (cid:80) h ∈ Lip (1) | E h ( W ) − E h ( Z ) | ≤ δ for Z ∼ N (0 , δ = 2 (cid:88) i ∈J ( E | ξ i η i θ i | + | E ( ξ i η i ) | E | θ i | ) + (cid:88) i ∈J E | ξ i η i | = 2 (cid:88) e ∈ G ( E | ξ e η e θ e | + | E ( ξ e η e ) | E | θ e | ) + (cid:88) e ∈ G E | ξ e η e | + 2 n (cid:88) i =1 ( E | ξ i η i θ i | + | E ( ξ i η i ) | E | θ i | ) + n (cid:88) i =1 E | ξ i η i |≤ c M σ (cid:16) | A e,L, | c M σ + | V ( A e,L, ) | c M σ (cid:17) (cid:16) | A e,L, | c M σ + | V ( A e,L, ) | c M σ (cid:17) + c M σ (cid:16) | A i,L, | c M σ + | V ( A i,L, ) | c M σ (cid:17) (cid:16) | A i,L, | c M σ + | V ( A i,L, ) | c M σ (cid:17) ≤ c M σ (cid:16) c M σ + c M σ (cid:17) | A e,L, || A e,L, | + c M σ (cid:16) c M σ + c M σ (cid:17) | A i,L, || A i,L, | When (cid:80) e ∈ G | A e,L, || A e,L, | = o ( n | G | . 
) and (cid:80) ni =1 | A i,L, || A i,L, | = o ( n . ), wehave δ → n → ∞ .Let Σ X be the covariance matrix of ( X CBB ( t ) , X CBB ( t ) , . . . , X CBB ( t K )).Follow similar arguments in Chen and Zhang (2015), | Σ X | = (cid:81) Kk =1 (1 − t k /t k +1 ) (cid:81) Kk =1 (1 − t k /n ) .So Σ X is positive definite. A.2.2. Proof for Lemma A.2 For t = aL , a = [ mw k ], k = 1 , . . . , K , we have Var CBB ( R G ( t )) = c p , CBB ( a ) + c p , CBB ( a ) + c p , CBB ( a ) − c p , CBB ( a )= c a ( m − a ) m ( m − + (2 c + c ) − a ( m − a ) + m ( m − a ( m − a ) m ( m − + ( c + c + c − c ) a ( m − a ) m ( m − , Var CBP ( R G ( t )) = c p ( a ) + c p ( a ) + c p ( a ) − c p ( a )= c a ( m − a )( a − m − a − m ( m − m − m − ao Chen/Change-point detection for data with local dependency + (cid:16) c + c − c m + c m ( m − (cid:17) − a ( m − a ) + m ( m − a ( m − a ) m ( m − m − m − + ( c + c + c − c ) a ( m − a ) m ( m − . From Remark 4.4, we know that c + c + c − c ≥ 0. Using the samenotations in Remark 4.4, under a certain blocking ω , let | G b i ( ω ) | be the numberof edges in G that connect a node in block i to another node not in block i . Then c ( ω ) = (cid:80) mi =1 | G b i ( ω ) | , and c ( ω ) + 2 c ( ω ) = (cid:80) mi =1 | G b i ( ω ) | . ByCauchy-Schwarz inequality, we have that c ( ω ) + 2 c ( ω ) ≥ ( (cid:80) mi =1 | G bi ( ω ) | ) m = c ( ω ) m . Then, 2 c + c = L (cid:88) ω ∈ Ω c ( ω ) + c ( ω ) ≥ L (cid:88) ω ∈ Ω 4 c ( ω ) m ≥ c m . So 2 c + c − c m ≥ j in block i , it is clear that | G b i ( ω ) | ≤ | A j,L, | , so2 c ( ω ) + c ( ω ) ≤ L (cid:80) nj =1 | A j,n, | . Then 2 c + c ≤ L (cid:80) nj =1 | A j,n, | ≤ L (cid:80) nj =1 | A j,n, || A j,n, | = o ( n . ).Notice that a ( m − a ) = O ( m ) for a = [ mw k ] , k = 1 , . . . , K , and 0 < a ( m − a ) ≤ m , we have − a ( m − a ) + m ( m − a ( m − a ) ∈ (cid:104) − m , m ( m − (cid:105) . If a is in a range such that − a ( m − a ) + m ( m − a ( m − a ) = O ( m ),notice that c = O ( | G | ), 2 c + c = o ( n . 
) , c m = o ( mc ), and c ≥ c , so term (2 c + c ) − a ( m − a ) + m ( m − a ( m − a ) m ( m − is dominated by term c a ( m − a ) m ( m − , and term ( c + c − c m + c m ( m − ) − a ( m − a ) + m ( m − a ( m − a ) m ( m − m − m − is dominated by term c a ( m − a )( a − m − a − m ( m − m − m − , then
\[
\lim_{m \to \infty} \frac{\mathrm{Var}_{CBB}(R_G(t))}{\mathrm{Var}_{CBP}(R_G(t))} = 1 .
\]
If a is in a range such that − a ( m − a ) + m ( m − a ( m − a ) is of order higher than m , then − a ( m − a ) + m ( m − a ( m − a ) must be positive. Under Condition 2, we have 2 c + c − c m = O (2 c + c ), then $\lim_{m \to \infty} \mathrm{Var}_{CBB}(R_G(t)) / \mathrm{Var}_{CBP}(R_G(t))$ is a positive constant. Under Condition 3, we have 2 c + c = O ( c ), then $\lim_{m \to \infty} \mathrm{Var}_{CBB}(R_G(t)) / \mathrm{Var}_{CBP}(R_G(t))$ is also a positive constant.

For $t = aL$, $a = [m w_k]$, $k = 1, \ldots, K$, notice that $E_{CBB}(R_G(t)) - E_{CBP}(R_G(t))$ = − c a ( m − a ) m ( m − . From the above arguments, we know that $\mathrm{Var}_{CBB}(R_G(t))$ is at least of order $|G|$, and c = O ( | G | ), then $\big(E_{CBB}(R_G(t)) - E_{CBP}(R_G(t))\big) / \sqrt{\mathrm{Var}_{CBB}(R_G(t))}$ is at most of order $|G|^{1/2} m^{-1} = m^{0.5\alpha - 1}$, which converges to 0 as $m \to \infty$ since α < .

A.3. Proof for Theorem 5.3

We need to figure out $E_{CBP}(R_G(t_1) R_G(t_2))$ for $t_1 = a_1 L < t_2 = a_2 L$ with $a_1, a_2 \in \{1, \ldots, m\}$.
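The block-level joint probabilities that enter this expectation can be checked by brute force. The Python sketch below is only an illustration under stated assumptions: it assumes that CBP places the m blocks in a uniformly random order, that g(t) for t = aL indicates whether a block lands after position a, and that the closed forms coded in `closed_forms` are the expressions q_1, q_2, q_3 of this proof as we read them. The function names `g`, `joint_probs` and `closed_forms` are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations


def g(pos, a):
    # Indicator that a block placed at position pos lies after the split a (t = a * L).
    return pos > a


def joint_probs(m, a1, a2):
    """Exact probabilities obtained by enumerating the positions of 2, 3 or 4
    distinct blocks under a uniformly random ordering of m blocks."""
    def prob(k, event):
        tuples = list(permutations(range(1, m + 1), k))
        hits = sum(1 for p in tuples if event(p))
        return Fraction(hits, len(tuples))

    # Same pair of blocks (i, j) at both t1 and t2.
    same_edge = prob(2, lambda p: g(p[0], a1) != g(p[1], a1)
                     and g(p[0], a2) != g(p[1], a2))
    # Pairs (i, j) at t1 and (i, u) at t2, sharing block i.
    shared = prob(3, lambda p: g(p[0], a1) != g(p[1], a1)
                  and g(p[0], a2) != g(p[2], a2))
    # Pairs (i, j) at t1 and (u, v) at t2, four different blocks.
    disjoint = prob(4, lambda p: g(p[0], a1) != g(p[1], a1)
                    and g(p[2], a2) != g(p[3], a2))
    return same_edge, shared, disjoint


def closed_forms(m, a1, a2):
    # Closed forms for a1 <= a2, as reconstructed from the proof (an assumption).
    q1 = Fraction(2 * a1 * (m - a2), m * (m - 1))
    q2 = Fraction(a1 * (m - a2) * (m - 2 * a1 + 2 * a2 - 2),
                  m * (m - 1) * (m - 2))
    q3 = Fraction(4 * a1 * (m - a2)
                  * ((a1 - 1) * (m - a1 - 1) + (a2 - a1) * (m - a1 - 2)),
                  m * (m - 1) * (m - 2) * (m - 3))
    return q1, q2, q3


if __name__ == "__main__":
    for m, a1, a2 in [(6, 2, 3), (7, 2, 4), (8, 3, 3)]:
        print(m, a1, a2, joint_probs(m, a1, a2) == closed_forms(m, a1, a2))
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a Monte Carlo approximation. The enumeration grows like m^4, so it is only meant for small m.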
We have
\[
\begin{aligned}
E_{CBP}(R_G(t_1) R_G(t_2))
&= \sum_{(i,j),(u,v) \in G} P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(u)}(t_2) \neq g_{\pi_{CBP}(v)}(t_2)\big) \\
&= \sum_{(i,j) \in G} P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(i)}(t_2) \neq g_{\pi_{CBP}(j)}(t_2)\big) \\
&\quad + \sum_{(i,j),(i,u) \in G;\ j \neq u} P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(i)}(t_2) \neq g_{\pi_{CBP}(u)}(t_2)\big) \\
&\quad + \sum_{\substack{(i,j),(u,v) \in G \\ i,j,u,v \text{ all different}}} P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(u)}(t_2) \neq g_{\pi_{CBP}(v)}(t_2)\big).
\end{aligned}
\]
Notice that, when $i$, $j$, $u$, $v$ are all in different blocks, we have
\[
\begin{aligned}
P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(i)}(t_2) \neq g_{\pi_{CBP}(j)}(t_2)\big)
&= \frac{2 a_1 (m - a_2)}{m(m-1)} = q_1(a_1, a_2), \\
P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(i)}(t_2) \neq g_{\pi_{CBP}(u)}(t_2)\big)
&= \frac{a_1 (m - a_2)(m - 2a_1 + 2a_2 - 2)}{m(m-1)(m-2)} = q_2(a_1, a_2), \\
P\big(g_{\pi_{CBP}(i)}(t_1) \neq g_{\pi_{CBP}(j)}(t_1),\ g_{\pi_{CBP}(u)}(t_2) \neq g_{\pi_{CBP}(v)}(t_2)\big)
&= \frac{4 a_1 (m - a_2)\big[(a_1 - 1)(m - a_1 - 1) + (a_2 - a_1)(m - a_1 - 2)\big]}{m(m-1)(m-2)(m-3)} = q_3(a_1, a_2).
\end{aligned}
\]
When some or all of $i$, $j$, $u$, $v$ are in the same block, we could follow exactly the same procedure as in the proof for Theorem 4.3 while replacing $p_1(a)$, $p_2(a)$, $p_3(a)$ by $q_1(a_1, a_2)$, $q_2(a_1, a_2)$, q_3(a_1, a_2