[PDF] Hypothesis testing for populations of networks

Abstract

It has become an increasingly common practice for scientists in modern science and engineering to collect samples of multiple network data in which a network serves as a basic data object. The increasing prevalence of multiple network data calls for developments of models and theory that can deal with inference problems for populations of networks. In this work, we propose a general procedure for hypothesis testing of networks and in particular, for differentiating distributions of two samples of networks. We consider a very general framework which allows us to perform tests on large and sparse networks. Our contribution is two-fold: (1) We propose a test statistics based on the singular value of a generalized Wigner matrix. The asymptotic null distribution of the statistics is shown to follow the Tracy--Widom distribution as the number of nodes tends to infinity. The test also yields asymptotic power guarantee with the power tending to one under the alternative; (2) The test procedure is adapted for change-point detection in dynamic networks which is proven to be consistent in detecting the change-points. In addition to theoretical guarantees, another appealing feature of this adapted procedure is that it provides a principled and simple method for selecting the threshold that is also allowed to vary with time. Extensive simulation studies and real data analyses demonstrate the superior performance of our procedure with competitors.

Full PDF

HH YPOTHESIS TESTING FOR POPULATIONS OF NETWORKS

A P

REPRINT

Li Chen

College of MathematicsSichuan UniversityChengdu, Sichuan 610064, China [email protected]

Jie Zhou

College of MathematicsSichuan UniversityChengdu, Sichuan 610064, China [email protected]

Lizhen Lin

Department of Applied andComputational Mathematics and StatisticsUniversity of Notre DameSouth Bend, Indiana 46617, U.S.A. [email protected]

July 6, 2020 A BSTRACT

It has become an increasingly common practice in modern science and engineering to collect samplesof multiple network data in which a network serves as a basic data object. The increasing prevalenceof multiple network data calls for developments of models and theories that can deal with inferenceproblems for populations of networks. In this work, we propose a general procedure for hypothesistesting of networks and in particular, for differentiating distributions of two samples of networks. Weconsider a very general framework which allows us to perform test on large and sparse networks. Ourcontribution is two-fold: (1) We propose a test statistics based on the singular value of a generalizedWigner matrix. The asymptotic null distribution of the statistics is shown to follow the Tracy–Widomdistribution as the number of nodes tends to inﬁnity. The test also yields asymptotic power guaranteewith the power tending to one under the alternative; (2) The test procedure is adapted for change-pointdetection in dynamic networks which is proven to be consistent in detecting the change-points. Inaddition to theoretical guarantees, another appealing feature of this adapted procedure is that itprovides a principled and simple method for selecting the threshold that is also allowed to vary withtime. Extensive simulation studies and real data analyses demonstrate the superior performance ofour procedure with competitors.

Keywords:

Change-point detection; Dynamic networks; Hypothesis testing; Network data; Tracy–Widom distribution a r X i v : . [ s t a t . M E ] J u l PREPRINT - J

ULY

6, 2020

One of the unique features in modern data science is the increasing availability of complex data in non-traditional forms.Among the newer forms of data, network has arguably emerged as one of the most important and powerful data types. Anetwork, an abstract object consisting of a set of nodes and edges, can be broadly used to represent interactions amonga set of agents or entities and one can ﬁnd its applications in virtually any scientiﬁc ﬁeld. The ubiquity of networkdata in diverse ﬁelds ranging from biology (Chen and Yuan, 2006; Cline et al., 2007), physics (Bounova and de Weck,2012; Kulig et al., 2015), social science (Hoff et al., 2002; Snijders and Baerveldt, 2003) to engineering (Leonardi andVan De Ville, 2013; Chen et al., 2010) has spurred fast developments in models, theories and algorithms for the ﬁeldof network analysis, see e.g., Erd˝os and Rényi (1959); Holland et al. (1983); Karrer and Newman (2011); Ball et al.(2011); Wolfe and Olhede (2013); Rohe et al. (2011); Decelle et al. (2011); Amini and Levina (2018); Bickel and Chen(2009). The existing literature, however, has largely been focusing on inference of one single (often large) network. Therecent advancement in technology and computer prowess has led to the increasing prevalence of network data availablein multiple networks in which a network serves as the basic data object. For instance, such datasets can be found inneuroscience (Bassett et al., 2008), cancer study (Zhang et al., 2009), microbiome study (Cai et al., 2019), and socialinteractions (Kossinets and Watts, 2006; Eagle et al., 2009). There is a strong need for development of models andtheories that can deal with such data sets, and more broadly, for inference of population of networks.One has already seen a growing effort in this direction. Ginestet et al. (2017) proposes a geometric framework forhypothesis tests of populations of networks viewing a weighted network as a point on a manifold. Along the sameline, Kolaczyk et al. (2017) provides geometric characterization of space of all unlabeled networks which serve as thefoundation for inference based on Fréchet mean of networks. In addition, Mukherjee et al. (2017) provides a generalframework for clustering network objects. Durante et al. (2017) proposes a Bayesian nonparametric approach formodeling the populations of networks.One of commonly encountered problems for inference of populations of networks is hypothesis testing which hassigniﬁcant applications, but remains largely understudied especially for large networks. Among the few existing workin the literature, besides Ginestet et al. (2017) as mentioned above, Tang et al. (2017) carries out hypothesis testsusing random dot product graph model via adjacency spectral embedding. Ghoshdastidar et al. (2017) proposes twotest statistics based on estimates of the Frobenius norm and spectral norm between link probability matrices of thetwo samples, the key challenge of which lies in choosing a threshold for the test statistics. Ghoshdastidar and vonLuxburg (2018) uses the same statistics as Ghoshdastidar et al. (2017) and proves asymptotic normality for the statistics.Ghoshdastidar and von Luxburg (2018) further proposes a test statistics based on the extreme eigenvalues of a scaledand centralized matrix and proves that the new statistics asymptotically follows the Tracy–Widom law (Tracy andWidom, 1996). Most of the literature, however, focuses on the case where the number of nodes for each network isﬁxed, which greatly limits the scope of inference.The initial focus of our work is on hypothesis testing for two samples of networks including large and sparse networks.We propose a very intuitive testing statistics which yields theoretical guarantees. More speciﬁcally, we prove thatits asymptotic null distribution follows the Tracy–Widom distribution and the asymptotic power tends to 1 under thealternative. One of the appealing features of our approach is that our test adopts a very general framework in which thenumber of the nodes are allowed to grow to inﬁnity, while most of the existing methods assume that the number of2

PREPRINT - J

ULY

6, 2020nodes is ﬁxed, which is not always a practical assumption since many modern networks are often large and sparse. Wethen adapt our test statistics for a change-point detection procedure in dynamic networks and prove its consistency indetecting change-points. We provide a principled method for selecting the threshold level in the change-point detectionprocedure based on the asymptotic distribution of the testing statistics and the threshold is allowed to vary with time.This is appealing comparing to many existing change-point detection approaches which require either a cross-validationfor selecting the threshold or a careful tuning of the parameters. Extensive simulation studies and two real data analysesdemonstrate the superior performance of our procedure in comparing with others in both tasks.The paper is organized as follows. In Section 2, we propose a testing statistics and throughly study its asymptoticproperties. Section 3 is devoted to a change-point detection procedure for dynamic networks by adapting the testingstatistics derived in Section 2. Simulation studies are carried out in Section 4 and real data examples are presented inSection 5. Technical proofs can be found in the appendix.

We ﬁrst introduce some notations that will be used throughout the paper. For a set N , |N | denotes its cardinality. T W denotes the Tracy–Widom distribution with index 1. χ ( n ) denotes the Chi-squared distribution with n degreesof freedom and χ α ( n ) corresponds to its α th upper quantile for α ∈ (0 , . For a square matrix B ∈ R n × n , B ij denotes its ( i, j ) entry and B i · is the i th row of B . Given two vectors a and b , (cid:104) a, b (cid:105) denotes their inner product.Given two matrices A and B , A ◦ B denotes the Hadamard (element-wise) product of A and B . Notation (cid:107)·(cid:107) op denotes operator norm. For a symmetric matrix B ∈ R n × n , λ j ( B ) denotes its j th largest eigenvalue, ordered as λ ( B ) ≥ λ ( B ) ≥ · · · ≥ λ n ( B ) , σ ( B ) is the largest singular value. Write X n (cid:32) X if a sequence of randomvariables { X n } ∞ n =1 converges in distribution to random variable X . (cid:98) x (cid:99) denotes the largest integer but no greater than x ∈ R . I ( · ) denotes indicator function. For two sequences of real numbers { x n } and { y n } , we have the followingnotations: y n = O n ( x n ) : there exists a constant M such that lim n →∞ y n x n = M . y n = o n ( x n ) : lim n →∞ y n x n = 0 . y n = O p ( x n ) : for any ε > , there exist ﬁnite M > and N > such that P (cid:0)(cid:12)(cid:12) y n x n (cid:12)(cid:12) > M (cid:1) < ε for any n > N . y n = O p ( x n ) : lim n →∞ P (cid:0)(cid:12)(cid:12) y n x n (cid:12)(cid:12) ≥ ε (cid:1) = 0 for any positive ε . We consider two samples of networks with n nodes and sample sizes m and m respectively. More speciﬁcally,we assume one observes symmetric binary adjacency matrices A (1)1 , . . . , A ( m )1 that are generated from symmetriclink probability matrix P with A ( k )1 ,ij ∼ Bernoulli ( P ,ij ) , k = 1 , , . . . , m , i, j = 1 , , . . . , n , and another sample ofadjacency matrices A (1)2 , . . . , A ( m )2 generated from the same model with link probability matrix P . Our goal is to testwhether the two samples of networks have same graph structure or not, which is equivalent to testing: H : P = P against H : P (cid:54) = P . (1)3 PREPRINT - J

ULY

6, 2020For the case of m = m = 1 and a ﬁxed n , Tang et al. (2017) focuses on random dot product graphs by applyingthe adjacency spectral embedding, whereas Ghoshdastidar and von Luxburg (2018) focuses on the inhomogeneousErd˝os–Rényi graphs and proposes a test based on eigenvalues.For the case of large m , m and again a ﬁxed number of nodes n , Ginestet et al. (2017) proposes a χ -type testbased on a geometric characterization of the space of graph Laplacians and a notion of Fréchet means (Fréchet, 1948;Bhattacharya and Lin, 2017). As a simpliﬁcation of the statistics in Ginestet et al. (2017), Ghoshdastidar and vonLuxburg (2018) sets m = m = m and obtains the test statistics as follows: T χ = (cid:88) im/ A ( k )1 ,ij − A ( k )2 ,ij (cid:17)(cid:114)(cid:80) im/ A ( k )1 ,ij + A ( k )2 ,ij (cid:17) . (3)Ghoshdastidar and von Luxburg (2018) proves the asymptotic normality of T N as n → ∞ . We refer this method to N -type test. In proposing our test statistics, we consider a very general setting in which the number of nodes can grow to inﬁnityinstead of being ﬁxed unlike most of the existing literature, and the sample sizes m and m grow in an appropriaterate. We ﬁrst introduce the centralized and re-scaled matrix Z with entries given as follows: Z ij = ¯ A ,ij − ¯ A ,ij (cid:114) ( n − (cid:104) m P ,ij (cid:0) − P ,ij (cid:1) + m P ,ij (cid:0) − P ,ij (cid:1)(cid:105) , (4)where ¯ A u,ij = m u (cid:80) m u k =1 A ( k ) u,ij with u = 1 , and i, j = 1 , . . . , n .The matrix Z involves unknown link probability matrices P and P thus can not be directly used as a test statistics. Asan alternative, one can choose some appropriate plugin estimates for P and P , and some of these estimates attaingood properties for the resulting tests as we will see in the following discussions.Denote ˆ P and ˆ P as some plugin estimators of P and P respectively, then the empirical standardized matrix ˆ Z of Z can be written with entries as ˆ Z ij = ¯ A ,ij − ¯ A ,ij (cid:114) ( n − (cid:104) m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1)(cid:105) , i, j = 1 , , . . . , n. (5)We propose to use the largest singular value of ˆ Z , after suitable shifting and scaling, as our test statistics: T T W = n / (cid:2) σ ( ˆ Z ) − (cid:3) . (6)4 PREPRINT - J

ULY

6, 2020Given a signiﬁcance level α ∈ (0 , , the rejection region Q for H in test (1) is Q = { T T W | T T W ≥ τ α/ } , (7)where τ α/ is the corresponding α/ upper quantile of T W . We then have the following results. Theorem 2.1 (General asymptotic null distribution) . Let A (1)1 , . . . , A ( m )1 be a sample of networks generated from alink probability matrix P with n nodes, and A (1)2 , . . . , A ( m )2 be another sample generated from a link probabilitymatrix P with the same number of nodes. Let ˆ Z be given as in (5) . Given some estimated matrices ˆ P u of P u , u = 1 , ,if sup i,j | ˆ P u,ij − P u,ij | = o p ( n − / ) , then the following holds under the null hypothesis in (1) : n / [ λ ( ˆ Z ) − (cid:32) T W , n / [ − λ n ( ˆ Z ) − (cid:32) T W . (8) Remark 2.2.

Theorem 2.1 is very general in the sense that it puts no structural conditions on the networks, nor does itimpose any assumption on the type of estimates for P and P so long as they are estimated within o p ( n − / ) error. The following corollaries show asymptotic type I error control and asymptotic power for the rejection rule (7).

Corollary 2.3 (Asymptotic type I error control) . Supposing assumptions in Theorem 2.1 hold, the rejection region in (7) has size α . Corollary 2.4 (Asymptotic power guarantee) . Deﬁne a matrix ˜ Z ∈ R n × n with zero diagonal and for any i (cid:54) = j , ˜ Z ij = P ,ij − P ,ij (cid:114) ( n − (cid:104) m P ,ij (cid:0) − P ,ij (cid:1) + m P ,ij (cid:0) − P ,ij (cid:1)(cid:105) . (9) Under the assumptions of Theorem 2.1, if P ,ij and P ,ij are such that n − / [ σ ( ˜ Z ) − − ≤ o n (1) , then P ( T T W ≥ τ α/ ) = 1 − o n (1) . Remark 2.5.

As mentioned in the introduction, in Ghoshdastidar and von Luxburg (2018), a test statistics for comparingtwo large graphs is proposed, and our test statistics appears to be similar in natural to theirs. However, there are somekey distinctions between our method and theirs. First, our testing statistics considers two-sample test on two populationsof networks which requires exploration of the proper interplay between the asymptotics in both the sample sizes ofnetworks and nodes number. Second, Ghoshdastidar and von Luxburg (2018) proves the asymptotic Tracy–Widom lawunder the true link probability matrices, while in our paper, we consider various estimates of link probability matrices(again based on multiple networks) and prove the Tracy–Widom law theoretically. We also discuss the performance ofthe resulting testing statistics under various estimators. Third, our testing statistics is modiﬁed for a novel and efﬁcientchange-point detection procedure and the consistency of the change-point detection is also proved.

In this section, we discuss explicit estimates for the link probability matrices that can be used as the plugging estimatesin the test statistics. Assume a link probability matrix P ∈ R n × n is generated by a graphon function f : [0 , → [0 , such that P ij = f ( ξ i , ξ j ) , ξ i i.i.d. ∼ Uniform [0 , , i, j = 1 , . . . , n. We ﬁrst apply the modiﬁed neighborhood smoothing (MNBS) method proposed in Zhao et al. (2019) to estimate P and discuss the performance of the resulting test in our proposed method. The essential idea of the MNBS procedure5 PREPRINT - J

ULY

6, 2020consists of the following steps: First, given a group of adjacency matrices A (1) , A (2) , . . . , A ( m ) generated from P , let ¯ A = (cid:80) mk =1 A ( k ) /m , deﬁne the distance measure between nodes i and i (cid:48) as d ( i, i (cid:48) ) = max k (cid:54) = i,i (cid:48) |(cid:104) ¯ A i · − ¯ A i (cid:48) · , ¯ A k · (cid:105)| and the neighborhood of node i as N i = { i (cid:48) (cid:54) = i : d ( i, i (cid:48) ) ≤ q i ( q ) } , where q i ( q ) denotes the q th quantile ofthe distance set { d ( i, i (cid:48) ) : i (cid:48) (cid:54) = i } . Then we set q to be C log n/ ( n / ω ) , where C is some positive constant and ω = min { n / , ( m log n ) / } . Finally, given the neighborhood N i for each node i , the link probability P ij betweennodes i and j is estimated by ˜ P ij = (cid:80) i (cid:48) ∈N i ¯ A i (cid:48) j / |N i | . In comparing with the neighborhood smoothing methodproposed in Zhang et al. (2017), the key idea is to employ the average network information ¯ A and simultaneously shrinkthe neighborhood size (from C (log n/n ) / to C log n/ ( n / ω ) ) to obtain an estimate with an improved rate.Based on MNBS, for the symmetric networks considered in this paper, we use symmetrized estimators of the linkprobability matrices P u , u = 1 , , of the two groups of graphs as ˆ P u = ˜ P u + ( ˜ P u ) T , with ˜ P u,ij = (cid:80) i (cid:48) ∈N u,i ¯ A u,i (cid:48) j |N u,i | , (10)where ¯ A u,i (cid:48) j is the ( i (cid:48) , j ) element of ¯ A u = (cid:80) m u k =1 A ( k ) u /m u and N u,i is the neighborhood of node i in group u .From Lemma 3.3 in Zhao et al. (2019), we have |N u,i | ≥ B u n / log nω u , (11)where B u is a global positive constant and ω u = min { n / , ( m u log n ) / } for u = 1 , .In the following theorem, we state the asymptotic null distribution of our test statistics based on the MNBS estimator. Theorem 2.6 (MNBS based asymptotic null distribution) . Let A (1)1 , . . . , A ( m )1 be a sample of networks generatedfrom a link probability matrix P with a graphon function f and n nodes, and A (1)2 , . . . , A ( m )2 be another sampleof networks generated from a link probability matrix P with graphon function f and the same number of nodes. ˆ Z is a matrix deﬁned in (5) and ˆ P u is the estimator of P u based on the MNBS method given in (10) , u = 1 , . Assume m u = O n ( n α u ) , ω u = min { n / , ( m u log n ) / } . If ω u = n / , α u ≥ / , or ω u = ( m log n ) / , α u > / , then(i) sup i,j | ˆ P u,ij − P u,ij | = o p ( n − / ) ;(ii) Under the null hypothesis in (1) , n / [ λ ( ˆ Z ) − (cid:32) T W , n / [ − λ n ( ˆ Z ) − (cid:32) T W . The corresponding asymptotic type I error control and power are the same as those in Corollaries 2.3 and 2.4.

Remark 2.7.

When n is relatively small compared with m u such that n / ≤ ( m u log n ) / and ω u = n / , we require m u = O n ( n α u ) , α u ≥ / . Whereas when n is relatively large compared with m u , and n / ≥ ( m u log n ) / , ω u =( m u log n ) / , the condition can be relaxed to m u = O n ( n α u ) , α u > / . In other words, the increasing speed of m u with nodes number n is allowed to be slower if n is large enough. In addition to the MNBS based estimator for link probability matrix, there are many other choices of estimators. In thissubsection, we investigate the properties of the tests corresponding to various different estimators for link probabilitymatrix. 6

PREPRINT - J

ULY

6, 2020We ﬁrst consider a different but natural and simple estimator of P u by using the average of all the adjacency matrices inthe same group. We denote this method as AVG and the link probability matrix estimator as ˆ P AVG ,u , which is actually ¯ A u .It’s not difﬁcult to see that sup i,j | ˆ P AVG ,u,ij − P u,ij | = o p (cid:0) m − / u log( n ) (cid:1) by applying Bernstein’s inequality. To guarantee the asymptotic T W in (8), it requires that α u > / . Morespeciﬁcally, the sample size m u needs to increase faster than nodes number n , so m u will exceed n eventually as n tends to inﬁnity. Therefore, the AVG estimator will perform well if the sample size is large enough. However, this ishard to hold in reality especially when the size of the network is large. Usually, for most practical applications, it wouldbe more suitable to require m u to increase slower than n , which is covered in the MNBS based method in Theorem 2.6.We also consider an average estimator of P u based on the stochastic block model (SBM), which is similar in spirit tothe estimator in Ghoshdastidar and von Luxburg (2018) but with a different algorithm for estimating the communities.Our main idea can be summarized as follows: First, assume the graphs are SBMs, or approximate them with SBMs by aweaker version of Szemerédi’s regularity lemma (see Lovász (2012)). Second, use one of the community detectionalgorithms such as the goodness-of-ﬁt test proposed in Lei (2016) to estimate the number of the communities ˆ K u .Then perform clustering using for example the spectral clustering algorithm (see, e.g., von Luxburg (2007)) to obtainestimates of the membership vector g u ∈ { , . . . , ˆ K u } n as well as the community set B u,k = { i : 1 ≤ i ≤ n, g u,i = k } ,where k = 1 , , . . . , ˆ K u and g u,i is the i th element of g u . Subsequently, P u is approximated by a block matrix ˆ P SBM ,u such that ˆ P SBM ,u,ij is the mean of the submatrix of ¯ A u restricted to B u,g u,i × B u,g u,j .Under further assumption that each community has size at least proportional to n/K u , where K u is the true communitynumber, it can be seen that the error of ˆ P SBM ,u,ij is o p ( K u m − / u n − log n ) (Lei, 2016). This implies that only when K u = O n ( n γ u ) , γ u < / α u / , the error condition in Theorem 2.1 holds. For large networks in practice, the numberof communities can be very large therefore such a condition might be hard to satisfy. Moreover, due to the potentialdouble estimation in the process (in estimating the number of communities as well as the community membership), itmay bring large error to the ﬁnal test statistics, especially when the SBM assumption is not valid. We refer the two sample test based on asymptotic

T W proposed in the previous section as T W -type test. In thissection, we adapt the T W -type test to a procedure for change-point detection in dynamic networks, which is anotherimportant learning task in statistics and has received a great deal of recent attentions. Speciﬁcally, we examine asequence of networks whose distributions may exhibit changes at some time epochs. Then, the problem is to determinethe unknown change-points based on the observed sequence of network adjacency matrices.Assume the observed dynamic networks { A t } mt =1 are generated by a sequence of probability matrices { P t } mt =1 with A t,ij ∼ Bernoulli ( P t,ij ) for time t = 1 , . . . , m . Let J = { η j } Jj =1 ⊂ { , . . . , m } be a collection of change-points and η = 0 , η J +1 = m , ordered as η < η < · · · < η J < η J +1 , such that P t = P ( j ) , t = η j − + 1 , . . . , η j , j = 1 , . . . , J + 1 . PREPRINT - J

ULY

6, 2020In other words, the change-points { η j } Jj =1 divide the networks into J + 1 groups, the networks contained in the samegroup follow the same link probability matrix and P ( j ) is the link probability matrix of the j th segment satisfying P ( j ) (cid:54) = P ( j +1) . Denote J = ∅ if J = 0 .Now we apply our T W -type test to a screening and thresholding algorithm that is commonly used in change-pointdetection, see Niu and Zhang (2012); Zou et al. (2014); Zhao et al. (2019). The detection procedure is referred as T W -type detection and described as follows.Deﬁne L = min ≤ j ≤ J +1 ( η j − η j − ) , which is the minimum segment length. Set a screening window size h (cid:28) m and h < L/ . Denote ¯ A ( t, h ) = h (cid:80) ti = t − h +1 A i and ¯ A ( t, h ) = h (cid:80) t + hi = t +1 A i for each t = h, . . . , m − h . ˆ P ( t, h ) and ˆ P ( t, h ) are for example MNBS estimators using { A i } ti = t − h +1 and { A i } t + hi = t +1 respectively. In addition, we denote amatrix ˆ Z ( t, h ) with entries as follows essentially the same as in (5): ˆ Z ij ( t, h ) = ¯ A ,ij ( t, h ) − ¯ A ,ij ( t, h ) (cid:115) ( n − (cid:26) h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105) + h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105)(cid:27) ,i, j = 1 , , . . . , n. In the screening step, we calculate the scan statistics T T W ( t, h ) depending only on observations in a small neighborhood [ t − h + 1 , t + h ] as follows: T T W ( t, h ) = n / (cid:8) σ (cid:2) ˆ Z ( t, h ) (cid:3) − (cid:9) . Deﬁne the h -local maximizers of T T W ( t, h ) as { t : T T W ( t, h ) ≥ T T W ( t (cid:48) , h ) for all t (cid:48) ∈ ( t − h, t + h ) } . Let LM denote the set of all h -local maximizers of T T W ( t, h ) .In the thresholding step, we estimate the change-points by a thresholding rule to LM with time t such that ˆ J = { t : t ∈ LM and T T W ( t, h ) > (cid:77) T TW } , (12)where (cid:77) T TW = max { τ α , n / [ δ ( t, h ) − − τ α } , α = 1 / − (1 − /n ) / (2 h ) / , δ ( t, h ) = σ [ V ( t, h )] is the largestsingular value of matrix V ( t, h ) with zero diagonal and for any i (cid:54) = j , V ,ij ( t, h ) = ˆ P ,ij ( t, h ) − ˆ P ,ij ( t, h ) (cid:114) ( n − (cid:110) h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105) + h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105)(cid:111) ,i, j = 1 , , . . . , n. We have the following consistency result.

Theorem 3.1 (Consistency of

T W -type change-point detection) . Under the alternative hypothesis, assume n / [ σ ( t, h ) − ≥ τ α , α = 1 / − (1 − /n ) / (2 h ) / , h < L/ , then the T W -type change-point detectionprocedure satisﬁes lim n →∞ P (cid:0) J = ˆ J (cid:1) = 1 . One of the interesting ﬁndings from Theorem 3.1 is that for a ﬁxed window size h , the threshold in (12) is dynamic withtime t instead of being a constant as in Zhao et al. (2019). By adapting the T W -type test for change-point detection,we can adjust the threshold with t and still enjoy consistency of the change-point detection. From the proof of Theorem8 PREPRINT - J

ULY

6, 20203.1, it is reﬂected that for a time t that does not correspond to a change-point, T T W ( t, h ) ≤ (cid:77) T TW with probability 1,so it can control the type I error. However, for a change-point t , T T W ( t, h ) > (cid:77) T TW with probability 1, and hence thethreshold can lead to a good performance.The only tuning parameter of T W -type change-point detection procedure is the local window size h , which is chosenaccording to applications with available information or artiﬁcially like set h = √ m as recommended in Zhao et al.(2019). In this section, we illustrate the performance of

T W -type test and its application to change-point detection usingseveral synthetic data examples.We ﬁrst deﬁne four graphons and an SBM, which are used for two-sample test and change-point detection in thesimulation studies. The graphons are partly borrowed from Zhang et al. (2017) and the SBM is from Zhao et al. (2019)with 2 communities. We denote the block matrix or the probability matrix of connections between blocks as Λ . Morespeciﬁcally, the graphons and SBM are deﬁned as: Graphon 1 : f ( u, v ) =  k/ ( K + 1) , u, v ∈ (( k − /K, k/K ) , . / ( K + 1) , otherwise , where K = (cid:98) log n (cid:99) , k = 1 , , . . . , K . Graphon 2 : f ( u, v ) = ( u + v ) / / ( u + v )] + 0 . . Graphon 3 : f ( u, v ) = sin[5 π ( u + v −

1) + 1] / . . Graphon 4 : f ( u, v ) = ( u + v ) /

10 cos[1 / ( u + v )] + 0 . . SBM 1 : Λ = (cid:20) . θ . . . (cid:21) , where θ is a constant related to sample size m . The membership of the i th node is M ( i ) = I (1 ≤ i ≤ (cid:98) n/ log n (cid:99) ) +2 I ( (cid:98) n/ log n (cid:99) + 1 ≤ i ≤ n ) .To operationalize simulations related to MNBS, the quantile parameter q = B (log n ) / / ( n / h / ) and the threshold (cid:77) D = D (log n ) / δ / ( n / h / ) with tuning parameters D and δ for change-point detection in Zhao et al. (2019)need to be speciﬁed. In the following simulations in this section and the real data analyses in Section 5, we set therelated parameters h = √ m, B = 3 , δ = 0 . , D = 0 . as recommended in Zhao et al. (2019) unless otherwiseindicated. 9 PREPRINT - J

ULY

6, 2020

To examine the performance of the two-sample test (1), we present our results by

T W -type tests based on MNBS( T W -MNBS), AVG ( T W -AVG), and SBM ( T W -SBM) discussed in subsections 2.4 and 2.5, χ -type test withstatistics (2), and N -type test with statistics (3). We measure the performance in terms of the Attained SigniﬁcanceLevel (ASL) which is the probability of observing a statistics far away from the true value under the null hypothesis,and the Attained Power (AP), the probability of correctly rejecting the null hypothesis when the alternative hypothesisis true.We conduct two experiments using Graphon 1 and Graphon 2 respectively. In the ﬁrst experiment, we generate twogroups of networks { A ( k )1 } m k =1 and { A ( k )2 } m k =1 . We vary the number of nodes n growing from 100 to 1000 in a stepof 100 with sample sizes m = m = 30 , , and set signiﬁcance level at α = 0 . . { A ( k )1 } m k =1 are generated fromGraphon 1. Under the null hypothesis, { A ( k )2 } m k =1 are also generated from the Graphon 1 and hence P = P . Underthe alternative hypothesis, randomly choose (cid:98) log n (cid:99) -element subset S ⊂ { , , . . . , n } , generate { A ( k )2 } m k =1 from P by setting P ,ij = P ,ij + θ with θ = 0 . for m = m = 30 ( θ = 0 . for m = m = 200) if i, j ∈ S , and θ = 0 otherwise. Using T W -MNBS, T W -AVG, T W -SBM tests, χ -type test and N -type test, we run 1000 MonteCarlo simulations for the experiment to estimate the ASLs and APs of test (1).The second experiment is conducted similarly but using Graphon 2. The only difference is that for a better visualizationof comparisons, under the alternative hypothesis, we set P ,ij = P ,ij + θ with θ = 0 . for m = m = 30 ( θ =0 . for m = m = 200) if i, j ∈ S and θ = 0 otherwise. The rates of rejecting the null hypothesis for these twoexperiments are summarized in Figures 1 and 2 respectively.The results of the ﬁrst experiment using Graphon 1, an SBM set up, are plotted in Figure 1. It reveals undesirablebehaviors of χ -type test and T W -AVG test since with increasing number of nodes n , the ASLs of both tests growquickly close to , which is too large to be used in practice. We can also see that the N -type test is not efﬁcient asboth ASLs and APs of the test are for both cases of m = m = 30 , . Its poor performance in APs is partly dueto the small difference between { A ( k )1 } m k =1 and { A ( k )2 } m k =2 we set. However, the performance of T W -SBM test and T W -MNBS test are much better, ASLs of both tests are stable and close to the signiﬁcance level of α = 0 . , whileAPs improve to as n grows. It is also found that when n is not that large, T W -SBM test is slightly more powerful interms of AP than T W -MNBS test. This is not surprising because the networks generated from Graphon are endowedwith an SBM structure.The results of the second experiment using Graphon 2, which is not an SBM, are given in Figure 2. It indicates that thebehaviors of T W -AVG test, χ -type test and N -type test are similar to those in the ﬁrst experiment using Graphon 1and the performance is poor. On the other hand, T W -MNBS test has a superior performance than T W -SBM in bothASL and AP. Speciﬁcally, ASLs of T W -SBM test are away from . , whereas T W -MNBS test still performs wellon both ASL and AP. Moreover, this also indicates that T W -SBM test is sensitive to the network structure especiallydeviation from an SBM. Hence, T W -MNBS test is more robust to the network structure whereas T W -SBM test ispreferable for SBM networks. 10 PREPRINT - J

ULY

6, 2020 TW − MNBS TW − AVG TW − SBM χ Normal0.000.250.500.751.000.05 100 400 700 1000 ( a ) AS L s ( b ) AS L s ( c ) AP s ( d ) AP s Figure 1 ASLs and APs of tests using Graphon for different values of nodes number n , sample sizes m and m . m = m = 30 for (a) and (c) and m = m = 200 for (b) and (d). To assess the performance of

T W -type change-point detection in dynamic networks, we compare its performancebased on MNBS, AVG, and SBM estimators (referred as CP-TWMNBS, CP-TWAVG, CP-TWSBM respectively) tothe graph-based nonparametric testing procedure in Chen and Zhang (2015) referred as CP-GRA detection, and theMNBS-based change-point detect procedure in Zhao et al. (2019) referred as CP-DMNBS detection.Speciﬁcally, using all the above ﬁve methods, we conduct change-point detection experiments under three differentscenarios with zero, one, and three change-points respectively. For all the experiments, we vary the nodes number andthe sample size at n = 100 , , , m = 100 , , and set the signiﬁcance α = 0 . . For each combination of thesample size, nodes number, and the network model, we run 100 Monte Carlo trials. Simultaneously, we also explore theeffect of network sparsity on the performance of change-point detection. For this, we consider the above setting, butscale the link probability P as ρP by a factor ρ = 1 , . , where ρ = 1 is exactly the same as the above setting while ρ = 0 . corresponds to sparser graphs. 11 PREPRINT - J

ULY

6, 2020 TW − MNBS TW − AVG TW − SBM χ Normal0.000.250.500.751.000.05 100 400 700 1000 ( a ) AS L s ( b ) AS L s ( c ) AP s ( d ) AP s Figure 2 ASLs and APs of tests using Graphon for different values of nodes number n , sample sizes m and m . m = m = 30 for (a) and (c) and m = m = 200 for (b) and (d). To study the performance with respect to false positives, we simulate two kinds of dynamic networks { A t } mt =1 withno change-point from Graphon 3 and SMB 1 with θ = 0 respectively. Tables 1 and 2 report the average number ofestimated change-points by using the ﬁve methods .Table 1 Average estimated change-points number ˆ J under no change-point scenarios through Graphon 3. m n ρ CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 0.00 0.00 0.00 0.04 3.75 . . . . . . PREPRINT - J

ULY

6, 2020Table 2 Average estimated change-points number ˆ J under no change-point scenarios through SBM 1. m n ρ CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 4.19 0.02 0.02 0.10 0.00 . . . . . . As one can see, the performance of CP-TWSBM, CP-TWMNBS, and CP-GRA detections perform reasonably well andimproves as n increases. CP-TWAVG detect method performs well in the case of Graphon 3 while experiences heavyinﬂated levels in the case of SBM 1. As for CP-DMNBS detection, the empirical type I error is completely controlled atthe target level 0.05 for SBM 1, but there are some false positives in the case of Graphon 3. We now assess the accuracy of our proposed

T W -type change-point estimators in different scenarios. The dynamicnetworks { A t } mt =1 are designed as follows. For t = 1 , , . . . , m/ , A t is generated from link probability matrix P bySBM 1 with θ = 0 . For t = m/ , . . . , m , A t is generated from P by SBM 1 with θ = − m − / .We adopt Boysen distance suggested in Boysen et al. (2009) as a measurement in the change-point estimation.Speciﬁcally, calculate the distances between the estimated change-point set ˆ J and the true change-point set J as ε ( ˆ J (cid:107)J ) = max b ∈J min a ∈ ˆ J | a − b | and ε ( J (cid:107) ˆ J ) = max b ∈ ˆ J min a ∈J | a − b | .Utilizing CP-TWMNBS, CP-TWAVG, CP-TWSBM, CP-GRA, and CP-DMNBS detections, we estimate the efﬁcientdetect rate (the rate at least one change-point is detected over 100 simulations), the average change-point number overthe efﬁcient detections, and the average Boysen distances over the efﬁcient detections. The corresponding results arelisted in Tables 3–5.Table 3 Average estimated change-points number ˆ J under single change-point scenarios through SBM 1. m n ρ CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 3.43 1.01 1.01 1.07 1.00 . . . . . . PREPRINT - J

ULY

6, 2020Table 4 Average efﬁcient detect rate under single change-point scenarios through SBM 1. m n ρ

CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 1.00 1.00 1.00 1.00 1.00 . . . . . . Table 5 Average Boysen distances ε , ε under single change-point scenarios through SBM 1. m n ρ CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε Note: the dash “-” means there is no change-points detected. PREPRINT - J

ULY

6, 2020Results provided in Tables 3–5 show that CP-TWSBM and CP-TWMNBS detections yield reliable estimates of thenumber of change-points and their locations. When ρ = 1 , CP-TWAVG over-estimates the number of change-points,but it’s interesting that it performs well for sparser case of ρ = 0 . . A possible explanation is that the sparser structureovercomes its inﬂated behavior to some extent. As for CP-GRA and CP-DMNBS detections, the performances of bothmethods are reasonable in dense scenarios, especially CP-DMNBS. However, they are unable to detect any change-pointfor the sparser setting ρ = 0 . in this example. To assess the robustness of our method for change-point detection, we further construct a model with three change-pointsin the networks. We ﬁrst design three types of link probability matrix changes, which we use to build dynamic networkslater. Given a link probability matrix P , deﬁne a changed link probability matrix initialized as P (cid:48) = P . For two givensets M , M ⊂ { , , . . . , n } , for any i ∈ M and j ∈ M , the different types of link probability matrix changes aredeﬁned as follows:(1) Coummunity switching: P (cid:48) i, · = P j, · , P (cid:48)· ,i = P · ,j , P (cid:48) j, · = P i, · , P (cid:48)· ,j = P · ,i .(2) Community merging: P (cid:48) i, · = P j, · , P (cid:48)· ,i = P · ,j .(3) Community changing: Regenerate P (cid:48) i,j from Graphon 4.Then the dynamic networks { A t } mt =1 for multiple change-points are designed as follows. M and M are twosets with (cid:98) n/ (cid:99) nodes randomly chosen from { , , . . . , n } . For t = 1 , , . . . , m/ , A t is generated from P byGraphon 2. For t = m/ , . . . , m/ , A t is generated from P changed from P by community switching. For t = m/ , . . . , m/ , A t is generated from P changed from P by community merging. For t = 3 m/ , . . . , m , A t is generated from P changed from P by community changing.The results are illustrated in Tables 6–8.The reports suggest that CP-TWMNBS performs the best in terms of the number, efﬁciency and accuracy of change-point estimation. CP-TWSBM enjoys reasonably good behavior when m = 100 while encounters some false positiveswhen m increases to . As for CP-TWAVG, although the estimated change-points number ˆ J in Table 6 are not faraway from real value and the efﬁcient detect rates in Table 7 are all equal to , the Boysen distances in Table 8 aresometimes too large to be accepted, i.e., the location error can not be controlled stably.On the other hand, CP-GRA detection suffers greatly under-estimating the change-points, especially when ρ = 0 . ,there is no change-point detected in all cases. It happens similarly to CP-DMNBS detection when ρ = 0 . , soCP-DMNBS is also not the ideal for this scenario even though it is powerful when the networks are dense.Overall, the numerical experiments clearly demonstrate the superior performance of CP-TWMNBS detection over otherdetect methods for all simulation scenarios with CP-TWSBM method coming in second. CP-TWMNBS detectionprovides robust and stable performance across all experiments with more accurate ˆ J , higher efﬁcient detection andsmaller Boysen distances. 15 PREPRINT - J

ULY

6, 2020Table 6 Average estimated change-points number ˆ J under three change-points scenarios. m n ρ CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 3.00 3.00 3.00 0.31 3.00 . . . . . . Table 7 Average efﬁcient detect rate under three change-points scenarios. m n ρ

CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 1.00 1.00 1.00 0.13 1.00 . . . . . . Table 8 Average Boysen distances ε , ε under three change-points scenarios. m n ρ CP-TWAVG CP-TWSBM CP-TWMNBS CP-GRA CP-DMNBS100 100 1 ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε ε Note: the dash “-” means there is no change-points detected. PREPRINT - J

ULY

6, 2020

In this section, we analyze the performance of the proposed

T W -type method for two-sample test and T W -typechange-point detection using two real datasets. The ﬁrst dataset used for the two-sample test comes from the Centers ofBiomedical Research Excellence (COBRE) and the second dataset used for change-point detection is from MIT RealityMining (RM) (Eagle et al., 2009). Raw anatomical and functional scans from 146 subjects of 72 patients with schizophrenia (SCZ) and 74 healthy controls(HCs) can be downloaded from a public database ( http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html ). In this paper, we use the processed connectomics dataset in Relión et al. (2019). After a series of pre-processingsteps, Relión et al. (2019) keeps 54 SCZ and 70 HC subjects for analysis and chooses 264 brain regions of interestas the nodes. For each of the 263 nodes with every other node, they applies Fisher’s R-to-Z transformation to thecross-correlation matrix of Pearson r -values.In our study, we perform the Z-to-R inverse transformation to their dataset to get the original cross-correlation matrix ofPearson r -values, which is denoted as R . To analyze graphical properties of these brain functional networks, we need tocreate an adjacency matrix A from R . We set A ij to be if R ij exceeds a threshold T and A ij to be otherwise. Thereis no generally accepted way to identify an optimal threshold for this graph construction procedure, we decide to set T varied between . and . with step of . .For each threshold T , two situations are considered for the two-sample test. In the ﬁrst situation, we randomly divide HCinto groups with sample sizes m = m = 35 and calculate the average null hypothesis reject rates of T W -MNBStest, T W -AVG test, T W -SBM test, χ -type test, and N -type test through 100 repeated simulations. In the secondsituation, we apply the same test methods above to two groups of SCZ and HC directly and compare their average nullhypothesis reject rates. The results are shown in Tables 9 and 10 respectively.Table 9 Average H reject rate of test over HC group over 100 simulations. T .

30 0 .

35 0 .

40 0 .

45 0 .

50 0 .

55 0 .

60 0 .

65 0 . T W -AVG .

80 0 .

67 0 .

57 0 .

50 0 . T W -SBM T W -MNBS χ -type N -type Table 10 Average H reject rate of test over SCZ and HC groups. T .

30 0 .

35 0 .

40 0 .

45 0 .

50 0 .

55 0 .

60 0 .

65 0 . T W -AVG T W -SBM T W -MNBS χ -type N -type To investigate the performance of the tests, we need to compare the type I error in Table 9 and the power result in Table10 together. Table 9 shows that

T W -type tests based on SBM and AVG have poor performance for the test over HC17 PREPRINT - J

ULY

6, 2020group because the reject rates all exceed . and even equal to . From Table 10, it is found that χ -type test losespower for the test over SCZ and HC groups, where the reject rates are all . Only T W -type test based on MNBS when T = 0 . , . and N -type test when T ≥ . can perform well in both situations. In addition, applying MNBS, weillustrate the adjacency matrices of subject-speciﬁc networks of HC and SCZ groups when T = 0 . in Figure 3. Onecan ﬁnd that the two groups do have differences in the network structure. Nodes N ode s (a) HC Nodes N ode s (b) SCZ Figure 3 Adjacency matrices estimated by MNBS for HC and SCZ groups.

In this section, we apply CP-TWMNBS, CP-TWAVG, CP-TWSBM, CP-GRA, and CP-DMNBS detections to performchange-point detection for a phone-call network data extracted from RM dataset. The data is collected through anexperiment conducted by the MIT Media Laboratory following 106 MIT students and staff using mobile phones withpreinstalled software that can record and send call logs from 2004 to 2005 academic year. Note that this is differentfrom the MIT proximity network data considered in Zhao et al. (2019) which is based on the bluetooth scans instead ofphone calls. In this analysis, we are interested in whether phone call patterns changed during this time, which mayreﬂect a change in relationship among these subjects. 94 of the 106 RM subjects completed the survey, we remainrecords only within these participants and ﬁlter records before / / due to the extreme scarcity of sample beforethat time. Then there remains 81 subjects left and we construct dynamic networks among these subjects by day. Foreach day, construct a network with the subjects as nodes and a link between two subjects if they had at least one callon that day. We encode the network of each day by an adjacency matrix, with for element ( i, j ) if there is an edgebetween subject i and subject j , and otherwise. Thus, there are in total 310 days from / / to / / .The calendar of events is included in the appendix. We claim that an estimated change-point is reasonable if it is atmost three days away from the real dates the event lasts.We ﬁrst choose h = 7 and Figure 4 plots the results of different methods on the dynamic networks. The purple shadowareas mark time intervals from the beginning to the end of events continue on MIT academic calendar 2004–2005, whichcan be used as references for the estimated change-points’ occurrences. The red lines in Figure 4 are the estimatedchange-points applying different detect methods.It turns out that CP-TWAVG and CP-DMNBS detections either do not work well or detect no change-point. CP-TWSBMmethod detects 20 change-points, CP-TWMNBS method detects 19 change-points, while CP-GRA detection detects 12change-points. When comparing the estimated change-points to intervals of calendar events, we see that they align each18 PREPRINT - J

ULY

6, 2020other the best by using CP-TWMNBS detection and then CP-TWSBM detection, whereas there are more estimatedchange-points by CP-GRA detection that can not be explained.Figure 4 Calendar time intervals of events and estimated change-points.However, it’s observed that some of the change-points detected by CP-TWSBM and CP-TWMNBS methods can be alittle trivial. For example, CP-TWMNBS detected a change-point occurred at around 01/09/2004, which is near event“English Evaluate Test for International students” in the calendar. To ignore the less signiﬁcant events, we only considerthe seemingly major events displayed in bold in the calendar as possible reasons for estimated change-points and set h = 14 , which corresponds to 2 weeks. The details are reported in Table 11. The CP-TWMNBS and CP-TWSBMmethods detect 9 change-points, CP-GRA method detects 13 change-points. Notably CP-GRA method still labelsmore trivial change-points away from the important events. Based on the results, it is most likely valid in saying thatCP-TWSBM and CP-TWMNBS detections are more reliable.Table 11 Estimated change-points by different methods for MIT phone data. CP-TWSBM 02/08/2004 08/09/2004 12/10/2004 15/11/2004 27/12/2004 03/02/2005 19/02/2005 23/03/2005 03/05/2005CP-TWMNBS 10/08/2004 27/09/2004 17/10/2004 20/11/2004 05/12/2004 24/12/2004 13/02/2005 10/04/2005 04/05/2005CP-GRA 13/08/2004 01/09/2004 14/09/2004 28/09/2004 13/10/2004 04/11/2004 18/11/2004 02/12/2004 19/12/200409/01/2005 06/03/2005 17/04/2005 05/05/2005

We consider the problem of hypothesis testing on whether two populations of networks deﬁned on a common vertex setare from the same distribution. Two-sample testing on populations of networks is a challenging task especially when thethe number of nodes is large. We propose a general

T W -type test (which is later adapted to a change-point detectionprocedure in dynamic networks), derive its asymptotic distribution and asymptotic power. The test statistics utilizes19 PREPRINT - J

ULY

6, 2020some plugin estimates for the link probability matrices and properties of the resulting tests with various estimatesdiscussed by evaluating and comparing

T W -type tests based on MNBS, AVG, SBM theoretically, and numericallywith both simulated and real data. From the simulation study, we see that the proposed T W -type test based on MNBSperforms the best and yields robust results even when the structure is sparse. In addition, we provide a signiﬁcantmodiﬁcation of the two-sample network test for change-point detection in dynamic networks. Simulation and real dataanalyses show that the procedure is consistent, principled and practically viable. Acknowledgements

The work of Li Chen was supported by the China Scholarship Council under Grant 201806240032. The work of JieZhou was supported in part by the National Natural Science Foundation of China under grants 61374027 and 11871357,and in part by the Sichuan Science and Technology Program under grant 2019YJ0122. Lizhen Lin acknowledges thegenerous support from NSF grants IIS 1663870, DMS Career 1654579 and a DARPA grant N66001-17-1-4041.

A Preliminaries

Proposition A.1 (Hoeffding’s inequality (Hoeffding, 1963)) . If X , X , . . . , X m are independent random variablesand a i ≤ X i ≤ b i ( i = 1 , , . . . , m ) , then for t > P (cid:0) ¯ X − µ ≥ t (cid:1) ≤ exp (cid:40) − m t (cid:80) mi =1 ( b i − a i ) (cid:41) , where ¯ X = m (cid:80) mi =1 X i , µ = E ( ¯ X ) . Proposition A.2 (Bernstein’s inequality (Bernstein, 1946)) . Let X , X , . . . , X m be independent zero-mean randomvariables. Suppose that | X i | ≤ M with probability for all i . Then for all positive t , we have P  m (cid:88) i =1 X i > t  ≤ exp (cid:40) − t (cid:80) mi =1 E ( X i ) + M t (cid:41) . For a sequence of independent Bernoulli random variables where X i ∼ Bernoulli ( p ) , by Proposition A.1 we have P (cid:0) (cid:12)(cid:12) ¯ X − p (cid:12)(cid:12) ≥ t (cid:1) ≤ (cid:8) − mt (cid:9) . Similarly, by Proposition A.2, we have P (cid:0) (cid:12)(cid:12) ¯ X − p (cid:12)(cid:12) > t (cid:1) ≤ (cid:40) − mt p (1 − p ) + t (cid:41) . Lemma A.3 (Asymptotic distributions of λ ( Z ) and λ n ( Z ) ) . For Z deﬁned in (4) , we have n / [ λ ( Z ) − (cid:32) T W , n / [ − λ n ( Z ) − (cid:32) T W . Proof.

Let G be an n × n symetric matrix whose upper diagonal entries are independent normal with mean zero andvariance / ( n − , and zero diagonal entries. Let H G = (cid:112) ( n − /nG , according to Theorem 1.2 in Lee and Yin(2014), n / [ λ ( H G ) − converges to T W in distribution. For convenience and without ambiguity, we also use T W to denote a random variable following the Tracy–Widom law with index 1. Then we have λ ( H G ) = 2 + n − / T W + o p ( n − / ) . PREPRINT - J

ULY

6, 2020Further, λ ( G ) = (cid:114) nn − λ ( H G ) = (cid:2) O n (cid:0) n − (cid:1)(cid:3) λ ( H G ) = 2 + n − / T W + o p ( n − / ) , which is equivalent to n / [ λ ( G ) − (cid:32) T W . Since the ﬁrst and second moments of entries of Z and G are the same, it follows from Theorem 2.4 in Erd˝os et al.(2012) that n / [ λ ( Z ) − and n / [ λ ( G ) − have the same limiting distribution. Therefore, n / [ λ ( Z ) − (cid:32) T W . The same argument applies to λ n ( Z ) . B Proof of Theorem 2.1

Under the null hypothesis H , we have P = P ≡ P , and it’s not difﬁcult to observe that ˆ Z ij = (cid:113) m P ij (1 − P ij ) + m P ij (1 − P ij ) (cid:113) m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) Z ij . (13)Since sup i,j | ˆ P u,ij − P ij | = o p ( n − / ) , (14)for the numerator in (13), utilizing the Taylor Expansion, we have (cid:114) m P ij (1 − P ij ) + 1 m P ij (1 − P ij )= (cid:114) m + m m m P ij (1 − P ij )= (cid:114) m + m m m (cid:104)(cid:113) ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + O n (cid:0) P ij − ˆ P ,ij (cid:1)(cid:105) = (cid:114) m + m m m (cid:104)(cid:113) ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + o p ( n − / ) (cid:105) = (cid:114) m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + 1 m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + (cid:114) m + m m m o p ( n − / ) , where the third equality is obtained by condition (14).Without loss of gernerality, assume ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) ≤ ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) , then (cid:114) m P ij (1 − P ij ) + 1 m P ij (1 − P ij ) (15) ≤ (cid:114) m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + 1 m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + (cid:114) m + m m m o p ( n − / ) . Similarily, we have (cid:114) m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + 1 m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) ≤ (cid:114) m P ij (1 − P ij ) + 1 m P ij (1 − P ij ) + (cid:114) m + m m m o p ( n − / ) . (16)21 PREPRINT - J

ULY

6, 2020From (15) and (16), (cid:114) m P ij (1 − P ij ) + 1 m P ij (1 − P ij )= (cid:114) m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + 1 m ˆ P ,ij (cid:0) − ˆ P ,ij (cid:1) + (cid:114) m + m m m o p ( n − / ) . (17)Combining (17) with (13), we have ˆ Z − Z = M ◦ Z, (18)where M is an n × n matrix whose elements M ij = o p ( n − / ) .One has (cid:107) ˆ Z − Z (cid:107) op = (cid:107) M ◦ Z (cid:107) op = sup (cid:107) x (cid:107) =1 ,x ∈ R n ( M ◦ Z ) x = sup (cid:107) x (cid:107) =1 ,x ∈ R n (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) n (cid:88) i =1  n (cid:88) j =1 M ij Z ij x j  . (19)The key technique here is to factor out the M ij in (19) by some bounded constant. Actually, for each i = 1 , · · · , n ,the sum term in (19), (cid:80) nj =1 M ij Z ij x j , can be divided into two parts in terms of j . Speciﬁcally, let { , · · · , n } = S i ∪ S i , S i ∩ S i = ∅ . If j ∈ S i , Z ij x j ≥ , otherwise if j ∈ S i , Z ij x j ≤ . Then we have n (cid:88) j =1 M ij Z ij x j = (cid:88) j ∈ S i M ij Z ij x j + (cid:88) j ∈ S i M ij Z ij x j ≤ sup i,j M ij (cid:88) j ∈ S i Z ij x j + inf i,j M ij (cid:88) j ∈ S i Z ij x j . (20)Next, we consider (20) by looking at the two cases lim n →∞ (cid:80) nj =1 Z ij x j (cid:54) = 0 and lim n →∞ (cid:80) nj =1 Z ij x j = 0 separately.If lim n →∞ (cid:80) nj =1 Z ij x j (cid:54) = 0 , let ˜ M i = sup i,j M ij (cid:80) j ∈ S i Z ij x j + inf i,j M ij (cid:80) j ∈ S i Z ij x j (cid:80) nj =1 Z ij x j . (21)We now try to show that ˜ M i = o p ( n − / ) . If S i = ∅ or S i = ∅ , the result holds immediately since M ij = o p ( n − / ) .Otherwise, it sufﬁces to show that (cid:80) j ∈ Si Z ij x j (cid:80) nj =1 Z ij x j and (cid:80) j ∈ Si Z ij x j (cid:80) nj =1 Z ij x j are both bounded by some constants when n → ∞ .This comes from ﬁrst noting that (cid:80) j ∈ S i Z ij x j and (cid:80) j ∈ S i Z ij x j are with the same order following < | S i | min j ∈ S i Z ij x j −| S i | max j ∈ S i Z ij x j ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) j ∈ S i Z ij x j (cid:80) j ∈ S i Z ij x j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | S i | max j ∈ S i Z ij x j −| S i | min j ∈ S i Z ij x j ≤ C , where C is a constant. The last inequality holds by noting that | S i | = a i n and | S i | = b i n with a i and b i two numberssatisfying a i , b i ∈ (0 , , a i + b i = 1 , in a addition, max j ∈ S i Z ij x j and min j ∈ S i Z ij x j have the same formula interms of n , m and m . Therefore, we can write lim n →∞ | (cid:88) j ∈ S i Z ij x j | = C lim n →∞ | (cid:88) j ∈ S i Z ij x j | , PREPRINT - J

ULY

6, 2020with some constant C (cid:54) = 1 , which implies lim n →∞ | (cid:80) j ∈ S i Z ij x j || (cid:80) nj =1 Z ij x j | = lim n →∞ | C (cid:80) j ∈ S i Z ij x j || (1 − C ) (cid:80) nj ∈ S i Z ij x j | = (cid:12)(cid:12)(cid:12)(cid:12) C − C (cid:12)(cid:12)(cid:12)(cid:12) . Similarly, lim n →∞ | (cid:80) j ∈ Si Z ij x j || (cid:80) nj =1 Z ij x j | can be bounded too.When n tends to inﬁnity, plugging (21) in (20), we have n (cid:88) j =1 M ij Z ij x j ≤ ˜ M i n (cid:88) j =1 Z ij x j . For (cid:80) nj =1 M ij Z ij x j ≥ , we have  n (cid:88) j =1 M ij Z ij x j  ≤ ˜ M i  n (cid:88) j =1 Z ij x j  , n → ∞ . For (cid:80) nj =1 M ij Z ij x j ≤ , we have − (cid:80) nj =1 M ij Z ij x j ≥ , then it’s obtained similarly that  n (cid:88) j =1 M ij Z ij x j  ≤ ˜˜ M i  n (cid:88) j =1 Z ij x j  , n → ∞ , where ˜˜ M i = o p ( n − / ) .For convenience and without loss of generality, we use ˜ M i for both of the above two cases instead of ˜ M i and ˜˜ M i separately. Therefore,  n (cid:88) j =1 M ij Z ij x j  ≤ ˜ M i  n (cid:88) j =1 Z ij x j  , n → ∞ . (22)If lim n →∞ (cid:80) nj =1 Z ij x j = 0 , from (20), as n → ∞ we have n (cid:88) j =1 M ij Z ij x j ≤ sup i,j M ij (cid:88) j ∈ S i Z ij x j − inf i,j M ij (cid:88) j ∈ S i Z ij x j (23) ≤ (cid:16) sup i,j M ij − inf i,j M ij (cid:17) (cid:88) j ∈ S i Z ij x j . (24)Obviously, sup i,j M ij − inf i,j M ij = o p ( n − / ) . Then we have n (cid:88) j =1 M ij Z ij x j ≤ ˜ M i (cid:88) j ∈ S i Z ij x j , n → ∞ where ˜ M i = o p ( n − / ) which may not necessarily be the same with that one in (21). Similarly, we can always have  n (cid:88) j =1 M ij Z ij x j  ≤ ˜ M i  (cid:88) j ∈ S i Z ij x j  , n → ∞ . (25)23 PREPRINT - J

ULY

6, 2020Next, for the ﬁrst case of lim n →∞ (cid:80) nj =1 Z ij x j (cid:54) = 0 , plugging inequality (22) in (19), when n → ∞ , we have (cid:107) ˆ Z − Z (cid:107) op ≤ sup (cid:107) x (cid:107) =1 ,x ∈ R n (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) n (cid:88) i =1 ˜ M i  n (cid:88) j =1 Z ij x j  ≤ sup (cid:107) x (cid:107) =1 ,x ∈ R n sup i | ˜ M i | (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) n (cid:88) i =1  n (cid:88) j =1 Z ij x j  = o p ( n − / ) (cid:107) Z (cid:107) op . For the case when lim n →∞ (cid:80) nj =1 Z ij x j = 0 , plugging inequality (25) in (19), when n → ∞ , we have (cid:107) ˆ Z − Z (cid:107) op ≤ sup (cid:107) x (cid:107) =1 ,x ∈ R n (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) n (cid:88) i =1 ˜ M i  (cid:88) j ∈ S i Z ij x j  ≤ sup (cid:107) x (cid:107) =1 ,x ∈ R n sup i | ˜ M i | (cid:118)(cid:117)(cid:117)(cid:117)(cid:116) n (cid:88) i =1  (cid:88) j ∈ S i Z ij x j  (26) ≤ o p ( n − / ) (cid:107) Z (cid:107) op . The last inequality follows by noting that the vector x in (26) is a special case where x j = 0 for all j ∈ S i .In addition, from Corollary 2.3.6 in Tao (2012), the norm of Z in (4) satisties (cid:107) Z (cid:107) op = O p (1) . So (cid:107) ˆ Z − Z (cid:107) op ≤ o p ( n − / ) . Then | λ ( ˆ Z ) − λ ( Z ) | ≤ o p ( n − / ) . (27)Combining (27) with Lemma A.3, we have n / [ λ ( ˆ Z ) − (cid:32) T W . Similarly, we can prove n / [ − λ n ( ˆ Z ) − (cid:32) T W . C Proof of Corollary 2.3 P ( T T W ≥ τ α/ ) ≤ P (cid:0) n / [ λ ( ˆ Z ) − ≥ τ α/ (cid:1) + P (cid:0) n / [ − λ n ( ˆ Z ) − ≥ τ α/ (cid:1) = α/ o n (1) + α/ o n (1)= α + o n (1) . PREPRINT - J

ULY

6, 2020

D Proof of of Corollary 2.4

Deﬁne a matrix W ∈ R n × n with zero diagonal and for any i (cid:54) = j , W ij = ( P ,ij − P ,ij ) − ( ¯ A ,ij − ¯ A ,ij ) (cid:114) ( n − (cid:104) m P ,ij (1 − P ,ij ) + m P ,ij (1 − P ,ij ) (cid:105) . Recall the deﬁnitions of Z, ˆ Z and ˜ Z given by (4), (5) and (9) respectively, from (18), it is easy to get ˆ Z ij = [1 + o p ( n − / )]( ˜ Z ij − W ij ) . Thus [1 + o p ( n − / )] ˜ Z ij = ˆ Z ij + [1 + o p ( n − / )] W ij . This implies ˆ Z = ˜ Z ◦ ( J + D ) − W ◦ ( J + D ) , where J is an n × n matrix with every element equal to 1, D is an n × n matrix with elements D ij = o p ( n − / ) .Similarly with proof of Theorem 2.1, we can get σ [ ˜ Z ◦ ( J + D )] = σ ( ˜ Z ) , σ [ W ◦ ( J + D )] = σ ( W ) with probability1 as n → ∞ .Applying the triangle inequality of spectral norm, we have σ ( ˆ Z ) ≥ σ ˜( Z ) − σ ( W ) with probability as n tends to inﬁnity. Noting that W is a mean zero matrix whose singular value can be bounded byusing the T W asymptotic distribution. Hence, for any β ∈ (0 , , P (cid:0) σ ( W ) ≤ n − / τ β (cid:1) = 1 − P (cid:0) σ ( W ) > n − / τ β (cid:1) ≥ − (cid:2) P (cid:0) λ ( W ) > n − / τ β (cid:1) + P (cid:0) − λ n ( W ) > n − / τ β (cid:1)(cid:3) = 1 − β + o n (1) . (28)Set τ β = n / [ σ ( ˜ Z ) − − τ α/ , and plug this in (28), then we have − β + o n (1) ≤ P (cid:16) σ ( W ) ≤ n − / (cid:8) n / [ σ ( ˜ Z ) − − τ α/ (cid:9)(cid:17) = P (cid:16) n − / τ α/ ≤ σ ( ˜ Z ) − σ ( W ) (cid:17) ≤ P (cid:16) n − / τ α/ ≤ σ ( ˆ Z ) (cid:17) = P (cid:0) n / [ σ ( ˆ Z ) − ≥ τ α/ (cid:1) = P ( T T W ≥ τ α/ ) . Observe that if n − / [ σ ( ˜ Z ) − − ≤ o n (1) , for a ﬁxed α ∈ (0 , , we have τ − β = o n (1) , that is β = o n (1) .Therefore, P ( T T W ≥ τ α/ ) = 1 + o n (1) . PREPRINT - J

ULY

6, 2020

E Proof of Theorem 2.6

Under the null hypothesis, P = P ≡ P . From the deﬁnition of ˆ P u given by (10), the error of ˆ P u,ij can be calculatedby the triangle inequality as | ˆ P u,ij − P ij | = | ˆ P u,ij − E ( ˆ P u,ij ) + E ( ˆ P u,ij ) − P ij |≤ | ˆ P u,ij − E ( ˆ P u,ij ) | + | E ( ˆ P u,ij ) − P ij | , (29)where E ( ˆ P u,ij ) = 12 E (cid:32) (cid:80) i (cid:48) ∈N u,i ¯ A u,i (cid:48) j |N u,i | + (cid:80) j (cid:48) ∈N u,j ¯ A u,j (cid:48) i |N u,j | (cid:33) = (cid:80) i (cid:48) ∈N u,i P i (cid:48) j |N u,i | + (cid:80) j (cid:48) ∈N u,j P j (cid:48) i |N u,j | . For the ﬁrst term in (29), we have | ˆ P u,ij − E ( ˆ P u,ij ) | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) i (cid:48) ∈N u,i (cid:0) ¯ A u,i (cid:48) j − P i (cid:48) j (cid:1) |N u,i | + (cid:80) j (cid:48) ∈N u,j (cid:0) ¯ A u,j (cid:48) i − P j (cid:48) i (cid:1) |N u,j | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) m u k =1 (cid:80) i (cid:48) ∈N u,i (cid:16) A ( k ) i (cid:48) j − P i (cid:48) j (cid:17) |N u,i | m u + (cid:80) m u k =1 (cid:80) j (cid:48) ∈N u,j (cid:16) A ( k ) j (cid:48) i − P j (cid:48) i (cid:17) |N u,j | m u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . For the second term in (29), | E ( ˆ P u,ij ) − P ij | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) i (cid:48) ∈N u,i (cid:0) P i (cid:48) j − P ij (cid:1) |N u,i | + (cid:80) j (cid:48) ∈N u,j (cid:0) P j (cid:48) i − P ji (cid:1) |N u,j | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . So, | ˆ P u,ij − P ij | ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) m u k =1 (cid:80) i (cid:48) ∈N u,i (cid:16) A ( k ) i (cid:48) j − P i (cid:48) j (cid:17) |N u,i | m u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) m u k =1 (cid:80) j (cid:48) ∈N u,j (cid:16) A ( k ) j (cid:48) i − P j (cid:48) i (cid:17) |N u,j | m u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) i (cid:48) ∈N u,i (cid:0) P u,i (cid:48) j − P ij (cid:1) |N u,i | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) j (cid:48) ∈N u,j (cid:0) P j (cid:48) i − P ji (cid:1) |N u,j | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) . (30)We ﬁrst calculate the ﬁrst absolute term in (30). From Bernstein’s inequality, for any t > , P (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) m u (cid:88) k =1 (cid:80) i (cid:48) ∈N u,i (cid:16) A ( k ) i (cid:48) j − P i (cid:48) j (cid:17) |N u,i | m u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) > t  ≤  − t (cid:80) m u k =1 (cid:80) i (cid:48) ∈N u,i P i (cid:48) j (1 − P i (cid:48) j )4 |N u,i | m u + t m u  ≤  − |N u,i | m u t + |N u,i | t  ≤ max (cid:40) (cid:8) − |N u,i | m u t (cid:9) , (cid:26) − m u t (cid:27)(cid:41) . PREPRINT - J

ULY

6, 2020Taking t = ( |N u,i | m u ) − / or t = m − u , we get (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) m u k =1 (cid:80) i (cid:48) ∈N u,i (cid:16) A ( k ) i (cid:48) j − P i (cid:48) j (cid:17) |N u,i | m u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = max (cid:110) O p (cid:0) ( |N u,i | m u ) − / (cid:1) , O p ( m − u ) (cid:111) . Similarly, for the second absolute term in (30), we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) m u k =1 (cid:80) j (cid:48) ∈N u,j (cid:16) A ( k ) j (cid:48) i − P j (cid:48) i (cid:17) |N u,j | m u (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = max (cid:110) O p (cid:0) ( |N u,j | m u ) − / (cid:1) , O p ( m − u ) (cid:111) . As for the third term in (30), we know that P ij = f ( ξ i , ξ j ) with ξ i i.i.d ∼ Uniform [0 , for any i, j = 1 , · · · , n . FromLemma 3.2 and Lemma 3.3 in Zhao et al. (2019), for any i (cid:48) ∈ N u,i , ξ i (cid:48) should be in the neighborhood of ξ i in whichpiecewise Lipschitz condition for the Lipschitz graphon function f : [0 , → [0 , is satisﬁed, such that | P i (cid:48) l − P il | ≤ | f ( ξ i (cid:48) , ξ l ) − f ( ξ i , ξ l ) | ≤ L | ξ i (cid:48) − ξ l | ≤ C log nn / ω u , where L and C are constants. Therefore, we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) i (cid:48) ∈N u,i (cid:0) P i (cid:48) j − P ij (cid:1) |N u,i | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C log n n / ω u = O n (cid:18) log nn / ω u (cid:19) . For the same reason, the following holds for the last term in (30). (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:80) j (cid:48) ∈N u,j (cid:0) P j (cid:48) i − P ji (cid:1) |N u,j | (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C log n n / ω u = O n (cid:18) log nn / ω u (cid:19) . Therefore, | ˆ P u,ij − P u,ij | ≤ max (cid:110) O p (cid:0) ( |N u,i | m u ) − / (cid:1) , O p ( m − u ) (cid:111) + O n (cid:18) log nn / ω u (cid:19) . If ω u = n / , α u ≥ / , according to (11), |N u,i | ≥ B u log n, |N u,j | ≥ B u log n . So m u is easy to exceed n andthus m u ≥ |N u,i | , then sup i,j | ˆ P u,ij − P ij | ≤ O p (cid:0) ( |N u,i | m u ) − / (cid:1) + O n ( n − log n ) ≤ o p ( n − / ) . If ω u = ( m u log n ) / , α u > / , according to (11), |N u,i | ≥ B u (cid:16) n log nm u (cid:17) / , |N u,j | ≥ B u (cid:16) n log nm u (cid:17) / . m u issmall and can attain m u ≤ |N u,i | , then sup i,j | ˆ P u,ij − P ij | ≤ O p ( m − u ) + O n (cid:0) ( m u n ) − / (log n ) / (cid:1) ≤ o p ( n − / ) . The

T W convergence is obtained by applying Theorem 2.1.27 PREPRINT - J

ULY

6, 2020

F Proof of Theorem 3.1

For any t that is not a change-point, since t ∈ J , we have P ( T T W ( t, h ) > (cid:77) T TW ) = 1 − P ( T T W ( t, h ) ≤ (cid:77) T TW )= 1 − (cid:89) t (cid:48) ∈ ( t − h,t + h ) P ( T T W ( t (cid:48) , h ) ≤ (cid:77) T TW )= 1 − (cid:89) t (cid:48) ∈ ( t − h,t + h ) (cid:2) − P ( T T W ( t (cid:48) , h ) > (cid:77) T TW ) (cid:3) ≤ − (cid:89) t (cid:48) ∈ ( t − h,t + h ) (cid:2) − P ( T T W ( t (cid:48) , h ) > τ α ) (cid:3) ≤ − (1 − α ) h + o n (1)= 1 /n + o n (1) → . For any t that is a true change-point, under the alternative hypothesis, n / [ δ ( t, h ) − ≥ τ α . We have P ( T T W ( t, h ) > (cid:77) T TW ) = P (cid:16) n / (cid:8) σ (cid:2) ˆ Z ( t, h ) (cid:3) − (cid:9) > n / [ δ ( t, h ) − − τ α (cid:17) = P (cid:0) σ (cid:2) ˆ Z ( t, h ) (cid:3) > δ ( t, h ) − − n − / τ α (cid:1) . (31)Assume P ( t, h ) and P ( t, h ) are the true link probability matrices of groups { A i } ti = t − h +1 and { A i } t + hi = t +1 . For proofconvenience later, we denote matrices B ( t, h ) , B ( t, h ) , V ( t, h ) all with zero diagonals and for all i (cid:54) = j , B ,ij ( t, h ) = P ,ij ( t, h ) − P ,ij ( t, h ) (cid:115) ( n − (cid:26) h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105) + h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105) (cid:27) ,B ,ij ( t, h ) = [ P ,ij ( t, h ) − P ,ij ( t, h )] − [ ¯ A ,ij ( t, h ) − ¯ A ,ij ( t, h )] (cid:115) ( n − (cid:26) h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105) + h ˆ P ,ij ( t, h ) (cid:104) − ˆ P ,ij ( t, h ) (cid:105) (cid:27) ,V ,ij ( t, h ) = [ P ,i,j ( t, h ) − P ,ij ( t, h )] − [ ¯ A ,ij ( t, h ) − ¯ A ,ij ( t, h )] (cid:115) ( n − (cid:26) h P ,ij ( t, h ) (cid:2) − P ,ij ( t, h ) (cid:3) + h P ,ij ( t, h ) (cid:2) − P ,ij ( t, h ) (cid:3) (cid:27) . Then the lower bound of σ ( t, h ) can be obtained: σ (cid:2) ˆ Z ( t, h ) (cid:3) ≥ σ [ B ( t, h )] − σ [ B ( t, h )]= (cid:8) σ [ V ( t, h )] − σ [ V ( t, h )] (cid:9) [1 + o p ( n − / )] ≥ δ ( t, h ) − − n − / τ α , with probability at most − α + o n (1) . The last inequality follows by noting that V ( t, h ) is a generalized Wignermatrix. Similarly with proof of Lemma A.3, we have P (cid:0) n / (cid:8) σ [ V ( t, h )] − (cid:9) ≤ τ α (cid:1) ≥ − α + o n (1) . Combiningthis with (31), we have P ( T T W ( t, h ) > (cid:77) T TW ) = 1 − α + o n (1) = (1 − /n ) / (2 h ) + o n (1) → . The above result implies that with probability of 1, all and only the change-points will be selected at the thresholdingsteps. Therefore, we have lim n →∞ P (cid:0) J = ˆ J (cid:1) = 1 . PREPRINT - J

ULY

6, 2020

G Academic calendar of MIT 2004–2005

The academic calendar of MIT we use in this paper is illustrated as follows.Table 12 Academic calendar of MIT from July 20, 2004 to June 14, 2005.

Date EventAugust 6, 2004

Deadline for doctoral students to submit application for Fall Term Non-Resident status; Thesis duefor September degree candidates .August 12, 2004

Continuing students ﬁnal deadline to pre-reg on-line .August 13, 2004

Last day to go off the September degree list .August 16–17, 2004 Summer Session Final Exam Period.August 23, 2004

Grades due .August 27, 2004 Term Summaries of Summer Session Grades.August 30, 2004 Graduate Student Orientation activities begin.August 31, 2004 English Evaluation Test for International students.September 6, 2004 Labor Day–Holiday.September 7, 2004

Registration day .September 8, 2004

First day of classes .September 9–17, 2004 Physical Education Petition Period.September 10, 2004

Degree application deadline .September 14, 2004 Committee on Graduate School Policy Meeting.September 15, 2004 Faculty ofﬁcers recommend degrees to Corporation.September 24, 2004

Minor completion date .September 30, 2004 Last day to sing up family health insurance or waive individual coverage.October 1, 2004 Deadline for completing Harvard cross-registration.October 8, 2004

Last day to add subjects to Registration .October 11, 2004 Columbus Day–Holiday.October 15–17, 2004 Family Weekend.October 2, 20046 Second quarter Physical Education classes begin.November 1, 2004 Half-term subjects offered in second half of term begin.November 11, 2004 Veteran’s Day–Holiday.November 17, 2004

Last day to cancel subjects from Registration .November 25–26, 2004 Thanksgiving Vacation–Holiday.December 1

On-line pre-registration for Spring Term begins .December 3

Subjects with no ﬁnal/ﬁnal exam .December 9, 2004

Last day of classes .December 10, 2004 Last day to submit or change Advanced Degree Thesis Title.December 13–17, 2004

Final exam period .December 14–22, 2004

Grade deadline .December 18, 2004 Winter Vacation begins–Holiday.December 30, 2004

Spring pre-registration deadline .January 2, 2005 Winter Vacation ends.January 3, 2005

Deadline for doctoral students to submit applications for Spring Term Non-Resident status .January 6, 2005 Term Summaries of Fall Term Grades.January 7, 2005

Thesis due .January 10, 2005 Second-Year and Third-Year Grades Meeting.January 11, 2005 Fourth-Year Grades meeting; Committee on Graduate School Policy Meeting.January 13, 2005

Final deadline for continuing students to pre-reg on-line .January 14, 2005

Thesis due .January 17, 2005 Martin Luther King, Jr. Day–Holiday.January 19–20, 2005 C.A.P. deferred action meeting.January 26, 2005 English Evaluation Test for International students.January 26–28, 2005 Some advanced standing exams and postponed ﬁnals.January 28, 2005 Last day of January Independent Activities Period.January 31, 2005

Registration day .February 1, 2005

First day of classes .February 2–11, 2005 Physical Education Petition Period.February 3, 2005

Grades due .February 4, 2005

Registration deadline .February 7, 2005 Term summaries of Grades for IAP.February 8, 2005 Committee on Graduate School Policy Meeting.February 11, 2005 C.A.P. February Degree Candidates Meeting.February 16, 2005 Faculty Ofﬁcers recommend degrees to Corporation.February 18, 2005

Minor completion date .February 21, 2005 Presidents Day–Holiday.February 22, 2005

Monday schedule of classes to be held . PREPRINT - J

ULY

6, 2020

Date EventFebruary 28, 2005 Last day to sing up for family health insurance or waive individual coverage.March 4, 2005

Last day to add subjects to Registration .March 21–25, 2005 Spring Vacation–Holiday.March 28, 2005 Half-term subjects offered in second half of term begin.March 30, 2005 Fourth quarter Physical Education classes begin.April 1, 2005 Last day to submit or change Advanced Degree Thesis Title.April 7–10, 2005 Campus Preview Weekend.April 18–19, 2005 Patriots Day–Holiday.April 21, 2005

Last day to cancel subjects from Registration .April 29, 2005

Thesis due .May 2, 2005

On-line pre-registration for Fall Term and Summer Session begins .May 6, 2005

Subjects with no ﬁnal/ﬁnal exam .May 12, 2005

Last day of classes .May 16–20, 2005

Final exam week .May 17–24, 2005

Grade deadline .May 20, 2005

Last day to go off the June degree list .May 26, 2005

Department grades meetings .May 27, 2005 Fourth-Year Grades Meeting.May 30, 2005 Memorial Day–Holiday.May 31, 2005

Fall pre-registration deadline .June 1, 2005 First-Year Grades Meeting.June 2, 2005 Doctoral Hooding Ceremony.June 3, 2005

Commencement .June 14, 2005 C.A.P. deferred action meeting.

References

Amini, A. A. and Levina, E. (2018). On semideﬁnite relaxations for the block model.

The Annals of Statistics ,46(1):149–179.Ball, B., Karrer, B., and Newman, M. E. J. (2011). Efﬁcient and principled method for detecting communities innetworks.

Physical Review E , 84(3). Art. ID 036103.Bassett, D. S., Bullmore, E., Verchinski, B. A., Mattay, V. S., Weinberger, D. R., and Meyer-Lindenberg, A. (2008).Hierarchical organization of human cortical networks in health and schizophrenia.

The Journal of Neuroscience ,28(37):9239–9248.Bernstein, S. (1946).

The Theory of Probabilities . Gastehizdat Publishing House, Moscow, Soviet Union.Bhattacharya, R. and Lin, L. (2017). Omnibus CLTs for Fréchet means and nonparametric inference on non-euclideanspaces.

The Proceedings of the American Mathematical Society , 145:413–428.Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities.

Proceedings of the National Academy of Sciences of the United States of America , 106(50):21068–21073.Bounova, G. and de Weck, O. (2012). Overview of metrics and their correlation patterns for multiple-metric topologyanalysis on heterogeneous graph ensembles.

Physical Review E , 85. Art. ID 016117.Boysen, L., Kempe, A., Liebscher, V., Munk, A., and Wittich, O. (2009). Consistencies and rates of convergence ofjump-penalized least squares estimators.

The Annals of Statistics , 37(1):157–183.Cai, T., Li, H., Ma, J., and Xia, Y. (2019). Differential Markov random ﬁeld analysis with an application to detectingdifferential microbial community networks.

Biometrika , 106(2):401–416.Chen, A., Cao, J., and Bu, T. (2010). Network tomography: Identiﬁability and Fourier domain estimation.

IEEETransactions on Signal Processing , 58(12):6029–6039.Chen, H. and Zhang, N. (2015). Graph-based change-point detection.

The Annals of Statistics , 43(1):139–176.30

PREPRINT - J

ULY

6, 2020Chen, J. and Yuan, B. (2006). Detecting functional modules in the yeast protein–protein interaction network.

Bioinfor-matics , 22(18):2283–2290.Cline, M. S., Smoot, M., Cerami, E., Kuchinsky, A., Landys, N., Workman, C., Christmas, R., Avila-Campilo, I.,Creech, M., Gross, B., Hanspers, K., Isserlin, R., Kelley, R., Killcoyne, S., Lotia, S., Maere, S., Morris, J., Ono,K., Pavlovic, V., Pico, A. R., Vailaya, A., Wang, P.-L., Adler, A., Conklin, B. R., Hood, L., Kuiper, M., Sander,C., Schmulevich, I., Schwikowski, B., Warner, G. J., Ideker, T., and Bader, G. D. (2007). Integration of biologicalnetworks and gene expression data using Cytoscape.

Nature Protocols , 2(10):2366–2382.Decelle, A., Krzakala, F., Moore, C., and Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model formodular networks and its algorithmic applications.

Physical Review E , 84(6). Art. ID 066106.Durante, D., Dunson, D. B., and Vogelstein, J. T. (2017). Nonparametric Bayes modeling of populations of networks.

Journal of the American Statistical Association , 112(520):1516–1530.Eagle, N., Pentland, A. S., and Lazer, D. (2009). Inferring friendship network structure by using mobile phone data.

Proceedings of the National Academy of Sciences of the United States of America , 106(36):15274–15278.Erd˝os, L., Yau, H.-T., and Yin, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices.

Advances inMathematics , 229(3):1435–1515.Erd˝os, P. and Rényi, A. (1959). On random graphs. I.

Publicationes Mathematicae , 6:290–297.Fréchet, M. (1948). Les éléments aléatoires de nature quelconque dans un espace distancié.

Annales de L’Institut HenriPoincaré , 10(4):215–310.Ghoshdastidar, D., Gutzeit, M., Carpentier, A., and von Luxburg, U. (2017). Two-sample hypothesis testing forinhomogeneous random graphs. arXiv:1707.00833 .Ghoshdastidar, D. and von Luxburg, U. (2018). Practical methods for graph two-sample testing. In

Advances in NeuralInformation Processing Systems , pages 3019–3028, Montréal, Canada.Ginestet, C. E., Li, J., Balanchandran, P., Rosenberg, S., and Kolaczyk, E. D. (2017). Hypothesis testing for networkdata in functional neuroimaging.

The Annals of Applied Statistics , 11(2):725–750.Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables.

Journal of the AmericanStatistical Association , 58(301):13–30.Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002). Latent space approaches to social network analysis.

Journal ofthe American Statistical Association , 97(460):1090–1098.Holland, P. W., Laskey, K. B., and Leinhardt, S. (1983). Stochastic blockmodels: First steps.

Social Networks ,5(2):109–137.Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks.

PhysicalReview E , 83. Art. ID 016107.Kolaczyk, E., Lin, L., Rosenberg, S., Xu, J., and Walters, J. (2017). Averages of unlabeled networks: Geometriccharacterization and asymptotic behavior. arXiv:1709.02793 .Kossinets, G. and Watts, D. J. (2006). Empirical analysis of an evolving social network.

Science , 311(5757):88–90.31

PREPRINT - J

ULY

6, 2020Kulig, A., Dro˙zd˙z, S., Kwapie´n, J., and O´swi˛ecimka, P. (2015). Modeling the average shortest-path length in growth ofword-adjacency networks.

Physical Review E , 91(3). Art. ID 032810.Lee, J. O. and Yin, J. (2014). A necessary and sufﬁcient condition for edge universality of wigner matrices.

DukeMathematical Journal , 163(1):117–173.Lei, J. (2016). A goodness-of-ﬁt test for stochastic block models.

The Annals of Statistics , 44(1):401–424.Leonardi, N. and Van De Ville, D. (2013). Tight wavelet frames on multislice graphs.

IEEE Transactions on SignalProcessing , 61(13):3357–3367.Lovász, L. (2012).

Large Networks and Graph Limits . American Mathematical Society, Providence, RI, USA.Mukherjee, S. S., Sarkar, P., and Lin, L. (2017). On clustering network-valued data. In

Advances in Neural InformationProcessing Systems , pages 7071–7081, Long Beach, CA, USA.Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect dna copy number variations.

TheAnnals of Applied Statistics , 6(3):1306–1326.Relión, J. D. A., Kessler, D., Levina, E., and Taylor, S. F. (2019). Network classiﬁcation with applications to brainconnectomics.

The Annals of Applied Statistics , 13(3):1648–1677.Rohe, K., Chatterjee, S., and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic block model.

TheAnnals of Statistics , 39(4):1878–1915.Snijders, T. A. B. and Baerveldt, C. (2003). A multilevel network study of the effects of delinquent behavior onfriendship evolution.

Journal of Mathematical Sociology , 27(2-3):123–151.Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y., and Priebe, C. E. (2017). A semiparametric two-samplehypothesis testing problem for random graphs.

Journal of Computational and Graphical Statistics , 26(2):344–354.Tao, T. (2012).

Topics in Random Matrix Theory . American Mathematical Society, Providence, RI, USA.Tracy, C. A. and Widom, H. (1996). On orthogonal and symplectic matrix ensembles.

Communications in MathematicalPhysics , 177(3):727–754.von Luxburg, U. (2007). A tutorial on spectral clustering.

Statistics and Computing , 17(4):395–416.Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. arXiv:1309.5936 .Zhang, B., Li, H., Riggins, R. B., Zhan, M., Xuan, J., Zhang, Z., Hoffman, E. P., Clarke, R., and Wang, Y. (2009).Differential dependency network analysis to identify condition-speciﬁc topological changes in biological networks.

Bioinformatics , 25(4):526–532.Zhang, Y., Levina, E., and Zhu, J. (2017). Estimating network edge probabilities by neighbourhood smoothing.

Biometrika , 104(4):771–783.Zhao, Z., Chen, L., and Lin, L. (2019). Change-point detection in dynamic networks via graphon estimation. arXiv:1908.01823 .Zou, C., Yin, G., Feng, L., and Wang, Z. (2014). Nonparametric maximum likelihood approach to multiple change-pointproblems.