Contiguity and non-reconstruction results for planted partition models: the dense case
Debapratim Banerjee
Dept. of Statistics, University of Pennsylvania
[email protected]
November 15, 2016
Abstract
We consider the two block stochastic block model on $n$ nodes with asymptotically equal cluster sizes. The connection probabilities within and between clusters are denoted by $p_n := \frac{a_n}{n}$ and $q_n := \frac{b_n}{n}$ respectively. Mossel et al. [25] considered the case when $a_n = a$ and $b_n = b$ are fixed. They proved that the probability models of the stochastic block model and of the Erdős–Rényi graph with the same average degree are mutually contiguous whenever $(a-b)^2 < 2(a+b)$ and are asymptotically singular whenever $(a-b)^2 > 2(a+b)$. Mossel et al. [25] also proved that when $(a-b)^2 < 2(a+b)$ no algorithm is able to find an estimate of the labeling of the nodes which is positively correlated with the true labeling. It is natural to ask what happens when $a_n$ and $b_n$ both grow to infinity. We prove that their results extend to the case when $a_n = o(n)$ and $b_n = o(n)$. We also consider the case when $\frac{a_n}{n} \to p \in (0,1)$ and $(a_n - b_n) = \Theta(\sqrt{a_n+b_n})$; observe that in this case $\frac{b_n}{n} \to p$ also. We show that here the models are mutually contiguous if $(a_n-b_n)^2 < 2(1-p)(a_n+b_n)$ and asymptotically singular if $(a_n-b_n)^2 > 2(1-p)(a_n+b_n)$. Further, we prove that it is impossible to find an estimate of the labeling of the nodes which is positively correlated with the true labeling whenever $(a_n-b_n)^2 < 2(1-p)(a_n+b_n)$. The results of this paper justify the negative part of a conjecture made in Decelle et al. (2011) [15] for dense graphs.

1 Introduction

In the last few years the stochastic block model has been one of the most active domains of modern research in statistics, computer science and many other related fields. In general, a stochastic block model is a network with a hidden community structure where the nodes within the communities are expected to be connected in a different manner than the nodes between the communities. This model arises naturally in many problems of statistics, machine learning and data mining, and its applications extend from population genetics [28], where genetically similar sub-populations act as the clusters, to image processing [30], [31], where groups of similar images act as clusters, to the study of social networks, where groups of like-minded people act as clusters [27]. Recently a huge amount of effort has been dedicated to finding the clusters, and numerous different clustering algorithms have been proposed in the literature. One might look at [20], [16], [11], [17], [8], [7], [14], [29], [23] for some references.

One of the easiest examples of the stochastic block model is the planted partition model, where one has only two clusters of more or less equal size. Formally,
Definition 1.1. For $n \in \mathbb{N}$ and $p, q \in [0,1]$, let $G(n,p,q)$ denote the model of random $\pm 1$-labelled graphs in which each vertex $u$ is assigned (independently and uniformly at random) a label $\sigma_u \in \{\pm 1\}$, and each edge between $u$ and $v$ is included independently with probability $p$ if they have the same label and with probability $q$ if they have different labels.
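For concreteness, here is a minimal sketch of a sampler for $G(n,p,q)$; the function name sample_sbm and the numpy conventions are our own choices for illustration, not part of the paper.

```python
import numpy as np

def sample_sbm(n, p, q, rng=None):
    """Draw (adjacency matrix, labels) from the model G(n, p, q) of Definition 1.1."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.choice([-1, 1], size=n)          # uniform +/-1 labels
    same = np.equal.outer(sigma, sigma)          # True iff endpoints share a label
    prob = np.where(same, p, q)                  # per-pair edge probability
    upper = rng.random((n, n)) < prob            # independent coin flips
    x = np.triu(upper, 1)                        # keep i < j only
    x = (x + x.T).astype(float)                  # symmetrize, no self-loops
    return x, sigma
```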
The case when $p$ and $q$ are sufficiently close to each other has received a significant amount of interest in the literature. Decelle et al. [15] made a fascinating conjecture in this regard.

Conjecture 1.1. Let $p = \frac an$ and $q = \frac bn$, where $a$ and $b$ are fixed real numbers. Then:
i) If $(a-b)^2 > 2(a+b)$, then one can almost surely find a bisection of the vertices which is positively correlated with the original clusters.
ii) If $(a-b)^2 < 2(a+b)$, then the problem is not solvable.
iii) Further, there are no consistent estimators of $a$ and $b$ if $(a-b)^2 < 2(a+b)$, and there are consistent estimators of $a$ and $b$ whenever $(a-b)^2 > 2(a+b)$.

Coja-Oghlan [13] solved part i) of the problem when $(a-b)^2 > C(a+b)$ for some large $C$; finally, parts ii) and iii) of Conjecture 1.1 were proved by Mossel et al. [25], and part i) was solved by Mossel et al. [24] and Massoulié [22] independently.

Typically the problem is much more delicate when more than two communities are present in the sparse case. To keep things simple, let us consider the general stochastic block model with $k$ asymptotically equal sized blocks, with connection probabilities within and between blocks given by $\frac an$ and $\frac bn$ respectively. It was conjectured in Mossel et al. [25] that for $k$ sufficiently large there is a constant $c(k)$ such that the reconstruction problem is solvable in exponential time whenever
$$c(k) < \frac{(a-b)^2}{a+(k-1)b} < k,$$
it is not solvable if $\frac{(a-b)^2}{a+(k-1)b} < c(k)$, and it is solvable in polynomial time if $k < \frac{(a-b)^2}{a+(k-1)b}$. The upper bound is known as the Kesten–Stigum threshold. Bordenave et al. [9] solved the reconstruction problem above this deterministic threshold by spectral analysis of the non-backtracking matrix. One might look at Banks et al. [6] for the non-solvability part. They prove that the probability models of the stochastic block model and of the Erdős–Rényi graph with the same average degree are contiguous, and the reconstruction problem is unsolvable, if
$$d\lambda^2 < \frac{2\log(k-1)}{k-1}.$$
Here $d = \frac{a+(k-1)b}{k}$ and $\lambda = \frac{a-b}{kd}$. Abbe et al. [1] provide an efficient algorithm for reconstruction above the Kesten–Stigum threshold. Abbe et al. [1] and Banks et al. [6] also provide cases strictly below the Kesten–Stigum threshold where the problem is solvable in exponential time.

On the other hand, a different type of reconstruction problem was considered in Mossel et al. [26] for denser graphs. They considered two different notions of recovery. The first one is weak consistency, where one is interested in finding a bisection $\hat\sigma$ such that $\sigma$ and $\hat\sigma$ have correlation going to 1 with high probability. The second one is called strong consistency: here one is interested in finding a bisection $\hat\sigma$ such that $\hat\sigma$ is either $\sigma$ or $-\sigma$ with probability tending to 1. Mossel et al. [26] prove that weak recovery is possible if and only if $\frac{n(p_n-q_n)^2}{p_n+q_n} \to \infty$, and strong recovery is possible if and only if
$$\Big(\frac{a_n+b_n}{2} - \sqrt{a_n b_n} - 1\Big)\log n + \frac12\log\log n \to \infty.$$
Here $a_n = \frac{np_n}{\log n}$ and $b_n = \frac{nq_n}{\log n}$ respectively. Abbe et al. [2] studied the same problem independently in the logarithmic sparsity regime. They prove that for $a = \frac{np_n}{\log n}$ and $b = \frac{nq_n}{\log n}$ fixed, $\frac{a+b}{2} - \sqrt{ab} > 1$ is sufficient for strong consistency and $\frac{a+b}{2} - \sqrt{ab} \ge 1$ is necessary. Parts ii) and iii) of Conjecture 1.1 have not yet been addressed in the dense case (i.e. when $a$ and $b$ increase to infinity), which is the main focus of this paper.

Before stating our results we mention that the results in Mossel et al. [25] are more general than part iii) of Conjecture 1.1. Let $P_n$ and $P'_n$ be the sequences of probability measures induced by $G(n,p,q)$ and $G(n, \frac{p+q}{2}, \frac{p+q}{2})$ respectively. Then [25] prove that whenever $a$ and $b$ are fixed numbers and $(a-b)^2 < 2(a+b)$, the measures $P_n$ and $P'_n$ are mutually contiguous, i.e. for a sequence of events $A_n$, $P_n(A_n) \to 0$ if and only if $P'_n(A_n) \to 0$.
Now part iii) of Conjecture 1.1 directly follows from the contiguity. The proof in Mossel et al. [25] is based on calculating the limiting distribution of the short cycles and using a result on contiguity (Theorem 1 in Janson [19] and Theorem 4.1 in Wormald [33]). However, one should note that the result from [25] doesn't directly generalize to the denser case, since one requires the limiting distributions of the short cycles to be independent Poisson in order to use Janson's result. In our proof, instead of considering the short cycles we consider the "signed cycles" (to be defined later), which have asymptotically normal distributions. We also prove a result analogous to Janson's for normal random variables in order to complete the proof.

On the other hand, the original proof of non-reconstruction in Mossel et al. [25] relies on a coupling of $P_n$ and $P'_n$ with the probability measures induced by Galton–Watson trees of suitable parameters. However, it is well known that when the graph is sufficiently dense, i.e. $a_n \gg n^{o(1)}$, the coupling argument doesn't work. So our proof is based on a fine analysis of some conditional probabilities. Technically, this proof is closely related to the non-reconstruction proof in Section 6.2 of Banks et al. [6] rather than the original proof given in Mossel et al. [25].

The paper is organized in the following manner. In Section 2 we introduce some preliminary notation and state our results. Section 3 is dedicated to building a result analogous to Theorem 1 in Janson [19]. In Section 4 we define signed cycles and find their asymptotic distributions. Section 5 completes the proofs of our contiguity results. In Section 6 we prove the non-reconstruction result. Finally, the paper concludes with an Appendix containing the proof of a result from random matrix theory used in this paper.

2 Notation and main results

Throughout the paper a random graph will be denoted by $G$, and $x_{i,j}$ will denote the indicator random variable corresponding to an edge between the nodes $i$ and $j$. Further, $P_n$ and $P'_n$ will denote the sequences of probability measures induced by $G(n, p_n, q_n)$ and $G(n, \frac{p_n+q_n}{2}, \frac{p_n+q_n}{2})$ respectively. For notational simplicity we denote $\frac{p_n+q_n}{2}$ by $\hat p_n$. Further, for any two labelings $\sigma$ and $\tau$ of the nodes, we define their overlap to be
$$\mathrm{ov}(\sigma,\tau) := \frac1n\sum_{i=1}^n \sigma_i\tau_i - \frac{1}{n^2}\sum_{i=1}^n\sigma_i\sum_{i=1}^n\tau_i. \qquad (2.1)$$
We now state our results.

Theorem 2.1. i) If $a_n, b_n \to \infty$, $a_n = o(n)$ and $(a_n-b_n)^2 < 2(a_n+b_n)$, then the probability measures $P_n$ and $P'_n$ are mutually contiguous. As a consequence, for any sequence of events $A_n$, $P_n(A_n) \to 0$ if and only if $P'_n(A_n) \to 0$. So there doesn't exist an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.
ii) If $a_n, b_n \to \infty$, $a_n = o(n)$ and $(a_n-b_n)^2 > 2(a_n+b_n)$, then the probability measures $P_n$ and $P'_n$ are asymptotically singular. Further, there exists an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.
Theorem 2.2. Suppose $\frac{a_n}{n} \to p \in (0,1)$ and $c := \lim_{n\to\infty}\frac{(a_n-b_n)^2}{a_n+b_n} \in (0,\infty)$. Then the following are true:
i) $P_n$ and $P'_n$ are mutually contiguous whenever $\frac{c}{2(1-p)} < 1$. So there doesn't exist an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.
ii) $P_n$ and $P'_n$ are asymptotically singular whenever $\frac{c}{2(1-p)} > 1$. Further, there exists an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.

Theorem 2.3. i) If $a_n, b_n \to \infty$, $a_n = o(n)$ and $(a_n-b_n)^2 < 2(a_n+b_n)$, then there is no reconstruction algorithm which performs better than random guessing, i.e. for any estimate $\{\hat\sigma_i\}_{i=1}^n$ of the labeling we have
$$\mathrm{ov}(\sigma, \hat\sigma) \xrightarrow{P} 0. \qquad (2.2)$$
ii) Suppose $\frac{a_n}{n} \to p \in (0,1)$ and $c := \lim_{n\to\infty}\frac{(a_n-b_n)^2}{a_n+b_n} \in (0,\infty)$. Then (2.2) holds when $\frac{c}{2(1-p)} < 1$. As a consequence, no reconstruction algorithm performs better than random guessing.
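The overlap (2.1) controlled by Theorem 2.3 is straightforward to compute; this small sketch (ours, in the same assumed numpy conventions as before) is reused in a later illustration.

```python
import numpy as np

def overlap(sigma, tau):
    """The overlap ov(sigma, tau) of (2.1) for two +/-1 label vectors."""
    n = len(sigma)
    return np.dot(sigma, tau) / n - sigma.sum() * tau.sum() / n**2
```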
3 A result on contiguity

In this section we give a very brief description of contiguity of probability measures. We suggest that the reader have a look at the discussion about contiguity of measures in Janson [19] for further details. In this section we state several propositions; except for Proposition 3.4 and Proposition 3.3, all the proofs can be found in Janson [19].
Definition 3.1. Let $P_n$ and $Q_n$ be two sequences of probability measures such that for each $n$, $P_n$ and $Q_n$ are both defined on the same measurable space $(\Omega_n, \mathcal F_n)$. We say that the sequences are contiguous if for every sequence of measurable sets $A_n \subset \Omega_n$,
$$P_n(A_n) \to 0 \iff Q_n(A_n) \to 0.$$

Definition 3.1 might appear a little abstract. However, the following reformulation is perhaps more useful for understanding the concept of contiguity.
Proposition 3.1. Two sequences of probability measures $P_n$ and $Q_n$ are contiguous if and only if for every $\varepsilon > 0$ there exist $n(\varepsilon)$ and $K(\varepsilon)$ such that for all $n > n(\varepsilon)$ there exists a set $B_n \in \mathcal F_n$ with $P_n(B_n^c), Q_n(B_n^c) \le \varepsilon$ such that
$$K(\varepsilon)^{-1} \le \frac{Q_n(A_n)}{P_n(A_n)} \le K(\varepsilon) \quad \forall\, A_n \subset B_n.$$

Although Proposition 3.1 gives an equivalent condition, verifying this condition is often difficult. However, under the assumption of convergence of $\frac{dQ_n}{dP_n}$, one gets the following simplified result.
Proposition 3.2. Suppose that $L_n = \frac{dQ_n}{dP_n}$, regarded as a random variable on $(\Omega_n, \mathcal F_n, P_n)$, converges in distribution to some random variable $L$ as $n \to \infty$. Then $P_n$ and $Q_n$ are contiguous if and only if $L > 0$ a.s. and $E[L] = 1$.
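A standard illustration (ours, not from the paper) may help fix ideas. Let $P_n$ be the law of $n$ i.i.d. $N(0,1)$ variables and $Q_n$ the law of $n$ i.i.d. $N(\theta/\sqrt n, 1)$ variables. Then
$$L_n = \exp\Big(\frac{\theta}{\sqrt n}\sum_{i=1}^n X_i - \frac{\theta^2}{2}\Big) \xrightarrow{d} L = \exp\Big(\theta Z - \frac{\theta^2}{2}\Big), \qquad Z \sim N(0,1),$$
under $P_n$. Here $L > 0$ a.s. and $E[L] = 1$, so Proposition 3.2 shows the two sequences are mutually contiguous. The limit appearing later in (3.2) has exactly this log-normal form.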
We now introduce the Wasserstein metric, which will be used in the proof of Proposition 3.4.

Definition 3.2. Let $F$ and $G$ be two distribution functions with finite $p$-th moment. Then the Wasserstein distance $W_p$ between $F$ and $G$ is defined to be
$$W_p(F,G) = \Big[\inf_{X\sim F,\, Y\sim G} E|X-Y|^p\Big]^{1/p}.$$
Here $X$ and $Y$ are random variables having distribution functions $F$ and $G$ respectively.

In particular, the following result will be useful in our proof:

Proposition 3.3.
Suppose $F_n$ is a sequence of distribution functions and $F$ is a distribution function. Then $F_n$ converges to $F$ in distribution and $\int x^2\, dF_n(x) \to \int x^2\, dF(x)$ if and only if $W_2(F_n, F) \to 0$.

The proof of Proposition 3.3 is well known; one might look at Mallows (1972) [21] for a reference.
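In one dimension the infimum in Definition 3.2 is attained by the quantile coupling, so for equal-size samples $W_p$ is just the $\ell_p$ distance between the sorted samples. The following sketch (ours) estimates $W_2$ empirically.

```python
import numpy as np

def empirical_w2(xs, ys):
    """Empirical W_2 between two equal-size 1-d samples via the sorting coupling."""
    xs, ys = np.sort(xs), np.sort(ys)
    return np.sqrt(np.mean((xs - ys) ** 2))

rng = np.random.default_rng(0)
print(empirical_w2(rng.normal(0, 1, 10**5), rng.normal(0.3, 1, 10**5)))
# close to 0.3, the W_2 distance between N(0,1) and N(0.3,1)
```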
With Proposition 3.2 in hand, we now state the most important result of this section, which will be used to prove Theorems 2.1 and 2.2. Although Proposition 3.4 is written in completely different notation, one can check that it is analogous to Theorem 1 in Janson [19].

Proposition 3.4. Let $P_n$ and $Q_n$ be two sequences of probability measures such that for each $n$ both are defined on $(\Omega_n, \mathcal F_n)$. Suppose that for each $i \ge 3$, $X_{n,i}$ are random variables defined on $(\Omega_n, \mathcal F_n)$. Then the probability measures $P_n$ and $Q_n$ are mutually contiguous if the following conditions hold:
i) $P_n \ll Q_n$ and $Q_n \ll P_n$ for each $n$.
ii) For each fixed $i \ge 3$, $X_{n,i}\,|\,P_n \xrightarrow{d} Z_i \sim N(0, 2i)$ jointly and $X_{n,i}\,|\,Q_n \xrightarrow{d} Z_i' \sim N(t_i, 2i)$ jointly, where $t_i = t^{i/2}$ for some $|t| < 1$.
iii) $Z_i$ and $Z_i'$ are sequences of independent random variables.
iv)
$$E_{P_n}\Big[\Big(\frac{dQ_n}{dP_n}\Big)^2\Big] \to \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}}. \qquad (3.1)$$
Further,
$$\frac{dQ_n}{dP_n}\,\Big|\,P_n \xrightarrow{d} \exp\Big(\sum_{i=3}^\infty \Big[\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big]\Big). \qquad (3.2)$$
Proof. In this proof, for simplicity, we denote $\frac{dQ_n}{dP_n}$ by $Y_n$. We break the proof into two steps.

Step 1. In this step we prove that the random variable on the R.S. of (3.2) is almost surely positive and has expectation 1. Let us define
$$W = \exp\Big(\sum_{i=3}^\infty \Big[\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big]\Big) \quad\text{and}\quad W^{(m)} = \exp\Big(\sum_{i=3}^m \Big[\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big]\Big).$$
Since $Z_i \sim N(0, 2i)$,
$$E\exp\Big(\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big) = \exp\Big(\frac{t_i^2}{2(2i)^2}\times 2i - \frac{t_i^2}{4i}\Big) = 1.$$
So $\{W^{(m)}\}_{m=3}^\infty$ is a martingale sequence and
$$E\big[W^{(m)2}\big] = \prod_{i=3}^m \exp\Big(\frac{t_i^2}{2i}\Big) = \exp\Big(\sum_{i=3}^m \frac{t_i^2}{2i}\Big).$$
Now
$$\sum_{i=3}^\infty \frac{t_i^2}{2i} = -\frac12\Big(\log(1-t) + t + \frac{t^2}{2}\Big) \quad \forall\, |t| < 1.$$
So $W^{(m)}$ is an $L^2$ bounded martingale. Hence $W$ is a well defined random variable,
$$E[W^2] = \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}} \quad\text{and}\quad E[W] = 1.$$
Since $Z_i \stackrel{d}{=} -Z_i$ for each $i$, and whenever $|t| < 1$ the series $\sum_{i=3}^\infty \frac{t_i^2}{2i}$ converges,
$$W^{-1} \stackrel{d}{=} \exp\Big(\sum_{i=3}^\infty \Big[\frac{t_i Z_i}{2i} + \frac{t_i^2}{4i}\Big]\Big).$$
However, $E[W^{-1}] = \exp\big\{\sum_{i=3}^\infty \frac{t_i^2}{2i}\big\} < \infty$ implies $W > 0$ a.s.
Step 2. Now we come to the harder task of proving $Y_n \xrightarrow{d} W$. Since $\limsup_{n\to\infty} E_{P_n}[Y_n^2] < \infty$ from condition iv), the sequence $Y_n$ is tight. Hence, by Prokhorov's theorem, there is a subsequence $\{n_k\}_{k=1}^\infty$ such that $Y_{n_k}$ converges in distribution to some random variable $W(\{n_k\})$. We shall prove that the distribution of $W(\{n_k\})$ doesn't depend on the subsequence $\{n_k\}$; in particular, $W(\{n_k\}) \stackrel d= W$. Since $Y_{n_k}$ converges in distribution to $W(\{n_k\})$, for any further subsequence $\{n_{k_l}\}$ of $\{n_k\}$, $Y_{n_{k_l}}$ also converges in distribution to $W(\{n_k\})$.

Given $\varepsilon > 0$, we choose $m$ big enough that
$$\Big|\exp\Big(\sum_{i=3}^\infty \frac{t_i^2}{2i}\Big) - \exp\Big(\sum_{i=3}^m \frac{t_i^2}{2i}\Big)\Big| < \varepsilon.$$
For this $m$, look at the joint distribution of $(Y_{n_k}, X_{n_k,3}, \ldots, X_{n_k,m})$. This sequence of $(m-1)$-dimensional random vectors under $P_{n_k}$ is also tight from condition ii). So it has a further subsequence such that
$$(Y_{n_{k_l}}, X_{n_{k_l},3}, \ldots, X_{n_{k_l},m})\,\big|\,P_{n_{k_l}} \xrightarrow{d} (H, H_3, \ldots, H_m) \in (\Omega(\{n_{k_l}\}), \mathcal F(\{n_{k_l}\}), P(\{n_{k_l}\}))\ \text{(say)}.$$
Here the distribution of $H$ is the same as that of $W(\{n_k\})$, and $(H_3,\ldots,H_m) \stackrel d= (Z_3,\ldots,Z_m)$ from condition ii). The most important part of this proof is to find suitable $\sigma$-algebras $\mathcal F_1 \subset \mathcal F_2 \in \mathcal F(\{n_{k_l}\})$ and a random variable $V^{(m)} \stackrel d= W^{(m)}$ such that $(V^{(m)}, \mathcal F_1)$ and $(H, \mathcal F_2)$ form a pair of martingales.

From condition iv) we have $\limsup_{n\to\infty} E_{P_n}[Y_n^2] < \infty$. As a consequence, the sequence $Y_{n_{k_l}}$ is uniformly integrable. Together with condition i), this gives us $1 = E_{P_{n_{k_l}}}[Y_{n_{k_l}}] \to E[H] = 1$. In other words,
$$1 = \int Y_{n_{k_l}}\, dP_{n_{k_l}} \to \int H\, dP(\{n_{k_l}\}) = 1. \qquad (3.3)$$
Now take any positive bounded continuous function $f: \mathbb R^{m-2} \to \mathbb R$. By Fatou's lemma,
$$\liminf \int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, Y_{n_{k_l}}\, dP_{n_{k_l}} \ge \int f(H_3,\ldots,H_m)\, H\, dP(\{n_{k_l}\}). \qquad (3.4)$$
However, for any constant $\xi$ we have, from (3.3),
$$\xi = \int \xi\, Y_{n_{k_l}}\, dP_{n_{k_l}} \to \int \xi\, H\, dP(\{n_{k_l}\}) = \xi.$$
So (3.4) holds for any bounded continuous function $f$. On the other hand, replacing $f$ by $-f$, we have
$$\lim \int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, Y_{n_{k_l}}\, dP_{n_{k_l}} = \int f(H_3,\ldots,H_m)\, H\, dP(\{n_{k_l}\}). \qquad (3.5)$$
Now applying condition ii), we have
$$\int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, Y_{n_{k_l}}\, dP_{n_{k_l}} = \int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, dQ_{n_{k_l}} \to \int f(H_3',\ldots,H_m')\, dQ. \qquad (3.6)$$
Here $(H_3',\ldots,H_m') \stackrel d= (Z_3',\ldots,Z_m')$ and $Q$ is the measure induced by $(H_3',\ldots,H_m')$. In particular, one can take the measure $Q$ to be defined on $(\Omega(\{n_{k_l}\}), \mathcal F(\{n_{k_l}\}))$ in such a way that $(H_3,\ldots,H_m)$ themselves are distributed as $(H_3',\ldots,H_m')$ under $Q$. This is true due to the following observation:
$$\int f(H_3,\ldots,H_m)\, dQ = \int f(H_3,\ldots,H_m)\, V^{(m)}\, dP(\{n_{k_l}\})$$
for any bounded continuous function $f$, where
$$V^{(m)} := \exp\Big(\sum_{i=3}^m \Big[\frac{t_i H_i}{2i} - \frac{t_i^2}{4i}\Big]\Big) \stackrel d= W^{(m)}.$$
Since $f$ is an arbitrary bounded continuous function, we have
$$\int_A dQ = \int_A V^{(m)}\, dP(\{n_{k_l}\})$$
for any $A \in \sigma(H_3,\ldots,H_m)$. Now, looking back into (3.5) and (3.6), we have
$$\int_A V^{(m)}\, dP(\{n_{k_l}\}) = \int_A H\, dP(\{n_{k_l}\}).$$
Also $V^{(m)}$ is $\sigma(H_3,\ldots,H_m)$ measurable. So $(V^{(m)}, \sigma(H_3,\ldots,H_m))$ and $(H, \sigma(H) \vee \sigma(H_3,\ldots,H_m))$ form a pair of martingales. From Fatou's lemma,
$$E[H^2] \le \liminf_{n\to\infty} E_{P_n}[Y_n^2] = \exp\Big(\sum_{i=3}^\infty \frac{t_i^2}{2i}\Big).$$
As a consequence, in the probability space $(\Omega(\{n_{k_l}\}), \mathcal F(\{n_{k_l}\}), P(\{n_{k_l}\}))$ we have
$$0 \le E|H - V^{(m)}|^2 = E[H^2] - E[V^{(m)2}] < \varepsilon.$$
So $W_2(F_{V^{(m)}}, F_H) < \sqrt\varepsilon$. Here $F_{V^{(m)}}$ and $F_H$ denote the distribution functions corresponding to $V^{(m)}$ and $H$ respectively. As a consequence, $W_2(F_{V^{(m)}}, F_H) \to 0$ as $m \to \infty$. Hence by Proposition 3.3, $V^{(m)} \xrightarrow{d} H$. Using $W^{(m)} \stackrel d= V^{(m)}$, we get $W^{(m)} \xrightarrow{d} H$. On the other hand, we have already proved that $W^{(m)}$ converges to $W$ in $L^2$. So $H \stackrel d= W$. However, we also proved $H \stackrel d= W(\{n_k\})$. Together, they imply $W(\{n_k\}) \stackrel d= W$ as required. □
Remark 3.1. One might observe that the second part of assumption ii) in Proposition 3.4 is slightly weaker than (A2) in Theorem 1 of Janson [19]. For our purpose this is sufficient, since we use the fact that $Y_n = \frac{dQ_n}{dP_n}$; in Theorem 1 of Janson [19], however, $Y_n$ can be any random variable.

4 Signed cycles

We have discussed in the introduction that the proof of Mossel et al. [25] crucially used the fact that the asymptotic distributions of the short cycle counts turn out to be Poisson. However, in the denser case one doesn't get a Poisson limit for the short cycles, so their proof doesn't work there. Here we instead consider the "signed cycles", defined as follows:
Definition 4.1. For a random graph $G$, the signed cycle of length $k$ is defined to be
$$C_{n,k}(G) = \Big(\frac{1}{\sqrt{n\, p_{n,av}(1-p_{n,av})}}\Big)^{k} \sum_{i_0,i_1,\ldots,i_{k-1}} (x_{i_0,i_1} - p_{n,av}) \cdots (x_{i_{k-1},i_0} - p_{n,av}),$$
where $i_0, i_1, \ldots, i_{k-1}$ are all distinct and $p_{n,av}$ is the average connection probability, i.e. $p_{n,av} = \frac{1}{n(n-1)}\sum_{i\neq j} E[x_{i,j}]$. Observe that for $G(n, p_n, q_n)$, $p_{n,av}$ is equal to $\hat p_n$.

One should expect the $C_{n,k}$'s to be asymptotically normal when $n \to \infty$ and $n\hat p_n$ is sufficiently large. Our next result formalizes this intuition.

Proposition 4.1. i) When $G \sim P'_n$, $n(p_n+q_n) \to \infty$ and $3 \le k_1 < \cdots < k_l = o(\log(\hat p_n n))$,
$$\Big(\frac{C_{n,k_1}(G)}{\sqrt{2k_1}}, \ldots, \frac{C_{n,k_l}(G)}{\sqrt{2k_l}}\Big) \xrightarrow{d} N_l(0, I_l). \qquad (4.1)$$
ii) When $G \sim P_n$, $np_n \to \infty$, $c = \frac{(a_n-b_n)^2}{a_n+b_n} = \Theta(1)$ and $3 \le k_1 < \cdots < k_l = o\big(\min\big(\log(\hat p_n n), \sqrt{\log n}\big)\big)$,
$$\Big(\frac{C_{n,k_1}(G) - \mu_1}{\sqrt{2k_1}}, \ldots, \frac{C_{n,k_l}(G) - \mu_l}{\sqrt{2k_l}}\Big) \xrightarrow{d} N_l(0, I_l), \qquad (4.2)$$
where $\mu_i = \Big(\sqrt{\frac{c}{2(1-\hat p_n)}}\Big)^{k_i}$ for $1 \le i \le l$.

The proof of Proposition 4.1 is inspired by the remarkable paper of Anderson and Zeitouni [3]. However, the model in our case is simpler, which makes the proof less cumbersome. The fundamental idea is to prove that the signed cycles converge in distribution by the method of moments and that the limiting random variables satisfy Wick's formula.
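Definition 4.1 can be evaluated literally by brute force on very small graphs; the sketch below (ours, with an $O(n^k)$ cost, so purely for numerical illustration) sums over all ordered tuples of distinct indices exactly as in the definition.

```python
import itertools
import numpy as np

def signed_cycle(x, k, p_av):
    """Brute-force evaluation of the signed cycle C_{n,k} of Definition 4.1."""
    n = x.shape[0]
    scale = (n * p_av * (1.0 - p_av)) ** (-k / 2.0)
    total = 0.0
    for tup in itertools.permutations(range(n), k):   # ordered distinct tuples
        prod = 1.0
        for a, b in zip(tup, tup[1:] + (tup[0],)):    # close the cycle i_{k-1} -> i_0
            prod *= x[a, b] - p_av
        total += prod
    return scale * total
```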
At first we state the method of moments.

Lemma 4.1. Let $(Y_{n,1},\ldots,Y_{n,l})$ be a random vector of dimension $l$. Then $(Y_{n,1},\ldots,Y_{n,l}) \xrightarrow{d} (Z_1,\ldots,Z_l)$ if the following conditions are satisfied:
i) The limit
$$\lim_{n\to\infty} E[X_{n,1}\cdots X_{n,m}] \qquad (4.3)$$
exists for any fixed $m$ and $X_{n,i} \in \{Y_{n,1},\ldots,Y_{n,l}\}$ for $1 \le i \le m$.
ii) (Carleman's condition) [12]
$$\sum_{h=1}^\infty \Big(\lim_{n\to\infty} E\big[X_{n,i}^{2h}\big]\Big)^{-\frac{1}{2h}} = \infty \quad \forall\, 1 \le i \le l.$$
Further, $\lim_{n\to\infty} E[X_{n,1}\cdots X_{n,m}] = E[X_1\cdots X_m]$, where $X_{n,i} \in \{Y_{n,1},\ldots,Y_{n,l}\}$ for $1 \le i \le m$ and $X_i$ is the distributional limit of $X_{n,i}$.

Lemma 4.2 (Wick's formula) [32]. Let $(Y_1,\ldots,Y_l)$ be a multivariate mean 0 random vector of dimension $l$ with covariance matrix $\Sigma$ (possibly singular). Then $(Y_1,\ldots,Y_l)$ is jointly Gaussian if and only if for any integer $m$ and $X_i \in \{Y_1,\ldots,Y_l\}$ for $1 \le i \le m$,
$$E[X_1\cdots X_m] = \begin{cases} \sum_\eta \prod_{i=1}^{m/2} E\big[X_{\eta(i,1)} X_{\eta(i,2)}\big] & \text{for } m \text{ even} \\ 0 & \text{for } m \text{ odd.} \end{cases} \qquad (4.4)$$
Here $\eta$ is a partition of $\{1,\ldots,m\}$ into $\frac m2$ blocks such that each block contains exactly 2 elements, and $\eta(i,j)$ denotes the $j$-th element of the $i$-th block of $\eta$ for $j = 1,2$.

The proof of this lemma is omitted. However, it is worth noting that the random variables $Y_1,\ldots,Y_l$ may also be equal to one another. In particular, taking $Y_1 = \cdots = Y_l$, Lemma 4.2 also provides a description of the moments of a single Gaussian random variable.
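The right side of (4.4) is easy to evaluate mechanically. This small sketch (ours) enumerates pair partitions and can be checked against Monte Carlo moments.

```python
import numpy as np

def pair_partitions(items):
    """Yield all partitions of the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for j in range(len(rest)):
        for sub in pair_partitions(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + sub

def wick_moment(cov, idx):
    """E[Y_{i_1} ... Y_{i_m}] for a centered Gaussian vector, via (4.4)."""
    if len(idx) % 2:
        return 0.0
    return sum(np.prod([cov[a][b] for a, b in pp])
               for pp in pair_partitions(list(idx)))

# Example: with unit variances and correlation 0.5, E[Y_0^2 Y_1^2] = 1 + 2*(0.5)^2.
print(wick_moment([[1.0, 0.5], [0.5, 1.0]], (0, 0, 1, 1)))   # 1.5
```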
With Lemmas 4.1 and 4.2 in hand, we now jump into the proof of Proposition 4.1.

Proof of Proposition 4.1. At first we introduce some notation and terminology. We call a word $w$ an ordered sequence of integers (to be called letters) $(i_0,\ldots,i_{k-1},i_k)$ such that $i_0 = i_k$ and the numbers $i_j$, $0 \le j \le k-1$, are all distinct. For a word $w = (i_0,\ldots,i_{k-1},i_k)$, its length $l(w)$ is $k+1$. The graph induced by a word $w$ is denoted by $G_w$ and defined as follows: one treats the letters $(i_0,\ldots,i_{k-1})$ as nodes and puts an edge between the nodes $(i_j, i_{j+1})$ for $0 \le j \le k-1$. Note that for a word $w$ of length $k+1$, $G_w = (V_w, E_w)$ is just a $k$ cycle. For a word $w = (i_0,\ldots,i_k)$, its mirror image is defined by $\tilde w = (i_0, i_{k-1}, i_{k-2}, \ldots, i_1, i_0)$. Further, for a cyclic permutation $\tau$ of the set $\{0,1,\ldots,k-1\}$, we define $w_\tau := (i_{\tau(0)},\ldots,i_{\tau(k-1)}, i_{\tau(0)})$. Finally, two words $w$ and $x$ are called paired if there is a cyclic permutation $\tau$ such that either $x_\tau = w$ or $\tilde x_\tau = w$. An ordered tuple of $m$ words, $(w_1,\ldots,w_m)$, will be called a sentence. For any sentence $a = (w_1,\ldots,w_m)$, $G_a = (V_a, E_a)$ is the graph with $V_a = \cup_{i=1}^m V_{w_i}$ and $E_a = \cup_{i=1}^m E_{w_i}$.
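The pairing relation is concrete enough to code up. The following sketch (ours, for explanatory purposes only) tests whether two closed words are paired by enumerating the cyclic rotations of a word and of its mirror image.

```python
def paired(w, x):
    """Test whether closed words w and x (lists whose first and last letters
    coincide) are paired: x equals a cyclic rotation of w or of its mirror."""
    if len(w) != len(x):
        return False
    k = len(w) - 1
    base = w[:-1]
    mirror = ([w[0]] + w[-2::-1])[:-1]   # mirror image with endpoint dropped
    rots = {tuple(b[r:] + b[:r]) for b in (base, mirror) for r in range(k)}
    return tuple(x[:-1]) in rots

print(paired([1, 2, 3, 1], [2, 1, 3, 2]))   # True: a rotation of the mirror image
```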
Proof of part i). We complete the proof in two steps. In the first step the asymptotic variances of $(C_{n,k_1}(G),\ldots,C_{n,k_l}(G))$ will be calculated, and the second step will be dedicated to proving the asymptotic normality and independence of $(C_{n,k_1}(G),\ldots,C_{n,k_l}(G))$.

Step 1. Observe that when $G \sim P'_n$, the distribution of $C_{n,k_1}(G),\ldots,C_{n,k_l}(G)$ is trivially independent of the labels $\sigma_i$, and $E[C_{n,k}(G)] = 0$, since $P'_n$ corresponds to the probability distribution induced by an Erdős–Rényi model. Now we prove that $\mathrm{Var}(C_{n,k}(G)) \sim 2k$ for any $k = o(\sqrt n)$. For any word $w = (i_0,\ldots,i_k)$, let $X_w := \prod_{j=0}^{k-1}\big(x_{i_j,i_{j+1}} - \hat p_n\big)$. Now observe that
$$\mathrm{Var}(C_{n,k}) = \Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k E\Big[\Big(\sum_w X_w\Big)^2\Big] = \Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k E\Big[\sum_{w,x} X_w X_x\Big]. \qquad (4.5)$$
Since both $X_w$ and $X_x$ are products of independent mean 0 random variables, each appearing exactly once, $E[X_w X_x] \neq 0$ only if all the edges of $G_w$ are repeated in $G_x$. Observe that, since $G_w$ and $G_x$ are cycles of length $k$, this is satisfied if and only if $w$ and $x$ are paired. There are $k$ many cyclic permutations $\tau$ of the set $\{0,\ldots,k-1\}$, and for a given $w$ and $\tau$ there are only two possible choices of $x$ such that $w$ and $x$ are paired; these choices are obtained when $x_\tau = w$ and $\tilde x_\tau = w$. As a consequence, for any word $w$, exactly $2k$ words are paired with it. Now observe that when $w$ and $x$ are paired, $X_w X_x$ is a product of $k$ random variables, each appearing exactly twice. As a consequence,
$$E[X_w X_x] = (\hat p_n(1-\hat p_n))^k.$$
Also, the total number of words is given by $n(n-1)\cdots(n-k+1)$, for the choices of $i_0,\ldots,i_{k-1}$. It is well known that $\frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$ whenever $k = o(\sqrt n)$. So
$$\mathrm{Var}(C_{n,k}) = 2k\,\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k\, n(n-1)\cdots(n-k+1)\,(\hat p_n(1-\hat p_n))^k \sim 2k \qquad (4.6)$$
as long as $k = o(\sqrt n)$. This completes Step 1 of the proof.
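A quick simulation is consistent with (4.6). This snippet (ours) reuses signed_cycle from the sketch after Proposition 4.1, with deliberately small parameters since that evaluation is brute force.

```python
# Monte Carlo check that Var(C_{n,k}) is close to 2k under P'_n
# (an Erdos-Renyi graph with edge probability p_hat); illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, k, p_hat, reps = 40, 3, 0.3, 100
vals = []
for _ in range(reps):
    upper = np.triu(rng.random((n, n)) < p_hat, 1)
    x = (upper + upper.T).astype(float)
    vals.append(signed_cycle(x, k, p_hat))
print(np.var(vals), 2 * k)   # the two numbers should be comparable
```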
Step 2. We claim that in order to complete Step 2 it is enough to prove the following two limits:
$$E\big[C_{n,k_1}(G)\, C_{n,k_2}(G)\big] \to 0 \quad \text{for } k_1 \neq k_2, \qquad (4.7)$$
and that there exist random variables $Z_1,\ldots,Z_l$ such that for any fixed $m$,
$$\lim_{n\to\infty} E[X_{n,1}\cdots X_{n,m}] = \begin{cases} \sum_\eta \prod_{i=1}^{m/2} E\big[Z_{\eta(i,1)} Z_{\eta(i,2)}\big] & \text{for } m \text{ even} \\ 0 & \text{for } m \text{ odd,} \end{cases} \qquad (4.8)$$
where $X_{n,i} \in \big\{\frac{C_{n,k_1}(G)}{\sqrt{2k_1}}, \ldots, \frac{C_{n,k_l}(G)}{\sqrt{2k_l}}\big\}$.

First observe that (4.8) will simultaneously imply parts i) and ii) of Lemma 4.1. The implication of i) is obvious. For ii), one can take the $X_{n,i}$'s to be all equal, and from Wick's formula (Lemma 4.2) the limiting distributions of the $X_{n,i}$'s are normal; it is well known that normal random variables satisfy Carleman's condition. On the other hand, (4.8) also implies that the limit of $\big(\frac{C_{n,k_1}(G)}{\sqrt{2k_1}},\ldots,\frac{C_{n,k_l}(G)}{\sqrt{2k_l}}\big)$ is jointly normal. Hence, applying (4.7), one gets the asymptotic independence.

We first prove (4.7). Observe that
$$E\big[C_{n,k_1}(G)\, C_{n,k_2}(G)\big] = \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{k_1+k_2} E\Big[\sum_{w,x} X_w X_x\Big].$$
However, here $l(w) = k_1 + 1 \neq l(x) = k_2 + 1$, so $w$ and $x$ cannot be paired: some edge appears exactly once in every product, and hence $E\big[\sum_{w,x} X_w X_x\big] = 0$. As a consequence, (4.7) holds.

Now we prove (4.8). Let $l_i$ be the length of the signed cycle corresponding to $X_{n,i}$; observe that $l_i \in \{k_1,\ldots,k_l\}$ for any $i$. At first we expand the L.S. of (4.8):
$$E[X_{n,1}\cdots X_{n,m}] = \prod_{i=1}^m\frac{1}{\sqrt{2l_i}}\,\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{w_1,\ldots,w_m} E\big[X_{w_1}\cdots X_{w_m}\big]. \qquad (4.9)$$
Since the constants $\prod_i (2l_i)^{-1/2}$ are at most 1, we ignore them in the upper bounds below. Here each of the graphs $G_{w_1},\ldots,G_{w_m}$ is a cycle of length $l_1,\ldots,l_m$ respectively. So, in order to have $E[X_{w_1}\cdots X_{w_m}] \neq 0$, we need each of the edges in $G_{w_1},\ldots,G_{w_m}$ to be traversed more than once. The sentence $a := (w_1,\ldots,w_m)$ formed by such $(w_1,\ldots,w_m)$ will be called a weak CLT sentence. Given a weak CLT sentence $a$, we introduce a partition $\eta(a)$ of $\{1,\ldots,m\}$ in the following way: $i, j$ are in the same block of the partition $\eta(a)$ if $G_{w_i}$ and $G_{w_j}$ have at least one edge in common. As a consequence, we can further expand the sum in (4.9) in the following way:
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_\eta\ \sum_{w_1,\ldots,w_m\,|\,\eta = \eta(w_1,\ldots,w_m)} E\big[X_{w_1}\cdots X_{w_m}\big]. \qquad (4.10)$$
Observe that each block in $\eta$ should have at least 2 elements; otherwise $E[X_{w_1}\cdots X_{w_m}] = 0$. As a consequence, the number of blocks in $\eta$ is at most $[\frac m2]$. Now we prove that if the number of blocks in $\eta$ is strictly less than $[\frac m2]$, then
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{w_1,\ldots,w_m\,|\,\eta = \eta(w_1,\ldots,w_m)} E\big[X_{w_1}\cdots X_{w_m}\big] \to 0.$$
If $\eta(w_1,\ldots,w_m)$ has strictly less than $[\frac m2]$ blocks, then $a$ has strictly less than $[\frac m2]$ connected components. From Proposition 4.10 of Anderson and Zeitouni [3] it follows that in this case $\#V_a \le \sum_{i=1}^m \frac{l_i}{2} - 1$. Moreover, each connected component is formed by a union of cycles, so $\#V_a \le \#E_a$. Now the following lemma gives a bound on the number of weak CLT sentences having strictly less than $[\frac m2]$ connected components.

Lemma 4.3. Let $A$ be the set of weak CLT sentences such that for each $a \in A$, $\#V_a = t$. Then
$$\#A \le C_1^{\sum_i l_i}\Big(\sum_i l_i\Big)^{C_2 m}\Big(\sum_i l_i\Big)^{C_2(\sum_i l_i - 2t)} n^t. \qquad (4.11)$$

The proof of Lemma 4.3 is rather technical and requires some amount of random matrix theory, so we defer it to the Appendix. However, assuming Lemma 4.3 and writing $e = \#E_a$, we note that since each distinct edge is traversed at least twice, $|E[X_{w_1}\cdots X_{w_m}]| \le \hat p_n^{\,e}$, and we have
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{a:\ \#V_a \le \sum_i l_i/2 - 1}\big|E[X_{w_1}\cdots X_{w_m}]\big|$$
$$\le \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{t=1}^{\sum_i l_i/2 - 1}\ \sum_{e=t}^{\sum_i l_i/2} C_1^{\sum_i l_i}\Big(\sum_i l_i\Big)^{C_2 m + C_2(\sum_i l_i - 2t)} n^t\,\hat p_n^{\,e}$$
$$\le \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{t=1}^{\sum_i l_i/2 - 1} C_1'^{\sum_i l_i}\Big(\sum_i l_i\Big)^{C_2' m + C_2'(\sum_i l_i - 2t)} n^t\,\hat p_n^{\,t}$$
$$\le \Big(\frac{1}{1-\hat p_n}\Big)^{\frac{\sum_i l_i}{2}}\underbrace{\sum_{t=1}^{\sum_i l_i/2 - 1} C_3\Big(\sum_i l_i\Big)^{C_4\sum_i l_i}\Big(\frac{1}{n\hat p_n}\Big)^{\sum_i l_i/2 - t}}_{T\ \text{(say)}}, \qquad (4.12)$$
where $C_1, C_2, \ldots$ are known constants. The second inequality uses $\hat p_n^{\,e} \le \hat p_n^{\,t}$ (as $e \ge t$ and $\hat p_n \le 1$). The third inequality holds because $m \le \sum_i l_i$ and $\sum_i l_i/2 - t \ge 1$, so all the remaining factors can be absorbed into $C_3(\sum_i l_i)^{C_4\sum_i l_i}$.

Observe that $T$ is just a geometric series in $\frac{1}{n\hat p_n}$, and the lowest value of $\sum_i l_i/2 - t$ is 1. So we can give the following final bound for (4.12):
$$\Big(\frac{1}{1-\hat p_n}\Big)^{\frac{\sum_i l_i}{2}} C_5\, C_3\Big(\sum_i l_i\Big)^{C_4\sum_i l_i}\frac{1}{n\hat p_n}, \qquad (4.13)$$
where $C_5$ is another known constant. When $k_l = o(\log(\hat p_n n))$ and $\sum_i l_i \le m k_l$,
$$\Big(\frac{1}{1-\hat p_n}\Big)^{\frac{m k_l}{2}} C_5\, C_3\, (m k_l)^{C_4 m k_l}\frac{1}{n\hat p_n} \to 0.$$
Once this is proved, the only partitions left are the pair partitions, i.e. those with exactly $[\frac m2]$ blocks. However, once such a partition $\eta$ is fixed, the choices within one block don't depend on the others, and each block contributes exactly the covariance computed in Step 1. As a consequence, (4.4) is satisfied. This completes part i). □

Proof of part ii). Let $d := \frac{p_n - q_n}{2}$. We have
$$C_{n,k}(G) = \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}(x_{i_0,i_1} - \hat p_n)\cdots(x_{i_{k-1},i_0} - \hat p_n)$$
$$= \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}(x_{i_0,i_1} - p_{i_0,i_1} + \sigma_{i_0}\sigma_{i_1}d)\cdots(x_{i_{k-1},i_0} - p_{i_{k-1},i_0} + \sigma_{i_{k-1}}\sigma_{i_0}d)$$
$$= \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}\Big[(x_{i_0,i_1} - p_{i_0,i_1})\cdots(x_{i_{k-1},i_0} - p_{i_{k-1},i_0}) + d^k\prod_{j=0}^{k-1}\sigma_{i_j}\sigma_{i_{j+1}}\Big] + V_{n,k}, \qquad (4.14)$$
where $i_k = i_0$, $p_{i,j} = p_n$ if $\sigma_i = \sigma_j$ and $q_n$ otherwise, and $V_{n,k}$ collects all the cross terms in the expansion of the product.

At first we prove that
$$\prod_{j=0}^{k-1}\sigma_{i_j}\sigma_{i_{j+1}} = 1 \qquad (4.15)$$
for any choice of the $\sigma_i$'s. To prove this, without loss of generality assume $\sigma_{i_0} = +1$ and look at the runs of $+1$'s and $-1$'s among $\sigma_{i_0},\ldots,\sigma_{i_k}$. Since $i_0 = i_k$, the value of $\sigma_{i_k}$ is also $+1$, so any such assignment starts and ends with a run of $+1$'s, and the runs of $+1$'s and $-1$'s alternate. A factor $\sigma_{i_j}\sigma_{i_{j+1}}$ equals $-1$ exactly at the boundary between two runs, and going around the cycle the number of such sign changes is even. This completes the proof of (4.15).

The proof of the asymptotic normality and independence of
$$D_{n,k}(G) := \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}(x_{i_0,i_1} - p_{i_0,i_1})\cdots(x_{i_{k-1},i_0} - p_{i_{k-1},i_0})$$
is exactly the same as in part i). We only note that here the variance is also $\sim 2k$. To see this, first observe that
$$d = \sqrt{\frac{c\,\hat p_n}{2n}},$$
and whenever $k = o(\log(\hat p_n n))$, both
$$\lim_{n\to\infty}\Big(\frac{(\hat p_n + d)(1-\hat p_n - d)}{\hat p_n(1-\hat p_n)}\Big)^k = 1 \qquad (4.16)$$
and
$$\lim_{n\to\infty}\Big(\frac{(\hat p_n - d)(1-\hat p_n + d)}{\hat p_n(1-\hat p_n)}\Big)^k = 1. \qquad (4.17)$$
It is easy to see that $\mathrm{Var}\big(\frac{D_{n,k}(G)}{\sqrt{2k}}\big)$ lies between the L.S. of (4.16) and (4.17); as a consequence, $\mathrm{Var}\big(\frac{D_{n,k}(G)}{\sqrt{2k}}\big) \to 1$. Observe also that, by (4.15), the middle term of (4.14) equals
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k d^k\, n(n-1)\cdots(n-k+1) = (1+o(1))\Big(\sqrt{\frac{c}{2(1-\hat p_n)}}\Big)^k = (1+o(1))\,\mu_k.$$
So it remains to prove that $\mathrm{Var}(V_{n,k}) \to 0$. Fix a word $w$ and let $E_f \subset E_w$ be any nonempty proper subset. Then $V_{n,k} = \sum_w V_{n,k,w}$, where
$$V_{n,k,w} := \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{\emptyset \neq E_f \subsetneq E_w}\ \prod_{e\in E_f}\sigma_e\, d\prod_{e\in E_w\setminus E_f}(x_e - p_e).$$
Here, for any edge $e = \{i,j\}$, $x_e = x_{i,j}$, $p_e = p_{i,j}$ and $\sigma_e = \sigma_i\sigma_j$. Now
$$\mathrm{Var}(V_{n,k}) = \sum_{w,x}\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}).$$
We now find an upper bound for $\mathrm{Cov}(V_{n,k,w}, V_{n,k,x})$. At first fix any word $w$ and a set $E_f \subset E_w$, and consider all the words $x$ such that $E_w \cap E_x = E_w\setminus E_f$. As every edge in $G_w$ and $G_x$ appears exactly once, only the subsets $E' \supset E_f$ contribute, and
$$\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) = \sum_{E_w\setminus E' \subset E_w\setminus E_f}\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k\prod_{e\in E'}(\pm d)^2\, E\Big[\prod_{e\in E_w\setminus E'}(x_e - p_e)^2\Big]$$
$$= \sum_{E_w\setminus E' \subset E_w\setminus E_f}\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k d^{2\#E'}(1+o(1))(\hat p_n(1-\hat p_n))^{k-\#E'}$$
$$\le \sum_{E_w\setminus E' \subset E_w\setminus E_f}(1+o(1))\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k\Big(\frac c2\Big)^{\#E'}\Big(\frac{\hat p_n}{n}\Big)^{\#E'}(\hat p_n(1-\hat p_n))^{k-\#E'} \le \frac{(C_6)^k}{n^{k+\#E_f}}, \qquad (4.18)$$
where $C_6$ is some known constant. The last inequality holds since $\#E' \ge \#E_f$ and the number of subsets $E'$ with $E_f \subset E' \subset E_w$ is at most $2^k$.

Observe that the graph corresponding to the edges $E_w\setminus E_f$ is a disjoint collection of paths. Let the number of such paths be $\zeta$; obviously $\zeta \le \#(E_w\setminus E_f)$. The number of ways these $\zeta$ components can be placed in $x$ is bounded by $k^{2\zeta} \le k^{2\#(E_w\setminus E_f)}$, and all other nodes of $x$ can be chosen freely. So there are at most $n^{k - \#V_{E_w\setminus E_f}}\, k^{2\#(E_w\setminus E_f)}$ choices of such $x$. Here $V_{E_w\setminus E_f}$ is the set of vertices of the graph corresponding to $E_w\setminus E_f$. Observe that whenever $\#E_f > 0$, the graph corresponding to $E_w\setminus E_f$ is a forest, so
$$\#V_{E_w\setminus E_f} \ge \#(E_w\setminus E_f) + 1 \iff k - \#V_{E_w\setminus E_f} \le \#E_f - 1.$$
As a consequence,
$$\sum_{x\,|\,E_w\cap E_x = E_w\setminus E_f}\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) \le \frac{(C_6)^k}{n^{k+\#E_f}}\, n^{\#E_f - 1}\, k^{2\#(E_w\setminus E_f)} \le \frac{(C_6)^k\, k^{2k}}{n^{k+1}}. \qquad (4.19)$$
The R.S. of (4.19) doesn't depend on $E_f$, and there are at most $2^k$ nonempty subsets $E_f$ of $E_w$. So
$$\sum_x\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) \le \frac{(2C_6)^k\, k^{2k}}{n^{k+1}}.$$
Finally, there are at most $n^k$ many $w$. So
$$\sum_w\sum_x\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) \le \frac{(2C_6)^k\, k^{2k}}{n}. \qquad (4.20)$$
Now we use the fact that $k = o(\sqrt{\log n})$. In this case
$$k\log(2C_6) + 2k\log k \le \sqrt{\log n}\,\log\big(\sqrt{\log n}\big) = o(\log n) \iff (2C_6)^k k^{2k} = o(n).$$
This concludes the proof. □
5 Proofs of Theorems 2.1 and 2.2

With Propositions 3.4 and 4.1 in hand, the rest of the proofs of Theorems 2.1 and 2.2 is very straightforward. We first prove that $\lim_{n\to\infty} E_{P'_n}\big[\big(\frac{dP_n}{dP'_n}\big)^2\big]$ equals the R.S. of (3.1), with $t = \frac c2$ when $a_n = o(n)$ and with $t = \frac{c}{2(1-p)}$ when $\frac{a_n}{n} \to p$, respectively.

Lemma 5.1.
Let $Y_n := \frac{dP_n}{dP'_n}$. Then the following are true:
i) When $p_n \to 0$ (i.e. $a_n = o(n)$),
$$E_{P'_n}[Y_n^2] \to \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}}, \qquad t = \frac c2 < 1.$$
ii) When $p_n \to p \in (0,1)$,
$$E_{P'_n}[Y_n^2] \to \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}}, \qquad t = \frac{c}{2(1-p)} < 1.$$
Proof. The proof of Lemma 5.1 is similar to the proof of Lemma 5.4 in Mossel et al. [25]. We only provide a proof of part ii); the proof of part i) is similar. The notation used in this proof differs slightly from that of Lemma 5.4 in Mossel et al. [25], for a better understanding of part ii).

At first we introduce some notation. Given a labelled graph $(G,\sigma)$ we define
$$W_{uv} = W_{uv}(G,\sigma) = \begin{cases} \frac{p_n}{\hat p_n} & \text{if } \sigma_u\sigma_v = 1,\ (u,v)\in E \\ \frac{q_n}{\hat p_n} & \text{if } \sigma_u\sigma_v = -1,\ (u,v)\in E \\ \frac{1-p_n}{1-\hat p_n} & \text{if } \sigma_u\sigma_v = 1,\ (u,v)\notin E \\ \frac{1-q_n}{1-\hat p_n} & \text{if } \sigma_u\sigma_v = -1,\ (u,v)\notin E \end{cases} \qquad (5.1)$$
and define $V_{uv}$ by the same formula, but with $\sigma$ replaced by $\tau$. Now
$$Y_n = \frac{1}{2^n}\sum_{\sigma\in\{1,-1\}^n}\prod_{(u,v)} W_{uv} \quad\text{and}\quad Y_n^2 = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\prod_{(u,v)} W_{uv}V_{uv}.$$
Since the $\{W_{uv}\}$ are independent given $\sigma$, it follows that
$$E_{P'_n}(Y_n^2) = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\prod_{(u,v)} E_{P'_n}(W_{uv}V_{uv}).$$
Now we consider the following cases:
1. $\sigma_u\sigma_v = \tau_u\tau_v = 1$;
2. $\sigma_u\sigma_v = \tau_u\tau_v = -1$;
3. $\sigma_u\sigma_v = 1$, $\tau_u\tau_v = -1$;
4. $\sigma_u\sigma_v = -1$, $\tau_u\tau_v = 1$.
Recall that $t = \frac{c}{2(1-p)}$. We first calculate $E_{P'_n}(W_{uv}V_{uv})$ in Cases 1 and 3.

Case 1:
$$E_{P'_n}(W_{uv}V_{uv}) = \Big(\frac{p_n}{\hat p_n}\Big)^2\hat p_n + \Big(\frac{1-p_n}{1-\hat p_n}\Big)^2(1-\hat p_n) = \frac{p_n^2}{\hat p_n} + \frac{(1-p_n)^2}{1-\hat p_n}$$
$$= \frac{(\hat p_n + d_n)^2}{\hat p_n} + \frac{(1-\hat p_n - d_n)^2}{1-\hat p_n} = 1 + d_n^2\Big(\frac{1}{\hat p_n} + \frac{1}{1-\hat p_n}\Big) = 1 + \frac{d_n^2}{\hat p_n(1-\hat p_n)} = 1 + \frac{c}{2n(1-\hat p_n)} = 1 + \frac{t_n}{n}, \qquad (5.2)$$
where $d_n = \frac{p_n - q_n}{2}$ and $t_n := \frac{c}{2(1-\hat p_n)} = (1+o(1))\,t$.

Case 3:
$$E_{P'_n}(W_{uv}V_{uv}) = \frac{p_n}{\hat p_n}\cdot\frac{q_n}{\hat p_n}\,\hat p_n + \frac{1-p_n}{1-\hat p_n}\cdot\frac{1-q_n}{1-\hat p_n}\,(1-\hat p_n) = \frac{p_n q_n}{\hat p_n} + \frac{(1-p_n)(1-q_n)}{1-\hat p_n}$$
$$= \frac{(\hat p_n + d_n)(\hat p_n - d_n)}{\hat p_n} + \frac{(1-\hat p_n - d_n)(1-\hat p_n + d_n)}{1-\hat p_n} = 1 - \frac{d_n^2}{\hat p_n(1-\hat p_n)} = 1 - \frac{t_n}{n}. \qquad (5.3)$$
It is easy to observe that $E_{P'_n}(W_{uv}V_{uv}) = 1 + \frac{t_n}{n}$ and $1 - \frac{t_n}{n}$ in Cases 2 and 4 respectively.

We now introduce another parameter $\rho = \rho(\sigma,\tau) = \frac1n\sum_i\sigma_i\tau_i$. Let $S_\pm$ be the number of pairs $\{u,v\}$ such that $\sigma_u\sigma_v\tau_u\tau_v = \pm1$. Then
$$\rho^2 = \frac1n + \frac{2}{n^2}(S_+ - S_-) \qquad (5.4)$$
and
$$1 - \frac1n = \frac{2}{n^2}(S_+ + S_-). \qquad (5.5)$$
So
$$S_+ = \frac{(1+\rho^2)n^2}{4} - \frac n2, \qquad S_- = \frac{(1-\rho^2)n^2}{4}. \qquad (5.6)$$
Now
$$E_{P'_n}(Y_n^2) = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\Big(1+\frac{t_n}{n}\Big)^{S_+}\Big(1-\frac{t_n}{n}\Big)^{S_-} = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\Big(1+\frac{t_n}{n}\Big)^{\frac{(1+\rho^2)n^2}{4}-\frac n2}\Big(1-\frac{t_n}{n}\Big)^{\frac{(1-\rho^2)n^2}{4}}. \qquad (5.7)$$
Observe that $t_n = (1+o(1))t$ is a bounded sequence. It is easy to check, by taking logarithms and a Taylor expansion, that for any bounded sequence $x_n$,
$$\Big(1+\frac{x_n}{n}\Big)^{\frac{n^2}{4}} = (1+o(1))\exp\Big(\frac{n x_n}{4} - \frac{x_n^2}{8}\Big).$$
So we can write the R.S. of (5.7) as
$$(1+o(1))\frac{1}{2^{2n}}\sum_{\sigma,\tau} e^{-\frac{t_n}{2}}\exp\Big[\Big(\frac{n t_n}{4} - \frac{t_n^2}{8}\Big)(1+\rho^2)\Big]\times\exp\Big[\Big(-\frac{n t_n}{4} - \frac{t_n^2}{8}\Big)(1-\rho^2)\Big]$$
$$= (1+o(1))\frac{1}{2^{2n}}\sum_{\sigma,\tau} e^{-\frac{t_n}{2}-\frac{t_n^2}{4}}\exp\Big[\frac{n t_n\rho^2}{2}\Big] = (1+o(1))\, e^{-\frac{t_n}{2}-\frac{t_n^2}{4}}\,\frac{1}{2^{2n}}\sum_{\sigma,\tau}\exp\Big[(1+o(1))\frac{t n\rho^2}{2}\Big]. \qquad (5.8)$$
From Lemma 5.5 in Mossel et al. [25],
$$\frac{1}{2^{2n}}\sum_{\sigma,\tau}\exp\Big[(1+o(1))\frac{n t\rho^2}{2}\Big] \to \frac{1}{\sqrt{1-t}}.$$
So the R.S. of (5.8) converges to $\frac{\exp(-\frac t2 - \frac{t^2}{4})}{\sqrt{1-t}}$ as required. □

Proof of Theorems 2.1 and 2.2:
The proofs of Theorems 2.1 and 2.2 differ only in the value of $t$: for the case $a_n = o(n)$, $t = \frac c2$, and $t = \frac{c}{2(1-p)}$ in the other case. We prove only Theorem 2.1; the proof of Theorem 2.2 is similar after plugging in the appropriate value of $t$.

Proof of part i). We take $X_{n,i} = C_{n,i}(G)$. At first observe that when $a_n = o(n)$ (i.e. $p_n, q_n \to 0$), for any fixed $i$,
$$\mu_i := \Big(\sqrt{\frac{c}{2(1-\hat p_n)}}\Big)^i$$
converges to $\big(\frac c2\big)^{i/2}$ as $n \to \infty$. From Proposition 4.1 and Lemma 4.1 we see that the $C_{n,i}(G)$'s satisfy all the conditions required for Proposition 3.4, with $t = \frac c2 < 1$. Hence $P_n$ and $P'_n$ are mutually contiguous.

It is easy to see that the average degree $\hat d_n := \frac1n\sum_{i\neq j} x_{i,j}$ has mean $(1+o(1))\frac{a_n+b_n}{2}$ and variance $O\big(\frac{a_n+b_n}{n}\big)$. So
$$\hat d_n - \frac{a_n+b_n}{2} = o_p\big(\sqrt{a_n+b_n}\big) = o_p(a_n-b_n).$$
Suppose under $P_n$ there exist estimators $A_n$ of $a_n$ and $B_n$ of $b_n$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$. Then $2(\hat d_n - B_n) - (a_n - b_n) = o_p(a_n - b_n)$, i.e.
$$\frac{2(\hat d_n - B_n)}{a_n - b_n}\,\Big|\,P_n \xrightarrow{P} 1.$$
However, from the fact that $P_n$ and $P'_n$ are contiguous, we also have
$$\frac{2(\hat d_n - B_n)}{a_n - b_n}\,\Big|\,P'_n \xrightarrow{P} 1.$$
This is impossible: $P'_n$ is itself a planted partition model with both parameters equal to $\frac{a_n+b_n}{2}$, so applying the assumed estimator property under $P'_n$ forces $2(\hat d_n - B_n) = o_p(a_n - b_n)$ there, i.e. $\frac{2(\hat d_n - B_n)}{a_n - b_n}\,|\,P'_n \xrightarrow{P} 0, a contradiction. □
Proof of part ii). It is easy to observe that $P_n$ and $P'_n$ are asymptotically singular, since for any $k_n \to \infty$ (slowly enough that Proposition 4.1 applies), $\frac{\mu_{k_n}}{\sqrt{2k_n}} \to \infty$. Now we construct estimators for $a_n$ and $b_n$. Let us define
$$\hat f_{n,k_n} = \begin{cases}\big(\sqrt{2k_n}\, C_{n,k_n}(G)\big)^{\frac{1}{k_n}} & \text{if } C_{n,k_n}(G) > 0 \\ 0 & \text{otherwise.}\end{cases}$$
It is easy to see that under $P_n$, $\hat f_{n,k_n} \xrightarrow{P} \frac{a_n-b_n}{\sqrt{2(a_n+b_n)}} = \sqrt{\frac c2}$ as $k_n \to \infty$. We have seen earlier that under $P_n$,
$$\frac{\hat d_n - \frac{a_n+b_n}{2}}{\sqrt{a_n+b_n}} \xrightarrow{P} 0 \implies \frac{\hat d_n - \frac{a_n+b_n}{2}}{a_n+b_n} \xrightarrow{P} 0 \implies \sqrt{\frac{\hat d_n}{(a_n+b_n)/2}} \xrightarrow{P} 1$$
$$\implies \sqrt{\hat d_n} - \sqrt{\frac{a_n+b_n}{2}} = o_p\big(\sqrt{a_n+b_n}\big) = o_p(a_n-b_n). \qquad (5.9)$$
So $2\sqrt{\hat d_n}\,\hat f_{n,k_n} - (a_n - b_n) = o_p(a_n - b_n)$ under $P_n$. As a consequence, the estimators
$$\hat A = \hat d_n + \sqrt{\hat d_n}\,\hat f_{n,k_n} \quad\text{and}\quad \hat B = \hat d_n - \sqrt{\hat d_n}\,\hat f_{n,k_n}$$
have the required property. This concludes the proof. □
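A direct transcription of these estimators (our own sketch, reusing signed_cycle from earlier; the choice $k_n = 3$ is only for speed, and the construction is only meaningful above the threshold of part ii) might look as follows.

```python
import numpy as np

def estimate_ab(x, k=3):
    """Estimators (A_hat, B_hat) built from the average degree d_hat and the
    signed-cycle statistic f_hat, as in the proof of part ii)."""
    n = x.shape[0]
    d_hat = x.sum() / n                  # average degree
    p_hat = d_hat / (n - 1)              # plug-in edge probability
    c_val = signed_cycle(x, k, p_hat)
    f_hat = (np.sqrt(2 * k) * c_val) ** (1 / k) if c_val > 0 else 0.0
    return d_hat + np.sqrt(d_hat) * f_hat, d_hat - np.sqrt(d_hat) * f_hat
```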
6 Non-reconstruction: proof of Theorem 2.3

In this section we provide a proof of the non-reconstruction results stated in Theorem 2.3. Our proof technique relies on a fine analysis of some conditional probabilities. Technically, this proof is closely related to the non-reconstruction proof in Section 6.2 of Banks et al. [6] rather than the original proof given in Mossel et al. [25]. At first we prove one proposition and one lemma which will be crucial for our proof.

Proposition 6.1. Suppose $a_n, b_n \to \infty$, $\frac{a_n}{n} \to p \in [0,1)$ and $c := \lim_{n\to\infty}\frac{(a_n-b_n)^2}{a_n+b_n} < 2(1-p)$. Then for any fixed $r$ and any two configurations $(\sigma_1^{(1)},\ldots,\sigma_r^{(1)})$, $(\sigma_1^{(2)},\ldots,\sigma_r^{(2)})$,
$$\mathrm{TV}\big(P_n(G\,|\,(\sigma_1^{(1)},\ldots,\sigma_r^{(1)})),\ P_n(G\,|\,(\sigma_1^{(2)},\ldots,\sigma_r^{(2)}))\big) = o(1).$$
Here $\mathrm{TV}(\mu_1,\mu_2)$ is the total variation distance between two probability measures $\mu_1$ and $\mu_2$.

Proof. We know that
$$\mathrm{TV}\big(P_n(G\,|\,\sigma_u^{(1)},\, u\in[r]),\ P_n(G\,|\,\sigma_u^{(2)},\, u\in[r])\big) = \frac12\sum_G\big|P_n(G\,|\,\sigma^{(1)}) - P_n(G\,|\,\sigma^{(2)})\big|\frac{\sqrt{P'_n(G)}}{\sqrt{P'_n(G)}}$$
$$\le \frac12\Big[\sum_G P'_n(G)\Big]^{\frac12}\Big[\sum_G\frac{\big(P_n(G|\sigma^{(1)}) - P_n(G|\sigma^{(2)})\big)^2}{P'_n(G)}\Big]^{\frac12} = \frac12\Big[\sum_G\frac{\big(\sum_{\tilde\sigma}P_n(\tilde\sigma)\big(P_n(G|\sigma^{(1)},\tilde\sigma) - P_n(G|\sigma^{(2)},\tilde\sigma)\big)\big)^2}{P'_n(G)}\Big]^{\frac12}. \qquad (6.1)$$
Here $\sigma^{(1)} := (\sigma_1^{(1)},\ldots,\sigma_r^{(1)})$, $\sigma^{(2)} := (\sigma_1^{(2)},\ldots,\sigma_r^{(2)})$ and $\tilde\sigma$ is any configuration on $\{r+1,\ldots,n\}$. Now observe that
$$\Big(\sum_{\tilde\sigma}P_n(\tilde\sigma)\big(P_n(G|\sigma^{(1)},\tilde\sigma) - P_n(G|\sigma^{(2)},\tilde\sigma)\big)\Big)^2 = \sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\Big(P_n(G|\sigma^{(1)},\tilde\sigma)P_n(G|\sigma^{(1)},\tilde\tau) + P_n(G|\sigma^{(2)},\tilde\sigma)P_n(G|\sigma^{(2)},\tilde\tau)$$
$$- P_n(G|\sigma^{(1)},\tilde\sigma)P_n(G|\sigma^{(2)},\tilde\tau) - P_n(G|\sigma^{(2)},\tilde\sigma)P_n(G|\sigma^{(1)},\tilde\tau)\Big). \qquad (6.2)$$
We shall prove that the value of
$$\sum_G\frac{\sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\,P_n(G|\sigma^{(i)},\tilde\sigma)\,P_n(G|\sigma^{(j)},\tilde\tau)}{P'_n(G)} \qquad (6.3)$$
doesn't depend on $\sigma^{(i)}$ and $\sigma^{(j)}$ up to $o(1)$ terms. This will prove that the final expression in (6.1) goes to 0, and as a consequence the proof of Proposition 6.1 will be complete.

At first we recall the definition of $W_{uv}(G,\sigma)$ from (5.1). It is easy to observe that
$$\sum_G\frac{\sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\,P_n(G|\sigma^{(1)},\tilde\sigma)\,P_n(G|\sigma^{(2)},\tilde\tau)}{P'_n(G)} = \frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\sum_G\prod_{u,v}W_{uv}(G,(\sigma^{(1)},\tilde\sigma))\,W_{uv}(G,(\sigma^{(2)},\tilde\tau))\,P'_n(G)$$
$$= \frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\prod_{u,v}E_{P'_n}\big(W_{uv}(G,(\sigma^{(1)},\tilde\sigma))\,W_{uv}(G,(\sigma^{(2)},\tilde\tau))\big). \qquad (6.4)$$
Observe that the sum in the final expression of (6.4) is taken over $(\tilde\sigma,\tilde\tau)$, so the configurations $\sigma^{(1)}$ and $\sigma^{(2)}$ remain unchanged. Now let us introduce the following parameters:
$$\rho_{fix} := \frac1r\sum_{i=1}^r\sigma_i^{(1)}\sigma_i^{(2)}, \qquad S_\pm^{fix} := \sum_{u,v\in[r]}I\{\sigma_u^{(1)}\sigma_v^{(1)}\sigma_u^{(2)}\sigma_v^{(2)} = \pm1\}, \qquad (6.5)$$
where $I_A$ denotes the indicator variable corresponding to the set $A$. We similarly define
$$\rho(\tilde\sigma,\tilde\tau) := \frac{1}{n-r}\sum_{i=r+1}^n\tilde\sigma_i\tilde\tau_i, \qquad S_\pm(\tilde\sigma,\tilde\tau) := \sum_{u,v\in\{r+1,\ldots,n\}}I\{\tilde\sigma_u\tilde\sigma_v\tilde\tau_u\tilde\tau_v = \pm1\}. \qquad (6.6)$$
By arguments similar to the proof of Lemma 5.1, one can show that the final expression of (6.4) further simplifies to
$$(1+o(1))\Big(1+\frac{t_n}{n}\Big)^{S_+^{fix}}\Big(1-\frac{t_n}{n}\Big)^{S_-^{fix}}\frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\Big(1+\frac{t_n}{n}\Big)^{\frac{(1+\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}-\frac{n-r}{2}}\Big(1-\frac{t_n}{n}\Big)^{\frac{(1-\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}}. \qquad (6.7)$$
Now $S_+^{fix}$ and $S_-^{fix}$ are both bounded by $r^2$, and $t_n = (1+o(1))t$. So
$$\Big(1+\frac{t_n}{n}\Big)^{S_+^{fix}}\Big(1-\frac{t_n}{n}\Big)^{S_-^{fix}} = 1+o(1).$$
On the other hand, one can repeat the arguments in the proof of Lemma 5.1 to conclude that
$$\frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\Big(1+\frac{t_n}{n}\Big)^{\frac{(1+\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}-\frac{n-r}{2}}\Big(1-\frac{t_n}{n}\Big)^{\frac{(1-\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}} \to \frac{1}{\sqrt{1-t}}\exp\Big(-\frac t2-\frac{t^2}{4}\Big).$$
As a result,
$$\sum_G\frac{\sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\,P_n(G|\sigma^{(1)},\tilde\sigma)\,P_n(G|\sigma^{(2)},\tilde\tau)}{P'_n(G)} = (1+o(1))\frac{1}{\sqrt{1-t}}\exp\Big(-\frac t2-\frac{t^2}{4}\Big),$$
irrespective of the values of $\sigma^{(1)}$ and $\sigma^{(2)}$. So all four terms arising from (6.2) converge to the same limit, the final expression in (6.1) goes to 0, and the proof is complete. □

We now prove the following easy consequence of Proposition 6.1, which states that the posterior distribution of a single label is essentially unchanged if we know a bounded number of other labels.
Lemma 6.1. Suppose $S$ is a set of nodes of finite cardinality $r$, $u \notin S$ is a fixed node, and $\pi$ gives probability $\frac12$ to both $\pm1$. Then under the conditions of Proposition 6.1,
$$E\big[\mathrm{TV}(P_n(\sigma_u = \cdot\,|\,G,\sigma_S),\ \pi)\,\big|\,\sigma_S\big] = o(1).$$

Proof. Observe that $P_n(\sigma_u = i) = \pi(i)$ from the model assumption. So
$$E\big[\mathrm{TV}(P_n(\sigma_u|G,\sigma_S),\pi)\,\big|\,\sigma_S\big] = \sum_G\frac12\sum_{i=\pm1}\big|P_n(\sigma_u=i|G,\sigma_S) - P_n(\sigma_u=i)\big|\,P_n(G|\sigma_S)$$
$$= \frac12\sum_{i=\pm1}P_n(\sigma_u=i)\sum_G\Big|\frac{P_n(\sigma_u=i|G,\sigma_S)}{P_n(\sigma_u=i)} - 1\Big|\,P_n(G|\sigma_S)$$
$$= \frac12\sum_{i=\pm1}P_n(\sigma_u=i)\sum_G\Big|\frac{P_n(\sigma_u=i\cap G\cap\sigma_S)\,P_n(\sigma_S)}{P_n(\sigma_u=i\cap\sigma_S)\,P_n(G\cap\sigma_S)} - 1\Big|\,P_n(G|\sigma_S)$$
$$= \frac12\sum_{i=\pm1}P_n(\sigma_u=i)\sum_G\Big|\frac{P_n(G|\sigma_S,\sigma_u=i)}{P_n(G|\sigma_S)} - 1\Big|\,P_n(G|\sigma_S). \qquad (6.8)$$
Observe that
$$P_n(G|\sigma_S) = \frac12\big(P_n(G|\sigma_S,\sigma_u=1) + P_n(G|\sigma_S,\sigma_u=-1)\big).$$
As a consequence, the final expression on the R.S. of (6.8) becomes
$$\frac12\sum_{i=\pm1}P_n(\sigma_u=i)\,\mathrm{TV}\big(P_n(G|\sigma_S,\sigma_u=i),\ P_n(G|\sigma_S,\sigma_u=-i)\big).$$
So the proof is complete by applying Proposition 6.1. □
With Proposition 6.1 and Lemma 6.1 in hand, we now give a proof of Theorem 2.3.
Proof of Theorem 2.3. We only prove part i) of Theorem 2.3; the proof of part ii) is similar. Let $\hat\sigma$ be any estimate of the labeling of the nodes, $\sigma$ the true labeling, and $f: \{1,2\} \to \{\pm1\}$ the function with $f(1) = 1$ and $f(2) = -1$. Then
$$\mathrm{ov}(\sigma,\hat\sigma) = \frac2n\Big[N_{11} + N_{22} - \frac1n(N_{1\cdot}N_{\cdot1}) - \frac1n(N_{2\cdot}N_{\cdot2})\Big]. \qquad (6.9)$$
Here
$$N_{ij} = \big|\sigma^{-1}\{f(i)\}\cap\hat\sigma^{-1}\{f(j)\}\big|, \qquad N_{i\cdot} = \big|\sigma^{-1}\{f(i)\}\big|, \qquad N_{\cdot j} = \big|\hat\sigma^{-1}\{f(j)\}\big|. \qquad (6.10)$$
So it is sufficient to prove that
$$\frac{1}{n^2}E_{P_n}\Big[\Big(N_{ii} - \frac1n N_{i\cdot}N_{\cdot i}\Big)^2\Big] = \frac{1}{n^2}E_{P_n}\Big[N_{ii}^2 - \frac2n N_{ii}N_{i\cdot}N_{\cdot i} + \frac{1}{n^2}N_{i\cdot}^2N_{\cdot i}^2\Big] \to 0 \quad \text{for } i\in\{1,2\}.$$
Now
$$E_{P_n}\big[N_{ii}^2\big] = E_{P_n}\Big[\sum_{u,v}I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\Big]$$
$$= E_{P_n}\Big[E\Big[\sum_{u,v}I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\,\Big|\,G\Big]\Big]$$
$$= E_{P_n}\Big[\sum_{u,v}E\big[I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}\,\big|\,G\big]\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\Big]. \qquad (6.11)$$
The last step follows from the fact that $\hat\sigma$ is a function of $G$. Now
$$E\big[I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}\,\big|\,G\big] = E\big[I\{\sigma_u=f(i)\}\,\big|\,G,\sigma_v=f(i)\big]\,P_n(\sigma_v=f(i)\,|\,G)$$
$$= (\pi(f(i)) + o(1))\,\frac{P_n(G\,|\,\sigma_v=f(i))\,P_n(\sigma_v=f(i))}{P_n(G)} = (\pi(f(i)) + o(1))\,\pi(f(i))\,\frac{P_n(G\,|\,\sigma_v=f(i))}{P_n(G)},$$
where the second step follows from Lemma 6.1. As a consequence,
$$\Big|E_{P_n}\Big[E\Big[\sum_{u,v}\big(I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\} - \pi(f(i))^2\big)\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\,\Big|\,G\Big]\Big]\Big|$$
$$\le E_{P_n}\Big[\sum_{u,v}\Big|E\big[\big(I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\} - \pi(f(i))^2\big)\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\,\big|\,G\big]\Big|\Big]$$
$$= E_{P_n}\Big[\sum_{u,v}\Big|\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\Big(\frac{P_n(G|\sigma_v=f(i))}{P_n(G)} - 1\Big) + o(1)\Big|\Big]$$
$$\le \sum_{u,v}\frac14\sum_G\big|P_n(G|\sigma_v=f(i)) - P_n(G)\big| + o(n^2) = o(n^2). \qquad (6.12)$$
Here the last step follows from Proposition 6.1: since $P_n(G) = \frac12 P_n(G|\sigma_v=f(i)) + \frac12 P_n(G|\sigma_v=-f(i))$, the total variation distance between $P_n(\cdot|\sigma_v=f(i))$ and $P_n(\cdot)$ is also $o(1)$. So we have
$$E_{P_n}\big[N_{ii}^2\big] = \sum_{u,v}E_{P_n}\big[\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\big] + o(n^2). \qquad (6.13)$$
Similar calculations prove that
$$E_{P_n}\big[N_{ii}N_{i\cdot}N_{\cdot i}\big] = n\sum_{u,v}E_{P_n}\big[\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\big] + o(n^3) \qquad (6.14)$$
and
$$E_{P_n}\big[N_{i\cdot}^2N_{\cdot i}^2\big] = n^2\sum_{u,v}E_{P_n}\big[\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\big] + o(n^4). \qquad (6.15)$$
Plugging in these estimates, we have
$$\frac{1}{n^2}E_{P_n}\Big[\Big(N_{ii} - \frac1n N_{i\cdot}N_{\cdot i}\Big)^2\Big] = o(1).$$
This completes the proof. □
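The dichotomy of Theorems 2.1–2.3 is easy to see numerically. The experiment below is entirely ours: it uses sample_sbm and overlap from the earlier sketches together with a naive second-eigenvector guess (which is not the algorithm of [24] or [22]), and contrasts a parameter pair below the threshold $(a-b)^2 = 2(a+b)$ with one far above it.

```python
import numpy as np

def spectral_guess(x):
    """Sign of the second-largest eigenvector of the adjacency matrix."""
    vals, vecs = np.linalg.eigh(x)
    s = np.sign(vecs[:, -2])
    s[s == 0] = 1
    return s

rng = np.random.default_rng(1)
n = 2000
for a, b in [(55.0, 45.0), (80.0, 20.0)]:   # (a-b)^2 / (2(a+b)) = 0.5 and 18
    x, sigma = sample_sbm(n, a / n, b / n, rng)
    ratio = (a - b) ** 2 / (2 * (a + b))
    print(ratio, abs(overlap(sigma, spectral_guess(x))))
```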
References

[1] E. Abbe and C. Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. ArXiv e-prints, Dec. 2015. URL https://arxiv.org/abs/1512.09080.
[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. CoRR, abs/1405.3267, 2014. URL http://arxiv.org/abs/1405.3267.
[3] G. W. Anderson and O. Zeitouni. A CLT for a band matrix model. Probab. Theory Related Fields, 134(2):283–338, 2006.
[4] G. W. Anderson, A. Guionnet, and O. Zeitouni. An Introduction to Random Matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
[5] D. Banerjee and A. Bose. Largest eigenvalue of large random block matrices: a combinatorial approach. Tech. Report, 2016.
[6] J. Banks, C. Moore, J. Neeman, and P. Netrapalli. Information-theoretic thresholds for community detection in sparse networks. ArXiv e-prints, July 2016. URL https://arxiv.org/abs/1607.01760.
[7] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068–21073, 2009.
[8] R. B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In 28th Annual Symposium on Foundations of Computer Science, pages 280–285, 1987.
[9] C. Bordenave, M. Lelarge, and L. Massoulié. Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. ArXiv e-prints, Jan. 2015. URL http://arxiv.org/pdf/1501.06087v2.pdf.
[10] S. Bubeck, J. Ding, R. Eldan, and M. Rácz. Testing for high-dimensional geometry in random graphs. ArXiv e-prints, Nov. 2014. URL http://arxiv.org/abs/1411.5713.
[11] T. N. Bui, S. Chaudhuri, F. T. Leighton, and M. Sipser. Graph bisection algorithms with good average case behavior. Combinatorica, 7(2):171–191, 1987.
[12] T. Carleman. Les fonctions quasi analytiques (in French). Leçons professées au Collège de France, 1926.
[13] A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability & Computing, 19(2):227–284, 2010.
[14] A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model, pages 221–232. Springer Berlin Heidelberg, Berlin, Heidelberg, 1999.
[15] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, Dec. 2011. URL https://arxiv.org/abs/1109.3041.
[16] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.
[17] M. E. Dyer and A. M. Frieze. The solution of some random NP-hard problems in polynomial expected time. J. Algorithms, 10(4):451–489, Dec. 1989.
[18] L. Isserlis. On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12(1/2):134–139, 1918.
[19] S. Janson. Random regular graphs: asymptotic distributions and contiguity. Combin. Probab. Comput., 4(4):369–405, 1995.
[20] S. C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.
[21] C. L. Mallows. A note on asymptotic joint normality. Ann. Math. Statist., 43(2):508–515, 1972.
[22] L. Massoulié. Community detection thresholds and the weak Ramanujan property. CoRR, abs/1311.3085, 2013. URL http://arxiv.org/abs/1311.3085.
[23] F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 529–537, Oct. 2001.
[24] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. ArXiv e-prints, Nov. 2013. URL https://arxiv.org/abs/1311.4115.
[25] E. Mossel, J. Neeman, and A. Sly. Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields, 162(3-4):431–461, 2015.
[26] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for the planted bisection model. Electron. J. Probab., 21:1–24, 2016.
[27] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566–2572, 2002.
[28] J. K. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2):945–959, 2000.
[29] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist., 39(4):1878–1915, 2011.
[30] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905, Aug. 2000.
[31] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision. Thomson-Engineering, 2007.
[32] G. C. Wick. The evaluation of the collision matrix. Phys. Rev., 80:268–272, Oct. 1950.
[33] N. C. Wormald. Models of random regular graphs. In J. D. Lamb and D. A. Preece, editors, Surveys in Combinatorics, 1999, pages 239–298. Cambridge University Press, 1999.
7 Appendix

Here we only give a very brief description of the combinatorial aspects of random matrix theory required to prove Lemma 4.3. For more general information one should look at Chapter 1 of Anderson et al. [4] and Anderson and Zeitouni [3]. The definitions in this section have been taken from Anderson et al. [4] and Anderson and Zeitouni [3].

Definition 7.1 (S words). Given a set $S$, an $S$ letter $s$ is simply an element of $S$. An $S$ word $w$ is a finite sequence of letters $s_1\cdots s_n$, at least one letter long. An $S$ word $w$ is closed if its first and last letters are the same. Two $S$ words $w_1, w_2$ are called equivalent, denoted $w_1 \sim w_2$, if there is a bijection on $S$ that maps one into the other. When $S = \{1,\ldots,N\}$ for some finite $N$, we use the term $N$ word. Otherwise, if the set $S$ is clear from the context, we refer to an $S$ word simply as a word. For any word $w = s_1\cdots s_k$, we use $l(w) = k$ to denote the length of $w$, define the weight $\mathrm{wt}(w)$ as the number of distinct elements of the set $\{s_1,\ldots,s_k\}$, and define the support of $w$, denoted $\mathrm{supp}(w)$, as the set of letters appearing in $w$. With any word $w$ we may associate an undirected graph, with $\mathrm{wt}(w)$ vertices and at most $l(w)-1$ edges, as follows.

Definition 7.2 (Graph associated with a word). Given a word $w = s_1\cdots s_k$, we let $G_w = (V_w, E_w)$ be the graph with set of vertices $V_w = \mathrm{supp}(w)$ and (undirected) edges $E_w = \{\{s_i, s_{i+1}\},\ i = 1,\ldots,k-1\}$. The graph $G_w$ is connected, since the word $w$ defines a path connecting all the vertices of $G_w$, which further starts and terminates at the same vertex if the word is closed. For $e \in E_w$, we use $N_e^w$ to denote the number of times this path traverses the edge $e$ (in either direction). We note that equivalent words generate the same graphs $G_w$ (up to graph isomorphism) and the same passage counts $N_e^w$.

Definition 7.3 (Sentences and corresponding graphs). A sentence $a = [w_i]_{i=1}^n = [[\alpha_{i,j}]_{j=1}^{l(w_i)}]_{i=1}^n$ is an ordered collection of $n$ words of lengths $l(w_1),\ldots,l(w_n)$ respectively. We define the graph $G_a = (V_a, E_a)$ to be the graph with
$$V_a = \mathrm{supp}(a), \qquad E_a = \big\{\{\alpha_{i,j},\alpha_{i,j+1}\}\ \big|\ i = 1,\ldots,n;\ j = 1,\ldots,l(w_i)-1\big\}.$$

Definition 7.4 (Weak CLT sentences). A sentence $a = [w_i]_{i=1}^n$ is called a weak CLT sentence if the following conditions hold:
1. All the words $w_i$ are closed.
2. Jointly, the words $w_i$ visit each edge of $G_a$ at least twice.
3. For each $i \in \{1,\ldots,n\}$ there is another $j \neq i$ in $\{1,\ldots,n\}$ such that $G_{w_i}$ and $G_{w_j}$ have at least one edge in common.

Note that these definitions are consistent with the ones given in Section 4; there, however, we defined them only in the specific cases required for our problem. In order to prove Lemma 4.3 we require the following result from Anderson et al. [4].

Lemma 7.1 (Lemma 2.1.23 in Anderson et al. [4]). Let $\mathcal W_{k,t}$ denote the set of equivalence classes corresponding to all closed words $w$ of length $k+1$ with $\mathrm{wt}(w) = t$ such that each edge in $G_w$ is traversed at least twice. Then for $k > 2t-2$,
$$\#\mathcal W_{k,t} \le 2^k k^{3(k-2t+2)}.$$
Assuming Lemma 7.1, we now prove Lemma 4.3.
Proof of Lemma 4.3. Let $a = [w_i]_{i=1}^m$ be a weak CLT sentence such that $G_a$ has $C(a)$ many connected components. At first we introduce a partition $\eta(a)$ in the following way: we put $i$ and $j$ in the same block of $\eta(a)$ if $G_{w_i}$ and $G_{w_j}$ share an edge. At first we fix such a partition $\eta$ and consider all the sentences with $\eta(a) = \eta$. Let $C(\eta)$ be the number of blocks in $\eta$. It is easy to observe that for any $a$ with $\eta(a) = \eta$ we have $C(\eta) = C(a)$; from now on we denote $C(\eta)$ by $C$ for convenience. Let $a$ be any weak CLT sentence such that $\eta(a) = \eta$. We now propose an algorithm to embed $a$ into $C$ ordered closed words $(W_1,\ldots,W_C)$ such that the equivalence class of each $W_i$ belongs to $\mathcal W_{L_i,t_i}$ for some numbers $L_i$ and $t_i$. A similar type of argument can be found in Claim 3 of the proof of Theorem 2.2 in Banerjee and Bose (2016) [5].

An embedding algorithm. Let $B_1,\ldots,B_C$ be the blocks of the partition $\eta$, ordered in the following way: let $m_i = \min\{j: j \in B_i\}$ and order the blocks so that $m_1 < m_2 < \cdots < m_C$. Given a partition $\eta$ this ordering is unique. Let $B_i = \{i(1) < i(2) < \cdots < i(l(B_i))\}$, where $l(B_i)$ denotes the number of elements in $B_i$. For each $B_i$ we embed the sentence $a_i = [w_{i(j)}]_{1\le j\le l(B_i)}$ into $W_i$ sequentially in the following manner.

1. Let $S_1 = \{i(1)\}$ and $w^1 = w_{i(1)}$.
2. For each $1 \le c \le l(B_i) - 1$:
   • Consider $w^c = (\alpha_{1,c},\ldots,\alpha_{l(w^c),c})$ and $S_c \subset B_i$. Let $ne \in B_i\setminus S_c$ be the index such that the following two conditions hold: (a) $G_{w^c}$ and $G_{w_{ne}}$ share at least one edge $e = \{\alpha_{\kappa,c},\alpha_{\kappa+1,c}\}$; (b) $\kappa$ is minimal among all such choices.
   • Let $w_{ne} = (\beta_{1,c},\ldots,\beta_{l(w_{ne}),c})$ and let $\{\beta_{\kappa_1,c},\beta_{\kappa_1+1,c}\}$ be the first appearance of $e$ in $w_{ne}$. As $\{\beta_{\kappa_1,c},\beta_{\kappa_1+1,c}\} = \{\alpha_{\kappa,c},\alpha_{\kappa+1,c}\}$, $\alpha_{\kappa,c}$ equals either $\beta_{\kappa_1,c}$ or $\beta_{\kappa_1+1,c}$. Let $\kappa_2 \in \{\kappa_1,\kappa_1+1\}$ be such that $\alpha_{\kappa,c} = \beta_{\kappa_2,c}$. If $\beta_{\kappa_1,c} = \beta_{\kappa_1+1,c}$, then we simply take $\kappa_2 = \kappa_1$.
   • We now generate $w^{c+1}$ in the following way:
$$w^{c+1} = (\alpha_{1,c},\ldots,\alpha_{\kappa,c},\ \beta_{\kappa_2+1,c},\ldots,\beta_{l(w_{ne}),c},\ \beta_{2,c},\ldots,\beta_{\kappa_2,c},\ \alpha_{\kappa+1,c},\ldots,\alpha_{l(w^c),c}).$$
Let $\tilde a_c := (w^c, w_{ne})$. It is easy to observe by induction that all the $w^c$'s are closed words, and so are all the $w_{ne}$'s. So all the edges in the graph $G_{\tilde a_c}$ are preserved, along with their passage counts, in $G_{w^{c+1}}$.
   • Generate $S_{c+1} = S_c \cup \{ne\}$.
3. Return $W_i = w^{l(B_i)}$.

In the preceding algorithm we have actually defined a function $f$ which maps any weak CLT sentence $a$ to $C$ ordered closed words $(W_1,\ldots,W_C)$ such that the equivalence class of each $W_i$ belongs to $\mathcal W_{L_i,t_i}$ for some numbers $L_i$ and $t_i$. Observe also that $L_i < \sum_{j\in B_i} l(w_j)$ and $t_i < \frac{L_i}{2} + 1$.

Unfortunately, $f$ is not an injective map. So, given $(W_1,\ldots,W_C)$, we find an upper bound on the cardinality of the set
$$f^{-1}(W_1,\ldots,W_C) := \{a\ |\ f(a) = (W_1,\ldots,W_C)\}.$$
We have argued earlier that $C$ is the number of blocks in $\eta$. However, in general $(W_1,\ldots,W_C)$ specifies neither the partition $\eta$ nor the order in which the words are concatenated within each block $B_i$ of $\eta$. So we fix a partition $\eta$ with $C$ many blocks and an order of concatenation $O$. Observe that $O = (\sigma_1(\eta),\ldots,\sigma_C(\eta))$, where for each $i$, $\sigma_i(\eta)$ is a permutation of the elements in $B_i$. Now we give a uniform upper bound on the cardinality of the set
$$f^{-1}_{\eta,O}(W_1,\ldots,W_C) := \{a\ |\ \eta(a) = \eta,\ O(a) = O\ \&\ f(a) = (W_1,\ldots,W_C)\}.$$
Observe that $W_i$ is formed by recursively applying step 2 to $(w^c, w_{ne})$ for $1 \le c \le l(B_i) - 1$. Given a word $w = (\alpha_1,\ldots,\alpha_{l(w)})$, we want to find the number of two-word sentences $(w_1, w_2)$ such that applying step 2 of the algorithm to $(w_1, w_2)$ gives $w$ as output. This is equivalent to choosing three positions $i_1 < i_2 < i_3$ from the set $\{1,\ldots,l(w)\}$ such that $\alpha_{i_1} = \alpha_{i_3}$. Once these three positions are chosen, $(w_1, w_2)$ can be reconstructed uniquely in the following manner:
$$w_1 = (\alpha_1,\ldots,\alpha_{i_1},\ \alpha_{i_3+1},\ldots,\alpha_{l(w)}), \qquad w_2 = (\alpha_{i_2},\ldots,\alpha_{i_3},\ \alpha_{i_1+1},\ldots,\alpha_{i_2}).$$
The total number of choices of $i_1 < i_2 < i_3$ is bounded by $l(w)^3 \le \big(\sum_{i=1}^m l(w_i)\big)^3$. For each block $B_i$, step 2 of the algorithm is used $l(B_i) - 1$ many times. So
$$\#f^{-1}_{\eta,O}(W_1,\ldots,W_C) \le \Big(\sum_{i=1}^m l(w_i)\Big)^{3\sum_{i=1}^C(l(B_i)-1)} \le \Big(\sum_{i=1}^m l(w_i)\Big)^{3m}.$$
On the other hand, there are at most $m^m$ many $\eta$'s, and for each $\eta$ there are at most $\prod_{i=1}^C l(B_i)! \le m^m$ choices of $O$. So
$$\#f^{-1}(W_1,\ldots,W_C) \le m^{2m}\Big(\sum_{i=1}^m l(w_i)\Big)^{3m} \le D_1\Big(\sum_{i=1}^m l(w_i)\Big)^{D_2 m} \qquad (7.1)$$
for some known constants $D_1$ and $D_2$. Now we fix the sequence $(L_i, t_i)$ and find an upper bound on the number of possible $(W_1,\ldots,W_C)$. From Lemma 7.1 we know that the number of choices of $W_i$ is bounded by $2^{L_i-1}(L_i-1)^{3(L_i-1-2t_i+2)}\,n^{t_i}$. So the total number of choices of $(W_1,\ldots,W_C)$ is bounded by
$$2^{\sum_{i=1}^m l(w_i)}\prod_{i=1}^C(L_i-1)^{3(L_i-1-2t_i+2)}\,n^{t_i} \le 2^{\sum_{i=1}^m l(w_i)}\,n^t\,\Big(\sum_{i=1}^m l(w_i)\Big)^{3(\sum_{i=1}^m l(w_i)-2t)}\Big(\sum_{i=1}^m l(w_i)\Big)^{3m}. \qquad (7.2)$$
Now the number of choices of $(L_i, t_i)$ such that $\sum_{i=1}^C L_i$ is fixed and $\sum_{i=1}^C t_i = t$ is bounded by
$$\binom{\sum_{i=1}^m l(w_i) - 1}{C-1}\binom{t-1}{C-1} \le \Big(\sum_{i=1}^m l(w_i)\Big)^{2m}. \qquad (7.3)$$
Here the inequality follows since $C \le m$ and $t \le \sum_{i=1}^m \frac{l(w_i)}{2} - 1$. Combining (7.1), (7.2) and (7.3) now gives (4.11) and completes the proof of Lemma 4.3. □