Contiguity and non-reconstruction results for planted partition models: the dense case
Debapratim Banerjee
Dept. of Statistics, University of Pennsylvania
[email protected]
November 15, 2016
Abstract
We consider the two block stochastic block model on $n$ nodes with asymptotically equal cluster sizes. The connection probabilities within and between clusters are denoted by $p_n := \frac{a_n}{n}$ and $q_n := \frac{b_n}{n}$ respectively. Mossel et al. [25] considered the case when $a_n = a$ and $b_n = b$ are fixed. They proved that the probability models of the stochastic block model and of the Erdős–Rényi graph with the same average degree are mutually contiguous whenever $(a-b)^2 < 2(a+b)$ and are asymptotically singular whenever $(a-b)^2 > 2(a+b)$. Mossel et al. [25] also proved that when $(a-b)^2 < 2(a+b)$ no algorithm is able to find an estimate of the labeling of the nodes which is positively correlated with the true labeling. It is natural to ask what happens when $a_n$ and $b_n$ both grow to infinity. We prove that their results extend to the case when $a_n = o(n)$ and $b_n = o(n)$. We also consider the case when $\frac{a_n}{n} \to p \in (0,1)$ and $(a_n - b_n) = \Theta(\sqrt{a_n+b_n})$; observe that in this case $\frac{b_n}{n} \to p$ also. We show that here the models are mutually contiguous if $(a_n-b_n)^2 < 2(1-p)(a_n+b_n)$ and asymptotically singular if $(a_n-b_n)^2 > 2(1-p)(a_n+b_n)$. Further, we prove that it is impossible to find an estimate of the labeling of the nodes which is positively correlated with the true labeling whenever $(a_n-b_n)^2 < 2(1-p)(a_n+b_n)$. The results of this paper justify the negative part of a conjecture made in Decelle et al. (2011) [15] for dense graphs.

1 Introduction

In the last few years the stochastic block model has been one of the most active domains of modern research in statistics, computer science and many other related fields. In general, a stochastic block model is a network with a hidden community structure where the nodes within the communities are expected to be connected in a different manner than the nodes between the communities. This model arises naturally in many problems of statistics, machine learning and data mining, and its applications extend from population genetics [28], where genetically similar sub-populations act as the clusters, to image processing [30], [31], where groups of similar images act as clusters, to the study of social networks, where groups of like-minded people act as clusters [27]. Recently a huge amount of effort has been dedicated to finding the clusters, and numerous different clustering algorithms have been proposed in the literature. One might look at [20], [16], [11], [17], [8], [7], [14], [29], [23] for some references.

One of the easiest examples of the stochastic block model is the planted partition model, where one has only two clusters of more or less equal size. Formally,
Definition 1.1. For $n \in \mathbb{N}$ and $p, q \in [0,1]$, let $G(n,p,q)$ denote the model of random $\pm 1$-labelled graphs in which each vertex $u$ is assigned (independently and uniformly at random) a label $\sigma_u \in \{\pm 1\}$, and each edge between $u$ and $v$ is included independently with probability $p$ if they have the same label and with probability $q$ if they have different labels.
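For concreteness, here is a minimal sketch of a sampler for $G(n,p,q)$; the function name sample_sbm and the numpy conventions are our own choices for illustration, not part of the paper.

```python
import numpy as np

def sample_sbm(n, p, q, rng=None):
    """Draw (adjacency matrix, labels) from the model G(n, p, q) of Definition 1.1."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.choice([-1, 1], size=n)          # uniform +/-1 labels
    same = np.equal.outer(sigma, sigma)          # True iff endpoints share a label
    prob = np.where(same, p, q)                  # per-pair edge probability
    upper = rng.random((n, n)) < prob            # independent coin flips
    x = np.triu(upper, 1)                        # keep i < j only
    x = (x + x.T).astype(float)                  # symmetrize, no self-loops
    return x, sigma
```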
The case when $p$ and $q$ are sufficiently close to each other has received a significant amount of interest in the literature. Decelle et al. [15] made a fascinating conjecture in this regard.

Conjecture 1.1. Let $p = \frac an$ and $q = \frac bn$, where $a$ and $b$ are fixed real numbers. Then:
i) If $(a-b)^2 > 2(a+b)$, then one can almost surely find a bisection of the vertices which is positively correlated with the original clusters.
ii) If $(a-b)^2 < 2(a+b)$, then the problem is not solvable.
iii) Further, there are no consistent estimators of $a$ and $b$ if $(a-b)^2 < 2(a+b)$, and there are consistent estimators of $a$ and $b$ whenever $(a-b)^2 > 2(a+b)$.

Coja-Oghlan [13] solved part i) of the problem when $(a-b)^2 > C(a+b)$ for some large $C$; finally, parts ii) and iii) of Conjecture 1.1 were proved by Mossel et al. [25], and part i) was solved by Mossel et al. [24] and Massoulié [22] independently.

Typically the problem is much more delicate when more than two communities are present in the sparse case. To keep things simple, let us consider the general stochastic block model with $k$ asymptotically equal sized blocks, with connection probabilities within and between blocks given by $\frac an$ and $\frac bn$ respectively. It was conjectured in Mossel et al. [25] that for $k$ sufficiently large there is a constant $c(k)$ such that the reconstruction problem is solvable in exponential time whenever
$$c(k) < \frac{(a-b)^2}{a+(k-1)b} < k,$$
it is not solvable if $\frac{(a-b)^2}{a+(k-1)b} < c(k)$, and it is solvable in polynomial time if $k < \frac{(a-b)^2}{a+(k-1)b}$. The upper bound is known as the Kesten–Stigum threshold. Bordenave et al. [9] solved the reconstruction problem above this deterministic threshold by spectral analysis of the non-backtracking matrix. One might look at Banks et al. [6] for the non-solvability part. They prove that the probability models of the stochastic block model and of the Erdős–Rényi graph with the same average degree are contiguous, and the reconstruction problem is unsolvable, if
$$d\lambda^2 < \frac{2\log(k-1)}{k-1}.$$
Here $d = \frac{a+(k-1)b}{k}$ and $\lambda = \frac{a-b}{kd}$. Abbe et al. [1] provide an efficient algorithm for reconstruction above the Kesten–Stigum threshold. Abbe et al. [1] and Banks et al. [6] also provide cases strictly below the Kesten–Stigum threshold where the problem is solvable in exponential time.

On the other hand, a different type of reconstruction problem was considered in Mossel et al. [26] for denser graphs. They considered two different notions of recovery. The first one is weak consistency, where one is interested in finding a bisection $\hat\sigma$ such that $\sigma$ and $\hat\sigma$ have correlation going to 1 with high probability. The second one is called strong consistency: here one is interested in finding a bisection $\hat\sigma$ such that $\hat\sigma$ is either $\sigma$ or $-\sigma$ with probability tending to 1. Mossel et al. [26] prove that weak recovery is possible if and only if $\frac{n(p_n-q_n)^2}{p_n+q_n} \to \infty$, and strong recovery is possible if and only if
$$\Big(\frac{a_n+b_n}{2} - \sqrt{a_n b_n} - 1\Big)\log n + \frac12\log\log n \to \infty.$$
Here $a_n = \frac{np_n}{\log n}$ and $b_n = \frac{nq_n}{\log n}$ respectively. Abbe et al. [2] studied the same problem independently in the logarithmic sparsity regime. They prove that for $a = \frac{np_n}{\log n}$ and $b = \frac{nq_n}{\log n}$ fixed, $\frac{a+b}{2} - \sqrt{ab} > 1$ is sufficient for strong consistency and $\frac{a+b}{2} - \sqrt{ab} \ge 1$ is necessary. Parts ii) and iii) of Conjecture 1.1 have not yet been addressed in the dense case (i.e. when $a$ and $b$ increase to infinity), which is the main focus of this paper.

Before stating our results we mention that the results in Mossel et al. [25] are more general than part iii) of Conjecture 1.1. Let $P_n$ and $P'_n$ be the sequences of probability measures induced by $G(n,p,q)$ and $G(n, \frac{p+q}{2}, \frac{p+q}{2})$ respectively. Then [25] prove that whenever $a$ and $b$ are fixed numbers and $(a-b)^2 < 2(a+b)$, the measures $P_n$ and $P'_n$ are mutually contiguous, i.e. for a sequence of events $A_n$, $P_n(A_n) \to 0$ if and only if $P'_n(A_n) \to 0$.
Now part iii) of Conjecture 1.1 directly follows from the contiguity. The proof in Mossel et al. [25] is based on calculating the limiting distribution of the short cycles and using a result on contiguity (Theorem 1 in Janson [19] and Theorem 4.1 in Wormald [33]). However, one should note that the result from [25] doesn't directly generalize to the denser case, since one requires the limiting distributions of the short cycles to be independent Poisson in order to use Janson's result. In our proof, instead of considering the short cycles we consider the "signed cycles" (to be defined later), which have asymptotically normal distributions. We also prove a result analogous to Janson's for normal random variables in order to complete the proof.

On the other hand, the original proof of non-reconstruction in Mossel et al. [25] relies on a coupling of $P_n$ and $P'_n$ with the probability measures induced by Galton–Watson trees of suitable parameters. However, it is well known that when the graph is sufficiently dense, i.e. $a_n \gg n^{o(1)}$, the coupling argument doesn't work. So our proof is based on a fine analysis of some conditional probabilities. Technically, this proof is closely related to the non-reconstruction proof in Section 6.2 of Banks et al. [6] rather than the original proof given in Mossel et al. [25].

The paper is organized in the following manner. In Section 2 we introduce some preliminary notation and state our results. Section 3 is dedicated to building a result analogous to Theorem 1 in Janson [19]. In Section 4 we define signed cycles and find their asymptotic distributions. Section 5 completes the proofs of our contiguity results. In Section 6 we prove the non-reconstruction result. Finally, the paper concludes with an Appendix containing the proof of a result from random matrix theory used in this paper.

2 Notation and main results

Throughout the paper a random graph will be denoted by $G$, and $x_{i,j}$ will denote the indicator random variable corresponding to an edge between the nodes $i$ and $j$. Further, $P_n$ and $P'_n$ will denote the sequences of probability measures induced by $G(n, p_n, q_n)$ and $G(n, \frac{p_n+q_n}{2}, \frac{p_n+q_n}{2})$ respectively. For notational simplicity we denote $\frac{p_n+q_n}{2}$ by $\hat p_n$. Further, for any two labelings $\sigma$ and $\tau$ of the nodes, we define their overlap to be
$$\mathrm{ov}(\sigma,\tau) := \frac1n\sum_{i=1}^n \sigma_i\tau_i - \frac{1}{n^2}\sum_{i=1}^n\sigma_i\sum_{i=1}^n\tau_i. \qquad (2.1)$$
We now state our results.

Theorem 2.1. i) If $a_n, b_n \to \infty$, $a_n = o(n)$ and $(a_n-b_n)^2 < 2(a_n+b_n)$, then the probability measures $P_n$ and $P'_n$ are mutually contiguous. As a consequence, for any sequence of events $A_n$, $P_n(A_n) \to 0$ if and only if $P'_n(A_n) \to 0$. So there doesn't exist an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.
ii) If $a_n, b_n \to \infty$, $a_n = o(n)$ and $(a_n-b_n)^2 > 2(a_n+b_n)$, then the probability measures $P_n$ and $P'_n$ are asymptotically singular. Further, there exists an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.
Theorem 2.2. Suppose $\frac{a_n}{n} \to p \in (0,1)$ and $c := \lim_{n\to\infty}\frac{(a_n-b_n)^2}{a_n+b_n} \in (0,\infty)$. Then the following are true:
i) $P_n$ and $P'_n$ are mutually contiguous whenever $\frac{c}{2(1-p)} < 1$. So there doesn't exist an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.
ii) $P_n$ and $P'_n$ are asymptotically singular whenever $\frac{c}{2(1-p)} > 1$. Further, there exists an estimator $(A_n, B_n)$ for $(a_n, b_n)$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$.

Theorem 2.3. i) If $a_n, b_n \to \infty$, $a_n = o(n)$ and $(a_n-b_n)^2 < 2(a_n+b_n)$, then there is no reconstruction algorithm which performs better than random guessing, i.e. for any estimate $\{\hat\sigma_i\}_{i=1}^n$ of the labeling we have
$$\mathrm{ov}(\sigma, \hat\sigma) \xrightarrow{P} 0. \qquad (2.2)$$
ii) Suppose $\frac{a_n}{n} \to p \in (0,1)$ and $c := \lim_{n\to\infty}\frac{(a_n-b_n)^2}{a_n+b_n} \in (0,\infty)$. Then (2.2) holds when $\frac{c}{2(1-p)} < 1$. As a consequence, no reconstruction algorithm performs better than random guessing.
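The overlap (2.1) controlled by Theorem 2.3 is straightforward to compute; this small sketch (ours, in the same assumed numpy conventions as before) is reused in a later illustration.

```python
import numpy as np

def overlap(sigma, tau):
    """The overlap ov(sigma, tau) of (2.1) for two +/-1 label vectors."""
    n = len(sigma)
    return np.dot(sigma, tau) / n - sigma.sum() * tau.sum() / n**2
```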
3 A result on contiguity

In this section we give a very brief description of contiguity of probability measures. We suggest that the reader have a look at the discussion about contiguity of measures in Janson [19] for further details. In this section we state several propositions; except for Proposition 3.4 and Proposition 3.3, all the proofs can be found in Janson [19].
Definition 3.1. Let $P_n$ and $Q_n$ be two sequences of probability measures such that for each $n$, $P_n$ and $Q_n$ are both defined on the same measurable space $(\Omega_n, \mathcal F_n)$. We say that the sequences are contiguous if for every sequence of measurable sets $A_n \subset \Omega_n$,
$$P_n(A_n) \to 0 \iff Q_n(A_n) \to 0.$$

Definition 3.1 might appear a little abstract. However, the following reformulation is perhaps more useful for understanding the concept of contiguity.
Proposition 3.1. Two sequences of probability measures $P_n$ and $Q_n$ are contiguous if and only if for every $\varepsilon > 0$ there exist $n(\varepsilon)$ and $K(\varepsilon)$ such that for all $n > n(\varepsilon)$ there exists a set $B_n \in \mathcal F_n$ with $P_n(B_n^c), Q_n(B_n^c) \le \varepsilon$ such that
$$K(\varepsilon)^{-1} \le \frac{Q_n(A_n)}{P_n(A_n)} \le K(\varepsilon) \quad \forall\, A_n \subset B_n.$$

Although Proposition 3.1 gives an equivalent condition, verifying this condition is often difficult. However, under the assumption of convergence of $\frac{dQ_n}{dP_n}$, one gets the following simplified result.
Proposition 3.2. Suppose that $L_n = \frac{dQ_n}{dP_n}$, regarded as a random variable on $(\Omega_n, \mathcal F_n, P_n)$, converges in distribution to some random variable $L$ as $n \to \infty$. Then $P_n$ and $Q_n$ are contiguous if and only if $L > 0$ a.s. and $E[L] = 1$.
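A standard illustration (ours, not from the paper) may help fix ideas. Let $P_n$ be the law of $n$ i.i.d. $N(0,1)$ variables and $Q_n$ the law of $n$ i.i.d. $N(\theta/\sqrt n, 1)$ variables. Then
$$L_n = \exp\Big(\frac{\theta}{\sqrt n}\sum_{i=1}^n X_i - \frac{\theta^2}{2}\Big) \xrightarrow{d} L = \exp\Big(\theta Z - \frac{\theta^2}{2}\Big), \qquad Z \sim N(0,1),$$
under $P_n$. Here $L > 0$ a.s. and $E[L] = 1$, so Proposition 3.2 shows the two sequences are mutually contiguous. The limit appearing later in (3.2) has exactly this log-normal form.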
We now introduce the Wasserstein metric, which will be used in the proof of Proposition 3.4.

Definition 3.2. Let $F$ and $G$ be two distribution functions with finite $p$-th moment. Then the Wasserstein distance $W_p$ between $F$ and $G$ is defined to be
$$W_p(F,G) = \Big[\inf_{X\sim F,\, Y\sim G} E|X-Y|^p\Big]^{1/p}.$$
Here $X$ and $Y$ are random variables having distribution functions $F$ and $G$ respectively.

In particular, the following result will be useful in our proof:

Proposition 3.3.
Suppose $F_n$ is a sequence of distribution functions and $F$ is a distribution function. Then $F_n$ converges to $F$ in distribution and $\int x^2\, dF_n(x) \to \int x^2\, dF(x)$ if and only if $W_2(F_n, F) \to 0$.

The proof of Proposition 3.3 is well known; one might look at Mallows (1972) [21] for a reference.
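In one dimension the infimum in Definition 3.2 is attained by the quantile coupling, so for equal-size samples $W_p$ is just the $\ell_p$ distance between the sorted samples. The following sketch (ours) estimates $W_2$ empirically.

```python
import numpy as np

def empirical_w2(xs, ys):
    """Empirical W_2 between two equal-size 1-d samples via the sorting coupling."""
    xs, ys = np.sort(xs), np.sort(ys)
    return np.sqrt(np.mean((xs - ys) ** 2))

rng = np.random.default_rng(0)
print(empirical_w2(rng.normal(0, 1, 10**5), rng.normal(0.3, 1, 10**5)))
# close to 0.3, the W_2 distance between N(0,1) and N(0.3,1)
```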
With Proposition 3.2 in hand, we now state the most important result of this section, which will be used to prove Theorems 2.1 and 2.2. Although Proposition 3.4 is written in completely different notation, one can check that it is analogous to Theorem 1 in Janson [19].

Proposition 3.4. Let $P_n$ and $Q_n$ be two sequences of probability measures such that for each $n$ both are defined on $(\Omega_n, \mathcal F_n)$. Suppose that for each $i \ge 3$, $X_{n,i}$ are random variables defined on $(\Omega_n, \mathcal F_n)$. Then the probability measures $P_n$ and $Q_n$ are mutually contiguous if the following conditions hold:
i) $P_n \ll Q_n$ and $Q_n \ll P_n$ for each $n$.
ii) For each fixed $i \ge 3$, $X_{n,i}\,|\,P_n \xrightarrow{d} Z_i \sim N(0, 2i)$ jointly and $X_{n,i}\,|\,Q_n \xrightarrow{d} Z_i' \sim N(t_i, 2i)$ jointly, where $t_i = t^{i/2}$ for some $|t| < 1$.
iii) $Z_i$ and $Z_i'$ are sequences of independent random variables.
iv)
$$E_{P_n}\Big[\Big(\frac{dQ_n}{dP_n}\Big)^2\Big] \to \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}}. \qquad (3.1)$$
Further,
$$\frac{dQ_n}{dP_n}\,\Big|\,P_n \xrightarrow{d} \exp\Big(\sum_{i=3}^\infty \Big[\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big]\Big). \qquad (3.2)$$
Proof. In this proof, for simplicity, we denote $\frac{dQ_n}{dP_n}$ by $Y_n$. We break the proof into two steps.

Step 1. In this step we prove that the random variable on the R.S. of (3.2) is almost surely positive and has expectation 1. Let us define
$$W = \exp\Big(\sum_{i=3}^\infty \Big[\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big]\Big) \quad\text{and}\quad W^{(m)} = \exp\Big(\sum_{i=3}^m \Big[\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big]\Big).$$
Since $Z_i \sim N(0, 2i)$,
$$E\exp\Big(\frac{t_i Z_i}{2i} - \frac{t_i^2}{4i}\Big) = \exp\Big(\frac{t_i^2}{2(2i)^2}\times 2i - \frac{t_i^2}{4i}\Big) = 1.$$
So $\{W^{(m)}\}_{m=3}^\infty$ is a martingale sequence and
$$E\big[W^{(m)2}\big] = \prod_{i=3}^m \exp\Big(\frac{t_i^2}{2i}\Big) = \exp\Big(\sum_{i=3}^m \frac{t_i^2}{2i}\Big).$$
Now
$$\sum_{i=3}^\infty \frac{t_i^2}{2i} = -\frac12\Big(\log(1-t) + t + \frac{t^2}{2}\Big) \quad \forall\, |t| < 1.$$
So $W^{(m)}$ is an $L^2$ bounded martingale. Hence $W$ is a well defined random variable,
$$E[W^2] = \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}} \quad\text{and}\quad E[W] = 1.$$
Since $Z_i \stackrel{d}{=} -Z_i$ for each $i$, and whenever $|t| < 1$ the series $\sum_{i=3}^\infty \frac{t_i^2}{2i}$ converges,
$$W^{-1} \stackrel{d}{=} \exp\Big(\sum_{i=3}^\infty \Big[\frac{t_i Z_i}{2i} + \frac{t_i^2}{4i}\Big]\Big).$$
However, $E[W^{-1}] = \exp\big\{\sum_{i=3}^\infty \frac{t_i^2}{2i}\big\} < \infty$ implies $W > 0$ a.s.
Step 2. Now we come to the harder task of proving $Y_n \xrightarrow{d} W$. Since $\limsup_{n\to\infty} E_{P_n}[Y_n^2] < \infty$ from condition iv), the sequence $Y_n$ is tight. Hence, by Prokhorov's theorem, there is a subsequence $\{n_k\}_{k=1}^\infty$ such that $Y_{n_k}$ converges in distribution to some random variable $W(\{n_k\})$. We shall prove that the distribution of $W(\{n_k\})$ doesn't depend on the subsequence $\{n_k\}$; in particular, $W(\{n_k\}) \stackrel d= W$. Since $Y_{n_k}$ converges in distribution to $W(\{n_k\})$, for any further subsequence $\{n_{k_l}\}$ of $\{n_k\}$, $Y_{n_{k_l}}$ also converges in distribution to $W(\{n_k\})$.

Given $\varepsilon > 0$, we choose $m$ big enough that
$$\Big|\exp\Big(\sum_{i=3}^\infty \frac{t_i^2}{2i}\Big) - \exp\Big(\sum_{i=3}^m \frac{t_i^2}{2i}\Big)\Big| < \varepsilon.$$
For this $m$, look at the joint distribution of $(Y_{n_k}, X_{n_k,3}, \ldots, X_{n_k,m})$. This sequence of $(m-1)$-dimensional random vectors under $P_{n_k}$ is also tight from condition ii). So it has a further subsequence such that
$$(Y_{n_{k_l}}, X_{n_{k_l},3}, \ldots, X_{n_{k_l},m})\,\big|\,P_{n_{k_l}} \xrightarrow{d} (H, H_3, \ldots, H_m) \in (\Omega(\{n_{k_l}\}), \mathcal F(\{n_{k_l}\}), P(\{n_{k_l}\}))\ \text{(say)}.$$
Here the distribution of $H$ is the same as that of $W(\{n_k\})$, and $(H_3,\ldots,H_m) \stackrel d= (Z_3,\ldots,Z_m)$ from condition ii). The most important part of this proof is to find suitable $\sigma$-algebras $\mathcal F_1 \subset \mathcal F_2 \in \mathcal F(\{n_{k_l}\})$ and a random variable $V^{(m)} \stackrel d= W^{(m)}$ such that $(V^{(m)}, \mathcal F_1)$ and $(H, \mathcal F_2)$ form a pair of martingales.

From condition iv) we have $\limsup_{n\to\infty} E_{P_n}[Y_n^2] < \infty$. As a consequence, the sequence $Y_{n_{k_l}}$ is uniformly integrable. Together with condition i), this gives us $1 = E_{P_{n_{k_l}}}[Y_{n_{k_l}}] \to E[H] = 1$. In other words,
$$1 = \int Y_{n_{k_l}}\, dP_{n_{k_l}} \to \int H\, dP(\{n_{k_l}\}) = 1. \qquad (3.3)$$
Now take any positive bounded continuous function $f: \mathbb R^{m-2} \to \mathbb R$. By Fatou's lemma,
$$\liminf \int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, Y_{n_{k_l}}\, dP_{n_{k_l}} \ge \int f(H_3,\ldots,H_m)\, H\, dP(\{n_{k_l}\}). \qquad (3.4)$$
However, for any constant $\xi$ we have, from (3.3),
$$\xi = \int \xi\, Y_{n_{k_l}}\, dP_{n_{k_l}} \to \int \xi\, H\, dP(\{n_{k_l}\}) = \xi.$$
So (3.4) holds for any bounded continuous function $f$. On the other hand, replacing $f$ by $-f$, we have
$$\lim \int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, Y_{n_{k_l}}\, dP_{n_{k_l}} = \int f(H_3,\ldots,H_m)\, H\, dP(\{n_{k_l}\}). \qquad (3.5)$$
Now applying condition ii), we have
$$\int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, Y_{n_{k_l}}\, dP_{n_{k_l}} = \int f\big(X_{n_{k_l},3},\ldots,X_{n_{k_l},m}\big)\, dQ_{n_{k_l}} \to \int f(H_3',\ldots,H_m')\, dQ. \qquad (3.6)$$
Here $(H_3',\ldots,H_m') \stackrel d= (Z_3',\ldots,Z_m')$ and $Q$ is the measure induced by $(H_3',\ldots,H_m')$. In particular, one can take the measure $Q$ to be defined on $(\Omega(\{n_{k_l}\}), \mathcal F(\{n_{k_l}\}))$ in such a way that $(H_3,\ldots,H_m)$ themselves are distributed as $(H_3',\ldots,H_m')$ under $Q$. This is true due to the following observation:
$$\int f(H_3,\ldots,H_m)\, dQ = \int f(H_3,\ldots,H_m)\, V^{(m)}\, dP(\{n_{k_l}\})$$
for any bounded continuous function $f$, where
$$V^{(m)} := \exp\Big(\sum_{i=3}^m \Big[\frac{t_i H_i}{2i} - \frac{t_i^2}{4i}\Big]\Big) \stackrel d= W^{(m)}.$$
Since $f$ is an arbitrary bounded continuous function, we have
$$\int_A dQ = \int_A V^{(m)}\, dP(\{n_{k_l}\})$$
for any $A \in \sigma(H_3,\ldots,H_m)$. Now, looking back into (3.5) and (3.6), we have
$$\int_A V^{(m)}\, dP(\{n_{k_l}\}) = \int_A H\, dP(\{n_{k_l}\}).$$
Also $V^{(m)}$ is $\sigma(H_3,\ldots,H_m)$ measurable. So $(V^{(m)}, \sigma(H_3,\ldots,H_m))$ and $(H, \sigma(H) \vee \sigma(H_3,\ldots,H_m))$ form a pair of martingales. From Fatou's lemma,
$$E[H^2] \le \liminf_{n\to\infty} E_{P_n}[Y_n^2] = \exp\Big(\sum_{i=3}^\infty \frac{t_i^2}{2i}\Big).$$
As a consequence, in the probability space $(\Omega(\{n_{k_l}\}), \mathcal F(\{n_{k_l}\}), P(\{n_{k_l}\}))$ we have
$$0 \le E|H - V^{(m)}|^2 = E[H^2] - E[V^{(m)2}] < \varepsilon.$$
So $W_2(F_{V^{(m)}}, F_H) < \sqrt\varepsilon$. Here $F_{V^{(m)}}$ and $F_H$ denote the distribution functions corresponding to $V^{(m)}$ and $H$ respectively. As a consequence, $W_2(F_{V^{(m)}}, F_H) \to 0$ as $m \to \infty$. Hence by Proposition 3.3, $V^{(m)} \xrightarrow{d} H$. Using $W^{(m)} \stackrel d= V^{(m)}$, we get $W^{(m)} \xrightarrow{d} H$. On the other hand, we have already proved that $W^{(m)}$ converges to $W$ in $L^2$. So $H \stackrel d= W$. However, we also proved $H \stackrel d= W(\{n_k\})$. Together, they imply $W(\{n_k\}) \stackrel d= W$ as required. □
Remark 3.1. One might observe that the second part of assumption ii) in Proposition 3.4 is slightly weaker than (A2) in Theorem 1 of Janson [19]. For our purpose this is sufficient, since we use the fact that $Y_n = \frac{dQ_n}{dP_n}$; in Theorem 1 of Janson [19], however, $Y_n$ can be any random variable.

4 Signed cycles

We have discussed in the introduction that the proof of Mossel et al. [25] crucially used the fact that the asymptotic distributions of the short cycle counts turn out to be Poisson. However, in the denser case one doesn't get a Poisson limit for the short cycles, so their proof doesn't work there. Here we instead consider the "signed cycles", defined as follows:
Definition 4.1. For a random graph $G$, the signed cycle of length $k$ is defined to be
$$C_{n,k}(G) = \Big(\frac{1}{\sqrt{n\, p_{n,av}(1-p_{n,av})}}\Big)^{k} \sum_{i_0,i_1,\ldots,i_{k-1}} (x_{i_0,i_1} - p_{n,av}) \cdots (x_{i_{k-1},i_0} - p_{n,av}),$$
where $i_0, i_1, \ldots, i_{k-1}$ are all distinct and $p_{n,av}$ is the average connection probability, i.e. $p_{n,av} = \frac{1}{n(n-1)}\sum_{i\neq j} E[x_{i,j}]$. Observe that for $G(n, p_n, q_n)$, $p_{n,av}$ is equal to $\hat p_n$.

One should expect the $C_{n,k}$'s to be asymptotically normal when $n \to \infty$ and $n\hat p_n$ is sufficiently large. Our next result formalizes this intuition.

Proposition 4.1. i) When $G \sim P'_n$, $n(p_n+q_n) \to \infty$ and $3 \le k_1 < \cdots < k_l = o(\log(\hat p_n n))$,
$$\Big(\frac{C_{n,k_1}(G)}{\sqrt{2k_1}}, \ldots, \frac{C_{n,k_l}(G)}{\sqrt{2k_l}}\Big) \xrightarrow{d} N_l(0, I_l). \qquad (4.1)$$
ii) When $G \sim P_n$, $np_n \to \infty$, $c = \frac{(a_n-b_n)^2}{a_n+b_n} = \Theta(1)$ and $3 \le k_1 < \cdots < k_l = o\big(\min\big(\log(\hat p_n n), \sqrt{\log n}\big)\big)$,
$$\Big(\frac{C_{n,k_1}(G) - \mu_1}{\sqrt{2k_1}}, \ldots, \frac{C_{n,k_l}(G) - \mu_l}{\sqrt{2k_l}}\Big) \xrightarrow{d} N_l(0, I_l), \qquad (4.2)$$
where $\mu_i = \Big(\sqrt{\frac{c}{2(1-\hat p_n)}}\Big)^{k_i}$ for $1 \le i \le l$.

The proof of Proposition 4.1 is inspired by the remarkable paper of Anderson and Zeitouni [3]. However, the model in our case is simpler, which makes the proof less cumbersome. The fundamental idea is to prove that the signed cycles converge in distribution by the method of moments and that the limiting random variables satisfy Wick's formula.
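Definition 4.1 can be evaluated literally by brute force on very small graphs; the sketch below (ours, with an $O(n^k)$ cost, so purely for numerical illustration) sums over all ordered tuples of distinct indices exactly as in the definition.

```python
import itertools
import numpy as np

def signed_cycle(x, k, p_av):
    """Brute-force evaluation of the signed cycle C_{n,k} of Definition 4.1."""
    n = x.shape[0]
    scale = (n * p_av * (1.0 - p_av)) ** (-k / 2.0)
    total = 0.0
    for tup in itertools.permutations(range(n), k):   # ordered distinct tuples
        prod = 1.0
        for a, b in zip(tup, tup[1:] + (tup[0],)):    # close the cycle i_{k-1} -> i_0
            prod *= x[a, b] - p_av
        total += prod
    return scale * total
```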
At first we state the method of moments.

Lemma 4.1. Let $(Y_{n,1},\ldots,Y_{n,l})$ be a random vector of dimension $l$. Then $(Y_{n,1},\ldots,Y_{n,l}) \xrightarrow{d} (Z_1,\ldots,Z_l)$ if the following conditions are satisfied:
i) The limit
$$\lim_{n\to\infty} E[X_{n,1}\cdots X_{n,m}] \qquad (4.3)$$
exists for any fixed $m$ and $X_{n,i} \in \{Y_{n,1},\ldots,Y_{n,l}\}$ for $1 \le i \le m$.
ii) (Carleman's condition) [12]
$$\sum_{h=1}^\infty \Big(\lim_{n\to\infty} E\big[X_{n,i}^{2h}\big]\Big)^{-\frac{1}{2h}} = \infty \quad \forall\, 1 \le i \le l.$$
Further, $\lim_{n\to\infty} E[X_{n,1}\cdots X_{n,m}] = E[X_1\cdots X_m]$, where $X_{n,i} \in \{Y_{n,1},\ldots,Y_{n,l}\}$ for $1 \le i \le m$ and $X_i$ is the distributional limit of $X_{n,i}$.

Lemma 4.2 (Wick's formula) [32]. Let $(Y_1,\ldots,Y_l)$ be a multivariate mean 0 random vector of dimension $l$ with covariance matrix $\Sigma$ (possibly singular). Then $(Y_1,\ldots,Y_l)$ is jointly Gaussian if and only if for any integer $m$ and $X_i \in \{Y_1,\ldots,Y_l\}$ for $1 \le i \le m$,
$$E[X_1\cdots X_m] = \begin{cases} \sum_\eta \prod_{i=1}^{m/2} E\big[X_{\eta(i,1)} X_{\eta(i,2)}\big] & \text{for } m \text{ even} \\ 0 & \text{for } m \text{ odd.} \end{cases} \qquad (4.4)$$
Here $\eta$ is a partition of $\{1,\ldots,m\}$ into $\frac m2$ blocks such that each block contains exactly 2 elements, and $\eta(i,j)$ denotes the $j$-th element of the $i$-th block of $\eta$ for $j = 1,2$.

The proof of this lemma is omitted. However, it is worth noting that the random variables $Y_1,\ldots,Y_l$ may also be equal to one another. In particular, taking $Y_1 = \cdots = Y_l$, Lemma 4.2 also provides a description of the moments of a single Gaussian random variable.
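The right side of (4.4) is easy to evaluate mechanically. This small sketch (ours) enumerates pair partitions and can be checked against Monte Carlo moments.

```python
import numpy as np

def pair_partitions(items):
    """Yield all partitions of the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for j in range(len(rest)):
        for sub in pair_partitions(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + sub

def wick_moment(cov, idx):
    """E[Y_{i_1} ... Y_{i_m}] for a centered Gaussian vector, via (4.4)."""
    if len(idx) % 2:
        return 0.0
    return sum(np.prod([cov[a][b] for a, b in pp])
               for pp in pair_partitions(list(idx)))

# Example: with unit variances and correlation 0.5, E[Y_0^2 Y_1^2] = 1 + 2*(0.5)^2.
print(wick_moment([[1.0, 0.5], [0.5, 1.0]], (0, 0, 1, 1)))   # 1.5
```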
With Lemmas 4.1 and 4.2 in hand, we now jump into the proof of Proposition 4.1.

Proof of Proposition 4.1. At first we introduce some notation and terminology. We call a word $w$ an ordered sequence of integers (to be called letters) $(i_0,\ldots,i_{k-1},i_k)$ such that $i_0 = i_k$ and the numbers $i_j$, $0 \le j \le k-1$, are all distinct. For a word $w = (i_0,\ldots,i_{k-1},i_k)$, its length $l(w)$ is $k+1$. The graph induced by a word $w$ is denoted by $G_w$ and defined as follows: one treats the letters $(i_0,\ldots,i_{k-1})$ as nodes and puts an edge between the nodes $(i_j, i_{j+1})$ for $0 \le j \le k-1$. Note that for a word $w$ of length $k+1$, $G_w = (V_w, E_w)$ is just a $k$ cycle. For a word $w = (i_0,\ldots,i_k)$, its mirror image is defined by $\tilde w = (i_0, i_{k-1}, i_{k-2}, \ldots, i_1, i_0)$. Further, for a cyclic permutation $\tau$ of the set $\{0,1,\ldots,k-1\}$, we define $w_\tau := (i_{\tau(0)},\ldots,i_{\tau(k-1)}, i_{\tau(0)})$. Finally, two words $w$ and $x$ are called paired if there is a cyclic permutation $\tau$ such that either $x_\tau = w$ or $\tilde x_\tau = w$. An ordered tuple of $m$ words, $(w_1,\ldots,w_m)$, will be called a sentence. For any sentence $a = (w_1,\ldots,w_m)$, $G_a = (V_a, E_a)$ is the graph with $V_a = \cup_{i=1}^m V_{w_i}$ and $E_a = \cup_{i=1}^m E_{w_i}$.
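The pairing relation is concrete enough to code up. The following sketch (ours, for explanatory purposes only) tests whether two closed words are paired by enumerating the cyclic rotations of a word and of its mirror image.

```python
def paired(w, x):
    """Test whether closed words w and x (lists whose first and last letters
    coincide) are paired: x equals a cyclic rotation of w or of its mirror."""
    if len(w) != len(x):
        return False
    k = len(w) - 1
    base = w[:-1]
    mirror = ([w[0]] + w[-2::-1])[:-1]   # mirror image with endpoint dropped
    rots = {tuple(b[r:] + b[:r]) for b in (base, mirror) for r in range(k)}
    return tuple(x[:-1]) in rots

print(paired([1, 2, 3, 1], [2, 1, 3, 2]))   # True: a rotation of the mirror image
```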
Proof of part i). We complete the proof in two steps. In the first step the asymptotic variances of $(C_{n,k_1}(G),\ldots,C_{n,k_l}(G))$ will be calculated, and the second step will be dedicated to proving the asymptotic normality and independence of $(C_{n,k_1}(G),\ldots,C_{n,k_l}(G))$.

Step 1. Observe that when $G \sim P'_n$, the distribution of $C_{n,k_1}(G),\ldots,C_{n,k_l}(G)$ is trivially independent of the labels $\sigma_i$, and $E[C_{n,k}(G)] = 0$, since $P'_n$ corresponds to the probability distribution induced by an Erdős–Rényi model. Now we prove that $\mathrm{Var}(C_{n,k}(G)) \sim 2k$ for any $k = o(\sqrt n)$. For any word $w = (i_0,\ldots,i_k)$, let $X_w := \prod_{j=0}^{k-1}\big(x_{i_j,i_{j+1}} - \hat p_n\big)$. Now observe that
$$\mathrm{Var}(C_{n,k}) = \Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k E\Big[\Big(\sum_w X_w\Big)^2\Big] = \Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k E\Big[\sum_{w,x} X_w X_x\Big]. \qquad (4.5)$$
Since both $X_w$ and $X_x$ are products of independent mean 0 random variables, each appearing exactly once, $E[X_w X_x] \neq 0$ only if all the edges of $G_w$ are repeated in $G_x$. Observe that, since $G_w$ and $G_x$ are cycles of length $k$, this is satisfied if and only if $w$ and $x$ are paired. There are $k$ many cyclic permutations $\tau$ of the set $\{0,\ldots,k-1\}$, and for a given $w$ and $\tau$ there are only two possible choices of $x$ such that $w$ and $x$ are paired; these choices are obtained when $x_\tau = w$ and $\tilde x_\tau = w$. As a consequence, for any word $w$, exactly $2k$ words are paired with it. Now observe that when $w$ and $x$ are paired, $X_w X_x$ is a product of $k$ random variables, each appearing exactly twice. As a consequence,
$$E[X_w X_x] = (\hat p_n(1-\hat p_n))^k.$$
Also, the total number of words is given by $n(n-1)\cdots(n-k+1)$, for the choices of $i_0,\ldots,i_{k-1}$. It is well known that $\frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1$ whenever $k = o(\sqrt n)$. So
$$\mathrm{Var}(C_{n,k}) = 2k\,\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k\, n(n-1)\cdots(n-k+1)\,(\hat p_n(1-\hat p_n))^k \sim 2k \qquad (4.6)$$
as long as $k = o(\sqrt n)$. This completes Step 1 of the proof.
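A quick simulation is consistent with (4.6). This snippet (ours) reuses signed_cycle from the sketch after Proposition 4.1, with deliberately small parameters since that evaluation is brute force.

```python
# Monte Carlo check that Var(C_{n,k}) is close to 2k under P'_n
# (an Erdos-Renyi graph with edge probability p_hat); illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, k, p_hat, reps = 40, 3, 0.3, 100
vals = []
for _ in range(reps):
    upper = np.triu(rng.random((n, n)) < p_hat, 1)
    x = (upper + upper.T).astype(float)
    vals.append(signed_cycle(x, k, p_hat))
print(np.var(vals), 2 * k)   # the two numbers should be comparable
```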
Step 2. We claim that in order to complete Step 2 it is enough to prove the following two limits:
$$E\big[C_{n,k_1}(G)\, C_{n,k_2}(G)\big] \to 0 \quad \text{for } k_1 \neq k_2, \qquad (4.7)$$
and that there exist random variables $Z_1,\ldots,Z_l$ such that for any fixed $m$,
$$\lim_{n\to\infty} E[X_{n,1}\cdots X_{n,m}] = \begin{cases} \sum_\eta \prod_{i=1}^{m/2} E\big[Z_{\eta(i,1)} Z_{\eta(i,2)}\big] & \text{for } m \text{ even} \\ 0 & \text{for } m \text{ odd,} \end{cases} \qquad (4.8)$$
where $X_{n,i} \in \big\{\frac{C_{n,k_1}(G)}{\sqrt{2k_1}}, \ldots, \frac{C_{n,k_l}(G)}{\sqrt{2k_l}}\big\}$.

First observe that (4.8) will simultaneously imply parts i) and ii) of Lemma 4.1. The implication of i) is obvious. For ii), one can take the $X_{n,i}$'s to be all equal, and from Wick's formula (Lemma 4.2) the limiting distributions of the $X_{n,i}$'s are normal; it is well known that normal random variables satisfy Carleman's condition. On the other hand, (4.8) also implies that the limit of $\big(\frac{C_{n,k_1}(G)}{\sqrt{2k_1}},\ldots,\frac{C_{n,k_l}(G)}{\sqrt{2k_l}}\big)$ is jointly normal. Hence, applying (4.7), one gets the asymptotic independence.

We first prove (4.7). Observe that
$$E\big[C_{n,k_1}(G)\, C_{n,k_2}(G)\big] = \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{k_1+k_2} E\Big[\sum_{w,x} X_w X_x\Big].$$
However, here $l(w) = k_1 + 1 \neq l(x) = k_2 + 1$, so $w$ and $x$ cannot be paired: some edge appears exactly once in every product, and hence $E\big[\sum_{w,x} X_w X_x\big] = 0$. As a consequence, (4.7) holds.

Now we prove (4.8). Let $l_i$ be the length of the signed cycle corresponding to $X_{n,i}$; observe that $l_i \in \{k_1,\ldots,k_l\}$ for any $i$. At first we expand the L.S. of (4.8):
$$E[X_{n,1}\cdots X_{n,m}] = \prod_{i=1}^m\frac{1}{\sqrt{2l_i}}\,\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{w_1,\ldots,w_m} E\big[X_{w_1}\cdots X_{w_m}\big]. \qquad (4.9)$$
Since the constants $\prod_i (2l_i)^{-1/2}$ are at most 1, we ignore them in the upper bounds below. Here each of the graphs $G_{w_1},\ldots,G_{w_m}$ is a cycle of length $l_1,\ldots,l_m$ respectively. So, in order to have $E[X_{w_1}\cdots X_{w_m}] \neq 0$, we need each of the edges in $G_{w_1},\ldots,G_{w_m}$ to be traversed more than once. The sentence $a := (w_1,\ldots,w_m)$ formed by such $(w_1,\ldots,w_m)$ will be called a weak CLT sentence. Given a weak CLT sentence $a$, we introduce a partition $\eta(a)$ of $\{1,\ldots,m\}$ in the following way: $i, j$ are in the same block of the partition $\eta(a)$ if $G_{w_i}$ and $G_{w_j}$ have at least one edge in common. As a consequence, we can further expand the sum in (4.9) in the following way:
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_\eta\ \sum_{w_1,\ldots,w_m\,|\,\eta = \eta(w_1,\ldots,w_m)} E\big[X_{w_1}\cdots X_{w_m}\big]. \qquad (4.10)$$
Observe that each block in $\eta$ should have at least 2 elements; otherwise $E[X_{w_1}\cdots X_{w_m}] = 0$. As a consequence, the number of blocks in $\eta$ is at most $[\frac m2]$. Now we prove that if the number of blocks in $\eta$ is strictly less than $[\frac m2]$, then
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{w_1,\ldots,w_m\,|\,\eta = \eta(w_1,\ldots,w_m)} E\big[X_{w_1}\cdots X_{w_m}\big] \to 0.$$
If $\eta(w_1,\ldots,w_m)$ has strictly less than $[\frac m2]$ blocks, then $a$ has strictly less than $[\frac m2]$ connected components. From Proposition 4.10 of Anderson and Zeitouni [3] it follows that in this case $\#V_a \le \sum_{i=1}^m \frac{l_i}{2} - 1$. Moreover, each connected component is formed by a union of cycles, so $\#V_a \le \#E_a$. Now the following lemma gives a bound on the number of weak CLT sentences having strictly less than $[\frac m2]$ connected components.

Lemma 4.3. Let $A$ be the set of weak CLT sentences such that for each $a \in A$, $\#V_a = t$. Then
$$\#A \le C_1^{\sum_i l_i}\Big(\sum_i l_i\Big)^{C_2 m}\Big(\sum_i l_i\Big)^{C_2(\sum_i l_i - 2t)} n^t. \qquad (4.11)$$

The proof of Lemma 4.3 is rather technical and requires some amount of random matrix theory, so we defer it to the Appendix. However, assuming Lemma 4.3 and writing $e = \#E_a$, we note that since each distinct edge is traversed at least twice, $|E[X_{w_1}\cdots X_{w_m}]| \le \hat p_n^{\,e}$, and we have
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{a:\ \#V_a \le \sum_i l_i/2 - 1}\big|E[X_{w_1}\cdots X_{w_m}]\big|$$
$$\le \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{t=1}^{\sum_i l_i/2 - 1}\ \sum_{e=t}^{\sum_i l_i/2} C_1^{\sum_i l_i}\Big(\sum_i l_i\Big)^{C_2 m + C_2(\sum_i l_i - 2t)} n^t\,\hat p_n^{\,e}$$
$$\le \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^{\sum_i l_i}\sum_{t=1}^{\sum_i l_i/2 - 1} C_1'^{\sum_i l_i}\Big(\sum_i l_i\Big)^{C_2' m + C_2'(\sum_i l_i - 2t)} n^t\,\hat p_n^{\,t}$$
$$\le \Big(\frac{1}{1-\hat p_n}\Big)^{\frac{\sum_i l_i}{2}}\underbrace{\sum_{t=1}^{\sum_i l_i/2 - 1} C_3\Big(\sum_i l_i\Big)^{C_4\sum_i l_i}\Big(\frac{1}{n\hat p_n}\Big)^{\sum_i l_i/2 - t}}_{T\ \text{(say)}}, \qquad (4.12)$$
where $C_1, C_2, \ldots$ are known constants. The second inequality uses $\hat p_n^{\,e} \le \hat p_n^{\,t}$ (as $e \ge t$ and $\hat p_n \le 1$). The third inequality holds because $m \le \sum_i l_i$ and $\sum_i l_i/2 - t \ge 1$, so all the remaining factors can be absorbed into $C_3(\sum_i l_i)^{C_4\sum_i l_i}$.

Observe that $T$ is just a geometric series in $\frac{1}{n\hat p_n}$, and the lowest value of $\sum_i l_i/2 - t$ is 1. So we can give the following final bound for (4.12):
$$\Big(\frac{1}{1-\hat p_n}\Big)^{\frac{\sum_i l_i}{2}} C_5\, C_3\Big(\sum_i l_i\Big)^{C_4\sum_i l_i}\frac{1}{n\hat p_n}, \qquad (4.13)$$
where $C_5$ is another known constant. When $k_l = o(\log(\hat p_n n))$ and $\sum_i l_i \le m k_l$,
$$\Big(\frac{1}{1-\hat p_n}\Big)^{\frac{m k_l}{2}} C_5\, C_3\, (m k_l)^{C_4 m k_l}\frac{1}{n\hat p_n} \to 0.$$
Once this is proved, the only partitions left are the pair partitions, i.e. those with exactly $[\frac m2]$ blocks. However, once such a partition $\eta$ is fixed, the choices within one block don't depend on the others, and each block contributes exactly the covariance computed in Step 1. As a consequence, (4.4) is satisfied. This completes part i). □

Proof of part ii). Let $d := \frac{p_n - q_n}{2}$. We have
$$C_{n,k}(G) = \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}(x_{i_0,i_1} - \hat p_n)\cdots(x_{i_{k-1},i_0} - \hat p_n)$$
$$= \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}(x_{i_0,i_1} - p_{i_0,i_1} + \sigma_{i_0}\sigma_{i_1}d)\cdots(x_{i_{k-1},i_0} - p_{i_{k-1},i_0} + \sigma_{i_{k-1}}\sigma_{i_0}d)$$
$$= \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}\Big[(x_{i_0,i_1} - p_{i_0,i_1})\cdots(x_{i_{k-1},i_0} - p_{i_{k-1},i_0}) + d^k\prod_{j=0}^{k-1}\sigma_{i_j}\sigma_{i_{j+1}}\Big] + V_{n,k}, \qquad (4.14)$$
where $i_k = i_0$, $p_{i,j} = p_n$ if $\sigma_i = \sigma_j$ and $q_n$ otherwise, and $V_{n,k}$ collects all the cross terms in the expansion of the product.

At first we prove that
$$\prod_{j=0}^{k-1}\sigma_{i_j}\sigma_{i_{j+1}} = 1 \qquad (4.15)$$
for any choice of the $\sigma_i$'s. To prove this, without loss of generality assume $\sigma_{i_0} = +1$ and look at the runs of $+1$'s and $-1$'s among $\sigma_{i_0},\ldots,\sigma_{i_k}$. Since $i_0 = i_k$, the value of $\sigma_{i_k}$ is also $+1$, so any such assignment starts and ends with a run of $+1$'s, and the runs of $+1$'s and $-1$'s alternate. A factor $\sigma_{i_j}\sigma_{i_{j+1}}$ equals $-1$ exactly at the boundary between two runs, and going around the cycle the number of such sign changes is even. This completes the proof of (4.15).

The proof of the asymptotic normality and independence of
$$D_{n,k}(G) := \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{i_0,\ldots,i_{k-1}}(x_{i_0,i_1} - p_{i_0,i_1})\cdots(x_{i_{k-1},i_0} - p_{i_{k-1},i_0})$$
is exactly the same as in part i). We only note that here the variance is also $\sim 2k$. To see this, first observe that
$$d = \sqrt{\frac{c\,\hat p_n}{2n}},$$
and whenever $k = o(\log(\hat p_n n))$, both
$$\lim_{n\to\infty}\Big(\frac{(\hat p_n + d)(1-\hat p_n - d)}{\hat p_n(1-\hat p_n)}\Big)^k = 1 \qquad (4.16)$$
and
$$\lim_{n\to\infty}\Big(\frac{(\hat p_n - d)(1-\hat p_n + d)}{\hat p_n(1-\hat p_n)}\Big)^k = 1. \qquad (4.17)$$
It is easy to see that $\mathrm{Var}\big(\frac{D_{n,k}(G)}{\sqrt{2k}}\big)$ lies between the L.S. of (4.16) and (4.17); as a consequence, $\mathrm{Var}\big(\frac{D_{n,k}(G)}{\sqrt{2k}}\big) \to 1$. Observe also that, by (4.15), the middle term of (4.14) equals
$$\Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k d^k\, n(n-1)\cdots(n-k+1) = (1+o(1))\Big(\sqrt{\frac{c}{2(1-\hat p_n)}}\Big)^k = (1+o(1))\,\mu_k.$$
So it remains to prove that $\mathrm{Var}(V_{n,k}) \to 0$. Fix a word $w$ and let $E_f \subset E_w$ be any nonempty proper subset. Then $V_{n,k} = \sum_w V_{n,k,w}$, where
$$V_{n,k,w} := \Big(\frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}\Big)^k\sum_{\emptyset \neq E_f \subsetneq E_w}\ \prod_{e\in E_f}\sigma_e\, d\prod_{e\in E_w\setminus E_f}(x_e - p_e).$$
Here, for any edge $e = \{i,j\}$, $x_e = x_{i,j}$, $p_e = p_{i,j}$ and $\sigma_e = \sigma_i\sigma_j$. Now
$$\mathrm{Var}(V_{n,k}) = \sum_{w,x}\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}).$$
We now find an upper bound for $\mathrm{Cov}(V_{n,k,w}, V_{n,k,x})$. At first fix any word $w$ and a set $E_f \subset E_w$, and consider all the words $x$ such that $E_w \cap E_x = E_w\setminus E_f$. As every edge in $G_w$ and $G_x$ appears exactly once, only the subsets $E' \supset E_f$ contribute, and
$$\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) = \sum_{E_w\setminus E' \subset E_w\setminus E_f}\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k\prod_{e\in E'}(\pm d)^2\, E\Big[\prod_{e\in E_w\setminus E'}(x_e - p_e)^2\Big]$$
$$= \sum_{E_w\setminus E' \subset E_w\setminus E_f}\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k d^{2\#E'}(1+o(1))(\hat p_n(1-\hat p_n))^{k-\#E'}$$
$$\le \sum_{E_w\setminus E' \subset E_w\setminus E_f}(1+o(1))\Big(\frac{1}{n\hat p_n(1-\hat p_n)}\Big)^k\Big(\frac c2\Big)^{\#E'}\Big(\frac{\hat p_n}{n}\Big)^{\#E'}(\hat p_n(1-\hat p_n))^{k-\#E'} \le \frac{(C_6)^k}{n^{k+\#E_f}}, \qquad (4.18)$$
where $C_6$ is some known constant. The last inequality holds since $\#E' \ge \#E_f$ and the number of subsets $E'$ with $E_f \subset E' \subset E_w$ is at most $2^k$.

Observe that the graph corresponding to the edges $E_w\setminus E_f$ is a disjoint collection of paths. Let the number of such paths be $\zeta$; obviously $\zeta \le \#(E_w\setminus E_f)$. The number of ways these $\zeta$ components can be placed in $x$ is bounded by $k^{2\zeta} \le k^{2\#(E_w\setminus E_f)}$, and all other nodes of $x$ can be chosen freely. So there are at most $n^{k - \#V_{E_w\setminus E_f}}\, k^{2\#(E_w\setminus E_f)}$ choices of such $x$. Here $V_{E_w\setminus E_f}$ is the set of vertices of the graph corresponding to $E_w\setminus E_f$. Observe that whenever $\#E_f > 0$, the graph corresponding to $E_w\setminus E_f$ is a forest, so
$$\#V_{E_w\setminus E_f} \ge \#(E_w\setminus E_f) + 1 \iff k - \#V_{E_w\setminus E_f} \le \#E_f - 1.$$
As a consequence,
$$\sum_{x\,|\,E_w\cap E_x = E_w\setminus E_f}\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) \le \frac{(C_6)^k}{n^{k+\#E_f}}\, n^{\#E_f - 1}\, k^{2\#(E_w\setminus E_f)} \le \frac{(C_6)^k\, k^{2k}}{n^{k+1}}. \qquad (4.19)$$
The R.S. of (4.19) doesn't depend on $E_f$, and there are at most $2^k$ nonempty subsets $E_f$ of $E_w$. So
$$\sum_x\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) \le \frac{(2C_6)^k\, k^{2k}}{n^{k+1}}.$$
Finally, there are at most $n^k$ many $w$. So
$$\sum_w\sum_x\mathrm{Cov}(V_{n,k,w}, V_{n,k,x}) \le \frac{(2C_6)^k\, k^{2k}}{n}. \qquad (4.20)$$
Now we use the fact that $k = o(\sqrt{\log n})$. In this case
$$k\log(2C_6) + 2k\log k \le \sqrt{\log n}\,\log\big(\sqrt{\log n}\big) = o(\log n) \iff (2C_6)^k k^{2k} = o(n).$$
This concludes the proof. □
5 Proofs of Theorems 2.1 and 2.2

With Propositions 3.4 and 4.1 in hand, the rest of the proofs of Theorems 2.1 and 2.2 is very straightforward. We first prove that $\lim_{n\to\infty} E_{P'_n}\big[\big(\frac{dP_n}{dP'_n}\big)^2\big]$ equals the R.S. of (3.1), with $t = \frac c2$ when $a_n = o(n)$ and with $t = \frac{c}{2(1-p)}$ when $\frac{a_n}{n} \to p$, respectively.

Lemma 5.1.
Let $Y_n := \frac{dP_n}{dP'_n}$. Then the following are true:
i) When $p_n \to 0$ (i.e. $a_n = o(n)$),
$$E_{P'_n}[Y_n^2] \to \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}}, \qquad t = \frac c2 < 1.$$
ii) When $p_n \to p \in (0,1)$,
$$E_{P'_n}[Y_n^2] \to \frac{\exp\big(-\frac t2 - \frac{t^2}{4}\big)}{\sqrt{1-t}}, \qquad t = \frac{c}{2(1-p)} < 1.$$
Proof. The proof of Lemma 5.1 is similar to the proof of Lemma 5.4 in Mossel et al. [25]. We only provide a proof of part ii); the proof of part i) is similar. The notation used in this proof differs slightly from that of Lemma 5.4 in Mossel et al. [25], for a better understanding of part ii).

At first we introduce some notation. Given a labelled graph $(G,\sigma)$ we define
$$W_{uv} = W_{uv}(G,\sigma) = \begin{cases} \frac{p_n}{\hat p_n} & \text{if } \sigma_u\sigma_v = 1,\ (u,v)\in E \\ \frac{q_n}{\hat p_n} & \text{if } \sigma_u\sigma_v = -1,\ (u,v)\in E \\ \frac{1-p_n}{1-\hat p_n} & \text{if } \sigma_u\sigma_v = 1,\ (u,v)\notin E \\ \frac{1-q_n}{1-\hat p_n} & \text{if } \sigma_u\sigma_v = -1,\ (u,v)\notin E \end{cases} \qquad (5.1)$$
and define $V_{uv}$ by the same formula, but with $\sigma$ replaced by $\tau$. Now
$$Y_n = \frac{1}{2^n}\sum_{\sigma\in\{1,-1\}^n}\prod_{(u,v)} W_{uv} \quad\text{and}\quad Y_n^2 = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\prod_{(u,v)} W_{uv}V_{uv}.$$
Since the $\{W_{uv}\}$ are independent given $\sigma$, it follows that
$$E_{P'_n}(Y_n^2) = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\prod_{(u,v)} E_{P'_n}(W_{uv}V_{uv}).$$
Now we consider the following cases:
1. $\sigma_u\sigma_v = \tau_u\tau_v = 1$;
2. $\sigma_u\sigma_v = \tau_u\tau_v = -1$;
3. $\sigma_u\sigma_v = 1$, $\tau_u\tau_v = -1$;
4. $\sigma_u\sigma_v = -1$, $\tau_u\tau_v = 1$.
Recall that $t = \frac{c}{2(1-p)}$. We first calculate $E_{P'_n}(W_{uv}V_{uv})$ in Cases 1 and 3.

Case 1:
$$E_{P'_n}(W_{uv}V_{uv}) = \Big(\frac{p_n}{\hat p_n}\Big)^2\hat p_n + \Big(\frac{1-p_n}{1-\hat p_n}\Big)^2(1-\hat p_n) = \frac{p_n^2}{\hat p_n} + \frac{(1-p_n)^2}{1-\hat p_n}$$
$$= \frac{(\hat p_n + d_n)^2}{\hat p_n} + \frac{(1-\hat p_n - d_n)^2}{1-\hat p_n} = 1 + d_n^2\Big(\frac{1}{\hat p_n} + \frac{1}{1-\hat p_n}\Big) = 1 + \frac{d_n^2}{\hat p_n(1-\hat p_n)} = 1 + \frac{c}{2n(1-\hat p_n)} = 1 + \frac{t_n}{n}, \qquad (5.2)$$
where $d_n = \frac{p_n - q_n}{2}$ and $t_n := \frac{c}{2(1-\hat p_n)} = (1+o(1))\,t$.

Case 3:
$$E_{P'_n}(W_{uv}V_{uv}) = \frac{p_n}{\hat p_n}\cdot\frac{q_n}{\hat p_n}\,\hat p_n + \frac{1-p_n}{1-\hat p_n}\cdot\frac{1-q_n}{1-\hat p_n}\,(1-\hat p_n) = \frac{p_n q_n}{\hat p_n} + \frac{(1-p_n)(1-q_n)}{1-\hat p_n}$$
$$= \frac{(\hat p_n + d_n)(\hat p_n - d_n)}{\hat p_n} + \frac{(1-\hat p_n - d_n)(1-\hat p_n + d_n)}{1-\hat p_n} = 1 - \frac{d_n^2}{\hat p_n(1-\hat p_n)} = 1 - \frac{t_n}{n}. \qquad (5.3)$$
It is easy to observe that $E_{P'_n}(W_{uv}V_{uv}) = 1 + \frac{t_n}{n}$ and $1 - \frac{t_n}{n}$ in Cases 2 and 4 respectively.

We now introduce another parameter $\rho = \rho(\sigma,\tau) = \frac1n\sum_i\sigma_i\tau_i$. Let $S_\pm$ be the number of pairs $\{u,v\}$ such that $\sigma_u\sigma_v\tau_u\tau_v = \pm1$. Then
$$\rho^2 = \frac1n + \frac{2}{n^2}(S_+ - S_-) \qquad (5.4)$$
and
$$1 - \frac1n = \frac{2}{n^2}(S_+ + S_-). \qquad (5.5)$$
So
$$S_+ = \frac{(1+\rho^2)n^2}{4} - \frac n2, \qquad S_- = \frac{(1-\rho^2)n^2}{4}. \qquad (5.6)$$
Now
$$E_{P'_n}(Y_n^2) = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\Big(1+\frac{t_n}{n}\Big)^{S_+}\Big(1-\frac{t_n}{n}\Big)^{S_-} = \frac{1}{2^{2n}}\sum_{\sigma,\tau}\Big(1+\frac{t_n}{n}\Big)^{\frac{(1+\rho^2)n^2}{4}-\frac n2}\Big(1-\frac{t_n}{n}\Big)^{\frac{(1-\rho^2)n^2}{4}}. \qquad (5.7)$$
Observe that $t_n = (1+o(1))t$ is a bounded sequence. It is easy to check, by taking logarithms and a Taylor expansion, that for any bounded sequence $x_n$,
$$\Big(1+\frac{x_n}{n}\Big)^{\frac{n^2}{4}} = (1+o(1))\exp\Big(\frac{n x_n}{4} - \frac{x_n^2}{8}\Big).$$
So we can write the R.S. of (5.7) as
$$(1+o(1))\frac{1}{2^{2n}}\sum_{\sigma,\tau} e^{-\frac{t_n}{2}}\exp\Big[\Big(\frac{n t_n}{4} - \frac{t_n^2}{8}\Big)(1+\rho^2)\Big]\times\exp\Big[\Big(-\frac{n t_n}{4} - \frac{t_n^2}{8}\Big)(1-\rho^2)\Big]$$
$$= (1+o(1))\frac{1}{2^{2n}}\sum_{\sigma,\tau} e^{-\frac{t_n}{2}-\frac{t_n^2}{4}}\exp\Big[\frac{n t_n\rho^2}{2}\Big] = (1+o(1))\, e^{-\frac{t_n}{2}-\frac{t_n^2}{4}}\,\frac{1}{2^{2n}}\sum_{\sigma,\tau}\exp\Big[(1+o(1))\frac{t n\rho^2}{2}\Big]. \qquad (5.8)$$
From Lemma 5.5 in Mossel et al. [25],
$$\frac{1}{2^{2n}}\sum_{\sigma,\tau}\exp\Big[(1+o(1))\frac{n t\rho^2}{2}\Big] \to \frac{1}{\sqrt{1-t}}.$$
So the R.S. of (5.8) converges to $\frac{\exp(-\frac t2 - \frac{t^2}{4})}{\sqrt{1-t}}$ as required. □

Proof of Theorems 2.1 and 2.2:
The proofs of Theorems 2.1 and 2.2 differ only in the value of $t$: for the case $a_n = o(n)$, $t = \frac c2$, and $t = \frac{c}{2(1-p)}$ in the other case. We prove only Theorem 2.1; the proof of Theorem 2.2 is similar after plugging in the appropriate value of $t$.

Proof of part i). We take $X_{n,i} = C_{n,i}(G)$. At first observe that when $a_n = o(n)$ (i.e. $p_n, q_n \to 0$), for any fixed $i$,
$$\mu_i := \Big(\sqrt{\frac{c}{2(1-\hat p_n)}}\Big)^i$$
converges to $\big(\frac c2\big)^{i/2}$ as $n \to \infty$. From Proposition 4.1 and Lemma 4.1 we see that the $C_{n,i}(G)$'s satisfy all the conditions required for Proposition 3.4, with $t = \frac c2 < 1$. Hence $P_n$ and $P'_n$ are mutually contiguous.

It is easy to see that the average degree $\hat d_n := \frac1n\sum_{i\neq j} x_{i,j}$ has mean $(1+o(1))\frac{a_n+b_n}{2}$ and variance $O\big(\frac{a_n+b_n}{n}\big)$. So
$$\hat d_n - \frac{a_n+b_n}{2} = o_p\big(\sqrt{a_n+b_n}\big) = o_p(a_n-b_n).$$
Suppose under $P_n$ there exist estimators $A_n$ of $a_n$ and $B_n$ of $b_n$ such that $|A_n - a_n| + |B_n - b_n| = o_p(a_n - b_n)$. Then $2(\hat d_n - B_n) - (a_n - b_n) = o_p(a_n - b_n)$, i.e.
$$\frac{2(\hat d_n - B_n)}{a_n - b_n}\,\Big|\,P_n \xrightarrow{P} 1.$$
However, from the fact that $P_n$ and $P'_n$ are contiguous, we also have
$$\frac{2(\hat d_n - B_n)}{a_n - b_n}\,\Big|\,P'_n \xrightarrow{P} 1.$$
This is impossible: $P'_n$ is itself a planted partition model with both parameters equal to $\frac{a_n+b_n}{2}$, so applying the assumed estimator property under $P'_n$ forces $2(\hat d_n - B_n) = o_p(a_n - b_n)$ there, i.e. $\frac{2(\hat d_n - B_n)}{a_n - b_n}\,|\,P'_n \xrightarrow{P} 0, a contradiction. □
Proof of part ii). It is easy to observe that $P_n$ and $P'_n$ are asymptotically singular, since for any $k_n \to \infty$ (slowly enough that Proposition 4.1 applies), $\frac{\mu_{k_n}}{\sqrt{2k_n}} \to \infty$. Now we construct estimators for $a_n$ and $b_n$. Let us define
$$\hat f_{n,k_n} = \begin{cases}\big(\sqrt{2k_n}\, C_{n,k_n}(G)\big)^{\frac{1}{k_n}} & \text{if } C_{n,k_n}(G) > 0 \\ 0 & \text{otherwise.}\end{cases}$$
It is easy to see that under $P_n$, $\hat f_{n,k_n} \xrightarrow{P} \frac{a_n-b_n}{\sqrt{2(a_n+b_n)}} = \sqrt{\frac c2}$ as $k_n \to \infty$. We have seen earlier that under $P_n$,
$$\frac{\hat d_n - \frac{a_n+b_n}{2}}{\sqrt{a_n+b_n}} \xrightarrow{P} 0 \implies \frac{\hat d_n - \frac{a_n+b_n}{2}}{a_n+b_n} \xrightarrow{P} 0 \implies \sqrt{\frac{\hat d_n}{(a_n+b_n)/2}} \xrightarrow{P} 1$$
$$\implies \sqrt{\hat d_n} - \sqrt{\frac{a_n+b_n}{2}} = o_p\big(\sqrt{a_n+b_n}\big) = o_p(a_n-b_n). \qquad (5.9)$$
So $2\sqrt{\hat d_n}\,\hat f_{n,k_n} - (a_n - b_n) = o_p(a_n - b_n)$ under $P_n$. As a consequence, the estimators
$$\hat A = \hat d_n + \sqrt{\hat d_n}\,\hat f_{n,k_n} \quad\text{and}\quad \hat B = \hat d_n - \sqrt{\hat d_n}\,\hat f_{n,k_n}$$
have the required property. This concludes the proof. □
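A direct transcription of these estimators (our own sketch, reusing signed_cycle from earlier; the choice $k_n = 3$ is only for speed, and the construction is only meaningful above the threshold of part ii) might look as follows.

```python
import numpy as np

def estimate_ab(x, k=3):
    """Estimators (A_hat, B_hat) built from the average degree d_hat and the
    signed-cycle statistic f_hat, as in the proof of part ii)."""
    n = x.shape[0]
    d_hat = x.sum() / n                  # average degree
    p_hat = d_hat / (n - 1)              # plug-in edge probability
    c_val = signed_cycle(x, k, p_hat)
    f_hat = (np.sqrt(2 * k) * c_val) ** (1 / k) if c_val > 0 else 0.0
    return d_hat + np.sqrt(d_hat) * f_hat, d_hat - np.sqrt(d_hat) * f_hat
```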
6 Non-reconstruction: proof of Theorem 2.3

In this section we provide a proof of the non-reconstruction results stated in Theorem 2.3. Our proof technique relies on a fine analysis of some conditional probabilities. Technically, this proof is closely related to the non-reconstruction proof in Section 6.2 of Banks et al. [6] rather than the original proof given in Mossel et al. [25]. At first we prove one proposition and one lemma which will be crucial for our proof.

Proposition 6.1. Suppose $a_n, b_n \to \infty$, $\frac{a_n}{n} \to p \in [0,1)$ and $c := \lim_{n\to\infty}\frac{(a_n-b_n)^2}{a_n+b_n} < 2(1-p)$. Then for any fixed $r$ and any two configurations $(\sigma_1^{(1)},\ldots,\sigma_r^{(1)})$, $(\sigma_1^{(2)},\ldots,\sigma_r^{(2)})$,
$$\mathrm{TV}\big(P_n(G\,|\,(\sigma_1^{(1)},\ldots,\sigma_r^{(1)})),\ P_n(G\,|\,(\sigma_1^{(2)},\ldots,\sigma_r^{(2)}))\big) = o(1).$$
Here $\mathrm{TV}(\mu_1,\mu_2)$ is the total variation distance between two probability measures $\mu_1$ and $\mu_2$.

Proof. We know that
$$\mathrm{TV}\big(P_n(G\,|\,\sigma_u^{(1)},\, u\in[r]),\ P_n(G\,|\,\sigma_u^{(2)},\, u\in[r])\big) = \frac12\sum_G\big|P_n(G\,|\,\sigma^{(1)}) - P_n(G\,|\,\sigma^{(2)})\big|\frac{\sqrt{P'_n(G)}}{\sqrt{P'_n(G)}}$$
$$\le \frac12\Big[\sum_G P'_n(G)\Big]^{\frac12}\Big[\sum_G\frac{\big(P_n(G|\sigma^{(1)}) - P_n(G|\sigma^{(2)})\big)^2}{P'_n(G)}\Big]^{\frac12} = \frac12\Big[\sum_G\frac{\big(\sum_{\tilde\sigma}P_n(\tilde\sigma)\big(P_n(G|\sigma^{(1)},\tilde\sigma) - P_n(G|\sigma^{(2)},\tilde\sigma)\big)\big)^2}{P'_n(G)}\Big]^{\frac12}. \qquad (6.1)$$
Here $\sigma^{(1)} := (\sigma_1^{(1)},\ldots,\sigma_r^{(1)})$, $\sigma^{(2)} := (\sigma_1^{(2)},\ldots,\sigma_r^{(2)})$ and $\tilde\sigma$ is any configuration on $\{r+1,\ldots,n\}$. Now observe that
$$\Big(\sum_{\tilde\sigma}P_n(\tilde\sigma)\big(P_n(G|\sigma^{(1)},\tilde\sigma) - P_n(G|\sigma^{(2)},\tilde\sigma)\big)\Big)^2 = \sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\Big(P_n(G|\sigma^{(1)},\tilde\sigma)P_n(G|\sigma^{(1)},\tilde\tau) + P_n(G|\sigma^{(2)},\tilde\sigma)P_n(G|\sigma^{(2)},\tilde\tau)$$
$$- P_n(G|\sigma^{(1)},\tilde\sigma)P_n(G|\sigma^{(2)},\tilde\tau) - P_n(G|\sigma^{(2)},\tilde\sigma)P_n(G|\sigma^{(1)},\tilde\tau)\Big). \qquad (6.2)$$
We shall prove that the value of
$$\sum_G\frac{\sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\,P_n(G|\sigma^{(i)},\tilde\sigma)\,P_n(G|\sigma^{(j)},\tilde\tau)}{P'_n(G)} \qquad (6.3)$$
doesn't depend on $\sigma^{(i)}$ and $\sigma^{(j)}$ up to $o(1)$ terms. This will prove that the final expression in (6.1) goes to 0, and as a consequence the proof of Proposition 6.1 will be complete.

At first we recall the definition of $W_{uv}(G,\sigma)$ from (5.1). It is easy to observe that
$$\sum_G\frac{\sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\,P_n(G|\sigma^{(1)},\tilde\sigma)\,P_n(G|\sigma^{(2)},\tilde\tau)}{P'_n(G)} = \frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\sum_G\prod_{u,v}W_{uv}(G,(\sigma^{(1)},\tilde\sigma))\,W_{uv}(G,(\sigma^{(2)},\tilde\tau))\,P'_n(G)$$
$$= \frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\prod_{u,v}E_{P'_n}\big(W_{uv}(G,(\sigma^{(1)},\tilde\sigma))\,W_{uv}(G,(\sigma^{(2)},\tilde\tau))\big). \qquad (6.4)$$
Observe that the sum in the final expression of (6.4) is taken over $(\tilde\sigma,\tilde\tau)$, so the configurations $\sigma^{(1)}$ and $\sigma^{(2)}$ remain unchanged. Now let us introduce the following parameters:
$$\rho_{fix} := \frac1r\sum_{i=1}^r\sigma_i^{(1)}\sigma_i^{(2)}, \qquad S_\pm^{fix} := \sum_{u,v\in[r]}I\{\sigma_u^{(1)}\sigma_v^{(1)}\sigma_u^{(2)}\sigma_v^{(2)} = \pm1\}, \qquad (6.5)$$
where $I_A$ denotes the indicator variable corresponding to the set $A$. We similarly define
$$\rho(\tilde\sigma,\tilde\tau) := \frac{1}{n-r}\sum_{i=r+1}^n\tilde\sigma_i\tilde\tau_i, \qquad S_\pm(\tilde\sigma,\tilde\tau) := \sum_{u,v\in\{r+1,\ldots,n\}}I\{\tilde\sigma_u\tilde\sigma_v\tilde\tau_u\tilde\tau_v = \pm1\}. \qquad (6.6)$$
By arguments similar to the proof of Lemma 5.1, one can show that the final expression of (6.4) further simplifies to
$$(1+o(1))\Big(1+\frac{t_n}{n}\Big)^{S_+^{fix}}\Big(1-\frac{t_n}{n}\Big)^{S_-^{fix}}\frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\Big(1+\frac{t_n}{n}\Big)^{\frac{(1+\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}-\frac{n-r}{2}}\Big(1-\frac{t_n}{n}\Big)^{\frac{(1-\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}}. \qquad (6.7)$$
Now $S_+^{fix}$ and $S_-^{fix}$ are both bounded by $r^2$, and $t_n = (1+o(1))t$. So
$$\Big(1+\frac{t_n}{n}\Big)^{S_+^{fix}}\Big(1-\frac{t_n}{n}\Big)^{S_-^{fix}} = 1+o(1).$$
On the other hand, one can repeat the arguments in the proof of Lemma 5.1 to conclude that
$$\frac{1}{2^{2(n-r)}}\sum_{\tilde\sigma,\tilde\tau}\Big(1+\frac{t_n}{n}\Big)^{\frac{(1+\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}-\frac{n-r}{2}}\Big(1-\frac{t_n}{n}\Big)^{\frac{(1-\rho(\tilde\sigma,\tilde\tau)^2)(n-r)^2}{4}} \to \frac{1}{\sqrt{1-t}}\exp\Big(-\frac t2-\frac{t^2}{4}\Big).$$
As a result,
$$\sum_G\frac{\sum_{\tilde\sigma,\tilde\tau}P_n(\tilde\sigma)P_n(\tilde\tau)\,P_n(G|\sigma^{(1)},\tilde\sigma)\,P_n(G|\sigma^{(2)},\tilde\tau)}{P'_n(G)} = (1+o(1))\frac{1}{\sqrt{1-t}}\exp\Big(-\frac t2-\frac{t^2}{4}\Big),$$
irrespective of the values of $\sigma^{(1)}$ and $\sigma^{(2)}$. So all four terms arising from (6.2) converge to the same limit, the final expression in (6.1) goes to 0, and the proof is complete. □

We now prove the following easy consequence of Proposition 6.1, which states that the posterior distribution of a single label is essentially unchanged if we know a bounded number of other labels.
Lemma 6.1. Suppose $S$ is a set of nodes of finite cardinality $r$, $u \notin S$ is a fixed node, and $\pi$ gives probability $\frac12$ to both $\pm1$. Then under the conditions of Proposition 6.1,
$$E\big[\mathrm{TV}(P_n(\sigma_u = \cdot\,|\,G,\sigma_S),\ \pi)\,\big|\,\sigma_S\big] = o(1).$$

Proof. Observe that $P_n(\sigma_u = i) = \pi(i)$ from the model assumption. So
$$E\big[\mathrm{TV}(P_n(\sigma_u|G,\sigma_S),\pi)\,\big|\,\sigma_S\big] = \sum_G\frac12\sum_{i=\pm1}\big|P_n(\sigma_u=i|G,\sigma_S) - P_n(\sigma_u=i)\big|\,P_n(G|\sigma_S)$$
$$= \frac12\sum_{i=\pm1}P_n(\sigma_u=i)\sum_G\Big|\frac{P_n(\sigma_u=i|G,\sigma_S)}{P_n(\sigma_u=i)} - 1\Big|\,P_n(G|\sigma_S)$$
$$= \frac12\sum_{i=\pm1}P_n(\sigma_u=i)\sum_G\Big|\frac{P_n(\sigma_u=i\cap G\cap\sigma_S)\,P_n(\sigma_S)}{P_n(\sigma_u=i\cap\sigma_S)\,P_n(G\cap\sigma_S)} - 1\Big|\,P_n(G|\sigma_S)$$
$$= \frac12\sum_{i=\pm1}P_n(\sigma_u=i)\sum_G\Big|\frac{P_n(G|\sigma_S,\sigma_u=i)}{P_n(G|\sigma_S)} - 1\Big|\,P_n(G|\sigma_S). \qquad (6.8)$$
Observe that
$$P_n(G|\sigma_S) = \frac12\big(P_n(G|\sigma_S,\sigma_u=1) + P_n(G|\sigma_S,\sigma_u=-1)\big).$$
As a consequence, the final expression on the R.S. of (6.8) becomes
$$\frac12\sum_{i=\pm1}P_n(\sigma_u=i)\,\mathrm{TV}\big(P_n(G|\sigma_S,\sigma_u=i),\ P_n(G|\sigma_S,\sigma_u=-i)\big).$$
So the proof is complete by applying Proposition 6.1. □
With Proposition 6.1 and Lemma 6.1 in hand, we now give a proof of Theorem 2.3.
Proof of Theorem 2.3. We only prove part i) of Theorem 2.3; the proof of part ii) is similar. Let $\hat\sigma$ be any estimate of the labeling of the nodes, $\sigma$ the true labeling, and $f: \{1,2\} \to \{\pm1\}$ the function with $f(1) = 1$ and $f(2) = -1$. Then
$$\mathrm{ov}(\sigma,\hat\sigma) = \frac2n\Big[N_{11} + N_{22} - \frac1n(N_{1\cdot}N_{\cdot1}) - \frac1n(N_{2\cdot}N_{\cdot2})\Big]. \qquad (6.9)$$
Here
$$N_{ij} = \big|\sigma^{-1}\{f(i)\}\cap\hat\sigma^{-1}\{f(j)\}\big|, \qquad N_{i\cdot} = \big|\sigma^{-1}\{f(i)\}\big|, \qquad N_{\cdot j} = \big|\hat\sigma^{-1}\{f(j)\}\big|. \qquad (6.10)$$
So it is sufficient to prove that
$$\frac{1}{n^2}E_{P_n}\Big[\Big(N_{ii} - \frac1n N_{i\cdot}N_{\cdot i}\Big)^2\Big] = \frac{1}{n^2}E_{P_n}\Big[N_{ii}^2 - \frac2n N_{ii}N_{i\cdot}N_{\cdot i} + \frac{1}{n^2}N_{i\cdot}^2N_{\cdot i}^2\Big] \to 0 \quad \text{for } i\in\{1,2\}.$$
Now
$$E_{P_n}\big[N_{ii}^2\big] = E_{P_n}\Big[\sum_{u,v}I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\Big]$$
$$= E_{P_n}\Big[E\Big[\sum_{u,v}I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\,\Big|\,G\Big]\Big]$$
$$= E_{P_n}\Big[\sum_{u,v}E\big[I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}\,\big|\,G\big]\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\Big]. \qquad (6.11)$$
The last step follows from the fact that $\hat\sigma$ is a function of $G$. Now
$$E\big[I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\}\,\big|\,G\big] = E\big[I\{\sigma_u=f(i)\}\,\big|\,G,\sigma_v=f(i)\big]\,P_n(\sigma_v=f(i)\,|\,G)$$
$$= (\pi(f(i)) + o(1))\,\frac{P_n(G\,|\,\sigma_v=f(i))\,P_n(\sigma_v=f(i))}{P_n(G)} = (\pi(f(i)) + o(1))\,\pi(f(i))\,\frac{P_n(G\,|\,\sigma_v=f(i))}{P_n(G)},$$
where the second step follows from Lemma 6.1. As a consequence,
$$\Big|E_{P_n}\Big[E\Big[\sum_{u,v}\big(I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\} - \pi(f(i))^2\big)\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\,\Big|\,G\Big]\Big]\Big|$$
$$\le E_{P_n}\Big[\sum_{u,v}\Big|E\big[\big(I\{\sigma_u=f(i)\}I\{\sigma_v=f(i)\} - \pi(f(i))^2\big)\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\,\big|\,G\big]\Big|\Big]$$
$$= E_{P_n}\Big[\sum_{u,v}\Big|\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\Big(\frac{P_n(G|\sigma_v=f(i))}{P_n(G)} - 1\Big) + o(1)\Big|\Big]$$
$$\le \sum_{u,v}\frac14\sum_G\big|P_n(G|\sigma_v=f(i)) - P_n(G)\big| + o(n^2) = o(n^2). \qquad (6.12)$$
Here the last step follows from Proposition 6.1: since $P_n(G) = \frac12 P_n(G|\sigma_v=f(i)) + \frac12 P_n(G|\sigma_v=-f(i))$, the total variation distance between $P_n(\cdot|\sigma_v=f(i))$ and $P_n(\cdot)$ is also $o(1)$. So we have
$$E_{P_n}\big[N_{ii}^2\big] = \sum_{u,v}E_{P_n}\big[\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\big] + o(n^2). \qquad (6.13)$$
Similar calculations prove that
$$E_{P_n}\big[N_{ii}N_{i\cdot}N_{\cdot i}\big] = n\sum_{u,v}E_{P_n}\big[\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\big] + o(n^3) \qquad (6.14)$$
and
$$E_{P_n}\big[N_{i\cdot}^2N_{\cdot i}^2\big] = n^2\sum_{u,v}E_{P_n}\big[\pi(f(i))^2\,I\{\hat\sigma_u=f(i)\}I\{\hat\sigma_v=f(i)\}\big] + o(n^4). \qquad (6.15)$$
Plugging in these estimates, we have
$$\frac{1}{n^2}E_{P_n}\Big[\Big(N_{ii} - \frac1n N_{i\cdot}N_{\cdot i}\Big)^2\Big] = o(1).$$
This completes the proof. □
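The dichotomy of Theorems 2.1–2.3 is easy to see numerically. The experiment below is entirely ours: it uses sample_sbm and overlap from the earlier sketches together with a naive second-eigenvector guess (which is not the algorithm of [24] or [22]), and contrasts a parameter pair below the threshold $(a-b)^2 = 2(a+b)$ with one far above it.

```python
import numpy as np

def spectral_guess(x):
    """Sign of the second-largest eigenvector of the adjacency matrix."""
    vals, vecs = np.linalg.eigh(x)
    s = np.sign(vecs[:, -2])
    s[s == 0] = 1
    return s

rng = np.random.default_rng(1)
n = 2000
for a, b in [(55.0, 45.0), (80.0, 20.0)]:   # (a-b)^2 / (2(a+b)) = 0.5 and 18
    x, sigma = sample_sbm(n, a / n, b / n, rng)
    ratio = (a - b) ** 2 / (2 * (a + b))
    print(ratio, abs(overlap(sigma, spectral_guess(x))))
```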
References

[1] E. Abbe and C. Sandon. Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. ArXiv e-prints, Dec. 2015. URL https://arxiv.org/abs/1512.09080.
[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. CoRR, abs/1405.3267, 2014. URL http://arxiv.org/abs/1405.3267.
[3] G. W. Anderson and O. Zeitouni. A CLT for a band matrix model. Probab. Theory Related Fields, 134(2):283–338, 2006.
[4] G. W. Anderson, A. Guionnet, and O. Zeitouni. An Introduction to Random Matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
[5] D. Banerjee and A. Bose. Largest eigenvalue of large random block matrices: a combinatorial approach. Tech. Report, 2016.
[6] J. Banks, C. Moore, J. Neeman, and P. Netrapalli. Information-theoretic thresholds for community detection in sparse networks. ArXiv e-prints, July 2016. URL https://arxiv.org/abs/1607.01760.
[7] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068–21073, 2009.
[8] R. B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In 28th Annual Symposium on Foundations of Computer Science, pages 280–285, 1987.
[9] C. Bordenave, M. Lelarge, and L. Massoulié. Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. ArXiv e-prints, Jan. 2015. URL http://arxiv.org/pdf/1501.06087v2.pdf.
[10] S. Bubeck, J. Ding, R. Eldan, and M. Rácz. Testing for high-dimensional geometry in random graphs. ArXiv e-prints, Nov. 2014. URL http://arxiv.org/abs/1411.5713.
[11] T. N. Bui, S. Chaudhuri, F. T. Leighton, and M. Sipser. Graph bisection algorithms with good average case behavior. Combinatorica, 7(2):171–191, 1987.
[12] T. Carleman. Les fonctions quasi analytiques (in French). Leçons professées au Collège de France, 1926.
[13] A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability & Computing, 19(2):227–284, 2010.
[14] A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model, pages 221–232. Springer Berlin Heidelberg, Berlin, Heidelberg, 1999.
[15] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, Dec. 2011. URL https://arxiv.org/abs/1109.3041.
[16] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.
[17] M. E. Dyer and A. M. Frieze. The solution of some random NP-hard problems in polynomial expected time. J. Algorithms, 10(4):451–489, Dec. 1989.
[18] L. Isserlis. On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12(1/2):134–139, 1918.
[19] S. Janson. Random regular graphs: asymptotic distributions and contiguity. Combin. Probab. Comput., 4(4):369–405, 1995.
[20] S. C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.
[21] C. L. Mallows. A note on asymptotic joint normality. Ann. Math. Statist., 43(2):508–515, 1972.
[22] L. Massoulié. Community detection thresholds and the weak Ramanujan property. CoRR, abs/1311.3085, 2013. URL http://arxiv.org/abs/1311.3085.
[23] F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 529–537, Oct. 2001.
[24] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. ArXiv e-prints, Nov. 2013. URL https://arxiv.org/abs/1311.4115.
[25] E. Mossel, J. Neeman, and A. Sly. Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields, 162(3-4):431–461, 2015.
[26] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for the planted bisection model. Electron. J. Probab., 21:1–24, 2016.
[27] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(suppl 1):2566–2572, 2002.
[28] J. K. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2):945–959, 2000.
[29] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist., 39(4):1878–1915, 2011.
[30] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905, Aug. 2000.
[31] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision. Thomson-Engineering, 2007.
[32] G. C. Wick. The evaluation of the collision matrix. Phys. Rev., 80:268–272, Oct. 1950.
[33] N. C. Wormald. Models of random regular graphs. In J. D. Lamb and D. A. Preece, editors, Surveys in Combinatorics, 1999, pages 239–298. Cambridge University Press, 1999.
7 Appendix

Here we only give a very brief description of the combinatorial aspects of random matrix theory required to prove Lemma 4.3. For more general information one should look at Chapter 1 of Anderson et al. [4] and Anderson and Zeitouni [3]. The definitions in this section have been taken from Anderson et al. [4] and Anderson and Zeitouni [3].

Definition 7.1 (S words). Given a set $S$, an $S$ letter $s$ is simply an element of $S$. An $S$ word $w$ is a finite sequence of letters $s_1\cdots s_n$, at least one letter long. An $S$ word $w$ is closed if its first and last letters are the same. Two $S$ words $w_1, w_2$ are called equivalent, denoted $w_1 \sim w_2$, if there is a bijection on $S$ that maps one into the other. When $S = \{1,\ldots,N\}$ for some finite $N$, we use the term $N$ word. Otherwise, if the set $S$ is clear from the context, we refer to an $S$ word simply as a word. For any word $w = s_1\cdots s_k$, we use $l(w) = k$ to denote the length of $w$, define the weight $\mathrm{wt}(w)$ as the number of distinct elements of the set $\{s_1,\ldots,s_k\}$, and define the support of $w$, denoted $\mathrm{supp}(w)$, as the set of letters appearing in $w$. With any word $w$ we may associate an undirected graph, with $\mathrm{wt}(w)$ vertices and at most $l(w)-1$ edges, as follows.

Definition 7.2 (Graph associated with a word). Given a word $w = s_1\cdots s_k$, we let $G_w = (V_w, E_w)$ be the graph with set of vertices $V_w = \mathrm{supp}(w)$ and (undirected) edges $E_w = \{\{s_i, s_{i+1}\},\ i = 1,\ldots,k-1\}$. The graph $G_w$ is connected, since the word $w$ defines a path connecting all the vertices of $G_w$, which further starts and terminates at the same vertex if the word is closed. For $e \in E_w$, we use $N_e^w$ to denote the number of times this path traverses the edge $e$ (in either direction). We note that equivalent words generate the same graphs $G_w$ (up to graph isomorphism) and the same passage counts $N_e^w$.

Definition 7.3 (Sentences and corresponding graphs). A sentence $a = [w_i]_{i=1}^n = [[\alpha_{i,j}]_{j=1}^{l(w_i)}]_{i=1}^n$ is an ordered collection of $n$ words of lengths $l(w_1),\ldots,l(w_n)$ respectively. We define the graph $G_a = (V_a, E_a)$ to be the graph with
$$V_a = \mathrm{supp}(a), \qquad E_a = \big\{\{\alpha_{i,j},\alpha_{i,j+1}\}\ \big|\ i = 1,\ldots,n;\ j = 1,\ldots,l(w_i)-1\big\}.$$

Definition 7.4 (Weak CLT sentences). A sentence $a = [w_i]_{i=1}^n$ is called a weak CLT sentence if the following conditions hold:
1. All the words $w_i$ are closed.
2. Jointly, the words $w_i$ visit each edge of $G_a$ at least twice.
3. For each $i \in \{1,\ldots,n\}$ there is another $j \neq i$ in $\{1,\ldots,n\}$ such that $G_{w_i}$ and $G_{w_j}$ have at least one edge in common.

Note that these definitions are consistent with the ones given in Section 4; there, however, we defined them only in the specific cases required for our problem. In order to prove Lemma 4.3 we require the following result from Anderson et al. [4].

Lemma 7.1 (Lemma 2.1.23 in Anderson et al. [4]). Let $\mathcal W_{k,t}$ denote the set of equivalence classes corresponding to all closed words $w$ of length $k+1$ with $\mathrm{wt}(w) = t$ such that each edge in $G_w$ is traversed at least twice. Then for $k > 2t-2$,
$$\#\mathcal W_{k,t} \le 2^k k^{3(k-2t+2)}.$$
Assuming Lemma 7.1, we now prove Lemma 4.3.
Proof of Lemma 4.3. Let $a = [w_i]_{i=1}^m$ be a weak CLT sentence such that $G_a$ has $C(a)$ many connected components. At first we introduce a partition $\eta(a)$ in the following way: we put $i$ and $j$ in the same block of $\eta(a)$ if $G_{w_i}$ and $G_{w_j}$ share an edge. At first we fix such a partition $\eta$ and consider all the sentences with $\eta(a) = \eta$. Let $C(\eta)$ be the number of blocks in $\eta$. It is easy to observe that for any $a$ with $\eta(a) = \eta$ we have $C(\eta) = C(a)$; from now on we denote $C(\eta)$ by $C$ for convenience. Let $a$ be any weak CLT sentence such that $\eta(a) = \eta$. We now propose an algorithm to embed $a$ into $C$ ordered closed words $(W_1,\ldots,W_C)$ such that the equivalence class of each $W_i$ belongs to $\mathcal W_{L_i,t_i}$ for some numbers $L_i$ and $t_i$. A similar type of argument can be found in Claim 3 of the proof of Theorem 2.2 in Banerjee and Bose (2016) [5].

An embedding algorithm. Let $B_1,\ldots,B_C$ be the blocks of the partition $\eta$, ordered in the following way: let $m_i = \min\{j: j \in B_i\}$ and order the blocks so that $m_1 < m_2 < \cdots < m_C$. Given a partition $\eta$ this ordering is unique. Let $B_i = \{i(1) < i(2) < \cdots < i(l(B_i))\}$, where $l(B_i)$ denotes the number of elements in $B_i$. For each $B_i$ we embed the sentence $a_i = [w_{i(j)}]_{1\le j\le l(B_i)}$ into $W_i$ sequentially in the following manner.

1. Let $S_1 = \{i(1)\}$ and $w^1 = w_{i(1)}$.
2. For each $1 \le c \le l(B_i) - 1$:
   • Consider $w^c = (\alpha_{1,c},\ldots,\alpha_{l(w^c),c})$ and $S_c \subset B_i$. Let $ne \in B_i\setminus S_c$ be the index such that the following two conditions hold: (a) $G_{w^c}$ and $G_{w_{ne}}$ share at least one edge $e = \{\alpha_{\kappa,c},\alpha_{\kappa+1,c}\}$; (b) $\kappa$ is minimal among all such choices.
   • Let $w_{ne} = (\beta_{1,c},\ldots,\beta_{l(w_{ne}),c})$ and let $\{\beta_{\kappa_1,c},\beta_{\kappa_1+1,c}\}$ be the first appearance of $e$ in $w_{ne}$. As $\{\beta_{\kappa_1,c},\beta_{\kappa_1+1,c}\} = \{\alpha_{\kappa,c},\alpha_{\kappa+1,c}\}$, $\alpha_{\kappa,c}$ equals either $\beta_{\kappa_1,c}$ or $\beta_{\kappa_1+1,c}$. Let $\kappa_2 \in \{\kappa_1,\kappa_1+1\}$ be such that $\alpha_{\kappa,c} = \beta_{\kappa_2,c}$. If $\beta_{\kappa_1,c} = \beta_{\kappa_1+1,c}$, then we simply take $\kappa_2 = \kappa_1$.
   • We now generate $w^{c+1}$ in the following way:
$$w^{c+1} = (\alpha_{1,c},\ldots,\alpha_{\kappa,c},\ \beta_{\kappa_2+1,c},\ldots,\beta_{l(w_{ne}),c},\ \beta_{2,c},\ldots,\beta_{\kappa_2,c},\ \alpha_{\kappa+1,c},\ldots,\alpha_{l(w^c),c}).$$
Let $\tilde a_c := (w^c, w_{ne})$. It is easy to observe by induction that all the $w^c$'s are closed words, and so are all the $w_{ne}$'s. So all the edges in the graph $G_{\tilde a_c}$ are preserved, along with their passage counts, in $G_{w^{c+1}}$.
   • Generate $S_{c+1} = S_c \cup \{ne\}$.
3. Return $W_i = w^{l(B_i)}$.

In the preceding algorithm we have actually defined a function $f$ which maps any weak CLT sentence $a$ to $C$ ordered closed words $(W_1,\ldots,W_C)$ such that the equivalence class of each $W_i$ belongs to $\mathcal W_{L_i,t_i}$ for some numbers $L_i$ and $t_i$. Observe also that $L_i < \sum_{j\in B_i} l(w_j)$ and $t_i < \frac{L_i}{2} + 1$.

Unfortunately, $f$ is not an injective map. So, given $(W_1,\ldots,W_C)$, we find an upper bound on the cardinality of the set
$$f^{-1}(W_1,\ldots,W_C) := \{a\ |\ f(a) = (W_1,\ldots,W_C)\}.$$
We have argued earlier that $C$ is the number of blocks in $\eta$. However, in general $(W_1,\ldots,W_C)$ specifies neither the partition $\eta$ nor the order in which the words are concatenated within each block $B_i$ of $\eta$. So we fix a partition $\eta$ with $C$ many blocks and an order of concatenation $O$. Observe that $O = (\sigma_1(\eta),\ldots,\sigma_C(\eta))$, where for each $i$, $\sigma_i(\eta)$ is a permutation of the elements in $B_i$. Now we give a uniform upper bound on the cardinality of the set
$$f^{-1}_{\eta,O}(W_1,\ldots,W_C) := \{a\ |\ \eta(a) = \eta,\ O(a) = O\ \&\ f(a) = (W_1,\ldots,W_C)\}.$$
Observe that $W_i$ is formed by recursively applying step 2 to $(w^c, w_{ne})$ for $1 \le c \le l(B_i) - 1$. Given a word $w = (\alpha_1,\ldots,\alpha_{l(w)})$, we want to find the number of two-word sentences $(w_1, w_2)$ such that applying step 2 of the algorithm to $(w_1, w_2)$ gives $w$ as output. This is equivalent to choosing three positions $i_1 < i_2 < i_3$ from the set $\{1,\ldots,l(w)\}$ such that $\alpha_{i_1} = \alpha_{i_3}$. Once these three positions are chosen, $(w_1, w_2)$ can be reconstructed uniquely in the following manner:
$$w_1 = (\alpha_1,\ldots,\alpha_{i_1},\ \alpha_{i_3+1},\ldots,\alpha_{l(w)}), \qquad w_2 = (\alpha_{i_2},\ldots,\alpha_{i_3},\ \alpha_{i_1+1},\ldots,\alpha_{i_2}).$$
The total number of choices of $i_1 < i_2 < i_3$ is bounded by $l(w)^3 \le \big(\sum_{i=1}^m l(w_i)\big)^3$. For each block $B_i$, step 2 of the algorithm is used $l(B_i) - 1$ many times. So
$$\#f^{-1}_{\eta,O}(W_1,\ldots,W_C) \le \Big(\sum_{i=1}^m l(w_i)\Big)^{3\sum_{i=1}^C(l(B_i)-1)} \le \Big(\sum_{i=1}^m l(w_i)\Big)^{3m}.$$
On the other hand, there are at most $m^m$ many $\eta$'s, and for each $\eta$ there are at most $\prod_{i=1}^C l(B_i)! \le m^m$ choices of $O$. So
$$\#f^{-1}(W_1,\ldots,W_C) \le m^{2m}\Big(\sum_{i=1}^m l(w_i)\Big)^{3m} \le D_1\Big(\sum_{i=1}^m l(w_i)\Big)^{D_2 m} \qquad (7.1)$$
for some known constants $D_1$ and $D_2$. Now we fix the sequence $(L_i, t_i)$ and find an upper bound on the number of possible $(W_1,\ldots,W_C)$. From Lemma 7.1 we know that the number of choices of $W_i$ is bounded by $2^{L_i-1}(L_i-1)^{3(L_i-1-2t_i+2)}\,n^{t_i}$. So the total number of choices of $(W_1,\ldots,W_C)$ is bounded by
$$2^{\sum_{i=1}^m l(w_i)}\prod_{i=1}^C(L_i-1)^{3(L_i-1-2t_i+2)}\,n^{t_i} \le 2^{\sum_{i=1}^m l(w_i)}\,n^t\,\Big(\sum_{i=1}^m l(w_i)\Big)^{3(\sum_{i=1}^m l(w_i)-2t)}\Big(\sum_{i=1}^m l(w_i)\Big)^{3m}. \qquad (7.2)$$
Now the number of choices of $(L_i, t_i)$ such that $\sum_{i=1}^C L_i$ is fixed and $\sum_{i=1}^C t_i = t$ is bounded by
$$\binom{\sum_{i=1}^m l(w_i) - 1}{C-1}\binom{t-1}{C-1} \le \Big(\sum_{i=1}^m l(w_i)\Big)^{2m}. \qquad (7.3)$$
Here the inequality follows since $C \le m$ and $t \le \sum_{i=1}^m \frac{l(w_i)}{2} - 1$. Combining (7.1), (7.2) and (7.3) now gives (4.11) and completes the proof of Lemma 4.3. □