Proof of the Contiguity Conjecture and Lognormal Limit for the Symmetric Perceptron
Emmanuel Abbe∗  Shuangping Li†  Allan Sly‡

Abstract
We consider the symmetric binary perceptron model, a simple model of neural networks that has gathered significant attention in the statistical physics, information theory and probability theory communities, with recent connections made to the performance of learning algorithms in Baldassi et al. '15. We establish that the partition function of this model, normalized by its expected value, converges to a lognormal distribution. As a consequence, this allows us to establish several conjectures for this model: (i) it proves the contiguity conjecture of Aubin et al. '19 between the planted and unplanted models in the satisfiable regime; (ii) it establishes the sharp threshold conjecture; (iii) it proves the frozen 1-RSB conjecture in the symmetric case, conjectured first by Krauth-Mézard '89 in the asymmetric case. In a recent concurrent work of Perkins-Xu [PX21], the last two conjectures were also established by proving that the partition function concentrates on an exponential scale. This left open the contiguity conjecture and the lognormal limit characterization, which are established here. In particular, our proof technique relies on a dense counterpart of the small graph conditioning method, which was developed for sparse models in the celebrated work of Robinson and Wormald.
1 Introduction

The binary perceptron is a simple model used to study the structural properties of zero-loss solutions in neural networks. It was introduced in the 60s by Cover [Cov65] and in the 80s in the statistical physics literature, with detailed characterizations put forward by Gardner and Derrida [GD88] and Krauth and Mézard [KM89]. More recently, the structural properties of its solution space have been related to the behavior of algorithms for learning neural networks in [Bal+16a; Bal+16b; BZ06; Bal+15], and several probabilistic results have been established in [KR98; Tal99; Sto13; DS19; APZ19] (see further discussions below).

There exist several model variants and questions, motivated by both memorization [Cov65; GD88; KM89] and generalization properties [OH91]. We focus here on the capacity (or memorization) problem, the symmetric binary (or Ising) model, and the constraint satisfaction point of view.

∗Institute of Mathematics, EPFL, Lausanne, CH-1015, Switzerland. †PACM, Princeton University, Princeton, NJ, 08544, USA. ‡Department of Mathematics, Princeton University, Princeton, NJ, 08544, USA.

Footnote: Mainly for the spherical case. Footnote: We refer to [Cov65; GD88] for relations between the capacity and the memorization capability of neural networks.

Let $G$ be an $m$ by $n$ matrix with i.i.d. entries taking values in $\{+1, -1\}$ with equal probability. Fix a positive real number $\kappa$, and consider the following constraints:
\[
S_j(G) := \Big\{ X \in \{-1,+1\}^n \;:\; \frac{1}{\sqrt n}\,\Big| \sum_{i=1}^{n} G_{j,i} X_i \Big| \le \kappa \Big\}, \qquad j = 1, \cdots, m.
\]
The capacity problem is concerned with characterizing the regimes of $m$ and $n$ for which solutions to the above constraints exist, i.e., for which
\[
S_m(G) := \bigcap_{j=1}^{m} S_j(G) \neq \emptyset.
\]
More precisely, the capacity is defined by
\[
m^*_\kappa(n) := \max\{ m \ge 0 : S_m(G) \neq \emptyset \}.
\]
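For small $n$, these definitions are easy to explore by brute force. The following sketch (our illustration, not code from the paper; it assumes NumPy, and all names are ours) counts $Z(G) = |S_m(G)|$ by enumerating all $2^n$ sign vectors:

```python
import numpy as np

def count_solutions(G, kappa):
    """Z(G): number of X in {-1,+1}^n with |<G_j, X>| <= kappa*sqrt(n) for all rows j."""
    m, n = G.shape
    # All 2^n sign vectors, one per row.
    X = np.array([[1 if (v >> i) & 1 else -1 for i in range(n)] for v in range(2 ** n)])
    margins = np.abs(X @ G.T)                      # |sum_i G_{j,i} X_i| for each (X, j)
    return int(np.sum(np.all(margins <= kappa * np.sqrt(n), axis=1)))

rng = np.random.default_rng(0)
n, kappa = 10, 1.0
G = rng.choice([-1, 1], size=(5, n))               # m = 5 constraints
Z5 = count_solutions(G, kappa)
# A slower recount, straight from the definition, as a cross-check.
Z5_loop = sum(
    1 for v in range(2 ** n)
    if all(abs(sum(G[j, i] * (1 if (v >> i) & 1 else -1) for i in range(n)))
           <= kappa * np.sqrt(n) for j in range(G.shape[0]))
)
# Adding a constraint can only remove solutions.
Z6 = count_solutions(np.vstack([G, rng.choice([-1, 1], size=(1, n))]), kappa)
```

Since $X$ is a solution exactly when $-X$ is, $Z(G)$ is always even; and $Z$ is non-increasing in $m$, which is what makes the capacity $m^*_\kappa(n)$ well defined.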
In the asymmetric variant of the model, originally studied in [Cov65; GD88; KM89], the constraints are given by $\tilde S_j(G) := \big\{ X \in \{-1,+1\}^n : \frac{1}{\sqrt n} \sum_{i=1}^n G_{j,i} X_i \ge \kappa \big\}$ for $1 \le j \le m$, with similar definitions for $\tilde S_m$ and $\tilde m^*$. Despite the simplicity of these definitions, several long-standing conjectures remain open, along with intriguing connections to learning problems.

• Capacity sharp threshold conjecture.
The simplest structural property of the solution space is non-emptiness. In [KM89], it was conjectured that for the asymmetric binary perceptron (ABP), $m^*_\kappa(n)/n$ converges to an explicit constant $\alpha^*_\kappa$ as $n$ diverges, equivalently given by $\alpha^*_\kappa = \inf\{ \alpha \ge 0 : \lim_{n\to\infty} \mathbb{P}(S_{\lfloor \alpha n \rfloor}(G) = \emptyset) = 1 \}$. More specifically, it is conjectured that a sharp threshold phenomenon takes place at $\alpha^*_\kappa$, with solutions existing with high probability below capacity and absent with high probability above capacity. This conjecture was extended to the symmetric binary perceptron (SBP) model in [APZ19], with a partial result obtained in this case, showing that solutions do not exist with high probability above the capacity, and exist with positive probability below the capacity. Such a positive probability result was also recently obtained by [DS19] in the more challenging setting of the ABP. However, in both cases, the second moment method yields a ratio between the second moment and the first moment squared that converges not to 1 but to a larger constant, preventing one from establishing the high-probability result below capacity. To settle the sharp threshold conjecture for the SBP, one possibility is to combine the result of Xu [Xu19], which is currently written for the ABP setting, with the result of [APZ19] obtained for the SBP setting. However, this would not allow one to settle the next conjecture.

• Contiguity conjecture.
Proving the existence of solutions below capacity with high probability follows from a stronger property of the solution space: namely, that the distribution of the matrix $G$ (say in the SBP) is contiguous to the distribution of the matrix $G$ in the planted model, i.e., in the model where the matrix $G$ is drawn conditioned on satisfying a planted solution that is drawn uniformly at random. This contiguity conjecture is stated in [APZ19] for the SBP. Note that $m^*_\kappa(n)$ is a random variable, being a function of $G$.

• Strong freezing conjecture.
This phenomenon was originally discovered by [KM89] and subsequently strengthened in [HWK13] and [HK14] (see also [ZM08a; ZM08b; ZK11] for general CSPs). It lies at the core of the early statistical physics study of the perceptron model. In the case of sparse random CSPs, an important role is played by two topological thresholds involving clustering and freezing. The clustering property asserts that at sufficiently large constraint density, clusters of connected solutions break apart into exponentially many components separated by linear distance. This was described precisely in [Krz+07] and proved in [AC08]. The second topological property is freezing, where in a typical cluster of solutions a linear number of variables are frozen, that is, they take the same value in every solution in the cluster. In the case of coloring on sparse random graphs, the freezing threshold was established in [Mol12]. Both of these thresholds occur well before the satisfiability threshold in a range of sparse random CSPs such as $k$-SAT, $k$-NAESAT and random colourings. Moreover, these thresholds are conjectured to characterize the limit of efficient algorithms. Interestingly, in the perceptron models, the freezing property, called frozen 1-RSB, has been conjectured to take place at all positive densities and not just 'close' to capacity. In particular, [KM89] conjectured that for the ABP, typical solutions belong to clusters of vanishing entropy density (i.e., with $2^{o(n)}$ solutions at linear distance), whereas [HWK13] conjectured that such solutions are in fact isolated. The latter conjecture was extended to the SBP model in [APZ19] and [Bal+20], making the SBP model as interesting as the ABP one for such structural properties.

• Small graph conditioning for dense models.
A possible approach to tackle the previous conjecture is to obtain an analogue of the small graph conditioning method in the dense SBP model considered here. For sparse random constraint satisfaction problems, the small graph conditioning method has been used to establish contiguity for the planted distribution and to determine the asymptotic law of the partition function. This method is based on counting small cycles in the graph and accounting for their influence on $Z$. The perceptron model is a naturally dense system, but in Section 2.3 we construct the right analogue of the small graph conditioning method by summing products of matrix entries around cycles.

• Learning algorithms.
The latter conjectures point to an interesting and novel phenomenon for learning algorithms on neural networks [Bal+15]. On the one hand, the perceptron model (symmetric or not) is conjectured to be 'typically hard', as most solutions are completely frozen. On the other hand, efficient algorithms have been shown empirically to succeed in finding solutions, suggesting, if the freezing conjecture turns out to be correct, that such algorithms find atypical solutions, i.e., solutions that are part of rare clusters with atypical structural properties. This was shown empirically in [BZ06] with message passing algorithms. Further connections to learning algorithms are given in [BZ06; Bal+15].

Our contribution.
This paper proves the above conjectures for the SBP model (sharp threshold, contiguity and strong freezing). Furthermore, it derives these results by obtaining an explicit lognormal limit for the normalized partition function: for $\alpha$ below an explicit threshold,
\[
\frac{Z(G)}{\mathbb{E}(Z(G))} \xrightarrow{d} \mathrm{Lognormal}\big( \mu(\kappa), \sigma^2(\kappa) \big), \qquad (2)
\]
where $\mu(\kappa), \sigma^2(\kappa)$ are given in Section 2.1. Our results hold under a numerical hypothesis defined below, the same as the one in [APZ19]. We expect these methods to be more widely applicable to a range of dense constraint satisfaction problems.

Footnote: See also [KR98] for majority vote algorithms and extensions of [KR98] to (symmetric) perceptrons.

Parallel work.
While writing this paper, we became aware of the recent concurrent work [PX21], which shows that the partition function of the symmetric perceptron concentrates on an exponential scale, using an inductive argument as constraints are added. This is sufficient to imply the sharp threshold and freezing conjectures, but leaves open the contiguity conjecture and the lognormal limit characterization, which are established here by characterizing the distribution of the partition function using a counterpart of the small graph conditioning method.
2 Main results

2.1 Definitions and main theorem

Define the probability $P_\kappa = \mathbb{P}(|N| \le \kappa)$, where $N$ follows the standard normal distribution, and define the capacity density
\[
\alpha_c(\kappa) = -\frac{\log 2}{\log P_\kappa}.
\]
We denote by $Z(G, m, n) := |S_m(G)|$ the number of solutions to the symmetric binary perceptron problem, and write $Z(G)$ in short when $m$ and $n$ are clear from context. Our main results work under a numerical hypothesis, which also appears in [APZ19]. For $0 \le x \le 1$, let $N_1$ and $N_2$ be two standard normal random variables with correlation $2x - 1$. Define
\[
q_\kappa(x) := \mathbb{P}\big( |N_1| \le \kappa, \, |N_2| \le \kappa \big) \qquad\text{and}\qquad F(x) := \alpha \log q_\kappa(x) - x \log x - (1-x)\log(1-x).
\]

Hypothesis 2.1.
For any $\kappa > 0$ and $\alpha > 0$ such that $F''(1/2) < 0$, there is exactly one $x \in (0, 1/2)$ with $F'(x) = 0$.

To state our main theorem, we define
\[
\mu_{2,\kappa} = \frac{1}{P_\kappa} \int_{-\kappa}^{\kappa} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, x^2 \, dx
\qquad\text{and}\qquad
\beta = -\frac{\sqrt{\alpha}}{2}\,\big( 1 - \mu_{2,\kappa} \big).
\]
Note that when $\alpha < \alpha_c$, we have $-1/2 < \beta < 0$.

Theorem 2.1.
Let $\kappa > 0$ and $0 < \alpha < \alpha_c(\kappa)$. Take $m = \lfloor \alpha n \rfloor$. Under Hypothesis 2.1,
\[
\frac{Z(G)}{\mathbb{E}(Z(G))} \xrightarrow{d} \mathrm{Lognormal}\Big( \frac{1}{4}\log(1 - 4\beta^2) + \beta^2, \; -\frac{1}{2}\log(1 - 4\beta^2) - 2\beta^2 \Big),
\]
as $n \to \infty$.

This theorem describes the limiting distribution of $Z(G)/\mathbb{E} Z$, and enables us to study further properties of the model.

2.2 Consequences

A few immediate consequences are presented below.
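Before turning to these consequences, we note that all quantities in Theorem 2.1 are elementary to evaluate numerically. The sketch below (our illustration, not from the paper; Python standard library only, all names ours) computes $P_\kappa$, $\alpha_c(\kappa)$, $\mu_{2,\kappa}$, $\beta$ and the two lognormal parameters, and checks that the mean equals $-\sigma^2/2$, so that the limiting variable has unit expectation, consistent with it being a limit of $Z/\mathbb{E}Z$:

```python
import math

def P_kappa(kappa):
    """P(|N| <= kappa) for a standard normal N."""
    return math.erf(kappa / math.sqrt(2))

def mu2_kappa(kappa):
    """Second moment of N conditioned on |N| <= kappa.
    Integration by parts gives the closed form 1 - 2*kappa*phi(kappa)/P_kappa."""
    phi = math.exp(-kappa ** 2 / 2) / math.sqrt(2 * math.pi)
    return 1.0 - 2.0 * kappa * phi / P_kappa(kappa)

def alpha_c(kappa):
    """Capacity density alpha_c(kappa) = -log 2 / log P_kappa."""
    return -math.log(2) / math.log(P_kappa(kappa))

def lognormal_params(kappa, alpha):
    """beta and the (mean, variance) parameters of the lognormal limit."""
    beta = -math.sqrt(alpha) / 2 * (1 - mu2_kappa(kappa))
    l = math.log(1 - 4 * beta ** 2)
    return beta, 0.25 * l + beta ** 2, -0.5 * l - 2 * beta ** 2

kappa, alpha = 1.0, 1.0                    # any 0 < alpha < alpha_c(kappa) works
beta, mean, var = lognormal_params(kappa, alpha)
# The variance also equals the series sum_{k>=2} (4 beta^2)^k / (2k).
series = sum((4 * beta ** 2) ** k / (2 * k) for k in range(2, 200))
```

For $\kappa = 1$ this gives $\alpha_c \approx 1.816$ and $\beta \approx -0.354$; the series identity is the expansion behind the truncated sums $L(M)$ appearing in the proofs.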
Definition 2.1 (Planted model). We use $\mathbb{P}^*$ to denote the tilted law of $G$ conditioned on $X_0 \in S_m(G)$, where $X_0$ is a planted solution chosen uniformly at random from $\{\pm 1\}^n$.

Theorem 2.2 (Contiguity). Let $\kappa > 0$ and $0 < \alpha < \alpha_c(\kappa)$. Take $m = \lfloor \alpha n \rfloor$. Under Hypothesis 2.1, $\mathbb{P}$ and $\mathbb{P}^*$ are mutually contiguous, i.e., for any sequence of events $A_n$, $\mathbb{P}(A_n) \to 0$ if and only if $\mathbb{P}^*(A_n) \to 0$.

Theorem 2.3 (Sharp threshold). For any $\kappa > 0$ and $\alpha > \alpha_c$,
\[
\lim_{n\to\infty} \mathbb{P}\big( Z(G, \lfloor \alpha n \rfloor, n) \ge 1 \big) = 0.
\]
For any $0 < \alpha < \alpha_c(\kappa)$, under Hypothesis 2.1,
\[
\lim_{n\to\infty} \mathbb{P}\big( Z(G, \lfloor \alpha n \rfloor, n) \ge 1 \big) = 1.
\]

Theorem 2.4 (Freezing of typical solutions). Let $\kappa > 0$ and $0 < \alpha < \alpha_c(\kappa)$. Take $m = \lfloor \alpha n \rfloor$. Under Hypothesis 2.1, there exists $d > 0$ such that
\[
\lim_{n\to\infty} \mathbb{P}\big( \{ X \in S_m(G) : d(X, X_1) \le dn \} = \{ X_1 \} \big) = 1,
\]
where $X_1$ is chosen uniformly at random from $S_m(G)$. Here the distance function $d(X, X') = (n - \langle X, X' \rangle)/2$ is the Hamming distance.

This means that for a typical solution $X_1$ there is no other solution within a constant proportion of distance from it. In other words, for a typical solution $X_1$, if we flip a small constant proportion of its entries, the resulting vector is no longer a solution.

2.3 Small graph conditioning for dense models

The recurring challenge in applying the second moment method is the case when $\mathbb{E}[Z^2]/(\mathbb{E}[Z])^2 \to C > 1$. In the sparse setting, the small graph conditioning method addresses this by establishing that
\[
\frac{Z}{\exp(Y)\,\mathbb{E}(Z)} \xrightarrow{p} 1, \qquad n \to \infty, \qquad\text{where } Y = c + \sum_k \delta_k C_k, \qquad (3)
\]
with $C_k$ the number of cycles of length $k$, $\delta_k$ model-dependent parameters and $c$ a constant. While sometimes technically challenging to implement, the small graph conditioning method has played an important role in the analysis of many models [GŠV16; CEH16; MWW09; MNS15].

Unlike sparse random CSPs, the perceptron models are dense, with direct interactions between each pair of variables in all of the constraints. Nonetheless, we find the analogue of equation (3), defining
\[
Y := \sum_{k=2}^{\lfloor \log n \rfloor} \Big( \frac{(2\beta)^k C_k}{2k} - \frac{(2\beta)^{2k}}{4k} \Big),
\]
where the cycle counts are now generalized as
\[
C_k = \Big( \frac{1}{\sqrt n} \Big)^{k} \Big( \frac{1}{\sqrt m} \Big)^{k} \sum_{\substack{i_1, i_2, \cdots, i_k \in [n] \text{ disjoint} \\ j_1, j_2, \cdots, j_k \in [m] \text{ disjoint}}} \; \prod_{\ell=1}^{k} G_{i_\ell, j_\ell}\, G_{i_{\ell+1}, j_\ell}, \qquad (4)
\]
where we identify $i_{k+1}$ with $i_1$. In place of counting cycles of a graph, we instead sum the products of matrix entries of $G$ along cycles in the indices. We prove asymptotic joint normality for the $C_k$ by computing their joint moments and showing convergence to the moments of independent Gaussians. With this definition we show the following theorem.

Theorem 2.5.
Let $\kappa > 0$ and $0 < \alpha < \alpha_c(\kappa)$. Take $m = \lfloor \alpha n \rfloor$. Under Hypothesis 2.1,
\[
\frac{Z(G)}{\exp(Y)\,\mathbb{E}(Z(G))} \xrightarrow{p} 1,
\]
as $n \to \infty$.

We now discuss the motivation behind the definition of $Y$. We want to understand the effect of the constraint matrix $G$ on the distribution of a random solution $X \in S_m(G)$. In the symmetric perceptron model, the marginal distribution of each entry $X_i$ is always balanced, by symmetry, for any $G$. The next simplest interaction is the effect of the constraints on the correlation between pairs of entries of $X$. Each constraint creates a small correlation, which can be either positive or negative and is of order $1/n$. More specifically, if $G_j$ denotes the $j$-th row of $G$, then the conditional law of $(X_p, X_q)$ satisfies
\[
\mathbb{P}(X_p, X_q \mid X \in S_j(G), G_j) \propto 1 - \frac{(1 - \mu_{2,\kappa})\, X_p X_q\, G_{j,p} G_{j,q}}{n} + O\Big(\frac{1}{n^2}\Big) \approx \exp\Big( -\frac{(1 - \mu_{2,\kappa})\, X_p X_q\, G_{j,p} G_{j,q}}{n} \Big).
\]
If the effect of the rows were multiplicative, then we might expect that
\[
\mathbb{P}(X_p, X_q \mid X \in S_m(G), G) \approx \exp\Big( -\frac{\sqrt{\alpha}\,(1 - \mu_{2,\kappa})\, J_{p,q}\, X_p X_q}{\sqrt n} \Big), \qquad\text{where } J_{p,q} = \frac{\sum_{j=1}^m G_{j,p} G_{j,q}}{\sqrt m}.
\]
Since the rows are independent, $J_{p,q}$ is asymptotically $N(0,1)$. Expanding the logarithm of the resulting partition function in these pairwise tilts leads to $Y$ as a similar sum over weighted cycles.

For a positive integer $M \ge 2$, we define a truncated version of the sum $Y$ as
\[
Y_M := \sum_{k=2}^{M} \Big( \frac{(2\beta)^k C_k}{2k} - \frac{(2\beta)^{2k}}{4k} \Big).
\]

Lemma 3.1.
Let $\kappa > 0$ and $0 < \alpha < \alpha_c(\kappa)$. Take $m = \lfloor \alpha n \rfloor$. Under Hypothesis 2.1, for any $\varepsilon > 0$, there exist an integer $M \ge 2$ and $M_1 > 0$ such that
\[
\mathbb{E}\Big( \frac{Z(G)}{\exp(Y_M \mathbb{1}[|Y_M| \le M_1])} \Big) \ge (1 - \varepsilon)\, \mathbb{E}(Z(G))
\qquad\text{and}\qquad
\mathbb{E}\Big( \frac{Z(G)^2}{\exp(2 Y_M \mathbb{1}[|Y_M| \le M_1])} \Big) \le (1 + \varepsilon)\, \big( \mathbb{E}(Z(G)) \big)^2.
\]

This lemma enables us to say that $Z(G)/\mathbb{E} Z$ is close to $\exp(Y_M \mathbb{1}[|Y_M| \le M_1])$, which is in turn close to $\exp(Y)$; this will imply Theorem 2.5. Together with a description of the distribution of $Y$, we can then obtain all the other theorems. In order to prove Lemma 3.1 and to study the distribution of $Y$, we need a characterization of the cycle counts. Recall the definition of the planted model in Definition 2.1.

Definition 3.1 (Two solutions planted). We sample two solutions $X_1$ and $X_2$ with replacement uniformly at random from $\{\pm 1\}^n$, conditioned on $\langle X_1, X_2 \rangle = t\sqrt n$. We use $\mathbb{P}^*_t$ to denote the distribution of $G$ conditioned on $X_1, X_2 \in S_m(G)$.

With this definition, we have the following characterization of the $C_k$.

Lemma 3.2.
Let $m = \lfloor \alpha n \rfloor$. Under $\mathbb{P}$, for any integer $k \ge 2$,
\[
\Big( \frac{C_2}{\sqrt 4}, \cdots, \frac{C_k}{\sqrt{2k}} \Big) \xrightarrow{d} N(0, I_{k-1}),
\]
as $n$ goes to infinity. Under $\mathbb{P}^*$, for any integer $k \ge 2$,
\[
\Big( \frac{C_2 - (2\beta)^2}{\sqrt 4}, \cdots, \frac{C_k - (2\beta)^k}{\sqrt{2k}} \Big) \xrightarrow{d} N(0, I_{k-1}),
\]
as $n$ goes to infinity. For any $|t| \le \log n$ and any integer $k \ge 2$, under $\mathbb{P}^*_t$,
\[
\Big( \frac{C_2 - 2(2\beta)^2}{\sqrt 4}, \cdots, \frac{C_k - 2(2\beta)^k}{\sqrt{2k}} \Big) \xrightarrow{d} N(0, I_{k-1}),
\]
as $n$ goes to infinity.

Next we describe the asymptotic distribution of $Y_M$. Define the function
\[
L(M) := \sum_{k=2}^{M} \frac{(2\beta)^{2k}}{k}.
\]

Lemma 3.3.
Let $m = \lfloor \alpha n \rfloor$. Under $\mathbb{P}$, for any integer $M \ge 2$,
\[
Y_M \xrightarrow{d} N\Big( -\frac{1}{4} L(M), \; \frac{1}{2} L(M) \Big),
\]
as $n$ goes to infinity. Under $\mathbb{P}^*$, for any integer $M \ge 2$,
\[
Y_M \xrightarrow{d} N\Big( \frac{1}{4} L(M), \; \frac{1}{2} L(M) \Big),
\]
as $n$ goes to infinity. For any $|t| \le \log n$ and any integer $M \ge 2$, under $\mathbb{P}^*_t$,
\[
Y_M \xrightarrow{d} N\Big( \frac{3}{4} L(M), \; \frac{1}{2} L(M) \Big),
\]
as $n$ goes to infinity.

We note that $L(M) = \sum_{k=2}^M \frac{(2\beta)^{2k}}{k}$ is a partial sum of the Taylor series of $-\log(1 - 4\beta^2) - 4\beta^2$; therefore $L(M) \to -\log(1 - 4\beta^2) - 4\beta^2$ as $M \to \infty$. This explains the parameters of the lognormal distribution in Theorem 2.1.

We start by defining a few discrete analogues of the definitions in Section 2.1. Define
\[
P_{\kappa,n} = \mathbb{P}\big( |X| \le \kappa \sqrt n \big),
\]
where $X$ is the sum of $n$ independent Rademacher random variables. And define
\[
\mu_{2,\kappa,n} = \frac{1}{2^n\, n\, P_{\kappa,n}} \sum_{\substack{0 \le t \le n \\ |2t - n| \le \kappa\sqrt n}} (2t - n)^2 \binom{n}{t}
\qquad\text{and}\qquad
\beta_n = -\frac{\sqrt m}{2 \sqrt n}\, \big( 1 - \mu_{2,\kappa,n} \big).
\]
Note that when $m = \lfloor \alpha n \rfloor$, the quantities $P_{\kappa,n}$, $\mu_{2,\kappa,n}$ and $\beta_n$ converge to $P_\kappa$, $\mu_{2,\kappa}$ and $\beta$ as $n \to \infty$.

To prove the asymptotic normality of the $C_k$, we define shifted cycle counts. Without loss of generality, when we plant one solution we assume that the all-ones vector is chosen, and when we plant two solutions we assume that one of them is the all-ones vector and the other has $1$'s in the first $n/2 + t\sqrt n/2$ coordinates and $-1$'s elsewhere. Under $\mathbb{P}^*$, define
\[
\bar C_k(G) = \Big( \frac{1}{\sqrt n} \Big)^k \Big( \frac{1}{\sqrt m} \Big)^k \sum_{\substack{i_1, \cdots, i_k \in [n] \text{ disjoint} \\ j_1, \cdots, j_k \in [m] \text{ disjoint}}} \; \prod_{\ell=1}^{k} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big),
\]
and under $\mathbb{P}^*_t$, define
\[
\bar C_k(G) = \Big( \frac{1}{\sqrt n} \Big)^k \Big( \frac{1}{\sqrt m} \Big)^k \sum_{\substack{i_1, \cdots, i_k \in [n] \text{ disjoint} \\ j_1, \cdots, j_k \in [m] \text{ disjoint}}} \; \prod_{\ell=1}^{k} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{4\beta_n\, \mathbb{1}(i_\ell, i_{\ell+1} \in Q)}{\sqrt{mn}} \Big),
where we write $i_\ell, i_{\ell+1} \in Q$ whenever $i_\ell, i_{\ell+1} \in [n/2 + t\sqrt n/2]$ or $i_\ell, i_{\ell+1} \in [n] \setminus [n/2 + t\sqrt n/2]$.

Lemma 4.1.
Let $m = \lfloor \alpha n \rfloor$. Under $\mathbb{P}^*$, or under $\mathbb{P}^*_t$ where $|t| \le \log n$, for any integer $k \ge 2$,
\[
\Big( \frac{\bar C_2}{\sqrt 4}, \cdots, \frac{\bar C_k}{\sqrt{2k}} \Big) \xrightarrow{d} N(0, I_{k-1}),
\]
as $n$ goes to infinity. Under $\mathbb{P}$, for any integer $k \ge 2$,
\[
\Big( \frac{C_2}{\sqrt 4}, \cdots, \frac{C_k}{\sqrt{2k}} \Big) \xrightarrow{d} N(0, I_{k-1}),
\]
as $n$ goes to infinity. Moreover, we have that
Lemma 4.2.
Let $m = \lfloor \alpha n \rfloor$. Under $\mathbb{P}^*$, for any integer $k \ge 2$,
\[
C_k - \bar C_k - (2\beta_n)^k \xrightarrow{p} 0,
\]
as $n$ goes to infinity. For any $|t| \le \log n$ and any integer $k \ge 2$, under $\mathbb{P}^*_t$,
\[
C_k - \bar C_k - 2(2\beta_n)^k \xrightarrow{p} 0,
\]
as $n$ goes to infinity.

The two lemmas together imply Lemma 3.2.

Fix an integer $1 \le j \le m$ and consider a multigraph $H_j = ([n], E(H_j))$. We write $V(H_j)$ for the set of all non-isolated vertices. For each edge $e = (p, q) \in E(H_j)$, we define
\[
\bar G_e := G_{p,j} G_{q,j} - \frac{2\beta_n}{\sqrt{mn}} \qquad\text{and}\qquad \bar G_{H_j} := \prod_{e \in E(H_j)} \bar G_e,
\]
when we work with $\mathbb{E}^*$, and define
\[
\bar G_e := G_{p,j} G_{q,j} - \frac{4\beta_n\, \mathbb{1}(p, q \in Q)}{\sqrt{mn}} \qquad\text{and}\qquad \bar G_{H_j} := \prod_{e \in E(H_j)} \bar G_e,
\]
when we work with $\mathbb{E}^*_t$.

Lemma 4.3. If $H_j$ is an even graph, then $\mathbb{E}^*(\bar G_{H_j}) = O(1)$. If there are exactly two vertices in $V(H_j)$ with odd degrees and $|E(H_j)| \ge 2$, then $\mathbb{E}^*(\bar G_{H_j}) = O(1/n)$. Else, $\mathbb{E}^*(\bar G_{H_j}) = O(1/n^2)$.

Proof. For simplicity of notation, we drop $j$ from the subscripts. Note that we assumed that the all-ones vector is the planted solution, so $\mathbb{E}^*$ is symmetric in the entries $G_i$. In particular, for disjoint $i_1, i_2, \cdots, i_k \in [n]$,
\[
\mathbb{E}^*(G_{i_1} \cdots G_{i_k}) = \mathbb{E}^*(G_1 \cdots G_k) = \mathbb{E}\Big( G_1 \cdots G_k \;\Big|\; \Big| \sum_{i=1}^{n} G_i \Big| \le \kappa \sqrt n \Big).
\]
The expected product is zero when $k$ is odd, by symmetry. When $k$ is even, we notice that by Stirling's approximation,
\[
\mathbb{P}^*(G_1 = 1, \cdots, G_a = 1, G_{a+1} = -1, \cdots, G_k = -1)
= \frac{1}{P_{\kappa,n}\, 2^{n}} \sum_{\substack{t : |t| \le \kappa \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}} \binom{n - k}{\frac{n + t\sqrt n}{2} - a}
\]
\[
= \frac{1}{P_{\kappa,n}\, 2^{n}} \sum_{\substack{t : |t| \le \kappa \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}} \binom{n}{\frac{n + t\sqrt n}{2}}\, 2^{-k} \Big( 1 + (2a - k)\frac{t}{\sqrt n} + \big( (k - 2a)^2 - k \big)\frac{t^2 - 1}{2n} + \frac{\mathrm{poly}_{a,k,t}}{n^{3/2}} + O\Big( \frac{1}{n^2} \Big) \Big)
\]
\[
= 2^{-k} \Big( 1 + \big( (k - 2a)^2 - k \big)\frac{\mu_{2,\kappa,n} - 1}{2n} + O\Big( \frac{1}{n^2} \Big) \Big),
\]
where $\mathrm{poly}_{a,k,t}$ is an explicit polynomial, odd in $t$, whose contribution cancels in the sum. It follows that
\[
\mathbb{E}^*(G_1 \cdots G_k) = \sum_{a=0}^{k} (-1)^{k-a} \binom{k}{a}\, 2^{-k} \Big( 1 + \big( (k - 2a)^2 - k \big)\frac{\mu_{2,\kappa,n} - 1}{2n} + O\Big( \frac{1}{n^2} \Big) \Big) \qquad (5)
\]
\[
= \begin{cases}
O(1/n^2), & \text{if } k \ge 4, \\[2pt]
\dfrac{2\beta_n}{\sqrt{mn}} + O(1/n^2), & \text{if } k = 2.
\end{cases} \qquad (6)
\]
Here we used the definition $\beta_n = \sqrt m\,(\mu_{2,\kappa,n} - 1)/(2\sqrt n)$. Now, if $H$ consists of a single edge, then
\[
\mathbb{E}^*(\bar G_H) = \mathbb{E}^*\Big( G_p G_q - \frac{2\beta_n}{\sqrt{mn}} \Big) = O\Big( \frac{1}{n^2} \Big).
\]
Otherwise, expanding each factor $\bar G_e = G_p G_q - \frac{2\beta_n}{\sqrt{mn}}$,
\[
\mathbb{E}^*(\bar G_H) = \mathbb{E}^*\Big( \prod_{e \in E(H)} G_p G_q \Big) + \sum_{A \subsetneq E(H)} \mathbb{E}^*\Big( \prod_{e \in A} G_p G_q \Big) \prod_{e \in E(H) \setminus A} \Big( -\frac{2\beta_n}{\sqrt{mn}} \Big) =: S_1 + S_2.
\]
Notice that when $|E(H)| \ge 2$ and $H$ has at least $4$ vertices with odd degrees, we have $S_1 = O(1/n^2)$ by equation (6). For the second term $S_2$: if $A$ consists of all but one edge, then $\mathbb{E}^*(\prod_{e \in A} G_p G_q) = O(1/n)$, and the corresponding term is $O(1/n^2)$; otherwise the term is directly $O(1/n^2)$. Now, if $H$ has only $2$ vertices with odd degrees, then by equation (6) again $S_1 = O(1/n)$; as $S_2 = O(1/n^2)$ as well, $\mathbb{E}^*(\bar G_H) = O(1/n)$. Finally, if $H$ is even, then every product in the computation is $O(1)$, so the statement follows.
Lemma 4.4. Let $|t| \le \log n$. If $H_j$ is an even graph, then $\mathbb{E}^*_t(\bar G_{H_j}) = O(1)$. If there are exactly two vertices in $V(H_j)$ with odd degrees and $|E(H_j)| \ge 2$, then $\mathbb{E}^*_t(\bar G_{H_j}) = O(1/n)$. Else, $\mathbb{E}^*_t(\bar G_{H_j}) = O(t^2/n^2)$.

Proof. For simplicity, write $T = n/2 + t\sqrt n/2$ and $[T]^c = [n] \setminus [T]$. As we assumed that the two planted solutions are the all-ones vector and the vector with ones exactly in the first $T$ entries, $\mathbb{E}^*_t$ is symmetric within the entries $G_i$, $i \in [T]$, and within the entries $G_i$, $i \in [T]^c$. In particular, for disjoint $i_1, \cdots, i_{k_1} \in [T]$ and $i_{k_1+1}, \cdots, i_{k_1+k_2} \in [T]^c$,
\[
\mathbb{E}^*_t(G_{i_1} \cdots G_{i_{k_1+k_2}}) = \mathbb{E}^*_t(G_1 \cdots G_{k_1} G_{T+1} \cdots G_{T+k_2}) \qquad (7)
\]
\[
= \mathbb{E}\Big( G_1 \cdots G_{k_1} G_{T+1} \cdots G_{T+k_2} \;\Big|\; \Big| \sum_{i \in [n]} G_i \Big| \le \kappa\sqrt n, \; \Big| \sum_{i \in [T]} G_i - \sum_{i \in [T]^c} G_i \Big| \le \kappa\sqrt n \Big). \qquad (8)
\]
We define
\[
P_\kappa(t) := \mathbb{P}\Big( \Big| \sum_{i \in [n]} G_i \Big| \le \kappa\sqrt n, \; \Big| \sum_{i \in [T]} G_i - \sum_{i \in [T]^c} G_i \Big| \le \kappa\sqrt n \Big)
= 2^{-n} \sum_{x_1, x_2} \binom{\frac{n + t\sqrt n}{2}}{\frac{n + (t + x_1 + x_2)\sqrt n}{4}} \binom{\frac{n - t\sqrt n}{2}}{\frac{n + (-t + x_1 - x_2)\sqrt n}{4}}
= P_{\kappa,n}^2 \Big( 1 + \frac{(t^2 - 1)(1 - \mu_{2,\kappa,n})^2}{2n} + O\Big( \frac{t^4}{n^2} \Big) \Big),
\]
where the summation is over $x_1$ and $x_2$ that satisfy $|x_1|, |x_2| \le \kappa$, $\frac{n + x_1\sqrt n}{2} \in \mathbb{Z}$, $\frac{n + x_2\sqrt n}{2} \in \mathbb{Z}$ and $\frac{n + (x_1 + x_2)\sqrt n}{2}$ even. Write $k = k_1 + k_2$. The expected product in (7) is zero when $k$ is odd, by symmetry. We write $G_{a:b} = 1$ to denote $G_a = \cdots = G_b = 1$. When $k$ is even, we note that by Stirling's approximation,
\[
\mathbb{P}^*_t(G_{1:a} = 1, G_{a+1:k_1} = -1, G_{T+1:T+b} = 1, G_{T+b+1:T+k_2} = -1)
= \frac{1}{P_\kappa(t)}\, 2^{-n} \sum_{x_1, x_2} \binom{\frac{n + t\sqrt n}{2} - k_1}{\frac{n + (t + x_1 + x_2)\sqrt n}{4} - a} \binom{\frac{n - t\sqrt n}{2} - k_2}{\frac{n + (-t + x_1 - x_2)\sqrt n}{4} - b}
\]
\[
= \frac{1}{P_\kappa(t)}\, 2^{-n} \sum_{x_1, x_2} 2^{-k_1-k_2} \binom{\frac{n + t\sqrt n}{2}}{\frac{n + (t + x_1 + x_2)\sqrt n}{4}} \binom{\frac{n - t\sqrt n}{2}}{\frac{n + (-t + x_1 - x_2)\sqrt n}{4}} \Big( 1 + \frac{\mathrm{poly}_{a,b,x_1,x_2,t}}{n} + O\Big( \frac{t^2}{n^2} \Big) \Big),
\]
where the summation is over the same set of $x_1, x_2$ as above. Here $\mathrm{poly}_{a,b,x_1,x_2,t}$ is an explicit polynomial consisting of multiples of $(x_1^2 - 1)$, $(x_2^2 - 1)$ and $(t^2 - 1)$; the terms that are odd in $x_1$ and $x_2$ cancel in the sum, so we omit them here. Note that
\[
\frac{1}{P_\kappa(t)}\, 2^{-n} \sum_{x_1, x_2} \binom{\frac{n + t\sqrt n}{2}}{\frac{n + (t + x_1 + x_2)\sqrt n}{4}} \binom{\frac{n - t\sqrt n}{2}}{\frac{n + (-t + x_1 - x_2)\sqrt n}{4}}\, x_1^2 = \frac{P_{\kappa,n}^2}{P_\kappa(t)}\, \mu_{2,\kappa,n}.
\]
Then we have that
\[
\mathbb{P}^*_t(G_{1:a} = 1, G_{a+1:k_1} = -1, G_{T+1:T+b} = 1, G_{T+b+1:T+k_2} = -1)
= \frac{P_{\kappa,n}^2}{P_\kappa(t)}\, 2^{-k} \Big( 1 + 2\big( (k_1 - 2a)^2 + (k_2 - 2b)^2 - k \big)\frac{\mu_{2,\kappa,n} - 1}{2n} + \frac{(t^2 - 1)(1 - \mu_{2,\kappa,n})^2}{2n} + O\Big( \frac{t^2}{n^2} \Big) \Big)
\]
\[
= 2^{-k} \Big( 1 + 2\big( (k_1 - 2a)^2 + (k_2 - 2b)^2 - k \big)\frac{\mu_{2,\kappa,n} - 1}{2n} + O\Big( \frac{t^2}{n^2} \Big) \Big).
\]
This further implies that
\[
\mathbb{E}^*_t(G_1 \cdots G_{k_1} G_{T+1} \cdots G_{T+k_2})
= \sum_{a=0}^{k_1} \sum_{b=0}^{k_2} (-1)^{k-a-b} \binom{k_1}{a} \binom{k_2}{b}\, 2^{-k} \Big( 1 + 2\big( (k_1 - 2a)^2 + (k_2 - 2b)^2 - k \big)\frac{\mu_{2,\kappa,n} - 1}{2n} + O\Big( \frac{t^2}{n^2} \Big) \Big) \qquad (9, 10)
\]
\[
= \begin{cases}
O(1/n^2), & \text{if } k \ge 4, \\[2pt]
\dfrac{4\beta_n}{\sqrt{mn}} + O(t^2/n^2), & \text{if } k = 2 \text{ and } k_1 k_2 = 0, \\[2pt]
O(t^2/n^2), & \text{else.}
\end{cases} \qquad (11)
\]
Here we used again the definition $\beta_n = \sqrt m\,(\mu_{2,\kappa,n} - 1)/(2\sqrt n)$. The rest of the argument is similar to the proof of Lemma 4.3. If $H$ consists of a single edge, then
\[
\mathbb{E}^*_t(\bar G_H) = \mathbb{E}^*_t\Big( G_p G_q - \frac{4\beta_n\, \mathbb{1}(p, q \in Q)}{\sqrt{mn}} \Big) = O\Big( \frac{t^2}{n^2} \Big).
\]
Otherwise,
\[
\mathbb{E}^*_t(\bar G_H) = \mathbb{E}^*_t\Big( \prod_{e \in E(H)} G_p G_q \Big) + \sum_{A \subsetneq E(H)} \mathbb{E}^*_t\Big( \prod_{e \in A} G_p G_q \Big) \prod_{e \in E(H) \setminus A} \Big( -\frac{4\beta_n\, \mathbb{1}(p, q \in Q)}{\sqrt{mn}} \Big) =: S_1 + S_2.
\]
Notice that when $|E(H)| \ge 2$ and $H$ has at least $4$ vertices with odd degrees, we have $S_1 = O(t^2/n^2)$ by equation (11). For the second term $S_2$: if $A$ consists of all but one edge, then $\mathbb{E}^*_t(\prod_{e \in A} G_p G_q) = O(1/n)$, and the corresponding term is $O(1/n^2)$; otherwise the term is directly $O(1/n^2)$. Now if $H$ has only $2$ vertices with odd degrees, then by equation (11) again $S_1 = O(1/n)$; as $S_2 = O(1/n^2)$ as well, $\mathbb{E}^*_t(\bar G_H) = O(1/n)$. Finally, if $H$ is even, then every product in the computation is $O(1)$, so the statement follows.

Proof of Lemma 4.1. We establish this through the method of moments and Wick's theorem. We will show that, for non-negative integers $\alpha_2, \cdots, \alpha_k$,
\[
\lim_{n\to\infty} \mathbb{E}^*\, \bar C_2^{\alpha_2} \bar C_3^{\alpha_3} \cdots \bar C_k^{\alpha_k}
= \begin{cases}
\prod_{\ell=2}^{k} \dfrac{\alpha_\ell!}{2^{\alpha_\ell/2}(\alpha_\ell/2)!}\, (2\ell)^{\alpha_\ell/2}, & \text{if all } \alpha_\ell \text{ are even}, \\[4pt]
0, & \text{else},
\end{cases} \qquad (12)
\]
where the right-hand side is the joint moment of the multivariate Gaussian distribution $N(0, \Sigma)$, with $\Sigma$ the diagonal matrix with diagonal vector $(4, 6, \cdots, 2k)$. We write $\bar C^{\boldsymbol\alpha} = \bar C_2^{\alpha_2} \bar C_3^{\alpha_3} \cdots \bar C_k^{\alpha_k}$ for short, and define $\mathcal L(\boldsymbol\alpha) = \sum_{t=2}^{k} t\, \alpha_t$.
In order to distinguish the different indices in the summation defining $\bar C^{\boldsymbol\alpha}$, we write $I^t_s = (i^{ts}_1, i^{ts}_2, \cdots, i^{ts}_s)$, where the $i^{ts}_\ell \in [n]$ are disjoint, and $J^t_s = (j^{ts}_1, j^{ts}_2, \cdots, j^{ts}_s)$, where the $j^{ts}_\ell \in [m]$ are disjoint. This implies that
\[
\mathbb{E}^* \bar C^{\boldsymbol\alpha}
= \Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \mathbb{E}^* \prod_{s=2}^{k} \prod_{t=1}^{\alpha_s} \sum_{I^t_s, J^t_s} \prod_{\ell=1}^{s} \Big( G_{i^{ts}_\ell, j^{ts}_\ell} G_{i^{ts}_{\ell+1}, j^{ts}_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big)
= \Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \sum_{I, J} \mathbb{E}^* \prod_{s=2}^{k} \prod_{t=1}^{\alpha_s} \prod_{\ell=1}^{s} \Big( G_{i^{ts}_\ell, j^{ts}_\ell} G_{i^{ts}_{\ell+1}, j^{ts}_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big)
=: \Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \sum_{I, J} \mathbb{E}^* \bar G_{I,J},
\]
where $I$ collects all the $I^t_s$ for $s$ ranging from $2$ to $k$ and $t$ from $1$ to $\alpha_s$; the same convention holds for $J$. For each fixed $I$ and $J$, we define a series of multigraphs. Specifically, we start with $H = ([n], E(H))$; we write $V(H)$ for the set of all non-isolated vertices. We go over all $I^t_s$, and for each $i^{ts}_\ell \in I^t_s$ we draw an edge $(i^{ts}_\ell, i^{ts}_{\ell+1})$ in $H$. Now we construct the other multigraphs. Let $R$ be the set of all indices $j^{ts}_\ell$ that appear in $J$. Fixing any $j \in R$, we construct a multigraph $H_j = ([n], E(H_j))$ in the following way: we go over all $I^t_s$, and for each $i^{ts}_\ell \in I^t_s$, if $j^{ts}_\ell = j$, then we draw an edge $(i^{ts}_\ell, i^{ts}_{\ell+1})$ in $H_j$.

For two pairs $(I, J)$ and $(I', J')$, we write $(I, J) \sim (I', J')$ if $|R(I,J)| = |R(I',J')|$ and there is a permutation $\sigma$ of $[m]$ such that each multigraph $H_j(I,J)$ is isomorphic to $H_{\sigma(j)}(I',J')$. We refer to the equivalence classes as types, and write $(I,J) \in \mathcal T$ to indicate that a pair belongs to the specific type $\mathcal T$. If two pairs $(I,J)$ belong to the same type, then an easy observation shows that the values $\mathbb{E}^* \bar G_{I,J}$ agree, by symmetry. Therefore, we can write
\[
\mathbb{E}^* \bar C^{\boldsymbol\alpha} = \Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \sum_{\mathcal T} \sum_{(I,J) \in \mathcal T} \mathbb{E}^* \bar G_{I,J} = \sum_{\mathcal T} O\Big( n^{-\mathcal L(\boldsymbol\alpha)}\, n^{|V(H)|}\, n^{|R|} \Big)\, \mathbb{E}^* \bar G_{\mathcal T}.
\]
If we denote by $R_1$ the number of even multigraphs $H_j$, by $R_2$ the number of multigraphs $H_j$ with exactly two odd-degree vertices and $|E(H_j)| \ge 2$, and set $R_3 = |R| - R_1 - R_2$, then by Lemma 4.3, $\mathbb{E}^* \bar G_{\mathcal T}$ is bounded by $O(n^{-R_2 - 2R_3})$. This implies that
\[
\mathbb{E}^* \bar C^{\boldsymbol\alpha} = \sum_{\mathcal T} O\big( n^{-\mathcal L(\boldsymbol\alpha) + |V(H)| + R_1 + R_2 + R_3 - R_2 - 2R_3} \big) = \sum_{\mathcal T} O\big( n^{-\mathcal L(\boldsymbol\alpha) + |V(H)| + R_1 - R_3} \big).
\]
Now for any $v \in V(H)$, define $R_1(v)$ to be the number of $j$ such that $H_j$ is even and $v \in V(H_j)$. We claim that $\deg_H(v) \ge R_1(v) + 2$. Note that, as $H$ is a sum of cycles, any vertex $v \in V(H)$ satisfies $\deg_H(v) \ge 2$. Moreover, as the relevant $H_j$ are even, $\deg_H(v) \ge 2R_1(v)$. Now if $R_1(v) = 1$, then the two edges $(v, u_1)$ and $(v, u_2)$ in $E(H_j)$ must come from two different cycles by definition; in each of these cycles, there is another edge incident to $v$. This implies that $\deg_H(v) \ge 4$. Altogether, the claim holds. It follows that
\[
\mathcal L(\boldsymbol\alpha) = \frac12 \sum_{v \in V(H)} \deg(v) \ \ge\ \frac12 \sum_{v \in V(H)} R_1(v) + |V(H)| \ \ge\ R_1 + |V(H)|.
\]
Therefore,
\[
\mathbb{E}^* \bar C^{\boldsymbol\alpha} = \sum_{\mathcal T} O\big( n^{-R_3} \big).
\]
Note that in order for a type $\mathcal T$ to contribute $\Theta(1)$ to the summation, we need all the inequalities above to be equalities. Specifically, we need $R_3 = 0$, $\deg_H(v) = R_1(v) + 2$ for every vertex $v \in V(H)$, and $|V(H_j)| = 2$ for every even $H_j$. Note that $\deg_H(v) = R_1(v) + 2$ holds only when $\deg_H(v) = 2$ and $R_1(v) = 0$, or $\deg_H(v) = 4$ and $R_1(v) = 2$. Now we show that $R_2 = 0$. If $R_2 \ne 0$, find any $H_j$ with exactly two odd-degree vertices and $|E(H_j)| \ge 2$, and find a vertex $v \in V(H_j)$ with $\deg_{H_j}(v)$ even. Then $\deg_H(v) > 2$, since the two edges $(v, u_1)$ and $(v, u_2)$ in $E(H_j)$ must come from two different cycles. Yet $\deg_H(v) = 4$ and $R_1(v) = 2$ cannot hold at the same time, because this $H_j$ is not even and hence $\deg_H(v) \ge 2R_1(v) + \deg_{H_j}(v) \ge 2R_1(v) + 2$. Therefore, all $H_j$ are even graphs, and since $|V(H_j)| = 2$ for every even $H_j$, each $H_j$ can only be a double edge. Moreover, $H$ must then be a disjoint union of double cycles (cycles in which each edge is a double edge). This is only possible when all $\alpha_s$ are even, and by definition there should be $\alpha_s/2$ $s$-double-cycles. We call this type $\mathcal T_0$; any other type contributes $O(n^{-1})$ to the summation. Now we compute
\[
\Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \sum_{(I,J) \in \mathcal T_0} \mathbb{E}^* \bar G_{I,J}
= \Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \sum_{(I,J) \in \mathcal T_0} \big( 1 + O(n^{-1}) \big)
= \Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)}\, \big| \{ (I,J) \in \mathcal T_0 \} \big|\, \big( 1 + O(n^{-1}) \big).
\]
Note that for each $s$, there are $\alpha_s!/(2^{\alpha_s/2}(\alpha_s/2)!)$ ways to pair up the $\alpha_s$ cycles of length $s$, and for each pair of cycles there are $2s$ ways to align them into a double cycle. Additionally, there are $n^{\mathcal L(\boldsymbol\alpha)/2}(1 + O(n^{-1}))$ ways to choose the $\mathcal L(\boldsymbol\alpha)/2$ vertices of $H$ and $m^{\mathcal L(\boldsymbol\alpha)/2}(1 + O(n^{-1}))$ ways to choose the $\mathcal L(\boldsymbol\alpha)/2$ indices in $R$. Putting all of this together, we have
\[
\Big( \frac{1}{\sqrt{mn}} \Big)^{\mathcal L(\boldsymbol\alpha)} \sum_{(I,J) \in \mathcal T_0} \mathbb{E}^* \bar G_{I,J} = \big( 1 + O(n^{-1}) \big) \prod_{s=2}^{k} \frac{\alpha_s!}{2^{\alpha_s/2}(\alpha_s/2)!}\, (2s)^{\alpha_s/2}.
\]
Therefore, equation (12) holds, and the statement follows by the method of moments. The arguments for $\mathbb{P}^*_t$ and $\mathbb{P}$ are the same, so we omit them here.

Proof of Lemma 4.2. We start with $\mathbb{P}^*$. Notice that
\[
C_k(G) - \bar C_k(G) - (2\beta_n)^k = -(2\beta_n)^k + \Big( \frac{1}{\sqrt n} \Big)^k \Big( \frac{1}{\sqrt m} \Big)^k \sum_{\substack{i_1, \cdots, i_k \in [n] \text{ disjoint} \\ j_1, \cdots, j_k \in [m] \text{ disjoint}}} \Big[ \prod_{\ell=1}^{k} G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \prod_{\ell=1}^{k} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big) \Big]
\]
\[
= \Big( \frac{1}{\sqrt n} \Big)^k \Big( \frac{1}{\sqrt m} \Big)^k \sum_{\substack{i_1, \cdots, i_k \in [n] \text{ disjoint} \\ j_1, \cdots, j_k \in [m] \text{ disjoint}}} \; \sum_{\substack{A \subsetneq [k] \\ A \ne \emptyset}} \prod_{\ell \in A} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big) \prod_{\ell \in [k] \setminus A} \frac{2\beta_n}{\sqrt{mn}} \; + \; O\Big( \frac{1}{n} \Big).
\]
Write $I = (i_1, \cdots, i_k)$ and $J = (j_1, \cdots, j_k)$.
Define
\[
R(I,J) := \sum_{\substack{A \subsetneq [k] \\ A \ne \emptyset}} \prod_{\ell \in A} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big) \prod_{\ell \in [k] \setminus A} \frac{2\beta_n}{\sqrt{mn}}.
\]
Note that
\[
\mathbb{E}^*\big( C_k(G) - \bar C_k(G) - (2\beta_n)^k \big)^2 = \Big( \frac{1}{n} \Big)^k \Big( \frac{1}{m} \Big)^k \sum_{I_1, J_1, I_2, J_2} \mathbb{E}^*\, R(I_1, J_1)\, R(I_2, J_2) + O\Big( \frac{1}{n} \Big). \qquad (13, 14)
\]
Notice that
\[
\mathbb{E}^*\, R(I_1, J_1)\, R(I_2, J_2)
= \sum_{\substack{A_1 \subsetneq [k] \\ A_1 \ne \emptyset}} \sum_{\substack{A_2 \subsetneq [k] \\ A_2 \ne \emptyset}} O\big( n^{|A_1| + |A_2| - 2k} \big)\; \mathbb{E}^* \prod_{\ell \in A_1} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big) \prod_{\ell \in A_2} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big).
\]
For fixed $A_1$ and $A_2$, we construct the multigraphs $H$ and $H_j$ in the same way as before, from the indices $i_\ell, j_\ell$ with $\ell \in A_1$ and $i_\ell, j_\ell$ with $\ell \in A_2$. Using the same notation, by Lemma 4.3 we have
\[
\mathbb{E}^* \prod_{\ell \in A_1} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big) \prod_{\ell \in A_2} \Big( G_{i_\ell, j_\ell} G_{i_{\ell+1}, j_\ell} - \frac{2\beta_n}{\sqrt{mn}} \Big) = O\big( n^{-R_2 - 2R_3} \big),
\]
and thus its total contribution to the sum in (13) is $O(n^{|A_1| + |A_2| - 2k + |V(H)| + R_1 - R_3})$. Note that any even graph $H_j$ must have exactly two vertices (and is thus a double edge), because our $A_1$ and $A_2$ come from only two cycles: if $H_j$ had more than two vertices, any vertex of $H_j$ with degree larger than $2$ would require more than two cycles, a contradiction. So $R_1$ is bounded by the number of double edges in $H$. As $H$ is a sum of two subsets of cycles, $E(H)$ consists only of single and double edges. Therefore $R_1 + |V(H)| \le |A_1| + |A_2|$. As $|A_1| < k$ and $|A_2| < k$, we conclude
\[
\mathbb{E}^*\big( C_k(G) - \bar C_k(G) - (2\beta_n)^k \big)^2 = O\Big( \frac{1}{n} \Big).
\]
For $\mathbb{P}^*_t$ with $|t| \le \log n$, we note that
\[
\Big( \frac{1}{\sqrt n} \Big)^k \Big( \frac{1}{\sqrt m} \Big)^k \sum_{\substack{i_1, \cdots, i_k \in [n] \text{ disjoint} \\ j_1, \cdots, j_k \in [m] \text{ disjoint}}} \prod_{\ell=1}^{k} \frac{4\beta_n\, \mathbb{1}[i_\ell, i_{\ell+1} \in Q]}{\sqrt{mn}}
= \Big( \Big( \frac{n + t\sqrt n}{2n} \Big)^k + \Big( \frac{n - t\sqrt n}{2n} \Big)^k \Big) (4\beta_n)^k \big( 1 + O(1/n) \big)
= 2(2\beta_n)^k \Big( 1 + O\Big( \frac{t}{\sqrt n} \Big) \Big).
\]
The rest of the argument is similar, now applying Lemma 4.4 in place of Lemma 4.3; we omit the detailed computations here.
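The $k = 2$ shifts in Lemmas 3.2 and 4.2 can be observed directly in simulation. For $k = 2$, the disjoint-index sum in (4) collapses to a closed form in $\|AA^{\mathsf T}\|_F^2$ (with $A$ the $n \times m$ orientation of $G$), and the planted law $\mathbb{P}^*$ can be sampled by rejection, row by row. The sketch below is our illustration, not code from the paper; it assumes NumPy, and with $m = n$ the planted mean of $C_2$ matches $(2\beta_n)^2$ by a short exchangeability computation:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = m = 50
kappa = 1.0
lim = kappa * math.sqrt(n)

def C2(A):
    """C_2 of equation (4) for an n x m +-1 matrix A: the sum over disjoint
    index pairs equals (||A A^T||_F^2 - n*m^2 - n*(n-1)*m) / (n*m)."""
    nn, mm = A.shape
    M = A @ A.T
    return (float(np.sum(M ** 2)) - nn * mm ** 2 - nn * (nn - 1) * mm) / (nn * mm)

# Deterministic check of the closed form against the defining quadruple sum.
A0 = rng.choice([-1, 1], size=(4, 3))
brute = sum(
    A0[i1, j1] * A0[i2, j1] * A0[i2, j2] * A0[i1, j2]
    for i1 in range(4) for i2 in range(4) if i1 != i2
    for j1 in range(3) for j2 in range(3) if j1 != j2
) / (4 * 3)
closed = C2(A0)

def mu2_discrete(n, kappa):
    lim, num, den = kappa * math.sqrt(n), 0.0, 0.0
    for t in range(n + 1):
        s = 2 * t - n
        if abs(s) <= lim:
            w = math.comb(n, t)
            num += w * s * s
            den += w
    return num / (n * den)

beta_n = -math.sqrt(m) / (2 * math.sqrt(n)) * (1 - mu2_discrete(n, kappa))

def planted_matrix():
    """One sample of G under P* (all-ones planted solution): each of the m rows is
    uniform on {-1,+1}^n conditioned on |row sum| <= kappa*sqrt(n), by rejection."""
    rows = []
    while len(rows) < m:
        r = rng.choice([-1, 1], size=n)
        if abs(r.sum()) <= lim:
            rows.append(r)
    return np.array(rows)

N = 1200
null_vals = [C2(rng.choice([-1, 1], size=(n, m))) for _ in range(N)]
planted_vals = [C2(planted_matrix().T) for _ in range(N)]
null_mean, null_var = float(np.mean(null_vals)), float(np.var(null_vals))
planted_mean = float(np.mean(planted_vals))
```

Under $\mathbb{P}$ the sample mean is near $0$ with variance near $4$, while under $\mathbb{P}^*$ it is near $(2\beta_n)^2 \approx 0.5$ for these parameters, illustrating the shift $(2\beta)^k$ for $k = 2$.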
We first note the following immediate consequence of Lemma 3.3.
Lemma 4.5.
Let $m = \lfloor \alpha n \rfloor$. For any integer $M \ge 2$ and any $M_1 > 0$,
\[
\mathbb{E}^* \exp\big( -Y_M \mathbf{1}[|Y_M| \le M_1] \big) \ge 1 - \exp\big( -M_1^2 / L(M) \big)
\]
as $n$ goes to infinity. For any $|t| \le \log n$ and any integer $M \ge 2$,
\[
\mathbb{E}^*_t \exp\big( -2Y_M \mathbf{1}[|Y_M| \le M_1] \big) \le \exp\big( -2L(M) \big)
\]
as $n$ goes to infinity.

Proof of Lemma 3.1. To begin with, note that $C_k$ is invariant if we multiply any column of $G$ by $-1$. This implies that the law of $C_k$ under $\mathbb{P}^*$ is the same whichever solution is planted. The same holds for $Y_M$. Therefore, by Lemma 4.5,
\[
\mathbb{E}\Bigg( \frac{Z(G)}{\exp(Y_M \mathbf{1}[|Y_M| \le M_1])} \Bigg)
= 2^{-mn} \sum_{X \in \{\pm 1\}^n} \sum_{G \in \{\pm 1\}^{mn}} \frac{\mathbf{1}(X \in S(G))}{\exp(Y_M \mathbf{1}[|Y_M| \le M_1])}
= \mathbb{E}(Z(G)) \, \mathbb{E}^* \exp\big( -Y_M(G,m,n) \mathbf{1}[|Y_M| \le M_1] \big)
\ge (1 - \varepsilon)\, \mathbb{E}(Z(G)),
\]
when $M_1$ is large. Note that $C_k$ is also invariant under any permutation of the columns. This implies that the law of $C_k$ under $\mathbb{P}^*_t$ is the same for any pair of planted solutions with $\langle X_1, X_2 \rangle = t\sqrt n$. Similarly to the above, we have
\[
\mathbb{E}\Bigg( \frac{Z(G)^2}{\exp(2 Y_M \mathbf{1}[|Y_M| \le M_1])} \Bigg)
= 2^{-mn} \sum_{\substack{t : |t| \le \sqrt n \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}}
\sum_{\substack{X_1, X_2 \in \{\pm 1\}^n \\ \langle X_1, X_2 \rangle = t\sqrt n}}
\sum_{G \in \{\pm 1\}^{mn}} \frac{\mathbf{1}(X_1, X_2 \in S(G))}{\exp(2 Y_M \mathbf{1}[|Y_M| \le M_1])}
= 4^n 2^{-n} \sum_{\substack{t : |t| \le \sqrt n \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}}
\binom{n}{\frac{n + t\sqrt n}{2}} P_t \, \mathbb{E}^*_t \exp\big( -2Y_M \mathbf{1}[|Y_M| \le M_1] \big),
\]
where $P_t$ denotes the probability that $X_1, X_2 \in S(G)$ for a given pair of vectors $X_1, X_2$ with $\langle X_1, X_2 \rangle = t\sqrt n$. Note that $P_t$ is explicit:
\[
P_t = \Bigg( \sum_{x_1, x_2} \binom{\frac{n + t\sqrt n}{2}}{\frac{n + (t + x_1 + x_2)\sqrt n}{4}} \binom{\frac{n - t\sqrt n}{2}}{\frac{n - (t + x_1 - x_2)\sqrt n}{4}} \, 2^{-n} \Bigg)^m, \tag{15}
\]
where the summation is over $x_1$ and $x_2$ satisfying $|x_1|, |x_2| \le \kappa$, $\frac{n + x_1\sqrt n}{2} \in \mathbb{Z}$, $\frac{n + x_2\sqrt n}{2} \in \mathbb{Z}$ and $\frac{n + (t + x_1 + x_2)\sqrt n}{2}$ even. By Stirling's approximation, for $t \le \log n$,
\[
P_t = \Bigg( P_{\kappa,n}^2 \Big( 1 + \frac{(t^2 - 1)\, \mu_{1,\kappa,n}^2}{n} + O\Big(\frac{t^4}{n^2}\Big) \Big) \Bigg)^m
= P_{\kappa,n}^{2m} \exp\big( 2\beta_n^2 (t^2 - 1 - o(1)) \big).
\]
More generally, by Stirling's approximation again, for any small $\varepsilon > 0$ there exists $\delta > 0$ such that for $|t| \le \delta \sqrt n$, we have
\[
P_t \le P_{\kappa,n}^{2m} \exp\big( (2\beta^2 + \varepsilon) t^2 \big),
\]
when $n$ is large. Note that as $\beta < 1/2$, for a small $\varepsilon_1$ and for $|t| \le \delta\sqrt n$,
\[
2^{-n} \binom{n}{\frac{n + t\sqrt n}{2}} \frac{P_t}{P_{\kappa,n}^{2m}} \le \exp(-\varepsilon_1 t^2).
\]
By using Hypothesis 2.1 again, for small enough $\delta$, we have for $|t| \ge \delta\sqrt n$,
\[
2^{-n} \binom{n}{\frac{n + t\sqrt n}{2}} \frac{P_t}{P_{\kappa,n}^{2m}} \le \exp(-\varepsilon_2 n),
\]
for a small $\varepsilon_2 > 0$. We also note that $\mathbb{E}^*_t \exp(-2Y_M(G,m,n)\mathbf{1}[|Y_M| \le M_1]) \le \exp(2M_1)$. As $\beta < 1/2$, by Lemma 4.5, we can bound the second moment by
\[
\mathbb{E}(Z(G))^2 \sum_{\substack{t : |t| \ge \delta\sqrt n \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}} \exp(-\varepsilon_2 n) \exp(2M_1)
+ \mathbb{E}(Z(G))^2 \sum_{\substack{t : \log n \le |t| \le \delta\sqrt n \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}} \exp(-\varepsilon_1 t^2) \exp(2M_1)
\]
\[
+\; \mathbb{E}(Z(G))^2 \sum_{\substack{t : |t| \le \log n \\ \frac{n + t\sqrt n}{2} \in \mathbb{Z}}} 2^{-n} \binom{n}{\frac{n + t\sqrt n}{2}} \exp\big( 2\beta_n^2 (t^2 - 1 - o(1)) \big) \exp(-2L(M))
\le \mathbb{E}(Z(G))^2 \, \exp(-2\beta^2) \, \frac{1}{\sqrt{1 - 4\beta^2}} \, \exp(-2L(M)) \, (1 + o_n(1)).
\]
Notice that when $\beta < 1/2$, $L(M)$ converges as $M$ goes to infinity. In particular,
\[
\Big| L(M) - \big( -\tfrac14 \log(1 - 4\beta^2) - \beta^2 \big) \Big| \to 0, \quad \text{as } M \to \infty.
\]
So by taking $M_1$ and $M$ large, the second moment is bounded by $(1 + \varepsilon)\, \mathbb{E}(Z(G))^2$.

Proof of Theorem 2.5.
Note that
\[
\Bigg| \frac{Z(G)/\mathbb{E}Z}{\exp(Y)} - 1 \Bigg|
\le \Bigg| \frac{Z(G)/\mathbb{E}Z}{\exp(Y)} - \frac{Z(G)/\mathbb{E}Z}{\exp(Y \mathbf{1}(|Y| \le M_2))} \Bigg|
+ \Bigg| \frac{Z(G)/\mathbb{E}Z}{\exp(Y \mathbf{1}(|Y| \le M_2))} - \frac{Z(G)/\mathbb{E}Z}{\exp(Y_M \mathbf{1}(|Y_M| \le M_1))} \Bigg|
+ \Bigg| \frac{Z(G)/\mathbb{E}Z}{\exp(Y_M \mathbf{1}(|Y_M| \le M_1))} - 1 \Bigg|
=: T_1 + T_2 + T_3.
\]
Notice that $\mathbb{E} C_k = 0$ for any $k$. We start by bounding $\mathbb{E} C_k^2$ when $k \le \log n$. Notice that
\[
\mathbb{E} C_k^2 = \Big(\tfrac{1}{n}\Big)^k \Big(\tfrac{1}{m}\Big)^k
\sum_{\substack{i_1,\dots,i_k \in [n] \text{ disjoint} \\ j_1,\dots,j_k \in [m] \text{ disjoint}}}
\sum_{\substack{i'_1,\dots,i'_k \in [n] \text{ disjoint} \\ j'_1,\dots,j'_k \in [m] \text{ disjoint}}}
\mathbb{E} \prod_{\ell=1}^{k} G_{i_\ell,j_\ell} G_{i_{\ell+1},j_\ell} \prod_{\ell=1}^{k} G_{i'_\ell,j'_\ell} G_{i'_{\ell+1},j'_\ell}.
\]
In the summation, the only nonzero terms are those in which each $G_{\cdot,\cdot}$ appears exactly twice in the product. Therefore,
\[
\mathbb{E} C_k^2 \le \Big(\tfrac{1}{n}\Big)^k \Big(\tfrac{1}{m}\Big)^k (mn)^k \, 2k = 2k.
\]
This implies that
\[
\mathbb{E} Y^2 = \mathbb{E} \Bigg( \sum_{k=2}^{\lfloor \log n \rfloor} \frac{(2\beta_n)^k C_k - \tfrac12 (2\beta_n)^{2k}}{2k} \Bigg)^2
\le \Bigg( \sum_{k=2}^{\infty} \frac{(2\beta)^{2k}}{4k} \Bigg)^2
+ \sum_{k=2}^{\infty} |2\beta|^k \sum_{k=2}^{\infty} \frac{|2\beta|^k \, \mathbb{E} C_k^2}{(2k)^2}
\le \Bigg( \sum_{k=2}^{\infty} \frac{(2\beta)^{2k}}{4k} \Bigg)^2 + \frac{1}{(1 - 2|\beta|)^2}.
\]
Similarly,
\[
\mathbb{E} |Y - Y_M|^2 = \mathbb{E} \Bigg( \sum_{k=M+1}^{\lfloor \log n \rfloor} \frac{(2\beta_n)^k C_k - \tfrac12 (2\beta_n)^{2k}}{2k} \Bigg)^2
\le \frac{(2\beta)^{2M}}{(1 - 2\beta)^2} + \sum_{k=M+1}^{\infty} |2\beta|^k \sum_{k=M+1}^{\infty} \frac{|2\beta|^k \, \mathbb{E} C_k^2}{(2k)^2} \tag{16}
\]
\[
\le \frac{(2\beta)^{2M}}{(1 - 2\beta)^2} + \frac{(2\beta)^{2M}}{(1 - 2|\beta|)^2}. \tag{17}
\]
For any $\varepsilon, \varepsilon' > 0$, by Lemma 3.1, we know that we can choose $M$ and $M_1$ large enough such that
\[
\mathbb{P}(T_3 \ge \varepsilon) < \varepsilon',
\]
for large $n$. Further, we have
\[
\mathbb{P}(T_1 \ge \varepsilon) \le \mathbb{P}(|Y| \ge M_2)
\le \Bigg( \Big( \sum_{k=2}^{\infty} \tfrac{(2\beta)^{2k}}{4k} \Big)^2 + \frac{1}{(1 - 2|\beta|)^2} \Bigg) \frac{1}{M_2^2}.
\]
In addition, by Lemma 3.2, we have for large $M_1 > 0$,
\[
\mathbb{P}(|Y_M| \ge M_1) \le \exp\big( -M_1^2 / L(M) \big),
\]
when $n$ is large. Using equation (17), we further have that for a big constant $C$,
\[
\mathbb{P}(|Y| \ge M_1 + 1) \le \exp\big( -M_1^2 / L(M) \big) + C (2\beta)^{2M},
\]
when $n$ is large. This implies that when $n$ is large, for a big constant $C$,
\[
\mathbb{P}\Big( \exp\big( |Y \mathbf{1}(|Y| \le M_2) - Y_M \mathbf{1}(|Y_M| \le M_1)| \big) \ge 1 + \varepsilon \Big)
\le M_2^2 \Big( \exp\big( -M_1^2 / L(M) \big) + C (2\beta)^{2M} \Big) + \frac{C (2\beta)^{2M}}{\log^2(1 + \varepsilon)},
\]
which goes to zero when $M$ and $M_1$ are large. Therefore, for any $\varepsilon, \varepsilon' > 0$, we can choose $M$ and $M_1$ large enough such that
\[
\mathbb{P}\Bigg( T_2 \ge \varepsilon \, \frac{Z(G)/\mathbb{E}Z}{\exp(Y_M \mathbf{1}(|Y_M| \le M_1))} \Bigg) \le \varepsilon.
\]
Together with our estimate on $T_1$, we have the theorem.

Proof of Theorem 2.1.
By equation (17), we know that $\mathbb{E}|Y - Y_M|^2$ can be made arbitrarily small by taking $M$ large. By Lemma 3.3, under $\mathbb{P}$, for any integer $M \ge 2$,
\[
Y_M \xrightarrow{d} \mathcal{N}\big( -L(M), \, 2L(M) \big),
\]
as $n$ goes to infinity. As $L(M) \to -\tfrac14 \log(1 - 4\beta^2) - \beta^2$ when $M$ converges to infinity, we have that
\[
Y \xrightarrow{d} \mathcal{N}\Big( \tfrac14 \log(1 - 4\beta^2) + \beta^2, \; -\tfrac12 \log(1 - 4\beta^2) - 2\beta^2 \Big),
\]
as $n$ goes to infinity. This implies that
\[
\exp(Y) \xrightarrow{d} \mathrm{Lognormal}\Big( \tfrac14 \log(1 - 4\beta^2) + \beta^2, \; -\tfrac12 \log(1 - 4\beta^2) - 2\beta^2 \Big),
\]
as $n$ goes to infinity. Together with Theorem 2.5, we have the desired result.

4.4 Proof of Theorem 2.2

Consider a sequence of events $A_n$. If $\mathbb{P}(A_n) \to 0$, then
\[
\mathbb{P}^*(A_n) = \mathbb{E}\Bigg[ \frac{Z(G) \mathbf{1}(A_n)}{\mathbb{E}Z} \Bigg]
\le \mathbb{E}\Bigg[ \frac{Z(G)}{\mathbb{E}Z} \mathbf{1}\Big( \frac{Z(G)}{\mathbb{E}Z} \ge \exp(M) \Big) \Bigg] + \exp(M)\, \mathbb{P}(A_n).
\]
When $M$ goes to infinity, the first term goes to zero, uniformly in $n$. Therefore, for any $\varepsilon > 0$, we choose an $M$ such that the first term is bounded by $\varepsilon/2$. Then there exists an integer $N > 0$ such that $\mathbb{P}^*(A_n) < \varepsilon$ whenever $n > N$.

Now, if $\mathbb{P}^*(A_n) \to 0$, then by Theorem 2.1,
\[
\mathbb{P}(A_n) \le \mathbb{P}\Big( \frac{Z(G)}{\mathbb{E}Z} \le \exp(-M) \Big) + \mathbb{P}\Big( A_n \cap \Big\{ \frac{Z(G)}{\mathbb{E}Z} \ge \exp(-M) \Big\} \Big)
\le \mathbb{P}\Big( \frac{Z(G)}{\mathbb{E}Z} \le \exp(-M) \Big) + \mathbb{E}^*\Bigg[ \frac{\mathbb{E}Z}{Z(G)} \mathbf{1}\Big( \frac{Z(G)}{\mathbb{E}Z} \ge \exp(-M), \, A_n \Big) \Bigg]
\le \exp(-cM) + \exp(M)\, \mathbb{P}^*(A_n),
\]
for some constant $c$ when $M$ and $n$ are large. This implies that for any $\varepsilon > 0$, we can choose $M$ large such that the first term is bounded by $\varepsilon/2$. Then there exists an integer $N > 0$ such that $\mathbb{P}(A_n) < \varepsilon$ whenever $n > N$. Therefore, the theorem follows.

For $\kappa > 0$ and $\alpha > \alpha_c$, take $m = \lfloor \alpha n \rfloor$. Let $0 < \delta < \alpha - \alpha_c$. Then when $n$ is large,
\[
\mathbb{P}(Z(G) \ge 1) \le \mathbb{E}(Z(G)) = 2^n P_{\kappa,n}^m = \exp\Big( n \big( \log 2 + \tfrac{m}{n} \log P_{\kappa,n} \big) \Big) \le \exp(-\delta n),
\]
which converges to zero as $n$ goes to infinity. For $\kappa > 0$ and $\alpha < \alpha_c$, take $m = \lfloor \alpha n \rfloor$. Let $0 < \delta < \alpha_c - \alpha$. Then
\[
\mathbb{E}(Z(G)) = 2^n P_{\kappa,n}^m = \exp\Big( n \big( \log 2 + \tfrac{m}{n} \log P_{\kappa,n} \big) \Big) \ge \exp(\delta n).
\]
This implies that
\[
\mathbb{P}(Z(G) \ge 1) = \mathbb{P}\Big( \frac{Z(G)}{\mathbb{E}Z(G)} \ge \frac{1}{\mathbb{E}Z(G)} \Big) \ge \mathbb{P}\Big( \frac{Z(G)}{\mathbb{E}Z(G)} \ge \exp(-\delta n) \Big),
\]
which converges to 1 as $n$ goes to infinity by Theorem 2.1.

Let $X_0$ be chosen uniformly at random from $\{\pm 1\}^n$. Consider the planted model $\mathbb{P}^*$ when $X_0$ is planted. We first show that
\[
\lim_{n \to \infty} \mathbb{P}^*\big( \{ X \in S(G) : d(X_0, X) \le dn \} = \{ X_0 \} \big) = 1.
\]
Recall that we defined $P_t$ to be the probability that $X_1, X_2 \in S(G)$ for a given pair of vectors $X_1, X_2$ with $\langle X_1, X_2 \rangle = t\sqrt n$. Then we have
\[
\mathbb{E}^*\big( |\{ X \in S(G) : d(X_0, X) \le dn \}| - 1 \big)
= \sum_{\ell=1}^{dn} \binom{n}{\ell} \frac{P_{(n - 2\ell)/\sqrt n}}{P_{\kappa,n}^m}.
\]
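Throughout, $P_{\kappa,n}$ is the probability that a single row constraint is satisfied, and $\mathbb{E}Z = 2^n P_{\kappa,n}^m$ because the $m$ rows are independent and the row probability does not depend on $X$. For small $n$ this can be verified exactly by enumeration; the parameters below are illustrative, not from the paper:

```python
from itertools import product
from math import comb, sqrt

n, kappa, m = 10, 1.0, 3
thresh = kappa * sqrt(n)

def row_prob(X):
    # exact P(|<g, X>| <= kappa*sqrt(n)) by enumerating all 2^n sign rows g
    hits = sum(1 for g in product([-1, 1], repeat=n)
               if abs(sum(gi * xi for gi, xi in zip(g, X))) <= thresh)
    return hits / 2 ** n

P_enum = row_prob([1] * n)
# closed form: |sum of n signs| = |2j - n| <= kappa*sqrt(n), j heads out of n
P_formula = sum(comb(n, j) for j in range(n + 1) if abs(2 * j - n) <= thresh) / 2 ** n
assert abs(P_enum - P_formula) < 1e-12

# the row probability is the same for every X (flip the matching signs of g),
# hence E Z = 2^n * P_{kappa,n}^m for m independent rows
for X in ([-1] * n, [1, -1] * (n // 2)):
    assert abs(row_prob(X) - P_enum) < 1e-12
EZ = 2 ** n * P_enum ** m
```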
We start by showing that there exist a small $\delta > 0$ and $c_2 > 0$ such that
\[
P_{(n - 2\ell)/\sqrt n} \le \exp\big( -c_2 \sqrt{\ell n} \big) P_{\kappa,n}^m,
\]
whenever $\ell \le \delta n$. Without loss of generality, assume that $X_0$ is the all-one vector and $X_1$ is the vector with $-1$ in the first $\ell$ coordinates and $+1$ in the rest. Let $g$ be a row of $G$, chosen uniformly at random from $\{\pm 1\}^n$. Define $E_0$ to be the event that $|\langle g, X_0 \rangle| \le \kappa\sqrt n$ and $E_1$ to be the event that $|\langle g, X_1 \rangle| \le \kappa\sqrt n$. Then for large $n$, there exists a small constant $c$ such that
\[
\mathbb{P}(E_0) - \mathbb{P}(E_0, E_1) \ge \mathbb{P}\Big( 0 \le \kappa\sqrt n - B_2 \le \sqrt\ell, \; B_1 \ge \sqrt\ell \Big) \ge c\sqrt\ell/\sqrt n,
\]
where $B_1$ and $B_2$ are independent, $B_1$ is the sum of $\ell$ independent Rademacher random variables and $B_2$ is the sum of $n - \ell$ independent Rademacher random variables. This implies that
\[
\mathbb{P}(E_0, E_1) \le \exp\big( -c\sqrt\ell/\sqrt n \big) P_{\kappa,n}.
\]
Further, we have for small constants $c_1$ and $c_2$,
\[
P_{(n - 2\ell)/\sqrt n} \le \exp\big( -c_1 \sqrt\ell \, m/\sqrt n \big) P_{\kappa,n}^m \le \exp\big( -c_2 \sqrt{\ell n} \big) P_{\kappa,n}^m.
\]
Therefore,
\[
\sum_{\ell=1}^{dn} \binom{n}{\ell} \frac{P_{(n - 2\ell)/\sqrt n}}{P_{\kappa,n}^m}
\le \sum_{\ell=1}^{dn} \binom{n}{\ell} \exp\big( -c_2 \sqrt{\ell n} \big) = o_n(1),
\]
when $d$ is small. By Theorem 2.2, the statement holds.
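The key estimate above, $\mathbb{P}(E_0) - \mathbb{P}(E_0, E_1) \ge c\sqrt\ell/\sqrt n$, can be checked by computing both probabilities exactly from the two binomial distributions of $B_1$ and $B_2$; the constant $0.01$ below is an arbitrary test value, not the paper's $c$:

```python
from math import comb, sqrt

def sign_sum_pmf(k):
    # exact distribution of a sum of k independent +-1 signs
    return {2 * j - k: comb(k, j) / 2 ** k for j in range(k + 1)}

def gap(n, ell, kappa):
    # exact P(E0) - P(E0 and E1) for X0 = (1,...,1) and X1 flipping ell coordinates;
    # <g, X0> = B1 + B2 and <g, X1> = B2 - B1, with B1, B2 as in the text
    t = kappa * sqrt(n)
    p1, p2 = sign_sum_pmf(ell), sign_sum_pmf(n - ell)
    e0 = e01 = 0.0
    for b1, q1 in p1.items():
        for b2, q2 in p2.items():
            in0 = abs(b1 + b2) <= t
            in1 = abs(b2 - b1) <= t
            e0 += q1 * q2 * in0
            e01 += q1 * q2 * (in0 and in1)
    return e0 - e01

n, kappa = 400, 1.0
for ell in [4, 16, 64]:
    assert gap(n, ell, kappa) > 0.01 * sqrt(ell / n)
```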