Spectral Planting and the Hardness of Refuting Cuts, Colorability, and Communities in Random Graphs
Afonso S. Bandeira, Jess Banks, Dmitriy Kunisky, Cristopher Moore, Alexander S. Wein
Department of Mathematics, ETH Zurich · Department of Mathematics, UC Berkeley · Department of Mathematics, Courant Institute of Mathematical Sciences, NYU · Santa Fe Institute

August 28, 2020
Abstract
We study the problem of efficiently refuting the k-colorability of a graph, or equivalently certifying a lower bound on its chromatic number. We give formal evidence of average-case computational hardness for this problem in sparse random regular graphs, showing optimality of a simple spectral certificate. This evidence takes the form of a computationally-quiet planting: we construct a distribution of d-regular graphs that has significantly smaller chromatic number than a typical regular graph drawn uniformly at random, while providing evidence that these two distributions are indistinguishable by a large class of algorithms. We generalize our results to the more general problem of certifying an upper bound on the maximum k-cut.

This quiet planting is achieved by minimizing the effect of the planted structure (e.g. colorings or cuts) on the graph spectrum. Specifically, the planted structure corresponds exactly to eigenvectors of the adjacency matrix. This avoids the pushout effect of random matrix theory, and delays the point at which the planting becomes visible in the spectrum or local statistics. To illustrate this further, we give similar results for a Gaussian analogue of this problem: a quiet version of the spiked model, where we plant an eigenspace rather than adding a generic low-rank perturbation.

Our evidence for computational hardness of distinguishing two distributions is based on three different heuristics: stability of belief propagation, the local statistics hierarchy, and the low-degree likelihood ratio. Of independent interest, our results include general-purpose bounds on the low-degree likelihood ratio for multi-spiked matrix models, and an improved low-degree analysis of the stochastic block model.

∗ Email: [email protected]. Some of this work was done while with the Department of Mathematics at the Courant Institute of Mathematical Sciences, and the Center for Data Science, at New York University; and partially supported by NSF grants DMS-1712730 and DMS-1719545, and by a grant from the Sloan Foundation.
‡ Email: [email protected]. Supported by the NSF Graduate Research Fellowship Program under Grant DGE-1752814.
§ Email: [email protected]. Partially supported by NSF grants DMS-1712730 and DMS-1719545.
¶ Email: [email protected]. Partially supported by NSF grant IIS-1838251.
‖ Email: [email protected]. Partially supported by NSF grant DMS-1712730 and by the Simons Collaboration on Algorithms and Geometry.
Introduction
Assuming the widely believed P ≠ NP hypothesis, many combinatorial problems in graphs are known to be computationally hard. Prominent examples from graph theory and network science include finding large cliques or independent sets, clustering or maximizing cuts, and finding vertex colorings or computing chromatic numbers. Fortunately, the worst-case computational difficulty of many of these problems appears not to be predictive of their feasibility in typical graphs, motivating the study of forms of average-case complexity for many of these problems. Many remarkable examples exist, dating back at least to the work of Karp and others in the mid 70s [Kar76]; they include the problem of vertex colorings [GM75] in random graphs, the related problem of finding the largest independent set (or, equivalently, clique), and many others [Kar76]. In both of these problems, for certain natural distributions of random graphs, a multiplicative gap of 2 is identified between the typical optimal solution and the solution found by the best known polynomial-time algorithm [Kar86], and improving over this has since been a standing open problem. Motivated by this question, Kucera [Kuc95] and Alon et al. [AKS98] studied random graph models with planted structures (either a large independent set or clique, or a coloring with an unusually small number of colors) and investigated when such structures are easy to detect. Foreshadowing what follows, we point out that the existence of a planted structure that cannot be detected efficiently implies that it is impossible to efficiently refute the existence of such a structure in the underlying unplanted model. In this paper we will focus on the problem of computing the chromatic number, and the related problem of understanding the size of the largest k-cut. The random graph model throughout is the uniform distribution over d-regular graphs on n nodes.

Refuting colorability.
For an integer k ≥ 1, a graph G = (V, E) is k-colorable if there exists an assignment σ : V → [k] of "colors" to the vertices such that σ(i) ≠ σ(j) for every edge (i, j) ∈ E. The chromatic number χ(G) of G is defined as the minimum value of k for which G is k-colorable. A random d-regular graph G on n vertices (we will write random variables in bold-face font throughout the paper) is known to have a typical chromatic number χ(G) ∼ d/(2 log d) (see, e.g., [COEH16] and references therein) in the double limit n → ∞ followed by d → ∞, where f ∼ g denotes f/g → 1.

The problem we will study is that of algorithmically refuting the k-colorability of a graph, which we define in Section 2.1. Informally speaking, an algorithm that refutes k-colorability provides an efficiently-verifiable proof that G is not k-colorable. As a simple example, an algorithm exhibiting a (k + 1)-clique refutes k-colorability. More generally, one may encode a k-coloring as a collection of boolean variables satisfying certain logical relations depending on G, and refute coloring by deriving a contradiction from those axioms.

We will provide evidence that the refutation problem is computationally hard when G is a uniformly random d-regular graph. The proof strategy is to construct a different distribution over d-regular graphs whose typical chromatic number is χ(G) ∼ √d/2, and to argue that this distribution is computationally hard to distinguish from a uniformly random d-regular graph (whose chromatic number is instead χ(G) ∼ d/(2 log d)); we think of this new distribution as having a computationally-quiet planting of a coloring with few colors. We will see below that the value √d/2 coincides with a simple spectral bound on χ(G) for G uniformly random, and so our result is a tight computational lower bound on refutation algorithms. The formal evidence for computational hardness used in this paper is threefold: we provide consistent pieces of evidence based on (i) the Kesten–Stigum threshold and the stability of belief propagation, (ii) the local statistics hierarchy, and (iii) the low-degree likelihood ratio.

See also Karp's lecture on the occasion of his Turing Award [Kar86]. Recall that the set of nodes of the same color in a vertex coloring is an independent set. This terminology is inspired by the notion of quiet planting from prior work [KZ09, ZK11], although our notion is somewhat different.
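To make the spectral bound concrete, here is a minimal numerical illustration of ours (not from the paper; it assumes numpy is available). It checks the classical Hoffman-type lower bound χ(G) ≥ 1 + λ_max/(−λ_min) on the Petersen graph, and compares it with the true chromatic number found by brute force:

```python
import itertools
import numpy as np

# Petersen graph: 3-regular on 10 vertices, chromatic number 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),    # outer 5-cycle
         (0, 5), (1, 6), (2, 7), (3, 8), (4, 9),    # spokes
         (5, 7), (7, 9), (9, 6), (6, 8), (8, 5)]    # inner pentagram
n = 10
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1

eig = np.linalg.eigvalsh(A)    # sorted ascending: lambda_min first
# Hoffman's spectral lower bound: chi(G) >= 1 + lambda_max / (-lambda_min).
spectral_lb = 1 + eig[-1] / (-eig[0])    # = 1 + 3/2 = 2.5, so chi >= 3

def is_colorable(k):
    # Brute-force check of k-colorability over all assignments.
    return any(all(c[u] != c[v] for u, v in edges)
               for c in itertools.product(range(k), repeat=n))

chi = next(k for k in range(1, n + 1) if is_colorable(k))
print(round(spectral_lb, 6), chi)
```

Here the spectral bound 2.5 certifies χ ≥ 3, which is tight for this graph.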
The spectral refutation.
Instead of just refuting k-colorability, we will consider the more general task of refuting the existence of large k-cuts in a graph. We define the fractional size of the largest such cut as

MC_k(G) := max_{σ : V → [k]} |{(u, v) ∈ E : σ(u) ≠ σ(v)}| / |E| ∈ (0, 1].    (1)

Intuitively, MC_k(G) describes how close G is to being k-colorable, as the cut counts the fraction of polychromatic edges under the coloring σ. MC_k(G) is non-decreasing in k, and for any k, G is k-colorable if and only if MC_k(G) = 1; therefore, the chromatic number is given by

χ(G) = min{k : MC_k(G) = 1} = 1 + max{k : MC_k(G) < 1}.    (2)

Accordingly, upper bounds on MC_k away from the maximum value of 1 yield lower bounds on the chromatic number. This task is often called certifying a bound on the optimization problem MC_k. The relations (2) show how certifying such a bound in turn refutes colorability. In fact, Hoffman's early work [Hof70] proposed a technique for any d-regular graph G that gives, when rephrased in our notation, a refutation of colorability by bounding MC_k via the minimum eigenvalue λ_min(A_G) of the adjacency matrix A_G of G. Namely, Hoffman showed that for any d-regular graph G,

MC_k(G) ≤ ((k − 1)/k) (1 − λ_min(A_G)/d).    (3)

(Note that λ_min(A_G) ≤ (1/n) tr(A_G) = 0.) A short proof of (3) will be given in Section 2.1. We note that (k − 1)/k is the expected value of the objective of (1) when σ : V → [k] is chosen uniformly at random, so expressions like the right-hand side of (3) should be viewed as expressing a factor of "gain" over this value. For G a uniformly random d-regular graph on n vertices, whose law we denote G_{n,d}, a theorem due to Friedman [Fri03] states that λ_min(A_G) = −2√(d − 1) − o(1) with high probability for any fixed d ≥ 3. This implies that when G ∼ G_{n,d}, Hoffman's spectral approach with high probability certifies the upper bound

MC_k(G) ≤ ((k − 1)/k) (1 + 2√(d − 1)/d + o(1)).    (4)

This in turn translates to a lower bound on the chromatic number of χ(G) ≥ (1 − o_d(1)) √d/2, where o_d pertains to the double limit n → ∞ followed by d → ∞ (see Section 1.2).

Equipped with this direct analysis of a simple technique, the natural question arises: can any polynomial-time algorithm produce a bound that, like Hoffman's, is valid for any graph G, but is typically tighter for G ∼ G_{n,d}? Most prior work on this question has studied bounds provided by a semidefinite program computing the Lovász ϑ function, which, as shown by [BKM17], is equivalent to the degree-2 sum-of-squares relaxation of MC_k(G). The work [CO03] and a later, more precise analysis by [BKM17] showed that, with high probability when G ∼ G_{n,d}, this relaxation certifies a bound still no better than

MC_k(G) ≤ ((k − 1)/k) (1 + 2√(d − 1)/(d + 2√(d − 1) − 2) − o(1)) = ((k − 1)/k) (1 + 2√(d − 1)/d − O_d(1/d)).    (5)

(Here and throughout, o(1) pertains to the limit n → ∞ with d fixed, while o_d(·) and O_d(·) pertain to the double limit n → ∞ followed by d → ∞. We say that an event occurs with high probability (w.h.p.) if it has probability 1 − o(1).)

No polynomial-time certifier is known to asymptotically improve upon this bound. It appears plausible, then, that the spectral bound (4) is an optimal efficiently-computable certificate on MC_k(G) for G ∼ G_{n,d}. In this work, we will argue that this is indeed the case.

Detecting a planted k-cut. The following line of reasoning will be central to this work: to prove computational hardness of a refutation problem, it is sufficient to construct a computationally-quiet planted distribution (e.g. [BKW20]). To illustrate the meaning of this, suppose our goal is to show computational hardness of refuting (w.h.p.) k-colorability of a graph drawn from G_{n,d}.
Suppose we are able to construct a planted distribution P over d-regular graphs such that (i) a typical graph drawn from P is k-colorable, and (ii) P is computationally quiet in the sense that no polynomial-time algorithm can distinguish (w.h.p.) between a sample from P and a sample from G_{n,d}. It then follows that no polynomial-time algorithm can refute k-colorability in G_{n,d}: if such a refutation algorithm were to exist, it would succeed w.h.p. on G_{n,d} and fail w.h.p. on P, thus providing a solution to the distinguishing problem. More generally, to show hardness of certifying an upper bound on the maximum k-cut, we need a planted distribution for which there exists a large k-cut.

As discussed in [BKM17], a natural planted distribution with a large k-cut is a d-regular variant of the popular stochastic block model (SBM). To sample a graph from this d-regular distribution, which we will denote G^sbm_{n,d,k,η}, first sample a balanced labelling σ : [n] → [k] uniformly at random (balanced means that |σ^{−1}(i)| = n/k for every i ∈ [k]), and then choose G uniformly among d-regular graphs conditional on the event

|{(u, v) ∈ E : σ(u) ≠ σ(v)}| / |E| = ((k − 1)/k) (1 − η).    (6)

Note that this ensures MC_k(G) ≥ ((k − 1)/k)(1 − η), since the planted partition σ witnesses a k-cut with that fraction of bichromatic edges. We will mostly be concerned with the disassortative regime of this model where η ∈ [−1/(k − 1), 0), so that the planted k-cut σ is larger than a typical one; for instance, when η = −1/(k − 1) and the planted cut includes every edge, we have the well-studied planted coloring model.

The relevance of this distribution to certifying bounds on MC_k for G ∼ G_{n,d} is as follows: if an algorithm can with high probability over G_{n,d} certify that MC_k(G) < ((k − 1)/k)(1 − η), then (as discussed above) it is simple to build another testing (or detection) algorithm that distinguishes between G ∼ G_{n,d} and G ∼ G^sbm_{n,d,k,η} with high probability.
The advantage of taking this point of view is that there is a rich literature, originating in heuristic methods from statistical physics, that has provided a great deal of evidence that polynomial-time testing between G_{n,d} and G^sbm_{n,d,k,η} is impossible below the Kesten–Stigum threshold, i.e., when

d < d^sbm_KS = d^sbm_KS(η) := 1/η² + 1.    (7)

We will refer to this claim as the SBM conjecture. Polynomial-time algorithms are known to succeed when d > d^sbm_KS [Mas14, MNS18, AS16, BMR19]. While proving such a conjecture seems to be beyond the reach of current techniques (even under an assumption such as P ≠ NP), various forms of concrete evidence have been given (either for the d-regular SBM or other variants). These include results on stability of belief propagation [DKMZ11b, DKMZ11a], the local statistics hierarchy [BMR19], and the low-degree likelihood ratio.

(For now, we will not consider the integrality conditions on the parameters (n, d, k, η) that this implies, although these types of considerations will be important later.)

Assuming the SBM conjecture, the reduction described above yields the following: for G ∼ G_{n,d}, no polynomial-time algorithm can certify with high probability a bound stronger than

MC_k(G) ≤ ((k − 1)/k) (1 + 1/√(d − 1) − o(1)).    (8)

Comparing this to (4) and (5), we see a discrepancy between the best known certification algorithms and the above hardness result. For large d, this discrepancy amounts to a factor of 2 in the "gain" term. This begs the question of whether better certification algorithms exist, or whether the hardness result can be improved. We will see below that it is the latter.

A quieter planting.
Our main contribution is to show an improved hardness result by using a "better" planted distribution. The superior planted distribution is the following more rigid version of the SBM.
Definition 1.1 (Equitable stochastic block model). The equitable stochastic block model (eSBM), denoted G^eq_{n,d,k,η}, is the probability distribution over d-regular graphs on n vertices sampled as follows: first, choose a uniformly random balanced partition σ : V → [k]. Then, letting M = ηI_k + ((1 − η)/k) J_k (where I_k and J_k are the k × k identity and all-ones matrices, respectively), for each i ∈ [k] place a random dM_{i,i}-regular graph on the color class σ^{−1}(i), and for each i < j ∈ [k] place a random bipartite dM_{i,j}-regular graph between σ^{−1}(i) and σ^{−1}(j). This model is only defined when k | n, dM is a nonnegative integer matrix, and dM_{i,i} n/k is even for all i.

As in G^sbm_{n,d,k,η}, the planted cut σ has fractional size ((k − 1)/k)(1 − η), and we will again restrict to the disassortative case η ∈ [−1/(k − 1), 0). The eSBM is a more rigid version of the d-regular variant of the SBM discussed in the previous section.

Using some of the same methods that provide concrete evidence for the SBM conjecture (namely, stability of belief propagation and the local statistics hierarchy), we will show that the equitable SBM appears to exhibit a different computational threshold from the standard SBM. Specifically, we conjecture that no polynomial-time algorithm can distinguish (w.h.p.) between G_{n,d} and G^eq_{n,d,k,η} when

d < d^eq_KS = d^eq_KS(η) := (2/η²) (1 + √(1 − η²)).    (9)

We will state a formal version of this "eSBM conjecture" later (Conjecture 2.5), which actually pertains to a slightly "noisy" version of the equitable block model.
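Taking the two threshold formulas at face value, d^sbm_KS(η) = 1/η² + 1 from (7) and d^eq_KS(η) = (2/η²)(1 + √(1 − η²)) from (9), a few lines of numerics (ours, assuming numpy) show how they compare in the disassortative planted-coloring scaling η = −1/(k − 1):

```python
import numpy as np

def d_ks_sbm(eta):
    # Kesten-Stigum threshold (7) for the d-regular SBM.
    return 1 / eta**2 + 1

def d_ks_eq(eta):
    # Conjectured threshold (9) for the equitable SBM.
    return (2 / eta**2) * (1 + np.sqrt(1 - eta**2))

# Planted-coloring scaling eta = -1/(k-1): as k grows, eta -> 0 and the
# ratio of the two thresholds approaches 4 (a factor 2 in the "gain" term,
# hence a factor 4 in the degree scale).
for k in [3, 10, 100]:
    eta = -1 / (k - 1)
    print(k, round(d_ks_sbm(eta), 2), round(d_ks_eq(eta), 2),
          round(d_ks_eq(eta) / d_ks_sbm(eta), 3))
```

The eSBM threshold is thus roughly four times larger in degree for small |η|, which is exactly what allows a quieter planting of the same cut value at higher degrees.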
We remark that when k is large, η must be close to zero (in the disassortative case), and so we have approximately d^eq_KS ≈ 4 d^sbm_KS. Repeating our earlier argument for hardness of certification, with the eSBM in place of the SBM, yields the following corresponding result: conditional on the eSBM conjecture, no polynomial-time algorithm can certify a better bound than

MC_k(G) ≤ ((k − 1)/k) (1 + 2√(d − 1)/d − o(1))    (10)

for G ∼ G_{n,d}, which matches the spectral bound (4). (The evenness condition in Definition 1.1 ensures that it is possible to place a dM_{i,i}-regular graph on n/k vertices.) However, we have ignored an important caveat here: the equitable block model only exists when the parameters (n, d, k, η) satisfy certain integrality conditions. For this reason, our actual lower bound (Theorem 2.6) is sometimes weaker than (10) would suggest; see Section 2.2.1 for discussion. When d ≫ k, the integrality conditions are negligible and we obtain a tight lower bound, essentially matching (10). Another setting where we obtain tight results is for the problem of refuting colorability (or more accurately, near-colorability) in the double limit n → ∞ followed by d → ∞, which is discussed in Remarks 2.7 and 2.8. This corresponds to the choice η = −1/(k − 1). Here, when G ∼ G_{n,d}, the following results hold asymptotically: the true value of χ(G) is ∼ d/(2 log d), the spectral approach certifies a lower bound of ∼ √d/2 on χ(G), the basic SBM planting implies hardness of certifying a lower bound better than ∼ √d, and the improved eSBM planting implies hardness of certifying a lower bound better than ∼ √d/2 (which is tight, matching the spectral bound).

Why is the equitable SBM quieter?
Here, we give some intuition for why the equitable model is a good quiet planting. For the sake of illustration, it helps to consider the simple rank-1 Wigner spiked matrix model: Y = η vv^⊤ + W, where η > 0, v with ‖v‖ = 1 is the planted "signal" (drawn from some prior), and W (the "noise") is a GOE matrix, i.e., a symmetric matrix with N(0, 1/n) entries (see Definition 2.11). For large n, the eigenvalues of W follow the semicircle law and are contained in the interval [−2, 2]. When 1 < η < 2, a surprising "pushout" effect occurs: although the planted signal v has quadratic form v^⊤ Y v ≈ η < 2, its presence causes there to exist some other unit vector u achieving u^⊤ Y u ≈ η + 1/η > 2, which distinguishes Y from W. Even though the signal v is "small", the vector u (which is the leading eigenvector of Y) is able to achieve a "large" quadratic form by correlating nontrivially with both the signal v and the noise W. The main result of [BKW20] can be interpreted as giving a "quieter" way to plant the signal "orthogonal to the noise" with no pushout effect, i.e., v^⊤ Y v ≈ 2 while the maximum eigenvalue of Y remains ≈ 2.

The situation for random graphs is analogous: the minimum eigenvalue of the adjacency matrix A of a random d-regular graph converges to λ_min := −2√(d − 1). As will be made clear in Section 2.1, the SBM is in some sense analogous to the spiked Wigner model with multiple planted vectors, namely k "coloring vectors" v_i for i ∈ [k], that encode the planted labelling σ as follows: (v_i)_u = c (k · [σ(u) = i] − 1), where c is chosen so that each ‖v_i‖ = 1. Planting a k-cut of value ((k − 1)/k)(1 + |η|) via either the SBM or eSBM has the effect that all planted coloring vectors achieve a small quadratic form: v_i^⊤ A v_i ≈ −d|η|. In the SBM there is a pushout effect similar to the spiked Wigner model, whereby an eigenvalue less than λ_min can be created even when d|η| < |λ_min| (see [NN12] for results when the degree grows slowly with n). The eSBM, however, has the property that each coloring vector v_i is an eigenvector. In particular, the subspace spanned by {v_i}_{i∈[k]} is orthogonal to the other eigenvectors, which are thus unaffected by the planted structure. As a result, there is no pushout effect in the eSBM, allowing for a larger k-cut to be planted without disrupting the minimum eigenvalue.

An alternative viewpoint is that the standard plantings (the spiked Wigner model or SBM) pick a random solution (e.g., a cut) and then condition on that particular solution having the desired value. In contrast, the quieter plantings are perhaps more similar to conditioning on the event "there exists a solution having the desired value."

Remark 1.2.
The fact that the coloring vectors v_i are eigenvectors of the eSBM can actually be exploited to give a polynomial-time algorithm for distinguishing G_{n,d} from G^eq_{n,d,k,η} for any settings of the parameters. See, for example, [Bar17] for some discussion of such algorithms. For this reason, it is crucial that our eSBM conjecture (Conjecture 2.5) adds a small amount of noise to the graph in order to "defeat" these types of algorithms. We discuss this issue further in Section 1.1.4.

The case k = 2: large cuts in G_{n,d}. Here, we briefly discuss the specific case k = 2, which is better understood in the existing literature. In this case, MC_2(G) is merely the fraction of edges crossing the largest cut of G, and thus up to this scaling is the solution to the well-known max-cut problem. Equivalently, letting A_G be the adjacency matrix of G, we have

MC_2(G) = (1/2) (1 + (1/(2|E|)) Γ_2(A_G)), where Γ_2(A) := − min_{x ∈ {±1}^n} x^⊤ A x ≥ 0.

When G ∼ G_{n,d}, the behavior of Γ_2(A_G) turns out to be deeply connected to its Gaussian analogue Γ_2(W), where W is a GOE matrix. The quantity Γ_2(W) has been studied in statistical physics, being the ground state energy of the Sherrington–Kirkpatrick model of spin glasses [SK75]. The deep but non-rigorous analysis of Parisi [Par79] proposed an asymptotic value (1/n) E Γ_2(W) → 2P* ≈ 1.526, and it is known that

lim_{n→∞} E_{G ∼ G_{n,d}} MC_2(G) = (1/2) (1 + 2P*/√d + o_d(1/√d)).    (11)

For the Gaussian setting, it was shown in [BKW20] using a quiet planting approach that (conditional on a certain complexity assumption based on the low-degree likelihood ratio) the best possible upper bound on (1/n) Γ_2(W) that can be certified in polynomial time is 2, in contrast with the true value 2P*; the optimal bound is given by a simple spectral certificate involving the maximum eigenvalue of W. Given this, we might expect that the best efficiently-certifiable bound on MC_2(G) for G ∼ G_{n,d} is given by replacing 2P* in (11) with 2.
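The Gaussian spectral certificate just mentioned can be checked directly at small n: since x^⊤ W x ≥ λ_min(W) ‖x‖² for every x, we always have (1/n) Γ_2(W) ≤ −λ_min(W), which tends to 2 for large n. A brute-force sketch of ours (assuming numpy):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 14

# GOE(n): symmetric matrix with off-diagonal entries N(0, 1/n).
G = rng.normal(size=(n, n)) / np.sqrt(n)
W = (G + G.T) / np.sqrt(2)

# Gamma_2(W) = -min over x in {+-1}^n of x^T W x, by brute force
# (the first coordinate is fixed to +1 by the symmetry x -> -x).
gamma2 = max(-(x @ W @ x)
             for signs in itertools.product([-1.0, 1.0], repeat=n - 1)
             for x in [np.array((1.0,) + signs)])

# Spectral certificate: (1/n) Gamma_2(W) <= -lambda_min(W) (about 2 at large n).
spectral = -np.linalg.eigvalsh(W)[0]
print(round(gamma2 / n, 3), "<=", round(spectral, 3))
```

At such a small n neither quantity is close to its n → ∞ limit (2P* and 2 respectively), but the certificate inequality holds deterministically for every realization of W.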
Indeed, [MS16] and [MRX19] showed respectively that the degree-2 and degree-4 sum-of-squares relaxations can certify a bound no better than

MC_2(G) ≤ (1/2) (1 + 2/√d + o_d(1/√d)).    (12)

Our results extend the picture emerging from this literature in two important ways. We show (see Theorem 2.6 and discussion in Section 2.2.1) that conditional on the eSBM conjecture, (12) is in fact the optimal bound on MC_2(G) certifiable in polynomial time. Thus, if the eSBM conjecture holds then no constant-degree sum-of-squares relaxation can improve upon the known results for degree-2 and degree-4. Furthermore, we justify this bound with an explicit quiet planting of a large cut in a random regular graph.

The Gaussian k-cut model. Above, we have seen an intimate connection between Γ_2(A_G), where A_G is the adjacency matrix of G ∼ G_{n,d}, and its Gaussian counterpart Γ_2(W), where W ∼ GOE(n). More generally, MC_k(G) can be written in terms of a certain quantity Γ_k(A_G) defined in Section 2.1, which has a natural Gaussian counterpart Γ_k(W). In fact, explicit formulas similar to (11) are also known that relate the asymptotic values (in the double limit n → ∞ followed by d → ∞) of Γ_k(A_G) and Γ_k(W) even for k > 2. We show that certifying bounds on Γ_k(W) under the Gaussian model exhibits similar behavior to the graph model: no polynomial-time certifier can improve over the basic spectral bound. The proof is again based on quiet planting, and can be seen as an extension of the results of [BKW20], which handled the k = 2 case. Our results rely on a complexity assumption concerning the low-degree likelihood ratio, which we discuss further in Section 1.1.3. As a by-product, we develop a general framework for bounding the low-degree likelihood ratio of certain Gaussian models, which may be of independent interest.
Specifically, we conduct the low-degree analysis of a broad class of multi-spiked matrix models (both Wigner and Wishart), and give an improved low-degree analysis of the stochastic block model which suggests that fully-exponential time is needed below the Kesten–Stigum threshold.

Heuristic evidence for average-case hardness. Our results on hardness of certification rely on unproven conjectures about average-case hardness, such as the eSBM conjecture. Proving these types of conjectures seems to be beyond the reach of current techniques (even when assuming standard complexity conjectures such as P ≠ NP), as illustrated by the fact that no such proof is known for the famous planted clique problem. However, a myriad of heuristic techniques have emerged for predicting hardness of average-case problems by proving lower bounds against certain classes of algorithms. Taken together, these methods create a fairly coherent theory of computational complexity for a large class of high-dimensional Bayesian inference problems. In this section we describe the three methods that will be used in this work: belief propagation, the local statistics hierarchy, and the low-degree likelihood ratio. We remark that these are not the only such methods, some others being average-case reductions [BR13, BBH18] and sum-of-squares lower bounds [BHK+19].

Stability of belief propagation. The sharp computational phase transition known as the
Kesten–Stigum (KS) threshold in the stochastic block model was first predicted by [DKMZ11b, DKMZ11a] using non-rigorous ideas inspired by statistical physics. The idea is to consider the belief propagation (BP) algorithm, an iterative method that attempts to recover the planted community structure by keeping track of "beliefs" about each node's community label and updating these in a locally-Bayesian-optimal way. BP has an "uninformative" fixed point, which is a natural starting point for the algorithm where the beliefs reflect no knowledge of the communities. It was shown in [DKMZ11b, DKMZ11a] that if the signal-to-noise ratio (SNR) lies above the KS threshold then the uninformative fixed point is unstable, suggesting that BP should leave it and find a community assignment that correlates with the truth; and if the SNR lies below the KS threshold then the uninformative fixed point is stable, meaning that BP will remain there and fail to find a nontrivial solution. It was later proven that indeed it is possible to nontrivially recover the communities in polynomial time when above the KS threshold [Mas14, MNS18, AS16], whereas no such algorithm is known below the KS threshold. More generally, similar computational thresholds have been predicted in various models (e.g., [LKZ15b, LKZ15a]) by examining the stability of BP or its simplified variant, approximate message passing (AMP) [DMM09]. For many high-dimensional inference problems, it is known that BP and AMP achieve optimal information-theoretic performance (e.g. [DAM17]); and when they don't, it is often conjectured that they achieve the best possible performance among efficient algorithms. Thus, stability of BP provides concrete evidence for computational hardness.
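For the d-regular SBM, the stability computation reduces to a simple criterion (a standard heuristic, sketched here with numpy as our own illustration): a perturbation of the uninformative fixed point is multiplied along each edge by the second eigenvalue λ₂ = η of the color-transition matrix M = ηI_k + ((1 − η)/k)J_k, and branches (d − 1)-fold per generation, so the fixed point is unstable exactly when (d − 1)η² > 1, recovering the threshold (7).

```python
import numpy as np

def ks_unstable(d, k, eta):
    # Color-transition matrix along an edge of the d-regular SBM.
    M = eta * np.eye(k) + (1 - eta) / k * np.ones((k, k))
    # Second-largest eigenvalue magnitude of M (the top eigenvalue is 1,
    # corresponding to the uninformative direction); here it equals |eta|.
    lam2 = np.sort(np.abs(np.linalg.eigvalsh(M)))[-2]
    # Linearized BP: perturbations grow by (d - 1) * lam2^2 per generation.
    return bool((d - 1) * lam2**2 > 1)

# Instability coincides with d > 1/eta^2 + 1, i.e., the KS threshold (7).
for d, eta in [(6, -0.5), (4, -0.5), (30, -0.2), (20, -0.2)]:
    print(d, eta, ks_unstable(d, 2, eta), d > 1 / eta**2 + 1)
```

The same criterion applies for any number of communities k, since the nontrivial spectrum of M consists of the single value η repeated k − 1 times.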
The local statistics hierarchy. Introduced by one of the authors, Mohanty, and Raghavendra in [BMR19], and building off of the work of Hopkins and Steurer [HS17], the Local Statistics hierarchy is a family of increasingly powerful semidefinite programming algorithms for solving Bayesian hypothesis testing problems. By analogy with the Sum-of-Squares algorithm, we take this hierarchy as a proxy for hardness: the higher we need to go to perform the hypothesis test, the harder it is.

Consider a generic inference scheme where we are to distinguish between a null model Q which outputs unstructured data G ∈ R^m, and a planted model P generating structured data G ∈ R^m according to some random and hidden signal x ∈ R^n. Letting x = (x_1, ..., x_n) be a set of variables, we may regard the conditional expectation E_{x∼P}[p(x) | G] as a random linear functional from R[x] to R that is positive in a certain sense: E_{x∼P}[p(x) | G] ≥ 0 whenever p is a sum of squares.

The Local Statistics hierarchy is parameterized by two integers (D_x, D_G). Given as input some G ∈ R^m, it attempts to find a linear functional that approximates this conditional expectation E[p(x) | G] in the planted model. In particular, borrowing terminology from Sum-of-Squares programming, we search for a "pseudoexpectation" functional Ẽ that assigns a real number to every polynomial of degree at most D_x in R[x], with the constraints that (i) Ẽ p(x) ≥ 0 whenever p is a sum of squares, and (ii) Ẽ p(x) ≈ E_{(x,G)∼P} p(x) for every polynomial p(x) ∈ R[x] whose coefficients are of degree at most D_G in the input G. It is well-known that this may be written as an SDP on matrices of size O(n^{D_x}) with O(m^{D_G}) affine constraints.

In many cases, this paper included, this problem is soluble with high probability when the input G is sampled from the planted model P. For instance, the map evaluating each p(x) at the planted signal is a feasible solution, provided that P is sufficiently concentrated and these polynomials do not fluctuate too much about their expectations. On the other hand, by taking D_x and D_G sufficiently large, it becomes infeasible when the input is drawn from a different distribution. The necessary D_x and D_G thus measure the hardness of the hypothesis testing problem.

The low-degree likelihood ratio. As was first discovered in a series of works in the sum-of-squares literature [BHK+
19, HS17, HKP+
17, Hop18], analyzing the low-degree likelihood ratio gives predictions of computational hardness that match widely-believed conjectures for many hypothesis testing problems (as corroborated by the various other heuristics mentioned above). In essence, this method takes low-degree polynomials as a proxy for all polynomial-time algorithms, and analyzes whether any low-degree polynomial can distinguish two distributions Q and P (with the same interpretations as above) as n → ∞. The key to making this analysis tractable is to choose the correct soft notion of "successfully distinguishing." This is done by considering the maximization

maximize  E_{x∼P} p(x) / (E_{x∼Q} p(x)²)^{1/2}  such that p ≠ 0 is a polynomial of degree ≤ D.    (13)

In words, we seek to maximize p(x) under P in expectation, while keeping its typical magnitude under Q modest.

We note that, if we did not restrict p to be a low-degree polynomial, then the optimal p would equal the classical likelihood ratio, L := dP/dQ. In the absence of computational constraints, thresholding L gives an optimal test between P and Q in the sense of minimizing error probabilities, as shown in the classical Neyman–Pearson lemma [NP33]. Moreover, the value of the above problem would be the norm ‖L‖ of L in L²(Q). If that norm is bounded as n → ∞, then P and Q cannot be distinguished w.h.p. by any test, by an application of Le Cam's second moment method for contiguity (see [LCY12, KWB19] for further exposition).

With the further restriction to low-degree polynomials, the result is similar: the optimal p is the aforementioned low-degree likelihood ratio, the orthogonal projection of L to the subspace of degree-D polynomials in L²(Q), which we denote L^{≤D}. The value of the problem is its norm, ‖L^{≤D}‖. We again consider whether, as n → ∞, for D = D(n) slowly growing with n, this norm diverges or remains bounded. If it diverges, we expect that low-degree polynomials can distinguish P from Q w.h.p.
(at an intuitive level, the algorithm we have in mind is thresholding L ≤ D ); if itremains bounded, we conclude that low-degree polynomials cannot distinguish P from Q w.h.p. inthis particular sense, and therefore expect that no polynomial-time algorithm can do so either.The precise scaling of D ( n ) relates to the efficiency of algorithms that the heuristic pertains to—higher degree polynomials describe more time-consuming computations. However, while constant-degree polynomials may be evaluated in polynomial time, some other polynomial-time computationsrequire slightly higher degree polynomials to express. A crucial example is approximating thespectral norm of a matrix with dimensions polynomial in n , which requires a polynomial of degreeΘ(log n ) to approximate accurately. Taking this into account, a low-degree lower bound with D ( n ) ≫ log n is taken as evidence that no polynomial-time test exists. Similarly, we view a low-degree lower bound with D ( n ) ≫ n δ for some δ ∈ (0 ,
1) as suggesting a lower bound against tests with runtime O(exp(n^δ)). See [KWB19] for further discussion.

In the planted 3-XOR-SAT problem, we are given m clauses of the form x_i x_j x_k = b for some choice of i, j, k ∈ [n] and b ∈ {±1}. The goal is to distinguish the case where the clauses are completely random from the case where there is a planted assignment of {±1} values to the variables x_1, …, x_n such that all clauses are satisfied. This is a notable counterexample for many heuristics for average-case hardness, including sum-of-squares and all three methods mentioned above (see e.g., Lecture 3.2 of [BS16] or Chapter 18 of [MM09]). These heuristics predict that the distinguishing task is possible in polynomial time only when m ≳ n^{3/2}, whereas in reality the problem is much easier: Gaussian elimination can be used to decide with certainty whether or not there is a satisfying assignment. However, if we change the planted distribution so that only a 1 − ε fraction of the clauses are satisfied, Gaussian elimination breaks down and the best known algorithms indeed require m ≳ n^{3/2}. In this sense, the above heuristics seem to predict the threshold for "robust" algorithms. We expect that a similar phenomenon is at play in the eSBM model: while there exist algorithms to solve the problem by exploiting brittle algebraic structure in the eigenvectors (see Remark 1.2), we conjecture (Conjecture 2.5) that the threshold predicted by the heuristics is the correct computational threshold for a noisy version of the problem.

We use standard asymptotic notation such as o(·) and O(·); unless stated otherwise, this always pertains to the limit n → ∞ with other parameters (such as d, k, η) held fixed. We use e.g., o_d(·) or O_d(·) when considering the double limit n → ∞ followed by d → ∞; for example, f(n, d) = o_d(g(n, d)) means that for any ε > 0 there exists d_0 such that for all d ≥ d_0 there exists n_0 such that for all n ≥ n_0 we have |f(n, d)/g(n, d)| ≤ ε.
An event occurs with high probability (w.h.p.) if it has probability 1 − o(1). We write f ∼ g to mean f/g → 1. Graphs are simple (i.e., without self-loops and multiple edges) unless stated otherwise. A d-regular graph has degree d at every vertex. G_{n,d} denotes the uniform distribution over d-regular n-vertex graphs.

We will use I_k to denote the k × k identity matrix and J_k for the k × k all-ones matrix; e_1, e_2, … are the standard unit basis vectors, and 𝟙 will denote the all-ones vector. For matrices, ‖·‖ denotes the operator (spectral) norm and ‖·‖_F denotes the Frobenius norm. We use 𝟙[A] for the ({0,1}-valued) indicator of an event A. We write [k] = {1, 2, …, k} and ℕ = {0, 1, 2, …}.

We consider a general framework that captures the problem of certifying bounds on max-k-cut in a random graph, as well as a Gaussian variant of this problem. Definition 2.1.
For a labeling σ : [n] → [k], the associated partition matrix P = P(σ) ∈ ℝ^{n×n} is given by

P_{i,j} = 1 if σ(i) = σ(j), and P_{i,j} = −1/(k−1) if σ(i) ≠ σ(j).

A basic fact is that P ⪰ 0 and rank(P) = k −
1. This can be seen by realizing P as the Gram matrix of a certain collection of vectors: assign each u ∈ [n] one of the k unit vectors in ℝ^{k−1} pointing to the corners of a simplex, according to its label σ(u) ∈ [k].

Let Π be the set of all partition matrices. For a given matrix A ∈ ℝ^{n×n} and a given k, we will be interested in the problem of algorithmically certifying an upper bound on the value

Γ_k(A) := max_{P ∈ Π} ⟨P, −A⟩. (14)

If A_G ∈ {0,1}^{n×n} is the adjacency matrix of a graph G = (V, E) (defined with (A_G)_{i,i} = 0), then ⟨P(σ), −A_G⟩ = 2(|E| − k · m_σ(G))/(k−1), where m_σ(G) is the number of monochromatic edges of G under the labeling σ. As a result,

MC_k(G) = ((k−1)/k) (1 + Γ_k(A_G)/(2|E|)). (15)

Thus, an upper bound on Γ_k(A_G) translates to an upper bound on MC_k(G), which in turn can be used to refute k-colorability. We now formally define the certification task. Definition 2.2.
Let Q = Q_n be a sequence of distributions over ℝ^{n×n} and let A = A_n : ℝ^{n×n} → ℝ be a sequence of algorithms. We say that A certifies the upper bound B on Γ_k over A ∼ Q if both of the following hold: (i) for every A ∈ ℝ^{n×n}, A(A) ≥ Γ_k(A), and (ii) if A ∼ Q_n then A_n(A) ≤ B with probability 1 − o(1).

Crucially, A(A) must always be a valid upper bound, even if A is atypical under Q. In exponential time it is possible to compute Γ_k(A) exactly and thus achieve perfect certification; we are interested instead in polynomial-time certification procedures.

There is a simple spectral approach [Hof70] to certifying bounds on Γ_k(A). Let λ_min = λ_min(A) denote the minimum eigenvalue of A. For any P ∈ Π we have P ⪰ 0 and A − λ_min I ⪰ 0, so

0 ≤ ⟨P, A − λ_min I⟩ = ⟨P, A⟩ − λ_min n,

and hence −λ_min n is an efficiently-computable upper bound on Γ_k(A). Our main results give evidence that it is computationally hard to improve upon this spectral bound when A is drawn from certain distributions Q: (i) random d-regular graphs, and (ii) Gaussian matrices.

We will be concerned with the following task of distinguishing two distributions, also called hypothesis testing or detection. Definition 2.3.
Let P_n and Q_n be probability measures on the same space Ω_n. We say that an algorithm t_n : Ω_n → {p, q} distinguishes P_n and Q_n with high probability, or (equivalently) achieves strong detection (between P_n and Q_n), if P_n[t_n(x) = q] + Q_n[t_n(x) = p] = o(1) as n → ∞.

Our results for random regular graphs are conditional on a conjecture regarding computational hardness of detection in a noisy variant of the equitable stochastic block model (eSBM). The extra noise is crucial, as discussed in Remark 1.2 and Section 1.1.4. Specifically, the noise takes the form of "rewiring" a small constant fraction of the edges as follows.
Definition 2.4 (Noise Operator). If G = (V, E) is a d-regular n-vertex graph and δ > 0, let T_δ(G) denote the random d-regular graph obtained from G by making ⌊δn⌋ 'swaps.' That is, repeatedly choose a pair of distinct edges (i, j), (k, ℓ) uniformly at random conditioned on the following events: i ≠ k, j ≠ ℓ, (i, k) ∉ E, and (j, ℓ) ∉ E. Remove edges (i, j) and (k, ℓ), and add edges (i, k) and (j, ℓ).

Recall the definition of the eSBM (Definition 1.1) and the associated threshold d_KS^eq(η) defined in (9).

Conjecture 2.5 (eSBM Conjecture). Let G̃^eq_{n,d,k,η,δ} denote the distribution over d-regular n-vertex graphs given by T_δ(G) where G ∼ G^eq_{n,d,k,η}. Suppose δ > 0, η ∈ [−1/(k−1), 1], k ≥ 2, and d < d_KS^eq(η) are all fixed. Also suppose k | (1−η)d, so that the equitable model is defined for an infinite sequence of values of n. Then there exists no polynomial-time algorithm that with high probability distinguishes G̃^eq_{n,d,k,η,δ} from G_{n,d} as n → ∞ (in the sense of Definition 2.3).

The above conjecture implies the following result on hardness of certifying bounds on max-k-cut. Theorem 2.6.
Assume the eSBM conjecture (Conjecture 2.5) holds. Fix d ≥ 3 and k ≥ 2, and let η ∈ [−1/(k−1), 1] be such that |η| < 2√(d−1)/d and k | (1−η)d. Then for any ǫ > 0, no polynomial-time algorithm can certify, in the sense of Definition 2.2, the upper bound nd(|η| − ǫ) on Γ_k over G_{n,d}. Equivalently, no polynomial-time algorithm can certify for G ∼ G_{n,d} the bound

MC_k(G) ≤ ((k−1)/k) (1 + |η| − ǫ). (16)

[Footnote: Here, the ordering of the tuples matters, so e.g., (i, j) and (j, i) should be chosen with equal probability.] [Footnote: The integrality conditions k | (1−η)d are discussed in Section 2.2.1 below; this condition can be ignored when d ≫ k.]

Remark 2.7 (Near-Coloring). An important special case where Theorem 2.6 is tight is η = −1/(k−1), corresponding to refutation of near-colorability. Here the integrality condition k | (1−η)d reduces to (k−1) | d. Thus for any d, k satisfying (k−1) | d and k > 1 + d/(2√(d−1)), we have that for any ǫ > 0, no polynomial-time algorithm can certify MC_k(G) ≤ 1 − ǫ. There is an infinite sequence of (d, k) values with k ∼ √d for which the conditions on d, k are satisfied, namely d = 4(k−1)(k−2) for all k > 2. Thus, our result is asymptotically tight, matching the spectral algorithm in the double limit n → ∞ followed by d → ∞.

Remark 2.8 (Exact Coloring). As stated, Theorem 2.6 only shows hardness of refuting a near-coloring, as opposed to an exact coloring. While we expect a similar hardness result to hold for exact coloring, this does not follow from the eSBM conjecture because the noise operator T_δ prevents us from planting an exact coloring. Hardness of refuting exact coloring would follow from a variant of the eSBM conjecture where the noise operator only makes swaps that do not affect the value of the planted cut.
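The noise operator T_δ of Definition 2.4 is straightforward to implement. The sketch below assumes a graph represented as a set of undirected edges; the function name and the representation are ours, not from the paper. Each accepted swap preserves every vertex degree, so the output stays d-regular.

```python
import random

def noise_operator(edges, n, delta, seed=0):
    """A minimal sketch of T_delta (Definition 2.4): perform floor(delta*n)
    degree-preserving edge swaps on a simple graph given as a set of
    frozenset edges."""
    rng = random.Random(seed)
    E = set(edges)
    for _ in range(int(delta * n)):
        while True:
            # pick two distinct edges; each may be traversed in either order
            e1, e2 = rng.sample(sorted(tuple(sorted(e)) for e in E), 2)
            i, j = e1 if rng.random() < 0.5 else e1[::-1]
            k, l = e2 if rng.random() < 0.5 else e2[::-1]
            # reject swaps that would create self-loops or multi-edges
            if i != k and j != l and frozenset((i, k)) not in E \
                    and frozenset((j, l)) not in E:
                break
        E -= {frozenset((i, j)), frozenset((k, l))}
        E |= {frozenset((i, k)), frozenset((j, l))}
    return E

# rewire a 20-cycle (a 2-regular graph) with delta = 0.2
cycle = {frozenset((i, (i + 1) % 20)) for i in range(20)}
rewired = noise_operator(cycle, n=20, delta=0.2)
```

The rejection conditions mirror the conditioning in Definition 2.4; in particular the edge count and all degrees are unchanged, which is easy to verify on the example above.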
Our next two results give concrete evidence for the eSBM conjecture using two different heuristics (discussed in Section 1.1): stability of belief propagation, and the local statistics hierarchy. Both of these methods predict d_KS^eq(η) as the computational threshold for the eSBM.

Theorem 2.9 (Informal; Kesten–Stigum threshold of eSBM). For k ≥ 2, the Kesten–Stigum threshold of the eSBM with parameters (n, d, k, η), defined as the smallest number d_KS^eq such that, for all d > d_KS^eq, the "uninformative" fixed point of the belief propagation iteration is unstable, is given by

d_KS^eq = d_KS^eq(η) := (2/η²) (1 + √(1 − η²)).

Further details, as well as the full analysis leading to the above result, can be found in Section 4.
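The closed form for d_KS^eq(η) in Theorem 2.9 is exactly the inverse of the spectral-threshold curve |η| = 2√(d−1)/d appearing in Theorem 2.6, i.e., d < d_KS^eq(η) if and only if |η| < 2√(d−1)/d. This is a one-line algebra check that can also be confirmed numerically (function names below are ours):

```python
import math

def d_ks_eq(eta):
    """Kesten-Stigum threshold of the equitable model, as in Theorem 2.9:
    (2 / eta^2) * (1 + sqrt(1 - eta^2))."""
    return (2.0 / eta ** 2) * (1.0 + math.sqrt(1.0 - eta ** 2))

def eta_star(d):
    """The spectral-threshold curve |eta| = 2*sqrt(d-1)/d from Theorem 2.6."""
    return 2.0 * math.sqrt(d - 1) / d

# d_ks_eq inverts eta_star: solving eta^2 d^2 = 4(d-1) for d recovers d itself
checks = [abs(d_ks_eq(eta_star(d)) - d) for d in (3, 4, 8, 20, 100)]
```

At η = 1 the formula gives d_KS^eq = 2, and as η → 0 it diverges like 4/η², consistent with the threshold becoming unreachable for weak signal.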
Theorem 2.10 (Local Statistics analysis of eSBM). If d > d_KS^eq(η), then there exist D sufficiently large and δ > 0 so that the degree-(2, D) Local Statistics algorithm with error tolerance δ can distinguish G_{n,d} and G^eq_{n,d,k,η} with high probability. If d ≤ d_KS^eq(η), no such D and δ exist.

Further details on the Local Statistics hierarchy, and the proof of Theorem 2.10, can be found in Section 5.
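The thresholds at play in these results are easy to evaluate concretely. The following sketch (function names are ours) computes the spectral bound ((k−1)/k)(1 + 2√(d−1)/d) on MC_k and enumerates the values of η allowed by the integrality condition k | (1−η)d; for k = 2 and d = 3 it reproduces the gap discussed in the next subsection.

```python
import math

def spectral_bound(d, k):
    """Spectral upper bound ((k-1)/k)(1 + 2*sqrt(d-1)/d) on MC_k over G_{n,d}."""
    return (k - 1) / k * (1 + 2 * math.sqrt(d - 1) / d)

def admissible_etas(d, k):
    """Values eta in [-1/(k-1), 1] satisfying both the integrality condition
    k | (1-eta)*d (i.e., m := (1-eta)*d/k is a nonnegative integer, so eta
    lies on a grid of spacing k/d) and the constraint |eta| < 2*sqrt(d-1)/d."""
    etas = []
    for m in range(0, 2 * d + 1):
        eta = 1 - m * k / d
        if eta >= -1 / (k - 1) - 1e-12 and abs(eta) < 2 * math.sqrt(d - 1) / d:
            etas.append(eta)
    return etas

# Max-Cut in 3-regular graphs: the spectral bound is about 0.97, but the
# largest admissible |eta| is 1/3, certifying only below (1/2)(1 + 1/3) = 2/3
b23 = spectral_bound(3, 2)
best = max(abs(e) for e in admissible_etas(3, 2))
```

For large d the grid spacing k/d becomes negligible compared to 2√(d−1)/d, which is the regime where the hardness result is essentially tight.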
The integrality constraints in the eSBM (Definition 1.1) and in the above results may seem somewhat mystifying. In this section we will try to shed some light on the limits of our method for quiet planting, and how they compare to the power of spectral refutation. The integrality condition k | (1−η)d from Theorem 2.6 is required so that the equitable block model exists for an infinite sequence of values of n. This condition can be written as ηd ≡ d (mod k), which means η is constrained to lie in a certain grid of spacing k/d within the interval [−1/(k−1), 1]. To obtain the strongest hardness result, we choose η so as to maximize |η| subject to the integrality condition and |η| < 2√(d−1)/d. If d ≫ k² then we have k/d ≪ 2√(d−1)/d, and so the grid allows η to be chosen extremely close to 2√(d−1)/d; this means the effects of the integrality condition are negligible and we essentially obtain the ideal hardness result (10), which matches the spectral algorithm. In particular, our results are tight in the regime d → ∞ with k fixed. As discussed in Remark 2.7, we also get tight results in the particular case η = −1/(k−1) corresponding to near-coloring, because the integrality condition simplifies in a helpful way.

On the other hand, for small degree the integrality conditions can impose significant limitations. A particularly severe example is that of k = 2 and d = 3, which corresponds to Max-Cut in a random 3-regular graph. The spectral bound corresponds to

MC_2(G) ≤ (1/2) (1 + 2√2/3) ≈ 0.97. (17)

On the other hand, the integrality condition from Theorem 2.6 requires that 2 | 3(1−η), in addition to the condition |η| < 2√2/3. Since 3(1 + 2√2/3) ≈ 5.83, the largest admissible |η| is such that 3(1−η) = 4, corresponding to η = −1/3. This means that Theorem 2.6 only addresses certification below

MC_2(G) ≤ (1/2)(1 + 1/3) = 2/3 ≈ 0.67.

This is unfortunate since random 3-regular graphs have MC_2(G) > 0.88 with high probability [DDSW03], and calculations from statistical physics [ZB10] suggest the even larger value MC_2(G) ≈ 0.92. That is, using the equitable block model to plant a 2-cut in a 3-regular graph, by giving each vertex one neighbor in its own group and two neighbors in the other group, plants a cut that is smaller than naturally arising cuts in random 3-regular graphs.

In general, aside from the special case of near-coloring (Remark 2.7), our results are most compelling when k is small and d is large, so that the integrality condition in Theorem 2.6 does not create a significant gap from the limiting threshold |η| = 2√(d−1)/d. We believe the investigation of quiet plantings in small-degree graphs to be an interesting direction of future research.

The Gaussian k-Cut Model

In this section we discuss a Gaussian analogue of the coloring problem, namely the problem of certifying upper bounds on Γ_k(W) where W is a GOE matrix defined as follows. Definition 2.11.
The
Gaussian orthogonal ensemble is the following distribution
GOE(n) over random matrices: W ∼ GOE(n) is symmetric (W_{u,v} = W_{v,u}) with diagonal entries W_{u,u} ∼ N(0, 2/n) and off-diagonal entries W_{u,v} ∼ N(0, 1/n), where the values {W_{u,v} : u ≤ v} are independent.

It is well known that (as n → ∞) the eigenvalues of W ∼ GOE(n) follow the Wigner semicircle law supported on [−2, 2]; in particular λ_min(W) → −2, so the spectral bound certifies Γ_k(W) ≤ (2 + o(1))n. Our main results give rigorous evidence in support of the following conjecture, which states that improving upon this spectral bound requires fully exponential time. Conjecture 2.12.
For any constants k ≥ 2, ǫ > 0 and δ > 0, there is no algorithm of runtime exp(O(n^{1−δ})) that certifies the upper bound (2 − ǫ)n on Γ_k(W) over W ∼ GOE(n).

Our evidence for this conjecture can be seen as a generalization of [BKW20], which handles the k = 2 case. In Section 3.2 we give a reduction from a certain hypothesis testing problem to the certification problem in question. In analogy to our results on coloring, this reduction can be seen as constructing a planted distribution that directly plants an eigenspace with no pushout effect; this planting is computationally quiet, conditional on hardness of the testing problem. The testing problem is a particular instance of the spiked Wishart model with a rank-(k −
1) negative spike. In Section 2.3.2 below, we state results analyzing the low-degree likelihood ratio (see Section 1.1.3) for this model; these results suggest that fully exponential time is required (in the appropriate parameter regime). As a by-product of our analysis, we give new bounds on the low-degree likelihood ratio for a wide class of multi-spiked matrix models (both Wigner and Wishart), which may be of independent interest. These results also extend to certain binary-valued analogues of these problems, including (a variant of) the stochastic block model. These results are also discussed in Section 2.3.2 below.
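Conjecture 2.12 concerns the spectral certificate −λ_min(W)·n ≈ 2n for GOE matrices. Sampling from Definition 2.11 and checking both the semicircle edge and the certificate inequality ⟨P, −W⟩ ≤ −λ_min(W)·n is a quick sanity check; the sketch below (helper names are ours) uses the normalization of Definition 2.11.

```python
import numpy as np

def sample_goe(n, rng):
    """W ~ GOE(n) per Definition 2.11: off-diagonal variance 1/n, diagonal 2/n."""
    A = rng.normal(size=(n, n)) / np.sqrt(n)   # i.i.d. N(0, 1/n) entries
    return (A + A.T) / np.sqrt(2)              # symmetrized: variances as required

rng = np.random.default_rng(0)
n, k = 1000, 3
W = sample_goe(n, rng)
lam_min = np.linalg.eigvalsh(W)[0]             # semicircle edge: close to -2

# spectral certificate: for any partition matrix P (Definition 2.1),
# <P, -W> <= -lam_min * n, since P is PSD with trace n
labels = rng.integers(0, k, size=n)
S = (labels[:, None] == labels[None, :]).astype(float)
P = (k * S - np.ones((n, n))) / (k - 1)        # 1 if same label, -1/(k-1) else
```

Here the random labeling just exhibits one feasible P; the conjecture asserts that no subexponential algorithm can certify anything substantially below the value 2n that this spectral argument yields.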
We consider general variants of the spiked Wigner and Wishart models, defined as follows.
Definition 2.13.
Let X = (X_n) be a probability measure over ℝ^{n×n}_sym, the symmetric n × n matrices. The general spiked Wigner model with spike prior X and signal-to-noise ratio λ ∈ ℝ is specified by the following null and planted distributions over ℝ^{n×n}_sym.

• Under Q, draw Y ∼ GOE(n).
• Under P, let Y = λX + W, where X ∼ X and W ∼ GOE(n) independently.

Definition 2.14.
Let X = (X_n) be a probability measure over ℝ^{n×n}_sym. The general spiked Wishart model with spike prior X, signal-to-noise ratio β > −1, and number of samples N ∈ ℕ, is specified by the following null and planted distributions over (y_1, …, y_N) ∈ (ℝ^n)^N.

• Under Q, draw y_u ∼ N(0, I_n) independently for u ∈ [N].
• Under P, first draw X̃ ∼ X and define

X = X̃ if βX̃ ≻ −I_n, and X = 0 otherwise. (18)

Then draw y_u ∼ N(0, I_n + βX) independently for u ∈ [N].

The purpose of (18) is to ensure that I_n + βX is a valid covariance matrix. We will consider priors for which the first case of (18) occurs with high probability. Specifically, we focus on the following class of priors that are PSD with constant rank. Definition 2.15.
Fix an integer k ≥ 1. Let π be a probability measure supported on a bounded subset of ℝ^k, satisfying E[π] = 0 and ‖Cov(π)‖ = 1. Let X(π) denote the spike prior which outputs X = (1/n) U U^⊤, where U is n × k with each row distributed independently according to π. (We do not allow π to depend on n.)

It is well known in random matrix theory that polynomial-time detection in the Wigner model is possible when |λ| >
1, by thresholding the maximum (or minimum if λ <
0) eigenvalue of Y [FP07, CDF09]. Similarly, poly-time detection in the Wishart model is possible when β² > n/N, by thresholding the maximum or minimum eigenvalue of the sample covariance matrix Y = (1/N) Σ_i y_i y_i^⊤ [BBP05, BS06]. In the general setting above, we have the following bounds on the norm of the low-degree likelihood ratio ‖L^{≤D}‖ (see Section 1.1.3), which suggest that fully exponential time is required to solve the detection problem below this spectral threshold. Our results are consistent with the computational thresholds predicted by [LKZ15a], where it was shown that the approximate message passing algorithm fails below the spectral threshold. The proofs are deferred to Section 6. Theorem 2.16.
Fix constants k ≥ 1 and λ ∈ ℝ. Fix π satisfying the requirements in Definition 2.15. Consider the general spiked Wigner model with spike prior X(π). If |λ| < 1 then ‖L^{≤D}‖ = O(1) for any D = o(n/log n). Theorem 2.17.
Fix constants k ≥ 1, β > −1, and γ > 0. Fix π satisfying the requirements in Definition 2.15. Consider the general spiked Wishart model with any N = N_n satisfying n/N → γ as n → ∞, and with spike prior X(π). If β² < γ then ‖L^{≤D}‖ = O(1) for any D = o(n/log n). Remark 2.18.
In the setting of Theorem 2.17, the first case of (18) holds with high probability because β > −1 and ‖X̃‖ → ‖Cov(π)‖ = 1 (in probability). To see this, write ‖X̃‖ = ‖(1/n) U U^⊤‖ = ‖(1/n) U^⊤U‖; being an average of n i.i.d. k × k matrices, (1/n) U^⊤U converges in probability to its expectation, which is Cov(π).

We also extend our framework to binary-valued problems. In Proposition B.1 we give a general result analyzing the low-degree likelihood ratio in binary-valued problems via a comparison to the analogous Gaussian-valued problem. As an application, we study the following variant of the stochastic block model (SBM).
Definition 2.19.
The stochastic block model with parameters k ≥ 2, d > 0, η ∈ [−1/(k−1), 1] (constants not depending on n) is specified by the following null and planted distributions over n-vertex graphs.

• Under Q, for every u < v the edge (u, v) occurs independently with probability d/n.
• Under P, each vertex is independently assigned a community label drawn uniformly from [k]. Conditioned on these labels, edges occur independently. If u, v belong to the same community then edge (u, v) occurs with probability (1 + (k−1)η) d/n; otherwise (u, v) occurs with probability (1 − η) d/n.

Here k is the number of communities, d is the average degree, and η is a signal-to-noise ratio: the planted k-cut cuts a fraction ((k−1)/k)(1 − η) of the edges on average. Known polynomial-time algorithms only succeed at distinguishing P from Q above the so-called Kesten–Stigum (KS) threshold, i.e., when dη² > 1. Theorem 2.20.
Consider the stochastic block model as in Definition 2.19 with parameters k, d, η fixed. If dη² < 1 then ‖L^{≤D}‖ = O(1) for any D = o(n/log n).

Prior work [HS17, Hop18] has already given a low-degree analysis of this variant of the SBM, showing that the problem is low-degree-hard below the KS bound. Our result offers two advantages: (i) the proof is streamlined, following easily from our general-purpose machinery, without the need for direct combinatorial calculations, and (ii) we bound ‖L^{≤D}‖ for D all the way up to o(n/log n), rather than the much smaller degrees handled in prior work. As discussed in Section 1.1.3, item (ii) constitutes evidence that distinguishing P from Q in the SBM requires fully exponential time exp(n^{1−o(1)}) below the KS bound.

In this section we give formal proofs, for both the graph and Gaussian models, that hardness of a particular detection problem implies hardness of certification.

3.1 The Graph Model
We now give the proof of Theorem 2.6, which shows that hardness of detection in the noisy eSBMmodel implies hardness of certifying bounds on max- k -cut. Proof of Theorem 2.6.
Assume for the sake of contradiction that some algorithm A certifies the upper bound MC_k(G) ≤ ((k−1)/k)(1 + |η| − ε) when G ∼ G_{n,d}. We will use this to distinguish between G̃^eq_{n,d,k,η,δ} and G_{n,d} for δ = dε(k−1)/(8k); this contradicts Conjecture 2.5 because the assumption |η| < 2√(d−1)/d implies d < d_KS^eq(η). Our detection algorithm takes as input a graph G and outputs q ("null") if A(G) ≤ ((k−1)/k)(1 + |η| − ε), and p ("planted") otherwise. If G ∼ G_{n,d} then A(G) ≤ ((k−1)/k)(1 + |η| − ε) with high probability by assumption, and so the distinguisher outputs q. Now consider the case G ∼ G̃^eq_{n,d,k,η,δ}. A graph drawn from G^eq_{n,d,k,η} has a planted k-cut of fractional size ((k−1)/k)(1 + |η|), and the noise operator T_δ can remove at most 2δn edges from this cut. Thus,

A(G) ≥ MC_k(G) ≥ ((k−1)/k)(1 + |η|) − 2δn/|E|,

where 2δn/|E| = 4δ/d = (4/d) · dε(k−1)/(8k) = ε(k−1)/(2k) < ε · (k−1)/k. This implies A(G) > ((k−1)/k)(1 + |η| − ε), and so the distinguisher outputs p. Definition 3.1.
Let π_k be the distribution over ℝ^k given by √k e_i − (1/√k)𝟙, where i ∼ [k] uniformly at random. Let X_k = X(π_k) be the associated spike prior, as defined in Definition 2.15. Here, e_1, e_2, … denote the standard unit basis vectors and 𝟙 denotes the all-ones vector. Theorem 3.2.
Suppose there exist constants k ≥ 2 and ǫ > 0 such that there is a time-t(n) algorithm to certify the upper bound (2 − ǫ)n on Γ_k(W) over W ∼ GOE(n). Then for some constants β ∈ (−1, 0) and γ > 1, there is a time-(t(n) + poly(n)) algorithm achieving strong detection in the general spiked Wishart model with spike prior X_k.

Note that the above parameters satisfy β² < γ, which is in the "hard" regime of the Wishart model. Thus, Theorem 3.2 together with the low-degree hardness of the Wishart model in that regime (Theorem 2.17) constitutes rigorous evidence for Conjecture 2.12. Proof.
Let A be the purported certification algorithm. We will use it to solve detection in the Wishart model, given Wishart samples y_1, …, y_N. Sample W̃ ∼ GOE(n) and let λ_1 ≤ ⋯ ≤ λ_n denote its (random) eigenvalues. Sample a uniformly random orthonormal basis v_{n−N+1}, …, v_n for V := span{y_1, …, y_N} and a uniformly random orthonormal basis v_1, …, v_{n−N} for the orthogonal complement V^⊥. Let W = Σ_{i=1}^n λ_i v_i v_i^⊤. Our Wishart detection algorithm is as follows: if A(W) ≤ (2 − ǫ)n then output q; otherwise, output p.

We now prove that this achieves strong detection. If the Wishart samples were drawn from Q then V is a uniformly random N-dimensional subspace and so W ∼ GOE(n). This means A(W) ≤ (2 − ǫ)n with high probability by assumption, and so our algorithm correctly outputs q. It remains to show that if the Wishart samples were drawn from P, then Γ_k(W) > (2 − ǫ)n with high probability, and so the algorithm is forced to output p.

Suppose the Wishart samples were drawn from P with planted matrix X. With high probability we are in the first case of (18), i.e., X = (1/n) U U^⊤ where each row of U is drawn independently from π_k. Note that (1/(k−1)) U U^⊤ is a partition matrix, and so

Γ_k(W) ≥ −(1/(k−1)) ⟨U U^⊤, W⟩.

We can bound

⟨U U^⊤, W⟩ = ⟨U U^⊤, Σ_{i=1}^n λ_i v_i v_i^⊤⟩
 ≤ ⟨U U^⊤, λ_{n−N} Σ_{i=1}^{n−N} v_i v_i^⊤ + λ_n Σ_{i=n−N+1}^n v_i v_i^⊤⟩
 = ⟨U U^⊤, λ_{n−N} (I_n − Σ_{i=n−N+1}^n v_i v_i^⊤) + λ_n Σ_{i=n−N+1}^n v_i v_i^⊤⟩
 = ⟨U U^⊤, λ_{n−N} I_n + (λ_n − λ_{n−N}) Σ_{i=n−N+1}^n v_i v_i^⊤⟩,

where we have used the fact that Σ_{i=1}^n v_i v_i^⊤ = I_n since {v_i}_{i∈[n]} is an orthonormal basis. We will bound the pieces of this expression separately.

First we bound ⟨U U^⊤, I_n⟩. The nonzero eigenvalues of U U^⊤ are the same as the nonzero eigenvalues of U^⊤U. Since U^⊤U is the sum of n i.i.d. k × k matrices, (1/n) U^⊤U converges in probability to its expectation, which is Cov(π_k) = I_k − J_k/k, where J_k is the k × k all-ones matrix. This means ⟨U U^⊤, I_n⟩ = Tr(U U^⊤) = n Tr((1/n) U^⊤U) = (1 + o(1)) n (k−1) with high probability.

Next we bound ⟨U U^⊤, Σ_{i=n−N+1}^n v_i v_i^⊤⟩. Recall that {v_i}_{i=n−N+1}^n is an orthonormal basis for span{y_1, …, y_N}, and so Σ_{i=n−N+1}^n v_i v_i^⊤ ⪯ (1/µ) Y, where Y = (1/N) Σ_{i=1}^N y_i y_i^⊤ and µ is the smallest nonzero eigenvalue of Y. Since Y is a spiked covariance matrix, Theorem 1.2 of [BS06] gives µ → (√γ − 1)² > 0 in probability. Thus

⟨U U^⊤, Σ_{i=n−N+1}^n v_i v_i^⊤⟩ ≤ ⟨U U^⊤, (1/µ) Y⟩ = (1/(µN)) Σ_{i=1}^N ‖U^⊤ y_i‖².

For fixed U (and therefore fixed X), note that U^⊤ y_i follows a multivariate Gaussian distribution with mean zero and covariance

E[U^⊤ y_i y_i^⊤ U] = U^⊤ E[y_i y_i^⊤] U = U^⊤ (I + βX) U = U^⊤U + (β/n) U^⊤U U^⊤U.

Recalling that (1/n) U^⊤U → I_k − J_k/k, and that I_k − J_k/k is a projection, we have (1/n)(U^⊤U + (β/n) U^⊤U U^⊤U) → (1 + β)(I_k − J_k/k) in probability. Thus, (1/(nN)) Σ_{i=1}^N ‖U^⊤ y_i‖² converges in probability to (1 + β) Tr(I_k − J_k/k) = (1 + β)(k − 1), and so

⟨U U^⊤, Σ_{i=n−N+1}^n v_i v_i^⊤⟩ ≤ (1 + o(1)) (n/µ)(1 + β)(k−1) = (1 + o(1)) n (√γ − 1)^{−2} (1 + β)(k−1).

Since the eigenvalues of W̃ converge to the Wigner semicircle law on [−2, 2], we have λ_n − λ_{n−N} ≤ 4 + o(1) with high probability. Also, by taking γ > 1 sufficiently close to 1, we can ensure λ_{n−N} ≤ −2 + ǫ/2 with high probability. Putting the pieces together,

Γ_k(W) ≥ −(1/(k−1)) ⟨U U^⊤, W⟩
 ≥ −(1/(k−1)) ⟨U U^⊤, λ_{n−N} I_n + (λ_n − λ_{n−N}) Σ_{i=n−N+1}^n v_i v_i^⊤⟩
 ≥ −(1/(k−1)) [(−2 + ǫ/2 + o(1)) n (k−1) + (4 + o(1)) n (√γ − 1)^{−2} (1 + β)(k−1)]
 > (2 − ǫ) n

for sufficiently large n, provided we choose β > −1 close enough to −1 that 4(1 + β)(√γ − 1)^{−2} < ǫ/2. This completes the proof.
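The parameter choices at the end of the proof can be sanity-checked numerically: ignoring o(1) terms, the final display requires 4(1 + β)/(√γ − 1)² < ǫ/2, which holds once β is close enough to −1 for the chosen γ. A small sketch (names are ours):

```python
import math

def lower_bound_coeff(eps, gamma, beta):
    """Coefficient of n in the final bound of the proof, ignoring o(1) terms:
    -((-2 + eps/2) + 4*(1 + beta)/(sqrt(gamma) - 1)**2)."""
    return 2 - eps / 2 - 4 * (1 + beta) / (math.sqrt(gamma) - 1) ** 2

eps, gamma = 0.1, 1.1                               # gamma > 1 close to 1
beta = -1 + eps * (math.sqrt(gamma) - 1) ** 2 / 16  # beta close to -1
coeff = lower_bound_coeff(eps, gamma, beta)
```

With this choice the error term equals ǫ/4 < ǫ/2, so the coefficient exceeds 2 − ǫ as required; the value 1/16 is one workable constant, not prescribed by the paper.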
In this section, we carry out a stability analysis of belief propagation (as discussed in Section 1.1.1)and derive the result presented in Theorem 2.9. The analysis resembles that of the originalwork [DKMZ11b, DKMZ11a] that predicted the Kesten–Stigum threshold in the ordinary stochasticblock model, but the equitability constraints create additional technical complexity in our setting.We start with the special case of the equitable coloring model in Section 4.1, and generalize to theequitable block model in Section 4.2. Throughout this section, it turns out to be convenient toparametrize the equitable model in a different way (defined below) than used in the Introduction.
The equitable coloring model is obtained by setting η = −1/(k−1) in Definition 1.1; for brevity let us define

c := (1 − η) d / k, (19)

so that in the planted coloring each vertex has exactly c = d/(k−1) neighbors of every other color. From the point of view of the vertices, this is a complicated constraint affecting a star of d + 1 vertices. As a result, a factor graph with a variable node for each vertex, and a constraint node corresponding to each vertex and its neighbors, is not even locally treelike. Instead, we define a variable for each edge, giving a pair of colors. The constraint then demands that the d edges incident to each vertex agree on its color, and that the colors of their other endpoints are equitable.

This lets us define a message-passing algorithm. Regarding each edge (u, v) of the graph G ∼ G^eq_{n,d,k,−1/(k−1)} as a pair of directed edges u → v, v → u, each directed edge u → v sends a message µ^{u→v} to vertex v consisting of the estimated probabilities µ^{u→v}_{r,s} that u and v are colors r and s respectively, for each r, s ∈ [k] with r ≠ s. Vertex v then sends out messages µ^{v→w} to the directed edges (v, w), which are computed as follows:

1. For each of v's neighbors u other than w, choose a pair of colors (r_u, s_u) independently from the distribution µ^{u→v} = (µ^{u→v}_{rs}).
2. Condition on the event that the s_u are identical for all u. Call this color s.
3. Condition on the event that all but one of the colors other than s appear c times in the list (r_u), and that one color, call it t, appears c − 1 times.
4. Then µ^{v→w} = (µ^{v→w}_{st}) is the resulting conditional distribution of the pair (s, t), i.e., the probability that v and w are colors s and t respectively.

Formally we can write

µ^{v→w}_{st} = ψ^{v→w}_{st} / z^{v→w} (20)

where

ψ^{v→w}_{st} = Σ_{(r_u : u ∈ ∂v∖w) ∈ [k]^{d−1}} (Π_u µ^{u→v}_{r_u, s}) Π_{a ∈ [k], a ≠ s} 𝟙[ |{u : r_u = a}| = c − 𝟙[a = t] ] (21)

z^{v→w} = Σ_{s,t ∈ [k] : s ≠ t} ψ^{v→w}_{st}.
(22)

Clearly the uniform messages µ^{u→v}_{rs} = 1/(k(k−1)) are a fixed point of this iteration. To analyze its stability, we compute the Jacobian at this fixed point,

Y_{rs,s′t} = ∂µ^{v→w}_{s′t} / ∂µ^{u→v}_{rs}. (23)

To compute this matrix, suppose that we perturb the incoming message µ^{u→v} for a pair (r, s) with r ≠ s,

µ^{u→v}_{r′s′} = 1/(k(k−
1)) · (1 + ε δ_{rr′} δ_{ss′}),

where δ is the Kronecker delta, δ_{rr′} = 1 if r = r′ and 0 otherwise. This perturbation does not respect the normalization Σ_{rs} µ^{u→v}_{rs} = 1, but this will simply show up as Y having zero row and column sums, since normalization projects perturbations to the subspace perpendicular to the uniform vector.

We then have several cases. If s′ ≠ s, then for all t ≠ s′ the quantity ψ^{v→w}_{s′t} is unchanged from its value at the uniform fixed point, namely

ψ^{v→w}_{s′t} = (1/(k(k−1)))^{d−1} (d−1; c−1, c, …, c) = (1/(k(k−1)))^{d−1} c (d−1)!/(c!)^{k−1} =: ψ_0, (24)

where (d−1; c−1, c, …, c) denotes the multinomial coefficient. For s′ = s and t = r, we have

ψ^{v→w}_{st} = ψ_0 ((1 + ε)(c−1)/(d−1) + (k−2) c/(d−1)) = ψ_0 (1 + ε (c−1)/(d−1)), (t = r) (25)

where the two terms come from r_u = r and r_u ∉ {r, s} respectively. Finally, for s′ = s and t ≠ r, we have

ψ^{v→w}_{st} = ψ_0 ((1 + ε) c/(d−1) + (c−1)/(d−1) + (k−3) c/(d−1)) = ψ_0 (1 + ε c/(d−1)), (t ≠ r) (26)

where the three terms come from r_u = r, r_u = t, and r_u ∉ {r, s, t} respectively. While this level of bookkeeping is comforting, both (25) and (26) are simply ψ_0(1 + εP), where P is the fraction of the d−1 colors (r_u) equal to r. We can write (24), (25), and (26) together as

ψ^{v→w}_{s′t} = ψ_0 (1 + ε δ_{ss′} (c − δ_{rt})/(d−1)).
(27)

Summing over all distinct pairs s′, t gives the normalization factor

z^{v→w} = k(k−1) ψ_0 + ε ψ_0 ((c−1)/(d−1) + (k−2) c/(d−1)) = k(k−1) ψ_0 (1 + ε/(k(k−1))), (28)

again with a multiplicative factor 1 + εP, where now P = 1/(k(k−1)) is the probability that the edge u → v has colors r and s on its endpoints. Combining (28) with (27) and (20) gives

µ^{v→w}_{s′t} = (1/(k(k−1))) (1 + ε (−δ_{ss′} δ_{rt}/(d−1) + c δ_{ss′}/(d−1) − 1/(k(k−1))) + O(ε²)). (29)

Canceling the factor 1/(k(k−1)), the Jacobian (23) is

Y_{rs,s′t} = −δ_{ss′} δ_{rt}/(d−1) + c δ_{ss′}/(d−1) − 1/(k(k−1)). (30)

Using (30) and (19), the reader can check that the rows and columns of Y sum to zero, as alluded to above:

∀ r, s: Σ_{s′,t : s′ ≠ t} Y_{rs,s′t} = 0,  ∀ s′, t: Σ_{r,s : r ≠ s} Y_{rs,s′t} = 0. (31)

To diagonalize Y, it is useful to treat the k(k−1)-dimensional space U spanned by ordered pairs (r, s) with r ≠ s as the space of k × k matrices U = (U_{rs}) with zeroes on the diagonal. Then we can interpret the three terms in (30) as follows:

• The term δ_{ss′} δ_{rt} is the transpose operator, sending U to U^⊤.
• The term δ_{ss′} sends U to U^⊤J, where J is the all-1s matrix.
• The term −1/(k(k−1)) subtracts the mean entry of U from each entry. Finally, we set all the diagonal elements of Y(U) to zero.

Thus we can rewrite (30) as

Y(U) = Π [ −(1/(d−1)) U^⊤ + (c/(d−1)) U^⊤J − (1/(k(k−1))) J U J ] (32)

where Π is the projection operator that sets the diagonal entries of a matrix to zero.

To diagonalize Y, recall that if two linear operators commute, they share the same eigenvectors. Clearly Y commutes with relabelings of the colors, i.e., with the S_k-action that conjugates U with a permutation matrix. This action preserves the following subspaces of matrices:

• The symmetric matrices with zero diagonal
• The antisymmetric matrices
• The matrices whose row (resp. column) sums are zero
• The matrices whose rows (resp. columns) are uniform, other than being zero on the diagonal

…and their intersections. More abstractly, U is the k(k−1)-dimensional representation of S_k where S_k acts on distinct ordered pairs (r, s) by sending (r, s) to (π(r), π(s)). We can find the eigenvectors and eigenvalues of Y by decomposing U into a direct sum of irreducible representations of S_k. This decomposition includes one copy of the trivial representation ρ^{(k)}, and one copy each of ρ^{(k−2,2)} and ρ^{(k−2,1,1)}. (To avoid some case-checking we assume that k ≥
4. In particular, if $k = 3$ then $\rho^{(k-2,2)}$ disappears.) By Schur's Lemma, when restricted to each of these irreducible subspaces, $Y$ is a scalar matrix with a single eigenvalue. These are as follows:

• The trivial representation is spanned by the matrix with 1s everywhere off the diagonal. By (31) this has eigenvalue zero.

• The copy of $\rho^{(k-2,1,1)}$ consists of antisymmetric matrices with zero row and column sums. These are annihilated by the second and third terms of (32), and are eigenvectors of the transpose with eigenvalue −
1. Thus they are eigenvectors of $Y$ with eigenvalue $+1/(d-1)$. This subspace has dimension $(k-1)(k-2)/2$.

• The copy of $\rho^{(k-2,2)}$ consists of symmetric matrices with zero row and column sums and zeroes on the diagonal. These are annihilated by the second and third terms of (32), and are eigenvectors of the transpose with eigenvalue $+1$. Thus they are eigenvectors of $Y$ with eigenvalue $-1/(d-1)$. This subspace has dimension $k(k-3)/2$.

Finally, $\mathcal{U}$ includes two copies of the "standard" representation $\rho^{(k-1,1)}$, one each in the symmetric and antisymmetric subspace. Each one has dimension $k-$
1, and together they span an isotypic subspace of dimension $2(k-1)$. On this subspace, $Y$ is the tensor product of the identity with a $2 \times 2$ matrix $m$, which we can compute by considering matrices of the form
\[
U_{ij} = \begin{cases} 0 & i = j \\ \alpha & i = 1,\ j \neq 1 \\ \beta & j = 1,\ i \neq 1 \\ \gamma & i, j \neq 1,\ i \neq j \end{cases} \qquad \text{where } \gamma = -\frac{\alpha + \beta}{k-2}, \tag{33}
\]
and their images under conjugation by permutation matrices, i.e., where the "special" row and column ranges from 1 to $k$. That is,
\[
U = \begin{pmatrix}
0 & \alpha & \cdots & \alpha \\
\beta & 0 & \gamma & \cdots \\
\beta & \gamma & 0 & \ddots \\
\beta & \gamma & \cdots & 0
\end{pmatrix}
\]
where $\gamma$ is set so that $U$'s entries sum to zero. The reader can check that these matrices are orthogonal to the other irreducible subspaces with respect to the trace inner product $\langle U, U'\rangle = \operatorname{Tr} U^\top U'$.

Using (32), we find that $Y(U)$ is also of this form but with entries $\alpha'$ and $\beta'$, where
\[
\begin{pmatrix} \alpha' \\ \beta' \end{pmatrix} = m \cdot \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \qquad \text{where } m = \frac{1}{d-1}\begin{pmatrix} -c & -1 \\ d-1 & 0 \end{pmatrix}. \tag{34}
\]
Thus $Y$ on this isotypic subspace is $m \otimes \mathbb{1}$ where $\mathbb{1}$ is the $(k-1)$-dimensional identity, so the remaining eigenvalues of $Y$ are those of $m$, namely the roots $\kappa$ of
\[
(d-1)\kappa^2 + c\kappa + 1 = 0, \qquad \text{which are} \qquad \kappa_\pm = \frac{-c \pm \sqrt{c^2 - 4(d-1)}}{2(d-1)}.
\]
When $c^2 < 4(d-$
1) the discriminant is negative, so these eigenvalues are complex and lie on the circle $|\kappa_\pm| = 1/\sqrt{d-1}$. But when $c^2 > 4(d-$
1) they are real, and $\kappa_+ > 1/\sqrt{d-1}$ in absolute value. It remains to combine these eigenvalues of $Y$ with the non-backtracking matrix $B$. It is a consequence of Theorem 2 in [BC19] that, with high probability over $G$ sampled from the equitable SBM with $d < d^{\mathrm{eq}}_{\mathrm{KS}}$, the spectrum of the non-backtracking matrix consists of a "trivial" eigenvalue $d$, whose left and right eigenvectors are the all-ones vector, and remaining eigenvalues with modulus at most $\sqrt{d-1} + o_n(1)$ in the complex plane. Since perturbations along the uniform eigenvector of $B$ would violate the balance of colors, we are left with these remaining eigenvalues. Multiplying $\kappa_+$ by $\sqrt{d-1} + o_n(1)$ tells us that the Kesten–Stigum transition, where the largest eigenvalue of the Jacobian exceeds 1 in absolute value, occurs when $c \approx 2\sqrt{d-1}$, hiding $o_n(1)$ terms. Given (19) this is c ≈ 2((k−
1) + $\sqrt{k(k-2)}$), or
\[
d \approx 2(k-1)^2\left(1 + \frac{\sqrt{k(k-2)}}{k-1}\right) \approx \frac{2}{\lambda^2}\left(1 + \sqrt{1 - \lambda^2}\right) \qquad \text{where } \lambda = -1/(k-1).
\]
Solving for $\lambda$ in terms of $d$ gives
\[
|\lambda| = \frac{2\sqrt{d-1}}{d} + o_n(1). \tag{35}
\]

Generalizing to the Equitable Block Model

Let
\[
d = a + b(k-1). \tag{36}
\]
Recall that a legal labeling of the equitable block model on a $d$-regular graph is a $k$-coloring where each vertex has exactly $a$ neighbors of its own color, and exactly $b$ neighbors of each of the $k-1$ other colors. The case of proper $k$-colorings corresponds to $a = 0$ and $b = c$.

We can generalize the message-passing algorithm of the previous section as follows. Each directed edge $(u, v)$ again sends a message $\mu^{u\to v}$ to vertex $v$ consisting of the estimated probability $\mu^{u\to v}_{rs}$ that $u$ and $v$ have colors $r$ and $s$ respectively, for each $r, s \in [k]$. Vertex $v$ then sends out messages $\mu^{v\to w}$ to the directed edges $(v, w)$, which are computed as follows:

1. Give pairs $(s, t)$ the prior distribution
\[
P(s, t) = \frac{1}{kd}\begin{cases} a & s = t \\ b & s \neq t. \end{cases} \tag{37}
\]
2. Multiplicatively reweight this distribution by the probability that, if for each of $v$'s neighbors $u$ other than $w$ we choose a pair of colors $(r_u, s_u)$ independently from the distribution $\mu^{u\to v} = (\mu^{u\to v}_{rs})$, then:

• $s_u = s$ for all $u$, and

• (if $s = t$) $s$ appears $a - 1$ times in the list $(r_u)$, and every color other than $s$ appears $b$ times,

• (if $s \neq t$) $s$ appears $a$ times in the list $(r_u)$, $t$ appears $b - 1$ times, and the other colors appear $b$ times each.

3. Then $\mu^{v\to w} = (\mu^{v\to w}_{st})$ is the resulting posterior distribution of $(s, t)$.

Generalizing (21) and (22), we can write
\[
\mu^{v\to w}_{st} = \frac{\psi^{v\to w}_{st}}{z^{v\to w}} \tag{38}
\]
where
\[
\psi^{v\to w}_{st} = \sum_{(r_u :\, u \in \partial v \setminus w) \in [k]^{d-1}} \Big(\prod_u \mu^{u\to v}_{r_u,s}\Big) \times
\begin{cases}
a \prod_{c \in [k]} \mathbb{1}\!\left[\,|\{u : r_u = c\}| = a - 1 \text{ if } c = s, \text{ and } b \text{ otherwise}\,\right] & (s = t) \\[2pt]
b \prod_{c \in [k]} \mathbb{1}\!\left[\,|\{u : r_u = c\}| = a \text{ if } c = s,\ b - 1 \text{ if } c = t, \text{ and } b \text{ otherwise}\,\right] & (s \neq t)
\end{cases} \tag{39}
\]
and
\[
z^{v\to w} = \sum_{s,t \in [k]} \psi^{v\to w}_{st}. \tag{40}
\]
Note that the factors of $a$ and $b$ in (39), which did not appear in the coloring case, come from the prior distribution (37).
The reader can check that the prior distribution on $(s, t)$ is now a fixed point of this algorithm:
\[
\mu^{u\to v}_{st} = \frac{1}{kd}\begin{cases} a & s = t \\ b & s \neq t. \end{cases}
\]
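The fixed-point claim can be checked mechanically. The following sketch (ours, not the paper's; parameter choices $k = 3$, $a = 1$, $b = 2$, hence $d = 5$, are purely illustrative) implements the reweighting step (39)–(40) with exact rational arithmetic and confirms that the prior maps to itself:

```python
from fractions import Fraction
from itertools import product

# Illustrative parameters: k colors, a same-color and b other-color neighbors.
k, a, b = 3, 1, 2
d = a + b * (k - 1)  # degree, here 5

def prior(r, s):
    # Prior weight of the color pair (r, s), as in (37).
    return Fraction(a if r == s else b, k * d)

def update(mu):
    # One application of (39)-(40): the colors r_u of the d-1 other neighbors
    # are drawn from mu, conditioned on the exact counts the model requires.
    psi = {}
    for s, t in product(range(k), repeat=2):
        need = {c: (a if c == s else b) for c in range(k)}
        if s == t:
            need[s] = a - 1
        else:
            need[t] = b - 1
        total = Fraction(0)
        if min(need.values()) >= 0:
            for rs in product(range(k), repeat=d - 1):
                if all(rs.count(c) == need[c] for c in range(k)):
                    w = Fraction(1)
                    for r in rs:
                        w *= mu[r, s]
                    total += w
        psi[s, t] = (a if s == t else b) * total
    z = sum(psi.values())
    return {st: v / z for st, v in psi.items()}

mu0 = {(r, s): prior(r, s) for r, s in product(range(k), repeat=2)}
assert update(mu0) == mu0  # the prior is an exact fixed point
print("prior is a fixed point of the update")
```

Because the computation is done over `Fraction`s, the fixed-point identity holds exactly rather than up to floating-point error.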
At this fixed point, (39) gives
\[
\psi^{v\to w}_{st} = \begin{cases} a\psi & s = t \\ b\psi & s \neq t \end{cases} \tag{41}
\]
where
\[
\psi := \frac{a^{a-1}\, b^{b(k-1)}}{(kd)^{d-1}} \binom{d-1}{a-1,\, b,\, \dots,\, b}
= \frac{a^{a}\, b^{b(k-1)-1}}{(kd)^{d-1}} \binom{d-1}{a,\, b-1,\, b,\, \dots,\, b}
= \frac{a^{a}\, b^{b(k-1)}}{(kd)^{d-1}} \cdot \frac{(d-1)!}{a!\,(b!)^{k-1}},
\]
and where as always $0^0 = 0! = 1$.

We again consider perturbing the incoming message $\mu^{u\to v}$ for a pair $(r, s)$,
\[
\mu^{u\to v}_{r's'} = (1 + \varepsilon\,\delta_{rr'}\delta_{ss'})\,\frac{1}{kd}\begin{cases} a & r' = s' \\ b & r' \neq s'. \end{cases} \tag{42}
\]
As before, if $s' \neq s$ then $\psi^{v\to w}_{s't}$ is unchanged for all $t$, and is still given by (41). For the other cases where $s' = s$, let us first assume that $r \neq s$. There are now three cases: $t = s$ (now that some neighbors have the same color), $t = r$, and $t \notin \{r, s\}$. We have
\[
\psi^{v\to w}_{st} = \begin{cases}
a\psi\left(1 + \varepsilon\frac{b}{d-1}\right) & t = s \\[2pt]
b\psi\left(1 + \varepsilon\frac{b-1}{d-1}\right) & t = r \\[2pt]
b\psi\left(1 + \varepsilon\frac{b}{d-1}\right) & t \notin \{r, s\}
\end{cases} \qquad (r \neq s) \tag{43}
\]
where in each case we multiply by $1 + \varepsilon P$, with $P$ the fraction of the $d-1$ neighbors $u$ that have $r_u = r$. Summing over all $s', t$ gives
\[
z^{v\to w} = kd\psi + \varepsilon b\psi\left(\frac{b-1}{d-1} + (k-2)\frac{b}{d-1} + \frac{a}{d-1}\right) = kd\psi + \varepsilon b\psi = kd\psi\left(1 + \varepsilon\frac{b}{kd}\right) \qquad (r \neq s).
(44)

This multiplicative factor is $1 + \varepsilon P$, where $P = b/(kd)$ is the prior probability that a random edge $u \to v$ has colors $r \neq s$ on its endpoints.

Analogously, if $r = s = s'$ we have
\[
\psi^{v\to w}_{st} = \begin{cases}
a\psi\left(1 + \varepsilon\frac{a-1}{d-1}\right) & t = s \\[2pt]
b\psi\left(1 + \varepsilon\frac{a}{d-1}\right) & t \neq s
\end{cases} \qquad (r = s), \tag{45}
\]
and
\[
z^{v\to w} = kd\psi + \varepsilon a\psi\left(\frac{a-1}{d-1} + (k-1)\frac{b}{d-1}\right) = kd\psi + \varepsilon a\psi = kd\psi\left(1 + \varepsilon\frac{a}{kd}\right) \qquad (r = s), \tag{46}
\]
where the multiplicative factors are again $1 + \varepsilon P$, with $P$ first the fraction of the $d-1$ neighbors that have $r_u = r$, and then the prior probability $P = a/(kd)$ of an edge having colors $r = s$ on its endpoints.

Putting all this together generalizes (29) to
\[
\mu^{v\to w}_{s't} = \begin{cases}
\frac{a}{kd}\left(1 + \varepsilon\left(\delta_{ss'}\frac{a-1}{d-1} - \frac{a}{kd}\right) + O(\varepsilon^2)\right) & s' = t,\ r = s \\[2pt]
\frac{a}{kd}\left(1 + \varepsilon\left(\delta_{ss'}\frac{b}{d-1} - \frac{b}{kd}\right) + O(\varepsilon^2)\right) & s' = t,\ r \neq s \\[2pt]
\frac{b}{kd}\left(1 + \varepsilon\left(\delta_{ss'}\frac{a}{d-1} - \frac{a}{kd}\right) + O(\varepsilon^2)\right) & s' \neq t,\ r = s \\[2pt]
\frac{b}{kd}\left(1 + \varepsilon\left(\delta_{ss'}\frac{b - \delta_{rt}}{d-1} - \frac{b}{kd}\right) + O(\varepsilon^2)\right) & s' \neq t,\ r \neq s,
\end{cases} \tag{47}
\]
the fourth case of which coincides with (29) when $a = 0$ and $b = c$. Comparing with (42) and accounting for factors of $a$ and $b$ gives the matrix of partial derivatives,
\[
Y_{rs,s't} = -\frac{\delta_{ss'}\delta_{rt}}{d-1} + \left(\frac{\delta_{ss'}}{d-1} - \frac{1}{kd}\right)\begin{cases} a & s' = t \\ b & s' \neq t. \end{cases} \tag{48}
\]
The reader can check the normalization conditions: the rows of $Y$ sum to zero, so that the uniform vector is a right eigenvector with eigenvalue zero, and the columns are orthogonal to the prior distribution (37), so that it is a left eigenvector with eigenvalue zero. Thus (31) becomes
\[
\forall r, s:\ \sum_{s',t} Y_{rs,s't} = 0, \qquad \forall s', t:\ \sum_{r,s} Y_{rs,s't}\begin{cases} a & r = s \\ b & r \neq s \end{cases} = 0.
\]
(49)

At the risk of multiplying entities without necessity, we can also write $Y$ in the style of (32). If we think of $Y$'s action by right multiplication on $k^2$-dimensional vectors as a linear operator on $k \times k$ matrices $U = (U_{s't})$, then
\[
Y(U) = -\frac{1}{d-1}\,U^\top + \frac{1}{d-1}\,(\Upsilon(U)J)^\top - \frac{1}{kd}\,J\,\Upsilon(U)\,J, \tag{50}
\]
where $J$ is again the all-1s matrix and $\Upsilon$ is a linear operator on matrices that reweights diagonal and off-diagonal elements by $a$ and $b$ respectively,
\[
\Upsilon(U)_{s't} = U_{s't}\begin{cases} a & s' = t \\ b & s' \neq t. \end{cases} \tag{51}
\]
As in the coloring case, we use representation theory to diagonalize $Y$. The $k \times k$ matrices form a $k^2$-dimensional representation of $S_k$, where permutation matrices act by conjugation. Since this representation sends pairs of colors $(r, s)$ to $(\pi(r), \pi(s))$, it is the tensor product of the natural permutation representation with itself. The permutation representation is a direct sum of the trivial representation (spanned by the uniform vector) and the $(k-1)$-dimensional "standard" representation $\rho^{(k-1,1)}$; the tensor product therefore contains two copies of the trivial representation, three copies of $\rho^{(k-1,1)}$, and one copy each of $\rho^{(k-2,1,1)}$ and $\rho^{(k-2,2)}$, with dimensions $(k-1)(k-2)/2$ and $k(k-3)/2$ respectively. We go through each of these subspaces, focusing on $Y$'s right eigenvectors.

First, the trivial subspace is spanned by the identity matrix $\mathbb{1} = (\delta_{s't})$ and the all-1s matrix $J$. As in (49), $J$ is a right eigenvector with eigenvalue zero, so $Y(J) = 0$. Observing (50), we have $\mathbb{1}^\top = \mathbb{1}$, $\Upsilon(\mathbb{1}) = a\mathbb{1}$, $\mathbb{1}J = J$, and $J\mathbb{1}J = kJ$. This gives
\[
Y(\mathbb{1}) = -\frac{1}{d-1}\,\mathbb{1} + \left(\frac{a}{d-1} - \frac{a}{d}\right)J = -\frac{1}{d-1}\,\mathbb{1} + \frac{a}{d(d-1)}\,J.
\]
Thus on the span of $\mathbb{1}$ and $J$, $Y$ acts as the matrix
\[
\frac{1}{d-1}\begin{pmatrix} -1 & 0 \\ a/d & 0 \end{pmatrix},
\]
giving the eigenvalues −1/(d−
1) and 0.

Next, as before, the copy of $\rho^{(k-2,1,1)}$ consists of antisymmetric matrices $U$ with zero row and column sums. For these matrices we have $\Upsilon(U) = bU$ and $UJ = 0$, while $U^\top = -U$. Thus they are again eigenvectors of $Y$ with eigenvalue $+1/(d-1)$. Similarly, the copy of $\rho^{(k-2,2)}$ consists of symmetric matrices $U$ with zero row and column sums and zeroes on the diagonal. Now we have $\Upsilon(U) = bU$, $UJ = 0$, and $U^\top = U$, and (50) again makes them eigenvectors with eigenvalue $-1/(d-1)$.

Finally, the isotypic subspace of the standard representation consists of the matrices
\[
U_{ij} = \begin{cases}
\delta & i = j = 1 \\
\zeta & i = j \neq 1 \\
\alpha & i = 1,\ j \neq 1 \\
\beta & j = 1,\ i \neq 1 \\
\gamma & i, j \neq 1,\ i \neq j
\end{cases} \qquad \text{where } \gamma = -\frac{\alpha + \beta}{k-2},\quad \zeta = -\frac{\delta}{k-1}, \tag{52}
\]
and their images under conjugation by permutation matrices, i.e., where the "special" row and column ranges from 1 to $k$. That is,
\[
U = \begin{pmatrix}
\delta & \alpha & \alpha & \cdots & \alpha \\
\beta & \zeta & \gamma & \cdots & \gamma \\
\beta & \gamma & \zeta & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \gamma \\
\beta & \gamma & \cdots & \gamma & \zeta
\end{pmatrix}
\]
where $\gamma$ and $\zeta$ are set so that $U$'s entries sum to zero and $U$ has zero trace. In particular, $U$ is orthogonal to both $J$ and $\mathbb{1}$, and hence to the trivial subspace. The reader can confirm that it is orthogonal to $\rho^{(k-2,1,1)}$ and $\rho^{(k-2,2)}$ as well.

Using (50) and a little work, we find that $Y(U)$ is also of this form but with entries $\alpha'$, $\beta'$, $\delta'$, where
\[
\begin{pmatrix} \alpha' \\ \beta' \\ \delta' \end{pmatrix} = m \cdot \begin{pmatrix} \alpha \\ \beta \\ \delta \end{pmatrix} \qquad \text{where } m = \frac{1}{d-1}\begin{pmatrix} -b & -1 & -a/(k-1) \\ b(k-1) - 1 & 0 & a \\ b(k-1) & 0 & a - 1 \end{pmatrix}. \tag{53}
\]
Thus $Y$ on this isotypic subspace is $m \otimes \mathbb{1}$ where $\mathbb{1}$ is the $(k-1)$-dimensional identity, so the remaining eigenvalues of $Y$ are those of $m$. One of these is $-1/(d-1)$; the other two are the roots $\kappa$ of $(d-1)\kappa^2 - (a-b)\kappa + 1 = 0$, namely
\[
\kappa_\pm = \frac{a - b \pm \sqrt{(a-b)^2 - 4(d-1)}}{2(d-1)}.
\]
Analogously to the coloring case, if $(a-b)^2 < 4(d-$
1) these eigenvalues are complex and lie on the circle $|\kappa_\pm| = 1/\sqrt{d-1}$. But when $(a-b)^2 > 4(d-1)$ they are real, with $\kappa_+ > 1/\sqrt{d-1}$. As before, we multiply $\kappa_+$ by the modulus of the largest non-trivial eigenvalue of the non-backtracking matrix, $\sqrt{d-1} + o_n(1)$, to obtain the dominant eigenvalue of the Jacobian of belief propagation. The Kesten–Stigum transition occurs when this eigenvalue exceeds the unit circle, i.e. when $(a-b)^2 = 4(d-$
1) + $o_n(1)$. Since in the equitable block model we have $\lambda = (a - b)/d$, this again occurs at
\[
|\lambda| = \frac{2\sqrt{d-1}}{d} + o_n(1).
\]

Throughout this section, we will for the sake of brevity write $Q = (Q_n)$ for the uniform distribution $G_{n,d}$ on $d$-regular graphs, and $P = (P_n)$ for the equitable stochastic block model $G^{\mathrm{eq}}_{n,k,d,\eta}$ from Definition 1.1. As in the preceding text, we are most concerned with the behavior of the null and planted models when the number of vertices is very large, and we will write with high probability (w.h.p.) to describe a sequence of events that hold with probability $1 - o_n(1)$ in $P_n$ or $Q_n$ as $n \to \infty$, with the other parameters $(d, k, \eta)$ held fixed. The constant in the $o_n(1)$ may depend on these other parameters, and we will not make any attempt to quantify its rate, leaving us free to take union bounds over constantly many events.

In this section we study a family of semidefinite programming algorithms for the $P$ vs. $Q$ distinguishing problem. Like Sum of Squares, the Local Statistics algorithm is phrased in the language of polynomials. Let us define a set of variables $x = \{x_{u,i}\}$ indexed by vertices $u \in [n]$ and group labels $i \in [k]$, and $G = \{G_{u,v}\}$ indexed by pairs of distinct vertices. We think of the planted model as outputting a random evaluation of these variables, namely a pair $(x, G)$, where $x \in \mathbb{R}^{n \times k}$ encodes the hidden community structure (with $x_{u,i} = 1$ if $\sigma(u) = i$ and zero otherwise) and $G \in \mathbb{R}^{\binom{[n]}{2}}$ is the Boolean vector indicating which edges are present in the graph. This allows us to regard polynomials $p \in \mathbb{R}[x, G]$ as statistics of the planted distribution $P$, and we will in particular focus on the quantities $\mathbb{E}_{(x,G)\sim P}\, p(x, G)$.

The planted model outputs random variables $x$ and $G$ with a particular combinatorial structure: each variable is $\{0,1\}$-valued, and each vertex has exactly one label.
This can be encoded in a set of polynomial constraints:
\[
G_{u,v}^2 - G_{u,v} = 0 \quad \forall (u,v) \in \binom{[n]}{2}, \qquad
x_{u,i}^2 - x_{u,i} = 0 \quad \forall u \in [n],\ i \in [k], \qquad
\sum_{i \in [k]} x_{u,i} - 1 = 0 \quad \forall u \in [n].
\]
Calling $I_k$ the ideal of $\mathbb{R}[x, G]$ generated by the polynomials on the left-hand side of the equations above, for any $p \in I_k$ we have $p(x, G) = 0$. Moreover, the planted distribution has a pleasant symmetry property: the symmetric group $S_n$ acts naturally and simultaneously on the variables $x$ and $G$, with a permutation $\xi$ acting as $x_{u,i} \mapsto x_{\xi(u),i}$ and $G_{u,v} \mapsto G_{\xi(u),\xi(v)}$, and for any polynomial $p \in \mathbb{R}[x, G]$, the expectation $\mathbb{E}_{(x,G)\sim P}\, p(x, G)$ is constant on the orbits of this action. The Local Statistics algorithm, given as input a graph $G$, endeavors to find a "pseudoexpectation" that mimics the conditional expectation $\mathbb{E}[\,\cdot \mid G]$ on polynomials $p(x, G)$ of sufficiently low degree.

Definition 5.1 (Local Statistics Algorithm with Informal Moment Constraints). The degree-$(D_x, D_G)$ Local Statistics algorithm is the following SDP: given an input graph $G$, find $\tilde{\mathbb{E}} : \mathbb{R}[x]_{\le D_x} \to \mathbb{R}$ s.t.

1. (Positivity) $\tilde{\mathbb{E}}\, p(x)^2 \ge 0$ whenever $\deg p \le D_x/2$,
2. (Hard Constraints) $\tilde{\mathbb{E}}\, p(x, G) = 0$ for every $p \in I_k$,
3. (Moment Constraints) $\tilde{\mathbb{E}}\, p(x, G) \approx \mathbb{E}_{(x,G)\sim P}\, p(x, G)$ whenever $\deg_G p(x, G) \le D_G$, $\deg_x p(x, G) \le D_x$, and $p$ is fixed under the $S_n$ action.

See, e.g., the survey [Lau09] for a detailed discussion of how optimization problems of this form can be solved as SDPs.

We use the Local Statistics SDP for the $P$ vs. $Q$ hypothesis testing problem as follows: given $G$ sampled from one of these two distributions, we run Local Statistics, outputting $P$ if the SDP is feasible, and $Q$ otherwise. The symbol $\approx$ in the moment constraints indicates that we will permit some additive error; this is necessary so that, when $G \sim P$, the SDP is with high probability satisfiable, by setting $\tilde{\mathbb{E}}\, p(x, G) = \mathbb{E}\, p(x, G)$. In fact, we will instantiate these moment constraints only on the elements of a certain combinatorially meaningful basis, and we will allow different additive error for different basis elements. In so doing we automatically satisfy positivity and the hard constraints, and the additive slack allows for fluctuations of the $p(x, G)$ around their expectations. When we make this precise below, we will write this additive slack in terms of an 'error tolerance' $\delta > 0$.

Theorem 5.2.
Let $Q$ and $P$ be as in Definition 1.1. If $(d\eta)^2 > 4(d-1)$ then there exist $D$ and $\delta > 0$ so that the degree-$(2, D)$ Local Statistics algorithm with error tolerance $\delta$ can distinguish $P$ and $Q$. If $(d\eta)^2 \le 4(d-1)$, then there do not exist such a $D$ and $\delta$.

The proof of Theorem 5.2 closely follows [BMR19]. As in that work, we will first study a simpler SDP hierarchy for the $P$ vs. $Q$ hypothesis testing problem, and then show that its feasibility is equivalent to that of the degree-$(2, D)$ Local Statistics SDP. We begin with some standard facts about non-backtracking walks, which will be a central tool in our analysis.

Let $A_G$ be the adjacency matrix of a $d$-regular graph $G$ (which may have self-loops and multi-edges). A length-$s$ non-backtracking walk on $G$ is an alternating sequence of vertices and edges $v_0, e_1, v_1, e_2, \dots, v_s$ without subsequences of the form $v, e, w, e, v$. The matrices $A^{(s)}_G$, whose $(u, v)$ entries count the number of such walks between vertices $u$ and $v$, are given by
\[
A^{(0)}_G = \mathbb{1}, \qquad A^{(1)}_G = A_G, \qquad A^{(2)}_G = A_G^2 - d\,\mathbb{1}, \qquad
A^{(s+1)}_G = A_G A^{(s)}_G - (d-1)\,A^{(s-1)}_G \quad (s \ge 2).
\]
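As a quick sanity check (ours, not the paper's), the recursion can be compared against brute-force enumeration of non-backtracking walks on a small graph, here the complete graph $K_4$, which is 3-regular:

```python
import numpy as np
from itertools import product

n, d = 4, 3
A = np.ones((n, n), dtype=int) - np.eye(n, dtype=int)  # adjacency of K_4

def nb_matrices(smax):
    # A^(s) via the recursion: A^(0)=I, A^(1)=A, A^(2)=A^2-dI,
    # A^(s+1) = A A^(s) - (d-1) A^(s-1) for s >= 2.
    mats = [np.eye(n, dtype=int), A.copy(), A @ A - d * np.eye(n, dtype=int)]
    while len(mats) <= smax:
        mats.append(A @ mats[-1] - (d - 1) * mats[-2])
    return mats

def nb_count(u, v, s):
    # Brute force: extend walks one step, never reversing the previous edge.
    walks = [[u]]
    for _ in range(s):
        walks = [w + [x] for w in walks for x in range(n)
                 if A[w[-1], x] and (len(w) < 2 or x != w[-2])]
    return sum(w[-1] == v for w in walks)

mats = nb_matrices(5)
for s in range(6):
    for u, v in product(range(n), repeat=2):
        assert mats[s][u, v] == nb_count(u, v, s)
print("recursion matches brute-force non-backtracking counts up to length 5")
```

The subtraction terms in the recursion are exactly what remove the walks that step back along the edge just used.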
In particular, $A^{(s)}_G = q_s(A_G)$ for a sequence of monic univariate polynomials $q_s \in \mathbb{R}[z]$ with $\deg q_s = s$, which are known to be orthogonal with respect to the Kesten-McKay measure
\[
d\mu_{\mathrm{km}}(z) := \frac{d}{2\pi}\cdot\frac{\sqrt{4(d-1) - z^2}}{d^2 - z^2}\cdot\mathbb{1}\!\left[|z| < 2\sqrt{d-1}\right] dz
\]
on the interval $(-2\sqrt{d-1}, 2\sqrt{d-1})$. The $q_s$ are a basis for the Hilbert space of square-integrable functions on this interval, equipped with the inner product
\[
\langle f, g \rangle_{\mathrm{km}} := \int f(z)\,g(z)\, d\mu_{\mathrm{km}}(z)
\]
and associated norm $\|f\|_{\mathrm{km}}^2 := \langle f, f\rangle_{\mathrm{km}} = \int f(z)^2\, d\mu_{\mathrm{km}}(z)$; in particular, any polynomial $f \in \mathbb{R}[z]$ has the orthogonal decomposition
\[
f = \sum_{s \ge 0} \frac{\langle f, q_s\rangle_{\mathrm{km}}}{\|q_s\|_{\mathrm{km}}^2}\, q_s.
\]
We record for later use that
\[
\|q_s\|_{\mathrm{km}}^2 = q_s(d) = \begin{cases} 1 & s = 0 \\ d(d-1)^{s-1} & s \ge 1, \end{cases}
\]
which is the number of length-$s$ non-backtracking walks from a fixed vertex in a rooted $d$-regular tree, or equivalently $n^{-1}$ times the total number of length-$s$ non-backtracking walks in a $d$-regular graph on $n$ vertices. For a derivation of this and other related facts, the reader may refer to [Sol96] or [Sod07], but should beware of differing normalization conventions.

We will also require some standard and generic properties of sequences of univariate polynomials orthogonal with respect to a measure on an interval of $\mathbb{R}$ [Sze39, Theorems 3.3.1, 6.6.1, and 3.4.1-2]: each $q_s$ has $s$ distinct roots in the interval $(-2\sqrt{d-1}, 2\sqrt{d-1})$, and the following quadrature property holds for each $s \ge 1$.

Lemma 5.3 (Quadrature). For each $s$, call $r_1 < r_2 < \cdots < r_s$ the roots of $q_s$. There exist weights $w_1, \dots, w_s \ge 0$ with the property that
\[
\langle f, g\rangle_{\mathrm{km}} = \sum_{i \in [s]} f(r_i)\, g(r_i)\, w_i
\]
for every pair of polynomials $f, g$ whose degrees sum to at most $2s - 1$.

In this section we study a simplified version of the Local Statistics SDP, which will ultimately be key to our analysis of the full Local Statistics SDP. Let $(x, G) \sim P$, thinking of $x$ as a collection of $k$ vectors $x_1, \dots, x_k \in \{0,1\}^n$.
One can check that the partition matrix for the planted labelling, in the sense of Definition 2.1, is
\[
P := \frac{k}{k-1} \sum_{i \in [k]} x_i x_i^\top - \frac{1}{k-1} J;
\]
since $x_1 + \cdots + x_k = \mathbb{1}$, we as well have
\[
P = \frac{k}{k-1} \sum_{i \in [k]} x_i x_i^\top - \frac{1}{k-1} \sum_{i,j} x_i x_j^\top.
\]
As we observed in the Introduction, $P$ is PSD with ones on the diagonal, and $\frac{k-1}{n} P$ is the orthogonal projector onto the $(k-1)$-dimensional space spanned by $x_1, \dots, x_k$ and orthogonal to the vector of all-ones. We will be particularly interested in the inner products $\langle P, A^{(s)}_G\rangle$, which count non-backtracking walks on $G$ with weight 1 if the endpoints share a group label, and $-(k-1)^{-1}$ if they do not. The following is a consequence of Lemma 5.14 in the sequel.

Lemma 5.4.
For every $s \ge 1$ and every increasing, nonnegative function $\Delta(n)$,
\[
P\left[\,\left|\langle P, A^{(s)}_G\rangle - q_s(d\eta)\, n\right| > \Delta(n)\,\right] = O\!\left(\frac{n}{\Delta(n)^2}\right).
\]
The
Symmetric Path Statistics SDP, given as input a graph $G$, attempts to find a "pseudo-partition matrix," i.e. a PSD matrix with ones on the diagonal whose inner products with the matrices $A^{(s)}_G$ are equal to $q_s(d\eta)\,n$, at least up to the fluctuations in Lemma 5.4.

Definition 5.5 (Symmetric Path Statistics). The level-$D$ Symmetric Path Statistics algorithm with error tolerance $\delta > 0$, on input a $d$-regular graph $G$ on $n$ vertices, is the following SDP: find $\tilde P \succeq 0$ so that

1. $\tilde P_{u,u} = 1$ for every $u \in [n]$,

2. $\langle \tilde P, J_n\rangle \in [-\delta n, \delta n]$,

3. $\langle \tilde P, A^{(s)}_G\rangle \in q_s(d\eta)\, n + [-\delta n, \delta n]$ for every $s \in [D]$.

Theorem 5.6.
Let $Q$ and $P$ be as in Definition 1.1. If $(d\eta)^2 > 4(d-1)$, then for every sufficiently large $D$ there exists an error tolerance $\delta > 0$ at which the level-$D$ Symmetric Path Statistics SDP can w.h.p. distinguish $P$ and $Q$. If $(d\eta)^2 \le 4(d-1)$, then no such $D$ and $\delta$ exist.

Proof. By Lemma 5.4, this SDP is with high probability feasible on input $G \sim P$. Our proof will therefore show that when $\eta$ is sufficiently large, the SDP for some constant $D$ is infeasible on input $G \sim Q$, whereas for $\eta$ sufficiently small, it is feasible for every constant $D$.

First, fix $D \ge 2$ and assume $(d\eta)^2 > 4(d-1)$. We will show that for $\delta > 0$ sufficiently small, the level-$D$ Symmetric Path Statistics SDP is infeasible on input $G \sim Q$. Our strategy will be to find a polynomial $f$ with the property that $f(A_G) \succeq 0$ and yet $\langle \tilde P, f(A_G)\rangle < 0$ for any feasible $\tilde P$. Let $f$ be a degree-$D$ polynomial which is strictly positive on the closed interval $[-2\sqrt{d-1}, 2\sqrt{d-1}]$ but with $f(d\eta) <$
0; our assumption on $\eta$ ensures that this is possible, for instance by setting $f(z) = 2(d-1) + (d\eta)^2/2 - z^2$. From our preliminaries on non-backtracking walks and the polynomials $q_s$ we know
\[
f = \sum_{s=0}^{D} \frac{\langle f, q_s\rangle_{\mathrm{km}}}{\|q_s\|_{\mathrm{km}}^2}\, q_s.
\]
When $G \sim Q$, $A_G$ has an eigenvalue at $d$ whose eigenvector is the all-ones vector, and by Friedman's Theorem [Fri03] its remaining eigenvalues with high probability have absolute value at most $2\sqrt{d-1} + o(1)$. Our assumptions on $f$ therefore imply that, with high probability, $f(A_G) - f(d)J/n \succeq 0$: this matrix vanishes on the all-ones vector, and $f$ is strictly positive on the rest of the spectrum. Thus if $\tilde P \succeq 0$ is feasible for the level-$D$ Symmetric Path Statistics SDP on input $G \sim Q_n$, then
\[
0 \le \langle \tilde P, f(A_G) - f(d)J/n\rangle
\le \sum_{s=0}^{D} \frac{\langle f, q_s\rangle_{\mathrm{km}}}{\|q_s\|_{\mathrm{km}}^2}\, \langle \tilde P, q_s(A_G)\rangle + \delta |f(d)|\, n
\le \sum_{s=0}^{D} \frac{\langle f, q_s\rangle_{\mathrm{km}}}{\|q_s\|_{\mathrm{km}}^2}\, q_s(d\eta)\, n + \delta\big(\|f\|_{\mathrm{km}} + |f(d)|\big) n
= \big(f(d\eta) + \delta(\|f\|_{\mathrm{km}} + |f(d)|)\big) n < 0,
\]
if we set $\delta$ sufficiently small.

We can now turn to the case $(d\eta)^2 \le 4(d-1)$, in which we must show that, when $G \sim Q$, with high probability every level of the Symmetric Path Statistics hierarchy is feasible, for every error tolerance $\delta$. We will use the following lemma, which may be proved by adapting the proof of [BMR19, Proposition 4.8]. The proof proceeds by setting $\tilde P$ equal to a mild modification of the matrix $g(A_G) - g(d)J_n/n$.

Lemma 5.7.
Assume there exists a constant-degree polynomial $g \in \mathbb{R}[z]$ that is strictly positive on $[-2\sqrt{d-1}, 2\sqrt{d-1}]$ and satisfies
\[
\langle g, q_s\rangle_{\mathrm{km}} \in q_s(d\eta) + [-\delta, \delta]
\]
for every $s = 1, \dots, D$. Then the level-$D$ Symmetric Path Statistics SDP with error tolerance $\delta$ is w.h.p. feasible on input $G \sim Q$.

With Lemma 5.7 in hand, we need only to construct such a polynomial. Assume that $(d\eta)^2 \le 4(d-1)$, so that $d\eta$ lies in the closed interval $[-2\sqrt{d-1}, 2\sqrt{d-1}]$ containing the roots of the $q_s$. Let $P \gg D$, and write $r_1 < r_2 < \cdots < r_P$ for the roots of $q_P$. Let $I \subset [P]$ contain the indices of the $(D+1)/2$ roots of $q_P$ closest to $d\eta$, and set
\[
g_\eta = \frac{1}{\zeta} \prod_{i \notin I} (z - r_i)^2,
\]
where $\zeta$ is a normalizing factor ensuring that $\langle g_\eta, 1\rangle_{\mathrm{km}} = 1$. This polynomial is certainly nonnegative, and its degree is $2P - D -$
1. From Lemma 5.3, then, there exist $w_1, \dots, w_P \ge 0$ such that for every $s = 0, \dots, D$,
\[
\langle g_\eta, q_s\rangle_{\mathrm{km}} = \sum_{i \in I} w_i\, g_\eta(r_i)\, q_s(r_i).
\]
In particular, setting $s = 0$ and recalling the definition of $\zeta$, we have
\[
1 = \langle g_\eta, 1\rangle_{\mathrm{km}} = \sum_{i \in I} w_i\, g_\eta(r_i).
\]
(The referenced result in [BMR19] was proved for an SDP which shared constraints (1) and (2) from Definition 5.5, but in which constraint (3) read $\langle P, A^{(s)}_G\rangle = \lambda^s \|q_s\|_{\mathrm{km}}^2\, n$. The proof may be adapted simply by adopting the hypotheses below, changing every instance of the aforementioned constraints, and taking some care with the $\delta$ slack.)

Thus for each $s = 0, \dots, D$, the inner product $\langle g_\eta, q_s\rangle$ is a weighted average of $q_s$ evaluated at the $(D+1)/2$ roots of $q_P$ closest to the point $d\eta$. Since the roots of the $q$ polynomials are dense in $(-2\sqrt{d-1}, 2\sqrt{d-1})$ and $d\eta$ is in the closure of this interval, for each $D$ and $\delta >$
0, there exists a constant $P$ for which $|\langle g_\eta, q_s\rangle_{\mathrm{km}} - q_s(d\eta)| < \delta$ for every $s = 0, \dots, D$.

We are now ready to study the full Local Statistics algorithm. To start, we will need to develop a basis for the symmetric polynomials appearing as affine moment-matching constraints in Definition 5.1. Because $\mathbb{E}\, p(x, G) = 0$ for any $p \in I_k$, a constraint shared by the pseudoexpectation, it would suffice to study the subspace of $\mathbb{R}[x, G]/I_k$ fixed under the $S_n$ action inherited from $\mathbb{R}[x, G]$. However, it is computationally favorable to work in a slightly larger vector space instead.

Definition 5.8.
Let us write $S[x, G]_{\le D_x, D_G} \subset \mathbb{R}[x, G]$ for the vector space of polynomials that (1) satisfy $\deg_x \le D_x$ and $\deg_G \le D_G$, (2) are symmetric with respect to the $S_n$ action, (3) are multilinear in $G$ and $x$, and (4) for which at most one of $x_{u,1}, \dots, x_{u,k}$ appears in each monomial, for every $u \in V$.

Every polynomial appearing as a moment constraint in the level-$(D_x, D_G)$ Local Statistics algorithm belongs to $S[x, G]_{\le D_x, D_G}$. We now give a combinatorially structured basis for this vector space, similar to the 'shapes' of [BHK+].

Definition 5.9.
A partially labelled graph $(H, S, \tau)$ consists of a graph $H$, a subset $S \subset V(H)$, and a map $\tau : S \to [k]$; we say that a graph is fully labelled if $S = V(H)$, and in this case write $(H, \tau)$ for short. A homomorphism from $(H, S, \tau)$ into a fully labelled graph $(G, \sigma)$ is a map $\varphi : V(H) \to V(G)$ that takes edges to edges and agrees on labels; an occurrence of $(H, S, \tau)$ in $(G, \sigma)$ is an injective homomorphism. For each partially labelled graph $(H, S, \tau)$ there is an associated polynomial in $S[x, G]$,
\[
p_{(H,S,\tau)}(x, G) = \sum_{\varphi : V(H) \hookrightarrow [n]} \prod_{(u,v) \in E(H)} G_{\varphi(u),\varphi(v)} \prod_{u \in S} x_{\varphi(u),\tau(u)}.
\]
Each point in the zero locus of $I_k$ may be identified with a fully labelled graph $(G, \sigma)$. Evaluated at such a point, this polynomial counts the number of occurrences of $(H, S, \tau)$ in $(G, \sigma)$. Finally, $\deg_x p_{(H,S,\tau)} = |S|$ and $\deg_G p_{(H,S,\tau)} = |E(H)|$.

Lemma 5.10.
The polynomials $p_{(H,S,\tau)}$ with $|E(H)| \le D_G$ and $|S| \le D_x$ are a vector space basis for $S[x, G]_{\le D_x, D_G}$.

Proof.
Let $s : \mathbb{R}[x, G] \to S[x, G]$ be the map that sends a polynomial $p$ to the sum over its $S_n$ orbit. The vector space $S[x, G]_{\le D_x, D_G}$ is spanned by the images under $s$ of the multilinear monomials with $x$-degree at most $D_x$ and $G$-degree at most $D_G$ in which at most one of $x_{u,1}, \dots, x_{u,k}$ appears for each $u \in [n]$. From each such monomial $m(x, G)$ one can extract a partially labelled graph $(H_m, S_m, \tau_m)$, where $|S_m| \le D_x$, $|E(H_m)| \le D_G$, and $H_m$ is a subgraph of the complete graph: $E(H_m)$ is the union of all pairs $(u, v)$ appearing as an index of a $G$ variable, $S_m$ is the union of all $u$ occurring as an index of an $x$ variable, $V(H_m)$ is the union of $S_m$ and the endpoints of every edge, and $\tau(u) = i$ if the variable $x_{u,i}$ appears. Because $S_n$ acts transitively on $[n]$, the orbit of $m(x, G)$ corresponds to every possible injection $V(H_m) \hookrightarrow [n]$, and thus
\[
s(m(x, G)) = p_{(H_m, S_m, \tau_m)}(x, G).
\]
This shows that the $p_{(H,S,\tau)}$ span. To see that they are independent, observe that each monomial appears as a term in exactly one $p_{(H,S,\tau)}$.

In view of this lemma, the moment constraints in Definition 5.1 are equivalent to the requirement that $\tilde{\mathbb{E}}\, p_{(H,S,\tau)}(x, G) \approx \mathbb{E}\, p_{(H,S,\tau)}(x, G)$ for every $(H, S, \tau)$ with at most $D_G$ edges and $D_x$ distinguished vertices, up to isomorphism. In order to instantiate and analyze the Local Statistics algorithm, we now need to compute these expectations, and bound the fluctuations around them.

Instead of working directly in $P$, we will as usual work in the configuration model $\hat P$, a distribution on multigraphs with two key properties: (i) with probability bounded away from zero as $n \to \infty$, $\hat G \sim \hat P$ is simple, and (ii) the conditional distribution of $\hat P$ on this event is equal to $P$.
In this model, conditional on a balanced partition $\sigma$, we adorn each vertex in every group $i$ with $dM_{i,j}$ 'half-edges' labelled $i \to j$ for each $j \in [k]$, and then for each $i, j \in [k]$ randomly match the $i \to j$ half-edges with the $j \to i$ half-edges. When $k = 1$, this is the usual configuration model on $d$-regular graphs, and when $k = 2$ and $M_{i,i} = 0$, it gives bipartite regular graphs. Thus many results we prove for $P$ will apply to $Q$ as well.

Claim 5.11.
Write $\mathcal{G}_n$ and $\hat{\mathcal{G}}_n$ for the sets of all $d$-regular, $n$-vertex graphs and multigraphs, respectively. If $\hat E_n \subset \hat{\mathcal{G}}_n$ is a sequence of events holding w.h.p. in $\hat P_n$, then $\hat E_n \cap \mathcal{G}_n$ holds w.h.p. in $P_n$.

Proof. Since $\hat P_n[\mathcal{G}_n]$ is bounded away from zero and $\hat P_n[\hat E_n] = 1 - o_n(1)$, we have
\[
P_n[\hat E_n \cap \mathcal{G}_n] = \frac{\hat P_n[\hat E_n \cap \mathcal{G}_n]}{\hat P_n[\mathcal{G}_n]} \ge \frac{\hat P_n[\mathcal{G}_n] - o_n(1)}{\hat P_n[\mathcal{G}_n]} = 1 - o_n(1).
\]

As a warm-up, let us recall some standard calculations of subgraph probabilities in these two simpler situations.
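The $k = 1$ case of the configuration model just described can be sampled in a few lines by pairing half-edges uniformly at random. The following sketch (variable names and parameters are ours) does so and checks one such standard calculation: the expected number of edges between a fixed pair of vertices is exactly $d \cdot d/(dn - 1)$, since each of the $d$ stubs at one vertex is matched to one of the $dn - 1$ remaining stubs, $d$ of which sit at the other vertex.

```python
import random

def config_model(n, d, rng):
    # k = 1 configuration model: shuffle the dn half-edges ("stubs") and pair
    # them off consecutively, giving a uniform perfect matching. The result
    # may contain self-loops and multi-edges.
    stubs = [v for v in range(n) for _ in range(d)]
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))

n, d, trials = 20, 3, 20000
rng = random.Random(0)
hits = 0
for _ in range(trials):
    edges = config_model(n, d, rng)
    hits += sum(1 for u, v in edges if {u, v} == {0, 1})  # with multiplicity

exact = d * d / (d * n - 1)
print(f"empirical {hits / trials:.4f} vs exact {exact:.4f}")
assert abs(hits / trials - exact) < 0.02
```

This is the $|E(H)| = 1$ instance of the first formula in Lemma 5.12 below, where the double-factorial ratio collapses to $1/(dn - 1)$.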
Lemma 5.12.
Let $\hat G$ be a multigraph produced by the $d$-regular configuration model on $n$ vertices. If $H$ is a simple graph, the probability that a fixed injection $\varphi : V(H) \hookrightarrow V(\hat G)$ is a homomorphism is
\[
\prod_{v \in V(H)} \frac{d!}{(d - \deg(v))!} \cdot \frac{(dn - 2|E(H)| - 1)!!}{(dn - 1)!!}
= \prod_{v \in V(H)} \frac{d!}{(d - \deg(v))!} \cdot (dn)^{-|E(H)|} + O\big(n^{-|E(H)|-1}\big).
\]
Similarly, let $(\hat G, \sigma)$ be generated by the $d$-biregular configuration model on $n$ vertices, with $\sigma : V(\hat G) \to [2]$ its left/right labelling. If $(H, \tau)$ is a simple bipartite graph with left/right labelling $\tau$, then the probability that a fixed injection $\varphi : V(H) \hookrightarrow V(\hat G)$ agreeing on labels is a homomorphism is
\[
\prod_{v \in V(H)} \frac{d!}{(d - \deg(v))!} \cdot \frac{(dn - |E(H)|)!}{(dn)!}
= \prod_{v \in V(H)} \frac{d!}{(d - \deg(v))!} \cdot (dn)^{-|E(H)|} + O\big(n^{-|E(H)|-1}\big).
\]

Sketch.
In the case of the $d$-regular configuration model, once an injective $\varphi$ has been chosen, there are $\prod_v \frac{d!}{(d - \deg(v))!}$ ways to choose $\deg(v)$ stubs from the $d$ available at each vertex $v$ to be matched with the appropriate stubs at each intended neighbor of $v$, and then $\frac{(dn - 2|E(H)| - 1)!!}{(dn - 1)!!}$ is the probability that the uniform matching pairs these stubs accordingly. The biregular case is analogous.

Now let $(H, S, \tau)$ be a partially labelled graph, and $\hat\tau$ an extension of $\tau$, i.e. $\hat\tau : V(H) \to [k]$ with $\hat\tau|_S = \tau$; for each $i, j \in [k]$ and $u \in V_i(H)$, write $\deg_j(u)$ for the number of neighbors that $u$ has in group $j$ according to $\hat\tau$. Let us define
\[
(dM)^{(H,S)}_\tau := \sum_{\hat\tau :\, \hat\tau|_S = \tau} \frac{\prod_{v} \prod_{j \in [k]} dM_{\hat\tau(v),j}\,(dM_{\hat\tau(v),j} - 1)\cdots(dM_{\hat\tau(v),j} - \deg_j(v) + 1)}{\prod_{(u,v) \in E(H)} dM_{\hat\tau(u),\hat\tau(v)}} \tag{54}
\]
if $dM_{i,j} \ge \deg_j(v)$ for every $i, j \in [k]$ and $v \in \hat\tau^{-1}(i)$, and zero otherwise. Note that this operation is multiplicative on disjoint unions:
\[
(dM)^{(H_1 \sqcup H_2,\, S_1 \sqcup S_2)}_{\tau_1 \sqcup \tau_2} = (dM)^{(H_1,S_1)}_{\tau_1}\, (dM)^{(H_2,S_2)}_{\tau_2}.
\]
Finally, let us write $\chi(H) := |V(H)| - |E(H)|$ and $cc(H)$ for the number of connected components. Since we are aiming to prove high-probability statements regarding $p_{(H,S,\tau)}(x, G)$ for $(x, G) \sim P$ by studying the configuration model, we need to extend the quantities $p_{(H,S,\tau)}(x, G)$ to the case when $G$ is a multigraph with self-loops. For convenience, we will define an occurrence of $(H, S, \tau)$ in a fully labelled, loopy multigraph as an occurrence of $(H, S, \tau)$ in the simple graph obtained by deleting all self-loops and merging all multiedges between each pair of vertices. Our key lemma computes the expected number of occurrences in the configuration model.
Lemma 5.13.
Let $(H, S, \tau)$ be a partially labelled graph on $O(1)$ edges, and $(x, \hat G)$ be drawn from the configuration model $\hat P$. Then
\[
\mathbb{E}\, p_{(H,S,\tau)}(x, \hat G) = (n/k)^{\chi(H)}\, (dM)^{(H,S)}_\tau + O\big(n^{\chi(H)-1}\big).
\]

Proof.
Let $V(\hat G) = [n]$, fix a labelling $\sigma : [n] \to [k]$, and let $\hat G$ be drawn from the configuration model. Fix an extension $\hat\tau$ of $\tau$. If $\varphi : V(H) \hookrightarrow V(\hat G)$ is an injection that agrees on labels, then applying Lemma 5.12 to the multigraphs on each set of vertices $\sigma^{-1}(i)$ and between each pair of sets $\sigma^{-1}(i)$ and $\sigma^{-1}(j)$,
\[
P[\varphi \text{ is an occurrence}] = (n/k)^{-|E(H)|} \cdot \prod_{i \le j} (dM_{i,j})^{-|E_{i,j}(H,\hat\tau)|} \cdot \prod_{v \in V(H,\hat\tau)} \prod_{j} \frac{(dM_{\hat\tau(v),j})!}{(dM_{\hat\tau(v),j} - \deg_j(v))!} + O\big(n^{-|E(H)|-1}\big),
\]
and there are $(n/k)^{|V(H)|} + O(n^{|V(H)|-1})$ injective choices for $\varphi$ that agree on labels. Finally, writing $\Phi(H, S, \tau)$ for the total number of occurrences of $(H, S, \tau)$ in $\hat G$, and $\Phi(H, \hat\tau)$ for the number of occurrences of the fully labelled graph $(H, \hat\tau)$,
\[
\mathbb{E}\,\Phi(H, S, \tau) = \mathbb{E} \sum_{\hat\tau :\, \hat\tau|_S = \tau} \Phi(H, \hat\tau) = (n/k)^{\chi(H)}\, (dM)^{(H,S)}_\tau + O\big(n^{|V(H)| - |E(H)| - 1}\big).
\]

This lemma has some immediate consequences. First, it tells us that occurrences of partially labelled forests are sharply concentrated.
Lemma 5.14.
Let (H, S, τ) = ⊔_{t ∈ [cc(H)]} (H_t, S_t, τ_t) be a partially labelled graph with O(1) vertices, cc(H) connected components H_t, and no cycles. Then for any function f(n) > 0,

    P[ | p_{(H,S,τ)}(x, G) − (n/k)^{cc(H)} ∏_{t ∈ [cc(H)]} (dM)^{(H_t,S_t)}_{τ_t} | > f(n) ] = O( n^{2cc(H)−1} / f(n)² ).

Proof. The expectation E[p_{(H,S,τ)}(x, G)²] is a sum over all pairs of injective, label-consistent maps φ₁, φ₂ of the probability that both maps are occurrences. The image of two disjoint copies of H under these two injective maps is a graph H′ that is either H ⊔ H, or is obtained by identifying some pairs of vertices whose τ labels agree, each pair with one vertex from each copy of H. We can promote H′ to a partially labelled graph by taking the induced partial labelling τ′ from τ. Thus, let us think of the pair φ₁, φ₂ as a single injective map ϕ : V(H′) ↪ V(G̃) that agrees with τ′. Thus

    E[p_{(H,S,τ)}(x, G)²] = Σ_{H′} E p_{(H′,S′,τ′)}(x, G) = Σ_{H′} ( (n/k)^{χ(H′)} (dM)^{(H′,S′)}_{τ′} + O(n^{χ(H′)−1}) ).

When H′ = H ⊔ H, from our observation above

    (dM)^{(H′,S′)}_{τ′} = ((dM)^{(H,S)}_τ)² = ∏_{t ∈ [cc(H)]} ((dM)^{(H_t,S_t)}_{τ_t})².

As H has no cycles, every other H′ satisfies χ(H′) < 2χ(H). Thus since cc(H) = χ(H), the assertion is true in the configuration model by Chebyshev's inequality, and transfers immediately to the planted model. For good measure, an application of the triangle inequality shows as well that

    P[ | p_{(H,S,τ)}(x, G) − ∏_{t ∈ [cc(H)]} p_{(H_t,S_t,τ_t)}(x, G) | > f(n) ] = O( n^{2cc(H)−1} / f(n)² ).

Thus with high probability, the counts of partially labelled forests enjoy concentration of ±o(n^{cc(H)}).
On the other hand, an immediate application of Markov's inequality in P̃ tells us that there are very few occurrences of partially labelled graphs with cycles.

Lemma 5.15. Let (H, S, τ) be a partially labelled graph with O(1) edges and at least one cycle. Then for any function f(n) > 0,

    P[ p_{(H,S,τ)}(x, G) > f(n) ] = O( n^{cc(H)−1} / f(n) ).

In particular, we will need later on the fact that there are very few vertices within constant distance of a constant-length cycle. The proof is once again Markov's inequality, combined with a union bound.
Lemma 5.16.
Let G ∼ P or Q. Fix constants L and C, and call a vertex bad if it is at most L steps away from a cycle of length at most C. Then w.h.p. there are fewer than f(n) bad vertices, for any increasing function f(n).

We can now restate the Local Statistics algorithm more precisely.
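The bad vertices of Lemma 5.16 can be found mechanically: an edge lies on a cycle of length at most C exactly when deleting it leaves a path of length at most C − 1 between its endpoints, and a vertex is bad when it lies within L steps of such an edge. A small stdlib-only sketch (names are ours):

```python
from collections import deque

def bfs_dists(adj, src, cutoff):
    """Distances from src out to the given cutoff, by breadth-first search."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if dist[u] == cutoff:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def bad_vertices(adj, edges, L, C):
    """Vertices within L steps of a cycle of length at most C.
    adj: dict vertex -> set of neighbours; edges: list of (u, v) pairs."""
    on_cycle = set()
    for u, v in edges:
        adj[u].discard(v); adj[v].discard(u)   # temporarily delete the edge
        if bfs_dists(adj, u, C - 1).get(v, C) <= C - 1:
            on_cycle.update((u, v))
        adj[u].add(v); adj[v].add(u)           # restore it
    bad = set()
    for s in on_cycle:
        bad.update(bfs_dists(adj, s, L))
    return bad
```

On a 4-cycle with a pendant path, for example, the cycle vertices and their neighbors are bad for C = 4, L = 1, while no vertex is bad for C = 3.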
Definition 5.17 (Local Statistics Algorithm with Formal Moment Constraints). The degree-(D_x, D_G) Local Statistics algorithm with error tolerance δ is the following SDP. Given an input graph G, find Ẽ : R[x]_{≤ D_x} → R such that:

1. (Positivity) Ẽ p(x)² ≥ 0 whenever deg p² ≤ D_x.
2. (Hard Constraints) Ẽ p(x, G) = 0 for every p ∈ I_k.
3. (Moment Constraints) For every (H, S, τ) with at most D_x distinguished vertices, D_G edges, and ℓ connected components,

    Ẽ p_{(H,S,τ)}(x, G) = (dM)^{(H,S)}_τ (n/k)^{χ(H)} ± δ n^{cc(H)}.

The n^{χ(H)} vs. n^{cc(H)} scaling may seem ad hoc, but as promised above we have arranged things so that the SDP is w.h.p. feasible when its input is drawn from the planted model.

Lemma 5.18.
Fix D_x, D_G constant, and δ > 0. With high probability, the degree-(D_x, D_G) Local Statistics algorithm with error tolerance δ is feasible on input G ∼ P.

Proof. Let (x, G) ∼ P, and for each p(x, G) ∈ R[x, G]_{≤ D_x, D_G} define Ẽ p(x, G) := p(x, G), evaluation at the planted labelling x. This satisfies positivity, as Ẽ p(x, G)² = p(x, G)² ≥ 0, and obeys the hard constraints because (x, G) lies in the zero locus of I_k. Finally, let (H, S, τ) be a partially labelled graph. If H has a cycle, then by Corollary 8.15, w.h.p.

    | p_{(H,S,τ)}(x, G) − (n/k)^{χ(H)} (dM)^{(H,S)}_τ | ≤ p_{(H,S,τ)}(x, G) + O(n^{χ(H)}) ≤ δ n^{cc(H)}.

On the other hand, if H has no cycles, then χ(H) = cc(H) and w.h.p.

    | p_{(H,S,τ)}(x, G) − (n/k)^{χ(H)} (dM)^{(H,S)}_τ | ≤ δ n^{cc(H)}

by Proposition 8.14. There are only constantly many partially labelled subgraphs with at most D_x vertices and D_G edges, so a union bound finishes the proof.

Finally, we end this subsection with the proof of Lemma 5.4, which concerned the affine constraints in the Symmetric Path Statistics SDP.

Proof of Lemma 5.4.
Recall the partition matrix

    P = (k/(k−1)) [ Σ_{i ∈ [k]} x_i x_iᵀ − (1/k) Σ_{i,j ∈ [k]} x_i x_jᵀ ]

from Section 8.2, where each x_i ∈ {0,1}ⁿ is the indicator vector for membership in the i-th group. We are interested in ⟨P, A_G^{(s)}⟩.

Let (P_s, {0, s}, {i, j}) denote a path of length s with distinguished endpoints labelled i and j, and write its vertices as V(P_s) = {0, 1, ..., s}. From Lemma 5.14, w.h.p. for (x, G) ∼ P,

    p_{(P_s,{0,s},{i,j})}(x, G) = (1/k) (dM)^{(P_s,{0,s})}_{i,j} n ± o(n).

Expanding the right-hand side,

    (dM)^{(P_s,{0,s})}_{i,j} = Σ_{τ̂ : τ̂|_S = τ} [ ∏_{v ∈ V(P_s)} ∏_{j′ ∈ [k]} dM_{τ̂(v),j′} (dM_{τ̂(v),j′} − 1) ⋯ (dM_{τ̂(v),j′} − deg_{j′}(v) + 1) ] / [ ∏_{(u,v) ∈ E(P_s)} dM_{τ̂(u),τ̂(v)} ]
                            = Σ_{τ̂ : τ̂|_S = τ} dM_{i,τ̂(1)} ∏_{t = 1,...,s−1} ( dM_{τ̂(t),τ̂(t+1)} − 1{τ̂(t+1) = τ̂(t−1)} )
                            = q_s(dM)_{i,j}.

The final expression counts non-backtracking walks of length s between vertices i and j on the multigraph whose adjacency matrix is dM; from Section 5.1 these may be enumerated using the polynomial q_s applied to dM.

Now, let us define A_G^{⟨s⟩} as the n × n matrix whose entries count self-avoiding (as opposed to non-backtracking) walks on G. By definition

    p_{(P_s,{0,s},{i,j})}(x, G) = Σ_{u,v} (A_G^{⟨s⟩})_{u,v} x_{u,i} x_{v,j}.

Thus, with high probability,

    ⟨P, A_G^{⟨s⟩}⟩ = (k/(k−1)) [ Σ_i p_{(P_s,{0,s},{i,i})}(x, G) − (1/k) Σ_{i,j} p_{(P_s,{0,s},{i,j})}(x, G) ]
                 = (1/(k−1)) ⟨q_s(dM), I − J/k⟩ n ± o(n)
                 = (1/(k−1)) ( Tr q_s(dM) − q_s(d) ) n + o(n)
                 = q_s(dλ) n + o(n).

The last equality follows from the fact that the spectrum of dM consists of the eigenvalue d with multiplicity one and the eigenvalue dλ with multiplicity k − 1. It remains to pass from A_G^{⟨s⟩} to A_G^{(s)}. It is an easy consequence of Lemma 8.16 that w.h.p. for G ∼ Q, the matrices A_G^{(s)} and A_G^{⟨s⟩} disagree in at most o(n) rows. Thus, since the L1 norm of every row is bounded by a constant (by degree-regularity), w.h.p. ‖A_G^{(s)} − A_G^{⟨s⟩}‖_F² = o(n). Since X is PSD with ones on the diagonal, every off-diagonal element has magnitude at most one; thus

    ⟨X, A_G^{(s)}⟩ = ⟨X, A_G^{⟨s⟩}⟩ + o(n),

and we are done.

We will show that if (dλ)² > d − 1, the degree-(2, D) Local Statistics algorithm can distinguish P and Q for every D ≥
2. Specifically, we will show that for any such D, with high probability over input G ∼ Q there exists a δ at which the SDP is infeasible. Our goal, here and in the proof of the lower bound, will be to reduce to our characterization of the Symmetric Path Statistics SDP in Theorem 8.7.

Assume that Ẽ is a feasible pseudoexpectation for the degree-(2, D) Local Statistics SDP with tolerance δ > 0, on input G ∼ Q, and consider the matrix P̃ with entries

    P̃_{u,v} = (k/(k−1)) Σ_{i ∈ [k]} Ẽ (x_{u,i} − 1/k)(x_{v,i} − 1/k) = (k/(k−1)) [ Σ_{i ∈ [k]} Ẽ x_{u,i} x_{v,i} − (1/k) Σ_{i,j ∈ [k]} Ẽ x_{u,i} x_{v,j} ].

We will show that P̃ is a feasible solution to the level-D Symmetric Path Statistics SDP with the input G, at some tolerance δ′ = cδ; thus by Theorem 8.7, when (dλ)² > d − 1 and δ is sufficiently small, we will have a contradiction. We observe first that P̃ is PSD with ones on the diagonal; these facts follow immediately from the hard constraints in Definition 8.17, which tell us that Ẽ x_{u,i}² = Ẽ x_{u,i} for every u and i, and Ẽ Σ_{i ∈ [k]} x_{u,i} = 1 for every u.

We turn now to the moment constraints, with the goal of showing

    ⟨P̃, A_G^{(s)}⟩ = q_s(dλ) n ± δ′ n for all s ∈ [D],  and  ⟨P̃, J⟩ = 0 ± δ′ n².

Rehashing our calculations from the proof of Lemma 5.4, we find that w.h.p.

    ⟨P̃, A_G^{(s)}⟩ = ⟨P̃, A_G^{⟨s⟩}⟩ + o(n)
                 = Ẽ [ (k/(k−1)) Σ_i p_{(P_s,{0,s},{i,i})}(x, G) − (1/(k−1)) Σ_{i,j} p_{(P_s,{0,s},{i,j})}(x, G) ] + o(n)
                 = (1/(k−1)) ⟨q_s(dM), I − J/k⟩ n ± (k/(k−1)) · kδn + o(n)
                 = q_s(dλ) n ± (k²/(k−1)) δn + o(n).

To verify that P̃ has the correct inner product against the all-ones matrix, consider two partially labelled subgraphs: a single vertex labelled i ∈ [k], and two disjoint vertices both labelled i ∈ [k]. The sum of their corresponding polynomials is Σ_{u,v} x_{u,i} x_{v,i}, and identically in the planted model

    Σ_{u,v} x_{u,i} x_{v,i} = (n/k)².

Our pseudoexpectation is required to match this up to an additive δ(n + n²), the n and n² terms respectively coming from the additive slack in the one- vs. two-vertex graphs. Thus

    ⟨P̃, J⟩ = (k/(k−1)) ( n²/k² ± δ(n + n²) − n²/k² ) = 0 ± (k/(k−1)) δ(n + n²).

Assume now that (dλ)² ≤ d − 1; we will construct a degree-(2, D) pseudoexpectation that is with high probability feasible for G ∼ Q.
Our tactic will be to show that such an operator can be constructed from a feasible solution to the Symmetric Path Statistics SDP guaranteed us by Theorem 8.7.

Before building the degree-(2, D) pseudoexpectation asserted to exist in the theorem statement, we will first prove a series of structural lemmas showing that it suffices to check only a subset of the moment constraints of a Local Statistics pseudoexpectation. First, we show that the moment constraints regarding the pseudo-expected counts of partially labelled graphs containing cycles are satisfied more or less for free.

Lemma 5.19. Let G ∼ Q, and Ẽ = Ẽ(G) be a degree-(D_x, D_G) pseudoexpectation, perhaps dependent on G, that satisfies positivity and the hard constraints. For every error tolerance δ, w.h.p. Ẽ satisfies the moment constraints for all partially labelled subgraphs containing a cycle.

Proof. It is a routine sum-of-squares calculation that for any monomial µ(x),

    (Ẽ µ(x))² ≤ Ẽ µ(x)² = Ẽ µ(x),

so |Ẽ µ(x)| ≤ 1. Thus for any (H, S, τ),

    |Ẽ p_{(H,S,τ)}(x, G)| = | Σ_{φ : V(H) ↪ [n]} ∏_{(α,β) ∈ E(H)} G_{φ(α),φ(β)} Ẽ ∏_{α ∈ S} x_{φ(α),τ(α)} | ≤ Σ_{φ : V(H) ↪ [n]} ∏_{(α,β) ∈ E(H)} G_{φ(α),φ(β)},

and the right-hand side is simply the number of occurrences of the unlabelled graph H in G. From Proposition 8.15, if H has cc(H) connected components and at least one cycle, (i) w.h.p. this quantity is smaller than δ′ n^{cc(H)} for every δ′ > 0, and (ii) χ(H) < cc(H). Thus trivially, if we set δ′ < δ, w.h.p.

    | Ẽ p_{(H,S,τ)}(x, G) − (dM)^{(H,S)}_τ n^{χ(H)} | ≤ δ′ n^{cc(H)} + (dM)^{(H,S)}_τ n^{χ(H)} ≤ δ n^{cc(H)}.

It therefore suffices to check only the moment constraints for partially labelled forests. In fact, only a subset of these are important.
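The reduction to that subset (made precise in Definition 5.20 below) amounts to repeatedly deleting leaves that are not distinguished until every remaining leaf lies in S. A minimal sketch of that operation (our own naming; components containing no distinguished vertex vanish entirely):

```python
def pruning(adj, S):
    """Prune a forest, given as a dict vertex -> set of neighbours:
    repeatedly delete leaves (and isolated vertices) outside the
    distinguished set S, so every remaining leaf is distinguished."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in S and len(adj[v]) <= 1:
                for w in adj[v]:
                    adj[w].discard(v)
                del adj[v]
                changed = True
    return adj
```

For a path whose two endpoints are distinguished, nothing is removed; if only an interior vertex is distinguished, everything else is pruned away.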
Definition 5.20.
Let (H, S, τ) be a partially labelled tree. The pruning of (H, S, τ) is the unique partially labelled subtree in which every leaf belongs to S. The pruning of an unlabelled graph is the empty graph, and the pruning of a forest is defined tree-by-tree. We say that a partially labelled forest is pruned if it is equal to its pruning.

Lemma 5.21.
Let (H, S, τ) be a partially labelled forest with maximal degree d, (H̃, S, τ) its pruning, and write deg and d̃eg for the vertex degrees in H and H̃ respectively. If X is a symmetric nonnegative integer matrix with row and column sums equal to d, then

    X^{(H,S)}_τ / X^{(H̃,S)}_τ = ∏_{v ∈ V(H)} ∏_{q = d̃eg(v)}^{deg(v)−1} (d − q).

Proof.
Let’s reason combinatorially. Any such matrix X can be thought of as the adjacency matrix of a d-regular multigraph with self-loops; let’s fix X and call this graph Γ. Since Γ’s vertex set is [k], we will think of it as a fully labelled graph. By multiplicativity on disjoint unions, we may freely assume that (H, S, τ) is a tree. Let’s choose a root r ∈ V(H̃) ⊂ V(H); having done so, E(H) is in bijection with V(H) \ r (and similarly for E(H̃) and V(H̃)). Let’s write p(v) for the unique parent of each vertex. We can thus write

    X^{(H,S)}_τ = Σ_{τ̃ : τ̃|_S = τ} ∏_{j ∈ [k]} X_{τ̃(r),j} (X_{τ̃(r),j} − 1) ⋯ (X_{τ̃(r),j} − deg_j(r) + 1) × ∏_{v ∈ V(H) \ r} [ ∏_{j ∈ [k]} X_{τ̃(v),j} ⋯ (X_{τ̃(v),j} − deg_j(v) + 1) ] / X_{τ̃(v),τ̃(p(v))}.

Thinking of each τ̃ as a map V(H) → V(Γ), the summand above gives the number of maps η : E(H) → E(Γ) with the following constraints: (1) each edge (u, v) must be mapped to one of the X_{τ̃(u),τ̃(v)} edges between τ̃(u) and τ̃(v), and (2) no two edges in E(H) with the same endpoint may be mapped to the same edge in Γ. We’ll call the pair (τ̃, η) a locally injective occurrence of (H, S, τ) in the fully labelled graph Γ. Thus the expression X^{(H,S)}_τ gives the number of such occurrences. The same argument applies to the pruning (H̃, S, τ).

Now, the graph (H, S, τ) consists of the pruning (H̃, S, τ), plus some trees hanging off of it. For each locally injective occurrence of (H̃, S, τ), there are

    ∏_{v ∈ V(H)} ∏_{q = d̃eg(v)}^{deg(v)−1} (d − q)

ways to extend it to a locally injective occurrence of (H, S, τ), since Γ is d-regular.

Lemma 5.22. Let G ∼ Q, (H, S, τ) be a partially labelled forest, and (H̃, S, τ) its pruning. Then w.h.p.

    ‖ p_{(H,S,τ)}(x, G) − n^{cc(H)−cc(H̃)} [ (dM)^{(H,S)}_τ / (dM)^{(H̃,S)}_τ ] p_{(H̃,S,τ)}(x, G) ‖₁ = o(n^{cc(H)}),

where ‖·‖₁ is the coefficient-wise L1 norm.

Proof. Let (H, S, τ) be a partially labelled forest, (H̃, S, τ) its pruning. If φ̃ : V(H̃) ↪ V(G) is an occurrence of (H̃, S, τ), then we call φ : V(H) ↪ V(G) an extension of φ̃ if it is an occurrence of (H, S, τ) and agrees with φ̃ on V(H̃). Let’s write Φ̃ for the set of occurrences of (H̃, S, τ) in (G, σ), and for each φ̃ ∈ Φ̃, write Ξ(φ̃) for its set of extensions. Thus, incorporating Lemma 8.20,

    ‖ p_{(H,S,τ)}(x, G) − n^{cc(H)−cc(H̃)} [ (dM)^{(H,S)}_τ / (dM)^{(H̃,S)}_τ ] p_{(H̃,S,τ)}(x, G) ‖₁ ≤ Σ_{φ̃ ∈ Φ̃} | |Ξ(φ̃)| − n^{cc(H)−cc(H̃)} ∏_{v ∈ V(H)} ∏_{q = d̃eg(v)}^{deg(v)−1} (d − q) |.

By Proposition 8.15, with high probability there are o(n^{cc(H̃)}) occurrences of H̃ whose |E(H)|-neighborhoods in G̃ either intersect or contain a cycle, so we can safely restrict the right-hand side above to the remaining ones. Let’s fix such an occurrence and enumerate the possible extensions. First, for each connected component J̃ of H̃, and its corresponding component J of H, because G is d-regular and locally treelike in the neighborhood of φ̃(J̃), there are exactly

    ∏_{v ∈ V(J)} ∏_{q = d̃eg(v)}^{deg(v)−1} (d − q)

ways to extend φ̃ to the remainder of J. Having already chosen how to extend the occurrence on these connected components, call K the union of all connected components in H that have no distinguished vertex. We need to find an injective homomorphism from K into G that does not collide with φ̃(H̃) or the portion of the extension that we have already constructed. Since |V(H)| = O(1), there are

    n^{cc(H)−cc(H̃)} ∏_{v ∈ V(K)} ∏_{q = 0}^{deg(v)−1} (d − q) + O(n^{cc(H)−cc(H̃)−1})

ways to do this.
Since |Φ̃| = O(n^{cc(H̃)}),

    Σ_{φ̃ ∈ Φ̃} | |Ξ(φ̃)| − n^{cc(H)−cc(H̃)} ∏_{v ∈ V(H)} ∏_{q = d̃eg(v)}^{deg(v)−1} (d − q) | = O(n^{cc(H̃)}) · O(n^{cc(H)−cc(H̃)−1}) = O(n^{cc(H)−1}),

as desired.

Lemma 5.23.
Let G ∼ Q and Ẽ = Ẽ(G) be a degree-(D_x, D_G) pseudoexpectation, perhaps dependent on G. If Ẽ w.h.p. satisfies the moment constraints for pruned partially labelled forests, then w.h.p. it does so for every partially labelled forest.

Proof. This is a direct consequence of Lemma 8.21. Retaining (H, S, τ) and (H̃, S, τ) from the proof of that lemma, since the pseudoexpectation of any monomial has absolute value at most one,

    Ẽ p_{(H,S,τ)}(x, G) = [ (dM)^{(H,S)}_τ / (dM)^{(H̃,S)}_τ ] Ẽ p_{(H̃,S,τ)}(x, G) ± o(n^ℓ) = (dM)^{(H,S)}_τ n^{χ(H)} ± δ n^ℓ

for every δ > 0. We can take a union bound over all finitely many (H, S, τ).

We are finally ready to describe our own degree-(2, D) pseudoexpectation. Our key building block will be the feasible solution P ⪰ 0 to the level-D Symmetric Path Statistics SDP on input G ∼ Q whose asymptotic almost sure existence is guaranteed us by Theorem 8.7. Recall that this matrix satisfies

1. P_{u,u} = 1 for every u ∈ [n],
2. ⟨P, J⟩ = 0,
3. ⟨P, A_G^{(s)}⟩ = q_s(dλ) n ± δn for every s = 1, ..., D.

A degree-2 pseudoexpectation Ẽ : R[x]_{≤2} → R may be expressed as a (1 + nk) × (1 + nk) block matrix

    ( 1   lᵀ )
    ( l   Q  )

where (l_u)_i = Ẽ x_{u,i} and (Q_{u,v})_{i,j} = Ẽ x_{u,i} x_{v,j}. Our construction will set (l_u)_i = 1/k for every u and i, and

    Q = (1/k) ( J_{nk}/k + P ⊗ (I_k − J_k/k) ).

Let us first check the hard constraints. For positivity it suffices to observe that

    Q − llᵀ = (1/k) P ⊗ (I_k − J_k/k) ⪰ 0,

as P, (I_k − J_k/k) ⪰ 0. We also have

    Ẽ x_{u,i}² = (Q_{u,u})_{i,i} = (1/k)(1/k + (1 − 1/k) P_{u,u}) = 1/k = Ẽ x_{u,i},

since P_{u,u} = 1 for every u ∈ [n]. It remains only to check that Ẽ (x_{u,1} + ⋯ + x_{u,k}) p(x) = Ẽ p(x) for every p(x) of degree one. For this it is sufficient to verify that

    Ẽ (x_{u,1} + ⋯ + x_{u,k}) x_{v,j} = Σ_i (Q_{u,v})_{i,j} = (1/k) Σ_i ( 1/k + P_{u,v} (I_k − J_k/k)_{i,j} ) = 1/k = Ẽ x_{v,j}.

Finally, we need to verify the moment constraints. By Lemmas 8.21-8.22, if we do so only on the minimal partially labelled forests with at most two distinguished vertices, then w.h.p. the remainder of the moment constraints are satisfied. A minimal partially labelled forest with one distinguished vertex is just a single vertex with a label i ∈ [k]; the associated polynomial in this case is just x_{1,i} + ⋯ + x_{n,i}, and its required pseudoexpectation is n/k ± δn, since there are identically n/k vertices with each label in the planted model. Our pseudoexpectation assigns a value of

    Ẽ Σ_u x_{u,i} = Σ_u (l_u)_i = n/k,

as desired. A minimal partially labelled forest on two vertices is either a path of length s ∈ [D] with endpoints labelled i, j ∈ [k], or two isolated vertices labelled i, j ∈ [k]. In the former case, the pseudoexpectation is required to read (1/k) q_s(dM)_{i,j} n ± δn. Recycling some calculations from Section 8.4, our pseudoexpectation on this polynomial reads

    ⟨Q_{i,j}, A_G^{⟨s⟩}⟩ = ⟨Q_{i,j}, A_G^{(s)}⟩ ± o(n)
                      = (1/k) ( ⟨J_n/k, A_G^{(s)}⟩ + ⟨P, A_G^{(s)}⟩ (I_k − J_k/k)_{i,j} )
                      = (1/k) ( q_s(d) (J_k/k)_{i,j} + q_s(dλ) (I_k − J_k/k)_{i,j} ) n ± (δ/k) n
                      = (1/k) q_s(dM)_{i,j} n ± (δ/k) n.

The last line follows since dM = d J_k/k + dλ (I_k − J_k/k) is the spectral decomposition of dM. Finally, we verify the case of two disjoint vertices labelled i, j ∈ [k]. The polynomial here is Σ_{u ≠ v} x_{u,i} x_{v,j}, and the pseudoexpectation is required to give a value of (n/k)² ± δn².
As needed, our pseudoexpectation gives

    ⟨Q_{i,j}, J_n⟩ = (1/k) ( ⟨J_n/k, J_n⟩ + ⟨P, J_n⟩ (I_k − J_k/k)_{i,j} ) = (n/k)²,

as ⟨P, J_n⟩ = 0.

In this section, we develop machinery for low-degree analysis of general spiked Wigner and Wishart models, culminating in the proofs of Theorems 2.16 and 2.17.
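Before developing that machinery, the hard-constraint computations for the block matrix Q constructed just above are easy to confirm numerically on toy sizes. In the sketch below, P is a hand-chosen stand-in for the Symmetric Path Statistics solution, namely (n/(n−1))(I − J/n), which is PSD with unit diagonal and ⟨P, J⟩ = 0; everything else follows Q = (1/k)(J_{nk}/k + P ⊗ (I_k − J_k/k)):

```python
from itertools import product
from random import random, seed

def toy_P(n):
    """A stand-in SDP solution: PSD, unit diagonal, <P, J> = 0."""
    return [[(n / (n - 1)) * ((u == v) - 1 / n) for v in range(n)] for u in range(n)]

def build_Q(P, k):
    """Entries Q[(u,i)][(v,j)] = (1/k)(1/k + P[u][v]((i == j) - 1/k))."""
    n = len(P)
    idx = list(product(range(n), range(k)))
    Q = [[(1 / k) * (1 / k + P[u][v] * ((i == j) - 1 / k))
          for (v, j) in idx] for (u, i) in idx]
    return Q, idx

def check(n=5, k=3, trials=20):
    seed(0)
    Q, idx = build_Q(toy_P(n), k)
    # E~ x_{u,i}^2 = (Q_{u,u})_{i,i} = 1/k, matching E~ x_{u,i} = 1/k
    assert all(abs(Q[a][a] - 1 / k) < 1e-9 for a in range(len(idx)))
    # sum_i (Q_{u,v})_{i,j} = 1/k = E~ x_{v,j}
    for u, v, j in product(range(n), range(n), range(k)):
        s = sum(Q[idx.index((u, i))][idx.index((v, j))] for i in range(k))
        assert abs(s - 1 / k) < 1e-9
    # positivity of Q - l l^T with l = (1/k, ..., 1/k): random quadratic forms
    for _ in range(trials):
        z = [random() - 0.5 for _ in idx]
        lz = sum(z) / k
        quad = sum(z[a] * Q[a][b] * z[b]
                   for a in range(len(idx)) for b in range(len(idx))) - lz * lz
        assert quad >= -1e-9
    return True
```

Running check() verifies the unit pseudo-moments, the group-sum constraint, and (via random quadratic forms) positive semidefiniteness of Q − llᵀ for this toy P.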
Definition 6.1.
A random vector x is ε-local c-subgaussian if for any fixed vector v with ‖v‖ ≤ ε,

    E exp(⟨v, x⟩) ≤ exp( (c/2) ‖v‖² ).

It is straightforward to verify the following fact.
Fact 6.2.
Suppose x is ε-local c-subgaussian. For a (non-random) scalar α ≠ 0, αx is (ε/|α|)-local cα²-subgaussian. Also, the sum of n independent copies of x is ε-local cn-subgaussian.

Proposition 6.3.
If a random vector x is ε-local c-subgaussian then it admits the following local Chernoff bound: for any ‖v‖ = 1 and 0 ≤ t ≤ εc,

    Pr{⟨v, x⟩ ≥ t} ≤ exp( −t²/(2c) ).

Proof. Apply the standard Chernoff bound argument: for any α > 0,

    Pr{⟨v, x⟩ ≥ t} = Pr{exp(α⟨v, x⟩) ≥ exp(αt)} ≤ E[exp(α⟨v, x⟩)] / exp(αt) ≤ exp(cα²/2 − αt), provided α ≤ ε.

Set α = t/c ≤ ε to complete the proof.

Proposition 6.4.
If a random vector x ∈ R^k is ε-local c-subgaussian then for any δ > 0 and any 0 ≤ t ≤ εc/(1 − δ),

    Pr{‖x‖ ≥ t} ≤ C(δ, k) exp( −(1 − δ)² t² / (2c) ),

where C(δ, k) is a constant depending only on δ and k.

Proof. Let
N ⊆ R^k be a δ-net of the unit sphere in R^k, in the sense that for any v ∈ R^k,

    max_{u ∈ N} ⟨u, v⟩ ≥ (1 − δ) ‖v‖,

where ‖u‖ = 1 for all u ∈ N. Let C(δ, k) = |N|. Using a union bound and the local Chernoff bound (Proposition 6.3), for all 0 ≤ t ≤ εc/(1 − δ),

    Pr{‖x‖ ≥ t} ≤ Pr{ max_{u ∈ N} ⟨u, x⟩ ≥ (1 − δ) t } ≤ C(δ, k) exp( −(1 − δ)² t² / (2c) ).

Proposition 6.5.
Let δ > 0. If a random vector x ∈ R^k is ε-local c-subgaussian with c < (1 − δ)²/2, then

    E[ 1[‖x‖ ≤ εc/(1 − δ)] exp(‖x‖²) ] ≤ 1 + C(δ, k) / ( (1 − δ)²/(2c) − 1 ),

where C(δ, k) is a constant depending only on δ and k.

Proof. Let ∆ = εc/(1 − δ), and integrate the tail bound from Proposition 6.4:

    E[ 1[‖x‖ ≤ ∆] exp(‖x‖²) ] = ∫₀^∞ Pr{ 1[‖x‖ ≤ ∆] exp(‖x‖²) ≥ t } dt
                              ≤ 1 + ∫₁^{exp(∆²)} Pr{ exp(‖x‖²) ≥ t } dt
                              = 1 + ∫₁^{exp(∆²)} Pr{ ‖x‖ ≥ √(log t) } dt
                              ≤ 1 + C(δ, k) ∫₁^∞ exp( −((1 − δ)²/(2c)) log t ) dt
                              = 1 + C(δ, k) / ( (1 − δ)²/(2c) − 1 ).

6.2 The Wigner Model

Proof of Theorem 2.16.
We start with a formula for ‖L_{≤D}‖² from [KWB19] (adapted slightly for the case of symmetric Gaussian noise):

    ‖L_{≤D}‖² = E_{X,X′} exp_{≤D}( (λ² n / 2) ⟨X, X′⟩ ),        (55)

where X′ is an independent copy of X and exp_{≤D}(x) = Σ_{d=0}^D x^d / d! denotes the Taylor series truncation of exp. Write ‖L_{≤D}‖² = L₁ + L₂, where L₁ is the small deviations term

    L₁ := E_{X,X′} 1[⟨X, X′⟩ ≤ ∆] exp_{≤D}( (λ² n / 2) ⟨X, X′⟩ )

and L₂ is the large deviations term

    L₂ := E_{X,X′} 1[⟨X, X′⟩ > ∆] exp_{≤D}( (λ² n / 2) ⟨X, X′⟩ ),

where ∆ > 0 is a small constant. The next two lemmas show that L₁ and L₂ are both O(1), completing the proof.

6.2.1 Small Deviations

Lemma 6.6. In the setting of Theorem 2.16, if |λ| < 1 then there exists ∆ > 0 such that L₁ = O(1) for any D.

Proof. Note that

    ⟨X, X′⟩ = (1/n²) ⟨UUᵀ, U′(U′)ᵀ⟩ = (1/n²) ‖R‖²_F,        (56)

where R = Uᵀ U′. In particular, ⟨X, X′⟩ ≥ 0. Since exp_{≤D}(x) ≤ exp(x) for all x ≥ 0,

    L₁ ≤ E[ 1[‖R‖²_F ≤ ∆ n²] exp( (λ²/(2n)) ‖R‖²_F ) ].        (57)

We have R = Σ_{i=1}^n R_i, where the R_i are independent k × k matrices, each distributed as π(π′)ᵀ. Since π has bounded support, the moment-generating function M(T) = E exp(⟨T, R_i⟩) exists in a neighborhood of T = 0 (in fact, it exists everywhere) and thus, by the defining property of the MGF, has gradient ∇M(0) = E[R_i] = 0 and Hessian (Hess M)(0) = Cov(R_i) = Cov(π) ⊗ Cov(π′) ⪯ I_{k²}. Thus for any η > 0 there exists ε > 0 such that

    M(T) ≤ exp( (1/2)(1 + η) ‖T‖²_F )  for all ‖T‖_F ≤ ε.

In other words, R_i is ε-local (1 + η)-subgaussian. From Fact 6.2, this implies R is ε-local (1 + η)n-subgaussian, and λR/√(2n) is (ε√(2n)/λ)-local ((1 + η)λ²/2)-subgaussian. Since |λ| < 1, we can choose δ > 0 and η > 0 such that (1 + η)λ² < (1 − δ)². Letting ∆ = ( ε(1 + η)/(1 − δ) )² and using Proposition 6.5,

    L₁ ≤ E[ 1[ ‖λR/√(2n)‖_F ≤ λ√(∆n/2) ] exp( ‖λR/√(2n)‖²_F ) ] = O(1).

6.2.2 Large Deviations

Lemma 6.7. In the setting of Theorem 2.16, for any constants λ ∈ R and ∆ > 0, and for any D = o(n/log n), we have L₂ = o(1).

Proof. Recall from above that R is ε-local (1 + η)n-subgaussian. By Proposition 6.4 (taking t to be n times a small constant),

    Pr{⟨X, X′⟩ > ∆} ≤ Pr{‖R‖_F > √∆ · n} = exp(−Ω(n)).        (58)

The boundedness of π guarantees |⟨X, X′⟩| ≤ n^C for some constant C > 0, and so

    L₂ ≤ exp(−Ω(n)) Σ_{d=0}^D ( (λ² n / 2) n^C )^d ≤ exp(−Ω(n)) (D + 1) n^{O(D)},

which is o(1) provided D = o(n/log n).

6.3 The Wishart Model

Proof of Theorem 2.17.
In Appendix A we derive a formula for ‖L_{≤D}‖² in the general spiked Wishart model. The version we will need here is summarized in Proposition A.6. The formula takes the form

    ‖L_{≤D}‖² = E_{X,X′} Σ_{d=0}^D r_d(βX, βX′)        (59)

for some polynomials r_d. As in the Wigner case, we write ‖L_{≤D}‖² = L₁ + L₂, where

    L₁ := E_{X,X′} 1[⟨X, X′⟩ ≤ ∆] Σ_{d=0}^D r_d(βX, βX′)

and

    L₂ := E_{X,X′} 1[⟨X, X′⟩ > ∆] Σ_{d=0}^D r_d(βX, βX′)

for a small constant ∆ > 0. The next two lemmas show that L₁ and L₂ are both O(1), completing the proof.

6.3.1 Small Deviations

Before bounding the small deviations term in Lemma 6.10, we state two deterministic facts that will be useful in the proof.
Proposition 6.8.
For n × m matrices A and B,

    det(I_n − AAᵀBBᵀ) = det(I_m − AᵀBBᵀA).

Proof. If A, B are square and A is nonsingular, this can be shown by taking determinants on both sides of the equation (I − AAᵀBBᵀ)A = A(I − AᵀBBᵀA). For the general case, pad A, B with zeros to make them square, and consider a sequence of nonsingular matrices converging to A.

Lemma 6.9.
For any η > 0 there exists ε > 0 such that for all 0 ≤ t ≤ ε, we have (1 − t)^{−1} ≤ exp((1 + η)t).

Proof. Letting f(t) = (1 − t)^{−1} and g(t) = exp((1 + η)t), we have f(0) = g(0) = 1 and f′(0) = 1 < 1 + η = g′(0).

We are now ready to bound the small deviations term.

Lemma 6.10.
In the setting of Theorem 2.17, if β² < γ then there exists ∆ > 0 such that L₁ = O(1) for any D.

Proof. Since X ⪰ 0, we have from Proposition A.6(c,d) that r_d(βX, βX′) ≥ 0. Analogous to (56), we have either ⟨X, X′⟩ = 0 or ⟨X, X′⟩ = (1/n²)‖R‖²_F. Recall from Definition 2.14 that, when drawing from P, we first draw X̃, and then threshold to form X as:

    X = X̃ if βX̃ ≻ −I_n, and X = 0 otherwise.        (60)

We can upper bound L₁ by dropping the low-degree truncation and using Proposition A.6(b):

    L₁ ≤ E_{X,X′} 1[⟨X, X′⟩ ≤ ∆] Σ_{d=0}^∞ r_d(βX, βX′)
       = E_{X,X′} 1[⟨X, X′⟩ ≤ ∆] det(I_n − β² XX′)^{−N/2}
       ≤ 1 + E[ 1[‖R‖²_F ≤ ∆n²] det(I_n − β² X̃X̃′)^{−N/2} ],

where the +1 in the last line covers the second case of (60). Let {λ_i} be the eigenvalues of RRᵀ (which are nonnegative). Let η > 0, choose ε according to Lemma 6.9, and choose ∆ ≤ ε/β². Note that ‖R‖²_F = Σ_i λ_i. Provided ‖R‖²_F ≤ ∆n², we have λ_i ≤ ∆n² for all i, i.e., β² n^{−2} λ_i ≤ ε, and so

    det(I_n − β² X̃X̃′)^{−N/2} = det(I_n − β² n^{−2} UUᵀU′U′ᵀ)^{−N/2}
                             = det(I_k − β² n^{−2} UᵀU′U′ᵀU)^{−N/2}    (using Proposition 6.8)
                             = det(I_k − β² n^{−2} RRᵀ)^{−N/2}
                             = ∏_i (1 − β² n^{−2} λ_i)^{−N/2}
                             ≤ ∏_i exp( (1 + η) β² n^{−2} λ_i )^{N/2}    (using Lemma 6.9)
                             = exp( ((1 + η) β² N / (2n²)) Σ_i λ_i )
                             = exp( ((1 + η) β² N / (2n²)) ‖R‖²_F ).

We now have

    L₁ ≤ 1 + E[ 1[‖R‖²_F ≤ ∆n²] exp( ((1 + η) β² N / (2n²)) ‖R‖²_F ) ].

Comparing this to (57), we see that we have reduced to the case of Wigner small deviations. In place of λ², we have (1 + η) β² N/n → (1 + η) β²/γ. Thus, provided β² < γ, we can choose η > 0 small enough so that L₁ = O(1).

6.3.2 Large Deviations

Lemma 6.11. In the setting of Theorem 2.17, for any constants β > −1, γ > 0, and ∆ > 0, and for any D = o(n/log n), we have L₂ = o(1).

Proof. Consider the case β > 0, so that βX ⪰
0; the case β < 0 is similar. We use the formula for r_d given in Proposition A.6(c):

    r_d(βX, βX′) = Σ_{d_1,...,d_N ∈ N : Σ_{i=1}^N d_i = d} ∏_{i=1}^N (1/d_i!) E_{x ∼ N(0,βX), x′ ∼ N(0,βX′)} ⟨x, x′⟩^{d_i}.

Let {λ_i, v_i} be an eigendecomposition of X, and let X_i = λ_i v_i v_iᵀ so that X = X_1 + ⋯ + X_k. Let x_i ∼ N(0, βX_i) independently, so that x := Σ_{i=1}^k x_i ∼ N(0, βX). Similarly define X′_i, x′_i, x′. For fixed X, X′, {X_i}, {X′_i},

    E[⟨x, x′⟩^d] ≤ E[ ‖x‖^d ‖x′‖^d ] = E ‖x‖^d · E ‖x′‖^d.

The boundedness of π guarantees Σ_i λ_i ≤ n^C for some constant C > 0. Recall that x_i ∼ N(0, βλ_i v_i v_iᵀ), so for g a standard Gaussian vector, x_i has the same law as √(βλ_i) ⟨v_i, g⟩ v_i. Since v_i is a unit vector, ‖x_i‖ then has the same law as √(βλ_i) |⟨v_i, g⟩|, which in turn has the same law as √(βλ_i) |g_1|. So, for d even,

    E ‖x_i‖^d = (βλ_i)^{d/2} (d − 1)!! ≤ β^{d/2} n^{Cd/2} d^d.

This means

    E ‖x‖^d ≤ E ( Σ_{i=1}^k ‖x_i‖ )^d ≤ E ( k max_i ‖x_i‖ )^d ≤ k^d E Σ_{i=1}^k ‖x_i‖^d ≤ k^{d+1} β^{d/2} n^{Cd/2} d^d.

Since E ‖x‖^d = 1 when d = 0, we can rewrite this as E ‖x‖^d ≤ k^{2d} β^{d/2} n^{Cd/2} d^d ≤ (dn)^{O(d)}. Thus for d_1, ..., d_N ∈ N with Σ_i d_i = d ≤ D,

    ∏_{i=1}^N E[⟨x, x′⟩^{d_i}] ≤ ∏_{i=1}^N (d_i n)^{O(d_i)} ≤ (Dn)^{O(D)}.

Now we have r_d(X, X′) ≤ N^D (Dn)^{O(D)}, and so

    L₂ ≤ Pr{⟨X, X′⟩ > ∆} Σ_{d=0}^D N^D (Dn)^{O(D)}.

Similarly to (58) we have Pr{⟨X, X′⟩ > ∆} ≤ exp(−Ω(n)), and so L₂ = o(1) provided D = o(n/log n).

References

[ABČ13] Antonio Auffinger, Gérard Ben Arous, and Jiří Černý. Random matrices and complexity of spin glasses.
Communications on Pure and Applied Mathematics, 66(2):165–201, 2013.

[AKS98] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13:457–466, 1998.

[AS16] Emmanuel Abbe and Colin Sandon. Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation. In Advances in Neural Information Processing Systems, pages 1334–1342, 2016.

[Bar17] Paolo Barucca. Spectral partitioning in equitable graphs. Phys. Rev. E, 95:062310, 2017.

[BBH18] Matthew Brennan, Guy Bresler, and Wasim Huleihel. Reducibility and computational lower bounds for problems with planted sparse structure. In Conference On Learning Theory, pages 48–166, 2018.

[BBP05] Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005.

[BC19] Charles Bordenave and Benoît Collins. Eigenvalues of random lifts and polynomials of random permutation matrices. Annals of Mathematics, 190(3):811–875, 2019.

[BDG+16] Gerandy Brito, Ioana Dumitriu, Shirshendu Ganguly, Christopher Hoffman, and Linh V Tran. Recovery and rigidity in a regular stochastic block model. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1589–1601. Society for Industrial and Applied Mathematics, 2016.

[BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.

[BKM17] Jess Banks, Robert Kleinberg, and Cristopher Moore. The Lovász theta function for random regular graphs and community detection in the hard regime. arXiv preprint arXiv:1705.01194, 2017.

[BKW20] Afonso S Bandeira, Dmitriy Kunisky, and Alexander S Wein. Computational hardness of certifying bounds on constrained PCA problems. In . Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.

[BMR19] Jess Banks, Sidhanth Mohanty, and Prasad Raghavendra. Local statistics, semidefinite programming, and community detection. arXiv preprint arXiv:1911.01960, 2019.

[BR13] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on Learning Theory, pages 1046–1066, 2013.

[BS06] Jinho Baik and Jack W Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97(6):1382–1408, 2006.

[BS16] Boaz Barak and David Steurer. Proofs, beliefs, and algorithms through the lens of sum-of-squares. , 2016.

[CDF09] Mireille Capitaine, Catherine Donati-Martin, and Delphine Féral. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability, 37(1):1–47, 2009.

[CO03] Amin Coja-Oghlan. The Lovász number of random graphs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 228–239. Springer, 2003.

[COEH16] Amin Coja-Oghlan, Charilaos Efthymiou, and Samuel Hetterich. On the chromatic number of random regular graphs. Journal of Combinatorial Theory, Series B, 116:367–439, 2016.

[DAM17] Yash Deshpande, Emmanuel Abbe, and Andrea Montanari. Asymptotic mutual information for the balanced binary stochastic block model. Information and Inference: A Journal of the IMA, 6(2):125–170, 2017.

[DDSW03] Josep Díaz, Norman Do, Maria Serna, and Nicholas Wormald. Bounds on the max and min bisection of random cubic and random 4-regular graphs. Theoretical Computer Science, 307:531–547, 2003.

[DKMZ11a] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.

[DKMZ11b] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Inference and phase transitions in the detection of modules in sparse networks. Physical Review Letters, 107(6):065701, 2011.

[DMM09] David L Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.

[DMS17] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, 2017.

[FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):1–37, 2017.

[FP07] Delphine Féral and Sandrine Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics, 272(1):185–228, 2007.

[Fri03] Joel Friedman. A proof of Alon's second eigenvalue conjecture. In
Proceedings of thethirty-fifth annual ACM symposium on Theory of computing , pages 720–724, 2003.[GM75] G. R. Grimmett and C. J. H. McDiarmid. On colouring random graphs.
Math. Proc.Camb. Phil. Soc. , 77:313–324, 1975.[Gue03] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model.
Communications in mathematical physics , 233(1):1–12, 2003.[GZ17] David Gamarnik and Ilias Zadik. High-dimensional regression with binary coefficients.Estimating squared error and a phase transition. arXiv preprint arXiv:1701.04455 ,2017. 50HKP +
17] Samuel B Hopkins, Pravesh K Kothari, Aaron Potechin, Prasad Raghavendra, TselilSchramm, and David Steurer. The power of sum-of-squares for detecting hidden struc-tures. In , pages 720–731. IEEE, 2017.[Hof70] Alan J Hoffman. On eigenvalues and colorings of graphs.
Graph Theory and itsApplications , 1970.[Hop18] Samuel Hopkins.
Statistical Inference and the Sum of Squares Method . PhD thesis,Cornell University, 2018.[HS17] Samuel B Hopkins and David Steurer. Bayesian estimation from few samples: com-munity detection and related problems. arXiv preprint arXiv:1710.00264 , 2017.[JKS18] Aukosh Jagannath, Justin Ko, and Subhabrata Sen. Max κ -cut and the inhomoge-neous potts spin glass. Annals of Applied Probability , 28(3):1536–1572, 2018.[Kar76] Richard M. Karp. The probabilistic analysis of some combinatorial search algorithms.1976.[Kar86] Richard M. Karp. Combinatorics, complexity, and randomness. 1986.[Kuc95] Ludek Kucera. Expected complexity of graph partitioning problems.
Discrete AppliedMathematics , 57:193–212, 1995.[KWB19] Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira. Notes on computationalhardness of hypothesis testing: Predictions using the low-degree likelihood ratio. arXivpreprint arXiv:1907.11636 , 2019.[KZ09] Florent Krzakala and Lenka Zdeborov´a. Hiding quiet solutions in random constraintsatisfaction problems.
Physical review letters , 102(23):238701, 2009.[Lau09] Monique Laurent. Sums of squares, moment matrices and optimization over poly-nomials. In
Emerging applications of algebraic geometry , pages 157–270. Springer,2009.[LCY12] Lucien Le Cam and Grace Lo Yang.
Asymptotics in statistics: some basic concepts .Springer Science & Business Media, 2012.[LKZ15a] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborov´a. MMSE of probabilisticlow-rank matrix estimation: Universality with respect to the output channel. In , pages680–687. IEEE, 2015.[LKZ15b] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborov´a. Phase transitions in sparsePCA. In , pages1635–1639. IEEE, 2015.[Mas14] Laurent Massouli´e. Community detection thresholds and the weak ramanujan prop-erty. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing ,pages 694–703, 2014.[McK81] Brendan D McKay. The expected eigenvalue distribution of a large regular graph.
Linear Algebra and its Applications , 40:203–216, 1981.51MM09] Marc Mezard and Andrea Montanari.
Information, physics, and computation . OxfordUniversity Press, 2009.[MNS18] Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model thresholdconjecture.
Combinatorica , 38(3):665–708, 2018.[MRX19] Sidhanth Mohanty, Prasad Raghavendra, and Jeff Xu. Lifting sum-of-squares lowerbounds: Degree-2 to degree-4. arXiv preprint arXiv:1911.01411 , 2019.[MS16] Andrea Montanari and Subhabrata Sen. Semidefinite programs on sparse randomgraphs and their application to community detection. In
Proceedings of the forty-eighth annual ACM symposium on Theory of Computing , pages 814–827. ACM, 2016.[NM14] M. E. J. Newman and Travis Martin. Equitable random graphs.
Phys. Rev. E ,90:052824, 2014.[NN12] Raj Rao Nadakuditi and M. E. J. Newman. Graph spectra and the detectability ofcommunity structure in networks.
Phys. Rev. Lett. , 108:188701, May 2012.[NP33] Jerzy Neyman and Egon Sharpe Pearson. IX. on the problem of the most efficienttests of statistical hypotheses.
Philosophical Transactions of the Royal Society of Lon-don. Series A, Containing Papers of a Mathematical or Physical Character , 231(694-706):289–337, 1933.[Pan11] Dmitry Panchenko. The Parisi ultrametricity conjecture. arXiv preprintarXiv:1112.1003 , 2011.[Pan13] Dmitry Panchenko.
The Sherrington-Kirkpatrick model . Springer Science & BusinessMedia, 2013.[Par79] Giorgio Parisi. Infinite number of order parameters for spin-glasses.
Physical ReviewLetters , 43(23):1754, 1979.[Rom05] Steven Roman.
The umbral calculus . Springer, 2005.[Sen18] Subhabrata Sen. Optimization on sparse random hypergraphs and spin glasses.
Ran-dom Structures & Algorithms , 53(3):504–536, 2018.[SK75] David Sherrington and Scott Kirkpatrick. Solvable model of a spin-glass.
Physicalreview letters , 35(26):1792, 1975.[Sod07] Sasha Sodin. Random matrices, nonbacktracking walks, and orthogonal polynomials.
Journal of Mathematical Physics , 48(12):123503, 2007.[Sol96] Patrick Sol´e. Spectra of regular graphs and hypergraphs and orthogonal polynomials.
European Journal of Combinatorics , 17(5):461–477, 1996.[Sze39] Gabor Szeg.
Orthogonal polynomials , volume 23. American Mathematical Soc., 1939.[Tal06] Michel Talagrand. The Parisi formula.
Annals of mathematics , pages 221–263, 2006.[ZB10] Lenka Zdeborov´a and Stefan Boettcher. A conjecture on the maximum cut and bisec-tion width in random regular graphs.
Journal of Statistical Mechanics: Theory andExperiment , 2010(02):P02020, 2010.52ZK11] Lenka Zdeborov´a and Florent Krzakala. Quiet planting in the locked constraint sat-isfaction problems.
SIAM Journal on Discrete Mathematics , 25(2):750–770, 2011.
A Low-Degree Analysis of General Wishart Models
In this section we derive the formula (59) for $\|L^{\le D}\|^2$ in the general spiked Wishart model.

A.1 Hermite Polynomial Facts
We first define the Hermite polynomials, the orthogonal polynomials with respect to the standard Gaussian measure, and record the key facts about them that we will use.
Definition A.1 (Hermite polynomials). The univariate Hermite polynomials are the sequence of polynomials $h_k(y) \in \mathbb{R}[y]$ for $k \in \mathbb{N}$, defined by the recursion
$$h_0(y) = 1, \tag{61}$$
$$h_{k+1}(y) = y\,h_k(y) - h_k'(y). \tag{62}$$
The $n$-variate Hermite polynomials are the polynomials $H_\alpha(\mathbf{y}) \in \mathbb{R}[y_1, \dots, y_n]$ indexed by $\alpha \in \mathbb{N}^n$ and given by $H_\alpha(\mathbf{y}) = \prod_{i=1}^n h_{\alpha_i}(y_i)$. Finally, the normalized $n$-variate Hermite polynomials are $\widehat{H}_\alpha(\mathbf{y}) = (\alpha!)^{-1/2} H_\alpha(\mathbf{y})$, where we abbreviate $\alpha! = \prod_{i=1}^n \alpha_i!$.

The main and defining property of the Hermite polynomials is their orthogonality, which we record below.
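As a quick numerical sanity check (ours, not part of the original text), the recursion (61)–(62) and the orthogonality and mismatched-variance facts recorded in Propositions A.2 and A.3 below can be verified by building the $h_k$ from the recursion and integrating against Gaussian measures with Gauss–Hermite quadrature; all function names here are our own:

```python
import math
import numpy as np

def hermite(kmax):
    """Coefficients (lowest degree first) of h_0..h_kmax via h_{k+1} = y h_k - h_k'."""
    hs = [np.array([1.0])]
    for _ in range(kmax):
        h = hs[-1]
        yh = np.concatenate([[0.0], h])                  # multiply by y
        dh = np.zeros_like(yh)
        dh[: len(h) - 1] = h[1:] * np.arange(1, len(h))  # differentiate
        hs.append(yh - dh)
    return hs

def gauss_expect(f, var, m=60):
    """E_{y ~ N(0, var)} f(y) by Gauss-Hermite quadrature (weight e^{-z^2/2})."""
    z, w = np.polynomial.hermite_e.hermegauss(m)
    return np.dot(w, f(np.sqrt(var) * z)) / np.sqrt(2 * np.pi)

hs = hermite(8)
polyval = lambda c, y: np.polynomial.polynomial.polyval(y, c)

# Orthogonality (Proposition A.2 with n = 1): E[h_j h_k] = j! if j = k, else 0.
for j in range(6):
    for k in range(6):
        inner = gauss_expect(lambda y: polyval(hs[j], y) * polyval(hs[k], y), 1.0)
        assert abs(inner - (math.factorial(j) if j == k else 0.0)) < 1e-8

# Mismatched variance (Proposition A.3): E_{y ~ N(0, 1+x)} h_4(y) = 3!! * x^2 = 3 x^2,
# including the "umbral" case of negative x.
for x in [0.5, -0.5]:
    assert abs(gauss_expect(lambda y: polyval(hs[4], y), 1 + x) - 3 * x**2) < 1e-8
```

The recursion reproduces the probabilists' Hermite polynomials ($h_2(y) = y^2 - 1$, $h_3(y) = y^3 - 3y$, and so on), and the last loop exhibits the umbral phenomenon discussed after Proposition A.3: the same moment formula holds whether $x$ is positive or negative, as long as $1 + x > 0$.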
Proposition A.2 (Orthogonality under Gaussian measure). For any $\alpha, \beta \in \mathbb{N}^n$,
$$\mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0, I_n)}\left[ \widehat{H}_\alpha(\mathbf{y})\, \widehat{H}_\beta(\mathbf{y}) \right] = \delta_{\alpha\beta}. \tag{63}$$

Beyond this, the key additional tool for our analysis is a generalization of the following fact from the "umbral calculus" of Hermite polynomials (a proof will be subsumed in our more general result below).

Proposition A.3 (Mismatched Variance Formula). Let $x > -1$. Then,
$$\mathbb{E}_{y \sim \mathcal{N}(0,\, 1+x)}\, h_k(y) = \begin{cases} (k-1)!! \cdot x^{k/2} & k \text{ even,} \\ 0 & k \text{ odd.} \end{cases} \tag{64}$$

Note that the formula on the right-hand side is that for the moments of a Gaussian random variable with variance $x$, but we extend it to apply even for negative $x$, which is the "umbral" case of the result, admitting an interpretation in terms of a fictitious Gaussian of negative variance: even if $x \in (-1, 0)$, we may write $\mathbb{E}_{y \sim \mathcal{N}(0,\, 1+x)}\, h_k(y) = $ "$\mathbb{E}_{g \sim \mathcal{N}(0, x)}\, g^k$." A thorough exposition of such analogies arising in combinatorics and the theory of orthogonal polynomials is given in [Rom05].

In fact, the same holds even for multivariate Gaussians. The correct result in this case is given by imitating the formula for the moments of a multivariate Gaussian, via Wick's (or Isserlis') formula. While Proposition A.3 is well-known in the literature on Hermite polynomials and the umbral calculus, we are not aware of previous appearances of the formula below.

Proposition A.4 (Multivariate Mismatched Variance Formula). Let $X \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$ with $X \succ -I_n$. For $\alpha \in \mathbb{N}^n$ viewed as a multiset of elements of $[n]$, let $\mathcal{P}(\alpha)$ be the set of pairings of the elements of $\alpha$, and for each $P \in \mathcal{P}(\alpha)$ let $X_P$ denote the product of the entries of $X$ located at paired indices from $P$. Then,
$$\mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I + X)}\, H_\alpha(\mathbf{y}) = \sum_{P \in \mathcal{P}(\alpha)} X_P. \tag{65}$$

Note that if $X \succeq 0$, then the right-hand side equals $\mathbb{E}_{\mathbf{x} \sim \mathcal{N}(0, X)}\, \mathbf{x}^\alpha$ by Wick's formula, but we again have an umbral extension to non-PSD matrices $X$.

Proof. Define
$$\ell_\alpha := \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I + X)}\, H_\alpha(\mathbf{y}). \tag{66}$$
Let $e_i \in \mathbb{N}^n$ have $i$th coordinate equal to 1 and all other coordinates equal to zero, and write $\mathbf{0} \in \mathbb{N}^n$ for the vector with all coordinates equal to zero. Clearly $\ell_{\mathbf{0}} = 1$ and $\ell_{e_i} = 0$ for any $i \in [n]$. We then proceed by induction and use Gaussian integration by parts:
$$\begin{aligned}
\ell_{\alpha + e_i} &= \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I+X)}\; h_{\alpha_i + 1}(y_i) \prod_{j \in [n] \setminus \{i\}} h_{\alpha_j}(y_j) \\
&= \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I+X)} \left( y_i\, h_{\alpha_i}(y_i) - h'_{\alpha_i}(y_i) \right) \prod_{j \in [n] \setminus \{i\}} h_{\alpha_j}(y_j) && \text{(Definition A.1)} \\
&= \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I+X)} \left[ \sum_{k=1}^n (I+X)_{ik} \prod_{j \in [n]} h^{(\delta_{jk})}_{\alpha_j}(y_j) \;-\; \prod_{j \in [n]} h^{(\delta_{ij})}_{\alpha_j}(y_j) \right] && \text{(integration by parts)} \\
&= \sum_{\substack{k \in [n] \\ \alpha_k > 0}} X_{ik}\; \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I+X)} \prod_{j \in [n]} h^{(\delta_{jk})}_{\alpha_j}(y_j) \\
&= \sum_{\substack{k \in [n] \\ \alpha_k > 0}} \alpha_k X_{ik}\; \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I+X)} \prod_{j \in [n]} h_{\alpha_j - \delta_{jk}}(y_j) \\
&= \sum_{\substack{k \in [n] \\ \alpha_k > 0}} \alpha_k X_{ik}\; \ell_{\alpha - e_k}, && \text{(inductive hypothesis)}
\end{aligned}$$
where in the third step the identity part of $(I + X)_{ik} = \delta_{ik} + X_{ik}$ cancels the subtracted product, and terms with $\alpha_k = 0$ vanish since $h'_0 = 0$. So the $\ell_\alpha$ satisfy the same recursion and initial conditions as the sum-of-products formula on the right-hand side of (65).

A.2 Components of the LDLR
Let $\mathbb{Q}$, $\mathbb{P}$ be as in the general spiked Wishart model (Definition 2.14), and let $L$ be the associated likelihood ratio. Throughout this section, we assume without loss of generality that (i) $\beta = 1$ (since $\beta$ can be absorbed into $X$), and (ii) the prior $\mathcal{X}$ is supported on $X$ for which $X \succ -I_n$. For $\alpha \in (\mathbb{N}^n)^N$, we denote
$$m_\alpha := \mathbb{E}_{\mathbf{Y} \sim \mathbb{Q}}\, H_\alpha(\mathbf{Y})\, L(\mathbf{Y}), \tag{67}$$
$$\widehat{m}_\alpha := \mathbb{E}_{\mathbf{Y} \sim \mathbb{Q}}\, \widehat{H}_\alpha(\mathbf{Y})\, L(\mathbf{Y}) = \frac{1}{\sqrt{\alpha!}}\, m_\alpha. \tag{68}$$
We may compute these numbers as follows. For any $\alpha \in (\mathbb{N}^n)^N$, we have, passing to an expectation under $\mathbb{P}$ rather than $\mathbb{Q}$ and then using Proposition A.4,
$$m_\alpha = \mathbb{E}_{\mathbf{Y} \sim \mathbb{P}}\, H_\alpha(\mathbf{Y}) = \mathbb{E}_{X \sim \mathcal{X}} \prod_{i=1}^N \mathbb{E}_{\mathbf{y} \sim \mathcal{N}(0,\, I+X)} H_{\alpha_i}(\mathbf{y}) = \mathbb{E}_{X \sim \mathcal{X}} \prod_{i=1}^N \sum_{P \in \mathcal{P}(\alpha_i)} X_P. \tag{69}$$

A.3 Taylor Expansion of the LDLR
Lemma A.5.
Suppose $\mathcal{X}$ is as in Definition 2.14. Denote by $T^{\le D}$ the operation of truncating the Taylor series of a function that is real-analytic in a neighborhood of zero to a degree-$D$ polynomial; for the sake of clarity, this will only ever apply to the variable named $t$. Then,
$$\|L^{\le D}\|^2 = \mathbb{E}_{X, X' \sim \mathcal{X}}\, T^{\le D}\!\left[ \det(I_n - t^2 X X')^{-N/2} \right](1), \tag{70}$$
by which we mean evaluation at $t = 1$ of the truncated function of $t$ on the right-hand side.

Proof. We have
$$\|L^{\le D}\|^2 = \sum_{|\alpha| \le D} \widehat{m}_\alpha^2 = \mathbb{E}_{X, X' \sim \mathcal{X}} \sum_{|\alpha| \le D} \prod_{i=1}^N \frac{1}{\alpha_i!} \left( \sum_{P \in \mathcal{P}(\alpha_i)} X_P \right) \left( \sum_{P \in \mathcal{P}(\alpha_i)} (X')_P \right) = \mathbb{E}_{X, X' \sim \mathcal{X}} \sum_{d=0}^D \sum_{|\alpha| = d} \prod_{i=1}^N \frac{1}{\alpha_i!} \left( \sum_{P \in \mathcal{P}(\alpha_i)} X_P \right) \left( \sum_{P \in \mathcal{P}(\alpha_i)} (X')_P \right). \tag{71}$$
On the other hand, we may expand the right-hand side of (70) by repeatedly differentiating with respect to $t$ to extract Taylor coefficients. In doing so we will repeatedly apply the chain rule, and since, by Jacobi's formula, $\frac{d}{dt} \det(M(t))^{-N/2} = -\frac{N}{2} \det(M(t))^{-N/2}\, \mathrm{Tr}(M(t)^{-1} M'(t))$, each derivative will be a rational function of $(t, X, X')$. Therefore, the Taylor coefficients are some rational functions $r_d(X, X')$ (depending on $N$) for $d \in \mathbb{N}$, and for these coefficients
$$\mathbb{E}_{X, X' \sim \mathcal{X}}\, T^{\le D}\!\left[ \det(I_n - t^2 X X')^{-N/2} \right](1) = \mathbb{E}_{X, X' \sim \mathcal{X}} \sum_{d=0}^D r_d(X, X'). \tag{72}$$
We will show that in fact termwise equality holds, inside the expectations: namely,
$$r_d(X, X') = \sum_{|\alpha| = d} \prod_{i=1}^N \frac{1}{\alpha_i!} \left( \sum_{P \in \mathcal{P}(\alpha_i)} X_P \right) \left( \sum_{P \in \mathcal{P}(\alpha_i)} (X')_P \right) =: r'_d(X, X') \tag{73}$$
for each $d \in \mathbb{N}$ and for all (deterministic) $X, X' \in \mathbb{R}^{n \times n}_{\mathrm{sym}}$. Since either side of (73) is a rational function of $(X, X')$, it suffices to show that this is true on a set of matrices of positive measure. We will show that it holds for all $X, X' \succeq 0$.

In this case, we have the convenient Gaussian interpretation of the expression for $r'_d(X, X')$ from Wick's formula:
$$r'_d(X, X') = \sum_{|\alpha| = d} \prod_{i=1}^N \frac{1}{\alpha_i!} \left( \mathbb{E}_{\mathbf{x} \sim \mathcal{N}(0, X)} \mathbf{x}^{\alpha_i} \right) \left( \mathbb{E}_{\mathbf{x} \sim \mathcal{N}(0, X')} \mathbf{x}^{\alpha_i} \right) = \mathbb{E}_{\substack{\mathbf{x}_1, \dots, \mathbf{x}_N \sim \mathcal{N}(0, X) \\ \mathbf{x}'_1, \dots, \mathbf{x}'_N \sim \mathcal{N}(0, X')}} \sum_{\substack{|\alpha_i| \text{ even} \\ |\alpha| = d}} \prod_{i=1}^N \frac{1}{\alpha_i!}\, (\mathbf{x}_i)^{\alpha_i} (\mathbf{x}'_i)^{\alpha_i}, \tag{74}$$
and grouping by the values of $|\alpha_i|$,
$$= \mathbb{E}_{\substack{\mathbf{x}_1, \dots, \mathbf{x}_N \sim \mathcal{N}(0, X) \\ \mathbf{x}'_1, \dots, \mathbf{x}'_N \sim \mathcal{N}(0, X')}} \sum_{\substack{d_1, \dots, d_N \in \mathbb{N} \\ \sum_{i=1}^N d_i = d}} \frac{1}{\prod_{i=1}^N d_i!} \sum_{\substack{\alpha_1, \dots, \alpha_N \in \mathbb{N}^n \\ |\alpha_i| = d_i}} \prod_{i=1}^N \binom{d_i}{\alpha_i} \prod_{j=1}^n \left( (\mathbf{x}_i)_j (\mathbf{x}'_i)_j \right)^{\alpha_i(j)} = \mathbb{E}_{\substack{\mathbf{x}_1, \dots, \mathbf{x}_N \sim \mathcal{N}(0, X) \\ \mathbf{x}'_1, \dots, \mathbf{x}'_N \sim \mathcal{N}(0, X')}} \sum_{\substack{d_1, \dots, d_N \in \mathbb{N} \\ \sum_{i=1}^N d_i = d}} \prod_{i=1}^N \frac{\langle \mathbf{x}_i, \mathbf{x}'_i \rangle^{d_i}}{d_i!} = \sum_{\substack{d_1, \dots, d_N \in \mathbb{N} \\ \sum_{i=1}^N d_i = d}} \prod_{i=1}^N \frac{1}{d_i!}\, \mathbb{E}_{\substack{\mathbf{x} \sim \mathcal{N}(0, X) \\ \mathbf{x}' \sim \mathcal{N}(0, X')}} \langle \mathbf{x}, \mathbf{x}' \rangle^{d_i}. \tag{75}$$
From here, we introduce the moment-generating function of the inner overlap variables. Define
$$\varphi_{X, X'}(t) := \mathbb{E}_{\substack{\mathbf{x} \sim \mathcal{N}(0, X) \\ \mathbf{x}' \sim \mathcal{N}(0, X')}} \exp\left( t \langle \mathbf{x}, \mathbf{x}' \rangle \right) = \sum_{d=0}^\infty \frac{t^d}{d!}\, \mathbb{E}_{\substack{\mathbf{x} \sim \mathcal{N}(0, X) \\ \mathbf{x}' \sim \mathcal{N}(0, X')}} \langle \mathbf{x}, \mathbf{x}' \rangle^d. \tag{76}$$
Then, $r'_d(X, X')$ is simply the coefficient of $t^d$ in the Taylor series of $\varphi_{X, X'}(t)^N$.

On the other hand, we may actually compute $\varphi_{X, X'}(t)$:
$$\varphi_{X, X'}(t) = \mathbb{E}_{\mathbf{g}, \mathbf{h} \sim \mathcal{N}(0, I_n)} \exp\left( t\, \mathbf{g}^\top \sqrt{X} \sqrt{X'}\, \mathbf{h} \right) = \mathbb{E}_{\mathbf{g} \sim \mathcal{N}(0, I_{2n})} \exp\left( \frac{1}{2}\, \mathbf{g}^\top \begin{bmatrix} 0 & t \sqrt{X} \sqrt{X'} \\ t \sqrt{X'} \sqrt{X} & 0 \end{bmatrix} \mathbf{g} \right),$$
which may be calculated as a moment generating function of the "matrix $\chi^2$" variable $\mathbf{g}\mathbf{g}^\top$, giving
$$= \det\left( \begin{bmatrix} I_n & -t \sqrt{X} \sqrt{X'} \\ -t \sqrt{X'} \sqrt{X} & I_n \end{bmatrix} \right)^{-1/2} = \det\left( I_n - t^2 \sqrt{X}\, X' \sqrt{X} \right)^{-1/2},$$
and applying Proposition 6.8,
$$= \det\left( I_n - t^2 X X' \right)^{-1/2}. \tag{77}$$
Thus $r'_d(X, X')$ is also the coefficient of $t^d$ in the Taylor series of $\det(I_n - t^2 X X')^{-N/2}$, whereby $r'_d(X, X') = r_d(X, X')$, completing the proof.

Implicit in the above proof are the following facts, which we have made use of in Section 6.3.

Proposition A.6.
Consider the general spiked Wishart model (Definition 2.14), and assume (without loss of generality) that $\beta = 1$ and $\mathcal{X}$ is supported on $X$ for which $X \succ -I_n$. The following formulas hold:

(a) $\|L^{\le D}\|^2 = \mathbb{E}_{X, X' \sim \mathcal{X}} \sum_{d=0}^D r_d(X, X')$ for some polynomials $r_0, \dots, r_d, \dots$,

(b) $\sum_{d=0}^\infty r_d(X, X') = \det(I_n - X X')^{-N/2}$,

(c) if $X \succeq 0$ and $X' \succeq 0$, then
$$r_d(X, X') = \sum_{\substack{d_1, \dots, d_N \in \mathbb{N} \\ \sum_{i=1}^N d_i = d}} \prod_{i=1}^N \frac{1}{d_i!}\, \mathbb{E}_{\substack{\mathbf{x} \sim \mathcal{N}(0, X) \\ \mathbf{x}' \sim \mathcal{N}(0, X')}} \langle \mathbf{x}, \mathbf{x}' \rangle^{d_i}, \quad \text{and}$$

(d) if $X \preceq 0$ and $X' \preceq 0$, then
$$r_d(X, X') = \sum_{\substack{d_1, \dots, d_N \in \mathbb{N} \\ \sum_{i=1}^N d_i = d}} \prod_{i=1}^N \frac{1}{d_i!}\, \mathbb{E}_{\substack{\mathbf{x} \sim \mathcal{N}(0, -X) \\ \mathbf{x}' \sim \mathcal{N}(0, -X')}} \langle \mathbf{x}, \mathbf{x}' \rangle^{d_i}.$$

B Exponential Low-Degree Hardness for SBM
The goal of this section is to prove Theorem 2.20. We first prove a general statement that reduces binary-valued models to the analogous Gaussian model.
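For context (this check is ours, not part of the original argument), recall from [KWB19] that in the additive Gaussian model, where $\mathbf{Y} \sim \mathcal{N}(0, I_N)$ under $\mathbb{Q}$ and $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{N}(0, I_N)$ under $\mathbb{P}$, the low-degree norm is exactly $\|L^{\le D}\|^2 = \sum_{d=0}^D \frac{1}{d!} \mathbb{E}\langle \mathbf{X}, \mathbf{X}' \rangle^d$. A quadrature sketch verifying this in one dimension, with a deterministic spike $X = \lambda$ (so $\langle X, X' \rangle = \lambda^2$), projects the likelihood ratio onto normalized Hermite polynomials:

```python
import math
import numpy as np

lam, D = 0.7, 6

# Nodes/weights for the weight e^{-y^2/2}; dividing by sqrt(2*pi) gives E over N(0,1).
y, w = np.polynomial.hermite_e.hermegauss(80)
w = w / np.sqrt(2 * np.pi)

# Likelihood ratio dP/dQ of N(lam, 1) against N(0, 1).
L = np.exp(lam * y - lam**2 / 2)

# ||L^{<=D}||^2 = sum of squared projections onto h_d / sqrt(d!).
norm_sq = 0.0
for d in range(D + 1):
    hd = np.polynomial.hermite_e.hermeval(y, [0.0] * d + [1.0])  # h_d(y)
    c = np.dot(w, L * hd) / math.sqrt(math.factorial(d))
    norm_sq += c**2

# Closed form: sum_{d<=D} lam^{2d} / d!, i.e. sum_{d<=D} E<X,X'>^d / d!.
closed = sum(lam ** (2 * d) / math.factorial(d) for d in range(D + 1))
assert abs(norm_sq - closed) < 1e-8
```

This is the benchmark against which Proposition B.1 below compares: the binary-valued problem's low-degree norm is bounded above by this Gaussian quantity.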
B.1 Comparing Binary-Valued Models to Gaussian
Proposition B.1.
Consider the following general binary-valued problem.

• Under the null distribution $\mathbb{Q}$, we observe $\mathbf{Y} \in \mathbb{R}^N$ where the $Y_i$ are independent, satisfy $\mathbb{E}[Y_i] = 0$ and $\mathbb{E}[Y_i^2] = 1$, and each take two possible values: $Y_i \in \{a_i, b_i\}$ with $a_i < 0 < b_i$.

• Under the planted distribution $\mathbb{P}$, a signal $\mathbf{X} \in \mathbb{R}^N$ is drawn from some prior, and then the $Y_i \in \{a_i, b_i\}$ are drawn independently (conditioned on $\mathbf{X}$) such that $\mathbb{E}[Y_i \mid X_i] = X_i$. (This requires $X_i \in [a_i, b_i]$.)

In the above setting,
$$\|L^{\le D}\|^2 \le \sum_{d=0}^D \frac{1}{d!}\, \mathbb{E}_{\mathbf{X}, \mathbf{X}'} \langle \mathbf{X}, \mathbf{X}' \rangle^d, \tag{78}$$
where $\mathbf{X}'$ denotes an independent copy of $\mathbf{X}$.

The significance of (78) is that the right-hand side is the exact formula for $\|L^{\le D}\|^2$ in the following additive Gaussian model (see [KWB19]): under $\mathbb{Q}$, $\mathbf{Y} \sim \mathcal{N}(0, I_N)$; and under $\mathbb{P}$, $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{N}(0, I_N)$ and $\mathbf{X}$ drawn from some prior. Thus, Proposition B.1 can be interpreted as saying that a binary-valued problem is at least as hard as the corresponding Gaussian problem.

Proof of Proposition B.1. The Fourier characters $\chi_S(\mathbf{Y}) = \prod_{i \in S} Y_i$ for $S \subseteq [N]$ with $|S| \le D$ are orthonormal with respect to $\mathbb{Q}$, in the sense that $\mathbb{E}_{\mathbf{Y} \sim \mathbb{Q}}[\chi_S(\mathbf{Y}) \chi_T(\mathbf{Y})] = \mathbb{1}[S = T]$. They also span the subspace of polynomials of degree at most $D$, since for any $r \in \mathbb{N}$, any $Y_i^r$ can be written as a degree-1 polynomial in $Y_i$ (as $Y_i$ takes only two values). Thus, $\{\chi_S\}_{|S| \le D}$ is an orthonormal basis for the polynomials of degree at most $D$. It is a standard fact that this allows us to write $\|L^{\le D}\|^2 = \sum_{|S| \le D} (\mathbb{E}_{\mathbf{Y} \sim \mathbb{P}}[\chi_S(\mathbf{Y})])^2$ (see e.g. [HS17, Hop18]). We will use $S$ to denote a subset of $[N]$ and use $\alpha$ to denote an ordered multiset of elements of $[N]$, with $\chi_\alpha(\mathbf{Y}) := \prod_{i \in \alpha} Y_i$. Now compute
$$\|L^{\le D}\|^2 = \sum_{|S| \le D} \left( \mathbb{E}_{\mathbf{Y} \sim \mathbb{P}}\, \chi_S(\mathbf{Y}) \right)^2 = \sum_{|S| \le D} \left( \mathbb{E}_{\mathbf{X}}\, \chi_S(\mathbf{X}) \right)^2 \overset{(*)}{\le} \sum_{d=0}^D \sum_{|\alpha| = d} \frac{1}{d!} \left( \mathbb{E}_{\mathbf{X}}\, \chi_\alpha(\mathbf{X}) \right)^2 = \sum_{d=0}^D \sum_{|\alpha| = d} \frac{1}{d!}\, \mathbb{E}_{\mathbf{X}, \mathbf{X}'}\, \chi_\alpha(\mathbf{X}) \chi_\alpha(\mathbf{X}') = \sum_{d=0}^D \frac{1}{d!}\, \mathbb{E}_{\mathbf{X}, \mathbf{X}'} \langle \mathbf{X}, \mathbf{X}' \rangle^d.$$
To see that inequality $(*)$ holds, note that it would be an equality if the middle sum were restricted to ordered multisets $\alpha$ with distinct elements: each set $S$ with $|S| = d$ corresponds to exactly $d!$ such orderings, each contributing $\frac{1}{d!} (\mathbb{E}_{\mathbf{X}}\, \chi_S(\mathbf{X}))^2$. The remaining terms, coming from $\alpha$ with repeated elements, are squares and hence nonnegative.

B.2 Stochastic Block Model
We now specialize to the case of the stochastic block model (Definition 2.19), and use Proposition B.1 to reduce to a certain spiked Wigner model.
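The reduction in the proof below hinges on a few exact normalizations: the values $b = \sqrt{(1-p)/p}$ and $a = -\sqrt{p/(1-p)}$ center and standardize the edge indicators, and taking row $i$ of $U$ to be $\sqrt{k}\, e_{k_i} - \mathbf{1}/\sqrt{k}$ gives $(U U^\top)_{ij} = k\mathbb{1}[k_i = k_j] - 1$. A quick numerical check of these identities (our own sketch, with arbitrary small parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
b = np.sqrt((1 - p) / p)
a = -np.sqrt(p / (1 - p))

# Recentered edge indicator in {a, b}: mean 0 and second moment 1 under edge prob p.
assert abs(p * b + (1 - p) * a) < 1e-12
assert abs(p * b**2 + (1 - p) * a**2 - 1) < 1e-12

# Community matrix: row i of U is sqrt(k) e_{k_i} - (1/sqrt(k)) * all-ones.
n, k = 8, 3
labels = rng.integers(k, size=n)
U = np.sqrt(k) * np.eye(k)[labels] - np.ones((n, k)) / np.sqrt(k)

# Check (U U^T)_{ij} = k * 1[k_i = k_j] - 1.
expected = k * (labels[:, None] == labels[None, :]) - 1
assert np.allclose(U @ U.T, expected)
```

These are exactly the facts invoked mid-proof ("Using the fact $pb + (1-p)a = 0$" and "One can check that $(UU^\top)_{i,j} = k\mathbb{1}[k_i = k_j] - 1$").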
Proof of Theorem 2.20.
It will be convenient to consider a modification of the SBM that allows self-loops. Specifically, the edge $(i, i)$ occurs with probability $\frac{d}{n}$ under $\mathbb{Q}$ and with probability $\left(1 + \frac{\eta (k-1)}{\sqrt{2}}\right) \frac{d}{n}$ under $\mathbb{P}$. It is clear from the variational formula (13) for $\|L^{\le D}\|^2$ that revealing this extra information can only increase $\|L^{\le D}\|^2$.

In order to place the SBM in the setting of Proposition B.1, take $N = n(n+1)/2$, with one coordinate $Y_{i,j}$ for every $i \le j$. Let $p = d/n$. In order to ensure $\mathbb{E}_{\mathbb{Q}}[Y_{i,j}] = 0$ and $\mathbb{E}_{\mathbb{Q}}[Y_{i,j}^2] = 1$, take $Y_{i,j} = b := \sqrt{(1-p)/p}$ if the edge $(i, j)$ is present, and $Y_{i,j} = a := -\sqrt{p/(1-p)}$ otherwise.

We now define $\mathbf{X}$ appropriately. Conditioned on the community structure, edge $(i, j)$ occurs with probability $(1 + \Delta_{i,j}) p$, where $\Delta_{i,i} = \eta(k-1)/\sqrt{2}$ and $\Delta_{i,j}$ (for $i < j$) is either $\eta(k-1)$ or $-\eta$ depending on whether $i$ and $j$ belong to the same community or not, respectively. Using the fact $pb + (1-p)a = 0$ (twice), this means we should take
$$X_{i,j} = \mathbb{E}[Y_{i,j} \mid X_{i,j}] = (1 + \Delta_{i,j})\, p b + \left[ 1 - (1 + \Delta_{i,j}) p \right] a = \Delta_{i,j}\, p (b - a) = -\Delta_{i,j}\, a = \Delta_{i,j} \sqrt{\frac{p}{1-p}}.$$
Let $U$ be the $n \times k$ matrix whose $i$th row is $\sqrt{k}\, e_{k_i} - \mathbf{1}/\sqrt{k}$, where $k_i \in [k]$ is the community assignment of vertex $i$. One can check that $(U U^\top)_{i,j} = k \mathbb{1}[k_i = k_j] - 1$, so that $X_{i,j} = \eta \sqrt{\frac{p}{1-p}}\, (U U^\top)_{i,j}$ for $i < j$, and $X_{i,i} = \frac{\eta}{\sqrt{2}} \sqrt{\frac{p}{1-p}}\, (U U^\top)_{i,i}$. Therefore,
$$\langle \mathbf{X}, \mathbf{X}' \rangle = \frac{\eta^2}{2} \cdot \frac{p}{1-p} \left\langle U U^\top,\, U' (U')^\top \right\rangle.$$
By Proposition B.1,
$$\|L^{\le D}\|^2 \le \sum_{d=0}^D \frac{1}{d!}\, \mathbb{E} \left( \frac{\eta^2}{2} \cdot \frac{p}{1-p} \left\langle U U^\top,\, U' (U')^\top \right\rangle \right)^d.$$
Comparing this to (55) reveals that this is precisely the expression for $\|L^{\le D}\|^2$ in the general spiked Wigner model with spike prior $\mathcal{X}_k$ (see Definition 3.1), except in place of $\lambda^2$ we have $\frac{\eta^2 d}{1 - p} = (1 + o(1))\, \eta^2 d$. Therefore, appealing to the Wigner result (Theorem 2.16), we have $\|L^{\le D}\|^2 = O(1)$ provided $d\eta^2 \ll$