Machinery for Proving Sum-of-Squares Lower Bounds on Certification Problems
Aaron Potechin * Goutham Rajendran † November 10, 2020
Abstract
In this paper, we construct general machinery for proving Sum-of-Squares lower bounds on certification problems by generalizing the techniques used by [BHK+16] to prove Sum-of-Squares lower bounds for planted clique. Using this machinery, we prove degree n^ε Sum-of-Squares lower bounds for tensor PCA, the Wishart model of sparse PCA, and a variant of planted clique which we call planted slightly denser subgraph.

*University of Chicago. [email protected]. Supported in part by NSF grant CCF-2008920.
†University of Chicago. [email protected]. Supported in part by NSF grant CCF-1816372.

1 Introduction
The Sum-of-Squares (SoS) hierarchy is an optimization technique that harnesses the power of semidefinite programming to solve optimization tasks. For polynomial optimization problems, the SoS hierarchy, first independently investigated by Shor [Sho87], Nesterov [Nes00], Parrilo [Par00], Lasserre [Las01] and Grigoriev [Gri01a, Gri01b], offers a sequence of convex relaxations parameterized by an integer called the degree of the SoS hierarchy. As we increase the degree d of the hierarchy, we get progressively stronger convex relaxations, which remain solvable in n^{O(d)} time. This has paved the way for the SoS hierarchy to become a powerful tool in algorithm design in both the worst case and the average case settings. Indeed, there has been tremendous success in using the SoS hierarchy to obtain efficient algorithms for combinatorial optimization problems (e.g., [GW95, ARV04, GS11, RRS17]) as well as problems stemming from Statistics and Machine Learning (e.g., [BBH+12, BKS15, HSS15, PS17, KS17]).

On the flip side, some problems have remained intractable beyond a certain threshold even when considering higher degrees of the SoS hierarchy [BHK+16, KMOW17, MRX20, GJJ+20]. A prominent example is the planted clique problem, where we must distinguish a fully random graph sampled from G(n, 1/2) from a random graph which is obtained by first sampling a graph from G(n, 1/2) and then planting a clique of size n^{1/2−ε} for a small constant ε > 0. It was shown in [BHK+16] that with high probability, degree o(log n) SoS fails to solve this distinguishing problem.

There are many reasons why studying lower bounds against the SoS hierarchy is important. Firstly, since SoS is a generic proof system that captures a broad class of algorithmic reasoning, SoS lower bounds indicate to the algorithm designer the intrinsic hardness of the problem: if they want to break the algorithmic barrier, they need to search for algorithms that are not captured by SoS. Secondly, in average case problem settings, standard complexity theoretic assumptions such as P ≠ NP have not been shown to give insight into the limits of efficient algorithms. Instead, lower bounds against powerful proof systems such as SoS have served as strong evidence of hardness [Hop18]. Moreover, for a large class of problems, it has been shown that SoS relaxations are the most efficient among all semidefinite programming relaxations [LRS15]. Thus, understanding the power of the SoS hierarchy on these problems is an important step towards understanding the approximability of these problems.
In this paper, we consider the following general category of problems: given a random input, can we certify that it does not contain a given structure? Some important examples of this kind of problem are as follows.
1. Planted clique: Can we certify that a random graph does not have a large clique?
2. Tensor PCA: Given an order k tensor T with random Gaussian entries, can we certify that there is no unit vector x such that ⟨T, x ⊗ ⋯ ⊗ x⟩ is large?
3. Wishart model of sparse PCA: Given an m × d matrix S with random Gaussian entries (which corresponds to taking m samples from N(0, I_d)), can we certify that there is no k-sparse unit vector x such that ‖Sx‖ is large?
These kinds of problems, known as certification problems, are closely related to their optimization or estimation variants. A certification algorithm is required to produce a proof/certificate of a bound that holds for all inputs, as opposed to most inputs. The Sum-of-Squares hierarchy provides such certificates, so analyzing SoS paves the way towards understanding the certification complexity of these problems. We investigate the following question: for certification problems, what are the best bounds that SoS can certify?
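To make the certified quantity concrete, the following minimal Python sketch evaluates the Tensor PCA objective ⟨T, x ⊗ x ⊗ x⟩ for an order-3 Gaussian tensor. The dimension n and the restriction to order 3 are our own illustrative choices, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 30
# Order-3 tensor with i.i.d. standard Gaussian entries: the "random" input.
T = rng.standard_normal((n, n, n))

def tensor_value(T, x):
    # <T, x (tensor) x (tensor) x>: the quantity a certifier must upper
    # bound over all unit vectors x.
    return float(np.einsum('ijk,i,j,k->', T, x, x, x))

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
val = tensor_value(T, x)
```

A single random unit vector typically achieves only a constant-order value; the certification problem asks for a bound that is valid for the maximizing x, without enumerating the sphere.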
In this work, we build general machinery for proving probabilistic Sum-of-Squares lower bounds on certification problems. To build our machinery, we generalize the techniques pioneered by [BHK+16] for proving Sum-of-Squares lower bounds for planted clique. We start with the standard framework for proving probabilistic Sum-of-Squares lower bounds:
1. Construct candidate pseudo-expectation values ˜E and the corresponding moment matrix Λ (see Section 2.1).
2. Show that with high probability, Λ ⪰ 0.
For planted clique, [BHK+16] constructed ˜E and the corresponding moment matrix Λ by introducing the pseudo-calibration technique (see Section 2.2). They then showed through a careful and highly technical analysis that with high probability Λ ⪰ 0.

In this paper, by generalizing the techniques used in this analysis, we give general conditions which are sufficient to show that a candidate moment matrix Λ is positive semidefinite (PSD) with high probability. These conditions, which are our main result, are stated informally in Theorem 3.32 and stated formally in Theorem 7.95 and Theorem 7.101.

A natural way to prove lower bounds on a certification problem is as follows.
1. Construct a planted distribution of inputs which has the given structure.
2. Show that we cannot distinguish between the random and planted distributions, and thus cannot certify that a random input does not have the given structure.
Based on this idea, the pseudo-calibration technique introduced by [BHK+16] constructs candidate pseudo-expectation values ˜E so that, as far as low degree tests are concerned, ˜E for the random distribution mimics the behavior of the given structure for the planted distribution (for details, see Section 2.2). This gives a candidate moment matrix Λ which we can then analyze with our machinery. Indeed, this is how we prove our SoS lower bounds for tensor PCA, the Wishart model of sparse PCA, and a variant of planted clique which we call planted slightly denser subgraph. That said, our machinery does not require that the candidate moment matrix Λ be obtained via pseudo-calibration.

We now describe the planted distributions we use to show our SoS lower bounds for planted slightly denser subgraph, tensor PCA, and the Wishart model of sparse PCA. We also state the random distributions for completeness and for contrast. We then state our results.

Planted slightly denser subgraph
We use the following distributions.
- Random distribution: Sample G from G(n, 1/2).
- Planted distribution: Let k be an integer and let p > 1/2. Sample a graph G′ from G(n, 1/2). Choose a random subset S of the vertices, where each vertex is picked independently with probability k/n. For all pairs i, j of vertices in S, rerandomize the edge (i, j) so that the probability of (i, j) being in the graph is now p. Set G to be the resulting graph.
In Section 4, we compute the candidate moment matrix Λ obtained by using pseudo-calibration on this planted distribution.

Theorem 1.1.
Let C_p > 0 be a constant. There exists a constant C > 0 such that for all sufficiently small constants ε > 0, if k ≤ n^{1/2−ε} and p = 1/2 + n^{−C_p ε}, then with high probability, the candidate moment matrix Λ given by pseudo-calibration for degree n^{Cε} Sum-of-Squares is PSD.
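For intuition, the planted distribution above can be sampled directly. The following Python sketch uses small illustrative parameter values (the particular n, k, p below are our own choices, not the theorem's asymptotic setting):

```python
import numpy as np

def sample_planted_dense_subgraph(n, k, p, rng):
    # G' ~ G(n, 1/2), stored as a strictly upper-triangular indicator matrix.
    upper = np.triu(rng.random((n, n)) < 0.5, 1)
    # Each vertex joins S independently with probability k/n.
    S = np.flatnonzero(rng.random(n) < k / n)
    # Rerandomize every edge inside S to be present with probability p.
    for a in range(len(S)):
        for b in range(a + 1, len(S)):
            upper[S[a], S[b]] = rng.random() < p
    return upper | upper.T, S

rng = np.random.default_rng(1)
G, S = sample_planted_dense_subgraph(n=200, k=50, p=0.75, rng=rng)
```

The edge density inside S concentrates around p while the overall edge density stays close to 1/2, which is exactly the structure that, by Corollary 1.2, SoS cannot rule out in a random graph.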
Corollary 1.2.
Let C_p > 0 be a constant. There exists a constant C > 0 such that for all sufficiently small constants ε > 0, if k ≤ n^{1/2−ε} and p = 1/2 + n^{−C_p ε}, then with high probability, degree n^{Cε} Sum-of-Squares cannot certify that a random graph G from G(n, 1/2) does not have a subgraph of size ≈ k with edge density ≈ p.

Tensor PCA
Let k ≥ 2 be an integer. We use the following distributions.
- Random distribution: Sample A from N(0, I_{[n]^k}).
- Planted distribution: Let λ, ∆ > 0. Sample u from {−1/√(∆n), 0, 1/√(∆n)}^n where the values are taken with probabilities ∆/2, 1 − ∆, ∆/2 respectively. Then sample B from N(0, I_{[n]^k}). Set A = B + λu^{⊗k}.
In Section 5, we compute the candidate moment matrix Λ obtained by using pseudo-calibration on this planted distribution.

Theorem 1.3.
Let k ≥ 2 be an integer. There exist constants C, C_∆ > 0 such that for all sufficiently small constants ε > 0, if λ ≤ n^{k/4−ε} and ∆ = n^{−C_∆ ε}, then with high probability, the candidate moment matrix Λ given by pseudo-calibration for degree n^{Cε} Sum-of-Squares is PSD.
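The planted tensor distribution can likewise be sampled directly. A minimal sketch follows; the parameter values (and the choice k = 3) are illustrative assumptions, and u is normalized so that E‖u‖² = 1, matching the description above.

```python
import numpy as np
from functools import reduce

def sample_planted_tensor(n, k, lam, delta, rng):
    # u_i in {-1/sqrt(delta*n), 0, +1/sqrt(delta*n)} with probabilities
    # delta/2, 1 - delta, delta/2, so that E||u||^2 = 1.
    vals = np.array([-1.0, 0.0, 1.0]) / np.sqrt(delta * n)
    u = rng.choice(vals, size=n, p=[delta / 2, 1 - delta, delta / 2])
    B = rng.standard_normal((n,) * k)           # pure Gaussian noise tensor
    spike = reduce(np.multiply.outer, [u] * k)  # u^{tensor k}
    return B + lam * spike, u

rng = np.random.default_rng(2)
A, u = sample_planted_tensor(n=100, k=3, lam=20.0, delta=0.5, rng=rng)
```

Evaluating ⟨A, u ⊗ u ⊗ u⟩ on a planted sample gives roughly λ‖u‖^{2k}, which is the large value whose absence SoS cannot certify for a truly random tensor.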
Corollary 1.4.
Let k ≥ 2 be an integer. There exists a constant C > 0 such that for all sufficiently small constants ε > 0, if λ ≤ n^{k/4−ε}, then with high probability, degree n^{Cε} Sum-of-Squares cannot certify that for a random tensor A from N(0, I_{[n]^k}), there is no vector x such that ‖x‖ ≈ 1 and ⟨A, x ⊗ ⋯ ⊗ x (k times)⟩ ≈ λ.

Wishart model of Sparse PCA
We use the following distributions.
- Random distribution: v_1, …, v_m are sampled from N(0, I_d) and we take S to be the m × d matrix with rows v_1, …, v_m.
- Planted distribution: Sample u from {−1/√k, 0, 1/√k}^d where the values are taken with probabilities k/2d, 1 − k/d, k/2d respectively. Then sample v_1, …, v_m as follows. For each i ∈ [m], with probability ∆, sample v_i from N(0, I_d + λuu^T) and with probability 1 − ∆, sample v_i from N(0, I_d). Finally, take S to be the m × d matrix with rows v_1, …, v_m.
In Section 6, we compute the candidate moment matrix Λ obtained by using pseudo-calibration on this planted distribution.

Theorem 1.5.
There exists a constant C > 0 such that for all sufficiently small constants ε > 0, if m ≤ d^{1−ε}/λ², m ≤ k^{2−ε}/λ², and there exists a constant A such that 0 < A < 1, d^A ≤ k ≤ d^{1−Aε}, and √λ·√k ≤ d^{1−Aε}, then with high probability, the candidate moment matrix Λ given by pseudo-calibration for degree d^{Cε} Sum-of-Squares is PSD.
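The planted Wishart distribution can be sampled using the standard fact that g + √λ·z·u with g ~ N(0, I_d) and z ~ N(0, 1) has covariance I_d + λ‖u‖²·uu^T. The sketch below is illustrative; all parameter values are our own choices, not the theorem's regime.

```python
import numpy as np

def sample_planted_wishart(m, d, k, lam, delta, rng):
    # u_j in {-1/sqrt(k), 0, +1/sqrt(k)} with probabilities k/2d, 1 - k/d,
    # k/2d: about k nonzero entries and E||u||^2 = 1.
    vals = np.array([-1.0, 0.0, 1.0]) / np.sqrt(k)
    u = rng.choice(vals, size=d, p=[k / (2 * d), 1 - k / d, k / (2 * d)])
    S = rng.standard_normal((m, d))
    # Each row is spiked with probability delta; a spiked row is
    # g + sqrt(lam)*z*u, a sample from (approximately) N(0, I + lam*u*u^T).
    spiked = rng.random(m) < delta
    z = rng.standard_normal(m)
    S[spiked] += np.sqrt(lam) * np.outer(z[spiked], u)
    return S, u

rng = np.random.default_rng(3)
m, d, k, lam, delta = 2000, 400, 100, 4.0, 0.5
S, u = sample_planted_wishart(m, d, k, lam, delta, rng)
```

On a planted sample, E‖Su‖² ≈ m + m∆λ (when ‖u‖ ≈ 1), which is the quantity appearing in Corollary 1.6.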
Corollary 1.6.
There exists a constant C > 0 such that for all sufficiently small constants ε > 0, if m ≤ d^{1−ε}/λ², m ≤ k^{2−ε}/λ², and there exists a constant A such that 0 < A < 1, d^A ≤ k ≤ d^{1−Aε}, and √λ·√k ≤ d^{1−Aε}, then with high probability, degree d^{Cε} Sum-of-Squares cannot certify that for a random m × d matrix S with Gaussian entries, there is no vector u such that u has ≈ k nonzero entries, ‖u‖ ≈ 1, and ‖Su‖² ≈ m + m∆λ.

Remark 1.7.
Note that our planted distributions only approximately satisfy constraints such as having a subgraph of size k, having a unit vector u, and having u be k-sparse. While we would like to use planted distributions which satisfy such constraints exactly, these distributions don't quite satisfy the conditions of our machinery. This same issue appeared in the SoS lower bounds for planted clique [BHK+16].

1.2.1 Planted dense subgraph

In the planted dense subgraph problem, we are given a random graph G where a dense subgraph of size k has been planted, and we are asked to find this planted dense subgraph. This is a natural generalization of the k-clique problem [Kar72] and has been the subject of a long line of work over the years (e.g. [FS+97, FPK01, Kho06, BCC+10, BCG+12, BKRW17, Man17]). In this work, we consider the following certification variant of planted dense subgraph.
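The quantity being certified, the maximum edge density over k-vertex subgraphs, can be computed by exhaustive search for tiny n. The following Python sketch (with illustrative parameters of our choosing) makes the object concrete; a certification algorithm must of course bound it without this enumeration.

```python
import itertools
import numpy as np

def densest_k_subgraph_density(upper, k):
    # Exhaustive maximum of edge density over all k-vertex subsets.
    # `upper` is a strictly upper-triangular boolean adjacency matrix,
    # so summing the S x S block counts each edge exactly once.
    n = upper.shape[0]
    pairs = k * (k - 1) // 2
    best = 0.0
    for S in itertools.combinations(range(n), k):
        edges = int(upper[np.ix_(S, S)].sum())
        best = max(best, edges / pairs)
    return best

rng = np.random.default_rng(4)
n, k = 12, 5
upper = np.triu(rng.random((n, n)) < 0.5, 1)  # G ~ G(n, 1/2)
dens = densest_k_subgraph_density(upper, k)
```

Even for a random graph this maximum is noticeably above 1/2 for small k, since it is a maximum over many subsets; the certification problem asks how tightly an efficient proof system can bound it.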
Given a random graph G sampled from the Erdős-Rényi model G(n, 1/2), certify an upper bound on the edge density of the densest subgraph on k vertices.

In [BHK+16], it was shown that for k ≤ n^{1/2−ε} for a constant ε > 0, degree o(log n) Sum-of-Squares cannot distinguish a fully random graph sampled from G(n, 1/2) from a random graph which has a planted k-clique. This implies that degree o(log n) SoS cannot certify an edge density better than 1 for the densest k-subgraph if k ≤ n^{1/2−ε}. In Corollary 1.2, we show that for k ≤ n^{1/2−ε} for a constant ε > 0, degree n^{Ω(ε)} SoS cannot certify an edge density better than 1/2 + n^{−O(ε)}. To the best of our knowledge, this is the first result that proves such a lower bound for SoS degrees as high as n^{Ω(ε)}. When the SoS degree is only o(log n), our result is not as strong as the work of [BHK+16]. For k = n^{1/2−ε}, the true edge density of the densest k-subgraph is 1/2 + √(log(n/k))/√k + o(1/√k) ≈ 1/2 + n^{−1/4+ε/2}, as was shown in [GZ19, Corollary 2], whereas, by Corollary 1.2, the SoS optimum is as large as 1/2 + n^{−O(ε)}. This highlights a significant additive difference in the optimum value.

1.2.2 Tensor PCA

The Tensor Principal Component Analysis problem, originally introduced by [RM14], is a generalization of the PCA problem from machine learning to higher order tensors. Given an order k tensor of the form λu^{⊗k} + B, where u ∈ R^n is a unit vector and B ∈ R^{[n]^k} has independent Gaussian entries, we would like to recover u. Here, λ is known as the signal-to-noise ratio.

This can equivalently be considered as the problem of optimizing a homogeneous degree k polynomial f(x), with random Gaussian coefficients, over the unit sphere ‖x‖ = 1. In general, polynomial optimization over the unit sphere is a fundamental primitive with many connections to other areas of optimization (e.g. [FK08, BV09, BH17, BKS14, BKS15, BGG+17]). We consider the certification variant of Tensor PCA: for an integer k ≥ 2, given a random tensor A ∈ R^{[n]^k} with entries sampled independently from N(0, 1), certify an upper bound on ⟨A, x^{⊗k}⟩ over unit vectors x.

In [BGL16], it was shown that q ≤ n levels of SoS certify an upper bound of 2^{O(k)}·(n·polylog(n))^{k/4}/q^{k/4−1/2} for the Tensor PCA problem. When q = n^ε for sufficiently small ε, this gives an upper bound of n^{k/4−O(ε)}. Corollary 1.4 shows that this is tight. In [HKP+17], a similar lower bound is stated for the variant of the problem where the entries of the tensor are sampled from {−1, 1}, whereas we work with entries sampled from N(0, 1). We remark that our machinery can recover their result.

When k = 2, the maximum value of ⟨x^{⊗k}, A⟩ over the unit sphere ‖x‖ = 1 is precisely the largest eigenvalue of (A + A^T)/2, which is Θ(√n) with high probability. For any integer k ≥ 2, the true maximum of ⟨x^{⊗k}, A⟩ over ‖x‖ = 1 is O(√n) with high probability [TS14]. In contrast, by Corollary 1.4, the optimum value of the degree n^ε SoS relaxation is as large as n^{k/4−O(ε)}. This exhibits an integrality gap of n^{k/4−1/2−O(ε)}.

1.2.3 Sparse PCA

The Wishart model of Sparse PCA, also known as the Spiked Covariance model, was originally proposed by [JL09]. In this problem, we observe m vectors v_1, …, v_m ∈ R^d from the distribution N(0, I_d + λuu^T), where u is a k-sparse unit vector, and we would like to recover u. Here, the sparsity of a vector is the number of nonzero entries and λ is known as the signal-to-noise ratio.

Sparse PCA is a fundamental problem that has applications in a diverse range of fields (e.g. [WLY12, NYS11, Maj09, TPW14, CK09, AMS11]). It's known that vanilla PCA does not yield good estimators in high dimensional settings [BAP+05, Pau07, JL09]. A large volume of work has gone into studying Sparse PCA and its variants, both from an algorithmic perspective (e.g. [AW08, Ma13, KNV+15, DM16, WBS+16, dKNS20]) and from a hardness perspective. The known results can be summarized as follows.
- If m ≫ k²/λ², then the sparse vector can be recovered by diagonal thresholding [JL09, AW08], covariance thresholding [KNV+15, DM16], or SoS [dKNS20].
- If m ≥ d and m ≫ d/λ², or m ≤ d and m ≫ d/λ, then vanilla PCA can recover the sparse vector (i.e., we do not need to use the fact that the vector is sparse) (e.g. [BR+13]).
- If m ≤ d, d/λ² ≪ m ≪ d/λ, and m ≫ k²/λ², then there is an efficient spectral algorithm to recover the sparse vector (e.g. [dKNS20]).
- If m ≤ d, d/λ² ≪ m ≪ d/λ, and m ≪ k²/λ², then there is a simple spectral algorithm which distinguishes the planted distribution from the random distribution, but it is information theoretically impossible to recover the sparse vector [dKNS20, Appendix E].
- If m ≪ k²/λ² and m ≪ d/λ², then it is conjectured to be hard to distinguish between the random and the planted distributions. We discuss the evidence for this below.

For the parameter regime where m ≪ k²/λ² and m ≪ d/λ², several SoS lower bounds have been shown. The works [KNV+15, BR+13] prove lower bounds against constant-degree SoS relaxations when λ is a constant. The work [HKP+17] considers the related Wigner model of Sparse PCA and states degree d^ε SoS lower bounds, without explicitly proving these bounds. They ask for similar SoS lower bounds in the Wishart model. We almost fully resolve their questions in this work with Corollary 1.6. Our machinery can also recover their results on the Wigner model.

In [dKNS20], it is proved that if m ≤ d/λ² and m ≤ (k²/λ²)^{1−Ω(ε)}, then degree n^ε polynomials cannot distinguish the random and planted distributions. Corollary 1.6 says that under slightly stronger assumptions, degree n^ε Sum-of-Squares cannot distinguish the random and planted distributions, so we confirm that SoS is no more powerful than low degree polynomials in this setting.
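As a numerical sanity check on the vanilla PCA regime discussed above, when m ≫ d/λ² the spike is visible in the top eigenvalue of the sample covariance. The sketch below uses illustrative parameters of our own choosing and, for simplicity, spikes every sample (i.e., it takes ∆ = 1):

```python
import numpy as np

rng = np.random.default_rng(5)
m, d, lam = 4000, 200, 2.0          # m >> d / lam^2: the vanilla PCA regime
u = np.zeros(d)
u[:20] = 1 / np.sqrt(20)            # a fixed 20-sparse unit vector

def top_cov_eig(S):
    # Largest eigenvalue of the sample covariance (1/m) S^T S.
    return float(np.linalg.eigvalsh(S.T @ S / S.shape[0]).max())

null = rng.standard_normal((m, d))
spiked = rng.standard_normal((m, d)) + np.sqrt(lam) * np.outer(rng.standard_normal(m), u)

e_null, e_spiked = top_cov_eig(null), top_cov_eig(spiked)
```

With these parameters, e_null should sit near the Marchenko-Pastur edge (1 + √(d/m))² ≈ 1.5, while e_spiked sits near 1 + λ = 3, so thresholding the top eigenvalue distinguishes the two distributions without using sparsity at all.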
In a seminal work, [BHK+16] proved Sum-of-Squares lower bounds for the planted clique problem. Our machinery greatly generalizes the techniques of that paper. That said, for a technical reason, our machinery actually doesn't quite handle planted clique (see Remark 3.35).

Similar to this paper, [HKP+17] also observed that the techniques used in [BHK+16] can be used to give Sum-of-Squares lower bounds for ±1 variants of tensor PCA and sparse PCA, though this is not made explicit. In this paper, we use our machinery to make these lower bounds explicit. We also handle the Wishart model of sparse PCA, which is significantly harder to prove lower bounds for.

[KMOW17] proved that for random constraint satisfaction problems (CSPs) where the predicate has a balanced pairwise independent distribution of solutions, with high probability, degree Ω(n) SoS is required to certify that these CSPs do not have a solution. While the pseudo-expectation values used by [KMOW17] can also be derived using pseudo-calibration, the analysis for showing that the moment matrix is PSD is very different. It is an interesting question whether or not it is possible to unify these analyses.

[MRX20] showed that it's possible to lift degree 2 SoS solutions to degree 4 SoS solutions under suitable conditions, and used this to obtain degree 4 SoS lower bounds for average case d-regular Max-Cut and the Sherrington-Kirkpatrick problem. Their construction is inspired by pseudo-calibration and their analysis also goes via graph matrices.

Recently, [GJJ+20] proved degree n^ε SoS lower bounds for the Sherrington-Kirkpatrick problem via an intermediate problem known as Planted Affine Planes. Their construction and analysis also go via pseudo-calibration and graph matrices, but since the constructed moment matrix had a nontrivial nullspace, they had to use different techniques to handle it. However, once this nullspace is taken into account, the moment matrix is dominated by its expected value, so using our machinery would be overkill.

[Kun20] recently proposed a technique to lift degree 2 SoS lower bounds to higher levels and applied it to construct degree 4 lower bounds for the Sherrington-Kirkpatrick problem. Interestingly, their construction does not go via pseudo-calibration.

Low degree polynomials
Consider a problem where the input is sampled from one of two distributions and we would like to identify which distribution it was sampled from. Usually, one distribution is the random distribution while the other is a planted distribution that contains a given structure not present in the random distribution. In this setting, a closely related method is to use low degree polynomials to try and distinguish the two distributions. More precisely, if there is a low degree polynomial whose expected value on the random distribution is very different from its expected value on the planted distribution, this polynomial distinguishes the two distributions. Recently, this method has been shown to be an excellent heuristic, as it recovers the conjectured hardness thresholds for several problems and is considerably easier to analyze [HKP+17, Hop18, KWB19].

That said, it is an open question whether low degree polynomials generally have the same power as the SoS hierarchy or if there are situations where the SoS hierarchy is more powerful. In this paper, we confirm that for tensor PCA and the Wishart model of sparse PCA with slightly adjusted planted distributions, the SoS hierarchy is no more powerful than low-degree polynomials.
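The low degree method can be illustrated on planted slightly denser subgraph: already the degree-1 polynomial counting edges separates the two distributions once (p − 1/2)·k² is much larger than n. The parameters in this Python sketch are our own toy choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, p, trials = 100, 40, 0.8, 200

def sample_edges(planted):
    upper = np.triu(rng.random((n, n)) < 0.5, 1)
    if planted:
        S = np.flatnonzero(rng.random(n) < k / n)
        for a in range(len(S)):
            for b in range(a + 1, len(S)):
                upper[S[a], S[b]] = rng.random() < p
    # Degree-1 polynomial in the edge indicators: the total edge count.
    return int(upper.sum())

null_vals = np.array([sample_edges(False) for _ in range(trials)])
plant_vals = np.array([sample_edges(True) for _ in range(trials)])
gap = plant_vals.mean() - null_vals.mean()
```

The planted mean exceeds the null mean by several null standard deviations, so thresholding this single polynomial already distinguishes the distributions in this (easy) parameter range; the interesting regimes are those where no low degree polynomial has such a gap.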
The Statistical Query Model
The statistical query model is another popular restricted class of algorithms, introduced by [Kea98]. In this model, we can access an underlying distribution only by querying expected values of functions of the distribution. Concretely, for a distribution D on R^n, we have access to it via an oracle that, given as query a function f : R^n → [−1, 1], returns E_{x∼D} f(x) up to some additive adversarial error. SQ algorithms capture a broad class of algorithms in statistics and machine learning and have also been used to study information-computation tradeoffs. There has also been significant work trying to understand the limits of SQ algorithms (e.g. [FGR+17, FPV18, DKS17]). The recent work [BBH+20] showed that low degree polynomials and statistical query algorithms have equivalent power under mild conditions. It's an interesting open question whether or not SQ algorithms have the same power as Sum-of-Squares algorithms.
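The oracle access described above can be sketched in a few lines of Python. This is a toy model of our own (the adversarial error is modeled here by a random perturbation, and the distribution is approximated by samples), meant only to make the interface concrete.

```python
import numpy as np

def sq_oracle(dist_samples, f, tau, rng):
    # Models an SQ oracle: answers E_{x ~ D} f(x) for a query
    # f : R^n -> [-1, 1], up to additive error of magnitude at most tau
    # (adversarial in general; random here for illustration).
    est = float(np.mean([f(x) for x in dist_samples]))
    return est + float(rng.uniform(-tau, tau))

rng = np.random.default_rng(7)
data = rng.standard_normal((10000, 3))   # stand-in for D = N(0, I_3)
ans = sq_oracle(data, lambda x: float(np.clip(x[0], -1, 1)), tau=0.01, rng=rng)
```

An SQ algorithm may interact with D only through such queries, which is what makes SQ lower bounds a meaningful restriction on algorithmic power.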
The remainder of this paper is organized as follows. In Section 2, we give some preliminaries. In particular, we describe the Sum-of-Squares hierarchy and present a brief overview of the machinery and some proof techniques that we use. In Section 3, we present the informal statement of the main theorem. In Section 4, Section 5 and Section 6, we qualitatively verify the conditions of the machinery for planted slightly denser subgraph, tensor PCA, and sparse PCA respectively. While these sections only verify the qualitative conditions, the results in these sections are precise and will be reused in Section 11, Section 12 and Section 13 to fully verify the conditions of the machinery. In Section 7, we introduce all the formal definitions and state the main theorem in full generality. In Section 8, we prove the main theorem while abstracting out the choice of several functions. In Section 9, we choose these functions so that they satisfy the conditions needed for our main theorem. In Section 10, we give tools for verifying a technical condition of our machinery which is related to truncation error. Finally, in Section 11, Section 12 and Section 13, we verify all the conditions necessary to prove Theorem 1.1, Theorem 1.3 and Theorem 1.5 respectively.
2 Preliminaries

2.1 The Sum-of-Squares hierarchy

We will first introduce the notion of a pseudoexpectation operator for a set of polynomial constraints and then describe the Sum-of-Squares relaxation for a polynomial optimization problem. For an integer d, let R^{≤d}[x_1, …, x_n] be the set of polynomials on x_1, …, x_n of degree at most d. We denote the degree of a polynomial f ∈ R[x_1, …, x_n] by deg(f).

Definition 2.1 (Pseudoexpectation operator). Consider polynomial constraints g_1(x) = 0, …, g_m(x) = 0 on variables x_1, …, x_n such that deg(g_i) ≤ D for an integer D ≥ 0. For an even integer d ≥ D, a degree-d pseudoexpectation operator ˜E satisfying these constraints is an operator ˜E : R^{≤d}[x_1, …, x_n] → R satisfying:
1. ˜E[1] = 1,
2. ˜E is an R-linear operator, i.e., ˜E[f + cg] = ˜E[f] + c·˜E[g] for every f, g ∈ R^{≤d}[x_1, …, x_n], c ∈ R,
3. ˜E[g_i · f] = 0 for every i = 1, …, m and f ∈ R^{≤d}[x_1, …, x_n] with deg(f · g_i) ≤ d,
4. ˜E[f²] ≥ 0 for every f ∈ R^{≤d}[x_1, …, x_n] with deg(f) ≤ d/2.

The notion of ˜E generalizes the standard expectation operator. The idea is that optimizing over this larger space of pseudoexpectation operators can be formulated as a semidefinite programming problem and hence will serve as a relaxation of our program that can be solved efficiently.

Formally, consider an optimization task on n variables x_1, …, x_n ∈ R formulated as maximizing a polynomial f(x) subject to polynomial constraints g_1(x) = 0, …, g_m(x) = 0. Suppose all the polynomials f, g_1, …, g_m have degree at most D. Then, for an even integer d ≥ D, the degree d Sum-of-Squares relaxation of this program is as follows: over all degree-d pseudoexpectation operators ˜E satisfying the constraints g_1(x) = 0, …, g_m(x) = 0, output the maximum value of ˜E[f(x)].

To prove an SoS lower bound, we need to exhibit an operator ˜E that satisfies these constraints such that the optimum value ˜E[f(x)] is far away from the true optimum. In most cases, when constructing ˜E, Item 4 in Definition 2.1 is the most technically challenging condition to satisfy. It can be equivalently stated as a positive semidefiniteness condition on an associated matrix called the moment matrix. To define the moment matrix, we need to set up some more notation. For an integer d ≥ 0, let I_d denote the set of all tuples (t_1, …, t_n) such that t_i ≥ 0 for all i and ∑ t_i ≤ d. For I = (t_1, …, t_n) ∈ I_d, denote x^I := x_1^{t_1} x_2^{t_2} ⋯ x_n^{t_n}.

Definition 2.2 (Moment Matrix of ˜E). For a degree d pseudoexpectation operator ˜E on variables x_1, …, x_n, define the associated moment matrix Λ to be the matrix with rows and columns indexed by I_{d/2} such that the entry corresponding to row I and column J is Λ[I, J] := ˜E[x^I · x^J].

It is easy to verify that Item 4 in Definition 2.1 is equivalent to Λ ⪰ 0. The machinery in this paper provides general conditions under which we can show that, with high probability, Λ ⪰ 0.

2.2 Pseudo-calibration

Pseudo-calibration is a heuristic developed in [BHK+16] to construct a candidate pseudoexpectation operator ˜E and a corresponding moment matrix Λ for random vs planted problems. This will be the starting point for all our applications. Here, we will describe the heuristic and show an example of how to use it.

Let ν denote the random distribution and µ denote the planted distribution. Let v denote the input and x denote the variables for our SoS relaxation. For an input v sampled from ν and any polynomial f(x) of degree at most the SoS degree, pseudo-calibration proposes that for any low-degree test g(v), the correlation of ˜E[f] with g should match in the planted and random distributions. That is,

E_{v∼ν}[˜E[f(x)] g(v)] = E_{(x,v)∼µ}[f(x) g(v)]

Let F denote the Fourier basis of polynomials for the input v. By choosing different basis functions from F as choices for g such that the degree is at most n^ε (hence the term low-degree test), we get all lower order Fourier coefficients of ˜E[f(x)] when considered as a function of v. Furthermore, the higher order coefficients are set to be 0, so that the candidate pseudoexpectation operator can be written as

˜E[f(x)] = ∑_{g∈F, deg(g)≤n^ε} E_{v∼ν}[˜E[f(x)] g(v)] · g(v) = ∑_{g∈F, deg(g)≤n^ε} E_{(x,v)∼µ}[f(x) g(v)] · g(v)

The coefficients E_{(x,v)∼µ}[f(x) g(v)] can be explicitly computed in many settings, which therefore gives an explicit pseudoexpectation operator ˜E.

An advantage of pseudo-calibration is that this construction automatically satisfies some nice properties that the pseudoexpectation ˜E should satisfy. It is linear in f by construction. For all polynomial equalities of the form f(x) = 0 that are satisfied in the planted distribution, it's true that ˜E[f(x)] = 0. For other polynomial equalities of the form f(x, v) = 0 that are satisfied in the planted distribution, the equality ˜E[f(x, v)] = 0 is approximately satisfied.
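Definition 2.2 and its equivalence with Item 4 of Definition 2.1 can be sanity-checked numerically: when ˜E is a genuine expectation over an actual set of solutions, the moment matrix is automatically PSD. Here is a minimal Python sketch at degree 2 (monomials of degree at most 1) over the Boolean cube; all concrete choices are illustrative.

```python
import itertools
import numpy as np

n = 3
solutions = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n)]

# Monomials x^I with |I| <= 1: the constant 1 and each variable x_i.
monomials = [lambda x: 1.0] + [lambda x, i=i: x[i] for i in range(n)]

def moment_matrix(points):
    # Lambda[I, J] = E[x^I * x^J] under the uniform distribution on `points`.
    M = np.zeros((len(monomials), len(monomials)))
    for x in points:
        v = np.array([mono(x) for mono in monomials])
        M += np.outer(v, v)
    return M / len(points)

Lam = moment_matrix(solutions)
min_eig = float(np.linalg.eigvalsh(Lam).min())
```

Here min_eig is nonnegative (up to floating point) and Λ[0, 0] = ˜E[1] = 1. The difficulty in an SoS lower bound is producing a pseudoexpectation whose Λ stays PSD even though it does not come from any actual distribution over solutions.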
In most cases, ˜E can be mildly adjusted to satisfy these equalities exactly. The condition ˜E[1] = 1 is not automatically satisfied, but in most applications we usually require that ˜E[1] = 1 ± o(1). Indeed, this has been the case for all known successful applications of pseudo-calibration. Once we have this, we simply set our final pseudoexpectation operator to be ˜E′ defined as ˜E′[f(x)] = ˜E[f(x)]/˜E[1]. We remark that the condition ˜E[1] = 1 ± o(1) has been quite successful in predicting the right thresholds between approximability and inapproximability [HKP+17, Hop18, KWB19].

Example: Planted Clique
As a warmup, we review the pseudo-calibration calculation for planted clique. Here, the random distribution ν is G(n, 1/2). The planted distribution µ is as follows. For a given integer k, first sample G′ from G(n, 1/2), then choose a random subset S of the vertices where each vertex is picked independently with probability k/n. For all pairs i, j of distinct vertices in S, add the edge (i, j) to the graph if it is not already present. Set G to be the resulting graph.

The input is given by G ∈ {−1, 1}^{([n] choose 2)}, where G_{i,j} is 1 if the edge (i, j) is present and −1 otherwise. Let x_1, …, x_n be the boolean variables for our SoS program such that x_i indicates whether i is in the clique.

Definition 2.3.
Given a set of vertices V ⊆ [n], define x_V = ∏_{v∈V} x_v.

Definition 2.4.
Given a set of possible edges E ⊆ ([n] choose 2), define χ_E = (−1)^{|E \ E(G)|} = ∏_{(i,j)∈E} G_{i,j}.

Pseudo-calibration says that for all small V and E,

E_{G∼ν}[˜E[x_V] χ_E] = E_µ[x_V χ_E]

Using standard Fourier analysis, this implies that if we take

c_E = E_µ[x_V χ_E] = (k/n)^{|V ∪ V(E)|}

where V(E) is the set of the endpoints of the edges in E, then for all small V,

˜E[x_V] = ∑_{E : E is small} c_E χ_E = ∑_{E : E is small} (k/n)^{|V ∪ V(E)|} χ_E

Since the values of ˜E[x_V] are known, by multi-linearity, this can be naturally extended to obtain values ˜E[f(x)] for any polynomial f of degree at most the SoS degree.

2.3 Overview of our machinery

In this section, we describe some ideas behind our machinery. As explained above, pseudo-calibration gives us a candidate pseudoexpectation operator ˜E, and we can consider a corresponding moment matrix Λ. For example, in the case of planted clique, Λ has rows and columns indexed by sub-tuples of [n] of size at most d/2, such that

Λ[I, J] = ∑_{E : E is small} (k/n)^{|I ∪ J ∪ V(E)|} χ_E

for all sub-tuples I, J of [n] of size at most d/2. We would like to prove that Λ ⪰ 0 with high probability. The machinery gives a set of conditions under which we can show that Λ ⪰ 0 with high probability. Our first step is to decompose Λ into a linear combination of graph matrices.

Graph matrices. Graph matrices were originally introduced by [BHK+16, MP16] and later generalized in [AMP20]. We use the generalized graph matrices in our analysis. Each graph matrix is a matrix valued function of the input that can be identified by a graph with labeled edges, which we call a shape. Informally, graph matrices form a basis for all matrix valued functions of the input that have a certain symmetry. In particular, Λ is one such matrix valued function and can thus be decomposed into graph matrices. For a shape α, the graph matrix associated to α is denoted by M_α.

Graph matrices have several useful properties. Firstly, ‖M_α‖ can be bounded with high probability in terms of simple combinatorial properties of the shape α. Secondly, when we multiply two graph matrices M_α, M_β corresponding to shapes α, β, the product approximately equals the graph matrix M_{α∘β}, where the shape α∘β, called the composition of the two shapes, is easy to describe combinatorially. These properties make graph matrices a convenient tool for analyzing the moment matrix. In our setting, the moment matrix decomposes as Λ = ∑_α λ_α M_α, where the sum is over all shapes α and the λ_α ∈ R are the coefficients that arise from pseudo-calibration.

Decomposing Shapes
For shapes α, β, we have M_α M_β ≈ M_{α∘β}, where we define the composition α∘β of two shapes to be a larger shape obtained by concatenating the shapes α and β. This equality is only approximate, and handling it precisely is a significant source of difficulty in our analysis. Shape composition is also associative, hence we can define the composition of three shapes.

A crucial idea for our machinery is that for any shape α, there exists a canonical and unique decomposition of α as σ ∘ τ ∘ σ′^T satisfying some nice structural properties, for shapes σ, τ and σ′^T. Here, σ, τ, σ′^T are called the left part, the middle part and the right part of α respectively. Using this, our moment matrix can be written as

Λ = ∑_α λ_α M_α = ∑_{σ,τ,σ′} λ_{σ∘τ∘σ′^T} M_{σ∘τ∘σ′^T}

Giving a PSD factorization
We first consider the terms

    ∑_{σ,σ'} λ_{σ∘σ'^T} M_{σ∘σ'^T} ≈ ∑_{σ,σ'} λ_{σ∘σ'^T} M_σ M_{σ'}^T

where τ corresponds to an identity matrix and can be ignored. If there existed real numbers v_σ for all left shapes σ such that λ_{σ∘σ'^T} = v_σ v_{σ'}, then we would have

    ∑_{σ,σ'} λ_{σ∘σ'^T} M_σ M_{σ'}^T = ∑_{σ,σ'} v_σ v_{σ'} M_σ M_{σ'}^T = (∑_σ v_σ M_σ)(∑_σ v_σ M_σ)^T ⪰ 0

which shows that the contribution from these terms is positive semidefinite. Note that the existence of such v_σ can be relaxed as follows. Let H be the matrix with rows and columns indexed by left shapes σ such that H(σ, σ') = λ_{σ∘σ'^T}. If H is positive semidefinite, then the contribution from these terms will also be positive semidefinite. In fact, this will be the first condition of our main theorem, Theorem 7.101.

Handling terms with a non-trivial middle part
Unfortunately, we also have terms λ_{σ∘τ∘σ'^T} M_{σ∘τ∘σ'^T} where τ is non-trivial. Our strategy will be to charge these terms to other terms. A starting point for our argument is the following inequality. For a left shape σ, middle shape τ, right shape σ'^T, and real numbers a, b, expanding (a M_σ − b M_{σ'} M_τ^T)(a M_σ − b M_{σ'} M_τ^T)^T ⪰ 0 gives

    ab (M_σ M_τ M_{σ'}^T + (M_σ M_τ M_{σ'}^T)^T) ⪯ a² M_σ M_σ^T + b² M_{σ'} M_τ^T M_τ M_{σ'}^T ⪯ a² M_σ M_σ^T + b² ‖M_τ‖² M_{σ'} M_{σ'}^T

If it is true that λ²_{σ∘τ∘σ'^T} ‖M_τ‖² ≤ λ_{σ∘σ^T} λ_{σ'∘σ'^T}, then we can choose a, b such that a² ≤ λ_{σ∘σ^T}, b²‖M_τ‖² ≤ λ_{σ'∘σ'^T}, and ab = λ_{σ∘τ∘σ'^T}. This will approximately imply

    λ_{σ∘τ∘σ'^T} (M_{σ∘τ∘σ'^T} + M^T_{σ∘τ∘σ'^T}) ⪯ λ_{σ∘σ^T} M_{σ∘σ^T} + λ_{σ'∘σ'^T} M_{σ'∘σ'^T}

which will give us a way to charge terms with a non-trivial middle part against terms with a trivial middle part. While we could try to apply this inequality term by term, it is not strong enough to give us our results. Instead, we generalize this inequality. This will lead us to the second condition of our main theorem, Theorem 7.101.

Handling intersection terms
There is one important technicality in the above heuristic calculations. Whenever we multiply two graph matrices M_α, M_β, the product is only approximately equal to M_{α∘β}. All the other error terms have to be carefully handled in our analysis. We call these terms intersection terms. These intersection terms themselves turn out to be graph matrices, and our strategy is to recursively decompose them as σ ∘ τ ∘ σ'^T and apply the previous ideas. A similar approach was undertaken in [BHK+
16], but this work generalizes it significantly. To do this methodically, we employ several ideas such as the notion of intersection patterns and the generalized intersection tradeoff lemma (see Section 8). Properly handling the intersection terms is one of the most technically intensive parts of our work. This analysis leads us to condition 3 of Theorem 7.101.
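The two charging steps sketched above can be sanity-checked numerically. In the snippet below the matrices are random stand-ins for the graph matrices M_σ and M_{σ'} M_τ^T (not actual graph matrices); it verifies that rank-one coefficient matrices give a Gram (hence PSD) combination, and that the a, b charging inequality is exactly a rearrangement of a PSD square.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_left = 6, 4

# (1) If H(sigma, sigma') = v_sigma * v_sigma', the combined term is a Gram matrix.
M = [rng.standard_normal((d, d)) for _ in range(num_left)]  # stand-ins for M_sigma
v = rng.standard_normal(num_left)
S = sum(v[i] * v[j] * M[i] @ M[j].T for i in range(num_left) for j in range(num_left))
P = sum(v[i] * M[i] for i in range(num_left))
assert np.allclose(S, P @ P.T)  # hence S is PSD

# (2) Charging inequality: expanding (aA - bB)(aA - bB)^T >= 0 gives
#     ab(AB^T + BA^T) <= a^2 AA^T + b^2 BB^T in the PSD order.
A = rng.standard_normal((d, d))  # stand-in for M_sigma
B = rng.standard_normal((d, d))  # stand-in for M_sigma' M_tau^T
a, b = 0.7, 1.3
gap = a**2 * A @ A.T + b**2 * B @ B.T - a * b * (A @ B.T + B @ A.T)
assert np.allclose(gap, (a * A - b * B) @ (a * A - b * B).T)
assert np.linalg.eigvalsh((gap + gap.T) / 2).min() > -1e-9
```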
Applying the machinery
To apply the machinery to our problems of interest, we verify the spectral conditions that our coefficients should satisfy so that we can use Theorem 7.101. The planted slightly denser subgraph application is straightforward and will serve as a good warmup for understanding our machinery. In the applications to Tensor PCA and Sparse PCA, the shapes corresponding to the graph matrices with nonzero coefficients have nice structural properties that will be crucial for our analysis. We exploit this structure and use novel charging arguments to verify the conditions of our machinery.
In this section, we informally describe our machinery for proving sum of squares lower bounds on plantedproblems. Our goal for this section is to qualitatively state the conditions under which we can show thatthe moment matrix Λ is PSD with high probability (see Theorem 3.32). For simplicity, in this section werestrict ourselves to the setting where the input is {−
1,1}^([n] choose 2) (i.e., a random graph on n vertices). We also defer the proofs of several important facts until Section 7. In Section 7, we give the general definitions, fill in the missing proofs, and give the full, quantitative statement of our main result (see Theorem 7.101).

3.1 Fourier analysis for matrix-valued functions: ribbons, shapes, and graph matrices

For our machinery, we need the definitions of ribbons, shapes, and graph matrices from [AMP20].
Ribbons lift the usual Fourier basis for functions {f : {±1}^([n] choose 2) → R} to matrix-valued functions.

Definition 3.1 (Simplified ribbons – see Definition 7.22). Let n ∈ N. A ribbon R is a tuple (E_R, A_R, B_R) where E_R ⊆ ([n] choose 2) and A_R, B_R are tuples of elements of [n]. R thus specifies:
1. A Fourier character χ_{E_R}.
2. Row and column indices A_R and B_R.
We think of R as a graph with vertices V(R) = {endpoints of (i,j) ∈ E_R} ∪ A_R ∪ B_R and edges E(R) = E_R, where A_R, B_R are distinguished tuples of vertices.

Definition 3.2 (Matrix-valued function for a ribbon R). Given a ribbon R, we define the matrix-valued function M_R : {±1}^([n] choose 2) → R^{n!/(n−|A_R|)! × n!/(n−|B_R|)!} to have entries M_R(A_R, B_R) = χ_{E_R} and M_R(A', B') = 0 whenever A' ≠ A_R or B' ≠ B_R.

The following proposition captures the main property of the matrix-valued functions M_R – they are an orthonormal basis. We leave the proof to the reader.

Proposition 3.3.
The matrix-valued functions M_R form an orthonormal basis for the vector space of matrix-valued functions, with respect to the inner product ⟨M, M'⟩ = E_{G∼{±1}^([n] choose 2)}[Tr(M(G)(M'(G))^⊤)].

As described above, ribbons are an orthonormal basis for matrix-valued functions. However, we will need an orthogonal basis for the subset of those functions which are symmetric with respect to the action of S_n. For this, we use graph matrices, which are described by shapes. The idea is that each ribbon R has a shape α which is obtained by replacing the vertices of R with unspecified indices. Up to scaling, the graph matrix M_α is the average of M_{π(R)} over all permutations π ∈ S_n.

Definition 3.4 (Simplified shapes – see Definition 7.34). Informally, a shape α is just a ribbon R where the vertices are specified by variables rather than having specific values in [n]. More precisely, a shape α = (V(α), E(α), U_α, V_α) is a graph on vertices V(α), with
1. Edges E(α) ⊆ (V(α) choose 2)
2. Distinguished tuples of vertices U_α = (u_1, u_2, . . .) and V_α = (v_1, v_2, . . .), where u_i, v_i ∈ V(α). (Note that V(α) and V_α are not the same object!)

Definition 3.5 (Shape transposes). Given a shape α, we define α^⊤ to be the shape α with U_α and V_α swapped, i.e., U_{α^⊤} = V_α and V_{α^⊤} = U_α. Note that M_{α^⊤} = M^⊤_α, where M^⊤_α is the usual transpose of the matrix-valued function M_α.

Definition 3.6 (Graph matrices). Let α be a shape. The graph matrix M_α : {±1}^([n] choose 2) → R^{n!/(n−|U_α|)! × n!/(n−|V_α|)!} is defined to be the matrix-valued function with (A, B)-th entry

    M_α(A, B) = ∑_{R : A_R = A, B_R = B, ∃ injective ϕ : V(α) → [n] with ϕ(α) = R} χ_{E_R}

In other words, M_α = ∑_R M_R where the sum is over ribbons R which can be obtained by assigning each vertex in V(α) a distinct label from [n]. For examples of graph matrices, see [AMP20].
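As a concrete (hypothetical) toy example of Definition 3.6: take the shape α with U_α = (u), V_α = (v), and a single edge {u, v}. Summing over all ribbons, which assign distinct labels i ≠ j to u, v, shows that M_α is just the ±1 adjacency matrix of the input with zero diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
G = np.triu(rng.choice([-1, 1], size=(n, n)), 1)
G = G + G.T  # symmetric ±1 input on pairs, zero diagonal

# Shape alpha with one edge {u, v}: each ribbon picks distinct labels i != j,
# contributing chi_{(i,j)} = G[i, j] to entry (i, j) of M_alpha.
M_alpha = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:  # injectivity of the labeling phi
            M_alpha[i, j] = G[i, j]

assert np.allclose(M_alpha, G)  # G already has zero diagonal
```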
Remark 3.7.
As noted in [AMP20], we index graph matrices by tuples rather than sets so that they are symmetric (as a function of the input) under permutations of [n]. A crucial idea in our analysis is the idea from [BHK+
16] of decomposing each shape α into left, middle, and right parts. This will allow us to give an approximate factorization of each graph matrix M_α. For each shape α, we will identify three other shapes, which we denote by σ, τ, σ'^T and call (for reasons we will see soon) the left, middle, and right parts of α, respectively. The idea is that M_α ≈ M_σ M_τ M_{σ'^T}. We obtain σ, τ, and σ'^T by splitting the shape α along the leftmost and rightmost minimum vertex separators.

Definition 3.8 (Vertex Separators). We say that a set of vertices S is a vertex separator of α if every path from U_α to V_α in α (including paths of length 0) intersects S. Note that for any vertex separator S, U_α ∩ V_α ⊆ S.

Definition 3.9 (Minimum Vertex Separators). We say that S is a minimum vertex separator of α if S is a vertex separator of α and for any other vertex separator S' of α, |S| ≤ |S'|.

Definition 3.10 (Leftmost and Rightmost Minimum Vertex Separators).
1. We say that S is the leftmost minimum vertex separator of α if S is a minimum vertex separator of α and for every other minimum vertex separator S' of α, every path from U_α to S' intersects S.
2. We say that T is the rightmost minimum vertex separator of α if T is a minimum vertex separator of α and for every other minimum vertex separator S' of α, every path from S' to V_α intersects T.
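These definitions can be checked by brute force on a small hypothetical shape, treating it as an ordinary graph with distinguished tuples U_α, V_α and testing every vertex subset. In the toy shape below (two parallel paths from U_α = (1) to V_α = (4)), the minimum separators turn out to be {1} and {4}; {1} is the leftmost one and {4} the rightmost.

```python
from itertools import combinations

def is_separator(vertices, edges, U, V, S):
    # S separates U_alpha from V_alpha if every path from U to V meets S;
    # a shared vertex is a path of length 0, so U ∩ V ⊆ S is required.
    S = set(S)
    if not set(U) & set(V) <= S:
        return False
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen = set(U) - S          # flood-fill from U while avoiding S
    stack = list(seen)
    while stack:
        x = stack.pop()
        for y in adj[x] - S:
            if y not in seen:
                seen.add(y); stack.append(y)
    return not (seen & (set(V) - S))

# Toy shape: U=(1,), V=(4,), paths 1-2-3-4 and 1-5-4.
vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4)]
U, V = (1,), (4,)
min_seps = []
for r in range(len(vertices) + 1):
    for S in combinations(vertices, r):
        if is_separator(vertices, edges, U, V, S):
            min_seps.append(set(S))
    if min_seps:   # all separators of minimum size found
        break
assert len(min_seps) == 2 and {1} in min_seps and {4} in min_seps
```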
It is not immediately obvious that leftmost and rightmost minimum vertex separators are well-defined. For the simplified setting we are considering here, this was shown by [BHK+16]. We now describe how to decompose α into left, middle, and right parts σ, τ, and σ'^T.

Definition 3.11 (Decomposition Into Left, Middle, and Right Parts). Let α be a shape and let S and T be the leftmost and rightmost minimum vertex separators of α. Given orderings O_S and O_T for S and T, we decompose α into left, middle, and right parts σ, τ, and σ'^T as follows.
1. The left part σ of α is the part of α reachable from U_α without passing through S. It includes S but excludes all edges which are entirely within S. More formally,
(a) V(σ) = {u ∈ V(α) : there is a path P from U_α to u in α such that (V(P) \ {u}) ∩ S = ∅}
(b) U_σ = U_α and V_σ = S with the ordering O_S
(c) E(σ) = {{u,v} ∈ E(α) : u, v ∈ V(σ), u ∉ S or v ∉ S}
2. The right part σ'^T of α is the part of α reachable from V_α without passing through T. It includes T but excludes all edges which are entirely within T. More formally,
(a) V(σ'^T) = {u ∈ V(α) : there is a path P from V_α to u in α such that (V(P) \ {u}) ∩ T = ∅}
(b) U_{σ'^T} = T with the ordering O_T and V_{σ'^T} = V_α
(c) E(σ'^T) = {{u,v} ∈ E(α) : u, v ∈ V(σ'^T), u ∉ T or v ∉ T}
3. The middle part τ of α is, informally, the part of α between S and T (including S and T and all edges which are entirely within S or within T). More formally, let U_τ = S with the ordering O_S, let V_τ = T with the ordering O_T, and let E(τ) = E(α) \ (E(σ) ∪ E(σ')) be all of the edges of E(α) which do not appear in E(σ) or E(σ'). Then V(τ) is all of the vertices incident to edges in E(τ), together with S and T.

Remark 3.12.
Note that the decomposition into left, middle, and right parts depends on the orderings of the vertices in S and T. As we will discuss later (see Section 7.8), we will use all possible orderings simultaneously and then scale things by an appropriate constant.

Because of the minimality and leftmost/rightmost-ness of the vertex separators S, T used to define σ, τ, σ', the shapes σ, τ, σ' have some special combinatorial structure, which we capture in the following proposition. We defer the proof until Section 7, where we state a generalized version.

Proposition 3.13. σ, τ, and σ'^T have the following properties:
1. V_σ = S is the unique minimum vertex separator of σ.
2. S and T are the leftmost and rightmost minimum vertex separators of τ.
3. T = U_{σ'^T} is the unique minimum vertex separator of σ'^T.

Based on this, we define sets of shapes which can appear as left, middle, or right parts.
Definition 3.14 (Left, Middle, and Right Parts). Let α be a shape.
1. We say that α is a left part if V_α is the unique minimum vertex separator of α and E(α) has no edges which are entirely contained in V_α.
2. We say that α is a proper middle part if U_α is the leftmost minimum vertex separator of α and V_α is the rightmost minimum vertex separator of α.
3. We say that α is a right part if U_α is the unique minimum vertex separator of α and E(α) has no edges which are entirely contained in U_α.

Remark 3.15.
For technical reasons, later on we will need to consider improper middle parts τ where U_τ and V_τ are not the leftmost and rightmost minimum vertex separators of τ, which is why we make this distinction here. The following proposition is also straightforward from the definitions.
Proposition 3.16.
A shape σ is a left part if and only if σ^T is a right part.

We now analyze what happens when we take products of graph matrices. Roughly speaking, we will have that if α can be decomposed into left, middle, and right parts σ, τ, and σ'^T, then M_α ≈ M_σ M_τ M_{σ'^T}. However, this is only an approximation rather than an equality, and this will be the source of considerable technical difficulties. We begin with a concatenation operation on ribbons.

Definition 3.17 (Ribbon Concatenation). If R₁ and R₂ are two ribbons such that V(R₁) ∩ V(R₂) = B_{R₁} = A_{R₂} and either R₁ or R₂ contains no edges entirely within B_{R₁} = A_{R₂}, then we define R₁ ∘ R₂ to be the ribbon formed by gluing together R₁ and R₂ along B_{R₁} = A_{R₂}. In other words,
1. V(R₁ ∘ R₂) = V(R₁) ∪ V(R₂)
2. E(R₁ ∘ R₂) = E(R₁) ∪ E(R₂)
3. A_{R₁∘R₂} = A_{R₁} and B_{R₁∘R₂} = B_{R₂}.

The following proposition is easy to check.
Proposition 3.18.
Whenever R₁, R₂ are ribbons such that R₁ ∘ R₂ is defined, M_{R₁} M_{R₂} = M_{R₁∘R₂}.

We have an analogous definition for concatenating shapes:
Definition 3.19 (Shape Concatenation). If α₁ and α₂ are two shapes such that V(α₁) ∩ V(α₂) = V_{α₁} = U_{α₂} and either α₁ or α₂ contains no edges entirely within V_{α₁} = U_{α₂}, then we define α₁ ∘ α₂ to be the shape formed by gluing together α₁ and α₂ along V_{α₁} = U_{α₂}. In other words,
1. V(α₁ ∘ α₂) = V(α₁) ∪ V(α₂)
2. E(α₁ ∘ α₂) = E(α₁) ∪ E(α₂)
3. U_{α₁∘α₂} = U_{α₁} and V_{α₁∘α₂} = V_{α₂}.

The next proposition, again easy to check, shows that the shape concatenation operation respects the left/middle/right part decomposition.
Proposition 3.20. If α can be decomposed into left, middle, and right parts σ, τ, σ'^T, then α = σ ∘ τ ∘ σ'^T.
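Proposition 3.18 (the ribbon multiplication rule) can be checked concretely on a tiny input. Each ribbon matrix below has a single nonzero entry χ_{E_R} at position (A_R, B_R); the ribbons R₁, R₂ are hypothetical toy examples sharing exactly B_{R₁} = A_{R₂}, so R₁ ∘ R₂ is defined.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
G = np.triu(rng.choice([-1, 1], (n, n)), 1)
G = G + G.T  # ±1 input on pairs

def chi(E):
    """Fourier character of an edge set E."""
    out = 1
    for i, j in E:
        out *= G[i, j]
    return out

def M_ribbon(A, B, E, idx):
    """Matrix of a ribbon: single nonzero entry chi_E at position (A, B)."""
    M = np.zeros((len(idx), len(idx)))
    M[idx.index(A), idx.index(B)] = chi(E)
    return M

idx = [(i,) for i in range(n)]  # size-1 row/column index tuples
# R1 and R2 intersect exactly in B_R1 = A_R2 = (1,), so R1 ∘ R2 is defined.
M1 = M_ribbon((0,), (1,), [(0, 2), (2, 1)], idx)
M2 = M_ribbon((1,), (3,), [(1, 3)], idx)
M12 = M_ribbon((0,), (3,), [(0, 2), (2, 1), (1, 3)], idx)
assert np.allclose(M1 @ M2, M12)  # M_{R1} M_{R2} = M_{R1 ∘ R2}
```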
We now discuss why M_α = M_{σ∘τ∘σ'^T} ≈ M_σ M_τ M_{σ'^T} is only an approximation rather than an equality. Consider the difference M_σ M_τ M_{σ'^T} − M_{σ∘τ∘σ'^T}. The graph matrix M_{σ∘τ∘σ'^T} decomposes (by definition) into a sum over injective maps ϕ : V(σ∘τ∘σ'^T) → [n]. Also by expanding definitions, the product M_σ M_τ M_{σ'^T} expands into a sum over triples of injective maps (ϕ₁, ϕ₂, ϕ₃), where ϕ₁ : V(σ) → [n], ϕ₂ : V(τ) → [n], ϕ₃ : V(σ'^T) → [n], such that ϕ₁ and ϕ₂ agree on V_σ = U_τ and ϕ₂ and ϕ₃ agree on V_τ = U_{σ'^T}. If they are combined into one map ϕ : V(σ) ∪ V(τ) ∪ V(σ'^T) → [n], the resulting ϕ may not be injective because ϕ₁(V(σ)), ϕ₂(V(τ)), ϕ₃(V(σ'^T)) may have nontrivial intersection (beyond ϕ(V_σ) and ϕ(V_τ)). We call the resulting terms intersection terms, and handling them properly is a major part of the technical analysis.

Remark 3.21.
Actually, the approximation M_α = M_{σ∘τ∘σ'^T} ≈ M_σ M_τ M_{σ'^T} is also off by a multiplicative constant because there is also a subtle issue involving the automorphism groups of these shapes. For now, we ignore this issue. For details about this issue, see Lemma 7.78.

The idea for our analysis is as follows. Given a matrix-valued function Λ which is symmetric under permutations of [n], we write Λ = ∑_α λ_α M_α. We then break each shape α up into left, middle, and right parts σ, τ, and σ'^T. For this analysis, we use shape coefficient matrices H_τ whose rows and columns are indexed by left shapes and whose entries depend on the coefficients λ_α. We choose these matrices so that

    Λ = ∑_τ ∑_{σ,σ'} H_τ(σ, σ') M_{σ∘τ∘σ'^T} ≈ ∑_τ ∑_{σ,σ'} H_τ(σ, σ') M_σ M_τ M_{σ'}^T

To set this up, we separate the possible middle parts τ into groups based on the size of U_τ and whether or not they are trivial.

Definition 3.22.
We define I_mid to be the set of all possible U_τ. Here I_mid is the set of tuples of unspecified vertices of the form U = (u₁, . . . , u_k) where 0 ≤ k ≤ d.

Definition 3.23.
We say that a proper middle shape τ is trivial if E(τ) = ∅ and |U_τ ∩ V_τ| = |U_τ| = |V_τ| (i.e., V_τ is a permutation of U_τ).

For simplicity, the only proper trivial middle parts τ we consider are the shapes Id_U corresponding to identity matrices.

Definition 3.24.
Given a tuple of unspecified vertices U = (u₁, . . . , u_{|U|}), we define Id_U to be the shape where V(Id_U) = U, U_{Id_U} = V_{Id_U} = U, and E(Id_U) = ∅.

We group all of the proper non-trivial middle parts τ into sets M_U based on the size of U_τ.

Definition 3.25.
Given a tuple of unspecified vertices U = (u₁, . . . , u_{|U|}), we define M_U to be the set of proper non-trivial middle parts τ such that U_τ and V_τ have the same size as U. Note that U_τ and V_τ may intersect each other arbitrarily.

With these definitions, we can now define our shape coefficient matrices.

Definition 3.26.
Given U ∈ I_mid, we define L_U to be the set of left shapes σ such that |V_σ| = |U|.

Definition 3.27.
For each U ∈ I_mid, we define the shape coefficient matrix H_{Id_U} to be the matrix indexed by left shapes σ, σ' ∈ L_U with entries

    H_{Id_U}(σ, σ') = λ_{σ∘σ'^T} / |U|!

Definition 3.28.
For each U ∈ I_mid and each τ ∈ M_U, we define the shape coefficient matrix H_τ to be the matrix indexed by left shapes σ, σ' ∈ L_U with entries

    H_τ(σ, σ') = λ_{σ∘τ∘σ'^T} / (|U|!)²

With these shape coefficient matrices, we have the following decomposition of Λ = ∑_α λ_α M_α.

Lemma 3.29.

    Λ = ∑_{U∈I_mid} ∑_{σ,σ'∈L_U} H_{Id_U}(σ, σ') M_{σ∘σ'^T} + ∑_{U∈I_mid} ∑_{τ∈M_U} ∑_{σ,σ'∈L_U} H_τ(σ, σ') M_{σ∘τ∘σ'^T}

We defer the proof of this lemma to Lemma 7.81.

For technical reasons, we need to define one more operation to handle intersection terms. We call this operation the −γ, γ operation.

Definition 3.30.
Given U, V ∈ I_mid where |U| > |V|, we define Γ_{U,V} to be the set of left parts γ such that |U_γ| = |U| and |V_γ| = |V|.

Definition 3.31.
Given U, V ∈ I_mid where |U| > |V|, a shape coefficient matrix H_{Id_V}, and a γ ∈ Γ_{U,V}, we define the shape coefficient matrix H^{−γ,γ}_{Id_V} to be the matrix indexed by left shapes σ, σ' ∈ L_U with entries

    H^{−γ,γ}_{Id_V}(σ, σ') = H_{Id_V}(σ∘γ, σ'∘γ)

We are now ready to state a simplified, qualitative version of our main theorem. For the full, quantitative version of our main theorem, see Theorem 7.101.
Theorem 3.32.
There exist functions f(τ) : M_U → R and f(γ) : Γ_{U,V} → R, depending on n and other parameters, such that if Λ = ∑_α λ_α M_α and the following conditions hold:
1. For all U ∈ I_mid, H_{Id_U} ⪰ 0
2. For all U ∈ I_mid and all τ ∈ M_U,

    [ H_{Id_U}       f(τ) H_τ  ]
    [ f(τ) H_τ^T     H_{Id_U}  ]  ⪰ 0
3. For all U, V ∈ I_mid such that |U| > |V| and all γ ∈ Γ_{U,V}, H^{−γ,γ}_{Id_{V_γ}} ⪯ f(γ) H_{Id_{U_γ}}

then with probability at least 1 − o(1) over G ∼ {±1}^([n] choose 2), it holds that Λ(G) ⪰ 0.

Remark 3.33.
As we will demonstrate in the remainder of this paper, our machinery works well when the coefficients λ_α for each shape have an exponential decay in both |V(α)| and |E(α)|. However, since our machinery is highly technical with many different parts, it does not work as well if the coefficients have a different kind of decay.

3.5 An informal application to planted clique

Before we move on to further definitions needed for a more complete statement of the main theorem, we present an informal example.
Example 3.34.
When the pseudo-calibration method is applied to prove an SoS lower bound for the planted clique problem in n-node graphs with clique size k, as in [BHK+16], the resulting moment matrix is

    Λ = ∑_{α : |V(α)| ≤ t} (k/n)^{|V(α)|} M_α

where t ≈ log(n). One may then compute that the matrices H_{Id_U} and H_τ are as follows (at least so long as |V(σ)|, |V(τ)|, |V(σ')| ≪ t; we ignore this detail for now). For all r ∈ [d]:
1. For U with |U| = r, H_{Id_U}(σ, σ') = (k/n)^{|V(σ)|+|V(σ')|−r}
2. For all proper, non-trivial middle shapes τ such that |U_τ| = |V_τ| = r,

    H_τ(σ, σ') = (k/n)^{|V(σ)|+|V(σ')|+|V(τ)|−2r}

Defining v_r to be the vector such that v_r(σ) = (k/n)^{|V(σ)|−r/2}, we have that:
1. For U with |U| = r, H_{Id_U} = v_{|U|} v_{|U|}^T
2. For all proper, non-trivial middle shapes τ such that |U_τ| = |V_τ| = r, H_τ = (k/n)^{|V(τ)|−r} v_r v_r^T
3. For all left parts γ, H^{−γ,γ}_{Id_{V_γ}} = (k/n)^{2|V(γ)|−|U_γ|−|V_γ|} v_{|U_γ|} v_{|U_γ|}^T

It turns out in this setting that we can take f(τ) to be Õ(n^{(|V(τ)|−|U_τ|)/2}) and f(γ) to be Õ(n^{|V(γ)\U_γ|}). Thus, as long as k ≪ √n:
1. For any U and all τ such that V_τ ≠ U_τ with |U_τ| = |V_τ| = |U|, f(τ) H_τ ⪯ H_{Id_U}.
2. For all non-trivial left parts γ, H^{−γ,γ}_{Id_{V_γ}} ⪯ f(γ) H_{Id_{U_γ}}.
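For these rank-one coefficient matrices, conditions 1 and 2 of Theorem 3.32 can be checked numerically on a toy instance (the sizes and parameters below are hypothetical, chosen so that k ≪ √n):

```python
import numpy as np

n, k, r = 10_000, 50, 2          # toy regime with k << sqrt(n)
sizes = np.array([2, 3, 4, 5])   # |V(sigma)| for a few left shapes sigma
v = (k / n) ** (sizes - r / 2.0) # v_r(sigma) = (k/n)^{|V(sigma)| - r/2}
H_id = np.outer(v, v)            # H_{Id_U} = v_r v_r^T
Vtau = 4                          # |V(tau)| for one non-trivial middle shape
H_tau = (k / n) ** (Vtau - r) * H_id
f_tau = n ** ((Vtau - r) / 2.0)   # the claimed f(tau) ~ n^{(|V(tau)|-|U_tau|)/2}

# Condition 1: H_{Id_U} is PSD (it is rank one).
assert np.linalg.eigvalsh(H_id).min() > -1e-12
# Condition 2: the block matrix is PSD, since
# f(tau) * (k/n)^{|V(tau)|-r} = (k/sqrt(n))^{|V(tau)|-r} <= 1 when k <= sqrt(n).
blk = np.block([[H_id, f_tau * H_tau], [f_tau * H_tau.T, H_id]])
assert np.linalg.eigvalsh(blk).min() > -1e-12
```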
This does not quite satisfy the conditions of Theorem 3.32 because there are τ such that V_τ = U_τ but which are non-trivial because E(τ) ≠ ∅. For these τ, condition 2 of Theorem 3.32 fails. In order to prove their SoS lower bounds for planted clique, [BHK+
16] handle this issue by grouping together all of the τ where V_τ = U_τ into the indicator function for whether V_τ = U_τ is a clique. In this paper, we get around this issue by instead considering the planted slightly denser subgraph problem. This introduces an edge decay into the coefficients. For details, see Section 4.

In this section, we restricted ourselves to the case where the input is in {−
1,1}^([n] choose 2) for simplicity. However, for our results we will need to handle more general types of inputs. We now briefly describe which kinds of inputs we will need to handle and how we handle them.
1. In general, the entries of the input may be labeled by more than 2 indices. For example, for tensor PCA on order-3 tensors, the entries of the input are indexed by 3 indices. To handle this, we will have shapes which have hyperedges rather than edges.
2. In general, the entries of the input will come from a distribution Ω rather than being ±1. To handle this, we will take an orthonormal basis {h_k} for Ω. We will then give each edge/hyperedge a label l to specify which polynomial h_l should be applied to that entry of the input.
3. In general, there may be t different types of indices rather than just one type of index. In this case, the symmetry group will be S_{n₁} × · · · × S_{n_t} rather than S_n. To handle this, we will have shapes with different types of vertices.
We formally make these generalizations in Section 7.

We will describe some more notations and definitions that will be useful for describing the qualitative bounds for our applications. For each of our applications, we will describe the corresponding modifications needed to the definitions already in place and present new definitions where necessary.
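To illustrate point 2 above: given the moments of Ω, an orthonormal polynomial basis {h_k} can be obtained by a Cholesky factorization of the moment (Gram) matrix of the monomials (this construction is standard, not specific to this paper). For Ω = N(0,1) it recovers the normalized Hermite polynomials; the moments below are those of a standard Gaussian.

```python
import numpy as np

def orthonormal_poly_basis(moments, deg):
    # Gram matrix G[i][j] = E[x^i x^j] = moments[i+j] for monomials 1, x, ..., x^deg;
    # if p = B m (m the monomial vector), then E[p p^T] = B G B^T = I for B = L^{-1}.
    G = np.array([[moments[i + j] for j in range(deg + 1)] for i in range(deg + 1)])
    L = np.linalg.cholesky(G)   # G = L L^T
    return np.linalg.inv(L)     # row i holds the coefficients of the i-th basis poly

# Moments of N(0,1): 1, 0, 1, 0, 3, 0, 15
B = orthonormal_poly_basis([1, 0, 1, 0, 3, 0, 15], 2)
# Degree-2 element is (x^2 - 1)/sqrt(2), the normalized Hermite polynomial h_2.
assert np.allclose(B[2], [-1 / np.sqrt(2), 0, 1 / np.sqrt(2)])
```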
3.7.1 Planted slightly denser subgraph

Since the input is a graph G ∈ {−
1,1}^([n] choose 2), most of what we introduced already applies to this setting. To describe the moment matrix, we need to define the truncation parameters.

Definition 3.36 (Truncation parameters). For integers D_sos, D_V ≥ 0, we say that a shape α satisfies the truncation parameters D_sos, D_V if
- The degrees of the monomials that U_α and V_α correspond to are at most D_sos
- The left part σ, the middle part τ, and the right part σ' of α satisfy |V(σ)|, |V(τ)|, |V(σ')| ≤ D_V

3.7.2 Tensor PCA

We consider the input to be a tensor A ∈ R^{[n]^k}. The input entries are now sampled from the distribution N(
0, 1 ) instead of {−
1,1}. So, we will work with the Hermite basis of polynomials. Let the standard unnormalized Hermite polynomials be denoted h_0(x) = 1, h_1(x) = x, h_2(x) = x² −
1, . . . . Then, we work with the basis h_a(A) := ∏_{e ∈ [n]^k} h_{a_e}(A_e) over a ∈ N^{[n]^k}. Accordingly, we will modify the graphs that represent ribbons (and, by extension, shapes) to have labeled hyperedges of arity k. So, a hyperedge e with a label t will correspond to the Hermite polynomial h_t(A_e).

Definition 3.37 (Hyperedges). Instead of standard edges, we will have labeled hyperedges of arity k in the underlying graphs for our ribbons as well as shapes. The label of a hyperedge e, denoted l_e, is an element of N which corresponds to the Hermite polynomial being evaluated on that entry.

Note that our hyperedges are ordered since the tensor A is not necessarily symmetric.

For variables x₁, . . . , x_n, the rows and columns of our moment matrix will now correspond to monomials of the form ∏_{i≤n} x_i^{p_i} for p_i ≥ 0. To capture this, we use the notion of index shape pieces and index shapes. Informally, we split the above monomial product into groups based on their powers, and each such group will form an index shape piece.

Definition 3.38 (Index shape piece). An index shape piece U_i = ((U_{i,1}, . . . , U_{i,t}), p_i) is a tuple of indices (U_{i,1}, . . . , U_{i,t}) along with a power p_i ∈ N. Let V(U_i) be the set {U_{i,1}, . . . , U_{i,t}} of vertices of this index shape piece. When clear from context, we use U_i instead of V(U_i). If we realize U_{i,1}, . . . , U_{i,t} to be indices a₁, . . . , a_t ∈ [n], then this realization of the index shape piece corresponds to the monomial ∏_{j≤t} x_{a_j}^{p_i}.

Definition 3.39 (Index shape). An index shape U is a set of index shape pieces U_i that have different powers. Let V(U) be the set of vertices ∪_i V(U_i). When clear from context, we use U instead of V(U).
Definition 3.40.
For two index shapes U , V , we write U ≡ V if for all powers p , the index shape pieces ofpower p in U and V have the same length. Definition 3.41.
Define I_mid to be the set of all index shapes U that contain only index shape pieces of power 1.

In the definition of shapes, the distinguished set of vertices should now be replaced by index shapes.
Definition 3.42 (Shapes). Shapes are tuples α = (H_α, U_α, V_α) where H_α is a graph with hyperedges of arity k and U_α, V_α are index shapes such that U_α, V_α ⊆ V(H_α).

Definition 3.43 (Proper shape). A shape α is proper if it has no isolated vertices outside U_α ∪ V_α, no multi-edges, and all its edges have a nonzero label.

To define the notion of vertex separators, we modify the notion of paths for hyperedges.
Definition 3.44 (Path). A path is a sequence of vertices u₁, . . . , u_t such that u_i, u_{i+1} are in the same hyperedge, for all i ≤ t − 1.

The notions of vertex separator and decomposition into left, middle, and right parts are identically defined with the above notion of hyperedges and paths. In Section 7, we will show that they are well defined. In the definition of a trivial shape τ, we now require U_τ ≡ V_τ. For U ∈ I_mid, M_U will be the set of proper non-trivial middle parts τ with U_τ ≡ V_τ ≡ U, and L_U will be the set of left parts σ such that V_σ ≡ U. Similarly, for U, V ∈ I_mid, L_{U,V} will be the set of left parts γ such that U_γ ≡ U and V_γ ≡ V.

In order to define the moment matrix, we need to truncate our shapes based on the number of vertices and the labels on our hyperedges. So, we make the following definition.

Definition 3.45 (Truncation parameters). For integers D_sos, D_V, D_E ≥ 0, we say that a shape α satisfies the truncation parameters D_sos, D_V, D_E if
- The degrees of the monomials that U_α and V_α correspond to are at most D_sos
- The left part σ, the middle part τ, and the right part σ'^T of α satisfy |V(σ)|, |V(τ)|, |V(σ'^T)| ≤ D_V
- For each e ∈ E(α), l_e ≤ D_E

3.7.3 Sparse PCA

We consider the m vectors v₁, . . . , v_m ∈ R^d to be the input. Similar to Tensor PCA, we will work with the Hermite basis of polynomials since the entries are sampled from the distribution N(
0,1). In particular, if we denote the unnormalized Hermite polynomials by h_0(x) = 1, h_1(x) = x, h_2(x) = x² −
1, . . . , then we work with the basis h_a(v) := ∏_{i∈[m], j∈[n]} h_{a_{i,j}}(v_{i,j}) over a ∈ N^{m×n}. To capture these bases, we will modify the graphs that represent ribbons (and, by extension, shapes) to be bipartite graphs with two types of vertices, with labeled edges that go across vertices of different types. So, an edge (i, j) with label t between a vertex i of type 1 and a vertex j of type 2 will correspond to h_t(v_{i,j}).

Definition 3.46 (Vertices). We will have two types of vertices: the vertices corresponding to the m input vectors, which we call type 1 vertices, and the vertices corresponding to the ambient dimension of the space, which we call type 2 vertices.

Definition 3.47 (Edges). Edges will go across vertices of different types, thereby forming a bipartite graph. An edge between a type 1 vertex i and a type 2 vertex j corresponds to the input entry v_{i,j}. Each edge will have a label in N corresponding to the Hermite polynomial evaluated on that entry.

We will have variables x₁, . . . , x_n in our SoS program, so we will work with index shape pieces and index shapes as in Tensor PCA, since the rows and columns of our moment matrix will now correspond to monomials of the form ∏_{i≤n} x_i^{p_i} for p_i ≥ 0. But since our decompositions into left, middle, and right parts will have type 1 vertices in the vertex separators as well, we will define a generalized notion of index shape pieces and index shapes.

Definition 3.48 (Index shape piece). An index shape piece U_i = ((U_{i,1}, . . . , U_{i,t}), t_i, p_i) is a tuple of indices (U_{i,1}, . . . , U_{i,t}) along with a type t_i ∈ {
1, 2} and a power p_i ∈ N. Let V(U_i) be the set {U_{i,1}, . . . , U_{i,t}} of vertices of this index shape piece. When clear from context, we use U_i instead of V(U_i).

For an index shape piece ((U_{i,1}, . . . , U_{i,t}), t_i, p_i) with type t_i = 2, if we realize U_{i,1}, . . . , U_{i,t} to be indices a₁, . . . , a_t ∈ [n], then this realization of the index shape piece corresponds to the monomial ∏_{j≤t} x_{a_j}^{p_i}.

Definition 3.49 (Index shape). An index shape U is a set of index shape pieces U_i that have either different types or different powers. Let V(U) be the set of vertices ∪_i V(U_i). When clear from context, we use U instead of V(U).

Observe that each realization of an index shape corresponds to a row or column of the moment matrix. For our moment matrix, the only nonzero rows correspond to index shapes that have only index shape pieces of type 2, since the only SoS variables are x₁, . . . , x_n, but in order to do our analysis, we need to work with the generalized notion of index shapes that allows index shape pieces of both types.

Definition 3.50.
For two index shapes U, V, we write U ≡ V if for all types t and all powers p, the index shape pieces of type t and power p in U and V have the same length.

Definition 3.51.
Define I_mid to be the set of all index shapes U that contain only index shape pieces of power 1.

Since we are working with standard graphs, the notions of path and vertex separator need no modifications, but we will now use the minimum weight vertex separator instead of the minimum vertex separator, where we define the weight as follows.

Definition 3.52 (Weight of an index shape). Suppose we have an index shape U = {U₁, U₂} ∈ I_mid where U₁ = ((U_{1,1}, . . . , U_{1,|U₁|}), 1, 1) is an index shape piece of type 1 and U₂ = ((U_{2,1}, . . . , U_{2,|U₂|}), 2, 1) is an index shape piece of type 2. Then, define the weight of this index shape to be w(U) = √m^{|U₁|} · √n^{|U₂|}.

We now give the modified definition of shapes.
Definition 3.53 (Shapes). Shapes are tuples α = (H_α, U_α, V_α) where H_α is a graph with two types of vertices and labeled edges only across vertices of different types, and U_α, V_α are index shapes such that U_α, V_α ⊆ V(H_α).

Definition 3.54 (Proper shape). A shape α is proper if it has no isolated vertices outside U_α ∪ V_α, no multi-edges, and all its edges have a nonzero label.

In Section 7, we will show that with this new definition of weight and shapes, any shape α has a unique decomposition into σ ∘ τ ∘ σ'^T where σ, τ, σ'^T are left, middle, and right parts respectively. Here, τ may possibly be improper. In the definition of a trivial shape τ, we now require U_τ ≡ V_τ. For U ∈ I_mid, M_U will be the set of proper non-trivial middle parts τ with U_τ ≡ V_τ ≡ U, and L_U will be the set of left parts σ such that V_σ ≡ U. Similarly, for U, V ∈ I_mid, L_{U,V} will be the set of left parts γ such that U_γ ≡ U and V_γ ≡ V.

Finally, in order to define the moment matrix, we need to truncate our shapes based on the number of vertices and the labels on our edges. So, we make the following definition.

Definition 3.55 (Truncation parameters). For integers D_sos, D_V, D_E ≥ 0, we say that a shape α satisfies the truncation parameters D_sos, D_V, D_E if
- The degrees of the monomials that U_α and V_α correspond to are at most D_sos
- The left part σ, the middle part τ, and the right part σ'^T of α satisfy |V(σ)|, |V(τ)|, |V(σ'^T)| ≤ D_V
- For each e ∈ E(α), l_e ≤ D_E

In Theorem 3.32, the third qualitative condition we would like to show is the following: for all U, V ∈ I_mid such that |U| > |V| and all γ ∈ Γ_{U,V}, H^{−γ,γ}_{Id_{V_γ}} ⪯ f(γ) H_{Id_{U_γ}}. For technical reasons, we won't be able to show this directly. To handle this, we instead work with a slight modification of H_{Id_{U_γ}}: a matrix H'_γ that is very close to H_{Id_{U_γ}}.
So, what we will end up showing is: For all $U, V \in \mathcal{I}_{mid}$ such that $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$, $H^{-\gamma,\gamma}_{Id_{V_\gamma}} \preceq f(\gamma) H'_\gamma$.

Let $D_V$ be the truncation parameter. A canonical choice for $H'_\gamma$ is to take
1. $H'_\gamma(\sigma, \sigma') = H_{Id_U}(\sigma, \sigma')$ whenever $|V(\sigma \circ \gamma)| \leq D_V$ and $|V(\sigma' \circ \gamma)| \leq D_V$.
2. $H'_\gamma(\sigma, \sigma') = 0$ whenever $|V(\sigma \circ \gamma)| > D_V$ or $|V(\sigma' \circ \gamma)| > D_V$.

With this choice, $H'_\gamma$ is the same as $H_{Id_U}$ up to truncation error. We will formally bound the errors in the quantitative sections after we introduce the full machinery.

4 Qualitative bounds for Planted slightly denser subgraph
We will pseudo-calibrate with respect to the following pair of random and planted distributions, which we denote $\nu$ and $\mu$ respectively.
- Random distribution: Sample $G$ from $G(n, \frac{1}{2})$.
- Planted distribution: Let $k$ be an integer and let $p > \frac{1}{2}$. Sample a graph $G'$ from $G(n, \frac{1}{2})$. Choose a random subset $S$ of the vertices, where each vertex is picked independently with probability $\frac{k}{n}$. For all pairs $i, j$ of vertices in $S$, rerandomize the edge $(i, j)$, where the probability of $(i, j)$ being in the graph is now $p$. Set $G$ to be the resulting graph.

We assume that the input is given as $G_{i,j}$ for $\{i, j\} \in \binom{[n]}{2}$, where $G_{i,j}$ is $1$ if the edge $(i, j)$ is present in the graph and $-1$ otherwise. We work with the Fourier basis $\chi_E$ defined as $\chi_E(G) := \prod_{(i,j) \in E} G_{i,j}$. For a subset $I \subseteq [n]$, define $x_I := \prod_{i \in I} x_i$.

Lemma 4.1.
Let $I \subseteq [n]$, $E \subseteq \binom{[n]}{2}$. Then,
$$E_\mu[x_I \chi_E(G)] = \left(\frac{k}{n}\right)^{|I \cup V(E)|} (2p-1)^{|E|}$$
Proof.
When we sample $(G, S)$ from $\mu$, we condition on whether $I \cup V(E) \subseteq S$.
$$E_{(G,S) \sim \mu}[x_I \chi_E(G)] = \Pr_{(G,S) \sim \mu}[I \cup V(E) \subseteq S] \, E_{(G,S) \sim \mu}[x_I \chi_E(G) \mid I \cup V(E) \subseteq S] + \Pr_{(G,S) \sim \mu}[I \cup V(E) \not\subseteq S] \, E_{(G,S) \sim \mu}[x_I \chi_E(G) \mid I \cup V(E) \not\subseteq S]$$
We claim that the second term is $0$. In particular, $E_{(G,S) \sim \mu}[x_I \chi_E(G) \mid I \cup V(E) \not\subseteq S] = 0$, because when $I \cup V(E) \not\subseteq S$, either $S$ doesn't contain a vertex in $I$ or an edge $(i,j) \in E$ is outside $S$. If $S$ doesn't contain a vertex in $I$, then $x_I = 0$ and hence the quantity is $0$. And if an edge $(i,j) \in E$ is outside $S$, since this edge is present independently with probability $\frac{1}{2}$, by taking expectations, the quantity $E_{(G,S) \sim \mu}[x_I \chi_E(G) \mid I \cup V(E) \not\subseteq S]$ is $0$. Finally, note that $\Pr_{(G,S) \sim \mu}[I \cup V(E) \subseteq S] = \left(\frac{k}{n}\right)^{|I \cup V(E)|}$ and
$$E_{(G,S) \sim \mu}[x_I \chi_E(G) \mid I \cup V(E) \subseteq S] = E_{(G,S) \sim \mu}[\chi_E(G) \mid V(E) \subseteq S] = (2p-1)^{|E|}$$
The last equality follows because for each edge $e \in E$, since $e$ is present independently with probability $p$, the expected value of $\chi_e$ is $1 \cdot p + (-1) \cdot (1-p) = 2p-1$.

Now, we can write the moment matrix in terms of graph matrices.

Definition 4.2.
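For tiny instances, the conclusion of Lemma 4.1 can be verified exactly by brute-force enumeration of the planted subset $S$. The following self-contained sketch is ours (function and variable names are illustrative, not from the paper):

```python
from itertools import combinations

def planted_moment(n, k, p, I, E):
    """Exactly compute E_mu[x_I * chi_E(G)] for the planted slightly denser
    subgraph model by enumerating the planted set S (each vertex is in S
    independently with probability k/n). Edges inside S have E[G_e] = 2p - 1;
    all other edges are uniform +-1, so E[G_e] = 0 and those terms vanish."""
    q = k / n
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            Sset = set(S)
            if not set(I) <= Sset:
                continue  # x_I = 0 unless I is contained in S
            term = q ** r * (1 - q) ** (n - r)
            for (i, j) in E:
                if i in Sset and j in Sset:
                    term *= 2 * p - 1  # E[G_ij] = 1*p + (-1)*(1-p)
                else:
                    term = 0.0         # uniform edge: E[G_ij] = 0
                    break
            total += term
    return total

# Matches (k/n)^{|I u V(E)|} * (2p-1)^{|E|}, e.g. with I = {0}, E = {(0,1), (1,2)}:
lhs = planted_moment(4, 1, 0.8, [0], [(0, 1), (1, 2)])
rhs = (1 / 4) ** 3 * (2 * 0.8 - 1) ** 2
```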
Define the degree of SoS to be $D_{sos} = n^{C_{sos}\varepsilon}$ for some constant $C_{sos} > 0$ that we choose later.

Definition 4.3 (Truncation parameter). Define the truncation parameter to be $D_V = n^{C_V \varepsilon}$ for some constant $C_V > 0$.

Remark 4.4 (Choice of parameters). We first set $\varepsilon$ to be a sufficiently small constant. Based on this choice, we will set $C_V$ to be a sufficiently small constant to satisfy all the inequalities we use in our proof. Based on these choices, we can choose $C_{sos}$ to be sufficiently small to satisfy the inequalities we use.

We will now describe the decomposition of the moment matrix $\Lambda$.

Definition 4.5.
If a shape $\alpha$ satisfies the following properties:
- $\alpha$ is proper,
- $\alpha$ satisfies the truncation parameters $D_{sos}, D_V$,
then define
$$\lambda_\alpha = \left(\frac{k}{n}\right)^{|V(\alpha)|} (2p-1)^{|E(\alpha)|}$$
Otherwise, define $\lambda_\alpha = 0$.

Corollary 4.6. $\Lambda = \sum \lambda_\alpha M_\alpha$.

We use the canonical definition of $H'_\gamma$ from Section 3.7.4. In this section, we will prove the main qualitative bounds Lemma 4.7, Lemma 4.9 and Lemma 4.11.

Lemma 4.7.
For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$.

We define the following quantity to capture the contribution of the vertices within $\tau$ to the Fourier coefficients.

Definition 4.8.
For $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, define
$$S(\tau) = \left(\frac{k}{n}\right)^{|V(\tau)| - |U_\tau|} (2p-1)^{|E(\tau)|}$$

Lemma 4.9.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} \succeq 0$$

We define the following quantity to capture the contribution of the vertices within $\gamma$ to the Fourier coefficients.

Definition 4.10.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and $\gamma \in \Gamma_{U,V}$, define
$$S(\gamma) = \left(\frac{k}{n}\right)^{|V(\gamma)| - \frac{|U_\gamma| + |V_\gamma|}{2}} (2p-1)^{|E(\gamma)|}$$

Lemma 4.11. For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$,
$$\frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2} H^{-\gamma,\gamma}_{Id_V} \preceq H'_\gamma$$

In order to prove these bounds, we define the following quantity to capture the contribution of the vertices within $\sigma$ to the Fourier coefficients.

Definition 4.12.
For a shape $\sigma \in \mathcal{L}$, define
$$T(\sigma) = \left(\frac{k}{n}\right)^{|V(\sigma)| - \frac{|V_\sigma|}{2}} (2p-1)^{|E(\sigma)|}$$

Definition 4.13.
For $U \in \mathcal{I}_{mid}$, define $v_U$ to be the vector indexed by $\sigma \in \mathcal{L}$ such that $v_U(\sigma) = T(\sigma)$ if $\sigma \in \mathcal{L}_U$ and $0$ otherwise.

Proposition 4.14.
For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} = \frac{1}{|\mathrm{Aut}(U)|} v_U v_U^T$.

Proof. This follows by verifying the conditions of Definition 4.5.

This immediately implies that for all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$, which is Lemma 4.7. We restate Definition 4.8 for convenience.
Definition 4.8.
For $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, define
$$S(\tau) = \left(\frac{k}{n}\right)^{|V(\tau)| - |U_\tau|} (2p-1)^{|E(\tau)|}$$

Proposition 4.15.
For any $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, $H_\tau = \frac{S(\tau)}{|\mathrm{Aut}(U)|^2} v_U v_U^T$.

Proof. This follows by a straightforward verification of the conditions of Definition 4.5. Lemma 4.9 immediately follows.
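Under the rank-one expressions above, all four blocks of the matrix in Lemma 4.9 are the same nonnegative multiple of $v_U v_U^T$, so the matrix is a stacked outer product and hence PSD. A minimal numerical sketch of this step, with arbitrary stand-in values (`v` and `c` are our own illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(6)   # stand-in for the vector v_U
c = 0.35                     # stand-in for the nonnegative scalar multiplying v_U v_U^T

B = c * np.outer(v, v)
M = np.block([[B, B], [B, B]])     # all four blocks are equal
stacked = np.concatenate([v, v])   # M = c * [v; v][v; v]^T, hence PSD
eigmin = np.linalg.eigvalsh(M).min()
```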
Lemma 4.9.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} \succeq 0$$
Proof.
$$\begin{bmatrix} \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} = \frac{S(\tau)}{|\mathrm{Aut}(U)|^2} \begin{bmatrix} v_U v_U^T & v_U v_U^T \\ v_U v_U^T & v_U v_U^T \end{bmatrix} = \frac{S(\tau)}{|\mathrm{Aut}(U)|^2} \begin{bmatrix} v_U \\ v_U \end{bmatrix} \begin{bmatrix} v_U \\ v_U \end{bmatrix}^T \succeq 0$$

We restate Definition 4.10 for convenience.

Definition 4.10.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and $\gamma \in \Gamma_{U,V}$, define
$$S(\gamma) = \left(\frac{k}{n}\right)^{|V(\gamma)| - \frac{|U_\gamma| + |V_\gamma|}{2}} (2p-1)^{|E(\gamma)|}$$

Proposition 4.16.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$, for all $\gamma \in \Gamma_{U,V}$,
$$H^{-\gamma,\gamma}_{Id_V} = \frac{|\mathrm{Aut}(U)|}{|\mathrm{Aut}(V)|} S(\gamma)^2 H'_\gamma$$
Proof.
Fix $\sigma, \sigma' \in \mathcal{L}_U$ such that $|V(\sigma \circ \gamma)|, |V(\sigma' \circ \gamma)| \leq D_V$. Note that
$$\left(|V(\sigma)| - \frac{|V_\sigma|}{2}\right) + \left(|V(\sigma')| - \frac{|V_{\sigma'}|}{2}\right) + 2\left(|V(\gamma)| - \frac{|U_\gamma| + |V_\gamma|}{2}\right) = |V(\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T)|$$
Using Definition 4.5, we can easily verify that $\lambda_{\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T} = T(\sigma) T(\sigma') S(\gamma)^2$. Therefore, $H^{-\gamma,\gamma}_{Id_V}(\sigma, \sigma') = \frac{|\mathrm{Aut}(U)|}{|\mathrm{Aut}(V)|} S(\gamma)^2 H_{Id_U}(\sigma, \sigma')$. Since $H'_\gamma(\sigma, \sigma') = H_{Id_U}(\sigma, \sigma')$ whenever $|V(\sigma \circ \gamma)|, |V(\sigma' \circ \gamma)| \leq D_V$, this completes the proof.

Rearranging this gives Lemma 4.11.

Lemma 4.11.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$,
$$\frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2} H^{-\gamma,\gamma}_{Id_V} \preceq H'_\gamma$$

5 Qualitative bounds for Tensor PCA

Definition 5.1 (Slack parameter). Define the slack parameter to be $\Delta = n^{-C_\Delta \varepsilon}$ for a constant $C_\Delta > 0$.

We will pseudo-calibrate with respect to the following pair of random and planted distributions, which we denote $\nu$ and $\mu$ respectively.
- Random distribution: Sample $A$ from $N(0, I_{[n]^k})$.
- Planted distribution: Let $\lambda, \Delta > 0$. Sample $u$ from $\{-\frac{1}{\sqrt{\Delta n}}, 0, \frac{1}{\sqrt{\Delta n}}\}^n$ where the values are taken with probabilities $\frac{\Delta}{2}, 1 - \Delta, \frac{\Delta}{2}$ respectively. Then sample $B$ from $N(0, I_{[n]^k})$. Set $A = B + \lambda u^{\otimes k}$.

Let the Hermite polynomials be $h_0(x) = 1, h_1(x) = x, h_2(x) = x^2 - 1, \ldots$. For $a \in \mathbb{N}^{[n]^k}$ and variables $A_e$ for $e \in [n]^k$, define $h_a(A) := \prod_{e \in [n]^k} h_{a_e}(A_e)$. We will work with this Hermite basis.

Lemma 5.2.
Let $I \in \mathbb{N}^n$, $a \in \mathbb{N}^{[n]^k}$. For $i \in [n]$, let $d_i = \sum_{e \in [n]^k : i \in e} a_e$. Let $c$ be the number of $i$ such that $I_i + d_i$ is nonzero. Then, if the $I_i + d_i$ are all even, we have
$$E_\mu[u^I h_a(A)] = \Delta^c \left(\frac{1}{\sqrt{\Delta n}}\right)^{|I|} \prod_{e \in [n]^k} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{a_e}$$
Else, $E_\mu[u^I h_a(A)] = 0$.

Proof. When $A \sim \mu$, for all $e \in [n]^k$, we have $A_e = B_e + \lambda \prod_{i \leq k} u_{e_i}$, where $B_e \sim N(0, 1)$.

Let's analyze when the required expectation is nonzero. We can first condition on $u$ and use the fact that for a fixed $t$, $E_{g \sim N(0,1)}[h_k(g + t)] = t^k$ to obtain
$$E_{(u_i, B_e) \sim \mu}[u^I h_a(A)] = E_{(u_i) \sim \mu}\left[u^I \prod_{e \in [n]^k} \left(\lambda \prod_{i \leq k} u_{e_i}\right)^{a_e}\right] = E_{(u_i) \sim \mu}\left[\prod_{i \in [n]} u_i^{I_i + d_i}\right] \prod_{e \in [n]^k} \lambda^{a_e}$$
Observe that this is nonzero precisely when all the $I_i + d_i$ are even, in which case
$$E_{(u_i) \sim \mu}\left[\prod_{i \in [n]} u_i^{I_i + d_i}\right] = \Delta^c \left(\frac{1}{\sqrt{\Delta n}}\right)^{\sum_{i \leq n} (I_i + d_i)} = \Delta^c \left(\frac{1}{\sqrt{\Delta n}}\right)^{|I|} \prod_{e \in [n]^k} \left(\frac{1}{(\Delta n)^{k/2}}\right)^{a_e}$$
where we used the fact that $\sum_{i \in [n]} d_i = k \sum_{e \in [n]^k} a_e$. This completes the proof.

Now, we can write the moment matrix in terms of graph matrices.

Definition 5.3.
Define the degree of SoS to be $D_{sos} = n^{C_{sos}\varepsilon}$ for some constant $C_{sos} > 0$ that we choose later.

Definition 5.4 (Truncation parameters). Define the truncation parameters to be $D_V = n^{C_V \varepsilon}$, $D_E = n^{C_E \varepsilon}$ for some constants $C_V, C_E > 0$.

Remark 5.5 (Choice of parameters). We first set $\varepsilon$ to be a sufficiently small constant. Based on the choice of $\varepsilon$, we will set the constant $C_\Delta > 0$ sufficiently small so that the planted distribution is well defined. Based on these choices, we will set $C_V, C_E$ to be sufficiently small constants to satisfy all the inequalities we use in our proof. Based on these choices, we can choose $C_{sos}$ to be sufficiently small to satisfy the inequalities we use.

Remark 5.6.
The underlying graphs for the graph matrices have the following structure: there will be $n$ vertices of a single type, and the edges will be ordered hyperedges of arity $k$.

Definition 5.7.
For the analysis of Tensor PCA, we will use the following notation.
- For an index shape $U$ and a vertex $i$, define $\deg^U(i)$ as follows: If $i \in V(U)$, then it is the power of the unique index shape piece $A \in U$ such that $i \in V(A)$. Otherwise, it is $0$.
- For an index shape $U$, define $\deg(U) = \sum_{i \in V(U)} \deg^U(i)$. This is also the degree of the monomial that $U$ corresponds to.
- For a shape $\alpha$ and vertex $i$ in $\alpha$, let $\deg^\alpha(i) = \sum_{e \in E(\alpha) : i \in e} l_e$.
- For any shape $\alpha$, let $\deg(\alpha) = \deg(U_\alpha) + \deg(V_\alpha)$.

We will now describe the decomposition of the moment matrix $\Lambda$.

Definition 5.8.
If a shape $\alpha$ satisfies the following properties:
- $\deg^\alpha(i) + \deg^{U_\alpha}(i) + \deg^{V_\alpha}(i)$ is even for all $i \in V(\alpha)$,
- $\alpha$ is proper,
- $\alpha$ satisfies the truncation parameters $D_{sos}, D_V, D_E$,
then define
$$\lambda_\alpha = \Delta^{|V(\alpha)|} \left(\frac{1}{\sqrt{\Delta n}}\right)^{\deg(\alpha)} \prod_{e \in E(\alpha)} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{l_e}$$
Otherwise, define $\lambda_\alpha = 0$.

Corollary 5.9. $\Lambda = \sum \lambda_\alpha M_\alpha$.

We use the canonical definition of $H'_\gamma$ from Section 3.7.4. In this section, we will prove the following qualitative bounds.

Lemma 5.10.
For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$.

We define the following quantity to capture the contribution of the vertices within $\tau$ to the Fourier coefficients.

Definition 5.11.
For $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, if $\deg^\tau(i)$ is even for all vertices $i \in V(\tau) \setminus U_\tau \setminus V_\tau$, define
$$S(\tau) = \Delta^{|V(\tau)| - |U_\tau|} \prod_{e \in E(\tau)} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{l_e}$$
Otherwise, define $S(\tau) = 0$.

Lemma 5.12.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} \succeq 0$$

We define the following quantity to capture the contribution of the vertices within $\gamma$ to the Fourier coefficients.

Definition 5.13.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and $\gamma \in \Gamma_{U,V}$, if $\deg^\gamma(i)$ is even for all vertices $i$ in $V(\gamma) \setminus U_\gamma \setminus V_\gamma$, define
$$S(\gamma) = \Delta^{|V(\gamma)| - \frac{|U_\gamma| + |V_\gamma|}{2}} \prod_{e \in E(\gamma)} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{l_e}$$
Otherwise, define $S(\gamma) = 0$.

Lemma 5.14.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$,
$$\frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2} H^{-\gamma,\gamma}_{Id_V} \preceq H'_\gamma$$

5.2.1 Proof of Lemma 5.10

When we compose shapes $\sigma, \sigma'$, from Definition 5.8, observe that all vertices $i$ in $\sigma \circ \sigma'$ should have $\deg^{\sigma \circ \sigma'}(i) + \deg^{U_{\sigma \circ \sigma'}}(i) + \deg^{V_{\sigma \circ \sigma'}}(i)$ even in order for $\lambda_{\sigma \circ \sigma'}$ to be nonzero. To partially capture this notion conveniently, we will introduce the notion of parity vectors.

Definition 5.15.
Define a parity vector $\rho$ to be a vector whose entries are in $\{0, 1\}$.

Definition 5.16.
For $U \in \mathcal{I}_{mid}$, define $\mathcal{P}_U$ to be the set of parity vectors $\rho$ whose coordinates are indexed by $U$.

Definition 5.17.
For a left shape σ , define ρ σ ∈ P V σ , called the parity vector of σ , to be the parity vectorsuch that for each vertex i ∈ V σ , the i -th entry of ρ σ is the parity of deg U σ ( i ) + deg σ ( i ) , that is ( ρ σ ) i ≡ deg U σ ( i ) + deg σ ( i ) ( mod 2 ) . Definition 5.18.
For U ∈ I mid and ρ ∈ P U , let L U , ρ be the set of all left shapes σ ∈ L U such that ρ σ = ρ ,that is, the set of all left shapes with parity vector ρ . Definition 5.19.
For a shape $\tau$, for a $\tau$-coefficient matrix $H_\tau$ and parity vectors $\rho \in \mathcal{P}_{U_\tau}, \rho' \in \mathcal{P}_{V_\tau}$, define the $\tau$-coefficient matrix $H_{\tau,\rho,\rho'}$ as $H_{\tau,\rho,\rho'}(\sigma, \sigma') = H_\tau(\sigma, \sigma')$ if $\sigma \in \mathcal{L}_{U_\tau,\rho}, \sigma' \in \mathcal{L}_{V_\tau,\rho'}$ and $0$ otherwise.

Proposition 5.20.
For any shape τ and τ -coefficient matrix H τ , H τ = ∑ ρ ∈P U τ , ρ ′ ∈P V τ H τ , ρ , ρ ′ Proposition 5.21.
For any U ∈ I mid , H Id U = ∑ ρ ∈P U H Id U , ρ , ρ Proof.
For any σ , σ ′ ∈ L U , using Definition 5.8, note that in order for H Id U ( σ , σ ′ ) to be nonzero, we musthave ρ σ = ρ σ ′ .We define the following quantity to capture the contribution of the vertices within σ to the Fouriercoefficients. Definition 5.22.
For a shape $\sigma \in \mathcal{L}$, if $\deg^\sigma(i) + \deg^{U_\sigma}(i)$ is even for all vertices $i \in V(\sigma) \setminus V_\sigma$, define
$$T(\sigma) = \Delta^{|V(\sigma)| - \frac{|V_\sigma|}{2}} \left(\frac{1}{\sqrt{\Delta n}}\right)^{\deg(U_\sigma)} \prod_{e \in E(\sigma)} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{l_e}$$
Otherwise, define $T(\sigma) = 0$.

Definition 5.23.
For $U \in \mathcal{I}_{mid}$ and $\rho \in \mathcal{P}_U$, define $v_\rho$ to be the vector indexed by $\sigma \in \mathcal{L}$ such that $v_\rho(\sigma)$ is $T(\sigma)$ if $\sigma \in \mathcal{L}_{U,\rho}$ and $0$ otherwise.

Proposition 5.24.
For all $U \in \mathcal{I}_{mid}$, $\rho \in \mathcal{P}_U$, $H_{Id_U,\rho,\rho} = \frac{1}{|\mathrm{Aut}(U)|} v_\rho v_\rho^T$.

Proof. This follows by verifying the conditions of Definition 5.8.
Lemma 5.10.
For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$.

Proof. We have $H_{Id_U} = \sum_{\rho \in \mathcal{P}_U} H_{Id_U,\rho,\rho} = \frac{1}{|\mathrm{Aut}(U)|} \sum_{\rho \in \mathcal{P}_U} v_\rho v_\rho^T \succeq 0$.

5.2.2 Proof of Lemma 5.12

The next proposition captures the fact that when we compose shapes $\sigma, \tau, \sigma'^T$, in order for $\lambda_{\sigma \circ \tau \circ \sigma'^T}$ to be nonzero, the parities of the degrees of the merged vertices should add up correspondingly.

Proposition 5.25.
For all U ∈ I mid and τ ∈ M U , there exist two sets of parity vectors P τ , Q τ ⊆ P U anda bijection π : P τ → Q τ such that H τ = ∑ ρ ∈ P τ H τ , ρ , π ( ρ ) .Proof. Using Definition 5.8, in order for H τ ( σ , σ ′ ) to be nonzero, in σ ◦ τ ◦ σ ′ , we must have that for all i ∈ U τ ∪ V τ , deg U σ ( i ) + deg U σ ′ ( i ) + deg σ ◦ τ ◦ σ ′ T ( i ) must be even. In other words, for any ρ ∈ P U , there isat most one ρ ′ ∈ P U such that if we take σ ∈ L U , ρ , σ ′ ∈ L U with H τ ( σ , σ ′ ) nonzero, then the parity of σ ′ is ρ ′ . Also, observe that ρ ′ determines ρ . We then take P τ to be the set of ρ such that ρ ′ exists, Q τ to be theset of ρ ′ and in this case, we define π ( ρ ) = ρ ′ .We restate Definition 5.11 for convenience. Definition 5.11.
For $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, if $\deg^\tau(i)$ is even for all vertices $i \in V(\tau) \setminus U_\tau \setminus V_\tau$, define
$$S(\tau) = \Delta^{|V(\tau)| - |U_\tau|} \prod_{e \in E(\tau)} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{l_e}$$
Otherwise, define $S(\tau) = 0$.

Proposition 5.26.
For any $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, suppose we take $\rho \in P_\tau$. Let $\pi$ be the bijection from Proposition 5.25 so that $\pi(\rho) \in Q_\tau$. Then, $H_{\tau,\rho,\pi(\rho)} = \frac{S(\tau)}{|\mathrm{Aut}(U)|^2} v_\rho v_{\pi(\rho)}^T$.

Proof. This follows by a straightforward verification of the conditions of Definition 5.8.
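The parity-vector bookkeeping pays off in the proof of Lemma 5.12 below: the block matrix splits into a leftover diagonal part plus a sum of stacked outer products $[v_\rho; v_{\pi(\rho)}][v_\rho; v_{\pi(\rho)}]^T$, which forces positive semidefiniteness. A numerical sketch with random stand-in vectors and a hypothetical $P_\tau$ and bijection $\pi$ (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal((4, 5))   # one stand-in vector v_rho per parity vector
P_tau = [0, 2]                    # hypothetical P_tau
pi = {0: 1, 2: 3}                 # hypothetical bijection pi : P_tau -> Q_tau

W = lambda a, b: np.outer(v[a], v[b])
diag = sum(W(r, r) for r in range(4))       # sum over all rho of W_{rho,rho}
off = sum(W(r, pi[r]) for r in P_tau)       # sum over rho in P_tau of W_{rho,pi(rho)}
M = np.block([[diag, off], [off.T, diag]])

eigmin = np.linalg.eigvalsh(M).min()        # PSD, as the decomposition predicts
```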
Lemma 5.12.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} \succeq 0$$
Proof.
Let $P_\tau, Q_\tau, \pi$ be from Proposition 5.25. For $\rho, \rho' \in \mathcal{P}_U$, let $W_{\rho,\rho'} = v_\rho (v_{\rho'})^T$. Then, $H_{Id_U} = \sum_{\rho \in \mathcal{P}_U} H_{Id_U,\rho,\rho} = \frac{1}{|\mathrm{Aut}(U)|} \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho}$ and $H_\tau = \sum_{\rho \in P_\tau} H_{\tau,\rho,\pi(\rho)} = \frac{S(\tau)}{|\mathrm{Aut}(U)|^2} \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)}$. We have
$$\begin{bmatrix} \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} = \frac{S(\tau)}{|\mathrm{Aut}(U)|^2} \begin{bmatrix} \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho} & \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)} \\ \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)}^T & \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho} \end{bmatrix}$$
Since $\frac{S(\tau)}{|\mathrm{Aut}(U)|^2} \geq 0$, it suffices to prove that
$$\begin{bmatrix} \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho} & \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)} \\ \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)}^T & \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho} \end{bmatrix} \succeq 0$$
Consider
$$\begin{bmatrix} \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho} & \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)} \\ \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)}^T & \sum_{\rho \in \mathcal{P}_U} W_{\rho,\rho} \end{bmatrix} = \begin{bmatrix} \sum_{\rho \in \mathcal{P}_U \setminus P_\tau} W_{\rho,\rho} & 0 \\ 0 & \sum_{\rho \in \mathcal{P}_U \setminus Q_\tau} W_{\rho,\rho} \end{bmatrix} + \begin{bmatrix} \sum_{\rho \in P_\tau} W_{\rho,\rho} & \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)} \\ \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)}^T & \sum_{\rho \in P_\tau} W_{\pi(\rho),\pi(\rho)} \end{bmatrix}$$
We have $\sum_{\rho \in \mathcal{P}_U \setminus P_\tau} W_{\rho,\rho} = \sum_{\rho \in \mathcal{P}_U \setminus P_\tau} v_\rho v_\rho^T \succeq 0$. Similarly, $\sum_{\rho \in \mathcal{P}_U \setminus Q_\tau} W_{\rho,\rho} \succeq 0$, and so the first term in the above expression is positive semidefinite. For the second term,
$$\begin{bmatrix} \sum_{\rho \in P_\tau} W_{\rho,\rho} & \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)} \\ \sum_{\rho \in P_\tau} W_{\rho,\pi(\rho)}^T & \sum_{\rho \in P_\tau} W_{\pi(\rho),\pi(\rho)} \end{bmatrix} = \sum_{\rho \in P_\tau} \begin{bmatrix} v_\rho v_\rho^T & v_\rho v_{\pi(\rho)}^T \\ v_{\pi(\rho)} v_\rho^T & v_{\pi(\rho)} v_{\pi(\rho)}^T \end{bmatrix} = \sum_{\rho \in P_\tau} \begin{bmatrix} v_\rho \\ v_{\pi(\rho)} \end{bmatrix} \begin{bmatrix} v_\rho \\ v_{\pi(\rho)} \end{bmatrix}^T \succeq 0$$

The next proposition captures the fact that when we compose shapes $\sigma, \gamma, \gamma^T, \sigma'^T$, in order for $\lambda_{\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T}$ to be nonzero, the parities of the degrees of the merged vertices should add up correspondingly.

Definition 5.27.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$, for $\gamma \in \Gamma_{U,V}$ and parity vectors $\rho, \rho' \in \mathcal{P}_U$, define the $\gamma \circ \gamma^T$-coefficient matrix $H^{-\gamma,\gamma}_{Id_V,\rho,\rho'}$ as $H^{-\gamma,\gamma}_{Id_V,\rho,\rho'}(\sigma, \sigma') = H^{-\gamma,\gamma}_{Id_V}(\sigma, \sigma')$ if $\sigma \in \mathcal{L}_{U,\rho}, \sigma' \in \mathcal{L}_{U,\rho'}$ and $0$ otherwise.

Proposition 5.28.
For all U , V ∈ I mid where w ( U ) > w ( V ) , for all γ ∈ Γ U , V , there exists a set of parityvectors P γ ⊆ P U such that H − γ , γ Id V = ∑ ρ ∈ P γ H − γ , γ Id V , ρ , ρ Proof.
Take any $\rho \in \mathcal{P}_U$. For $\sigma \in \mathcal{L}_{U,\rho}, \sigma' \in \mathcal{L}_U$, since $H^{-\gamma,\gamma}_{Id_V}(\sigma, \sigma') = \frac{\lambda_{\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T}}{|\mathrm{Aut}(V)|}$, $H^{-\gamma,\gamma}_{Id_V}(\sigma, \sigma')$ is nonzero precisely when $\lambda_{\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T}$ is nonzero. For this quantity to be nonzero, using Definition 5.8, we get that it is necessary, but not sufficient, that the parity vector of $\sigma'$ must also be $\rho$. And also observe that there exists a set $P_\gamma$ of parity vectors $\rho$ for which $H^{-\gamma,\gamma}_{Id_V,\rho,\rho}$ is nonzero and their sum is precisely $H^{-\gamma,\gamma}_{Id_V}$.

Definition 5.29. For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$, for all $\gamma \in \Gamma_{U,V}$ and parity vector $\rho \in \mathcal{P}_U$, define the matrix $H'_{\gamma,\rho,\rho}$ as $H'_{\gamma,\rho,\rho}(\sigma, \sigma') = H'_\gamma(\sigma, \sigma')$ if $\sigma, \sigma' \in \mathcal{L}_{U,\rho}$ and $0$ otherwise.

Proposition 5.30.
For all U , V ∈ I mid where w ( U ) > w ( V ) , for γ ∈ Γ U , V , H ′ γ = ∑ ρ ∈ P γ H ′ γ , ρ , ρ . We restate Definition 5.13 for convenience.
Definition 5.13.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and $\gamma \in \Gamma_{U,V}$, if $\deg^\gamma(i)$ is even for all vertices $i$ in $V(\gamma) \setminus U_\gamma \setminus V_\gamma$, define
$$S(\gamma) = \Delta^{|V(\gamma)| - \frac{|U_\gamma| + |V_\gamma|}{2}} \prod_{e \in E(\gamma)} \left(\frac{\lambda}{(\Delta n)^{k/2}}\right)^{l_e}$$
Otherwise, define $S(\gamma) = 0$.

Proposition 5.31.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$, for all $\gamma \in \Gamma_{U,V}$ and $\rho \in P_\gamma$,
$$H^{-\gamma,\gamma}_{Id_V,\rho,\rho} = \frac{|\mathrm{Aut}(U)|}{|\mathrm{Aut}(V)|} S(\gamma)^2 H'_{\gamma,\rho,\rho}$$
Proof.
Fix $\sigma, \sigma' \in \mathcal{L}_{U,\rho}$ such that $|V(\sigma \circ \gamma)|, |V(\sigma' \circ \gamma)| \leq D_V$. Note that
$$\left(|V(\sigma)| - \frac{|V_\sigma|}{2}\right) + \left(|V(\sigma')| - \frac{|V_{\sigma'}|}{2}\right) + 2\left(|V(\gamma)| - \frac{|U_\gamma| + |V_\gamma|}{2}\right) = |V(\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T)|$$
Using Definition 5.8, we can easily verify that $\lambda_{\sigma \circ \gamma \circ \gamma^T \circ \sigma'^T} = T(\sigma) T(\sigma') S(\gamma)^2$. Therefore, $H^{-\gamma,\gamma}_{Id_V,\rho,\rho}(\sigma, \sigma') = \frac{|\mathrm{Aut}(U)|}{|\mathrm{Aut}(V)|} S(\gamma)^2 H_{Id_U,\rho,\rho}(\sigma, \sigma')$. Since $H'_{\gamma,\rho,\rho}(\sigma, \sigma') = H_{Id_U,\rho,\rho}(\sigma, \sigma')$ whenever $|V(\sigma \circ \gamma)|, |V(\sigma' \circ \gamma)| \leq D_V$, this completes the proof.

Lemma 5.14.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$,
$$\frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2} H^{-\gamma,\gamma}_{Id_V} \preceq H'_\gamma$$
Proof.
We have
$$\frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2} H^{-\gamma,\gamma}_{Id_V} = \sum_{\rho \in P_\gamma} \frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2} H^{-\gamma,\gamma}_{Id_V,\rho,\rho} = \sum_{\rho \in P_\gamma} H'_{\gamma,\rho,\rho} \preceq \sum_{\rho \in \mathcal{P}_U} H'_{\gamma,\rho,\rho} = H'_\gamma$$
where we used the fact that for all $\rho \in \mathcal{P}_U$, we have $H'_{\gamma,\rho,\rho} \succeq 0$, which can be proved the same way as the proof of Lemma 5.10.

6 Qualitative bounds for Sparse PCA
Definition 6.1 (Slack parameter). Define the slack parameter to be $\Delta = d^{-C_\Delta \varepsilon}$ for a constant $C_\Delta > 0$.

We will pseudo-calibrate with respect to the following pair of random and planted distributions, which we denote $\nu$ and $\mu$ respectively.
- Random distribution: $v_1, \ldots, v_m$ are sampled from $N(0, I_d)$ and we take $S$ to be the $m \times d$ matrix with rows $v_1, \ldots, v_m$.
- Planted distribution: Sample $u$ from $\{-\frac{1}{\sqrt{k}}, 0, \frac{1}{\sqrt{k}}\}^d$ where the values are taken with probabilities $\frac{k}{2d}, 1 - \frac{k}{d}, \frac{k}{2d}$ respectively. Then sample $v_1, \ldots, v_m$ as follows. For each $i \in [m]$, with probability $\Delta$, sample $v_i$ from $N(0, I_d + \lambda u u^T)$, and with probability $1 - \Delta$, sample $v_i$ from $N(0, I_d)$. Finally, take $S$ to be the $m \times d$ matrix with rows $v_1, \ldots, v_m$.

We will again work with the Hermite basis of polynomials. For $a \in \mathbb{N}^{m \times d}$ and variables $v_{i,j}$ for $i \in [m], j \in [d]$, define $h_a(v) := \prod_{i \in [m], j \in [d]} h_{a_{i,j}}(v_{i,j})$.

Definition 6.2.
For a nonnegative integer $t$, define
$$t!! = \begin{cases} \frac{(t+1)!}{2^{\frac{t+1}{2}} \left(\frac{t+1}{2}\right)!} = 1 \times 3 \times \ldots \times t & \text{if } t \text{ is odd} \\ 0 & \text{otherwise} \end{cases}$$

Lemma 6.3.
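The two expressions in Definition 6.2 agree for odd $t$; a small sketch checking this (function names are ours):

```python
import math

def double_factorial(t):
    """t!! = 1 * 3 * ... * t for odd t, and 0 otherwise (cf. Definition 6.2)."""
    if t % 2 == 0:
        return 0
    return math.prod(range(1, t + 1, 2))

def double_factorial_closed_form(t):
    """(t+1)! / (((t+1)/2)! * 2^((t+1)/2)), the closed form for odd t."""
    h = (t + 1) // 2
    return math.factorial(t + 1) // (math.factorial(h) * 2 ** h)
```

For odd $t$, this is also the Gaussian moment $E_{g \sim N(0,1)}[g^{t+1}] = t!!$, which is how the quantity enters the computations below.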
Let $I \in \mathbb{N}^d$, $a \in \mathbb{N}^{m \times d}$. For $i \in [m]$, let $e_i = \sum_{j \in [d]} a_{ij}$, and for $j \in [d]$, let $f_j = I_j + \sum_{i \in [m]} a_{ij}$. Let $c_1$ (resp. $c_2$) be the number of $i$ (resp. $j$) such that $e_i > 0$ (resp. $f_j > 0$). Then, if the $e_i, f_j$ are all even, we have
$$E_\mu[u^I h_a(v)] = \left(\frac{1}{\sqrt{k}}\right)^{|I|} \left(\frac{k}{d}\right)^{c_2} \Delta^{c_1} \prod_{i \in [m]} (e_i - 1)!! \prod_{i,j} \frac{\sqrt{\lambda}^{a_{ij}}}{\sqrt{k}^{a_{ij}}}$$
Else, $E_\mu[u^I h_a(v)] = 0$.

Proof. $v_1, \ldots, v_m \sim \mu$ can be written as $v_i = g_i + \sqrt{\lambda}\, b_i l_i u$ where $g_i \sim N(0, I_d)$, $l_i \sim N(0, 1)$, and $b_i \in \{0, 1\}$ with $b_i = 1$ with probability $\Delta$.

Let's analyze when the required expectation is nonzero. We can first condition on $b_i, l_i, u$ and use the fact that for a fixed $t$, $E_{g \sim N(0,1)}[h_k(g + t)] = t^k$ to obtain
$$E_{(u, l_i, b_i, g_i) \sim \mu}[u^I h_a(v)] = E_{(u, l_i, b_i) \sim \mu}\left[u^I \prod_{i,j} (\sqrt{\lambda}\, b_i l_i u_j)^{a_{ij}}\right] = E_{(u, l_i, b_i) \sim \mu}\left[\prod_{i \in [m]} (b_i l_i)^{e_i} \prod_{j \in [d]} u_j^{f_j}\right] \prod_{i,j} \sqrt{\lambda}^{a_{ij}}$$
For this to be nonzero, each of the $c_1$ indices $i$ with $e_i > 0$ must have been sampled from the spiked distribution, since otherwise $b_i = 0$; each of these events happens independently with probability $\Delta$. And each of the $c_2$ indices $j$ with $f_j > 0$ must be such that $u_j$ is nonzero, each of which happens independently with probability $\frac{k}{d}$. Since $l_i, u_j$ have zero expectation, we need the $e_i, f_j$ to be even. The expectation then becomes
$$\Delta^{c_1} \left(\frac{k}{d}\right)^{c_2} E_{(u, l_i) \sim \mu}\left[\prod_{i \in [m]} l_i^{e_i} \prod_{j \in [d]} u_j^{f_j}\right] \prod_{i,j} \sqrt{\lambda}^{a_{ij}} = \left(\frac{1}{\sqrt{k}}\right)^{|I|} \left(\frac{k}{d}\right)^{c_2} \Delta^{c_1} \prod_{i \in [m]} (e_i - 1)!! \prod_{i,j} \frac{\sqrt{\lambda}^{a_{ij}}}{\sqrt{k}^{a_{ij}}}$$
Here, for each $j$ such that $u_j$ is nonzero, we have $u_j^t = \left(\frac{1}{\sqrt{k}}\right)^t$ for even $t$, and $E_{g \sim N(0,1)}[g^t] = (t-1)!!$ if $t$ is even.

Now, we can write the moment matrix in terms of graph matrices.

Definition 6.4.
Define the degree of SoS to be $D_{sos} = d^{C_{sos}\varepsilon}$ for some constant $C_{sos} > 0$ that we choose later.

Definition 6.5 (Truncation parameters). Define the truncation parameters to be $D_V = d^{C_V \varepsilon}$, $D_E = d^{C_E \varepsilon}$ for some constants $C_V, C_E > 0$.

Remark 6.6 (Choice of parameters). We first set $\varepsilon > 0$ to be a sufficiently small constant. Based on the choice of $\varepsilon$, we will set the constant $C_\Delta > 0$ sufficiently small so that the planted distribution is well defined. Based on these choices, we will set $C_V, C_E$ to be sufficiently small constants to satisfy all the inequalities we use in our proof. Based on these choices, we can choose $C_{sos}$ to be sufficiently small to satisfy the inequalities we use.

Remark 6.7.
The underlying graphs for the graph matrices have the following structure: there will be two types of vertices - $d$-type vertices corresponding to the dimensions of the space and $m$-type vertices corresponding to the different input vectors. The shapes will correspond to bipartite graphs with edges going between vertices of different types.

Definition 6.8.
For the analysis of Sparse PCA, we will use the following notation.
- For a shape $\alpha$ and type $t \in \{1, 2\}$, let $V_t(\alpha)$ denote the vertices of $V(\alpha)$ that are of type $t$. Let $|\alpha|_t = |V_t(\alpha)|$.
- For an index shape $U$ and a vertex $i$, define $\deg^U(i)$ as follows: If $i \in V(U)$, then it is the power of the unique index shape piece $A \in U$ such that $i \in V(A)$. Otherwise, it is $0$.
- For an index shape $U$, define $\deg(U) = \sum_{i \in V(U)} \deg^U(i)$. This is also the degree of the monomial $p_U$.
- For a shape $\alpha$ and vertex $i$ in $\alpha$, let $\deg^\alpha(i) = \sum_{e \in E(\alpha) : i \in e} l_e$.
- For any shape $\alpha$, let $\deg(\alpha) = \deg(U_\alpha) + \deg(V_\alpha)$.
- For an index shape $U \in \mathcal{I}_{mid}$ and type $t \in \{1, 2\}$, let $U_t \in U$ denote the index shape piece of type $t$ in $U$ if it exists; otherwise define $U_t$ to be $\emptyset$. Note that this is well defined since for each type $t$, there is at most one index shape piece of type $t$ in $U$ since $U \in \mathcal{I}_{mid}$. Also, denote by $|U|_t$ the length of the tuple $U_t$.

We will now describe the decomposition of the moment matrix $\Lambda$.

Definition 6.9.
If a shape $\alpha$ satisfies the following properties:
- Both $U_\alpha$ and $V_\alpha$ only contain index shape pieces of type $1$,
- $\deg^\alpha(i) + \deg^{U_\alpha}(i) + \deg^{V_\alpha}(i)$ is even for all $i \in V(\alpha)$,
- $\alpha$ is proper,
- $\alpha$ satisfies the truncation parameters $D_{sos}, D_V, D_E$,
then define
$$\lambda_\alpha = \left(\frac{1}{\sqrt{k}}\right)^{\deg(\alpha)} \left(\frac{k}{d}\right)^{|\alpha|_1} \Delta^{|\alpha|_2} \prod_{j \in V_2(\alpha)} (\deg^\alpha(j) - 1)!! \prod_{e \in E(\alpha)} \frac{\sqrt{\lambda}^{l_e}}{\sqrt{k}^{l_e}}$$
Otherwise, define $\lambda_\alpha = 0$.

Corollary 6.10. $\Lambda = \sum \lambda_\alpha M_\alpha$.

We use the canonical definition of $H'_\gamma$ from Section 3.7.4. In this section, we will prove the following qualitative bounds.

Lemma 6.11.
For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$.

For technical reasons, it will be convenient to discretize the Normal distribution. The following fact follows from standard results on Gaussian quadrature; see e.g. [DKS17, Lemma 4.3].
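Concretely, the discretization stated next (Fact 6.12) can be realized by Gauss-Hermite quadrature for the weight $e^{-x^2/2}$: the $D$ nodes with normalized weights match the first $2D - 1$ Gaussian moments exactly. A NumPy sketch (our own code, not the construction of [DKS17]):

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def discretized_gaussian(D):
    """D-point distribution whose moments of order 0..2D-1 match N(0,1)."""
    p, w = He.hermegauss(D)            # probabilists' Gauss-Hermite nodes/weights
    return p, w / np.sqrt(2 * np.pi)   # normalize the weights to sum to 1

def gaussian_moment(t):
    """E_{g ~ N(0,1)}[g^t]: 0 for odd t and (t-1)!! for even t."""
    return 0.0 if t % 2 else float(np.prod(np.arange(1, t, 2)))

p, w = discretized_gaussian(4)
moments = [float(np.dot(w, p ** t)) for t in range(8)]   # t = 0, ..., 2D-1
```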
Fact 6.12 (Discretizing the Normal distribution). There is an absolute constant $C_{disc}$ such that, for any positive integer $D$, there exists a distribution $\mathcal{E}$ over the real numbers supported on $D$ points $p_1, \ldots, p_D$, such that
- $|p_i| \leq C_{disc}\sqrt{D}$ for all $i \leq D$, and
- $E_{g \sim \mathcal{E}}[g^t] = E_{g \sim N(0,1)}[g^t]$ for all $t = 0, 1, \ldots, 2D - 1$.

We define the following quantity to capture the contribution of the vertices within $\tau$ to the Fourier coefficients.

Definition 6.13.
For $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, if $\deg^\tau(i)$ is even for all vertices $i \in V(\tau) \setminus U_\tau \setminus V_\tau$, define
$$S(\tau) = \left(\frac{k}{d}\right)^{|\tau|_1 - |U_\tau|_1} \Delta^{|\tau|_2 - |U_\tau|_2} \prod_{j \in V_2(\tau) \setminus U_\tau \setminus V_\tau} (\deg^\tau(j) - 1)!! \prod_{e \in E(\tau)} \frac{\sqrt{\lambda}^{l_e}}{\sqrt{k}^{l_e}}$$
Otherwise, define $S(\tau) = 0$.

Definition 6.14.
For any shape $\tau$, suppose $U' = (U_\tau)_2$, $V' = (V_\tau)_2$ are the type 2 vertices in $U_\tau, V_\tau$ respectively. Define
$$R(\tau) = \left(C_{disc}\sqrt{D_E}\right)^{\sum_{j \in U' \cup V'} \deg^\tau(j)}$$
where $C_{disc}$ is the constant from Fact 6.12.

Lemma 6.15.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{S(\tau) R(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau) R(\tau)}{|\mathrm{Aut}(U)|} H_{Id_U} \end{bmatrix} \succeq 0$$

We define the following quantity to capture the contribution of the vertices within $\gamma$ to the Fourier coefficients.

Definition 6.16. For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and $\gamma \in \Gamma_{U,V}$, if $\deg^\gamma(i)$ is even for all vertices $i$ in $V(\gamma) \setminus U_\gamma \setminus V_\gamma$, define
$$S(\gamma) = \left(\frac{k}{d}\right)^{|\gamma|_1 - \frac{|U_\gamma|_1 + |V_\gamma|_1}{2}} \Delta^{|\gamma|_2 - \frac{|U_\gamma|_2 + |V_\gamma|_2}{2}} \prod_{j \in V_2(\gamma) \setminus U_\gamma \setminus V_\gamma} (\deg^\gamma(j) - 1)!! \prod_{e \in E(\gamma)} \frac{\sqrt{\lambda}^{l_e}}{\sqrt{k}^{l_e}}$$
Otherwise, define $S(\gamma) = 0$.

Lemma 6.17.
For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$,
$$\frac{|\mathrm{Aut}(V)|}{|\mathrm{Aut}(U)| \cdot S(\gamma)^2 R(\gamma)} H^{-\gamma,\gamma}_{Id_V} \preceq H'_\gamma$$

When we compose shapes $\sigma, \sigma'$, from Definition 6.9, observe that all vertices $i$ in $\sigma \circ \sigma'$ should have $\deg^{\sigma \circ \sigma'}(i) + \deg^{U_{\sigma \circ \sigma'}}(i) + \deg^{V_{\sigma \circ \sigma'}}(i)$ even in order for $\lambda_{\sigma \circ \sigma'}$ to be nonzero. To partially capture this notion conveniently, we will introduce the notion of parity vectors.

Definition 6.18.
Define a parity vector $\rho$ to be a vector whose entries are in $\{0, 1\}$.

Definition 6.19. For $U \in \mathcal{I}_{mid}$, define $\mathcal{P}_U$ to be the set of parity vectors $\rho$ whose coordinates are indexed by $U_1$ followed by $U_2$.

Definition 6.20.
For a left shape σ , define ρ σ ∈ P V σ , called the parity vector of σ , to be the parity vectorsuch that for each vertex i ∈ V σ , the i -th entry of ρ σ is the parity of deg U σ ( i ) + deg σ ( i ) , that is, ( ρ σ ) i ≡ deg U σ ( i ) + deg σ ( i ) ( mod 2 ) . Definition 6.21.
For U ∈ I mid and ρ ∈ P U , let L U , ρ be the set of all left shapes σ ∈ L U such that ρ σ = ρ ,that is, the set of all left shapes with parity vector ρ . Definition 6.22.
For a shape $\tau$, for a $\tau$-coefficient matrix $H_\tau$ and parity vectors $\rho \in \mathcal{P}_{U_\tau}, \rho' \in \mathcal{P}_{V_\tau}$, define the $\tau$-coefficient matrix $H_{\tau,\rho,\rho'}$ as $H_{\tau,\rho,\rho'}(\sigma, \sigma') = H_\tau(\sigma, \sigma')$ if $\sigma \in \mathcal{L}_{U_\tau,\rho}, \sigma' \in \mathcal{L}_{V_\tau,\rho'}$ and $0$ otherwise.

Proposition 6.23.
For any shape τ and τ -coefficient matrix H τ , H τ = ∑ ρ ∈P U τ , ρ ′ ∈P V τ H τ , ρ , ρ ′ Proposition 6.24.
For any U ∈ I mid , H Id U = ∑ ρ ∈P U H Id U , ρ , ρ Proof.
For any $\sigma, \sigma' \in \mathcal{L}_U$, using Definition 6.9, note that in order for $H_{Id_U}(\sigma, \sigma')$ to be nonzero, we must have $\rho_\sigma = \rho_{\sigma'}$.

We will now discretize the normal distribution while matching the first $2D_E - 1$ moments.

Definition 6.25.
Let $\mathcal{D}$ be the distribution over the real numbers obtained by setting $D = D_E$ in Fact 6.12. So, in particular, for any $x$ sampled from $\mathcal{D}$, we have $|x| \leq C_{disc}\sqrt{D_E}$, and for $t \leq 2D_E - 1$, $E_{x \sim \mathcal{D}}[x^t] = (t-1)!!$ if $t$ is even and $0$ if $t$ is odd.

We define the following quantity to capture the contribution of the vertices within $\sigma$ to the Fourier coefficients.

Definition 6.26. For a shape $\sigma \in \mathcal{L}$, if $\deg^\sigma(i) + \deg^{U_\sigma}(i)$ is even for all vertices $i \in V(\sigma) \setminus V_\sigma$, define
$$T(\sigma) = \left(\frac{1}{\sqrt{k}}\right)^{\deg(U_\sigma)} \left(\frac{k}{d}\right)^{|\sigma|_1 - \frac{|V_\sigma|_1}{2}} \Delta^{|\sigma|_2 - \frac{|V_\sigma|_2}{2}} \prod_{j \in V_2(\sigma) \setminus V_\sigma} (\deg^\sigma(j) - 1)!! \prod_{e \in E(\sigma)} \frac{\sqrt{\lambda}^{l_e}}{\sqrt{k}^{l_e}}$$
Otherwise, define $T(\sigma) = 0$.

Definition 6.27.
Let $U \in \mathcal{I}_{mid}$. Let $x_i$ for $i \in U_2$ be variables, denoted collectively as $x_U$. For $\rho \in \mathcal{P}_U$, define $v_{\rho,x_U}$ to be the vector indexed by left shapes $\sigma \in \mathcal{L}$ such that the $\sigma$-th entry is $T(\sigma) \prod_{i \in U_2} x_i^{\deg^\sigma(i)}$ if $\sigma \in \mathcal{L}_{U,\rho}$ and $0$ otherwise.

Proposition 6.28.
For any $U \in \mathcal{I}_{mid}$, $\rho \in \mathcal{P}_U$, suppose $x_i$ for $i \in U_2$ are random variables sampled from $\mathcal{D}$. Then,
$$H_{Id_U,\rho,\rho} = \frac{1}{|\mathrm{Aut}(U)|} E_x[v_{\rho,x_U} v_{\rho,x_U}^T]$$
Proof. Observe that for $\sigma, \sigma' \in \mathcal{L}_{U,\rho}$ and $t \in \{1, 2\}$, $\left(|\sigma|_t - \frac{|V_\sigma|_t}{2}\right) + \left(|\sigma'|_t - \frac{|V_{\sigma'}|_t}{2}\right) = |\sigma \circ \sigma'^T|_t$. The result follows by verifying the conditions of Definition 6.9 and using Definition 6.25.

Lemma 6.11.
For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$.

Proof. We have $H_{Id_U} = \sum_{\rho \in \mathcal{P}_U} H_{Id_U,\rho,\rho} = \frac{1}{|\mathrm{Aut}(U)|} \sum_{\rho \in \mathcal{P}_U} E_{x_U \sim \mathcal{D}^U}[v_{\rho,x_U} v_{\rho,x_U}^T] \succeq 0$.

The next proposition captures the fact that when we compose shapes $\sigma, \tau, \sigma'^T$, in order for $\lambda_{\sigma \circ \tau \circ \sigma'^T}$ to be nonzero, the parities of the degrees of the merged vertices should add up correspondingly.

Proposition 6.29.
For all U ∈ I mid and τ ∈ M U , there exist two sets of parity vectors P τ , Q τ ⊆ P U anda bijection π : P τ → Q τ such that H τ = ∑ ρ ∈ P τ H τ , ρ , π ( ρ ) .Proof. Using Definition 6.9, in order for H τ ( σ , σ ′ ) to be nonzero, we must have that, in σ ◦ τ ◦ σ ′ , for all i ∈ U τ ∪ V τ , deg U σ ( i ) + deg U σ ′ ( i ) + deg σ ◦ τ ◦ σ ′ T ( i ) must be even. In other words, for any ρ ∈ P U , there isat most one ρ ′ ∈ P U such that if we take σ ∈ L U , ρ , σ ′ ∈ L U with H τ ( σ , σ ′ ) nonzero, then the parity of σ ′ is ρ ′ . Also, observe that ρ ′ determines ρ . We then take P τ to be the set of ρ such that ρ ′ exists, Q τ to be theset of ρ ′ and in this case, we define π ( ρ ) = ρ ′ .We restate Definition 6.13 for convenience. Definition 6.13.
For U ∈ I_mid and τ ∈ M_U, if deg_τ(i) is even for all vertices i ∈ V(τ) \ U_τ \ V_τ, define

S(τ) = (k/d)^{|τ| − |U_τ|} Δ^{|τ| − |U_τ|} ∏_{j ∈ V(τ)\U_τ\V_τ} (deg_τ(j) − 1)!! ∏_{e ∈ E(τ)} (√λ)^{l_e}/(√k)^{l_e}

Otherwise, define S(τ) = 0.

Proposition 6.30. For any U ∈ I_mid and τ ∈ M_U, suppose we take ρ ∈ P_τ. Let π be the bijection from Proposition 6.29, so that π(ρ) ∈ Q_τ. Let U′ = (U_τ)_1 and V′ = (V_τ)_1 be the type 1 vertices in U_τ and V_τ respectively. Let x_i for i ∈ U′ ∪ V′ be random variables independently sampled from D. Define x_{U′} (resp. x_{V′}) to be the subset of variables x_i for i ∈ U′ (resp. i ∈ V′). Then

H_{τ,ρ,π(ρ)} = (1/|Aut(U)|) S(τ) E_x[ v_{ρ,x_{U′}} ( ∏_{i ∈ U′∪V′} x_i^{deg_τ(i)} ) v_{π(ρ),x_{V′}}^T ]

Proof.
For σ ∈ L_{U,ρ}, σ′ ∈ L_{U,π(ρ)} and t ∈ {1, 2}, we have (|τ|_t − |U_τ|_t) + (|σ|_t − |V_σ|_t) + (|σ′|_t − |V_{σ′}|_t) = |σ ∘ τ ∘ σ′^T|_t. The result then follows by a straightforward verification of the conditions of Definition 6.9 using Definition 6.25.

Lemma 6.15.
For all U ∈ I_mid and τ ∈ M_U,

[ (S(τ)R(τ)/|Aut(U)|) H_{Id_U} , H_τ ; H_τ^T , (S(τ)R(τ)/|Aut(U)|) H_{Id_U} ] ⪰ 0

(here [ A , B ; C , D ] denotes the 2 × 2 block matrix with first row (A B) and second row (C D)).

Proof.
Let P_τ, Q_τ, π be from Proposition 6.29. Let U′ = (U_τ)_1 and V′ = (V_τ)_1 be the type 1 vertices in U_τ and V_τ respectively. Let x_i for i ∈ U′ ∪ V′ be random variables independently sampled from D. Define x_{U′} (resp. x_{V′}) to be the subset of variables x_i for i ∈ U′ (resp. i ∈ V′).

For ρ ∈ P_U, define W_{ρ,ρ} = E_{y_U∼D^U}[v_{ρ,y_U} v_{ρ,y_U}^T], so that H_{Id_U,ρ,ρ} = (1/|Aut(U)|) W_{ρ,ρ}. Observe that W_{ρ,ρ} = E[v_{ρ,x_{U′}} v_{ρ,x_{U′}}^T] = E[v_{ρ,x_{V′}} v_{ρ,x_{V′}}^T], because x_{U′} and x_{V′} are also sets of variables sampled from D, and U′, V′ have the same size as U because U_τ = V_τ = U.

For ρ, ρ′ ∈ P_U, define Y_{ρ,ρ′} = E[ v_{ρ,x_{U′}} ( ∏_{i ∈ U′∪V′} x_i^{deg_τ(i)} ) v_{π(ρ),x_{V′}}^T ]. Then H_τ = ∑_{ρ∈P_τ} H_{τ,ρ,π(ρ)} = (S(τ)/|Aut(U)|) ∑_{ρ∈P_τ} Y_{ρ,π(ρ)}. We have

[ (S(τ)R(τ)/|Aut(U)|) H_{Id_U} , H_τ ; H_τ^T , (S(τ)R(τ)/|Aut(U)|) H_{Id_U} ] = (S(τ)/|Aut(U)|) [ R(τ) ∑_{ρ∈P_U} W_{ρ,ρ} , ∑_{ρ∈P_τ} Y_{ρ,π(ρ)} ; ∑_{ρ∈P_τ} Y_{ρ,π(ρ)}^T , R(τ) ∑_{ρ∈P_U} W_{ρ,ρ} ]

Since S(τ)/|Aut(U)| ≥ 0, it suffices to prove that

[ R(τ) ∑_{ρ∈P_U} W_{ρ,ρ} , ∑_{ρ∈P_τ} Y_{ρ,π(ρ)} ; ∑_{ρ∈P_τ} Y_{ρ,π(ρ)}^T , R(τ) ∑_{ρ∈P_U} W_{ρ,ρ} ] ⪰ 0.

Consider

[ R(τ) ∑_{ρ∈P_U} W_{ρ,ρ} , ∑_{ρ∈P_τ} Y_{ρ,π(ρ)} ; ∑_{ρ∈P_τ} Y_{ρ,π(ρ)}^T , R(τ) ∑_{ρ∈P_U} W_{ρ,ρ} ]
= R(τ) [ ∑_{ρ∈P_U\P_τ} W_{ρ,ρ} , 0 ; 0 , ∑_{ρ∈P_U\Q_τ} W_{ρ,ρ} ] + [ R(τ) ∑_{ρ∈P_τ} W_{ρ,ρ} , ∑_{ρ∈P_τ} Y_{ρ,π(ρ)} ; ∑_{ρ∈P_τ} Y_{ρ,π(ρ)}^T , R(τ) ∑_{ρ∈P_τ} W_{π(ρ),π(ρ)} ]

We have ∑_{ρ∈P_U\P_τ} W_{ρ,ρ} = ∑_{ρ∈P_U\P_τ} E[v_{ρ,x_{U′}} v_{ρ,x_{U′}}^T] ⪰ 0. Similarly, ∑_{ρ∈P_U\Q_τ} W_{ρ,ρ} ⪰ 0. Also, R(τ) ≥ 0, and so the first term in the above expression, R(τ) [ ∑_{ρ∈P_U\P_τ} W_{ρ,ρ} , 0 ; 0 , ∑_{ρ∈P_U\Q_τ} W_{ρ,ρ} ], is positive semidefinite.
For the second term,

[ R(τ) ∑_{ρ∈P_τ} W_{ρ,ρ} , ∑_{ρ∈P_τ} Y_{ρ,π(ρ)} ; ∑_{ρ∈P_τ} Y_{ρ,π(ρ)}^T , R(τ) ∑_{ρ∈P_τ} W_{π(ρ),π(ρ)} ]
= ∑_{ρ∈P_τ} [ R(τ) E[v_{ρ,x_{U′}} v_{ρ,x_{U′}}^T] , E[ v_{ρ,x_{U′}} ( ∏_{i∈U′∪V′} x_i^{deg_τ(i)} ) v_{π(ρ),x_{V′}}^T ] ; E[ v_{π(ρ),x_{V′}} ( ∏_{i∈U′∪V′} x_i^{deg_τ(i)} ) v_{ρ,x_{U′}}^T ] , R(τ) E[v_{π(ρ),x_{V′}} v_{π(ρ),x_{V′}}^T] ]
= ∑_{ρ∈P_τ} E[ [ R(τ) v_{ρ,x_{U′}} v_{ρ,x_{U′}}^T , ( ∏_{i∈U′∪V′} x_i^{deg_τ(i)} ) v_{ρ,x_{U′}} v_{π(ρ),x_{V′}}^T ; ( ∏_{i∈U′∪V′} x_i^{deg_τ(i)} ) v_{π(ρ),x_{V′}} v_{ρ,x_{U′}}^T , R(τ) v_{π(ρ),x_{V′}} v_{π(ρ),x_{V′}}^T ] ]

We will prove that the term inside the expectation is positive semidefinite for each ρ ∈ P_τ and each sampling of the x_i from D, which will complete the proof. Fix ρ ∈ P_τ and any sampling of the x_i from D. Let w_1 = v_{ρ,x_{U′}}, w_2 = v_{π(ρ),x_{V′}}, and let E = ∏_{i∈U′∪V′} x_i^{deg_τ(i)}. We would like to prove that

[ R(τ) w_1 w_1^T , E w_1 w_2^T ; E w_2 w_1^T , R(τ) w_2 w_2^T ] ⪰ 0.

For all y sampled from D, |y| ≤ C_disc √D_E, and so |E| ≤ (C_disc √D_E)^{∑_{j∈U′∪V′} deg_τ(j)} = R(τ).

If E ≥ 0, then

[ R(τ) w_1 w_1^T , E w_1 w_2^T ; E w_2 w_1^T , R(τ) w_2 w_2^T ]
= (R(τ) − E) [ w_1 w_1^T , 0 ; 0 , w_2 w_2^T ] + E [ w_1 w_1^T , w_1 w_2^T ; w_2 w_1^T , w_2 w_2^T ]
= (R(τ) − E) ( [ w_1 ; 0 ][ w_1 ; 0 ]^T + [ 0 ; w_2 ][ 0 ; w_2 ]^T ) + E [ w_1 ; w_2 ][ w_1 ; w_2 ]^T ⪰ 0

since R(τ) − E ≥ 0. And if E < 0,

[ R(τ) w_1 w_1^T , E w_1 w_2^T ; E w_2 w_1^T , R(τ) w_2 w_2^T ]
= (R(τ) + E) [ w_1 w_1^T , 0 ; 0 , w_2 w_2^T ] − E [ w_1 w_1^T , −w_1 w_2^T ; −w_2 w_1^T , w_2 w_2^T ]
= (R(τ) + E) ( [ w_1 ; 0 ][ w_1 ; 0 ]^T + [ 0 ; w_2 ][ 0 ; w_2 ]^T ) − E [ w_1 ; −w_2 ][ w_1 ; −w_2 ]^T ⪰ 0

since R(τ) + E ≥ 0.
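The 2 × 2 block step above reduces to a scalar inequality: for a test vector v = (v_1, v_2), the quadratic form of [ R w_1 w_1^T , E w_1 w_2^T ; E w_2 w_1^T , R w_2 w_2^T ] equals R a^2 + 2E a b + R b^2 with a = ⟨v_1, w_1⟩ and b = ⟨v_2, w_2⟩, which is nonnegative whenever |E| ≤ R. The following is not from the paper, just a small numerical sanity check of that identity (all names are ours):

```python
import random

def block_quad_form(R, E, w1, w2, v1, v2):
    # v^T M v for M = [R*w1 w1^T, E*w1 w2^T; E*w2 w1^T, R*w2 w2^T],
    # computed via a = <v1, w1>, b = <v2, w2>: v^T M v = R a^2 + 2E a b + R b^2.
    a = sum(x * y for x, y in zip(v1, w1))
    b = sum(x * y for x, y in zip(v2, w2))
    return R * a * a + 2 * E * a * b + R * b * b

random.seed(0)
for _ in range(1000):
    w1 = [random.uniform(-1, 1) for _ in range(3)]
    w2 = [random.uniform(-1, 1) for _ in range(3)]
    R = random.uniform(0.0, 2.0)
    E = random.uniform(-R, R)   # the regime |E| <= R(tau) used in the proof
    v1 = [random.uniform(-1, 1) for _ in range(3)]
    v2 = [random.uniform(-1, 1) for _ in range(3)]
    assert block_quad_form(R, E, w1, w2, v1, v2) >= -1e-9
```

The bound R a^2 + 2E a b + R b^2 ≥ R(|a| − |b|)^2 ≥ 0 for |E| ≤ R mirrors the (R(τ) − E) / (R(τ) + E) decompositions in the proof.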
The next proposition captures the fact that when we compose shapes σ, γ, γ^T, σ′^T, in order for λ_{σ∘γ∘γ^T∘σ′^T} to be nonzero, the parities of the degrees of the merged vertices should add up correspondingly.

Definition 6.31.
For all U, V ∈ I_mid where w(U) > w(V), for γ ∈ Γ_{U,V} and parity vectors ρ, ρ′ ∈ P_U, define the γ ∘ γ^T-coefficient matrix H^{−γ,γ}_{Id_V,ρ,ρ′} as H^{−γ,γ}_{Id_V,ρ,ρ′}(σ, σ′) = H^{−γ,γ}_{Id_V}(σ, σ′) if σ ∈ L_{U,ρ} and σ′ ∈ L_{U,ρ′}, and 0 otherwise.

Proposition 6.32. For all U, V ∈ I_mid where w(U) > w(V), for all γ ∈ Γ_{U,V}, there exists a set of parity vectors P_γ ⊆ P_U such that H^{−γ,γ}_{Id_V} = ∑_{ρ∈P_γ} H^{−γ,γ}_{Id_V,ρ,ρ}

Proof.
Take any ρ ∈ P_U. For σ ∈ L_{U,ρ} and σ′ ∈ L_U, since H^{−γ,γ}_{Id_V}(σ, σ′) = λ_{σ∘γ∘γ^T∘σ′^T}/|Aut(V)|, H^{−γ,γ}_{Id_V}(σ, σ′) is nonzero precisely when λ_{σ∘γ∘γ^T∘σ′^T} is nonzero. For this quantity to be nonzero, using Definition 6.9, we get that it is necessary, but not sufficient, that the parity vector of σ′ also be ρ. Observe also that there exists a set P_γ of parity vectors ρ for which H^{−γ,γ}_{Id_V,ρ,ρ} is nonzero, and their sum is precisely H^{−γ,γ}_{Id_V}.

Definition 6.33.
For all U, V ∈ I_mid where w(U) > w(V), for all γ ∈ Γ_{U,V} and parity vector ρ ∈ P_U, define the matrix H′_{γ,ρ,ρ} as H′_{γ,ρ,ρ}(σ, σ′) = H′_γ(σ, σ′) if σ, σ′ ∈ L_{U,ρ}, and 0 otherwise.

Proposition 6.34.
For all U, V ∈ I_mid where w(U) > w(V), for γ ∈ Γ_{U,V}, H′_γ = ∑_{ρ∈P_γ} H′_{γ,ρ,ρ}.

We will now define vectors which are truncations of v_{ρ,x_U}.

Definition 6.35.
Let U, V ∈ I_mid where w(U) > w(V), and let γ ∈ Γ_{U,V}. Let x_i for i ∈ U be variables, denoted collectively as x_U. For ρ ∈ P_U, define v^{−γ}_{ρ,x_U} to be the vector indexed by left shapes σ ∈ L such that the σ-th entry is v_{ρ,x_U}(σ) if |V(σ ∘ γ)| ≤ D_V, and 0 otherwise.

We restate Definition 6.16 for convenience.
Definition 6.16.
For all U, V ∈ I_mid where w(U) > w(V) and γ ∈ Γ_{U,V}, if deg_γ(i) is even for all vertices i ∈ V(γ) \ U_γ \ V_γ, define

S(γ) = (k/d)^{|γ| − (|U_γ| + |V_γ|)/2} Δ^{|γ| − (|U_γ| + |V_γ|)/2} ∏_{j ∈ V(γ)\U_γ\V_γ} (deg_γ(j) − 1)!! ∏_{e ∈ E(γ)} (√λ)^{l_e}/(√k)^{l_e}

Otherwise, define S(γ) = 0.

Proposition 6.36.
For any U, V ∈ I_mid where w(U) > w(V), and for any γ ∈ Γ_{U,V}, suppose we take ρ ∈ P_γ. When we compose γ with γ^T to get γ ∘ γ^T, let U′ = (U_{γ∘γ^T})_1 and V′ = (V_{γ∘γ^T})_1 be the type 1 vertices in U_{γ∘γ^T} and V_{γ∘γ^T} respectively, and let W′ be the set of type 1 vertices in γ ∘ γ^T that were identified in the composition when we set V_γ = U_{γ^T}. Let x_i for i ∈ U′ ∪ W′ ∪ V′ be random variables independently sampled from D. Define x_{U′} (resp. x_{V′}, x_{W′}) to be the subset of variables x_i for i ∈ U′ (resp. i ∈ V′, i ∈ W′). Then

H^{−γ,γ}_{Id_V,ρ,ρ} = (S(γ)/|Aut(V)|) E_x[ (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T ]

Proof.
Fix σ, σ′ ∈ L_{U,ρ} such that |V(σ ∘ γ)|, |V(σ′ ∘ γ)| ≤ D_V. Note that for t ∈ {1, 2},

(|σ|_t − |V_σ|_t) + (|σ′|_t − |V_{σ′}|_t) + (|γ|_t − (|U_γ|_t + |V_γ|_t)/2) = |σ ∘ γ ∘ γ^T ∘ σ′^T|_t

We can easily verify the equality using Definition 6.9 and Definition 6.25.

Proposition 6.37. For any U, V ∈ I_mid where w(U) > w(V), and for any γ ∈ Γ_{U,V}, suppose we take ρ ∈ P_U. Then

H′_{γ,ρ,ρ} = (1/|Aut(U)|) E_{y_U∼D^U}[ (v^{−γ}_{ρ,y_U})(v^{−γ}_{ρ,y_U})^T ]

We can now prove Lemma 6.17.
Lemma 6.17.
For all U, V ∈ I_mid where w(U) > w(V) and all γ ∈ Γ_{U,V},

(|Aut(V)| / (|Aut(U)| · S(γ)R(γ))) H^{−γ,γ}_{Id_V} ⪯ H′_γ

Proof.
Let U′, V′, W′ be as in Proposition 6.36. We have

(|Aut(V)| / (|Aut(U)| · S(γ)R(γ))) H^{−γ,γ}_{Id_V} = ∑_{ρ∈P_γ} (|Aut(V)| / (|Aut(U)| · S(γ)R(γ))) H^{−γ,γ}_{Id_V,ρ,ρ}
= ∑_{ρ∈P_γ} (1 / (|Aut(U)| · R(γ))) E_x[ (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T ]

We will now prove that, for all ρ ∈ P_γ,

(1 / (|Aut(U)| · R(γ))) E_x[ (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T ] ⪯ H′_{γ,ρ,ρ}

This reduces to proving that

(1/R(γ)) E_x[ (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T ] ⪯ E_{y_U∼D^U}[ (v^{−γ}_{ρ,y_U})(v^{−γ}_{ρ,y_U})^T ] = (1/2) E_x[ (v^{−γ}_{ρ,x_{U′}})(v^{−γ}_{ρ,x_{U′}})^T + (v^{−γ}_{ρ,x_{V′}})(v^{−γ}_{ρ,x_{V′}})^T ]

where the last equality follows from linearity of expectation and the fact that U′ ≡ V′ ≡ U. Since H^{−γ,γ}_{Id_V,ρ,ρ} is symmetric, we have

E_x[ (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T ] = E_x[ (v^{−γ}_{ρ,x_{V′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{U′}})^T ]

So, it suffices to prove

(1/R(γ)) E_x[ (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T + (v^{−γ}_{ρ,x_{V′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{U′}})^T ] ⪯ E_x[ (v^{−γ}_{ρ,x_{U′}})(v^{−γ}_{ρ,x_{U′}})^T + (v^{−γ}_{ρ,x_{V′}})(v^{−γ}_{ρ,x_{V′}})^T ]
We will prove that for every sampling of the x_i from D, we have

(1/R(γ)) ( (v^{−γ}_{ρ,x_{U′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{V′}})^T + (v^{−γ}_{ρ,x_{V′}}) ( ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)} ) (v^{−γ}_{ρ,x_{U′}})^T ) ⪯ (v^{−γ}_{ρ,x_{U′}})(v^{−γ}_{ρ,x_{U′}})^T + (v^{−γ}_{ρ,x_{V′}})(v^{−γ}_{ρ,x_{V′}})^T

Then, taking expectations will give the result. Indeed, fix a sampling of the x_i from D. Let E = ∏_{i∈U′∪W′∪V′} x_i^{deg_{γ∘γ^T}(i)}, and let w_1 = v^{−γ}_{ρ,x_{U′}}, w_2 = v^{−γ}_{ρ,x_{V′}}. Then the inequality we need to show is

(E/R(γ)) (w_1 w_2^T + w_2 w_1^T) ⪯ w_1 w_1^T + w_2 w_2^T

Now, since |x_i| ≤ C_disc √D_E for all i, we have |E| ≤ ∏_{i∈U′∪W′∪V′} (C_disc √D_E)^{deg_{γ∘γ^T}(i)} = R(γ).

If E ≥ 0, using (E/R(γ)) (w_1 − w_2)(w_1 − w_2)^T ⪰ 0 gives

(E/R(γ)) (w_1 w_2^T + w_2 w_1^T) ⪯ (E/R(γ)) (w_1 w_1^T + w_2 w_2^T) ⪯ w_1 w_1^T + w_2 w_2^T

since 0 ≤ E ≤ R(γ). And if E < 0, using −(E/R(γ)) (w_1 + w_2)(w_1 + w_2)^T ⪰ 0 gives

(E/R(γ)) (w_1 w_2^T + w_2 w_1^T) ⪯ −(E/R(γ)) (w_1 w_1^T + w_2 w_2^T) ⪯ w_1 w_1^T + w_2 w_2^T

since 0 ≤ −E ≤ R(γ).

Finally, we use the fact that for all ρ ∈ P_U, we have H′_{γ,ρ,ρ} ⪰ 0, which can be proved the same way as the proof of Lemma 6.11. Therefore,

(|Aut(V)| / (|Aut(U)| · S(γ)R(γ))) H^{−γ,γ}_{Id_V} ⪯ ∑_{ρ∈P_γ} H′_{γ,ρ,ρ} ⪯ ∑_{ρ∈P_U} H′_{γ,ρ,ρ} = H′_γ

In this section, we make our definitions and results more precise. We also generalize our definitions and results to handle problems where one or more of the following is true:
1. The input entries correspond to hyperedges rather than edges.
2. We have different types of indices.
3. Ω is a more complicated distribution than {−1, +1}.
4. We have to consider matrix indices which are not multilinear.
Throughout this section and the remainder of this manuscript, we give the reader a choice for the level of generality of this machinery.
In particular, we will first recall our definition for the simpler case when our input is {−1, +1}^{\binom{n}{2}} and we only consider multilinear indices. We will then discuss how this simpler definition generalizes. We denote these generalizations with an asterisk *. In the general case we will need a few additional parameters, which we define here.
Definition 7.1.
1. We define k to be the arity of the hyperedges corresponding to the input.
2. We define t_max to be the number of different types of indices. We define n_i to be the number of possibilities for indices of type i, and we define n = max{n_i : i ∈ [t_max]}.

Note: For this section, we use X to denote the input, we use x to denote entries of the input, and we use y to denote solution variables.

Definition 7.2 (Vertices: Simplified Case). When the input and solution variables are indexed by one type of index which takes values in [n], then we represent the index i by a vertex labeled i. If we want to leave an index unspecified, we instead represent it by a vertex labeled with a variable (we will generally use u, v, or w for these variables).

Definition 7.3 (Vertices: General Case*). When the input and solution variables are indexed by several types of indices where indices of type t take values in [n_t], we represent an index of type t with value i as a vertex labeled by the tuple (t, i). We say that such a vertex has type t. If we want to leave an index of type t unspecified, we instead represent it by a vertex labeled with a tuple (t, ?) where ? is a variable (which will generally be u, v, or w).

Definition 7.4 (Edges: Simplified Case). When the input is X ∈ {−1, +1}^{\binom{n}{2}}, we represent the entries of the input by the undirected edges {(i, j) : i < j ∈ [n]}. Given an edge e = (i, j), we take x_e = x_{ij} to be the input entry corresponding to e.

Definition 7.5 (Edges: General Case*). In general, we represent the entries of the input by hyperedges whose form depends on the nature of the input. We still take x_e to be the input entry corresponding to e.

Example 7.6.
If the input is an n × n matrix X then we will have two types of indices, one for the rows and one for the columns. Thus, we will have the vertices {(1, i) : i ∈ [n]} ∪ {(2, j) : j ∈ [n]}. In this case, we have an edge ((1, i), (2, j)) for each entry x_{ij} of the input.

Example 7.7. If the input is an n × n matrix X which is not symmetric then we only need the indices [n]. In this case, we have a directed edge (i, j) for each entry x_{ij} where i ≠ j. If the entries x_{ii} are also part of the input, then we also have loops (i, i) for these entries.

Example 7.8.
If our input is a symmetric n × n × n tensor X (i.e., x_{ijk} = x_{ikj} = x_{jik} = x_{jki} = x_{kij} = x_{kji}) and x_{ijk} = 0 whenever i, j, k are not distinct, then we only need the indices [n]. In this case, we have an undirected hyperedge e = (i, j, k) for each entry x_e = x_{ijk} of the input where i, j, k are distinct.

Example 7.9.
If the input is an n × n × n tensor X then we will have three types of indices. Thus, we will have the vertices {(1, i) : i ∈ [n]} ∪ {(2, j) : j ∈ [n]} ∪ {(3, k) : k ∈ [n]}. In this case, we have a hyperedge e = ((1, i), (2, j), (3, k)) for each entry x_e = x_{ijk} of the input.

In this subsection, we discuss how our matrices are indexed and how we associate matrix indices with monomials. We also describe the automorphism groups of matrix indices.
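The encodings in Examples 7.6 through 7.9 amount to keying each input entry by a (hyper)edge over typed vertices. Before turning to matrix indices, here is a minimal sketch of the three-types encoding of Example 7.9 (the variable names are ours, not the paper's):

```python
import itertools
import random

# Entries of an n x n x n tensor keyed by hyperedges over typed vertices:
# the entry x_{ijk} is attached to e = ((1, i), (2, j), (3, k)), where the
# first coordinate of each vertex records its type (one type per tensor axis).
n = 2
random.seed(1)
x = {}
for i, j, k in itertools.product(range(1, n + 1), repeat=3):
    e = ((1, i), (2, j), (3, k))
    x[e] = random.choice([-1.0, 1.0])

# Each entry is looked up through its hyperedge, as in x_e = x_{ijk}.
e = ((1, 1), (2, 2), (3, 1))
assert e in x
assert len(x) == n ** 3
```

Making the type part of the vertex label is what lets vertices of different types never be identified when shapes are later composed.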
Definition 7.10 (Matrix Indices: Simplified Case). If there is only one type of index and we have the constraints y_i^2 = 1 or y_i^2 = y_i on the solution variables, then we define a matrix index A to be a tuple of indices (a_1, . . . , a_{|A|}). We make the following definitions about matrix indices:
1. We associate the monomial ∏_{j=1}^{|A|} y_{a_j} to A.
2. We define V(A) to be the set of vertices {a_i : i ∈ [|A|]}. For brevity, we will often write A instead of V(A) when it is clear from context that we are referring to A as a set of vertices rather than a matrix index.
3. We take the automorphism group of A to be Aut(A) = S_{|A|} (the permutations of the elements of A).

Example 7.11.
The matrix index A = (4, 6, 1) represents the monomial y_4 y_6 y_1 = y_1 y_4 y_6 and Aut(A) = S_3.

Remark 7.12.
We take A to be an ordered tuple rather than a set for technical reasons. In general, we need a more intricate definition for matrix indices. We start by defining matrix index pieces.
Definition 7.13 (Matrix Index Piece Definition*). We define a matrix index piece A_i = ((a_{i,1}, . . . , a_{i,|A_i|}), t_i, p_i) to be a tuple of indices (a_{i,1}, . . . , a_{i,|A_i|}) together with a type t_i and a power p_i. We make the following definitions about matrix index pieces:
1. We associate the monomial p_{A_i} = ∏_{j=1}^{|A_i|} y_{(t_i, a_{i,j})}^{p_i} with A_i.
2. We define V(A_i) to be the set of vertices {(t_i, a_{i,j}) : j ∈ [|A_i|]}.
3. We take the automorphism group of A_i to be Aut(A_i) = S_{|A_i|}.
4. We say that A_i and A_j are disjoint if V(A_i) ∩ V(A_j) = ∅ (i.e., t_i ≠ t_j or {a_{i,1}, . . . , a_{i,|A_i|}} ∩ {a_{j,1}, . . . , a_{j,|A_j|}} = ∅).

Definition 7.14 (General Matrix Index Definition*). We define a matrix index A = {A_i} to be a set of disjoint matrix index pieces. We make the following definitions about matrix indices:
1. We associate the monomial p_A = ∏_{A_i ∈ A} p_{A_i} with A.
2. We define V(A) to be the set of vertices ∪_{A_i ∈ A} V(A_i). For brevity, we will often write A instead of V(A) when it is clear from context that we are referring to A as a set of vertices rather than a matrix index.
3. We take the automorphism group of A to be Aut(A) = ∏_{A_i ∈ A} Aut(A_i).

Example 7.15 (*). If A_1 = ((2), 1, 1), A_2 = ((3, 1), 1, 2), and A_3 = ((1, 2, 3), 2, 1), then A = {A_1, A_2, A_3} represents the monomial p_A = y_{(1,2)} y_{(1,3)}^2 y_{(1,1)}^2 y_{(2,1)} y_{(2,2)} y_{(2,3)} and we have Aut(A) = S_1 × S_2 × S_3.

A key idea is to analyze Fourier characters of the input.
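In the simplified {−1, +1} setting defined next, the Fourier character of a multi-set of edges is just the product of the corresponding input entries, so on a graph it evaluates to (−1) raised to the number of edges of E absent from the graph. A toy verification of that identity (our code, not the paper's):

```python
import itertools
import random

def chi(E, x):
    # Fourier character: product of the +-1 input entries over the edges of E
    p = 1
    for e in E:
        p *= x[e]
    return p

random.seed(2)
n = 4
all_edges = list(itertools.combinations(range(n), 2))
x = {e: random.choice([-1, 1]) for e in all_edges}   # G in {-1,+1}^(n choose 2)
E = [(0, 1), (1, 2), (2, 3)]                         # a set of potential edges
present = {e for e in all_edges if x[e] == 1}        # E(G), the edges of G
assert chi(E, x) == (-1) ** len([e for e in E if e not in present])
```

Each absent edge contributes a factor of −1 and each present edge a factor of +1, which is exactly the statement of Example 7.17 below.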
Definition 7.16 (Simplified Fourier Characters). If the input distribution is Ω = {−1, 1}, then given a multi-set of edges E, we define χ_E(X) = ∏_{e∈E} x_e.

Example 7.17.
If the input is a graph G ∈ {−1, 1}^{\binom{n}{2}} and E is a set of potential edges of G (with no multiple edges), then χ_E(G) = (−1)^{|E \ E(G)|}.

In general, the Fourier characters are somewhat more complicated.
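The general construction (Definition 7.18) pins the polynomials h_i down by Gram-Schmidt against Ω. The following sketch, under the assumption that Ω is uniform on a small finite support (here {−1, 0, 1}; all helper names are ours), orthonormalizes 1, x, x^2, ... under the inner product E_Ω[p(x) q(x)]:

```python
import math

def orthonormal_basis(support):
    # Gram-Schmidt on the monomials 1, x, x^2, ... under <p, q> = E_Omega[p q],
    # where Omega is uniform on `support` (the conditions of Definition 7.18).
    def inner(p, q):
        return sum(p(v) * q(v) for v in support) / len(support)
    basis = []
    for d in range(len(support)):
        mono = lambda v, d=d: v ** d               # the monomial x^d
        projs = [inner(mono, h) for h in basis]    # projections onto earlier h_i
        def q(v, m=mono, c=tuple(projs), hs=tuple(basis)):
            return m(v) - sum(ci * h(v) for ci, h in zip(c, hs))
        norm = math.sqrt(inner(q, q))              # keeps leading coeff positive
        basis.append(lambda v, q=q, s=norm: q(v) / s)
    return basis

h = orthonormal_basis([-1, 0, 1])
# For this Omega: h0 = 1, h1 = sqrt(3/2) x, h2 = (x^2 - 2/3) / sqrt(2/9)
assert abs(h[0](0.7) - 1.0) < 1e-9
assert abs(h[1](1.0) - math.sqrt(1.5)) < 1e-9
```

Running the same process against the normal distribution yields the normalized Hermite polynomials of Example 7.19.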
Definition 7.18 (Orthonormal Basis for Ω*). We define the polynomials {h_i : i ∈ Z_{≥0} ∩ [0, |supp(Ω)| − 1]} to be the unique polynomials (which can be found through the Gram-Schmidt process) such that:
1. For all i, E_Ω[h_i(x)^2] = 1.
2. For all i ≠ j, E_Ω[h_i(x) h_j(x)] = 0.
3. For all i, the leading coefficient of h_i(x) is positive.

Example 7.19. If Ω is the normal distribution, then the polynomials {h_i} are the Hermite polynomials with the appropriate normalization so that for all i, E_Ω[h_i(x)^2] = 1. In particular, h_0(x) = 1, h_1(x) = x, h_2(x) = (x^2 − 1)/√2, h_3(x) = (x^3 − 3x)/√6, etc.

Definition 7.20 (General Fourier Characters*). Given a multi-set of hyperedges E, each of which has a label l(e) ∈ [|supp(Ω)| − 1] (or N if Ω has infinite support), we define χ_E = ∏_{e∈E} h_{l(e)}(x_e). We say that such a multi-set of hyperedges E is proper if it contains no duplicate hyperedges, i.e., it is a set (though the labels on the hyperedges can be arbitrary non-negative integers). Otherwise, we say that E is improper.

Remark 7.21.
The Fourier characters are {χ_E : E is proper}. For improper E, χ_E can be decomposed as a linear combination of χ_{E_j} where each E_j is proper. We allow improper E because it is sometimes more convenient to have improper E in the middle of the analysis and then do this decomposition at the end.

Definition 7.22 (Ribbons). A ribbon R is a tuple (H_R, A_R, B_R) where H_R is a multi-graph (*or multi-hypergraph with labeled edges in the general case) whose vertices are indices of the input and A_R and B_R are matrix indices such that V(A_R) ⊆ V(H_R) and V(B_R) ⊆ V(H_R). We make the following definitions about ribbons:
1. We define V(R) = V(H_R) and E(R) = E(H_R).
2. We define χ_R = χ_{E(R)}.
3. We define M_R to be the matrix such that (M_R)_{A_R B_R} = χ_R and (M_R)_{AB} = 0 whenever A ≠ A_R or B ≠ B_R.
We say that R is a proper ribbon if H_R contains no isolated vertices outside of A_R ∪ B_R and E(R) is proper. If there is an isolated vertex in (V(R) \ A_R) \ B_R or E(R) is improper, then we say that R is an improper ribbon.

Proper ribbons are useful because they give an orthonormal basis for the space of matrix-valued functions.
Definition 7.23 (Inner products of matrix functions). For a pair of real matrices M_1, M_2 of the same dimension, we write ⟨M_1, M_2⟩ = tr(M_1 M_2^T) (i.e., ⟨M_1, M_2⟩ is the entrywise dot product of M_1 and M_2). For a pair of matrix-valued functions M_1, M_2 (of the same dimensions), we define ⟨M_1, M_2⟩ = E_X[⟨M_1(X), M_2(X)⟩].

Proposition 7.24. If R and R′ are two proper ribbons, then ⟨M_R, M_{R′}⟩ = 1 if R = R′ and is 0 otherwise.

In this subsection, we describe a basis for S_n-invariant matrix-valued functions where each matrix in this basis can be described by a relatively small shape α. The fundamental idea behind shapes is that we keep the structure of the objects we are working with but leave the elements of the object unspecified.

Definition 7.25 (Simplified Index Shapes). With our simplifying assumptions, an index shape U is a tuple of unspecified indices (u_1, · · · , u_{|U|}). We make the following definitions about index shapes:
1. We define V(U) to be the set of vertices {u_i : i ∈ [|U|]}. For brevity, we will often write U instead of V(U) when it is clear from context that we are referring to U as a set of vertices rather than an index shape.
2. We define the weight of U to be w(U) = |U|.
3. We take the automorphism group of U to be Aut(U) = S_{|U|} (the permutations of the elements of U).

Definition 7.26.
We say that a matrix index A = (a_1, . . . , a_{|A|}) has index shape U = (u_1, . . . , u_{|U|}) if |U| = |A|. Note that in this case, if we take the map ϕ : {u_j : j ∈ [|U|]} → [n] where ϕ(u_j) = a_j, then ϕ(U) = (ϕ(u_1), . . . , ϕ(u_{|U|})) = (a_1, . . . , a_{|A|}) = A.

Definition 7.27.
We say that index shapes U = (u_1, . . . , u_{|U|}) and V = (v_1, . . . , v_{|V|}) are equivalent (which we write as U ≡ V) if |U| = |V|. If U ≡ V, then we can set U = V by setting v_j = u_j for all j ∈ [|U|].

Example 7.28.
The matrix index A = {4, 6, 1} has shape U = {u_1, u_2, u_3}, which has weight 3.

7.5.2 General Index Shapes*

In general, we define general index shapes in the same way that we defined general matrix indices (just with unspecified indices).
Definition 7.29 (Index Shape Piece Definition). We define an index shape piece U_i = ((u_{i,1}, . . . , u_{i,|U_i|}), t_i, p_i) to be a tuple of indices (u_{i,1}, . . . , u_{i,|U_i|}) together with a type t_i and a power p_i. We make the following definitions about index shape pieces:
1. We define V(U_i) to be the set of vertices {(t_i, u_{i,j}) : j ∈ [|U_i|]}.
2. We define w(U_i) = |U_i| log_n(n_{t_i}).
3. We take the automorphism group of U_i to be Aut(U_i) = S_{|U_i|}.

Definition 7.30 (General Index Shape Definition). We define an index shape U = {U_i} to be a set of index shape pieces such that for all i′ ≠ i, either t_{i′} ≠ t_i or p_{i′} ≠ p_i. We make the following definitions about index shapes:
1. We define V(U) to be the set of vertices ∪_{U_i ∈ U} V(U_i). For brevity, we will often write U instead of V(U) when it is clear from context that we are referring to U as a set of vertices rather than an index shape.
2. We define w(U) to be w(U) = ∑_{U_i ∈ U} w(U_i).
3. We take the automorphism group of U to be Aut ( U ) = ∏ U i ∈ U Aut ( U i ) Remark 7.31.
For technical reasons, we want to ensure that if two index shapes U and U′ have the same weight, then U and U′ have the same number of each type of vertex. To ensure this, we add an infinitesimal perturbation to each n_i if necessary.

Definition 7.32.
We say that a matrix index A has index shape U if there is an assignment of values to the unspecified indices of U which results in A. More precisely, we say that A has index shape U if there is a map ϕ : {u_{i,j}} → N such that if we define ϕ(U_i) to be ϕ(U_i) = ((ϕ(u_{i,1}), . . . , ϕ(u_{i,|U_i|})), t_i, p_i), then ϕ(U) = {ϕ(U_i)} = {A_i} = A.

Definition 7.33. If U and V are two index shapes, we say that U is equivalent to V (which we write as U ≡ V) if U and V have the same number of index shape pieces and we can order the index shape pieces of U and V so that, writing U = {U_i} and V = {V_i} where U_i = ((u_{i,1}, . . . , u_{i,|U_i|}), t_i, p_i) and V_i = ((v_{i,1}, . . . , v_{i,|V_i|}), t′_i, p′_i), we have that for all i, |V_i| = |U_i|, t′_i = t_i, and p′_i = p_i. If U ≡ V, then we can set U = V by setting u_{i,j} = v_{i,j} for all i and all j ∈ [|U_i|].

With these definitions, we are now ready to define shapes and the matrices associated to them.
Definition 7.34 (Shapes). A ribbon shape α (which we call a shape for brevity) is a tuple α = (H_α, U_α, V_α) where H_α is a multi-graph (*or multi-hypergraph with labeled edges in the general case) whose vertices are unspecified distinct indices of the input (*whose type is specified in the general case) and U_α and V_α are index shapes such that V(U_α) ⊆ V(H_α) and V(V_α) ⊆ V(H_α). We make the following definitions about shapes:
1. We define V(α) = V(H_α) (note that V(α) and V_α are not the same thing) and we define E(α) = E(H_α).
2. We say that a shape α is proper if it contains no isolated vertices outside of V(U_α) ∪ V(V_α), E(α) has no multiple edges/hyperedges, and edges in E(α) do not have label 0. If there is an isolated vertex in V(α) \ V(U_α) \ V(V_α) or E(α) has a multiple edge/hyperedge, then we say that α is an improper shape.
Note: For brevity, we will often write U_α and V_α instead of V(U_α) and V(V_α) when it is clear from context that we are referring to U_α and V_α as sets of vertices rather than index shapes.

Definition 7.35 (Trivial shapes). We say that a shape α is trivial if V(α) = V(U_α) = V(V_α) and E(α) = ∅. Otherwise, we say that α is non-trivial.

Remark 7.36.
Note that all trivial shapes can do is permute the order of the vertices in V ( U α ) = V ( V α ) . Definition 7.37.
Informally, we say that a ribbon R has shape α if replacing the indices in R with unspecified labels results in α. Formally, we say that R has shape α if there is an injective mapping ϕ : V(α) → [n] (*or [t_max] × [n] in the general case) such that ϕ(α) = R, i.e., ϕ(H_α) = H_R, ϕ(U_α) = A_R, and ϕ(V_α) = B_R.

Definition 7.38.
We say that two shapes α and β are equivalent (which we write as α ≡ β) if they are the same up to renaming their indices. More precisely, we say that α ≡ β if there is a bijective map π : V(H_α) → V(H_β) such that π(H_α) = H_β, π(U_α) = U_β, and π(V_α) = V_β.

Definition 7.39.
Given a shape α and matrix indices A , B of shapes U α and V α respectively, we define R ( α , A , B ) to be the set of ribbons R such that R has shape α , A R = A , and B R = B . Definition 7.40.
For a shape α, we define the matrix-valued function M_α to have entries M_α(A, B) given by (M_α)_{A,B}(X) = ∑_{R ∈ R(α,A,B)} χ_R(X). For examples of M_α, see [AMP20].

Proposition 7.41.
The M α ’s for proper shapes α are an orthogonal basis for the S -invariant functions. Remark 7.42.
Conceptually, one may think of forming an orthonormal basis for this space with the functions M_α/√⟨M_α, M_α⟩, but for technical reasons it is easiest to work with these functions without normalizing them to 1. By orthogonality and the fact that every Boolean function is a polynomial, any S_n-invariant matrix-valued function Λ is expressible as

Λ = ∑_α (⟨Λ, M_α⟩ / ⟨M_α, M_α⟩) · M_α

(Because of the orthogonality of the underlying Fourier characters, it is not hard to check that when α ≠ α′ and M_α, M_{α′} have the same dimensions, ⟨M_α, M_{α′}⟩ = 0.)

Definition 7.43 (Composing Ribbons). We say that ribbons R_1 and R_2 are composable if B_{R_1} = A_{R_2}. Note that this definition is not symmetric, so we may have that R_1 and R_2 are composable but R_2 and R_1 are not composable. We say that R_1 and R_2 are properly composable if we also have that V(R_1) ∩ V(R_2) = V(B_{R_1}) = V(A_{R_2}) (there are no unexpected intersections between R_1 and R_2).
If R_1 and R_2 are composable ribbons, then we define the composition of R_1 and R_2 to be the ribbon R_1 ∘ R_2 such that:
1. A_{R_1∘R_2} = A_{R_1} and B_{R_1∘R_2} = B_{R_2}
2. V(R_1 ∘ R_2) = V(R_1) ∪ V(R_2)
3. E(R_1 ∘ R_2) = E(R_1) ∪ E(R_2) (and thus χ_{R_1∘R_2} = χ_{R_1} χ_{R_2})
We say that ribbons R_1, . . . , R_k are composable/properly composable if for all j ∈ [k − 1], R_1 ∘ . . . ∘ R_j and R_{j+1} are composable/properly composable. If R_1, . . . , R_k are composable, then we define R_1 ∘ . . . ∘ R_k to be R_1 ∘ . . . ∘ R_k = (R_1 ∘ . . . ∘ R_{k−1}) ∘ R_k.

Proposition 7.44.
Ribbon composition is associative, i.e., if R_1, R_2, R_3 are composable/properly composable ribbons, then R_2, R_3 are composable/properly composable, R_1, (R_2 ∘ R_3) are composable/properly composable, and R_1 ∘ (R_2 ∘ R_3) = (R_1 ∘ R_2) ∘ R_3.

Proposition 7.45. If R_1 and R_2 are composable ribbons, then M_{R_1∘R_2} = M_{R_1} M_{R_2}.

We have similar definitions for composing shapes.
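A minimal sketch of Definition 7.43 and Proposition 7.44, with a ribbon stored as a dict holding A_R, B_R, V(R), and E(R) (a simplification of ours for illustration; it omits χ_R and the properness conditions):

```python
def composable(r1, r2):
    return r1["B"] == r2["A"]

def compose(r1, r2):
    # Definition 7.43: keep A of the first, B of the second, union V and E.
    assert composable(r1, r2)
    return {"A": r1["A"], "B": r2["B"],
            "V": r1["V"] | r2["V"],
            "E": r1["E"] | r2["E"]}

r1 = {"A": (1,), "B": (2,), "V": {1, 2}, "E": {(1, 2)}}
r2 = {"A": (2,), "B": (3,), "V": {2, 3}, "E": {(2, 3)}}
r3 = {"A": (3,), "B": (4,), "V": {3, 4}, "E": {(3, 4)}}

# r1, r2 are properly composable: V(r1) & V(r2) == V(B_{r1}) == {2}
assert r1["V"] & r2["V"] == set(r1["B"])

left = compose(compose(r1, r2), r3)
right = compose(r1, compose(r2, r3))
assert left == right                     # associativity (Proposition 7.44)
assert left["A"] == (1,) and left["B"] == (4,)
```

Associativity holds because set union is associative and the outer A/B endpoints do not depend on the order of grouping.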
Definition 7.46 (Composing Shapes). We say that shapes α and β are composable if U_β ≡ V_α. Note that this definition is not symmetric, so we may have that α and β are composable but β and α are not composable. If α and β are composable shapes, then we define the composition of α and β to be the shape α ∘ β such that:
1. U_{α∘β} = U_α and V_{α∘β} = V_β
2. After setting U_β = V_α, we take V(α ∘ β) = V(α) ∪ V(β)
3. E(α ∘ β) = E(α) ∪ E(β)
We say that shapes α_1, . . . , α_k are composable if for all j ∈ [k − 1], α_1 ∘ . . . ∘ α_j and α_{j+1} are composable. If α_1, . . . , α_k are composable, then we define the shape α_1 ∘ . . . ∘ α_k to be α_1 ∘ . . . ∘ α_k = (α_1 ∘ . . . ∘ α_{k−1}) ∘ α_k.

Proposition 7.47.
Shape composition is associative, i.e., if α_1, α_2, α_3 are composable shapes, then α_2, α_3 are composable, α_1, (α_2 ∘ α_3) are composable, and α_1 ∘ (α_2 ∘ α_3) = (α_1 ∘ α_2) ∘ α_3.

In this subsection, we describe how shapes can be decomposed into left, middle, and right parts based on the leftmost and rightmost minimum vertex separators, which is a crucial idea for our analysis.
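As a concrete warm-up for the separator definitions that follow, here is a brute-force sketch (our code; exponential time, only for toy graphs) that finds all minimum vertex separators of U and V and then picks out the leftmost one, i.e., the minimum-weight separator that also separates U from every other minimum-weight separator:

```python
import itertools

def separates(adj, S, U, V):
    # True iff every path from U to V contains a vertex of S (BFS avoiding S).
    if (set(U) & set(V)) - S:
        return False
    frontier = set(U) - S
    seen = set(frontier)
    while frontier:
        frontier = {w for v in frontier for w in adj[v]} - S - seen
        seen |= frontier
    return not seen & (set(V) - S)

def min_separators(adj, U, V):
    # Unit weights: enumerate subsets by size and return the smallest level.
    verts = sorted(adj)
    for r in range(len(verts) + 1):
        found = [set(S) for S in itertools.combinations(verts, r)
                 if separates(adj, set(S), U, V)]
        if found:
            return found
    return []

# U = {1, 2} and V = {5, 6}, joined through the path 3 - 4 in the middle.
adj = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4}, 6: {4}}
seps = min_separators(adj, {1, 2}, {5, 6})
assert seps == [{3}, {4}]
leftmost = [S for S in seps if all(separates(adj, S, {1, 2}, T) for T in seps)]
assert leftmost == [{3}]
```

Here {3} and {4} are both minimum separators, and {3} is leftmost since it separates U = {1, 2} from {4} (and trivially from itself), while {4} does not separate U from {3}.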
Definition 7.48 (Paths). A path in a shape α is a sequence of vertices v_1, . . . , v_t such that v_i, v_{i+1} are in some edge/hyperedge together. A pair of paths is vertex-disjoint if the corresponding sequences of vertices are disjoint.

Definition 7.49 (Vertex separators). Let α be a shape and let U and V be sets of vertices in α. We say that a set of vertices S ⊆ V(α) is a vertex separator of U and V if every path in α from U to V contains at least one vertex in S. Note that any vertex separator S of U and V must contain all of the vertices in U ∩ V. As a special case, we say that S is a vertex separator of α if S is a vertex separator of U_α and V_α.
We define the weight of a set of vertices S ⊆ V(α) in the same way that weight is defined for index shapes.

Definition 7.50 (Simplified Weight). When there is only one type of index, the weight of a set of vertices S ⊆ V(α) is simply |S|.

Definition 7.51 (General Weight*). In general, given a set of vertices S ⊆ V(α), writing S = ∪_t S_t where S_t is the set of vertices of type t in S, we define the weight of S to be w(S) = ∑_t |S_t| log_n(n_t).

Remark 7.52 (*). Again, if necessary, we add an infinitesimal perturbation to n_1, n_2, . . . , n_{t_max} so that if two separators S and S′ have the same weight, then S and S′ have the same number of each type of vertex.

Definition 7.53 (Leftmost and rightmost minimum vertex separators). The leftmost minimum vertex separator is the vertex separator S of minimum weight such that for every other minimum-weight vertex separator S′, S is a separator of U_α and S′. The rightmost minimum vertex separator is the vertex separator T of minimum weight such that for every other minimum-weight vertex separator T′, T is a separator of T′ and V_α.

The work [BHK+
16] showed that under our simplifying assumptions, leftmost and rightmost minimum vertex separators are well defined. For a general proof that leftmost and rightmost minimum vertex separators are well defined, see Appendix A.

We now have the following crucial idea. Every shape α can be decomposed into the composition of three composable shapes σ, τ, σ′ᵀ based on the leftmost and rightmost minimum vertex separators S, T of α together with orderings of S and T.

Definition 7.54 (Simplified Separators With Orderings). Under our simplifying assumptions, given a set of vertices S ⊆ V(α) and an ordering O_S = s_1, . . . , s_{|S|} of the vertices of S, we define the index shape (S, O_S) to be (S, O_S) = (s_1, . . . , s_{|S|}).

Definition 7.55 (General Separators With Orderings*). In the general case, we need to give an ordering for each type of vertex. Let S ⊆ V(α) be a subset of the vertices of α and write S = ∪_t S_t where S_t is the set of vertices in S of type t. Given O_S = {O_t} where O_t = s_{t,1}, . . . , s_{t,|S_t|} is an ordering of the vertices of S_t, we define the index shape piece (S_t, O_t) to be (S_t, O_t) = ((s_{t,1}, . . . , s_{t,|S_t|}), t, 1) and we define the index shape (S, O_S) to be (S, O_S) = {(S_t, O_t)}.

Proposition 7.56.
The number of possible orderings O_S for S is equal to |Aut((S, O_S))|.

Definition 7.57 (Shape transposes). Given a shape α, we define αᵀ to be the shape α with U_α and V_α swapped, i.e. U_{αᵀ} = V_α and V_{αᵀ} = U_α.

Definition 7.58 (Left, middle, and right parts). Let α be a shape. Let S and T be the leftmost and rightmost minimum vertex separators of α together with orderings O_S, O_T of S and T.
- We define the left part σ_α of α to be the shape such that
1. H_{σ_α} is the induced subgraph of H_α on all of the vertices of α reachable from U_α without passing through S (note that H_{σ_α} includes the vertices of S) except that we remove any edges/hyperedges which are contained entirely within S.
2. U_{σ_α} = U_α and V_{σ_α} = (S, O_S).
- We define the right part σ′ᵀ_α of α to be the shape such that
1. H_{σ′ᵀ_α} is the induced subgraph of H_α on all of the vertices of α reachable from V_α without passing through T (note that H_{σ′ᵀ_α} includes the vertices of T) except that we remove any edges/hyperedges which are contained entirely within T.
2. V_{σ′ᵀ_α} = V_α and U_{σ′ᵀ_α} = (T, O_T).
- We define the middle part τ_α of α to be the shape such that
1. H_{τ_α} is the induced subgraph of H_α on all of the vertices of α which are not reachable from U_α and V_α without touching S and T (note that H_{τ_α} includes the vertices of S and T). H_{τ_α} also includes the hyperedges entirely within S and the hyperedges entirely within T.
2. U_{τ_α} = (S, O_S) and V_{τ_α} = (T, O_T).

Proposition 7.59. If σ, τ, σ′ᵀ are the left, middle, and right parts of α for given orderings O_S, O_T of S and T then α = σ ◦ τ ◦ σ′ᵀ.

Remark 7.60.
One may ask which ordering(s) we should take of S and T. The answer is that we will take all of the possible orderings of S and T simultaneously, giving equal weight to each.

Based on this decomposition and the following claim, we make the following definitions for what it means for a shape to be a left, middle, or right part.
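Under the simplifying assumptions, the vertex set of the left part in Definition 7.58 can be computed by a reachability search that is allowed to enter the separator but not continue past it. A minimal sketch (our own naming, plain graphs only):

```python
def left_part_vertices(adj, U, S):
    # Vertices of the left part sigma_alpha (Definition 7.58, graph case):
    # everything reachable from U_alpha by paths that do not pass through the
    # separator S, together with S itself.  The search may enter S but
    # does not continue past it.
    seen = set(U)
    stack = [u for u in U if u not in S]
    while stack:
        x = stack.pop()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                if y not in S:
                    stack.append(y)
    return seen | set(S)
```

Applying the same function from V_α gives the right part's vertices; on the path 0–1–2–3–4 with S = T = {2}, the left and right parts overlap exactly in the separator, as in α = σ ◦ τ ◦ σ′ᵀ.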
Claim 7.61 (Proved in Section 6.1 in [BHK+16]).
- Every shape σ which is the left part of some other shape α has that V_σ is its leftmost and rightmost minimum-weight separator.
- Every shape σᵀ which is the right part of some other shape α has that U_{σᵀ} is its leftmost and rightmost minimum-weight separator.
- Every shape τ which is the middle part of some other shape α has U_τ as its leftmost minimum-weight separator and V_τ as its rightmost minimum-weight separator.

Definition 7.62.
1. We say that a shape σ is a left shape if σ is a proper shape, V_σ is the leftmost and rightmost minimum-weight separator of σ, every vertex in V(σ) \ V_σ is reachable from U_σ without touching V_σ, and σ has no hyperedges entirely within V_σ.
2. We say that a shape τ is a proper middle shape if τ is a proper shape, U_τ is the leftmost minimum-weight separator of τ, and V_τ is the rightmost minimum-weight separator of τ. In the analysis, we will also need to consider improper middle shapes τ which may not be proper shapes and which may have smaller separators between U_τ and V_τ.
3. We say that a shape σᵀ is a right shape if σᵀ is a proper shape, U_{σᵀ} is the leftmost and rightmost minimum-weight separator of σᵀ, every vertex in V(σᵀ) \ U_{σᵀ} is reachable from V_{σᵀ} without touching U_{σᵀ}, and σᵀ has no hyperedges entirely within U_{σᵀ}.

The proof in [BHK+
16] only explicitly treats the case when the shapes α are graphs, but the proof easily generalizes to the case when the α are hypergraphs.

Proposition 7.63. For all shapes σ, σ is a left shape if and only if σᵀ is a right shape.

Remark 7.64.
As the reader has likely guessed, throughout this section we use σ to denote left parts and τ to denote middle parts. Instead of having a separate letter for right parts, we express right parts as the transpose of a left part.

We will have that Λ = ∑_α λ_α M_α. To analyze Λ, it is extremely useful to express these coefficients in terms of matrices. To do this, we will need a few more definitions. We start by defining the sets of index shapes that can appear when analyzing Λ.

Definition 7.65.
Given a moment matrix Λ, we define the following sets of index shapes.
1. We define I(Λ) = {U : ∃ matrix index A : A is a row index of Λ, A has shape U} to be the set of index shapes which describe row and column indices of Λ.
2. We define w_max to be w_max = max{w(U) : U ∈ I(Λ)}.
3. With our simplifying assumptions, we define I_mid to be I_mid = {U : |U| ≤ w_max}. *In general, we define I_mid to be I_mid = {U : w(U) ≤ w_max, ∀U_i ∈ U, p_i = 1}.

We also need to define the sets of shapes which can appear when analyzing Λ.

Definition 7.66 (Truncation Parameters). Given a moment matrix Λ = ∑_α λ_α M_α, we define D_V, D_E to be the smallest natural numbers such that for all shapes α such that λ_α ≠ 0, decomposing α as α = σ ◦ τ ◦ σ′ᵀ,
1. |V(σ)| ≤ D_V, |V(τ)| ≤ D_V, and |V(σ′)| ≤ D_V.
2.* For all edges e ∈ E(σ) ∪ E(τ) ∪ E(σ′), l_e ≤ D_E.

Remark 7.67.
Under our simplifying assumptions, all edges have label 1 so we will take D_E = 1 and ignore conditions involving D_E.

Definition 7.68.
Given a moment matrix Λ, we define the following sets of shapes:
1. L = {σ : σ is a left shape, U_σ ∈ I(Λ), V_σ ∈ I_mid, |V(σ)| ≤ D_V, ∀e ∈ E(σ), l_e ≤ D_E}
2. Given V ∈ I_mid, we define L_V = {σ ∈ L : V_σ ≡ V}
3. Given U ∈ I_mid, we define M_U = {τ : τ is a non-trivial proper middle shape, U_τ ≡ V_τ ≡ U, |V(τ)| ≤ D_V, ∀e ∈ E(τ), l_e ≤ D_E}

Definition 7.69.
Given a moment matrix Λ, we define a Λ-coefficient matrix (which we call a coefficient matrix for brevity) to be a matrix whose rows and columns are indexed by left shapes σ, σ′ ∈ L. We say that a coefficient matrix H is SOS-symmetric if H(σ, σ′) is invariant under permuting the vertices of U_σ and permuting the vertices of U_{σ′} (*more precisely, for the general case we permute the vertices within each index shape piece of U_σ and permute the vertices within each index shape piece of U_{σ′}).

Definition 7.70.
Given a shape τ, we say that a coefficient matrix H is a τ-coefficient matrix if H(σ, σ′) = 0 whenever V_σ ≢ U_τ or V_τ ≢ U_{σ′ᵀ}.

Definition 7.71. Given an index shape U, we define Id_U to be the shape with U_{Id_U} = V_{Id_U} = U, no other vertices, and no edges.

Given a shape τ and a τ-coefficient matrix H, we create two different matrix-valued functions, M^fact_τ(H) and M^orth_τ(H). As we will see, we can express Λ in terms of M^orth but to show PSDness we will need to shift to M^fact. We analyze the difference between M^fact and M^orth in subsections 8.2, 8.3, and 8.4.

Definition 7.72.
Given a shape τ and a τ-coefficient matrix H, define
M^fact_τ(H) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) M_σ M_τ M_{σ′}ᵀ

Proposition 7.73.
For all A and B with shapes in I(Λ),
(M^fact_τ(H))(A, B) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) ∑_{A′,B′} ∑_{R_1∈R(σ,A,A′), R_2∈R(τ,A′,B′), R_3∈R(σ′ᵀ,B′,B)} M_{R_1}(A, A′) M_{R_2}(A′, B′) M_{R_3}(B′, B)

If R_1, R_2, R_3 are properly composable then R = R_1 ◦ R_2 ◦ R_3 has the expected shape σ ◦ τ ◦ σ′ᵀ. Otherwise, R_1 ◦ R_2 ◦ R_3 will have a different shape. We define M^orth_τ(H) to be the same sum as M^fact_τ(H) except that it is restricted to properly composable ribbons R_1, R_2, R_3.

Definition 7.74.
We define M^orth_τ(H) so that for all A and B with shapes in I(Λ),
(M^orth_τ(H))(A, B) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) ∑_{A′,B′} ∑_{R_1∈R(σ,A,A′), R_2∈R(τ,A′,B′), R_3∈R(σ′ᵀ,B′,B): R_1, R_2, R_3 are properly composable} M_{R_1}(A, A′) M_{R_2}(A′, B′) M_{R_3}(B′, B)
= ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) ∑_{A′,B′} ∑_{R_1∈R(σ,A,A′), R_2∈R(τ,A′,B′), R_3∈R(σ′ᵀ,B′,B): R_1, R_2, R_3 are properly composable} M_{R_1◦R_2◦R_3}(A, B)

It would be nice if we had that M^orth_τ(H) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) M_{σ◦τ◦σ′ᵀ}. However, this is not quite correct because there is an additional term related to automorphism groups.

Definition 7.75.
Given a shape α, define Aut(α) to be the set of mappings from α to itself which keep U_α and V_α fixed.

Definition 7.76.
Given composable shapes σ, τ, σ′ᵀ, we define
Decomp(σ, τ, σ′) = Aut(σ ◦ τ ◦ σ′ᵀ)/(Aut(σ) × Aut(τ) × Aut(σ′ᵀ))

Remark 7.77.
Each element π ∈ Decomp(σ, τ, σ′) decomposes σ ◦ τ ◦ σ′ᵀ into σ, τ, and σ′ᵀ by specifying copies π(σ), π(τ), π(σ′ᵀ) of σ, τ, and σ′ᵀ such that π(σ) ◦ π(τ) ◦ π(σ′ᵀ) = π(σ ◦ τ ◦ σ′ᵀ) = σ ◦ τ ◦ σ′ᵀ. Thus, |Decomp(σ, τ, σ′)| is the number of ways to decompose σ ◦ τ ◦ σ′ᵀ into σ, τ, and σ′ᵀ.

Lemma 7.78. M^orth_τ(H) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) |Decomp(σ, τ, σ′ᵀ)| M_{σ◦τ◦σ′ᵀ}

Proof sketch.
Observe that there is a bijection between ribbons R with shape σ ◦ τ ◦ σ′ᵀ together with an element π ∈ Decomp(σ, τ, σ′) and triples of ribbons (R_1, R_2, R_3) such that
1. R_1, R_2, R_3 have shapes σ, τ, and σ′ᵀ, respectively.
2. V(R_1) ∩ V(R_2) = A_{R_2} = B_{R_1}, V(R_2) ∩ V(R_3) = A_{R_3} = B_{R_2}, and V(R_1) ∩ V(R_3) = A_{R_3} ∩ B_{R_1}.

To see this, note that given such ribbons R_1, R_2, R_3, the ribbon R = R_1 ◦ R_2 ◦ R_3 has shape σ ◦ τ ◦ σ′ᵀ and the ribbons R_1, R_2, R_3 specify a decomposition of σ ◦ τ ◦ σ′ᵀ into σ, τ, and σ′ᵀ. Conversely, given R and an element π ∈ Decomp(σ, τ, σ′), π specifies how to decompose R into ribbons R_1, R_2, R_3 of shapes σ, τ, and σ′ᵀ. For a more rigorous proof, see Appendix B.

Remark 7.79.
As this lemma shows, we have to be very careful about symmetry groups in our analysis. For accuracy, it is safest to check that the coefficients for each individual ribbon match.
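For intuition about the symmetry groups appearing in Lemma 7.78, here is a brute-force computation of |Aut(α)| for a tiny graph shape, under the simplifying assumption that automorphisms fix U_α and V_α pointwise; the encoding and function name are ours, not the paper's:

```python
from itertools import permutations

def aut_size(vertices, edges, U, V):
    # |Aut(alpha)| as in Definition 7.75, simplified graph case:
    # count permutations of V(alpha) fixing U_alpha and V_alpha pointwise
    # that preserve the (undirected) edge set.
    fixed = set(U) | set(V)
    movable = [x for x in vertices if x not in fixed]
    E = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(movable):
        m = dict(zip(movable, perm))
        m.update({x: x for x in fixed})
        if {frozenset({m[a], m[b]}) for a, b in edges} == E:
            count += 1
    return count
```

For a shape with two interchangeable middle vertices (U = {0}, V = {3}, edges 0–1, 0–2, 1–3, 2–3), |Aut(α)| = 2; groups like this are exactly what is quotiented out in Decomp(σ, τ, σ′).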
Given a matrix-valued function Λ, we can associate coefficient matrices to Λ as follows:

Definition 7.80.
Given a matrix-valued function Λ = ∑_{α: α is proper} λ_α M_α,
1. For each index shape U ∈ I_mid and every σ, σ′ ∈ L_U, we take H_{Id_U}(σ, σ′) = λ_{σ◦σ′ᵀ}/|Aut(U)|
2. For each U ∈ I_mid, τ ∈ M_U and σ, σ′ ∈ L_U, we take H_τ(σ, σ′) = λ_{σ◦τ◦σ′ᵀ}/(|Aut(U_τ)|·|Aut(V_τ)|)

Lemma 7.81. Λ = ∑_{U∈I_mid} M^orth_{Id_U}(H_{Id_U}) + ∑_{U∈I_mid} ∑_{τ∈M_U} M^orth_τ(H_τ)

Proof.
We check that the coefficients for each individual ribbon R match. There are two cases to consider.

If R has shape α where α has a unique minimum vertex separator S, then there is a bijection between orderings O_S for S and pairs of ribbons R_1, R_2 such that R_1 ◦ R_2 = R and the shapes σ, σ′ᵀ of R_1, R_2 are left and right shapes respectively. To see this, observe that when we concatenate R_1 and R_2, this assigns the matrix index B_{R_1} = A_{R_2} to S, which is equivalent to specifying an ordering O_S for S. Conversely, given an ordering O_S for S, we take R_1 to be the part of R between A_R and (S, O_S) and we take R_2 to be the part of R between (S, O_S) and B_R. From this bijection, it follows that the coefficient of M_R is λ_α on both sides of the equation.

Similarly, if R has shape α where α does not have a unique minimum vertex separator, then there is a bijection between orderings O_S, O_T for the leftmost and rightmost minimum vertex separators S, T of R and triples of ribbons R_1, R_2, R_3 such that R_1 ◦ R_2 ◦ R_3 = R and the shapes σ, τ, σ′ᵀ of R_1, R_2, R_3 are left, proper middle, and right shapes respectively. To see this, observe that when we concatenate R_1, R_2, and R_3, this assigns the matrix index B_{R_1} = A_{R_2} to S and assigns the matrix index B_{R_2} = A_{R_3} to T, which is equivalent to specifying orderings O_S, O_T for S, T. Conversely, given orderings O_S, O_T for S, T, we take R_1 to be the part of R between A_R and (S, O_S), we take R_2 to be the part of R between (S, O_S) and (T, O_T), and we take R_3 to be the part of R between (T, O_T) and B_R. From this bijection, it again follows that the coefficient of M_R is λ_α on both sides of the equation.

The H^{−γ,γ′} operation and qualitative theorem statement

In the intersection term analysis (see subsections 8.2, 8.3, and 8.4), we will need to further decompose left shapes σ as σ = σ_2 ◦ γ where σ_2 and γ are themselves left shapes. Accordingly, we make the following definitions.

Definition 7.82.
Given a moment matrix Λ, we define the following sets of left shapes:
1. Γ = {γ : γ is a non-trivial left shape, U_γ, V_γ ∈ I_mid, |V(γ)| ≤ D_V, ∀e ∈ E(γ), l_e ≤ D_E}
2. Given U, V ∈ I_mid such that w(U) > w(V), define Γ_{U,V} = {γ ∈ Γ : U_γ ≡ U, V_γ ≡ V}.
3. Given U ∈ I_mid, define Γ_{U,*} = {γ ∈ Γ : U_γ ≡ U}
4. Given V ∈ I_mid, define Γ_{*,V} = {γ ∈ Γ : V_γ ≡ V}

Remark 7.83.
Under our simplifying assumptions, Γ is the same as L except that Γ excludes the trivial shapes. In general, while L requires that U_σ ∈ I(Λ), Γ requires that U_γ ∈ I_mid. Note that I(Λ) and I_mid may be incomparable because
1. There may be index shapes U ∈ I_mid such that no matrix index of Λ has shape U.
2. All index shape pieces U_i for index shapes U ∈ I_mid must have p_i = 1 while this is not the case for I(Λ).

We now state our theorem qualitatively after giving one more definition.
Definition 7.84.
Given a shape τ, left shapes γ ∈ Γ_{*,U_τ} and γ′ ∈ Γ_{*,V_τ}, and a τ-coefficient matrix H, define H^{−γ,γ′} to be the (γ ◦ τ ◦ γ′ᵀ)-coefficient matrix with entries
1. H^{−γ,γ′}(σ, σ′) = H(σ ◦ γ, σ′ ◦ γ′) if |V(σ ◦ γ)| ≤ D_V and |V(σ′ ◦ γ′)| ≤ D_V.
2. H^{−γ,γ′}(σ, σ′) = 0 if |V(σ ◦ γ)| > D_V or |V(σ′ ◦ γ′)| > D_V.

Remark 7.85.
For the theorem, we will only need the case when γ′ = γ. Our qualitative theorem statement is as follows:
Theorem 7.86.
Let Λ = ∑_{U∈I_mid} M^orth_{Id_U}(H_{Id_U}) + ∑_{U∈I_mid} ∑_{τ∈M_U} M^orth_τ(H_τ) be an SOS-symmetric matrix-valued function. There exist functions f(τ) and f(γ) depending on n and other parameters such that if the following conditions hold:
1. For all U ∈ I_mid, H_{Id_U} ⪰ 0
2. For all U ∈ I_mid and all τ ∈ M_U,
[ H_{Id_U}      f(τ)H_τ ]
[ f(τ)H_τᵀ     H_{Id_U} ] ⪰ 0
3. For all U, V ∈ I_mid where w(U) > w(V) and all γ ∈ Γ_{U,V}, H^{−γ,γ}_{Id_V} ⪯ f(γ) H_{Id_U}
then with high probability Λ ⪰ 0.

Remark 7.87.
Roughly speaking, conditions 1 and 2 give us an approximate PSD decomposition for the moment matrix M. Condition 3 comes from the intersection term analysis, which is the most technically intensive part of the proof.

To state our theorem quantitatively, we will need a few more things. First, the conditions of the theorem will involve functions B_norm(α), B(γ), N(γ), and c(α). Roughly speaking, these functions will be used as follows in the analysis:
1. B_norm(α) will bound the norms of the matrices M_α.
2. B(γ) and N(γ) will help us bound the intersection terms (see Section 8.4).
3. c(α) will help us sum over the possible γ and τ.

Second, for technical reasons it turns out that comparing H^{−γ,γ}_{Id_{V_γ}} to H_{Id_{U_γ}} doesn't quite work. Instead, we compare H^{−γ,γ}_{Id_{V_γ}} to a matrix H′_γ of our choice where H′_γ is very close to H_{Id_{U_γ}} (H′_γ will be the same as H_{Id_{U_γ}} up to truncation error).

Definition 7.88.
Given a function B_norm(α), we define the distance d_τ(H_τ, H′_τ) between two τ-coefficient matrices H_τ and H′_τ to be
d_τ(H_τ, H′_τ) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} |H′_τ(σ, σ′) − H_τ(σ, σ′)| B_norm(σ) B_norm(τ) B_norm(σ′)

Third, we need an SOS-symmetric analogue of the identity matrix.
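Definition 7.88 is a weighted entrywise ℓ1 distance. A quick sketch, treating a τ-coefficient matrix as a dense array indexed by enumerated left shapes (an encoding we chose for illustration; not the paper's):

```python
import numpy as np

def d_tau(H, H_prime, B_sigma, B_tau, B_sigma_prime):
    # d_tau(H, H') = sum over sigma, sigma' of
    #   |H'(sigma, sigma') - H(sigma, sigma')| * B_norm(sigma) * B_norm(tau) * B_norm(sigma')
    # Rows are indexed by enumerated sigma in L_{U_tau}, columns by sigma' in
    # L_{V_tau}; B_sigma / B_sigma_prime hold the corresponding B_norm values.
    return float(B_tau * np.sum(np.abs(H_prime - H)
                                * np.outer(B_sigma, B_sigma_prime)))
```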
Definition 7.89.
We define Id_Sym to be the matrix such that
1. The rows and columns of Id_Sym are indexed by the matrix indices A, B whose index shape is in I(Λ).
2. Id_Sym(A, B) = 1 if p_A = p_B and Id_Sym(A, B) = 0 if p_A ≠ p_B.

Proposition 7.90. If M has SOS-symmetry and the rows and columns of M are indexed by matrix indices A, B whose index shape is in I(Λ) then M ⪯ ‖M‖ Id_Sym
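Proposition 7.90 is the SOS-symmetric analogue of the elementary fact that M ⪯ ‖M‖·Id for any symmetric matrix M. A numerical check of the plain-identity version (Id_Sym replaced by the ordinary identity; numpy is our choice here, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
M = (M + M.T) / 2                        # a symmetric test matrix
spec = np.linalg.norm(M, 2)              # spectral norm ||M||
# ||M|| * Id - M should be PSD: its smallest eigenvalue is >= 0
gap = np.linalg.eigvalsh(spec * np.eye(6) - M)
```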
Corollary 7.91.
For all τ and all SOS-symmetric τ-coefficient matrices H_τ and H′_τ,
M^fact_τ(H′_τ) + M^fact_{τᵀ}(H′_{τᵀ}) − M^fact_τ(H_τ) − M^fact_{τᵀ}(H_{τᵀ}) ⪯ d_τ(H_τ, H′_τ) Id_Sym
Note that if τ, H_τ, and H′_τ are all symmetric then M^fact_τ(H′_τ) − M^fact_τ(H_τ) ⪯ d_τ(H_τ, H′_τ) Id_Sym
Finally, we need a few more definitions about shapes α.

Definition 7.92 (M′). We define M′ to be the set of all shapes α such that
1. |V(α)| ≤ D_V
2. ∀e ∈ E(α), l_e ≤ D_E
3. The edges e ∈ E(α) have multiplicity at most D_V.

Definition 7.93 (S_α). Given a shape α, define S_α to be the leftmost minimum vertex separator of α.

Definition 7.94 (I_α). Given a shape α, define I_α to be the set of vertices in V(α) \ (U_α ∪ V_α) which are isolated.

We can now state our main theorem.
Theorem 7.95.
Given the moment matrix Λ = ∑_{U∈I_mid} M^orth_{Id_U}(H_{Id_U}) + ∑_{U∈I_mid} ∑_{τ∈M_U} M^orth_τ(H_τ), for all ε > 0, if we take
1. q = ⌈D_V ln(n) + ln(1/ε) + D_V ln(·) + D_V ln(·)⌉
2. B_vertex = D_V √(eq)
3. B_norm(α) = B_vertex^{|V(α)\U_α| + |V(α)\V_α|} · n^{(w(V(α)) + w(I_α) − w(S_α))/2}
4. B(γ) = B_vertex^{|V(γ)\U_γ| + |V(γ)\V_γ|} · n^{w(V(γ)\U_γ)}
5. N(γ) = (D_V)^{|V(γ)\V_γ| + |V(γ)\U_γ|}
6. c(α) = (D_V)^{|U_α\V_α| + |V_α\U_α| + |E(α)|} · 2^{|V(α)\(U_α∪V_α)|}
and we have SOS-symmetric coefficient matrices {H′_γ : γ ∈ Γ} such that the following conditions hold:
1. For all U ∈ I_mid, H_{Id_U} ⪰ 0
2. For all U ∈ I_mid and τ ∈ M_U,
[ H_{Id_U}/(|Aut(U)|c(τ))      B_norm(τ)H_τ ]
[ B_norm(τ)H_τᵀ      H_{Id_U}/(|Aut(U)|c(τ)) ] ⪰ 0
3. For all U, V ∈ I_mid where w(U) > w(V) and all γ ∈ Γ_{U,V}, c(γ)N(γ)B(γ)² H^{−γ,γ}_{Id_V} ⪯ H′_γ
then with probability at least 1 − ε,
Λ ⪰ (∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U})) − (∑_{U∈I_mid} ∑_{γ∈Γ_{U,*}} d_{Id_U}(H′_γ, H_{Id_U})/(|Aut(U)|c(γ))) Id_Sym
If it is also true that whenever ‖M_α‖ ≤ B_norm(α) for all α ∈ M′,
∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U}) ⪰ (∑_{U∈I_mid} ∑_{γ∈Γ_{U,*}} d_{Id_U}(H′_γ, H_{Id_U})/(|Aut(U)|c(γ))) Id_Sym
then with probability at least 1 − ε, Λ ⪰ 0.

7.10.1 General Main Theorem

Before stating the general main theorem, we need to modify a few definitions for α and give a few definitions for Ω.

Definition 7.96 (S_{α,min} and S_{α,max}). Given a shape α ∈ M′, define S_{α,min} to be the leftmost minimum vertex separator of α if all edges with multiplicity at least 2 are deleted and define S_{α,max} to be the leftmost minimum vertex separator of α if all edges with multiplicity at least 2 are present.

Definition 7.97 (General I_α). Given a shape α, define I_α to be the set of vertices in V(α) \ (U_α ∪ V_α) such that all edges incident with that vertex have multiplicity at least 2.

Definition 7.98 (B_Ω). We take B_Ω(j) to be a non-decreasing function such that for all j ∈ ℕ, E_Ω[x^j] ≤ B_Ω(j)^j.

Definition 7.99 (h⁺_j). For all j, we define h⁺_j to be the polynomial h_j where we make all of the coefficients have positive sign.

Lemma 7.100. If Ω = N(
0, 1) then we can take B_Ω(j) = √j and we have that
h⁺_j(x) ≤ (1/√(j!)) (x + j)^j ≤ ( e(x + j)²/j )^{j/2}

For a proof, see [AMP20, Lemma 8.15].
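The bound B_Ω(j) = √j in Lemma 7.100 can be verified directly against the Gaussian moments E[x^j] = (j − 1)!! for even j (a quick stdlib check of Definition 7.98; the helper name is ours):

```python
import math

def gaussian_moment(j):
    # E[x^j] for x ~ N(0, 1): zero for odd j and (j - 1)!! for even j.
    return 0 if j % 2 else math.prod(range(1, j, 2))
```

Since (j − 1)!! is a product of j/2 factors each below j, we indeed get E[x^j] ≤ j^{j/2} = B_Ω(j)^j.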
Theorem 7.101.
Given the moment matrix Λ = ∑_{U∈I_mid} M^orth_{Id_U}(H_{Id_U}) + ∑_{U∈I_mid} ∑_{τ∈M_U} M^orth_τ(H_τ), for all ε > 0, if we take
1. q = ⌈D_V ln(n) + ln(1/ε) + k D_V ln(D_E + 1) + D_V ln(·)⌉
2. B_vertex = qD_V
3. B_edge(e) = h⁺_{l_e}(B_Ω(D_V D_E)) · max_{j∈[D_V D_E]} { ( h⁺_j(B_Ω(qj)) )^{l_e/max{j, l_e}} }
As a special case, if Ω = N(
0, 1) then we can take B_edge(e) = (D_V D_E q)^{l_e}
4. B_norm(α) = e·B_vertex^{|V(α)\U_α| + |V(α)\V_α|} (∏_{e∈E(α)} B_edge(e)) n^{(w(V(α)) + w(I_α) − w(S_{α,min}))/2}
5. B(γ) = B_vertex^{|V(γ)\U_γ| + |V(γ)\V_γ|} (∏_{e∈E(γ)} B_edge(e)) n^{w(V(γ)\U_γ)}
6. N(γ) = (D_V)^{|V(γ)\V_γ| + |V(γ)\U_γ|}
7. c(α) = (t_max D_V)^{|U_α\V_α| + |V_α\U_α| + k|E(α)|} (t_max)^{|V(α)\(U_α∪V_α)|}
and we have SOS-symmetric coefficient matrices {H′_γ : γ ∈ Γ} such that the following conditions hold:
1. For all U ∈ I_mid, H_{Id_U} ⪰ 0
2. For all U ∈ I_mid and τ ∈ M_U,
[ H_{Id_U}/(|Aut(U)|c(τ))      B_norm(τ)H_τ ]
[ B_norm(τ)H_τᵀ      H_{Id_U}/(|Aut(U)|c(τ)) ] ⪰ 0
3. For all U, V ∈ I_mid where w(U) > w(V) and all γ ∈ Γ_{U,V}, c(γ)N(γ)B(γ)² H^{−γ,γ}_{Id_V} ⪯ H′_γ
then with probability at least 1 − ε,
Λ ⪰ (∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U})) − (∑_{U∈I_mid} ∑_{γ∈Γ_{U,*}} d_{Id_U}(H′_γ, H_{Id_U})/(|Aut(U)|c(γ))) Id_Sym
If it is also true that whenever ‖M_α‖ ≤ B_norm(α) for all α ∈ M′,
∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U}) ⪰ (∑_{U∈I_mid} ∑_{γ∈Γ_{U,*}} d_{Id_U}(H′_γ, H_{Id_U})/(|Aut(U)|c(γ))) Id_Sym
then with probability at least 1 − ε, Λ ⪰ 0.

H′_γ and Truncation Error

A canonical choice for H′_γ is to take
1. H′_γ(σ, σ′) = H_{Id_{U_γ}}(σ, σ′) whenever |V(σ ◦ γ)| ≤ D_V and |V(σ′ ◦ γ)| ≤ D_V.
2. H′_γ(σ, σ′) = 0 whenever |V(σ ◦ γ)| > D_V or |V(σ′ ◦ γ)| > D_V.
With this choice, the truncation error is
d_{Id_{U_γ}}(H_{Id_{U_γ}}, H′_γ) = ∑_{σ,σ′∈L_{U_γ}: |V(σ)| ≤ D_V, |V(σ′)| ≤ D_V, |V(σ◦γ)| > D_V or |V(σ′◦γ)| > D_V} B_norm(σ) B_norm(σ′) H_{Id_{U_γ}}(σ, σ′)

In this section, we prove the main theorem under the assumption that the functions B_norm(α), B(γ), N(γ), and c(α) have certain properties. More precisely, we prove the following theorem.

Theorem 8.1.
For all ε > 0 and all ε′ ∈ (0, 1], for any moment matrix Λ = ∑_{U∈I_mid} M^orth_{Id_U}(H_{Id_U}) + ∑_{U∈I_mid} ∑_{τ∈M_U} M^orth_τ(H_τ), if B_norm(α), B(γ), N(γ), and c(α) are functions such that
1. With probability at least (1 − ε), for all shapes α ∈ M′, ‖M_α‖ ≤ B_norm(α).
2. For all τ ∈ M′, γ ∈ Γ_{*,U_τ}, γ′ ∈ Γ_{*,V_τ}, and all intersection patterns P ∈ P_{γ,τ,γ′}, B_norm(τ_P) ≤ B(γ)B(γ′)B_norm(τ)
Note: Intersection patterns and P_{γ,τ,γ′} will be defined later, see Definitions 8.8 and 8.9.
3. For all composable γ_1, γ_2, B(γ_1)B(γ_2) = B(γ_1 ◦ γ_2).
4. ∀U ∈ I_mid, ∑_{γ∈Γ_{U,*}} 1/(|Aut(U)|c(γ)) < ε′
5. ∀V ∈ I_mid, ∑_{γ∈Γ_{*,V}} 1/(|Aut(U_γ)|c(γ)) < ε′
6. ∀U ∈ I_mid, ∑_{τ∈M_U} 1/(|Aut(U)|c(τ)) < ε′
7. For all τ ∈ M′, γ ∈ Γ_{*,U_τ} ∪ {Id_{U_τ}}, and γ′ ∈ Γ_{*,V_τ} ∪ {Id_{V_τ}},
∑_{j>0} ∑_{γ_1,γ′_1,...,γ_j,γ′_j ∈ Γ_{γ,γ′,j}} ( ∏_{i: γ_i is non-trivial} 1/|Aut(U_{γ_i})| ) ( ∏_{i: γ′_i is non-trivial} 1/|Aut(U_{γ′_i})| ) ( ∑_{P_1,...,P_j: P_i ∈ P_{γ_i, τ_{P_{i−1}}, γ′_iᵀ}} ∏_{i=1}^{j} N(P_i) ) ≤ N(γ)N(γ′) (|Aut(U_γ)|)^{[γ is non-trivial]} (|Aut(U_{γ′})|)^{[γ′ is non-trivial]}
Note: Γ_{γ,γ′,j} will be defined later, see Definition 8.18.
and we have SOS-symmetric coefficient matrices {H′_γ : γ ∈ Γ} such that the following conditions hold:
1. For all U ∈ I_mid, H_{Id_U} ⪰ 0
2. For all U ∈ I_mid and τ ∈ M_U,
[ H_{Id_U}/(|Aut(U)|c(τ))      B_norm(τ)H_τ ]
[ B_norm(τ)H_τᵀ      H_{Id_U}/(|Aut(U)|c(τ)) ] ⪰ 0
3. For all U, V ∈ I_mid where w(U) > w(V) and all γ ∈ Γ_{U,V}, c(γ)N(γ)B(γ)² H^{−γ,γ}_{Id_V} ⪯ H′_γ
then with probability at least 1 − ε,
Λ ⪰ (∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U})) − (∑_{U∈I_mid} ∑_{γ∈Γ_{U,*}} d_{Id_U}(H′_γ, H_{Id_U})/(|Aut(U)|c(γ))) Id_Sym
If it is also true that whenever ‖M_α‖ ≤ B_norm(α) for all α ∈ M′,
∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U}) ⪰ (∑_{U∈I_mid} ∑_{γ∈Γ_{U,*}} d_{Id_U}(H′_γ, H_{Id_U})/(|Aut(U)|c(γ))) Id_Sym
then with probability at least 1 − ε, Λ ⪰ 0.

Throughout this section, we assume that we have functions B_norm(α), B(γ), N(γ), and c(α). If ∀α ∈ M′, ‖M_α‖ ≤ B_norm(α) then we say that the norm bounds hold. For the other properties of these functions, we will either restate these properties in our intermediate results to highlight where these properties are needed or just state that the conditions on these functions are satisfied for brevity.

8.1 Warm-up: Analysis with no intersection terms

In this subsection, we show how the analysis works if we ignore the difference between M^fact and M^orth.

Theorem 8.2.
For all ε′ ∈ (0, 1], if the norm bounds hold and the following conditions hold
1. For all U ∈ I_mid, H_{Id_U} ⪰ 0
2. For all U ∈ I_mid and all τ ∈ M_U,
[ H_{Id_U}/(|Aut(U)|c(τ))      B_norm(τ)H_τ ]
[ B_norm(τ)H_τᵀ      H_{Id_U}/(|Aut(U)|c(τ)) ] ⪰ 0
3. ∀U ∈ I_mid, ∑_{τ∈M_U} 1/(|Aut(U)|c(τ)) ≤ ε′
then
∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U}) + ∑_{U∈I_mid} ∑_{τ∈M_U} M^fact_τ(H_τ) ⪰ (1 − ε′) ∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U}) ⪰ 0

Proof.
We first show how a single term M_σ M_τ M_{σ′}ᵀ plus its transpose M_{σ′} M_τᵀ M_σᵀ can be bounded.

Lemma 8.3.
If the norm bounds hold then for all τ ∈ M′ and shapes σ, σ′ such that σ, τ, σ′ᵀ are composable, for all a, b such that a > 0, b > 0, and ab = B_norm(τ)²,
M_σ M_τ M_{σ′}ᵀ + M_{σ′} M_τᵀ M_σᵀ ⪯ a M_σ M_σᵀ + b M_{σ′} M_{σ′}ᵀ

Proof.
Observe that
0 ⪯ ( √a M_σ − (√b/B_norm(τ)) M_{σ′} M_τᵀ )( √a M_σ − (√b/B_norm(τ)) M_{σ′} M_τᵀ )ᵀ
= ( √a M_σ − (√b/B_norm(τ)) M_{σ′} M_τᵀ )( √a M_σᵀ − (√b/B_norm(τ)) M_τ M_{σ′}ᵀ )
= a M_σ M_σᵀ − M_σ M_τ M_{σ′}ᵀ − M_{σ′} M_τᵀ M_σᵀ + (b/B_norm(τ)²) M_{σ′} M_τᵀ M_τ M_{σ′}ᵀ
⪯ a M_σ M_σᵀ − M_σ M_τ M_{σ′}ᵀ − M_{σ′} M_τᵀ M_σᵀ + (b/B_norm(τ)²) M_{σ′} (B_norm(τ)² Id) M_{σ′}ᵀ

Thus, M_σ M_τ M_{σ′}ᵀ + M_{σ′} M_τᵀ M_σᵀ ⪯ a M_σ M_σᵀ + b M_{σ′} M_{σ′}ᵀ, as needed.

Unfortunately, if we try to bound everything term by term, there may be too many terms to bound. Instead, we generalize this argument to vectors and coefficient matrices.

Definition 8.4.
Let τ be a shape. We say that a vector v is a left τ-vector if the coordinates of v are indexed by left shapes σ ∈ L_{U_τ}. We say that a vector w is a right τ-vector if the coordinates of w are indexed by left shapes σ′ ∈ L_{V_τ}.

Lemma 8.5. For all τ ∈ M′, if the norm bounds hold, v is a left τ-vector, and w is a right τ-vector then
M^fact_τ(vwᵀ) + M^fact_{τᵀ}(wvᵀ) ⪯ B_norm(τ)( M^fact_{Id_{U_τ}}(vvᵀ) + M^fact_{Id_{V_τ}}(wwᵀ) )
and
−M^fact_τ(vwᵀ) − M^fact_{τᵀ}(wvᵀ) ⪯ B_norm(τ)( M^fact_{Id_{U_τ}}(vvᵀ) + M^fact_{Id_{V_τ}}(wwᵀ) )

Proof.
Observe that
0 ⪯ ( ∑_σ v_σ M_σ ∓ (1/B_norm(τ)) ∑_σ w_σ M_σ M_τᵀ )( ∑_{σ′} v_{σ′} M_{σ′} ∓ (1/B_norm(τ)) ∑_{σ′} w_{σ′} M_{σ′} M_τᵀ )ᵀ
= ( ∑_σ v_σ M_σ ∓ (1/B_norm(τ)) ∑_σ w_σ M_σ M_τᵀ )( ∑_{σ′} v_{σ′} M_{σ′}ᵀ ∓ (1/B_norm(τ)) ∑_{σ′} w_{σ′} M_τ M_{σ′}ᵀ )
= ∑_{σ,σ′} (v_σ v_{σ′}) M_σ M_{σ′}ᵀ ∓ (1/B_norm(τ)) ∑_{σ,σ′} (v_σ w_{σ′}) M_σ M_τ M_{σ′}ᵀ ∓ (1/B_norm(τ)) ∑_{σ,σ′} (w_σ v_{σ′}) M_σ M_τᵀ M_{σ′}ᵀ + (1/B_norm(τ)²) ∑_{σ,σ′} (w_σ w_{σ′}) M_σ M_τᵀ M_τ M_{σ′}ᵀ

Further observe that
1. ∑_{σ,σ′} (v_σ v_{σ′}) M_σ M_{σ′}ᵀ = M^fact_{Id_{U_τ}}(vvᵀ)
2. ∑_{σ,σ′} (v_σ w_{σ′}) M_σ M_τ M_{σ′}ᵀ = M^fact_τ(vwᵀ)
3. ∑_{σ,σ′} (w_σ v_{σ′}) M_σ M_τᵀ M_{σ′}ᵀ = M^fact_{τᵀ}(wvᵀ)
4. ∑_{σ,σ′} (w_σ w_{σ′}) M_σ M_τᵀ M_τ M_{σ′}ᵀ = (∑_σ w_σ M_σ) M_τᵀ M_τ (∑_σ w_σ M_σ)ᵀ ⪯ (∑_σ w_σ M_σ)(B_norm(τ)² Id)(∑_σ w_σ M_σ)ᵀ = B_norm(τ)² ∑_{σ,σ′} (w_σ w_{σ′}) M_σ M_{σ′}ᵀ = B_norm(τ)² M^fact_{Id_{V_τ}}(wwᵀ)

Putting everything together,
( M^fact_τ(vwᵀ) + M^fact_{τᵀ}(wvᵀ) )/B_norm(τ) ⪯ M^fact_{Id_{U_τ}}(vvᵀ) + M^fact_{Id_{V_τ}}(wwᵀ)
and
−( M^fact_τ(vwᵀ) + M^fact_{τᵀ}(wvᵀ) )/B_norm(τ) ⪯ M^fact_{Id_{U_τ}}(vvᵀ) + M^fact_{Id_{V_τ}}(wwᵀ)
as needed.

Corollary 8.6. For all τ ∈ M′, if the norm bounds hold and H_U and H_V are matrices such that
[ H_U      B_norm(τ)H_τ ]
[ B_norm(τ)H_τᵀ      H_V ] ⪰ 0
then M^fact_τ(H_τ) + M^fact_{τᵀ}(H_{τᵀ}) ⪯ M^fact_{Id_{U_τ}}(H_U) + M^fact_{Id_{V_τ}}(H_V)

Proof. If the block matrix [ H_U, B_norm(τ)H_τ; B_norm(τ)H_τᵀ, H_V ] is PSD then we can write
[ H_U, B_norm(τ)H_τ; B_norm(τ)H_τᵀ, H_V ] = ∑_i (v_i, w_i)(v_i, w_i)ᵀ
Since the M^fact operations are linear, the result now follows by summing the equation
M^fact_τ(v_i w_iᵀ) + M^fact_{τᵀ}(w_i v_iᵀ) ⪯ B_norm(τ)( M^fact_{Id_{U_τ}}(v_i v_iᵀ) + M^fact_{Id_{V_τ}}(w_i w_iᵀ) )
over all i.

Theorem 8.2 now follows directly.
For all U ∈ I_mid and all τ ∈ M_U, using Corollary 8.6 with H_U = H_V = H_{Id_U}/(|Aut(U)|c(τ)),
M^fact_τ(H_τ) + M^fact_{τᵀ}(H_{τᵀ}) ⪯ (1/(|Aut(U)|c(τ))) M^fact_{Id_U}(H_{Id_U}) + (1/(|Aut(U)|c(τ))) M^fact_{Id_U}(H_{Id_U})
Summing this equation over all U ∈ I_mid and all τ ∈ M_U, we obtain that
∑_{U∈I_mid} ∑_{τ∈M_U} M^fact_τ(H_τ) ⪯ ε′ ∑_{U∈I_mid} M^fact_{Id_U}(H_{Id_U})
as needed.

As we saw in the previous subsection, the analysis works out nicely if we work with M^fact. Unfortunately, our matrices are expressed in terms of M^orth. In this subsection, we describe our strategy for analyzing the difference between M^fact and M^orth.

Recall the following expressions for (M^fact_τ(H))(A, B) and (M^orth_τ(H))(A, B) where A has shape U_τ and B has shape V_τ:
(M^fact_τ(H))(A, B) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) ∑_{A′,B′} ∑_{R_1∈R(σ,A,A′), R_2∈R(τ,A′,B′), R_3∈R(σ′ᵀ,B′,B)} M_{R_1}(A, A′) M_{R_2}(A′, B′) M_{R_3}(B′, B)
(M^orth_τ(H))(A, B) = ∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) ∑_{A′,B′} ∑_{R_1∈R(σ,A,A′), R_2∈R(τ,A′,B′), R_3∈R(σ′ᵀ,B′,B): R_1, R_2, R_3 are properly composable} M_{R_1}(A, A′) M_{R_2}(A′, B′) M_{R_3}(B′, B)
Thus, (M^fact_τ(H))(A, B) − (M^orth_τ(H))(A, B) is equal to
∑_{σ∈L_{U_τ}, σ′∈L_{V_τ}} H(σ, σ′) ∑_{A′,B′} ∑_{R_1∈R(σ,A,A′), R_2∈R(τ,A′,B′), R_3∈R(σ′ᵀ,B′,B): R_1, R_2, R_3 are not properly composable} M_{R_1}(A, A′) M_{R_2}(A′, B′) M_{R_3}(B′, B)

Thus, to understand the difference between M^fact and M^orth, we need to analyze the terms χ_{R_1} χ_{R_2} χ_{R_3} = χ_{R_1◦R_2◦R_3} for ribbons R_1, R_2, R_3 which are composable but not properly composable. These terms, which we call intersection terms, are not negligible and must be analyzed carefully.
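As a numerical sanity check of Lemma 8.3 (used in the warm-up proof above), one can draw random matrices, set B = ‖M_τ‖, pick any a, b > 0 with ab = B², and confirm the PSD inequality. This is our own test harness, not part of the paper's machinery:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
M_sigma = rng.standard_normal((d, d))    # stand-ins for M_sigma, M_sigma', M_tau
M_sigma_p = rng.standard_normal((d, d))
M_tau = rng.standard_normal((d, d))
B = np.linalg.norm(M_tau, 2)             # the norm bound B_norm(tau) = ||M_tau||
a = 3.0
b = B ** 2 / a                           # any a, b > 0 with a * b = B_norm(tau)^2
lhs = M_sigma @ M_tau @ M_sigma_p.T + M_sigma_p @ M_tau.T @ M_sigma.T
rhs = a * M_sigma @ M_sigma.T + b * M_sigma_p @ M_sigma_p.T
slack = np.linalg.eigvalsh(rhs - lhs)    # all eigenvalues >= 0 iff rhs - lhs is PSD
```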
In particular, we decompose each resulting ribbon R = R_1 ◦ R_2 ◦ R_3 into new left, middle, and right parts. We do this as follows:
1. Let V* be the set of vertices which appear more than once in V(R_1 ◦ R_2 ◦ R_3). In other words, V* is the set of vertices involved in the intersections between R_1, R_2, and R_3 (not counting the facts that B_{R_1} = A_{R_2} and B_{R_2} = A_{R_3} because we expect these intersections).
2. Let A′ be the leftmost minimum vertex separator of A_{R_1} and B_{R_1} ∪ V* in R_1. We turn A′ into a matrix index by specifying an ordering O_{A′} for the vertices in A′.
3. Let B′ be the leftmost minimum vertex separator of A_{R_3} ∪ V* and B_{R_3} in R_3. We turn B′ into a matrix index by specifying an ordering O_{B′} for the vertices in B′.
4. Decompose R_1 as R_1 = R_1′ ◦ R_1″ where R_1′ is the part of R_1 between A_{R_1} and A′ and R_1″ is the part of R_1 between A′ and B_{R_1} = A_{R_2}. Similarly, decompose R_3 as R_3 = R_3″ ◦ R_3′ where R_3″ is the part of R_3 between A_{R_3} = B_{R_2} and B′ and R_3′ is the part of R_3 between B′ and B_{R_3}.
5. Take R_2′ = R_1″ ◦ R_2 ◦ R_3″ and note that R_1′ ◦ R_2′ ◦ R_3′ = R_1 ◦ R_2 ◦ R_3. We view R_1′, R_2′, R_3′ as the left, middle, and right parts of R = R_1 ◦ R_2 ◦ R_3.

While we will verify our analysis by checking the coefficients of the ribbons, we want to express everything in terms of shapes. We use the following conventions for the names of the shapes:
1. As usual, we let σ, τ, and σ′ᵀ be the shapes of R_1, R_2, and R_3.
2. We let γ and γ′ᵀ be the shapes of R_1″ and R_3″.
3. We let σ_2, τ_P, and σ′_2ᵀ be the shapes of R_1′, R_2′, and R_3′. Here P is the intersection pattern induced by R_1, R_2, and R_3 which we define in the next subsection.

Remark 8.7.
A key feature of our analysis is that it will work the same way regardless of the shapes $\sigma_2$, $\sigma_2'^T$ of $R_1'$ and $R_3'$. In other words, if we replace $\sigma_2$ by $\sigma_a$ and $\sigma_2'$ by $\sigma_a'$ for a given intersection term, this just replaces $\sigma = \sigma_2 \circ \gamma$ with $\sigma_a \circ \gamma$ and $\sigma' = \sigma_2' \circ \gamma'$ with $\sigma_a' \circ \gamma'$. This allows us to focus on the shapes $\gamma$, $\tau$, and $\gamma'^T$ and is the reason why the $-\gamma,\gamma$ operation appears in our results.

In this section, we implement our strategy for analyzing intersection terms. For simplicity, we only give rough definitions and proof sketches here. For a more rigorous treatment, see Appendix B.

We begin by defining intersection patterns, which describe how the ribbons $R_1$, $R_2$, and $R_3$ intersect.

Definition 8.8 (Rough Definition of Intersection Patterns). Given $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$, and ribbons $R_1$, $R_2$, and $R_3$ of shapes $\gamma$, $\tau$, and $\gamma'^T$ which are composable but not properly composable, we define the intersection pattern $P$ induced by $R_1$, $R_2$, and $R_3$ and the resulting shape $\tau_P$ as follows:
1. We take $V(P) = V(\gamma \circ \tau \circ \gamma'^T)$.
2. We take $E(P)$ to be the set of edges $(u,v)$ such that $u,v$ are distinct vertices in $V(\gamma \circ \tau \circ \gamma'^T)$ but $u$ and $v$ correspond to the same vertex in $R_1 \circ R_2 \circ R_3$.
3. We define $\tau_P$ to be the shape of the ribbon $R = R_1 \circ R_2 \circ R_3$.

Definition 8.9.
Given $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, and $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$, we define $\mathcal{P}_{\gamma,\tau,\gamma'^T}$ to be the set of all possible intersection patterns $P$ which can be induced by ribbons $R_1$, $R_2$, and $R_3$ of shapes $\gamma$, $\tau$, and $\gamma'^T$.

Remark 8.10.
Note that if $\gamma = Id_{U_{\tau}}$ and $\gamma' = Id_{V_{\tau}}$ then $\mathcal{P}_{\gamma,\tau,\gamma'^T} = \emptyset$: every intersection pattern must have an unexpected intersection, so either $\gamma$ or $\gamma'$ must be non-trivial.

It would be nice if the intersection pattern $P$ together with the ribbon $R$ allowed us to recover the original ribbons $R_1$, $R_2$, and $R_3$. Unfortunately, it is possible for different triples of ribbons to result in the same intersection pattern $P$ and ribbon $R$. That said, the number of such triples cannot be too large, and this is sufficient for our purposes.

Definition 8.11.
Given an intersection pattern $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$, let $R$ be a ribbon of shape $\tau_P$. We define $N(P)$ to be the number of different triples of ribbons $R_1, R_2, R_3$ such that $R_1 \circ R_2 \circ R_3 = R$ and $R_1, R_2, R_3$ induce the intersection pattern $P$.

Lemma 8.12.
For all intersection patterns $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$,
$$N(P) \leq |V(\tau_P)|^{|V(\gamma)\setminus U_{\gamma}| + |V(\gamma')\setminus U_{\gamma'}|}$$

Proof sketch.
This can be proved by making the following observations:
1. $A_{R_1} = A_R$ and $B_{R_3} = B_R$.
2. All of the remaining vertices in $V(R_1)$ and $V(R_3)$ must be equal to some vertex in $V(R)$.
3. Once $R_1$ and $R_3$ are determined, there is at most one ribbon $R_2$ such that $R_1, R_2, R_3$ are composable, $R = R_1 \circ R_2 \circ R_3$, and $R_1, R_2, R_3$ induce the intersection pattern $P$.

With these definitions, we can now analyze the intersection terms.

Definition 8.13.
Given a left shape $\sigma$, define $e_{\sigma}$ to be the vector which has a 1 in coordinate $\sigma$ and a 0 in all other coordinates.

Lemma 8.14.
For all $\tau \in \mathcal{M}'$, $\sigma \in \mathcal{L}_{U_{\tau}}$, and $\sigma' \in \mathcal{L}_{V_{\tau}}$,
$$M^{fact}_{\tau}(e_{\sigma}e_{\sigma'}^T) - M^{orth}_{\tau}(e_{\sigma}e_{\sigma'}^T) = \sum_{\substack{\sigma_2 \in \mathcal{L},\, \gamma \in \Gamma:\\ \sigma_2 \circ \gamma = \sigma}} \frac{1}{|Aut(U_{\gamma})|} \sum_{P \in \mathcal{P}_{\gamma,\tau,Id_{V_{\tau}}}} N(P)\, M^{orth}_{\tau_P}(e_{\sigma_2}e_{\sigma'}^T) + \sum_{\substack{\sigma_2' \in \mathcal{L},\, \gamma' \in \Gamma:\\ \sigma_2' \circ \gamma' = \sigma'}} \frac{1}{|Aut(U_{\gamma'})|} \sum_{P \in \mathcal{P}_{Id_{U_{\tau}},\tau,\gamma'^T}} N(P)\, M^{orth}_{\tau_P}(e_{\sigma}e_{\sigma_2'}^T)$$
$$+ \sum_{\substack{\sigma_2 \in \mathcal{L},\, \gamma \in \Gamma:\\ \sigma_2 \circ \gamma = \sigma}} \sum_{\substack{\sigma_2' \in \mathcal{L},\, \gamma' \in \Gamma:\\ \sigma_2' \circ \gamma' = \sigma'}} \frac{1}{|Aut(U_{\gamma})| \cdot |Aut(U_{\gamma'})|} \sum_{P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}} N(P)\, M^{orth}_{\tau_P}(e_{\sigma_2}e_{\sigma_2'}^T)$$

Proof sketch. This lemma follows from the following bijection. Consider the third term. On one side, we have the following data:
1. Ribbons $R_1$, $R_2$, and $R_3$ of shapes $\sigma$, $\tau$, and $\sigma'^T$ such that $R_1, R_2, R_3$ are composable but $R_1$ and $R_2 \circ R_3$ are not properly composable (i.e. $R_1$ has an unexpected intersection with $R_2$ and/or $R_3$) and $R_1 \circ R_2$ and $R_3$ are not properly composable (i.e. $R_3$ has an unexpected intersection with $R_1$ and/or $R_2$).
2. An ordering $O_{A'}$ on the leftmost minimum vertex separator $A'$ of $A_{R_1}$ and $V^* \cup B_{R_1}$ (recall that $V^*$ is the set of vertices which appear more than once in $V(R_1 \circ R_2 \circ R_3)$).
3. An ordering $O_{B'}$ on the rightmost minimum vertex separator $B'$ of $V^* \cup A_{R_3}$ and $B_{R_3}$.
On the other side, we have the following data:
1. An intersection pattern $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$ where $\gamma$ and $\gamma'^T$ are non-trivial.
2. Ribbons $R_1'$, $R_2'$, $R_3'$ of shapes $\sigma_2$, $\tau_P$, $\sigma_2'^T$ which are properly composable.
3. A number in $[N(P)]$ describing which possible triple of ribbons resulted in the intersection pattern $P$ and the ribbon $R_2'$.
To see this bijection, note that given the data on the first side, we can recover the ribbons $R_1'$, $R_2'$, and $R_3'$ as follows:
1. We decompose $R_1$ as $R_1 = R_1' \circ R_4$ where $B_{R_1'} = A_{R_4} = A'$ with the ordering $O_{A'}$.
2. We decompose $R_3$ as $R_3 = R_5 \circ R_3'$ where $B_{R_5} = A_{R_3'} = B'$ with the ordering $O_{B'}$.
3.
We take $R_2' = R_4 \circ R_2 \circ R_5$. The intersection pattern $P$ and the number in $[N(P)]$ can be obtained from $R_4$, $R_2$, and $R_5$.

Conversely, with the data on the other side, we can recover the data on the first side as follows:
1. $R_2'$ gives an ordering $O_{A'}$ for $A' = A_{R_2'}$ and an ordering $O_{B'}$ for $B' = B_{R_2'}$.
2. The ribbon $R_2'$, intersection pattern $P$, and number in $[N(P)]$ allow us to recover $R_4$, $R_2$, and $R_5$.
3. We take $R_1 = R_1' \circ R_4$ and $R_3 = R_5 \circ R_3'$.

Thus, both sides have the same coefficient for each ribbon.

The analysis for the first term is the same except that when $\gamma'$ is trivial, we always take $\gamma' = Id_{V_{\tau}}$. Thus, we always have that $B' = B_{R_2'} = B_{R_2}$ (with the same ordering) and $R_5 = Id_{B'}$, so there is no need to specify $R_5$ or an ordering on $B'$.

Similarly, the analysis for the second term is the same except that when $\gamma$ is trivial, we always take $\gamma = Id_{U_{\tau}}$. Thus, we always have that $A' = A_{R_2'} = A_{R_2}$ (with the same ordering) and $R_4 = Id_{A'}$, so there is no need to specify $R_4$ or an ordering on $A'$.

Applying Lemma 8.14 for all $\sigma$ and $\sigma'$ simultaneously, we obtain the following corollary.

Definition 8.15. For all $U, V \in \mathcal{I}_{mid}$, given a $\gamma \in \Gamma_{U,V}$ and a vector $v$ indexed by left shapes $\sigma \in \mathcal{L}_V$, define $v^{-\gamma}$ to be the vector indexed by left shapes $\sigma_2 \in \mathcal{L}_U$ such that $v^{-\gamma}(\sigma_2) = v(\sigma_2 \circ \gamma)$ if $\sigma_2 \circ \gamma \in \mathcal{L}_V$ and $v^{-\gamma}(\sigma_2) = 0$ otherwise.

Proposition 8.16.
For all composable $\gamma_1, \gamma_2 \in \Gamma$ and all vectors $v$ indexed by left shapes in $\mathcal{L}_{V_{\gamma_2}}$, $(v^{-\gamma_2})^{-\gamma_1} = v^{-\gamma_1 \circ \gamma_2}$.

Corollary 8.17.
For all $\tau \in \mathcal{M}'$, for all left $\tau$-vectors $v$ and all right $\tau$-vectors $w$,
$$M^{orth}_{\tau}(vw^T) = M^{fact}_{\tau}(vw^T) - \sum_{\gamma \in \Gamma_{*,U_{\tau}}} \frac{1}{|Aut(U_{\gamma})|} \sum_{P \in \mathcal{P}_{\gamma,\tau,Id_{V_{\tau}}}} N(P)\, M^{orth}_{\tau_P}(v^{-\gamma}w^T) - \sum_{\gamma' \in \Gamma_{*,V_{\tau}}} \frac{1}{|Aut(U_{\gamma'})|} \sum_{P \in \mathcal{P}_{Id_{U_{\tau}},\tau,\gamma'^T}} N(P)\, M^{orth}_{\tau_P}(v(w^{-\gamma'})^T)$$
$$- \sum_{\gamma \in \Gamma_{*,U_{\tau}}} \sum_{\gamma' \in \Gamma_{*,V_{\tau}}} \frac{1}{|Aut(U_{\gamma})| \cdot |Aut(U_{\gamma'})|} \sum_{P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}} N(P)\, M^{orth}_{\tau_P}(v^{-\gamma}(w^{-\gamma'})^T)$$

Applying Corollary 8.17 iteratively, we obtain the following theorem:
Definition 8.18.
Given $\gamma, \gamma' \in \Gamma \cup \{Id_U : U \in \mathcal{I}_{mid}\}$ and $j > 0$, let $\Gamma_{\gamma,\gamma',j}$ be the set of all $\gamma_1, \gamma_1', \cdots, \gamma_j, \gamma_j' \in \Gamma \cup \{Id_U : U \in \mathcal{I}_{mid}\}$ such that:
1. $\gamma_j, \ldots, \gamma_1$ are composable and $\gamma_j \circ \ldots \circ \gamma_1 = \gamma$.
2. $\gamma_j', \ldots, \gamma_1'$ are composable and $\gamma_j' \circ \ldots \circ \gamma_1' = \gamma'$.
3. For all $i \in [j]$, $\gamma_i$ or $\gamma_i'$ is non-trivial (i.e. $\gamma_i \neq Id_{U_{\gamma_i}}$ or $\gamma_i' \neq Id_{U_{\gamma_i'}}$).

Remark 8.19.
Note that if $\gamma = Id_U$ and $\gamma' = Id_V$ then for all $j > 0$, $\Gamma_{\gamma,\gamma',j} = \emptyset$.

Theorem 8.20.
For all $\tau \in \mathcal{M}'$, left $\tau$-vectors $v$, and right $\tau$-vectors $w$,
$$M^{orth}_{\tau}(vw^T) = M^{fact}_{\tau}(vw^T) + \sum_{\substack{\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\},\ \gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}:\\ \gamma \text{ or } \gamma' \text{ is non-trivial}}} \sum_{j > 0} (-1)^j \sum_{\gamma_1,\gamma_1',\cdots,\gamma_j,\gamma_j' \in \Gamma_{\gamma,\gamma',j}} \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|}\, M^{fact}_{\tau_{P_j}}\!\left(v^{-\gamma}(w^{-\gamma'})^T\right)$$
where we take $\tau_{P_0} = \tau$.

Bounding the difference between $M^{fact}$ and $M^{orth}$. In this subsection, we bound the difference between $M^{fact}_{\tau}(H_{\tau})$ and $M^{orth}_{\tau}(H_{\tau})$. We recall the following conditions on $B(\gamma)$, $N(\gamma)$, and $c(\gamma)$:
1. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, and $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$,
$$\sum_{j > 0} \sum_{\gamma_1,\gamma_1',\cdots,\gamma_j,\gamma_j' \in \Gamma_{\gamma,\gamma',j}} \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|} \leq \frac{N(\gamma)N(\gamma')}{(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}}$$
2. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}}$, and $\gamma' \in \Gamma_{*,V_{\tau}}$, for all $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$, $B_{norm}(\tau_P) \leq B(\gamma)B(\gamma')B_{norm}(\tau)$.
3. $\forall V \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{*,V}} \frac{1}{|Aut(U_{\gamma})|c(\gamma)} \leq \varepsilon' \leq 1$.

With these conditions, we can now bound the difference between $M^{fact}$ and $M^{orth}$.

Lemma 8.21.
If the norm bounds and the conditions on $B(\gamma)$, $N(\gamma)$, and $c(\gamma)$ hold then for all $\tau \in \mathcal{M}'$, left $\tau$-vectors $v$, and right $\tau$-vectors $w$,
$$\left(M^{fact}_{\tau}(vw^T) + M^{fact}_{\tau^T}(wv^T)\right) - \left(M^{orth}_{\tau}(vw^T) + M^{orth}_{\tau^T}(wv^T)\right) \preceq \varepsilon' B_{norm}(\tau) M^{fact}_{Id_{U_{\tau}}}(vv^T) + \sum_{\gamma \in \Gamma_{*,U_{\tau}}} \frac{B(\gamma)N(\gamma)B_{norm}(\tau)}{c(\gamma)|Aut(U_{\gamma})|} M^{fact}_{Id_{U_{\gamma}}}(v^{-\gamma}(v^{-\gamma})^T)$$
$$+ \varepsilon' B_{norm}(\tau) M^{fact}_{Id_{V_{\tau}}}(ww^T) + \sum_{\gamma' \in \Gamma_{*,V_{\tau}}} \frac{B(\gamma')N(\gamma')B_{norm}(\tau)}{c(\gamma')|Aut(U_{\gamma'})|} M^{fact}_{Id_{U_{\gamma'}}}(w^{-\gamma'}(w^{-\gamma'})^T)$$

Proof.
By Theorem 8.20, taking $\tau_{P_0} = \tau$, $M^{orth}_{\tau}(vw^T)$ equals $M^{fact}_{\tau}(vw^T)$ plus a sum of terms $M^{fact}_{\tau_{P_j}}(v^{-\gamma}(w^{-\gamma'})^T)$ with coefficients
$$(-1)^j\, \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|}$$
Taking the transpose of this equation gives the analogous expression for $M^{orth}_{\tau^T}(wv^T)$ in terms of $M^{fact}_{\tau_{P_j}^T}(w^{-\gamma'}(v^{-\gamma})^T)$ with the same coefficients. Now observe that
$$\pm\left(M^{fact}_{\tau_{P_j}}(v^{-\gamma}(w^{-\gamma'})^T) + M^{fact}_{\tau_{P_j}^T}(w^{-\gamma'}(v^{-\gamma})^T)\right) = \pm M^{fact}_{\tau_{P_j}}\!\left(\left(\sqrt{\tfrac{N(\gamma)B(\gamma)c(\gamma')}{c(\gamma)N(\gamma')B(\gamma')}}\, v^{-\gamma}\right)\!\left(\sqrt{\tfrac{c(\gamma)N(\gamma')B(\gamma')}{N(\gamma)B(\gamma)c(\gamma')}}\, w^{-\gamma'}\right)^{T}\right) \pm M^{fact}_{\tau_{P_j}^T}\!\left(\left(\sqrt{\tfrac{c(\gamma)N(\gamma')B(\gamma')}{N(\gamma)B(\gamma)c(\gamma')}}\, w^{-\gamma'}\right)\!\left(\sqrt{\tfrac{N(\gamma)B(\gamma)c(\gamma')}{c(\gamma)N(\gamma')B(\gamma')}}\, v^{-\gamma}\right)^{T}\right)$$
$$\preceq B_{norm}(\tau_{P_j})\left(\frac{N(\gamma)B(\gamma)c(\gamma')}{c(\gamma)N(\gamma')B(\gamma')}\, M^{fact}_{Id_{U_{\gamma}}}(v^{-\gamma}(v^{-\gamma})^T) + \frac{c(\gamma)N(\gamma')B(\gamma')}{N(\gamma)B(\gamma)c(\gamma')}\, M^{fact}_{Id_{U_{\gamma'}}}(w^{-\gamma'}(w^{-\gamma'})^T)\right)$$
Combining these equations and then applying the conditions on $B(\gamma)$ and $N(\gamma)$, namely
1. $B_{norm}(\tau_{P_j}) \leq B(\gamma)B(\gamma')B_{norm}(\tau)$, and
2. $$\sum_{j > 0} \sum_{\gamma_1,\gamma_1',\cdots,\gamma_j,\gamma_j' \in \Gamma_{\gamma,\gamma',j}} \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|} \leq \frac{N(\gamma)N(\gamma')}{(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}},$$
we obtain
$$\left(M^{fact}_{\tau}(vw^T) + M^{fact}_{\tau^T}(wv^T)\right) - \left(M^{orth}_{\tau}(vw^T) + M^{orth}_{\tau^T}(wv^T)\right) \preceq \sum_{\substack{\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\},\ \gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}:\\ \gamma \text{ or } \gamma' \text{ is non-trivial}}} \frac{B(\gamma)N(\gamma)B_{norm}(\tau)}{c(\gamma)\,(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}\,c(\gamma')}\, M^{fact}_{Id_{U_{\gamma}}}(v^{-\gamma}(v^{-\gamma})^T)$$
$$+ \sum_{\substack{\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\},\ \gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}:\\ \gamma \text{ or } \gamma' \text{ is non-trivial}}} \frac{B(\gamma')N(\gamma')B_{norm}(\tau)}{c(\gamma')\,(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}\,c(\gamma)}\, M^{fact}_{Id_{U_{\gamma'}}}(w^{-\gamma'}(w^{-\gamma'})^T)$$
For the first sum, separating the terms with $\gamma = Id_{U_{\tau}}$ (for which $v^{-\gamma} = v$) and using $\sum_{\gamma' \in \Gamma_{*,V_{\tau}}} \frac{1}{|Aut(U_{\gamma'})|c(\gamma')} \leq \varepsilon'$,
$$\sum_{\gamma,\gamma'} \frac{B(\gamma)N(\gamma)B_{norm}(\tau)}{c(\gamma)\,(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}\,c(\gamma')}\, M^{fact}_{Id_{U_{\gamma}}}(v^{-\gamma}(v^{-\gamma})^T) \preceq \varepsilon' B_{norm}(\tau) M^{fact}_{Id_{U_{\tau}}}(vv^T) + \sum_{\gamma \in \Gamma_{*,U_{\tau}}} \frac{B(\gamma)N(\gamma)B_{norm}(\tau)}{c(\gamma)|Aut(U_{\gamma})|}\, M^{fact}_{Id_{U_{\gamma}}}(v^{-\gamma}(v^{-\gamma})^T)$$
Following similar logic,
$$\sum_{\gamma,\gamma'} \frac{B(\gamma')N(\gamma')B_{norm}(\tau)}{c(\gamma')\,(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}\,c(\gamma)}\, M^{fact}_{Id_{U_{\gamma'}}}(w^{-\gamma'}(w^{-\gamma'})^T) \preceq \varepsilon' B_{norm}(\tau) M^{fact}_{Id_{V_{\tau}}}(ww^T) + \sum_{\gamma' \in \Gamma_{*,V_{\tau}}} \frac{B(\gamma')N(\gamma')B_{norm}(\tau)}{c(\gamma')|Aut(U_{\gamma'})|}\, M^{fact}_{Id_{U_{\gamma'}}}(w^{-\gamma'}(w^{-\gamma'})^T)$$
Putting everything together gives the bound in the lemma statement, as needed.

Using Lemma 8.21 we have the following corollaries:

Corollary 8.22.
For all $U \in \mathcal{I}_{mid}$, if the norm bounds and the conditions on $B(\gamma)$, $N(\gamma)$, and $c(\gamma)$ hold and $H_{Id_U} \succeq 0$ then
$$M^{fact}_{Id_U}(H_{Id_U}) - M^{orth}_{Id_U}(H_{Id_U}) \preceq \varepsilon' M^{fact}_{Id_U}(H_{Id_U}) + \sum_{\gamma \in \Gamma_{*,U}} \frac{B(\gamma)N(\gamma)}{c(\gamma)|Aut(U_{\gamma})|}\, M^{fact}_{Id_{U_{\gamma}}}(H^{-\gamma,\gamma}_{Id_U})$$

Corollary 8.23.
For all $U \in \mathcal{I}_{mid}$ and all $\tau \in \mathcal{M}_U$, if the norm bounds and the conditions on $B(\gamma)$, $N(\gamma)$, and $c(\gamma)$ hold and
$$\begin{bmatrix} \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} & B_{norm}(\tau)H_{\tau} \\ B_{norm}(\tau)H_{\tau}^T & \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} \end{bmatrix} \succeq 0$$
then
$$\left(M^{fact}_{\tau}(H_{\tau}) + M^{fact}_{\tau^T}(H_{\tau}^T)\right) - \left(M^{orth}_{\tau}(H_{\tau}) + M^{orth}_{\tau^T}(H_{\tau}^T)\right) \preceq \frac{\varepsilon'}{|Aut(U)|c(\tau)}\, M^{fact}_{Id_U}(H_{Id_U}) + \sum_{\gamma \in \Gamma_{*,U}} \frac{B(\gamma)N(\gamma)}{c(\gamma)|Aut(U_{\gamma})| \cdot |Aut(U)|c(\tau)}\, M^{fact}_{Id_{U_{\gamma}}}(H^{-\gamma,\gamma}_{Id_U})$$

We now prove the following theorem, which is a slight modification of Theorem 8.1 and which implies Theorem 8.1.
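The block-matrix hypotheses used here (and in condition 2 of the theorem below) rest on a generic linear-algebra fact: if the block matrix $\begin{bmatrix} A & B \\ B^T & D \end{bmatrix}$ is PSD, then $B + B^T \preceq A + D$, which follows by evaluating the quadratic form at $[x; -x]$. A minimal numeric sketch of this fact (the Gram construction and dimensions are arbitrary choices for illustration, not the paper's matrices):

```python
import random

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def quad(M, x):  # x^T M x
    return sum(xi * yi for xi, yi in zip(x, matvec(M, x)))

random.seed(0)
n = 2
# Build a 2n x 2n PSD matrix as a Gram matrix M = C^T C, then read off blocks.
C = [[random.uniform(-1, 1) for _ in range(2 * n)] for _ in range(2 * n)]
M = [[sum(C[k][i] * C[k][j] for k in range(2 * n)) for j in range(2 * n)]
     for i in range(2 * n)]
A = [row[:n] for row in M[:n]]        # top-left block
B = [row[n:] for row in M[:n]]        # top-right block
D = [row[n:] for row in M[n:]]        # bottom-right block
Bt = [[B[j][i] for j in range(n)] for i in range(n)]

# For y = [x; -x]: 0 <= y^T M y = x^T (A + D - B - B^T) x, i.e. B + B^T <= A + D.
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(n)]
    y = x + [-xi for xi in x]
    assert quad(M, y) >= -1e-9
    assert quad(B, x) + quad(Bt, x) <= quad(A, x) + quad(D, x) + 1e-9
```

In Corollary 8.23 the diagonal blocks are both $\frac{1}{|Aut(U)|c(\tau)}H_{Id_U}$, so the PSD hypothesis forces the off-diagonal term $B_{norm}(\tau)(H_{\tau} + H_{\tau}^T)$ to be dominated by the diagonal.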
Theorem 8.24.
For all $\varepsilon > 0$ and all $\varepsilon' \in (0, 1]$, for any moment matrix
$$\Lambda = \sum_{U \in \mathcal{I}_{mid}} M^{orth}_{Id_U}(H_{Id_U}) + \sum_{U \in \mathcal{I}_{mid}} \sum_{\tau \in \mathcal{M}_U} M^{orth}_{\tau}(H_{\tau}),$$
if we have that for all $\alpha \in \mathcal{M}'$, $\|M_{\alpha}\| \leq B_{norm}(\alpha)$ and $B(\gamma)$, $N(\gamma)$, and $c(\alpha)$ are functions such that:
1. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}}$, $\gamma' \in \Gamma_{*,V_{\tau}}$, and all intersection patterns $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$, $B_{norm}(\tau_P) \leq B(\gamma)B(\gamma')B_{norm}(\tau)$.
2. For all composable $\gamma_1, \gamma_2$, $B(\gamma_1)B(\gamma_2) = B(\gamma_1 \circ \gamma_2)$.
3. $\forall U \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{U,*}} \frac{1}{|Aut(U)|c(\gamma)} < \varepsilon'$
4. $\forall V \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{*,V}} \frac{1}{|Aut(U_{\gamma})|c(\gamma)} < \varepsilon'$
5. $\forall U \in \mathcal{I}_{mid}$, $\sum_{\tau \in \mathcal{M}_U} \frac{1}{|Aut(U)|c(\tau)} < \varepsilon'$
6. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, and $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$,
$$\sum_{j > 0} \sum_{\gamma_1,\gamma_1',\cdots,\gamma_j,\gamma_j' \in \Gamma_{\gamma,\gamma',j}} \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|} \leq \frac{N(\gamma)N(\gamma')}{(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}}$$
and we have SOS-symmetric coefficient matrices $\{H'_{\gamma} : \gamma \in \Gamma\}$ such that the following conditions hold:
1. For all $U \in \mathcal{I}_{mid}$, $H_{Id_U} \succeq 0$.
2. For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} & B_{norm}(\tau)H_{\tau} \\ B_{norm}(\tau)H_{\tau}^T & \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} \end{bmatrix} \succeq 0$$
3. For all $U, V \in \mathcal{I}_{mid}$ where $w(U) > w(V)$ and all $\gamma \in \Gamma_{U,V}$, $c(\gamma)N(\gamma)B(\gamma)H^{-\gamma,\gamma}_{Id_V} \preceq H'_{\gamma}$,
then
$$\Lambda \succeq \left(\sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U})\right) - \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_U}(H'_{\gamma}, H_{Id_U})}{|Aut(U)|c(\gamma)}\right) Id_{sym}$$
If it is also true that
$$\sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) \succeq \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_U}(H'_{\gamma}, H_{Id_U})}{|Aut(U)|c(\gamma)}\right) Id_{sym}$$
then $\Lambda \succeq 0$.

Proof. We make the following observations:
1. By Theorem 8.2,
$$\sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) + \sum_{U \in \mathcal{I}_{mid}} \sum_{\tau \in \mathcal{M}_U} M^{fact}_{\tau}(H_{\tau}) \succeq (1 - 2\varepsilon') \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U})$$
2. By Corollary 8.22,
$$\sum_{U \in \mathcal{I}_{mid}} \left(M^{fact}_{Id_U}(H_{Id_U}) - M^{orth}_{Id_U}(H_{Id_U})\right) \preceq \varepsilon' \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) + \sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{*,U}} \frac{M^{fact}_{Id_{U_{\gamma}}}(H'_{\gamma})}{c(\gamma)|Aut(U_{\gamma})|}$$
3. By Corollary 8.23,
$$\sum_{U \in \mathcal{I}_{mid}} \sum_{\tau \in \mathcal{M}_U} \left(M^{fact}_{\tau}(H_{\tau}) - M^{orth}_{\tau}(H_{\tau})\right) \preceq \sum_{U \in \mathcal{I}_{mid}} \sum_{\tau \in \mathcal{M}_U} \left(\frac{\varepsilon'}{|Aut(U)|c(\tau)}\, M^{fact}_{Id_U}(H_{Id_U}) + \sum_{\gamma \in \Gamma_{*,U}} \frac{B(\gamma)N(\gamma)}{c(\gamma)|Aut(U_{\gamma})| \cdot |Aut(U)|c(\tau)}\, M^{fact}_{Id_{U_{\gamma}}}(H^{-\gamma,\gamma}_{Id_U})\right)$$
$$\preceq \varepsilon' \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) + \varepsilon' \sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{*,U}} \frac{M^{fact}_{Id_{U_{\gamma}}}(H'_{\gamma})}{c(\gamma)|Aut(U_{\gamma})|}$$
Also,
$$\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{*,U}} \frac{M^{fact}_{Id_{U_{\gamma}}}(H'_{\gamma})}{c(\gamma)|Aut(U_{\gamma})|} = \sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{*,U}} \frac{M^{fact}_{Id_{U_{\gamma}}}(H_{Id_{U_{\gamma}}}) + \left(M^{fact}_{Id_{U_{\gamma}}}(H'_{\gamma}) - M^{fact}_{Id_{U_{\gamma}}}(H_{Id_{U_{\gamma}}})\right)}{c(\gamma)|Aut(U_{\gamma})|}$$
$$\preceq \sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{*,U}} \frac{M^{fact}_{Id_{U_{\gamma}}}(H_{Id_{U_{\gamma}}})}{c(\gamma)|Aut(U_{\gamma})|} + \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_{U_{\gamma}}}(H'_{\gamma}, H_{Id_{U_{\gamma}}})}{|Aut(U_{\gamma})|c(\gamma)}\right) Id_{sym} \preceq \varepsilon' \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) + \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_{U_{\gamma}}}(H'_{\gamma}, H_{Id_{U_{\gamma}}})}{|Aut(U_{\gamma})|c(\gamma)}\right) Id_{sym}$$
Putting these observations together,
$$\Lambda = \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) + \sum_{U \in \mathcal{I}_{mid}} \sum_{\tau \in \mathcal{M}_U} M^{fact}_{\tau}(H_{\tau}) - \sum_{U \in \mathcal{I}_{mid}} \left(M^{fact}_{Id_U}(H_{Id_U}) - M^{orth}_{Id_U}(H_{Id_U})\right) - \sum_{U \in \mathcal{I}_{mid}} \sum_{\tau \in \mathcal{M}_U} \left(M^{fact}_{\tau}(H_{\tau}) - M^{orth}_{\tau}(H_{\tau})\right)$$
$$\succeq (1 - 2\varepsilon' - 2\varepsilon') \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) - (1 + \varepsilon') \sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{*,U}} \frac{M^{fact}_{Id_{U_{\gamma}}}(H'_{\gamma})}{c(\gamma)|Aut(U_{\gamma})|}$$
$$\succeq (1 - 2\varepsilon' - 2\varepsilon') \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) - (1 + \varepsilon')\left(\varepsilon' \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) + \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_{U_{\gamma}}}(H'_{\gamma}, H_{Id_{U_{\gamma}}})}{|Aut(U_{\gamma})|c(\gamma)}\right) Id_{sym}\right)$$
$$\succeq \sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) - \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_{U_{\gamma}}}(H'_{\gamma}, H_{Id_{U_{\gamma}}})}{|Aut(U_{\gamma})|c(\gamma)}\right) Id_{sym}$$

Choosing $B_{norm}(\alpha)$, $B(\gamma)$, $N(\gamma)$, and $c(\alpha)$. In this subsection, we give functions $B_{norm}(\alpha)$, $B(\gamma)$, $N(\gamma)$, and $c(\alpha)$ which satisfy the conditions needed for our machinery. Recall the following definitions from Section 7.10.
Definition 9.1.
We define $S_{\alpha}$ to be the leftmost minimum vertex separator of $\alpha$.

Definition 9.2 (Simplified Isolated Vertices). Under our simplifying assumptions, we define $I_{\alpha} = \{v \in W_{\alpha} : v$ is not incident to any edges in $E(\alpha)\}$.

Theorem 9.3 (Simplified $B_{norm}(\alpha)$, $B(\gamma)$, $N(\gamma)$, and $c(\alpha)$). Under our simplifying assumptions, for all $\varepsilon, \varepsilon' > 0$ and all $D_V \in \mathbb{N}$, if we take
1. $q = \left\lceil D_V\ln(n) + \ln(1/\varepsilon) + D_V^2\ln(2) + D_V\ln(5) \right\rceil$
2. $B_{vertex} = D_V\sqrt{eq}$
3. $B_{norm}(\alpha) = B_{vertex}^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\, n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha})}{2}}$
4. $B(\gamma) = B_{vertex}^{|V(\gamma)\setminus U_{\gamma}| + |V(\gamma)\setminus V_{\gamma}|}\, n^{\frac{w(V(\gamma)\setminus U_{\gamma})}{2}}$
5. $N(\gamma) = (D_V)^{|V(\gamma)\setminus V_{\gamma}| + |V(\gamma)\setminus U_{\gamma}|}$
6. $c(\alpha) = \frac{(D_V)^{|U_{\alpha}\setminus V_{\alpha}| + |V_{\alpha}\setminus U_{\alpha}| + |E(\alpha)|}\, 2^{|V(\alpha)\setminus(U_{\alpha}\cup V_{\alpha})|}}{\varepsilon'}$
then the following conditions hold:
1. With probability at least $(1 - \varepsilon)$, $\forall \alpha \in \mathcal{M}'$, $\|M_{\alpha}\| \leq B_{norm}(\alpha)$.
2. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$, and intersection patterns $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$, $B_{norm}(\tau_P) \leq B(\gamma)B(\gamma')B_{norm}(\tau)$.
3. For all composable $\gamma_1, \gamma_2$, $B(\gamma_1)B(\gamma_2) = B(\gamma_1 \circ \gamma_2)$.
4. $\forall U \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{U,*}} \frac{1}{|Aut(U)|c(\gamma)} < \varepsilon'$
5. $\forall V \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{*,V}} \frac{1}{|Aut(U_{\gamma})|c(\gamma)} < \varepsilon'$
6. $\forall U \in \mathcal{I}_{mid}$, $\sum_{\tau \in \mathcal{M}_U} \frac{1}{|Aut(U)|c(\tau)} < \varepsilon'$
7. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, and $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$,
$$\sum_{j > 0} \sum_{\gamma_1,\gamma_1',\cdots,\gamma_j,\gamma_j' \in \Gamma_{\gamma,\gamma',j}} \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|} \leq \frac{N(\gamma)N(\gamma')}{(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}}$$

General $B_{norm}(\alpha)$, $B(\gamma)$, $N(\gamma)$, and $c(\alpha)$. Recall the following definitions from Section 7.10.1.
Definition 9.4 ($S_{\alpha,min}$ and $S_{\alpha,max}$). Given a shape $\alpha \in \mathcal{M}'$, define $S_{\alpha,min}$ to be the leftmost minimum vertex separator of $\alpha$ if all edges with multiplicity at least 2 are deleted and define $S_{\alpha,max}$ to be the leftmost minimum vertex separator of $\alpha$ if all edges with multiplicity at least 2 are present.

Definition 9.5 (General $I_{\alpha}$). Given a shape $\alpha$, define $I_{\alpha}$ to be the set of vertices in $V(\alpha)\setminus(U_{\alpha}\cup V_{\alpha})$ such that all edges incident with that vertex have multiplicity at least 2.

Definition 9.6 ($B_{\Omega}$). We take $B_{\Omega}(j)$ to be a non-decreasing function such that for all $j \in \mathbb{N}$, $E_{\Omega}[x^j] \leq (B_{\Omega}(j))^j$.

Definition 9.7.
For all $i$, we define $h^+_i$ to be the polynomial $h_i$ where we make all of the coefficients have positive sign.

Lemma 9.8. If Ω = N(
0, 1) then we can take $B_{\Omega}(j) = \sqrt{j}$.

Theorem 9.9 (General $B_{norm}(\alpha)$, $B(\gamma)$, $N(\gamma)$, and $c(\alpha)$). For all $\varepsilon, \varepsilon' > 0$ and all $D_V, D_E \in \mathbb{N}$, if we take
1. $q = \left\lceil D_V\ln(n) + \ln(1/\varepsilon) + (D_V)^k\ln(D_E + 1) + D_V\ln(5) \right\rceil$
2. $B_{vertex} = qD_V$
3. $B_{edge}(e) = h^+_{l_e}(B_{\Omega}(2D_VD_E))\, \max_{j \in [2D_VD_E]}\left\{\left(h^+_j(B_{\Omega}(qj))\right)^{\frac{l_e}{\max\{j, l_e\}}}\right\}$
4. $B_{norm}(\alpha) = e\,B_{vertex}^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\left(\prod_{e \in E(\alpha)} B_{edge}(e)\right) n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha,min})}{2}}$
5. $B(\gamma) = B_{vertex}^{|V(\gamma)\setminus U_{\gamma}| + |V(\gamma)\setminus V_{\gamma}|}\left(\prod_{e \in E(\gamma)} B_{edge}(e)\right) n^{\frac{w(V(\gamma)\setminus U_{\gamma})}{2}}$
6. $N(\gamma) = (D_V)^{|V(\gamma)\setminus V_{\gamma}| + |V(\gamma)\setminus U_{\gamma}|}$
7. $c(\alpha) = \frac{(t_{max}D_V)^{|U_{\alpha}\setminus V_{\alpha}| + |V_{\alpha}\setminus U_{\alpha}| + k|E(\alpha)|}\,(t_{max})^{|V(\alpha)\setminus(U_{\alpha}\cup V_{\alpha})|}}{\varepsilon'}$
then the following conditions hold:
1. With probability at least $(1 - \varepsilon)$, $\forall \alpha \in \mathcal{M}'$, $\|M_{\alpha}\| \leq B_{norm}(\alpha)$.
2. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$, and intersection patterns $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$, $B_{norm}(\tau_P) \leq B(\gamma)B(\gamma')B_{norm}(\tau)$.
3. For all composable $\gamma_1, \gamma_2$, $B(\gamma_1)B(\gamma_2) = B(\gamma_1 \circ \gamma_2)$.
4. $\forall U \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{U,*}} \frac{1}{|Aut(U)|c(\gamma)} < \varepsilon'$
5. $\forall V \in \mathcal{I}_{mid}$, $\sum_{\gamma \in \Gamma_{*,V}} \frac{1}{|Aut(U_{\gamma})|c(\gamma)} < \varepsilon'$
6. $\forall U \in \mathcal{I}_{mid}$, $\sum_{\tau \in \mathcal{M}_U} \frac{1}{|Aut(U)|c(\tau)} < \varepsilon'$
7. For all $\tau \in \mathcal{M}'$, $\gamma \in \Gamma_{*,U_{\tau}} \cup \{Id_{U_{\tau}}\}$, and $\gamma' \in \Gamma_{*,V_{\tau}} \cup \{Id_{V_{\tau}}\}$,
$$\sum_{j > 0} \sum_{\gamma_1,\gamma_1',\cdots,\gamma_j,\gamma_j' \in \Gamma_{\gamma,\gamma',j}} \frac{\sum_{P_1,\cdots,P_j:\, P_i \in \mathcal{P}_{\gamma_i,\tau_{P_{i-1}},\gamma_i'^T}} \prod_{i=1}^{j} N(P_i)}{\prod_{i:\, \gamma_i \text{ is non-trivial}} |Aut(U_{\gamma_i})| \prod_{i:\, \gamma_i' \text{ is non-trivial}} |Aut(U_{\gamma_i'})|} \leq \frac{N(\gamma)N(\gamma')}{(|Aut(U_{\gamma})|)^{1_{\gamma \text{ is non-trivial}}}\,(|Aut(U_{\gamma'})|)^{1_{\gamma' \text{ is non-trivial}}}}$$

Remark 9.10.
Recall that if Ω = N (
0, 1) then we may take $B_{\Omega}(j) = \sqrt{j}$ and we have that
$$h^+_j(x) \leq \frac{(x+j)^j}{\sqrt{j!}} \leq \left(\sqrt{\frac{e}{j}}\,(x+j)\right)^j$$
Thus, when Ω = N(
0, 1) we can take
$$B_{edge}(e) = \left(\sqrt{\frac{e}{l_e}}\,(D_VD_E + l_e)\right)^{l_e}\left(e\,(D_VD_Eq + 1)\right)^{l_e} \leq \left(2D_VD_Eq\right)^{2l_e}$$

Choosing $B_{norm}(\alpha)$. We need matrix norm bounds which hold for all $\alpha \in \mathcal{M}'$. For convenience, we recall the definition of $\mathcal{M}'$ below.

Definition 9.11 ($\mathcal{M}'$). We define $\mathcal{M}'$ to be the set of all shapes $\alpha$ such that:
1. $|V(\alpha)| \leq D_V$
2. $\forall e \in E(\alpha)$, $l_e \leq D_E$
3. All edges $e \in E(\alpha)$ have multiplicity at most $D_V$.
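The choice $B_{\Omega}(j) = \sqrt{j}$ for the Gaussian case is easy to sanity-check numerically: the moments of $N(0,1)$ are 0 for odd $j$ and $(j-1)!!$ for even $j$, and $(j-1)!! \leq j^{j/2}$. A quick check (the range of $j$ is an arbitrary choice):

```python
import math

def gaussian_moment(j):
    # E[x^j] for x ~ N(0,1): 0 for odd j, (j-1)!! = 1*3*...*(j-1) for even j.
    if j % 2 == 1:
        return 0
    return math.prod(range(1, j, 2))

# Verify E[x^j] <= B_Omega(j)^j with B_Omega(j) = sqrt(j).
for j in range(1, 31):
    assert gaussian_moment(j) <= math.sqrt(j) ** j
```

Since $B_{\Omega}$ must also be non-decreasing, $\sqrt{j}$ qualifies on both counts.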
To obtain such norm bounds, we start with the norm bounds in the graph matrix norm bound paper. We then modify these bounds as follows:
1. We make the bounds more compatible with the conditions of our machinery. To do this, we upper bound many of the terms in the norm bound by $B_{vertex}^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}$ where $B_{vertex}$ is a function of our parameters. In general, we will also need to upper bound some of the terms by $\prod_{e \in E(\alpha)} B_{edge}(e)$ where $B_{edge}(e)$ is a function of $l_e$, $\Omega$, and our parameters.
2. We generalize the bounds so that they apply to improper shapes as well as proper shapes. Under our simplifying assumptions, all we need to do here is to take isolated vertices into account. In general, we also need to handle multi-edges.

Simplified $B_{norm}(\alpha)$. Under our simplifying assumptions, we start with the following norm bound from the updated graph matrix norm bound paper [AMP20]:
Theorem 9.12 (Simplified Graph Matrix Norm Bounds). Under our simplifying assumptions, for all $\varepsilon > 0$ and all proper shapes $\alpha$, taking $c_{\alpha} = |V(\alpha)\setminus(U_{\alpha}\cup V_{\alpha})| + |S_{\alpha}\setminus(U_{\alpha}\cap V_{\alpha})|$,
$$\Pr\left(\|M_{\alpha}\| > \left(|V(\alpha)\setminus(U_{\alpha}\cap V_{\alpha})|\right)^{|V(\alpha)\setminus(U_{\alpha}\cap V_{\alpha})|}(eq)^{\frac{c_{\alpha}}{2}}\, n^{\frac{w(V(\alpha)) - w(S_{\alpha})}{2}}\right) < \varepsilon$$
where $q = \left\lceil \frac{\ln(n^{w(S_{\alpha})}/\varepsilon)}{c_{\alpha}} \right\rceil$.

Corollary 9.13.
For all shapes $\alpha$ and all $\varepsilon > 0$,
$$\Pr\left(\|M_{\alpha}\| > \left(|V(\alpha)|\sqrt{eq}\right)^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\, n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha})}{2}}\right) < \varepsilon$$
where $q = \left\lceil \frac{\ln(n^{w(S_{\alpha})}/\varepsilon)}{c_{\alpha}} \right\rceil$.

Proof. Observe that adding an isolated vertex to $\alpha$ is equivalent to multiplying $M_{\alpha}$ by $n - |V(\alpha)|$. Thus, if the bound holds for all proper $\alpha$ then it will hold for improper $\alpha$ as well.

We now make the following observations:
1. $|S_{\alpha}\setminus(U_{\alpha}\cap V_{\alpha})| \leq |U_{\alpha}\setminus V_{\alpha}|$, so $c_{\alpha} = |W_{\alpha}| + |S_{\alpha}\setminus(U_{\alpha}\cap V_{\alpha})| \leq |V(\alpha)\setminus V_{\alpha}|$. Similarly, $|S_{\alpha}\setminus(U_{\alpha}\cap V_{\alpha})| \leq |V_{\alpha}\setminus U_{\alpha}|$, so $c_{\alpha} \leq |V(\alpha)\setminus U_{\alpha}|$. Thus, $c_{\alpha} \leq |V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|$.
2. $|V(\alpha)\setminus(U_{\alpha}\cap V_{\alpha})| \leq |V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|$
Thus, by Theorem 9.12, for all proper shapes $\alpha$ and all $\varepsilon'' > 0$,
$$\Pr\left(\|M_{\alpha}\| > \left(|V(\alpha)|\sqrt{eq}\right)^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\, n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha})}{2}}\right) < \varepsilon''$$
where $q = \left\lceil \frac{\ln(n^{w(S_{\alpha})}/\varepsilon'')}{c_{\alpha}} \right\rceil$.

Corollary 9.14. For all $z \in \mathbb{N}$ and all $\varepsilon > 0$, taking $\varepsilon'' = \frac{\varepsilon}{5^z 2^{z^2}}$, with probability at least $1 - \varepsilon$ we have that for all shapes $\alpha$ such that $|V(\alpha)| \leq z$,
$$\|M_{\alpha}\| \leq \left(|V(\alpha)|\sqrt{eq}\right)^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\, n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha})}{2}}$$
where $q = \left\lceil \frac{\ln(n^{w(S_{\alpha})}/\varepsilon'')}{c_{\alpha}} \right\rceil$.

Proof. This result can be proved from Corollary 9.13 using a union bound and the following proposition:
Proposition 9.15.
Under our simplifying assumptions, for all $z \in \mathbb{N}$, there are at most $5^z 2^{z^2}$ proper shapes $\alpha$ such that $|V(\alpha)| \leq z$.

Proof. Observe that we can construct any proper shape $\alpha$ with at most $z$ vertices as follows:
1. Start with $z$ vertices $v_1, \ldots, v_z$.
2. For each vertex $v_i$, choose whether $v_i \in V(\alpha)\setminus U_{\alpha}\setminus V_{\alpha}$, $v_i \in U_{\alpha}\setminus V_{\alpha}$, $v_i \in V_{\alpha}\setminus U_{\alpha}$, $v_i \in U_{\alpha}\cap V_{\alpha}$, or $v_i \notin V(\alpha)$ (5 choices per vertex).
3. For each pair of vertices $v_i, v_j \in V(\alpha)$, choose whether or not $(v_i, v_j) \in E(\alpha)$ (2 choices per pair).

Corollary 9.16.
For all $D_V \in \mathbb{N}$ and all $\varepsilon > 0$, taking
$$q = \left\lceil \ln\left(\frac{5^{D_V} 2^{D_V^2} n^{D_V}}{\varepsilon}\right) \right\rceil = \left\lceil D_V\ln(n) + \ln(1/\varepsilon) + D_V^2\ln(2) + D_V\ln(5) \right\rceil,$$
$B_{vertex} = D_V\sqrt{eq}$, and $B_{norm}(\alpha) = B_{vertex}^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\, n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha})}{2}}$, with probability at least $(1 - \varepsilon)$ we have that for all shapes $\alpha \in \mathcal{M}'$, $\|M_{\alpha}\| \leq B_{norm}(\alpha)$.

Proof.
This follows from Corollary 9.14 and the fact that for all $\alpha \in \mathcal{M}'$, $w(S_{\alpha}) \leq |V(\alpha)| \leq D_V$.

General $B_{norm}(\alpha)$. In general, we start with the following norm bound from the updated graph matrix norm bound paper [AMP20]:
Theorem 9.17 (General Graph Matrix Norm Bounds). For all $\varepsilon > 0$ and all proper shapes $\alpha$, taking $q = \lceil \ln(n^{w(S_{\alpha})}/\varepsilon) \rceil$,
$$\Pr\left(\|M_{\alpha}\| > e(q|V(\alpha)|)^{|V(\alpha)\setminus(U_{\alpha}\cap V_{\alpha})|}\left(\prod_{e \in E(\alpha)} h^+_{l_e}(B_{\Omega}(ql_e))\right) n^{\frac{w(V(\alpha)) - w(S_{\alpha})}{2}}\right) < \varepsilon$$

Corollary 9.18. For all $\varepsilon > 0$ and all $z, l_{max}, m \in \mathbb{N}$, taking $\varepsilon'' = \frac{\varepsilon}{5^z(l_{max}+1)^{z^k}}$, with probability at least $1 - \varepsilon$, for all shapes $\alpha$ such that
1. $|V(\alpha)| \leq z$,
2. all edges in $E(\alpha)$ have label at most $l_{max}$, and
3. all edges in $E(\alpha)$ have multiplicity at most $m$,
we have
$$\|M_{\alpha}\| \leq e(q|V(\alpha)|)^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\left(\prod_{e \in E(\alpha)} h^+_{l_e}(B_{\Omega}(2ml_{max}))\, \max_{j \in [2ml_{max}]}\left\{\left(h^+_j(B_{\Omega}(qj))\right)^{\frac{l_e}{\max\{j, l_e\}}}\right\}\right) n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha,min})}{2}}$$
where $q = \left\lceil \ln\left(n^{w(S_{\alpha,max})}/\varepsilon''\right) \right\rceil$.

Proof.
Observe that for each $\alpha$ which has multi-edges, we can write $M_{\alpha} = \sum_i c_i M_{\alpha_i}$ where each $\alpha_i$ has no multi-edges. We first upper bound $\sum_i |c_i|$.

Lemma 9.19.
For any $a_1, \ldots, a_m \in \mathbb{N} \cup \{0\}$, taking $p_{max} = \sum_{i=1}^m a_i$ and writing $\prod_{i=1}^m h_{a_i} = \sum_{k=0}^{p_{max}} c_k h_k$,
$$\sum_{k=0}^{p_{max}} |c_k| \leq \sqrt{p_{max}+1}\, \prod_{i=1}^m h^+_{a_i}(B_{\Omega}(2p_{max})) \leq 2^{p_{max}} \prod_{i=1}^m h^+_{a_i}(B_{\Omega}(2p_{max}))$$

Proof.
Suppose $\prod_{i=1}^m (h_{a_i}(x))^2 = \sum_{k=0}^{2p_{max}} u_k x^k$ and $\prod_{i=1}^m (h^+_{a_i}(x))^2 = \sum_{k=0}^{2p_{max}} v_k x^k$. Then, note that $|u_k| \leq v_k$ and so
$$E_{\Omega}\left[\prod_{i=1}^m (h_{a_i}(x))^2\right] = \sum_{k=0}^{2p_{max}} u_k E_{\Omega}[x^k] \leq \sum_{k=0}^{2p_{max}} v_k \left|E_{\Omega}[x^k]\right| \leq \sum_{k=0}^{2p_{max}} v_k (B_{\Omega}(2p_{max}))^k = \prod_{i=1}^m \left(h^+_{a_i}(B_{\Omega}(2p_{max}))\right)^2$$
Therefore, using the fact that the $h_k$ form an orthonormal basis,
$$\sum_{k=0}^{p_{max}} c_k^2 = E_{\Omega}\left[\left(\sum_{k=0}^{p_{max}} c_k h_k(x)\right)^2\right] = E_{\Omega}\left[\prod_{i=1}^m (h_{a_i}(x))^2\right] \leq \prod_{i=1}^m \left(h^+_{a_i}(B_{\Omega}(2p_{max}))\right)^2$$
This implies
$$\left(\sum_{k=0}^{p_{max}} |c_k|\right)^2 \leq (p_{max}+1)\left(\sum_{k=0}^{p_{max}} c_k^2\right) \leq (p_{max}+1)\prod_{i=1}^m \left(h^+_{a_i}(B_{\Omega}(2p_{max}))\right)^2$$
Taking square roots gives the inequality.
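The step $|u_k| \leq v_k$ above, coefficient domination for products of Hermite polynomials by their positive-coefficient versions $h^+$, can be checked directly. A self-contained sketch using the probabilists' Hermite recurrence $He_{n+1}(x) = x\,He_n(x) - n\,He_{n-1}(x)$ and $h_n = He_n/\sqrt{n!}$ (the degrees chosen are arbitrary):

```python
import math

def hermite_coeffs(n):
    # Coefficient list (lowest degree first) of He_n via the recurrence.
    a, b = [1.0], [0.0, 1.0]           # He_0, He_1
    if n == 0:
        return a
    for k in range(1, n):
        c = [0.0] + b                   # x * He_k
        for i, coef in enumerate(a):
            c[i] -= k * coef            # - k * He_{k-1}
        a, b = b, c
    return b

def h(n):                               # orthonormal h_n = He_n / sqrt(n!)
    s = math.sqrt(math.factorial(n))
    return [c / s for c in hermite_coeffs(n)]

def polymul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# Coefficients of h_{a1}*...*h_{am} are dominated in absolute value by
# the coefficients of h+_{a1}*...*h+_{am}.
degrees = [3, 2, 4]
prod, prod_plus = [1.0], [1.0]
for a in degrees:
    prod = polymul(prod, h(a))
    prod_plus = polymul(prod_plus, [abs(c) for c in h(a)])
assert all(abs(u) <= v + 1e-9 for u, v in zip(prod, prod_plus))
```

The domination follows from the triangle inequality applied coefficientwise when multiplying out the product, which is exactly how the proof uses it.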
Corollary 9.20.
For any shape $\alpha$ such that every edge of $\alpha$ has multiplicity at most $m$ and label at most $l_{max}$, if we write $M_{\alpha} = \sum_i c_i M_{\alpha_i}$ where each $\alpha_i$ has no multi-edges then
$$\sum_i |c_i| \leq \prod_{e \in E(\alpha)} h^+_{l_e}(B_{\Omega}(2ml_{max}))$$
The result now follows from Theorem 9.17 and the following observations:
1. $|V(\alpha)\setminus(U_{\alpha}\cap V_{\alpha})| \leq |V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|$.
2. For any $\alpha$, writing $M_{\alpha} = \sum_i c_i M_{\alpha_i}$ where each $\alpha_i$ has no multi-edges, for all $\alpha_i$,
$$w(V(\alpha_i)) + w(I_{\alpha_i}) - w(S_{\alpha_i}) \leq w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha,min})$$
3. For any $a_1, \ldots, a_m \in \mathbb{N} \cup \{0\}$ such that $\forall i' \in [m]$, $a_{i'} \leq l_{max}$, for all $j \in [2ml_{max}]$,
$$h^+_j(B_{\Omega}(qj)) \leq \prod_{i'=1}^m \left(h^+_j(B_{\Omega}(qj))\right)^{\frac{a_{i'}}{\max\{j, a_{i'}\}}} \leq \prod_{i'=1}^m \max_{j' \in [2ml_{max}]}\left\{\left(h^+_{j'}(B_{\Omega}(qj'))\right)^{\frac{a_{i'}}{\max\{j', a_{i'}\}}}\right\}$$

Proposition 9.21.
For all $z, l_{max} \in \mathbb{N}$, there are at most $5^z(l_{max}+1)^{z^k}$ proper shapes $\alpha$ such that $|V(\alpha)| \leq z$ and every edge in $E(\alpha)$ has label at most $l_{max}$.

Proof. This can be proved in the same way as before. Observe that we can construct any proper shape $\alpha$ with at most $z$ vertices as follows:
1. Start with $z$ vertices $v_1, \ldots, v_z$.
2. For each vertex $v_i$, choose whether $v_i \in V(\alpha)\setminus U_{\alpha}\setminus V_{\alpha}$, $v_i \in U_{\alpha}\setminus V_{\alpha}$, $v_i \in V_{\alpha}\setminus U_{\alpha}$, $v_i \in U_{\alpha}\cap V_{\alpha}$, or $v_i \notin V(\alpha)$.
3. For each $k$-tuple of vertices in $V(\alpha)$, choose the label of the hyperedge between these vertices (or that the hyperedge is not in $E(\alpha)$).

Corollary 9.22.
For all $D_V, D_E \in \mathbb{N}$ and all $\varepsilon > 0$, taking
$$B_{norm}(\alpha) = e\,B_{vertex}^{|V(\alpha)\setminus U_{\alpha}| + |V(\alpha)\setminus V_{\alpha}|}\left(\prod_{e \in E(\alpha)} B_{edge}(e)\right) n^{\frac{w(V(\alpha)) + w(I_{\alpha}) - w(S_{\alpha,min})}{2}}$$
where
1. $q = \left\lceil \ln\left(n^{D_V}/\varepsilon''\right) \right\rceil = \left\lceil D_V\ln(n) + \ln(1/\varepsilon) + (D_V)^k\ln(D_E + 1) + D_V\ln(5) \right\rceil$
2. $B_{vertex} = qD_V$
3. $B_{edge}(e) = h^+_{l_e}(B_{\Omega}(2D_VD_E))\, \max_{j \in [2D_VD_E]}\left\{\left(h^+_j(B_{\Omega}(qj))\right)^{\frac{l_e}{\max\{j, l_e\}}}\right\}$
we have that with probability at least $(1 - \varepsilon)$, for all shapes $\alpha \in \mathcal{M}'$, $\|M_{\alpha}\| \leq B_{norm}(\alpha)$.

Choosing $B(\gamma)$. We now describe how to choose the function $B(\gamma)$. Recall that we want the following conditions to hold:
1. For all $\gamma, \tau, \gamma'$ and all intersection patterns $P \in \mathcal{P}_{\gamma,\tau,\gamma'^T}$, $B_{norm}(\tau_P) \leq B(\gamma)B(\gamma')B_{norm}(\tau)$.
2. For all composable γ_1, γ_2, B(γ_1)B(γ_2) = B(γ_1 ∘ γ_2).

The most important part of choosing B(γ) is to make sure that the factors of n are controlled. For this, we use the following intersection tradeoff lemma. Under our simplifying assumptions, this lemma follows from [BHK+16, Lemma 7.12]. We defer the general proof of this lemma to the end of this section.
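The vertex-role and hyperedge-label encoding used to count proper shapes in Proposition 9.21 above can be sanity-checked numerically. The sketch below is illustrative only: the function names are ours, and we use the crude bound 5^z(l_max+1)^{z^k} (5 roles per vertex, and for each k-subset of vertices either one of l_max labels or "no hyperedge"); the exact constant in the paper's bound may differ.

```python
from math import comb

def encoding_count(z, k, l_max):
    """Encodings from the proof sketch of Proposition 9.21: one of 5 roles per
    vertex, and for each k-subset of vertices either a label in {1, ..., l_max}
    or 'no hyperedge' (l_max + 1 options in total)."""
    return 5 ** z * (l_max + 1) ** comb(z, k)

def crude_bound(z, k, l_max):
    # z^k over-counts the number of k-subsets, giving a simpler closed form
    return 5 ** z * (l_max + 1) ** (z ** k)

assert encoding_count(3, 2, 1) == 5 ** 3 * 2 ** 3          # = 1000 encodings
assert all(encoding_count(z, 2, 3) <= crude_bound(z, 2, 3) for z in range(2, 7))
```

Since every proper shape with at most z vertices arises from at least one such encoding, the number of encodings upper-bounds the number of shapes.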
Lemma 9.23 (Intersection Tradeoff Lemma). For all γ, τ, γ′ and all intersection patterns P ∈ P_{γ,τ,γ′},

w(V(τ_P)) + w(I_{τ_P}) − w(S_{τ_P,min}) ≤ w(V(τ)) + w(I_τ) − w(S_{τ,min}) + w(V(γ) \ U_γ) + w(V(γ′) \ U_{γ′})

Based on this intersection tradeoff lemma, we can choose the function B(γ) as follows.

Corollary 9.24.
If we take

B_norm(α) = C · B_vertex^{|V(α) \ U_α| + |V(α) \ V_α|} (∏_{e ∈ E(α)} B_edge(e)) n^{w(V(α)) + w(I_α) − w(S_α)}

for some constant C > 0 and take

B(γ) = B_vertex^{|V(γ) \ U_γ| + |V(γ) \ V_γ|} (∏_{e ∈ E(γ)} B_edge(e)) n^{w(V(γ) \ U_γ)}

then the following conditions hold:

1. For all γ, τ, γ′ and all intersection patterns P ∈ P_{γ,τ,γ′}, B_norm(τ_P) ≤ B(γ)B(γ′)B_norm(τ)
2. For all composable γ_1, γ_2, B(γ_1)B(γ_2) = B(γ_1 ∘ γ_2).

Proof. We have that

B_norm(τ_P) = B_vertex^{|V(τ_P) \ U_{τ_P}| + |V(τ_P) \ V_{τ_P}|} (∏_{e ∈ E(τ_P)} B_edge(e)) n^{w(V(τ_P)) + w(I_{τ_P}) − w(S_{τ_P})}

and

B(γ)B(γ′)B_norm(τ) = B_vertex^{|V(γ) \ U_γ| + |V(γ) \ V_γ| + |V(γ′) \ U_{γ′}| + |V(γ′) \ V_{γ′}| + |V(τ) \ U_τ| + |V(τ) \ V_τ|} (∏_{e ∈ E(γ) ∪ E(γ′) ∪ E(τ)} B_edge(e)) n^{w(V(γ) \ U_γ) + w(V(γ′) \ U_{γ′}) + w(V(τ)) + w(I_τ) − w(S_τ)}

The first condition now follows immediately from the following observations:

1. |V(γ) \ U_γ| + |V(γ) \ V_γ| + |V(γ′) \ U_{γ′}| + |V(γ′) \ V_{γ′}| + |V(τ) \ U_τ| + |V(τ) \ V_τ| = |V(γ ∘ τ ∘ γ′^T) \ U_{γ ∘ τ ∘ γ′^T}| + |V(γ ∘ τ ∘ γ′^T) \ V_{γ ∘ τ ∘ γ′^T}| ≥ |V(τ_P) \ U_{τ_P}| + |V(τ_P) \ V_{τ_P}|

2. E(τ_P) = E(γ) ∪ E(τ) ∪ E(γ′^T), so ∏_{e ∈ E(τ_P)} B_edge(e) = ∏_{e ∈ E(γ) ∪ E(γ′) ∪ E(τ)} B_edge(e).

3. By the intersection tradeoff lemma, w(V(τ_P)) + w(I_{τ_P}) − w(S_{τ_P}) ≤ w(V(τ)) + w(I_τ) − w(S_τ) + w(V(γ) \ U_γ) + w(V(γ′) \ U_{γ′})

The second condition follows from the form of B(γ).

Choosing N(γ)

To choose N(γ), we use the following lemma:

Lemma 9.25.
For all D_V ∈ N, for all composable γ, τ, γ′^T such that |V(γ)| ≤ D_V, |V(τ)| ≤ D_V, and |V(γ′)| ≤ D_V,

∑_{j>0} ∑_{γ_1,γ′_1,…,γ_j,γ′_j ∈ Γ_{γ,γ′,j}} (∏_{i: γ_i is non-trivial} |Aut(U_{γ_i})|) (∏_{i: γ′_i is non-trivial} |Aut(U_{γ′_i})|) ∑_{P_1,…,P_j: P_i ∈ P_{γ_i, τ_{P_{i−1}}, γ′_i^T}} (∏_{i=1}^{j} N(P_i))
≤ (D_V)^{(|V(γ) \ V_γ| + |V(γ′) \ V_{γ′}|) + (|V(γ) \ U_γ| + |V(γ′) \ U_{γ′}|)} (|Aut(U_γ)|)^{1_{γ is non-trivial}} (|Aut(U_{γ′})|)^{1_{γ′ is non-trivial}}

Proof sketch.
Observe that aside from the orderings (which are canceled out by the |Aut(U_{γ_i})| and |Aut(U_{γ′_i})| factors), the intersection patterns {P_i : i ∈ [j]} are determined by the following data on each vertex v ∈ (V(γ) \ V_γ) ∪ (V(γ′^T) \ V_{γ′^T}):

1. The first i ∈ [j] such that v ∈ (V(γ_i) \ V_{γ_i}) ∪ (V(γ′_i^T) \ V_{γ′_i^T}). There are at most j possibilities for this.

2. A vertex u (if one exists) in V(γ_{i−1} ∘ … ∘ γ_1 ∘ τ ∘ γ′_1^T ∘ … ∘ γ′_{i−1}^T) such that u and v are equal. There are at most D_V possibilities for this.

Using these observations and taking j_max = |V(γ) \ V_γ| + |V(γ′) \ V_{γ′}|, the number of choices of γ_1, γ′_1, …, γ_j, γ′_j and P_1, …, P_j, summed over j ≤ j_max and weighted by the automorphism factors, can be bounded by

(D_V)^{|V(γ) \ V_γ| + |V(γ′) \ V_{γ′}|} (|Aut(U_γ)|)^{1_{γ is non-trivial}} (|Aut(U_{γ′})|)^{1_{γ′ is non-trivial}}

Now recall that, by the earlier bound on N(P_i), for any γ_i, τ_{P_{i−1}}, γ′_i^T and any intersection pattern P_i ∈ P_{γ_i, τ_{P_{i−1}}, γ′_i^T},

N(P_i) ≤ |V(τ_{P_i})|^{|V(γ_i) \ U_{γ_i}| + |V(γ′_i) \ U_{γ′_i}|}

Thus, for any P_1, …, P_j with P_i ∈ P_{γ_i, τ_{P_{i−1}}, γ′_i^T}, we have ∏_{i=1}^{j} N(P_i) ≤ (D_V)^{|V(γ) \ U_γ| + |V(γ′) \ U_{γ′}|}. Putting everything together, the result follows.

Corollary 9.26.
For all D_V ∈ N, if we take N(γ) = (D_V)^{|V(γ) \ V_γ| + |V(γ) \ U_γ|} then for all composable γ, τ, γ′^T such that |V(γ)| ≤ D_V, |V(τ)| ≤ D_V, and |V(γ′)| ≤ D_V,

∑_{j>0} ∑_{γ_1,γ′_1,…,γ_j,γ′_j ∈ Γ_{γ,γ′,j}} (∏_{i: γ_i is non-trivial} |Aut(U_{γ_i})|) (∏_{i: γ′_i is non-trivial} |Aut(U_{γ′_i})|) ∑_{P_1,…,P_j: P_i ∈ P_{γ_i, τ_{P_{i−1}}, γ′_i^T}} (∏_{i=1}^{j} N(P_i)) ≤ N(γ)N(γ′)(|Aut(U_γ)|)^{1_{γ is non-trivial}} (|Aut(U_{γ′})|)^{1_{γ′ is non-trivial}}

Choosing c(α)

In this section, we describe how to choose c(α). For simplicity, we first describe how to choose c(α) under our simplifying assumptions. We then describe the minor adjustments that are needed when we have hyperedges and multiple types of vertices.

Lemma 9.27.
Under our simplifying assumptions, for all U ∈ I_mid,

∑_{α: U_α ≡ U, α is proper and non-trivial} 1/(|Aut(U_α ∩ V_α)| (D_V)^{|U_α \ V_α| + |V_α \ U_α| + |E(α)|} 2^{|V(α) \ (U_α ∪ V_α)|}) < 1
Proof. In order to choose α, it is sufficient to choose the following:

1. The number j_1 of vertices in U_α \ V_α, the number j_2 of vertices in V_α \ U_α, and the number j_3 of vertices in V(α) \ (U_α ∪ V_α).

2. A mapping in Aut(U_α ∩ V_α) determining how the vertices in U_α ∩ V_α match up with each other.

3. The position of each vertex u ∈ U_α \ V_α within U_α (there are at most |U_α| ≤ D_V choices for this).

4. The position of each vertex v ∈ V_α \ U_α within V_α (there are at most |V_α| ≤ D_V choices for this).

5. The number j_4 of edges in E(α).

6. The endpoints of each edge in E(α).

This implies that for all j_1, j_2, j_3, j_4 ≥ 0,

∑_{α: U_α ≡ U, |U_α \ V_α| = j_1, |V_α \ U_α| = j_2, |V(α) \ (U_α ∪ V_α)| = j_3, |E(α)| = j_4} 1/(|Aut(U_α ∩ V_α)| (D_V)^{j_1 + j_2} (D_V^2)^{j_4}) ≤ 1

Using this and summing over all j_1, j_2, j_3, j_4 ∈ N ∪ {0} with j_1 + j_2 + j_3 + j_4 ≥ 1, the geometric decay in each index makes the sum in the statement of the lemma strictly less than 1.

Corollary 9.28. For all ε′ > 0, if we take c(α) = (D_V)^{|U_α \ V_α| + |V_α \ U_α| + |E(α)|} 2^{|V(α) \ (U_α ∪ V_α)|} / ε′ then

1. ∀U ∈ I_mid, ∑_{γ ∈ Γ_{U,*}} 1/(|Aut(U)|c(γ)) < ε′

2. ∀V ∈ I_mid, ∑_{γ ∈ Γ_{*,V}} 1/(|Aut(U_γ)|c(γ)) < ε′

3. ∀U ∈ I_mid, ∑_{τ ∈ M_U} 1/(|Aut(U)|c(τ)) < ε′

Choosing c(α) in general

When we have multiple types of vertices and hyperedges of arity k, Lemma 9.27 can be generalized as follows:

Lemma 9.29.
Under our simplifying assumptions, for all U ∈ I_mid,

∑_{α: U_α ≡ U, α is proper and non-trivial} 1/(|Aut(U_α ∩ V_α)| (D_V t_max)^{|U_α \ V_α| + |V_α \ U_α| + k|E(α)|} (2t_max)^{|V(α) \ (U_α ∪ V_α)|}) < 1
This can be proved in the same way as Lemma 9.27 with the following modifications:

1. In addition to choosing the number of vertices in U_α \ V_α, V_α \ U_α, and V(α) \ (U_α ∪ V_α), we also have to choose the types of these vertices.

2. For each hyperedge, we have to choose k endpoints rather than 2 endpoints.

Corollary 9.30.
For all ε′ > 0, if we take c(α) = (t_max D_V)^{|U_α \ V_α| + |V_α \ U_α| + k|E(α)|} (2t_max)^{|V(α) \ (U_α ∪ V_α)|} / ε′ then

1. ∀U ∈ I_mid, ∑_{γ ∈ Γ_{U,*}} 1/(|Aut(U)|c(γ)) < ε′

2. ∀V ∈ I_mid, ∑_{γ ∈ Γ_{*,V}} 1/(|Aut(U_γ)|c(γ)) < ε′

3. ∀U ∈ I_mid, ∑_{τ ∈ M_U} 1/(|Aut(U)|c(τ)) < ε′

For technical reasons, we will need a more refined bound when the sum is over all shapes γ of at least a prescribed size.

Lemma 9.31.
For all ε′ > 0, for the same choice of c(α) as in Corollary 9.30, for any U ∈ I_mid and integer m ≥ 1, we have

∑_{γ ∈ Γ_{U,*}: |V(γ)| ≥ |U| + m} 1/(|Aut(U)|c(γ)) ≤ ε′ · (m + 2) · 2^{3−m}

Proof sketch. The proof is similar to the proof of Corollary 9.30, but we now have the extra condition j_1 + j_2 ≥ m in the sum from the proof of Lemma 9.27. Then,

∑_{j_1,j_2,j_3,j_4 ∈ N ∪ {0}: j_1 + j_2 ≥ m} 2^{−j_1−j_2−j_3−j_4} = 4 ∑_{j_1,j_2 ∈ N ∪ {0}: j_1 + j_2 ≥ m} 2^{−j_1−j_2} = 4(m + 2) · 2^{1−m} = (m + 2) · 2^{3−m}

We now prove the generalized intersection tradeoff lemma.
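Before moving on, the geometric tail sum behind the refined bound of Lemma 9.31 can be checked numerically: the tail ∑_{j_1+j_2 ≥ m} 2^{−j_1−j_2} has the exact closed form (m + 2)·2^{1−m}, which decays geometrically in m. A quick check (the helper name is ours):

```python
def tail_sum(m, cutoff=80):
    """Numerically sum 2^(-j1-j2) over j1, j2 >= 0 with j1 + j2 >= m."""
    return sum(2.0 ** -(j1 + j2)
               for j1 in range(cutoff)
               for j2 in range(cutoff)
               if j1 + j2 >= m)

# closed form: (m + 2) * 2^(1 - m); at m = 0 this is the full sum, 4
for m in range(12):
    assert abs(tail_sum(m) - (m + 2) * 2.0 ** (1 - m)) < 1e-9
```

The closed form follows from ∑_{s ≥ m} (s+1)·2^{−s} = (m+2)·2^{1−m}, since there are s + 1 pairs (j_1, j_2) with j_1 + j_2 = s.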
Lemma 9.32.
For all γ, τ, γ′ and all intersection patterns P ∈ P_{γ,τ,γ′},

w(V(τ_P)) + w(I_{τ_P}) − w(S_{τ_P,min}) ≤ w(V(τ)) + w(I_τ) − w(S_{τ,min}) + w(V(γ) \ U_γ) + w(V(γ′) \ U_{γ′})
Definition 9.33.
1. We define I LM to be the set of vertices which, after intersections, touch γ and τ but not γ ′ T . Inparticular, I LM consists of the vertices which result from intersecting a pair of vertices in V ( γ ) \ V γ and V ( τ ) \ U τ \ V τ and the vertices which are in U τ \ V τ and are not intersected with any othervertex.2. We define I MR to be the set of vertices which, after intersections, touch τ and γ ′ T but not γ . Inparticular, I MR consists of the vertices which result from intersecting a pair of vertices in V ( τ ) \ U τ \ V τ and V ( γ ′ T ) \ U γ ′ T and the vertices which are in V τ \ U τ and are not intersected with any othervertex.3. We define I LR to be the set of vertices which, after intersections, touch γ and γ ′ T but not τ . Inparticular, I LR consists of the vertices which result from intersecting a pair of vertices in V ( γ ) \ V γ and V ( γ ′ T ) \ U γ ′ T .4. We define I LMR to be the set of vertices which, after intersections, touch γ , τ , and γ ′ T . In particular, I LMR consists of the vertices which result from intersecting a triple of vertices in V ( γ ) \ V γ , V ( τ ) \ U τ \ V τ , and V ( γ ′ T ) \ U γ ′ T , intersecting a pair of vertices in V ( γ ) \ V γ and V τ \ U τ , intersecting apair of vertices in U τ \ V τ and V ( γ ′ T ) \ U γ ′ T , and single vertices in U τ ∩ V τ . The main idea is as follows. A priori, any of the vertices in I LM ∪ I MR ∪ I LR ∪ I LMR could becomeisolated. We handle this by keeping track of the following types of flows:1. Flows from U γ to I LM ∪ I LR ∪ I LMR
2. Flows from I LR ∪ I MR ∪ I LMR to V γ ′ T
3. Flows from I_LM to I_MR. For technical reasons, we also view vertices in I_LMR as having flow to themselves.

We then observe that flows to and from these vertices prevent these vertices from being isolated and can provide flow from U_γ to V_{γ′^T}, which gives a lower bound on w(S_{τ_P}). We now implement this idea.

Definition 9.34 (Flow Graph). Given a shape α, we define the directed graph H_α as follows:

1. For each vertex v ∈ V(α), we create two vertices v_in and v_out. We then create a directed edge from v_in to v_out with capacity w(v)
2. For each pair of vertices (v, w) which is an edge of multiplicity 1 in E(α) (or part of a hyperedge of multiplicity 1 in E(α)), we create a directed edge with infinite capacity from v_out to w_in and we create a directed edge with infinite capacity from w_out to v_in.

3. We define U_{H_α} to be U_{H_α} = {u_in : u ∈ U_α} and we define V_{H_α} to be V_{H_α} = {v_out : v ∈ V_α}

Lemma 9.35.
The maximum flow from U_{H_α} to V_{H_α} is equal to the minimum weight of a separator between U_α and V_α.

Proof. This can be proved using the max-flow min-cut theorem.
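Lemma 9.35 is an instance of the standard reduction from vertex capacities to edge capacities: splitting each vertex v into v_in → v_out with capacity w(v), as in Definition 9.34, makes max-flow min-cut pick out a minimum-weight vertex separator. The following self-contained sketch (the function names and the toy example are ours, not from the paper) implements this reduction with Edmonds-Karp:

```python
from collections import deque

def max_flow(arcs, s, t):
    """Edmonds-Karp max flow; arcs is a list of (u, v, capacity)."""
    cap, adj = {}, {}
    for u, v, c in arcs:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)              # residual arc
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:       # BFS for an augmenting path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= aug
            cap[(v, u)] += aug
        total += aug

def min_vertex_separator_weight(weights, edges, U, V):
    """Split each vertex v into (v,'in') -> (v,'out') with capacity w(v);
    shape edges get infinite capacity in both directions."""
    INF = float("inf")
    arcs = [((v, "in"), (v, "out"), w) for v, w in weights.items()]
    for a, b in edges:
        arcs.append(((a, "out"), (b, "in"), INF))
        arcs.append(((b, "out"), (a, "in"), INF))
    arcs += [("s", (u, "in"), INF) for u in U]
    arcs += [((v, "out"), "t", INF) for v in V]
    return max_flow(arcs, "s", "t")

# u1, u2 - a - v1, v2 with w(a) = 5: the cheapest separator is {u1, u2} or {v1, v2}
w = {"u1": 1, "u2": 1, "a": 5, "v1": 1, "v2": 1}
E = [("u1", "a"), ("u2", "a"), ("a", "v1"), ("a", "v2")]
assert min_vertex_separator_weight(w, E, ["u1", "u2"], ["v1", "v2"]) == 2
```

Every augmenting path must cross some v_in → v_out arc, so the bottleneck is always a vertex weight, matching the separator interpretation.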
Definition 9.36 (Modified Flow Graph). Given a shape α together with a set I_L ⊆ V(α) of vertices in α (which will be the vertices in α which are intersected with a vertex to the left of α) and a set I_R ⊆ V(α) of vertices in α (which will be the vertices in α which are intersected with a vertex to the right of α), we define the modified flow graph H^{I_L,I_R}_α as follows:

1. We start with the flow graph H_α

2. For each vertex u ∈ I_L, we delete all of the edges into u_in and add u_in to U_{H_α}

3. For each vertex v ∈ I_R, we delete all of the edges out of v_out and add v_out to V_{H_α}

4. We call the resulting graph H^{I_L,I_R}_α and the resulting sets U_{H^{I_L,I_R}_α} and V_{H^{I_L,I_R}_α}

Lemma 9.37. The maximum flow from U_{H^{I_L,I_R}_α} to V_{H^{I_L,I_R}_α} in H^{I_L,I_R}_α is at least as large as the maximum flow from U_{H_α} to V_{H_α} in H_α

Proof sketch.
Observe that if we have a cut C in H^{I_L,I_R}_α which separates U_{H^{I_L,I_R}_α} and V_{H^{I_L,I_R}_α} then C separates U_{H_α} and V_{H_α} in H_α.

Before the intersections, we have the following flows:

1. We take F_1 to be the maximum flow from U_γ to V_γ in γ. Note that F_1 has value w(V_γ)

2. We take F_2 to be the maximum flow from U_τ to V_τ in τ. Note that F_2 has value w(S_{τ,min})

3. We take F_3 to be the maximum flow from U_{γ′^T} to V_{γ′^T} in γ′^T. Note that F_3 has value w(U_{γ′^T})

After the intersections, we take the following flows:

1. We take F′_1 to be the maximum flow from U_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} to V_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} in H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ.

2. We take F′_2 to be the maximum flow from U_{H^{I_LM ∪ I_LMR, I_MR ∪ I_LMR}_τ} to V_{H^{I_LM ∪ I_LMR, I_MR ∪ I_LMR}_τ} in H^{I_LM ∪ I_LMR, I_MR ∪ I_LMR}_τ
3. We take F′_3 to be the maximum flow from U_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}} to V_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}} in H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}.

Observe that because of how intersection patterns are defined, val(F′_1) = w(U_γ) and val(F′_3) = w(V_{γ′^T}). By Lemma 9.37, the value of F′_2 is at least as large as the value of F_2, so val(F′_2) ≥ w(S_{τ,min}).

We now consider F′_1 + F′_2 + F′_3. As is, this is not a flow, but we can fix this.

Definition 9.38.
For each vertex v ∈ V(τ_P),

1. We define f_in(v) to be the flow into v_in in F′_1 + F′_2 + F′_3.

2. We define f_out(v) to be the flow out of v_out in F′_1 + F′_2 + F′_3.

3. We define f_through(v) to be the flow from v_in to v_out in F′_1 + F′_2 + F′_3
4. We define f imbalance ( v ) to be f imbalance ( v ) = | f in ( v ) − f out ( v ) |
5. We define f_excess(v) to be f_excess(v) = f_through(v) − max{f_in(v), f_out(v)}

With this information, we fix the flow F′_1 + F′_2 + F′_3 as follows. For each vertex v ∈ V(τ_P),

1. If f_in(v) > f_out(v) then we create a vertex v_{supplemental,out} and an edge from v_out to v_{supplemental,out} with capacity f_imbalance(v) and we route f_imbalance(v) of flow along this edge. We then add v_{supplemental,out} to a set of vertices V_supplemental.

2. If f_in(v) < f_out(v) then we create a vertex v_{supplemental,in} and an edge from v_{supplemental,in} to v_in with capacity f_imbalance(v) and we route f_imbalance(v) of flow along this edge. We then add v_{supplemental,in} to a set of vertices U_supplemental.

3. We reduce the flow on the edge from v_in to v_out by f_excess(v)

We call the resulting flow F′.

Proposition 9.39. F′ is a flow from U_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} ∪ U_supplemental to V_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}} ∪ V_supplemental with value

val(F′) = val(F′_1) + val(F′_2) + val(F′_3) − ∑_{v ∈ V(τ_P)} f_excess(v)

Corollary 9.40.
There exists a flow F′′ from U_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} to V_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}} with value

val(F′′) ≥ val(F′_1) + val(F′_2) + val(F′_3) − ∑_{v ∈ V(τ_P)} (f_excess(v) + f_imbalance(v))
Consider the minimum cut C between U_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} and V_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}}. If we add all of the supplemental edges to C then this gives a cut C′ between U_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} and V_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}} with capacity

capacity(C′) = capacity(C) + ∑_{v ∈ V(τ_P)} f_imbalance(v) ≥ val(F′)

Thus, capacity(C) ≥ val(F′) − ∑_{v ∈ V(τ_P)} f_imbalance(v), so there exists a flow F′′ from U_{H^{∅, I_LM ∪ I_LR ∪ I_LMR}_γ} to V_{H^{I_MR ∪ I_LR ∪ I_LMR, ∅}_{γ′^T}} with value

val(F′′) = capacity(C) ≥ val(F′_1) + val(F′_2) + val(F′_3) − ∑_{v ∈ V(τ_P)} (f_excess(v) + f_imbalance(v))

We now make the following observations:

Lemma 9.41.
1. For all vertices v ∉ I_LM ∪ I_MR ∪ I_LR ∪ I_LMR, f_excess(v) = f_imbalance(v) = 0 (and these vertices can never be isolated).

2. For all vertices v ∈ I_LM, f_excess(v) + f_imbalance(v) ≤ w(v). Moreover, for all vertices v ∈ I_LM which are isolated, f_excess(v) = f_imbalance(v) = 0.

3. For all vertices v ∈ I_MR, f_excess(v) + f_imbalance(v) ≤ w(v). Moreover, for all vertices v ∈ I_MR which are isolated, f_excess(v) = f_imbalance(v) = 0.

4. For all vertices v ∈ I_LR, f_excess(v) + f_imbalance(v) ≤ w(v). Moreover, for all vertices v ∈ I_LR which are isolated, f_excess(v) = f_imbalance(v) = 0.

5. For all vertices v ∈ I_LMR, f_excess(v) + f_imbalance(v) ≤ 2w(v). Moreover, for all vertices v ∈ I_LMR which are isolated, f_excess(v) = w(v) and f_imbalance(v) = 0.

Proof. For the first statement, observe that for vertices v ∉ I_LM ∪ I_MR ∪ I_LR ∪ I_LMR, neither v_in nor v_out is ever a sink or source, so the flow into these vertices must equal the flow out of these vertices and thus f_in(v) = f_out(v) = f_through(v).

For the second statement, observe that for a vertex v ∈ I_LM,

1. F′_1 will have a flow of f_in(v) into v_in and along the edge from v_in to v_out.

2. F′_2 will have a flow of f_out(v) along the edge from v_in to v_out and out of v_out.

Thus, f_excess(v) = f_in(v) + f_out(v) − max{f_in(v), f_out(v)}. Since f_imbalance(v) = |f_in(v) − f_out(v)|, f_excess(v) + f_imbalance(v) = f_in(v) + f_out(v) − min{f_in(v), f_out(v)} ≤ w(v).

If v is isolated then neither F′_1 nor F′_2 can have any flow to v_in or out of v_out, so f_in(v) = f_through(v) = f_out(v) = 0.

The third and fourth statements can be proved in the same way as the second statement.

For the fifth statement, observe that for a vertex v ∈ I_LMR,

1. F′_1 will have a flow of f_in(v) into v_in and along the edge from v_in to v_out.

2.
F′_2 will have a flow of w(v) along the edge from v_in to v_out.

3. F′_3 will have a flow of f_out(v) along the edge from v_in to v_out and out of v_out.

Thus, f_excess(v) = w(v) + f_in(v) + f_out(v) − max{f_in(v), f_out(v)}. Since f_imbalance(v) = |f_in(v) − f_out(v)|, f_excess(v) + f_imbalance(v) = w(v) + f_in(v) + f_out(v) − min{f_in(v), f_out(v)} ≤ 2w(v).

If v is isolated then neither F′_1 nor F′_3 can have any flow to v_in or out of v_out, so f_in(v) = f_out(v) = 0 and f_through(v) = w(v).

Putting everything together, we have the following corollary:

Corollary 9.42.

∑_{v ∈ V(τ_P)} (f_excess(v) + f_imbalance(v)) ≤ w(I_LM) + w(I_LR) + w(I_MR) + 2w(I_LMR) − (w(I_{τ_P}) − w(I_τ))

and thus

w(S_{τ_P,min}) ≥ val(F′_1) + val(F′_2) + val(F′_3) − ∑_{v ∈ V(τ_P)} (f_excess(v) + f_imbalance(v)) ≥ w(U_γ) + w(S_{τ,min}) + w(V_{γ′^T}) − w(I_LM) − w(I_LR) − w(I_MR) − 2w(I_LMR) + (w(I_{τ_P}) − w(I_τ))

Since w(V(τ_P)) = w(V(τ)) + w(V(γ)) + w(V(γ′)) − w(I_LM) − w(I_LR) − w(I_MR) − 2w(I_LMR),

w(S_{τ_P,min}) ≥ w(U_γ) + w(S_{τ,min}) + w(V_{γ′^T}) + w(V(τ_P)) − w(V(τ)) − w(V(γ)) − w(V(γ′)) + (w(I_{τ_P}) − w(I_τ))

Rearranging this gives

w(V(τ_P)) − w(S_{τ_P,min}) + w(I_{τ_P}) ≤ w(V(τ)) − w(S_{τ,min}) + w(I_τ) + w(V(γ) \ U_γ) + w(V(γ′) \ U_{γ′})

which is the generalized intersection tradeoff lemma.
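The per-vertex accounting in Lemma 9.41 for vertices in I_LM, I_MR, and I_LR reduces to the identity min(a, b) + |a − b| = max(a, b): when two of the flows push f_in(v) and f_out(v) through v's capacity edge, f_through(v) = f_in(v) + f_out(v), so f_excess(v) + f_imbalance(v) = max{f_in(v), f_out(v)} ≤ w(v). A toy check (helper name ours):

```python
def excess_and_imbalance(f_in, f_out, w):
    """Vertex in I_LM / I_MR / I_LR: two flows share the v_in -> v_out edge."""
    f_through = f_in + f_out
    f_imbalance = abs(f_in - f_out)
    f_excess = f_through - max(f_in, f_out)   # equals min(f_in, f_out)
    assert f_excess + f_imbalance == max(f_in, f_out) <= w
    return f_excess, f_imbalance

# exhaustive toy check for integer flows respecting a capacity of w = 3
for f_in in range(4):
    for f_out in range(4):
        excess_and_imbalance(f_in, f_out, w=3)
```

For I_LMR vertices the same computation picks up the extra w(v) self-flow, which is exactly why their bound becomes 2w(v) in the fifth statement.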
10 Showing Positivity of ∑_{V ∈ I_mid} M_fact(H_Id_V)

In this section, we describe how to show that ∑_{V ∈ I_mid} M_fact(H_Id_V) ≽ δ·Id_Sym for some δ > 0, where δ will depend on n and other parameters. For now, we assume that the indices of Λ are multilinear monomials. We will then describe the adjustments that are needed to handle non-multilinear matrix indices.

We start with a few more definitions.

Definition 10.1.
For all V ∈ I_mid we define Id_Sym,V to be the matrix such that

1. Id_Sym,V(A, B) = 1 if A and B both have index shape V.

2. Otherwise, Id_Sym,V(A, B) = 0.

Proposition 10.2. Id_Sym = ∑_{V ∈ I_mid} Id_Sym,V

Definition 10.3.
For all V ∈ I_mid we define λ_V = |Aut(V)| · H_Id_V(Id_V, Id_V)

We now describe our strategy for showing ∑_{V ∈ I_mid} M_fact(H_Id_V) ≽ δ·Id_Sym. The idea is as follows. We will consider the index shapes V ∈ I_mid from largest weight to smallest weight and we will show that for each V ∈ I_mid, there exists a δ_V > 0 such that ∑_{V ∈ I_mid} M_fact(H_Id_V) ≽ δ_V ∑_{U ∈ I_mid: w(U) ≥ w(V)} Id_Sym,U.

For the first step, letting V_max be the maximum weight index shape in I_mid, M_fact(H_Id_{V_max}) = λ_{V_max} Id_Sym,V_max because there are no non-trivial left shapes σ such that V_σ = V_max. For other V ∈ I_mid, λ_V Id_Sym,V is a part of M_fact(H_Id_V), but M_fact(H_Id_V) will also contain terms of the form H_Id_V(σ, σ′) M_σ M_{σ′}^T where U_σ ≠ V or U_{σ′} ≠ V.

We can handle the terms H_Id_V(σ, σ′) M_σ M_{σ′}^T where U_σ ≠ V and U_{σ′} ≠ V by bounding these terms in terms of Id_Sym,U_σ and Id_Sym,U_{σ′}. Since w(U_σ) > w(V) and w(U_{σ′}) > w(V), Id_Sym,U_σ and Id_Sym,U_{σ′} are already available to us. To handle the terms H_Id_V(σ, σ′) M_σ M_{σ′}^T where exactly one of U_σ and U_{σ′} is equal to V, we use the following trick.

Definition 10.4. Given V ∈ I_mid, define H″_Id_V to be the coefficient matrix such that

1. If U_σ = U_{σ′} = V then H″_Id_V(σ, σ′) = (1/2)·H_Id_V(σ, σ′)
2. If exactly one of U_σ and U_{σ′} is equal to V then H″_Id_V(σ, σ′) = H_Id_V(σ, σ′)

3. If U_σ ≠ V and U_{σ′} ≠ V then H″_Id_V(σ, σ′) = 2·H_Id_V(σ, σ′)

Proposition 10.5. M_fact(H″_Id_V) ≽ 0

Proof.
Since H_Id_V ≽ 0, H″_Id_V ≽ 0 and thus M_fact(H″_Id_V) ≽ 0.

Corollary 10.6.
For all V ∈ I_mid,

M_fact(H_Id_V) + ∑_{σ,σ′ ∈ L_V: U_σ ≠ V, U_{σ′} ≠ V} H_Id_V(σ, σ′) M_σ M_{σ′}^T ≽ (λ_V/2)·Id_Sym,V

Proof.
Observe that

M_fact(H_Id_V) − (λ_V/2)·Id_Sym,V + ∑_{σ,σ′ ∈ L_V: U_σ ≠ V, U_{σ′} ≠ V} H_Id_V(σ, σ′) M_σ M_{σ′}^T = M_fact(H″_Id_V) ≽ 0

We now analyze the terms ∑_{σ,σ′ ∈ L_V: U_σ ≠ V, U_{σ′} ≠ V} H_Id_V(σ, σ′) M_σ M_{σ′}^T.

Definition 10.7.
Given U, V ∈ I with w(U) > w(V), we define W(U, V) to be

W(U, V) = |Aut(U)| ∑_{σ ∈ L_V: U_σ = U} ∑_{σ′ ∈ L_V: U_{σ′} ≠ V} B_norm(σ) B_norm(σ′) H_Id_V(σ, σ′)

Lemma 10.8.
For all V ∈ I_mid,

∑_{σ,σ′ ∈ L_V: U_σ ≠ V, U_{σ′} ≠ V} H_Id_V(σ, σ′) M_σ M_{σ′}^T ≼ ∑_{U ∈ I_mid: w(U) > w(V)} W(U, V) Id_Sym,U

Proof.
Observe that for all σ, σ′ ∈ L_V such that U_σ ≠ V and U_{σ′} ≠ V, ||M_σ M_{σ′}^T|| ≤ B_norm(σ) B_norm(σ′) and thus

(M_σ M_{σ′}^T + M_{σ′} M_σ^T) ≼ B_norm(σ) B_norm(σ′) (M_Id_{U_σ} + M_Id_{U_{σ′}})

Summing this equation over all σ, σ′ ∈ L_V such that U_σ ≠ V and U_{σ′} ≠ V,

∑_{σ,σ′ ∈ L_V: U_σ ≠ V, U_{σ′} ≠ V} H_Id_U(σ, σ′) M_σ M_{σ′}^T ≼ ∑_{σ,σ′ ∈ L_V: U_σ ≠ V, U_{σ′} ≠ V} B_norm(σ) B_norm(σ′) M_Id_{U_σ}
≼ ∑_{U ∈ I_mid: w(U) > w(V)} ∑_{σ ∈ L_V: U_σ = U} ∑_{σ′ ∈ L_V: U_{σ′} ≠ V} B_norm(σ) B_norm(σ′) H_Id_U(σ, σ′) M_Id_U
≼ ∑_{U ∈ I_mid: w(U) > w(V)} (1/|Aut(U)|) W(U, V) M_Id_U

Since all of the coefficient matrices have SOS-symmetry, we can replace M_Id_U by |Aut(U)| Id_Sym,U and this completes the proof.

Using this lemma, we can show the following theorem:

Theorem 10.9.
Let G be the following directed graph:

1. The vertices of G are the index shapes V ∈ I_mid

2. For each U, V ∈ I_mid such that w(U) > w(V), we have an edge e = (V, U) with weight w(e) = W(U, V)/λ_V

For all V ∈ I_mid,

Id_Sym,V ≼ ∑_{U ∈ I_mid: w(U) ≥ w(V)} (∑_{P: P is a path from V to U in G} ∏_{e ∈ E(P)} w(e)) (1/λ_U) M_fact(H_Id_U)

Proof sketch.
This can be proved by starting with Corollary 10.6 and iteratively applying Lemma 10.8 and Corollary 10.6.
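In Theorem 10.9, every edge of G goes from an index shape to one of strictly larger weight, so G is acyclic and the inner path sums can be computed by a single pass in topological order. A small illustrative sketch (the toy graph, weights, and names are ours, not from the paper):

```python
# toy DAG on index shapes; edges go from smaller-weight to larger-weight shapes,
# carrying weight W(U, V) / lambda_V on the edge (V, U)   (values illustrative)
G = {"V": {"U1": 0.5, "U2": 0.25}, "U1": {"U2": 0.5}, "U2": {}}
ORDER = ["V", "U1", "U2"]              # topological order (increasing weight)

def path_sums(graph, order, start):
    """totals[U] = sum over all paths start -> U of the product of edge
    weights (the empty path contributes 1 at start itself)."""
    totals = {u: 0.0 for u in graph}
    totals[start] = 1.0
    for u in order:
        for v, w in graph[u].items():
            totals[v] += totals[u] * w
    return totals

t = path_sums(G, ORDER, "V")
assert t["U1"] == 0.5                  # single path V -> U1
assert t["U2"] == 0.25 + 0.5 * 0.5     # direct path plus the path through U1
```

This is exactly the quantity ∑_P ∏_{e ∈ E(P)} w(e) appearing in the theorem, computed without enumerating paths explicitly.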
In order to handle non-multilinear matrix indices, we need to make a few adjustments. First, we need to modify the definition of Id_Sym,V.

Definition 10.10.
For all V ∈ I_mid we define Id_Sym,V to be the matrix such that

1. Id_Sym,V(A, B) = 1 if A and B have the same index shape U and U has the same number of each type of vertex as V. Note that B may be a permutation of A and U may have different powers than V.

2. Otherwise, Id_Sym,V(A, B) = 0.

Observe that with this modified definition, we will still have Id_Sym = ∑_{V ∈ I_mid} Id_Sym,V. We also need to adjust how we define λ_V, as there are left shapes σ such that U_σ and V_σ have the same numbers and types of vertices but U_σ has different powers.

Definition 10.11.
Given V ∈ I mid , we define T V ⊆ L V to be the set of left shapes σ ∈ L V such that U σ has the same numbers and types of vertices as V (which automatically implies that E ( σ ) = ∅ ). Definition 10.12.
Define Id*_Sym,V to be the matrix indexed by left shapes σ, σ′ ∈ T_V such that Id*_Sym,V(σ, σ′) = 1/|Aut(V)| if U_σ ≡ U_{σ′} and Id*_Sym,V(σ, σ′) = 0 otherwise.

Proposition 10.13. M_fact(Id*_Sym,V) = Id_Sym,V

Definition 10.14.
Let H′_Id_V be the matrix H_Id_V restricted to the rows and columns σ, σ′ where σ, σ′ ∈ T_V. We define λ_V to be the largest constant λ such that H′_Id_V ≽ λ·Id*_Sym,V.

Finally, whenever we have the condition that U_σ ≠ V, it should instead be the condition that U_σ does not have the same number of each type of vertex as V. With these adjustments, the same arguments go through.

Now, we will illustrate the final ingredients needed to show positivity for our applications. To use Theorem 7.101, we would need to prove a statement of the form: Whenever ||M_α|| ≤ B_norm(α) for all α ∈ M′,

∑_{U ∈ I_mid} M_fact^{Id_U}(H_Id_U) ≽ (∑_{U ∈ I_mid} ∑_{γ ∈ Γ_{U,*}} d_Id_U(H′_γ, H_Id_U)/(|Aut(U)|c(γ))) Id_sym

We will sketch the strategy we use to prove this. Let D_sos be the degree of the SoS program.

For the left hand side, we will prove a lower bound of the form: Whenever ||M_α|| ≤ B_norm(α) for all α ∈ M′,

∑_{U ∈ I_mid} M_fact^{Id_U}(H_Id_U) ≽ n^{−K_1 D_sos} Id_sym

for a constant K_1 > 0. For this, we use the strategy from Section 10.1. Then, we prove an upper bound on the right hand side of the form

∑_{U ∈ I_mid} ∑_{γ ∈ Γ_{U,*}} d_Id_U(H_Id_U, H′_γ)/(|Aut(U)|c(γ)) ≤ n^{−K_2 D_sos D_V}

for a constant K_2 > 0. Now, we put these two together. Using the fact that Id_Sym ≽ 0, by simply setting n^{−K_1 D_sos} > n^{−K_2 D_sos D_V}, which can be obtained by choosing D_sos small enough, we obtain the desired result.

We will also need the following bound, which says that if we have sufficient decay for each vertex, then the sum of this decay, over all shapes σ ∘ σ′ for σ, σ′ ∈ L′_U, is bounded.

Definition 10.15.
For U ∈ I mid , let L ′ U ⊂ L U be the set of non-trivial shapes in L U . Lemma 10.16.
Suppose that D_V = n^{C_V ε} and D_E = n^{C_E ε}, for constants C_V, C_E > 0, are the truncation parameters for our shapes. Then

∑_{U ∈ I_mid} ∑_{σ,σ′ ∈ L′_U} D_sos^{D_sos} / n^{F ε |V(σ ∘ σ′)|} ≤ 1

for a constant F > 0 that depends only on C_V, C_E. In particular, by setting C_V, C_E small enough, we can make this constant arbitrarily small.

Proof. For a given j = |U|, the number of ways to choose U is at most t_max^j. For a given U ∈ I_mid, we will bound the number of ways to choose σ, σ′ ∈ L′_U. To choose σ, σ′ ∈ L′_U, it is sufficient to choose:

- The number of vertices j_1 ≥ 0 (resp. j_1′ ≥ 0) in U_σ \ V_σ (resp. U_{σ′} \ V_{σ′}), their types, of which there are at most t_max, and their powers, which have at most D_sos choices.

- The number of vertices j_2 (resp. j_2′) in V(σ) \ (U_σ ∪ V_σ) (resp. V(σ′) \ (U_{σ′} ∪ V_{σ′})), and also their types, of which there are at most t_max.

- The position of each vertex i in U_σ \ V_σ (resp. U_{σ′} \ V_{σ′}) within U_σ (resp. U_{σ′}). There are at most D_V choices for each vertex.

- The subset of U_σ (resp. U_{σ′}) that is in V_σ (resp. V_{σ′}) and a mapping in Aut(U_σ ∩ V_σ) (resp. Aut(U_{σ′} ∩ V_{σ′})) that determines the matching between the vertices in U_σ ∩ V_σ (resp. U_{σ′} ∩ V_{σ′}).

- The number j_3 (resp. j_3′) of edges in E(σ) (resp. E(σ′)), and the k endpoints of each edge. Each endpoint has at most D_V choices.

Therefore, for all j_1, j_1′, j_2, j_2′, j_3, j_3′ ≥ 0, we have

∑_{U ∈ I_mid} ∑_{σ,σ′ ∈ L′_U: |U_σ \ V_σ| = j_1, |U_{σ′} \ V_{σ′}| = j_1′, |V(σ) \ (U_σ ∪ V_σ)| = j_2, |V(σ′) \ (U_{σ′} ∪ V_{σ′})| = j_2′, |E(σ)| = j_3, |E(σ′)| = j_3′} |Aut(U_{σ′} ∩ V_{σ′})||Aut(U_σ ∩ V_σ)| / ((t_max)^{j + j_2 + j_2′} (D_V t_max D_sos)^{j_1 + j_1′} (D_V)^{k(j_3 + j_3′)}) ≤ 1

This implies that ∑_{U ∈ I_mid} ∑_{σ,σ′ ∈ L′_U} D_sos^{D_sos} / n^{F ε |V(σ ∘ σ′)|} ≤ 1 for a constant F > 0 that only depends on C_V, C_E.
11 Planted slightly denser subgraph: Full verification
In this section, we will prove all the required bounds to prove Theorem 1.1.
Theorem 1.1.
Let C_p > 0. There exists a constant C > 0 such that for all sufficiently small constants ε > 0, if k ≤ n^{1/2 − ε} and p = 1/2 + (1/2)·n^{−C_p ε}, then with high probability, the candidate moment matrix Λ given by pseudo-calibration for degree n^{C ε} Sum-of-Squares is PSD.
In particular, we will use Theorem 7.95, where we choose the ε in that theorem (not to be confused with the ε in Theorem 1.1) to be an arbitrarily small constant.

Lemma 11.1.
For all U ∈ I_mid and τ ∈ M_U,

[ (1/(|Aut(U)|c(τ))) H_Id_U      B_norm(τ) H_τ      ]
[ B_norm(τ) H_τ^T      (1/(|Aut(U)|c(τ))) H_Id_U ] ≽ 0

Lemma 11.2.
For all U, V ∈ I_mid where w(U) > w(V) and all γ ∈ Γ_{U,V}, c(γ) N(γ) B(γ) H^{−γ,γ}_{Id_V} ≼ H′_γ

Lemma 11.3.
Whenever ||M_α|| ≤ B_norm(α) for all α ∈ M′,

∑_{U ∈ I_mid} M_fact^{Id_U}(H_Id_U) ≽ (∑_{U ∈ I_mid} ∑_{γ ∈ Γ_{U,*}} d_Id_U(H′_γ, H_Id_U)/(|Aut(U)|c(γ))) Id_sym

Corollary 11.4.
With constant probability, Λ ≽ 0.

Proof. This follows by invoking Theorem 7.95, whose conditions follow from Lemma 4.7, Lemma 11.1, Lemma 11.2 and Lemma 11.3.
Lemma 11.5.
Suppose k ≤ n^{1/2 − ε}. For all U ∈ I_mid and τ ∈ M_U,

√n^{|V(τ)| − |U_τ|} S(τ) ≤ n^{−C_p ε |E(τ)|}

Proof. This result follows by plugging in the value of S(τ). Using k ≤ n^{1/2 − ε},

√n^{|V(τ)| − |U_τ|} S(τ) = √n^{|V(τ)| − |U_τ|} (k/n)^{|V(τ)| − |U_τ|} ((1 + n^{−C_p ε}) − 1)^{|E(τ)|} = (k/√n)^{|V(τ)| − |U_τ|} n^{−C_p ε |E(τ)|} ≤ n^{−C_p ε |E(τ)|}

Corollary 11.6.
For all U ∈ I_mid and τ ∈ M_U, we have c(τ) B_norm(τ) S(τ) ≤ 1.

Proof. Since τ is a proper middle shape, we have w(I_τ) = 0 and w(S_τ) = w(U_τ). This implies

n^{w(V(τ)) + w(I_τ) − w(S_τ)} = √n^{|V(τ)| − |U_τ|}

Since τ is proper, every vertex i ∈ V(τ) \ U_τ or i ∈ V(τ) \ V_τ has deg^τ(i) ≥ 1 and hence |V(τ) \ U_τ| + |V(τ) \ V_τ| ≤ 2|E(τ)|. Also, q = n^{O(1)·C_V·ε}. We can set C_V sufficiently small so that, using Lemma 11.5,

c(τ) B_norm(τ) S(τ) = (D_V)^{|U_τ \ V_τ| + |V_τ \ U_τ| + |E(τ)|} 2^{|V(τ) \ (U_τ ∪ V_τ)|} · (qD_V)^{|V(τ) \ U_τ| + |V(τ) \ V_τ|} √n^{|V(τ)| − |U_τ|} S(τ) ≤ n^{O(1)·C_V·ε·|E(τ)|} · √n^{|V(τ)| − |U_τ|} S(τ) ≤ n^{O(1)·C_V·ε·|E(τ)|} · n^{−C_p ε |E(τ)|} ≤ 1

We can now prove Lemma 11.1.
Lemma 11.1.
For all U ∈ I_mid and τ ∈ M_U,

[ (1/(|Aut(U)|c(τ))) H_Id_U      B_norm(τ) H_τ      ]
[ B_norm(τ) H_τ^T      (1/(|Aut(U)|c(τ))) H_Id_U ] ≽ 0

Proof. We have

[ (1/(|Aut(U)|c(τ))) H_Id_U      B_norm(τ) H_τ      ]
[ B_norm(τ) H_τ^T      (1/(|Aut(U)|c(τ))) H_Id_U ]
= (1/(|Aut(U)|c(τ)) − S(τ)B_norm(τ)/|Aut(U)|) [ H_Id_U  0 ; 0  H_Id_U ]
+ B_norm(τ) [ (S(τ)/|Aut(U)|) H_Id_U  H_τ ; H_τ^T  (S(τ)/|Aut(U)|) H_Id_U ]

By Lemma 4.9, [ (S(τ)/|Aut(U)|) H_Id_U  H_τ ; H_τ^T  (S(τ)/|Aut(U)|) H_Id_U ] ≽ 0, so the second term above is positive semidefinite. For the first term, by Lemma 4.7, H_Id_U ≽ 0 and by Corollary 11.6, 1/(|Aut(U)|c(τ)) − S(τ)B_norm(τ)/|Aut(U)| ≥ 0, which proves that the first term is also positive semidefinite.

Lemma 11.7.
Suppose k ≤ n − ε . For all U , V ∈ I mid where w ( U ) > w ( V ) and for all γ ∈ Γ U , V , n w ( V ( γ ) \ U γ ) S ( γ ) ≤ n B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + | E ( γ ) | ) for some constant B that depends only on C p . In particular, it is independent of C V .Proof. Since γ is a left shape, we have | U γ | ≥ | V γ | as V γ is the unique minimum vertex separator of γ and so, n w ( V ( γ ) \ U γ ) = n | V ( γ ) |−| U γ | ≤ n | V ( γ ) |− | U γ | + | V γ | . Also, note that | V ( γ ) | − | U γ | − | V γ | = | U γ \ V γ | + | V γ \ U γ | + | V ( γ ) \ U γ \ V γ | ≥ | V ( γ ) \ ( U γ ∩ V γ ) | . Therefore, n w ( V ( γ ) \ U γ ) S ( γ ) = n | V ( γ ) \ U γ ) | (cid:18) kn (cid:19) | V ( γ ) |−| U γ |−| V γ | ( ( + n C p ε ) − ) | E ( γ ) | ≤ n | V ( γ ) |− | U γ | + | V γ | (cid:18) n + ε (cid:19) | V ( γ ) |−| U γ |−| V γ | (cid:18) n C p ε (cid:19) | E ( γ ) | ≤ (cid:18) n ε (cid:19) | V ( γ ) |−| U γ |−| V γ | (cid:18) n C p ε (cid:19) | E ( γ ) | ≤ n B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) for a constant B that depends only on C p .We can now prove Lemma 11.2. Lemma 11.2.
For all U , V ∈ I mid where w ( U ) > w ( V ) and all γ ∈ Γ U , V , c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) H ′ γ roof. By Lemma 4.11, we have c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) c ( γ ) N ( γ ) B ( γ ) S ( γ ) | Aut ( U ) || Aut ( V ) | H ′ γ Using the same proof as in Lemma 4.7, we can see that H ′ γ (cid:23) . Therefore, it suffices to prove that c ( γ ) N ( γ ) B ( γ ) S ( γ ) | Aut ( U ) || Aut ( V ) | ≤ Since U , V ∈ I mid , | Aut ( U ) | = | U | !, | Aut ( V ) | = | V | ! . Therefore, | Aut ( U ) || Aut ( V ) | = | U | ! | V | ! ≤ D | U γ \ V γ | V .Also, q = n O ( ) · ε C V . Let B be the constant from Lemma 11.7. We can set C V sufficiently small so that,using Lemma 11.7, c ( γ ) N ( γ ) B ( γ ) S ( γ ) | Aut ( U ) || Aut ( V ) | ≤ ( D V ) | U γ \ V γ | + | V γ \ U γ | + | E ( α ) | | V ( γ ) \ ( U γ ∪ V γ ) | · ( D V ) | V ( γ ) \ V γ | + | V ( γ ) \ U γ | ( D V p eq ) | V ( γ ) \ U γ | + | V ( γ ) \ V γ | · n w ( V ( γ ) \ U γ ) S ( γ ) · D | U γ \ V γ | V ≤ n O ( ) · ε C V · ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) · n w ( V ( γ ) \ U γ ) S ( γ ) ≤ n O ( ) · ε C V · ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) · n B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) ≤ In this section, we will prove Lemma 11.3.
Lemma 11.3.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym We use the strategy from Section 10. We will prove the following lemmas.
Lemma 11.8.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) n K D sos Id sym for a constant K > . Lemma 11.9. ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ n K D sos D V for a constant K > .
If we assume these, we can conclude the following.
Lemma 11.3.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym Proof.
Let k M α k ≤ B norm ( α ) for all α ∈ M ′ . By Lemma 11.8, ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) n K D sos Id sym for a constant K > . By Lemma 11.9, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ n K D sos D V for a constant K > .We choose C sos sufficiently small so that n K D sos ≥ n K Dsos DV which can be satisfied by setting C sos < K C V for a sufficiently small constant K > . Then, since Id Sym (cid:23) , using Lemma 11.8 and Lemma 11.9, ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) n K D sos Id sym (cid:23) n K D sos D V Id sym (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym The rest of the section is devoted to proving Lemma 11.8 and Lemma 11.9.In the proofs of both these lemmas, we will need a bound on B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) that isobtained below. Lemma 11.10.
Suppose k ≤ n − ε . For all U ∈ I mid and σ , σ ′ ∈ L U , B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ≤ n ε | V ( α ) | roof. Let α = σ ◦ σ ′ . Observe that | V ( σ ) | + | V ( σ ′ ) | = | V ( α ) | + | U | . By choosing C V sufficiently small, B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) = ( D V p eq ) | V ( σ ) \ U σ | + | V ( σ ) \ V σ | n w ( V ( σ )) − w ( U ) · ( D V p eq ) | V ( σ ′ ) \ U σ ′ | + | V ( σ ′ ) \ V σ ′ | n w ( V ( σ ′ )) − w ( U ) · | Aut ( U ) | (cid:18) kn (cid:19) | V ( α ) | ( ( + n C p ε ) − ) | E ( α ) | ≤ n O ( ) · ε C V ·| V ( α ) | √ n | V ( σ ) |−| U | √ n | V ( σ ′ ) |−| U | (cid:18) kn (cid:19) | V ( α ) | n C p ε | E ( α ) | ≤ n O ( ) · ε C V ·| V ( α ) | √ n | V ( α ) |−| U | (cid:18) n + ε (cid:19) | V ( α ) | n C p ε | E ( α ) | ≤ n O ( ) · ε C V ·| V ( α ) | · n ε | V ( α ) | · n C p ε | E ( α ) | ≤ n ε | V ( α ) | To prove Lemma 11.8, we will use the strategy from Section 10.1. We will also use the notation from thatsection. We recall that for U ∈ I mid , L ′ U ⊂ L U was the set of non-trivial shapes in L U . Proposition 11.11.
For $V \in \mathcal{I}_{mid}$, $\lambda_V = \left(\frac{k}{n}\right)^{|V|}$. Proof. We have $\lambda_V = |Aut(V)|\,H_{Id_V}(Id_V, Id_V) = \left(\frac{k}{n}\right)^{|V|}$. Corollary 11.12. $\lambda_V \ge \frac{1}{n^{O(1)D_{sos}}}$
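Corollary 11.12 follows in one line; a sketch, using $k \ge 1$ and $|V| \le D_{sos}$ for $V \in \mathcal{I}_{mid}$ (both from the surrounding setup):

```latex
\lambda_V = \left(\frac{k}{n}\right)^{|V|}
  \ \ge\ \left(\frac{1}{n}\right)^{|V|}
  \ \ge\ \frac{1}{n^{D_{sos}}}
  \ \ge\ \frac{1}{n^{O(1)D_{sos}}}.
```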
Lemma 11.13.
For any edge e = ( V , U ) in G , we have w ( e ) ≤ n O ( ) D sos n ε | U | Proof.
Let e = ( V , U ) be an edge in G . Then, w ( U ) > w ( V ) and w ( e ) = W ( U , V ) λ V . Using Lemma 11.10,98e have W ( U , V ) = | Aut ( U ) | ∑ σ ∈L V , U σ = U ∑ σ ′ ∈L V , U σ ′ = V B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ≤ | Aut ( U ) | ∑ σ ∈L V , U σ = U ∑ σ ′ ∈L V , U σ ′ = V n ε | V ( σ ◦ σ ′ ) | ≤ ∑ σ , σ ′ ∈L ′ V n ε | V ( σ ◦ σ ′ ) | ≤ ∑ σ , σ ′ ∈L ′ V D D sos sos n ε | V ( σ ◦ σ ′ ) | D D sos sos n F ε | V ( σ ◦ σ ′ ) | ≤ D D sos sos n ε | U | ∑ σ , σ ′ ∈L ′ V D D sos sos n F ε | V ( σ ◦ σ ′ ) | ≤ D D sos sos n ε | U | ≤ λ V n O ( ) D sos n ε | U | where we set C V small enough so that ≥ F and invoked Lemma 10.16. This proves the lemma. Corollary 11.14.
For any U , V ∈ I mid such that w ( U ) > w ( V ) , ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ≤ n O ( ) D sos Proof.
The total number of vertices in $G$ is at most $D_{sos}+1$ since each $U \in \mathcal{I}_{mid}$ has at most $D_{sos}$ vertices. Therefore, for any fixed integer $j \ge 1$, the number of paths from $V$ to $U$ of length $j$ is at most $(D_{sos}+1)^j$. Take any path $P$ from $V$ to $U$, and suppose it has length $j \ge 1$. Note that for all edges $e = (V', U')$ in $E(P)$, since $|U'| \ge 1$, we have
$$w(e) \le \frac{n^{O(1)D_{sos}}}{n^{\varepsilon|U'|}} \le \frac{n^{O(1)D_{sos}}}{n^{\varepsilon}}$$
So, $\prod_{e\in E(P)} w(e) \le \left(\frac{n^{O(1)D_{sos}}}{n^{\varepsilon}}\right)^j$. Therefore, by setting $C_{sos}$ small enough,
$$\sum_{P:\ P\text{ is a path from }V\text{ to }U\text{ in }G}\ \prod_{e\in E(P)} w(e) \le \sum_{j=1}^{D_{sos}}(D_{sos}+1)^j\left(\frac{n^{O(1)D_{sos}}}{n^{\varepsilon}}\right)^j \le n^{O(1)D_{sos}}$$
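The path-counting step above admits a slightly sharper form; a sketch, using only that $G$ has at most $D_{sos}+1$ vertices:

```latex
\#\{\text{paths } V \to U \text{ of length } j\}
  \ \le\ (D_{sos}+1)^{\,j-1} \ \le\ (D_{sos}+1)^{j},
```

since each of the $j-1$ intermediate vertices of such a path is one of at most $D_{sos}+1$ vertices of $G$.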
Lemma 11.8.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) n K D sos Id sym or a constant K > .Proof. For all V ∈ I mid , we have Id Sym , V (cid:22) ∑ U ∈I mid : w ( U ) ≥ w ( V ) ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ! λ U M f act ( H Id U ) Summing this over all V ∈ I mid , we get Id Sym (cid:22) ∑ U ∈I mid λ U ∑ V ∈I mid : w ( U ) ≥ w ( V ) ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ! M f act ( H Id U ) (cid:22) ∑ U ∈I mid λ U ∑ V ∈I mid : w ( U ) ≥ w ( V ) n O ( ) D sos ! M f act ( H Id U ) For any fixed U ∈ I mid , the number of V ∈ I mid such that w ( U ) ≥ w ( V ) is at most D sos + . Also, λ U ≥ d O ( ) Dsos for all U ∈ I mid . Therefore, Id Sym (cid:22) ∑ U ∈I mid λ U ( D sos + ) n O ( ) D sos M f act ( H Id U ) (cid:22) ∑ U ∈I mid n O ( ) D sos M f act ( H Id U ) where we used the fact that for all U ∈ I mid , M f act ( H Id U ) (cid:23) . We restate the lemma for convenience.
Lemma 11.9. ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ n K D sos D V for a constant K > .Proof. We have ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) = ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ | Aut ( U ) | c ( γ ) ∑ σ , σ ′∈L U γ : | V ( σ ) |≤ DV , | V ( σ ′ ) |≤ DV , | V ( σ ◦ γ ) | > DV or | V ( σ ′◦ γ ) | > DV B norm ( σ ) B norm ( σ ′ ) H Id U γ ( σ , σ ′ ) σ , σ ′ that could appear in the above sum must necessarily be non-trivial and hence, σ , σ ′ ∈ L ′ U .Then, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ )= ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ∑ γ ∈ Γ U , ∗ : | V ( σ ◦ γ ) | > D V or | V ( σ ′ ◦ γ ) | > D V | Aut ( U ) | c ( γ ) For σ ∈ L ′ U , define m σ = D V + − | V ( σ ) | ≥ . This is precisely set so that for all γ ∈ Γ U , ∗ , we have | V ( σ ◦ γ ) | > D V if and only if | V ( γ ) | ≥ | U | + m σ . So, for σ , σ ′ ∈ L ′ U , using Lemma 9.31 ∑ γ ∈ Γ U , ∗ : | V ( σ ◦ γ ) | > D V or | V ( σ ′ ◦ γ ) | > D V | Aut ( U ) | c ( γ ) = ∑ γ ∈ Γ U , ∗ : | V ( γ ) |≥| U | + min ( m σ , m σ ′ ) | Aut ( U ) | c ( γ ) ≤ min ( m σ , m σ ′ ) − Also, for σ , σ ′ ∈ L ′ U , we have | V ( σ ◦ σ ′ ) | + min ( m σ , m σ ′ ) − ≥ D V . Therefore, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n O ( ) D sos n ε | V ( σ ◦ σ ′ ) | min ( m σ , m σ ′ ) − where we used Lemma 11.10. 
Using n ε | V ( σ ◦ σ ′ ) | ≥ n ε | V ( σ ◦ σ ′ ) | | V ( σ ◦ σ ′ ) | , ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n O ( ) D sos n ε | V ( σ ◦ σ ′ ) | | V ( σ ◦ σ ′ ) | min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n O ( ) D sos n ε | V ( σ ◦ σ ′ ) | D V ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n O ( ) D sos D D sos sos n ε | V ( σ ◦ σ ′ ) | D V The final step will be to argue that ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U D Dsossos n ε | V ( σ ◦ σ ′ ) | ≤ which will complete the proof. Butthis will follow from Lemma 10.16 if we set C V small enough.
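One step used above deserves spelling out: with $m_\sigma = D_V + 1 - |V(\sigma)|$ and $|V(\sigma\circ\sigma')| = |V(\sigma)| + |V(\sigma')| - |U|$ (both read from the surrounding definitions), we get:

```latex
|V(\sigma\circ\sigma')| + m_\sigma - 1
  = |V(\sigma)| + |V(\sigma')| - |U| + D_V - |V(\sigma)|
  = |V(\sigma')| - |U| + D_V
  \ \ge\ D_V,
```

since $|V(\sigma')| \ge |U|$; by symmetry the same holds with $m_{\sigma'}$, which gives $|V(\sigma\circ\sigma')| + \min(m_\sigma, m_{\sigma'}) - 1 \ge D_V$.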
12 Tensor PCA: Full verification
In this section, we will prove all the bounds required to prove Theorem 1.3.
Theorem 1.3.
Let $k \ge 2$ be an integer. There exist constants $C, C_\Delta > 0$ such that for all sufficiently small constants $\varepsilon > 0$, if $\lambda \le n^{\frac{k}{4}-\varepsilon}$ and $\Delta = n^{-C_\Delta\varepsilon}$, then with high probability, the candidate moment matrix $\Lambda$ given by pseudo-calibration for degree $n^{C\varepsilon}$ Sum-of-Squares is PSD. In particular, we will use Theorem 7.101 where we choose $\varepsilon$ in the theorem, not to be confused with the $\varepsilon$ in Theorem 1.3, to be an arbitrarily small constant. Lemma 12.1.
For all U ∈ I mid and τ ∈ M U , " | Aut ( U ) | c ( τ ) H Id U B norm ( τ ) H τ B norm ( τ ) H T τ | Aut ( U ) | c ( τ ) H Id U (cid:23) Lemma 12.2.
For all U , V ∈ I mid where w ( U ) > w ( V ) and all γ ∈ Γ U , V , c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) H ′ γ Lemma 12.3.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym Corollary 12.4.
With high probability, $\Lambda \succeq 0$. Proof. This follows by invoking Theorem 7.101, whose conditions follow from Lemma 5.10, Lemma 12.1, Lemma 12.2 and Lemma 12.3.
Lemma 12.5.
Suppose $\lambda \le n^{\frac{k}{4}-\varepsilon}$. For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, suppose $\deg_\tau(i)$ is even for all $i \in V(\tau)\setminus U_\tau\setminus V_\tau$; then
$$\sqrt{n}^{\,|V(\tau)|-|U_\tau|}\,S(\tau) \le n^{-\frac{\varepsilon}{2}\sum_{e\in E(\tau)} l_e}$$
Proof.
Firstly, we claim that $\sum_{e\in E(\tau)} k\,l_e \ge 2(|V(\tau)|-|U_\tau|)$. For any vertex $i \in V(\tau)\setminus U_\tau\setminus V_\tau$, $\deg_\tau(i)$ is even and is not $0$, hence $\deg_\tau(i) \ge 2$. Any vertex $i \in U_\tau\setminus V_\tau$ cannot have $\deg_\tau(i) = 0$, since otherwise $U_\tau\setminus\{i\}$ would be a vertex separator of strictly smaller weight than $U_\tau$, which is not possible; hence $\deg_\tau(i) \ge 1$. Therefore,
$$\sum_{e\in E(\tau)} k\,l_e = \sum_{i\in V(\tau)}\deg_\tau(i) \ge \sum_{i\in V(\tau)\setminus U_\tau\setminus V_\tau}\deg_\tau(i) + \sum_{i\in U_\tau\setminus V_\tau}\deg_\tau(i) + \sum_{i\in V_\tau\setminus U_\tau}\deg_\tau(i) \ge 2|V(\tau)\setminus U_\tau\setminus V_\tau| + |U_\tau\setminus V_\tau| + |V_\tau\setminus U_\tau| = 2(|V(\tau)|-|U_\tau|)$$
Plugging in $S(\tau)$, using $\lambda \le n^{\frac{k}{4}-\varepsilon}$, and choosing $C_\Delta$ sufficiently small, each edge contributes a factor at most $n^{-\frac{k}{4}l_e}\,n^{-\frac{\varepsilon}{2}l_e}$, so
$$\sqrt{n}^{\,|V(\tau)|-|U_\tau|}\,S(\tau) \le \Delta^{|V(\tau)|-|U_\tau|}\,\sqrt{n}^{\,|V(\tau)|-|U_\tau|-\frac{k}{2}\sum_{e\in E(\tau)} l_e}\,\prod_{e\in E(\tau)} n^{-\frac{\varepsilon}{2}l_e} \le n^{-\frac{\varepsilon}{2}\sum_{e\in E(\tau)} l_e}$$
where the middle factor is at most $1$ by the claim and $\Delta \le 1$. Corollary 12.6.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$, we have $c(\tau)B_{norm}(\tau)S(\tau) \le 1$. Proof.
Since τ is a proper middle shape, we have w ( I τ ) = and w ( S τ , min ) = w ( U τ ) . This implies n w ( V ( τ ))+ w ( I τ ) − w ( S τ , min ) = √ n | V ( τ ) |−| U τ | If deg τ ( i ) is odd for any vertex i ∈ V ( τ ) \ U τ \ V τ , then S ( τ ) = and the inequality is true. So, assume deg τ ( i ) is even for all i ∈ V ( τ ) \ U τ \ V τ . As was observed in the proof of Lemma 12.5, every vertex i ∈ V ( τ ) \ U τ or i ∈ V ( τ ) \ V τ has deg τ ( i ) ≥ and hence, | V ( τ ) \ U τ | + | V ( τ ) \ V τ | ≤ ∑ e ∈ E ( τ ) l e .Also, | E ( τ ) | ≤ ∑ e ∈ E ( τ ) l e and q = n O ( ) · ε ( C V + C E ) . We can set C V , C E sufficiently small so that, usingLemma 12.5, c ( τ ) B norm ( τ ) S ( τ ) = ( D V ) | U τ \ V τ | + | V τ \ U τ | + k | E ( τ ) | | V ( τ ) \ ( U τ ∪ V τ ) | · e ( qD V ) | V ( τ ) \ U τ | + | V ( τ ) \ V τ | ∏ e ∈ E ( τ ) ( D V D E q ) l e √ n | V ( τ ) |−| U τ | S ( τ ) ≤ n O ( ) · ε ( C V + C E ) · ∑ e ∈ E ( τ ) l e · √ n | V ( τ ) |−| U τ | S ( τ ) ≤ n O ( ) · ε ( C V + C E ) · ∑ e ∈ E ( τ ) l e · n ε ∑ e ∈ E ( τ ) l e ≤ We can now prove Lemma 12.1.
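The way the degree claim from Lemma 12.5 enters these bounds can be summarized in one implication; a sketch, with each edge read as contributing $n^{-\frac{k}{4}l_e}$ to the product (an assumption consistent with the displayed chains):

```latex
\sqrt{n}^{\,|V(\tau)|-|U_\tau|}\prod_{e\in E(\tau)} n^{-\frac{k}{4}l_e}
  = \sqrt{n}^{\,|V(\tau)|-|U_\tau|-\frac{k}{2}\sum_{e\in E(\tau)} l_e}
  \ \le\ 1
\quad\Longleftarrow\quad
\sum_{e\in E(\tau)} k\,l_e \ \ge\ 2\bigl(|V(\tau)|-|U_\tau|\bigr).
```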
Lemma 12.1.
For all $U \in \mathcal{I}_{mid}$ and $\tau \in \mathcal{M}_U$,
$$\begin{bmatrix} \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} & B_{norm}(\tau)H_\tau \\ B_{norm}(\tau)H_\tau^T & \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} \end{bmatrix} \succeq 0$$
Proof. We have
$$\begin{bmatrix} \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} & B_{norm}(\tau)H_\tau \\ B_{norm}(\tau)H_\tau^T & \frac{1}{|Aut(U)|c(\tau)}H_{Id_U} \end{bmatrix} = \left(\frac{1}{|Aut(U)|c(\tau)} - \frac{S(\tau)B_{norm}(\tau)}{|Aut(U)|}\right)\begin{bmatrix} H_{Id_U} & 0 \\ 0 & H_{Id_U} \end{bmatrix} + B_{norm}(\tau)\begin{bmatrix} \frac{S(\tau)}{|Aut(U)|}H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|Aut(U)|}H_{Id_U} \end{bmatrix}$$
By Lemma 5.12, $\begin{bmatrix} \frac{S(\tau)}{|Aut(U)|}H_{Id_U} & H_\tau \\ H_\tau^T & \frac{S(\tau)}{|Aut(U)|}H_{Id_U} \end{bmatrix} \succeq 0$, so the second term above is positive semidefinite. For the first term, by Lemma 5.10, $H_{Id_U} \succeq 0$ and by Corollary 12.6, $\frac{1}{|Aut(U)|c(\tau)} - \frac{S(\tau)B_{norm}(\tau)}{|Aut(U)|} \ge 0$, which proves that the first term is also positive semidefinite. Lemma 12.7.
Suppose λ ≤ n k − ε . For all U , V ∈ I mid where w ( U ) > w ( V ) and for all γ ∈ Γ U , V , n w ( V ( γ ) \ U γ ) S ( γ ) ≤ n B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) for some constant B that depends only on C ∆ . In particular, it is independent of C V and C E .Proof. Suppose there is a vertex i ∈ V ( γ ) \ U γ \ V γ such that deg γ ( i ) is odd, then S ( γ ) = and theinequality is true. So, assume deg γ ( i ) is even for all vertices i ∈ V ( γ ) \ U γ \ V γ .We first claim that k ∑ e ∈ E ( γ ) l e ≥ | V ( γ ) \ U γ | . Since γ is a left shape, all vertices i in V ( γ ) \ U γ have deg γ ( i ) ≥ . In particular, all vertices i ∈ V γ \ U γ have deg γ ( i ) ≥ . Moreover, if i ∈ V ( γ ) \ U γ \ V γ ,since deg γ ( i ) is even, we must have deg γ ( i ) ≥ .Let S ′ be the set of vertices i ∈ U γ \ V γ that have deg γ ( i ) ≥ . Then, note that | S ′ | + | U γ ∩ V γ | ≥| V γ | = ⇒ | S ′ | ≥ | V γ \ U γ | since otherwise S ′ ∪ ( U γ ∩ V γ ) will be a vertex separator of γ of weight strictlyless than V γ , which is not possible. Then, ∑ e ∈ E ( γ ) kl e = ∑ i ∈ V ( γ ) deg γ ( i ) ≥ ∑ i ∈ V ( γ ) \ U γ \ V γ deg γ ( i ) + ∑ i ∈ U γ \ V γ deg γ ( i ) + ∑ i ∈ V γ \ U γ deg γ ( i ) ≥ | V ( γ ) \ U γ \ V γ | + | S ′ | + | V γ \ U γ |≥ | V ( γ ) \ U γ \ V γ | + | V γ \ U γ | = | V ( γ ) \ U γ | Finally, note that | V ( γ ) | − | U γ | − | V γ | = | U γ \ V γ | + | V γ \ U γ | + | V ( γ ) \ U γ \ V γ | ≥ | V ( γ ) \ U γ ∩ V γ ) | . By choosing C ∆ sufficiently small, we have n w ( V ( γ ) \ U γ ) S ( γ ) = n | V ( γ ) \ U γ ) | ∆ | V ( γ ) |−| U γ |−| V γ | ∏ e ∈ E ( γ ) (cid:18) λ ( ∆ n ) k (cid:19) l e ≤ n | V ( γ ) \ U γ ) | ∆ | V ( γ ) |−| U γ |−| V γ | ∏ e ∈ E ( γ ) n − ( k + ε ) l e ≤ ∆ | V ( γ ) |−| U γ |−| V γ | ∏ e ∈ E ( γ ) n − ε l e ≤ n B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) for a constant B that depends only on C ∆ . Remark 12.8.
In the above bounds, note that there is a decay of n B ε for each vertex in V ( γ ) \ ( U γ ∩ V γ ) .One of the main technical reasons for introducing the slack parameter C ∆ in the planted distribution was tointroduce this decay, which is needed in the current machinery. We can now prove Lemma 12.2.
Lemma 12.2.
For all U , V ∈ I mid where w ( U ) > w ( V ) and all γ ∈ Γ U , V , c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) H ′ γ Proof.
By Lemma 5.14, we have c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) c ( γ ) N ( γ ) B ( γ ) S ( γ ) | Aut ( U ) || Aut ( V ) | H ′ γ Using the same proof as in Lemma 5.10, we can see that H ′ γ (cid:23) . Therefore, it suffices to prove that c ( γ ) N ( γ ) B ( γ ) S ( γ ) | Aut ( U ) || Aut ( V ) | ≤ Since U , V ∈ I mid , | Aut ( U ) | = | U | !, | Aut ( V ) | = | V | ! . Therefore, | Aut ( U ) || Aut ( V ) | = | U | ! | V | ! ≤ D | U γ \ V γ | V .Also, | E ( γ ) | ≤ ∑ e ∈ E ( γ ) l e and q = n O ( ) · ε ( C V + C E ) . Let B be the constant from Lemma 12.7. We can set C V , C E sufficiently small so that, using Lemma 12.7, c ( γ ) N ( γ ) B ( γ ) S ( γ ) | Aut ( U ) || Aut ( V ) | ≤ ( D V ) | U γ \ V γ | + | V γ \ U γ | + k | E ( α ) | | V ( γ ) \ ( U γ ∪ V γ ) | · ( D V ) | V ( γ ) \ V γ | + | V ( γ ) \ U γ | ( qD V ) | V ( γ ) \ U γ | + | V ( γ ) \ V γ | ∏ e ∈ E ( γ ) ( D V D E q ) l e · n w ( V ( γ ) \ U γ ) S ( γ ) · D | U γ \ V γ | V ≤ n O ( ) · ε ( C V + C E ) · ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) · n w ( V ( γ ) \ U γ ) S ( γ ) ≤ n O ( ) · ε ( C V + C E ) · ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) · n B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) ≤ In this section, we will prove Lemma 12.3 using the strategy sketched in Section 10.
Lemma 12.3.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym In particular, we prove the following lemmas.
Lemma 12.9.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∆ D sos n D sos Id sym Lemma 12.10. ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∆ D sos D V Assuming these, we can conclude the following.
Lemma 12.3.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym Proof.
We choose C sos sufficiently small so that ∆ D sos n Dsos ≥ ∆ Dsos DV which is satisfied by setting C sos < C V . Then, since Id Sym (cid:23) , using Lemma 12.9 and Lemma 12.10, ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∆ D sos n D sos Id sym (cid:23) ∆ D sos D V Id sym (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym The rest of the section is devoted to proving Lemma 12.9 and Lemma 12.10.In the proofs of both these lemmas, we will need a bound on B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) that isobtained below. Lemma 12.11.
Suppose λ = n k − ε . For all U ∈ I mid and σ , σ ′ ∈ L U , B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ≤ n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos n | U | roof. Suppose there is a vertex i ∈ V ( σ ) \ V σ such that deg σ ( i ) + deg U σ ( i ) is odd, then H Id U ( σ , σ ′ ) = and the inequality is true. So, assume that deg σ ( i ) + deg U σ ( i ) is even for all i ∈ V ( σ ) \ V σ . Simi-larly, assume that deg σ ′ ( i ) + deg U σ ′ ( i ) is even for all i ∈ V ( σ ′ ) \ V σ ′ . Also, if ρ σ = ρ σ ′ , we will have H Id U ( σ , σ ′ ) = and we’d be done. So, assume ρ σ = ρ σ ′ .Let α = σ ◦ σ ′ . We will first prove that ∑ e ∈ E ( α ) kl e + deg ( α ) ≥ | V ( α ) | + | U | . Firstly, note that allvertices i ∈ V ( α ) \ ( U α ∪ V α ) have deg α ( i ) to be even and nonzero, and hence at least . Moreover, in boththe sets U α \ ( U α ∩ V α ) and V α \ ( U α ∩ V α ) , there are at least | U | − | U α ∩ V α | vertices of degree at least ,because U is a minimum vertex separator. Also, note that deg ( α ) ≥ | U α | + | V α | . This implies that ∑ e ∈ E ( α ) kl e + deg ( α ) ≥ | V ( α ) \ ( U α ∪ V α ) | + ( | U | − | U α ∩ V α | ) + ( | U α | + | V α | )= ( | V ( α ) | − | U α ∪ V α | ) + ( | U | − | U α ∩ V α | ) + ( | U α ∪ V α | + | U α ∩ V α | )= | V ( α ) | + | U | where we used the fact that U α ∩ V α ⊆ U . Finally, by choosing C V , C E sufficiently small, B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) = e ( qD V ) | V ( σ ) \ U σ | + | V ( σ ) \ V σ | ∏ e ∈ E ( σ ) ( D V D E q ) l e n w ( V ( σ )) − w ( U ) · e ( qD V ) | V ( σ ′ ) \ U σ ′ | + | V ( σ ′ ) \ V σ ′ | ∏ e ∈ E ( σ ′ ) ( D V D E q ) l e n w ( V ( σ ′ )) − w ( U ) · | Aut ( U ) | ∆ | V ( α ) | (cid:18) √ ∆ n (cid:19) deg ( α ) ∏ e ∈ E ( α ) λ ( ∆ n ) k ! 
l e ≤ n O ( ) · ε ( C V + C E ) · ( | V ( α ) | + ∑ e ∈ E ( α ) l e ) ∆ | V ( α ) | (cid:18) √ ∆ (cid:19) deg ( α ) · √ n | V ( α ) |−| U | (cid:18) √ n (cid:19) deg ( α ) ∏ e ∈ E ( α ) n ( − k − ε ) l e ≤ n O ( ) · ε ( C V + C E ) · ( | V ( α ) | + ∑ e ∈ E ( α ) l e ) n ε C ∆ | V ( α ) | n ε ∑ e ∈ E ( α ) l e · ∆ D sos n | U | √ n | V ( α ) | + | U |− deg ( α ) − ∑ e ∈ E ( α ) kl e ≤ n ε C ∆ | V ( α ) | ∆ D sos n | U | where we used the facts ∆ ≤ deg ( α ) ≤ D sos . To prove Lemma 12.9, we will use the strategy from Section 10.1. We will also use the notation from thatsection. We recall that for U ∈ I mid , L ′ U ⊂ L U was the set of non-trivial shapes in L U . Proposition 12.12.
For $V \in \mathcal{I}_{mid}$, $\lambda_V = \frac{1}{n^{|V|}}$. Proof. We have $\lambda_V = |Aut(V)|\,H_{Id_V}(Id_V, Id_V) = \Delta^{|V|}\left(\frac{1}{\sqrt{\Delta n}}\right)^{2|V|} = \frac{1}{n^{|V|}}$. Lemma 12.13. For any edge $e = (V, U)$ in $G$, we have $w(e) \le \frac{\Delta^{D_{sos}}}{n^{C_\Delta\varepsilon|U|}}$. Proof.
Let e = ( V , U ) be an edge in G . Then, w ( U ) > w ( V ) and w ( e ) = W ( U , V ) λ V . Using Lemma 12.11,we have W ( U , V ) = | Aut ( U ) | ∑ σ ∈L V , U σ = U ∑ σ ′ ∈L V , U σ ′ = V B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ≤ | Aut ( U ) | ∑ σ ∈L V , U σ = U ∑ σ ′ ∈L V , U σ ′ = V n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos n | V | ≤ ∑ σ , σ ′ ∈L ′ V n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos n | V | ≤ ∑ σ , σ ′ ∈L ′ V n C ∆ ε | V ( σ ◦ σ ′ ) | D D sos sos n F ε | V ( σ ◦ σ ′ ) | ∆ D sos n | V | ≤ n C ∆ ε | U | ∆ D sos n | V | ∑ σ , σ ′ ∈L ′ V D D sos sos n F ε | V ( σ ◦ σ ′ ) | ≤ n C ∆ ε | U | ∆ D sos n | V | = λ V n ε C ∆ | U | ∆ D sos where we set C V , C E small enough so that C ∆ ≥ F and invoked Lemma 10.16. This proves the lemma. Corollary 12.14.
For any U , V ∈ I mid such that w ( U ) > w ( V ) , ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ≤ D sos Proof.
The total number of vertices in G is at most D sos + since each U ∈ I mid has at most D sos vertices.Therefore, for any fixed integer j ≥ , the number of paths from V to U of length j is at most ( D sos + ) j .Take any path P from V to U . Suppose it has length j ≥ . Note that for all edges e = ( V ′ , U ′ ) in E ( P ) ,since | U ′ | ≥ , we have w ( e ) ≤ n C ∆ ε | U ′ | ∆ D sos ≤ n C ∆ ε ∆ D sos So, ∏ e ∈ E ( P ) w ( e ) ≤ (cid:16) n C ∆ ε ∆ Dsos (cid:17) j . Therefore, by setting C sos small enough, ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ≤ D sos ∑ j = ( D sos + ) j (cid:18) n C ∆ ε ∆ D sos (cid:19) j ≤ D sos ∆ D sos Lemma 12.9.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∆ D sos n D sos Id sym Proof.
For all V ∈ I mid , we have Id Sym , V (cid:22) ∑ U ∈I mid : w ( U ) ≥ w ( V ) ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ! λ U M f act ( H Id U ) Summing this over all V ∈ I mid , we get Id Sym (cid:22) ∑ U ∈I mid λ U ∑ V ∈I mid : w ( U ) ≥ w ( V ) ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ! M f act ( H Id U ) (cid:22) ∑ U ∈I mid λ U ∑ V ∈I mid : w ( U ) ≥ w ( V ) D sos ∆ D sos ! M f act ( H Id U ) For any fixed U ∈ I mid , the number of V ∈ I mid such that w ( U ) ≥ w ( V ) is at most D sos . Therefore, Id Sym (cid:22) ∑ U ∈I mid λ U ∆ D sos M f act ( H Id U )= ∑ U ∈I mid ∆ D sos n | U | M f act ( H Id U ) (cid:22) ∑ U ∈I mid ∆ D sos n D sos M f act ( H Id U ) where we used the fact that for all U ∈ I mid , we have | U | ≤ D sos and M f act ( H Id U ) (cid:23) . We restate the lemma for convenience.
Lemma 12.10. ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∆ D sos D V Proof.
We have ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) = ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ | Aut ( U ) | c ( γ ) ∑ σ , σ ′∈L U γ : | V ( σ ) |≤ DV , | V ( σ ′ ) |≤ DV , | V ( σ ◦ γ ) | > DV or | V ( σ ′◦ γ ) | > DV B norm ( σ ) B norm ( σ ′ ) H Id U γ ( σ , σ ′ ) σ , σ ′ that could appear in the above sum must necessarily be non-trivial and hence, σ , σ ′ ∈ L ′ U .Then, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ )= ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ∑ γ ∈ Γ U , ∗ : | V ( σ ◦ γ ) | > D V or | V ( σ ′ ◦ γ ) | > D V | Aut ( U ) | c ( γ ) For σ ∈ L ′ U , define m σ = D V + − | V ( σ ) | ≥ . This is precisely set so that for all γ ∈ Γ U , ∗ , we have | V ( σ ◦ γ ) | > D V if and only if | V ( γ ) | ≥ | U | + m σ . So, for σ , σ ′ ∈ L ′ U , using Lemma 9.31 ∑ γ ∈ Γ U , ∗ : | V ( σ ◦ γ ) | > D V or | V ( σ ′ ◦ γ ) | > D V | Aut ( U ) | c ( γ ) = ∑ γ ∈ Γ U , ∗ : | V ( γ ) |≥| U | + min ( m σ , m σ ′ ) | Aut ( U ) | c ( γ ) ≤ min ( m σ , m σ ′ ) − Also, for σ , σ ′ ∈ L ′ U , we have | V ( σ ◦ σ ′ ) | + min ( m σ , m σ ′ ) − ≥ D V . Therefore, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos n | U | min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos min ( m σ , m σ ′ ) − where we used Lemma 12.11. Using n C ∆ | V ( σ ◦ σ ′ ) | ≥ n ε C ∆ | V ( σ ◦ σ ′ ) | | V ( σ ◦ σ ′ ) | , ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos | V ( σ ◦ σ ′ ) | min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos D V ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U D D sos sos n ε C ∆ | V ( σ ◦ σ ′ ) | ∆ D sos D V where we set C sos small enough so that D sos = n ε C sos ≤ n c ε C ∆ = ∆ . 
The final step will be to arguethat ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U D Dsossos n C ∆ ε | V ( σ ◦ σ ′ ) | ≤ which will complete the proof. But this will follow fromLemma 10.16 if we set C V , C E small enough.
13 Sparse PCA: Full verification
In this section, we will prove all the bounds required to prove Theorem 1.5. Theorem 1.5.
There exists a constant C > such that for all sufficiently small constants ε > , if m ≤ d − ε λ , m ≤ k − ε λ , and there exists a constant A such that < A < , d A ≤ k ≤ d − A ε , and √ λ √ k ≤ d − A ε ,then with high probability, the candidate moment matrix Λ given by pseudo-calibration for degree d C ε Sum-of-Squares is PSD.
In particular, we will use Theorem 7.101 where we choose ε in the theorem, not to be confused with the ε in Theorem 1.5, to be an arbitrarily small constant. Definition 13.1.
Define n = max ( d , m ) . Remark 13.2.
The above definition conforms with the notation used in Theorem 7.101. So, we can use thebounds as stated there.
Lemma 13.3.
For all U ∈ I mid and τ ∈ M U , " | Aut ( U ) | c ( τ ) H Id U B norm ( τ ) H τ B norm ( τ ) H T τ | Aut ( U ) | c ( τ ) H Id U (cid:23) Lemma 13.4.
For all U , V ∈ I mid where w ( U ) > w ( V ) and all γ ∈ Γ U , V , c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) H ′ γ Lemma 13.5.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym Corollary 13.6.
With high probability, Λ (cid:23) .Proof. This follows by invoking Theorem 7.101 whose conditions follow from Lemma 6.11, Lemma 13.3,Lemma 13.4, and Lemma 13.5.
Lemma 13.7.
Suppose < A < is a constant such that √ λ √ k ≤ d − A ε and √ k ≤ d − A . For all m such that m ≤ d − ε λ , m ≤ k − ε λ , for all U ∈ I mid and τ ∈ M U , suppose deg τ ( i ) is even for all i ∈ V ( τ ) \ U τ \ V τ ,then √ d | τ | −| U τ | √ m | τ | −| U τ | S ( τ ) ≤ ∏ j ∈ V ( τ ) \ U τ \ V τ ( deg τ ( j ) − ) !! · d A ε ∑ e ∈ E ( τ ) l e Proof.
Let $r_1 = |V_1(\tau)| - |(U_\tau)_1|$ and $r_2 = |V_2(\tau)| - |(U_\tau)_2|$. Since $\Delta \le 1$, it suffices to prove
$$E := \sqrt{d}^{\,r_1}\sqrt{m}^{\,r_2}\left(\frac{k}{d}\right)^{r_1}\left(\frac{\sqrt{\lambda}}{\sqrt{k}}\right)^{\sum_{e\in E(\tau)} l_e} \le d^{-A\varepsilon\sum_{e\in E(\tau)} l_e}$$
We will need the following claim. Claim 13.8. $\sum_{e\in E(\tau)} l_e \ge 2\max(r_1, r_2)$. Proof. We will first prove $\sum_{e\in E(\tau)} l_e \ge 2r_1$. For any vertex $i \in V_1(\tau)\setminus U_\tau\setminus V_\tau$, $\deg_\tau(i)$ is even and is not $0$, hence $\deg_\tau(i) \ge 2$. Any vertex $i \in U_\tau\setminus V_\tau$ cannot have $\deg_\tau(i) = 0$, since otherwise $U_\tau\setminus\{i\}$ would be a vertex separator of strictly smaller weight than $U_\tau$, which is not possible; hence $\deg_\tau(i) \ge 1$. Similarly, for $i \in V_\tau\setminus U_\tau$, $\deg_\tau(i) \ge 1$. Also, since $H_\tau$ is bipartite, we have $\sum_{i\in V_1(\tau)}\deg_\tau(i) = \sum_{j\in V_2(\tau)}\deg_\tau(j) = \sum_{e\in E(\tau)} l_e$. Consider
$$\sum_{e\in E(\tau)} l_e = \sum_{i\in V_1(\tau)}\deg_\tau(i) \ge \sum_{i\in V_1(\tau)\setminus U_\tau\setminus V_\tau}\deg_\tau(i) + \sum_{i\in (U_\tau)_1\setminus V_\tau}\deg_\tau(i) + \sum_{i\in (V_\tau)_1\setminus U_\tau}\deg_\tau(i) \ge 2|V_1(\tau)\setminus U_\tau\setminus V_\tau| + |(U_\tau)_1\setminus V_\tau| + |(V_\tau)_1\setminus U_\tau| = 2r_1$$
For integers r , r ≥ , if m ≤ d λ and m ≤ k λ , then, √ d r √ m r (cid:18) kd (cid:19) r √ λ √ k ! ( r , r ) ≤ Proof.
We will consider the cases $r_1 \ge r_2$ and $r_1 < r_2$ separately. If $r_1 \ge r_2$, using $\sqrt{m} \le \sqrt{d}/\lambda$ and $\lambda/\sqrt{d} \le 1/\sqrt{m}$, we have
$$\sqrt{d}^{\,r_1}\sqrt{m}^{\,r_2}\left(\frac{k}{d}\right)^{r_1}\left(\frac{\sqrt{\lambda}}{\sqrt{k}}\right)^{2r_1} \le \sqrt{d}^{\,r_1}\left(\frac{\sqrt{d}}{\lambda}\right)^{r_2}\left(\frac{k}{d}\right)^{r_1}\left(\frac{\lambda}{k}\right)^{r_1} = \left(\frac{\lambda}{\sqrt{d}}\right)^{r_1-r_2} \le \left(\frac{1}{\sqrt{m}}\right)^{r_1-r_2} \le 1$$
And if $r_1 < r_2$, using $\sqrt{m} \le k/\lambda$ as well, we have
$$\sqrt{d}^{\,r_1}\sqrt{m}^{\,r_2}\left(\frac{k}{d}\right)^{r_1}\left(\frac{\sqrt{\lambda}}{\sqrt{k}}\right)^{2r_2} = \sqrt{d}^{\,r_1}\sqrt{m}^{\,r_2-r_1}\sqrt{m}^{\,r_1}\left(\frac{k}{d}\right)^{r_1}\left(\frac{\lambda}{k}\right)^{r_2} \le \sqrt{d}^{\,r_1}\left(\frac{k}{\lambda}\right)^{r_2-r_1}\left(\frac{\sqrt{d}}{\lambda}\right)^{r_1}\left(\frac{k}{d}\right)^{r_1}\left(\frac{\lambda}{k}\right)^{r_2} = \left(\frac{k}{\lambda}\right)^{r_2}\left(\frac{\lambda}{k}\right)^{r_2} = 1$$
Claim 13.10.
For integers r , r ≥ and an integer r ≥ ( r , r ) , if m ≤ d − ε λ and m ≤ k − ε λ , then, √ d r √ m r (cid:18) kd (cid:19) r √ λ √ k ! r ≤ (cid:18) d A ε (cid:19) r Proof. If r ≥ r , E = √ d r √ m r (cid:18) kd (cid:19) r √ λ √ k ! r √ λ √ k ! r − r ≤ √ d r √ d − ε λ ! r (cid:18) kd (cid:19) r √ λ √ k ! r √ λ √ k ! r − r = λ √ d − ε ! r − r (cid:18) √ d (cid:19) ε r √ λ √ k ! r − r ≤ (cid:18) √ m (cid:19) r − r (cid:18) √ d (cid:19) ε r (cid:18) d A ε (cid:19) r − r ≤ (cid:18) d A (cid:19) ε r (cid:18) d A ε (cid:19) r − r = (cid:18) d A ε (cid:19) r And if r < r , E = √ d r √ m r − r √ m r (cid:18) kd (cid:19) r √ λ √ k ! r √ λ √ k ! r − r ≤ √ d r √ k − ε λ ! r − r √ d − ε λ ! r (cid:18) kd (cid:19) r √ λ √ k ! r √ λ √ k ! r − r = √ k √ d ! ε r (cid:18) √ k (cid:19) ε r √ λ √ k ! r − r ≤ (cid:18) √ k (cid:19) ε r √ λ √ k ! r − r ≤ (cid:18) d A (cid:19) ε r (cid:18) d A ε (cid:19) r − r ≤ (cid:18) d A ε (cid:19) ∑ e ∈ E ( τ ) l e r = ∑ e ∈ E ( τ ) l e in the above claim. Corollary 13.11.
For all U ∈ I mid and τ ∈ M U , we have c ( τ ) B norm ( τ ) S ( τ ) R ( τ ) ≤ Proof.
First, note that if deg τ ( i ) is odd for any vertex i ∈ V ( τ ) \ U τ \ V τ , then S ( τ ) = and the inequalityis true. So, assume that deg τ ( i ) is even for all i ∈ V ( τ ) \ U τ \ V τ .Since τ is a proper middle shape, we have w ( I τ ) = and w ( S τ , min ) = w ( U τ ) . This implies n w ( V ( τ ))+ w ( I τ ) − w ( S τ , min ) = √ d | τ | −| U τ | √ m | τ | −| U τ | As was observed in the proof of Lemma 13.7, every vertex i ∈ V ( τ ) \ U τ or i ∈ V ( τ ) \ V τ has deg τ ( i ) ≥ and hence, | V ( τ ) \ U τ | + | V ( τ ) \ V τ | ≤ ∑ e ∈ E ( τ ) l e . Also, q = d O ( ) · ε ( C V + C E ) . We can set C V , C E sufficiently small so that c ( τ ) B norm ( τ ) S ( τ ) R ( τ ) = ( D V ) | U τ \ V τ | + | V τ \ U τ | + | E ( τ ) | | V ( τ ) \ ( U τ ∪ V τ ) | · e ( qD V ) | V ( τ ) \ U τ | + | V ( τ ) \ V τ | ∏ e ∈ E ( τ ) ( D V D E q ) l e · √ d | τ | −| U τ | √ m | τ | −| U τ | S ( τ )( C disc p D E ) ∑ j ∈ ( U τ ) ∪ ( V τ ) deg τ ( j ) ≤ d O ( ) · ( C V + C E ) · ε ∑ e ∈ E ( τ ) l e · ∏ j ∈ V ( τ ) \ V ( U τ ) \ V ( V τ ) ( deg τ ( j ) − ) !! · d A ε ∑ e ∈ E ( τ ) l e ≤ d O ( ) · ( C V + C E ) · ε ∑ e ∈ E ( τ ) l e · ( D V D E ) ∑ e ∈ E ( τ ) l e · d A ε ∑ e ∈ E ( τ ) l e ≤ We can now prove Lemma 13.3.
Lemma 13.3.
For all U ∈ I mid and τ ∈ M U , " | Aut ( U ) | c ( τ ) H Id U B norm ( τ ) H τ B norm ( τ ) H T τ | Aut ( U ) | c ( τ ) H Id U (cid:23) Proof.
We have " | Aut ( U ) | c ( τ ) H Id U B norm ( τ ) H τ B norm ( τ ) H T τ | Aut ( U ) | c ( τ ) H Id U = (cid:16) | Aut ( U ) | c ( τ ) − S ( τ ) R ( τ ) B norm ( τ ) | Aut ( U ) | (cid:17) H Id U (cid:16) | Aut ( U ) | c ( τ ) − S ( τ ) R ( τ ) B norm ( τ ) | Aut ( U ) | (cid:17) H Id U + B norm ( τ ) " S ( τ ) R ( τ ) | Aut ( U ) | H Id U H τ H T τ S ( τ ) R ( τ ) | Aut ( U ) | H Id U " S ( τ ) R ( τ ) | Aut ( U ) | H Id U H τ H T τ S ( τ ) R ( τ ) | Aut ( U ) | H Id U (cid:23) , so the second term above is positive semidefinite.For the first term, by Lemma 6.11, H Id U (cid:23) and by Corollary 13.11, | Aut ( U ) | c ( τ ) − S ( τ ) R ( τ ) B norm ( τ ) | Aut ( U ) | ≥ ,which proves that the first term is also positive semidefinite. Lemma 13.12.
Suppose < A < is a constant such that √ λ √ k ≤ d − A ε , √ k ≤ d − A and kd ≤ d − A ε . Forall m such that m ≤ d − ε λ , m ≤ k − ε λ , for all U , V ∈ I mid where w ( U ) > w ( V ) and for all γ ∈ Γ U , V , n w ( V ( γ ) \ U γ ) S ( γ ) ≤ ∏ j ∈ V ( γ ) \ U γ \ V γ ( deg γ ( j ) − ) !! d B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) for some constant B > that depends only on C ∆ . In particular, it is independent of C V and C E .Proof. Suppose there is a vertex i ∈ V ( γ ) \ U γ \ V γ such that deg γ ( i ) is odd, then S ( γ ) = and theinequality is true. So, assume deg γ ( i ) is even for all vertices i ∈ V ( γ ) \ U γ \ V γ . We have n w ( V ( γ ) \ U γ ) = d | γ | −| U γ | m | γ | −| U γ | . Plugging in S ( γ ) , we get that we have to prove E : = d | γ | −| U γ | m | γ | −| U γ | (cid:18) kd (cid:19) | γ | −| U γ | −| V γ | ∆ | γ | −| U γ | −| V γ | ∏ e ∈ E ( γ ) λ l e k l e ≤ d B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) Let S ′ be the set of vertices i ∈ U γ \ V γ that have deg γ ( i ) ≥ . Let e , f be the number of type verticesand the number of type vertices in S ′ respectively. Observe that S ′ ∪ ( U γ ∩ V γ ) is a vertex separator of γ .Let g = | V γ \ U γ | (resp. h = | V γ \ U γ | ) be the number of type vertices (resp. type vertices) in V γ \ U γ .We first claim that d e m f ≥ d g m h . To see this, note that the vertex separator S ′ ∪ ( U γ ∩ V γ ) hasweight √ d e + | U γ ∩ V γ | √ m f + | U γ ∩ V γ | . On the other hand, V γ has weight √ d g + | U γ ∩ V γ | √ m h + | U γ ∩ V γ | . Since γ is a left shape, V γ is the unique minimum vertex separator and hence, √ d e + | U γ ∩ V γ | √ m f + | U γ ∩ V γ | ≥√ d g + | U γ ∩ V γ | √ m h + | U γ ∩ V γ | which implies d e m f ≥ d g m h .Let p = | V ( γ ) \ ( U γ ∪ V γ ) | (resp. q = | V ( γ ) \ ( U γ ∪ V γ ) | ) be the number of type vertices (resp.type vertices) in V ( γ ) \ ( U γ ∪ V γ ) .To illustrate the main idea, we will first prove the weaker inequality E ≤ . 
Since ∆ ≤ , it suffices toprove d | γ | −| U γ | m | γ | −| U γ | (cid:18) kd (cid:19) | γ | −| U γ | −| V γ | ∏ e ∈ E ( γ ) λ l e k l e ≤ We have d | γ | −| U γ | m | γ | −| U γ | = d p + g m q + h ≤ n p + e + g m q + f + h d e m f ≥ d g m h . Also, | γ | − | U γ | − | V γ | = p + e + g . So, it suffices to prove n p + e + g m q + f + h (cid:18) kd (cid:19) p + e + g ∏ e ∈ E ( γ ) (cid:18) λ k (cid:19) l e ≤ We will need the following claim.
Claim 13.13. ∑ e ∈ E ( γ ) l e ≥ max ( 2p + e + g , 2q + f + h )
Since $H_\gamma$ is bipartite, we have $\sum_{e \in E(\gamma)} l_e = \sum_{i \in V_1(\gamma)} \deg_\gamma(i) = \sum_{i \in V_2(\gamma)} \deg_\gamma(i)$. Observe that all vertices $i \in V(\gamma) \setminus U_\gamma \setminus V_\gamma$ have $\deg_\gamma(i)$ nonzero and even, and hence $\deg_\gamma(i) \geq 2$. Then, summing the degrees of the type 1 vertices, $\sum_{e \in E(\gamma)} l_e = \sum_{i \in V_1(\gamma)} \deg_\gamma(i) \geq 2p + e + g$, and similarly, summing the degrees of the type 2 vertices, $\sum_{e \in E(\gamma)} l_e \geq 2q + f + h$. Therefore, $\sum_{e \in E(\gamma)} l_e \geq \max(2p + e + g, 2q + f + h)$. Now, let $r_1 = 2p + e + g$, $r_2 = 2q + f + h$. Then $\sum_{e \in E(\gamma)} l_e \geq \max(r_1, r_2)$ and we wish to prove $d^{r_1} m^{r_2} (k/d)^{r_1} (\lambda/k)^{\max(r_1, r_2)} \leq 1$. This expression simply follows by squaring Claim 13.9. Now, to prove that $E \leq 1/d^{B\varepsilon(|V(\gamma) \setminus (U_\gamma \cap V_\gamma)| + \sum_{e \in E(\gamma)} l_e)}$, we mimic this argument while carefully keeping track of factors of $d^{\varepsilon}$. Again, using $d^e m^f \geq d^g m^h$, it suffices to prove that $$d^{p+e+g} m^{q+f+h} \left(\frac{k}{d}\right)^{|\gamma|_1 - |U_\gamma|_1 - |V_\gamma|_1} \Delta^{|\gamma|_2 - |U_\gamma|_2 - |V_\gamma|_2} \prod_{e \in E(\gamma)} \frac{\lambda^{l_e}}{k^{l_e}} \leq \frac{1}{d^{B\varepsilon(|V(\gamma) \setminus (U_\gamma \cap V_\gamma)| + \sum_{e \in E(\gamma)} l_e)}}$$ The idea is that the $1/d^{B\varepsilon}$ decay for the edges is obtained from the stronger assumption on $m$. The $1/d^{B\varepsilon}$ decay for the type 1 vertices of $V(\gamma) \setminus (U_\gamma \cap V_\gamma)$ is obtained both from the stronger assumption on $m$ as well as from the factors of $k/d$, the latter being especially useful for the degree 1 vertices. Finally, the $1/d^{B\varepsilon}$ decay for the type 2 vertices of $V(\gamma) \setminus (U_\gamma \cap V_\gamma)$ is obtained from the factors of $\Delta$. Indeed, note that for a constant $B$ that depends on $C_\Delta$, $\Delta^{|\gamma|_2 - |U_\gamma|_2 - |V_\gamma|_2} \leq d^{-B\varepsilon|V_2(\gamma) \setminus (U_\gamma \cap V_\gamma)|}$.
So, wewould be done if we prove d p + e + g m q + f + h (cid:18) kd (cid:19) | γ | −| U γ | −| V γ | (cid:18) λ k (cid:19) ∑ e ∈ E ( γ ) l e ≤ d B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) Let c be the number of type vertices i in V ( γ ) \ ( U γ ∩ V γ ) such that deg γ ( i ) = . Since they havedegree , they must be in ( U γ ) \ V γ . Also, we have | γ | − | U γ | − | V γ | = p + e + g + c and hence, (cid:16) kd (cid:17) | γ | −| U γ | −| V γ | = (cid:16) kd (cid:17) p + e + g + c . For these degree vertices, we have that the factors of kd ≤ d − A ε offer a decay of d B ε . Therefore, it suffices to prove d p + e + g m q + f + h (cid:18) kd (cid:19) p + e + g (cid:18) λ k (cid:19) ∑ e ∈ E ( γ ) l e ≤ d B ε ( p + q + e + f + g + h )+ ∑ e ∈ E ( γ ) l e ) for a constant B > . Observe that p + q + e + f + g + h ≤ ( ∑ e ∈ E ( γ ) l e ) . Therefore, using the notation r = p + e + g , r = q + f + h , it suffices to prove d r m r (cid:18) kd (cid:19) r (cid:18) λ k (cid:19) ∑ e ∈ E ( γ ) l e ≤ d B ε ∑ e ∈ E ( γ ) l e for a constant B > . But this follows by squaring Claim 13.10 where we set r = ∑ e ∈ E ( γ ) l e . Remark 13.14.
In the above bounds, note that there is a decay of $1/d^{B\varepsilon}$ for each vertex in $V(\gamma) \setminus (U_\gamma \cap V_\gamma)$. One of the main technical reasons for introducing the slack parameter $C_\Delta$ in the planted distribution was to introduce this decay, which is needed in the current machinery. We can now prove Lemma 13.4.
Lemma 13.4.
For all U , V ∈ I mid where w ( U ) > w ( V ) and all γ ∈ Γ U , V , c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V (cid:22) H ′ γ Proof.
By Lemma 6.17, we have c ( γ ) N ( γ ) B ( γ ) H − γ , γ Id V ⪯ c ( γ ) N ( γ ) B ( γ ) S ( γ ) R ( γ ) | Aut ( U ) || Aut ( V ) | H ′ γ . Using the same proof as in Lemma 6.11, we can see that H ′ γ ⪰ 0. Therefore, it suffices to prove that c ( γ ) N ( γ ) B ( γ ) S ( γ ) R ( γ ) | Aut ( U ) || Aut ( V ) | ≤ 1. Since U , V ∈ I mid , | Aut ( U ) | = | U 1 | ! | U 2 | ! and | Aut ( V ) | = | V 1 | ! | V 2 | ! . Therefore, | Aut ( U ) || Aut ( V ) | = | U 1 | ! | U 2 | ! | V 1 | ! | V 2 | ! ≤ D V | U γ \ V γ | . Also, | E ( γ ) | ≤ ∑ e ∈ E ( γ ) l e and q = d O ( ) · ε ( C V + C E ) . Note that R ( γ ) = ( C disc √ D E ) ∑ j ∈ ( U γ ) ∪ ( V γ ) deg γ ( j ) ≤ d O ( ) · ε C E · ∑ e ∈ E ( γ ) l e and ( ∏ j ∈ V ( γ ) \ U γ \ V γ ( deg γ ( j ) − 1 ) !! ) ≤ ( D V D E ) ∑ e ∈ E ( γ ) l e ≤ d O ( ) · ε ( C V + C E ) · ∑ e ∈ E ( γ ) l e . Let B be the constant from Lemma 13.12. We can set C V , C E sufficiently small so that, using Lemma 13.12, c ( γ ) N ( γ ) B ( γ ) S ( γ ) R ( γ ) | Aut ( U ) || Aut ( V ) | ≤ ( D V ) | U γ \ V γ | + | V γ \ U γ | + | E ( γ ) | | V ( γ ) \ ( U γ ∪ V γ ) | · ( D V ) | V ( γ ) \ V γ | + | V ( γ ) \ U γ | ( qD V ) | V ( γ ) \ U γ | + | V ( γ ) \ V γ | ∏ e ∈ E ( γ ) ( D V D E q ) l e · n w ( V ( γ ) \ U γ ) S ( γ ) d O ( ) · ε C E · ∑ e ∈ E ( γ ) l e · D V | U γ \ V γ | ≤ d O ( ) · ε ( C V + C E ) · ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) · n w ( V ( γ ) \ U γ ) S ( γ ) ≤ d O ( ) · ε ( C V + C E ) · ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) / d B ε ( | V ( γ ) \ ( U γ ∩ V γ ) | + ∑ e ∈ E ( γ ) l e ) ≤ 1. In this section, we will prove Lemma 13.5 using the strategy sketched in Section 10.
Lemma 13.5.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H ′ γ , H Id U ) | Aut ( U ) | c ( γ ) ! Id sym In particular, we prove the following lemmas.
Lemma 13.15.
Whenever k M α k ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f actId U ( H Id U ) (cid:23) d K D sos Id sym for a constant K > that can depend on C ∆ . Lemma 13.16. ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ d K D sos D V for a constant K > that can depend on C ∆ . If we assume the above lemmas, we can prove Lemma 13.5.
Lemma 13.5.
Whenever $\|M_\alpha\| \leq B_{norm}(\alpha)$ for all $\alpha \in \mathcal{M}'$, $$\sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) \succeq \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_U}(H'_\gamma, H_{Id_U})}{|Aut(U)| c(\gamma)}\right) Id_{sym}$$ Proof. Let $\|M_\alpha\| \leq B_{norm}(\alpha)$ for all $\alpha \in \mathcal{M}'$. By Lemma 13.15, $\sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) \succeq \frac{1}{d^{K_1 D_{sos}}} Id_{sym}$ for a constant $K_1 > 0$. By Lemma 13.16, $\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_U}(H_{Id_U}, H'_\gamma)}{|Aut(U)| c(\gamma)} \leq \frac{d^{K_2 D_{sos}}}{d^{D_V}}$ for a constant $K_2 > 0$. We choose $C_{sos}$ sufficiently small so that $\frac{1}{d^{K_1 D_{sos}}} \geq \frac{d^{K_2 D_{sos}}}{d^{D_V}}$, which can be satisfied by setting $C_{sos} < K_3 C_V$ for a sufficiently small constant $K_3 > 0$. Then, since $Id_{sym} \succeq 0$, using Lemma 13.15 and Lemma 13.16, $$\sum_{U \in \mathcal{I}_{mid}} M^{fact}_{Id_U}(H_{Id_U}) \succeq \frac{1}{d^{K_1 D_{sos}}} Id_{sym} \succeq \frac{d^{K_2 D_{sos}}}{d^{D_V}} Id_{sym} \succeq \left(\sum_{U \in \mathcal{I}_{mid}} \sum_{\gamma \in \Gamma_{U,*}} \frac{d_{Id_U}(H'_\gamma, H_{Id_U})}{|Aut(U)| c(\gamma)}\right) Id_{sym}$$ In the rest of the section, we will prove Lemma 13.15 and Lemma 13.16. To begin with, we will need a bound on $B_{norm}(\sigma) B_{norm}(\sigma') H_{Id_U}(\sigma, \sigma')$. Lemma 13.17.
Suppose < A < is a constant such that √ λ √ k ≤ d − A ε and √ k ≤ d − A . Suppose m issuch that m ≤ d − ε λ , m ≤ k − ε λ . For all U ∈ I mid and σ , σ ′ ∈ L U , B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ≤ d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | Proof.
Suppose there is a vertex i ∈ V ( σ ) \ V σ such that deg σ ( i ) + deg U σ ( i ) is odd, then H Id U ( σ , σ ′ ) = 0 and the inequality is true. So, assume that deg σ ( i ) + deg U σ ( i ) is even for all i ∈ V ( σ ) \ V σ . Similarly, assume that deg σ ′ ( i ) + deg U σ ′ ( i ) is even for all i ∈ V ( σ ′ ) \ V σ ′ . Also, if ρ σ ≠ ρ σ ′ , we will have H Id U ( σ , σ ′ ) = 0 and we would be done. So, assume ρ σ = ρ σ ′ . Let there be e (resp. f ) vertices of type 1 (resp. type 2) in V ( σ ) \ U σ \ V σ . Then, n w ( V ( σ )) − w ( U ) = √ d | V ( σ ) | −| U | √ m | V ( σ ) | −| U | = √ d | U σ | √ m | U σ | √ d e √ m f ≤ d O ( ) D sos √ d e √ m f where we used the fact that | U σ | ≤ D sos . Let there be g (resp. h ) vertices of type 1 (resp. type 2) in V ( σ ′ ) \ U σ ′ \ V σ ′ . Then, similarly, n w ( V ( σ ′ )) − w ( U ) ≤ d O ( ) D sos √ d g √ m h . Let α = σ ◦ σ ′ . Since all vertices in V ( α ) \ U α \ V α have degree at least 2, we have ∑ e ∈ E ( α ) l e ≥ ∑ i ∈ V 1 ( α ) \ U α \ V α deg α ( i ) ≥ 2 ( e + g ) . Similarly, ∑ e ∈ E ( α ) l e ≥ 2 ( f + h ) . Therefore, by setting r 1 = e + g , r 2 = f + h in Claim 13.10, we have √ d e + g √ m f + h ( k / d ) e + g ∏ e ∈ E ( α ) √ λ l e √ k l e ≤ d A ε ∑ e ∈ E ( α ) l e . Also, ( k / d ) | α | ≤ ( k / d ) e + g and ∏ j ∈ V ( α ) ( deg α ( j ) − 1 ) !! ≤ d ε C V ∑ e ∈ E ( α ) l e . Therefore, n w ( V ( σ )) − w ( U ) n w ( V ( σ ′ )) − w ( U ) H Id U ( σ , σ ′ ) ≤ d O ( ) D sos √ d e √ m f d O ( ) D sos √ d g √ m h · | Aut ( U ) | ( √ k ) deg ( α ) ( k / d ) | α | ∆ | α | ∏ j ∈ V ( α ) ( deg α ( j ) − 1 ) !! ∏ e ∈ E ( α ) √ λ l e √ k l e ≤ d O ( ) D sos d ε C V ∑ e ∈ E ( α ) l e √ d e + g √ m f + h ( k / d ) e + g ∏ e ∈ E ( α ) √ λ l e √ k l e ≤ d O ( ) D sos d ε C V ∑ e ∈ E ( α ) l e d A ε ∑ e ∈ E ( α ) l e . Now, observe that since all vertices in V ( α ) \ U α \ V α have degree at least 2, | V ( α ) | ≤ D sos + ∑ e ∈ E ( α ) l e .
So, by setting C V , C E sufficiently small, B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) = e ( qD V ) | V ( σ ) \ U σ | + | V ( σ ) \ V σ | ∏ e ∈ E ( σ ) ( D V D E q ) l e n w ( V ( σ )) − w ( U ) · e ( qD V ) | V ( σ ′ ) \ U σ ′ | + | V ( σ ′ ) \ V σ ′ | ∏ e ∈ E ( σ ′ ) ( D V D E q ) l e n w ( V ( σ ′ )) − w ( U ) · H Id U ( σ , σ ′ ) ≤ d O ( ) · ε ( C V + C E ) · ( | V ( α ) | + ∑ e ∈ E ( α ) l e ) d O ( ) D sos d ε C V ∑ e ∈ E ( α ) l e d A ε ∑ e ∈ E ( α ) l e ≤ d O ( ) D sos d A ε | V ( α ) | . To prove Lemma 13.15, we will use the strategy from Section 10.1. We will also use the notation from that section. We recall that for U ∈ I mid , L ′ U ⊂ L U was the set of non-trivial shapes in L U . Proposition 13.18. For V ∈ I mid , λ V = ∆ | V | d | V | k | V | Proof.
We have λ V = | Aut ( V ) | H Id V ( Id V , Id V ) = ( √ k ) | V | ( k / d ) | V | ∆ | V | = ∆ | V | d | V | k | V | . Corollary 13.19. λ V ≥ 1 / d O ( 1 ) D sos
Lemma 13.20.
For any edge e = ( V , U ) in G , we have w ( e ) ≤ d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | Proof.
Let e = ( V , U ) be an edge in G . Then, w ( U ) > w ( V ) and w ( e ) = W ( U , V ) λ V . Using Lemma 13.17,we have W ( U , V ) = | Aut ( U ) | ∑ σ ∈L V , U σ = U ∑ σ ′ ∈L V , U σ ′ = V B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ≤ | Aut ( U ) | ∑ σ ∈L V , U σ = U ∑ σ ′ ∈L V , U σ ′ = V d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | ≤ ∑ σ , σ ′ ∈L ′ V d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | ≤ ∑ σ , σ ′ ∈L ′ V d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | D D sos sos d F ε | V ( σ ◦ σ ′ ) | ≤ d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | ∑ σ , σ ′ ∈L ′ V D D sos sos d F ε | V ( σ ◦ σ ′ ) | ≤ d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | ≤ λ V d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | where we set C V , C E small enough and invoked Lemma 10.16. Rearranging proves the lemma. Corollary 13.21.
For any U , V ∈ I mid such that w ( U ) > w ( V ) , ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ≤ d O ( ) D sos Proof.
The total number of vertices in G is at most ( D sos + 1 ) 2 since each U ∈ I mid has at most 2 index shape pieces, one corresponding to each type, and each index shape piece has at most D sos vertices. Therefore, for any fixed integer j ≥ 1 , the number of paths from V to U of length j is at most ( D sos + 1 ) 2 j . Take any path P from V to U . Suppose it has length j ≥ 1 . Note that for all edges e = ( V ′ , U ′ ) in E ( P ) , since | U ′ | ≥ 1 , we have w ( e ) ≤ d O ( ) D sos d A ε | U ′ | ≤ d O ( ) D sos d A ε . So, ∏ e ∈ E ( P ) w ( e ) ≤ ( d O ( ) D sos d A ε ) j . Therefore, by setting C sos small enough, ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ≤ ( D sos + 1 ) 2 ∑ j = 1 ∞ ( D sos + 1 ) 2 j ( d O ( ) D sos d A ε ) j ≤ d O ( ) D sos . We can now prove Lemma 13.15.
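As an aside, the charging argument above — summing over all paths in $G$ with small per-edge weights — can be sanity-checked numerically. The sketch below uses a toy complete graph with made-up endpoints and a uniform weight $w$ (none of these values come from the paper) and verifies that the total weight of all simple paths between two fixed vertices is dominated by the geometric series $\sum_{j \geq 1} (Nw)^j = Nw/(1-Nw)$ when there are $N$ vertices:

```python
from itertools import permutations

def total_path_weight(weights, n, s, t):
    # Sum, over all simple paths from s to t, of the product of edge weights.
    inner = [v for v in range(n) if v not in (s, t)]
    total = 0.0
    for r in range(len(inner) + 1):
        for mid in permutations(inner, r):
            path = (s,) + mid + (t,)
            prod = 1.0
            for a, b in zip(path, path[1:]):
                prod *= weights.get((a, b), 0.0)
            total += prod
    return total

# Complete graph on N = 4 vertices, every directed edge has weight w = 0.05.
N, w = 4, 0.05
weights = {(a, b): w for a in range(N) for b in range(N) if a != b}
actual = total_path_weight(weights, N, 0, 3)
geometric_bound = (N * w) / (1 - N * w)  # sum_{j >= 1} (N*w)^j
print(actual, geometric_bound)  # actual is 0.05525, well below 0.25
```

The brute-force enumeration is exponential and only meant to illustrate why a uniform decay on every edge weight makes the path sum converge, which is the role the $C_{sos}$ choice plays above.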
Lemma 13.15.
Whenever ‖ M α ‖ ≤ B norm ( α ) for all α ∈ M ′ , ∑ U ∈I mid M f act Id U ( H Id U ) ⪰ ( 1 / d K D sos ) Id sym for a constant K > 0 that can depend on C ∆ . Proof. For all V ∈ I mid , we have Id Sym , V ⪯ ∑ U ∈I mid : w ( U ) ≥ w ( V ) ( ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ) λ U M f act ( H Id U ) . Summing this over all V ∈ I mid , we get Id Sym ⪯ ∑ U ∈I mid λ U ∑ V ∈I mid : w ( U ) ≥ w ( V ) ( ∑ P : P is a path from V to U in G ∏ e ∈ E ( P ) w ( e ) ) M f act ( H Id U ) ⪯ ∑ U ∈I mid λ U ( ∑ V ∈I mid : w ( U ) ≥ w ( V ) d O ( ) D sos ) M f act ( H Id U ) . For any fixed U ∈ I mid , the number of V ∈ I mid such that w ( U ) ≥ w ( V ) is at most ( D sos + 1 ) 2 . Also, λ U ≥ 1 / d O ( 1 ) D sos for all U ∈ I mid . Therefore, Id Sym ⪯ ∑ U ∈I mid λ U ( D sos + 1 ) 2 d O ( ) D sos M f act ( H Id U ) ⪯ ∑ U ∈I mid d O ( ) D sos M f act ( H Id U ) where we used the fact that for all U ∈ I mid , M f act ( H Id U ) ⪰ 0. We restate the lemma for convenience.
Lemma 13.16. ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ d K D sos D V for a constant K > that can depend on C ∆ .Proof. We have ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) = ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ | Aut ( U ) | c ( γ ) ∑ σ , σ ′∈L U γ : | V ( σ ) |≤ DV , | V ( σ ′ ) |≤ DV , | V ( σ ◦ γ ) | > DV or | V ( σ ′◦ γ ) | > DV B norm ( σ ) B norm ( σ ′ ) H Id U γ ( σ , σ ′ ) The set of σ , σ ′ that could appear in the above sum must necessarily be non-trivial and hence, σ , σ ′ ∈ L ′ U .Then, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ )= ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) ∑ γ ∈ Γ U , ∗ : | V ( σ ◦ γ ) | > D V or | V ( σ ′ ◦ γ ) | > D V | Aut ( U ) | c ( γ ) For σ ∈ L ′ U , define m σ = D V + − | V ( σ ) | ≥ . This is precisely set so that for all γ ∈ Γ U , ∗ , wehave | V ( σ ◦ γ ) | > D V if and only if | V ( γ ) | ≥ | U | + m σ . So, for σ , σ ′ ∈ L ′ U , using Lemma 9.31, ∑ γ ∈ Γ U , ∗ : | V ( σ ◦ γ ) | > D V or | V ( σ ′ ◦ γ ) | > D V | Aut ( U ) | c ( γ ) = ∑ γ ∈ Γ U , ∗ : | V ( γ ) |≥| U | + min ( m σ , m σ ′ ) | Aut ( U ) | c ( γ ) ≤ min ( m σ , m σ ′ ) − Also, for σ , σ ′ ∈ L ′ U , we have | V ( σ ◦ σ ′ ) | + min ( m σ , m σ ′ ) − ≥ D V . 
Therefore, ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U B norm ( σ ) B norm ( σ ′ ) H Id U ( σ , σ ′ ) min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | min ( m σ , m σ ′ ) − d A ε | V ( σ ◦ σ ′ ) | ≥ d A ε | V ( σ ◦ σ ′ ) | | V ( σ ◦ σ ′ ) | , ∑ U ∈I mid ∑ γ ∈ Γ U , ∗ d Id U ( H Id U , H ′ γ ) | Aut ( U ) | c ( γ ) ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | | V ( σ ◦ σ ′ ) | min ( m σ , m σ ′ ) − ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U d O ( ) D sos d A ε | V ( σ ◦ σ ′ ) | D V ≤ ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U d O ( ) D sos D D sos sos d A ε | V ( σ ◦ σ ′ ) | D V The final step will be to argue that ∑ U ∈I mid ∑ σ , σ ′ ∈L ′ U D Dsossos d A ε | V ( σ ◦ σ ′ ) | ≤ which will complete the proof.But this will follow from Lemma 10.16 if we set C V , C E small enough. Acknowledgements
We thank Sam Hopkins, Pravesh Kothari, Prasad Raghavendra, Tselil Schramm, and David Steurer forhelpful discussions. We also thank Sam Hopkins and Pravesh Kothari for assistance in drafting the informaldescription of the machinery (Section 3).
References [AMP20] Kwangjun Ahn, Dhruv Medarametla, and Aaron Potechin. Graph matrices: Norm bounds and applications. abs/1604.03423, 2020. URL: https://arxiv.org/abs/1604.03423 , arXiv: 1604.03423 . 11, 13, 14, 49, 59, 77, 78 [AMS11] Genevera I Allen and Mirjana Maletić-Savatić. Sparse non-negative generalized pca with applications to metabolomics. Bioinformatics , 27(21):3029–3035, 2011. 5 [ARV04] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows and a √log n -approximation to sparsest cut. 2004. 1 [AW08] Arash A Amini and Martin J Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In , pages 2454–2458. IEEE, 2008. 5, 6 [BAP +
05] Jinho Baik, Gérard Ben Arous, Sandrine Péché, et al. Phase transition of the largest eigenvaluefor nonnull complex sample covariance matrices.
The Annals of Probability , 33(5):1643–1697,2005. 5[BB19] Matthew Brennan and Guy Bresler. Optimal average-case reductions to sparse pca: Fromweak assumptions to strong hardness. arXiv preprint arXiv:1902.07380 , 2019. 5[BBH +
12] Boaz Barak, Fernando G. S. L. Brandão, Aram Wettroth Harrow, Jonathan A. Kelner, DavidSteurer, and Yuan Zhou. Hypercontractivity, sum-of-squares proofs, and their applications.
CoRR , abs/1205.4484, 2012. 1[BBH +
20] Matthew Brennan, Guy Bresler, Samuel B Hopkins, Jerry Li, and Tselil Schramm. Statistical query algorithms and low-degree tests are almost equivalent. arXiv preprint arXiv:2009.06107 , 2020. 7 [BCC +
10] Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In
Proceedings of the forty-second ACM symposium on Theory of computing , pages 201–210,2010. 4[BCG +
12] Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, andYuan Zhou. Polynomial integrality gaps for strong sdp relaxations of densest k-subgraph. In
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms , pages388–405. SIAM, 2012. 4[BGG +
17] Vijay Bhattiprolu, Mrinalkanti Ghosh, Venkatesan Guruswami, Euiwoong Lee, and MadhurTulsiani. Weak decoupling, polynomial folds and approximate optimization over the sphere.In , pages1008–1019. IEEE, 2017. 5[BGL16] Vijay Bhattiprolu, Venkatesan Guruswami, and Euiwoong Lee. Sum-of-squares certificatesfor maxima of random tensors on the sphere. arXiv preprint arXiv:1605.00903 , 2016. 5[BH17] Fernando GSL Brandao and Aram W Harrow. Quantum de finetti theorems under local mea-surements with applications.
Communications in Mathematical Physics , 353(2):469–506,2017. 5[BHK +
16] B. Barak, S. B. Hopkins, J. Kelner, P. Kothari, A. Moitra, and A. Potechin. A nearly tightsum-of-squares lower bound for the planted clique problem, 2016. 1, 2, 4, 6, 9, 11, 12, 14, 15,19, 51, 52, 81[BKRW17] Mark Braverman, Young Kun Ko, Aviad Rubinstein, and Omri Weinstein. Eth hardness fordensest-k-subgraph with perfect completeness. In
Proceedings of the Twenty-Eighth AnnualACM-SIAM Symposium on Discrete Algorithms , pages 1326–1341. SIAM, 2017. 4[BKS14] Boaz Barak, Jonathan A Kelner, and David Steurer. Rounding sum-of-squares relaxations. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages 31–40,2014. 5[BKS15] Boaz Barak, Jonathan A Kelner, and David Steurer. Dictionary learning and tensor decom-position via the sum-of-squares method. In
Proceedings of the forty-seventh annual ACMsymposium on Theory of computing , pages 143–151, 2015. 1, 5[BR13a] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principalcomponent detection. In
Conference on Learning Theory , pages 1046–1066, 2013. 5[BR + The Annals of Statistics , 41(4):1780–1815, 2013. 6[BV09] S Charles Brubaker and Santosh S Vempala. Random tensors and planted cliques. In
Approx-imation, Randomization, and Combinatorial Optimization. Algorithms and Techniques , pages406–419. Springer, 2009. 5[CK09] Hyonho Chun and Sündüz Kele¸s. Expression quantitative trait loci mapping with multivariatesparse partial least squares regression.
Genetics , 182(1):79–90, 2009. 5[dKNS20] Tommaso d’Orsi, Pravesh K. Kothari, Gleb Novikov, and David Steurer. Sparse pca: Algo-rithms, adversarial perturbations and certificates.
In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS) , 2020. 6 [DKS17] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In , pages 73–84. IEEE, 2017. 5, 7, 36 [DM16] Yash Deshpande and Andrea Montanari. Sparse pca via covariance thresholding.
The Journalof Machine Learning Research , 17(1):4913–4953, 2016. 5, 6[FGR +
17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao. Statisticalalgorithms and a lower bound for detecting planted cliques.
Journal of the ACM (JACM) ,64(2):1–37, 2017. 7[FK08] Alan Frieze and Ravi Kannan. A new approach to the planted clique problem. In
IARCS An-nual Conference on Foundations of Software Technology and Theoretical Computer Science .Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2008. 5[FPK01] Uriel Feige, David Peleg, and Guy Kortsarz. The dense k-subgraph problem.
Algorithmica ,29(3):410–421, 2001. 4[FPV18] Vitaly Feldman, Will Perkins, and Santosh Vempala. On the complexity of random satisfia-bility problems with planted solutions.
SIAM Journal on Computing , 47(4):1294–1338, 2018.7[FS +
97] Uriel Feige, Michael Seltser, et al.
On the densest k-subgraph problem . Citeseer, 1997. 4[GJJ +
20] Mrinalkanti Ghosh, Fernando Granha Jeronimo, Chris Jones, Aaron Potechin, and GouthamRajendran. Sum-of-squares lower bounds for sherrington-kirkpatrick via planted affine planes.
In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS) , 2020. 1,7[Gri01a] Dima Grigoriev. Complexity of positivstellensatz proofs for the knapsack. computationalcomplexity , 10(2):139–154, 2001. 1[Gri01b] Dima Grigoriev. Linear lower bound on degrees of positivstellensatz calculus proofs for theparity.
Theor. Comput. Sci. , 259(1-2):613–622, 2001. 1[GS11] Venkatesan Guruswami and Ali Kemal Sinop. Lasserre hierarchy, higher eigenvalues, andapproximation schemes for graph partitioning and quadratic integer programming with psdobjectives. In
FOCS , pages 482–491, 2011. 1[GW95] M.X. Goemans and D.P. Williamson. Improved approximation algorithms for maximum cutand satisfiability problems using semidefinite programming.
Journal of the ACM , 42(6):1115–1145, 1995. Preliminary version in
Proc. of STOC’94 . 1[GZ19] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense sub-graphs and the overlap gap property. arXiv preprint arXiv:1904.07174 , 2019. 4[HKP +
17] Samuel B Hopkins, Pravesh K Kothari, Aaron Potechin, Prasad Raghavendra, Tselil Schramm,and David Steurer. The power of sum-of-squares for detecting hidden structures. In , pages 720–731. IEEE,2017. 5, 6, 7, 9[Hop18] Samuel Brink Klevit Hopkins. Statistical inference and the sum of squares method. 2018. 1,7, 9[HSS15] Samuel B Hopkins, Jonathan Shi, and David Steurer. Tensor principal component analysis viasum-of-squares proofs. In
Conference on Learning Theory , pages 956–1006, 2015. 1, 5 [JL09] Iain M Johnstone and Arthur Yu Lu. Sparse principal components analysis. arXiv preprint arXiv:0901.4392 , 2009. 5, 6 [Kar72] Richard M Karp. Reducibility among combinatorial problems. In
Complexity of computercomputations , pages 85–103. Springer, 1972. 4[Kea98] Michael Kearns. Efficient noise-tolerant learning from statistical queries.
Journal of the ACM(JACM) , 45(6):983–1006, 1998. 7[Kho06] Subhash Khot. Ruling out ptas for graph min-bisection, dense k-subgraph, and bipartite clique.
SIAM Journal on Computing , 36(4):1025–1071, 2006. 4[KMOW17] Pravesh Kothari, Ryuhei Mori, Ryan O’Donnell, and David Witmer. Sum of squares lowerbounds for refuting any CSP. 2017. 1, 6[KNV +
15] Robert Krauthgamer, Boaz Nadler, Dan Vilenchik, et al. Do semidefinite relaxations solvesparse pca up to the information limit?
The Annals of Statistics , 43(3):1300–1322, 2015. 5, 6[KS17] Pravesh K Kothari and David Steurer. Outlier-robust moment-estimation via sum-of-squares. arXiv preprint arXiv:1711.11581 , 2017. 1[Kun20] Dmitriy Kunisky. Positivity-preserving extensions of sum-of-squares pseudomoments over thehypercube. arXiv preprint arXiv:2009.07269 , 2020. 7[KWB19] Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira. Notes on computational hard-ness of hypothesis testing: Predictions using the low-degree likelihood ratio. arXiv preprintarXiv:1907.11636 , 2019. 7, 9[Las01] Jean B Lasserre. Global optimization with polynomials and the problem of moments.
SIAMJournal on optimization , 11(3):796–817, 2001. 1[LRS15] James R Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size of semidefi-nite programming relaxations. In
Proceedings of the forty-seventh annual ACM symposium onTheory of computing , pages 567–576, 2015. 1[Ma13] Zongming Ma. Sparse principal component analysis and iterative thresholding.
The Annals ofStatistics , 41(2):772–801, 2013. 5[Maj09] Angshul Majumdar. Image compression by sparse pca coding in curvelet domain.
Signal,image and video processing , 3(1):27–34, 2009. 5[Man17] Pasin Manurangsi. Almost-polynomial ratio eth-hardness of approximating densest k-subgraph. In
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Comput-ing , pages 954–961, 2017. 4[MP16] Dhruv Medarametla and Aaron Potechin. Bounds on the norms of uniform low degree graphmatrices.
RANDOM , 2016. 11[MRX20] Sidhanth Mohanty, Prasad Raghavendra, and Jeff Xu. Lifting sum-of-squares lower bounds:degree-2 to degree-4. In
Proceedings of the 52nd Annual ACM SIGACT Symposium on Theoryof Computing , pages 840–853, 2020. 1, 6[MW15] Tengyu Ma and Avi Wigderson. Sum-of-squares lower bounds for sparse pca. In
Advances inNeural Information Processing Systems , pages 1612–1620, 2015. 5, 6[Nes00] Yurii Nesterov. Squared functional systems and optimization problems. In
High performance optimization , pages 405–440. Springer, 2000. 1 [NYS11] Nikhil Naikal, Allen Y Yang, and S Shankar Sastry. Informative feature selection for object recognition via sparse pca. In , pages 818–825. IEEE, 2011. 5 [Par00] Pablo A Parrilo.
Structured semidefinite programs and semialgebraic geometry methods inrobustness and optimization . PhD thesis, California Institute of Technology, 2000. 1[Pau07] Debashis Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covari-ance model.
Statistica Sinica , pages 1617–1642, 2007. 5[PS17] Aaron Potechin and David Steurer. Exact tensor completion with sum-of-squares. arXivpreprint arXiv:1702.06237 , 2017. 1[RM14] Emile Richard and Andrea Montanari. A statistical model for tensor pca. In
Advances inNeural Information Processing Systems , pages 2897–2905, 2014. 5[RRS17] Prasad Raghavendra, Satish Rao, and Tselil Schramm. Strongly refuting random csps belowthe spectral threshold. In
Proceedings of the 49th Annual ACM SIGACT Symposium on Theoryof Computing , pages 121–131, 2017. 1[Sho87] Naum Zuselevich Shor. An approach to obtaining global extremums in polynomial mathemat-ical programming problems.
Cybernetics , 23(5):695–700, 1987. 1[TPW14] Kean Ming Tan, Ashley Petersen, and Daniela Witten. Classification of rna-seq data. In
Statistical analysis of next generation sequencing data , pages 219–246. Springer, 2014. 5[TS14] Ryota Tomioka and Taiji Suzuki. Spectral norm of random tensors. arXiv preprintarXiv:1407.1870 , 2014. 5[WBS +
16] Tengyao Wang, Quentin Berthet, Richard J Samworth, et al. Statistical and computationaltrade-offs in estimation of sparse principal components.
The Annals of Statistics , 44(5):1896–1930, 2016. 5[WLY12] Dong Wang, Huchuan Lu, and Ming-Hsuan Yang. Online object tracking with sparse proto-types.
IEEE transactions on image processing , 22(1):314–325, 2012. 5
A Proof that the Leftmost and Rightmost Minimum Vertex Separators are Well-defined
In this section, we give a general proof that the leftmost and rightmost minimum vertex separators arewell-defined.
Lemma A.1.
For any two distinct vertex separators $S_1$ and $S_2$ of $\alpha$, there exist vertex separators $S_L$ and $S_R$ of $\alpha$ such that: 1. $S_L$ is a vertex separator of $U_\alpha$ and $S_1$ and a vertex separator of $U_\alpha$ and $S_2$. 2. $S_R$ is a vertex separator of $S_1$ and $V_\alpha$ and a vertex separator of $S_2$ and $V_\alpha$. 3. $w(S_L) + w(S_R) \leq w(S_1) + w(S_2)$
Take $S_L$ to be the set of vertices $v \in V(\alpha) \cap (S_1 \cup S_2)$ such that there is a path from $U_\alpha$ to $v$ which doesn't intersect $S_1 \cup S_2$ before reaching $v$. Similarly, take $S_R$ to be the set of vertices $v \in V(\alpha) \cap (S_1 \cup S_2)$ such that there is a path from $V_\alpha$ to $v$ which doesn't intersect $S_1 \cup S_2$ before reaching $v$. Now observe that $S_L$ is a vertex separator between $U_\alpha$ and $S_1$. To see this, note that for any path $P$ from $U_\alpha$ to a vertex $v \in S_1$, either $P$ intersects $S_L$ before reaching $v$ or $P$ does not intersect $S_L$ before reaching $v$. In the latter case, $v \in S_L$. Thus, in either case, $P$ intersects $S_L$. Following similar logic, $S_L$ is also a vertex separator between $U_\alpha$ and $S_2$, $S_R$ is a vertex separator between $S_1$ and $V_\alpha$, and $S_R$ is also a vertex separator between $S_2$ and $V_\alpha$. To show that $w(S_L) + w(S_R) \leq w(S_1) + w(S_2)$, observe that $w(S_L) + w(S_R) = w(S_L \cup S_R) + w(S_L \cap S_R)$ and $w(S_1) + w(S_2) = w(S_1 \cup S_2) + w(S_1 \cap S_2)$. Thus, to show that $w(S_L) + w(S_R) \leq w(S_1) + w(S_2)$, it is sufficient to show that 1. $S_L \cup S_R \subseteq S_1 \cup S_2$ 2. $S_L \cap S_R \subseteq S_1 \cap S_2$ For the first statement, note that by definition any vertex in $S_L \cup S_R$ must be in $S_1 \cup S_2$. For the second statement, note that if $v \in S_L \cap S_R$ then there is a path from $U_\alpha$ to $v$ which does not intersect any other vertices in $S_1 \cup S_2$ and there is a path from $v$ to $V_\alpha$ which does not intersect any other vertices in $S_1 \cup S_2$. Combining these paths, we obtain a path $P$ from $U_\alpha$ to $V_\alpha$ such that $v$ is the only vertex in $P$ which is in $S_1 \cup S_2$. This implies that $v \in S_1 \cap S_2$ as otherwise either $S_1$ or $S_2$ would not be a vertex separator between $U_\alpha$ and $V_\alpha$. Corollary A.2.
The leftmost and rightmost minimum vertex separators between U_α and V_α are well-defined.

Proof. Assume that there is no leftmost minimum vertex separator. If so, then there exists a minimum vertex separator S_1 between U_α and V_α such that:

1. There does not exist a minimum vertex separator S′ ≠ S_1 of α such that S′ is also a minimum vertex separator of U_α and S_1 (otherwise we would take S′ rather than S_1).
2. There exists a minimum vertex separator S_2 of α such that S_1 is not a minimum vertex separator of U_α and S_2 (as otherwise S_1 would be the leftmost minimum vertex separator).

Now let S_L and S_R be the vertex separators of α obtained by applying Lemma A.1 to S_1 and S_2. Since S_1 and S_2 are minimum vertex separators of α, we must have that w(S_L) = w(S_R) = w(S_1) = w(S_2). Since S_L is a minimum vertex separator of U_α and S_1, S_L = S_1. However, S_L = S_1 is then also a minimum vertex separator of U_α and S_2, which contradicts our choice of S_2.

Thus, there must be a leftmost minimum vertex separator of α. Following similar logic, there must be a rightmost minimum vertex separator of α as well.

B Proofs with Canonical Maps
In this section, we give alternative proofs of Lemmas 7.78 and 8.14 using canonical maps.
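The canonical maps used throughout this section can be illustrated concretely. The sketch below is a hypothetical toy encoding, not the paper's formalism (shapes here are plain graphs with ordered boundary tuples, without vertex types or multi-edges): it enumerates all bijections from a shape α to a ribbon R of shape α which preserve edges and send U_α to A_R and V_α to B_R, and checks that their number equals |Aut(α)|, so fixing one canonical map ϕ_R amounts to choosing one of |Aut(α)| candidates.

```python
from itertools import permutations

def boundary_maps(verts1, edges1, U1, V1, verts2, edges2, U2, V2):
    """All bijections from the first graph to the second that preserve
    edges and send the ordered boundary tuples U1 -> U2 and V1 -> V2."""
    maps = []
    for perm in permutations(verts2):
        f = dict(zip(verts1, perm))
        ok = (tuple(f[u] for u in U1) == U2 and
              tuple(f[v] for v in V1) == V2 and
              {frozenset((f[a], f[b])) for a, b in edges1}
                == {frozenset(e) for e in edges2})
        if ok:
            maps.append(f)
    return maps

# Hypothetical shape alpha: a 4-cycle with U = (1,) and V = (3,); the two
# middle vertices 2 and 4 are interchangeable, so |Aut(alpha)| = 2.
verts_a, edges_a = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]
U_a, V_a = (1,), (3,)
# A ribbon R of shape alpha: the same graph embedded on labels 10..40.
verts_r, edges_r = [10, 20, 30, 40], [(10, 20), (20, 30), (30, 40), (40, 10)]
A_r, B_r = (10,), (30,)

aut = boundary_maps(verts_a, edges_a, U_a, V_a, verts_a, edges_a, U_a, V_a)
maps = boundary_maps(verts_a, edges_a, U_a, V_a, verts_r, edges_r, A_r, B_r)
print(len(aut), len(maps))  # both are 2: one candidate map per automorphism
```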
Definition B.1 (Canonical Maps). For each shape α and each ribbon R of shape α, we arbitrarily choose a canonical map ϕ_R : V(α) → V(R) such that ϕ_R(H_α) = H_R, ϕ_R(U_α) = A_R, and ϕ_R(V_α) = B_R. Note that there are |Aut(α)| possible choices for this map.

B.1 Proof of Lemma 7.78

Lemma B.2.

M^{orth}_τ(H) = ∑_{σ ∈ Row(H), σ′ ∈ Col(H)} H(σ, σ′) · |Decomp(σ, τ, σ′^T)| · M_{σ∘τ∘σ′^T}

Proof.
Observe that there is a bijection between ribbons R with shape σ∘τ∘σ′^T together with an element π ∈ Decomp(σ, τ, σ′) and triples of ribbons (R_1, R_2, R_3) such that:

1. R_1, R_2, R_3 have shapes σ, τ, and σ′^T, respectively.
2. V(R_1) ∩ V(R_2) = A_{R_2} = B_{R_1}, V(R_2) ∩ V(R_3) = A_{R_3} = B_{R_2}, and V(R_1) ∩ V(R_3) = A_{R_2} ∩ B_{R_2}.

To see this, note that given such ribbons R_1, R_2, R_3, the ribbon R = R_1 ∘ R_2 ∘ R_3 has shape σ∘τ∘σ′^T. Further note that we have two bijective maps from V(σ∘τ∘σ′^T) to V(R). The first map is ϕ_R. The second map is ϕ_{R_1} ∘ ϕ_{R_2} ∘ ϕ_{R_3}. Using this, we can take π = ϕ_R^{-1}(ϕ_{R_1} ∘ ϕ_{R_2} ∘ ϕ_{R_3}).

Conversely, given a ribbon R of shape σ∘τ∘σ′^T and an element π ∈ Decomp(σ, τ, σ′), let R_1 = ϕ_R(π(σ)), let R_2 = ϕ_R(π(τ)), and let R_3 = ϕ_R(π(σ′^T)). Note that this is well-defined because for any element π′ ∈ Aut(σ) × Aut(τ) × Aut(σ′^T), ϕ_R(ππ′(σ)) = ϕ_R(π(π′(σ))) = ϕ_R(π(σ)). Similarly, ϕ_R(ππ′(τ)) = ϕ_R(π(τ)) and ϕ_R(ππ′(σ′^T)) = ϕ_R(π(σ′^T)).

To confirm that this is a bijection, we have to show that these two maps are inverses of each other. Given R_1, R_2, and R_3, applying these two maps gives us ribbons R′_1 = ϕ_R ϕ_R^{-1}(ϕ_{R_1} ∘ ϕ_{R_2} ∘ ϕ_{R_3})(σ) = R_1, R′_2 = ϕ_R ϕ_R^{-1}(ϕ_{R_1} ∘ ϕ_{R_2} ∘ ϕ_{R_3})(τ) = R_2, and R′_3 = ϕ_R ϕ_R^{-1}(ϕ_{R_1} ∘ ϕ_{R_2} ∘ ϕ_{R_3})(σ′^T) = R_3. Conversely, given R and an element π ∈ Decomp(σ, τ, σ′) (which we represent by an element π ∈ Aut(σ∘τ∘σ′^T)), applying these two maps gives us the ribbon

R′ = ϕ_R(π(σ)) ∘ ϕ_R(π(τ)) ∘ ϕ_R(π(σ′^T)) = ϕ_R π(σ∘τ∘σ′^T) = R

and gives us the map

ϕ_R^{-1}(ϕ_{ϕ_R(π(σ))} ∘ ϕ_{ϕ_R(π(τ))} ∘ ϕ_{ϕ_R(π(σ′^T))})

Now observe that both ϕ_R π and ϕ_{ϕ_R(π(σ))} give bijective maps from σ to the ribbon ϕ_R π(σ), so ϕ_{ϕ_R(π(σ))}^{-1} ϕ_R π ∈ Aut(σ).
Following similar logic for τ and σ′^T, in Decomp(σ, τ, σ′) this map is equivalent to ϕ_R^{-1}(ϕ_R π) = π.

B.2 Proof of Lemma 8.14
Definition B.3 (Rigorous definition of intersection patterns). We define an intersection pattern P on composable shapes γ, τ, γ′^T to consist of the shape γ∘τ∘γ′^T together with a non-empty set of constraint edges E(P) on V(γ∘τ∘γ′^T) such that:

1. For all vertices u, v, w ∈ V(γ∘τ∘γ′^T), if (u, v), (v, w) ∈ E(P) then (u, w) ∈ E(P).
2. E(P) does not contain a path between two vertices of γ, two vertices of τ, or two vertices of γ′^T. This ensures that when we consider γ, τ, γ′ individually, their vertices are distinct.
3. Defining V^*(γ) ⊆ V(γ) to be the vertices of γ which are incident to an edge in E(P), U_γ is the unique minimum-weight vertex separator between U_γ and V^*(γ) ∪ V_γ.
4. Similarly, defining V^*(γ′^T) ⊆ V(γ′^T) to be the vertices of γ′^T which are incident to an edge in E(P), V_{γ′^T} is the unique minimum-weight vertex separator between V^*(γ′^T) ∪ U_{γ′^T} and V_{γ′^T}.
5. All edges in E(P) are between vertices of the same type.

Definition B.4.
We say that two intersection patterns P, P′ on shapes γ, τ, γ′^T are equivalent (which we write as P ≡ P′) if there is an automorphism π ∈ Aut(γ) × Aut(τ) × Aut(γ′^T) such that π(P) = P′ (i.e., if E(P) and E(P′) are the constraint edges for P and P′ respectively, then π(E(P)) = E(P′)).

Definition B.5.
Given composable shapes γ, τ, γ′^T, we define 𝒫_{γ,τ,γ′^T} to be the set of all possible intersection patterns P on γ, τ, γ′^T (up to equivalence).

Definition B.6.
Given composable (but not properly composable) ribbons R_1, R_2, R_3 of shapes γ, τ, γ′^T, we define the intersection pattern P ∈ 𝒫_{γ,τ,γ′^T} induced by R_1, R_2, R_3 as follows:

1. Take the canonical maps ϕ_{R_1} : V(γ) → V(R_1), ϕ_{R_2} : V(τ) → V(R_2), and ϕ_{R_3} : V(γ′^T) → V(R_3).
2. Given vertices u ∈ V(γ) and v ∈ V(τ), add a constraint edge between u and v if and only if ϕ_{R_1}(u) = ϕ_{R_2}(v). Similarly, given vertices u ∈ V(γ) and w ∈ V(γ′^T), add a constraint edge between u and w if and only if ϕ_{R_1}(u) = ϕ_{R_3}(w), and given vertices v ∈ V(τ) and w ∈ V(γ′^T), add a constraint edge between v and w if and only if ϕ_{R_2}(v) = ϕ_{R_3}(w).

Definition B.7.
Given an intersection pattern P ∈ 𝒫_{γ,τ,γ′^T}, we define V(γ∘τ∘γ′^T)/E(P) to be V(γ∘τ∘γ′^T) where all of the edges in E(P) are contracted (i.e., if (u, v) ∈ E(P) then u and v are merged into a single vertex which only appears once).

Definition B.8.
Given an intersection pattern P ∈ 𝒫_{γ,τ,γ′^T}, we define τ_P to be the shape such that:

1. V(H_{τ_P}) = V(γ∘τ∘γ′^T)/E(P)
2. E(H_{τ_P}) = E(γ) ∪ E(τ) ∪ E(γ′^T)
3. U_{τ_P} = U_γ
4. V_{τ_P} = V_{γ′^T}

Definition B.9.
Given an intersection pattern P ∈ 𝒫_{γ,τ,γ′^T}, we make the following definitions:

1. We define Aut(P) = {π ∈ Aut(γ∘τ∘γ′^T) : π(E(P)) = E(P)}.
2. We define Aut_pieces(P) = {π ∈ Aut(γ) × Aut(τ) × Aut(γ′^T) : π(E(P)) = E(P)}.
3. We define N(P) = |Aut(P)/Aut_pieces(P)|.

Lemma B.10. For all composable σ, τ, and σ′^T (including improper τ),

M^{fact}_τ(e_σ e_{σ′}^T) − M^{orth}_τ(e_σ e_{σ′}^T) =
  ∑_{σ_2, γ : γ is non-trivial, σ_2∘γ = σ} (1/|Aut(U_γ)|) ∑_{P ∈ 𝒫_{γ, τ, Id_{V_τ}}} N(P) M^{orth}_{τ_P}(e_{σ_2} e_{σ′}^T)
+ ∑_{σ′_2, γ′ : γ′ is non-trivial, σ′_2∘γ′ = σ′} (1/|Aut(U_{γ′})|) ∑_{P ∈ 𝒫_{Id_{U_τ}, τ, γ′^T}} N(P) M^{orth}_{τ_P}(e_σ e_{σ′_2}^T)
+ ∑_{σ_2, γ : γ is non-trivial, σ_2∘γ = σ} ∑_{σ′_2, γ′ : γ′ is non-trivial, σ′_2∘γ′ = σ′} (1/(|Aut(U_γ)| · |Aut(U_{γ′})|)) ∑_{P ∈ 𝒫_{γ, τ, γ′^T}} N(P) M^{orth}_{τ_P}(e_{σ_2} e_{σ′_2}^T)

Proof.
This lemma follows from the following bijection. Consider the third term

∑_{σ_2, γ : γ is non-trivial, σ_2∘γ = σ} ∑_{σ′_2, γ′ : γ′ is non-trivial, σ′_2∘γ′ = σ′} (1/(|Aut(U_γ)| · |Aut(U_{γ′})|)) ∑_{P ∈ 𝒫_{γ, τ, γ′^T}} N(P) M^{orth}_{τ_P}(e_{σ_2} e_{σ′_2}^T)

On one side, we have the following data:

1. Ribbons R_1, R_2, and R_3 such that
(a) R_1, R_2, R_3 have shapes σ, τ, and σ′^T, respectively.
(b) A_{R_2} = B_{R_1} and A_{R_3} = B_{R_2}.
(c) (V(R_2) ∪ V(R_3)) ∩ V(R_1) ≠ A_{R_2} and (V(R_1) ∪ V(R_2)) ∩ V(R_3) ≠ B_{R_2}.
2. An ordering O_{S′} on the leftmost minimum vertex separator S′ between A_{R_1} and V^* ∪ B_{R_1}.
3. An ordering O_{T′} on the rightmost minimum vertex separator T′ between V^* ∪ A_{R_3} and B_{R_3}.

On the other side, we have the following data:

1. An intersection pattern P ∈ 𝒫_{γ, τ, γ′^T} where γ and γ′^T are non-trivial.
2. Ribbons R′_1, R′_2, R′_3 of shapes σ_2, τ_P, σ′_2^T such that V(R′_1) ∩ V(R′_2) = A_{R′_2} = B_{R′_1}, V(R′_2) ∩ V(R′_3) = B_{R′_2} = A_{R′_3}, and V(R′_1) ∩ V(R′_3) = A_{R′_2} ∩ B_{R′_2}.
3. An element π ∈ Aut(P)/Aut_pieces(P).

To see this bijection, given R_1, R_2, R_3, we again implement our strategy for analyzing intersection terms. Recall that V^* is the set of vertices in V(R_1) ∪ V(R_2) ∪ V(R_3) which have an unexpected equality with another vertex, S′ is the leftmost minimum vertex separator between A_{R_1} and B_{R_1} ∪ V^*, and T′ is the rightmost minimum vertex separator between A_{R_3} ∪ V^* and B_{R_3}.

1. Decompose R_1 as R_1 = R′_1 ∘ R_4 where R′_1 is the part of R_1 between A_{R_1} and (S′, O_{S′}) and R_4 is the part of R_1 between (S′, O_{S′}) and B_{R_1} = A_{R_2}. Decompose R_3 as R_3 = R_5 ∘ R′_3 where R_5 is the part of R_3 between A_{R_3} and (T′, O_{T′}) and R′_3 is the part of R_3 between (T′, O_{T′}) and B_{R_3}.
2. Take the intersection pattern P and the ribbon R′_2 induced by R_4, R_2, and R_5.
3. Observe that we have two bijective maps from V(γ∘τ∘γ′^T)/E(P) to V(R_4) ∪ V(R_2) ∪ V(R_5). The first map is ϕ_{R_4} ∘ ϕ_{R_2} ∘ ϕ_{R_5} and the second map is ϕ_{R′_2}. We take π = ϕ_{R′_2}^{-1}(ϕ_{R_4} ∘ ϕ_{R_2} ∘ ϕ_{R_5}).

Conversely, given an intersection pattern P ∈ 𝒫_{γ, τ, γ′^T}, ribbons R′_1, R′_2, R′_3, and an element π ∈ Aut(P)/Aut_pieces(P):

1. Take R_4 = ϕ_{R′_2}π(V(γ)), R_2 = ϕ_{R′_2}π(V(τ)), and R_5 = ϕ_{R′_2}π(V(γ′^T)).
2. Take R_1 = R′_1 ∘ R_4 and take R_3 = R_5 ∘ R′_3.
3. Take O_{S′} and O_{T′} based on B_{R′_1} = A_{R_4} and B_{R_5} = A_{R′_3}.

To confirm that this is a bijection, we need to show that these maps are inverses of each other. If we apply the first map and then the second, we obtain the following:

1. We obtain the ribbons
(a) R″_1 = R′_1 ∘ ϕ_{R′_2} ϕ_{R′_2}^{-1}(ϕ_{R_4} ∘ ϕ_{R_2} ∘ ϕ_{R_5})(V(γ))
(b) R″_2 = ϕ_{R′_2} ϕ_{R′_2}^{-1}(ϕ_{R_4} ∘ ϕ_{R_2} ∘ ϕ_{R_5})(V(τ))
(c) R″_3 = ϕ_{R′_2} ϕ_{R′_2}^{-1}(ϕ_{R_4} ∘ ϕ_{R_2} ∘ ϕ_{R_5})(V(γ′^T)) ∘ R′_3
where
(a) R′_1 is the part of R_1 between A_{R_1} and (S′, O_{S′}) where S′ is the leftmost minimum vertex separator between A_{R_1} and V^* ∪ B_{R_1}.
(b) R_4 is the part of R_1 between (S′, O_{S′}) and B_{R_1}.
(c) R′_2 is the ribbon of shape τ_P induced (along with the intersection pattern P) by R_4, R_2, and R_5.
(d) R_5 is the part of R_3 between A_{R_3} and (T′, O_{T′}).
(e) R′_3 is the part of R_3 between (T′, O_{T′}) and B_{R_3}.

This implies that R″_1 = R′_1 ∘ R_4 = R_1, R″_2 = R_2, and R″_3 = R_5 ∘ R′_3 = R_3. Since the second map leaves R′_1 and R′_3 unchanged, we recover the orderings O_{S′} and O_{T′} as well.

Conversely, if we apply the second map, we have that R_1 = R′_1 ∘ ϕ_{R′_2}π(V(γ)), R_2 = ϕ_{R′_2}π(V(τ)), and R_3 = ϕ_{R′_2}π(V(γ′^T)) ∘ R′_3, and we have the orderings O_{S′} and O_{T′} corresponding to B_{R′_1} and A_{R′_3} respectively. If we apply the first map,

1. R′_1 and R′_3 are preserved.
2. R″_2 and P″ are the ribbon and intersection pattern induced by the ribbons ϕ_{R′_2}π(γ), ϕ_{R′_2}π(τ), and ϕ_{R′_2}π(γ′^T).
To see that R″_2 = R′_2, observe that

R″_2 = ϕ_{R′_2}π(V(γ)) ∘ ϕ_{R′_2}π(V(τ)) ∘ ϕ_{R′_2}π(V(γ′^T)) = ϕ_{R′_2}π(γ∘τ∘γ′^T) = ϕ_{R′_2}(γ∘τ∘γ′^T) = R′_2

To see that P″ ≡ P, observe that:
(a) We have two bijective maps from V(γ) to V(ϕ_{R′_2}π(γ)). These two maps are ϕ_{R′_2}π and ϕ_{ϕ_{R′_2}π(γ)}.
(b) We have two bijective maps from V(τ) to V(ϕ_{R′_2}π(τ)). These two maps are ϕ_{R′_2}π and ϕ_{ϕ_{R′_2}π(τ)}.
(c) We have two bijective maps from V(γ′^T) to V(ϕ_{R′_2}π(γ′^T)). These two maps are ϕ_{R′_2}π and ϕ_{ϕ_{R′_2}π(γ′^T)}.
(d) For P″, the constraint edges are

(ϕ_{ϕ_{R′_2}π(γ)}^{-1} ϕ_{R′_2}π ∘ ϕ_{ϕ_{R′_2}π(τ)}^{-1} ϕ_{R′_2}π ∘ ϕ_{ϕ_{R′_2}π(γ′^T)}^{-1} ϕ_{R′_2}π)(E(P))

3. We obtain the element

π″ = ϕ_{R′_2}^{-1}(ϕ_{ϕ_{R′_2}π(V(γ))} ∘ ϕ_{ϕ_{R′_2}π(V(τ))} ∘ ϕ_{ϕ_{R′_2}π(V(γ′^T))})

To see that π″ ≡ π, note that

π = π″ (ϕ_{ϕ_{R′_2}π(V(γ))}^{-1} ϕ_{R′_2}π ∘ ϕ_{ϕ_{R′_2}π(V(τ))}^{-1} ϕ_{R′_2}π ∘ ϕ_{ϕ_{R′_2}π(V(γ′^T))}^{-1} ϕ_{R′_2}π)

The analysis for the first term is the same except that when γ′ is trivial, we always take γ′ to be the identity, so T′ = V_τ = U_{σ′^T} and the ordering O_{T′} is given by V_τ = U_{σ′^T}. Similarly, the analysis for the second term is the same except that when γ is trivial, we always take γ to be the identity, so S′ = V_σ = U_τ and the ordering O_{S′} is given by V_σ = U_τ.
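As a concrete footnote to Definition B.7, contracting the constraint edges E(P) is exactly a union-find computation. The sketch below uses hypothetical vertex labels (the paper's vertices also carry types, which we omit): it merges the endpoints of each constraint edge and returns one representative per merged vertex of V(γ∘τ∘γ′^T)/E(P).

```python
class DSU:
    """Union-find structure used to contract constraint edges, as in the
    definition of V(gamma ∘ tau ∘ gamma'^T) / E(P)."""
    def __init__(self, verts):
        self.parent = {v: v for v in verts}

    def find(self, v):
        # Path-halving find
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, u, v):
        self.parent[self.find(u)] = self.find(v)

def contract(verts, constraint_edges):
    """Return one representative per vertex class after contracting E(P)."""
    dsu = DSU(verts)
    for u, v in constraint_edges:
        dsu.union(u, v)
    return {dsu.find(v) for v in verts}

# Hypothetical example: vertices g1, g2 of gamma, t1, t2 of tau, h1 of
# gamma'^T, with constraint edges identifying g2 = t1 and t2 = h1.
verts = ["g1", "g2", "t1", "t2", "h1"]
E_P = [("g2", "t1"), ("t2", "h1")]
reps = sorted(contract(verts, E_P))
print(reps)  # three merged vertices remain
```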