Sum of Squares Lower Bounds from Pairwise Independence
Boaz Barak ∗ Siu On Chan † Pravesh Kothari ‡ March 30, 2015
Abstract
We prove that for every ε > 0 and predicate P : {0,1}^k → {0,1} that supports a pairwise independent distribution, there exists an instance I of the Max-P constraint satisfaction problem on n variables such that no assignment can satisfy more than a |P^{-1}(1)|/2^k + ε fraction of I's constraints, but the degree-Ω(n) Sum of Squares semidefinite programming hierarchy cannot certify that I is unsatisfiable. Similar results were previously only known for weaker hierarchies.

∗ Microsoft Research New England
† Microsoft Research New England
‡ University of Texas, Austin. Work done while an intern at Microsoft Research New England.

1 Introduction
Constraint Satisfaction Problems (CSP) are among the most natural computational problems, and yet their computational complexity is not fully understood. In particular, several works have studied the notion of Approximation Resistance, which loosely speaking means that the best polynomial-time approximation algorithm is simply the one that outputs a random assignment. Under Khot's
Unique Games Conjecture [16] much is known about this property. In particular, Austrin and Mossel [3] showed that if the UGC is true, then, for every predicate P : {0,1}^k → {0,1}, if there exists a pairwise independent distribution µ over P^{-1}(1) (i.e., a distribution µ such that for every i ≠ j ∈ [k], the marginal µ_{ij} is the uniform distribution over {0,1}^2), then P is approximation resistant. Austrin and Håstad [2] used this to establish (under the UGC) fairly tight bounds on the threshold at which a random predicate of a particular density becomes approximation resistant. However, there is no consensus whether the UGC is true. Assuming only P ≠ NP, the best known bound is by Chan [10], who showed that a predicate is approximation resistant if it contains a distribution µ as above satisfying the additional condition that it is uniform over a subspace V ⊆ GF(2)^k. This algebraic structure is a fairly strong condition. In particular, if we choose P : {0,1}^k → {0,1} to be a random predicate conditioned on |P^{-1}(1)| = t (where t ∈ {0,...,2^k} is some parameter), then P will satisfy the first condition (supporting a pairwise independent distribution) with high probability as long as t > ck for some constant c [2], while it will not satisfy the second condition even for t exponentially large in k (see Observation A.1).

Another line of work has been concerned with proving unconditional lower bounds for these problems on restricted families of algorithms. These works considered convex relaxations for CSPs, where we say that a CSP is approximation resistant for some relaxation R if there is an instance for which a random assignment is essentially optimal, but the relaxation value is 1 − o(1) (namely, the relaxation "thinks" that it is possible to satisfy almost all constraints). Interestingly, the unconditional results match the conditional ones.
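Pairwise independence of a given distribution can be verified mechanically by checking all pairs of marginals. A minimal sketch (our own illustration; the 3XOR example and all names are ours, not from the paper):

```python
from itertools import product

def is_pairwise_independent(mu, k):
    # mu: dict mapping k-bit tuples to probabilities.
    # Pairwise independence: every pair of coordinates is uniform on {0,1}^2.
    for i in range(k):
        for j in range(i + 1, k):
            for a in (0, 1):
                for b in (0, 1):
                    p = sum(pr for x, pr in mu.items() if x[i] == a and x[j] == b)
                    if abs(p - 0.25) > 1e-9:
                        return False
    return True

# mu supported on P^{-1}(1) for P = 3XOR (x1 + x2 + x3 = 0 mod 2):
xor_support = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0]
mu_xor = {x: 1.0 / len(xor_support) for x in xor_support}

# Uniform single-bit marginals, but the pairs are maximally correlated:
mu_bad = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
```

The first distribution, uniform over the satisfying assignments of 3XOR, is the standard example of a predicate support that is pairwise (but not 3-wise) independent; the second shows that uniform single-coordinate marginals alone do not suffice.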
That is, for certain weaker relaxations (namely, the Sherali-Adams linear programming hierarchy, or Sherali-Adams augmented with the basic semidefinite program), there are unconditional results for the same predicates that were shown approximation-resistant under the UGC [9, 26, 19]. (This is of course not a coincidence, as the UGC is intimately connected with some of these weaker relaxations [21].) In contrast, for the stronger Sum of Squares (SOS) (also known as
Lasserre) relaxation [24, 18, 20, 17], the previously known results [15, 22, 25] utilized the same conditions as in Chan's NP-hardness result (and in fact inspired Chan's work). In this work we show that the pairwise independence condition suffices for lower bounds even for this stronger Sum-of-Squares hierarchy. This result is interesting in its own right and, based on past experience, could also be viewed as suggesting that it may be possible to improve the UGC-based results to results based on P ≠ NP.

Our results actually hold for a more general setting than showing approximation-resistance of predicates, and so to state them we need to introduce some notation. Roughly speaking, we show that for every k and an arbitrarily small ε > 0, there exists a set I = {C_1,...,C_m} of k-tuples of literals (i.e. variables or their negations) over the variables x_1,...,x_n such that (1) for every assignment x to the variables, the induced distribution on {0,1}^k obtained by taking a random i ∈ [m] and looking at the literals in C_i is ε-close to the uniform distribution on {0,1}^k, but (2) for every pairwise independent distribution µ over {0,1}^k, there is a relaxation-solution that "cheats" the Ω(n)-degree SOS relaxation into thinking that there is a distribution D over assignments (i.e., elements of {0,1}^n) such that for every i ∈ [m], the projection of D to the literals in C_i is distributed according to µ. This immediately implies that predicates supporting a pairwise independent distribution are approximation-resistant for this relaxation. We now formally state our results:

Definition 1.1 (Pseudo-expectation). For every n and d, let P^n_d denote the linear space of n-variate real polynomials of degree at most d. A linear operator Ẽ : P^n_d → R is a degree-d pseudo-expectation operator if it satisfies:

Normalization: Ẽ[1] = 1, where 1 on the LHS denotes the constant polynomial p such that p(x) = 1.
Positivity: Ẽ[p²] ≥ 0 for every p ∈ P^n_{d/2}.

For every polynomial p ∈ P^n_d, we say that Ẽ satisfies the constraint {p = 0} if Ẽ[pq] = 0 for every q ∈ P^n_{d−deg(p)}.

The Sum-of-Squares hierarchy can be thought of as optimizing over pseudo-expectations; see the survey [8] and the references therein, as well as the lecture notes [4]. For notational convenience, we will use variables over {±1} instead of {0,1}. A literal is a function f : {±1}^n → {±1} such that f(x) = x_i or f(x) = −x_i for some i. If C = (f_1,...,f_k) is a k-tuple of literals, then we denote by C(x) the tuple (f_1(x),...,f_k(x)). Our main result is the following:

Theorem 1.2 (Main Result). For every k ∈ N and ε > 0, there exists δ = δ(k) > 0 such that for every sufficiently large n ∈ N there is a set I = {C_1,...,C_m} of k-tuples of literals over x_1,...,x_n such that

1. For every x ∈ {±1}^n, the distribution {C(x)}, where C is chosen at random in I, is within ε statistical distance of the uniform distribution over {±1}^k.

2. For every pairwise independent distribution µ over {±1}^k, there exists a degree-δn pseudo-expectation operator Ẽ over R^n satisfying the constraints {x_j² = 1}_{j=1,...,n} such that for every C ∈ I and f : {±1}^k → R, Ẽ[f(C(x))] = E[f(µ)].

The following immediate corollary implies that predicates supporting pairwise independent distributions are approximation-resistant for Ω(n)-degree SOS:

Corollary 1.3.
For every ε > 0 and P : {±1}^k → {0,1}, if there exists a pairwise independent distribution µ supported on P^{-1}(1), then there exists δ > 0 such that for all n there is a set I = {C_1,...,C_m} of k-tuples of literals over x_1,...,x_n such that

1. For every x ∈ {±1}^n, E_{C∈I} P(C(x)) ≤ |P^{-1}(1)|/2^k + ε.

2. The value of the δn-degree Max-P SOS relaxation for the fraction of satisfiable constraints on the instance I is 1.

Remark 1.4.
The instance I = (C_1,...,C_m) is actually obtained at random (with some pruning of a small fraction of the constraints, or alternatively, with some loss in the "perfect completeness" condition). Thus our results can also be thought of as giving some evidence for a conjecture of Barak, Kindler and Steurer [7] that no polynomial-time algorithm (including in particular the SOS algorithm) can beat the basic semidefinite program on approximating random CSP instances.

Throughout this paper we restrict ourselves to the
Boolean case, and do not consider extensions to a larger alphabet, though our methods may be useful in this case as well.

1.2 Related works
Grigoriev [15] proved in 1999 that (in the language of this paper) 3XOR is approximation resistant for the degree-Ω(n) Sum-of-Squares hierarchy. Grigoriev's work in fact predated the papers of Parrilo [20] and Lasserre [17] proposing the SOS hierarchy, and so he used the different (but equivalent) language of Positivstellensatz Calculus proofs. (Also, as far as we know, he did not note that these proofs can be efficiently found via a semidefinite program.) Grigoriev's result was rediscovered in 2008 by Schoenebeck [22], who also noted that it implies approximation resistance for 3SAT and some other CSPs as well. Tulsiani [25] (see also Chan [10]) further generalized these results and in particular showed that every predicate that contains a pairwise independent subgroup is approximation resistant for Ω(n)-degree SOS. Both Tulsiani and Schoenebeck follow Grigoriev's technique of reducing SOS lower bounds to resolution width lower bounds. As far as we know, no other SOS integrality gaps for approximating CSPs were known, and there are very few SOS lower bounds in general, most notably Grigoriev's lower bound for knapsack [14] and the very recent result by Meka, Potechin and Wigderson for the planted clique problem (personal communication).

Arora, Bollobás, Lovász and Tourlakis [1] obtained integrality gaps for the Lovász-Schrijver linear programming hierarchy for Vertex Cover. Schoenebeck, Trevisan and Tulsiani [23] showed that Max-Cut is approximation resistant for Ω(n) levels of the Lovász-Schrijver hierarchy, and these results have been strengthened to the stronger Sherali-Adams hierarchy [12, 11].
The famous Goemans-Williamson algorithm [13] shows that Max-Cut is not approximation resistant for even the degree-2 SOS hierarchy, further underscoring the difference between these relaxations.

Perhaps closest to our work are the papers of Benabbas, Georgiou, Magen, and Tulsiani [9], who showed that predicates containing a pairwise independent distribution are approximation resistant for Ω(n) rounds of the Sherali-Adams hierarchy, even when one adds the degree-2 SOS constraints. Indeed, our pseudo-distribution agrees with theirs, though we describe it somewhat differently, and most importantly, we need a completely different argument to show that it is positive semidefinite. Our work is also inspired by the pseudo-expectation view of the SOS hierarchy as advocated in the papers [5, 6].
2 Proof overview

To prove Theorem 1.2, we need to show that given any pairwise independent distribution µ over {±1}^k, one can come up with I, a collection of tuples {C_1,...,C_m} of literals, and a pseudo-expectation operator Ẽ that "pretends" to be the expectation of a valid distribution whose projection onto any C_i is µ. In fact, our choices for both I and Ẽ will not be novel and follow prior works in this area. For I, as mentioned, we will simply use a random set of tuples (or more accurately, a set corresponding to a hypergraph with sufficiently strong expansion properties), as was done by previous works dealing with weaker hierarchies [9, 26, 19]. It turns out that given this choice, the pseudo-expectation Ẽ is essentially "forced", and again, we use the same pseudo-expectation used in prior works such as [9], though we describe it slightly differently. This pseudo-expectation corresponds in some sense to the "maximum entropy distribution" conditioned on satisfying our constraints (though of course it is not an actual distribution but only a pseudo-distribution in the sense of [8]). Those prior works have shown that for every set S of o(n) variables, there is a distribution ν_S over the variables in S that agrees with Ẽ. The main difference is that we prove that for some d = Ω(n), Ẽ is a valid degree-d pseudo-expectation operator; that is, it satisfies the non-negativity / positive semidefiniteness condition Ẽ[p²] ≥ 0 for every polynomial p of degree at most d/2. This is a more "global" property, as the polynomial p might depend on all n variables, which makes it more challenging to prove.

Our approach is to essentially diagonalize Ẽ. That is, we will show an explicit construction of polynomials χ̃_1,...,χ̃_M ∈ P^n_{d/2}, which we call local orthogonal functions, such that (1) {χ̃_i}_{i=1}^M spans the space P^n_{d/2}, (2) Ẽ[χ̃_i χ̃_j] = 0 for all i ≠ j, and (3) Ẽ[χ̃_i²] ≥ 0 for all i.
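Concretely, the positivity condition is equivalent to positive semidefiniteness of the "moment matrix" M[S,T] = Ẽ[χ_S χ_T], indexed by monomials of degree at most d/2, where χ_S(x) = Π_{i∈S} x_i and χ_S χ_T = χ_{SΔT} modulo the constraints x_j² = 1. The toy sketch below is our own illustration with tiny parameters, not the paper's construction: it builds this matrix for the genuine uniform distribution over {±1}^n, for which the matrix is the identity and hence trivially PSD:

```python
import numpy as np
from itertools import combinations

n, d = 4, 2  # toy parameters; the paper needs d = Omega(n)

# Index the moment matrix by all subsets of [n] of size <= d/2.
subsets = [frozenset(c) for r in range(d // 2 + 1)
           for c in combinations(range(n), r)]

def moment_matrix(E_char):
    # E_char(S) plays the role of Etilde[chi_S]; multilinearity over {±1}^n
    # gives chi_S * chi_T = chi_{S symmetric-difference T}.
    return np.array([[E_char(S ^ T) for T in subsets] for S in subsets])

# Under the uniform distribution, E[chi_S] = 1 if S is empty and 0 otherwise,
# so the moment matrix is the identity.
M = moment_matrix(lambda S: 1.0 if not S else 0.0)
```

For a pseudo-distribution that is not an actual distribution, the same matrix can fail to be PSD; the paper's whole difficulty is certifying PSD-ness for the specific Ẽ it constructs.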
The existence of these polynomials immediately implies the property we need, as, by representing every polynomial p as p = Σ_i p_i χ̃_i, we see that

    Ẽ[p²] = Σ_{i,j} p_i p_j Ẽ[χ̃_i χ̃_j] = Σ_i p_i² Ẽ[χ̃_i²] ≥ 0.

We now review the construction of the instance, as well as the pseudo-expectation operator, and then discuss how we come up with these local orthogonal functions. As mentioned above, our instance I = (C_1,...,C_m) will simply be a random instance, which we think of as a k-uniform hypergraph with m hyperedges C_1,...,C_m. After some pruning we can assume this hypergraph has girth Ω(log n). By a simple Chernoff + union bound argument, if m > cn for a sufficiently large constant c, then for every assignment x ∈ {±1}^n, the induced distribution {C_i(x)}_{i∼[m]} will be ε-close to the uniform distribution. For this informal overview, suppose that we merely want to establish the existence of a degree-d pseudo-expectation operator for some large constant d. Note that this means that sets of at most O(d) variables form a forest (i.e. a disjoint collection of trees) in this hypergraph.

We now describe the pseudo-expectation operator Ẽ, which in some sense is almost "forced" as the only natural operator for this instance. (As mentioned, this part is not novel and the same operator was used by works such as [9]; however, we describe it somewhat differently.) We construct Ẽ by defining for every set S of at most d variables a distribution ν_S over {±1}^S such that (1) for every clause C contained in S, the projection of ν_S to C equals µ, and (2) the distributions are locally consistent, in the sense that if S ⊆ U then the projection of ν_U to S equals ν_S. The definition of ν_S is very simple. First, say for the purposes of this informal overview that a set S is closed if every clause C in I is either completely contained in S or intersects it in at most a single variable.
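The product form of ν_S on a tree of clauses, described next, is easy to sanity-check numerically. In the sketch below (our own toy example, not from the paper: two 3XOR clauses sharing one variable, with bits in {0,1}), the distribution proportional to the product of clause factors has every clause marginal exactly µ, and the normalizing constant equals 2^{k·(#clauses) − (#variables)}:

```python
from itertools import product
from math import prod

k = 3
# pairwise independent mu: uniform over even-parity strings (3XOR)
mu = {x: 0.25 for x in product((0, 1), repeat=k) if sum(x) % 2 == 0}

clauses = [(0, 1, 2), (2, 3, 4)]  # two clauses sharing variable 2: a tiny tree
n_vars = 5

def tree_distribution(clauses, n_vars):
    # weight(x) = product over clauses of mu(x restricted to the clause)
    w = {x: prod(mu.get(tuple(x[i] for i in C), 0.0) for C in clauses)
         for x in product((0, 1), repeat=n_vars)}
    Z = sum(w.values())
    return {x: p / Z for x, p in w.items() if p}, Z

nu, Z = tree_distribution(clauses, n_vars)

def clause_marginal(nu, C):
    m = {}
    for x, p in nu.items():
        key = tuple(x[i] for i in C)
        m[key] = m.get(key, 0.0) + p
    return m
```

This is exactly the "sample down the tree" process evaluated in closed form; pairwise independence (in fact, uniformity of single-variable marginals) is what makes the traversal order irrelevant.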
If S of size O(d) is closed and connected (as a subgraph of I), then it is a tree in the hypergraph I. In this case, we define the distribution ν_S as follows: to sample x from ν_S, we pick an arbitrary clause C ⊆ S and sample its variables according to µ. We then continue down the tree, sampling the variables of all the clauses that intersect with C, and so on. It is not hard to show that because of pairwise independence (and in fact simply because every marginal is uniform) this process will always yield the same distribution regardless of the traversal order, and the probability of x ∈ {±1}^S being sampled under this distribution will be proportional to Π_{C⊆S} Pr[µ = C(x)]. If a set S is closed but not connected, then the distribution ν_S is obtained by making independent choices for each of the connected components of S. For a general (not necessarily closed) set S, we define the closure of S, denoted by cl(S), to be the minimal closed superset of S (this is well defined; one can show that intersections of closed sets are closed, and thus the minimal closed set is the intersection of all closed sets containing S). A fairly simple argument using the girth condition can be used to argue that |cl(S)| ≤ O(|S|) for every |S| ≤ d. We then define ν_S to be the distribution obtained by projecting the distribution ν_{cl(S)} to S.

(Footnote: If we don't prune these clauses, then our proof guarantees that for a 1 − o(1) fraction of the clauses we get the marginal distribution to be µ. It is possible that this can be upgraded to all of the clauses at the expense of some additional complication, but we have not checked whether or not that's the case.)

The collection of local distributions
so obtained satisfies (1) by construction, and it is not hard to show that it satisfies (2) as well. Since all polynomials of degree at most d are spanned by the set of polynomials {χ_S}_{|S|≤d} (which we will call the characters), where χ_S(x) = Π_{i∈S} x_i, to define the pseudo-expectation operator it suffices to define Ẽ[χ_S] for every |S| ≤ d. We simply define Ẽ[χ_S] to be E_{x∼ν_S}[χ_S(x)].

Figure 1: Even though A and B are collections of disjoint clauses, and hence are "closed" under our definition, their distributions could be correlated due to the existence of the set C.

We now describe how we come up with the functions χ̃_1,...,χ̃_M. Intuitively, we would like to come up with these functions via a Gram-Schmidt-like process. That is, we fix some ordering A_1 ≺ ... ≺ A_M of the M = (n choose ≤ d) sets of size at most d, and define χ_i to be χ_{A_i}. Now, we would want to define χ̃_i to be the component of χ_i orthogonal to the span of χ_1,...,χ_{i−1}, where we define orthogonality using Ẽ as an inner product. We would then get that Ẽ[χ̃_i χ_j] = 0 for all j < i, which would imply that Ẽ[χ̃_i χ̃_j] = 0 for all i ≠ j (as χ̃_j is spanned by χ_1,...,χ_j). However, this is of course circular reasoning, since we cannot assume that Ẽ is positive semidefinite (and hence a valid inner product), since this is exactly what we are trying to prove!

However, because we know that on every small set U, Ẽ agrees with an actual expectation operator (the one associated with the actual distribution ν_U), we do know that it is psd when it is restricted to this small set U.
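On a small set U where Ẽ agrees with a genuine distribution, it defines a genuine inner product, and the orthogonalization step is ordinary Gram-Schmidt with respect to that form. A generic sketch (our own; M is an arbitrary positive definite matrix standing in for Ẽ restricted to U):

```python
import numpy as np

def gram_schmidt(vectors, M):
    # Orthogonalize the rows of `vectors` w.r.t. the form <u, v> = u @ M @ v,
    # where M is PSD (standing in for Etilde restricted to a small set).
    out = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in out:
            norm2 = u @ M @ u
            if norm2 > 1e-12:  # skip directions that are null for the form
                w = w - ((w @ M @ u) / norm2) * u
        out.append(w)
    return np.array(out)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = A @ A.T + np.eye(4)        # a positive definite form
Q = gram_schmidt(np.eye(4), M)  # orthogonalize the standard basis
```

The point of the paper's argument is that this textbook process cannot be run globally (Ẽ is not known to be an inner product on all of P^n_{d/2}); the tailor-made ordering described next keeps every individual step inside a small set where it can.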
Therefore, if for some reason, when we do this Gram-Schmidt process and express χ̃_i as some linear combination Σ_{j≤i} α_j χ_j, we get lucky and this linear combination happens to be extremely sparse, then we can actually carry through the argument described above. Specifically, it turns out that it suffices for the set U = ∪{A_j | α_j ≠ 0} to be sufficiently small so that Ẽ is a valid inner product on U ∪ A_i. However, a priori this hope seems dubious, since the Gram-Schmidt process is very sequential, and we need to do it for (n choose ≤ d) steps. It seems quite possible that we would create long-distance correlations in the process, whereby we would end up needing to express χ̃_i using many χ_j's for sets A_j that are quite far from A_i. (See Figure 1 for one example of a correlation that could arise between two disjoint collections of clauses A and B.) Nevertheless, we show that we are in fact able to choose a tailor-made ordering of the sets so that this hope is (essentially) materialized. An important observation that comes to our aid here is that our local distributions, intuitively speaking, satisfy the following: if two sets A and B are sufficiently far apart in the hypergraph I, then the distribution ν_{A∪B} is obtained by taking the product of the independent distributions ν_A and ν_B. We use this observation to argue that, if we choose the ordering on the sets of size at most d in the right way, then, when we express χ̃_i as a linear combination of the functions χ_j, j < i, we only use j's such that A_j is contained in a certain (carefully defined) small "ball" in the hypergraph around the set A_i.
The crucial result that we need here is to show that whenever there is a dependence between the local distribution on some set A and the local distribution on some set B that came before A in our order, then either B is contained in this "ball" around A, or the correlation between A and B is completely "explained" by the intersection of the closure of B with this ball, in the sense that conditioned on any assignment to the variables in the intersection, the local distributions on A and B are independent. This will allow us to argue that we don't need to use χ_B to express χ̃_i, but can restrict ourselves to characters contained in that ball. Moreover, and crucially, we will show that our ordering has the property that all the characters we will need to use must have come before A as well.

Handling Ω(n) rounds. The above overview can be converted into a full proof with some care when d = o(log n) by exploiting the acyclicity of all subgraphs involved. Extending to d = Ω(n), however, introduces additional subtleties. When d exceeds O(log n), subgraphs induced by d vertices of I can have cycles. An immediate effect of this is that the definition of a closed set that we gave before no longer yields consistent local distributions on every collection of d variables. An example of a problem that arises when cycles can exist on a set of vertices is illustrated in Figure 2. To fix this, we define a stronger notion of a closed set S that guarantees that all paths of length at most 4 between any two vertices in S are completely contained inside S. This notion of closure differs from the one that Benabbas et al. [9] use. An appeal to the expansion property of I (instead of high girth as before) can be used to show that the closure of a set S is at most a constant factor larger than |S|.
Similarly, as before, we need to show that there exists a (suitably defined) ball, Ball(A), around any set A of variables (of size at most d) such that the correlations with any other set B of size at most d are "captured" by the intersection of Ball(A) and B. This needs a more careful argument. In particular, the correlations (even in the low-girth case) are actually not necessarily captured by the intersection of Ball(A) with B, but rather with some set B′ that is related, but not identical, to B. However, the crucial property that we require is that the set B_in = Ball(A) ∩ B′ satisfies (1) if B came before A in the ordering, then so will B_in, and (2) |B_in| + |B \ Ball(A)| ≤ |B|. This second property is more complicated to prove in the case where |B| can be much larger than the girth bound, but turns out to hold there as well. The bottom line is that, with additional care, the high-level picture provided by this overview can indeed be implemented, and we give a full analysis based on the local Gram-Schmidt-like process in Section 6.

3 Preliminaries

We collect some standard definitions and notation here. A (k,n)-instance is a k-uniform hypergraph I = {C_1,...,C_m} over [n] such that every hyperedge (also known as a clause) C = (i_1,...,i_k) ∈ I is labeled by a string σ = σ_C ∈ {±1}^k. We identify a clause C with the function that maps x ∈ {±1}^n to y_1,...,y_k, where y_j = σ_j x_{i_j}. We will sometimes also consider C as a tuple of the literals (σ_1 x_{i_1},...,σ_k x_{i_k}). We write V(C) for the variables involved in (or covered by) a clause C, and similarly for V ⊆ [n] we write C(V) for the set of all clauses C such that V(C) ⊆ V. For any x ∈ {−1,1}^n, we write x_A to denote the tuple of coordinates in the subset A ⊆ [n].
If x ∈ {−1,1}^A and y ∈ {−1,1}^B for disjoint sets A and B, we will write x ◦ y for the string in {−1,1}^{A∪B} that projects to x on the coordinates in A and to y on the coordinates in B. Unless explicitly mentioned, the base of all logarithms appearing in the paper is assumed to be 2. We consider the arity k of our tuples to be a constant, and so O-notation may hide the dependence on k.

We now define some standard ideas in the context of hypergraphs.

Definition 3.1.
Let G be a hypergraph. G is said to be a path if its hyperedges can be ordered into a sequence C_1, C_2,...,C_ℓ such that for each 2 ≤ i ≤ ℓ, C_i ∩ C_{i−1} ≠ ∅, and C_i ∩ C_j = ∅ for every |i − j| > 1. G is said to be a cycle if it has at least two hyperedges, there is a cyclic ordering of its hyperedges C_0, C_1,...,C_{ℓ−1}, and there are distinct vertices v_0,...,v_{ℓ−1} with v_i ∈ C_i ∩ C_{(i+1) mod ℓ} for all i. G is said to be a forest if it does not contain any cycle. A forest is a tree if it is connected (i.e. for every two distinct vertices u and v, there is a path C_1,...,C_ℓ such that u ∈ C_1 and v ∈ C_ℓ). The degree of G is the maximum number of hyperedges that intersect any given hyperedge in G. The length of the shortest cycle in G is said to be the girth of G. For any vertices u, v of a hypergraph G, we define the distance dist(u,v) of u, v in G as the minimum number of hyperedges in any path that joins u and v in G. For subsets of vertices S, T, we define dist(S,T) := min_{s∈S, t∈T} dist(s,t).

Next, we define the notion of expansion in a k-uniform hypergraph G:

Definition 3.2 (Coefficient of Expansion). A k-uniform constraint hypergraph G is said to be (r,β)-expanding if any collection C of at most r hyperedges of G covers at least (k − 1 − β)|C| vertices of G, i.e. |{v | ∃ C ∈ C, v ∈ C}| ≥ (k − 1 − β)|C|. We call β the coefficient of expansion of G.

Let I be a (k,n)-instance. We now describe the properties of the (k,n)-instances that we need, and give a construction for them in Section B of the Appendix by taking a random instance and removing a few clauses. Specifically, we show the existence of nice instances, namely ones that satisfy the properties described in the lemma below:

Lemma 3.3.
Fix 1 > ε, δ > 0 and γ ≥ e^k k. Then, there exists a k-uniform constraint hypergraph G with γn edges such that, for η = (1/γ)^{2/δ} and 1/τ = 4 log(γk), G:

1. is (ηn, δ)-expanding,

2. has girth g ≥ τ log(n).

We will use this lemma with any given ε (the soundness slack), a suitable constant δ, and γ = e^k k/ε². We will call the instances that satisfy the conditions of the lemma above nice. For such instances, it is also easy to prove the soundness part (part 1) of Theorem 1.2 (see Section B.1 of the Appendix), which we record in the following lemma.

Lemma 3.4.
For every ε > 0 and k, if n is sufficiently large, then there exists a nice (k,n)-instance I with the property that for every x ∈ {±1}^n, the distribution {C(x)}_{C∈I} is ε-close in total variation distance to the uniform distribution on {±1}^k.

4 Defining the operator Ẽ

Throughout the rest of this paper we fix I = (C_1,...,C_m) to be a nice (k,n)-instance with coefficient of expansion β. Thus, whenever we mention edges, paths, or clauses, they will always be with respect to the hypergraph I. In this section, we define a linear operator Ẽ on P^n_s, the linear space of multilinear polynomials on R^n of degree at most s = ηn. We will ensure that the Ẽ so defined will satisfy Ẽ[f(C(x))] = E[f(µ)] for every clause C ∈ I and function f : {±1}^k → R. In the next section, we will show that the Ẽ we define here is in fact a pseudo-expectation operator on P^n_d for d = ηn/k, and thus obtain our main result. The Ẽ operator we use was defined in previous works such as Benabbas et al. [9] and later also used by Tulsiani and Worah [26] to study weaker LP/SDP hierarchies. Here, we describe a construction of the same operator in a slightly different way, so as to help us in the proof of our main result.

To define Ẽ, it is enough to define Ẽ[χ_S] for characters χ_S for each S ⊆ [n], |S| ≤ s, as one can then extend Ẽ linearly to all of P^n_s. To do this, we define a probability distribution ν_X for every X ⊆ [n] such that |X| ≤ s, and then set Ẽ[χ_S] to be the expectation of χ_S under ν_S. We first define the concept of closed sets, which is central to our argument.
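The closures defined next, as well as the girth and expansion conditions, are phrased via clause-paths and the distance dist(u,v) of Definition 3.1. A small sketch (our own helper, not from the paper) computing this distance by breadth-first search over hyperedges:

```python
from collections import deque

def hyper_dist(clauses, u, v):
    # dist(u, v): minimum number of hyperedges on a path joining u and v.
    # (Shortest walks and shortest paths coincide, so BFS suffices.)
    if u == v:
        return 0
    incident = {}
    for C in clauses:
        for x in C:
            incident.setdefault(x, []).append(C)
    dist = {u: 0}
    q = deque([u])
    while q:
        w = q.popleft()
        for C in incident.get(w, ()):
            for x in C:
                if x not in dist:
                    dist[x] = dist[w] + 1
                    q.append(x)
    return dist.get(v, float("inf"))
```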
Definition 4.1 (Closure and closed sets). For every R ≥ 1, a set A ⊆ [n] is R-closed if for every v, v′ ∈ A, any path of length at most R between v and v′ is contained in A. We say that A is closed if it is 4-closed.

We define the R-closure of A, denoted by cl_R(A), to be the intersection of all sets B such that A ⊆ B and B is R-closed. The closure of A, denoted by cl(A), is the 4-closure of A.

Remark 4.2. Readers familiar with the definition of closure (or advice set) in the work of [9] or [26] will find the definition of closure above slightly different. The main difference is that our definition gives us some nice properties, such as uniqueness and the fact that the intersection of two closed sets is closed, which are very helpful for our proof. We stress, however, that the actual pseudo-expectation is the same as that of those works.
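The definition above can be turned into a computation directly: repeatedly add every clause on a short path between two vertices already collected, until a fixed point. The sketch below is our own simplified version (it searches walks of at most R clauses, which suffices on these toy inputs) of the constructive procedure that the next lemma makes precise:

```python
from itertools import combinations

def reaches(clauses, u, v, R):
    # All sets of clauses forming a walk of at most R clauses joining u and v.
    walks = []
    def dfs(frontier, used):
        if v in frontier and used:
            walks.append(set(used))
            return
        if len(used) == R:
            return
        for C in clauses:
            if C not in used and frontier & set(C):
                dfs(set(C), used + [C])
    dfs({u}, [])
    return walks

def closure(clauses, S, R):
    A = set()  # clauses added so far
    changed = True
    while changed:
        changed = False
        verts = set(S) | {x for C in A for x in C}
        for u, v in combinations(sorted(verts), 2):
            for walk in reaches(clauses, u, v, R):
                if not walk <= A:
                    A |= walk
                    changed = True
    return set(S) | {x for C in A for x in C}
```

On a chain of three clauses, the 2-closure of {1, 3} pulls in the first two clauses, while {1, 5} is already 2-closed (the vertices are too far apart) but not 3-closed.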
Next, we give a constructive definition of closure of a set.
Lemma 4.3.
Given S ⊆ [n] and any R < min{g/2, 1/(2β)}, the R-closure of S can be obtained by the following procedure run on S: Set A := ∅. While there exist v, v′ ∈ V(A) ∪ S such that some path of length at most R between v and v′ in I is not contained in A, add every clause in that path to A. Output V(A) ∪ S.

Proof. Observe that the procedure terminates, as there are only finitely many clauses. Further, the output is closed, by virtue of the termination condition of the procedure. By induction on the time at which a path is added in the procedure, it is easy to show that every R-closed set containing S must contain that path. Thus V(A) ∪ S is contained in every R-closed set containing S, and in particular V(A) ∪ S ⊆ cl_R(S). The lemma now follows by the minimality of cl_R(S).

Next, we bound the size of cl_R(S).

Lemma 4.4.
For any R < min{g/2, 1/(2β)} and S ⊆ [n] such that |S| ≤ ηn/(2R), we have |C(cl_R(S))| ≤ 2R|S| and |cl_R(S)| ≤ 3Rk|S|.

Proof. Consider the procedure described in Lemma 4.3. Let S_iso ⊆ cl_R(S) be the set of isolated vertices in cl_R(S). Observe that the procedure cannot add any isolated vertices, and thus S_iso ⊆ S. Define S′ = S \ S_iso. Then cl_R(S) = cl_R(S′) ∪ S_iso.

If the process terminates before adding a total of q = |S′|/(1/R − β) clauses, then there is nothing to prove, since |S′| ≤ |S| ≤ ηn/(2R) yields that q ≤ ηn. Thus suppose, for the sake of a contradiction, that the procedure adds more than q clauses, and let the i-th round of the procedure be the first round where the number of clauses added exceeds q.

Let C_i be the set of clauses added in the procedure up to the i-th round, and let S′_i be the set of variables obtained by taking the union of S′ with the variables covered by the clauses added. Further, suppose that the i-th round adds q_i clauses. Then |C_i| ≤ q + q_i ≤ ηn, and thus C_i must satisfy the expansion requirement: |V(C_i)| ≥ (q + q_i)(k − 1 − β). On the other hand, any new path of length j ≤ R added in a round adds at most jk − (j − 1) − 2 new vertices (consecutive clauses on the path share a vertex, and the two endpoints are already present). Thus, on average, each of the at most j new clauses added in any round of the procedure contributes at most k − 1 − 1/j ≤ k − 1 − 1/R new vertices, and so |S′_i| ≤ |S′| + (q + q_i)(k − 1 − 1/R).

Now, (q + q_i)(k − 1 − β) ≤ |V(C_i)| ≤ |S′_i| ≤ |S′| + (q + q_i)(k − 1 − 1/R). This yields |S′| ≥ (q + q_i)(1/R − β) > |S′|, using that q = |S′|/(1/R − β) and that 1/R − β > 0. This is a contradiction.

The sizes claimed in the lemma now follow by observing that 1/(1/R − β) ≤ 2R (since β ≤ 1/(2R)) and that every clause contributes at most k new variables.

The following lemma summarizes some simple properties of the closures defined here.

Lemma 4.5 (Simple Properties of Closures).
1. For any R < g/2, if A and B are R-closed, then so is A ∩ B.

2. If A ⊆ B then cl_R(A) ⊆ cl_R(B).

3. Every connected component of cl_R(A) of size ≥ 2 intersects A in at least two elements.

4. Let A = A_1 ∪ A_2 ∪ ... ∪ A_m. Then cl(A) = cl(∪_{i=1}^m cl(A_i)).

Proof.
1. If there are two vertices v, v′ ∈ A ∩ B with a path of length at most R between them, then, since R < g/2, this path is unique, and since both A and B are R-closed, both of them contain it; hence the path is contained in A ∩ B.

2. By definition, cl_R(B) is an R-closed set containing B ⊇ A, and hence if cl_R(A) ⊄ cl_R(B), then cl_R(A) ∩ cl_R(B) would be an even smaller R-closed set that contains A, contradicting the minimality of cl_R(A).

3. Suppose otherwise that there is some connected component S of cl_R(A) with |S| ≥ 2 intersecting A in at most one element {x}. We claim that B = (cl_R(A) \ S) ∪ {x} is an R-closed set containing A, which contradicts the minimality of cl_R(A). Clearly B ⊇ A. Now suppose, for the sake of contradiction, that there were two vertices v ≠ v′ of distance at most R in B whose path is not in B. Then, since B ⊆ cl_R(A) and cl_R(A) is R-closed, the path between v and v′ must contain a vertex u ∈ S \ {x}. But one of v or v′ must be different from x (say v′), so v′ is connected to S inside cl_R(A), and hence v′ ∈ S, contradicting v′ ∈ B.

4. Let B = cl(∪_{i=1}^m cl(A_i)). Since cl(A) is closed and (by item 2) contains ∪_{i=1}^m cl(A_i), the minimality of B gives B ⊆ cl(A). Conversely, B is a closed set containing ∪_{i=1}^m A_i = A, so cl(A) ⊆ B by the minimality of cl(A).

4.1 Defining Ẽ via local distributions

Using the closures defined above, we define a local probability distribution on all closed sets and use it to define Ẽ. Let C = (v_1, v_2,...,v_k), where each v_j is the literal σ_j x_{i_j} for some σ_j ∈ {±1}. The distribution µ_C simply assigns to x ∈ {±1}^{V(C)} the probability µ(σ_1 x_{i_1},...,σ_k x_{i_k}) (i.e., the probability that C(x) = a under µ_C is set to µ(a) for every a ∈ {±1}^k).

The definition and the proof of consistency of the local distributions we define were shown by Benabbas et al. [9] for the weaker notion of closures they used (in order to define linear-round solutions in the Sherali-Adams hierarchy).
The argument for our notion of closure is similar, but we include it here for the sake of completeness.

For every set S ⊆ [n], |S| ≤ d, let cl(S) be the closure of S and let I_S be the set of isolated variables in cl(S). Define C(cl(S)) to be the set of all clauses C such that V(C) ⊆ cl(S). Then we set:

ν_{cl(S)}(x) = Z_{cl(S)} · Π_{C ∈ C(cl(S))} µ_C(x_C)   (1)

where x_C is the projection of x onto the coordinates in V(C), and Z_{cl(S)} = 2^{k|C(cl(S))| − |cl(S)|}. Observe that the above expression implies that the marginal distribution of ν_{cl(S)} over I_S is uniform. We extend the notation above and write ν_T for the marginal of ν_{cl(T)} on the variables in T.

We now show that ν_{cl(S)} defined above is indeed a probability distribution over cl(S).

Lemma 4.6.
Let A and B be closed sets such that A ⊆ B and |C(B)| ≤ ηn. Then:

1. ν_A is a valid probability distribution: Σ_{x ∈ {−1,1}^A} ν_A(x) = 1.
2. ν is locally consistent: for every x ∈ {−1,1}^A, ν_A(x) = Σ_{y ∈ {−1,1}^{B\A}} ν_B(x ∘ y).

The following claim, which we record as a lemma, will be useful in the proof.
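The computations below repeatedly use one consequence of pairwise independence: summing µ over all but at most two of its coordinates leaves a uniform factor 2^{−(number of fixed coordinates)}. A quick numerical sanity check (illustrative only) on the standard example, µ uniform over even-parity strings in {0,1}^3:

```python
from itertools import product, combinations

k = 3
# mu: uniform over {x in {0,1}^3 : x1 ^ x2 ^ x3 = 0}. This distribution
# is pairwise independent: any two coordinates are jointly uniform.
support = {x for x in product([0, 1], repeat=k) if sum(x) % 2 == 0}
mu = {x: (1 / len(support) if x in support else 0.0)
      for x in product([0, 1], repeat=k)}

# Fixing any set of at most 2 coordinates (i.e. summing over >= k - 2
# free ones) gives exactly 2^{-(#fixed)}, whatever the fixed values are.
for n_fixed in range(3):          # fix 0, 1, or 2 coordinates
    for idx in combinations(range(k), n_fixed):
        for vals in product([0, 1], repeat=n_fixed):
            total = sum(pr for x, pr in mu.items()
                        if all(x[i] == v for i, v in zip(idx, vals)))
            assert abs(total - 2.0 ** (-n_fixed)) < 1e-12
print("uniform-marginal check passed")
```

This is exactly the step invoked as "using |F_r| ≥ k − 2 and pairwise independence of µ" in the proof of Lemma 4.6 below.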
Lemma 4.7.
There exists an ordering C_1, C_2, …, C_r of the clauses in C_{A,B} and a partition of B \ A into sets F_1 ⊆ V(C_1), F_2 ⊆ V(C_2), …, F_r ⊆ V(C_r) such that for every j ≤ r, |F_j| ≥ k − 2 and F_j ∩ (∪_{i>j} V(C_i)) = ∅.

We first complete the proof of Lemma 4.6 and then prove Lemma 4.7.
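Lemma 4.7 is in essence a greedy peeling argument. The sketch below is one illustrative reading of it, not the paper's verbatim procedure: repeatedly pick a clause with at least k − 2 vertices lying outside A and in no other remaining clause (Claim 4.8 below is what guarantees such a clause exists for expanding instances), and let F_j collect the vertices of C_j that appear in no later clause:

```python
def peel(clauses, A, k):
    """Greedy ordering and partition in the spirit of Lemma 4.7.
    clauses: iterable of vertex sets; A: the 'anchored' vertex set.
    Raises if no clause with >= k - 2 private vertices exists (ruled
    out for expanding instances by Claim 4.8)."""
    A = set(A)
    remaining = [frozenset(C) for C in clauses]
    order, Fs = [], []
    while remaining:
        for C in remaining:
            others = set().union(*(D for D in remaining if D != C))
            if len(C - A - others) >= k - 2:
                break
        else:
            raise ValueError("no clause with k-2 private vertices")
        remaining.remove(C)
        later = set().union(*remaining) if remaining else set()
        order.append(C)
        Fs.append(C - A - later)   # F_j avoids A and all later clauses
    return order, Fs

# two 3-clauses sharing one variable, anchored at vertex 0
order, Fs = peel([{0, 1, 2}, {2, 3, 4}], {0}, 3)
print(Fs)  # a partition of {1, 2, 3, 4} with every |F_j| >= 1 = k - 2
```

By construction each F_j is disjoint from every later clause, which is precisely the property the proof of Lemma 4.6 needs in order to sum out the F_j's one clause at a time.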
Proof of Lemma 4.6.
Let Z_A = 2^{−|A| + k|C(A)|} and Z_B = 2^{−|B| + k|C(B)|}, and let C_{A,B} = C(B) \ C(A). Using (1), we have:

Σ_{y ∈ {−1,1}^{B\A}} ν_B(x ∘ y) = Σ_{y ∈ {−1,1}^{B\A}} Z_B · Π_{C ∈ C(B)} µ_C((x ∘ y)_C)
= Z_B Σ_{y ∈ {−1,1}^{B\A}} Π_{C ∈ C(A)} µ_C(x_C) · Π_{C ∈ C_{A,B}} µ_C((x ∘ y)_C).

To simplify notation, we write µ_i for µ_{C_i}. Using the ordering C_1, …, C_r and the partition F_1, …, F_r given by Lemma 4.7, and writing α_i for the restriction of y to F_i, we get

Σ_{y ∈ {−1,1}^{B\A}} ν_B(x ∘ y) = Z_B · Π_{C ∈ C(A)} µ_C(x_C) · Σ_{α_r ∈ {−1,1}^{F_r}} ··· Σ_{α_1 ∈ {−1,1}^{F_1}} Π_{i=1}^r µ_i(ζ_i ∘ α_i),

where ζ_i denotes the induced assignment to the variables in V(C_i) \ F_i. Since F_1 ∩ (∪_{i>1} V(C_i)) = ∅, the innermost sum involves only µ_1; and since |F_1| ≥ k − 2, the marginal of µ_1 on V(C_1) \ F_1 (at most two coordinates) is uniform by pairwise independence of µ:

Σ_{α_1 ∈ {−1,1}^{F_1}} µ_1(ζ_1 ∘ α_1) = 2^{−|V(C_1) \ F_1|}.

Continuing similarly for i = 2, 3, …, r,

Σ_{y ∈ {−1,1}^{B\A}} ν_B(x ∘ y) = Z_B · Π_{C ∈ C(A)} µ_C(x_C) · 2^{−Σ_{i=1}^r |V(C_i) \ F_i|}.

Now Σ_{i=1}^r |V(C_i) \ F_i| = kr − |B \ A|. Further, −|B| + k|C(B)| − kr + |B \ A| = −|A| + k|C(A)|. Thus Z_B · 2^{−Σ_{i=1}^r |V(C_i) \ F_i|} = Z_A, completing the proof of part 2. (Part 1 follows by applying the same computation to the pair ∅ ⊆ A, since ν_∅ = Z_∅ = 1.)

We now complete the proof of Lemma 4.7.

Proof of Lemma 4.7.
For every C ∈ C_{A,B} define Γ(C) = {v ∈ V(C) | for every C′ ≠ C in C_{A,B}, v ∉ V(C′)}. For any collection C of clauses in C_{A,B}, let ∆(C) = |∪_{C ∈ C} Γ(C)|. Similarly, define Γ_A(C) = Γ(C) \ A and ∆_A(C) = |∪_{C ∈ C} Γ_A(C)|. We make the following claim:

Claim 4.8. For any C ⊆ C_{A,B}, ∆_A(C) ≥ (k − 5/2 − 2β)|C|.

We first complete the proof of the lemma using the claim. Since ∆_A(C_{A,B}) ≥ (k − 5/2 − 2β)|C_{A,B}| and β < 1/4, there exists a clause C such that |Γ_A(C)| ≥ k − 2. Now V(C) \ A ⊇ Γ_A(C) and thus |V(C) \ A| ≥ k − 2. We place this clause at the beginning of the ordering, call it C_1, and set F_1 = V(C_1) \ A. We now iterate with C_{A,B} \ {C_1} to complete the construction, obtaining a clause C_2 ∈ C_{A,B} \ {C_1} such that |Γ_A(C_2)| ≥ k − 2. Since Γ_A(C_2) cannot intersect Γ_A(C_1), we can now set F_2 = V(C_2) \ (V(C_1) ∪ A). Continuing this way yields the required ordering and partition of B \ A.

We now complete the proof of the claim.

Proof of Claim 4.8.
Fix any C and consider any maximal connected subgraph with edge set C′ ⊆ C. If C′ consists of a single clause C, then |V(C) ∩ A| ≤ 1 (since A is closed) and V(C) ∩ V(C″) = ∅ for every C″ ≠ C in C. Thus |Γ_A(C)| ≥ k − 1.

Now suppose C′ consists of at least two clauses. We first claim that ∆(C′) ≥ (k − 2 − 2β)|C′|. To see this, observe that C′ is a collection of at most ηn clauses in I, so by expansion |V(C′)| ≥ (k − 1 − β)|C′|. Further, every v ∈ V(C′) \ ∪_{C ∈ C′} Γ(C) belongs to at least two different clauses in C′, and thus

(k − 1 − β)|C′| ≤ |V(C′)| ≤ ∆(C′) + (k|C′| − ∆(C′))/2.

Rearranging gives ∆(C′) ≥ (k − 2 − 2β)|C′|.

Next, we claim that for every v ∈ V(C′) ∩ A there exists a pair of clauses C, C″ ∈ C′ such that (V(C) ∪ V(C″)) ∩ A = {v}. Consider any clause C ∈ C′ such that V(C) ∩ A = {v}. If there is another clause C″ containing v, then V(C″) cannot intersect A in any other element (since A is closed), and we can let C, C″ be the pair corresponding to v. Otherwise, there exists a clause C″ ∈ C′ such that V(C″) ∩ V(C) ≠ ∅ (since C′ is connected) and V(C″) ∩ A = ∅ (as otherwise there would be a short path between two distinct vertices of A lying outside of A, contradicting that A is closed). Further, observe that all such pairs are disjoint: if two pairs intersected, they would induce a short path between two distinct vertices of A that is not contained in A (violating the closedness of A).

Thus |V(C′) ∩ A| ≤ |C′|/2, and we must have:

∆_A(C′) ≥ ∆(C′) − |V(C′) ∩ A| ≥ (k − 2 − 2β)|C′| − |C′|/2 = (k − 5/2 − 2β)|C′|.

Since every maximal connected component C′ inside C satisfies ∆_A(C′) ≥ (k − 5/2 − 2β)|C′|, we must have ∆_A(C) ≥ (k − 5/2 − 2β)|C| as promised. This completes the proof of the claim.

Ẽ and some basic properties

The following is immediate from (1):
Lemma 4.9.
Suppose A and B are closed disjoint sets such that A ∪ B is closed. Then ν_{A∪B}(x) = ν_A(x_A) · ν_B(x_B).

We now define the pseudo-expectation operator associated with the local distributions {ν_T}_{|T| ≤ s}:

Definition 4.10 (Pseudo-Expectation). For the collection of consistent local probability distributions {ν_T}_{|T| ≤ s} defined in (1), where s is a sufficiently small constant multiple of ηn, we define Ẽ on P^n_s by Ẽ[χ_S] = E_{ν_S}[χ_S] for every |S| ≤ s.

Corollary 4.11.
Let I be a nice (k, n) instance and µ a pairwise independent distribution over {±1}^k. Then the family of local distributions {ν_X}_{X ⊆ [n], |X| ≤ d} satisfies:

1. Completeness: For every clause C of I, ν_{V(C)} = µ.
2. Consistency: for every S ⊆ T ⊆ [n] with |T| ≤ d, the marginal of ν_T onto S is ν_S.

Proof. The completeness property follows from (1) and C(V(C)) = {C}. The consistency property follows from Lemma 4.6.

Finally, since Ẽ corresponds to a valid expectation locally, we obtain that Ẽ induces a positive semidefinite (PSD) inner product on any space of functions of a small number of variables.

Lemma 4.12 (Local PSDness). Let Ẽ be the pseudo-expectation operator defined by the local distributions {ν_S}_{|S| ≤ s}. Let T be a subset of [n] of size at most s. Then for every f ∈ V = Span{χ_A | A ⊆ T}, Ẽ[f²] ≥ 0.

In this section we take an important step towards showing the positivity property of our pseudo-distribution, by showing that if two sets A and B are sufficiently closed, then the local distribution on A ∪ B is determined only by the clauses that are contained in A or in B. In particular, this implies that if A and B are disjoint then the distribution on A is independent of the distribution on B. The main result of this section is the following expression for the local distribution on the union of A and B, where A is R-closed for a sufficiently large constant R and B is closed.

Lemma 5.1 (Local Distribution on Union of Two Closures). Suppose A is R-closed for a sufficiently large constant R and B is closed. Then, for any x ∈ {−1,1}^{A∪B},

ν_{A∪B}(x) = Z_{A,B} · Π_{C : V(C) ⊆ A∪B} µ_C(x_C),

where Z_{A,B} = 2^{k|C(A∪B)| − |A∪B|}.

We make two convenient definitions before proceeding; see Figure 3:

Definition 5.2 (Bridge Paths). For any two closed sets A and B, a path P of constant length is said to be a bridge path for the pair A, B if |P ∩ A| = |P ∩ B| = 1.

Definition 5.3 (Bridge-Closure Paths).
For any two closed sets A and B, a path P of constant length is said to be a bridge-closure path for the pair A, B if there exists a bridge path P′ such that |P′ ∩ P| = 1 and |P ∩ B| = 1 but P ∩ A = ∅.

Proof overview. Since the proof is rather technical, let us start with a high-level overview. We first show that the only extra clauses added to cl(A ∪ B) come from bridge and bridge-closure paths. Moreover, all these additional paths are disjoint apart from their end points. What this amounts to is that the new connections between A and B can be thought of as a collection of disjoint trees T_1, …, T_r such that each of these trees has a root in A and its leaves in B. The marginal distribution over A ∪ B is obtained by summing over all possible assignments to the intermediate nodes in these trees. Thus at the heart of the proof is the observation that for every such tree T with root x_0 and leaves x_1, …, x_ℓ, if we consider the distribution over the variables of T induced by the tree (i.e., where the probability of x is proportional to Π_{C ∈ C(T)} µ_C(x_C)), then the marginal distribution over {x_0, x_1, …, x_ℓ} is uniform. Hence these trees create no dependence between A and B.

As a final remark, observe that the example from Figure 1 shows that A and B merely being closed is not enough to guarantee the statement of the lemma. While we believe that at least one of the sets A and B must be R-closed for some large enough constant R for the lemma to hold, we currently do not have a counterexample demonstrating this point. We now proceed with the actual proof.

Proof of Lemma 5.1. Let D = cl(A ∪ B). Let C_{A,B} and C_B be the sets of bridge paths and bridge-closure paths for the pair A, B, respectively. Observe that V(C_{A,B}) ∪ V(C_B) ⊆ D.
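Before continuing, the tree fact from the overview can be sanity-checked numerically on the smallest case, a chain of two 3-XOR clauses sharing one variable (illustrative only; µ is the pairwise independent even-parity distribution):

```python
from itertools import product

def mu(x):   # uniform over even-parity triples: pairwise independent
    return 0.25 if sum(x) % 2 == 0 else 0.0

# clauses C1 on (x0,x1,x2) and C2 on (x2,x3,x4): a "tree" with root x0,
# leaf x4 and internal variables x1, x2, x3
Z = 2.0 ** (3 * 2 - 5)          # normalization 2^{k|C| - |V|} as in (1)
for x0, x4 in product([0, 1], repeat=2):
    m = sum(Z * mu((x0, x1, x2)) * mu((x2, x3, x4))
            for x1, x2, x3 in product([0, 1], repeat=3))
    assert abs(m - 0.25) < 1e-12
print("marginal on (root, leaf) is uniform")
```

Each clause of the chain has an internal variable that is summed freely, which is exactly why pairwise independence suffices to make the (root, leaf) marginal uniform.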
We now show that these are the only extra clauses in D. We first make a few simple observations. The first describes how bridge paths and bridge-closure paths intersect.

Claim 5.4 (Intersections).
1. For any distinct P_1, P_2 ∈ C_{A,B}, P_1 ∩ P_2 ⊆ A ∪ B.
2. For any distinct P_1, P_2 ∈ C_B, P_1 ∩ P_2 ⊆ V(P) ∪ B, where P is a bridge path.
3. For any P ∈ C_B and P′ ∈ C_{A,B}, |V(P) ∩ V(P′)| ≤ 1.
4. Suppose P_1, P_2 ∈ C_B are such that P_1 ∩ P ≠ ∅ and P_2 ∩ P′ ≠ ∅ for some bridge paths P ≠ P′. Then P_1 ∩ P_2 = ∅.

Proof.
1. If the claim were not true, then there would be a short path between two vertices of A, violating that A is R-closed.
2. Suppose first that there is a bridge path P such that P_1 ∩ P ≠ ∅ and P_2 ∩ P ≠ ∅. If either of P_1 or P_2 intersects P in more than one element, then there is a short cycle in G, which violates the fact that G has ω(1) girth. If P_1 and P_2 intersect in an element not contained in V(P), then again there is a short cycle in G, violating the high girth of G. Similarly, if P_1, P_2 intersect inside B, then they cannot also intersect outside of B, and further they cannot both intersect the same bridge path, as either situation would yield a short cycle in G. Thus in both cases P_1 ∩ P_2 ⊆ V(P) ∪ B for some bridge path P.
3. Otherwise there is a short cycle in G, violating that G has girth ω(1).
4. If not, then if |(P ∪ P′) ∩ A| = 1 there is a short cycle in the graph, contradicting our assumption on the girth. Otherwise |(P ∪ P′) ∩ A| = 2, which means that there is a short path between two distinct vertices of A not contained in A.

The next observation shows that there is no short path, not contained in A ∪ B, connecting two bridge paths, two bridge-closure paths, or a bridge path and a bridge-closure path.

Claim 5.5 (No Extra Paths).
1.
There is no short path P, not contained in A, that connects a bridge path P′ and A.
2. There is no short path, not contained in A, that connects P ∈ C_{A,B} with P′ ∈ C_B.
3. There is no short path connecting distinct P, P′ ∈ C_B.

Proof.
1. Otherwise there is a short path between two vertices of A not contained in A, violating the fact that A is R-closed.
2. Otherwise there is a short path between two vertices of A, violating that A is R-closed.
3. Otherwise there is a short path, not contained in A, connecting two vertices of A.

The following claim is now a consequence of the claims above:

Claim 5.6. For any C such that V(C) ⊄ A ∪ B but V(C) ⊆ D, we have C ∈ C_{A,B} ∪ C_B.

Proof of Claim. Consider the iterative procedure of building the closure of A ∪ B by adding paths one by one in some order. Let P be the first path in this order that violates the claim. Then P intersects either two bridge paths, or a bridge path and A, or a bridge path and a bridge-closure path, or two bridge-closure paths. Each of these situations is explicitly ruled out by the claims above. This completes the argument.

Let Z = 2^{k|C(D)| − |D|}; by Claim 5.6, C(D) is the disjoint union of C(A ∪ B) and the clauses of C_{A,B} ∪ C_B, so Z = 2^{k|C(A∪B)| + k|C_{A,B} ∪ C_B| − |D|}. Observe that, directly from the definitions, Z · 2^{|D| − |A∪B| − k|C_{A,B} ∪ C_B|} = Z_{A,B}. For every clause C ∈ C_{A,B} ∪ C_B, let V′_C = V(C) \ (A ∪ B) and V″_C = V(C) ∩ (A ∪ B). Similarly, let D′ = D \ (A ∪ B) and D″ = D ∩ (A ∪ B). Next, we claim:

Claim 5.7. Z · Σ_{γ ∈ {−1,1}^{D′}} Π_{C ∈ C_{A,B} ∪ C_B} µ_C(x_{V″_C} ∘ γ_{V′_C}) = Z_{A,B}.

Proof.
Write D′ = V_1 ∪ V_2 as a disjoint union, where V_1 = D′ \ V(C_{A,B}) and V_2 = D′ \ V_1. Then

Z · Σ_{γ ∈ {−1,1}^{D′}} Π_{C ∈ C_{A,B} ∪ C_B} µ_C(x_{V″_C} ∘ γ_{V′_C})
= Z Σ_{γ ∈ {−1,1}^{V_2}} Π_{C ∈ C_{A,B}} µ_C(x_{V″_C} ∘ γ_{V′_C}) · Σ_{γ′ ∈ {−1,1}^{V_1}} Π_{C ∈ C_B} µ_C(x_{V″_C} ∘ (γ ∘ γ′)_{V′_C}).

Now observe that for every C ∈ C_B, V(C) \ V_1 = V(C) ∩ (A ∪ B ∪ V_2) has at most two elements. Thus, by pairwise independence of µ, the above equals

Z Σ_{γ ∈ {−1,1}^{V_2}} Π_{C ∈ C_{A,B}} µ_C(x_{V″_C} ∘ γ_{V′_C}) · Π_{C ∈ C_B} 2^{−|V(C) ∩ (A ∪ B ∪ V_2)|}.

Similarly, for every C ∈ C_{A,B}, V(C) ∩ (A ∪ B) contains at most two elements. Thus the above equals

Z · Π_{C ∈ C_{A,B}} 2^{−|V(C) ∩ (A ∪ B)|} · Π_{C ∈ C_B} 2^{−|V(C) ∩ (A ∪ B ∪ V_2)|} = Z_{A,B}.

We can now write, using (1):

ν_{A∪B}(x) = Z · Σ_{γ ∈ {−1,1}^{D′}} Π_{C : V(C) ⊆ D} µ_C(x_{V″_C} ∘ γ_{V′_C})

(using Claim 5.6)

= Z Σ_{γ ∈ {−1,1}^{D′}} Π_{C : V(C) ⊆ A∪B} µ_C(x_C) · Π_{C ∈ C_{A,B} ∪ C_B} µ_C(x_{V″_C} ∘ γ_{V′_C})

(using Claim 5.7)

= Z_{A,B} · Π_{C : V(C) ⊆ A∪B} µ_C(x_C).

This completes the proof.

Ẽ is positive semidefinite

In this section, we prove our main result. Our proof will follow easily from the following lemma, which is the main result of this section.

Lemma 6.1 (Main Lemma). Let P^n_d = Span{χ_A | |A| ≤ d} be the space of multilinear polynomials on R^n of degree at most d, where d is a sufficiently small constant multiple of ηn/k. There exists a collection of functions {χ̃_i | 0 ≤ i ≤ M} ⊆ P^n_d, for M = (number of subsets of [n] of size at most d) − 1, such that:

1. P^n_d = Span{χ̃_i | 0 ≤ i ≤ M}.
2. Ẽ[χ̃_i²] ≥ 0.
3. Ẽ[χ̃_i · χ̃_j] = 0 whenever i ≠ j.
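Properties 1 to 3 above are exactly what a Gram-Schmidt orthogonalization with respect to the (pseudo-)expectation inner product delivers; the subtlety in this paper is carrying it out locally. As a toy illustration (here Ẽ is an honest expectation over a pairwise independent distribution, so positivity is automatic; zero-norm directions in the kernel are skipped, mirroring the remark after the definition of the χ̄_i below):

```python
import itertools, math

pts = list(itertools.product([1, -1], repeat=3))
# toy "pseudo-expectation": an honest expectation over a pairwise
# independent distribution on {±1}^3 (uniform over x1*x2*x3 = 1)
p = {x: (0.25 if x[0] * x[1] * x[2] == 1 else 0.0) for x in pts}

def E(f):                       # expectation of f : {±1}^3 -> R
    return sum(p[x] * f(x) for x in pts)

def chi(S):                     # character chi_S(x) = prod_{i in S} x_i
    return lambda x: math.prod(x[i] for i in S)

subsets = [S for r in range(4) for S in itertools.combinations(range(3), r)]
basis = []                      # the orthogonalized functions (tilde chi's)
for S in subsets:
    f = chi(S)
    # subtract projections onto previously processed functions, skipping
    # those with zero norm (i.e. lying in the kernel of E)
    coeffs = [(E(lambda x: f(x) * b(x)) / E(lambda x: b(x) ** 2), b)
              for b in basis if E(lambda x: b(x) ** 2) > 1e-12]
    basis.append(lambda x, f=f, cs=coeffs: f(x) - sum(c * b(x) for c, b in cs))

for i, gi in enumerate(basis):  # the three properties of the lemma
    assert E(lambda x: gi(x) ** 2) >= -1e-9
    for gj in basis[:i]:
        assert abs(E(lambda x: gi(x) * gj(x))) < 1e-9
print("orthogonal family found; E is PSD on its span")
```

Once such a family exists, PSDness is immediate: any f = Σ f_i χ̃_i has Ẽ[f²] = Σ f_i² Ẽ[χ̃_i²] ≥ 0, which is how Theorem 1.2 is completed below.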
We first complete the proof of Theorem 1.2 assuming this lemma. Observe that part (1) of the theorem follows from Lemma B.3. Further, Ẽ satisfies Ẽ[f(C_i)] = f(µ) by Corollary 4.11. Thus, we only need to prove that Ẽ is a valid pseudo-expectation operator, that is, that Ẽ is positive semidefinite.

Let f ∈ P^n_d be any multilinear polynomial of degree ≤ d. We show that Ẽ[f²] ≥ 0. We use the spanning property (1) of the χ̃_i's above to write f = Σ_{i=0}^M f_i · χ̃_i. Using the orthogonality property (3) of the χ̃_i's, we have Ẽ[f²] = Σ_{i=0}^M f_i² Ẽ[χ̃_i²]. Finally, using the positivity property (2) of the χ̃_i's, we conclude that Ẽ is PSD.

The rest of this section is devoted to proving Lemma 6.1.

Our aim is to build an order on the subsets of [n] of size at most d, in which to process them for our local orthogonalization procedure. We start with an arbitrary ordering on the clauses of I; e.g., for every C ∈ I we fix a unique index ζ(C) ∈ [m]. We say that A ≺ B if:

• C(cl(A)) is smaller than C(cl(B)) in lexicographic order of ζ. That is, A ≺ B if the maximum index ζ(C) over C ∈ C(cl(A)) is smaller than the corresponding maximum for cl(B); if they are equal, we break ties by the second largest index, and so on. We define π(cl(A)) to be the index of cl(A) according to this ordering. (Note that π is a permutation on distinct closures, so if cl(A) ≠ cl(B) then π(cl(A)) ≠ π(cl(B)).)
• If C(cl(A)) = C(cl(B)), then we say that A ≺ B if |A| < |B|.
• If C(cl(A)) = C(cl(B)) and |A| = |B|, then we break ties arbitrarily.

For i = 0, …, M, we let A_i denote the i-th set in this ordering. Note that A_0 = ∅ and A_1, …, A_n are the singleton sets {1}, …, {n} (in some arbitrary order). We will write χ_i for χ_{A_i} in the following to reduce clutter. Set R = 100.
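The order ≺ can be realized as an ordinary sort key. A toy sketch (the `closure_clauses` stand-in below is hypothetical: it just returns the clauses contained in A, where the paper would use C(cl(A)); the arbitrary final tie-break is omitted):

```python
# hypothetical toy instance: clause index zeta -> variable set
clauses = {0: frozenset({0, 1, 2}), 1: frozenset({2, 3, 4})}

def closure_clauses(A):
    # stand-in for C(cl(A)); the real definition takes the closure first
    return [z for z, C in clauses.items() if C <= A]

def order_key(A):
    # compare the closures' clause indices lexicographically, largest
    # index first, then break ties by |A|
    return (sorted(closure_clauses(A), reverse=True), len(A))

sets = [frozenset(s) for s in ({0}, {0, 1, 2}, {2, 3, 4}, {0, 1, 2, 3, 4})]
sets.sort(key=order_key)
print(sets[0], sets[-1])  # empty-closure set first, the full set last
```

Sets whose closures contain no clauses (such as the singletons) compare smallest, which is why A_0 = ∅ and the singletons come first in the ordering above.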
Define the i-th local correlated space as V_i = Span{χ_B | |B| ≤ d, B ⊆ cl_R(A_i), B ≺ A_i}.

Lemma 6.2. For every f ∈ V_i, Ẽ[f²] ≥ 0.

Proof. Invoking Lemma 4.12, it suffices to show that |cl_R(A_i)| ≤ s. This follows by noting that |A_i| ≤ d and |cl_R(A_i)| ≤ Rkd ≤ s for our choice of parameters (Lemma 4.4).

Define χ̄_i to be any f ∈ V_i such that Ẽ[(χ_i − f)²] ≤ Ẽ[(χ_i − g)²] for every g ∈ V_i. Note that such a function must exist because Ẽ[(χ_i − f)²] ≥ 0 for every f (one can WLOG minimize over the orthogonal complement of the kernel of Ẽ inside V_i). We define χ̃_i = χ_i − χ̄_i. Since V_0 is empty, we set χ̄_0 to be the zero function, and χ̃_0 is thus defined as χ_0 = χ_∅ = 1.

The following simple lemma will be very useful.

Lemma 6.3 (Local orthogonality). Ẽ[χ̃_i g] = 0 for every g ∈ V_i.

Proof. Since both g and χ̄_i are spanned by characters of size at most d, and d < s, the pseudo-expectation is well defined. Further, since both g and χ̄_i lie in Span{χ_S | S ⊆ cl_R(A_i)} and |cl_R(A_i)| ≤ s (as in the proof of Lemma 6.2), Ẽ corresponds to the expectation operator associated with the probability distribution ν_{cl_R(A_i)}.

Now suppose for the sake of contradiction that Ẽ[(χ_i − χ̄_i)g] = δ for some δ > 0. (If the expectation is negative, then we can take −g.) Let f = χ̄_i + εg. We have:

Ẽ[(χ_i − f)²] = Ẽ[(χ_i − χ̄_i)²] + ε² Ẽ[g²] − 2εδ,

and so if ε is sufficiently small then Ẽ[(χ_i − f)²] < Ẽ[(χ_i − χ̄_i)²], contradicting our choice of χ̄_i.

The following lemma shows that the χ̃_i's span P^n_d:

Lemma 6.4. For every i: Span{χ̃_j : j ≤ i} = Span{χ_j : j ≤ i}.

Proof. First, we show that for every i, χ_i ∈ Span{χ̃_j | j ≤ i}. We argue by induction. χ̃_0 = χ_0, and thus the statement holds for i = 0. Now suppose the statement holds for all j < i.
Consider χ_i. From the definition of χ̃_i above, we have χ_i = χ̃_i + χ̄_i. Now, χ̄_i ∈ V_i, and V_i ⊆ Span{χ_j | j < i} by definition. Further, by the inductive hypothesis, each χ_j for j < i satisfies χ_j ∈ Span{χ̃_{j′} | j′ ≤ j} ⊆ Span{χ̃_{j′} | j′ < i}. This completes the induction.

The other direction is easier: χ̃_i = χ_i − χ̄_i and, as argued above, χ̄_i ∈ Span{χ_j | j < i}. Thus χ̃_i ∈ Span{χ_j | j ≤ i}. Together, we thus have Span{χ̃_j | j ≤ i} = Span{χ_j | j ≤ i}.

In this section, we prove the following lemma, which is the technical heart of the proof; it says that local orthogonalization is enough to ensure that the χ̃_i are all mutually orthogonal.

Lemma 6.5. For every j < i, Ẽ[χ̃_i · χ_j] = 0.

We will need the following observation for the proof, which we record before proceeding:

Lemma 6.6. Suppose H is a connected k-uniform hypergraph such that there exists a subset of vertices U, |U| ≥ 2, satisfying dist(u, v) > R for every distinct u, v ∈ U. Then H must have at least |U| · R/2 hyperedges.

Proof. Observe that the balls of radius R/2 around the vertices u ∈ U are all disjoint, and each contains a path of at least R/2 hyperedges (due to the connectedness of H).

We now go on to prove Lemma 6.5.

Proof. Fix any j < i and let A = A_i and B = A_j. Let

B_bdy = {x ∈ cl(B) | there exists a clause W ∈ C(cl(B)) such that V(W) ∩ cl_R(A) = {x}}.

For every x ∈ B_bdy we call any associated clause W as in the definition above a boundary clause.

Let B_out = B \ cl_R(A_i), B_in = B \ (B_out ∪ B_bdy), and B_rest = B \ (B_out ∪ B_in). Note that B_bdy is not necessarily a subset of B. Next, we make two useful claims:

Claim 6.7. |B_bdy ∪ B_in| ≤ |B| ≤ d.

Proof. We will show that |B_bdy| ≤ |B_out|. This immediately yields the claim by observing that d ≥ |B| = |B_in| + |B_out| + |B_rest| ≥ |B_in| + |B_bdy|.
We note that the proof of this claim is significantly simpler in the case that |B| < R/2. Proving it in the case when R is a constant and |B| = Ω(n) is one of the main technical ingredients in getting the proof sketched in the overview to work for Ω(n) rounds of the SOS hierarchy.

Let Q ⊆ [n] be a maximal connected component in the subgraph defined by the hyperedges C(cl(B)) \ C(cl_R(A)). Let Q_bdy = B_bdy ∩ Q and Q_out = B_out ∩ Q. B_bdy is thus partitioned into the sets Q_bdy over the possible maximal connected subgraphs Q. It is therefore enough to prove that |Q_bdy| ≤ |Q_out| for any fixed Q.

Observe that Q ∩ cl_R(A) = Q_bdy. If Q ∩ cl_R(A) = ∅, then there is nothing to prove. If Q_bdy = {v}, then Q contains V(W_v), where W_v is a boundary clause associated with v. If Q contains no vertex of B_out, then observe that cl(B) \ (Q \ {v}) is a closed set containing B, contradicting the minimality of cl(B). Thus, in this case, |Q_bdy| ≤ |Q_out|.

Now suppose |Q_bdy| ≥ 2. Then the vertices in Q_bdy are connected through clauses in Q. On the other hand, since A is R-closed, any path between u, v ∈ Q_bdy that uses clauses from Q must have length at least R + 1. Applying Lemma 6.6, we conclude that |C(Q)| ≥ |Q_bdy| · R/2.

Next, we claim that Q ⊆ cl(Q_bdy ∪ Q_out). It is easy to complete the proof once we have this claim: observe that

|Q_bdy| · R/2 ≤ |C(Q)| ≤ |C(cl(Q_bdy ∪ Q_out))| ≤ 6|Q_bdy| + 6|Q_out|.

Rearranging yields |Q_out| ≥ |Q_bdy| · (R/12 − 1). Using R = 100 yields |Q_out| ≥ |Q_bdy|.

We now proceed to show that Q ⊆ cl(Q_bdy ∪ Q_out). By Lemma 4.5 (4), cl(B) = cl(B_in ∪ B_bdy ∪ B_out). Let B′ = (B_in ∪ B_bdy ∪ B_out) \ (Q_bdy ∪ Q_out). Then, by another application of Lemma 4.5 (4), cl(B) = cl(cl(B′) ∪ cl(Q_bdy ∪ Q_out)).
In other words, one can build the closure of B by first building the closures of B′ and of Q_bdy ∪ Q_out (Step 1), and then taking the closure of the union of the obtained sets (Step 2). Clearly, the final output contains every clause in C(Q). If we show that (1) C(cl(B′)) ∩ C(Q) = ∅ and (2) no clause from C(Q) is added in Step 2, then every clause in C(Q) must be added while building cl(Q_bdy ∪ Q_out), and thus we are done. We now proceed to show the two statements above.

(1): First observe that cl(B′) itself can be built by building the closure of B_in (and cl(B_in) ⊆ cl_R(A) implies C(cl(B_in)) ∩ C(Q) = ∅), the closure of (B_out ∪ B_bdy) \ (Q_bdy ∪ Q_out) (which cannot contain any clause from C(Q), as then Q would have to include a vertex from (B_out ∪ B_bdy) \ (Q_bdy ∪ Q_out), a contradiction), and finally taking the closure of their union. This last step cannot add a clause in Q: every path P added connects cl(B_in) and cl((B_out ∪ B_bdy) \ (Q_bdy ∪ Q_out)). If P is contained in cl_R(A), then there is nothing to prove. Otherwise P must pass (exactly once) through a boundary vertex. Suppose P contains a clause from C(Q). If P passes through a boundary vertex not in Q_bdy, then this enlarges Q, violating that Q is a maximal connected component. If, on the other hand, P passes through a boundary vertex in Q_bdy, then P connects B_out \ Q with Q, again violating the maximality of Q. Thus C(cl(B′)) cannot include any clause from C(Q).

(2): Consider Step 2 of the procedure to build cl(B). In this step, we add short paths that connect cl(B′) and cl(Q_bdy ∪ Q_out). For any such path P, if P includes some clause C from C(Q), then it crosses out of cl_R(A) (exactly once) and thus must pass through a boundary vertex. By the maximality of Q, we must have P ∩ B_bdy ∈ Q_bdy and P \ C(cl_R(A)) ⊆ C(Q).
On the other hand, the part of P that connects some vertex in Q_bdy to cl(Q_bdy ∪ Q_out) is short and thus must be contained in cl(Q_bdy ∪ Q_out). Thus every clause in P \ C(cl_R(A)) is already present in C(cl(Q_bdy ∪ Q_out)); in particular, C ∈ C(cl(Q_bdy ∪ Q_out)) and hence C is not newly added in Step 2.

Claim 6.8. Suppose B_out ≠ ∅. Then, for every S ⊆ B_in ∪ B_bdy, S ≺ A.

Proof. Since B_out ≠ ∅, cl(B) ≠ cl(A). Thus π(cl(B)) < π(cl(A)). Now, |B_in ∪ B_bdy| ≤ d from Claim 6.7. Thus every subset S ⊆ B_in ∪ B_bdy appears in the ordering on the subsets of [n] of size at most d. Further, for every such S, cl(S) ⊆ cl(B) (Lemma 4.5) and thus π(cl(S)) ≤ π(cl(B)) < π(cl(A)). Hence S ≺ A.

We now proceed to complete the proof of the lemma. It is easy to verify that |cl_R(A) ∪ cl(B)| < s, and thus by Corollary 4.11 the Ẽ operator on functions of the variables in cl_R(A) ∪ cl(B) corresponds to the expectation of a valid local distribution. In what follows, whenever we write Pr, we mean the probability of an event w.r.t. this local probability distribution. (Note that the expectation w.r.t. this probability distribution agrees with Ẽ whenever both are defined.)

Now, χ_B = χ_{B_in} χ_{B_rest} χ_{B_out}, and we can write

Ẽ[χ̃_i χ_j] = Ẽ[χ̃_i χ_{B_in} χ_{B_rest} χ_{B_out}].   (2)

Consider an arbitrary assignment z to B \ A and γ ∈ {±1}^{B_bdy} to x_{B_bdy}. Let 1_{B_bdy = γ} be the function that on input x ∈ {±1}^n outputs 1 if x_{B_bdy} = γ and 0 otherwise.

Lemma 5.1 gives the expression for the local distribution on cl_R(A) ∪ cl(B). Using this expression, we have:

Ẽ[χ̃_i χ_{B_in} χ_{B_rest} χ_{B_out} | x_{B\A} = z, x_{B_bdy} = γ] = Ẽ[χ̃_i χ_{B_in} χ_{B_rest} | x_{B_bdy} = γ] · χ_{B_out}(z_{B_out}),

where the Ẽ on the RHS matches the expectation operator associated with the probability distribution ν_{cl_R(A)}. We will show that Ẽ[χ̃_i χ_{B_in} χ_{B_rest} | x_{B_bdy} = γ] = 0 for every choice of γ.
First, we observe that:

Pr[x_{B_bdy} = γ] · Ẽ[χ_S · χ_{B_in} χ_{B_rest} | x_{B_bdy} = γ] = Ẽ[χ_S · χ_{B_in} χ_{B_rest} · 1_{B_bdy = γ}],   (3)

for every S ⊆ cl_R(A) with |S| ≤ d. Now, χ̃_i ∈ Span{χ_S | S ⊆ cl_R(A_i), |S| ≤ d}, and thus, using (3),

Pr[x_{B_bdy} = γ] · Ẽ[χ̃_i χ_{B_in} χ_{B_rest} | x_{B_bdy} = γ] = Ẽ[χ̃_i · χ_{B_in} χ_{B_rest} · 1_{B_bdy = γ}].

Now |B_in ∪ B_bdy| ≤ d (Claim 6.7) and 1_{B_bdy = γ} ∈ Span{χ_S | S ⊆ B_bdy}, so

χ_{B_in} χ_{B_rest} · 1_{B_bdy = γ} ∈ Span{χ_{B_in} · χ_{B_rest} · χ_T | T ⊆ B_bdy}.

Each index set of the characters above is a subset of B_in ∪ B_bdy, and thus precedes A_i (invoking Claim 6.8, along with the fact that π(cl(B)) < π(cl(A))). Thus χ_{B_in} χ_{B_rest} · 1_{B_bdy = γ} ∈ V_i. Using Lemma 6.3, therefore, Ẽ[χ̃_i · χ_{B_in} χ_{B_rest} · 1_{B_bdy = γ}] = 0.

We can now complete the proof of Lemma 6.1.

Proof of Lemma 6.1. We show that the χ̃_i constructed above satisfy all the required properties. By Lemma 6.4, Span{χ̃_i | i ≤ M} = Span{χ_i | i ≤ M} = P^n_d. Next, observe that χ̃_i = χ_i − χ̄_i. Both χ_i and χ̃_i lie in Span{χ_S | S ⊆ cl_R(A_i)}, and by Lemma 4.12, Ẽ is a PSD expectation operator on this space. Thus Ẽ[χ̃_i²] ≥ 0 for every i ≤ M. Finally, we verify that the χ̃_i are mutually orthogonal. Fix any i. It is enough to show that Ẽ[χ̃_j · χ̃_i] = 0 for every j ≠ i. Since Span{χ̃_r | r ≤ j} = Span{χ_r | r ≤ j} (Lemma 6.4), we only need to show that Ẽ[χ_j · χ̃_i] = 0 for every j < i. Invoking Lemma 6.5 then completes the proof.

Acknowledgements

Thanks to Ryan O'Donnell, Li-Yang Tan, and David Steurer for fruitful discussions, and to the anonymous reviewers for their valuable comments and suggestions on a previous version of this paper.

References

[1] Sanjeev Arora, Béla Bollobás, László Lovász, and Iannis Tourlakis. Proving integrality gaps without knowing the linear program. Theory of Computing, 2(1):19–51, 2006.
Preliminary version by Arora, Bollobás, and Lovász in FOCS 2002.

[2] Per Austrin and Johan Håstad. Randomly supported independence and resistance. SIAM Journal on Computing, 40(1):1–27, January 2011.

[3] Per Austrin and Elchanan Mossel. Approximation resistant predicates from pairwise independence. Computational Complexity, 18(2):249–271, 2009.

[4] Boaz Barak. Sum of squares upper bounds, lower bounds and open questions, 2014. Lecture notes for an MIT seminar series. Available online.

[5] Boaz Barak, Fernando G. S. L. Brandão, Aram Wettroth Harrow, Jonathan A. Kelner, David Steurer, and Yuan Zhou. Hypercontractivity, sum-of-squares proofs, and their applications. In STOC, pages 307–326, 2012.

[6] Boaz Barak, Jonathan A. Kelner, and David Steurer. Rounding sum-of-squares relaxations. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC '14, pages 31–40, New York, NY, USA, 2014. ACM.

[7] Boaz Barak, Guy Kindler, and David Steurer. On the optimality of semidefinite relaxations for average-case and generalized constraint satisfaction. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, ITCS '13, pages 197–214, New York, NY, USA, 2013. ACM.

[8] Boaz Barak and David Steurer. Sum-of-squares proofs and the quest toward optimal algorithms. In Proceedings of the International Congress of Mathematicians (ICM), 2014. To appear.

[9] Siavosh Benabbas, Konstantinos Georgiou, Avner Magen, and Madhur Tulsiani. SDP gaps from pairwise independence. Theory of Computing, 8(12):269–289, 2012.

[10] Siu On Chan. Approximation resistance from pairwise independent subgroups. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 447–456, New York, NY, USA, 2013. ACM.

[11] Moses Charikar, Konstantin Makarychev, and Yury Makarychev. Integrality gaps for Sherali–Adams relaxations. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, pages 283–292.
ACM, 2009.

[12] Wenceslas Fernandez de la Vega and Claire Kenyon-Mathieu. Linear programming relaxations of Max Cut. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 53–61. Society for Industrial and Applied Mathematics, 2007.

[13] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995. Preliminary version in STOC '94.

[14] Dima Grigoriev. Complexity of Positivstellensatz proofs for the knapsack. Computational Complexity, 10(2):139–154, 2001.

[15] Dima Grigoriev. Linear lower bound on degrees of Positivstellensatz calculus proofs for the parity. Theoretical Computer Science, 259(1-2):613–622, 2001. Preliminary version as technical report IHES/M/99/68, Institut des Hautes Études Scientifiques, 1999.

[16] Subhash Khot. On the power of unique 2-prover 1-round games. In IEEE Conference on Computational Complexity, page 25, 2002.

[17] Jean B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796–817, 2001.

[18] Yurii Nesterov. Squared functional systems and optimization problems. High Performance Optimization, 13:405–440, 2000.

[19] Ryan O'Donnell and David Witmer. Goldreich's PRG: Evidence for near-optimal polynomial stretch. In IEEE Conference on Computational Complexity, pages 1–12, 2014.

[20] Pablo A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.

[21] Prasad Raghavendra. Optimal algorithms and inapproximability results for every CSP? In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pages 245–254, New York, NY, USA, 2008. ACM.

[22] Grant Schoenebeck. Linear level Lasserre lower bounds for certain k-CSPs.
In FOCS, pages 593–602, 2008.

[23] Grant Schoenebeck, Luca Trevisan, and Madhur Tulsiani. Tight integrality gaps for Lovász–Schrijver LP relaxations of vertex cover and max cut. In Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pages 302–310. ACM, 2007.

[24] Naum Zuselevich Shor. An approach to obtaining global extremums in polynomial mathematical programming problems. Cybernetics and Systems Analysis, 23(5):695–700, 1987.

[25] Madhur Tulsiani. CSP gaps and reductions in the Lasserre hierarchy. In STOC, pages 303–312, 2009.

[26] Madhur Tulsiani and Pratik Worah. LS+ lower bounds from pairwise independence. In Proceedings of the 28th Conference on Computational Complexity, CCC, pages 121–132, 2013.

A Random sparse predicates

Consider a random sparse predicate P on k variables accepting |P^{-1}(1)| = t assignments. If t = exp(o(k)), we now show that, with high probability as k tends to infinity, P does not support a pairwise independent subgroup. Here the randomness corresponds to choosing P^{-1}(1) to be a t-sized subset of {0,1}^k uniformly at random.

Observation A.1. P^{-1}(1) does not contain any affine subspace of dimension 2 (over F_2) with probability at least 1 − t^4/2^k.

Under the condition of the observation, P^{-1}(1) does not contain any pairwise independent subgroup, because any such subgroup contains an affine subspace of dimension 2.

Proof of Observation A.1. Let v_1, ..., v_t ∈ P^{-1}(1) be an enumeration of the vectors in P^{-1}(1). Note that if P^{-1}(1) contains an affine subspace of dimension 2, then there are 1 ≤ a < b < c ≤ t such that this subspace is exactly the affine span of v_a, v_b, v_c.

For a fixed choice of the triple (a, b, c), conditioning on the event that v_a, v_b, v_c span an affine subspace of dimension 2, the remaining vector of this affine subspace, namely v_a ⊕ v_b ⊕ v_c, also belongs to P^{-1}(1) with probability at most t/2^k.
Taking a union bound over the at most t^3 choices of (a, b, c), we see that P^{-1}(1) contains such an affine subspace with probability at most t^4/2^k.

B Constructing nice instances

In this section, we show the existence of nice instances of constraint hypergraphs and prove Theorem 3.4.

Lemma B.1. Fix 1 > ε, δ > 0 and γ ≥ e^k·k^3. Then, for large enough n, there exists a k-uniform constraint hypergraph G with at least γn edges such that, for η = (1/γ)^{3/δ} and τ = 4·log_2(γk), G:
1. is (ηn, δ)-expanding, and
2. has girth g ≥ log_2(n)/τ.

Proof. We first choose a random hypergraph G by including every k-uniform hyperedge, independently, with probability p = 4γ·k!/n^{k−1}. Our final hypergraph will be obtained by removing hyperedges from G.

We first show:

Claim B.2. For G chosen as above, with probability at least 1/2, G:
1. has between 2γn and 8γn edges,
2. is (ηn, δ)-expanding, and
3. has at most n^{3/4}·log_2(n) cycles of length at most g.

We first show that the claim above is enough to complete the proof of the lemma. Define G' to be the hypergraph obtained from G by removing a hyperedge of every cycle of length at most g. By the claim above, the total number of hyperedges removed in this process is, for large enough n, at most γn. Observe that the last property in the statement of the lemma (the girth bound) is immediately satisfied by G'. Further, since G' is obtained only by removing hyperedges from G, G' still enjoys (ηn, δ)-expansion. Finally, the total number of edges removed is sublinear in n, and thus G' has at least 2γn − γn ≥ γn edges for large enough n. Thus, G' is a constraint hypergraph that satisfies the requirements of the lemma.

We now move on to the proof of the claim:

Proof of Claim. 1. The expected number of edges in G is p·C(n,k) = 4γn·(1 − 1/n)·(1 − 2/n)···(1 − (k−1)/n) ≥ 4γn·(1 − (k−1)^2/n). By an application of the Chernoff bound, the probability that the number of edges does not lie in the interval [2γn, 8γn] is at most e^{−γn}.

2.
Next, consider any collection of s ≤ ηn hyperedges and let us bound the probability that they cover at most cs variables, where c = k − 1 − δ. This probability is upper bounded by

C(n, cs) · C(C(cs, k), s) · p^s.

Using C(cs, k) ≤ (cs)^k/k! and the approximation C(x, y) ≤ (xe/y)^y, we can upper bound the above expression by

(ne/cs)^{cs} · (e·(cs)^k/(k!·s))^s · (4γ·k!/n^{k−1})^s.

Substituting c = k − 1 − δ and using δ < 1 now yields an upper bound of

(s/n)^{δs} · (4γ·e^k·c^2)^s ≤ (s/n)^{δs} · (4γ·e^k·k^2)^s.

Thus, using that γ ≥ e^k·k^3 (so that 4γ·e^k·k^2 ≤ γ^2 for k ≥ 4) and that s satisfies s/n ≤ η = (1/γ)^{3/δ}, the above probability is at most (1/γ)^{3s}·(4γ·e^k·k^2)^s ≤ (1/γ)^s. Summing over all s ≤ ηn, G fails to be (ηn, δ)-expanding with probability at most 2/γ.

3. To ensure the girth requirement, we first bound the expected number of cycles of length ℓ in G. Recall that a cycle is given by a cyclic sequence C_1, ..., C_ℓ of hyperedges in which consecutive hyperedges intersect. There are C(n, k) ways to choose C_1; for 1 < i < ℓ, there are at most k·C(n, k−1) ways to choose the common vertex of C_{i−1} ∩ C_i and the remaining vertices of C_i; and finally there are at most k^2·C(n, k−2) ways to choose C_ℓ, which intersects both C_1 and C_{ℓ−1}. Therefore the expected number of length-ℓ cycles is at most

C(n, k) · (k·C(n, k−1))^{ℓ−2} · k^2·C(n, k−2) · (4γ·k!/n^{k−1})^ℓ ≤ (4γ·k^2)^ℓ.

Since 4γk^2 ≤ (γk)^3 and τ = 4·log_2(γk), the expected number of cycles of length at most g = log_2(n)/τ is at most

Σ_{ℓ ≤ g} (4γk^2)^ℓ ≤ 2·(4γk^2)^g ≤ 2·2^{3g·log_2(γk)} ≤ 2·n^{3/4}.

By an application of Markov's inequality, with probability at least 3/4 over the draw of the hyperedges of G, the number of cycles of length at most g is at most 8·n^{3/4} ≤ n^{3/4}·log_2(n) for large enough n.

By a union bound, now, all three properties above can be ensured with probability at least 1/2, since e^{−γn} + 2/γ ≤ 1/4.
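The random construction in the proof above is straightforward to simulate. The following sketch (with toy parameters chosen only so that p ≤ 1; the lemma's requirement γ ≥ e^k·k^3 matters only for the expansion property and is ignored here) samples every k-uniform hyperedge independently with probability p = 4γ·k!/n^{k−1} and checks the edge-count concentration of item 1 of the claim:

```python
import itertools
import random
from math import factorial

def sample_hypergraph(n, k, gamma, rng):
    """Include each of the C(n,k) hyperedges independently with
    probability p = 4*gamma*k!/n^(k-1), as in the proof of Lemma B.1."""
    p = 4 * gamma * factorial(k) / n ** (k - 1)
    assert p <= 1, "toy parameters must keep p a valid probability"
    return [e for e in itertools.combinations(range(n), k) if rng.random() < p]

rng = random.Random(0)
n, k, gamma = 60, 3, 2  # toy values; gamma here is far below e^k * k^3
edges = sample_hypergraph(n, k, gamma, rng)

# Expected edge count is p*C(n,k) = 4*gamma*n*(1-1/n)*(1-2/n) ~ 456 here;
# Claim B.2, item 1 asserts the count lands in [2*gamma*n, 8*gamma*n]
# except with probability e^(-gamma*n).
print(len(edges), 2 * gamma * n, 8 * gamma * n)
```

The edge count concentrates sharply around 4γn, so for any reasonable seed it falls well inside the claimed interval [2γn, 8γn].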
B.1 Soundness

In this section, we show that after fixing the underlying hypergraph G of an instance, with high probability over the choice of literals on the constraints, every assignment induces a distribution of local assignments that is very close to uniform. Here closeness is measured with respect to the distribution {C(x)}, where C is a uniformly random constraint among all hyperedges of the hypergraph.

Let G be any hypergraph with m hyperedges. Let I be an instance with the same underlying hypergraph as G, with the literals in all clauses chosen uniformly at random. We have the following lemma.

Lemma B.3. Suppose m = Ω(2^{O(k)}·ε^{−2}·n). With high probability over the choice of literals, for every assignment x ∈ {±1}^n, the distribution {C(x)}, with C chosen uniformly at random from I, is within statistical distance ε of the uniform distribution over {±1}^k.

Proof. Let I = (C_1, ..., C_m) be a fixed collection of constraints with their literals. Let μ_{I,x} denote the distribution of C_i(x) when i is drawn uniformly from [m]. For each local assignment y ∈ {±1}^k, the probability μ_{I,x}[y] that a random local assignment from μ_{I,x} equals y is given by E_{i∈[m]} 1[C_i(x) = y].

Now suppose the signs of the literals of every constraint of I are chosen uniformly at random, keeping the underlying hypergraph fixed. Then μ_{I,x}[y] is a random variable depending on the randomness of the literals. For each i, the indicator 1[C_i(x) = y] equals 1 with probability 1/2^k, and equals 0 with the remaining probability (over the randomness of the signs of the literals on the i-th constraint), and the random variables 1[C_i(x) = y] are independent of each other for different i. Therefore μ_{I,x}[y] is the average of m independent {0,1}-indicator random variables, each being 1 with probability 1/2^k. By the Chernoff–Hoeffding bound, |μ_{I,x}[y] − 1/2^k| > η with probability at most 2^{−η^2·m/2^{k+1}}.
By a union bound over all assignments x ∈ {±1}^n, the maximum deviation of μ_{I,x}[y] from 1/2^k (over all x) exceeds η with probability at most 2^{−η^2·m/2^{k+1} + n}. Letting η = ε/2^k, we see that

Pr[ max_x |μ_{I,x}[y] − 1/2^k| ≥ ε/2^k ] ≤ exp(−Ω(n))

as long as m = Ω(2^{O(k)}·ε^{−2}·n).

Now, if the distribution {C_i(x)} for a random i ∈ [m] has statistical distance at least ε from the uniform distribution, then |μ_{I,x}[y] − 1/2^k| ≥ ε/2^k for some y. By a union bound over all y ∈ {±1}^k, the distribution {C_i(x)} is within statistical distance ε of the uniform distribution on {±1}^k for every assignment x, except with probability exp(O(k) − Ω(n)), assuming m = Ω(2^{O(k)}·ε^{−2}·n).
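The concentration step in the proof above can be illustrated numerically. A minimal sketch (with hypothetical toy values of n, k, m, and for a single fixed assignment x; the lemma's union bound is what extends this to all 2^n assignments): fixing x and randomizing the literal signs, the empirical distribution of the local assignments C_i(x) lands close in statistical distance to uniform on {±1}^k.

```python
import itertools
import random
from collections import Counter

def local_pattern_distance(x, clauses, signs, k):
    """Statistical distance between the empirical distribution of
    C_i(x) over a uniformly random constraint i and uniform on {±1}^k."""
    m = len(clauses)
    counts = Counter(
        tuple(s * x[v] for s, v in zip(sgn, cl))
        for cl, sgn in zip(clauses, signs)
    )
    return 0.5 * sum(
        abs(counts.get(y, 0) / m - 2 ** -k)
        for y in itertools.product((-1, 1), repeat=k)
    )

rng = random.Random(1)
n, k, m = 50, 3, 4000  # toy parameters with m >> 2^k * eps^-2
x = [rng.choice((-1, 1)) for _ in range(n)]          # a fixed assignment
clauses = [rng.sample(range(n), k) for _ in range(m)]  # variable tuples
signs = [[rng.choice((-1, 1)) for _ in range(k)] for _ in range(m)]

print(local_pattern_distance(x, clauses, signs, k))
```

With m = 4000 and 2^k = 8 cells, the Chernoff–Hoeffding deviations of each cell are of order 1/√m, so the printed distance is a few hundredths, far below any constant ε.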