[PDF] Sum of squares bounds for the ordering principle

Abstract

In this paper, we analyze the sum of squares hierarchy (SOS) on the ordering principle on n elements. We prove that degree O( n − − √ log(n)) SOS can prove the ordering principle. We then show that this upper bound is essentially tight by proving that for any ϵ>0 , SOS requires degree Ω( n 1 2 −ϵ ) to prove the ordering principle on n elements.

Full PDF

aa r X i v : . [ c s . CC ] J u l Sum of squares bounds for the ordering principle

Aaron PotechinUniversity of [email protected] 31, 2020

Abstract

In this paper, we analyze the sum of squares hierarchy (SOS) on the ordering principle on n elements (which has N = Θ( n ) variables). We prove that degree O ( √ nlog ( n )) SOS canprove the ordering principle. We then show that this upper bound is essentially tight by provingthat for any ǫ > , SOS requires degree Ω( n − ǫ ) to prove the ordering principle. . Introduction

In proof complexity, we study how easy or difﬁcult it is to prove or refute various statements.Proof complexity is an extremely rich ﬁeld, so we will not attempt to give an overview of proofcomplexity here (for a recent survey of proof complexity, see [14]). Instead, we will only describethe particular proof system and the particular statement we are considering, namely the sum ofsquares hierarchy (SOS), and the ordering principle.SOS can be described in terms of sum of squares/Positivstellensatz proofs (which we write asSOS proofs for brevity). SOS proofs have the following nice properties:1. SOS proofs are broadly applicable as they are complete for systems of polynomial equationsover R . In other words, for any system of polynomial equations over R which is infeasible,there is an SOS proof that it is infeasible [17].2. SOS proofs are surprisingly powerful. In particular, SOS captures both spectral methods andpowerful inequalities such as Cauchy-Schwarz and variants of hypercontractivity [3], whichmeans that much of our mathematical reasoning can be captured by SOS proofs.3. In some sense, SOS proofs are simple. In particular, SOS proofs only use polynomial equal-ities and the fact that squares are non-negative over R .SOS has been extensively studied, so we will not give an overview of what is known about SOShere. To learn more about SOS, see the following survey on SOS [4] and the following recentseminars/courses on SOS [2, 9, 11, 12].The ordering principle (which has N = Θ( n ) variables) states that if we have elements a , . . . , a n which have an ordering and no two elements are equal then some element a i mustbe minimal. The ordering principle is a very interesting example in proof complexity because forseveral proof systems, it has a small size proof but any proof must have high width/degree.The ordering principle was ﬁrst considered by Krishnamurthy [10] who conjectured that it washard for the resolution proof system. This conjecture was refuted by Stalmark [16], who showedthat the ordering principle has a polynomial size resolution proof based on induction. However,any resolution proof of the ordering principle must have width Ω( n ) = Ω( √ N ) . While this is atrivial statement because it takes width n to even describe the ordering principle, Bonet and Galesi[5] showed that the ordering principle can be modiﬁed to give a statement which can be describedwith constant width but resolution still requires width Ω( n ) = Ω( √ N ) to prove it. This showedthat the width/size tradeoff of Ben-Sasson and Wigderson [6] (which was based on the degree/sizetradeoff shown for polynomial calculus by Impagliazzo, Pudl´ak, and Sgall [8]) is essentially tight.Later, Ω( n ) degree lower bounds for the ordering principle were also shown for polynomialcalculus [7] and for the Sherali-Adams hierarchy. However, non-trivial SOS degree bounds for theordering principle were previously unknown. Thus, a natural question is, does SOS also requiredegree Ω( n ) to prove the ordering principle or is there an SOS proof of the ordering principlewhich has smaller degree? In this paper, we show almost tight upper and lower SOS degree bounds for the ordering principle.In particular, we show the following theorems: 2 heorem 1.1.

Degree O ( √ nlog ( n )) SOS can prove the ordering principle on n elements. Theorem 1.2.

For any constant ǫ > there is a constant C ǫ > such that for all n ∈ N , degree C ǫ n − ǫ SOS cannot prove the ordering principle on n elements. Theorem 1.1 shows that looking at degree, SOS is more powerful than resolution, polynomialcalculus, and the Sherali-Adams hierarchy for proving the ordering principle. This also impliesthat the ordering principle is not a tight example for the size/degree trade-off for SOS which wasrecently shown by Atserias and Hakoniemi [1]. On the other hand, Theorem 1.2 shows that Theo-rem 1.1 is essentially tight and thus the ordering principle does still give an example where thereis a small SOS proof yet any SOS proof must have fairly high degree.

The remainder of the paper is organized as follows. In Section 2 we give some prelimiaries. InSection 3, we give natural pseudo-expectation values for the ordering principle. In Section 4, weprove our SOS upper bound. In Sections 5-9, we describe how to prove our SOS lower bound.

In this section, we recall the deﬁnitions of SOS proofs and pseudo-expectation values and describethe encoding of the ordering principle which we will analyze.

Deﬁnition 2.1.

Given a system of polynomial equations { s i = 0 } over R , a degree d SOS proof ofinfeasibility is an equality of the form − X i f i s i + X j g j where1. ∀ i, deg ( f i ) + deg ( s i ) ≤ d ∀ j, deg ( g j ) ≤ d Remark 2.2.

This is a proof of infeasibility because if the equations { s i = 0 } were satisﬁed wewould have that ∀ i, f i s i = 0 and for all ∀ j, g j ≥ , so we cannot possibly have that P i f i s i + P j g j < . In order to prove that there is no degree d SOS proof of infeasibility for a system of polynomialequations over R , we use degree d pseudo-expectation values, which are deﬁned as follows. Deﬁnition 2.3.

Given a system of polynomial equations { s i = 0 } over R , degree d pseudo expec-tation values are a linear map ˜ E from polynomials of degree at most d to R such that1. ˜ E [1] = 1 . ∀ f, i, ˜ E [ f s i ] = 0 whenever deg ( f ) + deg ( s i ) ≤ d ∀ g, ˜ E [ g ] ≥ whenever deg ( g ) ≤ d As shown by the following proposition, these conditions are a weakening of the constraintthat we have expected values over an actual distribution of solutions. Thus, intuitively, pseudo-expectations look like they are the expected values (up to degree d ) over a distribution of solutions,but they may not actually correspond to a distribution of solutions. Proposition 2.4. If Ω is an actual distribution of solutions to the equations { s i = 0 } over R then1. E Ω [1] = 1 ∀ f, i, E Ω [ f s i ] = 0 ∀ g, E Ω [ g ] ≥ Proposition 2.5.

For a given system of polynomial equations { s i = 0 } over R , there cannot beboth degree d pseudo-expectation values and a degree d SOS proof of infeasibility.Proof.

Assume we have both degree d pseudo-epxectation values and a degree d SOS proof ofinfeasibility. Applying the pseudo-expectation values to the SOS proof gives the following contra-diction: − E [ −

1] = X i ˜ E [ f i s i ] + X j ˜ E [ g j ] ≥ For our SOS bounds, we analyze the following system of infeasible equations which correspondsto the negation of the ordering principle.

Deﬁnition 2.6 (Ordering principle equations) . The negation of the ordering principle says that itis possible to have distinct ordered elements { a , . . . , a n } such that no a j is the minimum element.We encode this with the folloing equations:1. We have variables x ij where we want that x ij = 1 if a i < a j and x ij = 0 if a i > a j . We alsohave auxiliary variables { z j : j ∈ [ n ] } .2. ∀ i = j, x ij = x ij and x ij = 1 − x ji (ordering)3. For all distinct i, j, k , x ij x jk (1 − x ik ) = 0 (transitivity)4. ∀ j, P i = j x ij = 1 + z j (for all j ∈ [ n ] , a j is not the minimum element of { a , . . . , a n } )We call this system of equations the ordering principle equations. Remark 2.7.

In this encoding of the negation of the ordering principle, we use the auxiliary vari-ables { z j : j ∈ [ n ] } so that we can express the statement that ∀ j, ∃ i = j : x ij = 1 as polynomialequalities of low degree. .2.1 Relationship to other encodings of the negation of the ordering principle The equations in Deﬁnition 2.6 are tailored for SOS, so they are not the same as the encodings ofthe negation of ordering principle which were studied in prior works [5, 7]. We now discuss thisdifference and how it affects our results.For resolution, the following axioms encode the negation of the ordering principle:1. We have variables x ij where we want that x ij is true if a i < a j and x ij is false if a i > a j .2. ∀ i = j, x ij ∨ x ji and ¬ x ij ∨ ¬ x ji (ordering)3. For all distinct i, j, k , ¬ x ij ∨ ¬ x jk ∨ x ik = 0 (transitivity)4. ∀ j, W i = j x ij (for all j ∈ [ n ] , a j is not the minimum element of { a , . . . , a n } )Translating this into polynomial equations, this gives us the following equations for polynomialcalculus:1. We have variables x ij where we want that x ij = 1 if a i < a j and x ij = 0 if a i > a j .2. ∀ i = j, x ij = x ij and x ij = 1 − x ji (ordering)3. For all distinct i, j, k , x ij x jk (1 − x ik ) = 0 (transitivity)4. ∀ j, Q i = j (1 − x ij ) = 0 (for all j ∈ [ n ] , a j is not the minimum element of { a , . . . , a n } )However, as noted in the introduction, a width/degree lower bound of Ω( n ) is trivial for theseencodings as the axioms already have width/degree n . To handle this, Bonet and Galesi [5] usedauxiliary variables to break up the axioms into constant width axioms. Galesi and Lauria [7]instead considered the following ordering principle on graphs. Deﬁnition 2.8 (Ordering principle on graphs) . Given a graph G with V ( G ) = [ n ] , the orderingprinciple on G says that if each vertex i ∈ [ n ] has a value a i and all of the values are distinct thenthere must be some vertex whose value is less than its neighbors’ values. When we take the negation of the ordering principle on a graph G , this changes our ax-ioms/equations as follows:1. For resolution, instead of the axioms ∀ j, W i = j x ij we have the axioms ∀ j, W i :( i,j ) ∈ E ( G ) x ij .2. For polynomial calculus, instead of the equations ∀ j, Q i = j (1 − x ij ) = 0 we have the equa-tions ∀ j, Q i :( i,j ) ∈ E ( G ) (1 − x ij ) = 0 Remark 2.9. If G = K n then the ordering principle on G is just the ordering principle Galesi and Lauria [7] showed that if G is an expander then polynomial calculus requires degree Ω( n ) to refute these equations.The ordering principle on graphs is a weaker statement than the ordering principle, so wewould expect that its negation would be easier to refute. This is indeed the case. As shownby the following lemma, we can recover the equation P i = j x ij = 1 + z j from the equation Q i :( i,j ) ∈ E ( G ) (1 − x ij ) = 0 , except that z j is replaced by a sum of squares.5 emma 2.10. Given Boolean variables { x i : i ∈ [ k ] } (i.e. ∀ i ∈ [ k ] , x i = x i ) and the equation Q ki =1 (1 − x i ) = 0 , we can deduce that (cid:16)P ki =1 x i (cid:17) − is a sum of squares.Proof. Observe that modulo the axioms x i = x i , k X i =1 x i − − k Y i =1 (1 − x i ) ! + X J ⊆ [ k ]: J = ∅ ( | J | − Y i ∈ J x i !  Y i ∈ [ k ] \ J (1 − x i )  = − k Y i =1 (1 − x i ) ! + X J ⊆ [ k ]: J = ∅ ( | J | − Y i ∈ J x i !  Y i ∈ [ k ] \ J (1 − x i )  To see this, observe that for any non-empty J ⊆ [ k ] , if x i = 1 for all i ∈ J and x i = 0 for all i / ∈ J then the left and right sides are both | J | − . Similarly, if all of the x i are then the left and rightsides are both − .This implies that our SOS upper bound holds for the graph ordering principle as well as theordering principle. However, our SOS lower bound does not apply to the ordering principle onexpander graphs. Part of the reason is that our SOS lower bound relies heavily on symmetry underpermutations of [ n ] .There is one way in which the ordering principle equations are unsatisfactory for our purposes.We want to show that the size/degree tradeoffs for SOS [1] cannot be improved too much further.However, the auxiliary variables in the ordering principle equations are not Boolean and this trade-off only applies when all of the variables are Boolean. To ﬁx this, we show that we can modify theordering principle equations so that we only have Boolean variables but our SOS upper and lowerbounds still hold. For details, see Appendix A. In this section, we give natural candidate pseudo-expectation values ˜ E n for the ordering principleequations. In fact, ˜ E n is essentially the only possible symmetric pseudo-expectation values. Inparticular, in section 4 we will show that if ˜ E n fails at degree d then there is an SOS proof ofdegree at most d + 2 that these equations are infeasible. ˜ E n As noted in Section 2, intuitively, pseudo-expectation values should look like the expected valuesover a distribution of solutions. Also, as shown by the following lemma, since the problem issymmetric under permutations of [ n ] , we can take ˜ E to be symmetric as well. Lemma 3.1. If { s i = 0 } is a system of polynomial equations which is symmetric under per-mutations of [ n ] then if there are degree d pseudo-expectation values ˜ E then there are degree d pseudo-expectation values ˜ E ′ which are symmetric under permutations of [ n ] .Proof. Given degree d pseudo-expectation values ˜ E , take ˜ E ′ to be the linear map from polynomialsof degree at most d to R such that for all monomials p , ˜ E ′ [ p ] = E π ∈ S n [ ˜ E [ π ( p )]] . Now observe that6. ˜ E ′ [1] = E π ∈ S n [ ˜ E [1]] = 1 ∀ f, i : deg ( f ) + deg ( s i ) ≤ d, ˜ E ′ [ f s i ] = E π ∈ S n [ ˜ E [ π ( f ) π ( s i )]] = 0 because the system ofequations { s i = 0 } is symmetric under permutations of [ n ] .3. ∀ g : deg ( g ) ≤ d , ˜ E ′ [ g ] = E π ∈ S n [ ˜ E [ π ( g ) ]] ≥ Guided by this, we take the expected values over the uniform distribution over orderings of x , . . . , x n . These orderings are not solutions to the equations because each ordering causes oneequation P i = j x ij = 1 + z j to fail. However, a random ordering makes each individual equation P i = j x ij = 1 + z j true with high probability, so the intuition is that low degree SOS will not beable to detect that there is always one equation which fails. Deﬁnition 3.2.

We deﬁne U n to be the uniform distribution over orderings of x , . . . , x n , i.e. foreach permutation π : [ n ] → [ n ] we have that with probability n ! , x π (1) < x π (2) < . . . < x π ( n ) andthus for all i < j , x π ( i ) π ( j ) = 1 and x π ( j ) π ( i ) = 0 . Deﬁnition 3.3.

Given a polynomial f ( { x ij : i = j } ) , we deﬁne ˜ E n [ f ( { x ij : i = j } )] = E U n [ f ] Example 3.4. ∀ i = j, ˜ E n [ x ij ] = because there is a chance that i comes before j in a randomordering. Example 3.5.

For all distinct i, j, k , ˜ E n [ x ij x jk ] = because there is a chance that i < j < k ina random ordering. However, in order to fully deﬁne ˜ E n we have to deﬁne ˜ E n [ p ] for monomials p involving the z variables. We can do this as follows. Deﬁnition 3.6 (Candidate pseudo-expectation values) .

1. For all monomials p ( { x ij : i, j ∈ [ n ] , i = j } ) , we take ˜ E n [ p ] = E U n [ p ]

2. For all monomials p ( { x ij : i, j ∈ [ n ] , i = j } ) , we take ˜ E n h(cid:16)Q j ∈ A z j (cid:17) p i = 0 whenever A ⊆ [ n ] is non-empty because each z j could be positive or negative.3. For all monomials p ( { x ij : i, j ∈ [ n ] , i = j } , { z j : j ∈ [ n ] } ) and all j ∈ [ n ] , we take ˜ E n [ z j p ] = ˜ E n h(cid:16)P i = j x ij − (cid:17) p i because we have that for all j , z j = P i = j x ij − . ˜ E n are pseudo-expectation values We now discuss what needs to be checked in order to determine whether our candidate pseudo-expectation values ˜ E n are actually degree d pseudo-expectation values. To analyze ˜ E n , it is con-venient to create a new variable w j which is equal to z j . Deﬁnition 3.7.

Deﬁne w j = P i = j x ij − . { x ij } and { w j } , ˜ E n is the expectedvalues over a distribution of solutions. This implies that the polynomial equalities obtained bymultiplying one of the ordering principle equations in Deﬁnition 2.6 by a monomial will be satis-ﬁed at all degrees, not just up to degree d . However, each w j is supposed to be a square but this isnot actually the case for this distribution, so ˜ E n may fail to give valid pseudo-expectation values.In fact, this is the only way in which ˜ E n can fail to give valid pseudo-expectation values. We makethis observation precise with the following lemma: Lemma 3.8. If ˜ E n h(cid:16)Q j ∈ A w j (cid:17) g A (( { x ij : i, j ∈ [ n ] , i = j } ) i ≥ whenever A ⊆ [ n ] and | A | + deg ( g A ) ≤ d then ˜ E n gives degree d pseudo-expectation values.Proof. We ﬁrst check that ˜ E n [ g ] ≥ whenever deg ( g ) ≤ d as this is the more interesting part.Given a polynomial g of degree at most d , decompose g as g = X A ⊆ [ n ] Y j ∈ A z j ! g A ( x , . . . , x n ) where for all A ⊆ [ n ] , | A | + deg ( g A ) ≤ d . Now observe that ˜ E n [ g ] = ˜ E n  X A,A ′ ⊆ [ n ] Y j ∈ A z j Y j ∈ A ′ z j ! g A ( x , . . . , x n ) g A ′ ( x , . . . , x n )  = X A ⊆ [ n ] ˜ E n " Y j ∈ A w j ! g A ( x , . . . , x n ) ≥ We now check that the polynomial equalities obtained by multiplying one of the ordering prin-ciple equations in Deﬁnition 2.6 by a monomial are satisﬁed. By the deﬁnition of ˜ E n , we have thatfor all monomials p and all j ∈ [ n ] , ˜ E n [( X i = j x ij − − z j ) p ] = ˜ E n [( X i = j x ij − p ] − ˜ E n [( X i = j x ij − p ] = 0 To prove the other polynomial equalities, we use induction on the total degree of the { z j } variables.For the base case, observe that1. ˜ E n [1] = E U n [1] = 1

2. For all monomials p ( { x ij : i, j ∈ [ n ] , i = j } ) and for all ordering or transitivity constraints s i = 0 , ˜ E n [ ps i ] = E U n [ ps i ] = 0 For the inductive step, if p is a monomial which is divisible by z j for some j then write p = z j p ′ .By the inductive hypothesis, for all ordering or transitivity constraints s i = 0 , ˜ E n [ ps i ] = ˜ E n [ z j p ′ s i ] = ˜ E n [( X i = j x ij − p ′ s i ] = 0 p is a monomial of the form p = (cid:16)Q j ∈ A z j (cid:17) p ′ ( { x ij : i, j ∈ [ n ] , i = j } ) where A = ∅ then for all ordering or transitivity constraints s i = 0 , ˜ E n [ ps i ] = ˜ E n " Y j ∈ A z j ! p ′ s i = 0 O ( √ nlog ( n )) Degree SOS Upper Bound

In this section, we prove theorem 1.1 by constructing a degree O ( √ nlog ( n )) proof of the order-ing principle. To construct this proof, we ﬁrst ﬁnd a polynomial g of degree O ( √ nlog ( n )) suchthat ˜ E n [ g ] < . We then show that there is an SOS proof (which in fact uses only polynomialequalities) that E π ∈ S n [ π ( g ) ] = ˜ E n [ g ] < . ˜ E n We now show that ˜ E n fails to give valid pseudo-expectation values at degree O ( √ nlog ( n )) . Theorem 4.1.

For all n ≥ there exists a polynomial g ( w ) of degree ⌈ √ n ( log ( n ) + 1) ⌉ suchthat ˜ E [( z g ( w )) ] = E U n [ w g ( w )] < Proof.

Observe that over the uniform distribution of orderings, w is equally likely to be anyinteger in [ − , n − . To make E U n [ w g ( w )] negative, we want g ( w ) to have high magnitude at w = − and small magnitude on [1 , n − . For this, we can use Chebyshev polynomials. FromWikipedia [18], Deﬁnition 4.2.

The mth Chebyshev polynomial can be expressed as1. T m ( x ) = cos ( mcos − ( x )) if | x | ≤ T m ( x ) = (cid:16)(cid:0) x + √ x − (cid:1) m + (cid:0) x − √ x − (cid:1) m (cid:17) if | x | ≥ Theorem 4.3.

For all integers m ≥ and all x ∈ [ − , , | T m ( x ) | ≤ We now take g ( w ) = T m ( − w n ) where m = ⌈ √ n ( log ( n ) + 1) ⌉ and analyze g . Lemma 4.4.

Taking g ( w ) = T m ( − w n ) where m = ⌈ √ n ( log ( n ) + 1) ⌉ ,1. | g ( − | ≥ n

2. For all w ∈ [0 , n − , | g ( w ) | ≤ .Proof. The second statement follows immediately from Theorem 4.3 as when w ∈ [0 , n − , − w n ∈ [ − , so | g ( w ) | = | T m ( − w n ) | ≤ . For the ﬁrst statement, let ∆ = n andobserve that when x = − − ∆ ,1. √ x − p (1 + ∆) − ≥ √ . Thus, | x − √ x − | ≥ √ .9. x + √ x − < and x − √ x − < so | T m ( x ) | = 12 (cid:16)(cid:12)(cid:12)(cid:12) x + √ x − (cid:12)(cid:12)(cid:12) m + (cid:12)(cid:12)(cid:12) x − √ x − (cid:12)(cid:12)(cid:12) m (cid:17) ≥ (1 + √ m We now use the following proposition.

Proposition 4.5.

For all y ∈ [0 , and all m ≥ , (1 + y ) m ≥ ym Proof.

Observe that (1 + y ) m = (cid:16) (1 + y ) y (cid:17) ym and if y ≤ then (1 + y ) y ≥ .Since n ≥ , ∆ = n ≤ and √ ≤ . Applying Proposition 4.5 with y = √ andrecalling that m = ⌈ √ n ( log ( n ) + 1) ⌉ , | g ( − | = | T m ( − − ∆) | ≥ (1 + √ m ≥ √ m m √ n ≥ log ( n )+1 n We can now complete the proof of Theorem 4.1. By Lemma 4.4, E U n [ w g ′ ( w )] ≤ n − − n + n − X j =0 j ) < We now show that from the failure of ˜ E n , we can construct an SOS proof that the ordering principleequations are infeasible. Theorem 4.6.

If there exists a polynomial g of degree at most d such that ˜ E n [ g ] < then thereexists an SOS proof of degree at most d + 2 that the ordering principle equations are infeasible.Proof. To prove this theorem, we will show that for any monomial p ( { x i,j : i, j ∈ [ n ] , i = j } ) ofdegree at most d , there is a proof of degree at most d + 2 that n ! P π ∈ S n π ( p ) = ˜ E n [ p ] which usesonly polynomial equalities. To prove this, we observe that given arbitrary indices i , . . . , i k , wecan split things into cases based on the order of a i , . . . , a i k . Lemma 4.7.

Given the ordering and transitivity axioms, for all r ∈ N , tuples of distinct indices ( i , . . . , i r +1 ) , and permutations π ∈ S r , r − Y j =1 x i π ( j ) i π ( j +1) = x i r +1 i π (1) r − Y j =1 x i π ( j ) i π ( j +1) + r − X k =1 k − Y j =1 x i π ( j ) i π ( j +1) ! x i π ( k ) i r x i r i π ( k +1) r − Y j = k +1 x i π ( j ) i π ( j +1) ! + r − Y j =1 x i π ( j ) i π ( j +1) x i π ( r ) i r +1 and there is a degree r + 1 proof of this fact which uses only polynomial equalities. emark 4.8. The idea behind this lemma is that we have found the correct ordering for i , . . . , i r and we are inserting i r +1 into the correct place.Proof. Using the ordering and transitivity axioms,1. Q r − j =1 x i π ( j ) i π ( j +1) = x i r +1 i π (1) (cid:16)Q r − j =1 x i π ( j ) i π ( j +1) (cid:17) + x i π (1) i r +1 (cid:16)Q r − j =1 x i π ( j ) i π ( j +1) (cid:17)

2. For all k ∈ [ r − , x i π ( k ) i r +1 r − Y j =1 x i π ( j ) i π ( j +1) = ( x i π ( k ) i r +1 x i r +1 i π ( k +1) + x i π ( k ) i r +1 x i π ( k +1) i r +1 ) r − Y j =1 x i π ( j ) i π ( j +1) ! = ( x i π ( k ) i r +1 x i r +1 i π ( k +1) + x i π ( k +1) i r +1 ) r − Y j =1 x i π ( j ) i π ( j +1) ! where the second equality follows because of the transitivity axiom x i π ( k ) i π ( k +1) x i π ( k +1) i r +1 (1 − x i π ( k ) i r +1 ) = 0 which implies that x i π ( k ) i π ( k +1) x i π ( k +1) i r +1 x i π ( k ) i r +1 = x i π ( k ) i π ( k +1) x i π ( k +1) i r +1 .3. For all k ∈ [ r − , we have the transitivity axiom x i π ( k ) i r +1 x i r +1 i π ( k +1) (1 − x i π ( k ) i π ( k +1) ) = 0 which implies that x i π ( k ) i r +1 x i r +1 i π ( k +1) x i π ( k ) i π ( k +1) = x i π ( k ) i r +1 x i r +1 i π ( k +1) . Thus, x i π ( k ) i r +1 x i r +1 i π ( k +1) r − Y j =1 x i π ( j ) i π ( j +1) = k − Y j =1 x i π ( j ) i π ( j +1) ! x i π ( k ) i r x i r i π ( k +1) r − Y j = k +1 x i π ( j ) i π ( j +1) ! The result follows by combining all of these equalities.

Corollary 4.9.

Given the ordering and transitivity axioms, for all k and all sets of k distinct indices { i , . . . , i k } , X π ∈ S k k − Y j =1 x i π ( j ) i π ( j +1) and there is a degree k + 1 proof of this fact which uses only polynomial equalities. Corollary 4.10.

For any monomial p ( { x i,j : i, j ∈ [ n ] , i = j } ) of degree d whose variables containa total of k indices i , . . . , i k , there is a proof of degree at most d + k +1 that n ! P π ∈ S n π ( p ) = ˜ E n [ p ] which uses only polynomial equalities.Proof sketch. By Corollary 4.9, p = X π ∈ S k k − Y j =1 x i π ( j ) i π ( j +1) p d + k + 1 which uses only polynomial equalities.Using the transitivity axioms, we can prove that X π ∈ S k k − Y j =1 x i π ( j ) i π ( j +1) p = X π ∈ S k : p =1 when x iπ (1) <...

Deﬁnition 5.1.

We deﬁne Ω n,d to be the distribution on a variable u with support [0 , n − d ] ∩ Z and the following probabilities: P r [ u = k ] = (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1)

1. In Section 6, we show that to prove our sum of squares lower bound, it is sufﬁcient to showthat for all polynomials g ∗ of degree at most d , for some d ≥ d , E Ω n,d [( u − g ∗ ( u ) ] ≥ .Equivalently, n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) kg ∗ ( k + 1) ≥ g ∗ (0) This reduces the problem to analyzing a distribution on one variable. For the precise state-ment of this result, see Theorem 6.1.2. In Section 7, we observe that an approximation to the above statement is the statement thatfor some small ∆ > (we will take ∆ = d n ), taking g ( x ) = g ∗ (cid:0) x ∆ + 1 (cid:1) , Z ∞ x =0 g ( x ) xe − x dx ≥ ∆ g ( − ∆)

12e then prove this approximate statement. For the precise statement of this result, see The-orem 7.8.3. In Section 8, we analyze the difference between ∆ P ∞ k =0 ( k ∆) e − k ∆ g ( k ∆) and R ∞ x =0 g ( x ) xe − x dx and show that it is small. For the precise statement of this result, see Theorem 8.16.4. In Section 9, we analyze the difference between ∆ P n − d − k =0 ( n − k − d − )( n − d − ) ( k ∆) g ( k ∆) and ∆ P ∞ k =0 ( k ∆) e − k ∆ g ( k ∆) . For the precise statement of this result, see Theorem 9.1.In Section 10, we put all of these pieces together to prove our SOS lower bound. ˜ E to Analyzing a Single-Variable Distri-bution Recall that Ω n,d is the distribution on a variable u with support [0 , n − d ] ∩ Z and the followingprobabilities: P r [ u = k ] = (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1) In this section, we show that to check that our candidate pseudo-expectation values ˜ E n are valid,it is sufﬁcient to analyze the distribution Ω n,d . In particular, we prove the following theorem: Theorem 6.1.

For all d, d , n ∈ N such that d ≤ d ≤ n , if there is a polynomial g of degree atmost d such that ˜ E n [ g ] < then there is a polynomial g ∗ : R → R of degree at most d such that E Ω n,d [( u − g ∗ ( u ) ] < . Equivalently, n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) kg ∗ ( k + 1) < g ∗ (0) To see why this statement is equivalent, observe that E Ω n,d [( u − g ∗ ( u ) ] = n − d X k =0 (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1) ( k − g ∗ ( k )= n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1) kg ∗ ( k + 1) ! − (cid:0) n − d − (cid:1)(cid:0) nd (cid:1) g ∗ (0) Thus, E Ω n,d [( u − g ∗ ( u ) ] < if and only if n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1) kg ∗ ( k + 1) < (cid:0) n − d − (cid:1)(cid:0) nd (cid:1) g ∗ (0) ( nd )( n − d − ) , this is equivalent to n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) kg ∗ ( k + 1) < g ∗ (0) In the remainder of this section, we prove this theorem by starting with the polynomial g andconstructing the polynomial g ∗ . g We ﬁrst use symmetry to argue that we can take g to be symmetric under permutations of all but d distinguished indices. For this, we use Theorem 4.1 in [13], which is essentially implied byCorollary 2.6 of [15]. Deﬁnition 6.2.

The index degree of a polynomial g is the maximum number of indices mentionedin any monomial of g . Example 6.3. g = x x + x has index degree and degree . Theorem 6.4. If ˜ E is a linear map from polynomials to R which is symmetric with respect topermutations of [1 , n ] then for any polynomial g , we can write ˜ E [ g ] = X I ⊆ [1 ,n ] ,j : | I |≤ indexdeg ( g ) ˜ E [ g Ij ] where for all I, j ,1. g Ij is symmetric with respect to permutations of [1 , n ] \ I .2. indexdeg ( g Ij ) ≤ indexdeg ( g ) and deg ( g Ij ) ≤ deg ( g ) ∀ i ∈ I, P π ∈ S [1 ,n ] \ ( I \{ i } ) π ( g Ij ) = 0 Remark 6.5.

The statement that deg ( g Ij ) ≤ deg ( g ) is not in Theorem 4.1 as stated in [13] but itfollows from the proof. By Theorem 6.4, if there is a polynomial g of degree at most d such that ˜ E n [ g ] < thenthere is a polynomial g of degree at most d such that1. ˜ E n [ g ] < g is symmetric under permutations of [2 n ] \ I for some I ⊆ [2 n ] such that | I | ≤ indexdeg ( g ) ≤ deg ( g ) ≤ d .where indexdeg ( g ) ≤ deg ( g ) because all of our variables mention at most two indices.14 .2 Decomposing g Based on z j Variables

We now show that we can choose g to be a polynomial of the form g = (cid:16)Q j ∈ A z j (cid:17) g A where A ⊆ [ n ] and g A is a polynomial in the x ij variables. To do this, just as in Section 3.2, we decompose g as g = P A ⊆ [ n ]: | A |≤ d (cid:16)Q j ∈ A z j (cid:17) g A where each g A is a polynomial in the x ij variables andobserve that ˜ E n [ g ] = ˜ E n  X A,A ′ ⊆ [ n ] Y j ∈ A z j Y j ∈ A ′ z j ! g A g A ′  = X A ⊆ [2 n ] ˜ E n " Y j ∈ A z j ! g A If ˜ E n [ g ] < then there must be an A ⊆ [2 n ] such that ˜ E n [ (cid:16)Q j ∈ A z j (cid:17) g A ] < . Thus, we cantake g = (cid:16)Q j ∈ A z j (cid:17) g A . Note that g = (cid:16)Q j ∈ A z j (cid:17) g A is symmetric under permutations of [2 n ] \ I ′ where I ′ = I ∪ A and thus | I ′ | ≤ d . We now further decompose ˜ E n [ g ] by observing that for any set of indices I ′′ = { i , . . . , i m } , ˜ E n [ g ] = ˜ E n " X π ∈ S m m − Y j =1 x i π ( j ) i π ( j +1) ! g Since ˜ E n [ g ] < , there must be a π ∈ S m such that ˜ E n h(cid:16)P π ∈ S m Q m − j =1 x i π ( j ) i π ( j +1) (cid:17) g i < . Thus, we can restrict our attention to ˜ E n h(cid:16)P π ∈ S m Q m − j =1 x i π ( j ) i π ( j +1) (cid:17) g i which effectivelyimposes the ordering x i π (1) < . . . < x i π ( m ) .For technical reasons, we take I ′′ to be I ′ = I ∪ A plus some additional indices so that | I ′′ | = d .Without loss of generality, we can assume that I ′′ = [ d ] and π is the identity, giving the ordering x < x < . . . < x d .We now observe that under this ordering, for all j ∈ [ d ] , z j = ( j −

2) + X i ∈ [2 n ] \ [ d ] x ij Thus, for all j ∈ [2 , d ] , z j is a sum of squares so (cid:16)Q j ∈ A \{ } z j (cid:17) g A is a sum of squares. This im-plies that ∈ A as otherwise ˜ E n [ g ] ≥ . Following similar logic as before, there is a polynomial g { } in the x ij variables of degree at most d − such that ˜ E n " d − Y i =1 x i ( i +1) ! z g { } < x < x < . . . < x d , we can express g { } interms of the following new variables: Deﬁnition 6.6.

For i ∈ [ d ] ∪ { } , we deﬁne the variable u i so that1. u = P j ∈ [ n ] \ [ d ] x j is the number of elements before a .2. For i ∈ [ d − , u i = P j ∈ [ n ] \ [ d ] x ij x j ( i +1) is the number of elements between a i and a i +1 .3. u d = P j ∈ [ n ] \ [ d ] x d j is the number of elements after a d . With these new variables, ˜ E n " d − Y i =1 x i ( i +1) ! z g { } = 1 d ! E u ,...,u d ∈ N ∪{ } : P d j =0 u j =2 n − d [( u − g { } ( u , . . . , u d )] < where the d ! term appears because the probability of having the ordering x < x < . . . < x d is d ! . We now complete the proof of Theorem 6.1 by constructing g ∗ ( u ) and proving that it has theneeded properties. To construct g ∗ , we take g ∗ ( u ) = E u ,...,u d ∈ N ∪{ } : P d j =1 u j =2 n − d − u [ g { } ( u , . . . , u d ) ] We ﬁrst need to check that g ∗ ( u ) is indeed a polynomial of degree at most d in u . This followsfrom the following lemma: Lemma 6.7.

For all d ∈ N and any polynomial p ( u , . . . , u d ) of degree at most d , E u ,...,u d ∈ N ∪{ } : P d j =1 u j = n ′ p ( u , . . . , u d ) is a polynomial of degree at most d in n ′ (for n ′ ∈ N ∪ { } ).Proof sketch. We illustrate why this lemma is true by computing these expected values for a fewmonomials in the variables { u , . . . , u d } . The ideas used in these computations can be generalizedto any monomial. The idea is to consider placing d − dividing lines among n ′ labeled balls in arandom order. Example 6.8.

With balls and bins, the possibilities are as follows:1. | : Balls and are in the ﬁrst bin in the order , .2. | : Balls and are in the ﬁrst bin in the order , .3. | : Ball is in the ﬁrst bin and ball is in the second bin.4. | : Ball is in the ﬁrst bin and ball is in the second bin. . | : Balls and are in the second bin in the order , .6. | : Balls and are in the second bin in the order , . To analyze monomials in the variables { u , . . . , u d } , we write u j = P n ′ i =1 t ij where t ij = 1 ifball i is in bin j and t ij = 0 otherwise.1. By symmetry, the probability that a given ball is put into the ﬁrst bin is d . Thus, E [ u ] = n ′ E [ t i ] = n ′ d .2. If we consider balls i and j where i = j , the probability that ball i is put into the ﬁrst bin is d . If ball i is placed into the ﬁrst bin, this effectively splits the ﬁrst bin into two bins, the partbefore ball i and the part after ball i . For ball j , the probability that it is put into one of theseparts is d +1 and the probability that it is put into the second bin is d +1 . Thus, E [ t i t j ] = d ( d +1) and E [ t i t j ] = d ( d +1) . This implies that E [ u u ] = n ′ ( n ′ − E [ t i t j ] = n ′ ( n ′ − d ( d +1) and E [ u ] = n ′ E [ t i ] + n ′ ( n ′ − E [ t i t j ] = n ′ d + n ′ ( n ′ − d ( d +1) .Following similar ideas, we can analyze any monomial of degree at most d and show that itsexpected value is a polynomial in n ′ of degree at most d .To complete the proof of Theorem 6.1, we need one more technical lemma Lemma 6.9.

For all n, k, d ∈ N such that k ≤ n − d , (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ≤ (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ! Proof.

Observe that for all k ′ , n ′ such that < k ′ ≤ n ′ , (cid:0) n ′ − k ′ n ′ (cid:1) = 1 − k ′ n ′ + k ′ n > − k ′ n ′ . Nowobserve that (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ! = d − Y j =1 (cid:18) n − k − j n − j (cid:19) ≥ d − Y j =1 (cid:18) n − k − j n − j (cid:19) ≥ d − Y j =1 n − k − jn − j = (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) We now complete the proof of Theorem 6.1. Recall that Ω n,d is the distribution on a variable u with support [0 , n − d ] ∩ Z and the following probabilities P r [ u = k ] = (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1) We have the following facts:1. E Ω n,d [( u − g ∗ ( u )] = E u ,...,u d ∈ N ∪{ } : P d j =0 u j =2 n − d [( u − g { } ( u , . . . , g d )] <

2. For all u ∈ [0 , n − d ] ∩ Z , g ∗ ( u ) ≥ emark 6.10. Intuitively, g ∗ should already be a sum of squares. However, we are not sure howto prove this, so we instead show that g ∗ is sufﬁcient for our purposes. Since E Ω n,d [( u − g ∗ ( u )] < , n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) nd (cid:1) ( k − g ∗ ( k ) < (cid:0) n − d − (cid:1)(cid:0) nd (cid:1) g ∗ (0) which implies that n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ( k − g ∗ ( k ) g ∗ (0) < In turn, this implies that n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ! ( k − g ∗ ( k ) g ∗ (0) ≤ n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ! ( k − g ∗ ( k ) g ∗ (0) < Using Lemma 6.9, n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ( k − g ∗ ( k ) g ∗ (0) ≤ n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ! ( k − g ∗ ( k ) g ∗ (0) < Multiplying both sides by g ∗ (0) , n − d X k =1 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ( k − g ∗ ( k ) = n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) kg ∗ ( k + 1) < g ∗ (0) This implies that E Ω n,d [( u − g ∗ ( u )] < , as needed. Ω n,d To prove our SOS lower bound, we need to show that for any polynomial g ∗ of degree at most d , E Ω n,d [( u − g ∗ ( u )] ≥ . Equivalently, we need to show that for any polynomial g ∗ of degree atmost d , n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) kg ∗ ( k + 1) ≥ g ∗ (0) The expression P n − d − k =0 ( n − k − d − )( n − d − ) kg ∗ ( k + 1) is hard to analyze, so we approximate it by an integral.Observe that as long as k << n , kd << n and k d << n , (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) = d − Y j =1 (cid:18) − kn (cid:19) n − j − k − n − j − kn ≈ (cid:18) − kn (cid:19) d − ≈ e − d kn − n − j − k − n − j − kn = ( n − k )( n − j ) − n ( n − j − k − n − k )( n − j ) = jk + n ( n − k )( n − j ) which is small. Taking ∆ = d n and g ( x ) = g ∗ (cid:0) x ∆ + 1 (cid:1) , approximately what we need to show isthat for all polynomials g of degree at most d , ∞ X j =0 ( j ∆) e − j ∆ g ( j ∆) ≥ g ( − ∆) In turn, this statement is approximately the same as the statement that for all polynomials g ofdegree at most d , Z ∞ x =0 g ( x ) xe − x dx ≥ ∆ g ( − ∆) In the remainder of this section, we prove this statement when dd << n by analyzing the distribu-tion µ ( x ) = xe − x . In Sections 8 and 9, we will then analyze how to bound the difference betweenthis statement and the statement which we actually need to prove. Remark 7.1.

For technical reasons, we will actually take ∆ = d n rather than ∆ = d n . Fordetails, see Section 9. Remark 7.2.

We might think that the probability that x is much more than log ( n ) is very smalland can be ignored. If so, than using Chebyshev polynomials would cause this statement to failat degree ˜ O (cid:0)p nd (cid:1) which is much less than √ n . However, this is not correct. Intuitively, since weare considering polynomials of degree up to d , we should consider the point where x d e − x becomesnegligible, which is when x is a sufﬁciently large constant times dlog ( d ) .Based on this, we can only ignore u which are a sufﬁciently large constant times dlog ( d ) nd .Roughly speaking, we will want to ignore all u > n , so we want d to be at least Cdlog ( d ) forsome sufﬁciently large constant C . For details, see Section 9. µ ( x ) = xe − x In order to analyze R ∞ x =0 g ( x ) xe − x dx , it is very useful to ﬁnd the unique orthonormal basis { h k : k ∈ N ∪{ }} for the distribution µ ( x ) = xe − x such that h k has degree k and the leading coefﬁcientof h k is positive. In this subsection, we ﬁnd this orthonormal basis. Deﬁnition 7.3.

Given two polynomials f and g , we deﬁne f · g = R ∞ x =0 f ( x ) g ( x ) xe − x dx Deﬁnition 7.4.

We deﬁne h k to be the degree k polynomial such that the leading coefﬁcient of h k is positive, h k · h k = 1 , and for all j < k , h j · h k = 0 . Lemma 7.5. h k ( x ) = 1 p k !( k + 1)! k X j =0 ( − k − j (cid:18) kj (cid:19) ( k + 1)!( j + 1)! x j Proof. roposition 7.6. x p · x q = ( p + q + 1)! Computing directly using Gram-Schmidt, the ﬁrst few polynomials in the orthonormal basisare1. h = 1 h = √ ( x − h = √ ( x − x + 6) h = √ ( x − x + 36 x − h = √ ( x − x + 120 x − x + 120) To check the general pattern, we need to check that for all i ∈ [0 , k − , h k · x i = 0 and h k · h k = 1 .To see this, observe that for all i ≥ , h k · x i = 1 p k !( k + 1)! k X j =0 ( − k − j (cid:18) kj (cid:19) ( k + 1)!( j + 1)! ( i + j + 1)!= 1 p k !( k + 1)! k X j =0 ( − k − j (cid:18) kj (cid:19)(cid:18) i + j + 1 j + 1 (cid:19) ( k + 1)! i ! Now observe that for all k and all functions f ( j ) , k X j =0 ( − k − j (cid:18) kj (cid:19) f ( j ) = (∆ k f )(0) where (∆ f )( x ) = f ( x + 1) − f ( x ) Proposition 7.7. If f = x i then ∆ k f = 0 if i < k and ∆ k f = k ! if i = k . Viewing (cid:0) i + j +1 j +1 (cid:1) as a polynomial in j , (cid:18) i + j + 1 j + 1 (cid:19) = ( i + j + 1)! i !( j + 1)! j i i ! + lower order termsPutting everything together,1. h k · x i = 0 whenever i ≤ k .2. h k · h k = √ k !( k +1)! ( h k · x k ) = k !( k +1)! k !( k +1)! (∆ k (cid:0) k + j +1 j +1 (cid:1) (0)) = 1 .3 Proof of the Approximate Statement Now that we have the orthonormal basis for µ ( x ) = xe − x , we prove the approximate statement weneed. Theorem 7.8.

For all d ∈ N and all ∆ > such that d + 1) ∆ e d ∆ ≤ , for any polynomial g of degree at most d , Z ∞ x =0 g ( x ) xe − x dx ≥ g ( − ∆) Proof.

Given a polynomial g of degree at most d , write g = P dk =0 a k h k . Since { h k } is anorthonormal basis for µ ( x ) = xe − x , Z ∞ x =0 g ( x ) xe − x dx = d X k =0 a k Using Cauchy Schwarz, we have that d X k =0 | a k | ≤ vuut d X k =0 a k ! d X k =0 ! = p ( d + 1) vuut d X k =0 a k which implies that P dk =0 a k ≥ ( P dk =0 | a k | ) d +1 In order to upper bound | g ( − ∆) | , we need to bound h k ( x ) near x = 0 . For this, we use thefollowing lemma: Lemma 7.9.

For all k ∈ N and all x ∈ R , | h k ( x ) | ≤ √ k + 1 e k | x | Proof.

Observe that h k ( x ) ≤ p k !( k + 1)! k X j =0 (cid:18) kj (cid:19) ( k + 1)!( j + 1)! | x | j ≤ √ k + 1 k X j =0 ( k | x | ) j j !( j + 1)! ≤ √ k + 1 e k | x | By Lemma 7.9, | g ( − ∆) | ≤ d X k =0 | a k |√ k + 1 e k ∆ ≤ √ d + 1 e d ∆ d X k =0 | a k | Thus, g ( − ∆) ≤ ( d + 1) e d ∆ (cid:16)P dk =0 | a k | (cid:17) .Putting everything together, as long as d +1) ∆ e d ∆ ≤ , R ∞ x =0 g ( x ) xe − x dx ≥ g ( − ∆) ,as needed. 21 Handling Numerical Integration Error

In this section, we show how to bound the difference between ∆ P ∞ j =0 ( j ∆) e − j ∆ g ( j ∆) and R ∞ x =0 g ( x ) xe − x dx In this subsection, we describe how the numerical integration error can be bounded using higherderivatives.

Lemma 8.1.

For any ∆ > and any differentiable function f : [0 , ∞ ) → R , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ∞ x =0 f ( x ) dx − ∆ ∞ X j =0 f ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ∞ x =0 | f ′ ( x ) | dx Proof.

This result follows by summing the following proposition over all j ∈ N ∪ { } and usingthe fact that | a + b | ≤ | a | + | b | . Proposition 8.2.

For all j ∈ N ∪ { } (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ( j +1)∆ x = j ∆ ( f ( x ) − f ( j ∆)) dx (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ( j +1)∆ x = j ∆ | f ′ ( x ) | dx Proof.

Observe that for all j ∈ N ∪ { } , for all x ∈ [ j ∆ , ( j + 1)∆] , | f ( x ) − f ( j ∆) | ≤ Z ( j +1)∆ x = j ∆ | f ′ ( x ) | dx Thus, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ( j +1)∆ x = j ∆ ( f ( x ) − f ( j ∆)) dx (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ( j +1)∆ x = j ∆ | f ′ ( x ) | dx Using higher derivatives, we can get better bounds on the error.

Lemma 8.3.

For any ∆ > and any twice differentiable function f : [0 , ∞ ) → R , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ∞ x =0 f ( x ) dx − ∆ ∞ X j =0 f ( j ∆) + ∆2 f (0) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ∞ x =0 | f ′′ ( x ) | dx Proof.

This result follows from summing the following lemma over all j ∈ N ∪ { } and using thefact that | a + b | ≤ | a | + | b | . 22 emma 8.4. For all j ∈ N ∪ { } , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ( j +1)∆ x = j ∆ f ( x ) dx − ∆2 ( f ( j ∆) + f (( j + 1)∆)) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | dx Proof.

We prove this lemma using the following estimate of f ( x ) for x ∈ [ j ∆ , ( j + 1)∆] Proposition 8.5.

For all x ∈ [ j ∆ , ( j + 1)∆] , | f ( x ) − f ( j ∆) − ( x − j ∆) f ′ ( j ∆) | ≤ ( x − j ∆) Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | dx Proof.

Observe that for all j ∈ N ∪ { } , for all x ∈ [ j ∆ , ( j + 1)∆] , | f ′ ( x ) − f ′ ( j ∆) | ≤ Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | Taking the integral of this equation from j ∆ to x and using the fact that | a + b | ≤ | a | + | b | , | f ( x ) − f ( j ∆) − ( x − j ∆) f ′ ( j ∆) | ≤ ( x − j ∆) Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | We now make the following observations:1. By Proposition 8.5, | f ( j ∆)+∆ f ′ ( j ∆) − f (( j +1)∆) | = | f (( j +1)∆) − f ( j ∆) − ∆ f ′ ( j ∆) | ≤ ∆ Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | dx

2. Taking the integral of Proposition 8.5 from j ∆ to ( j + 1)∆ and using the fact that | a + b | ≤| a | + | b | , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ( j +1)∆ x = j ∆ f ( x ) dx − ∆ f ( j ∆) − ∆ f ′ ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | Adding ∆2 times the ﬁrst equation to the second equation, we have that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ( j +1)∆ x = j ∆ f ( x ) dx − ∆2 ( f ( j ∆) + f (( j + 1)∆)) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ Z ( j +1)∆ x = j ∆ | f ′′ ( x ) | dx as neededWe now generalize this argument to (t+1)th derivatives.23 eﬁnition 8.6. For all t ∈ N , we deﬁne M t to be the ( t + 1) × ( t + 1) matrix with entries ( M t ) ab = ( b − ( a − where ( M t ) = 1 . Note that M t is a Vandermonde matrix and is thusinvertible. Deﬁnition 8.7.

For all t ∈ N , we deﬁne v t to be the vector of length t + 1 with entries ( v t ) a = t a − a and we deﬁne c t = M − t v t . Lemma 8.8.

For all t ∈ N , for any ∆ > and any function f : [0 , ∞ ) → R which can bedifferentiated t + 1 times, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) t t − X b =0 Z ∞ x = b ∆ f ( x ) dx ! − ∆ ∞ X j =0 f ( j ∆) + ∆ t − X j =0 t X b = j +1 ( c t ) b +1 ! f ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( t ∆) t +1 (cid:18) t ( t + 1)! + P tb =1 | ( c t ) b +1 | t ! (cid:19) Z ∞ x =0 | f ( t +1) ( x ) | dx Proof.

This result follows from summing the following lemma over all j ∈ N ∪ { } : Lemma 8.9.

For all j ∈ N ∪ { } , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) t Z ( j + t )∆ x = j ∆ f ( x ) dx − ∆ t X b =0 ( c t ) b +1 f (( j + b )∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( t ∆) t +1 (cid:18) t + 1)! + P tb =1 | ( c t ) b +1 | t ( t !) (cid:19) Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx Proof.

We prove this lemma using the following estimate of f ( x ) for x ∈ [ j ∆ , ( j + t )∆] Proposition 8.10.

For all x ∈ [ j ∆ , ( j + t )∆] , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f ( x ) − t X a =0 ( x − j ∆) a a ! f ( a ) ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( x − j ∆) t t ! Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx Proof.

Observe that for all j ∈ N ∪ { } , for all x ∈ [ j ∆ , ( j + t )∆] , | f ( t ) ( x ) − f ( t ) ( j ∆) | ≤ Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | Taking the integral of this equation from j ∆ to x t times and using the fact that | a + b | ≤ | a | + | b | , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f ( x ) − t X a =0 ( x − j ∆) a a ! f ( a ) ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( x − j ∆) t t ! Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx We now make the following observations:1. By Proposition 8.10, for all b ∈ [ t ] , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) t X a =0 ( b ∆) a a ! f ( a ) ( j ∆) − f ( j ∆ + b ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( b ∆) t t ! Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) |

24. Taking the integral of Proposition 8.10 from j ∆ to ( j + t )∆ and using the fact that | a + b | ≤| a | + | b | , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z ( j + t )∆ x = j ∆ f ( x ) dx − t X a =0 ( t ∆) a +1 ( a + 1)! f ( a ) ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( t ∆) t +1 ( t + 1)! Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx Adding ( c t ) b +1 times the ﬁrst equation for each b ∈ [ t ] to t times the second equation, we have that t Z ( j + t )∆ x = j ∆ f ( x ) dx − t X a =0 ∆ a +1 t a ( a + 1)! − t X b =1 b a ( c t ) b +1 a ! ! f ( a ) ( j ∆) − t X b =1 ∆( c t ) b +1 f ( j ∆ + b ) ≤ ( t ∆) t +1 ( t + 1)! Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx + t X b =1 ∆ | ( c t ) b +1 | ( b ∆) t t ! Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx ≤ ( t ∆) t +1 (cid:18) t + 1)! + P tb =1 | ( c t ) b +1 | t ( t !) (cid:19) Z ( j + t )∆ x = j ∆ | f ( t +1) ( x ) | dx Thus, it is sufﬁcient to show the following:1. If a = 0 then t a ( a +1)! − P tb =1 b a ( c t ) b +1 a ! = ( c t ) .2. If a ∈ [ t ] then t a ( a +1)! − P tb =1 b a ( c t ) b +1 a ! = 0 To see these statements, observe that t X b =0 b a ( c t ) b +1 = t +1 X b =1 ( M t ) ( a +1) ,b ( c t ) b = ( v t ) a +1 = t a a + 1 Thus, for all a ∈ [ t ] ∪ { } t a ( a + 1)! − t X b =0 b a ( c t ) b +1 a ! = 0 When a = 0 , P tb =1 b a ( c t ) b +1 a ! = P tb =0 b a ( c t ) b +1 a ! − ( c t ) so t a ( a +1)! − P tb =1 b a ( c t ) b +1 a ! = ( c t ) .When a > , P tb =1 b a ( c t ) b +1 a ! = P tb =0 b a ( c t ) b +1 a ! so t a ( a +1)! − P tb =1 b a ( c t ) b +1 a ! = 0 h k In order to use our tools, we need bounds on the integrals of the functions h k . Lemma 8.11.

For all j, j ′ ∈ N ∪ { } , R ∞ x =0 | h j ( x ) h j ′ ( x ) | xe − x dx ≤ Proof.

Observe that Z ∞ x =0 | h j ( x ) h j ′ ( x ) | xe − x dx ≤ Z ∞ x =0 h j ( x ) + h j ′ ( x )2 xe − x dx = 1 emma 8.12. For all j ∈ N ∪ { } , R ∞ x =0 h j ( x ) e − x dx ≤ j + 8 Proof.

The cases where j = 0 and j = 1 can be computed directly. For j > , observe that byLemma 7.9, Z ∞ x =0 h j ( x ) e − x dx = Z j x =0 h j ( x ) e − x dx + Z ∞ x = j h j ( x ) e − x dx ≤ Z j x =0 ( j + 1) e jx e − x dx + j Z ∞ x = j h j ( x ) xe − x dx ≤ ( j + 1)2 j − e −

1) + j ≤ j + 8 Corollary 8.13.

For all j, j ′ ∈ N ∪ { } , R ∞ x =0 | h j ( x ) h j ′ ( x ) | e − x dx ≤ p ( j + 8)( j ′ + 8) Proof.

Observe that by Lemma 8.12, Z ∞ x =0 | h j ( x ) h j ′ ( x ) | e − x dx ≤ Z ∞ x =0 q j ′ +8 j +8 h j ( x ) + q j +8 j ′ +8 h j ′ ( x )2 e − x dx ≤ p ( j + 8)( j ′ + 8) h k We also need to analyze what happens when we take the derivative of h k . Calculating directly, theﬁrst few derivatives are:1. d (1) dx = 0 d ( x − dx = 1 d ( x − x +6) dx = 2 x − x − − d ( x − x +36 x − dx = 3 x − x + 36 = 3( x − x + 6) − x −

2) + 6 d ( x − x + 120 x − x + 120) dx = 4 x − x + 240 x − x − x + 36 x − − x − x + 6) + 24( x − − The general pattern is as follows:

Lemma 8.14. h ′ k ( x ) = P k − k ′ =0 ( − k − − k ′ k ! k ′ ! √ k ′ !( k ′ +1)! √ k !( k +1)! h j ( x ) roof. To prove this lemma, we need to show that the derivative of p k !( k + 1)! h k = k X j =0 ( − k − j (cid:18) kj (cid:19) ( k + 1)!( j + 1)! x j is k − X k ′ =0 ( − k − − k ′ k ! k ′ ! k ′ X j =0 ( − k ′ − j (cid:18) k ′ j (cid:19) ( k ′ + 1)!( j + 1)! x j ! = k − X k ′ =0 ( − k − − k ′ k ! k ′ ! (cid:16)p k ′ !( k ′ + 1)! h k ′ (cid:17) To prove this, we use the following proposition.

Proposition 8.15.

For all n and all j , n − X k = j (cid:18) kj (cid:19) = (cid:18) nj + 1 (cid:19) Proof.

Observe that choosing j + 1 objects out of n objects is equivalent to choosing the position k + 1 of the last object and then choosing the remaining j objects from the ﬁrst k objects.With this proposition in hand, we observe that k − X k ′ =0 ( − k − − k ′ k ! k ′ ! k ′ X j =0 ( − k ′ − j (cid:18) k ′ j (cid:19) ( k ′ + 1)!( j + 1)! x j ! = k − X j =0 ( − k − − j k !( j + 1)! k − X k ′ = j ( k ′ + 1) (cid:18) k ′ j (cid:19)! x j = k − X j =0 ( − k − − j k ! j ! k − X k ′ = j (cid:18) k ′ + 1 j + 1 (cid:19)! x j = k − X j =0 ( − k − − j k ! j ! (cid:18) k + 1 j + 2 (cid:19) x j = k X j =1 ( − k − j k ! j ! (cid:18) k + 1 j + 1 (cid:19) ( jx j − ) = k X j =1 ( − k − j (cid:18) kj (cid:19) ( k + 1)!( j + 1)! ( jx j − )= d (cid:16)P kj =0 ( − k − j (cid:0) kj (cid:1) ( k +1)!( j +1)! x j (cid:17) dx In this subsection, we use our tools to bound the numerical integration error ∆ ∞ X j =0 ( j ∆) e − j ∆ g ( j ∆) − Z ∞ x =0 g ( x ) xe − x dx heorem 8.16. For all t ∈ N , there exist constants C t , C t > such that for all ∆ > , d ∈ N ,and polynomials g of degree at most d , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∆ ∞ X j =0 ( j ∆) e − j ∆ g ( j ∆) − Z ∞ x =0 g ( x ) xe − x dx (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:0) C t ( d ∆) e td ∆ + C t d ( d ∆) t +1 (cid:1) Z ∞ x =0 g ( x ) xe − x dx Proof.

To prove this, we bound R ∞ x =0 (cid:12)(cid:12)(cid:12)(cid:12) d t +1 ( g ( x ) xe − x ) dx t +1 (cid:12)(cid:12)(cid:12)(cid:12) dx . Lemma 8.17.

For all t, d ∈ N If g = P di =0 a i h i is a polynomial of degree at most d then Z ∞ x =0 (cid:12)(cid:12)(cid:12)(cid:12) d t +1 ( g ( x ) xe − x ) dx t +1 (cid:12)(cid:12)(cid:12)(cid:12) dx ≤ ( t + 4)( d + 8)( d + 1)(3 d ) t d X i =0 a i ! Proof.

Observe that d t +1 ( g ( x ) xe − x ) dx t +1 = t +1 X j =0 t +1 − j X j =0 ( t + 1)!( − t +1 − j − j j ! j !( t + 1 − j − j )! d j g ( x ) dx j d j g ( x ) dx j xe − x + t X j =0 t − j X j =0 ( t + 1)!( − t − j − j j ! j !( t − j − j )! d j g ( x ) dx j d j g ( x ) dx j e − x By Lemma 8.14, if f = P di =0 b i h i is a polynomial of degree at most d then writing dfdx = P d − i =0 b ′ i h i , P d − i =0 | b ′ i | ≤ d P di =0 | b i | . By Lemma 8.11 and Corollary 8.13, we have that for all j, j ′ ∈ [0 , d ] ,1. For all j, j ′ ∈ N ∪ { } , R ∞ x =0 | h j ( x ) h j ′ ( x ) | xe − x dx ≤

2. For all j, j ′ ∈ N ∪ { } , R ∞ x =0 | h j ( x ) h j ′ ( x ) | e − x dx ≤ p ( j + 8)( j ′ + 8) Putting these facts together, if g = P di =0 a i h i then Z ∞ x =0 (cid:12)(cid:12)(cid:12)(cid:12) d t +1 ( g ( x ) xe − x ) dx t +1 (cid:12)(cid:12)(cid:12)(cid:12) dx ≤ (cid:0) t +1 d t +1 + ( t + 1)3 t d t ( d + 8) (cid:1) d X i =0 | a i | ! ≤ ( t + 4)( d + 8)( d + 1)(3 d ) t d X i =0 a i ! With this bound in hand, we now apply Lemma 8.8 with f ( x ) = g ( x ) xe − x . For convenience,we recall the statement of Lemma 8.8 here: 28or all t ∈ N , for any ∆ > and any function f : [0 , ∞ ) → R which can be differentiated t + 1 times, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) t t − X b =0 Z ∞ x = b ∆ f ( x ) dx ! − ∆ ∞ X j =0 f ( j ∆) + ∆ t − X j =0 t X b = j +1 ( c t ) b +1 ! f ( j ∆) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ( t ∆) t +1 (cid:18) t ( t + 1)! + P tb =1 | ( c t ) b +1 | t ! (cid:19) Z ∞ x =0 | f ( t +1) ( x ) | dx To use this bound, we need to bound f ( x ) = g ( x ) xe − x when x ∈ [0 , t ∆] . Lemma 8.18. If g = P di =0 a i h i then for all x ∈ [0 , t ∆] , f ( x ) = g ( x ) xe − x ≤ t ∆( d + 1) e dt ∆ d X i =0 a i ! Proof.

By Lemma 7.9, for all x ∈ R and all j ∈ N , | h j ( x ) | ≤ √ j + 1 e j | x | . Thus, for all x ∈ [0 , t ∆] , | g ( x ) | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d X i =0 a i h i ( x ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ √ d + 1 e dt ∆ d X i =0 | a i | ! which implies that for all x ∈ [0 , t ∆] g ( x ) xe − x ≤ t ∆( d + 1) e dt ∆ d X i =0 | a i | ! ≤ t ∆( d + 1) e dt ∆ d X i =0 a i ! Using Lemma 8.18, we make the following observations:1. (cid:12)(cid:12)R ∞ f ( x ) dx − t (cid:0)P t − b =0 R ∞ x = b ∆ f ( x ) dx (cid:1)(cid:12)(cid:12) ≤ ( t ∆) ( d + 1) e dt ∆ (cid:16)P di =0 a i (cid:17) (cid:12)(cid:12)(cid:12) ∆ P t − j =0 (cid:16)P tb = j +1 ( c t ) b +1 (cid:17) f ( j ∆) (cid:12)(cid:12)(cid:12) ≤ ( t ∆) ( d + 1) e dt ∆ (cid:0)P tb =1 | ( c t ) b +1 | (cid:1) (cid:16)P di =0 a i (cid:17) Putting everything together, there exist constants C t and C t such that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∆ ∞ X j =0 ( j ∆) e − j ∆ g ( j ∆) − Z ∞ x =0 g ( x ) xe − x dx (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:0) C t ( d ∆) e td ∆ + C t d ( d ∆) t +1 (cid:1) d X i =0 a i ! Since R ∞ x =0 g ( x ) xe − x dx = P di =0 a i , the result follows.29 Handling the Difference Between Distributions

In this section, we prove the following theorem:

Theorem 9.1.

For all d, d , n ∈ N such that (4 d + 2) ln ( d ) + 2 ln (20) ≤ d ≤ √ n , for allpolynomials g of degree at most d , taking ∆ = d n , ∆ n − d − X k =0 ( k ∆) (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) g ( k ∆) ≥ ∆2 ∞ X k =0 ( k ∆) e − k ∆ g ( k ∆) − Z ∞ x =0 g ( x ) xe − x dx Proof.

To prove this theorem, we prove the following two statements:1. ∆ P ⌈ n ⌉− k =0 (cid:0) k ∆2 (cid:1) ( n − k − d − )( n − d − ) g ( k ∆) ≥ ∆4 P ⌈ n ⌉− k =0 ( k ∆) e − k ∆ g ( k ∆) ∆ P ∞ k = ⌈ n ⌉ ( k ∆) e − k ∆ g ( k ∆) ≤ R ∞ x =0 g ( x ) xe − x dx Assuming these two statements, we have that ∆ n − d − X k =0 ( k ∆) (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) g ( k ∆) ≥ ∆ ⌈ n ⌉− X k =0 ( k ∆) (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) g ( k ∆) ≥ ∆2 ⌈ n ⌉− X k =0 ( k ∆) e − k ∆ g ( k ∆)= ∆2  ∞ X k =0 ( k ∆) e − k ∆ g ( k ∆) − ∞ X k = ⌈ n ⌉ ( k ∆) e − k ∆ g ( k ∆)  ≥ ∆2 ∞ X k =0 ( k ∆) e − k ∆ g ( k ∆) − Z ∞ x =0 g ( x ) xe − x dx We now prove these two statements. The ﬁrst statement follows immediately from the followinglemma:

Lemma 9.2.

For all k, d , n ∈ N such that d ≤ √ n and k ≤ n , taking ∆ = d n , (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ≥ e − k ∆ Proof.

Observe that (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) = d − Y j =1 n − j − k − n − j = d − Y j =1 e − kn · − kn e − kn · n − j − k − n − j − kn ! ≥ e − k ∆ − kn e − kn ! d − d − Y j =1 n − j − k − n − j − kn ! (cid:16) − kn e − kn (cid:17) d − and Q d − j =1 n − j − k − n − j − kn . Proposition 9.3.

For all k ∈ N such that k ≤ n , − kn e − kn ≥ Proof.

Observe that for all x ≥ , e − x ≤ − x + x . Taking x = kn , if k ≤ n then e − kn ≤ − kn + 2 k n ≤ − kn + k n ≤ − kn and thus − kn e − kn ≥ .To bound Q d − j =1 n − j − kn − j − k − n , we prove the following lemma. Lemma 9.4.

For all j, k, n ∈ N such that j ≤ n and k ≤ n , n − j − k − n − j − kn ≥ (cid:18) − n (cid:19) (cid:18) − jkn (cid:19) Proof.

Observe that − n − j − k − n − j − kn = ( n − k )( n − j ) − n ( n − j − k − n − k )( n − j ) = jk + n ( n − k )( n − j ) ≤ jk + n n = 2 jkn + 2 n − n ≤ jkn + 2 n − jkn Rearranging this, we have that n − j − k − n − j − kn ≥ − jkn − n + 4 jkn as needed.Combining this lemma with the following proposition, we have the following corollary. Proposition 9.5.

For all x ∈ [0 , and all k ∈ N , (1 − x ) k ≥ − kx Corollary 9.6.

For all d , k, n ∈ N such that d ≤ n and k ≤ n , d − Y j =1 n − j − k − n − j − kn ≥ (cid:18) − d n (cid:19) (cid:18) − d kn (cid:19) Since d ≤ √ n ≤ n and k ≤ n , combining Proposition 9.3 and Corollary 9.6 with the inequal-ity (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ≥ e − k ∆ − kn e − kn ! d − d − Y j =1 n − j − k − n − j − kn !

31e have that (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ≥ e − k ∆ (cid:18) − d n (cid:19) (cid:18) − d kn (cid:19) ≥ e − k ∆ which completes the proof of Lemma 9.2We now prove the second statement needed to prove Theorem 9.1. Lemma 9.7.

For all d, d , n ∈ N such that d ≥ dln ( d ) + 2 ln (10 n ) , for any polynomial g ofdegree at most d , taking ∆ = d n ∆ ∞ X k = ⌈ n ⌉ ( k ∆) e − k ∆ g ( k ∆) ≤ n (cid:18) (cid:19) d − (cid:18) nd (cid:19) d Z ∞ x =0 g ( x ) xe − x dx Proof.

To prove this, we upper bound | h k ( x ) | for large x . Lemma 9.8.

For all k ∈ N and all x ≥ , | h k ( x ) | ≤ (2 x ) k Proof.

Recall that h k ( x ) = 1 p k !( k + 1)! k X j =0 ( − k − j (cid:18) kj (cid:19) ( k + 1)!( j + 1)! x j Now observe that k X j =0 (cid:18) kj (cid:19) j + 1)! ≤ k X j =0 k j j !2 j = e k Thus, for all k ∈ N and all x ≥ , | h k ( x ) | ≤ √ k + 1 e k x k .If k ≥ then √ k + 1 e k ≤ k and we are done. For k ∈ [1 , we check the polynomialsdirectly.1. | h ( x ) | = (cid:12)(cid:12)(cid:12) √ ( x − (cid:12)(cid:12)(cid:12) ≤ max { x √ , √ } < x | h ( x ) | = (cid:12)(cid:12)(cid:12) √ ( x − x + 6) (cid:12)(cid:12)(cid:12) ≤ max { x √ , x √ } < x | h ( x ) | = (cid:12)(cid:12)(cid:12) √ ( x − x + 36 x − (cid:12)(cid:12)(cid:12) ≤ max { x √ , x √ } < x | h ( x ) | = (cid:12)(cid:12)(cid:12) √ ( x − x + 120 x − x + 120) (cid:12)(cid:12)(cid:12) ≤ max { x √ , x √ } < x Corollary 9.9.

For all d ∈ N , if g is a polynomial of degree at most d then for all y ≥ , g ( y ) ≤ y ) d Z ∞ x =0 g ( x ) xe − x dx roof. Writing g = P di =0 a i h i , we have that R ∞ x =0 g ( x ) xe − x dx = P di =0 a i and g ( y ) ≤ d X i =0 d X i ′ =0 a i a i ′ (2 y ) i + i ′ ≤ d X i =0 d X i ′ =0 (cid:18) d − i ′ +1 a i + 12 d − i +1 a i ′ (cid:19) (2 y ) d ≤ d X i =0 a i + d X i ′ =0 a i ′ ! (2 y ) d = 2(2 y ) d d X i =0 a i With this bound, we can now prove Lemma 9.7. By Corollary 9.9, g ( y ) ≤ y ) d Z ∞ x =0 g ( x ) xe − x dx Applying this with y = k ∆ , since d ≥ (4 d + 2) ln ( d ) + 2 ln (20) , ∆ ∞ X k = ⌈ n ⌉ ( k ∆) e − k ∆ g ( k ∆) ≤ ∆ ∞ X k = ⌈ n ⌉ (2 k ∆) e − k ∆ (2 k ∆) d Z ∞ x =0 g ( x ) xe − x dx ≤ Z ∞ x =( ⌈ n ⌉− (2 x ) d +1 e − x dx ! Z ∞ x =0 g ( x ) xe − x dx ≤ d +1 2 d +1 X j =0 (2 d + 1)!(2 d + 1 − j )! (cid:16)(cid:16)l n m − (cid:17) ∆ (cid:17) d +1 − j e − ( ⌈ n ⌉ − ) ∆ (cid:18)Z ∞ x =0 g ( x ) xe − x dx (cid:19) ≤ d +2 (cid:16)(cid:16)l n m − (cid:17) ∆ (cid:17) d +1 e − ( ⌈ n ⌉ − ) ∆ (cid:18)Z ∞ x =0 g ( x ) xe − x dx (cid:19) ≤ e ∆ d d +1 e − d (cid:18)Z ∞ x =0 g ( x ) xe − x dx (cid:19) ≤ e (2 d +1) ln ( d ) − d (cid:18)Z ∞ x =0 g ( x ) xe − x dx (cid:19) ≤ Z ∞ x =0 g ( x ) xe − x dx where the second inequality holds because (2 x ) d +1 e − x is a decreasing function whenever x ≥ d + 1 .

10 Putting Everything Together

In this section, we put everything together to prove our SOS lower bound.33

We ﬁrst combine Theorems 8.16 and 9.1 to lower bound our sum with an integral.

Theorem 10.1.

For all d, d , t, n ∈ N , taking ∆ = d n , if the following conditions hold:1. (4 d + 2) ln ( d ) + 2 ln (20) ≤ d ≤ √ n

2. Letting C t and C t be the constants given by Theorem 8.16, C t ( d ∆) e td ∆ + C t d ( d ∆) t +1 ≤ then for any polynomial g of degree at most d , ∆ n − d − X k =0 ( k ∆) (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) g ( k ∆) ≥ Z ∞ x =0 g ( x ) xe − x dx Proof.

By Theorem 9.1, for all d, d , n ∈ N such that dln ( d ) + 2 ln (10 n ) ≤ d ≤ √ n , for allpolynomials g of degree at most d , taking ∆ = d n , ∆ n − d − X k =0 ( k ∆) (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) g ( k ∆) ≥ ∆2 ∞ X k =0 ( k ∆) e − k ∆ g ( k ∆) − Z ∞ x =0 g ( x ) xe − x dx By Theorem 8.16, for all ∆ > , d ∈ N , and polynomials g of degree at most d , (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∆ ∞ X k =0 ( k ∆) e − k ∆ g ( j ∆) − Z ∞ x =0 g ( x ) xe − x dx (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:0) C t ( d ∆) e td ∆ + C t d ( d ∆) t +1 (cid:1) Z ∞ x =0 g ( x ) xe − x dx ≤ Z ∞ x =0 g ( x ) xe − x dx Thus, ∆ ∞ X k =0 ( k ∆) e − k ∆ g ( j ∆) ≥ Z ∞ x =0 g ( x ) xe − x dx Combining these statements, ∆ P n − d − k =0 ( k ∆) ( n − k − d − )( n − d − ) g ( k ∆) ≥ R ∞ x =0 g ( x ) xe − x dx , as needed. We now prove our SOS lower bound.

Theorem 10.2.

For all d, d , t, n ∈ N , taking ∆ = d n , if the following conditions hold: . (4 d + 2) ln ( d ) + 2 ln (20) ≤ d ≤ √ n d + 1) ∆ e d ∆ ≤ .3. Letting C t and C t be the constants given by Theorem 8.16, C t ( d ∆) e td ∆ + C t d ( d ∆) t +1 ≤ then there is no polynomial g of degree at most d such that ˜ E n [ g ] < .Proof. We recall the following results.1. By Theorem 6.1, since d ≤ d ≤ n , if there is a polynomial g of degree at most d suchthat ˜ E n [ g ] < then there is a polynomial g ∗ : R → R of degree at most d such that E Ω n,d [( u − g ∗ ( u ) ] < . Equivalently, n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) kg ∗ ( k + 1) < g ∗ (0) Taking g ( x ) = g ∗ (cid:0) x ∆ + 1 (cid:1) , ∆ n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ( k ∆) g ( k ∆) < ∆ g ( − ∆)

2. By Theorem 10.1, under the given conditions, ∆ n − d − X k =0 ( k ∆) (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) g ( k ∆) ≥ Z ∞ x =0 g ( x ) xe − x dx

3. By Theorem 7.8, since d + 1) ∆ e d ∆ ≤ , for any polynomial g of degree at most d , Z ∞ x =0 g ( x ) xe − x dx ≥ g ( − ∆) Putting everything together, if there is a polynomial g of degree at most d such that ˜ E n [ g ] < then there is a polynomial g : R → R of degree at most d such that

32 ∆ g ( − ∆) ≤ ∆ n − d − X k =0 (cid:0) n − k − d − (cid:1)(cid:0) n − d − (cid:1) ( k ∆) g ( k ∆) ≤ Z ∞ x =0 g ( x ) xe − x dx < ∆ g ( − ∆) which is impossible. Thus, there is no polynomial g of degree at most d such that ˜ E n [ g ] < . Corollary 10.3.

For all ǫ > , there exists a constant C ǫ such that for all n ∈ N , degree C ǫ n − ǫ sum of squares cannot prove the ordering principle on n elements. In this paper, we analyzed the performance of SOS for proving the ordering principle, showing thatSOS requires degree roughly √ n to prove the ordering principle on n elements. This shows thatin terms of degree, SOS is more powerful than resolution, polynomial caluclus, and the Sherali-Adams hierarchy, but SOS still requires high degree to prove the ordering principle. While thismostly resolves the question of how powerful SOS is for proving the ordering principle, there areseveral open questions remaining including the following:1. Can we ﬁnd a tight example for the size/degree trade-off for SOS which was recently shownby Atserias and Hakoniemi [1]?2. Can we prove SOS lower bounds for the graph ordering principle on expanders? References ∼ lauria/sos14/ 2014[12] A. Potechin. CMSC 39600 1 (Autumn 2018) Topics in Theoretical Computer Science: TheSum of Squares Hierarchy. https://canvas.uchicago.edu/courses/17604 20183613] A. Potechin. Sum of squares lower bounds from symmetry and a good story. ITCS 2019[14] A. Razborov. Guest column: Proof Complexity and Beyond. ACM SIGACT News, Volume47, Issue 2, p.66-86. 2016[15] A. Raymond, J. Saunderson, M. Singh, and R. Thomas. Symmetric sums of squares overk-subset hypercubes, Math. Program. Volume 167, No. 2, p. 315-354. 2018[16] G. Stalmark. Short resolution proofs for a sequence of tricky formulas. Acta Inform. 33,277280. 1996[17] G. Stengle. A Nullstellensatz and a Positivstellensatz in Semialgebraic Geometry. Mathema-tische Annalen. 207 (2): 8797. doi:10.1007/BF01362149. 1974[18] Wikipedia. Chebyshev Polynomials. https://en.wikipedia.org/wiki/Chebyshev polynomials.Accessed February 17, 2019. A Analyzing the Ordering Principle with Boolean Variables

In this appendix, we describe how to modify the ordering principle equations so that they onlyhave Boolean variables. We then describe how to modify the pseudo-expectation values and theSOS lower bound proof for these equations.

A.1 Equations for the ordering principle with Boolean auxiliary variables

To encode the negation of the ordering principle using only Boolean variables, we simply replaceeach z j with a sum of squares of Boolean auxiliary variables. This gives us the following equationsfor the negation of the ordering principle:1. We have variables x ij where we want that x ij = 1 if a i < a j and x ij = 0 if a i > a j . We alsohave auxiliary variables { z jk : j ∈ [ n ] , k ∈ [ m ] } where m ≥ n − .2. ∀ i = j, x ij = x ij and ∀ j ∈ [ n ] , ∀ k ∈ [ m ] , z jk = z jk (variables are Boolean)3. ∀ i = j, x ij = 1 − x ji (ordering)4. For all distinct i, j, k , x ij x jk (1 − x ik ) = 0 (transitivity)5. ∀ j, P i = j x ij = 1+ P mk =1 z jk (for all j ∈ [ n ] , a j is not the minimum element of { a , . . . , a n } ) A.2 Pseudo-expectation values with Boolean auxiliary variables

In order to give pseudo-expectation values for these equations, we need to give pseudo-expectationvalues for polynomials involving the auxiliary variables. The idea for this is as follows. Letting w j = (cid:16)P i = j x ij (cid:17) − , we want that w j of the auxiliary variables { z jk : k ∈ [ m ] } are . If w j ∈ [0 , m ] ∩ Z , if we choose which of these auxiliary variables are at random,37. P r ( z j = 1) = w j m ,2. P r ( z j = 1 , z j = 1) = w j ( w j − m ( m −

3. More generally, for all K ⊆ [ m ] , P r ( ∀ k ∈ K, z jk = 1) = Q | K |− a =0 ( w j − a ) Q | K |− a =0 ( m − a ) Note that these expressions are still deﬁned for other w j including w j = − (though in this casethey aren’t actual probabilities over a distribution of solutions). Based on this, we have the follow-ing candidate pseudo-expectation values: Deﬁnition A.1 (Candidate pseudo-expectation values with Boolean auxiliary varialbes) .

1. For all polynomials p ( { x ij : i, j ∈ [ n ] , i = j } ) , we take ˜ E n [ p ] = E U n [ p ]

2. For all j ∈ [ n ] , for all K ⊆ [ m ] and all polynomials p which do not contain any of theauxiliary variables { z jk : k ∈ [ m ] } , we take ˜ E " Y k ∈ K z jk ! p = ˜ E h(cid:16)Q | K |− a =0 ( w j − a ) (cid:17) p iQ | K |− a =0 ( m − a ) A.3 Reducing to one variable with Boolean auxiliary variables

Unfortunately, our lower bound for the ordering principle equations in Deﬁnition 2.6 does notdirectly imply a lower bound for the ordering principle equations with Boolean auxiliary variables.That said, we can still reduce the problem to one variable by using the same techniques we used toprove Theorem 6.1. The resulting theorem is similar but not quite the same as Theorem 6.1.

Theorem A.2.

For all d, d , n, m ∈ N such that d ≤ d ≤ n and m ≥ nd , if there is apolynomial g of degree at most d such that ˜ E n [ g ] < then there is a polynomial g ∗ : R → R ofdegree at most d and a j ∈ [ d ] such that n − d X u =1 (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) Q ja =1 ( u − a ) j ! ! g ∗ ( u ) < . g ∗ (0) Proof sketch.

Having Boolean auxiliary variables affects each part of the proof of Theorem 6.1 asfollows:1. Since the equations and pseudo-expectation values are still symmetric under permutations of [2 n ] , the argument in Section 6.1 that we can reduce to the case when g is symmetric underpermutations of [2 n ] \ I for some subset I ⊆ [2 n ] where | I | ≤ d still applies.2. In Section 6.2, we decomposed ˜ E n [ g ] as ˜ E n [ g ] = X A ⊆ [2 n ] ˜ E n " Y j ∈ A z j ! g A Here we can do a similar decomposition but it is somewhat more complicated.38 eﬁnition A.3.

Given a j ∈ [ n ] and a nonempty K ⊆ [ m ] , deﬁne y jK = Y k ∈ K z jk − Q | K |− a =0 ( w j − a ) Q | K |− a =0 ( m − a ) Proposition A.4.

For any j ∈ [2 n ] , any nonempty K ⊆ [ m ] , and any polynomial p whichdoes not depend on the auxiliary variables { z jk : k ∈ [ m ] } , ˜ E n [ y jK p ] = 0 With this proposition in mind, we decompose g as g = P A ⊆ [2 n ] g A where g A = X { K j : j ∈ A } Y j ∈ A y jK j ! p { K j : j ∈ A } ( where each K j is a nonempty subset of [ m ]) where each K j is a nonempty subset of [ m ] . With this decomposition, we have that ˜ E n [ g ] = X A ⊆ [2 n ] ˜ E n [ g A ] Note that unlike before, here we have the auxiliary variables be part of g A . That said, thisallows us to assume that there are no auxiliary variables z jk where j / ∈ A and that everythingis symmetric under permutations of [2 n ] \ I ′ where | I ′ | ≤ d .3. In Section 6.3, we restricted ourselves to a single ordering for the distinguished indices andexpressed everything in terms of the new variables u , . . . , u d . We can still do this withBoolean auxiliary variables, but this no longer removes all of the auxiliary variables. Whatwe get is a polynomial g { } ( u , . . . , u d , { z j ′ k : j ′ ∈ [ d ] } ) of degree at most d such that ˜ E n " d − Y i =1 x i ( i +1) ! g { } = 1 d ! E u ,...,u d ∈ N ∪{ } : P d j =0 u j =2 n − d h ˜ E ′ u ,...,u d [ g { } ( u , . . . , u d , { z j ′ k : j ′ ∈ [ d ] } )] i < where ˜ E ′ u ,...,u d gives the pseudo-expectation values of the auxiliary variables for givenvalues of u , . . . , u d .4. In Section 6.4, we took g ∗ ( u ) = E u ,...,u d ∈ N ∪{ } : P d j =1 u j =2 n − d − u [ g { } ( u , . . . , u d ) ] Before we can do this here, we need to remove the auxiliary variables { z k : k ∈ [ m ] } . Wecan do this as follows:(a) Observe that looking at the auxiliary variables { z k : k ∈ [ m ] } , ˜ E ′ u ,...,u d (and thus ˜ E n ) is symmetric under permutations of [ m ] . Using Theorem 6.4, we can assume that g { } is symmetric (as far as the auxiliary variables { z k : k ∈ [ m ] } are concerned)under permutations of [ m ] \ K for some K ⊆ [ m ] where | K | ≤ d .39b) Breaking things into cases based on the values of the auxiliary variables { z k : k ∈ K } ,we can assume that g { } ( u , . . . , u d , { z j ′ k : j ′ ∈ [ d ] } ) = Y k ∈ K z k ! Y k ∈ K (1 − z k ) ! p { } ( u , . . . , u d , { z j ′ k : j ′ ∈ [2 , d ] } ) for some K , K ⊆ [ m ] such that K ∩ K = ∅ and | K ∪ K | ≤ d .We now take g ∗ = E u ,...,u d ∈ N ∪{ } : P d j =1 u j =2 n − d − u h ˜ E ′ u ,...,u d [ p { } ( u , . . . , u d , { z j ′ k : j ′ ∈ [2 , d ] } )] i and we have that(a) g ∗ ( u ) is a polynomial of degree at most d in u .(b) For all u ∈ [0 , n − d ] ∩ Z , g ∗ ( u ) ≥ .(c) E Ω n,d h(cid:16)Q | K | a =1 ( u − a ) (cid:17) (cid:16)Q | K | a =1 ( m + 2 − u − a ) (cid:17) g ∗ ( u ) i < Equivalently, taking j = | K | and j = | K | , n − d X u =0 (cid:0) n − u − d − (cid:1)(cid:0) nd (cid:1) j Y a =1 ( u − a ) ! j Y a =1 ( m + 2 − u − a ) ! g ∗ ( u ) < Manipulating this gives n − d X u =1 (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) Q ja =1 ( u − a ) j ! ! j Y a =1 m + 2 − u − am + 2 − a ! (cid:18) g ∗ ( u ) g ∗ (0) (cid:19) < which implies that n − d X u =1 (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) ! Q ja =1 ( u − a ) j ! ! j Y a =1 m + 2 − u − am + 2 − a ! (cid:18) g ∗ ( u ) g ∗ (0) (cid:19) < We now make the following observations:(a) By Lemma 6.9, for all u ∈ N such that u ≤ n − d , (cid:18) ( n − u − d − )( n − d − ) (cid:19) ≥ ( n − u − d − )( n − d − ) .(b) For all u ∈ N , (cid:16) Q ja =1 ( u − a ) j ! (cid:17) ≥ Q ja =1 ( u − a ) j ! .(c) Since m ≥ nd , for all u ∈ N such that u ≤ n − d , j Y a =1 m + 2 − u − am + 2 − a ! ≥ (cid:18) nd − n nd (cid:19) d ≥ n − d X u =1 (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) Q ja =1 ( u − a ) j ! ! g ∗ ( u ) < . g ∗ (0) as needed. A.4 SOS lower bound with Boolean auxiliary variables

When we have Boolean auxiliary variables, our SOS lower bound is modiﬁed as follows:

Theorem A.5.

For all d, d , t, n, m ∈ N such that m ≥ nd , if the following conditions hold forall j ∈ [ d ] (4 d + 2) ln ( d ) + 2 ln (20) ≤ d ≤ √ n ′ ( n − n ′ − d )!( n ′ − n − d )! ( j − j ) ∆ ( d + 1) e d (2 j − ≤

3. Letting C t and C t be the constants given by Theorem 8.16, C t ( d ∆) e td ∆ + C t d ( d ∆) t +1 ≤ where n ′ = n − d + 2 and ∆ = d n ′ then there is no polynomial g of degree at most d such that ˜ E n [ g ] < . Remark A.6.

We believe the condition on m is an artefact of the proof and that we should haveessentially the same lower bound as long as m ≥ n − , though proving this would requiremodifying the analysis further.Proof. Assume there is a polynomial g of degree at most d such that ˜ E n [ g ] < . By TheoremA.2, since d ≤ d ≤ n , there is a polynomial g ∗ : R → R of degree at most d and a j ∈ [ d ] suchthat n − d X u =1 (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) Q ja =1 ( u − a ) j ! ! g ∗ ( u ) < . g ∗ (0) We transform this left side of this equation into the same form as the left hand side of Theorem10.1 using the following lemma.

Lemma A.7.

For all j, u ∈ N , j ! (cid:16)Q ja =1 ( u − a ) (cid:17) ≥ (cid:0) j − j (cid:1) ( u − j + 1) Proof.

We prove this lemma by induction. When u ≤ j − , the result is trivial. When u = 2 j , j ! j Y a =1 ( u − a ) ! = (cid:18) j − j (cid:19) = (cid:18) j − j (cid:19) ( u − j + 1) u = k where k ≥ j and consider the case when u = k + 1 . Bythe inductive hypothesis, j ! j Y a =1 (( k + 1) − a ) ! = kk − j j ! j Y a =1 ( k − a ) ! ≥ kk − j (cid:18)(cid:18) j − j − (cid:19) ( k − j + 1) (cid:19) ≥ k − j + 2 k − j + 1 (cid:18)(cid:18) j − j − (cid:19) ( k − j + 1) (cid:19) = (cid:18) j − j (cid:19) ( k − j + 2) Applying Lemma A.7, we have that . g ∗ (0) > n − d X u =1 (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) Q ja =1 ( u − a ) j ! ! g ∗ ( u ) ≥ n − d X u =2 j − (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) Q ja =1 ( u − a ) j ! ! g ∗ ( u ) ≥ n − d X u =2 j − (cid:0) n − u − d − (cid:1)(cid:0) n − d − (cid:1) (cid:18) j − j (cid:19) ( u − j + 1) g ∗ ( u ) Taking k = u − j + 1 , n ′ = n − j + 2 , ∆ = d n ′ , and g ( x ) = g ∗ (cid:0) x ∆ + 2 j − (cid:1) , . n − n ′ − d )!( n ′ − n − d )! (cid:0) j − j (cid:1) ∆ g ( − (2 j − > ∆ n ′ − d − X k =0 (cid:0) n ′ − k − d − (cid:1)(cid:0) n ′ − d − (cid:1) ( k ∆) g ( k ∆) By Theorem 10.1, under the given conditions, ∆ n ′ − d − X k =0 (cid:0) n ′ − k − d − (cid:1)(cid:0) n ′ − d − (cid:1) ( k ∆) g ( k ∆) ≥ Z ∞ x =0 g ( x ) xe − x dx Thus, Z ∞ x =0 g ( x ) xe − x dx < . n − n ′ − d )!( n ′ − n − d )! (cid:0) j − j (cid:1) ∆ g ( − (2 j − Decomposing g as g = P di =0 c i h i , observe that1. R ∞ x =0 g ( x ) xe − x dx = (cid:16)P di =0 c i (cid:17) .2. By Cauchy-Schwarz, g ( − (2 j − d X i =0 c i h i ( − (2 j − ! ≤ d X i =0 c i ! d X i =0 h i ( − (2 j − ! By Lemma 7.9, for all i ∈ N and all x ∈ R , | h i ( x ) | ≤ √ i + 1 e i | x | . Thus, g ( − (2 j − ≤ ( d + 1) e d (2 j − < . n − n ′ − d )!( n ′ − n − d )! (cid:0) j − j (cid:1) ∆ ( d + 1) e d (2 j − However, ( n − n ′ − d )!( n ′ − n − d )! ( j − j ) ∆ ( d + 1) e d (2 j − ≤ so this gives = . < .12