The Polynomial Method Strikes Back: Tight Quantum Query Bounds via Dual Polynomials

Abstract

The approximate degree of a Boolean function f is the least degree of a real polynomial that approximates f pointwise to error at most 1/3. Approximate degree is known to be a lower bound on quantum query complexity. We resolve or nearly resolve the approximate degree and quantum query complexities of the following basic functions:
• k-distinctness: For any constant k, the approximate degree and quantum query complexity of k-distinctness is $\Omega(n^{3/4-1/(2k)})$. This is nearly tight for large k (Belovs, FOCS 2012).
• Image size testing: The approximate degree and quantum query complexity of testing the size of the image of a function [n] → [n] is $\tilde{\Omega}(n^{1/2})$. This proves a conjecture of Ambainis et al. (SODA 2016), and it implies the following lower bounds:
  – k-junta testing: A tight $\tilde{\Omega}(k^{1/2})$ lower bound, answering the main open question of Ambainis et al. (SODA 2016).
  – Statistical Distance from Uniform: A tight $\tilde{\Omega}(n^{1/2})$ lower bound, answering the main question left open by Bravyi et al. (STACS 2010 and IEEE Trans. Inf. Theory 2011).
  – Shannon entropy: A tight $\tilde{\Omega}(n^{1/2})$ lower bound, answering a question of Li and Wu (2017).
• Surjectivity: The approximate degree of the Surjectivity function is $\tilde{\Omega}(n^{3/4})$. The best prior lower bound was $\Omega(n^{2/3})$. Our result matches an upper bound of $\tilde{O}(n^{3/4})$ due to Sherstov, which we reprove using different techniques. The quantum query complexity of this function is known to be $\Theta(n)$ (Beame and Machmouchi, QIC 2012 and Sherstov, FOCS 2015).
Our upper bound for Surjectivity introduces new techniques for approximating Boolean functions by low-degree polynomials. Our lower bounds are proved by significantly refining techniques recently introduced by Bun and Thaler (FOCS 2017).


The Polynomial Method Strikes Back: Tight Quantum Query Bounds via Dual Polynomials

Mark Bun, Boston University

Robin Kothari, Microsoft Research

Justin Thaler, Georgetown University

Abstract

The approximate degree of a Boolean function f is the least degree of a real polynomial that approximates f pointwise to error at most 1/3. The approximate degree of f is known to be a lower bound on the quantum query complexity of f (Beals et al., FOCS 1998 and J. ACM 2001).

We resolve or nearly resolve the approximate degree and quantum query complexities of several basic functions. Specifically, we show the following:
• k-distinctness: For any constant k, the approximate degree and quantum query complexity of the k-distinctness function is $\Omega(n^{3/4-1/(2k)})$. This is nearly tight for large k, as Belovs (FOCS 2012) has shown that for any constant k, the approximate degree and quantum query complexity of k-distinctness is $O(n^{3/4-1/(2^{k+2}-4)})$.
• Image Size Testing: The approximate degree and quantum query complexity of testing the size of the image of a function [n] → [n] is $\tilde{\Omega}(n^{1/2})$. This proves a conjecture of Ambainis et al. (SODA 2016), and it implies tight lower bounds on the approximate degree and quantum query complexity of the following natural problems.
  – k-junta testing: A tight $\tilde{\Omega}(k^{1/2})$ lower bound for k-junta testing, answering the main open question of Ambainis et al. (SODA 2016).
  – Statistical Distance from Uniform: A tight $\tilde{\Omega}(n^{1/2})$ lower bound for approximating the statistical distance from uniform of a distribution, answering the main question left open by Bravyi et al. (STACS 2010 and IEEE Trans. Inf. Theory 2011).
  – Shannon entropy: A tight $\tilde{\Omega}(n^{1/2})$ lower bound for approximating Shannon entropy up to a certain additive constant, answering a question of Li and Wu (2017).
• Surjectivity: The approximate degree of the Surjectivity function is $\tilde{\Omega}(n^{3/4})$. The best prior lower bound was $\Omega(n^{2/3})$. Our result matches an upper bound of $\tilde{O}(n^{3/4})$ due to Sherstov (STOC 2018), which we reprove using different techniques. The quantum query complexity of this function is known to be $\Theta(n)$ (Beame and Machmouchi, Quantum Inf. Comput. 2012 and Sherstov, FOCS 2015).

Our upper bound for Surjectivity introduces new techniques for approximating Boolean functions by low-degree polynomials. Our lower bounds are proved by significantly refining techniques recently introduced by Bun and Thaler (FOCS 2017).

1 Introduction

Approximate degree.

The approximate degree of a Boolean function $f \colon \{-1,1\}^n \to \{-1,1\}$, denoted $\widetilde{\deg}(f)$, is the least degree of a real polynomial p such that $|p(x) - f(x)| \le 1/3$ for all $x \in \{-1,1\}^n$. Approximate degree is a basic measure of the complexity of a Boolean function, and has diverse applications throughout theoretical computer science.

Upper bounds on approximate degree are at the heart of the most powerful known learning algorithms in a number of models [KS04, KS06, KKMS08, STT12, ACR+10, KT14, OS10], algorithmic approximations for the inclusion-exclusion principle [KLS96, She09a], and algorithms for differentially private data release [TUV12, CTUW14]. A recent line of work [Tal14, Tal17] has used approximate degree upper bounds to show new lower bounds on the formula and graph complexity of explicit functions.

Lower bounds on approximate degree have enabled progress in several areas of complexity theory, including communication complexity [She11, BVdW07, She12, GS10, She13b, RY15, DPV09, CA08, DP08, She08], circuit complexity [MP69, She09b], oracle separations [Bei94, BCH+…], and quantum query complexity [BBC+01, Aar12, AS04].

In spite of the importance of approximate degree, major gaps remain in our understanding. In particular, the approximate degrees of many basic functions are still unknown. Our goal in this paper is to resolve the approximate degrees of many natural functions which had previously withstood characterization.
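As a small illustration of the definition of pointwise approximation, the sketch below measures the worst-case error of a candidate polynomial against a Boolean function by brute force. The helper `max_error` and the two candidate polynomials are hypothetical names chosen for this example, and the convention that −1 encodes TRUE is assumed. The code only evaluates the two given polynomials; it does not by itself establish an approximate degree.

```python
from itertools import product

def max_error(p, f, n):
    """Largest pointwise gap |p(x) - f(x)| over all x in {-1, 1}^n."""
    return max(abs(p(x) - f(x)) for x in product((-1, 1), repeat=n))

# AND on two bits, reading -1 as TRUE: the output is -1 only if both inputs are -1.
def and2(x):
    return -1 if x == (-1, -1) else 1

# Exact degree-2 multilinear representation of and2.
def p_exact(x):
    return (1 + x[0] + x[1] - x[0] * x[1]) / 2

# A degree-1 candidate (coefficients chosen by hand for this example).
def p_linear(x):
    return 0.5 + 0.5 * (x[0] + x[1])

err_exact = max_error(p_exact, and2, 2)    # the degree-2 polynomial is exact
err_linear = max_error(p_linear, and2, 2)  # this linear candidate misses the 1/3 target
```

Here the degree-2 polynomial has error 0, while this particular degree-1 candidate has error 1/2, which exceeds the 1/3 threshold in the definition.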

Quantum query complexity.

While resolving the approximate degree of basic functions of interest is a test of our understanding of approximate degree, it is also motivated by the study of quantum algorithms. In the quantum query model, a quantum algorithm is given query access to the bits of an input x, and the goal is to compute some function f of x while minimizing the number of queried bits. Quantum query complexity captures much of the power of quantum computing, and most quantum algorithms were discovered in or can easily be described in the query setting.

Approximate degree was one of the first general lower bound techniques for quantum query complexity. In 1998, Beals et al. [BBC+01] observed that the bounded-error quantum query complexity of a function f is lower bounded by (one half times) the approximate degree of f. Since polynomials are sometimes easier to understand than quantum algorithms, this observation led to a number of new lower bounds on quantum query complexity. This method of proving quantum query lower bounds is called the polynomial method.

After several significant quantum query lower bounds were proved via the polynomial method (including the work of Aaronson and Shi [AS04], who proved optimal lower bounds for the Collision and Element Distinctness problems), the polynomial method took a back seat. Since then, the positive-weights adversary method [Amb02, BSS03, LM04, Zha05] and the newer negative-weights adversary method [HLŠ07, Rei11, LMR+11] have become the tools of choice for proving quantum query lower bounds (with some notable exceptions, such as Zhandry’s recent tight lower bound for the set equality problem [Zha15]). This leads us to our second goal for this work.

In this work, we seek to resolve several open problems in quantum query complexity using approximate degree as the lower bound technique. A distinct advantage of proving quantum query lower bounds with the polynomial method is that any such bound can be “lifted” via Sherstov’s pattern matrix method [She11] to a quantum communication lower bound (even with unlimited shared entanglement [LS09a]); such a result is not known for any other quantum query lower bound technique. More generally, using approximate degree as a lower bound technique for quantum query complexity has other advantages, such as the ability to show lower bounds for zero-error and small-error quantum algorithms [BCdWZ99], unbounded-error quantum algorithms [BBC+…], …

| Problem | Best Prior Upper Bound | Our Lower Bound | Best Prior Lower Bound |
|---|---|---|---|
| k-distinctness | $O(n^{3/4-1/(2^{k+2}-4)})$ [Bel12a] | $\tilde{\Omega}(n^{3/4-1/(2k)})$ | $\tilde{\Omega}(n^{2/3})$ [AS04] |
| Image Size Testing | $O(\sqrt{n} \log n)$ [ABRdW16] | $\tilde{\Omega}(\sqrt{n})$ | $\tilde{\Omega}(n^{1/3})$ [ABRdW16] |
| k-junta Testing | $O(\sqrt{k} \log k)$ [ABRdW16] | $\tilde{\Omega}(\sqrt{k})$ | $\tilde{\Omega}(k^{1/3})$ [ABRdW16] |
| SDU | $O(\sqrt{n})$ [BHH11] | $\tilde{\Omega}(\sqrt{n})$ | $\tilde{\Omega}(n^{1/3})$ [BHH11, AS04] |
| Shannon Entropy | $\tilde{O}(\sqrt{n})$ [BHH11, LW18] | $\tilde{\Omega}(\sqrt{n})$ | $\tilde{\Omega}(n^{1/3})$ [LW18] |

Table 1: Our lower bounds on quantum query complexity and approximate degree vs. prior work.

Quantum query complexity and approximate degree.

In this work we illustrate the power of the polynomial method by proving optimal or nearly optimal bounds on several functions studied in the quantum computing community. These results are summarized in Table 1, and definitions of the problems considered can be found in Section 1.1. Since the upper bounds for these functions were shown using quantum algorithms, our results resolve both the quantum query complexity and approximate degree of these functions.

For most of the functions studied in this paper, the positive-weights adversary bound provably cannot show optimal lower bounds due to the certificate complexity barrier [Zha05, ŠS06] and the property testing barrier [HLŠ07]. While these barriers do not apply to the negative-weights variant (which is actually capable of proving tight quantum query lower bounds for all functions [Rei11, LMR+… n). On the other hand, Sherstov recently showed that Surjectivity has approximate degree $\tilde{O}(n^{3/4})$ [She18]. Surjectivity is the only known example of a “natural” function separating approximate degree from quantum query complexity; prior examples of such functions [Amb03, ABK16] were contrived, and (unlike Surjectivity) specifically constructed to separate the two measures.

Our final result gives a full characterization of the approximate degree of Surjectivity. We prove a new lower bound of $\tilde{\Omega}(n^{3/4})$, which matches Sherstov’s upper bound up to logarithmic factors. We also give a new construction of an approximating polynomial of degree $\tilde{O}(n^{3/4})$, using very different techniques than [She18]. We believe that our proof of this $\tilde{O}(n^{3/4})$ upper bound is of independent interest. In particular, our lower bound proof for Surjectivity is specifically tailored to showing optimality of our upper bound construction, in a sense that can be made formal via complementary slackness. We are optimistic that our approximation techniques will be useful for showing additional tight approximate degree bounds in the future.

| Problem | Prior Upper Bound | Our Upper Bound | Our Lower Bound | Prior Lower Bound |
|---|---|---|---|---|
| Surjectivity | $\tilde{O}(n^{3/4})$ [She18] | $\tilde{O}(n^{3/4})$ | $\tilde{\Omega}(n^{3/4})$ | $\tilde{\Omega}(n^{2/3})$ [AS04] |

Table 2: Our bounds on the approximate degree of Surjectivity vs. prior work.

We now describe our results and prior work on these functions in more detail.

We now informally describe the functions studied in this paper. These functions are formally defined in Section 2.4.

Let R be a power of two and N ≥ R, and let n = N · log R. Most of the functions that we consider interpret their inputs in $\{-1,1\}^n$ as a list of N numbers from a range [R], and determine whether this list satisfies various natural properties. We let the frequency $f_i$ of range item i ∈ [R] denote the number of times i appears in the input list.

In this paper we study the following functions in which the input is N numbers from a range [R]:
• Surjectivity (SURJ): Do all range items appear at least once?
• k-distinctness: Is there a range item that appears k or more times?
• Image Size Testing: Decide if all range items appear at least once or if at most γ · R range items appear at least once, under the promise that one of these is true.
• Statistical distance from uniform (SDU): Interpret the input as a probability distribution p, where $p_i = f_i/N$. Compute the statistical distance of p from the uniform distribution over [R] up to some small additive error ε.
• Shannon entropy: Interpret the input as a probability distribution p, where $p_i = f_i/N$. Compute the Shannon entropy $\sum_{i \in [R]} p_i \cdot \log(1/p_i)$ of p up to additive error ε.

An additional function we consider that does not fit neatly into the framework above is k-junta testing.
• k-junta testing: Given an input in $\{-1,1\}^n$ representing the truth table of a function $\{-1,1\}^{\log n} \to \{-1,1\}$, determine whether this function depends on at most k of its input bits, or is at least ε-far from any such function.

We resolve or nearly resolve the quantum query complexity and/or approximate degree of all of the functions above. Our lower bounds for SURJ, k-distinctness, Image Size Testing, SDU, and entropy approximation all require N to be “sufficiently larger” than R, by a certain constant factor. For simplicity, throughout this introduction we do not make this requirement explicit, and for this reason we label the theorems in this introduction informal.

Surjectivity.

In the Surjectivity problem we are given N numbers from [R] and must decide if every range item appears at least once in the input.

The quantum query complexity of this problem was studied by Beame and Machmouchi [BM12], who proved a lower bound of $\tilde{\Omega}(n)$, which was later improved by Sherstov to the optimal $\Theta(n)$ [She15]. Beame and Machmouchi [BM12] explicitly leave open the question of characterizing the approximate degree of Surjectivity. Recently, Sherstov [She18] showed an upper bound of $\tilde{O}(n^{3/4})$ on the approximate degree of this function. The best prior lower bound was $\tilde{\Omega}(n^{2/3})$ [AS04, BT17].

We give a completely different construction of an approximating polynomial for Surjectivity with degree $\tilde{O}(n^{3/4})$. We also prove a matching lower bound, which shows that the approximate degree of the Surjectivity function is $\tilde{\Theta}(n^{3/4})$.
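The list-based problems described informally in this section can be sketched as plain classical reference functions over the frequency vector. This is only an illustration of the definitions, not of any query algorithm. All function names are hypothetical, range items are assumed to be the integers 1..R, statistical distance is assumed to mean total variation distance (half the ℓ1 distance), and the entropy logarithm is assumed to be base 2.

```python
import math
from collections import Counter

def frequencies(items, R):
    """Frequency f_i of each range item i in 1..R within the input list."""
    counts = Counter(items)
    return [counts.get(i, 0) for i in range(1, R + 1)]

def surj(items, R):
    """SURJ: does every range item appear at least once?"""
    return all(f >= 1 for f in frequencies(items, R))

def k_distinctness(items, R, k):
    """Is there a range item that appears k or more times?"""
    return any(f >= k for f in frequencies(items, R))

def image_size_test(items, R, gamma):
    """True if the image is all of [R], False if it has at most gamma*R items,
    None if the input violates the promise."""
    image = sum(1 for f in frequencies(items, R) if f >= 1)
    if image == R:
        return True
    if image <= gamma * R:
        return False
    return None

def sdu(items, R):
    """Total variation distance between p_i = f_i/N and uniform over [R] (assumed convention)."""
    N = len(items)
    return 0.5 * sum(abs(f / N - 1 / R) for f in frequencies(items, R))

def shannon_entropy(items, R):
    """Sum of p_i * log2(1/p_i) over items with p_i > 0 (base 2 is an assumption)."""
    N = len(items)
    return sum((f / N) * math.log2(N / f) for f in frequencies(items, R) if f > 0)
```

For instance, on the list `[1, 2, 2, 4]` with R = 4 the frequency vector is `[1, 2, 0, 1]`, so the list is not surjective but does contain a range item of frequency 2.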
Theorem 1 (Informal). The approximate degree of SURJ is $\tilde{\Theta}(n^{3/4})$.

k-distinctness.

In this problem, we are given N numbers in [R] and must decide if any range item appears at least k times in the list (i.e., is there an i ∈ [R] with $f_i \ge k$?). This generalizes the well-studied Element Distinctness problem, which is the same as 2-distinctness.

Ambainis [Amb07] first used quantum walks to give an $O(n^{k/(k+1)})$ upper bound on the quantum query complexity of any problem with certificates of size k, including k-distinctness and k-sum. Later, Belovs introduced a beautiful new framework for designing quantum algorithms [Bel12b] and used it to improve the upper bound for k-distinctness to $O(n^{3/4-1/(2^{k+2}-4)})$ [Bel12a]. Several subsequent works have used Belovs’ k-distinctness algorithm as a black-box subroutine for solving more complicated problems (e.g., [LW18, Mon16]).

As for lower bounds, Aaronson and Shi [AS04] established an $\tilde{\Omega}(n^{2/3})$ lower bound on the approximate degree of k-distinctness for any k ≥ 2. Belovs and Špalek used the adversary method to prove a lower bound of $\Omega(n^{k/(k+1)})$ on the quantum query complexity of k-sum, showing that Ambainis’ algorithm is tight for k-sum. They asked whether their techniques can prove an $\omega(n^{2/3})$ quantum query lower bound for k-distinctness. We achieve this goal, but using the polynomial method instead of the adversary method. Our main result is the following:

Theorem 2 (Informal). For any k ≥ 2, the approximate degree and quantum query complexity of k-distinctness is $\tilde{\Omega}(n^{3/4-1/(2k)})$.

This is nearly tight for large k, as it approaches Belovs’ upper bound of $O(n^{3/4-1/(2^{k+2}-4)})$. Note that both bounds approach $\Theta(n^{3/4})$ as k → ∞. It remains an intriguing open question to close the gap between $n^{3/4-1/(2^{k+2}-4)}$ and $n^{3/4-1/(2k)}$, especially for small values of k.

Our k-distinctness lower bound also implies an $\tilde{\Omega}(n^{3/4-1/(2k)})$ lower bound on the quantum query complexity of approximating the maximum frequency, $F_\infty$, of any element up to relative error less than 1/k [Mon16], improving over the previous best bound of $\tilde{\Omega}(n^{2/3})$.

Image Size Testing.

In this problem, we are given N numbers in [R] and 0 < γ < 1, and must decide if every range item appears at least once or if at most γ · R range items appear at least once. We show for any γ > 0, the problem has approximate degree and quantum query complexity $\tilde{\Omega}(\sqrt{n})$. This holds as long as N = c · R for a certain constant c.

Theorem 3 (Informal). The approximate degree and quantum query complexity of Image Size Testing is $\tilde{\Omega}(\sqrt{n})$.

This lower bound is tight, matching a quantum algorithm of Ambainis, Belovs, Regev, and de Wolf [ABRdW16], and resolves a conjecture from their work. The previous best lower bound was $\Omega(n^{1/3})$ [ABRdW16] obtained via reduction to the Collision lower bound [AS04]. The classical query complexity of this problem is $\Theta(n/\log n)$ [VV11].

… $O(\sqrt{n})$ queries: pick a random range item, and Grover search for an instance of that range item. The fact that our lower bound holds even for this special case of the problem considered in prior works obviously only makes our lower bound stronger.

This lower bound also serves as a starting point to establish the next three lower bounds.

Footnote: In the k-sum problem, we are given N numbers in [R] and asked to decide if any k of them sum to 0 (mod R).

k-junta Testing.

In this problem, we are given the truth table of a Boolean function and have to determine if the function depends on at most k variables or if it is ε-far from any such function. The best classical algorithm for this problem uses $O(k \log k + k/\varepsilon)$ queries [Bla09]. The problem was first studied in the quantum setting by Atıcı and Servedio [AS07], who gave a quantum algorithm making $O(k/\varepsilon)$ queries. This was later improved by Ambainis et al. [ABRdW16] to $\tilde{O}(\sqrt{k/\varepsilon})$. They also proved a lower bound of $\Omega(k^{1/3})$. Via a connection established by Ambainis et al., our image size testing lower bound implies a $\tilde{\Omega}(\sqrt{k})$ lower bound on the approximate degree and quantum query complexity of k-junta testing (for some ε = Ω(1)).

Theorem 4 (Informal). The approximate degree and quantum query complexity of k-junta testing is $\tilde{\Omega}(\sqrt{k})$.

This matches the upper bound of [ABRdW16], resolving the main open question from their work.
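To make the k-junta property concrete, the following brute-force sketch reads an entire truth table and lists the variables the function actually depends on. It is a classical illustration of the "depends on at most k variables" side of the problem only; it does not measure ε-farness and is unrelated to the query-efficient testers cited above. The names `relevant_variables` and `is_k_junta`, and the dictionary encoding of the truth table, are assumptions made for this example.

```python
from itertools import product

def relevant_variables(truth_table, m):
    """Indices j such that flipping bit j changes the function value on some input.

    truth_table maps each tuple in {-1, 1}^m to a value in {-1, 1}.
    """
    relevant = set()
    for x in product((-1, 1), repeat=m):
        for j in range(m):
            y = x[:j] + (-x[j],) + x[j + 1:]  # x with coordinate j flipped
            if truth_table[x] != truth_table[y]:
                relevant.add(j)
    return relevant

def is_k_junta(truth_table, m, k):
    """True if the function depends on at most k of its m input bits."""
    return len(relevant_variables(truth_table, m)) <= k

# Example: the XOR of the first two of three bits (a product in the +-1 encoding).
m = 3
xor_01 = {x: x[0] * x[1] for x in product((-1, 1), repeat=m)}
```

In this example the function is a 2-junta but not a 1-junta, since it depends on exactly the first two coordinates.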

Statistical Distance From Uniform (SDU).

In this problem, we are given N numbers in [R], which we interpret as a probability distribution p, where $p_i = f_i/N$, the fraction of times i appears. The goal is to compute the statistical distance between p and the uniform distribution to error ε. This problem was studied by Bravyi, Harrow, and Hassidim [BHH11], who gave an $O(\sqrt{n})$-query quantum algorithm approximating the statistical distance between two input distributions to additive error ε = Ω(1). We show that the approximate degree and quantum query complexity of this task are $\tilde{\Omega}(\sqrt{n})$, even when one of the distributions is known to be the uniform distribution.

Theorem 5 (Informal). There is a constant c > 0 such that the approximate degree and quantum query complexity of approximating the statistical distance of a distribution over a range of size n from the uniform distribution over the same range to additive error c is $\tilde{\Omega}(\sqrt{n})$.

This matches the upper bound of Bravyi et al. [BHH11] and answers the main question left open from that work. Note that the classical query complexity of this problem is $\Theta(n/\log n)$ [VV11].

Entropy Approximation.

As in the previous problem, we interpret the input as a probability distribution, and the goal is to compute its Shannon entropy to additive error ε. The classical query complexity of this problem is $\Theta(n/\log n)$ [VV11]. We show that, for some ε = Ω(1), the approximate degree and quantum query complexity are $\tilde{\Omega}(\sqrt{n})$.

Theorem 6 (Informal). There is a constant c > 0 such that the approximate degree and quantum query complexity of approximating the Shannon entropy of a distribution over a range of size n to additive error c is $\tilde{\Omega}(\sqrt{n})$.

This too is tight, answering a question of Li and Wu [LW18].

1.2 Prior Work on Lower Bounding Approximate Degree

A relatively new lower-bound technique for approximate degree called the method of dual polynomials plays an essential role in our paper. This method of dual polynomials dates back to work of Sherstov [She13c] and Špalek [Špa08], though dual polynomials had been used earlier to resolve longstanding questions in communication complexity [She11, SZ09, She09b, CA08, LS09b]. To prove a lower bound for a function f via this method, one exhibits an explicit dual polynomial for f, which is a dual solution to a certain linear program capturing the approximate degree of f.

A notable feature of the method of dual polynomials is that it is lossless, in the sense that it can exhibit a tight lower bound on the approximate degree of any function f (though actually applying the method to specific functions may be highly challenging). Prior to the method of dual polynomials, the primary tool available for proving approximate degree lower bounds was symmetrization, introduced by Minsky and Papert [MP69] in the 1960s. Although powerful, symmetrization is not a lossless technique.

Most prior work on the method of dual polynomials can be understood as establishing hardness amplification results. Such results show how to take a function f that is “somewhat hard” to approximate by low-degree polynomials, and turn f into a related function g that is much harder to approximate. Here, harder means either that g requires larger degree to approximate to the same error as f, or that approximations to g of a given degree incur much larger error than do approximations to f of the same degree.

Results for Block-Composed Functions.

Until very recently, the method of dual polynomials had been used exclusively to prove hardness amplification results for block-composed functions. That is, the harder function g would be obtained by block-composing f with another function h, i.e., $g = h \circ f$. Here, a function $g \colon \{-1,1\}^{n \cdot m} \to \{-1,1\}$ is the block-composition of $h \colon \{-1,1\}^n \to \{-1,1\}$ and $f \colon \{-1,1\}^m \to \{-1,1\}$ if g interprets its input as a sequence of n blocks, applies f to each block, and then feeds the n outputs into h.

The method of dual polynomials turns out to be particularly suited to analyzing block-composed functions, as there are sophisticated ways of “combining” dual witnesses for h and f individually to give an effective dual witness for $h \circ f$ [She13c, SZ09, She13a, BT13, BT15, Tha16, She14, She15, BCH+…]. … $f(x) = \bigwedge_{i=1}^{n} \bigvee_{j=1}^{m} x_{ij}$, known as the AND-OR tree, which had been open for 19 years [BT13, She13a], established new lower bounds for AC$^0$ under basic complexity measures including discrepancy [BT15, Tha16, She14, She15], sign-rank [BT16b], and threshold degree [She15, She14], and resolved a number of open questions about the power of statistical zero knowledge proofs [BCH+…].

Beyond Block-Composed Functions.

While the aforementioned results led to considerable progress in complexity theory, many basic questions require understanding the approximate degree of non-block-composed functions. One prominent example with many applications is to exhibit an AC$^0$ circuit over n variables with approximate degree Ω(n). Until very recently, the best result in this direction was Aaronson and Shi’s well-known $\tilde{\Omega}(n^{2/3})$ lower bound on the approximate degree of the Element Distinctness function (which is equivalent to k-distinctness for k = 2) [AS04]. However, Bun and Thaler [BT17] recently achieved a near-resolution of this problem by proving the following theorem.

Theorem 7 (Bun and Thaler [BT17]). For any constant δ > 0, there is an AC$^0$ circuit with approximate degree $\Omega(n^{1-\delta})$.

The reason that this result required moving beyond block-composed functions is the following result of Sherstov [She13d].

Theorem 8 (Sherstov). For any Boolean functions f and h, $\widetilde{\deg}(h \circ f) = O(\widetilde{\deg}(h) \cdot \widetilde{\deg}(f))$.

Theorem 8 implies that the approximate degree of $h \circ f$ (viewed as a function of its input size) is never higher than the approximate degree of f or h individually (viewed as a function of their input sizes). For example, if f and h are both functions on n inputs, and both have approximate degree $O(n^{1/2})$, then $h \circ f$ has $N := n^2$ inputs, and by Theorem 8, $\widetilde{\deg}(h \circ f) = O(n^{1/2} \cdot n^{1/2}) = O(N^{1/2})$. This means that block-composing multiple AC$^0$ functions does not result in a function of higher approximate degree (as a function of its input size) than that of the individual functions. Bun and Thaler [BT17] overcome this hurdle by introducing a way of analyzing functions that cannot be written as a block-composition of simpler functions.

Bun and Thaler’s techniques set the stage to resolve the approximate degree of many basic functions using the method of dual polynomials. However, they were not refined enough to accomplish this on their own.
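The definition of block composition used in this section can be sketched directly: split the input into n blocks of m bits, apply the inner function to each block, and feed the n results to the outer function. The helper name `block_compose` is hypothetical, and −1 is assumed to encode TRUE. The example builds a small AND-OR tree, for which the input size is N = n · m; with n = m this gives N = n², the same accounting as in the discussion of Theorem 8.

```python
def block_compose(h, f, n, m):
    """Return g = h o f on n*m bits: apply f to each of n blocks of m bits,
    then feed the n outputs into h."""
    def g(x):
        assert len(x) == n * m
        blocks = [x[i * m:(i + 1) * m] for i in range(n)]
        return h(tuple(f(b) for b in blocks))
    return g

# -1 plays the role of TRUE.
def OR(x):
    return -1 if any(b == -1 for b in x) else 1

def AND(x):
    return -1 if all(b == -1 for b in x) else 1

n = m = 3
and_or = block_compose(AND, OR, n, m)  # AND-OR tree on N = n * m = 9 bits
```

Here `and_or` evaluates to −1 exactly when every block of three bits contains at least one −1.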
Our lower bounds in this paper are obtained by refining and extending the methods of [BT17].

In order to describe our techniques, it is helpful to explain the process by which we discovered the tight $\tilde{\Theta}(n^{3/4})$ lower and upper bounds for Surjectivity (cf. Theorem 1). It has previously been observed [Tha16, BT13, BT17] that optimal dual polynomials for a function f tend to be tailored (in a sense that can be made precise via complementary slackness) to showing optimality of some specific approximation technique for f. Hence, constructing a dual polynomial for f can provide a strong hint as to how to construct an optimal approximation for f, and vice versa.

Upper Bound for Surjectivity.

In [BT17], Bun and Thaler constructed a dual polynomial witnessing a suboptimal bound of $\tilde{\Omega}(n^{2/3})$ for SURJ. Even though this dual polynomial is suboptimal, it still provided a major clue as to what an optimal approximation for SURJ should look like: it curiously ignored all inputs failing to satisfy the following condition.

Condition 1. Every range item has frequency at most T, for a specific threshold $T = O(N^{1/2}) \ll N$.

This suggested that an optimal approximation for SURJ should treat inputs satisfying Condition 1 differently than other inputs, leading us to the following multi-phase construction (for clarity and brevity, this overview is simplified). The first phase constructs a polynomial p of degree $O(n^{3/4})$ approximating SURJ on all inputs satisfying Condition 1. However, p may be exponentially large on other inputs. The second phase constructs a polynomial q of degree $O(n^{3/4})$ that is exponentially small on inputs x that do not satisfy Condition 1 (in particular, $q(x) \ll 1/p(x)$ for such x), and is close to 1 otherwise. The product p · q still approximates SURJ on inputs satisfying Condition 1, and is exponentially small on all other inputs. Notice that $\deg(p \cdot q) \le \deg(p) + \deg(q) = O(n^{3/4})$. Combining the above with an additional averaging step (the details of which we omit from this introduction) yields an approximation to SURJ that is accurate on all inputs.

Lower Bound for Surjectivity.

With the $O(n^{3/4})$ upper bound in hand, we were able to identify the fundamental bottleneck preventing further improvement of the upper bound. This suggested a way to refine the techniques of [BT17] to prove a matching lower bound. Once the tight lower bound for SURJ was established, we were able to identify additional refinements to analyze the other functions that we consider. We now describe this in more detail.

Bun and Thaler’s [BT17] (suboptimal) lower bound analysis for SURJ proceeds in two stages. In the first stage, proving a lower bound for SURJ (on N input list items and R range items) is reduced to the problem of proving a lower bound for the block-composed function $\mathrm{AND}_R \circ \mathrm{OR}_N$, under the promise that the input has Hamming weight at most N. In this paper, we use this stage of their analysis unmodified.

The second stage proves an $\tilde{\Omega}(R^{2/3})$ lower bound for the latter problem by leveraging much of the machinery developed to analyze the approximate degree of block-composed functions [BT13, She13a, RS10]. To describe this machinery, we require the following notion. A dual polynomial that witnesses the fact that $\widetilde{\deg}_\varepsilon(f_n) \ge d$ is a function $\psi \colon \{-1,1\}^n \to \mathbb{R}$ satisfying three properties:
• $\sum_{x \in \{-1,1\}^n} \psi(x) \cdot f(x) > \varepsilon$. If ψ satisfies this condition, it is said to be well-correlated with f.
• $\sum_{x \in \{-1,1\}^n} |\psi(x)| = 1$. If ψ satisfies this condition, it is said to have $\ell_1$-norm equal to 1.
• For all polynomials $p \colon \{-1,1\}^n \to \mathbb{R}$ of degree less than d, we have $\sum_{x \in \{-1,1\}^n} p(x) \cdot \psi(x) = 0$. If ψ satisfies this condition, it is said to have pure high degree at least d.

In more detail, the second stage of the analysis from [BT17] itself proceeds in two steps. First, the authors consider a dual witness ψ for the high approximate degree of $\mathrm{AND}_R \circ \mathrm{OR}_N$ that was constructed in prior work [BT13]. ψ is constructed by taking dual witnesses φ and γ for the high approximate degrees of $\mathrm{AND}_R$ and $\mathrm{OR}_N$ individually, and “combining” them in a specific way [SZ09, She13c, Lee09] to obtain a dual witness for the high approximate degree of their block-composition $\mathrm{AND}_R \circ \mathrm{OR}_N$.

Unfortunately, ψ only witnesses a lower bound for $\mathrm{AND}_R \circ \mathrm{OR}_N$ without the promise that the Hamming weight of the input is at most N. To address this issue, it is enough to “post-process” ψ so that it no longer “exploits” any inputs of Hamming weight larger than N (formally, ψ(x) should equal zero for any inputs in $\{-1,1\}^{R \cdot N}$ of Hamming weight more than N). The authors accomplish this by observing that ψ “almost ignores” all such inputs (i.e., it places exponentially little total mass on all such inputs), and hence it is possible to perturb ψ to make it completely ignore all such inputs.

Key to this step is the fact that the “inner” dual witness γ for the high approximate degree of the $\mathrm{OR}_N$ function satisfies a “Hamming weight decay” condition:

$$|\gamma(x)| \cdot \binom{N}{|x|} \le 1/\mathrm{poly}(|x|), \qquad (1)$$

for a suitable polynomial function.

Footnote: When it is not clear from context, we use subscripts to denote the number of variables on which a function is defined.

Footnote: Note that a reduction in the other direction is straightforward: to approximate SURJ, it suffices to approximate $\mathrm{AND}_R \circ \mathrm{OR}_N$ on inputs of Hamming weight exactly N. This is because SURJ can be expressed as an $\mathrm{AND}_R$ (over all range items r ∈ [R]) of the $\mathrm{OR}_N$ (over all input bits i ∈ [N]) of “Is input $x_i$ equal to r?” Each predicate of the form in quotes is computed exactly by a polynomial of degree log R, since it depends on only log R of the inputs, and exactly N of these predicates (one for each i ∈ [N]) evaluate to TRUE.
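The three dual-witness properties listed above can be checked mechanically on a toy example. The sketch below is not a witness from this paper: it uses the parity function on n = 4 bits with the scaled parity character as ψ, a standard textbook example chosen here because every property can be verified by brute force. The helper names `chi` and `check_dual_witness` are hypothetical. Pure high degree is tested against all monomials of degree less than d, which suffices by linearity.

```python
from itertools import combinations, product
from math import prod

def chi(S, x):
    """Monomial chi_S(x): the product of x_i over i in S (equal to 1 for empty S)."""
    return prod(x[i] for i in S)

def check_dual_witness(psi, f, n, d, eps):
    """Return three booleans: well-correlated with f (> eps), l1-norm equal to 1,
    and pure high degree at least d, all checked over {-1, 1}^n."""
    cube = list(product((-1, 1), repeat=n))
    correlation = sum(psi(x) * f(x) for x in cube)
    l1_norm = sum(abs(psi(x)) for x in cube)
    # Orthogonality to every monomial of degree < d implies orthogonality
    # to every polynomial of degree < d.
    pure_high_degree = all(
        abs(sum(psi(x) * chi(S, x) for x in cube)) < 1e-12
        for size in range(d)
        for S in combinations(range(n), size)
    )
    return correlation > eps, abs(l1_norm - 1) < 1e-12, pure_high_degree

n = 4
parity = lambda x: prod(x)
psi = lambda x: prod(x) / 2 ** n  # toy witness: parity character scaled to l1-norm 1
```

Calling `check_dual_witness(psi, parity, n, n, 1/3)` confirms all three properties with d = n, which is the dual way of seeing that parity on n bits cannot be approximated to error 1/3 by any polynomial of degree less than n.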

To improve the lower bound for SURJ from $\tilde{\Omega}(n^{2/3})$ to the optimal $\tilde{\Omega}(n^{3/4})$, we observe that γ in fact satisfies a much stronger decay condition: while the inverse-polynomial decay property of Equation (1) is tight for small Hamming weights |x|, |γ(x)| actually decays exponentially quickly once |x| is larger than a certain threshold t. This observation is enough to obtain the tight $\tilde{\Omega}(n^{3/4})$ lower bound for SURJ.

For intuition, it is worth mentioning that a primal formulation of the dual decay condition that we exploit shows that any low-degree polynomial p that is an accurate approximation to $\mathrm{OR}_N$ on low Hamming weight inputs requires large degree, even if |p(x)| is allowed to be exponentially large for inputs of Hamming weight more than t. This is precisely the bottleneck that prevents us from improving our upper bound for SURJ to $o(N^{3/4})$. In this sense, our dual witness is intuitively tailored to showing optimality of the techniques used in our upper bound.

Other Lower Bounds.

To obtain the lower bound for $k$-distinctness, the first stage of the analysis of [BT17] reduces the problem to a question about the approximate degree of the block composed function $\mathrm{OR}_R \circ \mathrm{THR}^k_N$, under the promise that the input has Hamming weight at most $N$. Here $\mathrm{THR}^k_N : \{-1,1\}^N \to \{-1,1\}$ denotes the function that evaluates to $-1$ if and only if its input has Hamming weight at least $k$. By constructing a suitable dual witness for $\mathrm{THR}^k_N$, and combining it with a dual witness for $\mathrm{OR}_N$ via similar techniques as in our construction for SURJ, we are able to prove our $\Omega(n^{3/4 - 1/(2k)})$ lower bound for $k$-distinctness. (This description glosses over several significant technical issues that must be dealt with to ensure that the combined dual witness is well-correlated with $\mathrm{OR}_R \circ \mathrm{THR}^k_N$.)

Recall that our lower bounds for $k$-junta testing, SDU, and entropy approximation are derived as consequences of our lower bound for image size testing. The connection between image size testing and junta testing was established by Ambainis et al. [ABRdW16]. The reason that the image size testing lower bound implies lower bounds for SDU is the following. Consider any distribution $p$ over $[R]$ such that all probabilities $p_i$ are integer multiples of $1/N$ for some $N = O(R)$. Then if $p$ has full support, $p$ is guaranteed to be somewhat close to uniform, while if $p$ has small support, $p$ must be very far from uniform. We obtain our lower bound for entropy approximation using a simple reduction from SDU due to Vadhan [Vad99].

Footnote: We do not formally describe this primal formulation of the dual decay condition, because it is not necessary to prove any of the results in this paper.

Footnote: Specifically, our analysis requires the dual witness $\gamma$ for $\mathrm{THR}^k_N$ to be very well-correlated with $\mathrm{THR}^k_N$ in a certain one-sided sense (roughly, we need the probability distribution $|\gamma|$ to have the property that, conditioned on $\gamma$ outputting a negative value, the input to $\gamma$ is in $(\mathrm{THR}^k_N)^{-1}(-1)$ with probability at least $1 - 1/(3R)$). This property was not required in the analysis for SURJ, which is why our lower bound for SURJ is larger by a factor of $n^{1/(2k)}$ than our lower bound for $k$-distinctness. This seemingly technical issue is at least partially intrinsic: a polynomial loss compared to the $\Omega(n^{3/4})$ lower bound for SURJ is unavoidable, owing to Belovs' $n^{3/4 - \Omega(1)}$ upper bound [Bel12a] for $k$-distinctness.

Figure 1: Diagram of reductions and relationships amongst our results. (Boxes: Surjectivity upper bound $\tilde{O}(n^{3/4})$; Surjectivity lower bound $\tilde{\Omega}(n^{3/4})$; $k$-distinctness lower bound $\Omega(n^{3/4 - 1/(2k)})$; image size testing lower bound $\tilde{\Omega}(n^{1/2})$; $k$-junta testing lower bound $\tilde{\Omega}(k^{1/2})$; statistical distance lower bound $\tilde{\Omega}(n^{1/2})$; Shannon entropy lower bound $\tilde{\Omega}(n^{1/2})$. Arrow labels: "Reduction" and "Intuition and ideas".)

To obtain our lower bound for Image Size Testing, we observe that the first stage of the analysis of [BT17] reduces the problem to a question about the approximate degree of the block composed function $\mathrm{GapAND}_R \circ \mathrm{OR}_N$, under the promise that the input has Hamming weight at most $N$. Here, $\mathrm{GapAND}_R$ is the promise function that outputs $-1$ if all of its inputs are $-1$, outputs $+1$ if fewer than $\gamma \cdot R$ of its inputs are $-1$, and is undefined otherwise. Roughly speaking, we obtain the desired $\tilde{\Omega}(n^{1/2})$ lower bound by combining a dual witness for $\mathrm{GapAND}_R \circ \mathrm{OR}_N$ from prior work [BT15] with the same techniques as in our construction for SURJ. However, additional technical refinements to the analysis of [BT17] are required to obtain our results. In particular, the analysis of [BT17] only provides a lower bound for

SURJ if N = Ω( R · log ( R )).But in order to infer our lower bound for SDU and entropy approximation (as well as k -junta testingfor ε = Ω(1)), it is essential that the lower bound hold for N = O ( R ). This is because a distributionwith full support is guaranteed to be Ω(1)-close to uniform if all probabilities are integer multiples of1 /N with N = O ( R ), but this is not the case otherwise. (Consider, e.g., a distribution that placesmass 1 − / log ( R ) on a single range item, and spreads out the remaining mass evenly over allother range items). Reﬁning the methods of [BT17] to yield lower bounds even when N = O ( R )requires a signiﬁcantly more delicate analysis than in [BT17].A diagram indicating how we obtain our results and the relationships that we establish betweenproblems is given in Figure 1. Section 2 covers preliminary deﬁnitions and lemmas. Section 3 presents the ˜ O ( n / ) upper bound for SURJ , while Section 4 presents the matching ˜Ω( n / ) lower bound. Section 5 gives the ˜Ω( n / − / (2 k ) )lower bound for k -distinctness. Section 6 presents the lower bound for Image Size Testing, and itsimplications for junta testing, SDU , and Shannon entropy approximation. Section 7 concludes bybrieﬂy describing some additional consequences of our results, as well as a number of open questionsand directions for future work.

For a natural number $N$, let $[N] = \{1, 2, \dots, N\}$ and $[N]_0 = \{0, 1, 2, \dots, N\}$. All logarithms are taken in base 2 unless otherwise noted. As is standard, we say that a function $f(n)$ is in $\tilde{O}(h(n))$ if there exists a constant $k$ such that $f(n)$ is in $O(h(n) \cdot \log^k(h(n)))$.

We will frequently work with Boolean functions under the promise that their inputs have low Hamming weight. To this end, we introduce the following notation for the set of low-Hamming weight inputs.

Definition 9. For $0 \le T \le n$, let $H^n_{\le T}$ denote the subset of $\{-1,1\}^n$ consisting of all inputs of Hamming weight at most $T$. We use $|x|$ to denote the Hamming weight of an input $x \in \{-1,1\}^n$, so $H^n_{\le T} = \{x \in \{-1,1\}^n : |x| \le T\}$.

There are two natural notions of approximate degree for promise problems (i.e., for functions defined on a strict subset $\mathcal{X}$ of $\{-1,1\}^n$). One notion requires an approximating polynomial $p$ to be bounded in absolute value even on inputs in $\{-1,1\}^n \setminus \mathcal{X}$. The other places no restrictions on $p$ outside of the promise $\mathcal{X}$. In this work, we make use of both notions. Hence, we must introduce some (non-standard) notation to distinguish the two.

Definition 10 (Approximate Degree With Boundedness Outside of the Promise Required). Let $\varepsilon > 0$ and $\mathcal{X} \subseteq \{-1,1\}^n$. The $\varepsilon$-approximate degree of a Boolean function $f : \mathcal{X} \to \{-1,1\}$, denoted $\widetilde{\deg}_\varepsilon(f)$, is the least degree of a real polynomial $p : \{-1,1\}^n \to \mathbb{R}$ such that $|p(x) - f(x)| \le \varepsilon$ for all $x \in \mathcal{X}$ and $|p(x)| \le 1 + \varepsilon$ for all $x \in \{-1,1\}^n \setminus \mathcal{X}$. We use the term approximate degree without qualification to refer to $\widetilde{\deg}(f) = \widetilde{\deg}_{1/3}(f)$.

The following standard dual formulation of this first variant of approximate degree can be found in, e.g., [BT16a].

Proposition 11. Let $\mathcal{X} \subseteq \{-1,1\}^n$, and let $f : \mathcal{X} \to \{-1,1\}$. Then $\widetilde{\deg}_\varepsilon(f) \ge d$ if and only if there exists a function $\psi : \{-1,1\}^n \to \mathbb{R}$ satisfying the following properties.

\[ \sum_{x \in \mathcal{X}} \psi(x) \cdot f(x) - \sum_{x \in \{-1,1\}^n \setminus \mathcal{X}} |\psi(x)| > \varepsilon, \qquad (2) \]

\[ \sum_{x \in \{-1,1\}^n} |\psi(x)| = 1, \text{ and} \qquad (3) \]

\[ \text{For every polynomial } p : \{-1,1\}^n \to \mathbb{R} \text{ of degree less than } d, \quad \sum_{x \in \{-1,1\}^n} p(x) \cdot \psi(x) = 0. \qquad (4) \]

We will refer to functions $\psi : \{-1,1\}^n \to \mathbb{R}$ as dual polynomials. We refer to $\sum_{x \in \{-1,1\}^n} |\psi(x)|$ as the $\ell_1$-norm of $\psi$, and denote this quantity by $\|\psi\|_1$. If $\psi$ satisfies Equation (4), it is said to have pure high degree at least $d$.

Given a function $\psi : \{-1,1\}^n \to \mathbb{R}$, and a (possibly partial) function $f : \mathcal{X} \to \{-1,1\}$, where $\mathcal{X} \subseteq \{-1,1\}^n$, we let $\langle f, \psi \rangle := \sum_{x \in \mathcal{X}} f(x) \cdot \psi(x) - \sum_{x \in \{-1,1\}^n \setminus \mathcal{X}} |\psi(x)|$, and refer to this as the correlation of $f$ and $\psi$. So Condition (2) is equivalent to requiring $\psi$ and $f$ to have correlation greater than $\varepsilon$.

Definition 12 (Approximate Degree With Unboundedness Permitted Outside of the Promise). Let $\varepsilon > 0$ and $\mathcal{X}$ be a finite set. The $\varepsilon$-unbounded approximate degree of a Boolean function $f : \mathcal{X} \to \{-1,1\}$, denoted $\widetilde{\mathrm{ubdeg}}_\varepsilon(f)$, is the least degree of a real polynomial $p : \mathcal{X} \to \mathbb{R}$ such that $|p(x) - f(x)| \le \varepsilon$ for all $x \in \mathcal{X}$ (if $\mathcal{X}$ is a strict subset of a larger domain, then no constraints are placed on $p(x)$ for $x \notin \mathcal{X}$). We use the term unbounded approximate degree without qualification to refer to $\widetilde{\mathrm{ubdeg}}(f) = \widetilde{\mathrm{ubdeg}}_{1/3}(f)$.

The following standard dual formulation of this second variant of approximate degree can be found in, e.g., [She11]. A dual polynomial $\psi : \{-1,1\}^n \to \mathbb{R}$ witnessing the fact that $\widetilde{\mathrm{ubdeg}}_\varepsilon(f) \ge d$ is the same as a dual witness for $\widetilde{\deg}_\varepsilon(f) \ge d$, but with the additional requirement that $\psi(x) = 0$ outside of $\mathcal{X}$.

Proposition 13.

Let $\mathcal{X} \subseteq \{-1,1\}^n$, and let $f : \mathcal{X} \to \{-1,1\}$. Then $\widetilde{\mathrm{ubdeg}}_\varepsilon(f) \ge d$ if and only if there exists a function $\psi : \{-1,1\}^n \to \mathbb{R}$ satisfying the following properties.

\[ \psi(x) = 0 \text{ for all } x \notin \mathcal{X}, \qquad (5) \]

\[ \sum_{x \in \mathcal{X}} \psi(x) \cdot f(x) > \varepsilon, \qquad (6) \]

\[ \sum_{x \in \{-1,1\}^n} |\psi(x)| = 1, \text{ and} \qquad (7) \]

\[ \text{For every polynomial } p : \{-1,1\}^n \to \mathbb{R} \text{ of degree less than } d, \quad \sum_{x \in \{-1,1\}^n} p(x) \cdot \psi(x) = 0. \qquad (8) \]

Observe that $\widetilde{\deg}(f)$ and $\widetilde{\mathrm{ubdeg}}(f)$ coincide for total functions $f$. To avoid notational clutter, when referring to the approximate degree of total functions, we will use the shorter notation $\widetilde{\deg}(f)$.

The seminal work of Nisan and Szegedy [NS94] gave tight bounds on the approximate degree of the $\mathrm{AND}_n$ and $\mathrm{OR}_n$ functions.

Lemma 14. For any constant $\varepsilon \in (0,1)$, the functions AND and OR on $n$ bits have $\varepsilon$-approximate degree $\Theta(n^{1/2})$, and the same holds for their negations.

Approximate degree is invariant under negating the inputs or output of a function, and hence the result for AND implies the result for NAND, OR, etc.

The following lemma, which forms the basis of the well-known symmetrization argument, is due to Minsky and Papert [MP69].

Lemma 15. Let $p : \{-1,1\}^n \to \mathbb{R}$ be an arbitrary polynomial and let $[n]_0$ denote the set $\{0, 1, \dots, n\}$. Then there is a univariate polynomial $q : \mathbb{R} \to \mathbb{R}$ of degree at most $\deg(p)$ such that

\[ q(t) = \frac{1}{\binom{n}{t}} \sum_{x \in \{-1,1\}^n : |x| = t} p(x) \]

for all $t \in [n]_0$.

We give formal definitions of the Surjectivity and $k$-distinctness functions we consider in this work, as well as several variations that will be helpful in proving our lower bounds. The formal definitions of the other functions we study, including Image Size Testing, SDU, and the Shannon Entropy functions, are deferred to Section 6.

2.4.1 Surjectivity

Definition 16.

For $N \ge R$, we define $\mathrm{SURJ}_{N,R} : [R]^N \to \{-1,1\}$ by $\mathrm{SURJ}_{N,R}(s_1, \dots, s_N) = -1$ iff for every $j \in [R]$, there exists an $i$ such that $s_i = j$.

When $N$ and $R$ are clear from context, we will often refer to the function SURJ without the explicit dependence on these parameters. It will sometimes be convenient to think of $\mathrm{SURJ}_{N,R}$ as a function mapping $\{-1,1\}^n \to \{-1,1\}$ rather than $[R]^N \to \{-1,1\}$. When needed, we assume that $R$ is a power of 2 and an element of $[R]$ is encoded in binary using $\log R$ bits. In this case we will view Surjectivity as a function on $n = N \log R$ bits, i.e., $\mathrm{SURJ} : \{-1,1\}^n \to \{-1,1\}$.

For technical reasons, when proving lower bounds, it will be more convenient to work with a variant of SURJ where the range $[R]$ is augmented by a "dummy element" 0 that is simply ignored by the function. That is, while any of the items $s_1, \dots, s_N$ may take the dummy value 0, the presence of a 0 in the input is not required for the input to be deemed surjective. We denote this variant of Surjectivity by dSURJ. More formally:

Definition 17. For $N \ge R$, we define $\mathrm{dSURJ}_{N,R} : [R]_0^N \to \{-1,1\}$ by $\mathrm{dSURJ}_{N,R}(s_1, \dots, s_N) = -1$ iff for every $j \in [R]$, there exists an $i$ such that $s_i = j$.

The following simple reduction shows that a lower bound on the approximate degree of dSURJ implies a lower bound for SURJ itself.
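Definitions 16 and 17 are easy to exercise by brute force on small instances. The sketch below also checks the relabeling identity that the reduction relies on (the map $T$ sends the dummy item 0 to a fresh range item $R+1$, and one extra list item equal to $R+1$ is appended). The function names are hypothetical, and lists of integers stand in for the binary encoding used in the paper.

```python
from itertools import product

def surj(s, R):
    """SURJ_{N,R}: -1 iff every j in {1, ..., R} appears in the list s."""
    return -1 if all(j in s for j in range(1, R + 1)) else 1

def dsurj(s, R):
    """dSURJ_{N,R}: entries lie in {0, ..., R}; the dummy item 0 is ignored."""
    return -1 if all(j in s for j in range(1, R + 1)) else 1

def relabel(s_i, R):
    """The map T: send the dummy 0 to the fresh item R+1, fix everything else."""
    return R + 1 if s_i == 0 else s_i

def check_reduction(N, R):
    """Verify dSURJ_{N,R}(s) == SURJ_{N+1,R+1}(T(s_1), ..., T(s_N), R+1)."""
    for s in product(range(0, R + 1), repeat=N):
        lifted = tuple(relabel(x, R) for x in s) + (R + 1,)
        assert dsurj(s, R) == surj(lifted, R + 1)
    return True
```

The appended item guarantees that $R+1$ is always covered, so surjectivity of the lifted list onto $[R+1]$ depends only on whether the original list covers $[R]$.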

Proposition 18. Let $\varepsilon > 0$ and $N \ge R$. Then

\[ \widetilde{\deg}_\varepsilon(\mathrm{dSURJ}_{N,R}) \le \widetilde{\deg}_\varepsilon(\mathrm{SURJ}_{N+1,R+1}) \cdot \log(R+1). \]

Proof. Let $p : \{-1,1\}^{(N+1) \cdot \log(R+1)} \to \mathbb{R}$ be a polynomial of degree $d$ that $\varepsilon$-approximates $\mathrm{SURJ}_{N+1,R+1}$. We will use $p$ to construct a polynomial of degree at most $d \cdot \log(R+1)$ that $\varepsilon$-approximates $\mathrm{dSURJ}_{N,R}$. Recall that an input to $\mathrm{dSURJ}_{N,R}$ takes the form $(s_1, \dots, s_N)$ where each $s_i$ is the binary representation of a number in $[R]_0$. Define the transformation $T : [R]_0 \to [R+1]$ by

\[ T(s) = \begin{cases} R+1 & \text{if } s = 0 \\ s & \text{otherwise.} \end{cases} \]

Note that as a mapping between binary representations, the function $T$ is exactly computed by a vector of polynomials of degree at most $\log(R+1)$. For every $(s_1, \dots, s_N) \in [R]_0^N$, observe that

\[ \mathrm{dSURJ}_{N,R}(s_1, \dots, s_N) = \mathrm{SURJ}_{N+1,R+1}(T(s_1), \dots, T(s_N), R+1). \]

Hence, the polynomial $p(T(s_1), \dots, T(s_N), R+1)$ is a polynomial of degree at most $d \cdot \log(R+1)$ that $\varepsilon$-approximates $\mathrm{dSURJ}_{N,R}$.

2.4.2 $k$-Distinctness

Definition 19. For integers $k, N, R$ with $k \le N$, define the function $\mathrm{DIST}^k_{N,R} : [R]^N \to \{-1,1\}$ by $\mathrm{DIST}^k_{N,R}(s_1, \dots, s_N) = -1$ iff there exist $r \in [R]$ and distinct indices $i_1, \dots, i_k$ such that $s_{i_1} = \dots = s_{i_k} = r$.
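As a small illustration of Definition 19, the sketch below computes $k$-distinctness from frequency counts and compares it with the block-composed form mentioned in the introduction, an OR over range items of a threshold applied to the indicators "is $s_i$ equal to $r$?". The function names are hypothetical, and for readability the threshold acts on 0/1 indicator vectors instead of the paper's $\pm 1$ inputs.

```python
from collections import Counter
from itertools import product

def dist(s, k):
    """DIST^k: -1 iff some range item occurs at least k times in the list s."""
    return -1 if max(Counter(s).values()) >= k else 1

def thr(bits, k):
    """Threshold on a 0/1 indicator vector: -1 iff it has at least k ones."""
    return -1 if sum(bits) >= k else 1

def dist_via_or_thr(s, k, R):
    """OR over range items r of the threshold of the indicators [s_i == r]."""
    votes = [thr([int(x == r) for x in s], k) for r in range(1, R + 1)]
    return -1 if -1 in votes else 1

def check(N, R, k):
    """Exhaustively compare both formulations on all lists in [R]^N."""
    return all(dist(s, k) == dist_via_or_thr(s, k, R)
               for s in product(range(1, R + 1), repeat=N))
```

For $k = 1$ the first function returns $-1$ on every nonempty list, since some range item always has frequency at least one.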

As with Surjectivity, it will be convenient to work with a variant of $k$-distinctness where $[R]$ is augmented with a dummy item:

Definition 20. For integers $k, N, R$ with $k \le N$, define the function $\mathrm{dDIST}^k_{N,R} : [R]_0^N \to \{-1,1\}$ by $\mathrm{dDIST}^k_{N,R}(s_1, \dots, s_N) = -1$ iff there exist $r \in [R]$ and distinct indices $i_1, \dots, i_k$ such that $s_{i_1} = \dots = s_{i_k} = r$.

For $k \ge 2$, a lower bound on the approximate degree of $\mathrm{dDIST}^k$ implies a lower bound on the approximate degree of $\mathrm{DIST}^k$. The restriction that $k \ge 2$ is essential, because $\mathrm{DIST}^1$ is the constant function that evaluates to TRUE on any input (since at least one range item must always have frequency at least one), whereas $\mathrm{dDIST}^1$ contains $\mathrm{OR}_N$ as a subfunction, and hence has approximate degree at least $\Omega(\sqrt{N})$.

Proposition 21. Let $\varepsilon > 0$, $N, R \in \mathbb{N}$, and $k \ge 2$. Then

\[ \widetilde{\deg}_\varepsilon(\mathrm{dDIST}^k_{N,R}) \le \widetilde{\deg}_\varepsilon(\mathrm{DIST}^k_{N,R+N}) \cdot \log(R+1). \]

Proof. The proof is similar to that of Proposition 18, but uses a slightly more involved reduction. Let $p : \{-1,1\}^{N \cdot \log(R+N)} \to \mathbb{R}$ be a polynomial of degree $d$ that $\varepsilon$-approximates $\mathrm{DIST}^k_{N,R+N}$. We will use $p$ to construct a polynomial of degree at most $d \cdot \log(R+1)$ that $\varepsilon$-approximates $\mathrm{dDIST}^k_{N,R}$. For each $i = 1, \dots, N$, define a transformation $T_i : [R]_0 \to [R+N]$ by

\[ T_i(s) = \begin{cases} R+i & \text{if } s = 0 \\ s & \text{otherwise.} \end{cases} \]

As a mapping between binary representations, the function $T = (T_1, \dots, T_N)$ is exactly computed by a vector of polynomials of degree at most $\log(R+1)$. For every $(s_1, \dots, s_N) \in [R]_0^N$, observe that

\[ \mathrm{dDIST}^k_{N,R}(s_1, \dots, s_N) = \mathrm{DIST}^k_{N,R+N}(T_1(s_1), \dots, T_N(s_N)). \]

Hence, the polynomial $p(T_1(s_1), \dots, T_N(s_N))$ is a polynomial of degree at most $d \cdot \log(R+1)$ that $\varepsilon$-approximates $\mathrm{dDIST}^k_{N,R}$.

An important ingredient in [BT17] is the relationship between the approximate degree of a property of a list of numbers (such as SURJ) and the approximate degree of a simpler block composed function, defined as follows.

Definition 22. For functions $f : Y^n \to Z$ and $g : X \to Y$, define the block composition $f \circ g : X^n \to Z$ by $(f \circ g)(x_1, \dots, x_n) = f(g(x_1), \dots, g(x_n))$, for all $x_1, \dots, x_n \in X$.

Fix $R, N \in \mathbb{N}$, let $f : \{-1,1\}^R \to \{-1,1\}$ and let $g : \{-1,1\}^N \to \{-1,1\}$. Suppose $g$ is a symmetric function, in the sense that for any $x \in \{-1,1\}^N$ and any permutation $\sigma : [N] \to [N]$, we have $g(x_1, \dots, x_N) = g(x_{\sigma(1)}, \dots, x_{\sigma(N)})$. Equivalently, the value of $g$ on any input $x$ depends only on its Hamming weight $|x|$.

The functions $f$ and $g$ give rise to two functions. The first, which we denote by $F^{\mathrm{prop}} : [R]_0^N \to \{-1,1\}$, is a certain property of a list of numbers $s_1, \dots, s_N \in [R]_0$. The second, which we denote by $F^{\le N} : H^{N \cdot R}_{\le N} \to \{-1,1\}$, is the block composition of $f$ and $g$ restricted to inputs of Hamming weight at most $N$. Formally, these functions are defined as:

\[ F^{\mathrm{prop}}(s_1, \dots, s_N) = f\big(g(\mathbb{1}[s_1 = 1], \dots, \mathbb{1}[s_N = 1]), \dots, g(\mathbb{1}[s_1 = R], \dots, \mathbb{1}[s_N = R])\big) \]

\[ F^{\le N}(x_1, \dots, x_R) = \begin{cases} f(g(x_1), \dots, g(x_R)) & \text{if } x_1, \dots, x_R \in \{-1,1\}^N,\ |x_1| + \dots + |x_R| \le N \\ \text{undefined} & \text{otherwise.} \end{cases} \]

The following result from [BT17], which in turn relies heavily on a clever symmetrization argument due to Ambainis [Amb05], relates the approximate degrees of the two functions $F^{\mathrm{prop}}$ and $F^{\le N}$.

Theorem 23. Let $f : \{-1,1\}^R \to \{-1,1\}$ be any function and let $g : \{-1,1\}^N \to \{-1,1\}$ be a symmetric function. Then for $F^{\mathrm{prop}}$ and $F^{\le N}$ defined above, and for any $\varepsilon > 0$, we have

\[ \widetilde{\deg}_\varepsilon(F^{\mathrm{prop}}) \ge \widetilde{\mathrm{ubdeg}}_\varepsilon(F^{\le N}). \]

In the case where $f = \mathrm{AND}_R$ and $g = \mathrm{OR}_N$, the function $F^{\mathrm{prop}}(s_1, \dots, s_N)$ is the Surjectivity function augmented with a dummy item, $\mathrm{dSURJ}_{N,R}(s_1, \dots, s_N)$. Hence,

Corollary 24. Let $N, R \in \mathbb{N}$. Then for any $\varepsilon > 0$,

\[ \widetilde{\deg}_\varepsilon(\mathrm{dSURJ}_{N,R}) \ge \widetilde{\mathrm{ubdeg}}_\varepsilon(F^{\le N}), \]

where $F^{\le N} : H^{N \cdot R}_{\le N} \to \{-1,1\}$ is the restriction of $\mathrm{AND}_R \circ \mathrm{OR}_N$ to $H^{N \cdot R}_{\le N}$.

Definition 25. For integers $k, N$ with $k \le N$, define the function $\mathrm{THR}^k_N : \{-1,1\}^N \to \{-1,1\}$ by $\mathrm{THR}^k_N(x) = -1$ iff $|x| \ge k$.

If we let $f = \mathrm{OR}_R$ and $g = \mathrm{THR}^k_N$, then the function $F^{\mathrm{prop}}$ is the dummy augmented $k$-distinctness function $\mathrm{dDIST}^k_{N,R}$.

Corollary 26. Let $N, R \in \mathbb{N}$. Then for any $\varepsilon > 0$,

\[ \widetilde{\deg}_\varepsilon(\mathrm{dDIST}^k_{N,R}) \ge \widetilde{\mathrm{ubdeg}}_\varepsilon(G^{\le N}), \]

where $G^{\le N} : H^{N \cdot R}_{\le N} \to \{-1,1\}$ is the restriction of $\mathrm{OR}_R \circ \mathrm{THR}^k_N$ to $H^{N \cdot R}_{\le N}$.

This section collects definitions and preliminary results on the dual block method [SZ09, Lee09, She13c] for constructing dual witnesses for a block composed function $F \circ f$ by combining dual witnesses for $F$ and $f$ respectively.

Definition 27.

Let $\Psi : \{-1,1\}^M \to \mathbb{R}$ and $\psi : \{-1,1\}^m \to \mathbb{R}$ be functions that are not identically zero. Let $x = (x_1, \dots, x_M) \in (\{-1,1\}^m)^M$. Define the dual block composition of $\Psi$ and $\psi$, denoted $\Psi \star \psi : (\{-1,1\}^m)^M \to \mathbb{R}$, by

\[ (\Psi \star \psi)(x_1, \dots, x_M) = 2^M \cdot \Psi(\dots, \mathrm{sgn}(\psi(x_i)), \dots) \cdot \prod_{i=1}^{M} |\psi(x_i)|. \]

Proposition 28 ([She13c, BT17]). The dual block composition satisfies the following properties:

Preservation of $\ell_1$-norm: If $\|\Psi\|_1 = 1$, $\|\psi\|_1 = 1$, and $\langle \psi, \mathbf{1} \rangle = 0$, then

\[ \|\Psi \star \psi\|_1 = 1. \qquad (9) \]

Multiplicativity of pure high degree: If $\langle \Psi, P \rangle = 0$ for every polynomial $P : \{-1,1\}^M \to \mathbb{R}$ of degree less than $D$, and $\langle \psi, p \rangle = 0$ for every polynomial $p : \{-1,1\}^m \to \mathbb{R}$ of degree less than $d$, then for every polynomial $q : \{-1,1\}^{m \cdot M} \to \mathbb{R}$,

\[ \deg q < D \cdot d \implies \langle \Psi \star \psi, q \rangle = 0. \qquad (10) \]

Associativity: For every $\zeta : \{-1,1\}^{m_\zeta} \to \mathbb{R}$, $\varphi : \{-1,1\}^{m_\varphi} \to \mathbb{R}$, and $\psi : \{-1,1\}^{m_\psi} \to \mathbb{R}$, we have

\[ (\zeta \star \varphi) \star \psi = \zeta \star (\varphi \star \psi). \qquad (11) \]

The following technical proposition refines techniques of Bun and Thaler [BT17]. This proposition is useful for "zeroing out" the mass that a dual polynomial $\xi$ places on inputs of high Hamming weight, if $\xi$ is obtained via the dual-block method.

Definition 29. Let $M \in \mathbb{N}$ and $\alpha, \beta > 0$. A function $\omega : [M]_0 \to \mathbb{R}$ satisfies the $(\alpha, \beta)$-decay condition if

\[ \sum_{t=0}^{M} \omega(t) = 0, \qquad (12) \]

\[ \sum_{t=0}^{M} |\omega(t)| = 1, \qquad (13) \]

\[ |\omega(t)| \le \alpha \exp(-\beta t)/t^2 \quad \forall t = 1, 2, \dots, M. \qquad (14) \]

Proposition 30.

Let R ∈ N be suﬃciently large, and let Φ : {− , } R → R with (cid:107) Φ (cid:107) = 1 . For M ≤ R , let ω : [ M ] → R satisfy the ( α, β ) -decay condition with parameters ≤ α ≤ R , β ∈ (4 ln R/ √ αR, .Let N = (cid:100) √ α (cid:101) R , and deﬁne ψ : {− , } N → R by ψ ( x ) = ω ( | x | ) / (cid:0) N | x | (cid:1) . If D < N is such thatFor every polynomial p with deg p < D, we have (cid:104) Φ (cid:63) ψ, p (cid:105) = 0 , (15) then there exist ∆ ≥ β √ αR/ R and a function ζ : ( {− , } N ) R → R such thatFor every polynomial p with deg p < min { D, ∆ } , we have (cid:104) ζ, p (cid:105) = 0 , (16) (cid:107) ζ − Φ (cid:63) ψ (cid:107) ≤ , (17) (cid:107) ζ (cid:107) = 1 , (18) ζ is supported on H N · R ≤ N . (19)18he key reﬁnement of Proposition 30 relative to the analysis of Bun and Thaler is that Propo-sition 30 applies when N = Θ( R ) (assuming α = O (1)). In contrast, the techniques of Bun andThaler required N = Ω( R · log R ). As indicated in Section 1.3, this reﬁnement will be essentialin obtaining our lower bounds for SDU , entropy approximation, and junta testing for constantproximity parameter.The proof of Proposition 30 occurs in two steps. First, in Proposition 31, we show that ξ = Φ (cid:63) ψ places an exponentially small amount of mass on inputs outside of H N · R ≤ N . Second, in Proposition 33,we construct a correction object ν that zeroes out the mass ξ places outside of H N · R ≤ N withoutdecreasing its pure high degree. Combining ξ with ν yields the desired object ζ . Proposition 31.

Let

Φ : {− , } R → R and ψ : {− , } n → R satisfy the conditions of Proposi-tion 30. Then for suﬃciently large R , there exists ∆ ≥ β √ αR/ R such that, for N = (cid:100) √ α (cid:101) R , (cid:88) x/ ∈ H N · R ≤ N | (Φ (cid:63) ψ )( x ) | ≤ (2 N R ) − . (20) Proof.

Recall that ψ ( x ) = ω ( | x | ) / (cid:0) N | x | (cid:1) where ω : [ M ] → R . By Equation (12) we may write ω = ω +1 − ω − where ω +1 and ω − are non-negative functions satisfying k (cid:88) t =0 ω +1 ( t ) = k (cid:88) t =0 ω − ( t ) = 1 / . (21)By the deﬁnition of dual block composition, we have(Φ (cid:63) ψ )( x , . . . , x R ) = 2 R · Φ( . . . , sgn ( ψ ( x i )) , . . . ) · R (cid:89) i =1 | ψ ( x i ) | . Consequently, (cid:88) x/ ∈ H N · R ≤ N | (Φ (cid:63) ψ )( x ) | = 2 R (cid:88) z ∈{− , } R | Φ( z ) |  (cid:88) ( x ,...,x R ) / ∈ H N · R ≤ N s.t.sgn( ψ ( x ))= z ,..., sgn( ψ ( x R ))= z R R (cid:89) i =1 | ψ ( x i ) |  = 2 R (cid:88) z ∈{− , } R | Φ( z ) |  (cid:88) ( x ,...,x R ) / ∈ H N · R ≤ N R (cid:89) i =1 ω z i ( | x i | ) (cid:0) N | x i | (cid:1)  . (22)Observe that for any ( t , . . . , t R ) ∈ [ M ] R , the number of inputs ( x , . . . , x R ) ∈ (cid:0) {− , } N (cid:1) R suchthat | x i | = t i for all i ∈ [ R ] is exactly (cid:81) Ri =1 (cid:0) Nt i (cid:1) . Hence, deﬁning P = { ( t , . . . , t R ) ∈ [ M ] R : t + · · · + t R > N } , we may rewrite Expression (22) as2 R (cid:88) z ∈{− , } R | Φ( z ) |  (cid:88) ( t ,...,t R ) ∈ P R (cid:89) i =1 ω z i ( t i )  .

19o control this quantity, we appeal to the following combinatorial lemma, whose proof we delay toSection 2.7.1 (this lemma is a substantial reﬁnement of [BT17, Lemma 32]).

Lemma 32.

Let α ≤ R , let β ∈ (4 ln R/ √ αR, , and let R be suﬃciently large. Let N = (cid:100) √ α (cid:101) R . Let η i : [ M ] → R , for i = 1 , . . . R , be a sequence of non-negative functions where forevery i , M (cid:88) r =0 η i ( r ) ≤ / η i ( r ) ≤ α exp( − βr ) /r ∀ r = 1 , . . . , M. (24) Let P = { (cid:126)t = ( t , . . . , t R ) ∈ [ M ] R : t + · · · + t R > N } . Then (cid:88) (cid:126)t ∈ P R (cid:89) i =1 η i ( t i ) ≤ − R · (2 N R ) − where ∆ ≥ β √ αR/ R . Observe that the functions ω z i satisfy Condition (23) (cf. Equation (21)) and Condition (24) (cf.Property (14)). We complete the proof by bounding2 R (cid:88) z ∈{− , } R | Φ( z ) | (cid:88) (cid:126)t ∈ P R (cid:89) i =1 ω z i ( t i )  ≤ R (cid:88) z ∈{− , } R | Φ( z ) | · (cid:0) − R · (2 N R ) − (cid:1) = (2 N R ) − . Here, the equality appeals to the condition that (cid:107) Φ (cid:107) = 1.Our ﬁnal dual witness ζ is obtained by modifying ξ = Φ (cid:63) ψ to zero out all of the mass it placeson inputs of total Hamming weight larger than N . The following proposition captures the conditionsunder which this postprocessing step can be done. Proposition 33 ( [BT17, Proposition 34], building on [RS10]) . Let N ≥ R > D and let ξ :( {− , } N ) R → R be any function such that (cid:88) x/ ∈ H N · R ≤ N | ξ ( x ) | ≤ (2 N R ) − D . Then there exists a function ν : ( {− , } N ) R → R such thatFor all polynomials p : ( {− , } N ) R → R , deg p < D = ⇒ (cid:104) ν, p (cid:105) = 0 (cid:107) ν (cid:107) ≤ / | x | > N = ⇒ ν ( x ) = ξ ( x ) . Remark 1.

Proposition 33 is framed exclusively in the language of dual polynomials: it states that if a dual polynomial $\xi$ places very little mass on inputs of Hamming weight more than $N$, then there exists another dual polynomial $\nu$ satisfying certain useful properties (we ultimately use $\nu$ to zero out the mass that $\xi$ places on inputs of Hamming weight more than $N$). Proposition 33 also has a clean primal formulation. Roughly speaking, it is equivalent to a bound on the growth rate of any polynomial of degree at most $D$ that is bounded at all inputs of Hamming weight at most $D$. We direct the interested reader to [Vio17] for details of this primal formulation.

We are now ready to combine Proposition 31 and Proposition 33 to complete the proof of Proposition 30.
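The proof that follows normalizes the difference $\xi - \nu$, where $\|\xi\|_1 = 1$ and $\|\nu\|_1 \le 1/10$, and bounds the $\ell_1$ distance between the normalized function $\zeta$ and $\xi$ by $2/9$. The arithmetic of that step can be sanity-checked numerically. The sketch below is only an illustration under loud assumptions: the vectors are random stand-ins with no connection to actual dual polynomials, the names are hypothetical, and the check covers the norm bounds only, not pure high degree or the support condition.

```python
import random

def l1(v):
    """The l1-norm of a vector given as a list of floats."""
    return sum(abs(a) for a in v)

def normalized_difference(xi, nu):
    """Return (xi - nu) / ||xi - nu||_1, mirroring the definition of zeta."""
    diff = [a - b for a, b in zip(xi, nu)]
    norm = l1(diff)
    return [a / norm for a in diff]

def trial(rng, size=64):
    """One random instance with ||xi||_1 = 1 and ||nu||_1 <= 1/10."""
    xi = [rng.uniform(-1, 1) for _ in range(size)]
    total = l1(xi)
    xi = [a / total for a in xi]
    nu = [rng.uniform(-1, 1) for _ in range(size)]
    total = l1(nu)
    scale = rng.uniform(0, 0.1)          # target l1-norm of nu
    nu = [a * scale / total for a in nu]
    zeta = normalized_difference(xi, nu)
    gap = l1([a - b for a, b in zip(zeta, xi)])
    return l1(zeta), gap

def run_trials(trials=200, seed=0):
    """Return the worst deviation of ||zeta||_1 from 1 and the largest gap."""
    rng = random.Random(seed)
    results = [trial(rng) for _ in range(trials)]
    worst_norm_error = max(abs(norm - 1.0) for norm, _ in results)
    worst_gap = max(gap for _, gap in results)
    return worst_norm_error, worst_gap
```

By the triangle inequality, the gap is at most $(1/0.9 - 1) + 0.1/0.9 = 2/9$ whenever $\|\xi\|_1 = 1$ and $\|\nu\|_1 \le 1/10$, so every trial should respect that bound up to floating-point error.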

Proof of Proposition 30.

Let ξ = Φ (cid:63) ψ . By Proposition 31, we have (cid:88) x/ ∈ H N · R ≤ N | (Φ (cid:63) ψ )( x ) | ≤ (2 N R ) − , where ∆ ≥ β √ αR/ ln R . By Proposition 33, there exists a function ν : ( {− , } N ) R → R such thatFor all polynomials p : ( {− , } N ) R → R , deg p < min { D, ∆ } = ⇒ (cid:104) ν, p (cid:105) = 0 (25) (cid:107) ν (cid:107) ≤ /

10 (26) | x | > N = ⇒ ν ( x ) = ξ ( x ) . (27)Observe that (cid:107) ξ − ν (cid:107) >

0, as (cid:107) ξ (cid:107) = 1 (cf. Equation (9)) and (cid:107) ν (cid:107) ≤ /

10 (cf. Inequality (26)).Deﬁne the function ζ ( x ) = ξ ( x ) − ν ( x ) (cid:107) ξ − ν (cid:107) . Since ν ( x ) = ξ ( x ) whenever | x | > N (cf. Equation (27)), the function ζ is supported on the set H N · R ≤ N , establishing (19). We establish (17) by computing (cid:107) ζ − ξ (cid:107) = (cid:88) x ∈ ( {− , } N ) R (cid:12)(cid:12)(cid:12)(cid:12) ξ ( x ) − ν ( x ) (cid:107) ξ − ν (cid:107) − ξ ( x ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:88) x ∈ ( {− , } N ) R (cid:18) (cid:107) ξ − ν (cid:107) − (cid:19) | ξ ( x ) | + 1 (cid:107) ξ − ν (cid:107) | ν ( x ) |≤ (cid:18) (cid:107) ξ (cid:107) − (cid:107) ν (cid:107) − (cid:19) · (cid:107) ξ (cid:107) + 1 (cid:107) ξ (cid:107) − (cid:107) ν (cid:107) · (cid:107) ν (cid:107) ≤ (cid:18) − / − (cid:19) + 1 / − /

10= 29 . Equation (18) is immediate from the deﬁnition of ζ . Finally, (16) follows from (15), (25), andlinearity. Our ﬁnal task is to prove Lemma 32, restated here for the reader’s convenience.21 emma 32.

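Lemma 34. Let $k, n \in \mathbb{N}$ with $k \le n$. Then $\binom{n}{k} \le \left(\frac{en}{k}\right)^k$.

Lemma 35. Let $m \in \mathbb{N}$ and $\eta > 0$. Then

\[ \sum_{r=m}^{\infty} e^{-\eta r} = \frac{1}{1 - e^{-\eta}} \cdot e^{-\eta m}. \]

Proof. We calculate

\[ \sum_{r=m}^{\infty} e^{-\eta r} = e^{-\eta m} \cdot \sum_{r=0}^{\infty} e^{-\eta r} = e^{-\eta m} \cdot \frac{1}{1 - e^{-\eta}}. \]

Lemma 36. Let $m \in \mathbb{N}$. Then

\[ \sum_{r=1}^{m} \frac{1}{\sqrt{r}} \le 2\sqrt{m}. \]

Proof. Using the fact that the function $1/\sqrt{t}$ is decreasing, we may estimate

\[ \sum_{r=1}^{m} \frac{1}{\sqrt{r}} \le \int_0^m \frac{1}{\sqrt{t}}\, dt = 2\sqrt{m}. \]

Proof of Lemma 32.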

Let T = (cid:98) N M (cid:99) . For each s ∈ { T, . . . , R } , let C ( s ) = max (cid:26) N s ln R , N √ R · s (cid:27) . We begin with a simple, but important, structural observation about the set P . Let t =( t , . . . , t R ) ∈ [ M ] R be a sequence such that t + · · · + t R > N . Then we claim that there exists an s ∈ { T . . . , R } such that t i ≥ C ( s ) for at least s indices i ∈ [ R ]. To see this, assume without loss22f generality that the entries of (cid:126)t are sorted so that t ≥ t ≥ · · · ≥ t R . Then there must exist an s ≥ T such that t s ≥ C ( s ). Otherwise, because no t i can exceed M , we would have: t + · · · + t R < T · M + R (cid:88) s =1 C ( s ) ≤ N N R (cid:98) R/ ln R (cid:99) (cid:88) s =1 s + N √ R R (cid:88) s = (cid:100) R/ ln R (cid:101) √ s ≤ N N N N. where the last inequality follows from Lemma 36 and the fact that (cid:80) ms =1 s − ≤ ln m + 1 (and R issuﬃciently large). Since the entries of (cid:126)t are sorted, the preceding values t , . . . , t s − ≥ C ( s ) as well.For each subset S ⊆ [ R ], deﬁne P S = { (cid:126)t ∈ P : t i ≥ C ( | S | ) for all indices i ∈ S } . The observations above guarantee that for every (cid:126)t = ( t , . . . , t R ) ∈ P , there exists some set S of sizeat least s ∈ { T, . . . , R } such that t i ≥ C ( s ) for all i ∈ S . 
Hence (cid:88) (cid:126)t ∈ P R (cid:89) i =1 η i ( t i ) ≤ R (cid:88) s = T (cid:88) S ⊆ [ R ]: | S | = s (cid:88) (cid:126)t ∈ P S R (cid:89) i =1 η i ( t i ) ≤ R (cid:88) s = T (cid:18) Rs (cid:19)  max i ∈{ ,...,R }  M (cid:88) r = (cid:100) C ( s ) (cid:101) η i ( r )  s (cid:32) max i ∈{ ,...,R } (cid:32) M (cid:88) r =0 η i ( r ) (cid:33)(cid:33) R − s ≤ − R R (cid:88) s = T (cid:18) Rs (cid:19)  M (cid:88) r = (cid:100) C ( s ) (cid:101) α exp( − βr ) r −  s by Properties (23) and (24) ≤ − R R (cid:88) s = T (cid:18) Res (cid:19) s (cid:18) α ( C ( s )) (cid:19) s · ∞ (cid:88) r = (cid:100) C ( s ) (cid:101) exp ( − βrs ) by Lemma 34 ≤ − R R (cid:88) s = T (cid:18) Res (cid:19) s (cid:18) α ( C ( s )) (cid:19) s · − e − βs · exp ( − βsC ( s )) by Lemma 35 ≤ − R R (cid:88) s = T (cid:18) Res (cid:19) s (cid:18) αRsN (cid:19) s · β · exp (cid:18) − βN R (cid:19) by deﬁnition of C ( s )and since 11 − e − βs ≤ − (1 − β/

2) = 2 β for s ≥ , β ∈ (0 , ≤ − R R (cid:88) s = T − s · exp (cid:18) − β √ αR ln R + ln(2 /β ) (cid:19) setting N = (cid:100) √ α (cid:101) R ≤ − R · (2 N R ) − , N R ) · (cid:18) β √ αR ln R − ln(2 /β ) (cid:19) ≥ β √ αR R holds for suﬃciently large R by the restrictions placed on α and β in the statement of the lemma. The goal of this section is to prove that the approximate degree of the Surjectivity function(Deﬁnition 16) is ˜ O ( N / ). Theorem 37.

For any $R \in \mathbb{N}$, we have $\widetilde{\deg}(\mathrm{SURJ}_{N,R}) = \tilde{O}(N^{3/4})$.

For the remainder of the section, we focus on proving Theorem 37 in the case that $R = \Theta(N)$. This is without loss of generality by the following argument. If $R > N$, then Surjectivity is identically false, and hence has (exact) degree 0. And if $R = o(N)$, then we can reduce to the case $R = \Theta(N)$ as follows. Let $N' = 2N$ and $R' = R + N$; clearly $R' = \Theta(N)$. Given an input $x$ to $\mathrm{SURJ}_{N,R}$, obtain an input $x'$ to $\mathrm{SURJ}_{N',R'}$ by appending range elements $R+1, \dots, R+N$ to $x$. This construction guarantees that $\mathrm{SURJ}_{N,R}(x) = \mathrm{SURJ}_{N',R'}(x')$. It follows that an approximation of degree $\tilde{O}(N^{3/4})$ for $\mathrm{SURJ}_{N',R'}$ implies an approximation of the same degree for $\mathrm{SURJ}_{N,R}$.

Section Outline. This section is structured as follows. Section 3.1 introduces some notation that is specific to this section. Section 3.2 provides some intuition for the construction of the polynomial approximation for Surjectivity using the simpler function NOR as a warmup example. Section 3.3 introduces some terminology that is useful for providing intuitive descriptions of our final polynomial construction using the language of algorithms. Finally, in Section 3.4 we formally apply our strategy to Surjectivity in order to prove Theorem 37.

In this section, we make a few departures from the notation used in the introduction and later sections in order to more easily convey the algorithmic intuition behind our polynomial constructions. First, we will consider Boolean functions $f : \{0,1\}^n \to \{0,1\}$, where 1 corresponds to logical TRUE and 0 corresponds to logical FALSE. For such a Boolean function, we say that a polynomial $p : \{0,1\}^n \to \mathbb{R}$ is an $\varepsilon$-approximating polynomial for $f$ if $p(x) \in [0, \varepsilon]$ when $f(x) = 0$ and $p(x) \in [1 - \varepsilon, 1]$ when $f(x) = 1$. For such a polynomial $p$, it will be useful to think of $p(x)$ as representing the probability that a randomized or quantum algorithm accepts when run on input $x$.

By extension, throughout this section we will use $\widetilde{\deg}_\varepsilon(f)$ to denote the least degree of a real polynomial $p$ that $\varepsilon$-approximates $f$ in the sense described above, and we will write $\widetilde{\deg}(f) = \widetilde{\deg}_{1/3}(f)$. Given an input $x \in \{0,1\}^n$, we will use $|x|$ to denote its Hamming weight (i.e., $|x| = \sum_i x_i$).

NOR

To convey the intuition behind our polynomial construction for Surjectivity, we start by considering a much simpler function as an illustrative example. Consider the negation of the OR function on $n$ bits, $\mathrm{NOR} : \{0,1\}^n \to \{0,1\}$. We will give a novel construction of an approximating polynomial for NOR of degree $\tilde{O}(\sqrt{n})$. Of course, this is not terribly interesting since it is already known that $\widetilde{\deg}(\mathrm{NOR}) = \Theta(\sqrt{n})$ (cf. Lemma 14). But this construction highlights the main idea behind the more involved constructions to follow.

In many models of computation, such as deterministic or randomized query complexity, the NOR function remains just as hard if we restrict to inputs with $|x| = 0$ or $|x| = 1$. This is fairly intuitive, since distinguishing these two types of inputs essentially requires finding a single 1 among $n$ possible locations. The fact that these inputs represent the "hard case" is true for approximate degree as well: any polynomial that uniformly approximates NOR to error $1/3$ on inputs of Hamming weight 0 and 1, while remaining bounded on all other inputs, has degree $\Omega(\sqrt{n})$ [NS94].

However, if we remove the boundedness constraint on inputs of Hamming weight larger than 1, then there is a polynomial of degree one that exactly equals the NOR function on Hamming weights 0 and 1: namely, the polynomial $1 - \sum_i x_i$. That is, if we view NOR as a promise function mapping $H^n_{\le 1}$ to $\{0,1\}$, then its approximate degree is $\Theta(n^{1/2})$, but its unbounded approximate degree is just 1.

More generally, suppose that we only want the polynomial to approximate the NOR function on all inputs of Hamming weight up to $T \le n$, and we place no restrictions on the polynomial when evaluated at inputs of Hamming weight larger than $T$. This can be achieved by a polynomial of degree $O(\sqrt{T})$ (see Lemma 39 for a proof).

Let us refer to the set of low-Hamming weight inputs as $P$, i.e.,

\[ P = H^n_{\le T} = \{x \in \{0,1\}^n : 0 \le |x| \le T\}. \]
(28)The above discussion shows that we can construct a polynomial p of degree O ( √ T ) that tightlyapproximates NOR on inputs x ∈ P , though | p ( x ) | may be exponentially large for x / ∈ P . On theother hand, distinguishing inputs with | x | = 0 from inputs with x / ∈ P also seems “easy”. Forexample, a randomized algorithm that simply samples Θ( n/T ) bits and declares | x | = 0 if and onlyif it does not see a 1 is correct with high probability. Analogously, we can construct a low-degreepolynomial ˜ q which distinguishes between | x | = 0 or x / ∈ P (and is bounded in [0 ,

1] for all inputs in { , } n ): it is not hard to show (via an explicit construction involving Chebyshev polynomials, or byappeal to a quantum algorithm called quantum counting [BHT98]) that there exists a polynomial ˜ q of degree O ( (cid:112) n/T ) that accomplishes this.To summarize the above discussion, we can construct polynomials p and ˜ q , of degree O ( √ T )and O ( (cid:112) n/T ) respectively, with the following properties: p ( x ) ∈  [9 / ,

1] if | x | = 0[0 , /

10] if 1 ≤ | x | ≤ T R if x / ∈ P ˜ q ( x ) ∈  [9 / ,

1] if | x | = 0[0 ,

1] if 1 ≤ | x | ≤ T [0 , /

10] if x / ∈ P (29)Now consider the polynomial p ( x ) · ˜ q ( x ). This polynomial approximates NOR on | x | = 0, sinceits value is in [0 . , NOR on 1 ≤ | x | ≤ T , since its value is in [0 , / x / ∈ P , although ˜ q ( x ) is small, we cannot ensure that the product p ( x ) · q ( x ) is small,since we have no control over p ( x ) for such x .To ﬁx this, we will construct a new polynomial q that behaves like ˜ q for x ∈ P and is extremely small when x / ∈ P (in particular 0 ≤ q ( x ) (cid:28) / | p ( x ) | for such x ). To understand how small we need q to be, we need to determine just how large p ( x ) can be on inputs with x / ∈ P .25igure 2: Caricature of the polynomial p ( x ) Figure 3: Caricature of the polynomial q ( x )To understand the behavior of p ( x ) outside of P , we can either analyze the behavior of anexplicit polynomial of our choice for the NOR function or we can appeal to a general result aboutthe growth of polynomials that are bounded in a region (see Lemma 39). In either case, we get thatthere exists an upper bound M = exp( O ( √ T log n )) such that for all x / ∈ P , | p ( x ) | ≤ M .This leads us to design a new polynomial q , which has the same behavior as ˜ q for all x ∈ P , but isat most 1 / (3 M ) when x / ∈ P . We can construct the polynomial q from ˜ q by applying standard errorreduction to reduce the approximation error of ˜ q to (cid:15) = 1 / (3 M ), which increases its the degree by amultiplicative factor of O (log(3 M )) = O ( √ T log n ). Thus, deg( q ) = O ( (cid:112) n/T · √ T log n ) = ˜ O ( √ n ).In summary, we have now constructed polynomials p and q with the following behavior: p ( x ) ∈  [9 / ,

1] if | x | = 0[0 , /

10] if 1 ≤ | x | ≤ T [0 , M ] if x / ∈ P q ( x ) ∈  [1 − / (3 M ) ,

1] if | x | = 0[0 ,

1] if 1 ≤ | x | ≤ T [0 , / (3 M )] if x / ∈ P (30)Caricatures of these polynomials are depicted in Figure 2 and Figure 3. It is now easy to seethat the product r ( x ) = p ( x ) · q ( x ) is a (1/3)-error approximation to NOR for all x ∈ { , } n . Thedegree of the constructed polynomial is deg( r ) ≤ deg( p ) + deg( q ) = ˜ O ( √ T + √ n ) = ˜ O ( √ n ).Thus our constructed polynomial, r , has degree ˜ O ( √ n ) and approximates NOR to error 1 / Before moving on to Surjectivity, we brieﬂy introduce some terminology that will allow us convey theintuition of more involved constructions by reasoning about polynomials as if they were algorithms.Consider three Boolean functions p : { , } n → { , } , p : { , } n → { , } and p : { , } n →{ , } , and suppose there are deterministic algorithms A , A , and A that compute these Booleanfunctions exactly. Now it makes sense to say “run algorithm A in input x ; if it accepts then output A ( x ), and if it rejects, then output A ( x ).” The Boolean function computed by this is A ( x ) if A ( x ) = 1 and A ( x ) if A ( x ) = 0. Observe that the following polynomial composition of p , p , p computes the same Boolean function: p ( x ) p ( x ) + (1 − p ( x )) p ( x ).We would like to use this terminology to discuss combining polynomials more generally. Forpolynomials p that approximate Boolean functions by outputting a value in [0 ,

1] on any input x , wecan imagine p ( x ) as representing the probability that a randomized (or quantum) algorithm accepts26n input x , and the same interpretation goes through. We will also use the same terminology forpolynomials that output values greater than 1, in which case we cannot interpret the output as aprobability, but expressions like p ( x ) p ( x ) + (1 − p ( x )) p ( x ) still make sense.For example, in the previous section we had two polynomials p and q and we constructed thepolynomial r ( x ) = p ( x ) · q ( x ) by multiplying the two polynomials together. Another way to think ofthis is that r is the polynomial obtained when we “run” the polynomial q and output p if it acceptsand output 0 if it rejects. This yields the polynomial q ( x ) p ( x ) + (1 − q ( x )) · q ( x ) p ( x ). We wouldlike to informally describe polynomial constructions using this language, which will be especiallyuseful when the constructions become more involved. So for example, we would informally describethe polynomial we constructed for the NOR function as follows (Algorithm 1):

Polynomial 1: An informal description of the polynomial approximation for NOR

1. Using $q$, check if $|x| > T$ (with exponentially small probability of error if indeed $|x| > T$). If so, halt and output 0.
2. Using $p$, compute NOR under the promise that $0 \le |x| \le T$ and output the result.

We now construct a polynomial that approximates Surjectivity using the strategy described above. Recall that $\mathrm{SURJ} : [R]^N \to \{0,1\}$ is defined by $\mathrm{SURJ}(x_1, \dots, x_N) = 1$ if and only if for all $r \in [R]$ there exists an $i \in [N]$ such that $x_i = r$.

In this section, we will use the notation

$$\#_r(x) = |\{i \in [N] : x_i = r\}| \tag{31}$$

to denote the number of times the range element $r$ appears in the input $x$. Note that the Surjectivity function evaluates to 1 if and only if $\#_r(x) \ge 1$ for all $r \in [R]$.

Finally, we will also need to consider a generalized version of Surjectivity, $\mathcal{R}$-Surjectivity for some set $\mathcal{R} \subseteq [R]$, which we denote $\mathrm{SURJ}_{\mathcal{R}}$. As with Surjectivity,

$$\mathrm{SURJ}_{\mathcal{R}} : [R]^N \to \{0,1\}, \tag{32}$$

and $\mathrm{SURJ}_{\mathcal{R}}(x_1, \dots, x_N) = 1$ if and only if for all $r \in \mathcal{R}$ there exists an $i \in [N]$ such that $x_i = r$. In other words, $\mathrm{SURJ}_{\mathcal{R}}(x) = 1$ if and only if for all $r \in \mathcal{R}$ we have $\#_r(x) \ge 1$. Note that $\mathrm{SURJ}_{[R]} = \mathrm{SURJ}$. Our construction will actually show more generally that $\widetilde{\deg}(\mathrm{SURJ}_{\mathcal{R}}) = \tilde{O}(N^{3/4})$ for all $\mathcal{R} \subseteq [R]$.

To implement the strategy described in Section 3.2 in the context of Surjectivity, we first choose a set $P$ of inputs that we consider to be "hard". Since Surjectivity can be phrased as asking whether all range items appear at least once in the input, it is natural to consider the hard inputs to be those that have few occurrences of each range item. Intuitively, this is because on such inputs, evidence that any given range item $r$ appears at least once in the input is hard to find.

To this end, we define $P$ as the set of inputs for which every range item appears at most $T$ times, for a parameter $T$ to be chosen later:

$$P = \{x : \forall r \in [R],\ \#_r(x) \le T\}. \tag{33}$$

More generally, when we consider $\mathcal{R}$-Surjectivity, we denote the set of hard inputs $P_{\mathcal{R}}$ and define it as

$$P_{\mathcal{R}} = \{x : \forall r \in \mathcal{R},\ \#_r(x) \le T\}. \tag{34}$$

In this section, we will construct a polynomial $p_{\mathcal{R}}$ that approximates $\mathrm{SURJ}_{\mathcal{R}}$ on $P_{\mathcal{R}}$ to bounded error, but may be exponentially large outside of $P_{\mathcal{R}}$. The value of $T$ in the definition of $P_{\mathcal{R}}$ will be chosen later; for now we only assume that our choice will satisfy $T = N^{\Theta(1)}$, which simplifies some expressions since we have $\log T = \Theta(\log N)$.

Overview of the Construction of $p_{\mathcal{R}}$. To construct a polynomial that works on the hard inputs, $P_{\mathcal{R}}$, we first express $\mathrm{SURJ}_{\mathcal{R}}$ as

$$\mathrm{SURJ}_{\mathcal{R}}(x) = \bigwedge_{r \in \mathcal{R}} \bigvee_{i \in [N]} \mathbb{1}[x_i = r], \tag{35}$$

where $\mathbb{1}[x_i = r]$ is the indicator function that takes value 1 when $x_i = r$ and 0 otherwise. Observe that for any fixed $r \in \mathcal{R}$, the function $\mathbb{1}[x_i = r]$ depends on only $\log R$ bits of $x$ and hence can be exactly computed by a polynomial of degree at most $\log R$.

Since our goal in this section is to construct a polynomial $p_{\mathcal{R}}$ that approximates $\mathrm{SURJ}_{\mathcal{R}}$ on all inputs in $P_{\mathcal{R}}$ and may be exponentially large outside of $P_{\mathcal{R}}$, we can assume that each inner OR gate in Equation (35) is fed an input of Hamming weight at most $T$.
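The definitions in Equations (31) through (35) can be checked by brute force on tiny instances. The following is a minimal sketch, not part of the construction itself: the function names (`count`, `surj`, `surj_and_or`, `in_hard_set`, `count_hard_inputs`) are illustrative assumptions, and range items are encoded as the integers $1, \dots, R$.

```python
from itertools import product


def count(x, r):
    """#_r(x): number of positions i with x_i == r (Equation (31))."""
    return sum(1 for xi in x if xi == r)


def surj(x, targets):
    """SURJ_R(x): 1 iff every range item in `targets` appears in x."""
    return int(all(count(x, r) >= 1 for r in targets))


def surj_and_or(x, targets):
    """Equation (35): AND over r of OR over i of the indicator [x_i = r]."""
    return int(all(any(int(xi == r) for xi in x) for r in targets))


def in_hard_set(x, targets, T):
    """Membership in P_R (Equation (34)): each target item appears at most T times."""
    return all(count(x, r) <= T for r in targets)


def count_hard_inputs(N, R, targets, T):
    """Enumerate [R]^N, confirm both SURJ formulas agree, and count inputs in P_R."""
    hard = 0
    for x in product(range(1, R + 1), repeat=N):
        assert surj(x, targets) == surj_and_or(x, targets)
        if in_hard_set(x, targets, T):
            hard += 1
    return hard


if __name__ == "__main__":
    # With N = 3, R = 2, T = 2, only (1,1,1) and (2,2,2) fall outside P_R,
    # so 6 of the 8 inputs are "hard".
    print(count_hard_inputs(3, 2, [1, 2], 2))
```

On an input in $P_{\mathcal{R}}$, the indicator vector fed to the OR gate for each $r \in \mathcal{R}$ has Hamming weight $\#_r(x) \le T$, which is exactly the promise the inner approximating polynomial relies on.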
Hence, our approach will be to first construct a low-degree polynomial $q$ that approximates $\mathrm{AND}_{|\mathcal{R}|} \circ \mathrm{OR}_N$ for inputs in $\left(H^N_{\le T}\right)^{|\mathcal{R}|}$. We then obtain the polynomial $p_{\mathcal{R}}$ that approximates $\mathrm{SURJ}_{\mathcal{R}}$ at all inputs in $P_{\mathcal{R}}$ by composing $q$ with the indicator functions $\{\mathbb{1}[x_i = r]\}_{i \in [N], r \in \mathcal{R}}$. Notice that the degree of $p_{\mathcal{R}}$ is at most $\deg(q) \cdot \log R$.

To construct $q$, our approach is as follows. First, we construct a polynomial $V_T$ of degree $O(T^{1/2} \log N)$ that approximates $\mathrm{OR}_N$ to error $O(1/N)$ at all inputs of Hamming weight at most $T$. (However, $V_T(x)$ may be as large as $\exp(T^{1/2} \log N)$ for inputs $x$ of larger Hamming weight.) Invoking Lemma 14, we let $w$ be a polynomial of degree $\Theta(N^{1/2})$ that approximates $\mathrm{AND}_{|\mathcal{R}|}$ to error $1/20$, and finally we define $q := w \circ V_T$. A simple and elegant analysis of Buhrman et al. [BNRdW07] (cf. Lemma 40) allows us to argue that $q$ indeed approximates $\mathrm{AND}_{|\mathcal{R}|} \circ \mathrm{OR}_N$ on $\left(H^N_{\le T}\right)^{|\mathcal{R}|}$.

Preliminary Lemmas.

Before formally defining and analyzing $p_{\mathcal{R}}$, we record a few important facts about Chebyshev polynomials that will be useful throughout the remainder of this section.

Lemma 38 (Properties of Chebyshev Polynomials). Let $d \in \mathbb{N}$.

(1) There exists a polynomial $T_d : \mathbb{R} \to \mathbb{R}$ (the Chebyshev polynomial of degree $d$) such that $T_d(x) \in [-1, 1]$ for all $x \in [-1, 1]$ and $T_d(1 + \mu) \ge \frac{1}{2} \exp(d\sqrt{\mu})$ for all $\mu \in (0, 1)$.

(2) For any polynomial $p : \mathbb{R} \to \mathbb{R}$ of degree $d$ with $|p(x)| \le 1$ for all $x \in [-1, 1]$, we have that for any $x$ with $|x| > 1$,

$$|p(x)| \le |T_d(x)| \le (2|x|)^d, \tag{36}$$

where $T_d$ is the Chebyshev polynomial of degree $d$.

Proof. To establish property (1), we use the fact that for $\mu > 0$, the value of the degree $d$ Chebyshev polynomial can be written [Che82] as

$$T_d(1 + \mu) = \cosh(d \operatorname{arcosh}(1 + \mu)) \ge \cosh(d\sqrt{\mu}) \ge \frac{1}{2} \cdot \exp(d\sqrt{\mu}),$$

where the first inequality holds for $\mu \le 1$. Property (2) appears as [Che82, 3.2 Problem 19].

We are now ready to establish the existence of a low-degree polynomial that approximates OR at all inputs of low Hamming weight.

Lemma 39 (Approximating OR on Inputs of Low Hamming Weight). Let $\varepsilon \in (0, 1)$ and $1 \le T \le N$. There is a polynomial $V_{T,\varepsilon} : \{0,1\}^N \to \mathbb{R}$ of degree $O(\sqrt{T} \log(1/\varepsilon))$ such that

$$V_{T,\varepsilon}(x) \in \begin{cases} [0, \varepsilon] & \text{if } |x| = 0 \\ [1 - \varepsilon, 1] & \text{if } 1 \le |x| \le T \\ [-a, a] \text{ for some } a \in \exp\left(O\left(\sqrt{T} \cdot \log N \cdot \log(1/\varepsilon)\right)\right) & \text{if } |x| > T. \end{cases} \tag{37}$$

Proof.

Choose $d = O(\sqrt{T} \log(1/\varepsilon))$ so that $M := T_d(1 + 1/T) + 1 \ge 2/\varepsilon$ (as guaranteed by Property 1 of Lemma 38). Define $V_{T,\varepsilon}$ by the following affine transformation of $T_d$:

$$V_{T,\varepsilon}(x) = \left(1 - \frac{1}{M}\right) - \frac{1}{M} \cdot T_d\left(1 + \frac{1}{T} - \frac{|x|}{T}\right).$$

Then for $|x| = 0$, we have

$$V_{T,\varepsilon}(x) = \left(1 - \frac{1}{M}\right) - \frac{1}{M} \cdot T_d\left(1 + \frac{1}{T}\right) = 0.$$

If $1 \le |x| \le T$, then $1 + 1/T - |x|/T \in [-1, 1]$, so $V_{T,\varepsilon}(x) \in [1 - \varepsilon, 1]$. Finally, if $T + 1 \le |x| \le N$, then

$$|V_{T,\varepsilon}(x)| \le M \cdot T_d(N/T) \le \exp\left(O\left(\sqrt{T} \cdot \log N \cdot \log(1/\varepsilon)\right)\right),$$

where the final inequality holds by Equation (36).

The following lemma shows that if $p$ and $q$ are approximating polynomials for Boolean functions $f$ and $g$, respectively, then the block composition $p \circ q$ approximates $f \circ g$, with a blowup in error that is proportional to the number of variables over which $f$ is defined. The proof is due to Buhrman et al. [BNRdW07], but our formulation is slightly different so we give the proof for completeness.

Lemma 40.

Let $f : \{0,1\}^n \to \{0,1\}$ and $g : X \to \{0,1\}$ be Boolean functions, where $X \subseteq \{0,1\}^m$ for some $m$. Let $p : \{0,1\}^n \to [0, 1]$ be an $\varepsilon$-approximating polynomial for $f$, and let $q : X \to [0, 1]$ be a $\delta$-approximating polynomial for $g$. Then the block composition $p \circ q : X^n \to \mathbb{R}$ is an $(\varepsilon + \delta n)$-approximating polynomial for $f \circ g : X^n \to \{0,1\}$.

Proof. Fix any input $x = (x_1, \dots, x_n) \in X^n$, and let $y = (g(x_1), \dots, g(x_n)) \in \{0,1\}^n$. Let $z \in [0, 1]^n$ be defined by $z = (q(x_1), \dots, q(x_n))$. Since $p$ is an $\varepsilon$-approximating polynomial for $f$, by the triangle inequality, it suffices to show that $|p(y) - p(z)| \le \delta n$.

Let $Z$ be a random variable on $\{0,1\}^n$ where each $Z_i = 1$ independently with probability $z_i$. Then due to the multilinearity of $p$, we have

$$p(z) = \mathbb{E}[p(Z)] = \Pr[Z = y] \cdot p(y) + \Pr[Z \ne y] \cdot \mathbb{E}[p(Z) \mid Z \ne y].$$

Since $q$ is a $\delta$-approximating polynomial for $g$, we have $|y_i - z_i| \le \delta$ for every $i \in [n]$. Hence,

$$\Pr[Z = y] \ge (1 - \delta)^n \ge 1 - \delta n.$$

Because $p$ is bounded in $[0, 1]$, we have

$$p(z) \ge (1 - \delta n) \cdot p(y) + 0 \ge p(y) - \delta n \quad \text{and} \quad p(z) \le 1 \cdot p(y) + \delta n \cdot 1 = p(y) + \delta n,$$

completing the proof.

Formal Definition of $p_{\mathcal{R}}$. Let $w$ be a $(1/20)$-approximating polynomial for $\mathrm{AND}_{|\mathcal{R}|}$ of degree $O(|\mathcal{R}|^{1/2})$ whose existence is guaranteed by Lemma 14. We may assume that $w(x) \in [0, 1]$ for all $x \in \{0,1\}^n$ (we will exploit this assumption in the proof of Lemma 43 below, as it allows us to apply Lemma 42 below to $w$). Let $q := w \circ V_{T, 1/(20n)}$. Finally, let us define $p_{\mathcal{R}}$ to be the composition of $q$ with the indicator functions $\{\mathbb{1}[x_i = r]\}_{i \in [N], r \in \mathcal{R}}$. For example, if $\mathcal{R} = \{1, \dots, |\mathcal{R}|\}$, then

$$p_{\mathcal{R}} = q\left(\mathbb{1}[x_1 = 1], \mathbb{1}[x_2 = 1], \dots, \mathbb{1}[x_N = 1], \mathbb{1}[x_1 = 2], \dots, \mathbb{1}[x_N = |\mathcal{R}|]\right).$$

Observe that

$$\deg(p_{\mathcal{R}}) \le \deg(w) \cdot \deg(V_{T, 1/(20n)}) \cdot \deg(\mathbb{1}[x_i = r]) \le O\left(|\mathcal{R}|^{1/2} \cdot T^{1/2} \log n \cdot \log R\right) \le \tilde{O}\left(\sqrt{NT}\right).$$

Showing $p_{\mathcal{R}}$ Approximates $\mathrm{SURJ}_{\mathcal{R}}$ on $P_{\mathcal{R}}$. Lemma 40 implies that:

$$|q(x) - \mathrm{AND}_{|\mathcal{R}|} \circ \mathrm{OR}(x)| \le 1/10 \quad \text{for all } x \in \left(H^N_{\le T}\right)^{|\mathcal{R}|}. \tag{38}$$

An immediate consequence is the following lemma.

Lemma 41. $|p_{\mathcal{R}}(x) - \mathrm{SURJ}_{\mathcal{R}}(x)| \le 1/10$ for all $x \in P_{\mathcal{R}}$.

Bounding $p_{\mathcal{R}}$ Outside of $P_{\mathcal{R}}$. For an input $x \in [R]^N$ outside of $P_{\mathcal{R}}$, let $b_{\mathcal{R}}(x)$ be the number of range items that appear more than $T$ times, i.e.,

$$b_{\mathcal{R}}(x) = |\{r \in \mathcal{R} : \#_r(x) > T\}|. \tag{39}$$

We claim that $|p_{\mathcal{R}}(x)| \le \exp\left(b_{\mathcal{R}}(x) \cdot \tilde{O}(\sqrt{T})\right)$. This bound relies on the following elementary lemma.

Lemma 42. Let $p : \mathbb{R}^n \to \mathbb{R}$ be a multilinear polynomial with $p(x) \in [0, 1]$ for all $x \in \{0,1\}^n$. Then for $x \in \mathbb{R}^n$, we have

$$|p(x)| \le \prod_{i=1}^{n} \left(|1 - x_i| + |x_i|\right).$$

Proof.

We prove the lemma by induction on the number of variables $n$. If $n = 0$, then $p$ is a constant in the interval $[0, 1]$ and the claim is true.

Now suppose the claim is true for $n - 1$ variables, and let $p : \mathbb{R}^n \to \mathbb{R}$ be a multilinear polynomial with $p(x) \in [0, 1]$ for $x \in \{0,1\}^n$. We begin by decomposing

$$p(x) = (1 - x_n) \cdot q_0(x_1, \dots, x_{n-1}) + x_n \cdot q_1(x_1, \dots, x_{n-1}),$$

where $q_0$ and $q_1$ are themselves multilinear polynomials. Since $p(x) \in [0, 1]$ for all $x \in \{0,1\}^n$, this is in particular true when $x_n = 0$. Hence $q_0(x') \in [0, 1]$ for all $x' \in \{0,1\}^{n-1}$. Similarly, setting $x_n = 1$ reveals that $q_1(x') \in [0, 1]$ for all $x' \in \{0,1\}^{n-1}$. Now for any $x \in \mathbb{R}^n$ with $x' = (x_1, \dots, x_{n-1})$ we have

$$|p(x)| = \left|(1 - x_n) \cdot q_0(x') + x_n \cdot q_1(x')\right| \le |1 - x_n| \cdot |q_0(x')| + |x_n| \cdot |q_1(x')| \le \left(|1 - x_n| + |x_n|\right) \cdot \prod_{i=1}^{n-1} \left(|1 - x_i| + |x_i|\right),$$

where the final inequality uses the inductive hypothesis. This proves the claim.

Lemma 43.

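There exists a function $a(x) = \exp\left(b_{\mathcal{R}}(x) \cdot \tilde{O}\left(\sqrt{T}\right)\right)$ such that for any $\mathcal{R} \subseteq [R]$, the polynomial $p_{\mathcal{R}} : [R]^N \to \mathbb{R}$ has degree $\tilde{O}(\sqrt{NT})$ and satisfies:

$$p_{\mathcal{R}}(x) \in \begin{cases} [0, 1/10] & \text{if } x \in P_{\mathcal{R}} \text{ and } \mathrm{SURJ}_{\mathcal{R}}(x) = 0 \\ [9/10, 1] & \text{if } x \in P_{\mathcal{R}} \text{ and } \mathrm{SURJ}_{\mathcal{R}}(x) = 1 \\ [-a(x), a(x)] & \text{if } x \notin P_{\mathcal{R}}. \end{cases}$$

Proof.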

The first two cases are an immediate consequence of Lemma 41.

To upper bound the value of $|p_{\mathcal{R}}(x)|$ for $x \notin P_{\mathcal{R}}$, we exploit the structure of $p_{\mathcal{R}}$ as a multilinear polynomial $w$ of degree $O(\sqrt{N})$ over the variables $z_1, \dots, z_{|\mathcal{R}|}$, where each $z_r$ is the output of the $r$-th polynomial from Lemma 39. That is,

$$z_r = V_{T, 1/(20n)}\left(\mathbb{1}[x_1 = r], \dots, \mathbb{1}[x_N = r]\right).$$

If $b_{\mathcal{R}}(x)$ range items appear greater than $T$ times, this means that up to $b_{\mathcal{R}}(x)$ of the variables $z_r$ might take values outside $[0, 1]$. However, the magnitude of each of these $b_{\mathcal{R}}(x)$ variables is still at most $\exp(\tilde{O}(\sqrt{T}))$. By Lemma 42,

$$|w(z)| \le \prod_{r \in \mathcal{R}} \left(|1 - z_r| + |z_r|\right) \le \exp\left(b_{\mathcal{R}}(x) \cdot \tilde{O}\left(\sqrt{T}\right)\right),$$

since each $z_r$ that is in $[0, 1]$ contributes a factor of exactly 1 to the product, whereas each of the remaining $b_{\mathcal{R}}(x)$ variables contributes a factor of at most $\exp(\tilde{O}(\sqrt{T}))$ to the product.

3.4.2 Controlling The Easy Inputs

Intuition. Unlike the example of the NOR function, where all the inputs outside the "hard" set $P$ (cf. Equation (28)) were in $\mathrm{NOR}^{-1}(1)$, for Surjectivity there are both 0- and 1-inputs outside of the hard set $P$ (cf. Equation (33)). So our remaining task is not simply a matter of constructing a polynomial that detects if the input is outside of $P$, as it was in the case of the NOR function.

However, we will show that, for Surjectivity, the inputs outside of $P$ are easy to handle in a different sense: they are easy because we can construct a reduction from inputs outside of $P$ to inputs in $P$. To gain some intuition, consider the task of designing a randomized algorithm for Surjectivity where we want to reduce the general case to the case where there is a set $\mathcal{R} \subseteq [R]$ such that all range items $r \in \mathcal{R}$ have $\#_r(x) \le T$. To do this, we can simply sample a large number of elements $x_i$ and remove from consideration all range items appearing at least once in the sample, because we know these range items all appear in the input at least once. After this step, we have a new set $\mathcal{R} \subseteq [R]$ consisting of all range items that have not been seen in the sampling stage. Every $r \in \mathcal{R}$ is likely to have $\#_r(x) \le T$ because elements that appeared too frequently would have (with high probability) been observed in the sampling stage. Thus it now suffices to solve $\mathrm{SURJ}_{\mathcal{R}}$ on the input under the assumption that all range items $r \in \mathcal{R}$ have $\#_r(x) \le T$.

Informally, the above discussion states that we want to construct the polynomial described in Algorithm 2.

Polynomial 2: Informal description of the polynomial approximation for SURJ

1. Sample $S = \tilde{\Theta}(N^{3/4})$ items and remove all range items seen from $[R]$. Let the remaining set be $\mathcal{R} \subseteq [R]$.
2. Solve $\mathrm{SURJ}_{\mathcal{R}}$ under the promise that all $r \in \mathcal{R}$ have $\#_r(x) \le T$, where $T = \tilde{\Theta}(\sqrt{N})$.

We now have to construct a polynomial that represents this algorithmic idea. We have already constructed an (unbounded approximating) polynomial for $\mathrm{SURJ}_{\mathcal{R}}$ under the promise $P_{\mathcal{R}}$ in the previous section, so the second step of this construction is done.

For Step 1 of Algorithm 2, we need to construct a polynomial to represent the idea of sampling input elements and evaluating a polynomial that depends on the sampled elements. To build up to this, consider a deterministic algorithm that queries a subset $\mathcal{S} \subseteq [N]$ of input elements, checks if the sampled string equals another fixed string $y$ and outputs 1 if true and 0 if false. If we denote the input $x \in [R]^N$ restricted to the subset $\mathcal{S} \subseteq [N]$ as $x_{\mathcal{S}}$, then this algorithm outputs 1 if and only if $x_{\mathcal{S}} = y$. Interpreting the input as an element of $\{0,1\}^{N \log R}$ rather than $[R]^N$, it is easy to see that a deterministic query algorithm querying $|\mathcal{S}| \log R$ bits of $x$ can solve this problem. Consequently, there is a polynomial of degree $|\mathcal{S}| \log R$ that outputs 1 if $x_{\mathcal{S}} = y$ and outputs 0 otherwise. For any fixed $\mathcal{S} \subseteq [N]$ and $y \in [R]^{|\mathcal{S}|}$, we denote this polynomial by $\mathbb{1}_y(x_{\mathcal{S}})$.

Now we can construct a polynomial which samples $S$ elements of the input and then outputs a bit depending on the elements seen. Let $\mathcal{S} \subseteq [N]$ be a subset of indices with $|\mathcal{S}| = S$. Let $f(\mathcal{S}, x_{\mathcal{S}})$ be an arbitrary Boolean function that tells us whether to output 0 or 1 on seeing the sample $(\mathcal{S}, x_{\mathcal{S}})$. Then the following polynomial samples a random set $\mathcal{S}$ of size $S$, reads the input $x$ restricted to the set $\mathcal{S}$, and outputs the bit $f(\mathcal{S}, x_{\mathcal{S}})$:

$$\frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \sum_{y \in [R]^S} \mathbb{1}_y(x_{\mathcal{S}}) f(\mathcal{S}, y). \tag{40}$$

Note that for any function $f$, this is a polynomial of degree $S \log R$ in the variables $x_1, \dots, x_N$, because $\mathbb{1}_y(x_{\mathcal{S}})$ is a multilinear polynomial over those variables, and $f(\mathcal{S}, y)$ is simply a hard-coded bit that does not depend on $x$.
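The sampling polynomial of Equation (40) can be evaluated by brute force on small instances to confirm that, for each fixed subset, only the term with $y = x_{\mathcal{S}}$ survives, so the whole expression is the average of $f(\mathcal{S}, x_{\mathcal{S}})$ over subsets. The sketch below is illustrative only: the helper names and the decision rule `example_f` are assumptions made for this example, indices are 0-based, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def indicator(y, x, subset):
    """1_y(x_S): 1 if x restricted to the index set `subset` equals y, else 0."""
    return int(all(x[i] == yi for i, yi in zip(subset, y)))


def sampling_polynomial(x, R, S, f):
    """Evaluate the double sum of Equation (40) term by term."""
    N = len(x)
    total = 0
    for subset in combinations(range(N), S):
        for y in product(range(1, R + 1), repeat=S):
            # f(subset, y) is a hard-coded bit; only the indicator depends on x.
            total += indicator(y, x, subset) * f(subset, y)
    return Fraction(total, comb(N, S))


def expected_output(x, S, f):
    """Average of f(S, x_S) over all index sets of size S."""
    N = len(x)
    total = sum(
        f(subset, tuple(x[i] for i in subset))
        for subset in combinations(range(N), S)
    )
    return Fraction(total, comb(N, S))


def example_f(subset, y):
    """Hypothetical decision rule: accept iff range item 1 shows up in the sample."""
    return int(1 in y)


if __name__ == "__main__":
    x = (1, 2, 2, 2)
    # Item 1 sits only at index 0, and half of the 2-element subsets contain index 0.
    print(sampling_polynomial(x, R=2, S=2, f=example_f))  # 1/2
    print(expected_output(x, S=2, f=example_f))           # 1/2
```

The inner sum has $R^S$ terms, so this direct evaluation is only feasible for toy parameters; the point of Equation (40) is its degree, not its evaluation cost.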
The value of this polynomial on an input $x$ equals

$$\frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \sum_{y \in [R]^S} \mathbb{1}_y(x_{\mathcal{S}}) f(\mathcal{S}, y) = \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} f(\mathcal{S}, x_{\mathcal{S}}). \tag{41}$$

We can now generalize this construction to allow for the possibility that the function $f$ is itself a polynomial. This is what we need to implement Algorithm 2, in which we sample a random set $\mathcal{S} \subseteq [N]$ of size $S$, query $x_{\mathcal{S}}$ (using $S \log R$ queries), and then run a polynomial that depends on the results. Specifically, if we sample the set $\mathcal{S} \subseteq [N]$, and learn that $x_{\mathcal{S}}$ equals the string $y \in [R]^{|\mathcal{S}|}$, then we want to run the polynomial $p_{\mathcal{R}(y)}$ from Lemma 43 for the set $\mathcal{R}(y)$ of all elements in $[R]$ that do not appear in $y$, i.e.,

$$\mathcal{R}(y) = [R] \setminus \{r : \exists i\ y_i = r\}. \tag{42}$$

Formal Description of The Approximation To Surjectivity.

Using the tools from Section 3.4.1 and the above discussion, we can now construct a polynomial that corresponds to the informal description in Algorithm 2:

$$r(x) = \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \sum_{y \in [R]^S} \mathbb{1}_y(x_{\mathcal{S}})\, p_{\mathcal{R}(y)}(x). \tag{43}$$

This is a polynomial of degree $S \log R + \max_y \{\deg(p_{\mathcal{R}(y)})\} = \tilde{O}(S + \sqrt{NT}) = \tilde{O}(N^{3/4})$, using $S = \tilde{\Theta}(N^{3/4})$ and $T = \tilde{\Theta}(\sqrt{N})$ (where the factors hidden by the $\tilde{\Theta}$ notation will be chosen later). The value of the polynomial on input $x$ is

$$r(x) = \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} p_{\mathcal{R}(x_{\mathcal{S}})}(x), \tag{44}$$

where recall that $\mathcal{R}(x_{\mathcal{S}})$ is as defined in Equation (42). The right hand side of Equation (44) is precisely the expected value of the polynomial $p_{\mathcal{R}(x_{\mathcal{S}})}(x)$ when $\mathcal{S}$ is a uniformly random set of size $S$.

We will now show that $r(x)$ is an approximating polynomial for Surjectivity, i.e., that for all $x \in [R]^N$, $|\mathrm{SURJ}(x) - r(x)| \le 1/3$. Recalling that $b_{\mathcal{R}(x_{\mathcal{S}})}(x)$ is the number of range items $r \in \mathcal{R}(x_{\mathcal{S}})$ that appear in $x$ greater than $T$ times (cf. Equation (39)), we compute the value of the polynomial $r(x)$ on an input $x$:

$$r(x) = \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} p_{\mathcal{R}(x_{\mathcal{S}})}(x) \tag{45}$$

$$= \frac{1}{\binom{N}{S}} \left[ \sum_{\mathcal{S} \in \binom{[N]}{S}} \sum_{b=0}^{N/T} \mathbb{1}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = b] \cdot p_{\mathcal{R}(x_{\mathcal{S}})}(x) \right] \tag{46}$$

$$= \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \mathbb{1}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = 0] \cdot p_{\mathcal{R}(x_{\mathcal{S}})}(x) \tag{47}$$

$$\quad + \sum_{b=1}^{N/T} \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \mathbb{1}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = b] \cdot p_{\mathcal{R}(x_{\mathcal{S}})}(x), \tag{48}$$

where we split up the sum into the $b = 0$ (47) and $b \ge 1$ (48) terms. For the $b \ge 1$ terms, where $b_{\mathcal{R}(x_{\mathcal{S}})}(x) \ge 1$, we know by Lemma 43 that the magnitude of the polynomial $|p_{\mathcal{R}(x_{\mathcal{S}})}(x)|$ is at most $\exp\left(b_{\mathcal{R}(x_{\mathcal{S}})}(x) \cdot \tilde{O}\left(\sqrt{T}\right)\right)$. Thus the magnitude of the term in Expression (48) is at most

$$\sum_{b=1}^{N/T} \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \mathbb{1}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = b] \cdot \exp(b \cdot \tilde{O}(\sqrt{T})) \tag{49}$$

$$= \sum_{b=1}^{N/T} \Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = b] \cdot \exp\left(b \cdot \tilde{O}\left(\sqrt{T}\right)\right). \tag{50}$$

We now need to compute the value of $\Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = b]$ for each $b$. Intuitively, this roughly corresponds to the probability that we sample $S = \tilde{\Theta}(N^{3/4})$ elements from the input and miss all $bT$ elements that correspond to the $T = \tilde{\Theta}(\sqrt{N})$ copies of the $b$ range items that appear at least $T$ times. If we were to sample $N/(bT)$ elements, the probability of seeing none of the $bT$ elements would be $\Theta(1)$. Since we are sampling $S = \tilde{\Theta}(N^{3/4}) \gg N/(bT)$ elements, the probability of not seeing one of the $bT$ elements is $\exp(-bST/N) = \exp(-b \cdot \tilde{\Omega}(N^{1/4}))$. We make this heuristic calculation formal in the following lemma:

Lemma 44.

Let $S, T \le N$. Let $\mathcal{S} \subset [N]$ be a random subset of size $S$. Then for every $x \in [R]^N$ and every $b \ge 1$,

$$\Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) \ge b] \le \exp(-b \cdot (ST/N - \log N)),$$

recalling that $b_{\mathcal{R}(x_{\mathcal{S}})}(x)$ is the number of range items that appear in $x$ more than $T$ times, but do not appear in $x_{\mathcal{S}}$.

Proof. To calculate the probability of interest, we begin by analyzing a simplified experiment. Suppose there are $b$ range items $r_1, \dots, r_b$ each appearing greater than $T$ times in $x$. We compute the probability that none of the items $r_1, \dots, r_b$ appear in the sample $x_{\mathcal{S}}$. If $T \ge N - S$, then this probability is zero. Otherwise, by direct calculation, we have for each $i = 1, \dots, b$:

$$\Pr_{\mathcal{S}}[r_i \notin x_{\mathcal{S}} \mid r_1 \notin x_{\mathcal{S}}, \dots, r_{i-1} \notin x_{\mathcal{S}}] \le \prod_{j=0}^{S-1} \left(1 - \frac{T}{N - j}\right) \le \left(1 - \frac{T}{N}\right)^S \le \exp\left(-\frac{ST}{N}\right).$$

Hence the probability that none of $r_1, \dots, r_b$ appear is

$$\Pr_{\mathcal{S}}[r_1 \notin x_{\mathcal{S}} \wedge \dots \wedge r_b \notin x_{\mathcal{S}}] = \prod_{i=1}^{b} \Pr_{\mathcal{S}}[r_i \notin x_{\mathcal{S}} \mid r_1 \notin x_{\mathcal{S}}, \dots, r_{i-1} \notin x_{\mathcal{S}}] \le \exp\left(-\frac{bST}{N}\right).$$

Now fix an arbitrary $x$, and let $r_1, \dots, r_k$ be the range items appearing greater than $T$ times in $x$. Observe that $k \le N/T$. We estimate the probability of interest by taking a union bound over all subsets of $r_1, \dots, r_k$ of size $b$:

$$\Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) \ge b] \le \sum_{Y \subseteq [k] : |Y| = b} \Pr_{\mathcal{S}}[r_j \notin x_{\mathcal{S}}\ \forall j \in Y] \le \binom{k}{b} \exp\left(-\frac{bST}{N}\right) \le \exp\left(-\frac{bST}{N} + b \log N\right).$$

From Lemma 44 we know that $\Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = b] \le \exp(-b \cdot \tilde{\Omega}(ST/N))$. Hence the $b \ge 1$ terms in (50) sum to at most

$$\sum_{b=1}^{N/T} \exp(-b \cdot \tilde{\Omega}(ST/N)) \cdot \exp(b \cdot \tilde{O}(\sqrt{T})) = o(1), \tag{51}$$

by choosing $T = \tilde{\Theta}(\sqrt{N})$ and $S = \tilde{\Theta}(N^{3/4})$ appropriately. This shows that the term in (48) does not significantly influence the value of the polynomial for any input $x$.

Thus we have for any input $x$,

$$r(x) = \frac{1}{\binom{N}{S}} \sum_{\mathcal{S} \in \binom{[N]}{S}} \mathbb{1}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = 0] \cdot p_{\mathcal{R}(x_{\mathcal{S}})}(x) + o(1). \tag{52}$$

By Lemma 43, we know that if $b_{\mathcal{R}(x_{\mathcal{S}})}(x) = 0$ (and hence $x \in P_{\mathcal{R}(x_{\mathcal{S}})}$), then $p_{\mathcal{R}(x_{\mathcal{S}})}(x)$ is a $(1/10)$-approximation to $\mathrm{SURJ}(x)$; here we use that every range item outside $\mathcal{R}(x_{\mathcal{S}})$ appears in $x$, so $\mathrm{SURJ}_{\mathcal{R}(x_{\mathcal{S}})}(x) = \mathrm{SURJ}(x)$. Applying Lemma 44 once again shows that $\Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) \ge 1] \le \exp(-\tilde{\Omega}(N^{1/4}))$, so $\Pr_{\mathcal{S}}[b_{\mathcal{R}(x_{\mathcal{S}})}(x) = 0] = 1 - o(1)$. Hence, for all $x \in [R]^N$, $|r(x) - \mathrm{SURJ}(x)| \le 1/10 + o(1) \le 1/3$.

Lower Bound for Surjectivity

The goal of this section is to show the following improved lower bound on the approximate degreeof the Surjectivity function.

Theorem 45. For some $N = O(R)$, the $(1/3)$-approximate degree of $\mathrm{SURJ}_{N,R}$ is $\tilde{\Omega}(R^{3/4})$.

To prove Theorem 45, we combine the following theorem with the reductions of Proposition 18 and Corollary 24.

Theorem 46. Let $N = c \cdot R$ for a sufficiently large constant $c > 0$. Let $F^{\le N} : H^{N \cdot R}_{\le N} \to \{-1, 1\}$ equal $\mathrm{AND}_R \circ \mathrm{OR}_N$ restricted to inputs in $H^{N \cdot R}_{\le N} = \{x \in \{-1, 1\}^{N \cdot R} : |x| \le N\}$. Then $\widetilde{\mathrm{ubdeg}}(F^{\le N}) \ge \tilde{\Omega}(R^{3/4})$.

The proof of Theorem 46 entails using dual witnesses for the high approximate degree of $\mathrm{AND}_R$ and $\mathrm{OR}_N$ to construct a dual witness for the higher approximate degree of $F^{\le N}$. As indicated in Section 1.3, the construction is essentially the same as in [BT17], except that we observe that a dual witness for OR constructed and used in prior works satisfies an exponentially stronger decay condition than has been previously realized.

The construction can be thought of as consisting of three steps:

Step 1. We begin by constructing a dual witness $\psi$ for the fact that the unbounded approximate degree of the $\mathrm{OR}_N$ function is $\Omega\left(\sqrt{T}\right)$ even when promised that the input has Hamming weight at most $T = \Theta(\sqrt{R})$. The dual witness $\psi$ is a small variant of the one in [BT17], but we give a more careful analysis of its tail decay. In particular, we make use of the fact that for all $t \ge 1$, the $\ell_1$ weight that $\psi$ places on the $t$'th layer of the Hamming cube is upper bounded by $\exp(-\Omega(t/\sqrt{T}))/t^2$.

Step 2. We combine $\psi$ with a dual witness $\Phi$ for $\mathrm{AND}_R$ to obtain a preliminary dual witness $\Phi \star \psi$ for $F = \mathrm{AND}_R \circ \mathrm{OR}_N$. The dual witness $\Phi \star \psi$ shows that $F$ has approximate degree $\Omega(\sqrt{R} \cdot \sqrt{T}) = \Omega(R^{3/4})$. However, $\Phi \star \psi$ places weight on inputs of Hamming weight larger than $N$, and hence does not give an unbounded approximate degree lower bound for the promise variant $F^{\le N}$.

Step 3. Using Proposition 30 we zero out the mass that $\Phi \star \psi$ places on inputs of Hamming weight larger than $N$, while maintaining its pure high degree and correlation with $F^{\le N}$. This yields the final desired unbounded approximate degree dual witness $\zeta$ for $F^{\le N}$, as per Proposition 13.

$\mathrm{OR}_N$

We begin by constructing a univariate function which captures the properties we need of our inner dual witness for $\mathrm{OR}_N$. The construction slightly modifies the dual polynomial for $\mathrm{OR}_N$ given by Špalek [Špa08]. We provide a careful analysis of its decay as a function of the input's Hamming weight.

Proposition 47. Let $T \in \mathbb{N}$ and $1/T \le \delta \le 1/2$. There exist constants $c_1, c_2 \in (0, 1)$ and a function $\omega : \{0, 1, \dots, T\} \to \mathbb{R}$ such that

$$\omega(0) - \sum_{t=1}^{T} \omega(t) \ge 1 - \delta \tag{53}$$

$$\sum_{t=0}^{T} |\omega(t)| = 1 \tag{54}$$

$$\text{For all univariate polynomials } q : \mathbb{R} \to \mathbb{R}, \quad \deg q < c_1 \sqrt{\delta T} \implies \sum_{t=0}^{T} \omega(t) \cdot q(t) = 0 \tag{55}$$

$$|\omega(t)| \le \frac{170 \exp(-c_2 t \sqrt{\delta}/\sqrt{T})}{\delta \cdot t^2} \quad \forall t = 1, \dots, T. \tag{56}$$

Proof of Proposition 47.

By renormalizing, it suﬃces to construction a function ω : [ T ] → R suchthat ω (0) − T (cid:88) t =1 ω ( t ) ≥ (1 − δ ) (cid:107) ω (cid:107) (57)For all univariate polynomials q : R → R , deg q < c √ δT = ⇒ T (cid:88) t =0 ω ( t ) · q ( t ) = 0 (58) | ω ( t ) | ≤

170 exp( − c t √ δ/ √ T ) (cid:107) ω (cid:107) δ · t ∀ t = 1 , . . . , T. (59)Let c = (cid:100) /δ (cid:101) below. We will freely use the fact that since δ ≤ /

2, we have c ≤ c / ( c − ≤ /δ .Let m = (cid:98) (cid:112) T / c (cid:99) and deﬁne the set S = { , c } ∪ { ci : 0 ≤ i ≤ m } . Note that | S | ≥ c √ δT for some absolute constant c >

0. Deﬁne the function ω ( t ) = ( − t +( T − m ) T ! (cid:18) Tt (cid:19) (cid:89) r ∈ [ T ] \ S ( t − r ) . Property (58) follows from the following well-known combinatorial identity.

Fact 48 (e.g., [GKP94] Equation (5.23) or [OS10]) . Let T ∈ N , and let p be a polynomial of degreeless than T . Then T (cid:88) t =0 ( − t (cid:18) Tt (cid:19) p ( t ) = 0 . Expanding out the binomial coeﬃcient in the deﬁnition of ω reveals that | ω ( t ) | =  (cid:81) r ∈ S \{ t } | t − r | for t ∈ S, t = 1 with c = 1 /

10, since | ω (1) | ≤ (cid:107) ω (cid:107) and170 δ exp( − c √ δ/ √ T ) ≥ · e −√ / > . For t = c , we have | ω ( c ) | ω (0) = c (cid:81) mi =1 (2 ci ) c ( c − (cid:81) mi =1 (2 ci − c )= 1 c − (cid:32) m (cid:89) i =1 i − / i (cid:33) − ≤ c − (cid:32) − m (cid:88) i =1 i (cid:33) − ≤ c − (cid:18) − π (cid:19) − ≤ c − (cid:81) mi =1 (1 − a i ) ≥ − (cid:80) mi =1 a i for a i ∈ (0 , | ω ( c ) |(cid:107) ω (cid:107) ≤ | ω ( c ) | ω (0) ≤ c − ≤ δ · c since 10 /δ ≥ c / ( c − ≤ δ · c · e − c √ δ/ √ T , since δ ≥ /T and hence e − c √ δ/ √ T ≥ e − . Thus (59) holds for t = c , recalling that c = 1 / t = 2 cj with j ≥

1, we get | ω ( t ) | ω (0) = c (cid:81) mi =1 (2 ci )(2 cj − cj − c ) (cid:81) i ∈ [ m ] \{ j } | ci − cj | = c ( m !) (4 c j − (2 c + 2 c ) j + c ) (cid:81) i ∈ [ m ] \{ j } ( i + j ) | i − j | = c c j − (2 c + 2 c ) j + c · ( m !) ( m + j )!( m − j )! . For j ≥

1, the ﬁrst factor is bounded by c c j − (2 c + 2 c ) j + c ≤ c (2 cj ) , c ≥

2. We control the second factor by( m !) ( m + j )!( m − j )! = mm + j · m − m + j − · . . . · m − j + 1 m + 1 ≤ (cid:18) mm + j (cid:19) j ≤ (cid:18) − j m (cid:19) j ≤ e − j / m , where the last inequality uses the fact that 1 − x ≤ e − x for all x . Since | ω ( t ) |(cid:107) ω (cid:107) ≤ | ω (2 cj ) | ω (0) ≤ c (2 cj ) · e − cj / (4 cm ) ≤ δ · t · e − t √ δ/ √ T , this establishes (59).What remains is to perform the correlation calculation to establish (57). For t = 1, we observe | ω (1) | ω (0) = c (cid:81) mi =1 (2 ci )( c − (cid:81) mi =1 (2 ci − ≥ m (cid:89) i =1 i i − / (2 c ) ≥ . Next, we observe that the total contribution of t > c to (cid:107) ω (cid:107) /ω (0) is at most (cid:88) t>c | ω ( t ) | ω (0) ≤ m (cid:88) j =1 c (2 cj ) < ∞ (cid:88) j =1 cj = π c . (61)Next, we calculate ω (0) − T (cid:88) t =1 ω ( t ) ≥ ω (0) − ω (1) − (cid:32) T (cid:88) t = c | ω ( t ) | (cid:33) ≥ ω (0) − ω (1) − (cid:18) ω ( c ) + ω (0) · π c (cid:19) by (61) ≥ − ω (1) + ω (0) (cid:18) − c − − π c (cid:19) by (60) ≥ − ω (1) + (1 − δ ) ω (0) by our choice of c ≥ /δ. (62)On the other hand, (cid:107) ω (cid:107) ≤ ω (0) − ω (1) + ω (2) + ω (0) · π c by (61) ≤ − ω (1) + ω (0) (cid:18) c − π c (cid:19) by (60) ≤ − ω (1) + (1 + δ ) ω (0) since c ≥ /δ. (63)Combining (62) and (63), and using the fact that − ω (1) ≥ ω (0) shows that ω (0) − (cid:80) kt =1 ω ( t ) (cid:107) ω (cid:107) ≥ − ω (1) + (1 − δ ) ω (0) − ω (1) + (1 + δ ) ω (0) ≥ − δ δ ≥ − δ. This establishes (57), completing the proof. 39he following construction of a dual polynomial for OR N , with N ≥ T , is an immediateconsequence of Minsky-Papert symmetrization (Lemma 15), combined with Proposition 47. Proposition 49.

Let

T, N ∈ N with T ≤ N , and let δ > /T . Deﬁne ω as in Proposition 47.Deﬁne the function ψ : {− , } N → {− , } by ψ ( x ) = ω ( | x | ) / (cid:0) N | x | (cid:1) for x ∈ H N ≤ T and ψ ( x ) = 0 otherwise. Then (cid:104) ψ, OR N (cid:105) ≥ − δ (64) (cid:107) ψ (cid:107) = 1 (65) For any polynomial p : {− , } N → R , deg p < c √ δT = ⇒ (cid:104) ψ, p (cid:105) = 0 (66) AND R ◦ OR N The following proposition, when combined with Proposition 28, shows that there is a functionΦ : {− , } R → {− , } such that the dual block composition Φ (cid:63) ψ is a good dual polynomial for AND R ◦ OR N . In the next section, we will modify Φ (cid:63) ψ to zero out the weight it places outside H N · R ≤ N . Proposition 50.

Let OR N : {− , } N → {− , } and AND R : {− , } R → {− , } . Let ψ : {− , } N → {− , } be a function such that (cid:107) ψ (cid:107) = 1 and (cid:104) ψ, OR N (cid:105) ≥ / . Then thereexists a function Φ : {− , } R → {− , } with pure high degree Ω( √ R ) and (cid:107) Φ (cid:107) = 1 such that (cid:104) Φ (cid:63) ψ, AND R ◦ OR N (cid:105) > / . The proof of Proposition 50 is implicit in the results of [BT13, She13a].

Proposition 51.

Let R be suﬃciently large. There exist N = O ( R ) , D = ˜Ω( N / ) , and ζ :( {− , } N ) R → R such that ζ ( x ) = 0 for all x (cid:54)∈ H N · R ≤ N , (67) (cid:88) x ∈ H N · R ≤ N ζ ( x ) · ( AND R ◦ OR N )( x ) > / , (68) (cid:107) ζ (cid:107) = 1 , and (69) For every polynomial p : ( {− , } N ) R → R of degree less than D, we have (cid:104) p, ζ (cid:105) = 0 . (70) Proof.

We start by ﬁxing choices of several key parameters: • d = Θ( √ R ) is the pure high degree of the dual witness Φ for AND R in Proposition 50, • T = (cid:98) ( R/d ) / (cid:99) = Θ( √ R ), • ˆ D = c √ T · d = Θ( R / ), where c is the constant from Proposition 47, • δ = 1 / • α = 170 /δ = 3400, 40 β = c · √ δ/ √ T = Θ(1 /R / ), where c is the constant from Proposition 47, • N = (cid:100) √ α (cid:101) R = 693 R .Let ψ : {− , } N → {− , } be the function constructed in Proposition 49 with δ := 1 /

20. LetΦ : {− , } R → {− , } be the function constructed in Proposition 50, and deﬁne ξ = Φ (cid:63) ψ . Thenby Proposition 28, ξ satisﬁes the following properties: (cid:104) ξ, AND R ◦ OR N (cid:105) > / , (71) (cid:107) ξ (cid:107) = 1 , (72)For every polynomial p of degree less than D, we have (cid:104) ξ, p (cid:105) = 0 . (73)Recall that ψ was obtained by symmetrizing the function ω constructed in Proposition 47. Proposi-tion 30 guarantees that for some ∆ ≥ β √ αR/ R = ˜Ω( R / ), the function ξ can be modiﬁed toproduce a function ζ : ( {− , } N ) R → R such that ζ ( x ) = 0 for all x (cid:54)∈ H N · R ≤ N , (cid:104) ζ, AND R ◦ OR N (cid:105) ≥ (cid:104) ξ, AND R ◦ OR N (cid:105) − (cid:107) ζ − ξ (cid:107) ≥ / − / > / , (cid:107) ζ (cid:107) = 1 , For every polynomial p of degree less than min { ˆ D, ∆ } , we have (cid:104) ζ, p (cid:105) = 0 . Observing that D = min { ˆ D, ∆ } = ˜Ω( R / )shows that the function ζ satisﬁes the conditions necessary to prove Proposition 51.Theorem 46 follows by combining Proposition 51 with the dual characterization of unboundedapproximate degree given in Proposition 13. By Corollary 24, we conclude that (cid:103) deg( dSURJ N,R ) =˜Ω( R / ). Theorem 45 follows by Proposition 18. k -Distinctness Our goal is to prove the following lower bound on the approximate degree of the k -distinctnessfunction. Theorem 52.

For $k \ge 2$ and some $N = O_k(R)$, the $(1/3)$-approximate degree of $\mathrm{DIST}^k_{N,R}$ is $\tilde{\Omega}_k(R^{3/4 - 1/(2k)})$. The same lower bound holds for the quantum query complexity of $\mathrm{DIST}^k_{N,R}$.

Here, the notation $O_k$ hides factors depending only on $k$, and $\tilde{\Omega}_k$ hides factors logarithmic in $R$ and factors depending only on $k$. Theorem 52 is a consequence of applying the reductions of Proposition 21 and Corollary 26 to the following, which is the ultimate goal of this section.

Theorem 53.

Let $G^{\le N} : H^{N \cdot R}_{\le N} \to \{-1, 1\}$ equal $\mathrm{OR}_R \circ \mathrm{THR}^k_N$ restricted to inputs in $H^{N \cdot R}_{\le N}$. Then for some $N = O_k(R)$, we have $\widetilde{\mathrm{ubdeg}}(G^{\le N}) \ge \tilde{\Omega}_k(R^{3/4 - 1/(2k)})$.

The proof of Theorem 53 will follow the same basic outline as the proof of Theorem 46. We will construct a dual polynomial for $G^{\le N}$ via the following three steps:

Step 1. We first construct a dual witness $\psi$ showing that the unbounded approximate degree of the $\mathrm{THR}^k_N$ function is $\Omega_k\big(\sqrt{T \cdot N^{-1/k}}\big)$, even when promised that the input has Hamming weight at most $T = \Theta_k(\sqrt{R})$. Moreover, this dual witness satisfies additional properties that are exploited in Step 2 below.

Step 2. We combine $\psi$ with a dual witness $\Phi$ for $\mathrm{OR}_R$ to obtain a preliminary dual witness $\Phi \star \psi$ for $\mathrm{OR}_R \circ \mathrm{THR}^k_N$. The dual witness $\Phi \star \psi$ shows that $\mathrm{OR}_R \circ \mathrm{THR}^k_N$ has approximate degree $\Omega_k\big(\sqrt{R} \cdot \sqrt{T \cdot N^{-1/k}}\big) = \Omega_k(R^{3/4 - 1/(2k)})$. However, $\Phi \star \psi$ places weight on inputs of Hamming weight larger than $N$, and hence does not give an unbounded approximate degree lower bound for the promise variant $G^{\le N}$.

Step 3. Using Proposition 30, we zero out the mass that $\Phi \star \psi$ places on inputs of Hamming weight larger than $N$, while maintaining its pure high degree and correlation with $G^{\le N}$. This yields the final desired dual witness $\zeta$ for $G^{\le N}$.

Additional Notation.

For functions $f : X \to \{-1, 1\}$ and $\psi : X \to \mathbb{R}$, define the error regions
$$E_+(\psi, f) = \{x \in X : \psi(x) > 0,\ f(x) = -1\}, \qquad E_-(\psi, f) = \{x \in X : \psi(x) < 0,\ f(x) = +1\}.$$
These are the regions where $\psi$ disagrees in sign with $f$. We refer to $E_+$ as the set of "false positive" errors made by $\psi$, and $E_-$ as the set of "false negative" errors.

$\mathrm{THR}^k_N$

We begin by constructing a univariate version of our dual witness for $\mathrm{THR}^k_N$. Properties (74) and (75) below amount to more refined conditions on the correlation between $\omega$ and the (symmetrized) $\mathrm{THR}^k_N$ function. These properties will be needed in order to execute Step 2 of the construction in Section 5.2.

Proposition 54.

Let k, T, N ∈ N with k ≤ T . There exist constants c , c ∈ (0 , and a function ω : { , , . . . , T } → R such that (cid:88) ω ( t ) > ,t ≥ k | ω ( t ) | ≤ N (74) (cid:88) ω ( t ) < ,t , t ≥ k } , and E − := { t : ω ( t ) < , t < k } . By normalizing, it suﬃces toconstruct a function ω : [ T ] → R such that (cid:88) t ∈ E + | ω ( t ) | ≤ N · (cid:107) ω (cid:107) (79) (cid:88) t ∈ E − | ω ( t ) | ≤ (cid:18) − k (cid:19) · (cid:107) ω (cid:107) (80)For all univariate polynomials q : R → R ,deg p < c (cid:112) k − · T · N − /k = ⇒ k (cid:88) t =0 ω ( t ) · q ( t ) = 0 (81) | ω ( t ) | ≤ (2 k ) k exp( − c t/ √ k · T · N /k ) (cid:107) ω (cid:107) t ∀ t = 1 , , . . . , T. (82)Let c = 2 k (cid:100) N /k (cid:101) , and let m = (cid:98) (cid:112) T /c (cid:99) . Deﬁne the set S = { , , . . . , k } ∪ { ci : 0 ≤ i ≤ m } . Note that | S | = Ω( k − / T / N − / (2 k ) ). Deﬁne the polynomial ω ( t ) = ( − t +( T − m )+1 T ! (cid:18) Tt (cid:19) (cid:89) r ∈ [ T ] \ S ( t − r ) . The signs are chosen so that ω ( k ) <

0. It is immediate from Fact 48 that ω satisﬁes (81) for c = 1 / √

2. We now show that (82) holds. For t = 1 , . . . , k , we have(2 k ) k exp( − c t/ √ k · T · N /k ) t ≥ (2 k ) k exp( − c √ k ) k ≥ c ≤ / k ≥

2. Since | ω ( t ) | ≤ (cid:107) ω (cid:107) , the bound holds for t = 1 , . . . , k .For t = cj with j ≥

1, we expand out the binomial coeﬃcient in the deﬁnition of ω to obtain | ω ( t ) | =  (cid:81) r ∈ S \{ t } | t − r | for t ∈ S, t ∈ { , , . . . , k } , we observe that | ω ( t ) || ω ( k ) | = k ! · (cid:81) mi =1 ( ci − k ) t ! · ( k − t )! · (cid:81) mi =1 ( ci − t ) ≤ (cid:18) kt (cid:19) . (83)43eanwhile, for t = cj with j ≥

1, we get | ω ( t ) || ω ( k ) | = k ! · (cid:81) mi =1 ( ci − k ) (cid:81) ki =1 ( cj − i ) · (cid:81) i ∈ [ m ] \{ j } | ci − cj |≤ k ! · (cid:81) mi =1 ci ( cj − k ) k · (cid:81) i ∈ [ m ] \{ j } c ( i + j ) | i − j | = k ! j ( cj − k ) k · ( m !) ( m + j )!( m − j )! . The ﬁrst factor is bounded above by k !( c − k ) k j k +1 . As long as c ≥ k and k ≥

2, this expression is at most k k ( c/ k j = (2 k ) k c k · j . We control the second factor by( m !) ( m + j )!( m − j )! = mm + j · m − m + j − · . . . · m − j + 1 m + 1 ≤ (cid:18) mm + j (cid:19) j ≤ (cid:18) − j m (cid:19) j ≤ e − j / m , where the last inequality uses the fact that 1 − x ≤ e − x for all x . Hence, | ω ( cj ) || ω ( k ) | ≤ (2 k ) k c k · j · e − j / m . (84)This immediately yields | ω ( cj ) |(cid:107) ω (cid:107) ≤ | ω ( cj ) || ω ( k ) | ≤ (2 k ) k ( cj ) · e − cj / (2 cm ) , which establishes (82) for all t = cj > k .Moreover, by (84) (cid:88) t>k | ω ( t ) | ≤ | ω ( k ) | · m (cid:88) j =1 (2 k ) k c k · j · e − j / m ≤ (2 k ) k c k · | ω ( k ) | · m (cid:88) j =1 j ≤ | ω ( k ) | N . (85)Hence, since ω ( k ) < (cid:88) t ∈ E + | ω ( t ) | ≤ (cid:88) t>k | ω ( t ) | ≤ | ω ( k ) | N ≤ (cid:107) ω (cid:107) N , (cid:107) ω (cid:107) | ω ( k ) | ≤ k (cid:88) t =0 (cid:18) kt (cid:19) + 148 N < k + 1 < · k . (86)We calculate (cid:107) ω (cid:107) − (cid:88) t ∈ E − | ω ( t ) | = (cid:88) t : ω ( t ) < ( − ω ( t )) − (cid:88) t ∈ E − ( − ω ( t )) since (cid:104) ω, (cid:105) = 0= (cid:88) t : ω ( t ) < ,t ≥ k ( − ω ( t )) ≥ − ω ( k ) . Rearranging and applying the bound (86), (cid:88) t ∈ E − | ω ( t ) | ≤ (cid:18)

12 + ω ( k ) (cid:107) ω (cid:107) (cid:19) · (cid:107) ω (cid:107) ≤ (cid:18) − · − k (cid:19) · (cid:107) ω (cid:107) . Applying Minsky-Papert symmetrization (Lemma 15) to ensure that the resulting function hasthe appropriate pure high degree, Proposition 54 yields a dual polynomial for

THR kN . Proposition 55.

Let k, T, N ∈ N with k ≤ T ≤ N . Deﬁne ψ : {− , } N → R by ψ ( x ) = ω ( | x | ) / (cid:0) N | x | (cid:1) for x ∈ H N ≤ T and ψ ( x ) = 0 otherwise, where ω is as constructed in Proposition 54. Then (cid:88) x ∈ E + ( ψ, THR kN ) | ψ ( x ) | ≤ N (87) (cid:88) x ∈ E − ( ψ, THR kN ) | ψ ( x ) | ≤ − k (88) (cid:107) ψ (cid:107) = 1 (89) For any polynomial p : {− , } N → R , deg p < c (cid:112) k − · T · N − /k = ⇒ (cid:104) ψ, p (cid:105) = 0 (90) (cid:88) | x | = t | ψ ( x ) | ≤ (2 k ) k exp( − c t/ (cid:112) k · T · N /k ) /t ∀ t = 1 , , . . . , N. (91) OR R ◦ THR kN The dual witness Φ for OR R that we construct will itself be obtained as a dual block composition ρ (cid:63) ϕ . Each constituent dual polynomial will play a distinct role in showing that Φ (cid:63) ψ is a gooddual polynomial for OR R ◦ THR kN . The ﬁrst function ϕ is an “error ampliﬁer” in the sense that ϕ (cid:63) ψ is much better correlated with OR R ◦ THR kN than ψ is with THR kN . The second function ρ , onthe other hand, is a “degree ampliﬁer” in that it serves to increase the pure high degree of ϕ (cid:63) ψ .While the ampliﬁcation results we need are new, they are relatively straightforward extensionsof similar results in [BT13, She13a, BT15]. Proofs appear below for completeness.45 mplifying Error. The following proposition shows that if ψ is a dual witness for the highapproximate degree of a Boolean function f , then there is a dual witness of the form ϕ (cid:63) ψ for OR M ◦ f such that (a) ϕ (cid:63) ψ may make slightly more false positive errors than ψ (by at most afactor of M ), and (b) ϕ (cid:63) ψ makes signiﬁcantly fewer false-negative errors than ψ (exponentiallysmaller in M ). Proposition 56.

Let $f : \{-1,1\}^m \to \{-1,1\}$ and $\psi : \{-1,1\}^m \to \mathbb{R}$ be functions such that
$$\sum_{x \in E_+(\psi, f)} |\psi(x)| \le \delta_+, \tag{92}$$
$$\sum_{x \in E_-(\psi, f)} |\psi(x)| \le \delta_-, \tag{93}$$
$$\|\psi\|_1 = 1. \tag{94}$$
For every $M \in \mathbb{N}$, there exists a function $\varphi : \{-1,1\}^M \to \mathbb{R}$ with $\|\varphi\|_1 = 1$ and pure high degree 1 such that
$$\sum_{x \in E_+(\varphi \star \psi,\, \mathrm{OR}_M \circ f)} |(\varphi \star \psi)(x)| \le M \cdot \delta_+, \tag{95}$$
$$\sum_{x \in E_-(\varphi \star \psi,\, \mathrm{OR}_M \circ f)} |(\varphi \star \psi)(x)| \le \tfrac{1}{2} \cdot (2\delta_-)^M. \tag{96}$$

Proof. Let $\varphi : \{-1,1\}^M \to \mathbb{R}$ be defined such that $\varphi(\mathbf{1}) = 1/2$, $\varphi(-\mathbf{1}) = -1/2$, and $\varphi(x) = 0$ for all other $x$. Notice that
$$\sum_{(x_1, \dots, x_M) \in \{-1,1\}^M} \varphi(x_1, \dots, x_M) = 0, \tag{97}$$
so $\varphi$ has pure high degree 1, and that $\|\varphi\|_1 = 1$.

We now prove that (95) holds. Let $\lambda$ be the distribution on $\{-1,1\}^m$ given by $\lambda(x) = |\psi(x)|$, and let $\lambda^{\otimes M}$ be the product distribution on $(\{-1,1\}^m)^M$ given by $\lambda^{\otimes M}(x_1, \dots, x_M) = \prod_{i=1}^M |\psi(x_i)|$. Since $\psi$ is orthogonal to the constant polynomial, it has expected value 0, and hence the string $(\dots, \mathrm{sgn}(\psi(x_i)), \dots)$ is distributed uniformly in $\{-1,1\}^M$ when one samples $(x_1, \dots, x_M)$ according to $\lambda^{\otimes M}$. This allows us to write
$$\begin{aligned}
\sum_{(x_1, \dots, x_M) \in E_+(\varphi \star \psi,\, \mathrm{OR}_M \circ f)} |(\varphi \star \psi)(x_1, \dots, x_M)|
&= 2^M\, \mathbb{E}_{\lambda^{\otimes M}}\big[\varphi(\dots, \mathrm{sgn}(\psi(x_i)), \dots) \cdot \mathbb{I}\big(\varphi(\dots, \mathrm{sgn}(\psi(x_i)), \dots) > 0 \wedge \mathrm{OR}_M(\dots, f(x_i), \dots) = -1\big)\big] \\
&= \sum_{z : \varphi(z) > 0} \varphi(z) \cdot \Pr_{\lambda^{\otimes M}}\big[\mathrm{OR}_M(\dots, f(x_i), \dots) = -1 \mid (\dots, \mathrm{sgn}(\psi(x_i)), \dots) = z\big].
\end{aligned} \tag{98}$$

Observe that for any bit $b$,
$$\Pr_{x \sim \lambda}\big[f(x) \ne \mathrm{sgn}(\psi(x)) \mid \mathrm{sgn}(\psi(x)) = b\big] = 2 \sum_{x \in A_b} |\psi(x)|,$$
where for brevity, we have written $A_{+1} = E_+(\psi, f)$ and $A_{-1} = E_-(\psi, f)$. Therefore, as noted in [She13c], for any given $z \in \{-1,1\}^M$, the following two random variables are identically distributed:

• The string $(\dots, f(x_i), \dots)$, when one chooses $(\dots, x_i, \dots)$ from $\lambda^{\otimes M}$ conditioned on $(\dots, \mathrm{sgn}(\psi(x_i)), \dots) = z$.

• The string $(\dots, y_i z_i, \dots)$, where $y \in \{-1,1\}^M$ is a random string whose $i$th bit independently takes on value $-1$ with probability $2 \sum_{x \in A_{z_i}} |\psi(x)|$.

Thus, Expression (98) equals
$$\sum_{z : \varphi(z) > 0} \varphi(z) \cdot \Pr_y\big[\mathrm{OR}_M(\dots, y_i z_i, \dots) = -1\big], \tag{99}$$
where $y \in \{-1,1\}^M$ is the random string described above. The only term in this sum corresponds to $z = \mathbf{1}$, which we now argue is at most $M \delta_+$. By (92), each $y_i = -1$ with probability $2 \sum_{x \in A_{+1}} |\psi(x)| = 2 \sum_{x \in E_+(\psi, f)} |\psi(x)| \le 2\delta_+$. Hence, for $z = \mathbf{1}$, we have
$$\Pr_y\big[\mathrm{OR}_M(\dots, y_i, \dots) = -1\big] \le 2 M \delta_+$$
by a union bound. Since $\varphi(\mathbf{1}) = 1/2$, Expression (99) is at most $M \delta_+$, proving (95).

It now remains to prove the bound (96). By an identical argument as above, we have
$$\sum_{(x_1, \dots, x_M) \in E_-(\varphi \star \psi,\, \mathrm{OR}_M \circ f)} |(\varphi \star \psi)(x_1, \dots, x_M)| = \sum_{z : \varphi(z) < 0} |\varphi(z)| \cdot \Pr_y\big[\mathrm{OR}_M(\dots, y_i z_i, \dots) = 1\big]. \tag{100}$$
The only term in the sum corresponds to $z = -\mathbf{1}$, which we argue is at most $\tfrac{1}{2} \cdot (2\delta_-)^M$. Here, each $y_i = -1$ with probability $2 \sum_{x \in A_{-1}} |\psi(x)| = 2 \sum_{x \in E_-(\psi, f)} |\psi(x)| \le 2\delta_-$, and $\mathrm{OR}_M(\dots, -y_i, \dots) = 1$ only if $y_i = -1$ for all $i$. Hence, we conclude that
$$\Pr_y\big[\mathrm{OR}_M(\dots, -y_i, \dots) = 1\big] \le (2\delta_-)^M.$$
It follows that Expression (100) is at most $\tfrac{1}{2} \cdot (2\delta_-)^M$, establishing (96). This completes the proof.

Amplifying Degree.

The following proposition states that if $\psi$ is a dual polynomial for a Boolean function $f$, then there is a dual polynomial $\rho \star \psi$ for $\mathrm{OR}_M \circ f$ with significantly larger pure high degree that does not make too many more false positive and false negative errors than does $\psi$ itself.

Proposition 57. Let $f : \{-1,1\}^m \to \{-1,1\}$ and $\psi : \{-1,1\}^m \to \mathbb{R}$ be functions such that
$$\sum_{x \in E_+(\psi, f)} |\psi(x)| \le \delta_+, \tag{101}$$
$$\sum_{x \in E_-(\psi, f)} |\psi(x)| \le \delta_-, \tag{102}$$
$$\|\psi\|_1 = 1. \tag{103}$$
For every $M \in \mathbb{N}$ there exists a function $\rho : \{-1,1\}^M \to \mathbb{R}$ with $\|\rho\|_1 = 1$ and pure high degree $\Omega(\sqrt{M})$ such that
$$\langle \rho \star \psi, \mathrm{OR}_M \circ f \rangle \ge 9/10 - 4 M \delta_+ - 4 \delta_-. \tag{104}$$

Proof. Lemma 14 shows that the function $\mathrm{OR}_M$ has $(9/10)$-approximate degree $\Omega(\sqrt{M})$. Hence, Proposition 11 guarantees the existence of a function $\rho : \{-1,1\}^M \to \mathbb{R}$ with $\|\rho\|_1 = 1$ and pure high degree $\Omega(\sqrt{M})$ such that
$$\langle \rho, \mathrm{OR}_M \rangle \ge 9/10. \tag{105}$$
What remains is to establish the correlation bound (104). Letting $\lambda$ denote the distribution $\lambda(x) = |\psi(x)|$ as in the proof of Proposition 56, we may write
$$\begin{aligned}
\sum_{(x_1, \dots, x_M) \in (\{-1,1\}^m)^M} (\rho \star \psi)(x_1, \dots, x_M) \cdot \mathrm{OR}_M(\dots, f(x_i), \dots)
&= 2^M\, \mathbb{E}_{\lambda^{\otimes M}}\big[\rho(\dots, \mathrm{sgn}(\psi(x_i)), \dots) \cdot \mathrm{OR}_M(\dots, f(x_i), \dots)\big] \\
&= \sum_{z \in \{-1,1\}^M} \rho(z) \cdot \mathbb{E}_{\lambda^{\otimes M}}\big[\mathrm{OR}_M(\dots, f(x_i), \dots) \mid (\dots, \mathrm{sgn}(\psi(x_i)), \dots) = z\big] \\
&= \sum_{z \in \{-1,1\}^M} \rho(z) \cdot \mathbb{E}_y\big[\mathrm{OR}_M(\dots, y_i z_i, \dots)\big],
\end{aligned} \tag{106}$$
where $y \in \{-1,1\}^M$ is a random string whose $i$th bit independently takes the value $-1$ with probability $2 \sum_{x \in A_{z_i}} |\psi(x)|$. (Here, we are using the abbreviated notation $A_{+1} = E_+(\psi, f)$ and $A_{-1} = E_-(\psi, f)$.)

We first consider the contribution of the term corresponding to $z = \mathbf{1}$ to the sum. Here, by a union bound,
$$\mathbb{E}_y\big[\mathrm{OR}_M(\dots, y_i z_i, \dots)\big] = 1 - 2 \Pr_y\big[\mathrm{OR}_M(\dots, y_i, \dots) = -1\big] \ge 1 - 2M \cdot \Big(2 \sum_{x \in A_{+1}} |\psi(x)|\Big) \ge 1 - 4 M \delta_+.$$
Hence, the term $z = \mathbf{1}$ contributes $\rho(\mathbf{1}) \cdot (1 - 4 M \delta_+)$ to the sum.

Now we consider the contribution of any term corresponding to $z \ne \mathbf{1}$. Given such a $z$, let $i^*$ be an index such that $z_{i^*} = -1$. Then we have
$$-\mathbb{E}_y\big[\mathrm{OR}_M(\dots, y_i z_i, \dots)\big] = 1 - 2 \Pr_y\big[\mathrm{OR}_M(\dots, y_i z_i, \dots) = 1\big] \ge 1 - 2 \cdot \Pr_{y_{i^*}}\big[y_{i^*} = -1\big] = 1 - 2 \cdot \Big(2 \sum_{x \in A_{-1}} |\psi(x)|\Big) \ge 1 - 4\delta_-.$$
We can now lower bound (106) by
$$\rho(\mathbf{1}) \cdot (1 - 4 M \delta_+) - \sum_{z \ne \mathbf{1}} \rho(z)(1 - 4\delta_-) \ge \sum_{z \in \{-1,1\}^M} \rho(z)\, \mathrm{OR}_M(z) - 4 M \delta_+ |\rho(\mathbf{1})| - 4 \delta_- \sum_{z \ne \mathbf{1}} |\rho(z)| \ge 9/10 - 4 M \delta_+ - 4 \delta_-.$$

5.2.2 Constructing a Dual Witness for $\mathrm{OR}_R \circ \mathrm{THR}^k_N$

We now combine our amplification lemmas to construct a dual witness for $\mathrm{OR}_R \circ \mathrm{THR}^k_N$.

Proposition 58.

Let k, T, N, R ∈ N with k ≤ T ≤ R ≤ N and R divisible by k . Let ψ : {− , } N → R be a function with (cid:107) ψ (cid:107) = 1 and (cid:88) x ∈ E + ( ψ, THR kN ) | ψ ( x ) | ≤ N (cid:88) x ∈ E − ( ψ, THR kN ) | ψ ( x ) | ≤ − k . Then there exists a function

Φ : {− , } R → R with (cid:107) Φ (cid:107) = 1 and pure high degree Ω(2 − k √ R ) suchthat (cid:104) Φ (cid:63) ψ, OR R ◦ THR kN (cid:105) ≥ / . Proof.

Using the construction of Proposition 56 with m = N , M = 4 k , and f = THR kN , we ﬁrstobtain a function ϕ : {− , } k → R with (cid:107) ϕ (cid:107) = 1 and pure high degree 1 such that (cid:88) x ∈ E + ( ϕ(cid:63)ψ, OR k ◦ THR kN ) | ( ϕ (cid:63) ψ )( x ) | ≤ k N (cid:88) x ∈ E − ( ϕ(cid:63)ψ, OR k ◦ THR kN ) | ( ϕ (cid:63) ψ )( x ) | ≤ · (cid:16) − · − k (cid:17) k ≤ e − . Now by the construction of Proposition 57 with m = 4 k · N , M = R/ k and f = OR k ◦ THR kN ,there exists a function ρ : {− , } R/ k → R with (cid:107) ρ (cid:107) = 1 and pure high degree Ω(2 − k √ R ) suchthat (cid:104) ρ (cid:63) ( ϕ (cid:63) ψ ) , OR R/ k ◦ ( OR k ◦ THR kN ) (cid:105) ≥ − R N − e − ≥ . Let Φ : {− , } R → R be the dual block composition ρ (cid:63) ϕ . Since the dual block compositionpreserves (cid:96) -norms (Proposition 28, Condition (9)) and multiplies pure high degrees (Proposition 28,Condition (10)), the function Φ itself has (cid:96) -norm 1 and pure high degree Ω(2 − k √ R ). The claimnow follows from the associativity of dual block composition (Proposition 28, Condition (11)) andthe fact that OR R = OR R/ k ◦ OR k . We are now ready to apply Proposition 30 to zero out the mass that the construction of Proposition 58places on inputs outside of H N · R ≤ N . 49 roposition 59. Let R be suﬃciently large. There exist N = O ((2 k ) k/ · R ) , D = ˜Ω(2 − k/ k ( k − / · R / − / (2 k ) ) , and ζ : ( {− , } N ) R → R such that ζ ( x ) = 0 for all x (cid:54)∈ H N · R ≤ N , (107) (cid:88) x ∈ H N · R ≤ N ζ ( x ) · ( OR R ◦ THR kN )( x ) > / , (108) (cid:107) ζ (cid:107) = 1 , and (109) For every polynomial p : ( {− , } N ) R → R of degree less than D, we have (cid:104) p, ζ (cid:105) = 0 . (110) Proof.

We start by ﬁxing choices of several key parameters: • d = Θ(2 − k √ R ) is the pure high degree of the dual witness Φ for OR R in Proposition 58, • T = (cid:98) (8 k ) k/ √ R (cid:99) , • α = (2 k ) k , • N = (cid:100) √ α (cid:101) R = Θ((2 k ) k/ R ), • ˆ D = c √ k − · T · N − /k · d = Θ(2 − k/ k ( k − / · R / − / (2 k ) ), where c is the constant fromProposition 54, • β = c / √ k · T · N /k = Θ(2 − k/ k ( − k − / · R − / − / (2 k ) ), where c is the constant fromProposition 54.Let ψ : {− , } N → {− , } be the function constructed in Proposition 55. Let Φ : {− , } R →{− , } be the function constructed in Proposition 58, and deﬁne ξ = Φ (cid:63) ψ . Then by Proposition 28, ξ satisﬁes the following properties: (cid:104) ξ, OR R ◦ THR kN (cid:105) > / , (111) (cid:107) ξ (cid:107) = 1 , (112)For every polynomial p of degree less than ˆ D, we have (cid:104) ξ, p (cid:105) = 0 . (113)Recall that ψ was obtained by symmetrizing the function ω constructed in Proposition 54. Proposi-tion 30 guarantees that for some ∆ ≥ β √ αR/ R = ˜Ω(2 − k/ k ( k − / · R / − / (2 k ) ), the function ξ can be modiﬁed to produce a function ζ : ( {− , } N ) R → R such that ζ ( x ) = 0 for all x (cid:54)∈ H N · R ≤ N , (cid:104) ζ, OR R ◦ THR kN (cid:105) ≥ (cid:104) ξ, OR R ◦ THR kN (cid:105) − (cid:107) ζ − ξ (cid:107) ≥ / − / > / , (cid:107) ζ (cid:107) = 1 , For every polynomial p of degree less than min { ˆ D, ∆ } , we have (cid:104) ζ, p (cid:105) = 0 . Observing that D = min { ˆ D, ∆ } = ˜Ω(2 − k/ k ( k − / · R / − / (2 k ) )shows that the function ζ satisﬁes the conditions necessary to prove Proposition 59.50heorem 53 now follows by combining Proposition 59 with the dual characterization of unboundedapproximate degree Proposition 13. The approximate degree lower bound in Theorem 52 is then aconsequence of Proposition 21 and Corollary 26. 
The quantum query lower bound follows via the standard fact that the $\varepsilon$-error quantum query complexity of $f$ is lower bounded by $\tfrac{1}{2} \cdot \widetilde{\deg}_\varepsilon(f)$ [BBC +

The Image Size Testing problem ($\mathrm{IST}$ for short) is defined as follows.

Definition 60. Given an input $s = (s_1, \dots, s_N) \in [R]^N$, and $i \in [R]$, let $f_i = |\{j : s_j = i\}|$. The image size of $s$ is the number of $i \in [R]$ such that $f_i > 0$. For $0 < \gamma < 1$, define:
$$\mathrm{IST}^\gamma_{N,R}(s_1, \dots, s_N) = \begin{cases} -1 & \text{if the image size is } R, \\ 1 & \text{if the image size is at most } \gamma \cdot R, \\ \text{undefined} & \text{otherwise.} \end{cases}$$

Observe that the definition of $\mathrm{IST}$ ignores whether or not the range item 0 has positive frequency, just like the functions $\mathrm{dSURJ}$ and $\mathrm{dDIST}^k$. We choose to define $\mathrm{IST}$ in this manner to streamline our analysis.

The goal of this section is to prove the following lower bound.
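Before stating the bound, Definition 60 can be made concrete with a small sketch. The function names `image_size` and `ist` are illustrative and not from the paper; inputs are assumed to be integers in $\{0, 1, \dots, R\}$, with 0 treated as the ignored dummy item, and `None` stands for "undefined".

```python
from typing import Optional, Sequence


def image_size(s: Sequence[int], R: int) -> int:
    """Number of range items i in {1, ..., R} with positive frequency f_i.

    The dummy item 0 is ignored, matching the remark after Definition 60.
    """
    return len({x for x in s if 1 <= x <= R})


def ist(s: Sequence[int], R: int, gamma: float) -> Optional[int]:
    """Sketch of IST^gamma_{N,R}: -1 on full image, +1 on small image."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    size = image_size(s, R)
    if size == R:
        return -1
    if size <= gamma * R:
        return 1
    return None  # the promise is violated, so the partial function is undefined
```

For example, with $R = 4$ and $\gamma = 1/2$, the input `[1, 2, 3, 4]` has full image, `[1, 1, 2, 0]` has image size 2 and satisfies the promise, and `[1, 2, 3, 0]` has image size 3 and falls outside the domain.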

Theorem 61. For some constant $c > 0$, and any constant $\gamma \in (0, 1)$, $\widetilde{\deg}\big(\mathrm{IST}^\gamma_{N,R}\big) \ge \tilde{\Omega}(R^{1/2})$, where $N = c \cdot \gamma^{-1/2} \cdot R$. The same lower bound applies to the quantum query complexity of $\mathrm{IST}^\gamma_{N,R}$.

Remark 2.

It is possible to refine our analysis to show that even the unbounded approximate degree of $\mathrm{IST}^\gamma_{N,R}$ is $\tilde{\Omega}(R^{1/2})$, and that this holds even if the error parameter is $1 - 2^{-n^{\Omega(1)}}$. However, for brevity we do not explicitly establish this stronger result. We direct the interested reader to subsequent work [BT18], which shows that the threshold degree of SURJ

N,R is ˜Ω( R / ) for R ≤ N/ .The proof of that result can be extended with little diﬃculty to show that the threshold degree of IST γN,R is ˜Ω( R / ) . For any function f and symmetric function g , Section 2.5 described a connection between thesymmetric property F prop ( s , . . . , s N ) = f ( g ( [ s = 1] , . . . , [ s N = 1]) , . . . , g ( [ s = R ] , . . . , [ s N = R ]))and the partial function F ≤ N ( x , . . . , x R ) = (cid:40) f ( g ( x ) , . . . , g ( x R )) if x , . . . , x R ∈ {− , } N , | x | + · · · + | x R | ≤ N, undeﬁned otherwise . f and g (in particular,this avoided having to address the possibility that f ( g ( x ) , . . . , g ( x R )) is undeﬁned in the deﬁnitionsof F prop and F ≤ N ). Because IST is a partial function, we need to explain that the same connectionstill holds even when f is a partial function. To do this, we need to introduce the notion of the double-promise approximate degree of F ≤ N . Deﬁnition 62.

Let $Y \subset \{-1,1\}^R$ and $f : Y \to \{-1,1\}$, and let $g : \{-1,1\}^N \to \{-1,1\}$ be a symmetric (total) function. Let $G = \{(x_1, \dots, x_R) : (g(x_1), \dots, g(x_R)) \in Y\}$. Let $F^{\le N}$ be defined as above. Observe that $F^{\le N}$ is defined at all inputs in $H^{N \cdot R}_{\le N} \cap G$. The double-promise $\varepsilon$-approximate degree of $F^{\le N}$, denoted $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N})$, is the least degree of a real polynomial $p$ such that:
$$|p(x) - F^{\le N}(x)| \le \varepsilon \quad \text{for all } x \in H^{N \cdot R}_{\le N} \cap G, \tag{114}$$
$$|p(x)| \le 1 + \varepsilon \quad \text{for all } x \in H^{N \cdot R}_{\le N} \setminus G. \tag{115}$$
Observe that in the definition above, no restriction is placed on $p(x)$ for any inputs that are not in $H^{N \cdot R}_{\le N}$.

Bun and Thaler's analysis from [BT17] (cf. Theorem 23) applies to partial functions $f$ in the following way.

Theorem 63 (Bun and Thaler [BT17]). Let $Y \subset \{-1,1\}^R$ and let $f : Y \to \{-1,1\}$ be any partial function. Let $g : \{-1,1\}^N \to \{-1,1\}$ be any symmetric function. Then for $F^{\mathrm{prop}}$ and $F^{\le N}$ defined above, and for any $\varepsilon > 0$, we have
$$\widetilde{\deg}_\varepsilon(F^{\mathrm{prop}}) \ge \widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N}).$$

We will require the following dual formulation of $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N})$.

Proposition 64. Let $F^{\le N}$ and $G$ be defined as above. Then $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N}) \ge d$ if and only if there exists a function $\psi : \{-1,1\}^n \to \mathbb{R}$ satisfying the following properties.
$$\psi(x) = 0 \text{ for all } x \notin H^{N \cdot R}_{\le N}, \tag{116}$$
$$\sum_{x \in H^{N \cdot R}_{\le N} \cap G} \psi(x) \cdot F^{\le N}(x) - \sum_{x \in H^{N \cdot R}_{\le N} \setminus G} |\psi(x)| > \varepsilon, \tag{117}$$
$$\sum_{x \in \{-1,1\}^n} |\psi(x)| = 1, \text{ and} \tag{118}$$
$$\text{for every polynomial } p : \{-1,1\}^n \to \mathbb{R} \text{ of degree less than } d, \quad \sum_{x \in \{-1,1\}^n} p(x) \cdot \psi(x) = 0. \tag{119}$$

We will need to define the following partial function.

Definition 65. Define $\mathrm{GapAND}^\gamma_R : H^R_{\le \gamma \cdot R} \cup \{-\mathbf{1}\} \to \{-1,1\}$ via:
$$\mathrm{GapAND}^\gamma_R(x) = \begin{cases} -1 & \text{if } x_i = -1 \text{ for all } i, \\ 1 & \text{if } x \in H^R_{\le \gamma \cdot R}, \\ \text{undefined} & \text{otherwise.} \end{cases}$$

In the case where $f = \mathrm{GapAND}^\gamma_R$ and $g = \mathrm{OR}_N$, the function $F^{\mathrm{prop}}(s_1, \dots, s_N)$ is precisely the $\mathrm{IST}^\gamma_{N,R}$ function. Hence:

Corollary 66. Let $N, R \in \mathbb{N}$. Let $Y$ be the domain of $\mathrm{GapAND}^\gamma_R$, and let $G = \{(x_1, \dots, x_R) \in \{-1,1\}^{N \cdot R} : (\mathrm{OR}_N(x_1), \dots, \mathrm{OR}_N(x_R)) \in Y\}$. Then for any $\varepsilon > 0$,
$$\widetilde{\deg}_\varepsilon(\mathrm{IST}^\gamma_{N,R}) \ge \widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N}),$$
where $F^{\le N} : G \cap H^{N \cdot R}_{\le N} \to \{-1,1\}$ is the partial function obtained by restricting $\mathrm{GapAND}^\gamma_R \circ \mathrm{OR}_N$ to $H^{N \cdot R}_{\le N}$.

With Corollary 66 in hand, we now turn to proving a lower bound on $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N})$.

Proof.

We construct a dual polynomial to witness the lower bound in Theorem 61.Deﬁne the parameters • δ = γ/ • α = 170 /δ , • T = N = (cid:100) √ α (cid:101) R ≤ · γ − / · R , • β = c · √ δ/ √ T , where c is the constant from Proposition 47.Let ψ be the dual witness for OR N from Proposition 49 with T = N . Deﬁne Φ : {− , } R → R as follows: Φ( x ) =  − / x = ( − , − , . . . , − / x = (1 , , . . . , . The dual block composition Φ (cid:63) ψ is the same dual witness for

AND R ◦ OR N which Bun andThaler [BT15] used to show that (cid:103) deg ε ( AND R ◦ OR N ) = Ω( N / ) for ε = 1 − − R . This dual witnesswas also used in subsequent works [She14,BCH + (cid:63) ψ because its correlation with the target function AND R ◦ OR N is exponentially closer to 1 than isthe correlation of ψ with OR N . For our purposes, it will not be essential to exploit such a strongcorrelation guarantee—rather, we are interested in Φ (cid:63) ψ because most its “ (cid:96) -mass” lies on inputseither with full image or tiny image (i.e., most of its mass lies in the domain of GapAND γR ◦ OR N ).As in Corollary 66, let G denote the set of all inputs on which GapAND γR ◦ OR N is deﬁned, i.e., G = { x , . . . , x R ∈ {− , } N · R : ( OR N ( x ) , . . . , OR N ( x R )) ∈ H R ≤ γ · R ∪ {− }} . The analysis in these prior works [BT15, BCH +

17] implies that Φ (cid:63) ψ satisﬁes the following threeproperties. (cid:107) Φ (cid:63) ψ (cid:107) = 1 , (120) (cid:88) x ∈ G (Φ (cid:63) ψ )( x ) · (cid:0) GapAND γR ◦ OR N (cid:1) ( x ) − (cid:88) x ∈{− , } N · R \ G | (Φ (cid:63) ψ )( x ) | ≥ / , (121)For any polynomial p : {− , } N · R → R , deg p < c √ δT = ⇒ (cid:104) Φ (cid:63) ψ, p (cid:105) = 0 , (122)53here c is the constant from Proposition 49. Indeed, Properties (120) and (122) are immediatefrom Proposition 28 on the properties of dual block composition. For completeness, we prove thatProperty (121) holds in Section 6.1.3 below, making use of the fact that δ = γ/ R tobe suﬃciently large.Proposition 30 (with α and β set as above) guarantees that for some ∆ ≥ β √ αR/ R =˜Ω( R / ), the function Φ (cid:63) ψ can be modiﬁed to produce a function ζ : ( {− , } N ) R → R such that ζ ( x ) = 0 for all x (cid:54)∈ H N · R ≤ N , (cid:104) ζ, GapAND γR ◦ OR N (cid:105) ≥ (cid:104) Φ (cid:63) ψ, GapAND γR ◦ OR N (cid:105) − (cid:107) ζ − Φ (cid:63) ψ (cid:107) ≥ / − / > / , (cid:107) ζ (cid:107) = 1 , For every polynomial p of degree less than D := min { ˆ D, ∆ } , we have (cid:104) ζ, p (cid:105) = 0 . Observing that D = min { c √ δT , ∆ } = ˜Ω( R / )shows that the function ζ satisﬁes the conditions necessary to prove Theorem 61 via Proposition 64. (121) Lemma 67.

Let δ > , γ > δ , and let G = { x , . . . , x R ∈ {− , } N · R : ( OR N ( x ) , . . . , OR N ( x R )) ∈ H R ≤ γ · R ∪ {− }} . Deﬁne

Φ : {− , } R → {− , } by Φ( − ) = − / , Φ( ) = 1 / and Φ( z ) = 0 otherwise. Let ψ : {− , } N → {− , } be any dual witness for OR N such that (cid:107) ψ (cid:107) = 1 , (cid:104) ψ, (cid:105) = 0 , and (cid:104) ψ, OR N (cid:105) ≥ − δ . Then (cid:88) x ∈ G (Φ (cid:63) ψ )( x ) · (cid:0) GapAND γR ◦ OR N (cid:1) ( x ) − (cid:88) x ∈{− , } N · R \ G | (Φ (cid:63) ψ )( x ) | ≥ − δ R − exp( − ( γ − δ ) R/ . The proof of Lemma 67 crucially relies on a special property, called one-sided error , that issatisﬁed by any dual polynomial for OR . Deﬁnition 68.

Let f : {− , } N → {− , } and let ψ : {− , } N → R . We say that ψ has one-sidederror with respect to f if for all x ∈ {− , } N , f ( x ) = 1 = ⇒ ψ ( x ) > . (123)The following lemma shows that any dual witness for the OR N function has one-sided error. Lemma 69 (Gavinsky and Sherstov [GS10]) . Let ψ : {− , } N → R be a function with pure highdegree at least such that (cid:104) ψ, OR N (cid:105) > . Then ψ has one-sided error with respect to OR N . In particular, if ψ is such a dual witness for OR N , then we have (cid:88) x ∈ A +1 | ψ ( x ) | ≤

12 (1 − (cid:104) ψ, OR N (cid:105) ) , (cid:88) x ∈ A − | ψ ( x ) | = 0 , (124)54here the sets A +1 and A − are, respectively, the sets of false positive and false negative errorsgiven by A +1 = E + ( ψ, OR N ) = { x ∈ {− , } N : ψ ( x ) > , OR N ( x ) = − } ,A − = E − ( ψ, OR N ) = { x ∈ {− , } N : ψ ( x ) < , OR N ( x ) = +1 } . Proof of Lemma 67.

We begin by observing that the quantity of interest can be written as
\[
\begin{aligned}
&\sum_{x \in \{-1,1\}^{N \cdot R}} (\Phi \star \psi)(x) \cdot (\mathrm{AND}_R \circ \mathrm{OR}_N)(x) - \Bigg[ \sum_{x \in \{-1,1\}^{N \cdot R} \setminus G} (\Phi \star \psi)(x) \cdot (\mathrm{AND}_R \circ \mathrm{OR}_N)(x) + \sum_{x \in \{-1,1\}^{N \cdot R} \setminus G} |(\Phi \star \psi)(x)| \Bigg] \\
&\qquad \ge \sum_{x \in \{-1,1\}^{N \cdot R}} (\Phi \star \psi)(x) \cdot (\mathrm{AND}_R \circ \mathrm{OR}_N)(x) - 2 \sum_{x \in \{-1,1\}^{N \cdot R} \setminus G} |(\Phi \star \psi)(x)|.
\end{aligned}
\tag{125}
\]
We estimate each term of Expression (125) separately, beginning with the first term. Just as in the proofs of Proposition 56 and Proposition 57, we have
\[
\sum_{x \in \{-1,1\}^{N \cdot R}} (\Phi \star \psi)(x) \cdot (\mathrm{AND}_R \circ \mathrm{OR}_N)(x) = \sum_{z \in \{-1,1\}^R} \Phi(z) \cdot \mathbb{E}_y[\mathrm{AND}_R(\ldots, y_i z_i, \ldots)],
\]
where $y \in \{-1,1\}^R$ is a random string whose $i$th bit independently takes the value $-1$ with probability $2\sum_{x \in A_{z_i}} |\psi(x)|$. For $z = -\mathbf{1}$, we have by (124) that $2\sum_{x \in A_{-1}} |\psi(x)| = 0$, so the contribution of the corresponding term to the sum is $1/2$. For $z = \mathbf{1}$, we use the fact that $2\sum_{x \in A_{+1}} |\psi(x)| \le \delta$ to compute
\[
\frac{1}{2} \cdot \mathbb{E}_y[\mathrm{AND}_R(\ldots, y_i, \ldots)] = \frac{1}{2} \cdot \Big(1 - 2\Pr_y[\mathrm{AND}_R(\ldots, y_i, \ldots) = -1]\Big) \ge \frac{1}{2} \cdot \big(1 - 2\delta^R\big).
\]
Hence, the first summand of (125) is at least $1 - \delta^R$.

We now estimate the second summand, $2\sum_{x \notin G} |(\Phi \star \psi)(x)|$. As in the proofs of Proposition 56 and Proposition 57, we let $\lambda$ denote the distribution with probability mass function $\lambda(x) = |\psi(x)|$. Then
\[
\begin{aligned}
2\sum_{x \notin G} |(\Phi \star \psi)(x)| &= 2^{R+1}\, \mathbb{E}_{\lambda^{\otimes R}}\big[\, |\Phi(\ldots, \mathrm{sgn}\,\psi(x_i), \ldots)| \cdot I(x \notin G) \,\big] \\
&= 2 \sum_{z \in \{-1,1\}^R} |\Phi(z)| \cdot \Pr_{\lambda^{\otimes R}}\big[x \notin G \mid (\ldots, \mathrm{sgn}(\psi(x_i)), \ldots) = z\big] \\
&= \Pr_{\lambda^{\otimes R}}\big[x \notin G \mid (\ldots, \mathrm{sgn}(\psi(x_i)), \ldots) = -\mathbf{1}\big] + \Pr_{\lambda^{\otimes R}}\big[x \notin G \mid (\ldots, \mathrm{sgn}(\psi(x_i)), \ldots) = \mathbf{1}\big].
\end{aligned}
\]
We analyze each term of this sum separately. For the first term, observe that by the one-sided error of $\psi$, if $x = (x_1, \ldots, x_R)$ is any input such that $\mathrm{sgn}(\psi(x_i)) = -1$ for all $i$, then $\mathrm{OR}_N(x_i) = -1$ for all $i$. Thus we are guaranteed that $x \in G$, so the contribution of the first summand is zero. To analyze the second summand, let us denote by $r_i \in \{0,1\}$ the indicator random variable for the event $\mathrm{OR}_N(x_i) = -1$ when $x_i$ is drawn from the conditional distribution $(\lambda \mid \mathrm{sgn}(\psi(x_i)) = 1)$. Then
\[
\Pr[r_i = 1] = \Pr_{x_i \sim \lambda}\big[\mathrm{OR}_N(x_i) = -1 \mid \mathrm{sgn}(\psi(x_i)) = 1\big] = 2\sum_{x \in A_{+1}} |\psi(x)| \le \delta
\]
by (124). Hence,
\[
\Pr_{\lambda^{\otimes R}}\big[x \notin G \mid (\ldots, \mathrm{sgn}(\psi(x_i)), \ldots) = \mathbf{1}\big] \le \Pr\Big[\sum_{i=1}^R r_i > \gamma R\Big] \le \exp\big(-(\gamma - \delta)R/3\big)
\]
by the multiplicative Chernoff bound. Thus, we have
\[
2\sum_{x \notin G} |(\Phi \star \psi)(x)| \le \exp\big(-(\gamma - \delta)R/3\big).
\]
Putting everything together, we see that Expression (125) is at least $1 - \delta^R - \exp(-(\gamma - \delta)R/3)$, as we wanted to show.
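As an illustrative sanity check on the Chernoff step in the proof above (it is not part of the argument), the following Python sketch compares the exact tail $\Pr[\sum_i r_i > \gamma R]$ with the bound $\exp(-(\gamma-\delta)R/3)$. It assumes the worst case $\Pr[r_i = 1] = \delta$ exactly, so that the sum is Binomial$(R, \delta)$. The function names and the values $\delta = 0.05$, $\gamma = 0.2$ are hypothetical choices, picked so that $\eta = \gamma/\delta - 1 > 1$.

```python
import math


def binomial_upper_tail(R, delta, threshold):
    """Exact Pr[S > threshold] for S ~ Binomial(R, delta), summed via log-probabilities."""
    total = 0.0
    for s in range(math.floor(threshold) + 1, R + 1):
        # log of C(R, s) * delta^s * (1 - delta)^(R - s)
        log_pmf = (
            math.lgamma(R + 1) - math.lgamma(s + 1) - math.lgamma(R - s + 1)
            + s * math.log(delta)
            + (R - s) * math.log1p(-delta)
        )
        total += math.exp(log_pmf)
    return total


def chernoff_bound(R, delta, gamma):
    """The bound exp(-(gamma - delta) * R / 3) from the proof."""
    return math.exp(-(gamma - delta) * R / 3)


if __name__ == "__main__":
    delta, gamma = 0.05, 0.2  # assumed illustrative parameters with gamma > 2 * delta
    for R in (50, 200, 800):
        tail = binomial_upper_tail(R, delta, gamma * R)
        bound = chernoff_bound(R, delta, gamma)
        print(f"R={R}: exact tail={tail:.3e}, Chernoff bound={bound:.3e}")
```

For these parameters the exact tail should sit well below the bound, and both decay exponentially in $R$, which is all the proof needs.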

It follows from a reduction in Ambainis et al. [ABRdW16, Section 6] that for any $N = O(R)$ and sufficiently small constant $\gamma > 0$, an $\tilde{\Omega}(R^{1/2})$ lower bound for the approximate degree or quantum query complexity of $\mathrm{IST}^{\gamma}_{N,R}$ implies an $\tilde{\Omega}(k^{1/2})$ approximate degree or quantum query lower bound for $k$-junta testing for proximity parameter $\varepsilon = 1/3$. Hence, Theorem 61 has the following corollary.

Corollary 70. Any quantum tester that distinguishes $k$-juntas from functions that are $(1/3)$-far from any $k$-junta with error probability at most $1/3$ makes $\tilde{\Omega}(k^{1/2})$ queries to the function.

SDU

The goal of this section is to derive a lower bound for approximating the statistical distance of an input distribution from uniform up to some additive constant error. We formalize this problem as follows.

Given an input $(s_1, \ldots, s_N) \in [R]^N$ and $i \in [R]$, let $f_i = |\{j : s_j = i\}|$, and let $p$ be the probability distribution over $[R]$ such that $p_i = f_i/N$. For $N \ge R$ and $0 < \gamma_1 < \gamma_2 < 1$, define the partial function $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$ as follows.

Definition 71. Define
\[
\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}(s_1, \ldots, s_N) =
\begin{cases}
-1 & \text{if } \frac{1}{2}\sum_{i=1}^R |p_i - 1/R| \le \gamma_1, \\
1 & \text{if } \frac{1}{2}\sum_{i=1}^R |p_i - 1/R| \ge \gamma_2, \\
\text{undefined} & \text{otherwise.}
\end{cases}
\]
Above, $\frac{1}{2}\sum_{i=1}^R |p_i - 1/R|$ is the statistical distance between $p$ and the uniform distribution.

Footnote (on the multiplicative Chernoff bound used in the preceding proof). The formulation we use here is as follows. Let $r_1, \ldots, r_R$ be independent $\{0,1\}$-valued random variables, $S = \sum_{i=1}^R r_i$, and $\mu = \mathbb{E}[S]$. Then for $\eta > 1$, we have $\Pr[S > (1+\eta)\mu] \le \exp(-\eta\mu/3)$. In the preceding proof, $\mu \le \delta R$ and $\eta = \gamma R/\mu - 1 > 1$.

The SDU problem reduces to IST in the sense that any approximating polynomial for SDU implies the existence of an approximation to IST of the same degree. Hence, the approximate degree of SDU is at least as large as that of IST. For intuition as to why this is true, let us relate $\mathrm{IST}^{1/2}_{N,R}$ to $\mathrm{SDU}^{0,1/2}_{N,R}$ in the special case where $N = R$ and no dummy (i.e., 0) items appear in the input to IST. If $(s_1, \ldots, s_N)$ is a true input to $\mathrm{IST}^{1/2}_{N,R}$, i.e., $\mathrm{IST}^{1/2}_{N,R}(s_1, \ldots, s_N) = -1$, then every index $i \in [R]$ must appear in the input list exactly once. Hence, the distribution represented by $(s_1, \ldots, s_N)$ is exactly uniform, so $\mathrm{SDU}^{0,1/2}_{N,R}(s_1, \ldots, s_N) = -1$. On the other hand, if $\mathrm{IST}^{1/2}_{N,R}(s_1, \ldots, s_N) = 1$, then at most $R/2$ indices $i \in [R]$ appear in the input list, so the list represents a distribution with statistical distance at least $1/2$ from uniform, and $\mathrm{SDU}^{0,1/2}_{N,R}(s_1, \ldots, s_N) = 1$. Hence, any approximating polynomial for $\mathrm{SDU}^{0,1/2}_{N,R}$ is also an approximating polynomial for $\mathrm{IST}^{1/2}_{N,R}$.

We will actually need a more general relationship between the approximate degrees of SDU and IST to handle the fact that we cannot take $N$ to be exactly equal to $R$ in the IST lower bound, as well as to handle the occurrences of dummy items in the definition of IST.

Theorem 72.

For some $N = O(R)$, and some constants $0 < \gamma_1 < \gamma_2 < 1$,
\[
\widetilde{\deg}\big(\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}\big) \ge \tilde{\Omega}(R^{1/2}).
\]
The same lower bound applies to the quantum query complexity of $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$.

Proof. Fix $R > 0$, let $c$ be the constant from Theorem 61, let $\gamma < (2/(3c))^2$ be a sufficiently small constant, and let $N = c \cdot \gamma^{-1/2} \cdot R$. As inputs to $\mathrm{IST}^{\gamma}_{N,R}$ are interpreted as elements of $[R]_0^N$, we can equivalently interpret them as elements of $[R+1]^N$, i.e., as inputs to $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R+1}$, for any desired $0 < \gamma_1 < \gamma_2 < 1$. We set $\gamma_2 = 1 - 3\gamma/2$ and $\gamma_1 = 1 - \gamma^{1/2}/c$. Observe that since $\gamma < (2/(3c))^2$, $\gamma_2$ is strictly greater than $\gamma_1$. We claim that
\[
\big(\mathrm{IST}^{\gamma}_{N,R}\big)^{-1}(-1) \subseteq \big(\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R+1}\big)^{-1}(-1), \quad \text{and} \quad \big(\mathrm{IST}^{\gamma}_{N,R}\big)^{-1}(+1) \subseteq \big(\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R+1}\big)^{-1}(+1).
\]
Indeed, since inputs in $\big(\mathrm{IST}^{\gamma}_{N,R}\big)^{-1}(-1)$ define a probability distribution over $[R+1]$ with support size at least $R$, with all probabilities being integer multiples of $1/N = \gamma^{1/2}/(cR)$, the statistical distance between any such distribution and the uniform distribution is at most $1 - R/N = 1 - \gamma^{1/2}/c$. This follows from the following calculation. Amongst probability distributions $(p_1, \ldots, p_{R+1})$ over $[R+1]$ with support size at least $R$ and all probabilities $p_i$ being integer multiples of $1/N$, it is not hard to see that one maximizes the statistical distance from the uniform distribution over $[R+1]$ by setting $p_1 = 1 - \frac{R-1}{N}$, $p_2 = p_3 = \cdots = p_R = 1/N$, and $p_{R+1} = 0$. The statistical distance from uniform is:
\[
\frac{1}{2}\left(\left(1 - \frac{R-1}{N} - \frac{1}{R+1}\right) + (R-1)\left(\frac{1}{R+1} - \frac{1}{N}\right) + \frac{1}{R+1}\right) = 1 - \frac{R}{N} + \frac{1}{N} - \frac{1}{R+1} \le 1 - \frac{R}{N},
\]
where we have assumed that $N \ge R+1$, which is true for a sufficiently small choice of $\gamma$.

Similarly, since inputs in $\big(\mathrm{IST}^{\gamma}_{N,R}\big)^{-1}(1)$ define a probability distribution over $[R+1]$ with support size at most $\gamma \cdot R + 1$, the statistical distance between any such distribution and the uniform distribution is at least $1 - 3\gamma/2$. To see this, let $p = (p_1, \ldots, p_{R+1})$ be any distribution of support size at most $\gamma \cdot R + 1$, and let $S = \{i : p_i = 0\}$ and $\bar{S}$ be the complement of $S$. Then the statistical distance of $p$ from uniform is at least
\[
\begin{aligned}
\frac{1}{2}\sum_{i \in S}\left|p_i - \frac{1}{R+1}\right| + \frac{1}{2}\sum_{i \in \bar{S}}\left(p_i - \frac{1}{R+1}\right)
&= \frac{1}{2}\left(\frac{|S|}{R+1} + \sum_{i \in \bar{S}} p_i - \frac{|\bar{S}|}{R+1}\right) \\
&= \frac{1}{2}\left(\frac{|S|}{R+1} + 1 - \frac{|\bar{S}|}{R+1}\right) \\
&\ge \frac{1}{2}\left(\frac{(1-\gamma)R}{R+1} + 1 - \frac{\gamma R + 1}{R+1}\right) \\
&= \frac{1}{2}\cdot\frac{2R - 2\gamma R}{R+1} = 1 - \frac{\gamma R + 1}{R+1} = 1 - \gamma - \frac{1-\gamma}{R+1} \ge 1 - \frac{3\gamma}{2},
\end{aligned}
\]
where the last inequality holds for all sufficiently large $R$.

It is an easy consequence of the above that any $\varepsilon$-approximating polynomial of degree $d$ for $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R+1}$ implies an approximation to $\mathrm{IST}^{\gamma}_{N,R}$ of the same degree (i.e., that $\widetilde{\deg}_{\varepsilon}(\mathrm{IST}^{\gamma}_{N,R}) \le d$). Theorem 72 then follows from Theorem 61.

Given a distribution $p$ over $[R]$, the Shannon entropy of $p$, denoted $H(p)$, is defined to be $H(p) := \sum_{i \in [R]} p_i \log_2(1/p_i)$. Following Goldreich and Vadhan [GV11], we define a partial function $\mathrm{GapCmprEnt}^{\alpha,\beta}_{N,R}$ capturing the problem of comparing the entropies of two distributions. The function

$\mathrm{GapCmprEnt}^{\alpha,\beta}_{N,R}$ takes as input two vectors in $[R]^N$ and interprets each vector $i \in \{1,2\}$ as a probability distribution $p_i$ over $[R]$, with $p_i(j) = f_{i,j}/N$, where $f_{i,j}$ is the frequency of $j$ in the $i$th vector. The function $\mathrm{GapCmprEnt}^{\alpha,\beta}_{N,R}$ evaluates to
\[
\begin{cases}
-1 & \text{if } H(p_1) - H(p_2) \le \beta, \\
1 & \text{if } H(p_1) - H(p_2) \ge \alpha, \\
\text{undefined} & \text{otherwise.}
\end{cases}
\]

Theorem 73. There exist constants $0 < \beta < \alpha < 1$ such that $\widetilde{\deg}(\mathrm{GapCmprEnt}^{\alpha,\beta}_{N,R}) = \tilde{\Omega}(R^{1/2})$. The same lower bound applies to the quantum query complexity of $\mathrm{GapCmprEnt}^{\alpha,\beta}_{N,R}$.

Proof. Vadhan [Vad99, Claim 4.4.2 and Remark 4.4.3] showed that as long as $H((1+\gamma_2)/2) < 1 - \gamma_1 - \lambda$, then $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$ is reducible to

$\mathrm{GapCmprEnt}^{\alpha,\beta}_{4N,2R}$ for some constants $\alpha, \beta$ such that $\alpha - \beta = \lambda$. This reduction (described next for completeness) implies that $\widetilde{\deg}_{\varepsilon}(\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}) \le \widetilde{\deg}_{\varepsilon}(\mathrm{GapCmprEnt}^{\alpha,\beta}_{4N,2R})$.

For completeness, we sketch this transformation, closely following the presentation of Goldreich and Vadhan [GV11]. At a high level, the reduction transforms an input in $[R]^N$ to $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$ (interpreted as a distribution $p$ over $[R]$) into two distributions $p_1, p_2$ over $[R] \times \{0,1\}$ as follows. Both $p_1$ and $p_2$ start by sampling an $s \in \{0,1\}$ at random. If $s = 0$, then a random sample $r$ is chosen from $p$, and if $s = 1$, then $r$ is set to a uniform random sample from $[R]$. Distribution $p_1$ outputs $(r, s)$, while $p_2$ outputs $(r, b)$ for a random $b \in \{0,1\}$.

The entropy of $p_1$ is always $v + 1$, where $v = H(p) + \log_2(R)$. As for the entropy of $p_2$, if $p$ is far from the uniform distribution, then the selection bit $s$ will be essentially determined by the sample $r$. Hence, the entropy of $p_2$ will be approximately $v$, which is noticeably smaller than the entropy of $p_1$. On the other hand, if the two input distributions are close, then (even conditioned on the sample selected) the selection bit $s$ will be almost random, and so $H(p_2) \approx v + 1$, which is approximately the same as $H(p_1)$. Quantitatively, Vadhan [Vad99] shows that if the statistical distance between $p$ and the uniform distribution is $\delta$, then $1 - \delta \le H(p_1) - H(p_2) \le H((1+\delta)/2)$.

In terms of input vectors in $[R]^N$, this transformation can be equivalently described as follows. Assume for simplicity that $R$ divides $N$. If $p$ is specified by a vector $u$ in $[R]^N$, then $p_1$ is specified by a vector $w$ in $([R] \times \{0,1\})^{4N}$ defined as follows. For all $i \in [N]$ and $j \in \{0,1\}$, $w_{i,j} = (u_i, 0)$, and for $j \in \{2,3\}$, $w_{i,j} = (\lceil Ri/N \rceil, 1)$. Similarly, $p_2$ is specified by a vector $v$ in $([R] \times \{0,1\})^{4N}$. For $i \in [N]$ and $j \in \{0,1\}$, $v_{i,j} = (u_i, j)$, and for $j \in \{2,3\}$, $v_{i,j} = (\lceil Ri/N \rceil, j - 2)$. Viewing $u$, $v$, and $w$ as vectors in $\{-1,1\}^{N \log_2(R)}$ or $\{-1,1\}^{4N \cdot \log_2(2R)}$, each bit of $v$ and $w$ depends on at most one bit of $u$.

Recall that in the proof of Theorem 72, $\gamma_2 = 1 - 3\gamma/2$ and $\gamma_1 = 1 - \gamma^{1/2}/c$, where $\gamma$ is an arbitrary constant less than $(2/(3c))^2$ and $c$ is the constant from Theorem 61. Then $H((1+\gamma_2)/2) = H(1 - 3\gamma/4) = H(3\gamma/4)$. Since for $p \in [0, 1/2]$ we have $H(p) = -p\log_2(p) - (1-p)\log_2(1-p) \le -2p\log_2(p)$, which is $o(p^{1/2})$ as $p \to 0$, it follows that for some sufficiently small constant $\gamma < (2/(3c))^2$, we have $H(3\gamma/4) < \gamma^{1/2}/c = 1 - \gamma_1$. Hence, Vadhan's reduction from $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$ to $\mathrm{GapCmprEnt}^{\alpha,\beta}_{4N,2R}$ applies to this setting of $\gamma_1$ and $\gamma_2$, and this shows that any degree-$d$ $\varepsilon$-approximating polynomial for $\mathrm{GapCmprEnt}^{\alpha,\beta}_{4N,2R}$ implies a degree-$d$ $\varepsilon$-approximating polynomial for $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$.

Combined with Theorem 72, this implies that $\widetilde{\deg}(\mathrm{GapCmprEnt}^{\alpha,\beta}_{4N,2R}) = \tilde{\Omega}(R^{1/2})$.

Clearly, a quantum query algorithm that approximates entropy up to additive error $(\alpha - \beta)/4$ can be used to compute $\mathrm{GapCmprEnt}^{\alpha,\beta}_{4N,2R}$, by (approximately) computing the entropies of each of the two input distributions and determining whether the difference is at most $(\beta + \alpha)/2$. Hence, Theorem 73 implies the following lower bound for approximating entropy to additive error $\alpha - \beta$.

Corollary 74. Let $N = c \cdot R$ for a sufficiently large constant $c$. Interpret an input in $[R]^N$ as a distribution $p$ in the natural way (i.e., for each $j \in [R]$, $p_j = f_j/N$, where $f_j$ is the number of times $j$ appears in the input). There is a constant $\varepsilon > 0$ such that any quantum algorithm that approximates the entropy of $p$ up to additive error $\varepsilon$ with probability at least $2/3$ requires $\tilde{\Omega}(R^{1/2})$ queries.

Conclusion and Open Questions

We conclude by briefly describing some additional consequences of our results, as well as a number of open questions and directions for future work.

For any constant $k$, $k$-distinctness is computed by a DNF of polynomial size. Our $\tilde{\Omega}(n^{3/4 - 1/(2k)})$ bound is the best known lower bound on the approximate degree of polynomial size DNF formulae. The previous best was $\tilde{\Omega}(n^{2/3})$ for Element Distinctness (a.k.a. 2-Distinctness) [AS04], although Bun and Thaler did establish, for any $\delta > 0$, an $\Omega(n^{1-\delta})$ lower bound on the approximate degree of quasi-polynomial size DNFs.

Similarly, for any constant $k \ge 1$, Bun and Thaler exhibited an $\mathrm{AC}^0$ circuit of depth $2k - 1$ with approximate degree $\tilde{\Omega}\big(n^{1 - 2^{k-2}/3^{k-1}}\big)$. Our techniques can be used to give a polynomial improvement for any fixed $k \ge 2$, to $\tilde{\Omega}\big(n^{1 - 2^{-k}}\big)$ (Theorem 1 is the special case of $k = 2$, as SURJ is computed by an $\mathrm{AC}^0$ circuit of depth three). We omit further details of this result for brevity.

The most obvious direction for future work is to extend our techniques to resolve the approximate degree and quantum query complexity of additional problems of interest in the study of quantum algorithms. These include the triangle finding problem [MSS07, Gal14], graph collision [MSS07], and verifying matrix products [BŠ06, KN16]. It would also be interesting to close the gap between our $\Omega(n^{3/4 - 1/(2k)})$ lower bound for $k$-distinctness and Belovs' $O\big(n^{3/4 - 1/(2^{k+2} - 4)}\big)$ upper bound, especially for small values of $k$ (e.g., $k = 3$).

Although we prove a lower bound of $\tilde{\Omega}(R^{1/2})$ for $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$ for some constants $0 < \gamma_1 < \gamma_2$, we leave open whether or not the approximate degree of $\mathrm{SDU}^{1/3,2/3}_{N,R}$ is $\tilde{\Omega}(R^{1/2})$. It may be tempting to suspect that Theorem 72 implies an $\tilde{\Omega}(R^{1/2})$ lower bound on $\mathrm{SDU}^{1/3,2/3}_{N,R}$, by invoking the well-known Polarization Lemma of Sahai and Vadhan [SV03]. The Polarization Lemma reduces $\mathrm{SDU}^{\gamma_1,\gamma_2}_{N,R}$ for any pair of constants $\gamma_1, \gamma_2$ with $\gamma_1 < \gamma_2$ to $\mathrm{SDU}^{1/3,2/3}_{N',R'}$ for an appropriate choice of $N'$ and $R'$. Unfortunately, $N'$ and $R'$ may be polynomially larger than $N$ and $R$, so this reduction does not give an $\tilde{\Omega}(R^{1/2})$ lower bound for $\mathrm{SDU}^{1/3,2/3}_{N,R}$ itself.

Another important direction is to resolve the approximate degree of specific classes of functions, especially polynomial size DNF formulae and $\mathrm{AC}^0$ circuits. As mentioned in the previous subsection, our $k$-distinctness lower bound (Theorem 2) gives the best known lower bound on polynomial size DNFs. A compelling candidate for improving this lower bound is the $k$-sum function, which may have approximate degree as large as $\Theta(n^{k/(k+1)})$ (it is known that the quantum query complexity of $k$-sum is $\tilde{\Theta}(n^{k/(k+1)})$ [Amb07, BS13]). On the upper bounds side, it may be possible to extend the techniques underlying our $\tilde{O}(n^{3/4})$ upper bound on the approximate degree of SURJ to yield a sublinear upper bound for every DNF formula of polynomial size.
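To make the $k$-distinctness gap discussed earlier in this section concrete, the following Python sketch tabulates the two exponents for a few small $k$: $3/4 - 1/(2k)$ from the lower bound and $3/4 - 1/(2^{k+2} - 4)$ from Belovs' upper bound. The helper names are hypothetical, and polylogarithmic factors are ignored; this is only arithmetic on the stated exponents.

```python
from fractions import Fraction


def lower_exponent(k):
    """Exponent of n in the k-distinctness lower bound: 3/4 - 1/(2k)."""
    return Fraction(3, 4) - Fraction(1, 2 * k)


def upper_exponent(k):
    """Exponent of n in Belovs' upper bound: 3/4 - 1/(2^(k+2) - 4)."""
    return Fraction(3, 4) - Fraction(1, 2 ** (k + 2) - 4)


if __name__ == "__main__":
    for k in range(2, 7):
        lo, hi = lower_exponent(k), upper_exponent(k)
        print(f"k={k}: lower n^({lo}), upper n^({hi}), gap in exponent {hi - lo}")
```

Both exponents tend to $3/4$ as $k$ grows, so the gap in the exponent shrinks for large $k$; for small $k$ such as $k = 3$ it is still noticeable, which is why that regime is singled out.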

Open Problem 75. For every constant $c > 0$ and every DNF formula $f \colon \{-1,1\}^n \to \{-1,1\}$ of size at most $n^c$, is there a $\delta > 0$ (depending only on $c$) such that $\widetilde{\deg}(f) = O(n^{1-\delta})$?

A positive answer to Open Problem 75 would have major algorithmic consequences, including a subexponential time algorithm for agnostically learning DNF formulae [KKMS08] (and PAC learning depth three circuits [KS04]) of any fixed polynomial size.

For general $\mathrm{AC}^0$ circuits, an $\Omega(n^{1-\delta})$ approximate degree lower bound is already known [BT17]. It would be very interesting to improve this lower bound to an optimal $\Omega(n)$. Until recently, SURJ was a prime candidate for exhibiting such a lower bound. However, owing to Sherstov's upper bound [She18] and Theorem 1, SURJ is no longer a candidate function. Nevertheless, we are optimistic about the following closely related candidate. An approximate majority function is any total Boolean function that evaluates to $-1$ [...] every approximate majority has approximate degree $\Omega(n)$; proving this would resolve a question of Srinivasan [FHH+14].

Acknowledgements

We are grateful to Sasha Sherstov for an inspiring conversation at the BIRS 2017 workshop on Communication Complexity and Applications, II, which helped to spark this work. We also thank the anonymous STOC and Theory of Computing reviewers for comments improving the presentation of this manuscript.

This work was done while M. B. was a postdoctoral researcher at Princeton University. Some of this work was performed when R. K. was a postdoctoral associate at MIT and was partly supported by NSF grant CCF-1629809.

References

[Aar12] Scott Aaronson. Impossibility of succinct quantum proofs for collision-freeness. Quantum Information & Computation, 12(1-2):21–28, 2012. [p. 3]
[ABK16] Scott Aaronson, Shalev Ben-David, and Robin Kothari. Separations in query complexity using cheat sheets. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 863–876, 2016. [p. 4]
[ABRdW16] Andris Ambainis, Aleksandrs Belovs, Oded Regev, and Ronald de Wolf. Efficient quantum algorithms for (gapped) group testing and junta testing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 903–922. Society for Industrial and Applied Mathematics, 2016. [pp. 4, 6, 7, 11, 56]
[ACR+10] Andris Ambainis, Andrew M. Childs, Ben Reichardt, Robert Špalek, and Shengyu Zhang. Any AND-OR formula of size n can be evaluated in time n^{1/2+o(1)} on a quantum computer. SIAM J. Comput., 39(6):2513–2530, 2010. [p. 3]
[Ajt83] Miklós Ajtai. Σ^1_1-formulae on finite structures. Annals of Pure and Applied Logic, 24(1):1–48, 1983. [p. 61]
[Amb02] Andris Ambainis. Quantum lower bounds by quantum arguments. Journal of Computer and System Sciences, 64(4):750–767, June 2002. [p. 3]
[Amb03] Andris Ambainis. Polynomial degree vs. quantum query complexity. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 230–239. IEEE, 2003. [p. 4]
[Amb05] Andris Ambainis. Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(1):37–46, 2005. [p. 17]
[Amb07] Andris Ambainis. Quantum walk algorithm for element distinctness. SIAM Journal on Computing, 37(1):210–239, 2007. [pp. 6, 60]
[AS04] Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. ACM, 51(4):595–605, 2004. [pp. 3, 4, 5, 6, 8, 60]
[AS07] Alp Atıcı and Rocco A Servedio. Quantum algorithms for learning and testing juntas. Quantum Information Processing, 6(5):323–348, 2007. [p. 7]
[BBC+01] Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald de Wolf. Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001. [pp. 3, 4, 51]
[BCdWZ99] Harry Buhrman, Richard Cleve, Ronald de Wolf, and Christof Zalka. Bounds for small-error and zero-error quantum algorithms. In FOCS, pages 358–368. IEEE Computer Society, 1999. [p. 4]
[BCH+17] Adam Bouland, Lijie Chen, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan. On the power of statistical zero knowledge. In , pages 708–719, 2017. [pp. 3, 8, 53]
[Bei94] Richard Beigel. Perceptrons, PP, and the Polynomial Hierarchy. Computational Complexity, 4:339–349, 1994. [p. 3]
[Bel12a] Aleksandrs Belovs. Learning-graph-based quantum algorithm for k-distinctness. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 207–216. IEEE, 2012. [pp. 4, 6, 11]
[Bel12b] Aleksandrs Belovs. Span programs for functions with constant-sized 1-certificates. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 77–84. ACM, 2012. [p. 6]
[BHH11] Sergey Bravyi, Aram Wettroth Harrow, and Avinatan Hassidim. Quantum algorithms for testing properties of distributions. IEEE Trans. Information Theory, 57(6):3971–3981, 2011. [pp. 4, 7]
[BHT98] Gilles Brassard, Peter Høyer, and Alain Tapp. Quantum counting, pages 820–831. Springer Berlin Heidelberg, 1998. [p. 25]
[BIVW16] Andrej Bogdanov, Yuval Ishai, Emanuele Viola, and Christopher Williamson. Bounded indistinguishability and the complexity of recovering secrets. In Matthew Robshaw and Jonathan Katz, editors, Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part III, volume 9816 of Lecture Notes in Computer Science, pages 593–618. Springer, 2016. [p. 3]
[Bla09] Eric Blais. Testing juntas nearly optimally. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, STOC '09, pages 151–158, New York, NY, USA, 2009. ACM. [p. 7]
[BM12] Paul Beame and Widad Machmouchi. The quantum query complexity of AC0. Quantum Information & Computation, 12(7-8):670–676, 2012. [pp. 4, 5, 6]
[BNRdW07] Harry Buhrman, Ilan Newman, Hein Röhrig, and Ronald de Wolf. Robust polynomials and quantum algorithms. Theory Comput. Syst., 40(4):379–395, 2007. [pp. 28, 29]
[BŠ06] Harry Buhrman and Robert Špalek. Quantum verification of matrix products. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 880–889. ACM Press, 2006. [p. 60]
[BS13] Aleksandrs Belovs and Robert Spalek. Adversary lower bound for the k-sum problem. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, ITCS '13, pages 323–328, New York, NY, USA, 2013. ACM. [p. 60]
[BSS03] Howard Barnum, Michael Saks, and Mario Szegedy. Quantum query complexity and semi-definite programming. In , pages 179–193, 2003. [p. 3]
[BT13] Mark Bun and Justin Thaler. Dual lower bounds for approximate degree and Markov-Bernstein inequalities. In Fedor V. Fomin, Rusins Freivalds, Marta Z. Kwiatkowska, and David Peleg, editors, ICALP (1), volume 7965 of Lecture Notes in Computer Science, pages 303–314. Springer, 2013. [pp. 8, 9, 10, 40, 45]
[BT15] Mark Bun and Justin Thaler. Hardness amplification and the approximate degree of constant-depth circuits. In Magnús M. Halldórsson, Kazuo Iwama, Naoki Kobayashi, and Bettina Speckmann, editors, Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I, volume 9134 of Lecture Notes in Computer Science, pages 268–280. Springer, 2015. Full version available at http://eccc.hpi-web.de/report/2013/151 . [pp. 8, 11, 45, 53]
[BT16a] Mark Bun and Justin Thaler. Dual polynomials for Collision and Element Distinctness. Theory of Computing, 12(16):1–34, 2016. [p. 13]
[BT16b] Mark Bun and Justin Thaler. Improved bounds on the sign-rank of AC^0. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, , volume 55 of LIPIcs, pages 37:1–37:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. [p. 8]
[BT17] Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC0. In , pages 1–12, 2017. [pp. 6, 8, 9, 10, 11, 12, 16, 17, 18, 20, 36, 52, 61]
[BT18] Mark Bun and Justin Thaler. The large-error approximate degree of AC0. In Electronic Colloquium on Computational Complexity (ECCC), volume 25, page 143, 2018. To appear in International Conference on Randomization and Computation (RANDOM), 2019. [p. 51]
[BVdW07] Harry Buhrman, Nikolai K. Vereshchagin, and Ronald de Wolf. On computation and communication with small bias. In , pages 24–32. IEEE Computer Society, 2007. [p. 3]
[CA08] Arkadev Chattopadhyay and Anil Ada. Multiparty communication complexity of disjointness. Electronic Colloquium on Computational Complexity (ECCC), 15(002), 2008. [pp. 3, 8]
[Che82] E.W. Cheney. Introduction to Approximation Theory. AMS Chelsea Publishing Series. AMS Chelsea Pub., 1982. [p. 29]
[CTUW14] Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan. Faster private release of marginals on small databases. In Innovations in Theoretical Computer Science, ITCS'14, Princeton, NJ, USA, January 12-14, 2014, pages 387–402, 2014. [p. 3]
[DP08] Matei David and Toniann Pitassi. Separating NOF communication complexity classes RP and NP. Electronic Colloquium on Computational Complexity (ECCC), 15(014), 2008. [p. 3]
[DPV09] Matei David, Toniann Pitassi, and Emanuele Viola. Improved separations between nondeterministic and randomized multiparty communication. TOCT, 1(2), 2009. [p. 3]
[FHH+14] Yuval Filmus, Hamed Hatami, Steven Heilman, Elchanan Mossel, Ryan O'Donnell, Sushant Sachdeva, Andrew Wan, and Karl Wimmer. Real analysis in computer science: A collection of open problems, 2014. [p. 61]
[Gal14] François Le Gall. Improved quantum algorithm for triangle finding via combinatorial arguments. In , pages 216–225, 2014. [p. 60]
[GKP94] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd edition, 1994. [p. 37]
[GS10] Dmitry Gavinsky and Alexander A. Sherstov. A separation of NP and coNP in multiparty communication complexity. Theory of Computing, 6(1):227–245, 2010. [pp. 3, 54]
[GV11] Oded Goldreich and Salil P Vadhan. On the complexity of computational problems regarding distributions (a survey). In Electronic Colloquium on Computational Complexity (ECCC), volume 18, page 4, 2011. [pp. 58, 59]
[HLŠ07] Peter Høyer, Troy Lee, and Robert Špalek. Negative weights make adversaries stronger. In Proceedings of the 39th Symposium on Theory of Computing (STOC 2007), pages 526–535, 2007. [pp. 3, 4]
[KKMS08] Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777–1805, 2008. [pp. 3, 61]
[KLS96] Jeff Kahn, Nathan Linial, and Alex Samorodnitsky. Inclusion-exclusion: Exact and approximate. Combinatorica, 16(4):465–477, 1996. [p. 3]
[KN16] Robin Kothari and Ashwin Nayak. Quantum algorithms for matrix multiplication and product verification. In Encyclopedia of Algorithms, pages 1673–1677. Springer, 2016. [p. 60]
[KS04] Adam R. Klivans and Rocco A. Servedio. Learning DNF in time 2^{Õ(n^{1/3})}. J. Comput. Syst. Sci., 68(2):303–318, 2004. [pp. 3, 61]
[KS06] Adam R. Klivans and Rocco A. Servedio. Toward attribute efficient learning of decision lists and parities. Journal of Machine Learning Research, 7:587–602, 2006. [p. 3]
[KŠdW07] Hartmut Klauck, Robert Špalek, and Ronald de Wolf. Quantum and classical strong direct product theorems and optimal time-space tradeoffs. SIAM Journal on Computing, 36(5):1472–1493, 2007. [p. 4]
[KT14] Varun Kanade and Justin Thaler. Distribution-independent reliable learning. In Maria-Florina Balcan and Csaba Szepesvári, editors, Proceedings of The 27th Conference on Learning Theory, COLT 2014, Barcelona, Spain, June 13-15, 2014, volume 35 of JMLR Proceedings, pages 3–24. JMLR.org, 2014. [p. 3]
[Lee09] Troy Lee. A note on the sign degree of formulas. CoRR, abs/0909.4607, 2009. [pp. 10, 17]
[LM04] Sophie Laplante and Frédéric Magniez. Lower bounds for randomized and quantum query complexity using Kolmogorov arguments. In Proceedings of the 19th Conference on Computational Complexity, pages 294–304, June 2004. [p. 3]
[LMR+11] Troy Lee, Rajat Mittal, Ben W. Reichardt, Robert Špalek, and Mario Szegedy. Quantum query complexity of state conversion. In Proceedings of the 52nd Symposium on Foundations of Computer Science (FOCS 2011), pages 344–353, 2011. [pp. 3, 4]
[LS09a] Troy Lee and Adi Shraibman. An approximation algorithm for approximation rank. In Proceedings of the 24th Annual IEEE Conference on Computational Complexity, CCC 2009, Paris, France, 15-18 July 2009, pages 351–357, 2009. [p. 3]
[LS09b] Troy Lee and Adi Shraibman. Disjointness is hard in the multiparty number-on-the-forehead model. Computational Complexity, 18(2):309–336, 2009. Preliminary version in CCC
IEEE Transactions on Information Theory, 65(5):2899–2921, 2018. [pp. 4, 6, 7]
[Mon16] Ashley Montanaro. The quantum complexity of approximating the frequency moments. Quantum Information & Computation, 16(13&14):1169–1190, 2016. [p. 6]
[MP69] Marvin Minsky and Seymour Papert. Perceptrons - an introduction to computational geometry. MIT Press, 1969. [pp. 3, 8, 14]
[MSS07] Frédéric Magniez, Miklos Santha, and Mario Szegedy. Quantum algorithms for the triangle problem. SIAM J. Comput., 37(2):413–424, 2007. [p. 60]
[NS94] Noam Nisan and Mario Szegedy. On the degree of Boolean functions as real polynomials. Computational Complexity, 4:301–313, 1994. [pp. 14, 25]
[OS10] Ryan O'Donnell and Rocco A. Servedio. New degree bounds for polynomial threshold functions. Combinatorica, 30(3):327–358, 2010. [pp. 3, 37]
[Rei11] Ben W Reichardt. Reflections for quantum query algorithms. In Proceedings of the twenty-second annual ACM-SIAM Symposium on Discrete Algorithms, pages 560–569. Society for Industrial and Applied Mathematics, 2011. [pp. 3, 4]
[RS10] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of AC^0. SIAM J. Comput., 39(5):1833–1855, 2010. [pp. 10, 20]
[RY15] Anup Rao and Amir Yehudayoff. Simplified lower bounds on the multiparty communication complexity of disjointness. In , pages 88–101, 2015. [p. 3]
[She08] Alexander A. Sherstov. Communication lower bounds using dual polynomials. Bulletin of the EATCS, 95:59–93, 2008. [p. 3]
[She09a] Alexander A. Sherstov. Approximate inclusion-exclusion for arbitrary symmetric functions. Computational Complexity, 18(2):219–247, 2009. [p. 3]
[She09b] Alexander A. Sherstov. Separating AC0 from depth-2 majority circuits. SIAM J. Comput., 38(6):2113–2129, 2009. [pp. 3, 8]
[She11] Alexander A. Sherstov. The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011. Preliminary version in STOC
Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 525–548, 2012. [p. 3]
[She13a] Alexander A. Sherstov. Approximating the AND-OR Tree. Theory of Computing, 9(20):653–663, 2013. [pp. 8, 10, 40, 45]
[She13b] Alexander A. Sherstov. Communication lower bounds using directional derivatives. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 921–930. ACM, 2013. [p. 3]
[She13c] Alexander A. Sherstov. The intersection of two halfspaces has high threshold degree. SIAM J. Comput., 42(6):2329–2374, 2013. [pp. 8, 10, 17, 46]
[She13d] Alexander A. Sherstov. Making polynomials robust to noise. Theory of Computing, 9:593–615, 2013. [p. 9]
[She14] Alexander A. Sherstov. Breaking the Minsky-Papert barrier for constant-depth circuits. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 223–232. ACM, 2014. [pp. 8, 53]
[She15] Alexander A. Sherstov. The power of asymmetry in constant-depth circuits. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 431–450, 2015. [pp. 5, 8]
[She18] Alexander A. Sherstov. Algorithmic polynomials. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 311–324, 2018. [pp. 4, 5, 6, 61]
[Špa08] Robert Špalek. A dual polynomial for OR. CoRR, abs/0803.4516, 2008. [pp. 8, 36]
[ŠS06] Robert Špalek and Mario Szegedy. All quantum adversary methods are equivalent. Theory of Computing, 2(1):1–18, 2006. [p. 4]
[STT12] Rocco A. Servedio, Li-Yang Tan, and Justin Thaler. Attribute-efficient learning and weight-degree tradeoffs for polynomial threshold functions. In COLT, pages 14.1–14.19, 2012. [p. 3]
[SV03] Amit Sahai and Salil Vadhan. A complete problem for statistical zero knowledge. Journal of the ACM (JACM), 50(2):196–249, 2003. [p. 60]
[SZ09] Yaoyun Shi and Yufan Zhu. Quantum communication complexity of block-composed functions. Quantum Information & Computation, 9(5):444–460, 2009. [pp. 8, 10, 17]
[Tal14] Avishay Tal. Shrinkage of De Morgan formulae by spectral techniques. In , pages 551–560, 2014. [p. 3]
[Tal17] Avishay Tal. Formula lower bounds via the quantum method. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 1256–1268, 2017. [p. 3]
[Tha16] Justin Thaler. Lower Bounds for the Approximate Degree of Block-Composed Functions. In , volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 17:1–17:15, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. [pp. 8, 9]
[TUV12] Justin Thaler, Jonathan Ullman, and Salil P. Vadhan. Faster algorithms for privately releasing marginals. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, volume 7391 of Lecture Notes in Computer Science, pages 810–821. Springer, 2012. [p. 3]
[Vad99] Salil Pravin Vadhan. A study of statistical zero-knowledge proofs. PhD thesis, Massachusetts Institute of Technology, 1999. [pp. 11, 58, 59]
[Vio17] Emanuele Viola. Lecture notes for Emanuele Viola's Fall 2017 course at Northeastern University on Special Topics in Complexity Theory, 2017. Scribe: Biswaroop Maiti. Guest lecture by Justin Thaler. Available online at . [p. 21]
[VV11] Gregory Valiant and Paul Valiant. Estimating the unseen: An n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC '11, pages 685–694, New York, NY, USA, 2011. ACM. [pp. 6, 7]
[Zha05] Shengyu Zhang. On the power of Ambainis lower bounds. Theoretical Computer Science, 339(2):241–256, 2005. [pp. 3, 4]
[Zha15] Mark Zhandry. A note on the quantum collision and set equality problems.