Approximate degree, secret sharing, and concentration phenomena

Abstract

The $\epsilon$-approximate degree $\widetilde{\deg}_\epsilon(f)$ of a Boolean function $f$ is the least degree of a real-valued polynomial that approximates $f$ pointwise to error $\epsilon$. The approximate degree of $f$ is at least $k$ iff there exists a pair of probability distributions, also known as a dual polynomial, that are perfectly $k$-wise indistinguishable, but are distinguishable by $f$ with advantage $1-\epsilon$. Our contributions are:

• We give a simple new construction of a dual polynomial for the AND function, certifying that $\widetilde{\deg}_\epsilon(\mathrm{AND}) \ge \Omega(\sqrt{n \log 1/\epsilon})$. This construction is the first to extend to the notion of weighted degree, and yields the first explicit certificate that the $1/3$-approximate degree of any read-once DNF is $\Omega(\sqrt{n})$.

• We show that any pair of symmetric distributions on $n$-bit strings that are perfectly $k$-wise indistinguishable are also statistically $K$-wise indistinguishable with error at most $K^{3/2} \cdot \exp(-\Omega(k^2/K))$ for all $k < K < n/64$. This implies that any symmetric function $f$ is a reconstruction function with constant advantage for a ramp secret sharing scheme that is secure against size-$K$ coalitions with statistical error $K^{3/2} \exp(-\Omega(\widetilde{\deg}_{1/3}(f)^2/K))$ for all values of $K$ up to $n/64$ simultaneously. Previous secret sharing schemes required that $K$ be determined in advance, and only worked for $f = \mathrm{AND}$.

Our analyses draw new connections between approximate degree and concentration phenomena. As a corollary, we show that for any $d < n/64$, any degree $d$ polynomial approximating a symmetric function $f$ to error $1/3$ must have $\ell_1$-norm at least $K^{-3/2} \exp(\Omega(\widetilde{\deg}_{1/3}(f)^2/d))$, which we also show to be tight for any $d > \widetilde{\deg}_{1/3}(f)$. These upper and lower bounds were also previously only known in the case $f = \mathrm{AND}$.


arXiv [cs.CC], Jun

Andrej Bogdanov, Nikhil S. Mande, Justin Thaler, and Christopher Williamson
Chinese University of Hong Kong: {andrejb, chris}@cse.cuhk.edu.hk
Georgetown University: {nikhil.mande, justin.thaler}@georgetown.edu

Abstract

The $\varepsilon$-approximate degree $\widetilde{\deg}_\varepsilon(f)$ of a Boolean function $f$ is the least degree of a real-valued polynomial that approximates $f$ pointwise to within $\varepsilon$. A sound and complete certificate for approximate degree being at least $k$ is a pair of probability distributions, also known as a dual polynomial, that are perfectly $k$-wise indistinguishable, but are distinguishable by $f$ with advantage $1 - \varepsilon$. Our contributions are:

• We give a simple, explicit new construction of a dual polynomial for the AND function on $n$ bits, certifying that its $\varepsilon$-approximate degree is $\Omega(\sqrt{n \log 1/\varepsilon})$. This construction is the first to extend to the notion of weighted degree, and yields the first explicit certificate that the $1/3$-approximate degree of any (possibly unbalanced) read-once DNF is $\Omega(\sqrt{n})$. It draws a novel connection between the approximate degree of AND and anti-concentration of the Binomial distribution.

• We show that any pair of symmetric distributions on $n$-bit strings that are perfectly $k$-wise indistinguishable are also statistically $K$-wise indistinguishable with at most $K^{3/2} \cdot \exp(-\Omega(k^2/K))$ error for all $k < K \le n/64$. This bound is essentially tight, and implies that any symmetric function $f$ is a reconstruction function with constant advantage for a ramp secret sharing scheme that is secure against size-$K$ coalitions with statistical error $K^{3/2} \cdot \exp(-\Omega(\widetilde{\deg}_{1/3}(f)^2/K))$ for all values of $K$ up to $n/64$ simultaneously. Previous secret sharing schemes required that $K$ be determined in advance, and only worked for $f = \mathrm{AND}$. Our analysis draws another new connection between approximate degree and concentration phenomena.

As a corollary of this result, we show that for any $d \le n/64$, any degree $d$ polynomial approximating a symmetric function $f$ to error $1/3$ must have coefficients of $\ell_1$-norm at least $K^{-3/2} \cdot \exp(\Omega(\widetilde{\deg}_{1/3}(f)^2/d))$. We also show this bound is essentially tight for any $d > \widetilde{\deg}_{1/3}(f)$. These upper and lower bounds were also previously only known in the case $f = \mathrm{AND}$.

Introduction

The $\varepsilon$-approximate degree of a function $f : \{-1,1\}^n \to \{0,1\}$, denoted $\widetilde{\deg}_\varepsilon(f)$, is the least degree of a multivariate real-valued polynomial $p$ such that $|p(x) - f(x)| \le \varepsilon$ for all inputs $x \in \{-1,1\}^n$. Such a $p$ is said to be an approximating polynomial for $f$. This is a central object of study in computational complexity, owing to its polynomial equivalence to many other complexity measures including sensitivity, exact degree, deterministic and randomized query complexity [21], and quantum query complexity [6].

By linear programming duality, $f$ has $\varepsilon$-approximate degree more than $k$ if and only if there exists a pair of probability distributions $\mu$ and $\nu$ over the domain of $f$ such that $\mu$ and $\nu$ are perfectly $k$-wise indistinguishable (i.e., all $k$-wise projections of $\mu$ and $\nu$ are identical), but are $(1 - \varepsilon)$-distinguishable by $f$, namely $\mathbb{E}_{X \sim \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)] \ge 1 - \varepsilon$. Said equivalently, a sound and complete certificate for $\varepsilon$-approximate degree being more than $k$ is a dual polynomial $q = (\mu - \nu)/2$ that contains no monomials of degree $k$ or less, and such that $\sum_x |q(x)| = 1$ and $\sum_x q(x) f(x) \ge \varepsilon$.

Dual polynomials have immediate applications to cryptographic secret sharing: a dual polynomial $q = (\mu - \nu)/2$ for $f$ is a description of a cryptographic scheme for sharing a 1-bit secret amongst $n$ parties, where the secret can be reconstructed by applying $f$ to the shares, and the scheme is secure against coalitions of size $k$ (see [4] for details).

Motivation for explicit constructions of dual polynomials.

Recent years have seen significant progress in proving new approximate degree lower bounds by explicitly constructing dual polynomials exhibiting the lower bound [7, 8, 10–12, 25, 26, 28]. These new lower bounds have in turn resolved significant open questions in quantum query complexity and communication complexity. At the technical core of these results are techniques for constructing a dual polynomial for composed functions $f \circ g := f(g, \dots, g)$, given dual polynomials for $f$ and $g$ individually.

Often, an explicitly constructed dual polynomial showing that $\widetilde{\deg}_\varepsilon(g) \ge d$ exhibits additional metric properties, beyond what is required simply to witness $\widetilde{\deg}_\varepsilon(g) \ge d$. Much of the major recent progress in proving approximate degree lower bounds has exploited these additional metric properties [7, 11, 12, 28]. Accordingly, even in cases where an approximate degree lower bound for a function $g$ is known, it can often be useful to construct an explicit dual polynomial witnessing the lower bound. Hence, we are optimistic that the new constructions of dual polynomials given in this work will find future applications.

Explicit constructions of dual polynomials are also necessary to implement the corresponding secret-sharing scheme, and to analyze the complexity of the algorithm that samples the shares of the secret.

Our results in a nutshell.

Our results fall into two categories. In the first category, we reprove several known approximate degree lower bounds by giving the first explicit constructions of dual polynomials witnessing the lower bounds. Specifically, our dual polynomial certifies that the $\varepsilon$-approximate degree of the $n$-bit AND function is $\Theta(\sqrt{n \log 1/\varepsilon})$. This construction is the first to extend to the notion of weighted degree, and yields the first explicit certificate that the $1/3$-approximate degree of any (possibly unbalanced) read-once DNF is $\Omega(\sqrt{n})$. Interestingly, our dual polynomial construction draws a novel and clean connection between the approximate degree of AND and anti-concentration of the Binomial distribution.

In the second category, we prove new and tight results about the size of the coefficients of polynomials that approximate symmetric functions. Specifically, we show that for any $d \le n/64$, any degree $d$ polynomial approximating $f$ to error $1/3$ must have coefficients of weight ($\ell_1$-norm) at least $d^{-3/2} \cdot \exp(\Omega(\widetilde{\deg}_{1/3}(f)^2/d))$. We show this bound is tight (up to logarithmic factors in the exponent) for any $d > \widetilde{\deg}_{1/3}(f)$. These bounds were previously only known in the case $f = \mathrm{AND}$ [5, 24]. Our analysis

In this work, for convenience we also consider functions mapping $\{0,1\}^n$ to $\{0,1\}$.

AND

To describe our dual polynomial for AND, it will be convenient to consider the AND function to have domain $\{-1,1\}^n$ and range $\{0,1\}$, with $\mathrm{AND}(x) = 1$ if and only if $x = 1^n$. In their seminal work, Nisan and Szegedy [21] proved that the $1/3$-approximate degree of the AND function on $n$ inputs is $\Theta(\sqrt{n})$. More generally, it is now well-known that the $\varepsilon$-approximate degree of AND is $\Theta(\sqrt{n \log(1/\varepsilon)})$ [6, 16]. These works do not construct explicit dual polynomials witnessing the lower bounds; this was achieved later in works of Špalek [29] and Bun and Thaler [8].

Our first contribution is the construction of a new dual polynomial $\phi$ for AND, which is simple enough to describe in a single equation:

$$\phi(x) = \frac{(-1)^n}{Z} \Big( \prod_{i \in [n]} x_i \Big) \Big( \mathbb{E}_S \prod_{i \in S} x_i \Big)^2. \tag{1}$$

Here, $S$ is a random subset of $\{1, \dots, n\}$ of size at most $(n - d)/2$ (where $d$ determines the degree of the polynomials against which the exhibited lower bound holds), and $Z$ is an (explicit) normalization constant. In the language of secret sharing, to share a secret $s \in \{-1,1\}$, the dealer samples shares $x \in \{-1,1\}^n$ with probability proportional to $(\mathbb{E}_S \prod_{i \in S} x_i)^2$, conditioned on the parity of the shares $\prod_i x_i$ being equal to $s$.

In Corollary 2.2 we show that $\phi$ certifies that every degree-$d$ polynomial must differ from the AND function by $2^{-n} \sum_{k=0}^{(n-d)/2} \binom{n}{k}$ at some input. In other words, the approximation error of a degree-$d$ polynomial is lower bounded by the probability that a sum of unbiased independent bits deviates from its mean by $d/2$.

Our function $\phi$ given in (1), unlike previous dual polynomials [10, 16, 27, 29], also certifies that the weighted $1/3$-approximate degree of AND with weights $w \in \mathbb{R}^n_{\ge 0}$ is $\Omega(\|w\|_2)$ (see Corollary 2.3). This lower bound is tight for all $w$, matching an upper bound of Ambainis [1].
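The construction in Equation (1) can be checked by brute force for small $n$. The following Python sketch is illustrative only and rests on stated assumptions: the helper names (`dual_witness`, `is_pure_high_degree`, `tail_probability`) are hypothetical, the inner product is taken to be the plain sum over all inputs, and the sign prefactor of Equation (1) is omitted so that $\phi(1^n)$ is positive for every $n$ under the convention $\mathrm{AND}(x) = 1$ iff $x = 1^n$. It builds $\phi$ with exact rational arithmetic for a weight vector $w$ and exposes the quantities one would compare: the $\ell_1$ norm, orthogonality to low-weight monomials, and the correlation $\phi(1^n)$ against $\Pr[\langle w, X \rangle \ge d]$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def chi(S, x):
    """Parity chi_S(x): product of x_i over i in S (the empty product is 1)."""
    return prod(x[i] for i in S)


def weight(S, w):
    """Weight w(S): sum of the weights of the variables in S."""
    return sum(w[i] for i in S)


def all_subsets(n):
    return [S for r in range(n + 1) for S in combinations(range(n), r)]


def dual_witness(w, d):
    """Brute-force phi for weights w and degree parameter d (needs d <= sum(w)).

    Assumption: the sign prefactor of Equation (1) is dropped, see the lead-in.
    Returns (phi, Z, H) with phi a dict from points of {1,-1}^n to Fractions.
    """
    n = len(w)
    budget = Fraction(sum(w) - d, 2)
    H = [S for S in all_subsets(n) if weight(S, w) <= budget]
    raw = {}
    for x in product((1, -1), repeat=n):
        avg = Fraction(sum(chi(S, x) for S in H), len(H))  # E_S[chi_S(x)]
        raw[x] = chi(range(n), x) * avg ** 2
    Z = sum(abs(v) for v in raw.values())  # normalization constant
    return {x: v / Z for x, v in raw.items()}, Z, H


def is_pure_high_degree(phi, w, d):
    """True if phi is orthogonal to every monomial of weight below d."""
    low = [T for T in all_subsets(len(w)) if weight(T, w) < d]
    return all(sum(v * chi(T, x) for x, v in phi.items()) == 0 for T in low)


def tail_probability(w, d):
    """Pr[<w, X> >= d] for X uniform over {1,-1}^n, by enumeration."""
    n = len(w)
    hits = sum(
        1
        for x in product((1, -1), repeat=n)
        if sum(wi * xi for wi, xi in zip(w, x)) >= d
    )
    return Fraction(hits, 2 ** n)


if __name__ == "__main__":
    w, d = [1, 1, 1, 1], 2
    phi, Z, H = dual_witness(w, d)
    print("l1 norm:", sum(abs(v) for v in phi.values()))
    print("pure high degree:", is_pure_high_degree(phi, w, d))
    print("phi(1^n):", phi[(1, 1, 1, 1)], "tail:", tail_probability(w, d))
```

For unit weights with $n = 4$ and $d = 2$, the family of admissible sets consists of the empty set and the four singletons, so the correlation and the tail probability both come out to $5/16$. The enumeration is exponential in $n$ and is meant only as a check of the algebra, not as a way to sample shares.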
The only difference in our dual polynomial construction for the weighted case is in the distribution over sets $S$, and the lower bound in the weighted case is derived from anti-concentration of weighted sums of Bernoulli random variables.

Both statements are corollaries of the following theorem.

Theorem 1.1. Define $\mathrm{AND} : \{-1,1\}^n \to \{0,1\}$ as $\mathrm{AND}(x) = 1$ if and only if $x = 1^n$. The function $\phi$ defined in Equation (1) is a dual witness for $\widetilde{\deg}_{w,\varepsilon}(\mathrm{AND}) \ge d$ for $\varepsilon = \Pr_{X \sim \{-1,1\}^n}[\langle w, X \rangle \ge d]$.

By combining, in a black-box manner, the dual polynomial for the weighted-approximate degree of AND with prior work (e.g., [17, Proof of Theorem 7]), one obtains, for any read-once DNF $f$, an explicit dual polynomial for the fact that $\widetilde{\deg}_{1/3}(f) \ge \Omega(n^{1/2})$. Very recent work of Ben-David et al. [2] established this result for the first time, shaving logarithmic factors off of prior work [10, 17]. In fact, Ben-David et al. [2] prove more generally that any depth-$d$ read-once AND-OR formula has approximate degree $2^{-O(d)} \sqrt{n}$. Their method, however, does not appear to yield an explicit dual polynomial, even in the case $d = 2$.

Discussion.

It has been well known that the ε -approximate degree of the AND function on n variablesis Θ (cid:16)p n log(1 /ε ) (cid:17) [6, 21], a fact which has many applications in theoretical computer science. This is For a polynomial p ( x , . . . , x n ) , a weight vector w ∈ R n ≥ assigns weight w i to variable x i . The weighted degree of p is themaximum weight over all monomials appearing in p , where the weight of a monomial is the sum of the weights of the variablesappearing within it. The weighted ε -approximate degree of f , denoted g deg w,ε ( f ) , is the least weighted degree of any polynomialthat approximates f pointwise to error ε . Θ (cid:16)p n log(1 /ε ) (cid:17) layers of theHamming cube contain a − ε fraction of all inputs (i.e., “most” n -bit strings have Hamming weight closeto n/ ). However, these two phenomena have not previously been connected, and it is not a priori clear whyapproximate degree should be related to concentration of measure. An approximating polynomial p for f must approximate f at all inputs in {− , } n . Why should it matter that most (but very far from all) inputshave Hamming weight close to n/ ?The new dual witness for AND constructed in Equation (1) above provides a surprising answer to thisquestion. The connection between (anti-)concentration and approximate degree of

AND arises not becauseof the number of inputs to f that have Hamming weight close to n/ , but because of the number of parityfunctions on n bits that have degree close to n/ . This connection appears to be rather deep, as evidencedby our construction’s ability to yield a tight lower bound in the case of weighted approximate degree. In this section, for convenience we consider functions mapping { , } n to { , } . Two distributions µ and ν over { , } n are (statistically) ( k, δ ) -wise indistinguishable if for all subsets S ⊆ { , . . . , n } of size k , theinduced marginal distributions µ | S and ν | S are within statistical distance δ . When δ = 0 , we say they are (perfectly) k -wise indistinguishable .For general pairs of distributions, perfect k -wise indistinguishability does not imply any sort of securityagainst distinguishers of size k + 1 . Any binary linear error-correcting code of distance k + 1 and blocklength n induces a pair of distributions (the uniform distribution over codewords and one of its afﬁne shifts)that are perfectly k -wise indistinguishable, yet perfectly ( k + 1) -wise distinguishable.In contrast, we prove that perfect k -wise indistinguishability for symmetric distributions implies strongstatistical security against larger adversaries: Theorem 1.2. If µ and ν are symmetric over { , } n and perfectly k -wise indistinguishable, then they arestatistically ( K, O ( K / ) · e − k / K ) -wise indistinguishable for all ≤ k < K ≤ n/ . Theorem 1.2 has the following direct consequence for secret sharing schemes over bits with symmetricreconstruction. We say ( µ, ν ) are α -reconstructible by f if E X ∼ µ [ f ( X )] − E Y ∼ ν [ f ( Y )] ≥ α . Corollary 1.3.

Let f be a symmetric Boolean function. There exists a pair of distributions µ and ν that are (cid:16) K, K / · e − Ω( g deg / ( f ) /K ) (cid:17) -indistinguishable for all K ≤ n/ , but are Ω(1) -reconstructible by f . Corollary 1.3 is an immediate consequence of our Theorem 1.2, and the fact that any symmetric functionhas an optimal dual polynomial that is itself symmetric. In the special case f = AND (or equivalently f = OR ), Corollary 1.3 implies the existence of a visual secret sharing scheme (see, for example [20])that is (cid:0) K, K / · e − Ω( n/K ) (cid:1) -statistically secure against all coalitions of size K , simultaneously for all K up to size n/ . This property, where security guarantees are in place for many coalition sizes at the sametime, is in contrast to an earlier result of Bogdanov and Williamson [5] where they proved that for any ﬁxedcoalition size K , there is a visual secret sharing scheme that is ( K, e − Ω( n/K ) ) -statistically secure. In theirconstruction, the distribution of shares µ and ν depend on the value of K .We remark that the bound of Corollary 1.3 cannot hold in general for K = n , since there exists distribu-tions that are perfectly Ω( n ) -wise indistinguishable but are reconstructible by the majority function on all n inputs. We do not however know if a bound of the form K ≤ (1 − Ω(1)) n is tight in this context.3 ight weight-degree tradeoffs for polynomials approximating symmetric functions. Let f : { , } n →{ , } be any function. For any integer d ≥ , denote by W ε ( f, d ) the minimum weight of any degree- d polynomial that approximates f pointwise to error ε . By the weight of a polynomial, we mean the ℓ -norm of its coefﬁcients over the parity (Fourier) basis . In Section 4, we observe that Corollary 1.3 impliesweight-degree trade-off lower bounds for symmetric functions. Corollary 1.4.

For any symmetric function f : { , } n → { , } , any constant ε ∈ (0 , / , and any integer K such that n/ ≥ K ≥ g deg ε ( f ) , we have W ε ( f, K ) ≥ K − / · Ω (cid:16)g deg / ( f ) /K (cid:17) . The following theorem shows that the lower bound obtained in Corollary 1.4 is tight (up to polyloga-rithmic factors in the exponent) for all symmetric functions.

Theorem 1.5.

For any symmetric function f : { , } n → { , } , any constant ε ∈ (0 , / and K > g deg ε ( f ) · √ log n , W ε ( f, K ) ≤ ˜ O ( g deg / ( f ) /K ) . Theorem 1.5 also implies that Corollary 1.3 is tight (up to polylogarithmic factors in the exponent) forall symmetric f and for all K ≥ g deg / ( f ) √ log n . This is because any improvement to Corollary 1.3 wouldyield an improvement to Corollary 1.4, contradicting Theorem 1.5. Essentially Optimal Ramp Visual Secret Sharing Schemes.

The following result shows that in the case f = AND , Corollary 1.3 is essentially tight for all K ≥ , and Theorem 1.2 is tight as a reduction fromperfect to approximate indistinguishability for symmetric distributions. It does so by constructing essentiallyoptimal ramp visual secret sharing schemes. Theorem 1.6.

For all ≤ k < K ≤ n there exist symmetric k -wise indistinguishable distributions µ and ν over n -bit strings that are q − K +3 · P d>k (cid:0) KK + d (cid:1) -reconstructible by AND K , where AND K ( x ) is the AND of the ﬁrst K bits of x .Discussion of Theorem 1.6 . This theorem gives the existence of a ramp visual secret sharing scheme thatis perfectly secure against any k parties, but in which any K > k parties can reconstruct the secret withthe above advantage. This generalizes the schemes in [5] where only reconstruction by all n parties wasconsidered.Let us express the reconstruction advantage appearing in Theorem 1.6 in a manner more easily compa-rable to other results in this manuscript. Standard results on anti-concentration of the Binomial distributionstate that − K · P d>k (cid:0) KK + d (cid:1) = e − Θ( k /K ) (see, e.g., [18]). The Cauchy-Schwarz inequality then impliesthat the reconstruction advantage appearing in Theorem 1.6 is at least K − / · e − O ( k /K ) . In fact, our main weight lower bound (Corollary 1.4) holds over any set of functions (not just parities) that each depend on atmost d variables. Here and throughout, the ˜ O notation hides polylogarithmic factors in n . A visual secret sharing scheme is a scheme where the reconstruction function is the

AND of some subset of the shares. Aramp scheme is one where there is not necessarily a sharp threshold between the perfect secrecy and reconstruction thresholds; inparticular, we allow for

K > k + 1 . Theorem 1.6 is closely related to Theorem 1.1, in that Theorem 1.6 gives another anti-concentration-based proof that g deg ε ( AND K ) ≥ k for ε = K − / · e − Θ( k /K ) . However, the two results are incomparable. Theorem 1.6 does not yield anexplicit dual polynomial for AND K , and the ε -approximate degree lower bound for AND K implied by Theorem 1.6 is loose by the K − / factor appearing in the expression for ε . On the other hand, Theorem 1.1 only yields a visual secret sharing scheme withreconstruction by all n parties, while Theorem 1.6 yields a ramp scheme with non-trivial reconstruction advantage by the AND ofthe ﬁrst K (out of n ) parties. ( K ) factor (or the constant factor in theexponent), then this would contradict Theorem 1.2 which upper bounds the distinguishing advantage of anystatistical test over K bits against symmetric, perfectly k -wise indistinguishable distributions. Theorem 1.6also shows that the indistinguishability parameter in Theorem 1.2 cannot be signiﬁcantly improved, even inthe restricted case where the only statistical test is AND K .In Section 6 we describe another application of Theorem 1.2 to security against share consolidation and“downward self-reducibility” of visual secret shares. Prior Work.

Servedio, Tan, and Thaler [24] established Corollary 1.4 and Theorem 1.5 in the special case $f = \mathrm{OR}$, showing that degree $d$ polynomials that approximate the OR function require weight $2^{\tilde{\Theta}(n/d)} = 2^{\tilde{\Theta}(\widetilde{\deg}_{1/3}(\mathrm{OR})^2/d)}$. They used this result to establish tight weight-degree tradeoffs for polynomial threshold functions computing decision lists. As previously mentioned, Bogdanov and Williamson [5] generalized the weight-vs-degree lower bound from [24] beyond polynomials, thereby obtaining a visual secret-sharing scheme for any fixed $K$ that is $(K, e^{-\Omega(n/K)})$-statistically secure.

Elkies [14] and Sachdeva and Vishnoi [23] exploit concentration of measure to prove a tight upper bound on the degree of univariate polynomials that approximate the function $t \mapsto t^n$ over the domain $[-1, 1]$. Their techniques inspired our (much more technical) proof of Theorem 1.2.

Other Related Work.

This work subsumes Bogdanov's manuscript [3], which shows a slightly weaker lower bound on the weighted approximate degree of AND, and does not derive an explicit dual polynomial. In independent work, Huang and Viola [15] prove a weaker form of our Corollary 1.3: their distributions $\mu, \nu$ depend on the value of $K$. They also prove (a slightly tighter version of) Theorem 1.5, thereby establishing that the statistical distance in Corollary 1.3 is tight.

The proof of Theorem 1.1 (Section 2) is an elementary verification that the function $\phi$ given in (1) is a dual polynomial. The only property that is not immediate is correlation with AND. Verifying this property amounts to upper bounding the normalization constant $Z$, which follows from orthogonality of the Fourier characters.

In the proof of Theorem 1.2 (Section 3), a $K$-bit statistical distinguisher for symmetric distributions is first decomposed into a sum of at most $K + 1$ tests $Q_w$ that evaluate to 1 only when the input has Hamming weight exactly $w$. Lemma 3.3 shows that the univariate symmetrizations $p_w$ of these distinguishers can be pointwise approximated by a degree-$k$ polynomial with error at most $O(K^{1/2}) \cdot e^{-\Omega(k^2/K)}$.

To construct the desired approximation, we derive an identity relating the moment generating function of the squared Chebyshev coefficients of $p_w$ (interpreted as relative probabilities) to the average magnitude of a polynomial $g$ related to $p_w$ on the unit complex circle (Claims 3.6 and 3.7). We bound these magnitudes analytically (Claim 3.8) and derive tail inequalities for the Chebyshev coefficients from bounds on the moment generating function as in standard proofs of Chernoff-Hoeffding bounds.

These bounds for OR were implicit in [24], but not explicitly highlighted. The upper bound was explicitly stated in [13, Lemma 4.1], which gave applications to differential privacy, and the lower bound in [9, Lemma 32], which used it to establish tight weight-degree tradeoffs for polynomial threshold functions computing read-once DNFs.

5n the special case when the secrecy parameters k and K are ﬁxed and the number of parties n ap-proaches inﬁnity, p w ( t ) turns out to equal C w ( t − w ( t + 1) K − w , where C w is some quantity indepen-dent of t . In this case, the Chebyshev coefﬁcients are the regular coefﬁcients of the polynomial g ∞ ( s ) =2 − w C w ( s − w ( s + 1) K − w ) . When w = 0 , K/ , or , the coefﬁcients of g ∞ are exponentially con-centrated around the middle as they follow the binomial distribution. We prove that this exponential decayin magnitudes happens for all values of w , which requires understanding complicated cancellations in thealgebraic expansion of g ∞ ( s ) . We generalize this analysis to the ﬁnitary setting n ≥ K .We prove Theorem 1.5 (Section 4) by writing any symmetric function f as a sum of at most ℓ :=min {| f − (0) | , | f − (1) |} many conjunctions, and approximating each conjunction to such low error (namelyerror ≪ ℓ ) that the sum of all approximations is an approximation for f itself. Theorem 1.5 then follows byconstructing low-weight, low-degree polynomial approximations for each conjunction in the sum.Theorem 1.6 (Section 5) is proved by lower bounding the error of degree k polynomial approximationsto the symmetrization f of the function AND K (cid:0) x | { ,...,K } (cid:1) . By duality, a lower bound on approximationerror translates into a secret sharing scheme with the same reconstruction advantage. To lower bound theerror, we estimate the values of the coefﬁcients in the Chebyshev expansion of f with indices larger than k . Owing to orthogonality, the largest of these coefﬁcients lower bounds the approximation error of anydegree- k polynomial.In Section 6 we formulate a security of secret sharing against consolidation and downward self-reducibilityof visual schemes, and derive these properties from the main results. In this section we prove Theorem 1.1 and derive its two corollaries about the unweighted and weightedapproximate degree of AND.

Notation and Deﬁnitions.

Let [ n ] = { , . . . , n } . Given a vector w ∈ R n ≥ , deﬁne the weight of a monomial χ S ( x ) = Q i ∈ S x i , x i ∈ {− , } to equal P i ∈ S w i . Deﬁne the w -weighted degree of a polynomial to be themaximum weight of a monomial in it. That is, if p = P S ⊆ [ n ] c S χ S , then deﬁne deg w ( p ) = max S : c S =0 w ( S ) . Deﬁne the w -weighted ε -approximate degree g deg w,ε ( f ) to be the minimum w -weighted degree of a poly-nomial p that satisﬁes | p ( x ) − f ( x ) | ≤ ε for all x in the domain of f . Given two real-valued functions f, g over domain {− , } n , deﬁne h f, g i := n P x ∈{− , } n f ( x ) · g ( x ) . Lemma 2.1.

For any finite set $X$ and any function $f : X \to \mathbb{R}$, $\widetilde{\deg}_{w,\varepsilon}(f) \ge d$ iff there exists a function $\phi : X \to \mathbb{R}$ satisfying the following conditions.

• Pure high degree: For any real polynomial $p$ of weighted degree at most $d$, $\langle \phi, p \rangle = 0$.
• Normalization: $\sum_{x \in X} |\phi(x)| = 1$.
• Correlation: $\langle \phi, f \rangle \ge \varepsilon$.

We call $\phi$ a dual witness for $\widetilde{\deg}_{w,\varepsilon}(f) \ge d$. The lemma follows by linear programming duality and is a straightforward generalization of previous results (see e.g. [10, 29]). We prove the "if" direction, which is sufficient for our purposes.

The $i$-th coefficient of $g_\infty$ is the value of the $i$-th Kravchuk polynomial with parameter $K$ evaluated at $w$.

Proof. For any $p$ of weighted degree at most $d$,

$$\|f - p\|_\infty = \|f - p\|_\infty \|\phi\|_1 \ge \langle \phi, f - p \rangle = \langle \phi, f \rangle - \langle \phi, p \rangle \ge \varepsilon.$$

The dual polynomial of interest is

$$\phi(x) = \frac{(-1)^n}{Z} \, \chi_{[n]}(x) \cdot \mathbb{E}_{S \sim \mathcal{H}}[\chi_S(x)]^2,$$

where $x \in \{-1,1\}^n$, $\mathcal{H}$ is the uniform distribution over the sets $\{S \subseteq [n] : w(S) \le (\|w\|_1 - d)/2\}$, and $Z$ is the normalization constant

$$Z = \sum_{x \in \{-1,1\}^n} \mathbb{E}_{S \sim \mathcal{H}}[\chi_S(x)]^2.$$

Proof of Theorem 1.1.

We prove the theorem by showing that $\phi$ satisfies the three conditions of Lemma 2.1. The expression $\mathbb{E}_{S \sim \mathcal{H}}[\chi_S(x)]^2$ can be written as a sum of products of pairs of monomials of weight at most $(\|w\|_1 - d)/2$, so its weighted degree is at most $\|w\|_1 - d$. Thus every monomial that occurs in the expansion of $\chi_{[n]}(x) \, \mathbb{E}_{S \sim \mathcal{H}}[\chi_S(x)]^2$ must have weighted degree at least $d$, and so $\phi$ has pure high weighted degree at least $d$ as desired.

The scaling by $Z$ in the definition of $\phi$ ensures that $\phi$ has $L_1$ norm 1. The correlation of $\phi$ and AND is given by $\langle \phi, \mathrm{AND} \rangle = \phi(1^n) = 1/Z$. Finally, the normalization constant $Z$ evaluates to

$$Z = \sum_{x \in \{-1,1\}^n} \mathbb{E}_{S \sim \mathcal{H}}[\chi_S(x)]^2 = \sum_{x \in \{-1,1\}^n} \mathbb{E}_{S \sim \mathcal{H}}[\chi_S(x)] \, \mathbb{E}_{T \sim \mathcal{H}}[\chi_T(x)] = \sum_{x \in \{-1,1\}^n} \mathbb{E}_{S,T \sim \mathcal{H}}[\chi_{S \Delta T}(x)] = \mathbb{E}_{S,T \sim \mathcal{H}} \sum_{x \in \{-1,1\}^n} \chi_{S \Delta T}(x) = 2^n \Pr[S = T] = \frac{2^n}{|\mathcal{H}|},$$

since the inner summation over $x$ evaluates to $2^n$ when $S = T$, and zero otherwise.

It remains to show that $1/Z = |\mathcal{H}|/2^n$ equals the desired expression for $\varepsilon$. For a set $S \subseteq [n]$, let $X(S) \in \{-1,1\}^n$ be the string that assigns values $1$ and $-1$ to elements inside and outside $S$, respectively. Then $w(S) = \|w\|_1/2 + \langle w, X(S) \rangle/2$, so

$$\frac{|\mathcal{H}|}{2^n} = \Pr_{S \subseteq [n]}\big[w(S) \ge \|w\|_1/2 + d/2\big] = \Pr_{X \sim \{-1,1\}^n}\big[\langle w, X \rangle \ge d\big],$$

where the first equality uses that complementation $S \mapsto [n] \setminus S$ maps the sets of weight at most $(\|w\|_1 - d)/2$ bijectively onto the sets of weight at least $(\|w\|_1 + d)/2$.

Corollary 2.2 (Approximate degree of AND). Recall that $\mathrm{AND} : \{-1,1\}^n \to \{0,1\}$ denotes the function satisfying $\mathrm{AND}(x) = 1$ if and only if $x = 1^n$. If $p$ has degree at most $d$, then

$$|p(x) - \mathrm{AND}(x)| \ge \Pr[X \le (n - d)/2]$$

for some $x$, where $X$ is a Binomial$(n, 1/2)$ random variable.

The expression on the right is lower bounded by the larger of $1/2 - O(d/\sqrt{n})$ and $2^{-O(d^2/n)}$. In the large $d$ regime ($d \ge \sqrt{n}$), this bound is tight [6, 16].

Proof. Apply Theorem 1.1 to the weight vector $w = (1, 1, \dots, 1)$.

Earlier constructions of dual polynomials for AND are quite different from our Corollary 2.2 [10, 16, 27, 29] and are based on real-valued polynomial interpolation. Specifically, for a carefully chosen set $T \subseteq \{0, 1, \dots, n\}$ of size $|T| = 2d$, the prior constructions consider a univariate polynomial $p(t) = \prod_{i \in [n] \setminus T} (t - i)$, and they define $\psi(x) = p(|x|)$, where $|x|$ denotes the Hamming weight of $x$. Clearly $\psi$ has degree at most $n - |T|$. A fairly complicated calculation is required to show that, for an appropriate choice of $T$, defining $\psi$ in this way ensures that $|\psi(1^n)|$ captures an $\varepsilon$-fraction of the $L_1$-mass of $\psi$.

Corollary 2.3 (Weighted approximate degree of AND). $\widetilde{\deg}_{w,3/32}(\mathrm{AND}) \ge \|w\|_2/2$.

The proof uses the Paley-Zygmund inequality:

Lemma 2.4 (Paley-Zygmund inequality). Let $Z \ge 0$ be any random variable with finite variance. Then, for any $0 < \theta < 1$,

$$\Pr[Z \ge \theta \, \mathbb{E}[Z]] \ge (1 - \theta)^2 \frac{(\mathbb{E}[Z])^2}{\mathbb{E}[Z^2]}.$$

Proof of Corollary 2.3. We apply the Paley-Zygmund inequality to $\langle w, X \rangle^2$ with $\theta = 1/4$. First, $\mathbb{E}[\langle w, X \rangle^2] = \|w\|_2^2$ and $\mathbb{E}[\langle w, X \rangle^4] = \sum_i w_i^4 + 3 \sum_{i \ne j} w_i^2 w_j^2 \le 3 \|w\|_2^4$. Then

$$\Pr\Big[\langle w, X \rangle \ge \tfrac{1}{2}\|w\|_2\Big] = \tfrac{1}{2} \Pr\Big[|\langle w, X \rangle| \ge \tfrac{1}{2}\|w\|_2\Big] = \tfrac{1}{2} \Pr\Big[\langle w, X \rangle^2 \ge \tfrac{1}{4}\|w\|_2^2\Big] \ge \tfrac{1}{2} \cdot \tfrac{9}{16} \cdot \tfrac{1}{3} = \tfrac{3}{32},$$

where the first equality follows from the sign-symmetry of $X$. Applying Theorem 1.1 with $d = \|w\|_2/2$ yields the claim.

In this section, we prove Theorem 1.2, which states that any pair of symmetric and perfectly $k$-wise indistinguishable distributions over $\{0,1\}^n$ are also approximately indistinguishable against statistical tests that observe $K > k$ of the bits. We may and will assume without loss of generality that the statistical test is a symmetric function, meaning that it depends only on the Hamming weight of the observed bits of its input.

Let $X$ and $Y$ denote an arbitrary pair of symmetric $(k, 0)$-wise indistinguishable distributions over $\{0,1\}^n$. We will be interested in obtaining an upper bound on the statistical distance of their projections to any $K$ indices of $[n]$, namely the advantage $\mathbb{E}_X[T(X|_S)] - \mathbb{E}_Y[T(Y|_S)]$ where $T : \{0,1\}^K \to \{0,1\}$ is a symmetric function and $S \subseteq [n]$ is any set of size $K$. We can decompose $T$ into a sum of tests $Q_w : \{0,1\}^K \to \{0,1\}$, where $Q_w$ outputs 1 if and only if the Hamming weight of its input is exactly $w$. Specifically, we decompose $T$ as

$$T = \sum_{w=0}^{K} b_w Q_w, \tag{2}$$

where each $b_w$ is either zero or one. We will bound the distinguishing advantage of each $Q_w$ in the sum individually. This advantage is captured by a univariate function $p_w$ that expresses $Q_w$ in terms of the Hamming weight of its input, after shifting and scaling the Hamming weight to reside in the interval $[-1, 1]$.

Fact 3.1.

Let $S \subseteq [n]$ be any set of size $K$. There exists a univariate polynomial $p_w$ of degree at most $K$ such that the following holds. For all $t \in \{-1, -1 + 2/n, \dots, 1 - 2/n, 1\}$, $p_w(t) = \mathbb{E}_Z[Q_w(Z|_S)]$ where $Z$ is a random string of Hamming weight $\phi^{-1}(t) = (1 - t)n/2 \in \{0, 1, \dots, n\}$.

Proof. This statement is a simple extension of Minsky and Papert's classic symmetrization technique [19]. Specifically, Minsky and Papert showed that for any polynomial $p_n : \{0,1\}^n \to \mathbb{R}$, there exists a univariate polynomial $P$ of degree at most the total degree of $p_n$, such that for all $i \in \{0, \dots, n\}$, $P(i) = \mathbb{E}_{|x| = i}[p_n(x)]$. Apply this result to $p_n(x) = Q_w(x|_S)$ and let $p_w(t) = P(\phi^{-1}(t)) = P((1 - t)n/2)$. The fact then follows from the observation that the total degree of $Q_w(x|_S)$ is at most $K$, since this function is a $K$-junta.

In particular, the value $p_w(t)$ is a probability for every $t \in \{-1, -1 + 2/n, \dots, 1 - 2/n, 1\}$. Moreover, this probability must equal zero when the Hamming weight of $Z$ is less than $w$ or greater than $n - K + w$. Therefore $p_w$ has $K$ distinct zeros at the points $Z_w = Z_- \cup Z_+$, where

$$Z_- = \{-1 + 2h/n : h = 0, \dots, K - w - 1\}, \qquad Z_+ = \{1 - 2h/n : h = 0, \dots, w - 1\}, \tag{3}$$

and so $p_w$ must have the form

$$p_w(t) = C_w \cdot \prod_{z \in Z_w} (t - z) \tag{4}$$

for some $C_w$ that does not depend on $t$. As $p_w(t)$ is a probability when $t \in \{-1, -1 + 2/n, \dots, 1 - 2/n, 1\}$, the function $p_w$ is 1-bounded at those inputs. In fact, $p_w$ is uniformly bounded on the interval $[-1, 1]$:

Claim 3.2.

Assuming n ≥ K , | p w ( t ) | ≤ for all t ∈ [ − , . The proof is in Section 3.4. Formula (4) and Claim 3.2 will be applied to show that p w has a gooduniform polynomial approximation on the interval [ − , . Lemma 3.3.

Assuming n ≥ K , there exists a degree- k polynomial q w such that | p w ( t ) − q w ( t ) | ≤ √ K exp( − k / K ) for all t ∈ [ − , . Lemma 3.3 is the main technical result of this section. It is proved in Section 3.1.
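The description of $p_w$ in Fact 3.1 and the zero set in (3) can be checked on small instances. The following is a minimal sketch, not part of the original argument: it evaluates $p_w$ on the grid $t = 1 - 2h/n$ as a hypergeometric probability and compares the grid points where it vanishes with $Z_- \cup Z_+$. The function names and the sample parameters are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def p_w_on_grid(n, K, w, h):
    """Probability that a uniformly random n-bit string of Hamming weight h
    has exactly w ones among K fixed positions (a hypergeometric probability)."""
    if w > h or K - w > n - h:
        return Fraction(0)
    return Fraction(comb(K, w) * comb(n - K, h - w), comb(n, h))

def phi(n, h):
    """Map a Hamming weight h in {0, ..., n} to t = 1 - 2h/n in [-1, 1]."""
    return 1 - Fraction(2 * h, n)

def zero_set(n, K, w):
    """The set Z_w = Z_- union Z_+ from equation (3)."""
    z_minus = {-1 + Fraction(2 * h, n) for h in range(K - w)}
    z_plus = {1 - Fraction(2 * h, n) for h in range(w)}
    return z_minus | z_plus

def grid_zeros(n, K, w):
    """Grid points t = phi(h) at which the probability p_w(t) is zero."""
    return {phi(n, h) for h in range(n + 1) if p_w_on_grid(n, K, w, h) == 0}

if __name__ == "__main__":
    n, K, w = 20, 5, 2  # illustrative parameters only
    print(sorted(zero_set(n, K, w)) == sorted(grid_zeros(n, K, w)))
```

Exact rational arithmetic is used so that the zeros are detected without any floating-point tolerance.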

Proof of Theorem 1.2.

Now let T be a general distinguisher on K inputs. By Facts A.1 and A.2 (see Ap-pendix), T can be assumed to be a symmetric Boolean-valued function. We bound the distinguishing advan-tage as follows. Recalling that X and Y are ( k, -indistinguishable symmetric distributions over { , } n ,for any set S ⊆ [ n ] of size K we have: E [ T ( X | S )] − E [ T ( Y | S )]= K X w =0 b w (cid:0) E [ Q w ( X | S )] − E [ Q w ( Y | S )] (cid:1) (by (2)) ≤ K X w =0 (cid:12)(cid:12) E [ Q w ( X | S )] − E [ Q w ( Y | S )] (cid:12)(cid:12) (by boundedness of b w ) = K X w =0 (cid:12)(cid:12) E [ p w ( φ ( | X | )] − E [ p w ( φ ( | Y | ))] (cid:12)(cid:12) (by symmetry of X, Y , and Fact 3.1) ≤ K X w =0 (cid:12)(cid:12) E [ q w ( φ ( | X | ))] − E [ q w ( φ ( | Y | ))] (cid:12)(cid:12) + 8 √ K exp( − k / K ) (by Lemma 3.3) = O ( K / ) · e − k / K (by k -wise indistinguishability of X, Y )Therefore, X and Y are ( K, O ( K / ) · e − k / K ) -wise indistinguishable for ≤ K ≤ n/ . p w , C w , and Z w also depend on K and n but we omit those arguments from the notation as they will be ﬁxed in the proof. .1 Proof of Lemma 3.3 We will prove Lemma 3.3 by studying the Chebyshev expansion of p w . To this end we take a brief detourinto Chebyshev polynomials and an even briefer one into Fourier analysis. Chebyshev polynomials.

The Chebyshev polynomials are a family of real polynomials $\{T_d\}$, 1-bounded on $[-1, 1]$, with $T_d$ having degree $d$. We extend the definition to negative indices by setting $T_{-d} = T_d$. The Chebyshev polynomials are orthogonal with respect to the measure $d\sigma(t) = (1 - t^2)^{-1/2}\,dt$ supported on $[-1, 1]$. Therefore every degree-$K$ polynomial $p : \mathbb{R} \to \mathbb{R}$ has a unique (symmetrized) Chebyshev expansion

$$p(t) = \sum_{d=-K}^{K} c_d T_d(t), \qquad c_{-d} = c_d,$$

where $c_{-K}, \ldots, c_K$ are the Chebyshev coefficients of $p$.

The Chebyshev polynomials satisfy the following identity, which plays an important role in our analysis:

Fact 3.4. $t \cdot T_d(t) = \tfrac{1}{2} T_{d-1}(t) + \tfrac{1}{2} T_{d+1}(t)$.

This formula, together with the "base cases" $T_0(t) = 1$ and $T_1(t) = t$, specifies all Chebyshev polynomials.

We will also need the following form of Parseval's identity for univariate polynomials.

Claim 3.5 (Parseval's identity). For every complex polynomial $h$, the sum of the squares of the magnitudes of the coefficients of $h$ equals $\mathbb{E}_z[|h(z)|^2]$, where $z$ is a random complex number of magnitude 1.

Proof outline.

We will argue that the Chebyshev expansion $\sum_{d=-K}^{K} c_d T_d(t)$ of $p_w(t)$ has small weight on the coefficients $c_d$ when $|d| > k$. Zeroing out those coefficients then yields a good degree-$k$ approximation of $p_w$, as desired.

The upper bound on the Chebyshev coefficients of $p_w$ is derived in three steps. The first step, which is of an algebraic nature, expresses the Chebyshev coefficients of $p_w$ as regular coefficients of a related polynomial $g$. We are interested in the coefficients of the derived polynomial $g_\varepsilon(s) = g((1 + \varepsilon)s)$, which represent the Chebyshev coefficients $c_d$ of $p_w$ amplified by the exponential scaling factor $(1 + \varepsilon)^d$. (We omit the dependence of $g$ on $w$, as this parameter remains constant throughout the proof.)

The second step, which is analytic, upper bounds the magnitude of the coefficients of $g_\varepsilon(s)$. The main tool is Parseval's identity, which identifies the sum of the squares of these coefficients with the average squared magnitude of $g_\varepsilon$ over the complex unit circle, $\mathbb{E}_\theta |g((1 + \varepsilon)e^{i\theta})|^2$. We bound the maximum magnitude $\max_\theta |g((1 + \varepsilon)e^{i\theta})|^2$ by explicitly analyzing the function $g$. This step comprises the bulk of our proof.

The third step translates the bound on the squared 2-norm $\sum_{d=-K}^{K} (1 + \varepsilon)^{2d} c_d^2$ of the amplified coefficients into a tail bound on $c_d$ by optimizing over a suitable value of $\varepsilon$. This is analogous to the standard derivation of Chernoff-Hoeffding bounds by analysis of the moment generating function of the relevant random variable.

We now sketch how this outline is executed for the special case where $n$ tends to infinity while $k$ and $K$ remain fixed. Although this setting is technically much easier, it allows us to highlight the main conceptual points of our argument. The analysis for finite $n$ can be viewed as an approximation of this proof strategy.

Sketch of the limiting case $n \to \infty$.

By the expansion (4) of $p_w$, as $n$ tends to infinity $p_w$ converges uniformly to the function

$$p_w^\infty(t) = C_w \cdot (t - 1)^w (t + 1)^{K - w},$$

as this corresponds to Fact 3.1 when the bits of the string $Z$ are independent and $(1 - t)/2$-biased. As $p_w^\infty(t)$ is a probability for every $t \in [-1, 1]$, Claim 3.2 follows immediately in this limiting case.

Step 1. Our algebraic treatment of the Chebyshev transform yields that the Chebyshev coefficient $c_d$ of $p_w^\infty$ is the $(K + d)$-th regular coefficient of the polynomial

$$g^\infty(s) = C_w \left(\frac{s - 1}{\sqrt{2}}\right)^{2w} \left(\frac{s + 1}{\sqrt{2}}\right)^{2(K - w)}. \tag{5}$$

Step 2. The evaluation of the polynomial $g_\varepsilon^\infty(s) = g^\infty((1 + \varepsilon)s)$ at $s = e^{i\theta}$ satisfies the identity

$$\left|g^\infty\!\left((1 + \varepsilon)e^{i\theta}\right)\right|^2 = (1 + \varepsilon)^{2K} \cdot (1 + \delta)^{2K} \cdot C_w^2 \cdot \left(1 - \frac{\cos\theta}{1 + \delta}\right)^{2w} \left(1 + \frac{\cos\theta}{1 + \delta}\right)^{2(K - w)}, \tag{6}$$

where $\delta = \varepsilon^2 / (2(1 + \varepsilon))$. This happens to equal

$$(1 + \varepsilon)^{2K} (1 + \delta)^{2K}\, p_w^\infty\!\left(\cos\theta / (1 + \delta)\right)^2, \tag{7}$$

and is in particular uniformly bounded by $(1 + \varepsilon)^{2K} (1 + \delta)^{2K}$ for all $\theta$. This similarity between $p^\infty$ and $g_\varepsilon^\infty$ is the crux of our analysis.

Step 3.

By Parseval’s identity, after suitable shifting and cancellation, the ampliﬁed sum of Chebyshevcoefﬁcients P Kd = − K (1 + ε ) d c d is upper bounded by (1 + δ ) K . Therefore the tail P k ≥ d c d can have valueat most (1 + δ ) K / (1 + ε ) k ≤ exp(2 Kε − ε − ε / k ) . This upper bound holds for all ε ∈ [0 , , andplugging in the approximate minimizer ε = k/ K yields a bound of the desired form exp( − Ω( k /K )) . Outline of the general case.

We now give the outline of our full proof for the general case and relevanttechnical statements that we use to prove our main upper bound. Identity (5) generalizes to the followingstatement:

Claim 3.6.

The Chebyshev coefficient $c_d$ of $p_w$ is the $(K + d)$-th regular coefficient of the polynomial

$$g(s) = C_w \prod_{z \in Z_w} \left(\frac{s^2 - 2sz + 1}{2}\right),$$

where $C_w$ is as in Equation (4).

The general form of identity (6) is:

Claim 3.7.

For ε > , δ = ε / ε ) , and θ ∈ [ − π, π ] , (cid:12)(cid:12) g ((1 + ε ) e iθ ) (cid:12)(cid:12) = (1 + ε ) K (1 + δ ) K · C w Y z ∈ Z w h δ (1+1 / (1+ δ )) (cid:18) cos θ δ , z (cid:19) where h δ ( s, z ) = ( s − z ) + δ (1 − z ) . h δ , there is no identity analogous to (7) when n is ﬁnite and p w haszeros inside ( − , . Nevertheless, Q z ∈ Z w h δ ( s, z ) can be uniformly bounded either by a sufﬁciently smallmultiple of p w ( s ) , or a ﬁxed quantity that is constant in the parameter range of interest. Claim 3.8.

Assume n ≥ K and w ≤ K/ . Then C w · Y z ∈ Z w h δ ( s, z ) ≤ ( e δK · p w ( s ) if | s | ≤ − w/ Ke δK if − w/ K ≤ | s | ≤ . We now prove Lemma 3.3. Claim 3.6 is proved in Section 3.2. Claim 3.7 is proved in Section 3.3.Claims 3.2 and 3.8 are proved in Section 3.4 as the proofs share the same structure.
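Claim 3.6 lends itself to a quick exact check on small instances. The sketch below is illustrative only and makes two simplifying assumptions: it takes $C_w = 1$, and it uses sample zero sets chosen for the example. It expands $\prod_z (s^2 - 2sz + 1)/2$ with rational arithmetic, reads off the $(K + d)$-th coefficients as candidate Chebyshev coefficients $c_d$, and verifies that $\sum_d c_d T_d(t) = \prod_z (t - z)$ at more than $K + 1$ points, which determines a degree-$K$ polynomial. All function names are hypothetical.

```python
from fractions import Fraction

def g_coeffs(zeros):
    """Coefficients (low to high degree) of prod_z (s^2 - 2*s*z + 1) / 2."""
    g = [Fraction(1)]
    for z in zeros:
        new = [Fraction(0)] * (len(g) + 2)
        for i, a in enumerate(g):
            new[i] += a / 2        # contribution of the constant term 1/2
            new[i + 1] -= a * z    # contribution of the linear term -z*s
            new[i + 2] += a / 2    # contribution of the quadratic term s^2/2
        g = new
    return g

def chebyshev_T(d, t):
    """T_|d|(t), computed with the recurrence T_{m+1} = 2 t T_m - T_{m-1}."""
    d = abs(d)
    prev, cur = Fraction(1), t
    if d == 0:
        return prev
    for _ in range(d - 1):
        prev, cur = cur, 2 * t * cur - prev
    return cur

def expansion_value(zeros, t):
    """sum_{d=-K}^{K} c_d T_d(t), with c_d read off as the (K+d)-th coefficient of g."""
    K = len(zeros)
    g = g_coeffs(zeros)
    return sum(g[K + d] * chebyshev_T(d, t) for d in range(-K, K + 1))

def product_value(zeros, t):
    """prod_z (t - z), evaluated directly."""
    out = Fraction(1)
    for z in zeros:
        out *= t - z
    return out

if __name__ == "__main__":
    # Z_w for the illustrative parameters n = 40, K = 5, w = 2.
    zeros = [Fraction(-1), Fraction(-19, 20), Fraction(-9, 10),
             Fraction(1), Fraction(19, 20)]
    points = [Fraction(j, 7) for j in range(-7, 8)]
    print(all(expansion_value(zeros, t) == product_value(zeros, t) for t in points))
```

The recurrence used in `chebyshev_T` is Fact 3.4 rearranged, so the check relies only on facts stated in this section.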

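Fact 3.9. $p_w(t) = p_{K-w}(-t)$.

Proof. Flipping every bit of the string $Z$ maps Hamming weight $h$ to $n - h$, that is, $t$ to $-t$, and exchanges the test $Q_w$ with $Q_{K-w}$. By Fact 3.1, both sides are therefore degree-$K$ polynomials that agree on $n + 1 > K$ points, so they are identical.

Proof of Lemma 3.3.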

By Fact 3.9 we may and will assume that w ≤ K/ . Let p w = P Kd = − K c d T d . Theapproximating polynomial q w is P | d |

12y the boundedness of p w (Claim 3.2), the upper bounds in Claim 3.8 can be uniﬁed by the inequality C w Y z ∈ Z w h δ ( s, z ) ≤ e δK that is valid for all s ∈ [ − , . Since δ ≤ e δ and ε ≥ e ε − ε / for ≤ ε ≤ , α ≤ √ K · s (1 + δ ) K (1 + ε ) k · e δK ≤ √ K · p e δK − εk + ε k ≤ √ K · p e ε K − εk , where the last inequality follows from the deﬁnition δ = ε / ε ) . Setting ε = k/ K we obtain that α ≤ √ K · e − k / K . Claim 3.6 is a direct consequence of the following formula for the Chebyshev expansion of products oflinear functions.

Claim 3.10. If $p(t) = \prod_{z \in Z} (t - z)$, where $|Z| = K$, then the $d$-th Chebyshev coefficient of $p$ is the $d$-th regular coefficient of the Laurent polynomial $g(s) = \prod_{z \in Z} (s + s^{-1} - 2z)/2$.

Indeed, multiplying the polynomial $g(s)$ from Claim 3.10 by $s^K$ then yields Claim 3.6.

Proof. We prove this by induction on $K$. When $K = 0$, $p$ has only one nonzero Chebyshev coefficient and it is equal to 1, as claimed. Now assume the claim holds for $p(t)$; we prove it for $(t - z)p(t)$. Let $[s^d](g(s))$ denote the $d$-th regular coefficient of $g$. Then the Chebyshev expansion of $p$ is

$$p(t) = \sum_d [s^d](g(s)) \cdot T_d(t),$$

and the Chebyshev expansion of $(t - z)p(t)$ is

$$\begin{aligned}
(t - z)p(t) &= \sum_d [s^d](g(s))\, t\, T_d(t) - \sum_d [s^d](g(s))\, z\, T_d(t) \\
&= \sum_d [s^d](g(s)) \cdot \tfrac{1}{2} T_{d-1}(t) + \sum_d [s^d](g(s)) \cdot \tfrac{1}{2} T_{d+1}(t) - \sum_d [s^d](g(s))\, z\, T_d(t) && \text{(by Fact 3.4)} \\
&= \sum_d [s^{d-1}]\!\left(\tfrac{1}{2} s^{-1} g(s)\right) T_{d-1}(t) + \sum_d [s^{d+1}]\!\left(\tfrac{1}{2} s\, g(s)\right) T_{d+1}(t) - \sum_d [s^d](z\, g(s))\, T_d(t) \\
&= \sum_d [s^d]\!\left(\tfrac{1}{2} s^{-1} g(s)\right) T_d(t) + \sum_d [s^d]\!\left(\tfrac{1}{2} s\, g(s)\right) T_d(t) - \sum_d [s^d](z\, g(s))\, T_d(t) \\
&= \sum_d [s^d]\!\left(\frac{s + s^{-1} - 2z}{2}\, g(s)\right) T_d(t),
\end{aligned}$$

as desired.

3.3 Proof of Claim 3.7

Proof.

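By definition of $Z_w$, we have that $z \in [-1, 1]$ and thus may set $z = \cos\varphi$. We also write $s = (1 + \varepsilon)e^{i\theta} = (1 + \varepsilon)\cos\theta + i(1 + \varepsilon)\sin\theta$, from which it follows that:

$$\begin{aligned}
s^2 - 2sz + 1 &= \left(s - z + \sqrt{z^2 - 1}\right)\left(s - z - \sqrt{z^2 - 1}\right) \\
&= (s - \cos\varphi + i\sin\varphi)(s - \cos\varphi - i\sin\varphi) \\
&= (s - e^{i\varphi})(s - e^{-i\varphi}) \\
&= \left((1 + \varepsilon)e^{i\theta} - e^{i\varphi}\right)\left((1 + \varepsilon)e^{i\theta} - e^{-i\varphi}\right) \\
&= \left((1 + \varepsilon)e^{i(\theta + \varphi)} - 1\right)\left((1 + \varepsilon)e^{i(\theta - \varphi)} - 1\right).
\end{aligned}$$

Recalling that $\delta = \frac{\varepsilon^2}{2(1 + \varepsilon)}$, we have that for any $\gamma$,

$$\left|(1 + \varepsilon)e^{i\gamma} - 1\right|^2 = (1 - (1 + \varepsilon)\cos\gamma)^2 + ((1 + \varepsilon)\sin\gamma)^2 = 1 - 2(1 + \varepsilon)\cos\gamma + (1 + \varepsilon)^2 = 2(1 + \varepsilon)(1 - \cos\gamma + \delta),$$

from which it follows that

$$\begin{aligned}
\left|s^2 - 2sz + 1\right|^2 &= \left|(1 + \varepsilon)e^{i(\theta + \varphi)} - 1\right|^2 \left|(1 + \varepsilon)e^{i(\theta - \varphi)} - 1\right|^2 \\
&= 4(1 + \varepsilon)^2 (1 - \cos(\theta + \varphi) + \delta) \cdot (1 - \cos(\theta - \varphi) + \delta) \\
&= 4(1 + \varepsilon)^2 (1 + \delta)^2 \left(1 - \frac{\cos(\theta + \varphi)}{1 + \delta}\right)\left(1 - \frac{\cos(\theta - \varphi)}{1 + \delta}\right) \\
&= 4(1 + \varepsilon)^2 (1 + \delta)^2 \left(\left(1 - \frac{\cos\theta\cos\varphi}{1 + \delta}\right)^2 - \left(\frac{\sin\theta\sin\varphi}{1 + \delta}\right)^2\right) \\
&= 4(1 + \varepsilon)^2 (1 + \delta)^2 \left(\left(1 - \frac{z\cos\theta}{1 + \delta}\right)^2 - \frac{(1 - z^2)\sin^2\theta}{(1 + \delta)^2}\right) \\
&= 4(1 + \varepsilon)^2 \left((1 + \delta - z\cos\theta)^2 - (1 - z^2)\sin^2\theta\right) \\
&= 4(1 + \varepsilon)^2 \left((1 + \delta)^2 - 2(1 + \delta)z\cos\theta - 1 + z^2 + \cos^2\theta\right) \\
&= 4(1 + \varepsilon)^2 \left((\cos\theta - (1 + \delta)z)^2 + (1 - z^2)(2\delta + \delta^2)\right).
\end{aligned}$$

Note that the fourth equality uses the sum and difference formulas for sine and cosine. We then have

$$\begin{aligned}
\left|\frac{s^2 - 2sz + 1}{2}\right|^2 &= (1 + \varepsilon)^2 \left((\cos\theta - (1 + \delta)z)^2 + (1 - z^2)(2\delta + \delta^2)\right) \\
&= (1 + \varepsilon)^2 (1 + \delta)^2 \left(\left(\frac{\cos\theta}{1 + \delta} - z\right)^2 + \frac{(1 - z^2)(2\delta + \delta^2)}{(1 + \delta)^2}\right).
\end{aligned}$$

The claim then follows by multiplicativity of the norm.

3.4 Proofs of Claim 3.8 and Claim 3.2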

Proof of Claim 3.8

The objective is to uniformly bound the value of the function h δ ( s ) = C w · Y z ∈ Z w h δ ( s, z ) , where h δ ( s, z ) = ( s − z ) + δ (1 − z ) for s ∈ [ − , . When k, K are ﬁxed and n becomes large, all zeros in Z w approach − or +1 , h δ ( s, z ) uniformly approaches h ( s, z ) = ( s − z ) , h w ( s ) approaches h ( s ) = p ∞ w ( s ) and is therefore uniformlybounded.The main difﬁculty in extending this argument to ﬁnite n is that h δ ( s, z ) can no longer be uniformlybounded by a multiple of ( s − z ) since when s equals z , the latter function vanishes but the former onedoesn’t. For this reason, we divide the analysis into two parameter regimes. When s is bounded away fromthe set of zeros Z w , an approximation of the inﬁnitary term-by-term argument can be carried out. When s is near the zeroes, we argue that h δ ( s ) cannot be much larger than h δ ( s ) for an s that is even fartheraway from Z w , and then argue that h ( s ) = p w ( s ) must be small because it represents the square of aprobability of a rare event. Fact 3.11. h δ ( s, z ) h δ ( s, − z ) = h δ ( − s, z ) h δ ( − s, − z ) . Fact 3.12. h δ ( s, z ) ≤ h δ ( | s | , z ) when z ≤ and s ≥ . Fact 3.13. h δ ( s, z ) ≤ h δ ( s , z ) when s ≤ s ≤ , s ≤ z − , and | z | ≤ .Proof. The fact is equivalent to checking that ( s − z ) − ( s − z ) ≥ when s ≤ s ≤ and s ≤ z − .If s ≤ z then we have that s ≤ s ≤ z from which it immediately follows that ( s − z ) ≥ ( s − z ) . If s > z then ( s − z ) is at most (1 − z ) . However, since | z | ≤ , we have that s ≤ z − ≤ z and thus ( s − z ) is always at least ( z − (2 z − = (1 − z ) . Again we have that ( s − z ) ≥ ( s − z ) .We begin by reducing to the case of non-negative inputs s ∈ [0 , . Claim 3.14.

Assuming w ≤ K/ , h δ ( s ) ≤ h δ ( | s | ) .Proof. When w ≤ K/ then elements of Z w (3) can be split into w pairs of the form A = { ( − h/n, − h/n ) : 0 ≤ h < w } , and K − w remaining elements B = {− h/n : w ≤ h < K − w } are all non-positive. By Fact 3.11, Q ( − z,z ) ∈ A h δ ( s, z ) h δ ( s, − z ) = Q ( − z,z ) ∈ A h δ ( | s | , z ) h δ ( | s | , − z ) . By Fact 3.12, Q z ∈ B h δ ( s, z ) ≤ Q z ∈ B h δ ( | s | , z ) . Therefore the product Q z ∈ Z w h δ ( s, z ) ≤ Q z ∈ Z w h δ ( | s | , z ) .The following claim handles values of s in the range [0 , − w/ K ] . Claim 3.15.

Assuming ≤ s ≤ − w/ K , h δ ( s, z ) ≤ ( (1 + δ )( s − z ) , if z ≤ − / √ . (1 + (64 K/w ) δ )( s − z ) , if z ≥ − w/ K Proof.

The ratio h δ ( s, z ) / ( s − z ) equals − z ) / ( s − z ) ) δ . The number (1 − z ) / ( s − z ) is atmost when s ≥ and z ≤ − / √ and at most the following when z ≥ − w/ K . − (1 − w/ K ) ((1 − w/ K ) − (1 − w/ K )) ≤ w/ K ( w/ K ) = 64 K/w. orollary 3.16. Assuming ≤ s ≤ − w/ K and n ≥ K , h δ ( s ) ≤ e δK h ( s ) .Proof. By the choice of parameters, all zeros in Z − meet the criterion for the ﬁrst inequality in Claim 3.15,while all zeros in Z + meet the criterion for the second one. Therefore h δ ( s ) = C w Y z ∈ Z − h δ ( s, z ) Y z ∈ Z + h δ ( s, z ) ≤ C w Y z ∈ Z − (1 + δ )( s − z ) Y z ∈ Z + (1 + (64 K/w ) δ )( s − z ) ≤ (1 + δ ) K − w (1 + (64 K/w ) δ ) w · C w Y z ∈ Z − h ( s, z ) Y z ∈ Z + h ( s, z ) ≤ e δK · e δK · h ( s ) . The following two claims handle values of s in the range [1 − w/ K, . Claim 3.17.

Assuming w ≤ K and − w/ K ≤ s ≤ − w/ K ≤ s ≤ , h δ ( s, z ) ≤ ( h δ ( s , z ) , if z ≥ − w/ K (1 + w/ K ) · h δ ( s , z ) , if z ≤ − w/ K. Proof.

By the choice of parameters the ﬁrst inequality follows from Fact 3.13. For the second one, we upperbound the ratio ( s − z ) ( s − z ) ≤ (1 − z ) (1 − z − w/ K ) = (cid:18) w/ K − z − w/ K (cid:19) ≤ (cid:18) w K (cid:19) . This is greater than one, so ( s − z ) + δ (1 − z ) ≤ (1 + w/ K ) (( s − z ) + δ (1 − z )) as desired. Corollary 3.18.

Assuming − w/ K ≤ s ≤ − w/ K ≤ s ≤ and n ≥ K , h δ ( s ) ≤ e w/ h δ ( s ) .Proof. By the choice of parameters, all zeros in Z − meet the criterion for the ﬁrst inequality in Claim 3.17,while all zeros in Z + meet the criterion for the second one. Therefore h δ ( s ) = C w Y z ∈ Z − h δ ( s, z ) Y z ∈ Z + h δ ( s, z ) ≤ C w Y z ∈ Z − (1 + w/ K ) · h δ ( s , z ) Y z ∈ Z + h δ ( s , z )= (1 + w/ K ) | Z − | · h δ ( s ) ≤ (1 + w/ K ) K · h δ ( s ) ≤ e w/ h δ ( s ) . Claim 3.19. If s is of the form − h/n for some integer ≤ h ≤ wn/e K then ≤ p w ( s ) ≤ e − w . roof. By Fact 3.1, p w ( s ) is the probability that a random string of Hamming weight h and length n hasexactly w ones in its ﬁrst K positions. The probability that it has at least w ones in its ﬁrst K positions is atmost (cid:18) Kw (cid:19) · hn · h − n − · · · h − w + 1 n − w + 1 ≤ (cid:18) eKw (cid:19) w (cid:18) hn (cid:19) w ≤ e − w . Proof of Claim 3.8.

By Claim 3.14 we may assume s ∈ [0 , . When ≤ s ≤ − w/ K the result followsfrom Corollary 3.16. When − w/ K ≤ | s | ≤ , by the assumption n ≥ K there must exist a value s between − w/ K and − w/ K that is of the form − h/n . In particular h ≤ wn/e K . Then h δ ( s ) ≤ e w/ h δ ( s ) ≤ e w/ e δK p w ( s ) ≤ e δK − w/ , where the inequalities follow from Corollary 3.18, Corollary 3.16, and Claim 3.19, respectively. Proof of Claim 3.2

This proof has a similar structure to that of Claim 3.8. By symmetry we can againrestrict attention to inputs t ∈ [0 , . When t ≤ − w/n then | p w ( t ) | is not much larger than | p w ( t ′ ) | where t ′ is the largest number of the form − h/n not exceeding t for integer h . Otherwise the value | p w ( t ) | isnot much larger than | p w ( s ) | , for some s ∈ [1 − w/ K, − w/ K ] of the form − h/n for an integer h . In turn, p w ( s ) is the probability of a rare event, so we conclude that | p w ( t ) | is small. Claim 3.20. If − /n ≤ t ′ ≤ t ≤ − w/n then | t − z | ≤ ( | t ′ − z | , if z ≥ − w/n, (1 + 2( t − t ′ )) | t ′ − z | , if z ≤ − / − /n. Proof.

The ﬁrst part follows because the expressions under the absolute value are nonnegative. For thesecond part, we bound the ratio t − zt ′ − z = 1 + t − t ′ t ′ − z ≤ t − t ′ ) as desired. Corollary 3.21.

Assuming n ≥ K and − /n ≤ t ′ ≤ t ≤ − w/n , | p w ( t ) | ≤ (1 + 2( t − t ′ )) K | p w ( t ′ ) | .Proof. By the choice of parameters, all zeros in Z + meet the criterion for the ﬁrst inequality in Claim 3.20,while all zeros in Z − meet the criterion for the second one. Therefore | p w ( t ) | = C w Y z ∈ Z − | t − z | Y z ∈ Z + | t − z |≤ C w Y z ∈ Z − (1 + 2( t − t ′ )) (cid:12)(cid:12) t ′ − z (cid:12)(cid:12) Y z ∈ Z + (cid:12)(cid:12) t ′ − z (cid:12)(cid:12) = (1 + 2( t − t ′ )) | Z − | · (cid:12)(cid:12) p w ( t ′ ) (cid:12)(cid:12) ≤ (1 + 2( t − t ′ )) K · (cid:12)(cid:12) p w ( t ′ ) (cid:12)(cid:12) . roof of Claim 3.2. By Fact 3.9 we may assume w ≤ K/ , and by Claim 3.14 (for δ = 0 ) we may assume ≤ t ≤ . If t ≤ − w/n then there exists a t ′ such that p w ( t ′ ) is a probability and ≤ t − t ′ ≤ /n . ByCorollary 3.21, | p w ( t ) | ≤ (1 + 4 /n ) K | p w ( t ′ ) | ≤ | p w ( t ′ ) | .If − w/n ≤ t ≤ , then t ≥ − w/ K . By the assumption n ≥ K there must exist avalue s between − w/ K and − w/ K that is of the form − h/n . In particular h ≤ wn/e K . ByCorollary 3.18, | p w ( t ) | = p h ( t ) ≤ e w/ p h ( s ) = e w/ | p w ( s ) | . By Claim 3.19, p w ( s ) is non-negativeand at most e − w . Therefore | p w ( t ) | ≤ e w/ · e − w ≤ . Proof of Corollary 1.4.

Corollary 1.3 implies the existence of a φ (cid:0) = µ − ν (cid:1) satisfying k φ k = 1 , h f, φ i = ε for some ε = Ω(1) and h φ, q i ≤ K / · − Ω (cid:16)g deg / ( f ) /K (cid:17) for any parity of degree at most K .For any p of degree K and weight at most w , k f − p k ∞ = k f − p k ∞ k φ k ≥ h φ, f − p i = h φ, f i − h φ, p i ≥ ε − w · K / · − Ω (cid:16)g deg / ( f ) /K (cid:17) . Thus, we conclude that W ε/ ( f, K ) = K − / · Ω (cid:16)g deg / ( f ) /K (cid:17) . Corollary 1.4 now follows usingstandard error reduction techniques that show that g deg ε ( f ) = Θ( g deg / ( f )) for all constants < ε < / . We ﬁrst require the following lemma. This lemma builds on ideas in [24, Claim 2], which showed a similarresult for t = Θ(1) . Lemma 4.1.

For any y ∈ { , } n , denote by EQ y the function on { , } n that outputs 1 on input y , and 0otherwise. Then for any t > and d > √ nt log n , we have W n − O ( t ) ( EQ y , d ) ≤ O ( nt log ( n ) /d ) .Proof. Note that for any y ∈ {− , } n , the function EQ y is just the AND function on n input bits (with 0-1valued output), with possibly negated input variables. Thus it sufﬁces to give an approximating polynomialfor the AND function on n bits. We now express AND n as AND ℓ ◦ AND n/ℓ , where ℓ is a parameter we willset later. We compute the inner AND n/ℓ exactly and approximate the outer

AND ℓ to error n − Ω( t ) . This canbe done with a polynomial p of degree O (cid:16)p ℓ log( n t ) (cid:17) [6, 16]. Combining the fact that p is bounded by n − Ω( t ) ≤ at all Boolean inputs with Parseval’s identity and the Cauchy-Schwarz inequality, it can beseen that the weight of p is at most ℓ O (cid:16) √ ℓ log( n t ) (cid:17) . It is well known that the exact multilinear polynomialrepresentation of

AND n/ℓ has constant weight. Hence, by composing p with the multilinear polynomial thatexactly computes AND n/ℓ , we obtain an approximation q for AND n of degree O (cid:18) n q t log nℓ (cid:19) , error n − Ω( t ) , Building on [6], It is possible to derive explicit ε -approximating polynomials for AND where the degree is O (cid:16)p ℓ log(1 /ε ) (cid:17) and the weight is O (cid:16) √ ℓ log(1 /ε (cid:17) rather than ℓ O (cid:16) √ ℓ log(1 /ε ) (cid:17) . Using this tighter weight bound would improve our ﬁnal result by afactor of log n in the exponent. We omit this tighter result for brevity. O (cid:16) √ ℓt log n (cid:17) . We now ﬁx the value of ℓ to ℓ := n t log nd < n , thereby ensuring that the degreeof q is at most d . With this setting of ℓ , the weight of q is at most O ( nt log ( n ) /d ) , proving the lemma. Proof of Theorem 1.5.

Let f : { , } n → { , } be any symmetric function, corresponding to the univariatepredicate D f : { } ∪ [ n ] → { , } n . For the purpose of this proof, let us denote by k f the smallest i forwhich f is constant on inputs of Hamming weight in the interval [ i + 1 , n − i − . Without loss of generality, f ( x ) = 0 for strings of x Hamming weight between k f + 1 and n − k f − . The case where f = 1 on inputstrings of Hamming weight between k f + 1 and n − k f − can be proved using a similar argument. Deﬁne supp( f ) := { x ∈ { , } n : f ( x ) = 1 } . Note that | supp( f ) | ≤ · n k f .Observe that f ( x ) = P y ∈ supp( f ) EQ y ( x ) . Lemma 4.1 implies, for each y ∈ supp( f ) , the existenceof polynomials p y of degree K and weight O ( nk f log ( n ) /K ) , which approximate EQ y to error · n − k f .Deﬁne a polynomial p : { , } n → R by p ( x ) = P y ∈ supp( f ) p y ( x ) . Clearly p has degree K , weight atmost n O ( k f ) · O ( nk f log ( n ) /K ) = 2 ˜ O ( nk f /K ) , and error at most | supp( f ) | · n − k f / ≤ / , where the upperbounds on the weight and error follow from the triangle inequality.The theorem now follows standard error reduction techniques and Paturi’s theorem [22], which statesthat for symmetric functions, g deg( f ) = Θ (cid:0)p n · k f (cid:1) . Remark 4.2.

The upper bound obtained in Theorem 1.5 is more general than as stated, and the only prop-erty of symmetric functions it exploits is that symmetric functions of low approximate degree are highlybiased. More speciﬁcally, the proof of Theorem 1.5 shows that any function f : { , } n → { , } with min {| f − (0) | , | f − (1) |} ≤ n t satisﬁes W ε ( f, K ) ≤ ˜ O ( nt/K ) for any K ≥ √ nt log n . Proof outline.

As we explain in more detail in the proof itself, it is sufficient to establish the theorem for fixed $k$ and $K$ and infinitely many $n$, because the statement is downward reducible in $n$.

Using the Chebyshev approximation formulas from Section 3 we derive explicit lower bounds on the large Chebyshev coefficients of the polynomial $p$ representing the distinguishing advantage of the AND function on $K$ inputs. Owing to orthogonality and boundedness of the Chebyshev polynomials, this is a lower bound on the approximate degree of $\mathrm{AND}_K$. By strong duality as given in the following claim (see [4]) we obtain Theorem 1.6.

Claim 5.1. If $\widetilde{\deg}_{\varepsilon/2}(F_n) \ge k$ then there exists a pair of perfectly $k$-wise indistinguishable distributions $\mu$, $\nu$ over $\{0,1\}^n$ such that $\mathbb{E}_{X \sim \mu}[F_n(X)] - \mathbb{E}_{Y \sim \nu}[F_n(Y)] \ge \varepsilon$.

Recall that the Chebyshev polynomials are orthogonal under the measure $d\sigma(t) = (1 - t^2)^{-1/2}\,dt$ supported on $[-1, 1]$. We will need the following identity for their average square magnitude under this measure:

$$\mathbb{E}_{t \sim \sigma}[T_d(t)^2] = 1/2 \quad \text{when } d > 0. \tag{8}$$

Proof of Theorem 1.6.

By symmetry of the distinguishers, $\mu$ and $\nu$ can be assumed symmetric. Let $F_n$ denote the function on $\{0,1\}^n$ that outputs $\mathrm{AND}_K(x|_{\{1, \ldots, K\}})$, i.e., $F_n$ outputs the AND of the first $K < n$ bits of the input. We prove the theorem for $G_n(x_1, \ldots, x_n) = \mathrm{NOR}(x|_{\{1, \ldots, K\}})$. By the symmetry of 0 and 1 inputs the theorem also holds for $F_n$.

First, we claim that the statement of Theorem 1.6 is stronger as $n$ becomes larger, so it is sufficient to prove it in the limiting case when $n$ approaches infinity and $k$, $K$ are fixed. Suppose that $\mu$ and $\nu$ are distributions over $n$-bit strings that are $k$-wise indistinguishable yet are $\varepsilon$-reconstructable by $G_n$. We must show that there are distributions $\mu'$ and $\nu'$ over $\{0,1\}^{n-1}$ that are $k$-wise indistinguishable yet are $\varepsilon$-reconstructable by $G_{n-1}$. But this holds for $\mu'$ (respectively, $\nu'$) that generate a random sample from $\mu$ (respectively, $\nu$) and then throw away the last bit.

If the statement was false then by Claim 5.1 there would exist degree-$k$ polynomials $\tilde{G}_n$ that approximate $G_n$ pointwise on $\{0,1\}^n$ to within error

$$\varepsilon = \sqrt{2^{-4K+1} \sum_{d > k} \binom{2K}{K + d}^2}$$

for almost all $n$. Applying the construction from the proof of Fact 3.1 to $\tilde{G}_n$, there exist univariate degree-$k$ polynomials $\tilde{p}_n$ approximating $p_n$ on the set of points $W_n = \{-1 + 2h/n : 0 \le h \le n\}$ to within error $\varepsilon$. We emphasize the dependence on $n$ as it will play a role in the proof.

By (3) and (4) the polynomial $p_n$ has the form

$$p_n(t) = C_n \prod_{z \in Z_n} (t - z),$$

where $Z_n = \{-1 + 2h/n : 0 \le h < K\}$ (the set $Z_+$ is empty). The value $p_n(1)$ is the probability that $G_n$ accepts the all-zero string, so it must equal one. The constant $C_n$ must therefore equal $\prod_{z \in Z_n} (1 - z)^{-1}$.

As $n$ tends to infinity, the set $Z_n$ converges to a single zero at $-1$ of multiplicity $K$, so the sequence $p_n$ converges uniformly to the polynomial

$$p^\infty(t) = 2^{-K} (t + 1)^K.$$

By the triangle inequality, for every $\delta > 0$ and all sufficiently large $n$, $\tilde{p}_n$ is within $\varepsilon + \delta$ of $p^\infty$ on the set $W_n$.

A degree-$k$ polynomial is determined by its values on $W_{k+1}$, and the set of degree-$k$ polynomials that are within $\varepsilon + \delta$ of $p^\infty$ on $W_{k+1}$ is compact. Therefore the sequence of approximating polynomials $\tilde{p}_n$ must contain a subsequence (for values of $n$ that are multiples of $k + 1$) that converges (uniformly) to a limiting degree-$k$ polynomial $\tilde{p}^\infty$. Since $\tilde{p}_n$ is within $\varepsilon + \delta$ of $p^\infty$ on $W_n$ for infinitely many $n$, $\tilde{p}^\infty$ must be within $\varepsilon + 2\delta$ of $p^\infty$ on $W_n$ for infinitely many $n$. The union of these sets $W_n$ is dense in $[-1, 1]$, and by continuity $p^\infty$ can be $(\varepsilon + 2\delta)$-approximated by the degree-$k$ polynomial $\tilde{p}^\infty$ everywhere on $[-1, 1]$. As $\delta$ was arbitrary, it follows that the $\varepsilon$-approximate degree of $p^\infty$ can be at most $k$.

All that remains is to prove that this is not true, i.e., to show a lower bound of $k$ on the $\varepsilon$-approximate degree of $p^\infty$. This lower bound is known (see, e.g., [14]); we provide the details now for completeness. Let $q$ be any degree-$k$ polynomial. By Claim 3.6 the $d$-th Chebyshev coefficient of $p^\infty$ equals the $(K + d)$-th regular coefficient of $g^\infty(s) = 2^{-2K} (s + 1)^{2K}$, which has value $2^{-2K} \binom{2K}{K + d}$. Since $q$ has degree at most $k$, the $d$-th Chebyshev coefficient $c_d$ of $p^\infty - q$ must also equal $2^{-2K} \binom{2K}{K + d}$ whenever $|d| > k$. By symmetry of the Chebyshev coefficients, orthogonality of the Chebyshev polynomials, and Equation (8),

$$\mathbb{E}_{t \sim \sigma}\!\left[(p^\infty(t) - q(t))^2\right] = c_0^2 + \sum_{d > 0} (2c_d)^2\, \mathbb{E}_{t \sim \sigma}[T_d(t)^2] \ge \sum_{d > k} 2 \cdot \left(2^{-2K} \binom{2K}{K + d}\right)^2 = \varepsilon^2.$$

It follows that the approximation error $|p^\infty(t) - q(t)|$ must exceed $\varepsilon$ for some $t \in [-1, 1]$, contradicting the initial assumption.

Consider a secret sharing scheme with $tn$ parties, divided in $n$ blocks of size $t$, that is perfectly secure against size-$k$ coalitions. If all parties in each block come together and consolidate their information even into a single bit, the number of blocks against which the scheme remains secure drops to $k/t$. In general this is the best possible, with linear schemes providing tight examples.

The following corollary shows that if the distribution over shares is symmetric then much better security against this type of attack can be obtained.

Corollary 6.1.

Let f , . . . , f n : { , } t → { , } . Assume X, Y are k -wise indistinguishable symmetri-cally distributed random variables over tn -bit strings. Write X = X . . . X n , Y = Y . . . Y n , whereall blocks X i , Y i have size t . For every K , the n -bit random variables X ′ = f ( X ) . . . f n ( X n ) and Y ′ = f ( Y ) . . . f n ( Y n ) are O (( tK ) / n K e − k / tK ) -close to being perfectly K -wise indistinguishable,assuming K ≤ n/ . The resulting scheme can be viewed as perfectly secure secret sharing with a potentially faulty dealer:With probability − p , the dealer samples perfectly K -wise indistinguishable shares X ′ or Y ′ , and withprobability p = O (( tK ) / n K e − k / tK ) she leaks arbitrary information about the secret.For example, if X, Y are visual shares sampled from the dual polynomial (1) then they are k = Ω( √ tn ) -wise indistinguishable, assuming constant reconstruction error. Corollary 6.1 then says that the inducedblock-shares X ′ , Y ′ are Ω( p n/ log n ) -wise indistinguishable except with probability exp − Ω( √ n log n ) .If, in addition, f = · · · = f n = AND t then X ′ , Y ′ are themselves shares of a visual secret sharingscheme that is secure against Ω( p n/ log n ) -size coalitions. Therefore symmetric visual secret sharingschemes are downward self-reducible at a small loss in security and dealer error in the following sense: Ascheme for n parties can be derived from one for tn parties by dividing the parties into blocks and AND ingthe shares in each block.
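The block consolidation described above, in which a $tn$-bit share string is cut into $n$ blocks of size $t$ and each block is replaced by a single bit, can be sketched as follows. This is a minimal illustration of the map from the original shares to the block-shares only; it does not sample from any dual polynomial, and the function name, the default choice of AND for every block, and the example inputs are assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence

def block_shares(shares: Sequence[int], t: int,
                 fs: Optional[Sequence[Callable[[Sequence[int]], int]]] = None) -> List[int]:
    """Consolidate a tn-bit share string into n bits, one per block of size t.

    If fs is None, every block is combined with AND (the visual secret sharing
    case); otherwise fs[i] is applied to the i-th block.
    """
    if t <= 0 or len(shares) % t != 0:
        raise ValueError("the number of shares must be a positive multiple of t")
    n = len(shares) // t
    blocks = [shares[i * t:(i + 1) * t] for i in range(n)]
    if fs is None:
        return [int(all(block)) for block in blocks]
    if len(fs) != n:
        raise ValueError("need exactly one function per block")
    return [int(f(block)) for f, block in zip(fs, blocks)]

if __name__ == "__main__":
    # Six shares, blocks of size t = 2, AND within each block.
    print(block_shares([1, 1, 0, 1, 1, 1], 2))
```

Passing a list of per-block functions covers the general case of arbitrary $f_1, \ldots, f_n$ in the corollary, while the default reproduces the AND-based self-reduction discussed for visual secret sharing.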

Proof of Corollary 6.1.

By Theorem 1.2, X and Y are ( tK, O (( tK ) / ) · e − k / tK ) -wise indistinguish-able. Since any size- K distinguisher against ( X ′ , Y ′ ) induces a size- tK distinguisher against ( X, Y ) , theformer are ( K, δ = O (( tK ) / ) · e − k / tK ) -wise indistinguishable. By Theorem D.1 of [4], any pair of ( K, δ ) -wise indistinguishable distributions over n bits is δn K -close to a pair of perfectly indistinguishableones. We thank Mark Bun for telling us about the work of Sachdeva and Vishnoi [23], and Mert Sa˘glam, PritishKamath, Robin Kothari, and Prashant Nalini Vasudevan for helpful comments on a previous version of themanuscript. We are also grateful to Xuangui Huang and Emanuele Viola for sharing the manuscript [15].Andrej Bogdanov’s work was supported by RGC GRF CUHK14207618. Justin Thaler and Nikhil Mandewere supported by NSF Grant CCF-1845125.

References

[1] Andris Ambainis. Quantum search with variable times. Theory Comput. Syst., 47(3):786–807, 2010.
[2] Shalev Ben-David, Adam Bouland, Ankit Garg, and Robin Kothari. Classical lower bounds from quantum upper bounds. In , pages 339–349, 2018.
[3] Andrej Bogdanov. Approximate degree of AND via Fourier analysis. Electronic Colloquium on Computational Complexity (ECCC), 25:197, 2018.
[4] Andrej Bogdanov, Yuval Ishai, Emanuele Viola, and Christopher Williamson. Bounded indistinguishability and the complexity of recovering secrets. In Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part III, pages 593–618, 2016.
[5] Andrej Bogdanov and Christopher Williamson. Approximate bounded indistinguishability. In , pages 53:1–53:11, 2017.
[6] Harry Buhrman, Richard Cleve, Ronald de Wolf, and Christof Zalka. Bounds for small-error and zero-error quantum algorithms. In , pages 358–368, 1999.
[7] Mark Bun, Robin Kothari, and Justin Thaler. The polynomial method strikes back: tight quantum query bounds via dual polynomials. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 297–310, 2018.
[8] Mark Bun and Justin Thaler. Dual lower bounds for approximate degree and Markov-Bernstein inequalities. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I, pages 303–314, 2013.
[9] Mark Bun and Justin Thaler. Hardness amplification and the approximate degree of constant-depth circuits. Electronic Colloquium on Computational Complexity (ECCC), 20:151, 2013. Extended abstract in ICALP 2015.
[10] Mark Bun and Justin Thaler. Hardness amplification and the approximate degree of constant-depth circuits. In Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I, pages 268–280, 2015.
[11] Mark Bun and Justin Thaler. A nearly optimal lower bound on the approximate degree of AC0. In , pages 1–12, 2017.
[12] Mark Bun and Justin Thaler. The large-error approximate degree of AC0. Electronic Colloquium on Computational Complexity (ECCC), 25:143, 2018.
[13] Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan. Faster private release of marginals on small databases. In Innovations in Theoretical Computer Science, ITCS’14, Princeton, NJ, USA, January 12-14, 2014, pages 387–402, 2014.
[14] Noam D. Elkies (https://mathoverflow.net/users/14830/noam-d-elkies). Uniform approximation of $x^n$ by a degree $d$ polynomial: estimating the error. MathOverflow. URL: https://mathoverflow.net/q/70527.
[15] Xuangui Huang and Emanuele Viola. Almost bounded indistinguishability and degree-weight trade-offs. 2019. Manuscript.
[16] Jeff Kahn, Nathan Linial, and Alex Samorodnitsky. Inclusion-exclusion: Exact and approximate. Combinatorica, 16(4):465–477, 1996.
[17] Pritish Kamath and Prashant Vasudevan. Approximate degree of AND-OR trees, 2014. Manuscript available at .
[18] Philip N. Klein and Neal E. Young. On the number of iterations for Dantzig-Wolfe optimization and packing-covering approximation algorithms. SIAM J. Comput., 44(4):1154–1172, 2015.
[19] Marvin Minsky and Seymour Papert. Perceptrons. MIT Press, Cambridge, MA, 1969.
[20] Moni Naor and Adi Shamir. Visual cryptography. In Advances in Cryptology - EUROCRYPT ’94, Workshop on the Theory and Application of Cryptographic Techniques, Perugia, Italy, May 9-12, 1994, Proceedings, pages 1–12, 1994.
[21] Noam Nisan and Mario Szegedy. On the degree of Boolean functions as real polynomials. Computational Complexity, 4:301–313, 1994.
[22] Ramamohan Paturi. On the degree of polynomials that approximate symmetric boolean functions (preliminary version). In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, May 4-6, 1992, Victoria, British Columbia, Canada, pages 468–474, 1992.
[23] Sushant Sachdeva and Nisheeth K. Vishnoi. Faster algorithms via approximation theory. Foundations and Trends in Theoretical Computer Science, 9(2):125–210, 2014.
[24] Rocco A. Servedio, Li-Yang Tan, and Justin Thaler. Attribute-efficient learning and weight-degree tradeoffs for polynomial threshold functions. In COLT 2012 - The 25th Annual Conference on Learning Theory, June 25-27, 2012, Edinburgh, Scotland, pages 14.1–14.19, 2012.
[25] Alexander A. Sherstov. Approximating the AND-OR tree. Theory of Computing, 9:653–663, 2013.
[26] Alexander A. Sherstov. Breaking the Minsky–Papert barrier for constant-depth circuits. SIAM J. Comput., 47(5):1809–1857, 2018.
[27] Alexander A. Sherstov. The power of asymmetry in constant-depth circuits. SIAM J. Comput., 47(6):2362–2434, 2018.
[28] Alexander A. Sherstov and Pei Wu. Near-optimal lower bounds on the threshold degree and sign-rank of AC0. arXiv preprint arXiv:1901.00988, 2019. To appear in STOC 2019.
[29] Robert Špalek. A dual polynomial for OR. CoRR, abs/0803.4516, 2008.

A Properties of Symmetric Functions and Distributions

Here, we prove some basic facts that we need about symmetric functions and distributions (see the first paragraph of Section 3). Let $Q : \{0,1\}^n \to \mathbb{R}$ be a function. We say that $Q$ is symmetric if the output of $Q$ depends only on the Hamming weight of its input. If we let $X : \{0,1\}^n \to [0,1]$ denote a probability distribution, we say that $X$ is symmetric if the corresponding function mapping inputs to probabilities is a symmetric function. We need two further facts about such distributions.

Fact A.1. Suppose that $X$ is a symmetric distribution over $\{0,1\}^n$. For $S \subseteq \{1, \ldots, n\}$, let $X|_S$ denote the projection of $X$ to the indices in $S$. Then $X|_S$ is also symmetric.

Proof. Let $z_w$ be an arbitrary element of $\{0,1\}^{|S|}$ of Hamming weight $w$. Using the symmetry of $X$, we observe that

$$\Pr_X[X|_S = z_w] = \sum_{h=0}^{n} \sum_{\substack{y \in \{0,1\}^{n-|S|} \\ |y| = h}} \Pr[X|_S = z_w \text{ and } X|_{[n] \setminus S} = y] = \sum_{h=0}^{n-|S|} \binom{n-|S|}{h} \frac{\Pr[|X| = h+w]}{\binom{n}{w+h}}.$$

The second equality holds because there are $\binom{n-|S|}{h}$ strings $y$ of weight $h$, each of which completes $z_w$ to a string of weight $h+w$, and by symmetry every such string has probability $\Pr[|X| = h+w] / \binom{n}{w+h}$. The expression on the right depends only on $w$ and not on $z_w$, so the distribution $X|_S$ must also be symmetric.

Fact A.2. Suppose that $X$ and $Y$ are symmetric distributions over $\{0,1\}^n$. Then without loss of generality, the best statistical test $Q : \{0,1\}^n \to [0,1]$ for distinguishing between $X$ and $Y$ is a symmetric function. In particular, we have:

$$\max_{\text{symmetric } Q} \{ \mathbb{E}_X[Q(X)] - \mathbb{E}_Y[Q(Y)] \} = \max_{Q} \{ \mathbb{E}_X[Q(X)] - \mathbb{E}_Y[Q(Y)] \}.$$

Proof. Let $Q^*$ denote $\arg\max_Q \{ \mathbb{E}_X[Q(X)] - \mathbb{E}_Y[Q(Y)] \}$. If $Q^*$ is symmetric then the proof is complete. If not, define $\tilde{Q}$ as the following symmetrized version of $Q^*$:

$$\tilde{Q}(z) := \mathbb{E}_\sigma[Q^*(\sigma(z))],$$

where the expectation is over a uniform permutation $\sigma$. It is clear that $\tilde{Q}$ is a symmetric function, and we write $\tilde{Q}_w$ to denote the value $\tilde{Q}$ takes on any input of Hamming weight $w$. We now show that its distinguishing advantage between $X$ and $Y$ is the same as that of $Q^*$. Clearly, it is enough to show that $\mathbb{E}_X[\tilde{Q}(X)] = \mathbb{E}_X[Q^*(X)]$ for an arbitrary symmetric distribution $X$. This follows from a simple calculation:

$$\mathbb{E}_X[Q^*(X)] = \sum_{w=0}^{n} \sum_{|x| = w} \Pr[X = x] \, Q^*(x) = \sum_{w=0}^{n} \frac{\Pr[|X| = w]}{\binom{n}{w}} \sum_{|x| = w} Q^*(x) = \sum_{w=0}^{n} \Pr[|X| = w] \cdot \tilde{Q}_w = \mathbb{E}_X[\tilde{Q}(X)].$$
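Both facts can be checked by brute force for small $n$. The following Python sketch is not part of the original argument. The helper names (`symmetric_dist`, `project`, `fact_a1_formula`, `symmetrize`, `expectation`), the choice $n = 4$, the weight profile, the index set $S$, and the test function $Q$ are all illustrative assumptions. It compares the projected probabilities with the closed-form sum from the proof of Fact A.1, and compares the expectation of a non-symmetric test with that of its symmetrization, as in the proof of Fact A.2.

```python
from itertools import permutations, product
from math import comb, isclose


def symmetric_dist(weight_probs):
    """Symmetric distribution on {0,1}^n given Pr[|X| = w] for w = 0..n."""
    n = len(weight_probs) - 1
    return {x: weight_probs[sum(x)] / comb(n, sum(x))
            for x in product((0, 1), repeat=n)}


def project(dist, S):
    """Marginal distribution of the coordinates listed in S."""
    out = {}
    for x, p in dist.items():
        z = tuple(x[i] for i in S)
        out[z] = out.get(z, 0.0) + p
    return out


def fact_a1_formula(weight_probs, s, w):
    """Closed-form sum from the proof of Fact A.1, for |S| = s and weight w."""
    n = len(weight_probs) - 1
    return sum(comb(n - s, h) * weight_probs[h + w] / comb(n, w + h)
               for h in range(n - s + 1))


def symmetrize(Q, n):
    """Q~(z) = E_sigma[Q(sigma(z))] over a uniform permutation sigma."""
    perms = list(permutations(range(n)))
    return {z: sum(Q[tuple(z[i] for i in sigma)] for sigma in perms) / len(perms)
            for z in product((0, 1), repeat=n)}


def expectation(dist, Q):
    """E_X[Q(X)] for a distribution and a test given as dictionaries."""
    return sum(p * Q[x] for x, p in dist.items())


# Illustrative instance (all values below are assumptions for the demo).
n = 4
weights = [0.1, 0.2, 0.4, 0.2, 0.1]   # Pr[|X| = w] for w = 0..4
X = symmetric_dist(weights)

# Fact A.1: the projection to S depends only on the Hamming weight.
S = (0, 2)
marginal = project(X, S)
for z, p in marginal.items():
    assert isclose(p, fact_a1_formula(weights, len(S), sum(z)))

# Fact A.2: symmetrizing a test does not change its expectation under X.
Q = {x: (x[0] + 2 * x[1] * x[3]) / 3 for x in X}   # non-symmetric, values in [0, 1]
Q_sym = symmetrize(Q, n)
assert isclose(expectation(X, Q), expectation(X, Q_sym))
```

Enumerating all $n!$ permutations is only feasible for very small $n$. The check is meant to mirror the definitions literally, not to be efficient.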