Lower Bounds for XOR of Forrelations
Uma Girish*   Ran Raz†   Wei Zhan‡

Abstract
The Forrelation problem, first introduced by Aaronson [A10] and Aaronson and Ambainis [AA15], is a well studied computational problem in the context of separating quantum and classical computational models. Variants of this problem were used to give tight separations between quantum and classical query complexity [AA15]; the first separation between poly-logarithmic quantum query complexity and bounded-depth circuits of super-polynomial size, a result that also implied an oracle separation of the classes BQP and PH [RT19]; and improved separations between quantum and classical communication complexity [GRT19]. In all these separations, the lower bound for the classical model only holds when the advantage of the protocol (over a random guess) is more than ≈ 1/√N, that is, the success probability is larger than ≈ 1/2 + 1/√N. This is unavoidable, as ≈ 1/√N is the correlation between two coordinates of an input that is sampled from the Forrelation distribution, and hence there are simple classical protocols that achieve advantage ≈ 1/√N in all these models.

To achieve separations when the classical protocol has smaller advantage, we study in this work the xor of k independent copies of (a variant of) the Forrelation function (where k ≪ N). We prove a very general result that shows that any family of Boolean functions that is closed under restrictions, whose Fourier mass at level 2k is bounded by α^k (that is, the sum of the absolute values of all Fourier coefficients at level 2k is bounded by α^k), cannot compute the xor of k independent copies of the Forrelation function with advantage better than O(α^k / N^{k/2}). This is a strengthening of a result of [CHLT19], which gave a similar statement for k = 1, using the technique of [RT19]. We give several applications of our result. In particular, we obtain the following separations:

Quantum versus Classical Communication Complexity:
We give the first example of a partial Boolean function that can be computed by a simultaneous-message quantum protocol with communication complexity polylog(N) (where Alice and Bob also share polylog(N) EPR pairs), and such that any classical randomized protocol of communication complexity at most õ(N^{1/4}), with any number of rounds, has quasipolynomially small advantage over a random guess. Previously, only separations where the classical protocol has polynomially small advantage were known between these models [G16, GRT19].

Quantum Query Complexity versus Bounded Depth Circuits:
We give the first example of a partial Boolean function that has a quantum query algorithm with query complexity polylog(N), and such that any constant-depth circuit of quasipolynomial size has quasipolynomially small advantage over a random guess. Previously, only separations where the constant-depth circuit has polynomially small advantage were known [RT19].

* Department of Computer Science, Princeton University. Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grant No. CCF-1714779.
† Department of Computer Science, Princeton University. Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grant No. CCF-1714779.
‡ Department of Computer Science, Princeton University. Research supported by the Simons Collaboration on Algorithms and Geometry, by a Simons Investigator Award and by the National Science Foundation grant No. CCF-1714779.

Introduction

Several recent works used Fourier analysis to prove lower bounds for computing (variants of) the Forrelation (partial) function of [A10, AA15], in various models of computation and communication [RT19, CHLT19, GRT19]. These works show that for many computational models, when analyzing the success probability of computing the Forrelation function, it's sufficient to bound the contribution of Fourier coefficients at level 2, ignoring all other Fourier coefficients [RT19, CHLT19]. This holds for any computational model that is closed under restrictions, and is proved by analyzing the Forrelation distribution as a distribution resulting from a certain random walk, rather than analyzing it directly.

While this is a powerful technique, it could only be used to bound computations of the Forrelation function with advantage (over a random guess) larger than ≈ 1/√N, that is, computations with success probability larger than ≈ 1/2 + 1/√N.
Roughly speaking, this is because the bound on the Fourier coefficients at level 2 of the Forrelation function is ≈ O(1/√N).

In this work, we study the xor of k independent copies of the Forrelation function of [RT19] (where k = o(N^{1/50})). We show that for many computational models, when analyzing the success probability of computing the xor of k independent copies of the Forrelation function, it's sufficient to bound the contribution of Fourier coefficients at level 2k, ignoring all other Fourier coefficients. Our proof builds on the techniques of [RT19], and followup works [CHLT19, GRT19], by analyzing a "product" of k random walks, one for each of the independent copies of the Forrelation function. This can be viewed as a random walk with a k-dimensional time variable.

Consequently, we obtain a very general lower bound that shows that any family of Boolean functions that is closed under restrictions, whose Fourier mass at level 2k is bounded by α^k (that is, for every function in the family, the sum of the absolute values of all Fourier coefficients at level 2k is bounded by α^k), cannot compute the xor of k independent copies of the Forrelation function with advantage better than O(α^k / N^{k/2}), that is, with success probability larger than 1/2 + O(α^k / N^{k/2}). This is a strengthening of a result of [CHLT19], which gave a similar statement for k = 1, using the technique of [RT19].

We note that the requirement that the family of Boolean functions is closed under restrictions is satisfied by essentially all non-uniform computational models. The requirement of having a good bound on the Fourier mass at level 2k is satisfied by several central and well-studied computational models (see for example [CHHL18] for a recent discussion). In particular, we focus in this work on three such models: communication complexity, query complexity (decision trees) and bounded-depth circuits.
We note that our result is valid for any k < N^c, for some constant c > 0. However, in all our applications we take k to be poly-logarithmic in N, so that we have quantum protocols of poly-logarithmic cost. We use our main theorem to give several separations between quantum and classical computational models.

Quantum versus Classical Communication Complexity:

Quantum versus classical separations in communication complexity have been studied for more than two decades in numerous works. We briefly summarize the history of quantum advantage in communication complexity of partial functions that is most relevant for us: First, Buhrman, Cleve and Wigderson proved an exponential separation between zero-error simultaneous-message quantum communication complexity (without entanglement) and classical deterministic communication complexity [BCW98]. For the bounded-error model, Raz showed an exponential separation between two-way quantum communication complexity and two-way randomized communication complexity [R99]. Gavinsky et al. (building on Bar-Yossef et al. [BJK04]) gave an exponential separation between one-way quantum communication complexity and one-way randomized communication complexity [GKK+08]. Klartag and Regev gave an exponential separation between one-way quantum communication complexity and two-way randomized communication complexity [KR11]. The state of the art separation, by Gavinsky, gave an exponential separation between simultaneous-message quantum communication complexity (with entanglement) and two-way randomized communication complexity [G16]. An alternative proof for Gavinsky's result was recently given by [GRT19], as a followup to [RT19, CHLT19], and had the additional desired property that in the quantum protocol, the time complexity of all the players is poly-logarithmic.
Our Result:
In all these works, the lower bounds for classical communication complexity only hold when the advantage of the protocol (over a random guess) is more than ≈ 1/√N, that is, the success probability is larger than ≈ 1/2 + 1/√N.

In this work, we give a partial Boolean function that can be computed by a simultaneous-message quantum protocol with communication complexity polylog(N) (where Alice and Bob also share polylog(N) EPR pairs), and such that any classical randomized protocol of communication complexity at most õ(N^{1/4}), with any number of rounds, has quasipolynomially small advantage over a random guess. This qualitatively matches the results of [G16, GRT19] and has the additional desired property that the lower bound for the classical communication protocol holds for quasipolynomially small advantage, rather than polynomially small advantage. Moreover, as in [GRT19], the quantum protocol in our upper bound has the additional property of being efficiently implementable, in the sense that it can be described by quantum circuits of size polylog(N), with oracle access to the inputs.

To prove this result we use the xor of k independent copies of the Forrelation function, lifted to communication complexity using xor as the gadget [R95], as in [GRT19]. The quantum upper bound is simple. For the classical lower bound, we use ideas from [GRT19] to bound the level-2k Fourier mass. This, along with our main theorem, implies the desired separation. Our bounds for the level-2k Fourier mass may be interesting in their own right and are proved in Section 7.

Related Work:
We note that an exponential separation between two-way quantum communication complexity and two-way randomized communication complexity, with quasipolynomially small advantage, can be proved by a combination of several previous results, as follows: Start with an existing separation between quantum and classical query complexity, such as the one of [AA15]. Use Drucker's xor lemma for randomized decision trees [D12] to get a separation between quantum and classical query complexity, where the classical protocol has quasipolynomially small advantage. Finally, use the recent lifting theorem of [CFK+19] to lift the result to communication complexity. To the best of our knowledge, this separation was not previously observed.

It follows from these works that there exists a function computable in the quantum two-way model in communication complexity polylog(N), for which randomized protocols of cost õ(√N) have at most quasipolynomially small advantage. While the lower bound is for cost õ(√N) protocols, which is quantitatively stronger than our lower bound for cost õ(N^{1/4}) protocols, the quantum upper bound in this result seems to require two rounds of communication, while our function is computable in the simultaneous model when Alice and Bob share entanglement.

Quantum Query Complexity versus Bounded Depth Circuits:

Separations of quantum query complexity and bounded-depth classical circuit complexity have been studied in the context of oracle separations of the classes BQP and PH. An example of a partial Boolean function (Forrelation) that has a quantum query algorithm with query complexity polylog(N), and such that any constant-depth circuit of quasipolynomial size has polynomially small advantage over a random guess, was given in [RT19].
This result implied an oracle separation of the classes BQP and PH.

Here, we give the first example of a partial Boolean function (xor of k copies of Forrelation) that has a quantum query algorithm with query complexity polylog(N), and such that any constant-depth circuit of quasipolynomial size has quasipolynomially small advantage over a random guess. For the proof, we use our main theorem, together with Tal's bounds on the level-2k Fourier mass of bounded-depth circuits [Tal17].
Quantum versus Classical Query Complexity:

The query complexity model (also known as the black box model or decision-tree complexity) has played a central role in the study of quantum computational complexity. Quantum advantages in query complexity (decision trees) have been demonstrated for partial functions in various settings and numerous works. For example, Aaronson and Ambainis [AA15] showed that the Forrelation problem can be solved by one quantum query, while its randomized query complexity is Ω(√N / log N).

For classical randomized query complexity, there is a known xor lemma, proved by Drucker [D12]. In particular, Theorem 1.3 of [D12], along with the result of [AA15], gives a partial function (xor of polylog(N) copies of Forrelation) that can be computed by a quantum query algorithm with polylog(N) queries, while every classical randomized algorithm that makes õ(N^{1/4}) queries has quasipolynomially small advantage. Our main theorem implies a different proof for this result, using Tal's recent bounds on the level-2k Fourier mass of decision trees [Tal19].
Our Functions and Main Theorem:

Our functions are obtained by taking an xor of several copies of a variant of the Forrelation problem, as defined in [RT19].

Let N = 2^n for sufficiently large n ∈ ℕ. Let k ∈ ℕ be a parameter. We assume that k = o(N^{1/50}). Let ε = 1/(24k ln N) be a parameter. Let H_N denote the N × N normalized Hadamard matrix, whose entries are either −1/√N or +1/√N. Let

forr(z) := (1/N) ⟨z_1, H_N z_2⟩

denote the Forrelation of a vector z = (z_1, z_2), where z_1, z_2 ∈ ℝ^N. The Forrelation Decision Problem is the partial Boolean function F : {−1,1}^{2N} → {−1,1} defined at z ∈ {−1,1}^{2N} by F(z) := −1 if forr(z) ≥ ε/2,
and F(z) := 1 if forr(z) ≤ ε/4; F is undefined otherwise.

The ⊕_k Forrelation Decision Problem F^{(k)} : {−1,1}^{2kN} → {−1,1} is defined as the xor of k independent copies of F. More precisely, for every z_1, …, z_k ∈ {−1,1}^{2N}, let

F^{(k)}(z_1, …, z_k) := ∏_{j=1}^{k} F(z_j).

For our separation results, we take the function F^{(k)}, where k = ⌈log N⌉. For our communication complexity separation we take the lift of F^{(k)} with xor as the gadget. The quantum upper bounds in all these separation results are quite simple. Moreover, all the quantum algorithms in our upper bounds have the additional advantage of being efficiently implementable, in the sense that they can be described by quantum circuits of size polylog(N), with oracle access to the inputs. Our main contribution is the classical lower bound. Towards this, our main theorem provides an upper bound on the maximum correlation of F^{(k)} with any family of Boolean functions, in terms of the maximum level-2k Fourier mass of a function in the family.
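As a toy illustration of these definitions, the quantities above can be computed by brute force for small N. The following is a minimal sketch we add for concreteness; the names forr, F and F^{(k)} follow the text, but the code itself is ours and not from the paper:

```python
import math

def hadamard(n):
    """Normalized Hadamard matrix: H[a][b] = (-1)^<a,b> / sqrt(N), N = 2^n."""
    N = 1 << n
    return [[(-1) ** bin(a & b).count("1") / math.sqrt(N) for b in range(N)]
            for a in range(N)]

def forr(z, n):
    """forr(z) = (1/N) * <z1, H_N z2> for z = (z1, z2), each half of length N."""
    N = 1 << n
    z1, z2 = z[:N], z[N:]
    H = hadamard(n)
    Hz2 = [sum(H[a][b] * z2[b] for b in range(N)) for a in range(N)]
    return sum(z1[a] * Hz2[a] for a in range(N)) / N

def F(z, n, eps):
    """F(z) = -1 if forr(z) >= eps/2, +1 if forr(z) <= eps/4, else undefined."""
    v = forr(z, n)
    if v >= eps / 2:
        return -1
    if v <= eps / 4:
        return 1
    return None  # outside the promise

def F_k(zs, n, eps):
    """xor of k copies of F, realized as a product over {-1, 1}."""
    out = 1
    for z in zs:
        fz = F(z, n, eps)
        if fz is None:
            return None
        out *= fz
    return out
```

For n = 1 (N = 2) and, say, ε = 1/2, the all-ones input (1, 1, 1, 1) has forr = 1/√2, so F evaluates to −1, and the xor of two such copies evaluates to (−1)·(−1) = 1.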
Main Theorem (Informal). There exist two distributions, σ_0^{(k)} and σ_1^{(k)}, on the no and yes instances of F^{(k)}, respectively, with the following property. Let H be a family of Boolean functions, each of which maps {−1,1}^{2kN} into [−1,1]. Assume that H is closed under restrictions. For H ∈ H, let L_{2k}(H) := Σ_{|S|=2k} |Ĥ(S)|. Let α ∈ ℝ be such that α^k ≥ max(sup_{H∈H} L_{2k}(H), 1). Then, for every H ∈ H,

|E_{z∼σ_0^{(k)}}[H(z)] − E_{z∼σ_1^{(k)}}[H(z)]| ≤ O(α^k / N^{k/2}).

Our main theorem implies that functions in H cannot compute F^{(k)} with success probability more than 1/2 + O(α^k / N^{k/2}). For the applications, we instantiate H with the class of functions computed by classical protocols of small cost.

Proof Overview for k = 2:

Our proof builds on the techniques of [RT19], and followup works [CHLT19, GRT19], which, in turn, used a key idea from [CHHL18]. We will now give an overview of the proof of the Main Theorem for the special case k = 2, where one can already see most of the key ideas.

We start by recalling the hard distributions for k = 1, as in [RT19]. The distribution U on no instances of F is the uniform distribution U_{2N} on {−1,1}^{2N}. It can be shown that a bit string drawn uniformly at random almost always has low Forrelation. The distribution G on yes instances of F is the Gaussian distribution with mean 0 and covariance matrix ε·[[I_N, H_N], [H_N, I_N]]. It can be shown that a vector drawn from this distribution almost always has high Forrelation (at least ε/2). While G is not a distribution over {−1,1}^{2N}, this can be fixed (by probabilistically rounding the values) and we ignore this issue in the proof overview.

Our hard distributions for k = 2 are obtained by mixing products of U and G. The distribution μ_0 on no instances of F^{(2)} is (1/2)(U × U + G × G). The distribution μ_1 on yes instances is (1/2)(U × G + G × U).
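The level-2k Fourier mass L_{2k}(H) appearing in the Main Theorem can be computed by brute force for toy functions. The following sketch is our own illustration (the function choice and all names are ours), using the definition L_k(f) = Σ_{|S|=k} |f̂(S)|:

```python
from itertools import combinations, product

def fourier_coeff(f, n, S):
    """hat f(S) = E_{x ~ U_n}[f(x) * chi_S(x)], with S given as a tuple of coordinates."""
    cube = list(product([-1, 1], repeat=n))
    total = 0.0
    for x in cube:
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / len(cube)

def level_mass(f, n, k):
    """L_k(f): sum of |hat f(S)| over all S of size k."""
    return sum(abs(fourier_coeff(f, n, S)) for S in combinations(range(n), k))

# Toy example: the parity x0 * x1 on {-1,1}^3 has all its Fourier mass at level 2.
parity12 = lambda x: x[0] * x[1]
```

For parity12, level_mass returns 1 at level 2 and 0 at every other level, matching the fact that a degree-2 character has a single nonzero Fourier coefficient.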
It can be shown that these distributions indeed have almost all their mass on the no and yes instances of F^{(2)}, respectively.

Throughout this proof, we identify functions in H with their unique multilinear extensions. Using this identification, it follows that for all H ∈ H and z̄ ∈ ℝ^{4N}, we have E_{z∼U}[H(z̄ + (z, 0))] = E_{z∼U}[H(z̄ + (0, z))] = E_{z,z′∼U}[H(z̄ + (z, z′))] = H(z̄).

Bounding the Advantage of H in Distinguishing p·μ_0 and p·μ_1, for Small p:

As in [RT19, CHLT19], in order to show that functions in H can't distinguish between μ_0 and μ_1, we first show that they can't distinguish between p·μ_0 and p·μ_1, for small p. We show that for every H ∈ H, and p ≤ 1/N,

|E_{z∼p·μ_0}[H(z)] − E_{z∼p·μ_1}[H(z)]| = (1/2)·|E_{z_1∼p·G, z_2∼p·G}[H(z_1, z_2) − H(z_1, 0) − H(0, z_2) + H(0, 0)]| ≤ p^4·O(L_4(H)/N) + O(p^6 N^{0.5}).

This claim is analogous to Claim 20 from [CHLT19]. For sufficiently small p, the second term in the R.H.S. of the inequality is negligible compared to the first term. To prove this inequality, we use the Fourier expansion of H in the L.H.S. and bound the difference between the moments of p·μ_0 and p·μ_1. We show that p·μ_0 and p·μ_1 agree on moments of degree less than 4, so these moments don't contribute to the difference. We then show that the contribution of the moments of degree 4 is L_4(H)·O(p^4/N) and the contribution of moments of higher degrees is O(p^6 N^{0.5}).

Bounding the Advantage of H(z̄ + z) in Distinguishing p·μ_0 and p·μ_1, for Small p:

Next, as in [RT19, CHLT19], we show a similar statement for the function H(z̄ + z) of z, where z̄ is not too large.
We show that for every H ∈ H, every z̄ ∈ [−1/2, 1/2]^{4N} and p ≤ 1/N,

(1/2)·|E_{z_1∼p·G, z_2∼p·G}[H(z̄ + (z_1, z_2)) − H(z̄ + (z_1, 0)) − H(z̄ + (0, z_2)) + H(z̄)]| ≤ p^4·O(L_4(H)/N) + O(p^6 N^{0.5}).   (1)

The proof of this inequality is similar to the proof of Claim 19 of [CHLT19], using key ideas from [CHHL18], and relies on the multilinearity of functions in H and the closure of H under restrictions.

A Random Walk with a Two-Dimensional Time Variable:
This is the main place where our proof differs from the one of [RT19] and followup works [CHLT19, GRT19]. In all these works the Forrelation distribution was ultimately analyzed as the distribution obtained by a certain random walk. Here, we consider a product of two random walks, which can also be viewed as a random walk with a two-dimensional time variable.

Let T = 16N^2 and p = 1/√T. Let z_1^{(1)}, z_2^{(1)}, …, z_1^{(T)}, z_2^{(T)} ∼ p·G be independent samples. Let t = (t_1, t_2) for t_1, t_2 ∈ {0, 1, …, T}. Let z^{≤(t)} := (Σ_{i=1}^{t_1} z_1^{(i)}, Σ_{i=1}^{t_2} z_2^{(i)}). Note that z^{≤(t)} is distributed according to (p√t_1 · G) × (p√t_2 · G). In particular, z^{≤(T,T)} is distributed according to G × G. This implies that

(∗) := E_{z∼μ_0}[H(z)] − E_{z∼μ_1}[H(z)] = (1/2)·E[H(z^{≤(T,T)}) − H(z^{≤(T,0)}) − H(z^{≤(0,T)}) + H(0, 0)].

We now rewrite (∗) as follows:

(∗) = (1/2) Σ_{t_1∈[T]} Σ_{t_2∈[T]} E[H(z^{≤(t_1,t_2)}) − H(z^{≤(t_1−1,t_2)}) − H(z^{≤(t_1,t_2−1)}) + H(z^{≤(t_1−1,t_2−1)})].   (2)

The last equation follows by a two-dimensional telescopic cancellation, as depicted in Figure 1. This turns out to be a powerful observation. Note that for every fixed t = (t_1, t_2), the random variable z^{≤(t)} − z^{≤(t−(1,1))} = (z_1^{(t_1)}, z_2^{(t_2)}) is distributed according to (p·G) × (p·G), by construction. We can thus apply Inequality (1), setting z̄ = z^{≤(t−(1,1))}. This, along with the triangle inequality, implies that

|(∗)| ≤ (1/2) Σ_{t_1∈[T]} Σ_{t_2∈[T]} |E[H(z^{≤(t_1,t_2)}) − H(z^{≤(t_1−1,t_2)}) − H(z^{≤(t_1,t_2−1)}) + H(z^{≤(t_1−1,t_2−1)})]|
≤ Σ_{t_1∈[T]} Σ_{t_2∈[T]} (p^4·O(L_4(H)/N) + O(p^6 N^{0.5}))   [by Inequality (1)]
= O(L_4(H)/N) + o(1/N)   [since T = 16N^2 = 1/p^2]

This completes the proof overview for k = 2, albeit with many details left out.
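The two-dimensional telescopic cancellation behind Eq. (2) is elementary and can be checked directly. In the sketch below (our own illustration), f is an arbitrary labelling of the grid, standing in for the map t ↦ E[H(z^{≤(t)})]:

```python
def cell_sum(f, T):
    """Sum of signed corner labels over all 1x1 cells of the T x T grid."""
    return sum(f(t1, t2) - f(t1 - 1, t2) - f(t1, t2 - 1) + f(t1 - 1, t2 - 1)
               for t1 in range(1, T + 1) for t2 in range(1, T + 1))

def corner_sum(f, T):
    """Signed corner labels of the whole T x T rectangle."""
    return f(T, T) - f(T, 0) - f(0, T) + f(0, 0)

f = lambda i, j: (i * i + 3 * j) * (j + 1) + i  # any labelling works
```

Every interior vertex appears in four cells with signs summing to zero, every boundary (non-corner) vertex in two cells with opposite signs, so only the four corners of the big rectangle survive, which is exactly Eq. (2).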
Figure 1: Consider the (T + 1) × (T + 1) grid whose vertices are indexed by v ∈ ({0} ∪ [T])^2. Each vertex v is labelled by H(z^{≤(v)}). Each 1×1 rectangle carries signs on its four vertices (+ on the corners (t_1, t_2) and (t_1 − 1, t_2 − 1), − on the other two), and the label of a rectangle is the sum of the signed labels of its vertices. The sum of the labels of all 1×1 rectangles equals the label of the T × T rectangle. This is exactly the content of Eq. (2).

Organization:

We present the preliminaries regarding Forrelation in Section 2 and state our main theorems in Section 3. In Section 4, we show how to bound the advantage of H in distinguishing between p·μ_0 and p·μ_1, for small p. In Section 5, we show how to bound the advantage of H(z̄ + z) in distinguishing between p·μ_0 and p·μ_1, for small p. In Section 6, we give the analysis of our random walk with k-dimensional time variable. Section 7 contains the proofs of the quantum-classical separations.

Preliminaries

Notation

For n ∈ ℕ, we use [n] to denote the set {1, 2, …, n}. We typically use N to refer to 2^n. For a set S ⊆ [n], let S̄ := [n] ∖ S denote the complement of S. For sets S ⊆ [n], T ⊆ [m], we typically use S × T := {(s, t) : s ∈ S, t ∈ T} to denote the set product of S and T. Sometimes, we use the notation (S, T). Note that the map (i, j) → m(i −
1) + j is a bijection between [n] × [m] and [nm]. Using this identification, S × T is a subset of [nm]. We identify subsets S ⊆ [n] with their {0,1} indicator vector, that is, the vector S ∈ {0,1}^n such that for each j ∈ [n], S_j = 1 if and only if j ∈ S.

Let v ∈ ℝ^n. For i ∈ [n], we refer to the i-th coordinate of v by v_i or v(i). For x, y ∈ ℝ^n, let x · y ∈ ℝ^n be the pointwise product of x and y; this is the vector whose i-th coordinate is x_i y_i, for every i ∈ [n]. Let ⟨x, y⟩ denote the real inner product of x and y. For x, y ∈ {0,1}^n, let ⟨x, y⟩ := Σ_{i=1}^{n} x_i y_i mod 2 denote the mod-2 inner product of x and y. We use I_n to denote the n × n identity matrix. We use 0 to denote the zero vector in arbitrary dimensions.

Distributions
For a probability distribution D, let x ∼ D denote a random variable x sampled according to D. For distributions D_1 and D_2, we use D_1 × D_2 to denote the product distribution defined by sampling (x, y) where x ∼ D_1 and y ∼ D_2 are sampled independently. For n ∈ ℕ and a distribution D, let D^n denote the product of n distributions, each of which is D. Let μ ∈ ℝ^n be a vector and Σ ∈ ℝ^{n×n} be a positive semi-definite matrix. We use N(μ, Σ) to refer to the n-dimensional Gaussian distribution with mean μ and covariance matrix Σ. Let U_n denote the uniform distribution on {−1,1}^n. For a distribution D over ℝ^n and a ∈ ℝ^n, let a + D refer to the distribution obtained by sampling z ∼ D and returning z + a. For P ∈ ℝ^n and a distribution D over ℝ^n, let P · D denote the distribution obtained by sampling x ∼ D and returning P · x. For p ∈ ℝ, we use p · D to denote the distribution obtained by sampling x ∼ D and returning px. For I ⊆ [n], let D̂(I) := E_{z∼D}[∏_{i∈I} z_i] refer to the I-th moment of D.

Concentration Inequalities
We make use of the following concentration inequalities. The first is the Gaussian Concentration Inequality [UCB], which states that P_{z∼N(0,1)}[z ≥ t] ≤ e^{−t²/2}. We also use the following concentration inequality for the Chi-Squared distribution [UCB]:

P_{z_1,…,z_n∼N(0,1)}[ |(1/n)·Σ_{i=1}^{n} z_i² − 1| ≥ t ] ≤ 2e^{−nt²/8}   for all t ∈ (0, 1).

Fourier Analysis
We refer to {−1,1}^n as the Boolean hypercube in n dimensions. Let F := {f : {−1,1}^n → ℝ} denote the real vector space of all Boolean functions on n variables. There is an inner product on this space, defined as follows: for f, g ∈ F, let ⟨f, g⟩ := E_{x∼U_n}[f(x)g(x)]. For every S ⊆ [n], there is a character function χ_S : {−1,1}^n → {−1,1} defined at x ∈ {−1,1}^n by χ_S(x) := ∏_{i∈S} x_i. The set of character functions {χ_S}_{S⊆[n]} forms an orthonormal basis for F. For f ∈ F and S ⊆ [n], let f̂(S) := ⟨f, χ_S⟩ denote the S-th Fourier coefficient of f. Note that for all f ∈ F, we have f = Σ_{S⊆[n]} f̂(S) χ_S. For f ∈ F, the multilinear extension of f is the unique multilinear polynomial f̃ : ℝ^n → ℝ which agrees with f on {−1,1}^n. For every S ⊆ [n], the multilinear extension of χ_S is the monomial ∏_{i∈S} x_i. This implies that the multilinear extension of f ∈ F is Σ_{S⊆[n]} f̂(S) ∏_{i∈S} x_i. Henceforth, we identify Boolean functions with their multilinear extensions. With this identification, it can be shown that functions in F which map {−1,1}^n into [−1,
1] also map [−1,1]^n into [−1,1]. For f, g ∈ F, let f ∗ g ∈ F be defined at z ∈ {−1,1}^n by

(f ∗ g)(z) := E_{x∼U_n}[f(x) g(x · z)].

It can be shown that for all S ⊆ [n], we have \widehat{f ∗ g}(S) = f̂(S) ĝ(S).

Level-k Fourier Mass
For f ∈ F and k ∈ {0, 1, …, n}, let L_k(f) := Σ_{|S|=k} |f̂(S)| denote the level-k Fourier mass of f. For a family H ⊆ F of Boolean functions, let L_k(H) := sup_{H∈H} L_k(H).

Let k, N ∈ ℕ be parameters, where N = 2^n for some n ∈ ℕ. We assume that k = o(N^{1/50}). Fix a parameter ε = 1/(24k ln N). Let U refer to U_{2N}.

Hadamard Matrix
The Hadamard matrix H_N of size N is an N × N matrix. The rows and columns are indexed by strings a and b respectively, where a, b ∈ {0,1}^n, and the (a, b)-th entry of H_N is defined to be (1/√N)·(−1)^{⟨a,b⟩}. Equivalently,

H_N(a, b) := −1/√N if Σ_{i=1}^{n} a_i b_i ≡ 1 (mod 2), and +1/√N if Σ_{i=1}^{n} a_i b_i ≡ 0 (mod 2).
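As a quick sanity check of this definition (our own sketch, not the paper's code), H_N built from (−1)^{⟨a,b⟩}/√N is symmetric, has all entries ±1/√N, and satisfies H_N H_N = I_N:

```python
import math

def hadamard(n):
    """H_N(a, b) = (-1)^{<a,b>} / sqrt(N), with a, b read as n-bit strings, N = 2^n."""
    N = 1 << n
    return [[(-1) ** bin(a & b).count("1") / math.sqrt(N) for b in range(N)]
            for a in range(N)]

n = 3
N = 1 << n
H = hadamard(n)
# H_N * H_N, computed explicitly.
HH = [[sum(H[a][c] * H[c][b] for c in range(N)) for b in range(N)]
      for a in range(N)]
max_entry_err = max(abs(abs(H[a][b]) - 1 / math.sqrt(N))
                    for a in range(N) for b in range(N))
max_orth_err = max(abs(HH[a][b] - (1.0 if a == b else 0.0))
                   for a in range(N) for b in range(N))
```

The identity H_N H_N = I_N follows since (1/N)·Σ_c (−1)^{⟨a,c⟩+⟨c,b⟩} = (1/N)·Σ_c (−1)^{⟨a⊕b,c⟩}, which is 1 when a = b and 0 otherwise.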
The Forrelation Function

The Forrelation Function forr : ℝ^{2N} → ℝ is defined as follows. Let z ∈ ℝ^{2N} and x, y ∈ ℝ^N be such that z = (x, y). Then,

forr(z) := (1/N) ⟨x, H_N y⟩.

The ⊕_k Forrelation Decision Problem

Definition 2.1 (The ⊕_k Forrelation Decision Problem). The Forrelation Decision Problem is the partial Boolean function F : {−1,1}^{2N} → {−1,1} defined as follows. For z ∈ {−1,1}^{2N}, let

F(z) := −1 if forr(z) ≥ ε/2;  1 if forr(z) ≤ ε/4;  undefined otherwise.

The ⊕_k Forrelation Decision Problem F^{(k)} : {−1,1}^{2kN} → {−1,1} is defined as the xor of k independent copies of F. To be precise, for every z_1, …, z_k ∈ {−1,1}^{2N}, let

F^{(k)}(z_1, …, z_k) := ∏_{j=1}^{k} F(z_j).

The Gaussian Forrelation Distribution G

Definition 2.2.
Let G denote the Gaussian distribution over ℝ^{2N} defined by the following process.
1. Sample x_1, …, x_N ∼ N(0, ε) independently.
2. Let x = (x_1, …, x_N) and y = H_N x.
3. Output (x, y).

The distribution G can be equivalently expressed as N(0, ε·[[I_N, H_N], [H_N, I_N]]).

Moments of G

We state some useful facts about the moments of G. We use the following notation to refer to the moments of G. For subsets S, T ⊆ [N], let Ĝ(S, T) := E_{(x,y)∼G}[∏_{i∈S} x_i ∏_{j∈T} y_j]. The following claim and its proof appear as Claim 4.1 in [RT19]. We omit the proof.

Claim 2.3.
Let S, T ⊆ [N] and i, j ∈ [N]. Let i_1 = |S|, i_2 = |T|. Then,
1. Ĝ({i}, {j}) = εN^{−1/2}·(−1)^{⟨i,j⟩}.
2. Ĝ(S, T) = 0 if i_1 ≠ i_2.
3. |Ĝ(S, T)| ≤ ε^i · i! · N^{−i/2} if i_1 = i_2 = i.

Product Distributions on ℝ^{2kN}

Let P, Q be two probability distributions on the domain D := ℝ^{2N}. Let S ⊆ [k]. We define P^S Q^{S̄} to be the distribution on D^k defined by sampling x = (x_1, …, x_k), where x_1, …, x_k ∈ D are sampled as follows: for each j ∈ [k], independently sample x_j ∼ P if j ∈ S, and x_j ∼ Q if j ∈ S̄. Note that for every I = (I_1, …, I_k) ⊆ [2kN], where I_1, …, I_k ⊆ [2N], we have

\widehat{P^S Q^{S̄}}(I) = ∏_{j∈S} P̂(I_j) · ∏_{j∉S} Q̂(I_j).

Definition 2.4.
Let G be the distribution in Definition 2.2 and U = U_{2N}. Define a pair of distributions μ_0^{(k)}, μ_1^{(k)} on ℝ^{2kN} as follows:

μ_0^{(k)} := (1/2^{k−1}) Σ_{S⊆[k], |S| even} G^S U^{S̄}   and   μ_1^{(k)} := (1/2^{k−1}) Σ_{S⊆[k], |S| odd} G^S U^{S̄}.

Lemma 2.5.
Let I = (I_1, …, I_k) ⊆ [2kN], where each I_j ⊆ [2N].
1. If |I| < 2k or if I_j = ∅ for some j ∈ [k], then μ̂_0^{(k)}(I) = μ̂_1^{(k)}(I).
2. If |I_j| is odd for some j ∈ [k], then μ̂_0^{(k)}(I) = μ̂_1^{(k)}(I).
3. Let |I| = 2i for some i ∈ ℕ. Then, |μ̂_0^{(k)}(I) − μ̂_1^{(k)}(I)| ≤ 2^{−k+1} ε^i N^{−i/2} i!.

Proof of Lemma 2.5. Note that we have the following equality:

μ̂_0^{(k)}(I) − μ̂_1^{(k)}(I) = (1/2^{k−1}) [ Σ_{S⊆[k], |S| even} \widehat{G^S U^{S̄}}(I) − Σ_{S⊆[k], |S| odd} \widehat{G^S U^{S̄}}(I) ]
= (1/2^{k−1}) Σ_{S⊆[k]} (−1)^{|S|} \widehat{G^S U^{S̄}}(I)
= (1/2^{k−1}) Σ_{S⊆[k]} (−1)^{|S|} ∏_{j∈S} Ĝ(I_j) ∏_{j∉S} Û(I_j)
= (1/2^{k−1}) ∏_{j=1}^{k} (Û(I_j) − Ĝ(I_j)).   (3)

(1.) If |I| < 2k, then there exists some j ∈ [k] such that |I_j| < 2. If |I_j| = 0, then Ĝ(I_j) = Û(I_j) = 1. If |I_j| = 1, Claim 2.3 implies that Ĝ(I_j) = Û(I_j) = 0. This, along with Eq. (3), implies that μ̂_0^{(k)}(I) = μ̂_1^{(k)}(I).

(2.) Suppose |I_j| is odd for some j ∈ [k]. Claim 2.3 implies that Ĝ(I_j) = Û(I_j) = 0. This, along with Eq. (3), implies that μ̂_0^{(k)}(I) = μ̂_1^{(k)}(I).

(3.) Due to items (1.) and (2.) of this lemma, we may assume that I_j ≠ ∅ and |I_j| is even for every j ∈ [k]; otherwise μ̂_0^{(k)}(I) − μ̂_1^{(k)}(I) = 0 and the inequality is trivially true. For each j ∈ [k], let |I_j| = 2i_j for some i_j ∈ ℕ. Claim 2.3 states that if |I_j| = 2i_j, then |Ĝ(I_j)| ≤ ε^{i_j} i_j! N^{−i_j/2}. Since I_j ≠ ∅, we have Û(I_j) = 0. This, along with Eq. (3), implies that

|μ̂_0^{(k)}(I_1, …, I_k) − μ̂_1^{(k)}(I_1, …, I_k)| = (1/2^{k−1}) |∏_{j=1}^{k} (Ĝ(I_j) − Û(I_j))| ≤ (1/2^{k−1}) ∏_{j=1}^{k} ε^{i_j} i_j! N^{−i_j/2} = (1/2^{k−1}) ε^i N^{−i/2} ∏_{j=1}^{k} i_j! ≤ 2^{−k+1} ε^i N^{−i/2} i!.

This completes the proof of Lemma 2.5.

Let trnc : ℝ → [−1,
1] denote the truncation function, whose action on a ∈ ℝ is given by trnc(a) = sign(a) if a ∉ [−1, 1], and trnc(a) = a otherwise. For l ∈ ℕ, we also use trnc : ℝ^l → [−1, 1]^l to refer to the function that applies the above truncation function coordinate-wise.

Definition 2.6.
Let μ be any distribution on ℝ^M. We define the rounded distribution μ̃ on {−1,1}^M as follows:
1. Sample z ∼ μ.
2. For each coordinate i ∈ [M], independently, let z′_i = 1 with probability (1 + trnc(z_i))/2 and z′_i = −1 with probability (1 − trnc(z_i))/2.
3. Output z′ = (z′_1, …, z′_M).

Let z ∈ ℝ^M and μ be the distribution whose support is {z}. We use z̃ to refer to μ̃. We show some useful facts about expectations of multilinear functions over these distributions.
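The rounding above is designed so that each output coordinate has expectation trnc(z_i); for a product distribution this lets one evaluate expectations of multilinear polynomials exactly, which is the content of the claims below. The following self-contained sketch (the polynomial H is an arbitrary choice of ours, not from the paper) checks this by exact enumeration of all sign patterns:

```python
from itertools import product

def trnc(a):
    """Truncation of a real number to [-1, 1]."""
    return max(-1.0, min(1.0, a))

def H(z):
    """An arbitrary multilinear polynomial on R^3 (illustration only)."""
    return 0.5 + z[0] - 2.0 * z[1] * z[2] + 0.25 * z[0] * z[1] * z[2]

def expect_H_rounded(z):
    """E[H(z')] for z' ~ z-tilde (Definition 2.6), computed exactly by summing
    over all 2^M sign patterns, with P[z'_i = 1] = (1 + trnc(z_i)) / 2."""
    p = [(1 + trnc(zi)) / 2 for zi in z]
    total = 0.0
    for signs in product([1, -1], repeat=len(z)):
        prob = 1.0
        for s, pi in zip(signs, p):
            prob *= pi if s == 1 else 1.0 - pi
        total += prob * H(signs)
    return total

z = [0.4, -0.7, 1.5]  # the last coordinate gets truncated to 1
```

Since E[z′_i] = (1 + trnc(z_i))/2 − (1 − trnc(z_i))/2 = trnc(z_i) and the coordinates are independent, multilinearity gives E[H(z′)] = H(trnc(z)), matching the exact enumeration.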
Claim 2.7.
Let H : ℝ^M → ℝ be any multilinear polynomial and a ∈ ℝ^M. Let μ be a distribution on ℝ^M where each coordinate is sampled independently of the rest so that E_{z∼μ}[z] = a. Then,

E_{z∼μ}[H(z)] = H(a).

Corollary 2.8.
Let H : ℝ^M → ℝ be any multilinear polynomial. Let μ be any distribution on ℝ^M and μ̃ be the distribution on {−1,1}^M obtained by rounding μ as in Definition 2.6. Then,

E_{z∼μ̃}[H(z)] = E_{z∼μ}[H(trnc(z))].

Claim 2.9.
Let $H : \mathbb{R}^{2kN} \to \mathbb{R}$ be any multilinear polynomial mapping $\{-1,1\}^{2kN}$ into $[-1,1]$. Let $z_0$ and $P$ be in $[-1/2,1/2]^{2kN}$. Then,
$$\mathbb{E}_{z\sim\mathcal{G}^k}\left[\, \left| H(\mathsf{trnc}(z_0 + P\cdot z)) - H(z_0 + P\cdot z) \right| \,\right] \le O\!\left(\frac{1}{N^{4k^2}}\right)$$

Proof of Claim 2.7.
Let $T \subseteq [M]$ and $z \sim \mu$. The given assumption on $\mu$ is that each $z_j$ for $j \in [M]$ is sampled independently, so that $\mathbb{E}_{z\sim\mu}[z_j] = a_j$. This implies that $\mathbb{E}_{z\sim\mu}[\chi_T(z)] := \mathbb{E}_{z\sim\mu}\big[\prod_{j\in T} z_j\big] = \prod_{j\in T} a_j = \chi_T(a)$. Note that the quantities $\mathbb{E}_{z\sim\mu}[H(z)]$ and $H(a)$ are both linear with respect to $H$. Since we have shown that $\mathbb{E}_{z\sim\mu}[H(z)] = H(a)$ for all character functions $H$, this observation implies that $\mathbb{E}_{z\sim\mu}[H(z)] = H(a)$ for all multilinear functions $H$.
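Claim 2.7 can be verified numerically by brute force (our own sketch, not from the paper): for a multilinear polynomial given by its monomial coefficients, the exact expectation over independent $\pm 1$ coordinates with prescribed means equals the evaluation of the polynomial at the mean vector.

```python
from itertools import product

def prod_over(S, z):
    # Monomial prod_{i in S} z_i.
    out = 1.0
    for i in S:
        out *= z[i]
    return out

def eval_multilinear(coeffs, z):
    # coeffs maps frozenset S -> coefficient of the monomial prod_{i in S} z_i.
    return sum(c * prod_over(S, z) for S, c in coeffs.items())

def expectation_pm1(coeffs, means):
    # Exact expectation when coordinate i is +1 w.p. (1 + means[i]) / 2 and -1
    # otherwise, independently; computed by enumerating the whole cube.
    M = len(means)
    total = 0.0
    for signs in product([-1, 1], repeat=M):
        p = 1.0
        for s, m in zip(signs, means):
            p *= (1 + m) / 2 if s == 1 else (1 - m) / 2
        total += p * eval_multilinear(coeffs, signs)
    return total
```

The agreement holds monomial by monomial, by independence; linearity then extends it to the whole polynomial, exactly as in the proof above.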
Observe that for every $z \in \mathbb{R}^M$, the distribution $\tilde{z}$ as in Definition 2.6 satisfies the hypothesis of Claim 2.7 with $a = \mathsf{trnc}(z)$. Claim 2.7 implies that $\mathbb{E}_{z'\sim\tilde{z}}[H(z')] = H(\mathsf{trnc}(z))$. Therefore, $\mathbb{E}_{z\sim\tilde{\mu}}[H(z)] := \mathbb{E}_{z\sim\mu}\,\mathbb{E}_{z'\sim\tilde{z}}[H(z') \mid z] = \mathbb{E}_{z\sim\mu}[H(\mathsf{trnc}(z))]$.

Corollary 2.8 is similar to Equation (2) from [RT19] and Claim 2.2 from [GRT19]. Claim 2.9 is similar to Claim 5.3 from [RT19]. The proof is also identical, so we omit it. We remark that the bound in [RT19] is $8\cdot N^{-2}$ as opposed to our bound of $O(N^{-4k^2})$. This difference in parameters arises from our choice of $\epsilon = \frac{1}{60k^2\ln N}$ as opposed to their choice of $\epsilon = \frac{1}{24\ln N}$. We also remark that the claim as stated in [RT19] is for scalars $P \in [-1/2,1/2]$ as opposed to our assumption of $P \in [-1/2,1/2]^{2kN}$. However, their proof works under this assumption as well.

2.4 The Forrelation Distribution

Let $k \in \mathbb{N}$. Let $\tilde{\mu}^{(k)}_0$ and $\tilde{\mu}^{(k)}_1$ (respectively $\tilde{\mathcal{G}}$) be distributions over $\{-1,1\}^{2kN}$ (respectively $\{-1,1\}^{2N}$) generated by rounding $\mu^{(k)}_0$ and $\mu^{(k)}_1$ (respectively $\mathcal{G}$) according to Definition 2.6. Observe that we may alternatively define $\tilde{\mu}^{(k)}_0$ and $\tilde{\mu}^{(k)}_1$ as follows.

Definition 2.10.
Let $\mathcal{G}$ be as in Definition 2.2 and $U = U_{2N}$. Let
$$\tilde{\mu}^{(k)}_0 := \frac{1}{2^{k-1}} \sum_{\substack{S\subseteq[k]\\ |S|\text{ is even}}} \tilde{\mathcal{G}}^S U^{\bar S} \qquad\text{and}\qquad \tilde{\mu}^{(k)}_1 := \frac{1}{2^{k-1}} \sum_{\substack{S\subseteq[k]\\ |S|\text{ is odd}}} \tilde{\mathcal{G}}^S U^{\bar S}$$
We refer to $\tilde{\mu}^{(1)}_1 = \tilde{\mathcal{G}}$ as the Forrelation distribution.

We show that the distributions $\tilde{\mu}^{(k)}_1$ and $\tilde{\mu}^{(k)}_0$ put considerable mass on the yes and no instances of $F^{(k)}$, respectively, where $F^{(k)}$ is the $\oplus^k$ Forrelation decision problem as in Definition 2.1.
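The normalization $\frac{1}{2^{k-1}}$ reflects the fact that exactly half of the $2^k$ subsets of $[k]$ have even size (and half odd), so each mixture above averages over $2^{k-1}$ product distributions. A quick sanity check of this count (our own illustration, not from the paper):

```python
from itertools import combinations

def num_even_subsets(k):
    # Count subsets S of [k] with |S| even, by direct enumeration.
    return sum(1 for r in range(0, k + 1, 2)
                 for _ in combinations(range(k), r))
```

The same count for odd-size subsets follows by complementation, which is why both mixtures carry the same normalizing factor.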
Lemma 2.11.
Let $\tilde{\mu}^{(k)}_0$ and $\tilde{\mu}^{(k)}_1$ be distributions as in Definition 2.10 and $F^{(k)}$ be the $\oplus^k$ Forrelation decision problem as in Definition 2.1. Then,
$$\Pr_{z\sim\tilde{\mu}^{(k)}_0}\big[F^{(k)}(z) = 1\big] \ge 1 - O\!\left(\frac{k}{N^{2k}}\right) \qquad\text{and}\qquad \Pr_{z\sim\tilde{\mu}^{(k)}_1}\big[F^{(k)}(z) = -1\big] \ge 1 - O\!\left(\frac{k}{N^{2k}}\right)$$
The proofs of these statements use hypercontractivity to establish concentration inequalities for low-degree polynomials under product distributions on the Boolean hypercube. These proofs are technical and are deferred to the appendix.
Definition 2.12.
Let $a \in \{-1,0,1\}^M$. Let $\rho_a : \mathbb{R}^M \to \mathbb{R}^M$ be a restriction defined as follows. For $v \in \mathbb{R}^M$, let $\rho_a(v) \in \mathbb{R}^M$ be such that for all $j \in [M]$,
$$(\rho_a(v))(j) := \begin{cases} v(j) & \text{if } a(j) = 0 \\ a(j) & \text{otherwise} \end{cases}$$
For a function $F : \{-1,1\}^M \to \mathbb{R}$, the restricted function $F\circ\rho_a : \{-1,1\}^M \to \mathbb{R}$ is defined at $z \in \{-1,1\}^M$ by $(F\circ\rho_a)(z) := F(\rho_a(z))$.

We say that a family $\mathcal{H}$ of Boolean functions in $M$ variables is closed under restrictions if for all restrictions $a \in \{-1,0,1\}^M$ and $H \in \mathcal{H}$, the restricted function $H\circ\rho_a$ is in $\mathcal{H}$.

Let $N \in \mathbb{N}$ be a parameter describing the input size. We will assume that $N$ is a sufficiently large power of 2. Let $k \in \mathbb{N}$. We assume that $k = o(N^{1/4})$. Let $\epsilon = \frac{1}{60k^2\ln N}$ be the parameter defining $\mathcal{G}$ as before.

Theorem 3.1. Let $\mathcal{H}$ be a family of Boolean functions on $2kN$ variables, each of which maps $\{-1,1\}^{2kN}$ into $[-1,1]$. Assume that $\mathcal{H}$ is closed under restrictions. Let $\tilde{\mu}^{(k)}_0, \tilde{\mu}^{(k)}_1$ be the distributions over $\{-1,1\}^{2kN}$ as in Definition 2.10. Then, for every $H \in \mathcal{H}$,
$$\left| \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_1}[H(z)] \right| \le O\!\left(\frac{L_{2k}(\mathcal{H})}{N^{k/2}}\right) + o\!\left(\frac{1}{N^{k/2}}\right)$$

Definition 3.2.
Let $\tilde{\mu}^{(k)}_0, \tilde{\mu}^{(k)}_1$ be as in Definition 2.10. Let $\sigma^{(k)}_0$ (respectively $\sigma^{(k)}_1$) be obtained by conditioning $\tilde{\mu}^{(k)}_0$ (respectively $\tilde{\mu}^{(k)}_1$) on being a no (respectively yes) instance of $F^{(k)}$.

Corollary 3.3.
Under the same hypothesis as Theorem 3.1, for every $H \in \mathcal{H}$,
$$\left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[H(z)] \right| \le O\!\left(\frac{L_{2k}(\mathcal{H})}{N^{k/2}}\right) + o\!\left(\frac{1}{N^{k/2}}\right)$$

Query Complexity Separations

Lemma 3.4.
Let $D : \{-1,1\}^{2kN} \to \{-1,1\}$ be a deterministic decision tree of depth $d \ge 1$. Then,
$$\left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[D(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[D(z)] \right| \le \left(\frac{O(d\log(kN))}{N^{1/2}}\right)^{k}$$

Theorem 3.5. $F^{(k)}$ can be computed in the bounded-error quantum query model with $O(k^5\log k\log^2 N)$ queries. However, every randomized decision tree of depth $\tilde{o}(\sqrt{N})$ has a worst-case success probability of at most $\frac{1}{2} + \exp(-\Omega(k))$.

Setting $k = \lceil\log^c N\rceil$ for $c \in \mathbb{N}$ in Theorem 3.5 gives us an explicit family of partial functions that are computable by quantum query algorithms of cost $\tilde{O}(\log^{5c+2} N)$, however every randomized query algorithm of cost $\tilde{o}(\sqrt{N})$ has at most $\exp(-\Omega(\log^c N))$ advantage over random guessing.

Communication Complexity Separations

Definition 3.6 (The $\oplus^k$ Forrelation Communication Problem $F^{(k)}\circ\mathsf{xor}$). Alice is given $x$ and Bob is given $y$, where $x, y \in \{-1,1\}^{2kN}$. Let $F^{(k)}$ be as in Definition 2.1. Their goal is to compute the partial function $F^{(k)}(x\cdot y)$, where $x\cdot y$ denotes the coordinate-wise product.

Lemma 3.7.
Let $C : \{-1,1\}^{2kN}\times\{-1,1\}^{2kN} \to \{-1,1\}$ be any deterministic protocol of communication complexity $c$. Then,
$$\left| \mathbb{E}_{\substack{x\sim U_{2kN}\\ z\sim\sigma^{(k)}_0}}[C(x, x\cdot z)] - \mathbb{E}_{\substack{x\sim U_{2kN}\\ z\sim\sigma^{(k)}_1}}[C(x, x\cdot z)] \right| \le O\!\left(\frac{(c+8k)^{2k}}{N^{k/2}}\right)$$

Theorem 3.8. $F^{(k)}\circ\mathsf{xor}$ can be solved in the quantum simultaneous-with-entanglement model with $O(k^5\log k\log^3 N)$ bits of communication, when Alice and Bob share $O(k^5\log k\log^3 N)$ EPR pairs. However, any randomized protocol of cost $\tilde{o}(N^{1/4})$ has a worst-case success probability of at most $\frac{1}{2} + \exp(-\Omega(k))$.

Setting $k = \lceil\log^c N\rceil$ for $c \in \mathbb{N}$ in Theorem 3.8 gives us an explicit family of partial functions that are computable by quantum simultaneous protocols of cost $\tilde{O}(\log^{5c+3} N)$ when Alice and Bob share $\tilde{O}(\log^{5c+3} N)$ EPR pairs, however every interactive randomized protocol of cost $\tilde{o}(N^{1/4})$ has at most $\exp(-\Omega(\log^c N))$ advantage over random guessing.

Circuit Complexity Separations

Lemma 3.9.
Let $C : \{-1,1\}^{2kN} \to \{-1,1\}$ be an AC0 circuit of depth $d \ge 2$ and size $s$. Then,
$$\left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[C(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[C(z)] \right| \le \left(\frac{O(\log^{2(d-1)}(s))}{N^{1/2}}\right)^{k}$$

Theorem 3.10.
The distributions $\sigma^{(k)}_1$ and $\sigma^{(k)}_0$ can be distinguished by a bounded-error quantum query algorithm with $O(k^5\log k\log^2 N)$ queries with $9/10$ advantage. However, every constant-depth circuit of size $o\big(\exp\big(N^{\frac{1}{4(d-1)}}\big)\big)$ can distinguish these distributions with at most $\exp(-\Omega(k))$ advantage.

Setting $k = \lceil\log^c N\rceil$ for $c \in \mathbb{N}$ in Theorem 3.10 gives us an explicit family of distributions that are distinguishable by quantum query algorithms of cost $\tilde{O}(\log^{5c+2} N)$, however every constant-depth circuit of quasipolynomial size can distinguish them with at most $\exp(-\Omega(\log^c N))$ advantage.

Lemma 4.1.
Let $H$ be a Boolean function on $2kN$ variables that maps $\{-1,1\}^{2kN}$ into $[-1,1]$. Let $p \le \frac{1}{2N^{1/4}}$ and $P \in [-p,p]^{2kN}$. Then,
$$\Delta := \left| \mathbb{E}_{z\sim P\cdot\mu^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim P\cdot\mu^{(k)}_1}[H(z)] \right| \le O\!\left( 4^{-k}\cdot L_{2k}(H)\cdot\frac{p^{2k}}{N^{k/2}} + p^{2(k+1)} N^{(k+1)/2} \right)$$

Proof of Lemma 4.1. For all $z \in \mathbb{R}^{2kN}$, we have $H(z) = \sum_{S\subseteq[2kN]} \widehat{H}(S)\prod_{i\in S} z_i$. This implies that
$$\Delta = \left| \sum_{S\subseteq[2kN]} \widehat{H}(S)\left( \mathbb{E}_{z\sim P\cdot\mu^{(k)}_0}\Big[\prod_{i\in S} z_i\Big] - \mathbb{E}_{z\sim P\cdot\mu^{(k)}_1}\Big[\prod_{i\in S} z_i\Big] \right) \right| = \left| \sum_{S\subseteq[2kN]} \widehat{H}(S)\cdot\prod_{i\in S} P_i\cdot\left( \widehat{\mu^{(k)}_0}(S) - \widehat{\mu^{(k)}_1}(S) \right) \right|$$
$$\le \sum_{S\subseteq[2kN]} |\widehat{H}(S)|\cdot p^{|S|}\cdot\left| \widehat{\mu^{(k)}_0}(S) - \widehat{\mu^{(k)}_1}(S) \right| \qquad\ldots\text{ since } P \in [-p,p]^{2kN}$$
We now apply Lemma 2.5 to bound the difference in moments between the distributions $\mu^{(k)}_0$ and $\mu^{(k)}_1$. Lemma 2.5 implies that if $|S| < 2k$ or $|S|$ is odd, then $\widehat{\mu^{(k)}_0}(S) = \widehat{\mu^{(k)}_1}(S)$. Furthermore, if $|S| = 2i$ for some $i \in \mathbb{N}$, then $\big|\widehat{\mu^{(k)}_0}(S) - \widehat{\mu^{(k)}_1}(S)\big| \le 2^{-k+1}\epsilon^i N^{-i/2}\, i!$. This implies that
$$\Delta \le \sum_{i=k}^{kN} \sum_{|S|=2i} |\widehat{H}(S)|\cdot 2^{-k+1}\epsilon^i N^{-i/2}\, i!\, p^{2i}$$
Since $H$ maps $\{-1,1\}^{2kN}$ to $[-1,1]$, for $i > k$ we bound $\sum_{|S|=2i}|\widehat{H}(S)|$ by $\sqrt{\binom{2kN}{2i}}$, and we also bound $2^{-k+1}$ by 1 in those terms. This, along with the previous inequality, implies that
$$\Delta \le L_{2k}(H)\cdot\big(2^{-k+1} k!\,\epsilon^k\big)\cdot N^{-k/2} p^{2k} + \sum_{i=k+1}^{kN} \left( \sqrt{\binom{2kN}{2i}}\cdot i!\,\epsilon^i \right) N^{-i/2} p^{2i}$$
Note that $\sqrt{\binom{2kN}{2i}}\cdot i! \le \frac{(2k)^i N^i}{\sqrt{(2i)!}}\, i! = O\!\left( \frac{(2k)^i e^i}{(2i)^i}\cdot\frac{i^i}{e^i}\cdot N^i \right) = O(k^i N^i)$. Furthermore, since $\epsilon = \frac{1}{60k^2\ln N}$, for all $i \ge k$ we have $\epsilon^i k^i \le \left(\frac{1}{60k\ln N}\right)^i \le 1$. This implies that $2^{-k+1} k!\,\epsilon^k = O(4^{-k})$ and $\sqrt{\binom{2kN}{2i}}\cdot i!\,\epsilon^i = O(N^i)$. Substituting these bounds in the previous inequality for $\Delta$, we have
$$\Delta \le O\!\left( 4^{-k}\cdot L_{2k}(H)\, N^{-k/2} p^{2k} + \sum_{i=k+1}^{kN} N^{i/2} p^{2i} \right)$$
In the summation $\sum_{i\ge k+1} N^{i/2} p^{2i}$, every successive term is smaller than the previous by a factor of at least $1/4$: this is because the assumption $p \le \frac{1}{2N^{1/4}}$ implies that $N^{1/2} p^2 \le \frac14$. Thus, we can bound this summation by twice the first term, which is $O\big(N^{(k+1)/2} p^{2(k+1)}\big)$. This implies that
$$\Delta \le O\!\left( 4^{-k}\cdot L_{2k}(H)\, N^{-k/2} p^{2k} + N^{(k+1)/2} p^{2(k+1)} \right)$$
This completes the proof of Lemma 4.1.

(Here we used that $\sum_{|S|=2i}|\widehat{H}(S)| \le \sqrt{\sum_{|S|=2i} 1}\cdot\sqrt{\sum_{|S|=2i}\widehat{H}(S)^2} \le \sqrt{\binom{2kN}{2i}}$, by Cauchy–Schwarz and Parseval.)

Single Step Analysis Away from the Origin
Lemma 5.1.
Let $\mathcal{H}$ be a family of Boolean functions on $2kN$ variables, each of which maps $\{-1,1\}^{2kN}$ into $[-1,1]$. Assume that $\mathcal{H}$ is closed under restrictions. Let $p \le \frac{1}{4N^{1/4}}$ and $z_0 \in [-1/2,1/2]^{2kN}$. Then, for all $H \in \mathcal{H}$,
$$\Delta := \left| \mathbb{E}_{z\sim p\cdot\mu^{(k)}_0}[H(z_0+z)] - \mathbb{E}_{z\sim p\cdot\mu^{(k)}_1}[H(z_0+z)] \right| \le O\!\left( 4^{-k}\cdot L_{2k}(\mathcal{H})\cdot\frac{(2p)^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} \right)$$
Let $O$ denote the distribution on $\mathbb{R}^{2N}$ whose support is $\{0\}$ (i.e., the distribution that puts all its mass on the zero vector in $\mathbb{R}^{2N}$).

Corollary 5.2.
Under the same hypothesis as Lemma 5.1, for all $H \in \mathcal{H}$,
$$\Delta := \frac{1}{2^{k-1}} \left| \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_{z\sim z_0 + p\cdot\mathcal{G}^S O^{\bar S}}[H(z)] \right| \le O\!\left( 4^{-k}\cdot L_{2k}(\mathcal{H})\cdot\frac{(2p)^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} \right)$$

Proof of Corollary 5.2 from Lemma 5.1.
We show that the expressions for $\Delta$ in Corollary 5.2 and Lemma 5.1 are identical. Let $\Gamma := \left| \mathbb{E}_{z\sim p\cdot\mu^{(k)}_0}[H(z_0+z)] - \mathbb{E}_{z\sim p\cdot\mu^{(k)}_1}[H(z_0+z)] \right|$ be the expression for $\Delta$ in Lemma 5.1. By the definition of $\mu^{(k)}_0, \mu^{(k)}_1$ as in Definition 2.4, we have
$$\Gamma = \frac{1}{2^{k-1}} \left| \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_{z\sim p\cdot\mathcal{G}^S U^{\bar S}}[H(z_0+z)] \right| = \frac{1}{2^{k-1}} \left| \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_{z\sim\mathcal{G}^S U^{\bar S}}[H(z_0+pz)] \right| \qquad (4)$$
Let $S \subseteq [k]$. We now show that $\mathbb{E}_{z\sim\mathcal{G}^S U^{\bar S}}[H(z_0+pz)] = \mathbb{E}_{z\sim\mathcal{G}^S O^{\bar S}}[H(z_0+pz)]$. Substituting this in the above equation completes the proof. Let $z_1 \sim \mathcal{G}^S O^{\bar S}$ and $z_2 \sim O^S U^{\bar S}$. Note that $z_1 + z_2 \sim \mathcal{G}^S U^{\bar S}$. Fix $z_1 \in \mathbb{R}^{2kN}$. Note that the multilinear polynomial $H(z_0 + pz_1 + pz_2)$, viewed as a polynomial in $z_2$, and the distribution $O^S U^{\bar S}$ satisfy the hypothesis of Claim 2.7 with $a = 0$. Claim 2.7 implies that for all $z_1 \in \mathbb{R}^{2kN}$, we have $\mathbb{E}_{z_2\sim O^S U^{\bar S}}[H(z_0+pz_1+pz_2) \mid z_1] = H(z_0+pz_1)$. It then follows that
$$\mathbb{E}_{z\sim\mathcal{G}^S U^{\bar S}}[H(z_0+pz)] = \mathbb{E}_{\substack{z_1\sim\mathcal{G}^S O^{\bar S}\\ z_2\sim O^S U^{\bar S}}}[H(z_0+pz_1+pz_2)] = \mathbb{E}_{z_1\sim\mathcal{G}^S O^{\bar S}}[H(z_0+pz_1)] = \mathbb{E}_{z\sim z_0+p\cdot\mathcal{G}^S O^{\bar S}}[H(z)]$$
Substituting the above in Eq. (4) implies that $\Delta = \Gamma$. This, along with Lemma 5.1, completes the proof of Corollary 5.2.
Let $v \in \{-1,0,1\}^{2kN}$ be obtained by the following process, which we denote by $v \sim z_0$. For every $i \in [2kN]$, independently, set
$$v(i) := \begin{cases} \mathsf{sign}(z_0(i)) & \text{with probability } |z_0(i)| \\ 0 & \text{with probability } 1 - |z_0(i)| \end{cases}$$
Let $\rho_v$ be a restriction as in Definition 2.12. For $i \in [2kN]$, define $P_i := \frac{1}{1-|z_0(i)|}$. Since $z_0 \in [-1/2,1/2]^{2kN}$, we have $P \in [1,2]^{2kN}$. Note that for every $i \in [2kN]$ and $z \in \{-1,1\}^{2kN}$,
$$\mathbb{E}_{v\sim z_0}[(\rho_v(z))(i)] = |z_0(i)|\,\mathsf{sign}(z_0(i)) + (1-|z_0(i)|)\,z(i) = z_0(i) + P_i^{-1} z(i)$$
This implies that $\mathbb{E}_{v\sim z_0}[\rho_v(z)] = z_0 + P^{-1}\cdot z$ for all $z \in \{-1,1\}^{2kN}$. Note that for every $z \in \{-1,1\}^{2kN}$, the multilinear polynomial $H$ and the random variable $\rho_v(z)$ satisfy the hypothesis of Claim 2.7 with $a = z_0 + P^{-1}\cdot z$. Claim 2.7 implies that for all $z \in \{-1,1\}^{2kN}$,
$$\mathbb{E}_{v\sim z_0}[H(\rho_v(z))] = H(z_0 + P^{-1}\cdot z)$$
Consider the restricted function $H\circ\rho_v$. For every $z \in \{-1,1\}^{2kN}$ and $v \in \{-1,0,1\}^{2kN}$, by definition, $(H\circ\rho_v)(z) = H(\rho_v(z))$. This, along with the previous equality, implies that for all $z \in \{-1,1\}^{2kN}$,
$$\mathbb{E}_{v\sim z_0}[(H\circ\rho_v)(z)] = H(z_0 + P^{-1}\cdot z)$$
Note that both the L.H.S. and the R.H.S. of the above equation are multilinear polynomials in $z$ (since we identify $H\circ\rho_v$ with its multilinear extension). Thus, the above equation holds for all $z \in \mathbb{R}^{2kN}$. In particular, for all distributions $D$ over $\mathbb{R}^{2kN}$, it holds that
$$\mathbb{E}_{z\sim D}\,\mathbb{E}_{v\sim z_0}[(H\circ\rho_v)(z)] = \mathbb{E}_{z\sim D}[H(z_0 + P^{-1}\cdot z)] \qquad (5)$$
This implies that $\Delta$ can be expressed as follows.
$$\Delta := \left| \mathbb{E}_{z\sim p\cdot\mu^{(k)}_0}[H(z_0+z)] - \mathbb{E}_{z\sim p\cdot\mu^{(k)}_1}[H(z_0+z)] \right| = \left| \mathbb{E}_{z\sim pP\cdot\mu^{(k)}_0}\big[H(z_0+P^{-1}\cdot z)\big] - \mathbb{E}_{z\sim pP\cdot\mu^{(k)}_1}\big[H(z_0+P^{-1}\cdot z)\big] \right|$$
$$= \left| \mathbb{E}_{v\sim z_0}\!\left[ \mathbb{E}_{z\sim pP\cdot\mu^{(k)}_0}[(H\circ\rho_v)(z)] - \mathbb{E}_{z\sim pP\cdot\mu^{(k)}_1}[(H\circ\rho_v)(z)] \right] \right| \qquad\ldots\text{ due to Eq. (5)}$$
$$\le \max_{v} \left| \mathbb{E}_{z\sim pP\cdot\mu^{(k)}_0}[(H\circ\rho_v)(z)] - \mathbb{E}_{z\sim pP\cdot\mu^{(k)}_1}[(H\circ\rho_v)(z)] \right| \qquad\ldots\text{ triangle inequality}$$
Fix any $v \in \{-1,0,1\}^{2kN}$. We now apply Lemma 4.1 to the function $H\circ\rho_v$ with the parameters $2p$ and $pP$. Since $\mathcal{H}$ is closed under restrictions, $H\circ\rho_v \in \mathcal{H}$. Note that the assumption $p \le \frac{1}{4N^{1/4}}$ and $P \in [1,2]^{2kN}$ implies that $2p \le \frac{1}{2N^{1/4}}$ and $pP \in [-2p,2p]^{2kN}$, and thus the hypothesis of Lemma 4.1 is satisfied. Furthermore, we can bound $L_{2k}(H\circ\rho_v)$ by $L_{2k}(\mathcal{H})$, by definition of the latter. Lemma 4.1 implies that
$$\Delta \le O\!\left( 4^{-k}\cdot L_{2k}(\mathcal{H})\cdot\frac{(2p)^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} \right)$$
This completes the proof of Lemma 5.1.
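The identity $\mathbb{E}_{v\sim z_0}[\rho_v(z)] = z_0 + P^{-1}\cdot z$, with $P_i^{-1} = 1 - |z_0(i)|$, can be checked numerically by exact enumeration (our own sketch, not from the paper):

```python
from itertools import product

def rho(v, z):
    # Restriction of Definition 2.12: keep z(i) where v(i) = 0, else fix to v(i).
    return [zi if vi == 0 else vi for vi, zi in zip(v, z)]

def expected_restriction(z0, z):
    # Exact expectation of rho_v(z) over v ~ z0: independently for each i,
    # v(i) = sign(z0(i)) with probability |z0(i)|, and v(i) = 0 otherwise.
    M = len(z0)
    out = [0.0] * M
    for v in product([-1, 0, 1], repeat=M):
        p = 1.0
        for vi, z0i in zip(v, z0):
            if vi == 0:
                p *= 1 - abs(z0i)
            elif z0i != 0 and vi == (1 if z0i > 0 else -1):
                p *= abs(z0i)
            else:
                p = 0.0
        if p > 0:
            for i, ri in enumerate(rho(v, z)):
                out[i] += p * ri
    return out
```

Each coordinate is handled independently, so the expectation factorizes exactly as in the displayed computation above.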
For $u, v \in \mathbb{N}^k$, let $\mathbb{1}_{u=v} \in \{0,1\}$ be the indicator function that is 1 if and only if $u = v$. As mentioned in the preliminaries, we identify sets $S \subseteq [k]$ with their indicator vectors in $\{0,1\}^k$.

Proof of Theorem 3.1

Let $\Delta := \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_1}[H(z)]$ be the quantity that we wish to bound in Theorem 3.1. By the definition of $\tilde{\mu}^{(k)}_0, \tilde{\mu}^{(k)}_1$ as in Definition 2.10, we have
$$\Delta = \frac{1}{2^{k-1}} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_{z\sim\tilde{\mathcal{G}}^S U^{\bar S}}[H(z)]$$
Let $S \subseteq [k]$. Note that the distribution $\tilde{\mathcal{G}}^S U^{\bar S}$ is obtained by rounding the distribution $\mathcal{G}^S O^{\bar S}$ as in Definition 2.6. We can thus apply Corollary 2.8 to the multilinear polynomial $H(z)$ and the distribution $\mathcal{G}^S O^{\bar S}$ to obtain that $\mathbb{E}_{z\sim\tilde{\mathcal{G}}^S U^{\bar S}}[H(z)] = \mathbb{E}_{z\sim\mathcal{G}^S O^{\bar S}}[H(\mathsf{trnc}(z))]$. This, along with the above expression for $\Delta$, implies that
$$\Delta = \frac{1}{2^{k-1}} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_{z\sim\mathcal{G}^S O^{\bar S}}[H(\mathsf{trnc}(z))] \qquad (6)$$
Let $T = 16^k N^{k+1}$ and $p = \frac{1}{\sqrt{T}} = \frac{1}{4^k N^{(k+1)/2}}$. For each $t \in [T]$ and $j \in [k]$, let $z^{(t)}_j \sim p\cdot\mathcal{G}$ be an independent sample. By convention, $z^{(0)}_j := 0$ for all $j \in [k]$. Let $Z$ refer to the collection $\{z^{(t)}_j\}_{t\in\{0,\dots,T\},\, j\in[k]}$ of random variables. For $t \in \{0,\dots,T\}$ and $j \in [k]$, define $z^{\le(t)}_j := z^{(0)}_j + \dots + z^{(t)}_j$. Note that the random variable $z^{\le(t)}_j$ has a Gaussian distribution with mean 0 and covariance matrix $p^2 t$ times that of $\mathcal{G}$, for all $j \in [k]$. In particular, $z^{\le(T)}_j$ is distributed according to $\mathcal{G}$ for all $j \in [k]$.

Let $a = (a_1,\dots,a_k)$ for $a_1,\dots,a_k \in \{0,\dots,T\}$, and let $a-1$ denote $(a_1-1,\dots,a_k-1)$. Define $z^{(a)} := (z^{(a_1)}_1,\dots,z^{(a_k)}_k)$ and $z^{\le(a)} := (z^{\le(a_1)}_1,\dots,z^{\le(a_k)}_k)$. Note that $z^{(a)}$ is distributed according to $p\cdot\mathcal{G}^k$ for all $a \in [T]^k$. Also note that $z^{\le(a)}$ is distributed according to $(p\sqrt{a_1}\cdot\mathcal{G})\times\dots\times(p\sqrt{a_k}\cdot\mathcal{G})$ for all $a \in \{0,\dots,T\}^k$. In particular, for every $S \subseteq [k]$, the random variable $z^{\le(T\cdot S)}$ is distributed according to $\mathcal{G}^S O^{\bar S}$.
Using this observation in Eq. (6), we have
$$\Delta = \frac{1}{2^{k-1}} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ H\big(\mathsf{trnc}(z^{\le(T\cdot S)})\big) \right] \qquad (7)$$

Claim 6.1.
For $a \in [T]^k$, let $\Delta_a$ be as follows.
$$\Delta_a := \frac{1}{2^{k-1}} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \right]$$
Then, $\sum_{a\in[T]^k} \Delta_a = \Delta$.

Proof of Claim 6.1. By definition of $\Delta_a$, we have
$$2^{k-1} \sum_{a\in[T]^k} \Delta_a = \sum_{a\in[T]^k} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \right]$$
For $a \in [T]^k$ and $S \subseteq [k]$, note that $a - \bar S \in \{0,\dots,T\}^k$. Thus, the R.H.S. of the above equation is a linear combination of terms $\mathbb{E}_Z[H(\mathsf{trnc}(z^{\le(b)}))]$ for $b \in \{0,\dots,T\}^k$. That is,
$$2^{k-1} \sum_{a\in[T]^k} \Delta_a = \sum_{b\in\{0,\dots,T\}^k} \left( \sum_{a\in[T]^k} \sum_{S\subseteq[k]} \mathbb{1}_{a-\bar S=b}\cdot(-1)^{|S|} \right) \mathbb{E}_Z\big[H(\mathsf{trnc}(z^{\le(b)}))\big] \qquad (8)$$
We now study the coefficient of $\mathbb{E}_Z[H(\mathsf{trnc}(z^{\le(b)}))]$ in the R.H.S. of the above expression. Note that $(-1)^{|S|}$ is exactly $\prod_{j=1}^k (1-2S_j)$. For $a \in [T]^k$, write $a = (a_1,\dots,a_k)$ with $a_1,\dots,a_k \in [T]$. Using this notation, the coefficient of $\mathbb{E}_Z[H(\mathsf{trnc}(z^{\le(b)}))]$ in Eq. (8) is
$$\sum_{a\in[T]^k} \sum_{S\subseteq[k]} \mathbb{1}_{a-\bar S=b}\cdot(-1)^{|S|} = \sum_{a\in[T]^k} \sum_{S\subseteq[k]} \prod_{j\in[k]} (1-2S_j)\left( S_j\cdot\mathbb{1}_{a_j=b_j} + (1-S_j)\cdot\mathbb{1}_{a_j-1=b_j} \right)$$
$$= \sum_{a\in[T]^k} \prod_{j\in[k]} \sum_{S_j\in\{0,1\}} (1-2S_j)\left( S_j\cdot\mathbb{1}_{a_j=b_j} + (1-S_j)\cdot\mathbb{1}_{a_j-1=b_j} \right) = \sum_{a\in[T]^k} \prod_{j\in[k]} \left( \mathbb{1}_{a_j-1=b_j} - \mathbb{1}_{a_j=b_j} \right)$$
$$= \prod_{j\in[k]} \sum_{a_j\in[T]} \left( \mathbb{1}_{a_j-1=b_j} - \mathbb{1}_{a_j=b_j} \right) = \prod_{j\in[k]} \left( \mathbb{1}_{b_j=0} - \mathbb{1}_{b_j=T} \right)$$
Note that $\prod_{j\in[k]} (\mathbb{1}_{b_j=0} - \mathbb{1}_{b_j=T})$ is non-zero if and only if each coordinate of $b$ is in $\{0,T\}$. For $b \in \{0,T\}^k$, let $B := \{j \in [k] : b_j = T\}$. Note that $\prod_{j\in[k]} (\mathbb{1}_{b_j=0} - \mathbb{1}_{b_j=T}) = (-1)^{|B|}$.
This, along with the above calculation, implies that the coefficient of $\mathbb{E}_Z[H(\mathsf{trnc}(z^{\le(b)}))]$ in the R.H.S. of Eq. (8) is precisely $(-1)^{|B|}$. Furthermore, note that $z^{\le(b)} = z^{\le(T\cdot B)}$. We substitute this in Eq. (8) to obtain
$$2^{k-1} \sum_{a\in[T]^k} \Delta_a = \mathbb{E}_Z\!\left[ \sum_{B\subseteq[k]} (-1)^{|B|}\, H\big(\mathsf{trnc}(z^{\le(T\cdot B)})\big) \right]$$
This, along with Eq. (7), completes the proof of Claim 6.1.

Let $a \in [T]^k$. We now show how to bound $\Delta_a$. Let $E_a$ denote the event that $z^{\le(a-1)} \notin [-1/2,1/2]^{2kN}$. We show that $E_a$ is a low-probability event. Recall that for $j \in [k]$, $i \in [2N]$, the $(j,i)$-th coordinate of $z^{\le(a-1)}$ is distributed according to $\mathcal{N}(0,\, p^2(a_j-1)\epsilon)$, where $p^2 a_j \le 1$ and $\epsilon = \frac{1}{60k^2\ln N}$. This implies that for every $i \in [2kN]$,
$$\Pr\big[ z^{\le(a-1)}(i) \notin [-1/2,1/2] \big] \le \Pr\big[ |\mathcal{N}(0,\epsilon)| \ge 1/2 \big] \le \exp(-1/(8\epsilon)) \le \exp(-7.5k^2\ln N) \le \frac{1}{N^{7k^2}}$$
Applying a union bound over the coordinates $i \in [2kN]$, we have that for each $a \in [T]^k$,
$$\Pr[E_a] := \Pr\big[ z^{\le(a-1)} \notin [-1/2,1/2]^{2kN} \big] \le 2kN\cdot\frac{1}{N^{7k^2}} \le \frac{2k}{N^{6k^2}} \qquad (9)$$

Definition 6.2.
For $a \in [T]^k$, let
$$\Delta_{\neg E_a} := \frac{1}{2^{k-1}} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \mid \neg E_a \right] \qquad \Delta_{E_a} := \frac{1}{2^{k-1}} \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \mid E_a \right]$$
We bound $\Delta_{\neg E_a}$ as follows. Fix any $z_0 := z^{\le(a-1)}$ such that $E_a$ does not occur. Let $S \subseteq [k]$. Note that, by definition, for every fixed $z_0$, the random variable $z^{\le(a-\bar S)}$ is distributed according to $z_0 + p\cdot\mathcal{G}^S O^{\bar S}$. We now apply Corollary 5.2 to the polynomial $H$ with parameter $p$ and $z_0 = z^{\le(a-1)}$. The conditions of Corollary 5.2 are satisfied, since $z_0 \in [-1/2,1/2]^{2kN}$, $p = \frac{1}{4^k N^{(k+1)/2}} \le \frac{1}{4N^{1/4}}$, and for every $S \subseteq [k]$, the random variable $z^{\le(a-\bar S)}$ is distributed according to $z_0 + p\cdot\mathcal{G}^S O^{\bar S}$. Corollary 5.2 implies that
$$\frac{1}{2^{k-1}} \left| \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ H(z^{\le(a-\bar S)}) \mid \neg E_a \right] \right| \le O\!\left( 4^{-k}\cdot L_{2k}(\mathcal{H})\cdot\frac{(2p)^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} \right) \qquad (10)$$
Fix any $S \subseteq [k]$. Let $P \in \{0,p\}^{2kN}$ be such that for all $i \in [2N]$ and $j \in [k]$, we have $P_{j,i} = p$ if and only if $j \in S$. Using this notation, observe that for every fixed $z_0$, the random variable $z^{\le(a-\bar S)}$ is distributed according to $z_0 + P\cdot\mathcal{G}^k$. We now apply Claim 2.9 to the multilinear polynomial $H$ with $z_0 = z^{\le(a-1)}$ and $P$ as defined above. The conditions of this claim are satisfied since $z_0 \in [-1/2,1/2]^{2kN}$ (as $E_a$ does not occur), $p \le \frac12$, hence $P \in [-p,p]^{2kN} \subseteq [-\frac12,\frac12]^{2kN}$, and $H$ maps $\{-1,1\}^{2kN}$ into $[-1,1]$. Since $z^{\le(a-\bar S)}$ is distributed according to $z_0 + P\cdot\mathcal{G}^k$, Claim 2.9 implies that for all $S \subseteq [k]$,
$$\mathbb{E}_Z\!\left[ \left| H(z^{\le(a-\bar S)}) - H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \right| \;\middle|\; \neg E_a \right] \le O\!\left(\frac{1}{N^{4k^2}}\right)$$
This inequality, along with the triangle inequality, implies that
$$\frac{1}{2^{k-1}} \left| \sum_{S\subseteq[k]} (-1)^{|S|}\, \mathbb{E}_Z\!\left[ \Big( H(z^{\le(a-\bar S)}) - H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \Big) \mid \neg E_a \right] \right| \le O\!\left(\frac{1}{N^{4k^2}}\right) \qquad (11)$$
Combining Eq. (10) and Eq. (11) and applying the triangle inequality, we have
$$|\Delta_{\neg E_a}| := \frac{1}{2^{k-1}} \left| \mathbb{E}_Z\!\left[ \sum_{S\subseteq[k]} (-1)^{|S|}\, H\big(\mathsf{trnc}(z^{\le(a-\bar S)})\big) \mid \neg E_a \right] \right| \le O\!\left( 4^{-k}\cdot L_{2k}(\mathcal{H})\cdot\frac{(2p)^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} + \frac{1}{N^{4k^2}} \right) \qquad (12)$$
We now bound $\Delta_{E_a}$. For all $a \in [T]^k$ and $S \subseteq [k]$, since $\mathsf{trnc}(z^{\le(a-\bar S)}) \in [-1,1]^{2kN}$ and $H$ maps $[-1,1]^{2kN}$ to $[-1,1]$, we have $H(\mathsf{trnc}(z^{\le(a-\bar S)})) \in [-1,1]$. This, along with the definition of $\Delta_{E_a}$ as in Definition 6.2, implies that $|\Delta_{E_a}| \le 2$. By the definition of $\Delta_a$ and Definition 6.2, we have
$$|\Delta_a| \le \Pr[E_a]\cdot|\Delta_{E_a}| + \Pr[\neg E_a]\cdot|\Delta_{\neg E_a}| \le \Pr[E_a]\cdot|\Delta_{E_a}| + |\Delta_{\neg E_a}|$$
Using Eq. (9) and Eq. (12), along with the inequality $|\Delta_{E_a}| \le 2$, we have
$$|\Delta_a| \le O\!\left( \frac{k}{N^{6k^2}} + 4^{-k}\cdot L_{2k}(\mathcal{H})\cdot\frac{(2p)^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} + \frac{1}{N^{4k^2}} \right) = O\!\left( L_{2k}(\mathcal{H})\cdot\frac{p^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} + \frac{k}{N^{4k^2}} \right) \qquad (13)$$
This establishes a bound on $\Delta_a$. Using Claim 6.1 and the triangle inequality, we have $|\Delta| \le \sum_{a\in[T]^k} |\Delta_a|$. Substituting the bound from Eq. (13) for $\Delta_a$, we have
$$|\Delta| \le \sum_{a\in[T]^k} O\!\left( L_{2k}(\mathcal{H})\cdot\frac{p^{2k}}{N^{k/2}} + (2p)^{2(k+1)} N^{(k+1)/2} + \frac{k}{N^{4k^2}} \right) \le O\!\left( T^k\cdot L_{2k}(\mathcal{H})\cdot\frac{p^{2k}}{N^{k/2}} + T^k\cdot(2p)^{2(k+1)} N^{(k+1)/2} + T^k\cdot\frac{k}{N^{4k^2}} \right)$$
By our choice of $T = 16^k N^{k+1}$ and $p = \frac{1}{\sqrt{T}}$, we have $T^k p^{2k} = 1$, and hence
$$|\Delta| \le O\!\left( \frac{L_{2k}(\mathcal{H})}{N^{k/2}} + \frac{4^{k+1} N^{(k+1)/2}}{T} + 16^{k^2} N^{k(k+1)}\cdot\frac{k}{N^{4k^2}} \right) \le O\!\left( \frac{L_{2k}(\mathcal{H})}{N^{k/2}} + \frac{4^{1-k}}{N^{(k+1)/2}} + \frac{k\cdot 16^{k^2}}{N^{3k^2-k}} \right)$$
A small calculation then shows that
$$|\Delta| \le O\!\left( \frac{L_{2k}(\mathcal{H})}{N^{k/2}} \right) + o\!\left( \frac{1}{N^{k/2}} \right)$$
This completes the proof of Theorem 3.1.

Proof of Corollary 3.3
Corollary 3.3 essentially follows from the fact that functions in $\mathcal{H}$ are bounded over $\{-1,1\}^{2kN}$ and the fact that for $i \in \{0,1\}$ the distributions $\sigma^{(k)}_i$ and $\tilde{\mu}^{(k)}_i$ are nearly identical. Let $H \in \mathcal{H}$. Define distributions $\pi^{(k)}_0$ (respectively $\pi^{(k)}_1$) obtained by conditioning $\tilde{\mu}^{(k)}_0$ on $F^{(k)}(z) = -1$ (respectively $\tilde{\mu}^{(k)}_1$ on $F^{(k)}(z) = +1$). Lemma 2.11 implies that for some $\delta_0, \delta_1 = O\!\left(\frac{k}{N^{2k}}\right)$, we have $\tilde{\mu}^{(k)}_0 = (1-\delta_0)\sigma^{(k)}_0 + \delta_0\pi^{(k)}_0$ and $\tilde{\mu}^{(k)}_1 = (1-\delta_1)\sigma^{(k)}_1 + \delta_1\pi^{(k)}_1$. Thus, for $i \in \{0,1\}$, we have
$$\mathbb{E}_{z\sim\tilde{\mu}^{(k)}_i}[H(z)] = (1-\delta_i)\,\mathbb{E}_{z\sim\sigma^{(k)}_i}[H(z)] + \delta_i\,\mathbb{E}_{z\sim\pi^{(k)}_i}[H(z)]$$
Let $\delta = \max(\delta_0,\delta_1) = O\!\left(\frac{k}{N^{2k}}\right)$. Since $H$ maps $\{-1,1\}^{2kN}$ to $[-1,1]$, we can bound $|\mathbb{E}_{z\sim\sigma^{(k)}_i}[H(z)]|$ and $|\mathbb{E}_{z\sim\pi^{(k)}_i}[H(z)]|$ by 1. We subtract the equation for $i=1$ from that for $i=0$ and apply the triangle inequality to obtain
$$\left| \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_1}[H(z)] \right| \ge \left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[H(z)] \right| - 4\delta$$
Rearranging this, we have
$$(*) := \left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[H(z)] \right| \le O\!\left( \left| \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\tilde{\mu}^{(k)}_1}[H(z)] \right| + \delta \right)$$
We use Theorem 3.1 to bound the first term in the R.H.S. Furthermore, we use the fact that $\delta = O\!\left(\frac{k}{N^{2k}}\right) = o\!\left(\frac{1}{N^{k/2}}\right)$ to obtain that $(*) \le O\!\left(\frac{L_{2k}(\mathcal{H})}{N^{k/2}}\right) + o\!\left(\frac{1}{N^{k/2}}\right)$. This completes the proof of Corollary 3.3.

Quantum Upper Bound
The quantum query algorithm for $F^{(k)}$ is derived from [A10, AA15]. These papers provide a quantum query algorithm $Q(z)$ which makes one quantum query to the input $z \in \{-1,1\}^{2N}$ and returns a (probabilistic) $b \in \{0,1\}$, with the property that $\Pr[b=1] = \frac{1+\mathsf{forr}(z)}{2}$. Given input $z = (z_1,\dots,z_k)$ where $z_1,\dots,z_k \in \{-1,1\}^{2N}$, we are promised that for each $j \in [k]$, either $\mathsf{forr}(z_j) \ge \epsilon/2$ or $\mathsf{forr}(z_j) \le \epsilon/4$. This implies that for all $j \in [k]$, the probability that $Q(z_j)$ returns 1 is either at least $\frac12 + \frac{\epsilon}{4}$ or at most $\frac12 + \frac{\epsilon}{8}$. By repeating the algorithm $O\!\left(\frac{\log k}{\epsilon^2}\right)$ times and taking a threshold, we can produce an algorithm that, for each $j \in [k]$, distinguishes between $F(z_j) = 1$ and $F(z_j) = -1$ with probability at least $1 - \frac{1}{10k}$. By a union bound over $j \in [k]$, with probability at least $9/10$, this algorithm computes $F(z_j)$ for all $j \in [k]$. In particular, it can compute $F^{(k)}(z) = \prod_{j=1}^k F(z_j)$ with probability at least $9/10$. Observe that the number of queries made by this algorithm is $k \times O\!\left(\frac{\log k}{\epsilon^2}\right) = O\big(k^5\log k\log^2 N\big)$.

It follows that the above algorithm can distinguish the distributions $\sigma^{(k)}_0$ and $\sigma^{(k)}_1$ with at least $9/10$ advantage. A variant of this algorithm can be used to establish the quantum communication protocol in Theorem 3.8. This step is identical to Theorem 3.3 from [GRT19], so we omit it. We now prove the classical lower bounds.

Query Complexity Separations
Proof of Theorem 3.5.
Let $d = o\!\left(\frac{\sqrt{N}}{\log^2 N}\right)$. Note that then $\frac{d\log(kN)}{\sqrt{N}} = o(1)$. Lemma 3.4 implies that every decision tree of depth at most $d$ can distinguish $\sigma^{(k)}_0$ and $\sigma^{(k)}_1$ with advantage at most $\left(\frac{O(d\log(kN))}{N^{1/2}}\right)^k \le \exp(-\Omega(k))$. Note that $\sigma^{(k)}_1$ and $\sigma^{(k)}_0$ are distributions on the yes and no instances of $F^{(k)}$, respectively. This implies that every randomized decision tree of depth $d = \tilde{o}(\sqrt{N})$ can solve $F^{(k)}$ with at most $\exp(-\Omega(k))$ advantage.

Proof of Lemma 3.4.
Let $\mathcal{H}$ denote the set of Boolean functions on $2kN$ variables that are computed by deterministic decision trees of depth at most $d$. $\mathcal{H}$ is clearly closed under restrictions. We use the following lemma due to [Tal19], which bounds the level-$2k$ mass of $\mathcal{H}$.

Lemma 7.1 ([Tal19]). For all $k \in \mathbb{N}$, we have $L_{2k}(\mathcal{H}) \le \left(O\!\left(\sqrt{d\log(kN)}\right)\right)^{2k}$.

The above bound, along with Corollary 3.3, implies that for all $H \in \mathcal{H}$,
$$\left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[H(z)] \right| \le \left(\frac{O(d\log(kN))}{N^{1/2}}\right)^k + o\!\left(\frac{1}{N^{k/2}}\right) = \left(\frac{O(d\log(kN))}{N^{1/2}}\right)^k$$
This completes the proof of Lemma 3.4.
Proof of Theorem 3.10.
Let $C$ be an AC0 circuit of depth $d$ and size $s = o\!\left(\exp\!\left(N^{\frac{1}{4(d-1)}}\right)\right)$. Note that then $O(\log^{2(d-1)}(s)) = o(\sqrt{N})$. This, along with Lemma 3.9, implies that
$$\left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[C(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[C(z)] \right| \le \left(\frac{O(\log^{2(d-1)}(s))}{N^{1/2}}\right)^k \le \exp(-\Omega(k))$$
Thus, we have produced distributions on the yes and no instances of $F^{(k)}$ such that every depth-$d$ AC0 circuit of size $o\!\left(\exp\!\left(N^{\frac{1}{4(d-1)}}\right)\right)$ can distinguish them with at most $\exp(-\Omega(k))$ advantage. This completes the proof of Theorem 3.10.

Proof of Lemma 3.9.
Let $\mathcal{H}$ denote the set of Boolean functions that are computed by AC0 circuits of depth at most $d$ and size at most $s$. Note that $\mathcal{H}$ is clearly closed under restrictions. We use the following lemma due to [Tal19], which bounds the level-$2k$ mass of $\mathcal{H}$.

Lemma 7.2 ([Tal19]). For all $k \in \mathbb{N}$, we have $L_{2k}(\mathcal{H}) \le \left(O\!\left(\log^{d-1}(s)\right)\right)^{2k}$.

The above bound, along with Corollary 3.3, implies that for all $H \in \mathcal{H}$,
$$\left| \mathbb{E}_{z\sim\sigma^{(k)}_0}[H(z)] - \mathbb{E}_{z\sim\sigma^{(k)}_1}[H(z)] \right| \le \frac{\left(O\!\left(\log^{d-1}(s)\right)\right)^{2k}}{N^{k/2}} + o\!\left(\frac{1}{N^{k/2}}\right) = \left(\frac{O(\log^{2(d-1)}(s))}{N^{1/2}}\right)^k$$
This completes the proof of Lemma 3.9.

Applications to Communication Complexity Separations
Proof of Theorem 3.8.
Let $c = \tilde{o}(N^{1/4})$. Note that for $k = \tilde{o}(N^{1/4})$, we have $\frac{(c+8k)^2}{\sqrt{N}} = o(1)$. For $i \in \{0,1\}$, let $\pi^{(k)}_i$ denote the distribution of $(x, x\cdot z)$ where $x \sim U_{2kN}$ and $z \sim \sigma^{(k)}_i$. Note that $\pi^{(k)}_0$ and $\pi^{(k)}_1$ are distributions on the no and yes instances of $F^{(k)}\circ\mathsf{xor}$, respectively. Lemma 3.7 implies that every deterministic protocol of cost at most $c$ for $F^{(k)}\circ\mathsf{xor}$ can distinguish $\pi^{(k)}_0$ and $\pi^{(k)}_1$ with at most $O\!\left(\frac{(c+8k)^{2k}}{N^{k/2}}\right) \le \exp(-\Omega(k))$ advantage. This implies that no randomized protocol of cost $\tilde{o}(N^{1/4})$ solves $F^{(k)}\circ\mathsf{xor}$ with more than $\exp(-\Omega(k))$ advantage. This completes the proof of Theorem 3.8.

To prove Lemma 3.7, the idea is to apply Corollary 3.3 to the function family defined by $\mathbb{E}_{x\sim U_{2kN}}[C(x, x\cdot z)]$, where $C$ is a small-cost protocol. However, to prove a suitable upper bound on the level-$2k$ mass, we require that each rectangle in the protocol is small. To handle this, we define an extended protocol $\mathsf{ext}_l(C)$, in which the players reveal $l$ additional junk bits and then proceed with the original protocol $C$. This modification is only a technicality and the rest of the arguments are similar to the ones in [GRT19].

Definition 7.3.
Let $C : \{-1,1\}^M\times\{-1,1\}^M \to \{-1,1\}$ be any deterministic protocol and $l \in \mathbb{N}$. The extension $\mathsf{ext}_l(C) : \{-1,1\}^{M+l}\times\{-1,1\}^{M+l} \to \{-1,1\}$ is a protocol in which Alice and Bob declare the last $l$ bits of their inputs and then follow $C$ on the first $M$ bits of their inputs.

Definition 7.4.
For any protocol $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$, let $H_C : \{-1,1\}^M \to \mathbb{R}$ be defined at every $z \in \{-1,1\}^M$ by $H_C(z) := \mathbb{E}_{x \sim \mathcal{U}_M}[C(x, x \cdot z)]$. For any distribution $\mathcal{C}$ over protocols $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$, let $H_{\mathcal{C}}$ be defined at every $z \in \{-1,1\}^M$ by $H_{\mathcal{C}}(z) := \mathbb{E}_{C \sim \mathcal{C}}[H_C(z)]$.

Lemma 7.5.
Let $l, M \in \mathbb{N}$. Let $\mathcal{H}$ be the family of functions $H$ obtained as follows. Let $\mathcal{C}$ be an arbitrary distribution over deterministic protocols $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$ of cost at most $c$. Let $H_{\mathrm{ext}_l(\mathcal{C})}$ be as in Definition 7.4 and Definition 7.3, and let $H : \{-1,1\}^M \to \mathbb{R}$ be defined at every $z \in \{-1,1\}^M$ by $H(z) := \mathbb{E}_{z' \sim \mathcal{U}_l}[H_{\mathrm{ext}_l(\mathcal{C})}(z, z')]$. Then, $\mathcal{H}$ is closed under restrictions. The proof of this is a simple unravelling of definitions and is deferred to the appendix.
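The closure under restrictions asserted in Lemma 7.5 ultimately rests on an identity between restrictions and the xor-lift, proved in Appendix B (Claim B.1): $\rho_{a \cdot v}(x \cdot z) = \rho_a(x) \cdot \rho_v(z)$ whenever $a$ is supported exactly on the coordinates where $v$ is non-zero. A quick numerical sanity check (assuming the standard convention that a restriction $\rho_v$ fixes the coordinates $j$ with $v(j) \neq 0$ to $v(j)$ and leaves the rest free):

```python
import random

def rho(v, z):
    # Restriction (convention assumed here, matching the proof of Claim B.1):
    # coordinates j with v(j) != 0 are fixed to v(j); the rest stay free (taken from z).
    return tuple(vj if vj != 0 else zj for vj, zj in zip(v, z))

rng = random.Random(7)
M, mismatches = 5, 0
for _ in range(1000):
    v = tuple(rng.choice((-1, 0, 1)) for _ in range(M))
    # a is supported exactly where v is: a_j in {-1,1} if v_j != 0, else a_j = 0
    a = tuple(rng.choice((-1, 1)) if vj != 0 else 0 for vj in v)
    x = tuple(rng.choice((-1, 1)) for _ in range(M))
    z = tuple(rng.choice((-1, 1)) for _ in range(M))
    xz = tuple(xi * zi for xi, zi in zip(x, z))
    av = tuple(ai * vi for ai, vi in zip(a, v))
    # Check the identity rho_{a.v}(x.z) == rho_a(x) . rho_v(z) coordinate-wise
    if rho(av, xz) != tuple(p * q for p, q in zip(rho(a, x), rho(v, z))):
        mismatches += 1
```

On every random trial the two sides agree, which is exactly the combinatorial fact used to push a restriction of $H$ back into the family $\mathcal{H}$.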
Lemma 7.6.
Let $l = \lceil 2k \log e \rceil$. Let $\mathcal{H}$ be the family as in Lemma 7.5. Then, $L_{2k}(\mathcal{H}) \le O\left(\left(\frac{e}{k}\right)^{2k} \cdot (c + 2l)^{2k}\right)$. The proof of this is similar to that of Claim 1 in [GRT19] and is deferred to the appendix.
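For intuition, the level mass $L_t(\cdot)$ bounded in Lemma 7.6 can be computed by brute force on small examples. The sketch below (with an arbitrary illustrative function; it does not reproduce the constants of Lemma 7.6) evaluates $\hat{f}(S)$ and $L_t(f)$ directly from the definitions:

```python
from itertools import combinations, product

def fourier_coefficient(f, S, n):
    # hat{f}(S) = E_{x ~ U_n} [ f(x) * prod_{i in S} x_i ]
    total = 0.0
    for x in product((-1, 1), repeat=n):
        chi = 1
        for i in S:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

def level_mass(f, n, t):
    # L_t(f): sum of |hat{f}(S)| over all sets S with |S| = t
    return sum(abs(fourier_coefficient(f, S, n)) for S in combinations(range(n), t))

# Example: f is the +/-1 indicator of the all-ones point on n = 4 bits;
# its Fourier coefficients are hat{f}(emptyset) = -7/8 and hat{f}(S) = 1/8 otherwise.
n = 4
f = lambda x: 1 if all(b == 1 for b in x) else -1
masses = [level_mass(f, n, t) for t in range(n + 1)]  # [7/8, 1/2, 3/4, 1/2, 1/8]
```

The brute force is exponential in $n$, of course; the point of Lemma 7.6 is precisely to bound this quantity for protocol-induced functions without enumeration.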
Proof of Lemma 3.7.
Let $l = \lceil 2k \log e \rceil$. Let $\mathcal{H}$ be the family of functions as in Lemma 7.5. Lemma 7.5 implies that the family $\mathcal{H}$ is closed under restrictions. We now apply Corollary 3.3 to $\mathcal{H}$ to obtain that for all $H \in \mathcal{H}$,
$$\left| \mathbb{E}_{z \sim \sigma_0^{(k)}}[H(z)] - \mathbb{E}_{z \sim \sigma_1^{(k)}}[H(z)] \right| \le O\left(\frac{L_{2k}(\mathcal{H})}{N^{k/2}}\right) + o\left(\frac{1}{N^{k/2}}\right)$$
We use Lemma 7.6, which upper bounds $L_{2k}(\mathcal{H})$. This, along with the previous inequality and the fact that $l = \lceil 2k \log e \rceil$, implies that
$$\left| \mathbb{E}_{z \sim \sigma_0^{(k)}}[H(z)] - \mathbb{E}_{z \sim \sigma_1^{(k)}}[H(z)] \right| \le O\left(\frac{e^{2k}(c+2l)^{2k}}{k^{2k}\, N^{k/2}}\right) + o\left(\frac{1}{N^{k/2}}\right) = O\left(\frac{(c+8k)^{2k}}{N^{k/2}}\right) + o\left(\frac{1}{N^{k/2}}\right) \quad (14)$$
Let $C$ refer to the given protocol of cost at most $c$. Let $H : \{-1,1\}^{2kN} \to [-1,1]$ be defined at $z \in \{-1,1\}^{2kN}$ by $H(z) := \mathbb{E}_{z' \sim \mathcal{U}_l}[H_{\mathrm{ext}_l(C)}(z, z')]$. By Definition 7.3, for all $x, z \in \{-1,1\}^{2kN}$ and $x', z' \in \{-1,1\}^l$, we have that $C(x, x \cdot z) = \mathrm{ext}_l(C)((x, x'), (x \cdot z, x' \cdot z'))$. This implies that for all $z \in \{-1,1\}^{2kN}$, we have
$$H(z) := \mathbb{E}_{z' \sim \mathcal{U}_l}[H_{\mathrm{ext}_l(C)}(z, z')] = \mathbb{E}_{x \sim \mathcal{U}_{2kN},\, x', z' \sim \mathcal{U}_l}\left[\mathrm{ext}_l(C)((x, x'), (x \cdot z, x' \cdot z'))\right] \quad \ldots \text{ due to Definition 7.4}$$
$$= \mathbb{E}_{x \sim \mathcal{U}_{2kN}}[C(x, x \cdot z)] \quad \ldots \text{ due to Definition 7.3}$$
This, along with Eq. (14), implies that
$$\left| \mathbb{E}_{x \sim \mathcal{U}_{2kN},\, z \sim \sigma_0^{(k)}}[C(x, x \cdot z)] - \mathbb{E}_{x \sim \mathcal{U}_{2kN},\, z \sim \sigma_1^{(k)}}[C(x, x \cdot z)] \right| \le O\left(\frac{(c+8k)^{2k}}{N^{k/2}}\right) + o\left(\frac{1}{N^{k/2}}\right) = O\left(\frac{(c+8k)^{2k}}{N^{k/2}}\right)$$
This completes the proof of Lemma 3.7.
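For intuition about the objects manipulated in this proof: the xor-lifted function $H_C(z) = \mathbb{E}_{x}[C(x, x \cdot z)]$ of Definition 7.4 can be evaluated by brute force for toy protocols. The protocol below is hypothetical, chosen only for illustration:

```python
from itertools import product

def H(C, z):
    # H_C(z) = E_{x ~ U_M} [ C(x, x . z) ], where x . z is the coordinate-wise product
    M = len(z)
    vals = [C(x, tuple(a * b for a, b in zip(x, z))) for x in product((-1, 1), repeat=M)]
    return sum(vals) / len(vals)

# A hypothetical cost-2 toy protocol: Alice sends x[0], Bob replies with x[0] * y[0].
C = lambda x, y: x[0] * y[0]
# On inputs (x, x . z) we get C(x, x . z) = x[0]^2 * z[0] = z[0], hence H_C(z) = z[0].
```

This tiny example already shows the mechanism the proof exploits: a low-cost protocol run on the correlated inputs $(x, x \cdot z)$ collapses to a simple function of $z$ alone, whose Fourier mass can then be bounded.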
Acknowledgement
We would like to thank Avishay Tal for very helpful conversations.
References

[A10] Scott Aaronson: BQP and the Polynomial Hierarchy. STOC 2010: 141-150

[AA15] Scott Aaronson and Andris Ambainis: Forrelation: A Problem That Optimally Separates Quantum from Classical Computing. STOC 2015: 307-316

[BCW98] Harry Buhrman, Richard Cleve, Avi Wigderson: Quantum vs. Classical Communication and Computation. STOC 1998: 63-68

[BJK04] Ziv Bar-Yossef, T. S. Jayram, Iordanis Kerenidis: Exponential Separation of Quantum and Classical One-Way Communication Complexity. SIAM J. Comput. 38(1): 366-384 (2008)

[CFK+19] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, Toniann Pitassi: Query-To-Communication Lifting for BPP Using Inner Product. ICALP 2019: 35:1-35:15

[CHHL18] Eshan Chattopadhyay, Pooya Hatami, Kaave Hosseini, Shachar Lovett: Pseudorandom Generators from Polarizing Random Walks. CCC 2018: 1:1-1:21

[CHLT19] Eshan Chattopadhyay, Pooya Hatami, Shachar Lovett, Avishay Tal: Pseudorandom Generators from the Second Fourier Level and Applications to AC0 with Parity Gates. ITCS 2019: 22:1-22:15

[D12] Andrew Drucker: Improved Direct Product Theorems for Randomized Query Complexity. Computational Complexity 21(2): 197-244 (2012)

[G16] Dmitry Gavinsky: Entangled Simultaneity versus Classical Interactivity in Communication Complexity. STOC 2016: 877-884

[GKK+08] Dmitry Gavinsky, Julia Kempe, Iordanis Kerenidis, Ran Raz, Ronald de Wolf: Exponential Separation for One-Way Quantum Communication Complexity, with Applications to Cryptography. SIAM J. Comput. 38(5): 1695-1708 (2008)

[GRT19] Uma Girish, Ran Raz, Avishay Tal: Quantum versus Randomized Communication Complexity, with Efficient Players. CoRR abs/1911.02218 (2019)

[KR11] Oded Regev, Boaz Klartag: Quantum One-Way Communication can be Exponentially Stronger than Classical Communication. STOC 2011: 31-40

[O'D14] Ryan O'Donnell: Analysis of Boolean Functions. Cambridge University Press 2014, ISBN 978-1-10-703832-5, pp. I-XX, 1-423

[R99] Ran Raz: Exponential Separation of Quantum and Classical Communication Complexity. STOC 1999: 358-367

[R95] Ran Raz: Fourier Analysis for Probabilistic Communication Complexity. Comput. Complex. 5(3/4): 205-221 (1995)

[RT19] Ran Raz and Avishay Tal: Oracle separation of BQP and PH. STOC 2019: 13-23

[Tal17] Avishay Tal: Tight Bounds on the Fourier Spectrum of AC0. Computational Complexity Conference 2017: 15:1-15:31

[Tal19] Avishay Tal: Towards Optimal Separations between Quantum and Randomized Query Complexities. CoRR abs/1912.12561 (2019)

[UCB] Example 2.1 from [UCB] Example 2.5 from
A Output of $F^{(k)}$ on Distributions $\tilde{\mu}_0^{(k)}$ and $\tilde{\mu}_1^{(k)}$

We use the following claims to prove Lemma 2.11.
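The claims below concern the quantity $\mathrm{forr}(z) = \frac{1}{N}\langle x, H_N y \rangle$ for $z = (x, y)$, and the Gaussian distribution $\mathcal{G}$ in which $x \sim \mathcal{N}(0, \epsilon I_N)$ and $y = H_N x$ (as recalled in the proof of Claim A.1). A small numerical sketch of this setup, with arbitrary illustrative parameters ($N = 64$, $\epsilon = 0.05$, a fixed seed) in place of the paper's $\epsilon = 1/(60 k \ln N)$:

```python
import random

def hadamard(n):
    # Sylvester construction of the N x N Hadamard matrix, N = 2^n, normalized so
    # that its entries are +-1/sqrt(N); it is symmetric and satisfies H H = I.
    H = [[1.0]]
    for _ in range(n):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    s = len(H) ** 0.5
    return [[v / s for v in row] for row in H]

def forr(x, y, H):
    # forr(z) = (1/N) <x, H_N y> for z = (x, y)
    N = len(x)
    Hy = [sum(H[i][j] * y[j] for j in range(N)) for i in range(N)]
    return sum(x[i] * Hy[i] for i in range(N)) / N

random.seed(0)
n, eps = 6, 0.05
N = 2 ** n
H = hadamard(n)
x = [random.gauss(0.0, eps ** 0.5) for _ in range(N)]          # coordinates ~ N(0, eps)
y = [sum(H[i][j] * x[j] for j in range(N)) for i in range(N)]  # y = H_N x
val = forr(x, y, H)
norm_sq = sum(v * v for v in x) / N
# Since H_N^2 = I, forr(x, y) = ||x||^2 / N: a chi-squared average concentrating at eps.
```

The final comment is exactly the computation in the proof of Claim A.1: on samples from $\mathcal{G}$, $\mathrm{forr}$ is an average of $N$ squared Gaussians and so concentrates sharply around $\epsilon$.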
Claim A.1.
Let $z \sim \mathcal{G}$, where $\mathcal{G}$ is the distribution in Definition 2.2. Then, $\Pr_{z \sim \mathcal{G}}\left[\mathrm{forr}(z) \le \epsilon/2\right] \le e^{-\Omega(N)}$.

Claim A.2.
Let $z \in [-1/2, 1/2]^{2N}$ and let $\tilde{z}$ be the random variable obtained by rounding $z$ as in Definition 2.6. Then, $\Pr\left[|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4\right] \le e^{-\Omega(N^{1/4})}$.

Corollary A.3. Let $\mathcal{U}$ be the uniform distribution on $\{-1,1\}^{2N}$ and $\tilde{\mathcal{G}}$ be the distribution on $\{-1,1\}^{2N}$ as in Definition 2.10. Then,
$$\Pr_{z \sim \mathcal{U}}\left[\mathrm{forr}(z) \le \epsilon/4\right] \ge 1 - e^{-\Omega(N^{1/4})} \quad \text{and} \quad \Pr_{z \sim \tilde{\mathcal{G}}}\left[\mathrm{forr}(z) \ge \epsilon/4\right] \ge 1 - O\left(\frac{1}{N^{4k}}\right)$$

Proof of Lemma 2.11 from Corollary A.3.
This follows from a simple union bound. Let $S \subseteq [k]$. Let $z \sim \tilde{\mathcal{G}}_S\, \mathcal{U}_{\bar{S}}$ and $z = (z_1, \ldots, z_k)$ for $z_1, \ldots, z_k \in \{-1,1\}^{2N}$. For $j \in S$, we have $z_j \sim \tilde{\mathcal{G}}$ and consequently, Corollary A.3 implies that with probability at least $1 - O\left(\frac{1}{N^{4k}}\right)$, $F(z_j) = -1$. For $j \notin S$, we have $z_j \sim \mathcal{U}$ and consequently, Corollary A.3 implies that with probability at least $1 - e^{-\Omega(N^{1/4})} \ge 1 - O\left(\frac{1}{N^{4k}}\right)$, $F(z_j) = 1$. A union bound over $j \in [k]$ implies that with probability at least $1 - O\left(\frac{k}{N^{4k}}\right)$, all these events occur, that is, $z$ is in the support of $F^{(k)}$ and $F^{(k)}(z) := \prod_{j=1}^k F(z_j) = (-1)^{|S|}$. Since $\mu_0^{(k)}$ (respectively $\mu_1^{(k)}$) is a mixture of distributions $\tilde{\mathcal{G}}_S\, \mathcal{U}_{\bar{S}}$ where $|S|$ is even (respectively $|S|$ is odd), it follows that with probability at least $1 - O\left(\frac{k}{N^{4k}}\right)$, $F^{(k)}(z) = 1$ (respectively $F^{(k)}(z) = -1$).

Proof of Corollary A.3 from Claim A.1 and Claim A.2.
We set $z$ to be the zero vector in $\mathbb{R}^{2N}$ and apply Claim A.2. Since the distribution obtained by rounding $z$ is $\mathcal{U}_{2N}$ and $\mathrm{forr}(z) = 0$, we have
$$\Pr_{\tilde{z} \sim \mathcal{U}_{2N}}\left[\mathrm{forr}(\tilde{z}) \ge \epsilon/4\right] = \Pr\left[|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4\right] \le e^{-\Omega(N^{1/4})}$$
This proves the first part of Corollary A.3. To prove the second part, let $z \sim \mathcal{G}$. Let $E$ denote the event that $z \notin [-1/2, 1/2]^{2N}$. We first show that $E$ is a low-probability event. Recall that each coordinate of $z$ is distributed as $\mathcal{N}(0, \epsilon)$ where $\epsilon = 1/(60 k \ln N)$. This, along with a union bound over the coordinates $i \in [2N]$, implies that
$$\Pr[E] \le 2N \cdot \Pr\left[z(i) \notin [-1/2, 1/2]\right] \le 2N \cdot \Pr\left[|\mathcal{N}(0, \epsilon)| \ge 1/2\right] \le 2N \exp\left(-\frac{1}{8\epsilon}\right) \le 2N \cdot \exp(-7k \ln N) = \frac{2N}{N^{7k}} \quad (15)$$
Let $\tilde{z}$ be obtained by rounding $z$ as in Definition 2.6. If $\mathrm{forr}(\tilde{z}) \le \epsilon/4$, then we must either have $\mathrm{forr}(z) \le \epsilon/2$ or $|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4$. For the latter event, we split it into cases conditioned on whether $E$ occurs or not. A union bound implies that
$$\Pr_{z \sim \mathcal{G},\, \tilde{z}}\left[\mathrm{forr}(\tilde{z}) \le \epsilon/4\right] \le \Pr_{z \sim \mathcal{G}}\left[\mathrm{forr}(z) \le \epsilon/2\right] + \Pr_{z \sim \mathcal{G},\, \tilde{z}}\left[|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4\right]$$
$$\le \Pr_{z \sim \mathcal{G}}\left[\mathrm{forr}(z) \le \epsilon/2\right] + \Pr[E] + \Pr_{z \sim \mathcal{G},\, \tilde{z}}\left[|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4 \mid \neg E\right] \quad (16)$$
Claim A.1 implies that with all but $e^{-\Omega(N)}$ probability, for $z \sim \mathcal{G}$, we have $\mathrm{forr}(z) > \epsilon/2$. Thus, the first term on the R.H.S. of Eq. (16) can be upper bounded by $e^{-\Omega(N)}$. The second term can be bounded by $\frac{2N}{N^{7k}}$ due to Eq. (15). For the third term, note that whenever $E$ does not occur, we can apply Claim A.2 to obtain that
$$\Pr_{\tilde{z}}\left[|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4 \mid \neg E, z\right] \le e^{-\Omega(N^{1/4})}$$
Here we use the fact that $k = o(N^{1/4}/\log N)$. Putting everything together,
$$\Pr_{z \sim \tilde{\mathcal{G}}}\left[\mathrm{forr}(z) \le \epsilon/4\right] = \Pr_{z \sim \mathcal{G},\, \tilde{z}}\left[\mathrm{forr}(\tilde{z}) \le \epsilon/4\right] \le e^{-\Omega(N)} + \frac{2N}{N^{7k}} + e^{-\Omega(N^{1/4})} = O\left(\frac{1}{N^{4k}}\right)$$

Proof of Claim A.1.
This follows from a simple concentration inequality for chi-squared random variables. Note that a random sample $z \sim \mathcal{G}$ is equivalent to a sample $z = (x, y)$, where $x \sim \mathcal{N}(0, \epsilon I_N)$ and $y = H_N x$. This implies that
$$\mathrm{forr}(z) = \frac{1}{N}\langle x, H_N y\rangle = \frac{1}{N}\langle x, H_N^2 x\rangle = \frac{1}{N}\|x\|^2$$
The random variable $\|x\|^2$ has a chi-squared distribution, defined by the sum of squares of $N$ random variables, each of which is distributed according to $\mathcal{N}(0, \epsilon)$. Using the concentration inequality for the chi-squared distribution from the preliminaries, we have that for all $t \in (0, 1)$,
$$\Pr\left[\left|\frac{1}{N}\sum_{i=1}^N x_i^2 - \epsilon\right| \ge t\epsilon\right] \le 2\exp\left(-\Omega(Nt^2)\right)$$
Substituting $t = 1/4$, we obtain $\Pr\left[|\mathrm{forr}(z) - \epsilon| \ge \epsilon/4\right] = \Pr\left[\left|\frac{1}{N}\sum_{i=1}^N x_i^2 - \epsilon\right| \ge \frac{\epsilon}{4}\right] \le e^{-\Omega(N)}$. This implies the desired conclusion in Claim A.1.

Proof of Claim A.2.
We make use of the following concentration inequality. It appears as Theorem 10.24 in Ryan O'Donnell's book on Boolean functions [O'D14], as an application of the general hypercontractivity theorem on product spaces. We state it in the context of biased product distributions on the Boolean hypercube.
Lemma A.4.
Let $\pi_1, \ldots, \pi_M$ be probability distributions on $\{-1,1\}$ such that for every $i \in [M]$, every outcome in $\pi_i$ has probability at least $\lambda$. Let $\Omega = \{-1,1\}^M$ and $\pi = \pi_1 \times \ldots \times \pi_M$. Let $f : \Omega \to \mathbb{R}$ be a function of total degree at most $d$ and let $\|f\|_2 := \sqrt{\mathbb{E}_{x \sim \pi}[f(x)^2]}$ denote the $\ell_2$ norm of $f$. Then, for any $t \ge (2e/\lambda)^{d/2}$, we have
$$\Pr_{x \sim \pi}\left[|f(x)| \ge t\|f\|_2\right] \le \lambda^d \exp\left(-\frac{d}{2e} \cdot \lambda t^{2/d}\right)$$

Note that the distribution of $\tilde{z}$ on $\{-1,1\}^{2N}$ satisfies the hypothesis of Lemma A.4 with $\lambda = 1/4$, because of the assumption that $z \in [-1/2, 1/2]^{2N}$. Claim A.2 essentially follows by considering the degree-2 function $\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)$, bounding its $\ell_2$ norm and applying Lemma A.4. However, to simplify the calculation, we instead consider $f : \mathbb{R}^{2N} \to \mathbb{R}$ defined by
$$f(\tilde{z}) := \mathrm{forr}(\tilde{z} - z) := \frac{1}{N}\left\langle \tilde{x} - x,\; H_N(\tilde{y} - y)\right\rangle$$
where $\tilde{z} = (\tilde{x}, \tilde{y})$ for $\tilde{x}, \tilde{y} \in \mathbb{R}^N$ and $z = (x, y)$ for $x, y \in [-1/2, 1/2]^N$. Note that we have the identity $f(\tilde{z}) := \mathrm{forr}(\tilde{z} - z) = \mathrm{forr}(\tilde{z}) - \mathrm{forr}(\tilde{x}, y) - \mathrm{forr}(x, \tilde{y}) + \mathrm{forr}(z)$. We now show that when $\tilde{z}$ is obtained by rounding $z$, the random variables $f(\tilde{z})$, $\mathrm{forr}(\tilde{x}, y)$ and $\mathrm{forr}(x, \tilde{y})$ are concentrated around their means. From the above identity, it will follow that $\mathrm{forr}(\tilde{z})$ is also concentrated around its mean. We first show a concentration inequality for $f$. Since each coordinate of $(\tilde{x}, \tilde{y})$ is sampled independently so that $\mathbb{E}[(\tilde{x}, \tilde{y})] = (x, y)$, we have
$$\mathbb{E}[f^2] = \frac{1}{N^2} \cdot \mathbb{E}\left[\left(\sum_{i,j \in [N]} (\tilde{x}(i) - x(i))(\tilde{y}(j) - y(j)) \cdot \frac{(-1)^{\langle i,j\rangle}}{\sqrt{N}}\right)^2\right]$$
$$= \frac{1}{N^3} \cdot \sum_{i,j \in [N]} \mathbb{E}\left[(\tilde{x}(i) - x(i))^2 (\tilde{y}(j) - y(j))^2\right] \quad \ldots \text{ since the cross terms are } 0$$
$$\le \frac{1}{N^3} \cdot 16N^2 = \frac{16}{N} \quad \ldots \text{ since } \tilde{x}, x, \tilde{y}, y \in [-1,1]^N$$
Thus, $\|f\|_2 \le 4/\sqrt{N}$. Note that $f$ is of degree 2. We now apply Lemma A.4 to the function $f$ for the distribution of $\tilde{z}$. Let $t$ be a parameter. Since $\lambda = 1/4$ and $d = 2$, we have $(2e/\lambda)^{d/2} = O(1)$ and $\lambda^d \exp\left(-\frac{d}{2e}\lambda t^{2/d}\right) = \exp(-\Omega(t))$. Lemma A.4, along with the above calculation, implies that for all $t \ge O(1)$, we have $\Pr_{\tilde{z}}\left[|f(\tilde{z})| \ge \frac{t}{\sqrt{N}}\right] \le \Pr_{\tilde{z}}\left[|f(\tilde{z})| \ge \Omega(t) \cdot \|f\|_2\right] \le \exp(-\Omega(t))$. We now set $t = \sqrt{N}\epsilon/12 = \frac{\sqrt{N}}{720\, k \ln N}$. This is larger than $N^{1/4}$ for sufficiently large $N$ and $k = o(N^{1/4}/\log N)$. This implies that
$$\Pr_{\tilde{z}}\left[|\mathrm{forr}(\tilde{z} - z)| \ge \epsilon/12\right] = \Pr_{\tilde{z}}\left[|f(\tilde{z})| \ge \epsilon/12\right] \le \exp\left(-\Omega(N^{1/4})\right) \quad (17)$$
We now show a similar concentration inequality for $\mathrm{forr}(\tilde{x}, y)$. Let $g : \mathbb{R}^N \to \mathbb{R}$ be defined at $\tilde{x} \in \mathbb{R}^N$ by $g(\tilde{x}) := \mathrm{forr}(\tilde{x}, y) - \mathrm{forr}(z) = \frac{1}{N}\langle \tilde{x} - x, H_N y\rangle$. Since each coordinate of $\tilde{x}$ is sampled independently so that $\mathbb{E}[\tilde{x}] = x$, we have
$$N^2 \cdot \mathbb{E}[g^2] = \mathbb{E}\left[\left(\sum_{i \in [N]} (\tilde{x}(i) - x(i))(H_N y)(i)\right)^2\right] = \sum_{i \in [N]} \mathbb{E}\left[(\tilde{x}(i) - x(i))^2\right] (H_N y)(i)^2 \quad \ldots \text{ since the cross terms are } 0$$
$$\le \sum_{i \in [N]} 4 \cdot (H_N y)(i)^2 \quad \ldots \text{ since } \tilde{x}, x \in [-1,1]^N$$
$$= 4\|H_N y\|^2 = 4\|y\|^2 \le 4N$$
Thus, $\|g\|_2 \le 2/\sqrt{N}$. We now apply Lemma A.4 to the degree-1 polynomial $g$ for the distribution of $\tilde{x}$ on $\{-1,1\}^N$. Let $t$ be a parameter. Since $\lambda = 1/4$ and $d = 1$, we have $\lambda^d \exp\left(-\frac{d}{2e}\lambda t^{2/d}\right) = \exp(-\Omega(t^2))$ and $(2e/\lambda)^{d/2} = O(1)$. Lemma A.4, along with the above calculation, implies that for all $t \ge O(1)$, we have $\Pr_{\tilde{x}}\left[|g(\tilde{x})| \ge \frac{t}{\sqrt{N}}\right] \le \Pr_{\tilde{x}}\left[|g(\tilde{x})| \ge \Omega(t) \cdot \|g\|_2\right] \le \exp(-\Omega(t))$. We now set $t = \sqrt{N}\epsilon/12 = \frac{\sqrt{N}}{720\, k \ln N}$. This is larger than $N^{1/4}$ for sufficiently large $N$ and $k = o(N^{1/4}/\log N)$. This implies that
$$\Pr_{\tilde{x}}\left[|\mathrm{forr}(\tilde{x}, y) - \mathrm{forr}(z)| \ge \epsilon/12\right] = \Pr_{\tilde{x}}\left[|g(\tilde{x})| \ge \epsilon/12\right] \le \exp\left(-\Omega(N^{1/4})\right) \quad (18)$$
An identical calculation implies that
$$\Pr_{\tilde{y}}\left[|\mathrm{forr}(x, \tilde{y}) - \mathrm{forr}(z)| \ge \epsilon/12\right] \le \exp\left(-\Omega(N^{1/4})\right) \quad (19)$$
Recall that we have the identity $\mathrm{forr}(\tilde{z}) = \mathrm{forr}(\tilde{z} - z) + \mathrm{forr}(\tilde{x}, y) + \mathrm{forr}(x, \tilde{y}) - \mathrm{forr}(z)$. Suppose $|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4$; then either $|\mathrm{forr}(\tilde{x}, y) - \mathrm{forr}(z)| \ge \epsilon/12$, or $|\mathrm{forr}(x, \tilde{y}) - \mathrm{forr}(z)| \ge \epsilon/12$, or $|\mathrm{forr}(\tilde{z} - z)| \ge \epsilon/12$. This, along with Eq. (17), Eq. (18), Eq. (19) and a union bound, implies that
$$\Pr\left[|\mathrm{forr}(\tilde{z}) - \mathrm{forr}(z)| \ge \epsilon/4\right] \le 2 \cdot e^{-\Omega(N^{1/4})} + e^{-\Omega(N^{1/4})} \le e^{-\Omega(N^{1/4})}$$

B Closure Under Restrictions
Proof of Lemma 7.5.
Let $L = M + l$. Let $H \in \mathcal{H}$ be defined by a distribution $\mathcal{C}$ over deterministic protocols $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$ of cost at most $c$. Let $v \in \{-1, 0, 1\}^M$ and let $\rho_v$ be a restriction as in Definition 2.12. Let $V := \{j : v(j) \in \{-1,1\}\}$. Define a distribution $\mathcal{C}_v$ over protocols $C_v : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$ as follows.

1. Sample $C \sim \mathcal{C}$.
2. For each $j \in V$, independently sample $a_j$ uniformly at random from $\{-1,1\}$.
3. For each $j \in V$, Alice overwrites the $j$-th bit of her input with $a_j$ and Bob overwrites the $j$-th bit of his input with $a_j \cdot v_j$.
4. Alice and Bob execute the protocol $C$ on their restricted inputs.

Claim B.1.
For all $z \in \{-1,1\}^M$ and $v \in \{-1,0,1\}^M$, we have $\mathbb{E}_{z' \sim \mathcal{U}_l}[H_{\mathrm{ext}_l(\mathcal{C}_v)}(z, z')] = H(\rho_v(z))$.

Note that $\mathcal{C}_v$ is a distribution over deterministic protocols of cost at most $c$. Thus, by the definition of $\mathcal{H}$, the function that maps $z$ to $\mathbb{E}_{z' \sim \mathcal{U}_l}[H_{\mathrm{ext}_l(\mathcal{C}_v)}(z, z')]$ is in $\mathcal{H}$. This observation, along with Claim B.1, establishes that the restricted function $H(\rho_v(z))$ of $z$ is also in $\mathcal{H}$. It thus suffices to prove Claim B.1.

Proof of Claim B.1.
This proof is by unravelling definitions. Let $z \in \{-1,1\}^M$ and $v \in \{-1,0,1\}^M$. Note that for all $x', z' \in \{-1,1\}^l$ and $x \in \{-1,1\}^M$, Definition 7.3 implies that $C(x, x \cdot z) = \mathrm{ext}_l(C)((x, x'), (x \cdot z, x' \cdot z'))$. In particular, for all $x \in \{-1,1\}^M$, we have
$$C(x, x \cdot z) = \mathbb{E}_{z', x' \sim \mathcal{U}_l}\left[\mathrm{ext}_l(C)((x, x'), (x \cdot z, x' \cdot z'))\right] \quad (20)$$
Consider
$$\mathbb{E}_{z' \sim \mathcal{U}_l}\left[H_{\mathrm{ext}_l(\mathcal{C}_v)}(z, z')\right] := \mathbb{E}_{z' \sim \mathcal{U}_l}\, \mathbb{E}_{C \sim \mathcal{C}_v,\, (x, x') \sim \mathcal{U}_L}\left[\mathrm{ext}_l(C)((x, x'), (x \cdot z, x' \cdot z'))\right] \quad \ldots \text{ due to Definition 7.4}$$
$$= \mathbb{E}_{C \sim \mathcal{C}_v,\, x \sim \mathcal{U}_M}\left[C(x, x \cdot z)\right] \quad \ldots \text{ due to Eq. (20)}$$
For each $j \in V$, let $a_j$ be a uniformly random sample as in step 2. For the rest of the coordinates $j \in [M] \setminus V$, set $a_j := 0$, and let $a = (a_1, \ldots, a_M) \in \{-1, 0, 1\}^M$. Let $\mathcal{A}$ denote the distribution of $a$ obtained by this process. This, along with the above equation and the definition of $\mathcal{C}_v$, implies that
$$\mathbb{E}_{z' \sim \mathcal{U}_l}\left[H_{\mathrm{ext}_l(\mathcal{C}_v)}(z, z')\right] = \mathbb{E}_{C \sim \mathcal{C}_v,\, x \sim \mathcal{U}_M}\left[C(x, x \cdot z)\right] = \mathbb{E}_{C \sim \mathcal{C}}\, \mathbb{E}_{a \sim \mathcal{A},\, x \sim \mathcal{U}_M}\left[C(\rho_a(x), \rho_{a \cdot v}(x \cdot z))\right]$$
Note that $\rho_{a \cdot v}(x \cdot z)$ is exactly $\rho_a(x) \cdot \rho_v(z)$. This is because for $j \in V$, we have $a_j, v_j \neq 0$ and thus $(\rho_{a \cdot v}(x \cdot z))(j) = a_j \cdot v_j = (\rho_a(x))(j) \cdot (\rho_v(z))(j)$; similarly, for $j \notin V$, we have $a_j = v_j = 0$ and thus $(\rho_{a \cdot v}(x \cdot z))(j) = x(j) \cdot z(j) = (\rho_a(x))(j) \cdot (\rho_v(z))(j)$. Substituting this in the above equation,
$$\mathbb{E}_{z' \sim \mathcal{U}_l}\left[H_{\mathrm{ext}_l(\mathcal{C}_v)}(z, z')\right] = \mathbb{E}_{C \sim \mathcal{C}}\, \mathbb{E}_{a \sim \mathcal{A},\, x \sim \mathcal{U}_M}\left[C(\rho_a(x), \rho_a(x) \cdot \rho_v(z))\right]$$
Note that for $a \sim \mathcal{A}$ and $x \sim \mathcal{U}_M$, we have $\rho_a(x) \sim \mathcal{U}_M$. Substituting this in the above equation,
$$\mathbb{E}_{z' \sim \mathcal{U}_l}\left[H_{\mathrm{ext}_l(\mathcal{C}_v)}(z, z')\right] = \mathbb{E}_{C \sim \mathcal{C}}\, \mathbb{E}_{x \sim \mathcal{U}_M}\left[C(x, x \cdot \rho_v(z))\right]$$
$$= \mathbb{E}_{z' \sim \mathcal{U}_l}\, \mathbb{E}_{C \sim \mathcal{C},\, (x, x') \sim \mathcal{U}_L}\left[\mathrm{ext}_l(C)((x, x'), (x \cdot \rho_v(z), x' \cdot z'))\right] \quad \ldots \text{ due to Eq. (20)}$$
$$= \mathbb{E}_{z' \sim \mathcal{U}_l}\left[H_{\mathrm{ext}_l(\mathcal{C})}(\rho_v(z), z')\right] \quad \ldots \text{ due to Definition 7.4}$$
$$= H(\rho_v(z)) \quad \ldots \text{ due to the definition in Lemma 7.5}$$
This completes the proof of Claim B.1.

C Weight Bound
For $l \in \mathbb{N}$, we say that a deterministic protocol $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$ has minimum cost at least $l$ if every rectangle in the partition induced by the protocol has length and width at most $2^{M-l}$.

Lemma C.1.
Let $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$ be any deterministic protocol of cost at most $c$ and of minimum cost at least $l := \lceil 2k \log e \rceil$. Let $H : \{-1,1\}^M \to \mathbb{R}$ be defined at every $z \in \{-1,1\}^M$ by $H(z) := \mathbb{E}_{x \sim \mathcal{U}_M}[C(x, x \cdot z)]$ as in Definition 7.4. Then, $L_{2k}(H) \le O\left(\left(\frac{e}{k}\right)^{2k} \cdot c^{2k}\right)$.

Corollary C.2.
Let $l = \lceil 2k \log e \rceil$. Let $\mathcal{C}$ be a distribution over deterministic protocols $C : \{-1,1\}^M \times \{-1,1\}^M \to \{-1,1\}$ of cost at most $c$. Let $H_{\mathrm{ext}_l(\mathcal{C})}$ be as in Definition 7.4. Then, $L_{2k}(H_{\mathrm{ext}_l(\mathcal{C})}) \le O\left(\left(\frac{e}{k}\right)^{2k} \cdot (c + 2l)^{2k}\right)$.

Proof of Lemma 7.6 using Corollary C.2. Let $H(z) := \mathbb{E}_{z' \sim \mathcal{U}_l}[H_{\mathrm{ext}_l(\mathcal{C})}(z, z')]$ be as in Lemma 7.6. Note that for all $S \subseteq [M]$, we have $\hat{H}(S) = \hat{H}_{\mathrm{ext}_l(\mathcal{C})}(S)$. This implies that $L_{2k}(H) \le L_{2k}(H_{\mathrm{ext}_l(\mathcal{C})})$. Corollary C.2 implies that $L_{2k}(H_{\mathrm{ext}_l(\mathcal{C})}) \le O\left(\left(\frac{e}{k}\right)^{2k} \cdot (c + 2l)^{2k}\right)$. This completes the proof of Lemma 7.6.

Proof of Corollary C.2 using Lemma C.1.
Note that for all $S \subseteq [M + l]$, we have $\hat{H}_{\mathrm{ext}_l(\mathcal{C})}(S) = \mathbb{E}_{C \sim \mathcal{C}}[\hat{H}_{\mathrm{ext}_l(C)}(S)]$. This, along with the triangle inequality, implies that $L_{2k}(H_{\mathrm{ext}_l(\mathcal{C})}) \le \max_{C \in \mathrm{supp}(\mathcal{C})} L_{2k}(H_{\mathrm{ext}_l(C)})$. Let $C$ be any deterministic protocol in the support of $\mathcal{C}$. Note that $\mathrm{ext}_l(C)$ is a deterministic protocol of cost at most $c + 2l$ and of minimum cost at least $l$. Let $H_{\mathrm{ext}_l(C)}$ be as in Definition 7.4. Lemma C.1 implies that $L_{2k}(H_{\mathrm{ext}_l(C)}) \le O\left(\left(\frac{e}{k}\right)^{2k} \cdot (c + 2l)^{2k}\right)$. This completes the proof of Corollary C.2.

Proof of Lemma C.1.
In order to bound $L_{2k}(H)$, we will use the following lemma. Its statement and proof appear as 'Level-$k$ Inequalities' on page 259 of 'Analysis of Boolean Functions' [O'D14]. For $S \subseteq \{-1,1\}^n$, let $\mathbb{1}_S : \{-1,1\}^n \to \{0,1\}$ denote the $\{0,1\}$-indicator function of the set $S$, that is, for $x \in \{-1,1\}^n$, $\mathbb{1}_S(x) = 1$ if and only if $x \in S$.

Lemma C.3 (Level-$k$ Inequalities). Let $A \subseteq \{-1,1\}^n$ be a set such that $\mathbb{E}[\mathbb{1}_A] = \alpha$ and let $k \in \mathbb{N}$ be at most $2\ln(1/\alpha)$. Then,
$$\sum_{|S| = k} \left(\hat{\mathbb{1}}_A(S)\right)^2 \le \alpha^2 \left(\frac{2e \ln(1/\alpha)}{k}\right)^k$$

We now show the desired bound on $L_{2k}(H)$. Since $C$ is a deterministic protocol of cost at most $c$, it induces a partition of the input space $\{-1,1\}^M \times \{-1,1\}^M$ into at most $2^c$ rectangles. Let $\mathcal{P}$ denote the set of rectangles in this partition and let $A \times B$ index these rectangles, where $A$ (respectively $B$) is the set of Alice's (respectively Bob's) inputs compatible with the rectangle. Let $C(A \times B) \in \{-1,1\}$ denote the output of the protocol when the inputs are in $A \times B$. For all $x, y \in \{-1,1\}^M$,
$$C(x, y) = \sum_{A \times B \in \mathcal{P}} C(A \times B)\, \mathbb{1}_A(x)\, \mathbb{1}_B(y)$$
This implies that for all $x, z \in \{-1,1\}^M$,
$$C(x, x \cdot z) = \sum_{A \times B \in \mathcal{P}} C(A \times B)\, \mathbb{1}_A(x)\, \mathbb{1}_B(x \cdot z)$$
Taking an expectation over $x \sim \mathcal{U}_M$ of the above identity implies that
$$H(z) := \mathbb{E}_{x \sim \mathcal{U}_M}[C(x, x \cdot z)] = \sum_{A \times B \in \mathcal{P}} C(A \times B)\, \left(\mathbb{1}_A * \mathbb{1}_B\right)(z)$$
This implies that for any $S \subseteq [M]$,
$$\hat{H}(S) = \sum_{A \times B \in \mathcal{P}} C(A \times B)\, \widehat{\mathbb{1}_A * \mathbb{1}_B}(S) = \sum_{A \times B \in \mathcal{P}} C(A \times B)\, \hat{\mathbb{1}}_A(S)\, \hat{\mathbb{1}}_B(S)$$
Note that $C(A \times B) \in \{-1,1\}$. We thus obtain
$$L_{2k}(H) = \sum_{|S| = 2k} \left|\hat{H}(S)\right| = \sum_{|S| = 2k} \left|\sum_{A \times B \in \mathcal{P}} C(A \times B)\, \hat{\mathbb{1}}_A(S)\, \hat{\mathbb{1}}_B(S)\right| \le \sum_{A \times B \in \mathcal{P}} \sum_{|S| = 2k} |\hat{\mathbb{1}}_A(S)|\, |\hat{\mathbb{1}}_B(S)|$$
We apply Cauchy–Schwarz to the term $\sum_{|S| = 2k} |\hat{\mathbb{1}}_A(S)||\hat{\mathbb{1}}_B(S)|$ to obtain
$$L_{2k}(H) \le \sum_{A \times B \in \mathcal{P}} \Big(\sum_{|S| = 2k} \hat{\mathbb{1}}_A(S)^2\Big)^{1/2} \Big(\sum_{|S| = 2k} \hat{\mathbb{1}}_B(S)^2\Big)^{1/2}$$
For ease of notation, let $\mu(A) = |A|/2^M$ denote the measure of a set $A \subseteq \{-1,1\}^M$ under $\mathcal{U}_M$. Because of the assumption that the minimum cost of $C$ is at least $l = \lceil 2k \log e \rceil$, every rectangle $A \times B \in \mathcal{P}$ satisfies $\mu(A), \mu(B) \le e^{-2k}$. This ensures that $2k \le 2\ln(1/\mu(A))$ and $2k \le 2\ln(1/\mu(B))$. We apply Lemma C.3 on the indicator functions $\mathbb{1}_A$ and $\mathbb{1}_B$ at level $2k$ to obtain
$$\sum_{|S| = 2k} \left(\hat{\mathbb{1}}_A(S)\right)^2 \le \mu(A)^2 \left(\frac{e \ln(1/\mu(A))}{k}\right)^{2k}, \qquad \sum_{|S| = 2k} \left(\hat{\mathbb{1}}_B(S)\right)^2 \le \mu(B)^2 \left(\frac{e \ln(1/\mu(B))}{k}\right)^{2k}$$
Substituting this in the bound for $L_{2k}(H)$, we have
$$L_{2k}(H) \le \left(\frac{e}{k}\right)^{2k} \sum_{A \times B \in \mathcal{P}} \mu(A)\mu(B) \left(\ln\frac{1}{\mu(A)} \ln\frac{1}{\mu(B)}\right)^k$$
Let $\Delta := \left(\frac{e}{k}\right)^{2k} \sum_{A \times B \in \mathcal{P}} \mu(A)\mu(B)\left(\ln\frac{1}{\mu(A)}\ln\frac{1}{\mu(B)}\right)^k$ be the expression on the R.H.S. above. Consider the case when $\mathcal{P}$ consists of $2^c$ rectangles $A \times B$, each of which satisfies $\mu(A) = \mu(B) = 2^{-c/2}$. In this case, $\Delta$ evaluates to $\left(\frac{e}{k}\right)^{2k} \sum_{A \times B \in \mathcal{P}} 2^{-c} \left(\frac{c \ln 2}{2}\right)^{2k} = O\left(\left(\frac{e}{k}\right)^{2k} \cdot c^{2k}\right)$. This proves the lemma in this special case. A similar bound holds in the general case, and the proof follows from a concavity argument that we describe now.

Since $\mu(A), \mu(B) \le 1$, we have the following inequality, using $\ln\frac{1}{\mu(A)}\ln\frac{1}{\mu(B)} \le \left(\frac{1}{2}\ln\frac{1}{\mu(A)\mu(B)}\right)^2$ (the AM–GM inequality):
$$\Delta := \left(\frac{e}{k}\right)^{2k} \sum_{A \times B \in \mathcal{P}} \mu(A)\mu(B)\left(\ln\frac{1}{\mu(A)}\ln\frac{1}{\mu(B)}\right)^k \le \left(\frac{e}{k}\right)^{2k} \sum_{A \times B \in \mathcal{P}} \mu(A)\mu(B)\left(\frac{1}{2}\ln\frac{1}{\mu(A)\mu(B)} \cdot \frac{1}{2}\ln\frac{1}{\mu(A)\mu(B)}\right)^k$$
$$= \left(\frac{e}{2k}\right)^{2k} \sum_{A \times B \in \mathcal{P}} \mu(A \times B)\left(\ln\frac{1}{\mu(A \times B)}\right)^{2k}$$
Let $f : [0, \infty) \to \mathbb{R}$ be defined by $f(p) := p\left(\ln(1/p)\right)^{2k}$. A small calculation shows that $f$ is a concave function in the interval $[0, e^{-(2k-1)}]$ (see Figure 2). Indeed, $f'(p) = \left(\ln(1/p)\right)^{2k} - 2k\left(\ln(1/p)\right)^{2k-1}$, which implies that $f''(p) = \frac{2k}{p}\left(\ln(1/p)\right)^{2k-2}\left((2k - 1) - \ln(1/p)\right)$. Note that for $p \le e^{-(2k-1)}$, $f''(p) \le 0$. Let $\alpha_i \in [0, e^{-(2k-1)}]$ for $i \in [d]$. Concavity of $f$ states that for $i \sim [d]$ drawn uniformly at random, we have $\mathbb{E}_i[f(\alpha_i)] \le f(\mathbb{E}_i[\alpha_i])$. This implies that
$$\sum_{i=1}^d \alpha_i \left(\ln(1/\alpha_i)\right)^{2k} \le \left(\sum_{i=1}^d \alpha_i\right)\left(\ln\frac{d}{\sum_{i=1}^d \alpha_i}\right)^{2k}$$

Figure 2: Plot of the function $y = x\left(\ln(1/x)\right)^2$.

We apply this inequality to the terms in $\Delta$ by substituting $\alpha_i$ with $\mu(A \times B)$. We may do this because of the assumption that $\mu(A), \mu(B) \le e^{-2k}$. This implies that
$$\Delta \le \left(\frac{e}{2k}\right)^{2k}\left(\sum_{A \times B \in \mathcal{P}} \mu(A \times B)\right)\left(\ln\frac{2^c}{\sum_{A \times B \in \mathcal{P}} \mu(A \times B)}\right)^{2k}$$
Note that $\sum_{A \times B \in \mathcal{P}} \mu(A \times B) = 1$. This, along with the above inequality, implies that $\Delta \le \left(\frac{e}{2k}\right)^{2k}(c \ln 2)^{2k} = O\left(\left(\frac{e}{k}\right)^{2k} \cdot c^{2k}\right)$. This completes the proof of Lemma C.1.
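The concavity step above can be checked numerically. The sketch below samples random $\alpha_i \in [0, e^{-(2k-1)}]$ (the interval on which $f(p) = p(\ln(1/p))^{2k}$ is concave) and verifies the displayed Jensen inequality $\sum_i \alpha_i (\ln(1/\alpha_i))^{2k} \le \left(\sum_i \alpha_i\right)\left(\ln\frac{d}{\sum_i \alpha_i}\right)^{2k}$ (illustrative check only, for $k = 2$):

```python
import math
import random

def sides(alphas, k):
    # LHS and RHS of: sum_i a_i (ln 1/a_i)^{2k} <= (sum_i a_i) (ln(d / sum_i a_i))^{2k},
    # valid when every a_i lies in [0, e^{-(2k-1)}], where f(p) = p (ln 1/p)^{2k} is concave.
    d, s = len(alphas), sum(alphas)
    lhs = sum(a * math.log(1 / a) ** (2 * k) for a in alphas)
    rhs = s * math.log(d / s) ** (2 * k)
    return lhs, rhs

rng = random.Random(3)
k = 2
cap = math.exp(-(2 * k - 1))  # e^{-(2k-1)}: the concavity threshold for f
violations = 0
for _ in range(1000):
    d = rng.randint(2, 20)
    alphas = [rng.uniform(1e-12, cap) for _ in range(d)]
    lhs, rhs = sides(alphas, k)
    if lhs > rhs + 1e-9:
        violations += 1
```

No random trial violates the inequality, matching the Jensen argument; in the proof the $\alpha_i$ are the rectangle measures $\mu(A \times B)$ and $d \le 2^c$.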