A note on quantum algorithms and the minimal degree of epsilon-error polynomials for symmetric functions
Ronald de Wolf∗
CWI Amsterdam
Abstract
The degrees of polynomials representing or approximating Boolean functions are a prominent tool in various branches of complexity theory. Sherstov [She08a] recently characterized the minimal degree $\deg_\varepsilon(f)$ among all polynomials (over $\mathbb{R}$) that approximate a symmetric function $f:\{0,1\}^n\to\{0,1\}$ up to worst-case error $\varepsilon$: $\deg_\varepsilon(f)=\tilde\Theta\big(\deg_{1/3}(f)+\sqrt{n\log(1/\varepsilon)}\big)$. In this note we show how a tighter version (without the log-factors hidden in the $\tilde\Theta$-notation) can be derived quite easily using the close connection between polynomials and quantum algorithms.
Boolean functions are one of the primary objects of study in theoretical computer science. Such functions can be represented or approximated by polynomials in a number of ways, and the algebraic properties of such polynomials (such as their degree) often give information about the complexity of the function involved. Areas where this approach has been used include circuit complexity [Raz87, Smo87, Bei93], complexity classes [BRS95, Bei94, Tod91], decision trees [NS94, BW02], communication complexity [BW01, Raz03, She08b, LS08], and learning theory [MOS04, LMMV05].

In this note we focus on polynomials over the field of real numbers. An $n$-variate multilinear polynomial $p$ is a function $p:\mathbb{R}^n\to\mathbb{R}$ that can be written as
$$p(x_1,\ldots,x_n)=\sum_{S\subseteq[n]}a_S\prod_{i\in S}x_i$$
for some real numbers $a_S$. The degree of $p$ is $\deg(p)=\max\{|S| : a_S\neq 0\}$. It is well known (and easy to show) that every function $f:\{0,1\}^n\to\mathbb{R}$ has a unique representation as such a polynomial; $\deg(f)$ is defined as the degree of that polynomial.

In many applications it suffices if the polynomial is close to $f$ instead of being equal to it:

Definition 1
The $\varepsilon$-approximate degree of $f:\{0,1\}^n\to\mathbb{R}$ is
$$\deg_\varepsilon(f)=\min\{\deg(p) : |p(x)-f(x)|\le\varepsilon\text{ for all }x\in\{0,1\}^n\}.$$

∗ Partially supported by a Veni grant from the Netherlands Organization for Scientific Research (NWO), and by the European Commission under the Integrated Project Qubit Applications (QAP) funded by the IST directorate as Contract Number 015848.

A function $f$ is called symmetric if its value depends only on the Hamming weight $|x|$ of its input $x\in\{0,1\}^n$. Equivalently, $f(x)=f(\pi(x))$ for all $x\in\{0,1\}^n$ and all permutations $\pi\in S_n$. We will restrict attention here to symmetric functions $f$. Examples are OR, AND, PARITY, MAJORITY, etc. Since only the Hamming weight $|x|$ of the input matters, one can actually restrict attention to univariate polynomials. We say that a univariate polynomial $p$ $\varepsilon$-approximates a symmetric function $f$ if $|p(|x|)-f(x)|\le\varepsilon$ for all $x\in\{0,1\}^n$. By a technique called symmetrization [MP68], it turns out that for symmetric functions, the minimal degree of such univariate $\varepsilon$-approximating polynomials is the same degree $\deg_\varepsilon(f)$ as for $n$-variate multilinear polynomials. Hence we can switch back and forth between these two kinds of polynomials at will.

Paturi [Pat92] tightly characterized the $1/3$-approximate degree $\deg_{1/3}(f)$ of all symmetric $f$ (see the start of Section 2 for the precise statement). Recently, Sherstov [She08a] studied the dependence on the error $\varepsilon$. He proved the surprisingly clean result that for all $\varepsilon\in[2^{-n},1/3]$,
$$\deg_\varepsilon(f)=\tilde\Theta\big(\deg_{1/3}(f)+\sqrt{n\log(1/\varepsilon)}\big),$$
where the $\tilde\Theta$-notation hides some logarithmic factors. Note that the statement is false if $\varepsilon\ll 2^{-n}$: clearly $\deg(f)\le n$ for all $f$, while $\sqrt{n\log(1/\varepsilon)}$ grows without bound as $\varepsilon\to 0$.

Sherstov gave an interesting application of his result in the context of the inclusion-exclusion principle of probability theory. Let $f:\{0,1\}^n\to\{0,1\}$ be a Boolean function. Suppose one has events $A_1,\ldots,A_n$ in some probability space, and one knows the exact values of $\Pr[\cap_{i\in S}A_i]$ for all sets $S\subseteq[n]$ of size at most $k$. How well can we now estimate $\Pr[f(A_1,\ldots,A_n)]$? Sherstov gives essentially tight bounds for this for all symmetric functions $f$, based on his degree result. This generalizes earlier results for the case where $f$ is the OR function, i.e., where one is estimating $\Pr[\cup_{i\in[n]}A_i]$ [LN90, KLS96].

In this note we give a different proof, for a slightly tighter version of Sherstov's degree result:

Theorem 1
For every non-constant symmetric function $f:\{0,1\}^n\to\{0,1\}$ and $\varepsilon\in[2^{-n},1/3]$:
$$\deg_\varepsilon(f)=\Theta\big(\deg_{1/3}(f)+\sqrt{n\log(1/\varepsilon)}\big).$$
Note that there are no hidden logarithmic factors anymore. As a consequence, the result on approximate inclusion-exclusion is sharpened as well, but we won't elaborate on that here.

The lower bound on $\deg_\varepsilon(f)$ follows immediately from combining Paturi's tight bound for $\deg_{1/3}(f)$ with the tight bound on the $\varepsilon$-approximate degree of the OR function proved in [BCWZ99]. More interestingly, our upper bound is obtained by exhibiting an efficient $\varepsilon$-error quantum algorithm for computing a symmetric function. It is well known (at least in quantum circles) that the acceptance probability of a quantum algorithm that makes $T$ queries to its input can be written as an $n$-variate multilinear polynomial of degree at most $2T$ [BBC+01]. … $|x|\in\{t,\ldots,n-t\}$. These functions may be arbitrary (possibly non-symmetric) for smaller or larger Hamming weights. For every such function we have $\deg_\varepsilon(f)=O\big(\sqrt{tn}+\sqrt{n\log(1/\varepsilon)}\big)$.

Discussion
The main message of this note is that one can obtain essentially optimal polynomial approximations of symmetric Boolean functions by arguing about quantum algorithms. This fits in a line of papers in recent years that prove or reprove theorems about various topics in classical computer science or mathematics with the help of quantum computational techniques. This includes results about locally decodable codes [KW04, WW05], classical proof systems for lattice problems inspired by earlier quantum proof systems [AR03, AR04], limitations on classical algorithms for local search [Aar03] inspired by an earlier quantum proof, a proof that the complexity class PP is closed under intersection [Aar05], lower bounds on the rigidity of Hadamard matrices [Wol06], classical formula size lower bounds from quantum query lower bounds [LLS05], and an approach to proving lower bounds for classical circuit depth using quantum communication complexity [Ker07].

There are advantages as well as disadvantages to our approach in this note. We feel that for someone familiar with quantum algorithms and their connection to polynomials, our proof should be quite simple and straightforward. Also, our bound applies to a larger class of functions, and is tight up to constant instead of logarithmic factors. On the other hand, for those unfamiliar with quantum computation our proof is probably not that accessible. Another disadvantage is that we do not construct the $\varepsilon$-approximating polynomials explicitly (though one may derive them from our quantum algorithm), in contrast to Sherstov's construction based on Chebyshev polynomials.

Let $f:\{0,1\}^n\to\{0,1\}$ be a non-constant symmetric function that is constant if the Hamming weight $|x|$ of the input is in the interval $\{t,\ldots,n-t\}$ (where $0<t\le n/2$ is the smallest $t$ for which this holds). We know $\deg_{1/3}(f)=\Theta(\sqrt{tn})$ from Paturi [Pat92]. In the next two subsections we provide matching upper and lower bounds on $\deg_\varepsilon(f)$, thus proving Theorem 1.

2.1 Upper bound on $\deg_\varepsilon(f)$

Beals et al. [BBC+01] showed that the acceptance probability of a $T$-query quantum algorithm on $n$-bit input is a multilinear $n$-variate polynomial $p:\mathbb{R}^n\to\mathbb{R}$ of degree at most $2T$. Hence it suffices to give an $\varepsilon$-error quantum algorithm for $f$ that uses $O\big(\deg_{1/3}(f)+\sqrt{n\log(1/\varepsilon)}\big)$ queries. The acceptance probability of the algorithm will be our $\varepsilon$-error polynomial.

Here is the algorithm. It uses various quantum algorithms based on Grover's search algorithm, which are explained in the appendix. Let $x\in\{0,1\}^n$ be the input string. The algorithms have access to this string via queries. In the quantum case, one query is one application of the unitary that maps $|i\rangle\mapsto(-1)^{x_i}|i\rangle$. A solution is an index $i\in[n]$ such that $x_i=1$.

1. Use $t$ repeated applications of exact Grover to try to find up to $t$ solutions (initially assuming $|x|=t$, and "crossing out" in subsequent applications the solutions already found). If $|x|\le t$, then with probability 1 these repeated applications find all solutions. This costs $O(\sqrt{tn})$ queries.
2. Use $\varepsilon/2$-error Grover to try to find one more solution. This costs $O(\sqrt{n\log(1/\varepsilon)})$ queries.
3. The same as step 1, but now looking for positions of 0s instead of 1s.
4. The same as step 2, but now looking for a 0 instead of a 1.

The total number of queries is indeed $O\big(\sqrt{tn}+\sqrt{n\log(1/\varepsilon)}\big)$. We need to show that this gives error probability at most $\varepsilon$ for every input $x\in\{0,1\}^n$. Observe the following:

• if step 1 found $t$ solutions, then we know $|x|\ge t$ with probability 1 (note that one can verify whether a given position is a solution with only 1 extra query).
• if step 1 found fewer than $t$ solutions, but step 2 found another solution, then we know $|x|>t$ (for if $|x|\le t$ then step 1 would certainly have found all solutions and there would be none left to be found in step 2).
• if step 1 found fewer than $t$ solutions, but step 2 did not find another solution, then the probability that there are more solutions than those found by step 1 is at most $\varepsilon/2$ (if there were more, $\varepsilon/2$-error Grover would have found one with probability at least $1-\varepsilon/2$).
• similar observations hold for steps 3 and 4 (with 0s and 1s switching roles).

These observations imply that at the end of the 4 steps we have enough information to compute $f$. Note that with probability at least $1-\varepsilon$ we can distinguish between the three cases $|x|<t$, $|x|\in\{t,\ldots,n-t\}$, and $|x|>n-t$. If $|x|\in\{t,\ldots,n-t\}$ then we are done because $f$ is constant on this interval. If $|x|<t$ then step 1 found all solutions, so we know $x$ completely and can compute $f(x)$. If $|x|>n-t$ then step 3 found all non-solutions of $x$, and again we know $x$ completely. In all cases we compute $f(x)$ with error probability at most $\varepsilon$.

This algorithm even works for many non-symmetric functions: it suffices if $f$ is constant on all inputs with Hamming weight in $\{t,\ldots,n-t\}$; $f$ may be arbitrary if $|x|<t$ or $|x|>n-t$, since in these cases the algorithm actually determines $x$ completely, rather than just its Hamming weight.

2.2 Lower bound on $\deg_\varepsilon(f)$

We can assume $t<n/4$, because if $t\ge n/4$ then
$$n\ge\deg(f)\ge\deg_\varepsilon(f)\ge\deg_{1/3}(f)=\Theta(n).$$
Buhrman et al. [BCWZ99] showed for the $n$-bit OR function that $\deg_\varepsilon(\mathrm{OR}_n)=\Theta\big(\sqrt{n\log(1/\varepsilon)}\big)$. Since $t<n/4$, we can embed an OR on at least $n-2t\ge n/2$ bits in $f$ by fixing some of the bits to specific values. Hence
$$\deg_\varepsilon(f)\ge\max\Big(\deg_{1/3}(f),\ \Omega\big(\sqrt{n\log(1/\varepsilon)}\big)\Big)=\Omega\big(\deg_{1/3}(f)+\sqrt{n\log(1/\varepsilon)}\big).$$

Acknowledgments
Thanks to Sasha Sherstov for his paper [She08a] (which prompted this note) and some comments.
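To make the case analysis of Section 2.1 concrete, the following Python sketch shows only the classical post-processing that turns the outcomes of steps 1–4 into a value $f(x)$. It is a sketch under stated assumptions, not the note's construction: no quantum search is simulated; the outcomes are passed in as the hypothetical parameters `ones_found`, `extra_one`, `zeros_found`, `extra_zero`; `middle_value` stands for the constant value of $f$ on Hamming weights in $\{t,\ldots,n-t\}$; and `error_free_outcomes` is a hypothetical helper returning what the searches would report if none of them erred.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple


def decide(n: int, t: int, f: Callable[[Sequence[int]], int], middle_value: int,
           ones_found: Set[int], extra_one: bool,
           zeros_found: Set[int], extra_zero: bool) -> int:
    """Combine the (hypothetical) outcomes of steps 1-4 into the output f(x)."""
    if len(ones_found) < t and not extra_one:
        # Steps 1-2 indicate |x| < t, so step 1 found every 1: x is fully known.
        x = [1 if i in ones_found else 0 for i in range(n)]
        return f(x)
    if len(zeros_found) < t and not extra_zero:
        # Steps 3-4 indicate |x| > n - t, so step 3 found every 0: x is fully known.
        x = [0 if i in zeros_found else 1 for i in range(n)]
        return f(x)
    # Otherwise t <= |x| <= n - t, where f takes its constant value.
    return middle_value


def error_free_outcomes(x: Sequence[int], t: int) -> Tuple[Set[int], bool, Set[int], bool]:
    """Hypothetical helper: what steps 1-4 report if no search errs."""
    ones = [i for i, b in enumerate(x) if b == 1]
    zeros = [i for i, b in enumerate(x) if b == 0]
    ones_found, zeros_found = set(ones[:t]), set(zeros[:t])
    return (ones_found, len(ones) > len(ones_found),
            zeros_found, len(zeros) > len(zeros_found))


# Example with n = 6, t = 2: f is 0 on weights 2..4 and equals x[0] otherwise,
# which is allowed because f may be non-symmetric outside {t, ..., n - t}.
n, t = 6, 2


def f(x: Sequence[int]) -> int:
    return 0 if t <= sum(x) <= n - t else x[0]


assert all(decide(n, t, f, 0, *error_free_outcomes(x, t)) == f(x)
           for x in product([0, 1], repeat=n))
```

With error-free searches the sketch reproduces $f(x)$ on all 64 inputs of this example; in the actual algorithm, steps 2 and 4 can each mislead the post-processing with probability at most $\varepsilon/2$.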
References

[Aar03] S. Aaronson. Lower bounds for local search by quantum arguments. In Proceedings of 35th ACM STOC, pages 465–474, 2003.
[Aar05] S. Aaronson. Quantum computing, postselection, and probabilistic polynomial-time. In Proceedings of the Royal Society, volume A461(2063), pages 3473–3482, 2005.
[AR03] D. Aharonov and O. Regev. A lattice problem in quantum NP. In Proceedings of 44th IEEE FOCS, pages 210–219, 2003.
[AR04] D. Aharonov and O. Regev. Lattice problems in NP ∩ coNP. In Proceedings of 45th IEEE FOCS, pages 362–371, 2004.
[BBC+01] R. Beals, H. Buhrman, R. Cleve, M. Mosca, and R. de Wolf. Quantum lower bounds by polynomials. Journal of the ACM, 48(4):778–797, 2001.
[BCWZ99] H. Buhrman, R. Cleve, R. de Wolf, and Ch. Zalka. Bounds for small-error and zero-error quantum algorithms. In Proceedings of 40th IEEE FOCS, pages 358–368, 1999.
[Bei93] R. Beigel. The polynomial method in circuit complexity. In Proceedings of the 8th IEEE Structure in Complexity Theory Conference, pages 82–95, 1993.
[Bei94] R. Beigel. Perceptrons, PP, and the polynomial hierarchy. Computational Complexity, 4:339–349, 1994.
[BHMT02] G. Brassard, P. Høyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and estimation. In Quantum Computation and Quantum Information: A Millennium Volume, volume 305 of AMS Contemporary Mathematics Series, pages 53–74, 2002.
[BRS95] R. Beigel, N. Reingold, and D. Spielman. PP is closed under intersection. Journal of Computer and System Sciences, 50(2):191–202, 1995.
[BW01] H. Buhrman and R. de Wolf. Communication complexity lower bounds by polynomials. In Proceedings of 16th IEEE Conference on Computational Complexity, pages 120–130, 2001.
[BW02] H. Buhrman and R. de Wolf. Complexity measures and decision tree complexity: A survey. Theoretical Computer Science, 288(1):21–43, 2002.
[Gro96] L. K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of 28th ACM STOC, pages 212–219, 1996.
[GW02] M. de Graaf and R. de Wolf. On quantum versions of the Yao principle. In Proceedings of 19th Annual Symposium on Theoretical Aspects of Computer Science (STACS'2002), volume 2285 of LNCS, pages 347–358. Springer, 2002.
[Ker07] I. Kerenidis. Quantum multiparty communication complexity and circuit lower bounds. In Proceedings of 4th TAMC, volume 4484 of LNCS, pages 306–317. Springer, 2007.
[KLS96] J. Kahn, N. Linial, and A. Samorodnitsky. Inclusion-exclusion: Exact and approximate. Combinatorica, 16(4):465–477, 1996.
[KW04] I. Kerenidis and R. de Wolf. Exponential lower bound for 2-query locally decodable codes via a quantum argument. Journal of Computer and System Sciences, 69(3):395–420, 2004.
[LLS05] S. Laplante, T. Lee, and M. Szegedy. The quantum adversary method and classical formula size lower bounds. In Proceedings of 20th IEEE Conference on Computational Complexity, 2005.
[LMMV05] R. Lipton, E. Markakis, A. Mehta, and N. Vishnoi. On the Fourier spectrum of symmetric Boolean functions with applications to learning symmetric juntas. In Proceedings of 20th IEEE Conference on Computational Complexity, pages 112–119, 2005.
[LN90] N. Linial and N. Nisan. Approximate inclusion-exclusion. Combinatorica, 10(4):349–365, 1990.
[LS08] T. Lee and A. Shraibman. Disjointness is hard in the multi-party number-on-the-forehead model. In Proceedings of 23rd IEEE Conference on Computational Complexity, 2008.
[MOS04] E. Mossel, R. O'Donnell, and R. Servedio. Learning functions of k relevant variables. Journal of Computer and System Sciences, 69(3):421–434, 2004.
[MP68] M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1968. Second, expanded edition 1988.
[NS94] N. Nisan and M. Szegedy. On the degree of Boolean functions as real polynomials. Computational Complexity, 4(4):301–313, 1994.
[Pat92] R. Paturi. On the degree of polynomials that approximate symmetric Boolean functions. In Proceedings of 24th ACM STOC, pages 468–474, 1992.
[Raz87] A. Razborov. Lower bounds for the size of circuits of bounded depth with basis {∧, ⊕}. Mathematical notes of the Academy of Science of the USSR, 41(4):333–338, 1987.
[Raz03] A. Razborov. Quantum communication complexity of symmetric predicates. Izvestiya of the Russian Academy of Sciences, mathematics, 67(1):159–176, 2003.
[She08a] A. Sherstov. Approximate inclusion-exclusion for arbitrary symmetric functions. In Proceedings of 23rd IEEE Conference on Computational Complexity, 2008.
[She08b] A. Sherstov. The pattern matrix method for lower bounds on quantum communication. In Proceedings of 40th ACM STOC, 2008.
[Smo87] R. Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proceedings of 19th ACM STOC, pages 77–82, 1987.
[Tod91] S. Toda. PP is as hard as the polynomial-time hierarchy. SIAM Journal on Computing, 20(5):865–877, 1991.
[Wol06] R. de Wolf. Lower bounds on matrix rigidity via a quantum argument. In Proceedings of 33rd ICALP, volume 4051 of LNCS, pages 62–71, 2006.
[WW05] S. Wehner and R. de Wolf. Improved lower bounds for locally decodable codes and private information retrieval. In Proceedings of 32nd ICALP, volume 3580 of LNCS, pages 1424–1436, 2005.

¹ The earlier paper by Kahn et al. [KLS96] showed a $\tilde\Theta$-version of this.
Grover’s algorithm and applications
Grover's quantum algorithm [Gro96] for finding a solution (i.e., an $i\in[n]$ such that $x_i=1$) consists of $T$ applications of a certain unitary $G$, starting from the uniform superposition $\frac{1}{\sqrt n}\sum_{i=1}^{n}|i\rangle$. We won't explain the details of $G$ here. Suffice it to say that each $G$ makes one quantum query, so the total number of queries is $T$. The intuition is that $G$ changes the state by moving amplitude from non-solutions to solutions. One can show [BHMT02] that the probability that a measurement of the state after $T$ steps gives a solution is exactly
$$\big(\sin((2T+1)\theta)\big)^2,\quad\text{where }\theta=\arcsin\big(\sqrt{|x|/n}\big).$$
If $|x|>0$ and $T=\lceil(\pi/4)\sqrt{n/|x|}\rceil$, then this probability is close to 1 when $|x|\ll n$. Hence if we know (at least approximately) the number of solutions $|x|$, then we can find one with good probability using $O(\sqrt{n/|x|})$ queries. If we know $|x|$ exactly, a small modification of the algorithm finds a solution with probability 1 [BHMT02]. This uses exactly $\lceil(\pi/4)\sqrt{n/|x|}\rceil$ queries; we will refer to it as "exact Grover".

What if we don't know how many solutions there are in the input? We can first apply Grover assuming the number of solutions is $n/2$, then assuming it is $n/4$, … This uses $\sum_{i=1}^{\log n}O\big(\sqrt{n/2^i}\big)=O(\sqrt n)$ queries. If we know there are at least $t$ solutions, this can be improved to $O(\sqrt{n/t})$. We will refer to this as "usual Grover".

And what if we want to have probability at least $1-\varepsilon$ of finding a solution? Buhrman et al. [BCWZ99] designed an algorithm that achieves this using $O(\sqrt{n\log(1/\varepsilon)})$ queries, and showed (by proving the lower bound on $\deg_\varepsilon(\mathrm{OR})$ mentioned in Section 2.2) that this complexity is optimal up to a constant factor. Their algorithm is quite simple. Apply exact Grover $\log(1/\varepsilon)$ times, first assuming there is 1 solution, then assuming there are 2 solutions, etc. If the actual number of solutions is between 1 and $\log(1/\varepsilon)$, at least one solution will have been found with probability 1 by now. If no solution has been found yet, then apply usual Grover $O(\log(1/\varepsilon))$ times, assuming there are at least $t=\log(1/\varepsilon)$ solutions. It is easy to verify that this has overall query complexity $O(\sqrt{n\log(1/\varepsilon)})$ (the first phase costs $\sum_{i=1}^{\log(1/\varepsilon)}O(\sqrt{n/i})=O(\sqrt{n\log(1/\varepsilon)})$ queries, the second $O(\log(1/\varepsilon))\cdot O(\sqrt{n/\log(1/\varepsilon)})$) and error probability at most $\varepsilon$. We will refer to this as "$\varepsilon$-error Grover".

De Graaf and de Wolf [GW02, Lemma 2] observed that exact Grover can be used to find all solutions with probability 1, as long as we know an upper bound $t$ on the number of solutions. Suppose we run exact Grover $t$ times: the first time assuming we have exactly $t$ solutions, the second time assuming we have exactly $t-1$ solutions, etc. Whenever we find a solution $i$, we "cross it out" in the sense of modifying the input by setting $x_i$ to 0 (this can easily be achieved by some unitary pre- and post-processing around the query). This prevents the algorithm from finding the same solution twice. The total number of queries used is
$$\sum_{i=1}^{t}\big\lceil(\pi/4)\sqrt{n/i}\,\big\rceil\le\pi\sqrt{tn}.$$
To see that this finds all solutions with probability 1, observe that the assumed number of solutions $t-i+1$ of the $i$th run always upper bounds the actual number of remaining solutions (this "loop invariant" is easily proved with downward induction).
Hence if we start with at most t remainingsolutions, then after tt