The List-Decoding Size of Fourier-Sparse Boolean Functions
Ishay Haviv∗  Oded Regev†

Abstract
A function defined on the Boolean hypercube is k-Fourier-sparse if it has at most k nonzero Fourier coefficients. For a function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ and parameters $k$ and $d$, we prove a strong upper bound on the number of $k$-Fourier-sparse Boolean functions that disagree with $f$ on at most $d$ inputs. Our bound implies that the number of uniform and independent random samples needed for learning the class of $k$-Fourier-sparse Boolean functions on $n$ variables exactly is at most $O(n \cdot k \log k)$.

As an application, we prove an upper bound on the query complexity of testing Booleanity of Fourier-sparse functions. Our bound is tight up to a logarithmic factor and quadratically improves on a result due to Gur and Tamuz (Chicago J. Theor. Comput. Sci., 2013).

∗School of Computer Science, The Academic College of Tel Aviv-Yaffo, Tel Aviv 61083, Israel.
†Courant Institute of Mathematical Sciences, New York University. Supported by the Simons Collaboration on Algorithms and Geometry and by the National Science Foundation (NSF) under Grant No. CCF-1320188. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

1 Introduction

Functions defined on the Boolean hypercube $\{0,1\}^n = \mathbb{F}_2^n$ are fundamental objects in theoretical computer science. It is well known that every such function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ can be represented as a linear combination
$$f = \sum_{S \subseteq [n]} \hat{f}(S) \cdot \chi_S$$
of the $2^n$ functions $\{\chi_S\}_{S \subseteq [n]}$ defined by $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$. This representation is known as the Fourier expansion of the function $f$, and the numbers $\hat{f}(S)$ are known as its Fourier coefficients. The Fourier expansion of functions plays a central role in the analysis of Boolean functions and finds applications in numerous areas of theoretical computer science, including learning theory, property testing, hardness of approximation, social choice theory, and cryptography. For an in-depth introduction to the topic the reader is referred to the book of O'Donnell [22].

A classical result in learning theory is a general algorithm due to Kushilevitz and Mansour [19], based on results of Linial, Mansour, and Nisan [20] and Goldreich and Levin [12], which enables one to efficiently learn classes of Boolean functions with a "simple" Fourier expansion. A common notion of simplicity of a Fourier expansion is its sparsity. A function is said to be k-Fourier-sparse if it has at most $k$ nonzero Fourier coefficients. It follows from [19] that given query access to a $k$-Fourier-sparse Boolean function $f \colon \mathbb{F}_2^n \to \{0,1\}$,
it is possible to estimate its Fourier coefficients and to get a good approximation of $f$ in running time polynomial in $n$ and $k$. Later, it was shown that such running time even allows one to reconstruct the function $f$ exactly [13].

In recent years, properties of the Fourier expansion of functions have been studied in the property testing framework. We now mention some of those results; since they are not needed in the sequel, the reader can skip directly to the description of our results in the next section. Gopalan, O'Donnell, Servedio, Shpilka, and Wimmer considered in [13] the problem of testing whether a given Boolean function is $k$-Fourier-sparse or $\varepsilon$-far from any such function. Another problem studied there is that of deciding whether a function is k-Fourier-dimensional, that is, whether its Fourier support, viewed as a subset of $\mathbb{F}_2^n$, spans a subspace of dimension at most $k$, or is $\varepsilon$-far from satisfying this property. Gopalan et al. [13] established testers for these properties whose query complexities depend only on $k$ and $\varepsilon$. For $k$-Fourier-sparsity the query complexity was a certain polynomial in $k$ and $1/\varepsilon$, and for $k$-Fourier-dimensionality it was $O(k \cdot 2^{2k}/\varepsilon)$. They also proved lower bounds of $\Omega(\sqrt{k})$ and $\Omega(2^{k/2})$, respectively. Another parameter associated with a Boolean function is the degree of its representation as a polynomial over $\mathbb{F}_2$. The algorithmic task of testing whether a function has $\mathbb{F}_2$-degree at most $d$ or is $\varepsilon$-far from any such function was considered by Alon et al. [1] and then by Bhattacharyya et al. [6], who proved tight upper and lower bounds of $\Theta(2^d + 1/\varepsilon)$ on the query complexity. Note that all the above properties fall into the class of linear-invariant properties, i.e., properties that are closed under composition with any invertible linear transformation of the domain. Such properties have recently attracted a significant amount of attention in the attempt to characterize their efficient testability (see [24, 5] for related surveys).

1.1 Our Results

List-decoding size.
Our main technical result, from which we derive all other results, concerns the list-decoding size of Fourier-sparse Boolean functions. In general, the list-decoding problem of an error-correcting code for a distance parameter $d$ asks to find all the codewords whose Hamming distance from a given word is at most $d$. Here we consider the (non-linear) binary code of block length $2^n$ whose codewords represent all the $k$-Fourier-sparse Boolean functions on $n$ variables.

It is not difficult to show that the total number of such functions is at most $2^{O(nk)}$. Indeed, there are $2^{O(nk)}$ ways to choose the support of $\hat{f}$, and $2^{O(nk)}$ ways to set those Fourier coefficients, which must all be integer multiples of $2^{-n}$ in $[-1,+1]$. It is also not difficult to show that the distance between any two distinct codewords is at least $2^n/k$. Indeed, it is known that every $k$-Fourier-sparse Boolean function has $\mathbb{F}_2$-degree at most $\log k$ (see, e.g., [4, Lemma 3]), and therefore, by the Schwartz–Zippel lemma, every two distinct $k$-Fourier-sparse Boolean functions disagree on at least a $1/k$ fraction of the inputs. As a result, for every function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ there is at most one codeword of distance smaller than $2^n/(2k)$ from $f$. We are not aware of any other known bounds beyond these two naive ones. We address this question in the following theorem.

Theorem 1.1. For every function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, the number of k-Fourier-sparse Boolean functions of distance at most $d$ from $f$ is $2^{O(ndk \log k/2^n)}$.

We observe that for certain choices of $k$ and $d$ the bound given in Theorem 1.1 is tight. For example, let $f$ be the constant zero function, let $k \leq 2^{n/2}$ be a power of 2, and take $d = 2^n/k$. Consider all the indicator functions of linear subspaces of $\mathbb{F}_2^n$ of co-dimension $\log k$. Every such function is of distance $d$ from $f$ and is $k$-Fourier-sparse (see Claim 2.4). The number of such functions is $2^{\Theta(n \log k)} = 2^{\Theta(ndk \log k/2^n)}$.
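To see that the two exponents indeed coincide for this choice of parameters, substitute $d = 2^n/k$ into the exponent of Theorem 1.1:

```latex
\frac{n \, d \, k \log k}{2^n}
  \;=\; \frac{n \cdot (2^n/k) \cdot k \log k}{2^n}
  \;=\; n \log k ,
```

so the $2^{\Theta(n \log k)}$ subspace indicators saturate the $2^{O(ndk \log k/2^n)}$ bound up to the constant in the exponent.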
Learning from samples. As an application of the list-decoding bound, we next consider the problem of learning the class of $k$-Fourier-sparse Boolean functions on $n$ variables (exactly) from uniform and independent random samples (see, e.g., [2, 18] for related work). Let us note already at the outset that all the results mentioned here are not efficient: it is not known whether there is an algorithm for the problem whose running time is some fixed polynomial in $n$ times an arbitrary function of $k$. Among other things, such an algorithm would imply a breakthrough on the long-standing open question of learning juntas from samples [7, 21, 25, 18].

The question of recovering a function that is sparse in the Fourier (or another) basis from a few samples is the central question in the area of sparse recovery. It has been intensely investigated for over a decade and, among other things, has applications to compressed sensing and to the data stream model. The best previously known bounds on our question are $O(n \cdot k \log^3 k) \leq O(n^4 \cdot k)$ due to Cheraghchi, Guruswami, and Velingker [11] and $O(n^2 \cdot k \log k) \leq O(n^3 \cdot k)$ due to Bourgain [8], improving on a previous bound of Rudelson and Vershynin [23] (who themselves improved on the work of Candès and Tao [10]).
We note in passing that these works actually answer a harder question: first, because they handle all functions, not only Boolean-valued ones; second, because they show that a randomly chosen set of sample locations of the above cardinality is good with high probability simultaneously for all $k$-Fourier-sparse functions (sometimes known as the "deterministic" setting), whereas we only want a random set of sample locations to be good with high probability for any fixed $k$-Fourier-sparse function (the "randomized" setting); finally, because they obtain the recovery result by proving a "restricted isometry property" of the Fourier matrix, which among other things implies a recovery algorithm running in time polynomial in $2^n$ and $k$.

Using Theorem 1.1, we improve the upper bound on the sample complexity of learning Fourier-sparse Boolean functions.

Corollary 1.2.
The number of uniform and independent random samples required for learning the class of k-Fourier-sparse Boolean functions on n variables is $O(n \cdot k \log k)$.

We believe that our better bound and its elementary proof shed more light on the problem and might be useful elsewhere. In fact, in a follow-up work [15] we employ the techniques developed here to study the "restricted isometry property" of random submatrices of Fourier (and other) matrices, improving on the aforementioned works [11, 8]. We finally note that a lower bound of $\Omega(k \cdot (n - \log k))$ on the sample complexity can be obtained by considering the problem of learning indicator functions of affine subspaces of $\mathbb{F}_2^n$ of co-dimension $\log k$ (see Theorem 3.7; see, e.g., [3] for the same lower bound in a different setting).

Testing Booleanity. We next consider the problem of testing Booleanity of Fourier-sparse functions, which was introduced and studied by Gur and Tamuz in [14]. In this problem, given access to a $k$-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, one has to decide whether $f$ is Boolean, i.e., whether its image is contained in $\{0,1\}$,
or not. The objective is to distinguish between the two cases with some constant probability using as few queries to $f$ as possible. It was shown in [14] that there exists a (non-adaptive one-sided error) tester for the problem with query complexity $O(k^2)$, and that every tester for the problem has query complexity $\Omega(k)$. Here, we use our result on learning $k$-Fourier-sparse Boolean functions to improve the upper bound of [14] and prove the following.

Theorem 1.3.
For every k there exists a non-adaptive one-sided error tester that, using $O(k \log^2 k)$ queries to an input k-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, decides whether f is Boolean or not with constant success probability.

We note that, while the tester established in Theorem 1.3 has an improved query complexity, it is not clear whether it is efficient with respect to running time. It can be shown, though, that using the learning algorithm for Fourier-sparse functions that follows from [8, 15] (instead of Corollary 1.2) in our proof of Theorem 1.3, one can obtain an efficient algorithm (running in time polynomial in $n$ and $k$) with the slightly worse query complexity of $O(k \log^3 k)$.

Finally, we complement Theorem 1.3 with the following nearly matching lower bound.

Theorem 1.4.
Every non-adaptive one-sided error tester for Booleanity of k-Fourier-sparse functions has query complexity $\Omega(k \log k)$.

1.2 Proof Overview

1.2.1 The List-Decoding Size of Fourier-Sparse Boolean Functions

In order to prove Theorem 1.1, we have to bound from above the number of $k$-Fourier-sparse Boolean functions of distance at most $d$ from a general function $f \colon \mathbb{F}_2^n \to \mathbb{R}$. In the discussion below, let us consider the special case where $f$ is the constant zero function; the general result follows easily. Here, we have to bound the number of $k$-Fourier-sparse Boolean functions $g \colon \mathbb{F}_2^n \to \{0,1\}$
of support size at most $d$. We start by observing, using Parseval's theorem, that such functions have small spectral norm $\|\hat{g}\|_1 = \sum_{S \subseteq [n]} |\hat{g}(S)|$. Next, we observe that the Fourier expansion of the normalized function $g/\|\hat{g}\|_1$ is a convex combination of functions $\pm\chi_S$, and thus can be viewed, following a technique of Bruck and Smolensky [9], as an expectation over a distribution on the $S$'s. Using the Chernoff–Hoeffding bound and the bound on the spectral norm, we obtain a succinct representation for every such function $g$. The ability to represent these functions by a binary string of bounded length yields the upper bound on their number. We note that the proof approach somewhat resembles that of the upper bound on the list-decoding size of Reed–Muller codes due to Kaufman, Lovett, and Porat [17].

1.2.2 Learning Fourier-Sparse Boolean Functions

As a warmup, let us mention an easy upper bound of $O(n \cdot k^2)$. This follows by recalling that there are at most $2^{O(nk)}$ $k$-Fourier-sparse Boolean functions, and that each one differs from any fixed function on at least a $1/(2k)$ fraction of the inputs. Hence, by the union bound, after $O(n \cdot k^2)$ samples all other functions will be eliminated.

The improved bound in Corollary 1.2 follows similarly using the list-decoding result of Theorem 1.1. Namely, we apply the union bound separately on functions of different distances from the input function. Functions that are nearby are harder to "hit" using random samples but, by the theorem, there are few of them; functions that are farther away are in abundance, but they are easier to "hit" using random samples.

1.2.3 Testing Booleanity

The testing Booleanity problem is somewhat different from typical property testing problems. Indeed, in property testing one usually has to distinguish objects that satisfy a certain property from those that are $\varepsilon$-far from the property for some distance parameter $\varepsilon > 0$.
However, here the tester is required to decide whether the function satisfies the Booleanity property or not, with no distance parameter involved. This unusual setting makes sense in this case because Fourier-sparse non-Boolean functions are always quite far from every Boolean function. More precisely, the authors of [14] used the uncertainty principle (see Proposition 2.1) to prove that every $k$-Fourier-sparse non-Boolean function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ is non-Boolean on at least $\Omega(2^n/k^2)$ inputs (see Claim 2.3). This immediately implies a (non-adaptive one-sided error) tester that uses $O(k^2)$ queries: just check that $f$ is Boolean on $O(k^2)$ uniform inputs in $\mathbb{F}_2^n$.

The analysis of [14] turns out to be tight, as there are $k$-Fourier-sparse non-Boolean functions that are non-Boolean at only $\Theta(2^n/k^2)$ points. Indeed, for an even integer $n$, consider the function $f \colon \mathbb{F}_2^n \to \{0,1,2\}$ defined by
$$f(x_1, \ldots, x_n) = \mathrm{AND}(x_1, \ldots, x_{n/2}) + \mathrm{AND}(x_{n/2+1}, \ldots, x_n), \qquad (1)$$
which is non-Boolean at only one point and has Fourier-sparsity at most $2 \cdot 2^{n/2}$ (see Claim 2.4).
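As a quick sanity check of this example, the sketch below instantiates eq. (1) with the hypothetical small parameter $n = 4$ and verifies by brute force that $f$ is non-Boolean at exactly one input (the all-ones vector) and that its Fourier-sparsity is at most $2 \cdot 2^{n/2} = 8$:

```python
from itertools import product

def f(x):
    """Eq. (1) with n = 4: AND of the first half plus AND of the second half."""
    n = len(x)
    return int(all(x[:n // 2])) + int(all(x[n // 2:]))

points = list(product([0, 1], repeat=4))
non_boolean = [x for x in points if f(x) not in (0, 1)]

def fourier_sparsity():
    """Count the nonzero Fourier coefficients of f by brute force."""
    count = 0
    for bits in product([0, 1], repeat=4):
        S = [i for i in range(4) if bits[i]]
        coeff = sum(f(x) * (-1) ** sum(x[i] for i in S) for x in points) / 16
        if abs(coeff) > 1e-9:
            count += 1
    return count
```

Here the sparsity comes out to 7 rather than 8, because the two AND terms share the empty-set coefficient.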
Upper bound. We prove Theorem 1.3 using our learning result, Corollary 1.2. To do so, we first observe that the restriction of a $k$-Fourier-sparse non-Boolean function to a random subspace of dimension $O(\log k)$ is non-Boolean with high probability (see Lemma 4.1). Since a restriction to a subspace does not increase the Fourier-sparsity, this reduces our problem to testing Booleanity of $k$-Fourier-sparse functions on $n = O(\log k)$ variables. Then, after $O(k \log^2 k)$ samples from the subspace, if a non-Boolean value was found then we are clearly done. Otherwise, by Corollary 1.2, the samples uniquely specify a Boolean candidate for the restricted function. Such a function must be quite far from every other $k$-Fourier-sparse function (Boolean or not; see Claim 2.2). This enables us to decide whether the restricted function equals the Boolean candidate function or not.

Lower bound.
The upper bound in Theorem 1.3 gets close to the $\Omega(k)$ lower bound proven by Gur and Tamuz in [14]. For their lower bound, they considered the following two distributions: (a) the uniform distribution over all Boolean $n$-variable functions that depend only on their first $\log k$ variables; (b) the uniform distribution over all $n$-variable functions that depend only on their first $\log k$ variables and return a Boolean value on all but one of their inputs. Distinguishing between these two distributions requires $\Omega(k)$ queries. Since the first distribution is supported on $k$-Fourier-sparse Boolean functions and the second on $k$-Fourier-sparse non-Boolean functions, this implies that the same lower bound holds for the query complexity of testing Booleanity of $k$-Fourier-sparse functions.

Note that the distributions considered above are supported on $\log k$-Fourier-dimensional functions. It can be seen (say, using the uncertainty principle) that such functions are non-Boolean on at least a $1/k$ fraction of their inputs, so $O(k)$ random samples suffice for finding a non-Boolean value if one exists. Hence, in order to get beyond the $\Omega(k)$ lower bound, we need to consider $k$-Fourier-sparse functions that are non-Boolean on only an $o(1/k)$ fraction of the inputs; our functions will actually have an $O(1/k^2)$ fraction of such inputs.

Specifically, we consider the distribution over functions obtained by composing the function $f$ given in (1) with a random invertible affine transformation. This is the class of functions that can be represented as a sum $\mathbb{1}_{V_1} + \mathbb{1}_{V_2}$ of two indicators of affine subspaces $V_1, V_2 \subseteq \mathbb{F}_2^n$ of dimension $n/2$ which intersect at exactly one point. Intuitively, it seems that distinguishing the functions in this class from those where $V_1$ and $V_2$ have empty intersection requires the tester to learn the affine subspaces $V_1$ and $V_2$, a task that requires $\Omega(n \cdot 2^{n/2})$ queries. We prove such a lower bound for non-adaptive one-sided error testers. Since the above functions are $k$-Fourier-sparse for $k = O(2^{n/2})$, the obtained lower bound is $\Omega(k \log k)$.

2 Preliminaries

Let $[n]$ denote the set
$\{1, \ldots, n\}$. A function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ is Boolean if its image is contained in $\{0,1\}$ and is non-Boolean otherwise. The distance between two functions $f, g \colon \mathbb{F}_2^n \to \mathbb{R}$, denoted $\mathrm{dist}(f,g)$, is the number of vectors $x \in \mathbb{F}_2^n$ for which $f(x) \neq g(x)$.

2.1 Fourier Expansion
For every $S \subseteq [n]$, let $\chi_S \colon \mathbb{F}_2^n \to \{-1,1\}$ denote the function defined by $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$. It is well known that the $2^n$ functions $\{\chi_S\}_{S \subseteq [n]}$ form an orthonormal basis of the space of functions $\mathbb{F}_2^n \to \mathbb{R}$ with respect to the inner product $\langle f, g \rangle = \mathbb{E}_x[f(x) \cdot g(x)]$, where $x$ is distributed uniformly over $\mathbb{F}_2^n$. Thus, every function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ can be uniquely represented as a linear combination $f = \sum_{S \subseteq [n]} \hat{f}(S) \cdot \chi_S$ of this basis. This representation is called the Fourier expansion of $f$, and the numbers $\hat{f}(S)$ are referred to as its Fourier coefficients. The support of $f$ is defined by $\mathrm{supp}(f) = \{x \in \mathbb{F}_2^n \mid f(x) \neq 0\}$ and the support of $\hat{f}$, known as the Fourier spectrum of $f$, by $\mathrm{supp}(\hat{f}) = \{S \subseteq [n] \mid \hat{f}(S) \neq 0\}$. We say that $f$ is k-Fourier-sparse if $|\mathrm{supp}(\hat{f})| \leq k$.¹ For every $p \geq 1$, let $\|\hat{f}\|_p = \big(\sum_{S \subseteq [n]} |\hat{f}(S)|^p\big)^{1/p}$. For $p = 1$, $\|\hat{f}\|_1$ is known as the spectral norm of $f$. Parseval's theorem states that $\mathbb{E}_x[f(x)^2] = \|\hat{f}\|_2^2$.

¹Boolean functions are sometimes defined in the literature with range $\{-1,+1\}$ rather than $\{0,1\}$. Notice that this affects the Fourier-sparsity by at most 1.

The uncertainty principle says that there is no nonzero function $f$ for which the supports of both $f$ and $\hat{f}$ are small (see, e.g., [22, Exercise 3.15]). We state it below with two simple consequences.

Proposition 2.1 (The Uncertainty Principle). For every nonzero function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, $|\mathrm{supp}(f)| \cdot |\mathrm{supp}(\hat{f})| \geq 2^n$.
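These definitions are easy to check numerically. The sketch below computes the full Fourier expansion of a function on $\{0,1\}^n$ by brute force and verifies Parseval's theorem and the uncertainty principle on the AND function (a hypothetical example for which the principle is tight):

```python
from itertools import product

def fourier_expansion(f, n):
    """All 2^n Fourier coefficients of f: {0,1}^n -> R, by brute force:
    f_hat(S) = E_x[f(x) * chi_S(x)] with chi_S(x) = (-1)^(sum of x_i for i in S)."""
    points = list(product([0, 1], repeat=n))
    coeffs = {}
    for bits in product([0, 1], repeat=n):
        S = tuple(i for i in range(n) if bits[i])
        coeffs[S] = sum(f(x) * (-1) ** sum(x[i] for i in S) for x in points) / 2 ** n
    return coeffs

n = 3
f = lambda x: int(all(x))                      # AND on 3 variables; support size 1
coeffs = fourier_expansion(f, n)
spectrum_size = sum(1 for c in coeffs.values() if abs(c) > 1e-9)
support_size = sum(1 for x in product([0, 1], repeat=n) if f(x) != 0)
mean_square = sum(f(x) ** 2 for x in product([0, 1], repeat=n)) / 2 ** n
sum_sq_coeffs = sum(c ** 2 for c in coeffs.values())
```

For AND, all $2^n$ coefficients are nonzero, so $|\mathrm{supp}(f)| \cdot |\mathrm{supp}(\hat{f})| = 1 \cdot 2^n$ meets the uncertainty bound with equality.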
Claim 2.2. For every two distinct k-Fourier-sparse functions $f, g \colon \mathbb{F}_2^n \to \mathbb{R}$, $\mathrm{dist}(f,g) \geq 2^n/(2k)$.

Proof: Apply Proposition 2.1 to the function $f - g$, whose Fourier-sparsity is at most $2k$.

Claim 2.3 ([14]). For every k-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, if $f$ is non-Boolean then $|\{x \in \mathbb{F}_2^n \mid f(x) \notin \{0,1\}\}|$
$\geq \frac{2^n}{\binom{k}{2} + k + 1}$.

Proof: Apply Proposition 2.1 to the function $f \cdot (f - 1)$, which is nonzero exactly on the inputs where $f$ is non-Boolean, and whose Fourier-sparsity is at most
$$|\{S \bigtriangleup T \mid S, T \in \mathrm{supp}(\hat{f})\}| + |\mathrm{supp}(\hat{f})| \leq \binom{k}{2} + 1 + k,$$
where $\bigtriangleup$ stands for the symmetric difference of sets.
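As an illustration with hypothetical small parameters, one can verify the claim on the function from eq. (1) with $n = 4$: there $f$ is 7-Fourier-sparse, $f \cdot (f-1)$ has at most $\binom{7}{2} + 7 + 1 = 29$ nonzero coefficients, and $f$ is non-Boolean on $1 \geq 2^4/29$ inputs:

```python
from itertools import product

n = 4
points = list(product([0, 1], repeat=n))

def f(x):  # eq. (1): non-Boolean only at the all-ones input
    return int(all(x[:2])) + int(all(x[2:]))

def sparsity(g):
    """Number of nonzero Fourier coefficients of g, by brute force."""
    count = 0
    for bits in product([0, 1], repeat=n):
        S = [i for i in range(n) if bits[i]]
        coeff = sum(g(x) * (-1) ** sum(x[i] for i in S) for x in points) / 2 ** n
        if abs(coeff) > 1e-9:
            count += 1
    return count

k = sparsity(f)                                   # k = 7
h = lambda x: f(x) * (f(x) - 1)                   # nonzero exactly where f is non-Boolean
non_boolean_count = sum(1 for x in points if f(x) not in (0, 1))
```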
We also need the following simple claim.

Claim 2.4. For every affine subspace $V \subseteq \mathbb{F}_2^n$ of co-dimension k, the indicator function $\mathbb{1}_V \colon \mathbb{F}_2^n \to \{0,1\}$ is $2^k$-Fourier-sparse.

Proof: Since $V$ has co-dimension $k$, there exist $a_1, \ldots, a_k \in \mathbb{F}_2^n$ and $b_1, \ldots, b_k \in \mathbb{F}_2$ such that $V = \{x \in \mathbb{F}_2^n \mid \langle x, a_i \rangle = b_i,\ i = 1, \ldots, k\}$. For every $i$, let $S_i \subseteq [n]$ denote the set whose characteristic vector is $a_i$, and observe that for every $x \in \mathbb{F}_2^n$,
$$\mathbb{1}_V(x) = \prod_{i=1}^{k} \frac{1 + (-1)^{b_i} \cdot \chi_{S_i}(x)}{2}.$$
This representation implies that $\mathbb{1}_V$ is $2^k$-Fourier-sparse.
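A small numerical check of Claim 2.4 (the constraint vectors below are an arbitrary hypothetical choice): for a co-dimension-2 affine subspace of $\mathbb{F}_2^4$, the indicator has exactly $2^2 = 4$ nonzero Fourier coefficients, and the subspace has $2^{n-k} = 4$ points:

```python
from itertools import product

n = 4
points = list(product([0, 1], repeat=n))

# co-dimension k = 2: <x, a1> = 0 and <x, a2> = 1 with a1 = 1010, a2 = 0111
constraints = [((1, 0, 1, 0), 0), ((0, 1, 1, 1), 1)]

def indicator(x):
    return int(all(sum(ai * xi for ai, xi in zip(a, x)) % 2 == b
                   for a, b in constraints))

def fourier_sparsity(g):
    count = 0
    for bits in product([0, 1], repeat=n):
        S = [i for i in range(n) if bits[i]]
        coeff = sum(g(x) * (-1) ** sum(x[i] for i in S) for x in points) / 2 ** n
        if abs(coeff) > 1e-9:
            count += 1
    return count
```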
2.2 Chernoff–Hoeffding Bound

Theorem 2.5. Let $X_1, \ldots, X_N$ be $N$ identically distributed independent random variables in $[-a, +a]$ satisfying $\mathbb{E}[X_i] = \mu$ for all $i$. Then for every $\delta \leq 1/2$, $\varepsilon > 0$, and $N \geq C \cdot a^2 \cdot \log(1/\delta)/\varepsilon^2$, for a universal constant $C$, it holds that
$$\Pr\Big[\Big|\mu - \frac{1}{N} \sum_{i=1}^{N} X_i\Big| < \varepsilon\Big] \geq 1 - \delta.$$

3 The List-Decoding Size of Fourier-Sparse Boolean Functions
We turn to prove Theorem 1.1, which provides an upper bound on the list-decoding size of the code of block length $2^n$ of all $k$-Fourier-sparse Boolean functions on $n$ variables. Equivalently, for a general distance $d$ and a function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, we bound the number of $k$-Fourier-sparse Boolean functions on $n$ variables of distance at most $d$ from $f$.

We start by proving that a function $f \colon \mathbb{F}_2^n \to \mathbb{R}$ with small spectral norm can be well approximated by a linear combination of few functions from $\{\chi_S\}_{S \subseteq [n]}$ with coefficients of equal magnitude. This was essentially proved in [9], and we include the proof here for completeness.

Lemma 3.1. For every function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, $\varepsilon > 0$, and $\delta \in (0, 1/2]$, there exists a collection $\mathcal{F}$ of $O(\|\hat{f}\|_1^2 \cdot \log(1/\delta)/\varepsilon^2)$ subsets of $[n]$ with signs $(a_S \in \{\pm 1\})_{S \in \mathcal{F}}$ such that for all but at most a $\delta$ fraction of $x \in \mathbb{F}_2^n$ it holds that
$$\Big| f(x) - \frac{\|\hat{f}\|_1}{|\mathcal{F}|} \cdot \sum_{S \in \mathcal{F}} a_S \cdot \chi_S(x) \Big| < \varepsilon.$$

Proof: Observe that the function $f$ can be represented as follows:
$$f = \sum_{S \subseteq [n]} \hat{f}(S) \cdot \chi_S = \sum_{S \subseteq [n]} \frac{|\hat{f}(S)|}{\|\hat{f}\|_1} \cdot \|\hat{f}\|_1 \cdot \mathrm{sign}(\hat{f}(S)) \cdot \chi_S = \mathbb{E}_{S \sim D}\big[\|\hat{f}\|_1 \cdot \mathrm{sign}(\hat{f}(S)) \cdot \chi_S\big],$$
where $D$ is the distribution defined by $D(S) = |\hat{f}(S)|/\|\hat{f}\|_1$. Let $\mathcal{F}$ be a collection of $|\mathcal{F}| = O(\|\hat{f}\|_1^2 \cdot \log(1/\delta)/\varepsilon^2)$ independent random samples from the distribution $D$. For every $x \in \mathbb{F}_2^n$, the Chernoff–Hoeffding bound (Theorem 2.5) implies that with probability at least $1 - \delta$ it holds that
$$\Big| f(x) - \frac{1}{|\mathcal{F}|} \cdot \sum_{S \in \mathcal{F}} \|\hat{f}\|_1 \cdot a_S \cdot \chi_S(x) \Big| < \varepsilon, \qquad (2)$$
where $a_S = \mathrm{sign}(\hat{f}(S))$. By linearity of expectation, it follows that there exist $\mathcal{F}$ and signs $(a_S)_{S \in \mathcal{F}}$ for which (2) holds for all but at most a $\delta$ fraction of $x \in \mathbb{F}_2^n$, as required.

We now apply Lemma 3.1 to Fourier-sparse functions in $\mathbb{F}_2^n \to \{-1, 0, +1\}$ with bounded support size, and then, in Corollary 3.3, derive an upper bound on the number of these functions.
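Before doing so, here is a direct simulation of the sampling argument in Lemma 3.1. The example function is a hypothetical choice (the indicator of a subspace, whose spectral norm is 1 and whose signs $a_S$ are all $+1$); the sketch draws $m$ sets from the distribution $D$ and checks that the empirical average approximates $f$ pointwise:

```python
import random
from itertools import product

random.seed(1)
n = 4
# hypothetical example: f = indicator of {x : x0 + x1 = 0, x2 + x3 = 0}, i.e.,
# f = (1/4)(chi_{} + chi_{0,1} + chi_{2,3} + chi_{0,1,2,3}); ||f_hat||_1 = 1
support = [(), (0, 1), (2, 3), (0, 1, 2, 3)]
coeffs = {S: 0.25 for S in support}

def chi(S, x):
    return (-1) ** sum(x[i] for i in S)

def f(x):
    return sum(c * chi(S, x) for S, c in coeffs.items())

# draw m sets S with probability D(S) = |f_hat(S)| / ||f_hat||_1 (uniform here)
m = 400
samples = random.choices(support, weights=[abs(coeffs[S]) for S in support], k=m)

def approx(x):
    # (||f_hat||_1 / m) * sum_S a_S * chi_S(x); here ||f_hat||_1 = 1, all a_S = +1
    return sum(chi(S, x) for S in samples) / m

worst_error = max(abs(f(x) - approx(x)) for x in product([0, 1], repeat=n))
```

With $m = 400$ samples, the Chernoff–Hoeffding bound makes an error of $1/2$ at any fixed point overwhelmingly unlikely, so the worst-case error over all $2^n$ points stays below $1/2$.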
Corollary 3.2. Let $f \colon \mathbb{F}_2^n \to \{-1, 0, +1\}$ be a k-Fourier-sparse function satisfying $|\mathrm{supp}(f)| \leq d$. Then for every $\delta \in (0, 1/2]$ there exists a collection $\mathcal{F}$ of $O(dk \log(1/\delta)/2^n)$ subsets of $[n]$ with signs $(a_S \in \{\pm 1\})_{S \in \mathcal{F}}$ such that for all but at most a $\delta$ fraction of $x \in \mathbb{F}_2^n$ it holds that
$$\Big| f(x) - \frac{\|\hat{f}\|_1}{|\mathcal{F}|} \cdot \sum_{S \in \mathcal{F}} a_S \cdot \chi_S(x) \Big| < \frac{1}{2}.$$
Repetitions of subsets in the collection $\mathcal{F}$ are allowed.

Proof: By the Cauchy–Schwarz inequality and Parseval's theorem, we obtain that
$$\|\hat{f}\|_1^2 \leq k \cdot \sum_{S \subseteq [n]} \hat{f}(S)^2 = k \cdot 2^{-n} \cdot \sum_{x \in \mathbb{F}_2^n} f(x)^2 \leq \frac{dk}{2^n}.$$
The corollary follows from Lemma 3.1, applied with $\varepsilon = 1/2$, which yields $|\mathcal{F}| = O(\|\hat{f}\|_1^2 \cdot \log(1/\delta)/\varepsilon^2) = O(dk \log(1/\delta)/2^n)$.

Corollary 3.3.
The number of k-Fourier-sparse functions $f \colon \mathbb{F}_2^n \to \{-1, 0, +1\}$ satisfying $|\mathrm{supp}(f)| \leq d$ is $2^{O(ndk \log k/2^n)}$.

Proof: For every $k$-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \{-1, 0, +1\}$ satisfying $|\mathrm{supp}(f)| \leq d$, let $\mathcal{F}$ and $(a_S)_{S \in \mathcal{F}}$ be as given by Corollary 3.2 for, say, $\delta = 1/(8k)$. Since the range of $f$ is $\{-1, 0, +1\}$, it follows that the collection $\mathcal{F}$, the signs $(a_S)_{S \in \mathcal{F}}$, and the value of $\|\hat{f}\|_1$ define a function of distance at most $\delta \cdot 2^n$ from $f$. Notice that by Claim 2.2 and our choice of $\delta$, the distance between every two distinct $k$-Fourier-sparse functions is larger than $2\delta \cdot 2^n$. Thus, a function of distance at most $\delta \cdot 2^n$ from $f$ fully defines $f$. This implies that $f$ can be represented by a binary string of length $O(n \cdot dk \log k/2^n)$, so the total number of such functions is $2^{O(ndk \log k/2^n)}$.

The bound in Corollary 3.3 implies a bound on the number of Fourier-sparse Boolean functions of bounded distance from a given Boolean function.

Corollary 3.4.
For every k-Fourier-sparse Boolean function $f \colon \mathbb{F}_2^n \to \{0,1\}$, the number of k-Fourier-sparse Boolean functions of distance at most $d$ from $f$ is $2^{O(ndk \log k/2^n)}$.

Proof: Let $f \colon \mathbb{F}_2^n \to \{0,1\}$ be a $k$-Fourier-sparse Boolean function. Consider the mapping that maps every $k$-Fourier-sparse Boolean function $g \colon \mathbb{F}_2^n \to \{0,1\}$, whose distance from $f$ is at most $d$, to the function $h = f - g$. Observe that $h$ is a $2k$-Fourier-sparse function from $\mathbb{F}_2^n$ to $\{-1, 0, +1\}$ satisfying $|\mathrm{supp}(h)| \leq d$. By Corollary 3.3, the number of such functions $h$ is bounded by $2^{O(ndk \log k/2^n)}$. Since the above mapping is injective, this bound holds for the number of functions $g$ as well.

Equipped with Corollary 3.4, we restate and prove Theorem 1.1.

Theorem 1.1.
For every function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, the number of k-Fourier-sparse Boolean functions of distance at most $d$ from $f$ is $2^{O(ndk \log k/2^n)}$.

Proof: If there is no $k$-Fourier-sparse Boolean function of distance at most $d$ from $f$, then the bound trivially holds. So assume that such a function $g \colon \mathbb{F}_2^n \to \{0,1\}$ exists. Observe that, by the triangle inequality, every $k$-Fourier-sparse Boolean function of distance at most $d$ from $f$ has distance at most $2d$ from $g$. Thus, by Corollary 3.4 applied to $g$, the number of such functions is at most $2^{O(ndk \log k/2^n)}$.

3.1 The Sample Complexity of Learning Fourier-Sparse Boolean Functions

The sample complexity of learning a class of functions is the minimum number of uniform and independent random samples needed from a function in the class for specifying it with high success probability. Here we consider the class of $k$-Fourier-sparse Boolean functions on $n$ variables, and show how Theorem 1.1 implies an upper bound on the sample complexity of learning it (Corollary 3.6).

Theorem 3.5.
For every $n$, every $1 < k \leq 2^n$, and every k-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, the following holds. The probability that, when sampling $O(n \cdot k \log k)$ uniform and independent random samples from $f$, there exists a k-Fourier-sparse Boolean function $g \neq f$ that agrees with $f$ on all the samples is at most $2^{-\Omega(n \log k)}$.

Proof: Consider $q = O(nk \log k)$ samples $(x, f(x))$ from a $k$-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, where $x$ is distributed uniformly and independently over $\mathbb{F}_2^n$. By Claim 2.2, the distance between $f$ and every other $k$-Fourier-sparse function is at least $2^n/(2k)$. For an integer $\ell \in [\lceil \log(2k) \rceil]$, consider all the $k$-Fourier-sparse Boolean functions whose distance from $f$ is in $[2^{n-\ell}, 2^{n-\ell+1}]$. By Theorem 1.1, the number of such functions is $2^{O(nk \log k/2^{\ell})}$. The probability that such a function agrees with $q$ random independent samples of $f$ is at most $(1 - 2^{-\ell})^q$. By the union bound, the probability that at least one of these functions agrees with the $q$ samples is at most
$$2^{O(nk \log k/2^{\ell})} \cdot (1 - 2^{-\ell})^q \leq 2^{O(nk \log k/2^{\ell})} \cdot e^{-q/2^{\ell}} \leq 2^{-\Omega(n \log k)},$$
where the last inequality holds for an appropriate choice of $q = O(nk \log k)$. By applying the union bound over all the values of $\ell$, it follows that with probability $1 - 2^{-\Omega(n \log k)}$ all the $k$-Fourier-sparse Boolean functions (besides $f$) are eliminated, completing the proof.

The following corollary follows immediately from Theorem 3.5 and confirms Corollary 1.2.
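The choice of $q$ can be made concrete numerically. The sketch below uses hypothetical parameters $n$, $k$ and a stand-in constant `C` for the hidden constant in Theorem 1.1's exponent; it evaluates the union bound over all distance scales $\ell$ and checks that the total failure probability is below $2^{-n \log k}$:

```python
import math

# hypothetical parameters; C stands in for the constant in Theorem 1.1's exponent
n, k = 20, 8
C = 4
q = 10 * n * k * int(math.log2(k))      # q = O(n k log k) samples

def bucket_bound(l):
    """Union bound for functions at distance roughly 2^(n - l) from f."""
    count = 2 ** (C * n * k * math.log2(k) / 2 ** l)   # list-decoding bound
    survive_prob = (1 - 2 ** -l) ** q                  # one function agrees with all q samples
    return count * survive_prob

scales = range(1, math.ceil(math.log2(2 * k)) + 1)
total_failure = sum(bucket_bound(l) for l in scales)
```

The nearby buckets (small $\ell$) contain astronomically many candidates but each survives the samples with astronomically small probability, and vice versa for the far buckets, exactly as in the proof.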
Corollary 3.6. For every $n$ and $2 \leq k \leq 2^n$, the number of uniform and independent random samples required for learning the class of k-Fourier-sparse Boolean functions on n variables with success probability $1 - 2^{-\Omega(n \log k)}$ is $O(n \cdot k \log k)$.

We end with the following simple lower bound.
Theorem 3.7.
For every $n$ and $2 \leq k \leq 2^n$, the number of uniform and independent random samples required for learning the class of k-Fourier-sparse Boolean functions on n variables with constant success probability is $\Omega(k \cdot (n - \log k))$.

Proof:
Assume without loss of generality that $k$ is a power of 2. Let $A$ be an algorithm for learning the class above with constant success probability $p > 0$ using $q$ uniform and independent random samples. Consider the class $\mathcal{G}$ of indicators of affine subspaces of $\mathbb{F}_2^n$ of co-dimension $\log k$ (i.e., affine subspaces of $\mathbb{F}_2^n$ of size $2^n/k$). By Claim 2.4, the functions in $\mathcal{G}$ are $k$-Fourier-sparse. Observe that their number satisfies $|\mathcal{G}| = 2^{\Theta(n \cdot \min(\log k,\, n - \log k))}$.

By Yao's minimax principle, there exists a deterministic algorithm $A'$ (obtained by fixing the random coins of $A$) that, given evaluations of a function, chosen uniformly at random from $\mathcal{G}$, on a fixed collection of $q$ points in $\mathbb{F}_2^n$, learns it with success probability $p$.

Now, observe that the expected number of 1-evaluations that $A'$ receives is $q/k$. By Markov's inequality, the probability that $A'$ receives at least $2q/(pk)$ 1-evaluations is at most $p/2$. It follows that for at least a $p/2$ fraction of the functions in $\mathcal{G}$, the algorithm $A'$ both learns the function and receives at most $2q/(pk)$ 1-evaluations. Assuming that $2q/(pk) \geq 2$, the number of possible evaluation sequences on these inputs is at most
$$\sum_{i=0}^{2q/(pk)} \binom{q}{i} \leq (pke/2)^{2q/(pk)} \leq 2^{O(q \log k/k)},$$
where for the first inequality we used the standard inequality $\sum_{i=0}^{t} \binom{q}{i} \leq (qe/t)^t$, which holds for $t \leq q$ (see, e.g., [16, Proposition 1.4]). The above is bounded from below by $|\mathcal{G}| \cdot p/2$, implying that
$$q \geq \Omega(n \cdot \min(\log k,\, n - \log k) \cdot k/\log k) \geq \Omega(k \cdot (n - \log k)),$$
where the last inequality follows by considering separately the cases $k \geq 2^{n/2}$ and $k < 2^{n/2}$. In case $2q/(pk) < 2$, the number of possible evaluation sequences is at most $2q$, and the bound follows similarly using the assumption that $p$ is a fixed constant.

4 Testing Booleanity of Fourier-Sparse Functions

In this section we prove upper and lower bounds on the query complexity of testing Booleanity of Fourier-sparse functions. For a parameter $k$, consider the problem in which, given access to a $k$-Fourier-sparse function $f \colon \mathbb{F}_2^n \to \mathbb{R}$, one has to decide whether $f$ is Boolean, i.e., whether $f(x) \in \{0,1\}$
2, the number of possible evaluation sequences is at most 2 q , and the bound followssimilarly using the assumption that p is a fixed constant. In this section we prove upper and lower bounds on the query complexity of testing Booleanityof Fourier-sparse functions. For a parameter k , consider the problem in which given access to a k -Fourier-sparse function f : F n → R one has to decide if f is Boolean, i.e., f ( x ) ∈ {
0, 1 } for every x ∈ F n , or not, with some constant success probability. As mentioned before, Gur and Tamuz proved in [14] that every k -Fourier-sparse non-Booleanfunction f on n variables satisfies f ( x ) / ∈ {
0, 1 } for at least Ω ( n / k ) inputs x ∈ F n (see Claim 2.3).Thus, querying the input function f on O ( k ) independent and random inputs suffices in order tocatch a non-Boolean value of f if such a value exists. In the following lemma it is shown that it isnot really needed to choose the O ( k ) random vectors independently . It turns out that a restrictionof a k -Fourier-sparse non-Boolean function to a random linear subspace of size O ( k ) , that is, ofdimension ≈ k , is with high probability non-Boolean. Thus, the tester could randomly picksuch a subspace and query f on all of its vectors. This decreases the amount of randomness usedin the tester of [14] from O ( nk ) to O ( n log k ) . More importantly for us, this reduces the problemof testing Booleanity of k -Fourier-sparse functions on n variables to the case of k = Θ ( n /2 ) . Lemma 4.1.
Let f : F_2^n → R be a k-Fourier-sparse non-Boolean function, and denote L = (k^2 + k + 2)/2. Then, for every δ > 0, the restriction of f to a uniformly chosen random linear subspace of dimension r ≥ log_2(L/δ) is also non-Boolean with probability at least 1 − δ.

Proof: Let f : F_2^n → R be a k-Fourier-sparse non-Boolean function. By Claim 2.3, there are at least 2^n / L vectors x ∈ F_2^n for which f(x) ∉ {0, 1}. This implies that there exists a set S of at least log_2(2^n / L) = n − log_2 L linearly independent vectors in F_2^n on which f is not Boolean. Consider a uniformly chosen random linear subspace V ⊆ F_2^n of dimension n − 1. Since the vectors of S are linearly independent, the probability that no vector in S is in V is 2^{−|S|} ≤ L / 2^n. It follows that the restriction f|_V of f to V is a k-Fourier-sparse function defined on a linear subspace of dimension n − 1, and its probability to be Boolean is at most L / 2^n. Note that one can think of the domain of f|_V as F_2^{n−1}, because V and F_2^{n−1} are isomorphic and a composition with an invertible linear transformation does not affect the Fourier-sparsity. Now, let us repeat the above process n − r times, reducing the dimension by one in every step until it reaches r. The probability that the function becomes Boolean in one of the steps is at most

L/2^n + L/2^{n−1} + · · · + L/2^{r+1} ≤ 2L/2^{r+1} = L/2^r ≤ δ,

and we are done.

We now restate and prove Theorem 1.3, which gives an upper bound of O(k·log^2 k) on the query complexity of testing Booleanity of k-Fourier-sparse functions. In the proof, we first apply Lemma 4.1 to restrict the input function to a subspace of dimension O(log k). Then, we apply Theorem 3.5 in an attempt to learn the restricted function and check if it is consistent with some k-Fourier-sparse Boolean function.
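The mechanism behind Lemma 4.1 can be simulated at toy scale. The sketch below is our illustration only; the choice of f, the dimensions, and all constants are hypothetical and not from the paper. It takes f = 1_{V_1} + 1_{V_2} for two distinct affine hyperplanes, a Fourier-sparse non-Boolean function, and estimates how often its restriction to a random linear subspace of small dimension r remains non-Boolean:

```python
import random

n = 12                    # ambient dimension (toy choice)
# f = 1_{V1} + 1_{V2} for the affine hyperplanes V1 = {x : <x,a> = 1} and
# V2 = {x : <x,c> = 1} over F_2; f has Fourier-sparsity 3 and equals 2
# (i.e., is non-Boolean) on a 1/4 fraction of the inputs.
a, c = 0b101, 0b011       # two distinct nonzero vectors (toy choice)

def dot(u, v):            # inner product over F_2
    return bin(u & v).count("1") % 2

def f(x):
    return (dot(x, a) == 1) + (dot(x, c) == 1)

def random_subspace(r):   # span of r uniform linearly independent vectors
    while True:
        span = {0}
        for _ in range(r):
            b = random.randrange(1, 2 ** n)
            span |= {s ^ b for s in span}
        if len(span) == 2 ** r:      # the chosen vectors were independent
            return span

r = 7                     # roughly log2(L / delta) for this f with delta = 0.1
trials = 300
hits = sum(any(f(x) == 2 for x in random_subspace(r)) for _ in range(trials))
print(f"restriction non-Boolean in {hits} of {trials} trials")
```

In line with the lemma, the restriction is non-Boolean in well over a 0.9 fraction of the trials, even though the subspace contains only 2^7 = 128 of the 2^12 points.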
Theorem 1.3. For every k there exists a non-adaptive one-sided error tester that, using O(k·log^2 k) queries to an input k-Fourier-sparse function f : F_2^n → R, decides if f is Boolean or not with constant success probability.
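In outline, the tester constructed in the proof restricts f to a random low-dimensional subspace and then asks whether some k-Fourier-sparse Boolean function explains the observed samples. For tiny parameters this consistency check can be carried out by brute force; the sketch below is our illustration only (the paper's tester relies on Theorem 3.5, not on enumeration), with toy values of r and k:

```python
from itertools import product

r, k = 3, 4               # toy parameters (illustrative only)

def dot(u, v):            # inner product over F_2
    return bin(u & v).count("1") % 2

def sparsity(vals):       # number of nonzero Fourier coefficients of vals
    N = 2 ** r
    return sum(
        sum(vals[x] * (-1) ** dot(S, x) for x in range(N)) != 0
        for S in range(N)
    )

def accepts(samples):     # samples: dict {x: queried value of g at x}
    # Accept iff some k-Fourier-sparse Boolean function on r variables
    # agrees with all the samples (brute force over all 2^(2^r) functions).
    return any(
        sparsity(vals) <= k and all(vals[x] == v for x, v in samples.items())
        for vals in product((0, 1), repeat=2 ** r)
    )
```

For example, samples of the 2-Fourier-sparse parity function are accepted, while any sample containing a non-Boolean value, such as accepts({0: 2}), is rejected.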
Proof: Consider the tester that, given access to an input k-Fourier-sparse function f : F_2^n → R, acts as follows:

1. Pick uniformly at random a linear subspace V of F_2^n of dimension r = min(n, ⌈log_2(10L)⌉), where L = (k^2 + k + 2)/2, and let T be an invertible linear transformation mapping F_2^r to V.

2. Query f on O(r·k log k) random vectors chosen uniformly and independently from the subspace V. Note that these queries can be seen as uniform and independent random samples from the function g : F_2^r → R defined as g = f ◦ T.

3. If there exists a k-Fourier-sparse Boolean function on r variables that agrees with the above samples of g then accept, and otherwise reject.

We turn to prove the correctness of the above tester. If f is a k-Fourier-sparse Boolean function then so is g, because a restriction to a subspace and a composition with an invertible linear transformation leave the function k-Fourier-sparse and Boolean. Hence, in this case the tester accepts with probability 1.

On the other hand, if f is a k-Fourier-sparse non-Boolean function, then by Lemma 4.1 the restriction of f to the random subspace V of dimension r picked in Item 1, as well as the function g, is non-Boolean with probability at least 0.9. Conditioned on this event, by Theorem 3.5, the probability that there exists a k-Fourier-sparse Boolean function on r variables that agrees with O(r·k log k) uniform and independent random samples from g is 2^{−Ω(r log k)}, thus the tester correctly rejects with probability at least, say, 0.9, as required. Finally, observe that the number of queries made by the tester is O(r·k log k) = O(k·log^2 k).

We turn to restate and prove our lower bound on the query complexity of testing Booleanity of k-Fourier-sparse functions.

Theorem 1.4.
Every non-adaptive one-sided error tester for Booleanity of k-Fourier-sparse functions has query complexity Ω(k·log k).

Proof:
For a given integer k, let n be the largest even integer satisfying k ≥ 3 · 2^{n/2}. Define a distribution D_no over functions in F_2^n → {0, 1, 2} as follows. Pick uniformly at random a pair (V_1, V_2) of affine subspaces satisfying dim(V_1) = dim(V_2) = n/2 and |V_1 ∩ V_2| = 1, and output the sum of indicators 1_{V_1} + 1_{V_2}. Notice that, by Claim 2.4, such a function has Fourier-sparsity at most 2 · 2^{n/2} ≤ k. Thus, a function chosen from D_no is k-Fourier-sparse and non-Boolean with probability 1 (it attains the value 2 at the unique vector of V_1 ∩ V_2).

Let T be a non-adaptive one-sided error randomized tester for Booleanity of k-Fourier-sparse functions with query complexity q and success probability at least 2/3. By Yao's minimax principle, there exists a deterministic tester T′ (obtained by fixing the random coins of T) that rejects a random function chosen from D_no with probability at least 2/3. Since T is non-adaptive and has one-sided error, it follows that T′ queries an input function on q fixed vectors a_1, . . . , a_q ∈ F_2^n, accepts every k-Fourier-sparse Boolean function, and rejects a function chosen from D_no with probability at least 2/3. We turn to prove that q > (n · 2^{n/2})/1000 = Ω(k·log k).

Assume in contradiction that q ≤ (n · 2^{n/2})/1000. Let f be a random function chosen from D_no, that is, f = 1_{V_1} + 1_{V_2} for random affine subspaces V_1 and V_2 of dimension n/2 satisfying |V_1 ∩ V_2| = 1. For i = 1, 2, let W_i be the affine span of {a_1, . . . , a_q} ∩ V_i. Let E be the event that the intersection of W_1 and W_2 is empty. We turn to prove that if the event E happens then the tester T′ accepts the function f and that the probability of this event is at least 0.9. This contradicts the success probability of T′ on functions chosen from D_no and completes the proof.

Lemma 4.2.
If the event E happens then the tester T′ accepts the function f.

Proof:
Assume that the event E happens, i.e., W_1 ∩ W_2 = ∅. Since W_1 ⊆ V_1, W_2 ⊆ V_2, and |V_1 ∩ V_2| = 1, this means that the unique vector of V_1 ∩ V_2 lies outside W_1 or outside W_2; assume without loss of generality that it does not belong to W_2. Then, there exists an affine subspace V′ of dimension n/2 − 1 such that W_2 ⊆ V′ ⊊ V_2 and V_1 ∩ V′ = ∅. Consider the function g = 1_{V_1} + 1_{V′}. By Claim 2.4, g is a Boolean function whose Fourier-sparsity is at most 3 · 2^{n/2} ≤ k, thus it is accepted by T′. However, g satisfies g(a_i) = f(a_i) for every 1 ≤ i ≤ q: a query a_i ∈ V_2 belongs to W_2 ⊆ V′, and a query a_i ∉ V_2 does not belong to V′ ⊆ V_2, so the indicators of V_2 and V′ agree on all the queries. This implies that T′ cannot distinguish between g and f, so it must accept f as well.
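The indistinguishability argument of Lemma 4.2 can be checked on a hand-built toy instance (n = 4; the concrete sets below are our hypothetical choices, not from the paper): the queries miss the intersection vector z, so the Boolean function g = 1_{V_1} + 1_{V′} looks identical to the non-Boolean f = 1_{V_1} + 1_{V_2} on every query.

```python
n = 4
V1 = {0, 1, 2, 3}           # linear subspace of dimension n/2 = 2
V2 = {1, 5, 9, 13}          # affine subspace of dimension 2 with V1 & V2 = {1}
z = 1                       # the unique vector of the intersection

def f(x):                   # of the form 1_{V1} + 1_{V2} used in D_no; f(z) == 2
    return (x in V1) + (x in V2)

queries = [5, 9, 2, 6, 14]  # a query set that misses z
W2 = {5, 9}                 # affine span of the queries lying inside V2
Vp = {5, 9}                 # affine line with W2 <= Vp < V2 and z not in Vp

def g(x):                   # the Boolean "impostor": V1 and Vp are disjoint
    return (x in V1) + (x in Vp)

print(all(g(x) == f(x) for x in queries))      # g and f agree on all queries
print(all(g(x) in (0, 1) for x in range(16)))  # g is Boolean everywhere
print(f(z))                                    # f is non-Boolean at z
```

The three printed values are True, True, and 2: a deterministic tester that accepts g must accept f as well.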
Lemma 4.3. The probability of the event E is at least 0.9.

Proof: Denote by X the number of vectors in {a_1, . . . , a_q} ∩ V_1. Since V_1 is distributed uniformly over all affine subspaces of dimension n/2, the probability that a_i belongs to V_1 is 2^{−n/2} for every 1 ≤ i ≤ q. Thus, by linearity of expectation,

E[X] = q · 2^{−n/2} ≤ ((n · 2^{n/2})/1000) · 2^{−n/2} = n/1000.

By Markov's inequality, it follows that

Pr[dim(W_1) ≥ n/10] ≤ Pr[X ≥ n/10] ≤ (n/1000)/(n/10) = 1/100.

Now, fix a choice of V_1 for which dim(W_1) < n/10, and consider the randomness over the choice of V_2. Notice that, conditioned on V_1, the subspace V_2 is distributed uniformly over all the affine subspaces of dimension n/2 which contain exactly one vector from V_1. By symmetry, every vector of V_1 has probability |V_1|^{−1} = 2^{−n/2} to belong to V_2. Thus, the probability that the vector that belongs to both V_1 and V_2 is in W_1 is at most

|W_1| · 2^{−n/2} < 2^{n/10} · 2^{−n/2} = 2^{−2n/5}.

Finally, the probability that W_1 ∩ W_2 = ∅ is at least the probability that W_1 ∩ V_2 = ∅, because W_2 ⊆ V_2, and the latter is at least 1 − (1/100 + 2^{−2n/5}) ≥ 0.9, and we are done.
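The arithmetic in the proof of Lemma 4.3 is easy to sanity-check numerically; the value of n below is an illustrative choice (the bounds hold for every large enough even n):

```python
n = 40                                 # illustrative even value of n
q_max = n * 2 ** (n // 2) / 1000       # the assumed upper bound on q
EX = q_max * 2 ** (-(n // 2))          # E[X] = q * 2^{-n/2} <= n / 1000
markov = EX / (n / 10)                 # Pr[X >= n/10] <= E[X] / (n / 10)
hit = 2 ** (n / 10) * 2 ** (-n / 2)    # |W1| * 2^{-n/2} < 2^{n/10 - n/2}
assert EX == n / 1000
assert markov == 1 / 100
assert hit == 2 ** (-2 * n / 5)
assert 1 - (markov + hit) >= 0.9       # Pr[W1 & V2 is empty] >= 0.9
print("bounds check out for n =", n)
```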
Acknowledgments

We thank Adi Akavia, Shachar Lovett, and Eric Price for useful discussions and comments.
References

[1] N. Alon, T. Kaufman, M. Krivelevich, S. Litsyn, and D. Ron. Testing Reed-Muller codes. IEEE Trans. on Information Theory, 51(11):4032–4039, 2005. Preliminary version in RANDOM'03.
[2] A. Andoni, R. Panigrahy, G. Valiant, and L. Zhang. Learning sparse polynomial functions. In SODA, pages 500–510, 2014.
[3] K. D. Ba, P. Indyk, E. Price, and D. P. Woodruff. Lower bounds for sparse recovery. In SODA, pages 1190–1197, 2010.
[4] A. Bernasconi and B. Codenotti. Spectral analysis of Boolean functions as a graph eigenvalue problem. IEEE Trans. on Computers, 48(3):345–351, 1999.
[5] A. Bhattacharyya. Guest column: On testing affine-invariant properties over finite fields. SIGACT News, 44(4):53–72, 2013.
[6] A. Bhattacharyya, S. Kopparty, G. Schoenebeck, M. Sudan, and D. Zuckerman. Optimal testing of Reed-Muller codes. In FOCS, pages 488–497, 2010.
[7] A. Blum. Learning a function of r relevant variables. In COLT, pages 731–733, 2003.
[8] J. Bourgain. An improved estimate in the restricted isometry problem. In Geometric Aspects of Functional Analysis, volume 2116 of Lecture Notes in Mathematics, pages 65–70. Springer, 2014.
[9] J. Bruck and R. Smolensky. Polynomial threshold functions, AC^0 functions, and spectral norms. SIAM J. Comput., 21(1):33–42, 1992. Preliminary version in FOCS'90.
[10] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. on Information Theory, 52(12):5406–5425, 2006.
[11] M. Cheraghchi, V. Guruswami, and A. Velingker. Restricted isometry of Fourier matrices and list decodability of random linear codes. SIAM J. Comput., 42(5):1888–1914, 2013. Preliminary version in SODA'13.
[12] O. Goldreich and L. A. Levin. A hard-core predicate for all one-way functions. In STOC, pages 25–32, 1989.
[13] P. Gopalan, R. O'Donnell, R. A. Servedio, A. Shpilka, and K. Wimmer. Testing Fourier dimensionality and sparsity. SIAM J. Comput., 40(4):1075–1100, 2011. Preliminary version in ICALP'09.
[14] T. Gur and O. Tamuz. Testing Booleanity and the uncertainty principle. Chicago J. Theor. Comput. Sci., 2013, 2013.
[15] I. Haviv and O. Regev. The restricted isometry property of subsampled Fourier matrices. 2015. Manuscript.
[16] S. Jukna. Extremal Combinatorics: With Applications in Computer Science. Texts in Theoretical Computer Science. Springer-Verlag, second edition, 2011.
[17] T. Kaufman, S. Lovett, and E. Porat. Weight distribution and list-decoding size of Reed-Muller codes. IEEE Trans. on Information Theory, 58(5):2689–2696, 2012. Preliminary version in ICS'10.
[18] M. Kocaoglu, K. Shanmugam, A. G. Dimakis, and A. R. Klivans. Sparse polynomial learning and graph sketching. In NIPS, pages 3122–3130, 2014.
[19] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM J. Comput., 22(6):1331–1348, 1993. Preliminary version in STOC'91.
[20] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learnability. J. ACM, 40(3):607–620, 1993. Preliminary version in FOCS'89.
[21] E. Mossel, R. O'Donnell, and R. A. Servedio. Learning functions of k relevant variables. J. Comput. Syst. Sci., 69(3):421–434, 2004. Preliminary version in STOC'03.
[22] R. O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
[23] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Comm. Pure Appl. Math., 61(8):1025–1045, 2008.
[24] M. Sudan. Invariance in property testing. In Property Testing - Current Research and Surveys, volume 6390 of Lecture Notes in Computer Science, pages 211–227. Springer, 2010.
[25] G. Valiant. Finding correlations in subquadratic time, with applications to learning parities and juntas. In