A tighter bound on the number of relevant variables in a bounded degree Boolean function
Jake Wellens

March 22, 2019
Abstract
A classical theorem of Nisan and Szegedy says that a Boolean function with degree d as a real polynomial depends on at most d·2^{d-1} of its variables. In recent work by Chiarelli, Hatami and Saks, this upper bound was improved to C·2^d, where C = 6.614. We improve this constant to C = 4.416.

Given a Boolean function f : {0,1}^n → {0,1}, there is a unique multilinear polynomial in R[x_1, ..., x_n] which agrees with f on every input in {0,1}^n. One important feature of this polynomial is its degree, denoted deg(f), which is known to be polynomially related to many other complexity measures, such as block sensitivity bs(f), certificate complexity C(f), decision tree depth D(f), and approximate degree (see [4] and [1]). One can also bound the number of relevant variables of f (i.e. the variables which actually show up in a term with non-zero coefficient in the polynomial for f, also called the junta size of f) entirely in terms of the degree:

Theorem 1 (Nisan-Szegedy [4]). A function f : {0,1}^n → {0,1} with degree d has at most d·2^{d-1} relevant variables.

The idea of Nisan and Szegedy's original proof is to lower bound the influence of a relevant variable: a polynomial of degree d has total influence at most d, and yet the derivative in the direction of a relevant coordinate is a non-zero polynomial of degree at most d - 1, and hence is non-zero on at least a 2^{-(d-1)} fraction of inputs. In other words, each relevant coordinate has influence at least 1/2^{d-1}, so there can be at most d·2^{d-1} of them. Theorem 1 has the correct exponential dependence on d: indeed, consider the function f given by the complete binary decision tree of depth d which queries a distinct coordinate at each vertex. This function has degree d and 2^d - 1 relevant variables. Still, no improvement on the bound O(d·2^d) was known until a recent paper by Chiarelli, Hatami and Saks showed that O(2^d) suffices. In the same paper, the authors also give an improved lower bound construction, namely, for each d, a degree d function with 3·2^{d-1} - 2 relevant variables.

Theorem 2 (Chiarelli, Hatami, Saks [2]). A function f : {0,1}^n → {0,1} with degree d has at most (6.614)·2^d relevant variables.

The main idea in [2] is to replace influence by a different measure, one which behaves more stably under restrictions of variables. Specifically, they define

W(f) = Σ_{i ∈ R(f)} 2^{-deg_i(f)},

where R(f) is the set of relevant variables for f, and deg_i(f) is the degree of f in x_i. It is straightforward to check that W(f) does not decrease by more than |H|·2^{-d} in expectation when randomly restricting a set H of coordinates with deg_i(f) = deg(f) = d. If H is chosen well, these contributions are summable, and hence W(f) is bounded above by some universal constant. Since |R(f)| ≤ 2^{deg(f)}·W(f), this implies Theorem 2.

The heart of the proof is therefore in choosing the set H, which must be small enough that W(f) does not incur a heavy loss, and yet significant enough that the restricted functions are of reduced complexity. The idea used in [2] (originating in unpublished work of Nisan and Smolensky, see [1]) is to build H from a maximal collection of disjoint monomials of full degree. The number of such disjoint monomials is limited by the block sensitivity bs(f), which is always at most d^2, and by maximality, all of the resulting restricted functions have degree ≤ d - 1.

The above idea certainly does the trick, but there are two somewhat substantial sources of slack in the analysis. One is the global use of the worst-case bound bs(f) ≤ d^2, which can be improved for any fixed d with a finite computation. The other is that restricting a large disjoint collection of degree d monomials actually causes a large drop in block sensitivity, which can be exploited. By leveraging both of these ideas, we are able to improve the constant 6.614 by about 33%; our main result is Theorem 3 below.
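The decision-tree example above is easy to verify by direct computation. The following Python sketch is an illustration of ours, not part of the paper (the helper names `mobius_coeffs`, `degree`, `relevant` and `tree` are our own): it recovers the unique multilinear polynomial of a Boolean function by Möbius inversion over the subset lattice, and checks that the depth-d complete binary decision tree has degree d and 2^d - 1 relevant variables for small d.

```python
from itertools import combinations

def mobius_coeffs(f, n):
    """Coefficients c(S) of the unique multilinear polynomial
    f(x) = sum_S c(S) * prod_{i in S} x_i, via Mobius inversion:
    c(S) = sum_{T subseteq S} (-1)^{|S|-|T|} f(1_T)."""
    coeffs = {}
    for r in range(n + 1):
        for S in combinations(range(n), r):
            c = 0
            for k in range(r + 1):
                for T in combinations(S, k):
                    x = [0] * n
                    for i in T:
                        x[i] = 1
                    c += (-1) ** (r - k) * f(tuple(x))
            if c != 0:
                coeffs[S] = c
    return coeffs

def degree(coeffs):
    return max((len(S) for S in coeffs), default=0)

def relevant(coeffs):
    R = set()
    for S in coeffs:
        R.update(S)
    return R

def tree(d):
    """Complete binary decision tree of depth d, querying a distinct
    variable at each of its 2^d - 1 vertices (heap-indexed: children
    of vertex v are 2v+1 and 2v+2); the two leaves below each deepest
    query are the constants 0 and 1."""
    n = 2 ** d - 1
    def f(x):
        v = 0
        for _ in range(d - 1):
            v = 2 * v + 1 + x[v]
        return x[v]
    return f, n

for d in range(1, 4):
    f, n = tree(d)
    c = mobius_coeffs(f, n)
    assert degree(c) == d and len(relevant(c)) == 2 ** d - 1
```

The same helpers can be used to inspect deg_i(f) and the junta size of any small Boolean function given by a truth table.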
Theorem 3 (Main result). A function f : {0,1}^n → {0,1} with degree d has at most (4.416)·2^d relevant variables.

Restrictions:
For a function f : {0,1}^n → {0,1}, a set H ⊂ [n], and an assignment α : H → {0,1}, we denote by f_α the restricted function obtained by setting the variable x_h to α(h) for each h ∈ H. We will sometimes write f(α_H, x) for f_α(x) if we want to be explicit about the set of coordinates which have received the assignment α.

Influence:
The influence of coordinate i on f, denoted Inf_i[f], is the probability that, for a uniformly random input x, flipping the i-th bit of x causes the value of f(x) to flip. The total influence Inf[f] = Σ_{i ∈ [n]} Inf_i[f] can also be expressed in terms of the Fourier coefficients of f, namely

Inf[f] = Σ_{S ⊆ [n]} |S|·f̂(S)^2.

Since the degree of f remains unchanged whether f is expressed as a multilinear polynomial over {0,1}^n (as we consider in this paper) or over {1,-1}^n (as in the Fourier expansion), the above formula makes it clear that a Boolean function of degree d has Inf[f] ≤ d. As mentioned in the introduction, the following useful fact is from [4], and can be proved by induction:

Inf_i[f] ≥ 2^{1-deg_i(f)}.   (1)

Block sensitivity:
For a set B ⊂ [n] and a string x ∈ {0,1}^n, we denote by x^B the string obtained from x by flipping all the bits x_b for b ∈ B. Recall that the block sensitivity of f at an input x (denoted bs_x(f)) is the maximum number b of disjoint blocks B_1, ..., B_b ⊂ [n] such that f(x) ≠ f(x^{B_i}) for all 1 ≤ i ≤ b, and the block sensitivity of f (denoted bs(f)) is the maximum of bs_x(f) over all inputs x. It is well-known that block sensitivity and degree are polynomially related:

deg(f)^{1/2} ≤ bs(f) ≤ deg(f)^2,   (2)

although neither bound is known to be sharp. The best known constructions have bs(f) = Θ(deg(f)^{1/2}) and bs(f) = Θ(deg(f)^{log_3(6)}) = Θ(deg(f)^{1.63...}) respectively. See [1] and [3] for details and for relationships to many other complexity measures.

The measure W(f): Recall that

W(f) := Σ_{i ∈ R(f)} 2^{-deg_i(f)},

where R(f) is the set of relevant coordinates (i.e. coordinates i for which Inf_i[f] > 0) and deg_i(f) is the degree of the largest degree monomial appearing in f (with non-zero coefficient) that contains x_i. The behavior of W under restrictions boils down to the following inequality, whose simple proof we reproduce below for completeness.

Fact 4 ([2]). For any relevant coordinates i ≠ j, let f_0 and f_1 be the restrictions obtained from f by setting x_j to 0 and 1 respectively. Then

2^{-deg_i(f)} ≤ 2^{-deg_i(f_0) - 1} + 2^{-deg_i(f_1) - 1}.   (3)

Proof.
Write f = x_j·f_1 + (1 - x_j)·f_0 = x_j·(f_1 - f_0) + f_0, from which it is clear that deg_i(f) ≥ deg_i(f_0). If deg_i(f) ≥ 1 + deg_i(f_1), then the inequality is true independently of deg_i(f_0). Otherwise, it must be that deg_i(f) = deg_i(f_1), in which case the leading degree monomials for x_i must cancel in f_1 - f_0. But this implies deg_i(f_0) = deg_i(f_1) = deg_i(f), and so the inequality becomes an equality in this case.

By summing (3) over i ∈ R(f) and iterating over restrictions of more variables, one obtains

W(f) ≤ |H|·2^{-d} + (1/2^{|H|})·Σ_{α : H → {0,1}} W(f_α)   (4)

for any set H ⊂ [n] with deg_i(f) = d for all i ∈ H. As in [2], we define

W_d := max_{deg(f) = d} W(f).

If H is the set of variables appearing in a maximal collection of disjoint degree d monomials of f, then each f_α has degree at most d - 1. An unpublished argument of Nisan and Smolensky (which we essentially use in the proof of Lemma 6 below) implies that |H| ≤ deg(f)·bs(f) ≤ d^3, and so (4) yields the recursive inequality

W_d ≤ d^3·2^{-d} + W_{d-1}.

This is already summable, but for small d the bound W(f) ≤ Inf[f]/2 ≤ d/2 is better, and optimizing over the choice of the two bounds yields W_d ≤ 6.614 for all d.

Our first new idea is simply to keep track of block sensitivity through the restriction process: the main observation is Proposition 5 below, which says that if f has ℓ disjoint monomials of maximum degree, then by assigning any values to the variables in these monomials, the block sensitivity of the restricted function decreases by ℓ. So, if we have to restrict many variables in order to drop the degree of f (i.e. to hit all the maximum degree monomials), then we must "spend" our limited supply of block sensitivity, and in the future it will become much easier to lower the degree again.

Proposition 5. If f : {0,1}^n → {0,1} has ℓ disjoint monomials M_1, ..., M_ℓ, each of degree d = deg(f), then for any assignment α : ∪M_i → {0,1}, the restricted function f_α has

bs(f_α) ≤ bs(f) - ℓ.

In particular, ℓ ≤ bs(f).

Proof. Let M = ∪_{i=1}^ℓ M_i and b = bs(f_α) = bs_y(f_α), for some y ∈ {0,1}^{[n] \ M}. Then there are b disjoint blocks B_1, ..., B_b ⊂ [n] \ M with f(α_M, y) ≠ f(α_M, y^{B_j}) for each j. Since each M_i is a maximum degree monomial in f, each of the functions {0,1}^{M_i} ∋ x ↦ f(x, z) is non-constant, for any z. Therefore, for each i, there is a block C_i ⊂ M_i with f(α_M^{C_i}, y) ≠ f(α_M, y), where α_M^{C_i} denotes α_M with the bits in C_i flipped. Therefore {C_1, ..., C_ℓ, B_1, ..., B_b} is a collection of disjoint sensitive blocks for f at the input (α_M, y) ∈ {0,1}^n, and so bs(f) ≥ b + ℓ.

To keep track of W, degree and block sensitivity simultaneously, we define

W(b,d) := max_{f with bs(f) ≤ b and deg(f) = d} W(f).

Note that W(0,d) = 0, W(b,0) = 0, and W(b,d) ≤ W_d for any b. By (2), we have W(d^2, d) = W_d, and since no degree d function has block sensitivity exceeding d^2, we adopt the convention that W(b,d) = W_d for b > d^2.

Lemma 6. For each b, d with b ≤ d^2, we have

W(b,d) ≤ max_{(ℓ,k) ∈ {1,...,b} × {1,...,d}} ( ℓ·d·2^{-d} + W(b - ℓ, d - k) ).

Proof.
Suppose f has degree d and bs(f) ≤ b. Let M_1, ..., M_ℓ be a maximal collection of disjoint degree d monomials in f, and let H = ∪_i M_i. By inequality (4),

W(f) ≤ |H|·2^{-d} + E_{α : H → {0,1}}[W(f_α)],

where |H|·2^{-d} = ℓ·d·2^{-d}, since each M_i contains exactly d variables. Because the collection {M_1, ..., M_ℓ} is maximal, H hits every degree d monomial, and hence each f_α has degree d_α ≤ d - 1. By Proposition 5, each f_α has bs(f_α) ≤ b - ℓ. Since W(·, d) is monotone (for feasible inputs), it follows that for each α, W(f_α) ≤ W(b - ℓ, d - k′), where k′ = argmax_{k ∈ {1,...,d}} W(b - ℓ, d - k). Taking the maximum over all possible values of ℓ ∈ {1, ..., b} yields the desired bound.

Since W_d is bounded and increasing, the limit W* := lim_{d→∞} W_d exists. Since W_d = W(d^2, d), the following corollary comes easily from Lemma 6.

Corollary 7.
For any d,

W* ≤ W(d^2, d) + Σ_{r=d+1}^∞ r^3·2^{-r}.

Lemma 6 yields explicit bounds on W(b,d) for any finite (b,d), which in turn yields an explicit bound on W* via Corollary 7. For small values of d, we use the bound

W(b,d) ≤ W_d ≤ max_{deg(f)=d} Σ_{i ∈ R(f)} Inf_i[f]/2 = d/2.

Carrying this computation out to d = 50 yields an explicit numerical bound on W_50, which implies the same bound (to around 10 decimal digits) on W*.

To further reduce our estimate of W*, we focus on functions of low degree, which clearly have the most influence on the bounds. Specifically, we produce sharper upper bounds on the block sensitivity of such functions, by solving a small set of linear programs. We begin with a simple reduction to linear program feasibility, using ideas from the original proof of bs(f) ≤ 2·deg(f)^2 from [4].

(Footnote: It is shown in [2] that W_d ≥ 2^{-d} + W_{d-1}, and in fact their lower bound construction can be turned into a proof that W_d ≥ 2·2^{-d} + W_{d-1}.)

(Footnote: The "2" in this bound can be removed by using repeated function composition (or tensorization), as shown in [6].)

Table 1: LP bounds on block sensitivity for low degree functions (deg(f) = 1, ..., 14).
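The recursive bound of Lemma 6, combined with the tail estimate of Corollary 7, is straightforward to prototype. The sketch below is our own floating-point illustration, not the paper's computation: it uses only the generic caps bs(f) ≤ deg(f)^2 and W_d ≤ d/2 (neither the LP bounds of Table 1 nor exact rational arithmetic), so it reproduces the shape of the calculation rather than the final constant 4.416. The names `W_bound`, `tail` and `W_star_bound` are our own.

```python
from functools import lru_cache

def tail(d, terms=200):
    """Truncation of sum_{r = d+1}^infty r^3 * 2^(-r), the tail in
    Corollary 7; the series converges geometrically, so 200 terms is ample."""
    return sum(r ** 3 * 2.0 ** (-r) for r in range(d + 1, d + 1 + terms))

@lru_cache(maxsize=None)
def W_bound(b, d):
    """Upper bound on W(b, d) via the recursion of Lemma 6, with two
    generic caps: bs(f) <= deg(f)^2, and W_d <= d/2 (via Inf[f] <= d)."""
    b = min(b, d * d)            # no degree-d function has bs(f) > d^2
    if b <= 0 or d <= 0:
        return 0.0               # W(0, d) = W(b, 0) = 0
    rec = max(l * d * 2.0 ** (-d) + W_bound(b - l, d - k)
              for l in range(1, b + 1) for k in range(1, d + 1))
    return min(rec, d / 2)

def W_star_bound(d):
    """Corollary 7: W* <= W(d^2, d) + sum_{r > d} r^3 * 2^(-r)."""
    return W_bound(d * d, d) + tail(d)
```

Capping b at the LP values b(d) of Table 1, rather than at d^2, is what tightens this kind of computation toward the bound reported in Theorem 3.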
Fact 8.
If there exists a function f : {0,1}^n → {0,1} of degree d with block sensitivity b, then there exists another function g : {0,1}^b → {0,1} of degree ≤ d with g(0) = 0 and g(w) = 1 for each vector w of Hamming weight 1.

Proof. If f(x) attains maximal block sensitivity at z, then f(x ⊕ z) attains maximal block sensitivity at 0, so without loss of generality we may assume z = 0, and possibly replacing f by 1 - f we may also assume that f(0) = 0. If B_1, ..., B_b are sensitive blocks for f at 0, then define

g(y_1, ..., y_b) = f(y_1, ..., y_1, ..., y_b, ..., y_b),

where each coordinate in the block B_j receives the value y_j (and the coordinates outside ∪B_j are set to 0), so that for each coordinate vector e_i, g(e_i) = f(0^{B_i}) = 1.

For any d ≥ 1, define the moment map m_d : R → R^d by m_d(t) = (t, t^2, ..., t^d).

Proposition 9.
If there exists a degree d function f : {0,1}^n → {0,1} with block sensitivity b, then there exists τ ∈ {0,1} such that the following set of linear inequalities has a solution p ∈ R^d:

⟨p, m_d(1)⟩ = 1
0 ≤ ⟨p, m_d(k)⟩ ≤ 1 for each k ∈ {2, ..., b - 1}   (5)
⟨p, m_d(b)⟩ = τ

Proof.
If such an f exists, then let q(x_1, ..., x_b) = (1/b!)·Σ_{σ ∈ S_b} g(x_{σ(1)}, ..., x_{σ(b)}), where g comes from Fact 8, and set τ = g(1, 1, ..., 1). Since q is a symmetric multilinear polynomial of degree at most d, there is a univariate polynomial p : R → R of degree at most d such that for any x ∈ {0,1}^b, q(x_1, ..., x_b) = p(x_1 + ... + x_b). For each k ∈ {0, ..., b}, p(k) is therefore the average value of g on Boolean vectors of Hamming weight k, so in particular p(k) ∈ [0,1]. Moreover p(0) = g(0) = 0, p(b) = g(1, ..., 1) = τ, and p(1) = (1/b)·Σ_i g(e_i) = 1; hence the coefficients of p (which has no constant term, as p(0) = 0) provide a solution to the set of linear inequalities.

Using the simplex method with exact (rational) arithmetic in Maple, we compute the largest b = b(d) for which the LP (5) is feasible for 1 ≤ d ≤ 14, which yields upper bounds on block sensitivity for low degree Boolean functions. These bounds are summarized in Table 1. Since no degree d function has block sensitivity exceeding b(d), we may set W(b,d) = W(b(d), d) for b > b(d); recomputing the recursive bounds from Lemma 6 with these values for each d = 1, ..., 14, we obtain W* ≤ 4.416.

Remark:
The largest known separation between block sensitivity and degree is exhibited by a function on 6 variables with degree 3. By a tensorization lemma in [6], any degree d function f with block sensitivity b yields an infinite family of Boolean functions f_k with deg(f_k) = d^k and bs(f_k) ≥ b^k. Hence, if an entry (d, b(d)) in Table 1 is tight for some d ≥ 4, then by Fact 8 there is a function on b(d) variables exhibiting a larger-than-currently-known separation between degree and block sensitivity. If bs(f) = deg(f)^{log_3(6)} is in fact the optimal separation, then our techniques would yield a further improvement in the bound on W*.

Table 2: Bounds on W(f) for low degrees (deg(f) = 1, ..., 14), obtained using Lemma 6 and Table 1.

While our methods are unlikely to produce the optimal W*, they do suggest a few interesting questions.

• The proof seems to suggest that block sensitivity limits junta size, and for small d, the values of W(b,d) are much lower when b ≪ d^2 than when b ∼ d^2. A classical result of Simon [5] says that |R(f)| ≤ s(f)·4^{s(f)}. Interestingly, we can obtain a proof of a weaker version of Simon's theorem using only the techniques in this paper and in [2]. The idea is to define an analogue of W for sensitivity instead of degree:

S(f) := Σ_{i ∈ R(f)} 2^{-s_i(f)}, where s_i(f) := max_{x : f(x^i) ≠ f(x)} (s_x(f) + s_{x^i}(f)).

Claim: Fact 4 holds with deg_i(f) replaced by s_i(f).

Proof: Since s_i(f) ≥ max{s_i(f_0), s_i(f_1)} is clear from the definitions, we can assume that f_0 (say) does not depend on x_i, in which case it suffices to show that s_i(f) ≥ s_i(f_1) + 1. Without loss of generality suppose j = 1, and let y be such that f(1, y) = 1 ≠ f(1, y^i) and s_i(f_1) = s_y(f_1) + s_{y^i}(f_1). First suppose f_0(y) = 0. Since f(1, y) = 1, this means f is also sensitive to j at the input (1, y), and so s_i(f) ≥ s_i(f_1) + 1. If f_0(y) = 1, then f_0(y^i) is also 1, because f_0 does not depend on x_i. But then f is sensitive to j at the input (1, y^i), and so either way s_i(f) ≥ s_i(f_1) + 1, which implies the claim.

From here we arrive at the analogue of (4), and we can proceed in a number of ways. As in the proof of Lemma 6, we can restrict maximum degree monomials until we run out of block sensitivity, yielding |R(f)| ≤ deg(f)·bs(f)·4^{s(f)}. In any case, it seems reasonable to conjecture a Nisan-Szegedy theorem for block sensitivity, namely that any Boolean function f is a poly(bs(f))·2^{bs(f)}-junta.

• More generally, it would be interesting to characterize certain ternary relationships between complexity measures. Many of the examples we know which achieve optimal or best-known separations between two measures tend to have the property that a third measure is equal or very close to one of the other two. (For example, the best known gap of the form bs(f) ≪ deg(f) is attained by a Tribes function on n variables with deg(f) = n and bs(f) = s(f) = C(f) = √n.) Moreover, these examples almost always have |R(f)| = poly(deg(f)). Meanwhile, the known examples of functions with nearly-optimal junta size do not exhibit any super-constant separation between the measures deg(f), bs(f), s(f) and D(f). (It is possible to hybridize small, well-separated functions with large juntas, but the separations and the junta size both suffer some loss.)

• Finally, Table 1 suggests that bs(f) ≤ c·deg(f)^2 for c ≈ 0.59. If you enjoyed reading this paper, perhaps you would enjoy trying to compute the optimal value of c. One consequence of showing that bs(f) ≤ c·deg(f)^2 for some c < 1 is that any separation between bs and degree proven by simply tensorizing a single example would necessarily (in the absence of more sophisticated arguments) look like bs(f) ≥ deg(f)^{2-ǫ}, for some ǫ > 0.

References

[1] Harry Buhrman and Ronald de Wolf,
Complexity measures and decision tree complexity: a survey, Theoretical Computer Science 288 (2002), no. 1, 21–43.

[2] John Chiarelli, Pooya Hatami, and Michael Saks, An asymptotically tight bound on the number of relevant variables in a bounded degree Boolean function, CoRR abs/1801.08564 (2018).

[3] Pooya Hatami, Raghav Kulkarni, and Denis Pankratov, Variations on the sensitivity conjecture, Theory of Computing Library, Graduate Surveys 4 (2011), 1–27.

[4] Noam Nisan and Mario Szegedy, On the degree of Boolean functions as real polynomials, Computational Complexity 4 (1994), no. 4, 301–313.

[5] Hans-Ulrich Simon,
A tight