A composition theorem for the Fourier Entropy-Influence conjecture
Ryan O'Donnell (Carnegie Mellon University) [email protected] ∗
Li-Yang Tan (Columbia University) [email protected] †

April 24, 2018
Abstract
The Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai [FK96] seeks to relate two fundamental measures of Boolean function complexity: it states that $H[f] \le C \cdot Inf[f]$ holds for every Boolean function $f$, where $H[f]$ denotes the spectral entropy of $f$, $Inf[f]$ is its total influence, and $C > 0$ is a universal constant. Our main result is a composition theorem for the FEI conjecture: if $g_1, \ldots, g_k$ are functions over disjoint sets of variables satisfying the conjecture, and if the Fourier transform of $F$ taken with respect to the product distribution with biases $E[g_1], \ldots, E[g_k]$ satisfies the conjecture, then their composition $F(g_1(x^1), \ldots, g_k(x^k))$ satisfies the conjecture. As an application we show that the FEI conjecture holds for read-once formulas over arbitrary gates of bounded arity, extending a recent result [OWZ11] which proved it for read-once decision trees. Our techniques also yield an explicit function with the largest known ratio of $C \ge 6.278$ between $H[f]$ and $Inf[f]$, improving on the previous lower bound of $4.615$.

1 Introduction

A longstanding and important open problem in the field of Analysis of Boolean Functions is the Fourier Entropy-Influence conjecture made by Ehud Friedgut and Gil Kalai in 1996 [FK96, Kal07]. The conjecture seeks to relate two fundamental analytic measures of Boolean function complexity, the spectral entropy and total influence:
Fourier Entropy-Influence (FEI) Conjecture.
There exists a universal constant
$C > 0$ such that for every Boolean function $f : \{-1,1\}^n \to \{-1,1\}$, it holds that $H[f] \le C \cdot Inf[f]$. That is,
$$\sum_{S \subseteq [n]} \hat{f}(S)^2 \log\frac{1}{\hat{f}(S)^2} \;\le\; C \sum_{S \subseteq [n]} |S| \cdot \hat{f}(S)^2.$$

Applying Parseval's identity to a Boolean function $f$ we get $\sum_{S \subseteq [n]} \hat{f}(S)^2 = E[f(x)^2] = 1$, and so the squared Fourier coefficients of $f$ induce a probability distribution $\mathcal{S}_f$ over the $2^n$ subsets of $[n]$, in which $S \subseteq [n]$ has "weight" (probability mass) $\hat{f}(S)^2$. The spectral entropy of $f$, denoted $H[f]$, is the Shannon entropy of $\mathcal{S}_f$, quantifying how spread out the Fourier weight of $f$ is across all $2^n$ monomials. The influence of a coordinate $i \in [n]$ on $f$ is $Inf_i[f] = \Pr[f(x) \ne f(x^{\oplus i})]$, where $x^{\oplus i}$ denotes $x$ with its $i$-th bit flipped, and the total influence of $f$ is simply $Inf[f] = \sum_{i=1}^n Inf_i[f]$. Straightforward Fourier-analytic calculations show that this combinatorial definition is equivalent to the quantity $E_{S \sim \mathcal{S}_f}[|S|] = \sum_{S \subseteq [n]} |S| \cdot \hat{f}(S)^2$, and so total influence measures the degree distribution of the monomials of $f$, weighted by the squared magnitudes of its coefficients. Roughly speaking then, the FEI conjecture states that a Boolean function whose Fourier weight is well "spread out" (i.e., has high spectral entropy) must have a significant portion of its Fourier weight lying on high-degree monomials (i.e., have high total influence).

∗ Supported by NSF grants CCF-0747250 and CCF-1116594, and a Sloan fellowship. This material is based upon work supported by the National Science Foundation under the grant numbers listed above. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).
† Research done while visiting CMU.
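For intuition, both quantities can be computed by brute force for small functions. The following Python sketch is an illustration only (it is not part of the paper's argument); the test functions, the 3-bit majority and the real-valued linear function $\frac{1}{\sqrt{n}}\sum_i x_i$, are arbitrary choices:

```python
from itertools import product
from math import log2, sqrt

def fourier_coefficients(f, n):
    """Uniform-distribution Fourier coefficients hat_f(S) = E[f(x) chi_S(x)],
    with subsets S of [n] encoded as bitmasks."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for mask in range(2 ** n):
        total = 0.0
        for x in points:
            chi = 1
            for i in range(n):
                if mask >> i & 1:
                    chi *= x[i]
            total += f(x) * chi
        coeffs[mask] = total / 2 ** n
    return coeffs

def spectral_entropy(coeffs):
    # H[f] = sum_S hat_f(S)^2 log(1 / hat_f(S)^2)
    return sum(c * c * log2(1 / (c * c)) for c in coeffs.values() if c * c > 1e-12)

def total_influence(coeffs):
    # Inf[f] = sum_S |S| * hat_f(S)^2
    return sum(bin(mask).count("1") * c * c for mask, c in coeffs.items())

# Boolean example: the 3-bit majority has H = 2 and Inf = 3/2.
maj3 = lambda x: 1 if sum(x) > 0 else -1
coeffs = fourier_coefficients(maj3, 3)
print(spectral_entropy(coeffs), total_influence(coeffs))  # 2.0 1.5

# A real-valued f with Parseval mass 1 can separate the two measures:
# (1/sqrt(n)) * sum_i x_i keeps Inf = 1 while H grows as log2(n).
for n in (2, 4, 8):
    c = fourier_coefficients(lambda x: sum(x) / sqrt(n), n)
    print(n, round(spectral_entropy(c), 6), round(total_influence(c), 6))
```

This also illustrates why the conjecture is restricted to Boolean-valued functions: the linear example keeps total influence 1 while its spectral entropy is $\log_2 n$.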
In addition to being a natural question concerning the Fourier spectrum of Boolean functions, the FEI conjecture also has important connections to several areas of theoretical computer science and mathematics. Friedgut and Kalai's original motivation was to understand general conditions under which monotone graph properties exhibit sharp thresholds, and the FEI conjecture captures the intuition that having significant symmetry, hence high spectral entropy, is one such condition. Besides its applications in the study of random graphs, the FEI conjecture is known to imply the celebrated Kahn-Kalai-Linial theorem [KKL88]:
KKL Theorem.
For every Boolean function $f$ there exists an $i \in [n]$ such that $Inf_i[f] \ge Var[f] \cdot \Omega\big(\frac{\log n}{n}\big)$.

The FEI conjecture also implies Mansour's conjecture [Man94]:
Mansour’s Conjecture.
Let $f$ be a Boolean function computed by a $t$-term DNF formula. For any constant $\varepsilon > 0$ there exists a collection $\mathcal{S}$ of poly$(t)$ subsets of $[n]$ such that $\sum_{S \in \mathcal{S}} \hat{f}(S)^2 \ge 1 - \varepsilon$.

Combined with recent work of Gopalan et al. [GKK08a], Mansour's conjecture yields an efficient algorithm for agnostically learning the class of poly$(n)$-term DNF formulas from queries. This would resolve a central open problem in computational learning theory [GKK08b]. De et al. also noted that sufficiently strong versions of Mansour's conjecture would yield improved pseudorandom generators for depth-2 $AC^0$ circuits [DETT10]. More generally, the FEI conjecture implies the existence of sparse $L_2$-approximators for Boolean functions with small total influence:

Sparse $L_2$-approximators. Assume the FEI conjecture holds. Then for every Boolean function $f$ there exists a $2^{O(Inf[f]/\varepsilon)}$-sparse polynomial $p : \mathbb{R}^n \to \mathbb{R}$ such that $E[(f(x) - p(x))^2] \le \varepsilon$.

By Friedgut's junta theorem [Fri98], the above holds unconditionally with a weaker bound of $2^{O(Inf[f]^2/\varepsilon^2)}$. This is the main technical ingredient underlying several of the best known uniform-distribution learning algorithms [Ser04, OS08]. For more on the FEI conjecture we refer the reader to Kalai's blog post [Kal07]. Our research is motivated by the following question:
Question 1.
Let $F : \{-1,1\}^k \to \{-1,1\}$ and $g_1, \ldots, g_k : \{-1,1\}^\ell \to \{-1,1\}$. What properties do $F$ and $g_1, \ldots, g_k$ have to satisfy for the FEI conjecture to hold for the disjoint composition $f(x^1, \ldots, x^k) = F(g_1(x^1), \ldots, g_k(x^k))$?

1 All probabilities and expectations are with respect to the uniform distribution unless otherwise stated.
2 The assumption that $f$ is Boolean-valued is crucial here, as the same conjecture is false for functions $f : \{-1,1\}^n \to \mathbb{R}$ satisfying $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$. The canonical counterexample is $f(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^n x_i$, which has total influence 1 and spectral entropy $\log n$.

Note that Question 1 is non-trivial even when $F = OR$ and $g_1, \ldots, g_k = AND$, perhaps two of the most basic Boolean functions with extremely simple Fourier spectra. Indeed, Mansour's conjecture, a weaker conjecture than FEI, was only recently shown to hold for read-once DNFs [KLW10, DETT10]. Besides being a fundamental question concerning the behavior of spectral entropy and total influence under composition, Question 1 (and our answer to it) also has implications for a natural approach towards disproving the FEI conjecture; we elaborate on this at the end of this section.

A particularly appealing and general answer to Question 1 that one may hope for would be the following: "if $H[F] \le C_1 \cdot Inf[F]$ and $H[g_i] \le C_2 \cdot Inf[g_i]$ for all $i \in [k]$, then $H[f] \le \max\{C_1, C_2\} \cdot Inf[f]$." While this is easily seen to be false, our main result shows that this proposed answer to Question 1 is in fact true for a carefully chosen sharpening of the FEI conjecture. To arrive at a formulation that bootstraps itself, we first consider a slight strengthening of the FEI conjecture which we call FEI⁺, and then work with a generalization of FEI⁺ that concerns the Fourier spectrum of $f$ not just with respect to the uniform distribution, but an arbitrary product distribution over $\{-1,1\}^n$:

Conjecture 1 (FEI⁺ for product distributions). There is a universal constant
$C > 0$ such that the following holds. Let $\mu = \langle \mu_1, \ldots, \mu_n \rangle$ be any sequence of biases and $f : \{-1,1\}^n_\mu \to \{-1,1\}$. Here the notation $\{-1,1\}^n_\mu$ means that we think of $\{-1,1\}^n$ as being endowed with the $\mu$-biased product probability distribution in which $E_\mu[x_i] = \mu_i$ for all $i \in [n]$. Let $\{\tilde{f}(S)\}_{S \subseteq [n]}$ be the $\mu$-biased Fourier coefficients of $f$. Then
$$\sum_{S \ne \emptyset} \tilde{f}(S)^2 \log\left(\frac{\prod_{i \in S}(1 - \mu_i^2)}{\tilde{f}(S)^2}\right) \;\le\; C \cdot (Inf_\mu[f] - Var_\mu[f]).$$

We write $H_\mu[f]$ to denote the quantity $\sum_{S \subseteq [n]} \tilde{f}(S)^2 \log\big(\prod_{i \in S}(1 - \mu_i^2)/\tilde{f}(S)^2\big)$, and so the inequality of Conjecture 1 can be equivalently stated as $H_\mu[f^{\ge 1}] \le C \cdot (Inf_\mu[f] - Var_\mu[f])$. In Proposition 2.1 we show that Conjecture 1 with $\mu = \langle 0, \ldots, 0 \rangle$ (the uniform distribution) implies the FEI conjecture. We say that a Boolean function $f$ "satisfies $\mu$-biased FEI⁺ with factor $C$" if the $\mu$-biased Fourier transform of $f$ satisfies the inequality of Conjecture 1. Our main result, which we prove in Section 3, is a composition theorem for FEI⁺:

Theorem 1.
Let $f(x^1, \ldots, x^k) = F(g_1(x^1), \ldots, g_k(x^k))$, where the domain of $f$ is endowed with a product distribution $\mu$. Suppose $g_1, \ldots, g_k$ satisfy $\mu$-biased FEI⁺ with factor $C_1$ and $F$ satisfies $\eta$-biased FEI⁺ with factor $C_2$, where $\eta = \langle E_\mu[g_1], \ldots, E_\mu[g_k] \rangle$. Then $f$ satisfies $\mu$-biased FEI⁺ with factor $\max\{C_1, C_2\}$.

Theorem 1 suggests an inductive approach towards proving the FEI conjecture for read-once de Morgan formulas: since the dictators $\pm x_i$ trivially satisfy uniform-distribution FEI⁺ with factor 1, it suffices to prove that both AND and OR satisfy $\mu$-biased FEI⁺ with some constant independent of $\mu \in [-1,1]^2$. In Section 4 we prove that in fact every $F : \{-1,1\}^k \to \{-1,1\}$ satisfies $\mu$-biased FEI⁺ with a factor depending only on its arity $k$ and not the biases $\mu_1, \ldots, \mu_k$.

Theorem 2.
Every $F : \{-1,1\}^k \to \{-1,1\}$ satisfies $\mu$-biased FEI⁺ with factor $C = 2^{O(k)}$ for any product distribution $\mu = \langle \mu_1, \ldots, \mu_k \rangle$.

3 For example, by considering $F = OR_2$, the 2-bit disjunction, and $g_1, g_2 = AND_2$, the 2-bit conjunction.

Together, Theorems 1 and 2 yield:

Theorem 3.
Let $f$ be computed by a read-once formula over the basis $\mathcal{B}$ and let $\mu$ be any sequence of biases. Then $f$ satisfies $\mu$-biased FEI⁺ with factor $C$, where $C$ depends only on the arity of the gates in $\mathcal{B}$.

Since uniform-distribution FEI⁺ is a strengthening of the FEI conjecture, Theorem 3 implies that the FEI conjecture holds for read-once formulas over arbitrary gates of bounded arity. As mentioned above, prior to our work the FEI conjecture was open even for the class of read-once DNFs, a small subclass of read-once formulas over the de Morgan basis $\{AND, OR, NOT\}$ of arity 2. Read-once formulas over a rich basis $\mathcal{B}$ are a natural generalization of read-once de Morgan formulas, and have seen previous study in concrete complexity (see e.g. [HNW93]).

Improved lower bound on the FEI constant.
Iterated disjoint composition is commonly used to achieve separations between complexity measures for Boolean functions [BdW02], and represents a natural approach towards disproving the FEI conjecture. For example, one may seek a function $F$ such that iterated composition of $F$ with itself achieves a super-constant amplification of the ratio between $H[F]$ and $Inf[F]$, or consider variants such as iterating $F$ with a different combining function $G$. Theorem 3 rules out as potential counterexamples all such constructions based on iterated composition.

However, the tools we develop to prove Theorem 3 also yield an explicit function $f$ achieving the best-known separation between $H[f]$ and $Inf[f]$ (i.e., the constant $C$ in the statement of the FEI conjecture). In Section 5 we prove:

Theorem 4.
There exists an explicit family of functions $f_n : \{-1,1\}^n \to \{-1,1\}$ such that
$$\lim_{n \to \infty} \frac{H[f_n]}{Inf[f_n]} \ge 6.278.$$
This improves on the previous lower bound of $C \ge 60/13 \approx 4.615$ [OWZ11].
Previous work.
The first published progress on the FEI conjecture was by Klivans et al., who proved the conjecture for random poly$(n)$-term DNF formulas [KLW10]. This was followed by the work of O'Donnell et al., who proved the conjecture for the class of symmetric functions and read-once decision trees [OWZ11].

The FEI conjecture for product distributions was studied in the recent work of Keller et al. [KMS12], where they consider the case of all the biases being the same. They introduce the following generalization of the FEI conjecture to these measures, and show via a reduction to the uniform distribution [BKK+92] that it is equivalent to the FEI conjecture:
Conjecture 2 (Keller-Mossel-Schlank). There is a universal constant $C$ such that the following holds. Let $0 < p < 1/2$ and $f : \{-1,1\}^n \to \{-1,1\}$, where the domain of $f$ is endowed with the product distribution where $\Pr[x_i = -1] = p$ for all $i \in [n]$. Let $\{\tilde{f}(S)\}_{S \subseteq [n]}$ be the Fourier coefficients of $f$ with respect to this distribution. Then
$$\sum_{S \subseteq [n]} \tilde{f}(S)^2 \log\left(\frac{1}{\tilde{f}(S)^2}\right) \;\le\; C \cdot \frac{\log(1/p)}{1-p} \sum_{S \subseteq [n]} |S| \cdot \tilde{f}(S)^2.$$

Note that the constant on the right-hand side of this inequality, $C \cdot \frac{\log(1/p)}{1-p}$, depends on $p$. By way of contrast, in our Conjecture 1 the right-hand side constant has no dependence on $p$; instead, the dependence on the biases is built into the definition of spectral entropy. We view our generalization of the FEI conjecture to arbitrary product distributions (where the biases are not necessarily identical) as a key contribution of this work, and point to our composition theorem as evidence in favor of Conjecture 1 being a good statement to work with.

2 Preliminaries

Notation.
We will be concerned with functions $f : \{-1,1\}^n_\mu \to \mathbb{R}$ where $\mu = \langle \mu_1, \ldots, \mu_n \rangle \in [-1,1]^n$ is a sequence of biases. Here the notation $\{-1,1\}^n_\mu$ means that we think of $\{-1,1\}^n$ as being endowed with the $\mu$-biased product probability distribution in which $E_\mu[x_i] = \mu_i$ for all $i \in [n]$. We write $\sigma_i^2$ to denote the variance of the $i$-th coordinate, $Var_\mu[x_i] = 1 - \mu_i^2$, and $\varphi : \mathbb{R} \to \mathbb{R}$ as shorthand for the function $t \mapsto t \log(1/t)$, adopting the convention that $\varphi(0) = 0$. We will assume familiarity with the basics of Fourier analysis with respect to product distributions over $\{-1,1\}^n$; a review is included in Appendix A.

Proposition 2.1 (FEI⁺ implies FEI). Suppose $f$ satisfies uniform-distribution FEI⁺ with factor $C$. Then $f$ satisfies the FEI conjecture with factor $\max\{C, 1/\ln 2\}$.

Proof. Let $\hat{f}(\emptyset)^2 = 1 - \varepsilon$, where $\varepsilon = Var[f]$ by Parseval's identity. By our assumption that $f$ satisfies uniform-distribution FEI⁺ with factor $C$ (and since $\sigma_i = 1$ under the uniform distribution), we have
$$H[f] = H[f^{\ge 1}] + (1-\varepsilon)\log\frac{1}{1-\varepsilon} \le C \cdot (Inf[f] - Var[f]) + \frac{\varepsilon}{\ln 2} = C \cdot Inf[f] + \left(\frac{1}{\ln 2} - C\right) \cdot Var[f].$$
If $C > 1/\ln 2$ then the RHS is at most $C \cdot Inf[f]$ since $(\frac{1}{\ln 2} - C) \cdot Var[f]$ is negative. Otherwise we apply the Poincaré inequality (Theorem 9) to conclude that the RHS is at most $C \cdot Inf[f] + (\frac{1}{\ln 2} - C) \cdot Inf[f] = \frac{1}{\ln 2} \cdot Inf[f]$. □

3 A composition theorem for FEI⁺

We will be concerned with compositions of functions $f = F(g_1(x^1), \ldots, g_k(x^k))$ where $g_1, \ldots, g_k$ are over disjoint sets of variables, each of size $\ell$. The domain of each $g_i$ is endowed with a product distribution $\mu^i = \langle \mu_{i1}, \ldots, \mu_{i\ell} \rangle$, which induces an overall product distribution $\mu = \langle \mu_{11}, \ldots, \mu_{1\ell}, \ldots, \mu_{k1}, \ldots, \mu_{k\ell} \rangle$ over the domain of $f : \{-1,1\}^{k\ell} \to \{-1,1\}$. For notational clarity we will adopt the equivalent view of $g_1, \ldots, g_k$ as functions over the same domain $\{-1,1\}^{k\ell}_\mu$ endowed with the same product distribution $\mu$, with each $g_i$ depending only on $\ell$ out of the $k\ell$ variables.

Our first lemma gives formulas for the spectral entropy and total influence of the product of functions $\Phi_1, \ldots, \Phi_k$ over disjoint sets of variables. The lemma holds for real-valued functions $\Phi_i$; we require this level of generality as we will not be applying the lemma directly to the Boolean-valued functions $g_1, \ldots, g_k$ in the composition $F(g_1(x^1), \ldots, g_k(x^k))$, but instead to their normalized variants $\Phi(g_i) = (g_i - E[g_i])/Var[g_i]^{1/2}$.

Lemma 3.1. Let $\Phi_1, \ldots, \Phi_k : \{-1,1\}^{k\ell}_\mu \to \mathbb{R}$ where each $\Phi_i$ depends only on the $\ell$ coordinates in $\{(i-1)\ell + 1, \ldots, i\ell\}$. Then
$$H_\mu[\Phi_1 \cdots \Phi_k] = \sum_{i=1}^k H_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2] \quad\text{and}\quad Inf_\mu[\Phi_1 \cdots \Phi_k] = \sum_{i=1}^k Inf_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2].$$

Due to space considerations we defer the proof of Lemma 3.1 to Appendix B. We note that this lemma recovers as a special case the folklore observation that the FEI conjecture "tensorizes": for any $f$, if we define $f^{\oplus k}(x^1, \ldots, x^k) = f(x^1) \cdots f(x^k)$ then $H[f^{\oplus k}] = k \cdot H[f]$ and $Inf[f^{\oplus k}] = k \cdot Inf[f]$. Therefore $H[f] \le C \cdot Inf[f]$ if and only if $H[f^{\oplus k}] \le C \cdot Inf[f^{\oplus k}]$.

Our next proposition relates the basic analytic measures (spectral entropy, total influence, and variance) of a composition $f = F(g_1(x^1), \ldots, g_k(x^k))$ to the corresponding quantities of the combining function $F$ and base functions $g_1, \ldots, g_k$. As alluded to above, we accomplish this by considering $f$ as a linear combination of the normalized functions $\Phi(g_i) = (g_i - E[g_i])/Var[g_i]^{1/2}$ and applying Lemma 3.1 to each term in the sum. We mention that this proposition is also the crux of our new lower bound of $C \ge 6.278$ on the constant of the FEI conjecture, which we present in Section 5.
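The tensorization identity can be checked numerically on a small example. The sketch below is an illustration only; the 3-bit majority is an arbitrary choice of test function:

```python
from itertools import product
from math import log2

def entropy_and_influence(f, n):
    """Brute-force H[f] and Inf[f] under the uniform distribution."""
    points = list(product([-1, 1], repeat=n))
    H = influence = 0.0
    for mask in range(2 ** n):
        c = 0.0
        for x in points:
            chi = 1
            for i in range(n):
                if mask >> i & 1:
                    chi *= x[i]
            c += f(x) * chi
        c /= 2 ** n
        if c * c > 1e-12:
            H += c * c * log2(1 / (c * c))
            influence += bin(mask).count("1") * c * c
    return H, influence

maj3 = lambda x: 1 if sum(x) > 0 else -1
f2 = lambda x: maj3(x[0:3]) * maj3(x[3:6])   # product over disjoint variable blocks
H1, I1 = entropy_and_influence(maj3, 3)
H2, I2 = entropy_and_influence(f2, 6)
assert abs(H2 - 2 * H1) < 1e-9               # spectral entropy doubles
assert abs(I2 - 2 * I1) < 1e-9               # total influence doubles
```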
Proposition 3.2.
Let $F : \{-1,1\}^k \to \mathbb{R}$, and $g_1, \ldots, g_k : \{-1,1\}^{k\ell}_\mu \to \{-1,1\}$ where each $g_i$ depends only on the $\ell$ coordinates in $\{(i-1)\ell + 1, \ldots, i\ell\}$. Let $f(x) = F(g_1(x), \ldots, g_k(x))$ and let $\{\tilde{F}(S)\}_{S \subseteq [k]}$ be the $\eta$-biased Fourier coefficients of $F$, where $\eta = \langle E_\mu[g_1], \ldots, E_\mu[g_k] \rangle$. Then
$$H_\mu[f^{\ge 1}] = H_\eta[F^{\ge 1}] + \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{H_\mu[g_i^{\ge 1}]}{Var_\mu[g_i]}, \qquad (1)$$
$$Inf_\mu[f] = \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{Inf_\mu[g_i]}{Var_\mu[g_i]}, \quad\text{and} \qquad (2)$$
$$Var_\mu[f] = \sum_{S \ne \emptyset} \tilde{F}(S)^2 = Var_\eta[F]. \qquad (3)$$

Proof.
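Before the proof, identities (2) and (3) can be sanity-checked numerically on a small read-once composition. The following sketch is our own illustration (not part of the paper); the example $f = OR_2(AND_2, AND_2)$ and the convention that $+1$ denotes TRUE are arbitrary choices, and identity (1) can be checked analogously:

```python
from itertools import product
from math import sqrt

def biased_coeffs(f, mu):
    """mu-biased Fourier coefficients of f, where E_mu[x_i] = mu[i]."""
    n = len(mu)
    sigma = [sqrt(1 - m * m) for m in mu]
    pts = list(product([-1, 1], repeat=n))
    def pr(x):
        p = 1.0
        for xi, m in zip(x, mu):
            p *= (1 + m) / 2 if xi == 1 else (1 - m) / 2
        return p
    coeffs = {}
    for mask in range(2 ** n):
        c = 0.0
        for x in pts:
            phi = 1.0
            for i in range(n):
                if mask >> i & 1:
                    phi *= (x[i] - mu[i]) / sigma[i]
            c += pr(x) * f(x) * phi
        coeffs[mask] = c
    return coeffs

def influence(coeffs):
    return sum(bin(m).count("1") * c * c for m, c in coeffs.items())

def variance(coeffs):
    return sum(c * c for m, c in coeffs.items() if m)

# f = OR(AND(x1,x2), AND(x3,x4)) under the uniform distribution.
AND = lambda a, b: 1 if a == 1 and b == 1 else -1
OR = lambda a, b: 1 if a == 1 or b == 1 else -1
f = lambda x: OR(AND(x[0], x[1]), AND(x[2], x[3]))

fc = biased_coeffs(f, (0.0, 0.0, 0.0, 0.0))
gc = biased_coeffs(lambda x: AND(x[0], x[1]), (0.0, 0.0))
eta = (gc[0], gc[0])                     # eta_i = E[g_i] = -1/2
Fc = biased_coeffs(lambda y: OR(y[0], y[1]), eta)

inf_g, var_g = influence(gc), variance(gc)
# Identity (3): Var[f] = Var_eta[F].
assert abs(variance(fc) - variance(Fc)) < 1e-9
# Identity (2): Inf[f] = sum_{S != {}} F~(S)^2 * sum_{i in S} Inf[g_i]/Var[g_i].
rhs = sum(c * c * bin(m).count("1") * (inf_g / var_g) for m, c in Fc.items() if m)
assert abs(influence(fc) - rhs) < 1e-9
```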
By the $\eta$-biased Fourier expansion of $F : \{-1,1\}^k_\eta \to \mathbb{R}$ and the definition of $\eta$ we have
$$F(y_1, \ldots, y_k) = \sum_{S \subseteq [k]} \tilde{F}(S) \prod_{i \in S} \frac{y_i - \eta_i}{\sqrt{1 - \eta_i^2}} = \sum_{S \subseteq [k]} \tilde{F}(S) \prod_{i \in S} \frac{y_i - E_\mu[g_i]}{Var_\mu[g_i]^{1/2}},$$
so we may write
$$F(g_1(x), \ldots, g_k(x)) = \sum_{S \subseteq [k]} \tilde{F}(S) \prod_{i \in S} \Phi(g_i(x)), \quad\text{where } \Phi(g_i(x)) = \frac{g_i(x) - E_\mu[g_i]}{Var_\mu[g_i]^{1/2}}.$$
Note that $\Phi$ normalizes $g_i$ so that $E_\mu[\Phi(g_i)] = 0$ and $E_\mu[\Phi(g_i)^2] = 1$. First we claim that
$$H_\mu[f^{\ge 1}] = H_\mu\Big[\sum_{S \ne \emptyset} \tilde{F}(S) \prod_{i \in S} \Phi(g_i)\Big] = \sum_{S \ne \emptyset} H_\mu\Big[\tilde{F}(S) \prod_{i \in S} \Phi(g_i)\Big].$$
It suffices to show that for any two distinct non-empty sets $S, T \subseteq [k]$, no monomial $\phi^\mu_U$ occurs in the $\mu$-biased spectral support of both $\tilde{F}(S)\prod_{i \in S}\Phi(g_i)$ and $\tilde{F}(T)\prod_{i \in T}\Phi(g_i)$. To see this, recall that $\Phi(g_i)$ is balanced with respect to $\mu$ (i.e., $E_\mu[\Phi(g_i)] = E_\mu[\Phi(g_i)\phi^\mu_\emptyset] = 0$), and so every monomial $\phi^\mu_U$ in the support of $\tilde{F}(S)\prod_{i \in S}\Phi(g_i)$ is of the form $\prod_{i \in S}\phi^\mu_{U_i}$ where $U_i$ is a non-empty subset of the relevant variables of $g_i$ (i.e., $\{(i-1)\ell+1, \ldots, i\ell\}$); likewise for monomials in the support of $\tilde{F}(T)\prod_{i \in T}\Phi(g_i)$. In other words, the non-empty subsets of $[k]$ induce a partition of the $\mu$-biased Fourier support of $f$, where $\phi^\mu_U$ is mapped to $\emptyset \ne S \subseteq [k]$ if and only if $U$ contains a relevant variable of $g_i$ for every $i \in S$ and none of the relevant variables of $g_j$ for any $j \notin S$.

With this identity in hand we have
$$H_\mu[f^{\ge 1}] = \sum_{S \ne \emptyset} H_\mu\Big[\tilde{F}(S)\prod_{i \in S}\Phi(g_i)\Big] = \sum_{S \ne \emptyset} \Big(\varphi(\tilde{F}(S)^2) + \tilde{F}(S)^2 \sum_{i \in S} H_\mu[\Phi(g_i)]\Big)$$
$$= \sum_{S \ne \emptyset} \Big(\varphi(\tilde{F}(S)^2) + \tilde{F}(S)^2 \sum_{i \in S} \Big(\frac{H_\mu[g_i - E_\mu[g_i]]}{Var_\mu[g_i]} + \varphi\Big(\frac{1}{Var_\mu[g_i]}\Big) Var_\mu[g_i]\Big)\Big) = H_\eta[F^{\ge 1}] + \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{H_\mu[g_i^{\ge 1}]}{Var_\mu[g_i]},$$
where the second and third equalities are two applications of Lemma 3.1 (for the second equality we view $\tilde{F}(S)$ as a constant function with $H_\mu[\tilde{F}(S)] = \varphi(\tilde{F}(S)^2)$). By the same reasoning, we also have
$$Inf_\mu[f] = \sum_{S \ne \emptyset} Inf_\mu\Big[\tilde{F}(S)\prod_{i \in S}\Phi(g_i)\Big] = \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} Inf_\mu[\Phi(g_i)] = \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{Inf_\mu[g_i]}{Var_\mu[g_i]}.$$
Here the second equality is by Lemma 3.1, again viewing $\tilde{F}(S)$ as a constant function with $Inf_\mu[\tilde{F}(S)] = 0$, and the third equality uses the facts that $Inf_\mu[\alpha f] = \alpha^2 \cdot Inf_\mu[f]$ and $Inf_\mu[g_i - E_\mu[g_i]] = Inf_\mu[g_i]$. Finally, we see that
$$Var_\mu[f] = \sum_{S \ne \emptyset} Var_\mu\Big[\tilde{F}(S)\prod_{i \in S}\Phi(g_i)\Big] = \sum_{S \ne \emptyset} \tilde{F}(S)^2 \prod_{i \in S} Var_\mu[\Phi(g_i)] = \sum_{S \ne \emptyset} \tilde{F}(S)^2,$$
where the last quantity is $Var_\eta[F]$. Here the second equality uses the fact that the functions $\Phi(g_i)$ are on disjoint sets of variables (and therefore statistically independent when viewed as random variables), and the third equality holds since $Var_\mu[\Phi(g_i)] = E[\Phi(g_i)^2] - E[\Phi(g_i)]^2 = 1$. □

We are now ready to prove our main theorem:

Theorem 1.
Let $F : \{-1,1\}^k \to \mathbb{R}$, and $g_1, \ldots, g_k : \{-1,1\}^{k\ell}_\mu \to \{-1,1\}$ where each $g_i$ depends only on the $\ell$ coordinates in $\{(i-1)\ell+1, \ldots, i\ell\}$. Let $f(x) = F(g_1(x), \ldots, g_k(x))$ and suppose $C > 0$ satisfies
1. $H_\mu[g_i^{\ge 1}] \le C \cdot (Inf_\mu[g_i] - Var_\mu[g_i])$ for all $i \in [k]$.
2. $H_\eta[F^{\ge 1}] \le C \cdot (Inf_\eta[F] - Var_\eta[F])$, where $\eta = \langle E_\mu[g_1], \ldots, E_\mu[g_k] \rangle$.
Then $H_\mu[f^{\ge 1}] \le C \cdot (Inf_\mu[f] - Var_\mu[f])$.

Proof. By our first assumption each $g_i$ satisfies $Inf_\mu[g_i] \ge \frac{1}{C} H_\mu[g_i^{\ge 1}] + Var_\mu[g_i]$, and so combining this with equation (2) of Proposition 3.2 we have
$$Inf_\mu[f] = \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{Inf_\mu[g_i]}{Var_\mu[g_i]} \ge \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \left(\frac{H_\mu[g_i^{\ge 1}]}{C \cdot Var_\mu[g_i]} + 1\right) = Inf_\eta[F] + \frac{1}{C} \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{H_\mu[g_i^{\ge 1}]}{Var_\mu[g_i]}. \qquad (4)$$
This along with equations (1) and (3) of Proposition 3.2 completes the proof:
$$H_\mu[f^{\ge 1}] = H_\eta[F^{\ge 1}] + \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{H_\mu[g_i^{\ge 1}]}{Var_\mu[g_i]} \le C \cdot (Inf_\eta[F] - Var_\eta[F]) + \sum_{S \ne \emptyset} \tilde{F}(S)^2 \sum_{i \in S} \frac{H_\mu[g_i^{\ge 1}]}{Var_\mu[g_i]} \le C \cdot (Inf_\mu[f] - Var_\eta[F]) = C \cdot (Inf_\mu[f] - Var_\mu[f]).$$
Here the first equality is by (1), the first inequality by our second assumption, the second inequality by (4), and finally the last identity by (3). □

4 FEI⁺ for functions of bounded arity

In this section we prove that $\mu$-biased FEI⁺ holds for all Boolean functions $F : \{-1,1\}^k_\mu \to \{-1,1\}$ with factor $C$ independent of the biases $\mu_1, \ldots, \mu_k$ of $\mu$. When $\mu = \langle 0, \ldots, 0 \rangle$ is the uniform distribution it is well known that the FEI conjecture holds with factor $C = O(\log k)$, and a bound of $C = O(k)$ is trivial since every nonzero Fourier coefficient of $F$ is an integer multiple of $2^{1-k}$; neither proof carries through to the setting of product distributions. We remark that even verifying the seemingly simple claim "there exists a universal constant $C$ such that $H_\mu[MAJ_3^{\ge 1}] \le C \cdot (Inf_\mu[MAJ_3] - Var_\mu[MAJ_3])$ for all product distributions $\mu \in [-1,1]^3$", where $MAJ_3$ is the majority function over 3 variables, turns out to be technically cumbersome.

The high-level strategy is to bound each of the $2^k - 1$ terms of $H_\mu[F^{\ge 1}]$ separately; due to space considerations we defer the proof of the main lemma to Appendix B.

Lemma 4.1.
Let $F : \{-1,1\}^k_\mu \to \{-1,1\}$. Let $S \subseteq [k]$, $S \ne \emptyset$, and suppose $\tilde{F}(S) \ne 0$. For any $j \in S$ we have
$$\tilde{F}(S)^2 \log\left(\frac{\prod_{i \in S} \sigma_i^2}{\tilde{F}(S)^2}\right) \le \frac{4^k}{\ln 2} \cdot Var_\mu[D_{\phi^\mu_j} F].$$

Theorem 2.
Let $F : \{-1,1\}^k_\mu \to \{-1,1\}$. Then $H_\mu[F^{\ge 1}] \le 2^{O(k)} \cdot (Inf_\mu[F] - Var_\mu[F])$.

Proof.
The claim can be equivalently stated as $H_\mu[F^{\ge 1}] \le 2^{O(k)} \sum_{i=1}^k Var_\mu[D_{\phi^\mu_i} F]$, since
$$\sum_{i=1}^k Var_\mu[D_{\phi^\mu_i} F] = \sum_{|S| \ge 2} |S| \cdot \tilde{F}(S)^2 \le 2 \sum_{|S| \ge 1} (|S| - 1) \cdot \tilde{F}(S)^2 = 2 \cdot (Inf_\mu[F] - Var_\mu[F]).$$
By Lemma 4.1, every $S \ne \emptyset$ that contributes $\tilde{F}(S)^2 \log\big(\prod_{i \in S}\sigma_i^2/\tilde{F}(S)^2\big)$ to $H_\mu[F^{\ge 1}]$ has this contribution bounded by $\frac{4^k}{\ln 2} \cdot Var_\mu[D_{\phi^\mu_j} F]$, where $j$ is any element of $S$. Summing over all $2^k - 1$ non-empty subsets $S$ of $[k]$ completes the proof. □

4.1 FEI⁺ for read-once formulas

Finally, we combine our two main results so far, the composition theorem (Theorem 1) and the distribution-independent universal bound (Theorem 2), to prove Conjecture 1 for read-once formulas with arbitrary gates of bounded arity.
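As a numerical sanity check of Theorem 2 (our own illustration, not part of the paper), the quantities $H_\mu[F^{\ge 1}]$, $Inf_\mu[F]$ and $Var_\mu[F]$ can be computed by brute force for $MAJ_3$ at a few bias vectors. The factor $2^{3k}/\ln 2$ below is a deliberately crude constant of our own choosing, consistent with the theorem's unspecified $2^{O(k)}$; $MAJ_3$ and the bias vectors are arbitrary choices:

```python
from itertools import product
from math import sqrt, log2, log

def biased_analysis(f, mu):
    """Return (H_mu[f^{>=1}], Inf_mu[f], Var_mu[f]) by brute-force mu-biased Fourier."""
    n = len(mu)
    sigma2 = [1 - m * m for m in mu]
    pts = list(product([-1, 1], repeat=n))
    def pr(x):
        p = 1.0
        for xi, m in zip(x, mu):
            p *= (1 + m) / 2 if xi == 1 else (1 - m) / 2
        return p
    H = inf = var = 0.0
    for mask in range(1, 2 ** n):          # non-empty S only
        c = 0.0
        for x in pts:
            phi = 1.0
            for i in range(n):
                if mask >> i & 1:
                    phi *= (x[i] - mu[i]) / sqrt(sigma2[i])
            c += pr(x) * f(x) * phi
        if c * c > 1e-12:
            prod_s2 = 1.0
            for i in range(n):
                if mask >> i & 1:
                    prod_s2 *= sigma2[i]
            H += c * c * log2(prod_s2 / (c * c))
            inf += bin(mask).count("1") * c * c
            var += c * c
    return H, inf, var

maj3 = lambda x: 1 if sum(x) > 0 else -1
bound = 2 ** 9 / log(2)                     # crude 2^{3k}/ln 2 for k = 3
for mu in [(0.0, 0.0, 0.0), (0.5, -0.25, 0.75), (0.9, 0.9, -0.9)]:
    H, inf, var = biased_analysis(maj3, mu)
    assert var <= inf + 1e-9                # Poincare inequality
    assert H <= bound * (inf - var) + 1e-9  # FEI+ with a crude bounded-arity factor
```

At the uniform distribution the FEI⁺ ratio of $MAJ_3$ is exactly $H/(Inf - Var) = 2/(3/2 - 1) = 4$.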
Definition 5.
Let $\mathcal{B}$ be a set of Boolean functions. We say that a Boolean function $f$ is a formula over the basis $\mathcal{B}$ if $f$ is computable by a formula with gates belonging to $\mathcal{B}$. We say that $f$ is a read-once formula over $\mathcal{B}$ if every variable appears at most once in the formula for $f$.

Corollary 4.2.
Let $C > 0$ and $\mathcal{B}$ be a set of Boolean functions, and suppose $H_\mu[F^{\ge 1}] \le C \cdot (Inf_\mu[F] - Var_\mu[F])$ for all $F \in \mathcal{B}$ and product distributions $\mu$. Let $\mathcal{C}$ be the class of read-once formulas over the basis $\mathcal{B}$. Then $H_\mu[f^{\ge 1}] \le C \cdot (Inf_\mu[f] - Var_\mu[f])$ for all $f \in \mathcal{C}$ and product distributions $\mu$.

Proof. We proceed by structural induction on the formula computing $f$. The base case holds since the $\mu$-biased Fourier expansion of the dictator $x_i$ and anti-dictator $-x_i$ is $\pm(\mu_i + \sigma_i \phi^\mu_i(x))$, and so $H_\mu[f^{\ge 1}] = \tilde{f}(\{i\})^2 \log(\sigma_i^2/\tilde{f}(\{i\})^2) = \sigma_i^2 \log(\sigma_i^2/\sigma_i^2) = 0$.

For the inductive step, suppose $f = F(g_1, \ldots, g_k)$, where $F \in \mathcal{B}$ and $g_1, \ldots, g_k$ are read-once formulas over $\mathcal{B}$ over disjoint sets of variables. Let $\mu$ be any product distribution over the domain of $f$. By our induction hypothesis we have $H_\mu[g_i^{\ge 1}] \le C \cdot (Inf_\mu[g_i] - Var_\mu[g_i])$ for all $i \in [k]$, satisfying the first requirement of Theorem 1. Next, by our assumption on $F \in \mathcal{B}$, we have $H_\eta[F^{\ge 1}] \le C \cdot (Inf_\eta[F] - Var_\eta[F])$ for all product distributions $\eta$, and in particular for $\eta = \langle E_\mu[g_1], \ldots, E_\mu[g_k] \rangle$, satisfying the second requirement of Theorem 1. Therefore, by Theorem 1 we conclude that $H_\mu[f^{\ge 1}] \le C \cdot (Inf_\mu[f] - Var_\mu[f])$. □

By Theorem 2, for any set $\mathcal{B}$ of Boolean functions with maximum arity $k$ and product distribution $\mu$, every $F \in \mathcal{B}$ satisfies $H_\mu[F^{\ge 1}] \le 2^{O(k)} \cdot (Inf_\mu[F] - Var_\mu[F])$. Combining this with Corollary 4.2 yields the following:

Theorem 3.
Let $\mathcal{B}$ be a set of Boolean functions with maximum arity $k$, and let $\mathcal{C}$ be the class of read-once formulas over the basis $\mathcal{B}$. Then $H_\mu[f^{\ge 1}] \le 2^{O(k)} \cdot (Inf_\mu[f] - Var_\mu[f])$ for all $f \in \mathcal{C}$ and product distributions $\mu$.

5 An improved lower bound on the FEI constant

The tools we develop in this paper also yield an explicit function $f$ achieving the best-known ratio between $H[f]$ and $Inf[f]$ (i.e., a lower bound on the constant $C$ in the FEI conjecture). We will use the following special case of Proposition 3.2 on the behavior of spectral entropy and total influence under composition:

Lemma 5.1 (Amplification lemma). Let $F : \{-1,1\}^k \to \{-1,1\}$ and $g : \{-1,1\}^\ell \to \{-1,1\}$ be balanced Boolean functions. Let $f_0 = g$, and for all $m \ge 1$ define $f_m = F(f_{m-1}(x^1), \ldots, f_{m-1}(x^k))$. Then
$$H[f_m] = H[g] \cdot Inf[F]^m + H[F] \cdot \frac{Inf[F]^m - 1}{Inf[F] - 1} \quad\text{and}\quad Inf[f_m] = Inf[g] \cdot Inf[F]^m.$$
In particular, if $F = g$ we have
$$\frac{H[f_m]}{Inf[f_m]} = \frac{H[F]}{Inf[F]} + \frac{H[F]}{Inf[F](Inf[F] - 1)} - \frac{H[F]}{Inf[F]^{m+1}(Inf[F] - 1)}.$$

Proof. Since the composition of balanced functions remains balanced, the recurrence relations $H[f_m] = H[f_{m-1}] \cdot Inf[F] + H[F]$ and $Inf[f_m] = Inf[f_{m-1}] \cdot Inf[F]$ hold as special cases of Proposition 3.2. Solving them yields the claim. □

Theorem 4.
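One step of the recurrence behind the amplification lemma can be verified by brute force. The sketch below is our own illustration; it composes the balanced function $MAJ_3$ with itself (the paper's base function $g$ is a different, larger function):

```python
from itertools import product
from math import log2

def entropy_and_influence(f, n):
    """Brute-force H[f] and Inf[f] under the uniform distribution."""
    points = list(product([-1, 1], repeat=n))
    H = influence = 0.0
    for mask in range(2 ** n):
        c = 0.0
        for x in points:
            chi = 1
            for i in range(n):
                if mask >> i & 1:
                    chi *= x[i]
            c += f(x) * chi
        c /= 2 ** n
        if c * c > 1e-12:
            H += c * c * log2(1 / (c * c))
            influence += bin(mask).count("1") * c * c
    return H, influence

maj3 = lambda x: 1 if sum(x) > 0 else -1           # balanced: E[maj3] = 0
f1 = lambda x: maj3((maj3(x[0:3]), maj3(x[3:6]), maj3(x[6:9])))
H0, I0 = entropy_and_influence(maj3, 3)            # H = 2, Inf = 3/2
H1, I1 = entropy_and_influence(f1, 9)
# Recurrences from Lemma 5.1 with F = g = maj3:
assert abs(H1 - (H0 * I0 + H0)) < 1e-9             # H[f_1] = H[f_0]*Inf[F] + H[F] = 5
assert abs(I1 - I0 * I0) < 1e-9                    # Inf[f_1] = Inf[f_0]*Inf[F] = 2.25
assert H1 / I1 > H0 / I0                           # the entropy/influence ratio grows
```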
There exists an infinite family of functions $f_m : \{-1,1\}^{n_m} \to \{-1,1\}$ such that $\lim_{m \to \infty} H[f_m]/Inf[f_m] \ge 6.278$.

Proof. Let
$$g = (x \wedge x \wedge x) \vee (x \wedge x \wedge x) \vee (x \wedge x \wedge x \wedge x) \vee (x \wedge x \wedge x) \vee (x \wedge x \wedge x \wedge x).$$
It can be checked that $g$ is a balanced function with $H[g] \ge 3.924$ and $Inf[g] = 1.625$. Applying Lemma 5.1 with $F = g$, we get
$$\lim_{m \to \infty} \frac{H[f_m]}{Inf[f_m]} \ge \frac{3.924}{1.625} + \frac{3.924}{1.625 \times 0.625} \ge 6.278. \qquad\Box$$

References

[BdW02] Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey.
Theoretical Computer Science, 288(1):21–43, 2002.

[BKK+92] Jean Bourgain, Jeff Kahn, Gil Kalai, Yitzhak Katznelson, and Nathan Linial. The influence of variables in product spaces. Israel Journal of Mathematics, 77(1):55–64, 1992.

[DETT10] Anindya De, Omid Etesami, Luca Trevisan, and Madhur Tulsiani. Improved pseudorandom generators for depth 2 circuits. In Proceedings of the 14th Annual International Workshop on Randomized Techniques in Computation, pages 504–517, 2010.

[FK96] Ehud Friedgut and Gil Kalai. Every monotone graph property has a sharp threshold. Proceedings of the American Mathematical Society, 124(10):2993–3002, 1996.

[Fri98] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–36, 1998.

[GKK08a] Parikshit Gopalan, Adam Kalai, and Adam Klivans. Agnostically learning decision trees. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 527–536, 2008.

[GKK08b] Parikshit Gopalan, Adam Kalai, and Adam Klivans. A query algorithm for agnostically learning DNF? In Proceedings of the 21st Annual Conference on Learning Theory, pages 515–516, 2008.

[HNW93] Rafi Heiman, Ilan Newman, and Avi Wigderson. On read-once threshold formulae and their randomized decision tree complexity. Theoretical Computer Science, 107(1):63–76, 1993.

[Kal07] Gil Kalai. The entropy/influence conjecture. Posted on Terence Tao's What's new blog, http://terrytao.wordpress.com/2007/08/16/gil-kalai-the-entropyinfluence-conjecture/, 2007.

[KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Annual IEEE Symposium on Foundations of Computer Science, pages 68–80, 1988.

[KLW10] Adam Klivans, Homin Lee, and Andrew Wan. Mansour's Conjecture is true for random DNF formulas. In Proceedings of the 23rd Annual Conference on Learning Theory, pages 368–380, 2010.

[KMS12] Nathan Keller, Elchanan Mossel, and Tomer Schlank. A note on the entropy/influence conjecture. Discrete Mathematics, 312(22):3364–3372, 2012.

[Man94] Yishay Mansour. Learning Boolean functions via the Fourier Transform. In Vwani Roychowdhury, Kai-Yeung Siu, and Alon Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, chapter 11, pages 391–424. Kluwer Academic Publishers, 1994.

[OS08] Ryan O'Donnell and Rocco Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827–844, 2008.

[OWZ11] Ryan O'Donnell, John Wright, and Yuan Zhou. The Fourier Entropy-Influence Conjecture for certain classes of Boolean functions. In Proceedings of the 38th Annual International Colloquium on Automata, Languages and Programming, pages 330–341, 2011.

[Ser04] Rocco Servedio. On learning monotone DNF under product distributions. Information and Computation, 193(1):57–74, 2004.
A Biased Fourier Analysis
Theorem 6 (Fourier expansion). Let $\mu = \langle \mu_1, \ldots, \mu_n \rangle$ be a sequence of biases. The $\mu$-biased Fourier expansion of $f : \{-1,1\}^n \to \mathbb{R}$ is
$$f(x) = \sum_{S \subseteq [n]} \tilde{f}(S)\, \phi^\mu_S(x), \quad\text{where } \phi^\mu_S(x) = \prod_{i \in S} \frac{x_i - \mu_i}{\sigma_i} \ \text{ and } \ \tilde{f}(S) = E_\mu[f(x)\, \phi^\mu_S(x)],$$
and $\sigma_i^2 = Var_\mu[x_i] = 1 - \mu_i^2$. The $\mu$-biased spectral support of $f$ is the collection of subsets $S \subseteq [n]$ such that $\tilde{f}(S) \ne 0$. We write $f^{\ge k}$ to denote $\sum_{|S| \ge k} \tilde{f}(S)\phi^\mu_S(x)$, the projection of $f$ onto its monomials of degree at least $k$.

Theorem 7 (Parseval's identity). Let $f : \{-1,1\}^n_\mu \to \mathbb{R}$. Then $\sum_{S \subseteq [n]} \tilde{f}(S)^2 = E_\mu[f(x)^2]$. In particular, if the range of $f$ is $\{-1,1\}$ then $\sum_{S \subseteq [n]} \tilde{f}(S)^2 = 1$.

Definition 8 (Influence). Let $f : \{-1,1\}^n_\mu \to \mathbb{R}$. The influence of variable $i \in [n]$ on $f$ is $Inf^\mu_i[f] = E_\rho[Var_{\mu_i}[f_\rho]]$, where $\rho$ is a $\mu$-biased random restriction to the coordinates in $[n] \setminus \{i\}$. The total influence of $f$, denoted $Inf_\mu[f]$, is $\sum_{i=1}^n Inf^\mu_i[f]$.

We recall a few basic Fourier formulas. The expectation of $f$ is given by $E_\mu[f] = \tilde{f}(\emptyset)$ and its variance is $Var_\mu[f] = \sum_{S \ne \emptyset} \tilde{f}(S)^2$. For each $i \in [n]$, $Inf^\mu_i[f] = \sum_{S \ni i} \tilde{f}(S)^2$, and so $Inf_\mu[f] = \sum_{S \subseteq [n]} |S| \cdot \tilde{f}(S)^2$. We omit the sub- and superscripts when $\mu = \langle 0, \ldots, 0 \rangle$ is the uniform distribution. Comparing the Fourier formulas for variance and total influence yields the Poincaré inequality for functions $f : \{-1,1\}^n_\mu \to \mathbb{R}$:

Theorem 9 (Poincaré inequality). Let $f : \{-1,1\}^n_\mu \to \mathbb{R}$. Then $Var_\mu[f] \le Inf_\mu[f]$.

Recall that the $i$-th discrete derivative operator for $f : \{-1,1\}^n \to \{-1,1\}$ is defined to be $D_{x_i} f(x) = \frac{1}{2}\big(f(x^{i \leftarrow 1}) - f(x^{i \leftarrow -1})\big)$, and for $S \subseteq [n]$ we write $D_{x_S} f$ to denote $\circ_{i \in S} D_{x_i} f$.

Definition 10 (Discrete derivative).
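The orthonormality of the basis $\{\phi^\mu_S\}$ and Parseval's identity can be verified numerically. The following sketch is an illustration only; the bias vector and the test function $MAJ_3$ are arbitrary choices:

```python
from itertools import product
from math import sqrt

mu = (0.5, -0.25, 0.125)                 # an arbitrary choice of biases
n = len(mu)
sigma = [sqrt(1 - m * m) for m in mu]
pts = list(product([-1, 1], repeat=n))

def prob(x):
    """mu-biased product probability of the point x."""
    p = 1.0
    for xi, m in zip(x, mu):
        p *= (1 + m) / 2 if xi == 1 else (1 - m) / 2
    return p

def phi(mask, x):
    """The basis function phi_S at x, with S encoded as a bitmask."""
    v = 1.0
    for i in range(n):
        if mask >> i & 1:
            v *= (x[i] - mu[i]) / sigma[i]
    return v

# Orthonormality: E_mu[phi_S phi_T] = 1 if S == T, else 0.
for S in range(2 ** n):
    for T in range(2 ** n):
        ip = sum(prob(x) * phi(S, x) * phi(T, x) for x in pts)
        assert abs(ip - (1.0 if S == T else 0.0)) < 1e-9

# Parseval: sum_S f~(S)^2 = 1 for Boolean-valued f.
f = lambda x: 1 if sum(x) > 0 else -1
coeffs = [sum(prob(x) * f(x) * phi(S, x) for x in pts) for S in range(2 ** n)]
print(sum(c * c for c in coeffs))        # close to 1.0
```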
The $i$-th discrete derivative operator $D_{\phi^\mu_i}$ with respect to the $\mu$-biased product distribution on $\{-1,1\}^n$ is defined by $D_{\phi^\mu_i} f(x) = \sigma_i D_{x_i} f(x)$.

With respect to the $\mu$-biased Fourier expansion of $f : \{-1,1\}^n_\mu \to \mathbb{R}$ the operator $D_{\phi^\mu_i}$ satisfies
$$D_{\phi^\mu_i} f = \sum_{S \ni i} \tilde{f}(S)\, \phi^\mu_{S \setminus \{i\}},$$
and so for any $S \subseteq [n]$ we have $\tilde{f}(S) = E_\mu[\circ_{i \in S} D_{\phi^\mu_i} f] = \prod_{i \in S} \sigma_i \cdot E_\mu[D_{x_S} f]$.

B Omitted Proofs
Lemma 3.1.
Let $\Phi_1, \ldots, \Phi_k : \{-1,1\}^{k\ell}_\mu \to \mathbb{R}$ where each $\Phi_i$ depends only on the $\ell$ coordinates in $\{(i-1)\ell + 1, \ldots, i\ell\}$. Then
$$H_\mu[\Phi_1 \cdots \Phi_k] = \sum_{i=1}^k H_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2] \quad\text{and}\quad Inf_\mu[\Phi_1 \cdots \Phi_k] = \sum_{i=1}^k Inf_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2].$$

Proof.
We prove both formulas by induction on $k$, noting that the base cases are trivially true. For the inductive step, we define $h(x) = \prod_{i \in [k-1]} \Phi_i(x)$ and see that
$$H_\mu[h \cdot \Phi_k] = \sum_{\substack{S \subseteq [(k-1)\ell] \\ T \subseteq \{(k-1)\ell+1, \ldots, k\ell\}}} \tilde{h}(S)^2\, \widetilde{\Phi_k}(T)^2 \log\left(\frac{\prod_{i \in S \cup T} \sigma_i^2}{\tilde{h}(S)^2\, \widetilde{\Phi_k}(T)^2}\right)$$
$$= \sum_{S,T} \tilde{h}(S)^2\, \widetilde{\Phi_k}(T)^2 \left[\log\left(\frac{\prod_{i \in S} \sigma_i^2}{\tilde{h}(S)^2}\right) + \log\left(\frac{\prod_{i \in T} \sigma_i^2}{\widetilde{\Phi_k}(T)^2}\right)\right] = E_\mu[h^2] \cdot H_\mu[\Phi_k] + E_\mu[\Phi_k^2] \cdot H_\mu[h]$$
$$= \prod_{i \in [k-1]} E_\mu[\Phi_i^2] \cdot H_\mu[\Phi_k] + E_\mu[\Phi_k^2] \sum_{i=1}^{k-1} H_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2] = \sum_{i=1}^k H_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2].$$
Here the first equality uses the fact that the $\mu$-biased Fourier coefficients of $h \cdot \Phi_k$ factor as $\tilde{h}(S)\widetilde{\Phi_k}(T)$, since $h$ and $\Phi_k$ are over disjoint sets of variables and if $f : \{-1,1\}^n_\mu \to \mathbb{R}$ does not depend on coordinate $i \in [n]$ then $\tilde{f}(S) = 0$ for all $S \ni i$ (i.e., the Fourier spectrum of $f$ is supported on sets containing only its relevant variables). The third equality is by Parseval's identity, and the fourth by the induction hypothesis applied to $h$.

The formula for influence follows from a similar derivation:
$$Inf_\mu[h \cdot \Phi_k] = \sum_{\substack{S \subseteq [(k-1)\ell] \\ T \subseteq \{(k-1)\ell+1, \ldots, k\ell\}}} |S \cup T| \cdot \tilde{h}(S)^2\, \widetilde{\Phi_k}(T)^2 = \sum_{S,T} |T| \cdot \tilde{h}(S)^2\, \widetilde{\Phi_k}(T)^2 + \sum_{S,T} |S| \cdot \tilde{h}(S)^2\, \widetilde{\Phi_k}(T)^2$$
$$= E_\mu[h^2] \cdot Inf_\mu[\Phi_k] + E_\mu[\Phi_k^2] \cdot Inf_\mu[h] = \prod_{i \in [k-1]} E_\mu[\Phi_i^2] \cdot Inf_\mu[\Phi_k] + E_\mu[\Phi_k^2] \sum_{i=1}^{k-1} Inf_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2] = \sum_{i=1}^k Inf_\mu[\Phi_i] \prod_{j \ne i} E_\mu[\Phi_j^2],$$
and this completes the proof. □

Lemma 4.1.
Let $F : \{-1,1\}^k_\mu \to \{-1,1\}$. Let $S \subseteq [k]$, $S \ne \emptyset$, and suppose $\tilde{F}(S) \ne 0$. For any $j \in S$ we have
$$\tilde{F}(S)^2 \log\left(\frac{\prod_{i \in S} \sigma_i^2}{\tilde{F}(S)^2}\right) \le \frac{4^k}{\ln 2} \cdot Var_\mu[D_{\phi^\mu_j} F].$$

Proof.
Recall that $\tilde{F}(S) = E_\mu[\circ_{i \in S} D_{\phi^\mu_i} F] = \prod_{i \in S} \sigma_i \cdot E_\mu[D_{x_S} F]$, and so
$$\tilde{F}(S)^2 \log\left(\frac{\prod_{i \in S} \sigma_i^2}{\tilde{F}(S)^2}\right) = \prod_{i \in S} \sigma_i^2 \cdot E_\mu[D_{x_S} F]^2 \log\left(\frac{1}{E_\mu[D_{x_S} F]^2}\right) \le \prod_{i \in S} \sigma_i^2 \cdot \frac{|E_\mu[D_{x_S} F]|}{\ln 2} \le \prod_{i \in S} \sigma_i^2 \cdot \frac{\Pr_\mu[D_{x_S} F \ne 0]}{\ln 2}.$$
Here the first inequality holds since $t \log(1/t) \le \sqrt{t}/\ln 2$ for all $t \in \mathbb{R}^+$, and the second uses the fact that $D_{x_S} F$ is bounded within $[-1,1]$. It therefore suffices to show that
$$\prod_{i \in S} \sigma_i^2 \cdot \Pr_\mu[D_{x_S} F \ne 0] \;\le\; 4^k \cdot Var_\mu[D_{\phi^\mu_j} F] = 4^k \sigma_j^2 \cdot Var_\mu[D_j F] = 4^k \sigma_j^2\, E_{y \in \{-1,1\}^{[k] \setminus S}}\Big[E_{z \in \{-1,1\}^{S \setminus \{j\}}}\big[((D_j F)|_y(z) - \nu)^2\big]\Big],$$
where $\nu = E_\mu[D_j F]$ and $(D_j F)|_y$ denotes the restriction of $D_j F$ in which the coordinates in $[k] \setminus S$ are set according to $y$ (both expectations are with respect to $\mu$). We first rewrite the desired inequality above as
$$4^{-k} \prod_{i \in S \setminus \{j\}} \sigma_i^2 \cdot E_y\big[\mathbf{1}\{D_{x_S} F|_y \not\equiv 0\}\big] \;\le\; E_y\Big[E_z\big[((D_j F)|_y(z) - \nu)^2\big]\Big]$$
and argue that this holds pointwise: for every $y$ such that $D_{x_S} F|_y \not\equiv 0$,
$$E_z\big[((D_j F)|_y(z) - \nu)^2\big] \ge 4^{-k} \prod_{i \in S \setminus \{j\}} \sigma_i^2.$$
To see this, fix $y \in \{-1,1\}^{[k] \setminus S}$ such that $(D_{x_S} F)|_y \not\equiv 0$. Viewing $D_{x_S} F$ as $D_{x_{S \setminus \{j\}}} D_j F$, it follows that $(D_j F)|_y$ is non-constant. Since $(D_j F)|_y$ takes values in $\{-1, 0, 1\}$, there must exist some $z^* \in \{-1,1\}^{S \setminus \{j\}}$ such that $|(D_j F)|_y(z^*) - \nu| \ge \frac{1}{2}$, and so indeed
$$E_z\big[((D_j F)|_y(z) - \nu)^2\big] \ge \left(\frac{1}{2}\right)^2 \Pr[z = z^*] = \frac{1}{4} \prod_{i \in S \setminus \{j\}} \frac{1 \pm \mu_i}{2} \ge \frac{1}{4} \prod_{i \in S \setminus \{j\}} \frac{\sigma_i^2}{4} \ge 4^{-k} \prod_{i \in S \setminus \{j\}} \sigma_i^2,$$
where the middle inequality uses $(1 \pm \mu_i)/2 \ge (1 - \mu_i)(1 + \mu_i)/4 = \sigma_i^2/4$. □
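The per-term inequality of Lemma 4.1 can itself be sanity-checked by brute force. The sketch below is our own illustration; the constant $4^k/\ln 2$, the test function $MAJ_3$, the choice $j = \min(S)$, and the bias vectors are all our assumptions or arbitrary choices:

```python
from itertools import product
from math import sqrt, log2, log

def biased_coeffs(f, mu):
    """mu-biased Fourier coefficients of f, with subsets S encoded as bitmasks."""
    n = len(mu)
    pts = list(product([-1, 1], repeat=n))
    def pr(x):
        p = 1.0
        for xi, m in zip(x, mu):
            p *= (1 + m) / 2 if xi == 1 else (1 - m) / 2
        return p
    out = {}
    for mask in range(2 ** n):
        c = 0.0
        for x in pts:
            phi = 1.0
            for i in range(n):
                if mask >> i & 1:
                    phi *= (x[i] - mu[i]) / sqrt(1 - mu[i] ** 2)
            c += pr(x) * f(x) * phi
        out[mask] = c
    return out

k = 3
maj3 = lambda x: 1 if sum(x) > 0 else -1
for mu in [(0.0, 0.0, 0.0), (0.5, -0.25, 0.75)]:
    F = biased_coeffs(maj3, mu)
    sigma2 = [1 - m * m for m in mu]
    for mask in range(1, 2 ** k):
        c = F[mask]
        if c * c <= 1e-12:
            continue
        prod_s2 = 1.0
        for i in range(k):
            if mask >> i & 1:
                prod_s2 *= sigma2[i]
        term = c * c * log2(prod_s2 / (c * c))
        j = (mask & -mask).bit_length() - 1           # take j = min(S)
        # Var[D_{phi_j} F] = sum of F~(T)^2 over T containing j with |T| >= 2.
        var_dj = sum(F[m] ** 2 for m in range(2 ** k)
                     if (m >> j & 1) and m != (1 << j))
        assert term <= (4 ** k / log(2)) * var_dj + 1e-9
```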