Fooling intersections of low-weight halfspaces
Rocco A. Servedio∗ (Columbia University)    Li-Yang Tan† (Toyota Technological Institute)

April 18, 2017
Abstract

A weight-t halfspace is a Boolean function f(x) = sign(w_1 x_1 + · · · + w_n x_n − θ) where each w_i is an integer in {−t, . . . , t}. We give an explicit pseudorandom generator that δ-fools any intersection of k weight-t halfspaces with seed length poly(log n, log k, t, 1/δ). In particular, our result gives an explicit PRG that fools any intersection of any quasipoly(n) number of halfspaces of any polylog(n) weight to any 1/polylog(n) accuracy using seed length polylog(n). Prior to this work no explicit PRG with non-trivial seed length was known even for fooling intersections of n weight-1 halfspaces to constant accuracy.

The analysis of our PRG fuses techniques from two different lines of work on unconditional pseudorandomness for different kinds of Boolean functions. We extend the approach of Harsha, Klivans and Meka [HKM12] for fooling intersections of regular halfspaces, and combine this approach with results of Bazzi [Baz07] and Razborov [Raz09] on bounded independence fooling CNF formulas. Our analysis introduces new coupling-based ingredients into the standard Lindeberg method for establishing quantitative central limit theorems and associated pseudorandomness results.

∗ Supported by NSF grants CCF-1420349 and CCF-1563155. Email: [email protected]
† Supported by NSF grant CCF-1563122. Email: [email protected]
1 Introduction

A halfspace, or linear threshold function (henceforth abbreviated LTF), over {−1, 1}^n is a Boolean function f that can be expressed as f(x) = sign(w_1 x_1 + · · · + w_n x_n − θ) for some real values w_1, . . . , w_n, θ. LTFs are a natural class of Boolean functions which play a central role in many areas such as machine learning and voting theory, and have been intensively studied in complexity theory from many perspectives such as circuit complexity [GHR92, Raz92, Hås94, SO03], communication complexity [Nis93, Vio15], Boolean function analysis [Cho61, GL94, Per04, Ser07, O'D14], property testing [MORS09, MORS10], pseudorandomness [DGJ+10, MZ13, GKM15] and more.

Because of the limited expressiveness of a single LTF (even a parity function over two variables cannot be expressed as an LTF), it is natural to consider Boolean functions that are obtained by combining LTFs in various ways. Perhaps the simplest and most natural functions of this sort are intersections of LTFs, i.e. Boolean functions of the form F_1 ∧ · · · ∧ F_k where each F_j is an LTF. Intersections of LTFs have been studied in many contexts including Boolean function analysis [Kan14, She13a, She13b], computational learning (both algorithms [BK97, KOS04, KOS08, Vem10] and hardness results [KS06, KS11]), and pseudorandomness [GOWZ10, HKM12]. We further note that the set of feasible solutions to a {0, 1}-integer program with k constraints corresponds precisely to the set of satisfying assignments of an intersection of k LTFs; understanding the structure of these sets has been the subject of intensive study in computer science, optimization, and combinatorics.

This paper continues the study of intersections of LTFs from the perspective of unconditional pseudorandomness; in particular, we are interested in constructing explicit pseudorandom generators (PRGs) for intersections of LTFs. Recall the following standard definitions:
Definition 1 (Pseudorandom generator). A function Gen : {−1, 1}^r → {−1, 1}^n is said to δ-fool a function F : {−1, 1}^n → {−1, 1} with seed length r if

| E_{U′←{−1,1}^r}[F(Gen(U′))] − E_{U←{−1,1}^n}[F(U)] | ≤ δ.

Such a function Gen is said to be an explicit pseudorandom generator that δ-fools a class F of n-variable functions if Gen is computable by a deterministic uniform poly(n)-time algorithm and Gen δ-fools every function F ∈ F.

Before describing our results, we recall relevant prior work on fooling LTFs and intersections of LTFs.
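To make Definition 1 concrete, the following toy script (ours, purely illustrative; the "generator" and test function here are arbitrary stand-ins, not the paper's construction) computes the fooling error exactly by enumerating all seeds and all inputs, which is feasible only for tiny r and n:

    import itertools

    def fooling_error(gen, F, r, n):
        """Exhaustively compute |E[F(Gen(U'))] - E[F(U)]| for tiny r and n.

        gen: maps an r-bit seed (tuple of +-1 values) to an n-bit tuple of +-1.
        F:   maps an n-bit tuple of +-1 values to a value in {-1, +1}.
        """
        seeds = list(itertools.product((-1, 1), repeat=r))
        inputs = list(itertools.product((-1, 1), repeat=n))
        pseudo = sum(F(gen(s)) for s in seeds) / len(seeds)
        truly = sum(F(x) for x in inputs) / len(inputs)
        return abs(pseudo - truly)

    # Toy example: a "generator" that just repeats its 2-bit seed twice,
    # tested against the LTF sign(x1 + x2 + x3 + x4 - 3).
    F = lambda x: 1 if sum(x) - 3 > 0 else -1
    gen = lambda s: s + s  # r = 2, n = 4
    print(fooling_error(gen, F, r=2, n=4))  # 0.375: this gen fools poorly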
Fooling a single LTF.
In [DGJ+10] Diakonikolas et al. showed that any Õ(1/δ²)-wise independent distribution over {−1, 1}^n suffices to δ-fool any LTF, and thereby gave a PRG for single LTFs with seed length Õ(1/δ²) · log n. Soon after, [MZ13] gave a more efficient PRG for LTFs with seed length O(log n + log²(1/δ)). They did this by first developing an alternative Õ(1/δ²) · log n seed length PRG for regular LTFs; these are LTFs in which no individual weight is large compared to the total size of all the weights (we give precise definitions later). [MZ13] built on this PRG for regular LTFs using structural results for LTFs and PRGs for read-once branching programs to obtain their improved O(log n + log²(1/δ)) seed length for fooling arbitrary LTFs. More recently, [GKM15] gave a PRG which δ-fools any LTF over {−1, 1}^n using seed length O(log(n/δ)(log log(n/δ))²); this is the current state-of-the-art for fooling a single LTF.

Since the approach of [MZ13] for fooling regular LTFs is important for our discussion in later sections, we describe it briefly here. The [MZ13] PRG for regular LTFs employs hashing and other techniques; its analysis crucially relies on the Berry–Esséen theorem [Ber41, Ess42]. Recall that the Berry–Esséen theorem is an "invariance principle" for the distribution of linear forms; it (or rather, a special case of it) says that for w a regular vector, the two random variables w · U and w · G, where U is uniform over {−1, 1}^n and G is drawn from the standard n-dimensional Gaussian distribution N(0, 1)^n, are close in CDF distance. Roughly speaking, the [MZ13] PRG analysis for τ-regular LTFs proceeds by showing that the limited randomness provided by their generator is sufficient to apply the Berry–Esséen theorem (over a certain set of roughly 1/τ² independent random variables). We give a more detailed description of the structure of the [MZ13] PRG in Section 2.
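As a quick numerical illustration of this regular/non-regular contrast (our own sketch, not part of the paper's argument; it assumes NumPy and SciPy are available, and the sample size and evaluation grid are arbitrary choices):

    import numpy as np
    from scipy.stats import norm

    def cdf_distance(w, trials=50_000, seed=0):
        """Monte Carlo estimate of sup_t |Pr[w.U <= t] - Pr[N(0,1) <= t]|
        for U uniform over {-1,1}^n and w rescaled to unit 2-norm."""
        rng = np.random.default_rng(seed)
        w = np.asarray(w, dtype=float)
        w = w / np.linalg.norm(w)
        s = np.sort(rng.choice([-1.0, 1.0], size=(trials, len(w))) @ w)
        ts = np.linspace(-3.0, 3.0, 601)
        emp = np.searchsorted(s, ts) / trials
        return float(np.max(np.abs(emp - norm.cdf(ts))))

    n = 100
    print(cdf_distance(np.ones(n)))                # regular vector: small
    print(cdf_distance(np.r_[100.0, np.ones(n)]))  # dominated by x_1: ~ 0.34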
Fooling intersections of regular LTFs.

Now we turn to results on fooling intersections of LTFs. Essentially simultaneously with [MZ13] (in terms of conference publication), [HKM12] gave a PRG for intersections of regular LTFs. Their PRG Õ((log k)^{7/5} τ^{2/5})-fools any intersection of k many τ-regular LTFs with seed length O((log n log k)/τ²). As we discuss in Section 2, the [HKM12] generator has the same structure as the [MZ13] PRG for regular LTFs, but with different (larger) parameter settings and a significantly more involved analysis. At the heart of the correctness proof of the [HKM12] PRG is a new invariance principle that [HKM12] prove for k-tuples (w^{(1)} · U, . . . , w^{(k)} · U) of regular linear forms, generalizing the Berry–Esséen theorem which as described above applies to a single regular linear form. With this new invariance principle in hand, to prove their PRG theorem [HKM12] argue (similar in spirit to [MZ13]) that the limited randomness provided by their generator is sufficient for their new k-dimensional invariance principle.

Note that even the k = 1 case of the invariance principle (the Berry–Esséen theorem) does not give a meaningful bound for non-regular linear forms. As a simple example, consider the trivial linear form x_1, which is highly non-regular: the two one-dimensional random variables U_1 and G_1, where U_1 is uniform over {−1, 1} and G_1 is distributed according to N(0, 1), have CDF distance ≈ 0.34.

The PRG of Gopalan, O'Donnell, Wu, and Zuckerman.
Around the same time, [GOWZ10] gave a PRG that δ-fools intersections of k arbitrary LTFs with seed length O((k log(k/δ) + log n) · log(k/δ)), and indeed δ-fools any depth-k size-s decision tree that queries LTFs at its internal nodes with seed length O((k log(ks/δ) + log n) · log(ks/δ)). Their approach builds on the PRG of [MZ13] for general LTFs; one central ingredient is a generalization of structural results for single LTFs used in [MZ13] to k-tuples of LTFs. Both this generalization, and the read-once branching program based techniques from [MZ13] (which are extended in [GOWZ10] to the context of k-tuples of LTFs), necessitate a seed length which is at least linear in k. So while the [GOWZ10] PRG is notable for being able to handle intersections of general LTFs, their seed length's linear dependence on k means that their seed length is n^{Ω(1)} whenever k = n^{Ω(1)}, and furthermore their result does not give a non-trivial PRG for intersections of k ≥ n many LTFs.

1.1.1 A conceptual challenge

We elaborate briefly on an issue related to the linear-in-k dependence of the [GOWZ10] generator discussed above. A standard approach to analyze non-regular LTFs, both in pseudorandomness and in other subfields of complexity theory such as analysis of Boolean functions and learning theory [DS13, DRST14, DSTW14, DDS16, FGRW09, CSS16], is to reduce the analysis of non-regular LTFs to that of regular LTFs via a "critical index" argument (see [Ser07]). Indeed, most previous pseudorandomness results for classes involving non-regular LTFs and PTFs (general LTFs [DGJ+10, MZ13], functions of LTFs [GOWZ10], degree-d PTFs and functions of such PTFs [DKN10, MZ13, DDS14, DS14]) make use of such a reduction to the regular case. In working with functions that involve k LTFs (or PTFs), this analysis (see [DDS14, GOWZ10]) involves "multi-critical-index" arguments, originating in [GOWZ10], which necessitate an Ω(k) seed length dependence; indeed, this linear-in-k dependence was highlighted in [HKM12] as a conceptual challenge to overcome in extending their results to intersections of k non-regular LTFs.

In this work we give the first analysis that is able to handle an interesting class of functions involving k non-regular LTFs while avoiding this linear-in-k cost that is inherent to multi-critical-index based arguments, and in fact achieving a polylogarithmic dependence on k.

1.2 Our results

It is easy to see that every LTF f : {−1, 1}^n → {−1, 1} has some representation as f(x) = sign(w · x − θ) where the coefficients w_1, . . . , w_n are all integers; a standard way of measuring the "complexity" of an LTF is by the size of its integer weights. It has been known at least since the 1960s [MTT61, Hon87, Rag88] that every n-variable LTF has an integer representation with max_i |w_i| ≤ n^{O(n)}, and Håstad has shown [Hås94] that there are LTFs that in fact require max_i |w_i| = n^{Ω(n)} for any integer representation. However, in many settings, LTFs with small integer weights are of special interest. Such LTFs are often the relevant ones in contexts such as voting systems or contexts where, e.g., biological or physical constraints may limit the size of the weights. From a more theoretical perspective, it is well known that sample complexity bounds for many commonly used LTF learning methods, such as the Perceptron and Winnow algorithms, are essentially determined by the size of the integer weights.

We say that f is a weight-t LTF if it can be represented as f(x) = sign(w · x − θ) where each w_i is an integer satisfying |w_i| ≤ t. Note that arguably the simplest and most natural LTFs (unweighted threshold functions, with the majority function as a special case) have weight 1. Our main result is an efficient PRG for fooling intersections of low-weight LTFs:
Theorem 1 (PRG for intersections of low-weight LTFs). For all values of k, t ∈ N and δ ∈ (0, 1), there is an explicit pseudorandom generator that δ-fools any intersection of k weight-t LTFs over {−1, 1}^n with seed length poly(log n, log k, t, 1/δ).

Recalling the results of [HKM12, GOWZ10] described in Section 1.1, prior to this work no explicit PRG with non-trivial seed length was known even for fooling intersections of n weight-1 LTFs to constant accuracy. (In fact, no 2^{.99n}-time algorithm was known for deterministic approximate counting of satisfying assignments of such an intersection; since such an algorithm is allowed to inspect the intersection of halfspaces which is its input, while a PRG is "input-oblivious", giving such an algorithm is an easier problem than constructing a PRG.) In contrast, our result gives an explicit PRG that fools any intersection of any quasipoly(n) number of LTFs of any polylog(n) weight to any 1/polylog(n) accuracy using seed length polylog(n). For any c > 0, it likewise gives an explicit PRG with seed length n^c that fools intersections of exp(n^{Ω(1)}) many LTFs of weight n^{Ω(1)} to accuracy 1/n^{Ω(1)}. Recalling the correspondence between intersections of LTFs and {0, 1}-integer programs, our PRG immediately yields new deterministic algorithms for approximately counting the number of feasible solutions to broad classes of {0, 1}-integer programs.
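To make the integer-program correspondence concrete, here is a small illustrative script (ours; the constraints and sizes are arbitrary toy choices). Counting feasible solutions of a {0,1}-integer program is exactly counting satisfying assignments of an intersection of low-weight LTFs; a PRG would let one replace the exhaustive sum below with a deterministic enumeration of short seeds:

    import itertools

    # A toy {0,1}-integer program with two weight-3 constraints over 12 variables.
    # Each constraint "w . x <= b" is satisfied exactly when the corresponding
    # LTF sign(w . x - b) is "true".
    W = [[1, 2, -3, 1, 0, 2, -1, 3, 1, -2, 0, 1],
         [3, -1, 1, 0, 2, -2, 1, 1, -3, 0, 2, 1]]
    b = [3, 4]

    def feasible(x):
        return all(sum(wi * xi for wi, xi in zip(w, x)) <= bi
                   for w, bi in zip(W, b))

    count = sum(feasible(x) for x in itertools.product((0, 1), repeat=12))
    print(count)  # exact count over all 2^12 points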
Our most general PRG result.

We obtain Theorem 1 as an easy consequence of a PRG that fools a more general class of intersections of LTFs. To describe this class we require some terminology. We say that a vector w over R^n is s-sparse if at most s coordinates among w_1, . . . , w_n are nonzero. We similarly say that a linear threshold function sign(w · x − θ) is s-sparse if w is s-sparse. Following [HKM12], we say that a linear form w = (w_1, . . . , w_n) with norm ‖w‖_2 := (Σ_{i=1}^n w_i²)^{1/2} is τ-regular if Σ_{i=1}^n w_i⁴ ≤ τ² ‖w‖_2⁴, and we say that a linear threshold function sign(w · x − θ) is τ-regular if the linear form w is τ-regular. Finally, we say that F : {−1, 1}^n → {−1, 1} is a (k, s, τ)-intersection of LTFs if F = F_1 ∧ · · · ∧ F_k where each F_j is an LTF which is either s-sparse or τ-regular.

Our most general PRG result is the following:

Theorem 2 (Our most general PRG, informal statement). For all values of k, s ∈ N and τ ∈ (0, 1), there is an explicit pseudorandom generator with seed length poly(log n, log k, s, 1/τ) that fools any (k, s, τ)-intersection of LTFs to accuracy δ = poly(log k, τ).

In Section 4.1 we give the formal statement of Theorem 2 and show how Theorem 1 follows from Theorem 2.
2 Overview of our techniques

As explained in Section 1.1, invariance-based arguments are not directly useful for our task of fooling intersections of low-weight LTFs, since the invariance principle does not give a non-trivial bound even for a single low-weight LTF. Nevertheless, we are able to show that a generator with the same structure as the [MZ13, HKM12] generators (but now with slightly larger parameter settings than were used in the [HKM12] generator) indeed fools any (k, s, τ)-intersection of LTFs. We do this via an analysis that brings in ingredients that are novel in the context of fooling intersections of LTFs; in particular, we use results of Bazzi [Baz07] and Razborov [Raz09] on bounded independence fooling depth-2 circuits.

How are depth-2 circuits relevant to intersections of LTFs? A starting point for our work is to re-express a (k, s, τ)-intersection of LTFs using a different representation, in which we replace each s-sparse LTF by a CNF formula computing the same function over {−1, 1}^n. The following is an immediate consequence of the fact that any s-sparse LTF depends on at most s variables:

Fact 2.1.
Let F be a (k, s, τ)-intersection of LTFs. Then F ≡ H ∧ G, where

• H is the intersection of at most k many τ-regular LTFs;

• G is a width-s CNF formula with at most k · 2^s clauses.

We refer to a function of the form H ∧ G as above as a (k, s, τ)-CnfLtf. We can thus restate our goal as that of designing a PRG to fool any (k, s, τ)-CnfLtf: with this perspective it is not surprising that pseudorandomness tools for fooling CNF formulas can be of use. (A concrete sketch of the sparse-LTF-to-CNF translation behind Fact 2.1 is given below.)
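The translation is elementary: an s-sparse LTF depends on at most s variables, so each of its falsifying assignments can be forbidden by one width-s clause. A direct Python rendering (our illustration, using the paper's convention that −1 means True):

    import itertools

    def sparse_ltf_to_cnf(w, theta):
        """Turn the s-sparse LTF sign(w . x - theta), x in {-1,1}^n, into an
        equivalent width-s CNF over its relevant variables.

        Returns a list of clauses; each clause is a list of (index, sign)
        literals, where literal (i, b) is satisfied when x_i == b. The CNF is
        satisfied (all clauses true) exactly when w . x - theta <= 0, i.e.
        exactly when the LTF outputs -1 (True)."""
        relevant = [i for i, wi in enumerate(w) if wi != 0]  # at most s indices
        clauses = []
        for vals in itertools.product((-1, 1), repeat=len(relevant)):
            total = sum(w[i] * v for i, v in zip(relevant, vals))
            if total - theta > 0:  # falsifying assignment: rule it out
                clauses.append([(i, -v) for i, v in zip(relevant, vals)])
        return clauses  # at most 2^s clauses, each of width <= s

    def cnf_value(clauses, x):
        ok = all(any(x[i] == b for (i, b) in c) for c in clauses)
        return -1 if ok else 1  # -1 = True convention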
2.1 The structure of our PRG

To describe our approach we need to explain the general structure of the PRG which is used in [MZ13] for regular LTFs, in [HKM12] for intersections of regular LTFs, and in our work for (k, s, τ)-intersections of LTFs. The construction uses an r_hash-wise independent family H of hash functions h : [n] → [ℓ], and an r_bucket-wise independent generator outputting strings in {−1, 1}^n, which we denote G. The overall generator, which we denote Gen, on input (h, X^{(1)}, . . . , X^{(ℓ)}) outputs the string

Gen(h, X^{(1)}, . . . , X^{(ℓ)}) := Y ∈ {−1, 1}^n, where Y_{h^{−1}(b)} = G(X^{(b)})_{h^{−1}(b)} for all b ∈ [ℓ].

(Here and elsewhere, for Y an n-bit string and S ⊆ [n] we write Y_S to denote the |S|-bit string obtained by restricting Y to the coordinates in S.)

The [MZ13] PRG for τ-regular LTFs instantiates this construction with

ℓ = 1/τ², r_hash = 2, and r_bucket = 4,

while the [HKM12] PRG for intersections of k many τ-regular LTFs takes

ℓ = 1/τ², r_hash = 2 log k, and r_bucket = 4 log k.

We state the exact parameter settings which we use to fool (k, s, τ)-intersections of LTFs in Section 4 (the specific values are not important for our discussion in this section).
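The following is a minimal executable sketch of this two-level structure (our illustration, with stand-in components): for readability it uses Python's seeded PRNG in place of genuinely r_hash-wise independent hash families and r_bucket-wise independent strings, which is exactly the part a real implementation would replace.

    import random

    def make_hash(hash_seed, n, ell):
        """Stand-in for a draw h <- H of an r_hash-wise independent hash [n] -> [ell]."""
        rng = random.Random(f"hash-{hash_seed}")
        return [rng.randrange(ell) for _ in range(n)]

    def bucket_string(bucket_seed, n):
        """Stand-in for G(X^(b)): an r_bucket-wise independent string in {-1,1}^n."""
        rng = random.Random(f"bucket-{bucket_seed}")
        return [rng.choice((-1, 1)) for _ in range(n)]

    def gen(hash_seed, bucket_seeds, n, ell):
        """Gen(h, X^(1), ..., X^(ell)): within bucket h^{-1}(b), copy the bits
        of the b-th independent string G(X^(b))."""
        h = make_hash(hash_seed, n, ell)
        strings = [bucket_string(s, n) for s in bucket_seeds]
        return [strings[h[i]][i] for i in range(n)]

    print(gen(hash_seed=7, bucket_seeds=[1, 2, 3, 4], n=20, ell=4))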
2.2 The [MZ13] and [HKM12] analyses

As our analysis (sketched in Section 2.3) builds on [MZ13, HKM12], in this subsection we sketch the [MZ13, HKM12] arguments establishing correctness of the PRG Gen for regular LTFs and intersections of regular LTFs.

A high-level sketch of the [MZ13] analysis showing that Gen fools any regular LTF F(x) = sign(w · x − θ) is as follows: the hash function h : [n] → [ℓ] partitions the n coefficients w_1, . . . , w_n into ℓ buckets. The pairwise independence of h ← H and the regularity of w are together used to show that each of the ℓ buckets receives essentially the same amount of "coefficient weight." The idea then is to view the sum w · Y, where Y is the output of the generator, as a sum of ℓ independent random variables (note that the inputs X^{(1)}, . . . , X^{(ℓ)} ∈ {−1, 1}^r to Gen are indeed mutually independent), one for each bucket, and use the Berry–Esséen theorem on that sum. The four-wise independence of G is used to ensure that each of the ℓ summands (the b-th summand corresponding to w_{h^{−1}(b)} · Y_{h^{−1}(b)}, the contribution from the b-th bucket) has the moment properties that are required to apply the Berry–Esséen theorem. Note that in this analysis the Berry–Esséen theorem is used as a "black box." (If the weight vector w is non-regular, then it is in general impossible for any hash function, even a fully independent one, to spread the coefficient weight out evenly among the ℓ buckets, and consequently the Berry–Esséen theorem cannot be applied, as intuitively it requires that no individual random variable summand is "too heavy" compared to the "total weight" of the sum. This is why the overall approach requires regularity.)

Since [HKM12] have to prove the k-dimensional invariance principle that they use in place of the Berry–Esséen theorem, their analysis is necessarily more involved, but at a high level it follows a similar approach to the [MZ13] analysis sketched above. A sketch of their argument that Gen fools any intersection F = F_1 ∧ · · · ∧ F_k of regular LTFs is as follows:

1. [HKM12] first argue that for any suitable smooth test function ψ : R^k → [0, 1] (a smooth proxy for the indicator 1(v_1 ≤ θ_1) · 1(v_2 ≤ θ_2) · · · 1(v_k ≤ θ_k), which corresponds to k-dimensional CDF distance), the pseudorandom distribution output by Gen fools ψ relative to an N(0, 1)^n Gaussian input to ψ. This is done by

(a) first arguing (similar to [MZ13]) that the (2 log k)-wise independent hash function h ← H and the regularity of each LTF F_j together "spread the coefficient weight" of the k LTFs roughly evenly among the ℓ buckets (we note that this part of the argument has nothing to do with the function ψ);

(b) then a hybrid argument across the ℓ buckets, using the smoothness of ψ and moment properties of the random variables corresponding to the ℓ buckets (which now follow from the (4 log k)-wise independence of G), is used to bound

| E_{Y←Gen}[ψ(w^{(1)} · Y, . . . , w^{(k)} · Y)] − E_{G←N(0,1)^n}[ψ(w^{(1)} · G, . . . , w^{(k)} · G)] |.   (1)

(Such a hybrid argument is a central ingredient in the Lindeberg-style "replacement method" proof of the Berry–Esséen theorem, and is also used in [HKM12]'s proof of their invariance principle for intersections of k regular LTFs.) We note that multidimensional Taylor's theorem plays a crucial role in bounding the difference in expectation between ψ applied to two random variables, which is done to "bound the distance" at each step of the hybrid.

2. Next [HKM12] use a particular smooth function ψ∗ based on a result of Bentkus [Ben90] and a Gaussian surface area bound for intersections of k halfspaces due to Nazarov [Naz03] to pass from fooling the smooth test function ψ∗ to fooling the "hard threshold" function corresponding to CDF distance.
This essentially amounts to using the fact that (1) is small to show that |E_{Y←Gen}[F(Y)] − E_{G←N(0,1)^n}[F(G)]| is also small. Given this, the fact that the generator fools F, i.e. that |E_{Y←Gen}[F(Y)] − E_{X←{−1,1}^n}[F(X)]| is small, follows from [HKM12]'s invariance principle, which bounds |E_{G←N(0,1)^n}[F(G)] − E_{X←{−1,1}^n}[F(X)]|. We note that this second step of [HKM12]'s analysis does not use regularity of the F_j's at all (but their invariance principle does require that each F_j is regular).

2.3 An overview of our analysis

Here we give an overview of our proof that
Gen, with suitable parameters, fools any (k, s, τ)-CnfLtf F = H ∧ G. Recall that H is an intersection of k many τ-regular LTFs and G is a (k · 2^s)-clause CNF, and that the difference between our task and that of [HKM12] is that we must handle the CNF G in addition to the intersection of regular LTFs H. While it is not difficult to see, as a consequence of [Baz07, Raz09], that the [HKM12] generator with suitable parameters (i) fools H, and (ii) fools G, it is far from clear a priori that it fools H ∧ G. We show this via a rather delicate argument, which involves a novel extension of the Lindeberg method that is at the heart of all PRGs in this line of work [GOWZ10, MZ13, HKM12]. To surmount the technical challenges that arise in our setting (which we describe next), our analysis features several new ingredients which are not present in the analyses of [GOWZ10, MZ13, HKM12], or indeed in other Lindeberg-type proofs of quantitative central limit theorems that we are aware of. The ideas in this new style of coupling-based analysis, which we outline in Section 2.3.1 below, may be of use elsewhere.

The standard Lindeberg setup, and a new challenge in our setting.

As is standard in Lindeberg-style proofs, our analysis focuses on a particular smooth test function, which for us takes k + 1 arguments and which we denote ψ∗_{k+1}. This should be thought of as the (k + 1)-variable version of the smooth function of Bentkus [Ben90], which was used by [HKM12] as mentioned in the preceding subsection. Crucially, while ψ∗_{k+1} maps all of R^{k+1} to [−1, 1], we will only ever apply it to inputs from R^k × {±1}; indeed, its last ((k + 1)-st) coordinate will always be a Boolean value which is the output of the CNF G.

The heart of our proof lies in showing that for this specific smooth test function ψ∗_{k+1} (which should be thought of as a proxy for And(sign(v_1 − θ_1), . . . , sign(v_k − θ_k), v_{k+1})), the pseudorandom distribution output by the generator fools the test function ψ∗_{k+1} relative to a uniform random input drawn from {−1, 1}^n. This is done by means of a hybrid argument, the analysis of which (like that of [GOWZ10, HKM12]) employs a multidimensional version of Taylor's theorem. However, the fact that the distinguished last coordinate of ψ∗_{k+1} always receives a {±1}-valued input (in particular, an input whose magnitude changes by a large amount, namely 2, when it does change) introduces significant challenges in using the multidimensional Taylor's theorem. Recall that Taylor's theorem quantifies the following intuition: roughly speaking, if the input to a smooth function ψ is only changed by a small amount ∆, then the resulting change in its output value, ψ(v + ∆) − ψ(v), is correspondingly small as well. Naturally, if ∆ is large then Taylor's theorem does not give useful bounds.

Taylor's theorem is the core ingredient in Lindeberg-style proofs of invariance principles (see e.g. [Tao10] and Chapter 11 of [O'D14]) and associated pseudorandomness results (see e.g. [GOWZ10, MZ13, HKM12]), where it is used to bound the distance incurred by a single step of the hybrid argument. As mentioned above, in order for Taylor's theorem to give a useful bound when it is applied to re-express ψ∗_{k+1}(v + ∆) (in terms of ψ∗_{k+1}(v), various derivatives of ψ∗_{k+1} at v, ∆, and an error term), the quantity ∆ must be "small." This is a problem in our context since the distinguished last coordinate of ψ∗_{k+1}'s argument (the output of the CNF G) is {±1}-valued, so the last coordinate of ∆ alone may already be as large as 2.
2.3.1 Our coupling-based analysis

We get around this difficulty by utilizing a carefully chosen coupling between two adjacent hybrid random variables and decomposing each of the two relevant arguments to which ψ∗_{k+1} is applied (each of which is a random variable) in a very careful way. One of these random variables is expressed as v + ∆^unif (corresponding to "filling in the current bucket uniformly at random") and the other is v + ∆^pseudo (corresponding to "filling in the current bucket pseudorandomly"); roughly speaking, in order to succeed our analysis must show that the magnitude of E[ψ∗_{k+1}(v + ∆^unif)] − E[ψ∗_{k+1}(v + ∆^pseudo)] is suitably small. The key property of the coupling we employ is that it ensures that the last coordinates of both random variables ∆^unif and ∆^pseudo are almost always zero; in fact, one of them will actually be always zero, see Equation (7). (We note that if no coupling is used then the last coordinate of ∆^pseudo can be as large as 2 with constant probability.) The existence of such a favorable coupling follows from the fact that each bucket of Gen is, by virtue of its bounded independence and the results of Bazzi [Baz07] and Razborov [Raz09], "sufficiently pseudorandom" to fool CNF formulas.

However, the way that we structure the random variables v, ∆^unif, and ∆^pseudo to ensure that the last coordinate of each ∆ is almost always small (as discussed above) introduces a new complication, which is that now the random variables v and ∆^unif are not independent (and neither are v and ∆^pseudo). This situation does not arise in standard uses of the Lindeberg method, either in proving invariance principles or in applications to pseudorandom generators. In all of these previous proofs, independence is used to show that various first derivative, second derivative, etc. terms in the Taylor expansions for the two adjacent random variables cancel out perfectly upon subtraction (using matching moments). To surmount this lack of independence, we exploit the fact that our coupling lets us re-express the coupled joint distribution (over a pair of vectors in R^k × {±1}) as a mixture of three joint distributions over pairs of (k + 1)-dimensional vectors in such a way that one component of the mixture is entirely supported on (R^k × {1}) × (R^k × {1}), one is entirely supported on (R^k × {−1}) × (R^k × {−1}), and the third has a very small mixing weight. Under each of the first two joint distributions (supported entirely on pairs that agree in the last coordinate), v and ∆^unif will indeed be independent, and so will v and ∆^pseudo.

However, performing the hybrid method using these conditional distributions presents another challenge: while now v and ∆^unif are independent (and likewise for v and ∆^pseudo), the moments of these conditional random variables may not match perfectly. We deal with this by exploiting the fact that each pseudorandom distribution that we consider "filling in a single bucket" can in fact fool, to very high accuracy, any of poly(n) many new circuits which arise in our analysis of the multidimensional Taylor expansion (intuitively, these are "slightly augmented" CNFs or DNFs). This allows us to show that while we do not get perfect cancellation, the relevant moments under the conditional distributions are adequately close to each other.
Finally, our coupling-based perspective also allows us to bound the (crucial) final error term resulting from Taylor's theorem by reducing its analysis to that of the corresponding error term in [HKM12].

The above is a sketch of how we show that Gen fools the smooth test function ψ∗_{k+1}. To pass from fooling ψ∗_{k+1} to fooling the "hard threshold" And function, we combine the [HKM12] invariance principle with a simple relationship, Claim 5.2, which we establish between the anti-concentration of the (k + 1)-dimensional input to the ψ∗_{k+1} function (with its distinguished last coordinate corresponding to outputs of the CNF) and its k-dimensional marginal which excludes the last coordinate (all coordinates of which correspond to outputs of regular linear forms, i.e. the setting of [HKM12]).

3 Preliminaries

LTFs and regularity.
We recall that a linear threshold function (LTF) is a function of the form sign(w · x − θ), where sign(z) is 1 if z > 0 and −1 otherwise. We interpret −1 as True and 1 as False throughout the paper.

We write W ∈ R^{n×k} to denote the matrix whose j-th column is the weight vector of the j-th LTF in an intersection of k LTFs. We assume that each such LTF has been normalized so that its weight vector has norm 1. For j ∈ [k] (indexing one of the LTFs) we write W^j to denote the j-th column of W (so ‖W^j‖_2 = 1 for all j), and for B ⊆ [n] (a subset of variables) we write W_B to denote the matrix formed by the rows of W with indices in B. Combining these notations, W^j_B denotes the |B|-element column vector which is obtained from W^j by taking those entries given by the indices in B. Throughout the paper we will write θ⃗ to denote the k-tuple θ⃗ = (θ_1, . . . , θ_k) ∈ R^k.

We say that a vector w ∈ R^n is τ-regular if Σ_{i=1}^n w_i⁴ ≤ τ² ‖w‖_2⁴, and that it is s-sparse if it has at most s non-zero entries. We use the same terminology to refer to an LTF sign(w · x − θ). We say that a matrix W ∈ R^{n×k} is τ-regular if each of its columns is τ-regular.

A restriction ρ fixing a subset S ⊆ [n] of the n input variables is an element of {−1, 1}^S; it corresponds to setting the variables in S in the obvious way and leaving the variables outside S free. Given an n-variable function f and a restriction ρ we write f↾ρ to denote the function obtained by setting some of the input variables as dictated by ρ.
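As a small concrete rendering of the restriction notation (ours, purely illustrative), the following helper builds f↾ρ for a function on {−1, 1}^n:

    def restrict(f, n, rho):
        """f |- rho: rho is a dict mapping a subset of {0,...,n-1} to values
        in {-1, 1}; the returned function takes values for the free coords."""
        free = [i for i in range(n) if i not in rho]
        def f_restricted(y):  # y lists values for the free coordinates, in order
            x = [rho.get(i, 0) for i in range(n)]
            for i, v in zip(free, y):
                x[i] = v
            return f(x)
        return f_restricted

    maj3 = lambda x: -1 if sum(x) < 0 else 1
    g = restrict(maj3, 3, {0: -1})   # fix x_0 = -1; g depends on x_1, x_2
    print(g([1, 1]), g([-1, 1]))     # 1, -1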
Probability background.

We recall some standard definitions of bounded-independence distributions and hash families. A distribution D over {−1, 1}^n is r-wise independent if for every 1 ≤ i_1 < · · · < i_r ≤ n and every (b_1, . . . , b_r) ∈ {−1, 1}^r, we have

Pr_{X←D}[X_{i_1} = b_1 and · · · and X_{i_r} = b_r] = 2^{−r}.

We recall the results of [Baz07, Raz09] which state that bounded-independence distributions fool CNF formulas:
Theorem 3 (Bounded independence fools depth-2 circuits). Let f be any M-clause CNF formula or M-term DNF formula. Then f is δ-fooled by any O((log(M/δ))²)-wise independent distribution.

A family H of functions from [n] to [ℓ] is said to be an r-wise independent hash family if for every 1 ≤ i_1 < · · · < i_r ≤ n and (j_1, . . . , j_r) ∈ [ℓ]^r, we have

Pr_{h←H}[h(i_1) = j_1 and · · · and h(i_r) = j_r] = ℓ^{−r}.

When S is a set the notations Pr_{X←S}[·], E_{X←S}[·] indicate that the relevant probability or expectation is over a uniform draw of X from the set S. Throughout the paper we use bold fonts such as X, U, h, etc. to indicate random variables. We write N(0, 1) to denote the standard normal distribution with mean 0 and variance 1.
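For concreteness, here is one standard way (our illustrative choice, not a construction the paper commits to) to sample an r-wise independent distribution over {−1, 1}^n for n ≤ 256: evaluate a uniformly random polynomial of degree < r over GF(2^8) at n distinct field points and keep one bit of each evaluation. Any r evaluations of such a polynomial at distinct points are independent and uniform over the field, so the extracted bits are r-wise independent.

    import random

    def gf_mul(a, b):
        """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r

    def rwise_bits(r, n, rng):
        """One draw from an r-wise independent distribution over {-1,1}^n."""
        assert n <= 256
        coeffs = [rng.randrange(256) for _ in range(r)]  # r * 8 seed bits
        out = []
        for point in range(n):
            val = 0
            for c in reversed(coeffs):          # Horner evaluation at `point`
                val = gf_mul(val, point) ^ c
            out.append(1 - 2 * (val & 1))       # low-order bit -> {-1, 1}
        return out

    print(rwise_bits(4, 10, random.Random(0)))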
Calculus.
We say that a function ψ : R^k → R is smooth if its first through fourth derivatives are uniformly bounded. For smooth ψ : R^k → R, v ∈ R^k, and j_1, . . . , j_r ∈ [k], we write (∂_{j_1,...,j_r}ψ)(x) to denote ∂_{j_1}∂_{j_2} · · · ∂_{j_r}ψ(x), and for s = 1, 2, . . . we write ‖ψ^{(s)}‖ to denote

sup_{v∈R^k} Σ_{j_1,...,j_s∈[k]} |(∂_{j_1,...,j_s}ψ)(v)|.

Given indices j_1, . . . , j_r ∈ [k], we write (j_1, . . . , j_r)! to denote s_1! s_2! · · · s_k!, where for each ℓ ∈ [k], s_ℓ denotes the number of occurrences of ℓ in (j_1, . . . , j_r). We will use the following form of multidimensional Taylor's theorem (see e.g. Fact 4.3 of [HKM12]):
Fact 3.1 (Multidimensional Taylor's theorem). Let ψ : R^k → R be smooth and let v, ∆ ∈ R^k. Then

ψ(v + ∆) = ψ(v) + Σ_{j∈[k]} (∂_j ψ)(v) ∆_j + Σ_{j≤j′∈[k]} (1/(j, j′)!) (∂_{j,j′}ψ)(v) ∆_j ∆_{j′} + Σ_{j≤j′≤j′′∈[k]} (1/(j, j′, j′′)!) (∂_{j,j′,j′′}ψ)(v) ∆_j ∆_{j′} ∆_{j′′} + err(v, ∆),

where |err(v, ∆)| ≤ ‖ψ^{(4)}‖ · max_{j∈[k]} |∆_j|⁴.

Useful results from [HKM12].
The following notation will be useful: for 0 < λ < 1, k ≥ 1, and θ⃗ = (θ_1, . . . , θ_k) ∈ R^k, we define

Inner_{k,θ⃗} = { v ∈ R^k : v_j ≤ θ_j for all j ∈ [k] },
Outer_{λ,k,θ⃗} = { v ∈ R^k : v_j ≥ θ_j + λ for some j ∈ [k] },
Strip_{λ,k,θ⃗} = R^k \ (Inner_{k,θ⃗} ∪ Outer_{λ,k,θ⃗}).

We recall the main result of [HKM12]:
Theorem 4 (Invariance principle for polytopes, Theorem 3.1 of [HKM12]). Let W ∈ R^{n×k} be τ-regular with each column W^j satisfying ‖W^j‖_2 = 1. Then for all θ⃗ ∈ R^k, we have

| Pr_{U←{−1,1}^n}[W^T U ∈ Inner_{k,θ⃗}] − Pr_{G←N(0,1)^n}[W^T G ∈ Inner_{k,θ⃗}] | = O((log k)^{7/5} (τ² log(1/τ))^{1/5}).

We will also use the following anti-concentration bound for Gaussian random variables (which is an easy consequence of the O(√log k) Gaussian surface area upper bound of Nazarov [Naz03] for intersections of k LTFs):
Theorem 5 (Anti-concentration bound for Gaussian random variables landing in a strip, Lemma 3.4 of [HKM12]). For all θ⃗ ∈ R^k and all 0 < λ < 1, we have

Pr_{G←N(0,1)^n}[W^T G ∈ Strip_{λ,k,θ⃗}] = O(λ √log k).

4 Our PRG and its parameter settings

Our PRG for (k, s, τ)-intersections of LTFs is the generator
Gen described in Section 2.1, instantiated with the following parameters:

ℓ = 1/τ², r_hash = 2 log k, r_bucket = 4 log k + O((log(M/δ_CNF))²), where M = k · 2^s and δ_CNF = 1/poly(n)

(the exact value for δ_CNF will be specified later). By standard constructions of r_hash-wise independent hash families and r_bucket-wise independent random variables, the total seed length of our generator is

O(log(n log ℓ) · r_hash + ℓ · (log n) · r_bucket) = O((1/τ²) · log n · (log k + s + log n)²) = poly(log n, log k, s, 1/τ).

We begin with our most general PRG result:
Theorem 2.
For all values of k, s ∈ N and τ ∈ (0, 1), the pseudorandom generator Gen instantiated with the parameters above fools the class of (k, s, τ)-intersections of LTFs to accuracy

δ := O((log k)^{7/5} (τ² log(1/τ))^{1/5})   (2)

with seed length poly(log n, log k, s, 1/τ).

Observation 6 (Sparse-or-regular dichotomy). Let F(x) = sign(w · x − θ) be a weight-t LTF. Then for any s, either F is s-sparse or F is (t/√(s+1))-regular.

Proof.
Suppose that F is not s-sparse; for notational convenience we may suppose that w = (w_1, . . . , w_{s′}, 0, . . . , 0) where s′ ≥ s + 1 and for 1 ≤ i ≤ s′ each w_i is a nonzero integer in {−t, . . . , t}. Normalize the weights by setting u_i = w_i/‖w‖_2 for i = 1, . . . , n. We have F(x) = sign(u · x − θ/‖w‖_2) where ‖u‖_2 = 1.

To show that F is (t/√(s+1))-regular we must show that Σ_{i=1}^n u_i⁴ ≤ t²/(s + 1). We have

Σ_{i=1}^n u_i⁴ ≤ (max_{1≤j≤s′} u_j²) · Σ_{i=1}^n u_i² = max_{1≤j≤s′} u_j² ≤ t²/‖w‖_2² ≤ t²/(s + 1),

where the last inequality holds because (s + 1)/‖w‖_2² ≤ s′/‖w‖_2² ≤ Σ_{i=1}^{s′} u_i² = 1 (each w_i being a nonzero integer, so w_i² ≥ 1).
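A quick randomized sanity check of this dichotomy (our own script, using the paper's definitions of sparsity and regularity; exact rational arithmetic avoids floating-point edge cases):

    import random
    from fractions import Fraction

    def is_s_sparse(w, s):
        return sum(wi != 0 for wi in w) <= s

    def is_tau_regular(w, tau_sq):
        """tau-regular with tau^2 = tau_sq: sum_i w_i^4 <= tau^2 * ||w||_2^4."""
        return sum(wi**4 for wi in w) <= tau_sq * sum(wi**2 for wi in w)**2

    rng = random.Random(0)
    t, s, n = 5, 10, 60
    for _ in range(10_000):
        w = [rng.randint(-t, t) for _ in range(n)]
        if any(w):
            # Observation 6: every weight-t LTF is s-sparse or (t/sqrt(s+1))-regular
            assert is_s_sparse(w, s) or is_tau_regular(w, Fraction(t * t, s + 1))
    print("sparse-or-regular dichotomy held on all samples")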
Theorem 1. For all k, t ∈ N and δ ∈ (0, 1), there is an explicit pseudorandom generator with seed length poly(log n, log k, t, 1/δ) that δ-fools any intersection of k weight-t LTFs.

Proof of Theorem 1 assuming Theorem 2. We fix

τ := Θ̃(δ^{5/2}/(log k)^{7/2})

so as to satisfy (2). By Observation 6, we have that every weight-t LTF is either τ-regular or (s := (t/τ)²)-sparse. By our choice of τ, the parameters ℓ, r_hash, and r_bucket of the pseudorandom generator Gen instantiated with our parameters are all bounded by poly(log n, log k, t, 1/δ), and hence the overall seed length is indeed

O(log(n log ℓ) · r_hash + ℓ · (log n) · r_bucket) = poly(log n, log k, t, 1/δ)

as claimed.

The remainder of this paper will be devoted to proving Theorem 2.

5 Fooling the smooth test function ψ∗_{k+1}

An intermediate goal, which in fact takes us most of the way to establishing Theorem 2, is to show that
Gen fools a particular smooth test function ψ∗_{λ,k+1,(θ⃗,0)}. In this section we define this smooth test function, establish some of its basic properties, and formally state our intermediate goal (Theorem 7 below).

5.1 The function ψ∗_{λ,k+1,(θ⃗,0)} and its basic properties

As discussed in Section 2.3, our analysis crucially features a particular smooth function ψ∗_{λ,k+1,(θ⃗,0)} : R^{k+1} → [−1, 1], a (k + 1)-dimensional version of a function due to Bentkus [Ben90]. Fact 5.1 below states the key properties of this function.

Fact 5.1 (Main result of [Ben90], see Theorem 3.5 of [HKM12]). For all positive integers k, 0 < λ < 1, and θ⃗ ∈ R^k, there exists a smooth function ψ∗_{λ,k,θ⃗} : R^k → [−1, 1] such that the following holds: for every s = 1, 2, . . . , we have ‖(ψ∗_{λ,k,θ⃗})^{(s)}‖ ≤ C_s (log(k + 1))^{s−1}/λ^s for a constant C_s depending only on s, and for all v ∈ R^k, we have

ψ∗_{λ,k,θ⃗}(v) = −1 if v ∈ Inner_{k,θ⃗}; = 1 if v ∈ Outer_{λ,k,θ⃗}; ∈ [−1, 1] otherwise (i.e. if v ∈ Strip_{λ,k,θ⃗}).   (3)
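Bentkus's construction itself is technical; purely for intuition (this is our toy surrogate, not the function of Fact 5.1, and it only attains the boundary values ±1 approximately), the following sketch builds a smooth [−1, 1]-valued function with the same qualitative behavior: ≈ −1 on Inner, ≈ +1 on Outer, interpolating across the width-λ strip, with derivatives growing like 1/λ.

    import math

    def smooth_and_proxy(v, theta, lam):
        """Toy smooth proxy for (3): close to -1 when v_j <= theta_j for all j,
        close to +1 when some v_j >= theta_j + lam (illustration only)."""
        prod = 1.0
        for vj, tj in zip(v, theta):
            # sigmoid ~1 when vj <= tj and ~0 when vj >= tj + lam;
            # its derivatives scale like 1/lam, mirroring Fact 5.1's bounds
            z = (vj - tj - lam / 2) * 12 / lam
            prod *= 0.0 if z > 700 else 1.0 / (1.0 + math.exp(z))
        return 1.0 - 2.0 * prod

    theta, lam = [0.0, 0.0], 0.1
    print(smooth_and_proxy([-1.0, -1.0], theta, lam))  # deep in Inner: ~ -1
    print(smooth_and_proxy([-1.0,  1.0], theta, lam))  # in Outer:      ~ +1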
For intuition, the test function ψ∗_{λ,k,θ⃗} : R^k → [−1, 1] may loosely be thought of as a smooth approximation to the k-variable And function; recall that on input (b_1, . . . , b_k) ∈ {−1, 1}^k, the And function outputs −1 if and only if (b_1, . . . , b_k) = (−1, . . . , −1). (We note that [HKM12] only require the s = 4 case of the above theorem (this is their Theorem 3.5), since in their framework they can obtain perfect cancellation of the first, second and third derivative terms in the relevant difference of Taylor expansions. In contrast we need to use all of the s = 1, 2, 3, 4 cases.)

In our use of ψ∗_{λ,k+1,(θ⃗,0)}, the last argument will always receive a Boolean value from {−1, 1} (corresponding to the output of the CNF G). We will use the following simple claim to control the behavior of ψ∗_{λ,k+1,(θ⃗,0)} on inputs of this sort:

Claim 5.2.
Given 0 < λ < 1, k ≥ 1, and θ⃗ ∈ R^k, let v ∈ R^k be such that v ∉ Strip_{λ,k,θ⃗}. Then both vectors (v, −1) ∈ R^{k+1} and (v, 1) ∈ R^{k+1} lie outside of Strip_{λ,k+1,(θ⃗,0)}.

Proof. If v ∈ Outer_{λ,k,θ⃗} (because of some coordinate v_j ≥ θ_j + λ), then it is clear that (v, 1) and (v, −1) both lie in Outer_{λ,k+1,(θ⃗,0)} (because of the same coordinate). So suppose that v ∈ Inner_{k,θ⃗}. The vector (v, 1) lies in Outer_{λ,k+1,(θ⃗,0)} (because of the last coordinate: 1 ≥ 0 + λ since λ < 1), and the vector (v, −1) is easily seen to lie in Inner_{k+1,(θ⃗,0)}.

5.2 Gen fools ψ∗_{λ,k+1,(θ⃗,0)}

As an intermediate step towards Theorem 2 we will first establish the following "pseudorandom generator" result for the smooth function ψ∗_{λ,k+1,(θ⃗,0)}:

Theorem 7 (Gen fools the smooth test function ψ∗_{λ,k+1,(θ⃗,0)}). Let H ∧ G be a (k, s, τ)-CnfLtf, let W ∈ R^{n×k} be the matrix of weight vectors (each of norm 1) of the τ-regular LTFs that comprise H, and let θ⃗ ∈ R^k be the vector of their thresholds (so sign(W^j · x − θ_j) is the j-th LTF). For 0 < λ < 1, let ψ∗_{λ,k+1,(θ⃗,0)} : R^{k+1} → [−1, 1] be as described in Fact 5.1. Then when Gen is instantiated with the parameters from Section 4,

| E_{Y←Gen}[ψ∗_{λ,k+1,(θ⃗,0)}(W^T Y, G(Y))] − E_{U←{−1,1}^n}[ψ∗_{λ,k+1,(θ⃗,0)}(W^T U, G(U))] |
= O( ((log k)³/λ⁴) · ( (log k)² · τ² log(1/τ) + (1/τ²) · δ_CNF · n² ) + (1/τ²) · ( √δ_CNF + Σ_{a=1}^{3} n^a √δ_CNF · (log k)^{a−1}/λ^a ) ).   (4)

6 Setup for our coupling-based hybrid argument
We begin by defining the sequence of random variables that we will use to hybridize between Y ← Gen, the n-bit pseudorandom input, and U, the n-bit uniform random input.

Definition 2 (Hybrid random variables). For any index b ∈ {0, 1, . . . , ℓ} and any hash h : [n] → [ℓ], we define the hybrid random variable X^{h,b} over {−1, 1}^n as follows: independently across each c ∈ [ℓ],

• If c > b, then the coordinates X^{h,b}_{h^{−1}(c)} of X^{h,b} are distributed according to a uniform random draw from {−1, 1}^n;

• If c ≤ b, then the coordinates X^{h,b}_{h^{−1}(c)} of X^{h,b} are distributed according to a draw from an r_bucket-wise independent random variable over {−1, 1}^n.

Let H be a (2 log k)-wise independent family of hashes h : [n] → [ℓ]. For each b ∈ {0, 1, . . . , ℓ}, the hybrid random variable X^{h,b} is defined by drawing h ← H and then taking X^{h,b} as above.
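In code, Definition 2 reads as follows (our sketch, again with seeded PRNGs standing in for the truly r_bucket-wise independent per-bucket strings of Section 4):

    import random

    def sample_hybrid(h, b, ell, n, rng):
        """One draw of the hybrid X^{h,b}: buckets 1..b are filled from the
        (stand-in) pseudorandom per-bucket strings, buckets b+1..ell uniformly.
        h is a list mapping coordinate i to its bucket in {1, ..., ell}."""
        x = [0] * n
        for c in range(1, ell + 1):
            if c <= b:  # pseudorandom bucket (stand-in for bounded independence)
                bucket_rng = random.Random(f"bucket-{c}-{rng.random()}")
                fill = lambda: bucket_rng.choice((-1, 1))
            else:       # uniform bucket
                fill = lambda: rng.choice((-1, 1))
            for i in range(n):
                if h[i] == c:
                    x[i] = fill()
        return x

    rng = random.Random(0)
    n, ell = 12, 3
    h = [rng.randrange(1, ell + 1) for _ in range(n)]
    print(sample_hybrid(h, b=0, ell=ell, n=n, rng=rng))    # b = 0: fully uniform
    print(sample_hybrid(h, b=ell, ell=ell, n=n, rng=rng))  # b = ell: output of Gen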
Note that X^{h,0} is a uniform random variable over {−1, 1}^n (indeed X^{h,0} is uniform for every fixed hash h), while X^{h,ℓ} is distributed according to Gen.

Fix a hash h : [n] → [ℓ], a bucket b ∈ [ℓ], and a restriction ρ ∈ {−1, 1}^{[n]\h^{−1}(b)} fixing the variables outside bucket h^{−1}(b). Recall that X^{h,b−1} is distributed according to the uniform distribution within h^{−1}(b), and X^{h,b} is distributed according to an r_bucket-wise independent distribution within this same bucket h^{−1}(b). For the remainder of this paper, for notational clarity, unless otherwise indicated U denotes a uniformly distributed random variable over {−1, 1}^{h^{−1}(b)} and Z denotes an r_bucket-wise independent random variable over {−1, 1}^{h^{−1}(b)}.

Our CNF-fooling-based coupling.
By the results of Bazzi and Razborov (Theorem 3) and the choice of r_bucket from Section 4, the random variable Z δ_CNF-fools G↾ρ (which, like G, is an M-clause CNF). Consequently there exists a coupling (Û, Ẑ) between U and Z such that

Pr_{(Û,Ẑ)}[(G↾ρ)(Û) ≠ (G↾ρ)(Ẑ)] ≤ δ_CNF.   (5)

(Note that this coupling depends on G↾ρ; its existence is just the standard maximal-coupling fact, since the two ±1-valued random variables (G↾ρ)(U) and (G↾ρ)(Z) take the value −1 with probabilities that differ by at most δ_CNF.)

Consider the following joint distribution over a pair of random variables (X̂^{h,b−1}(ρ), X̂^{h,b}(ρ)), both supported on {−1, 1}^n: first make a draw from (Û, Ẑ), and output (X̂^{h,b−1}(ρ), X̂^{h,b}(ρ)) where

• X̂^{h,b−1}(ρ) assigns variables according to the draw of Û within h^{−1}(b), and according to ρ outside h^{−1}(b);

• X̂^{h,b}(ρ) assigns variables according to the draw of Ẑ within h^{−1}(b), and according to ρ outside h^{−1}(b).
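For intuition, here is an explicit maximal-style coupling for a ±1-valued predicate (our illustration; in the paper the predicate is (G↾ρ)(·), and the construction depends on it): a single shared uniform variable decides the predicate value on each side, so the two sides disagree only with probability equal to the gap between their biases.

    import random

    def coupled_draw(sample_given_a, sample_given_b, p_neg_a, p_neg_b, rng):
        """One draw from a coupling of two distributions whose +-1 predicate
        biases are p_neg_a and p_neg_b; the predicate values disagree only
        w.p. |p_neg_a - p_neg_b| (inverse-CDF / maximal coupling).

        sample_given_*(c) samples an atom conditioned on predicate value c."""
        v = rng.random()  # shared uniform randomness
        ca = -1 if v < p_neg_a else 1
        cb = -1 if v < p_neg_b else 1
        return sample_given_a(ca), sample_given_b(cb)

    # Demo: two biased "coins"; disagreement rate ~ |0.50 - 0.48| = 0.02.
    rng = random.Random(0)
    draws = [coupled_draw(lambda c: c, lambda c: c, 0.50, 0.48, rng)
             for _ in range(100_000)]
    print(sum(a != b for a, b in draws) / len(draws))  # ~ 0.02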
Note that for ρ ← X^{h,b}_{[n]\h^{−1}(b)}, we have that X̂^{h,b−1}(ρ) is distributed identically to X^{h,b−1}, and likewise X̂^{h,b}(ρ) is distributed identically to X^{h,b}.

7 The hybrid argument: Proof of Theorem 7
Throughout this section for notational clarity we simply write ψ instead of ψ∗_{λ,k+1,(θ⃗,0)}. We also write F_ψ : {−1, 1}^n → [−1, 1] to denote the function

F_ψ(x) = ψ(W^T x, G(x)).

Our core technical result, which we prove in Section 8, is the following:
Lemma 7.1 (Error incurred in one step of the hybrid). For all hashes h : [n] → [ℓ], buckets b ∈ [ℓ], and restrictions ρ ∈ {−1, 1}^{[n]\h^{−1}(b)}, we have that

| E[F_ψ(X̂^{h,b−1}(ρ))] − E[F_ψ(X̂^{h,b}(ρ))] |   (6)
= O( ((log k)³/λ⁴) · ( (log k)² · h(W, b) + δ_CNF · n² ) + √δ_CNF + Σ_{a=1}^{3} n^a √δ_CNF · (log k)^{a−1}/λ^a ),

where

h(W, b) := ( Σ_{j=1}^{k} ‖W^j_{h^{−1}(b)}‖_2^{log k} )^{4/log k}.

The following corollary follows as an immediate consequence of Lemma 7.1, Remark 9, and the triangle inequality:
Corollary 7.2 (Averaging Lemma 7.1 over ρ and summing over b ∈ [ℓ]). For all hashes h : [n] → [ℓ], we have that

| E[F_ψ(X^{h,0})] − E[F_ψ(X^{h,ℓ})] | = O((log k)³/λ⁴) · (log k)² · Σ_{b=1}^{ℓ} h(W, b) + ℓ · ( O((log k)³/λ⁴) · δ_CNF · n² + √δ_CNF + Σ_{a=1}^{3} n^a √δ_CNF · (log k)^{a−1}/λ^a ).
Proof. We have that

| E[F_ψ(X^{h,0})] − E[F_ψ(X^{h,ℓ})] |
≤ Σ_{b=1}^{ℓ} | E[F_ψ(X^{h,b−1})] − E[F_ψ(X^{h,b})] |   (triangle inequality)
= Σ_{b=1}^{ℓ} | E_{ρ←X^{h,b}_{[n]\h^{−1}(b)}}[F_ψ(X̂^{h,b−1}(ρ))] − E_{ρ←X^{h,b}_{[n]\h^{−1}(b)}}[F_ψ(X̂^{h,b}(ρ))] |   (Remark 9)
≤ Σ_{b=1}^{ℓ} E_{ρ←X^{h,b}_{[n]\h^{−1}(b)}}[ | E[F_ψ(X̂^{h,b−1}(ρ))] − E[F_ψ(X̂^{h,b}(ρ))] | ],

which gives the claimed bound via Lemma 7.1.

We do not have a good bound on the quantity h(W, b) for an arbitrary hash h : [n] → [ℓ] and bucket b ∈ [ℓ]. Instead, we shall use the following:

Lemma 7.3 (Lemma 4.1 of [HKM12]). For ℓ = 1/τ² and H a (2 log k)-wise independent hash family,

E_{h←H}[ Σ_{b=1}^{ℓ} h(W, b) ] ≤ Σ_{b=1}^{ℓ} E_{h←H}[ ( Σ_{j=1}^{k} ‖W^j_{h^{−1}(b)}‖_2^{log k} )^{4/log k} ] ≤ k^{4/log k} · τ² log(1/τ).

(The middle quantity is what [HKM12] denotes by H(W) and is the quantity they bound; the left inequality is by the power-mean inequality.)

We are now ready to prove Theorem 7:

Proof of Theorem 7 assuming Lemma 7.1. We have

| E_{Y←Gen}[ψ∗_{λ,k+1,(θ⃗,0)}(W^T Y, G(Y))] − E_{U←{−1,1}^n}[ψ∗_{λ,k+1,(θ⃗,0)}(W^T U, G(U))] |
= | E[F_ψ(X^{h,0})] − E[F_ψ(X^{h,ℓ})] |   (Remark 8 and definition of F_ψ)
≤ E_{h←H}[ | E[F_ψ(X^{h,0})] − E[F_ψ(X^{h,ℓ})] | ]
= O( ((log k)³/λ⁴) · ( (log k)² · τ² log(1/τ) + (1/τ²) · δ_CNF · n² ) + (1/τ²) · ( √δ_CNF + Σ_{a=1}^{3} n^a √δ_CNF · (log k)^{a−1}/λ^a ) ),

where the final equality is by Corollary 7.2, Lemma 7.3, and recalling that ℓ = 1/τ².

8 Proof of Lemma 7.1

Fix a hash h : [n] → [ℓ], a bucket b ∈ [ℓ], and a restriction ρ ∈ {−1, 1}^{[n]\h^{−1}(b)}. As is standard in applications of the Lindeberg method, we will express F_ψ(X̂^{h,b−1}(ρ)) and F_ψ(X̂^{h,b}(ρ)) as ψ(v + ∆^unif) and ψ(v + ∆^pseudo) respectively, where v is common to both random variables. (Very roughly speaking, the Lindeberg method employs Taylor's theorem to show that quantities such as (6) are small if ∆^unif and ∆^pseudo are sufficiently "small" and ψ is sufficiently "nice.") We now describe the choice of random variables v, ∆^unif, ∆^pseudo ∈ R^{k+1} to accomplish this.

We define v : {−1, 1}^{h^{−1}(b)} → R^{k+1} as follows:

v(x)_j = Σ_{i∈[n]\h^{−1}(b)} W^j_i ρ_i for j ∈ [k],
v(x)_{k+1} = (G↾ρ)(x).

Recalling that ρ is a fixed restriction, we observe that only the final coordinate of v depends on its input x. We further define ∆^unif : {−1, 1}^{h^{−1}(b)} → R^{k+1} and ∆^pseudo : {−1, 1}^{h^{−1}(b)} × {−1, 1}^{h^{−1}(b)} → R^{k+1} as follows:

∆^unif(x)_j = Σ_{i∈h^{−1}(b)} W^j_i x_i for j ∈ [k],
∆^unif(x)_{k+1} = 0,   (7)

and
We observe that F ψ ( c X h,b − ( ρ )) ≡ ψ ( v ( U ) + ∆ unif ( U )) F ψ ( c X h,b ( ρ )) ≡ ψ ( v ( b U ) + ∆ pseudo ( b U , b Z )) , and so the desired quantity (6) of Lemma 7.1 that we wish to upper bound may be re-expressed as(6) = (cid:12)(cid:12) E [ F ψ ( c X h,b − ( ρ ))] − E [ F ψ ( c X h,b ( ρ ))] (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) E U (cid:2) ψ ( v ( U ) + ∆ unif ( U )) (cid:3) − E ( b U , b Z ) (cid:2) ψ ( v ( b U ) + ∆ pseudo ( b U , b Z )) (cid:3)(cid:12)(cid:12)(cid:12) . (8)We observe that unlike standard Lindeberg-style proofs of invariance principles and associatedpseudorandomness results, in our setup v ( U ) and ∆ unif ( U ) are not independent, and likewiseneither are v ( b U ) and ∆ pseudo ( b U , b Z ). This motivates the definitions of the following subsection. Let U denote the distribution U conditioned on outcomes x ∈ {− , } h − ( b ) such that ( G ↾ ρ )( x ) =1, and similarly U − . Equivalently, U and U − are uniform distributions over ( G ↾ ρ ) − (1) and( G ↾ ρ ) − ( −
We note that U can be expressed as the mixture of U_1 and U_{−1} with mixing weights

π_1 := Pr_U[(G↾ρ)(U) = 1] and π_{−1} := Pr_U[(G↾ρ)(U) = −1].

We may suppose without loss of generality that Pr_U[(G↾ρ)(U) = −1] ≥ Pr_Z[(G↾ρ)(Z) = −1]. We can correspondingly express the coupled pair (Û, Ẑ) as a mixture of conditional distributions (Û_1, Ẑ_1), (Û_{−1}, Ẑ_{−1}), (Û_err, Ẑ_err), where

• (Û_1, Ẑ_1) is supported on pairs (x, z) such that (G↾ρ)(x) = (G↾ρ)(z) = 1;

• (Û_{−1}, Ẑ_{−1}) is supported on pairs (x, z) such that (G↾ρ)(x) = (G↾ρ)(z) = −1;

• (Û_err, Ẑ_err) is supported on pairs (x, z) such that (G↾ρ)(x) = −1, (G↾ρ)(z) = 1.

The mixing weights are π̃_1, π̃_{−1}, and π̃_err respectively, where

π̃_1 = π_1, π̃_{−1} = π_{−1} − π̃_err, π̃_err ≤ δ_CNF,

and the bound π̃_err ≤ δ_CNF follows from (5). We stress that while Û_1 is distributed identically to U_1, this is not the case for Û_{−1} and U_{−1}, because of the small fraction of pairs that do not align perfectly under the coupling (Û, Ẑ) and are captured by (Û_err, Ẑ_err).

Proposition 8.1 (Expressing U and (Û, Ẑ) as mixtures of conditional distributions). For any function f : {−1, 1}^{h^{−1}(b)} → R,

E_U[f(U)] = π_1 E_{U_1}[f(U_1)] + π_{−1} E_{U_{−1}}[f(U_{−1})].

Similarly, for any function f : {−1, 1}^{h^{−1}(b)} × {−1, 1}^{h^{−1}(b)} → R,

E_{(Û,Ẑ)}[f(Û, Ẑ)] = π̃_1 E_{(Û_1,Ẑ_1)}[f(Û_1, Ẑ_1)] + π̃_{−1} E_{(Û_{−1},Ẑ_{−1})}[f(Û_{−1}, Ẑ_{−1})] + π̃_err E_{(Û_err,Ẑ_err)}[f(Û_err, Ẑ_err)]
= π_1 E_{(Û_1,Ẑ_1)}[f(Û_1, Ẑ_1)] + (π_{−1} − π̃_err) E_{(Û_{−1},Ẑ_{−1})}[f(Û_{−1}, Ẑ_{−1})] + π̃_err E_{(Û_err,Ẑ_err)}[f(Û_err, Ẑ_err)]
= π_1 E_{(Û_1,Ẑ_1)}[f(Û_1, Ẑ_1)] + π_{−1} E_{(Û_{−1},Ẑ_{−1})}[f(Û_{−1}, Ẑ_{−1})] ± 2δ_CNF · ‖f‖_∞.

These conditional distributions are useful because of the following two simple but crucial observations:
Observation 10 (v becomes constant). Fix c ∈ {−1, 1}. For all x ∈ supp(U_c) we have that v(x) is the same fixed vector v∗ ∈ R^{k+1}, given by

v∗_j = Σ_{i∈[n]\h^{−1}(b)} W^j_i ρ_i for j ∈ [k], v∗_{k+1} = (G↾ρ)(x) = c.

The same is true for Û_c: for all x ∈ supp(Û_c) we have v(x) = v∗.

Note that as a consequence of Observation 10, the random variables v(U_c) and ∆^unif(U_c) are independent for c ∈ {−1, 1}, and likewise v(Û_c) and ∆^pseudo(Û_c, Ẑ_c) are independent as well; cf. our remark following Equation (8). The next observation further motivates our couplings (Û_1, Ẑ_1) and (Û_{−1}, Ẑ_{−1}):

Observation 11 (∆^pseudo_{k+1} = 0). Fix c ∈ {−1, 1}. For all (Û, Ẑ) ∈ supp(Û_c, Ẑ_c), we have

∆^pseudo_{k+1}(Û, Ẑ) = (G↾ρ)(Ẑ) − (G↾ρ)(Û) = 0.

8.1.1 Massaging our goal (8)

Applying Proposition 8.1, we can rewrite the RHS of (8) as:

| E_U[ψ(v(U) + ∆^unif(U))] − E_{(Û,Ẑ)}[ψ(v(Û) + ∆^pseudo(Û, Ẑ))] |
= | ( π_1 E_{U_1}[ψ(v(U_1) + ∆^unif(U_1))] + π_{−1} E_{U_{−1}}[ψ(v(U_{−1}) + ∆^unif(U_{−1}))] )
− ( π_1 E_{(Û_1,Ẑ_1)}[ψ(v(Û_1) + ∆^pseudo(Û_1, Ẑ_1))] + π_{−1} E_{(Û_{−1},Ẑ_{−1})}[ψ(v(Û_{−1}) + ∆^pseudo(Û_{−1}, Ẑ_{−1}))] ) | ± 2δ_CNF · ‖ψ‖_∞
≤ π_1 · | E_{U_1}[ψ(v(U_1) + ∆^unif(U_1))] − E_{(Û_1,Ẑ_1)}[ψ(v(Û_1) + ∆^pseudo(Û_1, Ẑ_1))] |
+ π_{−1} · | E_{U_{−1}}[ψ(v(U_{−1}) + ∆^unif(U_{−1}))] − E_{(Û_{−1},Ẑ_{−1})}[ψ(v(Û_{−1}) + ∆^pseudo(Û_{−1}, Ẑ_{−1}))] | + 2δ_CNF,

where the final inequality uses the fact that ψ has range [−1, 1]. For c ∈ {−1, 1},

π_c · | E_{U_c}[ψ(v(U_c) + ∆^unif(U_c))] − E_{(Û_c,Ẑ_c)}[ψ(v(Û_c) + ∆^pseudo(Û_c, Ẑ_c))] | ≤ π_c · 2‖ψ‖_∞ = 2π_c,

which is at most 2√δ_CNF if π_c ≤ √δ_CNF (this is the O(√δ_CNF) term on the RHS of (6)). We subsequently assume that π_c ≥ √δ_CNF, and proceed to bound

Σ_{c∈{−1,1}} π_c · | E_{U_c}[ψ(v(U_c) + ∆^unif(U_c))] − E_{(Û_c,Ẑ_c)}[ψ(v(Û_c) + ∆^pseudo(Û_c, Ẑ_c))] |.

We proceed to analyze

E_{U_c}[ψ(v(U_c) + ∆^unif(U_c))] − E_{(Û_c,Ẑ_c)}[ψ(v(Û_c) + ∆^pseudo(Û_c, Ẑ_c))]

for c ∈ {−1, 1}. We will do so by analyzing the Taylor expansion of ψ(v + ∆) (Fact 3.1):

ψ(v + ∆) = ψ(v)   (Zeroth-order term)
+ Σ_{j∈[k+1]} (∂_j ψ)(v) ∆_j   (First-order terms)
+ Σ_{j≤j′∈[k+1]} (1/(j, j′)!) (∂_{j,j′}ψ)(v) ∆_j ∆_{j′}   (Second-order terms)
+ Σ_{j≤j′≤j′′∈[k+1]} (1/(j, j′, j′′)!) (∂_{j,j′,j′′}ψ)(v) ∆_j ∆_{j′} ∆_{j′′}   (Third-order terms)
± ‖ψ^{(4)}‖ · max_{j∈[k+1]} |∆_j|⁴.   (Error term)

Let us consider each of the five terms in the Taylor expansion, starting with the easiest one:

Proposition 8.2 (Expected difference of zeroth-order terms).

E_{U_c}[ψ(v(U_c))] − E_{(Û_c,Ẑ_c)}[ψ(v(Û_c))] = 0.

Proof.
Recalling Observation 10, we have that v(x) = v(x′) = v∗ for all x ∈ supp(U_c) and x′ ∈ supp(Û_c), where v∗ is a fixed vector in R^{k+1}. In other words, the random variables v(U_c) and v(Û_c) are both supported entirely on the same constant v∗.

8.2 The third-order terms

In this section we bound the expected difference of the third-order terms:

π_c · | E_{U_c}[ Σ_{j≤j′≤j′′∈[k+1]} (1/(j, j′, j′′)!) (∂_{j,j′,j′′}ψ)(v(U_c)) ∆^unif(U_c)_j ∆^unif(U_c)_{j′} ∆^unif(U_c)_{j′′} ]
− E_{(Û_c,Ẑ_c)}[ Σ_{j≤j′≤j′′∈[k+1]} (1/(j, j′, j′′)!) (∂_{j,j′,j′′}ψ)(v(Û_c)) ∆^pseudo(Û_c, Ẑ_c)_j ∆^pseudo(Û_c, Ẑ_c)_{j′} ∆^pseudo(Û_c, Ẑ_c)_{j′′} ] |.   (9)

We observe that in standard applications of the Lindeberg method the quantity analogous to the above quantity would be exactly zero due to matching moments (see the parenthetical following Equation (11) below). Since our setting requires that we perform the hybrid argument over the conditional distributions U_c and (Û_c, Ẑ_c) (rather than the global distributions U and (Û, Ẑ)) we no longer have matching moments, but our analysis in this section shows that the error incurred by the mismatch is acceptably small. More precisely, we will prove that (9) is at most O(n³ √δ_CNF · (log k)²/λ³). An identical argument shows that the analogous quantities for the first- and second-order terms are at most O(n √δ_CNF/λ) and O(n² √δ_CNF · (log k)/λ²) respectively.

We begin by noting that

(9) = π_c · | Σ_{j≤j′≤j′′∈[k+1]} (1/(j, j′, j′′)!) (∂_{j,j′,j′′}ψ)(v∗) · Φ(j, j′, j′′) |,
where Φ(j, j′, j′′) := E_{U_c}[ Π_{ξ∈{j,j′,j′′}} ∆^unif(U_c)_ξ ] − E_{(Û_c,Ẑ_c)}[ Π_{ξ∈{j,j′,j′′}} ∆^pseudo(Û_c, Ẑ_c)_ξ ],   (10)

where (as in Proposition 8.2) we have again used Observation 10 to get that v(U_c) ≡ v(Û_c) ≡ v∗ for a fixed vector v∗ ∈ R^{k+1}.

Observation 12 (Difference is zero if j = k + 1 participates). If k + 1 ∈ {j, j′, j′′} then Φ(j, j′, j′′) = 0.

Proof. This is because ∆^unif_{k+1} is the identically 0 function (by definition; recall Equation (7)), and ∆^pseudo_{k+1}(Û_c, Ẑ_c) = 0 for all (Û_c, Ẑ_c) ∈ supp(Û_c, Ẑ_c) (Observation 11).

Therefore it suffices to reason about Φ(j, j′, j′′) for triples j, j′, j′′ ∈ [k]. Fix any such triple. Recalling the definitions of ∆^unif_j and ∆^pseudo_j for j ∈ [k],

∆^unif(x)_j = Σ_{i∈h^{−1}(b)} W^j_i x_i, ∆^pseudo(x, z)_j = Σ_{i∈h^{−1}(b)} W^j_i z_i,

and applying linearity of expectation, we have that

Φ(j, j′, j′′) = Σ_{i,i′,i′′∈h^{−1}(b)} W^j_i W^{j′}_{i′} W^{j′′}_{i′′} ( E_{U_c}[U_{c,i} U_{c,i′} U_{c,i′′}] − E_{(Û_c,Ẑ_c)}[Ẑ_{c,i} Ẑ_{c,i′} Ẑ_{c,i′′}] ).   (11)

(Note that E[U_i U_{i′} U_{i′′}] − E[Ẑ_i Ẑ_{i′} Ẑ_{i′′}] = 0 since U and Z have matching moments. However, since we are working with the conditional distributions Û_c and (Û_c, Ẑ_c) this is no longer the case; nevertheless, we will now show that this difference is adequately small.)
The first expectation on the RHS can be expressed as 2p_unif − 1, where

  p_unif = Pr_{U^c}[ U^c_i U^c_{i′} U^c_{i″} = 1 ] = Pr_U[ U_i U_{i′} U_{i″} = 1, (G↾ρ)(U) = c ] / Pr_U[ (G↾ρ)(U) = c ],   (12)

and likewise the second expectation can be expressed as 2p_pseudo − 1, where

  p_pseudo = Pr_{(Û^c, Ẑ^c)}[ Ẑ^c_i Ẑ^c_{i′} Ẑ^c_{i″} = 1 ] = Pr_{(Û,Ẑ)}[ Ẑ_i Ẑ_{i′} Ẑ_{i″} = 1, (G↾ρ)(Û) = (G↾ρ)(Ẑ) = c ] / Pr_{(Û,Ẑ)}[ (G↾ρ)(Û) = (G↾ρ)(Ẑ) = c ].   (13)

Note that the numerator of (13) is

  Pr_Z[ Z_i Z_{i′} Z_{i″} = 1, (G↾ρ)(Z) = c ] − Pr_{(Û,Ẑ)}[ Ẑ_i Ẑ_{i′} Ẑ_{i″} = 1, (G↾ρ)(Û) = −c, (G↾ρ)(Ẑ) = c ]
  ≥ Pr_Z[ Z_i Z_{i′} Z_{i″} = 1, (G↾ρ)(Z) = c ] − Pr_{(Û,Ẑ)}[ (G↾ρ)(Û) = −c, (G↾ρ)(Ẑ) = c ]
  = Pr_Z[ Z_i Z_{i′} Z_{i″} = 1, (G↾ρ)(Z) = c ] − O(δ_CNF).   (by (5))

Likewise, the denominator of (13) is Pr_Z[ (G↾ρ)(Z) = c ] − O(δ_CNF), again by (5). Therefore, we have that

  p_pseudo = ( Pr_Z[ Z_i Z_{i′} Z_{i″} = 1, (G↾ρ)(Z) = c ] − O(δ_CNF) ) / ( Pr_Z[ (G↾ρ)(Z) = c ] − O(δ_CNF) ).

Next, we note that Z δ_CNF-fools the function (G↾ρ)(x) ⊕ β as well as the function ((G↾ρ)(x) ⊕ β) ∧ (¬(x_i ⊕ x_{i′} ⊕ x_{i″})) for i, i′, i″ ∈ h^{−1}(b) and β ∈ {−1, 1}. The former is true by Theorem 3 and the fact that r_bucket ≥ O((log(M/δ_CNF))^2), and the latter is true because

  r_bucket ≥ k + O((log(M/δ_CNF))^2) ≥ 3 + O((log(M/δ_CNF))^2).

(Observe that if a function f(x) and all of its restrictions are κ-fooled by r-wise independence, then f(x) ∧ J(x), where J is any 3-junta, is κ-fooled by (r+3)-wise independence; a sketch of this observation is given after the end of this proof.) Hence we have

  p_pseudo = ( Pr_U[ U_i U_{i′} U_{i″} = 1, (G↾ρ)(U) = c ] ± O(δ_CNF) ) / ( Pr_U[ (G↾ρ)(U) = c ] ± O(δ_CNF) ).

Since by assumption π_c = Pr_U[ (G↾ρ)(U) = c ] ≥ √δ_CNF, perturbing the numerator and denominator of (12) by an additive O(δ_CNF) changes the ratio by at most O(δ_CNF)/π_c = O(√δ_CNF), so it follows from the above and (12) that

  p_pseudo = p_unif ± O(√δ_CNF).

Recalling (11), we have shown that

  |Φ(j, j′, j″)| ≤ Σ_{i,i′,i″ ∈ h^{−1}(b)} |W_{ji} W_{j′i′} W_{j″i″}| · O(√δ_CNF) = O(n^3 √δ_CNF),

where the final equality uses the trivial bounds |W_{ji}| ≤ 1 for j ∈ [k] and i ∈ h^{−1}(b), and |h^{−1}(b)| ≤ n. We conclude that the expected difference of the third-order terms is at most

  (10) = π_c · | Σ_{j,j′,j″ ∈ [k+1]} (∂_{j,j′,j″} ψ)(v*) · Φ(j, j′, j″) | ≤ O(n^3 √δ_CNF) · Σ_{j,j′,j″ ∈ [k+1]} |(∂_{j,j′,j″} ψ)(v*)| = O(n^3 √δ_CNF) · (log k)^2 / λ^3,

where the final equality uses the bound on ‖ψ^{(3)}‖ given by Fact 5.1.

Finally we bound the contribution from the error terms. This is at most

  Σ_{c ∈ {−1,1}} ( π_c E_{U^c}[ ‖ψ^{(4)}‖ max_{j ∈ [k+1]} |Δ^{unif}(U^c)_j|^4 ] + π_c E_{(Û^c, Ẑ^c)}[ ‖ψ^{(4)}‖ max_{j ∈ [k+1]} |Δ^{pseudo}(Û^c, Ẑ^c)_j|^4 ] )
  = ‖ψ^{(4)}‖ Σ_{c ∈ {−1,1}} ( π_c E_{U^c}[ max_{j ∈ [k]} |Δ^{unif}(U^c)_j|^4 ] + π_c E_{(Û^c, Ẑ^c)}[ max_{j ∈ [k]} |Δ^{pseudo}(Û^c, Ẑ^c)_j|^4 ] ),

where this equality again uses the fact that Δ^{unif}_{k+1} is the constant 0 function (by definition; recall Equation (7)) and Δ^{pseudo}(Û^c, Ẑ^c)_{k+1} = 0 for all (Û^c, Ẑ^c) ∈ supp(Û^c, Ẑ^c) (Observation 11) to get that the max's can be taken over j ∈ [k] rather than j ∈ [k+1]. Applying both statements of Proposition 8.1, we get that the above is

  ‖ψ^{(4)}‖ ( E_U[ max_{j ∈ [k]} |Δ^{unif}(U)_j|^4 ] + E_{(Û,Ẑ)}[ max_{j ∈ [k]} |Δ^{pseudo}(Û,Ẑ)_j|^4 ] ± δ_CNF · ‖ max_{j ∈ [k]} |Δ^{pseudo}(·, ·)_j|^4 ‖_∞ )
  = O((log k)^3)/λ^4 · ( E_U[ max_{j ∈ [k]} |Δ^{unif}(U)_j|^4 ] + E_{(Û,Ẑ)}[ max_{j ∈ [k]} |Δ^{pseudo}(Û,Ẑ)_j|^4 ] + δ_CNF · (√n)^4 ),

where we used ‖Δ^{pseudo}_j‖_∞ ≤ √n for j ∈ [k] (recalling that each weight vector W_j has ‖W_j‖_2 equal to 1). Since r_bucket ≥ k, by the same hypercontractivity-based calculations as in the proof of Claim 4.4 of [HKM12] (starting at the bottom of page 15), each of the two expectations is at most O((log k)^2) · h(W, b). (We refer the reader to Section 6.2 of [HKM12] for a justification of why the r_bucket-wise independence of the distribution Ẑ suffices for the analysis of the second expectation.) This concludes the proof of Lemma 7.1.
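For completeness, here is a sketch of the 3-junta observation used in the proof above (our phrasing; the underlying conditioning argument is standard). Treat f and J as 0/1-valued, let S = {i, i′, i″} be the three coordinates that J depends on, and write f↾(S→a) for f with those coordinates fixed to a ∈ {−1,1}^3. If Z is (r+3)-wise independent, then Z_S is uniform over {−1,1}^3 and, conditioned on Z_S = a, the remaining coordinates are r-wise independent, so

  E_Z[ f(Z) ∧ J(Z) ] = Σ_{a : J(a)=1} Pr[ Z_S = a ] · E_Z[ (f↾(S→a))(Z) | Z_S = a ]
                     = Σ_{a : J(a)=1} 2^{−3} · ( E_U[ (f↾(S→a))(U) ] ± κ )
                     = E_U[ f(U) ∧ J(U) ] ± κ,

where the middle step uses that each restriction f↾(S→a) is κ-fooled by r-wise independence, and the total error is at most κ since there are at most 8 settings a, each carrying weight 2^{−3}.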
In this section we relate what we have shown so far, a bound on

  | E_{U←{−1,1}^n}[ F^{ψ*}_{λ,k+1,(θ⃗,0)}(U) ] − E_{Y←Gen}[ F^{ψ*}_{λ,k+1,(θ⃗,0)}(Y) ] |,   (14)

to the relevant quantity for Theorem 2,

  | E_{U←{−1,1}^n}[ F(U) ] − E_{Y←Gen}[ F(Y) ] |.   (15)

By [HKM12]'s Lemma 3.3, the quantity (15) is at most

  O(1) · ( (14) + Pr[ (W^T U, G(U)) ∈ Strip_{λ,k+1,(θ⃗,0)} ] ).

We bound this probability as follows:

  Pr[ (W^T U, G(U)) ∈ Strip_{λ,k+1,(θ⃗,0)} ]
  ≤ Pr[ W^T U ∈ Strip_{λ,k,θ⃗} ]   (Claim 5.2)
  ≤ Pr[ W^T G ∈ Strip_{λ,k,θ⃗} ] + O((log k)^{9/8} (τ log(1/τ))^{1/4})   ([HKM12]'s invariance principle, Theorem 4)
  = O(λ √(log k)) + O((log k)^{9/8} (τ log(1/τ))^{1/4}).   (Theorem 5)

Therefore, it follows that

  (15) = O( ((log k)^2/λ^3) · ( (log k) · τ log(1/τ) + (1/τ) · δ_CNF · n^3 ) + (1/τ) √δ_CNF + Σ_{a=1}^{3} n^a √δ_CNF · (log k)^{a−1}/λ^a ) + O(λ √(log k)) + O((log k)^{9/8} (τ log(1/τ))^{1/4}).

As in [HKM12], we choose λ = (log k)^{5/8} (τ log(1/τ))^{1/4}, which makes

  λ √(log k) = Θ( ((log k)^2/λ^3) · ( (log k) · τ log(1/τ) ) ) = Θ((log k)^{9/8} (τ log(1/τ))^{1/4}).

Since k ≤ n and τ ≥ 1/√n, a suitable choice of δ_CNF = 1/poly(n) makes the remaining quantity,

  (1/τ) · ( ((log k)^2/λ^3) · δ_CNF · n^3 + √δ_CNF ) + Σ_{a=1}^{3} n^a √δ_CNF · (log k)^{a−1}/λ^a,

at most O((log k)^{9/8} (τ log(1/τ))^{1/4}), so we get that (15) is O((log k)^{9/8} (τ log(1/τ))^{1/4}) as desired. This concludes the proof of Theorem 2.

References

[Baz07] Louay Bazzi. Polylogarithmic independence can fool DNF formulas. In Proc. 48th IEEE Symposium on Foundations of Computer Science (FOCS), pages 63–73, 2007.

[Ben90] Vidmantas Bentkus. Smooth approximations of the norm and differentiable functions with bounded support in Banach space l^k_∞. Lithuanian Math. J., 30(3):223–230, 1990.

[Ber41] Andrew C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates.
Transactions of the American Mathematical Society, 49(1):122–136, 1941.

[BK97] Avrim Blum and Ravi Kannan. Learning an intersection of a constant number of halfspaces under a uniform distribution. Journal of Computer and System Sciences, 54(2):371–380, 1997.

[Cho61] Chao-Kong Chow. On the characterization of threshold functions. In Proceedings of the Symposium on Switching Circuit Theory and Logical Design (FOCS), pages 34–38, 1961.

[CSS16] Ruiwen Chen, Rahul Santhanam, and Srikanth Srinivasan. Average-case lower bounds and satisfiability algorithms for small threshold circuits. In Proceedings of the 31st Conference on Computational Complexity (CCC), 2016.

[DDS14] Anindya De, Ilias Diakonikolas, and Rocco A. Servedio. Deterministic approximate counting for juntas of degree-2 polynomial threshold functions. In Proceedings of the 29th Annual Conference on Computational Complexity (CCC), pages 229–240. IEEE, 2014.

[DDS16] Anindya De, Ilias Diakonikolas, and Rocco A. Servedio. A robust Khintchine inequality, and algorithms for computing optimal constants in Fourier analysis and high-dimensional geometry. SIAM J. Discrete Math., 30(2):1058–1094, 2016.

[DGJ+10] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A. Servedio, and Emanuele Viola. Bounded independence fools halfspaces. SIAM Journal on Computing, 39(8):3441–3462, 2010.

[DKN10] Ilias Diakonikolas, Daniel M. Kane, and Jelani Nelson. Bounded independence fools degree-2 threshold functions. In Proc. 51st IEEE Symposium on Foundations of Computer Science (FOCS), pages 11–20, 2010.

[DRST14] Ilias Diakonikolas, Prasad Raghavendra, Rocco A. Servedio, and Li-Yang Tan. Average sensitivity and noise sensitivity of polynomial threshold functions. SIAM Journal on Computing, 43(1):231–253, 2014.

[DS13] Ilias Diakonikolas and Rocco A. Servedio. Improved approximation of linear threshold functions. Computational Complexity, 22(3):623–677, 2013.

[DS14] Anindya De and Rocco A. Servedio. Efficient deterministic approximate counting for low-degree polynomial threshold functions. In Proceedings of the 46th Annual Symposium on Theory of Computing (STOC), pages 832–841, 2014.

[DSTW14] Ilias Diakonikolas, Rocco A. Servedio, Li-Yang Tan, and Andrew Wan. A regularity lemma and low-weight approximators for low-degree polynomial threshold functions. Theory of Computing, 10:27–53, 2014.

[Ess42] Carl-Gustav Esseen. On the Liapunoff limit of error in the theory of probability. Arkiv för Matematik, Astronomi och Fysik, A:1–19, 1942.

[FGRW09] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. In Proc. 50th IEEE Symposium on Foundations of Computer Science (FOCS), pages 385–394, 2009.

[GHR92] Mikael Goldmann, Johan Håstad, and Alexander Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, 2:277–300, 1992.

[GKM15] Parikshit Gopalan, Daniel M. Kane, and Raghu Meka. Pseudorandomness via the discrete Fourier transform. In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 903–922, 2015.

[GL94] Craig Gotsman and Nathan Linial. Spectral properties of threshold functions. Combinatorica, 14(1):35–50, 1994.

[GOWZ10] Parikshit Gopalan, Ryan O'Donnell, Yi Wu, and David Zuckerman. Fooling functions of halfspaces under product distributions. In Proceedings of the 25th Annual Conference on Computational Complexity (CCC), pages 223–234, 2010.

[Hås94] Johan Håstad. On the size of weights for threshold gates. SIAM Journal on Discrete Mathematics, 7(3):484–492, 1994.

[HKM12] Prahladh Harsha, Adam R. Klivans, and Raghu Meka. An invariance principle for polytopes. J. ACM, 59(6):29:1–29:25, 2012.

[Hon87] Jiawei Hong. On connectionist models. Technical Report 87-012, Dept. of Computer Science, University of Chicago, 1987.

[Kan14] Daniel M. Kane. The average sensitivity of an intersection of half spaces. In Symposium on Theory of Computing (STOC), pages 437–440, 2014.

[KOS04] Adam Klivans, Ryan O'Donnell, and Rocco A. Servedio. Learning intersections and thresholds of halfspaces. Journal of Computer and System Sciences, 68(4):808–840, 2004.

[KOS08] Adam Klivans, Ryan O'Donnell, and Rocco A. Servedio. Learning geometric concepts via Gaussian surface area. In Proceedings of the 49th Symposium on Foundations of Computer Science (FOCS), pages 541–550, 2008.

[KS06] Adam Klivans and Alexander Sherstov. Cryptographic hardness for learning intersections of halfspaces. In Proc. 47th IEEE Symposium on Foundations of Computer Science (FOCS), pages 553–562, 2006.

[KS11] Subhash Khot and Rishi Saket. On the hardness of learning intersections of two halfspaces. J. Comput. Syst. Sci., 77(1):129–141, 2011.

[MORS09] Kevin Matulef, Ryan O'Donnell, Ronitt Rubinfeld, and Rocco A. Servedio. Testing ±1-weight halfspaces. In APPROX-RANDOM, pages 646–657, 2009.

[MORS10] Kevin Matulef, Ryan O'Donnell, Ronitt Rubinfeld, and Rocco A. Servedio. Testing halfspaces. SIAM J. Comput., 39(5):2004–2047, 2010.

[MTT61] Saburo Muroga, Iwao Toda, and Satoru Takasu. Theory of majority switching elements. J. Franklin Institute, 271(5):376–418, 1961.

[MZ13] Raghu Meka and David Zuckerman. Pseudorandom generators for polynomial threshold functions. SIAM J. Comput., 42(3):1275–1301, 2013.

[Naz03] Fedor Nazarov. On the maximal perimeter of a convex set in R^n with respect to a Gaussian measure. In Geometric Aspects of Functional Analysis (2001–2002), pages 169–187. Lecture Notes in Math., Vol. 1807, Springer, 2003.

[Nis93] Noam Nisan. The communication complexity of threshold gates. In Proceedings of Combinatorics, Paul Erdős is Eighty, pages 301–315, 1993.

[O'D14] Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014. Available at http://analysisofbooleanfunctions.org/.

[Per04] Yuval Peres. Noise stability of weighted majority, 2004. Available at http://arxiv.org/abs/math/0412377.

[Rag88] Prabhakar Raghavan. Learning in threshold networks. In First Workshop on Computational Learning Theory, pages 19–27, 1988.

[Raz92] Alexander Razborov. On small depth threshold circuits. In Proceedings of the Third Scandinavian Workshop on Algorithm Theory (SWAT), pages 42–52, 1992.

[Raz09] Alexander Razborov. A simple proof of Bazzi's theorem. ACM Trans. Comput. Theory, 1(1):3:1–3:5, February 2009.

[Ser07] Rocco A. Servedio. Every linear threshold function has a low-weight approximator. Comput. Complexity, 16(2):180–209, 2007.

[She13a] Alexander A. Sherstov. The intersection of two halfspaces has high threshold degree. SIAM J. Comput., 42(6):2329–2374, 2013.

[She13b] Alexander A. Sherstov. Optimal bounds for sign-representing the intersection of two halfspaces by polynomials. Combinatorica, 33(1):73–96, 2013.

[SO03] Jiří Šíma and Pekka Orponen. General-purpose computation with neural networks: A survey of complexity theoretic results. Neural Computation, 15(12):2727–2778, 2003.

[Tao10] Terence Tao. 254A Notes: Topics in random matrix theory. https://terrytao.wordpress.com/tag/lindeberg-replacement-trick/, 2010.

[Vem10] Santosh Vempala. A random-sampling-based algorithm for learning intersections of halfspaces. J. ACM, 57(6):32, 2010.

[Vio15] Emanuele Viola. The communication complexity of addition. Combinatorica, 35(6):703–747, 2015.