Algorithms and Lower Bounds for de Morgan Formulas of Low-Communication Leaf Gates

Abstract

The class $\mathsf{FORMULA}[s] \circ \mathcal{G}$ consists of Boolean functions computable by size-$s$ de Morgan formulas whose leaves are any Boolean functions from a class $\mathcal{G}$. We give lower bounds and (SAT, Learning, and PRG) algorithms for $\mathsf{FORMULA}[n^{1.99}] \circ \mathcal{G}$, for classes $\mathcal{G}$ of functions with low communication complexity. Let $R^{(k)}(\mathcal{G})$ be the maximum $k$-party NOF randomized communication complexity of $\mathcal{G}$. We show:

(1) The Generalized Inner Product function $\mathsf{GIP}^k_n$ cannot be computed in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ on more than a $1/2 + \varepsilon$ fraction of inputs for
\[ s = o\!\left( \frac{n^2}{\left( k \cdot 4^k \cdot R^{(k)}(\mathcal{G}) \cdot \log(n/\varepsilon) \cdot \log(1/\varepsilon) \right)^2} \right). \]
As a corollary, we get an average-case lower bound for $\mathsf{GIP}^k_n$ against $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{PTF}_{k-1}$.

(2) There is a PRG of seed length $n/2 + O\!\left( \sqrt{s} \cdot R^{(2)}(\mathcal{G}) \cdot \log(s/\varepsilon) \cdot \log(1/\varepsilon) \right)$ that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$. For $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$, we get the better seed length $O\!\left( n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon) \right)$. This gives the first non-trivial PRG (with seed length $o(n)$) for intersections of $n$ half-spaces in the regime where $\varepsilon \le 1/n$.

(3) There is a randomized $2^{n-t}$-time #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$, where
\[ t = \Omega\!\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R^{(2)}(\mathcal{G})} \right)^{1/2}. \]
In particular, this implies a nontrivial #SAT algorithm for $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{LTF}$.

(4) The Minimum Circuit Size Problem is not in $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{XOR}$. On the algorithmic side, we show that $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{XOR}$ can be PAC-learned in time $2^{O(n/\log n)}$.


Valentine Kabanets∗, Sajin Koroth†, Zhenjian Lu‡, Dimitrios Myrisiotis§, Igor C. Oliveira¶

February 21, 2020

The class $\mathsf{FORMULA}[s] \circ \mathcal{G}$ consists of Boolean functions computable by size-$s$ de Morgan formulas whose leaves are any Boolean functions from a class $\mathcal{G}$. We give lower bounds and (SAT, Learning, and PRG) algorithms for $\mathsf{FORMULA}[n^{1.99}] \circ \mathcal{G}$, for classes $\mathcal{G}$ of functions with low communication complexity. Let $R^{(k)}(\mathcal{G})$ be the maximum $k$-party number-on-forehead randomized communication complexity of a function in $\mathcal{G}$. Among other results, we show that:

• The Generalized Inner Product function $\mathsf{GIP}^k_n$ cannot be computed in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ on more than a $1/2 + \varepsilon$ fraction of inputs for
\[ s = o\!\left( \frac{n^2}{\left( k \cdot 4^k \cdot R^{(k)}(\mathcal{G}) \cdot \log(n/\varepsilon) \cdot \log(1/\varepsilon) \right)^2} \right). \]
This significantly extends the lower bounds against bipartite formulas obtained by [Tal17]. As a corollary, we get an average-case lower bound for $\mathsf{GIP}^k_n$ against $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{PTF}_{k-1}$, i.e., sub-quadratic-size de Morgan formulas with degree-$(k-1)$ PTF (polynomial threshold function) gates at the bottom. Previously, only sub-linear lower bounds were known [Nis94, Vio15] for circuits with PTF gates.

• There is a PRG of seed length $n/2 + O\!\left( \sqrt{s} \cdot R^{(2)}(\mathcal{G}) \cdot \log(s/\varepsilon) \cdot \log(1/\varepsilon) \right)$ that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$. For the special case of $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$, i.e., size-$s$ formulas with LTF (linear threshold function) gates at the bottom, we get the better seed length $O\!\left( n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon) \right)$. In particular, this provides the first non-trivial PRG (with seed length $o(n)$) for intersections of $n$ half-spaces in the regime where $\varepsilon \le 1/n$, complementing a recent result of [OST19].

• There exists a randomized $2^{n-t}$-time #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$, where
\[ t = \Omega\!\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R^{(2)}(\mathcal{G})} \right)^{1/2}. \]
In particular, this implies a nontrivial #SAT algorithm for $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{LTF}$.

• The Minimum Circuit Size Problem is not in $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{XOR}$, thereby making progress on hardness magnification, in connection with results from [OPS19, CJW19]. On the algorithmic side, we show that the concept class $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{XOR}$ can be PAC-learned in time $2^{O(n/\log n)}$.

∗ School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
† School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
‡ School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.
§ Department of Computing, Imperial College London, UK.
¶ Department of Computer Science, University of Warwick, UK.

Contents

[…] circuits
4.3.1 $\mathsf{FORMULA} \circ \mathsf{SYM}$ and $\mathsf{FORMULA} \circ \mathsf{LTF}$
4.3.2 $\mathsf{FORMULA} \circ \mathsf{XOR}$
4.3.3 $\mathsf{FORMULA} \circ \mathsf{AC}^0$
4.4 Formulas of low number-on-forehead communication leaf gates
4.4.1 Hardness based PRGs
4.4.2 MKtP lower bounds
A Proofs of useful lemmas
A.1 Useful lemmas for formulas
A.2 PRG for low-communication functions in the number-in-hand setting

1 Introduction

A (de Morgan) Boolean formula over $\{0,1\}$-valued input variables $x_1, \ldots, x_n$ is a binary tree whose internal nodes are labelled by AND or OR gates, and whose leaves are marked with a variable or its negation. The power of Boolean formulas has been intensively investigated since the early years of complexity theory (see, e.g., [Sub61, Nec66, Khr71, And87, PZ93, IN93, Hås98, Tal14, DM18]). The techniques underlying these complexity-theoretic results have also enabled algorithmic developments. These include learning algorithms [Rei11b, ST17], satisfiability algorithms (cf. [Tal15]), compression algorithms [CKK+ […]

[…] $\mathsf{FORMULA} \circ \mathcal{G}$ model, i.e., Boolean formulas whose leaves are labelled by an arbitrary function from a fixed class $\mathcal{G}$. This model unifies and generalizes a variety of models that have been previously studied in the literature:

– Oliveira, Pich, and Santhanam [OPS19] show that obtaining a refined understanding of formulas of size $n^{1+\varepsilon}$ over parity (XOR) gates would have significant consequences in complexity theory. Note that de Morgan formulas of size $n^{3+\varepsilon}$ can simulate such devices. Therefore, a better understanding of the $\mathsf{FORMULA} \circ \mathcal{G}$ model even when $\mathcal{G} = \mathsf{XOR}$ is necessary before we are able to analyze super-cubic size formulas.

– Tal [Tal17] obtains almost quadratic lower bounds for the model of bipartite formulas, where there is a fixed partition of the input variables into $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, and a formula leaf can compute an arbitrary function over either $\vec{x}$ or $\vec{y}$. This model was originally investigated by Pudlák, Rödl, and Savický [PRS88], where it was referred to as graph complexity. The model is also equivalent to PSPACE-protocols in communication complexity (cf. [GPW18]).

– Abboud and Bringmann [AB18] consider formulas where the leaves are threshold gates whose input wires can be arbitrary functions applied to either the first or the second half of the input. This extension of bipartite formulas is denoted by $\mathcal{F}_1$ in [AB18]. Their work establishes connections between faster $\mathcal{F}_1$-SAT algorithms, the complexity of problems in P such as Longest Common Subsequence and the Fréchet Distance Problem, and circuit lower bounds.

– Polytopes (i.e., intersections of half-spaces), which correspond to the case where $\mathcal{G}$ is the family of linear threshold functions and the formula contains only AND gates as internal gates. The construction of PRGs for this model has received significant attention in the literature (see [OST19] and references therein).

We obtain in a unified way several new results for the $\mathsf{FORMULA} \circ \mathcal{G}$ model, for natural classes $\mathcal{G}$ of functions which include parities, linear (and polynomial) threshold functions, and indeed many other functions of interest. In particular, we show that this perspective leads to stronger lower bounds, general satisfiability algorithms, and better pseudorandom generators for a broad class of functions.

We remark that even a single layer of XOR gates can compute powerful primitives, such as error-correcting codes and hash functions.

1.1 Results

We now describe in detail our main results and how they contrast with previous works. Our techniques will be discussed in Section 1.2, while a few open problems are mentioned in Section 1.3.

We let $\mathsf{FORMULA}[s] \circ \mathcal{G}$ denote the set of Boolean functions computed by formulas containing at most $s$ leaves, where each leaf computes according to some function in $\mathcal{G}$. The set of parity functions and their negations will be denoted by $\mathsf{XOR}$.

We use the following notation for communication complexity. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$, we let $D(f)$ be the two-party deterministic communication complexity of $f$, where each party is given an input of $n/2$ bits. For a Boolean function $g : \{0,1\}^n \to \{0,1\}$, we denote by $R^{(k)}_{\delta}(g)$ the communication cost of the best $k$-party number-on-forehead (NOF) communication protocol that computes $g$ with probability at least $1 - \delta$ on every input, where the probability is taken over the random choices of the protocol. For simplicity, we might omit the superscript $(k)$ from $R^{(k)}_{\delta}(g)$ when $k = 2$. One of our results will also consider $k$-party number-in-hand (NIH) protocols, and this will be clearly indicated in order to avoid confusion. We always assume a canonical partition of the input coordinates in all statements involving $k$-party communication complexity, unless stated otherwise. We generalize these definitions for a class of functions $\mathcal{G}$ in the natural way. For instance, we let $R^{(k)}_{\delta}(\mathcal{G}) = \max_{g \in \mathcal{G}} R^{(k)}_{\delta}(g)$.

Our results refer to standard notions in the literature, but in order to fix notation, Section 2 formally defines communication protocols, Boolean formulas, and other notions relevant in this work. We refer to the textbooks [KN97] and [Juk12] for more information about communication complexity and Boolean formulas, respectively. To put our results into context, here we only briefly review a few known upper bounds on the communication complexity of certain classes $\mathcal{G}$.

Parities ($\mathsf{XOR}$) and Bipartite Formulas. Clearly, the deterministic two-party communication complexity of any parity function is at most 2, since to agree on the output it is enough for the players to exchange the parity of their relevant input bits. Moreover, note that the bipartite formula model discussed above precisely corresponds to formulas whose leaves are computed by a two-party protocol of communication cost at most 1.
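As an illustration of the parity bound above, the following minimal Python sketch simulates the two-bit protocol. It assumes the canonical partition in which Alice holds the first half of the input and Bob the second half; the function names and the representation of a parity by its index set (`support`) are illustrative choices, not notation from the paper.

```python
from itertools import product

def parity(bits, support):
    """Parity of the input bits whose indices are listed in `support`."""
    return sum(bits[i] for i in support) % 2

def two_party_parity_protocol(x, support):
    """Simulate the 2-bit protocol. Returns (output, bits_exchanged)."""
    half = len(x) // 2
    # Alice only reads coordinates in the first half, Bob only the second half.
    alice_msg = parity(x, [i for i in support if i < half])
    bob_msg = parity(x, [i for i in support if i >= half])
    # After exchanging one bit each, both players know the XOR of the messages.
    return alice_msg ^ bob_msg, 2

def check_all_inputs(n, support):
    """Exhaustively confirm that the protocol computes the parity on n bits."""
    return all(
        two_party_parity_protocol(x, support)[0] == parity(x, support)
        for x in product((0, 1), repeat=n)
    )
```

For example, `check_all_inputs(6, [0, 2, 3, 5])` exhaustively confirms correctness on all 64 inputs, and the cost is 2 bits regardless of the support.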

Halfspaces and Polynomial Threshold Functions (PTFs). Recall that a halfspace, also known as a Linear Threshold Function (LTF), is a Boolean function of the form $\mathrm{sign}\!\left( \sum_{i=1}^{n} a_i \cdot x_i - b \right)$, where each $a_i, b \in \mathbb{R}$ and $x \in \{0,1\}^n$, and that a degree-$d$ PTF is its natural generalization where degree-$d$ monomials are allowed. It is known that if $g(x_1, \ldots, x_n)$ is a halfspace, then its randomized two-party communication complexity, namely $R^{(2)}_{\delta}(g)$, satisfies $R^{(2)}_{\delta}(g) = O(\log(n) + \log(1/\delta))$ [Nis94]. On the other hand, if $g(x_1, \ldots, x_n)$ is a degree-$d$ PTF, then $R^{(d+1)}_{\delta}(g) = O\!\left( (d \log d)(d \log n + \log(1/\delta)) \right)$ [Nis94, Vio15].

Degree-$d$ Polynomials over GF(2). It is well known that a degree-$d$ GF(2)-polynomial admits a $(d+1)$-party deterministic protocol of communication cost $d + 1$ under any variable partition, since in the number-on-forehead model each monomial is entirely seen by some player. In particular, the Inner Product function $\mathsf{IP}_n(x, y) = \sum_i x_i \cdot y_i \pmod 2$ satisfies $R^{(3)}_{1/3}(\mathsf{IP}_n) = O(1)$.

1.1.1 Lower bounds

Prior to this work, the only known lower bound against $\mathsf{FORMULA} \circ \mathsf{XOR}$ or bipartite formulas was the recent result of [Tal17] showing that $\mathsf{IP}_n$ is hard (even on average) against nearly sub-quadratic formulas. In contrast, we obtain a significantly stronger result and establish lower bounds for different Boolean functions. We define such functions next.

$\mathsf{GIP}^k_n$. The Generalized Inner Product function $\mathsf{GIP}^k_n : \{0,1\}^n \to \{0,1\}$ is defined as
\[ \mathsf{GIP}^k_n\!\left( x^{(1)}, x^{(2)}, \ldots, x^{(k)} \right) = \sum_{j=1}^{n/k} \bigwedge_{i=1}^{k} x^{(i)}_j \pmod 2, \]
where $x^{(i)} \in \{0,1\}^{n/k}$ for each $i \in [k]$.

MKtP. In the Minimum Kt Problem, where Kt refers to Levin's time-bounded Kolmogorov complexity, we are given a string $x \in \{0,1\}^n$ and a string $1^{\ell}$. We accept $(x, 1^{\ell})$ if and only if $\mathsf{Kt}(x) \le \ell$. (For a string $x \in \{0,1\}^*$, $\mathsf{Kt}(x)$ denotes the minimum value $|M| + \log t$ taken over $M$ and $t$, where $M$ is a machine that prints $x$ when it computes for $t$ steps, and $|M|$ is the description length of $M$ according to a fixed universal machine $U$.)

MCSP. In the Minimum Circuit Size Problem, we are given as input the description of a Boolean function $f : \{0,1\}^{\log n} \to \{0,1\}$ (represented as an $n$-bit string), and a string $1^{\ell}$. We accept $(f, 1^{\ell})$ if and only if the circuit complexity of $f$ is at most $\ell$.

Theorem 1 (Lower bounds). The following unconditional lower bounds hold:

1. If $\mathsf{GIP}^k_n$ is $(1/2 + \varepsilon)$-close under the uniform distribution to a function in $\mathsf{FORMULA}[s] \circ \mathcal{G}$, then
\[ s = \Omega\!\left( \frac{n^2}{k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2n^2)}(\mathcal{G}) + \log n \right)^2 \cdot \log^2(1/\varepsilon)} \right). \]

2. If $\mathsf{MKtP} \in \mathsf{FORMULA}[s] \circ \mathcal{G}$, then
\[ s = \widetilde{\Omega}\!\left( \frac{n^2}{k^2 \cdot 16^k \cdot R^{(k)}_{1/3}(\mathcal{G})^2} \right). \]

3. If $\mathsf{MCSP} \in \mathsf{FORMULA}[s] \circ \mathsf{XOR}$, then $s = \widetilde{\Omega}(n^2)$, where $\widetilde{\Omega}$ hides inverse $\mathrm{polylog}(n)$ factors.

Observe that, while [Tal17] showed that the Inner Product function $\mathsf{IP}_n$ is hard against sub-quadratic bipartite formulas, Theorem 1 Item 1 yields lower bounds against formulas whose leaves can compute bounded-degree PTFs and GF(2)-polynomials, including $\mathsf{IP}_n$. PTF circuits were previously studied by Nisan [Nis94], who obtained an almost linear $\Omega_d(n^{1 - o(1)})$ gate complexity lower bound against circuits with degree-$d$ PTF gates. Recently, [KKL17] gave a super-linear wire complexity lower bound for constant-depth circuits with constant-degree PTF gates. However, it was open whether we can prove lower bounds against any circuit model that can incorporate a
[…] $\mathsf{AND} \circ \mathsf{PTF}$.

Let us now comment on the relevance of Items 2 and 3. Both MCSP and MKtP are believed to be computationally much harder than $\mathsf{GIP}^k_n$. However, it is more difficult to analyze these problems compared to $\mathsf{GIP}^k_n$ because the latter is mathematically "structured," while the former problems do not seem to be susceptible to typical algebraic, combinatorial, and analytic techniques.

More interestingly, MCSP and MKtP play an important role in the theory of hardness magnification (see [OPS19, CJW19]). In particular, if one could show that MCSP restricted to an input parameter $\ell \le n^{o(1)}$ is not in $\mathsf{FORMULA}[n^{1+\varepsilon}] \circ \mathsf{XOR}$ for some $\varepsilon > 0$, then it would follow that NP cannot be computed by Boolean formulas of size $n^c$, where $c \in \mathbb{N}$ is arbitrary. Theorem 1 makes partial progress in this direction by establishing the first lower bounds for these problems in the $\mathsf{FORMULA} \circ \mathcal{G}$ model. (We note that the proof of Theorem 1 Item 3 requires instances where the parameter $\ell$ is $n^{\Omega(1)}$.)

We also get pseudorandom generators (PRGs) against $\mathsf{FORMULA} \circ \mathcal{G}$ for various classes of functions $\mathcal{G}$. Recall that a PRG against a class of functions $\mathcal{C}$ is a function $G$ mapping short Boolean strings (seeds) to longer Boolean strings, so that every function in $\mathcal{C}$ accepts $G$'s output on a uniformly random seed with about the same probability as that for an actual uniformly random string. More formally, $G : \{0,1\}^{\ell} \to \{0,1\}^n$ is a PRG that $\varepsilon$-fools $\mathcal{C}$ if for every Boolean function $h : \{0,1\}^n \to \{0,1\}$ in $\mathcal{C}$, we have
\[ \left| \Pr_{z \in \{0,1\}^{\ell}}[h(G(z)) = 1] - \Pr_{x \in \{0,1\}^n}[h(x) = 1] \right| \le \varepsilon. \]
Furthermore, we require $G$ to run in deterministic time $\mathrm{poly}(n)$ on an input string $z \in \{0,1\}^{\ell}$. The parameter $\ell = \ell(n)$ is called the seed length of the PRG and is the main quantity to be minimized when constructing PRGs.

There exists a PRG that fools formulas of size $s$ and that has a seed of length $s^{1/3 + o(1)}$ [IMZ12]. In particular, there are non-trivial PRGs for $n$-variate formulas of size nearly $n^3$. Unfortunately, such PRGs cannot be used to fool even linear size formulas over parity functions, since the naive simulation of these enhanced formulas by standard Boolean formulas requires size $n^3$. Moreover, it is not hard to see that this simulation is optimal: Andreev's function, which is hard against formulas of nearly cubic size (cf. [Hås98]), can be easily computed in $\mathsf{FORMULA}[O(n)] \circ \mathsf{XOR}$. Given that a crucial idea in the construction of the PRG in [IMZ12] (shrinkage under restrictions) comes from this lower bound proof, new techniques are needed in order to approach the problem in the $\mathsf{FORMULA} \circ \mathsf{XOR}$ model.

More generally, extending a computational model for which strong PRGs are known to allow parities at the bottom layer can cause significant difficulties. A well-known example is $\mathsf{AC}^0$ circuits and their extension to $\mathsf{AC}^0$-$\mathsf{XOR}$. While the former class admits PRGs of poly-logarithmic seed length (see e.g. [ST19]), the most efficient PRG construction for the latter has seed length $(1 - o(1)) \cdot n$ [FSUV13]. Consequently, designing PRGs of seed length $\le (1 - \Omega(1)) \cdot n$ can already be a challenge.

We are not aware of previous results on PRGs for $\mathsf{FORMULA} \circ \mathcal{G}$ for any non-trivial class $\mathcal{G}$. By combining ideas from circuit complexity and communication complexity, we construct PRGs of various seed lengths for $\mathsf{FORMULA} \circ \mathcal{G}$, where $\mathcal{G}$ ranges from the class of parity functions to the much larger class of functions of bounded randomized $k$-party communication complexity.

Theorem 2 (Pseudorandom generators). Let $\mathcal{G}$ be a class of $n$-bit functions. Then:

1. In the context of parity functions, there is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$ of seed length
\[ \ell = O\!\left( \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon) + \log(n) \right). \]

2. In the context of two-party randomized communication complexity, there is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ of seed length
\[ \ell = n/2 + O\!\left( \sqrt{s} \cdot \left( R^{(2)}_{\varepsilon/(6s)}(\mathcal{G}) + \log(s) \right) \cdot \log(1/\varepsilon) \right). \]
More generally, for every $k(n) \ge 2$, let $\mathcal{G}$ be the class of functions that have $k$-party number-in-hand (NIH) $(\varepsilon/(6s))$-error randomized communication protocols of cost at most $R^{(k\text{-NIH})}_{\varepsilon/(6s)}$. There exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ with seed length
\[ \ell = n/k + O\!\left( \sqrt{s} \cdot \left( R^{(k\text{-NIH})}_{\varepsilon/(6s)} + \log(s) \right) \cdot \log(1/\varepsilon) + \log(k) \right) \cdot \log(k). \]

3. In the setting of $k$-party NOF randomized communication complexity, there is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ of seed length
\[ \ell = n - \frac{n}{O\!\left( \sqrt{s} \cdot k \cdot 4^k \cdot \left( R^{(k)}_{\varepsilon/(2s)}(\mathcal{G}) + \log(n) \right) \cdot \log(n/\varepsilon) \right)}. \]

A few comments are in order. Under a standard connection between PRGs and lower bounds (see e.g. [Kab02]), improving the dependence on $s$ in the seed length for $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$ (Theorem 2 Item 1) would require the proof of super-quadratic lower bounds against $\mathsf{FORMULA} \circ \mathsf{XOR}$. We discuss this problem in more detail in Section 1.3. Note that the additive term $n/2$ […] $\ell \ge (1 - 1/k) \cdot n$ in Theorem 2 Item 3. Removing the exponential dependence on $k$ would also require advances in state-of-the-art lower bounds for multiparty communication complexity.

Theorem 2 Item 2 has an interesting implication for fooling a well-studied class of functions: intersections of halfspaces. (Clearly, the intersection of $s$ functions can be computed by an enhanced formula of size $s + 1$.) Note that an intersection of halfspaces is precisely a polytope, or equivalently, the set of solutions of a 0-1 integer linear program. Such objects have found applications in many fields, including optimization and high-dimensional geometry. After a long sequence of works on the construction of PRGs for bounded-weight halfspaces, (unrestricted) halfspaces, and generalizations of these classes, the following results are known for the intersection of $m$ halfspaces over $n$ input variables. (We refer to the recent reference [OST19] for an extensive review of the literature in this area.) Gopalan, O'Donnell, Wu, and Zuckerman [GOWZ10] gave a PRG for this class for error $\varepsilon$ with seed length
\[ O\!\left( \left( m \cdot \log(m/\varepsilon) + \log n \right) \cdot \log(m/\varepsilon) \right). \]
Note that the seed length of their PRG becomes trivial if the number of halfspaces is linear in $n$. More recently, O'Donnell, Servedio and Tan [OST19] constructed a PRG with seed length
\[ \mathrm{poly}(\log(m), 1/\varepsilon) \cdot \log(n). \]
[…] $m$, but it cannot be used in the small error regime. For example, the seed length becomes trivial if $\varepsilon = 1/n$. In particular, before this work it was open to construct a non-trivial PRG for the following natural setting of parameters (cf. [OST19, Section 1.2]): intersection of $n$ halfspaces with error $\varepsilon = 1/n$.

We obtain the following consequence of Theorem 2 Item 2, which follows from a result of Viola [Vio15] on the $k$-party number-in-hand randomized communication complexity of a halfspace.
Corollary 3 (Fooling intersections of halfspaces in the low-error regime). For every $n, m \in \mathbb{N}$ and $\varepsilon > 0$, there is a pseudorandom generator with seed length
\[ O\!\left( n^{1/2} \cdot m^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon) \right) \]
that $\varepsilon$-fools the class of intersections of $m$ halfspaces over $\{0,1\}^n$.

We note that the PRG from Theorem 2 Item 3 can fool, even in the exponentially small error regime, not only intersections of halfspaces, but also small formulas over bounded-degree PTFs.

Finally, Theorem 2 Item 2 yields the first non-trivial PRG for formulas over symmetric functions. Let $\mathsf{SYM}$ denote the class of symmetric Boolean functions on any number of input variables.
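One way to see why symmetric leaf gates fit the low-communication framework is that a symmetric function depends only on the Hamming weight of its input. The Python sketch below simulates a simple deterministic two-party protocol based on this observation. It is an illustration only: the protocol, the lookup-table encoding `weight_to_output`, and the function names are assumptions made here, and the corollary that follows relies on the paper's own multi-party analysis rather than on this toy protocol.

```python
from itertools import product
from math import ceil, log2

def symmetric_protocol(x, weight_to_output):
    """Two-party protocol for a symmetric function on len(x) bits.

    weight_to_output[w] is the function value on inputs of Hamming weight w.
    Returns (output, bits_exchanged).
    """
    half = len(x) // 2
    # Alice sends the Hamming weight of her half: a number in 0..half.
    alice_weight = sum(x[:half])
    message_bits = ceil(log2(half + 1))
    # Bob adds the weight of his half and announces the one-bit answer.
    bob_weight = sum(x[half:])
    output = weight_to_output[alice_weight + bob_weight]
    return output, message_bits + 1

def majority_table(n):
    """Lookup table of the (strict) majority function on n bits."""
    return [1 if 2 * w > n else 0 for w in range(n + 1)]
```

On $n$ input bits the simulated cost is $\lceil \log_2(n/2 + 1) \rceil + 1$ bits, i.e., logarithmic in $n$.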

Corollary 4 (Fooling sub-quadratic formulas over symmetric gates). For every $n, s \in \mathbb{N}$ and $\varepsilon > 0$, there is a pseudorandom generator with seed length
\[ O\!\left( n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(1/\varepsilon) \right) \]
that $\varepsilon$-fools $n$-variate Boolean functions in $\mathsf{FORMULA}[s] \circ \mathsf{SYM}$.

Prior to this work, Chen and Wang [CW19] proved that the number of satisfying assignments of an $n$-variate formula of size $s$ over symmetric gates can be approximately counted to an additive error term $\le \varepsilon \cdot 2^n$ in deterministic time $\exp\!\left( n^{1/2} \cdot s^{1/4 + o(1)} \cdot \sqrt{\log(n) + \log(s)} \right)$, where $\varepsilon >$ […]

In the #SAT problem for a class $\mathcal{C}$, we are given as input the description of a computational device $D(x_1, \ldots, x_n)$ from $\mathcal{C}$, and the goal is to count the number of satisfying assignments for $D$. This generalizes the SAT problem for $\mathcal{C}$, where it is sufficient to decide whether $D$ is satisfiable by some assignment.

In this section, we show that […] $\mathsf{FORMULA} \circ \mathcal{G}$ model for classes $\mathcal{G}$ that admit two-party communication protocols of bounded cost. We establish a general result in this context which can be used to obtain algorithms for previously studied classes of Boolean circuits.

To put our #SAT algorithm for $\mathsf{FORMULA} \circ \mathcal{G}$ into context, we first mention relevant related work on the satisfiability of Boolean formulas. Recall that in the very restricted setting of CNF formulas, known algorithms run (in the worst case) in time $2^{n - o(n)}$ when the input formulas can have a super-linear number of clauses (cf. [DH09]). On the other hand, for the class of general formulas, there is a better-than-brute-force algorithm for formulas of size almost $n^3$. In more detail, for any $\varepsilon > 0$, there is a deterministic #SAT algorithm for $\mathsf{FORMULA}[n^{3 - \varepsilon}]$ that runs in time $2^{n - n^{\Omega(\varepsilon)}}$. […] $\mathsf{FORMULA} \circ \mathsf{XOR}$.

Before stating our results, we discuss the input encoding in the #SAT problem for $\mathsf{FORMULA} \circ \mathcal{G}$. The top formula $F$ is represented in some canonical way, while for each leaf $\ell$ of $F$, the input string contains the description of a protocol $\Pi_{\ell}$ computing a function in $\mathcal{G}$. Our results are robust to the encoding employed for $\Pi_{\ell}$. Recall that a protocol for a two-party function is specified by a protocol tree and a sequence of functions, where each function is associated with some internal node of the tree and depends on $n/2$ input bits. Since a protocol of communication cost $o(n)$ has a protocol tree containing at most $2^{o(n)}$ nodes, it can be specified by a string of length $2^{n/2 + o(n)}$. Our algorithms will run in time closer to $2^n$, and using a fully explicit input representation for the protocols is not an issue. Another possibility for the input representation is to use "computationally efficient" protocols. Informally, the next-bit messages of such protocols can be computed in polynomial time from the current transcript of the protocol and a player's input. An advantage of this representation is that an input to our […] $\mathcal{G}$.

Theorem 5 (Satisfiability algorithms). The following results hold.

1. There is a deterministic #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$ that runs in time $2^{n - t}$, where
\[ t = \Omega\!\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot D(\mathcal{G})} \right). \]

2. There is a randomized #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathcal{G}$ that runs in time $2^{n - t}$, where
\[ t = \Omega\!\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R_{1/3}(\mathcal{G})} \right)^{1/2}. \]

Theorem 5 readily provides algorithms for many circuit classes. For instance, since one can effectively describe a randomized communication protocol for linear threshold functions [Nis94, Vio15], the algorithm from Theorem 5 Item 2 can be used to count the number of satisfying assignments of Boolean devices from $\mathsf{FORMULA}[n^{1.99}] \circ \mathsf{LTF}$.

Corollary 6. There is a randomized #SAT algorithm for $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$ that runs in time $2^{n - t}$, where
\[ t = \Omega\!\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot \log(n)} \right)^{1/2}. \]

In connection with Corollary 6, prior to this work essentially two lines of research have been pursued.

[…] ACC circuits with two layers of LTFs at the bottom, assuming a sub-quadratic number of them in the layer next to the input variables (see [ACW16] for this result and further related work). Corollary 6 seems to provide the first non-trivial SAT algorithm that operates with unbounded-depth Boolean devices containing a layer with a sub-quadratic number of LTFs.

Theorem 5 can be seen as a generalization of several approaches to designing SAT algorithms appearing in the literature, which often employ ad-hoc constructions to convert bottlenecks in the computation of devices from a class $\mathcal{C}$ into non-trivial SAT algorithms for $\mathcal{C}$. We observe that, before this work, [PW10] had made a connection between faster SAT algorithms for CNFs and the 3-party communication complexity of a specific function. Their setting is different though: it seems to work only for CNFs, and they rely on conjectured upper bounds on the communication complexity of a particular problem. More recently, [CW19] employed quantum communication protocols to design approximate counting algorithms for several problems. In comparison to previous works, to our knowledge Theorem 5 is the first unconditional result that yields faster […]

We describe a learning algorithm for the $\mathsf{FORMULA} \circ \mathsf{XOR}$ class in Leslie Valiant's challenging PAC-learning model [Val84]. Recall that a (PAC) learning algorithm for a class of functions $\mathcal{C}$ has access to labelled examples $(x, f(x))$ from an unknown function $f \in \mathcal{C}$, where $x$ is sampled according to some (also unknown) distribution $\mathcal{D}$. The goal of the learner is to output, with high probability over its internal randomness and over the choice of random examples (measured by a confidence parameter $\delta$), a hypothesis $h$ that is close to $f$ under $\mathcal{D}$ (measured by an error parameter $\varepsilon$). We refer to [KV94] for more information about this learning model, and to Section 2 for its standard formalization.

It is known that formulas of size $s$ can be PAC-learned in time $2^{\widetilde{O}(\sqrt{s})}$ [Rei11b]. Therefore, formulas of almost quadratic size can be non-trivially learned from random samples of an arbitrary distribution. A bit more formally, we say that a learning algorithm is non-trivial if it runs in time $2^n / n^{\omega(1)}$, i.e., noticeably faster than the trivial brute-force algorithm that takes time $2^n \cdot \mathrm{poly}(n)$. Obtaining non-trivial learning algorithms for various circuit classes is closely connected to the problem of proving explicit lower bounds against the class [OS17] (see also [ST17] for a systematic investigation of such algorithms). We are not aware of the existence of non-trivial learning algorithms for super-quadratic size formulas. However, it seems likely that such algorithms exist at least for formulas of near cubic size. As explained in Section 1.1.2, this would still be insufficient for the learnability of classes such as (linear size) $\mathsf{FORMULA} \circ \mathsf{XOR}$.

We explore structural properties of $\mathsf{FORMULA} \circ \mathsf{XOR}$ employed in previous results and boosting techniques from learning theory to show that sub-quadratic size devices from this class can be PAC-learned in time $2^{O(n/\log n)}$.

Theorem 7 (PAC-learning $\mathsf{FORMULA} \circ \mathsf{XOR}$ in sub-exponential time). For every constant $\gamma > 0$, there is an algorithm that PAC learns the class of $n$-variate Boolean functions $\mathsf{FORMULA}[n^{2 - \gamma}] \circ \mathsf{XOR}$ to accuracy $\varepsilon$ and with confidence $\delta$ in time $\mathrm{poly}\!\left( 2^{n/\log n}, 1/\varepsilon, \log(1/\delta) \right)$.

Note that a sub-exponential running time cannot be achieved for $\mathsf{FORMULA} \circ \mathcal{G}$ when we consider the communication complexity of $\mathcal{G}$. Again, the class is too large, for the same reason discussed in

(Footnotes from the preceding discussion. Recall that approximately counting satisfying assignments is substantially easier than solving […] $(1 - o(1))n$. It has been brought to our attention that Avishay Tal has independently discovered a SAT algorithm for bipartite formulas of sub-quadratic size (see the discussion in [AB18, Page 7]), which corresponds to a particular case of Theorem 5.)

[…] $\mathsf{FORMULA} \circ \mathsf{XOR}$.

In contrast to the algorithm mentioned above that learns (standard) formulas of size $s \le n^{2 - o(1)}$ in time $2^{\widetilde{O}(\sqrt{s})}$, the algorithm from Theorem 7 does not learn smaller formulas over parities in time faster than $2^{O(n/\log n)}$. We discuss this in more detail in Sections 1.2 and 1.3.

Finally, we mention a connection to cryptography that provides a conditional upper bound on the size of $\mathsf{FORMULA} \circ \mathsf{XOR}$ circuits that can be learned in time $2^{o(n)}$. It is well known that if a circuit class $\mathcal{C}$ can compute pseudorandom functions (or some variants of this notion), then it cannot be learned in various learning models (see e.g. [KV94]). It has been recently conjectured that depth-two $\mathsf{MOD} \circ \mathsf{XOR}$ circuits of linear size can compute weak pseudorandom functions of exponential security [BIP+18, Conjecture 3.7]. If this conjecture holds, then such circuits cannot be learned in time $2^{o(n)}$. Since MOD gates over a linear number of input wires can be simulated by formulas of size at most $O(n^{[\ldots]})$ [Ser17], under this cryptographic assumption it is not possible to learn $\mathsf{FORMULA}[n^{[\ldots]}] \circ \mathsf{XOR}$ in time $2^{o(n)}$, even if the learner only needs to succeed under the uniform distribution.

In order to explain our techniques, we focus for the most part on the design of PRGs for $\mathsf{FORMULA} \circ \mathcal{G}$ when $\mathcal{G}$ is of bounded two-party randomized communication complexity (a particular case of Theorem 2 Item 2). This proof makes use of various ingredients employed in other results. After sketching this argument, we say a few words about our strongest lower bound (Theorem 1 Item 1) and the satisfiability and learning algorithms (Theorems 5 and 7, respectively).

We build on a powerful result showing that any small de Morgan formula can be approximated pointwise by a low-degree polynomial:

(A) For every formula $F(y_1, \ldots, y_m)$ of size $s$, there is a polynomial $p(y_1, \ldots, y_m) \in \mathbb{R}[y_1, \ldots, y_m]$ of degree $O(\sqrt{s})$ such that $|F(a) - p(a)| \le 1/10$ on every $a \in \{0,1\}^m$.

The only known proof of this result [Rei11b] relies on a sequence of works [BBC+01, LLS06, HLS07, FGG08, Rei09, ACR+10, RS12] on quantum query complexity, generalizing Grover's search algorithm for the OR predicate [Gro96] to arbitrary formulas. The starting point of many of our results is a consequence of (A) which is implicit in the work of Tal [Tal17].

(B) Let $\mathcal{D}$ be a distribution over $\{0,1\}^m$, and $F \in \mathsf{FORMULA}[s] \circ \mathcal{G}$. Then, for every function $f$,
\[ \text{if } \Pr_{x \sim \mathcal{D}}[F(x) = f(x)] \ge 1/2 + \varepsilon \text{ then } \Pr_{x \sim \mathcal{D}}[h(x) = f(x)] \ge 1/2 + \exp(-t) \]
for some function $h$ which is the XOR of at most $t$ functions in $\mathcal{G}$, where $t = \widetilde{\Theta}(\sqrt{s} \cdot \log(1/\varepsilon))$.

Intuitively, if we could understand well enough the XOR of any small collection of functions in $\mathcal{G}$, then we can translate this into results for $\mathsf{FORMULA}[s] \circ \mathcal{G}$, as long as $s \ll n^2$. We adapt the techniques behind (B) to provide a general approach to constructing PRGs against $\mathsf{FORMULA} \circ \mathcal{G}$:

Main PRG Lemma. In order for a distribution $\mathcal{D}$ to $\varepsilon$-fool the class $\mathsf{FORMULA}[s] \circ \mathcal{G}$, it is enough for it to $\exp(-t)$-fool the class $\mathsf{XOR}_t \circ \mathcal{G}$, where $t = \widetilde{\Theta}(\sqrt{s} \cdot \log(1/\varepsilon))$.

Recall that, in Theorem 2 Item 2, we consider a class $\mathcal{G}$ of functions that admit two-party randomized protocols of cost $R = R^{(2)}_{\varepsilon/(6s)}(\mathcal{G})$. It is easy to see that the XOR of any $t$ functions from $\mathcal{G}$ is a function that can be computed by a protocol of cost at most $t \cdot R$. Thus the lemma above shows that it is sufficient to fool, to exponentially small error, a class of functions of bounded two-party randomized communication complexity. Moreover, since a randomized protocol can be written as a convex combination of deterministic protocols, it is possible to prove that fooling functions of bounded deterministic communication complexity is enough.

Pseudorandom generators in the two-party communication model have been known since [INW94]. Their construction exploits that the Boolean matrix associated with a function of small communication cost can be partitioned into a not too large number of monochromatic rectangles. We provide in Appendix A.2 a slightly modified and self-contained construction based on explicit extractors. It achieves the following parameters: There is an explicit PRG that $\delta$-fools any $n$-bit function of two-party communication cost $D$ and that has seed length $n/2 + O(D + \log(1/\delta))$. This PRG has non-trivial seed length even when the error is exponentially small, as required by our techniques. One issue here is that the INW PRG was only shown to fool functions with low deterministic communication complexity. To obtain our PRGs for $\mathsf{FORMULA} \circ \mathcal{G}$ when $\mathcal{G}$ admits low-cost randomized protocols, we first extend the analysis of the INW PRG to show that it also fools functions with low randomized communication complexity.
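The passage from randomized to deterministic protocols mentioned above rests on a short averaging argument, sketched here under simplifying assumptions (this is an illustration, not the paper's proof). The symbols are introduced only for this sketch: $\pi_r$ is the deterministic protocol obtained from a public-coin protocol by fixing its random string to $r$, and $G$ is a generator that $\delta$-fools every deterministic protocol of the same cost.

```latex
\begin{align*}
\Bigl| \mathbb{E}_{z}\,\mathbb{E}_{r}\bigl[\pi_r(G(z))\bigr]
     - \mathbb{E}_{x}\,\mathbb{E}_{r}\bigl[\pi_r(x)\bigr] \Bigr|
 &= \Bigl| \mathbb{E}_{r}\Bigl[ \mathbb{E}_{z}\bigl[\pi_r(G(z))\bigr]
     - \mathbb{E}_{x}\bigl[\pi_r(x)\bigr] \Bigr] \Bigr| \\
 &\le \mathbb{E}_{r}\Bigl| \mathbb{E}_{z}\bigl[\pi_r(G(z))\bigr]
     - \mathbb{E}_{x}\bigl[\pi_r(x)\bigr] \Bigr|
  \;\le\; \delta .
\end{align*}
```

This only controls the acceptance probability of the randomized protocol. Relating it to the Boolean function that the protocol computes costs an additional term depending on the protocol's error, which the sketch does not track.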
Combining this construction with the aforementioneddiscussion completes the proof of Theorem 2 Item 2.The argument just sketched reduces the construction of PRGs for FORMULA ◦ G when functionsin G admit low-cost randomized protocols to the analysis of PRGs for functions that admit relativelylow-cost deterministic protocols. Our lower bound proof for GIP kn in Theorem 1 Item 1 proceeds ina similar fashion. We combine statement (B) described above with other ideas to show: Transfer Lemma (Informal).

If a function correlates with some small formula whose leaf gateshave low-cost randomized k -party protocols, then it also non-trivially correlates with some functionthat has relatively low-cost deterministic k -party protocols.Given this result, we are able to rely on a strong average-case lower bound for GIP kn against k -partydeterministic protocols from [BNS92] to conclude that GIP kn is hard for FORMULA ◦ G .Our (A) , for which we show that such a polynomial can be obtained explicitly , with a decompositionof the Boolean matrix at each leaf that is induced by a corresponding low-cost randomized ordeterministic two-party protocol. A careful combination of these two representations allows us toadapt a standard technique employed in the design of non-trivial SAT algorithms (fast rectangularmatrix multiplication) to obtain non-trivial savings in the running time.Finally, our learning algorithm for

FORMULA ◦ XOR is a consequence of statement (B) abovecoupled with standard tools from learning theory. In a bit more detail, since a parity of pari-ties is just another parity function, (B) implies that, under any distribution, every function in

FORMULA [ n . ] ◦ XOR is weakly correlated with some parity function. Using the agnostic learningalgorithm for parity functions of [KMV08], it is possible to weakly learn

FORMULA [ n . ] ◦ XOR in time 2 O ( n/ log n ) . This weak learner can then be transformed into a (strong) PAC learner usingstandard boosting techniques [Fre90], with only a polynomial blow-up over its running time. The main message of our results is that the computational power of a subquadratic-size topformula is not signiﬁcantly enhanced by leaf gates of low communication complexity . We believe that12he idea of decomposing a Boolean device into a computational part and a layer of communicationprotocols will ﬁnd further applications in lower bound proofs and algorithm design.One of our main open problems is to discover a method that can analyze

FORMULA [ s ] ◦ G when s ≫ n . For instance, is it possible to adapt existing techniques to show an explicit lowerbound against FORMULA [ n . ] ◦ G , or achieving this is just as hard as breaking the cubic barrierfor formula lower bounds? Results in this direction would be interesting even for G = XOR .Finally, we would like to mention a few questions connected to our results and their applications.Is it possible to combine the techniques behind Corollary 3 and [OST19] to design a PRG of seedlength n o (1) and error ε = 1 /n for the intersection of n halfspaces? Can we design a satisﬁabilityalgorithm for formulas over k -party number-on-forehead communication protocols? Is it possible tolearn FORMULA [ s ] ◦ XOR in time 2 e O ( √ s ) ? (The learning algorithm for formulas from [Rei11b] relieson techniques from [KKMS08], and it is unclear how to extend them to the case of FORMULA ◦ XOR .) Theorem 1 Item 1 is proved in Section 3, while Items 2 and 3 rely on our PRG constructionsand are deferred to Section 4. The latter describes a general approach to constructing PRGsfor

FORMULA ◦ G . It includes the proof of Theorem 2 and other applications. Our satisﬁabil-ity algorithms (Theorem 5) appear in Section 5. Finally, Section 6 discusses learning results for

FORMULA ◦ XOR and contains a proof of Theorem 7.

Let $n \in \mathbb{N}$; we denote $\{1, \ldots, n\}$ by $[n]$, and denote by $U_n$ the uniform distribution over $\{0,1\}^n$. We use $\widetilde{O}(\cdot)$ (and $\widetilde{\Omega}(\cdot)$) to hide polylogarithmic factors. That is, for any $f \colon \mathbb{N} \to \mathbb{N}$, we have that $\widetilde{O}(f(n)) = O(f(n) \cdot \mathrm{polylog}(f(n)))$.

In this paper, we will mainly use $\{-1, 1\}$ as the Boolean basis. In some parts of this paper, we will use the $\{0, 1\}$ basis for simplicity of presentation. This will be specified in the corresponding sections.

Definition 8. An $n$-variate de Morgan formula is a directed rooted tree; its non-leaf vertices (henceforth, internal gates) take labels from $\{\mathsf{AND}, \mathsf{OR}, \mathsf{NOT}\} = \{\wedge, \vee, \neg\}$ and its leaves (henceforth, variable gates) take labels from the set of variables $\{x_1, \ldots, x_n\}$. Each internal gate has bounded in-degree (henceforth, fan-in); the $\mathsf{NOT}$ gate in particular has fan-in 1 and every variable gate has fan-in 0. The size of a de Morgan formula is the number of its leaf gates.

In this work, we denote by $\mathsf{FORMULA}[s]$ the class of Boolean functions computable by size-$s$ de Morgan formulas. Let $\mathcal{G}$ denote some class of Boolean functions; then, we denote by $\mathsf{FORMULA}[s] \circ \mathcal{G}$ the class of functions computable by some size-$s$ de Morgan formula whose leaves are labelled by functions in $\mathcal{G}$.

2.3 Approximating polynomials

Definition 9 (Point-wise approximation). For a Boolean function $f \colon \{-1,1\}^n \to \{-1,1\}$, we say that the function $\widetilde{f} \colon \{-1,1\}^n \to \mathbb{R}$ $\varepsilon$-approximates $f$ if for every $z \in \{-1,1\}^n$, $|f(z) - \widetilde{f}(z)| \le \varepsilon$.

We will need the following powerful result for the approximate degree of de Morgan formulas.

Theorem 10 ([Rei11b], see also [BNRdW07]). Let $s > 0$ be an integer and $0 < \varepsilon < 1$. Any de Morgan formula $F \colon \{-1,1\}^n \to \{-1,1\}$ of size $s$ has an $\varepsilon$-approximating polynomial of degree $d = O(\sqrt{s} \cdot \log(1/\varepsilon))$. That is, there exists a degree-$d$ polynomial $p \colon \{-1,1\}^n \to \mathbb{R}$ over the reals such that for every $z \in \{-1,1\}^n$, $|p(z) - F(z)| \le \varepsilon$.

Note that Theorem 10 still holds if we use $\{0,1\}$ as the Boolean basis.

We use standard definitions from communication complexity, and refer to [KN97] for details. In this paper we consider the standard two-party model of Yao and its generalizations to the multiparty setting. We denote the deterministic communication complexity of a Boolean function $f$ in the two-party setting by $D(f)$.

Definition 11. Let $f \colon \{0,1\}^n \to \{0,1\}$ be a Boolean function. The communication matrix of $f$, namely $M_f$, is a $2^{n/2} \times 2^{n/2}$ matrix defined by $(M_f)_{x,y} := f(x, y)$.

Definition 12. A rectangle is a set of the form $A \times B$, for $A, B \subseteq \{0,1\}^{n/2}$. A monochromatic rectangle is a rectangle $S$ such that for all pairs $(x, y) \in S$ the value $f(x, y)$ is the same.

Lemma 13. Let $\Pi$ be a protocol that computes $f \colon \{0,1\}^n \to \{0,1\}$ with at most $D$ bits of communication. Then, $\Pi$ induces a partition of $M_f$ into at most $2^D$ monochromatic rectangles.

Given a protocol, its transcript is the sequence of bits communicated.
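To make Lemma 13 concrete, the following minimal sketch enumerates the transcripts of a toy two-party protocol and checks that the inputs sharing a transcript form monochromatic rectangles. The protocol (`eq_protocol`, for equality on 2-bit halves) and all helper names are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict
from itertools import product

N_HALF = 2  # bits held by each party (illustrative choice)


def eq_protocol(x, y):
    """Toy deterministic protocol for equality (an assumption, not from the paper).

    Alice sends her whole input x, then Bob replies with one bit saying
    whether x == y. The transcript therefore has D = N_HALF + 1 bits.
    """
    alice_msg = tuple(x)
    bob_msg = (1 if x == y else 0,)
    return alice_msg + bob_msg


def transcript_classes(protocol, n_half):
    """Group all input pairs (x, y) by the transcript they generate."""
    classes = defaultdict(set)
    for x in product((0, 1), repeat=n_half):
        for y in product((0, 1), repeat=n_half):
            classes[protocol(x, y)].add((x, y))
    return classes


def is_rectangle(pairs):
    """A set of pairs is a rectangle iff it equals rows x cols."""
    rows = {x for x, _ in pairs}
    cols = {y for _, y in pairs}
    return pairs == {(x, y) for x in rows for y in cols}


def is_monochromatic(pairs, f):
    """All pairs in the set receive the same value under f."""
    return len({f(x, y) for x, y in pairs}) == 1


if __name__ == "__main__":
    classes = transcript_classes(eq_protocol, N_HALF)
    equality = lambda x, y: int(x == y)
    print(len(classes), "transcripts, bound 2^D =", 2 ** (N_HALF + 1))
    print(all(is_rectangle(p) for p in classes.values()))
    print(all(is_monochromatic(p, equality) for p in classes.values()))
```

The brute-force enumeration is only feasible for tiny inputs; it is meant to illustrate the partition structure, not to be an efficient procedure.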

Lemma 14. For every transcript $z$ of some communication protocol, the set of inputs $(x, y)$ that generate $z$ is a rectangle.

Below, we recount the definitions of two multiparty communication models used in this work, namely the number-on-forehead and the number-in-hand models.

Definition 15 ("Number-on-forehead" communication model; informal). In the $k$-party "number-on-forehead" communication model, there are $k$ players and $k$ strings $x_1, \ldots, x_k \in \{0,1\}^{n/k}$, and player $i$ gets all the strings except for $x_i$. The players are interested in computing a value $f(x_1, \ldots, x_k)$, where $f \colon \{0,1\}^n \to \{0,1\}$ is some fixed function. We denote by $D^{(k)}(f)$ the number of bits that must be exchanged by the best possible number-on-forehead protocol solving $f$.

We also use the following weaker communication model.

Definition 16 ("Number-in-hand" communication model; informal). In the $k$-party "number-in-hand" communication model, there are $k$ players and $k$ strings $x_1, \ldots, x_k \in \{0,1\}^{n/k}$, and player $i$ gets only $x_i$. The players are interested in computing a value $f(x_1, \ldots, x_k)$, where $f \colon \{0,1\}^n \to \{0,1\}$ is some fixed function. We denote by $D^{(k\text{-NIH})}(f)$ the number of bits that must be exchanged by the best possible communication protocol.

Note that $D^{(k\text{-NIH})}(f) \le (1 - 1/k) \cdot n + 1$ for any $n$-variate Boolean function $f$: if $k - 1$ of the players send their inputs to the remaining player, she can compute $f(x_1, \ldots, x_k)$ on her own and then publish it.

For the communication models mentioned above, there are also bounded-error randomized versions, denoted by $R_\delta$, $R^{(k)}_\delta$, and $R^{(k\text{-NIH})}_\delta$, respectively, where $\delta > 0$ is the error parameter. In these versions the protocol has access to a random string $r$, and the error probability of the protocol is considered over the possible choices of $r$. Moreover, we require the error to be at most $\delta$ on each fixed choice of inputs.

We can extend the definitions of the communication complexity measures defined above to classes of Boolean functions in a natural way. That is, for any communication complexity measure $M \in \{D, D^{(k)}, D^{(k\text{-NIH})}, R_\delta, R^{(k)}_\delta, R^{(k\text{-NIH})}_\delta\}$ and for any class of Boolean functions $\mathcal{G}$, we define $M(\mathcal{G}) := \max_{g \in \mathcal{G}} M(g)$.

We note that throughout this paper, we denote by $n$ the number of input bits of the function regardless of the communication model. In the $k$-party communication setting (either NOF or NIH), we assume without loss of generality that $n$ is divisible by $k$.

A PRG against a class of functions $\mathcal{C}$ is a deterministic procedure $G$ mapping short Boolean strings (seeds) to longer Boolean strings, so that $G$'s output "looks random" to every function in $\mathcal{C}$.

Definition 17 (Pseudorandom generators). Let $G \colon \{-1,1\}^\ell \to \{-1,1\}^n$ be a function, $\mathcal{C}$ be a class of Boolean functions, and $0 < \varepsilon < 1$. We say that $G$ is a pseudorandom generator of seed length $\ell$ that $\varepsilon$-fools $\mathcal{C}$ if, for every function $f \in \mathcal{C}$, it is the case that

$$\left| \mathbb{E}_{z \sim \{-1,1\}^\ell}[f(G(z))] - \mathbb{E}_{x \sim \{-1,1\}^n}[f(x)] \right| \le \varepsilon.$$

A PRG $G$ outputting $n$ bits is called explicit if $G$ can be computed in $\mathrm{poly}(n)$ time. All PRGs stated in this paper are explicit.

For a function $f \colon \{0,1\}^n \to \{0,1\}$ and a distribution $\mathcal{D}$ supported over $\{0,1\}^n$, we denote by $\mathrm{EX}(f, \mathcal{D})$ a randomized oracle that outputs independent identically distributed labelled examples of the form $(x, f(x))$, where $x \sim \mathcal{D}$.

Definition 18 (PAC learning model [Val84]). Let $\mathcal{C}$ be a class of Boolean functions. We say that a randomized algorithm $A$ learns $\mathcal{C}$ if, when $A$ is given oracle access to $\mathrm{EX}(f, \mathcal{D})$ and inputs $1^n$, $\varepsilon$, and $\delta$, the following holds. For every $n$-variate function $f \in \mathcal{C}$, distribution $\mathcal{D}$ supported over $\{0,1\}^n$, and real-valued parameters $\varepsilon > 0$ and $\delta > 0$, $A^{\mathrm{EX}(f, \mathcal{D})}(1^n, \varepsilon, \delta)$ outputs, with probability at least $1 - \delta$ over its internal randomness and the randomness of the example oracle $\mathrm{EX}(f, \mathcal{D})$, a description of a hypothesis $h \colon \{0,1\}^n \to \{0,1\}$ such that $\Pr_{x \sim \mathcal{D}}[f(x) = h(x)] \ge 1 - \varepsilon$.

The sample complexity of a learning algorithm is the maximum number of random examples from $\mathrm{EX}(f, \mathcal{D})$ requested during its execution.

In this section, we prove an average-case lower bound for the generalized inner product function against $\mathsf{FORMULA} \circ \mathcal{G}$, where $\mathcal{G}$ is the set of functions that have low-cost randomized communication protocols in the number-on-forehead setting. This corresponds to Item 1 of Theorem 1. Items 2 and 3 rely on our PRG constructions, and the proofs are deferred to Section 4.

Theorem 19. For any integers $k \ge 2$, $s > 0$ and any class of functions $\mathcal{G}$, let $C \colon \{-1,1\}^n \to \{-1,1\}$ be a function in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ such that $\Pr_{x \sim \{-1,1\}^n}[C(x) = \mathsf{GIP}^k_n(x)] \ge 1/2 + \varepsilon$. Then

$$s = \Omega\left( \frac{n^2}{k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2n^2)}(\mathcal{G}) + \log n \right)^2 \cdot \log^2(1/\varepsilon)} \right).$$

We need a couple of useful lemmas from [Tal16], whose proofs are presented in Appendix A.1 (Lemma 50 and Lemma 51) for completeness.

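Lemma 20 ([Tal16]). Let $\mathcal{D}$ be a distribution over $\{-1,1\}^n$, and let $f, C \colon \{-1,1\}^n \to \{-1,1\}$ be such that $\Pr_{x \sim \mathcal{D}}[C(x) = f(x)] \ge 1/2 + \varepsilon$. Let $\widetilde{C} \colon \{-1,1\}^n \to \mathbb{R}$ be an $\varepsilon$-approximating function of $C$, i.e., for every $x \in \{-1,1\}^n$, $|C(x) - \widetilde{C}(x)| \le \varepsilon$. Then, $\mathbb{E}_{x \sim \mathcal{D}}[\widetilde{C}(x) \cdot f(x)] \ge \varepsilon$.

Lemma 21 ([Tal16]). Let $\mathcal{D}$ be a distribution over $\{-1,1\}^n$ and let $\mathcal{G}$ be a class of functions. For $f \colon \{-1,1\}^n \to \{-1,1\}$, suppose that $D \colon \{-1,1\}^n \to \{-1,1\}$ in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ is such that $\Pr_{x \sim \mathcal{D}}[D(x) = f(x)] \ge 1/2 + \varepsilon$. Then there exists some $h \colon \{-1,1\}^n \to \{-1,1\}$ in $\mathsf{XOR}_{O(\sqrt{s} \cdot \log(1/\varepsilon))} \circ \mathcal{G}$ such that $\mathbb{E}_{x \sim \mathcal{D}}[h(x) \cdot f(x)] \ge 1/s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}$.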
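The two lemmas move between agreement probability and correlation. As a reading aid (a standard calculation for $\pm 1$-valued functions, written out here rather than quoted from the paper), the conversion and the one-line argument behind Lemma 20 can be sketched as follows.

```latex
% For h, f : \{-1,1\}^n \to \{-1,1\}, the product h(x) f(x) is +1 on agreement
% and -1 on disagreement, so
\[
  \mathbb{E}_{x \sim \mathcal{D}}[h(x) f(x)]
    = \Pr_{x \sim \mathcal{D}}[h(x) = f(x)] - \Pr_{x \sim \mathcal{D}}[h(x) \neq f(x)]
    = 2 \Pr_{x \sim \mathcal{D}}[h(x) = f(x)] - 1 .
\]
% Hence \Pr[h = f] \ge 1/2 + \varepsilon  iff  \mathbb{E}[h f] \ge 2\varepsilon.
% For Lemma 20, use |f(x)| = 1 and |\widetilde{C}(x) - C(x)| \le \varepsilon:
\[
  \mathbb{E}_{x \sim \mathcal{D}}[\widetilde{C}(x) f(x)]
    \ge \mathbb{E}_{x \sim \mathcal{D}}[C(x) f(x)]
        - \mathbb{E}_{x \sim \mathcal{D}}\bigl[\,|\widetilde{C}(x) - C(x)|\,\bigr]
    \ge 2\varepsilon - \varepsilon = \varepsilon .
\]
```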

We also need the following communication-complexity lower bound for $\mathsf{GIP}$.

Theorem 22 ([BNS92, Theorem 2]). For any $k \ge 2$, any function that computes $\mathsf{GIP}^k_n$ on more than a $1/2 + \delta$ fraction of the inputs (over uniformly random inputs) must have $k$-party deterministic communication complexity at least $\Omega\left( n/(k \cdot 4^k) - \log(1/\delta) \right)$.

We first show that if a function correlates with some small formula, whose leaves are functions with low randomized communication complexity, then it also correlates non-trivially with some function of relatively low deterministic communication complexity.

Lemma 23. For any distribution $\mathcal{D}$ over $\{-1,1\}^n$, and any class of functions $\mathcal{G}$, let $f \colon \{-1,1\}^n \to \{-1,1\}$ and $C \colon \{-1,1\}^n \to \{-1,1\}$ in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ be such that $\Pr_{x \sim \mathcal{D}}[C(x) = f(x)] \ge 1/2 + \varepsilon$. Then there exists a function $h$, with $k$-party deterministic communication complexity at most $O\left( R^{(k)}_{\varepsilon/(2s)}(\mathcal{G}) \cdot \sqrt{s} \cdot \log(1/\varepsilon) \right)$, such that $\Pr_{x \sim \mathcal{D}}[h(x) = f(x)] \ge 1/2 + 1/s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}$.

Proof. Let $C = F(g_1, g_2, \ldots, g_s)$ be the function in $\mathsf{FORMULA}[s] \circ \mathcal{G}$, where $F$ is a formula and $g_1, g_2, \ldots, g_s$ are leaf functions from the class $\mathcal{G}$. For each $g_i$, consider a $k$-party randomized protocol $\Pi_i$ of cost at most $R = R^{(k)}_{\varepsilon/(2s)}(\mathcal{G})$ that has error $\varepsilon/(2s)$. Now consider the following function:

$$\widetilde{C}(x) := \mathbb{E}_{\Pi_1, \Pi_2, \ldots, \Pi_s}[D(x)], \quad \text{where } D(x) := F(\Pi_1(x), \Pi_2(x), \ldots, \Pi_s(x)).$$

Note that for any fixed choice of $(\Pi_1, \Pi_2, \ldots, \Pi_s)$, $D$ is a formula whose leaves are functions with deterministic communication complexity at most $R$. Next, we show the following.

Claim 24. The function $\widetilde{C}$ $\varepsilon$-approximates $C$.

Proof of Claim 24. First note that since each $\Pi_i$ is an $(\varepsilon/(2s))$-error randomized protocol, by taking the union bound over the $s$ leaf functions, we have that for every input $x \in \{-1,1\}^n$,

$$\Pr_{\Pi_1, \ldots, \Pi_s}\left[ \Pi_1(x) = g_1(x) \wedge \Pi_2(x) = g_2(x) \wedge \cdots \wedge \Pi_s(x) = g_s(x) \right] \ge 1 - \varepsilon/2.$$

Denote by $\mathcal{E}$ the event $\Pi_1(x) = g_1(x) \wedge \Pi_2(x) = g_2(x) \wedge \cdots \wedge \Pi_s(x) = g_s(x)$. We have for every $x \in \{-1,1\}^n$,

$$\widetilde{C}(x) = \mathbb{E}[D(x) \mid \mathcal{E}] \cdot \Pr[\mathcal{E}] + \mathbb{E}[D(x) \mid \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}] = C(x) \cdot \Pr[\mathcal{E}] + \mathbb{E}[D(x) \mid \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}].$$

Since $C(x) \in \{-1, 1\}$ and $\Pr[\mathcal{E}] \ge 1 - \varepsilon/2$, we have $|C(x) \cdot \Pr[\mathcal{E}] - C(x)| \le \varepsilon/2$; since $|D(x)| \le 1$ and $\Pr[\neg\mathcal{E}] \le \varepsilon/2$, the second term has absolute value at most $\varepsilon/2$. On the one hand, this gives

$$\widetilde{C}(x) \le \left( C(x) + \varepsilon/2 \right) + \varepsilon/2 = C(x) + \varepsilon.$$

On the other hand, we get

$$\widetilde{C}(x) \ge \left( C(x) - \varepsilon/2 \right) + (-1) \cdot (\varepsilon/2) = C(x) - \varepsilon.$$

This completes the proof of the claim. □

Now by Claim 24 and Lemma 20, we have

$$\mathbb{E}_{x \sim \mathcal{D}}[\widetilde{C}(x) \cdot f(x)] \ge \varepsilon. \qquad (1)$$

By the definition of $\widetilde{C}$, Equation (1) implies that there exists some $D$, which is a formula whose leaves are functions with deterministic communication complexity at most $R$, such that $\mathbb{E}_{x \sim \mathcal{D}}[D(x) \cdot f(x)] \ge \varepsilon$, which implies $\Pr_{x \sim \mathcal{D}}[D(x) = f(x)] \ge 1/2 + \varepsilon/2$. Then by Lemma 21, there exists a function $h$, which can be expressed as the XOR of at most $O(\sqrt{s} \cdot \log(1/\varepsilon))$ leaf functions in $D$, such that

$$\mathbb{E}_{x \sim \mathcal{D}}[h(x) \cdot f(x)] \ge \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}},$$

which again implies

$$\Pr_{x \sim \mathcal{D}}[h(x) = f(x)] \ge \frac{1}{2} + \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}}.$$

Finally, note that the $k$-party deterministic communication complexity of $h$ is at most $O(R \cdot \sqrt{s} \cdot \log(1/\varepsilon))$, where $R = R^{(k)}_{\varepsilon/(2s)}(\mathcal{G})$. □

We are now ready to show Theorem 19.

Proof of Theorem 19. Consider Lemma 23 with $f$ being $\mathsf{GIP}^k_n$ and $\mathcal{D}$ being the uniform distribution. Consider Theorem 22 with $\delta = 1/s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}$. We have

$$O\left( R^{(k)}_{\varepsilon/(2s)}(\mathcal{G}) \cdot \sqrt{s} \cdot \log(1/\varepsilon) \right) \ge n/(k \cdot 4^k) - O\left( \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon) \right),$$

which implies (assuming $s \le n^2$, as otherwise the bound holds trivially)

$$s \ge \Omega\left( \frac{n^2}{k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2n^2)}(\mathcal{G}) + \log n \right)^2 \cdot \log^2(1/\varepsilon)} \right). \qquad □$$

4 Pseudorandom generators

Some of our PRGs are obtained from a general framework that allows us to reduce the task of fooling $\mathsf{FORMULA} \circ \mathcal{G}$ to the task of fooling the class of functions which are the parity or conjunction of few functions from $\mathcal{G}$.

We show that in order to get a PRG for the class of subquadratic-size formulas with leaf gates in $\mathcal{G}$, it suffices to get a PRG for very simple sublinear-size formulas: either $\mathsf{XOR} \circ \mathcal{G}$ or $\mathsf{AND} \circ \mathcal{G}$.

Theorem 25 (PRG for $\mathsf{FORMULA} \circ \mathcal{G}$ from PRG for $\mathsf{XOR} \circ \mathcal{G}$ or $\mathsf{AND} \circ \mathcal{G}$). Let $\mathcal{G}$ be a class of gates on $n$ bits. For any integer $s > 0$ and any $0 < \varepsilon < 1$, there exists a constant $c > 0$ such that the following holds. If a distribution $\mathcal{D}$ over $\{-1,1\}^n$ $\left( 2^{-c \cdot \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon)} \right)$-fools the $\mathsf{XOR}$ (parity) or the $\mathsf{AND}$ (conjunction) of $c \cdot \sqrt{s} \cdot \log(1/\varepsilon)$ arbitrary functions from $\mathcal{G}$, then $\mathcal{D}$ also $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$.

Proof. We first show the case where $\mathcal{D}$ fools the parity of a few functions from $\mathcal{G}$. The proof can be easily adapted to the case of conjunction.

Let $C = F(g_1, g_2, \ldots, g_s)$ be a function in $\mathsf{FORMULA}[s] \circ \mathcal{G}$, where $F$ is a formula, and $g_1, g_2, \ldots, g_s$ are functions from the class $\mathcal{G}$. Let $U$ be the uniform distribution over $\{-1,1\}^n$. We need to show

$$\mathbb{E}[C(\mathcal{D})] \overset{\varepsilon}{\approx} \mathbb{E}[C(U)]. \qquad (2)$$

Let $p$ be an $(\varepsilon/3)$-approximating polynomial for $F$ given by Theorem 10. Note that the degree of $p$ is $d = O(\sqrt{s} \cdot \log(1/\varepsilon))$. Let us replace $F$, the formula part of $C$, with $p$ and let

$$\widetilde{C} := p(g_1, g_2, \ldots, g_s).$$

Since $\widetilde{C}$ point-wise approximates $C$, we have $\mathbb{E}[\widetilde{C}(U)] \overset{\varepsilon/3}{\approx} \mathbb{E}[C(U)]$ and $\mathbb{E}[\widetilde{C}(\mathcal{D})] \overset{\varepsilon/3}{\approx} \mathbb{E}[C(\mathcal{D})]$. Then to show Equation (2), it suffices to show $\mathbb{E}[\widetilde{C}(\mathcal{D})] \overset{\varepsilon/3}{\approx} \mathbb{E}[\widetilde{C}(U)]$. We have

$$\mathbb{E}_{x \sim \mathcal{D}}[\widetilde{C}(x)] = \mathbb{E}_{x \sim \mathcal{D}}\left[ \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \prod_{i \in S} g_i(x) \right] = \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \mathbb{E}_{x \sim \mathcal{D}}\left[ \prod_{i \in S} g_i(x) \right]. \qquad (3)$$

Now note that for each $S \subseteq [s]$, $\prod_{i \in S} g_i(x)$ computes the XOR of at most $d$ functions from $\mathcal{G}$. Using the fact that the distribution $\mathcal{D}$ $\left( \delta = 1/2^{c \cdot \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon)} \right)$-fools the XOR of any $d$ functions from $\mathcal{G}$, we get

$$\begin{aligned}
\mathbb{E}_{x \sim \mathcal{D}}[\widetilde{C}(x)]
&= \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \mathbb{E}_{x \sim \mathcal{D}}\left[ \prod_{i \in S} g_i(x) \right] \\
&= \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \left( \mathbb{E}_{x \sim U}\left[ \prod_{i \in S} g_i(x) \right] + \delta_S \right) \qquad (\text{where } |\delta_S| \le \delta) \\
&= \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \mathbb{E}_{x \sim U}\left[ \prod_{i \in S} g_i(x) \right] + \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \delta_S \\
&= \mathbb{E}_{x \sim U}[\widetilde{C}(x)] + \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \delta_S.
\end{aligned}$$

It remains to show

$$\left| \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \delta_S \right| \le \varepsilon/3.$$

Note that because $p(z) \in [-1 - \varepsilon/3, 1 + \varepsilon/3]$ for every $z \in \{-1,1\}^s$, we have

$$|\hat{p}(S)| = \left| \mathbb{E}_{z \sim \{-1,1\}^s}\left[ p(z) \cdot \prod_{i \in S} z_i \right] \right| \le 1 + \varepsilon/3 < 2.$$

Then,

$$\left| \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \delta_S \right| \le \sum_{S \subseteq [s] : |S| \le d} |\hat{p}(S)| \cdot |\delta_S| \le \delta \cdot \sum_{S \subseteq [s] : |S| \le d} |\hat{p}(S)| \le \delta \cdot s^{O(\sqrt{s} \cdot \log(1/\varepsilon))} \le \varepsilon/3,$$

where the last inequality holds for some sufficiently large constant $c$.

To show the case of conjunction, we can write the approximating polynomial as a sum of monomials of degree at most $d$, each of which is the AND of at most $d$ variables. One way to do this is to use the domain $\{0,1\}$ instead of $\{-1,1\}$ in the above argument. We need to show that the coefficients in this case still have small magnitude.

Claim 26. Let $p \colon \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial of the form

$$p(x) = \sum_{S \subseteq [n] : |S| \le d} \hat{p}(S) \cdot \prod_{i \in S} x_i,$$

and let $q \colon \{0,1\}^n \to \mathbb{R}$ be the corresponding polynomial of $p$ over the domain $\{0,1\}^n$, of the form

$$q(y) = \sum_{T \subseteq [n] : |T| \le d} \hat{q}(T) \cdot \prod_{i \in T} y_i.$$

Then,

$$|q|_1 = \sum_{T \subseteq [n] : |T| \le d} |\hat{q}(T)| \le n^{O(d)} \cdot \max_{S \subseteq [n] : |S| \le d} |\hat{p}(S)|.$$

Proof. We have

$$\begin{aligned}
q(y_1, y_2, \ldots, y_n) &= p(1 - 2y_1, 1 - 2y_2, \ldots, 1 - 2y_n) \\
&= \sum_{S \subseteq [n] : |S| \le d} \hat{p}(S) \cdot \prod_{i \in S} (1 - 2y_i) \\
&= \sum_{S \subseteq [n] : |S| \le d} \hat{p}(S) \cdot \left( \sum_{\ell \in \{0,1\}^{|S|}} \prod_{j \in S : \ell_j = 1} (-2y_j) \right) \\
&= \sum_{S \subseteq [n] : |S| \le d} \sum_{\ell \in \{0,1\}^{|S|}} \hat{p}(S) \cdot (-2)^{|\ell|} \cdot \prod_{j \in S : \ell_j = 1} y_j, \qquad \left(\text{where } |\ell| = \textstyle\sum_{i=1}^{|S|} \ell_i\right)
\end{aligned}$$

For a pair $(S, \ell)$ where $S \subseteq [n]$, $|S| \le d$ and $\ell \in \{0,1\}^{|S|}$, let us define the polynomial $q_{(S,\ell)}$ as

$$q_{(S,\ell)}(y) = \hat{p}(S) \cdot (-2)^{|\ell|} \cdot \prod_{j \in S : \ell_j = 1} y_j.$$

Note that there are at most $n^d \cdot 2^d$ many such pairs $(S, \ell)$ and for each $(S, \ell)$, we have

$$|q_{(S,\ell)}|_1 = \left| \hat{p}(S) \cdot (-2)^{|\ell|} \right| \le 2^d \cdot |\hat{p}(S)|.$$

Finally we have

$$|q|_1 = \left| \sum_{(S,\ell)} q_{(S,\ell)} \right|_1 \le \sum_{(S,\ell)} |q_{(S,\ell)}|_1 \le n^d \cdot 2^d \cdot 2^d \cdot \max_{S \subseteq [n] : |S| \le d} |\hat{p}(S)|,$$

as desired. □ (Claim 26)

This completes the proof of Theorem 25. □ (Theorem 25)

4.2 Formulas of low-communication functions in the number-in-hand setting

In this subsection, we will use $\{0,1\}$ as the Boolean basis.

Theorem 27.

For any integers $k \ge 2$, $s > 0$ and any $0 < \varepsilon < 1$, let $\mathcal{G}$ be the class of functions that have $k$-party number-in-hand $(\varepsilon/(6s))$-error randomized communication protocols of cost at most $R$. There exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ with seed length

$$n/k + O\left( \sqrt{s} \cdot (R + \log(s)) \cdot \log(1/\varepsilon) + \log(k) \right) \cdot \log(k).$$

We need the following PRG that fools single functions with low communication complexity in the number-in-hand model. The proof is presented in Appendix A.2 (Theorem 52) for completeness.

Theorem 28 ([ASWZ96, INW94]). For any $k \ge 2$, there exists a PRG that $\delta$-fools any $n$-bit function with $k$-party number-in-hand deterministic communication complexity at most $D'$, with seed length

$$n/k + O\left( D' + \log(1/\delta) + \log(k) \right) \cdot \log(k).$$

Next, we show a PRG for $\mathsf{FORMULA} \circ \mathcal{G}$, where $\mathcal{G}$ is the class of functions with low-cost communication protocols in the number-in-hand setting. We first handle the case of deterministic protocols.

Theorem 29. For any integers $k \ge 2$ and $s > 0$, let $\mathcal{G}$ be the class of functions whose $k$-party number-in-hand deterministic communication complexity is at most $D$. There is a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$ with seed length

$$n/k + O\left( \sqrt{s} \cdot \log(1/\varepsilon) \cdot (D + \log(s)) + \log(k) \right) \cdot \log(k).$$

Proof. By Theorem 25, it suffices to show a PRG that $\left( \delta = 1/2^{c \cdot \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon)} \right)$-fools every function that is the XOR of $t = c \cdot \sqrt{s} \cdot \log(1/\varepsilon)$ arbitrary functions from $\mathcal{G}$. Note that such a function has deterministic communication complexity at most $D' = t \cdot D$. Then Theorem 29 follows from Theorem 28. □

We now establish the randomized case.

Proof of Theorem 27. Let $C$ be a function in $\mathsf{FORMULA}[s] \circ \mathcal{G}$. For each of the leaf functions in $C$, consider a $k$-party number-in-hand randomized protocol of cost at most $R$ that has error at most $\varepsilon/(6s)$. By taking a union bound over the $s$ leaf functions and by viewing a randomized protocol as a distribution over deterministic protocols (as shown in the proof of Claim 24), we get the following function, which is a (point-wise) $(\varepsilon/3)$-approximating function for $C$:

$$\widetilde{C}(x) := \sum_i p_i \cdot D_i(x),$$

where each $p_i \in [0, 1]$ is some probability value (so $\sum_i p_i = 1$), and each $D_i$ is a formula whose leaves are functions with deterministic communication complexity at most $R$. Then to $\varepsilon$-fool $C$, it suffices to $(\varepsilon/3)$-fool $\widetilde{C}$, the $(\varepsilon/3)$-approximating function of $C$. Also, since $\widetilde{C}$ is a convex combination of the $D_i$'s, it suffices to $(\varepsilon/3)$-fool each of the $D_i$'s. We will do this using the PRG from Theorem 29. We get that there exists a PRG that $(\varepsilon/3)$-fools each $D_i$ with seed length

$$n/k + O\left( \sqrt{s} \cdot (R + \log(s)) \cdot \log(1/\varepsilon) + \log(k) \right) \cdot \log(k),$$

as desired. □

4.3 Applications: Fooling formulas of SYMs, LTFs, XORs, and $\mathsf{AC}^0$ circuits

4.3.1 $\mathsf{FORMULA} \circ \mathsf{SYM}$ and $\mathsf{FORMULA} \circ \mathsf{LTF}$

Here, we show how the PRG in Theorem 27 implies PRGs for $\mathsf{FORMULA} \circ \mathsf{LTF}$ and $\mathsf{FORMULA} \circ \mathsf{SYM}$.

Theorem 30.

For any size $s > 0$ and $0 < \varepsilon < 1$, there exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$ with seed length $O\left( n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon) \right)$. For $\mathsf{FORMULA}[s] \circ \mathsf{SYM}$, the seed length is $O\left( n^{1/2} \cdot s^{1/4} \cdot \log(n) \cdot \log(1/\varepsilon) \right)$.

We need the fact that the class $\mathsf{LTF}$ has low communication complexity in the number-in-hand model. Consider the following $k$-party $\mathsf{SUM\text{-}GREATER}_m$ problem, where the $i$-th party holds an $m$-bit number $z_i$ in hand and the parties want to determine whether $\sum_{i=1}^{k} z_i > \theta$, where $\theta$ is a fixed number known to all the parties. Nisan [Nis94] gave an efficient randomized protocol (with public randomness) for this problem.

Theorem 31 ([Nis94]). Let $m > 0$ be an integer. For any integer $2 \le k \le m^{O(1)}$, and any $0 < \delta < 1$, there exists a $\delta$-error randomized protocol of cost $O(k \cdot \log(m) \cdot \log(m/\delta))$ for the $k$-party $\mathsf{SUM\text{-}GREATER}_m$ problem.

(Viola [Vio15] gave a $\delta$-error randomized protocol for the $k$-party $\mathsf{SUM\text{-}GREATER}_m$ problem of cost $O(k \cdot \log(k) \cdot \log(m/\delta))$, which is better than Nisan's protocol when $k = m^{o(1)}$.)

By Theorem 31 and the fact that every linear threshold function on $n$ bits has a representation in which the weights are $O(n \log(n))$-bit integers [MTT61], we get the following.

Corollary 32. For every $k \ge 2$ and $0 < \delta < 1$, the $k$-party number-in-hand $\delta$-error randomized communication complexity of $\mathsf{LTF}$ is $O(k \cdot \log(n) \cdot \log(n/\delta))$.

Proof of Theorem 30. By Corollary 32 and Theorem 27, for every $k \ge 2$, there exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathsf{LTF}$ of seed length

$$n/k + O\left( \sqrt{s} \cdot k \cdot \log(n) \cdot \log(ns/\varepsilon) \cdot \log(1/\varepsilon) + \log(k) \right) \cdot \log(k).$$

By choosing

$$k = \frac{n^{1/2}}{s^{1/4} \cdot \log(n) \cdot \log(n/\varepsilon)},$$

the claimed seed length follows from a simple calculation.

For $\mathsf{FORMULA} \circ \mathsf{SYM}$, note that every $n$-bit symmetric function has a deterministic $k$-party number-in-hand communication protocol of cost at most $k \cdot \log(n)$. Then the rest can be shown using a similar argument as above (by choosing $k = n^{1/2} / \left( s^{1/4} \cdot \log(n) \right)$). □

4.3.2 $\mathsf{FORMULA} \circ \mathsf{XOR}$

For the case of $\mathsf{FORMULA} \circ \mathsf{XOR}$, we get a PRG with better seed length.

Theorem 33. For any size $s > 0$ and $0 < \varepsilon < 1$, there exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$ with seed length $O\left( \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon) + \log(n) \right)$.

Proof. By Theorem 25, to fool $\mathsf{FORMULA}[s] \circ \mathcal{G}$, it suffices to $\left( \delta = 1/2^{O(\sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon))} \right)$-fool the XOR of a few functions from $\mathcal{G}$, where $\mathcal{G}$ in this case is the set of all $\mathsf{XOR}$ functions. Note that the $\mathsf{XOR}$ of any set of $\mathsf{XOR}$ functions simply computes some $\mathsf{XOR}$ function. Therefore, we can use a small-bias distribution, which fools every $\mathsf{XOR}$ function, to fool $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$. Finally, note that there are known constructions of $\delta$-biased distributions that use $O(\log(n/\delta))$ random bits (see e.g. [AGHP92]). □

Using the "locality" of this PRG for $\mathsf{FORMULA} \circ \mathsf{XOR}$, we get a lower bound for MCSP against subquadratic-size formulas of XORs.
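The proof of Theorem 33 relies on the observation that an XOR of XOR functions is again an XOR function. A minimal sketch of this closure property over $\{0,1\}$, representing each parity by the set of coordinates it reads (the helper names are illustrative assumptions):

```python
from itertools import product


def parity(subset, x):
    """Parity (XOR) of the bits of x indexed by subset."""
    return sum(x[i] for i in subset) % 2


def xor_of_parities(subsets):
    """Return the index set of the single parity that equals the XOR of the given parities.

    Coordinates that appear an even number of times cancel out, so the
    result is the symmetric difference of all the index sets.
    """
    combined = set()
    for s in subsets:
        combined ^= set(s)
    return frozenset(combined)


if __name__ == "__main__":
    subsets = [{0, 1}, {1, 2}, {2, 3}]
    combined = xor_of_parities(subsets)
    # Exhaustively compare both sides on all 4-bit inputs.
    for x in product((0, 1), repeat=4):
        lhs = 0
        for s in subsets:
            lhs ^= parity(s, x)
        assert lhs == parity(combined, x)
    print(sorted(combined))
```

This is why a distribution that fools every single parity (a small-bias distribution) automatically fools the XOR of any number of parities.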

Theorem 34. For every integer $s > 0$, if MCSP on $N$-bit inputs can be computed by some function in $\mathsf{FORMULA}[s] \circ \mathsf{XOR}$, then $s = \widetilde{\Omega}(N^2)$.

Proof sketch. There is a standard construction of a $\delta$-biased distribution that is local (see e.g. [AGHP92, Construction 3] and [CKLM19, Fact 7]) in the following sense: there exists a circuit of size at most $\widetilde{O}(\log(n/\delta) \cdot \log(n))$ that, given a seed of length $O(\log(n/\delta))$ and an index $j \in [n]$, outputs the $j$-th bit of the distribution. Local PRGs imply MCSP lower bounds (see [CKLM19, Section 3]). □

4.3.3 $\mathsf{FORMULA} \circ \mathsf{AC}^0$

Another application of Theorem 25 is to take $\mathcal{G}$ to be the set of all functions that can be computed by small constant-depth circuits ($\mathsf{AC}^0$). Note that the state-of-the-art PRG against size-$M$ depth-$d$ $\mathsf{AC}^0$ has a seed length of $\log^{d + O(1)}(Mn) \cdot \log(1/\varepsilon)$ [ST19]. Below, let $\mathsf{AC}^0_{d,M}$ denote the class of depth-$d$ circuits of size at most $M$.

Theorem 35. For any $s, M > 0$ and $0 < \varepsilon < 1$, there exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA} \circ \mathsf{AC}^0_{d,M}$ of size $s$ with seed length $\log^{d + O(1)}(Mn) \cdot \sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon)$.

Moreover, by inspecting the construction of the PRG in [ST19], it is not difficult to see that the PRG is also local: there exists a circuit of size at most $\lambda = \log^{d + O(1)}(Mn) \cdot \log(1/\varepsilon)$ that, given a seed of length $O\left( \log^{d + O(1)}(Mn) \cdot \log(1/\varepsilon) \right)$ and an index $j \in [n]$, outputs the $j$-th bit of the PRG. As a result, we get MCSP lower bounds from this PRG.

Theorem 36. For every $s, d, M \in \mathbb{N}$, if MCSP on $N$-bit inputs can be computed by some function in $\mathsf{FORMULA}[s] \circ \mathsf{AC}^0_{d,M}$, then $s \ge N / \log^{d + O(1)}(Mn)$.

4.4 Formulas of low number-on-forehead communication leaf gates

In this section, we show a PRG with mild seed length for formulas of functions with low multi-party number-on-forehead communication complexity.

Theorem 37. Let $\mathcal{G}$ be a class of $n$-bit functions. For any size $s > 0$, there exists a PRG that $\varepsilon$-fools $\mathsf{FORMULA}[s] \circ \mathcal{G}$, with seed length

$$n - \frac{n}{O\left( \sqrt{s} \cdot k \cdot 4^k \cdot \left( R^{(k)}_{\varepsilon/(2s)}(\mathcal{G}) + \log(n) \right) \cdot \log(n/\varepsilon) \right)}.$$

The PRG is constructed using the hardness vs. randomness paradigm.

We show how to construct the PRG using the average-case hardness result for formulas of functions with low multi-party communication complexity (Theorem 19). We start with some notation. For $x \in \{-1,1\}^m$ and an integer $k$ such that $k$ divides $m$, we consider a partition of $x$ into $k$ equal-sized consecutive blocks and write $x = x^{(1)}, x^{(2)}, \ldots, x^{(k)}$, where $x^{(i)} \in \{-1,1\}^{m/k}$ for each $i \in [k]$.

Lemma 38. For any integers $m, t, k > 0$ such that $k$ divides $m$ and $t$, let $\mathcal{G}$ be a class of functions on $mt + t$ bits, and let $G \colon \{-1,1\}^{m \times t} \to \{-1,1\}^{mt + t}$ be

$$G(x_1, x_2, \ldots, x_t) = \left( x_1^{(i)}, x_2^{(i)}, \ldots, x_t^{(i)},\ \mathsf{GIP}^k_m\left( x_{(i-1) \cdot (t/k) + 1} \right), \mathsf{GIP}^k_m\left( x_{(i-1) \cdot (t/k) + 2} \right), \ldots, \mathsf{GIP}^k_m\left( x_{i \cdot (t/k)} \right) \right)_{i \in [k]},$$

where $x_1, x_2, \ldots, x_t \in \{-1,1\}^m$. Then $G$ is a PRG that $(t \cdot \varepsilon)$-fools $\mathsf{FORMULA} \circ \mathcal{G}$ of size

$$s = \Omega\left( \frac{m^2}{k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2m^2)}(\mathcal{G}) + \log m \right)^2 \cdot \log^2(1/\varepsilon)} \right).$$

Proof.

The high-level idea is as follows. We argue that if there is a $\mathsf{FORMULA} \circ \mathcal{G}$ function of the claimed size that breaks the PRG, then there is a $\mathsf{FORMULA} \circ \mathcal{G}'$ function of the same size that computes $\mathsf{GIP}$ on $m$ bits, where $\mathcal{G}'$ has $k$-party communication complexity at most that of $\mathcal{G}$ with respect to the $m$-bit input. This contradicts the $\mathsf{FORMULA} \circ \mathcal{G}'$ complexity of the generalized inner product function. The resulting formula is obtained by fixing some input bits of the original $\mathsf{FORMULA} \circ \mathcal{G}$ function that breaks the PRG.

We use a hybrid argument. First consider the distribution given by $G$, where we replace each $\mathsf{GIP}(x_j)$ ($j \in [t]$) with a uniformly random bit; let us denote those random bits by $U_j$ for $j \in [t]$ (note that this is just the uniform distribution, which we call $H_0$). Then for each $j \in [t]$, define $H_j$ to be the distribution in which we substitute back $\mathsf{GIP}(x_1), \mathsf{GIP}(x_2), \ldots, \mathsf{GIP}(x_j)$ for the corresponding uniform bits in the previous distribution.

For the sake of contradiction, suppose there exists a function $C$ in $\mathsf{FORMULA} \circ \mathcal{G}$ of size $s$ such that

$$\left| \Pr[C(H_t) = 1] - \Pr[C(H_0) = 1] \right| > t \cdot \varepsilon.$$

By the triangle inequality, there exists a $1 \le j \le t$ such that

$$\left| \Pr[C(H_j) = 1] - \Pr[C(H_{j-1}) = 1] \right| > \varepsilon.$$

Then by averaging, there exist some fixings of $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_t$ and $U_{j+1}, \ldots, U_t$ to $C$ such that the above inequality still holds. Let us denote by $C'$ the circuit obtained from $C$ after such fixings and assume without loss of generality that $(k - 1) \cdot t/k < j \le t$. Then we have

$$\left| \Pr\left[ C'\left( x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(k)}, \mathsf{GIP}(x_j) \right) = 1 \right] - \Pr\left[ C'\left( x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(k)}, U_j \right) = 1 \right] \right| > \varepsilon. \qquad (4)$$

By a standard "unpredictability implies pseudorandomness" argument [Yao82], we can show that there is some circuit $C''$, obtained from $C'$ by fixing some value for the last bit, that computes the generalized inner product function on $m$ bits with probability greater than $1/2 + \varepsilon$ over uniformly random inputs. Note that the size of $C''$ is the same as that of $C'$ (hence also of $C$), and also $C''$ can be computed by some $\mathsf{FORMULA} \circ \mathcal{G}'$, where $R^{(k)}_\delta(\mathcal{G}') \le R^{(k)}_\delta(\mathcal{G})$ for every $\delta$. This contradicts the hardness of $\mathsf{GIP}$ for such circuits (Theorem 19). □

We are now ready to prove Theorem 37.

Proof of Theorem 37.

Consider Lemma 38. Let $n = mt + t$, so that $m = n/t - 1$. Then Lemma 38 gives a PRG that $\varepsilon$-fools FORMULA ∘ $G$ of size

$$s = \Omega\left( \frac{m^2}{k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2m)}(G) + \log m \right)^2 \cdot \log^2(t/\varepsilon)} \right) \ge \Omega\left( \left(\frac{n}{t}\right)^2 \Big/ \left( k^2 \cdot 16^k \cdot \left( R^{(k)}_{\varepsilon/(2n)}(G) + \log n \right)^2 \cdot \log^2(n/\varepsilon) \right) \right),$$

which yields

$$t \ge \Omega\left( \frac{n}{\sqrt{s} \cdot k \cdot 4^k \cdot \left( R^{(k)}_{\varepsilon/(2n)}(G) + \log n \right) \cdot \log(n/\varepsilon)} \right).$$

Note that the seed length in this case is $n - t$.

MKtP lower bounds

The PRG in Theorem 37 is sufficient to give an MKtP lower bound for formulas of functions with low multi-party communication complexity.

Theorem 39.

For any integer $s > 0$ and any class $G$ of $N$-bit functions, if MKtP on $N$-bit inputs can be computed by some function in FORMULA[$s$] ∘ $G$, then

$$s = \frac{N^2}{k^2 \cdot 16^k \cdot R^{(k)}_{1/3}(G)^2 \cdot \mathrm{polylog}(N)}.$$

Proof. Let $C$ be a function in FORMULA ∘ $G$ of size less than

$$\frac{N^2}{k^2 \cdot 16^k \cdot R^{(k)}_{1/3}(G)^2 \cdot \log^c(N)},$$

where $c > 0$ is a sufficiently large constant. By Theorem 37, there is a PRG that $(1/3)$-fools $C$ and its seed length is $N - \mathrm{polylog}(N)$. Also, since the PRG is polynomial-time computable, we get that for every seed, the output of the PRG has Kt complexity at most $\theta = N - \mathrm{polylog}(N)$. However, consider the MKtP function with threshold parameter $\theta$; this function is not fooled by such a PRG, since it accepts every output of the PRG and rejects a uniformly random string with high probability.

In this section, we will use $\{0, 1\}$ as the Boolean basis.

Definition 40 (Computationally efficient communication protocols). Let $t : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$. We say that a two-party communication protocol is $t$-efficient if for each of the parties, given an input $x$ and some previously sent messages $\pi \in \{0,1\}^*$, the next message to send can be computed in time $t(|x|, |\pi|)$ ($\bot$ is output if there is no next message). We say that such a protocol is explicit if $t(|x|, |\pi|) = 2^{o(|x| + |\pi|)}$.

Lemma 41.

Let $f : \{0,1\}^n \to \{0,1\}$ and let $\Pi$ be a $t$-efficient communication protocol for $f$ with communication cost at most $D$. Then the protocol tree of $\Pi$ can be output in time $O\left(2^D \cdot t(n/2, D) \cdot 2^{n/2} \cdot D\right)$. That is, there exists an algorithm that outputs a list of all (partial and full) transcripts of length at most $D$ and the rectangles associated with each of the transcripts.

Proof. It suffices to show that, given an input $x \in \{0,1\}^{n/2}$ and a transcript $\pi \in \{0,1\}^{\le D}$, we can decide whether $x$ belongs to the rectangle indexed by $\pi$ in time $D \cdot t(n/2, D)$. Suppose $x$ is the input for Alice (resp. Bob), and we want to decide whether $x$ belongs to the rectangle indexed by $\pi$. We can carry out the communication task by simulating the behavior of Alice (resp. Bob) using the protocol $\Pi$ and simulating Bob's (resp. Alice's) behavior using the transcript $\pi$, and check whether the messages sent by Alice (resp. Bob) are consistent with the transcript $\pi$. This takes time at most $D \cdot t(n/2, D)$. To construct the tree, we do the above for every (partial and full) transcript $\pi \in \{0,1\}^{\le D}$ and every input $x \in \{0,1\}^{n/2}$ for Alice (resp. Bob). The total running time is $O\left(2^D \cdot t(n/2, D) \cdot 2^{n/2} \cdot D\right)$.

For a protocol $\Pi$, we denote by Leaves($\Pi$) the set of full transcripts of $\Pi$.

Remark.

We note that, in the white-box context of the satisfiability problem, there is no need to assume a canonical partition of the input variables among the players. For instance, a helpful partition can either be given as part of the input or computed by the algorithm. As a consequence, in instantiations of Theorem 5 for a particular circuit class $\mathcal{C}$, it is sufficient to be able to convert the input circuit from $\mathcal{C}$ into some device from FORMULA ∘ $G$ for which protocols of bounded communication cost can be described.

Explicit approximating polynomials for formulas

From Theorem 10, we know that every size-$s$ formula has a degree-$O(\sqrt{s})$ polynomial that approximates it point-wise. In our SAT algorithms, we will need to explicitly construct such an approximating polynomial given a formula. One way to do this is to use an efficient quantum query algorithm for formulas. It is known that a quantum query algorithm for a function $f$ using at most $T$ queries implies an approximating polynomial for $f$ of degree at most $2T$ [BBC+01], and there is an efficient quantum algorithm for evaluating size-$s$ formulas with $O(\sqrt{s} \cdot \log s)$ queries. Here, we present an alternate way to construct approximating polynomials for de Morgan formulas which relies only on the existence of such polynomials, without requiring an efficient quantum query algorithm. This "black-box" approach was suggested to us by an anonymous reviewer.

We first need the following structural lemma for formulas.

Lemma 42 ([IMZ12, Tal14]). For every integer $s > 0$, there exists an algorithm that, given a size-$s$ de Morgan formula $F$, runs in $\mathrm{poly}(s)$ time and outputs a top formula $F'$ with $O(\sqrt{s})$ leaves such that each leaf of $F'$ is a sub-formula with $O(\sqrt{s})$ input leaves.

Lemma 43.

For any integer $s > 0$ and any $0 < \varepsilon < 1$, there exists an algorithm of running time $s^{O(\sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon))}$ such that, given a de Morgan formula $F$ of size $s$, it outputs an $\varepsilon$-approximating polynomial of degree $O(\sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon))$ for $F$. That is, the algorithm outputs a multilinear polynomial $p$ (as a sum of monomials) over the reals such that for every $x \in \{0,1\}^n$, $|p(x) - F(x)| \le \varepsilon$.

Proof.

We first note that it suffices to construct a $(1/3)$-approximating polynomial for $F$ with degree $D = O(\sqrt{s} \cdot \log(s))$. This is because, given a $(1/3)$-approximating polynomial of degree $D$, we can obtain an $\varepsilon$-approximating polynomial of degree $D \cdot O(\log(1/\varepsilon))$ by a standard error-reduction step that feeds $O(1/\varepsilon)$ copies of the $(1/3)$-approximating polynomial into a polynomial on $O(1/\varepsilon)$ bits [BNRdW07] (see also [Tal14, Appendix B]).

We first invoke Lemma 42 on $F$ to obtain a top formula $F'$ with $t = O(\sqrt{s})$ leaves, each of which is a sub-formula of size $O(\sqrt{s})$. We construct an $\eta$-approximating polynomial $P$ for the top formula $F'$, for a sufficiently small constant $\eta > 0$, which has degree $d_1 = O(s^{1/4})$ by Theorem 10. Note that $P$ can be constructed in time $2^{O(\sqrt{s})}$ because $F'$ has at most $O(\sqrt{s})$ leaves. Next, for each of the $t$ sub-formulas, denoted by $F_1, F_2, \dots, F_t$, we construct a $(1/(20t))$-approximating polynomial. Note that these polynomials have degree $d_2 = O(s^{1/4} \cdot \log(s))$ and can be constructed in time $2^{O(\sqrt{s})}$. Let us denote these $t$ polynomials by $Q_1, Q_2, \dots, Q_t$. Now for each $Q_i$ ($i \in [t]$), we define

$$q_i(x) = \frac{Q_i(x) + 1/(20t)}{1 + 1/(10t)}.$$

The final approximating polynomial for $F$ is given as

$$p(x) = P(q_1(x), q_2(x), \dots, q_t(x)).$$

Note that $p$ has degree $d_1 \cdot d_2 = O(\sqrt{s} \cdot \log(s))$ and can be constructed (as a sum of monomials) in time $s^{O(\sqrt{s} \cdot \log(s))}$. It remains to show that $p$ $(1/3)$-approximates $F$.

For $0 \le q \le 1$, let $N_q$ be the distribution over $\{0,1\}$ such that $\Pr_{y \sim N_q}[y = 1] = q$. Then for any fixed input $x \in \{0,1\}^n$, we have

$$p(x) = \mathbb{E}_{y_i \sim N_{q_i(x)}}\left[ P(y_1, y_2, \dots, y_t) \right]. \quad (5)$$

Let $\mathcal{E}$ be the event that $y_i = F_i(x)$ for all $i \in [t]$. Note that

$$\delta := \Pr_{y_i \sim N_{q_i(x)}}[\neg \mathcal{E}] \le 1/10. \quad (6)$$

To see Equation (6), note that for every $i \in [t]$, if $F_i(x) = 0$, then $0 \le q_i(x) \le 1/(10t)$, which implies

$$\Pr_{y_i \sim N_{q_i(x)}}[y_i \ne F_i(x)] \le 1/(10t).$$

The case $F_i(x) = 1$ is similar (it implies $1 - 1/(10t) < q_i(x) \le 1$). Equation (6) then follows from a union bound over $i \in [t]$. Now,

$$p(x) = \mathbb{E}[P(y_1, \dots, y_t) \mid \mathcal{E}] \cdot \Pr[\mathcal{E}] + \mathbb{E}[P(y_1, \dots, y_t) \mid \neg\mathcal{E}] \cdot \Pr[\neg\mathcal{E}] = \left( F'(F_1(x), F_2(x), \dots, F_t(x)) \pm \eta \right) \cdot (1 - \delta) + \mathbb{E}[P(y_1, \dots, y_t) \mid \neg\mathcal{E}] \cdot \delta.$$

Note that $P(y) \in [-\eta, 1 + \eta]$ for every $y \in \{0,1\}^t$, and that $\delta \le 1/10$. A simple calculation shows that

$$p(x) = F'(F_1(x), F_2(x), \dots, F_t(x)) \pm 1/3,$$

as desired.

Footnote (on the quantum query algorithms discussed before Lemma 42): It is also known that there exists a quantum query algorithm for evaluating size-$s$ formulas with $O(\sqrt{s})$ queries [Rei11b], which implies the existence of an approximating polynomial for size-$s$ formulas of degree $O(\sqrt{s})$ (see Theorem 10). However, because this algorithm is not known to be efficient, it is unclear whether such an approximating polynomial can be constructed efficiently with respect to the number of monomials.

In this subsection, we present our #SAT algorithm.

Theorem 44.

For any integer $s > 0$, there exists a deterministic #SAT algorithm for FORMULA[$s$] ∘ $G$, where $G$ is the class of functions with explicit two-party deterministic protocols of communication cost at most $D$, that runs in time

$$2^{\,n - \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot D} \right)}.$$

In the case where $G$ is the class of functions with explicit randomized protocols of communication cost at most $R$, there exists an analogous randomized algorithm with running time

$$2^{\,n - \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R} \right)^{1/2}}.$$

The algorithm is based on the framework for designing satisfiability algorithms developed by Williams [Wil14]. The idea is to transform a given circuit into a "sparse polynomial" and solve satisfiability by evaluating the polynomial on all points in a faster-than-brute-force manner. We first need the following fast matrix multiplication algorithm for "narrow" matrices.

Theorem 45 ([Cop82]). Multiplication of an $N \times N^{0.1}$ matrix with an $N^{0.1} \times N$ matrix can be done in $O(N^2 \log^2 N)$ arithmetic operations over any field.

For $n > 0$ and $x \in \{0,1\}^n$, we denote by $x_L \in \{0,1\}^{n/2}$ the first half of $x$ and by $x_R \in \{0,1\}^{n/2}$ the second half. We now prove Theorem 44.

Proof of Theorem 44.

We first prove the deterministic case. Let $C = F(g_1, g_2, \dots, g_s)$ be a device in FORMULA ∘ $G$, where $F$ is a formula and $g_1, g_2, \dots, g_s$ are functions that have an explicit communication protocol of cost at most $D$. The first step is to output the protocol tree for each $g_i$ ($i \in [s]$). Since each $g_i$ has an explicit protocol of cost at most $D$, by Lemma 41 these protocol trees can be output in time $s \cdot 2^{n/2 + D + o(n)} \le 2^{n/2 + o(n)}$ (here we assume $D = o(n)$; otherwise the theorem holds trivially).

Let $n'$ be an integer whose value is determined later. Let $T$ be a set of $n'$ variables such that $T$ contains $n'/2$ variables from the first half of the $n$ variables and the rest are from the second half. For a partial assignment $z \in \{0,1\}^{n'}$ to $T$, denote by $C_z$ the restricted function of $C$ where the variables in $T$ are fixed according to $z$. To count the number of satisfying assignments of $C$, we need to compute the following quantity:

$$\sum_{x \in \{0,1\}^{n - n'}} \; \sum_{z \in \{0,1\}^{n'}} C_z(x). \quad (7)$$

Now consider

$$Q(x) = \sum_{z \in \{0,1\}^{n'}} C_z(x).$$

We will try to obtain the value of $Q(x)$ for every $x \in \{0,1\}^{n - n'}$ in time about $2^{n - n'}$, which will allow us to compute the quantity in Equation (7) in time $O(2^{n - n'})$ by summing $Q(x)$ over all $x$. We do this by first transforming $Q$ into an approximating polynomial with not too many monomials, where each monomial is a product of functions that depend only on either the first or the second half of $x$. With such a polynomial, we can perform fast multipoint evaluation using the fast matrix multiplication algorithm in Theorem 45.

For each $z \in \{0,1\}^{n'}$, we view the formula $C_z$ as $F(g_1^z, g_2^z, \dots, g_s^z)$, where $F$ is the de Morgan formula part of $C_z$ and $g_1^z, g_2^z, \dots, g_s^z$ are the leaf gates. Let us now replace $F$ by an $\varepsilon$-approximating polynomial $p$, where $\varepsilon = 1/(3 \cdot 2^{n'})$, using Lemma 43. Note that the degree of $p$ is at most

$$d \le O(\sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon)) \le O(\sqrt{s} \cdot \log(s) \cdot n').$$
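The reason for choosing $\varepsilon = 1/(3 \cdot 2^{n'})$ can be checked on a toy instance: summing $2^{n'}$ values that are each within $\varepsilon$ of a 0/1 value gives a total within $1/3$ of the true integer count, so rounding recovers it. The sketch below is illustrative only; the restricted function `(z[0] & z[1]) | z[2]` and the helper name `recover_count` are assumptions made for this example, and exact rationals are used to avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product


def recover_count(approx_values):
    """Round the sum of per-assignment approximations to the nearest integer."""
    total = sum(approx_values, Fraction(0))
    return round(total)


n_prime = 3
eps = Fraction(1, 3 * 2 ** n_prime)

# Hypothetical values C_z(x) for one fixed x, one value per z in {0,1}^{n'}.
exact = {z: (z[0] & z[1]) | z[2] for z in product((0, 1), repeat=n_prime)}
Q = sum(exact.values())

# Worst cases: every approximation is off by exactly eps in the same direction.
approx_hi = [Fraction(v) + eps for v in exact.values()]
approx_lo = [Fraction(v) - eps for v in exact.values()]

print(Q, recover_count(approx_hi), recover_count(approx_lo))
```

Any total error strictly below $1/2$ would suffice for rounding; $1/3$ simply leaves a margin.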
Now consider the following:

$$Q'(x) = \sum_{z \in \{0,1\}^{n'}} p\left( g_1^z(x), g_2^z(x), \dots, g_s^z(x) \right).$$

First, note that by the value that we have chosen for the approximation error $\varepsilon$, we have that, for every $x$,

$$\left| Q'(x) - Q(x) \right| \le 2^{n'} \cdot \varepsilon = 1/3.$$

In other words, given $Q'(x)$, we can recover the value of $Q(x)$, which is an integer. Next, we perform fast multipoint evaluation on $Q'$. First of all, we rewrite $Q'$ as follows:

$$Q'(x) = \sum_{z \in \{0,1\}^{n'}} \; \sum_{S \subseteq [s] : |S| \le d} \hat{p}(S) \cdot \prod_{i \in S} g_i^z(x). \quad (8)$$

Now let $\Pi_i$ be the protocol of $g_i$. We can rewrite $g_i^z$ as follows:

$$g_i^z(x) = \sum_{\pi_i \in \mathrm{Leaves}(\Pi_i)} \alpha_i\left( z_L x_L, \pi_i \right) \cdot \beta_i\left( z_R x_R, \pi_i \right), \quad (9)$$

where $\alpha_i(z_L x_L, \pi_i)$ (resp. $\beta_i(z_R x_R, \pi_i)$) is 1 if and only if $z_L x_L$ (resp. $z_R x_R$) belongs to the rectangle indexed by $\pi_i$ and the function value of that rectangle is 1. Note that for each $i \in [s]$, given the pre-computed protocol tree of $\Pi_i$, $\alpha_i$ and $\beta_i$ can be computed in polynomial time (for example, using binary search). After plugging Equation (9) into Equation (8) for every $i \in [s]$ and rearranging, we get

$$Q'(x) = \sum_{z \in \{0,1\}^{n'}} \; \sum_{S \subseteq [s] : |S| \le d} \; \sum_{\vec{\pi} = (\pi_i)_{i \in S} : \pi_i \in \mathrm{Leaves}(\Pi_i)} \hat{p}(S) \cdot \prod_{i \in S} \alpha_i\left( z_L x_L, \pi_i \right) \cdot \prod_{i \in S} \beta_i\left( z_R x_R, \pi_i \right). \quad (10)$$

Note that $Q'$ can be expressed as the sum of at most $m$ terms, where

$$m \le 2^{n'} \cdot s^{O(\sqrt{s} \cdot \log(s) \cdot n')} \cdot 2^{O(\sqrt{s} \cdot \log(s) \cdot n' \cdot D)} \le 2^{O(\sqrt{s} \cdot \log^2(s) \cdot D \cdot n')}.$$

Note that given Lemma 43, we can obtain $Q'$ in time

$$2^{O(\sqrt{s} \cdot \log^2(s) \cdot D \cdot n')}. \quad (11)$$

Next, we construct a $2^{(n - n')/2} \times m$ matrix $A$ and an $m \times 2^{(n - n')/2}$ matrix $B$ as follows:

$$A_{x_L, (z, S, \vec{\pi})} = \hat{p}(S) \cdot \prod_{i \in S} \alpha_i\left( z_L x_L, \pi_i \right), \quad \text{and} \quad B_{(z, S, \vec{\pi}), x_R} = \prod_{i \in S} \beta_i\left( z_R x_R, \pi_i \right).$$
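The way a sum of rectangle products turns into a single matrix product can be seen on a toy instance. The sketch below makes several simplifying assumptions that are not from the paper: there are no fixed variables ($n' = 0$), there are only two hypothetical leaf gates with two-bit protocols, the polynomial is the exact OR polynomial $y_1 + y_2 - y_1 y_2$, and a naive `matmul` stands in for the fast rectangular multiplication of Theorem 45.

```python
from itertools import product

# Each toy leaf gate is described by its accepting leaves: a pair (a, b) stands for
# the rectangle {x_L : x_L[bit] == a} x {x_R : x_R[bit] == b} with output value 1.
GATES = [
    {"bit": 0, "accepting": [(0, 0), (1, 1)]},  # g_1: x_L[0] == x_R[0]
    {"bit": 1, "accepting": [(1, 1)]},          # g_2: x_L[1] AND x_R[1]
]

# Coefficients p_hat(S) of p(y1, y2) = y1 + y2 - y1*y2, indexed by the set S.
P_HAT = {(0,): 1, (1,): 1, (0, 1): -1}


def terms():
    """List all (S, p_hat(S), leaf tuple) triples; these index the inner dimension."""
    out = []
    for S, coeff in P_HAT.items():
        for leaves in product(*(GATES[i]["accepting"] for i in S)):
            out.append((S, coeff, leaves))
    return out


def build_matrices(half_len):
    """Build A (rows: x_L) and B (columns: x_R) as in the definitions above."""
    halves = list(product((0, 1), repeat=half_len))
    cols = terms()
    A = [[coeff * int(all(xl[GATES[i]["bit"]] == a for i, (a, _b) in zip(S, leaves)))
          for (S, coeff, leaves) in cols] for xl in halves]
    B = [[int(all(xr[GATES[i]["bit"]] == b for i, (_a, b) in zip(S, leaves)))
          for xr in halves] for (S, coeff, leaves) in cols]
    return halves, A, B


def matmul(A, B):
    """Naive matrix product, used here only to check the identity."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def direct(xl, xr):
    """Evaluate p(g_1(x), g_2(x)) directly, for comparison."""
    g1 = int(xl[0] == xr[0])
    g2 = xl[1] & xr[1]
    return g1 + g2 - g1 * g2


halves, A, B = build_matrices(2)
M = matmul(A, B)
print(all(M[i][j] == direct(xl, xr)
          for i, xl in enumerate(halves) for j, xr in enumerate(halves)))
```

Because the rectangles of distinct leaves of one protocol are disjoint, exactly one accepting leaf can match a given input, which is what makes the expansion of each product of gates into a sum over leaf tuples exact.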
It is easy to see that for each $x \in \{0,1\}^{n - n'}$, $Q'(x) = (A \cdot B)_{x_L, x_R}$. We now want to compute $A \cdot B$. Therefore, we want $m \le 2^{0.1 (n - n')/2}$ so that computing $A \cdot B$ can be done in time $\tilde{O}(2^{n - n'})$ using Theorem 45. For this we can set $n'$ to be

$$n' = \frac{n}{c \cdot \sqrt{s} \cdot \log^2(s) \cdot D},$$

where $c > 0$ is a sufficiently large constant. This gives a total running time of

$$2^{\,n - \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot D} \right)}.$$

For the randomized case, for each $g_i$ ($i \in [s]$), we consider a randomized protocol $\Pi_i$ that has error $\varepsilon' \le 1/(3 \cdot s \cdot 2^{n'})$, and replace $g_i$ with a protocol randomly picked from $\Pi_i$, so that for every $x \in \{0,1\}^{n - n'}$, the algorithm computes $Q(x)$ (or $Q'(x)$) with probability at least $2/3$ (by a union bound over the $s$ functions $g_i$ and a union bound over all the $z$'s in $\{0,1\}^{n'}$). Then we can repeat the above algorithm $\mathrm{poly}(n)$ times and obtain $Q(x)$ for all $x \in \{0,1\}^{n - n'}$ correctly with high probability. Note that the error of any randomized protocol with communication complexity $R$ can be reduced to $\varepsilon'$ by blowing up the communication complexity by a factor of $O(\log(1/\varepsilon'))$. In this case (as we are considering longer transcripts), the number of terms in $Q'$ (as in Equation (10)) will be $2^{O(\sqrt{s} \cdot \log^2(s) \cdot R \cdot (n')^2)}$, and we need to set accordingly

$$n' = \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot R} \right)^{1/2},$$

which gives the claimed running time for the randomized case.

In fact, using the ideas above we can also get a randomized #SAT algorithm for FORMULA ∘ $\mathrm{AC}^0_{d,M}$ ∘ $G$, where $\mathrm{AC}^0_{d,M}$ is the class of depth-$d$ size-$M$ circuits and $G$ is the class of functions that have low communication complexity, by combining with the fact that $\mathrm{AC}^0$ circuits have low-degree probabilistic polynomials over the reals (a probabilistic polynomial of a function $f$ is a distribution on polynomials such that for every input $x$, a polynomial randomly picked from the distribution agrees with $f$ on the input $x$ with high probability). More specifically, we have the following.

Theorem 46.

For any integers $s, d, M > 0$, there exists a randomized #SAT algorithm for FORMULA[$s$] ∘ $\mathrm{AC}^0_{d,M}$ ∘ $G$, where $G$ is the class of functions with explicit two-party deterministic protocols of communication cost at most $D$; the algorithm outputs the number of satisfying assignments in time

$$2^{\,n - \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot (\log M)^{O(d)} \cdot D} \right)^{1/2}}.$$

In the case where $G$ is the class of functions with explicit randomized protocols of communication cost at most $R$, there exists an analogous randomized algorithm with running time

$$2^{\,n - \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot (\log M)^{O(d)} \cdot R} \right)^{1/2}}.$$

Proof sketch.

We show the case where $G$ has low randomized communication complexity. Let

• $\varepsilon_1 = 1/(3 \cdot 2^{n'})$,
• $\varepsilon_2 = 1/(3 \cdot s \cdot 2^{n'})$, and
• $\varepsilon_3 = 1/(3 \cdot M \cdot 2^{n'})$.

Here we define the size of an $\mathrm{AC}^0_{d,M}$ circuit to be the number of wires. Note that a circuit in FORMULA ∘ $\mathrm{AC}^0_{d,M}$ ∘ $G$ can have $M$ functions from $G$ at the bottom.

As in the proof of Theorem 44, we can replace the formula part of FORMULA[$s$] ∘ $\mathrm{AC}^0_{d,M}$ ∘ $G$ with an $\varepsilon_1$-approximating polynomial of degree

$$O(\sqrt{s} \cdot \log(s) \cdot \log(1/\varepsilon_1)) = O(\sqrt{s} \cdot \log(s) \cdot n').$$

Then we replace each $\mathrm{AC}^0_{d,M}$ circuit with a polynomial randomly picked from an $\varepsilon_2$-error probabilistic polynomial. By [HS19], such a probabilistic polynomial is constructive and has degree at most

$$(\log M)^{O(d)} \cdot \log(1/\varepsilon_2) = (\log M)^{O(d)} \cdot (n' + \log(s)).$$

Finally, we replace each of the bottom functions, which is from $G$, with a protocol randomly picked from a randomized protocol with error $\varepsilon_3$, which hence has cost at most

$$R \cdot O(\log(1/\varepsilon_3)) = O(R \cdot (n' + \log(M))).$$

As a result, we can express $Q'$ as a polynomial with at most

$$2^{O\left( \sqrt{s} \cdot \log^2(s) \cdot (\log M)^{O(d)} \cdot R \cdot (n')^2 \right)}$$

monomials, whose variables are functions that depend on either the first half or the second half of $x$. Note that with our choices of $\varepsilon_2$ and $\varepsilon_3$, for every $x \in \{0,1\}^{n - n'}$, the algorithm computes $Q(x)$ correctly with probability at least $2/3$. By an argument similar to that in the proof of Theorem 44, we obtain a running time of

$$2^{\,n - \Omega\left( \frac{n}{\sqrt{s} \cdot \log^2(s) \cdot (\log M)^{O(d)} \cdot R} \right)^{1/2}},$$

as desired.

It is worth noting that, unlike Theorem 44, the algorithm in Theorem 46 is randomized even if $G$ is the class of functions with low deterministic communication complexity, because of the use of probabilistic polynomials for the $\mathrm{AC}^0$ circuits.

In this section, we prove the following learning result for the FORMULA ∘ XOR model.

Theorem 47.

For every constant $\gamma > 0$, there is an algorithm that PAC learns the class of $n$-variate Boolean functions FORMULA[$n^{2 - \gamma}$] ∘ XOR to accuracy $\varepsilon$ and with confidence $\delta$ in time $\mathrm{poly}\left(2^{n/\log n}, 1/\varepsilon, \log(1/\delta)\right)$.

We first review some useful results that pertain to agnostically learning parities as well as boosting of learning algorithms.

Agnostically learning parities and boosting

For a parameter $n \ge 1$, let $\Delta$ be a distribution on labelled examples $(x, y)$ supported over $\{0,1\}^n \times \{0,1\}$, and assume that for each $x$ there is at most one $y$ such that $(x, y) \in \mathrm{Support}(\Delta)$. For a function $h : \{0,1\}^n \to \{0,1\}$, we denote by $\mathrm{err}_\Delta(h)$ the error of $h$ under this distribution:

$$\mathrm{err}_\Delta(h) = \Pr_{(x, y) \sim \Delta}[h(x) \ne y].$$

Similarly, for a class of functions $\mathcal{C}$, we let $\mathrm{opt}_\Delta(\mathcal{C})$ be the error of the best function in the class:

$$\mathrm{opt}_\Delta(\mathcal{C}) = \min_{h \in \mathcal{C}} \mathrm{err}_\Delta(h).$$

We will need a result established by Kalai, Mansour, and Verbin [KMV08], which gives a non-trivial time agnostic learning algorithm for the class of parities.
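The quantities $\mathrm{err}_\Delta(h)$ and $\mathrm{opt}_\Delta(\mathrm{XOR})$ defined above can be computed by brute force on a tiny instance. The sketch below is illustrative only: it enumerates all $2^n$ parities, so it is not the algorithm of [KMV08], and the distribution (uniform over $\{0,1\}^3$, labels $x_0 \oplus x_2$ with one corrupted point) as well as the helper names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product


def parity(mask, x):
    """Parity of the bits of x selected by the 0/1 vector mask."""
    return sum(m & b for m, b in zip(mask, x)) % 2


def err(h, dist):
    """err_Delta(h): total probability of labelled examples (x, y) with h(x) != y."""
    return sum(p for (x, y), p in dist.items() if h(x) != y)


def opt_xor(dist, n):
    """Return (best mask, opt_Delta(XOR)) by exhaustive search over all parities."""
    best = min(product((0, 1), repeat=n),
               key=lambda mask: err(lambda x: parity(mask, x), dist))
    return best, err(lambda x: parity(best, x), dist)


n = 3
dist = {}
for x in product((0, 1), repeat=n):
    y = x[0] ^ x[2]
    if x == (1, 1, 1):
        y ^= 1  # one corrupted label, so no parity is perfectly consistent
    dist[(x, y)] = Fraction(1, 2 ** n)

best_mask, best_err = opt_xor(dist, n)
print(best_mask, best_err)
```

Two distinct parities disagree on exactly half of the inputs, so every parity other than $x_0 \oplus x_2$ has error at least $3/8$ on this distribution.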

Lemma 48 ([KMV08]). Let XOR be the class of parity functions on $n$ variables. Then, for any constant $\zeta > 0$, there is a randomized learning algorithm $W$ such that, for every parameter $n \ge 1$ and distribution $\Delta$ over labelled examples, when $W$ is given access to independent samples from $\Delta$ it outputs with high probability a circuit computing a hypothesis $h : \{0,1\}^n \to \{0,1\}$ such that

$$\mathrm{err}_\Delta(h) \le \mathrm{opt}_\Delta(\mathrm{XOR}) + 2^{-n^{1 - \zeta}}.$$

The sample complexity and running time of $W$ is $2^{O(n/\log n)}$.

Recall that a boosting procedure for learning algorithms transforms a weak learner that outputs a hypothesis that is just weakly correlated with the unknown function into a (strong) PAC learning algorithm for the same class (i.e., a learner in the sense of Definition 18). We refer, for instance, to [KV94] for more information about boosting in learning theory. We shall make use of the following boosting result by Freund [Fre90].

Lemma 49 ([Fre90]). Let $W$ be a (weak) learner for a class $\mathcal{C}$ that runs in time $t(n)$ and outputs (under any distribution) a hypothesis of error up to $1/2 - \beta$, for some constructive function $\beta(n) > 0$. Then, there exists a PAC learning algorithm for $\mathcal{C}$ that runs in time $\mathrm{poly}(n, t, 1/\varepsilon, 1/\beta, \log(1/\delta))$.

We are ready to show that sub-quadratic size formulas over parity functions can be learned in time $2^{O(n/\log n)}$. First, we argue that Lemma 48 provides a weak learner that works under any distribution $\mathcal{D}$ supported over $\{0,1\}^n$. This will follow from Lemma 21, which shows that any function in FORMULA[$s$] ∘ XOR is correlated with some parity function with respect to $\mathcal{D}$. We then obtain a standard PAC learner via the boosting procedure from Lemma 49.

Proof of Theorem 47.

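Let $\mathcal{C}$ = FORMULA[$s$] ∘ XOR, where $s = n^{2 - \gamma}$ for some constant $\gamma > 0$. For any function $f \in$ FORMULA[$s$] ∘ XOR and distribution $\mathcal{D}$ supported over $\{0,1\}^n$, Lemma 21 shows that there exists a parity function $\chi = \chi(f, \mathcal{D})$ such that

$$\Pr_{x \sim \mathcal{D}}[f(x) = \chi(x)] \ge \frac{1}{2} + \frac{1}{2^{n^{1 - \lambda}}},$$

where $\lambda = \lambda(\gamma) > 0$ is independent of $n$, under the assumption that $n$ is sufficiently large.

Let $\Delta = \Delta(\mathcal{D}, f)$ be the distribution over labelled examples induced by $\mathcal{D}$ and $f$. Note that $\mathrm{opt}_\Delta(\mathrm{XOR}) \le 1/2 - \exp(-n^{1 - \lambda})$. Consequently, by invoking Lemma 48 with a parameter $\zeta < \lambda$ (say, $\zeta = \lambda/2$), so that the additive term $2^{-n^{1 - \zeta}}$ is negligible compared to $\exp(-n^{1 - \lambda})$, it follows that FORMULA[$n^{2 - \gamma}$] ∘ XOR can be learned under an arbitrary distribution to error at most $1/2 - \beta(n)$, where $\beta(n) = \exp(-n^{1 - \Omega(1)})$, in time $t(n) = 2^{O(n/\log n)}$. Consequently, we can obtain a PAC learning algorithm for FORMULA[$n^{2 - \gamma}$] ∘ XOR via Lemma 49 that runs in time $\mathrm{poly}(n, t(n), 1/\varepsilon, 1/\beta, \log(1/\delta)) = \mathrm{poly}(2^{n/\log n}, 1/\varepsilon, \log(1/\delta))$.

Acknowledgements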

We would like to thank Rocco Servedio for bringing to our attention the work by Kalai, Mansour, and Verbin [KMV08], which is a central ingredient in the proof of Theorem 7. We also thank Mahdi Cheraghchi for several discussions on the analysis of Boolean circuits with a bottom layer of parity gates. This work was funded in part by a Royal Society University Research Fellowship (URF\R1\

References

[AB18] Amir Abboud and Karl Bringmann. Tighter connections between formula-SAT and shaving logs. In ICALP, pages 8:1–8:18, 2018.
[ACR+10] Andris Ambainis, Andrew M. Childs, Ben Reichardt, Robert Spalek, and Shengyu Zhang. Any AND-OR formula of size N can be evaluated in time N^{1/2+o(1)} on a quantum computer. SIAM J. Comput., 39(6):2513–2530, 2010. doi.org/10.1137/080712167.
[ACW16] Josh Alman, Timothy M. Chan, and R. Ryan Williams. Polynomial representations of threshold functions and algorithmic applications. In FOCS, pages 467–476, 2016. doi.org/10.1109/FOCS.2016.57.
[AGHP92] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple construction of almost k-wise independent random variables. Random Struct. Algorithms, 3(3):289–304, 1992. doi.org/10.1002/rsa.3240030308.
[And87] Alexander E Andreev. On a method for obtaining more than quadratic effective lower bounds for the complexity of π-schemes. Moscow Univ. Math. Bull., 42(1):63–66, 1987.
[ASWZ96] Roy Armoni, Michael E. Saks, Avi Wigderson, and Shiyu Zhou. Discrepancy sets and pseudorandom generators for combinatorial rectangles. In FOCS, pages 412–421, 1996. doi.org/10.1109/SFCS.1996.548500.
[BBC+01] Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald de Wolf. Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001. doi.org/10.1145/502090.502097.
[BIP+18] Dan Boneh, Yuval Ishai, Alain Passelègue, Amit Sahai, and David J. Wu. Exploring crypto dark matter: New simple PRF candidates and their applications. In TCC, pages 699–729, 2018. doi.org/10.1007/978-3-030-03810-6_25.
[BNRdW07] Harry Buhrman, Ilan Newman, Hein Röhrig, and Ronald de Wolf. Robust polynomials and quantum algorithms. Theory Comput. Syst., 40(4):379–395, 2007. doi.org/10.1007/s00224-006-1313-z.
[BNS92] László Babai, Noam Nisan, and Mario Szegedy. Multiparty protocols, pseudorandom generators for logspace, and time-space trade-offs. J. Comput. Syst. Sci., 45(2):204–232, 1992. doi.org/10.1016/0022-0000(92)90047-M.
[CJW19] Lijie Chen, Ce Jin, and Ryan Williams. Hardness magnification for all sparse NP languages. In FOCS, 2019. ECCC:TR19-118.
[CKK+15] Ruiwen Chen, Valentine Kabanets, Antonina Kolokolova, Ronen Shaltiel, and David Zuckerman. Mining circuit lower bound proofs for meta-algorithms. Computational Complexity, 24(2):333–392, 2015. doi.org/10.1007/s00037-015-0100-0.
[CKLM19] Mahdi Cheraghchi, Valentine Kabanets, Zhenjian Lu, and Dimitrios Myrisiotis. Circuit lower bounds for MCSP from local pseudorandom generators. In ICALP, pages 39:1–39:14, 2019. ECCC:TR19-022.
[Cop82] Don Coppersmith. Rapid multiplication of rectangular matrices. SIAM J. Comput., 11(3):467–471, 1982. doi.org/10.1137/0211037.
[CW19] Lijie Chen and Ruosong Wang. Classical algorithms from quantum and Arthur-Merlin communication protocols. In ITCS, pages 23:1–23:20, 2019. doi.org/10.4230/LIPIcs.ITCS.2019.23.
[DH09] Evgeny Dantsin and Edward A Hirsch. Worst-case upper bounds. Handbook of Satisfiability, 185:403–424, 2009.
[DM18] Irit Dinur and Or Meir. Toward the KRW composition conjecture: Cubic formula lower bounds via communication complexity. Computational Complexity, 27(3):375–462, 2018. doi.org/10.1007/s00037-017-0159-x.
[FGG08] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum algorithm for the hamiltonian NAND tree. Theory of Computing, 4(1):169–190, 2008. doi.org/10.4086/toc.2008.v004a008.
[Fre90] Yoav Freund. Boosting a weak learning algorithm by majority. In COLT, pages 202–216, 1990. dl.acm.org/citation.cfm?id=92640.
[FSUV13] Bill Fefferman, Ronen Shaltiel, Christopher Umans, and Emanuele Viola. On beating the hybrid argument. Theory of Computing, 9:809–843, 2013. doi.org/10.4086/toc.2013.v009a026.
[GOWZ10] Parikshit Gopalan, Ryan O'Donnell, Yi Wu, and David Zuckerman. Fooling functions of halfspaces under product distributions. In CCC, pages 223–234, 2010. arXiv:1001.1593.
[GPW18] Mika Göös, Toniann Pitassi, and Thomas Watson. The landscape of communication complexity classes. Computational Complexity, 27(2):245–304, 2018.
[Gro96] Lov K. Grover. A fast quantum mechanical algorithm for database search. In STOC, pages 212–219, 1996. doi.org/10.1145/237814.237866.
[Hås98] Johan Håstad. The shrinkage exponent of de Morgan formulas is 2. SIAM J. Comput., 27(1):48–64, 1998. doi.org/10.1137/S0097539794261556.
[HLS07] Peter Høyer, Troy Lee, and Robert Spalek. Negative weights make adversaries stronger. In STOC, pages 526–535, 2007. doi.org/10.1145/1250790.1250867.
[HS19] Prahladh Harsha and Srikanth Srinivasan. On polynomial approximations to AC^0. Random Struct. Algorithms, 54(2):289–303, 2019. doi.org/10.1002/rsa.20786.
[IMZ12] Russell Impagliazzo, Raghu Meka, and David Zuckerman. Pseudorandomness from shrinkage. In FOCS, pages 111–119, 2012. ECCC:TR12-057.
[IN93] Russell Impagliazzo and Noam Nisan. The effect of random restrictions on formula size. Random Struct. Algorithms, 4(2):121–134, 1993. doi.org/10.1002/rsa.3240040202.
[INW94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson. Pseudorandomness for network algorithms. In STOC, pages 356–364, 1994. doi.org/10.1145/195058.195190.
[Juk12] Stasys Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012.
[Kab02] Valentine Kabanets. Derandomization: A brief overview. Current Trends in Theoretical Computer Science, 1:165–188, 2002.
[Khr71] Valeriy M Khrapchenko. Method of determining lower bounds for the complexity of π-schemes. Mathematical Notes, 10(1):474–479, 1971.
[KKL17] Valentine Kabanets, Daniel M. Kane, and Zhenjian Lu. A polynomial restriction lemma with applications. In STOC, pages 615–628, 2017. ECCC:TR17-026.
[KKMS08] Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777–1805, 2008. doi.org/10.1137/060649057.
[KL18] Valentine Kabanets and Zhenjian Lu. Satisfiability and derandomization for small polynomial threshold circuits. In APPROX/RANDOM, pages 46:1–46:19, 2018. doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.46.
[KMV08] Adam Tauman Kalai, Yishay Mansour, and Elad Verbin. On agnostic boosting and parity learning. In STOC, pages 629–638, 2008. doi.org/10.1145/1374376.1374466.
[KN97] Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge University Press, 1997.
[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994. mitpress.mit.edu/books/introduction-computational-learning-theory.
[LLS06] Sophie Laplante, Troy Lee, and Mario Szegedy. The quantum adversary method and classical formula size lower bounds. Computational Complexity, 15(2):163–196, 2006. doi.org/10.1007/s00037-006-0212-7.
[MTT61] Saburo Muroga, Iwao Toda, and Satoru Takasu. Theory of majority decision elements. Journal of the Franklin Institute, 271:376–418, 1961. doi.org/10.1016/0016-0032(61)90702-5.
[Nec66] E.I. Nechiporuk. On a Boolean function. Doklady Akademii Nauk SSSR, 169(4):765–766, 1966. English translation in Soviet Mathematics Doklady.
[Nis94] Noam Nisan. The communication complexity of threshold gates. In Proceedings of "Combinatorics, Paul Erdos is Eighty", pages 301–315, 1994.
[OPS19] Igor Carboni Oliveira, Ján Pich, and Rahul Santhanam. Hardness magnification near state-of-the-art lower bounds. In CCC, pages 27:1–27:29, 2019. doi.org/10.4230/LIPIcs.CCC.2019.27.
[OS17] Igor Carboni Oliveira and Rahul Santhanam. Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness. In CCC, pages 18:1–18:49, 2017. ECCC:TR16-197.
[OST19] Ryan O'Donnell, Rocco A. Servedio, and Li-Yang Tan. Fooling polytopes. In STOC, pages 614–625, 2019. arXiv:1808.04035.
[PRS88] Pavel Pudlák, Vojtech Rödl, and Petr Savický. Graph complexity. Acta Inf., 25(5):515–535, 1988.
[PW10] Mihai Patrascu and Ryan Williams. On the possibility of faster SAT algorithms. In SODA, pages 1065–1075, 2010. doi.org/10.1137/1.9781611973075.86.
[PZ93] Mike Paterson and Uri Zwick. Shrinkage of de Morgan formulae under restriction. Random Struct. Algorithms, 4(2):135–150, 1993. doi.org/10.1002/rsa.3240040203.
[Rei09] Ben Reichardt. Span programs and quantum query complexity: The general adversary bound is nearly tight for every Boolean function. In FOCS, pages 544–551, 2009. doi.org/10.1109/FOCS.2009.55.
[Rei11a] Ben Reichardt. Faster quantum algorithm for evaluating game trees. In SODA, pages 546–559, 2011. arXiv:0907.1623.
[Rei11b] Ben Reichardt. Reflections for quantum query algorithms. In SODA, pages 560–569, 2011. arXiv:1005.1601.
[RS12] Ben Reichardt and Robert Spalek. Span-program-based quantum algorithm for evaluating formulas. Theory of Computing, 8(1):291–319, 2012. doi.org/10.4086/toc.2012.v008a013.
[Ser17] Igor S Sergeev. Upper bounds for the size and the depth of formulae for MOD-functions. Discrete Mathematics and Applications, 27(1):15–22, 2017.
[ST17] Rocco A. Servedio and Li-Yang Tan. What circuit classes can be learned with non-trivial savings? In ITCS, pages 30:1–30:21, 2017. doi.org/10.4230/LIPIcs.ITCS.2017.30.
[ST19] Rocco A. Servedio and Li-Yang Tan. Improved pseudorandom generators from pseudorandom multi-switching lemmas. In APPROX/RANDOM, pages 45:1–45:23, 2019. doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.45.
[Sub61] Bella Abramovna Subbotovskaya. Realization of linear functions by formulas using ∨, &, −. In Doklady Akademii Nauk, volume 136, pages 553–555. Russian Academy of Sciences, 1961.
[Tal14] Avishay Tal. Shrinkage of de Morgan formulae by spectral techniques. In FOCS, pages 551–560, 2014. ECCC:TR14-048.
[Tal15] Avishay Tal. #SAT algorithms from shrinkage. Electronic Colloquium on Computational Complexity (ECCC), 22:114, 2015. ECCC:TR15-114.
[Tal16] Avishay Tal. The bipartite formula complexity of inner-product is quadratic. Electronic Colloquium on Computational Complexity (ECCC), 23:181, 2016. ECCC:TR16-181.
[Tal17] Avishay Tal. Formula lower bounds via the quantum method. In STOC, pages 1256–1268, 2017. doi.org/10.1145/3055399.3055472.
[Vad12] Salil P. Vadhan. Pseudorandomness. Foundations and Trends in Theoretical Computer Science, 7(1-3):1–336, 2012. doi.org/10.1561/0400000010.
[Val84] Leslie G. Valiant. A theory of the learnable. In STOC, pages 436–445, 1984. doi.org/10.1145/800057.808710.
[Vio15] Emanuele Viola. The communication complexity of addition. Combinatorica, 35(6):703–747, 2015. doi.org/10.1007/s00493-014-3078-3.
[Wil14] Ryan Williams. Nonuniform ACC circuit lower bounds. J. ACM, 61(1):2:1–2:32, 2014. doi.org/10.1145/2559903.
[Yao82] Andrew Chi-Chih Yao. Theory and applications of trapdoor functions. In FOCS, pages 80–91, 1982. doi.org/10.1109/SFCS.1982.45.

A Proofs of useful lemmas

A.1 Useful lemmas for formulas

The proofs in this section are essentially the same as those of [Tal16].

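Lemma 50 ([Tal16], Lemma 20 restated). Let $\mathcal{D}$ be a distribution over $\{-1,1\}^n$, and let $f, C \colon \{-1,1\}^n \to \{-1,1\}$ be such that

$$\Pr_{x \sim \mathcal{D}}[C(x) = f(x)] \ge 1/2 + \varepsilon.$$

Let $\tilde{C} \colon \{-1,1\}^n \to \mathbb{R}$ be an $\varepsilon$-approximating function of $C$, i.e., for every $x \in \{-1,1\}^n$, $|C(x) - \tilde{C}(x)| \le \varepsilon$. Then,

$$\mathbb{E}_{x \sim \mathcal{D}}[\tilde{C}(x) \cdot f(x)] \ge \varepsilon.$$

Proof. Since $\tilde{C}$ $\varepsilon$-approximates $C$ and $C(x) \in \{-1,1\}$, we have for every $x \in \{-1,1\}^n$

$$\tilde{C}(x) \cdot C(x) \ge 1 - \varepsilon \quad \text{and} \quad \tilde{C}(x) \cdot (-C(x)) \ge -1 - \varepsilon.$$

Note that $f(x) = C(x)$ in the first case below and $f(x) = -C(x)$ in the second. Then,

$$\begin{aligned}
\mathbb{E}_{x \sim \mathcal{D}}[\tilde{C}(x) \cdot f(x)]
&= \mathbb{E}_{x \sim \mathcal{D}}[\tilde{C}(x) \cdot f(x) \mid C(x) = f(x)] \cdot \Pr_{x \sim \mathcal{D}}[C(x) = f(x)] \\
&\quad + \mathbb{E}_{x \sim \mathcal{D}}[\tilde{C}(x) \cdot f(x) \mid C(x) \ne f(x)] \cdot \Pr_{x \sim \mathcal{D}}[C(x) \ne f(x)] \\
&\ge (1 - \varepsilon) \cdot \Pr_{x \sim \mathcal{D}}[C(x) = f(x)] + (-1 - \varepsilon) \cdot \left(1 - \Pr_{x \sim \mathcal{D}}[C(x) = f(x)]\right) \\
&= 2 \cdot \Pr_{x \sim \mathcal{D}}[C(x) = f(x)] - 1 - \varepsilon \\
&\ge 2 \cdot (1/2 + \varepsilon) - 1 - \varepsilon \\
&\ge \varepsilon,
\end{aligned}$$

as desired. □

Lemma 51 ([Tal16], Lemma 21 restated). Let $\mathcal{D}$ be a distribution over $\{-1,1\}^n$ and let $\mathcal{G}$ be a class of functions. For $f \colon \{-1,1\}^n \to \{-1,1\}$, suppose that $D \colon \{-1,1\}^n \to \{-1,1\}$ in $\mathsf{FORMULA}[s] \circ \mathcal{G}$ is such that

$$\Pr_{x \sim \mathcal{D}}[D(x) = f(x)] \ge 1/2 + \varepsilon.$$

Then there exists some $h \colon \{-1,1\}^n \to \{-1,1\}$ in $\mathsf{XOR}_{O(\sqrt{s} \cdot \log(1/\varepsilon))} \circ \mathcal{G}$ such that

$$\mathbb{E}_{x \sim \mathcal{D}}[h(x) \cdot f(x)] \ge \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}}.$$

Proof. Let $D = F(g_1, g_2, \dots, g_s)$ be a device in $\mathsf{FORMULA}[s] \circ \mathcal{G}$, where $F$ is a formula and $g_1, g_2, \dots, g_s$ are functions from $\mathcal{G}$. Let $p \colon \{-1,1\}^s \to \mathbb{R}$ be an $\varepsilon$-approximating polynomial for $F$ of degree $d = O(\sqrt{s} \cdot \log(1/\varepsilon))$. Note that we can write

$$p(z) = \sum_{S \subseteq [s] \colon |S| \le d} \hat{p}(S) \cdot \prod_{i \in S} z_i.$$

Also, for each $S \subseteq [s]$, we have

$$|\hat{p}(S)| = \left| \mathbb{E}_{z \in \{-1,1\}^s}\left[ p(z) \cdot \prod_{i \in S} z_i \right] \right| \le 1 + \varepsilon.$$

Define $\tilde{D} := p(g_1, g_2, \dots, g_s)$. Note that $\tilde{D}$ is an $\varepsilon$-approximating function for $D$. Therefore, by Lemma 50, we have

$$\begin{aligned}
\varepsilon &\le \mathbb{E}_{x \sim \mathcal{D}}[\tilde{D}(x) \cdot f(x)] \\
&= \mathbb{E}_{x \sim \mathcal{D}}\left[ \left( \sum_{S \subseteq [s] \colon |S| \le d} \hat{p}(S) \cdot \prod_{i \in S} g_i(x) \right) \cdot f(x) \right] \\
&= \sum_{S \subseteq [s] \colon |S| \le d} \hat{p}(S) \cdot \mathbb{E}_{x \sim \mathcal{D}}\left[ \prod_{i \in S} g_i(x) \cdot f(x) \right] \\
&\le \sum_{S \subseteq [s] \colon |S| \le d} (1 + \varepsilon) \cdot \left| \mathbb{E}_{x \sim \mathcal{D}}\left[ \prod_{i \in S} g_i(x) \cdot f(x) \right] \right|.
\end{aligned}$$

The last expression is a sum of at most $s^{O(d)}$ summands. Therefore, there exists some $S \subseteq [s]$ such that

$$\left| \mathbb{E}_{x \sim \mathcal{D}}\left[ \prod_{i \in S} g_i(x) \cdot f(x) \right] \right| \ge \frac{\varepsilon}{(1 + \varepsilon) \cdot s^{O(d)}} \ge \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}},$$

which implies that there exists some $h$, with either $h = \prod_{i \in S} g_i$ or $h = -\prod_{i \in S} g_i$, such that

$$\mathbb{E}_{x \sim \mathcal{D}}[h(x) \cdot f(x)] \ge \frac{1}{s^{O(\sqrt{s} \cdot \log(1/\varepsilon))}}.$$

Finally, note that such an $h$ can be expressed as the XOR of at most $d$ functions from $\mathcal{G}$. □

A.2 PRG for low-communication functions in the number-in-hand setting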

In this subsection, we show how to fool functions with low communication complexity in the number-in-hand model.

Theorem 52 ([ASWZ96, INW94], Theorem 28 restated). For any $k \ge 2$, there exists a PRG that $\delta$-fools any $n$-bit function with $k$-party number-in-hand deterministic communication complexity at most $D'$, with seed length

$$n/k + O\left(D' + \log(1/\delta) + \log(k)\right) \cdot \log(k).$$

The PRG in Theorem 28 is based on the PRG by Impagliazzo, Nisan and Wigderson [INW94] that is used to derandomize "network algorithms" and space-bounded computation. We will need to use randomness extractors, which we review below.

Definition 53 (Min-entropy). Let $X$ be a random variable. The min-entropy of $X$, denoted by $H_\infty(X)$, is the largest real number $k$ such that $\Pr[X = x] \le 2^{-k}$ for every $x$ in the range of $X$. If $X$ is a distribution over $\{-1,1\}^{\aleph}$ with $H_\infty(X) \ge k$, then $X$ is called an $(\aleph, k)$-source.

Definition 54 (Extractors). A function $\mathrm{Ext} \colon \{-1,1\}^{\aleph} \times \{-1,1\}^d \to \{-1,1\}^m$ is a $(k, \varepsilon)$-extractor if, for any $(\aleph, k)$-source $X$ and any test $T \colon \{-1,1\}^m \to \{-1,1\}$, it is the case that

$$\left| \Pr[T(\mathrm{Ext}(X, U_d)) = 1] - \Pr[T(U_m) = 1] \right| \le \varepsilon.$$

Theorem 55 ([Vad12, Theorem 6.22]). For any integers $m, \kappa > 0$ and any $0 < \delta' < 1$, there exists an explicit $(\kappa, \delta')$-extractor $\mathrm{Ext} \colon \{0,1\}^m \times \{0,1\}^d \to \{0,1\}^m$ with $d = O(m - \kappa + \log(1/\delta'))$.

We are now ready to show Theorem 28.
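As a small numerical illustration of Definition 53 (a sketch only, not part of the original argument), the snippet below computes the min-entropy of a finite distribution and checks the standard fact that a uniform $r$-bit string conditioned on an event of probability $p$ has min-entropy $r - \log_2(1/p)$. The names `min_entropy` and `condition_uniform`, the predicate `A`, and the value of `r` are hypothetical and chosen purely for the example.

```python
import math
from itertools import product

def min_entropy(dist):
    """Min-entropy H_inf(X) = -log2(max_x Pr[X = x]) of a finite distribution.

    `dist` maps outcomes to probabilities (assumed to sum to 1).
    """
    return -math.log2(max(dist.values()))

def condition_uniform(r, event):
    """Uniform distribution over r-bit tuples, conditioned on `event` holding."""
    support = [x for x in product((0, 1), repeat=r) if event(x)]
    if not support:
        raise ValueError("event has probability zero")
    return {x: 1.0 / len(support) for x in support}

# Illustrative predicate (an assumption for this example): first two bits are 1.
r = 6
A = lambda x: x[0] == 1 and x[1] == 1
cond = condition_uniform(r, A)
p = len(cond) / 2 ** r   # Pr[A(X) = 1] under the uniform distribution
h = min_entropy(cond)    # min-entropy of X conditioned on A(X) = 1
```

In this toy case the event has probability 1/4, so conditioning costs exactly two bits of min-entropy.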

Proof of Theorem 28. We first describe the construction of the PRG. In fact, we will construct a sequence of PRGs $G_0, G_1, \dots, G_{\log(k)}$. We begin by specifying the parameters of these PRGs. Let $t = \log(k)$, and let $d = O\left(D' + \log(1/\delta) + t\right)$. For $i = 0, 1, \dots, t$, let

• $r_0 = n/k$,
• $r_i = r_{i-1} + d$.

Note that we have $r_i = n/k + i \cdot d$. Also, let

$$\mathrm{Ext}_i \colon \{0,1\}^{r_i} \times \{0,1\}^d \to \{0,1\}^{r_i}$$

be a $(\kappa_i, \delta')$-extractor from Theorem 55, where

$$\kappa_i = r_i - D' - 2t - \log(1/\delta) \quad \text{and} \quad \delta' = \delta / \left(3^t \cdot 2^{D'}\right).$$

Note that the seed length of the extractors is $d = O(r_i - \kappa_i + \log(1/\delta')) = O(D' + \log(1/\delta) + t)$. Finally, define $G_i \colon \{0,1\}^{r_i} \to \{0,1\}^{n/2^{t-i}}$ recursively as follows:

• $G_0(a) = a$, where $a \in \{0,1\}^{n/k}$.
• $G_i(a, z) = G_{i-1}(a) \circ G_{i-1}(\mathrm{Ext}_{i-1}(a, z))$, where $a \in \{0,1\}^{r_{i-1}}$ and $z \in \{0,1\}^d$.

We will show that $G_t \colon \{0,1\}^{r_t} \to \{0,1\}^n$, where $r_t = n/k + t \cdot d$, fools any function $f$ with $k$-party number-in-hand deterministic communication complexity at most $D'$. First, note that such an $f$ can be written as

$$f(x_1, x_2, \dots, x_k) = \sum_{i=1}^{2^{D'}} h^{(i)}_1(x_1) \cdot h^{(i)}_2(x_2) \cdot \ldots \cdot h^{(i)}_k(x_k),$$

for some $h^{(i)}_j \colon \{0,1\}^{n/k} \to \{0,1\}$ ($i \in [2^{D'}]$, $j \in [k]$). Therefore, to show that the PRG $G_t$ $\delta$-fools $f$, it suffices to show that $G_t$ $\left(\delta / 2^{D'}\right)$-fools every function $g$ of the form

$$g(x_1, x_2, \dots, x_k) = h_1(x_1) \cdot h_2(x_2) \cdot \ldots \cdot h_k(x_k).$$

More specifically, we show the following.

Claim 56. For every $k \ge 2$ and $0 \le i \le t$, the generator $G_i$ defined above $\left(3^i \cdot \delta'\right)$-fools every function $g_i \colon \{0,1\}^{n/2^{t-i}} \to \{0,1\}$ of the form

$$g_i(x_1, x_2, \dots, x_{k/2^{t-i}}) = h_1(x_1) \cdot h_2(x_2) \cdot \ldots \cdot h_{k/2^{t-i}}(x_{k/2^{t-i}}),$$

where $x_1, x_2, \dots, x_{k/2^{t-i}} \in \{0,1\}^{n/k}$.

Proof. The proof is by induction on $i$. The base case is $i = 0$, which is trivial given the definition of $G_0$. Now suppose the claim holds for $i - 1$; we show it for $i$. This is done using a hybrid argument. Consider the following four distributions:

• $D_0 = U_{n/2^{t-i}}$,
• $D_1 = U_{n/2^{t-i+1}} \circ G_{i-1}(U_{r_{i-1}})$,
• $D_2 = G_{i-1}(U_{r_{i-1}}) \circ G_{i-1}(U'_{r_{i-1}})$ (where $U$ and $U'$ are two independent uniform distributions),
• $D_3 = G_i(U_{r_i})$.

We want to show that

$$\left| \mathbb{E}[g_i(D_0)] - \mathbb{E}[g_i(D_3)] \right| \le 3^i \cdot \delta'.$$

By the triangle inequality, it suffices to show that

$$\left| \mathbb{E}[g_i(D_0)] - \mathbb{E}[g_i(D_1)] \right| + \left| \mathbb{E}[g_i(D_1)] - \mathbb{E}[g_i(D_2)] \right| + \left| \mathbb{E}[g_i(D_2)] - \mathbb{E}[g_i(D_3)] \right| \le 3^i \cdot \delta'. \tag{12}$$

We show Equation (12) by upper bounding each of the three summands.

First summand. We show that

$$\left| \mathbb{E}[g_i(D_0)] - \mathbb{E}[g_i(D_1)] \right| \le 3^{i-1} \cdot \delta'. \tag{13}$$

Let us rewrite $g_i$ as

$$g_i(x_1, x_2, \dots, x_{k/2^{t-i}}) = h_L(x_1, x_2, \dots, x_{k/2^{t-i+1}}) \cdot h_R(x_{k/2^{t-i+1}+1}, x_{k/2^{t-i+1}+2}, \dots, x_{k/2^{t-i}}),$$

where

$$h_L(y_1, \dots, y_{k/2^{t-i+1}}) := \prod_{j=1}^{k/2^{t-i+1}} h_j(y_j) \quad \text{and} \quad h_R(y_1, \dots, y_{k/2^{t-i+1}}) := \prod_{j=1}^{k/2^{t-i+1}} h_{k/2^{t-i+1}+j}(y_j).$$

Then,

$$\begin{aligned}
\mathbb{E}[g_i(D_1)] &= \mathbb{E}\left[ h_L(U_{n/2^{t-i+1}}) \cdot h_R(G_{i-1}(U_{r_{i-1}})) \right] \\
&= \mathbb{E}\left[ h_L(U_{n/2^{t-i+1}}) \right] \cdot \mathbb{E}\left[ h_R(G_{i-1}(U_{r_{i-1}})) \right] \\
&= \mathbb{E}\left[ h_L(U_{n/2^{t-i+1}}) \right] \cdot \left( \mathbb{E}\left[ h_R(U_{n/2^{t-i+1}}) \right] \pm 3^{i-1} \cdot \delta' \right) && \text{(by the induction hypothesis)} \\
&= \mathbb{E}\left[ h_L(U_{n/2^{t-i+1}}) \right] \cdot \mathbb{E}\left[ h_R(U_{n/2^{t-i+1}}) \right] \pm 3^{i-1} \cdot \delta' \\
&= \mathbb{E}[g_i(D_0)] \pm 3^{i-1} \cdot \delta',
\end{aligned}$$

as desired.

Second summand. By a similar argument, it can be shown that

$$\left| \mathbb{E}[g_i(D_1)] - \mathbb{E}[g_i(D_2)] \right| \le 3^{i-1} \cdot \delta'. \tag{14}$$

We omit the details here.

Third summand. We show that

$$\left| \mathbb{E}[g_i(D_2)] - \mathbb{E}[g_i(D_3)] \right| \le \delta'. \tag{15}$$

We have

$$\begin{aligned}
\mathbb{E}[g_i(D_3)] &= \mathbb{E}[g_i(G_i(U_{r_i}))] \\
&= \mathbb{E}\left[ h_L(G_{i-1}(X)) \cdot h_R(G_{i-1}(\mathrm{Ext}_{i-1}(X, Z))) \right] && \text{(where } X \sim \{0,1\}^{r_{i-1}} \text{ and } Z \sim \{0,1\}^d\text{)} \\
&= \mathbb{E}\left[ A(X) \cdot B(\mathrm{Ext}_{i-1}(X, Z)) \right] && \text{(where } A(\cdot) = h_L(G_{i-1}(\cdot)) \text{ and } B(\cdot) = h_R(G_{i-1}(\cdot))\text{)} \\
&= \mathbb{E}\left[ B(\mathrm{Ext}_{i-1}(X, Z)) \mid A(X) = 1 \right] \cdot \Pr[A(X) = 1].
\end{aligned}$$

Similarly, we get

$$\mathbb{E}[g_i(D_2)] = \mathbb{E}\left[ B(U_{r_{i-1}}) \mid A(X) = 1 \right] \cdot \Pr[A(X) = 1].$$

As a result, we have

$$\left| \mathbb{E}[g_i(D_2)] - \mathbb{E}[g_i(D_3)] \right| = \left| \left( \mathbb{E}\left[ B(\mathrm{Ext}_{i-1}(X, Z)) \mid A(X) = 1 \right] - \mathbb{E}\left[ B(U_{r_{i-1}}) \mid A(X) = 1 \right] \right) \cdot \Pr[A(X) = 1] \right|. \tag{16}$$

On the one hand, if $\Pr[A(X) = 1] \le \delta'$, then Equation (16) is at most $\delta'$. On the other hand, if $\Pr[A(X) = 1] > \delta'$, then

$$H_\infty(X \mid A(X) = 1) > r_{i-1} - \log(1/\delta') > r_{i-1} - D' - 2t - \log(1/\delta) = \kappa_{i-1}.$$

Then, by the fact that $\mathrm{Ext}_{i-1}$ is a $(\kappa_{i-1}, \delta')$-extractor, we have

$$\left| \mathbb{E}\left[ B(\mathrm{Ext}_{i-1}(X, Z)) \mid A(X) = 1 \right] - \mathbb{E}\left[ B(U_{r_{i-1}}) \mid A(X) = 1 \right] \right| \le \delta'.$$

Therefore, Equation (16) is at most $\delta'$, and this completes the proof of Equation (15).

Finally, note that Equation (12) follows from Equation (13), Equation (14) and Equation (15), since $2 \cdot 3^{i-1} \cdot \delta' + \delta' \le 3^i \cdot \delta'$ for every $i \ge 1$. This completes the proof of Claim 56. □ (Claim 56)

Given Claim 56, Theorem 28 now follows by letting $i = t$. □
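The recursive structure of the generators used in this proof can be sketched in code. The following is a minimal sketch, under stated assumptions, of the recursion $G_i(a, z) = G_{i-1}(a) \circ G_{i-1}(\mathrm{Ext}_{i-1}(a, z))$ and of the seed-length bookkeeping $r_i = n/k + i \cdot d$. The function `toy_ext` is a hypothetical stand-in that only has the right input and output lengths; it is not an extractor in the sense of Definition 54 and gives no pseudorandomness guarantee. The values of `n`, `k` and `d` are arbitrary.

```python
from typing import List

Bits = List[int]

def toy_ext(a: Bits, z: Bits) -> Bits:
    """Placeholder for Ext_{i-1}: maps (r bits, d bits) to r bits.

    ASSUMPTION: a shape-correct stand-in only (cyclic XOR of the seed into
    the source). It is NOT a real randomness extractor.
    """
    return [bit ^ z[j % len(z)] for j, bit in enumerate(a)]

def generator(i: int, seed: Bits, block: int, d: int) -> Bits:
    """G_i on a seed of length r_i = block + i*d; outputs block * 2**i bits."""
    assert d > 0, "extractor seed length must be positive"
    assert len(seed) == block + i * d, "seed must have length r_i"
    if i == 0:
        return list(seed)                        # G_0(a) = a
    a, z = seed[:-d], seed[-d:]                  # a has r_{i-1} bits, z has d bits
    left = generator(i - 1, a, block, d)         # G_{i-1}(a)
    right = generator(i - 1, toy_ext(a, z), block, d)  # G_{i-1}(Ext_{i-1}(a, z))
    return left + right                          # concatenation of the two halves

# Example parameters (arbitrary): n = 32 output bits, k = 4 parties, d = 3.
n, k, d = 32, 4, 3
t = k.bit_length() - 1                           # t = log2(k), k a power of two
seed = [(7 * j + 1) % 2 for j in range(n // k + t * d)]
out = generator(t, seed, n // k, d)
```

With these toy parameters the seed has $n/k + t \cdot d = 14$ bits and the output has $n = 32$ bits, matching the length accounting in the proof. Any fooling guarantee would require replacing `toy_ext` with an explicit extractor such as the one from Theorem 55.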