Weights at the Bottom Matter When the Top is Heavy
Arkadev Chattopadhyay ∗ and Nikhil S. Mande † School of Technology and Computer Science, TIFR, Mumbai
Abstract
Proving super-polynomial lower bounds against depth-2 threshold circuits of the form THR ◦ THR is a well-known open problem that represents a frontier of our understanding in boolean circuit complexity. By contrast, exponential lower bounds on the size of THR ◦ MAJ circuits were shown by Razborov and Sherstov [31] even for computing functions in depth-3 $\mathsf{AC}^0$. Yet, no separation between the two depth-2 threshold circuit classes was known. In fact, it is not clear a priori that they ought to be different. In particular, Goldmann, Håstad and Razborov [14] showed that the class MAJ ◦ MAJ is identical to the class MAJ ◦ THR.

In this work, we provide an exponential separation between THR ◦ MAJ and THR ◦ THR. We achieve this by showing a function f that is computed by linear size THR ◦ THR circuits and yet has exponentially large sign rank. This, by a well-known result, implies that f requires exponentially large THR ◦ MAJ circuits to be computed. Our result suggests that the sign rank method alone is unlikely to prove strong lower bounds against THR ◦ THR circuits.

The main technical ingredient of our work is to prove a strong sign rank lower bound for an XOR function. This requires novel use of approximation theoretic tools.

∗ Partially supported by a Ramanujan fellowship of the DST. [email protected]
† Supported by a DAE fellowship. [email protected]

1 Introduction
Understanding the computational power of constant-depth, unbounded fan-in threshold circuits is one of the most fundamental open problems in theoretical computer science. Despite several years of intensive research [1, 16, 20, 14, 30, 5, 23, 24, 12, 13, 31, 17, 18, 22, 10], we still do not have strong lower bounds against depth-3 or depth-2 threshold circuits, depending on how we define threshold gates. The most natural definition of such a gate, denoted by $\mathsf{THR}_w$, is just a linear halfspace induced by the real weight vector $w = (w_0, w_1, \ldots, w_n) \in \mathbb{R}^{n+1}$. In other words, on an input $x \in \{-1,1\}^n$,

$$\mathsf{THR}_w(x) = \mathrm{sgn}\Big(w_0 + \sum_{i=1}^{n} w_i x_i\Big).$$

The class of all boolean functions that can be computed by circuits of depth d and polynomial size, comprising such gates, is denoted by $\mathsf{LT}_d$. The seminal work of Minsky and Papert [26] showed that the simple function, Parity, is not in $\mathsf{LT}_1$. While it is not hard to verify that Parity is in $\mathsf{LT}_2$, an outstanding problem is to exhibit an explicit function that is not in $\mathsf{LT}_2$. This problem is now a well-identified frontier for research in circuit complexity.

A natural question is how large the individual weights in the weight vector w need to be if we allow just integer weights. It was well-known [27] that for every threshold gate with n inputs, there exists a threshold representation for it that uses only integer weights of magnitude at most $2^{O(n \log n)}$. While proving a $2^{\Omega(n)}$ lower bound is not very difficult, a matching lower bound was shown only in the nineties by Håstad [19]. Understanding the power of large weights vs. small weights in the more general context of small-depth circuits has attracted attention in several works [1, 14, 34, 17, 18, 30, 16, 21, 15]. More precisely, let $\widehat{\mathsf{LT}}_d$ denote the class of boolean functions that can be computed by polynomial size and depth d circuits comprising only threshold gates, each of whose integer weights is polynomially bounded in n, the number of input bits to the circuit. Interestingly, improving upon an earlier line of work [8, 29, 34], Goldmann, Håstad and Razborov [14] showed, among other things, that $\mathsf{LT}_d \subseteq \widehat{\mathsf{LT}}_{d+1}$. It also remains open to exhibit an explicit function that is not in $\widehat{\mathsf{LT}}_3$.
This is a very important frontier, as the work of Yao [35] and Beigel and Tarui [4] shows that the entire class ACC is contained in the class of functions computable by quasi-polynomial size threshold circuits of small weight and depth three. By contrast, the relatively early work of Hajnal et al. [16] established the fact that the Inner-Product modulo 2 function (denoted by IP), which is easily seen to be in $\widehat{\mathsf{LT}}_3$, is not in $\widehat{\mathsf{LT}}_2$. Summarizing, we have $\widehat{\mathsf{LT}}_2 \subseteq \mathsf{LT}_2 \subseteq \widehat{\mathsf{LT}}_3$. Where precisely between $\widehat{\mathsf{LT}}_2$ and $\widehat{\mathsf{LT}}_3$ do current techniques for lower bounds stop working?

In search of the answer to the above question, researchers have investigated the finer structure of depth-2 threshold circuits, and this has generated many new techniques that are interesting in their own right. Recall the Majority function, denoted by MAJ, that outputs 1 precisely when the majority of its n input bits are set to 1. It is simple to verify that $\widehat{\mathsf{LT}}_2$ = MAJ ◦ MAJ. Goldmann et al. [14] proved two very interesting results. First, they showed that the classes MAJ ◦ MAJ and MAJ ◦ THR are identical, i.e. weights of the bottom gates do not matter if the top gate is allowed only polynomial weight. Second, they showed that MAJ ◦ MAJ is strictly contained in the class THR ◦ MAJ, i.e. the weight at the top does matter if the bottom weights are restricted to be polynomially bounded in the input length. This revealed the following structure:

$$\widehat{\mathsf{LT}}_2 = \mathsf{MAJ} \circ \mathsf{THR} \subsetneq \mathsf{THR} \circ \mathsf{MAJ} \subseteq \mathsf{LT}_2 \subseteq \widehat{\mathsf{LT}}_3.$$

This raised the following two questions: how powerful is the class
THR ◦ MAJ and how does one prove lower bounds on the size of such circuits?

In a breakthrough work, Forster [12] showed that IP requires size $2^{\Omega(n)}$ to be computed by THR ◦ MAJ circuits. This yielded an exponential separation between THR ◦ MAJ and $\widehat{\mathsf{LT}}_3$. This also meant that at least one of the two containments THR ◦ MAJ ⊆ $\mathsf{LT}_2$ and $\mathsf{LT}_2 \subseteq \widehat{\mathsf{LT}}_3$ is strict. While it is quite possible that both of them are strict, until now no progress on this question was made. In particular, Amano and Maruoka [1] and Hansen and Podolskii [17] state that separating THR ◦ MAJ from THR ◦ THR = $\mathsf{LT}_2$ would be an important step for shedding more light on the structure of depth-2 boolean circuits. However, as far as we know, there was no clear target function identified for the purpose of separating the two classes.

In this work, we exhibit such a function and prove that it achieves the desired separation. To state our result formally, consider the following function that is a simple adaptation of a well-known function called ODD-MAX-BIT, which we denote by $\mathsf{OMB}_\ell$: it outputs $-1$ precisely when the largest index $i$ with $x_i = 1$ is odd. It is a linear threshold function:

$$\mathsf{OMB}_\ell(x) = -1 \iff \sum_{i=1}^{\ell} (-1)^{i+1} 2^{i} (1 + x_i) \ge 0.5.$$

Let $f_m \circ g_n : \{-1,1\}^{m \times n} \to \{-1,1\}$ be the composed function on mn input bits, where each of the m input bits to the outer function f is obtained by applying the inner function g to a block of n bits. Then, we show the following:

Theorem 1.1.
Let $F_n$ be defined on $n = 2\ell^{4/3}$ bits as $\mathsf{OMB}_\ell \circ \mathsf{OR}_{\ell^{1/3} - \log \ell} \circ \mathsf{XOR}$. Every THR ◦ MAJ circuit computing $F_n$ needs size $2^{\Omega(n^{1/4})}$.

To show that the above suffices to provide us with the separation of threshold circuit classes, we first observe the following: for each $x \in \{-1,1\}^n$, let $\mathsf{ETHR}_w(x) = -1 \iff w_0 + w_1 x_1 + \cdots + w_n x_n = 0$. Thus, ETHR gates are also called exact threshold gates. By first observing that every function computed by a circuit of the form THR ◦ OR can also be computed by a circuit of the form THR ◦ AND with a linear blow-up in size, it follows that $F_n$ can be computed by linear size circuits of the form THR ◦ AND ◦ XOR. Observe that each AND ◦ XOR is computed by an ETHR gate. Hence, $F_n$ is in THR ◦ ETHR, a class that Hansen and Podolskii [17] showed is identical to the class THR ◦ THR. Thus, Theorem 1.1 yields the following fact:
Corollary 1.2.
The function $F_n$ (exponentially) separates the class THR ◦ MAJ from THR ◦ THR.

1.1 Our Techniques and Related Work

The starting point for our lower bound is the same as for all known lower bounds (see, for example, [12, 31, 7]) on the size of THR ◦ MAJ circuits. We strive to prove a lower bound on a quantity called the sign rank of our target function f. Given a partition of the input bits of f into two parts X, Y, consider the real matrix $M_f$, given by $M_f[x, y] = f(x, y)$ for each $x \in \{-1,1\}^{|X|}$, $y \in \{-1,1\}^{|Y|}$. A real matrix sign represents $M_f$ if each of its entries agrees in sign with the corresponding entry of $M_f$. The sign rank of $M_f$ (also informally called the sign rank of f, when the input partition is clear from the context) is the rank of a minimal rank matrix that sign represents it. It is not hard to see that the sign rank of a function f computed by a THR ◦ MAJ circuit of size s is at most $O(n \cdot s)$. This sets a target of proving a strong lower bound on the sign rank of f for showing that it is hard for THR ◦ MAJ.

Sign rank has a matrix-rigidity flavor to it, and therefore is quite non-trivial to bound. Forster's [12] deep result (see Theorem 2.7) shows that the sign rank of a matrix can be lower bounded by appropriately upper-bounding its spectral norm. This is enough to lower bound the sign rank of functions like IP, as the corresponding matrices are orthogonal and therefore have relatively small spectral norm. However, for other functions f, the spectral norm of the sign matrix $M_f$ can be large. This is true, for example, for many functions in $\mathsf{AC}^0$. In a beautiful work, Razborov and Sherstov [31] showed that Forster's basic method can be adapted to prove exponentially strong lower bounds on the sign rank of such a function f. However, our first problem is devising an f in THR ◦ THR that plausibly has high sign rank. On this, we were guided by another interpretation of sign rank, due to Paturi and Simon [28]. Paturi and Simon introduced a model of 2-party randomized communication, called the unbounded-error model. In this model, Alice and Bob have to give the right answer with probability just greater than 1/2 on every input.
This is, by far, the strongest known 2-party model against which we know how to prove lower bounds. Paturi and Simon [28] showed that the sign rank of the communication matrix of f essentially characterizes its unbounded-error complexity.

Why should some function f ∈ THR ◦ THR have large unbounded-error complexity? The natural protocol one is tempted to use is the following: assume that the sum of the magnitudes of the weights of the top THR gate is 1. Sample a sub-circuit of the top gate with a probability proportional to its weight. Then, use the best protocol for the sampled bottom THR gate. Note that for any given input x, with probability $1/2 + \varepsilon$, one samples a bottom gate that agrees with the value of f(x). Here, ε can be as small as the smallest weight of the top gate. Thus, if we had a small cost randomized protocol for the bottom THR gate that errs with probability significantly less than ε, we would have a small cost unbounded-error protocol for our function f. Fortunately for us (the lower bound prover), there does not seem to exist any such efficient randomized protocol for THR, when $\varepsilon = 1/2^{n^{\Omega(1)}}$.

Taking this a step further, one could hope that the bottom gates could be any function that is hard to compute with such tiny error ε. The simplest such canonical function is Equality (denoted by EQ). Therefore, a plausible target is THR ◦ EQ. This still turns out to be in THR ◦ THR as EQ ∈ ETHR. Moreover, EQ has a nice composed structure. It is just AND ◦ XOR, which lets us re-express our target as f = THR ◦ AND ◦ XOR, for some top THR that is 'suitably' hard; hard so that the sign rank of f becomes large! At this point, we view f as an XOR function whose outer function, g, needs to have sufficiently good analytic properties for us to prove that g ◦ XOR has high sign rank.

We are naturally drawn to the work of Razborov and Sherstov [31] for inspiration as they bound the sign rank of a three-level composed function as well. They showed that
AND ◦ OR ◦ AND, an AND function, has high sign rank. They exploited the fact that AND functions embed inside them pattern matrices, which have convenient spectral properties as observed in [33]. These spectral properties lead them to look for an approximately smooth orthogonalizing distribution w.r.t. which the outer function f = AND ◦ OR has zero correlation with small degree parities. This gives rise naturally to an LP that seeks to maximize the smoothness of the distribution under the constraints of low-degree orthogonality. The main technical challenge that the Razborov-Sherstov work overcomes is the analysis of the dual of this LP using and building appropriate tools of approximation theory. We take our cue from this work and follow its general framework of analysing the dual of a suitable LP. However, as we are forced to work with an XOR function, new challenges crop up. This is expected, for if we take the same outer function of AND ◦ OR, then the resulting XOR function has small sign rank. Indeed, this remains true even if one were to harden the outer function to MAJ ◦ OR. This is simply because a simple efficient UPP protocol for MAJ ◦ EQ exists: pick a random EQ and then execute a protocol of cost $O(\log n)$ that solves this EQ with error less than $1/n$.

The specific new technical challenge that one faces is the following: instead of low degree orthogonality, one now needs a distribution µ w.r.t. which the outer function has low correlation with all parities (see LP1). Just dealing with high degree parity constraints, though non-trivial, was done in the recent work of the authors [9]. However, unlike there, here one needs the additional constraint of the distribution being (approximately) smooth enough. Analysing this combination of high degree parity constraints and the smoothness constraints is the main new technical challenge that our work addresses. We do this by a novel combination of ideas that differs entirely from the Razborov-Sherstov analysis.

Analyzing the dual of our LP (LP2) involves arguing against the existence of a certain kind of (possibly high degree) polynomial representation. We require several ideas to deal with it. First, the dual polynomial P has unit weight. While it does not necessarily sign represent $f = \mathsf{THR}_\ell \circ \mathsf{OR}_m$, it is constrained to not stray too far away from zero on the wrong side on each point of its domain $\{-1,1\}^n$. Moreover, over a set X, where we want the distribution to be smooth in the primal LP, roughly speaking, P's margin in representing f on average has to be good. Since the set X has to be large (to get good approximate smoothness), we are essentially forced to include in X all inputs that are mapped to $-1^l$ by the bottom ORs of f. In particular, we set X to be precisely the set of such points. With this setting, our bound on the sign rank becomes roughly δ/OPT, where
OPT is the optimal value of the LP.

The first idea we use is an averaging argument that appeared in the work of Krause and Pudlák [24]. For each possible input $y \in \{-1,1\}^\ell$ to $\mathsf{THR}_\ell$, it takes the average of all values of P under the uniform distribution over all points x such that $\mathsf{OR}_m(x) = y$. This achieves the following, as described in Lemma 3.1: the polynomial P over x transforms to an OR polynomial Q, over y's, of the same weight as P, plus an error term whose magnitude is exponentially small in the fan-in of the bottom OR gates of f. Here, an OR polynomial is a linear combination of ORs of subsets of variables. Assuming, for the sake of contradiction, OPT to be large enough, we can safely ignore the error term. This gives us a passage to an OR polynomial of unit weight representing our top THR function g, with the same worst-case guarantee that held for P. Additionally, we get the guarantee that at $y = -1^l$, Q's margin is better by the average margin of P on the set X. The intuition is that when OPT is large, this average margin is also large compared to ∆, the worst case margin.

Now we want to argue that such a Q does not exist if we select our top threshold g judiciously. We select the ODD-MAX-BIT function, denoted by OMB, for this purpose. We then observe that if we randomly restrict each variable to $-1$, then the expected weight of OR monomials of degree at least d that do not get fixed is as small as $1/2^d$. Ignoring these high degree monomials therefore does not decrease our margin by too much. Further, with high probability, the restriction induces an OMB on a sufficiently large number of free variables. This now gives us a polynomial Q′ of degree less than d that has worst case margin not too small, but does somewhat better on $-1^l$. While margin bounds against sign representing polynomials of sufficiently small degree have been obtained several times before, our setting is different: Q′ is not sign-representing OMB. It is here that our choice of the ODD-MAX-BIT function comes in very handy. We show that a standard approximation theoretic lemma of Ehlich and Zeller [11] and Rivlin and Cheney [32] can be used to argue against the existence of such a Q′ for OMB.

2 Preliminaries

In this section, we provide the necessary preliminaries.
Definition 2.1 (Threshold functions). A function $f : \{-1,1\}^n \to \{-1,1\}$ is called a linear threshold function if there exist integer weights $a_0, a_1, \ldots, a_n$ such that for all inputs $x \in \{-1,1\}^n$, $f(x) = \mathrm{sgn}\big(a_0 + \sum_{i=1}^{n} a_i x_i\big)$. Let THR denote the class of all such functions.
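As an illustration of Definition 2.1, the following minimal Python sketch checks by brute force that ODD-MAX-BIT is a linear threshold function with integer weights. It assumes the form $\mathsf{OMB}_\ell(x) = -1 \iff \sum_{i=1}^{\ell} (-1)^{i+1} 2^i (1 + x_i) \ge 0.5$; the helper names `thr`, `omb_weights`, `omb_direct` and `check_omb` are hypothetical and introduced only for this example.

```python
from itertools import product


def thr(weights, x):
    """Evaluate sgn(a_0 + sum_i a_i * x_i) for weights = (a_0, a_1, ..., a_n)."""
    total = weights[0] + sum(a * xi for a, xi in zip(weights[1:], x))
    if total == 0:
        raise ValueError("sgn(0) is undefined for a threshold gate")
    return 1 if total > 0 else -1


def omb_weights(l):
    """Integer weights realising OMB_l as a threshold gate (assumed form).

    Assumed definition: OMB_l(x) = -1 iff sum_i (-1)^(i+1) 2^i (1 + x_i) >= 0.5.
    Every value of that sum is a multiple of 4, so the condition is equivalent
    to 1 - 2 * sum < 0, which has integer coefficients.
    """
    c = [(-1) ** (i + 1) * 2 ** i for i in range(1, l + 1)]
    return [1 - 2 * sum(c)] + [-2 * ci for ci in c]


def omb_direct(x):
    """Return -1 iff the largest 1-based index i with x_i = 1 exists and is odd."""
    top = max((i for i, xi in enumerate(x, start=1) if xi == 1), default=0)
    return -1 if top % 2 == 1 else 1


def check_omb(l):
    """Compare the threshold form with the combinatorial form on all 2^l inputs."""
    w = omb_weights(l)
    return all(thr(w, x) == omb_direct(x) for x in product((-1, 1), repeat=l))
```

Calling `check_omb(l)` for small `l` compares the two descriptions on every input; the weights grow like $2^\ell$, which is the "heavy" regime the paper is concerned with.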
Definition 2.2 (Exact threshold functions). A function $f : \{-1,1\}^n \to \{-1,1\}$ is called an exact threshold function if there exist reals $w_1, \ldots, w_n, t$ such that

$$f(x) = -1 \iff \sum_{i=1}^{n} w_i x_i = t.$$

Let ETHR denote the class of exact threshold functions.
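To make Definition 2.2 concrete, here is a small Python sketch checking that AND ◦ XOR is an exact threshold function. It assumes the convention that $-1$ encodes "true", so that XOR of two $\pm 1$ bits is their product; the weights $2^i$ and the helper names are choices made for this illustration, not taken from the paper.

```python
from itertools import product


def and_of_xors(x, y):
    """AND o XOR with -1 read as 'true': -1 iff x_i != y_i for every i."""
    xors = [xi * yi for xi, yi in zip(x, y)]  # product is -1 iff the bits differ
    return -1 if all(v == -1 for v in xors) else 1


def ethr(weights, t, z):
    """Exact threshold gate: -1 iff sum_i w_i * z_i == t, else 1."""
    return -1 if sum(w * zi for w, zi in zip(weights, z)) == t else 1


def equality_weights(n):
    """Weight 2^i on both x_i and y_i; the weighted sum is 0 iff x = -y."""
    powers = [2 ** i for i in range(n)]
    return powers + powers


def check(n):
    """Brute-force comparison over all pairs (x, y) of n-bit inputs."""
    w = equality_weights(n)
    return all(
        ethr(w, 0, x + y) == and_of_xors(x, y)
        for x in product((-1, 1), repeat=n)
        for y in product((-1, 1), repeat=n)
    )
```

The check works because each $x_i + y_i$ lies in $\{-2, 0, 2\}$, and a sum $\sum_i 2^i d_i$ with $d_i \in \{-2, 0, 2\}$ vanishes only when every $d_i$ is zero.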
Hansen and Podolskii [17] showed the following.
Theorem 2.3 (Hansen and Podolskii [17]). If a function $f : \{-1,1\}^n \to \{-1,1\}$ can be represented by a THR ◦ ETHR circuit of size s, then it can be represented by a THR ◦ THR circuit of size 2s.

For the sake of completeness and clarity, we provide the proof below.

Proof. Let f be an exact threshold function with the representation $\sum_{i=1}^{n} w_i x_i = t$. There exists an $\varepsilon_f > 0$ such that $\sum_{i=1}^{n} w_i x_i > t \implies \sum_{i=1}^{n} w_i x_i > t + \varepsilon_f$. Consider a THR ◦ ETHR circuit of size s, say it computes $\mathrm{sgn}\big(c_0 + \sum_{i=1}^{s} c_i f_i\big)$, where the $f_i$'s have exact threshold representations $\sum_{j=1}^{n} w_{i,j} x_j = t_i$, respectively. Consider the THR ◦ THR circuit of size 2s, given by $\mathrm{sgn}\big(c_0 + \sum_{i=1}^{s} c_i (g_{i,2} - g_{i,1} + 1)\big)$, where the $g_{i,1}, g_{i,2}$ are threshold functions with representations as follows.

$$g_{i,1} = 1 \iff \sum_{j=1}^{n} w_{i,j} x_j \ge t_i, \qquad g_{i,2} = 1 \iff \sum_{j=1}^{n} w_{i,j} x_j \ge t_i + \varepsilon_{f_i}.$$

Since $g_{i,2} - g_{i,1} + 1$ equals $-1$ exactly when $\sum_{j} w_{i,j} x_j = t_i$ and equals 1 otherwise, it agrees with $f_i$ on every input, so this circuit computes the same function as the original one.

Remark 2.4.
In fact, Hansen and Podolskii [17] showed that the circuit class
THR ◦ THR is identical to the circuit class
THR ◦ ETHR. However, we do not require the full generality of their result.
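The gate replacement behind Theorem 2.3 can be checked by brute force. The sketch below is only an illustration under simplifying assumptions: weights and thresholds are integers, so the gap can be taken to be `eps = 1`, and the names `thr_geq`, `ethr`, `ethr_via_two_thr` and `check` are hypothetical helpers, not notation from the paper.

```python
from itertools import product


def thr_geq(w, t, x):
    """Threshold gate with integer data: 1 iff sum_j w_j x_j >= t, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= t else -1


def ethr(w, t, x):
    """Exact threshold gate: -1 iff sum_j w_j x_j == t, else 1."""
    return -1 if sum(wj * xj for wj, xj in zip(w, x)) == t else 1


def ethr_via_two_thr(w, t, x, eps=1):
    """Return g_2 - g_1 + 1, with g_1 testing '>= t' and g_2 testing '>= t + eps'."""
    g1 = thr_geq(w, t, x)
    g2 = thr_geq(w, t + eps, x)
    return g2 - g1 + 1


def check(w, t):
    """Compare the exact gate with its two-threshold replacement on all inputs."""
    return all(
        ethr(w, t, x) == ethr_via_two_thr(w, t, x)
        for x in product((-1, 1), repeat=len(w))
    )
```

Because each exact gate becomes an affine combination of two threshold gates, the constant and the two coefficients can be absorbed into the top gate, which is why the size only doubles.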
We now note that any function computable by a THR ◦ OR circuit can be computed by a THR ◦ AND circuit without a blowup in the size.

Lemma 2.5. Suppose $f : \{-1,1\}^n \to \{-1,1\}$ can be computed by a THR ◦ OR circuit of size s. Then, f can be computed by a THR ◦ AND circuit of size s.

Proof. Consider a THR ◦ OR circuit of size s, computing f, say

$$f(x) = \mathrm{sgn}\Big(\sum_{i=1}^{s} w_i \bigvee_{j \in S_i} x_j\Big).$$

Note that

$$\sum_{i=1}^{s} w_i \bigvee_{j \in S_i} x_j = \sum_{i=1}^{s} -w_i \bigwedge_{j \in S_i} x_j^{c},$$

where $x_j^c$ denotes the negation of $x_j$. Thus, $\mathrm{sgn}\big(\sum_{i=1}^{s} -w_i \bigwedge_{j \in S_i} x_j^{c}\big)$ is a THR ◦ AND representation of f, of size s.

Definition 2.6 (OR polynomials). Define a function $p : \{-1,1\}^n \to \mathbb{R}$ of the form $p(x) = \sum_{S \subseteq [n]} a_S \bigvee_{i \in S} x_i$ to be an OR polynomial. Define the weight of p to be $\sum_{S \subseteq [n]} |a_S|$, and its degree to be $\max_{S \subseteq [n]} \{|S| : a_S \neq 0\}$.

Define the sign rank of a real matrix $A = [A_{ij}]$, denoted by $sr(A)$, to be the least rank of a matrix $B = [B_{ij}]$ such that $A_{ij} B_{ij} > 0$ for all $(i, j)$ such that $A_{ij} \neq 0$. Forster [12] proved the following relation between the sign rank of a $\{\pm 1\}$ valued matrix and its spectral norm.

Theorem 2.7 (Forster [12]). Let $A = [A_{xy}]_{x \in X, y \in Y}$ be a $\{\pm 1\}$ valued matrix. Then,

$$sr(A) \ge \frac{\sqrt{|X||Y|}}{\|A\|}.$$

We require the following generalization of Forster's theorem by Razborov and Sherstov [31].

Theorem 2.8 (Razborov and Sherstov [31]). Let $A = [A_{xy}]_{x \in X, y \in Y}$ be a real valued matrix with $s = |X||Y|$ entries, such that $A \neq 0$. For arbitrary parameters $h, \gamma > 0$, if all but h of the entries of A satisfy $|A_{xy}| \ge \gamma$, then

$$sr(A) \ge \frac{\gamma s}{\|A\| \sqrt{s} + \gamma h}.$$

The following lemma from Forster et al. [13] tells us that functions that have efficient THR ◦ MAJ representations have low sign rank.
Lemma 2.9 (Forster et al. [13]). Let $f : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$ be a boolean function computed by a THR ◦ MAJ circuit of size s. Then,

$$sr(M_f) \le sn,$$

where $M_f$ denotes the communication matrix of f.

For the purpose of this paper, we abuse notation, and use $sr(f)$ and $sr(M_f)$ interchangeably, to denote the sign rank of $M_f$.

In the model of communication we consider, two players, say Alice and Bob, are given inputs $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$ for some finite input sets $\mathcal{X}, \mathcal{Y}$. They are given access to private randomness and they wish to compute a given function $f : \mathcal{X} \times \mathcal{Y} \to \{-1,1\}$. We will use $\mathcal{X} = \mathcal{Y} = \{-1,1\}^n$ for the purposes of this paper. Alice and Bob communicate using a randomized protocol which has been agreed upon in advance. The cost of the protocol is the maximum number of bits communicated on the worst case input and randomness. A protocol Π computes f with advantage ε if the probability of f agreeing with Π is at least $1/2 + \varepsilon$ for all inputs. We denote the cost of the best such protocol by $R_\varepsilon(f)$. Note here that we deviate from the notation used in [25], for example. Unbounded error communication complexity was introduced by Paturi and Simon [28], and is defined as follows.

$$\mathsf{UPP}(f) = \min_{\varepsilon} \big(R_\varepsilon(f)\big).$$

This measure gives rise to the following natural communication complexity class, as argued by Babai et al. [2].
Definition 2.10.
$\mathsf{UPP}^{cc} \equiv \{f : \mathsf{UPP}(f) = \mathrm{polylog}(n)\}$.

Paturi and Simon [28] showed an equivalence between $\mathsf{UPP}(f)$ and the sign rank of $M_f$, where $M_f$ denotes the communication matrix of f.

Theorem 2.11 (Paturi and Simon [28]). For any function $f : \{-1,1\}^n \times \{-1,1\}^n \to \{-1,1\}$,

$$\mathsf{UPP}(f) = \log sr(M_f) \pm O(1).$$

The following lemma characterizes the spectral norm of the communication matrix of XOR functions.

Lemma 2.12 (Folklore). Let $f : \{-1,1\}^n \to \mathbb{R}$ be any real valued function and let M denote the communication matrix of f ◦ XOR. Then,

$$\|M\| = 2^n \cdot \max_{S \subseteq [n]} \big|\widehat{f}(S)\big|.$$

Finally, we require the following well-known lemma by Minsky and Papert [26].
Lemma 2.13 (Minsky and Papert [26]). Let $p : \{-1,1\}^n \to \mathbb{R}$ be any symmetric real polynomial of degree d. Then, there exists a univariate polynomial q of degree at most d, such that for all $x \in \{-1,1\}^n$, $p(x) = q(\#1(x))$, where $\#1(x) = |\{i \in [n] : x_i = 1\}|$.

$\mathsf{OMB}_l \circ \mathsf{OR}_m$

For notational convenience, denote $g = \mathsf{OMB}_l$, $f = g \circ \bigvee_m$. Let $n = lm$. We first use an idea from Krause and Pudlák [24] which enables us to work with g, rather than $g \circ \bigvee_m$.

Lemma 3.1.
Let f = g l ◦ W m : {− , } ml → {− , } , ∆ ∈ R , e x ≥ ∀ x ∈ X , where X denotes the set of all inputs x in {− , } ml such that W m ( x ) = − l , and let p be a realpolynomial such that ∀ x ∈ {− , } ml , f ( x ) p ( x ) ≥ ∆ , ∀ x ∈ {− , } ml such that _ m ( x ) = − l , f ( x ) p ( x ) ≥ ∆ + e x . Then, there exists an OR polynomial p ′ , of weight at most wt ( p ′ ) , such that ∀ y ∈ {− , } l , p ′ ( y ) g ( y ) ≥ wt ( p ) (cid:0) ∆ − l · − m (cid:1) g ( − l ) p ′ ( − l ) ≥ wt ( p ) (cid:18) ∆ − l · − m + P x ∈ X e x | X | (cid:19) . Proof.
For any y ∈ {− , } l , denote by E y [ f ( x )] the expected value of f ( x ) with respect tothe uniform distribution over all x ∈ {− , } ml such that W m ( x ) = y . For each I k ⊆ [ l ] × [ m ],define J k ⊆ [ l ] to be the projection of I k on [ l ]. Formally, i ∈ J k ⇐⇒ ∃ j, x i,j ∈ I k . Note that for any y ∈ {− , } l , E y [ f ( x ) p ( x )] = g ( y ) · E y [ p ( x )] ≥ ∆and E − l [ f ( x ) p ( x )] = g ( − l ) · E − l [ p ( x )] ≥ ∆ + P x ∈ X e x | X | . OR function. The following argument appearsin the proof of Lemma 2.3 in [24]. However, we reproduce the proof below for clarity andcompleteness.First observe that for all y ∈ {− , } l , and for all x satisfying W m ( x ) = y , the monomialcorresponding to I k equals M ( i,j ) ∈ I k x i,j = M ( i,j ) ∈ I k ,y i = − x i,j . Let A = { j ∈ [ l ] : y j = − } . If A ∩ J k = ∅ , then E y M ( i,j ) ∈ I k x i,j = _ j ∈ J k y j = 1Else, let B = A ∩ J k . In this case, W j ∈ J k y j = −
1. Also, E y M ( i,j ) ∈ I k x i,j = E x ∈{− , } A ∩ Jk : W ( x )= − | A ∩ Jk | M ( i,j ) ∈ I k ,y i = − x i,j (1)Note that E x ∈{− , } A ∩ Jk M ( i,j ) ∈ I k ,y i = − x i,j = 0 (2)Denote | A ∩ J k | = q . Using Equation 2 and a simple counting argument, the absolute valueof the RHS (and thus the LHS) of Equation 1 can be upper bounded as follows. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E y " L ( i,j ) ∈ I k x i,j ≤ mq − (2 m − q (2 m − q ≤ q · mq − m mq ≤ l − m Hence, for all y ∈ {− , } l , we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E y M ( i,j ) ∈ I k x i,j − − _ j ∈ J k y j (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ l − m . (3)Say p = v + P k v k p k , where p k ( x ) = ⊕ ( i,j ) ∈ I k x i,j is the unique multilinear expansionof p . Define p ′ = v − P k v k − X k v k _ j ∈ J k y j . Note that wt ( p ′ ) = wt v − P k v k − X k v k _ j ∈ J k y j = (cid:12)(cid:12)(cid:12)(cid:12) v − P k v k (cid:12)(cid:12)(cid:12)(cid:12) + X k (cid:12)(cid:12)(cid:12) v k (cid:12)(cid:12)(cid:12) ≤ wt ( p ) . y ∈ {− , } l , g ( y ) · p ′ ( y ) ≥ ∆ − wt ( p ) (cid:0) l · − m (cid:1) and g ( − l ) · p ′ ( − l ) ≥ ∆ + P x ∈ X e x | X | − wt ( p ) (cid:0) l · − m (cid:1) Next, we use random restrictions which reduces the degree of the approximating OR polynomial, at the cost of a small error. Lemma 3.2.
Let g l = OMB : {− , } l → {− , } , f = g l ◦ W m , and ∆ , { e x ≥ x ∈ X } (where X is defined as in Lemma 3.1), and p be a real polynomial such that ( ∀ x ∈ {− , } ml , f ( x ) p ( x ) ≥ ∆ ∀ x ∈ {− , } ml such that W m ( x ) = − l , p ( x ) ≥ ∆ + e x . Then, for any integer d > , there exists an OR polynomial p ′′ , of degree d and weight atmost wt ( p ) , such thatFor all y ∈ {− , } l/ , p ′′ ( y ) g l/ ( y ) ≥ ∆ − wt ( p ) (cid:16) l · − m + 2 − ( d − (cid:17) and p ′′ ( − l/ ) ≥ ∆ + P x ∈ X e x | X | − wt ( p ) (cid:16) l · − m + 2 − ( d − (cid:17) . Proof.
Lemma 3.1 guarantees the existence of an OR polynomial p ′ , of weight at most wt ( p ),such that ∀ y ∈ {− , } l , p ′ ( y ) g ( y ) ≥ ∆ − wt ( p ) (cid:0) l · − m (cid:1) p ′ ( − l ) ≥ ∆ + P x ∈ X e x | X | − wt ( p ) (cid:0) l · − m (cid:1) . Now, set each of the l variables to − /
2, and leave it unset withprobability 1 /
2. Call this random restriction r . Any OR monomial of degree at least d getsfixed to − − − d . Thus, by linearity of expectation, the expected weightof surviving monomials of degree at least d in p ′ is at most wt ( p ) · − d . Let M | r denote thevalue of a monomial M after the restriction r . By Markov’s inequality,Pr r X M :deg( M | r ) ≥ d wt ( M | r ) > wt ( p ) · − d +1 < / l/ { ( x i , x i +1 ) : i ∈ [ l/ } (assume w.l.o.g that l is even). Forany pair, the probability that both of its variables remain unset is 1 /
4. This probability isindependent over pairs. Thus, by a Chernoff bound, the probability that at most l/
16 pairsremain unset is at most 2 − l . 11y a union bound, there exists a setting of variables such that at least l/
16 pairs ofvariables are left free, and the weight of degree ≥ d monomials in p ′ is at most wt ( p ) · − d +1 .Set the remaining 7 l/ −
1. After the restriction, drop the monomialsof degree ≥ d from p ′ to obtain p ′′ , which is now an OR polynomial of degree less than d and weight at most wt ( p ). Note that the function g l hit with this restriction just becomes g l/ .Thus,For all y ∈ {− , } l/ , p ′′ ( y ) g l/ ( y ) ≥ ∆ − wt ( p ) (cid:16) l · − m + 2 − ( d − (cid:17) and p ′′ ( − l/ ) ≥ ∆ + P x ∈ X e x | X | − wt ( p ) (cid:16) l · − m + 2 − ( d − (cid:17) . OMB In this section, we show that approximating
OMB by a low weight polynomial p mustimply that the degree of p is large.We require the following result by Ehlich and Zeller [11] and Rivlin and Cheney [32]. Lemma 3.3 ([11, 32]) . The following holds true for any real valued α > and k > . Let p be a univariate polynomial of degree d < p k/ , such that p (0) ≥ α , and p ( i ) ≤ for all i ∈ [ k ] . Then, there exists i ∈ [ k ] such that p ( i ) < − α . We next use the idea of ‘doubling’ for the
OMB function, as in [3, 6] to show that alow degree polynomial of bounded weight cannot represent OMB well. This is our mainapproximation theoretic lemma. Lemma 3.4.
Suppose p is a polynomial of degree d < p n/ and a > , b ∈ R be realssuch that OMB ( − n ) ≥ a and OMB ( x ) p ( x ) ≥ b for all x ∈ {− , } n . Then, for all i ∈ { , , . . . , ⌊ n/ d ⌋} , there exists an x i ∈ {− , } n (not necessarily distinct) such that | p ( x i ) | ≥ i a + (cid:0) · i − (cid:1) b . The argument will be an iterative one, inspired by the arguments of Beigel and Buhrmanet al. [3, 6].
Claim 3.5. If a and b are reals such that a > , b ∈ R and i a + (cid:0) · i − (cid:1) b < for some i ≥ , then j a + (cid:0) · j − (cid:1) b < for all j > i .Proof. Note that since a > i a + (cid:0) · i − (cid:1) b < b must be negative. For any j > i ,write 2 j a + (cid:0) · j − (cid:1) b = 2 j − i (cid:0) i a + (cid:0) · i − (cid:1) b (cid:1) + 3 · (2 j − i +1 − b < Proof of Lemma 3.4.
We will assume, for the rest of the proof, that2 i a + (cid:0) · i − (cid:1) b ≥ ∀ i ∈ (cid:2) ⌊ n/ d ⌋ (cid:3) . (4)If not, the lemma is trivially true by Claim 3.5.Divide the variables into ⌊ n/ d ⌋ contiguous blocks of size 10 d each. Induction hypothesis:
For each i ∈ { , . . . , ⌊ n/ d ⌋} , there exists an input x i ∈{− , } n such that 12 x ij = − i th block. • The values of x ij for indices j to the left of the i th block are set as dictated by theprevious step. • | p ( x ) | ≥ i a + (cid:0) · i − (cid:1) b . • The value of p ( x ) is negative if i is odd, and positive if i is even.We now prove the induction hypothesis. • Base case:
Say i = 1. We know from our assumption that OMB ( − n ) ≥ a and OMB ( x ) p ( x ) ≥ b for all x ∈ {− , } n . Set the variables corresponding to the evenindices in the first block to −
1, and all variables to the right of the first block to − y , . . . , y d . Define a polynomial p : {− , } d → R by p ( y ) = E σ ∈ S d ˜ p ( σ ( y )), where ˜ p ( y ) denotes the value of p on input y , . . . , y d ,and the remaining variables are set as described earlier. The expectation is over theuniform distribution. Note that p is a symmetric polynomial of degree at most d ,and satisfies p ( − d ) ≥ a, p ( y ) ≤ − b ∀ y = − d . By Lemma 2.13, there exists a univariate polynomial p ′ such that for all i ∈ { }∪ [5 d ], p ′ ( i ) = p ( y ) ∀ y such that y ) = i Thus, p ′ (0) ≥ a, p ′ ( j ) ≤ − b ∀ j ∈ [5 d ] . Define p ′′ = p ′ + b . Thus, p ′′ (0) ≥ a + b ≥
0, and p ′′ ( j ) ≤ ∀ j ∈ [5 d ].By Lemma 3.3, there exists a j ∈ [5 d ] such that p ′′ ( j ) < − a − b . This means p ′ ( j ) < − a − b <
0, because of Equation 4. This implies existence of an x ∈ {− , } n (withall variables to the right of the first block still set to −
1) such that p ( x ) < − a − b . • Inductive step:
In the $i$th block, set the variables corresponding to the even indices to $-1$ if $i$ is odd, and set the odd indexed variables to $-1$ if $i$ is even. Set the variables outside the $i$th block as dictated by the previous step. Assume that $i$ is odd (the argument for even integers $i$ follows in a similar fashion, with suitable sign changes). Denote the free variables by $y_1, \dots, y_{5d}$. Define a polynomial $p_i : \{-1,1\}^{5d} \to \mathbb{R}$ by $p_i(y) = \mathbb{E}_{\sigma \in S_{5d}}\, \tilde{p}(\sigma(y))$, where $\tilde{p}(y)$ denotes the value of $p$ on input $y_1, \dots, y_{5d}$, and the remaining variables are set as described earlier. The expectation is over the uniform distribution. Note that $p_i$ is a symmetric polynomial of degree at most $d$, and satisfies

$$p_i(-1^{5d}) \ge 2^i a + (3 \cdot 2^i - 3)\, b, \qquad p_i(y) \le -b \quad \forall y \ne -1^{5d}.$$

By Lemma 2.13, there exists a univariate polynomial $p'_i$ such that for all $j \in \{0\} \cup [5d]$, $p'_i(j) = p_i(y)$ for all $y$ such that $\#1(y) = j$. Thus,

$$p'_i(0) \ge 2^i a + (3 \cdot 2^i - 3)\, b, \qquad p'_i(j) \le -b \quad \forall j \in [5d].$$

Define $p''_i = p'_i + b$. Thus,

$$p''_i(0) \ge 2^i a + (3 \cdot 2^i - 2)\, b \ge 0, \qquad p''_i(j) \le 0 \quad \forall j \in [5d].$$

By Lemma 3.3, there exists a $j \in [5d]$ such that $p''_i(j) \le -2^{i+1} a - (3 \cdot 2^{i+1} - 4)\, b$, and hence $p'_i(j) \le -2^{i+1} a - (3 \cdot 2^{i+1} - 3)\, b$, by Equation 4. This implies the existence of an $x$ in $\{-1,1\}^n$ (with all variables to the right of the $i$th block still set to $-1$, and the variables preceding the $i$th block set as dictated by the previous step) such that $p(x) < -2^{i+1} a - (3 \cdot 2^{i+1} - 3)\, b$.

In this section, we prove our lower bounds. We first use linear programming duality to obtain a sufficient approximation theoretic condition on $f$ for showing that the sign rank of $f \circ \mathrm{XOR}$ is large. Let $\delta > 0$ and let $X$ be any subset of $\{-1,1\}^n$.

(LP1)
Variables: $\varepsilon$, $\{\mu(x) : x \in \{-1,1\}^n\}$
Minimize $\varepsilon$ subject to

$$\begin{aligned}
\Big| \sum_x \mu(x) f(x) \chi_S(x) \Big| &\le \varepsilon && \forall S \subseteq [n] \\
\sum_x \mu(x) &= 1 \\
\varepsilon &\ge 0 \\
\mu(x) &\ge \frac{\delta}{2^n} && \forall x \in X
\end{aligned}$$

The first two constraints above specify that the correlation of $f$ against all parities needs to be small with respect to a distribution $\mu$. The last constraint enforces the fact that $\mu$ is '$\delta$-smooth' over the set $X$. As indicated in Section 1.1, these constraints make analyzing the LP challenging.

Standard manipulations (as in [9], for example) and strong linear programming duality reveal that the optimum of the above linear program equals the optimum of the following program.

(LP2)
Variables: $\Delta$, $\{\alpha_S : S \subseteq [n]\}$, $\{\xi_x : x \in X\}$
Maximize $\Delta + \frac{\delta}{2^n} \sum_{x \in X} \xi_x$ subject to

$$\begin{aligned}
f(x) \sum_{S \subseteq [n]} \alpha_S \chi_S(x) &\ge \Delta && \forall x \in \{-1,1\}^n \\
f(x) \sum_{S \subseteq [n]} \alpha_S \chi_S(x) &\ge \Delta + \xi_x && \forall x \in X \\
\sum_{S \subseteq [n]} |\alpha_S| &\le 1 \\
\Delta &\in \mathbb{R} \\
\alpha_S &\in \mathbb{R} && \forall S \subseteq [n] \\
\xi_x &\ge 0 && \forall x \in X
\end{aligned}$$

For each $x$ over the smooth set, the dual polynomial has to better the worst margin by at least $\xi_x$. If OPT is large, then on average the dual polynomial did significantly better than the worst margin. Below is the main technical result of this section, which says that no such dual polynomial exists, even when the smoothness parameter $\delta$ is as high as $1/4$.

Theorem 4.1.
Let $f = \mathrm{OMB}_l \circ \bigvee_{l^{1/3} - \log l} : \{-1,1\}^{l^{4/3} - l \log l} \to \{-1,1\}$, $\delta = 1/4$ and $X = \{x \in \{-1,1\}^{l^{4/3} - l \log l} : \bigvee(x) = -1^l\}$. Then the optimal value, OPT, of (LP2) is at most − l / .

Proof. Let $p$ be a polynomial of weight 1, for which (LP2) attains its optimum. Denote the values taken by the variables at the optimum by $\Delta_{\mathrm{OPT}}$, $\{\xi_{x,\mathrm{OPT}} : x \in X\}$. Towards a contradiction, assume OPT ≥ − l / .

Lemma 3.2 (set $m = l^{1/3} - \log l$) shows the existence of an OR polynomial $p'$ on l/ y ∈ {− , } l/ , p ′ ( y ) OMB ( y ) ≥ ∆ OPT − · − l / − · − l / and p ′ ( − l/ ) ≥ ∆ + P x ∈ X ξ x, OPT | X | − · − l / − · − l / . Note that
OPT ≥ − l / = ⇒ ∆ OPT ≥ − l / − δ P x ∈ X ξ x, OPT n (5) p ′ satisfies the assumptions of Lemma 3.4 with d = deg( p ′ ) = l / < p l/
32 (since any OR polynomial of degree d can be represented by a polynomial of degree at most d ), a =∆ OPT + P x ∈ X ξ x, OPT | X | − · − l / , and b = ∆ OPT − · − l / . a = ∆ OPT + P x ∈ X ξ x, OPT | X | − · − l / ≥ − l / − · − l / ≥ . Let us denote k = l / /
80 for the remaining of this proof. Thus, by Lemma 3.4, thereexists an x ∈ {− , } l/ such that | p ′ ( x ) | ≥ k a + (cid:16) · k − (cid:17) b ≥ ∆ OPT (4 · k −
3) + 2 k P x ∈ X ξ x, OPT | X | − · − k (4 · k − ≥ ( · k − ) ( − l / − δ P x ∈ X ξ x, OPT n ) + 2 k P x ∈ X ξ x, OPT | X | − · − k (4 · k − ≥ ( · k − ) ( − k/ − · − k ) Since δ = 1 / > k > p ′ was a polynomial of weight at most 1.

Theorem 4.2. Let $f = \mathrm{OMB}_l \circ \bigvee_{l^{1/3} - \log l} : \{-1,1\}^{l^{4/3} - l \log l} \to \{-1,1\}$. Then, sr ( f ◦ XOR ) ≥ l / − l

Proof.
Let $n = l^{4/3} - l \log l$. Theorem 4.1 tells us that the optimum of (LP2) (and hence (LP1), by duality) is at most 2 − l / , when $f = \mathrm{OMB} \circ \bigvee_{l^{1/3} - \log l}$. We first estimate the size of $X^c$. The probability (over the uniform distribution on the inputs) of a particular OR gate firing a 1 is $1/2^{l^{1/3} - \log l}$. By a union bound, the probability of any OR gate firing a 1 is at most $l^2 / 2^{l^{1/3}}$, and hence $|X^c| \le 2^n \cdot l^2 / 2^{l^{1/3}}$. By Lemma 2.12 and Theorem 2.8, sr ( f ◦ XOR ) ≥ sr ( f µ ◦ XOR ) ≥ δ n n OPT · n + δ n · h ≥ / − l / + | X c | n ≥ / − l / + l · − l / ≥ l / − l f on n input variables such that for large enough n , sr ( f ◦ XOR ) ≥ n / − log n

Corollary 4.3.
Let $f = \mathrm{OMB}_l \circ \bigvee_{l^{1/3} - \log l} : \{-1,1\}^{l^{4/3} - l \log l} \to \{-1,1\}$, and let $n = l^{4/3} - l \log l$ denote the number of input variables. Then UPP ( f ◦ XOR ) ≥ n / − 32 log n − .

Proof.
It follows from Theorem 4.2 and Theorem 2.11.

We now prove Theorem 1.1, which gives us a lower bound on the size of THR ◦ MAJ circuits computing $\mathrm{OMB} \circ \bigvee_{l^{1/3} - \log l} \circ \mathrm{XOR}$.

Proof of Theorem 1.1. Suppose $\mathrm{OMB} \circ \bigvee_{l^{1/3} - \log l} \circ \mathrm{XOR}$ could be represented by a THR ◦ MAJ circuit of size $s$. Let $n = 2l^{4/3} - 2l \log l$. By Lemma 2.9 and Theorem 4.2, s ( l / − l log l ) ≥ sr ( f ) ≥ l / − l . Thus, s ≥ l / − log l 16 = $2^{\Omega(n^{1/4})}$.

We now prove Corollary 1.2, which separates THR ◦ MAJ from THR ◦ THR.

Proof of Corollary 1.2. Let $n = 2l^{4/3} - 2l \log l$. By Lemma 2.5, $f = \mathrm{OMB} \circ \bigvee_{l^{1/3} - \log l} \circ \mathrm{XOR}$ can be computed by a THR ◦ AND ◦ XOR circuit of size $n$. Hence $f \in$ THR ◦ ETHR = THR ◦ THR, by Theorem 2.3. By Theorem 1.1,
THR ◦ MAJ circuits computing $f$ require size $2^{\Omega(n^{1/4})}$.

This work refines our understanding of depth-2 threshold circuits by providing the following summary:

$$\widehat{LT}_1 \subsetneq LT_1 \subsetneq \widehat{LT}_2 = \mathrm{MAJ} \circ \mathrm{THR} \subsetneq \mathrm{THR} \circ \mathrm{MAJ} \subsetneq LT_2 \subseteq \widehat{LT}_3 \subseteq \mathrm{NP/poly}$$

While we cannot rule out that SAT has efficient THR ◦ THR circuits, we do not even know whether IP is in $LT_2$. On the other hand, the most powerful method used to prove lower bounds on the size of depth-2 threshold circuits for computing an explicit function $f$ exploits the fact that $f$ has large sign rank. Before our work, it was not known if $LT_2$ contained any function of large sign rank. Our main result shows that indeed there are such functions, answering a question explicitly raised by Hansen and Podolskii [17] and Amano and Maruoka [1].

The central open question in the area is to prove super-polynomial lower bounds on the size of THR ◦ THR circuits. The best known explicit lower bound, due to Kane and Williams [22], is roughly $n^{3/2}$. We feel that there is a dire need for new techniques for proving strong lower bounds against THR ◦ THR circuits.
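As a sanity check on the amplification that drives Theorem 4.1, the following minimal Python sketch iterates the per-block step from the proof of Lemma 3.4 and compares it with the closed form $2^k a + (3 \cdot 2^k - 3)b$. It rests on two assumptions that should be checked against the lemma statements: each block shifts the bound by $b$, doubles it (the gain read off from how Lemma 3.3 is applied), and shifts by $b$ again. The names `amplify` and `closed_form` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def amplify(a, b, blocks):
    """Iterate the per-block step v -> 2 * (v + b) + b, starting from v = a.

    Assumption: the shift by b (p'' = p' + b), the factor-2 gain, and the
    shift back by b model one block of the argument in Lemma 3.4.
    """
    v = a
    for _ in range(blocks):
        v = 2 * (v + b) + b
    return v


def closed_form(a, b, blocks):
    """Closed form 2^k * a + (3 * 2^k - 3) * b for k = blocks."""
    return 2 ** blocks * a + (3 * 2 ** blocks - 3) * b


if __name__ == "__main__":
    a, b = Fraction(1, 7), Fraction(2, 5)
    for k in range(12):
        # The iteration and the closed form agree exactly.
        assert amplify(a, b, k) == closed_form(a, b, k)

    # With a = D + s and b = D, the coefficient of D is 4 * 2^k - 3,
    # the factor that multiplies Delta_OPT in the proof of Theorem 4.1.
    D, s, k = Fraction(3, 11), Fraction(5, 13), 6
    assert closed_form(D + s, D, k) == (4 * 2 ** k - 3) * D + 2 ** k * s
    print("recurrence and closed form agree")
```

Exact rationals are used so that the comparison is an identity rather than a floating-point approximation.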
Acknowledgements
We are grateful to Kristoffer Hansen for bringing to our attention the question of separating the classes THR ◦ MAJ and THR ◦ THR at the summer school on lower bounds, held in Prague in 2015. We also thank Michal Koucký for organizing and inviting us to the workshop.
References

[1] Kazuyuki Amano and Akira Maruoka. Complexity of depth-2 circuits with threshold gates. In 30th International Symposium on Mathematical Foundations of Computer Science (MFCS), pages 107–118, 2005.
[2] László Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory (preliminary version). In , pages 337–347, 1986.
[3] Richard Beigel. Perceptrons, PP, and the polynomial hierarchy. Computational Complexity, 4:339–349, 1994.
[4] Richard Beigel and Jun Tarui. On ACC. Computational Complexity, 4:340–366, 1994.
[5] Jehoshua Bruck. Harmonic analysis of polynomial threshold functions. SIAM J. Discrete Math., 3(2):168–177, 1990.
[6] Harry Buhrman, Nikolay Vereshchagin, and Ronald de Wolf. On computation and communication with small bias. In Proceedings of the Twenty-Second Annual IEEE Conference on Computational Complexity, CCC '07, pages 24–32. IEEE Computer Society, 2007.
[7] Mark Bun and Justin Thaler. Improved bounds on the sign-rank of AC$^0$. In , pages 37:1–37:14, 2016.
[8] A. K. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. SIAM J. Computing, 13:423–439, 1984.
[9] Arkadev Chattopadhyay and Nikhil S. Mande. Dual polynomials and communication complexity of XOR functions. arXiv, 2017.
[10] Ruiwen Chen, Rahul Santhanam, and Srikanth Srinivasan. Average-case lower bounds and satisfiability algorithms for small threshold circuits. In , pages 1:1–1:35, 2016.
[11] Hartmut Ehlich and Karl Zeller. Schwankung von Polynomen zwischen Gitterpunkten. Mathematische Zeitschrift, 86(1):41–44, 1964.
[12] Jürgen Forster. A linear lower bound on the unbounded error probabilistic communication complexity. In Proceedings of the 16th Annual IEEE Conference on Computational Complexity, Chicago, Illinois, USA, June 18-21, 2001, pages 100–106, 2001.
[13] Jürgen Forster, Matthias Krause, Satyanarayana V. Lokam, Rustam Mubarakzjanov, Niels Schmitt, and Hans Ulrich Simon. Relations between communication complexity, linear arrangements, and computational complexity. In FST TCS 2001: Foundations of Software Technology and Theoretical Computer Science, 21st Conference, Bangalore, India, December 13-15, 2001, Proceedings, pages 171–182, 2001.
[14] Mikael Goldmann, Johan Håstad, and Alexander A. Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, 2:277–300, 1992.
[15] Mikael Goldmann and Marek Karpinski. Simulating threshold circuits by majority circuits. SIAM J. Comput., 27(1):230–246, 1998.
[16] A. Hajnal, W. Maass, P. Pudlák, M. Szegedy, and G. Turán. Threshold circuits of bounded depth. J. Comput. Syst. Sci., 46(2):129–154, 1993.
[17] Kristoffer Arnsfelt Hansen and Vladimir V. Podolskii. Exact threshold circuits. In Proceedings of the 25th Annual IEEE Conference on Computational Complexity, CCC 2010, Cambridge, Massachusetts, June 9-12, 2010, pages 270–279, 2010.
[18] Kristoffer Arnsfelt Hansen and Vladimir V. Podolskii. Polynomial threshold functions and boolean threshold circuits. Inf. Comput., 240:56–73, 2015.
[19] Johan Håstad. On the size of weights for threshold gates. SIAM J. Discrete Math., 7(3):484–492, 1994.
[20] Johan Håstad and Mikael Goldmann. On the power of small-depth threshold circuits. Computational Complexity, 1:113–129, 1991.
[21] Thomas Hofmeister. A note on the simulation of exponential threshold weights. In Computing and Combinatorics, Second Annual International Conference, COCOON '96, Hong Kong, June 17-19, 1996, Proceedings, pages 136–141, 1996.
[22] Daniel M. Kane and Ryan Williams. Super-linear gate and super-quadratic wire lower bounds for depth-two and depth-three threshold circuits. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 633–643, 2016.
[23] Matthias Krause and Pavel Pudlák. On the computational power of depth-2 circuits with threshold and modulo gates. Theor. Comput. Sci., 174(1-2):137–156, 1997.
[24] Matthias Krause and Pavel Pudlák. Computing boolean functions by polynomials and threshold circuits. Computational Complexity, 7(4):346–370, 1998.
[25] Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, 1997.
[26] Marvin Minsky and Seymour Papert. Perceptrons - an introduction to computational geometry. MIT Press, 1987.
[27] S. Muroga. Threshold Logic and its Applications. Wiley-Interscience, 1971.
[28] Ramamohan Paturi and Janos Simon. Probabilistic communication complexity. J. Comput. Syst. Sci., 33(1):106–123, 1986.
[29] N. Pippenger. The complexity of computations by networks. IBM J. Res. Develop., 31:235–243, 1987.
[30] Alexander A. Razborov. On small depth threshold circuits. In Third Scandinavian Workshop on Algorithm Theory (SWAT), pages 42–52, 1992.
[31] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of AC$^0$. SIAM J. Comput., 39(5):1833–1855, 2010.
[32] Theodore J. Rivlin and Elliott W. Cheney. A comparison of uniform approximations on an interval and a finite subset thereof. SIAM Journal on Numerical Analysis, 3(2):311–320, 1966.
[33] Alexander A. Sherstov. The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011.
[34] K. I. Siu and J. Bruck. On the power of threshold circuits with small weights. SIAM J. Discrete Math., 4(3):423–435, 1991.
[35] Andrew Chi-Chih Yao. On ACC and threshold circuits. In