[PDF] Near-optimal small-depth lower bounds for small distance connectivity

Abstract

We show that any depth- d circuit for determining whether an n -node graph has an s -to- t path of length at most k must have size n Ω( k 1/d /d) . The previous best circuit size lower bounds for this problem were n k exp(−O(d)) (due to Beame, Impagliazzo, and Pitassi [BIP98]) and n Ω((logk)/d) (following from a recent formula size lower bound of Rossman [Ros14]). Our lower bound is quite close to optimal, since a simple construction gives depth- d circuits of size n O( k 2/d ) for this problem (and strengthening our bound even to n k Ω(1/d) would require proving that undirected connectivity is not in N C 1 . ) Our proof is by reduction to a new lower bound on the size of small-depth circuits computing a skewed variant of the "Sipser functions" that have played an important role in classical circuit lower bounds [Sip83, Yao85, Hås86]. A key ingredient in our proof of the required lower bound for these Sipser-like functions is the use of \emph{random projections}, an extension of random restrictions which were recently employed in [RST15]. Random projections allow us to obtain sharper quantitative bounds while employing simpler arguments, both conceptually and technically, than in the previous works [Ajt89, BPU92, BIP98, Ros14].

Full PDF

NNear-optimal small-depth lower boundsfor small distance connectivity

Xi Chen ∗ Columbia University Igor C. Oliveira † Columbia University Rocco A. Servedio ‡ Columbia UniversityLi-Yang Tan § Toyota Technological InstituteSeptember 25, 2015

Abstract

We show that any depth- d circuit for determining whether an n -node graph has an s -to- t pathof length at most k must have size n Ω( k /d /d ) . The previous best circuit size lower bounds forthis problem were n k exp( − O ( d )) (due to Beame, Impagliazzo, and Pitassi [BIP98]) and n Ω((log k ) /d ) (following from a recent formula size lower bound of Rossman [Ros14]). Our lower bound isquite close to optimal, since a simple construction gives depth- d circuits of size n O ( k /d ) for thisproblem (and strengthening our bound even to n k Ω(1 /d ) would require proving that undirectedconnectivity is not in NC . )Our proof is by reduction to a new lower bound on the size of small-depth circuits computinga skewed variant of the “Sipser functions” that have played an important role in classical circuitlower bounds [Sip83, Yao85, H˚as86]. A key ingredient in our proof of the required lower bound forthese Sipser-like functions is the use of random projections , an extension of random restrictionswhich were recently employed in [RST15]. Random projections allow us to obtain sharperquantitative bounds while employing simpler arguments, both conceptually and technically,than in the previous works [Ajt89, BPU92, BIP98, Ros14]. ∗ [email protected]. Supported in part by NSF grants CCF-1149257 and CCF-1423100. † [email protected]. ‡ [email protected]. Supported in part by NSF grants CCF-1319788 and CCF-1420349. § [email protected]. Part of this research was done while visiting Columbia University. a r X i v : . [ c s . CC ] S e p Introduction

Graph connectivity problems are of great interest in theoretical computer science, both from analgorithmic and a computational complexity perspective. The “ st -connectivity,” or STCONN ,problem — given an n -node graph G with two distinguished vertices s and t , is there a path ofedges from s to t ? — plays a particularly central role. One longstanding question is whetherany improvement is possible on Savitch’s O ((log n ) )-space algorithm [Sav70], based on “repeatedsquaring,” for the directed STCONN problem; since this problem is complete for NL , any suchimprovement would show that NL ⊆ SPACE ( o (log n )), and hence would have a profound impacton our understanding of non-deterministic space complexity. Wigderson’s survey [Wig92] providesa now somewhat old, but still very useful, overview of early results on connectivity problems.In this paper we consider the “small distance connectivity” problem STCONN ( k ( n )) which isdeﬁned as follows. The input is the adjacency matrix of an undirected n -vertex graph G which hastwo distinguished vertices s and t , and the problem is to determine whether G contains a path oflength at most k ( n ) from s to t . We study this problem from the perspective of small-depth circuitcomplexity; for a given depth d (which may depend on k ), we are interested in the size of unboundedfan-in depth- d circuits of AND, OR, and NOT gates that compute STCONN ( k ( n )). (As severalauthors [BIP98, Ros14] have observed, the directed and undirected versions of the STCONN ( k ( n ))problems are essentially equivalent via a simple reduction that converts a directed graph into alayered undirected graph; for simplicity we focus on the undirected problem in this paper.)An impetus for this study comes from the above-mentioned question about Savitch’s algorithm.As noted by Wigderson [Wig92], a simple reduction shows that if Savitch’s algorithm is optimal,then for all k polynomial-size unbounded fan-in circuits for STCONN ( k ( n )) must have depthΩ(log k ). By giving lower bounds on the size of small-depth circuits for STCONN ( k ( n )), Beame,Impagliazzo, and Pitassi [BIP98] have shown that depth Ω(log log k ) is required for k ( n ) ≤ log n ,and more recently Rossman [Ros14] has shown that depth Ω(log k ) is required for k ( n ) ≤ log log n .These bounds for restricted ranges of k motivate further study of the circuit complexity of small-depth circuits for STCONN ( k ( n )) . Below we give a more thorough discussion of both upper andlower bounds for this problem, before presenting our new results.

Upper bounds (folklore).

A natural approach to obtain eﬃcient circuits for

STCONN ( k ( n )) isby repeated squaring of the input adjacency matrix. If x i,j is the input variable that takes value1 if edge { i, j } is present in the input graph, then the graph contains a path of length at most 2from i to j if and only if the depth-2 circuit (cid:87) nk =1 ( x i,k ∧ x k,j ) is satisﬁed (assuming that x i,i = 1 forevery i ). Iterating this construction yields a circuit of size poly( n ) and depth 2 log k that computes STCONN ( k ( n )), whenever k is a power of two.For smaller depths, a natural extension of this approach leads to the following construction.Let G be the input graph. For every pair of nodes u, v in G , check by exhaustive search for pathsof length at most t = k /d connecting these nodes. (We assume that k /d is an integer in orderto avoid unnecessary technical details.) Note that this can be done simultaneously for every pairof nodes by a (multi-output) depth-2 OR-of-ANDs circuit of size n t + O (1) . Let G be a new graphthat has an edge between u and v if and only if a path of length at most t connects these nodes.In general, if we start with G and repeat this procedure d times, we obtain a sequence of graphs G , G , . . . , G d for which the following holds: G i has an edge between nodes u and v if and only1f they are connected by a path of length at most t i in the initial graph G . In particular, thisconstruction provides a circuit of depth 2 d and size n k /d + O (1) that computes STCONN ( k ( n )).Summarizing this discussion, it follows that for all k ≤ n and d ≤ log k , STCONN ( k ( n )) can becomputed by depth-2 d circuits of size n O ( k /d ) , or equivalently by depth- d circuits of size n O ( k /d ) . Lower bounds.

Furst, Saxe, and Sipser [FSS84] were the ﬁrst to show that

STCONN def = STCONN ( n ) / ∈ AC via a reduction from their lower bound against small-depth circuits computingthe parity function. By the same reduction, H˚astad’s subsequent optimal lower bound againstparity [H˚as86] implies that depth- d circuits computing STCONN ( k ( n )) must have size 2 Ω( k / ( d +1) ) ;in particular, for k ( n ) = (log n ) ω (1) polynomial-size circuits computing STCONN ( k ( n )) must havedepth d = Ω(log k/ log log n ). Note, however, that this is not a useful bound for small distanceconnectivity, since when k ( n ) = o (log n ) the 2 Ω( k / ( d +1) ) lower bound is less than n and hence trivial.Ajtai [Ajt89] was the ﬁrst to show that STCONN ( k ( n )) / ∈ AC for all k ( n ) = ω n (1); however, hisproof did not yield an explicit circuit size lower bound. His approach was further analyzed and sim-pliﬁed by Bellantoni, Pitassi, and Urquhart [BPU92], who showed that this technique gives a (barelysuper-polynomial) n Ω(log ( d +3) k ) lower bound on the size of depth- d circuits for STCONN ( k ( n )),where log ( i ) denotes the i -times iterated logarithm. This implies that polynomial-size circuits com-puting STCONN ( k ( n )) must have depth Ω(log ∗ k ).Beame, Impagliazzo, and Pitassi [BIP98] gave a signiﬁcant quantitative strengthening of Ajtai’sresult in the regime where k ( n ) is not too large. For k ( n ) ≤ log n , they showed that any depth- d circuit for STCONN ( k ( n )) must have size n Ω( k φ − d/ ) , where φ = ( √ / STCONN ( k ( n ))require depth Ω(log log k ) (and as noted above, the [BIP98] lower bound only holds for k ( n ) ≤ log n ).Beame et al. asked whether this Ω(log log k ) could be improved to Ω(log k ), which is optimal by theupper bound sketched above. This was achieved recently by Rossman [Ros14], who showed thatfor k ( n ) ≤ log log n , polynomial-size circuits for STCONN ( k ( n )) require depth Ω(log k ). In moredetail, he showed that for k ( n ) ≤ log log n and d ( n ) ≤ log n/ (log log n ) O (1) , depth- d formulas for STCONN ( k ( n )) require size n Ω(log k ) . By the trivial relation between formulas and circuits (everycircuit of size S and depth d is computed by a formula of size S d and depth d ), this implies thatfor such k ( n ) and d ( n ), depth- d circuits for STCONN ( k ( n )) require size n Ω((log k ) /d ) . While thisanswers the question of Beame et al., the n Ω((log k ) /d ) circuit size bound that follows from Rossman’sformula size bound is signiﬁcantly smaller than the n Ω( k φ − d/ ) circuit size bound of [BIP98] when d is small. Furthermore, Rossman’s result only holds for k ( n ) ≤ log log n whereas [BIP98]’s holdsfor k ( n ) ≤ log n (and ideally we would like a lower bound for all distances k ( n ) ≤ n ). Our main result is a near-optimal lower bound for the small-depth circuit size of

STCONN ( k ( n ))for all distances k ( n ) ≤ n . We prove the following: Theorem 1.

For any k ( n ) ≤ n / and any d = d ( n ) , any depth- d circuit computing STCONN ( k ( n )) must have size n Ω( k /d /d ) . Furthermore, for any k ( n ) ≤ n and any d = d ( n ) , any depth- d circuitcomputing STCONN ( k ( n )) must have size n Ω( k / d /d ) . k ’sImplicit in [H˚as86] 2 Ω( k / ( d +1) ) Ω(log k/ log log n ) All k [Ajt89, BPU92] n Ω(log ( d +3) k ) Ω(log ∗ k ) All k [BIP98] n Ω( k φ − d/ ) Ω(log log k ) k ≤ log n [Ros14] n Ω((log k ) /d ) Ω(log k ) k ≤ log log n Folklore upper bound n O ( k /d ) k All k This work n Ω( k /d /d ) Ω(log k/ log log k ) k ≤ n / n Ω( k / d /d ) All k Table 1: Previous work and our results on the size of depth- d circuits for STCONN ( k ( n )). Thecolumn “Range of k ’s” indicates the values of k for which the lower bound is proved to hold.Our lower bound is very close to the best possible, given the n O ( k /d ) upper bound. Indeed,strengthening our theorem to n k Ω(1 /d ) for all values of k and d would imply a breakthrough in circuitcomplexity, showing that unbounded fan-in circuits of depth o (log n ) computing STCONN musthave super-polynomial size. Since every function in NC can be computed by unbounded fan-incircuits of polynomial size and depth O (log n/ log log n ) (see e.g. [KPPY84]), such a strengtheningwould yield an unconditional proof that STCONN / ∈ NC .Comparing to previous work, our n Ω( k /d /d ) lower bound subsumes the main n Ω( k φ − d/ ) lowerbound result of Beame et al. [BIP98] for all depths d , and improves the n Ω((log k ) /d ) circuit size lowerbound that follows from Rossman’s formula size lower bound [Ros14] except when d is quite closeto log k (speciﬁcally, except when Ω(log k/ log log k ) ≤ d ≤ O (log k )). For large distances k ( n ) forwhich the results of [BIP98, Ros14] do not apply (i.e. k ( n ) = ω (log n )), our lower bound subsumesthe 2 Ω( k / ( d +1) ) lower bound that is implied by [H˚as86] for all distances k ( n ) ≤ n / and depths d ,and it subsumes the subsequent n Ω(log ( d +3) k ) lower bound of [Ajt89, BPU92] for all distances k anddepths d .Another perspective on Theorem 1 is that it implies that polynomial-size circuits require depthΩ(log k/ log log k ) to compute STCONN ( k ( n )) for all distances k ( n ) ≤ n . While Rossman’s resultsgive Ω(log k ), they hold only for the signiﬁcantly restricted range k ( n ) ≤ log log n . (And indeed, asnoted above a lower bound of Ω(log k ) for all k ( n ) would imply that STCONN / ∈ NC .) Previous state-of-the-art results on this problem employed rather sophisticated arguments andinvolved machinery. Beame et al. [BIP98] (as well as the earlier works of [Ajt89, BPU92]) obtainedtheir lower bounds by considering the

STCONN ( k ( n )) problem on layered graphs of permutations,i.e., graphs with k + 1 layers of n vertices per layer in which the induced graph between adjacentlayers is a perfect bipartite matching. They developed a special-purpose “connectivity switchinglemma” that bounds the depth of specialized decision trees for randomly-restricted layered graphs.Rossman [Ros14] considered random subgraphs of the “complete k -layered graph” (with k +1 layers3f n vertices and kn edges) where each edge is independently present with probability 1 /n . At theheart of his proof is an intricate notion of “pathset complexity,” which roughly speaking measuresthe minimum cost of constructing a set of paths via the operations of union and relational join,subject to certain “density constraints.”In contrast, we feel that our approach is both conceptually and technically simple. Instead ofworking with layered permutation graphs or random subgraphs of the complete layered graph, weconsider a class of series-parallel graphs that are obtained in a straightforward way (see Section 3)from a skewed variant of the “Sipser functions” that have played an important role in the classicalcircuit lower bounds of Sipser [Sip83], Yao [Yao85], and H˚astad [H˚as86]. Brieﬂy, for every d ∈ N ,the d -th Sipser function Sipser d is a read-once monotone formula with d alternating layers of ANDand OR gates of fan-in w , where w ∈ N is an asymptotic parameter that tends to ∞ (and so Sipser d computes an n = w d variable function). Building on the work of Sipser and Yao, H˚astad used theSipser functions to prove an optimal depth-hierarchy theorem for circuits, showing that for every d ∈ N , any depth- d circuit computing Sipser d +1 must have size exp( n Ω(1 /d ) ).The skewed variant of the Sipser functions that we use to prove our near-optimal lower boundsfor STCONN ( k ( n )) is as follows. For every d ∈ N and 2 ≤ u ≤ w , the d -th u -skewed Sipserfunction , denoted SkewedSipser u,d , is essentially

Sipser d +1 but with the AND gates having fan-in u rather than w (see Section 3 for a precise deﬁnition; as we will see, the number of levels of ANDgates is the key parameter for SkewedSipser , which is why we write

SkewedSipser u,d to denote the n -variable formula that has d levels of AND gates and d + 1 levels of OR gates.) Via a simplereduction given in Section 3, we show that to get lower bounds for depth- d circuits computing STCONN ( u d ) on n -node graphs, it suﬃces to prove that depth- d circuits for SkewedSipser u,d musthave large size. Under this reduction the fan-in of the AND gates is directly related to the length of(potential) paths between s and t . This is why we must use a skewed variant of the Sipser functionin order to obtain lower bounds for small distance connectivity. We remark that even the case u = 2 is interesting and can be used to get the n Ω( k /d /d ) lower bound of Theorem 1 for k up toroughly 2 √ log n . Allowing a range of values for u enables us to get the lower bound for k up to n / (as stated in Theorem 1).Our main technical result of the paper is a lower bound for SkewedSipser u,d , a formula of depth2 d + 1 over n = n ( u, w, d ) = u d w d +33 / variables (for technical reasons we use a smaller fan-in forthe ﬁrst layer of OR gates next to the inputs). Theorem 2.

Let d ( w ) ≥ and ≤ u ( w ) ≤ w / , where w → ∞ . Then any depth- d circuitcomputing SkewedSipser u,d has size at least w Ω( u ) = n Ω( u/d ) . Observe that setting u = k /d this size lower bound is n Ω( k /d /d ) , and therefore we indeed obtainthe lower bound for STCONN ( k ( n )) stated in Theorem 1 as a corollary. As we point out in Section6 (Remark 18), the lower bound given in Theorem 2 for SkewedSipser is essentially optimal.Though they are superﬁcially similar, Theorem 2 and H˚astad’s depth hierarchy theorem diﬀerin two important respects. Both result from our goal of using Theorem 2 to get lower bounds forsmall distance connectivity, and both pose signiﬁcant challenges in extending H˚astad’s proof:1. H˚astad showed that depth- d unbounded fan-in circuits require large size to compute a singlehighly symmetric “hard function,” namely Sipser d +1 . In contrast, toward our goal of under-standing the depth- d circuit size of STCONN ( k ( n )) for all values of k = k ( n ) and d = d ( n ), The exact deﬁnition of the function used in [H˚as86] diﬀers slightly from our description for some technical reasonswhich are not important here.

4e seek lower bounds on the size of depth- d unbounded fan-in circuits computing any oneof a spectrum of asymmetric hard functions, namely SkewedSipser u,d for all u := k /d (withstronger quantitative bounds as k and u get larger).2. To get the strongest possible result in his depth hierarchy theorem, H˚astad (like Yao andSipser) was primarily focused on lower bounding the size of circuits of depth exactly one lessthan Sipser d +1 . In contrast, since in our framework our goal is to lower bound the size ofdepth- d circuits computing SkewedSipser u,d (corresponding to

STCONN ( k ( n )) with k = u d )which has depth 2 d + 1, we are interested in the size of circuits of depth (roughly) half thatof our hard function SkewedSipser u,d .In Section 2 we recall the high-level structure of H˚astad’s proof of his depth hierarchy theorem(based on the method of random restrictions), highlight the issues that arise due to each of the twodiﬀerences above, and describe how our techniques — speciﬁcally, the method of random projections — allow us to prove Theorem 2 in a clean and simple manner.

H˚astad’s depth hierarchy theorem and its proof.

Recall that H˚astad’s depth hierarchytheorem shows that

Sipser d +1 cannot be computed by any circuit C of depth d and size exp( n O (1 /d ) ) . The main idea is to design a sequence of random restrictions {R (cid:96) } ≤ (cid:96) ≤ d satisfying two competingrequirements: • Circuit C collapses. The randomly restricted circuit C (cid:22) ρ ( d ) · · · ρ (2) , where ρ ( (cid:96) ) ← R (cid:96) for2 ≤ (cid:96) ≤ d , collapses to a “simple function” with high probability. This is shown via iterativeapplications of a switching lemma for the R (cid:96) ’s, where each application shows that with highprobability a random restriction ρ ( (cid:96) ) decreases the depth of the circuit C (cid:22) ρ ( d ) · · · ρ ( (cid:96) +1) byat least one. The upshot is that while C is a size- S depth- d circuit, C (cid:22) ρ ( d ) · · · ρ (2) collapsesto a small-depth decision tree (i.e. a “simple function”) with high probability. • Hard function

Sipser d +1 retains structure. In contrast with the circuit C , the hardfunction Sipser d +1 is “resilient” against the random restrictions ρ ( (cid:96) ) ← R (cid:96) . In particular,each random restriction ρ ( (cid:96) ) simpliﬁes Sipser only by one layer, and so

Sipser d +1 (cid:22) ρ ( d ) · · · ρ ( (cid:96) ) contains Sipser (cid:96) as a subfunction with high probability. Therefore, with high probability.

Sipser d +1 (cid:22) ρ ( d ) · · · ρ (2) still contains Sipser as a subfunction, and hence is a “well-structuredfunction” which cannot be computed by a small-depth decision tree.We remind the reader that to satisfy these competing demands, the random restrictions {R (cid:96) } de-vised by H˚astad speciﬁcally for his depth hierarchy theorem are not the “usual” random restrictionswhere each coordinate is independently kept alive with probability p ∈ (0 , Sipser does not retain structure under these random re-strictions). Likewise, the switching lemma for the R (cid:96) ’s is not the “standard” switching lemma(which H˚astad used to obtain his optimal lower bounds against the parity function). Instead, atthe heart of H˚astad’s proof are new random restrictions {R (cid:96) } ≤ (cid:96) ≤ d designed to satisfy both require-ments above: the coordinates of R (cid:96) are carefully correlated so that Sipser (cid:96) +1 retains structure, and5˚astad proved a special-purpose switching lemma showing that C collapses under these carefullytailored new random restrictions. Issues that arise in our setting.

At a technical level (related to point (1) described atthe end of Section 1), H˚astad’s special-purpose switching lemma is not useful for analyzing our

SkewedSipser u,d formulas for most values of u = k /d of interest, since they have a “ﬁne structure”that is destroyed by his too-powerful random restrictions. His switching lemma establishes that anyDNF of width n O (1 /d ) collapses to a small-depth decision tree with high probability when it is hit bya random restriction ρ ( (cid:96) ) ← R (cid:96) . Observe that his hard function Sipser d +1 has DNF-width Ω( √ n ), sohis switching lemma does not apply to it (and indeed as discussed above, hitting Sipser d +1 with hisrandom restriction results in a well-structured function that still contains Sipser d as a subfunctionwith high probability). In contrast, in our setting the hard function SkewedSipser u,d has d levelsof AND gates of fan-in u , and in particular, can be written as a DNF of width u d = k . So for all k = k ( n ) and d = d ( n ) such that k (cid:28) n O (1 /d ) (indeed, this holds for most values of k and d ofinterest), the relevant hard function SkewedSipser u,d collapses to a small-depth decision tree aftera single application of H˚astad’s random restriction.Next (related to point (2)), recall that the formula computing H˚astad’s hard function

Sipser d +1 has a highly regular structure where the fan-ins of all gates — both AND’s and OR’s — are the same.As discussed above, H˚astad employs a random restriction which (with high probability) “peels oﬀ”a single layer of Sipser d +1 and results in a function that contains Sipser d as a subfunction. Dueto their regular structures, Sipser d is dual to Sipser d +1 (more precisely, the bottom-layer depth-2subcircuits of Sipser d are dual to those of Sipser d +1 ), and this allows H˚astad to repeat the sameprocedure d − SkewedSipser u,d formulas where the fan-ins of the AND gates are much less than those of theOR gates. Therefore, in order to reduce to a smaller instance of the same problem, our setuprequires that we peel oﬀ two layers of

SkewedSipser u,d at a time rather than just one as in H˚astad’sargument. To put it another way, while H˚astad’s switching lemma uses a single layer of his hardfunction

Sipser d +1 (i.e. disjoint copies of OR’s/AND’s of fan-in w ) to “trade for” one layer ofdepth reduction in C , our switching lemma will use two layers of our hard function SkewedSipser u,d (i.e. disjoint copies of read-once CNF’s with u = k /d clauses of width w ) to trade for one layer ofdepth reduction in C . Our approach: random projections.

A key technical ingredient in H˚astad’s proof of hisdepth hierarchy theorem — and indeed, in the works of [BIP98, Ros14] on

STCONN ( k ( n )) as well— is the method of random restrictions . In particular, they all employ switching lemmas whichshow that a randomly-restricted small-width DNF collapses to a small-depth decision tree withhigh probability: as mentioned above, H˚astad proved a special-purpose switching lemma for randomrestrictions tailored for the Sipser functions, while Beame et al. developed a “connectivity switchinglemma” for random restrictions of layered permutation graphs, and Rossman used H˚astad’s “usual”switching lemma in conjunction with his pathset complexity machinery.In this paper we work with random projections , a generalization of random restrictions. Given aset of formal variables X = { x , ..., x n } , a restriction ρ either ﬁxes a variable x i (i.e. ρ ( x i ) ∈ { , } )or keeps it alive (i.e. ρ ( x i ) = x i , often denoted by ∗ ). A projection , on the other hand, either ﬁxes x i or maps it to a variable y j from a possibly diﬀerent space of formal variables Y = { y , ..., y m } .Restrictions are therefore a special case of projections where Y ≡ X , and each x i can only beﬁxed or mapped to itself. (See Section 4 for precise deﬁnitions.) Our arguments crucially employprojections in which Y is smaller than X , and where moreover each x i is only mapped to a speciﬁc6lement y j where j depends on i in a carefully designed way that depends on the structure of theformula computing the SkewedSipser function. Such “collisions”, where multiple formal variablesin X are mapped to the same new formal variable y j ∈ Y , play an important role in our approach.Random projections were used in the recent work of Rossman, Servedio, and Tan [RST15], wherethey are the key ingredient enabling that paper’s average-case extension of H˚astad’s worst-casedepth hierarchy theorem. In earlier work, Impagliazzo, Paturi, and Saks [IPS97] used random pro-jections to obtain size-depth tradeoﬀs for threshold circuits, and Impagliazzo and Segerlind [IS01]used them to establish lower bounds against constant-depth Frege systems in proof complexity.Our work provides further evidence for the usefulness of random projections in obtaining stronglower bounds: random projections allow us to obtain sharper quantitative bounds while employingsimpler arguments, both conceptually and technically, than in the previous works [Ajt89, BPU92,BIP98, Ros14] on the small-depth complexity of STCONN ( k ( n )).We remark that although [RST15] and this work both employ random projections to reasonabout the Sipser function (and its skewed variants), the main advantage oﬀered by projections overrestrictions are diﬀerent in the two proofs. In [RST15] the overarching challenge was to establish average-case hardness, and the identiﬁcation of variables was key to obtaining uniform-distributioncorrelation bounds from the composition of highly-correlated random projections. As outlinedabove, in this work a signiﬁcant challenge stems from our goal of understanding the depth- d circuitsize of STCONN ( k ( n )) for all values of k = k ( n ) and d = d ( n ). The added expressiveness of randomprojections over random restrictions is exploited both in the proof of our projection switching lemma(see Section 2.1 below) and in the arguments establishing that our SkewedSipser u,d functions “retainstructure” under our random projections.

Our approach shares the same high-level structure as H˚astad’s depth hierarchy theorem, and isbased on a sequence Ψ of d − • Hard function

SkewedSipser retains structure.

Our random projections are deﬁned withthe hard function

SkewedSipser in mind, and are carefully designed so as to ensure that

SkewedSipser u,d “retains structure” with high probability under their composition Ψ .In more detail, each of the d − Ψ peels oﬀ twolayers of SkewedSipser , and a randomly projected

SkewedSipser u,(cid:96) contains

SkewedSipser u,(cid:96) − as a subfunction with high probability. These individual random projections are simple todescribe: each bottom-layer depth-2 subcircuit of SkewedSipser u,(cid:96) (a read-once CNF with u = k /d clauses of width w ) independently “survives” with probability q ∈ (0 ,

1) and is“killed” with probability 1 − q (where q is a parameter of the restrictions), and – if it survives, all uw variables in the CNF are projected to the same fresh formalvariable (with diﬀerent CNFs mapped to diﬀerent formal variables); – if it is killed, all its variables are ﬁxed according to a random 0-assignment of the CNFchosen uniformly from a particular set of 2 u many 0-assignments.In other words, each bottom-layer depth-2 subcircuit independently simpliﬁes to a fresh formalvariable (with probability q ) or the constant 0 (with probability 1 − q ). With the appropriate7eﬁnition of SkewedSipser and choice of q , it is easy to verify that indeed a randomly projected SkewedSipser u,(cid:96) contains

SkewedSipser u,(cid:96) − as a subfunction with high probability. (For thisto happen, the fanin of the bottom OR gates of SkewedSipser is chosen to be moderatelysmaller than w , the fanin of all other OR gates in SkewedSipser ; see Deﬁnition 3 for details.) • Circuit C collapses. In contrast with

SkewedSipser u,d , any depth- d circuit C of size n O ( u/d ) collapses to a small-depth decision tree under Ψ with high probability. Following the standard“bottom-up” approach to proving lower bounds against small-depth circuits, we establish thisby arguing that each of the individual random projections comprising Ψ “contributes to thesimpliﬁcation” of C by reducing its depth by (at least) one.More precisely, in Section 5 we prove a projection switching lemma , showing that a small-width DNF or CNF “switches” to a small-depth decision tree with high probability underour random projections. (The depth reduction of C follows by applying this lemma to everyone of its bottom-level depth-2 subcircuits.) Recall that the random projection of a depth-2circuit over a set of formal variables X yields a function over a new set of formal variables Y ,and in our case Y is signiﬁcantly smaller than X . In addition to the structural simpliﬁcationthat results from setting variables to constants (as in the switching lemmas of [H˚as86, BIP98,Ros14] for random restrictions ), the proof of our projection switching lemma also exploits theadditional structural simpliﬁcation that results from distinct variables in X being mapped tothe same variable in Y . A restriction over a ﬁnite set of variables A is an element of { , , ∗} A . We deﬁne the composition ρρ (cid:48) of two restrictions ρ, ρ (cid:48) ∈ { , , ∗} A over a set of variables A to be the restriction( ρρ (cid:48) ) α def = (cid:40) ρ α if ρ α (cid:54) = ∗ ρ (cid:48) α otherwise , for all α ∈ A .A DNF is an OR of ANDs (terms) and a

CNF is an AND of ORs (clauses). The width of aDNF (respectively, CNF) is the maximum number of variables that occur in any one of its terms(respectively, clauses).The size of a circuit is its number of gates, and the depth of a circuit is the length of its longestroot-to-leaf path. We count input variables as gates of a circuit (so any circuit for a function thatdepends on all n input variables trivially has size at least n ). We will assume throughout the paperthat circuits are alternating , meaning that every root-to-leaf path alternates between AND gatesand OR gates. We also assume that circuits are layered , meaning that for every gate G , every root-to- G path has the same length. These assumptions are without loss of generality as by a standardconversion (see e.g. the discussion at [Sta]), every depth- d size- S circuit is equivalent to a depth- d alternating layered circuit of size at most poly( S ) (this polynomial increase is oﬀset by the “Ω( · )”notation in the exponent of all of our theorem statements.)8 Lower bounds against

SkewedSipser yield lower bounds for smalldistance connectivity

In this section we deﬁne

SkewedSipser u,d and show that computing this formula on a particular input z is equivalent to solving small-distance connectivity on a certain undirected (multi)graph G ( z ).In a bit more detail, every input z corresponds to a subgraph G ( z ) of a ﬁxed ground graph G thatdepends only on SkewedSipser u,d . (Jumping ahead, we associate each input bit of

SkewedSipser u,d with an edge of its corresponding ground graph G .) Roughly speaking, AND gates translate intosequential paths, while OR gates correspond to parallel paths. After deﬁning SkewedSipser u,d anddescribing this reduction, we give the proof of Theorem 1, assuming Theorem 2.The

SkewedSipser formula is deﬁned in terms of an integer parameter w ; in all our results thisis an asymptotic parameter that approaches + ∞ , and so w should be thought of as “suﬃcientlylarge” throughout the paper. Deﬁnition 3.

For 2 ≤ u ≤ w and d ≥ SkewedSipser u,d is the Boolean function computed by thefollowing monotone read-once formula: • There are 2 d + 1 alternating layers of OR and AND gates, where the top and bottom-layergates are OR gates. (So there are d + 1 layers of OR gates and d layers of AND gates.) • AND gates all have fan-in u . • OR gates all have fan-in w , except bottom-layer OR gates which have fan-in w / ; weassume that w / is an integer throughout the paper. (The most important thing about theconstant 33/100 in the above deﬁnition is that it is less than 1; the particular value 33/100was chosen for technical reasons so that we could get the constant 5 in Theorem 1.)Consequently, SkewedSipser u,d is a Boolean function over n = ( uw ) d w / variables in total. From

SkewedSipser u,d to small-distance connectivity.

There is a natural correspondence be-tween read-once monotone Boolean formulas and series-parallel multigraphs in which each graphhas a special designated “start” node s and a special designated “end” node t . We now describethis correspondence via the inductive structure of read-once monotone Boolean formulas. As weshall see, under this correspondence there is a bijection between the variables of a formula f andthe edges of the graph G ( f ). • If f ( x ) = x is a single variable, then the graph G ( f ) has vertex set V ( f ) = { s, t } and edgeset E ( f ) consisting of a single edge { s, t } . • Let f , . . . , f m be read-once monotone Boolean formulas over disjoint sets of variables, where G ( f i ) is the (multi)graph associated with f i and s i , t i are the start and end nodes of G ( f i ). – If f = AND( f , . . . , f m ): The graph G ( f ) is obtained by identifying t with s , t with s , . . . , and t m − with s m . The start node of G ( f ) is s and the end node is t m . Thusthe vertex set V ( f ) is V ( f ) ∪ · · · ∪ V ( f m ) \ { t , . . . , t m − } and the edge set E ( f ) is themultiset E (cid:48) ( f ) ∪ · · · ∪ E (cid:48) ( f m ), where each E (cid:48) ( f i ) is obtained from E ( f i ) by renamingthe appropriate vertices. 9 ∧ ∧ ∧ ∧∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ ∧ x · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · x s t Figure 1:

A read-once formula f (on the left), which is a fan-in 4 OR of fan-in 3 ANDs of fan-in 2 ORs offan-in 2 ANDs, and the corresponding graph G ( f ) (on the right). – If f = OR( f , . . . , f m ): The graph G ( f ) is obtained by identifying s , . . . , s m all to anew start vertex s and t , . . . , t m all to a new end vertex t . Thus the vertex set V ( f ) is V ( f ) ∪ · · · ∪ V ( f m ) ∪ { s, t } \ { s , . . . , s m , t , . . . , t m } and the edge set E ( f ) is the multiset E (cid:48) ( f ) ∪ · · · ∪ E (cid:48) ( f m ), where again each E (cid:48) ( f i ) is obtained from the corresponding edgeset E ( f i ) by renaming vertices accordingly.Since f is read-once, the number of edges of G ( f ) is precisely the number of variables of f , and thereis a natural correspondence between edges and variables. Figure 1 provides a concrete example ofthis construction. Remark 4.

We note that if f is a read-once monotone Boolean formula in which the bottom-levelgates are AND gates and have fan-in at least two, then G ( f ) is a simple graph and not a multigraph.A simple inductive argument gives the following: Observation 5. If f is a read-once monotone alternating formula with r layers of AND gates offan-ins α , . . . , α r , respectively, then every shortest path from s to t in the graph G ( f ) has lengthexactly α · · · α r . Furthermore, if H is a subgraph of G ( f ) that contains some s -to- t path, then itcontains a path of length α · · · α r . As a corollary, we have:

Observation 6.

Every shortest path from s to t in G ( SkewedSipser u,d ) has length exactly u d . Given a read-once monotone formula f over variables x , . . . , x n and an assignment z ∈ { , } n to the variables x , . . . , x n , we deﬁne the graph G ( f, z ) to be the (spanning) subgraph of G ( f )which has vertex set V ( f, z ) = V ( f ) and edge set E ( f, z ) deﬁned as follows: each edge in E ( f ) ispresent in E ( f, z ) if and only if the corresponding coordinate of z is set to 1. A simple inductiveargument gives the following: Observation 7.

Given a read-once monotone alternating formula with r layers of AND gates offan-ins α , . . . , α r , respectively, and an assignment z ∈ { , } n , the graph G ( f, z ) contains a pathfrom s to t of length α · · · α r if and only if f ( z ) = 1 . From these observations we obtain the following connection between

SkewedSipser u,d and small-distance connectivity, which is key to our lower bound:

Corollary 3.1.

The multigraph G ( SkewedSipser u,d , z ) contains an s -to- t path of length at most u d if and only if SkewedSipser u,d ( z ) = 1 . multigraphs . One way to obtain lower bounds for simple graphs insteadof multigraphs is by extending SkewedSipser u,d with an extra layer of fan-in two AND gates nextto the input variables, then relying on Remark 4. We use this simple observation and Theorem 2to establish Theorem 1.

Theorem 1.

For any k ( n ) ≤ n / and any d = d ( n ) , any depth- d circuit computing STCONN ( k ( n )) must have size n Ω( k /d /d ) . Furthermore, for any k ( n ) ≤ n and any d = d ( n ) , any depth- d circuitcomputing STCONN ( k ( n )) must have size n Ω( k / d /d ) .Proof. We assume that d < k/ log log k and ( k/ /d ≥ d ≥ k/ log log k or ( k/ /d < u = (cid:106) ( k/ /d (cid:107) . Then we have u ≥ u = Ω( k /d ). For convenience, let k = u d ≤ k/ n (cid:48) def = (cid:106) n (cid:107) . (1)Further, let w be the largest positive integer such that( u w ) d w / ≤ n (cid:48) . (2)Observe that, since k ≤ n / and d < k/ log log k , as n → + ∞ we have similarly w → + ∞ .Our choice of w also implies that w satisﬁes u d ( w + 1) d +33 / > n (cid:48) . Let n = ( u w ) d w / . Then from d < k/ log log k and w → + ∞ we have n ≥ u d (cid:18) w + 12 (cid:19) d + = Ω( n/ d ) = ω ( n . ) . Combining this with k ≤ n / and k ≤ k/ k = o ( n / ) and n ≥ k . when n issuﬃciently large.We deﬁne a variant of our SkewedSipser u,d formula so we can rely on Remark 4 and work directlywith simple graphs instead of multigraphs. More precisely, let

SkewedSipser † u ,w ,d be analogous to SkewedSipser u,d with parameters u (AND gate fan-in), w (OR gate fan-in), and d but containingan extra layer of fan-in 2 AND gates at the bottom connected to a new set of input variables. Inother words, this is a depth 2 d + 2 read-once alternating formula with twice the number of inputvariables of our original SkewedSipser formula (each input variable of

SkewedSipser becomes an ANDgate connected to two new fresh variables). Since

SkewedSipser u ,d can be obtained by restricting SkewedSipser † u ,w ,d appropriately (i.e. by setting to 1 a single variable in every new pair of variables)a lower bound on the circuit complexity of SkewedSipser immediately implies the same lower boundfor

SkewedSipser † . 11n order to obtain a lower bound via Theorem 2, we need that w / ≥ u . This is equivalentto having n ≥ u d/ , which follows from d ≥ d ≥ STCONN ( k ( n ))) and n ≥ k . since n ≥ k . > k / / . Consequently, we can apply Theorem 2 to

SkewedSipser u ,d , and it follows from our discussion abovethat any depth- d circuit computing SkewedSipser † u ,w ,d must have size at least n Ω( u /d )0 = n Ω( k /d /d ) . (3)In the rest of the proof we translate (3) into a lower bound for STCONN ( k ( n )). Following theexplanation given above, we consider the simple graph G ( SkewedSipser † ) with appropriate param-eters. Since u d ≤ k/

2, it follows from the same argument used to establish Corollary 3.1 that thegraph G ( SkewedSipser † u ,w ,d , z ) contains an s -to- t path of length at most 2 u d ≤ k if and only if wehave SkewedSipser † u ,w ,d ( z ) = 1. Because G ( SkewedSipser † u ,w ,d ) has no isolated vertices and has n edges, it contains at most 2 n ≤ n (cid:48) ≤ n vertices by (1) and (2). Thus, a circuit C that computes STCONN ( k ( n )) on undirected simple graphs on n vertices can also be used to compute the formula SkewedSipser † u ,w ,d , and (3) yields that C must have size n Ω( k /d /d ) . This completes the ﬁrst part ofTheorem 1.It remains to prove the lower bound for STCONN ( k (cid:48) ( n )) with n / < k (cid:48) ( n ) ≤ n . For this,let k ( n ) def = n / . We have established above that computing STCONN ( k ( n )) on subgraphs of G ( SkewedSipser † u ,w ,d ) using depth- d circuits requires size n Ω( k /d /d ) . However, a subgraph of G ( SkewedSipser † u ,w ,d ) contains an s -to- t path of length at most k ( n ) if and only if it containsa path from s to t of length at most k (cid:48) ( n ) (Observation 5). Consequently, any circuit C thatcomputes STCONN ( k (cid:48) ( n )) on general n -vertex graphs can be used to compute STCONN ( k ( n )) onsubgraphs of G ( SkewedSipser † u ,w ,d ) (by setting some input edges to 0). In particular, C must havesize n Ω( k /d /d ) = n Ω( n / d /d ) = n Ω( k (cid:48) / d /d ) . This completes the second part of Theorem 1.

Remark 8.

It is not hard to see that our reduction in fact also captures other natural graphproblems such as directed k -path (“Is there a directed path of length k ( n ) in G ?”) and directed k -cycle (“Is there a directed cycle of length k ( n ) in G ?”), and hence the lower bounds of Theorem 1apply to these problems as well. This suggests the possibility of similarly obtaining other lowerbounds from (variants of) depth hierarchy theorems for Boolean circuits, and we leave this as anavenue for further investigation. In this section we deﬁne our random projections, which will be crucial in the proof of Theorem 2.First, we introduce notation to manipulate the ﬁrst two layers of

SkewedSipser u,d . Deﬁnition 9.

For 2 ≤ u ≤ w , we deﬁne CNFSipser u to be the Boolean function computed by thefollowing monotone read-once formula: 12 The top gate is an AND gate and the bottom-layer gates are OR gates. • The top AND gate has fanin u . • The bottom-layer OR gates all have fan-in w / .For SkewedSipser u,d and each (cid:96) ∈ [ d + 1], we write OR ( (cid:96) ) to denote an OR gate that is in the (cid:96) -thlevel of OR gates away from the input variables and similarly write AND ( (cid:96) ) to denote an AND gatethat is in the (cid:96) -th level of AND gates away from the input variables. So the root of SkewedSipser u,d is the only OR ( d +1) gate; each AND ( (cid:96) ) gate has u many OR ( (cid:96) ) gates as its inputs; each AND (1) gateof SkewedSipser u,d computes a disjoint copy of

CNFSipser u .Next we introduce an addressing scheme for gates and variables of SkewedSipser u,d . Addressing scheme.

Viewing

SkewedSipser u,d as a tree (with its leaves being variables and therest being AND , OR gates), we index its nodes (gates or variables) by addresses as follows. Theroot (gate) is indexed by ε , the empty string. The j -th child of a node is indexed by the address ofits parent concatenated with j . Thus, the variables of SkewedSipser u,d are indexed by addresses A ( d ) := (cid:110) ( b , a , b , . . . , a d , b d ) : a i ∈ [ u ] , b , . . . , b d − ∈ [ w ] , b d ∈ [ w / ] (cid:111) . Block and section decompositions.

We will refer to the set of uw / addresses of variablesbelow an AND (1) gate as a block , and the set of w / addresses of variables below an OR (1) gateas a section .It will be convenient for us to view the set of all variable addresses A ( d ) as A ( d ) = B ( d ) × A (cid:48) , where B ( d ) = (cid:8) ( b , a , b . . . , a d − , b d − ) : a i ∈ [ u ] , b i ∈ [ w ] (cid:9) and A (cid:48) = [ u ] × [ w / ] . Here B ( d ) can be viewed as the set of addresses of the AND (1) gates of SkewedSipser u,d , and A (cid:48) canbe viewed as the set of variable addresses of CNFSipser u computed by each such gate (following thesame addressing scheme).More formally, for a ﬁxed β ∈ B ( d ) we call the set of addresses A ( d, β ) def = (cid:8) ( β, τ ) : τ ∈ A (cid:48) (cid:9) a block of A ( d ); these are the addresses of variables below the AND (1) gate speciﬁed by β . Thus, A ( d ) is the disjoint union of w ( uw ) d − many blocks, each of cardinality | A (cid:48) | = uw / .For a ﬁxed β ∈ B ( d ) and a ∈ [ u ], we call the set of addresses A ( d, β, a ) def = (cid:8) ( β, a, b ) : b ∈ [ w / ] (cid:9) a section of A ( d ); these are the addresses of variables below the OR (1) gate speciﬁed by ( β, a ).Each block A ( d, β ) is the disjoint union of u many sections, each of cardinality w / .To summarize, the set of addresses of variables A ( d ) can be decomposed into w ( uw ) d − manyblocks A ( d, β ) (corresponding to the AND (1) gates), β ∈ B ( d ), and each such block can be furtherdecomposed into u many sections A ( d, β, a ) (corresponding to its u input OR (1) gates), a ∈ [ u ].13ccordingly we also decompose A (cid:48) , the set of variable addresses of CNFSipser u , into sections A (cid:48) ( a ) def = (cid:8) ( a, b ) : b ∈ [ w / ] (cid:9) , for a ∈ [ u ] . The following fact is trivial given the deﬁnition of

CNFSipser u . (Below and subsequently, weuse “ (cid:37) ” to denote a restriction to the variables of CNFSipser and “ ρ ” to denote a restriction to thevariables of SkewedSipser .) Fact 4.1.

For any a ∈ [ u ] and restriction (cid:37) ∈ { , , ∗} A (cid:48) that sets all variables in the a -th section A (cid:48) ( a ) to , i.e., (cid:37) τ = 0 for all τ ∈ A (cid:48) ( a ) , we have that CNFSipser u (cid:22) (cid:37) ≡ . Now we deﬁne our random projection operator proj ρ ( · ). Deﬁnition 10 (Projection operators) . Given a restriction ρ ∈ { , , ∗} A ( d ) , the projection operatorproj ρ maps a function f : { , } A ( d ) → { , } to a function proj ρ ( f ) : { , } B ( d ) → { , } , where (cid:0) proj ρ ( f ) (cid:1) ( y ) = f ( x ) , where x β,τ def = (cid:40) y β if ρ β,τ = ∗ ρ β,τ if ρ β,τ ∈ { , } .For convenience, we sometimes write proj( f (cid:22) ρ ) instead of proj ρ ( f ). Remark 11.

The following interpretation of the projection operator will come in handy. Given arestriction ρ ∈ { , , ∗} A ( d ) , if f is computed by a circuit C , then proj ρ ( f ) is computed by a circuit C (cid:48) obtained from C by replacing every occurrence of x β,τ by y β if ρ β,τ = ∗ , or by ρ β,τ if ρ β,τ ∈ { , } .The crux of our random projection operator proj ρ ( · ) is then a distribution D ( d ) u over restrictions { , , ∗} A ( d ) to the variables { x β,τ : ( β, τ ) ∈ A ( d ) } , from which ρ is drawn. To this end, we considerthe block decomposition B ( d ) × A (cid:48) of A ( d ), and ρ ← D ( d ) u is obtained by drawing independently, foreach block β ∈ B ( d ), a restriction ρ β from a distribution D u over { , , ∗} A (cid:48) to be deﬁned below. Deﬁnition 12 (Distributions D u and D ( d ) u ) . The distribution D u = D u ( q ) over { , , ∗} A (cid:48) is para-meterized by a probability q ∈ (0 , (cid:37) from D u is generated as follows: • With probability q , output (cid:37) = {∗} A (cid:48) (i.e. the restriction ﬁxes no variables). • Otherwise (with probability 1 − q ), we draw a ← [ u ] (a random section) and z ← { , } (arandom bit) independently and uniformly at random, and output (cid:37) where for each τ ∈ A (cid:48) , (cid:37) τ = (cid:40) z if τ ∈ A (cid:48) ( a )1 − z otherwise.Note that in this case (cid:37) is distributed uniformly among 2 u many binary strings in { , } A (cid:48) .These strings are “section-monochromatic”, with u − − z and the one remaining section a taking entirely the other “rare” value z .As described above, a draw of ρ ∈ { , , ∗} B ( d ) × A (cid:48) from D ( d ) u = D ( d ) u ( q ) is obtained by indepen-dently drawing ρ β ← D u = D u ( q ) for each block β ∈ B ( d ) . The following observation about supp( D ( d ) u ) will be useful for us:14 emark 13. A restriction ρ ∈ { , , ∗} B ( d ) × A (cid:48) is in the support of D ( d ) u iﬀ for every block β ∈ B ( d ), ρ β is either {∗} A (cid:48) , or there exists exactly one section a ∈ [ u ] such that ρ β,τ = 0 if τ ∈ A (cid:48) ( a ) and 1otherwise, or there exists exactly one section a ∈ [ u ] such that ρ β,τ = 1 if τ ∈ A (cid:48) ( a ) and 0 otherwise.Therefore, if T is a term of width at most u − β ∈ B ( d ), the variablesfrom block β that occur in T all occur with the same sign, then T can be satisﬁed by a restrictionin the support of D ( d ) u (i.e., T (cid:22) ρ ≡ ρ ∈ supp( D ( d ) u )). (Note that this crucially uses thefact that T has width at most u −

1, and in particular does not contain variables from all u sectionsof any block β . Also note that the inverse of this is not true, e.g., consider T = x β,τ ∧ ¬ x β,τ (cid:48) with τ and τ (cid:48) from two diﬀerent sections.) Our goal now is to prove the following projection switching lemma for (very) small width DNFs:

Theorem 14 (Projection Switching Lemma) . For ≤ u ≤ w , let F be an r - DNF over the variables { x β,τ } , ( β, τ ) ∈ A ( d ) , where r ≤ u − . Then for all s ≥ and q ∈ (0 , , we have Pr ρ ←D ( d ) u ( q ) (cid:104) proj ρ ( F ) has decision tree depth ≥ s (cid:105) ≤ (cid:18) qru − q (cid:19) s . Notice that while F is an r -DNF over formal variables { x β,τ : ( β, τ ) ∈ A ( d ) } , we will bound thedecision tree depth of proj ρ ( F ), a function over the new formal variables { y β : β ∈ B ( d ) } . Remark 15.

Projections will play a key role in the proof. Consider a term of the form T = x β,τ ∧ ¬ x β,τ (cid:48) for some τ (cid:54) = τ (cid:48) , and suppose our ρ from D ( d ) u is such that ρ β,τ = ρ β,τ (cid:48) = ∗ . In this casewe have T (cid:22) ρ = x β,τ ∧ ¬ x β,τ (cid:48) , i.e., the term survives the restriction ρ , but proj ρ ( T ) = y β ∧ ¬ y β ≡ , i.e., the term is killed by proj ρ . Our proof will crucially leverage simpliﬁcations of this sort. Remark 16.

The parameters of Theorem 14 are quite delicate in the sense that the statementfails to hold for DNFs of width u . To see this, consider SkewedSipser u,d with d = 1, a depth-3formula that can also be written as a u -DNF. Then by Corollary 6.2 (to be introduced in Section6), we have that for ρ ← D (1) u ( q ) with q = w − / , the function proj ρ ( SkewedSipser u, ) containsa w / -way OR as a subfunction — and hence has decision tree depth at least w / — withprobability 1 − o (1). So while the statement of Theorem 14 holds for ( u − u -DNFs when u = o ( w / ) and w → ∞ . Remark 17.

We observe that the conclusion of Theorem 14 still holds if the condition “ F is an r -CNF” replaces “ F is an r -DNF.” This can be shown either by a straightforward adaptation ofour proof, or via a reduction to the DNF case using duality, the invariance of our distribution ofrandom projections under the operation of ﬂipping each bit, and the fact that decision tree depthdoes not change when input variables and output value are negated. Given an r -DNF F over variables { x β,τ : ( β, τ ) ∈ A ( d ) } and a restriction ρ ∈ { , , ∗} A ( d ) , proj ρ ( F )is a function over the new variables { y β : β ∈ B ( d ) } . We assume a ﬁxed but arbitrary ordering onthe terms in F , and the variables within terms. The canonical decision tree CanonicalDT ( F, ρ ) thatcomputes proj ρ ( F ) is deﬁned inductively as follows.15 anonicalDT ( F, ρ ) :0. If proj ρ ( F ) ≡ T be the ﬁrst term in F such that T (cid:22) ρ is non-constant and T (cid:22) ρρ (cid:48) ≡ ρ (cid:48) ∈ supp( D ( d ) u ). We observe that such a term must exist, or the procedure would havehalted at step 0 above and not reached the current step 1.To see this, ﬁrst note that certainly there must exist a term T (cid:48) such that T (cid:48) (cid:22) ρ is non-constantsince otherwise F (cid:22) ρ is constant (and likewise proj ρ ( F )). We furthermore claim that amongthese terms T (cid:48) , there must exist one such that T (cid:48) (cid:22) ρ is satisﬁable by some ρ (cid:48) ∈ supp( D ( d ) u ), i.e. T (cid:48) (cid:22) ρρ (cid:48) ≡

1. To prove this, suppose that each of these terms T (cid:48) satisﬁes that T (cid:48) (cid:22) ρ is non-constant and there exists no restriction ρ (cid:48) ∈ supp( D ( d ) u ) such that T (cid:48) (cid:22) ρρ (cid:48) ≡

1. By Remark 13(and our assumption that r ≤ u − T (cid:48) (cid:22) ρ must contain two literals from the same blockoccurring with opposite signs, i.e., x β,τ and ¬ x β,τ (cid:48) , for some β ∈ B ( d ). In this case, we havethat proj ρ ( T (cid:48) ) contains both y β and ¬ y β and hence proj ρ ( T (cid:48) ) ≡

0. But if each such term T (cid:48) has proj ρ ( T (cid:48) ) ≡

0, then proj ρ ( F ) ≡ η = (cid:8) β ∈ B ( d ) : x β,τ or ¬ x β,τ occurs in T (cid:22) ρ for some τ (cid:9) Our canonical decision tree will then query variables y β , β ∈ η exhaustively, i.e., we grow acomplete binary tree of depth | η | ; we will refer to T as the term of this tree.3. For every assignment π ∈ { , } η to variables y β , β ∈ η (equivalently, every path through thecomplete binary tree of depth | η | ), we recurse on CanonicalDT ( F, ρ ( η (cid:55)→ π )), where we use( η (cid:55)→ π ) ∈ { , , ∗} A ( d ) to denote the following restriction:( η (cid:55)→ π ) β,τ = (cid:40) π β if β ∈ η ∗ otherwise, for all β ∈ B ( d ) and τ ∈ A (cid:48) . (4) Proposition 5.1.

For every ρ ∈ { , , ∗} A ( d ) , we have that CanonicalDT ( F, ρ ) computes proj ρ ( F ) . While

CanonicalDT is well deﬁned for all ρ , we shall mostly be interested in ρ ∈ supp( D ( d ) u ). Let B def = (cid:8) ρ ∈ supp( D ( d ) u ) : decision tree depth of CanonicalDT ( F, ρ ) ≥ s (cid:9) be the set of bad restrictions, To prove Theorem 14, it suﬃces to bound Pr ρ ←D ( d ) u ( q ) [ ρ ∈ B ] , thetotal weight of B under D ( d ) u ( q ). Following Razborov’s strategy (see [Bea95] for more details), wewill construct a map θ : B → { , , ∗} A ( d ) × { , } s × { , } s (log r +1) , with the following two key properties:1. (injection) θ ( ρ ) (cid:54) = θ ( ρ (cid:48) ) for any two distinct restrictions ρ, ρ (cid:48) ∈ B ;16. (weight increase) Let θ ( ρ ) ∈ { , , ∗} A ( d ) denote the ﬁrst component of θ ( ρ ). Then Pr [ ρ = θ ( ρ )] Pr [ ρ = ρ ] ≥ Γ , for all ρ ∈ B , (5)where Γ = ((1 − q ) / qu ) s is “large”.Assuming such a map θ exists (below we describe its construction and prove the two propertiesstated above), Theorem 14 follows from a simple combinatorial argument. Proof of Theorem 14.

Fix a pair O ∈ { , } s × { , } s (1+log r ) and let B O = (cid:8) ρ ∈ B : (cid:0) θ ( ρ ) , θ ( ρ ) (cid:1) = O (cid:9) ⊆ B , where we use θ ( ρ ) and θ ( ρ ) to denote the second and third components of θ ( ρ ), respectively.Then we have that Pr [ ρ ∈ B O ] = (cid:88) ρ ∈B O Pr [ ρ = ρ ] ≤ (1 / Γ) · (cid:88) ρ ∈B O Pr [ ρ = θ ( ρ )] ≤ / Γ . Here the ﬁrst inequality uses (5) and the second inequality uses the property of θ being an injection:we have that θ ( ρ ) (cid:54) = θ ( ρ (cid:48) ) for any two distinct ρ, ρ (cid:48) ∈ B O (recall that θ ( ρ ) = θ ( ρ (cid:48) ) and θ ( ρ ) = θ ( ρ (cid:48) )), and therefore (cid:80) ρ ∈B O Pr [ ρ = θ ( ρ ) ] ≤

1. Summing up over all possible O ’s, we have Pr [ ρ ∈ B ] = (cid:88) O Pr [ ρ ∈ B O ] ≤ s · (2 r ) s · (cid:0) qu/ (1 − q ) (cid:1) s = (cid:0) qru/ (1 − q ) (cid:1) s , and this concludes the proof of Theorem 14.The rest of the section is organized as follows. We construct the map θ in Section 5.3. Then weshow that it is an injection in Section 5.4, by showing that one can decode ρ from θ ( ρ ) uniquely forany ρ ∈ B . Finally we prove the weight increase, i.e., (5) in Section 5.5. Let ρ ∈ B be a bad restriction. Let π ∗ be the lexicographically ﬁrst path of length at least s in thedecision tree CanonicalDT ( F, ρ ) (witnessing the badness of ρ ), and π be its truncation at length s .Then θ ( ρ ) is deﬁned to be binary( π ) ∈ { , } s , the binary representation of π , i.e., π i ∈ { , } isthe evaluation of the i th y -variable along π .Recall that CanonicalDT ( F, ρ ) is composed of a collection of complete binary trees, one for eachrecursive call of

CanonicalDT . Let R , . . . , R s (cid:48) for some 1 ≤ s (cid:48) ≤ s denote the sequence of completebinary trees that π visits, with R sharing the same root as CanonicalDT ( F, ρ ) and π ending in R s (cid:48) .(Here s (cid:48) ≥ s ≥ T i to denote the term of tree R i , for each i ∈ [ s (cid:48) ].For each i ∈ [ s (cid:48) − η i = (cid:8) β ∈ B ( d ) : y β is queried in tree R i (cid:9) , and for the special case of i = s (cid:48) , we let η s (cid:48) = (cid:8) β ∈ B ( d ) : y β is queried in tree R s (cid:48) before the end of π (cid:9) . (6)17or each i ∈ [ s (cid:48) ], π induces a binary string π ( i ) ∈ { , } η i , where π ( i ) β for each β ∈ η i is set to be theevaluation of y β along π (in tree R i ). Note that T i is the i -th term processed by CanonicalDT ( F, ρ )along the bad path π and equivalently, T i is the ﬁrst term processed by CanonicalDT (cid:0)

F, ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − ) (cid:1) , where ( η j (cid:55)→ π ( j ) ) is a restriction deﬁned as in (4). So T i is the ﬁrst term in F such that T i (cid:22) ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − )is non-constant and T i (cid:22) ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − ) ρ (cid:48) ≡ , for some ρ (cid:48) ∈ supp( D ( d ) u ).At a high level, θ ( ρ ) and θ ( ρ ) are deﬁned as follows. The third component θ ( ρ ) = encode( η ) ◦ · · · ◦ encode( η s (cid:48) ) ∈ { , } s (1+log r ) is the concatenation of s (cid:48) binary strings, where each encode( η i ) is a concise representation of η i .In particular, we are able to recover η i given both encode( η i ) and T i . We describe the encoding of η i in Section 5.3.1. For the ﬁrst component we have θ ( ρ ) = ρσ (1) · · · σ ( s (cid:48) ) ∈ { , , ∗} A ( d ) , where each σ ( i ) ∈ { , , ∗} A ( d ) is a restriction and ρσ (1) · · · σ ( s (cid:48) ) is their composition (note that eachof these s (cid:48) +1 restrictions, like the overall composition, belongs to { , , ∗} A ( d ) ). We deﬁne the σ ( i ) ’sin Section 5.3.2. η i Fix an i ∈ [ s (cid:48) ]. Let η i = { β , . . . , β t } for some t ≥

1, with β j ’s ordered lexicographically. It followsfrom the deﬁnition of η i that every β j appears in T i , meaning that either x β,τ or ¬ x β,τ appears in T i for some τ ∈ A (cid:48) .Instead of encoding each β j directly using its binary representation, we use log r bits to encodethe index of the ﬁrst x β, · or ¬ x β, · variable that occurs in T i . Here log r bits suﬃce because T i hasat most r variables. Also recall that we ﬁxed an ordering on the variables of each term, so indices ofvariables in T i are well deﬁned. We let location( β j ) denote the log r bits for β j . We also append itwith one additional bit to indicate whether β j is the last element in η i . More formally, we writeencode( η i ) = location( β ) ◦ ◦ location( β ) ◦ ◦ · · · ◦ location( β t ) ◦ ∈ { , } | η i | (1+log r ) . We summarize properties of θ ( ρ ) below: Proposition 5.2.

Given θ ( ρ ) , one can recover uniquely s (cid:48) and encode( η ) , . . . , encode( η s (cid:48) ) . Fur-thermore, given encode( η i ) and T i for some i ∈ [ s (cid:48) ] , one can recover uniquely η i . .3.2 The σ ( i ) restriction We now deﬁne σ ( i ) for a general i ∈ [ s (cid:48) ]. For ease of notation we deﬁne the restriction ρ ( i −

1) def = ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − ) ∈ { , , ∗} A ( d ) . Note that ρ (0) = ρ . Recalling our CanonicalDT algorithm and the deﬁnition of T i as the i -th termprocessed by CanonicalDT ( F, ρ ), we have that T i is the ﬁrst term in F such that T i (cid:22) ρ ( i − is non-constant and T i (cid:22) ρ ( i − ρ (cid:48) ≡ ρ (cid:48) ∈ supp( D ( d ) u ). Therefore, we have η i = (cid:8) β ∈ B ( d ) : x β,τ or ¬ x β,τ occurs in T i (cid:22) ρ ( i − for some τ ∈ A (cid:48) (cid:9) . We deﬁne σ ( i ) ∈ { , , ∗} B ( d ) × A (cid:48) to be an arbitrary restriction (say the lexicographic ﬁrst underthe ordering 0 ≺ ≺ ∗ ) satisfying the following three properties:1. T i (cid:22) ρ ( i − σ ( i ) (cid:54)≡

0, and2. σ ( i ) ∈ supp( D ( d ) u ), and3. σ ( i ) β ∈ { , } A (cid:48) for all β ∈ η i , and σ ( i ) β = {∗} A (cid:48) for all β / ∈ η i .In words, σ ( i ) is the lexicographic ﬁrst restriction in supp( D ( d ) u ) that completely ﬁxes blocks β ∈ η i ,leaves all other blocks β / ∈ η i free, and ﬁxes the blocks in η i in a way that does not falsify T i (cid:22) ρ ( i − .For 1 ≤ i < s (cid:48) , we recall that η i contains all blocks with variables occurring in T i (cid:22) ρ ( i − , and soproperty (1) above can in fact be stated as T i (cid:22) ρ ( i − σ ( i ) ≡

1. (This is not necessarily true for thespecial case of i = s (cid:48) since η s (cid:48) may only contain a subset of the blocks with variables occurring in T s (cid:48) (cid:22) ρ ( s (cid:48) − ; c.f. (6).)We observe that such a restriction σ ( i ) (one satisfying all three properties above) must exist.As remarked at the start of this subsection, by the deﬁnition of T i there exists a restriction ρ (cid:48) ∈ supp( D ( d ) u ) such that T i (cid:22) ρ ( i − ρ (cid:48) ≡

1. This along with the fact that D ( d ) u is independent acrossblocks implies the existence of a restriction in supp( D ( d ) u ) that ﬁxes exactly the blocks in η i in away that does not falsify T i (cid:22) ρ ( i − .This ﬁnishes the deﬁnition of σ ( i ) . We record the following key properties of σ ( i ) : Proposition 5.3. T i (cid:22) ρ ( i − σ ( i ) ≡ for ≤ i < s (cid:48) , and T s (cid:48) (cid:22) ρ ( s (cid:48) − σ ( s (cid:48) ) (cid:54)≡ . Proposition 5.4.

For every β ∈ η i , we have ρ ( i − β = {∗} A (cid:48) whereas σ ( i ) β ∈ { , } A (cid:48) , and Pr (cid:37) ←D u ( q ) [ (cid:37) = ρ ( i − β ] = q whereas Pr (cid:37) ←D u ( q ) [ (cid:37) = σ ( i ) β ] = 1 − q u . Lemma 5.5.

The map θ : B → { , , ∗} A ( d ) × { , } s × { , } s (log r +1) , where θ ( ρ ) = (cid:16) ρσ (1) · · · σ ( s (cid:48) ) , binary( π ) , encode( η ) ◦ · · · ◦ encode( η s (cid:48) ) (cid:17) , (7) is an injection.

19e will prove Lemma 5.5 by describing a decoder that can recover ρ ∈ B given θ ( ρ ) as in (7).Let σ = σ (1) · · · σ ( s (cid:48) ) . Note that s (cid:48) can be derived from θ ( ρ ). To obtain ρ , it suﬃces to recover thesets η i , by simply replacing ( ρσ ) β,τ with ∗ for all β ∈ η ∪ · · · ∪ η s (cid:48) and all τ ∈ A (cid:48) .To recover η i ’s, we assume inductively that the decoder has recovered the “hybrid” restriction ρ ( i − σ ( i ) · · · σ ( s (cid:48) ) = ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − ) σ ( i ) · · · σ ( s (cid:48) ) and the sets η , . . . , η i − , (8)with the base case i = 1 being ρσ (1) · · · σ ( s (cid:48) ) = θ ( ρ ), which is trivially true by assumption. We willshow below how to decode T i and η i , and then obtain the next “hybrid” restriction ρ ( i ) σ ( i +1) · · · σ ( s (cid:48) ) = ρ ( η (cid:55)→ π (1) ) · · · ( η i (cid:55)→ π ( i ) ) σ ( i +1) · · · σ ( s (cid:48) ) We can recover all s (cid:48) sets η , . . . , η s (cid:48) after repeating this for s (cid:48) times.The following lemma shows how to recover T i , given the “hybrid” restriction in (8). Proposition 5.6.

For ≤ i < s (cid:48) , we have that T i is the ﬁrst term in F such that T i (cid:22) ρ ( i − σ ( i ) · · · σ ( s (cid:48) ) ≡ . For the special case of i = s (cid:48) , we have that T s (cid:48) is the ﬁrst term in F such that T s (cid:48) (cid:22) ρ ( s (cid:48) − σ ( s (cid:48) ) ρ (cid:48)(cid:48) ≡ for some ρ (cid:48)(cid:48) ∈ supp( D ( d ) u ) .Proof. We ﬁrst justify the claim for 1 ≤ i < s (cid:48) . Recall that T i is the ﬁrst term in F such that T i (cid:22) ρ ( i − is non-constant and T i (cid:22) ρ ( i − ρ (cid:48) ≡ ρ (cid:48) ∈ supp( D ( d ) u ). This to-gether with Proposition 5.3 implies that T i is the ﬁrst term in F such that T i (cid:22) ρ ( i − σ ( i ) ≡

1: as σ ( i ) ∈ supp( D ( u ) d ), it follows that ρ ( i − σ ( i ) cannot satisfy any term that occurs before T i in F .For the same reason, T i remains the ﬁrst term in F such that T i (cid:22) ρ ( i − σ ( i ) · · · σ ( s (cid:48) ) ≡ σ ( i +1) , · · · , σ ( s (cid:48) ) ∈ supp( D ( d ) u ) and so is their composition).The argument for i = s (cid:48) is similar. We again recall that T s (cid:48) is the ﬁrst term in F such that T s (cid:48) (cid:22) ρ ( s (cid:48) − is non-constant and T s (cid:48) (cid:22) ρ ( s (cid:48) − ρ (cid:48) ≡ ρ (cid:48) ∈ supp( D ( d ) u ). Since everyterm in T that occurs before T s (cid:48) in F is such that T (cid:22) ρ ( s (cid:48) − ρ (cid:48) (cid:54)≡ ρ (cid:48) ∈ supp( D ( d ) u ), certainly T (cid:22) ρ ( s (cid:48) − σ ( s (cid:48) ) ρ (cid:48)(cid:48) (cid:54)≡ ρ (cid:48)(cid:48) ∈ supp( D ( d ) u ) as well. On the other hand, by Proposition 5.3 wehave that σ ( s (cid:48) ) does not falsify T s (cid:48) (cid:22) ρ ( s (cid:48) − , and so there must exist ρ (cid:48)(cid:48) ∈ supp( D ( d ) u ) such that T s (cid:48) (cid:22) ρ ( s (cid:48) − σ ( s (cid:48) ) ρ (cid:48)(cid:48) ≡

1. This completes the proof.With T i in hand we use encode( η i ) to reconstruct η i by Proposition 5.2. We modify the current“hybrid” restriction ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − ) σ ( i ) · · · σ ( s (cid:48) ) as follows: for each β ∈ η i , set (cid:0) ρ ( η (cid:55)→ π (1) ) · · · ( η i − (cid:55)→ π ( i − ) σ ( i ) · · · σ ( s (cid:48) ) (cid:1) β,τ = π ( i ) β , for all τ ∈ A (cid:48) . The resulting restriction is ρ ( η (cid:55)→ π (1) ) · · · ( η i (cid:55)→ π ( i ) ) σ ( i +1) · · · σ ( s (cid:48) ) as desired.Starting with ρσ and repeating this procedure for s (cid:48) times, we recover all η i ’s and then ρ . Thiscompletes the proof that θ is an injection. 20 .5 Weight increase Recall that ρ and ρσ diﬀer in exactly s many blocks, and furthermore, ρ is {∗} A (cid:48) on all these blockswhereas ρσ belongs to { , } A (cid:48) ∩ supp( D u ) on these blocks. Lemma 5.7.

For any ρ ∈ B and ρσ = θ ( ρ ) , we have Pr [ ρ = ρσ ] Pr [ ρ = ρ ] = (cid:89) blocks β onwhich they diﬀer Pr [ (cid:37) = ( ρσ ) β ] Pr [ (cid:37) = ρ β ] = (cid:18) − q qu (cid:19) s . Proof.

This follows from independence across blocks and Proposition 5.4.

In this section we prove our main technical result, Theorem 2, restated below:

Theorem 2.

There is an absolute constant c > such that the following holds. Let d = d ( w ) and u = u ( w ) satisfy d ≥ and ≤ u ≤ w / . Then for w suﬃciently large, any depth- d circuit computing the SkewedSipser u,d function ( recall that this is a formula of depth d + 1 over n = ( uw ) d w / variables ) must have size at least n c · ( u/d ) . We begin by ﬁrst observing that the claimed n Ω( u/d ) circuit size lower bound is o ( n ), and hencevacuous, if d > u ; thus it suﬃces to prove the claimed bound under the assumption that d ≤ u. Wemake this assumption in the rest of the proof below (see speciﬁcally Corollary 6.2). Of course wecan also assume that d ≥

2, since depth-1 circuits of any size cannot compute

SkewedSipser u, . Inthe proof we set the parameter q to be q = w − / .In Section 6.1 we establish that our target function SkewedSipser u,d retains structure with highprobability under a suitable random projection. In Section 6.2 we repeatedly apply both this resultand our projection switching lemma to prove Theorem 2.

We start with an easy proposition about what happens to

CNFSipser u under a random restrictionfrom D u ( q ). The following is an immediate consequence of Deﬁnition 12 and Fact 4.1: Proposition 6.1.

For (cid:37) ← D u ( q ) , we have that (cid:0) CNFSipser u (cid:22) (cid:37) (cid:1) ≡ (cid:40) CNFSipser u with probability q with probability − q. We obtain the following corollary.

Corollary 6.2.

For every ≤ (cid:96) ≤ d , we have that proj ρ ( SkewedSipser u,(cid:96) ) contains SkewedSipser u,(cid:96) − as a subfunction with probability at least . over a random restriction ρ ← D ( (cid:96) ) u ( q ) . roof. Recall that ρ ← D ( (cid:96) ) u ( q ) is drawn by independently drawing ρ β ← D u ( q ) for each block β ∈ B ( (cid:96) ). We have that proj ρ ( SkewedSipser u,(cid:96) ) contains

SkewedSipser u,(cid:96) − as a subfunction if the fol-lowing holds: for each of the OR (2) gates in SkewedSipser u,(cid:96) , at least w / of the w AND (1) gates(each one corresponding to an independent

CNFSipser u function) that are its children (say at ad-dresses β , . . . , β w ) have ρ β i ∈ {∗} A (cid:48) .By Proposition 6.1, for a given OR (2) gate, the expected number of β i ’s beneath it that have ρ β i ∈ {∗} A (cid:48) is qw = w / . So a multiplicative Chernoﬀ bound shows that at least w / of the β i ’s beneath it have ρ β i ∈ {∗} A (cid:48) except with failure probability at most e − w / / . By aunion bound over the (at most n ) OR (2) gates in SkewedSipser u,(cid:96) , we have that the overall failureprobability is at most n · e − w / / . Since n = u d w d + ≤ w d + ≤ w u + ≤ w w / + (cid:28) . · e w / / , the proof is complete. (In the above we used u ≤ w / for the ﬁrst inequality, d ≤ u for thesecond, u ≤ w / again for the third, and w being suﬃciently large for the last.) Most of the proof is devoted to showing that the required size for a depth- d circuit that computes SkewedSipser u,d is at least S def = 0 . u q ) u − . (9)We prove (9) by contradiction; so assume there is a depth- d circuit C of size at most S thatcomputes SkewedSipser u,d . As noted in Section 2.2 we assume that C is alternating and leveled.We “get the argument oﬀ the ground” by ﬁrst hitting both SkewedSipser u,d and C with proj ρ ( · )for ρ ← D ( d ) u ( q ), where q = w − / . (By Remark 17, we can apply our projection switchinglemma, Theorem 14, both to r -DNFs and r -CNFs.) Applying Theorem 14 (with r = 1 and s = u −

1) to each of the gates at distance 1 from the inputs in C , we have that the resultingcircuit proj ρ ( C ) has depth d , bottom fan-in u −

1, and at most S gates at distance at least 2 fromthe inputs with failure probability at most S · (16 qu ) u − < .

1. On the other hand, taking (cid:96) = d in Corollary 6.2 we have that proj ρ ( SkewedSipser u,d ) contains

SkewedSipser u,d − as a subfunctionwith failure probability at most 0 .

1. By a union bound, with probability at least 0 .

8, a draw of ρ ← D ( d ) u ( q ) satisﬁes both of the above, and we ﬁx any such restriction κ ( d ) ∈ supp( D ( d ) u ( q )). Afurther deterministic “trimming” restriction (by only setting certain variables to 0; note that thiscan only simplify proj ρ ( d ) ( C ) further) causes the target proj ρ ( d ) ( SkewedSipser u,d ) to become exactly

SkewedSipser u,d − . Let us write C d to denote the resulting simpliﬁed version of the original circuit C after the combined “project-and-trim”. As C is supposed to compute SkewedSipser u,d , C d mustcompute SkewedSipser u,d − .Next, we consider what happens to SkewedSipser u,d − and C d if we hit them both with proj ρ ( · )for ρ ← D ( d − u ( q ). Applying Theorem 14 (with r = s = u −

1) to each of the gates at distance 2from the inputs and taking a union bound, the resulting circuit proj ρ ( C d ) has depth ( d − In this initial application we view C as having an extra layer of gates of fan-in 1 next to the input variables, sowe have a valid application of Theorem 14 with r = u − ≥ Note that proj ρ ( C ) may have a large number of gates at distance 1 from the inputs but it suﬃces for our purposeto bound the number of gates at distance at least 2 from the inputs. u −

1, and at most S gates at distance at least 2 from the inputs with failure probabilityat most S · (16 ruq ) u − < S · (16 qu ) u − ≤ .

1. On the other hand, taking (cid:96) = d − SkewedSipser u,d − and we have that proj ρ ( SkewedSipser u,d − ) contains SkewedSipser u,d − as a subfunction with failure probability at most 0 .

1. Once again by a unionbound, with probability at least 0 . ρ ← D ( d − u ( q ) satisﬁes both of the above, and weﬁx any such restriction κ ( d − ∈ supp( D d − u ( q )). As before we perform a deterministic trimmingrestriction that causes the target proj κ ( d − ( SkewedSipser u,d − ) to become exactly SkewedSipser u,d − and we let C d − be the resulting simpliﬁed version of C d after the combined project-and-trim. As C d computes SkewedSipser u,d − we have that C d − must compute SkewedSipser u,d − .Repeating the argument above, each time taking r = s = u − κ ( d − ∈ supp( D ( d − u ( q )) , . . . , κ (1) ∈ supp( D (1) u ( q )) and their resulting circuits C d − , . . . , C such that • Hard function retains structure.

For 1 ≤ (cid:96) ≤ d −

2, proj κ ( (cid:96) ) ( SkewedSipser u,(cid:96) ) contains

SkewedSipser u,(cid:96) − as a subfunction, and hence there exists a deterministic trimming restrictionthat results in proj κ ( (cid:96) ) ( SkewedSipser u,(cid:96) ) becoming exactly

SkewedSipser u,(cid:96) − . • Circuit collapses.

For 2 ≤ (cid:96) ≤ d −

2, the circuit proj κ ( (cid:96) ) ( C (cid:96) +1 ) has depth (cid:96) , bottom fan-in u −

1, and has at most S gates at distance at least 2 from the inputs. Furthermore, C (cid:96) isthe simpliﬁed version of proj κ ( (cid:96) ) ( C (cid:96) +1 ) after the deterministic trimming restriction associatedwith proj κ ( (cid:96) ) ( SkewedSipser u,(cid:96) ). Finally, the circuit proj κ (1) ( C ) can be expressed as a depth-( u −

1) decision tree, and C is the simpliﬁed version of proj κ (1) ( C ) after the deterministictrimming restriction associated with proj κ (1) ( SkewedSipser u, ).The above implies that C (cid:96) computes SkewedSipser u,(cid:96) − for all 1 ≤ (cid:96) ≤ d −

2. This yields the de-sired contradiction since C , a decision tree of depth at most u −

1, cannot compute

SkewedSipser u, ,the OR of w / ≥ u many variables. Hence any depth- d circuit computing SkewedSipser u,d musthave size at least S , where S is the quantity deﬁned in (9). The following calculation showing that S = n Ω( u/d ) completes the proof of Theorem 2: Claim 6.3. S = n Ω( u/d ) .Proof. We ﬁrst observe that n = u d w d + ≤ w d + ≤ w ( + ) d < w d , and hence n d < w, where we used u ≤ w / for the ﬁrst inequality and d ≥ S = 0 . (cid:32) w / u (cid:33) u − ≥ . (cid:32) w / (cid:33) u/ ≥ . (cid:32) n d · (cid:33) u/ = n Ω( u/d ) , where we used q = w − / for the ﬁrst equality, 2 ≤ u ≤ w / for the ﬁrst inequality, and w > n d for the ﬁnal equality. Remark 18.

We remark that a straightforward construction yields small-depth circuits computing

SkewedSipser u,d that nearly match the lower bound given by Theorem 2. This construction simplyapplies de Morgan’s law to convert a u -way AND of w -way ORs into a w u -way OR of u -way ANDs.This is done for all of the AND ( d ) , AND ( d − , AND ( d − , . . . gates in SkewedSipser u,d . Collapsingadjacent layers of gates after this conversion, we obtain a depth-( d + 1) circuit of size n O ( u/d ) thatcomputes the SkewedSipser u,d function. 23 eferences [Ajt83] Mikl´os Ajtai. Σ -formulae on ﬁnite structures. Annals of Pure and Applied Logic ,24(1):1–48, 1983. 2[Ajt89] Mikl´os Ajtai. First-order deﬁnability on ﬁnite structures.

Annals of Pure and AppliedLogic , 45:211–225, 1989. 1, 2, 3, 7[Bea95] Paul Beame. A switching lemma primer. University of Washington, Dept. of ComputerScience and Engineering, Technical Report UW-CSE-95-07-01, 1995. 16[BIP98] Paul Beame, Russell Impagliazzo, and Toniann Pitassi. Improved depth lower boundsfor small distance connectivity.

Computational Complexity , 7:325 –345, 1998. 1, 2, 3, 6,7, 8[BPU92] Stephen Bellantoni, Toniann Pitassi, and Alasdair Urquhart. Approximation and smalldepth Frege proofs.

SIAM Journal on Computing , 21(6):1161–1179, 1992. 1, 2, 3, 7[Cai86] Jin-Yi Cai. With probability one, a random oracle separates

PSPACE from thepolynomial-time hierarchy. In

Proceedings of the 18th Annual ACM Symposium onTheory of Computing (STOC) , pages 21–29, 1986. 2[FSS84] Merrick Furst, James Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy.

Mathematical Systems Theory , 17(1):13–27, 1984. 2[H˚as86] Johan H˚astad.

Computational Limitations for Small Depth Circuits . MIT Press, Cam-bridge, MA, 1986. 1, 2, 3, 4, 8[IPS97] Russell Impagliazzo, Ramamohan Paturi, and Michael E. Saks. Size–depth tradeoﬀs forthreshold circuits.

SIAM Journal on Computing , 26(3):693–707, 1997. 7[IS01] Russell Impagliazzo and Nathan Segerlind. Counting axioms do not polynomially simu-late counting gates. In

Proceedings of the 42nd Annual IEEE Symposium on Foundationsof Computer Science (FOCS) , pages 200–209, 2001. 7[KPPY84] Maria Klawe, Wolfgang Paul, Nicholas Pippenger, and Mihalis Yannakakis. On mono-tone formulae with restricted depth. In

Proceedings of the 16th Annual ACM Symposiumon Theory of Computing (STOC) , pages 480–487, 1984. 3[Ros14] Benjamin Rossman. Formulas vs. circuits for small distance connectivity. In

Proceedingsof the 46th Annual ACM Symposium on Theory of Computing (STOC) , pages 203–212.ACM, 2014. 1, 2, 3, 6, 7, 8[RST15] Benjamin Rossman, Rocco A. Servedio, and Li-Yang Tan. An average-case depth hier-archy theorem for Boolean circuits. In

Proceedings of the 56th Annual IEEE Symposiumon Foundations of Computer Science (FOCS) , 2015. to appear. 1, 7[Sav70] Walter Savitch. Relationships between nondeterministic and deterministic tape com-plexities.

Journal of Computer and System Sciences , 4:177–192, 1970. 124Sip83] Michael Sipser. Borel sets and circuit complexity. In

Proceedings of the 15th AnnualACM Symposium on Theory of Computing (STOC) , pages 61–69, 1983. 1, 4[Sta] Theoretical Computer Science StackExchange. Available at http://cstheory.stackexchange.com/questions/7672/most-efficient-way-to-convert-an-textac0-circuit-to-a-circuit-of-any-dep .8[Wig92] Avi Wigderson. The complexity of graph connectivity. In

Proceedings of the 17thSymposium on Mathematical Foundations of Computer Science (MFCS) , pages 112–132. Springer-Verlag, 1992. 1[Yao85] Andrew Yao. Separating the polynomial time hierarchy by oracles. In