Learning algorithms from circuit lower bounds
Ján Pich
University of Oxford
November 2020
Abstract
We revisit known constructions of efficient learning algorithms from various notions of constructive circuit lower bounds such as distinguishers breaking pseudorandom generators or efficient witnessing algorithms which find errors of small circuits attempting to compute hard functions. As our main result we prove that if it is possible to find efficiently, in a particular interactive way, errors of many p-size circuits attempting to solve hard problems, then p-size circuits can be PAC learned over the uniform distribution with membership queries by circuits of subexponential size. The opposite implication holds as well. This provides a new characterisation of learning algorithms and extends the natural proofs barrier of Razborov and Rudich. The proof is based on a method of exploiting Nisan-Wigderson generators introduced by Krajíček (2010) and used to analyze complexity of circuit lower bounds in bounded arithmetic.

An interesting consequence of known constructions of learning algorithms from circuit lower bounds is a learning speedup of Oliveira and Santhanam (2016). We present an alternative proof of this phenomenon and discuss its potential to advance the program of hardness magnification.
While the central conjectures in complexity theory such as P ≠ NP have the form of impossibility results, we hope that a better understanding of the impossibility phenomena will also shed light on the question of constructing new useful algorithms. A successful formalization of such hopes can be found in cryptography, where impossibility results in the form of average-case lower bounds are turned into cryptographic primitives. In the present paper we are interested in turning complexity lower bounds into efficient learning algorithms.

Results of this form can be traced back to cryptography as well. The 'pseudorandomness from unpredictability' paradigm was used by Blum, Furst, Kearns and Lipton [3] to show that efficient distinguishers breaking pseudorandom generators imply efficient learning of p-size circuits on average. The distinguishers from [3] can be interpreted as constructive circuit lower bounds distinguishing partial truth-tables of easy Boolean functions from partial truth-tables of hard functions, cf. Section 4. The existing methods for proving circuit lower bounds have also been applied in constructions of new learning algorithms for restricted circuit classes; e.g. Linial, Mansour and Nisan [23] used AC⁰ lower bounds to get learning algorithms for AC⁰. More recently, in a landmark work, Carmosino, Impagliazzo, Kabanets and Kolokolova [5] gave a generic construction of learning algorithms from natural proofs of circuit lower bounds. Oliveira and Santhanam [32] extended their result to a dichotomy between the non-existence of non-uniform pseudorandom function families and the existence of efficient learning of small circuits. These results led Oliveira and Santhanam [32] also to a discovery of a surprising learning speedup. For example, learning p-size circuits over the uniform distribution with membership queries by circuits of weakly subexponential size 2^n/n^{ω(1)} implies that for each constant k and ε > 0, circuits of size n^k can be learned over the uniform distribution with membership queries by circuits of strongly subexponential size 2^{n^ε}.

In the present paper we revisit these connections. We start by considering a simple instance-specific model of learning in which proving a single circuit lower bound implies a reliable prediction of the value of a target function on a single input. The model underlies the construction of learning algorithms from [3, 5] and differs from the standard PAC learning model mainly in that it does not ask learners to construct a circuit which computes the target function on a big fraction of inputs, cf. Section 3.
Learning from witnessing lower bounds.
Our main result is a construction of efficient PAC learning of p-size circuits from a constructive circuit lower bound for an arbitrary Boolean function H. More precisely, we obtain subexponential-size circuits learning p-size circuits over the uniform distribution with membership queries. The assumption of a constructive circuit lower bound we need is defined as the existence of 2^{O(n)}-size 'witnessing' circuits W which, given oracle access to a p-size circuit D with n inputs, find a not-yet-queried input on which D fails to compute H. The circuits W are allowed to fail on a 1/poly(n) fraction of circuits D. Moreover, even if circuits W succeed on a circuit D, they are allowed to output an incorrect answer log n times (receiving a correction in each round) before generating the right answer, cf. Theorem 1. The implication can also be interpreted as a construction of PAC learning algorithms from a frequent interactive instance-specific learning (we use the adjective 'instance-specific' only informally in this paper; the instance-specific model discussed earlier actually differs slightly from the concept in Theorem 1): if we are given an algorithm which is able to predict the value of a big fraction of p-size circuits (after a small number of queries and ≤ log n mistakes) even
on a single input, this already implies learnability of p-size circuits on almost all inputs. The opposite implication, producing efficient witnessing of lower bounds from learning algorithms, holds as well, which yields a new characterisation of PAC learning of small circuits, cf. Lemma 1.
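The converse direction mentioned here (Lemma 1 later in the paper) can be illustrated with a toy sketch in which truth tables stand in for circuits and the learner is assumed to be perfect; the concrete sizes and the random tables below are illustrative assumptions, not the paper's parameters.

```python
import random

# Toy version of the direction "witnessing from learning": if a learner,
# given oracle access to a small circuit D, outputs a hypothesis agreeing
# with D, then any not-yet-queried input on which the hypothesis disagrees
# with the hard function H witnesses D(x) != H(x).
random.seed(2)
n = 8
N = 2 ** n
H = [random.randint(0, 1) for _ in range(N)]   # "hard" function (random table)
D = [random.randint(0, 1) for _ in range(N)]   # circuit attempting to compute H

queried = set(random.sample(range(N), 32))     # inputs the learner queried
hyp = list(D)                                  # perfect hypothesis for D

# candidate witnesses: fresh inputs where the hypothesis disagrees with H
candidates = [x for x in range(N) if x not in queried and hyp[x] != H[x]]
witness = random.choice(candidates)
assert D[witness] != H[witness]                # the witness refutes D on H
```

Since the hypothesis equals D exactly, every candidate is a genuine error of D against H; Lemma 1 works with an approximating hypothesis and correspondingly loses a small probability.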
Relation to proof complexity, natural proofs and witnessing theorems.
The notion of interactive witnessing of circuit lower bounds from Theorem 1 is motivated by witnessing theorems from bounded arithmetic. One of the most prominent theories of bounded arithmetic is Cook's theory PV, which formalizes p-time reasoning. Theories of bounded arithmetic satisfy many so-called witnessing theorems, which allow us to show, for example, that if we can prove a p-size circuit lower bound for a function H ∈ NP in PV then there exists a witnessing analogous to the one from Theorem 1 except that the witnessing circuits W have white-box access to D (i.e. access to a full description of D), see Section 3.1 for a more detailed comparison. The witnessing from Theorem 1 is also closely related to algorithms finding hard instances of NP problems by Gutfreund, Shaltiel, Ta-Shma [12] and Atserias [2]. The main difference is that the algorithms from [12] have white-box access to the algorithm whose error they search for. While Atserias [2] made [12] work with black-box (oracle) access, his algorithm achieves much smaller probability of success than the one required in Theorem 1, cf. Section 3.1.

The proof of Theorem 1 is an adaptation of a method of exploiting Nisan-Wigderson generators introduced by Krajíček [17] in order to give a model-theoretic evidence for Razborov's conjecture in proof complexity. Razborov's conjecture [39] states a conditional hardness of deriving tautologies expressing the existence of an element outside of the range of a suitable NW-generator in strong proof systems. Krajíček's result significantly strengthens a similar but much simpler proof of the validity of Razborov's conjecture for proof systems with feasible interpolation [34].
The method has also been used to show a conditional hardness of generating hard tautologies [19], a conditional unprovability of p-size circuit lower bounds for SAT in theories of bounded arithmetic below Cook's theory PV [35] and an unconditional unprovability of strong nondeterministic lower bounds in Jeřábek's theory of approximate counting APC [37]. We take advantage of its unique way of exploiting the NW-generator: it gives us a reconstruction algorithm which, after breaking the NW-generator in a particular interactive fashion, allows us to approximately compute the function on which the generator is based. There are, however, technical issues with adapting this method in our context; e.g., unlike in bounded arithmetic, our witnessing circuits can fail with a significant probability. Our main contribution is in finding the right notions which allow the arguments to go through (in both directions).

A competing notion of constructive circuit lower bounds has been developed in the influential theory of natural proofs of Razborov and Rudich [40], which explains why many of the existing lower bound methods cannot yield separations such as P ≠ NP. Natural proofs are known to be equivalent to the existence of efficient learning algorithms, cf. [5]. For example, P/poly-natural proofs useful against P/poly are equivalent to subexponential-size circuits learning p-size circuits over the uniform distribution with membership queries. Furthermore, natural proofs have been used to derive unprovability results in proof complexity as well, specifically, to derive unprovability of circuit lower bounds in proof systems with the feasible interpolation property, cf. [38, 16]. Despite similar applications and motivations for defining these concepts, the relation between natural proofs and the witnessing method has not been clear.
In fact, a priori the 'static' definition of natural proofs appears to be quite orthogonal to the witnessing from Theorem 1. Theorem 1 thus not only extends the scope of the natural proofs barrier by providing another equivalent characterisation which incorporates interactivity but also helps to clarify its relation to the witnessing method.

Learning speedup.
Our second contribution is a simple proof of a generalized learning speedup of Oliveira and Santhanam [32]. Specifically, we show that for each superpolynomial function s, if for each constant k, circuits of size n^k are learnable by circuits of size s over the uniform distribution with random examples, then for each constant k and ε > 0, circuits of size n^k are learnable over the uniform distribution with membership queries by circuits of size O(s(n^ε)), cf. Theorem 6. We obtain the speedup by a more direct exploitation of a slightly modified NW-generator. In comparison to the proof from [32], this sidesteps the need to construct natural proofs and invoke the construction of Carmosino et al. [5]. A disadvantage of the method is that we need to assume learning with random examples instead of membership queries. Nevertheless, we present one more alternative proof of the learning speedup based on (a simple case of) Theorem 1, which allows us to start with membership queries, cf. Theorem 7. We emphasize, however, that behind all proofs of the learning speedup is essentially the same general idea of reconstructing, in this or that way, the base function of some form of the NW-generator.

Relation to hardness magnification and locality.
The generalized learning speedup can be interpreted as a nonlocalizable hardness magnification theorem reducing a complexity lower bound to a seemingly weaker one. In general, hardness magnification refers to an approach to strong complexity lower bounds developed in a series of recent papers, cf. Section 5. Unfortunately, while the approach avoids (in certain cases provably [6]) the natural proofs barrier, it suffers from a 'locality barrier': magnification theorems typically yield unconditional upper bounds for specific problems if the computational model in question is allowed to use oracles with small fan-in, but the existing lower bounds actually work even against the presence of local oracles. In fact, a better understanding of nonlocalizable lower bounds is essential for further progress on strong complexity lower bounds in general, see Section 5 for more details. A promising aspect of the learning

Footnote: P/poly-natural proofs useful against P/poly are defined as 2^{O(n)}-size circuits with 2^n inputs accepting a 1/2^{O(n)}-fraction of inputs and rejecting all inputs which represent truth-tables of Boolean functions on n inputs computable by p-size circuits, cf. Definition 1.

Learning from breaking cryptographic pseudorandom generators.
In Section 4 we survey known constructions of learning algorithms from distinguishers breaking pseudorandom generators (PRGs) or natural proofs. While several such constructions are known, the question of extracting efficient learning of p-size circuits from the non-existence of cryptographic PRGs remains open. A positive answer to this question would establish an interesting win-win situation: either safe cryptography or efficient learning is possible. In the already mentioned approach, Oliveira and Santhanam [32] showed that efficient learning of p-size circuits with membership queries follows from the non-existence of nonuniform pseudorandom function families. By a straightforward adaptation of the proof method behind their result we show that efficient learning of p-size circuits with random examples follows from the non-existence of succinct nonuniform pseudorandom function families, cf. Theorem 5. Finally, we point out that the desired construction of learning algorithms from the non-existence of cryptographic PRGs is closely related to a question of Rudich about turning demibits into superbits, cf. Section 4.4.

[n] denotes {1, ..., n}. Circuit[s] denotes fan-in two Boolean circuits of size at most s. The size of a circuit is the number of gates. A function f : {0,1}^n → {0,1} is γ-approximated by a circuit C if Pr_x[C(x) = f(x)] ≥ γ.

Definition 1 (Natural property [40]). Let m = 2^n and s, d : N → N. A sequence of circuits {C_m}_{m=1}^∞ is a Circuit[s(m)]-natural property useful against Circuit[d(n)] if

1. Constructivity. C_m has m inputs and size s(m),

2. Largeness. Pr_x[C_m(x) = 1] ≥ 1/m^{O(1)},

3. Usefulness.
For each sufficiently big m, C_m(x) = 1 implies that x is a truth-table of a function on n inputs which is not computable by circuits of size d(n).

Definition 2 (Pseudorandom generator). A function g : {0,1}^n → {0,1}^{n+1} computable by p-size circuits is a pseudorandom generator safe against circuits of size s(n) if for each circuit D of size s(n),

|Pr_{y ∈ {0,1}^{n+1}}[D(y) = 1] − Pr_{x ∈ {0,1}^n}[D(g(x)) = 1]| < 1/s(n).

Definition 3 (PAC learning). A circuit class C is learnable over the uniform distribution by a circuit class D up to error ε with confidence δ if there are randomized oracle circuits L^f from D such that for every Boolean function f : {0,1}^n → {0,1} computable by a circuit from C, when given oracle access to f, input 1^n and the internal randomness w ∈ {0,1}^*, L^f outputs the description of a circuit satisfying

Pr_w[L^f(1^n, w) (1 − ε)-approximates f] ≥ δ.

L^f uses non-adaptive membership queries if the set of queries which L^f makes to the oracle does not depend on the answers to previous queries. L^f uses random examples if the set of queries which L^f makes to the oracle is chosen uniformly at random.

In this paper, PAC learning always refers to learning over the uniform distribution.
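Definition 3 can be made concrete with a deliberately tiny instance: the concept class below consists of the 2n 'dictator' functions x_i and 1 − x_i (a stand-in for Circuit[s]), and the learner identifies the target exactly with n + 1 membership queries, so the output (1 − ε)-approximates f with ε = 0 and confidence δ = 1. The class and the learner are illustrative choices, not constructions from the paper.

```python
import random

def learn_dictator(oracle, n):
    """Learn a dictator function x_i or 1-x_i with membership queries."""
    zero = tuple([0] * n)
    base = oracle(zero)                    # membership query on 0^n
    for i in range(n):
        flipped = list(zero)
        flipped[i] = 1
        if oracle(tuple(flipped)) != base: # query each unit vector
            # hypothesis: x_i if base == 0, else 1 - x_i
            return lambda x, i=i, base=base: x[i] ^ base
    return lambda x, base=base: base       # constant-function fallback

n = 8
target = lambda x: x[3]                    # the hidden concept
h = learn_dictator(target, n)
for _ in range(100):                       # hypothesis agrees everywhere
    x = tuple(random.randint(0, 1) for _ in range(n))
    assert h(x) == target(x)
```

The queries here are non-adaptive in the sense of Definition 3: the set of queried points (0^n and the unit vectors) does not depend on the oracle's answers.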
Boosting confidence and reducing error.
The confidence of the learner can be efficiently boosted in a standard way. Suppose an s-size circuit L^f learns f up to error ε with confidence δ. We can then run L^f k times, test the output of L^f from every run with m new random queries and output the most accurate one. By Hoeffding's inequality, m random queries fail to estimate the error ε of an output of L^f up to γ with probability at most 2/e^{2γ²m}. Therefore the resulting circuit of size poly(s, m, k) learns f up to error ε + γ with confidence at least 1 − k/e^{2γ²m} − (1 − δ)^k ≥ 1 − k/e^{2γ²m} − e^{−kδ}. If we are trying to learn small circuits we can get even confidence 1 by fixing the internal randomness of the learner nonuniformly without losing much on the running time or the error of the output. It is also possible to reduce the error up to which L^f learns f without a significant blowup in the running time and confidence. If we want to learn f with a better error, we first learn an amplified version of f, Amp(f). Employing direct product theorems and the Goldreich-Levin reconstruction algorithm, Carmosino et al. [5, Lemma 3.5] showed that for each 0 < ε, γ < 1/2 it is possible to transform a Boolean function f with n inputs to a Boolean function Amp(f) with poly(n, 1/ε, log(1/γ)) inputs so that Amp(f) ∈ P/poly^f and there is a probabilistic poly(|C|, n, 1/ε, 1/γ)-time machine which, given a circuit C (1/2 + γ)-approximating Amp(f) and an oracle access to f, outputs with high probability a circuit (1 − ε)-approximating f. We thus typically ignore the optimisation of the confidence and error parameter in the rest of the paper.

The most direct way of turning circuit lower bounds into a certain type of learning can be described as follows.

Footnote: The simple observation from box A appeared in [27, Section 4.5] and [36]. I am not aware of a more systematic treatment of this concept.
Footnote: There are related models of learning such as the 'knows what it knows' model by Li-Littman-Walsh [22] and 'reliable learning' by Rivest-Sloan [41] which prohibit incorrect predictions in various ways. These models, however, follow the formalization of PAC learning in that the goal of the learner is to learn the target concept by accessing it. In box A we do not assume that the target concept f is determined on all inputs or prior to the given samples.

Box A. Prediction from lower bound. Suppose we are given bits f(y_1), ..., f(y_k) for n-bit strings y_1, ..., y_k defining a partial Boolean function f. We want to predict the value of f on a new input y_{k+1} ∈ {0,1}^n. A priori f(y_{k+1}) is not defined but we will interpret the minimal-size circuit C_f coinciding with f on y_1, ..., y_k as 'the right' prediction of f(y_{k+1}). That is, we want to find C_f(y_{k+1}). Here, we assume that the minimal circuit C_f determines the value f(y_{k+1}). Otherwise, there are two circuits C_1, C_2 of minimal size such that C_1(y_{k+1}) ≠ C_2(y_{k+1}), and therefore any prediction is equally good. Say that the size of the minimal circuit C_f is s. Then the task to predict the value C_f(y_{k+1}) can be formulated as the task to prove an s-size circuit lower bound of the form

∀ circuit C of size s:  ⋁_{i=1,...,k} C(y_i) ≠ f(y_i)  ∨  C(y_{k+1}) ≠ ε

for ε = 0 or ε = 1.

An interesting aspect of the prediction method described in box A is that by proving even a single circuit lower bound we can learn something about the function f (if we know the value s). More precisely, we predict C_f on a single input but do not necessarily gain knowledge of the values of C_f on other inputs. This 'instance-specific' learning should be contrasted with PAC learning, Definition 3, where one is required to generate a circuit predicting the target function f on most inputs.
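The prediction rule of box A can be sketched with a small explicit hypothesis class ordered by size in place of s-size circuits; the class of constants (size 0) and dictators (size 1) below is an illustrative assumption. Given samples (y, f(y)), we predict on a new point with the minimal consistent hypothesis, and report None when two minimal hypotheses disagree, the case where any prediction is equally good.

```python
def build_hypotheses(n):
    """(size, function) pairs: constants of size 0, dictators of size 1."""
    hs = [(0, lambda x: 0), (0, lambda x: 1)]
    for i in range(n):
        hs.append((1, lambda x, i=i: x[i]))
        hs.append((1, lambda x, i=i: 1 - x[i]))
    return hs

def predict(samples, y_new, n):
    """Box-A style prediction: value of a minimal consistent hypothesis."""
    consistent = [(sz, h) for sz, h in build_hypotheses(n)
                  if all(h(y) == v for y, v in samples)]
    if not consistent:
        return None
    min_sz = min(sz for sz, _ in consistent)
    vals = {h(y_new) for sz, h in consistent if sz == min_sz}
    return vals.pop() if len(vals) == 1 else None

samples = [((0, 1, 0), 1), ((1, 1, 1), 1), ((0, 0, 1), 0)]
# the unique minimal consistent hypothesis is x -> x[1]
print(predict(samples, (1, 0, 1), 3))   # prints 0
```

As in box A, the prediction on (1, 0, 1) is obtained without learning the hypothesis' values anywhere else; only the minimality and consistency with the samples are used.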
This, however, does not mean that it is easier to learn in the sense of box A: in Definition 3 we do not need to recognize when the prediction errs, while the prediction from box A is zero-error in the sense that it guarantees to output the right value of C_f(y_{k+1}).

Determining minimal circuit size.
A drawback of the observation in box A is that it requires knowledge of the size s of the minimal circuit C_f, which might be hard for the learner to determine. The size s could be determined by deciding t-size circuit lower bounds for t ∈ [s]. Perhaps a more practical way of addressing the issue is to take a sufficiently big approximate value s′ of s, choose a random t ∈ [s′] and prove t-size lower bounds (as in box A with t instead of s). If s′ ≤ n^{O(1)}, the probability that we have the right t is 1/n^{O(1)}. Then, by solving polynomially many t-size lower bounds (in order to predict C_f(y) on polynomially many y's), we can approximate the accuracy of our predictions. If the accuracy is not high, we can repeat the process with a new random

Footnote: Provability vs truth.
The definition of 'the right' prediction in terms of minimal circuits used in box A can be interpreted as an implicit (alternative) definition of truth. Consider, for example, that strings y_j encode statements in set theory ZFC and the value f(y_j) is 1 if and only if the statement encoded by y_j is provable in ZFC. It would be interesting to find out whether the minimal circuit coinciding with a sufficiently rich list of such samples (y_j, f(y_j)) determines a truth value of the Continuum Hypothesis or of the consistency of ZFC, statements which are independent of ZFC. Unfortunately, in general, such questions seem to be out of reach of contemporary mathematics.

t ∈ [s′]. The advantage of this method is that it does not rely on deciding correctly whether some particular t-size circuit lower bounds hold; we are actually allowed to err on some fraction of lower bounds. However, its predictions are no longer zero-error. A closely related argument is formalized in Section 4.

Proof complexity.
The prediction method from box A relies on the proof complexity of circuit lower bounds, cf. [20]. It would be interesting to find out if proving circuit lower bounds in standard proof systems suffices to construct learning circuits.
Question 1 (Learning interpolation). Is there a p-time function which, given an Extended Frege proof of a formula

⋁_{y ∈ A} C(y) ≠ f(y) ∨ C(x) ≠ ε,  for ε = 0 or ε = 1,

with free variables representing s-size circuits C with n inputs, a fixed set A of n-bit inputs of a sufficiently big size |A| = poly(s, n), a fixed n-bit string x ∉ A and values of f ∈ Circuit[s] on A, outputs a circuit (1/2 + 1/n)-approximating f?

We now give a construction of PAC learning algorithms from an interactive witnessing of circuit lower bounds. As discussed in the introduction, the implication can also be interpreted as a construction of PAC learning algorithms from a frequent interactive instance-specific learning.
Theorem 1 (Learning from interactive witnessing of lower bounds). Let d ≥ 2, k ≥ 1, K ≥ 1 and let H be a Boolean function with n inputs. Assume there are 2^{Kn}-size circuits W_1, ..., W_{b log n} with b = 2^{Kn} such that for each distribution R on n^{dk}-size circuits with n inputs there exists j ∈ [b] such that circuits W_{j,1}, ..., W_{j,log n} witness errors of n^{dk}-size circuits attempting to compute H in the following way.

Given an oracle access to a random n^{dk}-size circuit D(x) with n inputs, with probability at least 1 − 1/n over R, the following interactive protocol succeeds: after querying values of circuit D, W_{j,1} outputs a not-yet-queried x_1 ∈ {0,1}^n s.t. D(x_1) ≠ H(x_1), or W_{j,1} receives a correction in the form of bits D(x_1), H(x_1) s.t. D(x_1) = H(x_1). Having D(x_1), H(x_1) and the samples queried by W_{j,1}, W_{j,2} makes further queries to D and generates the second not-yet-queried candidate x_2 ∈ {0,1}^n for the claim D(x_2) ≠ H(x_2). If D(x_2) = H(x_2), W_{j,2} receives a correction and the protocol continues in this way until some W_{j,t}, for t ≤ log n, with access to all

Footnote: Notably, Razborov [39] established that weak proof systems such as Resolution operating with k-DNFs for small k do not have polynomial-size proofs of any superpolynomial circuit lower bound whatsoever, and he conjectured that this holds under a hardness assumption even for stronger systems such as Frege. The issue is, however, delicate because proof systems like Extended Frege are already capable of formalizing a lot of complexity theory, see e.g. [27], and it is perfectly plausible that if a circuit lower bound is provable at all, then it is efficiently provable in Extended Frege.
previous corrections and samples finds the right x_t which has not been queried by W_{j,1}, ..., W_{j,t} and witnesses D(x_t) ≠ H(x_t).

Then, circuits of size n^{dk} with n^d inputs can be learned by circuits of size 2^{K′n} over the uniform distribution with non-adaptive membership queries, with confidence 1/2^{K′n}, up to error 1/2 − 1/2^{K′n}, where K′ is a constant depending only on K.

Note that the witnessing circuits from Theorem 1 can work for an arbitrary function H and, for the circuits D on which the witnessing succeeds, the number of queries in each round is implicitly bounded by 2^n (since after querying D on all inputs it would be impossible to output a not-yet-queried input).

Proof.
The proof follows the main construction from [35, 17] in the context of learning. The main technical complication is caused by the fact that the witnessing circuits W_1, ..., W_{b log n} are allowed to fail on a significant fraction of inputs.

In order to derive the conclusion of the theorem it suffices to assume that the witnessing circuits work for distributions R induced by specific Nisan-Wigderson generators. Consider a Nisan-Wigderson generator based on a circuit C which we aim to learn. Specifically, for d ≥ 2 and n^d ≤ m ≤ n^{2d}, let A = {a_{i,j}}, i ∈ [2^n], j ∈ [m], be a 2^n × m matrix with n^d ones per row and J_i(A) := {j ∈ [m]; a_{i,j} = 1}. Then define an NW-generator NW_C : {0,1}^m → {0,1}^{2^n} as

(NW_C(w))_i = C(w|J_i(A)),

where w|J_i(A) are the bits w_j such that j ∈ J_i(A). For any d ≥
2, Nisan and Wigderson [29] constructed a 2^n × m matrix A with n^d ones per row and n^d ≤ m ≤ n^{2d} which is also an (n, n^d)-design, meaning that for each i ≠ j, |J_i(A) ∩ J_j(A)| ≤ n and |J_i(A)| = n^d. Moreover, there are n^d-size circuits which given i ∈ {0,1}^n and w ∈ {0,1}^m output w|J_i(A), cf. [5]. Therefore, if C has n^d inputs and size n^{dk}, then for each w ∈ {0,1}^m, (NW_C(w))_x is a function on n inputs x computable by circuits of size n^{dk}. We want to learn C by a circuit of size 2^{O(n)}.

Let R be the distribution on n^{dk}-size circuits defined so that a random circuit over R is (NW_C(w))_x for w ∈ {0,1}^m chosen uniformly at random. By the assumption of the theorem, we have 2^{Kn}-size circuits W_1, ..., W_{b log n} with b = 2^{Kn} such that for some j ∈ [b], for 1 − 1/n of all w ∈ {0,1}^m, circuits W_{j,1}, ..., W_{j,log n} find an error of the n^{dk}-size circuit (NW_C(w))_x attempting to compute H. We will use them in order to break, in a certain sense, the generator NW_C and reconstruct the circuit C.

For each w define a trace tr(C, w) = x_1, ..., x_t as the sequence of t ≤ log n strings generated by W_{j,1}, ..., W_{j,t} on (NW_C(w))_x such that W_{j,t} is the first circuit which succeeds in witnessing the error, i.e. H(x_t) ≠ (NW_C(w))_{x_t}. If circuits W_{j,1}, ..., W_{j,log n} do not find an error, x_t = x_{log n}. The trace is defined w.r.t. a fixed 'helpful' oracle Y providing corrections in the form of bits (NW_C(w))_x, H(x).

For u ∈ {0,1}^{n^d} and v ∈ {0,1}^{m−n^d} define r_x(u, v) ∈ {0,1}^m by putting the bits of u into positions J_x(A) and filling the remaining bits by v (in the natural order). We say that w ∈ {0,1}^m is good if the trace tr(C, w) ends with a string witnessing an error of circuit
(NW_C(w))_x, and bad otherwise. Similarly, given v ∈ {0,1}^{m−n^d} and x′ ∈ {0,1}^n, we say that u ∈ {0,1}^{n^d} is good if r_{x′}(u, v) is. The core claim of the proof is the existence of a frequent trace on which circuits W_{j,1}, ..., W_{j,log n} succeed in witnessing the error with significant advantage.

Claim 3.1.
There is a trace Tr = X_1, ..., X_t, t ≤ log n, such that for a fraction s ≥ 1/(6 · 2^{2n(t−1)} · 2^n · n) of all a ∈ {0,1}^{m−n^d} and a fraction s′ ≥ s of all u ∈ {0,1}^{n^d}, tr(C, r_{X_t}(u, a)) starts with Tr, and at least (2/3 − 2^t/n − 2/n) s′ 2^{n^d} of the u's are good and satisfy tr(C, r_{X_t}(u, a)) = Tr.
The trace Tr is constructed inductively: in step i we want to have X_1, ..., X_{i−1} such that for ≥ 1/2^{2n(i−1)} of all w's tr(C, w) strictly extends X_1, ..., X_{i−1}, and the fraction of good w's among these is ≥ 1 − 2^i/n. For i = 1 this holds by the assumption. Assume we have such X_1, ..., X_{i−1}; we want to extend them to X_1, ..., X_i. Since there are at most 2^n strings X_i, there is X_i such that for s′′ ≥ 1/(2 · 2^n · 2^{2n(i−1)}) of the w's tr(C, w) starts with X_1, ..., X_i and ≤ 2^i/n of these w's are bad. Otherwise, the fraction of good w's for which tr(C, w) strictly extends X_1, ..., X_{i−1} would be ≤ 1/2^n + 1 − 2^i/n < 1 − 2^{i−1}/n, if 2n ≤ 2^n. Now, either for ≥ (2/3)s′′ of the w's tr(C, w) stops at X_i (hence, for ≤ (1/3)s′′ of the w's the trace continues, and for ≤ 2^i s′′/n bad w's tr(C, w) starts with X_1, ..., X_i), or for ≥ (1/3)s′′ of the w's the trace strictly extends X_1, ..., X_i. In the latter case, for ≤ 2^i s′′/n bad w's tr(C, w) starts with X_1, ..., X_i, which means that the fraction of bad w's such that tr(C, w) strictly extends X_1, ..., X_i is ≤ 2 · 2^i/n.

Since for all w the length of tr(C, w) is bounded by log n, the process of extending X_1, ..., X_{i−1} has to stop at some step 1 ≤ i ≤ log n. That is, there is Tr = X_1, ..., X_t, t ≤ log n, such that for ≥ (2/3)s of the w's tr(C, w) = Tr, for ≤ (1/3)s of the w's tr(C, w) strictly extends Tr, and ≤ 2^t s/n of the w's such that tr(C, w) is consistent with Tr are bad, where s ≥ 1/(6 · 2^{2n(t−1)} · 2^n). The number of good w's such that tr(C, w) = Tr is at least (2/3 − 2^t/n) s 2^m. Therefore, ≥ s/n of the a's can be completed by s′ ≥ s/n of the u's to a string w = r_{X_t}(u, a) such that tr(C, w) starts with Tr and at least (2/3 − 2^t/n − 2/n) s′ 2^{n^d} of the u's are good and satisfy tr(C, r_{X_t}(u, a)) = Tr. This proves the claim.

For X ∈ {0,1}^n and a′ ∈ {0,1}^{m−n^d} let r_X(·, a′) be the bits of a′ in the positions of [m] \ J_X(A). Since A is an (n, n^d)-design, for any row x ≠ X at most n bits of r_X(·, a′)|J_x(A) are not set. For x ≠ X, let Y^{X,a′}_{x,C} be the set of all corrections provided by Y on x, C and r_X(u, a′)|J_x(A) for all u ∈ {0,1}^{n^d}. This includes queries to C on inputs r_X(u, a′)|J_x(A). The size of each set Y^{X,a′}_{x,C} is 2^{O(n)}.

We are ready to describe a circuit D′ that approximates C. First, choose uniformly at random a′ ∈ {0,1}^{m−n^d}, a trace X_1, ..., X_t with t ≤ log n, a bit maj ∈ {0,1} and j′ ∈ [b]. Query C so that all queries to C from the sets Y^{X_t,a′}_{x,C}, for x ≠ X_t, are obtained. In order to get access to all corrections from Y^{X_t,a′}_{X_1,C}, ..., Y^{X_t,a′}_{X_{t−1},C} we provide also the full truth-table of H as a nonuniform advice of D′. The truth-table of H is a single nonuniform advice of the learner which works for every C. Then D′ computes as follows. For each u ∈ {0,1}^{n^d} produce r_{X_t}(u, a′). Next, use W_{j′,1} to produce x_1. If a query of W_{j′,1} cannot be answered by Y^{X_t,a′}_{x,C} with x ≠ X_t, or x_1 ≠ X_1, output maj. Otherwise, use the advice from Y^{X_t,a′}_{X_1,C} to find out if H(X_1) = NW_C(r_{X_t}(u, a′))_{X_1}. If the equality does not hold, output maj. Otherwise, use W_{j′,2} to generate x_2 and continue in the same manner until W_{j′,t} produces x_t. If a query of W_{j′,t} cannot be answered by Y^{X_t,a′}_{x,C} with x ≠ X_t, or x_t ≠ X_t, output maj. Otherwise, output 0 iff H(X_t) = 1.
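The combinatorial design behind NW_C can be seen concretely with toy parameters: seed positions form a p × p grid and the set J indexed by a univariate polynomial over GF(p) is its graph, so two distinct sets intersect in fewer than d positions. The paper's (n, n^d)-design uses different parameters, but the mechanism is the same; the base function C below is an illustrative choice.

```python
import random

p, d = 5, 2   # toy field size and degree bound

def poly_eval(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs)) % p

def J(coeffs):
    """Seed positions read by the output indexed by this polynomial:
    the graph {(x, f(x))} laid out inside a p x p grid of positions."""
    return {x * p + poly_eval(coeffs, x) for x in range(p)}

polys = [(a, b) for a in range(p) for b in range(p)]   # all degree-<2 polys

# design property: distinct polynomials of degree < d agree on < d points,
# so the sets J_i pairwise intersect in fewer than d positions
for i in range(len(polys)):
    for j in range(i + 1, len(polys)):
        assert len(J(polys[i]) & J(polys[j])) < d

def nw_output_bit(C, w, coeffs):
    """One output bit of NW_C(w): C applied to the seed restricted to J."""
    return C(tuple(w[pos] for pos in sorted(J(coeffs))))

random.seed(0)
w = [random.randint(0, 1) for _ in range(p * p)]   # the seed
C = lambda bits: sum(bits) % 2                     # base function to hide
outputs = [nw_output_bit(C, w, f) for f in polys]  # p^2 output bits
assert len(outputs) == p ** 2
```

Each output bit depends on only p seed positions, and the small pairwise overlaps are exactly what makes the hybrid/reconstruction arguments over NW_C possible.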
The resulting circuit D′ has n^d inputs and size 2^{O(n)}, if m ≤ 2^n (which holds w.l.o.g.). By Claim 3.1, with probability at least 1/(6 · 2^{2n log n} · 2^{O(n log n)}) the learner guessed j′ = j, trace Tr and assignment a such that for at least (2/3 − 2^t/n − 2/n)s′ of all u ∈ {0,1}^{n^d}, D′ will successfully predict C(u). Moreover, for at most (1/3 + 2^t/n + 2/n)s′ of all u's, the trace extends Tr or starts with Tr but does not end with a string witnessing an error. Since with probability 1/2 the majority value of C on the remaining u's is maj,

Pr_u[D′(u) = C(u)] ≥ 1/2 + (2/3 − 2^t/n − 2/n)s/2.

The assumption from Theorem 1 is justified by the following lemma which establishes the converse.

Lemma 1 (Witnessing from learning). Let k ≥ 1, 0 < ε < 1 and 2^n/n² ≥ 2^{εn} ≥ n^k, and let H be a Boolean function with n inputs hard to (1 − 2/n)-approximate by circuits of size 2^{εn}. Assume Circuit[n^k] can be learned by Circuit[2^{εn}] over the uniform distribution with confidence 1, up to error ε′.

Then, there are 2^{O(n)}-size circuits W_1, ..., W_b with b = 2^n/n such that for each distribution R on n^k-size circuits with n inputs there exists j ∈ [b] such that given an oracle access to a random n^k-size circuit D(x) with n inputs, with probability at least 1 − ε′n over R, after ≤ 2^{εn} queries to circuit D, W_j outputs a not-yet-queried x ∈ {0,1}^n s.t. D(x) ≠ H(x).

Proof. By the assumption, there exists a 2^{εn}-size circuit W which for each n^k-size circuit D, given an oracle access to D, outputs a circuit C (1 − ε′)-approximating D. Since H is hard to (1 − 2/n)-approximate by circuits of size 2^{εn} ≤ 2^n/n², there are at least 2^n/n inputs which have not been queried by W and on which C fails to compute H. Therefore, a random input which has not been queried by W and on which C fails to compute H witnesses D(x) ≠ H(x) with probability ≥ 1 − ε′n. Let W_1, ..., W_b, b = 2^n/n, be circuits such that W_i simulates W and outputs the i-th input on which C fails to compute H, ignoring inputs which have been queried by W. The size of each W_i is 2^{O(n)} because it uses the whole truth table of H as a nonuniform advice. Let R be an arbitrary distribution on circuits of size n^k.
Since for each D at least a (1 − ε′n)-fraction of the W_i's succeed, there is a W_j which succeeds on a random D with probability ≥ 1 − ε′n over R.

Note that Theorem 1 together with Lemma 1 imply that for a suitable H it is possible to collapse the number of rounds in the interactive witnessing from Theorem 1 at the expense of witnessing errors of slightly smaller circuits (and a small increase in the running time of the witnessing).

Learning from witnessing lower bounds with white-box access.
Theorem 1 holds also under the stronger assumption that circuits W_1, ..., W_{b log n} witness errors of n^{dk}-size nondeterministic circuits D with n inputs (and ≤ n^{dk} nondeterministic bits), where D computes a function in Circuit[n^{dk}], i.e. D is a nondeterministic circuit computing a function in P/poly. Then it makes sense to allow W_1, ..., W_{b log n} to access the full description of a given nondeterministic circuit D. The conclusion of the resulting theorem remains valid, with the only difference that the learning algorithm is given the full description of an n^{dk}-size nondeterministic circuit with n^d inputs representing the target function (which is computable by an n^{dk}-size deterministic circuit with n^d inputs).

Comparison to witnessing in bounded arithmetic.
The existence of witnessing analogous to the one from Theorem 1 follows from the provability of circuit lower bounds in bounded arithmetic. If H: {0,1}^n → {0,1} is an NP function and n_0, k are constants, we can write down a ∀Σ^b_1 formula LB(H, n^k) stating that H is hard for circuits of size n^k:

∀n, n > n_0 ∀ circuit D of size ≤ n^k ∃y, |y| = n, D(y) ≠ H(y),

where D(y) ≠ H(y) is a Σ^b_1 formula stating that the circuit D on input y outputs the opposite of the value H(y). Here, Σ^b_1 is the class of formulas in the language of Cook's theory PV which define precisely the predicates from the Σ^p_1 level of the polynomial hierarchy, cf. [20]. By the KPT theorem [21], if PV proves LB(H, n^k), then there are finitely many poly(n)-time functions W_1, ..., W_l which witness the existential quantifiers of LB(H, n^k) (including the existential quantifier from the subformula D(y) ≠ H(y)) in the same interactive way as in Theorem 1, except that the corrections include strings standing for the innermost universal quantifier of LB(H, n^k) (which allow one to verify in p-time that D(y) ≠ H(y) has not been witnessed by the most recent candidates). Moreover, W_1, ..., W_l have access to the full description of a given circuit D and do not make queries to D but directly generate potential errors, cf. [35]. It is possible to change the formula LB(H, n^k) by introducing a parameter m satisfying 2^n = |m| so that the witnessing from the PV-provability of the new formula is given by circuits W_1, ..., W_l of size 2^{O(n)}. In such a case, H is allowed to be in NE.
We could allow H to be an arbitrary Boolean function if we formulated the lower bound in QBF proof systems instead of bounded arithmetic.

A crucial difference between the black-box witnessing from Theorem 1 and the white-box witnessing in bounded arithmetic is that, under standard hardness assumptions, the white-box witnessing of p-size circuit lower bounds for functions H such as SAT exists, cf. [27].
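To make the interactive shape of such witnessing concrete, here is a toy Student-Teacher simulation. Everything in it is a hypothetical stand-in: H is a random truth table, the "circuit" D is a function erring on a small random set, and the student simply samples candidates at random in place of the actual witnessing functions W_1, ..., W_l.

```python
import random

random.seed(0)
n = 10
N = 1 << n

# Hypothetical hard function H: a random truth table (a random function
# is hard for small circuits with high probability).
H = [random.randrange(2) for _ in range(N)]

def make_small_circuit(err_fraction=0.05):
    """Stand-in for a small circuit D trying to compute H: it agrees
    with H except on a small random set of inputs."""
    errors = set(random.sample(range(N), int(err_fraction * N)))
    return lambda y: H[y] ^ (y in errors)

def interactive_witnessing(D, rounds=2000):
    """Toy Student-Teacher loop: the student proposes a candidate error y;
    the teacher either confirms D(y) != H(y) or returns the correction
    (y, H(y)).  Here the student just samples candidates at random --
    a placeholder for the witnessing functions W_1, ..., W_l."""
    corrections = []                      # unused by this naive student
    for _ in range(rounds):
        y = random.randrange(N)           # student's candidate error
        if D(y) != H[y]:
            return y                      # an error of D is witnessed
        corrections.append((y, H[y]))     # teacher's correction
    return None

D = make_small_circuit()
found = interactive_witnessing(D)
print(found is not None and D(found) != H[found])  # prints True
```

With a 5% error rate, a few thousand random candidates witness an error with overwhelming probability; the actual KPT witnessing replaces the random student by finitely many p-time functions which exploit the teacher's corrections.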
Comparison to other witnessing theorems.
Lipton and Young [24] showed that for each Boolean function H hard for circuits of size O(n^{k+1}) there is a multiset of inputs A of size O(n^k), the so-called anticheckers, such that each n^k-size circuit fails to compute H on ≥ 1/3 of A. Therefore, for each distribution R on n^k-size circuits, some input from the set of anticheckers will witness an error of a random n^k-size circuit D (without a single query to D) with probability ≥ 1/3 over R. Using t rounds, the probability of witnessing an error can be increased to 1 − (2/3)^t. This can be done with ≤ n^{O(kt)} witnessing circuits W_{ij}. More precisely, we can let W_{i1}, ..., W_{it} be the i-th possible t-tuple of inputs from the set of anticheckers, for i < n^{O(kt)}. Theorem 1 shows that it is not possible to increase this probability further to 1 − 1/n using log n rounds unless p-size circuits can be learned efficiently.

Gutfreund, Shaltiel and Ta-Shma [12] showed that if P ≠ NP there is a p-time algorithm which, given a description of an n^k-time machine D, generates a set of ≤ 3 formulas such that D fails to solve SAT on one of them. Atserias [2] extended this by showing that if NP ⊄ BPP there is a probabilistic p-time algorithm which, given oracle access to an n^k-time machine D, outputs with probability ≥ 1/poly(n) a set of formulas such that D fails to solve SAT on one of them. These algorithms differ from the witnessing in Theorem 1 in several ways: they find errors of uniform algorithms, are allowed to generate errors of different lengths, generate errors with a significantly smaller probability than the probability required in Theorem 1, and the set of formulas generated by the algorithm of Atserias includes formulas on which the algorithm queried D.

Circuit lower bounds can be used to construct PAC learning algorithms also if we assume that they break pseudorandom generators.
The construction goes back to a relation between predictability and pseudorandomness which can be interpreted in terms of learning algorithms, as shown by Blum, Furst, Kearns and Lipton [3] and later extended by several other works. In this section we survey some of these connections, derive a construction of learning algorithms from the non-existence of succinct nonuniform pseudorandom function families, and show how these connections relate to a question of Rudich about turning demibits into superbits.

We start by recalling the construction from [3], which underlies all results in this section. For an n^c-size circuit C with n inputs, define a generator G_C: {0,1}^{mn} → {0,1}^{mn+m} which maps m n-bit strings x_1, ..., x_m to x_1, C(x_1), ..., x_m, C(x_m).

Lemma 2 (from [3]). There is a randomized p-time function L such that for every n^c-size circuit C, if an s-size circuit D satisfies

Pr[D(x) = 1] − Pr[D(G_C(x)) = 1] ≥ 1/s,

then the circuit C is learnable by L(D) over the uniform distribution with random examples, confidence 1/(2m²s), up to error 1/2 − 1/(2ms).

Proof. Given D, L(D) chooses a random i ∈ [m], random bits r_i, ..., r_m, and random n-bit strings x_1, ..., x_m except x_i, and queries the bits C(x_1), ..., C(x_{i−1}). For x_i ∈ {0,1}^n, let p_i := D(x_1, C(x_1), ..., x_{i−1}, C(x_{i−1}), x_i, r_i, ..., x_m, r_m). Then L(D) on x_i predicts the value C(x_i) by outputting ¬r_i if p_i = 1 and r_i otherwise. By the triangle inequality, a random i ∈ [m] satisfies Pr[p_i = 1] − Pr[p_{i+1} = 1] ≥ 1/(ms) with probability 1/m. Since the probability over r_i, ..., r_m, x_1, ..., x_m that L(D) predicts C(x_i) correctly is

(1/2) Pr[p_i = 1 | r_i ≠ C(x_i)] + (1/2)(1 − Pr[p_i = 1 | r_i = C(x_i)]),

and

Pr[p_i = 1] = (1/2) Pr[p_i = 1 | r_i = C(x_i)] + (1/2) Pr[p_i = 1 | r_i ≠ C(x_i)],

it follows that Pr_{x_i}[L(D)(x_i) = C(x_i)] ≥ 1/2 + 1/(2ms) with probability 1/(2m²s) over the internal randomness of L(D).

The proof of Lemma 2 implies that learning on average follows from breaking pseudorandom generators. Specifically, let R be a p-size circuit which, given r bits, outputs an n^c-size circuit C, and consider a generator G: {0,1}^{mn+r} → {0,1}^{mn+m} which applies R on its first r input bits in order to obtain a circuit C and then computes as the generator G_C on the remaining mn inputs. Breaking G implies that we can break G_C with significant probability over C drawn from the distribution induced by R. Consequently, breaking G means that we can learn a big fraction of n^c-size circuits w.r.t. R. Can we improve this average-case learning into a worst-case learning which works for all n^c-size circuits? Since efficient learning algorithms for p-size circuits yield natural properties useful against p-size circuits, which by [40] break pseudorandom generators, a positive answer would present an important dichotomy: cryptographic pseudorandom generators do not exist if and only if there are efficient learning algorithms for small circuits (with suitable parameters). This possibility has been explored by Oliveira-Santhanam [32] and Santhanam [43], cf. Section 4.3.

Question 2 (Dichotomy). Assume that for each ε < 1 there is no pseudorandom generator g: {0,1}^n → {0,1}^{n+1} computable in P/poly and safe against circuits of size 2^{n^ε} for infinitely many n. Does it follow that p-size circuits are learnable by circuits of size 2^{O(n^δ)}, for some δ < 1, with confidence 1/n, up to error 1/2 − 1/2^{O(n^δ)}?
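The predictor in the proof of Lemma 2 can be simulated directly. In the sketch below the target C (a parity function) and the distinguisher D are hypothetical stand-ins chosen so that D has a large distinguishing gap against G_C; the predictor itself follows the proof: pick a random hybrid position i, fill earlier blocks with queried labels and later blocks with random bits, and flip the guess when D fires.

```python
import random

random.seed(1)
n, m = 8, 2          # n-bit inputs, m blocks in the generator G_C

def C(x):
    """Toy target 'circuit': parity of x (a stand-in for an n^c-size circuit)."""
    return bin(x).count("1") & 1

def D(pairs):
    """Toy distinguisher: fires iff some labelled bit disagrees with C,
    so it accepts random strings far more often than outputs of G_C."""
    return 1 if any(b != C(x) for x, b in pairs) else 0

def L_of_D(x_target):
    """The predictor from Lemma 2: random hybrid position i, real labels
    (membership queries) before i, random bits from i on; predict the
    negation of r_i if D fires, r_i otherwise."""
    i = random.randrange(m)
    xs = [random.randrange(1 << n) for _ in range(m)]
    rs = [random.randrange(2) for _ in range(m)]
    pairs = [(xs[j], C(xs[j])) for j in range(i)]       # queried labels
    pairs.append((x_target, rs[i]))                     # the bit to predict
    pairs += [(xs[j], rs[j]) for j in range(i + 1, m)]  # random padding
    return 1 - rs[i] if D(pairs) == 1 else rs[i]

trials = 5000
acc = sum(L_of_D(x) == C(x)
          for x in (random.randrange(1 << n) for _ in range(trials))) / trials
print(acc)   # noticeably above 1/2 (about 7/8 for this toy D)
```

For this particular D one can compute the expected accuracy exactly (7/8), far above the Ω(1/ms) advantage guaranteed in general.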
The proof of Lemma 2 also shows that we can construct a worst-case learning algorithm assuming that, given oracle access to a pseudorandom generator, we can efficiently produce its distinguisher. In particular, a single method breaking all pseudorandom generators would suffice.
Definition 4. The circuit size problem GCSP[s, k] is the problem to decide whether, for a given list of k samples (y_i, b_i), y_i ∈ {0,1}^n, b_i ∈ {0,1}, there exists a circuit C of size s computing the partial function defined by the samples (y_i, b_i), i.e. C(y_i) = b_i for the given k samples (y_i, b_i). The parameterized minimum circuit size problem MCSP[s] stands for GCSP[s, 2^n], where the list of 2^n samples defines the whole truth-table of a Boolean function.

If we were extraordinarily good at proving circuit lower bounds, we could solve GCSP efficiently. Note that MCSP[n^{O(1)}] ∈ P/poly is a stronger assumption than the existence of a P/poly-natural property useful against P/poly, which breaks pseudorandom generators. The following theorem appeared (in different terminology) in Vadhan [45], see also [15].

Theorem 2 (Learning from succinct natural proofs). Assume GCSP[n^c, n^d] ∈ P/poly for constants d > c + 1. Then, Circuit[n^c] is learnable by P/poly over the uniform distribution with random examples, confidence 1/poly(n), up to error 1/2 − 1/poly(n).

Proof. As the number of partial Boolean functions on a given set of m inputs is 2^m and the number of n^c-size circuits is bounded by 2^{n^{c+1}}, GCSP[n^c, n^d] ∈ P/poly implies that for m = n^d there are p-size circuits D such that for each n^c-size circuit C,

Pr[D(x) = 1] − Pr[D(G_C(x)) = 1] ≥ 1/2.

Now, it suffices to apply Lemma 2.

4.2 Worst-case learning from natural proofs
In Theorem 2, we can learn f ∈ Circuit[n^c] even if the algorithm for GCSP works just for a significant fraction of partial truth-tables (y_1, b_1), ..., (y_{n^d}, b_{n^d}), with zero error on easy partial truth-tables. Carmosino, Impagliazzo, Kabanets and Kolokolova [5] proved that the assumption of Theorem 2 can be weakened to the existence of a standard natural property. The price for this is that the resulting learning uses membership queries instead of random examples. The crucial idea is similar to the proof of Theorem 1: apply the natural property (as an algorithm for a suitable GCSP) on a Nisan-Wigderson generator NW_f based on the function f which we want to learn.

Theorem 3 (Learning from natural proofs [5]). Let R be a P/poly-natural property useful against Circuit[n^d] for some d ≥ 1. Then, for each γ ∈ (0, 1), Circuit[n^k] is learnable by Circuit[2^{O(n^γ)}] over the uniform distribution with non-adaptive membership queries, confidence 1, up to error 1/n^k, where k = dγ/a and a is an absolute constant.

Oliveira and Santhanam [32] showed that the assumption of the existence of natural proofs from Theorem 3 can be further weakened to the existence of a distinguisher breaking non-uniform pseudorandom function families. Their result follows from a combination of Theorem 3 and the Min-Max Theorem. Using their strategy, but combining the Min-Max Theorem with Theorem 2, learning algorithms with random examples can be obtained from distinguishers breaking succinct non-uniform pseudorandom function families.

A two-player zero-sum game is specified by an r × c matrix M and is played as follows. MIN, the row player, chooses a probability distribution p over the rows. MAX, the column player, chooses a probability distribution q over the columns. A row i and a column j are drawn randomly from p and q, and MIN pays M_{i,j} to MAX. MIN plays to minimize the expected payment, MAX plays to maximize it. The rows and columns are called the pure strategies available to MIN and MAX, respectively, while the possible choices of p and q are called mixed strategies. The Min-Max theorem states that playing first and revealing one's mixed strategy is not a disadvantage:

min_p max_j Σ_i p(i) M_{i,j} = max_q min_i Σ_j q(j) M_{i,j}.

Note that the second player need not play a mixed strategy: once the first player's strategy is fixed, the expected payoff is optimized for the second player by playing some pure strategy. The expected payoff when both players play optimally is called the value of the game.
We denote it v(M). A mixed strategy is k-uniform if it chooses uniformly from a multiset of k pure strategies. Let M_min = min_{i,j} M_{i,j} and M_max = max_{i,j} M_{i,j}. Newman [28], Althöfer [1] and Lipton-Young [24] showed that each player has a near-optimal k-uniform strategy for k proportional to the logarithm of the number of pure strategies available to the opponent.

Theorem 4 ([28, 1, 24]). For each ε > 0 and k ≥ ln(c)/ε²,

min_{p ∈ P_k} max_j Σ_i p(i) M_{i,j} ≤ v(M) + ε(M_max − M_min),

where P_k denotes the k-uniform strategies for MIN. The symmetric result holds for MAX.

Definition 5 (Succinct non-uniform PRF). An (m, m′)-succinct non-uniform pseudorandom function family from a circuit class C safe against circuits of size s is a set S of partial truth-tables ⟨(x_1, b_1), ..., (x_m, b_m)⟩, where each x_i is an n-bit string and b_i ∈ {0,1}, such that each partial truth-table from S is computable by one of m′ circuits from C and for every circuit D of size s,

Pr_x[D(x) = 1] − Pr_{x ∈ S}[D(x) = 1] < 1/s,

where the first probability is taken over x ∈ {0,1}^{m(n+1)} chosen uniformly at random and the second probability over partial truth-tables x chosen uniformly at random from S.

Theorem 5 (Learning or succinct non-uniform PRF). Let c ≥ 1 and s > n, m ≥ 1. There is an (m, s)-succinct non-uniform PRF in Circuit[n^c] safe against Circuit[s], or there are circuits of size poly(s) learning Circuit[n^c] over the uniform distribution with random examples, confidence 1/poly(s), up to error 1/2 − 1/poly(s).

Proof. Consider a two-player zero-sum game specified by a matrix M with rows indexed by n^c-size circuits with n inputs and columns indexed by s-size circuits with m(n+1) inputs.
Define the entry M_{C,D} of M corresponding to a row circuit C and a column circuit D as

M_{C,D} := |Pr_x[D(x) = 1] − Pr_x[D(G_C(x)) = 1]|

for the generator G_C from the proof of Lemma 2. Hence M_max − M_min ≤ 1.

If v(M) ≥ 1/2s, then by Theorem 4 (with ε = 1/4s), there exists a multiset of k ≤ O(n^{c+1}s²) s-size circuits D_1, ..., D_k such that for every n^c-size circuit C, a random D from D_1, ..., D_k satisfies

E[|Pr[D(x) = 1] − Pr[D(G_C(x)) = 1]|] ≥ 1/4s.

By Lemma 2, for every n^c-size circuit C, one of the circuits D_1, ..., D_k (or their negations) can be used to learn C with confidence 1/poly(s), up to error 1/2 − 1/poly(s). A poly(s)-size circuit using a random D_i from D_1, ..., D_k or its negation thus learns Circuit[n^c] with random examples, confidence 1/poly(s), up to error 1/2 − 1/poly(s).

If v(M) < 1/2s, then by Theorem 4 (with ε = 1/4s), there exists a multiset of k ≤ O(s³ log s) n^c-size circuits C_1, ..., C_k such that for every s-size circuit D, a random C from C_1, ..., C_k satisfies

E[|Pr[D(x) = 1] − Pr[D(G_C(x)) = 1]|] ≤ 3/4s.

Since E[|Pr[D(x) = 1] − Pr[D(G_C(x)) = 1]|] ≥ |Pr[D(x) = 1] − E[Pr[D(G_C(x)) = 1]]|, a generator G: {0,1}^{mn + ⌈log k⌉} → {0,1}^{mn + m}, which takes as input a string of length mn + ⌈log k⌉ encoding (an index of) a circuit C from C_1, ..., C_k together with m n-bit strings x_1, ..., x_m and outputs x_1, C(x_1), ..., x_m, C(x_m), is safe against circuits of size s.
The range of G defines an (m, s)-succinct non-uniform PRF in Circuit[n^c] safe against Circuit[s].

Note that the existence of a generator G as in the proof of Theorem 5 follows directly from a counting argument if we do not require that G defines a PRF of small complexity: a random set of poly(s, n) strings (yielding a non-uniform pseudorandom generator mapping {0,1}^{O(log s)} to {0,1}^n) fools circuits of size s.

Rudich [42] proposed a conjecture about the existence of superbits, a version of pseudorandom generators safe against nondeterministic circuits, and showed that it rules out the existence of NP-natural properties against P/poly. He then asked whether the existence of superbits follows from the seemingly weaker assumption of the existence of so-called demibits. We note that an affirmative answer to his question would resolve Question 2 in the nondeterministic setting.

Definition 6 (Superbit). A function g: {0,1}^n → {0,1}^{n+1} computable by p-size circuits is a superbit if there is ε < 1 such that for infinitely many input lengths n, for all nondeterministic circuits C of size |C| ≤ 2^{n^ε},

Pr_{x ∈ {0,1}^{n+1}}[C(x) = 1] − Pr_{x ∈ {0,1}^n}[C(g(x)) = 1] < 1/|C|.

Definition 7 (Demibit). A function g: {0,1}^n → {0,1}^{n+1} computable by p-size circuits is a demibit if there is ε < 1 such that for infinitely many input lengths n, no nondeterministic circuit C of size |C| ≤ 2^{n^ε} satisfies

Pr_{x ∈ {0,1}^{n+1}}[C(x) = 1] ≥ 1/|C| and Pr_{x ∈ {0,1}^n}[C(g(x)) = 1] = 0.

Proposition 1 (Question 2 vs Rudich's problem). Assume the existence of demibits implies the existence of superbits.
Then, either superbits exist or for each c ≥ 1 and each ε < 1, Circuit[n^c] is learnable by Circuit[2^{O(n^ε)}] over the uniform distribution with random examples, confidence 1/2^{O(n^ε)}, up to error 1/2 − 1/2^{O(n^ε)}, where the learner is allowed to generate a nondeterministic or co-nondeterministic circuit approximating the target function.

Proof. Assume superbits do not exist and that their non-existence implies the non-existence of demibits. Consider a generator G: {0,1}^{mn + n^{c+1}} → {0,1}^{mn + m}, with m = n^{c+1} + 1, which interprets the first n^{c+1} bits of its input as a description of an n^c-size circuit C and then computes on the remaining mn inputs as the generator G_C from Lemma 2. Since G is not a demibit, for each ε < 1 there is a nondeterministic circuit D of size 2^{(mn+m−1)^ε} such that for each n^c-size circuit C,

Pr[D(x) = 1] − Pr[D(G_C(x)) = 1] ≥ 1/|D|.

By the proof of Lemma 2, this means that n^c-size circuits are learnable by circuits of size poly(|D|) with confidence 1/poly(|D|), up to error 1/2 − 1/poly(|D|), except that the learner might generate a nondeterministic (if r_i = 0) or co-nondeterministic (if r_i = 1) circuit approximating the target function.

A striking consequence of the relation between natural proofs and learning algorithms is a learning speedup of Oliveira and Santhanam [32]. Suppose P/poly is learnable by circuits of weakly subexponential size 2^n/n^{ω(1)}. The learning circuits can be used to accept truth-tables of all functions in P/poly, while their size guarantees that many hard functions are going to be rejected. This implies the existence of a P/poly-natural property useful against P/poly, which, by Theorem 3, gives us circuits of strongly subexponential size 2^{n^γ}, γ < 1, learning P/poly.

The argument of Oliveira and Santhanam can be generalized to a speedup of learners of arbitrary size s. Here, we show how to derive such a generalized version more directly, without constructing natural proofs and invoking Theorem 3. This is possible thanks to a more direct exploitation of a slightly modified NW-generator. A drawback of the approach is that we need to assume learning with random examples instead of membership queries.

Theorem 6 (Generalized speedup). Let d, k ≥ 1 and n ≤ s(n) ≤ 2^n/n. Assume Circuit[n^{dk}] is learnable by Circuit[s(n)] over the uniform distribution with random examples, confidence 1, up to error 1/2 − 5/n. Then circuits of size m^k with m = n^d inputs are learnable by circuits of size n^{dK}(s(n))² over the uniform distribution with non-adaptive membership queries, confidence 1/n³, up to error 1/2 − 1/n. Here, K is an absolute constant.

For example, if p-size circuits are learnable by circuits of size n^{O(log n)}, then p-size circuits are learnable with membership queries by circuits of size O(n^{ε log n}), for each ε > 0. The speedup is achieved w.r.t. the input length of target functions at the expense of their circuit complexity.
Proof.
Let A be a 2^b × u (b, n^d)-design with |J_i(A)| = n^d, for n^{2d} ≤ u ≤ 4n^{2d}, a constant d and a parameter b such that ns ≤ 2^b ≤ 2ns. The design is constructed in the usual way, by evaluating polynomials of degree ≤ b on n^d points of a field with n^d ≤ p ≤ 2n^d elements. In particular, there are n^{O(d)}-size circuits which, given i ∈ {0,1}^b and w ∈ {0,1}^u, output w|J_i(A). Define the NW_f-generator mapping strings w of length u to strings of length 2^n as

(NW_f(w))_{x_1,...,x_n} = f(w|J_{x_1,...,x_b}(A)).

Then for each m-input function f ∈ Circuit[m^k] and w ∈ {0,1}^u, (NW_f(w))_x is computable as a function of x ∈ {0,1}^n by a circuit of size n^{dk}.

By the assumption of the theorem, every such circuit (NW_f(w))_x is learnable by a circuit L of size s with confidence δ = 1, up to error 1/2 − ε, where ε = 5/n. Consequently, there is a circuit D_f of size O(s) such that

Pr_{w,x,y_1,...,y_t}[D_f(x_1, ..., x_n, w, y_1, ..., y_t) = f(w|J_{x_1,...,x_b}(A))] ≥ (1/2 + ε)δ    (5.1)

where D_f queries the values f(w|J_{y_j}(A)) for t ≤ s random strings y_j ∈ {0,1}^b, j = 1, ..., t. The size of D_f takes into account the need to simulate the circuit described by L. Now, random y_1, ..., y_t satisfy

Pr_{w,x}[D_f(x_1, ..., x_n, w, y_1, ..., y_t) = f(w|J_{x_1,...,x_b}(A))] ≥ 1/2 + ε − 1/n    (5.2)

with probability at least 1/n. Otherwise, the probability in (5.1) would be < 1/n + (1/2 + ε − 1/n). Similarly, given y_1, ..., y_t such that (5.2) holds, a random x ∈ {0,1}^n satisfies

Pr_w[D_f(x_1, ..., x_n, w, y_1, ..., y_t) = f(w|J_{x_1,...,x_b}(A))] ≥ 1/2 + ε − 2/n    (5.3)

with probability at least 2/n. Moreover, since every y_j specifies 2^{n−b} values of (NW_f(w))_x, given y_1, ..., y_t, a random x ∈ {0,1}^n equals some y_j on the first b bits with probability ≤ t/2^b ≤ 1/n. Applying the same averaging one more time, for y_1, ..., y_t and x which differs on the first b bits from each y_j and satisfies (5.3), randomly fixed u − n^d bits of w on the positions of [u] \ J_x(A) preserve the probability (5.3) up to an additional error 1/n with probability at least 1/n.

For each y_1, ..., y_t, each x which differs on the first b bits from every y_j, and each fixation of u − n^d bits of w on the positions of [u] \ J_x(A), the (b, n^d)-design guarantees that the number of all queries f(w|J_{y_j}(A)), j = 1, ..., t, of D_f over all possible w with the u − n^d fixed bits is ≤ t2^b. We can thus learn a circuit D′ approximating f ∈ Circuit[m^k] with m = n^d inputs with advantage 1/2 + ε − 4/n in the following way. Choose random y_1, ..., y_t, x, random u − n^d bits of w corresponding to [u] \ J_x(A), and query the ≤ t2^b values f(w|J_{y_j}(A)) for all possible w with the u − n^d fixed bits. Then the circuit D′, given the n^d bits of w corresponding to J_x(A), generates w and computes as D_f with the provided queries f(w|J_{y_j}(A)). Since w can be constructed from the given n^d bits, x and the u − n^d fixed bits of w by a circuit of size n^{O(d)}, each w|J_{y_j}(A) can be constructed from w and y_j by a circuit of size n^{O(d)}, and for each query to f the right value can be selected by a circuit of size O(n^d t2^b), the size of D′ is O(s + tn^d + n^d t2^b + n^{O(d)}) ≤ n^{O(d)}s². D′ can be described by n^{dK}s² bits, for an absolute constant K, and constructed by a circuit of the same size which just substitutes y_j, x and the u − n^d bits of w in the otherwise fixed description of D′.

Since random y_1, ..., y_t satisfy (5.2) with probability at least 1/n, a random x differs on the first b bits from each y_1, ..., y_t and satisfies (5.3) with probability at least 1/n, and the randomly fixed u − n^d bits of w have the desired property with probability at least 1/n as well, the confidence of the learning algorithm is at least 1/n³.

We give one more proof of the learning speedup which also addresses the issue of membership queries.

Theorem 7 (Alternative speedup). Let d ≥ k ≥ 1 and ε < 1. Assume Circuit[n^{dk}] is learnable by Circuit[2^{εn}] over the uniform distribution (possibly with membership queries) with confidence 1, up to error 1/n². Then, circuits of size n^{dk} with n^d inputs are learnable by circuits of size 2^{Kn} over the uniform distribution with confidence 1/2^{Kn}, up to error 1/2 − 1/2^{Kn}, where K is an absolute constant.

Proof. By a counting argument, there exists an H which is not (1 − 1/n)-approximable by circuits of size 2^{εn}. Here, n is w.l.o.g. sufficiently big. By Lemma 1, learnability of Circuit[n^{dk}] by Circuit[2^{εn}] up to error 1/n² implies the existence of circuits of size 2^{O(n)} witnessing errors of circuits of size n^{dk} with probability ≥ 1 − 1/n. The conclusion thus follows by applying Theorem 1. The improved confidence and approximation parameter is a consequence of the fact that our witnessing circuits succeed in the first round, i.e. t = 1.

Proof-search speedup.
The core trick behind Theorem 6 can be formulated in the context of proof complexity. Assume that an n^{dk}-size lower bound is provable in a proof system P by a proof of size s(n). Then, a substitutional instance of the same P-proof of size s(n) proves an m^k-size lower bound for circuits with m = n^d inputs, on inputs given by the NW-generator from the proof of Theorem 6. Here, the base function of the NW-generator is not specified but represented by free variables encoding a circuit of size m^k.

Nonlocalizable hardness magnification. Theorem 6 and the original speedup of Oliveira and Santhanam can be interpreted as hardness magnification theorems. Hardness magnification is an approach to strong complexity lower bounds by reducing them to seemingly much weaker lower bounds, developed in a series of recent papers [33, 27, 31, 25, 9, 10, 7, 6, 8, 26, 11]; see [6] for a more comprehensive survey. For example, it turns out that in order to prove that functions computable in nondeterministic quasipolynomial time are hard for NC¹ it suffices to show that a parameterized version of the minimum circuit size problem MCSP is hard for AC⁰[2]. However, [6] identified a locality barrier which explains why direct adaptations of many existing lower bounds do not yield strong complexity lower bounds via hardness magnification. Essentially, the reason is that the existing lower bounds for explicit Boolean functions often work even for models which are allowed to use arbitrary oracles with n^{o(1)}-small fan-in. This is easy to see in the case of AC⁰[2] lower bounds: oracles of small fan-in can be simulated by polynomials of low degree. On the other hand, hardness magnification theorems typically yield (unconditional) upper bounds in the form of weak computational models extended with local oracles computing specific problems such as the abovementioned version of MCSP.
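As an aside on the machinery above: the combinatorial designs underlying the NW_f-generator in the proof of Theorem 6 are easy to generate explicitly. The sketch below uses toy parameters (field size p = 11, degree bound b = 3) rather than those of the theorem: the set attached to a polynomial is its graph inside the universe F_p × F_p, and since distinct polynomials of degree < b agree on fewer than b points, pairwise intersections have size < b while each set has size p.

```python
import random
from itertools import product

p = 11     # field size; each set has p elements (toy parameters)
b = 3      # coefficients per polynomial; pairwise intersections are < b

def J(q):
    """Set attached to polynomial q (coefficients low->high): its graph
    {(a, q(a)) : a in F_p} inside the universe F_p x F_p."""
    return {(a, sum(c * pow(a, k, p) for k, c in enumerate(q)) % p)
            for a in range(p)}

designs = [J(q) for q in product(range(p), repeat=b)]  # p^b = 1331 sets

# spot-check the design property on random pairs of distinct sets
random.seed(2)
for _ in range(200):
    i, j = random.sample(range(len(designs)), 2)
    assert len(designs[i]) == p              # each set has p elements
    assert len(designs[i] & designs[j]) < b  # small pairwise intersection
print("design property holds on sampled pairs")
```

The same construction scales to the parameters of Theorem 6 by taking p between n^d and 2n^d and degree bound b, giving 2^b sets of size n^d in a universe of size at most 4n^{2d}.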
In fact, even irrespective of hardness magnification it is important to develop lower bound methods which do not localize: proving the nonexistence of subexponential-size learning algorithms for P/poly would imply the nonexistence of P/poly-natural properties against P/poly, but it is not hard to see that natural properties against P/poly are computable by p-size circuits with local oracles. Overcoming the locality barrier is thus essential for proving strong complexity lower bounds in general.

Theorem 6, if read contrapositively, is a magnification of O(n^{ε log n})-size lower bounds for learning p-size circuits to n^{O(log n)}-size lower bounds. This differs from previous hardness magnification theorems by avoiding localization: the size of the learner plays a crucial role in the reduction and therefore cannot be simply replaced by an arbitrary oracle. The same trick is behind the non-blackbox worst-case to average-case reductions within NP of Hirahara [13]. To the best of my knowledge, the only other hardness magnification theorems with this property appeared in [6] and [14]. [6, Theorem 1], like Hirahara [13], concerns a version of MCSP whose localized version does not hold (as witnessed by other hardness magnification theorems). Theorem 6 does not seem to localize in this sense either: it asks for an n^{ε log n}-size lower bound on learning algorithms, while there seems to be no reason to expect that p-size circuits are learnable by circuits of size O(n^{ε log n}) extended with oracles of fan-in n^{o(1)}. (Such a localization would mean that p-size circuits are learnable in subexponential size.) The magnification theorems of Hirahara [14] face similar complications.

Unfortunately, Theorem 6 does not reduce p-size lower bounds to, say, subquadratic lower bounds: it magnifies n^{O(d)}s-size lower bounds for learning functions with m = n^d inputs (and circuit complexity m^k) to an s-size lower bound for learning functions with n inputs (and circuit complexity n^{dk}). That is, a polynomial speedup w.r.t. the input length of target functions is traded for a polynomial decrease of the circuit size of target functions. Ideally, we would like to magnify, say, n^{1.1}-size formula lower bounds for learning circuits of size n^{1.1} with n inputs to n^{O(1)}-size formula lower bounds for learning circuits of size n^{1.1} with n inputs. If the existing methods for proving the required formula lower bounds were applicable to prove subquadratic formula lower bounds for learning algorithms (note that such lower bounds are allowed to localize and naturalize), such a strengthening of Theorem 6 would lead to explicit NC¹ lower bounds.

Footnote 1: Some known circuit lower bounds above the magnification threshold are provably nonlocalizable, but they do not fit the framework of the so-called Hardness Magnification frontier [6], one reason being that they do not work for explicit and natural problems, cf. [6, 8]. For example, a nonlocalizable lower bound from [6] works for a function in E which is artificial in the sense that it is designed to avoid localization, not for a problem of independent interest such as MCSP. Oliveira [30] showed that near superlinear-size lower bounds for a version of MCSP defined w.r.t. a notion of randomized Kolmogorov complexity imply strong circuit lower bounds, while the same problem is provably hard for probabilistic p-time. The lower bound of Oliveira works, however, only against uniform models of computation. Moreover, the magnification theorem concludes at best a 'weak' lower bound of the form quasipolynomial time QP being hard for P/poly. Similarly, an approach of Chen, Jin and Williams [8] via derandomizations and uniform obstructions appears to avoid the locality barrier but yields at best lower bounds of the form QP ⊄ P/poly.

Footnote 2: There are two more results which could potentially be classified as nonlocalizable hardness magnifications. A theorem of Buresh-Oppenheim and Santhanam [4, Theorem 1] is based on an exploitation of Nisan-Wigderson generators similar to that of [6], but it seems less practical in its current form, as it magnifies only lower bounds for nondeterministic circuits. The other result, of Tal [44], shows that average-case hardness for formulas of size s can be magnified to worst-case hardness for slightly bigger formulas. A problem is that [44] magnifies at best to an s-size lower bound. Moreover, if we wanted to strengthen it further by connecting it with another magnification theorem, it is not clear how to preserve the nonlocalizability: the weak lower bound obtained via [44] would likely localize. Hirahara [14, Theorems 11 and 13] proves two types of magnification theorems. The first type essentially adapts the result from [6] in the context of weaker computational models. The second type extends it by introducing metacomputational circuit lower bound problems (MCLPs) and showing that weak lower bounds for MCLPs can be magnified as well. MCLPs are not solvable by any algorithm whatsoever unless standard hardness assumptions break. This implies that there is no unconditional upper bound for MCLPs and the locality barrier does not apply. Unfortunately, we do not have any interesting lower bound for MCLPs either. The corresponding magnification theorems thus do not establish a Hardness Magnification frontier [6]. Nevertheless, as suggested in [14], developing such methods might be a way to strong lower bounds.

The methods for deriving learning algorithms from circuit lower bounds presented in this paper might be improvable in many ways.

Safe cryptography or efficient learning. Perhaps the most appealing question asks for bridging cryptography and learning theory. Showing that efficient learning follows from breaking pseudorandom generators, i.e. answering Question 2 positively, would establish a remarkable win-win situation. As discussed in Section 4.4, the question is closely related to a problem of Rudich about turning demibits into superbits.
Instance-specific learning vs PAC learning.
Circuit lower bounds correspond to a simple instance-specific learning model described in Section 3. Can we improve our understanding of the model and its relation to PAC learning? In particular, can we determine how much we can learn from a single circuit lower bound? A possible formalization of the problem is given by Question 1.
Connections to proof complexity.
The present paper brings several methods from proof complexity to learning theory. It seems likely that these connections can be strengthened. A particularly relevant part of proof complexity is the theory of proof complexity generators, cf. [18]. An interesting conjecture in the area due to Razborov [39] implies a conditional hardness of circuit lower bounds in strong proof systems. In other words, Razborov's conjecture asks for turning short proofs of circuit lower bounds into upper bounds breaking standard hardness assumptions. Notably, strengthening Theorem 1 by allowing white-box access in the witnessing of lower bounds would lead to a conditional unprovability of p-size lower bounds for
SAT in Cook's theory PV. A complication is that under standard hardness assumptions such a witnessing exists. That is, in order to obtain the conditional unprovability, one might need to exploit the PV-provability in a deeper way. Nevertheless, this suggests a simplified version of Question 2: can we prove a disjunction stating the PV-consistency of the existence of strong pseudorandom generators or the PV-consistency of efficient learning? Since, by witnessing theorems in PV, both the PV-provability of the non-existence of pseudorandom generators and the PV-provability of the impossibility of efficient learning imply uniform efficient algorithms witnessing these facts, it could be possible to combine them with a version of uniform MinMax [46] to get a contradiction.

Nonlocalizable hardness magnification near the existing lower bounds.
Can we push forward the program of hardness magnification by strengthening the magnification from Theorem 6 to a setting in which strong circuit lower bounds follow from lower bounds near the already existing ones? The importance of the question stems from the necessity of developing nonlocalizable magnification theorems or nonlocalizable constructive lower bound methods, as discussed in Section 5.
SAT solving circuit lower bounds.
It would be interesting to investigate practical consequences of the provability of circuit lower bounds. Circuit lower bounds for explicitly given Boolean functions are coNP statements, which means that they are encodable into propositional tautologies, resp. SAT instances. Could SAT solvers be successful in proving interesting instances of circuit lower bounds for some fixed input lengths? If so, this could provide an experimental verification of central results and conjectures from complexity theory such as P ≠ NP up to some finite domain. As discussed in the present paper, efficient algorithms proving circuit lower bounds can also be transformed into learning algorithms, which provides a separate motivation for this line of research.

In particular, SAT solving of circuit lower bounds could lead to an interesting comparison with the research on neural networks. The task of training a neural network is to design a circuit C of size s, typically with a specific architecture, coinciding with some training input samples (y_i, f(y_i)), and apply it to predict the value f(y) on a new input y. As discussed in Section 3, this problem can be addressed by proving a circuit lower bound. Since proving a circuit lower bound can give us a reliable instance-specific prediction, one could try to use SAT solvers to verify outcomes of neural networks. More generally, one could try to simulate neural networks by SAT solving circuit lower bounds. A potential advantage of SAT solvers is that they do not need to construct a circuit coinciding with training data: it is enough to prove its properties (lower bounds). On the other hand, SAT solvers need to prove a universal statement, which might turn out to be even harder.

Acknowledgements
I would like to thank Rahul Santhanam for many inspiring discussions which, in particular, motivated me to prove Theorem 1. I am indebted to Susanna de Rezende and Erfan Khaniki for many illuminating discussions during the development of the project. I would also like to thank V. Kanade for helpful comments on the existing learning models and L. Chen, V. Kabanets, J. Krajíček and I.C. Oliveira for helpful comments on the draft of the paper. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 890220.
References

[1] Althöfer I.; On sparse approximations to randomized strategies and convex combinations; Linear Algebra and its Applications, 199(1):339-355, 1994.
[2] Atserias A.; Distinguishing SAT from polynomial-size circuits, through black-box queries; CCC, 2006.
[3] Blum A., Furst M., Kearns M., Lipton R.; Cryptographic primitives based on hard learning problems; CRYPTO, 1993.
[4] Buresh-Oppenheim J., Santhanam R.; Making hard problems harder; CCC, 2006.
[5] Carmosino M., Impagliazzo R., Kabanets V., Kolokolova A.; Learning algorithms from natural proofs; CCC, 2016.
[6] Chen L., Hirahara S., Oliveira I.C., Pich J., Rajgopal N., Santhanam R.; Beyond natural proofs: hardness magnification and locality; ITCS, 2020.
[7] Chen L., Jin C., Williams R.; Hardness magnification for all sparse NP languages; FOCS, 2019.
[8] Chen L., Jin C., Williams R.; Sharp threshold results for computational complexity; STOC, 2020.
[9] Chen L., McKay D., Murray C., Williams R.; Relations and equivalences between circuit lower bounds and Karp-Lipton theorems; CCC, 2019.
[10] Chen L., Tell R.; Bootstrapping results for threshold circuits "just beyond" known lower bounds; STOC, 2019.
[11] Cheraghchi M., Hirahara S., Myrisiotis D., Yoshida Y.; One-tape Turing machine and read-once branching program lower bounds for MCSP; preprint, 2020.
[12] Gutfreund D., Shaltiel R., Ta-Shma A.; If NP languages are hard in the worst-case then it is easy to find their hard instances; CCC, 2005.
[13] Hirahara S.; Non-black-box worst-case to average-case reductions within NP; FOCS, 2018.
[14] Hirahara S.; Non-disjoint promise problems from meta-computational view of pseudorandom generator constructions; CCC, 2020.
[15] Ilango R., Loff B., Oliveira I.C.; NP-hardness of circuit minimization for multi-output functions; CCC, 2020.
[16] Krajíček J.; Dual weak pigeonhole principle, pseudo-surjective functions and provability of circuit lower bounds; Journal of Symbolic Logic, 69(1):265-286, 2004.
[17] Krajíček J.; On the proof complexity of the Nisan-Wigderson generator based on a hard NP ∩ coNP function; Journal of Mathematical Logic, 11(1):11-27, 2011.
[18] Krajíček J.; Forcing with random variables and proof complexity; Cambridge University Press, 2011.
[19] Krajíček J.; On the computational complexity of finding hard tautologies; Bulletin of the London Mathematical Society, 46(1):111-125, 2014.
[20] Krajíček J.; Proof complexity; Cambridge University Press, 2019.
[21] Krajíček J., Pudlák P., Takeuti G.; Bounded arithmetic and the polynomial hierarchy; Annals of Pure and Applied Logic, 52:143-153, 1991.
[22] Li L., Littman M., Walsh T.; Knows what it knows: a framework for self-aware learning; ICML, 2008.
[23] Linial N., Mansour Y., Nisan N.; Constant depth circuits, Fourier transform, and learnability; Journal of the Association for Computing Machinery, 40(3):607-620, 1993.
[24] Lipton R.J., Young N.E.; Simple strategies for large zero-sum games with applications to complexity theory; STOC, 1994.
[25] McKay D., Murray C., Williams R.; Weak lower bounds on resource-bounded compression imply strong separations of complexity classes; STOC, 2019.
[26] Modanese A.; Lower bounds and hardness magnification for sublinear-time shrinking cellular automata; preprint, 2020.
[27] Müller M., Pich J.; Feasibly constructive proofs of succinct weak circuit lower bounds; Annals of Pure and Applied Logic, 2019.
[28] Newman I.; Private vs. common random bits in communication complexity; Information Processing Letters, 39:67-71, 1991.
[29] Nisan N., Wigderson A.; Hardness vs. randomness; Journal of Computer and System Sciences, 49:149-167, 1994.
[30] Oliveira I.C.; Randomness and intractability in Kolmogorov complexity; ICALP, 2019.
[31] Oliveira I.C., Pich J., Santhanam R.; Hardness magnification near state-of-the-art lower bounds; CCC, 2019.
[32] Oliveira I.C., Santhanam R.; Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness; CCC, 2017.
[33] Oliveira I.C., Santhanam R.; Hardness magnification for natural problems; FOCS, 2018.
[34] Pich J.; Nisan-Wigderson generators in proof systems with forms of interpolation; Mathematical Logic Quarterly, 57(4), 2011.
[35] Pich J.; Circuit lower bounds in bounded arithmetics; Annals of Pure and Applied Logic, 166(1):29-45, 2015.
[36] Pich J.; Mathesis universalis; Literis, 2016.
[37] Pich J., Santhanam R.; Strong co-nondeterministic lower bounds for NP cannot be proved feasibly; preprint, 2020.
[38] Razborov A.A.; Unprovability of lower bounds on the circuit size in certain fragments of bounded arithmetic; Izvestiya of the Russian Academy of Science, 59:201-224, 1995.
[39] Razborov A.A.; Pseudorandom generators hard for k-DNF Resolution and Polynomial Calculus; Annals of Mathematics, 181(2):415-472, 2015.
[40] Razborov A.A., Rudich S.; Natural proofs; Journal of Computer and System Sciences, 55(1):24-35, 1997.
[41] Rivest R., Sloan R.; Learning complicated concepts reliably and usefully; AAAI, 1988.
[42] Rudich S.; Super-bits, demi-bits, and NP/qpoly-natural proofs; RANDOM, 1997.
[43] Santhanam R.; Pseudorandomness and the Minimum Circuit Size Problem; ITCS, 2020.
[44] Tal A.; Computing requires larger formulas than approximating; STOC, 2017.
[45] Vadhan S.; Learning versus refutation; COLT, 2017.
[46] Vadhan S., Zheng C.J.;