Comparing computational entropies below majority (or: When is the dense model theorem false?)
arXiv [cs.CC], November 2020

Russell Impagliazzo∗ (CSE, UCSD, [email protected])    Sam McGuire† (CSE, UCSD, [email protected])

November 13, 2020
Abstract
Computational pseudorandomness studies the extent to which a random variable Z looks like the uniform distribution according to a class of tests F. Computational entropy generalizes computational pseudorandomness by studying the extent to which a random variable looks like a high-entropy distribution. There are different formal definitions of computational entropy with different advantages for different applications. Because of this, it is of interest to understand when these definitions are equivalent.

We consider three notions of computational entropy which are known to be equivalent when the test class F is closed under taking majorities. This equivalence constitutes (essentially) the so-called dense model theorem of Green and Tao (later made explicit by Tao-Ziegler, Reingold et al., and Gowers). The dense model theorem plays a key role in Green and Tao's proof that the primes contain arbitrarily long arithmetic progressions and has since been connected to a surprisingly wide range of topics in mathematics and computer science, including cryptography, computational complexity, combinatorics and machine learning. We show that, in different situations where F is not closed under majority, this equivalence fails. This in turn provides examples where the dense model theorem is false.

1 Introduction

Computational pseudorandomness is a central topic in theoretical computer science. In this scenario, one has a class F of boolean functions f : {0,1}^n → {0,1} (which we'll refer to as tests) and a random variable Z over {0,1}^n. We say that Z is ε-pseudorandom with respect to F if

max_{f ∈ F} |E[f(Z)] − E[f(U)]| ≤ ε,

where U is the uniform distribution over {0,1}^n and ε > 0. We think of such a Z as 'behaving like the uniform distribution' according to tests in F. In general, we say that two random variables X, Y are ε-indistinguishable by F if max_{f ∈ F} |E[f(X)] − E[f(Y)]| ≤ ε (and so ε-pseudorandom distributions are exactly those which are ε-indistinguishable from U).
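For small n, the quantity in this definition can be computed exactly by brute force. A minimal sketch (the test class and the choice of Z below are our own illustrative examples, not ones from the paper):

```python
from itertools import product

n = 4

def advantage(tests, p_Z):
    """Max over f in tests of |E[f(Z)] - E[f(U)]| for Z given as {string: probability}."""
    best = 0.0
    for f in tests:
        e_Z = sum(p * f(x) for x, p in p_Z.items())
        e_U = sum(f(x) for x in product((0, 1), repeat=n)) / 2**n
        best = max(best, abs(e_Z - e_U))
    return best

# Illustrative test class (our choice): the bit projections z -> z_i.
tests = [lambda z, i=i: z[i] for i in range(n)]

# Z = uniform over strings of even Hamming weight: each individual bit is unbiased,
# so Z is 0-pseudorandom for the projections despite being far from uniform overall.
even = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 0]
p_Z = {x: 1 / len(even) for x in even}
print(advantage(tests, p_Z))  # 0.0
```

The even-weight example also illustrates why pseudorandomness is always relative to a test class: a richer class (say, parities of pairs of bits) would distinguish this Z from uniform easily.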
Constructing explicit Z's which behave like the uniform distribution according to different test classes is among the central goals of complexity theory, with sufficiently strong constructions leading to, for example, derandomization of BPP. One way in which the theory of pseudo-randomness is rich is that there are multiple equivalent formulations of pseudo-randomness, such as Yao's next bit test ([51]).

The various notions of pseudo-entropy and pseudo-density generalize pseudo-randomness to formalize how much randomness a distribution looks like it has, as far as this class of tests can perceive. Many of these notions were first introduced as stepping stones towards pseudo-randomness, giving properties of sub-routines within constructions of pseudo-random generators. However, measuring seeming randomness quantitatively is important in many other contexts, so these notions have found wider application. For example, in mathematical subjects such as combinatorics and number theory, there is a general phenomenon of "structure vs. randomness", where a deterministically defined object such as a graph or set of integers can be decomposed into a structured part and a random part. Pseudo-entropy quantifies how much randomness the "random part" has. Notions of pseudo-density were used in this context by Green, Tao, and Ziegler [18, 48] to show that the primes contain arbitrarily long arithmetic progressions.

∗ Research supported by NSF award 1909634 and a Simons Investigator award
† Research supported by NSF award 1909634 and a Simons Investigator award
We can also use pseudo-entropy notions to characterize the amount of seeming randomness that remains in a cryptographic key after it has been compromised by a side-channel attack. A data set used in a machine learning algorithm might not have much randomness in itself, and might not be completely random looking, but is hopefully representative of the much larger set of inputs that the results of the algorithm will be applied to, so we can use notions of pseudo-entropy to say when such algorithms will generalize. There are many possible definitions of this intuitive idea, and as with pseudo-randomness, the power of pseudo-entropy is that many of these notions have been related or proven equivalent.

In particular, the dense model theorem provides such a basic equivalence. Here, the intuitive concept we are trying to capture is the density (or relative min-entropy) of the target distribution within a larger distribution: what fraction of the larger distribution is within the target. We say that Z is δ-dense if E[µ(x)] = 2^{−n} Σ_x µ(x) ≥ δ, where µ : {0,1}^n → [0,1] is the density function defining Z (in the sense that Pr[Z = z] = µ(z)/(2^n E[µ(x)])). One application of indistinguishability from a dense distribution is as a stepping stone to pseudorandomness: if Z is indistinguishable from a distribution M with density δ within the uniform distribution, then applying a randomness extractor for min-entropy n − log(1/δ) to Z yields a pseudorandom distribution. A more sophisticated application comes from additive number theory. It is not hard to show that a random subset of [N] = {1, 2, ..., N} (including each element with probability 1/2, say) contains many arithmetic progressions (which are sets of the form {a, a + b, a + 2b, a + 3b, ...}). Szemerédi [45] showed that, in fact, sufficiently dense subsets of the integers also contain such arithmetic progressions: specifically, that for any k, the size of the largest subset of [N] which doesn't contain a k-term arithmetic progression grows like o(N).

So we would like some technology to reason about random variables Z which 'behave like dense distributions'. It turns out, however, that formalizing what it means for Z to 'behave like a dense distribution' is subtle. Here are three perfectly legitimate candidates:

Candidate 1: Z behaves like a δ-dense distribution if it behaves like something that's δ-dense. Formally, this means that Z is ε-indistinguishable from some δ-dense distribution. In this case, we say that Z has a δ-dense ε-model.

Candidate 2: Z behaves δ-dense if it's δ-dense inside of something that behaves like the uniform distribution. Formally, this means there's an ε-pseudorandom distribution X in which Z is δ-dense. In this case, we say that Z is δ-dense in an ε-pseudorandom set.

Candidate 3: Z behaves δ-dense if it appears to be the case that conditioning on Z increases the size of any set by at most (roughly) a 1/δ-factor. This is an operational definition: conditioning on a (truly) dense set increases the measure of a set by at most a 1/δ-factor, so we should expect the same behavior from things that behave like a dense set. Formally, this means that δ·E[f(Z)] ≤ E[f(U)] + ε for any f in our test class F. In this case, we say that Z has (ε, δ)-pseudodensity.

Precisely which definition you pick will depend on what you know about Z and in what sense you would like it to behave like a δ-dense distribution. Indeed, each of these definitions has appeared in different applications ([25], [18], [13], respectively), so there are scenarios where each of these types of behavior is desired.
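Of the three candidates, the third is the easiest to check mechanically, since it quantifies over single tests rather than over model distributions. A small brute-force sketch (the test class and parameters are illustrative choices, not from the paper):

```python
from itertools import product

n, delta = 4, 0.25

# Z: uniform over the delta * 2^n highest-weight strings -- a *truly* delta-dense
# distribution, which must therefore satisfy Candidate 3 with eps = 0.
strings = sorted(product((0, 1), repeat=n), key=sum, reverse=True)
support = strings[: int(delta * 2**n)]

def pseudodensity_violation(tests):
    """max over f of delta*E[f(Z)] - E[f(U)]; (eps, delta)-pseudodensity says this is <= eps."""
    worst = float("-inf")
    for f in tests:
        e_Z = sum(f(x) for x in support) / len(support)
        e_U = sum(f(x) for x in product((0, 1), repeat=n)) / 2**n
        worst = max(worst, delta * e_Z - e_U)
    return worst

# Illustrative test class (our choice): all conjunctions of two bit projections.
tests = [lambda z, i=i, j=j: z[i] * z[j] for i in range(n) for j in range(n)]
print(pseudodensity_violation(tests) <= 0)  # True
```

A truly δ-dense Z trivially satisfies all three candidates; the interesting distributions in this paper are pseudodense without being dense.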
In general, the first candidate is the strongest (and, arguably, the most natural), but it is sometimes hard to establish that a distribution has the property. The following claim gives some simple relationships between the definitions:

Claim 1.1. For any F, the following hold:
1. If Z has a δ-dense ε-model, then Z is δ-dense in an ε-pseudorandom set.
2. If Z is δ-dense in an ε-pseudorandom set, then Z has (ε, δ)-pseudodensity.

Proof sketch.
1. Let M be the δ-dense ε-model for Z. Since M is δ-dense, we can write U = δM + (1 − δ)M̄ for some distribution M̄. So U' = δZ + (1 − δ)M̄ is ε-pseudorandom and Z is δ-dense within it.
2. Suppose Z is δ-dense in Z', which is ε-pseudorandom for F. Then for any f ∈ F, δE[f(Z)] ≤ E[f(Z')] ≤ E[f(U)] + ε. ∎

The marvelous quality of these three candidates in particular is that, for many natural F, all of them are equivalent, and so establishing even (ε', δ)-pseudodensity is enough to guarantee the existence of a δ-dense ε-model. This equivalence holds for F which are closed under majority, meaning that for any k (which we can think of as k = O(1) for now), if f_1, ..., f_k ∈ F then MAJ_k(f_1, ..., f_k) ∈ F, where MAJ_k : {0,1}^k → {0,1} is 1 if at least half of its input bits are 1. In fact, it holds for more general F if we allow the distinguishing parameter (ε' in (ε', δ)-pseudodensity) to be exponentially small (as in the original formulation, which we'll discuss later on). In this case, the subtlety in defining what it means to behave like a dense set vanishes. These equivalences constitute (essentially) what is known as the dense model theorem, originating in the work of Green-Tao [18] and Tao-Ziegler [48], and independently in Barak et al. [8] (though in different guises). This result has been fruitfully applied in many seemingly unrelated areas of mathematics and computer science: additive number theory [18, 48], where F encodes additive information about subsets of {1, ..., N} (or possibly a more general group); graph theory [49, 38], where F encodes cuts in a fixed graph; circuit complexity [49]; Fourier analysis [29]; machine learning [29]; and leakage-resilient cryptography [14]. The ubiquity of the dense model theorem motivates a simple question: are there natural scenarios in which the dense model theorem is false?

We show that the answer to this question is yes.
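The decomposition used in part 1 of the proof sketch can be made concrete with explicit probability vectors (the particular M below is an arbitrary illustrative choice):

```python
n = 3
N = 2**n                 # domain size for {0,1}^n
delta = 0.25

# M: a delta-dense distribution in U, i.e. Pr[M = x] <= 1/(delta*N) for every x.
pM = [1/4, 1/4, 1/4, 1/4, 0, 0, 0, 0]   # uniform over 4 of the 8 points; 1/4 <= 1/(delta*N) = 1/2

# Density gives the decomposition U = delta*M + (1 - delta)*Mbar, so
# Mbar = (U - delta*M)/(1 - delta) is itself a probability distribution:
pU = [1 / N] * N
pMbar = [(u - delta * m) / (1 - delta) for u, m in zip(pU, pM)]

assert all(p >= 0 for p in pMbar) and abs(sum(pMbar) - 1) < 1e-12

# Swapping M for any Z yields U' = delta*Z + (1 - delta)*Mbar; if Z is
# eps-indistinguishable from M, then U' is eps-indistinguishable from U.
print([round(p, 4) for p in pMbar])
```

The nonnegativity of pMbar is exactly where δ-density of M is used: without the cap Pr[M = x] ≤ 1/(δN), some coordinate of U − δM would go negative.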
In particular, we show that for either implication from Claim 1.1 there is a class F and a random variable Z so that the converse fails to hold. From the computational entropy perspective, we show that the three computational entropies we've discussed are inequivalent for certain test classes F. Necessarily (with ε' not exponentially small) these classes are not closed under majority, and so we will need to look 'below' majority in order to find our counterexamples.

We turn to discuss the dense model theorem in some more detail to better contextualize our work. Restricting our attention to random variables over {0,1}^n, the dense model theorem states the following:

Theorem 1.1 (Dense model theorem). Let F be a class of tests f : {0,1}^n → {0,1} and Z a random variable over {0,1}^n with (εδ, δ)-pseudodensity with respect to MAJ_k ◦ F for k = O(log(1/δ)/ε²). Then Z has a δ-dense ε-model with respect to F.

We will generally also consider a parameter ε', which in this case is εδ: the additive error in pseudodensity. To get an intuition for what this is saying, let's consider a setting where it's false, but for trivial reasons. As a simple example given in [52], pick Z to be a (1 − ε) fraction of another set S of size δ2^n. Then Z doesn't have a δ-dense ε-model with respect to any class containing Z's indicator function, which we'll call f. On the other hand, the distribution W obtained by sampling from Z with probability δ and sampling from S's complement with probability 1 − δ is at most εδ-distinguishable from uniform by any function, since εδ is simply the measure of the difference between S and Z. In particular, Z is δ-dense in the εδ-pseudorandom W (which implies, via Claim 1.1, that it is (εδ, δ)-pseudodense). This means that Theorem 1.1 is tight in its dependence on ε' = εδ, in that it becomes false for ε' = Ω(εδ).
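The tightness example above can be simulated directly over a small ground set (sizes are illustrative; points of {0,1}^n are identified with integers):

```python
N = 256                                  # ground set, standing in for {0,1}^n with 2^n = 256
delta, eps = 1/4, 1/8
S = set(range(int(delta * N)))           # |S| = 64, a delta-dense set
Z = set(range(int((1 - eps) * len(S))))  # Z: a (1 - eps) fraction of S (56 points)

# Z's indicator distinguishes Z from every delta-dense M:
# E[1_Z(Z)] = 1 while E[1_Z(M)] <= |Z|/(delta*N) = 1 - eps.

# W: sample uniformly from Z w.p. delta, from the complement of S w.p. 1 - delta.
pW = [0.0] * N
for x in Z:
    pW[x] += delta / len(Z)
for x in range(N):
    if x not in S:
        pW[x] += (1 - delta) / (N - len(S))

# Total variation distance to uniform bounds the advantage of ANY test.
tv = sum(abs(p - 1 / N) for p in pW) / 2
print(tv <= eps * delta + 1e-12)  # True: W is (eps*delta)-pseudorandom, and Z is delta-dense in W
```

Here the total variation distance comes out to exactly εδ, matching the claim that the theorem fails once ε' = Ω(εδ).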
In many instances, we think of ε = 1/poly(n), δ constant (or perhaps with mild dependences on n) and ε' = δε.

Originally, the dense model theorem was proved with a different (and stronger) assumption; namely, that Z is dense in a pseudorandom set. Green and Tao, in proving that the primes contain arbitrarily long arithmetic progressions, used it to the following effect: if Z is the set of prime numbers up to n, then its density is known to behave like Θ(1/log n). On the other hand, Szemerédi [45] showed that sufficiently dense subsets of the integers contain arbitrarily long arithmetic progressions. The best bounds for Szemerédi's theorem require density ω(1/log log n), which is much larger than the density of the primes (see [16] and the recent [9] for more on the rich history of this and related problems). Not all is lost, however: the only property of dense sets that we're interested in is that they contain arithmetic progressions. So Green and Tao construct a class F of tests which can 'detect' arithmetic progressions and under which the primes are dense inside of an F'-pseudorandom set (more on F' later). By applying the dense model theorem, we conclude that the primes 'look like' a dense set (themselves having long arithmetic progressions) with respect to the class F. As F detects arithmetic progressions, it must be the case that the primes possess them. Of course, many details need to be filled in, but we hope this example shows the reader the 'spirit' of the dense model theorem.

A primary source of interest in the dense model theorem is in the connections it shares with seemingly unrelated branches of mathematics and computer science. The original application was in additive number theory, but it was independently discovered and proved in the context of cryptography ([8, 14]). RTTV [38] and Gowers [17] observed proofs of the dense model theorem which use linear programming duality, which is in turn related to Nisan's proof of the hardcore lemma from circuit complexity [28].
In fact, Impagliazzo [29] shows in unpublished work that optimal-density versions of the hardcore lemma due to Holenstein [26] actually imply the dense model theorem. Klivans and Servedio [32] famously observed the relationship between the hardcore lemma and boosting, a fundamental technique for aggregating weak learners in machine learning [15]. Together with the result of Impagliazzo, this connection means that dense model theorems can be proved by a particular type of boosting algorithm. A boosting argument for the existence of dense models also gives us constructive versions of the dense model theorem, which are needed for algorithmic applications. Zhang [52] (without using Impagliazzo's reduction from the dense model theorem to the hardcore lemma) used the boosting algorithm of [7] directly to prove the dense model theorem with optimal query complexity k.

In addition to its connections to complexity, machine learning, additive number theory and cryptography, the dense model theorem (and ideas which developed from the dense model theorem, chiefly the approximation theorem of [49]) have been used to understand the weak graph regularity lemma of Frieze and Kannan [29], notions of computational differential privacy [36] and even generalization in generative adversarial networks (GANs) [5]. We now turn to discussing the complexity-theoretic aspects of the dense model theorem, specifically regarding our question of whether the MAJ_k from the statement is optimal.

As alluded to earlier, Green and Tao actually worked in a setting where F' doesn't need to compute majorities but where εδ (that is, the distinguishing parameter in the pseudodensity assumption in the statement of Theorem 1.1) needs to be replaced by some ε' = exp(−poly(1/ε, 1/δ)) (with k = poly(1/δ, 1/ε) experiencing a small increase). We state this result, as proved in Tao and Ziegler [48] and stated this way in RTTV [38], for comparison.
For a test class F, let ∏_k F be the set of tests of the form ∏_{i ∈ [k]} f_i for f_i ∈ F.

Theorem 1.2 (Computationally simple dense model theorem, strong assumption). Let F be a class of tests f : {0,1}^n → [0,1] and Z a random variable over {0,1}^n which is δ-dense in a set ε'-pseudorandom for ∏_k F with k = poly(1/δ, 1/ε) and ε' = exp(−poly(1/δ, 1/ε)). Then Z has a δ-dense ε-model with respect to F.

RTTV [38] observe that this proof can be adapted so that ε' has polynomial dependence on ε, δ by restricting to the case of boolean-valued tests. Doing so, however, makes F' much more complicated (essentially requiring circuits of size exponential in k). In Theorem 1.1, we obtain the best of both worlds: ε' has polynomial dependence on ε, δ and the complexity blow-up is rather small. However, in this more picturesque circumstance, we need to be able to compute majorities. Is such a tradeoff necessary? Our results suggest that the answer is yes. Theorem 1.6 (stated in the following section) tells us that if the dense model theorem is true for F, then there is a small, constant-depth circuit with F-oracle gates approximating majority on O(1/ε) bits.

Another important aspect of the dense model theorem is how the different assumptions are related. As mentioned, the original assumption was that Z is δ-dense in an ε-pseudorandom set, but the proof can be extended to the case where Z is (ε, δ)-pseudodense. Claim 1.1 showed that the former assumption implies the latter. When the dense model theorem is true, the latter also implies the former: simply apply the dense model theorem to a Z which is (ε, δ)-pseudodense to obtain a δ-dense ε-model; then, by the first part of Claim 1.1, we're done.

First, we give examples of situations where these two notions are distinct. For example, we show in Theorem 1.4 and Theorem 1.5 that they are inequivalent when F is the class of constant-depth, polynomial-size circuits or when F is a class of low-degree polynomials over a finite field.
Note that a separation between pseudodensity and being dense in a pseudorandom set also implies a separation between pseudodensity and having a dense model, as being dense in a pseudorandom set is a necessary condition for having a dense model.

Second, we show that the dense model theorem is false even when we make the stronger assumption that the starting distribution Z is dense in a pseudorandom set. Specifically, in Theorem 1.3 we show that some distributions Z are dense in a pseudorandom set but fail to have a dense model when F consists of constant-depth, polynomial-size circuits.

Having contextualized our work some, we now turn to describe our contributions in more detail. We separate the previously described notions of computational entropy, giving examples where the dense model theorem is false. We are able to prove different separations when F is the class of constant-depth unbounded fan-in circuits, the class of low-degree polynomials over a finite field, and, in one case, any test class F which cannot efficiently approximate majority (in some sense made explicit later on). The only separation known prior to this work was between pseudodensity and having a dense model for bounded-width read-once branching programs, due to Barak et al. [8].

Let C(S, d) denote the class of unbounded fan-in, size-S, depth-d circuits. We are generally thinking of S = poly(n) and d = O(1), which corresponds to the complexity class AC⁰. Theorem 1.3 shows that Z being δ-dense in an ε-pseudorandom set need not imply that Z has a δ-dense ε-model when the test class is C(S, d):

Theorem 1.3.
Let ε, ε' > 0 be arbitrary, δ ≥ ε'/ε, and

S ≤ exp( O( (√ε'/ε) · √(log(1/δ)/log(1/ε')) )^{1/(d−1)} ).

Then for F = C(S, d), there is a random variable D over {0,1}^n with n = O(log(1/δ)/ε²) so that D is δ-dense in an ε'-pseudorandom set but does not have a δ-dense ε-model. In particular, the dense model theorem is false in this setting.

Recall that the dense model theorem is false for trivial reasons when ε' = Ω(εδ), which makes a restriction like δ ≥ ε'/ε natural. Consider, for instance, ε = 1/poly(n), δ = O(1) and ε' = δε = Θ(ε), in which case this gives us (essentially) a lower bound of weakly exponential in 1/√ε ≈ 1/√ε'.

Let N_α denote the product distribution of n Bernoulli random variables with success probability 1/2 − α. Recall that density in a pseudorandom set readily implies pseudodensity, and one can use the dense model theorem to show the converse. We show that (ε, δ)-pseudodensity need not imply δ-density in an ε-pseudorandom set when the test class is C(S, d):

Theorem 1.4.
Fix ε, ε', δ > 0 and d ∈ N, and suppose

S ≤ exp( O( (√δ/√ε) · log(1/δ)/log(1/ε') )^{1/(d−1)} ).

Then N_{√(ε/δ)} over {0,1}^n with n = O(1/ε) is (ε', δ)-pseudodense, and yet N_{√(ε/δ)} is not δ-dense inside of any ε-pseudorandom set.

The dependence on ε' is only through log(1/ε'), which means that we can take ε' exponentially smaller than ε and still obtain a separation. This case corresponds to F being 'very' fooled by N_α while N_α is still not δ-dense in even a 'mildly' pseudorandom set. This result draws on a recent line of work in the pseudorandomness literature — often referred to as 'the coin problem' and studied in, e.g., [42, 12, 1, 46] — which concerns the ability of a test class F which is unable to compute majority to distinguish N_α from U. We will discuss this connection in more detail during the proof overviews.

We prove a similar separation for degree-d F_p-polynomials (on n variables), which generalizes (and uses techniques from) a recent result of Srinivasan [44] handling the case where δ = 1. In this case, we think of a distribution Z as being (ε', δ)-pseudodense for degree-d F_p-polynomials when δ·Pr[P(Z) = 0] ≤ Pr[P(U) = 0] + ε' for any degree-d polynomial P ∈ F_p[X_1, ..., X_n] (noting that we are only evaluating P over {0,1}^n).

Theorem 1.5.
Fix a finite field F with characteristic p = O(1), fix ε, ε' > 0, and let c > δ > 0, where c ≈ 1/2 is an absolute constant. Suppose that d ≤ O(√(δ/ε)). Then when F is the class of n-variate degree-d polynomials over F with n = 1/ε, and α = O(√(ε/δ)), N_α is (ε', δ)-pseudodense but is not δ-dense inside of an ε-pseudorandom set.

This implies lower bounds for constant-depth circuits with MOD_p gates by the classical lower bounds of Razborov [37] and Smolensky [43]. Perhaps more interestingly, this holds even over non-prime fields. Also notably, there is no dependence on ε' ≤ εδ, so we can take it to be arbitrarily small.

We also prove a more general separation between pseudodensity and density in a pseudorandom set. This result, drawing from the work of [42], provides a more specific characterization of the sense in which dense model theorems are 'required' to compute majority.

Theorem 1.6. Let ε, δ > 0. Suppose F is a test class of boolean functions f : {0,1}^n → {0,1} with the following property: there is no AC⁰ F-oracle circuit of size poly(n · √(δ/ε)) computing majority on O(√(δ/ε)) bits. Then N_{√(ε/δ)} is (εδ, δ)-pseudodense and yet does not have a δ-dense ε-model. In particular, when the hypotheses are met, the dense model theorem is false.

Informally, this says that any F which can refute the pseudodensity of N_α is only 'a constant-depth circuit away' from computing majority.

Computational entropy
Computational entropy was studied systematically in [8] and is relevant to various problems in complexity and cryptography such as leakage-resilience [14], constructions of PRGs from one-way functions [25, 21, 20], and derandomization [13].

There are a number of definitions of computational entropy which we don't consider in this work. For example, Yao pseudoentropy [51] (see also [8]) corresponds to random variables which are 'compressible' by a class of tests F, in the sense that F can encode and decode the random variable using a small number of bits. Yao pseudoentropy was recently used in time-efficient hardness-to-randomness tradeoffs [13], where (randomness-efficient) samplers for pseudodense distributions were used with an appropriate extractor to construct a pseudorandom distribution. Another example is the inaccessible entropy of Haitner et al. [21], corresponding to the entropy of a message at some round of a two-player protocol conditioned on the prior messages and the randomness of the players, which is used in efficient constructions of statistically hiding commitment schemes from one-way functions [20].

Separating notions of computational entropy has been studied before in [8], who prove a separation of pseudodensity and having a dense model for bounded-width read-once branching programs. Separating notions of conditional computational entropy was studied in [27], showing separations between conditional variants of Yao pseudoentropy and having a dense model.

As mentioned in [27], citing [49] and personal communication with Impagliazzo, another question of interest is whether Yao pseudoentropy (corresponding to efficient encoding/decoding algorithms) implies having a dense model. It is not hard to see that small Yao pseudoentropy implies small pseudodensity, with some mild restrictions on F. It would be interesting to see if the techniques from this paper can be used to understand Yao pseudoentropy in more detail. We leave this to future work.
Complexity of dense model theorems and hardness amplification
Prior work on the complexity of dense model theorems has included a tight lower bound on the query complexity [52] and a lower bound on the advice complexity [50]. As far as we are aware, this is the first work to consider the computational complexity of dense model theorems.

There has also been prior work on the computational complexity of hardness amplification, establishing that various known strategies for hardness amplification require the computation of majority [34, 42, 19, 41]. It is known that a particular type of hardness amplification given by the hardcore lemma implies the dense model theorem [29].

Our results are stronger in the following sense: previous work [34, 42, 19] shows that black-box hardness amplification proofs require majority. This means that if you amplify the hardness of f in some black-box way, then this can be used to compute majority. In our case, we simply show (in different settings) that the dense model theorem is false, regardless of how we try to prove it. By the connection between the hardcore lemma and the dense model theorem, our results also provide scenarios where the hardcore lemma is false. As far as we are aware, these are the first such scenarios recorded in the literature.

We discuss two general themes that appear consistently in the proofs and then discuss each of the main theorems in some more detail.
A commonly-used observation in theoretical computer science is that most bit positions of a δ-dense random variable over {0,1}^n have bias O(√(log(1/δ)/n)) (see, for example, the introduction of [35]). Relevant to our purposes, it provides a necessary condition for having a δ-dense ε-model with respect to any class F containing the projections z ↦ z_i: if Z has a δ-dense ε-model, then most bits of Z have bias ε + O(√(log(1/δ)/n)). In particular, if all of the bits of Z have large bias, then it can't have a dense model.

This is used directly in the proof of Theorem 1.3. In this case, we construct a distribution Z which is δ-dense in a set which is ε-pseudorandom for AC⁰ but where each bit is noticeably biased away from 1/2.
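This observation is easy to verify numerically: even the most biased δ-dense distribution, uniform over a Hamming ball, has average bias within the O(√(log(1/δ)/n)) bound. A small sketch (parameters and the constant in the bound are our own illustrative choices):

```python
import math
from itertools import product

n, delta = 12, 1/16

# A worst-case delta-dense distribution: uniform over the delta*2^n lowest-weight strings.
strings = sorted(product((0, 1), repeat=n), key=sum)
dense = strings[: int(delta * 2**n)]

# Per-bit bias |Pr[Z_i = 1] - 1/2| and its average over positions.
biases = [abs(sum(z[i] for z in dense) / len(dense) - 1/2) for i in range(n)]
avg_bias = sum(biases) / n

bound = math.sqrt(math.log(1 / delta) / n)  # O(sqrt(log(1/delta)/n)), constant 1 for illustration
print(avg_bias <= bound)  # True
```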
We also extend the observation of [33] to show lower bounds on the F p -degree for any function f which refutes the pseudodensity of N α .In Lemma 4.1, we show that N α exhibits ( ε, δ )-pseudodensity for ε = ( p · O (log S ) d − ) k and δ = e − αk/p . This can be seen as a generalization of Tal’s result, building on [12, 1, 42] that N α is 3 α · O (log S ) d − -pseudorandom for C ( S, d ).Tal uses a Fourier analytic proof which becomes very simple given tail bounds on the Fourier spectrumof AC (the latter being the main contribution of [46]). More generally, any F enjoying sufficiently strongtail bounds on the Fourier spectrum (in the ℓ norm) cannot distinguish between N α and uniform. Itturns out, as proved by Tal and recorded in Agarwal [2], that if F is closed under restrictions thaneven bounding the first level of the Fourier spectrum works. The proof of Lemma 4.1 based specificallyon the switching lemma for constant-depth circuits. While switching lemmas can be used to showFourier concentration, it would be intersting to find a proof which only uses the assumption of Fourierconcentration (or some Fourier-analytic assumption). Our goal is to construct a random variable D which is dense inside of an AC -pseudorandom set butwhere each bit is biased away from 0. In this case, D would be distinguishable from any dense set, sincethe average bit of a dense set is roughly unbiased. Doing so requires two steps.The first step is constructing an appropriate distribution Z that fools AC circuits. For this we adopta general strategy of Ajtai and Wigderson [3] (and applied in many contexts in pseudorandomness since;see, e.g., [40]): to fool a circuit C , we start by producing a random restriction to simpify C to a shortdecision tree (via the switching lemma), and then we fool the decision tree on the remaining bits using a k -wise independent distribution S . 
If we wanted Z to have small support size, we would need some way of producing random restrictions with a small amount of randomness (which is precisely the approach of Ajtai-Wigderson and later work). Fortunately, we only care about the existence of Z and are therefore content to use the 'non-derandomized' switching lemma.

The second step is finding a dense subset D of S with biased bits. We do this by constructing S so that each bit has bias roughly √(log(1/δ)/K), where k ≪ K ≪ n is a parameter. This is achieved by randomly bucketing the indices into K buckets and assigning each bucket a random bit, which reduces the dimension of the problem from n to K. This means we can pick a δ-dense event in {0,1}^K with extremal bias — met (up to constants) by the function accepting all strings with weight less than K/2 − √(K log(1/δ)) — in order to find a dense subset of S with large bias. The bucketing construction introduces some error when a small set I ⊆ [n] contains two distinct indices that land in the same bucket.

Theorem 1.4

We will show that N_α has (ε', δ)-pseudodensity for AC⁰ with δ = ε' = O(1) and α = 1/polylog(n). The idea is that N_α can be sampled by first sampling a random restriction which leaves a p fraction of the bits unset (and is unbiased on the restricted bits) and then setting the remaining bits with bias α/p. Applying the switching lemma, we conclude that E[f(N_α)] ≈ E[f'(N_{α/p})], where f' is a short decision tree (which doesn't depend on all of its inputs). A simple calculation reveals that the acceptance probability of f' can increase by at most a factor (1 + α/p)^d ≤ e^{αd/p} when passing from the uniform distribution to N_{α/p}. By incorporating the error from the switching lemma (i.e. the advantage lost by conditioning on the switching lemma succeeding), we get (ε', δ)-pseudodensity.

To prove the separation, we use the fact that the Hamming weight of a random variable fooling C(S, d) is concentrated around its expectation.
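The extremal δ-dense event used in the second step of the Theorem 1.3 overview can be checked numerically: a Hamming ball of radius roughly K/2 − √(K log(1/δ)) has measure close to δ while every bit has bias on the order of √(log(1/δ)/K). A sketch (parameters and constants are illustrative):

```python
import math
from math import comb

K, delta = 40, 0.05

# Largest t with Pr[wt(z) < t] <= delta under uniform z in {0,1}^K: the Hamming
# ball {wt(z) < t} is a (roughly) delta-dense event with near-extremal bias.
cum, t = 0, 0
while (cum + comb(K, t)) / 2**K <= delta:
    cum += comb(K, t)
    t += 1
measure = cum / 2**K                      # close to delta from below

# Bias of each bit conditioned on the event (identical for all bits by symmetry):
# Pr[z_i = 1 | wt(z) < t] = sum_{w<t} C(K-1, w-1) / sum_{w<t} C(K, w).
ones = sum(comb(K - 1, w - 1) for w in range(1, t))
bias = abs(ones / cum - 1/2)

print(measure <= delta, bias <= 2 * math.sqrt(math.log(1 / delta) / K))  # True True
```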
This means in particular that if N_α were δ-dense in a pseudorandom distribution, then the tails of N_α couldn't be too heavy, and therefore α couldn't be too large.

Theorem 1.5 and Theorem 1.6 draw from related work of Srinivasan [44] and Shaltiel-Viola [42] respectively. With ε′, δ, γ > 0 and F an arbitrary class of tests f : {0,1}^n → {±1}, suppose that f ∈ F witnesses that N_β fails to have (ε′, δ)-pseudodensity, in the sense that E[f(U)] ≤ δ E[f(N_β)] − γ. [44] and [42] both make use of the following simple observation. Given two strings u, v ∈ {0,1}^m with wt(u) = (1/2 − ε)m and wt(v) = m/2, a uniformly random index i ∈ [m] has u_i distributed as a (1/2 − ε)-biased coin and v_i as an unbiased coin. In our case, applying f to sufficiently many random samples from u or v 'distinguishes' the two of them, but in a weaker sense.

In the case of Theorem 1.6, we can amplify acceptance probabilities by increasing the size of the circuit by a factor 1/εδ, after which we can apply [42], which says that constant-error distinguishers between N_α and U can be used to compute majority.

For Theorem 1.5, we apply a beautiful recent result of Srinivasan [44] showing that any m-variate polynomial (over a finite field) which vanishes on most points of the slice of weight (1/2 − α)m and doesn't vanish on most points of the slice of weight m/2 must have degree Ω(αm). One way of interpreting this result is that low-degree polynomials can't approximately solve certain 'promise' versions of majority. In this latter case, we need to open up the error reduction procedure we use for Theorem 1.6 and show how to approximate it using low-degree polynomials. This will ultimately be achieved by approximating OR with a probabilistic polynomial, as in [37, 43].

2 Preliminaries

We write [n] = {1, ..., n} and use boldface to denote random variables. Let C(S, d) be the set of size-S, depth-d unbounded fan-in circuits. For a boolean function f : {0,1}^n → {0,1}, let DT(f) denote the depth of the shortest decision tree computing f.

As before, let N_α denote the random variable corresponding to the product of n independent coins with bias (1/2 − α). That is, Pr[N_α = z] = (1/2 − α)^{wt(z)} (1/2 + α)^{n − wt(z)}, where wt(z) denotes the Hamming weight of z. For a random variable Z over {0,1}^n and i ∈ [n], let bias_i(Z) = |Pr[Z_i = 1] − Pr[Z_i = 0]|/2. Let B = {z ↦ z_i : i ∈ [n]} be the set of monotone projections. A random variable Z = (Z_1, ..., Z_n) is ε-pseudorandom with respect to B precisely when each marginal Z_i has the property that bias_i(Z) = |Pr[Z_i = 1] − 1/2| ≤ ε. In particular,

Claim 2.1.
For any ε > 0, N_ε is ε-pseudorandom with respect to B.

2.2 Information theory

The (Shannon) entropy of a random variable Z is defined as

H(Z) = −Σ_{x∈{0,1}^n} p_Z(x) log p_Z(x),

where p_Z is the probability density function corresponding to Z. The Shannon entropy of a random vector is sub-additive, in that H(Z) ≤ Σ_{i∈[n]} H(Z_i). When Z ∈ {0,1} and Pr[Z = 1] = p, we use h(p) = H(Z) = −(p log p + (1 − p) log(1 − p)) to denote the binary entropy function.

The min-entropy is defined as H_∞(Z) = −log max_{x∈{0,1}^n} p_Z(x). If Z is δ-dense inside of U, then its min-entropy is at least n − log(1/δ), and for any random variable Z, H_∞(Z) ≤ H(Z). By this latter inequality and subadditivity, the average entropy of Z's bits is at least 1 − log(1/δ)/n. Appealing to a quadratic approximation of binary entropy, we learn that the average bias must be at most √(log(1/δ)/n). This result has been referred to as Chang's inequality and the Level-1 inequality, having been observed in different forms and with different proofs in, for example, [47, 11, 22, 31]. Because it is so simple, we provide a proof here:
Claim 2.2. If Z is δ-dense in U, then E_i[bias_i(Z)] ≤ √(log(1/δ)/n).

Proof. As δ-density in U implies min-entropy at least n − log(1/δ),

n − log(1/δ) ≤ H_∞(Z) ≤ H(Z) ≤ Σ_{i∈[n]} H(Z_i),

by subadditivity of entropy. The entropy of Z's bits, therefore, is at least 1 − log(1/δ)/n on average. Taking the Taylor series, we can approximate the binary entropy function h around 1/2 by h(1/2 − ε) ≤ 1 − (2/ln 2)ε². Writing ε = E_i[bias_i(Z)] and comparing this bound with the average (using convexity of ε ↦ ε²), we get

1 − log(1/δ)/n ≤ 1 − (2/ln 2)ε²,

meaning ε ≤ √((ln 2/2) · log(1/δ)/n) ≤ √(log(1/δ)/n). □

It follows directly from Claim 2.2 that if bias_i(Z) exceeds ε + √(log(1/δ)/n) for every i, then Z does not have a δ-dense ε-model with respect to the projections B.

Lemma 2.1.
Let Z be a random variable with bias_i(Z) ≥ γ for every i ∈ [n]. Then for any δ > 0 and γ ≥ ε + √(log(1/δ)/n), Z does not have a δ-dense ε-model with respect to B.

This is used for the separation in Theorem 1.3. We would also like a necessary condition for being dense in a pseudorandom set. Towards this end, we note that pseudorandom distributions for even very simple test classes have mild concentration properties.
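Before moving on, the entropy argument above can be sanity-checked numerically. The sketch below is our own illustration (the Hamming-ball test distributions and all parameter values are arbitrary choices, not from the paper): it verifies the quadratic upper bound h(1/2 − ε) ≤ 1 − (2/ln 2)ε² on a grid, and checks the conclusion of Claim 2.2 for uniform distributions over Hamming balls, which are δ-dense for the appropriate δ.

```python
from math import comb, log, log2, sqrt

def entropy(p):
    # binary entropy h(p), in bits
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

# Quadratic approximation used in Claim 2.2: h(1/2 - e) <= 1 - (2/ln 2) e^2.
for i in range(1, 50):
    e = i / 100.0
    assert entropy(0.5 - e) <= 1 - (2 / log(2)) * e**2 + 1e-12

# Z uniform over the Hamming ball {z : wt(z) <= w} is delta-dense in U for
# delta = |ball| / 2^n; by symmetry every bit of Z has the same bias.
n = 16
for w in range(1, n // 2 + 1):
    size = sum(comb(n, k) for k in range(w + 1))
    delta = size / 2**n
    mean_wt = sum(k * comb(n, k) for k in range(w + 1)) / size
    avg_bias = abs(mean_wt / n - 0.5)
    assert avg_bias <= sqrt(log2(1 / delta) / n) + 1e-12  # Claim 2.2
```

The ball with small radius is the extremal-bias example alluded to in the overview: it is the δ-dense event whose bits are as biased as Claim 2.2 permits, up to constants.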
Claim 2.3.
Suppose F can compute x_i ⊕ x_j for every i, j ∈ [n], and let Z over {0,1}^n be ε-pseudorandom for F. Then

Pr[Σ_i Z_i ≤ n/2 − αn] ≤ 1/(4α²n) + ε/(4α²).

Proof.
We work over {±1} instead of {0,1} to make calculations easier. We can compute the second moment as

E[(Σ_i Z_i)²] = Σ_i E[Z_i²] + Σ_{i≠j} E[Z_i Z_j] ≤ n + εn².

Applying Markov's inequality to (Σ_i Z_i)², we see that

Pr[|Σ_i Z_i| ≥ 2αn] = Pr[(Σ_i Z_i)² ≥ (2αn)²] ≤ E[(Σ_i Z_i)²]/(2αn)².

We use 2αn because it maps back to n/2 − αn in {0,1}. The conclusion then follows from our second moment calculation and converting back to {0,1}. □

The tails of a dense subset can't be too much larger than the original distribution, by definition of density. This gives us a test for being dense in a pseudorandom set, which we specialize to N_α.

Lemma 2.2.
Let ε, δ > 0 be arbitrary. Suppose F can compute x_i ⊕ x_j for any i, j ∈ [n] and α ≥ √((1/(2δ)) · (1/n + ε)). Then N_α is not δ-dense in any set which is ε-pseudorandom for F.

Proof. Under N_α, the volume of the threshold event [Σ_i Z_i ≤ n/2 − αn] is at least 1/2. Taking Claim 2.3 in the contrapositive, we reach the desired conclusion when

δ/2 > 1/(4α²n) + ε/(4α²),

i.e., when α² > (1/(2δ)) · (1/n + ε). □

A restriction over [n] is a function ρ : [n] → {0,1,∗}. Indices in ρ⁻¹(∗) can be thought of as unset and every other index as set. For another restriction z with ρ⁻¹(∗) ⊆ z⁻¹({0,1}), let ρ∘z ∈ {0,1}^n be defined by

(ρ∘z)_i = z_i if i ∈ ρ⁻¹(∗), and ρ_i otherwise.

Define the restricted function f|ρ : {0,1}^{ρ⁻¹(∗)} → {0,1} over ρ's unset indices by f|ρ(z) = f(ρ∘z). Let R_p be the distribution on restrictions over [n] obtained by setting ρ(i) = ∗ independently with probability p, and then assigning each index not set to ∗ a uniformly random bit. The switching lemma we use is due to Rossman [39], building on a long line of work [3, 23, 24, 30]:

Theorem 2.1 (Rossman [39]). Suppose f ∈ C(S, d). Then

Pr_{ρ∼R_p}[DT(f|ρ) ≥ k] ≤ (p · O(log S)^{d−1})^k.

For a random restriction ρ ∼ R_p over [n] and a random variable Z over {0,1}^n, the definition of a restricted function implies that E[f(ρ∘Z)] = E[f|ρ(Z)]. We make crucial use of two simple corollaries of the switching lemma, which allow us to reason about distinguishability for AC0 circuits in terms of distinguishability for short decision trees.

Lemma 2.3.
Suppose f ∈ C(S, d). Then there is a distribution h_ρ over depth-k decision trees so that

|E[f(ρ∘Z)] − E[h_ρ(Z)]| ≤ (p · O(log S)^{d−1})^k.

Proof.
Let g_ρ denote the optimal decision tree for f|ρ. Let E denote the event that g_ρ has depth at most k, and write Pr[E] = 1 − q. Let h_ρ be the distribution over depth-at-most-k decision trees obtained by sampling g_ρ conditioned on E. Then

E[f(ρ∘Z)] = E[f|ρ(Z)]
= (1 − q)E[g_ρ(Z) | E] + qE[g_ρ(Z) | ¬E]
= (1 − q)E[h_ρ(Z)] + qE[g_ρ(Z) | ¬E]
= E[h_ρ(Z)] − q(E[h_ρ(Z)] − E[g_ρ(Z) | ¬E]).

The last term is at most q in absolute value because f is boolean. By Theorem 2.1, q ≤ (p · O(log S)^{d−1})^k. □

Lemma 2.4.
Suppose f ∈ C(S, d). Then there's a depth-k decision tree h so that

|E[f(U)] − E[f(ρ∘Z)]| ≤ |E[h(U)] − E[h(Z)]| + 2(p · O(log S)^{d−1})^k.

Proof.
Lemma 2.3 gives us the following upper bound:

|E[f(U)] − E[f(ρ∘Z)]| ≤ |(E[h_ρ(U)] ± q) − (E[h_ρ(Z)] ± q)|  (Lemma 2.3)
≤ |E[h_ρ(U)] − E[h_ρ(Z)]| + 2q.  (triangle inequality)

We can continue to upper bound the right-hand term by

|E[h_ρ(U)] − E[h_ρ(Z)]| = |E_ρ[E[h_ρ(U)] − E[h_ρ(Z)]]|
≤ E_ρ[|E[h_ρ(U)] − E[h_ρ(Z)]|]  (triangle inequality)
≤ |E[h(U)] − E[h(Z)]|,

where the last line holds for some fixed h in the support of h_ρ by averaging. □

3 Proof of Theorem 1.3

We start by reducing the problem of constructing a pseudorandom Z for AC0 to constructing a pseudorandom Z for small-depth decision trees. This can be immediately achieved by applying Lemma 2.4.

Claim 3.1.
Let p ∈ [0,1] be arbitrary and suppose Z is a random variable over {0,1}^n which is ε-pseudorandom for depth-k decision trees. Then for ρ ∼ R_p, ρ∘Z is ε′-pseudorandom for C(S, d) for

ε′ = ε + 2(p · O(log S)^{d−1})^k.

The next lemma constructs a pseudorandom distribution for depth-k decision trees with each bit having significant bias.

Lemma 3.1.
For any k ∈ N, δ > 0 and K ≤ 1/(2δ), there is a k-wise independent random variable S over {0,1}^n and a random variable D with the following properties:

1. D is δ-dense in S.
2. For all i ∈ [n], bias_i(D) = Ω(√(log(1/δ)/K)).
3. S is k²/K-pseudorandom for depth-k decision trees.

We will use the following standard lower bound on the lower tail of a binomial distribution:
Claim 3.2 ([6]). Let Z_1, ..., Z_K be independent unbiased ({0,1}-valued) coins. Then any 0 < γ < 1/2 with 1/2 − γ = r/K for some positive integer r satisfies

2^{−K(1 − h(1/2 − γ))}/√(2K) ≤ Pr[Σ_{i∈[K]} Z_i ≤ K/2 − Kγ].

Proof of Lemma 3.1.
We sample S in two stages. First, randomly partition [n] into K parts A_1, ..., A_K, for K > k. Second, assign to each part A_j a uniformly random bit b_j, and set S_i = b_j for every i ∈ A_j.

Let D be S conditioned on b = (b_1, ..., b_K) having weight less than K/2 − √(K log(1/δ)/8). Since the b_j's are unbiased random bits, we can apply Claim 3.2 to lower bound D's density: for any γ,

Pr[Σ_{j∈[K]} b_j ≤ K/2 − γK] ≥ 2^{−K(1 − h(1/2 − γ))}/√(2K).

This is at least δ when

2^{−K(1 − h(1/2 − γ))}/√(2K) ≥ δ ⟺ 1 − h(1/2 − γ) ≤ log(1/δ)/K − log(2K)/(2K).

Since h(1/2 − γ) ≥ 1 − 4γ², the set of strings with weight at most K/2 − γK is δ-dense whenever 4γ² ≤ log(1/δ)/K − log(2K)/(2K). The error term log(2K)/(2K) is at most log(1/δ)/(2K) when 2K ≤ 1/δ, in which case we may take γ = √(log(1/δ)/(8K)), matching the threshold defining D. In particular, this lower bounds the bias of D's bits by Ω(√(log(1/δ)/K)).

To see why S is k²/K-pseudorandom for depth-k decision trees, consider a depth-k decision tree T. Over U, we can imagine evaluating T 'on-line' as follows: whenever T queries the i-th bit, determine the value of z_i by flipping an unbiased coin. Over S, we can imagine evaluating T similarly, where we determine the part A_j that i lives in and the value b_j of that part. Conditioning S on not placing two distinct queried indices in the same part — call this conditioned random variable S′ — T doesn't have any distinguishing advantage over S′, as all of the bits it queries are independent and uniform. By a union bound, S places two distinct queried indices in the same part with probability at most k²/K, so T's distinguishing advantage is at most k²/K. □

In principle, we could have used other pseudorandom distributions for decision trees, such as the ε-almost k-wise independent distributions from [4]. The construction here is used to obtain better dependence on the parameters of interest.

We will also need a claim expressing the bias of the bits of ρ∘Z. The proof can be found in the appendix.

Claim 3.3.
Fix p ∈ [0,1] and a random variable Z. Let E be an event which is independent from ρ (in that the conditional distribution of ρ given E is identical to the unconditioned distribution). Then

Pr[(ρ∘Z)_i = 1 | E] = p Pr[Z_i = 1 | E] + (1 − p)/2.

Theorem 1.3.
Let ε, ε′ > 0 be arbitrary, let δ ≤ ε′/(8 log²(2/ε′)), and let

S ≤ exp[O(√(ε′)/ε · √(log(1/δ))/log(1/ε′))^{1/(d−1)}].

Then for F = C(S, d), there is a random variable D over {0,1}^n with n = O(log(1/δ)/ε²) so that D is δ-dense in an ε′-pseudorandom set but does not have a δ-dense ε-model. In particular, the dense model theorem is false in this setting.

Proof. Let n = log(1/δ)/ε², k = log(2/ε′) and K = 2k²/ε′. We also need K ≤ 1/(2δ) by the restriction in Lemma 3.1, which explains the restriction 4δk² ≤ ε′, which we simplify to the stronger restriction δ ≤ ε′/(8 log²(2/ε′)). Let S and D be the random variables from Lemma 3.1. By Claim 3.3, each bit of ρ∘D (where ρ ∼ R_p) has bias p · Ω(√(log(1/δ)/K)). By Claim 3.1 and Lemma 3.1, ρ∘S is ε′-pseudorandom for

ε′ = k²/K + 2(p · O(log S)^{d−1})^k,

and ρ∘D is δ-dense in ρ∘S. By Lemma 2.1, ρ∘D does not have a δ-dense ε-model as long as p · √(log(1/δ)/K) ≥ ε + √(log(1/δ)/n) = 2ε; by substituting, it suffices to take p = 2√(K/n) = O(kε/√(ε′ log(1/δ))). Recalling that k²/K = ε′/2, the pseudorandomness requirement becomes

ε′/2 ≥ (2√(K/n) · O(log S)^{d−1})^k
⟺ √(n/K) · (ε′/2)^{1/k} ≥ O(log S)^{d−1}
⟺ (√(log(1/δ))/ε) · (√(ε′)/(k√2)) · (ε′/2)^{1/k} ≥ O(log S)^{d−1}
⟸ (√(ε′)/ε) · √(log(1/δ))/(√32 · log(2/ε′)) ≥ O(log S)^{d−1},

using (ε′/2)^{1/k} = 1/2 for k = log(2/ε′). The claim follows by solving for S. □

4 Proof of Theorem 1.4
Theorem 1.4 follows by combining Lemma 2.2 and the following lemma:
Lemma 4.1. N_α has (ε, δ)-pseudodensity for C(S, d) for ε = (p · O(log S)^{d−1})^k and δ = e^{−2αk/p}.

Of note, the only additive error depends on the error from the switching lemma. Compare this with the claim that N_α is (3α · O(log S)^{d−1})-pseudorandom (and therefore has the same pseudodensity for δ = 1) for C(S, d), due to Tal [46].

To prove the lemma, we need a few claims.
Claim 4.1.
Suppose f ∈ C(S, d). Then there is a depth-k decision tree h with the property that

E[f(N_α)] ≤ E[h(N_{α/p})] + (p · O(log S)^{d−1})^k.

Proof.
Take Z = N_{α/p} in Lemma 2.3, so that ρ∘N_{α/p} = N_α and

E[f(N_α)] ≤ E[h_ρ(N_{α/p})] + (p · O(log S)^{d−1})^k.

Averaging over ρ yields a fixed decision tree h. □

Second, we can upper bound the extent to which the acceptance probability of a short decision tree increases when passing from the uniform distribution U to the biased distribution N_γ.

Claim 4.2.
Suppose f : {0,1}^n → {0,1} is a depth-k decision tree. Then

E[f(N_γ)] ≤ (1 + 2γ)^k · E[f(U)] ≤ e^{2γk} · E[f(U)].

The proof amounts to a calculation, which we include in the appendix. We're now in a position to prove the lemma.

Proof of Lemma 4.1.
Directly applying Claim 4.1, we get

E[f(N_α)] ≤ E[f′(N_{α/p})] + (p · O(log S)^{d−1})^k

for a depth-k decision tree f′. Applying Claim 4.2 to E[f′(N_{α/p})], we get

E[f′(N_{α/p})] ≤ (1 + 2α/p)^k E[f′(U)] ≤ e^{2αk/p} E[f′(U)].

Putting these together finishes the proof. □
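Two steps of this argument can be verified mechanically. First, the identity ρ∘N_{α/p} = N_α used in Claim 4.1 is a per-bit calculation matching Claim 3.3: a bit of ρ∘Z is unset with probability p (and then drawn from Z), and otherwise uniform. Second, on the extremal depth-k tree that accepts exactly the all-zeros path, the acceptance probability grows by exactly (1 + 2γ)^k when passing from U to N_γ. The sketch below is our own check; all parameter values are arbitrary:

```python
from fractions import Fraction as F

# (1) Per-bit law of (rho o Z)_i for rho ~ R_p: unset w.p. p (bit drawn
# from Z, with Pr[1] = q1), otherwise set to a uniform bit.
def composed_prob(p, q1):
    return p * q1 + (1 - p) * F(1, 2)

alpha, p = F(1, 100), F(1, 10)
assert composed_prob(p, F(1, 2) - alpha / p) == F(1, 2) - alpha
# Bits are independent on both sides, so equality of the per-bit laws
# gives rho o N_{alpha/p} = N_alpha as distributions.

# (2) Depth-k tree accepting exactly the all-zeros path: acceptance
# probability is (1/2 + gamma)^k under N_gamma versus 2^(-k) under U.
gamma, k = F(1, 20), 6
ratio = (F(1, 2) + gamma) ** k / F(1, 2) ** k
assert ratio == (1 + 2 * gamma) ** k
```

The second check shows the growth factor in Claim 4.2 is tight: each queried bit along the accepting path contributes a likelihood ratio of (1/2 + γ)/(1/2) = 1 + 2γ.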
We can now prove Theorem 1.4, restated here:
Theorem 1.4.
Fix ε, ε′, δ > 0, d ∈ N, and

S ≤ exp[O(√δ/√ε · log(1/δ)/log(1/ε′))^{1/(d−1)}].

Then N_α with α = 2√(ε/δ), over {0,1}^n with n = O(1/ε), is (ε′, δ)-pseudodense and yet is not δ-dense inside of any ε-pseudorandom set.

Proof of Theorem 1.4. Let n = 1/(7ε), k = log(1/ε′) and α = 2√(ε/δ). These choices satisfy α ≥ √((1/(2δ)) · (1/n + ε)), meaning N_α is not δ-dense in any ε-pseudorandom set for C(S, d), by Lemma 2.2. By Lemma 4.1, N_α has (ε′, δ)-pseudodensity for δ = e^{−2αk/p} and ε′ = (p · O(log S)^{d−1})^k.

The constraint on the density implies

δ = e^{−2αk/p} ⟺ log(1/δ) = 2αk/p ⟺ p = 2α log(1/ε′)/log(1/δ) = 4√(ε/δ) · log(1/ε′)/log(1/δ).

Substituting p into the expression for ε′, we get

ε′ = (p · O(log S)^{d−1})^k
⟺ (ε′)^{1/k}/p = O(log S)^{d−1}
⟺ (√δ · log(1/δ))/(√ε · log(1/ε′)) = O(log S)^{d−1},

where we have absorbed constant factors into the O(·) and used that (ε′)^{1/k} = 2^{−log(1/ε′)/log(1/ε′)} = 1/2. Solving for S gives the claimed bound. □

5 Proofs of Theorem 1.5 and Theorem 1.6

In this section, we prove Theorem 1.5 and Theorem 1.6. We include them in the same section due to their similarity, and start with Theorem 1.5, which is more involved. Indeed, Theorem 1.6 will follow directly from a result of Shaltiel and Viola [42] after an appropriate error reduction procedure.
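A simple fact used repeatedly below is that sampling coordinates uniformly with replacement from a fixed string of relative weight 1/2 − α yields i.i.d. (1/2 − α)-biased bits, i.e. exactly the distribution N_α on the sampled coordinates. This can be checked exhaustively for tiny (arbitrarily chosen) parameters:

```python
from fractions import Fraction as F
from itertools import product

m, n = 4, 3
u = [1, 0, 0, 0]  # wt(u) = m/4, so a random coordinate of u is 1 w.p. 1/4

# Enumerate all m^n equally likely index sequences (with replacement) and
# tally the distribution of the extracted n-bit string.
counts = {}
for tup in product(range(m), repeat=n):
    s = tuple(u[j] for j in tup)
    counts[s] = counts.get(s, 0) + 1

total = m ** n
for s, c in counts.items():
    ones = sum(s)
    # product distribution with per-bit Pr[1] = 1/4, i.e. N_alpha, alpha = 1/4
    assert F(c, total) == F(1, 4) ** ones * F(3, 4) ** (n - ones)
```

Independence across draws holds by construction (the indices are sampled independently), which is why a single string on a slice can stand in for many independent biased samples.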
5.1 Proof of Theorem 1.5

As in Theorem 1.4, we will prove unconditional pseudodensity for N_α and compare it with the lower bound for α given by Lemma 2.2. Let Sp_{n,k} denote the (random variable uniform over the) set of n-bit strings of weight exactly n/2 − k. We convert a function refuting the pseudodensity of N_α into a (random) function distinguishing between the two 'slices' of the hypercube of weight m/2 − αm and m/2, extending similar ideas from [42, 33, 44].
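The two failure probabilities appearing in the next lemma come from standard amplification facts: the OR of ℓ independent trials misses an event of probability q with probability (1 − q)^ℓ ≤ e^{−ℓq}, and fires spuriously with probability at most ℓq′ by a union bound. A quick check (the numeric values are arbitrary illustrations):

```python
from math import exp

q, qprime, ell = 0.03, 0.004, 50
miss = (1 - q) ** ell                 # OR fails on all ell biased samples
false_fire = 1 - (1 - qprime) ** ell  # OR fires at least once on null samples

assert miss <= exp(-ell * q)          # (1 - q)^l <= e^{-lq}
assert false_fire <= ell * qprime     # union bound
```

Choosing ℓ ≈ 1/q makes the miss probability a constant while the union-bound term stays small whenever q′ ≪ q, which is exactly the regime produced by a pseudodensity refuter.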
Lemma 5.1.
Let f : {0,1}^n → {0,1} be a function for which δ E[f(N_α)] − ε′ ≥ E[f(U)], and let q = E[f(N_α)]. Then for any positive integer ℓ, there is a random function F : {0,1}^m → {0,1} so that

Pr[F(Sp_{m,αm}) = 0] ≤ e^{−ℓq};
Pr[F(Sp_{m,0}) = 1] ≤ ℓqδ.

Moreover, F is the OR of ℓ copies of f (on random inputs).

Proof. Fix an input z ∈ {0,1}^m. For i ∈ [ℓ], let I_i denote the random length-n sequence over [m] obtained by sampling n indices j_1, ..., j_n ∈ [m] independently, uniformly at random (with replacement). The random function is defined as

F(z) = ∨_{i∈[ℓ]} f(z_{I_i}).

We can evaluate F's acceptance probability on Sp_{m,αm} and Sp_{m,0} as follows:

1. Suppose z ∈ Sp_{m,αm}. Then each z_{I_i} is distributed as N_α on n bits, meaning the z_{I_i}'s constitute ℓ independent samples from N_α. By definition, f accepts with probability q over N_α, and so the probability that f doesn't accept in ℓ independent runs is (1 − q)^ℓ ≤ e^{−ℓq}.

2. Suppose z ∈ Sp_{m,0}. Then each z_{I_i} ∈ {0,1}^n is a uniformly random string, meaning we have ℓ independent samples from U. For each sample, f outputs 1 independently with probability q′ = E[f(U)], so the probability that this occurs at least once in ℓ attempts is at most ℓq′. Since q′ ≤ qδ − ε′ ≤ qδ, we can bound this by ℓqδ. □

Lemma 5.2.
Fix ε′, δ > 0 with δ < c for a (small) absolute constant c. Let F be a field of positive characteristic p = O(1) and let F be the set of all polynomials P ∈ F[X_1, ..., X_n] with degree at most d. Then if N_α has (ε′, δ)-pseudodensity with respect to F, d = O(1/α).

Note that there is no dependence on ε′. This is a relic of the positive characteristic p > 0, which allows us to approximate large fan-in ORs with polynomials whose degree does not depend on the fan-in but only on the quality of approximation.

We will proceed in the contrapositive; if N_α isn't pseudodense, we will use the witness to construct a polynomial on which we can prove degree lower bounds directly. By ensuring that the degrees are related by a constant factor, we get the lower bound. The proof uses the following approximation of the OR function as a low-degree polynomial over F when F has positive characteristic (in characteristic zero, there is a dependence on the fan-in of the OR, which we want to avoid).

Claim 5.1 ([37]). For any n and any finite field F with characteristic p > 0, there is a distribution R on degree-d polynomials so that for all z ∈ {0,1}^n,

Pr[R(z) = OR(z)] ≥ 1 − γ,

where d ≤ p log(1/γ). Moreover, R is supported on polynomials R with R(z) ∈ {0,1} for every z ∈ {0,1}^n.

We also use a special case of the robust Hegedűs lemma, discovered recently by Srinivasan [44].
Lemma 5.3 (Robust Hegedűs lemma (special case), [44]). Let F be a finite field, and let 2^{−m/2} ≤ λ ≤ c, where c < 1 is a (small) absolute constant. Let αm be an integer so that 2^{−α²m} ≥ λ. Then if P : F^m → F is a degree-d polynomial for which:

1. Pr[P(Sp_{m,αm}) = 0] ≤ λ;
2. Pr[P(Sp_{m,0}) = 0] ≥ e^{−α²m/2},

then d = Ω(αm).

Now we can prove the main lemma.
Proof of Lemma 5.2.
Let P : F^n → F be a degree-d polynomial. We can assume P takes boolean values over {0,1}^n by replacing P with P^{p−1} (p being F's characteristic); this increases the degree to less than pd. Assume towards a contradiction that P certifies that N_α is not (ε′, δ)-pseudodense, in the sense that δ Pr[P(N_α) = 0] − ε′ ≥ Pr[P(U) = 0].

Let q = Pr[P(N_α) = 0]. Applying Lemma 5.1 to P (noting that P is boolean on {0,1}^n) with ℓ specified later, we obtain a random function F of the form ∨_i P(z_{I_i}). For a fixed z ∈ {0,1}^m, let X(z) = (P(z_{I_1}), ..., P(z_{I_ℓ})). For γ to be determined later, let R be the random polynomial from Claim 5.1, and we remark that R can be made to output boolean values on boolean inputs. Then for any z ∈ {0,1}^m, F(z) = OR(X(z)), so F(z) = R(X(z)) with probability 1 − γ. In sum, R(X(·)) has the property that:

Pr[R(X(Sp_{m,αm})) = 0] ≤ e^{−ℓq} + γ;
Pr[R(X(Sp_{m,0})) = 1] ≤ ℓqδ + γ.

Moreover, R(X(·)) is a random polynomial over F of degree at most dp · p log(1/γ), with one factor of p coming from possibly replacing P with P^{p−1}.

We now apply Lemma 5.3 to R(X(·)). Let c denote the constant from the statement of the lemma. Let m = ln(1/c)/α², λ = e^{−ℓq} + γ ≤ c, γ = c/2, and ℓ = ln(2/c)/q ≤ ln(2/c) · δ/ε′, where the last inequality holds because q ≥ ε′/δ follows from δq − ε′ ≥ 0. With this setting of parameters, in regard to the hypotheses of Lemma 5.3, we have 2^{−m/2} ≤ λ ≤ 2^{−α²m}.

By calculating, R(X(·)) classifies Sp_{m,αm} incorrectly with probability at most e^{−ℓq} + γ = c/2 + c/2 = c, and Sp_{m,0} incorrectly with probability at most ℓqδ + γ ≤ ln(2/c)δ + c/2; hence it vanishes on Sp_{m,0} with probability at least 1 − ln(2/c)δ − c/2 ≥ e^{−α²m/2}, provided δ is a sufficiently small constant. Therefore, by Lemma 5.3, the degree of some polynomial in the support of R(X(·)) is Ω(αm) = Ω(1/α). Since γ and p (F's characteristic) are constants, the same conclusion therefore holds of P. □

We can now prove Theorem 1.5, restated here:
Theorem 1.5.
Fix a finite field F with characteristic p = O(1), ε, ε′ > 0, and let c > δ > 0, where c is the absolute constant from Lemma 5.2. Suppose that d ≤ O(√(δ/ε)). Then when F is the set of n-variate degree-d polynomials over F with n = 1/ε, and α = O(√(ε/δ)), N_α is (ε′, δ)-pseudodense but is not δ-dense inside of an ε-pseudorandom set.

Proof of Theorem 1.5. If N_α is (ε′, δ)-pseudodense with respect to degree-d polynomials over F, then d = O(1/α), by Lemma 5.2. On the other hand, taking α ≥ √((1/(2δ))(1/n + ε)) implies that N_α is not δ-dense in an ε-pseudorandom set, by Lemma 2.2. Therefore, if we take n = 1/ε, then picking α ≥ √(ε/δ) gives us the separation. □

5.2 Proof of Theorem 1.6

In what is becoming tradition, we will show an unconditional pseudodensity result, from which Theorem 1.6 will follow by Lemma 2.2.
Lemma 5.4.
Let ε > 0 and 1/2 > δ > 0 be arbitrary. Suppose F is a test class of boolean functions f : {0,1}^n → {0,1} with the following property: there is no AC0 F-oracle circuit of size poly(n/(αε)) which computes MAJ on O(1/α) input bits. Then N_α is (εδ, δ)-pseudodense.

This will follow from a result of Shaltiel and Viola:
Theorem 5.1 ([42]). Let f : {0,1}^n → {0,1} be a function that distinguishes between U and N_α with constant distinguishing probability. Then there is an AC0 circuit of size poly(n/α) using f-oracle gates which computes majority on O(1/α) bits.

Proof of Lemma 5.4. Suppose N_α is not (εδ, δ)-pseudodense for F as in the statement of the lemma, and let f witness this fact with q = E[f(N_α)]. Let

F(z) = ∨_{i∈[ℓ]} f(ρ_i(z)),

where each ρ_i is a random permutation of [n]. When z ∼ N_α, ρ_i(z) is distributed as N_α, and likewise for U. Hence F rejects samples from N_α with probability (1 − q)^ℓ ≤ e^{−qℓ}, which is constant when ℓ = 1/q ≤ 1/ε, this latter bound following because δq − εδ ≥ 0 implies q ≥ ε. Additionally, F accepts samples from U with probability at most ℓqδ, which is at most δ for our choice of ℓ. Averaging, some function in F's support distinguishes N_α and U with constant advantage. Applying Theorem 5.1 yields a small AC0 circuit computing majority, which is a contradiction. □

Theorem 1.6.
Let ε, δ > 0. Suppose F is a test class of boolean functions f : {0,1}^n → {0,1} with the following property: there is no AC0 F-oracle circuit of size poly(n · √δ/ε^{3/2}) computing majority on O(√(δ/ε)) bits. Then N_{√(ε/δ)} is (εδ, δ)-pseudodense and yet does not have a δ-dense ε-model. In particular, when the hypotheses are met, the dense model theorem is false.

The proof is the same as Theorem 1.5, so we omit the details.
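The mechanism behind these separations can be seen concretely: once n is somewhat larger than 1/α², the Hamming-weight threshold at n/2 already distinguishes N_α from U with constant advantage, and by [42] any such constant-advantage distinguisher can be converted into a circuit computing majority. The exact computation below (with arbitrarily chosen n and α) illustrates the constant gap:

```python
from math import comb

def tail(n, p1, t):
    # Pr[wt(Z) <= t] for n independent bits with Pr[bit = 1] = p1
    return sum(comb(n, k) * p1**k * (1 - p1) ** (n - k) for k in range(t + 1))

n, alpha = 100, 0.1  # mean weight under N_alpha is n/2 - alpha*n = 40
adv = tail(n, 0.5 - alpha, n // 2 - 1) - tail(n, 0.5, n // 2 - 1)
assert adv > 0.3     # constant distinguishing advantage
```

This is exactly the concentration property exploited by Lemma 2.2: a δ-dense subset of a distribution fooling such threshold-style tests cannot have tails as heavy as those of N_α for large α.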
References

[1] Scott Aaronson. BQP and the polynomial hierarchy. In
Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pages 141–150, 2010.

[2] Rohit Agrawal. Coin theorems and the Fourier expansion. arXiv preprint arXiv:1906.03743, 2019.

[3] Miklos Ajtai and Avi Wigderson. Deterministic simulation of probabilistic constant depth circuits. In 26th Annual Symposium on Foundations of Computer Science, pages 11–19. IEEE, 1985.

[4] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables.
Random Structures & Algorithms , 3(3):289–304, 1992.[5] Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibriumin generative adversarial nets (gans). In
Proceedings of the 34th International Conference on MachineLearning-Volume 70 , pages 224–232. JMLR. org, 2017.[6] R.B. Ash.
Information Theory . Dover books on advanced mathematics. Dover Publications, 1990.[7] Boaz Barak, Moritz Hardt, and Satyen Kale. The uniform hardcore lemma via approximate bregmanprojections. In
Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms ,pages 1193–1200. SIAM, 2009.[8] Boaz Barak, Ronen Shaltiel, and Avi Wigderson. Computational analogues of entropy. In
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 200–215. Springer, 2003.

[9] Thomas F. Bloom and Olof Sisask. Breaking the logarithmic barrier in Roth's theorem on arithmetic progressions. arXiv preprint arXiv:2007.03528, 2020.

[10] Joshua Brody and Elad Verbin. The coin problem and pseudorandomness for branching programs. In 51st Annual IEEE Symposium on Foundations of Computer Science, pages 30–39. IEEE, 2010.

[11] Mei-Chu Chang. A polynomial bound in Freiman's theorem.
Duke mathematical journal ,113(3):399–419, 2002.[12] Gil Cohen, Anat Ganor, and Ran Raz. Two sides of the coin problem. In
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014.

[13] Dean Doron, Dana Moshkovitz, Justin Oh, and David Zuckerman. Nearly optimal pseudorandomness from hardness. Technical report, ECCC preprint TR19-099, 2019.

[14] Stefan Dziembowski and Krzysztof Pietrzak. Leakage-resilient cryptography. In 49th Annual IEEE Symposium on Foundations of Computer Science, pages 293–302. IEEE, 2008.

[15] Yoav Freund and Robert Schapire. A short introduction to boosting.
Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.

[16] W. T. Gowers. A new proof of Szemerédi's theorem.
Geometric & Functional Analysis GAFA ,11(3):465–588, 2001.[17] W. T. Gowers. Decompositions, approximate structure, transference, and the hahn–banach theorem.
Bulletin of the London Mathematical Society , 42(4):573–606, 2010.[18] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions.
Annals of Mathematics, pages 481–547, 2008.

[19] Aryeh Grinberg, Ronen Shaltiel, and Emanuele Viola. Indistinguishability by adaptive procedures with advice, and lower bounds on hardness amplification proofs. In 59th Annual IEEE Symposium on Foundations of Computer Science, pages 956–966. IEEE, 2018.

[20] Iftach Haitner, Omer Reingold, and Salil Vadhan. Efficiency improvements in constructing pseudorandom generators from one-way functions.
SIAM Journal on Computing , 42(3):1405–1430, 2013.[21] Iftach Haitner, Omer Reingold, Salil Vadhan, and Hoeteck Wee. Inaccessible entropy. In
Proceedingsof the forty-first annual ACM symposium on Theory of computing , pages 611–620, 2009.[22] Lianna Hambardzumyan and Yaqiao Li. Chang’s lemma via pinsker’s inequality.
Discrete Mathematics, 343(1):111496, 2020.

[23] Johan Håstad. Computational limitations of small-depth circuits. MIT Press, 1987.

[24] Johan Håstad. On the correlation of parity and small-depth circuits.
SIAM Journal on Computing, 43(5):1699–1708, 2014.

[25] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A pseudorandom generator from any one-way function.
SIAM Journal on Computing , 28(4):1364–1396, 1999.[26] Thomas Holenstein. Key agreement from weak bit agreement. In
Proceedings of the thirty-seventhannual ACM symposium on Theory of computing , pages 664–673, 2005.[27] Chun-Yuan Hsiao, Chi-Jen Lu, and Leonid Reyzin. Conditional computational entropy, or towardseparating pseudoentropy from compressibility. In
Annual International Conference on the Theoryand Applications of Cryptographic Techniques , pages 169–186. Springer, 2007.[28] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In
Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 538–545. IEEE, 1995.

[29] Russell Impagliazzo. Connections between pseudo-randomness and machine learning: boosting, dense models, and regularity. 2020.

[30] Russell Impagliazzo, William Matthews, and Ramamohan Paturi. A satisfiability algorithm for AC0. In
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms , pages961–972. SIAM, 2012.[31] Russell Impagliazzo, Cristopher Moore, and Alexander Russell. An entropic proof of chang’s in-equality.
SIAM Journal on Discrete Mathematics , 28(1):173–176, 2014.[32] Adam R Klivans and Rocco A Servedio. Boosting and hard-core set construction.
Machine Learning, 51(3):217–238, 2003.

[33] Nutan Limaye, Karteek Sreenivasaiah, Srikanth Srinivasan, Utkarsh Tripathi, and S. Venkitesh. A fixed-depth size-hierarchy theorem for AC0[⊕] via the coin problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 442–453, 2019.

[34] Chi-Jen Lu, Shi-Chun Tsai, and Hsin-Lung Wu. Complexity of hard-core set proofs. computational complexity, 20(1):145–171, 2011.

[35] Or Meir and Avi Wigderson. Prediction from partial information and hindsight, with application to circuit lower bounds. computational complexity, 28(2):145–183, 2019.

[36] Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan. Computational differential privacy. In
Annual International Cryptology Conference , pages 126–142. Springer, 2009.[37] Alexander A Razborov. Lower bounds on the size of bounded depth circuits over a complete basiswith logical addition.
Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.

[38] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Dense subsets of pseudorandom sets. In 49th Annual IEEE Symposium on Foundations of Computer Science, pages 76–85. IEEE, 2008.

[39] Benjamin Rossman. An entropy proof of the switching lemma and tight bounds on the decision-tree size of AC0, 2017.

[40] Rocco A. Servedio and Li-Yang Tan. Improved pseudorandom generators from pseudorandom multi-switching lemmas. arXiv preprint arXiv:1801.03590, 2018.

[41] Ronen Shaltiel. Is it possible to improve Yao's XOR lemma using reductions that exploit the efficiency of their oracle? In
Approximation, Randomization, and Combinatorial Optimization. Algorithms andTechniques (APPROX/RANDOM 2020) . Schloss Dagstuhl-Leibniz-Zentrum f¨ur Informatik, 2020.[42] Ronen Shaltiel and Emanuele Viola. Hardness amplification proofs require majority.
SIAM Journalon Computing , 39(7):3122–3154, 2010.[43] Roman Smolensky. On representations by low-degree polynomials. In
Proceedings of 1993 IEEE34th Annual Foundations of Computer Science , pages 130–138. IEEE, 1993.[44] Srikanth Srinivasan. A robust version of hegedus’s lemma, with applications. In KonstantinMakarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors,
Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1349–1362. ACM, 2020.

[45] Endre Szemerédi. On sets of integers containing no four elements in arithmetic progression.
Acta Mathematica Academiae Scientiarum Hungarica, 20(1-2):89–104, 1969.

[46] Avishay Tal. Tight bounds on the Fourier spectrum of AC0. In 32nd Computational Complexity Conference (CCC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.

[47] Michel Talagrand. How much are increasing sets positively correlated?
Combinatorica , 16(2):243–258, 1996.[48] Terence Tao, Tamar Ziegler, et al. The primes contain arbitrarily long polynomial progressions.
Acta Mathematica, 201(2):213–305, 2008.

[49] Luca Trevisan, Madhur Tulsiani, and Salil Vadhan. Regularity, boosting, and efficiently simulating every high-entropy distribution. In 24th Annual IEEE Conference on Computational Complexity, pages 126–136. IEEE, 2009.

[50] Thomas Watson. Advice lower bounds for the dense model theorem.
ACM Transactions on Computation Theory (TOCT), 7(1):1–18, 2015.

[51] Andrew C. Yao. Theory and application of trapdoor functions. In 23rd Annual Symposium on Foundations of Computer Science, pages 80–91. IEEE, 1982.

[52] Jiapeng Zhang. On the query complexity for showing dense model. 2011.
A Omitted proofs
A.1 Proof of Claim 3.3
Claim 3.3.
Fix p ∈ [0,1] and a random variable Z. Let E be an event which is independent from ρ (in that the conditional distribution of ρ given E is identical to the unconditioned distribution). Then

Pr[(ρ∘Z)_i = 1 | E] = p Pr[Z_i = 1 | E] + (1 − p)/2.

Proof.
Let R_i be the event that i ∈ ρ⁻¹(∗). Then

Pr[(ρ∘Z)_i = 1 | E] = Pr[R_i] Pr[Z_i = 1 | R_i, E] + Pr[¬R_i] Pr[ρ_i = 1 | ¬R_i, E] = p Pr[Z_i = 1 | E] + (1 − p)/2,

using the independence of Z_i from R_i, and the independence of [ρ_i = 1] from E, to obtain the final line. □

A.2 Proof of Claim 4.2
Claim 4.2.
Suppose f : {0,1}^n → {0,1} is a depth-k decision tree. Then

E[f(N_γ)] ≤ (1 + 2γ)^k · E[f(U)] ≤ e^{2γk} · E[f(U)].

Proof.
We proceed by induction on k. When k = 0, f is constant and so the claim holds trivially. For the inductive step, write f(z) = (1 − z_i)g(z) + z_i h(z), where z_i is the variable queried at the root and g, h are computed by depth-(k − 1) decision trees which don't depend on z_i. Note that E[f(U)] = (E[g(U)] + E[h(U)])/2. For z ∼ N_γ, we have Pr[z_i = 0] = 1/2 + γ, and so by independence of z_i from the remaining coordinates and the inductive hypothesis,

E[f(z)] = (1 − E[z_i])E[g(z)] + E[z_i]E[h(z)]
≤ (1/2 + γ)(1 + 2γ)^{k−1} E[g(U)] + (1/2 − γ)(1 + 2γ)^{k−1} E[h(U)]
≤ (1 + 2γ)^{k−1} · (1/2 + γ) · (E[g(U)] + E[h(U)])
= (1 + 2γ)^{k−1} · (1 + 2γ) · E[f(U)]
= (1 + 2γ)^k · E[f(U)],

where the second inequality uses (1/2 − γ) ≤ (1/2 + γ) together with E[h(U)] ≥ 0, and the following equality uses (1/2 + γ) = (1/2)(1 + 2γ). Finally, (1 + 2γ)^k ≤ e^{2γk}. □