Positive spectrahedrons: Geometric properties, Invariance principles and Pseudorandom generators
Srinivasan Arunachalam
IBM Quantum, IBM T.J. Watson Research Center, Yorktown Heights, USA

Penghui Yao
State Key Laboratory for Novel Software Technology, Nanjing University
[email protected]
January 21, 2021
Abstract
In a recent work, O'Donnell, Servedio and Tan (STOC 2019) gave explicit pseudorandom generators (PRGs) for arbitrary m-facet polytopes in n variables with seed length poly-logarithmic in m, n, concluding a sequence of works in the last decade that was started by Diakonikolas, Gopalan, Jaiswal, Servedio, Viola (SICOMP 2010) and Meka, Zuckerman (SICOMP 2013) for fooling linear and polynomial threshold functions, respectively. In this work, we consider a natural extension of PRGs for intersections of positive spectrahedrons. A positive spectrahedron is a Boolean function f(x) = [x_1 A_1 + · · · + x_n A_n ⪯ B] where the A_i s are k × k positive semidefinite matrices. We construct explicit PRGs that δ-fool "regular" width-M positive spectrahedrons (i.e., when none of the A_i s are dominant) over the Boolean space with seed length poly(log k, log n, M, 1/δ).

Our main technical contributions are the following. We first prove an invariance principle for positive spectrahedrons via the well-known Lindeberg method; as far as we are aware, such a generalization of the Lindeberg method was unknown. Second, we prove various geometric properties of positive spectrahedrons, such as their noise sensitivity, Gaussian surface area, and a Littlewood-Offord theorem for positive spectrahedrons. Using these results, we give applications for constructing PRGs for positive spectrahedrons, learning theory, discrepancy sets for positive spectrahedrons (over the Boolean cube) and PRGs for intersections of structured polynomial threshold functions.
1 Introduction
Constructing explicit pseudorandom generators (PRGs) for a class of interesting Boolean functions has received tremendous attention in the last few decades. One particular class of functions that has seen a flurry of works is the class of halfspaces. A halfspace is a Boolean function f : {−1, 1}^n → {0, 1} that can be expressed as f(x) = sign(a_1 x_1 + · · · + a_n x_n − b) for some real values a_1, . . . , a_n, b ∈ R. Halfspaces arise naturally in many areas of theoretical computer science, including machine learning, communication complexity, circuit complexity and pseudorandomness. A successful line of work [Ser06, DHK+10, MZ13, KM15, GKM18] resulted in PRGs that ε-fool halfspaces with seed length poly-logarithmic in n/ε over the Boolean space.

Given the success in designing PRGs for single halfspaces (or linear threshold functions), two alternate lines of work received a lot of attention: polynomial threshold functions and intersections of halfspaces. A degree-d polynomial threshold function (PTF) is simply a function f(x) = sign(p(x)) where p is a degree-d polynomial. In this direction, a sequence of works [DGJ+10, DHK+10, Kan10, Kan11a, Kan11b, Kan11c, Kan14b, OST20] produced PRGs with seed length exponential in d over the Boolean space and quasi-polynomial in d over the Gaussian space. Alternatively, another line of work considered intersections of halfspaces (i.e., polytopes). In this direction, a sequence of works [GOWZ10, HKM13, ST17, CDS19, OST19] produced a PRG for m-facet polytopes in n variables with seed length poly-logarithmic in m, n. In this work, we initiate the construction of
PRGs for spectrahedrons: a natural generalization of halfspaces, polytopes and PTFs in one framework. A spectrahedron S ⊆ R^n is the feasible region of a semidefinite program. Namely,

S = { x ∈ R^n : Σ_i x_i A_i ⪯ B }

for some symmetric matrices A_1, . . . , A_n, B, where ⪯ is the standard Löwner ordering (in this ordering, we say A ⪯ B if B − A is positive semidefinite, i.e., all the eigenvalues of B − A are non-negative). We say S is a positive spectrahedron if either all A_i s are positive semidefinite (PSD) or all A_i s are negative semidefinite. Spectrahedrons are important basic objects in polynomial optimization and algebraic geometry [BPT12, Sch18]. Mathematically, spectrahedrons have rich and complicated structures and include well-known geometric objects like polytopes, cylinders, polyhedrons and elliptopes. Computationally, semidefinite programming has found many applications in theoretical computer science, in the fields of optimization [AK07], approximation theory [GW95, GM12] and algorithms [AHK05, JLL+15, LRS15]. The class of semidefinite programs that consists of only PSD matrices is an important class of SDPs, termed positive semidefinite programs, which has been used to characterize various quantum interactive proof systems [JUW09, JJUW11, GW13]. Their computational complexity has also received a lot of attention in the past decade [JY11, PT12, AZLO16, JLL+15]. In this work, we construct PRGs for regular positive spectrahedrons, which we define in Section 1.2. Before stating our main results, we briefly discuss the techniques developed by prior works to construct PRGs for polytopes, and then the challenges we need to handle here. (We remark that there is still room for improvement in the seed length of the PRG in [OST19].)

1.1 Prior work and conceptual challenges

One of the earliest works that considered fooling threshold functions was by Meka-Zuckerman [MZ13] and [DGJ+10], which construct PRGs for functions f via invariance principles. Roughly speaking, an invariance principle for a function f : {−1, 1}^n → {0, 1} states that the expected value of f(U_n) (where the input is uniformly random in {−1, 1}^n) is close to the expected value of f(G_n) (where the input is a standard Gaussian G_n = N(0, 1)^n). Invariance theorems are generalizations of the classic Berry-Esseen central limit theorem, and are generally proven using the well-known Lindeberg method [Lin22]. The versatile framework of [MZ13] allows one to use invariance principles along with a few more ingredients to construct
PRGs, so the technical challenge is in establishing invariance principles. Using this framework, Harsha, Klivans and Meka [HKM13] proved an invariance principle for regular polytopes (i.e., when the coefficients in all the halfspaces are "regular"). The main novelty in their work was the poly-logarithmic (in the input parameters) error dependence. In order to prove this, they first proved a general invariance principle for smooth functions (over polytopes). Subsequently, they instantiated their invariance principle for the so-called Bentkus mollifier [Ben90], crucially relying on the fact that the mollifier has derivatives that scale poly-logarithmically in the input size. Finally, in order to go from invariance principles (for the mollifier) to fooling regular polytopes, they need to prove an anti-concentration statement for polytopes in the Gaussian space. For this, they use (as a black-box) a well-known result of Nazarov [Naz03, KOS08], which bounds the Gaussian surface area (GSA) of polytopes. Putting together the invariance principle for smooth functions, the Bentkus mollifier and Nazarov's bound on GSA, [HKM13] obtained their main results for regular polytopes. We discuss this proof idea in more detail in Section 1.3.

Subsequently, Servedio and Tan [ST17] improved the results of [HKM13] by considering "low-weight" polytopes, which removes the regularity condition (albeit with the seed length of the PRG in [ST17] depending on the weight). Finally, O'Donnell, Servedio and Tan [OST19] showed how to fool arbitrary polytopes. In [OST19] they still proved a "Boolean invariance principle" for the Bentkus mollifier; however, they bypass the Gaussian space entirely (in fact, avoiding the Gaussian space is a necessity, since standard invariance principles do not hold for non-regular polytopes). Although they bypass the Gaussian intermediate (which is standard in invariance principles), their proof techniques still use the Lindeberg method. Additionally, a crucial tool introduced by them was Boolean anti-concentration of polytopes, since they can no longer use the GSA bound of Nazarov, which was used by [HKM13, ST17, CDS19] for Gaussian anti-concentration.
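The invariance-principle template described above is easy to see numerically in its simplest (single-halfspace, Berry-Esseen) case. The sketch below, with illustrative parameters of our own choosing, compares the acceptance probability of a regular halfspace under uniform ±1 inputs and under Gaussian inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 20000

# A tau-regular halfspace: unit-norm weights, no dominant coordinate.
w = rng.normal(size=n)
w /= np.linalg.norm(w)          # ||w||_2 = 1, so each |w_i| is about 1/sqrt(n)
b = 0.3                          # arbitrary threshold

def accept_prob(samples):
    """Fraction of inputs z with <w, z> >= b, i.e. sign(<w,z> - b) = 1."""
    return np.mean(samples @ w >= b)

x = rng.choice([-1.0, 1.0], size=(trials, n))   # uniform Boolean inputs
g = rng.normal(size=(trials, n))                # i.i.d. standard Gaussians

p_bool, p_gauss = accept_prob(x), accept_prob(g)
gap = abs(p_bool - p_gauss)
print(p_bool, p_gauss, gap)     # gap should be small for a regular w
```

By Berry-Esseen the gap is O(max_i |w_i|) = O(1/√n) here; invariance principles extend this closeness from a single linear form to polytopes and, in this paper, to spectrahedrons.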
There are two straightforward approaches to constructing PRGs for positive spectrahedrons. The first is to write a spectrahedron as a linear program: naturally, one can approximate a positive-semidefinite constraint X ⪰ 0 on a k × k symmetric matrix with exponentially many constraints of the form z^T X z ≥ 0 for z ∈ R^k. However, the results of [HKM13, OST19] would be moot here, since the seed lengths of their PRGs are poly-logarithmic in the number of constraints, which is polynomial in the dimension k, while our goal is to have seed lengths poly-logarithmic in k. The second approach is to use Sylvester's criterion to write out k polynomials of degree at most k (corresponding to the determinantal representation of the k minors) and one could potentially use PRGs for polynomial threshold functions (PTFs). However, finding optimal PRGs for PTFs has remained open for years, and the best-known PRGs we have for degree-k PTFs over the Boolean space have seed length depending exponentially on k [MZ13]. This naturally motivates us to use the "eigenstructure" of X ⪰ 0, which raises the following challenges. (Throughout, recall that the Bentkus mollifier is a function which provides a "smooth" continuous approximation to the discrete multivariate indicator function, also referred to as an orthant function.)

1. Invariance principles: Since a spectrahedron naturally deals with eigenvalues of matrices, it is unclear if we could use known invariance principles for spectrahedrons. In fact, we are not even aware of a generalization of the Lindeberg-type argument to show an invariance principle for spectral functions (i.e., functions that act on the eigenspectrum of matrices).

2. Geometric properties: Prior works [KOS08, HKM13, ST17, CDS19] crucially used the work of Nazarov [Naz03], which bounds the Gaussian surface area of polytopes, in order to prove their anti-concentration. However, spectrahedrons are very poorly understood, and even more basic quantities such as their average sensitivity, noise sensitivity and surface area are unknown.

3. Anti-concentration: An important technique for constructing PRGs using invariance principles requires one to prove anti-concentration, i.e., when moving from the smooth mollifier to the orthant function, a crucial ingredient is anti-concentration. It is far from clear if spectrahedrons enjoy such nice properties in either Boolean or Gaussian space.

As far as we are aware, none of these questions have been considered for any class of spectrahedrons except polytopes. Our main contribution is to make significant progress on all of these questions for the class of positive spectrahedrons.
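For concreteness, Sylvester's criterion — the basis of the second approach above — says that a symmetric matrix is positive definite iff all of its leading principal minors are positive; each minor is a determinant, i.e., a polynomial of degree at most k in the matrix entries. A minimal sketch (toy matrices of our own choosing):

```python
import numpy as np

def leading_minors(X):
    """Determinants of the k leading principal submatrices of X."""
    k = X.shape[0]
    return [np.linalg.det(X[:j, :j]) for j in range(1, k + 1)]

def is_positive_definite_sylvester(X):
    """Sylvester's criterion: X is positive definite iff every leading minor > 0."""
    return all(m > 0 for m in leading_minors(X))

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
pd_matrix = A @ A.T + 4 * np.eye(4)      # positive definite by construction
not_pd = pd_matrix - 100 * np.eye(4)     # shifted far down: not PSD

print(is_positive_definite_sylvester(pd_matrix))  # True
print(is_positive_definite_sylvester(not_pd))     # False
# Agreement with the eigenvalue ("eigenstructure") view used in this paper:
print(np.all(np.linalg.eigvalsh(pd_matrix) > 0))  # True
```

Each of the k minors is a PTF input of degree up to k, which is why this route runs into the exponential-in-degree seed lengths mentioned above.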
In order to state our main result, we first define PRGs and (τ, M)-regular spectrahedrons. A pseudorandom generator is a function G : {−1, 1}^r → {−1, 1}^n, and it is said to ε-fool a class of functions F ⊆ {f : {−1, 1}^n → {0, 1}} with seed length r if it satisfies the following: for every f ∈ F, we have

| Pr_{x∼U_n}[f(x) = 1] − Pr_{y∼U_r}[f(G(y)) = 1] | ≤ ε,

where U_n (resp. U_r) corresponds to the uniform distribution over {−1, 1}^n (resp. {−1, 1}^r). We next define the class of regular positive spectrahedrons. Given τ, M > 0, we say a sequence of k × k positive semidefinite matrices (A_1, . . . , A_n) is (τ, M)-regular if

I ⪯ Σ_{i=1}^n (A_i)^2 ⪯ M · I   and   A_i ⪯ τ · I for every i ∈ [n].   (1)

This regularity assumption is very natural: it says that the width of a semidefinite program defined by these matrices is bounded. We remark that our regularity condition naturally extends (and is in fact less restrictive than) the regularity condition that was used in prior works on fooling halfspaces and polytopes [GOWZ10, DGJ+10, MZ13, HKM13]. In Section 1.5.2 we discuss more about why this notion of regularity is necessary and sufficient for our proof techniques.

A spectrahedron S ⊆ R^n is the feasible region of a semidefinite program, i.e., the convex set S = { x ∈ R^n : Σ_i x_i A_i ⪯ B }. We say S is a positive spectrahedron if either all A_i s are positive semidefinite (PSD) or all A_i s are negative semidefinite. We say S is a (τ, M)-regular positive spectrahedron if (A_1, . . . , A_n) are (τ, M)-regular. It is also natural to consider an intersection of positive spectrahedrons S_1, . . . , S_t. However, without loss of generality one can assume that t = 2, since one can "pack" all the S_i s with PSD matrices into a larger block-diagonal matrix of dimension t · k, and similarly all the negative semidefinite ones; so we can always assume we are working with an intersection of two positive spectrahedrons. For simplicity, in the introduction we assume that we are working with a single regular positive spectrahedron, and state our main theorem.
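The definitions above can be checked mechanically. The sketch below (toy matrices, and with Eq. (1) read as stated) tests (τ, M)-regularity, evaluates membership in a positive spectrahedron via the top eigenvalue, and illustrates the block-diagonal "packing" of an intersection:

```python
import numpy as np

def is_regular(As, tau, M):
    """(tau, M)-regularity as in Eq. (1):
    I <= sum_i (A_i)^2 <= M*I  and  A_i <= tau*I for every i."""
    eigs = np.linalg.eigvalsh(sum(A @ A for A in As))   # ascending order
    tops = [np.linalg.eigvalsh(A)[-1] for A in As]      # lambda_max(A_i)
    return eigs[0] >= 1 - 1e-9 and eigs[-1] <= M + 1e-9 and max(tops) <= tau + 1e-9

def in_spectrahedron(x, As, B):
    """Membership in {x : sum_i x_i A_i <= B}, i.e. lambda_max(sum - B) <= 0."""
    return bool(np.linalg.eigvalsh(sum(xi * A for xi, A in zip(x, As)) - B)[-1] <= 0)

def pack(As1, B1, As2, B2):
    """Pack an intersection of two spectrahedrons into one block-diagonal one."""
    blk = lambda P, Q: np.block([[P, np.zeros_like(P)], [np.zeros_like(Q), Q]])
    return [blk(P, Q) for P, Q in zip(As1, As2)], blk(B1, B2)

rng = np.random.default_rng(2)
n, k = 6, 3
As = [np.diag(rng.uniform(0.5, 0.7, size=k)) for _ in range(n)]  # toy PSD tuple
B = 10.0 * np.eye(k)
x = rng.choice([-1, 1], size=n)

print(is_regular(As, tau=0.7, M=3.0))     # True for these toy matrices
print(in_spectrahedron(x, As, B))         # True: the sum stays well below 10*I
As2, B2 = pack(As, B, As, B)
print(in_spectrahedron(x, As2, B2))       # same answer on the packed version
```

The packed membership test agrees with the original because the eigenvalues of a block-diagonal matrix are the union of the blocks' eigenvalues.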
Result 1 (PRG for positive spectrahedrons). There exists a PRG G : {0, 1}^r → {−1, 1}^n with seed length r = O(log n · log k · M · 1/δ) that δ-fools (τ, M)-regular positive spectrahedrons for τ ≤ poly(δ/(M · log k)).

Typically, handling the "regular case" is the first step towards obtaining optimal results in pseudorandom generators for geometric objects, and we have accomplished that here for the first time. To prove this theorem, we follow the well-known three-step approach and prove the following:

1. An invariance principle for the Bentkus mollifier of arbitrary regular spectrahedrons.
2. Boolean and Gaussian anti-concentration for positive regular spectrahedrons.
3. An invariance principle for positive regular spectrahedrons.

Before proving these statements, we first overview the [HKM13, OST19] approach to proving invariance principles (since our high-level ideas are inspired by their works).
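To make the fooling definition concrete, the error of any candidate generator against a fixed function can be estimated by Monte Carlo. The harness below is generic (the toy test function and "generators" are ours, not the construction behind Result 1):

```python
import numpy as np

rng = np.random.default_rng(3)

def fooling_error(f, gen, r, n, trials=40000):
    """Monte-Carlo estimate of |Pr_{x~U_n}[f(x)=1] - Pr_{y~U_r}[f(gen(y))=1]|."""
    x = rng.choice([-1, 1], size=(trials, n))
    y = rng.choice([-1, 1], size=(trials, r))
    p_true = np.mean([f(row) for row in x])
    p_gen = np.mean([f(gen(row)) for row in y])
    return abs(p_true - p_gen)

n = 8
f = lambda z: int(np.sum(z) >= 0)        # a toy halfspace as the test function

identity = lambda y: y                    # r = n: trivially 0-fools everything
repeat = lambda y: np.tile(y, 2)          # r = n/2: repeating the seed fails

err_id = fooling_error(f, identity, n, n)
err_rep = fooling_error(f, repeat, n // 2, n)
print(err_id, err_rep)    # err_id is ~0 up to sampling noise; err_rep is not
```

The point of a PRG is to make such errors small with r much smaller than n; naive seed reuse (the `repeat` "generator") already fails on a simple halfspace.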
(For simplicity in exposition, we assume here that ∥B∥ ≤ M; our main theorems depend on the norm of B. Crucially, we also remark that the seed length of our PRG depends only logarithmically on k, so even for an intersection of t positive spectrahedrons the dependence would be logarithmic in t as well.)

First recall that a polytope is the feasible region of the set {x ∈ R^n : W x ≤ b} for a fixed W ∈ R^{n×n}, b ∈ R^n (for simplicity we assume that the number of constraints and variables are equal; their analysis is more general). We say a polytope is τ-regular if each row W_i satisfies ∥W_i∥_2 = 1 and ∥W_i∥_∞ ≤ τ. At a high level, the [HKM13] invariance principle states the following:

| Pr_{x∼U_n}[W x ≤ b] − Pr_{g∼G_n}[W g ≤ b] | ≤ poly(log n, τ).   (2)

To show this, they first express the orthant function above (which we denote O : R^n → {0, 1}) as [W x ≤ b] = [W_1 x ≤ b_1] · · · [W_n x ≤ b_n]. Given this structure, they use the well-known Lindeberg method [Lin22] (see [O'D14, Tao10] for a detailed exposition) to move from the uniform distribution over the Boolean space to the Gaussian space. To establish Eq. (2), they follow a three-step approach. (1) First, they prove a version of Eq. (2) for smooth functions Õ : R^n → R (i.e., functions that have bounded multivariate derivatives). In particular, they use the Lindeberg method to show that the expected value of Õ(W x) for x ∼ U_n is "close" to the expected value of Õ(W g) for g ∼ G_n. To understand this closeness, they write out Õ(W z) using the standard multivariate Taylor series and bound the difference between Õ(W x) and Õ(W g) by the higher-order derivatives of the smooth function Õ. (2) Second, they observe that a result of Bentkus [Ben90] provides exactly such an approximator Õ : R^n → R (which we refer to as the Bentkus mollifier), which serves as a smooth approximation to the {0, 1}-valued orthant function O(x) = [W x ≤ b]. Additionally, this mollifier crucially satisfies the property that ∥Õ^{(ℓ)}∥ ≤ O(log^ℓ n). (3) So far they established that the Bentkus mollifier (which served as a proxy for [W x ≤ b]) satisfies an approximate version of Eq. (2). In order to go from closeness with respect to this Bentkus mollifier to multidimensional CDF closeness, they prove Gaussian anti-concentration of polytopes. For this, they use a result of Nazarov [Naz03] (as a black-box) which shows that the Gaussian surface area of a polytope is O(√log n). These three steps allow them to prove Eq. (2).

We begin by defining spectral functions. Let f : R^k → R; we say ψ : Sym_k → R is a spectral function if ψ(M) = f(λ(M)) for all M ∈ Sym_k, where λ(M) = (λ_1, . . . , λ_k) are the k eigenvalues of M. In other words, a spectral function ψ(·) depends only on a function f applied to the eigenvalues of its argument. We say f satisfies an invariance principle if

E_{x∼U_n}[ ψ(Σ_i x_i A_i − B) ] ≈_ε E_{g∼G_n}[ ψ(Σ_i g_i A_i − B) ]

for symmetric matrices A_1, . . . , A_n, B. A conceptual challenge in proving an invariance principle even for smooth spectral functions is that standard Lindeberg-style proofs of invariance theorems use the multivariate Taylor series of the mollifier function, which cannot be used here, since our functions act on the eigenvalues of matrices.
In the past, there have been various invariance principles [MOO05, Mos08, HKM13, Yao19], but none of them apply here; as far as we are aware, invariance principles with non-diagonal A_i, B have not been studied. In this work, we overcome this challenge and adapt the Lindeberg-style proofs of probabilistic invariance principles to prove an analogue for spectral functions.

To this end, recall that we are concerned with spectrahedrons whose feasible regions are given by {x ∈ R^n : Σ_i x_i A_i ⪯ B}, which can alternatively be written as {x : λ_max(Σ_i x_i A_i − B) ≤ 0}. So we let our spectral function f : R^k → R be f(λ) = [max_i λ_i ≤ 0] (recall that although our spectrahedron acts on the n bits on which we want to prove an invariance principle, our spectral function acts only on the k eigenvalues). For this function, we can still use the Bentkus mollifier Õ : R^k → R as a smooth approximation to f. So our first main contribution is to prove an invariance principle for the Bentkus mollifier applied to the spectrum of matrices. We remark that, in contrast to [HKM13], we do not prove a general invariance principle for spectral functions; instead, our spectral function is tailored to the Bentkus mollifier (which is also the case for [OST19]). In fact, our analysis can allow arbitrary orthant functions which can be approximated by a Bentkus mollifier.
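For intuition, here is a toy smooth spectral surrogate for f(λ) = [max_i λ_i ≤ 0] — a product of logistic steps in the eigenvalues, standing in for (but much cruder than) the Bentkus mollifier. Since it depends on its matrix argument only through the spectrum, it is automatically unitarily invariant:

```python
import numpy as np

def smooth_orthant(lmbda, c=30.0):
    """Smooth surrogate for [max_i lambda_i <= 0]: product of logistic steps.
    (A toy stand-in for the Bentkus mollifier, not Bentkus's construction.)"""
    return np.prod(1.0 / (1.0 + np.exp(c * lmbda)))

def spectral_mollifier(M, c=30.0):
    """psi(M) = f(lambda(M)) with f the smooth orthant above."""
    return smooth_orthant(np.linalg.eigvalsh(M), c)

rng = np.random.default_rng(4)
M = np.diag([-1.0, -0.5, -2.0])           # lambda_max = -0.5 < 0: deep inside
N = M + 1.5 * np.eye(3)                   # lambda_max = 1.0 > 0: outside

print(spectral_mollifier(M))              # close to 1
print(spectral_mollifier(N))              # close to 0

# Spectral functions are unitarily invariant: psi(Q M Q^T) = psi(M).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
print(abs(spectral_mollifier(Q @ M @ Q.T) - spectral_mollifier(M)) < 1e-8)  # True
```

Near the boundary λ_max ≈ 0, the surrogate interpolates smoothly between 0 and 1, which is exactly the regime where anti-concentration (discussed below) is needed to pass from the mollifier back to the indicator.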
Fréchet derivatives. Since our Bentkus mollifier acts on the eigenspectrum of matrices, instead of multivariate Taylor expansion we adopt Fréchet derivatives, a notion of derivatives studied in Banach spaces. Unfortunately, Fréchet series (in contrast to standard multivariate series) are still not well understood. In fact, even basic properties such as continuity, Lipschitz continuity, differentiability and continuous differentiability, which have been well known for centuries in standard calculus, were only proven in the last three decades [BSS98, Lew96, BS99, CQT03]. In particular, Fréchet series of spectral functions only appeared in the last decade. (Here ∥f^{(ℓ)}∥ is the 1-norm of the coefficients in the ℓ-th derivative; in [HKM13], they care about ∥f^{(4)}∥ = max_x Σ_{p,q,r,s} |∂_p ∂_q ∂_r ∂_s f(x)|.)

Fortunately for us, Sendov [Sen07] provided a tensorial representation of high-order Fréchet series for spectral functions, which we employ to analyze the Fréchet derivatives of the Bentkus mollifier. The challenge is in bounding the 3-tensors that appear in Sendov's theorem, which produce 6 terms corresponding to different permutations of the tensors after simplification. Three of these 6 terms can simply be upper bounded by ∥Õ^{(3)}∥, which we know to be small for the Bentkus mollifier. We remark that these are exactly, and the only, terms that appear in standard invariance-principle proofs for linear forms. Intuitively this is not surprising, since the first three terms simply correspond to the case when the A_i, B are diagonal, which reduces a spectrahedron to a polytope. However, bounding the remaining terms is highly non-trivial, and one of our technical contributions is in showing that these remaining terms are bounded for the Bentkus mollifier.

Bounding derivatives and obtaining the invariance principle. Bounding the last three terms of the 3-tensors significantly deviates from the analysis of [HKM13], since we need to deal with off-diagonal entries of matrices, which is unique to the matrix-spectrahedron case and is not faced in [HKM13, ST17, OST19]. To bound these terms, we use several properties of Fréchet derivatives, such as mean value theorems for Fréchet derivatives, divided-difference representations of Fréchet derivatives [BLZ05], and Dyson's theorem [Bha13], which provides a useful integral expression for Fréchet derivatives (using the structure of the mollifier). More importantly, since we work with the Bentkus mollifier [Ben90], we completely open up the Bentkus black-box and show various analytic properties of this mollifier Õ in order to prove that our Fréchet derivatives are bounded.

In order to go from bounded third-order Fréchet derivatives to a final invariance principle, we still need to borrow some results from random matrix theory to upper bound the moments of Σ_i x_i A_i. Although the concentration of Σ_i x_i A_i for uniformly random x ∼ U_n is well studied via standard matrix Chernoff bounds [Tro15], we need better concentration of this random matrix at higher Schatten norms. For the diagonal (polytope) case, [HKM13] used standard hypercontractivity and [OST19] used Rosenthal's inequality. Fortunately for us, a matrix version of Rosenthal's inequality [MJC+14] was proven a few years back, and we use it to conclude our proof (in fact, we also crucially rely on this inequality to construct our PRG). Putting everything together, we obtain our main invariance principle for the Bentkus mollifier applied as a spectral function:

| E_{x∼U_n}[ Õ(Σ_{i=1}^n x_i A_i − B) ] − E_{g∼G_n}[ Õ(Σ_{i=1}^n g_i A_i − B) ] | ≤ poly(log k, M, τ).   (3)

We remark that the invariance principle above does not assume positivity of the matrices. We believe this is a necessity for future work on fooling arbitrary spectrahedrons.

Even with an invariance principle in hand, we are faced with the same challenges as [HKM13, ST17, OST19] to show an anti-concentration statement. Recall that our goal is to show that, for a (τ, M)-regular positive spectrahedron S, the expected value of the indicator function [x ∈ S] for x ∼ U_n is close to the expected value of [g ∈ S] for g ∼ G_n. This is "almost" what we showed in the previous section, except that the Bentkus mollifier Õ in Eq. (3) is replaced by the orthant indicator function f(x) = [max_i x_i ≤ 0].

1.5.1 Geometric properties

A well-known theorem of Ball [Bal93] shows an upper bound of O(n^{1/4}) on the Gaussian surface area (GSA) of an arbitrary n-dimensional convex object. Crucially, the works of [HKM13, ST17] used an improvement of Ball's theorem by Nazarov [Naz03], who showed that the GSA of k-facet polytopes is O(√log k). This logarithmic upper bound on GSA allows [KOS08, HKM13, ST17, CDS19] to obtain invariance principles, learning algorithms and pseudorandom generators that depend poly-logarithmically on k. In contrast, for our setting it is unclear what the GSA of spectrahedrons is. Clearly, Ball's theorem gives an upper bound of O(n^{1/4}) on GSA for us; apart from that, spectrahedrons are poorly understood. Below we prove an upper bound of O(1) on the GSA of positive spectrahedrons.

Result 2 (Geometric properties of positive spectrahedrons). Let S be a positive spectrahedron and consider F : {−1, 1}^n → {0, 1} defined as F(x) = [x ∈ S].
The average sensitivity of F is O(√n), the ε-Boolean noise sensitivity of F is O(√ε), and the Gaussian surface area is O(1).

We remark that the noise-sensitivity statement above can be viewed as a "positive-matrix analogue" of the well-known Peres theorem [Per04]. In order to prove this statement, we first observe that the average sensitivity of F being O(√n) follows immediately from the observation that positive spectrahedrons correspond to unate functions, and Kane [Kan14a] showed that AS(f) ≤ O(√n) if f is unate (a similar statement is known to be false for noise sensitivity). One issue we need to handle when translating between noise sensitivity and average sensitivity is the following: in the standard technique of [Per04, DGJ+10, Kan14a], one upper bounds the ε-noise sensitivity of a function f by "bucketing" the input variables into m = O(1/ε) buckets B_1, . . . , B_m, reducing the function f : {−1, 1}^n → {−1, 1} to a function g : {−1, 1}^m → {−1, 1} defined (in our setting) via g(b) = [Σ_{ℓ=1}^m b_ℓ Σ_{i∈B_ℓ} z_i A_i ⪯ B] (for uniformly random z). One then upper bounds NS_ε(f) using AS(g) (up to a factor of ε). When using this technique to bound the ε-noise sensitivity of halfspaces, both f and g are intersections of halfspaces, and one can upper bound the average sensitivity of g using Kane's result [Kan14a] by O(√m). However, in our setting, if f is an indicator of a positive spectrahedron, then g no longer needs to be an indicator of a positive spectrahedron, since Σ_{i∈B_ℓ} z_i A_i need not be either a positive semidefinite or a negative semidefinite matrix. We overcome this by modifying the bucketing procedure of [DGJ+10] to ensure that g is an indicator of a unate function. However, in the process we end up upper bounding NS_ε(f) by the "average 2-sensitivity" of g. We extend the results of Kane [Kan14a] by showing that even the "average 2-sensitivity" of g is small in our setting. Finally, to move from an upper bound on ε-noise sensitivity to Gaussian surface area, we use standard folklore results [DHK+10, Kan11a, Bal13].
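The ε-noise sensitivity appearing in Result 2 can be estimated empirically; in the sketch below we rerandomize each coordinate independently with probability ε (one common convention) on a toy positive spectrahedron with parameters of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 40, 4

# Toy positive spectrahedron with illustrative (not necessarily regular) parameters.
As = [np.diag(rng.uniform(0.0, 1.0, size=k)) for _ in range(n)]
B = 3.7 * np.eye(k)

def F(x):
    """F(x) = [sum_i x_i A_i <= B], evaluated via the top eigenvalue."""
    return np.linalg.eigvalsh(sum(xi * A for xi, A in zip(x, As)) - B)[-1] <= 0

def noise_sensitivity(F, n, eps, trials=4000):
    """Estimate Pr[F(x) != F(y)], where y rerandomizes each coordinate w.p. eps."""
    flips = 0
    for _ in range(trials):
        x = rng.choice([-1, 1], size=n)
        y = np.where(rng.random(n) < eps, rng.choice([-1, 1], size=n), x)
        flips += F(x) != F(y)
    return flips / trials

ns_small, ns_large = noise_sensitivity(F, n, 0.01), noise_sensitivity(F, n, 0.1)
print(ns_small, ns_large)   # more noise means more sensitive; both well below 1/2
```

Result 2 asserts that this quantity grows only like O(√ε) for positive spectrahedrons, matching the Peres-type behavior of halfspaces.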
Gaussian anti-concentration of polytopes follows directly from the fact that the Gaussian surface area of polytopes is bounded, since their surface has only finitely many normal vectors; this is crucially used in [HKM13, ST17, CDS19]. However, it is not clear how to obtain Gaussian anti-concentration of positive spectrahedrons even with bounded Gaussian surface area (as proven in Result 2), due to their complicated geometric structure. Here, to move from mollifier closeness to CDF closeness, we prove Boolean anti-concentration for positive spectrahedrons, which is in fact stronger than Gaussian anti-concentration, inspired by the Boolean anti-concentration for polytopes in [OST19].

Regularity condition. Before explaining the Boolean anti-concentration, we need to revisit the regularity condition, which is also used for polytopes. In [HKM13, ST17], it is assumed that every halfspace (or row of the matrix W) satisfies ∥W_i∥_2 = 1 and ∥W_i∥_∞ ≤ τ. One important question is: what is a regularity assumption for spectrahedrons, and under which assumptions can we show anti-concentration? A natural possibility is to see if Nazarov's result [Naz03] holds for spectrahedrons (i.e., to show anti-concentration in the weaker Gaussian setting). To the best of our knowledge, this has firstly not been studied in the literature. Moreover, it is not hard to see that, in order for Nazarov's proof to work for spectrahedrons, one could make the very strong assumption that every A_i satisfies λ_min(A_i) ≥ 1. However, this seems to significantly restrict the class of spectrahedrons. In order to resolve this, we propose (τ, M)-regularity as defined in Eq. (1) and prove a stronger statement, i.e., Boolean anti-concentration for (τ, M)-regular positive spectrahedrons. We use this statement to go from closeness between the mollifiers Õ(Σ_i x_i A_i − B) and Õ(Σ_i g_i A_i − B) (which we already established in Eq. (3)) to closeness between [Σ_i x_i A_i ⪯ B] and [Σ_i g_i A_i ⪯ B]. In this direction, we prove a Littlewood-Offord-type theorem for positive spectrahedrons.

Result 3 (Littlewood-Offord for positive spectrahedrons). Suppose (A_1, . . . , A_n) are (τ, M)-regular. Then for every Λ > 0, we have

Pr_{x∼U_n}[ λ_max(Σ_i x_i A_i − B) ∈ [−Λ, Λ] ] ≤ O(Λ).

The classic Littlewood-Offord theorem [LO39, Erd45] is an anti-concentration inequality which, for a halfspace w ∈ R^n (satisfying |w_i| ≥ 1) and α ∈ R, bounds the probability that Σ_i w_i x_i ∈ [α, α + 2] (where x ∼ U_n). In [OST19] this was generalized to intersections of halfspaces, and in the result above we show a matrix version of the Littlewood-Offord theorem. Intuitively, our statement shows that the largest eigenvalue of a positive spectrahedron cannot be very concentrated in a small region (i.e., small eigenvalue regions have small measure over the Boolean cube).

The proof of our result is similar to the proofs in [Kan14a, OST19], which show anti-concentration for intersections of unate functions (which is the case for positive spectrahedrons). There are a couple of subtleties for us: in [OST19], they perform a random "bucketing" of the coordinates in a polytope and show that, with high probability, each bucket has "significant" weight, which follows immediately from the Paley-Zygmund inequality. However, for us, random bucketing does not produce a positive spectrahedron (the same issue we faced in Theorem 2), so instead we need to bucket in a non-standard way to go from a positive spectrahedron to a bucket which corresponds to a unate function. Next, to show that each bucket has significant weight (which in our case corresponds to a large smallest eigenvalue), we invoke the matrix Chernoff bound for negatively correlated variables. We remark that higher-dimensional extensions of the Littlewood-Offord theorem [FF88, TV12] do not talk of the eigenspectrum of matrices and differ from our result.

Using the standard bits-to-Gaussians trick, this also gives us Gaussian anti-concentration (i.e., the positive-spectrahedron analogue of Nazarov's result [Naz03], which was unknown as far as we are aware). Putting this together with our invariance principle we obtain our main result.

Result 4 (Fooling positive spectrahedrons). For every (τ, M)-regular positive spectrahedron S,

| E_{x∼U_n}[x ∈ S] − E_{g∼G_n}[g ∈ S] | ≤ poly(M, log k, τ).   (4)

Apart from the application of constructing pseudorandom generators (which we discuss in the next section), we believe that our invariance principle for the Bentkus mollifier of arbitrary spectrahedrons, the opening up of the Bentkus mollifier (i.e., understanding the Bentkus functions, which were used almost as a black-box in [HKM13, ST17, OST19]), the Littlewood-Offord theorem for positive spectrahedrons and the Gaussian surface area bound for positive spectrahedrons could be of independent interest.

1.6 Applications

We now briefly discuss how to use the invariance principle to obtain our pseudorandom generator. Our construction is based on the Meka-Zuckerman [MZ13]
PRG construction for fooling halfspaces. We note in passing that this same PRG (with different parameters) was also used by [HKM13, ST17], and a slight modification of it by [OST19]. We omit the details of the PRG construction here, referring the interested reader to Section 6.3 for an explicit construction.

One subtlety in going from the invariance principle to fooling the MZ generator is the following: recall that our invariance principles showed that the expected value under the uniform distribution is close to the expected value under the Gaussian distribution. However, in order to fool the MZ generator, one needs to show that the invariance principle proofs also hold for k-wise independent distributions. In this direction, we first use a neat trick from [OST19] which shows that, in order to establish invariance principles for k-wise independent distributions, it suffices to show just Boolean anti-concentration; second, we crucially use the fact that the matrix Rosenthal inequality can be derandomized by analyzing its original proof. Put together, this shows that our invariance principle proof holds for k-wise independent distributions and gives us our main PRG result.
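For intuition about limited independence, here is the textbook pairwise-independent (k = 2) construction over GF(2) — output bit i is the seed's inner product with (binary(i), 1) — with seed length log n + 1. This is a generic construction for illustration, not the Meka-Zuckerman generator itself:

```python
import itertools
import numpy as np

def pairwise_bits(seed, n):
    """Expand an (m+1)-bit seed into n pairwise-independent bits over GF(2):
    bit_i = <seed, (binary(i), 1)> mod 2, for i = 0..n-1."""
    m = len(seed) - 1
    out = []
    for i in range(n):
        vec = [(i >> j) & 1 for j in range(m)] + [1]
        out.append(sum(s * v for s, v in zip(seed, vec)) % 2)
    return out

n, m = 8, 3   # n = 2^m output bits from an (m+1)-bit seed
# Exhaustively verify pairwise independence over all 2^(m+1) seeds.
seeds = list(itertools.product([0, 1], repeat=m + 1))
samples = np.array([pairwise_bits(list(s), n) for s in seeds])

for i, j in itertools.combinations(range(n), 2):
    for a, b in itertools.product([0, 1], repeat=2):
        frac = np.mean((samples[:, i] == a) & (samples[:, j] == b))
        assert abs(frac - 0.25) < 1e-9   # every pair is uniform on {0,1}^2
print("pairwise independence verified for all", n, "output bits")
```

The general k-wise case replaces the inner products with evaluations of a random degree-(k−1) polynomial over a finite field, at a seed cost of roughly k log n.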
Result 5 (PRG for positive spectrahedrons). Let S be a (τ, M)-regular positive spectrahedron. There exists a PRG G : {0, 1}^r → {−1, 1}^n with seed length r = (log n) · poly(log k, M, 1/δ) that δ-fools S with respect to the uniform distribution, for every τ ≤ poly(δ/(log k · M)).

Learning geometric objects is a fundamental problem in computational learning theory. An application of upper bounding the noise sensitivity or Gaussian surface area of spectrahedrons (in Theorem 2) is in agnostic learning. The agnostic learning framework introduced by [KSS94, Hau92] is the following: let
C ⊆ {c : {−1, 1}^n → {0, 1}} be a concept class and D : {−1, 1}^n × {0, 1} → [0,
1] be a distribution. Define

opt(C) = min_{c∈C} Pr_{(x,b)∼D}[c(x) ≠ b],

i.e., the error of the best approximation to D from within the concept class. The goal of an agnostic learner is the following: given many samples (x, b) ∼ D, produce a hypothesis h : {−1, 1}^n → {0, 1} which satisfies

Pr_{(x,b)∼D}[h(x) ≠ b] ≤ opt(C) + ε.

Note that if opt(C) = 0, this is the standard PAC learning framework, and agnostic learning models learnability under adversarial noise. A natural restriction of this model is when the marginal of D on the first n bits is the uniform distribution on {−1, 1}^n. It is a folklore result [KOS04] that a function f having low noise sensitivity can be approximated by low-degree polynomials (see [HKM13, Lemma 2.7] for an explicit statement). Furthermore, the well-known L1-polynomial regression algorithm [KKMS08] shows how to learn low-degree polynomials in the agnostic framework. Putting these two connections together gives us the following theorem.

Result 6 (Learning positive spectrahedrons). The concept class of positive spectrahedrons (in n variables with k × k symmetric matrices) can be agnostically learned under the uniform distribution in time n^{O(log k)} for every constant error parameter.

The previous best known result [KOS08] for learning positive spectrahedrons, even in the PAC model, was 2^{O(n^{1/2})} (as far as we are aware); our result provides a substantially better complexity.

Discrepancy sets for spectrahedrons

Understanding discrepancy sets for convex objects is a fundamentally important problem in the fields of convex geometry, optimization, and a range of other areas. Prior works of [HKM13, ST17, OST19] constructed such discrepancy sets for polytopes, but a natural question is to extend their constructions to spectrahedrons.
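As a toy illustration of the counting problem behind discrepancy sets (a hypothetical 2-variable, 2 × 2 instance of ours, not an example from the paper): the {−1, 1}^n-volume of a positive spectrahedron can be computed exactly by enumeration, and a discrepancy set is a small set whose empirical fraction δ-approximates this volume.

```python
from itertools import product

# Hypothetical toy instance: x1*A1 + x2*A2 <= B with PSD A1, A2 (k = 2).
A1 = [[1.0, 0.0], [0.0, 0.0]]
A2 = [[0.5, 0.5], [0.5, 0.5]]
B  = [[1.0, 0.0], [0.0, 1.0]]

def psd_2x2(M):
    # A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0.
    tr  = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return tr >= -1e-12 and det >= -1e-12

def in_spectrahedron(x):
    M = [[B[i][j] - x[0] * A1[i][j] - x[1] * A2[i][j] for j in range(2)]
         for i in range(2)]
    return psd_2x2(M)

cube = list(product((-1, 1), repeat=2))
volume = sum(in_spectrahedron(x) for x in cube) / len(cube)
print("fraction of the Boolean cube inside S:", volume)
```

For large n this brute-force count is infeasible, which is exactly why a deterministic small discrepancy set (Result 7 below in spirit) is useful.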
In our context, one application of our main result can be viewed as follows: consider the family of all positive spectrahedrons over the Boolean cube, S = {x ∈ {−1, 1}^n : Σ_i x_i A_i ⪯ B}; can we construct a small subset of the Boolean cube {−1, 1}^n which δ-approximates the {−1, 1}^n-volume of every positive spectrahedron? One way to construct such a set is to construct a PRG for this class of functions. So an immediate corollary of our
PRG for positive spectrahedrons is the following theorem.

Result 7 (Discrepancy set for positive spectrahedrons). There is a deterministic algorithm which, given a (τ, M)-regular positive spectrahedron S, runs in time exp(log n, log k, M, 1/δ) and outputs a δ-approximation of the number of points in {−1, 1}^n contained in S, as long as τ ≤ poly(δ/(M log k)).

Constructing
PRGs for
PTFs has received a lot of attention. However, the best known seed length for fooling a degree-k PTF on n bits scales as O(log n · k) (over the Boolean space). A simple observation we make is that fooling spectrahedrons (on n bits with k × k matrices) can in fact be viewed as the more challenging task of fooling an intersection of k many degree-k PTFs.

Recall that a spectrahedron is given by S = {x ∈ R^n : B − Σ_i x_i A_i ⪰ 0}. Without loss of generality, we may assume that the measure of x satisfying det(B − Σ_i x_i A_i) = 0 is zero. Sylvester's criterion implies that a matrix M (which in our case is B − Σ_i x_i A_i) is positive definite if and only if the determinants of the k leading principal minors of M are all positive. Hence, an alternate characterization of S is, modulo a zero-measure set,

S = ∧_{r=1}^k [det((B − Σ_i x_i A_i)_{r×r}) > 0] = ∧_{r=1}^k sign[p_r(x)],

where M_{r×r} denotes the top-left r × r principal minor of M. Clearly each determinantal expression produces a polynomial p_r of degree at most r. So our main result about fooling S shows that there is a structured class of intersections of degree-k PTFs (i.e., polynomials which can be written as above) which can be fooled by a
PRG with seed length O(log n · log k · M/δ), which is exponentially better than using existing
PRGs for
PTFs. We remark that, a priori, it is not even clear why an arbitrary polynomial should correspond to a spectrahedron as above. However, a well-known result of [HMV06, GM12] states that an arbitrary degree-d polynomial p ∈ R[x_1, . . . , x_n] with real coefficients has a symmetric determinantal representation, i.e., there exist symmetric matrices A_0, A_1, . . . , A_n such that

p(x_1, . . . , x_n) = det(A_0 + Σ_i x_i A_i),

where each A_i ∈ Sym_{(n+d choose d)}. So, if we could fool arbitrary spectrahedrons, that might be a promising avenue to fool PTFs and intersections of
PTFs. We remark that counting integer solutions to positive spectrahedrons is not as naturally motivated as it is for polytopes, but nevertheless understanding discrepancy sets for geometric objects is a fundamental question. See [Qua12] for a simple linear-algebraic proof of the determinantal-representation statement above.

Future work

Our work opens a new line of research into understanding
PRGs for spectrahedrons with several novel techniques. This raises several questions for future work.

1. Can we remove regularity for positive spectrahedrons? One of the crucial techniques that Servedio and Tan [ST17] introduced (inspired by a prior work of Servedio [Ser06]) was decomposing a polytope into head and tail variables (i.e., the tail coordinates of a halfspace satisfy regularity and the head coordinates are the dominant variables). They express the head variables as a CNF, use the result of Bazzi [Baz09] to fool the head variables, and use invariance principles for the tail variables. However, in our setting it is unclear how to break up a single spectrahedron into head and tail variables, and even if this were possible, what is the analogue of the CNF in our setting?

2. Can we fool arbitrary spectrahedrons? Besides the difficulty in removing the regularity condition, another fundamental barrier we face here is anti-concentration. Even the Gaussian surface area of an arbitrary spectrahedron is unknown (as far as we are aware). Our techniques, such as bucketing, using Kane's result [Kan14a], and Boolean anti-concentration [OST19], crucially use the assumption of positivity. Going beyond this might require new understanding of the geometric structure (like average sensitivity and noise sensitivity) of arbitrary spectrahedrons.

3. A general invariance principle for spectral functions? Here, we showed our invariance principle specifically for the Bentkus mollifier. However, like the result of [HKM13], can we prove a general invariance principle for arbitrary smooth spectral functions? Given their applications, invariance principles are now considered powerful techniques in computational complexity theory. Having an invariance principle for spectral functions could find more applications, such as deciding noisy entangled quantum games [Yao19].

4. Can we fool spectrahedral caps?
Let S^{n−1} = {x ∈ R^n : ‖x‖ = 1} denote the n-dimensional sphere; a spectrahedral cap is then the subset of S^{n−1} that is "cut out" by a spectrahedron, i.e., for a spectrahedron S, we define the spectrahedral cap C_S as C_S = S^{n−1} ∩ S. In the polytope setting, fooling spherical caps has received a lot of attention classically [HKM13, KM15] (with almost-optimal seed-length PRGs). Can we similarly fool spectrahedral caps?

5. Fooling polynomial threshold functions? Can we make progress in finding better
PRGs for
PTFs using the techniques we developed here for fooling arbitrary spectrahedrons?
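As a sanity check on the determinantal view connecting spectrahedra and PTFs (a hand-picked toy instance of ours, not the [HMV06, GM12] construction): the polynomial p(x1, x2) = x1·x2 has the symmetric determinantal representation A0 = 0, A1 = diag(1, 0), A2 = diag(0, 1).

```python
from itertools import product

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Hypothetical toy symmetric determinantal representation of p(x1, x2) = x1*x2.
A0 = [[0, 0], [0, 0]]
A1 = [[1, 0], [0, 0]]
A2 = [[0, 0], [0, 1]]

def rep(x1, x2):
    M = [[A0[i][j] + x1 * A1[i][j] + x2 * A2[i][j] for j in range(2)]
         for i in range(2)]
    return det2(M)

# Verify det(A0 + x1*A1 + x2*A2) = x1*x2 on an integer grid.
for x1, x2 in product(range(-3, 4), repeat=2):
    assert rep(x1, x2) == x1 * x2
print("det(A0 + x1*A1 + x2*A2) agrees with p(x1, x2) = x1*x2 on the grid")
```

The general [HMV06, GM12] representation replaces this hand-picked pair with symmetric matrices of dimension binom(n+d, d).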
Acknowledgements.
We thank Jop Briët and Minglong Qin for helpful comments. This collaboration earlier faced some bureaucratic issues. We are deeply grateful for the support from Jelani Nelson, Kewen Wu, Yitong Yin and others in the TCS community. P.Y. was supported by the National Key R&D Program of China 2018YFB1003202, the National Natural Science Foundation of China (Grant No. 61972191), the Program for Innovative Talents and Entrepreneurs in Jiangsu, the Fundamental Research Funds for the Central Universities 0202/14380068 and the Anhui Initiative in Quantum Information Technologies Grant No. AHY150100. Part of the work was done when P.Y. and S.A. were participating in the program "Quantum Wave in Computing" held at the Simons Institute for the Theory of Computing.
Organization.
In Section 2 we introduce all the mathematical preliminaries used in this paper, and state various lemmas in random matrix theory and multidimensional calculus. In Section 3, we introduce the Bentkus mollifier and discuss its properties. In Section 4 we state our main theorem regarding spectral derivatives of smooth functions and go on to bound the spectral derivatives of the Bentkus function (proving a technical lemma in Appendix A). In Section 5 we prove an upper bound on the Gaussian surface area of positive spectrahedrons as well as our Littlewood-Offord theorem for this class. In Section 6 we prove our invariance principle theorem and go on to construct a pseudorandom generator for the class of positive spectrahedrons.
For an integer n ≥
1, let [n] represent the set {1, . . . , n}. Given a finite set X and a natural number k, let X^k be the set X × · · · × X, the k-fold Cartesian product of X. Given a = (a_1, . . . , a_k) and a set S ⊆ [k], we write a_S and a_{−S} to represent the projections of a onto the coordinates specified by S and onto the coordinates outside S, respectively. For any i ∈ [k], a_{−i} represents (a_1, . . . , a_{i−1}, a_{i+1}, . . . , a_k), and a_{<i}, a_{≥i} are defined similarly. Let µ be a probability distribution on X, and let µ(x) represent the probability of x ∈ X according to µ. Let X be a random variable distributed according to µ. We use the same symbol to represent a random variable and its distribution whenever it is clear from the context. The expectation of a function f on X is defined as E[f(X)] = E_{x∼X}[f(x)] = Σ_{x∈X} Pr[X = x] · f(x) = Σ_x µ(x) · f(x), where x ∼ X represents that x is drawn according to X. For any event E(x) on x, [E(x)] represents the indicator function of E. In this paper, the lower-cased letters in bold x, y, z, . . . are reserved for random variables.

Distributions.
Throughout, we denote by G the standard Gaussian distribution N(0, 1) on R, with mean 0 and variance 1. We denote by U_n the uniform distribution on {−1, 1}^n. We say a sequence of random variables X = (x_1, . . . , x_n) is t-wise uniform if any subset of X of size t is uniformly distributed (observe that the uniform distribution is clearly t-wise uniform for every t ≥ 1). A family H of functions [n] → [m] is said to be an r-wise uniform hash family if, for h ∼ H, the tuple (h(1), . . . , h(n)) is r-wise uniform.

For any f : R → R in C^d, the set of all real functions that are d-times differentiable, we use f^{(d)} to denote the d-th derivative of f. Given a function F : R^k → R and a k-dimensional multi-index α = (α_1, . . . , α_k) ∈ N^k, ∂_α F denotes the mixed partial derivative taken α_i times in the i-th coordinate.

Fact 1.
Let k ∈ N and f : R^k → R be a C^d function. Then for all x, y ∈ R^k,

f(x + y) = Σ_{α ∈ N^k : |α| ≤ d−1} (∂_α f(x)/α!) · Π_{i=1}^k y_i^{α_i} + err(x, y),

where α! = α_1! · · · α_k!, |α| = Σ_i α_i, and

|err(x, y)| ≤ sup_{v ∈ R^k} ( Σ_{α ∈ N^k : |α| = d} |∂_α f(v)| ) · max_i |y_i|^d.

For a t-times differentiable function f : R^k → R and s ≤ t, define

‖f^{(s)}‖_1 = max { Σ_{p_1,...,p_s ∈ [k]} |∂_{p_1} · · · ∂_{p_s} f(x)| : x ∈ R^k }.

Definition 2. Let f : R → R. For any distinct inputs x_1, . . . , x_{i+1} ∈ R, the divided difference is defined recursively as follows:

f^{[0]} = f,

f^{[i]}(x_1, . . . , x_{i+1}) = (f^{[i−1]}(x_1, . . . , x_{i−1}, x_i) − f^{[i−1]}(x_1, . . . , x_{i−1}, x_{i+1})) / (x_i − x_{i+1}).

For other values of x_1, . . . , x_{i+1}, f^{[i]} is defined by continuous extension.

Fact 3 (Mean value theorem for divided differences [Boo05]). For any f ∈ C^n and any x_1, . . . , x_{n+1}, there exists ξ ∈ (min{x_1, . . . , x_{n+1}}, max{x_1, . . . , x_{n+1}}) such that

f^{[n]}(x_1, . . . , x_{n+1}) = f^{(n)}(ξ)/n!.

Let f : {−1, 1}^n → {0, 1}, g : R^n → {0, 1} and S be a Borel set in R^n. We define the following combinatorial properties of the Boolean-valued functions f, g.

1. Average sensitivity: AS(f) = Σ_{i=1}^n Pr_x[f(x) ≠ f(x ⊕ e_i)], where the probability is taken uniformly over {−1, 1}^n.

2. ε-Noise sensitivity: NS_ε(f) = Pr_{x,y}[f(x) ≠ f(y)], where the probability is taken according to the following distribution: x is uniformly random in {−1, 1}^n and y is obtained from x by independently flipping each x_i with probability ε.

3. Gaussian noise sensitivity: GNS_ε(g) = Pr_{x,z}[g(x) ≠ g(y)], where x, z are independent random Gaussian vectors distributed as G^n, and y = (1 − ε)x + √(2ε − ε²) · z.

4.
Gaussian surface area: GSA(S) = lim inf_{δ→0} G^n(S_δ \ S)/δ, where S_δ = {x : dist(x, S) ≤ δ} denotes the δ-neighborhood of S under the Euclidean distance.

We refer interested readers to [O'D14] for more on these parameters and their applications to the analysis of Boolean functions.

For any integer k >
0, we use
Mat_k and Sym_k to represent the set of k × k real matrices and the set of k × k real symmetric matrices, respectively. For any matrix X, ‖X‖_p represents the Schatten p-norm of X and ‖X‖ represents the spectral norm of X. I_k represents the k × k identity matrix. The subscript k may be omitted whenever the dimension is clear from the context. We need the following results from matrix analysis.

Fact 4 ([Bha00]). For any k × k real symmetric matrix A, let B be the upper-triangular part of A, namely B_{i,j} = A_{i,j} if i ≤ j and B_{i,j} = 0 otherwise. Then ‖B‖ ≤ O(log k) · ‖A‖.

Fact 5 ([Tro12, Theorem 1.1]). Let n, k ≥ 1 be integers and X_1, . . . , X_n be independent random k × k real symmetric matrices satisfying 0 ⪯ X_i ⪯ R · I for i ∈ [n]. Set µ = λ_min(Σ_{i=1}^n E[X_i]). Then

Pr[λ_min(Σ_{i=1}^n X_i) ≤ (1 − δ)µ] ≤ k · (e^{−δ}/(1 − δ)^{1−δ})^{µ/R}

for every δ ∈ [0, 1).

Fact 6.
For every integer m ≥ 1 and A_1, . . . , A_n ∈ Sym_k, it holds that

E[‖Σ_i g_i A_i‖^m] ≤ (1 + 2m⌈log k⌉)^{m/2} · ‖Σ_i (A_i)²‖^{m/2} and E[‖Σ_i x_i A_i‖^m] ≤ (1 + 2m⌈log k⌉)^{m/2} · ‖Σ_i (A_i)²‖^{m/2},

where the expectations are taken over x ∼ U_n and g ∼ G^n, respectively. Additionally, the second inequality still holds if x is m⌈log k⌉-wise uniform.

Proof. It suffices to prove the second inequality, as the first one follows by the standard bits-to-Gaussians trick [O'D14, Chapter 11]. Let B = Σ_i x_i A_i, where x ∼ U_n. The proof closely follows the argument in [Tro16], where Tropp proved the case m = 1. For any integer p ≥
1, it is proved in [Tro16, Eqs. (4.9), (4.11)] that

E[Tr B^{2p}] ≤ k · ((2p + 1)/e)^p · ‖Σ_i (A_i)²‖^p.

Thus

E[‖B‖^m] ≤ (E[Tr B^{2pm}])^{1/(2p)} ≤ k^{1/(2p)} · ((2pm + 1)/e)^{m/2} · ‖Σ_i (A_i)²‖^{m/2}.

Setting p = ⌈log k⌉, we conclude the result.

Fact 7 (Matrix Rosenthal inequality [MJC+
14, Corollary 7.4]). Let X_1, . . . , X_n be centered, independent random real symmetric matrices. Then

(E[‖Σ_i X_i‖_{2p}^{2p}])^{1/(2p)} ≤ √(4p − 2) · ‖(Σ_i E[X_i²])^{1/2}‖_{2p} + (4p − 2) · (Σ_i E[‖X_i‖_{2p}^{2p}])^{1/(2p)}.

This inequality still holds if X_1, . . . , X_n are p-wise independent.

Let f : R^k → R and λ : Sym_k → R^k, where λ(X) = (λ_1(X), . . . , λ_k(X)) are the eigenvalues of X sorted in non-increasing order. We use λ_max and λ_1 interchangeably. Let F = f ∘ λ : Sym_k → R. If f : R → R is an analytic function on R, namely its Taylor series converges on all of R, we define f(X) for general matrices using its Taylor expansion. It is not hard to see that the Taylor series still converges on matrix inputs. If X is symmetric with a spectral decomposition X = U D U^T, where D = diag(λ_1(X), . . . , λ_k(X)), then f(X) = U diag(f(λ_1(X)), . . . , f(λ_k(X))) U^T.

The Fréchet derivative is a notion of derivative defined on Banach spaces. In this paper, we are only concerned with Fréchet derivatives on matrix spaces. Readers may refer to [Col12] for a more thorough treatment. The Fréchet derivatives are the maps defined as follows.

Definition 8. Given integers m, n ≥ 1, a map F : Mat_m → Mat_n and P, Q ∈ Mat_m, the Fréchet derivative of F at P with respect to Q is defined to be

DF(P)[Q] = (d/dt) F(P + tQ) |_{t=0}.

The k-th order Fréchet derivative of F at P with respect to (Q_1, . . . , Q_k) is defined to be

D^k F(P)[Q_1, . . . , Q_k] = (d/dt) D^{k−1} F(P + tQ_k)[Q_1, . . . , Q_{k−1}] |_{t=0}.

Fréchet derivatives share many common properties with derivatives in Euclidean spaces, such as linearity, composition rules, Taylor expansions, etc. We refer the interested reader to [Col12] for more. Some basic properties of Fréchet derivatives are summarized in Fact 9 below.
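Definition 8 can be checked numerically: for the map F(X) = X², the product rule for Fréchet derivatives (Fact 9, item 2, below) gives DF(P)[Q] = PQ + QP, which a finite-difference quotient reproduces. This is an illustrative sketch of ours in pure Python; all helper names are ours.

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def F(X):
    # Test map F(X) = X^2; by the product rule, DF(P)[Q] = PQ + QP.
    return mat_mul(X, X)

P = [[1.0, 2.0], [2.0, 0.5]]
Q = [[0.0, 1.0], [1.0, 3.0]]

t = 1e-6
# Finite-difference approximation of the Frechet derivative (Definition 8):
# DF(P)[Q] ~ (F(P + tQ) - F(P)) / t for small t.
num = mat_scale(1.0 / t, mat_add(F(mat_add(P, mat_scale(t, Q))),
                                 mat_scale(-1.0, F(P))))
exact = mat_add(mat_mul(P, Q), mat_mul(Q, P))
err = max(abs(num[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print("max deviation between finite difference and PQ + QP:", err)
```

The residual is of order t·‖Q²‖, consistent with the first-order Taylor expansion F(P + tQ) = P² + t(PQ + QP) + t²Q².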
Fact 9. [Bha13, Chapter X.4] Given
F, G : Mat_n → Mat_m and P, Q, Q_1, . . . , Q_k ∈ Mat_n, the following hold:

1. D(F + G)(P)[Q] = DF(P)[Q] + DG(P)[Q].

2. D(F · G)(P)[Q] = DF(P)[Q] · G(P) + F(P) · DG(P)[Q].

3. If m = n, then D(F ∘ G)(P)[Q] = DF(G(P))[DG(P)[Q]] (the chain rule).

4. D^k F(P)[Q_1, . . . , Q_k] = D^k F(P)[Q_{σ(1)}, . . . , Q_{σ(k)}] for every k > 1 and permutation σ ∈ S_k.

The following fact states that Fréchet derivatives can be expressed as divided differences.
Fact 10 ([BLZ05]). Let f : R → R be an analytic function and X = diag(x_1, . . . , x_k) be a diagonal matrix whose spectrum is in R. For any matrices A, B, the following hold:

Df(X)[A] = ( f^{[1]}(x_{i_1}, x_{i_2}) · A_{i_1,i_2} )_{1 ≤ i_1,i_2 ≤ k}, (5)

D²f(X)[A, B] = ( Σ_{j=1}^k f^{[2]}(x_{i_1}, x_j, x_{i_2}) · A_{i_1,j} B_{j,i_2} )_{1 ≤ i_1,i_2 ≤ k}. (6)

Fact 11 (Dyson's expansion [Bha13, Chapter X.4]). Let f(x) = e^x. For any X ∈ Sym_k and A ∈ Mat_k, it holds that

Df(X)[A] = ∫_0^1 du · e^{(1−u)X} A e^{uX}.

Lemma 12.
Let f(x) = e^{−x²/2}. It holds that

D²f(X)[A, B] = (1/4) ∫_0^1 du ∫_0^1 dv (1−u) · e^{−(1−u)(1−v)X²/2} (XB + BX) e^{−(1−u)vX²/2} (XA + AX) e^{−uX²/2}
+ (1/4) ∫_0^1 du ∫_0^1 dv u · e^{−(1−u)X²/2} (XA + AX) e^{−u(1−v)X²/2} (XB + BX) e^{−uvX²/2}
− (1/2) ∫_0^1 du e^{−(1−u)X²/2} (AB + BA) e^{−uX²/2}.

(In [BLZ05, Lemma 3.8] this fact is proven when A = B is a symmetric matrix, and it is not hard to generalize their proof to obtain Eqs. (5), (6).) In particular, if A = B = H is a symmetric matrix, then

D²f(X)[H, H] = (1/4) ∫_0^1 du ∫_0^1 dv (1−u) · e^{−(1−u)(1−v)X²/2} (XH + HX) e^{−(1−u)vX²/2} (XH + HX) e^{−uX²/2}
+ (1/4) ∫_0^1 du ∫_0^1 dv u · e^{−(1−u)X²/2} (XH + HX) e^{−u(1−v)X²/2} (XH + HX) e^{−uvX²/2}
− (1/2) ∫_0^1 du e^{−(1−u)X²/2} H² e^{−uX²/2}.

Note that f(x) = e^{−x²/2} is analytic on R. Thus it is valid to define f on arbitrary matrices.

Proof.
For any t ∈ (0, 1], let g(x) = e^{−tx²}. By the definition of the Fréchet derivative,

Dg(X)[A] = lim_{ε→0} (1/ε) (e^{−t(X+εA)²} − e^{−tX²})
= lim_{ε→0} (1/ε) (e^{−t(X² + ε(XA+AX) + ε²A²)} − e^{−tX²})
= lim_{ε→0} (1/ε) (e^{−t(X² + ε(XA+AX))} + O(ε²) − e^{−tX²})
= −t ∫_0^1 du · e^{−(1−u)tX²} (XA + AX) e^{−utX²},

where the third equality is from the fact that ‖e^{X+εY} − e^X‖ = O(ε) and the last equality is from Fact 11. Setting t = 1/2, we have

Df(X)[A] = −(1/2) ∫_0^1 du · e^{−(1−u)X²/2} (XA + AX) e^{−uX²/2}.

Taking one more derivative in X with respect to B, we conclude the result.

Definition 13.
Given τ, M > 0, we say a sequence of k × k positive semidefinite matrices (A_1, . . . , A_n) is (τ, M)-regular if

I ⪯ Σ_{i=1}^n (A_i)² ⪯ M · I and A_i ⪯ τ · I for every i ∈ [n]. (7)

A spectrahedron S ⊆ R^n is the feasible region of a semidefinite program, namely a set S = {x ∈ R^n : Σ_i x_i A_i ⪯ B} for some symmetric matrices A_1, . . . , A_n, B. We say S is a positive spectrahedron if either all the A_i s are positive semidefinite or all the A_i s are negative semidefinite (NSD). Moreover, it is (τ, M)-regular if either (A_1, . . . , A_n) or (−A_1, . . . , −A_n) is (τ, M)-regular. We say S is an intersection of positive spectrahedrons if S = S_1 ∩ S_2, where S_1 and S_2 are positive spectrahedrons whose matrices are all positive semidefinite and all negative semidefinite, respectively. Note that it suffices to consider intersections of two spectrahedrons, as one can pack all the PSD matrices into one large block-diagonal matrix (looking ahead, this will only affect the parameters in our main results by a logarithmic factor). Packing the corresponding B_i s in the same way, one gets a positive spectrahedron; the same holds for all the negative semidefinite matrices.

Pseudorandomness

Definition 14.
A function g : {−1, 1}^r → {−1, 1}^n with seed length r is said to δ-fool a function f : {−1, 1}^n → R if

| E_{s∼U_r}[f(g(s))] − E_{u∼U_n}[f(u)] | ≤ δ.

The function g is said to be an efficient pseudorandom generator (PRG) that δ-fools a class F of n-variable functions if g is computable by a deterministic uniform poly(n)-time algorithm and g δ-fools every function f ∈ F.

For ℓ ≥
1, let T_ℓ be an ℓ-tensor, i.e., T_ℓ : (R^k)^{×ℓ} → R. Note that an ℓ-tensor is uniquely defined by its coefficients {T_{i_1,...,i_ℓ} : i_1, . . . , i_ℓ ∈ [k]}. Below we abuse notation by letting T(i_1, . . . , i_ℓ) = T_{i_1,...,i_ℓ}. Often we will use the natural bijection between 2ℓ-tensors acting on R^k and ℓ-tensors acting on Mat_k, i.e., for a 2ℓ-tensor T : (R^k)^{×2ℓ} → R defined as

T(x^1, . . . , x^{2ℓ}) = Σ_{i_1,...,i_{2ℓ} ∈ [k]} T(i_1, . . . , i_ℓ, i_{ℓ+1}, . . . , i_{2ℓ}) · x^1_{i_1} · · · x^{2ℓ}_{i_{2ℓ}},

we can also view T as T' : (Mat_k)^{×ℓ} → R, defined by rearranging the terms above to obtain

T'(X^1, . . . , X^ℓ) = Σ_{i_1,j_1 ∈ [k]} Σ_{i_2,j_2 ∈ [k]} · · · Σ_{i_ℓ,j_ℓ ∈ [k]} T(i_1, . . . , i_ℓ, j_1, . . . , j_ℓ) · X^1_{i_1,j_1} · · · X^ℓ_{i_ℓ,j_ℓ}.

Finally, we define a "permutation folding" operator which takes a t-tensor on R^k as defined above and a permutation σ ∈ S_t, and produces a t-tensor on Mat_k.

Definition 15 (Definition of diag_σ T). Let T : (R^k)^{×t} → R be a t-tensor and σ ∈ S_t. Then we define diag_σ T : (Mat_k)^{×t} → R as the map

(diag_σ T)((i_1, j_1), . . . , (i_t, j_t)) = T(i_1, . . . , i_t) if ~i = σ~j, (8)

and 0 otherwise.

In this paper, we are interested in smooth approximators of the function ψ : R^k → R defined as

ψ(x) = [max_i x_i ≤ 0]. (9)

To this end, we introduce the mollifier defined by Bentkus in [Ben90] and establish several new properties of it. Readers may refer to [Ben90, FK20] for a more thorough treatment.

Definition 16 ([Ben90]). Let g(x) = ∫_{−∞}^x (1/√(2π)) e^{−t²/2} dt. For every integer k ≥ 1, define G : R^k → R as

G(x_1, . . . , x_k) = Π_{i=1}^k g(x_i).

Properties of the mollifier and its derivatives

It is easy to calculate that

g'(x) = (1/√(2π)) e^{−x²/2}, (10)

g''(x) = −(x/√(2π)) e^{−x²/2}, (11)

g'''(x) = (1/√(2π)) (x² − 1) e^{−x²/2}. (12)

In order to simplify many calculations, we introduce the function

ḡ(x) = g'(x)/g(x). (13)

Fact 17.
[FK20] It holds that

ḡ'(u) = −(u + ḡ(u)) · ḡ(u); (14)

ḡ''(u) = (u² − 1) · ḡ(u) + 3u · ḡ(u)² + 2ḡ(u)³. (15)

Also, ḡ is positive and monotone decreasing on R, and ḡ' is negative and monotone increasing on R.

Fact 18 ([Fel68, Section 7.1]). For any x > 0, it holds that

(e^{−x²/2}/√(2π)) · (1/x − 1/x³) ≤ 1 − g(x) ≤ e^{−x²/2}/(x√(2π)).

The following lemma immediately follows from Fact 17 and Fact 18.
Lemma 19.
For any ∆ ≥ 1 and x ∈ R with |x| ≤ ∆, it holds that

|ḡ(x)| ≤ O(∆), |ḡ'(x)| ≤ O(∆) · |ḡ(x)|, |ḡ''(x)| ≤ O(∆²) · |ḡ(x)|.

Fact 20 ([Ben90]). For any integers t, k ≥ 1 and any x ∈ R^k,

‖G^{(t)}(x)‖_1 ≤ C_t log^{t/2}(k + 1) (16)

for some constant C_t depending only on t.

Lemma 21.
For any x ∈ R^k, if there exist more than 2 log k indices satisfying x_i ≤ 0, then ‖G^{(1)}(x)‖_1 ≤ O(1/k).

Proof. Note that g(z) ≤ 1/2 if z ≤
0. Let T = {i : x_i ≤ 0}. Then

‖G^{(1)}(x)‖_1 = Σ_{i=1}^k |g'(x_i) · Π_{j≠i} g(x_j)| = Σ_{i∈T} |g'(x_i) · Π_{j≠i} g(x_j)| + |Σ_{i∉T} g'(x_i) · Π_{j≠i} g(x_j)|
≤ |T| · (1/2)^{|T|−1} + (1/2)^{|T|} · |Σ_{i∉T} g'(x_i) · Π_{j≠i, j∉T} g(x_j)|
≤ |T| · (1/2)^{|T|−1} + 2√k · (1/2)^{|T|},

where the equality used that the terms are all positive and the second inequality is from Fact 20. The upper bound is O(1/k) if |T| ≥ 2 log k.

Claim 22.
For any x > y, it holds that

|(g(x)g'(y) − g'(x)g(y)) / (x − y)| ≤ (1 + |x|) exp(−y²/2) = (1 + |x|) · g'(y) · √(2π). (17)

Proof.

|(g(x)g'(y) − g'(x)g(y)) / (x − y)|
= (1/2π) |∫_{−∞}^0 [exp(−(y² + (t + x)²)/2) − exp(−(x² + (t + y)²)/2)] / (x − y) dt|
≤ (1/2π) exp(−(x² + y²)/2) ∫_{−∞}^0 |exp(−t²/2)| · |(exp(−ty) − exp(−tx)) / (x − y)| dt
= (1/2π) exp(−(x² + y²)/2) ∫_{−∞}^0 |exp(−t²/2 − tx)| · |(1 − exp(−t(y − x))) / (y − x)| dt
≤ (1/2π) exp(−(x² + y²)/2) ∫_{−∞}^0 |exp(−t²/2 − tx) · t| dt
= (1/2π) exp(−y²/2) ∫_{−∞}^0 |exp(−(t + x)²/2) · t| dt
= (1/2π) exp(−y²/2) (exp(−x²/2) + √(2π)x − x ∫_x^∞ e^{−t²/2} dt)
≤ (1 + |x|) exp(−y²/2),

where the second inequality used |1 − e^{−z}| ≤ |z|.

For every θ > 0 and α ∈ R, we define the Bentkus mollifier as follows:

G_θ(x) = Pr_{g∼G^k}[x + α + θ g ≤
0]. (18)

It is not hard to verify that

G_θ(x) = Π_{i=1}^k ∫_{−∞}^{−x_i/θ} (1/√(2π)) e^{−t²/2} dt = G(−x_1/θ, . . . , −x_k/θ).

The following fact states that G_θ(· + α) and G_θ(· − α) are good approximators of ψ defined in Eq. (9), except on a small inner/outer region near the "boundary", which is made precise below.

Fact 23 (Lemma 6.7 and Fact 6.8 in [OST19]). For any δ, θ ∈ (0, 1) and x ∈ R^k, there exist Λ = Θ(θ · √(log(k/δ))) and α = Θ(θ · √(log(k/δ))) such that the following holds:

1. |G_θ(x + α) − ψ(x)| ≤ δ if max_i x_i ≤ −Λ;

2. |G_θ(x − α) − ψ(x)| ≤ δ if max_i x_i ≥ Λ;

3. G_θ(x + α) − δ ≤ ψ(x) ≤ G_θ(x − α) + δ for all x ∈ R^k,

where x + α denotes (x_1 + α, . . . , x_k + α).

Let A_i = diag(A_i^1, A_i^2) and D = diag(D^1, D^2) be block-diagonal matrices. To keep the notation succinct, we set A(x) = Σ_i x_i A_i − D.

Fact 24 ([OST19, Lemma 6.9]). Let k, δ, θ, Λ, α be parameters satisfying Fact 23. Let Ψ, Ψ_θ : Sym_k → R be the functions defined as Ψ(M) = ψ(λ(M)) and Ψ_θ(M) = G_θ(λ(M)), where ψ is defined in Eq. (9) and G_θ is defined in Eq. (18), and let x and x' be two random variables in R^n satisfying

|E[Ψ_θ(A(x) + βI)] − E[Ψ_θ(A(x') + βI)]| ≤ η

for both β = α and β = −α. Then it holds that

|E[Ψ(A(x))] − E[Ψ(A(x'))]| ≤ η + 3δ + Pr[λ_max(A(x)) ∈ (−Λ, Λ]].

Before we describe the main theorem of this section, we need notation introduced by Sendov in [Sen07] (Definition 25 below) to calculate the high-order Fréchet derivatives of spectral functions.
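Before moving on, a quick numeric sanity check (illustrative code of ours, ignoring the α-shift) that the mollifier of Eq. (18) behaves like the indicator ψ of Eq. (9) away from the boundary: G_θ(x) = Π_i Φ(−x_i/θ) is ≈ 1 when max_i x_i ≪ 0 and ≈ 0 when some x_i ≫ 0.

```python
from math import erf, sqrt

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

def G_theta(x, theta):
    """Bentkus mollifier (without the alpha-shift): prod_i Phi(-x_i/theta)."""
    p = 1.0
    for xi in x:
        p *= Phi(-xi / theta)
    return p

def psi(x):
    # The hard indicator being mollified: 1[max_i x_i <= 0].
    return 1.0 if max(x) <= 0 else 0.0

theta = 0.1
inside  = (-1.0, -0.8, -1.5)      # max coordinate well below 0
outside = (-1.0, 0.9, -1.5)       # one coordinate well above 0
print(G_theta(inside, theta), G_theta(outside, theta))   # ~1 and ~0
```

The transition happens in a window of width Θ(θ√log(k/δ)) around the boundary, which is exactly the region handled by Λ and α in Fact 23.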
Definition 25. [Sen07] Let t ≥ 1 and x ∈ R^k. Let T : (R^k)^{×t} → R be a t-tensor. For every ℓ ∈ [t], define a (t + 1)-tensor T^ℓ_out : (R^k)^{×(t+1)} → R as follows:

(T^ℓ_out)(i_1, . . . , i_{t+1}) = 0 if i_ℓ = i_{t+1}, and

(T^ℓ_out)(i_1, . . . , i_{t+1}) = (T(i_1, . . . , i_{ℓ−1}, i_{t+1}, i_{ℓ+1}, . . . , i_t) − T(i_1, . . . , i_{ℓ−1}, i_ℓ, i_{ℓ+1}, . . . , i_t)) / (x_{i_{t+1}} − x_{i_ℓ}) if i_ℓ ≠ i_{t+1}.

Finally, for every ℓ ∈ [t], define

T_{σ(ℓ)}(x) = ∇f(x) if ℓ = 1 and σ = (1); T_{σ(ℓ)}(x) = (T_σ(x))^ℓ_out if ℓ ≤ t − 1; T_{σ(ℓ)}(x) = ∇T_σ(x) if ℓ = t,

where σ(ℓ) is defined as follows: let σ be a permutation of [k] given in its cycle decomposition; then σ(ℓ) is a permutation of [k + 1] elements whose cycle representation is the same as that of σ, except that the element k + 1 is inserted after the ℓ-th element and before the (ℓ + 1)-th element in the cycle representation of σ.

We are now ready to state the main theorem for computing spectral derivatives.
Theorem 26. [Sen07] Let X ∈ Sym_k be such that the eigenvalues of X are all distinct. Let F : Sym_k → R be a spectral function (i.e., F = f ∘ λ for some f : R^k → R). Then F is t-times differentiable at X if and only if f is t-times differentiable at λ(X).

Moreover, for every σ ∈ S_t and x ∈ R^k, let T_σ(x) : (R^k)^{×t} → R be the t-tensor defined in Definition 25 (which depends on the function f). Then, for every U_1, . . . , U_t ∈ Sym_k, we have

D^t F(X)[U_1, . . . , U_t] = (Σ_{σ∈S_t} diag_σ T_σ(λ(X)))(V^T U_1 V, . . . , V^T U_t V),

where V satisfies X = V diag(λ(X)) V^T and diag_σ T_σ : (Mat_k)^{×t} → R is the t-tensor on Sym_k defined in Definition 15.

In this section, we first work out the relevant quantities needed to compute the spectral derivatives of smooth functions.
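A small numeric check of Fact 10 (the first-order divided-difference, i.e., Daleckii-Krein, formula) for f = exp on a diagonal X; we verify the diagonal entries against finite differences. This is an illustrative sketch of ours; all names are ours.

```python
from math import exp

def divdiff_exp(a, b):
    # First divided difference exp^[1](a, b), with the confluent case a = b.
    return exp(a) if a == b else (exp(a) - exp(b)) / (a - b)

x = (0.3, -0.7, 1.1)                 # spectrum of the diagonal matrix X
A = [[0.0, 1.0, -2.0],
     [1.0, 0.5, 0.3],
     [-2.0, 0.3, -1.0]]

# Fact 10: (D exp(X)[A])_{ij} = exp^[1](x_i, x_j) * A_{ij} for diagonal X.
D = [[divdiff_exp(x[i], x[j]) * A[i][j] for j in range(3)] for i in range(3)]

# Cross-check the diagonal entries against a finite difference: for i = j
# the formula reads (D exp(X)[A])_{ii} = exp(x_i) * A_{ii}.
t = 1e-7
for i in range(3):
    fd = (exp(x[i] + t * A[i][i]) - exp(x[i])) / t
    assert abs(fd - D[i][i]) < 1e-5
print("diagonal entries of D exp(X)[A] match finite differences")
```

The off-diagonal entries carry the genuine divided differences, which is where the tensors of Definition 25 and Theorem 26 come from at higher orders.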
Theorem 27.
Let k, n ≥ . Let f : R k → R be a -times differentiable symmetric function and λ : Sym k → R k be the map λ ( M ) = ( λ ( M ) , . . . , λ k ( M )) for every M ∈ Sym k . Let F : Sym k → R be defined as F ( M ) = ( f ◦ λ )( M ) for all M ∈ Sym k . Then, for every P ∈ Sym k with distinct eigenvalues and H ∈ Sym k , let P = V (diag ( λ ( P ))) V T be a spectral decomposition of P and H = V QV T . Then D F ( P ) [ Q, Q, Q ] is the summation of the following terms.1. P i ∇ i ,i ,i f ( x ) H i ,i P i = i ∇ i ,i ,i f ( x ) H i ,i H i ,i P i = i = i ( ∇ i ,i ,i f ( x )) · H i ,i H i ,i H i ,i P i = i (cid:18) ∇ i ,i −∇ i ,i x i − x i − ∇ i −∇ i ( x i − x i ) (cid:19) f ( x ) H i ,i H i ,i P i = i = i ∇ i ,i −∇ i ,i x i − x i f ( x ) H i ,i H i ,i P i = i = i (cid:16) ∇ i −∇ i ( x i − x i )( x i − x i ) − ∇ i −∇ i ( x i − x i )( x i − x i ) (cid:17) f ( x ) H i ,i H i ,i H i ,i P i = i = i (cid:16) ∇ i −∇ i ( x i − x i )( x i − x i ) − ∇ i −∇ i ( x i − x i )( x i − x i ) (cid:17) f ( x ) H i ,i H i ,i H i ,i , For more intuition, consider a simple example: let σ = (12)(3) be a permutation on [3], then σ ( · ) is a permutationon [4] defined as follows: σ (1) is (142)(3), similarly σ (2) = (124)(3), σ (3) = (12)(34), σ (4) = (12)(3)(4). Think of x ∈ R k as the eigenvalues of X ∈ Sym k , i.e., x = λ ( X ). here x = ( λ ( P ) , . . . , λ k ( P )) .Proof. To prove this theorem, we first apply Theorem 26 for t = 3 to obtain D F ( P ) [ Q, Q, Q ] = X σ ∈ S diag σ T σ ( λ ( P )) ( H, H, H ) . (19)We next carefully express each quantity in the summation using the definition of these tensors andupper bound each term. To this end, we break down all the six elements of S and analyze themseparately as follows. Case 1: σ = (1)(2)(3) . Then T σ ( x ) = ∇ f ( x ). Case 2: σ = (12)(3) . 
First note that we have for σ = (12) and (cid:0) T (12) ( x ) (cid:1) i ,i = ( i = i x i − x i · ( ∇ i − ∇ i ) f ( x ) i = i Now, in order to compute T (12)(3) , we need to compute ∇ T (12) ( x ) which can be written as follows (cid:0) T (12)(3) ( x ) (cid:1) i ,i ,i = i = i x i − x i · (cid:16) ∇ i ,i − ∇ i ,i (cid:17) f ( x ) − x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i x i − x i · (cid:16) ∇ i ,i − ∇ i ,i (cid:17) f ( x ) + x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i x i − x i · (cid:16) ∇ i ,i − ∇ i ,i (cid:17) f ( x ) i = i = i Case 3: σ = (13)(2) . First note that for σ = (1)(2), we have T (1)(2) = ∇ f and σ (1) = (13)(2).So, we need to compute (cid:0) ∇ f (cid:1) f ( x ) and we get (cid:0) T (13)(2) ( x ) (cid:1) i ,i ,i = ( i = i x i − x i · (cid:16) ∇ i ,i − ∇ i ,i (cid:17) f ( x ) i = i Case 4: σ = (1)(23) . First note that for σ = (1)(2), we have T (1)(2) = ∇ f and σ (2) = (1)(23).So, we need to compute (cid:0) ∇ f (cid:1) f ( x ) and we get (cid:0) T (1)(23) ( x ) (cid:1) i ,i ,i = ( i = i x i − x i · (cid:16) ∇ i ,i − ∇ i ,i (cid:17) f ( x ) i = i Case 5: σ = (123) . Let σ = (12), then σ (2) = (123). So we need to compute (cid:0) T (12) (cid:1) f ( x ) andwe obtain (cid:0) T (123) ( x ) (cid:1) i ,i ,i = x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i x i − x i )( x i − x i ) · ( ∇ i − ∇ i ) f ( x ) − x i − x i )( x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i ase 6: σ = (132) . Let σ = (12), then στ (1) = (132). So we need to compute (cid:0) T (12) (cid:1) f ( x ) andwe obtain. 
(cid:0) T (132) ( x ) (cid:1) i ,i ,i = − x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i x i − x i )( x i − x i ) · ( ∇ i − ∇ i ) f ( x ) − x i − x i )( x i − x i ) · ( ∇ i − ∇ i ) f ( x ) i = i = i X σ ∈ S T σ ( x )( H, H, H ) = X σ X i ,i ,i ( T σ ( x )) i ,i ,i H i ,i σ (1) H i ,i σ (2) H i ,i σ (3) Let’s write this out as follows: by T i , we mean T case ( i ) above X i ,i ,i ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i and in particular, assuming H is symmetric the above simplifies to X i ,i ,i ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i + ( T ) i ,i ,i H i ,i H i ,i H i ,i (20)Now, we will break up this sum into 5 cases as follows which we need to upper bound Case (i): i = i = i . Then Eq. (20) reduces to the following X i ,i H i ,i H i ,i ( T + T ) + H i ,i H i ,i ( T + T + T + T ) (21)Note that when we say T q above, we mean ( T q ) i ,i ,i = ( T q ) i ,i ,i (since i = i ). Let us now plugin the values of the corresponding T q s into the formula and rewrite the above as follows X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i f ( x ) + 0 (cid:1) ++ H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i + ∇ i − ∇ i ( x i − x i ) + ∇ i ,i − ∇ i ,i x i − x i + ∇ i − ∇ i ( x i − x i ) ! f ( x )= X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i f ( x ) (cid:1) + 2 H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i + ∇ i − ∇ i ( x i − x i ) ! f ( x ) (22) Case (ii): i = i = i . Then Eq. 
(20) reduces to X i ,i H i ,i H i ,i ( T + T ) + H i ,i H i ,i ( T + T + T + T ) (23)23he above simplies to the following X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i f ( x ) + 0 (cid:1) ++ H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i + ∇ i ,i − ∇ i ,i x i − x i + ∇ i − ∇ i ( x i − x i ) + ∇ i − ∇ i ( x i − x i ) ! f ( x )= X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i f ( x ) (cid:1) + 2 H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i + ∇ i − ∇ i ( x i − x i ) ! f ( x ) (24) Case (iii): i = i = i . Then Eq. (20) reduces to X i ,i H i ,i H i ,i ( T + T ) + H i ,i H i ,i ( T + T + T + T ) (25)The above simplifies to the following X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i f ( x ) + 0 (cid:1) ++ H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i − ∇ i − ∇ i ( x i − x i ) + ∇ i ,i − ∇ i ,i x i − x i − ∇ i − ∇ i ( x i − x i ) ! f ( x )= X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i f ( x ) (cid:1) + 2 H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i − ∇ i − ∇ i ( x i − x i ) ! f ( x ) (26) Case (i)+ Case (ii)+ Case (iii).
We first upper bound these three cases to get the desiredupper bound in the theorem statement. First summing the three cases, we have X i = i H i ,i H i ,i (cid:0) ∇ i ,i ,i + ∇ i ,i ,i + ∇ i ,i ,i (cid:1) f ( x )+ 6 X i = i H i ,i H i ,i ∇ i ,i − ∇ i ,i x i − x i − ∇ i − ∇ i ( x i − x i ) ! f ( x ) | {z } ( ⋆ ) (27)We now bound ( ⋆ ) using the following claim. Case (iv): i = i = i . Then Eq. (20) reduces to X i H i ,i ( T + T + T + T + T + T ) = X i H i ,i ∇ i ,i ,i f (28)24 ase (v): i = i = i . Then Eq. (20) stays the same and we get X i ,i ,i ( ∇ i ,i ,i f ) · H i ,i H i ,i H i ,i + ∇ i ,i − ∇ i ,i x i − x i f ( x ) H i ,i H i ,i + ∇ i ,i − ∇ i ,i x i − x i f ( x ) H i ,i H i ,i + ∇ i ,i − ∇ i ,i x i − x i f ( x ) H i ,i H i ,i + (cid:18) ∇ i − ∇ i ( x i − x i )( x i − x i ) − ∇ i − ∇ i ( x i − x i )( x i − x i ) (cid:19) f ( x ) H i ,i H i ,i H i ,i + (cid:18) ∇ i − ∇ i ( x i − x i )( x i − x i ) − ∇ i − ∇ i ( x i − x i )( x i − x i ) (cid:19) f ( x ) H i ,i H i ,i H i ,i (29)This concludes the proof of the theorem statement. We now state the main theorem we prove using the theorem above. Let G : R k → R be the Bentkusfunction given in Definition 16. Theorem 28.
Let k ≥ 2 be an integer and let ψ : Sym_k → R be the function defined as ψ(M) = (G ∘ λ)(M), where G is given in Definition 16. Given ∆ ≥ 0 and X ∈ Sym_k with eigenvalues λ(X) = (x_1, ..., x_k) satisfying ‖X‖ ≤ ∆, it holds that

|D³ψ(X)[H, H, H]| ≤ O(∆ · log k · ‖H‖³).

The following corollary follows from the definition of G_θ in Eq. (18) and the chain rule for Fréchet derivatives in Fact 9.

Corollary 29.
Let k ≥ 2 be an integer, θ > 0 and α ∈ R, and let Ψ_θ : Sym_k → R be the function defined as Ψ_θ(M) = (G_θ ∘ λ)(M + αI), where G_θ is given in Eq. (18). Given ∆ ≥ 0 and X ∈ Sym_k with eigenvalues λ(X) = (x_1, ..., x_k) satisfying ‖X‖ ≤ ∆, it holds that

|D³Ψ_θ(X + αI)[H, H, H]| ≤ O( ((∆ + α)/θ) · log k · ‖H‖³ ).

In order to prove Theorem 28, we upper bound each of the terms listed in Theorem 27 individually in the following sections (in increasing order of difficulty). Since the calculations are fairly technical, we break the analysis into separate sections for modularity and reader convenience. In Section 4.4.1 we bound the first three terms in Theorem 27 (this is the easy case, since the analysis is very similar to that of [HKM13], though it requires new properties of the Bentkus function); in Sections 4.4.2 and 4.4.3 we bound the fourth and fifth terms (this already deviates from the analysis of [HKM13]); and finally in Section 4.5 we bound the sixth and seventh terms (this calculation is fairly involved and deviates significantly from prior works, since we need to deal with properties of Fréchet derivatives, new properties of the Bentkus function, and the non-diagonal entries of the matrices H, which is unique to the matrix-spectrahedron case and is not faced in [HKM13, ST17, OST19]).

As spectral functions and spectral norms are unitarily invariant, we assume without loss of generality that X = diag(x_1, ..., x_k) is diagonal. To apply Theorem 27, we further assume that x_1, ..., x_k are all distinct; the general result then follows by continuity.

4.4 Bounding terms (1)-(5) in Theorem 27 for the Bentkus function

Let G : R^k → R be the Bentkus function given in Definition 16. Recall that G(x) = Π_i g(x_i), where g(x) = (1/√(2π)) ∫_{−∞}^{−x} e^{−t²/2} dt. Recall the notation g′(x) = −(1/√(2π)) e^{−x²/2} and g_1(x) = g′(x)/g(x).

4.4.1 Bounding terms (1), (2), (3) in Theorem 27

Lemma 30 (Bounding terms (1), (2), (3)).
The following three terms

|Σ_i ∇_{i,i,i}G(x) · H_{i,i}³|,  |Σ_{i1≠i2} ∇_{i1,i1,i2}G(x) · H_{i1,i1}² H_{i2,i2}|,  |Σ_{i1≠i2≠i3} ∇_{i1,i2,i3}G(x) · H_{i1,i1} H_{i2,i2} H_{i3,i3}|

can be upper bounded by O(log^{1.5} k · ‖H‖³).

Proof. The first upper bound is straightforward. Observe that

|Σ_i ∇_{i,i,i}G(x) H_{i,i}³| ≤ max_i |H_{i,i}|³ · Σ_i |∇_{i,i,i}G(x)| ≤ max_i |H_{i,i}|³ · ‖G^{(3)}(x)‖ ≤ ‖H‖³ · O(log^{1.5} k),

where the second inequality follows by the definition of ‖G^{(3)}‖, and the last inequality used max_{i,j} |H_{i,j}| ≤ ‖H‖ (the latter being the spectral norm of H) and Fact 20 to conclude ‖G^{(3)}‖ ≤ O(log^{1.5} k). Similarly, the remaining two terms can be bounded exactly as above (by observing that Σ_{i1≠i2} ∇_{i1,i1,i2}G and Σ_{i1≠i2≠i3} ∇_{i1,i2,i3}G also appear in the expression for ‖G^{(3)}‖).

4.4.2 Bounding term (4) in Theorem 27

In order to bound the remaining terms in Theorem 27, we need the following claim.
Claim 31.
It holds that1. P i = i g ( x i ) (cid:12)(cid:12)(cid:12) G ( x ) H i ,i H i ,i (cid:12)(cid:12)(cid:12) ≤ O (cid:16) √ log k · k H k (cid:17) . P i = i g ( x i ) (cid:12)(cid:12)(cid:12) G ( x ) H i ,i H i ,i (cid:12)(cid:12)(cid:12) ≤ O (cid:16) √ log k · k H k (cid:17) . P i = i = i (cid:12)(cid:12)(cid:12) g ( x i ) g ( x i ) G ( x ) H i ,i H i ,i (cid:12)(cid:12)(cid:12) ≤ O (cid:16) log k · k H k (cid:17) .Proof. For Item 1, we have X i = i g ( x i ) (cid:12)(cid:12) G ( x ) H i ,i H i ,i (cid:12)(cid:12) ≤ X i g ( x i ) G ( x ) · max i X i (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) ≤ k G (1) ( x ) k k H k where the last inequality is becausemax i X i (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) ≤ k H k max i (cid:0) H (cid:1) i ,i ≤ k H k , (30)using the fact that max ij | H ij | ≤ k H k . Using Fact 20 shows the first inequality. Item 2 follows bythe same reason. 26or Item 3, we have X i = i = i (cid:12)(cid:12) g ( x i ) g ( x i ) G ( x ) H i ,i H i ,i (cid:12)(cid:12) = X i = i | g ( x i ) g ( x i ) G ( x ) | max i ,i X i (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) ≤ O (cid:16) log k · k H k (cid:17) where the inequality is from Fact 20 and the fact that X i (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) ≤ k H k · (cid:0) H (cid:1) i ,i ≤ k H k . (31) Lemma 32 (Bounding terms (4) in Theorem 27) . We have X i = i H i ,i H i ,i ∇ i ,i G − ∇ i ,i Gx i − x i − ∇ i G − ∇ i G ( x i − x i ) ! ≤ O (cid:16) ∆ · p log k k H k (cid:17) . Proof.
First observe that ∇ i G ( x ) = g ′ ( x i ) Y j = i G ( x j ) = g ( x i ) · G ( x ) , and similarly we have ∇ i ,i G ( x ) = g ( x i ) ∇ i G ( x )+ G ( x ) ∇ i g ( x i ) = (cid:0) g ( x i ) − ( x i + g ( x i )) g ( x i ) (cid:1) G ( x ) = − x i g ( x i ) G ( x ) , where we used Fact 17. Now, let us start upper bounding the lemma statement as follows (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i H i ,i H i ,i ∇ i ,i G − ∇ i ,i Gx i − x i − ∇ i G − ∇ i G ( x i − x i ) !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ X i = i (cid:12)(cid:12)(cid:12) − g ( x i ) g ( x i ) + x i g ( x i ) x i − x i − g ( x i ) − g ( x i )( x i − x i ) (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | = X i = i (cid:12)(cid:12)(cid:12) − g ( x i ) g ( x i ) + x i g ( x i ) x i − x i − g ′ ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | = X i = i (cid:12)(cid:12)(cid:12) g ( x i ) g ( x i ) + x i g ( x i ) x i − x i − ( ξ i ,i + g ( ξ i ,i )) g ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i |≤ X i = i (cid:12)(cid:12)(cid:12) x i g ( x i ) − ξ i ,i g ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | | {z } :=(1) + (cid:12)(cid:12)(cid:12) g ( x i ) g ( x i ) − g ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | | {z } :=(2) , where the first equality used the mean-value theorem to obtain a ξ i ,i ∈ [ x i , x i ], second equalityused Eq. (14). We now bound both these terms separately as follows.27 erm 1 upper bound. Note that ξ i ,i is between x i and x i . 
The first term is upper bounded by X i = i (cid:12)(cid:12)(cid:12) x i g ( x i ) − ξ i ,i g ( ξ i ,i ) x i − ξ i ,i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | = X i = i (cid:12)(cid:12)(cid:12)(cid:0) − η i ,i (cid:1) g ( η i ,i ) − η i ,i g ( η i ,i ) (cid:12)(cid:12)(cid:12) · (cid:12)(cid:12) G ( x ) · H i ,i H i ,i (cid:12)(cid:12) ≤ X i = i g ( η i ,i ) (cid:12)(cid:12) G ( x ) · H i ,i H i ,i (cid:12)(cid:12) for some η i ,i between x i and ξ i ,i , where we apply a mean value theorem for the function xg ( x )for the equality and Lemma 19 for the inequality. Note that g ( · ) is nonnegative and monotonedecreasing by Fact 17. Thus the first term is upper bounded by2∆ X i = i max { g ( x i ) , g ( x i ) } (cid:12)(cid:12) G ( x ) · H i ,i H i ,i (cid:12)(cid:12) which, in turn, is upper bounded by O (cid:16) ∆ · √ log k · k H k (cid:17) from Fact 20 and Eqs (30), (31). Term 2 upper bound.
By triangle inequality we upper bound the second term by X i = i (cid:12)(cid:12)(cid:12) g ( x i ) g ( x i ) − g ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i |≤ X i = i (cid:12)(cid:12)(cid:12) g ( x i ) g ( x i ) − g ( x i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | + X i = i (cid:12)(cid:12)(cid:12) g ( x i ) − g ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | . (32)We first upper bound the first quantity in Eq. (32) first as follows. X i = i (cid:12)(cid:12)(cid:12) g ( x i ) g ( x i ) − g ( x i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | = X i = i (cid:12)(cid:12)(cid:12) g ( x i ) − g ( x i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) | · | g ( x i ) | · | H i ,i H i ,i | = X i = i | g ′ ( ζ i ,i ) | · | G ( x ) | · | g ( x i ) | · | H i ,i H i ,i |≤ · X i = i | G ( x ) | · | g ( x i ) | · | H i ,i H i ,i |≤ k G (1) k k H k . ≤ O (cid:16) ∆ · p log k · k H k (cid:17) , (33)where ζ i ,i between x i and x i , first inequality uses Fact 19, the second inequality uses Eqs. (30), (31)and the last inequality is from Fact 20.We now bound the second term in Eq. (32) as follows28 i = i (cid:12)(cid:12)(cid:12) g ( x i ) − g ( ξ i ,i ) x i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i |≤ X i = i (cid:12)(cid:12)(cid:12) g ( x i ) − g ( ξ i ,i ) ξ i ,i − x i (cid:12)(cid:12)(cid:12) · | G ( x ) · H i ,i H i ,i | (for ξ is between x i and x i )= 2 X i = i (cid:12)(cid:12) g ( η i ,i ) g ′ ( η i ,i ) (cid:12)(cid:12) · | G ( x ) | · (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) (for some η i ,i between x i and ξ i ,i ) ≤ X i = i | g ( η i ,i ) | · | G ( x ) | · (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) (Fact 19) ≤ X i = i max { g ( x i ) , g ( x i ) } · | G ( x ) | · (cid:12)(cid:12) H i ,i H i ,i (cid:12)(cid:12) . (Fact 17 and Lemma 19)Further applying Fact 20 and putting together Eqs. 
(31)(30), we conclude that it can be upperbounded by O (cid:16) ∆ √ log k k H k (cid:17) . ) in Theorem 27Lemma 33 (Bounding terms (5) in Theorem 27) . We have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i ∇ i ,i G ( x ) − ∇ i ,i G ( x ) x i − x i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O (cid:16) ∆ · log k · k H k (cid:17) . Proof. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i ∇ i ,i G ( x ) − ∇ i ,i G ( x ) x i − x i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i g ( x i ) ( g ( x i ) − g ( x i )) x i − x i G ( x ) H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i g ′ ( ξ i ,i ) g ( x i ) G ( x ) H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (for some ξ i ,i between x i and x i ) ≤ X i = i = i (cid:12)(cid:12) max { g ( x i ) , g ( x i ) } g ( x i ) G ( x ) H i ,i H i ,i (cid:12)(cid:12) ≤ O (cid:16) ∆ · log k · k H k (cid:17) , where the last inequality is from Fact 20 and Eqs. (30)(31). (6 , in Theorem 27 for Bentkus function Let G : R k → R be the Bentkus function given in Definition 16. Recall that G ( x ) = Q i g ( x i ),where g ( x ) = √ π R − x −∞ e − t / dt . Recall the notation g ′ ( x ) = √ π e − x / and g ( x ) = g ′ ( x ) /g ( x ).Restating the terms for convenience. 29 emma 34 (Bounding terms (6 ,
7) in Theorem 27) . (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i g ( x i ) − g ( x i ) x i − x i − g ( x i ) − g ( x i ) x i − x i x i − x i G ( x ) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O (cid:16) ∆ log k k H k (cid:17) (34)This is the most involved part. Note that the left hand side is unchanged if we zero out alldiagonal entries of H . And further note that k H − diag ( H ) k ≤ k H k where diag ( H ) is a diagonalmatrix obtained by diagonalizing H . Thus, we may assume that the diagonal elements in H arezero without loss of generality. We break down the analysis into two cases (the first one being thesimpler case). x i s. The simpler case is when the number of negative x i s is “large”. Lemma 35. If |{ i : x i < }| > k , then the quantity in Eq. (34) is upper bounded by O (cid:16) ∆ √ log kk · k H k (cid:17) .Proof. Applying Fact 3 a mean value theorem of divided difference and Lemma 19, the term inEq. (34) is upper bounded by O ∆ X i = i = i g ( ζ i ,i ,i ) G ( x ) | H i ,i H i ,i H i ,i |≤ O ∆ X i = i = i max { g ( x i ) , g ( x i ) , g ( x i ) } G ( x ) | H i ,i H i ,i H i ,i |≤ O ∆ k G (1) k max i X i ,i | H i ,i H i ,i H i ,i | ≤ O ∆ · k · k G (1) ( x ) k · max i ,i X i | H i ,i H i ,i H i ,i | ! ≤ O (cid:16) ∆ · k · k G (1) ( x ) k · k H k (cid:17) ≤ O (cid:18) ∆ √ log kk · k H k (cid:19) where the first inequality is from the positivity and monotonicity of g ( · ) due to Fact 17 to concludethat | g ( ζ i ,i ,i ) | ≤ max {| g ( x i ) | , | g ( x i ) | , | g ( x i ) |} ; the second last inequality is from the follow-ing fact X i | H i ,i H i ,i H i ,i | ≤ k H k vuut X i H i ,i ! X i H i ,i ! ≤ k H k ; (35)the last inequality is from Lemma 21 (which uses that the number of negative x i s is ≤ k ).30 .5.2 Case 2: A few negative x i s We now assume that |{ i : x i < }| ≤ k and this case the most complicated and upper boundingit is the most technical. We push this proof to Appendix A. 
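The norm bound on H − diag(H) invoked before Lemma 35 (that zeroing out the diagonal of H at most doubles the spectral norm) follows in one line from the triangle inequality; this derivation is our reconstruction, not verbatim from the paper:

```latex
\|H-\mathrm{diag}(H)\| \;\le\; \|H\|+\|\mathrm{diag}(H)\|
\;=\; \|H\|+\max_i|H_{ii}|
\;=\; \|H\|+\max_i|e_i^{\top}He_i|
\;\le\; 2\|H\|,
```

since |e_iᵀ H e_i| ≤ ‖H‖ for every standard basis vector e_i.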
Proof of Theorem 28.
Combining Theorem 27 and Lemmas 30, 32, 33, 34, we obtain our result.
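The derivative identities of the product mollifier used repeatedly in the lemmas above (∇_iG(x) = g_1(x_i)G(x) and ∇_{i,i}G(x) = −x_i g_1(x_i)G(x), from the proof of Lemma 32) are easy to check numerically. A small sketch, assuming g(x) = Φ(−x) with Φ the standard normal CDF (our reading of Definition 16):

```python
import math

def Phi(t):                 # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def g(x):                   # g(x) = Pr[N(0,1) <= -x]
    return Phi(-x)

def g1(x):                  # g1 = g'/g, with g'(x) = -exp(-x^2/2)/sqrt(2*pi)
    return -math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi) / g(x)

def G(x):                   # product mollifier G(x) = prod_i g(x_i)
    p = 1.0
    for xi in x:
        p *= g(xi)
    return p

x, i, h = [0.3, -1.1, 0.7], 1, 1e-5

def partial(i, pt, h=1e-5):  # central finite difference in coordinate i
    up = list(pt); up[i] += h
    dn = list(pt); dn[i] -= h
    return (G(up) - G(dn)) / (2 * h)

# first identity:  dG/dx_i = g1(x_i) * G(x)
assert abs(partial(i, x) - g1(x[i]) * G(x)) < 1e-7

# second identity: d^2 G / dx_i^2 = -x_i * g1(x_i) * G(x)
up = list(x); up[i] += h
dn = list(x); dn[i] -= h
second = (G(up) - 2 * G(x) + G(dn)) / h ** 2
assert abs(second - (-x[i]) * g1(x[i]) * G(x)) < 1e-4
```

The second identity is exactly the simplification via Fact 17 (g_1′(x) = −(x + g_1(x)) g_1(x)) used at the start of the proof of Lemma 32.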
In this section we prove certain combinatorial and geometric properties of positive spectrahedrons. Understanding the surface area of a convex object is a fundamental question in convex geometry. In the context of theoretical computer science, one of the earliest works, by Klivans, O’Donnell and Servedio [KOS04], related learnability of geometric convex objects (in the PAC and agnostic settings) to a natural complexity measure of
Gaussian surface area (GSA). Recall that for a convex object S ⊆ R^n, we have

GSA(S) = liminf_{δ→0} G_n(S_δ \ S)/δ,

where S_δ = {x : dist(x, S) ≤ δ} denotes the δ-neighborhood of S under the Euclidean distance and G_n denotes the standard Gaussian measure on R^n. In some sense, the work of [KOS04] showed that the GSA of convex objects characterizes the learnability of these objects under the Gaussian distribution. This remarkable connection has provided further motivation to understand the
GSA of basic well-studied convex sets. In this direction, a well-known result of Ball gives an upper bound on the surface area of arbitrary convex objects.
Theorem 36. [Bal93] The Gaussian surface area of every convex set in R^n is at most O(n^{1/4}).

For our setting it is unclear what the Gaussian surface area of spectrahedrons is. Clearly, since spectrahedrons are convex objects, one can use Theorem 36 to show an upper bound of O(n^{1/4}). Below we show that one can in fact prove an upper bound of O(1) on the Gaussian surface area of positive spectrahedrons. We make this formal in the theorem below.

Theorem 37 (Matrix version of Peres’ theorem). Let S be a positive spectrahedron defined as

S = { x ∈ R^n : Σ_i x_i A_i ⪯ B },

where A_1, ..., A_n, B ∈ Sym_k and A_i is PSD for every i ∈ [n]. Then the Gaussian surface area of S is O(1) (independent of k, n). Moreover, let f(x) = [x ∈ S] for x ∈ {−1,1}^n; then the ε-noise sensitivity of f is NS_ε(f) = O(√ε).

Corollary 38.
Let S_1, S_2 be distinct positive spectrahedrons specified by {A_1^j, ..., A_n^j, B^j}_{j∈[2]} respectively, where A_i^1 ⪰ 0 and A_i^2 ⪯ 0 for all i. Let

F(x) = ∧_{j=1}^2 [ Σ_i x_i A_i^j ⪯ B^j ]

be an intersection of positive spectrahedrons. Then AS(F) ≤ O(√n) and GSA(S_1 ∩ S_2) = O(1).

Subsequently it was shown that this bound is optimal for a convex body formed by exp(√n) randomly intersecting halfspaces. Kane [Kan14a] showed that the Gaussian surface area of an intersection of k halfspaces is O(√log k) (thereby reproving Nazarov [Naz03]). Before stating Kane’s result, we need to introduce the following notion.

Definition 39 (Unate function). A function f : {−1,1}^n → {0,1} is unate if it satisfies the following: for every i ∈ [n], f is either increasing or decreasing with respect to the i-th coordinate, i.e., for every i ∈ [n], either f(x_1, ..., x_{i−1}, −1, x_{i+1}, ..., x_n) ≤ f(x_1, ..., x_{i−1}, 1, x_{i+1}, ..., x_n) for all x, or f(x_1, ..., x_{i−1}, −1, x_{i+1}, ..., x_n) ≥ f(x_1, ..., x_{i−1}, 1, x_{i+1}, ..., x_n) for all x.

In particular, Kane proved the following stronger statement.
Theorem 40. [Kan14a] Let f_1, ..., f_k : {−1,1}^n → {0,1} be unate functions and let F : {−1,1}^n → {0,1} be defined as F(x) = ∧_i f_i(x). Then the average sensitivity of F satisfies AS(F) ≤ O(√(n log(k + 1))).

It is not hard to see that a positive spectrahedron is a unate function, so Theorem 40 holds for us as well with k = 1. To be precise, we have

Corollary 41.
Let S be as defined in Theorem 37. Let F : {−1,1}^n → {0,1} be defined as F(x) = 1 if and only if x ∈ S. Then AS(F) ≤ O(√n).

In order to translate Theorem 40 into the corollary above: for a positive spectrahedron S, let F(x) = [x ∈ S] for x ∈ {−1,1}^n; then one can rewrite F as

F(x) = ∧_{j=1}^k [ λ_j( Σ_i x_i A_i − B ) ≤ 0 ],

which is an AND of k unate functions by Weyl’s inequality [Bha13, Theorem III.2.1] (the inner functions are unate since all the A_i’s are promised to be PSD).

Recall that we are interested in the Gaussian surface area of such bodies (not just the average sensitivity), which is closely related to the noise sensitivity of positive spectrahedrons. In the same paper, Kane [Kan14a] adapts the well-known techniques of [DGJ+
10] to show that the ε-noise sensitivity of intersections of halfspaces is at most O(√ε log k), and remarks that such a bound does not hold for intersections of unate functions. Below, we show that one can modify the proof of [DGJ+
10] to also show that the noise sensitivity of positive spectrahedrons can be bounded by the “average 2-sensitivity” of positive spectrahedrons, which we show is O(√ε) by modifying Kane’s proof of Theorem 40. This proves our Theorem 37.

Proof of Theorem 37.
In order to prove the theorem, we first show that for the function f : {−1,1}^n → {0,1} defined as

f(x) = [ Σ_{i=1}^n x_i A_i ⪯ B ],

where A_1, ..., A_n, B ∈ Sym_k and A_i is PSD for every i ∈ [n], the ε-noise sensitivity of f satisfies

NS_ε(f) = Pr_{(x,y) ε-correlated} [ f(x) ≠ f(y) ] ≤ O(√ε).

For simplicity, let us assume that ε = 1/m for some integer m which divides n (since NS_ε is a non-decreasing function of ε, we can round ε down to satisfy this condition). In order to analyze NS_ε(f), we first observe that one can generate an ε-correlated pair of strings (x, y) ∈ {−1,1}^n × {−1,1}^n as follows. (There is a +1 compared to Kane’s result to ensure that the result is valid for k = 1.)
1. Pick a uniformly random string z ∼ U_n.

2. Randomly partition [n] into m disjoint buckets C_1, ..., C_m ⊆ [n] such that ∪_ℓ C_ℓ = [n]. Furthermore, for the z picked in step 1, split each bucket as follows: for every ℓ ∈ [m], split C_ℓ into C_{ℓ,1} and C_{ℓ,−1}, where C_{ℓ,1} corresponds to the positive coordinates of z within C_ℓ and C_{ℓ,−1} to the negative coordinates of z within C_ℓ. So overall there are 2m disjoint buckets {C_{ℓ,s} : ℓ ∈ [m], s ∈ {−1,1}} such that ∪_{ℓ,s} C_{ℓ,s} = [n]. Set C̃_ℓ = C_{ℓ,1} if ℓ ≤ m and C̃_ℓ = C_{ℓ−m,−1} if ℓ > m.

3. Corresponding to each bucket C̃_ℓ, pick a uniformly random bit b_ℓ ∼ U_1.

4. Obtain x as follows: for every ℓ ∈ [2m], obtain x from z by multiplying all the bits of z indexed by C̃_ℓ by b_ℓ.

5. Obtain y as follows: pick a uniformly random ℓ ∈ [m] and flip the signs of x_i (obtained in step 4) for all indices i in C_ℓ, i.e., y_i = −x_i if i ∈ C_ℓ and y_i = x_i otherwise.

Observe that the pair (x, y) obtained in steps (4,
5) are uniform and ε -correlated. To see this,first observe that the probability of obtaining x ∈ {− , } n is given byPr z ∼U n , { C k } , b ∼U m [ x = x ] = Pr z ,C, b [ z C · b = x C , . . . z C m · b m = x C m ]= m X i =1 Pr z ,C, b (cid:2) z C i · b i = x C i | z C
10, Proposition 9.2]. The second result we use is by Ball [Bal13], who showed the following: if a Boolean function f is the indicator function of a convex set S, i.e., f^{−1}(1) = S, and S has a smooth boundary, then the Gaussian surface area of S can be bounded as

GSA(S) ≤ lim_{ε→0} GNS_ε(f)/√ε.  (40)

Putting together Eq. (40) and Eq. (39), we get

GSA(S) ≤ lim_{ε→0} NS_ε(f)/√ε ≤ O(1),

where the final inequality used the upper bound we derived earlier in Eq. (38). This concludes the proof of the theorem.

We note that [DGJ+
10, Proposition 9.2] shows this statement with equality asymptotically (i.e., when we take k Bernoulli random variables to approximate a Gaussian and let k → ∞) for f being a degree-d polynomial threshold function, and the same proof holds true when f is an intersection of spectrahedrons. We remark that one can also obtain this bound via [Kan11a, Section 3].
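As an aside, the unateness of positive spectrahedrons that underlies Corollary 41 is easy to confirm by brute force on a toy instance: with every A_i PSD, raising a coordinate from −1 to +1 moves Σ_i x_i A_i up in the PSD order, so membership can only switch from 1 to 0. A small sketch (assuming numpy; the random instance and the choice B = I are illustrative only):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3

# toy positive spectrahedron: each A_i = M M^T is PSD, B = I
A = []
for _ in range(n):
    M = rng.standard_normal((k, k))
    A.append(M @ M.T)
B = np.eye(k)

def f(x):
    S = sum(xi * Ai for xi, Ai in zip(x, A))
    return bool(np.max(np.linalg.eigvalsh(S - B)) <= 0)

# unateness: f is non-increasing in every coordinate, since flipping
# x_i from -1 to +1 adds the PSD matrix 2*A_i to the sum
for x in itertools.product([-1, 1], repeat=n):
    for i in range(n):
        hi = list(x); hi[i] = 1
        lo = list(x); lo[i] = -1
        assert f(hi) <= f(lo)
```

At the all-minus point the sum is negative semidefinite, so f = 1 there, while at the all-plus point the sum is far above B, so f = 0; the function is genuinely non-constant in this instance.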
We now prove Corollary 38, which bounds the Gaussian surface area of intersections of positive spectrahedrons.
Proof of Corollary 38.
The proof is very similar to the proof of the theorem above. Let m = ⌈1/ε⌉. We follow the same bucketing steps (1)-(5) as in Theorem 37 to obtain a function g : {−1,1}^{2m} → {0,1} given by

g(b) = [ Σ_{q=1}^{2m} b_q Σ_{j ∈ C̃_q} z_j A_j^1 ⪯ B^1 ] · [ Σ_{q=1}^{2m} b_q Σ_{j ∈ C̃_q} z_j A_j^2 ⪯ B^2 ].

Observe that g is an intersection of positive spectrahedrons and, by definition, each positive spectrahedron is a unate function. So, by Theorem 40, we have

AS(g) ≤ O(√m) = O(√(1/ε)).

Repeating the same steps after Eq. (38), we get that
GSA(S_1 ∩ S_2) ≤ lim_{ε→0} GNS_ε(F)/√ε ≤ lim_{ε→0} NS_ε(F)/√ε ≤ lim_{ε→0} √ε · AS(g) ≤ O(1).

This concludes the proof of the corollary.
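The bucketing process in steps (1)-(5) of the proof of Theorem 37 can be simulated directly. The sketch below (assuming numpy; toy parameters n = 12, m = 4, so ε = 1/m) checks empirically that the resulting x is uniform and that each coordinate of y is flipped with probability exactly 1/m:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 12, 4
eps = 1.0 / m

def correlated_pair():
    """One sample from the bucketing process, steps (1)-(5)."""
    z = rng.choice([-1, 1], size=n)            # step 1: uniform z
    part = rng.integers(0, m, size=n)          # step 2: buckets C_1..C_m
    # split each C_l by the sign of z, giving 2m buckets \tilde{C}_l
    tilde = part + m * (z < 0)                 # bucket index in [0, 2m)
    b = rng.choice([-1, 1], size=2 * m)        # step 3: a sign per bucket
    x = z * b[tilde]                           # step 4
    l = rng.integers(0, m)                     # step 5: flip one bucket C_l
    y = np.where(part == l, -x, x)
    return x, y

# empirical sanity checks: x is uniform, and y_i != x_i w.p. 1/m = eps
T = 4000
flips, ones = 0.0, 0.0
for _ in range(T):
    x, y = correlated_pair()
    flips += np.mean(x != y)
    ones += np.mean(x == 1)
assert abs(flips / T - eps) < 0.05
assert abs(ones / T - 0.5) < 0.05
```

This matches the claim that the pair (x, y) is uniform and ε-correlated: each coordinate lies in the flipped bucket C_ℓ independently of x with probability 1/m.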
We now prove the main lemma, which shows that the largest eigenvalue of a positive spectrahedron cannot be very concentrated. In particular, we show that for a uniformly random x ∼ U_n, if we consider the spectrahedron D = Σ_i x_i A_i − B, then the measure (over the Boolean cube) of the event that D has its largest eigenvalue in a small interval is fairly small. This anti-concentration statement will be crucial in our invariance principle proof when we move from the Bentkus mollifier to our CDF function. In passing, we remark that, prior to this work, we are not aware that even the weaker Gaussian analogue of this statement was known (in particular, the results of [HKM13, ST17] only require Gaussian anti-concentration, for which they use a result of Nazarov [Naz03] as a black box).

We remark that the proof of our main theorem (stated below) closely follows [OST19, Kan14a], since those works are able to handle intersections of unate functions, which is the case for positive spectrahedrons. However, there are two subtleties.

(i) In [OST19] they bucket the set of halfspaces (which form the polytope) and show that each bucket has significant weight. Crucially for them, they use the fact that intersections of halfspaces are still unate functions. But this is not the case for positive spectrahedrons. For this, we need to modify the bucketing procedure (akin to what happens in the proof of Theorem 37) so that this bucketing of positive spectrahedrons still results in a unate function.

(ii) In [OST19] they prove an analogue of Lemma 46, which shows that each bucket has “significant weight”. However, our proof deviates significantly from the proof in [OST19]. For them, proving the statement of the lemma (for diagonal matrices) follows directly from the Paley-Zygmund inequality, but, as far as we are aware, a matrix version of this inequality is not available. Due to this difficulty, we modify their proof and use the matrix Chernoff bound to prove the statement above.

Theorem 42.
Let k ≥ 2 be an integer and τ ≤ √log k. Let {B^1, B^2} ⊆ Sym_k, and let {A_i^1}_{i∈[n]} and {A_i^2}_{i∈[n]} be sequences of PSD and NSD matrices, respectively, satisfying, for all i ∈ [n] and j ∈ [2], A_i^1 ⪯ τ·I, A_i^2 ⪰ −τ·I and Σ_{i=1}^n (A_i^j)² ⪰ I. Then for every Λ ≥ τ log k, we have

Pr_{x∼U_n} [ ∃j ∈ [2] s.t. λ_max( Σ_i x_i A_i^j − B^j ) ∈ (−Λ, Λ] ] ≤ O(Λ).

Again using the standard bits-to-Gaussians trick, we have the following corollary.
Corollary 43.
Let k ≥ 2 be an integer and τ ≤ k. Let {B^1, B^2} ⊆ Sym_k, and let {A_i^1}_{i∈[n]} and {A_i^2}_{i∈[n]} be sequences of PSD and NSD matrices, respectively, satisfying, for all i ∈ [n] and j ∈ [2], A_i^1 ⪯ τ·I, A_i^2 ⪰ −τ·I and Σ_i (A_i^j)² ⪰ I. Then for every Λ ≥ τ log k, we have

Pr_{g∼G_n} [ ∃j ∈ [2] s.t. λ_max( Σ_i g_i A_i^j − B^j ) ∈ (−Λ, Λ] ] ≤ O(Λ).

In order to prove Theorem 42 we will use the following two lemmas from [OST19]. Before stating these lemmas, we introduce a few definitions from [OST19] (adapted to our setting of positive spectrahedrons). For the rest of the section, we let F : {−1,1}^n → {0,1} be the indicator of an intersection of positive spectrahedrons, i.e., for every j ∈ [2], let F_j(x) = [ Σ_{i=1}^n x_i A_i^j ⪯ B^j ], where the {A_i^j}_i satisfy Eq. (7), and

F(x) = ∧_{j=1}^2 F_j(x) = ∧_{j=1}^2 [ Σ_{i=1}^n x_i A_i^j ⪯ B^j ].  (41)

1. For a set S ⊆ {−1,1}^n, let E(S) be the fraction of the n · 2^{n−1} hypercube edges which have one endpoint in S and one endpoint in S^c (i.e., the complement of S).

2. We let H_j ⊆ {−1,1}^n be the indicator-set for F_j, i.e., x ∈ H_j if and only if F_j(x) = 1. Additionally, suppose we have sets {H̄_1, H̄_2} with H_j ⊆ H̄_j such that the H̄_j are also indicator-sets of unate functions. Let ∂H_j = H̄_j \ H_j.

3. For α ∈ [0,1], ∂H_j is α-semi-thin if for every x ∈ ∂H_j, at least an α-fraction of its hypercube-neighbours (i.e., the set of y ∈ {−1,1}^n for which d(x, y) = 1) are outside ∂H_j.

4. We now define a few sets: let F = H̄_1 ∩ H̄_2, F° = H_1 ∩ H_2 and ∂F = F \ F°.

With this terminology, we have the following lemma that bounds the number of edges that cross F.

Lemma 44 ([OST19, Theorem 7.18]). For j ∈ [2], let H_j be as defined above. Suppose ∂H_j is α-semi-thin. Then

vol(∂F) ≤ O( 1/(α√n) ).

Using this lemma, we get the following theorem (which is the analogue of [OST19, Theorem 7.19]).

Theorem 45.
Let λ > 0, α ∈ [0,1] and {B^1, B^2} ⊆ Sym_k. Let {A_i^j}_{i∈[n], j∈[2]} ⊆ Sym_k satisfy A_i^1 ⪰ 0 and A_i^2 ⪯ 0 for all i ∈ [n], and suppose at least an α-fraction of the i ∈ [n] satisfy A_i^1 ⪰ λ·I and A_i^2 ⪯ −λ·I. Then, we have

Pr_{x∼U_n} [ ∃j ∈ [2] s.t. λ_max( Σ_i x_i A_i^j − B^j ) ∈ (−λ, 0] ] ≤ O( 1/(α√n) ).

Proof.
Let {A_i^j}, {B^j} be as in the theorem statement. Let

H_j = { x ∈ {−1,1}^n : λ_max( Σ_i x_i A_i^j − B^j ) ≤ −λ },  H̄_j = { x ∈ {−1,1}^n : λ_max( Σ_i x_i A_i^j − B^j ) ≤ 0 }.

Clearly we then have

∂H_j = { x ∈ {−1,1}^n : λ_max( Σ_i x_i A_i^j − B^j ) ∈ (−λ, 0] }

and

∂F = { x ∈ {−1,1}^n : ∃j ∈ [2] s.t. λ_max( Σ_i x_i A_i^j − B^j ) ∈ (−λ, 0] }.

Since we assumed that at least an α-fraction of the i’s satisfy A_i^1 ⪰ λ·I and A_i^2 ⪯ −λ·I, it follows that ∂H_j is α-semi-thin, hence we can apply Lemma 44 to obtain the theorem statement.

Using this theorem, we are now ready to prove our main technical lemma, which says that we can always “randomly bucket” our positive spectrahedron so that many of the buckets have “pretty large” smallest eigenvalue.

Lemma 46.
Let $\{A_i\}_{i \in [n]} \subseteq \mathrm{Sym}_k$ be a sequence of positive semidefinite matrices which is $(\tau, M)$-regular with $\tau \leq 1/\sqrt{\log k}$. Let $m \geq \tau \log k$ and let $\pi : [n] \to [m]$ be a random hash function that independently assigns each $i \in [n]$ to a uniformly random bucket in $[m]$. For $c \in [m]$, let
$$\sigma_c = \sum_{j \in \pi^{-1}(c)} A_j,$$
and say that the bucket $c \in [m]$ is good if $\sigma_c \succeq \frac{1}{2\tau m} \cdot I$. Then
$$\Pr\big[\text{at most } m/2 \text{ buckets } c \in [m] \text{ are good}\big] \leq \exp(-\Omega(m)).$$

Proof.
Let $z_i \in \{0,1\}$ be a random variable satisfying $\Pr[z_i = 1] = 1/m$ and let $Z_i = z_i \cdot A_i$, so that one can write $\sigma_c = \sum_i Z_i$. In particular, this implies
$$\mathbb{E}[\sigma_c] = \frac{1}{m} \sum_i A_i \succeq \frac{1}{\tau m} \sum_i (A_i)^2 \succeq \frac{1}{\tau m} \cdot I.$$
Applying Fact 5 (for $\delta = 1/2$, $\mu = 1/(\tau m)$ and $R = \tau$) we have
$$\Pr\Big[\sum_i Z_i \succeq \frac{1}{2\tau m} \cdot I\Big] \geq 1 - k \cdot \Big(\frac{e}{2}\Big)^{-1/(2\tau^2 m)} \geq \frac{3}{4}.$$
For $j \in [n]$ and $c \in [m]$, define the random variables
$$Y_{c,j} = [\pi(j) = c] \qquad \text{and} \qquad X_j = \Big[\sum_{c=1}^{m} Y_{c,j}\, \sigma_c \succeq \frac{1}{2\tau m} \cdot I\Big].$$
Using Claim 47 below, $X_1, \ldots, X_n$ are negatively associated. Thus we may apply the Chernoff bound to $\sum_{i=1}^{m} X_i$, which has mean at least $3m/4$; this gives us the lemma statement.
Claim 47. The random variables $X_1, \ldots, X_n$ are negatively associated.

Proof. From [DP09, Page 35, Example 3.1], the set of random variables $\{Y_{c,j}\}_{1 \leq c \leq m}$ is negatively associated for each $j \in [n]$. Note that $\{Y_{1,j}, \ldots, Y_{m,j}\}_{j \in [n]}$ are $n$ independent families of random variables, so by [DP09, Page 35], the variables $\{Y_{c,j}\}_{c \in [m], j \in [n]}$ are negatively associated. Given $\sigma_1, \ldots, \sigma_m$, the indicator $\big[\sum_{c=1}^{m} Y_{c,j}\, \sigma_c \succeq \frac{1}{2\tau m} \cdot I\big]$ is a monotone non-decreasing function of $Y_{1,j}, \ldots, Y_{m,j}$. Thus, from [DP09, Page 35], $X_1, \ldots, X_m$ are negatively associated.

The proof of this claim concludes the proof of the lemma. We are now ready to prove our main theorem.

Proof of Theorem 42.
For $j \in [2]$, let $f_j(x) = \sum_{i=1}^{n} x_i A_{ij}$. Let $\pi : [n] \to [2m]$ be a random hash function that independently assigns each $i \in [n]$ to a uniformly random bucket in $[2m]$. Let $C_1, \ldots, C_{2m} \subseteq [n]$ be the buckets and let $z \in \{-1,1\}^{2m}$ be uniformly random. Consider the function $g_j : \{-1,1\}^{2m} \to \mathrm{Sym}_k$ defined as
$$g_j(z) = \sum_{q=1}^{2m} z_q \cdot \sum_{i \in C_q} A_{ij}.$$
For $q \in [2m]$, define $\bar{A}_{qj} = \sum_{i \in C_q} A_{ij}$, so $g_j(z) = \sum_q z_q \bar{A}_{qj}$. Observe that the distributions of $f_j$ and $g_j$ are the same, i.e., for every $D \in \mathrm{Sym}_k$ we have
$$\Pr_{z \sim \mathcal{U}^{2m}, \{C_i\}}[g_j(z) = D] = \Pr_{x \sim \mathcal{U}^n}[f_j(x) = D]. \qquad (42)$$
In order to see this, we argue that the $n$-bit string $w \in \{-1,1\}^n$ defined by $w_i = z_q$ iff $i \in C_q$ is uniformly random. To show this, we first prove the following: for $z \in \{-1,1\}^{2m}$, let $S = \{q \in [2m] : z_q = 1\}$ and $\mathbf{T} = \cup_{q \in S} C_q$. Then, observe that for every $T \subseteq [n]$, we have $\Pr_{z, \{C_q\}}[\mathbf{T} = T] = 2^{-n}$: for every $i \in [n]$, the probability that $i \in C_q$ is $1/(2m)$ and the probability that $C_q$ is included in $\mathbf{T}$ is $1/2$ since $z_q$ is a uniformly random bit; hence for every $i \in [n]$, we have $\Pr_{z, \{C_q\}}[i \in \mathbf{T}] = \sum_{q=1}^{2m} \frac{1}{2m} \cdot \frac{1}{2} = \frac{1}{2}$ (independently across $i \in [n]$ by construction). It is now easy to see that $w$ is uniformly random because
$$\Pr_{z, \{C_j\}}[W = w] = \sum_{T} \Pr[\mathbf{T} = T] \cdot \Pr[W = w \mid \mathbf{T} = T] = \frac{1}{2^n} \sum_{T} \Pr[W = w \mid \mathbf{T} = T] = 2^{-n},$$
where the last equality used the fact that once we fix $\mathbf{T}$, all the bits of $w$ which are $1$ are fixed.

For $m = \tau \log k$, let $\pi : [n] \to [2m]$ be a random hash that buckets these $n$ variables (jointly for $j \in [2]$). By Lemma 46, with probability at least $1 - e^{-\Omega(m)}$, at least $9m/5$ of the $2m$ buckets are good for $j = 1$, i.e., a good bucket $q \in [2m]$ for $j = 1$ satisfies $\sum_{i \in \pi^{-1}(q)} A_{i1} \succeq \frac{1}{2\tau m} \cdot I$. For the same reason, with probability at least $1 - e^{-\Omega(m)}$, at least $9m/5$ of the $2m$ buckets are good for $j = 2$, i.e., a good bucket $q \in [2m]$ for $j = 2$ satisfies $\sum_{i \in \pi^{-1}(q)} A_{i2} \preceq -\frac{1}{2\tau m} \cdot I$. Applying a union bound, at least $8m/5$ of the $2m$ buckets are good for every $j \in [2]$ with probability at least $1 - 2 e^{-\Omega(m)}$.

By the argument at the start of the proof, we know that after bucketing, we can convert each $f_j$ into a function $g_j : \{-1,1\}^{2m} \to \mathrm{Sym}_k$ such that $f_j$ and $g_j$ have the same distribution. Now we can invoke Theorem 45 as follows: we know that a $4/5$-fraction of $q \in [2m]$ satisfy $\bar{A}_{q1} \succeq \frac{1}{2\tau m} \cdot I$ and $\bar{A}_{q2} \preceq -\frac{1}{2\tau m} \cdot I$, so we have
$$\Pr_{z \sim \mathcal{U}^{2m}}\Big[\exists j \in [2] \text{ s.t. } \lambda_{\max}\Big(\sum_{q=1}^{2m} z_q \bar{A}_{qj} - B_j\Big) \in \Big(-\frac{1}{2\tau m}, 0\Big]\Big] \leq O\Big(\sqrt{\frac{1}{m}}\Big) + 2 e^{-\Omega(m)}.$$
We now prove the main theorem statement. In order to do so, first observe that we can partition the interval on the left-hand side into $\lceil 2\Lambda \tau m \rceil$ intervals of width $\frac{1}{2\tau m}$, as $\Lambda \geq \frac{1}{2\tau m}$ from our choice of parameters; by a union bound we have
$$\Pr_{x \sim \mathcal{U}^n}\Big[\exists j \in [2] \text{ s.t. } \lambda_{\max}\Big(\sum_i x_i A_{ij} - B_j\Big) \in (-\Lambda, 0]\Big] \leq O\Big(\Lambda \cdot \tau \cdot m \cdot \Big(\sqrt{\frac{1}{m}} + \exp(-\Omega(m))\Big)\Big).$$
From the choice of the parameters, the first term above dominates, and thus
$$\Pr_{x \sim \mathcal{U}^n}\Big[\exists j \in [2] \text{ s.t. } \lambda_{\max}\Big(\sum_i x_i A_{ij} - B_j\Big) \in (-\Lambda, 0]\Big] \leq O(\Lambda).$$
Similarly one can show the same bound when the interval above is replaced with $(0, \Lambda]$. Hence we get our theorem statement.
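The random-bucketing step above (Lemma 46) can be illustrated numerically. The following is a toy scalar analogue, not the matrix statement: $n$ equal weights (so no coordinate dominates, mimicking $\tau$-regularity) are hashed uniformly into $m$ buckets, and a bucket is "good" if its total weight is at least half its expectation; with overwhelming probability almost all buckets are good. The function name and parameters are illustrative choices, not from the paper.

```python
import random

def bucket_experiment(n=10_000, m=20, seed=7):
    # Scalar analogue of the bucketing lemma: n equal weights 1/n are
    # hashed uniformly into m buckets; a bucket is "good" if its total
    # weight is at least half its expectation 1/m.
    rng = random.Random(seed)
    buckets = [0.0] * m
    for _ in range(n):
        buckets[rng.randrange(m)] += 1.0 / n
    good = sum(1 for s in buckets if s >= 0.5 / m)
    return good, m

good, m = bucket_experiment()
print(good, "of", m, "buckets are good")
```

By a Chernoff bound, the probability that any fixed bucket fails is exponentially small in $n/m$, matching the $\exp(-\Omega(m))$ failure probability in the lemma.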
In this section, we establish our main invariance principles.
We now prove our main lemma which is an invariance principle for the Bentkus mollifier. We remarkthat our analysis is the standard Lindeberg-style argument for proving invariance principles, butwhen applied to the spectral Bentkus mollifier. We first write out the Fr´echet series for the Bentkusmollifier, which we then upper bound using our main Theorem 28. In order to understand the errorterms in the Fr´echet series, we use the matrix Rosenthal inequality (in Fact 7) in order to understandthe moments of random matrices (we remark that this inequality will also be useful in our
PRG construction). Superficially, our proof techniques resemble the previous invariance principle proofsused in [HKM13, ST17, OST19], but the quantities we need to bound are very different fromtheir analysis. To be precise, for a vector v ∈ R k , observe that the event [ ∀ i ∈ [ k ] : v i ≤ b i + Λ , and ∃ j ∈ [ k ] : v j ≥ b j − Λ] canbe broken down into the intersections of Λ / τ m events given by V τm − ℓ =0 [ ∀ i ∈ [ k ] : v i ≤ b i + Λ − ℓ/ τ m, and ∃ j ∈ [ ℓ ] : v j > b j − Λ − ( ℓ + 1) / τ m ]. emma 48. Let k ≥ , θ, τ ∈ (0 , and Ψ θ : Sym k → R be defined as Ψ θ ( M ) = ( G θ ◦ λ ) ( M ) where G θ is the Bentkus mollifier defined in Eq. (18) . Let S , S be ( τ, M ) -regular positive spectrahedronsspecified by matrices { A , . . . , A n , B } and { A , . . . , A n , B } respectively. Let A i = diag (cid:0) A i , A i (cid:1) and B = diag ( B , B ) be block diagonal matrices. Then (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U n " Ψ θ n X i =1 x i A i − B ! − E g ∼G n " Ψ θ n X i =1 g i A i − B ! ≤ O (cid:18) log kθ · ( M + k B k ) · ( M · τ ) . (cid:19) . This inequality holds if x is (10 log k ) -wise uniform.Proof. Let t = ⌈ /τ ⌉ . Let H = { h : [ n ] → [ t ] } be a family of (10 log k )-wise uniform hashingfunctions, i.e., for every subset I ⊆ [ n ] of size at most 10 log k , and b ∈ [ t ] I , we havePr h ∈H [ h ( i ) = b i ] = 1 t | I | , where the probability is taken over a uniformly random function h ∈ H . Fix an h ∈ H (think of h as a partition of [ n ] into t blocks S , . . . , S t ⊆ [ n ], where S i = h − ( i ) for all i ∈ [ t ]). For x ∼ U n and y ∼ G n let us divide x , y into blocks x , . . . , x t and y , . . . , y t according to h . It is not hardto see that x i ∼ uniform {− , } | h − ( i ) | and y i ∼ G | h − ( i ) | . We now upper bound the quantity (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U n " Ψ θ n X i =1 x i A i − B ! − E y ∈G n " Ψ θ n X i =1 y i A i − B ! (43)by the standard hybrid argument. Let { Z , . . . 
, Z t } be a set of random variable on n coordinatessuch that Z is the uniform distribution on {− , } n and Z t is uniform in G n . To this end, define Z ℓ as follows: for j ∈ [ ℓ ], let Z ℓ | h − ( j ) = y j and for ℓ < j ≤ t let Z ℓ | h − ( j ) = x j . It is easy to see that Z ∼ U n and Z t ∼ G n . We now can upper bound Eq. (43) as (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U n " Ψ θ n X i =1 x i A i − B ! − E y ∼G n " Ψ θ n X i =1 y i A i − B ! = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) t X ℓ =1 E x ∼U n y ∼G n " Ψ θ n X i =1 Z ℓi A i − B ! − E x ∼U n y ∼G n " Ψ θ n X i =1 Z ℓ − i A i − B ! ≤ t X ℓ =1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U n y ∼G n " Ψ θ n X i =1 Z ℓi A i − B ! − E x ∼U n y ∼G n " Ψ θ n X i =1 Z ℓ − i A i − B ! (44)We now upper bound each of the t quantities on the RHS of Eq. (44). Fix ℓ ∈ [ t ] and let usassume for simplicity that h − ( ℓ ) = [ m ]. By definition of Z ℓ we observe that Z ℓj = Z ℓ +1 j for all j ∈ { m + 1 , . . . , n } and in fact we have Z ℓ = ( x , . . . , x m , Z m +1 , . . . , Z n ) , Z ℓ +1 = ( y , . . . , y m , Z m +1 , . . . , Z n ) , where x i ∼ U and y i ∈ G is uniform in their respective domains. Crucially note that Z m +1 , . . . , Z n is independent of the x i s or y i s by definition of Z ℓ , Z ℓ +1 . Rewriting the ℓ -th term in Eq. (44),we get (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U n y ∼G n Ψ θ m X i =1 x i A i | {z } Q + n X i = m +1 Z i A i − B | {z } P − E x ∼U n y ∼G n Ψ θ m X i =1 y i A i | {z } R + n X i = m +1 Z i A i − B | {z } P (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (45)40et us analyze both these quantities separately. 
We can first write the Fr´echet series for both theseexpressions asΨ θ ( Q + P ) = Ψ θ ( P ) + D Ψ θ ( P ) [ Q ] + 12 D Ψ θ ( P ) [ Q, Q ] + 16 D Ψ θ (cid:0) P ′ (cid:1) [ Q, Q, Q ] (46)where P ′ = P + ξQ for some ξ ∈ [0 , Ψ θ ( R + P ) = Ψ θ ( P ) + D Ψ θ ( P ) [ R ] + 12 D Ψ θ ( P ) [ R, R ] + 16 D Ψ θ (cid:0) P ′′ (cid:1) [ R, R, R ] , (47)where P ′′ = P + ξ ′ R for some ξ ∈ [0 , x match with the standardnormal distributions. Thus we have that E x ∼U n y ∼G n [ D Ψ θ ( P ) [ R ]] = E x ∼U n y ∼G n [ D Ψ θ ( P ) [ Q ]] E x ∼U n y ∼G n (cid:2) D Ψ θ ( P ) [ R, R ] (cid:3) = E x ∼U n y ∼G n (cid:2) D Ψ θ ( P ) [ Q, Q ] (cid:3) . (48)So by taking the difference of Eq. (47) and Eq. (46), only the third order spectral derivatives remainto be bounded. For this, we now use the Corollary 29 and obtain (cid:12)(cid:12) D Ψ θ (cid:0) P ′ (cid:1) [ Q, Q, Q ] (cid:12)(cid:12) ≤ O (cid:18) ∆ θ log k · k Q k (cid:19) (49) (cid:12)(cid:12) D Ψ θ (cid:0) P ′′ (cid:1) [ R, R, R ] (cid:12)(cid:12) ≤ O (cid:18) ∆ θ log k · k R k (cid:19) . (50)where ∆ = k P ′ k and ∆ = k P ′′ k .Thus, the absolute value of Eq. (45) is upper bounded bylog kθ E h ∆ k Q k + ∆ k R k i ≤ log kθ (cid:18) E h k P ′ k i / E h k Q k i / + E h k P ′′ k i / E h k R k i / (cid:19) , (51)where the inequality is by Cauchy-Schwarz inequality.Using Fact 6 and the fact that P i ( A i ) (cid:22) M · I , we have E h k P ′ k i ≤ O (cid:16) log k · M + k B k (cid:17) , E h k P ′′ k i ≤ O (cid:16) log k · M + k B k (cid:17) (52)We now upper bound the last term in Eq. (51) using the following claim. Claim 49.
It holds that E h k Q k i ≤ O (cid:0) log k · τ · M (cid:1) , E h k R k i ≤ O (cid:0) log k · τ · M (cid:1) . Before proving this claim, observe that combining Claim 49 with Eq. (52), (51), we can upperbound Eq. (51) (and in turn Eq. (45)) by O (cid:18) log kθ · (cid:0) M log k + k B k (cid:1) · (cid:0) log k · τ . · M . (cid:1)(cid:19) ≤ O (cid:18) log kθ · ( M + k B k ) · ( M · τ ) . (cid:19) This follows directly from the mean value theorem for Fr´echet derivatives [AP95]. (cid:12)(cid:12)(cid:12) E x ∼U n " Ψ θ n X i =1 x i A i ! − E y ∼G n " Ψ θ n X i =1 y i A i ! ≤ O (cid:18) log kθ · ( M + k B k ) · ( M · τ ) . (cid:19) , concluding the theorem proof. We now prove the claim above. Proof of Claim 49.
Note that Q = P ni =1 x i A i , where ( x , . . . , x n ) is i.i.d. with Pr [ x i = 1] =Pr[ x i = −
1] = t and Pr[ x i = 0] = 1 − /t . Then using Fact 7, we have E h k Q k p p i / p ≤ p p − (cid:13)(cid:13)(cid:13) t X i (cid:0) A i (cid:1) ! / (cid:13)(cid:13)(cid:13) p + (8 p − t X i k A i k p p ! / p ≤ p p − · r Mt · k p + (8 p − (cid:18) τ p − · k · Mt (cid:19) / p where the second inequality used P i (cid:0) A i (cid:1) (cid:22) M · I for both terms and 0 (cid:22) A i (cid:22) τ I for upperbounding the second term. Setting p = 10 log k , t = 1 /τ we have E h k Q k p p i / p ≤ O (cid:16)p log k · √ τ · √ M + log k · τ · ( M/τ ) / (80 log k ) (cid:17) = O (cid:16) log k · √ τ · √ M (cid:17) . Thus, we have E h k Q k i ≤ E h k Q k p p i p ≤ O (cid:0) log k · τ · M (cid:1) , where in the first inequality note that the LHS is the spectral norm and the RHS is the (8 p )-Schatten norm. This proves the first inequality in the claim statement. The second inequality inthe claim follows by the exact same argument (since Fact 7 applies to even P i g i A i ).The proof of this claim concludes the proof of the theorem. We are now ready to prove our main theorem now, which involves combining our anti-concentrationTheorem 42 and our invariance principle for Bentkus mollifier in Lemma 48. Theorem 50.
Let k ≥ , M ≥ , γ ≥ , τ ∈ [0 , , δ ∈ [0 , . Let S , S be ( τ, M ) -regular positivespectrahedrons specified by matrices { A , . . . , A n , B } ∈ Sym k and { A , . . . , A n , B } ∈ Sym k respec-tively satisfying k B k , k B k ≤ γ . Let S = S ∩ S . If µ is a (10 log k ) -wise uniform distributionover {− , } n , then (cid:12)(cid:12)(cid:12)(cid:12) E x ∼ µ [ x ∈ S ] − E g ∼G n [ g ∈ S ] (cid:12)(cid:12)(cid:12)(cid:12) ≤ C · (cid:0) M + γ (cid:1) / · log / k · M / · τ / , for some universal constant C > . We remark that our theorem statements should also hold true for a larger class of proper distributions as consideredin [HKM13], which requires one to extend our main Theorem 19 to show that even the 4th order spectral derivativescan be bounded by k f (4) k . We believe this should be possible and leave this to be made rigorous for future work. roof. Again for notational simplicity, let A i = diag (cid:0) A i , A i (cid:1) and B = diag ( B , B ) be blockdiagonal matrices. We conclude the result by combining Fact 24, Lemma 48 and Corollary 42 asfollows: first Lemma 48 implies (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼ µ " Ψ θ n X i =1 x i A i − B ! − E g ∼G n " Ψ θ n X i =1 g i A i − B ! ≤ O (cid:18) log kθ · ( M + k B k ) · ( M · τ ) . (cid:19) , In particular, using Fact 24 (for D = B − β · I and D = B + β · I ), the “if” condition of Fact 24 issatisfied with η = O (cid:18) log kθ · (cid:0) M + ( γ + β ) (cid:1) · ( M · τ ) . (cid:19) where β = O ( θ · p log k/δ ). In particular, Fact 24 and Corollary 43 now together imply that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼ µ " Ψ n X i =1 x i A i − B ! − E g ∼G n " Ψ n X i =1 g i A i − B ! ≤ γ + 3 δ + Pr g ∼G n " λ max n X i =1 g i A i − B ! ∈ [ − Λ , Λ] = O (cid:18) log kθ · (cid:18) M + (cid:16) γ + θ · p log( k/δ ) (cid:17) (cid:19) · ( M · τ ) . + δ + Λ (cid:19) ≤ O (cid:18) log kθ · (cid:18) M + (cid:16) γ + p log( k/δ ) (cid:17) (cid:19) · ( M · τ ) . 
+ δ + Λ (cid:19) Let us fix θ ← δ, θ ← Λ , (cid:18) ( M · τ ) . · log k · (cid:18) M + (cid:16) γ + p log k (cid:17) (cid:19)(cid:19) / ← θ. This gives us (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼ µ " Ψ n X i =1 x i A i − B ! − E g ∼G n " Ψ n X i =1 g i A i − B ! ≤ (cid:0) ( M · τ ) . · log k · ( M + γ ) (cid:1) / . We are now ready to describe our pseudorandom generator for fooling positive spectrahedrons.Our
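The invariance principle proved above can be visualized in the simplest scalar setting. The sketch below is only an illustration of the *statement*, not of the Lindeberg telescoping proof: it Monte Carlo-estimates $\mathbb{E}[\psi(\sum_i x_i a_i)]$ under uniform $x \in \{-1,1\}^n$ versus Gaussian $g$, for a smooth bounded test function $\psi$ standing in for the Bentkus mollifier. All names and parameter choices here are illustrative assumptions.

```python
import math, random

def invariance_gap(a, psi, trials=20_000, seed=1):
    # Compare E[psi(sum_i x_i a_i)] for Rademacher x against Gaussian g.
    # The invariance principle predicts a small gap when the coefficient
    # vector is regular (no single |a_i| dominates).
    rng = random.Random(seed)
    s_bool = s_gauss = 0.0
    for _ in range(trials):
        s_bool += psi(sum(ai * rng.choice((-1.0, 1.0)) for ai in a))
        s_gauss += psi(sum(ai * rng.gauss(0.0, 1.0) for ai in a))
    return abs(s_bool - s_gauss) / trials

n = 200
a = [1.0 / math.sqrt(n)] * n                        # tau-regular: max |a_i| = n^(-1/2)
psi = lambda t: 1.0 / (1.0 + math.exp(-4.0 * t))    # smooth, bounded in [0, 1]
print(invariance_gap(a, psi))
```

The actual proof does not compare the two endpoints directly: it walks through $t$ hybrid distributions, replacing one block of Boolean coordinates by Gaussians at a time, and bounds each step via third-order Fréchet derivatives of the mollifier.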
PRG is based on the well-known construction of Meka and Zuckerman [MZ13] which we de-scribe now. We remark that the same
PRG (with minor modifications and different parametersettings) was used in [MZ13, HKM13, ST17] in order to obtain
PRG s for polytopes.
Meka-Zuckerman PRG.
We begin by describing the Meka-Zuckerman
PRG. Let us fix the parameters: $\delta \in (0,1)$ and $\tau = \Omega\big(\delta^{O(1)}/(\log k \cdot M \cdot (M + \gamma))\big)$, chosen so that the upper bound obtained in our invariance-principle proof equals $\delta$. Let $t = \lceil 1/\tau \rceil$ and consider a family of $(5 \log k)$-wise uniform hash functions $\mathcal{H} = \{h : [n] \to [t]\}$, i.e., for every subset $I \subseteq [n]$ of size at most $5 \log k$ and $b \in [t]^I$, we have
$$\Pr_{h \in \mathcal{H}}[h(i) = b_i \text{ for all } i \in I] = \frac{1}{t^{|I|}},$$
where the probability is taken over a uniformly random $h \in \mathcal{H}$. Efficient constructions of such hash function families are known with $|\mathcal{H}| = n^{O(\log k)}$. For simplicity (as in the proofs of [MZ13, HKM13]), we also assume that for every $j \in [t]$, we have $|h^{-1}(j)| = n/t$. Let $m = n/t$ and let $G' : \{0,1\}^s \to \{-1,1\}^m$ generate a $(10 \log k)$-wise uniform distribution over $\{-1,1\}^m$, i.e., for every $I \subseteq [m]$ of size at most $10 \log k$ and $b \in \{-1,1\}^I$, we have
$$\Pr_{z \in \{0,1\}^s,\, x = G'(z)}[x_i = b_i \text{ for all } i \in I] = \frac{1}{2^{|I|}},$$
where the probability is taken over uniformly random $z \in \{0,1\}^s$. It is well known by [NN93] that efficient constructions of such generators $G'$ exist with $s = O(\log k \log n)$. Finally, we are ready to describe the Meka-Zuckerman generator: for a given hash function family $\mathcal{H}$ and generator $G'$, define $G : \mathcal{H} \times (\{0,1\}^s)^t \to \{-1,1\}^n$ by
$$G(h, z_1, \ldots, z_t) = x, \qquad \text{where } x|_{h^{-1}(i)} = G'(z_i) \text{ for } i \in [t].$$
Clearly the seed length of this generator is
$$O\Big((\log n)(\log k) + (\log n)(\log k) \cdot \frac{1}{\tau}\Big) = O\big((\log n)(\log k)/\tau\big) = (\log n) \cdot \mathrm{poly}(\log k, M, 1/\delta, \gamma),$$
where the first term is the logarithm of the number of elements of the hash function family $|\mathcal{H}|$, the second term arises because $s = O((\log n)(\log k))$ and we picked $t = O(1/\tau)$, and the final equality used the bound on $\tau$ we fixed at the start. We now restate our main theorem and prove it.

Theorem 51.
Let $\delta \in (0,1)$, $k, n, M \geq 1$ and $\tau \leq \delta^{O(1)}/(\log k \cdot M \cdot (M + \gamma))$. Let $S_1, S_2$ be $(\tau, M)$-regular positive spectrahedrons specified by matrices $\{A_{11}, \ldots, A_{n1}, B_1\} \subseteq \mathrm{Sym}_k$ and $\{A_{12}, \ldots, A_{n2}, B_2\} \subseteq \mathrm{Sym}_k$ with $\|B_1\|, \|B_2\| \leq \gamma$. Let $S = S_1 \cap S_2$. There exists a PRG $G : \{0,1\}^r \to \{-1,1\}^n$ with $r = (\log n) \cdot \mathrm{poly}(\log k, M, 1/\delta, \gamma)$ that $\delta$-fools $S$ with respect to the uniform distribution.

The proof of this theorem is a generic statement that allows one to go from invariance principles proven using our proof techniques to constructions of PRGs. The proof uses the same ideas as Harsha, Klivans and Meka [HKM13, Section 7.2] (except that now we directly proved Boolean anti-concentration instead of the weaker Gaussian anti-concentration proven by [HKM13]). We provide the proof below for completeness.
Proof.
Again for notational simplicity, let A i = diag (cid:0) A i , A i (cid:1) and B = diag ( B , B ) be blockdiagonal matrices. The PRG G will be the Meka-Zuckerman PRG defined above, so the seed length r = (log n ) · poly(log k, M, /δ, γ ) immediately follows. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U r " Ψ θ n X i =1 ( G ( x )) i A i − B ! − E g ∼G n " Ψ θ n X i =1 g i A i − B ! ≤ O (cid:18) log kθ · ( M + k B k ) · ( M · τ ) . (cid:19) , (53)44here we used the fact that G ( x ) for uniformly random x ∈ { , } r generates a (10 log k )-wise uni-form distribution and Lemma 48 holds for every (10 log k )-wise uniform distribution µ . Repeatingthe same calculation that we did in the proof of Theorem 50, we get (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E x ∼U r " Ψ n X i =1 ( G ( x )) i A i − B ! − E g ∼G n " Ψ n X i =1 g i A i − B ! ≤ γ + 3 δ + Pr g ∼G n [ λ max ( A ( g )) ∈ ( − Λ , Λ]]= O (cid:18) log kθ · ( M + k B k ) · ( M · τ ) . + δ + Λ (cid:19) , and using our assumption on τ (and the same parameters as in Theorem 50), this implies that (cid:12)(cid:12)(cid:12)(cid:12) E x ∼U r [ G ( x ) ∈ S ] − E g ∼G n [ g ∈ S ] (cid:12)(cid:12)(cid:12)(cid:12) ≤ δ, hence proving our theorem statement. References [AHK05] Sanjeev Arora, Elad Hazan, and Satyen Kale. Fast algorithms for approximate semidef-inite programming using the multiplicative weights update method. In , pages 339–348.IEEE, 2005. 1[AK07] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefiniteprograms. In
Proceedings of the thirty-ninth annual ACM symposium on Theory ofcomputing , pages 227–236, 2007. 1[AP95] Antonio Ambrosetti and Giovanni Prodi.
A primer of nonlinear analysis , volume 34.Cambridge University Press, 1995. 41[AS10] Brendan P.W. Ames and Hristo S. Sendov. Asymptotic expansions of the orderedspectrum of symmetric matrices.
Nonlinear Analysis: Theory, Methods & Applications ,72(11):4288 – 4297, 2010. 6[AS12] Brendan P.W. Ames and Hristo S. Sendov. A new derivation of a formula by Kato.
Linear Algebra and its Applications , 436(3):722 – 730, 2012. 6[AS16] Brendan P.W. Ames and Hristo S. Sendov. Derivatives of compound matrix valuedfunctions.
Journal of Mathematical Analysis and Applications , 433(2):1459 – 1485,2016. 6[AZLO16] Zeyuan Allen-Zhu, Yin Tat Lee, and Lorenzo Orecchia. Using optimization to obtaina width-independent, parallel, simpler, and faster positive SDP solver. In
Proceedingsof the 2016 Annual ACM-SIAM Symposium on Discrete Algorithms , pages 1824–1831,2016. 1[Bal93] Keith Ball. The reverse isoperimetric problem for Gaussian measure.
Discrete & Com-putational Geometry , 10(4):411–420, 1993. 7, 3145Bal13] Keith Ball. Talk: Noise sensitivity and Gaussian surface area, 2013. .7, 34[Baz09] Louay MJ Bazzi. Polylogarithmic independence can fool DNF formulas.
SIAM Journalon Computing , 38(6):2220–2272, 2009. 11[Ben90] Vidmantas Bentkus. Smooth approximations of the norm and differentiable func-tions with bounded support in Banach space ℓ k ∞ . Lithuanian Mathematical Journal ,30(3):223–230, 1990. 2, 5, 6, 17, 18[Bha00] Rajendra Bhatia. Pinching, trimming, truncating, and averaging of matrices.
TheAmerican Mathematical Monthly , 107(7):602–608, 2000. 13[Bha13] Rajendra Bhatia.
Matrix analysis , volume 169. Springer Science & Business Media,2013. 6, 15, 32[BHK +
19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K Kothari, Ankur Moitra,and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted cliqueproblem.
SIAM Journal on Computing , 48(2):687–735, 2019. 1[BLZ05] Jan Brinkhuis, Zhi-Quan Luo, and Shuzhong Zhang. Matrix convex functions withapplications to weighted centers for semidefinite programming, 2005. 6, 15[Boo05] Carl de Boor.
Divided differences . Surv. Approx. Theory 1, 2005. 13[BPT12] Grigoriy Blekherman, Pablo A. Parrilo, and Rekha R. Thomas.
Semidefinite Optimiza-tion and Convex Algebraic Geometry . Society for Industrial and Applied Mathematics,2012. 1[BS99] Rajendra Bhatia and Kalyan B. Sinha. Derivations, derivatives and chain rules.
LinearAlgebra and its Applications , 302-303:231 – 244, 1999. 5[BSS98] Rajendra Bhatia, Dinesh Singh, and Kalyan B. Sinha. Differentiation of operator func-tions and perturbation bounds.
Communications in Mathematical Physics , 191:603–611, 1998. 5[CDS19] Eshan Chattopadhyay, Anindya De, and Rocco A Servedio. Simple and efficient pseu-dorandom generators from Gaussian processes. In . Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019. 1,2, 3, 6, 7[Col12] Rodney Coleman.
Calculus on normed vector spaces . Springer Science & BusinessMedia, 2012. 14, 15[CQT03] Xin Chen, Houduo Qi, and Paul Tseng. Analysis of nonsmooth symmetric-matrix-valued functions with applications to semidefinite complementarity problems.
SIAMJournal on Optimization , 13(4):960–985, 2003. 5[DGJ +
10] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A Servedio, and EmanueleViola. Bounded independence fools halfspaces.
SIAM Journal on Computing ,39(8):3441–3462, 2010. 1, 2, 3, 7, 32, 3446DHK +
10] Ilias Diakonikolas, Prahladh Harsha, Adam Klivans, Raghu Meka, Prasad Raghaven-dra, Rocco A Servedio, and Li-Yang Tan. Bounding the average sensitivity and noisesensitivity of polynomial threshold functions. In
Proceedings of the forty-second ACMsymposium on Theory of computing , pages 533–542, 2010. 1, 7[DP09] Devdatt P. Dubhashi and Alessandro Panconesi.
Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009. 38

[Erd45] Paul Erdős. On a lemma of Littlewood and Offord. Bulletin of the American Mathematical Society, 51(12):898–902, 1945. 8

[Fel68] William Feller. An introduction to probability theory and its applications, vol 1. New York: Wiley, 1968. 18

[FF88] Péter Frankl and Z. Füredi. Solution of the Littlewood-Offord problem in high dimensions.
Annals of Mathematics , pages 259–270, 1988. 8[FK20] Xiao Fang and Yuta Koike. High-dimensional central limit theorems by Stein’s method. arXiv preprint arXiv:2001.10917 , 2020. 17, 18[FMP +
15] Samuel Fiorini, Serge Massar, Sebastian Pokutta, Hans Raj Tiwary, and Ronald deWolf. Exponential lower bounds for polytopes in combinatorial optimization.
Journalof the ACM (JACM) , 62(2):1–23, 2015. 1[GKM18] Parikshit Gopalan, Daniel M Kane, and Raghu Meka. Pseudorandomness via thediscrete Fourier transform.
SIAM Journal on Computing, 47(6):2451–2487, 2018. 1

[GM12] Bernd Gärtner and Jiří Matoušek. Approximation algorithms and semidefinite programming. Springer Science & Business Media, 2012. 1, 10

[GOWZ10] Parikshit Gopalan, Ryan O'Donnell, Yi Wu, and David Zuckerman. Fooling functions of halfspaces under product distributions. In 25th Annual IEEE Conference on Computational Complexity, pages 223–234. IEEE, 2010. 1, 3

[GW95] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming.
J. ACM ,42(6):1115–1145, 1995. 1[GW13] Gus Gutoski and Xiaodi Wu. Parallel approximation of min-max problems.
Computa-tional Complexity , 22:385 – 428, 2013. 1[Hau92] David Haussler. Decision theoretic generalizations of the PAC model for neural net andother learning applications.
Information and computation , 100(1):78–150, 1992. 9[HKM13] Prahladh Harsha, Adam Klivans, and Raghu Meka. An invariance principle for poly-topes.
Journal of the ACM (JACM) , 59(6):1–25, 2013. , 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,25, 35, 39, 42, 43, 44[HMV06] J William Helton, Scott A McCullough, and Victor Vinnikov. Noncommutative convex-ity arises from linear matrix inequalities.
Journal of Functional Analysis , 240(1):105–191, 2006. 10[JJUW11] Rahul Jain, Zhengfeng Ji, Sarvagya Upadhyay, and John Watrous. QIP = PSPACE.
Journal of the ACM , 58(6), 2011. 1 47JLL +
20] Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, and Kevin Tian. Posi-tive semidefinite programming: Mixed, parallel, and width-independent. In
Proceedingsof the 52nd Annual ACM SIGACT Symposium on Theory of Computing , STOC 2020,page 789–802, 2020. 1[JUW09] R. Jain, S. Upadhyay, and J. Watrous. Two-message quantum interactive proofs are inPSPACE. In ,pages 534–543, 2009. 1[JY11] R. Jain and P. Yao. A parallel approximation algorithm for positive semidefinite pro-gramming. In , pages 463–471, 2011. 1[Kan10] Daniel M. Kane. k -independent Gaussians fool polynomial threshold functions. arXivpreprint arXiv:1012.1614 , 2010. 1[Kan11a] Daniel M. Kane. The Gaussian surface area and noise sensitivity of degree- d polynomialthreshold functions. computational complexity , 20(2):389–412, 2011. 1, 7, 34[Kan11b] Daniel M. Kane. k-independent Gaussians fool polynomial threshold functions. In Proceedings of the 26th Annual IEEE Conference on Computational Complexity, CCC ,pages 252–261. IEEE Computer Society, 2011. 1[Kan11c] Daniel M. Kane. A small PRG for polynomial threshold functions of Gaussians. InRafail Ostrovsky, editor,
IEEE 52nd Annual Symposium on Foundations of ComputerScience, FOCS , pages 257–266. IEEE Computer Society, 2011. 1[Kan14a] Daniel Kane. The average sensitivity of an intersection of half spaces.
Research in theMathematical Sciences , 1(1):13, 2014. 7, 8, 11, 32, 35[Kan14b] Daniel M. Kane. A pseudorandom generator for polynomial threshold functions ofGaussian with subpolynomial seed length. In , pages 217–228. IEEE, 2014. 1[KKMS08] Adam Tauman Kalai, Adam R Klivans, Yishay Mansour, and Rocco A Servedio. Ag-nostically learning halfspaces.
SIAM Journal on Computing , 37(6):1777–1805, 2008.9[KM15] Pravesh K. Kothari and Raghu Meka. Almost optimal pseudorandom generators forspherical caps. In
Proceedings of the forty-seventh annual ACM symposium on Theoryof computing , pages 247–256, 2015. 1, 11[KOS04] Adam R Klivans, Ryan O’Donnell, and Rocco A Servedio. Learning intersections andthresholds of halfspaces.
Journal of Computer and System Sciences , 68(4):808–840,2004. 9, 31[KOS08] Adam R Klivans, Ryan O’Donnell, and Rocco A Servedio. Learning geometric conceptsvia Gaussian surface area. In , pages 541–550. IEEE, 2008. 2, 3, 7, 9[KSS94] Michael J Kearns, Robert E Schapire, and Linda M Sellie. Toward efficient agnosticlearning.
Machine Learning , 17(2-3):115–141, 1994. 948Lew96] A. S. Lewis. Derivatives of spectral functions.
Mathematics of Operations Research, 21(3):576–588, 1996. 5

[Lin22] J. W. Lindeberg. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 15:211–225, 1922. 2, 4

[LO39] John Edensor Littlewood and Albert C. Offord. On the number of real roots of a random algebraic equation. II. In
Mathematical Proceedings of the Cambridge PhilosophicalSociety , volume 35, pages 133–148. Cambridge University Press, 1939. 8[LRS15] James R Lee, Prasad Raghavendra, and David Steurer. Lower bounds on the size ofsemidefinite programming relaxations. In
Proceedings of the forty-seventh annual ACMsymposium on Theory of computing , pages 567–576, 2015. 1[MJC +
14] Lester Mackey, Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A.Tropp. Matrix concentration inequalities via the method of exchangeable pairs.
Ann.Probab. , 42(3):906–945, 05 2014. 6, 14[MOO05] Elchanan Mossel, Ryan O’Donnell, and Krzysztof Oleszkiewicz. Noise stability of func-tions with low influences: invariance and optimality. In , pages 21–30. IEEE, 2005. 5[Mos08] Elchanan Mossel. Gaussian bounds for noise correlation of functions and tight analysisof long codes. In , pages 156–165. IEEE Computer Society, 2008. 5[MZ13] Raghu Meka and David Zuckerman. Pseudorandom generators for polynomial thresholdfunctions.
SIAM Journal on Computing , 42(3):1275–1301, 2013. 1, 2, 3, 9, 43, 44[Naz03] Fedor Nazarov. On the maximal perimeter of a convex set in R n with respect to aGaussian measure. In Geometric aspects of functional analysis , pages 169–187. Springer,2003. 2, 3, 5, 7, 8, 32, 35[NN93] Joseph Naor and Moni Naor. Small-bias probability spaces: Efficient constructions andapplications.
A Proof of Lemma 34: Case 2
Recall that the goal is to prove the following inequality:
$$\Bigg|\sum_{i_1\neq i_2\neq i_3}\frac{\frac{g(x_{i_1})-g(x_{i_2})}{x_{i_1}-x_{i_2}}-\frac{g(x_{i_1})-g(x_{i_3})}{x_{i_1}-x_{i_3}}}{x_{i_2}-x_{i_3}}\; G(x)\, H_{i_1,i_2}H_{i_2,i_3}H_{i_3,i_1}\Bigg| \le O\big(\Delta\cdot\log^2 k\cdot\|H\|^3\big). \tag{54}$$
First observe that the LHS of the inequality above can be rephrased as follows:
$$\Bigg|\sum_{\substack{i_1\neq i_2\neq i_3\\ x_{i_2}>x_{i_3}}}\frac{\frac{g'(x_{i_1})g(x_{i_2})-g(x_{i_1})g'(x_{i_2})}{x_{i_1}-x_{i_2}}\,g(x_{i_3})-\frac{g'(x_{i_1})g(x_{i_3})-g(x_{i_1})g'(x_{i_3})}{x_{i_1}-x_{i_3}}\,g(x_{i_2})}{x_{i_2}-x_{i_3}}\, G\big(x_{-\{i_1,i_2,i_3\}}\big)\, H_{i_1,i_2}H_{i_2,i_3}H_{i_3,i_1}\Bigg|. \tag{55}$$
Providing an upper bound on this consists of several lemmas, and the result is concluded by combining all of them via triangle inequalities. To keep the expressions short, we use the following notation to represent Eq. (54), whose meaning is clear from context:
$$\Bigg|\sum_{\substack{i_1\neq i_2\neq i_3\\ x_{i_2}>x_{i_3}}}\frac{\frac{h'_{i_1}h_{i_2}-h_{i_1}h'_{i_2}}{[i_1-i_2]}\,h_{i_3}-\frac{h'_{i_1}h_{i_3}-h_{i_1}h'_{i_3}}{[i_1-i_3]}\,h_{i_2}}{[i_2-i_3]}\Bigg|, \tag{56}$$
where we implicitly hide the $G\big(x_{-\{i_1,i_2,i_3\}}\big)\,H_{i_1,i_2}H_{i_2,i_3}H_{i_3,i_1}$ term. We first give a sketch of how we are going to upper bound this quantity and break it into subsections:
$$(56) = \underbrace{\frac{h_{i_1}h'_{i_2}-h'_{i_1}h_{i_2}}{[i_1-i_2]}\cdot\frac{h_{i_2}-h_{i_3}}{[i_2-i_3]}}_{\text{Section A.1, Lemma 52}} \;-\; \underbrace{\frac{\frac{h_{i_1}h'_{i_2}-h'_{i_1}h_{i_2}}{[i_1-i_2]}-\frac{h_{i_1}h'_{i_3}-h'_{i_1}h_{i_3}}{[i_1-i_3]}}{[i_2-i_3]}\,h_{i_3}}_{(\star)\ \text{Section A.2, Remark 1}}.$$
We now break up Remark 1 into two cases:
$$(\star) = \underbrace{\text{Remark 1}\cdot\mathbb{I}\big[\min\{x_{i_2},x_{i_3}\} > x_{i_1}\big]}_{(\dagger)} \;+\; \underbrace{\text{Remark 1}\cdot\mathbb{I}\big[x_{i_3} < x_{i_1} < x_{i_2}\big]}_{(\dagger\dagger)}.$$
Note that these are the only two cases we need to handle since, by the symmetry between $i_2$ and $i_3$, we can assume $x_{i_2} > x_{i_3}$ without loss of generality. Now we bound these two terms separately:
$$(\dagger) = \underbrace{\frac{\frac{h'_{i_1}-h'_{i_2}}{[i_1-i_2]}-\frac{h'_{i_1}-h'_{i_3}}{[i_1-i_3]}}{[i_2-i_3]}\,h_{i_2}h_{i_3}}_{\text{Section A.2, Lemma 57}} \;-\; \underbrace{\frac{\frac{h_{i_1}-h_{i_2}}{[i_1-i_2]}-\frac{h_{i_1}-h_{i_3}}{[i_1-i_3]}}{[i_2-i_3]}\,h'_{i_2}h_{i_3}}_{\text{Section A.2, Lemma 58}},$$
and
$$(\dagger\dagger) = \underbrace{\frac{\frac{h_{i_1}h'_{i_2}-h_{i_2}h'_{i_1}}{[i_1-i_2]}-\frac{h_{i_1}h'_{i_3}-h_{i_3}h'_{i_1}}{[i_1-i_3]}}{[i_2-i_3]}\,h_{i_3}}_{\text{Section A.3, Lemma 59}} \;+\; \underbrace{\frac{\frac{h'_{i_1}-h'_{i_2}}{[i_1-i_2]}\,h_{i_2}-\frac{h'_{i_1}-h'_{i_3}}{[i_1-i_3]}\,h_{i_3}}{[i_2-i_3]}\,h_{i_3}}_{(\P)\ \text{Section A.4, Remark 2}},$$
and
$$(\P) = \underbrace{\frac{h'_{i_1}-h'_{i_2}}{[i_1-i_2]}\cdot\frac{h_{i_2}-h_{i_3}}{[i_2-i_3]}\cdot h_{i_3}}_{\text{Section A.4, Lemma 60}} \;+\; \underbrace{\frac{\frac{h'_{i_1}-h'_{i_2}}{[i_1-i_2]}-\frac{h'_{i_1}-h'_{i_3}}{[i_1-i_3]}}{[i_2-i_3]}\,h_{i_2}h_{i_3}}_{\text{Section A.4, Lemma 61}}.$$
Each of these terms is upper bounded by $O\big(\Delta\cdot\log^2 k\cdot\|H\|^3\big)$ in the respective sections (as indicated by the underbraces).

A.1 Case 2.1

Lemma 52.
$$\Bigg|\sum_{\substack{i_1\neq i_2\neq i_3\\ x_{i_2}>x_{i_3}}} \frac{h_{i_1}h'_{i_2}-h'_{i_1}h_{i_2}}{[i_1-i_2]}\cdot\frac{h_{i_2}-h_{i_3}}{[i_2-i_3]}\Bigg| \le O\big(\Delta\cdot\log k\cdot\|H\|^3\big).$$

Remark 1.
Using the triangle inequality, it suffices to upper bound (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi h i ih i i ′ −h i i ′ h i i [ i − i ] − h i ih i i ′ −h i i ′ h i i [ i − i ] [ i − i ] h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Proof of Lemma 52.
We apply Claim 22 to the first sum and obtain O (∆ max { g ′ ( x i ) , g ′ ( x i ) } )(note that we have max {· , ·} to compensate for the fact that x i ≥ x i or x i ≥ x i ). Therefore, theleft hand side in Lemma 52 can be upper bounded by O X i = i = i xi >xi ∆ (cid:12)(cid:12)(cid:12)(cid:12) max (cid:8) g ′ ( x i ) , g ′ ( x i ) (cid:9) · g ( x i ) − g ( x i ) x i − x i · G (cid:0) x −{ i ,i ,i } (cid:1) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12) ≤ O X i = i = i xi >xi ≥ ,xi ≥ ( · · · ) + X i = i = i xi >xi ≥ ,xi < ( · · · ) + X i = i = i xi >xi ,xi < ,xi ≥ ( · · · ) + X i = i = i xi >xi ,xi < ,xi < ( · · · ) (57) First term in Eq. (57) . Note that g ( x ) ≥ if x ≥
0. Since g ′ is monotone decreasing in theinterval [0 , ∞ ), the first summation is upper bounded by O (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi ≥ ,xi ≥ ∆ (cid:12)(cid:12) max (cid:8) g ′ ( x i ) g ′ ( x i ) G ( x − i ) , g ′ ( x i ) g ′ ( x i ) G ( x − i ) (cid:9) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O ∆ · k G (2) k · max i ,i X i | H i ,i H i ,i H i ,i | ! (58) ≤ O ∆ · log k · max i ,i X i | H i ,i H i ,i H i ,i | ! ≤ O (cid:0) ∆ · log k · k H k (cid:1) (59)where the first inequality used that | g ′ ( ζ i ,i ) | ≤ max {| g ′ ( x i ) | , | g ′ ( x i ) |} ) and the last inequalityfollows by Eq. (35). Second term in Eq. (57) . The second summation is upper bounded as follows, again by the52ean value theorem observe that O X i = i = i xi >xi ≥ ,xi < ∆ (cid:12)(cid:12) max (cid:8) g ′ ( x i ) g ′ ( x i ) , g ′ ( x i ) g ′ ( x i ) (cid:9) G ( x − i ) H i ,i H i ,i H i ,i (cid:12)(cid:12) ≤ O ∆ · X i : x i < k G (1) ( x − i ) k max i X i | H i ,i H i ,i H i ,i | + k G (2) ( x − i ) k max i ,i | H i ,i H i ,i H i ,i | ≤ O (cid:16) ∆ · log . k · k H k (cid:17) , where the last inequality is from Fact 20 and the assumption that |{ i : x i ≤ }| ≤ k . Third term in Eq. (57) . Using the fact that g ′ ( · ) is bounded by a constant, the thirdsummation is upper bounded by O X i = i = i xi >xi ,xi < ,xi ≥ ∆ (cid:12)(cid:12) max (cid:8) g ′ ( x i ) , g ′ ( x i ) (cid:9) · G (cid:0) x −{ i ,i } (cid:1) H i ,i H i ,i H i ,i (cid:12)(cid:12) = O X i = i = i xi >xi ,xi ≥ ,xi < ,xi ≥ ( · · · ) + X i = i = i xi >xi ,xi ≥ ,xi < ,xi < ( · · · ) . (60)For the first summation in Eq. (60), using the fact that g ( x ) ≥ when x ≥
0, it is upper bounded by O ∆ X i = i = i xi >xi ,xi ≥ ,xi < ,xi ≥ (cid:12)(cid:12) max (cid:8) g ′ ( x i ) , g ′ ( x i ) (cid:9) · G (cid:0) x −{ i } (cid:1) H i ,i H i ,i H i ,i (cid:12)(cid:12) ≤ O ∆ X i : x i < k G (1) k max i X i | H i ,i H i ,i H i ,i | ≤ O ∆ X i : x i < p log k max i X i | H i ,i H i ,i H i ,i | ≤ O (cid:16) ∆ · log . k · k H k (cid:17) , where the second inequality is from Fact 20, and the last inequality used Eq. (35) and the assumptionthat |{ i : x i ≤ }| ≤ k .In order to upper bound the second summation in Eq. (60), first observe that both g ( · ) and G ( · ) are positive and upper bounded by 1. Thus, Eq. (60) can be bounded as O ∆ X i = i xi < ,xi < X i | H i ,i H i ,i H i ,i | ≤ O (cid:16) ∆ · log k · k H k (cid:17) . |{ i : x i ≤ }| ≤ k . Fourth term in Eq. (57) . The last summation is upper bounded by O (cid:16) ∆ · log k · k H k (cid:17) using the same arguments to upper bound the second summation in Eq. (60). A.2 Case 2.2
We upper bound the quantity in Remark 1 in the two cases $x_{i_2} > x_{i_3}$ and $x_{i_3} > x_{i_2}$. In order to prove this lemma we need the following lemmas and claims.

Claim 53. For integer $k \ge 1$, $X \in \mathrm{Sym}_k$ and $H \in \mathrm{Mat}_k$ it holds that
$$\big\|(XH+HX)e^{-X^2/2}\big\| \le 2\|X\|\cdot\big\|He^{-X^2/2}\big\|.$$

Proof. As the Schatten norm is unitarily invariant, we may assume that $X = \mathrm{diag}(x_1,\ldots,x_k)$ is diagonal without loss of generality. Then
$$\big\|(XH+HX)e^{-X^2/2}\big\|^2 = \sum_{i,j} H_{i,j}^2\,(x_i+x_j)^2\, e^{-x_j^2} \le 4\|X\|^2\sum_{i,j} H_{i,j}^2\, e^{-x_j^2} = 4\|X\|^2\,\big\|He^{-X^2/2}\big\|^2.$$

Lemma 54.
Given an integer k ≥ , u , u , u ≥ satisfying u + u + u = 1 and X ∈ Sym k , H , H , H ∈ Mat k , if u , u ≤ , then it holds that (cid:12)(cid:12)(cid:12) Tr h e − u X H e − u X H e − u X H i(cid:12)(cid:12)(cid:12) ≤ k H e − X k · k H e − X k · k H k Proof.
Using the inequality | Tr ABC | ≤ k A k · k B k · k C k (where k · k is the standard Frobeniusnorm and k · k is the spectral norm), we have (cid:12)(cid:12)(cid:12) Tr h e − u X H e − u X H e − u X H i(cid:12)(cid:12)(cid:12) ≤ k e − u X H e ( u − ) X k · k e − u X H e ( u − ) X k · k H k≤ k H e − X k · k H e − X k · k H k The last inequality is from Lemma 55
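The first step of this proof is the standard bound $|\mathrm{Tr}[ABC]| \le \|A\|\cdot\|B\|\cdot\|C\|_{\mathrm{op}}$ (Frobenius norms on the first two factors, spectral norm on the third), which follows from Cauchy–Schwarz for the trace inner product together with $\|BC\| \le \|B\|\,\|C\|_{\mathrm{op}}$. A quick numerical sanity check of this generic inequality (illustrative only, not part of the proof):

```python
import numpy as np

# Numerical sanity check (illustrative): |Tr[ABC]| <= ||A||_F * ||B||_F * ||C||_op,
# i.e. Frobenius norms on the first two factors and the spectral norm on the last.
rng = np.random.default_rng(0)
k = 8

for _ in range(1000):
    A, B, C = (rng.standard_normal((k, k)) for _ in range(3))
    lhs = abs(np.trace(A @ B @ C))
    rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro') * np.linalg.norm(C, 2)
    assert lhs <= rhs + 1e-9

print("trace inequality holds on 1000 random triples")
```

The same check passes for any matrix size; the spectral norm on the last factor is what lets the proof keep a bare $\|H_3\|$ with no Gaussian damping.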
Lemma 55.
Given diagonal matrices $A = \mathrm{diag}(a_1,\ldots,a_k)$, $B = \mathrm{diag}(b_1,\ldots,b_k)$ with $a_1 \ge \cdots \ge a_k \ge 0$ and $b_1 \ge \cdots \ge b_k \ge 0$, and a symmetric matrix $H$, it holds that $\|AHB\| \le \|HAB\|$.

Proof. Note that
$$\|HAB\|^2 - \|AHB\|^2 = \sum_{i,j} H_{i,j}^2\big(a_j^2 b_j^2 - a_i^2 b_j^2\big) = \frac12\sum_{i,j} H_{i,j}^2\Big(a_i^2 b_i^2 + a_j^2 b_j^2 - a_i^2 b_j^2 - a_j^2 b_i^2\Big) = \frac12\sum_{i,j} H_{i,j}^2\big(a_i^2 - a_j^2\big)\big(b_i^2 - b_j^2\big) \ge 0,$$
where the second equality is from the symmetry of $H$ (pairing the $(i,j)$ and $(j,i)$ terms), and the final quantity is nonnegative because the $a_i$'s and $b_i$'s are sorted in the same order.

Lemma 56. Given an integer $k \ge 1$, matrices $A, B, C \in \mathrm{Mat}_k$ and $X \in \mathrm{Sym}_k$ with $\|X\| \le \Delta$, it holds that
$$\Big|\mathrm{Tr}\Big[D^2\big(e^{-X^2/2}\big)[A,B]\,C\Big]\Big| \le O(\Delta^2)\cdot\max\Big\{\|Ae^{-X^2/2}\|\cdot\|Be^{-X^2/2}\|\cdot\|C\|,\ \|Ae^{-X^2/2}\|\cdot\|Ce^{-X^2/2}\|\cdot\|B\|,\ \|Be^{-X^2/2}\|\cdot\|Ce^{-X^2/2}\|\cdot\|A\|\Big\}.$$

Proof.
Combining Lemma 12, Lemma 54 and the inequalities $\|(XA+AX)e^{-X^2/2}\| \le 2\Delta\,\|Ae^{-X^2/2}\|$ and $\|XA+AX\| \le 2\Delta\,\|A\|$ (both using $\|X\| \le \Delta$), we conclude the result.

Lemma 57.
$$\Bigg|\sum_{\substack{i_1\neq i_2\neq i_3\\ x_{i_2}>x_{i_1},\ x_{i_3}>x_{i_1}}} \frac{\frac{h'_{i_1}-h'_{i_2}}{[i_1-i_2]}-\frac{h'_{i_1}-h'_{i_3}}{[i_1-i_3]}}{[i_2-i_3]}\,h_{i_2}h_{i_3}\Bigg| \le O\big(\Delta\cdot\log^{1.5} k\cdot\|H\|^3\big).$$

Proof of Lemma 57.
We break the summation into two summations (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi ( · · · ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi ( · · · ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (61)For the first summation, we define A i,j = ( H i,j , if x i < x j , otherwise . and Then k A k ≤ log k · k H k by Fact 4 (without loss of generality, we may assume that x i s aresorted in increasing order. Further notice that all the diagonal entries of H are zeros. Thus A isthe upper triangle part of H ). We first bound the first term in Eq. (61). In this direction, we firstrewrite it as1 √ π X i G ( x − i ) (cid:18)(cid:16) D (cid:16) e − X / (cid:17) [ A, A T ] H (cid:17) i ,i (cid:19) = 1 √ π X i < ( · · · ) + 1 √ π X i ≥ ( · · · ) , (62)where X = diag ( x , . . . , x k ) and we implicitly used that we are summing over terms with x i < x i .For the first summation in Eq. (62), (cid:12)(cid:12)(cid:12) Tr (cid:16) D (cid:16) e − X / (cid:17) [ A, A T ] HE i ,i (cid:17)(cid:12)(cid:12)(cid:12) ≤ log k · max n k Ae − X / k · k HE i ,i k , k Ae − X / k · k He − X / k · k AE i ,i k o ≤ log k k He − X / k · k H k . (63)55here the first inequality is by Lemma 56 and the second inequality is because X and E i ,i arediagonal and A is a submatrix of H . Thus, the first summation in Eq. (62) is upper bounded by∆ · log k √ π X i : x i < G ( x − i ) k He − X / k · k H k = ∆ log k · X i : x i < G ( x − i ) X i ,i e − x i H i ,i k H k ≤ ∆ log k · max i X i = i g ′ ( x i ) · G ( x − i ) · H i ,i k H k ≤ O (cid:16) ∆ log . k k H k (cid:17) , (64)where the first inequality is from the assumption that |{ i : x i < }| ≤ k and the second in-equality is from Fact 20.For the second summation in Eq. (62), we define˜ H i,j = ( H i,j g ( x j ) , if x j ≥ , otherwise . Then k ˜ H k ≤ k H k as g ( x i ) ≥ if x i ≥
0. It is easy to verify that the second summation inEq. (62) is equal to (cid:12)(cid:12)(cid:12)(cid:12) √ π G ( x ) Tr D (cid:16) e − X / (cid:17) [ A, A T ] ˜ H (cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ log k √ π G ( x ) k He − X / k k H k ≤ O (cid:16) ∆ · log . k k H k (cid:17) . where the first inequality is from Lemma 56 and the second inequality is from Fact 20.Finally, the second summation in Eq. (61) can be upper bounded using the verbatim samearguments by O (cid:16) ∆ · log . k · k H k (cid:17) . This proves the lemma statement. Lemma 58. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi ,xi >xi h i i−h i i [ i − i ] − h i i−h i i [ i − i ] [ i − i ] h i i ′ h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O (cid:16) ∆ · log . k · k H k (cid:17) Proof. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi ,xi >xi h i i−h i i [ i − i ] − h i i−h i i [ i − i ] [ i − i ] h i i ′ h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi ,xi >xi ≥ ( · · · ) + X i = i = i xi >xi ,xi > ,xi < ( · · · ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (65)56o upper bound the first summation in Eq. (65), we apply Fact 3 and upper bound the firstsummation by O X i = i = i xi >xi ,xi >xi ≥ (cid:12)(cid:12) g ′′ ( ξ i ,i ,i ) g ′ ( x i ) G (cid:0) x −{ i ,i } (cid:1) H i ,i H i ,i H i ,i (cid:12)(cid:12) ≤ O (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ∆ · X i = i = i xi >xi ,xi >xi ≥ g ′ ( x i ) g ′ ( x i ) G (cid:0) x −{ i ,i } (cid:1) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O k G (2) ( x ) k max i ,i X i | H i ,i H i ,i H i ,i | ! ≤ O (cid:16) ∆ · log k · k H k (cid:17) where the last inequality is from Fact 20 and Eq. (35). Note that | g ′′ ( ξ ) | ≤ ∆ for any ξ ∈ [ x i , max { x i , x i } ] by Eq. 
(11). Applying Fact 3, the second summation in Eq. (65) is upperbounded by O ∆ X i : x i < X i ,i g ′ ( x i ) G (cid:0) x −{ i ,i } (cid:1) | H i ,i H i ,i H i ,i | ≤ O ∆ · log k · max i k G (1) ( x − i ) k · max i X i | H i ,i H i ,i H i ,i | ! ≤ O (cid:16) ∆ · log . k · k H k (cid:17) where the first inequality is from the assumption that |{ i : x i < }| ≤ k and the second in-equality is from Fact 20 and Eq. (35). A.3 Case 2.3
We now bound the second case of Remark 1 when x i > x i > x i . Recall that the goal is to upperbound the following lemma. Lemma 59. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi h i ih i i ′ −h i ih i i ′ [ i − i ] − h i ih i i ′ −h i ih i i ′ [ i − i ] [ i − i ] h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O (cid:16) ∆ · log . k · k H k (cid:17) Remark 2.
Combining with Remark 1, it suffices to upper bound (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi h i i ′ −h i i ′ [ i − i ] h i i − h i i ′ −h i i ′ [ i − i ] h i i [ i − i ] h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) roof of Lemma 59. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi h i ih i i ′ −h i ih i i ′ [ i − i ] − h i ih i i ′ −h i ih i i ′ [ i − i ] [ i − i ] h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ X i = i = i xi >xi >xi (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) h i ih i i ′ −h i ih i i ′ [ i − i ] − h i ih i i ′ −h i ih i i ′ [ i − i ] [ i − i ] h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + X i = i = i xi >xi >xi (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) h i ih i i ′ −h i ih i i ′ [ i − i ] − h i ih i i ′ −h i ih i i ′ [ i − i ] [ i − i ] h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = X i = i = i xi >xi >xi (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) h i i−h i i [ i − i ] − h i i−h i i [ i − i ] [ i − i ] h i i ′ h i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + X i = i = i xi >xi >xi (cid:12)(cid:12)(cid:12)(cid:12) h i i − h i i [ i − i ] · h i i ′ − h i i ′ [ i − i ] · h i i (cid:12)(cid:12)(cid:12)(cid:12) (66)The first term is upper bounded by O (cid:16) ∆ · log . 
k · k H k (cid:17) using the same argument in Lemma 58.The second term can be rephrased as (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi g ( x i ) − g ( x i ) x i − x i · g ′ ( x i ) − g ′ ( x i ) x i − x i g ( x i ) G ( x − i ) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi ,xi ≥ ( · · · ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi ,xi < ( · · · ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (67)For the first summation in Eq. (67), we apply the mean value theorem for both g and g ′ . FromEq. (11) it is upper bounded by X i = i = i xi >xi >xi ,xi ≥ ∆ (cid:12)(cid:12) g ′ ( x i ) g ′ ( x i ) g ( x i ) G ( x − i ) H i ,i H i ,i H i ,i (cid:12)(cid:12) ≤ O (cid:16) ∆ · k G (2) ( x ) k k H k (cid:17) ≤ O (cid:16) ∆ · log k · k H k (cid:17) . For the second term in Eq. (67), it is not hard to verify that (cid:12)(cid:12)(cid:12)(cid:12) g ′ ( x i ) − g ′ ( x i ) x i − x i (cid:12)(cid:12)(cid:12)(cid:12) ≤ ∆ max (cid:8) g ′ ( x i ) , g ( x i ) (cid:9) (68)Further notice that | g ′ ( · ) | ≤
1. Applying the mean value theorem to g , we upper bound the secondsummation in 67 by O ∆ X i = i = i xi >xi >xi ,xi < max (cid:8) g ′ ( x i ) , g ′ ( x i ) (cid:9) g ( x i ) G ( x − i ) | H i ,i H i ,i H i ,i | ≤ O ∆ · log k · max i ·k G (1) ( x − i ) k · max i X i | H i ,i H i ,i H i ,i | ! ≤ O (cid:16) ∆ · log . k · k H k (cid:17) |{ i : x i < }| ≤ k and the second in-equality is from Fact 20 Eq. (35). A.4 Case 2.4
In this section, we want to upper bound Remark 2. To do so we write it as the sum of two terms (the first one is easy to bound).
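Throughout these sections, divided differences such as $\frac{h_{i_1}-h_{i_2}}{[i_1-i_2]}$ are controlled via the mean value theorem: a first divided difference of $g$ is bounded by $\sup|g'|$, and a second divided difference by $\frac{1}{2}\sup|g''|$. A small numerical illustration with $\tanh$ as a stand-in smooth function (the paper's mollifier $g$ is a different function; this only checks the generic calculus facts being invoked):

```python
import numpy as np

# Illustrative check of the mean-value-theorem bounds used for divided differences.
# g = tanh is a stand-in; the mollifier g in the paper is a different function.
rng = np.random.default_rng(1)
g = np.tanh
SUP_G1 = 1.0                          # sup |tanh'(x)| = 1, attained at x = 0
SUP_G2 = 4.0 / (3.0 * np.sqrt(3.0))   # sup |tanh''(x)| = 4 / (3*sqrt(3))

def dd1(a, b):
    """First divided difference g[a, b] = (g(a) - g(b)) / (a - b)."""
    return (g(a) - g(b)) / (a - b)

for _ in range(1000):
    x1, x2, x3 = rng.standard_normal(3) * 3
    if min(abs(x1 - x2), abs(x1 - x3), abs(x2 - x3)) < 1e-3:
        continue  # skip near-collisions to avoid floating-point cancellation
    # |g[a, b]| <= sup|g'| by the mean value theorem
    assert abs(dd1(x1, x2)) <= SUP_G1 + 1e-9
    # second divided difference: (g[x1,x2] - g[x1,x3]) / (x2 - x3) = g''(xi) / 2
    dd2 = (dd1(x1, x2) - dd1(x1, x3)) / (x2 - x3)
    assert abs(dd2) <= SUP_G2 / 2 + 1e-6

print("divided-difference bounds hold on random triples")
```

The second assertion is exactly the $|g''(\xi)| \le \Delta$ step used in Lemmas 58 and 59, with $\tanh$'s explicit derivative bounds standing in for $\Delta$.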
Lemma 60.
$$\Bigg|\sum_{\substack{i_1\neq i_2\neq i_3\\ x_{i_2}>x_{i_1}>x_{i_3}}} \frac{h'_{i_1}-h'_{i_2}}{[i_1-i_2]}\cdot\frac{h_{i_2}-h_{i_3}}{[i_2-i_3]}\cdot h_{i_3}\Bigg| \le O\big(\Delta\cdot\log k\cdot\|H\|^3\big).$$

Proof of Lemma 60.
We split the summation into the two cases $x_{i_1} \ge 0$ and $x_{i_1} < 0$. For the case that $x_{i_1} \ge 0$, we apply the mean value theorem to $g(\cdot)$ and Eq. (68); it is upper bounded by $O\big(\Delta\cdot\log k\cdot\|H\|^3\big)$. For the case that $x_{i_1} < 0$, we have $x_{i_3} < 0$. Note that $|g'(\cdot)| \le$
1. Thus it isupper bounded by O X i = i xi Lemma 61. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i xi >xi >xi h i i ′ −h i i ′ [ i − i ] − h i i ′ −h i i ′ [ i − i ] [ i − i ] h i ih i i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O (cid:16)p log k · k H k (cid:17) . Before we prove this lemma, we first prove a “simpler” proposition which will be crucial inupper bound the above. Proposition 62. (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i h i i ′ −h i i ′ [ i − i ] − h i i ′ −h i i ′ [ i − i ] [ i − i ] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ O (cid:16) ∆ · p log k · k H k (cid:17) Proof. Using Fact 10, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i g ′ ( x i ) − g ′ ( x i ) x i − x i − g ′ ( x i ) − g ′ ( x i ) x i − x i x i − x i G ( x ) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = 1 √ π (cid:12)(cid:12)(cid:12)(cid:12) Tr (cid:20) D (cid:18) e − X (cid:19) [ H, H ] · H (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) G ( x ) , X = diag ( x , . . . , x n ). Using Lemma 12, it suffices to upper bound G ( x ) (cid:12)(cid:12)(cid:12)(cid:12) Tr (cid:20) e − uX ( XH + HX ) e − v (1 − u ) X ( XH + HX ) e − (1 − v )(1 − u ) X H (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) (69)and G ( x ) (cid:12)(cid:12)(cid:12)(cid:12) Tr (cid:20) e ( u − X H e − uX H (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) (70)Note that u + v (1 − u ) + (1 − v ) (1 − u )=1. At least two of these three quantities are at most .We upper bound Eq. 69 in the following three cases.If u ≤ and (1 − u ) (1 − v ) ≤ , using Claim 53 and Lemma 54, Eq. (69) is upper bounded by (cid:13)(cid:13) ( XH + HX ) e − X (cid:13)(cid:13) k H k ≤ ∆ k He − X / k · k H k If u ≤ and v (1 − u ) ≤ , then the Eq. (69) is upper bounded by k ( XH + HX ) e − X k · k He − X k k XH + HX k ≤ k He − X / k · k XH + HX k≤ k He − X / k · k H k . where the second last inequality is by Claim 53. 
The case that u (1 − v ) ≤ and v (1 − u ) ≤ follows similarly.Also Eq. (70) can be upper bounded with similar arguments. Thus G ( x ) · (cid:12)(cid:12)(cid:12) Tr e ( u − X / H e − uX / H (cid:12)(cid:12)(cid:12) ≤ G ( x ) k H k · k He − X / k . (71)Therefore, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i g ′ ( x i ) − g ′ ( x i ) x i − x i − g ′ ( x i ) − g ′ ( x i ) x i − x i x i − x i G ( x ) H i ,i H i ,i H i ,i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (72) ≤ (cid:18) G ( x ) · Tr (cid:20) D (cid:18) e − X (cid:19) [ H, H ] · H (cid:21)(cid:19) (73) ≤ O (cid:16) ∆ G ( x ) k H kk He − X / k (cid:17) = O ∆ X i ,i e − x i H i ,i G ( x ) · k H k ≤ O ∆ X i ,i g ′ ( x i ) G ( x ) H i ,i · k H k ≤ O ∆ X i ,i g ′ ( x i ) G ( x − i ) H i ,i · k H k ≤ O ∆ k G (1) ( x ) k · max i X i H i ,i ! · k H k ! ≤ O (cid:18) ∆ k G (1) ( x ) k · max i (cid:0) H (cid:1) i ,i · k H k (cid:19) ≤ O (cid:16) ∆ · p log k · k H k (cid:17) , (74)60here the second inequality used e − x i / ≤ 1, third inequality used g ( x ) ∈ [0 , 1] and the lastinequality is from Fact 20.We are now ready to prove the main lemma. Note that end of the day we need to bound thecase (in Remark 2) when the quantity in Lemma 60 contains G ( x − i ) instead of G ( x −{ i ,i ,i } ).Observe that the inequality in Lemma 61 can be written as (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = i = i i
By the paragraph above, proving this lemma is equivalent to proving Eq. (75).For the first summation above, let (cid:0) A i (cid:1) i,j = ( H i,i , if j = i and i > i , otherwise . The left hand side of the claim statement can be expressed as (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) √ π X i G ( x − i ) (cid:16) Tr D (cid:16) e − X / (cid:17) h A i , (cid:0) A i (cid:1) T i H (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) √ π X i : x i < ( · · · ) + 1 √ π X i : x i ≥ ( · · · ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (76)Using the same arguments in Lemma 62 and the fact that k A i e − X / k ≤ k He − X / k , k A i k ≤k H k , we can upper bound the first summation in Eq. (76) by1 √ π X i : x i < G ( x − i ) k He − X / k · k H k ≤ O (cid:16) ∆ · log . k k H k (cid:17) , where the inequality follows from the argument in Eq. (64).For the second summation, define B i ,i = H i ,i q g ( x i ) , if x i ≥ , otherwise . Note that k Be − X / k ≤ √ k He − X / k and k B k ≤ √ k H k (since g ( x ) ≥ / x ≥ (cid:12)(cid:12)(cid:12)(cid:12) G ( x ) 1 √ π Tr D (cid:16) e − X / (cid:17) (cid:2) B, B T (cid:3) A (cid:12)(cid:12)(cid:12)(cid:12) ≤ √ π G ( x ) k He − X / k k H k ≤ O (cid:16) ∆ · log . k · k H k (cid:17)(cid:17)
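Several of the bounds above go through Lemma 55-type rearrangement facts. The proof of Lemma 55 pairs the $(i,j)$ and $(j,i)$ terms, which is where symmetry enters; under that reading of the (garbled) statement — diagonal $A, B$ with nonnegative entries sorted in the same order and a symmetric $H$ — the Frobenius-norm bound $\|AHB\| \le \|HAB\|$ can be sanity-checked numerically:

```python
import numpy as np

# Sanity check (under the stated assumptions): for diagonal A, B with nonnegative
# entries sorted in the same order and symmetric H, ||A H B||_F <= ||H A B||_F.
# Key identity: ||HAB||^2 - ||AHB||^2
#             = (1/2) * sum_{i,j} H_ij^2 * (a_i^2 - a_j^2) * (b_i^2 - b_j^2) >= 0.
rng = np.random.default_rng(2)
k = 10

for _ in range(500):
    a = np.sort(rng.random(k))[::-1]   # decreasing, nonnegative
    b = np.sort(rng.random(k))[::-1]   # decreasing, nonnegative (same ordering as a)
    A, B = np.diag(a), np.diag(b)
    S = rng.standard_normal((k, k))
    H = S + S.T                        # symmetric: the proof pairs (i,j) with (j,i)
    lhs = np.linalg.norm(A @ H @ B, 'fro')
    rhs = np.linalg.norm(H @ A @ B, 'fro')
    assert lhs <= rhs + 1e-9

print("Lemma 55-type bound holds on 500 random symmetric instances")
```

The same-ordering hypothesis is what makes both factors $(a_i^2 - a_j^2)$ and $(b_i^2 - b_j^2)$ share a sign; in the proofs above it is satisfied because $A$ and $B$ are both decreasing functions of the same diagonal $X^2$.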