Lower bounds for the smallest singular value of structured random matrices

arXiv:1608.07347 [math.PR]. Submitted to the Annals of Probability.
By Nicholas Cook ∗ University of California, Los Angeles
We obtain lower tail estimates for the smallest singular value of random matrices with independent but non-identically distributed entries. Specifically, we consider n × n matrices with complex entries of the form M = A ◦ X + B = (a_ij ξ_ij + b_ij), where X = (ξ_ij) has iid centered entries of unit variance and A and B are fixed matrices. In our main result we obtain polynomial bounds on the smallest singular value of M for the case that A has bounded (possibly zero) entries, and B = Z√n where Z is a diagonal matrix with entries bounded away from zero. As a byproduct of our methods we can also handle general perturbations B under additional hypotheses on A, which translate to connectivity hypotheses on an associated graph. In particular, we extend a result of Rudelson and Zeitouni for Gaussian matrices to allow for general entry distributions satisfying some moment hypotheses. Our proofs make use of tools which (to our knowledge) were previously unexploited in random matrix theory, in particular Szemerédi's Regularity Lemma, and a version of the Restricted Invertibility Theorem due to Spielman and Srivastava.
1. Introduction.
Throughout the article we make use of the following standard asymptotic notation: f = O(g), f ≪ g, g ≫ f all mean that |f| ≤ Cg for some absolute constant C < ∞. We indicate dependence of the implied constant on parameters with subscripts, e.g. f ≪_α g. C, c, c′, c₀, etc. denote unspecified constants whose value may be different at each occurrence, and are understood to be absolute if no dependence on parameters is indicated.

1.1. Background.
Recall that the singular values of an n × n matrix M with complex entries are the eigenvalues of √(M*M), which we arrange in non-increasing order: ‖M‖ = s_1(M) ≥ ··· ≥ s_n(M) ≥ 0.

∗ Partially supported by NSF postdoctoral fellowship DMS-1266164.
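These basic facts can be checked numerically. The sketch below (illustrative only, not part of the formal development) verifies that the singular values computed as eigenvalues of √(M*M) agree with the SVD, that s_1(M) is the operator norm, and that s_n(M) = ‖M⁻¹‖⁻¹ for an invertible sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Singular values as eigenvalues of sqrt(M* M), arranged in non-increasing order.
eigs = np.linalg.eigvalsh(M.conj().T @ M)        # eigenvalues of M* M, ascending
s_from_eigs = np.sqrt(eigs)[::-1]                # s_1 >= ... >= s_n
s_from_svd = np.linalg.svd(M, compute_uv=False)  # also non-increasing

assert np.allclose(s_from_eigs, s_from_svd)

# s_1(M) is the operator norm, and s_n(M) = ||M^{-1}||^{-1} when M is invertible.
assert np.isclose(s_from_svd[0], np.linalg.norm(M, 2))
assert np.isclose(s_from_svd[-1], 1 / np.linalg.norm(np.linalg.inv(M), 2))
```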
MSC 2010 subject classifications: Primary 60B20; secondary 15B52.
Keywords and phrases: Random matrices, condition number, regularity lemma, metric entropy.

(Throughout we write ‖·‖ for the ℓ_2^n → ℓ_2^n operator norm.) M is invertible if and only if s_n(M) > 0, in which case s_n(M) = ‖M⁻¹‖⁻¹. We (informally) say that M is "well-invertible" if s_n(M) is well-separated from zero.

The largest and smallest singular values of random matrices with independent entries have been intensely studied, in part due to applications in theoretical computer science. Motivated by their work on the first electronic computers, von Neumann and Goldstine sought upper bounds on the condition number κ(M) = s_1(M)/s_n(M) of a large matrix M with iid entries [43]. More recently, bounds on the condition number of non-centered random matrices have been important in the theory of smoothed analysis of algorithms developed by Spielman and Teng [31]. The smallest singular value has also received attention due to its connection with proving convergence of the empirical spectral distribution – see [6, 36].

Much is known about the largest singular value for random matrices with independent entries. First we review the iid case: we denote by X = X_n an n × n matrix whose entries ξ_ij are iid copies of a centered complex random variable with unit variance, and refer to such X as an "iid matrix". From the works [4, 44] it is known that n^{−1/2} s_1(X_n) ∈ (2 − ε, 2 + ε) with probability tending to one as n → ∞ for any fixed ε > 0. In connection with problems in computer science and the theory of Banach spaces there has been considerable interest in obtaining non-asymptotic bounds for matrices with independent but non-identically distributed entries; see the recent works [5] and [41] and references therein for an overview.

The picture is far less complete for the smallest singular value of random matrices; however, recent years have seen much progress for the case of the iid matrix X. The limiting distribution of √n s_n(X) was obtained by Edelman for the case of Gaussian entries [11], and this law was shown by Tao and Vu to hold for all iid matrices with entries ξ_ij having a sufficiently large finite moment [35].

Quantitative lower tail estimates for s_n(X) proved to be considerably more challenging than bounding the operator norm. The first breakthrough was made by Rudelson [27], who showed that if X has iid real-valued sub-Gaussian entries, that is,

(1.1) E exp(|ξ|²/K²) ≤ K < ∞,

then

(1.2) P( s_n(X) ≤ t n^{−1/2} ) ≪_K t + n^{−1/2} for all t ≥ 0.

Around the same time, in [37] Tao and Vu used methods from additive combinatorics to obtain bounds of the form

(1.3) P( s_n(X) ≤ n^{−β} ) ≪ n^{−α}

for any fixed α > 0 and β sufficiently large depending on α, for the case that the entries of X take values in {−1, 0, 1}. Roughly speaking, their approach was to classify potential almost-null vectors v according to the amount of additive structure present in the multi-set of coordinate values {v_j}_{j=1}^n. They extended (1.3) to uncentered matrices with general entry distributions having finite second moment in [36] (see Theorem 1.6 below), which was instrumental for their proof of the celebrated circular law for the limiting spectral distribution of n^{−1/2} X.

Motivated by these developments, in [28] Rudelson and Vershynin found a different way to quantify the additive structure of a vector v called the essential least common denominator, and obtained the following improvement of (1.2), (1.3) for matrices with sub-Gaussian entries:

(1.4) P( s_n(X) ≤ t n^{−1/2} ) ≪_K t + e^{−cn}.

This estimate is optimal up to the implied constant and c = c(K) > 0 (with K as in (1.1)).

Finally, we mention that there has also been work on upper tail bounds for the smallest singular value – see in particular [24, 29] – but we do not consider this problem further in the present work.

1.2. A general class of non-iid matrices.
In this paper we are concerned with bounds for the smallest singular value of random matrices with independent but non-identically distributed entries. The following definition allows us to quantify the dependence of our bounds on the distribution of the matrix entries.
Definition 1.1. Let ξ be a complex random variable and let κ ≥ 1. We say that ξ is κ-spread if

(1.5) Var[ ξ 1(|ξ − Eξ| ≤ κ) ] ≥ κ⁻².

Remark 1.2. It follows from the monotone convergence theorem that any random variable ξ with non-zero second moment is κ-spread for some κ < ∞. Furthermore, if ξ is centered with unit variance and finite p-th moment µ_p for some p > 2, then it is routine to verify that ξ is κ-spread with κ = 3(3µ_p^p)^{1/(p−2)}, say.
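For simple discrete distributions the κ-spread condition can be checked exactly. A minimal sketch (assuming the reconstructed right-hand side κ⁻² in (1.5), and restricting to real-valued ξ for simplicity; the helper name is ours):

```python
import numpy as np

def is_kappa_spread(values, probs, kappa):
    """Check the kappa-spread condition (1.5) for a discrete real variable:
    Var[ xi * 1(|xi - E xi| <= kappa) ] >= kappa^{-2}."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    mean = np.sum(values * probs)
    t = np.where(np.abs(values - mean) <= kappa, values, 0.0)  # xi * indicator
    var = np.sum(t**2 * probs) - np.sum(t * probs) ** 2
    return var >= 1.0 / kappa**2

# A Rademacher variable (+1/-1 with probability 1/2 each) is 1-spread:
# the truncation at kappa = 1 keeps all mass and Var = 1 >= 1.
assert is_kappa_spread([-1.0, 1.0], [0.5, 0.5], kappa=1.0)
# A constant variable is never kappa-spread: the truncated variance is 0.
assert not is_kappa_spread([1.0], [1.0], kappa=10.0)
```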
Our results concern the following general class of matrices:
Definition 1.3. Let A = (a_ij) and B = (b_ij) be deterministic n × m matrices with a_ij ∈ [0, 1] and b_ij ∈ C for all i, j. Let X = (ξ_ij) be an n × m matrix with independent entries, all identically distributed to a complex random variable ξ with mean zero and variance one. Put

(1.6) M = A ◦ X + B = (a_ij ξ_ij + b_ij)

where ◦ denotes the matrix Hadamard product. We refer to A, B and ξ as the standard deviation profile, mean profile and atom variable, respectively. We denote the L^p norm of the atom variable by

(1.7) µ_p := (E|ξ|^p)^{1/p}.

Without loss of generality, we assume throughout that ξ is κ₀-spread for some fixed κ₀ ≥ 1.

Remark 1.4. The assumption that the entries of M are shifted scalings of random variables ξ_ij having a common distribution is made for convenience, as it allows us to access some standard anti-concentration estimates (see Section 2.2). We expect the proofs can be modified to cover general matrices with independent entries having specified means and variances (possibly with additional moment hypotheses), but we do not pursue this here. As a concrete example one can consider a centered non-Hermitian band matrix, where one sets a_ij ≡ 0 for |i − j| exceeding some bandwidth parameter w ∈ [n − 1] – see Corollary 1.16.

The singular value distributions for structured random matrices have been studied in connection with wireless MIMO networks [13, 40]. The limiting spectral distributions and spectral radius for certain structured random matrices have been used to model the dynamical properties of neural networks [1, 25]. In the recent work [10] with Hachem, Najim and Renfrew, the limiting spectral distribution was determined for a general class of centered structured random matrices. That work required bounds on the smallest singular value for shifts of centered matrices by scalar multiples of the identity, which was the original motivation for the results in this paper (in particular, Corollary 1.21 below is a key input for the proofs in [10]).

The picture for the smallest singular value of structured random matrices is far less complete than for the largest singular value. Here we content ourselves with identifying sufficient conditions on the matrices A, B and the distribution of ξ for a structured random matrix M to be well-invertible with high probability. Specifically, we seek to address the following:

Question A.
Let M be an n × n random matrix as in Definition 1.3. Under what assumptions on the standard deviation and mean profiles A, B and the distribution of the atom variable ξ do we have

(1.8) P( s_n(M) ≤ n^{−β} ) = O(n^{−α})

for some constants α, β > 0?

The case B = −z√n I for some fixed z ∈ C (where I denotes the n × n identity matrix) is of particular interest for applications to the limiting spectral distribution of centered random matrices. As we shall see in the next subsection, existing results in the literature give lower tail bounds for s_n(M) that are uniform in the shift B under the size constraint ‖B‖ = n^{O(1)}, i.e.

(1.9) sup_{B ∈ M_n(C): ‖B‖ ≤ n^C} P( s_n(A ◦ X + B) ≤ n^{−β} ) = O(n^{−α})

for some constant C > 0 (in particular this includes the case ‖B‖ = O(√n)). Such bounds can be viewed as matrix analogues of classical anti-concentration (or "small ball") bounds of the form

(1.10) sup_{z ∈ C} P( |S_n − z| ≤ r ) ≤ f(r) + o(1)

for a sequence of scalar random variables S_n (such as the normalized partial sums of an infinite sequence of iid variables), where f: R₊ → R₊ is some continuous function such that f(r) → 0 as r → 0. In fact, bounds of the form (1.10) are a central ingredient in the proofs of estimates (1.9). Roughly speaking, the translation invariance of (1.10) causes the uniformity in the shift B in (1.9) to come for free once one can handle the centered case B = 0 (the assumption ‖B‖ = n^{O(1)} is needed to have some continuity of the map u ↦ ‖Mu‖ on the unit sphere in order to apply a discretization argument). In light of this we may pose the following:

Question B. Let M be an n × n random matrix as in Definition 1.3, and let γ > 0. Under what assumptions on the standard deviation profile A and the distribution of the atom variable ξ do we have

(1.11) sup_{B ∈ M_n(C): ‖B‖ ≤ n^γ} P( s_n(M) ≤ n^{−β} ) = O(n^{−α})

for some constants α, β > 0?

First we note a general class of profiles A for which we cannot expect to have (1.11).

Observation 1.5. Suppose that A = (a_ij) has a k × m submatrix of zeros for some k, m with k + m > n. Then A ◦ X is singular with probability 1. Thus, (1.11) fails (by taking B = 0) for any fixed α, β > 0.

Theorem 1.12 below (see also Theorem 1.10 for the Gaussian case) shows that the above is in some sense the only obstruction to obtaining (1.11).

1.3. Previous results.
Before stating our main results on Questions A and B we give an overview of what is currently in the literature. For the case of a constant standard deviation profile A and essentially arbitrary mean profile B we have the following result of Tao and Vu:

Theorem 1.6 ([36]). Let X be an n × n matrix with iid entries ξ_ij ∈ C having mean zero and variance one. For any α, γ > 0 there exists β > 0 such that for any fixed (deterministic) n × n matrix B with ‖B‖ ≤ n^γ,

(1.12) P( s_n(X + B) ≤ n^{−β} ) = O_{α,γ}(n^{−α}).

A stronger version of the above bound was established earlier by Sankar, Spielman and Teng for the case that X has iid standard Gaussian entries [31]. For the case that B = 0, the bound (1.4) of Rudelson and Vershynin gives the optimal dependence β = α + 1/2 (and in fact allows an arbitrary fixed B with ‖B‖ = O(√n)). Recently, the sub-Gaussian assumption for (1.4) was relaxed by Rebrova and Tikhomirov to only assume a finite second moment [26].

When the entries of M have bounded density the problem is much simpler. The following is easily obtained by the argument in [6, Section 4.4].

Proposition 1.7. Let M be an n × n random matrix with independent entries having density on C or R uniformly bounded by ϕ < ∞. For every α > 0 there is a β = β(α, ϕ) > 0 such that

(1.13) P( s_n(M) ≤ n^{−β} ) = O(n^{−α}).

Note that above we make no assumptions on the moments of the entries of M – in particular, they may have heavy tails. The following result of Bordenave and Chafaï (Lemma A.1 in [6]) relaxes the hypothesis of continuous distributions from Proposition 1.7 while still allowing for heavy tails, but comes at the cost of a worse probability bound.

Proposition 1.8. Let Y be an n × n random matrix with independent entries η_ij ∈ C. Suppose that for some p, r, σ > 0 we have that for all i, j ∈ [n],

(1.14) P( |η_ij| ≤ r ) ≥ p,  Var( η_ij 1(|η_ij| ≤ r) ) ≥ σ².

For any s ≥ 1, t ≥ 0, and any fixed n × n matrix B we have

(1.15) P( s_n(Y + B) ≤ t/√n, ‖Y + B‖ ≤ s ) ≪_{p,r,σ} √(log s) ( ts + 1/√n ).

The non-degeneracy conditions (1.14) do not allow for some entries to be deterministic. Litvak and Rivasplata [22] obtained a lower tail estimate of the form (1.8) for centered random matrices having a sufficiently small constant proportion of entries equal to zero deterministically. Below we give new results (Theorems 1.12 and 1.24) allowing all but an arbitrarily small (fixed) proportion of entries to be deterministic.

Finally, we recall a theorem of Rudelson and Zeitouni [30] for Gaussian matrices, showing that Observation 1.5 is essentially the only obstruction to obtaining (1.11). To state their result we need to set up some graph theoretic notation, which will be used repeatedly throughout the paper.

To a non-negative n × m matrix A = (a_ij) we associate a bipartite graph Γ_A = ([n], [m], E_A), with (i, j) ∈ E_A if and only if a_ij > 0. For a row index i ∈ [n] we denote by

(1.16) N_A(i) = { j ∈ [m] : a_ij > 0 }

its neighborhood in Γ_A. Thus, the neighborhood of a column index j ∈ [m] is denoted N_{A^T}(j). Given sets of row and column indices I ⊂ [n], J ⊂ [m], we define the associated edge count

(1.17) e_A(I, J) := |{ (i, j) ∈ I × J : a_ij > 0 }|.

We will generally work with the graph that only puts an edge (i, j) when a_ij exceeds some fixed cutoff parameter σ₀ > 0. Thus, we denote by

(1.18) A^{(σ)} = ( a_ij 1(a_ij ≥ σ) )

the matrix which thresholds out entries smaller than σ. Rudelson and Zeitouni work with Gaussian matrices whose matrix of standard deviations A = (a_ij) satisfies the following expansion-type condition.

Definition 1.9. Let A = (a_ij) be an n × m matrix with non-negative entries. For I ⊂ [n] and δ ∈ (0, 1), define the set of δ-broadly connected neighbors of I as

(1.19) N_A^{(δ)}(I) = { j ∈ [m] : |N_{A^T}(j) ∩ I| ≥ δ|I| }.

For δ, ν ∈ (0, 1), we say that A is (δ, ν)-broadly connected if

(1) |N_A(i)| ≥ δm for all i ∈ [n];
(2) |N_{A^T}(j)| ≥ δn for all j ∈ [m];
(3) |N_{A^T}^{(δ)}(J)| ≥ min( n, (1 + ν)|J| ) for all J ⊂ [m].

Theorem 1.10 ([30]). Let G be an n × n matrix with iid standard real Gaussian entries, and let A be an n × n matrix with entries a_ij ∈ [0, 1] for all i, j. With notation as in (1.18), assume that A^{(σ₀)} is (δ, ν)-broadly connected for some σ₀, δ, ν ∈ (0, 1). Let K ≥ 1, and let B be a fixed n × n matrix with ‖B‖ ≤ K√n. Then for any t ≥ 0,

(1.20) P( s_n(A ◦ G + B) ≤ t n^{−1/2} ) ≪_{δ,ν,σ₀} K^{O(1)} t + e^{−cn}

for some c = c(δ, ν, σ₀) > 0.

Note that the assumption of broad connectivity gives us an "epsilon of separation" from the bad example of Observation 1.5. Thus, Theorem 1.10 provides a near-optimal answer to Question B for Gaussian matrices.
Remark 1.11. Since the dependence of the bound (1.20) on the parameters δ and ν is not quantified, Theorem 1.10 only addresses Question B for dense standard deviation profiles, i.e. when A has a non-vanishing proportion of large entries. While it would not be difficult to quantify the steps in [30], the resulting dependence on parameters is not likely to be optimal.

1.4. New results.
Our first result removes the Gaussian assumption from Theorem 1.10, though at the cost of a worse probability bound. Recall the parameter κ₀ from Definition 1.3.

Theorem 1.12. Let M = A ◦ X + B be an n × n matrix as in Definition 1.3, and assume that A^{(σ₀)} is (δ, ν)-broadly connected for some σ₀, δ, ν ∈ (0, 1). Let K ≥ 1. For any t ≥ 0,

(1.21) P( s_n(M) ≤ t/√n, ‖M‖ ≤ K√n ) ≪_{K,δ,ν,σ₀,κ₀} t + 1/√n.

Remark 1.13. While we have stated no moment assumptions on the atom variable ξ over the standing assumption of unit variance, the restriction to the event {‖M‖ ≤ K√n} requires us to assume at least four finite moments to deduce P( s_n(M) ≤ t/√n ) ≪ t + o(1). Here we give a lower tail estimate at the optimal scale s_n(M) ∼ n^{−1/2}; however, the arguments in this paper can be used to establish a polynomial lower bound on s_n(M) of non-optimal order under larger perturbations B (similar to (1.28) below).

Remark 1.14. We expect that the probability bound in (1.21) can be improved by making use of more advanced tools of Littlewood–Offord theory introduced in [28, 36], though it appears these tools cannot be applied in a straightforward manner. In the interest of keeping the paper of reasonable length we do not pursue this here.

Remark 1.15. The methods used to prove Theorem 1.12 together with an idea of Tao and Vu from [38] can be used to give lower bounds of optimal order on s_{n−k}(M) with n^ε ≤ k ≤ cn for any ε > 0 and some c = c(κ₀, σ₀, δ, ν, K) > 0; see [9, Theorem 4.5.1]. Such bounds are of interest for proving convergence of the empirical spectral distribution; see [6, 38].

In light of Observation 1.5, Theorem 1.12 gives an essentially optimal answer to Question B for dense random matrices (see Remark 1.11). It would be interesting to establish a version of this result that allows for only a proportion o(1) of the entries to be random. Indeed, we expect a version of the above theorem to hold when A has density as small as (log^{O(1)} n)/n. (Quantifying the dependence on δ, ν in (1.21) would only allow a slight polynomial decay in the density.)

We note that the broad connectivity hypothesis includes many standard deviation profiles of interest, such as band matrices:
Corollary 1.16. Let M = A ◦ X + B be an n × n matrix as in Definition 1.3, and assume that for some fixed σ₀, ε₀ ∈ (0, 1), a_ij ≥ σ₀ for all i, j with min( |i − j|, n − |i − j| ) ≤ ε₀ n. Let K ≥ 1. Then (1.21) holds for any t ≥ 0 (with implied constant depending on K, σ₀, ε₀ and κ₀). We defer the proof to Appendix A.
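The band condition of the corollary is easy to instantiate. The sketch below builds the wrap-around band profile with σ₀ = 1 and ε₀ = 0.1 and draws one sample of M (the particular shift B = (1/2)√n I is an arbitrary choice for illustration, not taken from the paper):

```python
import numpy as np

def band_profile(n, eps, sigma0=1.0):
    """Circulant band standard deviation profile: a_ij = sigma0 when
    min(|i-j|, n-|i-j|) <= eps*n (the band wraps around), else 0."""
    i, j = np.indices((n, n))
    d = np.abs(i - j)
    return np.where(np.minimum(d, n - d) <= eps * n, sigma0, 0.0)

rng = np.random.default_rng(2)
n, eps = 50, 0.1
A = band_profile(n, eps)

# Every row and column carries 2*floor(eps*n) + 1 = 11 nonzero entries.
assert (A.sum(axis=0) == 11).all() and (A.sum(axis=1) == 11).all()

# A sample M = A o X + B with a diagonal shift is (numerically) invertible.
M = A * rng.standard_normal((n, n)) + 0.5 * np.sqrt(n) * np.eye(n)
assert np.linalg.svd(M, compute_uv=False)[-1] > 0
```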
Remark 1.17. It is possible to modify our argument for the above corollary to treat a band profile that does not "wrap around", i.e. only enforcing a_ij ≥ σ₀ for i, j with |i − j| ≤ ε₀ n.

Having addressed Question B, we now ask whether we can further relax the assumptions on the standard deviation profile A by assuming more about the mean profile B. In particular, can we make assumptions on B that give (1.8) while allowing A ◦ X to be singular deterministically?

Of course, a trivial example is to take A = 0 and B any invertible matrix. Another easy example is to take B to be very well-invertible, with s_n(B) ≥ K√n for a large constant K > 0 (for instance B = K√n I, where I is the identity matrix). Indeed, standard estimates for the operator norm of random matrices with centered entries (cf. Section 5.2) give ‖A ◦ X‖ = O(√n) with high probability provided the atom variable ξ satisfies some additional moment hypotheses. From the triangle inequality,

s_n(M) = inf_{u ∈ S^{n−1}} ‖(A ◦ X + B)u‖ ≥ s_n(B) − ‖A ◦ X‖,

so s_n(M) ≫ √n with high probability if K is sufficiently large. The problem becomes non-trivial when we allow B to have singular values of size ε√n for small ε > 0, with A as in Observation 1.5, say. In this case any proof of a lower tail estimate of the form (1.8) must depart significantly from the proofs of the results in the previous section by making use of arguments which are not translation invariant.

Our main result shows that when the mean profile B is a diagonal matrix with smallest entry at least an arbitrarily small (fixed) multiple of √n, then we do not need to assume anything further about the standard deviation profile A.

Theorem 1.18. Fix arbitrary r₀ ∈ (0, 1], K₀ ≥ 1, and let Z be a (deterministic) diagonal matrix with diagonal entries z_1, ..., z_n ∈ C satisfying

(1.22) |z_i| ∈ [r₀, K₀] for all i ∈ [n].

Let M be an n × n random matrix as in Definition 1.3 with B = Z√n, and assume µ_{4+η} < ∞ for some fixed η > 0. There are α(η) > 0 and β(r₀, η, µ_{4+η}) > 0 such that

(1.23) P( s_n(M) ≤ n^{−β} ) = O_{r₀,K₀,η,µ_{4+η}}( n^{−α} ).

Remark 1.19. The assumption of 4 + η moments is due to our use of a result of Vershynin, Theorem 5.8 below, on the operator norm of products of random matrices. Apart from this, at many points in our argument we use that an m × m submatrix of M has operator norm O(√m) with high probability (assuming m grows with n), which requires at least four finite moments. Under certain additional assumptions on the standard deviation profile we only need to assume two moments – see Remark 5.12.

Remark 1.20 (Dependence of α, β on parameters). The proof gives α(η) = min(1, η)/2. If we were to assume ξ has finite p-th moment for a sufficiently large constant p then we could take any fixed α < 1/2. The dependence of β on µ_{4+η} and r₀ given by our proof is very bad, of the form

(1.24) β = twr( O_η(1) exp( (µ_{4+η}/r₀)^{O(1)} ) )

where twr(x) is a tower exponential 2^{2^{···}} of height x. (The factor O_η(1) comes from Vershynin's bound mentioned in the previous remark – we do not know the precise dependence on η, but we expect it is relatively mild.) This is due to our use of Szemerédi's regularity lemma (specifically, a version for directed graphs due to Alon and Shapira – see Lemma 5.2). It would be interesting to obtain a version of Theorem 1.18 with a better dependence of β on the parameters.

As we remarked above, the case of a diagonal mean profile is of special interest for the problem of proving convergence of the empirical spectral distribution of centered random matrices with a variance profile.

Corollary 1.21. Let X = (ξ_ij) be an n × n matrix whose entries are iid copies of a centered complex random variable ξ having unit variance and (4 + η)-th moment µ_{4+η} < ∞ for some fixed η > 0. Let A = (a_ij) be a fixed n × n non-negative matrix with entries uniformly bounded by σ_max < ∞. Put Y = n^{−1/2} A ◦ X, and fix an arbitrary z ∈ C \ {0}. There are constants α = α(η) > 0 and β = β(|z|, η, µ_{4+η}, σ_max) > 0 such that

(1.25) P( s_n(Y − zI) ≤ n^{−β} ) = O_{|z|,σ_max,η,µ_{4+η}}( n^{−α} ).
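The reduction from the setting of the corollary to that of Theorem 1.18 is a simple rescaling, which can be sanity-checked numerically (the profile and atom distribution below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, z = 40, 1.0 + 0.5j
A = rng.uniform(0.2, 1.0, size=(n, n))           # a bounded variance profile
X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

Y = A * X / np.sqrt(n)

# Y - zI = n^{-1/2} (A o X + Z sqrt(n)) with diagonal mean profile Z = -zI,
# i.e. the shifted matrix is a rescaling of the model in Theorem 1.18.
M = A * X + (-z) * np.sqrt(n) * np.eye(n)
assert np.allclose(Y - z * np.eye(n), M / np.sqrt(n))

s_min = np.linalg.svd(Y - z * np.eye(n), compute_uv=False)[-1]
assert s_min > 0
```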
While our main motivation was to handle diagonal perturbations of centered random matrices, we conjecture that Theorem 1.18 extends to matrices as in Definition 1.3 with more general mean profiles B:

Conjecture 1.22. Theorem 1.18 continues to hold for B ∈ M_n(C) not necessarily diagonal, where the constraint (1.22) is replaced with n^{−1/2} s_i(B) ∈ [r₀, K₀] for all 1 ≤ i ≤ n.

1.5. Ideas of the proof.
Here we give an informal discussion of the main ideas in the proof of Theorem 1.18.
Regular partitions of graphs. As with Theorem 1.12, the key is to associate the standard deviation profile A with a graph. Since we want the diagonal of M to be preserved under relabeling of vertices, we will associate A with a directed graph (digraph) which puts an edge i → j whenever a_ij exceeds some small threshold σ₀ > 0. Since A has no special connectivity structure a priori, we will apply a version of Szemerédi's regularity lemma for digraphs (Lemma 5.2) to partition the vertex set [n] into a bounded number of parts of equal size I_1, ..., I_m, together with a small set of "bad" vertices I_bad, such that for most pairs (k, l) ∈ [m]² the subgraph on I_k ∪ I_l enjoys certain "pseudorandomness" properties. These properties will not be quite strong enough to control the smallest singular value of the corresponding submatrix M_{I_k,I_l} of M, but we can apply a "cleaning" procedure (as it is called in the extremal combinatorics literature) to remove a small number of bad vertices from each part in the partition (which we add to I_bad), after which we will be able to control s_min(M_{I_k,I_l}) for most (k, l) ∈ [m]². We defer the precise formulation of the pseudorandomness properties and corresponding bound on the smallest singular value to Definition 1.23 and Theorem 1.24 below.

Schur complement formula.
The task will then be to lift this control on the invertibility of submatrices to the whole matrix M. The key tool here is the Schur complement formula (see Lemma 5.4), which allows us to control the smallest singular value of a block matrix

(1.26)  ( M_11  M_12 )
        ( M_21  M_22 )

assuming some control on the smallest singular values of (perturbations of) the diagonal block submatrices M_11, M_22 and on the operator norm of the off-diagonal submatrices M_12, M_21. The control on the smallest singular value of the whole matrix is somewhat degraded, but this is acceptable as we will only apply Lemma 5.4 a bounded number of times. If we can find a generalized diagonal of "good" block submatrices that are well-invertible under additive perturbations, then after permuting the blocks to lie on the main diagonal we can apply the Schur complement bound along a nested sequence of submatrices partitioned as in (1.26), where M_11 is a "good" matrix and M_22 is well-invertible by the induction hypothesis. We remark that the strategy of leveraging properties of a small submatrix using the Schur complement formula was recently applied in a somewhat different manner in [7] to prove the universality of spectral statistics of random Hermitian band matrices.

Decomposition of the reduced digraph.
At this point it is best to think of the regular partition I_1, ..., I_m as inducing a "macroscopic scale" digraph R = ([m], E) (often called the reduced digraph in extremal combinatorics) that puts an edge (k, l) ∈ E whenever the corresponding submatrix A_{I_k,I_l} is pseudorandom and sufficiently dense. If we can cover the vertices of R with vertex-disjoint directed cycles, then we will have found a generalized diagonal of submatrices of M with the desired properties, and we can finish with a bounded number of applications of the Schur complement formula as described above.

Of course, it may be the case that R cannot be covered by disjoint cycles. For instance, if A were to have all ones in its first n/2 columns and all zeros in its last n/2 columns, then the vertices of R corresponding to the last n/2 columns would have no incoming edges. This is where we make crucial use of the diagonal perturbation Z√n (indeed, without this perturbation M would be singular in this example). The top left n/2 × n/2 submatrix of M is dense, and we can apply Theorem 1.24 to control its smallest singular value. The bottom right n/2 × n/2 submatrix is deterministic and diagonal, with entries of modulus at least r₀√n, and hence its smallest singular value is at least r₀√n. This argument even allows for the bottom right submatrix of A to be nonzero but sufficiently sparse: we can use the triangle inequality and standard bounds on the operator norm of sparse random matrices to argue that the smallest singular value of the bottom right submatrix is still of order ≫ r₀√n.

We handle the general case as follows. We greedily cover as many of the vertices of R as we can with disjoint cycles – call this set of vertices U_cyc ⊂ [m]. At this point we have either covered the whole graph (and we are done) or the graph on the remaining vertices U_free is cycle-free. This means that the vertices of R can be relabeled so that its adjacency matrix is upper-triangular on U_free × U_free. Write J_cyc = ⋃_{k ∈ U_cyc} I_k, J_free = ⋃_{k ∈ U_free} I_k, and denote the corresponding diagonal submatrices of A by A_cyc, A_free, and likewise for M.
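The relabeling used for the cycle-free part is just a topological ordering: a digraph with no directed cycles can be reordered so that its adjacency matrix is strictly upper triangular, and since the reordering acts by a simultaneous row/column permutation it moves diagonal entries along the diagonal. A minimal sketch on a made-up digraph:

```python
import numpy as np

def topological_order(adj):
    """Topological ordering of a cycle-free digraph given by a 0/1 adjacency
    matrix (edge i -> j iff adj[i][j] == 1), via repeated removal of vertices
    with no incoming edges (Kahn's algorithm)."""
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    order, ready = [], [j for j in range(n) if indeg[j] == 0]
    while ready:
        v = ready.pop()
        order.append(v)
        for j in range(n):
            if adj[v][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    assert len(order) == n, "digraph contains a directed cycle"
    return order

# A small cycle-free digraph: edges 3 -> 1 -> 0 and 3 -> 2 -> 0.
adj = np.zeros((4, 4), dtype=int)
for i, j in [(3, 1), (1, 0), (3, 2), (2, 0)]:
    adj[i, j] = 1

order = topological_order(adj.tolist())
P = adj[np.ix_(order, order)]          # relabel vertices by the ordering
# After relabeling, the adjacency matrix is strictly upper triangular.
assert np.allclose(np.tril(P), 0)
```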
We thus have a relabeling of [n] under which A_free is close to upper triangular (there may be some entries of A_free below the diagonal of size less than σ₀, or which are contained in a small number of exceptional pairs from the regular partition). Crucially, this relabeling has preserved the diagonal, so the submatrix M_free is a diagonal perturbation of an (almost) upper-triangular random matrix. We then show that such a matrix has smallest singular value of order ≫ r₀√n with high probability. With another application of the Schur complement bound we can combine the control on the submatrices M_cyc, M_free (along with standard bounds on the operator norm for the off-diagonal blocks) to conclude the proof. (Actually, the bad set I_bad of rows and columns requires some additional arguments, but we do not discuss these here.)

This concludes the high level description of the proof of Theorem 1.18. We only remark that the above partitioning and cleaning procedures will generate various error terms and residual submatrices (such as the vertices in I_bad, or the small proportion of pairs (I_k, I_l) which are not sufficiently pseudorandom). As the smallest singular value is notoriously sensitive to perturbations, it will take some care to control these terms. We will use some high-powered tools such as bounds on the operator norm of sparse random matrices and products of random matrices due to Latała and Vershynin – see Section 5.2.

Invertibility from connectivity assumptions.
Now we state the specific pseudorandomness condition on a standard deviation profile under which we have good control on the smallest singular value. While "pseudorandom" generally means that the edge distribution in a graph is close to uniform on a range of scales, we will only need control from below on the edge densities (morally speaking, we want the matrix A to be as far as possible from the zero matrix, the most poorly invertible matrix). The following one-sided condition is taken from the combinatorics literature (see [17, Definition 1.6]). The reader should recall the notation introduced in (1.16)–(1.18).

Definition 1.23. Let A be an n × m matrix with non-negative entries. For δ, ε ∈ (0, 1), we say that A is (δ, ε)-super-regular if the following hold:

(1) |N_A(i)| ≥ δm for all i ∈ [n];
(2) |N_{A^T}(j)| ≥ δn for all j ∈ [m];
(3) e_A(I, J) ≥ δ|I||J| for all I ⊂ [n], J ⊂ [m] with |I| ≥ εn and |J| ≥ εm.

The reader should compare this condition with Definition 1.9. Conditions (1) and (2) are the same in both definitions, while it is not hard to see that condition (3) above implies

(1.27) |N_{A^T}^{(δ)}(J)| ≥ (1 − ε)n whenever |J| ≥ εn

(with notation as in (1.19)), which is stronger than condition (3) in Definition 1.9 for such J. On the other hand, conditions (1) and (2) imply that |N_{A^T}^{(√δ/2)}(J)| ≥ δn for any J ⊂ [m] (see Lemma 3.4), so super-regularity is stronger than broad connectivity for ε, ν sufficiently small depending on δ.

Theorem 1.24. Let M = A ◦ X + B be an n × n matrix as in Definition 1.3. Assume that A^{(σ₀)} (as defined in (1.18)) is (δ, ε)-super-regular for some δ, σ₀ ∈ (0, 1) and 0 < ε < c₀δσ₀ with c₀ > 0 a sufficiently small constant. For any γ ≥ 1/2 there exists β = O(γ) such that

(1.28) P( s_n(M) ≤ n^{−β}, ‖M‖ ≤ n^γ ) ≪_{γ,δ,σ₀,κ₀} √( (log n)/n ).
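On small examples the super-regularity condition can also be checked by brute force. The sketch below (our own helper; exhaustive over index sets, so exponential in the dimensions) implements conditions (1)–(3) of Definition 1.23:

```python
import itertools
import numpy as np

def super_regular(A, delta, eps):
    """Brute-force check of (delta, eps)-super-regularity (Definition 1.23).
    e_A(I, J) counts pairs (i, j) in I x J with a_ij > 0."""
    n, m = A.shape
    supp = A > 0
    if (supp.sum(axis=1) < delta * m).any():    # (1): row neighborhoods
        return False
    if (supp.sum(axis=0) < delta * n).any():    # (2): column neighborhoods
        return False
    for kI in range(int(np.ceil(eps * n)), n + 1):       # (3): edge counts
        for I in itertools.combinations(range(n), kI):
            for kJ in range(int(np.ceil(eps * m)), m + 1):
                for J in itertools.combinations(range(m), kJ):
                    if supp[np.ix_(I, J)].sum() < delta * kI * kJ:
                        return False
    return True

# A dense profile with zeros only on the diagonal satisfies the condition,
# while a profile with a large zero block does not.
A = np.ones((5, 5)) - np.eye(5)
assert super_regular(A, delta=0.5, eps=0.4)
B = np.ones((5, 5)); B[:3, :3] = 0.0
assert not super_regular(B, delta=0.5, eps=0.4)
```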
Note that Theorem 1.24 allows for a mean profile B of arbitrary polynomial size in operator norm, whereas in Theorem 1.12 we only allowed ‖B‖ = O(√n). The ability to handle such large perturbations will be crucial in the proof of Theorem 1.18, as the iterative application of the Schur complement bound discussed above will lead to perturbations of increasingly large polynomial order.

We defer discussion of the key technical ideas for Theorem 1.12 and Theorem 1.24 to Sections 3 and 4. We only mention here that our proof of Theorem 1.24 makes crucial use of a new "entropy reduction" argument, which allows us to control the event that ‖Mu‖ is small for some u in certain portions of the sphere S^{n−1} by the event that this holds for some u in a random net of relatively low cardinality. The argument uses an improvement by Spielman and Srivastava [32] of the classic Restricted Invertibility Theorem due to Bourgain and Tzafriri [8] – see Section 3 for details.

1.6. Organization of the paper.
The rest of the paper is organized asfollows. Sections 2, 3 and 4 are devoted to the proofs of Theorems 1.12and 1.24. We prove these theorems in parallel as they involve many similarideas. In Section 2 we collect some standard lemmas on anti-concentrationfor random walks and products of random matrices with fixed vectors, alongwith some facts about nets in Euclidean space. In Section 3 we show thatrandom matrices as in Theorems 1.12 and 1.24 are well-invertible over setsof “compressible” vectors in the unit sphere, and in Section 4 we establish N. COOK control over the complementary set of “incompressible” vectors. Theorem1.18 is proved in Section 5.1.7.
Notation.
In addition to the asymptotic notation defined at the beginning of the article, we will occasionally use the notation f = o(g) to mean that f/g → 0 as n → ∞, where the parameter n will be the size of the matrix under consideration (this will only be for the sake of brevity, as all of our arguments are quantitative).

M_{n,m}(C) denotes the set of n × m matrices with complex entries. When m = n we will write M_n(C). For a matrix A = (a_ij) ∈ M_{n,m}(C) we will sometimes use the notation A(i, j) = a_ij. For I ⊂ [n], J ⊂ [m], A_{I,J} denotes the |I| × |J| submatrix with entries indexed by I × J. We abbreviate A_J := A_{J,J}. ‖·‖ denotes the Euclidean norm when applied to vectors, and the ℓ²_m → ℓ²_n operator norm when applied to an n × m matrix. ‖A‖_HS denotes the Hilbert–Schmidt (or Frobenius) norm of a matrix A. We will sometimes denote the smallest singular value of a square matrix M by s_min(M) (in situations where M is a submatrix of a larger matrix this will often be clearer than writing the dimension).

We denote the unit sphere in C^n by S^{n−1}. For J ⊂ [n], we denote by C^J ⊂ C^n (resp. S_J ⊂ S^{n−1}) the set of vectors (resp. unit vectors) in C^n supported on J. Given a vector v ∈ C^n, we denote by v_J ∈ C^n the projection of v to the coordinate subspace C^J. For m ∈ N, x ∈ R, ([m] choose x) denotes the family of subsets of [m] of size ⌊x⌋.

When considering a random matrix M as in Definition 1.3, we use R_i to denote the ith row of M, and write

(1.29) F_{I,J} := ⟨{ξ_ij}_{i∈I, j∈J}⟩

for the sigma algebra of events generated by the entries {ξ_ij}_{i∈I,j∈J} of X. For I ⊂ [n] we write P_I(·) for probability conditional on F_{[n]\I,[n]}.

Acknowledgements.
The author thanks David Renfrew and Terence Tao for useful conversations, and also thanks David Renfrew for providing helpful comments on a preliminary version of the manuscript.
2. Preliminaries.
2.1. Partitioning and discretizing the sphere.
For the proofs of Theorems 1.12 and 1.24 we make heavy use of ideas and notation developed in [20, 21, 27, 28] and related ideas from geometric functional analysis. In particular,
in order to lower bound

s_n(M) = inf_{u ∈ S^{n−1}} ‖Mu‖

we partition the sphere into sets of vectors of different levels of "compressibility", which we presently define, and separately obtain control on the infimum of ‖Mu‖ over each set.

Recall from Section 1.7 our notation C^J ⊂ C^m for the set of vectors supported on J ⊂ [m]. For a set T ⊂ C^n and ρ > 0 we write T_ρ for the set of points within Euclidean distance ρ of T. We recall also the following definitions from [30]. For θ, ρ ∈ (0, 1), we define the set of compressible vectors

(2.1) Comp(θ, ρ) := S^{m−1} ∩ ⋃_{J ∈ ([m] choose θm)} (C^J)_ρ

and the complementary set of incompressible vectors

(2.2) Incomp(θ, ρ) := S^{m−1} \ Comp(θ, ρ).

That is, Comp(θ, ρ) is the set of unit vectors within (Euclidean) distance ρ of a vector supported on at most θm coordinates. On the other hand, incompressible vectors enjoy the following property which will lead to good anti-concentration properties for an associated random walk.

Lemma 2.1. Fix θ, ρ ∈ (0, 1) and let v ∈ Incomp(θ, ρ). There is a set L⁺ ⊂ [m] with |L⁺| ≥ θm such that |v_j| ≥ ρ/√m for all j ∈ L⁺. Moreover, for all λ ≥ 1 there is a set L ⊂ [m] with |L| ≥ (1 − λ^{−2})θm such that for all j ∈ L,

ρ/√m ≤ |v_j| ≤ λ/√(θm).

Proof. Take L⁺ = {j : |v_j| ≥ ρ/√m} and denote L⁻ = {j : |v_j| ≤ λ/√(θm)}. Since v lies a distance at least ρ from any vector supported on at most θm coordinates we must have |L⁺| ≥ θm, which gives the first claim. On the other hand, since v ∈ S^{m−1}, by Markov's inequality we have |(L⁻)^c| ≤ θm/λ², so taking L = L⁺ ∩ L⁻ we have |L| ≥ (1 − λ^{−2})θm.

For fixed choices of θ, ρ we informally refer to the coordinates of v ∈ Incomp(θ, ρ) where |v_j| ≥ ρ/√n as the essential support of v.

Now we recall a standard fact about nets of the sphere of controlled cardinality. For ρ > 0, recall that a ρ-net of a set T ⊂ C^m is a finite subset Σ ⊂ T such that for all v ∈ T there exists v′ ∈ Σ with ‖v − v′‖ ≤ ρ.
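The compressibility dichotomy and the first claim of Lemma 2.1 are easy to probe numerically. The sketch below (Python with NumPy; function names are our own) uses the fact that the Euclidean distance from v to the set of vectors supported on at most k coordinates is just the norm of v with its k largest-modulus entries removed:

```python
import numpy as np

def dist_to_sparse(v, theta):
    """Distance from v to the vectors supported on <= floor(theta*m)
    coordinates: the norm of the m - floor(theta*m) smallest entries."""
    m = len(v)
    k = int(np.floor(theta * m))
    tail = np.sort(np.abs(v))[: m - k]  # the m-k smallest |v_j|
    return np.linalg.norm(tail)

def in_comp(v, theta, rho):
    """Membership of a unit vector v in Comp(theta, rho)."""
    return dist_to_sparse(v, theta) <= rho

def essential_support(v, rho):
    """L+ from Lemma 2.1: coordinates with |v_j| >= rho/sqrt(m)."""
    m = len(v)
    return np.flatnonzero(np.abs(v) >= rho / np.sqrt(m))
```

A standard basis vector lies in Comp(θ, ρ) for any θ ≥ 1/m, while the flat unit vector (1/√m, ..., 1/√m) is incompressible for moderate θ, ρ, and Lemma 2.1 then forces |L⁺| ≥ θm.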
Lemma 2.2. Let V ⊂ C^m be a subspace of (complex) dimension k, let T ⊂ V ∩ S^{m−1}, and let ρ ∈ (0, 1). Then T has a ρ-net Σ ⊂ T of cardinality |Σ| ≤ (3/ρ)^{2k}.

Proof. Let Σ ⊂ T be a ρ-separated (in Euclidean distance) subset that is maximal under set inclusion. It follows from maximality that Σ is a ρ-net of T. Let Σ_{ρ/2} denote the ρ/2-neighborhood of Σ in V. Noting that Σ_{ρ/2} is a disjoint union of k-(complex-)dimensional Euclidean balls of radius ρ/2, we have

|Σ| c_k (ρ/2)^{2k} ≤ vol_{2k}(Σ_{ρ/2}) ≤ c_k (1 + ρ/2)^{2k}

where vol_{2k} denotes the 2k-(real-)dimensional Lebesgue measure on V and c_k is the volume of the Euclidean unit ball in C^k. The desired bound follows by rearranging.

2.2. Anti-concentration for scalar random walks.
In this subsection we collect some standard anti-concentration estimates for scalar random walks, which are perhaps the most central tool for proving that random matrices are (well-)invertible with high probability.
Definition 2.3. Let ξ be a complex-valued random variable. For v ∈ C^n we let

(2.3) S_ξ(v) = Σ_{j=1}^n ξ_j v_j

where ξ_1, ..., ξ_n are iid copies of ξ. For r ≥ 0 we define the concentration probability

(2.4) p_{ξ,v}(r) = sup_{z∈C} P( |S_ξ(v) − z| ≤ r ).

Throughout this section we operate under the following distributional assumption on ξ.

Definition 2.4. Let κ ≥ 1. A complex random variable ξ is said to have κ-controlled second moment if one has the upper bound

(2.5) E|ξ|² ≤ κ

(in particular, |Eξ| ≤ κ^{1/2}), and the lower bound

(2.6) E[ (Re(zξ − w))² 1(|ξ| ≤ κ) ] ≥ κ^{−1} (Re z)² for all z ∈ C, w ∈ R.

Roughly speaking, a complex random variable ξ has controlled second moment if its distribution has a one-(real-)dimensional marginal with fairly large variance on some compact set. The following is a quantitative version of [36, Lemma 2.4], and shows that by multiplying the matrices X and B in Definition 1.3 by a scalar phase (amounting to multiplying M by a phase, which does not affect its singular values) we can assume the atom variable ξ has O(κ)-controlled second moment in all of our proofs with no loss of generality. The proof is deferred to Appendix B.

Lemma 2.5. Let ξ be a centered complex random variable with unit variance, and assume ξ is κ-spread for some κ ≥ 1 (see Definition 1.1). Then there exists θ ∈ R such that e^{iθ}ξ has κ′-controlled second moment for some κ′ = O(κ).

Below we give two standard bounds on the concentration function p_{ξ,v}(r) when ξ is a κ-controlled random variable and v ∈ S^{n−1}. The first gives a crude constant order bound that is uniform in v ∈ S^{n−1}:

Lemma 2.6. Let ξ be a complex random variable with κ-controlled second moment. There exists r₀ > 0 depending only on κ such that p_{ξ,v}(r₀) ≤ 1 − r₀ for all v ∈ S^{n−1}.

Note that Lemma 2.6 is sharp for the case that v is a standard basis vector. The following gives an improved bound when v has small ℓ∞ norm.

Lemma 2.7. Let ξ be a complex random variable that is κ-controlled for some κ ≥ 1, and let v ∈ S^{n−1}. For all r ≥ 0,

(2.7) p_{ξ,v}(r) ≪_κ r + ‖v‖_∞.

Lemma 2.7 can be deduced from the Berry–Esseen theorem (which is the approach taken in [20], for instance), but this would require ξ to have finite third moment, which we do not assume.
(Generally speaking, higher moment assumptions should only be necessary to prove concentration bounds, as opposed to anti-concentration.) Since we could not locate a proof in the literature for the case that ξ and the coefficients of v take values in C, we provide a proof in Appendix B.

2.3. Anti-concentration for the image of a fixed vector. In this subsection we boost the anti-concentration bounds for scalar random variables from the previous sections to anti-concentration for the image of a fixed vector under a random matrix. The following lemma of Rudelson and Vershynin is convenient for this task.
Lemma 2.8. Let ζ₁, ..., ζ_n be independent non-negative random variables.

(a) Suppose that for some ε₀, p₀ > 0 and all j ∈ [n], P(ζ_j ≤ ε₀) ≤ p₀. There are c₀, p₁ ∈ (0, 1) depending only on p₀ such that

(2.8) P( Σ_{j=1}^n ζ_j² ≤ c₀ε₀²n ) ≤ p₁^n.

(b) Suppose that for some K, ε₀ ≥ 0 and all j ∈ [n], P(ζ_j ≤ ε) ≤ Kε for all ε ≥ ε₀. Then for all ε ≥ ε₀,

(2.9) P( Σ_{j=1}^n ζ_j² ≤ ε²n ) ≤ (CKε)^n.

Note that in part (a) we have given more specific dependencies on the parameters than in [28]. For completeness we provide the proof of this modified version in Appendix B.

Let M = A ◦ X + B be as in Definition 1.3. Recall that we denote by R_i the ith row of M. In the following lemmas we assume that the atom variable ξ has κ-controlled second moment for some fixed κ ≥ 1. For v ∈ C^m and i ∈ [n] we write

(2.10) v^i := (v_j a_ij)_{j=1}^m.

For α > 0 we write

(2.11) I_α(v) := { i ∈ [n] : ‖v^i‖ ≥ α }.

Lemma 2.9. Fix v ∈ C^m and let α > 0 be such that I_α(v) ≠ ∅. For all I ⊂ I_α(v),

(2.12) sup_{w∈C^n} P_I( ‖Mv − w‖ ≤ c₀α|I|^{1/2} ) ≤ e^{−c₀|I|}

where c₀ > 0 is a constant depending only on κ (recall our notation P_I(·) from Section 1.7).

Proof. Fix w ∈ C^n arbitrarily. For any i ∈ I_α(v) and any t ≥ 0,

P( |R_i · v − w_i| ≤ t ) ≤ p_{ξ,v^i}(t) = p_{ξ,v^i/‖v^i‖}(t/‖v^i‖) ≤ p_{ξ,v^i/‖v^i‖}(t/α).

Taking t = αr₀, by Lemma 2.6 we have

(2.13) P( |R_i · v − w_i| ≤ αr₀ ) ≤ 1 − r₀

where r₀ > 0 depends only on κ. Fix I ⊂ I_α(v) arbitrarily; we may assume without loss of generality that I is non-empty. By Lemma 2.8(a) there exists c₀ > 0 depending only on κ such that

(2.14) P_I( Σ_{i∈I} |R_i · v − w_i|² ≤ c₀r₀²α²|I| ) ≤ e^{−c₀|I|}.

Now for any τ ≥ 0,

P_I( ‖Mv − w‖ ≤ τ|I|^{1/2} ) = P_I( Σ_{i=1}^n |R_i · v − w_i|² ≤ τ²|I| ) ≤ P_I( Σ_{i∈I} |R_i · v − w_i|² ≤ τ²|I| )

and the claim follows by taking τ = c₀^{1/2}r₀α =: c₀′α and applying (2.14).

By similar lines, using Lemmas 2.8(b) and 2.7 in place of Lemmas 2.8(a) and 2.6, respectively, one obtains the following, which is superior to Lemma 2.9 for vectors v with small ℓ∞ norm. The details are omitted.

Lemma 2.10. Fix v ∈ C^m. Let α > 0 be such that I_α(v) ≠ ∅ and fix I ⊂ I_α(v) nonempty. For all t ≥ 0,

(2.15) sup_{w∈C^n} P_I( ‖Mv − w‖ ≤ t|I|^{1/2} ) ≤ O_κ( (t + ‖v‖_∞)/α )^{|I|}.
3. Invertibility from connectivity: Compressible vectors.
In thissection we combine the anti-concentration estimates from Section 2 withunion bounds over ε -nets (as obtained for instance from Lemma 2.2) toprove that with high probability, a random matrix M as in Theorem 1.12 orTheorem 1.24 is well-invertible on the set of compressible vectors Comp( θ, ρ )(as defined in (2.1)) for appropriate choices of θ, ρ . Hence, there will be acompetition between the quality of the anti-concentration estimates and thecardinality of the ε -nets. For small values of θ we can use ε -nets of smallcardinality, but only have poor anti-concentration bounds (namely, Lemma2.9), while for large θ the nets are very large, but we have access to theimproved anti-concentration of Lemma 2.10.In both cases we start with a crude result, Lemma 3.3, giving control forthe vectors in Comp( θ , ρ ) for some small value of θ (possibly depending N. COOK on n ). We then use an iterative argument argument to obtain control onComp( θ, ρ ) for larger values of θ while lowering the parameter ρ . For The-orem 1.12 we want to take θ close to 1, while for Theorem 1.24 a constantorder value of θ will suffice.It turns out that that while the standard ε -net from Lemma 2.2 sufficesto prove Lemma 3.3, it is insufficient to obtain control on Comp( θ, ρ ) for thedesired values of θ . For the broadly connected case this is essentially due toworking in C n rather than R n , which causes a factor 2 increase in metricentropies (this difficulty was not present in the proof of Theorem 1.10 in [30]as they worked in R n ). The situation is worse for the case of Theorem 1.24,the main source of difficulty being that k B k can be of arbitrary polynomialorder. As a consequence, the starting point θ for our iterative argumentwill be of size o (1). 
This prevents us from using the third condition of thesuper-regularity hypothesis (see Definition 1.23), which only “sees” vectorsthat are essentially supported on more than εn coordinates.We deal with this by reducing the entropy cost of the nets over whichwe take union bounds. In Section 3.2 we prove Lemma 3.5 which shows,roughly speaking, that if we have already established control on vectors inComp( θ, ρ ) for some θ, ρ , then we can control the vectors in Comp( θ + ∆ , ρ ′ )for some small ∆ , ρ ′ using a random net of significantly smaller cardinalitythan the net provided by Lemma 2.2. We can then increment θ from θ upto size ≫
1, taking steps of size ∆. For the broadly connected case we cancontinue and take θ as close to 1 as desired. The entropy reduction argumentfor Lemma 3.5 makes use of a strong version of the well-known RestrictedInvertibility Theorem due to Spielman and Srivastava – see Theorem 3.7.We now state the main results of this section. For K ≥ boundedness event (3.1) B ( K ) := (cid:8) k M k ≤ K √ n (cid:9) . With a fixed choice of K we write(3.2) E ( θ, ρ ) := B ( K ) ∧ (cid:8) ∃ u ∈ Comp( θ, ρ ) : k M u k ≤ ρK √ n (cid:9) . Proposition . Let M = A ◦ X + B be as in Definition 1.3 with n/ ≤ m ≤ n , and as-sume that ξ has κ -controlled second moment for some κ ≥ (see Definition2.4). Let K ≥ and σ , δ, ν ∈ (0 , . There exist θ ( κ, σ , δ, K ) > and ρ ( κ, σ , δ, ν, K ) > such that the following holds. Assume(1) |N A ( σ ) T ( j ) | ≥ δn for all j ∈ [ m ] ; NVERTIBILITY OF STRUCTURED RANDOM MATRICES (2) |N ( δ ) A ( σ ) T ( J ) | ≥ min((1 + ν ) | J | , n ) for all J ⊂ [ m ] with | J | ≥ θ m .Then for any < θ ≤ (1 − δ ) min( nm , , (3.3) P ( E ( θ, ρ )) ≪ κ,σ ,δ,ν,K exp (cid:0) − c κ δσ n (cid:1) where c κ > depends only on κ . The following gives control of compressible vectors for more general pro-files than in Proposition 3.1 (essentially removing the condition (2)). How-ever, we have to take the parameter ρ much smaller, and we only covervectors that are essentially supported on a small (linear) proportion of thecoordinates, rather than a proportion close to one. Proposition . Let M = A ◦ X + B be as in Definition 1.3 with n/ ≤ m ≤ n .Assume ξ has κ -controlled second moment for some κ ≥ , and that for some a > we have (3.4) n X i =1 a ij ≥ a n for all j ∈ [ m ] . Fix γ ≥ / and let ≤ K = O ( n γ − / ) . Then for some ρ = ρ ( γ, a , κ, n ) ≫ γ,a ,κ n − O ( γ ) and a sufficiently small constant c > we have (3.5) P (cid:0) E ( c a , ρ ) (cid:1) ≪ γ,a ,κ exp (cid:0) − c κ a n (cid:1) where c κ > depends only on κ . Highly compressible vectors.
In this subsection we establish the fol-lowing crude version of Proposition 3.2, giving control on vectors in Comp( θ , ρ )with θ sufficiently small depending on a and K . Lemma . Let M = A ◦ X + B be as inDefinition 1.3 with m ≤ n . Assume that ξ has κ -controlled second momentfor some κ ≥ . Suppose also that there is a constant a > such that forall j ∈ [ m ] , P ni =1 a ij ≥ a n . Let K ≥ . Then with notation as in (3.2) wehave (3.6) P ( E ( θ , ρ )) ≤ e − c κ a n where θ = c κ a / log( K/a ) and ρ = c κ a /K for a sufficiently small c κ > depending only on κ . N. COOK
We will need the following lemma, which ensures that the set I α ( v ) from(2.11) is reasonably large when the columns of A have large ℓ norm. Asimilar argument has been used in [22] and [30]. Lemma . Let A be an n × m matrix as in Definition1.3, and assume that for some a > we have P ni =1 a ij ≥ a n for all j ∈ [ m ] .Then for any v ∈ S m − we have | I a / ( v ) | ≥ a n . Proof.
Writing α = a / √
2, we have a n ≤ n X i =1 m X j =1 | v j | a ij = X i ∈ I α ( v ) m X j =1 | v j | a ij + X i/ ∈ I α ( v ) m X j =1 | v j | a ij ≤ X i ∈ I α ( v ) m X j =1 | v j | + X i/ ∈ I α ( v ) a ≤ | I α ( v ) | + 12 a n and rearranging gives the claim. Proof of Lemma 3.3.
Fix J ⊂ [ m ] of size ⌊ θ m ⌋ and let v ∈ S J bearbitrary. Writing α = a / √
2, by Lemma 2.9 and our choice of ρ (with c κ > κ ), P (cid:0) k M v k ≤ ρ K √ n (cid:1) ≤ P (cid:16) k M v k ≤ c κ a | I α ( v ) | / (cid:17) ≤ e − c κ | I α ( v ) | . Applying Lemma 3.4, we obtain(3.7) P (cid:0) k M v k ≤ ρ K √ n (cid:1) ≤ e − c κ a n ∀ v ∈ S J (adjusting c κ ). By Lemma 2.2 we may fix Σ J ⊂ S J a ρ / S J suchthat | Σ J | ≤ (12 /ρ ) k . Suppose that k M k ≤ K √ n and that k M u k ≤ ρ K √ n for some u ∈ S m − ∩ ( C J ) ρ / . Let u ′ ∈ C J with k u − u ′ k ≤ ρ /
4, and let u ′′ ∈ Σ J with k u ′′ − u ′ k u ′ k k ≤ ρ /
4. By the triangle inequality, k u − u ′′ k ≤ k u − u ′ k + (cid:13)(cid:13)(cid:13)(cid:13) u ′ − u ′ k u ′ k (cid:13)(cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13)(cid:13) u ′ k u ′ k − u ′′ (cid:13)(cid:13)(cid:13)(cid:13) ≤ ρ / NVERTIBILITY OF STRUCTURED RANDOM MATRICES where the bound on the middle term follows from |k u ′ k − | ≤ ρ / k M u ′′ k ≤ k M u k + k M ( u − u ′′ ) k ≤ ρ K √ n + K √ n · (3 ρ / ≤ ρ K √ n. Applying the union bound and (3.7) (adjusting c κ to replace ρ by 2 ρ ), P (cid:0) ∃ u ∈ S m − ∩ ( C J ) ρ / : k M u k ≤ ρ K √ n (cid:1) ≤ P (cid:0) ∃ u ′′ ∈ Σ J : k M u ′′ k ≤ ρ K √ n (cid:1) ≤ O (1 /ρ ) θ m e − c κ a n From (2.1) and applying the union bound over all choice of J ∈ (cid:0) [ m ] θ m (cid:1) , P ( E ( θ , ρ / ≤ O (1 /θ ) θ m O (1 /ρ ) θ m e − c κ a n ≤ O (cid:18) θ ρ (cid:19) θ n e − c κ a n , where we used our assumption m ≤ n . The desired bound now followsfrom substituting our choices of θ , ρ , and again adjusting the constant c κ to replace ρ / ρ in the above.3.2. An entropy reduction lemma.
The aim of this subsection is to es-tablish the following:
Lemma . For every I ⊂ [ n ] , J ⊂ [ m ] , ε > there is a random finite set Σ I,J ( ε ) ⊂ S J , measurablewith respect to F I.J = h{ ξ ij } i ∈ I,j ∈ J i , such that the following holds. Let ρ ∈ (0 , , K > and < θ < nm . On B ( K ) ∧ E ( θ, ρ ) c , for all J ⊂ [ m ] with | J | > θm and all β, ρ ′ ∈ (0 , , putting (3.8) ρ ′′ = 6 ρ ′ βρ (cid:18) n ⌊ θm ⌋ (cid:19) / there exists I ⊂ [ n ] with | I | = ⌊ (1 − β ) ⌊ θm ⌋⌋ such that(1) | Σ I,J ( ρ ′′ ) | ≤ ( C/ρ ′′ ) | J |−| I | ) for an absolute constant C > , and(2) for any u ∈ S m − ∩ ( C J ) ρ ′ such that k M u k ≤ ρ ′ K √ n , we have dist( u, Σ I,J ( ρ ′′ )) ≤ ρ ′′ .Furthermore, writing (3.9) G I,J ( ρ ′′ ) := (cid:26)(cid:12)(cid:12) Σ I,J ( ρ ′′ ) (cid:12)(cid:12) ≤ (cid:18) Cρ ′′ (cid:19) | J |−| I | ) (cid:27) N. COOK we have that for any θ ′ ∈ ( θ, , E ( θ, ρ ) c ∧ E ( θ ′ , ρ ′ ) ⊂ _ J ∈ ( [ m ] θ ′ m ) _ I ∈ ( [ n ](1 − β )2 ⌊ θm ⌋ ) (cid:18) G I,J ( ρ ′′ ) ∧ n ∃ u ∈ Σ I,J ( ρ ′′ ) : k M u k ≤ ρ ′′ K √ n o(cid:19) . (3.10) Remark . We obtain the random set Σ
I,J ( ε ) as the intersection ofthe sphere S J with an ε -net of the kernel of the submatrix M I,J . However,for our purposes it only matters that it is fixed by conditioning on the rows { R i } i ∈ I , has small cardinality, and serves as a net for almost-null vectors of M that are supported on J .To prove Lemma 3.5 we use the following version of the Restricted Invert-ibility Theorem [32] (the version below is taken from [23, Theorem 3.1]). Theorem . Suppose v , . . . , v n ∈ C m are such that P ni =1 v i v ∗ i = I m . For any β ∈ (0 , , there is a subset I ⊂ [ n ] of size | I | = ⌊ (1 − β ) m ⌋ for which (3.11) λ | I | (cid:18) X i ∈ I v i v ∗ i (cid:19) ≥ β m/n where λ k ( A ) denotes the k th largest eigenvalue of a Hermitian matrix A . This has the following consequence, which can be seen as a robust quan-titative version of the basic fact from linear algebra that the row rank of amatrix is equal to its column rank.
Corollary . Let M be an n × m matrix with n ≥ m , and assume s m ( M ) ≥ ε √ n for some ε > . For any β ∈ (0 , there exists I ⊂ [ n ] with | I | = ⌊ (1 − β ) m ⌋ such that s | I | ( M I, [ m ] ) ≥ βε √ m. Remark . The original Restricted Invertibility Theorem of Bourgainand Tzafriri [8] only gives | I | ≥ cm and s | I | ( M I, [ m ] ) ≥ cε √ m for some(small) absolute constant c >
0, while it will be important for our purposesto be able to take I of size close to m . NVERTIBILITY OF STRUCTURED RANDOM MATRICES Proof of Corollary 3.8.
By the singular value decomposition it suf-fices to consider M of the form M = U Σ where U is an n × m matrixwith orthonormal columns and Σ is an m × m diagonal matrix with entriesbounded below by ε √ n . Fix α ∈ (0 , v ∗ , . . . , v ∗ n ∈ C m denote therows of U , it follows from orthonormality that I m = U ∗ U = n X i =1 v i v ∗ i . Hence, we can apply Theorem 3.7 to obtain a subset I ⊂ [ n ] with | I | = ⌊ (1 − β ) m ⌋ such that s | I | ( U I, [ m ] ) = λ | I | (cid:18) X i ∈ I v i v ∗ i (cid:19) ≥ β m/n. Now we have s | I | ( M I,m ) ≥ s | I | ( U I,m ) s m (Σ) ≥ β r mn ε √ n = βε √ m. Proof of Lemma 3.5.
Let I ⊂ [ n ] , J ⊂ [ m ], and write V I,J = C J ∩ ker( M I,J ). Conditional on F I,J , for ε >
I,J ( ε ) be an ε -net of S m − ∩ V I,J . By Lemma 2.2 we may take(3.12) | Σ I,J ( ε ) | = O (1 /ε ) V I,J ) . Let ρ, ρ ′ ∈ (0 , K > < θ < nm . Fix β ∈ (0 ,
1) and J ⊂ [ m ] with | J | > θm . On E ( θ, ρ ) c , for all J ⊂ J with | J | = ⌊ θm ⌋ we have s ⌊ θm ⌋ ( M [ n ] ,J ) ≥ ρK √ n. By Corollary 3.8 there exists I ⊂ [ n ] with | I | = ⌊ (1 − β ) ⌊ θm ⌋⌋ such that s | I | ( M I,J ) ≥ βρK p ⌊ θm ⌋ . By the Cauchy interlacing law,(3.13) s | I | ( M I,J ) ≥ βρK p ⌊ θm ⌋ . In particular, the submatrix ( y ij ) i ∈ I,j ∈ J has full row-rank, which impliesdim( V I,J ) = | J | − | I | . From (3.12) we conclude(3.14) | Σ I,J ( ε ) | = O (1 /ε ) | J |−| I | ) N. COOK for any ε > u ∈ S m − ∩ ( C J ) ρ ′ such that(3.15) k M u k ≤ ρ ′ K √ n. Letting v ′ ∈ C J such that k u − v ′ k ≤ ρ ′ , and putting v := v ′ / k v ′ k ∈ S J , bythe triangle inequality we have k u − v k ≤ ρ ′ and(3.16) k M v k ≤ k
M u k + k M kk u − v k ≤ ρ ′ K √ n. On the other hand, k M v k ≥ k M I, [ m ] v k = k M I, [ m ] (I − P V I,J ) v k where P V I,J is the matrix for orthogonal projection to the subspace V I,J .Applying (3.13), k M v k ≥ k (I − P V I,J ) v k βρK p ⌊ θm ⌋ . Together with (3.16) this implies that v lies within distance(3.17) 3 ρ ′ √ nβρ p ⌊ θm ⌋ = ρ ′′ / V I,J . Since v is a unit vector we have dist( v, S m − ∩ V I,J ) ≤ ρ ′′ by the triangle inequality, anddist( u, Σ I,J ( ρ ′′ )) ≤ k u − v k + ρ ′′ + dist( v, S m − ∩ V I,J ) ≤ ρ ′ + 2 ρ ′′ ≤ ρ ′′ as desired (that 2 ρ ′ ≤ ρ ′′ follows from inspection of (3.8)).Now to prove (3.10), let θ ′ ∈ ( θ, E ( θ, ρ ) c and applyingthe first part of the lemma, E ( θ, ρ ) c ∧ E ( θ ′ , ρ ′ )= B ( K ) ∧ E ( θ, ρ ) c ∧ _ J ∈ ( [ m ] θ ′ m ) n ∃ v ∈ ( S J ) ρ ′ : k M v k ≤ ρ ′ K √ n o ⊂ _ J ∈ ( [ m ] θ ′ m ) _ I ∈ ( [ n ](1 − β )2 ⌊ θm ⌋ ) (cid:18) G I,J ( ρ ′′ ) ∧ n ∃ u ∈ Σ I,J ( ρ ′′ ) : k M u k ≤ ρ ′′ K √ n o(cid:19) (3.18)where in the last line we noted that for v ∈ ( S J ) ρ ′ , u ∈ Σ I,J ( ρ ′′ ) such that k u − v k ≤ ρ ′′ , we have k M u k ≤ k
M v k + 3 ρ ′′ K √ n ≤ ( ρ ′ + 3 ρ ′′ ) K √ n ≤ ρ ′′ K √ n. NVERTIBILITY OF STRUCTURED RANDOM MATRICES Broadly connected profile: Proof of Proposition 3.1.
We will obtainProposition 3.1 from an iterative application of the following lemma:
Lemma . Let M = A ◦ X + B be as in Definition 1.3 with m ≥ n/ . Assume ξ has κ -controlled second moment for some κ ≥ , and that for some σ , δ, ν, θ ∈ (0 , we have(1) |N A ( σ ) ( j ) | ≥ δn for all j ∈ [ m ] ;(2) |N ( δ ) A ( σ ) ( J ) | ≥ min((1 + ν ) | J | , n ) for all J ⊂ [ m ] with | J | ≥ ( θ / m .Let K ≥ , ρ ∈ (0 , , and θ ∈ [ θ , such that (1 + ν ) θm < n . There exists ρ ′ = ρ ′ ( κ, σ , δ, ν, ρ, θ, K ) > such that (3.19) P (cid:16) E ( θ, ρ ) c ∧ E (cid:16)(cid:16) ν (cid:17) θ, ρ ′ (cid:17)(cid:17) = O κ,σ ,δ,ν,ρ,θ,K ( e − n ) . Proof.
We may assume n is sufficiently large depending on κ, σ , δ, ν, ρ, θ, K .Write θ ′ = (cid:0) ν (cid:1) θ and take β = ν . Let ρ ′ > κ, σ , δ, ν, ρ, θ, K , and let ρ ′′ be as in (3.8). Intersectingthe right hand side of (3.10) with E ( θ, ρ ) c , we have E ( θ, ρ ) c ∧ E ( θ ′ , ρ ′ ) ⊂ _ J ∈ ( [ m ] θ ′ m ) _ I ∈ ( [ n ](1 − β )2 ⌊ θm ⌋ ) G I,J ( ρ ′′ ) ∧ E ( θ, ρ ) c ∧ n ∃ u ∈ Σ I,J ( ρ ′′ ) : k M u k ≤ ρ ′′ K √ n o ⊂ _ J ∈ ( [ m ] θ ′ m ) _ I ∈ ( [ n ](1 − β )2 ⌊ θm ⌋ ) G I,J ( ρ ′′ ) ∧ n ∃ u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ, ρ ) : k M u k ≤ ρ ′′ K √ n o (3.20)where the second line follows by taking ρ ′ small enough that 4 ρ ′′ < ρ .Fix J ⊂ [ m ] and I ⊂ [ n ] of sizes ⌊ θ ′ m ⌋ , ⌊ (1 − β ) ⌊ θm ⌋⌋ , respectively, andcondition on F I, [ n ] (recall the notation (1.29)) to fix Σ I,J ( ρ ′′ ). Consider anarbitrary element u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ, ρ ). By Lemma 2.1, there is a set L ⊂ [ m ] with | L | ≥ (1 − νC ) θm and(3.21) ρ √ m ≤ | u j | ≤ C √ νθm for all j ∈ L , where C > i ∈ N ( δ ) ( L ), we have(3.22) k ( u L ) i k ≥ X i ∈ L : a ij ≥ σ | u j | a ij ≥ ρ m σ δ | L | ≥ ρ σ δθ =: α N. COOK where in the last inequality we took C sufficiently large. Hence,(3.23) | I α ( u L ) | ≥ |N ( δ ) ( L ) | ≥ min (cid:0) n, (1 + ν )(1 − ν/C ) θm (cid:1) ≥ (cid:16) ν (cid:17) θm taking C larger if necessary, where in the second inequality we used ourassumption θ ≥ θ , and in the third inequality we used our assumption(1 + ν ) θm < n .Fix I ⊂ I α ( u L ) \ I of size n := ⌊ (1 + ν ) θm ⌋ − | I | . In particular, ν θm ≤ n ≤ (cid:16) ν (cid:17) θm − (1 − β ) θm ≤ νθm (3.24)and n + 2 | I | − | J | ≥ (cid:16) ν (cid:17) θm + (1 − β ) θm − (cid:16) ν (cid:17) θm − O (1)= 110 νθm − O (1) . (3.25)by our choice of β . 
By Lemma 2.10,(3.26) P I (cid:0) k M u k ≤ ρ ′′ K √ n (cid:1) ≤ O κ (cid:18) α (cid:18) ρ ′′ K √ n p | I | + 1 √ νθm (cid:19)(cid:19) n ≤ O κ (cid:18) ρ ′′ Kαθ / (cid:19) n where in the second inequality we applied the assumption m ≥ n/ n is sufficiently large that ρ ′′ ≫ /K √ n (it follows from (3.8)and our assumption that ρ ′ is independent of n that ρ ′′ is bounded belowindependent of n ).Suppose that G I,J ( ρ ′′ ) holds. Since the bound (3.26) is uniform in thechoice of I , we can undo the conditioning and apply the union bound overelements of Σ I,J ( ρ ′′ ) \ Comp( θ, ρ ) to find P (cid:16) ∃ u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ, ρ ) : k M u k ≤ ρ ′′ K √ n (cid:17) ≤ O (cid:18) ρ ′′ (cid:19) | J |−| I | ) O κ (cid:18) ρ ′′ Kαθ / (cid:19) n = O κ (cid:18) Kαθ / (cid:19) n O ( ρ ′′ ) n +2 | I |− | J | = O κ (cid:18) Kαθ / (cid:19) νθm O ( ρ ′′ ) νθm − O (1) where in the last line we applied the bounds (3.24) and (3.25). Since thisis uniform in I, J , we can undo the conditioning on F I, [ n ] and apply (3.20) NVERTIBILITY OF STRUCTURED RANDOM MATRICES with another union bound over the choices of I, J to obtain(3.27) P (cid:0) E ( θ, ρ ) c ∧ E ( θ ′ , ρ ′ ) (cid:1) ≤ m + n O κ (cid:18) Kαθ / (cid:19) νθm O (cid:18) ρ ′ νρθ / (cid:19) νθm − O (1) where we have substituted the definition of ρ ′′ . The result now follows bytaking ρ ′ sufficiently small.Now we conclude the proof of Proposition 3.1. From our assumptions itfollows that for all j ∈ [ m ] we have P ni =1 a ij ≥ δσ n . Together with ourassumption m ≤ n , this means we can apply Lemma 3.3 to find that(3.28) P ( E ( θ , ρ )) ≤ e − c κ δσ n where θ = c κ δσ / log( K/δσ ) and ρ = c κ δσ /K .We may assume without loss of generality that ν ≤ δ/
2. For l ≥ θ l = (1 + ν ) l θ , and let k be the smallest l such that θ l ≥ θ . We have (cid:16) ν (cid:17) θ k − m ≤ (cid:16) ν (cid:17) θm ≤ (cid:16) − δ (cid:17) min( m, n ) < n. In particular, (1 + ν/ k θ ≤ (1 + ν/ θ ≤
1, so(3.29) k ≤ log θ log (cid:0) ν (cid:1) ≪ κ,σ ,δ,ν,K . Applying Lemma 3.10 inductively, we have that for every 1 ≤ l ≤ k there is ρ l > κ, σ , δ, ν and K such that(3.30) P ( E ( θ l , ρ l ) \ E ( θ l − , ρ l − )) = O κ,σ ,δ,ν,K ( e − n ) . Together with (3.28) and the union bound, P ( E ( θ, ρ )) ≤ P ( E ( θ , ρ )) + k X l =1 P ( E ( θ l , ρ l ) \ E ( θ l − , ρ l − )) ≤ e − c κ δσ n + O κ,σ ,δ,ν,K ( e − n ) = O κ,σ ,δ,ν,K ( e − c κ δσ n ) . General profile: Proof of Proposition 3.2.
For technical reasons (es-sentially due to the fact that we want to allow the operator norm to havearbitrary polynomial size) the anti-concentration argument from the previ-ous section will not suffice here, and we will need the following substitute.Roughly speaking, while previously we argued by isolating a large set of N. COOK coordinates on which the vector u is “flat” (see (3.21)), here we will needto locate a set on which u is very flat , only fluctuating by a constant factor.This is done by a simple dyadic decomposition of the range of u , which isresponsible for the loss of a logarithmic factor in the probability bound. Asimilar argument will be used in Section 4.2. Lemma . Let M be as in Proposition 3.2. Let v ∈ Incomp( θ, ρ ) for some θ, ρ ∈ (0 , , and fix I ⊂ [ n ] with | I | ≤ a n . Then for all t ≥ a ρ/ √ m , (3.31) sup w ∈ C n P [ n ] \ I (cid:16) k M v − w k ≤ t √ n (cid:17) = O κ t log / ( √ mρ ) a ρθ / a n . Remark . Proceeding as in the proof of Lemma 3.10 would yield(3.32)sup w ∈ C n P [ n ] \ I (cid:16) k M v − w k ≤ t √ n (cid:17) = O κ (cid:18) ta ρθ / (cid:19) a n for all t ≥ a √ θm . The ability to take t down to the scale ∼ ρ/ √ m will be crucial in the proofof Lemma 3.13 below. Proof.
We begin by finding a set of indices on which v varies by at mosta factor of 2. For k ≥ L k = { j ∈ [ m ] : 2 − ( k +1) < | v j | ≤ − k } . Since v ∈ Incomp( θ, ρ ), we have | L + | := |{ j ∈ [ m ] : | v j | ≥ ρ/ √ m }| ≥ θm. Indeed, were this not the case then v would be within distance ρ of thevector v L + whose support is smaller than θm , implying v ∈ Comp( θ, ρ ).Thus, L + ⊂ S ℓk =0 L k for some ℓ ≪ log( √ mρ ). By the pigeonhole principlethere exists k ∗ ≤ ℓ such that L ∗ := L k ∗ satisfies(3.33) | L ∗ | ≥ θnℓ ≫ θm log( √ mρ ) . Denote I ∗ := I a k v L ∗ k ( v L ∗ ). By Lemma 3.4,(3.34) | I ∗ | ≥ a n. Fix i ∈ I ∗ . By definition of I ∗ ,(3.35) k ( v i ) L ∗ k ≥ a k v L ∗ k NVERTIBILITY OF STRUCTURED RANDOM MATRICES and since | v j | ≫ ρ/ √ m on L ∗ ,(3.36) k v L ∗ k ≫ ρ √ m | L ∗ | / . Furthermore, since a ij ≤ j ∈ [ m ] and v varies by a factor at most 2on L ∗ ,(3.37) k ( v i ) L ∗ k ∞ ≤ k v L ∗ k ∞ ≤ k v L ∗ k| L ∗ | / . Fix w ∈ C n arbitrarily, and recall that R i denotes the i th row of M . ByLemma 2.7 and the above estimates, for all t ≥ P ( | R i · v − w i | ≤ t ) ≪ κ t + k ( v i ) L ∗ k ∞ k ( v i ) L ∗ k≪ a (cid:18) t k v L ∗ k + k ( v i ) L ∗ k ∞ k v L ∗ k (cid:19) ≪ a tρ (cid:18) m | L ∗ | (cid:19) / + 1 | L ∗ | / ! = 1 a (cid:18) m | L ∗ | (cid:19) / (cid:18) tρ + 1 √ m (cid:19) . By Lemma 2.8, P I ∗ \ I (cid:16) k M v − w k ≤ t | I ∗ \ I | / (cid:17) ≤ P I ∗ \ I (cid:16) X i ∈ I ∗ \ I | R i · v − w i | ≤ t | I ∗ \ I | (cid:17) = O κ (cid:18) t √ ma ρ | L ∗ | / (cid:19) | I ∗ \ I | for all t ≥ ρ/ √ m . Substituting the lower bounds (3.33), (3.34) on | L ∗ | and | I ∗ | and our assumption | I | ≤ a n , P I ∗ \ I (cid:18) k M v − w k ≤ ta √ n (cid:19) = O κ t log / ( √ mρ ) a ρθ / a n for all t ≥ ρ/ √ m . 
The result now follows by replacing t with 2 t/a as undoingthe conditioning on the remaining rows in [ n ] \ I .Now we are ready to prove the analogue of Lemma 3.10 for general profiles.Whereas in the broadly connected case we obtained control on vectors in N. COOK
Comp((1 + β ) θ, ρ ′ ) after restricting to the event that we have control onComp( θ, ρ ), for small β >
0, here we will also need to assume control onComp( θ , ρ ) for a fixed small θ at each step. The control on Comp( θ, ρ )will be used to obtain a net of low cardinality using Lemma 3.5, whilethe control on Comp( θ , ρ ) will be used to obtain good anti-concentrationestimates using Lemma 3.11. (In the broadly connected case the control onComp( θ, ρ ) was sufficient for both purposes.) Lemma . Let M beas in Proposition 3.2, fix γ > / and put K = n γ − / . Let θ , ρ be as inLemma 3.3, and fix θ ∈ [ θ , c a ] , where c is a sufficiently small constant(we may assume the constant c in Lemma 3.3 is sufficiently small so thatthis interval is non-empty). We have (3.38) P (cid:16) E ( θ , ρ ) c ∧ E ( θ, ρ ) c ∧ E ( θ + βa , ρ ′ (cid:17)(cid:17) = O γ,a ,κ ( e − n ) for some ρ ′ ≫ γ,a ,κ n − O ( γ ) ρ , where we set (3.39) β = c min (cid:18) , γ − / (cid:19) for a sufficiently small constant c > . Proof.
Let ρ ′ > ρ ′′ be as in (3.8).We denote θ ′ = θ + βa . Intersecting both sides of (3.10) with E ( θ , ρ ) c , wehave E ( θ , ρ ) c ∧ E ( θ, ρ ) c ∧ E ( θ ′ , ρ ′ ) ⊂ _ J ∈ ( [ m ] θ ′ m ) _ I ∈ ( [ n ](1 − β )2 ⌊ θm ⌋ ) G I,J ( ρ ′′ ) ∧ n ∃ u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ , ρ ) : k M u k ≤ ρ ′′ K √ n o (3.40)where we have assumed ρ ′ is small enough that 4 ρ ′′ < ρ .Fix J ⊂ [ m ] and I ⊂ [ n ] of size ⌊ θ ′ m ⌋ , ⌊ (1 − β ) ⌊ θm ⌋⌋ , respectively,and condition on F I, [ n ] to fix Σ I,J ( ρ ′′ ). Fix an arbitrary u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ , ρ ). From Lemma 3.11 we have(3.41) P [ n ] \ I (cid:16) k M u k ≤ ρ ′′ K √ n (cid:17) = O κ ρ ′′ K log / ( √ nρ ) a ρ θ / a n NVERTIBILITY OF STRUCTURED RANDOM MATRICES provided(3.42) ρ ′′ ≥ ca ρ K √ n for some small constant c > n/ ≤ m ≤ n ).Applying the union bound over the choices of u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ , ρ ),on the event G I,J ( ρ ′′ ) we have P (cid:16) ∃ u ∈ Σ I,J ( ρ ′′ ) \ Comp( θ , ρ ) : k M u k ≤ ρ ′′ K √ n (cid:17) ≤ O (cid:18) ρ ′′ (cid:19) | J |−| I | ) O κ ρ ′′ K log / ( √ nρ ) a ρ θ / a n = O (cid:18) ρ ′′ (cid:19) | J |−| I | ) O κ,a (cid:0) ρ ′′ K log( K √ n ) (cid:1) a n where in the second line we substituted the expressions for ρ , θ fromLemma 3.3. Denoting ε = ρ ′′ K , the above bound rearranges to(3.43) O κ,a (log n ) n n O ( γ ) n O ( γ − / | J |−| I | ) ε a n − | J |−| I | ) . We can bound | J | − | I | = θm + βa m − (1 − β ) θm + O (1) ≤ βa m + 2 βθm + O (1)= O ( βa m ) + O (1)where we used our assumption that θ ≤ c a . In particular, | J | − | I | ≤ a n + O (1) if the constant c in (3.39) is sufficiently small, and (3.43) isbounded by(3.44) O κ,a (log n ) n n O ( γ ) n O ( γ − / βa m ε a n − O (1) . Applying the union bound over the choices of
I, J in (3.40), which incurs aharmless factor of 2 m + n = O (1) n , and substituting the expression (3.39) for β we have(3.45) P (cid:16) E ( θ , ρ ) c ∧ E ( θ, ρ ) c ∧ E ( θ + βa , ρ ′ (cid:17)(cid:17) = O κ,a (log n ) n n O ( γ ) ε − O (1) ( n O ( c ) ε / ) a n . It only remains to check that we can take ε sufficiently small to obtain (3.38).From (3.42) we are constrained to take ε = ρ ′′ K ≥ ca ρ K √ n = c ′ a √ n N. COOK for some constant c ′ ∈ (0 ,
1) sufficiently small. Taking ε = a / √ n and c sufficiently small we have(3.46) P (cid:16) E ( θ , ρ ) c ∧ E ( θ, ρ ) c ∧ E ( θ + βa , ρ ′ (cid:17)(cid:17) ≤ O κ,a (1) n n O ( γ ) n − . a n which yields (3.38) as desired. With this choice of ε , ρ ′ ≫ ρ ′′ βρθ ≥ ρ ′′ βρθ ≫ κ,a ,γ ρn − γ +1 / − o (1) as desired (recall that θ ≫ κ a / log( K/a ) ≫ γ,a ,κ / log n ).Now we conclude the proof of Proposition 3.2. Since the event B ( K ) ismonotone under increasing K , by perturbing γ and assuming n is sufficientlylarge we may take K = n γ − / with γ > /
2. Let ρ , θ be as in Lemma 3.3,and for l ≥ θ l = θ + lβa with β = β ( γ ) as in (3.39). By Lemma3.13 we can inductively define a sequence ρ l such that for each l ≥ θ l ≤ c a , ρ l ≫ γ,a ,κ n − O ( γ ) ρ l − and P ( E ( θ , ρ ) c ∧ E ( θ l − , ρ l − ) c ∧ E ( θ l , ρ l )) = O γ,a ,κ ( e − n ) . Applying the union bound, for some k = O ( γ ) we have P (cid:0) E ( c a , ρ ) (cid:1) ≤ P ( E ( θ , ρ )) + k X l =1 P ( E ( θ , ρ ) c ∧ E ( θ l − , ρ l − ) c ∧ E ( θ l , ρ l )) ≤ e − c κ a n + O γ,a ,κ ( e − n )= O γ,a ,κ ( e − c κ a n )and ρ ≫ γ,a ,κ n − O ( γ ) . This concludes the proof of Proposition 3.2.
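The dyadic pigeonholing that drives Lemma 3.11 (and reappears in Section 4) is simple to sketch in code: split the coordinates of a unit vector into level sets L_k = {j : 2^{−(k+1)} < |v_j| ≤ 2^{−k}} and keep the most populous one among the coordinates of size at least ρ/√m, as in (3.33). A minimal sketch (function and variable names are ours, not the paper's):

```python
import numpy as np

def dyadic_pigeonhole(v, rho):
    """Return the most populous dyadic level set
    L_k = {j : 2^{-(k+1)} < |v_j| <= 2^{-k}}
    among the "large" coordinates |v_j| >= rho/sqrt(m).
    Assumes v is a unit vector, so every level index is >= 0."""
    m = len(v)
    large = np.flatnonzero(np.abs(v) >= rho / np.sqrt(m))
    # at most ell + 1 dyadic levels cover the range [rho/sqrt(m), 1]
    ell = int(np.ceil(np.log2(np.sqrt(m) / rho)))
    levels = np.floor(-np.log2(np.abs(v[large]))).astype(int)
    k_star = np.bincount(levels).argmax()   # pigeonhole: pick the biggest bin
    L_star = large[levels == k_star]
    return L_star, ell

rng = np.random.default_rng(0)
v = rng.standard_normal(1000)
v /= np.linalg.norm(v)                      # unit vector
L_star, ell = dyadic_pigeonhole(v, rho=0.1)
```

By pigeonhole, |L_star| is at least a 1/(ℓ+1) fraction of the large coordinates, mirroring (3.33), and |v_j| varies by a factor of at most 2 on L_star.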
4. Invertibility from connectivity: Incompressible vectors.
In this section we conclude the proofs of Theorems 1.12 and 1.24 by bounding the event that ‖Mu‖ is small for some incompressible vector u (recall the terminology from Section 2.1). We follow the (by now standard) approach of reducing to the event that a fixed row R_i of M lies close to the span of the remaining rows, an idea which goes back to the work of Komlós on the singularity probability for Bernoulli matrices [14, 15, 16]. This can in turn be controlled by the event that a random walk R_i · v concentrates near a particular point, where v is a fixed unit vector in the orthocomplement of the remaining rows. Independence of the rows allows us to condition on v, and our results from the previous section allow us to argue that v is incompressible.

For the case that the entries of R_i have variances uniformly bounded below, we could then complete the proof by applying the anti-concentration estimate of Lemma 2.7. In the present setting, however, a proportion 1 − δ of the entries of R_i may have zero variance. For the case of a broadly connected profile we follow the argument of Rudelson and Zeitouni [30] and use Proposition 3.1 to show v has essential support of size (1 − δ/2)n, and hence has non-trivial overlap with the support of R_i.

For the case of a super-regular profile, Proposition 3.2 only gives that v has essential support of size ≫ δσ₀²n. In Lemma 4.1 we make use of a double counting argument to show that if we choose the row R_i at random, on average it will have good overlap with the corresponding normal vector v^(i) (which also depends on i). Here is where we make crucial use of the super-regularity hypothesis on A. Lemma 4.1 is a natural extension of a double counting argument used by Komlós in his work on the singularity probability for Bernoulli matrices, and which was applied to bound the smallest singular value of iid matrices by Rudelson and Vershynin in [28].
We were also inspired by a similar refinement of the double counting argument from the recent paper [19] on the singularity probability for adjacency matrices of random regular digraphs.

4.1. Proof of Theorem 1.12.
By Lemma 2.5 and multiplying X and B by a phase (which does not affect our hypotheses) we may assume that ξ has O ( κ )-controlled second moment. Fix K ≥
1, and let ρ = ρ ( κ, σ , δ, ν, K )be as in Proposition 3.1. We may assume n is sufficiently large dependingon κ, σ , δ, ν, K . For the remainder of the proof we restrict to the event B ( K ) = {k M k ≤ K √ n } .For j ∈ [ n ] let M ( i ) denote the n − × n matrix obtained by removingthe i th row from M . Define the good event(4.1) G = n ∀ i ∈ [ n ] , ∀ u ∈ Comp(1 − δ/ , ρ ) , k u ∗ M k , k M ( i ) u k > ρK √ n o . Applying Proposition 3.1 to M ∗ and M ( i ) for each i ∈ [ n ] (using our restric-tion to B ( K )) and the union bound we have(4.2) P ( G ) = 1 − O κ,σ ,δ,ν,K ( ne − c κ δσ n ) = 1 − O σ ,δ,ν,K ( e − c κ δσ n )adjusting c κ slightly. Let t ≤
1, and define the event

(4.3)  E(t) = G ∧ { ∃ u ∈ Incomp(1/10, ρ) : ‖u*M‖ ≤ t/√n }.
For n sufficiently large (larger than 1/ρK) it suffices to show

(4.4)  P(E(t)) ≪_{κ,σ₀,δ,ν,K} t + n^{−1/2}.

Recalling that R_i denotes the i-th row of M, we denote

(4.5)  R_{−i} = span(R_j : j ∈ [n] \ {i})

and let

(4.6)  E_i(t) = G ∧ { dist(R_i, R_{−i}) ≤ t/ρ }.

We now use a double counting argument of Rudelson and Vershynin from [28] to control E(t) in terms of the events E_i(t). Suppose that E(t) holds, and let u ∈ Incomp(1/10, ρ) be such that ‖u*M‖ ≤ t/√n. Then we must have |u_i| ≥ ρ/√n for at least n/10 elements i ∈ [n]. For each such i we have

t/√n ≥ ‖u*M‖ = ‖ Σ_{j=1}^n u_j R_j ‖ ≥ ‖ P_{R_{−i}^⊥} Σ_{j=1}^n u_j R_j ‖ = |u_i| ‖P_{R_{−i}^⊥} R_i‖ ≥ (ρ/√n) dist(R_i, R_{−i}),

where we denote by P_W the orthogonal projection onto a subspace W. Thus, on E(t) we have that E_i(t) holds for at least n/10 values of i ∈ [n], so by double counting,

(4.7)  P(E(t)) ≤ (10/n) Σ_{i=1}^n P(E_i(t)).

Now it suffices to show that for arbitrary fixed i ∈ [n],

(4.8)  P(E_i(t)) ≪_{κ,σ₀,δ,ν,K} t + n^{−1/2}.

Fix i ∈ [n] and condition on {R_j : j ∈ [n] \ {i}}. Draw a unit vector u ∈ R_{−i}^⊥ independent of R_i, according to Haar measure (say). Since |R_i · u| ≤ dist(R_i, R_{−i}), it suffices to show

(4.9)  P(|R_i · u| ≤ t/ρ) ≪_{κ,σ₀,δ,ν,K} t + n^{−1/2}.

Since u ∈ ker(M^(i)), on G we have that u ∈ Incomp(1 − δ/2, ρ). By Lemma 2.1 there exists L ⊂ [n] of size |L| ≥ (1 − δ/2)n such that ρ/√n ≤ |u_j| ≤ 2/√(δn) for all j ∈ L. By assumption we have |N_{A(σ₀)}(i)| = |{j ∈ [n] : a_ij ≥ σ₀}| ≥ δn, so letting J = N_{A(σ₀)}(i) ∩ L we have |J| ≥ δn/
4. Denoting v = (u^i)_J = (a_ij u_j)_{j∈J}, we have

‖v‖² = Σ_{j∈J} a_ij² |u_j|² ≥ |J| σ₀² ρ²/n ≥ δσ₀²ρ²/4   and   ‖v‖_∞ ≤ ‖u_J‖_∞ ≤ 2/√(δn)

(recall that a_ij ≤ 1 for all i, j ∈ [n]). Conditioning on u and {ξ_ij}_{j∉J}, we apply Lemma 2.7 to conclude

P(|R_i · u| ≤ t/ρ) ≪_κ (1/‖v‖) ( t/ρ + ‖v‖_∞ ) ≪ (1/(ρσ₀δ^{1/2})) ( t/ρ + 1/√(δn) )

which gives (4.9) as desired.

4.2. Proof of Theorem 1.24.
By Lemma 2.5 and multiplying X and B by a phase (which does not affect our hypotheses) we may assume that ξ has κ = O ( κ )-controlled second moment. Fix γ ≥ / K = O ( n γ − / ).We will show that for all τ ≥ P (cid:18) s n ( M ) ≤ τ √ n , k M k ≤ K √ n (cid:19) ≪ γ,σ ,δ,κ n O ( γ ) τ + r log nn . For the remainder of the proof we restrict to the boundedness event(4.11) B ( K ) = {k M k ≤ K √ n } . By the assumption that A ( σ ) is ( δ, ε )-super-regular we have n X i =1 a ij ≥ δσ n for all j ∈ [ n ]. Let a = δ / σ , and let ρ = ρ ( γ, a , κn ) and c be as inProposition 3.2. In particular,(4.12) ρ ≫ γ,δ,σ n − O ( γ ) . Denoting θ = c δσ , for τ > G ( τ ) = n ∀ u ∈ Comp( θ, ρ ) , k M u k , k u ∗ M k > τ / √ n o . N. COOK
Applying Proposition 3.2 to M and M ∗ , along with the union bound, wehave(4.14) P ( G ( τ )) = 1 − O γ,δ,σ ,κ ( e − c κ δσ n )as long as τ ≤ ρKn .Let 0 < τ ≤ M ( i ) from Section4.1, we define the sets(4.15) S i ( τ ) = (cid:26) u ∈ S n − : k M ( i ) u k ≤ τ √ n (cid:27) . Informally, for small τ this is the set of unit almost-normal vectors to thesubspace R − i spanned by the rows of M ( i ) . In Lemma 4.1 below we reduceour task to bounding the probability that a row R i is nearly orthogonal toa vector u ( i ) ∈ S i ( τ ) that is independent of R i , and also has many largecoordinates in the support of R i . The reduction uses the super-regularityhypothesis together with a careful averaging argument. It turns out thatfor this argument to work it is important to consider almost-normal vectorsrather than normal vectors (as in the proof of Theorem 1.12).Writing N ( i ) = N A ( σ ) ( i ), we define the good overlap events (4.16) O i ( τ ) = (cid:8) ∃ u ∈ S i ( τ ) : |N ( i ) ∩ L + ( u, ρ ) | ≥ δθn (cid:9) where(4.17) L + ( u ) = { j ∈ [ n ] : | u j | ≥ ρ/ √ n } . On O i ( τ ) we fix a vector u ( i ) = u ( i ) ( M ( i ) , τ ) ∈ S i ( τ ), chosen measurablywith respect to M ( i ) , satisfying |N ( i ) ∩ L + ( u, ρ ) | ≥ δθn . Lemma . Recall the parameter ε from oursuper-regularity hypothesis (cf. Definition 1.23), and assume ε ≤ θ/ . Then (4.18) P (cid:18) G ( τ ) ∧ n s n ( M ) ≤ τ √ n o(cid:19) ≤ θn n X i =1 P (cid:18) O i ( τ ) ∧ (cid:26) | R i · u ( i ) | ≤ τρ (cid:27)(cid:19) . Proof.
Suppose G ( τ ) ∧ { s n ( M ) ≤ τ / √ n } holds. Then there exist u, v ∈ S n − such that k M u k , k M ∗ v k ≤ τ / √ n . By our restriction to G ( τ ) we musthave u, v ∈ Incomp( θ, ρ ). With notation as in (4.17) we have | L + ( u ) | , | L + ( v ) | ≥ θn . In particular, | L + ( u ) | ≥ εn , so(4.19) |N ( i ) ∩ L + ( u ) | ≥ δ | L + ( u ) | ≥ δθn NVERTIBILITY OF STRUCTURED RANDOM MATRICES for at least (1 − ε ) n elements i ∈ [ n ]. Indeed, otherwise we would have e A ( σ ) ( I, L + ( u )) = X i ∈ I |N ( i ) ∩ L + ( u ) | < δ | I || L + ( u ) | for some I ⊂ [ n ] with | I | > εn , which contradicts our assumption that A ( σ )is ( δ, ε )-super-regular. Since k M ( i ) u k ≤ k M u k ≤ τ √ n for all i ∈ [ n ], we havethat u ∈ S i ( τ ) for all i ∈ [ n ]. Thus,(4.20) (cid:12)(cid:12)(cid:8) i ∈ L + ( v ) : O i ( τ ) holds (cid:9)(cid:12)(cid:12) ≥ θn − εn ≥ θn/ . Fix i ∈ L + ( v ) such that O i ( τ ) holds. We have τ √ n ≥ k v ∗ M k ≥ | v ∗ M u ( i ) | ≥ | v i || R i · u ( i ) | − (cid:12)(cid:12)(cid:12)(cid:12) X j = i v j R j · u ( i ) (cid:12)(cid:12)(cid:12)(cid:12) . The first term on the right hand side is bounded below by ρ √ n | R i · u ( i ) | since i ∈ L + ( v ). By Cauchy–Schwarz the second term is bounded above by k M ( i ) u ( i ) k ≤ τ / √ n , since u ( i ) ∈ S i ( τ ). Rearranging we conclude | R i · u ( i ) | ≤ τ /ρ for all i ∈ L + ( v ) such that O i ( τ ) holds. Letting E i ( t ) = {| R i · u ( i ) | ≤ t } , we have shown that on the event G ( τ ) ∧ { s n ( M ) ≤ τ / √ n } , the event O i ( τ ) ∧ E i (2 τ /ρ ) holds for at least θn/ i ∈ [ n ] (from (4.20)). Itfollows that n X i =1 ( O i ( τ ) ∧ E i (2 τ /ρ )) ≥ θn ( G ( τ ) ∧ { s n ( M ) ≤ τ / √ n } ) . Taking expectations on each side and rearranging yields the claim.Fix i ∈ [ n ] arbitrarily, and suppose that O i ( τ ) holds. We condition on therows { R j } j ∈ [ n ] \{ i } to fix u ( i ) . 
We begin by finding a large set on which u ( i ) is flat, following a similar dyadic pigeonholing argument as in the proof ofLemma 3.11. Letting L k = { j ∈ [ n ] : 2 − ( k +1) < | u ( i ) j | ≤ − k , since δθn ≤ |N ( i ) ∩ L + ( u ( i ) ) | ≤ (cid:12)(cid:12)(cid:12)(cid:12) ℓ [ k =0 N ( i ) ∩ L k (cid:12)(cid:12)(cid:12)(cid:12) for some ℓ ≪ log( √ n/ρ ), by the pigeonhole principle there exists k ∗ ≤ ℓ such that J := N ( i ) ∩ L k ∗ satisfies(4.21) | J | ≥ δθn/ℓ ≫ δθn log( √ n/ρ ) . N. COOK
Let us denote v = ( a ij u ( i ) j j ∈ J ) j . Since a ij ≥ σ for j ∈ N ( i ) and | u ( i ) j | ≫ ρ/ √ n for j ∈ L k ∗ ,(4.22) k v k ≥ σ k ( u ( i ) ) J k ≫ σ ρ ( | J | /n ) / and since u ( i ) varies by at most a factor of 2 on J ,(4.23) k v k ∞ ≤ k u ( i ) J k ∞ ≤ k u ( i ) k / | J | / . By further conditioning on the variables { ξ ij } j / ∈ J and applying Lemma 2.7along with the estimates (4.22), (4.23) we have P (cid:16) | R i · u ( i ) | ≤ τ /ρ (cid:17) ≪ κ τ /ρ + k v k ∞ k v k≪ σ (cid:18) τ /ρρ ( | J | /n ) / + 1 | J | / (cid:19) = 1 σ (cid:18) n | J | (cid:19) / (cid:18) τρ + 1 √ n (cid:19) . Inserting the bound (4.21) and undoing all of the conditioning, we haveshown P (cid:18) O i ( τ ) ∧ (cid:26) | R i · u ( i ) | ≤ τρ (cid:27)(cid:19) ≪ κ σ √ δθ (cid:18) τρ + 1 √ n (cid:19) log / ( √ n/ρ ) . Since the right hand side is uniform in i , applying Lemma 4.1 (taking c = c /
2) and substituting the expression for θ we have(4.24) P (cid:18) G ( τ ) ∧ n s n ( M ) ≤ τ √ n o(cid:19) ≪ κ σ δ (cid:18) τρ + 1 √ n (cid:19) log / ( √ n/ρ )for all τ ≥ τ ≤ ρ , in whichcase our constraint τ ≤ ρKn from (4.14) holds). The bound (4.10) nowfollows by substituting the lower bound (4.12) on ρ and the bound (4.14) on G ( τ ) c (which is dominated by the O ( n − / log / n ) term). This concludesthe proof of Theorem 1.24.
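As an informal numerical sanity check of the kind of statement proved in this section (a polynomial lower bound on s_n(M) even when most entries of each row have zero variance), one can simulate a toy profile. The circulant band below is our own illustrative choice, not an example from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 200, 0.3
# 0/1 profile supported on a circulant band: each row and column has
# delta*n entries of unit variance, and a 1 - delta fraction of exact zeros
A = np.zeros((n, n))
w = int(delta * n)
for i in range(n):
    A[i, (i + np.arange(w)) % n] = 1.0
M = A * rng.standard_normal((n, n))       # M = A ∘ X with Gaussian X
s = np.linalg.svd(M, compute_uv=False)    # singular values, descending
s_min, s_max = s[-1], s[0]
# the results of this section predict s_min is only polynomially small in n
# with high probability, despite 70% of the entries vanishing identically
```

A single sample is of course only a sanity check, not a verification of the theorem; over repeated draws s_min stays far above any exponentially small scale.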
5. Invertibility under diagonal perturbation: Proof of main theorem.

In this final section we prove Theorem 1.18. See Section 1.5 for a high-level discussion of the main ideas. In Sections 5.1 and 5.2 we collect the main tools of the proof: the regularity lemma, the Schur complement bound, and bounds on the operator norm of random matrices. In Section 5.3 we apply the regularity lemma to decompose the standard deviation profile A into a bounded number of submatrices enjoying various properties. In Section 5.4 we apply the decomposition to prove Theorem 1.18 conditional on two technical lemmas, and in the final sections we prove these lemmas.

5.1. Preliminary tools.
We begin by stating a version of the regularity lemma suitable for our purposes. Recall that in Theorem 1.12 we associated the standard deviation profile A with a bipartite graph. Here it will be more convenient to associate A with a directed graph. That is, to a non-negative square matrix A = (a_ij)_{1≤i,j≤n} we associate a directed graph Γ_A on vertex set [n] having an edge i → j when a_ij > 0 (we allow A to have self-loops, though the diagonal of A will have a negligible effect on our arguments). The notation (1.16)–(1.17) extends to this setting. Additionally, we denote the density of the pair (I, J) by

ρ_A(I, J) := e_A(I, J) / (|I| |J|).

Definition 5.1. Let A be an n × n matrix with non-negative entries. For ε > 0, we say that a pair of vertex subsets I, J ⊂ [n] is ε-regular for A if for every I′ ⊂ I, J′ ⊂ J satisfying |I′| > ε|I|, |J′| > ε|J| we have

|ρ_A(I′, J′) − ρ_A(I, J)| < ε.

The following is a version of the regularity lemma for directed graphs which follows quickly from a stronger result of Alon and Shapira [2, Lemma 3.1]. Note that [2, Lemma 3.1] is stated for directed graphs without loops, which in the present setting means that it only applies to matrices A with diagonal entries equal to zero. However, Lemma 5.2 follows from applying [2, Lemma 3.1] to the matrix A′ formed by setting the diagonal entries of A to zero, and noting that the diagonal has a negligible impact on the edge densities ρ_A(I, J) when |I|, |J| ≫_ε n.

Lemma 5.2. Let ε > 0. There exists m ∈ N with ε^{−1} ≤ m ≪_ε 1 such that for all n sufficiently large depending on ε, for every n × n non-negative matrix A there is a partition of [n] into m + 1 sets I₀, I₁, . . . , I_m with the following properties:
(1) |I₀| < εn;
(2) |I₁| = |I₂| = · · · = |I_m|;
(3) all but at most εm² of the pairs (I_k, I_l) are ε-regular for A.

Remark 5.3. The dependence on ε of the bound m ≤ O_ε(1) is very bad: a tower of exponentials of height O(ε^{−C}). Indeed, as in Szemerédi's
16 in general [12]. As remarked in [2], hisargument carries over to give a similar result for directed graphs.We will apply this in Section 5.3 to partition the standard deviation profileinto a bounded number of manageable submatrices. The following elemen-tary fact from linear algebra will be used to lift the invertibility propertiesobtained for these submatrices back to the whole matrix.
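The ε-regularity condition defined above quantifies over all large subpairs, so exhaustive verification is exponential in the set sizes; it is easy, however, to probe by random sampling. A randomized sketch (function names are ours; passing the check is only evidence of regularity, not a certificate):

```python
import numpy as np

def density(A, I, J):
    """Edge density rho_A(I, J) = e_A(I, J) / (|I| |J|) for a profile A."""
    return A[np.ix_(I, J)].sum() / (len(I) * len(J))

def sampled_regularity_check(A, I, J, eps, trials=200, seed=0):
    """Sample subsets I' of I, J' of J with |I'| > eps|I|, |J'| > eps|J|
    and test |rho(I', J') - rho(I, J)| < eps on each sample."""
    rng = np.random.default_rng(seed)
    rho = density(A, I, J)
    kI, kJ = int(eps * len(I)) + 1, int(eps * len(J)) + 1
    for _ in range(trials):
        Ip = rng.choice(I, size=rng.integers(kI, len(I) + 1), replace=False)
        Jp = rng.choice(J, size=rng.integers(kJ, len(J) + 1), replace=False)
        if abs(density(A, Ip, Jp) - rho) >= eps:
            return False
    return True

rng = np.random.default_rng(1)
A = (rng.random((200, 200)) < 0.5).astype(float)   # dense iid 0/1 profile
I, J = np.arange(100), np.arange(100, 200)
```

A dense iid profile passes comfortably for moderate ε, in line with the intuition that regular pairs "look random"; a profile hiding an unusually dense subpair would fail once the sampler hits it.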
Lemma 5.4. Let M ∈ M_{N+n}(C), which we write in block form as

M = ( A  B
      C  D )

for A ∈ M_N(C), B ∈ M_{N,n}(C), C ∈ M_{n,N}(C), D ∈ M_n(C). Assume that D is invertible. Then

(5.1)  s_{N+n}(M) ≥ (1 + ‖B‖/s_n(D))^{−1} (1 + ‖C‖/s_n(D))^{−1} min( s_n(D), s_N(A − BD^{−1}C) ).

Proof. From the identity

( A  B          ( I_N  BD^{−1}     ( A − BD^{−1}C  0      ( I_N      0
  C  D )    =     0    I_n     )  ·   0             D )  ·   D^{−1}C  I_n )

we have

( A  B  )^{−1}     ( I_N       0       ( (A − BD^{−1}C)^{−1}  0          ( I_N  −BD^{−1}
  C  D         =    −D^{−1}C  I_n  )  ·   0                    D^{−1} )  ·   0    I_n     ).

We can use the triangle inequality to bound the operator norms of the first and third matrices on the right-hand side by 1 + ‖D^{−1}C‖ and 1 + ‖BD^{−1}‖, respectively. Now by sub-multiplicativity of the operator norm,

‖M^{−1}‖ ≤ (1 + ‖BD^{−1}‖)(1 + ‖D^{−1}C‖) max( ‖(A − BD^{−1}C)^{−1}‖, ‖D^{−1}‖ )
         ≤ (1 + ‖B‖/s_n(D))(1 + ‖C‖/s_n(D)) max( ‖(A − BD^{−1}C)^{−1}‖, ‖D^{−1}‖ ).

The bound (5.1) follows after taking reciprocals.
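Since Lemma 5.4 is a deterministic linear-algebra fact, it can be checked directly on random instances; a minimal numerical verification of (5.1):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 30, 20
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, n))
C = rng.standard_normal((n, N))
D = rng.standard_normal((n, n))     # almost surely invertible
M = np.block([[A, B], [C, D]])

smin = lambda Z: np.linalg.svd(Z, compute_uv=False)[-1]
s_D = smin(D)
schur = A - B @ np.linalg.solve(D, C)          # A - B D^{-1} C
lower = min(s_D, smin(schur)) / ((1 + np.linalg.norm(B, 2) / s_D)
                                 * (1 + np.linalg.norm(C, 2) / s_D))
# (5.1) asserts s_{N+n}(M) >= lower
```

For generic Gaussian blocks the bound is far from sharp, reflecting the losses from the triangle inequality in the proof, but it never fails.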
5.2. Control on the operator norm.
The following lemma summarizes thecontrol we will need on the operator norm of submatrices and products ofsubmatrices of M . Lemma . Let ξ ∈ C be a centeredrandom variable with E | ξ | η ≤ for some η ∈ (0 , . Let θ ∈ (0 , . Thenthe following hold for all n ≥ :(a) (Control for sparse matrices) If A ∈ M n ([0 , is a fixed matrix and X = ( ξ ij ) is an n × n matrix of iid copies of ξ , then (5.2) k A ◦ X k ≪ τ √ n except with probability O τ ( n − η/ ) , where τ = τ ( A ) ∈ [0 , is any numbersuch that (5.3) n X k =1 a ik , n X k =1 a kj ≤ τ n for all i, j ∈ [ n ] , and (5.4) n X i,j =1 a ij ≤ τ n . (b) (Control for matrix products) Let m ∈ [ θn, n ] . If A ∈ M n,m ([0 , and D ∈ M m,n ( C ) are fixed matrices with k D k ≤ , and X = ( ξ ij ) is an n × m matrix of iid copies of ξ , then (5.5) k D ( A ◦ X ) k ≪ η √ m except with probability O θ ( n − η/ ) . Remark . The probability bounds in the above lemma can be im-proved under higher moment assumptions on ξ , and improve to exponentialbounds under the assumption that ξ is sub-Gaussian (see (1.1)).We will use standard truncation arguments to deduce Lemma 5.5 fromthe following bounds on the expected operator norm of random matricesdue to Lata la and Vershynin. Theorem . Let n, m be sufficiently large and let Y bean n × m random matrix with independent, centered entries Y ij ∈ R havingfinite fourth moment. Then (5.6) E k Y k ≪ max i ∈ [ n ] m X j =1 E Y ij ! / + max j ∈ [ m ] n X i =1 E Y ij ! / + n X i =1 m X j =1 E Y ij ! / . N. COOK
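Theorem 5.7 can be probed numerically. For an n × m matrix of iid standard Gaussians the three terms in (5.6) are explicit (E Y_ij² = 1, E Y_ij⁴ = 3), and the empirical operator norm, about 2√n for square matrices, already sits below their sum even with the implied constant taken to be 1. A quick Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, trials = 100, 100, 20
norms = [np.linalg.norm(rng.standard_normal((n, m)), 2) for _ in range(trials)]
emp = float(np.mean(norms))            # empirical E||Y||, about 2*sqrt(n)

row_term = np.sqrt(m)                  # max_i (sum_j E Y_ij^2)^{1/2}
col_term = np.sqrt(n)                  # max_j (sum_i E Y_ij^2)^{1/2}
fourth_term = (3.0 * n * m) ** 0.25    # (sum_{ij} E Y_ij^4)^{1/4}
bound = row_term + col_term + fourth_term
```

The fourth-moment term is what lets the bound handle heavy rows or columns; for homogeneous Gaussian entries it is lower order than the first two terms.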
Theorem 5.8. Let η ∈ (0, 1] and let n, m, N be sufficiently large natural numbers. Let D ∈ M_{m,N}(R) be a deterministic matrix satisfying ‖D‖ ≤ 1 and let Y ∈ M_{N,n}(R) be a random matrix with independent centered entries Y_ij satisfying E|Y_ij|^{4+η} ≤ 1. Then

(5.7)  E‖DY‖ ≪_η √n + √m.
We begin with (a). By splitting X into real andimaginary parts and applying the triangle inequality we may assume ξ is areal-valued random variable. Set η = min(1 / , η/
32) and define the productevent(5.8) E = n ^ i,j =1 E ij ; E ij = (cid:8) | ξ ij | ≤ n / − η (cid:9) . By Markov’s inequality,(5.9) P ( E cij ) ≤ n − (4+ η )(1 / − η ) ≤ n − for all i, j ∈ [ n ]. By the union bound,(5.10) P ( E c ) ≤ n n − (4+ η )(1 / − η ) ≤ n − η/ . We denote X ′ = ( ξ ′ ij ) = ( ξ ij − E ξ ij E ij ) = X − E ( X E ) . First we show(5.11) k A ◦ E ( X E ) k ≪ τ √ n. Since the variables ξ ij are centered, | E ( ξ ij E ij ) | = | E ( ξ ij E cij ) | . By twoapplications of H¨older’s inequality and (5.9), | E ( ξ ij E cij ) | ≤ ( E | ξ ij | ) / P ( E cij ) / ≤ n − / . Thus,(5.12) k A ◦ E ( X E ) k ≤ k A ◦ E ( X E ) k HS ≤ n − / k A k HS ≤ τ n / which yields (5.11) with room to spare.Now from (5.10), (5.11) and the triangle inequality it is enough to show(5.13) P (cid:0) E ∧ (cid:8) k A ◦ X ′ k ≥ Cτ √ n (cid:9)(cid:1) = O τ ( n − η/ ) NVERTIBILITY OF STRUCTURED RANDOM MATRICES for a sufficiently large constant C > ξ ′ ij E ij are centered and satisfy E | ξ ′ ij E ij | = O (1). It follows from Theorem 5.7 that E E k A ◦ X ′ k ≪ max i ∈ [ n ] n X j =1 a ij ! / + max j ∈ [ n ] n X i =1 a ij ! / + n X i,j =1 a ij ! / ≪ τ √ n. Thus, (5.13) will follow if we can show(5.14) P (cid:0) k A ◦ X ′ k E − E k A ◦ X ′ k E ≥ τ √ n (cid:1) = O τ ( n − η/ ) . This in turn follows in a routine manner from Talagrand’s inequality [34,Theorem 6.6] (see also [3, Corollary 4.4.11]): Observe that X
7→ k A ◦ X k isa convex and 1-Lipschitz function on the space M n ( R ) equipped with the(Euclidean) Hilbert–Schmidt metric. Since the matrix X ′ E has centeredentries that are bounded by O ( n / − η ), Talagrand’s inequality gives thatthe left hand side of (5.14) is bounded by(5.15) O (cid:0) exp( − cτ n/ ( n / − η ) ) (cid:1) = O (cid:0) exp( − cτ n η ) (cid:1) which gives (5.14) with plenty of room.Now we turn to part (b). The proof follows a very similar truncation argu-ment to the one in part (a), so we only indicate the necessary modifications.As before, by splitting D and X into real and imaginary parts and applyingthe triangle inequality we may assume D and X are real matrices. We define E as in (5.8), with E ij = (cid:8) | ξ ij | ≤ ( n √ m ) / − η (cid:9) and(5.16) η = 14 η η . With this choice of η , Markov’s inequality and the union bound give P ( E c ) = O θ ( n − η/ ). Taking X ′ = X − E ( X E ) as before, we can bound k D ( A ◦ E ( X E )) k ≤ k A ◦ E ( X E ) k by submultiplicativity of the operator norm,and the same argument as before gives(5.17) k A ◦ E ( X E ) k ≤ nm ( n √ m ) − (4+ η )(1 / − η ) = m / − η/ = o ( √ m ) . Since X ′ E has centered entries with finite moments of order 4 + η , byTheorem 5.8 we have(5.18) E k D ( A ◦ X ′ E ) k ≪ η √ m. N. COOK
The mapping X ↦ ‖D(A ∘ X)‖ is convex and 1-Lipschitz with respect to the Hilbert–Schmidt metric on M_n(R) (since ‖D‖ ≤
1) so using Talagrand’sinequality as in part (a) we find that P (cid:0) k D ( A ◦ X ′ E ) k − E k D ( A ◦ X ′ E ) k ≥ √ m (cid:1) ≪ exp (cid:16) − cm/ ( n √ m ) / − η (cid:17) ≤ exp (cid:0) − c ′ ( θ ) n cη (cid:1) for some constant c > c ′ ( θ ) > θ .As the last line is bounded by O θ ( n − η/ ), the result follows from the above,(5.17), (5.18) and the triangle inequality by the same argument as for part(a).5.3. Decomposition of the standard deviation profile.
We now begin theproof of Theorem 1.18, which occupies the remainder of the paper. In thepresent subsection we prove Lemma 5.9 below, which shows that the stan-dard deviation profile A can be partitioned into a bounded collection ofsubmatrices with certain nice properties. For the motivation behind thislemma (and the notation J free , J cyc ) see Section 1.5. Lemma . Let A be an n × n matrix with entries a ij ∈ [0 , . Let ε, δ, σ ∈ (0 , , and assume ε is sufficiently small depending on δ . Thereexists ≤ m ≪ ε , a partition [ n ] = J bad ∪ J free ∪ J cyc = J bad ∪ J free ∪ J ∪ · · · ∪ J m (5.19) and a set F ⊂ [ n ] satisfying the following properties:(1) εn ≪ | J bad | ≪ δ / n .(2) | F | ≪ δn , and for all i ∈ J free , (5.20) |{ j ∈ J free : ( i, j ) ∈ F }| , |{ j ∈ J free : ( j, i ) ∈ F }| ≤ δ / n. (3) If J free = ∅ then there is a permutation τ : J free → J free such that forall ( i, j ) ∈ J free × J free \ F with τ ( i ) ≥ τ ( j ) , a ij < σ .(4) If m ≥ then (5.21) | J | = · · · = | J m | ≫ ε n and there is a permutation π : [ m ] → [ m ] such that for all ≤ k ≤ m , A ( σ ) J k ,J π ( k ) is (2 δ, ε ) -super-regular (see Definition 1.23). NVERTIBILITY OF STRUCTURED RANDOM MATRICES Proof.
We begin by applying Lemma 5.2 to A ( σ ) to obtain m ∈ N with ε − ≤ m = O ε (1) and a partition [ n ] = I ∪ · · · ∪ I m satisfying theproperties in that lemma.The partition I , . . . , I m is almost what we need. In the remainder ofthe proof we perform a “cleaning” procedure (as it is commonly referred toin the extremal combinatorics literature) to obtain a partition J , . . . , J m with improved properties, where J k ⊂ I k for each 1 ≤ k ≤ m , and J ⊃ I collects the leftover elements.We start by forming a reduced digraph R = ([ m ] , E ) on the vertex set[ m ] with directed edge set(5.22) E := n ( k, l ) ∈ [ m ] : ( I k , I l ) is ε -regular and ρ A ( σ ) ( I k , I l ) > δ o . Next we find a (possibly empty) set T ⊂ [ m ] such that the induced subgraph R ( T ) is covered by vertex-disjoint directed cycles, and the induced subgraph R ([ m ] \ T ) is cycle-free. Such a set can be obtained by greedily removingcycles and the associated vertices from R until the remaining graph has nomore directed cycles. By relabeling I , . . . , I m we may take T = [ m ], where m ∈ [0 , m ].Assuming m = 0, the fact that R ([ m ]) is covered by vertex-disjoint cyclesis equivalent to the existence of a permutation π : [ m ] → [ m ] such that( k, π ( k )) ∈ E for all 1 ≤ k ≤ m . Now we will obtain the sets J , . . . , J m obeying the properties in part (4) of the lemma. Let 1 ≤ k ≤ m . We havethat ( I k , I π ( k ) ) is ε -regular with density ρ k := ρ A ( σ ) ( I k , I π ( k ) ) > δ , so if weassume ε ≤ δ then for every I ⊂ I k , J ⊂ I π ( k ) with | I | , | J | ≥ ε | I k | ,(5.23) e A ( σ ) ( I, J ) ≥ ( ρ k − ε ) | I || J | ≥ δ | I || J | . 
It remains to ensure that conditions (1) and (2) from Definition 1.23 alsohold, which we will do by removing a small number of rows and columns.Letting I ′ k = (cid:8) i ∈ I k : |N A ( σ ) ( i ) ∩ I π ( k ) | < δ | I k | (cid:9) we have e A ( σ ) ( I ′ k , I π ( k ) ) < δ | I ′ k || I π ( k ) | , and it follows that | I ′ k | ≤ ε | I k | .Similarly, letting I ′′ k = (cid:8) i ∈ I k : |N A ( σ ) T ( i ) ∩ I π − ( k ) | < δ | I k | (cid:9) we have | I ′′ k | ≤ ε | I k | . Letting I ∗ k ⊂ I k be a set of size ⌊ ε | I k |⌋ containing I ′ k ∪ I ′′ k , we take(5.24) J k = I k \ I ∗ k . N. COOK
With this definition we have | J | = · · · | J m | , and for each 1 ≤ k ≤ m, i ∈ J k ,(5.25) |N A ( σ ) ( i ) ∪ J π ( k ) | , |N A ( σ ) T ( i ) ∩ J π − ( k ) | ≥ (4 δ − ε ) | I k | ≥ δ | J k | . Furthermore, for each 1 ≤ k ≤ m and I ⊂ J k , J ⊂ J π ( k ) with | I | , | J | ≥ ε | J k | ,if we assume ε ≤ / | I | , | J | ≥ ε | I k | , so by (5.23)(5.26) e A ( σ ) ( I, J ) ≥ δ | I || J | . It follows that for every 1 ≤ k ≤ m the submatrix A ( σ ) J k ,J π ( k ) is (2 δ, ε )-super-regular, which concludes the proof of part (4) of the lemma.Now we prove parts (2) and (3). We will obtain J free by removing a smallnumber of bad elements from I free := S m k = m +1 I k . Since the induced subgraph R ([ m + 1 , m ]) is cycle-free we may relabel I m +1 , . . . , I m so that(5.27) ( k, l ) / ∈ E for all m < l ≤ k ≤ m . We take(5.28) F = (cid:8) ( i, j ) ∈ [ n ] : ( i, j ) ∈ I k × I l for some ( k, l ) / ∈ E (cid:9) . The contribution to F from irregular pairs ( I k , I l ) is at most εn by theregularity of the partition I , . . . , I m , and the contribution from pairs ( I k , I l )with density less than 5 δ is at most 5 δn . Hence,(5.29) | F | ≤ εn + 5 δn ≤ δn giving the first estimate in (2) (recall that we assumed ε ≤ δ ). Setting(5.30) I ′ free = n i ∈ I free : max (cid:0) |{ j ∈ [ n ] : ( i, j ) ∈ F }| , |{ j ∈ [ n ] : ( j, i ) ∈ F }| (cid:1) ≥ δ / n o it follows from (5.29) that(5.31) | I ′ free | ≤ δ / n. Let I ∗ free ⊂ I free be any set containing I ′ free of size min( | I free | , ⌊ δ / n ⌋ ) andtake J free = I free \ I ∗ free . The bounds (5.20) now follow immediately from(5.30). For part (3), from (5.27) we may take for τ any ordering of theelements of J free that respects the order of the sets J k := I k \ I ∗ free , i.e. sothat τ ( j ) ≥ τ ( i ) for all i ∈ J k , j ∈ J l and all m < l ≤ k ≤ m .Finally, taking(5.32) J bad = I ∪ I ∗ free ∪ m [ k =1 I ∗ k . 
NVERTIBILITY OF STRUCTURED RANDOM MATRICES we have | J bad | ≤ εn + 12 δ / n + 2 εn ≤ δ / n giving the upper bound in part (1). Now recalling that we took | I ∗ free | =min( | I free | , ⌊ δ / n ⌋ ) and | I ∗ k | = ⌊ ε | I k |⌋ for all 1 ≤ k ≤ m , we also havethe lower bound | J bad | ≥ min (cid:18) | I ∗ free | , (cid:12)(cid:12)(cid:12)(cid:12) m [ k =1 I ∗ k (cid:12)(cid:12)(cid:12)(cid:12)(cid:19) ≥ min (cid:18) ⌊ δ / n ⌋ , | I free | , ε (cid:12)(cid:12)(cid:12)(cid:12) m [ k =1 I k (cid:12)(cid:12)(cid:12)(cid:12) − m (cid:19) = min (cid:18) ⌊ δ / n ⌋ , (cid:12)(cid:12)(cid:12)(cid:12) m [ k = m +1 I k (cid:12)(cid:12)(cid:12)(cid:12) , ε (cid:12)(cid:12)(cid:12)(cid:12) m [ k =1 I k (cid:12)(cid:12)(cid:12)(cid:12) − m (cid:19) ≫ εn where we used that at least one of the sets I free = S m k = m +1 I k , I cyc = S mk =1 I k must be of size at least n/
4, say. This gives the lower bound in part (1) and completes the proof.

5.4. High level proof of Theorem 1.18.
In this subsection we prove Theorem 1.18 conditional on two lemmas (Lemmas 5.10 and 5.11) which give control on the smallest singular values of the submatrices M_{J_free} and (perturbations of) M_{J_cyc}, with J_free, J_cyc as in Lemma 5.9. The proofs of these lemmas are deferred to the remaining subsections.

By our moment assumptions on ξ it follows that ξ is κ-spread for some κ = O(µ_η) (see Remark 1.2). By Lemma 2.5 and multiplying X and B by a phase we may assume ξ has O(µ_η)-controlled second moment. Without loss of generality we may assume η <
1. We introduce parameters σ , δ, ε ∈ (0 ,
1) to be chosen sufficiently small depending on r , η , and µ η ;specifically we will have the following dependencies:(5.33) σ = σ ( r , µ η ) , δ = δ ( r , η, µ η ) , ε = ε ( σ , δ ) . For the remainder of the proof we assume that n is sufficiently large depend-ing on all parameters (which will only depend on r , K , η and µ η ).We begin by summarizing the control we have on the operator normof submatrices of A ◦ X . From Lemma 5.5(a) we have that for any fixed B = ( b ij ) ∈ M n ([0 , I, J ⊂ [ n ] with | I | ≤ | J | ,(5.34) P (cid:16) k ( B ◦ X ) I,J k ≤ τ K p | J | (cid:17) = 1 − O τ ( | J | − η/ ) N. COOK for some K = O ( µ η ), and any τ ≤ τ ≥ | J | / max max i ∈ I X j ∈ J b ij / , max j ∈ J X i ∈ I b ij ! / , n X i,j =1 b ij / , and similarly with | J | replaced by | I | if | J | ≤ | I | . In particular, taking τ = 1and B = A we have k ( A ◦ X ) I,J k ≪ µ η p max( | I | , | J | )with probability 1 − O (max( | I | , | J | ) − η/ ) . (5.36)(We state (5.34) for general B ∈ M n ([0 , A .)We now apply Lemma 5.9 (assuming ε is sufficiently small dependingon δ ) to obtain a partition [ n ] = J bad ∪ J free ∪ J cyc and a set F ⊂ [ n ] satisfying the properties (1)–(4) in the lemma. In the following we abbreviate M free := M J free and M cyc := M J cyc . Lemma . Assume n := | J free | ≥ δ / n . If σ , δ are sufficiently smalldepending on r and µ η , then (5.37) s n ( M free ) ≫ µ η ,r √ n except with probability O µ η ,r ,δ ( n − η/ ) . (Note that while the definition of M free depends on ε , the bounds in theabove lemma are independent of ε .) Lemma . Assume n := | J cyc | ≥ δ / n . Fix γ ≥ and let W ∈M n ( C ) be a deterministic matrix with k W k ≤ n γ . There exists β = β ( γ, σ , δ ) such that if ε = ε ( σ , δ ) is sufficiently small, (5.38) P (cid:16) s n ( M cyc + W ) ≤ n − β (cid:17) ≪ K ,γ,δ,σ ,µ η r log nn . Remark . 
We note that in the proof of Lemma 5.11 we do not make use of the fact that the atom variable ξ has more than two finite moments (the dependence on µ_η is only through the parameter κ = O(µ_η)). In particular, we can remove the extra moment hypotheses in Theorem 1.18 under the additional assumption that the standard deviation profile A contains a generalized diagonal of block submatrices which are super-regular and of dimension linear in n (that is, if we can take J_bad = J_free = ∅ in (5.19)).

We defer the proofs of Lemmas 5.10 and 5.11 to subsequent sections, and conclude the proof of Theorem 1.18. Note that at this stage (before we have applied Lemma 5.10 or 5.11) the only constraint we have put on the parameters in (5.33) is to assume ε is sufficiently small depending on δ for the application of Lemma 5.9. We proceed in the following steps:

Step 1:
Bound the smallest singular value of M_free using Lemma 5.10. In this step we fix σ = σ(r, µ_η), while δ is assumed to be sufficiently small depending on r, µ_η but is otherwise left free.

Step 2: Bound the smallest singular value of

(5.39) M₁ := M_{J_free ∪ J_bad} = [ M_free  B₁ ; C₁  M₀ ],

where M₀ := M_{J_bad} and B₁, C₁ denote the off-diagonal blocks, using the result of Step 1, the Schur complement bound of Lemma 5.4, (5.34) and Lemma 5.5(b). In this step we fix δ = δ(r, η, µ_η).

Step 3: Bound the smallest singular value of

(5.40) M = [ M_cyc  B₂ ; C₂  M₁ ]

using the result of Step 2, the Schur complement bound of Lemma 5.4, and Lemma 5.11. In this step we fix ε = ε(σ, δ).

The case that one of J_free or J_cyc is small (or empty) can be handled essentially by skipping either Step 1 or Step 3. We will begin by assuming

(5.41) |J_free|, |J_cyc| ≥ δ^{1/2} n

and address the case that this does not hold at the end.

Step 1.
By Lemma 5.10 and the assumption (5.41), we can take σ and δ sufficiently small depending on r and µ_η such that

(5.42) s_min(M_free) ≫_{µ_η, r} √n

except with probability O_{µ_η, r, δ}(n^{−η/2}). We now fix σ = σ(r, µ_η) once and for all, but leave δ free to be taken smaller if necessary. By independence of the entries of M we may now condition on a realization of M_free such that (5.42) holds.
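The role played by the Schur complement bound (Lemma 5.4) in Steps 2 and 3 can be illustrated numerically. The following sketch is an illustration of ours, not part of the proof: for a block matrix M = [[D, B], [C, E]] with invertible top-left block D, the determinant identity det M = det D · det(E − C D^{−1} B) shows that M is singular exactly when the Schur complement S = E − C D^{−1} B is, which is why lower bounds on s_min(M) reduce to lower bounds on s_min(D) and s_min(S).

```python
import numpy as np

# Schur complement sanity check: det(M) = det(D) * det(S) for M = [[D, B], [C, E]].
rng = np.random.default_rng(0)
n1, n2 = 5, 3
D = rng.standard_normal((n1, n1)) + 3 * np.eye(n1)  # a well-invertible top-left block
B = rng.standard_normal((n1, n2))
C = rng.standard_normal((n2, n1))
E = rng.standard_normal((n2, n2))
M = np.block([[D, B], [C, E]])
S = E - C @ np.linalg.solve(D, B)                   # Schur complement of D in M
assert np.isclose(np.linalg.det(M), np.linalg.det(D) * np.linalg.det(S))
```

The block names D, B, C, E are generic stand-ins for the blocks M_free, B₁, C₁, M₀ of (5.39).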
Step 2.
By (5.36) and (5.41) we have ‖C₁‖ = O_{µ_η}(√n) except with probability O_δ(n^{−η/2}). We henceforth condition on a realization of C₁ satisfying this bound. Together with (5.42) this gives

(5.43) ‖C₁ M_free^{−1}‖ ≤ ‖C₁‖ / s_min(M_free) ≪_{µ_η, r} 1.

Since B₁ is independent of C₁ and M_free we can apply Lemma 5.5(b) to conclude

(5.44) ‖C₁ M_free^{−1} B₁‖ ≪_{η, µ_η} ‖C₁ M_free^{−1}‖ |J_bad|^{1/2} ≪_{η, µ_η, r} |J_bad|^{1/2}

except with probability O_ε(n^{−η/2}) = O_{δ,ε}(n^{−η/2}), where we have used the lower bound |J_bad| ≫ εn from Lemma 5.9(1). On the other hand, by the triangle inequality and (5.36),

(5.45) s_min(M₀) = s_min( Z_{J_bad} √n + (A ∘ X)_{J_bad} ) ≥ r√n − O_{µ_η}(|J_bad|^{1/2})

except with probability O(|J_bad|^{−η/2}) = O_ε(n^{−η/2}). Again by the triangle inequality and the previous two displays,

(5.46) s_min( M₀ − C₁ M_free^{−1} B₁ ) ≥ r√n − O_{η, µ_η, r}(|J_bad|^{1/2})

except with probability O_{δ,ε}(n^{−η/2}). Since |J_bad| ≪ δ^{1/2} n we can take δ smaller, if necessary, depending on r, η, µ_η to conclude that

(5.47) s_min( M₀ − C₁ M_free^{−1} B₁ ) ≥ (r/2) √n

except with probability O_{δ,ε}(n^{−η/2}). We may henceforth condition on the event that (5.47) holds. Outside an event of probability O_δ(n^{−η/2}) we may also assume ‖B₁‖ = O_{µ_η}(√n). From Lemma 5.4 and the preceding estimates we have

(5.48) s_min(M₁) ≫ ( 1 + O_{µ_η}(√n) / s_min(M_free) )^{−2} min[ s_min(M_free), s_min(M₀ − C₁ M_free^{−1} B₁) ] ≫_{µ_η, r} min[ √n, s_min(M₀ − C₁ M_free^{−1} B₁) ] ≫_{µ_η, r} √n.

At this point we fix δ = δ(r, η, µ_η).

Step 3.
Condition on a realization of M₁ such that (5.48) holds. By (5.36) we may also condition on realizations of the matrices B₂, C₂ in (5.40) such that ‖B₂‖, ‖C₂‖ ≪_{µ_η} √n. Applying Lemma 5.4,

(5.49) s_n(M) ≫ ( 1 + O_{µ_η}(√n) / s_min(M₁) )^{−2} min[ s_min(M₁), s_min(M_cyc − B₂ M₁^{−1} C₂) ] ≫_{µ_η, r} min[ √n, s_min(M_cyc − B₂ M₁^{−1} C₂) ].

By our estimates on ‖B₂‖, ‖C₂‖ and s_min(M₁) we have

(5.50) ‖B₂ M₁^{−1} C₂‖ ≪_{µ_η} n / s_min(M₁) ≪_{µ_η, r} √n

(unlike in Step 2, here we did not need the stronger control on matrix products provided by (5.5)). Now since M_cyc is independent of M₁, B₂, C₂, we can apply Lemma 5.11 with γ = 0.51 (say), fixing ε sufficiently small depending on σ(r, µ_η) and δ(r, η, µ_η), to obtain

(5.51) P( s_min(M_cyc − B₂ M₁^{−1} C₂) ≤ n^{−β} ) ≪_{K, r, η, µ_η} √(log n / n)

for some β = β(r, η, µ_η) > 0. The result now follows from the above and (5.49), taking α = min(η/2, β).

It remains to address the case that (5.41) fails. Since |J_bad| ≪ δ^{1/2} n, if δ is small enough then only one of these bounds can fail. In this case we simply redefine J_bad to include the smaller of J_cyc, J_free. Note that we still have |J_bad| = O(δ^{1/2} n). If |J_cyc| < δ^{1/2} n, then with this new definition of J_bad we have M = M₁, and the desired bound on s_n(M) follows from (5.48) (with plenty of room). If |J_free| < δ^{1/2} n then we skip Step 2, proceeding with Step 3 using M₀ in place of M₁. The bound (5.48) in this case follows from (5.45) and the bound |J_bad| ≪ δ^{1/2} n, taking δ sufficiently small depending on µ_η, r. This concludes the proof of Theorem 1.18.

5.5. Proof of Lemma 5.10.
We denote

(5.52) A_F := ( a_ij 1_{(i,j) ∈ F} ).

By the estimates on F in Lemma 5.9 we can apply (5.34) with τ = O(δ^{1/4}) to obtain

(5.53) ‖( A_F(σ) ∘ X )_{J_free}‖ ≪_{µ_η} δ^{1/4} √n

except with probability at most O_δ(n₁^{−η/2}) = O_δ(n^{−η/2}). By another application of (5.34) with τ = 1,

(5.54) ‖( (A − A(σ)) ∘ X )_{J_free}‖ ≪_{µ_η} σ √n

except with probability at most O_δ(n^{−η/2}). Let

(5.55) M̃_free := ( Ã ∘ X )_{J_free} + Z_{J_free} √n,  Ã := A(σ) − A_F(σ).

By the above estimates and the triangle inequality,

(5.56) s_min(M_free) ≥ s_min(M̃_free) − ‖( (A − Ã) ∘ X )_{J_free}‖ ≥ s_min(M̃_free) − O_{µ_η}(δ^{1/4} + σ) √n

except with probability O_δ(n^{−η/2}). Thus, it suffices to show

(5.57) s_min(M̃_free) ≫_{µ_η, r} √n

except with probability O_{µ_η, r, δ}(n^{−η/2}) – the result will then follow from (5.57) and (5.56) by taking δ, σ sufficiently small depending on µ_η, r. Furthermore, by Lemma 5.9(3) and conjugating M̃_free by a permutation matrix we may assume that Ã is (strictly) upper triangular. Now it suffices to prove the following:

Lemma. Let M = A ∘ X + B be an n × n matrix as in Definition 1.3, and further assume that for some r > 0, K ≥ 1, α > 0,
• A is upper triangular;
• B = Z√n = diag(z_i √n)_{i=1}^n with |z_i| ≥ r for all 1 ≤ i ≤ n;
• ξ is such that for all n′ ≥ 1 and any fixed A′ ∈ M_{n′}([0,1]), ‖A′ ∘ X′‖ ≤ K√n′ except with probability O((n′)^{−α}).
Then s_n(M) ≫_{K,r} √n except with probability O_{K,r}(1)^α n^{−α}.

Remark. The proof gives an implied constant of order exp(−O(K/r)^{O(1)}) in the lower bound on s_n(M).

To deduce Lemma 5.10 we apply the above lemma with M = M̃_free, α = η/2, K = O(µ_η) (by (5.36)) and n₁ ≫ δ^{1/2} n in place of n, which gives that (5.57) holds with probability

(5.58) 1 − O_{µ_η, r}(n₁^{−η/2}) = 1 − O_{µ_η, r, δ}(n^{−η/2}),

where in the first bound we applied our assumption that η < 1.

Proof.
First we note that we may take n to be a dyadic integer, i.e. n = 2^q for some q ∈ N. Indeed, if this is not the case, then letting 2^q be the smallest dyadic integer larger than n, we can increase the dimension of M to 2^q by padding A out with rows and columns of zeros, adding additional rows and columns of iid copies of ξ to X, and extending the diagonal of Z with entries z_i ≡ r for n < i ≤ 2^q. The hypotheses on A and Z in the lemma are still satisfied, and the smallest singular value of the new matrix is a lower bound for that of the original matrix (since the original matrix is a submatrix of the new matrix).

Now fix an arbitrary dyadic filtration F = ⋃_{p ≥ 0} { J_s : s ∈ {0,1}^p } of [n], where we view {0,1}^0 as labeling the trivial partition of [n], consisting only of the empty string ∅, so that J_∅ = [n]. Thus, for every 0 ≤ p < q and every binary string s ∈ {0,1}^p, J_s has cardinality n 2^{−p} and is evenly partitioned by J_{s0}, J_{s1}. For a binary string s we abbreviate M_s := M_{J_s} and similarly define A_s, X_s, Z_s. We also write B_s = M_{J_{s0}, J_{s1}}, so that we have the block decomposition

(5.59) M_s = [ M_{s0}  B_s ; M_{J_{s1}, J_{s0}}  M_{s1} ].

For p ≥ 1 let

(5.60) B∗(p) = { ‖A ∘ X‖ ≤ K√n } ∧ { ∀ 1 ≤ p′ ≤ p, ∀ s ∈ {0,1}^{p′}, ‖A_s ∘ X_s‖ ≤ K √(n 2^{−p′}) }.

By our assumption on ξ we have

(5.61) P(B∗(p)) ≥ 1 − O(n^{−α}) − Σ_{p′ ≤ p} 2^{p′} O( (n 2^{−p′})^{−α} ) = 1 − O( 2^{(1+α)p} n^{−α} ).

For arbitrary s ∈ {0,1}^p, by the triangle inequality we have that on B∗(p),

s_min(M_s) ≥ s_min(Z_s) − ‖A_s ∘ X_s‖ ≥ ( r − K 2^{−p/2} ) √n.

Setting p₀ = ⌊2 log₂(2K/r)⌋ + 1 we have that on B∗(p₀),

(5.62) s_min(M_s) ≥ (r/2) √n for all s ∈ {0,1}^{p₀}.

For the remainder of the proof we restrict the sample space to the event B∗(p₀) and will use the Schur complement bound (Lemma 5.4) to show that the desired lower bound on s_min(M) holds deterministically (note that by (5.61) and our choice of p₀, B∗(p₀) holds with probability 1 − O_{K,r}(n^{−α})).

For 0 ≤ p ≤ p₀ let

(5.63) λ_p = min_{s ∈ {0,1}^p} s_min(M_s) / √n.
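The iteration that drives the rest of the proof — starting from λ_{p₀} ≥ r/2 at the bottom of the filtration and passing to λ_{p−1} via the Schur complement bound — can be sketched numerically. This is an illustration of ours: we take the implied constant in the recursion λ_{p−1} ≫ (1 + K/λ_p)^{−2} λ_p to be 1, an assumption, so the resulting λ₀ is far smaller than what the lemma asserts, but it remains a strictly positive constant depending only on K and r, not on the dimension n.

```python
import math

# Iterate lambda_{p-1} = (1 + K/lambda_p)^(-2) * lambda_p from lambda_{p0} = r/2.
K, r = 2.0, 1.0
p0 = math.floor(2 * math.log2(2 * K / r)) + 1   # as in the choice of p0 above
lam = r / 2
for _ in range(p0):
    lam = lam / (1 + K / lam) ** 2
# lam is tiny, of order exp(-O((K/r)^{O(1)})), consistent with the remark following
# the lemma, but dimension-free and strictly positive.
assert 0 < lam < 1
```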
From (5.62) we have

(5.64) λ_{p₀} ≥ r/2.

Let 1 ≤ p ≤ p₀ and s ∈ {0,1}^{p−1}. By the block decomposition (5.59) and Lemma 5.4,

s_min(M_s) ≫ ( 1 + ‖B_s‖ / s_min(M_{s0}) )^{−2} min( s_min(M_{s0}), s_min(M_{s1}) ) ≥ (1 + K/λ_p)^{−2} λ_p √n,

so λ_{p−1} ≫ (1 + K/λ_p)^{−2} λ_p for all 1 ≤ p ≤ p₀. Applying this iteratively along with (5.64) we conclude λ₀ ≫_{K,r} 1, i.e.

(5.65) s_min(M) ≫_{K,r} √n

as desired.

5.6. Proof of Lemma 5.11.
We may assume throughout that n is sufficiently large depending on the parameters K, γ, δ, σ and µ_η. Note we may also assume γ ≥ 1. We will use the crude bound

(5.66) P( ‖(A ∘ X)_{I,J}‖ ≥ n^{3/2} ) ≤ n^{−1} ∀ I, J ⊂ [n].

Indeed, for any I, J ⊂ [n],

P( ‖(A ∘ X)_{I,J}‖ ≥ n^{3/2} ) ≤ P( ‖A ∘ X‖_HS ≥ n^{3/2} ).

Furthermore, E ‖A ∘ X‖²_HS ≤ E ‖X‖²_HS = n², and (5.66) follows from the above display and Markov's inequality.

By multiplying M_cyc by a permutation matrix we may assume that A_k := A_{J_k} is (2δ, ε)-super-regular for 1 ≤ k ≤ m (unlike in the proof of Lemma 5.10, the diagonal matrix Z√n plays no special role here). We denote J_{≤k} = J_1 ∪ ··· ∪ J_k, and for any matrix W of dimension at least |J_{≤k}| we abbreviate

(5.67) W_k = W_{J_k},  W_{≤k} = W_{J_{≤k}},  W_{≤k−1,k} = W_{J_{≤k−1}, J_k},  W_{k,≤k−1} = W_{J_k, J_{≤k−1}},

so that for 2 ≤ k ≤ m we have the block decomposition

(5.68) W_{≤k} = [ W_{≤k−1}  W_{≤k−1,k} ; W_{k,≤k−1}  W_k ].

Let us denote

(5.69) n′ = |J_1| = ··· = |J_m| ≫_ε n.

For 1 ≤ k ≤ m, β > 0 and a kn′ × kn′ matrix W, we denote the event

(5.70) E_k(β, W) := { s_{kn′}(M_{≤k} + W) > n^{−β} }.

Let γ > 0 and fix W ∈ M_{n′}(C) with ‖W‖ ≤ n^γ. By (5.66) we have

(5.71) ‖M_1 + W‖ ≤ K√n + n^{3/2} + n^γ ≤ n^{2γ}

with probability 1 − O(n^{−1}) if n is sufficiently large depending on K and γ. By Theorem 1.24 there exists β₁ = β₁(γ) = O(γ) such that if ε is sufficiently small depending on σ, δ, then

(5.72) P( E₁(β₁, W)^c ) ≤ P( ‖M_1 + W‖ > n^{2γ} ) + P( E₁(β₁, W)^c ∧ { ‖M_1 + W‖ ≤ n^{2γ} } ) ≪_{γ, δ, σ, ε, µ_η} √(log n / n),

where we have used (5.69) to write n in n^{−β₁} rather than n′, and the fact that the atom variable is O(µ_η)-spread.

Now let 2 ≤ k ≤ m, and suppose we have found a function β_{k−1}(γ) such that for any γ > 0 and any (k−1)n′ × (k−1)n′ matrix W with ‖W‖ ≤ n^γ,

(5.73) P( E_{k−1}(β_{k−1}(γ), W)^c ) ≪_{γ, δ, σ, ε, µ_η} √(log n / n).

Fix a kn′ × kn′ matrix W with ‖W‖ ≤ n^γ.
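The second-moment computation behind the Markov step above rests on the identity E‖A ∘ X‖²_HS = Σ_{i,j} a_ij² ≤ n² for A with entries in [0,1] and X with iid unit-variance entries. A quick Monte Carlo sanity check (an illustration of ours, with standard Gaussian entries; not part of the proof):

```python
import numpy as np

# Check E||A o X||_HS^2 = sum_{ij} a_ij^2 for unit-variance X, and that it is <= n^2.
rng = np.random.default_rng(1)
n, trials = 30, 2000
A = rng.uniform(0.0, 1.0, size=(n, n))
exact = np.sum(A ** 2)  # exact value of E||A o X||_HS^2
est = np.mean([np.sum((A * rng.standard_normal((n, n))) ** 2) for _ in range(trials)])
assert abs(est - exact) / exact < 0.05
assert exact <= n ** 2
```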
By Lemma 5.4 we have

(5.74) s_{kn′}(M_{≤k} + W) ≫ ( 1 + ‖(M + W)_{≤k−1,k}‖ / s_{(k−1)n′}(M_{≤k−1} + W_{≤k−1}) )^{−1} ( 1 + ‖(M + W)_{k,≤k−1}‖ / s_{(k−1)n′}(M_{≤k−1} + W_{≤k−1}) )^{−1} × min[ s_{(k−1)n′}(M_{≤k−1} + W_{≤k−1}), s_{n′}(M_k + B̃_k) ],

where we have abbreviated

(5.75) B̃_k := W_k − (M + W)_{k,≤k−1} (M_{≤k−1} + W_{≤k−1})^{−1} (M + W)_{≤k−1,k}.

Suppose that the event E_{k−1}(β_{k−1}(γ), W_{≤k−1}) holds. We condition on a realization of the submatrix M_{≤k−1} satisfying

(5.76) s_{(k−1)n′}(M_{≤k−1} + W_{≤k−1}) ≥ n^{−β_{k−1}(γ)}.

Moreover, from (5.66) we have

(5.77) ‖(M + W)_{≤k−1,k}‖, ‖(M + W)_{k,≤k−1}‖ ≤ K√n + n^{3/2} + n^γ ≤ n^{2γ}

with probability 1 − O(n^{−1}). Conditioning on the event that the above holds, from the previous two displays we have ‖B̃_k‖ ≤ n^γ + n^{4γ + β_{k−1}(γ)}. Again by (5.66),

(5.78) ‖M_k + B̃_k‖ ≤ K√n + n^{3/2} + n^γ + n^{4γ + β_{k−1}(γ)} ≤ n^{5γ + β_{k−1}(γ)}

with probability 1 − O(n^{−1}) in the randomness of M_k. By Theorem 1.24 and independence of M_k from M_{≤k−1}, M_{k,≤k−1}, M_{≤k−1,k}, there exists β′_k = O(γ + β_{k−1}(γ)) such that

(5.79) P( s_{n′}(M_k + B̃_k) ≤ n^{−β′_k} ) ≪_{γ, δ, σ, ε, µ_η} √(log n / n).

Restricting further to the event that s_{n′}(M_k + B̃_k) > n^{−β′_k} and substituting the above estimates into (5.74), we have

(5.80) s_{kn′}(M_{≤k} + W) ≫ n^{−O(γ + β_{k−1}(γ))} min( n^{−β_{k−1}(γ)}, n^{−β′_k} ) ≥ n^{−β_k(γ)}

for some β_k(γ) = O(γ + β_{k−1}(γ)). With this choice of β_k(γ) we have shown

(5.81) P( E_k(β_k(γ), W_{≤k})^c ∧ E_{k−1}(β_{k−1}(γ), W_{≤k−1}) ) ≪_{γ, δ, σ, ε, µ_η} √(log n / n).

Applying this bound for all 2 ≤ k′ ≤ k together with (5.72) and Bayes' rule, we conclude that for any fixed k and any square matrix W of dimension at least kn′ and operator norm at most n^γ,

(5.82) P( E_k(β_k(γ), W_{≤k})^c ) ≪_{γ, δ, σ, ε, µ_η} k √(log n / n).

The result now follows by taking k = m and recalling that m = O_ε(1).

APPENDIX A: INVERTIBILITY FOR PERTURBED NON-HERMITIAN BAND MATRICES

In this appendix we prove Corollary 1.16. By conditioning on the entries ξ_ij with min(|i − j|, n − |i − j|) > εn and absorbing the corresponding entries of A ∘ X into B, we may assume the entries of A(σ) are zero outside the band. By Theorem 1.12 it suffices to show that A(σ) is (δ, ν)-broadly connected for δ, ν ∈ (0,
1) sufficiently small depending on ε. Throughout the proof we may assume that n is sufficiently large depending on ε, i.e. n ≥ n₀ for any n₀(ε) ∈ N.

Let δ, ν ∈ (0,1) be chosen sufficiently small depending on ε. For all i ∈ [n] we have |N_{A(σ)}(i)|, |N_{A^T(σ)}(i)| ≥ 2εn, so taking δ < ε, it only remains to verify the third condition in Definition 1.9. Note that if |J| > (1 − ε)n we trivially have |J(i)| ≥ |N_{A(σ)}(i)| − εn ≥ εn for every i ∈ [n], and the condition holds in this case.

Fix a set J ⊂ [n] with 1 ≤ |J| ≤ (1 − ε)n. For the remainder of the proof we abbreviate J(i) := J ∩ N_{A(σ)}(i) and I_δ := N^{(δ)}_{A^T(σ)}(J) = { i : |J(i)| ≥ δ|J| }. It will be convenient to view i ↦ |J(i)| as a function on the torus Z/nZ (which we identify with [n] in the natural way). From double counting we have

(A.1) Σ_{i ∈ Z/nZ} |J(i)| = (1 + 2⌊εn⌋)|J| ≥ 2εn|J|.

On the other hand, we have the discrete derivative bound

(A.2) | |J(i)| − |J(i−1)| | ≤ 1 ∀ i ∈ Z/nZ.

Suppose towards a contradiction that

(A.3) |I_δ| < (1 + ν)|J|.

Since we took δ < ε, from (A.1) and the pigeonhole principle it follows that |I_δ| ≥ 1. We decompose I_δ = ∪_{l ∈ L} I_l as a disjoint union of interval subsets I_l = [a_l, b_l] ⊂ Z/nZ that are pairwise separated by a distance at least 2. We further split L = L_> ∪ L_≤, where L_> = { l ∈ L : |I_l| ≥ 2εn } and L_≤ = L \ L_>. Note that for each l ∈ L we have

(A.4) |J(a_l)|, |J(b_l)| ≤ ⌊δ|J|⌋ + 1.
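The double-counting identity (A.1) can be checked mechanically for the band neighborhood structure. The sketch below is an illustration of ours (h plays the role of ⌊εn⌋): on the torus Z/nZ every index has exactly 1 + 2h band neighbors, so summing |J(i)| = |J ∩ N(i)| over all i counts each element of J exactly 1 + 2h times.

```python
# Double counting behind (A.1) on the torus Z/nZ with band halfwidth h.
n, eps = 60, 0.1
h = int(eps * n)  # h = 6

def nbhd(i):
    return {j for j in range(n) if min(abs(i - j), n - abs(i - j)) <= h}

J = set(range(0, n, 3))  # an arbitrary subset of [n], |J| = 20
total = sum(len(J & nbhd(i)) for i in range(n))
assert all(len(nbhd(i)) == 1 + 2 * h for i in range(n))
assert total == (1 + 2 * h) * len(J)  # the identity in (A.1)
```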
From the bound (A.2) and the endpoint conditions (A.4) we see that within I_l,

(A.5) |J(i)| ≤ min[ ⌊δ|J|⌋ + 1 + min(i − a_l, b_l − i), 2εn + 1 ],

where the second argument in the outer minimum comes from the bound |J(i)| ≤ |N_{A(σ)}(i)| ≤ 2εn + 1. For l ∈ L_≤ we ignore the second argument in the outer minimum (which only increases the bound), and sum to obtain

Σ_{i ∈ I_l} |J(i)| ≤ (δ|J| + 1)|I_l| + ¼|I_l|² ≤ (1 + δ|J| + εn)|I_l|,  l ∈ L_≤.

For l ∈ L_> we have

Σ_{i ∈ I_l} |J(i)| ≤ Σ_{i ∈ I_l : min(i−a_l, b_l−i) ≤ 2εn} ( ⌊δ|J|⌋ + 1 + min(i − a_l, b_l − i) ) + (2εn + 1) |{ i ∈ I_l : i − a_l, b_l − i ≥ 2εn + 1 }|
≤ 4εn(⌊δ|J|⌋ + 1) + 4ε²n² + (2εn + 1)(|I_l| − 4εn)
≤ (2εn + 1)|I_l| + 4εnδ|J| − 4ε²n².

From the previous two displays we obtain

Σ_{i ∈ Z/nZ} |J(i)| ≤ δ|J|n + Σ_{i ∈ I_δ} |J(i)|
≤ δ|J|n + Σ_{l ∈ L_≤} (1 + δ|J| + εn)|I_l| + Σ_{l ∈ L_>} [ (2εn + 1)|I_l| + 4εnδ|J| − 4ε²n² ]
= δ|J|n + 4εn(δ|J| − εn)|L_>| + (1 + δ|J| + εn) Σ_{l ∈ L_≤} |I_l| + (2εn + 1) Σ_{l ∈ L_>} |I_l|.

If |L_>| = 0 then

Σ_{i ∈ Z/nZ} |J(i)| ≤ δ|J|n + (1 + δ|J| + εn)|I_δ|.

Combining with (A.1) and rearranging we obtain

|I_δ| ≥ (2ε − δ)|J|n / (1 + εn + δ|J|) ≥ ((2ε − δ)/(ε + δ))|J|,

and we contradict (A.3) taking ν < 1/2, say, and δ < cε for a sufficiently small constant c > 0. If |L_>| ≥ 1, then since δ < ε/2 we have

Σ_{i ∈ Z/nZ} |J(i)| ≤ δ|J|n − 2ε²n²|L_>| + (2εn + 1) Σ_{l ∈ L} |I_l| ≤ δ|J|n − 2ε²n² + (2εn + 1)|I_δ|.

Together with (A.1) this gives

|I_δ| ≥ (2εn/(2εn + 1))|J| + (2ε²n² − δn|J|)/(2εn + 1) ≥ (2εn/(2εn + 1))|J| + ¼εn,

where in the last bound we took δ < ε² and assumed n ≥ 2/ε. Taking ν < ε/
8, say, we contradict (A.3) if n is sufficiently large. The claim follows.

APPENDIX B: PROOFS OF ANTI-CONCENTRATION LEMMAS

In this appendix we prove Lemmas 2.5, 2.7 and 2.8. All three are established by modification of existing arguments from the literature.

B.1. Proof of Lemma 2.5. (2.5) is immediate by our assumptions. It remains to show

(B.1) E |Re(zξ − w)|² 1(|ξ| ≤ κ) ≫ κ^{−1} |Re(z)|² for all z, w ∈ C,

after rotating ξ by a phase if necessary. We may assume κ is larger than any fixed constant. Let E denote the event {|ξ| ≤ κ}. By Chebyshev's inequality,

(B.2) P(E) ≥ 1 − κ^{−2}.

Fix z, w ∈ C. Write Ẽ := E(· | E). By (B.2) and assuming κ is sufficiently large, we have that the left hand side of (B.1) is ≫ Ẽ |Re(zξ − w)|², so it suffices to show

(B.3) Ẽ |Re(zξ − w)|² ≫ κ^{−1} |Re(z)|²

after rotating ξ by a phase. Denoting η := ξ − Ẽξ, we have

Ẽ |Re(zξ − w)|² = Ẽ |Re(zη + (zẼξ − w))|² = Ẽ |Re(zη)|² + |Re(zẼξ − w)|²,

so it suffices to show that after rotating ξ by a phase,

(B.4) Ẽ |Re(zη)|² ≫ κ^{−1} |Re(z)|².

We first estimate the conditional variance of η. We have

Ẽ |η|² = Ẽ |ξ|² − |Ẽξ|² = (1/P(E)) E |ξ|² 1_E − (1/P(E)²) |E ξ 1_E|²
= (1/P(E)²) Var(ξ 1_E) + (1/P(E)) (1 − 1/P(E)) E |ξ|² 1_E
= (1/P(E)²) ( Var(ξ 1_E) − P(E^c) E |ξ|² 1_E )
≫ Var(ξ 1_E) − O(1/κ²),

where in the final line we applied (B.2), the assumption E|ξ|² = 1, and assumed κ is sufficiently large. Now by our assumption that ξ is κ-spread we have Var(ξ 1_E) ≫ 1/κ, so

(B.5) Ẽ |η|² ≫ 1/κ,

taking κ larger if necessary. Now consider the covariance matrix

(B.6) Σ_κ := [ Ẽ |Re(η)|²  Ẽ(Re(η) Im(η)) ; Ẽ(Re(η) Im(η))  Ẽ |Im(η)|² ].

Writing z = a − ib and letting x = (a b)^T be the associated column vector, we have

(B.7) Ẽ |Re(zη)|² = Ẽ |a Re(η) + b Im(η)|² = x^T Σ_κ x.

Since Σ_κ has two non-negative eigenvalues σ₁ ≥ σ₂ ≥ 0 with σ₁ + σ₂ = Ẽ |η|² ≫ 1/κ, it follows that σ₁ ≫ 1/κ.
We may rotate ξ by an appropriate phase to assume the corresponding eigenspace is spanned by (1 0)^T. This gives

Ẽ |Re(zη)|² ≥ σ₁ |Re(z)|² ≫ κ^{−1} |Re(z)|²

as desired.

B.2. Proof of Lemma 2.7.
We first need to recall a couple of lemmasfrom [36, 39].
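Before stating them, it may help to see the object being bounded in a case where everything is explicit. The concentration function p_{ξ,v}(r) appearing below is the small-ball probability sup_w P(|Σ_j v_j ξ_j − w| ≤ r). The following Monte Carlo sketch is an illustration of ours with real Gaussian atoms: for any unit vector v, Σ_j v_j ξ_j is standard normal, so the small-ball probability at scale ρ is of order ρ, matching the O(r) bound of Lemma 2.7.

```python
import numpy as np

# Monte Carlo estimate of the small-ball probability sup_w P(|<v, xi> - w| <= rho)
# for Gaussian atoms; the sup is attained at w = 0 since <v, xi> ~ N(0, 1).
rng = np.random.default_rng(2)
n, N, rho = 20, 100_000, 0.1
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
S = rng.standard_normal((N, n)) @ v   # N samples of <v, xi>, each standard normal
p_hat = np.mean(np.abs(S) <= rho)
assert 0.06 < p_hat < 0.10            # true value is 2*Phi(rho) - 1, of order rho
```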
Lemma B.1 (Fourier-analytic bound, cf. [39, Lemma 6.1]). Let ξ be a complex-valued random variable. For all r > 0 and any v ∈ S^{n−1} we have

(B.8) p_{ξ,v}(r) ≪ r² ∫_{w ∈ C : |w| ≤ 1/r} exp( −c Σ_{j=1}^n ‖w v_j‖²_ξ ) dw,

where

(B.9) ‖z‖²_ξ := E ‖Re(z(ξ − ξ′))‖²_{R/Z},

ξ′ is an independent copy of ξ, and ‖x‖_{R/Z} denotes the distance from x to the nearest integer.

The next lemma gives an important property enjoyed by the "norm" ‖·‖_ξ from Lemma B.1 under the assumption that ξ has κ-controlled second moment.

Lemma B.2 (cf. [36, Lemma 5.3]). For any κ > 0 there are constants c₁, c₂ > 0 such that if ξ is κ-controlled, then ‖z‖_ξ ≥ c₁ |Re(z)| whenever |z| ≤ c₂.

Proof of Lemma 2.7. Let r ≥ 0. We may assume r ≥ C ‖v‖_∞ for a fixed constant C = C(κ) to be chosen sufficiently large. From Lemma B.1,

p_{ξ,v}(r) ≪ r² ∫_{|w| ≤ 1/r} exp( −c Σ_{j=1}^n ‖w v_j‖²_ξ ) dw.

If C is sufficiently large depending on κ, it follows from Lemma B.2 that whenever |w| ≤ 1/r, ‖w v_j‖_ξ ≥ c₁ |Re(w v_j)|, giving

p_{ξ,v}(r) ≪ r² ∫_{|w| ≤ 1/r} exp( −c′ Σ_{j=1}^n (Re(w v_j))² ) dw,

where c′ depends only on κ. By change of variable,

(B.10) p_{ξ,v}(r) ≪ ∫_{|w| ≤ 1} exp( −(c′/r²) Σ_{j=1}^n (Re(w v_j))² ) dw.

Write v_j = r_j e^{iθ_j} for each j ∈ [n]. Since v ∈ S^{n−1} we have Σ_{j=1}^n r_j² = 1. By Jensen's inequality,

p_{ξ,v}(r) ≪ ∫_{|w| ≤ 1} exp( −(c′/r²) Σ_{j=1}^n r_j² (Re(w e^{iθ_j}))² ) dw ≤ ∫_{|w| ≤ 1} Σ_{j=1}^n r_j² exp( −(c′/r²) (Re(w e^{iθ_j}))² ) dw.

By rotational invariance the last expression is equal to

Σ_{j=1}^n r_j² ∫_{|w| ≤ 1} exp( −(c′/r²) (Re w)² ) dw = ∫_{|w| ≤ 1} exp( −(c′/r²) (Re w)² ) dw,

which by direct computation is seen to be of size O(r) (with implied constant depending on κ). Together with our assumption that r ≥ C ‖v‖_∞ this gives (2.7).

B.3. Proof of Lemma 2.8.
We only prove part (a), as part (b) is given in [28, Lemma 2.2]. Let c₀ > 0, and let α > 0 be a sufficiently small constant to be chosen later. We have

(B.11) P( Σ_{j=1}^n |ζ_j|² ≤ c₀ ε² n ) = P( c₀ α n − (α/ε²) Σ_{j=1}^n |ζ_j|² ≥ 0 )
≤ E exp( c₀ α n − (α/ε²) Σ_{j=1}^n |ζ_j|² )
= e^{c₀ α n} Π_{j=1}^n E exp( −α |ζ_j|²/ε² ).

For arbitrary j ∈ [n] we have

E exp( −α |ζ_j|²/ε² ) = ∫₀¹ P( exp(−α |ζ_j|²/ε²) ≥ u ) du
= ∫₀^∞ P( |ζ_j| ≤ s ε/√α ) d(1 − e^{−s²})
≤ p ∫₀^{√α} d(1 − e^{−s²}) + ∫_{√α}^∞ d(1 − e^{−s²})
= p(1 − e^{−α}) + e^{−α} = 1 − (1 − p)(1 − e^{−α}).

Inserting this in (B.11), we obtain

P( Σ_{j=1}^n |ζ_j|² ≤ c₀ ε² n ) ≤ e^{c₀ α n} [ 1 − (1 − p)(1 − e^{−α}) ]^n ≤ exp( n ( c₀ α − (1 − p)(1 − e^{−α}) ) ).

The claim now follows by setting c₀ = (1 − p)/2 and taking α a sufficiently small constant.

REFERENCES

[1] Aljadeff, J., Renfrew, D. and
Stern, M. (2015). Eigenvalues of block structured asymmetric random matrices. J. Math. Phys.
[2] Alon, N. and Shapira, A. (2004). Testing subgraphs in directed graphs. J. Comput. System Sci.
[3] Anderson, G. W., Guionnet, A. and Zeitouni, O. (2010). An introduction to random matrices. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge. MR2760897
[4] Bai, Z. D., Silverstein, J. W. and Yin, Y. Q. (1988). A note on the largest eigenvalue of a large-dimensional sample covariance matrix. J. Multivariate Anal.
[5] Bandeira, A. S. and van Handel, R. (2016). Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab.
[6] Bordenave, C. and Chafaï, D. (2012). Around the circular law. Probab. Surv.
[7] Bourgade, P., Erdős, L., Yau, H.-T. and Yin, J. Universality for a class of random band matrices. Preprint available at arXiv:1602.02312.
[8] Bourgain, J. and Tzafriri, L. (1987). Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math.
[9] Cook, N. A. (2016). Spectral properties of non-Hermitian random matrices. PhD thesis, University of California, Los Angeles.
[10] Cook, N. A., Hachem, W., Najim, J. and Renfrew, D. Limiting spectral distribution for non-Hermitian random matrices with a variance profile. In preparation.
[11] Edelman, A. (1988). Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl.
[12] Gowers, W. T. (1997). Lower bounds of tower type for Szemerédi's uniformity lemma. Geom. Funct. Anal.
[13] Hachem, W., Loubaton, P. and Najim, J. (2007). Deterministic equivalents for certain functionals of large random matrices. Ann. Appl. Probab.
[14] Komlós, J. Manuscript available at ~komlos/01short.pdf.
[15] Komlós, J. (1967). On the determinant of (0,1) matrices. Studia Sci. Math. Hungar.
[16] Komlós, J. (1968). On the determinant of random matrices. Studia Sci. Math. Hungar.
[17] Komlós, J. and Simonovits, M. (1996). Szemerédi's regularity lemma and its applications in graph theory. In Combinatorics, Paul Erdős is eighty, Vol. 2 (Keszthely, 1993). Bolyai Soc. Math. Stud.
[18] Latała, R. (2005). Some estimates of norms of random matrices. Proc. Amer. Math. Soc.
[19] Litvak, A. E., Lytova, A., Tikhomirov, K., Tomczak-Jaegermann, N. and Youssef, P. (2017). Adjacency matrices of random digraphs: singularity and anti-concentration. J. Math. Anal. Appl.
[20] Litvak, A. E., Pajor, A., Rudelson, M. and Tomczak-Jaegermann, N. (2005). Smallest singular value of random matrices and geometry of random polytopes. Adv. Math.
[21] Litvak, A. E., Pajor, A., Rudelson, M., Tomczak-Jaegermann, N. and Vershynin, R. (2005). Euclidean embeddings in spaces of finite volume ratio via random matrices. J. Reine Angew. Math.
[22] Litvak, A. E. and Rivasplata, O. (2012). Smallest singular value of sparse random matrices. Studia Math.
[23] Marcus, A. W., Spielman, D. A. and Srivastava, N. (2014). Ramanujan graphs and the solution of the Kadison–Singer problem. In Proc. ICM, Vol. III.
[24] Nguyen, H. H. and Vu, V. H. (2016). Normal vector of a random hyperplane. Preprint available at arXiv:1604.04897.
[25] Rajan, K. and Abbott, L. (2006). Eigenvalue spectra of random matrices for neural networks. Physical Review Letters.
[26] Rebrova, E. and Tikhomirov, K. Covering of random ellipsoids, and invertibility of matrices with i.i.d. heavy-tailed entries. Preprint available at arXiv:1508.06690.
[27] Rudelson, M. (2008). Invertibility of random matrices: norm of the inverse. Ann. of Math. (2).
[28] Rudelson, M. and Vershynin, R. (2008). The Littlewood–Offord problem and invertibility of random matrices. Adv. Math.
[29] Rudelson, M. and Vershynin, R. (2008). The least singular value of a random square matrix is O(n^{−1/2}). C. R. Math. Acad. Sci. Paris.
[30] Rudelson, M. and Zeitouni, O. (2016). Singular values of Gaussian matrices and permanent estimators. Random Structures Algorithms.
[31] Sankar, A., Spielman, D. A. and Teng, S.-H. (2006). Smoothed analysis of the condition numbers and growth factors of matrices. SIAM J. Matrix Anal. Appl.
[32] Spielman, D. A. and Srivastava, N. (2012). An elementary proof of the restricted invertibility theorem. Israel J. Math.
[33] Szemerédi, E. (1978). Regular partitions of graphs. In Problèmes combinatoires et théorie des graphes (Colloq. Internat. CNRS, Univ. Orsay, Orsay, 1976). Colloq. Internat. CNRS.
[34] Talagrand, M. (1996). A new look at independence. Ann. Probab.
[35] Tao, T. and Vu, V. (2010). Random matrices: the distribution of the smallest singular values. Geom. Funct. Anal.
[36] Tao, T. and Vu, V. H. (2008). Random matrices: the circular law. Commun. Contemp. Math.
[37] Tao, T. and Vu, V. H. (2009). Inverse Littlewood–Offord theorems and the condition number of random discrete matrices. Ann. of Math. (2).
[38] Tao, T. and Vu, V. H. (2010). Random matrices: universality of ESDs and the circular law. Ann. Probab.
[39] Tao, T. and Vu, V. H. (2010). Smooth analysis of the condition number and the least singular value. Math. Comp.
[40] Tulino, A. M. and Verdú, S. (2004). Random matrix theory and wireless communications. Now Publishers Inc.
[41] van Handel, R. On the spectral norm of Gaussian random matrices. Preprint available at arXiv:1502.05003.
[42] Vershynin, R. (2011). Spectral norm of products of random and deterministic matrices. Probab. Theory Related Fields.
[43] von Neumann, J. and Goldstine, H. H. (1947). Numerical inverting of matrices of high order. Bull. Amer. Math. Soc.
[44] Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On the limit of the largest eigenvalue of the large-dimensional sample covariance matrix. Probab. Theory Related Fields.