Random weighted projections, random quadratic forms and random eigenvectors
Van Vu (Department of Mathematics, Yale University) and Ke Wang (Institute for Mathematics and its Applications, University of Minnesota)
Abstract.
We present a concentration result concerning random weighted projections in high-dimensional spaces. As applications, we prove:
• New concentration inequalities for random quadratic forms.
• The infinity norm of most unit eigenvectors of a random $\pm 1$ matrix is of order $O(\sqrt{\log n/n})$.
• An estimate on the threshold for the local semi-circle law which is tight up to a $\sqrt{\log n}$ factor.

1. Introduction
1.1. Projection of a random vector.
Consider $\mathbb{C}^n$ with a subspace $H$ of dimension $d$. Let $X = (\xi_1, \dots, \xi_n)$ be a random vector. In all considerations in this paper, we assume that $X$ is in isotropic position, namely $\mathbb{E}\, X \otimes X = \mathrm{Id}$. The length of the orthogonal projection of $X$ onto $H$ is an important parameter which plays an essential role in the studies of random matrices and related areas. In [26], Tao and the first author showed that (under certain conditions) this length is strongly concentrated. In other words, the projection of $X$ onto $H$ lies essentially on a circle centered at the origin. This fact played a crucial role in the studies of the determinant of a random matrix with independent entries (see [26, 20]). We say that $\xi$ is $K$-bounded if $|\xi| \le K$ with probability 1.

Lemma 1.1 (Projection lemma, [26]). Let $X = (\xi_1, \dots, \xi_n)$ be a random vector in $\mathbb{C}^n$ whose coordinates $\xi_i$ are independent $K$-bounded random variables with mean 0 and variance 1, where $K \ge \mathbb{E}|\xi_i|^4 + 1$ for all $i$. Let $H$ be a subspace of dimension $d$ and $\Pi_H X$ be the length of the projection of $X$ onto $H$. Then

$$\mathbb{P}\big( |\Pi_H X - \sqrt{d}| \ge t \big) \le 10 \exp(-t^2/(20K^2)).$$

The projection lemma follows from Talagrand's inequality ([17, Chapter 4]). The constants 10 and 20 are rather arbitrary. We make no attempt to optimize the constants in this paper.
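To see the projection lemma in action, here is a minimal numerical sketch (not part of the paper's argument): we project independent Rademacher ($\pm 1$) vectors onto a fixed random $d$-dimensional subspace and observe that the projection lengths cluster around $\sqrt{d}$. The dimensions, seed and trial count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 400, 100, 200

# A random d-dimensional subspace H: orthonormal columns via QR.
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))

# Lengths of projections of independent Rademacher (+-1) vectors onto H.
lengths = []
for _ in range(trials):
    X = rng.choice([-1.0, 1.0], size=n)
    lengths.append(np.linalg.norm(Q.T @ X))  # ||Pi_H X|| = ||Q^T X||
lengths = np.array(lengths)

print("sqrt(d)     =", np.sqrt(d))
print("mean length =", lengths.mean())
print("max |dev|   =", np.abs(lengths - np.sqrt(d)).max())
```

With these parameters the deviations from $\sqrt{d} = 10$ stay of constant order, independent of $n$, as the lemma predicts for $1$-bounded coordinates.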
Notation.
We use standard asymptotic notations such as $O, o, \Theta, \omega, \ll$, etc., under the assumption that $n \to \infty$. For a vector $X$, $\|X\|$ is its Euclidean norm and $\|X\|_\infty$ its infinity norm. For a matrix $A \in \mathbb{C}^{n \times n}$, $\|A\|_F$ and $\|A\|$ denote its Frobenius and spectral norm, respectively. All eigenvectors have unit length.

Key words and phrases: random weighted projections; random quadratic forms; infinity norm of eigenvectors; local semi-circle law; random covariance matrix. V. Vu is supported by research grants DMS-0901216, DMS-1307797 and AFOSAR-FA-9550-09-1-0167.

1.2. Weighted projection lemmas.
Let us fix an orthonormal basis $\{u_1, \dots, u_d\}$ of $H$. We can express $\Pi_H X$ as

(1) $\Pi_H X = \Big( \sum_{i=1}^d |u_i^* X|^2 \Big)^{1/2}.$

In recent studies, we came up with situations where the roles of the axes are not compatible. Formally speaking, one is required to consider a weighted version of (1), where $(\sum_{i=1}^d |u_i^* X|^2)^{1/2}$ is replaced by $(\sum_{i=1}^d c_i |u_i^* X|^2)^{1/2}$ with the $c_i$ being non-negative numbers (weights). We are able to prove a variant of Lemma 1.1 for this more general problem, which turns out to be useful in a number of applications, some of which will be discussed in the paper. Furthermore, we can also weaken the assumption on the random vector $X$ in various ways.

We say a random vector $X = (\xi_1, \dots, \xi_n)$ is $K$-concentrated (where $K$ may depend on $n$) if there are constants $C, C' > 0$ such that for any convex 1-Lipschitz function $F: \mathbb{C}^n \to \mathbb{R}$ and any $t > 0$,

(2) $\mathbb{P}\big( |F(X) - M(F(X))| \ge t \big) \le C \exp(-C' t^2/K^2),$

where $M(Y)$ denotes the median of a random variable $Y$ (choose an arbitrary one if there are many). Notice that the notion of $K$-concentrated is somewhat similar to the notion of threshold in random graph theory, in the sense that if $X$ is $K$-concentrated then it is $cK$-concentrated for any constant $c > 0$. (Similarly, if $p(n)$ is a threshold for a property $P$ (say, containing a triangle) then $cp(n)$ is also a threshold.) One can also replace the median by the expectation (see Lemma 2.1). The dependence on $K$ on the RHS of (2) is flexible (one can replace $K$ by any function $f(K)$); however, the quality of the concentration bound will depend on $f(K)$, and we leave it as an exercise for the reader to work out this dependence.

Examples of $K$-concentrated random variables:
• If the coordinates of $X$ are iid standard gaussian (real or complex), then $X$ is 1-concentrated (see [17]).
• If the $\xi_i$ are independent and $K$-bounded for all $i$, then $X$ is $K$-concentrated (this is a corollary of Talagrand's inequality; see [17, Chapter 4] or [27, Theorem F.5]).
• If $X$ satisfies the log-Sobolev inequality with parameter $K$, then it is $K$-concentrated (see [17, Theorem 5.3]).
• The coordinates $\xi_i$ of $X$ come from a random walk satisfying certain mixing properties (see [25, Corollary 4]; in this corollary $\|\Gamma\|$ plays the role of $K$). In this and the previous example, the coordinates of $X$ are not necessarily independent.

Lemma 1.2.
Let $X = (\xi_1, \dots, \xi_n)$ be a $K$-concentrated random vector in $\mathbb{C}^n$. Then there are constants $C, C' > 0$ (which depend on, but could be different from, the constants in (2)) such that the following holds. Let $H$ be a subspace of dimension $d$ with an orthonormal basis $\{u_1, \dots, u_d\}$.
Then for any $1 \ge c_1, \dots, c_d \ge 0$ and any $t > 0$,

$$\mathbb{P}\Big( \Big| \Big(\sum_{j=1}^d c_j |u_j^* X|^2\Big)^{1/2} - \Big(\sum_{j=1}^d c_j\Big)^{1/2} \Big| \ge t \Big) \le C \exp(-C' t^2/K^2).$$

In particular, by squaring, it follows that

(3) $\mathbb{P}\Big( \Big| \sum_{j=1}^d c_j |u_j^* X|^2 - \sum_{j=1}^d c_j \Big| \ge 2t \Big(\sum_{j=1}^d c_j\Big)^{1/2} + t^2 \Big) \le C \exp(-C' t^2/K^2).$

Another way to weaken the $K$-bounded assumption is to consider truncation. If $\xi$ is not bounded but has a light tail, then by setting $K$ appropriately, we can show that $\mathbb{P}(|\xi| \ge K)$ is negligible with respect to the probability bound we want to prove. Assume that the $\xi_i$ are independent with mean zero and variance one. Choose a number $K > 0$ and set $\varepsilon_1 := \max_{1\le i\le n} \mathbb{P}(|\xi_i| > K)$. Set $\xi'_i := \xi_i \mathbf{1}_{|\xi_i| \le K}$ and let $\mu_i$ and $\sigma_i^2$ denote its mean and variance. Set $\varepsilon_2 := \max_{1\le i\le n} |\mu_i|$ and $\varepsilon_3 := \max_{1\le i\le n} |\sigma_i^2 - 1|$. Assume all $\varepsilon_j \le 1/2$.

Lemma 1.3.
There are constants $C, C' > 0$ such that the following holds. Let $X = (\xi_1, \dots, \xi_n)$ be a random vector in $\mathbb{C}^n$ whose coordinates $\xi_i$ are independent random variables with mean 0 and variance 1. Under the above notations, we have, for any $1 \ge c_1, \dots, c_d \ge 0$ and any $t > 0$,

(4) $\mathbb{P}\Big( \Big| \Big(\sum_{j=1}^d c_j |u_j^* X|^2\Big)^{1/2} - \Big(\sum_{j=1}^d c_j\Big)^{1/2} \Big| \ge t + 4n^2 K^2 (\varepsilon_2 + \varepsilon_3) \Big) \le C \exp(-C' t^2/K^2) + n\varepsilon_1.$

1.3. Concentration of random quadratic forms.
Consider a quadratic form $Y := X^* A X$ where $X = (\xi_1, \dots, \xi_n)$ is, as usual, a random vector and $A = (a_{ij})_{1\le i,j\le n}$ a deterministic matrix. As an application of the new projection lemmas, we prove a large deviation result for $Y$, which can be seen as the quadratic version of the standard Chernoff bound.

Theorem 1.4 (Concentration of quadratic forms I). Let $X$ be a $K$-concentrated random vector in $\mathbb{C}^n$. Then there are constants $C, C' > 0$ such that for any matrix $A$ and any $t > 0$,

(5) $\mathbb{P}\big( |X^* A X - \mathrm{trace}(A)| \ge t \big) \le C \log n \, \exp\Big(-C' K^{-2} \min\Big\{ \frac{t^2}{\|A\|_F^2 \log n}, \frac{t}{\|A\|} \Big\}\Big).$

Theorem 1.5 (Concentration of quadratic forms II). Let $X$ and $\varepsilon_1, \varepsilon_2, \varepsilon_3$ be as in Lemma 1.3. There are constants $C, C' > 0$ such that the following holds. Assume $n^2 K^2 (\varepsilon_2 + \varepsilon_3) = o(1)$; then

$$\mathbb{P}\big( |X^* A X - \mathrm{trace}(A)| \ge t \big) \le C \log n \, \exp\Big(-C' K^{-2} \min\Big\{ \frac{t^2}{\|A\|_F^2 \log n}, \frac{t}{\|A\|} \Big\}\Big) + n\varepsilon_1.$$

As an illustration, let us consider the case when the $\xi_i$ are sub-exponential. We say that $\xi$ is sub-exponential with exponent $\alpha > 0$ (and parameters $a$ and $b$) if for any $t > 0$,

$$\mathbb{P}\big( |\xi - \mathbb{E}\xi| \ge t^\alpha \big) \le a \exp(-bt).$$

Corollary 1.6 (Concentration of quadratic forms with sub-exponential variables). Assume that the $\xi_i$ are independent and sub-exponential (with exponent $\alpha > 0$) random variables with mean 0 and variance 1. Then there are constants $C, C' > 0$ such that for any $t$ satisfying

(6) $t = \omega\big( (\|A\|_F + \log^\alpha n \, \|A\|) \log^{\alpha+1} n \big),$

we have

(7) $\mathbb{P}\big( |X^* A X - \mathrm{trace}(A)| \ge t \big) \le C \exp\Big(-C' \min\Big\{ \Big(\frac{t}{\|A\|_F \sqrt{\log n}}\Big)^{2/(2\alpha+1)}, \Big(\frac{t}{\|A\|}\Big)^{1/(2\alpha+1)} \Big\}\Big).$

Quadratic forms of random variables appear frequently in probability, and the large deviation problem has been considered by several researchers, the first and perhaps most famous result being the Hanson-Wright inequality [14].
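As a quick numerical illustration of the phenomenon in Theorem 1.4 (a sketch, not part of the proofs), the following code samples the quadratic form $X^* A X$ for Rademacher $X$ and a frozen matrix $A$, and compares the observed fluctuations around $\mathrm{trace}(A)$ with the scale $\|A\|_F$; the size and trial count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 300, 500

# A fixed deterministic matrix; here a random but frozen gaussian matrix.
A = rng.standard_normal((n, n))
tr_A = np.trace(A)
fro = np.linalg.norm(A, "fro")

devs = []
for _ in range(trials):
    X = rng.choice([-1.0, 1.0], size=n)
    devs.append(X @ A @ X - tr_A)
devs = np.array(devs)

# The typical fluctuation should be on the order of ||A||_F.
print("||A||_F               =", fro)
print("empirical std of X*AX =", devs.std())
print("max |dev| / ||A||_F   =", np.abs(devs).max() / fro)
```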
In many cases, our results improve earlier results significantly; see Section 3 for more details.

1.4. Norm of random eigenvectors.
Let $M_n$ be a symmetric $\pm 1$ matrix of size $n$ (the entries take values $\pm 1$ with probability $1/2$ each) and let $u$ be an arbitrary unit eigenvector of $M_n$. We investigate the following natural question: how big is $\|u\|_\infty$? A good bound on the infinity norm of the eigenvectors is important in spectral analysis of graphs and many other applications, such as the studies of nodal domains (see for instance [7] and the references therein). Recently, it has played a crucial role in breakthrough works concerning local statistics of random matrices (see Section 1.5 and also [8, 31] for surveys). Set $W_n = \frac{1}{\sqrt{n}} M_n$. Thanks to the classical Wigner semi-circle law (see Section 1.5), we know that most of the eigenvalues of $W_n$ belong to the interval $(-2+\epsilon, 2-\epsilon)$. Using our new concentration results, we are able to obtain (what we believe to be) the optimal bound for the eigenvectors corresponding to these eigenvalues.

Theorem 1.7 (Infinity norm of eigenvectors). Let $M_n$ be an $n \times n$ symmetric matrix whose upper diagonal entries are iid random variables taking values $\pm 1$ with the same probability. Let $W_n = \frac{1}{\sqrt{n}} M_n$. For any constant $C_1 > 0$, there is a constant $C_2 > 0$ such that the following holds.
• (Bulk case) With probability at least $1 - n^{-C_1}$, for any fixed $\epsilon > 0$ and any $1 \le i \le n$ with $\lambda_i(W_n) \in [-2+\epsilon, 2-\epsilon]$, there is a unit eigenvector $u_i(W_n)$ of $\lambda_i(W_n)$ satisfying $\|u_i(W_n)\|_\infty \le C_2 \log^{1/2} n / \sqrt{n}$.
• (Edge case) With probability at least $1 - n^{-C_1}$, for any $\epsilon > 0$ and any $1 \le i \le n$ with $\lambda_i(W_n) \in [-2-\epsilon, -2+\epsilon] \cup [2-\epsilon, 2+\epsilon]$, there is a unit eigenvector $u_i(W_n)$ of $\lambda_i(W_n)$ satisfying $\|u_i(W_n)\|_\infty \le C_2 \log n / \sqrt{n}$.

The best previous bound was of the form $\log^C n / n^{1/2}$ for some large (usually not explicit) constant $C$ [9, 10, 11, 27].
We conjecture that the bound $O(\sqrt{\log n/n})$ is sharp (it is easy to see that this is the case if the entries are standard gaussian) and also that it holds for all eigenvectors. Very recently, Rudelson and Vershynin [24] also studied the norm of random eigenvectors using a geometric method, which is different from our approach discussed in Section 1.5.
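The bulk bound of Theorem 1.7 can be sanity-checked numerically; the sketch below (with an arbitrary size, bulk cutoff and seed) computes the largest infinity norm over bulk eigenvectors of a random symmetric $\pm 1$ matrix and compares it with $\sqrt{\log n / n}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Symmetric +-1 matrix and its normalization W_n = M_n / sqrt(n).
M = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(M) + np.triu(M, 1).T
W = M / np.sqrt(n)

eigvals, eigvecs = np.linalg.eigh(W)

# Bulk eigenvectors: eigenvalues in [-2 + eps, 2 - eps].
eps = 0.1
bulk = (eigvals > -2 + eps) & (eigvals < 2 - eps)
max_inf = np.abs(eigvecs[:, bulk]).max()

print("sqrt(log n / n)        =", np.sqrt(np.log(n) / n))
print("max ||u_i||_inf (bulk) =", max_inf)
```

In experiments of this size the ratio of the two printed quantities stays bounded by a modest constant, consistent with the conjectured sharpness of $O(\sqrt{\log n/n})$.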
1.5. The local semi-circle law.
Denote by $\rho_{sc}$ the semi-circle density function with support on $[-2, 2]$:

$$\rho_{sc}(x) := \begin{cases} \frac{1}{2\pi} \sqrt{4 - x^2}, & |x| \le 2, \\ 0, & |x| > 2. \end{cases}$$

Let us recall the classical Wigner semi-circle law:
Theorem 1.8 (Semi-circle law). Let $M_n$ be a random Hermitian matrix whose entries on and above the diagonal are iid bounded random variables with zero mean and unit variance, and let $W_n = \frac{1}{\sqrt{n}} M_n$. Then for any real number $x$,

$$\lim_{n\to\infty} \frac{1}{n} |\{ 1 \le i \le n : \lambda_i(W_n) \le x \}| = \int_{-2}^{x} \rho_{sc}(y)\, dy$$

in the sense of probability, where we use $|I|$ to denote the cardinality of a finite set $I$.

The key tool for bounding the infinity norm of an eigenvector is a statement of the following type: any interval of length at least $T$ (which tends to zero with $n$) in the spectrum [−2,
2] contains an eigenvalue, with high probability. The quality of the bound will depend on how small $T$ is. This approach was developed by Erdős, Schlein and Yau in [9, 10, 11], leading to eigenvector norm bounds of order $n^{-1/4}$, $n^{-1/3}$ and finally $n^{-1/2+o(1)}$. A simpler argument, following the same approach, was developed by Tao and the first author in [27] (see [27, Section 4] for a problem concerning random non-hermitian matrices).

One way to attack the above problem is to show that the semi-circle law holds for small intervals (or at small scale). Intuitively, we would like to have, with high probability, that

$$|N_I - n \int_I \rho_{sc}(x)\, dx| \le \delta n |I|$$

for any interval $I$ and fixed $\delta > 0$, where $N_I$ denotes the number of eigenvalues of $W_n := \frac{1}{\sqrt{n}} M_n$ in the interval $I$. Of course, the reader can easily see that $I$ cannot be arbitrarily short (since $N_I$ is an integer). Following [11], we call a statement of this kind a local semi-circle law (LSCL). A natural question arises: how short can $I$ be? Formally, we say that the LSCL holds at a scale $f(n)$ if, with probability $1 - o(1)$,

$$|N_I - n \int_I \rho_{sc}(x)\, dx| \le \delta n |I|$$

for any interval $I$ in the bulk of length $\omega(f(n))$ and any fixed $\delta > 0$. Furthermore, we say that $f(n)$ is a threshold scale if the LSCL holds at scale $f(n)$ but does not hold at scale $g(n)$ for any function $g(n) = o(f(n))$. (The reader may notice a similarity between this definition and the definition of threshold functions for random graphs.) We would like to raise the following problem.

Problem 1.9.
Determine the threshold scale (if it exists).
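The LSCL phenomenon is easy to observe empirically. The sketch below (sizes, interval length and centers are arbitrary choices, not taken from the paper) counts eigenvalues of a random symmetric $\pm 1$ matrix in short bulk intervals of length proportional to $\log n/n$ and compares with the semi-circle prediction $n \int_I \rho_{sc}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

M = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(M) + np.triu(M, 1).T
lam = np.linalg.eigvalsh(M / np.sqrt(n))

def rho_sc_integral(a, b):
    # Closed-form antiderivative of the semi-circle density on [-2, 2].
    F = lambda x: 0.5 + x * np.sqrt(4.0 - x * x) / (4.0 * np.pi) \
        + np.arcsin(x / 2.0) / np.pi
    return F(b) - F(a)

# Short bulk intervals of length ~ 40 log n / n.
length = 40 * np.log(n) / n
rel_errs = []
for center in (-1.0, -0.5, 0.0, 0.5, 1.0):
    a, b = center - length / 2, center + length / 2
    N_I = np.sum((lam >= a) & (lam < b))
    pred = n * rho_sc_integral(a, b)
    rel_errs.append(abs(N_I - pred) / pred)

print("interval length    =", length)
print("max relative error =", max(rel_errs))
```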
We do not know a sharp estimate for the threshold for any matrix ensemble, even in the basic GUE (random matrix with complex gaussian entries) and GOE (random matrix with real gaussian entries) cases. A recent result by Ben Arous and Bourgade [1] shows that the maximum gap between two consecutive (bulk) eigenvalues of GUE is of order $\Theta(\sqrt{\log n}/n)$, with high probability. Thus, if we partition the bulk into intervals of length $\alpha \sqrt{\log n}/n$ for a sufficiently small $\alpha$, one of these intervals contains at most one eigenvalue. Therefore, we expect that in natural ensembles, the LSCL does not hold below the $\sqrt{\log n}/n$ scale. In [11, 29], an upper bound of the form $\log^C n / n$ was proved for some large value of $C$. Here we are going to show:

Theorem 1.10 (Threshold for local semi-circle law). Let $M_n$ be a Hermitian matrix whose upper diagonal entries are independent random variables with mean 0 and variance 1. Assume furthermore that for $1 \le i \le n$, the vectors $X_i$, obtained by deleting the $i$-th entry of the $i$-th row vector of $M_n$, are $K$-concentrated. Then the threshold scale for the LSCL is bounded from above by $K^2 \log n / n$.

In the GUE case, the gap between the upper and lower bound is only $O(\sqrt{\log n})$, and it is an intriguing problem to remove this factor. We also conjecture that Ben Arous and Bourgade's result on the largest gap holds for $\pm 1$ matrices.

1.6. Structure of the paper.
In the next section, we prove the new projection lemmas. In Section 3, we prove the new concentration inequalities for quadratic forms and make a comparison with prior results. The next section, Section 4, can be seen as a preparation step in which we recall facts about random matrices. We prove the new threshold for the local semi-circle law in Section 5, and the bound on the infinity norm of eigenvectors in Section 6. The appendices contain proofs concerning random sample covariance matrices.
Acknowledgement.
The authors would like to thank the anonymous referees for their careful reading and constructive suggestions.

2. Proofs of Lemmas 1.2 and 1.3
Proof of Lemma 1.2.
Set $f(X) := \big(\sum_{j=1}^d c_j |u_j^* X|^2\big)^{1/2}$. Thus, $f$ is a function from $\mathbb{C}^n$ to $\mathbb{R}$. We first observe that $f$ is convex. Indeed, for $0 \le \lambda, \mu \le 1$ with $\lambda + \mu = 1$ and any $X, Y \in \mathbb{C}^n$, by the Cauchy-Schwarz inequality,

$$f(\lambda X + \mu Y) \le \Big( \sum_{j=1}^d c_j \big( \lambda |u_j^* X| + \mu |u_j^* Y| \big)^2 \Big)^{1/2} \le \lambda \Big( \sum_{j=1}^d c_j |u_j^* X|^2 \Big)^{1/2} + \mu \Big( \sum_{j=1}^d c_j |u_j^* Y|^2 \Big)^{1/2} = \lambda f(X) + \mu f(Y).$$

Next, we show that $f$ is 1-Lipschitz. Notice that $f(X) \le \big(\sum_{j=1}^d |u_j^* X|^2\big)^{1/2} \le \|X\|$, since all $c_j \le 1$. As $f$ is homogeneous,

$$\frac{1}{2} f(X) = f\Big( \frac{1}{2} X \Big) = f\Big( \frac{1}{2}(X - Y) + \frac{1}{2} Y \Big) \le \frac{1}{2} f(X - Y) + \frac{1}{2} f(Y).$$

Thus $f(X) - f(Y) \le f(X - Y)$, and similarly $f(Y) - f(X) \le f(Y - X) = f(X - Y)$, which imply $|f(X) - f(Y)| \le f(X - Y) \le \|X - Y\|$.
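The two properties just proved are easy to spot-check numerically; the sketch below (with arbitrary dimensions and random weights) verifies convexity and the 1-Lipschitz bound for $f$ on random sample points.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 10

# Orthonormal u_1, ..., u_d and weights 0 <= c_j <= 1.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
c = rng.uniform(0.0, 1.0, size=d)

def f(x):
    return np.sqrt(np.sum(c * np.abs(U.T @ x) ** 2))

# Spot-check convexity and the 1-Lipschitz property on random pairs.
conv_ok = lip_ok = True
for _ in range(1000):
    X, Y = rng.standard_normal(n), rng.standard_normal(n)
    lam = rng.uniform()
    conv_ok &= f(lam * X + (1 - lam) * Y) <= lam * f(X) + (1 - lam) * f(Y) + 1e-9
    lip_ok &= abs(f(X) - f(Y)) <= np.linalg.norm(X - Y) + 1e-9

print("convex on samples:     ", bool(conv_ok))
print("1-Lipschitz on samples:", bool(lip_ok))
```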
Thus, by the definition of the $K$-concentrated property,

(9) $\mathbb{P}\big( |f(X) - M(f(X))| \ge t \big) \le C \exp(-C' t^2/K^2)$

for some constants $C, C' > 0$. It remains to show that $\big| M(f(X)) - \big(\sum_{j=1}^d c_j\big)^{1/2} \big| = O(K)$. We use the following lemma. The proof of the lemma is classical and thus omitted.
Lemma 2.1.
Let $Y$ be a real random variable. Assume $\mathbb{P}(|Y - \mu| \ge t) \le f(t)$, where $\int_0^\infty f(x)\, dx = O(1)$; then $|\mathbb{E} Y - \mu| = O(1)$. Assume furthermore that $Y$ is non-negative, $\mu \ge 1$, $\sigma = \sqrt{\mathbb{E} Y^2}$, and $\int_0^\infty x f(x)\, dx = O(1)$; then $|\mathbb{E} Y - \sigma| = O(1)$.

To apply this lemma, set $c'_i := \frac{c_i}{\max_{1\le i\le d} c_i}$, $Y := \frac{1}{K} \big(\sum_{i=1}^d c'_i |u_i^* X|^2\big)^{1/2}$ and $\mu := M(Y)$. We have, by the $K$-concentration property,

$$\mathbb{P}\big( |Y - \mu| \ge t \big) = \mathbb{P}\Big( \Big| \Big(\sum_{i=1}^d c'_i |u_i^* X|^2\Big)^{1/2} - M\Big( \Big(\sum_{i=1}^d c'_i |u_i^* X|^2\Big)^{1/2} \Big) \Big| \ge tK \Big) \le C \exp(-C' t^2).$$

Set $f(x) = C \exp(-C' x^2)$. The assumptions on $f(x)$ in Lemma 2.1 are trivially satisfied. As $X$ is isotropic, $\sigma^2 = \mathbb{E} Y^2 = \frac{1}{K^2} \sum_{i=1}^d c'_i$. It follows from Lemma 2.1 that

$$M(Y) = \frac{1}{K} \Big( \sum_{i=1}^d c'_i \Big)^{1/2} + O(1).$$

Renormalizing, we obtain

$$M\Big( \Big(\sum_{i=1}^d c_i |u_i^* X|^2\Big)^{1/2} \Big) = \Big( \sum_{i=1}^d c_i \Big)^{1/2} + O\Big( K \sqrt{\max_{1\le i\le d} c_i} \Big),$$

which concludes the proof of Lemma 1.2.

Proof of Lemma 1.3.
Recall that $\xi'_i = \xi_i \mathbf{1}_{|\xi_i| \le K}$ has mean $\mu_i$ and variance $\sigma_i^2$. The parameters $\varepsilon_1 = \max_{1\le i\le n} \mathbb{P}(|\xi_i| > K)$, $\varepsilon_2 = \max_{1\le i\le n} |\mu_i|$ and $\varepsilon_3 = \max_{1\le i\le n} |\sigma_i^2 - 1|$ satisfy all $\varepsilon_j \le 1/2$. Set $\tilde\xi_i := \frac{\xi'_i - \mu_i}{\sigma_i}$. The $\tilde\xi_i$ are independent with mean zero and variance 1 and are $2K$-bounded. Let $X' := (\xi'_1, \dots, \xi'_n)$ and $\tilde X := (\tilde\xi_1, \dots, \tilde\xi_n)$. It is obvious that

$$\mathbb{P}\Big( \Big| \Big(\sum_{j=1}^d c_j |u_j^* X|^2\Big)^{1/2} - \Big(\sum_{j=1}^d c_j\Big)^{1/2} \Big| \ge t \Big) \le \mathbb{P}\Big( \Big| \Big(\sum_{j=1}^d c_j |u_j^* X'|^2\Big)^{1/2} - \Big(\sum_{j=1}^d c_j\Big)^{1/2} \Big| \ge t \Big) + n\varepsilon_1.$$

The next observation is that if $\varepsilon_2, \varepsilon_3$ are small, then $\sum_{1\le i\le d} c_i |u_i^* X'|^2$ and $\sum_{1\le i\le d} c_i |u_i^* \tilde X|^2$ are more or less the same. By definition, we have with probability one

$$|\xi'_i - \tilde\xi_i| = \Big| \frac{\xi'_i(\sigma_i - 1) + \mu_i}{\sigma_i} \Big| \le 2(K\varepsilon_3 + \varepsilon_2).$$
It follows that $D := X' - \tilde X$ has norm at most $2 n^{1/2} (K\varepsilon_3 + \varepsilon_2)$ with probability one. On the other hand,

$$\Big| \sum_{1\le i\le d} c_i |u_i^* X'|^2 - \sum_{1\le i\le d} c_i |u_i^* \tilde X|^2 \Big| \le \sum_{1\le i\le d} c_i \big( 2 |u_i^* X'| |u_i^* D| + |u_i^* D|^2 \big).$$

As the $u_i$ are unit vectors, $|u_i^* X'| \le \|X'\| \le \sqrt{n} K$ and $|u_i^* D| \le \|D\| \le 2\sqrt{n}(K\varepsilon_3 + \varepsilon_2)$ (these bounds are generous and can be improved by a polynomial factor in certain cases, but in applications such improvement rarely matters). It follows, again rather generously, that

$$\Big| \sum_{1\le i\le d} c_i |u_i^* X'|^2 - \sum_{1\le i\le d} c_i |u_i^* \tilde X|^2 \Big| \le 4n \sum_{i=1}^d c_i K^2 (\varepsilon_2 + \varepsilon_3) \le 4n^2 K^2 (\varepsilon_2 + \varepsilon_3).$$

Applying Lemma 1.2 to $\tilde X$, we obtain Lemma 1.3.

In practice, the $\varepsilon_j$ are typically super-polynomially small, i.e. $n^{-\omega(1)}$, which yields $4n^2 K^2 (\varepsilon_2 + \varepsilon_3) = o(1)$. This term can be ignored (by slightly changing the values of $C, C'$ if necessary) and we end up with a more friendly inequality

(10) $\mathbb{P}\Big( \Big| \Big(\sum_{j=1}^d c_j |u_j^* X|^2\Big)^{1/2} - \Big(\sum_{j=1}^d c_j\Big)^{1/2} \Big| \ge t \Big) \le C \exp(-C' t^2/K^2) + n\varepsilon_1.$

In the sub-exponential case, for a sufficiently large $K$ (compared to $a$ and $b$), $\varepsilon_j \le \exp(-b K^{1/\alpha}/2)$ for $j = 1, 2,$
3. For $K = \omega(\log^\alpha n)$, $4n^2 K^2 \exp(-b K^{1/\alpha}/2) = o(1)$ and (10) yields

(11) $\mathbb{P}\Big( \Big| \Big(\sum_{j=1}^d c_j |u_j^* X|^2\Big)^{1/2} - \Big(\sum_{j=1}^d c_j\Big)^{1/2} \Big| \ge t \Big) \le C \exp(-C' t^2/K^2) + n \exp(-b K^{1/\alpha}/2).$

3. Random Quadratic Forms
3.1. Proofs of new results.
Let us first prove Theorem 1.4. Notice that if $Y = X^* A X$, then $Y + \bar Y = X^* (A + A^*) X$ and $Y - \bar Y = X^* (A - A^*) X$. Since

$$Y - \mathrm{trace}(A) = \frac{1}{2} \big[ (Y + \bar Y) - \mathrm{trace}(A + A^*) \big] + \frac{1}{2} \big[ (Y - \bar Y) - \mathrm{trace}(A - A^*) \big],$$

we have

$$\mathbb{P}\big( |Y - \mathrm{trace}(A)| \ge t \big) \le \mathbb{P}\big( |(Y + \bar Y) - \mathrm{trace}(A + A^*)| \ge t \big) + \mathbb{P}\big( |\sqrt{-1}(Y - \bar Y) - \mathrm{trace}(\sqrt{-1}(A - A^*))| \ge t \big).$$

Moreover, as $\|A + A^*\|_F, \|A - A^*\|_F = O(\|A\|_F)$ and $\|A + A^*\|, \|A - A^*\| = O(\|A\|)$, it suffices to prove the theorem in the case where $A$ is Hermitian. Next, we observe that any Hermitian matrix $A$ can be written as $A := A_1 - A_2$ where the $A_i$ are positive semi-definite and $\max_{i=1,2} \|A_i\| \le \|A\|$, $\max_{i=1,2} \|A_i\|_F \le \|A\|_F$. (In fact, the positive eigenvalues of $A_1$ are the positive eigenvalues of $A$, and the positive eigenvalues of $A_2$ are the absolute values of the negative eigenvalues of $A$.) This enables us to further reduce the problem to the case when $A$ is positive semi-definite.
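The decomposition $A = A_1 - A_2$ used above can be written down explicitly from the spectral decomposition; the following sketch (for a real symmetric test matrix of arbitrary size) constructs it and checks the norm inequalities.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40

# A Hermitian test matrix (here real symmetric for simplicity).
A = rng.standard_normal((n, n))
A = (A + A.T) / 2.0

# Split A = A1 - A2: A1 keeps the positive eigenvalues of A,
# A2 the absolute values of the negative ones.
w, V = np.linalg.eigh(A)
A1 = (V * np.clip(w, 0.0, None)) @ V.T
A2 = (V * np.clip(-w, 0.0, None)) @ V.T

print("A == A1 - A2:", np.allclose(A, A1 - A2))
print("||A1|| <= ||A||:", np.linalg.norm(A1, 2) <= np.linalg.norm(A, 2) + 1e-9)
print("||A2||_F <= ||A||_F:",
      np.linalg.norm(A2, "fro") <= np.linalg.norm(A, "fro") + 1e-9)
```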
Finally, as the content of the theorem is invariant under scaling, we can assume that $\|A\| = 1$. Let $1 = c_1 \ge c_2 \ge \dots \ge c_n \ge 0$ be the eigenvalues of $A$, together with corresponding orthonormal eigenvectors $\{u_1, \dots, u_n\}$. We have

(12) $X^* A X - \mathrm{trace}(A) = \sum_{j=1}^n c_j |u_j^* X|^2 - \sum_{j=1}^n c_j.$

This is precisely the setting of the projection lemmas. By Lemma 1.2, we know that for any numbers $0 \le d_j \le 1$, $j \in J$,

(13) $\mathbb{P}\Big( \Big| \sum_{j\in J} d_j |u_j^* X|^2 - \sum_{j\in J} d_j \Big| \ge 2t \Big(\sum_{j\in J} d_j\Big)^{1/2} + t^2 \Big) \le C \exp(-C' K^{-2} t^2).$

However, it is somewhat wasteful to apply this directly to (12). We will perform an extra partition step. Set

$$J_k := \Big\{ 1 \le j \le n : \frac{1}{4^{k+1}} \le c_j \le \frac{1}{4^k} \Big\}, \quad 0 \le k \le k_0 := 10 \log n,$$

and let $J_{k_0+1}$ be the collection of the remaining indices. For each $0 \le k \le k_0 + 1$, apply Lemma 1.2 to $d_i := 4^k c_i$, $i \in J_k$; we have, for any $s \ge 0$,

$$\mathbb{P}\Big( \Big| \sum_{i\in J_k} 4^k c_i \big( |u_i^* X|^2 - 1 \big) \Big| \ge 2s \Big(\sum_{i\in J_k} 4^k c_i\Big)^{1/2} + s^2 \Big) \le C \exp(-C' K^{-2} s^2).$$

Set $s := t/\|A\|_F$ and divide by $4^k$; the above inequality becomes

$$\mathbb{P}\Big( \Big| \sum_{i\in J_k} c_i \big( |u_i^* X|^2 - 1 \big) \Big| \ge \frac{2t}{2^k \|A\|_F} \Big(\sum_{i\in J_k} c_i\Big)^{1/2} + \frac{t^2}{4^k \|A\|_F^2} \Big) \le C \exp\Big(-C' K^{-2} \frac{t^2}{\|A\|_F^2}\Big).$$

Apparently, $\sum_{k=0}^{k_0+1} \frac{t^2}{4^k \|A\|_F^2} \le \frac{2t^2}{\|A\|_F^2}$. Moreover, $\sum_{i\in J_{k_0+1}} c_i \le n \times n^{-6} = n^{-5}$ and

$$\sum_{0\le k\le k_0} 2^{-k} \Big(\sum_{i\in J_k} c_i\Big)^{1/2} \le k_0^{1/2} \Big( \sum_{k=0}^{k_0} 4^{-k} \sum_{i\in J_k} c_i \Big)^{1/2} \le 4 \log^{1/2} n \Big( 4 \sum_{k=0}^{k_0} \sum_{i\in J_k} c_i^2 \Big)^{1/2} \le 8 \log^{1/2} n \, \|A\|_F,$$

by the Cauchy-Schwarz inequality. Putting the above estimates together and using the union bound, we obtain

$$\mathbb{P}\Big( \Big| \sum_{i=1}^n c_i \big( |u_i^* X|^2 - 1 \big) \Big| \ge 16 \log^{1/2} n \, t + \frac{2t^2}{\|A\|_F^2} + n^{-5} \Big) \le C \log n \, \exp\Big(-C' K^{-2} \frac{t^2}{\|A\|_F^2}\Big).$$

We can ignore the small term $n^{-5}$ (by slightly adjusting the constant 16), and the desired bound follows.

Remark 3.1.
If we have more information about $A$, the $\log n$ term can be improved. For instance, if all eigenvalues of $A$ are comparable, then we do not need this term.

The proof of Theorem 1.5 uses Lemma 1.3 and is left as an exercise. To prove Corollary 1.6, notice that we can obtain an analogue of (11):

(14) $\mathbb{P}\big( |X^* A X - \mathrm{trace}(A)| \ge t \big) \le C \log n \, \exp\Big(-C' K^{-2} \min\Big\{ \frac{t^2}{\|A\|_F^2 \log n}, \frac{t}{\|A\|} \Big\}\Big) + n \exp(-b K^{1/\alpha}/2),$

under the assumption that $K = \omega(\log^\alpha n)$. To optimize the bound, we choose $K$ such that $K^{-2} \min\{ \frac{t^2}{\|A\|_F^2 \log n}, \frac{t}{\|A\|} \} = K^{1/\alpha}$. This leads to setting

$$K := \min\Big\{ \Big(\frac{t}{\|A\|_F \sqrt{\log n}}\Big)^{2\alpha/(2\alpha+1)}, \Big(\frac{t}{\|A\|}\Big)^{\alpha/(2\alpha+1)} \Big\}.$$

Assume

(15) $t = \omega\big( (\|A\|_F + \log^\alpha n \, \|A\|) \log^{\alpha+1} n \big).$

This assumption guarantees $K = \omega(\log^\alpha n)$. It also implies $n \exp(-b K^{1/\alpha}/2) \le \exp(-b K^{1/\alpha}/4)$, proving Corollary 1.6.

3.2. Comparison to earlier results.
In 1971, Hanson and Wright [14] obtained the first important inequality for sub-gaussian random variables.
Theorem 3.2 (Hanson-Wright inequality). Let $X = (\xi_1, \dots, \xi_n) \in \mathbb{R}^n$ be a random vector with the $\xi_i$ being iid symmetric sub-gaussian random variables with mean 0 and variance 1. There exist constants $C, C' > 0$ such that the following holds. Let $A$ be a real matrix of size $n$ with entries $a_{ij}$ and $B := (|a_{ij}|)$. Then

(16) $\mathbb{P}\big( |X^T A X - \mathrm{trace}(A)| \ge t \big) \le C \exp\Big(-C' \min\Big\{ \frac{t^2}{\|A\|_F^2}, \frac{t}{\|B\|} \Big\}\Big)$

for any $t > 0$.

Later, Wright [34] extended Theorem 3.2 to non-symmetric random variables. Recently, Hsu, Kakade and Zhang [16] showed that one can obtain a better upper tail (notice that $\|B\|$ is replaced by $\|A\|$):

(17) $\mathbb{P}\big( X^T A X - \mathrm{trace}(A) \ge t \big) \le C \exp\Big(-C' \min\Big\{ \frac{t^2}{\|A\|_F^2}, \frac{t}{\|A\|} \Big\}\Big),$

under a considerably weaker assumption (which, in particular, does not require the $\xi_i$ to be independent). On the other hand, their method does not cover the lower tail. Let us pause here to point out a strong distinction between the linear case and the quadratic case: in the linear case (Chernoff type bounds), the lower tail follows from the upper tail by simply switching $\xi_i$ to $-\xi_i$, but this trick is useless in the quadratic case. Recently, Rudelson and Vershynin [23] proved the Hanson-Wright inequality

(18) $\mathbb{P}\big( |X^T A X - \mathrm{trace}(A)| \ge t \big) \le C \exp\Big(-C' \min\Big\{ \frac{t^2}{\|A\|_F^2}, \frac{t}{\|A\|} \Big\}\Big),$

assuming the $\xi_i$ are sub-gaussian. In the previous papers, the random variables $\xi_i$ are required to be real. A few years ago, motivated by the delocalization problem for random matrices, Erdős, Schlein and Yau [11] considered the complex case.
By assuming either that both the real and imaginary parts of $\xi_i$ are iid sub-gaussian or that the distribution of $\xi_i$ is rotationally symmetric (real and imaginary parts still sub-gaussian), they proved

(19) $\mathbb{P}\big( |X^* A X - \mathrm{trace}(A)| \ge t \big) \le C \exp\Big(-C' \frac{t}{\|A\|_F}\Big).$

Later, Erdős, Yau and Yin [13] showed that if the $\xi_i$ are independent sub-exponential random variables with exponent $\alpha >$
0, having mean 0 and variance 1, then

(20) $\mathbb{P}\big( |X^* A X - \mathrm{trace}(A)| \ge t \big) \le C \exp\Big(-C' \Big(\frac{t}{\|A\|_F}\Big)^{1/(1+\alpha)}\Big).$

To simplify the comparison, let us ignore the $\log n$ terms in our theorems (which play little role in practice). If $K = O(1)$, then the main difference between Theorem 3.2 of Hanson and Wright and Theorem 1.4 is that the term $\|B\|$ in Theorem 3.2 is now replaced by $\|A\|$. It is easy to see that $\|B\| \ge \|A\|$ for any real matrix $A$. In fact, in many cases, $\|B\|$ is significantly larger than $\|A\|$. For instance, a random matrix $A$ with entries of order 1 typically has spectral norm of order $\sqrt{n}$, but in this case it is clear that $B$ has spectral norm of order $n$ (as all row sums are of this order). The same holds for several classical explicit matrices, such as the Hadamard matrix. In these cases, our bound improves Hanson-Wright's significantly. Furthermore, our result applies in the complex case, while the approach used by Hanson and Wright is restricted to the real case. Comparing to (19), we do not need the fairly restrictive assumption that either both the real and imaginary parts of $\xi_i$ are iid sub-gaussian or the distribution of $\xi_i$ is rotationally symmetric. In the case $K = O(1)$, both terms $\frac{t^2}{\|A\|_F^2}$ and $\frac{t}{\|A\|}$ in our bound can be considerably larger than $\frac{t}{\|A\|_F}$. For instance, $\frac{t}{\|A\|}$ and $\frac{t}{\|A\|_F}$ differ by a factor $\sqrt{n}$ in both the random and Hadamard cases. In order to make a Hanson-Wright type bound non-trivial, we need to assume $t \ge \|A\|_F + \|A\|$. In many applications, we want the probability bound to be polynomially or even super-polynomially small, i.e. $n^{-O(1)}$ or $n^{-\omega(1)}$.
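The gap between $\|B\|$ and $\|A\|$ for random sign matrices, mentioned above, is easy to confirm numerically; in the following sketch (size and seed arbitrary) $B$ is the all-ones matrix, so $\|B\| = n$ while $\|A\|$ is of order $\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 512

# Random +-1 matrix A and its entrywise absolute value B.
A = rng.choice([-1.0, 1.0], size=(n, n))
B = np.abs(A)  # the all-ones matrix

norm_A = np.linalg.norm(A, 2)  # typically O(sqrt(n))
norm_B = np.linalg.norm(B, 2)  # equals n for the all-ones matrix

print("n           =", n)
print("||A||       =", norm_A)
print("||B||       =", norm_B)
print("||B||/||A|| =", norm_B / norm_A)
```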
This requires a lower bound $\log^{\Omega(1)} n \, (\|A\|_F + \|A\|)$ on $t$, which is consistent with the assumption (6) in Corollary 1.6. Notice that (7) compares favorably to (20). For the term $\frac{t}{\|A\|_F}$, the exponent $2/(2\alpha+1)$ is superior to $1/(1+\alpha) = 2/(2\alpha+2)$ (notice that we are talking about a double exponent, so an improvement here could improve the quality of the bound quite a lot). For the term $\frac{t}{\|A\|}$, the exponent $1/(2\alpha+1)$ is still better than $1/(2\alpha+2)$. Furthermore, $\|A\|$ can be significantly smaller than $\|A\|_F$, as discussed earlier.

4. Random matrices and the Stieltjes transform
This section serves as a preparation, in which we recall several facts about random matrices. The empirical spectral distribution (ESD) function of the $n \times n$ Hermitian matrix $W_n := \frac{1}{\sqrt{n}} M_n = \frac{1}{\sqrt{n}} (\zeta_{ij})_{1\le i,j\le n}$ is the one-dimensional function

$$F^{W_n}(x) = \frac{1}{n} |\{ 1 \le j \le n : \lambda_j(W_n) \le x \}|,$$

where $|I|$ denotes the cardinality of a set $I$. We are going to focus on the case when the entries of $M_n$ are $K$-bounded; it is easy to extend the argument to $K$-concentrated entries.

The Stieltjes transform of a real measure $\mu(x)$ is defined for any complex number $z$ not in the support of $\mu$ as

$$s(z) = \int_{\mathbb{R}} \frac{1}{x - z}\, d\mu(x).$$

Thus, the Stieltjes transform $s_n(z)$ of $W_n$ is

$$s_n(z) = \int_{\mathbb{R}} \frac{1}{x - z}\, dF^{W_n}(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{\lambda_i(W_n) - z}.$$

Furthermore, the Stieltjes transform $s_{sc}(z)$ of the semi-circle distribution is

$$s_{sc}(z) := \int_{\mathbb{R}} \frac{\rho_{sc}(x)}{x - z}\, dx = \frac{-z + \sqrt{z^2 - 4}}{2},$$

where $\sqrt{z^2 - 4}$ denotes the branch of the square root with branch cut $[-2, 2]$ which asymptotically equals $z$ at infinity [4]. The beauty (and power) of the Stieltjes transform lies in the fact that it has a clear linear algebra content: $s_n(z)$ of $W_n$ is exactly $\frac{1}{n}$ times the trace of the matrix $(W_n - zI)^{-1}$. This allows us to compute the Stieltjes transform by looking at the diagonal entries of $(W_n - zI)^{-1}$. In matrix theory, the Stieltjes transform plays the role the Fourier transform plays in analysis. If the Stieltjes transforms of two spectral measures are close to each other (for all $z$), then the two measures are more or less the same. In particular, if $s_n(z)$ is close to $s_{sc}(z)$, then the spectral distribution of $W_n$ is close to the semi-circle distribution (see for instance [4, Chapter 11], [10]). We are going to use the following lemma.

Lemma 4.1.
Let $M_n$ be a random Hermitian matrix with independent $K$-bounded entries with mean 0 and variance 1. Let $1/n < \eta < 1/10$ and $L, \varepsilon, \delta > 0$. For any constant $C_1 > 0$, there exists a constant $C_2 > 0$ such that if one has the bound
$$|s_n(z) - s_{sc}(z)| \le \delta$$
with probability at least $1 - n^{-C_2}$, uniformly for all $z$ with $|\mathrm{Re}(z)| \le L$ and $\mathrm{Im}(z) \ge \eta$, then for any interval $I$ in $[-L + \varepsilon, L - \varepsilon]$ with $|I| \ge \max\big(2\eta, \frac{\eta}{\delta}\log\frac{1}{\delta}\big)$, one has
$$\Big| N_I - n \int_I \rho_{sc}(x)\, dx \Big| \le \delta n |I|$$
with probability at least $1 - n^{-C_1}$.

This is [29, Lemma 64], which, in turn, is a variant of [10, Corollary 4.3]. An appropriate application of Lemma 4.1 will imply Theorem 1.10. (As a matter of fact, we are going to prove a little bit more.) In order to use this lemma, we set $L = 4$, $\varepsilon = 1$, and, critically,
$$\eta := \frac{K^2 C \log n}{n \delta^2},$$
where $C = C_2 + 10$. We are going to show that
$$(21)\qquad |s_n(z) - s_{sc}(z)| = o(\delta)$$
holds with probability at least $1 - n^{-C}$ for any fixed $z$ in the region $\{z \in \mathbb{C} : |\mathrm{Re}(z)| \le 4,\ \mathrm{Im}(z) \ge \eta\}$. Notice that in this statement we fix $z$. However, it is simple to strengthen the statement to hold for all $z$, using an $\epsilon$-net argument, exploiting the fact that $s_n(z)$ is Lipschitz continuous with Lipschitz constant $O(n^2)$ (for details, we refer to [9, Theorem 1.1] or [29, Section 5.2]).
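The linear-algebraic description of $s_n(z)$ above is easy to check numerically. The following sketch (the matrix size, seed and test point are arbitrary illustrative choices, not from the paper) compares the trace form of $s_n(z)$ with the eigenvalue sum, and both with $s_{sc}(z)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400                                  # illustrative size only

# Wigner-type matrix with K-bounded (+/-1) entries, normalized as W_n = M_n / sqrt(n).
M = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(M) + np.triu(M, 1).T         # make it real symmetric
W = M / np.sqrt(n)

z = 0.3 + 0.5j                           # a test point with Im(z) > 0

# s_n(z) two ways: (1/n) trace of the resolvent vs the eigenvalue sum.
s_trace = np.trace(np.linalg.inv(W - z * np.eye(n))) / n
s_eig = np.mean(1.0 / (np.linalg.eigvalsh(W) - z))
assert abs(s_trace - s_eig) < 1e-10

# Stieltjes transform of the semicircle law; pick the branch with Im(s) > 0 for Im(z) > 0.
s_sc = (-z + np.sqrt(z * z - 4)) / 2
if s_sc.imag < 0:
    s_sc = (-z - np.sqrt(z * z - 4)) / 2
print(abs(s_trace - s_sc))               # already small at this modest n
```

Even at this modest size, $s_n(z)$ is close to $s_{sc}(z)$ at a point well inside the upper half-plane, in line with the global semi-circle law.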
In order to show that $s_n(z)$ is close to $s_{sc}(z)$, the key observation is that $s_{sc}(z)$ can also be defined by the equation
$$(22)\qquad s_{sc}(z) = -\frac{1}{z + s_{sc}(z)}.$$
This equation is stable, so if we can show $s_n(z) \approx -\frac{1}{z + s_n(z)}$, then it follows that $s_n(z) \approx s_{sc}(z)$. This observation was due to Bai et al. [2], who used it to prove the $n^{-1/2}$ rate of convergence of $s_n(z)$ to $s_{sc}(z)$. In [9, 10, 11], Erdős et al. refined Bai's approach to prove the local semi-circle law at scales finer than $n^{-1/2}$, ultimately at the scale $\frac{\log^C n}{n}$ [10]. Our main contribution here is to push the scale further down to $\frac{\log n}{n}$, which we believe is (at most) a factor $\sqrt{\log n}$ from the truth.

Recall that $s_n(z)$ is $\frac{1}{n}$ times the trace of $(W_n - zI)^{-1}$. By computing the diagonal entries, one can show (see [4, Chapter 11], [10] or [29, Lemma 39])
$$(23)\qquad s_n(z) = \frac{1}{n} \sum_{k=1}^n \frac{1}{\frac{\zeta_{kk}}{\sqrt{n}} - z - Y_k},$$
where $Y_k = a_k^* (W_{n,k} - zI)^{-1} a_k$, $W_{n,k}$ is the matrix $W_n$ with the $k$-th row and column removed, and $a_k$ is the $k$-th row of $W_n$ with the $k$-th element removed.

The entries of $a_k$ are independent of each other and of $W_{n,k}$, and have mean zero and variance $1/n$. By linearity of expectation we have
$$\mathbf{E}(Y_k \mid W_{n,k}) = \frac{1}{n}\operatorname{trace}(W_{n,k} - zI)^{-1} = \Big(1 - \frac{1}{n}\Big) s_{n,k}(z),$$
where
$$s_{n,k}(z) := \frac{1}{n-1} \sum_{i=1}^{n-1} \frac{1}{\lambda_i(W_{n,k}) - z}$$
is the Stieltjes transform of $W_{n,k}$. From the Cauchy interlacing law, we can get
$$\Big|s_n(z) - \Big(1 - \frac{1}{n}\Big) s_{n,k}(z)\Big| = O\Big( \frac{1}{n} \int_{\mathbb{R}} \frac{1}{|x - z|^2}\, dx \Big) = O\Big( \frac{1}{n\eta} \Big) = o(\delta),$$
and thus
$$\mathbf{E}(Y_k \mid W_{n,k}) = s_n(z) + o(\delta).$$
The heart of the matter now is the following concentration result.
Lemma 4.2.
Let $M_n$ be as in Lemma 4.1. For $1 \le k \le n$, $Y_k = \mathbf{E}(Y_k \mid W_{n,k}) + o(\delta)$ holds with probability at least $1 - O(n^{-C})$ for any $z$ with $|\mathrm{Re}(z)| \le 4$ and $\mathrm{Im}(z) \ge \eta$.

To prove this lemma, we are going to make essential use of the weighted projection lemma, as shown in the next section.

Proof of Lemma 4.2 and Threshold of the Local Law
We are going to prove Lemma 4.2 and the following more quantitative version of Theorem 1.10.
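As a small numerical illustration of the stability of the self-consistent equation (22) (an illustrative sketch only; the test point is an arbitrary choice), iterating the map $s \mapsto -1/(z+s)$ from a crude starting point converges rapidly to $s_{sc}(z)$ anywhere in the upper half-plane:

```python
import numpy as np

z = 0.3 + 0.5j                 # any point with Im(z) > 0

# The map s -> -1/(z + s) preserves Im(s) > 0 and contracts toward s_sc(z).
s = -1.0 / z                   # crude starting point (the large-|z| asymptotic)
for _ in range(200):
    s = -1.0 / (z + s)

# Closed form, branch with Im(s_sc) > 0 for Im(z) > 0.
s_sc = (-z + np.sqrt(z * z - 4)) / 2
if s_sc.imag < 0:
    s_sc = (-z - np.sqrt(z * z - 4)) / 2

assert abs(s + 1.0 / (z + s)) < 1e-12    # s solves the fixed-point equation (22)
assert abs(s - s_sc) < 1e-10             # and agrees with the closed form
```

The contraction rate of the iteration is $|s_{sc}(z)|^2 < 1$, which is the quantitative content of the stability used in the proof.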
Theorem 5.1.
For any constants $\epsilon, \delta, C_1 > 0$, there is a constant $C_2 > 0$ such that the following holds. Let $M_n$ be a Hermitian matrix whose upper diagonal entries are independent random variables with mean 0 and variance 1. Assume furthermore that for $1 \le i \le n$, the vectors $X_i$, obtained by deleting the $i$-th entry of the $i$-th row vector of $M_n$, are $K$-concentrated. Then with probability at least $1 - n^{-C_1}$, we have
$$\Big| N_I - n \int_I \rho_{sc}(x)\, dx \Big| \le \delta n \int_I \rho_{sc}(x)\, dx$$
for all intervals $I \subset (-2 + \epsilon, 2 - \epsilon)$ of length at least $C_2 K^2 \log n / n$.

First, we record a lemma that provides a crude upper bound on the number of eigenvalues in short intervals.
Lemma 5.2.
Let $M_n$ be a random Hermitian matrix with independent $K$-bounded entries with mean 0 and variance 1. For any constant $C_1 > 0$, there exists a constant $C_2 > 0$ such that for any interval $I \subset \mathbb{R}$ with $|I| \ge \frac{C_2 K^2 \log n}{n}$, one has $N_I \ll n|I|$ with probability at least $1 - n^{-C_1}$.

This lemma is Proposition 66 in [29], which is a variant of [11, Theorem 5.1]. Notice that
$$(24)\qquad Y_k = a_k^* (W_{n,k} - zI)^{-1} a_k = \sum_{j=1}^{n-1} \frac{|u_j(W_{n,k})^* a_k|^2}{\lambda_j(W_{n,k}) - z} = \frac{1}{n} \sum_{j=1}^{n-1} \frac{|u_j(W_{n,k})^* X_k|^2}{\lambda_j(W_{n,k}) - z},$$
where $X_k = \sqrt{n}\, a_k$ is the $k$-th row of $M_n$ with the $k$-th element removed. Note that the entries of $X_k$ are independent with mean 0 and variance 1. Therefore,
$$(25)\qquad |Y_k - \mathbf{E}(Y_k \mid W_{n,k})| = \frac{1}{n}\Big| \sum_{j=1}^{n-1} \frac{|u_j(W_{n,k})^* X_k|^2 - 1}{\lambda_j(W_{n,k}) - z} \Big| = \frac{1}{n}\Big| \sum_{j=1}^{n-1} \frac{R_j}{\lambda_j(W_{n,k}) - x - \sqrt{-1}\,\eta} \Big|,$$
where $R_j := |u_j(W_{n,k})^* X_k|^2 - 1$. By symmetry, we can restrict the sum to those indices $j$ where $\lambda_j(W_{n,k}) - x \ge 0$. Let $J_0$ be the set of indices $j$ such that $0 \le \lambda_j(W_{n,k}) - x \le \eta$. Since $x = \mathrm{Re}\, z$ and $\eta = \mathrm{Im}\, z$, we have
$$\frac{1}{n}\Big| \sum_{j \in J_0} \frac{R_j}{\lambda_j(W_{n,k}) - x - \sqrt{-1}\,\eta} \Big| \le \frac{1}{n}\Big| \sum_{j \in J_0} \frac{\lambda_j(W_{n,k}) - x}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2}\, R_j \Big| + \frac{1}{n}\Big| \sum_{j \in J_0} \frac{\eta}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2}\, R_j \Big|$$
$$\le \frac{1}{n\eta}\Big| \sum_{j \in J_0} \frac{(\lambda_j(W_{n,k}) - x)\,\eta}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2}\, R_j \Big| + \frac{1}{n\eta}\Big| \sum_{j \in J_0} \frac{\eta^2}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2}\, R_j \Big|.$$
Consider the sum
$$S_1 := \frac{1}{n\eta}\Big| \sum_{j \in J_0} \frac{(\lambda_j(W_{n,k}) - x)\,\eta}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2}\, R_j \Big|.$$
As $0 \le \frac{(\lambda_j(W_{n,k}) - x)\,\eta}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2} \le 1$, we are in position to apply Lemma 1.2. Taking $t = C_0 K\sqrt{\log n}$ with a sufficiently large constant $C_0$, by (3) we have
$$S_1 \le \frac{C_0}{n\eta}\big(2K\sqrt{|J_0|\log n} + C_0 K^2 \log n\big)$$
with probability at least $1 - C\exp(-C' C_0^2 \log n) \ge 1 - n^{-C_1}/4$. By Lemma 5.2, $|J_0| \le B n\eta$ with probability at least $1 - n^{-C_1}$, for some sufficiently large constant $B > 0$. Recall $\eta := \frac{K^2 C \log n}{n\delta^2}$; it follows that with probability at least $1 - n^{-C_1}/2$ we have
$$S_1 \le C_0 C^{-1/2}\delta\big(2\sqrt{B} + C_0 C^{-1/2}\delta\big).$$
Thus, for $C$ sufficiently large compared to $C_0$ and $B$, we have $S_1 \le \delta^2$. Similarly, we can prove the same bound for
$$S_2 := \frac{1}{n\eta}\Big| \sum_{j \in J_0} \frac{\eta^2}{(\lambda_j(W_{n,k}) - x)^2 + \eta^2}\, R_j \Big|.$$
For the other eigenvalues, we divide the real line into small intervals. For integer $l \ge 0$, let $J_l$ be the set of eigenvalues $\lambda_j(W_{n,k})$ such that $10^l \eta < \lambda_j(W_{n,k}) - x \le 10^{l+1}\eta$. The number of such $J_l$ is at most $20\log n$. By Lemma 5.2 one has $|J_l| \le B\, 10^l n\eta$ with probability at least $1 - n^{-C_1}$, for some sufficiently large constant $B > 0$. Again by Lemma 1.2 (taking $t = C_0 K\sqrt{\log n}$),
$$\frac{1}{n}\Big|\sum_{j\in J_l} \frac{R_j}{\lambda_j(W_{n,k}) - x - \sqrt{-1}\,\eta}\Big| \le \frac{1}{n}\Big|\sum_{j\in J_l} \frac{\lambda_j - x}{(\lambda_j - x)^2 + \eta^2}\, R_j\Big| + \frac{1}{n}\Big|\sum_{j\in J_l} \frac{\eta}{(\lambda_j - x)^2 + \eta^2}\, R_j\Big|$$
$$\le \frac{1}{10^l n\eta}\Big|\sum_{j\in J_l} \frac{10^l \eta\,(\lambda_j - x)}{(\lambda_j - x)^2 + \eta^2}\, R_j\Big| + \frac{1}{10^l n\eta}\Big|\sum_{j\in J_l} \frac{(10^l\eta)^2}{(\lambda_j - x)^2 + \eta^2}\, R_j\Big|$$
$$\le \frac{C_0 K}{10^l n\eta}\big(2\sqrt{|J_l|}\sqrt{\log n} + C_0 K\log n\big) \le 3C_0\sqrt{B}\, C^{-1/2}\,\delta\, 10^{-l/2}$$
with probability at least $1 - C\exp(-C' C_0^2 \log n) - n^{-C_1} \ge 1 - n^{-C_1}/2$.

Summing over $l$, we have
$$\frac{1}{n}\Big|\sum_l \sum_{j\in J_l} \frac{R_j}{\lambda_j(W_{n,k}) - x - \sqrt{-1}\,\eta}\Big| \le 10\, C_0\sqrt{B}\, C^{-1/2}\,\delta \le \delta$$
with probability at least $1 - n^{-C_1}/2$, for $C$ sufficiently large. This completes the proof of Lemma 4.2.

Inserting the bounds into (23), one has
$$s_n(z) + \frac{1}{n}\sum_{k=1}^n \frac{1}{s_n(z) + z + o(\delta)} = 0$$
with probability at least $1 - O(n^{-C})$. (The term $|\zeta_{kk}/\sqrt{n}| = o(\delta)$, as $|\zeta_{kk}| \le K$ by assumption.) Comparing this equation with (22), one can use a continuity argument (see [28] for details) to obtain $|s_n(z) - s_{sc}(z)| \le \delta$ with probability at least $1 - O(n^{-C+100})$.

By Lemma 4.1, it follows that for random matrices $M_n$ with $K$-bounded entries, for any constant $C_1 > 0$, there exists a constant $C_2 > 0$ such that for any $1/n < \delta \le 1/2$ and any interval $I \subset (-3, 3)$ of length at least $C_2 K^2 \log n/(n\delta^3)$,
$$(26)\qquad \Big|N_I - n\int_I \rho_{sc}(x)\,dx\Big| \le \delta n |I|$$
holds with probability at least $1 - n^{-C_1}$. In particular, Theorem 5.1 follows.

The infinity norm of eigenvectors
We prove Theorem 1.7 in the following more general form.
Theorem 6.1 (Optimal infinity norm of eigenvectors). Let $M_n$ be a Hermitian matrix whose upper diagonal entries are independent random variables with mean 0 and variance 1. Further assume that for any index $1 \le i \le n$, the vector $X_i$, obtained by deleting the $i$-th entry of the $i$-th row vector of $M_n$, is $K$-concentrated. Let $W_n = \frac{1}{\sqrt{n}} M_n$. Then for any constant $C_1 > 0$, there is a constant $C_2 > 0$ such that the following holds.
• (Bulk case) With probability at least $1 - n^{-C_1}$, for any $\epsilon > 0$ and any $1 \le i \le n$ with $\lambda_i(W_n) \in [-2 + \epsilon, 2 - \epsilon]$, there is a unit eigenvector $u_i(W_n)$ of $\lambda_i(W_n)$ satisfying $\|u_i(W_n)\|_\infty \le \frac{C_2 K \log^{1/2} n}{\sqrt{n}}$.
• (Edge case) With probability at least $1 - n^{-C_1}$, for any $\epsilon > 0$ and any $1 \le i \le n$ with $\lambda_i(W_n) \in [-2 - \epsilon, -2 + \epsilon] \cup [2 - \epsilon, 2 + \epsilon]$, there is a unit eigenvector $u_i(W_n)$ of $\lambda_i(W_n)$ satisfying $\|u_i(W_n)\|_\infty \le \frac{C_2 K \log n}{\sqrt{n}}$.

We give here the proof of the first part of Theorem 6.1. The proof of the second part is somewhat different and is deferred to the appendix. With the threshold for the local semi-circle law, we are able to derive the eigenvector delocalization results thanks to the next lemma.
Lemma 6.2 (Eq (4.3), [9] or Lemma 41, [29]). Let
$$W_n = \begin{pmatrix} a & Y^* \\ Y & W_{n-1} \end{pmatrix}$$
be an $n \times n$ Hermitian matrix for some $a \in \mathbb{C}$ and $Y \in \mathbb{C}^{n-1}$, and let $\binom{x}{v}$ be an eigenvector of $W_n$ with eigenvalue $\lambda_i(W_n)$, where $x \in \mathbb{C}$ and $v \in \mathbb{C}^{n-1}$. Assume none of the eigenvalues of $W_{n-1}$ equals $\lambda_i(W_n)$. Then
$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1} (\lambda_j(W_{n-1}) - \lambda_i(W_n))^{-2}\, |u_j(W_{n-1})^* Y|^2},$$
where $u_j(W_{n-1})$ is a unit eigenvector corresponding to the eigenvalue $\lambda_j(W_{n-1})$.

The assumption that the eigenvalues of $W_n$ and $W_{n-1}$ do not collide was taken care of in [32, Section 3.1], so we can assume that the above formula makes sense in applications. First, for the bulk case, for any $\lambda_i(W_n) \in (-2 + \varepsilon, 2 - \varepsilon)$, by Theorem 5.1, one can find an interval $I \subset (-2 + \varepsilon, 2 - \varepsilon)$, centered at $\lambda_i(W_n)$ and with length $|I| = K^2 C_2 \log n/n$, such that $N_I \ge \delta_1 n|I|$ (for some constant $\delta_1 > 0$) with probability at least $1 - n^{-C_1-1}$. By the Cauchy interlacing law, we can find a set $J \subset \{1, \ldots, n-1\}$ with $|J| \ge N_I/2$ and $|\lambda_j(W_{n-1}) - \lambda_i(W_n)| \le |I|$ for all $j \in J$. Let $X$ be the first column of $M_n$ with the first entry removed. Then $X = \sqrt{n}\, Y$.
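The coordinate formula in Lemma 6.2 can be sanity-checked numerically on a small matrix (an illustrative sketch; the size, seed and eigenvalue index are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8                                   # tiny illustrative size
A = rng.standard_normal((n, n))
W = (A + A.T) / 2                       # a real symmetric matrix

lam, U = np.linalg.eigh(W)
i = 3                                   # any eigenvalue index
x = U[0, i]                             # first coordinate of the i-th unit eigenvector

Y = W[1:, 0]                            # first column with the first entry removed
mu, V = np.linalg.eigh(W[1:, 1:])       # spectrum of the minor W_{n-1}

# |x|^2 = 1 / (1 + sum_j |u_j^* Y|^2 / (mu_j - lam_i)^2), as in Lemma 6.2.
rhs = 1.0 / (1.0 + np.sum((V.T @ Y) ** 2 / (mu - lam[i]) ** 2))
assert abs(x ** 2 - rhs) < 1e-10
```

The identity follows by solving the eigenvector equation for $v$ in terms of $x$ and normalizing, which is exactly how it is used in the delocalization argument below.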
By Lemma 6.2, we have
$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{n-1}(\lambda_j(W_{n-1}) - \lambda_i(W_n))^{-2}\, |u_j(W_{n-1})^* \tfrac{1}{\sqrt{n}} X|^2}$$
$$\le \frac{1}{1 + \sum_{j\in J}(\lambda_j(W_{n-1}) - \lambda_i(W_n))^{-2}\, |u_j(W_{n-1})^* \tfrac{1}{\sqrt{n}} X|^2}$$
$$\le \frac{1}{1 + n^{-1}|I|^{-2} \sum_{j\in J} |u_j(W_{n-1})^* X|^2}$$
$$\le \frac{1}{1 + 100^{-1} n^{-1} |I|^{-2} |J|} \le \frac{200\,|I|}{\delta_1} \le \frac{K^2 C_3 \log n}{n} \qquad (27)$$
for some constant $C_3$, with probability at least $1 - n^{-C_1-1}$. The third inequality follows from (3) by taking $t = \delta_1 K\sqrt{C_2 \log n}$ (say). Thus, by the union bound and symmetry,
$$\|u_i(W_n)\|_\infty \le \frac{C_2 K \log^{1/2} n}{\sqrt{n}}$$
holds with probability at least $1 - n^{-C_1}$.

Appendix A. Proof for the Edge case of Theorem 6.1
For the edge case in Theorem 6.1, we use a different approach based on the next lemma.
Lemma A.1 (Interlacing identity, Lemma 37, [28]). Let $W_{n-1}$ be the matrix $W_n$ with the $n$-th row and $n$-th column removed, and let $Y$ be the $n$-th column of $W_n$ with the $n$-th element $\zeta_{nn}/\sqrt{n}$ removed. If none of the eigenvalues of $W_{n-1}$ equals $\lambda_i(W_n)$, then
$$(28)\qquad \sum_{j=1}^{n-1} \frac{|u_j(W_{n-1})^* Y|^2}{\lambda_j(W_{n-1}) - \lambda_i(W_n)} = \frac{\zeta_{nn}}{\sqrt{n}} - \lambda_i(W_n).$$

By symmetry, it suffices to consider the case $\lambda_i(W_n) \in [2 - \epsilon, 2 + \epsilon]$ for $\epsilon > 0$ small. Denote by $X$ the $n$-th column of $M_n$ with the $n$-th element removed; thus $Y = \frac{1}{\sqrt{n}} X$. By Lemma 6.2, in order to show $|x|^2 \le C_3 K^2 \log^2 n/n$ (for a constant $C_3 > C_1 + 100$) with probability at least $1 - n^{-C_1-1}$, it is enough to show
$$\sum_{j=1}^{n-1} \frac{|u_j(W_{n-1})^* Y|^2}{(\lambda_j(W_{n-1}) - \lambda_i(W_n))^2} \ge \frac{n}{C_3 K^2 \log^2 n}.$$
By the projection lemma, $|u_j(W_{n-1})^* X| \le K\sqrt{C\log n}$ with probability at least $1 - n^{-C}$. It suffices to show that with probability at least $1 - n^{-C_1-1}$,
$$\sum_{j=1}^{n-1} \frac{|u_j(W_{n-1})^* Y|^2}{(\lambda_j(W_{n-1}) - \lambda_i(W_n))^2} \ge \frac{n}{C_3 K^2 \log^2 n}.$$
By the Cauchy–Schwarz inequality, it is enough to show, for some integers $1 \le T_- < T_+ \le n-1$, that
$$\sum_{T_- \le j \le T_+} \frac{|u_j(W_{n-1})^* Y|^2}{|\lambda_j(W_{n-1}) - \lambda_i(W_n)|} \ge \frac{\sqrt{T_+ - T_-}}{C_3^{0.5}\, K\sqrt{\log n}}.$$
By Lemma A.1, we are going to show, for some integers $T_+, T_-$ satisfying $T_+ - T_- = O(\log n)$ (the choice of $T_+, T_-$ will be given later), that
$$(29)\qquad \Big|\sum_{j \ge T_+ \text{ or } j \le T_-} \frac{|u_j(W_{n-1})^* Y|^2}{\lambda_j(W_{n-1}) - \lambda_i(W_n)}\Big| \le 2 - \epsilon - \frac{\sqrt{T_+ - T_-}}{C_3^{0.5}\, K\sqrt{\log n}} + o(1),$$
with probability at least $1 - n^{-C_1-1}$.

Let $\eta = \frac{K^2 C \log n}{n\delta^2}$ with constant $\delta = \epsilon/100$ (say). We divide the real line into disjoint intervals $I_k$ for $k \ge 0$, where $I_0 = (\lambda_i(W_n) - \eta, \lambda_i(W_n) + \eta)$. For $1 \le k \le k_0 = \log^{0.9} n$ (say), $I_k$ has length $2\eta\delta^{-k} = o(1)$ and
$$I_k = \big(\lambda_i(W_n) - \beta_k \eta,\ \lambda_i(W_n) - \beta_{k-1}\eta\big] \cup \big[\lambda_i(W_n) + \beta_{k-1}\eta,\ \lambda_i(W_n) + \beta_k \eta\big),$$
where we denote $\beta_k = \sum_{s=0}^{k} \delta^{-s}$. The distance from $\lambda_i(W_n)$ to the interval $I_k$ satisfies $\mathrm{dist}(\lambda_i(W_n), I_k) \ge \beta_{k-1}\eta$.
For each such interval, by (26), for a sufficiently large constant $C > 0$, the number of eigenvalues $|J_k| = N_{I_k} \le n\alpha_{I_k}|I_k| + \delta^{k+1} n |I_k|$ with probability at least $1 - n^{-C_1-2}$, where $\alpha_{I_k} = \int_{I_k} \rho_{sc}(x)\,dx / |I_k|$. For the $k$-th interval, by (3), taking $t = K\sqrt{C\log n}$, we have that, with probability at least $1 - C''\exp(-C' C \log n) \ge 1 - n^{-C_1-2}$ for sufficiently large $C$,
$$\frac{1}{n}\sum_{j\in J_k} \frac{|u_j(W_{n-1})^* X|^2}{|\lambda_j(W_{n-1}) - \lambda_i(W_n)|} \le \frac{1}{n\,\mathrm{dist}(\lambda_i(W_n), I_k)} \sum_{j\in J_k} |u_j(W_{n-1})^* X|^2$$
$$\le \frac{1}{n\,\mathrm{dist}(\lambda_i(W_n), I_k)} \big(|J_k| + K\sqrt{|J_k|}\sqrt{C\log n} + CK^2\log n\big)$$
$$\le \frac{\alpha_{I_k}|I_k|}{\mathrm{dist}(\lambda_i(W_n), I_k)} + \frac{1}{n\eta\beta_{k-1}}\big(n\delta^{k+1}|I_k| + \sqrt{2}\,K\sqrt{C\log n}\sqrt{n|I_k|} + CK^2\log n\big)$$
$$\le \frac{\alpha_{I_k}|I_k|}{\mathrm{dist}(\lambda_i(W_n), I_k)} + 10\,\delta^{k-1}.$$

For $k \ge k_0 + 1$, let the intervals $I_k$ have the same length $|I_k| = 2\delta^{-k_0}\eta$. Note that the number of such intervals is bounded crudely by $o(n)$. By (26), the number of eigenvalues $|J_k| \le n\alpha_{I_k}|I_k| + \delta^{k_0+1} n|I_k|$ with probability at least $1 - n^{-C_1-2}$, and the distance from $\lambda_i(W_n)$ to the interval $I_k$ satisfies
$$\mathrm{dist}(\lambda_i(W_n), I_k) \ge \beta_{k_0-1}\eta + (k - k_0)|I_k|.$$
The contribution of such intervals can be computed similarly:
$$\frac{1}{n}\sum_{j\in J_k} \frac{|u_j(W_{n-1})^* X|^2}{|\lambda_j(W_{n-1}) - \lambda_i(W_n)|} \le \frac{\alpha_{I_k}|I_k|}{\mathrm{dist}(\lambda_i(W_n), I_k)} + \frac{\delta^{k_0}}{k - k_0}$$
with probability at least $1 - n^{-C_1-2}$.

Summing over all intervals for $k \ge 10$ (say), we obtain
$$(30)\qquad \Big|\sum_{j \ge T_+ \text{ or } j \le T_-} \frac{|u_j(W_{n-1})^* Y|^2}{\lambda_j(W_{n-1}) - \lambda_i(W_n)}\Big| \le \Big|\sum_{I_k} \frac{\alpha_{I_k}|I_k|}{\mathrm{dist}(\lambda_i(W_n), I_k)}\Big| + \delta.$$
On the other hand, it follows from Riemann integration of the principal value integral that
$$\sum_{I_k} \frac{\alpha_{I_k}|I_k|}{\mathrm{dist}(\lambda_i(W_n), I_k)} = \mathrm{p.v.}\int_{-2}^{2} \frac{\rho_{sc}(x)}{\lambda_i(W_n) - x}\,dx + o(1),$$
where
$$\mathrm{p.v.}\int_{-2}^{2} \frac{\rho_{sc}(x)}{\lambda_i(W_n) - x}\,dx := \lim_{\varepsilon \to 0} \int_{-2 \le x \le 2,\ |x - \lambda_i(W_n)| \ge \varepsilon} \frac{\rho_{sc}(x)}{\lambda_i(W_n) - x}\,dx.$$
From the explicit formula for the Stieltjes transform and from residue calculus, one obtains
$$\mathrm{p.v.}\int_{-2}^{2} \frac{\rho_{sc}(x)}{x - \lambda_i(W_n)}\,dx = -\lambda_i(W_n)/2$$
if $|\lambda_i(W_n)| \le 2$, and with the right-hand side replaced by $-\lambda_i(W_n)/2 + \sqrt{\lambda_i(W_n)^2 - 4}/2$ if $|\lambda_i(W_n)| > 2$. Finally, we always have
$$(31)\qquad \Big|\sum_{I_k} \frac{\alpha_{I_k}|I_k|}{\mathrm{dist}(\lambda_i(W_n), I_k)}\Big| \le 1 + \epsilon.$$
Now for the rest of the eigenvalues, those satisfying $|\lambda_i(W_n) - \lambda_j(W_{n-1})| \le |I_0| + |I_1| + \ldots + |I_9| \le 4\eta/\delta^9$, by Theorem 5.1 and the Cauchy interlacing law, the number of eigenvalues is at most $T_+ - T_- \le 8n\eta/\delta^9 = 8CK^2\log n/\delta^{11}$ with probability at least $1 - n^{-C_1-2}$ for a sufficiently large constant $C > 0$. Thus
$$(32)\qquad \frac{\sqrt{T_+ - T_-}}{C_3^{0.5}\, K\sqrt{\log n}} \le \frac{\sqrt{8C/\delta^{11}}}{\sqrt{C_3}} \le \epsilon/2,$$
by choosing $C_3$ sufficiently large compared to $\delta^{-1}$. Thus, from (29), (30), (31) and (32), we have proved that there exists a constant $C_3 > 0$ such that with probability at least $1 - n^{-C_1-1}$, $|x| \le \frac{C_3 K \log n}{\sqrt{n}}$. The conclusion of the second part of Theorem 1.7 follows from symmetry and union bounds.
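The interlacing identity (28) used in this appendix can be verified numerically on a small matrix (an illustrative sketch with arbitrary size and seed; the bottom-right entry plays the role of $\zeta_{nn}/\sqrt{n}$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
W = (A + A.T) / 2                       # a real symmetric matrix

lam = np.linalg.eigvalsh(W)
i = 2                                   # any eigenvalue index
Y = W[:-1, -1]                          # last column, last entry removed
w = W[-1, -1]                           # the removed diagonal entry

mu, V = np.linalg.eigh(W[:-1, :-1])     # eigenpairs of the minor W_{n-1}
lhs = np.sum((V.T @ Y) ** 2 / (mu - lam[i]))
assert abs(lhs - (w - lam[i])) < 1e-8   # identity (28)
```

The identity is just the Schur-complement form of the characteristic equation $\det(W_n - \lambda_i) = 0$, which is why it holds for every eigenvalue index.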
Appendix B. Local Marchenko–Pastur law for random covariance matrices and delocalization of singular vectors
In this appendix, we extend the results obtained for random Hermitian matrices discussed in the previous sections to random covariance matrices, focusing on the changes needed in the proofs. The interested reader can refer to the closely related papers [30] and [33] (see also [12, 22]).

Let $M = M_{p,n} = (\zeta_{ij})_{1 \le i \le p, 1 \le j \le n}$ be a $p \times n$ matrix, where $p = p(n)$ is an integer such that $p \le n$ and $\lim_{n\to\infty} p/n = y \in (0, 1]$, and the entries of $M_{p,n}$ are independent random variables with mean zero and variance one. For such a $p \times n$ random matrix $M$, we form the $n \times n$ (sample) covariance matrix $W = W_{p,n} = \frac{1}{n} M^* M$. This (non-negative definite) matrix has at most $p$ non-zero eigenvalues, which are ordered as
$$0 \le \lambda_1(W) \le \lambda_2(W) \le \ldots \le \lambda_p(W).$$
Denote by $\sigma_1(M), \ldots, \sigma_p(M)$ the singular values of $M$. It is easy to see that $\sigma_i(M) = \sqrt{n}\,\lambda_i(W)^{1/2}$. From the singular value decomposition, there exist orthonormal bases $\{u_1, \ldots, u_p\}$ for $\mathbb{C}^n$ and $\{v_1, \ldots, v_p\}$ for $\mathbb{C}^p$ such that $M u_i = \sigma_i v_i$ and $M^* v_i = \sigma_i u_i$.

A fundamental result concerning the asymptotic limiting behavior of the ESD for large covariance matrices is the Marchenko–Pastur law (see [3] and [18]).
Theorem B.1 (Marchenko–Pastur law). Assume the entries of the matrix $M \in \mathbb{C}^{p\times n}$ are independent random variables with mean zero and variance one, and $\lim_{n\to\infty} p/n = y \in (0, 1]$. Then the empirical spectral distribution of the matrix $W = \frac{1}{n} M^* M$ converges with probability 1 to the Marchenko–Pastur law with density function
$$\rho_{MP,y}(x) := \frac{1}{2\pi x y}\sqrt{(b - x)(x - a)}\;\mathbf{1}_{[a,b]}(x),$$
where $a := (1 - \sqrt{y})^2$ and $b := (1 + \sqrt{y})^2$.

The hard edge of the limiting support of the spectrum refers to the left edge $a$ when $y = 1$, where it gives rise to a singularity of order $x^{-1/2}$. The cases of the left edge $a$ when $y < 1$ and of the right edge $b$, regardless of the value of $y$, are usually called the soft edge. Recent progress on studying the local convergence to the Marchenko–Pastur law includes [12, 22, 30, 33] for the soft edge and [6, 27] for the hard edge. We focus on improving the previous results for the soft edge in this appendix. Our main results for random covariance matrices are the following local Marchenko–Pastur law (LMPL) and the delocalization property of singular vectors.

Theorem B.2.
For any constants $\epsilon, \delta, C_1 > 0$, there exists a constant $C_2 > 0$ such that the following holds. Assume $\lim_{n\to\infty} p/n = y$ for some $0 < y \le 1$. Let $M = M_{p,n} = (\zeta_{ij})_{1\le i\le p, 1\le j\le n}$ be a random matrix whose entries are independent $K$-bounded random variables with mean 0 and variance 1. Consider the covariance matrix $W = \frac{1}{n}M^*M$. Then with probability at least $1 - n^{-C_1}$, one has
$$\Big|N_I(W_{p,n}) - p\int_I \rho_{MP,y}(x)\,dx\Big| \le \delta p \int_I \rho_{MP,y}(x)\,dx$$
for any interval $I \subset (a + \epsilon, b - \epsilon)$ of length at least $C_2 K^2 \log n/n$.

Theorem B.3 (Delocalization of singular vectors). Let $M_{p,n}$ be as in Theorem B.2. For any constant $C_1 > 0$, there is a constant $C_2 > 0$ such that the following holds.
• (Bulk case) With probability at least $1 - n^{-C_1}$, for any $\epsilon > 0$ and any $1 \le i \le p$ such that $\sigma_i(M_{p,n})^2/n \in [a + \epsilon, b - \epsilon]$, there is a left singular vector $u_i$ corresponding to $\sigma_i(M_{p,n})$ such that $\|u_i\|_\infty \le \frac{C_2 K\log^{1/2} n}{\sqrt{n}}$. The same holds for right singular vectors.
• (Edge case) With probability at least $1 - n^{-C_1}$, for any $\epsilon > 0$ and any $1 \le i \le p$ such that $\sigma_i(M_{p,n})^2/n \in [a - \epsilon, a + \epsilon] \cup [b - \epsilon, b + \epsilon]$ if $a \ne 0$, and $\sigma_i(M_{p,n})^2/n \in [4 - \epsilon, 4 + \epsilon]$ if $a = 0$, there is a left singular vector $u_i$ corresponding to $\sigma_i(M_{p,n})$ such that $\|u_i\|_\infty \le \frac{C_2 K\log n}{\sqrt{n}}$. The same holds for right singular vectors.
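As a numerical illustration of Theorem B.1 (and, at a much coarser scale than Theorem B.2, of the counting estimate), the following sketch compares the empirical eigenvalue count of a covariance matrix on a bulk interval with the Marchenko–Pastur prediction; the sizes, interval and tolerance are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2000, 1000                        # aspect ratio y = 1/2 (illustrative sizes)
y = p / n
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2

M = rng.choice([-1.0, 1.0], size=(p, n))           # K-bounded entries, mean 0, variance 1
lam = np.linalg.eigvalsh(M @ M.T / n)              # the p nonzero eigenvalues of (1/n) M* M

def rho_mp(x):
    # Marchenko-Pastur density on [a, b] for aspect ratio y.
    return np.sqrt(np.maximum((b - x) * (x - a), 0.0)) / (2 * np.pi * x * y)

# Empirical eigenvalue count on a bulk interval vs p * integral of rho_mp.
lo, hi = 0.5, 1.5
emp = np.count_nonzero((lam >= lo) & (lam <= hi))
xs = np.linspace(lo, hi, 20001)
pred = p * np.sum(rho_mp(xs)) * (hi - lo) / len(xs)   # crude Riemann sum
assert abs(emp - pred) < 0.05 * pred
```

At scale $O(1)$ the agreement is already very good; Theorems B.2 and B.3 quantify how far down toward the scale $\log n/n$ it persists.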
Remark B.4.
Theorem B.2 and Theorem B.3 actually hold for a larger class of matrices, using the $K$-concentration introduced in the previous sections. For instance, Theorem B.2 holds for random matrices $M_{p,n} = (\zeta_{ij})$ whose entries are independent random variables with mean 0 and variance 1, and whose row vectors are $K$-concentrated. And Theorem B.3 holds if we further assume that the column vectors of $M_{p,n}$ are also $K$-concentrated. Indeed, the $K$-bounded assumption is only used to guarantee $K$-concentration.

B.1. Proof of Theorem B.2.
Similarly to the Hermitian case, we compare the Stieltjes transform of $W$,
$$s(z) := \frac{1}{p}\sum_{i=1}^p \frac{1}{\lambda_i(W) - z},$$
with that of the Marchenko–Pastur law,
$$s_{MP,y}(z) := \int_{\mathbb{R}} \frac{\rho_{MP,y}(x)}{x - z}\,dx = \int_a^b \frac{\sqrt{(b-x)(x-a)}}{2\pi x y\,(x - z)}\,dx.$$
The explicit expression for $s_{MP,y}(z)$ is given by (see [4])
$$s_{MP,y}(z) = \frac{-(y + z - 1) + \sqrt{(y + z - 1)^2 - 4yz}}{2yz},$$
where we take the branch of $\sqrt{(y+z-1)^2 - 4yz}$ with cut at $[a, b]$ that is asymptotically equal to $z - y - 1$ as $z$ tends to infinity. Note that it is uniquely defined by the equation
$$s_{MP,y}(z) + \frac{1}{y + z - 1 + yz\, s_{MP,y}(z)} = 0.$$
We will show that $s(z)$ satisfies a similar equation. The analogue of Lemma 4.1 is the following lemma.

Lemma B.5 (Lemma 29, [30]). Let $M_{p,n}$ be a random matrix with independent $K$-bounded entries with mean 0 and variance 1. Assume $\lim_{n\to+\infty} p/n = y \in (0, 1]$. Let $1/n < \eta < 1/10$ and $L_1, L_2, \varepsilon, \delta > 0$. For any constant $C_1 > 0$, there exists a constant $C_2 > 0$ such that if one has the bound
$$|s(z) - s_{MP,y}(z)| \le \delta$$
with probability at least $1 - n^{-C_2}$, uniformly for all $z$ with $L_1 \le \mathrm{Re}(z) \le L_2$ and $\mathrm{Im}(z) \ge \eta$, then for any interval $I$ in $[L_1 + \varepsilon, L_2 - \varepsilon]$ with $|I| \ge \max\big(2\eta, \frac{\eta}{\delta}\log\frac{1}{\delta}\big)$, one has
$$\Big|N_I - p\int_I \rho_{MP,y}(x)\,dx\Big| \le \delta p |I|$$
with probability at least $1 - n^{-C_1}$.

The objective is to show
$$(33)\qquad |s(z) - s_{MP,y}(z)| = o(\delta)$$
with probability at least $1 - n^{-C}$ for any $z$ in the region $R_y$, where
$$R_y = \{z \in \mathbb{C} : |z| \le 10,\ a - \epsilon \le \mathrm{Re}(z) \le b + \epsilon,\ \mathrm{Im}(z) \ge \eta\}$$
if $y \ne 1$, and
$$R_y = \{z \in \mathbb{C} : |z| \le 10,\ \epsilon \le \mathrm{Re}(z) \le 4 + \epsilon,\ \mathrm{Im}(z) \ge \eta\}$$
if $y = 1$. We use the parameter $\eta := \frac{K^2 C\log n}{n\delta^2}$, where $C = C_2 + 10$. Note that in the region $R_y$, $|s_{MP,y}(z)| = O(1)$.

First, by Schur's complement, one can rewrite
$$(34)\qquad s(z) = \frac{1}{p}\operatorname{trace}(W' - zI)^{-1} = \frac{1}{p}\sum_{k=1}^p \frac{1}{\xi_{kk} - z - Y_k},$$
where $Y_k = a_k^*(W_k - zI)^{-1} a_k$, $W_k$ is the matrix $W' = \frac{1}{n}MM^* = (\xi_{ij})_{1\le i,j\le p}$ with the $k$-th row and $k$-th column removed, and $a_k$ is the $k$-th row of $W'$ with the $k$-th element removed. Let $M_k$ be the $(p-1)\times n$ minor of $M$ with the $k$-th row removed, and let $X_i^* \in \mathbb{C}^n$ ($1\le i\le p$) be the rows of $M$. Thus
$$\xi_{kk} = X_k^* X_k/n = \|X_k\|^2/n, \qquad a_k = \frac{1}{n} M_k X_k, \qquad W_k = \frac{1}{n}M_k M_k^*.$$
Thus
$$Y_k = \sum_{j=1}^{p-1} \frac{|a_k^* v_j(M_k)|^2}{\lambda_j(W_k) - z} = \sum_{j=1}^{p-1} \frac{\lambda_j(W_k)}{n} \cdot \frac{|X_k^* u_j(M_k)|^2}{\lambda_j(W_k) - z},$$
where $u_1(M_k), \ldots, u_{p-1}(M_k) \in \mathbb{C}^n$ and $v_1(M_k), \ldots, v_{p-1}(M_k) \in \mathbb{C}^{p-1}$ are orthonormal right and left singular vectors of $M_k$.
Here we use the fact that $a_k^* v_j(M_k) = \frac{1}{n} X_k^* M_k^* v_j(M_k) = \frac{1}{n}\sigma_j(M_k)\, X_k^* u_j(M_k)$ and $\sigma_j(M_k)^2 = n\lambda_j(W_k)$. The entries of $X_k$ are independent of each other and of $W_k$, and have mean 0 and variance 1. Since the $u_j(M_k)$ are unit vectors, by linearity of expectation we have
$$\mathbf{E}(Y_k \mid W_k) = \sum_{j=1}^{p-1} \frac{1}{n}\cdot\frac{\lambda_j(W_k)}{\lambda_j(W_k) - z} = \frac{p-1}{n} + \frac{z}{n}\sum_{j=1}^{p-1}\frac{1}{\lambda_j(W_k) - z} = \frac{p-1}{n}\big(1 + z\, s_k(z)\big),$$
where
$$s_k(z) = \frac{1}{p-1}\sum_{i=1}^{p-1} \frac{1}{\lambda_i(W_k) - z}$$
is the Stieltjes transform of $W_k$. By the Cauchy interlacing law, we have
$$\Big|s(z) - \Big(1 - \frac{1}{p}\Big) s_k(z)\Big| = O\Big(\frac{1}{p}\int_{\mathbb{R}} \frac{1}{|x-z|^2}\,dx\Big) = O\Big(\frac{1}{p\eta}\Big).$$
Thus
$$\mathbf{E}(Y_k \mid W_k) = \frac{p-1}{n} + z\,\frac{p}{n}\, s(z) + O\Big(\frac{1}{n\eta}\Big) = \frac{p-1}{n} + z\,\frac{p}{n}\, s(z) + o(\delta).$$
On the other hand, $Y_k$ is concentrated about $\mathbf{E}(Y_k \mid W_k)$ with high probability:

Lemma B.6. Let $M_{p,n}$ be as in Lemma B.5. For $1 \le k \le p$, $Y_k = \mathbf{E}(Y_k \mid W_k) + o(\delta)$ holds with probability at least $1 - O(n^{-C})$ for any $z$ in the region $R_y$.
To prove Lemma B.6, we estimate
$$(35)\qquad Y_k - \mathbf{E}(Y_k \mid W_k) = \sum_{j=1}^{p-1}\frac{\lambda_j(W_k)}{n}\cdot\frac{|X_k^* u_j(M_k)|^2 - 1}{\lambda_j(W_k) - z} = \frac{1}{n}\sum_{j=1}^{p-1}\frac{\lambda_j(W_k)}{\lambda_j(W_k) - x - \sqrt{-1}\,\eta}\, R_j,$$
where $R_j = |X_k^* u_j(M_k)|^2 - 1$. Note that $\lambda_j(W_k) = O(1)$. The estimation of (35) is a repetition of the calculation in (25); the interested reader is encouraged to work out the details. Inserting the bounds into (34), we have
$$s(z) + \frac{1}{y + z - 1 + yz\, s(z) + o(\delta)} = 0$$
with probability at least $1 - O(n^{-C})$. By a continuity argument (see for instance [33]), one has $|s(z) - s_{MP,y}(z)| = o(\delta)$ with probability at least $1 - n^{-C+100}$ (say). By Lemma B.5, we have showed that for any constants $\epsilon, C_1 > 0$, there exists a constant $C_2 > 0$ such that for any $0 < \delta < 1/2$ and any interval $I \subset (a - \epsilon, b + \epsilon)$ if $a \ne 0$, or $I \subset (\epsilon, 4 + \epsilon)$ if $a = 0$, of length at least $C_2 K^2 \log n/(n\delta^3)$, with probability at least $1 - n^{-C_1}$,
$$(36)\qquad \Big|N_I - p\int_I \rho_{MP,y}(x)\,dx\Big| \le \delta p|I|.$$
In particular, Theorem B.2 follows.

B.2.
Proof of Theorem B.3.
To prove the delocalization of singular vectors, we need the following formula to express the entries of a singular vector in terms of the singular values and singular vectors of a minor. It is enough to prove the delocalization for the right (unit) singular vectors.
Lemma B.7 (Corollary 25, [30]). Let $p, n \ge 1$, and let
$$M_{p,n} = \begin{pmatrix} M_{p,n-1} & X \end{pmatrix}$$
be a $p \times n$ matrix for some $X \in \mathbb{C}^p$, and let $\binom{u}{x}$ be a right unit singular vector of $M_{p,n}$ with singular value $\sigma_i(M_{p,n})$, where $x \in \mathbb{C}$ and $u \in \mathbb{C}^{n-1}$. Suppose that none of the singular values of $M_{p,n-1}$ are equal to $\sigma_i(M_{p,n})$. Then
$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\, |v_j(M_{p,n-1})^* X|^2},$$
where $v_1(M_{p,n-1}), \ldots, v_{\min(p,n-1)}(M_{p,n-1}) \in \mathbb{C}^p$ is an orthonormal system of left singular vectors corresponding to the non-trivial singular values of $M_{p,n-1}$. In a similar vein, if
$$M_{p,n} = \begin{pmatrix} M_{p-1,n} \\ Y^* \end{pmatrix}$$
for some $Y \in \mathbb{C}^n$, and $\binom{v}{y}$ is a left unit singular vector of $M_{p,n}$ with singular value $\sigma_i(M_{p,n})$, where $y \in \mathbb{C}$ and $v \in \mathbb{C}^{p-1}$, and none of the singular values of $M_{p-1,n}$ are equal to $\sigma_i(M_{p,n})$, then
$$|y|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p-1,n)} \frac{\sigma_j(M_{p-1,n})^2}{(\sigma_j(M_{p-1,n})^2 - \sigma_i(M_{p,n})^2)^2}\, |u_j(M_{p-1,n})^* Y|^2},$$
where $u_1(M_{p-1,n}), \ldots, u_{\min(p-1,n)}(M_{p-1,n}) \in \mathbb{C}^n$ is an orthonormal system of right singular vectors corresponding to the non-trivial singular values of $M_{p-1,n}$.

First, if $\lambda_i(W_{p,n})$ lies within the bulk of the spectrum, by Theorem B.2 one can find an interval $I \subset (a + \varepsilon, b - \varepsilon)$, centered at $\lambda_i(W_{p,n})$ and with length $|I| = K^2 C_2 \log n/n$, such that $N_I \ge \delta_1 n|I|$ (for some constant $\delta_1 > 0$) with probability at least $1 - n^{-C_1-1}$. By the Cauchy interlacing law, we can find a set $J \subset \{1, \ldots, p\}$ with $|J| \ge N_I/2$ and $|\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})| \le |I|$ for all $j \in J$. Thus
$$\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\,|v_j(M_{p,n-1})^* X|^2 \ge \frac{1}{n}\sum_{j\in J} \frac{\lambda_j(W_{p,n-1})}{(\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n}))^2}\,|v_j(M_{p,n-1})^* X|^2$$
$$\ge \frac{c}{n|I|^2}\sum_{j\in J}|v_j(M_{p,n-1})^* X|^2 \ge \frac{c}{100\, n|I|^2}\,|J| \ge \frac{c\,\delta_1}{200\,|I|} \ge \frac{n}{K^2 C_3 \log n}$$
with probability at least $1 - n^{-C_1-1}$ for some constant $C_3 > 0$ (here $c > 0$ is a lower bound for $\lambda_j(W_{p,n-1})$, $j \in J$, in the bulk). The fourth inequality follows from (3) by taking $t = \delta_1 K\sqrt{C_2\log n}$. Thus, by Lemma B.7 and the union bound, $|x|^2 \le \frac{C_3 K^2\log n}{n}$ with probability at least $1 - n^{-C_1-1}$. By symmetry and union bounds, $\|u_i(M_{p,n})\|_\infty \le \frac{C_2 K\log^{1/2} n}{\sqrt{n}}$ holds with probability at least $1 - n^{-C_1}$.

For the edge case, we consider $|\lambda_i(W_{p,n}) - a| = o(1)$ (with $a \ne 0$) or $|\lambda_i(W_{p,n}) - b| = o(1)$. We first record an analogue of Lemma A.1.

Lemma B.8 (Interlacing identity for singular values, Lemma 3.5, [33]). Assume the notation of Lemma B.7. Then for every $i$,
$$(37)\qquad \sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2\, |v_j(M_{p,n-1})^* X|^2}{\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2} = \|X\|^2 - \sigma_i(M_{p,n})^2.$$
Similarly, we have
$$(38)\qquad \sum_{j=1}^{\min(p-1,n)} \frac{\sigma_j(M_{p-1,n})^2\, |u_j(M_{p-1,n})^* Y|^2}{\sigma_j(M_{p-1,n})^2 - \sigma_i(M_{p,n})^2} = \|Y\|^2 - \sigma_i(M_{p,n})^2.$$

By the union bound and Lemma B.7, in order to show $|x|^2 \le C_3 K^2 \log^2 n/n$ with probability at least $1 - n^{-C_1-1}$ for some large constant $C_3 > C_1 + 100$, it is enough to show
$$\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\,|v_j(M_{p,n-1})^* X|^2 \ge \frac{n}{C_3 K^2\log^2 n}.$$
By the projection lemma, $|v_j(M_{p,n-1})^* X| \le K\sqrt{C\log n}$ with probability at least $1 - n^{-C}$. It suffices to show that with probability at least $1 - n^{-C_1-1}$,
$$\sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(M_{p,n-1})^2}{(\sigma_j(M_{p,n-1})^2 - \sigma_i(M_{p,n})^2)^2}\,|v_j(M_{p,n-1})^* X|^2 \ge \frac{n}{C_3 K^2\log^2 n}.$$
By the Cauchy–Schwarz inequality, and noting that $\sigma_i(M_{p,n-1}) \le 3\sqrt{n}$ almost surely (see [15, 35]), it is enough to show, for some integers $1 \le T_- < T_+ \le \min(p, n-1)$ (the choice of $T_-, T_+$ will be given later), that
$$\frac{1}{n}\sum_{T_- \le j \le T_+} \frac{\lambda_j(W_{p,n-1})\,|v_j(M_{p,n-1})^* X|^2}{|\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})|} \ge \frac{\sqrt{T_+ - T_-}}{C_3^{0.5}\, K\sqrt{\log n}}.$$
On the other hand, by the projection lemma, with probability at least $1 - n^{-C_1-1}$, $\|X\|^2/n = y + o(1)$. By (37) in Lemma B.8,
$$(39)\qquad \frac{1}{n}\sum_{j=1}^{\min(p,n-1)} \frac{\lambda_j(W_{p,n-1})\,|v_j(M_{p,n-1})^* X|^2}{\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})} = y + o(1) - \lambda_i(W_{p,n}).$$
It is enough to evaluate
$$(40)\qquad \frac{1}{n}\sum_{j\ge T_+ \text{ or } j \le T_-} \frac{\lambda_j(W_{p,n-1})\,|v_j(M_{p,n-1})^* X|^2}{\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})}.$$
The estimation of (40) is similar to that of (29). We divide the real line into disjoint intervals $I_k$ for $k \ge 0$. Let $\eta = \frac{K^2 C\log n}{n\delta^2}$ with small constant $\delta \le 0.01$. Denote $\beta_k = \sum_{s=0}^k \delta^{-s}$ and let $I_0 = (\lambda_i(W_{p,n}) - \eta, \lambda_i(W_{p,n}) + \eta)$. For $1 \le k \le k_0 = \log^{0.9} n$ (say),
$$I_k = \big(\lambda_i(W_{p,n}) - \beta_k\eta,\ \lambda_i(W_{p,n}) - \beta_{k-1}\eta\big] \cup \big[\lambda_i(W_{p,n}) + \beta_{k-1}\eta,\ \lambda_i(W_{p,n}) + \beta_k\eta\big);$$
thus $|I_k| = 2\delta^{-k}\eta = o(1)$ and the distance from $\lambda_i(W_{p,n})$ to the interval $I_k$ satisfies $\mathrm{dist}(\lambda_i(W_{p,n}), I_k) \ge \beta_{k-1}\eta$.

For each such interval, by (36), for a sufficiently large constant $C > 0$, the number of eigenvalues $|J_k| = N_{I_k} \le p\alpha_{I_k}|I_k| + \delta^{k+1}p|I_k|$ with probability at least $1 - n^{-C_1-2}$, where $\alpha_{I_k} = \int_{I_k}\rho_{MP,y}(x)\,dx/|I_k|$. Taking $t = K\sqrt{C\log n}$ in (3) for $C$ sufficiently large, it follows that with probability at least $1 - C''\exp(-C'C\log n) \ge 1 - n^{-C_1-2}$,
$$\frac{1}{n}\sum_{j\in J_k}\frac{|\lambda_j(W_{p,n-1})|\,|v_j(M_{p,n-1})^* X|^2}{|\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})|} \le \frac{1}{n}\Big(1 + \frac{\lambda_i(W_{p,n})}{\mathrm{dist}(\lambda_i(W_{p,n}), I_k)}\Big)\sum_{j\in J_k}|v_j(M_{p,n-1})^* X|^2$$
$$\le \frac{1}{n}\Big(1 + \frac{\lambda_i(W_{p,n})}{\mathrm{dist}(\lambda_i(W_{p,n}), I_k)}\Big)\big(|J_k| + K\sqrt{|J_k|}\sqrt{C\log n} + CK^2\log n\big)$$
$$\le \frac{1}{n}\Big(1 + \frac{\lambda_i(W_{p,n})}{\mathrm{dist}(\lambda_i(W_{p,n}), I_k)}\Big)\big(p\alpha_{I_k}|I_k| + \delta^{k}p|I_k| + \sqrt{2}\,K\sqrt{C\log n}\,\sqrt{n|I_k|} + CK^2\log n\big)$$
$$\le y\Big(1 + \frac{\lambda_i(W_{p,n})}{\mathrm{dist}(\lambda_i(W_{p,n}), I_k)}\Big)\alpha_{I_k}|I_k| + 100\,\delta^{k-1}.$$

For $k \ge k_0 + 1$, let the intervals $I_k$ have the same length $|I_k| = 2\delta^{-k_0}\eta$. Note that the number of such intervals is bounded crudely by $o(n)$. The distance from $\lambda_i(W_{p,n})$ to the interval $I_k$ satisfies $\mathrm{dist}(\lambda_i(W_{p,n}), I_k) \ge \beta_{k_0-1}\eta + (k - k_0)|I_k|$. The contribution of such intervals can be estimated similarly by
$$\frac{1}{n}\sum_{j\in J_k}\frac{|\lambda_j(W_{p,n-1})|\,|v_j(M_{p,n-1})^* X|^2}{|\lambda_j(W_{p,n-1}) - \lambda_i(W_{p,n})|} \le y\Big(1 + \frac{\lambda_i(W_{p,n})}{\mathrm{dist}(\lambda_i(W_{p,n}), I_k)}\Big)\alpha_{I_k}|I_k| + \frac{\delta^{k_0}}{k - k_0}$$
with probability at least $1 - n^{-C_1-2}$.

Summing over all intervals for $k \ge 10$ (say), we have
$$\sum_{k=10}^{k_0} 100\,\delta^{k-1} + \sum_{k > k_0} \frac{\delta^{k_0}}{k - k_0} \le \delta.$$
Using Riemann integration of the principal value integral, we obtain
$$(41)\qquad y\sum_{I_k}\Big(1 + \frac{\lambda_i(W_{p,n})}{\mathrm{dist}(\lambda_i(W_{p,n}), I_k)}\Big)\alpha_{I_k}|I_k| = \Big|\,\mathrm{p.v.}\int_a^b \frac{y\, x\,\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx\,\Big| + o(1),$$
where (see [33] for details)
$$(42)\qquad \mathrm{p.v.}\int_a^b \frac{y\, x\,\rho_{MP,y}(x)}{x - \lambda_i(W_{p,n})}\,dx = \begin{cases} \sqrt{y} + o(1), & \text{if } |\lambda_i(W_{p,n}) - a| = o(1);\\ -\sqrt{y} + o(1), & \text{if } |\lambda_i(W_{p,n}) - b| = o(1),\end{cases}$$
which follows from the explicit formula for the Stieltjes transform and from residue calculus.

Now the rest of the eigenvalues satisfy $|\lambda_i(W_{p,n}) - \lambda_j(W_{p,n-1})| \le |I_0| + |I_1| + \ldots + |I_9| \le 4\eta/\delta^9$. By Theorem B.2 and the Cauchy interlacing law, the number of such eigenvalues is at most $T_+ - T_- \le 8n\eta/\delta^9 = 8CK^2\log n/\delta^{11}$ with probability at least $1 - n^{-C_1-2}$ for a constant $C > 0$, so that
$$\frac{\sqrt{T_+ - T_-}}{C_3^{0.5}\, K\sqrt{\log n}} \le \frac{\sqrt{8C/\delta^{11}}}{\sqrt{C_3}} \le \delta,$$
again by choosing $C_3$ sufficiently large. From Lemma B.7, by comparing (39), (40) and (42), one can conclude that with probability at least $1 - n^{-C_1-1}$, $|x| \le \frac{C_3 K\log n}{\sqrt{n}}$. The conclusion of Theorem B.3 follows from symmetry and union bounds.
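The coordinate formula of Lemma B.7, used repeatedly in this appendix, can be checked numerically on a small matrix (an illustrative sketch; the sizes, seed and index are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 5, 9                             # tiny illustrative sizes
M = rng.standard_normal((p, n))

_, s, Vt = np.linalg.svd(M, full_matrices=False)
i = 1                                   # any singular value index
x = Vt[i, -1]                           # last coordinate of the i-th right singular vector

Mp, X = M[:, :-1], M[:, -1]             # the minor M_{p,n-1} and the removed column X
Uq, sq, _ = np.linalg.svd(Mp, full_matrices=False)  # left singular vectors of the minor

# |x|^2 = 1 / (1 + sum_j sigma_j^2 |v_j^* X|^2 / (sigma_j^2 - sigma_i^2)^2), as in Lemma B.7.
coef = (Uq.T @ X) ** 2
rhs = 1.0 / (1.0 + np.sum(sq ** 2 * coef / (sq ** 2 - s[i] ** 2) ** 2))
assert abs(x ** 2 - rhs) < 1e-9
```

The formula is the singular-value analogue of Lemma 6.2, obtained by applying the same Schur-complement computation to $M^* M$.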
References

[1] G. B. Arous and P. Bourgade. Extreme gaps between eigenvalues of random matrices. Ann. Probab., 41(4):2648–2681, 2013.
[2] Z. D. Bai, B. Q. Miao, and J. Tsay. Convergence rates of the spectral distributions of large Wigner matrices. Int. Math. J.
[3] J. W. Silverstein and Z. D. Bai. On the empirical distribution of eigenvalues of a class of large dimensional random matrices. Journal of Multivariate Analysis, 54(2):175–192, 1995.
[4] Z. D. Bai and J. W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Verlag, 2010.
[5] Z. D. Bai and Y. Q. Yin. Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab., 21(3):1275–1294, 1993.
[6] C. Cacciapuoti, A. Maltsev, and B. Schlein. Local Marchenko–Pastur law at the hard edge of sample covariance matrices. arXiv preprint arXiv:1206.1730, 2012.
[7] Y. Dekel, J. R. Lee, and N. Linial. Eigenvectors of random graphs: Nodal domains. Random Structures & Algorithms, 39(1):39–58, 2011.
[8] L. Erdős. Universality of Wigner random matrices: a survey of recent results. Russian Mathematical Surveys, 66:507, 2011.
[9] L. Erdős, B. Schlein, and H. T. Yau. Local semicircle law and complete delocalization for Wigner random matrices. Communications in Mathematical Physics, 287(2):641–655, 2009.
[10] L. Erdős, B. Schlein, and H. T. Yau. Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices. Ann. Probab., 37(3):815–852, 2009.
[11] L. Erdős, B. Schlein, and H. T. Yau. Wegner estimate and level repulsion for Wigner random matrices. Int. Math. Res. Notices, 2010(3):436–479, 2010.
[12] L. Erdős, B. Schlein, H. T. Yau, and J. Yin. The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. H. Poincaré Probab. Statist., 48(1):1–46, 2012.
[13] L. Erdős, H. T. Yau, and J. Yin. Bulk universality for generalized Wigner matrices. Probability Theory and Related Fields, 154(1-2):341–407, 2012.
[14] D. L. Hanson and F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Statist., 42(3):1079–1083, 1971.
[15] S. Geman. A limit theorem for the norm of random matrices. Ann. Probab., 8(2):252–261, 1980.
[16] D. Hsu, S. M. Kakade, and T. Zhang. A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab., 17(52):1–6, 2012.
[17] M. Ledoux. The concentration of measure phenomenon, volume 89. American Mathematical Soc., 2001.
[18] V. A. Marčenko and L. A. Pastur. Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sbornik, 1(4):457–483, 1967.
[19] M. L. Mehta. Random matrices, volume 142. Academic Press, 2004.
[20] H. Nguyen and V. Vu. Random matrices: Law of the determinant. Ann. Probab., 42(1), 2014.
[21] L. A. Pastur. On the spectrum of random matrices. Theor. Math. Phys., 10(1):67–74, 1972.
[22] N. S. Pillai and J. Yin. Universality of covariance matrices. arXiv preprint arXiv:1110.2501, 2011.
[23] M. Rudelson and R. Vershynin. Hanson–Wright inequality and sub-gaussian concentration. arXiv preprint arXiv:1306.2872, 2013.
[24] M. Rudelson and R. Vershynin. Delocalization of eigenvectors of random matrices with independent entries. arXiv preprint arXiv:1306.2887, 2013.
[25] P. Samson. Concentration of measure inequalities for Markov chains and Φ-mixing processes. Ann. Probab., 28(1):416–461, 2000.
[26] T. Tao and V. Vu. On random ±1 matrices: Singularity and determinant. Random Structures & Algorithms, 28(1):1–23, 2006.
[27] T. Tao and V. Vu. Random matrices: The distribution of the smallest singular values. Geom. Funct. Anal., 20(1):260–297, 2010.
[28] T. Tao and V. Vu. Random matrices: Universality of local eigenvalue statistics up to the edge. Commun. Math. Phys., 298(2):549–572, 2010.
[29] T. Tao and V. Vu. Random matrices: Universality of local eigenvalue statistics. Acta Mathematica, 206(1):127–204, 2011.
[30] T. Tao and V. Vu. Random covariance matrices: Universality of local statistics of eigenvalues. Ann. Probab., 40(3):1285–1315, 2012.
[31] T. Tao and V. Vu. Random matrices: The universality phenomenon for Wigner ensembles. arXiv preprint arXiv:1202.0068, 2012.
[32] L. Tran, V. Vu, and K. Wang. Sparse random graphs: eigenvalues and eigenvectors. Random Structures & Algorithms, 42(1):110–134, 2013.
[33] K. Wang. Random covariance matrices: Universality of local statistics of eigenvalues up to the edge. Random Matrices: Theory and Appl., 01, 1150005, 2012.
[34] F. T. Wright. A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric. Ann. Probab., 1(6):1068–1070, 1973.
[35] Y. Q. Yin, Z. D. Bai, and P. R. Krishnaiah. On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Th. Rel. Fields, 78(4):509–521, 1988.
Van Vu, Department of Mathematics, Yale University, New Haven, CT 06520, USA
E-mail address : [email protected] Ke Wang, Institute for Mathematics and its Applications, University of Minnesota, Minneapolis,MN 55455, USA
E-mail address ::