SAMPLE COVARIANCE MATRICES CONVERGE TO COMPOUND FREE POISSON DISTRIBUTION
MARCH T. BOEDIHARDJO
Abstract.
We show that the empirical distribution of the eigenvalues of the sample covariance matrix of certain random vectors (not necessarily with independent entries) with bounded marginal $L^4$ norms converges weakly to a compound free Poisson distribution.

1. Main result
Marchenko and Pastur [2] showed that the empirical distribution of the eigenvalues of the sample covariance matrix of a random vector uniformly distributed on the unit sphere converges weakly to the Marchenko-Pastur law. There have been many generalizations to more general random vectors (see [1]). The main result of this paper is
Theorem 1.1.
Suppose that $f_1,\dots,f_N$ are independent random vectors on $\mathbb{C}^n$ such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(f_j,x)|^4 \le \frac{L}{n^2} \quad\text{and}\quad \mathbb{E}\|f_j\|^k \le L_k, \qquad j=1,\dots,N,\ k\ge 1, \]
for some $L>0$ and $L_k>0$, $k\ge 1$, independent of $n$ and $N$. If $n,N\to\infty$ in such a way that $\frac{n}{N}\to\lambda\in(0,\infty)$ and
\[ \Bigg\|\sum_{j=1}^{N}\mathbb{E}\,\|f_j\|^{2(k-1)}f_j\otimes f_j - a_k I\Bigg\| \le C\,n^{-\epsilon}, \qquad k\ge 1, \]
for some $a_k\in\mathbb{C}$, $k\ge 1$, and $C,\epsilon>0$ independent of $n$ and $N$, then for every $p\in\mathbb{N}$,
\[ \mathbb{E}\circ\mathrm{tr}\,(f_1\otimes f_1+\dots+f_N\otimes f_N)^p \to \sum_{\pi\in\mathrm{NC}(p)}\ \prod_{B\in\pi}a_{|B|}. \]

Notation: $\mathrm{tr}$ denotes the normalized trace, and $\mathrm{NC}(p)$ is the set of all noncrossing partitions of $\{1,\dots,p\}$.
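For small $p$, the limiting moment $\sum_{\pi\in\mathrm{NC}(p)}\prod_{B\in\pi}a_{|B|}$ can be evaluated by brute force. The sketch below is an illustration only (the helper names are ours, not the paper's): it enumerates set partitions, filters out the crossing ones, and checks that for $a_k\equiv 1$ the $p$th limiting moment is the $p$th Catalan number, i.e., the free Poisson (Marchenko-Pastur) moment sequence.

```python
from itertools import combinations
from math import comb, prod

def set_partitions(elems):
    """Enumerate all partitions of the list `elems` into blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):           # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                # or into a new singleton block

def is_crossing(part):
    """True iff some a < b < c < d have a, c in one block and b, d in another."""
    for B1, B2 in combinations(part, 2):
        for a, c in combinations(sorted(B1), 2):
            for b, d in combinations(sorted(B2), 2):
                if a < b < c < d or b < a < d < c:
                    return True
    return False

def limit_moment(p, a):
    """sum over pi in NC(p) of prod over blocks B of a[|B|]."""
    return sum(
        prod(a[len(B)] for B in part)
        for part in set_partitions(list(range(1, p + 1)))
        if not is_crossing(part)
    )

# With a_k = 1 (e.g. f_j uniform on the unit sphere and N = n), the limiting
# moments are the Catalan numbers 1, 2, 5, 14, 42, ...
for p in range(1, 6):
    assert limit_moment(p, {k: 1 for k in range(1, p + 1)}) == comb(2 * p, p) // (p + 1)
```

Brute-force enumeration over all $B_p$ partitions is only feasible for small $p$, but it makes the combinatorial side of Theorem 1.1 easy to experiment with.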
Remarks. 1. An immediate consequence of Theorem 1.1 is that the theorem of Marchenko and Pastur still holds if the random vector is distributed (but not necessarily uniformly) on the unit sphere, provided that it has bounded marginal $L^4$ norms.

2. The condition $\sup_{x\in S^{n-1}}\mathbb{E}|(f_i,x)|^4\le Ln^{-2}$ cannot be removed from Theorem 1.1. For example, when $N=n$ and each $f_i$ is uniformly distributed on the canonical basis $\{e_i\}_{i=1}^n$ of $\mathbb{C}^n$, we have $a_k=1$ and
\[ \mathbb{E}\circ\mathrm{tr}\,(f_1\otimes f_1+\dots+f_n\otimes f_n)^p \to B_p, \]
where $B_p$ is the Bell number, i.e., the number of partitions of $\{1,\dots,p\}$.
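In the example of Remark 2, the diagonal of $f_1\otimes f_1+\dots+f_n\otimes f_n$ consists of multinomial counts that are asymptotically Poisson with mean $1$, and the $p$th moment of a Poisson$(1)$ variable is exactly $B_p$ (Dobinski's formula) — over all partitions, not just the noncrossing ones. A quick numerical sanity check of that identity (illustration only; the function names are ours):

```python
import math

def bell(p):
    """Bell number B_p (number of partitions of {1,...,p}) via the Bell triangle."""
    row = [1]
    for _ in range(p - 1):
        new = [row[-1]]          # each row starts with the last entry of the previous row
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]

def poisson1_moment(p, terms=80):
    """p-th moment of Poisson(1), truncating Dobinski's series e^{-1} sum j^p / j!."""
    return math.exp(-1) * sum(j ** p / math.factorial(j) for j in range(terms))

assert [bell(p) for p in range(1, 6)] == [1, 2, 5, 15, 52]
for p in range(1, 7):
    assert abs(poisson1_moment(p) - bell(p)) < 1e-6
```

Comparing with the Catalan numbers $1,2,5,14,\dots$ of the noncrossing sum shows concretely how the limit changes once the marginal condition is dropped.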
2. A graph inequality

This section is devoted to proving the following lemma.
Lemma 2.1.
Let $S_1,\dots,S_r$ be subsets of a set $E$ such that every element $e\in E$ is contained in exactly two of the sets $S_1,\dots,S_r$. Assume that $|S_1|\le\dots\le|S_r|$. Let $t\ge 1$. Then
\[ \min(t,|S_1|)+\min(t,|S_2\setminus S_1|)+\dots+\min(t,|S_r\setminus(S_1\cup\dots\cup S_{r-1})|) \ \ge\ \frac{\min(t,|S_1|)}{2}\,r. \]

Lemma 2.2.
Let $S_1,\dots,S_r$ be subsets of a set $E$ such that every element $x\in E$ is contained in exactly two of the sets $S_1,\dots,S_r$. Then
\[ |E| = \frac12\sum_{k=1}^r|S_k|. \]

Proof.
By assumption, $\sum_{k=1}^r I_{S_k}(x)=2$ for all $x\in E$. So
\[ \sum_{k=1}^r|S_k| = \sum_{k=1}^r\sum_{x\in E}I_{S_k}(x) = \sum_{x\in E}\sum_{k=1}^r I_{S_k}(x) = 2|E|. \qquad\Box \]

In Lemmas 2.3 and 2.5 below, $\Lambda^c$ is understood as $\{1,\dots,r\}\setminus\Lambda$. Also, when $k=1$, $S_k\setminus(S_1\cup\dots\cup S_{k-1})$ is understood as $S_1$.
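The hypothesis "every element lies in exactly two of the sets" is realized by the edge set of a loopless multigraph, with $S_k$ the edges incident to vertex $k$; Lemma 2.2 then reduces to the handshake lemma. A randomized check of the identity under that model (the model and names are our illustration, not the paper's):

```python
import random

def double_count_holds(seed):
    """Random loopless multigraph; verify |E| = (1/2) * sum_k |S_k| (Lemma 2.2)."""
    rng = random.Random(seed)
    r = rng.randint(2, 8)
    # labelled edges with two distinct endpoints, so each edge lies in exactly two S_k
    edges = [(i, *sorted(rng.sample(range(r), 2))) for i in range(rng.randint(1, 20))]
    S = [{e for e in edges if k in e[1:]} for k in range(r)]
    return 2 * len(edges) == sum(len(Sk) for Sk in S)

assert all(double_count_holds(s) for s in range(100))
```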
Lemma 2.3. Let $S_1,\dots,S_r$ be subsets of a set $E$ such that every element $x\in E$ is contained in exactly two of the sets $S_1,\dots,S_r$. If $\Lambda\subset\{1,\dots,r\}$ and $1\le k_0\le r$, then
\[ \sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \ \ge\ \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda}}|S_k| - \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda^c}}|S_k|. \]

Proof.
\begin{align*}
&\sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&= \sum_{k=1}^{r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| - \sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&= |E| - \sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \qquad\text{since } E=\bigcup_{k=1}^{r}S_k\\
&= \frac12\sum_{k=1}^{r}|S_k| - \sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \qquad\text{by Lemma 2.2}\\
&= \frac12\sum_{k\in\Lambda}|S_k| + \frac12\sum_{k\in\Lambda^c}|S_k| - \frac12\sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| - \frac12\sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&= \frac12\sum_{k\in\Lambda}|S_k| + \frac12\sum_{k\in\Lambda^c}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \frac12\sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&= \frac12\sum_{k\in\Lambda}|S_k| + \frac12\sum_{k\in\Lambda^c}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda^c}}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| - \frac12\sum_{\substack{k_0\le k\le r\\ k\in\Lambda^c}}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&\ge \frac12\sum_{k\in\Lambda}|S_k| + \frac12\sum_{k\in\Lambda^c}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda^c}}|S_k| - \frac12\sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&= \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda}}|S_k| - \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda^c}}|S_k|\\
&\qquad + \frac12\Bigg(\sum_{\substack{k_0\le k\le r\\ k\in\Lambda}}|S_k| + \sum_{k\in\Lambda^c}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\Bigg).
\end{align*}
To complete the proof, it suffices to show that
\[ \tag{2.1} \sum_{\substack{k_0\le k\le r\\ k\in\Lambda}}|S_k| + \sum_{k\in\Lambda^c}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \ \ge\ 0. \]
To begin,
\begin{align*}
&\sum_{\substack{k_0\le k\le r\\ k\in\Lambda}}|S_k| + \sum_{k\in\Lambda^c}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&\ge \sum_{\substack{k_0\le k\le r\\ k\in\Lambda}}|S_k\cap(S_1\cup\dots\cup S_{k-1})| + \sum_{\substack{k_0\le k\le r\\ k\in\Lambda^c}}|S_k\cap(S_1\cup\dots\cup S_{k-1})| - \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|\\
&= \sum_{k_0\le j\le r}|S_j\cap(S_1\cup\dots\cup S_{j-1})| - \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|. \tag{2.2}
\end{align*}
By assumption, every element of $E$ is contained in exactly two of the sets $S_1,\dots,S_r$. Therefore, if an element $e$ of $S_k$ is not in $S_1\cup\dots\cup S_{k-1}$, then $e$ must be in $S_{k+1}\cup\dots\cup S_r$. Thus,
\[ |S_k\setminus(S_1\cup\dots\cup S_{k-1})| \le |S_k\cap(S_{k+1}\cup\dots\cup S_r)| \le \sum_{k+1\le j\le r}|S_k\cap S_j|. \]
Hence,
\[ \tag{2.3} \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \le \sum_{k_0\le k\le r}\ \sum_{k+1\le j\le r}|S_k\cap S_j| = \sum_{k_0+1\le j\le r}\ \sum_{k_0\le k\le j-1}|S_k\cap S_j| \le \sum_{k_0\le j\le r}\ \sum_{1\le k\le j-1}|S_k\cap S_j|. \]
By assumption, every element of $E$ is contained in at most two of the sets $S_1,\dots,S_r$, so the sets $S_1\cap S_j,\dots,S_{j-1}\cap S_j$ are disjoint, and hence $\sum_{1\le k\le j-1}|S_k\cap S_j| = |S_j\cap(S_1\cup\dots\cup S_{j-1})|$. Thus, by (2.3),
\[ \sum_{k_0\le k\le r}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \le \sum_{k_0\le j\le r}|S_j\cap(S_1\cup\dots\cup S_{j-1})|. \]
Combining this with (2.2), we obtain (2.1). This completes the proof. □
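Since Lemma 2.3 is purely combinatorial, it can be stress-tested on random instances of the "every element lies in exactly two sets" hypothesis (edge sets of a loopless multigraph, one set per vertex). A randomized check, doubled to stay in integers (illustration only; the setup and names are ours):

```python
import random

def lemma23_holds(seed):
    """Check Lemma 2.3 on a random loopless multigraph instance."""
    rng = random.Random(seed)
    r = rng.randint(2, 7)
    edges = [(i, *sorted(rng.sample(range(r), 2))) for i in range(rng.randint(1, 12))]
    S = [{e for e in edges if k in e[1:]} for k in range(r)]  # element in exactly two sets
    Lam = {k for k in range(r) if rng.random() < 0.5}
    k0 = rng.randint(1, r)                 # the lemma's cutoff k_0 (1-based)
    seen, lhs = set(), 0
    for k, Sk in enumerate(S):             # k is 0-based; the lemma's index is k+1
        if k in Lam:
            lhs += len(Sk - seen)          # |S_k \ (S_1 u ... u S_{k-1})|
        seen |= Sk
    rhs = sum(len(S[k]) for k in range(k0 - 1) if k in Lam) \
        - sum(len(S[k]) for k in range(k0 - 1) if k not in Lam)
    return 2 * lhs >= rhs                  # the lemma, multiplied through by 2

assert all(lemma23_holds(s) for s in range(300))
```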
Lemma 2.4.
Let $m\ge 1$. Let $\Lambda_1$ and $\Lambda_2$ be subsets of $\{1,\dots,m\}$. If $|[l,m]\cap\Lambda_1|\le|[l,m]\cap\Lambda_2|$ for all $l\in\{1,\dots,m\}$, then there exists a strictly increasing function $f:\Lambda_1\to\Lambda_2$ such that $f(k)\ge k$ for all $k\in\Lambda_1$.

Proof. Since by assumption $|\Lambda_1|\le|\Lambda_2|$, the function $f:\Lambda_1\to\Lambda_2$ defined by sending the $i$th largest element of $\Lambda_1$ to the $i$th largest element of $\Lambda_2$ is well defined and strictly increasing. It remains to show that $f(k)\ge k$ for all $k\in\Lambda_1$. For each $i=1,\dots,|\Lambda_1|$, let $k_i$ be the $i$th largest element of $\Lambda_1$. By assumption, $|[k_i,m]\cap\Lambda_1|\le|[k_i,m]\cap\Lambda_2|$ for all $i=1,\dots,|\Lambda_1|$. Note that $[k_i,m]\cap\Lambda_1=\{k_1,k_2,\dots,k_i\}$, so $|[k_i,m]\cap\Lambda_1|=i$. Therefore, $|[k_i,m]\cap\Lambda_2|\ge i$ for all $i=1,\dots,|\Lambda_1|$, so the $i$th largest element of $\Lambda_2$ is at least $k_i$. Hence $f(k_i)\ge k_i$ for all $i=1,\dots,|\Lambda_1|$, i.e., $f(k)\ge k$ for all $k\in\Lambda_1$. □
Lemma 2.5. Let $S_1,\dots,S_r$ be subsets of a set $E$ such that every element $x\in E$ is contained in exactly two of the sets $S_1,\dots,S_r$. Assume that $|S_1|\le\dots\le|S_r|$. If $\Lambda\subset\{1,\dots,r\}$, then
\[ \sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \ \ge\ \frac12\,|S_1|\,\big(|\Lambda|-|\Lambda^c|\big). \]

Proof.
Case I:
For every $1\le l\le r$, $|[l,r]\cap\Lambda^c| < |[l,r]\cap\Lambda|$.

From the first four lines of the proof of Lemma 2.3, we have
\[ \sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| = \frac12\sum_{k=1}^{r}|S_k| - \sum_{k\in\Lambda^c}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|. \]
Thus,
\[ \sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \ge \frac12\sum_{k=1}^{r}|S_k| - \sum_{k\in\Lambda^c}|S_k| = \frac12\sum_{k\in\Lambda}|S_k| - \frac12\sum_{k\in\Lambda^c}|S_k|. \]
Taking $m=r$, $\Lambda_1=\Lambda^c$ and $\Lambda_2=\Lambda$ in Lemma 2.4, we obtain an injective function $f:\Lambda^c\to\Lambda$ such that $f(k)\ge k$ for all $k\in\Lambda^c$. Therefore,
\begin{align*}
\sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|
&\ge \frac12\sum_{j\in\Lambda}|S_j| - \frac12\sum_{k\in\Lambda^c}|S_k|\\
&= \frac12\sum_{j\in f(\Lambda^c)}|S_j| + \frac12\sum_{j\in\Lambda\setminus f(\Lambda^c)}|S_j| - \frac12\sum_{k\in\Lambda^c}|S_k|\\
&= \frac12\sum_{k\in\Lambda^c}|S_{f(k)}| + \frac12\sum_{j\in\Lambda\setminus f(\Lambda^c)}|S_j| - \frac12\sum_{k\in\Lambda^c}|S_k|\\
&= \frac12\sum_{k\in\Lambda^c}\big(|S_{f(k)}|-|S_k|\big) + \frac12\sum_{j\in\Lambda\setminus f(\Lambda^c)}|S_j|
\ \ge\ \frac12\,|\Lambda\setminus f(\Lambda^c)|\,|S_1|.
\end{align*}
The last inequality follows from the fact that $f(k)\ge k$ for all $k\in\Lambda^c$ and the assumption that $|S_1|\le\dots\le|S_r|$. Since $|\Lambda\setminus f(\Lambda^c)| = |\Lambda|-|f(\Lambda^c)| = |\Lambda|-|\Lambda^c|$, it follows that
\[ \sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| \ge \frac12\big(|\Lambda|-|\Lambda^c|\big)\,|S_1|. \]

Case II:
There exists $1\le k_0\le r$ such that $|[k_0,r]\cap\Lambda^c| \ge |[k_0,r]\cap\Lambda|$.

We may assume that $k_0$ is the smallest index with this property. We may also assume that $k_0>1$; otherwise $|\Lambda^c|\ge|\Lambda|$ and the result is trivial. Thus, we have $|[l,k_0-1]\cap\Lambda^c| < |[l,k_0-1]\cap\Lambda|$ for all $l\in\{1,\dots,k_0-1\}$; otherwise, an $l$ failing this property would contradict the minimality of $k_0$. Taking $m=k_0-1$, $\Lambda_1=[1,k_0-1]\cap\Lambda^c$ and $\Lambda_2=[1,k_0-1]\cap\Lambda$ in Lemma 2.4, we obtain an injective function $f:[1,k_0-1]\cap\Lambda^c\to[1,k_0-1]\cap\Lambda$ satisfying $f(k)\ge k$ for all $k\in[1,k_0-1]\cap\Lambda^c$.

By Lemma 2.3, we have
\begin{align*}
\sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|
&\ge \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda}}|S_k| - \frac12\sum_{\substack{1\le k\le k_0-1\\ k\in\Lambda^c}}|S_k|\\
&= \frac12\Bigg(\sum_{j\in[1,k_0-1]\cap\Lambda}|S_j| - \sum_{k\in[1,k_0-1]\cap\Lambda^c}|S_k|\Bigg)\\
&= \frac12\Bigg(\sum_{j\in\{f(k):\,k\in[1,k_0-1]\cap\Lambda^c\}}|S_j| + \sum_{j\in([1,k_0-1]\cap\Lambda)\setminus\{f(k):\,k\in[1,k_0-1]\cap\Lambda^c\}}|S_j| - \sum_{k\in[1,k_0-1]\cap\Lambda^c}|S_k|\Bigg)\\
&= \frac12\Bigg(\sum_{k\in[1,k_0-1]\cap\Lambda^c}\big(|S_{f(k)}|-|S_k|\big) + \sum_{j\in([1,k_0-1]\cap\Lambda)\setminus\{f(k):\,k\in[1,k_0-1]\cap\Lambda^c\}}|S_j|\Bigg)\\
&\ge \frac12\Big(0 + \big|([1,k_0-1]\cap\Lambda)\setminus\{f(k):\,k\in[1,k_0-1]\cap\Lambda^c\}\big|\,|S_1|\Big).
\end{align*}
The last inequality follows from the fact that $f(k)\ge k$ for all $k\in[1,k_0-1]\cap\Lambda^c$ and the assumption that $|S_1|\le\dots\le|S_r|$. Therefore,
\begin{align*}
\sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})|
&\ge \frac12\Big(|[1,k_0-1]\cap\Lambda| - \big|\{f(k):\,k\in[1,k_0-1]\cap\Lambda^c\}\big|\Big)\,|S_1|\\
&= \frac12\Big(|[1,k_0-1]\cap\Lambda| - |[1,k_0-1]\cap\Lambda^c|\Big)\,|S_1|\\
&= \frac12\Big(|\Lambda| - |[k_0,r]\cap\Lambda| - |\Lambda^c| + |[k_0,r]\cap\Lambda^c|\Big)\,|S_1|
\ \ge\ \frac12\big(|\Lambda|-|\Lambda^c|\big)\,|S_1|.
\end{align*}
The last inequality follows from the Case II assumption. □
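Lemma 2.5 can likewise be stress-tested on random loopless multigraph instances, sorting the incidence sets so that $|S_1|\le\dots\le|S_r|$ as the lemma requires (illustration only; model and names are ours):

```python
import random

def lemma25_holds(seed):
    """Check Lemma 2.5 on a random loopless multigraph with sorted set sizes."""
    rng = random.Random(seed)
    r = rng.randint(2, 7)
    edges = [(i, *sorted(rng.sample(range(r), 2))) for i in range(rng.randint(1, 12))]
    # incidence sets, sorted ascending by size: |S_1| <= ... <= |S_r|
    S = sorted(({e for e in edges if k in e[1:]} for k in range(r)), key=len)
    Lam = {k for k in range(r) if rng.random() < 0.5}
    seen, lhs = set(), 0
    for k, Sk in enumerate(S):
        if k in Lam:
            lhs += len(Sk - seen)          # |S_k \ (S_1 u ... u S_{k-1})|
        seen |= Sk
    # the lemma, multiplied through by 2: 2 * LHS >= |S_1| * (|Lam| - |Lam^c|)
    return 2 * lhs >= len(S[0]) * (len(Lam) - (r - len(Lam)))

assert all(lemma25_holds(s) for s in range(300))
```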
Proof of Lemma 2.1.
Let $\Lambda = \{1\le k\le r : |S_k\setminus(S_1\cup\dots\cup S_{k-1})| \le t\}$. Then
\[ \min(t,|S_1|)+\min(t,|S_2\setminus S_1|)+\dots+\min(t,|S_r\setminus(S_1\cup\dots\cup S_{r-1})|) = \sum_{k\in\Lambda}|S_k\setminus(S_1\cup\dots\cup S_{k-1})| + t\,|\Lambda^c|. \]
If $|\Lambda^c|\ge\frac r2$, then
\[ \min(t,|S_1|)+\min(t,|S_2\setminus S_1|)+\dots+\min(t,|S_r\setminus(S_1\cup\dots\cup S_{r-1})|) \ \ge\ t\,|\Lambda^c| \ \ge\ \frac t2\,r \ \ge\ \frac{\min(t,|S_1|)}{2}\,r. \]
If $|\Lambda|\ge\frac r2$, then $|\Lambda|-|\Lambda^c|\ge 0$, so by Lemma 2.5,
\begin{align*}
\min(t,|S_1|)+\min(t,|S_2\setminus S_1|)+\dots+\min(t,|S_r\setminus(S_1\cup\dots\cup S_{r-1})|)
&\ge \frac12\,|S_1|\,\big(|\Lambda|-|\Lambda^c|\big) + t\,|\Lambda^c|\\
&\ge \frac12\min(t,|S_1|)\big(|\Lambda|-|\Lambda^c|\big) + \min(t,|S_1|)\,|\Lambda^c|\\
&= \frac12\min(t,|S_1|)\big(|\Lambda|+|\Lambda^c|\big) = \frac{\min(t,|S_1|)}{2}\,r. \qquad\Box
\end{align*}
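Lemma 2.1 itself admits the same kind of randomized verification on loopless multigraphs, with the incidence sets sorted by size and $t\ge 1$ (illustration only; model and names are ours):

```python
import random

def lemma21_holds(seed):
    """Check Lemma 2.1: sum of min(t, |S_k \\ previous|) >= min(t, |S_1|) * r / 2."""
    rng = random.Random(seed)
    r = rng.randint(2, 7)
    edges = [(i, *sorted(rng.sample(range(r), 2))) for i in range(rng.randint(1, 15))]
    S = sorted(({e for e in edges if k in e[1:]} for k in range(r)), key=len)
    t = rng.randint(1, 5)
    seen, lhs = set(), 0
    for Sk in S:
        lhs += min(t, len(Sk - seen))
        seen |= Sk
    return 2 * lhs >= min(t, len(S[0])) * r   # the lemma, multiplied through by 2

assert all(lemma21_holds(s) for s in range(300))
```

In Section 3 the lemma is applied with $t=4$, which is exactly what turns the minimum-degree-4 hypothesis of Lemma 3.3 into the exponent $-|V|(1-\epsilon)$.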
3. Proof of the main result

Lemma 3.1. If $y$ and $z$ are nonnegative random variables, then for every $0<\epsilon<1$,
\[ \mathbb{E}\,yz \ \le\ (\mathbb{E}\,y)^{1-\epsilon}\,\big(\mathbb{E}\,y\,z^{1/\epsilon}\big)^{\epsilon}. \]

Proof.
By Hölder's inequality,
\[ \mathbb{E}\,yz = \mathbb{E}\,y^{1-\epsilon}\,(y^{\epsilon}z) \le (\mathbb{E}\,y)^{1-\epsilon}\,\big(\mathbb{E}\,(y^{\epsilon}z)^{1/\epsilon}\big)^{\epsilon} = (\mathbb{E}\,y)^{1-\epsilon}\,\big(\mathbb{E}\,y\,z^{1/\epsilon}\big)^{\epsilon}. \qquad\Box \]
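Since Lemma 3.1 holds for every probability measure, it holds in particular for the empirical measure of any finite sample, which makes it checkable deterministically (up to floating-point rounding). A small sketch (illustration only; names are ours):

```python
import random

def holder_bound_holds(seed, eps):
    """E[yz] <= (E y)^(1-eps) * (E[y z^(1/eps)])^eps for an empirical measure."""
    rng = random.Random(seed)
    y = [rng.uniform(0.0, 3.0) for _ in range(500)]
    z = [rng.uniform(0.0, 3.0) for _ in range(500)]
    mean = lambda v: sum(v) / len(v)
    lhs = mean([a * b for a, b in zip(y, z)])
    rhs = mean(y) ** (1 - eps) * mean([a * b ** (1 / eps) for a, b in zip(y, z)]) ** eps
    return lhs <= rhs * (1 + 1e-9)        # tiny slack for floating-point rounding

assert all(holder_bound_holds(s, e) for s in range(20) for e in (0.1, 0.5, 0.9))
```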
Lemma 3.2. Let $f_1,\dots,f_r$ be random vectors on $\mathbb{C}^n$ such that for every $\delta>0$ there exist $M_\delta>0$ and $L_{k,\delta}>0$ such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(f,x)|^4 \le M_\delta\,n^{-(2-\delta)} \quad\text{and}\quad \mathbb{E}\|f\|^k \le L_{k,\delta}\,n^{\delta}, \qquad f\in\{f_1,\dots,f_r\},\ k\ge 1. \]
Then for every $\epsilon>0$ and all $x_1,\dots,x_r\in\mathbb{C}^n$ with $\|x_i\|\le 1$,
\[ \mathbb{E}\,|(f_1,x_1)|\cdots|(f_r,x_r)| \ \le\ C_\epsilon\,n^{-\frac12\min(r,4)(1-\epsilon)}, \]
where $C_\epsilon$ depends on $\epsilon$ and certain $M_\delta$ and $L_{k,\delta}$ but not on $n$.

Proof. By Hölder's inequality,
\[ \mathbb{E}\,|(f_1,x_1)|\cdots|(f_r,x_r)| \le \big(\mathbb{E}|(f_1,x_1)|^{r}\big)^{1/r}\cdots\big(\mathbb{E}|(f_r,x_r)|^{r}\big)^{1/r}, \]
so it suffices to prove the lemma when $f_1=\dots=f_r=f$ and $x_1=\dots=x_r=x$. If $r>4$, then by Lemma 3.1 (with $y=|(f,x)|^4$ and $z=\|f\|^{r-4}$) and the bound $|(f,x)|\le\|f\|$,
\begin{align*}
\mathbb{E}|(f,x)|^{r} &\le \mathbb{E}\,|(f,x)|^4\,\|f\|^{r-4}
\le \big(\mathbb{E}|(f,x)|^4\big)^{1-\epsilon}\big(\mathbb{E}\,|(f,x)|^4\,\|f\|^{(r-4)/\epsilon}\big)^{\epsilon}
\le \big(\mathbb{E}|(f,x)|^4\big)^{1-\epsilon}\big(\mathbb{E}\,\|f\|^{4+(r-4)/\epsilon}\big)^{\epsilon}\\
&\le \big(M_\delta\,n^{-(2-\delta)}\big)^{1-\epsilon}\big(L_{4+(r-4)/\epsilon,\,\delta}\,n^{\delta}\big)^{\epsilon}
= M_\delta^{1-\epsilon}\,L_{4+(r-4)/\epsilon,\,\delta}^{\epsilon}\,n^{-(2-\delta)(1-\epsilon)+\delta\epsilon},
\end{align*}
which, choosing $\delta$ small enough depending on $\epsilon$, is at most $C_\epsilon\,n^{-2(1-2\epsilon)}$. If $r\le 4$, then
\[ \mathbb{E}|(f,x)|^{r} \le \big(\mathbb{E}|(f,x)|^4\big)^{r/4} \le M_\delta^{r/4}\,n^{-\frac r4(2-\delta)}. \]
Since $\epsilon>0$ is arbitrary, the stated bound follows after renaming $\epsilon$. □
Lemma 3.3. Let $G=(V,E)$ be a graph with no loops but possibly with multiple edges. Let $(\mathcal{B}_v)_{v\in V}$ be independent $\sigma$-subalgebras of a probability space $(\Omega,\mathcal{B},\mathbb{P})$. For each $e\in E$, let $u_1(e)$ and $u_2(e)$ be the two endpoints of $e$, and let $h_e^{(1)}$ and $h_e^{(2)}$ be $\mathcal{B}_{u_1(e)}$-measurable and $\mathcal{B}_{u_2(e)}$-measurable random vectors on $\mathbb{C}^n$. Assume that for every $\delta>0$ there exist $M_\delta>0$ and $L_{k,\delta}>0$ such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(h,x)|^4 \le M_\delta\,n^{-(2-\delta)} \quad\text{and}\quad \mathbb{E}\|h\|^k \le L_{k,\delta}\,n^{\delta}, \qquad h\in\bigcup_{e\in E}\{h_e^{(1)},h_e^{(2)}\},\ k\ge 1. \]
If every vertex has degree at least $4$, then for every $\epsilon>0$,
\[ \mathbb{E}\prod_{e\in E}|\langle h_e^{(1)},h_e^{(2)}\rangle| \ \le\ C_\epsilon\,n^{-|V|(1-\epsilon)}, \]
where $C_\epsilon$ depends on $\epsilon$, the graph $G$ and certain $M_\delta$ and $L_{k,\delta}$ but not on $n$.

Proof. Let $v_1,\dots,v_{|V|}$ be an enumeration of $V$ in ascending order of degree, i.e., defining $S_j$ to be the set of all edges incident to $v_j$, we have $|S_1|\le|S_2|\le\dots\le|S_{|V|}|$. For each $j=1,\dots,|V|$, if $e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})$ then either $u_1(e)=v_j$ or $u_2(e)=v_j$, so by interchanging the values of $u_1(e)$ and $u_2(e)$ (and accordingly also $h_e^{(1)}$ and $h_e^{(2)}$), if necessary, we may assume that $u_1(e)=v_j$. Thus, for every $\eta>0$,
\begin{align*}
\mathbb{E}\prod_{e\in E}|\langle h_e^{(1)},h_e^{(2)}\rangle|
&= \mathbb{E}\prod_{j=1}^{|V|}\ \prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}|\langle h_e^{(1)},h_e^{(2)}\rangle|\\
&= \mathbb{E}\prod_{j=1}^{|V|}\ \prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\,\big(\|h_e^{(2)}\|+\eta\big)\\
&= \mathbb{E}\Bigg(\prod_{j=1}^{|V|}\ \prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\Bigg)\ \prod_{e\in E}\big(\|h_e^{(2)}\|+\eta\big), \tag{3.1}
\end{align*}
where, as before, when $j=1$, $S_j\setminus(S_1\cup\dots\cup S_{j-1})$ is understood as $S_1$.
Since $u_1(e)=v_j$, $h_e^{(1)}$ is $\mathcal{B}_{v_j}$-measurable. On the other hand, by assumption, $h_e^{(2)}$ is $\mathcal{B}_{u_2(e)}$-measurable; and since $G$ has no loops, $u_2(e)\neq u_1(e)=v_j$. Therefore, by Lemma 3.2 (with the $\mathcal{B}_{v_j}$-measurable vectors as the $f$'s and the vectors $h_e^{(2)}/(\|h_e^{(2)}\|+\eta)$, of norm at most $1$, as the $x$'s),
\[ \tag{3.2} \mathbb{E}_{\mathcal{B}_{v_j}}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg| \ \le\ C_\epsilon\,n^{-\frac12\min(|S_j\setminus(S_1\cup\dots\cup S_{j-1})|,\,4)(1-\epsilon)}, \]
where $\mathbb{E}_{\mathcal{B}_{v_j}}$ denotes the partial expectation over the randomness in $\mathcal{B}_{v_j}$. Note that the right-hand side is a constant. We claim that
\[ \tag{3.3} \mathbb{E}\prod_{j=1}^{|V|}\ \prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg| \ \le\ C_\epsilon\,n^{-|V|(1-\epsilon)}, \]
where $C_\epsilon$ denotes any positive number depending on $\epsilon$, the graph $G$ and certain $M_\delta$ and $L_{k,\delta}$ but not on $n$.

To prove the claim, we write
\[ \mathbb{E}\prod_{j=1}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|
= \mathbb{E}\Bigg(\prod_{e\in S_1}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\Bigg)\Bigg(\prod_{j=2}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\Bigg). \]
All the edges $e$ in the first factor are incident to $v_1$, whereas all the $e$ in the second factor are not incident to $v_1$. Thus, the second factor is independent of $\mathcal{B}_{v_1}$, and so
\begin{align*}
\mathbb{E}\prod_{j=1}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|
&= \mathbb{E}\Bigg(\mathbb{E}_{\mathcal{B}_{v_1}}\prod_{e\in S_1}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\Bigg)\Bigg(\prod_{j=2}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\Bigg)\\
&\le C_\epsilon\,n^{-\frac12\min(|S_1|,4)(1-\epsilon)}\ \mathbb{E}\prod_{j=2}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|,
\end{align*}
where the inequality follows from (3.2). Continuing this procedure, we obtain
\[ \mathbb{E}\prod_{j=1}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|
\le C_\epsilon\,n^{-\frac{1-\epsilon}{2}\left[\min(|S_1|,4)+\min(|S_2\setminus S_1|,4)+\dots+\min(|S_{|V|}\setminus(S_1\cup\dots\cup S_{|V|-1})|,4)\right]}. \]
By Lemma 2.1 (with $t=4$), it follows that
\[ \mathbb{E}\prod_{j=1}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg| \le C_\epsilon\,n^{-\frac{\min(|S_1|,4)}{4}|V|(1-\epsilon)}, \]
possibly with a different $C_\epsilon$. Since by assumption every vertex has degree at least $4$, we have $|S_1|\ge 4$, and the claim (3.3) is proved.

Having proved (3.3), before we apply Lemma 3.1, we estimate (using $|\langle h_e^{(1)},h_e^{(2)}/(\|h_e^{(2)}\|+\eta)\rangle|\le\|h_e^{(1)}\|$ and Hölder's inequality over the $2|E|$ factors)
\begin{align*}
\mathbb{E}\Bigg(\prod_{j=1}^{|V|}\prod_{e\in S_j\setminus(S_1\cup\dots\cup S_{j-1})}\bigg|\Big\langle h_e^{(1)},\frac{h_e^{(2)}}{\|h_e^{(2)}\|+\eta}\Big\rangle\bigg|\Bigg)\Bigg(\prod_{e\in E}\big(\|h_e^{(2)}\|+\eta\big)\Bigg)^{1/\epsilon}
&\le \mathbb{E}\Bigg(\prod_{e\in E}\|h_e^{(1)}\|\Bigg)\Bigg(\prod_{e\in E}\big(\|h_e^{(2)}\|+\eta\big)^{1/\epsilon}\Bigg)\\
&\le \prod_{e\in E}\Big(\mathbb{E}\,\|h_e^{(1)}\|^{2|E|}\Big)^{\frac{1}{2|E|}}\ \prod_{e\in E}\Big(\mathbb{E}\,\big(\|h_e^{(2)}\|+\eta\big)^{\frac{2|E|}{\epsilon}}\Big)^{\frac{1}{2|E|}}.
\end{align*}
Combining this estimate with (3.1), (3.3) and Lemma 3.1 (with $y$ the product of inner products and $z=\prod_{e\in E}(\|h_e^{(2)}\|+\eta)$), and letting $\eta\to 0$, we obtain
\[ \mathbb{E}\prod_{e\in E}|\langle h_e^{(1)},h_e^{(2)}\rangle| \le C_\epsilon\,n^{-|V|(1-\epsilon)^2}\,\Bigg(\prod_{e\in E}\Big(\mathbb{E}\,\|h_e^{(1)}\|^{2|E|}\Big)^{\frac{1}{2|E|}}\ \prod_{e\in E}\Big(\mathbb{E}\,\|h_e^{(2)}\|^{\frac{2|E|}{\epsilon}}\Big)^{\frac{1}{2|E|}}\Bigg)^{\epsilon} \le C_\epsilon\,n^{-|V|(1-\epsilon)^2+\delta\epsilon}, \]
where the last inequality follows from the assumption $\mathbb{E}\|h\|^k\le L_{k,\delta}\,n^{\delta}$. Choosing $\delta$ small, this completes the proof with a different $\epsilon$. □
Lemma 3.4. Suppose that $(\mathcal{B}_j)_{j\in J}$ are independent $\sigma$-subalgebras of a probability space $(\Omega,\mathcal{B},\mathbb{P})$. Let $j:\{1,\dots,p\}\to J$ be such that $\ker j$ is a crossing partition of $\{1,\dots,p\}$. For each $i=1,\dots,p$, let $f_i^{(1)},f_i^{(2)}$ be $\mathcal{B}_{j(i)}$-measurable random vectors on $\Omega$. Assume that for every $\delta>0$ there exist $M_\delta>0$ and $L_{k,\delta}>0$, $k\ge 1$, such that
\[ \tag{3.4} \sup_{x\in S^{n-1}}\mathbb{E}|(f,x)|^4 \le M_\delta\,n^{-(2-\delta)} \quad\text{and}\quad \mathbb{E}\|f\|^k \le L_{k,\delta}\,n^{\delta}, \qquad f\in\{f_1^{(1)},f_1^{(2)},\dots,f_p^{(1)},f_p^{(2)}\},\ k\ge 1. \]
Then for every $\epsilon>0$,
\[ \big|\mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})(f_2^{(1)}\otimes f_2^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})\big| \ \le\ C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|-1+\epsilon}, \]
where $C_\epsilon>0$ depends on $\epsilon$, $p$ and certain $M_\delta$ and $L_{k,\delta}$ but not on $n$.

Proof. We may assume that $j(1)\neq j(2),\ j(2)\neq j(3),\ \dots,\ j(p)\neq j(1)$, and that each $j(i)$ appears at least twice in the list $j(1),\dots,j(p)$.

Indeed, if $j(i)=j(i+1)$ for some $i$, then
\[ (f_i^{(1)}\otimes f_i^{(2)})(f_{i+1}^{(1)}\otimes f_{i+1}^{(2)}) = \langle f_{i+1}^{(1)},f_i^{(2)}\rangle\,(f_i^{(1)}\otimes f_{i+1}^{(2)}) = \big(\langle f_{i+1}^{(1)},f_i^{(2)}\rangle f_i^{(1)}\big)\otimes f_{i+1}^{(2)}. \]
Note that $\langle f_{i+1}^{(1)},f_i^{(2)}\rangle f_i^{(1)}$ and $f_{i+1}^{(2)}$ are $\mathcal{B}_{j(i)}$-measurable since $j(i)=j(i+1)$. Also, by Hölder's inequality, $\langle f_{i+1}^{(1)},f_i^{(2)}\rangle f_i^{(1)}$ satisfies (3.4), perhaps with different $M_\delta$ and $L_{k,\delta}$. Thus, the result follows by the induction hypothesis, since the product $(f_1^{(1)}\otimes f_1^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})$ of $p$ terms becomes a product of $p-1$ terms (the $i$th term and the $(i+1)$th term are combined). A similar argument works if $j(p)=j(1)$.

If there is a $j(i)$ that appears only once in the list $j(1),\dots,j(p)$, then by independence of $(\mathcal{B}_j)_{j\in J}$,
\begin{align*}
\mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})
&= \mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})\cdots\big(\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}\big)\cdots(f_p^{(1)}\otimes f_p^{(2)})\\
&= \mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})\cdots\Big(\big(\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}\big)f_{i+1}^{(1)}\otimes f_{i+1}^{(2)}\Big)\cdots(f_p^{(1)}\otimes f_p^{(2)})\\
&= \frac1n\,\mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})\cdots\Big(n\big(\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}\big)f_{i+1}^{(1)}\otimes f_{i+1}^{(2)}\Big)\cdots(f_p^{(1)}\otimes f_p^{(2)}). \tag{3.5}
\end{align*}
Note that $\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}$ is a deterministic matrix and, for $x,y\in S^{n-1}$,
\begin{align*}
\big|\big\langle\big(\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}\big)x,y\big\rangle\big|
&= \big|\mathbb{E}\,\langle x,f_i^{(2)}\rangle\langle f_i^{(1)},y\rangle\big|
\le \mathbb{E}\,|\langle x,f_i^{(2)}\rangle|\,|\langle f_i^{(1)},y\rangle|\\
&\le \big(\mathbb{E}|\langle f_i^{(2)},x\rangle|^4\big)^{1/4}\big(\mathbb{E}|\langle f_i^{(1)},y\rangle|^4\big)^{1/4}
\le \big(M_\delta\,n^{-(2-\delta)}\big)^{1/4}\big(M_\delta\,n^{-(2-\delta)}\big)^{1/4}
= \sqrt{M_\delta}\,n^{-1+\delta/2}.
\end{align*}
Thus, $\|n\,\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}\| \le \sqrt{M_\delta}\,n^{\delta/2}$. Hence, $n\big(\mathbb{E}\,f_i^{(1)}\otimes f_i^{(2)}\big)f_{i+1}^{(1)}$ is $\mathcal{B}_{j(i+1)}$-measurable and still satisfies (3.4), perhaps with different $M_\delta$ and $L_{k,\delta}$. Thus, in view of (3.5), the result follows by the induction hypothesis, since the product $(f_1^{(1)}\otimes f_1^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})$ of $p$ terms becomes a product of $p-1$ terms (the $i$th term is absorbed by the $(i+1)$th term).

Therefore, we may justifiably assume that $j(1)\neq j(2),\ \dots,\ j(p)\neq j(1)$ and each $j(i)$ appears at least twice in the list $j(1),\dots,j(p)$. Then
\[ \big|\mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})\big|
= \frac1n\,\big|\mathbb{E}\,\langle f_1^{(2)},f_2^{(1)}\rangle\langle f_2^{(2)},f_3^{(1)}\rangle\cdots\langle f_p^{(2)},f_1^{(1)}\rangle\big|
\le \frac1n\,\mathbb{E}\,|\langle f_1^{(2)},f_2^{(1)}\rangle|\,|\langle f_2^{(2)},f_3^{(1)}\rangle|\cdots|\langle f_p^{(2)},f_1^{(1)}\rangle|. \]
For notational convenience, let $j(p+1)=j(1)$ and $f_{p+1}^{(1)}=f_1^{(1)}$. Then we have
\[ \tag{3.6} \big|\mathbb{E}\circ\mathrm{tr}\,(f_1^{(1)}\otimes f_1^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})\big| \le \frac1n\,\mathbb{E}\prod_{i=1}^{p}|\langle f_i^{(2)},f_{i+1}^{(1)}\rangle|. \]
We use Lemma 3.3 to estimate this. First, we take the vertex set $V=\{j(1),\dots,j(p)\}$ and the edge set $E=\{1,\dots,p\}$, where for each $i\in E$ the two endpoints are $u_1(i)=j(i)$ and $u_2(i)=j(i+1)$. There are no loops since we assume that $j(i)\neq j(i+1)$ for all $i=1,\dots,p$. For each $i\in E$, take $h_i^{(1)}=f_i^{(2)}$ and $h_i^{(2)}=f_{i+1}^{(1)}$. To see that every vertex has degree at least $4$, recall that we assume that for every $j_0\in V=\{j(1),\dots,j(p)\}$ there exist $i_1\neq i_2$ in $\{1,\dots,p\}$ such that $j(i_1)=j(i_2)=j_0$. Since $j(1)\neq j(2),\dots,j(p)\neq j(1)$, the indices $i_1$ and $i_2$ cannot be (cyclically) consecutive. Therefore, the vertex $j_0$ is incident with the four distinct edges $i_1-1$, $i_1$, $i_2-1$, $i_2$ (when $i_1=1$, $i_1-1$ is understood as $p$). Thus, the assumptions of Lemma 3.3 are satisfied and we obtain
\[ \mathbb{E}\prod_{i=1}^{p}|\langle f_i^{(2)},f_{i+1}^{(1)}\rangle| \le C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|(1-\epsilon)}. \]
The result follows by combining this with (3.6). □
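The degree count at the end of the proof can be verified exhaustively for small $p$: for any map $j$ with cyclically consecutive values distinct and every value taken at least twice, each vertex of the cycle multigraph with edges $\{j(i),j(i+1)\}$ has degree at least $4$. A brute-force check (illustration only; names are ours):

```python
from itertools import product

def cycle_degrees(j):
    """Degrees (with multiplicity) in the multigraph with edges {j(i), j(i+1)}, cyclically."""
    p = len(j)
    deg = {v: 0 for v in j}
    for i in range(p):
        deg[j[i]] += 1
        deg[j[(i + 1) % p]] += 1
    return deg

for p in (4, 5, 6):
    for j in product(range(3), repeat=p):
        if any(j[i] == j[(i + 1) % p] for i in range(p)):
            continue                      # consecutive values must differ (no loops)
        if any(j.count(v) < 2 for v in set(j)):
            continue                      # every value must appear at least twice
        assert all(d >= 4 for d in cycle_degrees(j).values())
```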
Remark.
In Lemma 3.4, the assumption that $\ker j$ is a crossing partition is necessary: it guarantees that repeating the procedure of (1) combining the $i$th and $(i+1)$th terms when $j(i)=j(i+1)$ and (2) absorbing the $i$th term into the $(i+1)$th term when $j(i)$ appears only once in the list $j(1),\dots,j(p)$ does not reduce $\{1,\dots,p\}$ to a singleton. Without the crossing assumption, one would instead obtain Lemma 3.6 below.

As an immediate consequence of Lemma 3.4, we have

Proposition 3.5.
Suppose that $(f_j)_{j\in J}$ is an independent family of random vectors on $\mathbb{C}^n$ such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(f_j,x)|^4 \le \frac{L}{n^2} \quad\text{and}\quad \mathbb{E}\|f_j\|^k \le L_k, \qquad j\in J,\ k\ge 1, \]
for some $L>0$ and $L_k>0$, $k\ge 1$, independent of $n$. Let $j:\{1,\dots,p\}\to J$ be such that $\ker j$ is a crossing partition of $\{1,\dots,p\}$. Then for every $\epsilon>0$,
\[ \big|\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big| \ \le\ C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|-1+\epsilon}, \]
where $C_\epsilon>0$ depends on $\epsilon$, $p$, $L$ and certain $L_k$ but not on $n$.

The following lemma is the analog of Lemma 3.4 for noncrossing partitions.
Lemma 3.6.
Suppose that $(\mathcal{B}_j)_{j\in J}$ are independent $\sigma$-subalgebras of a probability space $(\Omega,\mathcal{B},\mathbb{P})$. Let $j:\{1,\dots,p\}\to J$ be such that $\ker j$ is a noncrossing partition of $\{1,\dots,p\}$. For each $i=1,\dots,p$, let $f_i^{(1)},f_i^{(2)}$ be $\mathcal{B}_{j(i)}$-measurable random vectors on $\Omega$. Assume that for every $\delta>0$ there exist $M_\delta>0$ and $L_{k,\delta}>0$, $k\ge 1$, such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(f,x)|^4 \le M_\delta\,n^{-(2-\delta)} \quad\text{and}\quad \mathbb{E}\|f\|^k \le L_{k,\delta}\,n^{\delta}, \qquad f\in\{f_1^{(1)},f_1^{(2)},\dots,f_p^{(1)},f_p^{(2)}\},\ k\ge 1. \]
Then for every $\epsilon>0$,
\[ \tag{3.7} \big\|\mathbb{E}\,(f_1^{(1)}\otimes f_1^{(2)})(f_2^{(1)}\otimes f_2^{(2)})\cdots(f_p^{(1)}\otimes f_p^{(2)})\big\| \ \le\ C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|+\epsilon}, \]
where $C_\epsilon>0$ depends on $\epsilon$, $p$ and certain $M_\delta$ and $L_{k,\delta}$ but not on $n$.

The only differences from Lemma 3.4 are that on the left-hand side of (3.7) one has the norm of an expectation instead of a trace expectation, and that on the right-hand side of (3.7) one only has $C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|+\epsilon}$ instead of $C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|-1+\epsilon}$. The proof of Lemma 3.6 is exactly the same as the beginning of the proof of Lemma 3.4. One needs the fact that for every noncrossing partition $\pi$ of $\{1,\dots,p\}$, at least one of the following holds: (1) there exists $i\in\{1,\dots,p-1\}$ such that $i$ and $i+1$ are in the same block of $\pi$; (2) $\pi$ has a singleton block. This is because every noncrossing partition contains an interval block.
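The interval-block fact underlying the dichotomy (every noncrossing partition has a block of consecutive integers; singletons count) can be confirmed exhaustively for small $p$ (illustration only; names are ours):

```python
from itertools import combinations

def set_partitions(elems):
    """Enumerate all partitions of the list `elems` into blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def is_crossing(part):
    """True iff some a < b < c < d have a, c in one block and b, d in another."""
    for B1, B2 in combinations(part, 2):
        for a, c in combinations(sorted(B1), 2):
            for b, d in combinations(sorted(B2), 2):
                if a < b < c < d or b < a < d < c:
                    return True
    return False

def has_interval_block(part):
    """Some block is a set of consecutive integers."""
    return any(max(B) - min(B) + 1 == len(B) for B in part)

for p in range(1, 8):
    for part in set_partitions(list(range(1, p + 1))):
        if not is_crossing(part):
            assert has_interval_block(part)
```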
As an immediate consequence of Lemma 3.6, we have

Lemma 3.7.
Suppose that $(f_j)_{j\in J}$ is an independent family of random vectors on $\mathbb{C}^n$ such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(f_j,x)|^4 \le \frac{L}{n^2} \quad\text{and}\quad \mathbb{E}\|f_j\|^k \le L_k, \qquad j\in J,\ k\ge 1, \]
for some $L>0$ and $L_k>0$, $k\ge 1$, independent of $n$. Let $j:\{1,\dots,p\}\to J$ be such that $\ker j$ is a noncrossing partition of $\{1,\dots,p\}$. Then for every $\epsilon>0$,
\[ \big\|\mathbb{E}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big\| \ \le\ C_\epsilon\,n^{-|\{j(1),\dots,j(p)\}|+\epsilon}, \]
where $C_\epsilon>0$ depends on $\epsilon$, $p$, $L$ and certain $L_k$ but not on $n$.

Proposition 3.8.
Suppose that $f_1,\dots,f_N$ are independent random vectors on $\mathbb{C}^n$ such that
\[ \sup_{x\in S^{n-1}}\mathbb{E}|(f_j,x)|^4 \le \frac{L}{n^2} \quad\text{and}\quad \mathbb{E}\|f_j\|^k \le L_k, \qquad j=1,\dots,N,\ k\ge 1, \]
for some $L>0$ and $L_k>0$, $k\ge 1$, independent of $n$ and $N$. If $n,N\to\infty$ in such a way that $\frac nN\to\lambda\in(0,\infty)$ and
\[ n^{\epsilon}\,\Bigg\|\sum_{j=1}^{N}\mathbb{E}\,\|f_j\|^{2(k-1)}f_j\otimes f_j - a_k I\Bigg\| \to 0, \qquad k\ge 1, \]
for some $a_k\in\mathbb{C}$, $k\ge 1$, and $\epsilon>0$ independent of $n$ and $N$, then for every noncrossing partition $\pi$ of $\{1,\dots,p\}$,
\[ \Bigg|\sum_{\substack{j:\{1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi}}\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)}) \ -\ \prod_{B\in\pi}a_{|B|}\Bigg| \to 0. \]

Proof.
We prove the proposition by induction on $p$. For $p=1$, the result is obvious. For $p\ge 2$, since $\pi$ is a noncrossing partition of $\{1,\dots,p\}$, there is an interval block $B_0\in\pi$. For simplicity, since the trace is invariant under cyclic permutations, we may assume that $B_0=\{1,\dots,q\}$ for some $1\le q\le p$. Thus, for every $j:\{1,\dots,p\}\to\{1,\dots,N\}$ with $\ker j=\pi$, we have, by independence,
\begin{align*}
\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})
&= \mathrm{tr}\,\mathbb{E}\big((f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(q)}\otimes f_{j(q)})\big)\,\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big)\\
&= \mathrm{tr}\,\mathbb{E}\big(\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)}\big)\,\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big),
\end{align*}
since $j(1)=\dots=j(q)$ and $(f\otimes f)^q=\|f\|^{2(q-1)}f\otimes f$. Note that every $j:\{1,\dots,p\}\to\{1,\dots,N\}$ with $\ker j=\pi$ corresponds to a map $j:\{q+1,\dots,p\}\to\{1,\dots,N\}$ with $\ker j=\pi\setminus\{B_0\}$ together with a value $j(1)\in\{1,\dots,N\}\setminus\{j(q+1),\dots,j(p)\}$. Thus,
\begin{align*}
&\sum_{\substack{j:\{1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi}}\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})\\
&= \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\ \sum_{j(1)\in\{1,\dots,N\}\setminus\{j(q+1),\dots,j(p)\}}\mathrm{tr}\,\mathbb{E}\big(\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)}\big)\,\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big)\\
&= \mathrm{tr}\,(a_q I)\sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big)\\
&\quad + \mathrm{tr}\,\Bigg(\sum_{j(1)=1}^{N}\mathbb{E}\,\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)} - a_q I\Bigg)\sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big)\\
&\quad - \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\ \sum_{j(1)\in\{j(q+1),\dots,j(p)\}}\mathrm{tr}\,\mathbb{E}\big(\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)}\big)\,\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big).
\end{align*}
By the induction hypothesis, the first term
\[ \mathrm{tr}\,(a_q I)\sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big) \]
converges to $a_q\prod_{B\in\pi\setminus\{B_0\}}a_{|B|} = \prod_{B\in\pi}a_{|B|}$. For the second term,
\begin{align*}
&\Bigg|\mathrm{tr}\,\Bigg(\sum_{j(1)=1}^{N}\mathbb{E}\,\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)} - a_q I\Bigg)\sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big)\Bigg|\\
&\le \Bigg\|\sum_{j(1)=1}^{N}\mathbb{E}\,\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)} - a_q I\Bigg\|\ \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\big\|\mathbb{E}\,(f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big\|\\
&\le \Bigg\|\sum_{j(1)=1}^{N}\mathbb{E}\,\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)} - a_q I\Bigg\|\ \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}C_\epsilon\,n^{-|\{j(q+1),\dots,j(p)\}|+\epsilon} \qquad\text{by Lemma 3.7}\\
&\le \Bigg\|\sum_{j(1)=1}^{N}\mathbb{E}\,\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)} - a_q I\Bigg\|\,C_\epsilon\,n^{\epsilon}\ \to\ 0
\end{align*}
by the hypothesis (applying Lemma 3.7 with the $\epsilon$ from the hypothesis). For the third term,
\begin{align*}
&\Bigg|\sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\ \sum_{j(1)\in\{j(q+1),\dots,j(p)\}}\mathrm{tr}\,\mathbb{E}\big(\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)}\big)\,\mathbb{E}\big((f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big)\Bigg|\\
&\le \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\ \sum_{j(1)\in\{j(q+1),\dots,j(p)\}}\big\|\mathbb{E}\,\|f_{j(1)}\|^{2(q-1)}f_{j(1)}\otimes f_{j(1)}\big\|\ \big\|\mathbb{E}\,(f_{j(q+1)}\otimes f_{j(q+1)})\cdots(f_{j(p)}\otimes f_{j(p)})\big\|\\
&\le \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}\ \sum_{j(1)\in\{j(q+1),\dots,j(p)\}}C\,n^{-1+\frac14}\,C\,n^{-|\{j(q+1),\dots,j(p)\}|+\frac14} \qquad\text{by Lemma 3.7 with }\epsilon=\tfrac14\\
&\le \sum_{\substack{j:\{q+1,\dots,p\}\to\{1,\dots,N\}\\ \ker j=\pi\setminus\{B_0\}}}p\,C\,n^{-|\{j(q+1),\dots,j(p)\}|-\frac12}
\ \le\ C\,n^{-\frac12}\ \to\ 0. \qquad\Box
\end{align*}

Proof of Theorem 1.1.
\begin{align*}
\mathbb{E}\circ\mathrm{tr}\,(f_1\otimes f_1+\dots+f_N\otimes f_N)^p
&= \sum_{j:\{1,\dots,p\}\to\{1,\dots,N\}}\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})\\
&= \sum_{\substack{j:\{1,\dots,p\}\to\{1,\dots,N\}\\ \ker j\ \text{noncrossing}}}\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})
+ \sum_{\substack{j:\{1,\dots,p\}\to\{1,\dots,N\}\\ \ker j\ \text{crossing}}}\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)}).
\end{align*}
The first term converges to $\sum_{\pi\in\mathrm{NC}(p)}\prod_{B\in\pi}a_{|B|}$ by Proposition 3.8. For the second term,
\[ \Bigg|\sum_{\substack{j:\{1,\dots,p\}\to\{1,\dots,N\}\\ \ker j\ \text{crossing}}}\mathbb{E}\circ\mathrm{tr}\,(f_{j(1)}\otimes f_{j(1)})\cdots(f_{j(p)}\otimes f_{j(p)})\Bigg|
\ \le\ \sum_{\substack{j:\{1,\dots,p\}\to\{1,\dots,N\}\\ \ker j\ \text{crossing}}}C\,n^{-|\{j(1),\dots,j(p)\}|-1+\frac12}
\ \le\ C\,n^{-\frac12}\ \to\ 0, \]
where the first inequality follows from Proposition 3.5 with $\epsilon=\frac12$. □

References

[1] R. Adamczak, On the Marchenko-Pastur and circular laws for some classes of random matrices with dependent entries, Electron. J. Probab. (2011), 1068-1095.
[2] V. Marchenko, L. Pastur, Distribution of eigenvalues of some sets of random matrices, Math. USSR-Sb. 1 (1967), 507-536.

Department of Mathematics, Texas A&M University, College Station, Texas 77843