[PDF] Bernstein type inequality for a class of dependent random matrices

Abstract

In this paper we obtain a Bernstein type inequality for the sum of self-adjoint centered and geometrically absolutely regular random matrices with bounded largest eigenvalue. This inequality can be viewed as an extension to the matrix setting of the Bernstein-type inequality obtained by Merlevède et al. (2009) in the context of real-valued bounded random variables that are geometrically absolutely regular. The proofs rely on decoupling the Laplace transform of a sum on a Cantor-like set of random matrices.

Full PDF

aa r X i v : . [ m a t h . P R ] A p r Bernstein type inequality for a class of dependent randommatrices

Marwa Banna a , Florence Merlev`ede b , Pierre Youssef ca b Universit´e Paris Est, LAMA (UMR 8050), UPEMLV, CNRS, UPEC, 5 Boulevard Descartes,77 454 Marne La Vall´ee, France.E-mail: [email protected]; ﬂ[email protected] c Department of Mathematical and Statistical sciences. University of Alberta, Canada.Email: [email protected] words: Random matrices, Bernstein inequality, Deviation inequality, Absolute regular-ity, β -mixing coeﬃcients.Mathematics Subject Classiﬁcation (2010): 60B20, 60F10. Abstract.

In this paper we obtain a Bernstein type inequality for the sum of self-adjoint centeredand geometrically absolutely regular random matrices with bounded largest eigenvalue. Thisinequality can be viewed as an extension to the matrix setting of the Bernstein-type inequalityobtained by Merlev`ede et al. (2009) in the context of real-valued bounded random variablesthat are geometrically absolutely regular. The proofs rely on decoupling the Laplace transformof a sum on a Cantor-like set of random matrices.

The analysis of the spectrum of large matrices has known signiﬁcant development recentlydue to its important role in several domains. One of the questions is to study the ﬂuctuations ofa Hermitian matrix X from its expectation measured by its largest eigenvalue. Matrix concen-tration inequalities give probabilistic bounds for such ﬂuctuations and provide eﬀective methodsfor studying several models. The Laplace transform method, which is due to Bernstein in thescalar case, was generalized to the sum of independent Hermitian random matrices by Ahlswedeand Winter in [2]. The starting point is that the usual Chernoﬀ bound when we deal with thepartial sums associated to real-valued random variables has the following counterpart in thematrix setting: P (cid:16) λ max (cid:0) n X i =1 X i (cid:1) ≥ x (cid:17) ≤ inf t> n e − tx · E Tr (cid:0) e t P ni =1 X i (cid:1)o (1)(see [2]). Here and all along the paper, ( X i ) i ≥ is a family of d × d self-adjoint random matrices.The main problem is then to give a suitable bound for L n ( t ) := E Tr (cid:0) e t P ni =1 X i (cid:1) . In the inde-pendent case, starting from the Golden-Thompson inequality stating that if A and B are twoself-adjoint matrices, Tr (cid:0) e A + B (cid:1) ≤ Tr (cid:0) e A e B (cid:1) , Ahlswede and Winter have observed that E Tr (cid:0) e t P ni =1 X i (cid:1) ≤ λ max ( E (e t X n )) · E Tr (cid:0) e t P n − i =1 X i (cid:1) (2)and gave a bound for L n ( t ) by iterating the procedure above. In [17], Tropp used Lieb’s concavitytheorem (see [9]) to improve the bound on L n ( t ) stated in [2] and obtained Lemma 4 of Section1.1. This lemma then allows to extend to the matrix setting the usual Bernstein inequality forthe partial sum associated with independent real-valued random variables.Let us mention that an extension of the so-called Hoeﬀding-Azuma inequality for matrixmartingales and of the so-called McDiarmid bounded diﬀerence inequality for matrix-valuedfunctions of independent random variables is also stated in [17].Taking another direction, Mackey et al. [10] extended to the matrix setting Chatterjee’stechnique for developing scalar concentration inequalities via Stein’s method of exchangeablepairs (see [4] and [5]), and established Bernstein and Hoeﬀding inequalities as well as concentra-tion inequalities. Following this approach, Paulin et al. [14] established a so-called McDiarmidinequality for matrix-valued functions of dependent random variables under conditions on theassociated Dobrushin interdependence matrix.The aim of this paper is to give an extension of the Bernstein deviation inequality whenwe consider the largest eigenvalue of the partial sums associated with self-adjoint, centered andabsolutely regular random matrices with bounded largest eigenvalue. This kind of dependencecannot be compared to the dependence structure imposed in [10] or in [14].Note that for dependent random matrices, the ﬁrst step given by (2) of the iterative procedurein [2] fails as well as the concave trace function method used in [17]. Therefore additionaltransformations on the Laplace transform have to be made. Even in the scalar dependentcase, obtaining sharp Bernstein-type inequalities is a challenging problem and a dependencestructure of the underlying process has obviously to be precise. For instance, Adamczak [1]proved a Bernstein-type inequality for the partial sum associated with bounded functions of ageometrically ergodic Harris recurrent Markov chain. He showed that even in this context whereit is possible to go back to the independent setting by creating random iid cycles, a logarithmicextra factor (compared to the independent case) cannot be avoided (see Theorem 6 and Section3.2 in [1]).In [11] and [12], Merlev`ede et al. considered more general dependence structures than Harrisrecurrent Markov chains and proved Bernstein-type inequalities for the partial sums associatedwith bounded real-valued random variables whose strong mixing coeﬃcients (or τ -dependentcoeﬃcients) decrease geometrically or sub-geometrically. Note that in [12], the case of real-valued random variables that are not necessarily bounded is also treated. The method usedin both papers mentioned consists of partitioning the n random variables in blocks indexed byCantor-type sets plus a remainder. The idea is then to control the log-Laplace transform of eachpartial sum on the Cantor-type sets. The log-Laplace transform of the total partial sum is thenhandled with the help of a general result which provides bounds for the log-Laplace transform ofany sum of real-valued random variables (see our Lemma 5 in the context of random matrices).Obviously, the main step is to obtain a suitable upper bound of the log-Laplace transform of thepartial sum on each of the Cantor-type set. The dependence structure assumed in [11] or [12]allow the following control: for any index sets Q and Q ′ of natural numbers such that Q ⊂ [1 , p ]and Q ′ ⊂ [ n + p, ∞ ) and any t > E (cid:0) e t P i ∈ Q X i e t P i ∈ Q ′ X i (cid:1) ≤ E (cid:0) e t P i ∈ Q X i (cid:1) E (cid:0) e t P i ∈ Q ′ X i (cid:1) + ε ( n ) k e t P i ∈ Q X i k ∞ k e t P i ∈ Q ′ X i k ∞ , (3)where ε ( n ) is a sequence of positive real numbers depending on the dependent coeﬃcients. Thebinary tree structure of the Cantor-type sets allows to iterate the decorrelation procedure above-mentioned allowing then to suitably handle the log-Laplace transform of the partial sum on eachof the Cantor-type set.In the random matrix setting, iterating a procedure as (3) cannot lead to suitable exponentialinequalities essentially due to the fact that the extension of the Golden-Thompson inequality tothree or more Hermitian matrices fails, and then the extension of the exponential inequalitiesstated in [11] and [12] to the matrix setting is not straight forward. To beneﬁt of the ideasdeveloped in [2] or in [17], we shall rather bound the log-Laplace transform of the partial sum2ndexed by a Cantor-type set, say K , by the log-Laplace transform of the sum of 2 ℓ independentand self-adjoint random matrices plus a small error term (here ℓ depends on the cardinality of K ). Lemma 8 is in this direction and can be viewed as a decoupling lemma for the Laplacetransform in the matrix setting. As we shall see, a well-adapted dependence structure allowingsuch a procedure is the absolute regularity structure. Indeed, the Berbee’s coupling lemma(see Lemma 6 below) allows a ”good coupling” in terms of absolute regular coeﬃcients (see thedeﬁnition (4)) even when the underlying random variables take values in a high dimensionalspace (working with d × d random matrices can be viewed as working with random vectors ofdimension d ). The decoupling lemma 8 associated with additional coupling arguments will thenallow us to prove our key Proposition 7 giving a bound for the Laplace transform of the partialsum indexed by Cantor-type set of self-adjoint random matrices. As we shall see, our methodallows to extend the scalar Bernstein type inequality given in [11] to the matrix setting.Our paper is organized as follows. In Section 2, we introduce some notations and deﬁnitionsand state our Bernstein-type inequality for the class of random matrices we consider (see The-orem 1). Section 3 is devoted to some examples of matrix models where this Bernstein-typeinequality applies. The proof of the main result is given in Section 4. For any d × d matrix X = [( X ) i,j ] di,j =1 whose entries belong to K = R or C , we associate itscorresponding vector X in K d whose coordinates are the entries of X i.e. X = (cid:0) ( X ) i,j , ≤ i ≤ d (cid:1) ≤ j ≤ d . Therefore X = (cid:0) X i , ≤ i ≤ d (cid:1) where X i = ( X ) i − ( j − d,j for ( j − d + 1 ≤ i ≤ jd , and X will be called the vector associated with X . Reciprocally, given X = (cid:0) X ℓ , ≤ ℓ ≤ d (cid:1) in K d we shall associate a d × d matrix X by setting X = (cid:2) ( X ) i,j (cid:3) ni,j =1 where ( X ) i,j = X i +( j − d . The matrix X will be referred to as the matrix associated with X .In all the paper we consider a family ( X i ) i > of d × d self-adjoint random matrices whose en-tries are deﬁned on a probability space (Ω , A , P ) and with values in K , and that are geometricallyabsolutely regular in the following sense. Let β = 1 and β k = sup j > β ( σ ( X i , i ≤ j ) , σ ( X i , i > j + k )) , for any k ≥

1, (4)where β ( A , B ) = 12 sup (cid:8) X i ∈ I X j ∈ J | P ( A i ∩ B j ) − P ( A i ) P ( B j ) | (cid:9) , the maximum being taken over all ﬁnite partitions ( A i ) i ∈ I and ( B i ) i ∈ J of Ω respectively withelements in A and B .The ( β k ) k > are usually called the coeﬃcients of absolute regularity of the sequence of vectors( X i ) i > and we shall assume in this paper that they decrease geometrically in the sense thatthere exists c > k > β k = sup j > β ( σ ( X i , i ≤ j ) , σ ( X i , i > j + k )) ≤ e − c ( k − , (5)3ote that the β k coeﬃcients have been introduced by Kolmogorov and Rozanov [8] and even ifthey are more restrictive than the so-called Rosenblatt strong mixing coeﬃcients α k they canbe computed in many situations. For instance, we refer to the work by Doob [6] for suﬃcientconditions on Markov chains to be geometrically absolutely regular or by Mokkadem [13] formild conditions ensuring vector ARMA processes to be also geometrically β -mixing.In all the paper, we will assume that the underlying probability space (Ω , A , P ) is rich enoughto contain a sequence ( ǫ i ) i ∈ Z = ( δ i , η i ) i ∈ Z of iid random variables with uniform distribution over[0 , , independent of ( X i ) i > . In addition, the following notations will be used : log x := ln x ,log x = log x log 2 , we write for the zero matrix and I d for the d × d identity matrix, we use the curlyinequalities to denote the semideﬁnite ordering i.e. (cid:22) X means that X is positive semideﬁnite. Theorem 1

Let ( X i ) i > be a family of self-adjoint random matrices of size d . Assume that (5) holds and that there exists a positive constant M such that for any i ≥ , E ( X i ) = and λ max ( X i ) ≤ M almost surely. (6) Then there exists a universal positive constant C such that for any x > and any integer n > , P (cid:16) λ max (cid:0) n X i =1 X i (cid:1) ≥ x (cid:17) ≤ d exp (cid:16) − Cx v n + c − M + xM γ ( c, n ) (cid:17) . where v = sup K ⊆{ ,...,n } K λ max (cid:16) E (cid:0) X i ∈ K X i (cid:1) (cid:17) (7) and γ ( c, n ) = log n log 2 max (cid:16) ,

32 log nc log 2 (cid:17) . (8)In the deﬁnition of v above, the maximum is taken over all nonempty subsets K ⊆ { , . . . , n } .To prove the deviation inequality stated in Theorem 1, we shall use the matrix Chernoﬀbound (1). The theorem will then follow from the following control of the matrix log-Laplacetransform that is proved in Section 4.3: Under the conditions of Theorem 1, for any positive t such that tM < /γ ( c, n ), we havelog E Tr (cid:16) exp (cid:0) t n X i =1 X i (cid:1)(cid:17) ≤ log d + t n (cid:0) v + 2 M/ ( cn ) / (cid:1) − tM γ ( c, n ) . As proved in Section 4.2.4 of [10], this inequality together with Jensen’s inequality leads tothe following upper bound for the expectation of the largest eigenvalue of P ni =1 X i : Under theconditions of Theorem 1, E λ max (cid:16) n X i =1 X i (cid:17) ≤ v p n log d + 4 M c − / p log d + M γ ( c, n ) log d . Let ( τ k ) k be a stationary sequence of real-valued random variables such that k τ k ∞ ≤ Y k ) k of independent real and symmetric d × d random matrices which isindependent of ( τ k ) k . For any i = 1 , . . . , n , let X i = τ i Y i and note that in this case β k = β ( σ ( τ i , i ≤ , σ ( τ i , i ≥ k )) . orollary 2 Assume that there exists a positive constant c such that β k ≤ e − c ( k − for any k ≥ and suppose that each random matrix Y k satisﬁes EY k = , λ max ( Y k ) ≤ M and λ min ( Y k ) ≥ − M almost surely.Then for any t > and any integer n ≥ , P (cid:16) λ max (cid:0) n X k =1 τ k Y k (cid:1) ≥ t (cid:17) ≤ d exp (cid:16) − Ct nM E ( τ ) + M + tM (log n ) (cid:17) , where C is a positive constant depending only on c .Proof. The above corollary follows by noting that for any K ⊆ { , . . . , n } Σ K := E (cid:0) X k ∈ K τ k Y k (cid:1) = X k ∈ K E ( τ k ) E ( Y k ) = E ( τ ) X k ∈ K E ( Y k ) , which, by Weyl’s inequality, implies that λ max (Σ K ) ≤ M Card( K ) E ( τ ). Therefore, we inferthat v ≤ M E ( τ ). (cid:3) We consider now another model for which Theorem 1 can be applied. Let ( X k ) k ∈ Z be ageometrically absolutely regular sequence of real-valued centered random variables. That is,there exists a positive constant c such that for any k ≥ ℓ ∈ Z β (cid:0) σ ( X i , i ≤ ℓ ) , σ ( X i , i ≥ k + ℓ ) (cid:1) ≤ e − c ( k − . (9)For any i = 1 , . . . , n , let X i be the d × d random matrix deﬁned by X i = C i C Ti − E ( C i C Ti ) where C i = ( X ( i − d +1 , . . . , X id ) T . Note that in this case, β k = sup ℓ ∈ Z β (cid:0) σ ( C i , i ≤ ℓ ) , σ ( C i , i ≥ ℓ + k ) (cid:1) ≤ e − c d ( k − . for any k ≥ Corollary 3

Assume that ( X k ) k satisﬁes (9) . Suppose in addition that there exists a positiveconstant M satisfying sup k k X k k ∞ ≤ M a.s. Then, for any x > and any integer n ≥ P (cid:16) λ max (cid:0) n X i =1 X i (cid:1) ≥ x (cid:17) ≤ d exp (cid:16) − Cx ndM + dM + xM ( d log n + log n ) (cid:17) , where C is a positive constant depending only on c .Proof. For any i ∈ { , . . . , n } , note that λ max ( X i ) ≤ λ max ( C i C Ti ) implying that λ max ( X i ) ≤ dM a.s. To get the desired result, it remains to control v . We have for any K ⊆ { , . . . , N } ,Σ K := E (cid:0) X i ∈ K X i (cid:1) = X i,j ∈ K Cov( C i C Ti , C j C Tj )and we note that the ( k, ℓ ) th component of Σ K is(Σ K ) k,ℓ = h E (cid:0) X i ∈ K X i (cid:1) i k,ℓ = X i,j ∈ K d X s =1 Cov (cid:0) X ( i − d + k X ( i − d + s , X ( j − d + s X ( j − d + ℓ (cid:1) . Therefore we infer by Gerschgorin’s theorem that (cid:12)(cid:12) λ max (cid:0) Σ K (cid:1)(cid:12)(cid:12) sup k d X ℓ =1 | (Σ K ) k,ℓ | sup k X i,j ∈ K d X ℓ =1 d X s =1 (cid:12)(cid:12) Cov (cid:0) X ( i − d + k X ( i − d + s , X ( j − d + s X ( j − d + ℓ (cid:1)(cid:12)(cid:12) . After tedious computations involving Ibragimov’s covariance inequality (see [7]), we infer that v ≤ c dM where c is a positive constant depending only on c . Applying Theorem 1 withthese upper bounds ends the proof. (cid:3) Proof of Theorem 1

The proof of Theorem 1 being very technical, it is divided into several steps. In Section 4.1,we ﬁrst collect some technical preliminary lemmas that will be necessary all along the proof.In Section 4.2, we give the main ingredient to prove our Bernstein-type inequality, namely: abound for the Laplace transform of the partial sum, indexed by a suitable Cantor-type set, ofthe self-adjoint random matrices under consideration (see Proposition 7 and Section 4.2.1 for theconstruction of this suitable Cantor-set). As quoted in the introduction, this key result is basedon a decoupling lemma which is stated in Section 4.2.2. The proof of Theorem 1 is completedin Section 4.3.

The following lemma is due to Tropp [17]. Under the form stated below, it is a combination ofhis Lemmas 3.4 and 6.7 together with the proof of his Corollary 3.7.

Lemma 4

Let K be a ﬁnite subset of positive integers. Consider a family ( U k ) k ∈ K of d × d self-adjoint random matrices that are mutually independent. Assume that for any k ∈ K , E ( U k ) = and λ max ( U k ) ≤ B a.s.where B is a positive constant. Then for any t > , E Tr (cid:0) e t P k ∈ K U k (cid:1) ≤ d exp (cid:16) t g ( tB ) λ max (cid:16) X k ∈ K E ( U k ) (cid:17)(cid:17) , (10) where g ( x ) = x − (e x − x − . The next lemma is an adaptation of Lemma 3 in [12] to the case of the log-Laplace transformof any sum of d × d self-adjoint random matrices. Lemma 5

Let U , U , . . . be a sequence of d × d self-adjoint random matrices. Assume thatthere exists positive constants σ , σ , . . . and κ , κ , . . . such that, for any i ≥ and any t in [0 , /κ i [ , log E Tr (cid:0) e t U i (cid:1) ≤ C d + ( σ i t ) / (1 − κ i t ) , where C d is a positive constant depending only on d . Then, for any positive n and any t in [0 , / ( κ + κ + · · · + κ n )[ , log E Tr (cid:0) e t P nk =0 U k (cid:1) ≤ C d + ( σt ) / (1 − κt ) , where σ = σ + σ + · · · + σ n and κ = κ + κ + · · · + κ n . Proof.

Lemma 5 follows from the case n = 1 by induction on n . For any t ≥

0, let L ( t ) = log E Tr (cid:0) e t ( U + U ) (cid:1) and notice that by the Golden-Thompson inequality, L ( t ) ≤ log E Tr (cid:0) e t U e t U (cid:1) . (11)Deﬁne the functions γ i by γ i ( t ) = ( σ i t ) / (1 − κ i t ) for t ∈ [0 , /κ i [ and γ i ( t ) = + ∞ for t ≥ /γ i , A and B are d × d self-adjoint random matrices then, for any 1 ≤ p, q ≤ ∞ with p − + q − = 1, | Tr( AB ) | ≤ k A k S p k B k S q , (12)where k A k S p = k ( λ i ( A )) di =1 k ℓ pn = (cid:16) P di =1 | λ i ( A ) | p (cid:17) /p (resp. k B k S q ) is the p -Schatten norm of A (resp the q -Schatten norm of B ).Starting from (11) and applying (12) with A = e t U and B = e t U , we derive that for any t > p ∈ ]1 , ∞ [ L ( t ) ≤ log E (cid:16) k e t U k S p k e t U k S q (cid:17) , which gives by applying H¨older’s inequality L ( t ) ≤ p − log E k e t U k p S p + q − log E k e t U k q S q . Observe now that since U is self-adjoint k e t U k p S p = d X i =1 | λ i (e t U ) | p = d X i =1 λ i (e tp U ) = Tr (cid:0) e tp U (cid:1) , and similarly k e t U k q S q = Tr (cid:0) e tq U (cid:1) . So, overall, L ( t ) ≤ p − log E Tr (cid:0) e tp U (cid:1) + q − log E Tr (cid:0) e tq U (cid:1) . (13)For any t in [0 , /κ [, take u t = ( σ /σ )(1 − κt ) + κ t (here κ = κ + κ and σ = σ + σ ). Withthis choice 1 − u t = ( σ /σ )(1 − κt ) + κ t , so that u t belongs to ]0 , p = 1 /u t , we get that for any t in [0 , /κ [, L ( t ) ≤ u t γ ( t/u t ) + (1 − u t ) γ ( t/ (1 − u t )) = ( σt ) / (1 − κt ) , which completes the proof of Lemma 5. (cid:3) Next lemma allows coupling and is due to Berbee [3].

Lemma 6

Let X and Y be two random variables deﬁned on a probability space (Ω , A , P ) andtaking their values in Borel spaces B and B respectively. Assume that (Ω , A , P ) is rich enoughto contain a random variable δ with uniform distribution over [0 , independent of ( X, Y ) .Then there exists a random variable Y ∗ = f ( X, Y, δ ) where f is a measurable function from B × B × [0 , into B such that Y ∗ is independent of X , has the same distribution as Y and P ( Y = Y ∗ ) = β ( σ ( X ) , σ ( Y )) . Let us note that the β -mixing coeﬃcient β ( σ ( X ) , σ ( Y )) has the following equivalent deﬁnition: β ( σ ( X ) , σ ( Y )) = 12 k P X,Y − P X ⊗ P Y k , (14)where P X,Y is the joint distribution of (

X, Y ) and P X and P Y are respectively the distributionsof X and Y and, for two positive measures µ and ν , the notation k µ − ν k denotes the totalvariation of µ − ν . 7 .2 A key result The next proposition is the main ingredient to prove Theorem 1. It is based on a suitableconstruction of a subset K A of { , . . . , A } for which it is possible to give a good upper bound forthe Laplace transform of P i ∈ K A X i . Its proof is based on the decoupling Lemma 8 below thatallows to compare E Tr (cid:0) e t P i ∈ KA X i (cid:1) with the same quantity but replacing P i ∈ K A X i with a sumof independent blocks. Proposition 7

Let ( X i ) i > be as in Theorem 1. Let A be a positive integer larger than . Thenthere exists a subset K A of { , . . . , A } with Card( K A ) > A/ , such that for any positive t suchthat tM ≤ min (cid:0) , c log 232 log A (cid:1) , log E Tr (cid:16) e t P i ∈ KA X i (cid:17) ≤ log d + 4 × . t Av + 9( tM ) c e − c/ (32 tM ) , (15) where v is deﬁned in (7) . The proof of this proposition is divided into several steps. K A As in [11] and [12], the set K A will be a ﬁnite union of 2 ℓ disjoint sets of consecutive integerswith same cardinality spaced according to a recursive ‘Cantor”-like construction. Let δ = log 22 log A and ℓ := ℓ A = sup { k ∈ N ∗ : Aδ (1 − δ ) k − k > } . Note that ℓ ≤ log A/ log 2 and δ ≤ / A > n = A and for any j ∈ { , . . . , ℓ } , n j = (cid:6) A (1 − δ ) j j (cid:7) and d j − = n j − − n j . (16)For any nonnegative x , the notation ⌈ x ⌉ means the smallest integer which is larger or equal to x . Note that for any j ∈ { , . . . , ℓ − } , d j > Aδ (1 − δ ) j j − > Aδ (1 − δ ) j j +1 , (17)where the last inequality comes from the deﬁnition of ℓ . Moreover, n ℓ ≤ A (1 − δ ) ℓ ℓ + 1 ≤ A (1 − δ ) ℓ ℓ − , (18)where the last inequality comes from the fact that Aδ (1 − δ ) ℓ − ℓ × − δδ > ℓ and the fact that δ ≤ / K A we proceed as follows. At the ﬁrst step, we divide the set { , . . . , A } intothree disjoint subsets of consecutive integers: I , , I ∗ , and I , . These subsets are such thatCard( I , ) = Card( I , ) = n and Card( I ∗ , ) = d . At the second step, each of the sets ofintegers I ,i , i = 1 ,

2, is divided into three disjoint subsets of consecutive integers as follows: forany i = 1 , I ,i = I , i − ∪ I ∗ ,i ∪ I , i where Card( I , i − ) = Card( I , i ) = n and Card( I ∗ ,i ) = d .Iterating this procedure we have constructed after j steps (1 ≤ j ≤ ℓ A ), 2 j sets of consecutiveintegers, I j,i , i = 1 , . . . , j , each of cardinality n j such that a j, k − b j, k − − d j − for any k = 1 , . . . , j − , where a j,i = min { k ∈ I j,i } and b j,i = max { k ∈ I j,i } . Moreover if, for any i = 1 , . . . , j − , we set I ∗ j − ,i = { b j, i − + 1 , . . . , a j, i − } , then I j − ,i = I j, i − ∪ I ∗ j − ,i ∪ I j, i .8fter ℓ steps we then have constructed 2 ℓ sets of consecutive integers, I ℓ,i , i = 1 , . . . , ℓ , eachof cardinality n ℓ such that I ℓ, i − and I ℓ, i are spaced by d ℓ − integers. The set of consecutiveintegers K A is then deﬁned by K A = ℓ [ k =1 I ℓ,k . Note that { , . . . , A } = K A ∪ ( ∪ ℓ − j =0 ∪ j i =1 I ∗ j,i )Therefore Card( { , . . . , A } \ K A ) = ℓ − X j =0 2 j X i =1 Card( I ∗ j,i ) = ℓ − X j =0 j d j = A − ℓ n ℓ . But A − ℓ n ℓ ≤ A (cid:0) − (1 − δ ) ℓ (cid:1) = Aδ ℓ − X j =0 (1 − δ ) j ≤ Aδℓ ≤ A . (19)Therefore A > Card( K A ) > A/ k ∈ { , , . . . , ℓ } and any j ∈ { , . . . , k } , let K k,j := K A,k,j = j ℓ − k [ i =( j − ℓ − k +1 I ℓ,i (20)Therefore K , = K A and, for any j ∈ { , . . . , ℓ } , K ℓ,j = I ℓ,j . Moreover, for any k ∈ { , . . . , ℓ } and any j ∈ { , . . . , k − } , there are exactly d k − integers between K k, j − and K k, j . We start by introducing some notations, then we state the decoupling Lemma 8 below that isfundamental to prove Proposition 7. Let K A be deﬁned as in Step 1. In the rest of the proof,we will adopt the following notation. For any integer m ∈ { , . . . , ℓ } , ( V ( m ) j ) ≤ j ≤ m will denotea family of 2 m mutually independent random vectors deﬁned on (Ω , A , P ), each of dimension s d,ℓ,m := d Card( K m,j ) = d ℓ − m n ℓ and such that V ( m ) j = D ( X i , i ∈ K m,j ) . (21)The existence of such a family is ensured by the Skorohod lemma (see [15]). Indeed since(Ω , A , P ) is assumed to be large enough to contain a sequence ( δ i ) i ∈ Z of iid random variablesuniformly distributed on [0 ,

1] and independent of the sequence ( X i ) i > , there exist measurablefunctions f j such that the vectors V ( m ) j = f j (cid:0) ( X i , i ∈ K m,k ) k =1 ,...,j , δ j (cid:1) , j = 1 , . . . , m , areindependent and satisfy (21).Let π ( m ) i be the i -th canonical projection from K s d,ℓ,m onto K d , namely: for any vector x = ( x i , i ∈ K m,j ) of K s d,ℓ,m , π ( m ) i ( x ) = x i . For any i ∈ K m,j , let X ( m ) j ( i ) = π ( m ) i ( V ( m ) j ) and S ( m ) j = X i ∈ K m,j X ( m ) j ( i ) , (22)where X ( m ) j ( i ) is the d × d random matrix associated with X ( m ) j ( i ) (recall that this means thatthe ( k, ℓ )-th entry of X ( m ) j ( i ) is the (( ℓ − d + k )-th coordinate of the vector X ( m ) j ( i )).9ith the above notations, we have E Tr (cid:0) e t P i ∈ KA X i (cid:1) = E Tr (cid:0) e t S (0)1 (cid:1) . (23)We are now in position to state the following lemma which will be a key step in the proof ofProposition 7 and allows decoupling when we deal with the Laplace transform of a sum of selfadjoint random matrices. Lemma 8

Assume that (6) holds. Then for any t > and any k ∈ { , . . . , ℓ − } , E Tr (cid:16) e t P kj =1 S ( k ) j (cid:17) ≤ E Tr (cid:16) e t P k +1 j =1 S ( k +1) j (cid:17)(cid:16) β d k +1 e tMn ℓ ℓ − k (cid:17) k , where ( S ( k ) j ) j =1 ,..., k is the family of mutually independent random matrices deﬁned in (22) .Proof. Note that for any k ∈ { , . . . , ℓ − } and any j ∈ { , . . . , k } , K k,j = K k +1 , j − ∪ K k +1 , j where the union is disjoint. Therefore S ( k ) j = S ( k ) j, + S ( k ) j, and V ( k ) j = ( V ( k ) j, , V ( k ) j, ) , where S ( k ) j, := P i ∈ K k +1 , j − X ( k ) j ( i ), S ( k ) j, := P i ∈ K k +1 , j X ( k ) j ( i ), V ( k ) j, := (cid:0) X ( k ) j ( i ) , i ∈ K k +1 , j − (cid:1) and V ( k ) j, := (cid:0) X ( k ) j ( i ) , i ∈ K k +1 , j (cid:1) . Note that there are exactly d k integers between K k +1 , j − and K k +1 , j and that for any i ∈ { , . . . , k +1 } ,Card( K k +1 ,i ) = Card( K k +1 , ) = 2 ℓ − ( k +1) n ℓ . Recall that the probability space is assumed to be large enough to contain a sequence( δ i , η i ) i ∈ Z of iid random variables uniformly distributed on [0 , independent of the sequence( X i ) i > . Therefore according to the remark on the existence of the family ( V ( m ) j ) ≤ j ≤ m made atthe beginning of Section 4.2.2, the sequence ( η i ) i ∈ Z is independent of ( V ( m ) j ) ≤ j ≤ m . Accordingto Lemma 6 there exists a random vector e V ( k )1 , of size d Card( K k +1 , ) with the same law as V ( k )1 , that is measurable with respect to σ ( η ) ∨ σ ( V ( k )1 , ) ∨ σ ( V ( k )1 , ), independent of σ ( V ( k )1 , ) and suchthat P ( e V ( k )1 , = V ( k )1 , ) = β (cid:0) σ ( V ( k )1 , ) , σ ( V ( k )1 , ) (cid:1) ≤ β d k +1 , where the inequality comes from the fact that, by relation (14), the quantity β (cid:0) σ ( V ( k )1 , ) , σ ( V ( k )1 , ) (cid:1) depends only on the joint distribution of ( V ( k )1 , , V ( k )1 , ) and therefore, by (21), β (cid:0) σ ( V ( k )1 , ) , σ ( V ( k )1 , ) (cid:1) = β (cid:0) σ ( X i , i ∈ K k +1 , ) , σ ( X i , i ∈ K k +1 , ) (cid:1) ≤ β d k +1 . Note that by construction, e V ( k )1 , is independent of σ (cid:0) V ( k )1 , , ( V ( k ) j ) j =2 ,..., k (cid:1) .For any i ∈ K k +1 , , let e X ( k )1 , ( i ) = π ( k +1) i ( e V ( k )1 , ) and e S ( k )1 , = X i ∈ K k +1 , e X ( k )1 , ( i ) , where e X ( k )1 , ( i ) is the d × d random matrix associated with the random vector e X ( k )1 , ( i ).10ith the notations above, we have E Tr exp (cid:16) t k X j =1 S ( k ) j (cid:17) = E (cid:16) e V ( k )1 , = V ( k )1 , Tr exp (cid:0) t k X j =1 S ( k ) j (cid:1)(cid:17) + E (cid:16) e V ( k )1 , = V ( k )1 , Tr exp (cid:0) t k X j =1 S ( k ) j (cid:1)(cid:17) ≤ E Tr exp (cid:16) t S ( k )1 , + t e S ( k )1 , + t k X j =2 S ( k ) j (cid:17) + E (cid:16) e V ( k )1 , = V ( k )1 , Tr exp (cid:0) t k X j =1 S ( k ) j (cid:1)(cid:17) . (24)(With usual convention, P k j = ℓ S ( k ) j is the null vector if ℓ > k ). By Golden-Thompson inequality,we have Tr exp (cid:16) t k X j =1 S ( k ) j (cid:17) ≤ Tr (cid:16) e t S ( k )1 · e t P kj =2 S ( k ) j (cid:17) . Hence, since σ (cid:0) V ( k ) j , j = 2 , . . . , k (cid:1) is independent of σ (cid:0) V ( k )1 , , V ( k )1 , , e V ( k )1 , (cid:1) , we get E (cid:16) e V ( k )1 , = V ( k )1 , Tr exp (cid:0) t k X j =1 S ( k ) j (cid:1)(cid:17) ≤ Tr (cid:16) E (cid:0) e V ( k )1 , = V ( k )1 , e t S ( k )1 (cid:1) · E exp (cid:0) t k X j =2 S ( k ) j (cid:1)(cid:17) . Note now the following fact: if U is a d × d self-adjoint random matrix with entries deﬁned on(Ω , A , P ) and such that λ max ( U ) ≤ b a.s., then for any Γ ∈ A , Γ e U (cid:22) e b I d Γ a.s. and so λ max E (cid:0) Γ e U (cid:1) ≤ e b P (Γ) . Therefore if we consider V a d × d self-adjoint random matrix with entries deﬁned on (Ω , A , P ),the following inequality is valid:Tr (cid:0) E ( Γ e U ) E (e V ) (cid:1) ≤ e b P (Γ) · E Tr(e V ) . (25)Notice now that ( X ( k )1 ( i ) , i ∈ K k, ) has the same distribution as ( X i , i ∈ K k, ). Therefore λ max ( X ( k )1 ( i )) ≤ M a.s. for any i , implying by Weyl’s inequality that λ max ( t S ( k )1 ) ≤ tM Card( K k, ) = tM ℓ − k n ℓ a.s.Hence, applying (25) with b = tM ℓ − k n ℓ , Γ = { e V ( k )1 , = V ( k )1 , } and V = t P k j =2 S ( k ) j , and takinginto account that P (Γ) ≤ β d k +1 , we obtain E (cid:16) e V ( k )1 , = V ( k )1 , Tr exp (cid:0) t k X j =1 S ( k ) j (cid:1)(cid:17) ≤ β d k +1 e tn ℓ ℓ − k M E Tr exp (cid:16) t k X j =2 S ( k ) j (cid:17) . (26)Note now that if V and W are two independent random matrices with entries deﬁned on (Ω , A , P )and such that E ( W ) = then E Tr exp( V ) = E Tr exp (cid:0) E (cid:0) V + W | σ ( V ) (cid:1)(cid:1) . Since Tr ◦ exp is convex, it follows from Jensen’s inequality applied to the conditional expectationthat E Tr exp( V ) ≤ E (cid:0) E (cid:0) Tr e V + W | σ ( V ) (cid:1)(cid:1) = E (cid:0) Tr e V + W (cid:1) . (27)Since E ( X ( k )1 ( i )) = E ( X i ) = for any i ∈ K k, and σ ( S ( k )1 , , e S ( k )1 , ) is independent of σ ( S ( k ) j , j =2 , . . . , k ), we can apply the inequality above with W = t ( S ( k )1 , + e S ( k )1 , ) and V = t P k j =2 S ( k ) j .Therefore, starting from (26) and using (27), we get E (cid:16) e V ( k )1 , = V ( k )1 , Tr exp (cid:0) t k X j =1 S ( k ) j (cid:1)(cid:17) ≤ β d k +1 e tn ℓ ℓ − k M E Tr exp (cid:16) t (cid:0) S ( k )1 , + e S ( k )1 , + k X j =2 S ( k ) j (cid:1)(cid:17) . (28)11tarting from (24) and considering (28), it follows that E Tr exp (cid:16) t k X j =1 S ( k ) j (cid:17) ≤ (1 + β d k +1 e tn ℓ ℓ − k M ) E Tr exp (cid:16) t (cid:0) S ( k )1 , + e S ( k )1 , + k X j =2 S ( k ) j (cid:1)(cid:17) . (29)The proof of Lemma 8 will then be achieved after having iterated this procedure 2 k − j -th step to the ( j + 1)-thstep.At the end of the j -th step, assume that we have constructed with the help of the couplingLemma 6, j random vectors e V ( k ) i, , i = 1 , . . . , j , each of dimension d Card( K k +1 , ) and satisfyingthe following properties: for any i in { , . . . , j } , e V ( k ) i, is a measurable function of ( V ( k ) i, , V ( k ) i, , η i ),it has the same distribution as V ( k ) i, , is such that P ( e V ( k ) i, = V ( k ) i, ) ≤ β d k +1 , is independent of V ( k ) i, and it satisﬁes E Tr exp (cid:16) t k X j =1 S ( k ) j (cid:17) ≤ (cid:0) β d k +1 e tn ℓ ℓ − k M (cid:1) j · E Tr exp (cid:16) t j X i =1 ( S ( k ) i, + e S ( k ) i, ) + t k X i = j +1 S ( k ) i (cid:17) , (30)where we have implemented the following notation: e S ( k ) i, = X r ∈ K k +1 , i e X ( k ) i, ( r ) . (31)In the notation above, e X ( k ) i, ( r ) is the d × d random matrix associated with the random vector e X ( k ) i, ( r ) of K d deﬁned by e X ( k ) i, ( r ) = π ( k +1) r ( e V ( k ) i, ) for any r ∈ K k +1 , i . Note that the induction assumption above has been proven at the beginning of the proof for j = 1. Moreover, note that since, for any m ∈ { , . . . , ℓ } , ( V ( m ) j ) ≤ j ≤ m is a family of independentrandom vectors, the random vectors e V ( k ) i, , i = 1 , . . . , j , deﬁned above are also such that, for any i ∈ { , . . . , j } , e V ( k ) i, is independent of σ (cid:0) ( V ( k ) ℓ, ) ℓ =1 ,...,i , ( e V ( k ) ℓ, ) ℓ =1 ,...,i − , ( V ( k ) ℓ ) ℓ = i +1 ,..., k (cid:1) .Now to show that the induction hypothesis also holds at step j + 1, we proceed as follows.By Lemma 6, there exists a random vector e V ( k ) j +1 , of size d Card( K k +1 , ) with the same law as V ( k ) j +1 , , measurable with respect to σ ( η j +1 ) ∨ σ ( V ( k ) j +1 , ) ∨ σ ( V ( k ) j +1 , ), independent of σ ( V ( k ) j +1 , )and such that P ( e V ( k ) j +1 , = V ( k ) j +1 , ) ≤ β d k +1 . (32)(The inequality above comes again from (21) and the equivalent deﬁnition (14) of the β -coeﬃcients). Note that by construction, σ (cid:0) ( V ( k ) i, ) i =1 ,...,j +1 , ( e V ( k ) i, ) i =1 ,...,j , ( V ( k ) i ) i = j +2 ,..., k (cid:1) and σ ( e V ( k ) j +1 , ) are independent. With the notation (31), we have the following decomposition: E Tr exp (cid:16) t j X i =1 ( S ( k ) i, + e S ( k ) i, ) + t k X i = j +1 S ( k ) i (cid:17) ≤ E Tr exp (cid:16) t j +1 X i =1 ( S ( k ) i, + e S ( k ) i, ) + t k X i = j +2 S ( k ) i (cid:17) + E (cid:16) e V ( k ) j +1 , = V ( k ) j +1 , Tr exp (cid:0) t j X i =1 ( S ( k ) i, + e S ( k ) i, ) + t k X i = j +1 S ( k ) i (cid:1)(cid:17) . (33)12sing Golden-Thompson inequality, we haveTr exp (cid:16) t j X i =1 ( S ( k ) i, + e S ( k ) i, ) + t k X i = j +1 S ( k ) i (cid:17) ≤ Tr (cid:16) exp (cid:0) t S ( k ) j +1 (cid:1) · exp (cid:0) t j X i =1 ( S ( k ) i, + e S ( k ) i, ) + t k X i = j +2 S ( k ) i (cid:1)(cid:17) . Hence, since the sigma algebra generated by (cid:0) ( V ( k ) i, ) i =1 ,...,j , ( e V ( k ) i, ) i =1 ,...,j , ( V ( k ) i ) i = j +2 ,..., k (cid:1) isindependent of that generated by (cid:0) V ( k ) j +1 , , V ( k ) j +1 , , e V ( k ) j +1 , (cid:1) , we get E Tr (cid:16) e t (cid:0) P ji =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +1 S ( k ) i (cid:1) e V ( k ) j +1 , = V ( k ) j +1 , (cid:17) ≤ Tr (cid:16) E e t (cid:0) P ji =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +2 S ( k ) i (cid:1) · E (cid:0) e t S ( k ) j +1 e V ( k ) j +1 , = V ( k ) j +1 , (cid:1)(cid:17) . (34)By Weyl’s inequality, λ max (cid:0) t S ( k ) j +1 (cid:1) ≤ t X r ∈ K k,j +1 λ max ( X ( k ) j +1 ( r )) a.s.Using that V ( k ) j +1 = D ( X i , i ∈ K k,j +1 ) and that λ max ( X i ) ≤ M a.s. for any i , it follows that λ max (cid:0) t S ( k ) j +1 (cid:1) ≤ tM Card( K k,j +1 ) = tM ℓ − k n k a.s.In addition, we notice that S ( k ) j +1 , + e S ( k ) j +1 , is independent of P ji =1 ( S ( k ) i, + e S ( k ) i, ) + P k i = j +2 S ( k ) i and, since e V ( k ) j +1 , = D V ( k ) j +1 , and V ( k ) j +1 = D ( X i , i ∈ K k,j +1 ), E ( S ( k ) j +1 , + e S ( k ) j +1 , ) = . Therefore,starting from (34) and taking into account (32), an application of inequality (25) with b = tM ℓ − k n ℓ , Γ = { e V ( k ) j +1 , = V ( k ) j +1 , } and V = t (cid:0) P ji =1 ( S ( k ) i, + e S ( k ) i, ) + P k i = j +2 S ( k ) i (cid:1) , followed by anapplication of inequality (27) with W = t ( S ( k ) j +1 , + e S ( k ) j +1 , ), gives E Tr (cid:16) e t (cid:0) P ji =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +1 S ( k ) i (cid:1) e V ( k ) j +1 , = V ( k ) j +1 , (cid:17) ≤ β d k +1 e tn ℓ ℓ − k M × E Tr (cid:16) e t (cid:0) P j +1 i =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +2 S ( k ) i (cid:1)(cid:17) . (35)Therefore, starting from (33) and using (35), we get E Tr (cid:16) e t (cid:0) P ji =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +1 S ( k ) i (cid:1)(cid:17) ≤ (cid:0) β d k +1 e tn ℓ ℓ − k M (cid:1) × E Tr (cid:16) e t (cid:0) P j +1 i =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +2 S ( k ) i (cid:1)(cid:17) , which combined with (30) implies that E Tr (cid:16) e t P kj =1 S ( k ) j (cid:17) ≤ (cid:0) β d k +1 e tn ℓ ℓ − k M (cid:1) j +1 × E Tr (cid:16) e t (cid:0) P j +1 i =1 ( S ( k ) i, + e S ( k ) i, )+ P ki = j +2 S ( k ) i (cid:1)(cid:17) , proving the induction hypothesis for the step j + 1. Finally 2 k steps of the procedure lead to E Tr (cid:16) e t P kj =1 S ( k ) j (cid:17) ≤ (cid:0) β d k +1 e tn ℓ ℓ − k M (cid:1) k × E Tr (cid:16) e t P ki =1 ( S ( k ) i, + e S ( k ) i, ) (cid:17) . (36)To end the proof of the lemma it suﬃces to notice the following facts: the random vectors V ( k ) i, , e V ( k ) i, , i = 1 , . . . , k , are mutually independent and such that V ( k ) i, = D V ( k +1)2 i − and V ( k ) i, = D ( k +1)2 i . In addition, the random vectors V ( k +1) i , i = 1 , . . . , k +1 , are mutually independent.This obviously implies that E Tr (cid:16) e t P ki =1 ( S ( k ) i, + e S ( k ) i, ) (cid:17) = E Tr (cid:16) e t P k +1 i =1 S ( k +1) i (cid:17) , which ends the proof of the lemma. (cid:3) We shall prove Inequality (15) with K A deﬁned in Section 4.2.1.Let us prove it ﬁrst in the case where 0 < tM ≤ /A . Since by Weyl’s inequality, λ max (cid:0) X i ∈ K A X i (cid:1) ≤ X i ∈ K A λ max (cid:0) X i (cid:1) ≤ AM , and E ( X i ) = for any i ∈ K A , it follows by using Lemma 4 applied with K = { } and U = P i ∈ K A X i that, for any t > E Tr (cid:0) e t P i ∈ KA X i (cid:1) ≤ d exp (cid:16) t g ( tAM ) λ max (cid:16) E (cid:0) X i ∈ K A X i (cid:1) (cid:17)(cid:17) . Therefore by the deﬁnition of v , since g is increasing, tAM < g (4) ≤ .

1, we get E Tr (cid:0) e t P i ∈ KA X i (cid:1) ≤ d exp(3 . × At v ) , proving then (15).We prove now Inequality (15) in the case where 4 /A < tM ≤ min (cid:0) , c log 232 log A (cid:1) . Let κ = c k ( t ) = inf n k ∈ Z : A ((1 − δ ) / k ≤ min (cid:0) κ ( tM ) , A (cid:1)o . (37)Note that if t M ≤ κ/A then k ( t ) = 0 whereas k ( t ) ≥ t M > κ/A . In addition by theselection of ℓ A , A ((1 − δ ) / ℓ < /δ . Therefore k ( t ) ≤ ℓ A since ( tM ) ≤ cδ/

32. Then, startingfrom (23), considering the selection of k ( t ) and using Lemma 8, we get by induction that E Tr exp (cid:0) t X i ∈ K A X i (cid:1) ≤ k ( t ) − Y k =0 (cid:16) β d k +1 e tMn ℓ ℓ − k (cid:17) k E Tr exp (cid:16) t k ( t ) X j =1 S ( k ( t )) j (cid:17) , (38)with the usual convention that Q − k =0 a k = 1. Note that in the inequality above, ( S ( k ( t )) j ) j =1 ,..., k ( t ) is a family of mutually independent random matrices deﬁned in (22). They are then constructedfrom a family ( V ( k ( t )) j ) ≤ j ≤ k ( t ) of 2 k ( t ) mutually independent random vectors that satisfy (21).Therefore we have that, for any j ∈ , . . . , k ( t ) , S ( k ( t )) j = D P i ∈ K k ( t ) ,j X i . Moreover, accordingto the remark on the existence of the family ( V ( k ( t )) j ) ≤ j ≤ k ( t ) made at the beginning of Section4.2.2, the entries of each random matrix S ( k ( t )) j are measurable functions of ( X i , δ i ) i ∈ Z .The rest of the proof consists of giving a suitable upper bound for E Tr exp (cid:16) t P k ( t ) j =1 S ( k ( t )) j (cid:17) .With this aim, let p be a positive integer to be chosen later such that2 p ≤ Card( K k ( t ) ,j ) := q . (39)Note that q = 2 ℓ − k ( t ) n ℓ and by (19) q ≥ A k ( t )+1 . k ( t ) = 0 then q ≥ A/ q ≥ /A < tM ≤ k ( t ) ≥ t M > κ/A , by the deﬁnition of k ( t ), we have q ≥ κ ( tM ) and then q ≥ tM ) ≤ κ/

2. Hence in all cases, q ≥ p satisfying (39) is always possible.Let m q,p = [ q/ (2 p )]. For any j ∈ { , . . . , k ( t ) } , we divide K k ( t ) ,j into 2 m q,p consecutiveintervals ( J ( k ( t )) j,i , ≤ i ≤ m q,p ) each containing p consecutive integers plus a remainder interval J ( k ( t )) j, m q,p +1 containing r = q − pm q,p consecutive integers. Note that this last interval containsat most 2 p − X ( k ( t )) j ( k ) be the d × d random matrix associated with the randomvector X ( k ( t )) j ( k ) deﬁned in (22) and deﬁne Z ( k ( t )) j,i = X k ∈ K k ( t ) ,j ∩ J ( k ( t )) j,i X ( k ( t )) j ( k ) . (40)With this notation S ( k ( t )) j = m q,p +1 X i =1 Z ( k ( t )) j, i − + m q,p X i =1 Z ( k ( t )) j, i . Since Tr ◦ exp is a convex function, we get E Tr exp (cid:16) t k ( t ) X j =1 S ( k ( t )) j (cid:17) ≤ E Tr exp (cid:16) t k ( t ) X j =1 m q,p +1 X i =1 Z ( k ( t )) j, i − (cid:17) + 12 E Tr exp (cid:16) t k ( t ) X j =1 m q,p X i =1 Z ( k ( t )) j, i (cid:17) . (41)We start by giving an upper bound for E Tr exp (cid:16) t P k ( t ) j =1 P m q,p i =1 Z ( k ( t )) j, i (cid:17) . With this aim, let usdeﬁne the following vectors U ( k ( t )) j,i = (cid:0) X ( k ( t )) j ( k ) , k ∈ K k ( t ) ,j ∩ J ( k ( t )) j,i (cid:1) and W ( k ( t )) j = (cid:0) U ( k ( t )) j,i , i ∈ { , . . . , m q,p + 1 } (cid:1) . (42)Proceeding by induction and using the coupling lemma 6, one can construct random vectors U ∗ ( k ( t )) j, i , j = 1 , . . . , k ( t ) , i = 1 , . . . , m q,p , that satisfy the following properties:(i) ( U ∗ ( k ( t )) j, i , ( j, i ) ∈ { , . . . , k ( t ) } × { , . . . , m q,p } ) is a family of mutually independent randomvectors,(ii) U ∗ ( k ( t )) j, i has the same distribution as U ( k ( t )) j, i ,(iii) P ( U ∗ ( k ( t )) j, i = U ( k ( t )) j, i ) ≤ β p +1 . Let us explain the construction. Recall ﬁrst that (Ω , A , P ) is assumed to be rich enough to containa sequence ( η i ) i ∈ Z of iid random variables with uniform distribution over [0 ,

1] independent of( X i , δ i ) i ∈ Z (the sequence ( δ i ) i ∈ Z has been used to construct the independent random matrices S ( k ( t )) j , j = 1 , . . . , k ( t ), involved in inequality (38)). For any j ∈ { , . . . , k ( t ) } , let U ∗ ( k ( t )) j, = U ( k ( t )) j, , and construct the random vectors U ∗ ( k ( t )) j, i , i = 2 , . . . , m q,p , recursively from ( U ∗ ( k ( t )) j, ℓ , ≤ ℓ ≤ i −

1) as follows. According to Lemma 6, there exists a random vector U ∗ ( k ( t )) j, i such that U ∗ ( k ( t )) j, i = f i,j (cid:0) ( U ∗ ( k ( t )) j, ℓ ) ≤ ℓ ≤ i − , U ( k ( t )) j, i , η i +( j − k ( t ) (cid:1) (43)where f i,j is a measurable function, U ∗ ( k ( t )) j, i has the same law as U ( k ( t )) j, i , is independent of σ (cid:0) U ∗ ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) and P ( U ∗ ( k ( t )) j, i = U ( k ( t )) j, i ) = β (cid:0) σ (cid:0) U ∗ ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) , σ ( U ( k ( t )) j, i ) (cid:1) ≤ β p +1 .

15y construction, for any ﬁxed j ∈ { , . . . , k ( t ) } , the random vectors U ∗ ( k ( t )) j, i , i = 1 , . . . , m q,p ,are mutually independent. In addition, by (43) and the fact that ( W ( k ( t )) j , j = 1 , . . . , k ( t ) ) is afamily of mutually independent random vectors, we note that ( U ∗ ( k ( t )) j, i , ( i, j ) ∈ { , . . . , m q,p } ×{ , . . . , k ( t ) } ) is also so. Therefore the constructed random vectors U ∗ ( k ( t )) j, i i = 1 , . . . , m q,p , j = 1 , . . . , k ( t ) , satisfy Items (i) and (ii) above. Moreover, by (43), we have σ (cid:0) U ∗ ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) ⊆ σ (cid:0) U ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) ∨ σ (cid:0) η ℓ +( j − k ( t ) , ≤ ℓ ≤ i − (cid:1) . Since ( η i ) i ∈ Z is independent of ( X i , δ i ) i ∈ Z , we have β (cid:0) σ (cid:0) U ∗ ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) , σ ( U ( k ( t )) j, i ) (cid:1) ≤ β (cid:0) σ (cid:0) U ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) , σ ( U ( k ( t )) j, i ) (cid:1) . By relation (14), the quantity β (cid:0) σ (cid:0) U ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) , σ ( U ( k ( t )) j, i ) (cid:1) depends only on the jointdistribution of (cid:0) ( U ( k ( t )) j, ℓ ) ≤ ℓ ≤ i − , U ( k ( t )) j, i (cid:1) . By the deﬁnition (42) of the U ( k ( t )) j,ℓ ’s, the deﬁnition(22) of the X ( k ( t )) j ( k )’s and (21), we infer that β (cid:0) σ (cid:0) U ( k ( t )) j, ℓ , ≤ ℓ ≤ i − (cid:1) , σ ( U ( k ( t )) j, i ) (cid:1) = β (cid:0) σ (cid:0) X k , k ∈ ∪ i − ℓ =1 K k ( t ) ,j ∩ J ( k ( t )) j, ℓ (cid:1) , σ (cid:0) X k , k ∈ K k ( t ) ,j ∩ J ( k ( t )) j, i (cid:1)(cid:1) ≤ β p +1 . So, overall, the constructed random vectors U ∗ ( k ( t )) j, i i = 1 , . . . , m q,p , j = 1 , . . . , k ( t ) , satisfy alsoItem (iii) above.Denote now X ∗ ( k ( t )) j, i ( ℓ ) = π ℓ ( U ∗ ( k ( t )) j, i )where π ( m ) i is the ℓ -th canonical projection from K pd onto K d , namely: for any vector x =( x i , i ∈ { , . . . , p } ) of K pd , π ℓ ( x ) = x ℓ . Let X ∗ ( k ( t )) j, i ( ℓ ) be the d × d random matrix associatedwith X ∗ ( k ( t )) j, i ( ℓ ) and deﬁne, for any i = 1 . . . , m q,p , Z ∗ ( k ( t )) j, i = X ℓ ∈ K k ( t ) ,j ∩ J ( k ( t )) j, i X ∗ ( k ( t )) j, i ( ℓ ) . Observe that by Item (ii) above, Z ∗ ( k ( t )) j, i = D Z ( k ( t )) j, i (where we recall that Z ( k ( t )) j, i is deﬁned by(40)) and by Item (i), the random matrices Z ∗ ( k ( t )) j, i , i = 1 , . . . , m q,p , j = 1 , . . . , k ( t ) , are mutuallyindependent. The aim now is to prove that the following inequality is valid: E Tr exp (cid:16) t k ( t ) X j =1 m q,p X i =1 Z ( k ( t )) j, i (cid:17) ≤ (cid:16) m q,p − qtM β p +1 (cid:17) k ( t ) E Tr exp (cid:16) t k ( t ) X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i (cid:17) . (44)Obviously, this can be done by induction if we can show that, for any ℓ in { , . . . , k ( t ) } , E Tr exp (cid:16) t ℓ − X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i + 2 t k ( t ) X j = ℓ m q,p X i =1 Z ( k ( t )) j, i (cid:17) ≤ (cid:16) m q,p − qtM β p +1 (cid:17) E Tr exp (cid:16) t ℓ X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i + 2 t k ( t ) X j = ℓ +1 m q,p X i =1 Z ( k ( t )) j, i (cid:17) . (45)16o prove the inequality above, we set C ℓ − ,ℓ ( t ) = 2 t ℓ − X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i + 2 t k ( t ) X j = ℓ m q,p X i =1 Z ( k ( t )) j, i and we write E Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1) = E (cid:16) m q,p Y i =2 U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1)(cid:17) + E (cid:16) ∃ i ∈{ ,...,m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1)(cid:17) ≤ E Tr exp (cid:0) C ℓ,ℓ +1 ( t ) (cid:1) + E (cid:16) ∃ i ∈{ ,...,m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1)(cid:17) . (46)Note that the sigma algebra generated by the random vectors ( U ∗ ( k ) j, i ) i ∈{ ,...,m q,p } ,j ∈{ ,...,ℓ − } and( U ( k ) j, i ) i ∈{ ,...,m q,p } , j ∈{ ℓ +1 ,..., k ( t ) } is independent of σ (cid:0) ( U ( k ) ℓ, i , U ∗ ( k ) ℓ, i ) i ∈{ ,...,m q,p } (cid:1) . This fact togetherwith the Golden-Thomson inequality give E (cid:16) ∃ i ∈{ ,...,m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1)(cid:17) ≤ Tr (cid:16) E (cid:16) exp (cid:0) t ℓ − X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i + 2 t k ( t ) X j = ℓ +1 m q,p X i =1 Z ( k ( t )) j, i (cid:1)(cid:17) × E (cid:16) ∃ i ∈{ ,...,m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i exp (cid:0) t m q,p X i =1 Z ( k ( t )) ℓ, i (cid:1)(cid:17)(cid:17) . By Weyl’s inequality and (21), we infer that, almost surely, λ max (cid:0) t m q,p X i =1 Z ( k ( t )) ℓ, i (cid:1) ≤ t m q,p X i =1 X k ∈ K k ( t ) ,ℓ ∩ J ( k ( t )) ℓ, i λ max (cid:0) X k (cid:1) ≤ tm q,p pM ≤ tqM . (47)Therefore, applying (25) with b = tqM , Γ = {∃ i ∈ { , . . . , m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i } and V = 2 t P ℓ − j =1 P m q,p i =1 Z ∗ ( k ( t )) j, i + 2 t P k ( t ) j = ℓ +1 P m q,p i =1 Z ( k ( t )) j, i and taking into account that P (Γ) ≤ m q,p X i =2 P ( U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i ) ≤ ( m q,p − β p +1 , we get E (cid:16) ∃ i ∈{ ,...,m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1)(cid:17) ≤ ( m q,p − β p +1 e qtM E Tr exp (cid:16) t ℓ − X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i + 2 t k ( t ) X j = ℓ +1 m q,p X i =1 Z ( k ( t )) j, i (cid:17) . Using that the sigma algebra generated by the random vectors ( U ∗ ( k ) j, i ) i ∈{ ,...,m q,p } , j ∈{ ,...,ℓ − } and ( U ( k ) j, i ) i ∈{ ,...,m q,p } ,j ∈{ ℓ +1 ,..., k ( t ) } is independent of σ (cid:0) ( U ∗ ( k ) ℓ, i ) i ∈{ ,...,m q,p } (cid:1) , and noticing thatby construction, E ( Z ∗ ( k ( t )) ℓ, i ) = E ( Z ( k ( t )) ℓ, i ) = , an application of inequality (27) then gives E (cid:16) ∃ i ∈{ ,...,m q,p } : U ( k ( t )) ℓ, i = U ∗ ( k ( t )) ℓ, i Tr exp (cid:0) C ℓ − ,ℓ ( t ) (cid:1)(cid:17) ≤ β p +1 ( m q,p − qtM E Tr exp (cid:0) C ℓ,ℓ +1 ( t ) (cid:1) . (48)17tarting from (46) and taking into account (48), inequality (45) follows and so does inequality(44).With the same arguments as above and with obvious notations, we infer that E Tr exp (cid:16) t k ( t ) X j =1 m q,p +1 X i =1 Z ( k ( t )) j, i − (cid:17) ≤ (cid:16) m q,p e qtM β p +1 (cid:17) k ( t ) E Tr exp (cid:16) t k ( t ) X j =1 m q,p +1 X i =1 Z ∗ ( k ( t )) j, i − (cid:17) . (49)Note that to get the above inequality, we used instead of (47) that, almost surely, λ max (cid:0) t m q,p +1 X i =1 Z ( k ( t )) ℓ, i − (cid:1) ≤ t m q,p +1 X i =1 X k ∈ K k ( t ) ,ℓ ∩ J ( k ( t )) ℓ, i − λ max (cid:0) X k (cid:1) ≤ M t ( m q,p p + q − pm q,p ) = 2 M t ( q − pm q,p ) ≤ M t ( q + 2 p ) ≤ tqM . Starting from (38) and taking into account (41), (44) and (49), we then derive E Tr exp (cid:16) t X i ∈ K A X i (cid:17) ≤ (cid:16) m q,p e qtM β p +1 (cid:17) k ( t ) k ( t ) − Y k =0 (cid:16) β d k +1 e tMn ℓ ℓ − k (cid:17) k × (cid:16) E Tr exp (cid:0) t k ( t ) X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i (cid:1) + 12 E Tr exp (cid:0) t k ( t ) X j =1 m q,p +1 X i =1 Z ∗ ( k ( t )) j, i − (cid:1)(cid:17) . (50)Now we choose p = h tM i ∧ h q i . Note that the random vectors ( Z ∗ ( k ( t )) j, i − ) i,j are mutually independent and centered. Moreover,2 λ max ( Z ∗ ( k ( t )) j, i − ) ≤ M p ≤ t a.s.Therefore by using Lemma 4 together with the deﬁnition of v and the fact that 2 k ( t ) ( m q,p +1) p ≤ k ( t ) q ≤ A , we get E Tr exp (cid:16) t k ( t ) X j =1 m q,p +1 X i =1 Z ∗ ( k ( t )) j, i − (cid:17) ≤ d exp(4 × . × At v ) . (51)Similarly, we obtain that E Tr exp (cid:16) t k ( t ) X j =1 m q,p X i =1 Z ∗ ( k ( t )) j, i (cid:17) ≤ d exp(4 × . × At v ) . (52)Next, by using Condition (5) and (19), we getlog (cid:16) m q,p e tqM β p +1 (cid:17) k ( t ) ≤ k ( t ) m q,p e tqM e − cp ≤ A p e tqM e − cp . (53)18everal situations can occur. Either ( tM ) ≤ κ/A and in this case k ( t ) = 0 implying that A/ ≤ q ≤ A ≤ κ/ ( tM ) . If in addition q ≥ / ( tM ) then p = [2 / ( tM )] ≥ /tM (since tM ≤ A p e tqM e − cp ≤ AtM κ/ ( tM ) e − c/ ( tM ) ≤ AtM − c/ (4 tM ) ≤ ( tM ) c e − c/ (16 tM ) , where we have used that log A ≤ c tM , A ≥

2, and e − c/ (8 tM ) ≤ tM c for the last inequality. Ifotherwise q < / ( tM ) then p = [ q/ ≥ q/

4. Hence, since 2 tM ≤ c/

16 (since log A ≥ log 2) and tM > /A , A p e tqM e − cp ≤ Aq e − cq/ ≤ − cA/ ≤ AtM e − c/ (8 tM ) ≤ ( tM ) c e − c/ (32 tM ) , where we have used that A/ ≤ q for the second inequality, and that log A ≤ c tM , A ≥ − c/ (16 tM ) ≤ tM c for the last one.Either ( tM ) > κ/A and in this case k ( t ) ≥ k ( t ),we have q ≥ A k ( t )+1 ≥ κ tM ) . (54)If in addition q ≥ / ( tM ) then p = [2 / ( tM )] ≥ /tM , and by (18) and the deﬁnition of k ( t ), q ≤ A (1 − δ ) ℓ k ( t ) ≤ κ ( tM ) . It follows that A p e tqM e − cp ≤ AtM κ/ ( tM ) e − c/ ( tM ) ≤ AtM − c/ (2 tM ) ≤ ( tM ) c e − c/ (8 tM ) , where we have used that log A ≤ c tM , A ≥ − c/ (4 tM ) ≤ tMc for the last inequality. Nowif q < / ( tM ) then p = [ q/ ≥ q/

4. Hence, using again the fact that 2 tM ≤ c/

16 combinedwith (54), we get A p e tqM e − cp ≤ A ( tM ) κ e − cq/ ≤ A ( tM ) c e − c × × tM )2 ≤ tM ) c e − c/ (32 tM ) , where we have used that log A ≤ c (32 tM ) and A ≥ (cid:16) m q,p e tqM β p +1 (cid:17) k ( t ) ≤ tM ) c e − c/ (32 tM ) . (55)We handle now the term Q k ( t ) − k =0 (cid:16) β d k +1 e tMn ℓ ℓ − k (cid:17) k only in the case where ( κ/A ) / < tM ,otherwise this term is equal to one. By taking into account (5), (17), (18) and the fact that tM ≤ cδ/

8, we havelog k ( t ) − Y k =0 (cid:16) β d k +1 e tMn ℓ ℓ − k (cid:17) k ≤ k ( t ) − X k =0 k exp (cid:16) − c Aδ (1 − δ ) k k +1 + 2 tM A (1 − δ ) ℓ k (cid:17) ≤ k ( t ) − X k =0 k exp (cid:16) − c Aδ (1 − δ ) k k +2 (cid:17) ≤ k ( t ) exp (cid:16) − Acδ (1 − δ ) k ( t ) − k ( t )+1 (cid:17) .

19y the deﬁnition of k ( t ), we have A (1 − δ ) k ( t ) − k ( t ) − > κ ( tM ) . Therefore 2 k ( t ) ≤ A ( tM ) κ . Moreover Acδ (1 − δ ) k ( t ) − k ( t )+1 > cκδ tM ) > κtM , since tM ≤ cδ/

8. It follows thatlog k ( t ) − Y k =0 (cid:16) β d k +1 e tMn ℓ ℓ − k (cid:17) k ≤ A ( tM ) κ exp (cid:0) − κ/ ( tM ) (cid:1) ≤ ( tM ) c e − c/ (32 tM ) , (56)where we have used the fact that log A ≤ c tM . So, overall, starting from (50) and consideringthe upper bounds (51), (52), (55) and (56), we getlog E Tr exp (cid:16) t X i ∈ K A X i (cid:17) ≤ log d + 4 × . At v + 9( tM ) c e − c/ (32 tM ) . Therefore Inequality (15) also holds in the case where 4 /A < tM ≤ min (cid:0) , c log 232 log A (cid:1) . This endsthe proof of the proposition. (cid:3) Let A = A = n and Y (0) ( k ) = X k for any k = 1 , . . . , A . Let K A be the discrete Cantor typeset as deﬁned from { , . . . , A } in Section 4.2.1. Let A = A − Card( K A ) and deﬁne for any k = 1 , . . . , A , Y (1) ( k ) = X i k where { i , . . . , i A } = { , . . . , A } \ K A . Now for i ≥

1, let K A i be deﬁned from { , . . . , A i } exactly as K A is deﬁned from { , . . . , A } .Set A i +1 = A i − Card( K A i ) and { j , . . . , j A i +1 } = { , . . . , A i } \ K A i . Deﬁne now Y ( i +1) ( k ) = Y ( i ) ( j k ) for k = 1 , . . . , A i +1 , and set L = L n = inf { j ∈ N ∗ , A j ≤ } . Note that, for any i ∈ { , . . . , L − } , A i > K A i ) > A i /

2. Moreover A i ≤ n − i .The following decomposition clearly holds n X k =1 X k = L − X i =0 X k ∈ K Ai Y ( i ) ( k ) + A L X k =1 Y ( L ) ( k ) . (57)Let U i = X k ∈ K Ai Y ( i ) ( k ) for 0 ≤ i ≤ L − U L = A L X k =1 Y ( L ) ( k ) , For any positive x , let h ( c, x ) = min (cid:16) , c log 232 log x (cid:17) . For any i ∈ { , . . . , L − } , noticing that the self-adjoint random matrices ( Y ( i ) ( k )) k satisfy thecondition (5) with the same constant c , we can apply Proposition 7 and get that for any positive t satisfying tM < h ( c, n/ i ),log E Tr (cid:0) exp( t U i ) (cid:1) ≤ log d + 4 t n − i (2 v + √ × i/ M/ ( n / √ c )) − tM/h ( c, n − i ) . (58)20n the other hand, by Weyl’s inequality, λ max (cid:0) U L (cid:1) ≤ M A L ≤ M .

Therefore by using Lemma 4, for any positive t , E Tr (cid:0) exp( t U L ) (cid:1) ≤ d exp (cid:16) t g (2 tM ) λ max (cid:0) E (cid:0) U L (cid:17)(cid:1) . Hence by the deﬁnition of v , for any positive t such that tM <

1, we getlog E Tr (cid:0) exp( t U L ) (cid:1) ≤ log d + 2 t v ≤ log d + 2 t v − tM . (59)Let κ i = Mh ( c, n/ i ) for 0 ≤ i ≤ L − κ L = M and σ i = 2 √ n i/ (cid:16) v + √ × i Mn √ c (cid:17) for 0 ≤ i ≤ L − σ L = v √ . Since L ≤ h log n − log 2log 2 i + 1 , we get L X i =0 κ i ≤ M (cid:16) L − X i =0 h ( c, n/ i ) + 1 (cid:17) ≤ M log n log 2 max (cid:16) ,

32 log nc log 2 (cid:17) = M γ ( c, n ) . Moreover L X i =0 σ i = 2 √ n L − X i =0 − i/ (cid:16) v + √ × i/ Mn / √ c (cid:17) + v √ ≤ √ nv + 2 c − / n − M L + v √ ≤ √ nv + 2 c − / M .

Taking into account (58) and (59), we get overall by Lemma 5, that for any positive t such that tM < /γ ( c, n ),log E Tr (cid:16) exp (cid:0) t n X i =1 X i (cid:1)(cid:17) ≤ log d + t n (cid:0) v + 2 M/ ( cn ) / (cid:1) − tM γ ( c, n ) := γ n ( t ) . (60)To end the proof of the theorem, it suﬃces to notice that for any positive x P (cid:16) λ max (cid:0) n X i =1 X i (cid:1) > x (cid:17) ≤ inf t> tM ≤ /γ ( c,n ) exp (cid:0) − tx + γ n ( t ) (cid:1) , where γ n ( t ) is deﬁned in (60). (cid:3) Acknowledgements.

This work was initiated when the ﬁrst named author was visiting thethird named author at the University of Alberta. She would like to thank Nicole Tomczak-Jaegermann and Alexander Litvak for their hospitality and the University of Alberta for theexcellent working conditions. 21 eferences [1] Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes withapplications to Markov chains.

Electron. Journal Probab. , 1000-1034.[2] Ahlswede, R. and Winter, A. (2002). Strong converse for identiﬁcation via quantum chan-nels. IEEE Trans. Inform. Theory , , no. 3, 569-579.[3] Berbee H.C.P. (1979). Random walks with stationary increments and renewal theory. Math-ematical Centre Tracts , , Mathematisch Centrum, Amsterdam, 223 pp.[4] Chatterjee, S. (2006). Concentration inequalities with exchangeable pairs. Ph.D. thesis, Stanford Univ., Palo Alto. .[5] Chatterjee, S. (2007). Steins method for concentration inequalities.

Probability theory andrelated ﬁelds , , no. 1, 305-321.[6] Doob, J. L. (1953) Stochastic processes . Wiley, New York.[7] Ibragimov, I. A. (1962). Some limit theorems for stationary processes.

Teor. Verojatnost. iPrimenen . , 361-392.[8] Kolmogorov, A.N., Rozanov, Y.A. (1960). On the strong mixing conditions for stationarygaussian sequences. Theor. Probab. Appl. , , 204-207.[9] Lieb, E. H. (1973). Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv.Math. , 267-288.[10] Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., Tropp, J. A. (2014). Matrix concentra-tion inequalities via the method of exchangeable pairs. Ann. Probab. , , no. 3, 906-945.[11] Merlev`ede F., Peligrad M. and Rio E. (2009). Bernstein inequality and moderate devia-tions under strong mixing conditions. High dimensional probability V: the Luminy volume,273292, Inst. Math. Stat. Collect., 5, Inst. Math. Statist., Beachwood, OH [12] Merlev`ede F., Peligrad M. and Rio E. (2011). A Bernstein type inequality and moderatedeviations for weakly dependent sequences.

Probab. Theory Related Fields , no. 3-4,435-474.[13] Mokkadem, A. (1990). Propri´et´e de m´elange des processus autor´egressifs polynomiaux.

Ann.Instit Henri Poincar´e. , 133-141.[14] Paulin, D., Mackey, L. and Tropp, J. A. (2014). Efron-Stein Inequalities for Random Ma-trices. arXiv:1408.3470 .[15] Skorohod, A. V. (1976). On a representation of random variables. Theory Probab. Appl. , , 628-632.[16] Tao, T. (2012). Topics in random matrix theory . Graduate Studies in Mathematics, 132.American Mathematical Society, Providence, RI, 282 pp.[17] Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices.

Found. Comput.Math. ,12