Asymptotically Efficient Estimation of Smooth Functionals of Covariance Operators
Vladimir Koltchinskii
Abstract
Let $X$ be a centered Gaussian random variable in a separable Hilbert space $H$ with covariance operator $\Sigma$. We study the problem of estimation of a smooth functional of $\Sigma$ based on a sample $X_1,\dots,X_n$ of $n$ independent observations of $X$. More specifically, we are interested in functionals of the form $\langle f(\Sigma), B\rangle$, where $f:\mathbb R\to\mathbb R$ is a smooth function and $B$ is a nuclear operator in $H$. We prove concentration and normal approximation bounds for the plug-in estimator $\langle f(\hat\Sigma), B\rangle$, $\hat\Sigma := n^{-1}\sum_{j=1}^n X_j\otimes X_j$ being the sample covariance based on $X_1,\dots,X_n$. These bounds show that $\langle f(\hat\Sigma), B\rangle$ is an asymptotically normal estimator of its expectation $\mathbb E_\Sigma\langle f(\hat\Sigma), B\rangle$ (rather than of the parameter of interest $\langle f(\Sigma), B\rangle$) with a parametric convergence rate $O(n^{-1/2})$, provided that the effective rank $r(\Sigma) := \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|}$ ($\mathrm{tr}(\Sigma)$ being the trace and $\|\Sigma\|$ being the operator norm of $\Sigma$) satisfies the assumption $r(\Sigma) = o(n)$. At the same time, we show that the bias of this estimator is typically as large as $\frac{r(\Sigma)}{n}$ (which is larger than $n^{-1/2}$ if $r(\Sigma)\ge n^{1/2}$). In the case when $H$ is a finite-dimensional space of dimension $d = o(n)$, we develop a method of bias reduction and construct an estimator $\langle h(\hat\Sigma), B\rangle$ of $\langle f(\Sigma), B\rangle$ that is asymptotically normal with convergence rate $O(n^{-1/2})$. Moreover, we study asymptotic properties of the risk of this estimator and prove asymptotic minimax lower bounds for arbitrary estimators, showing the asymptotic efficiency of $\langle h(\hat\Sigma), B\rangle$ in a semi-parametric sense.

Keywords.
Asymptotic efficiency, sample covariance, bootstrap, effective rank, concentration inequalities, normal approximation, perturbation theory
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA; e-mail: [email protected]

Let $X$ be a random variable in a separable Hilbert space $H$ sampled from a Gaussian distribution with mean $0$ and covariance operator $\Sigma := \mathbb E(X\otimes X)$ (denoted in what follows $N(0;\Sigma)$).
Mathematics Subject Classification (2010):
Primary 62H12; Secondary 62G20, 62H25, 60B20

The purpose of this paper is to study the problem of estimation of smooth functionals of an unknown covariance $\Sigma$ based on a sample $X_1,\dots,X_n$ of i.i.d. observations of $X$. Specifically, we deal with functionals of the form $\langle f(\Sigma), B\rangle$, where $f:\mathbb R\to\mathbb R$ is a smooth function and $B$ is a nuclear operator. The estimation of bilinear forms of spectral projection operators of the covariance $\Sigma$, which is of importance in principal component analysis, can easily be reduced to this basic problem. Moreover, the estimation of $\langle f(\Sigma), B\rangle$ is a major building block in the development of methods of statistical estimation of much more general functionals of the covariance, such as functionals of the form $\langle f_1(\Sigma), B_1\rangle\cdots\langle f_k(\Sigma), B_k\rangle$ and their linear combinations.

Throughout the paper, we use the following notations. Given $A, B\ge 0$, $A\lesssim B$ means that $A\le CB$ for a numerical (most often, unspecified) constant $C>0$; $A\gtrsim B$ is equivalent to $B\lesssim A$; and $A\asymp B$ is equivalent to $A\lesssim B$ and $B\lesssim A$. Sometimes, the constants in the above relationships depend on some parameter(s). In such cases, the signs $\lesssim$, $\gtrsim$ and $\asymp$ are provided with subscripts: say, $A\lesssim_\gamma B$ means that $A\le C_\gamma B$ for a constant $C_\gamma>0$ that depends on $\gamma$.

Let $\mathcal B(H)$ denote the space of all bounded linear operators in a separable Hilbert space $H$, equipped with the operator norm, and let $\mathcal B_{sa}(H)$ denote the subspace of all self-adjoint operators. In what follows, $A^\ast$ denotes the adjoint of an operator $A\in\mathcal B(H)$, $\mathrm{tr}(A)$ denotes its trace (provided that $A$ is trace class) and $\|A\|$ denotes its operator norm. We use the notation $\|A\|_p$ for the Schatten $p$-norm of $A$: $\|A\|_p^p := \mathrm{tr}(|A|^p)$, $|A| = (A^\ast A)^{1/2}$, $p\in[1,\infty]$. In particular, $\|A\|_1$ is the nuclear norm of $A$, $\|A\|_2$ is its Hilbert–Schmidt norm and $\|A\|_\infty = \|A\|$ is its operator norm. We denote the space of self-adjoint operators $A$ with $\|A\|_p<\infty$ ($p$-th Schatten class operators) by $S_p = S_p(H)$, $1\le p\le\infty$.
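The Schatten norms just introduced are straightforward to compute from singular values. The following minimal sketch is our illustration (not part of the paper; the function name `schatten_norm` is ours), and makes the scale $\|A\|_\infty\le\|A\|_2\le\|A\|_1$ concrete:

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm ||A||_p = (tr |A|^p)^(1/p); p = inf recovers the operator norm."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values = eigenvalues of |A|
    if np.isinf(p):
        return float(s.max())
    return float((s ** p).sum() ** (1.0 / p))

A = np.diag([3.0, 4.0])
print(schatten_norm(A, 1))        # nuclear norm ||A||_1 = 3 + 4 = 7
print(schatten_norm(A, 2))        # Hilbert-Schmidt norm ||A||_2 = sqrt(9 + 16) = 5
print(schatten_norm(A, np.inf))   # operator norm ||A|| = 4
```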
The space of compact self-adjoint operators in $H$ is denoted by $\mathcal C_{sa}(H)$. The inner product notation $\langle\cdot,\cdot\rangle$ is used both for inner products in the underlying Hilbert space $H$ and for the Hilbert–Schmidt inner product between operators. Moreover, it is also used to denote bounded linear functionals on spaces of operators (for instance, $\langle A, B\rangle$, where $A$ is a bounded operator and $B$ is a nuclear operator, is the value of such a linear functional on the space of bounded operators). For $u, v\in H$, $u\otimes v$ denotes the tensor product of the vectors $u$ and $v$: $(u\otimes v)x := u\langle v, x\rangle$, $x\in H$. The operator $u\otimes v$ is of rank $1$, and finite linear combinations of rank one operators are operators of finite rank. The rank of $A$ is denoted by $\mathrm{rank}(A)$. Finally, $\mathcal C_+(H)$ denotes the cone of self-adjoint positively semi-definite nuclear operators in $H$ (the covariance operators).

More precisely, the “smoothness” in this paper means that the function belongs to the Besov space $B^s_{\infty,1}(\mathbb R)$ for a proper value of $s>0$; see Subsection 1.2. The main results of the paper are proved in the case when $H$ is a real Hilbert space. However, on a couple of occasions, especially in auxiliary statements, its complexification $H_{\mathbb C} = \{u+iv:\ u,v\in H\}$, with the standard extension of the inner product and complexification of the operators acting in $H$, is needed. With some abuse of notation, we keep in such cases the notation $H$ for the complex Hilbert space.

In what follows, we often use exponential bounds for random variables of the following form: for all $t\ge 1$, with probability at least $1-e^{-t}$, $\xi\le Ct$.
Sometimes, our derivation yields a slightly different probability bound, for instance: for all $t\ge 1$, with probability at least $1-3e^{-t}$, $\xi\le Ct$.
Such bounds can easily be rewritten again in terms of $1-e^{-t}$ by adjusting the value of the constant $C$: for $t\ge 1$, with probability at least $1-3e^{-t-\log 3} = 1-e^{-t}$, we have $\xi\le C(t+\log 3)\le C(1+\log 3)\,t$.
Such an adjustment of constants will be used in many proofs without further notice.
Let $\hat\Sigma$ denote the sample covariance based on the data $X_1,\dots,X_n$:
\[
\hat\Sigma := n^{-1}\sum_{j=1}^n X_j\otimes X_j.
\]
It is well known that $\hat\Sigma$ is a complete sufficient statistic and equals the maximum likelihood estimator in the problem of estimation of the unknown covariance in the model $X_1,\dots,X_n$ i.i.d. $\sim N(0;\Sigma)$.

In what follows, we often use the so-called effective rank of the covariance $\Sigma$ as a complexity parameter of the covariance estimation problem. It is defined as
\[
r(\Sigma) := \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|}.
\]
Note that $r(\Sigma)\le\mathrm{rank}(\Sigma)\le\dim(H)$. The following result of Koltchinskii and Lounici [KL2] shows that, in the Gaussian case, the size of the random variable $\frac{\|\hat\Sigma-\Sigma\|}{\|\Sigma\|}$ (the relative operator norm error of the estimator $\hat\Sigma$ of $\Sigma$) is completely characterized by the ratio $\frac{r(\Sigma)}{n}$.

Theorem 1.
The following bound holds:
\[
\mathbb E\|\hat\Sigma-\Sigma\| \asymp \|\Sigma\|\Bigl(\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\Bigr). \quad (1.1)
\]
Moreover, for all $t\ge 1$, with probability at least $1-e^{-t}$,
\[
\Bigl|\,\|\hat\Sigma-\Sigma\| - \mathbb E\|\hat\Sigma-\Sigma\|\,\Bigr| \lesssim \|\Sigma\|\Bigl(\Bigl(\sqrt{\frac{r(\Sigma)}{n}}\vee 1\Bigr)\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Bigr). \quad (1.2)
\]
It follows from the expectation bound (1.1) and the concentration inequality (1.2) that, for all $t\ge 1$, with probability at least $1-e^{-t}$,
\[
\|\hat\Sigma-\Sigma\| \lesssim \|\Sigma\|\Bigl(\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\vee\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Bigr) \quad (1.3)
\]
and, for all $p\ge 1$,
\[
\mathbb E^{1/p}\|\hat\Sigma-\Sigma\|^p \lesssim_p \|\Sigma\|\Bigl(\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\Bigr). \quad (1.4)
\]
To avoid the dependence of the constant on $p$, the following modification of the above bound will be used on a couple of occasions:
\[
\mathbb E^{1/p}\|\hat\Sigma-\Sigma\|^p \lesssim \|\Sigma\|\Bigl(\sqrt{\frac{r(\Sigma)}{n}}\vee\frac{r(\Sigma)}{n}\vee\sqrt{\frac{p}{n}}\vee\frac{p}{n}\Bigr). \quad (1.5)
\]
Since $r(\Sigma)\le d := \dim(H)$, the bounds in terms of the effective rank do imply the well-known bounds in terms of the dimension. For instance, for all $t\ge 1$, with probability at least $1-e^{-t}$,
\[
\|\hat\Sigma-\Sigma\| \lesssim \|\Sigma\|\Bigl(\sqrt{\frac{d}{n}}\vee\frac{d}{n}\vee\sqrt{\frac{t}{n}}\vee\frac{t}{n}\Bigr) \quad (1.6)
\]
(see, e.g., [Ver]). Of course, bound (1.6) is meaningless in the infinite-dimensional case. In the finite-dimensional case, it is sharp if $\Sigma$ is isotropic ($\Sigma = cI_d$ for a constant $c>0$), or if it is of isotropic type, that is, if the spectrum of $\Sigma$ is bounded from above and bounded away from zero (by constants). In this case, $r(\Sigma)\asymp d$, which makes (1.6) sharp. This is the case, for instance, for the popular spiked covariance models introduced by Johnstone [Jo] (see also [JoLu, Paul, BJNP]). However, in the case of a fast decay of the eigenvalues of $\Sigma$, the effective rank $r(\Sigma)$ could be significantly smaller than $d$, and it then becomes the right complexity parameter in covariance estimation.

In what follows, we are interested in problems in which $r(\Sigma)$ is allowed to be large, but $r(\Sigma) = o(n)$ as $n\to\infty$.
This is a necessary and sufficient condition for $\hat\Sigma$ to be an operator norm consistent estimator of $\Sigma$; it also means that $\hat\Sigma$ is a small perturbation of $\Sigma$ when $n$ is large, so that methods of perturbation theory can be used to analyze the behavior of $f(\hat\Sigma)$ for smooth functions $f$.

In this subsection, we state and discuss the main results of the paper concerning asymptotically efficient estimation of the functionals $\langle f(\Sigma), B\rangle$ for a smooth function $f:\mathbb R\to\mathbb R$ and a nuclear operator $B$. It turns out that the proper notion of smoothness of the function $f$ in these problems is defined in terms of Besov spaces and Besov norms. The relevant definitions (of the spaces $B^s_{\infty,1}(\mathbb R)$ and the corresponding norms), notations and references are provided in Section 2.

A standard approach to the asymptotic analysis of plug-in estimators (in particular, such as $\langle f(\hat\Sigma), B\rangle$) in statistics is the Delta Method, based on the first order Taylor expansion of $f(\hat\Sigma)$. Due to a result by Peller (see Section 2), for any $f\in B^1_{\infty,1}(\mathbb R)$, the mapping $A\mapsto f(A)$ is Fréchet differentiable with respect to the operator norm on the space of bounded self-adjoint operators in $H$. Let $\Sigma$ be a covariance operator with spectral decomposition $\Sigma := \sum_{\lambda\in\sigma(\Sigma)}\lambda P_\lambda$, $\sigma(\Sigma)$ being the spectrum of $\Sigma$, $\lambda$ being an eigenvalue of $\Sigma$ and $P_\lambda$ being the corresponding spectral projection (the orthogonal projection onto the corresponding eigenspace of $\Sigma$).
Then the derivative $Df(\Sigma)(H) = Df(\Sigma;H)$ of the operator function $f(A)$ at $A=\Sigma$ in the direction $H$ is given by the following formula:
\[
Df(\Sigma;H) = \sum_{\lambda,\mu\in\sigma(\Sigma)} f^{[1]}(\lambda,\mu)\,P_\lambda H P_\mu,
\]
where $f^{[1]}(\lambda,\mu) = \frac{f(\lambda)-f(\mu)}{\lambda-\mu}$ for $\lambda\ne\mu$ and $f^{[1]}(\lambda,\mu) = f'(\lambda)$ for $\lambda=\mu$ (see Section 2). Moreover, if, for some $s\in(1,2]$, $f\in B^s_{\infty,1}(\mathbb R)$, then the following first order Taylor expansion holds:
\[
f(\hat\Sigma) - f(\Sigma) = Df(\Sigma;\hat\Sigma-\Sigma) + S_f(\Sigma;\hat\Sigma-\Sigma),
\]
with the linear term $Df(\Sigma;\hat\Sigma-\Sigma) = n^{-1}\sum_{j=1}^n Df(\Sigma; X_j\otimes X_j-\Sigma)$ and the remainder $S_f(\Sigma;\hat\Sigma-\Sigma)$ satisfying the bound
\[
\|S_f(\Sigma;\hat\Sigma-\Sigma)\| \lesssim_s \|f\|_{B^s_{\infty,1}}\|\hat\Sigma-\Sigma\|^s
\]
(see (2.15)). Since the linear term $Df(\Sigma;\hat\Sigma-\Sigma)$ is a sum of i.i.d. random variables, it is easy to check (for instance, using the Berry–Esseen bound) that $\sqrt n\,\langle Df(\Sigma;\hat\Sigma-\Sigma), B\rangle$ is asymptotically normal with limit mean zero and limit variance
\[
\sigma_f^2(\Sigma;B) := 2\bigl\|\Sigma^{1/2} Df(\Sigma;B)\Sigma^{1/2}\bigr\|_2^2.
\]
Using the exponential bound (1.3) on $\|\hat\Sigma-\Sigma\|$, one can easily conclude that the remainder $\langle S_f(\Sigma;\hat\Sigma-\Sigma), B\rangle$ is asymptotically negligible (that is, of order $o(n^{-1/2})$) if $\bigl(\frac{r(\Sigma)}{n}\bigr)^{s/2} = o(n^{-1/2})$ or, equivalently, $r(\Sigma) = o(n^{(s-1)/s})$. In the case when $s=2$, this means that $r(\Sigma) = o(n^{1/2})$. This implies that $\langle f(\hat\Sigma), B\rangle$ is an asymptotically normal estimator of $\langle f(\Sigma), B\rangle$ with convergence rate $n^{-1/2}$ and limit normal distribution $N(0;\sigma_f^2(\Sigma;B))$ (under the assumption that $r(\Sigma) = o(n^{(s-1)/s})$). The above perturbation analysis is essentially the same as for spectral projections of $\hat\Sigma$ in the case of fixed finite dimension (see Anderson [A]), or in the infinite-dimensional case when the “complexity” of the problem (characterized by $\mathrm{tr}(\Sigma)$ or $r(\Sigma)$) is fixed (see Dauxois, Pousse and Romain [DPR]). Note also that the bias of the estimator $\langle f(\hat\Sigma), B\rangle$,
\[
\langle \mathbb E_\Sigma f(\hat\Sigma) - f(\Sigma), B\rangle = \langle \mathbb E_\Sigma S_f(\Sigma;\hat\Sigma-\Sigma), B\rangle,
\]
is upper bounded by
\[
\lesssim_s\ \|f\|_{B^s_{\infty,1}}\,\|B\|_1\Bigl(\frac{r(\Sigma)}{n}\Bigr)^{s/2},
\]
so it is of order $o(n^{-1/2})$ (asymptotically negligible) under the same condition $r(\Sigma) = o(n^{(s-1)/s})$. Moreover, it is easy to see that this bound on the bias is sharp for generic smooth functions $f$. For instance, if $f(x) = x^2$ and $B = u\otimes u$, then one can check by a straightforward computation that
\[
\sup_{\|u\|\le 1}\bigl|\langle \mathbb E_\Sigma f(\hat\Sigma) - f(\Sigma), u\otimes u\rangle\bigr| = \frac{\|\mathrm{tr}(\Sigma)\Sigma + \Sigma^2\|}{n} \asymp \|\Sigma\|^2\,\frac{r(\Sigma)}{n}
\]
(indeed, for $X\sim N(0;\Sigma)$ one has $\mathbb E\|X\|^2 X\otimes X = \mathrm{tr}(\Sigma)\Sigma + 2\Sigma^2$, which yields $\mathbb E_\Sigma\hat\Sigma^2 - \Sigma^2 = n^{-1}(\mathrm{tr}(\Sigma)\Sigma + \Sigma^2)$). This means that, as soon as $r(\Sigma)\ge n^{1/2}$, one can choose a vector $u$ in the unit ball (for which the supremum is “nearly attained”) such that both the bias and the remainder are not asymptotically negligible; moreover, it turns out that, if $\frac{r(\Sigma)}{n^{1/2}}\to\infty$, then $\langle f(\hat\Sigma), B\rangle$ is not even a $\sqrt n$-consistent estimator of $\langle f(\Sigma), B\rangle$. If, in addition, the operator norm $\|\Sigma\|$ is bounded by a constant $R>0$, one can find a function in the space $B^2_{\infty,1}(\mathbb R)$ that coincides with $f(x) = x^2$ in a neighborhood of the interval $[0,R]$, and the above claims hold for this function, too (see also Remark 2 below).

Our first goal is to show that $\langle f(\hat\Sigma), B\rangle$ is an asymptotically normal estimator of its own expectation $\langle \mathbb E_\Sigma f(\hat\Sigma), B\rangle$ with convergence rate $n^{-1/2}$ and limit variance $\sigma_f^2(\Sigma;B)$ in the class of covariances with effective rank of order $o(n)$. Given $r\ge 1$ and $a>0$, define
\[
\mathcal G(r;a) := \bigl\{\Sigma:\ r(\Sigma)\le r,\ \|\Sigma\|\le a\bigr\}.
\]

Theorem 2.
Suppose, for some $s\in(1,2]$, $f\in B^s_{\infty,1}(\mathbb R)$. Let $a>0$, $\sigma_0>0$, and suppose that $r_n\ge 1$ and $r_n = o(n)$ as $n\to\infty$. Then
\[
\sup_{\Sigma\in\mathcal G(r_n;a),\ \|B\|_1\le 1,\ \sigma_f(\Sigma;B)\ge\sigma_0}\ \sup_{x\in\mathbb R}\biggl|\mathbb P_\Sigma\biggl\{\frac{n^{1/2}\langle f(\hat\Sigma) - \mathbb E_\Sigma f(\hat\Sigma), B\rangle}{\sigma_f(\Sigma;B)}\le x\biggr\} - \Phi(x)\biggr| \to 0 \quad (1.7)
\]
as $n\to\infty$, where $\Phi(x) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}\,dt$, $x\in\mathbb R$.

This result is a consequence of Corollary 4 proved in Section 4, which provides an explicit bound on the accuracy of the normal approximation. Its proof is based on a concentration bound for the remainder $\langle S_f(\Sigma;\hat\Sigma-\Sigma), B\rangle$ of the first order Taylor expansion, developed in Section 3. This bound essentially shows that the centered remainder
\[
\langle S_f(\Sigma;\hat\Sigma-\Sigma), B\rangle - \mathbb E\langle S_f(\Sigma;\hat\Sigma-\Sigma), B\rangle
\]
is (with probability at least $1-e^{-t}$) of order $\bigl(\frac{r(\Sigma)}{n}\bigr)^{(s-1)/2}\sqrt{\frac{t}{n}}$, which is $o(n^{-1/2})$ as soon as $r(\Sigma) = o(n)$.

Theorem 2 shows that the naive plug-in estimator $\langle f(\hat\Sigma), B\rangle$ “concentrates” around its expectation with an approximately standard normal distribution of the random variables
\[
\frac{n^{1/2}\langle f(\hat\Sigma) - \mathbb E_\Sigma f(\hat\Sigma), B\rangle}{\sigma_f(\Sigma;B)}.
\]
At the same time, as discussed above, the plug-in estimator has a large bias when the effective rank of $\Sigma$ is sufficiently large (say, $r(\Sigma)\ge n^{1/2}$ for functions $f$ of smoothness $s=2$). In the case when $\Sigma\in\mathcal G(r_n;a)$ with $r_n = o(n^{1/2})$ and $\sigma_f(\Sigma;B)\ge\sigma_0$, the bias is negligible and $\langle f(\hat\Sigma), B\rangle$ becomes an asymptotically normal estimator of $\langle f(\Sigma), B\rangle$. Moreover, we will also derive asymptotics of the risk of the plug-in estimator for loss functions satisfying the following assumption:
Assumption 1.
Let $\ell:\mathbb R\to\mathbb R_+$ be a loss function such that $\ell(0)=0$, $\ell(u)=\ell(-u)$, $u\in\mathbb R$, $\ell$ is nondecreasing and convex on $\mathbb R_+$ and, for some constants $c_1, c_2>0$,
\[
\ell(u)\le c_1 e^{c_2 u},\quad u\ge 0.
\]

Corollary 1.
Suppose, for some $s\in(1,2]$, $f\in B^s_{\infty,1}(\mathbb R)$. Let $a>0$, $\sigma_0>0$, and suppose that $r_n\ge 1$ and $r_n = o(n^{(s-1)/s})$ as $n\to\infty$. Then
\[
\sup_{\Sigma\in\mathcal G(r_n;a),\ \|B\|_1\le 1,\ \sigma_f(\Sigma;B)\ge\sigma_0}\ \sup_{x\in\mathbb R}\biggl|\mathbb P_\Sigma\biggl\{\frac{n^{1/2}\bigl(\langle f(\hat\Sigma), B\rangle - \langle f(\Sigma), B\rangle\bigr)}{\sigma_f(\Sigma;B)}\le x\biggr\} - \Phi(x)\biggr| \to 0 \quad (1.8)
\]
as $n\to\infty$. Moreover, under the same assumptions on $f$ and $r_n$, and for any loss function $\ell$ satisfying Assumption 1,
\[
\sup_{\Sigma\in\mathcal G(r_n;a),\ \|B\|_1\le 1,\ \sigma_f(\Sigma;B)\ge\sigma_0}\biggl|\mathbb E_\Sigma\,\ell\biggl(\frac{n^{1/2}\bigl(\langle f(\hat\Sigma), B\rangle - \langle f(\Sigma), B\rangle\bigr)}{\sigma_f(\Sigma;B)}\biggr) - \mathbb E\,\ell(Z)\biggr| \to 0 \quad (1.9)
\]
as $n\to\infty$, where $Z$ is a standard normal random variable.

The main difficulty in asymptotically efficient estimation of the functional $\langle f(\Sigma), B\rangle$ is related to the development of bias reduction methods. We will now discuss an approach to this problem in the case when $H$ is a finite-dimensional space of dimension $d = d_n = o(n)$ and the covariance operator $\Sigma$ is of isotropic type (the spectrum of $\Sigma$ is bounded from above and bounded away from zero by constants that do not depend on $n$). In this case, the effective rank $r(\Sigma)$ is of the same order as the dimension $d$, so $d$ will be used as the complexity parameter. The development of a similar approach in a more general setting (when the effective rank $r(\Sigma)$ is the relevant complexity parameter) remains an open problem.

Consider the following integral operator:
\[
\mathcal T g(\Sigma) := \mathbb E_\Sigma g(\hat\Sigma) = \int_{\mathcal C_+(H)} g(S)\,P(\Sigma; dS),\quad \Sigma\in\mathcal C_+(H),
\]
where $\mathcal C_+(H)$ is the cone of positively semi-definite self-adjoint operators in $H$ (covariance operators) and $P(\Sigma;\cdot)$ is the distribution of the sample covariance $\hat\Sigma$ based on $n$ i.i.d. observations sampled from $N(0;\Sigma)$ (which is a rescaled Wishart distribution). In what follows, $\mathcal T$ will be called the Wishart operator.
We will view it as an operator acting on bounded measurable functions on the cone $\mathcal C_+(H)$, taking values either in the real line or in the space of self-adjoint operators. Such operators play an important role in the theory of Wishart matrices (see, e.g., James [James, James1, James2], Graczyk, Letac and Massam [GLM, GLM1], Letac and Massam [LetMas]). Their properties will be discussed in detail in Section 5. To find an unbiased estimator $g(\hat\Sigma)$ of $f(\Sigma)$, one has to solve the integral equation $\mathcal T g(\Sigma) = f(\Sigma)$, $\Sigma\in\mathcal C_+(H)$ (the Wishart equation). Let $\mathcal B := \mathcal T - \mathcal I$, $\mathcal I$ being the identity operator. Then the solution of the Wishart equation can be formally written as the Neumann series
\[
g(\Sigma) = (\mathcal I + \mathcal B)^{-1} f(\Sigma) = (\mathcal I - \mathcal B + \mathcal B^2 - \dots) f(\Sigma) = \sum_{j=0}^\infty (-1)^j \mathcal B^j f(\Sigma).
\]
We do not use this representation in what follows and do not need any facts about the convergence of the series. Instead, we will define an approximate solution of the Wishart equation in terms of a partial sum of the Neumann series:
\[
f_k(\Sigma) := \sum_{j=0}^k (-1)^j \mathcal B^j f(\Sigma),\quad \Sigma\in\mathcal C_+(H).
\]
With this definition, we have
\[
\mathbb E_\Sigma f_k(\hat\Sigma) - f(\Sigma) = (-1)^k \mathcal B^{k+1} f(\Sigma),\quad \Sigma\in\mathcal C_+(H)
\]
(indeed, $\mathbb E_\Sigma f_k(\hat\Sigma) = \mathcal T f_k(\Sigma) = (\mathcal I+\mathcal B)\sum_{j=0}^k (-1)^j\mathcal B^j f(\Sigma)$, and the sum telescopes to $f(\Sigma) + (-1)^k\mathcal B^{k+1} f(\Sigma)$). It remains to show that $\langle\mathcal B^{k+1} f(\Sigma), B\rangle$ is small for smooth enough functions $f$, which would imply that the bias $\langle\mathbb E_\Sigma f_k(\hat\Sigma) - f(\Sigma), B\rangle$ of the estimator $\langle f_k(\hat\Sigma), B\rangle$ of $\langle f(\Sigma), B\rangle$ is also small. [Very recently, a similar approach was considered in the paper by Jiao, Han and Weissman [JHW] in the case of estimation of a function $f(\theta)$ of the parameter $\theta$ of a binomial model $B(n;\theta)$, $\theta\in[0,1]$. In this case, $\mathcal T f$ is the Bernstein polynomial of degree $n$ approximating the function $f$, and some results of classical approximation theory ([GonZ], [Tot]) were used in [JHW] to control $\mathcal B^k f$.]

Note that $P(\cdot;\cdot)$ is a Markov kernel, and it can be viewed as the transition kernel of a Markov chain $\hat\Sigma^{(t)}$, $t = 0, 1, 2, \dots$
in the cone $\mathcal C_+(H)$, where $\hat\Sigma^{(0)} = \Sigma$, $\hat\Sigma^{(1)} = \hat\Sigma$ and, in general, for any $t\ge 1$, $\hat\Sigma^{(t)}$ is the sample covariance based on $n$ i.i.d. observations sampled from the distribution $N(0;\hat\Sigma^{(t-1)})$ (conditionally on $\hat\Sigma^{(t-1)}$). In other words, the Markov chain $\{\hat\Sigma^{(t)}\}$ is based on iterative applications of the bootstrap, and in what follows it will be called the bootstrap chain. As a consequence of bound (1.6), with high probability (conditionally on $\hat\Sigma^{(t-1)}$),
\[
\|\hat\Sigma^{(t)} - \hat\Sigma^{(t-1)}\| \lesssim \|\hat\Sigma^{(t-1)}\|\sqrt{\frac{d}{n}},
\]
so, when $d = o(n)$, the Markov chain $\{\hat\Sigma^{(t)}\}$ moves in “small steps” of order $\asymp\sqrt{\frac{d}{n}}$. Clearly, with the above definitions,
\[
\mathcal T^k f(\Sigma) = \mathbb E_\Sigma f(\hat\Sigma^{(k)}).
\]
Note that, by Newton's binomial formula,
\[
\mathcal B^k f(\Sigma) = (\mathcal T - \mathcal I)^k f(\Sigma) = \sum_{j=0}^k (-1)^{k-j}\binom{k}{j}\mathcal T^j f(\Sigma) = \mathbb E_\Sigma \sum_{j=0}^k (-1)^{k-j}\binom{k}{j} f(\hat\Sigma^{(j)}).
\]
The expression $\sum_{j=0}^k (-1)^{k-j}\binom{k}{j} f(\hat\Sigma^{(j)})$ can be viewed as the $k$-th order difference of the function $f$ along the Markov chain $\{\hat\Sigma^{(t)}\}$. It is well known that, for a $k$ times continuously differentiable function $f$ on the real line, the $k$-th order difference $\Delta_h^k f(x)$ (where $\Delta_h f(x) := f(x+h) - f(x)$) is of order $O(h^k)$ for a small increment $h$. Thus, at least heuristically, one can expect $\mathcal B^k f(\Sigma)$ to be of order $O\bigl(\bigl(\frac{d}{n}\bigr)^{k/2}\bigr)$ (since $\sqrt{\frac{d}{n}}$ is the size of the “steps” of the Markov chain $\{\hat\Sigma^{(t)}\}$). This means that, for $d$ much smaller than $n$, one can achieve a significant bias reduction in a relatively small number of steps $k$.

The justification of this heuristic is rather involved. It is based on a representation of the operator function $f(\Sigma)$ in the form $\mathcal D g(\Sigma) := \Sigma^{1/2} Dg(\Sigma)\Sigma^{1/2}$, where $g$ is a real valued function on the cone $\mathcal C_+(H)$ invariant with respect to the orthogonal group.
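The $k$-th order difference representation above lends itself to direct numerical evaluation by simulating the bootstrap chain (cf. also Remark 6 below). The following sketch is our illustration, not the paper's code; the function names are ours. It estimates $\mathcal B f(\Sigma) = \mathbb E_\Sigma f(\hat\Sigma) - f(\Sigma)$ by Monte Carlo for $f(S) = \langle S^2, e_1\otimes e_1\rangle$ and compares the result with the exact one-step bias $\langle n^{-1}(\mathrm{tr}(\Sigma)\Sigma + \Sigma^2), e_1\otimes e_1\rangle$ computed earlier:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)

def bootstrap_chain(Sigma, n, k):
    """Simulate Sigma_hat^(0) = Sigma, ..., Sigma_hat^(k): each step is the sample
    covariance of n points drawn from N(0, previous state)."""
    chain = [Sigma]
    for _ in range(k):
        X = rng.multivariate_normal(np.zeros(Sigma.shape[0]), chain[-1], size=n)
        chain.append(X.T @ X / n)
    return chain

def Bk_f_mc(Sigma, n, k, f, reps):
    """Monte Carlo estimate of B^k f(Sigma) = E_Sigma sum_j (-1)^(k-j) C(k,j) f(Sigma_hat^(j))."""
    total = 0.0
    for _ in range(reps):
        chain = bootstrap_chain(Sigma, n, k)
        total += sum((-1) ** (k - j) * comb(k, j) * f(chain[j]) for j in range(k + 1))
    return total / reps

Sigma, n = np.diag([1.0, 2.0, 3.0]), 50
f = lambda S: (S @ S)[0, 0]                                  # f(S) = <S^2, e1 (x) e1>
exact = (np.trace(Sigma) * Sigma + Sigma @ Sigma)[0, 0] / n  # = (6*1 + 1)/50 = 0.14
mc = Bk_f_mc(Sigma, n, 1, f, reps=4000)
print(mc, exact)   # the two values agree up to Monte Carlo error
```

Replacing $k=1$ by larger $k$ in `Bk_f_mc` gives Monte Carlo approximations of the higher-order terms used in the bias-reduced estimator $f_k$.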
The properties of orthogonally invariant functions are then used to derive an integral representation for the function $\mathcal B^k f(\Sigma) = \mathcal B^k\mathcal D g(\Sigma) = \mathcal D\mathcal B^k g(\Sigma)$ that implies, for sufficiently smooth $f$, bounds on $\mathcal B^k f(\Sigma)$ of order $O\bigl(\bigl(\frac{d}{n}\bigr)^{k/2}\bigr)$ and, as a consequence, bounds on the bias of the estimator $\langle f_k(\hat\Sigma), B\rangle$ of $\langle f(\Sigma), B\rangle$ of order $o(n^{-1/2})$, provided that $d = o(n)$ and $k$ is sufficiently large (see (5.15) in Section 5, and Theorem 8 and Corollary 5 in Section 6).

The next step in the analysis of the estimator $\langle f_k(\hat\Sigma), B\rangle$ is to derive normal approximation bounds for $\langle f_k(\hat\Sigma), B\rangle - \mathbb E_\Sigma\langle f_k(\hat\Sigma), B\rangle$. To this end, we study in Section 7 the smoothness properties of the functions $\mathcal D\mathcal B^k g(\Sigma)$ for smooth orthogonally invariant functions $g$. These are later used to establish the proper smoothness of functions such as $\langle f_k(\Sigma), B\rangle$ and to derive concentration bounds on the remainder $\langle S_{f_k}(\Sigma;\hat\Sigma-\Sigma), B\rangle$ of the first order Taylor expansion of $\langle f_k(\hat\Sigma), B\rangle$, which is the main step in showing that the centered remainder is asymptotically negligible and in proving the normal approximation. In addition, we show that the limit variance in the normal approximation of $\langle f_k(\hat\Sigma), B\rangle - \mathbb E_\Sigma\langle f_k(\hat\Sigma), B\rangle$ coincides with $\sigma_f^2(\Sigma;B)$ (exactly the same as the limit variance in the normal approximation of $\langle f(\hat\Sigma), B\rangle - \mathbb E_\Sigma\langle f(\hat\Sigma), B\rangle$). This finally yields the normal approximation bounds of Theorems 10 and 11 in Section 8.

Given $d\ge 1$ and $a\ge 1$, denote by $\mathcal S(d;a)$ the set of all covariance operators in a $d$-dimensional space $H$ such that $\|\Sigma\|\le a$ and $\|\Sigma^{-1}\|\le a$. The following result on uniform normal approximation of the estimator $\langle f_k(\hat\Sigma), B\rangle$ of $\langle f(\Sigma), B\rangle$ is an immediate consequence of Theorem 11.

Theorem 3.
Let $a\ge 1$, $\sigma_0>0$. Suppose that, for some $\alpha\in(0,1)$, $1\le d_n\le n^\alpha$, $n\ge 1$. Suppose also that, for some $s > \frac{1}{1-\alpha}$, $f\in B^s_{\infty,1}(\mathbb R)$. Let $k$ be an integer such that, for some $\beta\in(0,1]$,
\[
\frac{1}{1-\alpha} < k+1+\beta \le s.
\]
Then
\[
\sup_{\Sigma\in\mathcal S(d_n;a),\ \|B\|_1\le 1,\ \sigma_f(\Sigma;B)\ge\sigma_0}\ \sup_{x\in\mathbb R}\biggl|\mathbb P_\Sigma\biggl\{\frac{n^{1/2}\bigl(\langle f_k(\hat\Sigma), B\rangle - \langle f(\Sigma), B\rangle\bigr)}{\sigma_f(\Sigma;B)}\le x\biggr\} - \Phi(x)\biggr| \to 0 \quad (1.10)
\]
as $n\to\infty$. Moreover, if $\ell$ is a loss function satisfying Assumption 1, then
\[
\sup_{\Sigma\in\mathcal S(d_n;a),\ \|B\|_1\le 1,\ \sigma_f(\Sigma;B)\ge\sigma_0}\biggl|\mathbb E_\Sigma\,\ell\biggl(\frac{n^{1/2}\bigl(\langle f_k(\hat\Sigma), B\rangle - \langle f(\Sigma), B\rangle\bigr)}{\sigma_f(\Sigma;B)}\biggr) - \mathbb E\,\ell(Z)\biggr| \to 0 \quad (1.11)
\]
as $n\to\infty$.

Remark 1.
Note that, for $\alpha\in(0,1/2)$ and $s > \frac{1}{1-\alpha}$, one can choose $k=0$, implying that $f_k(\hat\Sigma) = f(\hat\Sigma)$ in Theorem 3 is the usual plug-in estimator (compare this with Corollary 1). However, for $\alpha = 1/2$, we have to assume that $s>2$ and choose $k=1$ to satisfy the condition $k+1+\beta > \frac{1}{1-\alpha} = 2$. Thus, in this case, the bias correction is already nontrivial. For larger values of $\alpha$, even more smoothness of $f$ is required and more iterations $k$ of our bias reduction method are needed.

Remark 2.
It easily follows from well-known embedding theorems for Besov spaces (see, e.g., [Tr], Section 2.3.2) that, for $s' > s > 0$, the Hölder space $C^{s'}(\mathbb R)\subset B^s_{\infty,1}(\mathbb R)$. Moreover, it is easy to see that any $C^{s'}$-function defined locally in a neighborhood of the spectrum of $\Sigma$ can be extended to a function from $C^{s'}(\mathbb R)$. These observations show that Theorem 3 can be applied to all $C^s$ functions defined in a neighborhood of the spectrum of $\Sigma$ for all $s > \frac{1}{1-\alpha}$.

To show the asymptotic efficiency of the estimator $\langle f_k(\hat\Sigma), B\rangle$, it remains to prove a minimax lower bound on the risk of an arbitrary estimator $T_n(X_1,\dots,X_n)$ of the functional $\langle f(\Sigma), B\rangle$ that implies the optimality of the variance $\sigma_f^2(\Sigma;B)$ in the normal approximations (1.10), (1.11). Consider a function $f\in B^s_{\infty,1}(\mathbb R)$ for some $s\in(1,2]$. Given $a>1$, let $\mathring{\mathcal S}(d;a)$ be the set of all covariance operators in a Hilbert space $H$ of dimension $d = \dim(H)$ such that $\|\Sigma\| < a$, $\|\Sigma^{-1}\| < a$. Given $\sigma_0>0$, denote
\[
\mathring{\mathcal S}_{f,B}(d;a;\sigma_0) := \mathring{\mathcal S}(d;a)\cap\bigl\{\Sigma:\ \sigma_f(\Sigma;B) > \sigma_0\bigr\}.
\]
Note that the set $\mathring{\mathcal S}_{f,B}(d;a;\sigma_0)$ is open in the operator norm topology, which easily follows from the continuity of the functions $\Sigma\mapsto\|\Sigma\|$, $\Sigma\mapsto\|\Sigma^{-1}\|$ (on the set of non-singular operators) and $\Sigma\mapsto\sigma_f(\Sigma;B)$ (see Lemma 26 in Section 9) with respect to the operator norm. This set could be empty. For instance, since $\sigma_f(\Sigma;B)\le\sqrt 2\,\|f'\|_{L_\infty}\|\Sigma\|\,\|B\|_1$, we have that $\mathring{\mathcal S}_{f,B}(d;a;\sigma_0) = \emptyset$ if $\sigma_0\ge\sqrt 2\,a\,\|f'\|_{L_\infty}\|B\|_1$. Denote
\[
\mathcal B_f(d;a;\sigma_0) := \bigl\{B:\ \|B\|_1\le 1,\ \mathring{\mathcal S}_{f,B}(d;a;\sigma_0)\ne\emptyset\bigr\}.
\]
The following theorem provides an asymptotic minimax lower bound on the mean squared error of estimation of the functionals $\langle f(\Sigma), B\rangle$, $\|B\|_1\le 1$. By convention, it will be assumed that $\inf_\emptyset = +\infty$.

Theorem 4.
Let $a>1$, $\sigma_0>0$ and let $\{d_n\}$ be an arbitrary sequence of integers $d_n\ge 1$. Then, for all $a'\in(1,a)$ and $\sigma_0' > \sigma_0$,
\[
\liminf_{n\to\infty}\ \inf_{T_n}\ \inf_{B\in\mathcal B_f(d_n;a';\sigma_0')}\ \sup_{\Sigma\in\mathring{\mathcal S}(d_n;a),\ \sigma_f(\Sigma;B)>\sigma_0}\ \frac{n\,\mathbb E_\Sigma\bigl(T_n - \langle f(\Sigma), B\rangle\bigr)^2}{\sigma_f^2(\Sigma;B)} \ge 1, \quad (1.12)
\]
where the first infimum is taken over all statistics $T_n = T_n(X_1,\dots,X_n)$ based on i.i.d. observations $X_1,\dots,X_n$ sampled from $N(0;\Sigma)$.

The proof of this theorem is given in Section 9.
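As a hedged numerical illustration of the role of $\sigma_f^2(\Sigma;B) = 2\|\Sigma^{1/2} Df(\Sigma;B)\Sigma^{1/2}\|_2^2$ in Theorems 2–4 (the sketch below is ours, not the paper's): for $f(x)=x^2$ one has $Df(\Sigma;B) = \Sigma B + B\Sigma$, and for $\Sigma = \mathrm{diag}(1,2,3)$, $B = e_1\otimes e_1$ this gives $\sigma_f^2(\Sigma;B) = 8$, so the standard deviation of $\sqrt n\,\langle f(\hat\Sigma) - \mathbb E_\Sigma f(\hat\Sigma), B\rangle$ should be close to $\sqrt 8\approx 2.83$:

```python
import numpy as np

rng = np.random.default_rng(7)

Sigma = np.diag([1.0, 2.0, 3.0])
n, reps = 200, 2000

# sqrt(n) * centered values of <Sigma_hat^2, e1 (x) e1> across replications
vals = []
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
    S = X.T @ X / n
    vals.append((S @ S)[0, 0])
vals = np.sqrt(n) * (np.array(vals) - np.mean(vals))

# limit variance: sigma_f^2 = 2 * ||Sigma^(1/2) (Sigma B + B Sigma) Sigma^(1/2)||_2^2
B = np.zeros((3, 3)); B[0, 0] = 1.0
D = Sigma @ B + B @ Sigma                       # Df(Sigma; B) for f(x) = x^2
half = np.diag(np.sqrt(np.diag(Sigma)))         # Sigma^(1/2) for diagonal Sigma
sigma2 = 2 * np.sum((half @ D @ half) ** 2)
print(np.std(vals), np.sqrt(sigma2))            # both close to 2.83
```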
Remark 3. If $C\subset\sigma(\Sigma)$ is a “component” of the spectrum of $\Sigma$ such that the distance $\mathrm{dist}(C;\sigma(\Sigma)\setminus C)$ from $C$ to the rest of the spectrum is bounded away from zero by a sufficiently large gap, and $P_C$ is the orthogonal projection onto the direct sum of the eigenspaces of $\Sigma$ corresponding to the eigenvalues in $C$, then it is easy to represent $P_C$ as $f(\Sigma)$ for a smooth function $f$ that is equal to $1$ on $C$ and vanishes outside of a neighborhood of $C$ that does not contain other eigenvalues. The problem of efficient estimation of linear functionals of the spectral projection $P_C$ (such as its matrix entries in a given basis, or general bilinear forms) is of importance in principal component analysis. A related problem of estimation of linear functionals of principal components was recently studied in [KLN] in the case of one-dimensional spectral projections. The methods of efficient estimation developed in [KLN] are rather specialized, and they cannot easily be extended even to spectral projections of rank higher than $1$. This, in part, was our motivation to study the problem for more general smooth functionals and to develop a more general approach to efficient estimation. Similarly, one can represent the operator $P_C\Sigma P_C$ as a smooth function of $\Sigma$ and use the approach of the current paper to develop efficient estimators of bilinear forms or matrix entries of such operators. This could be of interest in the case of covariance matrices of the form $\Sigma = \Sigma_0 + \sigma^2 I_d$, where $\Sigma_0$ is a low rank covariance matrix (say, the covariance matrix whose eigenvectors are the “spikes” of a spiked covariance model). If $C$ is the set of top eigenvalues of $\Sigma$ that correspond to its “spikes”, then estimation of the matrix $\Sigma_0$ can be reduced to estimation of $P_C\Sigma P_C$.

Remark 4.
The results of the paper cannot be directly applied to estimation of functionals of the form $\mathrm{tr}(f(\Sigma))$, since in this case $B$ is the identity operator and its nuclear norm is not bounded by a constant. In such cases, $\sqrt n$-consistent estimators do not always exist in high-dimensional problems; minimax optimal convergence rates are slower than $n^{-1/2}$ and they do depend on the dimension (see, for instance, [CLZ] for an example of estimation of the log-determinant $\log\det(\Sigma) = \mathrm{tr}(\log(\Sigma))$). Although some elements of our approach (in particular, the bias reduction method) could be useful in this case, a comprehensive theory of estimation of functionals $\langle f(\Sigma), B\rangle$ in the case of unbounded nuclear norm of the operator $B$ remains an open problem and is beyond the scope of this paper.

Remark 5.
In this paper, the problem was studied only in the case of Gaussian models with known mean (without loss of generality, it is set to be zero) and unknown covariance operator. In [KZh], a similar problem of efficient estimation of smooth functionals of the unknown mean in Gaussian shift models with known covariance was studied. The problem becomes more complicated when both the mean and the covariance are unknown (in particular, it would require a more difficult analysis of the operators $\mathcal T$ and $\mathcal B$ involved in the bias reduction method).

Remark 6.
The computation of the estimator $f_k(\hat\Sigma)$ could be based on Monte Carlo simulation of the bootstrap chain. To this end, one has to simulate a segment of this chain of length $k+1$ starting at the sample covariance $\hat\Sigma$. This allows one to compute the sum $\sum_{j=0}^k (-1)^{k-j}\binom{k}{j} f(\hat\Sigma^{(j+1)})$. Averaging such sums over a sufficiently large number $N$ of independent copies of the bootstrap chain provides a Monte Carlo approximation of $\mathcal B^k f(\hat\Sigma)$, which allows one to approximate $f_k(\hat\Sigma)$. A total of $(k+1)N$ computations of the function $f$ of covariance operators (each of them based on a singular value decomposition) would be required to implement this procedure.

To the best of our knowledge, the problem of efficient estimation for general classes of smooth functionals of covariance operators in the setting of the current paper has not been studied before. However, many results in the literature on nonparametric, semiparametric and high-dimensional statistics, as well as some results in random matrix theory, are relevant in our context. We provide below a very brief discussion of some of these results.

Asymptotically efficient estimation of smooth functionals of infinite-dimensional parameters has been an important topic in nonparametric statistics for a number of years, one that also has deep connections to efficiency in semiparametric estimation (see, e.g., [BKRW], [GN] and references therein). The early references include Levit [Lev1, Lev2] and the book of Ibragimov and Khasminskii [IKh].
In the paper by Ibragimov, Nemirovski and Khasminskii [IKhN], and later in the paper [Nem1] and in the Saint-Flour lectures [Nem2] by Nemirovski, sharp results on efficient estimation of general smooth functionals of parameters of Gaussian white noise models were obtained, precisely describing the relationship between the rate of decay of the Kolmogorov diameters of the parameter space (used as a measure of its complexity) and the degree of smoothness of functionals for which efficient estimation is possible. A general approach to the construction of efficient estimators of smooth functionals in Gaussian white noise models was also developed in these papers. The result of Theorem 3 is in the same spirit, with the growth rate $\alpha$ of the dimension of the space being the complexity parameter instead of the rate of decay of Kolmogorov diameters. At this point, we do not know whether the smoothness threshold $s > \frac{1}{1-\alpha}$ for efficient estimation obtained in this theorem is sharp (although the sharpness of the same smoothness threshold was proved in [KZh] in the case of the Gaussian shift model).

More recently, there has been a lot of interest in semi-parametric efficiency properties of regularization-based estimators (such as the LASSO) in various models of high-dimensional statistics, see, e.g., [GBRD], [JaMont], [ZZ], [JG], as well as in minimax optimal rates of estimation of special functionals (in particular, linear and quadratic) in such models [CL1], [CL2], [CCT].

In a series of pioneering papers in the 80s–90s, Girko obtained a number of results on asymptotically normal estimation of many special functionals of covariance matrices in the high-dimensional setting, in particular, on estimation of the Stieltjes transform of the spectral function $\mathrm{tr}((I + t\Sigma)^{-1})$ (see [Gir] and also [Gir1] and references therein).
His estimators were typically functions of the sample covariance $\hat\Sigma$ defined in terms of certain equations (so-called $G$-estimators), and the proofs of their asymptotic normality were largely based on the martingale CLT. The centering and normalizing parameters in the limit theorems in these papers are often hard to interpret, and the estimators were not proved to be asymptotically efficient.

Asymptotic normality of so-called linear spectral statistics $\mathrm{tr}(f(\hat\Sigma))$, centered either by their own expectations or by the integral of $f$ with respect to a Marchenko–Pastur type law, has been an active subject of research in random matrix theory, both in the case of high-dimensional sample covariance (or Wishart) matrices and for other random matrix models such as Wigner matrices; see, e.g., Bai and Silverstein [BaiS], Lytova and Pastur [LP], Sosoe and Wong [SW]. Although these results do not have direct statistical implications, since $\mathrm{tr}(f(\hat\Sigma))$ does not "concentrate" around the corresponding population parameter, the probabilistic and analytic techniques developed in these papers are highly relevant.

There are many results in the literature on special cases of the above problem, such as the asymptotic normality of the statistic $\log\det(\hat\Sigma)=\mathrm{tr}(\log(\hat\Sigma))$ (the log-determinant). If $d=d_n\le n$, then it was shown that the sequence
$$\frac{\log\det(\hat\Sigma)-a_{n,d}-\log\det(\Sigma)}{b_{n,d}}$$
converges in distribution to a standard normal random variable for explicitly given sequences $a_{n,d},b_{n,d}$ that depend only on the sample size $n$ and the dimension $d$. This means that $\log\det(\hat\Sigma)$ is an asymptotically normal estimator of $\log\det(\Sigma)=\mathrm{tr}(\log(\Sigma))$ subject to a simple bias correction (see, e.g., Girko [Gir] and the more recent paper by Cai, Liang and Zhou [CLZ]). The convergence rate of this estimator is typically slower than $n^{-1/2}$: for instance, if $d=n^{\alpha}$ for $\alpha\in(0,1]$, then the convergence rate is $\asymp n^{-(1-\alpha)/2}$ (and, for $\alpha=1$, the estimator is not consistent).
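A quick simulation (our illustration, not from the paper) makes the nonvanishing bias of $\log\det(\hat\Sigma)$ visible. It relies on the standard Wishart fact that, for the sample covariance $W$ of $n$ i.i.d. $N(0,I_d)$ vectors, $\mathbb E\log\det W=\sum_{i=1}^{d}\psi\big(\frac{n-i+1}{2}\big)+d\log\frac{2}{n}$, with $\psi$ the digamma function; this expectation is a valid choice of the centering $a_{n,d}$.

```python
import numpy as np
from scipy.special import digamma

def logdet_bias_exact(n, d):
    """E[log det W], W the sample covariance of n i.i.d. N(0, I_d) vectors.
    Since nW is Wishart_d(n, I): E log det(nW) = sum_i psi((n-i+1)/2) + d log 2."""
    i = np.arange(1, d + 1)
    return float(digamma((n - i + 1) / 2).sum() + d * np.log(2.0 / n))

def logdet_bias_mc(n, d, reps, seed=0):
    """Monte Carlo estimate of E[log det W]."""
    rng = np.random.default_rng(seed)
    out = 0.0
    for _ in range(reps):
        x = rng.standard_normal((n, d))
        out += np.linalg.slogdet(x.T @ x / n)[1]
    return out / reps
```

For $n=200$ and $d=20$ the exact bias is close to $-d^2/(2n)$, an order of magnitude larger than the $n^{-1/2}$ fluctuations, consistent with the slow rate discussed above.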
In this case, the problem is relatively simple, since $\log\det(\hat\Sigma)-\log\det(\Sigma)=\log\det(W)$, where $W$ is the sample covariance based on a sample of $n$ i.i.d. standard normal random vectors.

In a recent paper by Koltchinskii and Lounici [KL1] (see also [KL3, KL4]), the problem of estimation of bilinear forms of spectral projections of covariance operators was studied in the setting where the effective rank satisfies $\mathbf r(\Sigma)=o(n)$ as $n\to\infty$. Normal approximation and concentration results for bilinear forms centered by their expectations were proved using first order perturbation expansions for empirical spectral projections and concentration inequalities for their remainder terms (which is similar to the approach of the current paper). Special properties of the bias of these estimators were studied; in the case of one-dimensional spectral projections, this led to the development of a bias reduction method based on sample splitting, resulting in the construction of $\sqrt n$-consistent and asymptotically normal estimators of linear forms of eigenvectors of the true covariance (principal components) in the case when $\mathbf r(\Sigma)=o(n)$ as $n\to\infty$. This approach has been further developed in a very recent paper by Koltchinskii, Loeffler and Nickl [KLN], in which asymptotically efficient estimators of linear forms of eigenvectors of $\Sigma$ were studied.

Other recent references on estimation of functionals of covariance include Fan, Rigollet and Wang [FRW] (optimal rates of estimation of special functionals of covariance under sparsity assumptions), Gao and Zhou [GaoZ] (Bernstein–von Mises theorems for functionals of covariance) and Kong and Valiant [KoVa] (estimation of "spectral moments" $\mathrm{tr}(\Sigma^k)$).

In this section, we discuss several results in operator theory concerning perturbations of smooth functions of self-adjoint operators in Hilbert spaces. They are simple modifications of known results due to several authors (see the recent survey by Aleksandrov and Peller [AP2]).
Let $f:\mathbb C\mapsto\mathbb C$ be an entire function and let $\sigma>0$. The function $f$ is said to be of exponential type $\sigma$ (more precisely, of exponential type $\le\sigma$) if, for any $\varepsilon>0$, there exists $C=C(\varepsilon,\sigma,f)>0$ such that
$$|f(z)|\le Ce^{(\sigma+\varepsilon)|z|},\quad z\in\mathbb C.$$
In what follows, $\mathcal E_\sigma=\mathcal E_\sigma(\mathbb C)$ denotes the space of all entire functions of exponential type $\sigma$. It is straightforward to see (and well known) that $f\in\mathcal E_\sigma$ if and only if
$$\limsup_{R\to\infty}\frac{\log\sup_{\varphi\in[0,2\pi]}|f(Re^{i\varphi})|}{R}=:\sigma(f)\le\sigma.$$
(For other recent results on covariance estimation under assumptions on the effective rank, see [NSU, RW].)

With a little abuse of notation, the restriction $f\restriction_{\mathbb R}$ of a function $f$ to $\mathbb R$ will also be denoted by $f$; $\mathcal Ff$ will denote the Fourier transform of $f$:
$$\mathcal Ff(t)=\int_{\mathbb R}e^{-itx}f(x)\,dx$$
(if $f$ is not square integrable, its Fourier transform can be understood in the sense of tempered distributions). According to the Paley–Wiener theorem,
$$\mathcal E_\sigma\cap L_\infty(\mathbb R)=\{f\in L_\infty(\mathbb R):\ \mathrm{supp}(\mathcal Ff)\subset[-\sigma,\sigma]\}.$$
It is also well known that $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$ if and only if
$$|f(z)|\le\|f\|_{L_\infty(\mathbb R)}e^{\sigma|\mathrm{Im}(z)|},\quad z\in\mathbb C.$$
We will use the following
Bernstein inequality $\|f'\|_{L_\infty(\mathbb R)}\le\sigma\|f\|_{L_\infty(\mathbb R)}$, which holds for all functions $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$. Moreover, since $f\in\mathcal E_\sigma$ implies $f'\in\mathcal E_\sigma$, we also have $\|f''\|_{L_\infty(\mathbb R)}\le\sigma^2\|f\|_{L_\infty(\mathbb R)}$, and similar bounds hold for all the derivatives of $f$. The next elementary lemma is a corollary of Bernstein's inequality. It provides bounds on the remainder of the first order Taylor expansion of a function $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$.

Lemma 1.
Let $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$. Denote
$$S_f(x;h):=f(x+h)-f(x)-f'(x)h,\quad x,h\in\mathbb R.$$
Then
$$|S_f(x;h)|\le\frac{\sigma^2}{2}\|f\|_{L_\infty(\mathbb R)}h^2,\quad x,h\in\mathbb R,$$
and
$$|S_f(x;h')-S_f(x;h)|\le\sigma^2\|f\|_{L_\infty(\mathbb R)}\delta(h,h')|h'-h|,\quad x,h,h'\in\mathbb R,$$
where $\delta(h,h'):=(|h|\wedge|h'|)+|h'-h|$.

We also need an extension of Bernstein's inequality to functions of several complex variables. Let $f:\mathbb C^k\mapsto\mathbb C$ be an entire function and let $\sigma:=(\sigma_1,\dots,\sigma_k)$, $\sigma_j>0$. The function $f$ is of exponential type $\sigma=(\sigma_1,\dots,\sigma_k)$ if, for any $\varepsilon>0$, there exists $C=C(\varepsilon,\sigma,f)>0$ such that
$$|f(z_1,\dots,z_k)|\le Ce^{\sum_{j=1}^{k}(\sigma_j+\varepsilon)|z_j|},\quad z_1,\dots,z_k\in\mathbb C.$$
Let $\mathcal E_{\sigma_1,\dots,\sigma_k}$ be the set of all such functions. The following extension of Bernstein's inequality can be found in the paper by Nikolsky [Nik], who actually proved it for an arbitrary $L_p$-norm, $1\le p\le\infty$: if $f\in\mathcal E_{\sigma_1,\dots,\sigma_k}\cap L_\infty(\mathbb R^k)$, then, for any $m\ge 1$ and any $m_1,\dots,m_k\ge 0$ such that $\sum_{j=1}^{k}m_j=m$,
$$\bigg\|\frac{\partial^m f}{\partial x_1^{m_1}\dots\partial x_k^{m_k}}\bigg\|_{L_\infty(\mathbb R^k)}\le\sigma_1^{m_1}\dots\sigma_k^{m_k}\|f\|_{L_\infty(\mathbb R^k)}.\tag{2.1}$$
Let $w\ge 0$ be a $C^\infty$ function on the real line with $\mathrm{supp}(w)\subset[-2,2]$ such that $w(t)=1$, $t\in[-1,1]$, and $w(-t)=w(t)$, $t\in\mathbb R$. Define $w_0(t):=w(t/2)-w(t)$, $t\in\mathbb R$, which implies that $\mathrm{supp}(w_0)\subset\{t:1\le|t|\le 4\}$. Let $w_j(t):=w_0(2^{-j}t)$, $t\in\mathbb R$, with $\mathrm{supp}(w_j)\subset\{t:2^j\le|t|\le 2^{j+2}\}$, $j=0,1,2,\dots$. These definitions immediately imply that
$$w(t)+\sum_{j\ge 0}w_j(t)=1,\quad t\in\mathbb R.$$
Finally, define functions
$W,W_j\in\mathcal S(\mathbb R)$ (the Schwartz space of functions on $\mathbb R$) by their Fourier transforms as follows:
$$w(t)=(\mathcal FW)(t),\quad w_j(t)=(\mathcal FW_j)(t),\quad t\in\mathbb R,\ j\ge 0.$$
For a tempered distribution $f\in\mathcal S'(\mathbb R)$, one can define its Littlewood–Paley dyadic decomposition as the family of functions $f_0:=f\ast W$, $f_n:=f\ast W_{n-1}$, $n\ge 1$, with compactly supported Fourier transforms. Note that, by the Paley–Wiener theorem, $f_n\in\mathcal E_{2^{n+1}}\cap L_\infty(\mathbb R)$. It is well known that $\sum_{n\ge 0}f_n=f$ with convergence of the series in the space $\mathcal S'(\mathbb R)$. We use the following Besov norms
$$\|f\|_{B^s_{\infty,1}}:=\sum_{n\ge 0}2^{ns}\|f_n\|_{L_\infty(\mathbb R)},\quad s\in\mathbb R,$$
and define the corresponding Besov spaces as
$$B^s_{\infty,1}(\mathbb R):=\big\{f\in\mathcal S'(\mathbb R):\|f\|_{B^s_{\infty,1}}<\infty\big\}.$$
We do not use in what follows the whole scale of Besov spaces $B^s_{p,q}(\mathbb R)$, but only the spaces $B^s_{\infty,1}(\mathbb R)$ for $p=\infty$, $q=1$ and $s\ge 0$. Note that the Besov norms $\|\cdot\|_{B^s_{p,q}}$ are equivalent for different choices of the function $w$, and the corresponding Besov spaces coincide. If $f\in B^s_{\infty,1}(\mathbb R)$ for some $s\ge 0$, then the series $\sum_{n\ge 0}f_n$ converges uniformly to $f$ on $\mathbb R$, which easily implies that $f\in C_u(\mathbb R)$, where $C_u(\mathbb R)$ is the space of all bounded uniformly continuous functions on $\mathbb R$, and $\|f\|_{L_\infty}\le\|f\|_{B^s_{\infty,1}}$. Thus, for $s\ge 0$, the space $B^s_{\infty,1}(\mathbb R)$ is continuously embedded in $C_u(\mathbb R)$. Moreover, if $C^s(\mathbb R)$ denotes the Hölder space of smoothness $s>0$, then, for all $s'>s>0$, $C^{s'}(\mathbb R)\subset B^s_{\infty,1}(\mathbb R)\subset C^s(\mathbb R)$ (see [Tr], Sections 2.3.2 and 2.5.7). Further details on Besov spaces can also be found in [Tr].

For a continuous (and even for a Borel measurable) function $f$ on $\mathbb R$ and $A\in\mathcal B_{sa}(\mathbb H)$, the operator $f(A)$ is well defined and self-adjoint (for instance, by the spectral theorem). By a standard holomorphic functional calculus, the operator $f(A)$ is well defined for $A\in\mathcal B(\mathbb H)$ and for any function $f:G\subset\mathbb C\mapsto\mathbb C$ holomorphic in a neighborhood $G$ of the spectrum $\sigma(A)$ of $A$.
It is given by the following Cauchy formula:
$$f(A):=-\frac{1}{2\pi i}\oint_\gamma f(z)R_A(z)\,dz,$$
where $R_A(z):=(A-zI)^{-1}$, $z\notin\sigma(A)$, is the resolvent of $A$ and $\gamma\subset G$ is a contour surrounding $\sigma(A)$ with counterclockwise orientation. In particular, this holds for all entire functions $f$, and the mapping $\mathcal B(\mathbb H)\ni A\mapsto f(A)\in\mathcal B(\mathbb H)$ is Fréchet differentiable with derivative
$$Df(A;H)=\frac{1}{2\pi i}\oint_\gamma f(z)R_A(z)HR_A(z)\,dz,\quad H\in\mathcal B(\mathbb H).\tag{2.2}$$
The last formula easily follows from the perturbation series for the resolvent
$$R_{A+H}(z)=\sum_{k=0}^{\infty}(-1)^k(R_A(z)H)^kR_A(z),\quad z\in\mathbb C\setminus\sigma(A),$$
which converges in the operator norm as soon as $\|H\|<\|R_A(z)\|^{-1}=\mathrm{dist}(z,\sigma(A))$.

We need to extend the bounds of Lemma 1 to functions of operators, establishing similar properties for the remainder of the first order Taylor expansion
$$S_f(A;H):=f(A+H)-f(A)-Df(A;H),\quad A,H\in\mathcal B_{sa}(\mathbb H),$$
where $f$ is an entire function of exponential type $\sigma$. This is related to a circle of problems studied in the operator theory literature concerning operator Lipschitz and operator differentiable functions (see, in particular, the survey on this subject by Aleksandrov and Peller [AP2]).

We will need the following lemma.
Lemma 2.
Let $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$. Then, for all $A,H,H'\in\mathcal B_{sa}(\mathbb H)$,
$$\|f(A+H)-f(A)\|\le\sigma\|f\|_{L_\infty(\mathbb R)}\|H\|,\tag{2.3}$$
$$\|Df(A;H)\|\le\sigma\|f\|_{L_\infty(\mathbb R)}\|H\|,\tag{2.4}$$
$$\|S_f(A;H)\|\le\frac{\sigma^2}{2}\|f\|_{L_\infty(\mathbb R)}\|H\|^2\tag{2.5}$$
and
$$\|S_f(A;H')-S_f(A;H)\|\le\sigma^2\|f\|_{L_\infty(\mathbb R)}\delta(H,H')\|H'-H\|,\tag{2.6}$$
where $\delta(H,H'):=(\|H\|\wedge\|H'\|)+\|H'-H\|$.

Bounds (2.3) and (2.4) are well known, see Aleksandrov and Peller [AP2] (in fact, bound (2.3) means that, for $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$, the mapping $\mathcal B_{sa}(\mathbb H)\ni A\mapsto f(A)\in\mathcal B_{sa}(\mathbb H)$ is operator Lipschitz with respect to the operator norm). The proof of bounds (2.5) and (2.6) is also based on a very nice approach by Aleksandrov and Peller [AP1, AP2], developed to prove the operator Lipschitz property. We give this proof for completeness.

Proof.
Let $\mathbb E$ be a complex Banach space and let $\mathcal E_\sigma(\mathbb E)$ be the space of entire functions $F:\mathbb C\mapsto\mathbb E$ of exponential type $\sigma$, that is, entire functions $F$ such that, for any $\varepsilon>0$, there exists a constant $C=C(\varepsilon,\sigma,F)>0$ for which $\|F(z)\|\le Ce^{(\sigma+\varepsilon)|z|}$, $z\in\mathbb C$. If $F\in\mathcal E_\sigma(\mathbb E)$ and $\sup_{x\in\mathbb R}\|F(x)\|<+\infty$, then Bernstein's inequality holds for the function $F$:
$$\sup_{x\in\mathbb R}\|F'(x)\|\le\sigma\sup_{x\in\mathbb R}\|F(x)\|.\tag{2.7}$$
Indeed, for any $l\in\mathbb E^{\ast}$, $l(F(\cdot))\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$, which implies that
$$\sup_{x\in\mathbb R}\|F'(x)\|=\sup_{\|l\|\le 1}\sup_{x\in\mathbb R}|l(F'(x))|\le\sigma\sup_{\|l\|\le 1}\sup_{x\in\mathbb R}|l(F(x))|=\sigma\sup_{x\in\mathbb R}\|F(x)\|$$
and
$$\|F(x+h)-F(x)\|\le\sigma\sup_{x\in\mathbb R}\|F(x)\|\,|h|.\tag{2.8}$$
A similar simple argument (now based on Lemma 1) shows that, for $S_F(x;h):=F(x+h)-F(x)-F'(x)h$, we have
$$\|S_F(x;h)\|\le\frac{\sigma^2}{2}\sup_{x\in\mathbb R}\|F(x)\|h^2,\quad x,h\in\mathbb R,\tag{2.9}$$
and
$$\|S_F(x;h')-S_F(x;h)\|\le\sigma^2\sup_{x\in\mathbb R}\|F(x)\|\Big[|h||h'-h|+\frac{|h'-h|^2}{2}\Big],\quad x,h,h'\in\mathbb R.\tag{2.10}$$
Next, for given $A,H\in\mathcal B_{sa}(\mathbb H)$ and $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$, define $F(z):=f(A+zH)$, $z\in\mathbb C$. Then $F\in\mathcal E_{\sigma\|H\|}(\mathcal B(\mathbb H))$. Indeed, $F$ is complex-differentiable at any point $z\in\mathbb C$ with derivative $F'(z)=Df(A+zH;H)$, so it is an entire function with values in $\mathbb E=\mathcal B(\mathbb H)$. In addition, by the von Neumann theorem (see, e.g., [Dav], Theorem 9.5.3),
$$\|F(z)\|=\|f(A+zH)\|\le\sup_{|\zeta|\le\|A\|+|z|\|H\|}|f(\zeta)|\le\|f\|_{L_\infty(\mathbb R)}e^{\sigma\|A\|}e^{\sigma\|H\||z|},\quad z\in\mathbb C,$$
implying that $F$ is of exponential type $\sigma\|H\|$. Note also that
$$\sup_{x\in\mathbb R}\|F(x)\|=\sup_{x\in\mathbb R}\|f(A+xH)\|\le\sup_{x\in\mathbb R}|f(x)|=\|f\|_{L_\infty(\mathbb R)}.$$
Hence, bounds (2.7) and (2.8) imply that
$$\|f(A+H)-f(A)\|=\|F(1)-F(0)\|\le\sup_{x\in\mathbb R}\|F'(x)\|\le\sigma\|H\|\sup_{x\in\mathbb R}\|F(x)\|\le\sigma\|f\|_{L_\infty(\mathbb R)}\|H\|$$
and
$$\|Df(A;H)\|=\|F'(0)\|\le\sigma\|f\|_{L_\infty(\mathbb R)}\|H\|,$$
which proves bounds (2.3) and (2.4).
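The operator Lipschitz bound (2.3) and the derivative bound (2.4) are easy to probe numerically for $f=\sin$, which is entire of exponential type $\sigma=1$ with $\|f\|_{L_\infty(\mathbb R)}=1$. The sketch below is our illustration, not part of the proof; it computes $Df(A;H)$ from the divided-difference (Loewner matrix) representation of the derivative, cf. (2.20) below, and checks it against a finite difference.

```python
import numpy as np

def apply_sin(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sin(w)) @ v.T

def dsin(a, h):
    # Df(A;H) for f = sin: in an eigenbasis of A, the derivative is the Schur
    # product of the Loewner matrix (f^[1](l_i, l_j)) with the matrix of H
    w, v = np.linalg.eigh(a)
    hw = v.T @ h @ v
    lam, mu = w[:, None], w[None, :]
    d = lam - mu
    loewner = np.where(np.abs(d) > 1e-12,
                       (np.sin(lam) - np.sin(mu)) / np.where(d == 0, 1.0, d),
                       np.cos((lam + mu) / 2))
    return v @ (loewner * hw) @ v.T

opnorm = lambda m: np.linalg.norm(m, 2)
rng = np.random.default_rng(0)
g1, g2 = rng.standard_normal((6, 6)), rng.standard_normal((6, 6))
a, h = (g1 + g1.T) / 2, (g2 + g2.T) / 2

# (2.3) and (2.4) with sigma = 1 and ||sin||_inf = 1
assert opnorm(apply_sin(a + h) - apply_sin(a)) <= opnorm(h) + 1e-9
assert opnorm(dsin(a, h)) <= opnorm(h) + 1e-9

# the Loewner-matrix derivative matches a finite difference
t = 1e-6
fd = (apply_sin(a + t * h) - apply_sin(a)) / t
assert np.allclose(fd, dsin(a, h), atol=1e-4)
```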
Similarly, using (2.9), we get
$$\|S_f(A;H)\|=\|f(A+H)-f(A)-Df(A;H)\|=\|F(1)-F(0)-F'(0)(1-0)\|=\|S_F(0;1)\|\le\frac{\sigma^2\|H\|^2}{2}\sup_{x\in\mathbb R}\|F(x)\|\le\frac{\sigma^2}{2}\|f\|_{L_\infty(\mathbb R)}\|H\|^2,$$
proving (2.5).

To prove bound (2.6), define
$$F(z):=f(A+H+z(H'-H))-f(A+z(H'-H)),\quad z\in\mathbb C.$$
As in the previous case, $F$ is an entire function with values in $\mathcal B(\mathbb H)$. The bound
$$\|F(z)\|\le\|f\|_{L_\infty(\mathbb R)}\big(e^{\sigma\|A+H\|}+e^{\sigma\|A\|}\big)e^{\sigma\|H'-H\||z|}$$
implies that $F\in\mathcal E_{\sigma\|H'-H\|}(\mathcal B(\mathbb H))$. Clearly, we also have $\sup_{x\in\mathbb R}\|F(x)\|\le 2\|f\|_{L_\infty(\mathbb R)}$. Note that
$$S_f(A;H')-S_f(A;H)=Df(A+H;H'-H)-Df(A;H'-H)+S_f(A+H;H'-H)\tag{2.11}$$
and bound (2.5) implies
$$\|S_f(A+H;H'-H)\|\le\frac{\sigma^2}{2}\|f\|_{L_\infty(\mathbb R)}\|H'-H\|^2.$$
On the other hand, we have (by Bernstein's inequality)
$$\|Df(A+H;H'-H)-Df(A;H'-H)\|=\|F'(0)\|\le\sigma\|H'-H\|\sup_{x\in\mathbb R}\|F(x)\|$$
and (2.3) implies that
$$\sup_{x\in\mathbb R}\|F(x)\|=\sup_{x\in\mathbb R}\|f(A+H+x(H'-H))-f(A+x(H'-H))\|\le\sigma\|f\|_{L_\infty(\mathbb R)}\|H\|.$$
Now, it follows from (2.11) that
$$\|S_f(A;H')-S_f(A;H)\|\le\sigma^2\|f\|_{L_\infty(\mathbb R)}\Big(\|H\|+\frac{\|H'-H\|}{2}\Big)\|H'-H\|,$$
which implies (2.6).

Remark 7.
In addition to (2.6), the following bound follows from (2.3) and (2.4):
$$\|S_f(A;H')-S_f(A;H)\|\le 2\sigma\|f\|_{L_\infty(\mathbb R)}\|H'-H\|.\tag{2.12}$$
Note also that $\delta(H,H')\le\|H\|+\|H'\|$, $H,H'\in\mathcal B_{sa}(\mathbb H)$.

It was proved by Peller that any function $f\in B^1_{\infty,1}(\mathbb R)$ is operator Lipschitz and operator differentiable on the space of self-adjoint operators with respect to the operator norm (in Aleksandrov and Peller [AP2], these facts were proved using Littlewood–Paley theory and extensions of Bernstein's inequality to operator functions; see also their earlier paper [AP1]). We will state Peller's results in the next lemma, in a form convenient for our purposes, along with some additional bounds on the remainder of the first order Taylor expansion $S_f(A;H)=f(A+H)-f(A)-Df(A;H)$ for $f$ in proper Besov spaces.

Lemma 3. If $f\in B^1_{\infty,1}(\mathbb R)$, then, for all $A,H\in\mathcal B_{sa}(\mathbb H)$,
$$\|f(A+H)-f(A)\|\le 2\|f\|_{B^1_{\infty,1}(\mathbb R)}\|H\|.\tag{2.13}$$
Moreover, the function $\mathcal B_{sa}(\mathbb H)\ni A\mapsto f(A)\in\mathcal B_{sa}(\mathbb H)$ is Fréchet differentiable with respect to the operator norm with derivative given by the following series (which converges in the operator norm):
$$Df(A;H)=\sum_{n\ge 0}Df_n(A;H).\tag{2.14}$$
If $f\in B^s_{\infty,1}(\mathbb R)$ for some $s\in[1,2]$, then, for all $A,H,H'\in\mathcal B_{sa}(\mathbb H)$,
$$\|S_f(A;H)\|\le 4\|f\|_{B^s_{\infty,1}}\|H\|^s\tag{2.15}$$
and
$$\|S_f(A;H')-S_f(A;H)\|\le 4\|f\|_{B^s_{\infty,1}}(\delta(H,H'))^{s-1}\|H'-H\|.\tag{2.16}$$

Proof.
Recall that, for $f\in B^1_{\infty,1}(\mathbb R)$, the series $\sum_{n\ge 0}f_n$ converges uniformly on $\mathbb R$ to the function $f$. Since $A$, $A+H$ and $A+H'$ are bounded self-adjoint operators, we also get
$$\sum_{n\ge 0}f_n(A)=f(A),\quad\sum_{n\ge 0}f_n(A+H)=f(A+H),\quad\sum_{n\ge 0}f_n(A+H')=f(A+H')\tag{2.17}$$
with convergence of the series in the operator norm.

To prove bound (2.13), observe that
$$\|f(A+H)-f(A)\|=\bigg\|\sum_{n\ge 0}[f_n(A+H)-f_n(A)]\bigg\|\le\sum_{n\ge 0}\|f_n(A+H)-f_n(A)\|\le\sum_{n\ge 0}2^{n+1}\|f_n\|_{L_\infty(\mathbb R)}\|H\|=2\|f\|_{B^1_{\infty,1}}\|H\|,$$
where we used (2.3) and the fact that $f_n\in\mathcal E_{2^{n+1}}\cap L_\infty(\mathbb R)$. (Peller, in fact, used modified homogeneous Besov classes instead of the inhomogeneous Besov spaces we use in this paper.)

By bound (2.4),
$$\sum_{n\ge 0}\|Df_n(A;H)\|\le\sum_{n\ge 0}2^{n+1}\|f_n\|_{L_\infty(\mathbb R)}\|H\|=2\|f\|_{B^1_{\infty,1}}\|H\|<\infty,$$
implying the convergence in the operator norm of the series $\sum_{n\ge 0}Df_n(A;H)$. We define
$$Df(A;H):=\sum_{n\ge 0}Df_n(A;H)\tag{2.18}$$
and prove that this yields the Fréchet derivative of $f(A)$. To this end, note that (2.17) and (2.18) imply that
$$S_f(A;H)=\sum_{n\ge 0}[f_n(A+H)-f_n(A)-Df_n(A;H)]=\sum_{n\ge 0}S_{f_n}(A;H).\tag{2.19}$$
As a consequence,
$$\|S_f(A;H)\|\le\sum_{n\le N}\|S_{f_n}(A;H)\|+\sum_{n>N}\|f_n(A+H)-f_n(A)\|+\sum_{n>N}\|Df_n(A;H)\|\le\sum_{n\le N}2^{2n+1}\|f_n\|_{L_\infty(\mathbb R)}\|H\|^2+2\sum_{n>N}2^{n+1}\|f_n\|_{L_\infty(\mathbb R)}\|H\|,$$
where we used bounds (2.3), (2.4) and (2.5). Given $\varepsilon>0$, take $N$ so that $2\sum_{n>N}2^{n+1}\|f_n\|_{L_\infty}\le\varepsilon/2$ and suppose $H$ satisfies
$$\|H\|\le\frac{\varepsilon}{2\sum_{n\le N}2^{2n+1}\|f_n\|_{L_\infty(\mathbb R)}}.$$
This implies that $\|S_f(A;H)\|\le\varepsilon\|H\|$, and Fréchet differentiability of $f(A)$ with derivative $Df(A;H)$ follows.

To prove (2.16), use (2.6) and (2.12) to get
$$\|S_{f_n}(A;H')-S_{f_n}(A;H)\|\le 2^{2(n+1)}\|f_n\|_{L_\infty(\mathbb R)}\delta(H,H')\|H'-H\|\wedge 2^{n+2}\|f_n\|_{L_\infty(\mathbb R)}\|H'-H\|=2^{n+2}\|f_n\|_{L_\infty(\mathbb R)}\big(2^n\delta(H,H')\wedge 1\big)\|H'-H\|.$$
Therefore,
$$\|S_f(A;H')-S_f(A;H)\|\le\sum_{n\ge 0}\|S_{f_n}(A;H')-S_{f_n}(A;H)\|\le\sum_{n\ge 0}2^{n+2}\|f_n\|_{L_\infty(\mathbb R)}\big(2^n\delta(H,H')\wedge 1\big)\|H'-H\|$$
$$=4\bigg(\sum_{2^n\le 1/\delta(H,H')}2^{2n}\|f_n\|_{L_\infty(\mathbb R)}\delta(H,H')+\sum_{2^n>1/\delta(H,H')}2^{n}\|f_n\|_{L_\infty(\mathbb R)}\bigg)\|H'-H\|$$
$$\le 4\bigg(\sum_{2^n\le 1/\delta(H,H')}2^{ns}\|f_n\|_{L_\infty(\mathbb R)}(\delta(H,H'))^{s-1}+\sum_{2^n>1/\delta(H,H')}2^{ns}\|f_n\|_{L_\infty(\mathbb R)}(\delta(H,H'))^{s-1}\bigg)\|H'-H\|$$
$$=4\|f\|_{B^s_{\infty,1}}(\delta(H,H'))^{s-1}\|H'-H\|,$$
which yields (2.16). Bound (2.15) follows from (2.16) with $H'=0$.

Suppose $A\in\mathcal B_{sa}(\mathbb H)$ is a compact operator with spectral representation $A=\sum_{\lambda\in\sigma(A)}\lambda P_\lambda$, where $P_\lambda$ denotes the spectral projection corresponding to the eigenvalue $\lambda$. The following formula for the derivative $Df(A;H)$, $f\in B^1_{\infty,1}(\mathbb R)$, is well known (see [Bh], Theorem V.3.3, for a finite-dimensional version):
$$Df(A;H)=\sum_{\lambda,\mu\in\sigma(A)}f^{[1]}(\lambda,\mu)P_\lambda HP_\mu,\tag{2.20}$$
where $f^{[1]}(\lambda,\mu):=\frac{f(\lambda)-f(\mu)}{\lambda-\mu}$ for $\lambda\ne\mu$ and $f^{[1]}(\lambda,\mu):=f'(\lambda)$ for $\lambda=\mu$. In other words, the operator $Df(A;H)$ can be represented in the basis of eigenvectors of $A$ as the Schur product of the Loewner matrix $(f^{[1]}(\lambda,\mu))_{\lambda,\mu\in\sigma(A)}$ and the matrix of the operator $H$ in this basis. We will need this formula only in the case of discrete spectrum, but there are also extensions to more general operators $A$ with continuous spectrum (with the sums being replaced by double operator integrals); see Aleksandrov and Peller [AP2], Theorems 3.5.11 and 1.6.4.

Finally, we need some extensions of the results stated above to higher order derivatives (see [Skr], [ACDS], [KS] and references therein for a number of subtle results in this direction). If $g:\mathcal B_{sa}(\mathbb H)\mapsto\mathcal B_{sa}(\mathbb H)$ is a $k$ times Fréchet differentiable function, its $k$-th derivative $D^kg(A)$, $A\in\mathcal B_{sa}(\mathbb H)$, can be viewed as a symmetric multilinear operator valued form
$$D^kg(A)(H_1,\dots,H_k)=D^kg(A;H_1,\dots,H_k),\quad H_1,\dots,H_k\in\mathcal B_{sa}(\mathbb H).$$
Given such a form $M:\mathcal B_{sa}(\mathbb H)\times\dots\times\mathcal B_{sa}(\mathbb H)\mapsto\mathcal B_{sa}(\mathbb H)$, define its operator norm as
$$\|M\|:=\sup_{\|H_1\|,\dots,\|H_k\|\le 1}\|M(H_1,\dots,H_k)\|.$$
The derivatives $D^kg(A)$ are defined iteratively:
$$D^kg(A)(H_1,\dots,H_{k-1},H_k)=D\big(D^{k-1}g(A)(H_1,\dots,H_{k-1})\big)(H_k).$$
For $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$, the $k$-th derivative $D^kf(A)$ is given by the following formula:
$$D^kf(A;H_1,\dots,H_k)=\frac{(-1)^{k+1}}{2\pi i}\sum_{\pi\in S_k}\oint_\gamma f(z)R_A(z)H_{\pi(1)}R_A(z)H_{\pi(2)}\dots R_A(z)H_{\pi(k)}R_A(z)\,dz,\quad H_1,\dots,H_k\in\mathcal B_{sa}(\mathbb H),$$
where $\gamma\subset\mathbb C$ is a contour surrounding $\sigma(A)$ with counterclockwise orientation.

The following lemmas hold.

Lemma 4.
Let $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$. Then, for all $k\ge 1$,
$$\|D^kf(A)\|\le\sigma^k\|f\|_{L_\infty(\mathbb R)},\quad A\in\mathcal B_{sa}(\mathbb H).\tag{2.21}$$

Proof.
Given $A,H_1,\dots,H_k\in\mathcal B_{sa}(\mathbb H)$, denote
$$F(z_1,\dots,z_k)=f(A+z_1H_1+\dots+z_kH_k),\quad(z_1,\dots,z_k)\in\mathbb C^k.$$
Then $F$ is an entire operator valued function of exponential type $(\sigma\|H_1\|,\dots,\sigma\|H_k\|)$:
$$\|F(z_1,\dots,z_k)\|\le\sup_{|\zeta|\le\|A+z_1H_1+\dots+z_kH_k\|}|f(\zeta)|\le\|f\|_{L_\infty(\mathbb R)}e^{\sigma\|A\|}\exp\big\{\sigma\|H_1\||z_1|+\dots+\sigma\|H_k\||z_k|\big\}.$$
By Bernstein's inequality (2.1) (extended to Banach space valued functions as was done at the beginning of the proof of Lemma 2), we get
$$\bigg\|\frac{\partial^kF(x_1,\dots,x_k)}{\partial x_1\dots\partial x_k}\bigg\|\le\sigma^k\|H_1\|\dots\|H_k\|\sup_{x_1,\dots,x_k\in\mathbb R}\|F(x_1,\dots,x_k)\|.$$
Since $\sup_{x_1,\dots,x_k\in\mathbb R}\|F(x_1,\dots,x_k)\|\le\|f\|_{L_\infty(\mathbb R)}$ (the operators $A+x_1H_1+\dots+x_kH_k$ being self-adjoint), this yields
$$\|D^kf(A+x_1H_1+\dots+x_kH_k)(H_1,\dots,H_k)\|\le\sigma^k\|H_1\|\dots\|H_k\|\|f\|_{L_\infty(\mathbb R)}.$$
For $x_1=\dots=x_k=0$, we get
$$\|D^kf(A)(H_1,\dots,H_k)\|\le\sigma^k\|H_1\|\dots\|H_k\|\|f\|_{L_\infty(\mathbb R)},$$
implying the claim of the lemma.

Lemma 5.
Let $f\in\mathcal E_\sigma\cap L_\infty(\mathbb R)$. Then, for all $k\ge 1$ and all $A,H_1,\dots,H_k,H\in\mathcal B_{sa}(\mathbb H)$,
$$\|D^kf(A+H;H_1,\dots,H_k)-D^kf(A;H_1,\dots,H_k)\|\le\sigma^{k+1}\|f\|_{L_\infty(\mathbb R)}\|H_1\|\dots\|H_k\|\|H\|\tag{2.22}$$
and
$$\|S_{D^kf(\cdot;H_1,\dots,H_k)}(A;H)\|\le\frac{\sigma^{k+2}}{2}\|f\|_{L_\infty(\mathbb R)}\|H_1\|\dots\|H_k\|\|H\|^2.\tag{2.23}$$

Proof.
Bound (2.22) easily follows from (2.21) (applied to the derivative $D^{k+1}f$). The proof of bound (2.23) relies on Bernstein's inequality (2.1) and on a slight modification of the proof of bound (2.5).

Lemma 6.
Suppose $f\in B^k_{\infty,1}(\mathbb R)$. Then the function $\mathcal B_{sa}(\mathbb H)\ni A\mapsto f(A)\in\mathcal B_{sa}(\mathbb H)$ is $k$ times Fréchet differentiable and
$$\|D^jf(A)\|\le 2^j\|f\|_{B^j_{\infty,1}},\quad A\in\mathcal B_{sa}(\mathbb H),\ j=1,\dots,k.\tag{2.24}$$
Moreover, if, for some $s\in(k,k+1]$, $f\in B^s_{\infty,1}(\mathbb R)$, then
$$\|D^kf(A+H)-D^kf(A)\|\le 2^{k+1}\|f\|_{B^s_{\infty,1}}\|H\|^{s-k},\quad A,H\in\mathcal B_{sa}(\mathbb H).\tag{2.25}$$

Proof.
As in the proof of Lemma 3, we use the Littlewood–Paley decomposition of $f$. Since, by (2.21), for all $j=1,\dots,k$,
$$\sum_{n\ge 0}\|D^jf_n(A;H_1,\dots,H_j)\|\le\sum_{n\ge 0}2^{(n+1)j}\|f_n\|_{L_\infty(\mathbb R)}\|H_1\|\dots\|H_j\|\le 2^j\|f\|_{B^j_{\infty,1}}\|H_1\|\dots\|H_j\|<+\infty,\tag{2.26}$$
the series $\sum_{n\ge 0}D^jf_n(A;H_1,\dots,H_j)$ converges in the operator norm and we can define symmetric $j$-linear forms
$$D^jf(A;H_1,\dots,H_j):=\sum_{n\ge 0}D^jf_n(A;H_1,\dots,H_j),\quad j=1,\dots,k.$$
By the same argument as in the proof of claim (2.14) of Lemma 3, and using bounds (2.22) and (2.23), we can now prove by induction that $D^jf(A;H_1,\dots,H_j)$, $j=1,\dots,k$, are the consecutive derivatives of $f(A)$. Indeed, for $j=1$, this was already proved in Lemma 3. Assuming that it is true for some $j<k$, we have to prove that it is also true for $j+1$. To this end, note that
$$\|D^jf(A+H;H_1,\dots,H_j)-D^jf(A;H_1,\dots,H_j)-D^{j+1}f(A;H_1,\dots,H_j,H)\|$$
$$\le\sum_{n\le N}\|S_{D^jf_n(\cdot;H_1,\dots,H_j)}(A;H)\|+\sum_{n>N}\|D^jf_n(A+H;H_1,\dots,H_j)-D^jf_n(A;H_1,\dots,H_j)\|+\sum_{n>N}\|D^{j+1}f_n(A;H_1,\dots,H_j,H)\|$$
$$\le\sum_{n\le N}2^{(j+2)(n+1)-1}\|f_n\|_{L_\infty(\mathbb R)}\|H_1\|\dots\|H_j\|\|H\|^2+2\sum_{n>N}2^{(j+1)(n+1)}\|f_n\|_{L_\infty(\mathbb R)}\|H_1\|\dots\|H_j\|\|H\|.$$
Given $\varepsilon>0$, take $N$ so that $2\sum_{n>N}2^{(j+1)(n+1)}\|f_n\|_{L_\infty}\le\varepsilon/2$, which is possible for $f\in B^{j+1}_{\infty,1}(\mathbb R)$, and suppose $H$ satisfies
$$\|H\|\le\frac{\varepsilon}{2\sum_{n\le N}2^{(j+2)(n+1)-1}\|f_n\|_{L_\infty(\mathbb R)}}.$$
Then we have
$$\|D^jf(A+H;H_1,\dots,H_j)-D^jf(A;H_1,\dots,H_j)-D^{j+1}f(A;H_1,\dots,H_j,H)\|\le\varepsilon\|H_1\|\dots\|H_j\|\|H\|.$$
Therefore, the function $A\mapsto D^jf(A;H_1,\dots,H_j)$ is Fréchet differentiable with derivative $D^{j+1}f(A;H_1,\dots,H_j,H)$. Bounds (2.24) now follow from (2.26).

To prove (2.25), note that
$$\|D^kf(A+H)(H_1,\dots,H_k)-D^kf(A)(H_1,\dots,H_k)\|\le\sum_{n\ge 0}\|D^kf_n(A+H)(H_1,\dots,H_k)-D^kf_n(A)(H_1,\dots,H_k)\|.$$
Using bounds (2.21) and (2.22), we get
$$\|D^kf(A+H)(H_1,\dots,H_k)-D^kf(A)(H_1,\dots,H_k)\|\le\sum_{2^n\le 1/\|H\|}2^{(n+1)(k+1)}\|f_n\|_{L_\infty(\mathbb R)}\|H\|\|H_1\|\dots\|H_k\|+2\sum_{2^n>1/\|H\|}2^{(n+1)k}\|f_n\|_{L_\infty(\mathbb R)}\|H_1\|\dots\|H_k\|$$
$$\le 2^{k+1}\|H_1\|\dots\|H_k\|\bigg[\sum_{2^n\le 1/\|H\|}2^{ns}\|f_n\|_{L_\infty(\mathbb R)}2^{n(k+1-s)}\|H\|+\sum_{2^n>1/\|H\|}2^{ns}\|f_n\|_{L_\infty(\mathbb R)}2^{n(k-s)}\bigg]$$
$$\le 2^{k+1}\|H_1\|\dots\|H_k\|\|H\|^{s-k}\bigg[\sum_{2^n\le 1/\|H\|}2^{ns}\|f_n\|_{L_\infty(\mathbb R)}+\sum_{2^n>1/\|H\|}2^{ns}\|f_n\|_{L_\infty(\mathbb R)}\bigg]=2^{k+1}\|f\|_{B^s_{\infty,1}}\|H\|^{s-k}\|H_1\|\dots\|H_k\|,$$
which implies (2.25).

In what follows, we use the definition of Hölder space norms of functions of bounded self-adjoint operators. For an open set $G\subset\mathcal B_{sa}(\mathbb H)$ and a $k$-times Fréchet differentiable function $g:G\mapsto\mathcal B_{sa}(\mathbb H)$, for $s=k+\beta$, $\beta\in(0,1]$, define
$$\|g\|_{C^s(G)}:=\max_{0\le j\le k}\sup_{A\in G}\|D^jg(A)\|\vee\sup_{A,A+H\in G,H\ne 0}\frac{\|D^kg(A+H)-D^kg(A)\|}{\|H\|^\beta}.\tag{2.27}$$
A similar definition applies to $k$-times Fréchet differentiable functions $g:G\mapsto\mathbb R$ (with $\|D^jg(A)\|$ being the operator norm of a $j$-linear form). In both cases, $C^s(G)$ denotes the space of functions $g$ on $G$ (operator valued or real valued) with $\|g\|_{C^s(G)}<\infty$. In particular, these norms apply to operator functions $\mathcal B_{sa}(\mathbb H)\ni A\mapsto f(A)\in\mathcal B_{sa}(\mathbb H)$, where $f$ is a function on the real line. With a little abuse of notation, we write the norm of such operator functions as $\|f\|_{C^s(\mathcal B_{sa}(\mathbb H))}$. The next result immediately follows from Lemma 6.
Corollary 2.
Suppose that, for some $k\ge 1$ and $s\in(k,k+1]$, we have $f\in B^s_{\infty,1}(\mathbb R)$. Then
$$\|f\|_{C^s(\mathcal B_{sa}(\mathbb H))}\le 2^{k+1}\|f\|_{B^s_{\infty,1}}.$$

Let $g:\mathcal B_{sa}(\mathbb H)\mapsto\mathbb R$ be a Fréchet differentiable function with respect to the operator norm with derivative $Dg(A;H)$, $H\in\mathcal B_{sa}(\mathbb H)$. Note that $Dg(A;H)$, $H\in\mathcal B_{sa}(\mathbb H)$, is a bounded linear functional on $\mathcal B_{sa}(\mathbb H)$ and its restriction to the subspace $\mathcal C_{sa}(\mathbb H)\subset\mathcal B_{sa}(\mathbb H)$ of compact self-adjoint operators in $\mathbb H$ can be represented as $Dg(A;H)=\langle Dg(A),H\rangle$, $H\in\mathcal C_{sa}(\mathbb H)$, where $Dg(A)\in\mathcal S_1$ is a trace class operator in $\mathbb H$. Let $S_g(A;H)$ be the remainder of the first order Taylor expansion of $g$:
$$S_g(A;H):=g(A+H)-g(A)-Dg(A;H),\quad A,H\in\mathcal B_{sa}(\mathbb H).$$
Our goal is to obtain concentration inequalities for the random variable $S_g(\Sigma;\hat\Sigma-\Sigma)$ around its expectation. This will be done under the following assumption on the remainder $S_g(A;H)$.

Assumption 2.
Let $s\in[1,2]$. Assume that there exists a constant $L_{g,s}>0$ such that, for all $\Sigma\in\mathcal C_+(\mathbb H)$ and all $H,H'\in\mathcal B_{sa}(\mathbb H)$,
$$|S_g(\Sigma;H')-S_g(\Sigma;H)|\le L_{g,s}(\|H\|\vee\|H'\|)^{s-1}\|H'-H\|.$$
Note that Assumption 2 implies (for $H'=0$) that $|S_g(\Sigma;H)|\le L_{g,s}\|H\|^s$, $\Sigma\in\mathcal C_+(\mathbb H)$, $H\in\mathcal B_{sa}(\mathbb H)$.

Theorem 5.
Suppose Assumption 2 holds for some $s\in(1,2]$. Then there exists a constant $K_s>0$ such that, for all $t\ge 1$, with probability at least $1-e^{-t}$,
$$|S_g(\Sigma;\hat\Sigma-\Sigma)-\mathbb E S_g(\Sigma;\hat\Sigma-\Sigma)|\tag{3.1}$$
$$\le K_sL_{g,s}\|\Sigma\|^s\bigg(\Big(\frac{\mathbf r(\Sigma)}{n}\Big)^{(s-1)/2}\vee\Big(\frac{\mathbf r(\Sigma)}{n}\Big)^{s-1/2}\vee\Big(\frac{t}{n}\Big)^{(s-1)/2}\vee\Big(\frac{t}{n}\Big)^{s-1/2}\bigg)\sqrt{\frac{t}{n}}.$$

Proof.
Let $\varphi:\mathbb R\mapsto\mathbb R$ be such that $\varphi(u)=1$, $u\le 1$, $\varphi(u)=0$, $u\ge 2$, and $\varphi(u)=2-u$, $u\in(1,2)$. Denote $E:=\hat\Sigma-\Sigma$ and, given $\delta>0$, define
$$h(X_1,\dots,X_n):=S_g(\Sigma;E)\varphi\Big(\frac{\|E\|}{\delta}\Big).\tag{3.2}$$
We start with deriving a concentration bound for the function $h(X_1,\dots,X_n)$ of the Gaussian random variables $X_1,\dots,X_n$. To this end, we will show that $h(X_1,\dots,X_n)$ satisfies a Lipschitz condition. With a minor abuse of notation, we will assume for a while that $X_1,\dots,X_n$ are non-random points of $\mathbb H$, and let $X_1',\dots,X_n'$ be another set of such points. Denote by $\hat\Sigma':=n^{-1}\sum_{j=1}^nX_j'\otimes X_j'$ the sample covariance based on $X_1',\dots,X_n'$ and let $E':=\hat\Sigma'-\Sigma$. The following lemma establishes a Lipschitz condition for $h$.

Lemma 7.
Suppose Assumption 2 holds with some $s\in(1,2]$. Then, for an arbitrary $\delta>0$ and $h$ defined by (3.2), the following bound holds with some constant $C_s>0$ for all $X_1,\dots,X_n,X_1',\dots,X_n'\in\mathbb H$:
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|\le C_sL_{g,s}(\|\Sigma\|^{1/2}+\sqrt\delta)\delta^{s-1}\frac{1}{\sqrt n}\bigg(\sum_{j=1}^n\|X_j-X_j'\|^2\bigg)^{1/2}.\tag{3.3}$$

Proof.
Using the fact that $\varphi$ takes values in $[0,1]$ and is a Lipschitz function with constant $1$, and taking into account Assumption 2, we get
$$|h(X_1,\dots,X_n)|\le|S_g(\Sigma;E)|I(\|E\|\le 2\delta)\le L_{g,s}\|E\|^sI(\|E\|\le 2\delta)\le 2^sL_{g,s}\delta^s\tag{3.4}$$
and, similarly,
$$|h(X_1',\dots,X_n')|\le 2^sL_{g,s}\delta^s.\tag{3.5}$$
We also have
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|\le|S_g(\Sigma;E)-S_g(\Sigma;E')|+\frac{1}{\delta}|S_g(\Sigma;E')|\|E-E'\|\le L_{g,s}(\|E\|\vee\|E'\|)^{s-1}\|E'-E\|+\frac{L_{g,s}}{\delta}\|E'\|^s\|E'-E\|.\tag{3.6}$$
If both $\|E\|\le 2\delta$ and $\|E'\|\le 2\delta$, then (3.6) implies
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|\le(2^{s-1}+2^s)L_{g,s}\delta^{s-1}\|E'-E\|.\tag{3.7}$$
If both $\|E\|>2\delta$ and $\|E'\|>2\delta$, then $\varphi\big(\frac{\|E\|}{\delta}\big)=\varphi\big(\frac{\|E'\|}{\delta}\big)=0$, implying that $h(X_1,\dots,X_n)=h(X_1',\dots,X_n')=0$. If $\|E\|\le 2\delta$, $\|E'\|>2\delta$ and $\|E'-E\|>\delta$, then
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|=|h(X_1,\dots,X_n)|\le 2^sL_{g,s}\delta^s\le 2^sL_{g,s}\delta^{s-1}\|E'-E\|.$$
If $\|E\|\le 2\delta$, $\|E'\|>2\delta$ and $\|E'-E\|\le\delta$, then $\|E'\|\le 3\delta$ and, similarly to (3.7), we get
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|\le(3^{s-1}+3^s)L_{g,s}\delta^{s-1}\|E'-E\|.\tag{3.8}$$
By these simple considerations, bound (3.8) holds in all possible cases. This fact, along with (3.4) and (3.5), yields
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|\le(3^{s-1}+3^s)L_{g,s}\delta^{s-1}(\|E'-E\|\wedge\delta).\tag{3.9}$$
We now obtain an upper bound on $\|E'-E\|$.
We have
$$\|E'-E\|=\bigg\|n^{-1}\sum_{j=1}^nX_j\otimes X_j-n^{-1}\sum_{j=1}^nX_j'\otimes X_j'\bigg\|\le\bigg\|n^{-1}\sum_{j=1}^n(X_j-X_j')\otimes X_j\bigg\|+\bigg\|n^{-1}\sum_{j=1}^nX_j'\otimes(X_j-X_j')\bigg\|$$
$$=\sup_{\|u\|,\|v\|\le 1}\bigg|n^{-1}\sum_{j=1}^n\langle X_j-X_j',u\rangle\langle X_j,v\rangle\bigg|+\sup_{\|u\|,\|v\|\le 1}\bigg|n^{-1}\sum_{j=1}^n\langle X_j',u\rangle\langle X_j-X_j',v\rangle\bigg|$$
$$\le\sup_{\|u\|\le 1}\bigg(n^{-1}\sum_{j=1}^n\langle X_j-X_j',u\rangle^2\bigg)^{1/2}\sup_{\|v\|\le 1}\bigg(n^{-1}\sum_{j=1}^n\langle X_j,v\rangle^2\bigg)^{1/2}+\sup_{\|u\|\le 1}\bigg(n^{-1}\sum_{j=1}^n\langle X_j',u\rangle^2\bigg)^{1/2}\sup_{\|v\|\le 1}\bigg(n^{-1}\sum_{j=1}^n\langle X_j-X_j',v\rangle^2\bigg)^{1/2}$$
$$\le\big(\|\hat\Sigma\|^{1/2}+\|\hat\Sigma'\|^{1/2}\big)\frac{1}{\sqrt n}\bigg(\sum_{j=1}^n\|X_j-X_j'\|^2\bigg)^{1/2}\le\big(2\|\Sigma\|^{1/2}+\|E\|^{1/2}+\|E'\|^{1/2}\big)\Delta,$$
where $\Delta:=\frac{1}{\sqrt n}\big(\sum_{j=1}^n\|X_j-X_j'\|^2\big)^{1/2}$. Without loss of generality, assume that $\|E\|\le 2\delta$ (again, if both $\|E\|>2\delta$ and $\|E'\|>2\delta$, then $h(X_1,\dots,X_n)=h(X_1',\dots,X_n')=0$ and inequality (3.3) trivially holds). Then we have
$$\|E'-E\|\le\big(2\|\Sigma\|^{1/2}+2\sqrt{2\delta}+\|E'-E\|^{1/2}\big)\Delta.$$
If $\|E'-E\|\le\delta$, the last bound implies that
$$\|E'-E\|\le\big(2\|\Sigma\|^{1/2}+(2\sqrt 2+1)\sqrt\delta\big)\Delta\le 4\|\Sigma\|^{1/2}\Delta\vee(4\sqrt 2+2)\sqrt\delta\,\Delta.$$
Otherwise, if $\|E'-E\|>\delta$, we get
$$\|E'-E\|\le 4\|\Sigma\|^{1/2}\Delta\vee(4\sqrt 2+2)\|E'-E\|^{1/2}\Delta,$$
which yields
$$\|E'-E\|\le 4\|\Sigma\|^{1/2}\Delta\vee(4\sqrt 2+2)^2\Delta^2.$$
Thus, either $\|E'-E\|\le 4\|\Sigma\|^{1/2}\Delta$, or $\|E'-E\|\le(4\sqrt 2+2)^2\Delta^2$. In the last case, we also have (since $\delta<\|E'-E\|$)
$$\delta<\sqrt\delta\,\|E'-E\|^{1/2}\le(4\sqrt 2+2)\sqrt\delta\,\Delta.$$
This shows that
$$\|E'-E\|\wedge\delta\le 4\|\Sigma\|^{1/2}\Delta\vee(4\sqrt 2+2)\sqrt\delta\,\Delta\tag{3.10}$$
both when $\|E'-E\|\le\delta$ and when $\|E'-E\|>\delta$. Substituting bound (3.10) into (3.9) yields
$$|h(X_1,\dots,X_n)-h(X_1',\dots,X_n')|\le(3^{s-1}+3^s)L_{g,s}\big(4\|\Sigma\|^{1/2}+(4\sqrt 2+2)\sqrt\delta\big)\delta^{s-1}\frac{1}{\sqrt n}\bigg(\sum_{j=1}^n\|X_j-X_j'\|^2\bigg)^{1/2},$$
which implies (3.3).

In what follows, we set, for a given $t\ge 1$,
$$\delta=\delta_n(t):=\mathbb E\|\hat\Sigma-\Sigma\|+C\|\Sigma\|\bigg[\bigg(\sqrt{\frac{\mathbf r(\Sigma)}{n}}\vee 1\bigg)\sqrt{\frac tn}\vee\frac tn\bigg].$$
It follows from (1.3) that there exists a choice of absolute constant $C>0$ such that
$$\mathbb P\{\|\hat\Sigma-\Sigma\|\ge\delta_n(t)\}\le e^{-t},\quad t\ge 1.\tag{3.11}$$
Assuming that $t\ge\log 4$, we get that $\mathbb P\{\|E\|\ge\delta\}\le 1/4$. Let $M:=\mathrm{Med}(S_g(\Sigma;E))$ be a median of the random variable $S_g(\Sigma;E)$. Then
$$\mathbb P\{h(X_1,\dots,X_n)\ge M\}\ge\mathbb P\{h(X_1,\dots,X_n)\ge M,\|E\|<\delta\}\ge\mathbb P\{S_g(\Sigma;E)\ge M,\|E\|<\delta\}\ge 1/2-\mathbb P\{\|E\|\ge\delta\}\ge 1/4.$$
Similarly, $\mathbb P\{h(X_1,\dots,X_n)\le M\}\ge 1/4$. In view of the Lipschitz property of $h$ (Lemma 7), we now use a relatively standard argument (see Lemma 2 in [KL2] and its applications later in Section 3 of that paper) based on the Gaussian isoperimetric inequality (see Ledoux [Led], Theorem 2.5 and inequality (2.9)) to conclude that, with probability at least $1-e^{-t}$,
$$|h(X_1,\dots,X_n)-M|\lesssim_s L_{g,s}\delta^{s-1}(\|\Sigma\|^{1/2}+\delta^{1/2})\|\Sigma\|^{1/2}\sqrt{\frac tn}.$$
Since $S_g(\Sigma;E)=h(X_1,\dots,X_n)$ on the event $\{\|E\|<\delta\}$ of probability at least $1-e^{-t}$, we get that, with probability at least $1-2e^{-t}$,
$$|S_g(\Sigma;E)-M|\lesssim_s L_{g,s}\delta^{s-1}(\|\Sigma\|^{1/2}+\delta^{1/2})\|\Sigma\|^{1/2}\sqrt{\frac tn}.\tag{3.12}$$
It follows from (1.1) that
$$\delta=\delta_n(t)\lesssim\|\Sigma\|\bigg(\sqrt{\frac{\mathbf r(\Sigma)}{n}}\vee\frac{\mathbf r(\Sigma)}{n}\vee\sqrt{\frac tn}\vee\frac tn\bigg).\tag{3.13}$$
Substituting (3.13) into (3.12) easily yields that, with probability at least $1-2e^{-t}$,
$$|S_g(\Sigma;E)-M|\lesssim_sL_{g,s}\|\Sigma\|^s\bigg(\Big(\frac{\mathbf r(\Sigma)}{n}\Big)^{(s-1)/2}\vee\Big(\frac{\mathbf r(\Sigma)}{n}\Big)^{s-1/2}\vee\Big(\frac tn\Big)^{(s-1)/2}\vee\Big(\frac tn\Big)^{s-1/2}\bigg)\sqrt{\frac tn},\tag{3.14}$$
and, moreover, by adjusting the value of the constant in inequality (3.14), the probability bound can be replaced by $1-e^{-t}$. By integrating out the tails of the probability bound (3.14), one can get that
$$|\mathbb E S_g(\Sigma;E)-M|\le\mathbb E|S_g(\Sigma;E)-M|\lesssim_sL_{g,s}\|\Sigma\|^s\bigg(\Big(\frac{\mathbf r(\Sigma)}{n}\Big)^{(s-1)/2}\vee\Big(\frac{\mathbf r(\Sigma)}{n}\Big)^{s-1/2}\vee\Big(\frac 1n\Big)^{(s-1)/2}\bigg)\sqrt{\frac 1n}.\tag{3.15}$$
Combining (3.14) and (3.15) implies that, for all $t\ge 1$, with probability at least $1-e^{-t}$,
$$|S_g(\Sigma;E)-\mathbb E S_g(\Sigma;E)|\tag{3.16}$$
$\lesssim_s L_{g,s}\|\Sigma\|^s\Bigl(\Bigl(\frac{r(\Sigma)}{n}\Bigr)^{(s-1)/2} \vee \Bigl(\frac{r(\Sigma)}{n}\Bigr)^{s-1/2} \vee \Bigl(\frac{t}{n}\Bigr)^{(s-1)/2} \vee \Bigl(\frac{t}{n}\Bigr)^{s-1/2}\Bigr)\sqrt{\frac{t}{n}},$
which completes the proof.

Example 1.
Consider the following functional
$$g(A) := \langle f(A), B\rangle = \mathrm{tr}(f(A)B^{*}), \quad A \in \mathcal B_{sa}(\mathbb H),$$
where $f$ is a given smooth function and $B \in \mathcal S_1$ is a given nuclear operator.

Corollary 3. If $f \in B^s_{\infty,1}(\mathbb R)$ for some $s \in (1,2]$, then with probability at least $1 - e^{-t}$ the following concentration inequality holds for the functional $g$:
$$|S_g(\Sigma; \hat\Sigma-\Sigma) - \mathbb E S_g(\Sigma; \hat\Sigma-\Sigma)| \lesssim_s \|f\|_{B^s_{\infty,1}}\|B\|_1\|\Sigma\|^s\Bigl(\Bigl(\frac{r(\Sigma)}{n}\Bigr)^{(s-1)/2} \vee \Bigl(\frac{r(\Sigma)}{n}\Bigr)^{s-1/2} \vee \Bigl(\frac{t}{n}\Bigr)^{(s-1)/2} \vee \Bigl(\frac{t}{n}\Bigr)^{s-1/2}\Bigr)\sqrt{\frac{t}{n}}. \quad (3.17)$$

Proof.
It easily follows from Lemma 3 that Assumption 2 is satisfied for $s \in [1,2]$ with $L_{g,s} = 2^{s+1}\|f\|_{B^s_{\infty,1}}\|B\|_1$. Therefore, Theorem 5 implies bound (3.17).

In what follows, we need a more general version of the bound of Theorem 5 (under somewhat more general conditions than Assumption 2).
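The effective rank $r(\Sigma) = \mathrm{tr}(\Sigma)/\|\Sigma\|$ driving these bounds is easy to probe numerically. The sketch below (an illustration with arbitrary hypothetical parameters, not part of the paper's argument) checks that the mean operator-norm error of the sample covariance tracks $\|\Sigma\|\sqrt{r(\Sigma)/n}$ when $r(\Sigma) = o(n)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def effective_rank(Sigma):
    # r(Sigma) = tr(Sigma) / ||Sigma|| (operator norm)
    return np.trace(Sigma) / np.linalg.norm(Sigma, 2)

def mean_op_norm_error(Sigma, n, reps=200):
    d = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)
    errs = []
    for _ in range(reps):
        X = rng.standard_normal((n, d)) @ L.T      # rows ~ N(0, Sigma)
        hat = X.T @ X / n                          # sample covariance
        errs.append(np.linalg.norm(hat - Sigma, 2))
    return float(np.mean(errs))

# A hypothetical covariance with slowly decaying spectrum, ||Sigma|| = 1.
d = 100
Sigma = np.diag(1.0 / np.arange(1, d + 1))
r = effective_rank(Sigma)                          # harmonic number, about 5.19
errs_by_n = {n: mean_op_norm_error(Sigma, n) for n in (100, 400, 1600)}
for n, err in errs_by_n.items():
    print(n, err, (r / n) ** 0.5)                  # err tracks sqrt(r(Sigma)/n)
```

The ratio of the observed error to $\sqrt{r(\Sigma)/n}$ stays of order one across sample sizes, in line with the role of $r(\Sigma)$ in (3.17).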
Assumption 3.
Assume that, for all $\Sigma \in \mathcal C_+(\mathbb H)$ and $H, H' \in \mathcal B_{sa}(\mathbb H)$,
$$|S_g(\Sigma; H') - S_g(\Sigma; H)| \le \eta(\Sigma; \|H\| \vee \|H'\|)\,\|H' - H\|,$$
where $\delta \mapsto \eta(\Sigma;\delta)$, $\delta > 0$, is a nondecreasing function of the following form:
$$\eta(\Sigma;\delta) := \eta_1(\Sigma)\delta^{\alpha_1} \vee \dots \vee \eta_m(\Sigma)\delta^{\alpha_m}$$
for given nonnegative functions $\eta_1, \dots, \eta_m$ on $\mathcal C_+(\mathbb H)$ and given positive numbers $\alpha_1, \dots, \alpha_m$.

The proof of the following result is a simple modification of the proof of Theorem 5.
Theorem 6.
Suppose Assumption 3 holds. Then, for all $t \ge 1$, with probability at least $1 - e^{-t}$,
$$|S_g(\Sigma;\hat\Sigma-\Sigma) - \mathbb E S_g(\Sigma;\hat\Sigma-\Sigma)| \lesssim \eta(\Sigma; \delta_n(\Sigma;t))\bigl(\sqrt{\|\Sigma\|} + \sqrt{\delta_n(\Sigma;t)}\bigr)\sqrt{\|\Sigma\|}\,\sqrt{\frac{t}{n}}, \quad (3.18)$$
where
$$\delta_n(\Sigma;t) := \|\Sigma\|\Bigl(\sqrt{\frac{r(\Sigma)}{n}} \vee \frac{r(\Sigma)}{n} \vee \sqrt{\frac{t}{n}} \vee \frac{t}{n}\Bigr). \quad (3.19)$$

Let $g: \mathcal B_{sa}(\mathbb H) \to \mathbb R$ be a Fréchet differentiable function with respect to the operator norm with derivative $Dg(A; H)$, $H \in \mathcal B_{sa}(\mathbb H)$. Recall that $Dg(A;H) = \langle Dg(A), H\rangle$, $H \in \mathcal C_{sa}(\mathbb H)$, where $Dg(A) \in \mathcal S_1$. Denote $\mathcal D g(\Sigma) := \Sigma^{1/2} Dg(\Sigma)\Sigma^{1/2}$. The following theorem is the main result of this section.
Theorem 7.
Suppose Assumption 2 holds for some $s \in (1,2]$ and also that $r(\Sigma) \le n$. Define
$$\gamma_s(g;\Sigma) := \log\Bigl(\frac{L_{g,s}\|\Sigma\|^s}{\|\mathcal D g(\Sigma)\|_2}\Bigr) \quad\text{and}\quad t_{n,s}(g;\Sigma) := \Bigl[-\gamma_s(g;\Sigma) + \frac{s-1}{2}\log\Bigl(\frac{n}{r(\Sigma)}\Bigr)\Bigr] \vee 1.$$
Then
$$\sup_{x\in\mathbb R}\Bigl|\mathbb P\Bigl\{\frac{n^{1/2}(g(\hat\Sigma) - \mathbb E g(\hat\Sigma))}{\sqrt 2\,\|\mathcal D g(\Sigma)\|_2} \le x\Bigr\} - \Phi(x)\Bigr| \lesssim_s \Bigl(\frac{\|\mathcal D g(\Sigma)\|_3}{\|\mathcal D g(\Sigma)\|_2}\Bigr)^3\frac{1}{\sqrt n} \quad (4.1)$$
$$+\ \frac{L_{g,s}\|\Sigma\|^s}{\|\mathcal D g(\Sigma)\|_2}\Bigl(\Bigl(\frac{r(\Sigma)}{n}\Bigr)^{(s-1)/2} \vee \Bigl(\frac{t_{n,s}(g;\Sigma)}{n}\Bigr)^{(s-1)/2} \vee \Bigl(\frac{t_{n,s}(g;\Sigma)}{n}\Bigr)^{s-1/2}\Bigr)\sqrt{t_{n,s}(g;\Sigma)}.$$

Proof.
Note that
$$g(\hat\Sigma) - g(\Sigma) = \langle Dg(\Sigma), \hat\Sigma - \Sigma\rangle + S_g(\Sigma; \hat\Sigma - \Sigma)$$
and, since $\mathbb E\langle Dg(\Sigma), \hat\Sigma - \Sigma\rangle = 0$,
$$\mathbb E g(\hat\Sigma) - g(\Sigma) = \mathbb E S_g(\Sigma; \hat\Sigma - \Sigma),$$
implying that
$$g(\hat\Sigma) - \mathbb E g(\hat\Sigma) = \langle Dg(\Sigma), \hat\Sigma - \Sigma\rangle + S_g(\Sigma; \hat\Sigma - \Sigma) - \mathbb E S_g(\Sigma; \hat\Sigma - \Sigma). \quad (4.2)$$
The linear term
$$\langle Dg(\Sigma), \hat\Sigma - \Sigma\rangle = n^{-1}\sum_{j=1}^n \bigl(\langle Dg(\Sigma) X_j, X_j\rangle - \mathbb E\langle Dg(\Sigma) X, X\rangle\bigr) \quad (4.3)$$
is a sum of i.i.d. random variables and it will be approximated by a normal distribution. We will need the following simple lemma.

Lemma 8.
Let A ∈ S be a self-adjoint trace class operator. Denote by λ j , j ≥ theeigenvalues of the operator Σ / A Σ / (repeated with their multiplicities and, to be spe-cific, such that their absolute values are arranged in a non-increasing order). Then h AX, X i d = X k ≥ λ k Z k , where Z , Z , . . . are i.i.d. standard normal random variables.Proof. First assume that Σ is a finite rank operator, or, equivalently, that X takes valuesin a finite dimensional subspace L of H . In this case, X = Σ / Z, where Z is a standardnormal vector in L. Therefore, h AX, X i = h A Σ / Z, Σ / Z i = h Σ / A Σ / Z, Z i = X k ≥ λ k Z k , where { Z k } are the coordinates of Z in the basis of eigenvectors of Σ / A Σ / . In the general infinite dimensional case the result follows by a standard finite dimen-sional approximation.stimationoffunctionalsofcovarianceoperators 33Note that E h AX, X i = P k ≥ λ k = tr(Σ / A Σ / ) and Var( h AX, X i ) = X k ≥ λ k E ( Z k − = 2 X k ≥ λ k = 2 k Σ / A Σ / k . The following result immediately follows from Berry-Esseen bound (see [Pet], Chap-ter 5, Theorem 3; an extension of the inequality to infinite sums of independent r.v. isbased on a straightforward approximation argument).
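The distributional identity of Lemma 8 can be checked by simulation. The following sketch (illustrative only; the covariance $\Sigma$ and the self-adjoint $A$ are arbitrary hypothetical choices) compares the first two moments of $\langle AX, X\rangle$ with those of $\sum_k \lambda_k Z_k^2$, which should match $\mathbb E = \sum_k \lambda_k = \mathrm{tr}(A\Sigma)$ and $\mathrm{Var} = 2\sum_k \lambda_k^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
M = rng.standard_normal((d, d))
Sigma = M @ M.T + np.eye(d)             # a hypothetical covariance
B = rng.standard_normal((d, d))
A = (B + B.T) / 2                       # a hypothetical self-adjoint A

# Eigenvalues of Sigma^{1/2} A Sigma^{1/2} via a symmetric square root.
w, U = np.linalg.eigh(Sigma)
S_half = U @ np.diag(np.sqrt(w)) @ U.T
lam = np.linalg.eigvalsh(S_half @ A @ S_half)

reps = 200_000
X = rng.multivariate_normal(np.zeros(d), Sigma, size=reps)
quad = np.einsum('ij,jk,ik->i', X, A, X)   # <A X_i, X_i> for each sample
Z = rng.standard_normal((reps, d))
mix = (Z**2) @ lam                         # sum_k lam_k Z_k^2

print(quad.mean(), mix.mean(), lam.sum())          # all close to tr(A Sigma)
print(quad.var(), mix.var(), 2 * (lam**2).sum())   # all close to 2 sum lam_k^2
```

Both samples agree in mean and variance with the eigenvalue formulas, as the lemma predicts.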
Lemma 9.
The following bound holds:
$$\sup_{x\in\mathbb R}\Bigl|\mathbb P\Bigl\{\frac{n^{1/2}\langle Dg(\Sigma), \hat\Sigma - \Sigma\rangle}{\sqrt 2\,\|\mathcal D g(\Sigma)\|_2} \le x\Bigr\} - \Phi(x)\Bigr| \lesssim \Bigl(\frac{\|\mathcal D g(\Sigma)\|_3}{\|\mathcal D g(\Sigma)\|_2}\Bigr)^3\frac{1}{\sqrt n}.$$

Proof.
Indeed, by (4.3) and Lemma 8 with A = Dg (Σ) ,n / h Dg (Σ) , ˆΣ − Σ i√ kD g (Σ) k d = P nj =1 P k ≥ λ k ( Z k,j − / (cid:18)P nj =1 P k ≥ λ k ( Z k,j − (cid:19) , (4.4)where { Z k,j } are i.i.d. standard normal random variables. By Berry-Esseen bound, sup x ∈ R (cid:12)(cid:12)(cid:12)(cid:12) P (cid:26) P nj =1 P k ≥ λ k ( Z k,j − / (cid:18)P nj =1 P k ≥ λ k ( Z k,j − (cid:19) (cid:27) − Φ( x ) (cid:12)(cid:12)(cid:12)(cid:12) . P nj =1 P k ≥ | λ k | E | Z k,j − | (cid:18)P nj =1 P k ≥ λ k E ( Z k,j − (cid:19) / . P k ≥ | λ k | (cid:18)P k ≥ λ k (cid:19) / √ n . (cid:18) kD g (Σ) k kD g (Σ) k (cid:19) √ n . Finally, the following lemma will be also used.
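The normal approximation of Lemma 9 can also be illustrated numerically: since $\sum_{j\le n} Z_{kj}^2 \sim \chi^2_n$, the normalized linear term reduces to a weighted combination of centered chi-square variables. In the sketch below (the weights `lam` are hypothetical stand-ins for the eigenvalues of $\mathcal D g(\Sigma)$), the Kolmogorov distance to the standard normal is already small at moderate $n$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
lam = np.array([3.0, -1.0, 0.5])   # hypothetical eigenvalues of D_g(Sigma)
n, reps = 400, 50_000

# sum_{j<=n} Z_{kj}^2 ~ chi^2_n, so the standardized linear term is
# sum_k lam_k (chi^2_n - n) / sqrt(2 n sum_k lam_k^2).
chi2 = rng.chisquare(df=n, size=(reps, lam.size))
S = ((chi2 - n) @ lam) / sqrt(2.0 * n * float((lam**2).sum()))

# Kolmogorov distance between the empirical CDF of S and Phi.
xs = np.sort(S)
ecdf = np.arange(1, reps + 1) / reps
Phi = np.array([0.5 * (1.0 + erf(x / sqrt(2.0))) for x in xs])
sup_dist = float(np.max(np.abs(ecdf - Phi)))
print(sup_dist)   # decays like n^{-1/2} times the third-moment ratio
```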
Lemma 10.
For random variables $\xi, \eta$, denote
$$\Delta(\xi,\eta) := \sup_{x\in\mathbb R}|\mathbb P\{\xi \le x\} - \mathbb P\{\eta \le x\}| \quad\text{and}\quad \delta(\xi,\eta) := \inf_{\delta>0}\bigl[\mathbb P\{|\xi-\eta| \ge \delta\} + \delta\bigr].$$
Then, for a standard normal random variable $Z$,
$$\Delta(\xi, Z) \le \Delta(\eta, Z) + \delta(\xi,\eta).$$

Proof.
For all x ∈ R , δ > , P { ξ ≤ x } ≤ P { ξ ≤ x, | ξ − η | < δ } + P {| ξ − η | ≥ δ }≤ P { η ≤ x + δ } + P {| ξ − η | ≥ δ }≤ P { Z ≤ x + δ } + ∆( η, Z ) + P {| ξ − η | ≥ δ }≤ P { Z ≤ x } + δ + ∆( η, Z ) + P {| ξ − η | ≥ δ } , P { Z ≤ x + δ } − P { Z ≤ x } ≤ δ. Thus, P { ξ ≤ x } − P { Z ≤ x } ≤ ∆( η, Z ) + P {| ξ − η | ≥ δ } + δ. Similarly, P { ξ ≤ x } − P { Z ≤ x } ≥ − ∆( η, Z ) − P {| ξ − η | ≥ δ } − δ, implying that for all δ > ξ, Z ) ≤ ∆( η, Z ) + P {| ξ − η | ≥ δ } + δ. Taking the infimumover δ > yields the claim of the lemma.We apply the last lemma to random variables ξ := n / ( g ( ˆΣ) − E g ( ˆΣ)) √ kD g (Σ) k and η := n / h Dg (Σ) , ˆΣ − Σ i√ kD g (Σ) k . By (4.2), ξ − η = n / ( S g (Σ; ˆΣ − Σ) − E S g (Σ; ˆΣ − Σ)) √ kD g (Σ) k . Recall that Assumption 2 holds and r (Σ) ≤ n, and denote δ n,s ( g ; Σ; t ) := K s L g,s k Σ k s √ kD g (Σ) k (cid:18)(cid:16) r (Σ) n (cid:17) ( s − / _(cid:16) tn (cid:17) ( s − / _(cid:16) tn (cid:17) s − / (cid:19) √ t. It immediately follows from Theorem 5 that P {| ξ − η | ≥ δ n,s ( g ; Σ; t ) } ≤ e − t , t ≥ and,as a consequence, δ ( ξ, η ) ≤ inf t ≥ h δ n,s ( g ; Σ; t ) + e − t i . It follows from lemmas 9 and 10 that, for some
C > , sup x ∈ R (cid:12)(cid:12)(cid:12)(cid:12) P (cid:26) n / ( g ( ˆΣ) − E g ( ˆΣ)) √ kD g (Σ) k ≤ x (cid:27) − Φ( x ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ C (cid:18) kD g (Σ) k kD g (Σ) k (cid:19) √ n + inf t ≥ h δ n,s ( g ; Σ; t ) } + e − t i . (4.5)Recall that γ s ( g ; Σ) = log (cid:18) L g,s k Σ k s kD g (Σ) k (cid:19) and t n,s ( g ; Σ) = (cid:20) − γ s ( g ; Σ) + s −
12 log (cid:18) n r (Σ) (cid:19)(cid:21) _ . Let ¯ t := t n,s ( g ; Σ) . Then e − ¯ t ≤ L g,s k Σ k s √ ¯ t √ kD g (Σ) k (cid:16) r (Σ) n (cid:17) ( s − / . s δ n,s ( g ; Σ; ¯ t ) . stimationoffunctionalsofcovarianceoperators 35Therefore, inf t ≥ h δ n,s ( g ; Σ; t ) } + e − t i . s δ n,s ( g ; Σ; ¯ t ) . s L g,s k Σ k s kD g (Σ) k (cid:18)(cid:16) r (Σ) n (cid:17) ( s − / _(cid:16) t n,s ( g ; Σ) n (cid:17) ( s − / _(cid:16) t n,s ( g ; Σ) n (cid:17) s − / (cid:19)q t n,s ( g ; Σ) . Substituting this into bound (4.5) completes the proof of Theorem 7.Our main example of interest is the functional g ( A ) := h f ( A ) , B i , A ∈ B sa ( H ) , where f is a given smooth function and B ∈ S ( H ) is a given nuclear operator. If f ∈ B ∞ , ( R ) , then the function A f ( A ) is operator differentiable implying the differentiability ofthe functional A g ( A ) with derivative Dg ( A ; H ) = h Df ( A ; H ) , B i , A, H ∈ B sa ( H ) . Moreover, for A = Σ with spectral decomposition Σ = P λ ∈ σ (Σ) λP λ , formula (2.20)holds implying that C sa ( H ) ∋ H Df (Σ; H ) = Df (Σ) H ∈ B sa ( H ) is a symmetricoperator: h Df (Σ) H , H i = h H , Df (Σ) H i , H ∈ C sa ( H ) , H ∈ S ( H ) . Therefore, Dg (Σ; H ) = h Df (Σ; B ) , H i , H ∈ C sa ( H ) , or, in other words, Dg (Σ) = Df (Σ; B ) . Denote σ f (Σ; B ) := √ k Σ / Df (Σ; B )Σ / k and µ (3) f (Σ; B ) = k Σ / Df (Σ; B )Σ / k . The following result is a simple consequence of Theorem 7 and Corollary 3.
Corollary 4.
Let f ∈ B s ∞ , ( R ) for some s ∈ (1 , . Define γ s ( f ; Σ) := log (cid:18) s +3 / k f k B s ∞ , k B k k Σ k s σ f (Σ; B ) (cid:19) and t n,s ( f ; Σ) := (cid:20) − γ s ( f ; Σ) + s −
12 log (cid:18) n r (Σ) (cid:19)(cid:21) _ . Then sup x ∈ R (cid:12)(cid:12)(cid:12)(cid:12) P (cid:26) n / D f ( ˆΣ) − E f ( ˆΣ)) , B E σ f (Σ , B ) ≤ x (cid:27) − Φ( x ) (cid:12)(cid:12)(cid:12)(cid:12) . s ∆ ( s ) n ( f ; Σ; B ) := (cid:18) µ (3) f (Σ; B ) σ f (Σ; B ) (cid:19) √ n + k f k B s ∞ , k B k k Σ k s σ f (Σ; B ) (cid:18)(cid:16) r (Σ) n (cid:17) ( s − / _(cid:16) t n,s ( f ; Σ) n (cid:17) ( s − / _(cid:16) t n,s ( f ; Σ) n (cid:17) s − / (cid:19)q t n,s ( f ; Σ) . (4.6)We will now prove Theorem 2 and Corollary 1 from Section 1.6 VladimirKoltchinskii Proof.
The proof of (1.7) immediately follows from bound (4.6). It is also easy to prove(1.8) using (1.7), the bound on the bias k E Σ f ( ˆΣ) − f (Σ) k = k E Σ S f (Σ; ˆΣ − Σ) k . k f k B s ∞ , E k ˆΣ − Σ k s . k f k B s ∞ , k Σ k s (cid:18) r (Σ) n (cid:19) s/ (4.7)and Lemma 10.The proof of (1.9) is a bit more involved and requires a couple of more lemmas. Thefollowing fact is well known (it follows, e.g., from [Ver], Proposition 5.16). Lemma 11.
Let { ξ i } be i.i.d. standard normal random variables and let { γ i } be realnumbers. Then, for all t ≥ with probability at least − e − t (cid:12)(cid:12)(cid:12)(cid:12)X i ≥ γ i ( ξ i − (cid:12)(cid:12)(cid:12)(cid:12) . (cid:18)X i ≥ γ i (cid:19) / √ t _ sup i ≥ | γ i | t. Lemma 12.
If, for some s ∈ (1 , , f ∈ B s ∞ , ( R ) and r (Σ) ≤ n, then, for all t ≥ withprobability at least − e − t (cid:12)(cid:12)(cid:12)(cid:12) n / h f ( ˆΣ) − f (Σ) , B i σ f (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) . s (cid:18) k f k B s ∞ , k B k k Σ k s σ f (Σ; B ) _ k f k L ∞ k B k σ f (Σ; B ) _ (cid:19)(cid:18) √ t _ ( r (Σ)) s/ n ( s − / (cid:19) . (4.8) Proof.
Recall that h f ( ˆΣ) − E f ( ˆΣ) , B i = h Df (Σ; ˆΣ − Σ) , B i + h S f (Σ; ˆΣ − Σ) − E S f (Σ; ˆΣ − Σ) , B i . It follows from (4.4) that n / h Df (Σ; ˆΣ − Σ) , B i σ f (Σ; B ) d = P nj =1 P k ≥ λ k ( Z k,j − / (cid:18)P nj =1 P k ≥ λ k ( Z k,j − (cid:19) , (4.9)where Z k,j are i.i.d. standard normal r.v. and λ k are the eigenvalues (repeated with theirmultiplicities) of Σ / Df (Σ; B )Σ / . Using Lemma 11, we easily get that for all t ≥ with probability at least − e − t , (cid:12)(cid:12)(cid:12)(cid:12) n / h Df (Σ; ˆΣ − Σ) , B i σ f (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) . √ t ∨ t √ n . (4.10)To control h S f (Σ; ˆΣ − Σ) − E S f (Σ; ˆΣ − Σ) , B i , we use bound (3.17) to get that for all t ≥ with probability at least − e − t |h S f (Σ; ˆΣ − Σ) − E S f (Σ; ˆΣ − Σ) , B i| (4.11) . s k f k B s ∞ , k B k k Σ k s (cid:18)(cid:16) r (Σ) n (cid:17) ( s − / _(cid:16) r (Σ) n (cid:17) s − / _(cid:16) tn (cid:17) ( s − / _(cid:16) tn (cid:17) s − / (cid:19)r tn . stimationoffunctionalsofcovarianceoperators 37If r (Σ) ≤ n and t ≤ n, bounds (4.10),(4.11) and (4.7) easily imply that with probabilityat least − e − t (cid:12)(cid:12)(cid:12)(cid:12) n / h f ( ˆΣ) − f (Σ) , B i σ f (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) . s (cid:18) k f k B s ∞ , k B k k Σ k s σ f (Σ; B ) _ (cid:19)(cid:18) √ t _ ( r (Σ)) s/ n ( s − / (cid:19) . (4.12)Note also that, for all t > n, (cid:12)(cid:12)(cid:12)(cid:12) n / h f ( ˆΣ) − f (Σ) , B i σ f (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ k f k L ∞ k B k σ f (Σ; B ) √ t. (4.13)The result immediately follows from bounds (4.12) and (4.13). Lemma 13.
Let ℓ be a loss function satisfying Assumption 1. For any random variables ξ, η and for all A > | E ℓ ( ξ ) − E ℓ ( η ) | ≤ ℓ ( A )∆( ξ ; η ) + E ℓ ( ξ ) I ( | ξ | ≥ A ) + E ℓ ( η ) I ( | η | ≥ A ) . Proof.
Clearly, | E ℓ ( ξ ) − E ℓ ( η ) | ≤ | E ℓ ( ξ ) I ( | ξ | < A ) − E ℓ ( η ) I ( | η | < A ) | + E ℓ ( ξ ) I ( | ξ | ≥ A )+ E ℓ ( η ) I ( | η | ≥ A ) . (4.14)Denoting by F ξ , F η the distribution functions of ξ, η, assuming that A is a continuity pointof both F ξ and F η and using integration by parts, we get | E ℓ ( ξ ) I ( | ξ | < A ) − E ℓ ( η ) I ( | η | < A ) | = (cid:12)(cid:12)(cid:12)Z A − A ℓ ( x ) d ( F ξ − F η )( x ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) ℓ ( A )( F ξ − F η )( A ) − ℓ ( − A )( F ξ − F η )( − A ) − Z A − A ( F ξ − F η )( x ) ℓ ′ ( x ) dx (cid:12)(cid:12)(cid:12) . Using the properties of ℓ (in particular, that ℓ is an even function and ℓ ′ is nonnegative andnondecreasing on R + ), we get | E ℓ ( ξ ) I ( | ξ | < A ) − E ℓ ( η ) I ( | η | < A ) | ≤ ℓ ( A )∆( ξ ; η )+2 Z A ℓ ′ ( u ) du ∆( ξ, η ) = 4 ℓ ( A )∆( ξ, η ) , which together with (4.14) imply the claim. If A is not a continuity point of F ξ or F η , onecan easily obtain the result by a limiting argument.The following lemma is elementary.8 VladimirKoltchinskii Lemma 14.
Let ξ be a random variable such that for some τ > and for all t ≥ withprobability at least − e − t | ξ | ≤ τ √ t (4.15) Let ℓ be a loss function satisfying Assumption 1. Then E ℓ ( ξ ) ≤ e √ πc e c τ . (4.16)We now apply lemmas 13 and 14 to r.v. ξ := ξ (Σ) := √ n (cid:16) h f ( ˆΣ) , B i − h f (Σ) , B i (cid:17) σ f (Σ; B ) and η := Z. Bound (4.8) and Lemma 14 along with the fact that under conditions of thetheorem ( r (Σ)) s/ n ( s − / ≤ r s/ n n ( s − / ≤ (for large enough n ) imply that bounds (4.15) and (4.16)hold with τ := k f k B s ∞ , k B k k Σ k s σ f (Σ; B ) _ k f k L ∞ k B k σ f (Σ; B ) _ . It follows from the bound of Lemma 13 that | E ℓ ( ξ ) − E ℓ ( Z ) | ≤ ℓ ( A )∆( ξ ; Z ) + E / ℓ ( ξ ) P / {| ξ | ≥ A } + E / ℓ ( Z ) P / {| Z | ≥ A } . (4.17)Using bounds (4.16), standard bounds on E ℓ ( Z ) , P {| Z | ≥ A } and the bound of Corollary4, we get | E ℓ ( ξ ) − E ℓ ( Z ) | . s c e c A ∆ ( s ) n ( f ; Σ; B ) + √ e (2 π ) / c e c τ e − A / (2 τ ) + c e c e − A / . To complete the proof of (1.9), it remains to take the supremum over the class of covari-ances G ( r n ; a ) ∩ { Σ : σ f (Σ; B ) ≥ σ } and over all the operators B with k B k ≤ , andto pass to the limit first as n → ∞ and then as A → ∞ . In what follows, we assume that H is a finite-dimensional inner product space of di-mension dim( H ) = d. Recall that C + ( H ) ⊂ B sa ( H ) denotes the cone of covarianceoperators in H and let L ∞ ( C + ( H )) be the space of uniformly bounded Borel measur-able functions on C + ( H ) equipped with the uniform norm. Define the following operator T : L ∞ ( C + ( H )) L ∞ ( C + ( H )) : T g (Σ) = E Σ g ( ˆΣ) , Σ ∈ C + ( H ) , (5.1)stimationoffunctionalsofcovarianceoperators 39where ˆΣ = ˆΣ n := n − P nj =1 X j ⊗ X j is the sample covariance operator based on i.i.d.observations X , . . . , X n sampled from N (0; Σ) . Let P (Σ; · ) denote the probability distri-bution of ˆΣ in the space C + ( H ) (equipped with its Borel σ − algebra B ( C + ( H )) ). 
Note that P (Σ; n − A ) , A ∈ B ( C + ( H )) is a Wishart distribution W d (Σ; n ) . Clearly, P is a Markovkernel, T g (Σ) = Z C + ( H ) g ( V ) P (Σ; dV ) , g ∈ L ∞ ( C + ( H )) and operator T is a contraction: kT g k L ∞ ≤ k g k L ∞ . Let ˆΣ := Σ , ˆΣ (1) := ˆΣ and, more generally, given ˆΣ ( k ) , define ˆΣ ( k +1) as the samplecovariance based on n i.i.d. observations X ( k )1 , . . . , X ( k ) n sampled from N (0; ˆΣ ( k ) ) . Then ˆΣ ( k ) , k ≥ is a homogeneous Markov chain with values in C + ( H ) , with ˆΣ (0) = Σ andwith transition probability kernel P. The operator T k can be represented as T k g (Σ) = E Σ g ( ˆΣ ( k ) )= Z C + ( H ) · · · Z C + ( H ) g ( V k ) P ( V k − ; dV k ) P ( V k − ; dV k − ) . . . P ( V ; dV ) P (Σ; dV ) , Σ ∈ C + ( H ) . In what follows, we will be interested in operator B = T − I that can be called a biasoperator since B g (Σ) represents the bias of the plug-in estimator g ( ˆΣ) of g (Σ) : B g (Σ) = E Σ g ( ˆΣ) − g (Σ) , Σ ∈ C + ( H ) . Note that, by Newton’s binomial formula, B k g (Σ) can be represented as follows B k g (Σ) = ( T − I ) k g (Σ) = k X j =0 ( − k − j (cid:18) kj (cid:19) T j g (Σ) (5.2) = E Σ k X j =0 ( − k − j (cid:18) kj (cid:19) g ( ˆΣ ( j ) ) , which could be viewed as the expectation of the k -th order difference of function g alongthe sample path of Markov chain ˆΣ ( t ) , t = 0 , , . . . . Denote g k (Σ) := k X j =0 ( − j B j g (Σ) , Σ ∈ C + ( H ) . (5.3) Proposition 1.
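Proposition 1 below quantifies how the correction $g_k$ reduces the bias. The sketch below is an illustration with a hypothetical quadratic functional, not part of the paper's argument: for $g(\Sigma) = \mathrm{tr}(\Sigma^2)$ a standard Gaussian moment computation gives the closed form $\mathcal T g(\Sigma) = \mathbb E_\Sigma \mathrm{tr}(\hat\Sigma^2) = \mathrm{tr}(\Sigma^2) + (\mathrm{tr}(\Sigma^2) + \mathrm{tr}(\Sigma)^2)/n$, so the one-step correction $g_1(\hat\Sigma) = 2g(\hat\Sigma) - \mathcal T g(\hat\Sigma)$ is explicit, and its bias drops from $O(1/n)$ to $O(1/n^2)$:

```python
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.diag([1.0, 2.0, 3.0])       # hypothetical covariance
d, n, reps = 3, 50, 20_000

def g(A):
    # the functional g(A) = tr(A^2)
    return np.trace(A @ A)

def Tg(A):
    # closed form of T g(A) = E_A g(hatSigma) for this quadratic g
    # (Wishart moment computation, stated as an assumption of this sketch)
    return g(A) + (g(A) + np.trace(A) ** 2) / n

plug, corrected = [], []
L = np.sqrt(np.diag(Sigma))
for _ in range(reps):
    X = rng.standard_normal((n, d)) * L        # rows ~ N(0, Sigma)
    hat = X.T @ X / n
    plug.append(g(hat))                        # plug-in estimator g(hatSigma)
    corrected.append(2 * g(hat) - Tg(hat))     # g_1(hatSigma) = (I - B) g (hatSigma)

bias_plug = float(np.mean(plug)) - g(Sigma)    # about (tr Sigma^2 + tr(Sigma)^2)/n = 1.0
bias_corr = float(np.mean(corrected)) - g(Sigma)   # about -B^2 g(Sigma) = O(1/n^2)
print(bias_plug, bias_corr)
```

The corrected estimator's residual bias matches the sign and order $(-1)^1\mathcal B^2 g(\Sigma)$ predicted by Proposition 1.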
The bias of the estimator $g_k(\hat\Sigma)$ of $g(\Sigma)$ is given by the following formula:
$$\mathbb E_\Sigma g_k(\hat\Sigma) - g(\Sigma) = (-1)^k \mathcal B^{k+1} g(\Sigma).$$

Proof.
Indeed, E Σ g k ( ˆΣ) − g (Σ) = T g k (Σ) − g (Σ) = ( I + B ) g k (Σ) − g (Σ) = k X j =0 ( − j B j g (Σ) − k +1 X j =1 ( − j B j g (Σ) − g (Σ) = ( − k B k +1 g (Σ) . Let now L ∞ ( C + ( H ); B sa ( H )) be the space of uniformly bounded Borel measurablefunctions g : C + ( H )
7→ B sa ( H ) . We will need a version of the linear operator definedby formula (5.1) acting from the space L ∞ ( C + ( H ); B sa ( H )) into itself. With a little abuseof notation, we still denote it by T and also set B := T − I . These operators satisfy allthe properties stated above. This allows one to define operator valued function g k by (5.3)for which Proposition 1 still holds. In what follows, it should be clear from the contextwhether T and B act on real valued, or on operator valued functions.Given a smooth function f in real line, we would like to find an estimator of f (Σ) witha small bias. To this end, we consider an estimator f k ( ˆΣ) and, in view of Proposition 1, weneed to show that, for a proper choice of k (depending on α such that d = dim( H ) ≤ n α ), k E Σ f k ( ˆΣ) − f (Σ) k = kB k +1 f (Σ) k = o ( n − / ) . At the same time, we need to show that function f k satisfies certain smoothness prop-erties such as Assumption 3. As a consequence, (properly normalized) random variables n / (cid:16) h f k ( ˆΣ) , B i− E Σ h f k ( ˆΣ) , B i (cid:17) would be close in distribution to a standard normal r.v..Since, in addition, the bias E Σ h f k ( ˆΣ) , B i − h f (Σ) , B i is of the order o ( n − / ) , we wouldbe able to conclude that h f k ( ˆΣ) , B i is an asymptotically normal estimator of h f (Σ) , B i with the classical convergence rate n − / . Our approach to this problem is based on representing operator valued function f k (Σ) as f k (Σ) = D g k (Σ) , where g : C + ( H ) R is a real valued orthogonally invariantfunction and D is a differential operator defined below and called the lifting operator .This approach allows us to derive certain integral representations for functions B k f (Σ) = DB k g (Σ) that are then used to obtain proper bounds on B k f (Σ) and to study smoothnessproperties of functions B k f (Σ) and f k (Σ) . 
A function g ∈ L ∞ ( C + ( H )) is orthogonally invariant iff, for all orthogonal transfor-mations U of H , g ( U Σ U − ) = g (Σ) , Σ ∈ C + ( H ) . Note that any such function g could berepresented as g (Σ) = ϕ ( λ (Σ) , . . . , λ d (Σ)) , where λ (Σ) ≥ . . . λ d (Σ) are the eigenval-ues of Σ and ϕ is a symmetric function of d variables. A typical example of orthogonallyinvariant function is g (Σ) = tr( ψ (Σ)) for a function of real variable ψ. Let L O ∞ ( C + ( H )) bethe space of all orthogonally invariant functions from L ∞ ( C + ( H )) . Clearly, orthogonallyinvariant functions form an algebra. We will need several facts concerning the propertiesof operators T , B as well as the lifting operator operator D on the space of orthogonallyinvariant functions. In the case of orthogonally invariant polynomials, similar propertiescould be found in the literature on Wishart distribution (see, e.g., [FK, LetMas]).stimationoffunctionalsofcovarianceoperators 41 Proposition 2. If g ∈ L O ∞ ( C + ( H )) , then T g ∈ L O ∞ ( C + ( H )) and B g ∈ L O ∞ ( C + ( H )) . Proof.
Indeed, the transformation Σ U Σ U − is a bijection of C + ( H ) , T g ( U Σ U − ) = E U Σ U − g ( ˆΣ) = E Σ g ( U ˆΣ U − ) = E Σ g ( ˆΣ) = T g (Σ) and the function T g is uniformly bounded.An operator valued function g : C + ( H )
7→ B sa ( H ) is called orthogonally equivariant iff for all orthogonal transformations U g ( U Σ U − ) = U g (Σ) U − , Σ ∈ C + ( H ) . We say that g : C + ( H )
7→ B sa ( H ) is differentiable (resp., continuously differentiable, k times continuously differentiable, etc) in C + ( H ) iff there exists a uniformly bounded,Lipschitz with respect to the operator norm and differentiable (resp., continuously dif-ferentiable, k times continuously differentiable, etc) extension of g to an open set G, C + ( H ) ⊂ G ⊂ B sa ( H ) . Note that g could be further extended from G to a uniformlybounded Lipschitz with respect to the operator norm function on B sa ( H ) , which will bestill denoted by g. Proposition 3. If g : C + ( H ) R is orthogonally invariant and continuously differen-tiable in C + ( H ) with derivative Dg, then Dg is orthogonally equivariant.Proof. First suppose that Σ is positively definite. Then, given H ∈ B sa ( H ) , Σ + tH is acovariance operator for all small enough t. Thus, for all H ∈ B sa ( H ) , h Dg ( U Σ U − ) , H i = lim t → g ( U Σ U − + tH ) − g ( U Σ U − ) t = lim t → g ( U (Σ + tU − HU ) U − ) − g ( U Σ U − ) t = lim t → g (Σ + tU − HU )) − g (Σ) t = h Dg (Σ) , U − HU i = h U Dg (Σ) U − , H i implying Dg ( U Σ U − ) = U Dg (Σ) U − . (5.4)It remains to observe that positively definite covariance operators are dense in C + ( H ) andto extend (5.4) to C + ( H ) by continuity.We now define the following differential operator D g (Σ) := Σ / Dg (Σ)Σ / acting on continuously differentiable functions in C + ( H ) . It will be called the lifting op-erator . We will show that operators T and D commute (and, as a consequence, B and D also commute).2 VladimirKoltchinskii Proposition 4.
Suppose d < ∼ n. For all functions g ∈ L O ∞ ( C + ( H )) that are continuouslydifferentiable in C + ( H ) with a uniformly bounded derivative Dg and for all Σ ∈ C + ( H ) DT g (Σ) = T D g (Σ) and DB g (Σ) = BD g (Σ) . Proof.
Note that ˆΣ d = Σ / W Σ / , where W is the sample covariance based on i.i.d.standard normal random variables Z , . . . , Z n in H (which is a rescaled Wishart matrix).Let Σ / W / = RU be the polar decomposition of Σ / W / with positively semidefinite R and orthogonal U. Then, we have ˆΣ = Σ / W Σ / = Σ / W / W / Σ / = RU U − R = R and W / Σ W / = W / Σ / Σ / W / = U − RRU = U − R U = U − Σ / W Σ / U = U − ˆΣ U. Since g is orthogonally invariant, we have T g (Σ) = E Σ g ( ˆΣ) = E g (Σ / W Σ / ) = E g ( W / Σ W / ) , Σ ∈ C + ( H ) . (5.5)Since we extended g to a uniformly bounded function on B sa ( H ) , the right hand sideof (5.5) is well defined for all Σ ∈ B sa ( H ) , and it will be used to extend T g (Σ) to B sa ( H ) . Moreover, since g is Lipschitz with respect to the operator norm and, for d . n, E k W k ≤ E k W − I k ≤ C q dn . (see (1.6)), it is easy to check that T g (Σ) isLipschitz with respect to the operator norm on B sa ( H ) . Let H ∈ B sa ( H ) and Σ t := Σ + tH, t > . Note that T g (Σ t ) − T g (Σ) t = E g ( W / Σ t W / ) − E g ( W / Σ W / ) t = E g ( W / Σ t W / ) − g ( W / Σ W / ) t I ( k W k ≤ / √ t )+ E g ( W / Σ t W / ) − g ( W / Σ W / ) t I ( k W k > / √ t ) . (5.6)Recall that g is continuously differentiable in the open set G ⊃ C + ( H ) . Also, W / Σ W / ∈C + ( H ) ⊂ G and W / Σ t W / ∈ G for all small enough t > . The last fact follows fromthe bound k W / (Σ t − Σ) W / k ≤ k W k t k H k ≤ √ t k H k that holds for all t ≤ k W k (or k W k ≤ / √ t ) Therefore, we easily get that lim t → g ( W / Σ t W / ) − g ( W / Σ W / ) t I ( k W k ≤ / √ t )= h Dg ( W / Σ W / ) , W / HW / i = h W / Dg ( W / Σ W / ) W / , H i . stimationoffunctionalsofcovarianceoperators 43Also, since g is Lipschitz with respect to the operator norm, (cid:12)(cid:12)(cid:12)(cid:12) g ( W / Σ t W / ) − g ( W / Σ W / ) t I ( k W k ≤ / √ t ) (cid:12)(cid:12)(cid:12)(cid:12) . g k W / (Σ t − Σ) W / k t ≤ k W kk Σ t − Σ k t ≤ k W kk H k . Since E k W k . 
, we can use Lebesgue dominated convergence theorem to prove that lim t → E g ( W / Σ t W / ) − g ( W / Σ W / ) t I ( k W k ≤ / √ t )= E h W / Dg ( W / Σ W / ) W / , H i = h E W / Dg ( W / Σ W / ) W / , H i . (5.7)On the other hand, since g is uniformly bounded, we can use bound (1.6) to prove that forsome constant C > and for all t ≤ /C E (cid:12)(cid:12)(cid:12)(cid:12) g ( W / Σ t W / ) − g ( W / Σ W / ) t I ( k W k > / √ t ) (cid:12)(cid:12)(cid:12)(cid:12) . g t P n k W k ≥ √ t o ≤ t exp (cid:26) − nC √ t (cid:27) → t → . (5.8)It follows from (5.6), (5.7) and (5.8) that h D T g (Σ) , H i = h E W / Dg ( W / Σ W / ) W / , H i . It is also easy to check that E W / Dg ( W / Σ W / ) W / is a continuous function in G implying that T g is continuously differentiable in G with Fr´echet derivative D T g (Σ) = E W / Dg ( W / Σ W / ) W / . Since W / Σ W / = U − ˆΣ U and Dg is an orthogonally equivariant function (see Propo-sition 3), we get Dg ( W / Σ W / ) = U − Dg ( ˆΣ) U. Therefore, DT g (Σ)= Σ / D T g (Σ)Σ / = Σ / E ( W / Dg ( W / Σ W / ) W / )Σ / = E (Σ / W / Dg ( W / Σ W / ) W / Σ / ) = E (Σ / W / U − Dg ( ˆΣ) U W / Σ / )= E ( RU U − Dg ( ˆΣ) U U − R ) = E ( RDg ( ˆΣ) R ) = E Σ ( ˆΣ / Dg ( ˆΣ) ˆΣ / ) = E Σ D g ( ˆΣ)= T D g (Σ) . Similar relationship for operators B and D easily follows.We will now derive useful representations of operators T k and B k and prove that theyalso commute with the differential operator D . Proposition 5.
Suppose d . n. Let W , . . . , W k , . . . be i.i.d. copies of W. Then, for all g ∈ L O ∞ ( C + ( H )) and for all k ≥ , T k g (Σ) = E g ( W / k . . . W / Σ W / . . . W / k ) (5.9) and B k g (Σ) = E X I ⊂{ ,...,k } ( − k −| I | g ( A ∗ I Σ A I ) , (5.10) where A I := Q i ∈ I W / i . Suppose, in addition, that g is continuously differentiable in C + ( H ) with a uniformly bounded derivative Dg.
Then D B k g (Σ) = E X I ⊂{ ,...,k } ( − k −| I | A I Dg ( A ∗ I Σ A I ) A ∗ I , (5.11) and, for all Σ ∈ C + ( H ) DT k g (Σ) = T k D g (Σ) and DB k g (Σ) = B k D g (Σ) . (5.12) Finally, B k D g (Σ) = DB k g (Σ)= E (cid:16) X I ⊂{ ,...,k } ( − k −| I | Σ / A I Dg ( A ∗ I Σ A I ) A ∗ I Σ / (cid:17) . (5.13) Proof.
Since ˆΣ d = Σ / W Σ / , W / Σ W / = U − Σ / W Σ / U, where U is an orthog-onal operator, and g is orthogonally invariant, we have T g (Σ) = E Σ g ( ˆΣ) = E g ( W / Σ W / ) (5.14)(which has been already used in the proof of Proposition 4).By Proposition 2, orthogonal invariance of g implies the same property of T g and, byinduction, of T k g for all k ≥ . Then, also by induction, it follows from (5.14) that T k g (Σ) = E g ( W / k . . . W / Σ W / . . . W / k ) . If I ⊂ { , . . . , k } with | I | = card( I ) = j and A I = Q i ∈ I W / i , it clearly implies that T j g (Σ) = E g ( A ∗ I Σ A I ) . In view of (5.2), we easily get that (5.10) holds. If g is continuously differentiable in C + ( H ) with a uniformly bounded derivative Dg, it follows from (5.10) that B k g (Σ) is Recall that W is the sample covariance based on i.i.d. standard normal random variables Z , . . . , Z n in H . stimationoffunctionalsofcovarianceoperators 45continuously differentiable in C + ( H ) with Fr´echet derivative given by (5.11). To provethis, it is enough to justify differentiation under the expectation sign which is done ex-actly as in the proof of Proposition 4. Finally, it follows from (5.11) that the derivatives D B k g, k ≥ are uniformly bounded in C + ( H ) . Similarly, as a consequence of (5.9)and the properties of g, T k g (Σ) is continuously differentiable in C + ( H ) with uniformlybounded derivative D T k g for all k ≥ . Therefore, (5.12) follows from Proposition 4 byinduction. Formula (5.13) follows from (5.12) and (5.11).Define the following functions providing the linear interpolation between the identityoperator I and operators W / , . . . , W / k : V j ( t j ) := I + t j ( W / j − I ) , t j ∈ [0 , , ≤ j ≤ k. Clearly, for all j = 1 , . . . , k, t j ∈ [0 , , V j ( t j ) ∈ C + ( H ) . Let R = R ( t , . . . , t k ) = V ( t ) . . . V k ( t k ) and L = L ( t , . . . , t k ) = V k ( t k ) . . . V ( t ) = R ∗ . Define S = S ( t , . . . , t k ) = L ( t , . . . , t k )Σ R ( t , . . 
. , t k ) , ( t , . . . , t k ) ∈ [0 , k . Finally, let ϕ ( t , . . . , t k ) := Σ / R ( t , . . . , t k ) Dg ( S ( t , . . . , t k )) L ( t , . . . , t k )Σ / , ( t , . . . , t k ) ∈ [0 , k . The following representation will play a crucial role in our further analysis.
Proposition 6.
Suppose $g \in L^O_\infty(\mathcal C_+(\mathbb H))$ is a $k+1$ times continuously differentiable function with uniformly bounded derivatives $D^j g$, $j = 1, \dots, k+1$. Then the function $\varphi$ is $k$ times continuously differentiable in $[0,1]^k$ and
$$\mathcal B^k \mathcal D g(\Sigma) = \mathbb E \int_0^1 \dots \int_0^1 \frac{\partial^k \varphi(t_1,\dots,t_k)}{\partial t_1 \dots \partial t_k}\, dt_1 \dots dt_k, \quad \Sigma \in \mathcal C_+(\mathbb H). \quad (5.15)$$

Proof.
Given a function $\varphi: [0,1]^k \to \mathbb R$, define, for $1 \le i \le k$, the finite difference operators
$$D_i\varphi(t_1,\dots,t_k) := \varphi(t_1,\dots,t_{i-1}, 1, t_{i+1},\dots,t_k) - \varphi(t_1,\dots,t_{i-1}, 0, t_{i+1},\dots,t_k)$$
(with obvious modifications for $i = 1, k$). Then $D_1 \dots D_k \varphi$ does not depend on $t_1,\dots,t_k$ and is given by the formula
$$D_1 \dots D_k \varphi = \sum_{(t_1,\dots,t_k)\in\{0,1\}^k} (-1)^{k-(t_1+\dots+t_k)}\varphi(t_1,\dots,t_k). \quad (5.16)$$
It is well known and easy to check that, if $\varphi$ is $k$ times continuously differentiable in $[0,1]^k$, then
$$D_1 \dots D_k \varphi = \int_0^1 \dots \int_0^1 \frac{\partial^k \varphi(t_1,\dots,t_k)}{\partial t_1 \dots \partial t_k}\, dt_1 \dots dt_k. \quad (5.17)$$
Similar definitions and formula (5.17) also hold for vector- and operator-valued functions $\varphi$. It immediately follows from (5.13) and (5.16) that
$$\mathcal B^k \mathcal D g(\Sigma) = \mathbb E\, D_1 \dots D_k \varphi. \quad (5.18)$$
Since $Dg$ is $k$ times continuously differentiable and the functions $S(t_1,\dots,t_k)$, $R(t_1,\dots,t_k)$ are polynomials with respect to $t_1,\dots,t_k$, the function $\varphi$ is $k$ times continuously differentiable in $[0,1]^k$. Representation (5.15) follows from (5.18) and (5.17).
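Formulas (5.16) and (5.17) can be verified numerically for a concrete smooth function. The sketch below (with $k = 2$ and a hypothetical $\varphi(t_1,t_2) = e^{t_1 t_2}$, chosen only for illustration) compares the alternating vertex sum with a midpoint-rule approximation of the integral of the mixed partial derivative:

```python
import numpy as np

# phi(t1, t2) = exp(t1 * t2), with mixed partial (1 + t1 t2) exp(t1 t2).
phi = lambda t1, t2: np.exp(t1 * t2)

# Alternating sum over the vertices of {0,1}^2, formula (5.16):
# signs are (-1)^{k - (t1 + ... + tk)} with k = 2.
alt_sum = phi(1, 1) - phi(1, 0) - phi(0, 1) + phi(0, 0)

# Midpoint-rule approximation of the double integral in (5.17).
m = 400
t = (np.arange(m) + 0.5) / m
T1, T2 = np.meshgrid(t, t)
mixed = (1.0 + T1 * T2) * np.exp(T1 * T2)      # d^2 phi / dt1 dt2
integral = float(mixed.sum()) / m**2

print(alt_sum, integral)   # both equal e - 1
```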
Our goal in this section is to prove the following bound on the iterated bias operator B^k Dg(Σ).

Theorem 8.
Suppose g ∈ L_O^∞(C_+(H)) is a k+1 times continuously differentiable function with uniformly bounded derivatives D^j g, j = 1, …, k+1. Suppose also that d ≤ n and k ≤ n. Then the following bound holds for some constant C > 0 and for all Σ ∈ C_+(H):

‖B^k Dg(Σ)‖ ≤ C^k max_{1≤j≤k+1} ‖D^j g‖_{L^∞} (‖Σ‖^{k+1} ∨ ‖Σ‖) ((d/n) ∨ (k/n))^{k/2}. (6.1)

It follows from the commutativity relationships (5.12) that Dg_k(Σ) = (Dg)_k(Σ), Σ ∈ C_+(H), where g_k is defined by formula (5.3) and

(Dg)_k(Σ) := Σ_{j=0}^k (−1)^j B^j Dg(Σ), Σ ∈ C_+(H).

Clearly, we have (see Proposition 1) that

E_Σ Dg_k(Σ̂) − Dg(Σ) = (−1)^k B^{k+1} Dg(Σ). (6.2)

Bound (6.1) is needed, in particular, to control the bias of the estimator Dg_k(Σ̂) of Dg(Σ). Namely, we have the following corollary.
Corollary 5.
Suppose that g ∈ L O ∞ ( C + ( H )) is k + 2 times continuously differentiablefunction with uniformly bounded derivatives D j g, j = 1 , . . . , k + 2 and also that d ≤ n, k + 1 ≤ n. Then k E Σ D g k ( ˆΣ) −D g (Σ) k ≤ C ( k +1) max ≤ j ≤ k +2 k D j g k L ∞ ( k Σ k k +2 ∨k Σ k ) (cid:18) dn _ k + 1 n (cid:19) ( k +1) / . (6.3) If, in addition, k + 1 ≤ d ≤ n and, for some δ > ,k ≥ log d log( n/d ) + δ (cid:18) d log( n/d ) (cid:19) . (6.4) Then k E Σ D g k ( ˆΣ) − D g (Σ) k ≤ C ( k +1) max ≤ j ≤ k +2 k D j g k L ∞ ( k Σ k k +2 ∨ k Σ k ) n − δ . (6.5)The proof of this corollary immediately follows from formula (6.2) and bound (6.1).If d = n α for some α ∈ (0 , , condition (6.4) becomes k ≥ α + δ − α . Thus, if k ( α, δ ) := min (cid:26) k ≥ α + δ − α (cid:27) , then bound (6.5) holds with k = k ( α, δ ) . Remark 8.
In Section 7, we will obtain a sharper bound on the bias of the estimator Dg_k(Σ̂) (under stronger smoothness assumptions; see Corollary 7).

The first step towards the proof of Theorem 8 is to compute the partial derivative ∂^k ϕ/∂t_1 ⋯ ∂t_k of the function ϕ, which allows us to use representation (5.15). To this end, we first derive formulas for partial derivatives of the operator-valued function h(S(t_1, …, t_k)), where h = Dg.
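As a sanity check on the bias-reduction mechanism behind Corollary 5, here is a hedged scalar illustration (d = 1, not from the paper): for g(x) = x² in the scalar Gaussian model the bias operator can be computed exactly, Bg(x) = E g(x·χ²_n/n) − g(x) = (2/n)x², and the one-step correction g_1 = g − Bg already reduces the bias of the plug-in estimator from order 1/n to order 1/n².

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma2 = 50, 200_000, 1.0

# Sample variances: sigma_hat^2 = (1/n) * sum X_i^2 ~ sigma^2 * chi2_n / n.
s2 = sigma2 * rng.chisquare(n, size=reps) / n

# Target functional g(x) = x^2.  Here T g(x) = E g(x * chi2_n / n) = x^2*(1 + 2/n),
# so B g(x) = (2/n) x^2 exactly, and g_1(x) = (g - Bg)(x) = x^2 * (1 - 2/n).
plug_in   = s2**2                 # bias +2/n + O(1/n^2)
corrected = s2**2 * (1 - 2 / n)   # bias -4/n^2 by the analogue of (6.2)

bias_plug = abs(plug_in.mean() - sigma2**2)    # ~ 2/n  = 0.04
bias_corr = abs(corrected.mean() - sigma2**2)  # ~ 4/n^2 = 0.0016

assert 0.02 < bias_plug < 0.06
assert bias_corr < 0.01
```

With k = 1 the residual bias is already quadratic in 1/n, matching the (−1)^k B^{k+1} pattern of formula (6.2).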
To simplify the notations, given T = { t i , . . . , t i m } ⊂ { t , . . . , t k } , wewill write ∂ T S instead of ∂ m S ( t ,...,t k ) ∂t i ...∂t im (similarly, we use the notation ∂ T h ( S ) for a partialderivative of a function h ( S ) ).Let D j,T be the set of all partitions (∆ , . . . , ∆ j ) of T ⊂ { t , . . . , t k } with non-emptysets ∆ i , i = 1 , . . . , j (partitions with different order of ∆ , . . . , ∆ j being identical). For ∆ = (∆ , . . . , ∆ j ) ∈ D j,T , set ∂ ∆ S = ( ∂ ∆ S, . . . , ∂ ∆ j S ) . Denote D T := S | T | j =1 D j,T . For ∆ = (∆ , . . . , ∆ j ) ∈ D T , set j ∆ := j. Lemma 15.
Suppose, for some m ≤ k, h = Dg ∈ L ∞ ( C + ( H ); B sa ( H )) is m timescontinuously differentiable with derivatives D j h, j ≤ m. Then the function [0 , k ∋ ( t , . . . , t k ) h ( S ( t , . . . , t k )) is m times continuously differentiable and for any T ⊂{ t , . . . , t k } with | T | = m∂ T h ( S ) = X ∆ ∈D T D j ∆ h ( S )( ∂ ∆ S ) = m X j =1 X ∆ ∈D j,T D j h ( S )( ∂ ∆ S ) . (6.6) Recall that D j h is an operator valued symmetric j -linear form on the space B sa ( H ) . Proof.
Since [0 , k ∋ ( t , . . . , t k ) S ( t , . . . , t k ) is an operator valued polynomial and h is m times continuously differentiable, the function [0 , k ∋ ( t , . . . , t k ) h ( S ( t , . . . , t k )) is also m times continuously differentiable. We will now prove formula (6.6) by inductionwith respect to m. For m = 1 , it reduces to ∂ { t i } h ( S ) = ∂h ( S ) ∂t i = Dh ( S ) (cid:16) ∂S∂t i (cid:17) , which is true by the chain rule. Assume that (6.6) holds for some m < k and for any T ⊂ { t , . . . , t k } , | T | = m. Let T ′ = T ∪ { t l } for some t l T. Then ∂ T ′ h ( S ) = ∂ { t l } ∂ T h ( S ) = m X j =1 X ∆ ∈D j,T ∂ { t l } D j h ( S )( ∂ ∆ S ) . (6.7)Given ∆ = (∆ , . . . , ∆ j ) ∈ D j,T , define partitions ∆ ( i ) ∈ D j,T ′ , i = 1 , . . . , j as follows: ∆ (1) := (∆ ∪ { t l } , ∆ , . . . , ∆ j ) , ∆ (2) := (∆ , ∆ ∪ { t l } , . . . , ∆ j ) , . . . , ∆ ( j ) := (∆ , . . . , ∆ j − , ∆ j ∪ { t l } ) . Also define a partition ˜∆ ∈ D j +1 ,T ′ as follows: ˜∆ := (∆ , . . . , ∆ j , { t l } ) . It is easy to seethat any partition ∆ ′ ∈ D T ′ is the image of a unique partition ∆ ∈ D T under one of thetransformations ∆ ∆ ( i ) , i = 1 , . . . , j ∆ and ∆ ˜∆ . This implies that D T ′ = [ ∆ ∈D T { ∆ (1) , . . . , ∆ ( j ∆ ) , ˜∆ } . It easily follows from the chain rule and the product rule that ∂ { t l } D j h ( S )( ∂ ∆ S ) = j X i =1 D j h ( S )( ∂ ∆ ( i ) S ) + D j +1 h ( S )( ∂ ˜∆ S ) . Substituting this in (6.7) easily yields ∂ T ′ h ( S ) = m +1 X j =1 X ∆ ∈D j,T ′ D j h ( S )( ∂ ∆ S ) . Next we derive upper bounds on k ∂ T S k , k ∂ T R k and k ∂ T L k for T ⊂ { t , . . . , t k } . Denote δ i := k W i − I k , i = 1 , . . . , k. Lemma 16.
For all T ⊂ {t_1, …, t_k},

‖∂_T R‖ ≤ ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i), (6.8)

‖∂_T L‖ ≤ ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i) (6.9)

and

‖∂_T S‖ ≤ 2^k ‖Σ‖ ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i)². (6.10)

Remark 9.
The bounds of the lemma hold for T = ∅ with the obvious convention that in this case ∏_{t_i∈T} a_i = 1.

Proof.
Observe that (∂/∂t_i) V_i(t_i) = W_i^{1/2} − I. Let B_i^0 := V_i(t_i) and B_i^1 := W_i^{1/2} − I. For R = V_1(t_1) ⋯ V_k(t_k), we have ∂_T R = ∏_{i=1}^k B_i^{I_T(t_i)} and

‖∂_T R‖ ≤ ∏_{t_i∈T} ‖W_i^{1/2} − I‖ ∏_{t_i∉T} ‖V_i(t_i)‖.

Note that, due to the elementary inequality |√x − 1| ≤ |x − 1|, x ≥ 0, we have ‖W_i^{1/2} − I‖ ≤ ‖W_i − I‖ = δ_i and ‖V_i(t_i)‖ ≤ 1 + ‖W_i^{1/2} − I‖ ≤ 1 + ‖W_i − I‖ = 1 + δ_i. Therefore,

‖∂_T R‖ ≤ ∏_{t_i∈T} δ_i ∏_{t_i∉T} (1+δ_i) = ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i),

which proves (6.8). Similarly, we get (6.9). Note that, by the product rule,

∂_T S = ∂_T (LΣR) = Σ_{T′⊂T} (∂_{T′} L) Σ (∂_{T∖T′} R).

Therefore,

‖∂_T S‖ ≤ ‖Σ‖ Σ_{T′⊂T} ‖∂_{T′} L‖ ‖∂_{T∖T′} R‖ ≤ ‖Σ‖ Σ_{T′⊂T} ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i)² = 2^{|T|} ‖Σ‖ ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i)² ≤ 2^k ‖Σ‖ ∏_{t_i∈T} (δ_i/(1+δ_i)) ∏_{i=1}^k (1+δ_i)²,

proving (6.10).

Lemma 17.
Suppose that, for some ≤ m ≤ k, h = Dg ∈ L ∞ ( C + ( H ); B sa ( H )) is m times differentiable with uniformly bounded continuous derivatives D j h, j = 1 , . . . , m. Then for all T ⊂ { t , . . . , t k } with | T | = m k ∂ T h ( S ) k ≤ m ( k + m +1) max ≤ j ≤ m k D j h k L ∞ ( k Σ k m ∨ k Y i =1 (1 + δ i ) m Y t i ∈ T δ i δ i . (6.11)0 VladimirKoltchinskii Proof.
Assume that m ≥ (for m = 0 , the bound of the lemma is trivial). Let ∆ =(∆ , . . . , ∆ j ) ∈ D j,T , j ≤ m. Note that k D j h ( S )( ∂ ∆ S, . . . , ∂ ∆ j S ) k ≤ k D j h ( S ) kk ∂ ∆ S k . . . k ∂ ∆ j S k≤ k D j h ( S ) k kj k Σ k j j Y l =1 Y t i ∈ ∆ l δ i δ i k Y i =1 (1 + δ i ) j = k D j h ( S ) k kj k Σ k j Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) j . Using Lemma 15, we get k ∂ T h ( S ) k ≤ m X j =1 X ∆ ∈D j,T k D j h ( S )( ∂ ∆ S ) k≤ m X j =1 card( D j,T ) k D j h ( S ) k kj k Σ k j k Y i =1 (1 + δ i ) j Y t i ∈ T δ i δ i . Note that the number of all functions on T with values in { , . . . , j } is equal to j m and,clearly, card( D j,T ) ≤ j m . Therefore, k ∂ T h ( S ) k ≤ m X j =1 j m k D j h ( S ) k kj k Σ k j k Y i =1 (1 + δ i ) j Y t i ∈ T δ i δ i ≤ m m +1 km max ≤ j ≤ m k D j h k L ∞ ( k Σ k ∨ k Σ k m ) k Y i =1 (1 + δ i ) m Y t i ∈ T δ i δ i ≤ m ( k + m +1) max ≤ j ≤ m k D j h k L ∞ ( k Σ k ∨ k Σ k m ) k Y i =1 (1 + δ i ) m Y t i ∈ T δ i δ i , (6.12)which easily implies bound (6.11).Next we bound partial derivatives of the function Σ / Lh ( S ) R Σ / (with S = S ( t , . . . , t k ) ,L = L ( t , . . . , t k ) , R = R ( t , . . . , t k ) and h = Dg ). Lemma 18.
Assume that d ≤ n and k ≤ n. Suppose h = Dg ∈ L ∞ ( C + ( H ); B sa ( H )) is k times differentiable with uniformly bounded continuous derivatives D j h, j = 1 , . . . , k. Then k ∂ { t ,...,t k } Σ / Rh ( S ) L Σ / k ≤ k k (2 k +1) max ≤ j ≤ k k D j h k L ∞ ( k Σ k k +1 ∨ k Σ k ) k Y i =1 (1 + δ i ) k +1 δ i . (6.13)stimationoffunctionalsofcovarianceoperators 51 Proof.
Note that ∂ { t ,...,t k } Σ / Rh ( S ) L Σ / = X T ,T ,T (cid:16) Σ / ( ∂ T R )( ∂ T h ( S ))( ∂ T L )Σ / (cid:17) , (6.14)where the sum is over all the partitions of the set { t , . . . , t k } into disjoint subsets T , T , T . The number of such partitions is equal to k . We have k Σ / ( ∂ T R )( ∂ T h ( S ))( ∂ T L )Σ / k ≤ k Σ kk ∂ T L kk ∂ T h ( S ) kk ∂ T R k . (6.15)Assume | T | = m , | T | = m , | T | = m . It follows from Lemma 17 that k ∂ T h ( S ) k ≤ m ( k + m +1) max ≤ j ≤ m k D j h k L ∞ ( k Σ k m ∨ k Y i =1 (1 + δ i ) m Y t i ∈ T δ i δ i . On the other hand, by (6.8) and (6.9), we have k ∂ T R k ≤ Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) and k ∂ T L k ≤ Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) . It follows from these bounds and (6.15) that k Σ / ( ∂ T R )( ∂ T h ( S ))( ∂ T L )Σ / k≤ k Σ k m ( k + m +1) max ≤ j ≤ m k D j h k L ∞ ( k Σ k m ∨ k Y i =1 (1 + δ i ) m +2 Y t i ∈ T ∪ T ∪ T δ i δ i ≤ k (2 k +1) max ≤ j ≤ k k D j h k L ∞ ( k Σ k k +1 ∨ k Σ k ) k Y i =1 (1 + δ i ) k +2 k Y i =1 δ i δ i = 2 k (2 k +1) max ≤ j ≤ k k D j h k L ∞ ( k Σ k k +1 ∨ k Σ k ) k Y i =1 (1 + δ i ) k +1 δ i . Since the number of terms in the sum in the right hand side of (6.14) is equal to k , weeasily get that bound (6.13) holds.We are now in a position to prove Theorem 8. Proof.
We use representation (5.15) to get kB k D g (Σ) k ≤ Z · · · Z E k ∂ { t ,...,t k } Σ / Rh ( S ) L Σ / k dt . . . dt k (6.16)Using bounds (6.13), this yields kB k D g (Σ) k ≤ k k (2 k +1) max ≤ j ≤ k k D j h k L ∞ ( k Σ k k +1 ∨ k Σ k ) E k Y i =1 (1 + δ i ) k +1 δ i . (6.17)2 VladimirKoltchinskiiNote that E k Y i =1 (1 + δ i ) k +1 δ i = k Y i =1 E (1 + δ i ) k +1 δ i = (cid:16) E (1 + k W − I k ) k +1 k W − I k (cid:17) k and E (1 + k W − I k ) k +1 k W − I k = 2 k +1 E (cid:18) k W − I k (cid:19) k +1 k W − I k≤ k +1 E k W − I k k +1 k W − I k = 2 k (cid:16) E k W − I k + E k W − I k k +2 (cid:17) . Using bound (1.5), we get that with some constant C ≥ E k W − I k ≤ E / (2 k +2) k W − I k k +2 ≤ C (cid:18)r dn _ r kn (cid:19) , which implies that E (1 + k W − I k ) k +1 k W − I k ≤ k (cid:20) C (cid:18)r dn _ r kn (cid:19) + C k +21 (cid:18) dn _ kn (cid:19) k +1 (cid:21) ≤ k C k +21 (cid:18)r dn _ r kn (cid:19) and E k Y i =1 (1 + δ i ) k +1 δ i ≤ k C k +2 k (cid:18) dn _ kn (cid:19) k/ . (6.18)We substitute this bound in (6.17) to get kB k D g (Σ) k ≤ k k + k C k +2 k max ≤ j ≤ k k D j h k L ∞ ( k Σ k k +1 ∨ k Σ k ) (cid:18) dn _ kn (cid:19) k/ , (6.19)which implies the result. D g k (Σ) . Our goal in this section is to show that, for a smooth orthogonally invariant function g, the function D g k (Σ) satisfies Assumption 3. This result will be used in the next section toprove normal approximation bounds for D g k ( ˆΣ) . We will assume in what follows that g is defined and properly smooth on the whole space B sa ( H ) of self-adjoint operators in H . Recall that, by formula (5.15), B k D g (Σ) = E Z · · · Z ∂ k ϕ ( t , . . . , t k ) ∂t . . . ∂t k dt . . . dt k , stimationoffunctionalsofcovarianceoperators 53where ϕ ( t , . . . , t k ) := Σ / R ( t , . . . , t k ) Dg ( S ( t , . . . , t k )) L ( t , . . . , t k )Σ / , ( t , . . . , t k ) ∈ [0 , k . 
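The key stochastic ingredient in the proof of Theorem 8 is the moment bound (1.5) for the sample covariance W of n standard Gaussian vectors in dimension d, which in the regime k ≲ d gives E‖W − I‖ ≲ √(d/n). A quick Monte Carlo sketch of this rate (illustrative parameter values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, reps = 20, 2000, 50

norms = []
for _ in range(reps):
    Z = rng.standard_normal((n, d))   # n i.i.d. standard normal vectors in R^d
    W = Z.T @ Z / n                   # sample covariance, E W = I_d
    norms.append(np.linalg.norm(W - np.eye(d), 2))  # operator (spectral) norm
avg = np.mean(norms)

rate = np.sqrt(d / n)  # = 0.1 here
# Classical random-matrix asymptotics suggest avg ~ 2*sqrt(d/n) + d/n; we only
# check the order of magnitude predicted by bound (1.5).
assert 0.5 * rate < avg < 4 * rate
```

The observed average sits at a small constant multiple of √(d/n), consistent with the (d/n)^{k/2} factors accumulated in (6.18) and (6.19).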
Let δ ∈ (0 , / and let γ : R R be a C ∞ function such that: ≤ γ ( u ) ≤ √ u, u ≥ , γ ( u ) = √ u, u ∈ h δ, δ i , supp( γ ) ⊂ h δ , δ i and k γ k B ∞ , . log(2 /δ ) √ δ . For instance, one can take γ ( u ) := λ ( u/δ ) √ u (1 − λ ( δu/ , where λ is a C ∞ non-decreasing function with values in [0 , , λ ( u ) = 0 , u ≤ / and λ ( u ) = 1 , u ≥ . Thebound on the norm k γ k B ∞ , could be proved using equivalent definition of Besov normsin terms of difference operators (see [Tr], Section 2.5.12). Clearly, for all Σ ∈ C + ( H ) , k γ (Σ) k ≤ k Σ k / and for all Σ ∈ C + ( H ) with σ (Σ) ⊂ h δ, δ i , we have γ (Σ) = Σ / . Since we need further differentiation of B k D g (Σ) with respect to Σ , it will be conve-nient to introduce (for given H, H ′ ∈ B sa ( H ) ) the following function: φ ( t , . . . , t k ; s , s ) := γ ( ¯Σ( s , s )) R ( t , . . . , t k ) Dg (cid:16) L ( t , . . . , t k ) ¯Σ( s , s ) R ( t , . . . , t k ) (cid:17) L ( t , . . . , t k ) γ ( ¯Σ( s , s )) , where ¯Σ( s , s ) = Σ + s H + s ( H ′ − H ) , s , s ∈ R . Note that ϕ ( t , . . . , t k ) = φ ( t , . . . , t k , , . By the argument already used at the beginning of the proof of Lemma15, if h := Dg is k times continuously differentiable, then the function φ is also k timescontinuously differentiable.For simplicity, we write in what follows B k (Σ) := B k D g (Σ) and D k (Σ) := D g k (Σ) . Clearly, D k (Σ) := P kj =0 ( − j B j (Σ) and the following representation holds: B k (Σ) := E Z · · · Z ∂ k φ ( t , . . . , t k , , ∂t . . . ∂t k dt . . . dt k , k ≥ . For k = 0 , we have B (Σ) := D g (Σ) . Denote γ β,k (Σ; u ) := ( k Σ k ∨ u ∨ k +1 / ( u ∨ u β ) , u > , β ∈ [0 , , k ≥ . Recall definition (2.27) of C s -norms of smooth operator valued functions defined in anopen set G ⊂ B sa ( H ) . It is assumed in this section that G = B sa ( H ) and we will write k · k C s instead of k · k C s ( B sa ( H )) . Theorem 9.
Suppose that, for some k ≤ d, g is k+2 times continuously differentiable in B_sa(H) and, for some β ∈ (0,1], ‖Dg‖_{C^{k+1+β}} < ∞. In addition, suppose that g ∈ L_O^∞(C_+(H)) and σ(Σ) ⊂ [δ, 1/δ]. Then, for some constant C ≥ 1 and for all H, H′ ∈ B_sa(H),

‖S_{B_k}(Σ; H′) − S_{B_k}(Σ; H)‖ ≤ C^k (log²(2/δ)/δ) ‖Dg‖_{C^{k+1+β}} (d/n)^{k/2} γ_{β,k}(Σ; ‖H‖ ∨ ‖H′‖) ‖H′ − H‖. (7.1)

Corollary 6.
Suppose that, for some k ≤ d, g is k + 2 times continuously differentiableand, for some β ∈ (0 , , k Dg k C k +1+ β < ∞ . Suppose also that g ∈ L O ∞ ( C + ( H )) , d ≤ n/ and that σ (Σ) ⊂ h δ, δ i . Then, for some constant C ≥ and for all H, H ′ ∈ B sa ( H ) k S D k (Σ; H ′ ) − S D k (Σ; H ) k ≤ C k log (2 /δ ) δ k Dg k C k +1+ β γ β,k (Σ; k H k ∨ k H ′ k ) k H ′ − H k . (7.2) Proof.
Indeed, k S D k (Σ; H ′ ) − S D k (Σ; H ) k ≤ k X j =0 k S B j (Σ; H ′ ) − S B j (Σ; H ) k≤ C k log (2 /δ ) δ k Dg k C k +1+ β k X j =0 (cid:18) dn (cid:19) j/ γ β,k (Σ; k H k ∨ k H ′ k ) k H ′ − H k≤ C k log (2 /δ ) δ k Dg k C k +1+ β γ β,k (Σ; k H k ∨ k H ′ k ) k H ′ − H k , implying the bound of the corollary (after proper adjustment of the value of C ).We now give the proof of Theorem 9. Proof.
Note that S B k (Σ; H ′ ) − S B k (Σ; H ) = DB k (Σ+ H ; H ′ − H ) − DB k (Σ; H ′ − H )+ S B k (Σ+ H ; H ′ − H ) , so, we need to bound k DB k (Σ + H ; H ′ − H ) − DB k (Σ; H ′ − H ) k and k S B k (Σ + H ; H ′ − H ) k separately. To this end, note that B k (Σ + s H ) = E Z · · · Z ∂ k φ ( t , . . . , t k , s , ∂t . . . ∂t k dt . . . dt k ,DB k (Σ; H ) = E Z · · · Z ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s dt . . . dt k , stimationoffunctionalsofcovarianceoperators 55and DB k (Σ + s H ; H ′ − H ) = E Z · · · Z ∂ k +1 φ ( t , . . . , t k , s , ∂t . . . ∂t k ∂s dt . . . dt k . The last two formulas hold provided that g is k + 2 times continuously differentiablewith uniformly bounded derivatives D j g, j = 0 , . . . , k + 2 and, as a consequence, thefunction φ ( t , . . . , t k , s , s ) is k + 1 times continuously differentiable (the proof of thisfact is similar to the proof of differentiability of function φ ( t , . . . , t k ) , see the proofs ofProposition 4 and Lemma 15).As a consequence, DB k (Σ + H ; H ′ − H ) − DB k (Σ; H ′ − H )= E Z · · · Z (cid:20) ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s − ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s (cid:21) dt . . . dt k (7.3)and S B k (Σ + H ; H ′ − H )= E Z · · · Z (cid:20) ∂ k φ ( t , . . . , t k , , ∂t . . . ∂t k − ∂ k φ ( t , . . . , t k , , ∂t . . . ∂t k − ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s (cid:21) dt . . . dt k = E Z · · · Z Z (cid:20) ∂ k +1 φ ( t , . . . , t k , , s ) ∂t . . . ∂t k ∂s − ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s (cid:21) ds dt . . . dt k . (7.4)The next two lemmas provide upper bounds on k DB k (Σ + H ; H ′ − H ) − DB k (Σ; H ′ − H ) k and k S B k (Σ + H ; H ′ − H ) k Lemma 19.
Suppose that, for some k ≤ d, g ∈ L_O^∞(C_+(H)) is k+2 times continuously differentiable and, for some β ∈ (0,1], ‖Dg‖_{C^{k+1+β}} < ∞. In addition, suppose that σ(Σ) ⊂ [δ, 1/δ]. Then, for some constant C > 0 and for all H, H′ ∈ B_sa(H),

‖DB_k(Σ+H; H′−H) − DB_k(Σ; H′−H)‖ ≤ C^k (log²(2/δ)/δ) ‖Dg‖_{C^{k+1+β}} ((‖Σ‖+‖H‖)^{k+1/2} ∨ 1) (d/n)^{k/2} (‖H‖ ∨ ‖H‖^β) ‖H′−H‖. (7.5)

Lemma 20.
Suppose that, for some k ≤ d, g ∈ L_O^∞(C_+(H)) is k+2 times continuously differentiable and, for some β ∈ (0,1], ‖Dg‖_{C^{k+1+β}} < ∞. In addition, suppose that σ(Σ) ⊂ [δ, 1/δ]. Then, for some constant C > 0 and for all H, H′ ∈ B_sa(H),

‖S_{B_k}(Σ+H; H′−H)‖ ≤ C^k (log²(2/δ)/δ) ‖Dg‖_{C^{k+1+β}} ((‖Σ‖+‖H‖+‖H′‖)^{k+1/2} ∨ 1) (d/n)^{k/2} (‖H′−H‖^β ∨ ‖H′−H‖). (7.6)

Lemma 21.
Suppose that, for some k ≤ d, g ∈ L_O^∞(C_+(H)) is k+2 times differentiable with uniformly bounded continuous derivatives D^j g, j = 0, …, k+2. In addition, suppose that σ(Σ) ⊂ [δ, 1/δ]. Then, for some constant C > 0 and for all H ∈ B_sa(H),

‖DB_k(Σ; H)‖ ≤ C^k (log²(2/δ)/δ) ‖Dg‖_{C^{k+1}} (‖Σ‖^{k+1/2} ∨ 1) (d/n)^{k/2} ‖H‖. (7.7)

We give below a proof of Lemma 19. The proofs of Lemmas 20 and 21 are based on a similar approach.

Proof.
First, we derive an upper bound on the difference ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s − ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s in the right hand side of (7.3). To this end, note that by the product rule ∂ k +1 φ ( t , . . . , t k , s , s ) ∂t . . . ∂t k ∂s = ∂∂s X T ,T ,T γ ( ¯Σ)( ∂ T R )( ∂ T h ( L ¯Σ R ))( ∂ T L ) γ ( ¯Σ) , where h = Dg and the sum extends to all partitions T , T , T of the set of variables { t , . . . , t k } . We can further write ∂ k +1 φ ( t , . . . , t k , s , s ) ∂t . . . ∂t k ∂s = X T ,T ,T h ( ∂ { s } γ ( ¯Σ))( ∂ T R )( ∂ T h ( L ¯Σ R ))( ∂ T L ) γ ( ¯Σ) + γ ( ¯Σ)( ∂ T R )( ∂ T ∪{ s } h ( L ¯Σ R ))( ∂ T L ) γ ( ¯Σ)+ γ ( ¯Σ)( ∂ T R )( ∂ T h ( L ¯Σ R ))( ∂ T L )( ∂ { s } γ ( ¯Σ)) i . (7.8)Observe that ∂ { s } γ ( ¯Σ) = Dγ ( ¯Σ; H ′ − H ) and deduce from (7.8) that ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s − ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s = X T ,T ,T [ A + A + · · · + A ] , (7.9)where A := [ Dγ (Σ + H ; H ′ − H ) − Dγ (Σ; H ′ − H )]( ∂ T R )( ∂ T h ( L ¯Σ , R ))( ∂ T L ) γ ( ¯Σ , ) ,A := Dγ (Σ; H ′ − H )( ∂ T R )( ∂ T h ( L (Σ + H ) R ) − ∂ T h ( L Σ R ))( ∂ T L ) γ ( ¯Σ , ) ,A := Dγ (Σ; H ′ − H )( ∂ T R )( ∂ T h ( L Σ R ))( ∂ T L )( γ (Σ + H ) − γ (Σ)) ,A := ( γ (Σ + H ) − γ (Σ))( ∂ T R )( ∂ T ∪{ s } h ( L ¯Σ , R ))( ∂ T L ) γ ( ¯Σ , ) A := γ (Σ)( ∂ T R )( ∂ T ∪{ s } h ( L (Σ + H ) R ) − ∂ T ∪{ s } h ( L Σ R ))( ∂ T L ) γ ( ¯Σ , ) ,A := γ (Σ)( ∂ T R )( ∂ T ∪{ s } h ( L Σ R ))( ∂ T L )( γ (Σ + H ) − γ (Σ)) ,A := ( γ (Σ + H ) − γ (Σ))( ∂ T R )( ∂ T h ( L ¯Σ , R ))( ∂ T L ) Dγ ( ¯Σ , ; H ′ − H ) A := γ (Σ)( ∂ T R )( ∂ T h ( L (Σ + H ) R ) − ∂ T h ( L Σ R ))( ∂ T L ) Dγ ( ¯Σ , ; H ′ − H ) stimationoffunctionalsofcovarianceoperators 57and A := γ (Σ)( ∂ T R )( ∂ T h ( L Σ R ))( ∂ T L )( Dγ (Σ + H ; H ′ − H ) − Dγ (Σ; H ′ − H )) . To bound the norms of operators A , A , . . . , A , we need several lemmas. We intro-duce here some notation used in their proofs. Recall that for a partition ∆ = (∆ , . . . 
, ∆ j ) of the set { t , . . . , t k } ∂ ∆ ( L Σ R ) = ( ∂ ∆ ( L Σ R ) , . . . , ∂ ∆ j ( L Σ R )) . We will need some transformations of ∂ ∆ ( L Σ R ) . In particular, for i = 1 , . . . , j and H ∈B sa ( H ) , denote ∂ ∆ ( L Σ R )[ i : Σ → H ] = ( ∂ ∆ ( L Σ R ) , . . . , ∂ ∆ i − ( L Σ R ) , ∂ ∆ i ( LHR ) , ∂ ∆ i +1 ( L Σ R ) , . . . , ∂ ∆ j ( L Σ R )) . We will also write ∂ ∆ ( L Σ R )[ i : Σ → H ; i + 1 , . . . , j : Σ → Σ + H ]= ( ∂ ∆ ( L Σ R ) , . . . , ∂ ∆ i − ( L Σ R ) , ∂ ∆ i ( LHR ) , ∂ ∆ i +1 ( L (Σ+ H ) R ) , . . . , ∂ ∆ j ( L (Σ+ H ) R )) . In addition, the following notation will be used: ∂ ∆ ( L Σ R ) ⊔ B = ( ∂ ∆ ( L Σ R ) , . . . , ∂ ∆ j ( L Σ R ) , B ) The meaning of other similar notation should be clear from the context. Finally, recall that δ i = k W i − I k , i ≥ . Lemma 22.
Suppose that, for some ≤ m ≤ k, function h ∈ L ∞ ( C + ( H ); B sa ( H )) is m + 1 times differentiable with uniformly bounded continuous derivatives D j h, j =1 , . . . , m + 1 . For all T ⊂ { t , . . . , t k } with | T | = m, k ∂ T h ( L (Σ + H ) R ) − ∂ T h ( L Σ R ) k (7.10) ≤ m ( k + m +2)+1 max ≤ j ≤ m +1 k D j h k L ∞ (( k Σ k + k H k ) m ∨ Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) m +2 k H k . Proof.
By Lemma 15, ∂ T h ( L (Σ + H ) R ) − ∂ T h ( L Σ R ) = (7.11) m X j =1 X ∆ ∈D j,T h D j h ( L (Σ + H ) R )( ∂ ∆ ( L (Σ + H ) R )) − D j h ( L Σ R )( ∂ ∆ ( L Σ R )) i . D j h ( L (Σ + H ) R )( ∂ ∆ ( L (Σ + H ) R )) − D j h ( L Σ R )( ∂ ∆ ( L Σ R ))= j X i =1 D j h ( L (Σ + H ) R )( ∂ ∆ ( L Σ R )[ i : Σ → H ; i + 1 , . . . , j : Σ → Σ + H ])+ ( D j h ( L (Σ + H ) R ) − D j h ( L Σ R ))( ∂ ∆ ( L Σ R )) . The following bounds hold for all ≤ i ≤ j : (cid:13)(cid:13)(cid:13) D j h ( L (Σ + H ) R )( ∂ ∆ ( L Σ R )[ i : Σ → H ; i + 1 , . . . , j : Σ → Σ + H ]) (cid:13)(cid:13)(cid:13) ≤ k D j h ( L (Σ + H ) R ) k Y ≤ l
Suppose that, for some ≤ m ≤ k, h ∈ L ∞ ( C + ( H ); B sa ( H )) is m + 1 timesdifferentiable with uniformly bounded continuous derivatives D j h, j = 1 , . . . , m +1 . Thenfor some constant
C > and for all T ⊂ { t , . . . , t k } with | T | = m and all s ∈ [0 , , k ∂ T ∪{ s } h ( L ¯Σ s , R ) k (7.12) ≤ m ( k + m +2)+1 max ≤ j ≤ m +1 k D j h k L ∞ (( k Σ k + k H k ) m ∨ Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) m +2 k H ′ − H k . Proof.
By Lemma 15, ∂ T ∪{ s } h ( L ¯Σ R ) = m X j =1 X ∆ ∈D j,T ∂ { s } D j h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R )) . (7.13)Next, we have ∂ { s } D j h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R )) = D j +1 h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R ) ⊔ ∂ { s } ( L ¯Σ R ))+ j X i =1 D j h ( L ¯Σ R )( ∂ ∆ (( L ¯Σ R )[ i : ∂ ∆ i ( L ¯Σ R ) → ∂ ∆ i ∪{ s } ( L ¯Σ R )]) . Note that ∂ { s } ( L ¯Σ R ) = L ( H ′ − H ) R and ∂ ∆ i ∪{ s } ( L ¯Σ R ) = ∂ ∆ i ( L ( H ′ − H ) R ) , implying ∂ { s } D j h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R )) = D j +1 h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R ) ⊔ L ( H ′ − H ) R )+ j X i =1 D j h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R )[ i : ¯Σ → H ′ − H ]) . (7.14)The following bounds hold: (cid:13)(cid:13)(cid:13) D j +1 h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R ) ⊔ L ( H ′ − H ) R ) (cid:13)(cid:13)(cid:13) ≤ k D j +1 h k L ∞ j Y i =1 k ∂ ∆ i ( L ¯Σ R ) kk L kk R kk H ′ − H k and (cid:13)(cid:13)(cid:13) D j h ( L ¯Σ R )( ∂ ∆ ( L ¯Σ R )[ i : ¯Σ → H ′ − H ]) (cid:13)(cid:13)(cid:13) ≤ k D j h k L ∞ Y l = i k ∂ ∆ l ( L ¯Σ R ) kk ∂ ∆ i ( L ( H ′ − H ) R ) k . The rest of the proof is based on the bounds almost identical to the ones in the proof ofLemma 22.0 VladimirKoltchinskii
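The partition sums appearing in Lemmas 15, 22 and 23 are operator-valued versions of the Faà di Bruno formula. In the scalar case with |T| = 2, formula (6.6) reduces to ∂²h(S)/∂t₁∂t₂ = h′(S)·∂²S/∂t₁∂t₂ + h″(S)·(∂S/∂t₁)(∂S/∂t₂), corresponding to the two partitions {{t₁,t₂}} and {{t₁},{t₂}}. A quick numerical check of this scalar instance (illustrative test functions, not the operator-valued statement itself):

```python
import math

# Scalar instance of (6.6) with T = {t1, t2}:
#   d^2 h(S)/dt1 dt2 = h'(S) * S_{t1 t2} + h''(S) * S_{t1} * S_{t2}.
S = lambda t1, t2: (1 + t1) * (2 + t2)  # hypothetical polynomial "S", as in the text
h = math.exp                            # h = h' = h'' for the exponential

t1, t2 = 0.3, 0.4
s = S(t1, t2)
# S_{t1 t2} = 1, S_{t1} = 2 + t2, S_{t2} = 1 + t1 for this S.
rhs = h(s) * 1.0 + h(s) * (2 + t2) * (1 + t1)

# Central finite-difference approximation of the mixed partial of h(S(t1, t2)).
eps = 1e-4
f = lambda a, b: h(S(a, b))
lhs = (f(t1 + eps, t2 + eps) - f(t1 + eps, t2 - eps)
       - f(t1 - eps, t2 + eps) + f(t1 - eps, t2 - eps)) / (4 * eps**2)

assert abs(lhs - rhs) / abs(rhs) < 1e-4
```

The same bookkeeping over partitions of T, with D^j h an operator-valued j-linear form, is what drives the induction in these lemmas.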
Lemma 24.
Suppose that, for some ≤ m ≤ k, h ∈ L ∞ ( C + ( H ); B sa ( H )) is m + 2 timesdifferentiable with uniformly bounded continuous derivatives D j h, j = 1 , . . . , m + 2 . Forsome constant
C > and for all T ⊂ { t , . . . , t k } with | T | = m, (cid:13)(cid:13)(cid:13) ∂ T ∪{ s } h ( L (Σ + H ) R ) − ∂ T ∪{ s } h ( L Σ R ) (cid:13)(cid:13)(cid:13) (7.15) ≤ C k ( m +1) max ≤ j ≤ m +2 k D j h k L ∞ (( k Σ k + k H k ) m ∨ Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) m +4 k H kk H ′ − H k . Moreover, if for some ≤ m ≤ k, h ∈ L ∞ ( C + ( H ); B sa ( H )) is m + 1 times continuouslydifferentiable and, for some β ∈ (0 , , k h k C m +1+ β < ∞ , then (cid:13)(cid:13)(cid:13) ∂ T ∪{ s } h ( L (Σ + H ) R ) − ∂ T ∪{ s } h ( L Σ R ) (cid:13)(cid:13)(cid:13) (7.16) ≤ C k ( m +1) k h k C m +1+ β (( k Σ k + k H k ) m ∨ Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) m +4 ( k H k ∨ k H k β ) k H ′ − H k . Proof.
By (7.13), ∂ T ∪{ s } h ( L (Σ + H ) R ) − ∂ T ∪{ s } h ( L Σ R ) (7.17) = m X j =1 X ∆ ∈D j,T h ∂ { s } D j h ( L ¯Σ , R )( ∂ ∆ ( L ¯Σ , R )) − ∂ { s } D j h ( L ¯Σ , R )( ∂ ∆ ( L ¯Σ , R )) i and by (7.14), ∂ { s } D j h ( L ¯Σ , R )( ∂ ∆ ( L ¯Σ , R )) − ∂ { s } D j h ( L ¯Σ , R )( ∂ ∆ ( L ¯Σ , R ))= j X i =1 D j +1 h ( L (Σ + H ) R )( ∂ ∆ ( L Σ R )[ i : Σ → H ; i + 1 , . . . , j : Σ → Σ + H ] ⊔ L ( H ′ − H ) R )+ [ D j +1 h ( L (Σ + H ) R ) − D j +1 h ( L Σ R )]( ∂ ∆ ( L Σ R ) ⊔ L ( H ′ − H ) R )+ j X i =1 X i ′ = i D j h ( L (Σ + H ) R )( ∂ ∆ ( L Σ R )[ i : Σ → H ′ − H ; i ′ : Σ → H ; l > i ′ , l = i : Σ → Σ + H ])+ j X i =1 [ D j h ( L (Σ + H ) R ) − D j h ( L Σ R )]( ∂ ∆ ( L Σ R )[ i : Σ → H ′ − H ]) . (7.18)stimationoffunctionalsofcovarianceoperators 61Similarly to the bounds in the proof of Lemma 22, we get (cid:13)(cid:13)(cid:13) D j +1 h ( L (Σ + H ) R )( ∂ ∆ ( L Σ R )[ i : Σ → H ; i + 1 , . . . , j : Σ → Σ + H ] ⊔ L ( H ′ − H ) R ) (cid:13)(cid:13)(cid:13) ≤ kj k D j +1 h k L ∞ ( k Σ k + k H k ) j − Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) j +2 k H kk H ′ − H k , (cid:13)(cid:13)(cid:13) [ D j +1 h ( L (Σ + H ) R ) − D j +1 h ( L Σ R )]( ∂ ∆ ( L Σ R ) ⊔ L ( H ′ − H ) R ) (cid:13)(cid:13)(cid:13) ≤ kj k D j +2 h k L ∞ k Σ k j Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) j +4 k H kk H ′ − H k , (cid:13)(cid:13)(cid:13) D j h ( L (Σ + H ) R )( ∂ ∆ ( L Σ R )[ i : Σ → H ′ − H ; i ′ : Σ → H ; l > i ′ , l = i : Σ → Σ + H ]) (cid:13)(cid:13)(cid:13) ≤ kj k D j h k L ∞ ( k Σ k + k H k ) j − Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) j k H kk H ′ − H k and (cid:13)(cid:13)(cid:13) [ D j h ( L (Σ + H ) R ) − D j h ( L Σ R )]( ∂ ∆ ( L Σ R )[ i : Σ → H ′ − H ]) (cid:13)(cid:13)(cid:13) ≤ kj k D j +1 h k L ∞ k Σ k j − Y t i ∈ T δ i δ i k Y i =1 (1 + δ i ) j k H kk H ′ − H k . These bounds along with formulas (7.17), (7.18) imply that bound (7.15) holds. The proofof bound (7.16) is similar.We now get back to bounding operators A , . . . , A in the right hand side of (7.9). 
Iteasily follows from lemmas 16, 17, 22, 23 and 24 as well as from the bounds k γ (Σ) k ≤ k Σ k / , k γ (Σ + H ) − γ (Σ) k ≤ k γ k B ∞ , k H k . log(2 /δ ) √ δ k H k , k Dγ (Σ; H ) k ≤ k γ k B ∞ , k H k . log(2 /δ ) √ δ k H k and k Dγ (Σ + H ; H ′ − H ) − Dγ (Σ; H ′ − H ) k . k γ k B ∞ , k H kk H ′ − H k . log (2 /δ ) δ k H kk H ′ − H k that for some constant C > and for all l = 1 , . . . , | A l | ≤ C k log (2 /δ ) δ k Dg k C k +1+ β (( k Σ k + k H k ) k +1 / ∨ k Y i =1 δ i (1 + δ i ) k +5 ( k H k ∨ k H k β ) k H ′ − H k . (cid:13)(cid:13)(cid:13)(cid:13) ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s − ∂ k +1 φ ( t , . . . , t k , , ∂t . . . ∂t k ∂s (cid:13)(cid:13)(cid:13)(cid:13) (7.19) ≤ C k log (2 /δ ) δ k Dg k C k +1+ β (( k Σ k + k H k ) k +1 / ∨ k Y i =1 δ i (1 + δ i ) k +5 ( k H k ∨ k H k β ) k H ′ − H k with some constant C > . Similarly to (6.18), we have that for k ≤ d with some constant C ≥ E k Y i =1 δ i (1 + δ i ) k +5 ≤ C k (cid:18) dn (cid:19) k/ . Using this together with (7.19) to bound on the expectation in (7.3) yields bound (7.5).Theorem 9 immediately follows from lemmas 19 and 20.We will now derive a bound on the bias of estimator D g k ( ˆΣ) that improves the boundsof Section 6 under stronger assumptions on smoothness of g. Corollary 7.
Suppose g ∈ L O ∞ ( C + ( H )) is k + 2 times continuously differentiable for some k ≤ d ≤ n and, for some β ∈ (0 , , k Dg k C k +1+ β < ∞ . In addition, suppose that forsome δ > σ (Σ) ⊂ h δ, δ i . Then, for some constant
C > , k E Σ D g k ( ˆΣ) − D g (Σ) k ≤ C k log (2 /δ ) δ k Dg k C k +1+ β ( k Σ k ∨ k +3 / k Σ k (cid:18) dn (cid:19) ( k +1+ β ) / . (7.20) Proof.
First note that B k +1 D g (Σ) = E Σ B k ( ˆΣ) − B k (Σ)= E Σ DB k (Σ; ˆΣ − Σ) + E Σ S B k (Σ; ˆΣ − Σ) = E Σ S B k (Σ; ˆΣ − Σ) . It follows from bound (7.1) (with H ′ = ˆΣ − Σ and H = 0 ) that k S B k (Σ; ˆΣ − Σ) k ≤ C k log (2 /δ ) δ k Dg k C k +1+ β (cid:18) dn (cid:19) k/ γ β,k (Σ; k ˆΣ − Σ k ) k ˆΣ − Σ k . (7.21)Since γ β,k (Σ; k ˆΣ − Σ k ) k ˆΣ − Σ k≤ ( k Σ k ∨ k +1 / ( k ˆΣ − Σ k β + k ˆΣ − Σ k ) + k ˆΣ − Σ k k + β +3 / + k ˆΣ − Σ k k +5 / , stimationoffunctionalsofcovarianceoperators 63we can use the bound E /p k ˆΣ − Σ k p . k Σ k (cid:16)q dn ∨ p pn (cid:17) to get that for some constant C > and for k ≤ d ≤ n E γ β,k (Σ; k ˆΣ − Σ k ) k ˆΣ − Σ k ≤ C k ( k Σ k ∨ k +3 / k Σ k (cid:18) dn (cid:19) (1+ β ) / . Therefore, for some constant
C > , kB k +1 D g (Σ) k ≤ E k S B k (Σ; ˆΣ − Σ) k ≤ C k log (2 /δ ) δ k Dg k C k +1+ β ( k Σ k ∨ k +3 / k Σ k (cid:18) dn (cid:19) ( k +1+ β ) / . Since E Σ D g k ( ˆΣ) − D g (Σ) = ( − k B k +1 D g (Σ) , the result follows. In this section, our goal is to prove bounds showing that, for sufficiently smooth orthog-onally invariant functions g, for large enough k and for an operator B with nuclear normbounded by a constant, the distribution of random variables √ n (cid:16) hD g k ( ˆΣ) , B i − hD g (Σ) , B i (cid:17) σ g (Σ; B ) is close to the standard normal distribution when n → ∞ and d = o ( n ) . It will be shownthat this holds true with σ g (Σ; B ) = 2 (cid:13)(cid:13)(cid:13) Σ / ( D D g (Σ)) ∗ B Σ / (cid:13)(cid:13)(cid:13) , (8.1)where ( D D g (Σ)) ∗ is the adjoint operator of D D g (Σ) : h D D g (Σ) H , H i = h H , ( D D g (Σ)) ∗ H i . We will prove the following result.
Theorem 10.
Suppose that, for some s > , g ∈ C s +1 ( B sa ( H )) ∩ L O ∞ ( C + ( H )) is anorthogonally invariant function. Suppose that d ≥ n and, for some α ∈ (0 , ,d ≤ n α . Suppose also that Σ is non-singular and, for a small enough constant c > ,d ≤ cn ( k Σ k ∨ k Σ − k ) . (8.2) Finally, suppose that s > − α and let k be an integer number such that − α < k + 1 + β ≤ s for some β ∈ (0 , . Then, there exists a constant C such that sup x ∈ R (cid:12)(cid:12)(cid:12)(cid:12) P (cid:26) √ n (cid:16) hD g k ( ˆΣ) , B i − hD g (Σ) , B i (cid:17) σ g (Σ; B ) ≤ x (cid:27) − Φ( x ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ C k L g ( B ; Σ) h n − k + β − α ( k +1+ β )2 + n − (1 − α ) β/ p log n i + C √ n , (8.3)4 VladimirKoltchinskii where L g ( B ; Σ) := k B k k Dg k C s σ g (Σ; B ) ( k Σ k ∨ k Σ − k ) log (2( k Σ k ∨ k Σ − k )) k Σ k ( k Σ k ∨ k +3 / . We will also need the following exponential upper bound on the r.v. √ n (cid:16) hD g k (ˆΣ) ,B i−hD g (Σ) ,B i (cid:17) σ g (Σ; B ) . Proposition 7.
Under the assumptions of Theorem 10, there exists a constant C such that,for all t ≥ with probability at least − e − t , (cid:12)(cid:12)(cid:12)(cid:12) √ n (cid:16) hD g k ( ˆΣ) , B i − hD g (Σ) , B i (cid:17) σ g (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ C k ( L g ( B ; Σ) ∨ √ t. (8.4)Our main application is to the problem of estimation of the functional h f (Σ) , B i fora given smooth function f and given operator B. We will use h f k ( ˆΣ) , B i as its estimator,where f k (Σ) := P kj =0 ( − j B j f (Σ) . Denote σ f (Σ; B ) = 2 (cid:13)(cid:13)(cid:13) Σ / Df (Σ; B )Σ / (cid:13)(cid:13)(cid:13) . Theorem 11.
Suppose that, for some s > , f ∈ B s ∞ , ( R ) . Suppose that d ≥ n and,for some α ∈ (0 , , d ≤ n α . Suppose also that Σ is non-singular and, for a small enoughconstant c = c s > , d ≤ cn ( k Σ k ∨ k Σ − k ) . (8.5) Finally, suppose that s > − α and let k be an integer number such that − α < k + 1 + β ≤ s for some β ∈ (0 , . Then, there exists a constant C such that sup x ∈ R (cid:12)(cid:12)(cid:12)(cid:12) P (cid:26) √ n (cid:16) h f k ( ˆΣ) , B i − h f (Σ) , B i (cid:17) σ f (Σ; B ) ≤ x (cid:27) − Φ( x ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ C k M f ( B ; Σ) h n − k + β − α ( k +1+ β )2 + n − (1 − α ) β/ p log n i + C √ n , (8.6) where M f ( B ; Σ) := k B k k f k B s ∞ , σ f (Σ; B ) ( k Σ k ∨ k Σ − k ) s log (2( k Σ k ∨ k Σ − k )) k Σ k ( k Σ k ∨ k +3 / . Proposition 8.
Under the assumptions of Theorem 11, there exists a constant C such that,for all t ≥ with probability at least − e − t , (cid:12)(cid:12)(cid:12)(cid:12) √ n (cid:16) h f k ( ˆΣ) , B i − h f (Σ) , B i (cid:17) σ f (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ C k ( M f ( B ; Σ) ∨ √ t. (8.7)We now turn to the proof of Theorem 10 and Proposition 7.stimationoffunctionalsofcovarianceoperators 65 Proof.
Recall the notation D k (Σ) := D g k (Σ) . For a given operator B, define functionals d k (Σ) := h D k (Σ) , B i , recall that d k ( ˆΣ) − E d k ( ˆΣ) = h D d k (Σ) , ˆΣ − Σ i + S d k (Σ; ˆΣ − Σ) − E S d k (Σ; ˆΣ − Σ) and consider the following representation: √ n (cid:16) hD g k ( ˆΣ) , B i − hD g (Σ) , B i (cid:17) σ g (Σ; B ) = √ n h D d k (Σ) , ˆΣ − Σ i√ kD d k (Σ) k + ζ , (8.8)with the remainder ζ := ζ + ζ + ζ , where ζ := √ n ( h E D g k ( ˆΣ) − D g (Σ) , B i ) σ g (Σ; B ) ,ζ := √ n ( S d k (Σ; ˆΣ − Σ) − E S d k (Σ; ˆΣ − Σ)) σ g (Σ; B ) ,ζ := √ n h D d k (Σ) , ˆΣ − Σ i√ kD d k (Σ) k √ kD d k (Σ) k − σ g (Σ; B ) σ g (Σ; B ) . Step 1 . By Lemma 9, sup x ∈ R (cid:12)(cid:12)(cid:12)(cid:12) P (cid:26) √ n h D d k (Σ) , ˆΣ − Σ i√ kD d k (Σ) k ≤ x (cid:27) − Φ( x ) (cid:12)(cid:12)(cid:12)(cid:12) . (cid:18) kD d k (Σ) k kD d k (Σ) k (cid:19) √ n . kD d k (Σ) kkD d k (Σ) k √ n . √ n . (8.9)Note also that √ n h D d k (Σ) , ˆΣ − Σ i√ kD d k (Σ) k d = P nj =1 P i ≥ λ i ( Z i,j − √ n (cid:18)P i ≥ λ i (cid:19) / , where { Z i,j } are i.i.d. standard normal random variables and { λ i } are the eigenvalues of D d k (Σ) (see the proof of Lemma 9). To provide an upper bound on the right hand side,we use Lemma 11 to get that with probability at least − e − t (cid:12)(cid:12)(cid:12)(cid:12) √ n h D d k (Σ) , ˆΣ − Σ i√ kD d k (Σ) k (cid:12)(cid:12)(cid:12)(cid:12) . √ t ∨ t √ n . (8.10)We will now control separately each of the random variables ζ , ζ , ζ . Step 2 . To bound ζ , we observe that, for δ = k Σ k∨k Σ − k , σ (Σ) ⊂ h δ, δ i and useinequality (7.20) that yields: | ζ | ≤ √ n k E D g k ( ˆΣ) − D g (Σ) kk B k σ g (Σ; B ) ≤ C k Λ k,β ( g ; Σ; B )( k Σ k ∨ k +3 / k Σ k√ n (cid:18) dn (cid:19) ( k +1+ β ) / , (8.11)6 VladimirKoltchinskiiwhere Λ k,β ( g ; Σ; B ) := k B k k Dg k C k +1+ β σ g (Σ; B ) ( k Σ k ∨ k Σ − k ) log (2( k Σ k ∨ k Σ − k )) . 
Under the assumption that, for some $\alpha\in(0,1)$, $d\le n^{\alpha}$, the last bound implies that
$$|\zeta_1| \le C^k \Lambda_{k,\beta}(g;\Sigma;B)\,(\|\Sigma\|\vee 1)^{k+3/2}\,\|\Sigma\|\; n^{-\frac{k+\beta-\alpha(k+1+\beta)}{2}}, \qquad (8.12)$$
which tends to $0$ for $k+1+\beta > \frac{1}{1-\alpha}$.

Step 3. To bound $\zeta_2$, recall Theorem 6 and Corollary 6. It follows from these statements that, under the assumptions $\|g\|_{C^{k+2+\beta}}<\infty$ and $d\le n/2$, for all $t\ge 1$, with probability at least $1-e^{-t}$,
$$|\zeta_2| \le \frac{\sqrt{n}\,|S_{d_k}(\Sigma;\hat\Sigma-\Sigma)-\mathbb{E}\,S_{d_k}(\Sigma;\hat\Sigma-\Sigma)|}{\sigma_g(\Sigma;B)} \le C^k \Lambda_{k,\beta}(g;\Sigma;B)\,\gamma_{\beta,k}\bigl(\Sigma;\delta_n(\Sigma;t)\bigr)\,\bigl(\sqrt{\|\Sigma\|}+\sqrt{\delta_n(\Sigma;t)}\bigr)\sqrt{\|\Sigma\|}\,\sqrt{t}, \qquad (8.13)$$
where
$$\delta_n(\Sigma;t) := \|\Sigma\|\Bigl(\sqrt{\frac{r(\Sigma)}{n}} \vee \frac{r(\Sigma)}{n} \vee \sqrt{\frac{t}{n}} \vee \frac{t}{n}\Bigr) \le \|\Sigma\|\Bigl(\sqrt{\frac{d}{n}} \vee \sqrt{\frac{t}{n}} \vee \frac{t}{n}\Bigr) =: \bar\delta_n(\Sigma;t).$$
Recall that $\gamma_{\beta,k}(\Sigma;u) = (\|\Sigma\|\vee u\vee 1)^{k+1/2}\,(u\vee u^{\beta})$, $u>0$. For $d\le n$ and $t\le n$, we have $\delta_n(\Sigma;t)\le 2\|\Sigma\|$ and $\gamma_{\beta,k}(\Sigma;\delta_n(\Sigma;t)) \lesssim (\|\Sigma\|\vee 1)^{k+3/2}$, which implies that, for some $C>0$ and for all $t\in[1,n]$, with probability at least $1-e^{-t}$,
$$|\zeta_2| \le C^k \Lambda_{k,\beta}(g;\Sigma;B)\,\|\Sigma\|\,(\|\Sigma\|\vee 1)^{k+3/2}\,\sqrt{t}. \qquad (8.14)$$
Let now $t=3\log n$. For $d\ge \sqrt{n}$, $d\le n$, we have $\bar\delta_n(\Sigma;t) \lesssim \|\Sigma\|\sqrt{\frac{d}{n}} \le \|\Sigma\|$ and
$$\gamma_{\beta,k}\bigl(\Sigma;\delta_n(\Sigma;t)\bigr) \le \gamma_{\beta,k}\bigl(\Sigma;\bar\delta_n(\Sigma;t)\bigr) \lesssim (\|\Sigma\|\vee 1)^{k+3/2}\Bigl(\frac{d}{n}\Bigr)^{\beta/2}.$$
In addition, $\bigl(\sqrt{\|\Sigma\|}+\sqrt{\delta_n(\Sigma;t)}\bigr)\sqrt{\|\Sigma\|}\,\sqrt{t} \lesssim \|\Sigma\|\sqrt{\log n}$. Thus, for $d\ge\sqrt{n}$, $d\le n^{\alpha}$, it follows from (8.13) that, with some constant $C\ge 1$ and with probability at least $1-n^{-3}$,
$$|\zeta_2| \le C^k \Lambda_{k,\beta}(g;\Sigma;B)\,\|\Sigma\|\,(\|\Sigma\|\vee 1)^{k+3/2}\Bigl(\frac{d}{n}\Bigr)^{\beta/2}\sqrt{\log n} \le C^k \Lambda_{k,\beta}(g;\Sigma;B)\,\|\Sigma\|\,(\|\Sigma\|\vee 1)^{k+3/2}\, n^{-(1-\alpha)\beta/2}\sqrt{\log n}. \qquad (8.15)$$

Step 4. Finally, we need to bound $\zeta_3$. To this end, denote $b_k(\Sigma) := \langle B_k(\Sigma), B\rangle$. Then $b_0(\Sigma) = \langle\mathcal{D}g(\Sigma),B\rangle$ and $d_k(\Sigma) = \sum_{j=0}^k (-1)^j \binom{k}{j} b_j(\Sigma)$. Observe that
$$\langle D b_j(\Sigma), H\rangle = D b_j(\Sigma; H) = \langle DB_j(\Sigma)H, B\rangle = \langle H, (DB_j(\Sigma))^* B\rangle,$$
implying $D b_j(\Sigma) = (DB_j(\Sigma))^* B$.
Therefore, we have k D b j (Σ) k = sup k H k ≤ |h DB j (Σ) H, B i| ≤ sup k H k≤ |h DB j (Σ) H, B i| ≤ k B k sup k H k≤ k DB j (Σ) H k . To bound the right hand side we use Lemma 21 that yields sup k H k≤ k DB j (Σ) H k ≤ C j max ≤ j ≤ j +2 k D j g k L ∞ ( k Σ k j +1 / ∨ (cid:18) dn (cid:19) j/ . Therefore, for all j = 1 , . . . , k, k D b j (Σ) k ≤ C k k B k max ≤ j ≤ k +2 k D j g k L ∞ ( k Σ k k +1 / ∨ (cid:18) dn (cid:19) j/ and kD b j (Σ) k = k Σ / D b j (Σ)Σ / k ≤ k Σ kk D b j (Σ) k ≤ C k k B k max ≤ j ≤ k +2 k D j g k L ∞ ( k Σ k k +3 / ∨ k Σ k ) (cid:18) dn (cid:19) j/ . Since also √ kD b (Σ) k = √ k Σ / ( D D g (Σ)) ∗ B Σ / k = σ g (Σ; B ) , we get (cid:12)(cid:12)(cid:12) √ kD d k (Σ) k − σ g (Σ; B ) (cid:12)(cid:12)(cid:12) ≤ √ k X j =1 kD b j (Σ) k ≤ √ C k k B k max ≤ j ≤ k +2 k D j g k L ∞ ( k Σ k k +3 / ∨ k Σ k ) k X j =1 (cid:18) dn (cid:19) j/ , implying that, under the assumption d ≤ n/ , (cid:12)(cid:12)(cid:12)(cid:12) √ kD d k (Σ) k − σ g (Σ; B ) σ g (Σ; B ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ √ C k k B k σ g (Σ; B ) max ≤ j ≤ k +2 k D j g k L ∞ ( k Σ k k +3 / ∨ k Σ k ) r dn . (8.16)It follows from (8.16) and (8.10) that with some C > and with probability at least − e − t | ζ | ≤ C k k B k σ g (Σ; B ) max ≤ j ≤ k +2 k D j g k L ∞ k Σ k ( k Σ k ∨ k +1 / r dn (cid:18) √ t ∨ t √ n (cid:19) . (8.17)8 VladimirKoltchinskiiFor d ≥ n, d ≤ n α and t = 3 log n, this yields | ζ | ≤ C k σ g (Σ; B ) max ≤ j ≤ k +2 k D j g k L ∞ k Σ k ( k Σ k ∨ k +1 / n − (1 − α ) / p log n (8.18)that holds for some C ≥ with probability at least − n − . Step 5 . It follows from bounds (8.12), (8.15) and (8.18) that, for some C ≥ withprobability at least − n − , | ζ | ≤ C k Λ k,β ( g ; Σ; B ) k Σ k ( k Σ k ∨ k +3 / h n − k + β − α ( k +1+ β )2 + n − (1 − α ) β/ p log n + n − (1 − α ) / p log n i , which implies that with the same probability and with a possibly different C ≥ | ζ | ≤ C k L g ( B ; Σ) h n − k + β − α ( k +1+ β )2 + n − (1 − α ) β/ p log n i . 
It follows from the last bound that
$$\delta(\xi,\eta) \le n^{-3} + C^k L_g(B;\Sigma)\Bigl[n^{-\frac{k+\beta-\alpha(k+1+\beta)}{2}} + n^{-(1-\alpha)\beta/2}\sqrt{\log n}\Bigr],$$
where
$$\xi := \frac{\sqrt{n}\,\bigl(\langle\mathcal{D}g_k(\hat\Sigma),B\rangle - \langle\mathcal{D}g(\Sigma),B\rangle\bigr)}{\sigma_g(\Sigma;B)}, \qquad \eta := \frac{\sqrt{n}\,\langle Dd_k(\Sigma),\hat\Sigma-\Sigma\rangle}{\sqrt{2}\,\|\mathcal{D}d_k(\Sigma)\|_2},$$
$\xi-\eta = \zeta$ and $\delta(\xi,\eta)$ is defined in Lemma 10. It follows from bound (8.9) and Lemma 10 that, for some $C\ge 1$, bound (8.3) holds.

Step 6. It remains to prove Proposition 7. When $t\in[1,n]$, bound (8.4) immediately follows from (8.8), (8.10), (8.12), (8.14) and (8.17). To prove it for $t>n$, first observe that
$$|\langle\mathcal{D}g(\Sigma),B\rangle| \le \|\Sigma^{1/2}Dg(\Sigma)\Sigma^{1/2}\|\,\|B\|_1 \le \|Dg\|_{L_\infty}\,\|\Sigma\|\,\|B\|_1. \qquad (8.19)$$
We will also prove that, for some constant $C>0$,
$$|\langle\mathcal{D}g_k(\Sigma),B\rangle| \le C^k\,\|Dg\|_{L_\infty}\,\|\Sigma\|\,\|B\|_1. \qquad (8.20)$$
To this end, note that, by (5.13),
$$\|\mathcal{D}\mathcal{B}^k g(\Sigma)\| \le \mathbb{E}\sum_{I\subset\{1,\dots,k\}} \|\Sigma^{1/2}A_I\,Dg(A_I^*\Sigma A_I)\,A_I^*\Sigma^{1/2}\| \le \sum_{I\subset\{1,\dots,k\}} \|\Sigma\|\,\|Dg\|_{L_\infty}\,\mathbb{E}\|A_I\|^2$$
$$\le \|\Sigma\|\,\|Dg\|_{L_\infty} \sum_{I\subset\{1,\dots,k\}} \mathbb{E}\prod_{i\in I}\|W_i\|^2 \le \|\Sigma\|\,\|Dg\|_{L_\infty} \sum_{j=0}^k \binom{k}{j}\bigl(\mathbb{E}\|W\|^2\bigr)^j = \|\Sigma\|\,\|Dg\|_{L_\infty}\bigl(1+\mathbb{E}\|W\|^2\bigr)^k. \qquad (8.21)$$
For $d\le n$, we have $\mathbb{E}\|W-I\| \lesssim \sqrt{d/n} \le C'$ for some $C'>0$. Thus, $\mathbb{E}\|W\|^2 \lesssim 1+\mathbb{E}\|W-I\|^2 \le C$ for some constant $C>0$. Therefore, $\|\mathcal{D}\mathcal{B}^k g(\Sigma)\| \le C^k\,\|Dg\|_{L_\infty}\,\|\Sigma\|$. In view of the definition of $g_k$, this implies that (8.20) holds with some $C>0$. It follows from (8.19) and (8.20) that, for some
$C>0$,
$$\Bigl|\frac{\sqrt{n}\,\bigl(\langle\mathcal{D}g_k(\hat\Sigma),B\rangle - \langle\mathcal{D}g(\Sigma),B\rangle\bigr)}{\sigma_g(\Sigma;B)}\Bigr| \le \frac{C^k\,\|B\|_1\,\|Dg\|_{L_\infty}\,\|\Sigma\|\,\sqrt{n}}{\sigma_g(\Sigma;B)}. \qquad (8.22)$$
For $t>n$, the right hand side of bound (8.22) is smaller than the right hand side of bound (8.4). Thus, (8.4) holds for all $t\ge 1$. Next we prove Theorem 11 and Proposition 8.
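Before turning to that proof, note that the plug-in quantity $\langle f(\hat\Sigma),B\rangle = \mathrm{tr}(f(\hat\Sigma)B)$ at the heart of these bounds is straightforward to compute by spectral calculus. The following is a minimal numerical sketch; the function name `plug_in_functional` and all numerical choices are illustrative assumptions, not constructions from the paper:

```python
import numpy as np

def plug_in_functional(X, f, B):
    """Plug-in estimator <f(Sigma_hat), B> = tr(f(Sigma_hat) B).

    X: (n, d) array of centered observations; f: scalar function applied to
    the eigenvalues (spectral calculus); B: (d, d) symmetric test operator.
    """
    n = X.shape[0]
    Sigma_hat = X.T @ X / n                 # sample covariance (mean known to be 0)
    lam, U = np.linalg.eigh(Sigma_hat)      # spectral decomposition of Sigma_hat
    f_Sigma = (U * f(lam)) @ U.T            # f(Sigma_hat) = U f(Lambda) U^T
    return float(np.trace(f_Sigma @ B))

rng = np.random.default_rng(0)
d, n = 5, 2000
A = rng.normal(size=(d, d))
Sigma = A @ A.T / d + np.eye(d)             # a well-conditioned covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
B = np.eye(d) / d                           # a test operator with nuclear norm 1

est = plug_in_functional(X, np.log, B)      # estimates <log(Sigma), B>
lam0, U0 = np.linalg.eigh(Sigma)
truth = float(np.trace((U0 * np.log(lam0)) @ U0.T @ B))
```

Repeating this over independent samples shows `est` fluctuating around its expectation on the scale $n^{-1/2}$, consistent with the normal approximation above; the bias-reduced estimator $f_k(\hat\Sigma)$ additionally requires the bootstrap chain $\hat\Sigma^{(j)}$, which is not sketched here.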
Proof.
First suppose that, for some δ > , σ (Σ) ⊂ [2 δ, ∞ ) . Let γ δ ( x ) = γ ( x/δ ) , where γ : R [0 , is a nondecreasing C ∞ function, γ ( x ) = 0 , x ≤ / , γ ( x ) = 1 , x ≥ . Define f δ ( x ) = f ( x ) γ δ ( x ) , x ∈ R . Then, f (Σ) = f δ (Σ) which also implies that, for all Σ with σ (Σ) ⊂ [2 δ, ∞ ) , Df (Σ) = Df δ (Σ) and σ f (Σ; B ) = σ f δ (Σ; B ) . Let ϕ ( x ) := R x f δ ( t ) t dt, x ≥ and ϕ ( x ) = 0 , x < . Clearly, f δ ( x ) = xϕ ′ ( x ) , x ∈ R . Let g ( C ) := tr( ϕ ( C )) , C ∈ B sa ( H ) . Then, clearly, g is an orthogonally invariant function, Dg ( C ) = ϕ ′ ( C ) , C ∈ B sa ( H ) and D g ( C ) = C / ϕ ′ ( C ) C / = f δ ( C ) , C ∈ C + ( H ) . It is also easy to see that D g k ( C ) = ( f δ ) k ( C ) , C ∈ C + ( H ) . Using Corollary 2 of Section2, standard bounds for pointwise multipliers of functions in Besov spaces ([Tr], Section2.8.3) and characterization of Besov norms in terms of difference operators ([Tr], Section2.5.12), it is easy to check that k Dg k C s ≤ k +1 k ϕ ′ k B s ∞ , = 2 k +1 (cid:13)(cid:13)(cid:13)(cid:13) f ( x ) γ δ ( x ) x (cid:13)(cid:13)(cid:13)(cid:13) B s ∞ , . k +1 (cid:13)(cid:13)(cid:13)(cid:13) γ δ ( x ) x (cid:13)(cid:13)(cid:13)(cid:13) B s ∞ , k f k B s ∞ , . k +1 δ (cid:13)(cid:13)(cid:13)(cid:13) γ ( x/δ ) x/δ (cid:13)(cid:13)(cid:13)(cid:13) B s ∞ , k f k B s ∞ , . k +1 ( δ − − s ∨ δ − ) k f k B s ∞ , . Denote η := √ n (cid:16) h ( f δ ) k ( ˆΣ) , B i − h f (Σ) , B i (cid:17) σ f (Σ; B ) . It follows from Theorem 10 that ∆( η ; Z ) ≤ C k M f δ ( B ; Σ) h n − k + β − α ( k +1+ β )2 + n − (1 − α ) β/ p log n i + C √ n (8.23)0 VladimirKoltchinskiiwith M f δ ( B ; Σ) . k +1 M f,δ ( B ; Σ) and M f,δ ( B ; Σ) := k B k ( δ − − s ∨ δ − ) k f k B s ∞ , σ f (Σ; B ) ( k Σ k∨k Σ − k ) log (2( k Σ k∨k Σ − k )) k Σ k ( k Σ k∨ k +3 / . It will be shown that, under the assumption σ (Σ) ⊂ [2 δ, ∞ ) , the estimator ( f δ ) k ( ˆΣ) = D g k ( ˆΣ) can be replaced by the estimator f k ( ˆΣ) . 
To this end, the following lemma will be proved.
Lemma 25.
Suppose that, for some $\delta>0$, $\sigma(\Sigma)\subset[2\delta,\infty)$ and that, for a sufficiently large constant $C>0$,
$$d \le \frac{\log^2(1+\delta/\|\Sigma\|)}{C^2(k+1)^2}\, n =: \bar d. \qquad (8.24)$$
Then, with probability at least $1-e^{-\bar d}$,
$$\|f_k(\hat\Sigma) - (f_\delta)_k(\hat\Sigma)\| \le (k\,2^{k+1}+2)\,e^{-\bar d}\,\|f\|_{L_\infty}. \qquad (8.25)$$

Proof.
Recall that, by (5.2), B k f (Σ) = E Σ k X j =0 ( − k − j (cid:18) kj (cid:19) f ( ˆΣ ( j ) ) , implying that B k f ( ˆΣ) − B k f δ ( ˆΣ) = E ˆΣ k X j =0 ( − k − j (cid:18) kj (cid:19)h f ( ˆΣ ( j +1) ) − f δ ( ˆΣ ( j +1) ) i . Note also that f ( ˆΣ ( j +1) ) = f δ ( ˆΣ ( j +1) ) provided that σ ( ˆΣ ( j +1) ) ⊂ [ δ, ∞ ) (since f ( x ) = f δ ( x ) , x ≥ δ ). This easily implies the following bound: kB k f ( ˆΣ) − B k f δ ( ˆΣ) k ≤ k +1 k f k L ∞ P ˆΣ n ∃ j = 1 , . . . , k + 1 : σ ( ˆΣ ( j ) ) [ δ, ∞ ) o . (8.26)To control the probability of the event G := n ∃ j = 1 , . . . , k + 1 : σ ( ˆΣ ( j ) ) [ δ, ∞ ) o , consider the following event: E := n k ˆΣ ( j +1) − ˆΣ ( j ) k < C k ˆΣ ( j ) k r dn , j = 1 , . . . , k o . It follows from bound (1.6) (applied conditionally on ˆΣ ( j ) ) that, for a proper choice ofconstant C > , P ˆΣ ( E c ) ≤ E ˆΣ k X j =1 P ˆΣ ( j ) n k ˆΣ ( j +1) − ˆΣ ( j ) k ≥ C k ˆΣ ( j ) k r dn o ≤ ke − d . (8.27)stimationoffunctionalsofcovarianceoperators 71Note that, on the event E, k ˆΣ ( j +1) k ≤ k ˆΣ ( j ) k (cid:18) C q dn (cid:19) , which implies by inductionthat k ˆΣ ( j ) k ≤ k ˆΣ k (cid:18) C r dn (cid:19) j − , j = 1 , . . . , k + 1 . This also yields that, on the event E, k ˆΣ ( j ) − ˆΣ k ≤ j − X i =1 k ˆΣ ( i +1) − ˆΣ ( i ) k ≤ j − X i =1 k ˆΣ ( i ) k C r dn ≤ k ˆΣ k C r dn j − X i =1 (cid:18) C r dn (cid:19) i − ≤ k ˆΣ k (cid:20)(cid:18) C r dn (cid:19) j − − (cid:21) , j = 1 , . . . , k + 1 . Consider also the event F := n k ˆΣ − Σ k ≤ C k Σ k q dn o that holds with probability atleast − e − d with a proper choice of constant C . On this event, k ˆΣ k ≤ k Σ k (cid:18) C q dn (cid:19) . Therefore, on the event E ∩ F, k ˆΣ ( j ) − Σ k ≤ k Σ k (cid:18) C r dn (cid:19)(cid:20)(cid:18) C r dn (cid:19) j − − (cid:21) + k Σ k C r dn = k Σ k (cid:20)(cid:18) C r dn (cid:19) j − (cid:21) , j = 1 , . . . , k + 1 . 
Note that k Σ k (cid:20)(cid:18) C r dn (cid:19) k +1 − (cid:21) ≤ k Σ k (cid:18) exp (cid:26) C ( k + 1) r dn (cid:27) − (cid:19) ≤ δ provided that condition (8.24) holds. Therefore, on the event E ∩ F, k ˆΣ ( j ) − Σ k ≤ δ, j =1 , . . . , k + 1 . Since σ (Σ) ⊂ [2 δ, ∞ ) , this implies that σ ( ˆΣ ( j ) ) ⊂ [ δ, ∞ ) , j = 1 , . . . , k + 1 . In other words, E ∩ F ⊂ G c . Bound (8.26) implies that kB k f ( ˆΣ) − B k f δ ( ˆΣ) k I F ≤ k +1 k f k L ∞ I F E ˆΣ I G ≤ k +1 k f k L ∞ I F E ˆΣ I F c ∪ E c ≤ k +1 k f k L ∞ I F ( I F c + E ˆΣ I E c )= 2 k +1 k f k L ∞ I F P ˆΣ ( E c ) ≤ k k +1 e − d k f k L ∞ I F . This proves that on the event F of probability at least − e − d kB k f ( ˆΣ) − B k f δ ( ˆΣ) k ≤ k k +1 e − d k f k L ∞ . Moreover, the same bound also holds for kB j f ( ˆΣ) − B j f δ ( ˆΣ) k for all j = 1 , . . . , k andthe dimension d in the above argument can be replaced by an arbitrary upper bound d ′ d ′ = ¯ d ). Thus, under condition (8.24), withprobability at least − e − ¯ d kB j f ( ˆΣ) − B j f δ ( ˆΣ) k ≤ j j +1 e − ¯ d k f k L ∞ , j = 1 , . . . , k and also k f ( ˆΣ) − f δ ( ˆΣ) k ≤ e − ¯ d k f k L ∞ . This immediately implies that, under the as-sumption σ (Σ) ⊂ [2 δ, ∞ ) and condition (8.24), with probability at least − e − ¯ d , bound(8.25) holds.Define ξ := √ n (cid:16) h f k ( ˆΣ) , B i − h f (Σ) , B i (cid:17) σ f (Σ; B ) It follows from (8.25) that with probability at least − e − ¯ d | ξ − η | ≤ ( k k +1 + 2) k f k L ∞ k B k σ f (Σ; B ) √ ne − ¯ d and we can conclude that, under conditions d ≥ n and (8.24), the following boundholds for some C > δ ( ξ, η ) ≤ ( k k +1 + 2) k f k L ∞ k B k σ f (Σ; B ) √ ne − d + e − d ≤ C k k f k L ∞ k B k σ f (Σ; B ) n − + n − . Combining this with bound (8.23) and using Lemma 10 yields that with some
$C>0$:
$$\delta(\xi,Z) \le C^k M_{f,\delta}(B;\Sigma)\Bigl[n^{-\frac{k+\beta-\alpha(k+1+\beta)}{2}} + n^{-(1-\alpha)\beta/2}\sqrt{\log n}\Bigr] + \frac{C}{\sqrt{n}} + C^k\,\frac{\|f\|_{L_\infty}\,\|B\|_1}{\sigma_f(\Sigma;B)}\,n^{-1}. \qquad (8.28)$$
It remains to choose $\delta := (2\|\Sigma^{-1}\|)^{-1}$ (which implies that $\sigma(\Sigma)\subset[2\delta,\infty)$). Since
$$\frac{\log^2(1+\delta/\|\Sigma\|)}{C^2(k+1)^2} \ge \frac{\log^2\bigl(1+1/(2(\|\Sigma\|\vee\|\Sigma^{-1}\|)^2)\bigr)}{C_s^2} \ge \frac{c_{1,s}}{(\|\Sigma\|\vee\|\Sigma^{-1}\|)^4} \qquad (8.29)$$
for a sufficiently small constant $c_{1,s}$, condition (8.24) follows from assumption (8.5) on $d$. The bound of Theorem 11 immediately follows from (8.28).

It remains to prove Proposition 8. It follows from bound (8.25) that, for $t\in[1,\bar d]$ and $d\ge\sqrt{n}$, $d\le\bar d$, with probability at least $1-e^{-t}$,
$$\|f_k(\hat\Sigma)-(f_\delta)_k(\hat\Sigma)\| \le (k\,2^{k+1}+2)\,n^{-1}\,\|f\|_{L_\infty} \le (k\,2^{k+1}+2)\,\|f\|_{L_\infty}\sqrt{\frac{t}{n}}. \qquad (8.30)$$
Due to the trivial bound $\|f_k(\hat\Sigma)-(f_\delta)_k(\hat\Sigma)\| \le 2^{k+1}\|f\|_{L_\infty}$ and bound (8.29), we get that, for $t\ge\bar d$,
$$\|f_k(\hat\Sigma)-(f_\delta)_k(\hat\Sigma)\| \le 2^{k+1}\|f\|_{L_\infty}\,\frac{\sqrt{t}}{\sqrt{\bar d}} \le 2^{k+1}\|f\|_{L_\infty}\,\frac{(\|\Sigma\|\vee\|\Sigma^{-1}\|)^2}{\sqrt{c_{1,s}}}\,\sqrt{\frac{t}{n}}. \qquad (8.31)$$
It follows from (8.30) and (8.31) that there exists a constant $C>0$ such that, for all $t\ge 1$, with probability at least $1-e^{-t}$,
$$\|f_k(\hat\Sigma)-(f_\delta)_k(\hat\Sigma)\| \le C^k\,\|f\|_{L_\infty}\,(\|\Sigma\|\vee\|\Sigma^{-1}\|)^2\,\sqrt{\frac{t}{n}}.$$
This implies that, for some
$C>0$, with the same probability,
$$|\xi-\eta| \le C^k\,\frac{\|B\|_1\,\|f\|_{L_\infty}\,(\|\Sigma\|\vee\|\Sigma^{-1}\|)^2}{\sigma_f(\Sigma;B)}\,\sqrt{t}. \qquad (8.32)$$
Applying the bound of Proposition 7 to $\mathcal{D}g_k=(f_\delta)_k$, we get that, for some constant $C>0$ and for all $t\ge 1$, with probability at least $1-e^{-t}$, $|\eta| \le C^k M_{f,\delta}(B;\Sigma)\sqrt{t}$. Combining this with bound (8.32) yields that, for some
$C>0$ and all $t\ge 1$, with probability at least $1-e^{-t}$, $|\xi| \le C^k M_{f,\delta}(B;\Sigma)\sqrt{t}$, which, taking into account that $\delta = (2\|\Sigma^{-1}\|)^{-1}$, completes the proof of Proposition 8.

Finally, we are ready to prove Theorem 3 stated in the introduction.

Proof. If $d<\sqrt{n}$, the claims of Theorem 3 easily follow from Corollary 1. Under the assumption $d\ge\sqrt{n}$, the proof of (1.10) immediately follows from the bound of Theorem 11 (it is enough to take the supremum over the class of covariances $\mathcal{S}(d_n;a)\cap\{\sigma_f(\Sigma;B)\ge\sigma_0\}$ and over all the operators $B$ with $\|B\|_1\le 1$, and to pass to the limit as $n\to\infty$).

To prove (1.11), we apply Lemmas 13 and 14 to the random variables
$$\xi := \xi(\Sigma) := \frac{\sqrt{n}\,\bigl(\langle f_k(\hat\Sigma),B\rangle - \langle f(\Sigma),B\rangle\bigr)}{\sigma_f(\Sigma;B)}$$
and $\eta := Z$. Using bounds (8.7) and (4.16), we get $\mathbb{E}\,\ell(\xi) \le e\sqrt{\pi c_2}\,e^{c_1\tau^2}$, where $\tau := 2C^k M_f(B;\Sigma)$. Using bounds (4.17), (4.16), easy bounds on $\mathbb{E}\,\ell(Z)$, $\mathbb{P}\{|Z|\ge A\}$, and the bound of Theorem 11, we get
$$|\mathbb{E}\,\ell(\xi)-\mathbb{E}\,\ell(Z)| \le c_1 e^{c_2 A^2}\Bigl[C^k M_f(B;\Sigma)\bigl(n^{-\frac{k+\beta-\alpha(k+1+\beta)}{2}} + n^{-(1-\alpha)\beta/2}\sqrt{\log n}\bigr) + \frac{C}{\sqrt{n}}\Bigr] + \sqrt{e}\,(2\pi)^{1/2}\,c_1 e^{c_2\tau^2}\,e^{-A^2/(2\tau^2)} + c_1 e^{c_2}\,e^{-A^2/2}.$$
It remains to take the supremum over the class of covariances $\mathcal{S}(d_n;a)\cap\{\sigma_f(\Sigma;B)\ge\sigma_0\}$ and over all the operators $B$ with $\|B\|_1\le 1$, and to pass to the limit first as $n\to\infty$ and then as $A\to\infty$.

Our main goal in this section is to prove Theorem 4 stated in Subsection 1.2.
Proof.
The main part of the proof is based on an application of the van Trees inequality and follows the same lines as the proof of a minimax lower bound for estimation of linear functionals of principal components in [KLN]. We will need the following lemma (which could be of independent interest), showing the Lipschitz property of the function $\Sigma\mapsto\sigma_f(\Sigma;B)$. It holds for an arbitrary separable Hilbert space $H$ (not necessarily finite-dimensional).

Lemma 26.
Suppose that, for some $s\in(1,2]$, $f\in B^s_{\infty,1}(\mathbb{R})$. Then
$$\bigl|\sigma_f^2(\Sigma+H;B) - \sigma_f^2(\Sigma;B)\bigr| \le \|f'\|_{L_\infty}\,(2\|\Sigma\|+\|H\|)\,\|B\|_1^2\,\bigl[\|f'\|_{L_\infty}\|H\| + 8\,\|f\|_{B^s_{\infty,1}}\|\Sigma\|\,\|H\|^{s-1}\bigr]. \qquad (9.1)$$

Proof.
Note that σ f (Σ; B ) = 2 (cid:13)(cid:13)(cid:13) Σ / Df (Σ; B )Σ / (cid:13)(cid:13)(cid:13) = 2tr (cid:16) Σ / Df (Σ; B )Σ Df (Σ; B )Σ / (cid:17) = 2tr (cid:16) Σ Df (Σ; B )Σ Df (Σ; B ) (cid:17) . This implies that σ f (Σ + H ; B ) − σ f (Σ; B )= 2tr (cid:16) HDf (Σ + H ; B )(Σ + H ) Df (Σ + H ; B ) (cid:17) + 2tr (cid:16) Σ( Df (Σ + H ; B ) − Df (Σ; B ))(Σ + H ) Df (Σ + H ; B ) (cid:17) + 2tr (cid:16) Σ Df (Σ; B ) HDf (Σ + H ; B ) (cid:17) + 2tr (cid:16) Σ Df (Σ; B )Σ( Df (Σ + H ; B ) − Df (Σ; B ) (cid:17) . (9.2)We then have (cid:12)(cid:12)(cid:12) (cid:16) HDf (Σ + H ; B )(Σ + H ) Df (Σ + H ; B ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ k Df (Σ + H ; B )(Σ + H ) Df (Σ + H ; B ) k k H k≤ k Σ + H kk Df (Σ + H ; B ) Df (Σ + H ; B ) k k H k≤ k Σ + H kk Df (Σ + H ; B ) k k H k≤ k f ′ k L ∞ ( k Σ k + k H k ) k H kk B k . (9.3)Similarly, it could be shown that (cid:12)(cid:12)(cid:12) (cid:16) Σ Df (Σ; B ) HDf (Σ + H ; B ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ k f ′ k L ∞ k Σ kk H kk B k . (9.4)Also, we have (cid:16) Σ( Df (Σ + H ; B ) − Df (Σ; B ))(Σ + H ) Df (Σ + H ; B ) (cid:17) = h ( Df (Σ + H ) − Df (Σ))( B ) , C i = h ( Df (Σ + H ) − Df (Σ))( C ) , B i = h Df (Σ + H ; C ) − Df (Σ; C ) , B i , stimationoffunctionalsofcovarianceoperators 75where C := (Σ + H ) Df (Σ; B )Σ + Σ Df (Σ; B )(Σ + H ) . Using bound (2.25), this implies (cid:12)(cid:12)(cid:12) (cid:16) Σ( Df (Σ + H ; B ) − Df (Σ; B ))(Σ + H ) Df (Σ + H ; B ) (cid:17)(cid:12)(cid:12)(cid:12) = |h Df (Σ + H ; C ) − Df (Σ; C ) , B i| ≤ k Df (Σ + H ; C ) − Df (Σ; C ) kk B k ≤ k f k B s ∞ , k C kk H k s − k B k ≤ k f k B s ∞ , k Σ kk Σ + H kk Df (Σ; B ) kk H k s − k B k ≤≤ k f ′ k L ∞ k f k B s ∞ , k Σ k ( k Σ k + k H k ) k H k s − k B k . (9.5)Similarly, (cid:12)(cid:12)(cid:12) (cid:16) Σ Df (Σ; B )Σ( Df (Σ + H ; B ) − Df (Σ; B ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ k f ′ k L ∞ k f k B s ∞ , k Σ k k H k s − k B k . 
(9.6)Substituting bound (9.3), (9.4), (9.5) and (9.6) into (9.2), we get (9.1).For given a ′ ∈ (1 , a ) and σ ′ > σ , assume that B f ( d n ; a ′ ; σ ′ ) = ∅ (otherwise, theproof becomes trivial) and, for B with k B k ≤ such that ˚ S f,B ( d n ; a ′ ; σ ′ ) = ∅ , consider Σ ∈ ˚ S f,B ( d n ; a ′ ; σ ′ ) . For H ∈ B sa ( H ) and c > , define Σ t := Σ + tH √ n and S c,n (Σ , H ) := { Σ t : t ∈ [ − c, c ] } . In what follows, H will be chosen so that k H k ≤ k f ′ k L ∞ a . (9.7)Recall that the set ˚ S f,B ( d n ; a ; σ ) is open in operator norm topology, so, Σ is its interiorpoint. Moreover, let δ > and suppose that k Σ − Σ k < δ. If δ < a − a ′ , then k Σ k < a. If δ < a − a ′ a , then it is easy to check that k Σ − k < a. Also, using the bound of Lemma 26,it is easy to show that, for B with k B k ≤ , the condition k f ′ k L ∞ (2 a + δ ) h k f ′ k L ∞ δ + 8 k f k B s ∞ , aδ s − i ≤ ( σ ′ ) − σ (9.8)implies that σ (Σ; B ) > σ . Thus, for a small enough δ = δ ( f, s, a, a ′ , σ , σ ′ ) ∈ (0 , satisfying assumptions δ < a − a ′ a and (9.8), we have B (Σ ; δ ) := { Σ : k Σ − Σ k < δ } ⊂ ˚ S f,B ( d n ; a ; σ ) . For given c and δ, for H satisfying (9.7) and for all large enough n (more specifically, for n > c a k f ′ k L ∞ δ ), we have c k H k√ n < δ, (9.9)6 VladimirKoltchinskiiimplying that S c,n (Σ , H ) ⊂ B (Σ ; δ ) ⊂ ˚ S f,B ( d n ; a ; σ ) . Define ϕ ( t ) := h f (Σ t ) , B i , t ∈ [ − c, c ] . Clearly, ϕ is continuously differentiable function with derivative ϕ ′ ( t ) = 1 √ n h Df (Σ t ; H ) , B i , t ∈ [ − c, c ] . (9.10)Consider the following parametric model: X , . . . , X n i . i . d . ∼ N (0; Σ t ) , t ∈ [ − c, c ] . (9.11)It is well known that the Fisher information matrix for model X ∼ N (0; Σ) with non-singular covariance Σ is I (Σ) = (Σ − ⊗ Σ − ) (see, e.g., [Eat]). 
This implies that theFisher information for model X ∼ N (0 , Σ t ) , t ∈ [ − c, c ] is I ( t ) = h I (Σ t )Σ ′ t , Σ ′ t i = n h I (Σ t ) H, H i and for model (9.11) it is I n ( t ) = nI ( t ) = h I (Σ t ) H, H i = 12 h (Σ − t ⊗ Σ − t ) H, H i = 12 h Σ − t H Σ − t , H i = 12 tr(Σ − t H Σ − t H ) = 12 k Σ − / t H Σ − / t k . We will now use well known van Trees inequality (see, e.g., [GL]) that provides a lowerbound on the average risk of an arbitrary estimator T ( X , . . . , X n ) of a smooth function ϕ ( t ) of parameter t of model (9.11) with respect to a smooth prior density π c on [ − c, c ] such that J π c := R c − c ( π ′ c ( t )) π c ( t ) dt < ∞ and π c ( c ) = π c ( − c ) = 0 . It follows from thisinequality that sup t ∈ [ − c,c ] E t ( T n ( X , . . . , X n ) − g ( t )) ≥ Z c − c E t ( T n ( X , . . . , X n ) − g ( t )) π c ( t ) dt ≥ (cid:16)R c − c ϕ ′ ( t ) π c ( t ) dt (cid:17) R c − c I n ( t ) π c ( t ) dt + J π c . (9.12)A common choice of prior is π c ( t ) := c π (cid:16) tc (cid:17) for a smooth density π on [ − , with π (1) = π ( −
1) = 0 and J π := R − π ′ ( t )) π ( t ) dt < ∞ . In this case, J π c = c − J π . Next weprovide bounds on the numerator and the denominator of the right hand side of (9.12).For the numerator, we get (cid:16)Z c − c ϕ ′ ( t ) π c ( t ) dt (cid:17) = (cid:16)Z c − c [ ϕ ′ (0) + ( ϕ ′ ( t ) − ϕ ′ (0))] π ( t/c ) dt/c (cid:17) ≥ ( ϕ ′ (0)) + 2 ϕ ′ (0) Z c − c ( ϕ ′ ( t ) − ϕ ′ (0)) π ( t/c ) dt/c ≥ ( ϕ ′ (0)) − | ϕ ′ (0) | Z c − c | ϕ ′ ( t ) − ϕ ′ (0) | π ( t/c ) dt/c. stimationoffunctionalsofcovarianceoperators 77Using (9.10) along with the following bound (based on bound (2.25)), | ϕ ′ ( t ) − ϕ ′ (0) | ≤ √ n k Df (Σ t ; H ) − Df (Σ ; H ) kk B k ≤ √ n k f k B s ∞ , k Σ t − Σ k s − k H kk B k ≤ n s/ k f k B s ∞ , k H k s k B k | t | s − , we get: (cid:16)Z c − c ϕ ′ ( t ) π c ( t ) dt (cid:17) ≥ n h Df (Σ ; H ) , B i − √ n |h Df (Σ ; H ) , B i| n s/ k f k B s ∞ , k H k s k B k Z c − c | t | s − π ( t/c ) dt/c = 1 n h Df (Σ ; H ) , B i − k f k B s ∞ , k H k s k B k c s − n (1+ s ) / |h Df (Σ ; H ) , B i| Z − | t | s − π ( t ) dt ≥ n h Df (Σ ; H ) , B i − k f k B s ∞ , k H k s k B k c s − n (1+ s ) / |h Df (Σ ; H ) , B i| . (9.13)Observing that h Df (Σ ; H ) , B i = h Df (Σ ; B ) , H i = h Σ − / D Σ − / , Σ − / H Σ − / i , where D := Σ Df (Σ ; B )Σ , we can rewrite bound (9.13) as (cid:16)Z c − c ϕ ′ ( t ) π c ( t ) dt (cid:17) ≥ n h Σ − / D Σ − / , Σ − / H Σ − / i − (cid:12)(cid:12)(cid:12) h Σ − / D Σ − / , Σ − / H Σ − / i (cid:12)(cid:12)(cid:12) k f k B s ∞ , k H k s k B k c s − n (1+ s ) / . (9.14)To bound the denominator, we need to control I n ( t ) = tr(Σ − t H Σ − t H ) in terms of I n (0) = tr(Σ − H Σ − H ) . To this end, note that Σ − t = Σ − + C Σ − , where C := (cid:18) I + t Σ − H √ n (cid:19) − − I. Suppose H satisfies the assumption c k Σ − H k√ n ≤ , (9.15)which also implies that k C k ≤ | t | k Σ − H k√ n ≤ . Note also that tr(Σ − t H Σ − t H ) = tr(Σ − H Σ − H ) + 2tr( C Σ − H Σ − H ) + tr( C Σ − HC Σ − H ) . 
Therefore, I n ( t ) ≤ I n (0) + k C kk Σ − H Σ − H k + 12 k C Σ − H k k H Σ − C k ≤ I n (0) + (cid:16) k C k + k C k (cid:17) k Σ − H k ≤ I n (0) + 3 | t |k Σ − H k √ n Z c − c I n ( t ) π ( t/c ) dt/c ≤ I n (0) + 3 k Σ − H k √ n Z c − c | t | π ( t/c ) dt/c ≤ k Σ − / H Σ − / k + 3 c k Σ − H k √ n . (9.16)Substituting bounds (9.14) and (9.16) into inequality (9.12), we get sup t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) ≥ h Σ − / D Σ − / , Σ − / H Σ − / i − (cid:12)(cid:12)(cid:12) h Σ − / D Σ − / , Σ − / H Σ − / i (cid:12)(cid:12)(cid:12) k f k Bs ∞ , k H k s k B k c s − n ( s − / k Σ − / H Σ − / k + 3 c k Σ − H k √ n + J π c . (9.17)Note that k Σ − / D Σ − / k = k Σ / Df (Σ ; B )Σ / k = 12 σ f (Σ ; B ) . In what follows, we use H := D, which clearly satisfies condition (9.7) since, for Σ ∈ ˚ S f,B ( d n ; a ; σ ) and k B k ≤ , k D k = k Σ Df (Σ ; B )Σ k ≤ k Σ k k Df (Σ ; B ) k≤ k f ′ k L ∞ k B k k Σ k ≤ a k f ′ k L ∞ k B k ≤ a k f ′ k L ∞ . (9.18)We also have k Σ − D k = tr( Df (Σ ; B )Σ Df (Σ , B )) ≤ k Σ k k Df (Σ ; B ) k ≤ k Σ k k f ′ k L ∞ k B k ≤ a k f ′ k L ∞ k B k ≤ a k f ′ k L ∞ , (9.19)implying that assumptions (9.9) and (9.15) hold for H = D provided that n > c a k f ′ k L ∞ δ . (9.20)With this choice of H, (9.17) implies sup t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) σ f (Σ ; B ) ≥ − c k Σ − D k √ n + k f k Bs ∞ , k D k s k B k c s − n ( s − / + J π c σ f (Σ ; B ) + 3 c k Σ − D k √ n + J π c . (9.21)It follows from (9.21), (9.18) and (9.19) that for B satisfying k B k ≤ t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) σ f (Σ ; B ) ≥ − γ n,c ( f ; a ; σ ) (9.22)stimationoffunctionalsofcovarianceoperators 79where γ n,c ( f ; a ; σ ) := a k f ′ k L ∞ c √ n + a s k f k Bs ∞ , k f ′ k sL ∞ c s − n ( s − / + J π c σ . Denote σ ( t ) := σ f (Σ t ; B ) , t ∈ [ − c, c ] . 
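The Fisher-information identity used in this step, $I_n(0) = \frac12\,\mathrm{tr}(\Sigma_0^{-1}H\Sigma_0^{-1}H) = \frac12\,\|\Sigma_0^{-1/2}H\Sigma_0^{-1/2}\|_2^2$, is easy to verify numerically. The sketch below is illustrative only (the matrices and variable names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)                  # positive definite "true" covariance
H = rng.normal(size=(d, d))
H = (H + H.T) / 2                                # symmetric perturbation direction

# Fisher information of N(0, Sigma) in direction H, trace form:
Sigma_inv = np.linalg.inv(Sigma)
I_trace = 0.5 * float(np.trace(Sigma_inv @ H @ Sigma_inv @ H))

# Same quantity as a squared Hilbert-Schmidt norm of Sigma^{-1/2} H Sigma^{-1/2}:
lam, U = np.linalg.eigh(Sigma)
Sigma_inv_half = (U / np.sqrt(lam)) @ U.T        # Sigma^{-1/2} via spectral calculus
I_frob = 0.5 * np.linalg.norm(Sigma_inv_half @ H @ Sigma_inv_half, 'fro') ** 2
```

The two expressions agree because $\mathrm{tr}(\Sigma^{-1}H\Sigma^{-1}H) = \mathrm{tr}\bigl((\Sigma^{-1/2}H\Sigma^{-1/2})^2\bigr)$ by cyclicity of the trace.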
By Lemma 26, | σ ( t ) − σ (0) | ≤ k f ′ k L ∞ (cid:18) k Σ k + | t |k D k√ n (cid:19) k B k h k f k L ∞ | t |k D k√ n +8 k f k B s ∞ , k Σ k | t | s − k D k s − n ( s − / i . Note that assumption (9.15) on H = D implies that c k D k√ n = c k Σ Σ − H k√ n ≤ c k Σ − H kk Σ k√ n ≤ k Σ k . Using bound (9.18), we get that, for all t ∈ [ − c, c ] and B with k B k ≤ , | σ ( t ) − σ (0) | ≤ ca k f ′ k L ∞ n / + 24 c s − a s k f ′ k sL ∞ k f k B s ∞ , n ( s − / =: λ n,c ( f ; a ) . which implies that sup t ∈ [ − c,c ] σ ( t ) σ (0) ≤ λ n,c ( f ; a ) σ . (9.23)It follows from (9.22) and (9.23) that sup t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) σ ( t ) (cid:18) λ n,c ( f ; a ) σ (cid:19) ≥ sup t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) σ ( t ) sup t ∈ [ − c,c ] σ ( t ) σ (0) ≥ sup t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) σ f (Σ ; B ) ≥ − γ n,c ( f ; a ; σ ) , which yields that for all B ∈ B f ( d n ; a ′ ; σ ′ )sup Σ ∈ ˚ S f,B ( d n ; a ; σ ) n E Σ ( T n ( X , . . . , X n ) − h f (Σ) , B i ) σ f (Σ; B ) ≥ sup t ∈ [ − c,c ] n E t ( T n ( X , . . . , X n ) − g ( t )) σ ( t ) ≥ − γ n,c ( f ; a ; σ )1 + λ n,c ( f ; a ) σ . (9.24)It remains to observe that lim c →∞ lim sup n →∞ γ n,c ( f ; a ; σ ) = 0 and lim c →∞ lim sup n →∞ λ n,c ( f ; a ) = 0 to complete the proof.0 VladimirKoltchinskii Remark 10.
It follows from the proof that the following local version of (1.12) also holds: for all $a'\in(1,a)$, $\sigma'_0>\sigma_0$,
$$\lim_{\delta\to 0}\ \liminf_{n\to\infty}\ \inf_{T_n}\ \inf_{B\in \mathcal{B}_f(d_n;a';\sigma'_0)}\ \inf_{\Sigma_0\in \mathring{\mathcal{S}}_{f,B}(d_n;a';\sigma'_0)}\ \sup_{\|\Sigma-\Sigma_0\|<\delta}\ \frac{n\,\mathbb{E}_\Sigma\bigl(T_n-\langle f(\Sigma),B\rangle\bigr)^2}{\sigma_f^2(\Sigma;B)} \ge 1. \qquad (9.25)$$

Acknowledgments.
The author is very thankful to Richard Nickl, Alexandre Tsybakov and Mayya Zhilova for several helpful conversations, and to Anna Skripka for careful reading of Section 2 of the paper and for making several useful suggestions. A part of this work was done while on leave at the University of Cambridge. The author is thankful to the Department of Pure Mathematics and Mathematical Statistics of this university for its hospitality. Finally, the author is thankful to the referees for numerous helpful comments. The research was partially supported by NSF Grants DMS-1810958, DMS-1509739 and CCF-1523768.
References

[AP1] Aleksandrov, A. and Peller, V.: Functions of perturbed unbounded self-adjoint operators. Operator Bernstein type inequalities.
Indiana Univ. Math. J., 59(4), 1451–1490 (2010).
[AP2] Aleksandrov, A. and Peller, V.: Operator Lipschitz Functions. arXiv:1611.01593 (2016).
[A] Anderson, T.W.:
An introduction to multivariate statistical analysis.
Wiley Seriesin Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ(2003).[ACDS] Azamov, N.A., Carey, A.L., Dodds, P.G. and Sukochev, F.A.: Operator integrals,spectral shift, and spectral flow.
Canad. J. Math. , 61, 2, 241–263 (2009).[BaiS] Bai, Z.D. and Silverstein, J.W.: CLT for linear spectral statistics of large samplecovariance matrices.
The Annals of Probability, 32(1A), 553–605 (2004).
[Bh] Bhatia, R.:
Matrix Analysis.
Springer (1997).[BKRW] Bickel, P.J., Klaassen, C.A.J., Ritov, Y. and Wellner, J.A.:
Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore (1993).
[BJNP] Birnbaum, A., Johnstone, I.M., Nadler, B. and Paul, D.: Minimax bounds for sparse PCA with noisy high-dimensional data.
Annals of Statistics , 41(3):1055–1084 (2013).[CL1] Cai, T.T. and Low, M.: On adaptive estimation of linear functionals.
Annals of Statistics, 33, 2311–2343 (2005).
[CL2] Cai, T.T. and Low, M.: Non-quadratic estimators of a quadratic functional.
Annals of Statistics, 33, 2930–2956 (2005).
[CLZ] Cai, T.T., Liang, T. and Zhou, H.: Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions.
Journal of Multivariate Analysis , 137, 161–172 (2015).[CCT] Collier, O., Comminges, L. and Tsybakov, A.: Minimax estimation of linear andquadratic functionals on sparsity classes.
Annals of Statistics, 45, 3, 923–958 (2017).
[DPR] Dauxois, J., Pousse, A. and Romain, Y.: Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference.
J. Multivariate Anal. , 12(1):136–154 (1982).[Dav] Davies, E.B.:
Linear Operators and their Spectra.
Cambridge University Press,Cambridge, UK (2007).[Eat] Eaton, M.L.:
Multivariate Statistics: A Vector Space Approach. Wiley Series inProbability and Mathematical Statistics: Probability and Mathematical Statistics,
John Wiley & Sons, Inc., New York (1983).
[FRW] Fan, J., Rigollet, P. and Wang, W.: Estimation of functionals of sparse covariance matrices.
Annals of Statistics , 43, 6, 2706–2737 (2015).[FK] Faraut, J. and Koranyi, A.:
Analysis on symmetric cones.
Clarendon Press, Oxford(1994).[GaoZ] Gao, C. and Zhou, H.: Bernstein-von Mises theorems for functionals of the co-variance matrix.
Electronic Journal of Statistics, 10, 2, 1751–1806 (2016).
[GL] Gill, R.D. and Levit, B.Y.: Applications of the van Trees inequality: a Bayesian Cramér-Rao bound. Bernoulli, 1(1–2), 59–79 (1995).
[GN] Giné, E. and Nickl, R.: Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press (2016).
[Gir] Girko, V.L.: Introduction to general statistical analysis.
Theory Probab. Appl. , 32,2: 229–242 (1987).[Gir1] Girko, V.L.:
Statistical analysis of observations of increasing dimension . Springer(1995).[GonZ] Gonska, H. and Zhou, X.-L.: Approximation theorems for the iterated Booleansums of Bernstein operators.
Journal of Computational and Applied Mathematics ,53, 1, 21–31 (1994).[GLM] Graczyk, P., Letac, G. and Massam, H.: The complex Wishart distribution andthe symmetric group.
Annals of Statistics, 31(1), 287–309 (2003).
[GLM1] Graczyk, P., Letac, G. and Massam, H.: The hyperoctahedral group, symmetric group representations and the moments of the real Wishart distribution.
J. Theor.Probability , 18, 1-42 (2005).[IKh] Ibragimov, I.A. and Khasminskii, R.Z.:
Statistical Estimation: Asymptotic Theory .Springer-Verlag, New York (1981).[IKhN] Ibragimov, I.A., Nemirovski, A.S. and Khasminskii, R.Z.: Some problems ofnonparametric estimation in Gaussian white noise.
Theory of Probab. and Appl., 31, 391–406 (1987).
[JaMont] Javanmard, A. and Montanari, A.: Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory.
IEEE Transactions on Information Theory, 60, 10, 6522–6554 (2014).
[James] James, A.T.: The Distribution of the Latent Roots of the Covariance Matrix.
Ann. Math. Statist. , 31(1), 151–158 (1960).[James1] James, A.T.: Zonal Polynomials of the Real Positive Definite Symmetric Ma-trices.
Ann. Math. , 74(3), 456–469 (1961).[James2] James, A.T.: Distributions of Matrix Variates and Latent Roots Derived fromNormal Samples.
Ann. Math. Statist. , 35(2), 475–501 (1964).[JG] Jankov´a, J. and van de Geer, S.: Semi-parametric efficiency bounds for high-dimensional models
Annals of Statistics , 46(5), 2336–2359 (2018).[JHW] Jiao, J., Han, Y. and Weissman, T.: Bias correction with Jackknife, Bootstrap andTaylor Series., arXiv:1709.06183 (2017).[Jo] Johnstone, I.M.: On the distribution of the largest eigenvalue in principal compo-nents analysis.
Annals of Statistics , 29(2):295–327 (2001).[JoLu] Johnstone, I.M. and Lu, A.Y.: On consistency and sparsity for principal com-ponents analysis in high dimensions.
J. Amer. Statist. Assoc. , 104(486):682–693(2009).[Kato] Kato, T:
Perturbation Theory for Linear Operators . Springer-Verlag, New York(1980).[KS] Kissin, E. and Shulman, V.S.: Classes of operator-smooth functions. II. Operator-differentiable functions.
Integral Equations and Operator Theory, 49, 2, 165–210 (2004).
[KL1] Koltchinskii, V. and Lounici, K.: Asymptotics and concentration bounds for bilinear forms of spectral projectors of sample covariance.
Ann. Inst. H. Poincaré Probab. Statist., 52, 4, 1976–2013 (2016).
[KL2] Koltchinskii, V. and Lounici, K.: Concentration inequalities and moment bounds for sample covariance operators.
Bernoulli , 23, 1, 110–133 (2017).[KL3] Koltchinskii, V. and Lounici, K.: Normal approximation and concentration ofspectral projectors of sample covariance.
Annals of Statistics , 45, 1, 121–157 (2017).[KL4] Koltchinskii, V. and Lounici, K.: New Asymptotic Results in Principal ComponentAnalysis.
Sankhya , 79, 2, 254–297 (2017).[KLN] Koltchinskii, V., L¨offler, M. and Nickl, R.: Efficient Estimation of Linear Func-tionals of Principal Components. arXiv:1708.07642 (2017).[KZh] Koltchinskii, V. and Zhilova, M.: Efficient Estimation of Smooth Functionals inGaussian Shift Models. arXiv:1810.02767 (2018).[KoVa] Kong, W. and Valiant, G.: Spectrum estimation from samples.
Annals of Statis-tics , 45, 5, 2218-2247 (2017).[Led] Ledoux, M.:
The Concentration of Measure Phenomenon . American MathematicalSociety (2001).[LetMas] Letac, G. and Massam, H.: All invariant moments of the Wishart distribution.
Scandinavian J. of Statistics, 31(2), 295–318 (2004).
[Lev1] Levit, B.: On the efficiency of a class of non-parametric estimates. Theory of Prob. and Applications, 20(4), 723–740 (1975).
[Lev2] Levit, B.: Asymptotically efficient estimation of nonlinear functionals. Probl. Peredachi Inf. (Problems of Information Transmission), 14(3), 65–72 (1978).
[LP] Lytova, A. and Pastur, L.: Central limit theorem for linear eigenvalue statistics of random matrices with independent entries. Annals of Probability, 37(5), 1778–1840 (2009).
[NSU] Naumov, A., Spokoiny, V. and Ulyanov, V.: Bootstrap confidence sets for spectral projectors of sample covariance. https://arxiv.org/pdf/1703.00871.pdf (2017).
[Nem1] Nemirovski, A.: On necessary conditions for the efficient estimation of functionals of a nonparametric signal which is observed in white noise. Theory of Probab. and Appl., 35, 94–103 (1990).
[Nem2] Nemirovski, A.: Topics in Non-parametric Statistics. Ecole d'Été de Probabilités de Saint-Flour. Lecture Notes in Mathematics, v. 1738, Springer, New York (2000).
[Nik] Nikolsky, S.M.: Inequalities for entire functions of finite degree and their applications in the theory of differentiable functions of many variables. Proc. Steklov Math. Inst., 38, 244–278 (1951) (in Russian).
[Paul] Paul, D.: Asymptotics of sample eigenstructure for a large dimensional spiked covariance model.
Statist. Sinica, 17(4), 1617–1642 (2007).
[Pel] Peller, V.V.: Hankel operators in the perturbation theory of unitary and self-adjoint operators. Funk. Anal. i Ego Pril., 19(2), 37–51 (1985) (in Russian); English transl.: Func. Anal. Appl., 19(2), 111–123 (1985).
[Pel1] Peller, V.V.: Multiple operator integrals and higher operator derivatives. J. Funct. Anal., 233(2), 515–544 (2006).
[Pel2] Peller, V.V.: Multiple operator integrals in perturbation theory. Bull. Math. Sci., 6(1), 15–88 (2016).
[Pet] Petrov, V.V.: Sums of Independent Random Variables. Springer (1975).
[RW] Reiss, M. and Wahl, M.: Non-asymptotic upper bounds for the reconstruction error of PCA. https://arxiv.org/abs/1609.03779 (2016).
[Skr] Skripka, A.: Taylor approximation of operator functions.
Operator Theory: Advances and Applications, 240, Birkhäuser, Basel, 243–256 (2014).
[SW] Sosoe, P. and Wong, P.: Regularity conditions in the CLT for linear eigenvalue statistics of Wigner matrices. Advances in Mathematics, 249(20), 37–87 (2013).
[Tot] Totik, V.: Approximation by Bernstein polynomials. American Journal of Mathematics, 116, 4, 995–1018 (1994).
[Tr] Triebel, H.: Theory of Function Spaces. Birkhäuser Verlag, Basel (1983).
[GBRD] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42(3), 1166–1202 (2014).
[Ver] Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing, pages 210–268. Cambridge Univ. Press, Cambridge (2012).
[ZZ] Zhang, C.-H. and Zhang, S.S.: Confidence intervals for low dimensional parameters in high dimensional linear models.