[PDF] A general method for power analysis in testing high dimensional covariance matrices

Abstract

Covariance matrix testing for high dimensional data is a fundamental problem. A large class of covariance test statistics based on certain averaged spectral statistics of the sample covariance matrix are known to obey central limit theorems under the null. However, precise understanding for the power behavior of the corresponding tests under general alternatives remains largely unknown. This paper develops a general method for analyzing the power behavior of covariance test statistics via accurate non-asymptotic power expansions. We specialize our general method to two prototypical settings of testing identity and sphericity, and derive sharp power expansion for a number of widely used tests, including the likelihood ratio tests, Ledoit-Nagao-Wolf's test, Cai-Ma's test and John's test. The power expansion for each of those tests holds uniformly over all possible alternatives under mild growth conditions on the dimension-to-sample ratio. Interestingly, although some of those tests are previously known to share the same limiting power behavior under spiked covariance alternatives with a fixed number of spikes, our new power characterizations indicate that such equivalence fails when many spikes exist. The proofs of our results combine techniques from Poincar\'e-type inequalities, random matrices and zonal polynomials.

Full PDF

AA GENERAL METHOD FOR POWER ANALYSIS INTESTING HIGH DIMENSIONAL COVARIANCEMATRICES

QIYANG HAN, TIEFENG JIANG, AND YANDI SHEN

Abstract.

Covariance matrix testing for high dimensional data is afundamental problem. A large class of covariance test statistics basedon certain averaged spectral statistics of the sample covariance matrixare known to obey central limit theorems under the null. However, pre-cise understanding for the power behavior of the corresponding testsunder general alternatives remains largely unknown. This paper devel-ops a general method for analyzing the power behavior of covariancetest statistics via accurate non-asymptotic power expansions. We spe-cialize our general method to two prototypical settings of testing iden-tity and sphericity, and derive sharp power expansion for a number ofwidely used tests, including the likelihood ratio tests, Ledoit-Nagao-Wolf’s test, Cai-Ma’s test and John’s test. The power expansion foreach of those tests holds uniformly over all possible alternatives undermild growth conditions on the dimension-to-sample ratio. Interestingly,although some of those tests are previously known to share the same lim-iting power behavior under spiked covariance alternatives with a ﬁxednumber of spikes, our new power characterizations indicate that suchequivalence fails when many spikes exist. The proofs of our results com-bine techniques from Poincar´e-type inequalities, random matrices andzonal polynomials. Introduction

Problem setup.

Let X , . . . , X n be i.i.d. samples from a p -variatenormal distribution N p ( µ, Σ). We are interested in the following generaltesting problem: H : ( µ, Σ) ∈ H versus H : H does not hold (1.1)for certain classes H to be speciﬁed in later sections. For the most of thepaper, we will focus on the marginal testing of the covariance matrix Σ.Covariance matrix testing is a fundamental problem in multivariate sta-tistical analysis. Departing from the classical low-dimensional setting where Date : January 28, 2021.2000

Mathematics Subject Classiﬁcation.

Key words and phrases.

Central limit theorem, covariance test, large dimensionality,power analysis, Poincar´e inequalities, sphericity tests, spiked covariance.The research of Q. Han is partially supported by NSF DMS-1916221. The research ofT. Jiang is partially supported by NSF DMS-1916014. a r X i v : . [ m a t h . S T ] J a n Q. HAN, T. JIANG, AND Y. SHEN the dimension p is ﬁxed, e.g. [And58, Mui82, Eat83], the majority of re-cent works have been devoted to (1.1) in the high-dimensional setting where p is allowed to grow proportionally or even polynomially with n ; see e.g.,[LW02, BD05, BJYZ09, JJY12, CM13, JY13, JQ15, CJ18], for an incom-plete list. To facilitate discussions below, let X = [ X , . . . , X n ] (cid:62) ∈ R n × p be the data matrix, and let T ( X ) be a generic test statistic whose distribu-tion is invariant under H , i.e., the law of T ( X ) remains the same for any( µ, Σ) ∈ H in (1.1). Denote (throughout the paper we use the symbol ≡ for deﬁnition) m ( µ, Σ) ≡ E ( µ, Σ) T ( X ) , σ µ, Σ) ≡ Var ( µ, Σ) (cid:0) T ( X ) (cid:1) (1.2)for the mean and variance of T ( X ) under N p ( µ, Σ), respectively. We alwaysassume that the two quantities in (1.2) are ﬁnite. In a similar spirit, we usethe subscript ( µ, Σ) in E ( µ, Σ) and other probabilistic notations to indicatethat the evaluation is under measure N p ( µ, Σ). Due to the distributionalinvariance of T ( X ), its mean and variance under the null m H ≡ m ( µ , Σ ) , σ H ≡ σ µ , Σ ) (1.3)are well-deﬁned for any speciﬁcation of ( µ , Σ ) ∈ H .A common theme of the aforementioned works is a central limit theorem(CLT) for the normalized test statistic T ( X ) under the null H : underthe assumption min { n, p } → ∞ along with some other case-speciﬁc growthconditions on ( n, p ), it holds that T ( X ) − m H σ H converges in distribution to N (0 ,

1) under H . (1.4)Hereafter N (0 ,

1) denotes the standard normal distribution.The persistence of the universal CLT (1.4) in a wide class of covariancetest statistics T ( X ) in the high-dimensional regime, as cited above, is nota mere coincidence: it is known that Gaussian approximation holds when T ( X ) depends on ‘suﬃcient average’ of eigenvalues of the sample covariancematrix, for instance when T ( X ) can be written as its linear spectral statistic[BS04]. From a statistical point of view, the validity of CLT (1.4) imme-diately leads to the construction of an asymptotically exact test: for anyprescribed size α ∈ (0 , X ) ≡ Ψ( X ; m H , σ H ) ≡ (cid:16) T ( X ) − m H σ H > z α (cid:17) . (1.5)Here z α is the normal quantile such that P ( N (0 , > z α ) = α . The quanti-ties m H and σ H are usually known in closed forms, at least asymptotically,in the above cited works to carry out the tests. Even not amenable to exactexpression, these quantities can be simulated easily as well.To assess the quality of a generic test Ψ( X ) and facilitate comparisonbetween diﬀerent tests, a more subtle and diﬃcult question is to study the IGH DIMENSIONAL COVARIANCE TESTING 3 power behavior, ideally asymptotically exact, for each and every test. Al-though of fundamental importance, existing technical devices for asymptot-ically exact power analysis of covariance tests are rather limited. Roughlyspeaking, these techniques fall into the following two main categories:(1) Establish directly a central limit theorem for T ( X ) under the alternative[WY13, CM13, CJ18, Jia19]. A number of case-speciﬁc techniques, e.g.,random matrix theory [WY13], moment calculations [CJ18], martingaletheory [CM13], have been used along this line for diﬀerent tests.(2) Use contiguity theory in conjunction with Le Cam’s third lemma. Thisprogram is carried in [OMH13, OMH14] in the spiked covariance alter-native with a ﬁxed number of spikes.In addition to the case-speciﬁc nature of the techniques involved, a commondownside of these methods lies in the imposition of rather restrictive con-ditions on both the growth of ( n, p ) and the alternative Σ under which thepower analysis is valid. These restrictions may sometimes be more funda-mental than technical. For instance, the method (2) works only for spiked al-ternatives in the sub-critical regime below the Baik-Ben Arous-P´ech´e (BBP)phase transition [BBAP05], as the log likelihood ratio process could becomesingular in the super-critical regime above the BBP phase transition. SeeSection 5 for a detailed technical comparison.1.2. A new method of power analysis.

In this paper, we develop ageneral method for analyzing the power behavior of a generic test statistic T ( X ) when the CLT (1.4) under the null holds. For the test Ψ( X ) in (1.5)built from a generic test statistic T ( X ) whose distribution remains invariantunder H , the general power formula (see Theorem 2.1) takes the followingform: for any ( µ, Σ) ∈ R p × M p , where M p is the set of all p × p covariancematrices, (cid:12)(cid:12)(cid:12)(cid:12) E ( µ, Σ) Ψ( X ) − (cid:20) − Φ (cid:18) z α − m ( µ, Σ) − m H σ H (cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤ err H + err ( µ, Σ) . (1.6)Here m ( µ, Σ) is deﬁned in (1.2), err H is the normal approximation error of T ( X ) under H in Kolmogorov distance [formally deﬁned in (2.2) ahead],and err ( µ, Σ) = (cid:16) V ( µ, Σ) max (cid:8) | m ( µ, Σ) − m H | , σ H (cid:9) (cid:17) / (1.7)characterizes the departure of ( µ, Σ) from the null (so err ( µ , Σ ) = 0 for any( µ , Σ ) ∈ H ). The ‘variance’ parameter V µ, Σ) , formally deﬁned in (2.1)ahead, characterizes the order of stochastic ﬂuctuation of the test statistic T ( X ) under the alternative compared to that under the null.From (1.6), it is clear that when the CLT (1.4) under the null holds,the power of Ψ( X ) under the alternative ( µ, Σ) has an asymptotically exactexpression via the parameter (cid:0) m ( µ, Σ) − m H (cid:1) /σ H , provided that err ( µ, Σ) →

0. Consequently, the key step in applying (1.6) rests in the validation of

Q. HAN, T. JIANG, AND Y. SHEN the condition err ( µ, Σ) →

0. Informally, this condition is satisﬁed as long asthe distribution of T ( X ) ‘stabilizes’ under the prescribed alternative in anappropriate sense. More precisely, err ( µ, Σ) vanishes as long as the order ofstochastic ﬂuctuation V µ, Σ is smaller compared either to the mean diﬀerence | m ( µ, Σ) − m H | , or to the standard deviation σ H of the test statistic T ( X )under the null. In typical applications that will be detailed below, the formercase corresponds to the large departure regime of the alternative from thenull, i.e., ( µ, Σ) is suﬃciently away from H , while the latter to the smalldeparture regime, so usually the validity of the power expansion (1.6) holdsfor the entire regime of alternatives.From a diﬀerent angle, our theory (1.6) is reminiscent of the classical LeCam’s contiguity theory in parametric LAN models. There if an estimatorsequence is asymptotically normally distributed under the null, then it isagain asymptotically normal under the alternative, but with a mean shiftwhose exact value depends on the magnitude of the local alternative, cf.[vdV98]. Therefore the power of the corresponding test based on such anestimator sequence is determined completely by this mean shift parame-ter. Our theory (1.6) suggests a similar paradigm in the context of high-dimensional covariance testing (1.1), in that the power of a test statistic witha null CLT (1.4) is determined by the (normalized) mean shift parameter (cid:0) m ( µ, Σ) − m H (cid:1) /σ H .1.3. Two testing cases: testing identity and sphericity.

To demon-strate the versatility of the general principle described above, we apply it toa number of test statistics in two benchmark special cases of (1.1).The ﬁrst application is the test for identity Σ = I . In the growing p setting, this problem has been extensively studied in the literature, see e.g.,[LW02, Sri05, BJYZ09, CZZ10, JJY12, CM13, JY13, ZBY15, CJ18]. Amongthe tests studied in the above works, we apply our general theory (1.6) tothe following three tests: Likelihood Ratio Test (LRT) (see Section 3.1),Ledoit-Nagao-Wolf’s test [Nag73, LW02] (see Section 3.3), and Cai-Ma’s test[CM13] (see Section 3.4). Compared to previous results where power analysisis either missing or requires restrictive conditions on the alternative, ourresults pose no assumptions on the alternative Σ and only mild conditionson the growth of ( n, p ). As an example, the LRT, denoted by Ψ LRT ( X ), isshown to admit the following asymptotic power formula (see Theorem 3.3):under min { n, p } → ∞ with lim( p/n ) < E ( µ, Σ) Ψ LRT ( X ) ∼ − Φ (cid:18) z α − d S (Σ , I ) (cid:113) (cid:0) − pn − − log (cid:0) − pn − (cid:1)(cid:1) (cid:19) . (1.8)Here a ∼ b stands for a/b → ( µ, Σ) →

0. Inthe LRT setting, a much stronger estimate can be proved in that err ( µ, Σ) ≤ Cp − / holds for some absolute constant C >

0. This key estimate follows

IGH DIMENSIONAL COVARIANCE TESTING 5 by a series of algebraic manipulations, upon calculating that V µ, Σ) = ( n − (cid:107) Σ − I (cid:107) F , m ( µ, Σ) − m H = [( n − / d S (Σ , I ), and σ H ≥ cp for someabsolute constant c > . See Proposition 3.2 and its proof for more details.The second application is the sphericity test Σ = λI for some unspeciﬁed λ >

0. In the growing p setting, this problem has previously been studiedin [LW02, Sri05, CZZ10, JJY12, JY13, JQ15]. We study in this paper thefollowing two widely-used tests: LRT for sphericity (Section 4.1), John’s test[Joh71] (Section 4.2), both invariant under H . Similar to the previous case,our results on the power behavior of these tests do not pose any assumptionon the alternative Σ. As an example, the LRT for sphericity, denoted byΨ LRT ,s ( X ), is shown to admit the following asymptotic power formula (seeTheorem 4.3): under min { n, p } → ∞ with lim( p/n ) < E ( µ, Σ) Ψ LRT ,s ( X ) ∼ − Φ (cid:18) z α − − log det(Σ · b − (Σ)) (cid:113) (cid:0) − pn − − log (cid:0) − pn − (cid:1)(cid:1) (cid:19) . (1.9)Here det( · ) is the matrix determinant and b (Σ) ≡ tr(Σ) /p with tr( · ) denotingthe trace. To the best of our knowledge, the above power formula for theLRT in the sphericity is new in the literature.It should be mentioned that although we state (1.8)-(1.9) in asymptoticformulae for simplicity of representation in the introduction, these resultsactually hold with explicit non-asymptotic error bounds due to the intrinsicﬁnite-sample nature of our theory (1.6). The non-asymptotic error boundsof the power expansion of all the aforementioned tests require quantitativenormal approximation error bounds err H for the corresponding test statis-tics, whose proofs depend on several spectral estimates for a class of specialhigh dimensional matrices that will be detailed in Section 7. These results,proved using techniques from (second-order) Poincar´e inequalities, randommatrices and zonal polynomials, are new and of independent interest (andsometimes even improve signiﬁcantly known asymptotic results).We conclude this introduction by noting that in contrast to (1.1) whichtargets at general alternatives, several previous works [OMH13, WY13,OMH14] obtained power expansions similar to (1.6) within a special classof alternatives known as the spiked covariance model [Joh01]. We draw de-tailed comparisons with these results in Section 5. In particular, as will beclear in Section 5, although [OMH13, WY13, OMH14] showed that someof the aforementioned tests have asymptotically equivalent power behaviorunder the spiked covariance alternative with a ﬁxed number of spikes, ournew power characterizations indicate that such equivalence in general failswhen many spikes exist.1.4. Organization.

The rest of the paper is organized as follows. We detailthe general principle described above in Section 2. Sections 3 and 4 are Here (cid:107)·(cid:107) F is the matrix Frobenius norm and d S ( · , · ) is the matrix Stein loss to bedeﬁned in (3.5) ahead. Q. HAN, T. JIANG, AND Y. SHEN devoted to testing Σ = I and Σ = λI respectively. Section 5 focuses on thecase study of spike alternatives. Some concluding remarks are in Section 6followed by some key spectral estimates in Section 7. Sections 8 - 13 containthe main proofs of results in Sections 3 and 4, with the rest of technicaldetails deferred to the appendices.1.5. Notation.

For any positive integer n , let [ n ] denote the set { , . . . , n } .For a, b ∈ R , a ∨ b ≡ max { a, b } and a ∧ b ≡ min { a, b } . For a ∈ R , let a + ≡ a ∨ a − ≡ ( − a ) ∨

0. For x ∈ R n , let (cid:107) x (cid:107) p = (cid:107) x (cid:107) (cid:96) p ( R n ) denote its p -norm (0 ≤ p ≤ ∞ ) with (cid:107) x (cid:107) abbreviated as (cid:107) x (cid:107) . Let B p ( r ; x ) ≡ { z ∈ R p : (cid:107) z − x (cid:107) ≤ r } be the unit (cid:96) ball in R p . By n we denote the vector ofall ones in R n . For a matrix M ∈ R n × n , let (cid:107) M (cid:107) op and (cid:107) M (cid:107) F denote thespectral and Frobenius norms of M respectively. We use { e j } to denote thecanonical basis, whose dimension should be self-clear from the context.We use C x to denote a generic constant that depends only on x , whosenumeric value may change from line to line unless otherwise speciﬁed. No-tations a (cid:46) x b and a (cid:38) x b mean a ≤ C x b and a ≥ C x b respectively, and a (cid:16) x b means a (cid:46) x b and a (cid:38) x b . The symbol a (cid:46) b means a ≤ Cb forsome absolute constant C . For two nonnegative sequences { a n } and { b n } ,we write a n (cid:28) b n (respectively a n (cid:29) b n ) if lim n →∞ ( a n /b n ) = 0 (respec-tively lim n →∞ ( a n /b n ) = ∞ ). We write a n ∼ b n if lim n →∞ ( a n /b n ) = 1. Wefollow the convention that 0 / ϕ, Φ be the density and the cumulative distribution function of astandard normal random variable. For any α ∈ (0 , z α be the normalquantile deﬁned by P ( N (0 , > z α ) = α . For two random variables X, Y on R , we use d TV ( X, Y ) and d Kol ( X, Y ) to denote their total variation distanceand Kolmogorov distance deﬁned respectively by d TV ( X, Y ) ≡ sup B ∈B ( R ) (cid:12)(cid:12) P (cid:0) X ∈ B (cid:1) − P (cid:0) Y ∈ B (cid:1)(cid:12)(cid:12) ,d Kol ( X, Y ) ≡ sup t ∈ R (cid:12)(cid:12) P (cid:0) X ≤ t (cid:1) − P (cid:0) Y ≤ t (cid:1)(cid:12)(cid:12) . (1.10)Here B ( R ) denotes the Borel σ -algebra of R .Let γ d be the standard Gaussian measure on R d , and for r ≥ W r, ( γ d )be the completion of C ∞ ( R d ), the space of smooth and compactly supportedfunctions in R d , with respect to the norm (cid:107) f (cid:107) r ≡ (cid:20) (cid:88) | α |≤ r (cid:90) (cid:0) ∂ α f ( x ) (cid:1) γ d (d x ) (cid:21) / . (1.11)In other words, W r, ( γ d ) is the Sobolev space with respect to the Gaussianmeasure γ d . 2. A general principle

Consider a generic test statistic T : R n × p → R whose law is invariantunder H , i.e., for any ( µ, Σ) ∈ H , the law of T ( X ) remains the same. For IGH DIMENSIONAL COVARIANCE TESTING 7 any ( µ, Σ) ∈ R p × M p , let T ( µ, Σ) : R n × p → R n × p be deﬁned by T ( µ, Σ) ( z ) ≡ ∇ T (cid:0) z Σ / + n µ (cid:62) (cid:1) Σ / , z ∈ R n × p . Here n is the n -vector of all ones, and ∇ T : R n × p → R n × p is the mapwith (cid:0) ∇ T ( z ) (cid:1) ij = ∂T ( z ) /∂z ij . Recall that Z , . . . , Z n are i.i.d. randomvariables with a standard p -variate normal distribution N (0 , I p ). For any( µ, Σ) ∈ R p × M p , deﬁne the quantity V µ, Σ) ≡ inf ( µ , Σ ) ∈ H E (cid:13)(cid:13) T ( µ, Σ) ( Z ) − T ( µ , Σ ) ( Z ) (cid:13)(cid:13) F . (2.1)The inﬁmum in the above deﬁnition is usually dummy as in many cases T itself is invariant over H in the sense that for any ( µ , Σ ) , ( µ , Σ ) ∈ H and any z ∈ R n × p , T ( z Σ / + n µ (cid:62) ) = T ( z Σ / + n µ (cid:62) ), so T ( µ , Σ ) ( z ) = T ( µ , Σ ) ( z ).Equipped with the above deﬁnitions, the following result provides a gen-eral recipe of analyzing the behavior of the statistic T ( X ) whenever normalapproximation under the null is possible; its proof is presented later in thissection. Recall the quantities m ( µ, Σ) , m H , σ µ, Σ) , σ H deﬁned in (1.2)-(1.3)and that γ n × p denotes the standard Gaussian measure in R n × p . Theorem 2.1.

Suppose that T : R n × p → R is an element of W , ( γ n × p ) ,and the law of T ( X ) is invariant under H . Then for any ( µ, Σ) ∈ R p × M p and t ∈ R , (cid:12)(cid:12)(cid:12)(cid:12) P ( µ, Σ) (cid:18) T (cid:0) X (cid:1) − m H σ H > t (cid:19) − P (cid:18) N (cid:18) m ( µ, Σ) − m H σ H , (cid:19) > t (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ err H + C · (cid:18) (1 + | t | ) V ( µ, Σ) | m ( µ, Σ) − m H | (cid:94) V ( µ, Σ) σ H (cid:19) / . Here

C > is a universal constant, and err H ≡ d Kol (cid:18) T (cid:0) X (cid:1) − m H σ H , N (0 , (cid:19) under H (2.2) is the normal approximation error of T ( X ) under H in Kolmogorov distanceas deﬁned in (1.10).Remark . The comparison with normal distributions in the above theo-rem could be extended to more general distributions. We refrain from suchextension because all of the tests statistics considered in this paper (seeSections 3 and 4 ahead) have a normal limit under the null.Theorem 2.1 uniﬁes and broadens substantially the scope of power anal-ysis in the current covariance testing literature. In particular, it poses noapriori assumptions on the alternative and applies to both contiguous andnon-contiguous one, while most of the current literature focus on the behav-ior of the statistic under contiguous alternatives (with only known exceptionsin [CJ18] for the LRT).

Q. HAN, T. JIANG, AND Y. SHEN

Recall the generic test Ψ( X ) deﬁned in (1.5). The following result is animmediate consequence of Theorem 2.1. Corollary 2.3.

Suppose that T : R n × p → R is an element of W , ( γ n × p ) and the law of T ( X ) is invariant under H . For any α ∈ (0 , , there existssome C α > such that (cid:12)(cid:12)(cid:12)(cid:12) E ( µ, Σ) Ψ( X ) − (cid:20) − Φ (cid:18) z α − m ( µ, Σ) − m H σ H (cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤ err H + C α (cid:18) V ( µ, Σ) | m ( µ, Σ) − m H | ∨ σ H (cid:19) / holds for any ( µ, Σ) ∈ R p × M . Here err H is deﬁned in (2.2). The above result reduces the analysis of the behavior of Ψ( X ) in (1.5)into essentially the following two steps:(1) ( Normal approximation under H ) Show thaterr H = d Kol (cid:18) T (cid:0) X (cid:1) − m H σ H (cid:19) → , under H . Asymptotic normality has been derived for a variety of statistics in thehigh dimensional covariance testing literature, using mostly case-speciﬁctechniques; see e.g., [LW02, BD05, BJYZ09, JJY12, CM13, JY13, JQ15,CJ18] for an incomplete list. In this paper, we show err H → T ( X ), as opposed to the rather “hard”calculation techniques adopted in previous literature.(2) ( Ratio control ) Show that V ( µ, Σ) | m ( µ, Σ) − m H | ∨ σ H → . (2.3)This requires upper bounds on V ( µ, Σ) and lower bounds for | m ( µ, Σ) − m H | and σ H . The general strategy for these bounds are: (i) the term V ( µ, Σ) deﬁned in (2.1) can be evaluated by computing the gradient of T , (ii)the null variance σ H has asymptotically exact formula for many teststatistics in the literature, and can sometimes be directly evaluated viaFourier expansion in the Gaussian space; (iii) the mean diﬀerence term | m ( µ, Σ) − m H | requires a near closed-form formula for the mean of T under the alternative. As will be seen in the examples in Sections 3 and4, evaluation of these quantities typically requires certain case-speciﬁctechniques. IGH DIMENSIONAL COVARIANCE TESTING 9

The remainder of this section is devoted to the proof of Theorem 2.1,which utilizes the following lemma.

Lemma 2.4.

For any t ∈ R and u ∈ R , (cid:12)(cid:12) P (cid:0) N ( u, ≤ t (cid:1) − P (cid:0) N ((1 + η ) u, ≤ t (cid:1)(cid:12)(cid:12) ≤ | t | ) · | η | . Proof.

This result strengthens [HSS20, Lemma 5.4]. We assume withoutloss generality η ∈ [ − / , /

2] because otherwise the right hand side of thedesired display is greater than or equal to 1. Note that the left hand side isbounded by (cid:12)(cid:12)(cid:12)(cid:12) (cid:90) t − (1+ η ) ut − u ϕ ( z ) d z (cid:12)(cid:12)(cid:12)(cid:12) ≤ | η | · (cid:20) sup v ∈ [( t − u ) −| ηu | , ( t − u )+ | ηu | ] ϕ ( v ) | u | (cid:21) ≡ | η | · M t ( u ) . Here ϕ ( · ) is the normal density. First consider u ≥

0. Then M t ( u ) ≤ sup v ∈ [ t − u/ ,t − u/ ϕ ( v ) u . • If t − u/ ≤

0, then M t ( u ) ≤ ϕ (cid:16) t − u (cid:17) u = ϕ (cid:16) t − u (cid:17) ( u − t ) + 2 tϕ (cid:16) t − u (cid:17) ≤ x ∈ R | x | ϕ ( x ) + 2 √ π | t | = 2 √ πe + 2 √ π | t | . Here we used the readily veriﬁed fact that sup x ∈ R | x | ϕ ( x ) = 1 / √ πe . • If t − u/ ≥

0, then M t ( u ) ≤ ϕ (cid:16) t − u (cid:17) u = ϕ (cid:16) t − u (cid:17)(cid:16) u − t (cid:17) + 23 tϕ (cid:16) t − u (cid:17) ≤ (cid:16) √ e + 1 √ π | t | (cid:17) . • Otherwise (2 / t ≤ u ≤ t , so M t ( u ) ≤ | u | ≤ | t | .The case u < u M t ( u ) ≤ | t | ). (cid:3) Proof of Theorem 2.1.

Let Z ∈ R n × p be a matrix generated by n i.i.d. sam-ples from N (0 , I p ). Let X ( µ, Σ) ≡ Z Σ / + n µ (cid:62) . Without loss of generality,let ( µ , Σ ) ∈ H be the pair achieving the inﬁmum in (2.1). Then, T ( X ) − m H σ H d = T (cid:0) X ( µ, Σ) (cid:1) − T (cid:0) X ( µ , Σ ) (cid:1) σ H + T (cid:0) X ( µ , Σ ) (cid:1) − m H σ H = m ( µ, Σ) − m H σ H + W ( Z ) σ H + T (cid:0) X ( µ , Σ ) (cid:1) − m H σ H . (2.4)Here W ( Z ) is the centered variable deﬁned by W ( Z ) ≡ T (cid:0) Z Σ / + n µ (cid:62) (cid:1) − T (cid:0) Z Σ / + n µ (cid:62) (cid:1) − (cid:0) m ( µ, Σ) − m H (cid:1) . Using the chain rule, ∂ ( ij ) W ( Z ) = (cid:88) ( i (cid:48) j (cid:48) ) (cid:20) ∂∂X ( µ, Σ)( i (cid:48) j (cid:48) ) T (cid:0) X ( µ, Σ) (cid:1) · ∂X ( µ, Σ)( i (cid:48) j (cid:48) ) ∂Z ( ij ) − ∂∂X ( µ , Σ )( i (cid:48) j (cid:48) ) T (cid:0) X ( µ , Σ ) (cid:1) · ∂X ( µ , Σ )( i (cid:48) j (cid:48) ) ∂Z ( ij ) (cid:21) = (cid:88) j (cid:48) (cid:20)(cid:0) ∇ T ( X ( µ, Σ) ) (cid:1) ij (cid:48) · (Σ / ) j (cid:48) j − T ( X ( µ , Σ ) ) (cid:1) ij (cid:48) · (Σ / ) j (cid:48) j (cid:21) = (cid:16) ∇ T (cid:0) X ( µ, Σ) (cid:1) Σ / − ∇ T (cid:0) X ( µ , Σ ) (cid:1) Σ / (cid:17) ij = (cid:0) T ( µ, Σ) ( Z ) − T ( µ , Σ ) ( Z ) (cid:1) ij . By the Gaussian-Poincar´e inequality [BLM13, Theorem 3.20],Var (cid:0) W ( Z ) (cid:1) ≤ E (cid:20) (cid:88) ( ij ) (cid:0) ∂ ( ij ) W ( Z ) (cid:1) (cid:21) = E (cid:13)(cid:13) T ( µ, Σ) ( Z ) − T ( µ , Σ ) ( Z ) (cid:13)(cid:13) F = V µ, Σ) . This means for any u >

0, on an event E with probability at least 1 − u − , (cid:12)(cid:12) W ( Z ) (cid:12)(cid:12) ≤ u · V ( µ, Σ) . Hence for any t ∈ R , the decomposition (2.4) entails that [recall the deﬁni-tion of err H in (2.2)] P (cid:18) T (cid:0) X ( µ, Σ) (cid:1) − m H σ H > t (cid:19) = P (cid:18) m ( µ, Σ) − m H + W ( Z ) σ H + T (cid:0) X ( µ , Σ ) (cid:1) − m H σ H > t (cid:19) ≤ P (cid:18) m ( µ, Σ) − m H + u · V ( µ, Σ) σ H + T (cid:0) X ( µ , Σ ) (cid:1) − m H σ H > t (cid:19) + 1 u ≤ P (cid:18) m ( µ, Σ) − m H + u · V ( µ, Σ) σ H + N (0 , > t (cid:19) + 1 u + err H ≡ p ( u ) + err H . Next we bound p ( · ) using two diﬀerent ways. First by Lemma 2.4, we haveinf u> p ( u ) ≤ P (cid:18) m ( µ, Σ) − m H σ H + N (0 , > t (cid:19) + inf u> (cid:20) | t | ) u · V ( µ, Σ) | m ( µ, Σ) − m H | + 1 u (cid:21) ≤ P (cid:18) m ( µ, Σ) − m H σ H + N (0 , > t (cid:19) + C (cid:18) (1 + | t | ) V ( µ, Σ) | m ( µ, Σ) − m H | (cid:19) / . On the other hand, by anti-concentration of the standard normal distribu-tion, i.e., | P (cid:0) N (0 , ≤ a (cid:1) − P (cid:0) N (0 , ≤ b (cid:1) | ≤ | a − b | for any a, b ∈ R ,inf u> p ( u ) ≤ P (cid:18) m ( µ, Σ) − m H σ H + N (0 , > t (cid:19) + inf u> (cid:20) u · V ( µ, Σ) σ H + 1 u (cid:21) ≤ P (cid:18) m ( µ, Σ) − m H σ H + N (0 , > t (cid:19) + C (cid:18) V ( µ, Σ) σ H (cid:19) / . IGH DIMENSIONAL COVARIANCE TESTING 11

Collecting the bounds completes the proof for one direction. For the otherdirection, we have P (cid:18) T (cid:0) X ( µ, Σ) (cid:1) − m H σ H > t (cid:19) = P (cid:18) m ( µ, Σ) − m H + W ( Z ) σ H + T (cid:0) X ( µ , Σ ) (cid:1) − m H σ H > t (cid:19) ≥ P (cid:18) m ( µ, Σ) − m H − u · V ( µ, Σ) σ H + T (cid:0) X ( µ , Σ ) (cid:1) − m H σ H > t (cid:19) − u ≥ P (cid:18) m ( µ, Σ) − m H − u · V ( µ, Σ) σ H + N (0 , > t (cid:19) − u − err H . The rest of the proof follows from similar arguments as in the previousdirection by invoking the two diﬀerent bounds. (cid:3) Testing identity

Σ = I We introduce some additional notation. Based on i.i.d. samples X , . . . , X n from N ( µ, Σ), the sample covariance matrix and its unbiased modiﬁcationare given by S ∗ ≡ n − n (cid:88) k =1 (cid:0) X k − ¯ X (cid:1)(cid:0) X k − ¯ X (cid:1) (cid:62) with ¯ X ≡ n − n (cid:88) i =1 X i ,S ≡ nN S ∗ d = 1 N N (cid:88) k =1 ( X k − µ ) (cid:62) ( X k − µ ) . (3.1)Here N = n − S for math-ematical simplicity (unless otherwise speciﬁed), and adopt the right mostexpression of (3.1) as its deﬁnition whenever no confusion could arise.3.1. LRT.

Consider the testing problem: H : Σ = I versus H : H does not hold . (3.3)This is a special case of (1.1) by taking H = R p × { I } , and has beenextensively studied in the literature; see [LW02, Sri05, BJYZ09, CZZ10,JJY12, CM13, JY13, ZBY15, CJ18] for an incomplete list.This subsection studies the behavior of the LRT for testing (3.3). Themodiﬁed log-likelihood ratio statistic T LRT : R N × p → R (cf. [Mui82, Theo-rem 8.4.2]) is deﬁned as T LRT ( X ) ≡ N (cid:2) tr( S ) − log det S − p (cid:3) . (3.4) Trivially, the law of T LRT ( X ) is invariant under H . The general principle inTheorem 2.1 thereby applies in view of the regularity of T LRT (see AppendixB). Moreover, due to the translation invariance of T LRT , we will use the set ofnotation (cid:0) m Σ;LRT , σ

Σ;LRT , V

Σ;LRT (cid:1) to represent their generic versions deﬁnedin (1.2) and (2.1).Following the discussion after Corollary 2.3, we start by establishing aquantitative CLT for T LRT ( X ) under H ; its proof is presented in Section8.2. Theorem 3.1.

Suppose p/N ≤ − ε for some ε ∈ (0 , . Then there existssome constant C = C ( ε ) > , such that under H , d TV (cid:18) T LRT ( X ) − m I ;LRT σ I ;LRT , N (0 , (cid:19) ≤ Cp .

Below we make some comments on Theorem 3.1: • (( n, p ) -condition ) It is clear from the deﬁnition of T LRT ( X ) that if p ≥ n then S is singular and the log-likelihood ratio statistic T LRT ≡ −∞ isdegenerate. The CLT for the log-likelihood ratio statistic T LRT ( X ) under H was ﬁrst derived in [BJYZ09] using random matrix theory under theassumption that p/n → y for some y ∈ (0 , n > p + 1and p → ∞ , and in [ZBY15] to relax the Gaussian assumption. Thecondition p/N ≤ − ε in Theorem 3.1 is used to derive the stable estimate E (cid:107) S − (cid:107) op ≤ C for some absolute constant C >

0; see Lemma 7.3 fordetails. • ( Rate of normal approximation ) As introduced in Section 2, Theorem3.1 and other CLT’s to follow are proved via Chatterjee’s second-orderPoincar´e inequality [Cha09], which provides the rate p − of normal ap-proximation. As for ﬁxed p , 2 T LRT ( X ) converges weakly under H to achi-squared distribution with p ( p + 1) / p − cannot be further im-proved.The following result establishes the ratio control (2.3) for the log-likelihoodratio statistic T LRT ( X ); its proof is presented in Section 8.3. For p.s.d. Σ and p.d. Σ , let d S (Σ , Σ ) ≡ tr(Σ Σ − ) − log det (cid:0) Σ Σ − ) − p (3.5)be the Stein loss with the convention that d S (Σ , Σ ) ≡ ∞ if Σ is singular. Proposition 3.2.

Suppose Σ is non-singular. The following hold:(1) V = N (cid:107) Σ − I (cid:107) F .(2) m Σ;LRT − m I ;LRT = ( N/ d S (Σ , I ) .(3) In the asymptotic regime N ≥ p + 1 with p → ∞ , σ I ;LRT ∼ N (cid:20) − pN − log (cid:18) − pN (cid:19)(cid:21) . IGH DIMENSIONAL COVARIANCE TESTING 13

In particular, σ I ;LRT ≥ cp for some universal constant c > .(4) There exists some universal constant C > such that V Σ;LRT | m Σ;LRT − m I ;LRT | ∨ σ I ;LRT ≤ Cp / . The above proposition gives a prototypical example of how to proceedwith the ratio control (2.3). For the log-likelihood ratio statistic T LRT ( X )deﬁned in (3.4), both V Σ;LRT and the mean diﬀerence m Σ;LRT − m I ;LRT admit easy-to-handle closed-form formulae. To give some insights for thebound obtained in Proposition 3.2-(4), let us consider the ‘local regime’ ofalternatives in which d S (Σ , I ) ≈ (cid:107) Σ − I (cid:107) F . Then (2.3) can be bounded, upto a constant, by (cid:113) N (cid:107) Σ − I (cid:107) F N (cid:107) Σ − I (cid:107) F ∨ σ I ;LRT ≤ sup x ≥ xx ∨ σ I ;LRT = 1inf x ≥ (cid:0) x ∨ σ I ;LRT x (cid:1) = 1 σ / I ;LRT , in the prescribed local regime of alternatives. The above simple reasoningexempliﬁes the essential reason why the ratio (2.3) must be small: if Σis suﬃciently away from I , then the mean diﬀerence m Σ;LRT − m I ;LRT issubstantially larger than V Σ;LRT , but would otherwise be compensated bythe diverging nature of σ I ;LRT .Let Ψ LRT ( X ) be the LRT built from the generic test (1.1) and the log-likelihood ratio statistic T LRT ( X ). Combining the above results with thegeneric Theorem 2.1, we obtain the following asymptotic formula for thepower behavior of Ψ LRT ( X ). Recall that z α is the normal quantile deﬁnedunder (1.5). Theorem 3.3.

Suppose p/N ≤ − ε for some ε ∈ (0 , . Then there existssome constant C = C ( ε, α ) > such that (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ LRT ( X ) − P (cid:18) N (cid:18) N · d S (Σ , I )2 σ I , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . (3.6) Consequently, in the asymptotic regime N ∧ p → ∞ with lim( p/N ) < , E Σ Ψ LRT ( X ) ∼ − Φ (cid:18) z α − d S (Σ , I ) (cid:113) (cid:0) − pN − log (cid:0) − pN (cid:1)(cid:1) (cid:19) . Remark . Note that the right hand side of the above asymptotic expres-sion is bounded below by 1 − Φ( z α ) = α >

0, hence (3.6) along with p → ∞ suﬃce for the above asymptotic equivalence to hold.Below we make some comments on Theorem 3.3: • ( Case of singular

Σ) Rigorously speaking, Theorem 2.1 only holds for thecase where m Σ is ﬁnite, which excludes the case of singular Σ. However,in the latter case, there exists some a ∈ R p such that a (cid:62) X ≡ S is necessarily singular as well, thus rendering the test Ψ LRT ( X )always rejecting the null. • ( Power comparison ) To the best of our knowledge, [CJ18] is the onlywork that contains formal theory on the power behavior of the LRTΨ

LRT ( X ) targeting at general alternatives. A related but diﬀerent LRTwas considered in [OMH13, OMH14] which was targeted at the specialclass of spike alternatives; see more details in Section 5. Comparing to[CJ18, Theorem 1], Theorem 3.3 removes their condition sup n (cid:107) Σ (cid:107) op < ∞ and applies to arbitrary alternatives Σ. • ( Minimax optimality ) It was established in [CM13, Theorem 1] that whenlim( p/n ) < ∞ , the minimax rate of testing (3.3) is (cid:112) p/n under the (cid:107)·(cid:107) F norm. Using the relation [recall (3.5)] d S (Σ , I ) = (cid:80) pj =1 ( λ j − − log λ j ) (cid:38) (cid:80) pj =1 ( λ j − ∧ | λ j − | with { λ j } pj =1 ≥ LRT ( X ) is minimax rate optimal in the asymptotic regime N ∧ p → ∞ with lim( p/N ) <

1; see Sections 3.4 and 5 for more reﬁnedcomparison.3.2.

LRT: simultaneous testing of mean and covariance.

Considerthe following variant of (3.3): H : µ = 0 , Σ = I versus H : H does not hold. (3.7)This is a special case of (1.1) by taking H = { } × { I } , and has previouslybeen studied in [JY13, JQ15, CJ18].We will study the behavior of the LRT for (3.7). Using the vanilla version S ∗ of the sample covariance [recall (3.1)], the log-likelihood ratio statistictakes the form (cf. [Mui82, Theorem 8.5.1]) T ( µ, Σ);LRT ( X ) ≡ n (cid:0) tr( S ∗ ) − log det S ∗ − p + ¯ X (cid:62) ¯ X (cid:1) . (3.8)We will use (cid:0) m ( µ, Σ);LRT , σ ( µ, Σ);LRT , V ( µ, Σ);LRT (cid:1) to represent their generic ver-sions deﬁned in (1.2) and (2.1).The next theorem establishes a quantitative CLT for T ( µ, Σ);LRT ( X ); itsproof is given in Section 9.2. Theorem 3.5.

Suppose p/n ≤ − ε for some ε ∈ (0 , . Then there existssome constant C = C ( ε ) > , such that under H , d TV (cid:18) T ( µ, Σ);LRT ( X ) − m (0 ,I );LRT σ (0 ,I );LRT , N (0 , (cid:19) ≤ Cp .

The following result establishes the ratio control in (2.3) for T ( µ, Σ);LRT ( X );its proof is presented in Section 9.3. Recall that d S (Σ , Σ ) is the Stein lossdeﬁned in (3.5). Proposition 3.6.

Suppose Σ is non-singular. The following hold:(1) V µ, Σ);LRT = n (cid:0) (cid:107) Σ − I (cid:107) F + µ (cid:62) Σ µ (cid:1) .(2) m ( µ, Σ);LRT − m (0 ,I );LRT = ( n/ (cid:0) d S (Σ , I ) + (cid:107) µ (cid:107) (cid:1) . IGH DIMENSIONAL COVARIANCE TESTING 15 (3) In the asymptotic regime n ≥ p + 2 with p → ∞ , σ ,I );LRT ∼ n (cid:20) − pn − − log (cid:18) − pn − (cid:19)(cid:21) . In particular, σ ,I );LRT ≥ cp for some universal constant c > .(4) There exist some universal constant C > such that V ( µ, Σ);LRT | m ( µ, Σ);LRT − m (0 ,I ) | ∨ σ (0 ,I );LRT ≤ Cp / . The above proposition is similar to Proposition 3.2. It should however bementioned that the neat closed-form formula in (2) is available due to thefact we use the vanilla version S ∗ in (3.8).Let Ψ LRT; m ( X ) be the LRT built from the generic test (1.1) and thelog-likelihood ratio statistic T LRT; m . Combining the above results with thegeneric Theorem 2.1, we obtain the following asymptotic power formula forΨ LRT; m ( X ). Theorem 3.7.

Suppose p/n ≤ − ε for some ε ∈ (0 , . Then there existssome constant C = C ( ε, α ) > such that (cid:12)(cid:12)(cid:12)(cid:12) E ( µ, Σ) Ψ LRT; m ( X ) − P (cid:18) N (cid:18) n · (cid:0) d S (Σ , I ) + (cid:107) µ (cid:107) (cid:1) σ (0 ,I ) , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . Consequently, in the asymptotic regime n ∧ p → ∞ with lim( p/n ) < , E ( µ, Σ) Ψ LRT; m ( X ) ∼ − Φ (cid:18) z α − d S (Σ , I ) + (cid:107) µ (cid:107) (cid:113) (cid:0) − pn − − log (cid:0) − pn − (cid:1)(cid:1) (cid:19) . The law of T ( µ, Σ);LRT ( X ) under the alternative is previously derived in[CJ18, Theorem 2] under several regularity conditions on ( µ, Σ). We removethose conditions completely in Theorem 3.7.3.3.

Ledoit-Nagao-Wolf ’s test.

This subsection studies testing (3.3) us-ing the (rescaled) modiﬁed Nagao’s trace statistic [Nag73] by Ledoit andWolf [LW02]: T LNW ( X ) ≡ N (cid:20) tr (cid:0) S − I (cid:1) − N tr ( S ) (cid:21) . (3.9)An asymptotically equivalent statistic as an unbiased estimator of (cid:107) Σ − I (cid:107) F has also been studied in [Sri05]. One advantage of using (3.9) is that itapplies to the case p > n where the LRT in Section 3.1 becomes degenerate.We will use (cid:0) m Σ;LNW , σ

Σ;LNW , V

Σ;LNW (cid:1) to represent their generic versionsdeﬁned in (1.2) and (2.1). The next theorem establishes a quantitative CLTfor T LNW ( X ) under H ; its proof is presented in Section 10.2. Theorem 3.8.

There exists an absolute constant

C > such that under H , d TV (cid:18) T LNW ( X ) − m I ;LNW σ I ;LNW , N (0 , (cid:19) ≤ CN ∧ p . The CLT for T LNW ( X ) was ﬁrst derived in [LW02, Proposition 7] underthe condition that p/N → y ∈ (0 , ∞ ), which was later improved in [BD05,Theorem 3.6] to include the case y ∈ { , ∞} . Here we give explicit errorbounds in the normal approximation.The following result establishes the ratio control (2.3) for T LNW ; its proofis presented in Section 10.3.

Proposition 3.9.

Suppose p/N ≤ M for some M > . Then the followinghold:(1) V ≤ C N (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F for some constant C = C ( M ) > .(2) With Q LNW (Σ) ≡ ( N − − N − ) tr(Σ − I ) , m Σ;LNW − m (0 ,I ) = N (cid:2) (cid:107) Σ − I (cid:107) F + Q LNW (Σ) (cid:3) . (3) In the asymptotic regime N ∧ p → ∞ , σ I ;LNW ∼ p . (4) There exists some constant C = C ( M ) > such that V Σ;LNW | m Σ;LNW − m I ;LNW | ∨ σ I ;LNW ≤ C p / . There are two signiﬁcant structural diﬀerences in the above propositioncompared to Proposition 3.2. First, compared to V Σ;LRT , V Σ;LNW comes withan additional multiplicative factor (cid:107) Σ (cid:107) ∨

1. Second, although a closed-formformula is available for m Σ;LNW , a somewhat undesirable ‘residual term’ Q LNW (Σ) exists. Removing the eﬀect of these terms in the ratio control (4)requires additional technicalities that will be detailed in Section 10.3.We note that the variance formula for σ I ;LNW in the above proposition isparticularly easy to derive from scratch due to the polynomial structure of T LNW ( X ) in (3.9). This result will also be useful in the variance formula forJohn’s test to be studied in Section 4.2.Let Ψ LNW ( X ) be the test built from (1.1) and the statistic in (3.9). Com-bining the above results with Theorem 2.1 and some additional eﬀorts toremove the residual term Q LNW (Σ) in the mean diﬀerence formula (2) inthe above proposition, we have the following asymptotic power formula forΨ

LNW ( X ); see Section 10.4 for its proof. Theorem 3.10.

Suppose p/N ≤ M for some M > . Then there existssome constant C = C ( α, M ) > such that (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ LNW ( X ) − P (cid:18) N (cid:18) N · (cid:107) Σ − I (cid:107) F σ I ;LNW , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . IGH DIMENSIONAL COVARIANCE TESTING 17

Consequently, in the asymptotic regime N ∧ p → ∞ with lim( p/N ) < ∞ , E Σ Ψ LNW ( X ) ∼ − Φ (cid:18) z α − (cid:107) Σ − I (cid:107) F p/N ) (cid:19) . The asymptotic behavior of T LNW under the alternative is previously onlyknown in [Sri05, Theorem 4.1] under rather restrictive conditions on both Σand growth of p . Theorem 3.10 only requires p/N to be bounded and makesno assumptions on Σ.3.4. Cai-Ma’s test.

In this subsection, we assume additionally that thedata matrix X is centered, i.e., X , . . . , X n are i.i.d. from N (0 , Σ). Considertesting (3.3) using the (rescaled) U-statistic by Cai and Ma [CM13]: T CM ( X ) ≡ n − (cid:88) ≤ i n .We will use (cid:0) m Σ;CM , σ

Σ;CM , V

Σ;CM (cid:1) to represent their generic versionsdeﬁned in (1.2) and (2.1). The next theorem establishes a quantitative CLTfor T CM ( X ) under H ; its proof is given in Section 11.2. Theorem 3.11.

There exists some absolute constant

C > such that under H , d TV (cid:18) T CM ( X ) σ I ;CM , N (0 , (cid:19) ≤ C · (cid:18) log nn (cid:95) p (cid:19) . The above theorem improves [CM13, Proposition 3] under the null in twodirections: (1) the above Berry-Esseen bound holds in the stronger totalvariation distance, and (2) the normal approximation rate is improved from( n ∧ p ) − / to ( n ∧ p ) − (ignoring the logarithmic factor).The following result establishes the ratio control (2.3) for T CM ; its proofis given in Section 11.3. Proposition 3.12.

Let y = p/n . Then the following hold:(1) There exists some absolute constant C > such that V ≤ C (cid:2) (1 ∨ y ) + ( y ∨ y ) log n (cid:3) · n (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F . (2) m Σ;CM − m I ;CM = ( n/ (cid:107) Σ − I (cid:107) F .(3) σ I ;CM = ( p + p ) · n/ ( n − , so in the asymptotic regime n ∧ p → ∞ , σ I ;CM ∼ p . (4) There exists some absolute constant C > such that V Σ;CM | m Σ;CM − m I ;CM | ∨ σ I ;CM ≤ C (cid:2) (1 ∨ y ) / + y (1 ∨ y ) / log n (cid:3) ( n ∧ p ) / . Remark . We keep the ratio term y = p/n in the above result (in par-ticular (4)) to cope with the additional log n factor, so we may remove lowerbound conditions for p .Let Ψ CM ( X ) be the test built from (1.1) and the statistic in (3.10). Com-bining the above results with the generic Theorem 2.1, we obtain the fol-lowing asymptotic power formula for Ψ CM ( X ); some details are provided inSection 11.4. Theorem 3.14.

Suppose p/n ≤ M for some M > . Then there existssome constant C = C ( α, M ) > such that (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ CM ( X ) − P (cid:18) N (cid:18) n (cid:107) Σ − I (cid:107) F σ I ;CM , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C (cid:18) log / nn / (cid:95) p / (cid:19) . Consequently, in the asymptotic regime n ∧ p → ∞ with lim( p/n ) < ∞ , E Σ Ψ CM ( X ) ∼ − Φ (cid:18) z α − (cid:107) Σ − I (cid:107) F p/n ) (cid:19) . [CM13, Equation (27)] proved that the exact power expansion above holdsuniformly for all alternatives Σ such that b (cid:112) p/n ≤ (cid:107) Σ − I (cid:107) F ≤ B (cid:112) p/n forsome ﬁxed 0 < b ≤ B < ∞ . Their proof is based on a Berry-Esseen bound(cf. [CM13, Proposition 3]) for the CLT of T CM ( X ) under general alter-natives using the test-speciﬁc martingale diﬀerence representation. Theo-rem 3.14 holds uniformly for all possible alternatives Σ when additionally p/n ≤ M , using the general method of Theorem 2.1.4. Testing sphericity

Σ = λI Likelihood ratio test.

Consider the testing problem: H : Σ = λI versus H : H does not hold (4.1)for some un-speciﬁed λ >

0. This is a special case of (1.1) by taking H = R p × { λI : λ > } , and has been extensively studied previously in [LW02,Sri05, CZZ10, JJY12, JY13].This subsection studies the LRT for (4.1). The (re-scaled) log-likelihoodratio statistic for (4.1) is deﬁned by (cf. [Mui82, Theorem 8.3.2]): T LRT ,s ( X ) ≡ N (cid:0) p log tr( S ) − log det S − p log p (cid:1) . (4.2) IGH DIMENSIONAL COVARIANCE TESTING 19

Evidently, the law of T LRT ,s ( X ) does not depend on the λ in (4.1) and henceis invariant under H . Thus the general principle in Theorem 2.1 appliesdue to regularity of T LRT; s (see Appendix B). Moreover, due to the trans-lation invariance of T LRT ,s ( X ), we will use (cid:0) m Σ;LRT ,s , σ Σ;LRT ,s , V Σ;LRT ,s (cid:1) torepresent their generic versions deﬁned in (1.2) and (2.1).For a symmetric p × p matrix M , let b (cid:96) ( M ) ≡ p − tr( M (cid:96) ) , b ( M ) ≡ b ( M ) . (4.3)The next theorem establishes a quantitative CLT for T LRT ,s ( X ); its proofis presented in Section 12.2. Recall that T LRT ,s is only non-degenerate if p ≤ n − N . Theorem 4.1.

Suppose p/N ≤ − ε for some ε ∈ (0 , . Then there existssome C = C ( ε ) > such that under H , d TV (cid:18) T LRT ,s ( X ) − m I ;LRT ,s σ I ;LRT ,s , N (0 , (cid:19) ≤ Cp .

The CLT for T LRT ,s ( X ) was previously derived in [JY13, Theorem 1]under the asymptotics y ∈ (0 , p to grow proportionally to N but excludes the boundary case y = 1.The following result establishes the ratio control (2.3) for T LRT ,s ; seeSection 12.3 for its proof. Proposition 4.2.

Suppose Σ is non-singular. The following hold:(1) There exists some absolute constant C > such that V ,s ≤ C N (cid:107) Σ · b − (Σ) − I (cid:107) F holds for N, p large enough.(2) The mean diﬀerence is given by m Σ;LRT ,s − m I ;LRT ,s = N (cid:2) − log det(Σ · b − (Σ)) + Q LRT ,s (Σ · b − (Σ)) (cid:3) . Here (cid:12)(cid:12) Q LRT ,s (cid:0) Σ · b − (Σ) (cid:1)(cid:12)(cid:12) ≤ C N − b (cid:2)(cid:0) Σ · b − (Σ) (cid:1) (cid:3) (4.4) for some absolute constant C > .(3) In the asymptotic regime N ∧ p → ∞ with lim( p/N ) < , σ I ;LRT ,s ∼ N (cid:20) − pN − log (cid:18) − pN (cid:19)(cid:21) . (4) There exists some absolute constant C > such that V Σ;LRT ,s | m Σ;LRT ,s − m I ;LRT ,s | ∨ σ I ;LRT ,s ≤ C ( σ I ;LRT ,s ∧ N ) / . There is a genuine diﬀerence between the above ratio control result andthe previous ones studied in Section 3, in that a closed-form formula for themean diﬀerence m Σ;LRT ,s − m I ;LRT ,s is no longer available. One therefore hasto work with strong enough upper bounds for the ‘residual term’ Q LRT ,s (Σ · b − (Σ)), the removal of which constitutes the main technicalities in theproofs; see Section 12.3 for details.Let Ψ LRT ,s ( X ) be the test built from (1.1) and the statistic in (4.2). Com-bining the above results with Theorem 2.1 and some additional eﬀorts toremove the residual term Q LRT ,s (Σ · b − (Σ)), we have the following asymp-totic power formula for Ψ LRT ,s ( X ); see Section 12.4 for its proof. Theorem 4.3.

Suppose p/N ≤ − ε for some ε ∈ (0 , . Then there existssome constant C = C ( ε, α ) > such that (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ LRT ,s ( X ) − P (cid:18) N (cid:18) − N log det (cid:0) Σ · b − (Σ) (cid:1) σ I ; s , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . Consequently, in the asymptotic regime N ∧ p → ∞ with lim( p/N ) < , E Σ Ψ LRT ,s ( X ) ∼ − Φ (cid:18) z α − − log det (cid:0) Σ · b − (Σ) (cid:1)(cid:113) (cid:0) − pN − log (cid:0) − pN (cid:1)(cid:1) (cid:19) . To the best of our knowledge, in the high dimensional regime N ∧ p → ∞ ,the LRT for (4.1) was only studied in [JY13, JQ15], where formal theorywas missing on the power behavior of Ψ LRT ,s . Theorem 4.3 ﬁlls this gap.4.2. John’s test.

Consider testing (4.1) using the (rescaled) John’s tracestatistic [Joh71]: T J ( X ) ≡ N (cid:20)(cid:18) Sp − tr( S ) − I (cid:19) (cid:21) . (4.5)Clearly the law of T J ( X ) is invariant under H , and the above statisticis non-degenerate for all conﬁgurations of ( n, p ). The general principle inTheorem 2.1 thereby applies in view of the regularity of T J (see AppendixB). We will use (cid:0) m Σ;J , σ

Σ;J , V

Σ;J (cid:1) to represent their generic versions deﬁnedin (1.2) and (2.1).The next theorem establishes a quantitative CLT for T J ( X ) under H ; itsproof is given in Section 13.2. Theorem 4.4.

There exists some absolute constant

C > , such that under H , d TV (cid:18) T J ( X ) − m I ;J σ I ;J , N (0 , (cid:19) ≤ CN ∧ p . Central limit theorems for T J ( X ) under H in high dimensions are ﬁrst ob-tained in [LW02]. We improve these results both in terms of non-asymptoticnormal approximation bound and the removal of the condition 0 < lim( p/N ) ≤ lim( p/N ) < ∞ .The following result establishes the ratio control (2.3) for T J ; its proof ispresented in Section 13.3. Recall the deﬁnition of b (Σ) in (4.3). Proposition 4.5.

Suppose p/N ≤ M for some M > . Then the followinghold for N larger than a big enough absolute constant: IGH DIMENSIONAL COVARIANCE TESTING 21 (1) There exists some constant C = C ( M ) > such that V ≤ C · N (cid:0) (cid:107) Σ · b − (Σ) (cid:107) ∨ (cid:1) (cid:107) Σ · b − (Σ) − I (cid:107) F . (2) The mean diﬀerence is given by m Σ;J − m I ;J = N (cid:2) (cid:107) Σ · b − (Σ) − I (cid:107) F + Q J (cid:0) Σ · b − (Σ) (cid:1)(cid:3) . Here (cid:12)(cid:12) Q J (cid:0) Σ · b − (Σ) (cid:1)(cid:12)(cid:12) ≤ C · N − / (cid:0) p − (cid:107) Σ · b − (Σ) (cid:107) F + 1 (cid:1) (cid:107) Σ · b − (Σ) − I (cid:107) F for some C = C ( M ) > .(3) In the asymptotic regime N ∧ p → ∞ , σ I ;J ∼ p . (4) There exists some C = C ( M ) > such that V Σ;J | m Σ;J − m I ;J | ∨ σ I ;J ≤ C p / . The proof of the above ratio control result is the most complicated amongresults of this type studied in this paper. The main complication is due tothe existence of the tr( S ) term in the denominator in (4.5), which leads tothe complications both in the control of V J and the ‘residual term’ Q J (Σ · b − (Σ)). On the other hand, similar to Proposition 4.2-(3), the asymptoticformula for σ I ; J in the above proposition also removes the condition 0 < lim( p/N ) ≤ lim( p/N ) < ∞ that is required in [LW02, Proposition 3], via acomparison to σ I ;LNW studied in Proposition 3.9-(3).Let Ψ J ( X ) be the test built from (1.1) and the statistic in (4.5). Com-bining the above results with Theorem 2.1 and some additional eﬀorts toremove the residual term Q J (Σ · b − (Σ)), we have the following asymptoticpower formula for Ψ J ( X ); see Section 13.4 for its proof. Theorem 4.6.

Suppose p/N ≤ M for some M > . Then there exists someconstant C = C ( α, M ) > such that (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ J − P (cid:18) N (cid:18) N · (cid:107) Σ · b − (Σ) − I (cid:107) F σ I ;J , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . Consequently, in the asymptotic regime N ∧ p → ∞ with lim( p/N ) < ∞ , E Σ Ψ J ∼ − Φ (cid:18) z α − (cid:107) Σ · b − (Σ) − I (cid:107) F p/N ) (cid:19) . The power behavior for John’s test is previous studied in [OMH13, OMH14,WY13] for a special class alternatives under the spiked covariance modelwith a ﬁxed number of spikes; see Section 5 ahead for a detailed discussion.To the best of our knowledge, the theorem above gives the ﬁrst completecharacterization of the power behavior for John’s test for arbitrary alterna-tives in the high-dimensional regime N ∧ p → ∞ with lim( p/N ) < ∞ . Spiked covariance models

In this section, we consider a special class of alternatives known as thespiked covariance model [Joh01]:Σ( a ) = diag (cid:0) a , . . . , a p (cid:1) , (5.1)where a = ( a , . . . , a p ) ∈ ( − , ∞ ) p . Write ¯ a = (cid:80) pj =1 a j /p . Specializing theresults obtained in Sections 3 and 4, we have the following. Corollary 5.1.

The following hold.(1) The power for the likelihood ratio test of

Σ = I satisﬁes E Σ( a ) Ψ LRT ∼ − Φ (cid:18) z α − (cid:80) pj =1 (cid:0) a j − log(1 + a j ) (cid:1)(cid:113) (cid:0) − pN − log (cid:0) − pN (cid:1)(cid:1) (cid:19) ≡ β LRT ( a ) , under N ∧ p → ∞ with lim( p/N ) < .(2) The powers for Ledoit-Nagao-Wolf and Cai-Ma’s tests of Σ = I satisfy E Σ( a ) Ψ LNW ∼ E Σ( a ) Ψ CM ∼ − Φ (cid:18) z α − (cid:80) pj =1 a j p/N ) (cid:19) ≡ β LNW , CM ( a ) , under N ∧ p → ∞ with lim( p/N ) < ∞ .(3) The power for the likelihood ratio test of Σ = λI satisﬁes E Σ( a ) Ψ LRT; s ∼ − Φ (cid:18) z α − (cid:80) pj =1 log a a j (cid:113) (cid:0) − pN − log (cid:0) − pN (cid:1)(cid:1) (cid:19) ≡ β LRT; s ( a ) , under N ∧ p → ∞ with lim( p/N ) < .(4) The power for John’s test of Σ = λI satisﬁes E Σ( a ) Ψ J ∼ − Φ (cid:18) z α − (cid:80) pj =1 ( a j − ¯ a ) / (1 + ¯ a ) p/N ) (cid:19) ≡ β J ( a ) , under N ∧ p → ∞ with lim( p/N ) < ∞ . (1), (2) and (4) above recover [OMH14, Proposition 8 (i)-(ii)], while (3)-(4) above recover [WY13, Equations (4.5) and (4.8)]. Both [OMH13, WY13]considered the case where r ≡ (cid:107) a (cid:107) and the non-zero elements of a are ﬁxed.The techniques in [OMH14] work with a further restriction (cid:107) a (cid:107) ∞ < √ y where y is the limiting value of the ratio p/N . This restriction coincideswith the Baik-Ben Arous-P´ech´e (BBP) phase transition [BBAP05], and isessential for the techniques of [OMH14], due to the singular nature of thelikelihood ratio process when (cid:107) a (cid:107) ∞ > √ y already in the case r = 1, see[OMH13, Theorem 8]. The restriction (cid:107) a (cid:107) ∞ < √ y is removed in [WY13]for the likelihood ratio test Ψ LRT; s and John’s test Ψ J for sphericity, byvariations of Bai-Silverstein techniques developed in [BS04, BJYZ09]. Bothresults in [WY13] and [OMH14] also hold beyond Gaussian distributions. IGH DIMENSIONAL COVARIANCE TESTING 23

It is easy to see that in the setting of [OMH13, WY13] with a ﬁxednumber of spikes as described above, the asymptotic powers are the samefor the following two group of tests:(1) Likelihood ratio tests Ψ

LRT , Ψ LRT; s : β LRT = β LRT; s .(2) Ledoit-Nagao-Wolf, Cai-Ma and John’s tests: β LNW , CM = β J .Clearly, neither group of tests universally dominates the other in terms ofthe power behavior. For instance, the power of tests in (1) dominates thatof (2) when some of a j ’s are close to − a j ’s are close to ∞ .In general, the asymptotic power equivalence of the above two groupsmay not hold when the number of spikes are no longer ﬁxed. Instead, wehave the following power ordering within each group. Corollary 5.2. (1) Likelihood ratio tests Ψ LRT , Ψ LRT; s have the power or-dering: β LRT ( a ) ≥ β LRT; s ( a ) . (2) Ledoit-Nagao-Wolf, Cai-Ma and John’s tests Ψ LNW , Ψ CM , Ψ J have thepower ordering: β LNW , CM ( a ) (cid:40) ≥ β J ( a ) , a (cid:0) − (1 + ¯ a ) (cid:1) ≤ ¯ a ; < β J ( a ) , a (cid:0) − (1 + ¯ a ) (cid:1) > ¯ a . Here a ≡ (cid:80) pj =1 a j /p .Proof. (1) follows from the inequality (cid:80) pj =1 log(1 + ¯ a ) ≤ (cid:80) pj =1 ¯ a = (cid:80) pj =1 a j .(2) follows by the following calculation: (cid:80) pj =1 ( a j − ¯ a ) (1 + ¯ a ) = (cid:80) pj =1 ( a j − ¯ a )(1 + ¯ a ) = p (cid:88) j =1 a j + (cid:80) pj =1 a j · (cid:0) − (1 + ¯ a ) (cid:1) − p ¯ a (1 + ¯ a ) . The proof is complete. (cid:3)

Note that { ¯ a ≥ } (cid:40) { a (cid:0) − (1 + ¯ a ) (cid:1) ≤ ¯ a } (the inclusion is in factproper), so if ¯ a ≥

0, John’s test Ψ J will be less powerful than Ledoit-Nagao-Wolf and Cai-Ma’s tests Ψ LNW , Ψ CM . Furthermore, both inequalities in theabove corollary can be strict asymptotically, and similar to the discussionabove, there are no universal power dominance relationships between thetests in the two groups.5.1. An illustrative simulation study.

Below we present some simula-tion results in the current spiked model setting to support the ﬁndings ofCorollary 5.1 and 5.2. The conﬁdence level will be taken to be α = 0 . n, p ) = (300 , . . . . . . Fixed spikes (near singular)

Tau P o w e r LRT.identityNagaoCai−MaLRT.sphericityJohn . . . . . . Fixed spikes (non−singular)

Tau P o w e r LRT.identityNagaoCai−MaLRT.sphericityJohn . . . . . . Growing spikes

Tau P o w e r LRT.identityNagaoCai−MaLRT.sphericityJohn

Figure 1.

The power curves for the tests { Ψ LRT , Ψ LNW , Ψ CM , Ψ LRT; s , Ψ J } in the spiked covariancemodel (5.1). τ on the x -axis indexes the spike magnitude,with τ = 1 corresponding to the null case Σ = I . The dashedhorizontal line marks the prescribed size α = 0 .

05. Notethat the red solid line for Ψ

LNW is not visible as it coincideswith the orange one for Ψ CM predicted by Corollary 5.1-(2). • ( Fixed number of spikes, near singular

Σ) In the top left panel of Figure1, we ﬁx a number of r = 5 spikes with the same magnitude a j = − τ − with τ ∈ { , , . . . , } . Note that τ = 1 corresponds to the null hypoth-esis with no spikes, and as τ grows Σ becomes increasingly singular. Aspredicted by Corollary 5.1 and the discussion thereafter, in this case ofnear singular alternatives Σ, the power of the tests in the ﬁrst group { Ψ LRT , Ψ LRT; s } dominates that of the second group { Ψ LNW , Ψ CM , Ψ J } . • ( Fixed number of spikes, non-singular

Σ) In the top right panel of Figure1, we again ﬁx a number of r = 5 spikes with the same magnitude a j = IGH DIMENSIONAL COVARIANCE TESTING 25 . τ −

1) with τ ∈ { , , . . . , } . Again τ = 1 corresponds to the nullcase. Compared to the previous case, we have the same power groupingeﬀect but the dominance is reversed. • ( Growing number of spikes ) In the bottom panel of Figure 1, we ﬁx anumber of r = 50 spikes with the same magnitude a j = 0 . τ −

1) with τ ∈ [10]. This exempliﬁes the case of a growing number of spikes where,as predicted by Corollary 5.2, we see: (i) The power of LRT Ψ LRT fortesting identity dominates its counterpart Ψ

LRT; s for testing sphericity;(ii) the powers of the { Ψ LNW , Ψ CM } and Ψ J are no longer equivalent,with the former having a strictly larger power as ¯ a > Concluding remarks

In this paper, we develop a general method for power analysis in high di-mensional covariance testing problems when a CLT holds for the test statis-tic under the null. We apply the new method to a number of tests in twoprototypical problems of testing identity Σ = I and sphericity Σ = λI withunspeciﬁed λ >

0. The key technical step is to control the ratio (2.3) whichtypically requires case-speciﬁc techniques. A strong enough control of theratio (2.3), as demonstrated in many examples in the paper, leads to a sharpasymptotic power expansion of the test that holds for arbitrary alternatives.If normal approximation of the test statistic under the null can be quantiﬁednon-asymptotically, then the full ﬁnite-sample strength of our method canbe utilized. For the tests studied in this paper, this is achieved via non-trivial applications of Chatterjee’s second-order Poincar´e inequality [Cha09](see Lemma A.1).Below we sketch some directions for future research:(1) (

Upper limit condition on ( n, p )) Other than the minimal condition n ∧ p → ∞ , two additional conditions on ( n, p ) made in this paper are: (i) inresults for the LRT (Sections 3.1, 3.2, 4.1), we assume that lim( p/n ) < p/n →

1; (ii) for the asymptotic powerformulae of all other test statistics (Theorems 3.10, 3.14, 4.6), we requirethat lim( p/n ) < ∞ . Some degree of relaxation is certainly possible,but it remains an open question to obtain the best possible upper limitcondition for p/n under which the power expansion of these tests is stillvalid.(2) ( Gaussian assumption ) Throughout the paper we have worked with Gauss-ian observations to obtain complete characterizations for the power be-havior of various tests. Gaussianity enters at a technical level via theGaussian-Poincar´e inequality, the second-order Poincar´e inequality [Cha09],Fourier expansion in the Gaussian space [NP12]. It is of great interestto extend our results to non-Gaussian observations. The main technicalhurdles seem however to be sharp estimates for intermediate terms like V ( µ, Σ) , | m ( µ, Σ) − m H | , and σ H . (3) ( Other testing problems ) The general method developed in Section 2could potentially be applied to other testing problems, including butnot limited to, (block) independence test, two- and multi-sample (joint)testing of µ and Σ, testing of regression coeﬃcients in multivariate lin-ear regression. These problems require further technical works and willtherefore be pursued elsewhere.7. Some spectral estimates

First we introduce some convention on notation: Let I , I be ﬁnite indexsets. For A = ( A ι ,ι ) ι ∈I ,ι ∈I ∈ R I ×I , its operator norm is deﬁned as (cid:107) A (cid:107) op ≡ sup v ∈ B I (1) (cid:107) Av (cid:107) (cid:96) ( R I ) . (7.1)It can be readily veriﬁed that (cid:107) A (cid:107) op = sup u ∈ B I ,v ∈ B I (cid:104) u, Av (cid:105) I , and for asymmetric matrix A ∈ R I ×I , (cid:107) A (cid:107) op = sup u ∈ B I |(cid:104) u, Au (cid:105) I | . Here (cid:104)· , ·(cid:105) I isthe standard inner product on R I . Clearly, the deﬁnition of the operatornorm does not depend on the choice of the ordering of the index sets.Under this notational convention, with the index set Λ ≡ { ( ij ) : i ∈ [ N ] , j ∈ [ p ] } , we present below two results on the spectral norm of somespecial Λ × Λ matrices that are crucial to the proof of the quantitativecentral limit theorems. We do not specify a particular ordering on Λ aswe will be only interested in the operator norm as deﬁned above. In thefollowing we use N to denote the set of natural numbers. Recall the datamatrix X = [ X , . . . , X N ] (cid:62) ∈ R N × p and the deﬁnition of S in (3.1). Proposition 7.1. (1) Suppose p/N ≤ − ε for some ε > . For (cid:96), m ∈ N such that (cid:96) + m ≥ , let U (cid:96),m ∈ R Λ × Λ be deﬁned by ( U (cid:96),m ) ( ij ) , ( i (cid:48) j (cid:48) ) ≡ N − X (cid:62) i S − (cid:96) X i (cid:48) ( S − m ) jj (cid:48) . (7.2) Then for any q ∈ N , there exists some C = C ( ε, (cid:96), m, q ) > such that E (cid:107) U (cid:96),m (cid:107) q op ≤ C for p ≥ C .(2) When X i , S and N is replaced by X i − ¯ X , S ∗ and n , the conclusion of(1) still holds. When the inverse S − in (7.2) is replaced by S , the condition on p/N canbe substantially relaxed. Proposition 7.2.

Let y ≡ p/N . For (cid:96), m ∈ N , let U (cid:96),m ;+ ∈ R Λ × Λ be deﬁnedby ( U (cid:96),m ;+ ) ( ij ) , ( i (cid:48) j (cid:48) ) ≡ N − X (cid:62) i S (cid:96) X i (cid:48) ( S m ) jj (cid:48) . (7.3) Then for any q ∈ N , there exists some C = C ( (cid:96), m, q ) > such that E (cid:107) U (cid:96),m ;+ (cid:107) q op ≤ C ( √ y ∨ y ) q ( (cid:96) + m +1) . The proof of Proposition 7.1 relies crucially on the following stable mo-ment estimate for (cid:107) S − (cid:107) op . Its proof utilizes two main technical tools: (i)rigidity estimates on the eigenvalues of the sample covariance matrix (cf. IGH DIMENSIONAL COVARIANCE TESTING 27 [PY14]); (ii) closed form distributional formula of sample eigenvalues viazonal polynomials [Mui82, Chapter 9.7]. Details are presented in AppendixC.

Lemma 7.3.

Let S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i where Z i ’s are i.i.d. N (0 , I ) in R p .Suppose p/N ≤ − ε for some ﬁxed ε > and every N, p ≥ . Then for anypositive integer q ≤ ( N − p − / , we have E (cid:107) S − Z (cid:107) q op ≤ C for some positive C = C ( ε, q ) . The following corollary of the Koltchinskii-Lounici theorem [KL17] willalso be repeatedly used.

Lemma 7.4.

Let S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i where Z i ’s are i.i.d. N (0 , I ) in R p .Then for any positive integer q , there exists some positive C = C ( q ) suchthat E (cid:107) S Z − I (cid:107) q op ≤ C · (cid:16)(cid:114) pN (cid:95) pN (cid:17) q . Proof.

This is a direct consequence of [KL17, Corollary 2]. (cid:3)

Now we prove Propositions 7.1 and 7.2.

Proof of Proposition 7.1.

We only prove (1); claim (2) follows from com-pletely same arguments by noting that (7.4) and (7.6) below still holdwith the prescribed substitution. Note that U (cid:96),m is symmetric in that( U (cid:96),m ) ( ij )( i (cid:48) j (cid:48) ) = ( U (cid:96),m ) ( i (cid:48) j (cid:48) )( ij ) , and satisﬁes that for any non-negative in-tegers ( (cid:96) , m ) and ( (cid:96) , m ) such that ( (cid:96) + m ) ∧ ( (cid:96) + m ) ≥ U (cid:96) ,m · U (cid:96) ,m ) ( ij ) , ( i (cid:48) j (cid:48) ) = N − (cid:88) (¯ i ¯ j ) X (cid:62) i S − (cid:96) X ¯ i ( S − m ) j ¯ j X (cid:62) ¯ i S − (cid:96) X i (cid:48) ( S − m ) ¯ jj (cid:48) = N − X (cid:62) i S − (cid:96) (cid:18) (cid:88) ¯ i X ¯ i X (cid:62) ¯ i (cid:19) S − (cid:96) X i (cid:48) · (cid:18) (cid:88) ¯ j ( S − m ) j ¯ j ( S − m ) ¯ jj (cid:48) (cid:19) = N − X (cid:62) i S − ( (cid:96) + (cid:96) − X i (cid:48) ( S − ( m + m ) ) jj (cid:48) = ( U (cid:96) + (cid:96) − ,m + m ) ( ij ) , ( i (cid:48) j (cid:48) ) . (7.4)Consequently the above argument entails that for any q ∈ N , ( U (cid:96),m ) q = U (cid:96) (cid:48) ,m (cid:48) with (cid:96) (cid:48) ≡ (cid:96) (cid:48) ( q ) ≡ q ( (cid:96) −

1) + 1 , m (cid:48) ≡ m (cid:48) ( q ) ≡ qm. (7.5)Using that (cid:107) U (cid:96),m (cid:107) op = sup u ∈ B N × p (1) | (cid:80) ( ij )( i (cid:48) j (cid:48) ) u ij ( U (cid:96),m ) ( ij ) , ( i (cid:48) j (cid:48) ) u i (cid:48) j (cid:48) | , we have (cid:107) U (cid:96),m (cid:107) op = sup u ∈ B N × p (1) N − (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i,i (cid:48) ,j,j (cid:48) X (cid:62) i S − (cid:96) X i (cid:48) ( S − m ) jj (cid:48) u ij u i (cid:48) j (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = N − sup u ∈ B N × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i,i (cid:48) X (cid:62) i S − (cid:96) X i (cid:48) · (cid:88) jj (cid:48) u ij · ( S − m ) jj (cid:48) u i (cid:48) j (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = N − sup u ∈ B N × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i,i (cid:48) X (cid:62) i S − (cid:96) X i (cid:48) · (cid:104) u · S − m · u (cid:62) (cid:105) ii (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) . As the ( i, i (cid:48) )-th entry of X (cid:62) S − (cid:96) X is X (cid:62) i S − (cid:96) X i (cid:48) and that tr( AB ) = (cid:80) Ni,i (cid:48) =1 A ii (cid:48) B ii (cid:48) for two symmetric matrices in R N × N , we have (cid:107) U (cid:96),m (cid:107) op = N − sup u ∈ B N × p (1) (cid:12)(cid:12) tr (cid:2) XS − (cid:96) X (cid:62) · uS − m u (cid:62) (cid:3)(cid:12)(cid:12) = N − sup u ∈ B N × p (1) (cid:12)(cid:12) tr (cid:2) ( u (cid:62) X ) S − (cid:96) ( X (cid:62) u ) · S − m (cid:3)(cid:12)(cid:12) . Further using twice the fact that tr( AB ) ≤ tr( A ) (cid:107) B (cid:107) op for any two p.s.d.and symmetric matrices A, B , we arrive at (cid:107) U (cid:96),m (cid:107) op ≤ N − (cid:107) S − m (cid:107) op (cid:107) S − (cid:96) (cid:107) op · sup u ∈ B N × p (1) (cid:12)(cid:12) tr (cid:0) XX (cid:62) uu (cid:62) (cid:1)(cid:12)(cid:12) ≤ (cid:107) S − ( (cid:96) + m ) (cid:107) op · N − (cid:107) XX (cid:62) (cid:107) op · sup u ∈ B N × p (1) tr( uu (cid:62) )= (cid:107) S − ( (cid:96) + m ) (cid:107) op · (cid:107) S (cid:107) op = (cid:107) S − (cid:107) (cid:96) + m − . (7.6)Hence for any q ∈ N , equations (7.4)-(7.6) and Lemma 7.3 entail that E (cid:107) U (cid:96),m (cid:107) q op = E (cid:107) U q(cid:96),m (cid:107) op = E (cid:107) U (cid:96) (cid:48) ,m (cid:48) (cid:107) op ≤ E (cid:107) S − (cid:107) (cid:96) (cid:48) + m (cid:48) − = E (cid:107) S − (cid:107) q ( (cid:96) + m − ≤ C (cid:96),m,q , completing the proof. (cid:3) Proof of Proposition 7.2.

The proof largely follows that of Proposition 7.1with modiﬁcations. We sketch the diﬀerence below. Using the same calcu-lations as in (7.4), we have( U (cid:96) ,m ;+ · U (cid:96) ,m ;+ ) ( ij ) , ( i (cid:48) j (cid:48) ) = N − (cid:88) (¯ i ¯ j ) X (cid:62) i S (cid:96) X ¯ i ( S m ) j ¯ j X (cid:62) ¯ i S (cid:96) X i (cid:48) ( S m ) ¯ jj (cid:48) = N − X (cid:62) i S (cid:96) (cid:18) (cid:88) ¯ i X ¯ i X (cid:62) ¯ i (cid:19) S (cid:96) X i (cid:48) · (cid:18) (cid:88) ¯ j ( S m ) j ¯ j ( S m ) ¯ jj (cid:48) (cid:19) = N − X (cid:62) i S (cid:96) + (cid:96) +1 X i (cid:48) ( S m + m ) jj (cid:48) = ( U (cid:96) + (cid:96) +1 ,m + m ;+ ) ( ij ) , ( i (cid:48) j (cid:48) ) . Hence for any q ∈ N , ( U (cid:96),m ;+ ) q = U (cid:96) (cid:48) ,m (cid:48) ;+ with (cid:96) (cid:48) now deﬁned by (cid:96) (cid:48) ≡ (cid:96) (cid:48) ( q ) ≡ (cid:96) + ( q − (cid:96) + 1) and m (cid:48) = qm remains the same as in (7.5).Then using the same arguments as in (7.6), we have (cid:107) U (cid:96),m ;+ (cid:107) op ≤ (cid:107) S (cid:107) (cid:96) + m +1op ,hence for any q ∈ N , E (cid:107) U (cid:96),m ;+ (cid:107) q op ≤ E (cid:107) S (cid:107) (cid:96) (cid:48) + m (cid:48) +1op (cid:46) (cid:96),m,q ( √ y ∨ y ) q ( (cid:96) + m +1) , where the last inequality follows by Lemma 7.4 and the fact that (cid:96) (cid:48) + m (cid:48) +1 = (cid:96) + ( q − (cid:96) + 1) + qm + 1 = q ( (cid:96) + m + 1). (cid:3) IGH DIMENSIONAL COVARIANCE TESTING 29 Proofs for Section 3.1

Recall Λ = { ( ij ) : i ∈ [ N ] , j ∈ [ p ] } . In the following sections, for asuﬃciently smooth function T : R Λ → R , its gradient ∇ T : R Λ → R Λ andHessian ∇ T : R Λ → R Λ × Λ are deﬁned respectively by (cid:0) ∇ T ( x ) (cid:1) ( ij ) ≡ ∂T∂x ( ij ) ( x ) and (cid:0) ∇ T ( x ) (cid:1) ( ij ) , ( i (cid:48) j (cid:48) ) ≡ ∂ T∂x ( ij ) ∂x ( i (cid:48) j (cid:48) ) ( x ) , with x = ( x ( ij ) ) ∈ R Λ . Slightly abusing notation, we using (cid:107)∇ T ( x ) (cid:107) F ≡(cid:107)∇ T ( x ) (cid:107) (cid:96) ( R Λ ) . The operator norm (cid:107)∇ T ( x ) (cid:107) op is deﬁned in (7.1).8.1. Evaluation of derivatives.

In the following, we use { e j } pj =1 to rep-resent the canonical basis in R p . Let δ ij be the Kronecker delta. Lemma 8.1.

Recall the form of T LRT ( X ) in (3.4). We assume without lossof generality that µ = 0 . Then for any ( i, j ) , ( i (cid:48) , j (cid:48) ) ∈ [ N ] × [ p ] ,(1) (cid:0) ∇ T LRT ( X ) (cid:1) ( ij ) = (cid:0) X ( I − S − ) (cid:1) ( ij ) = e (cid:62) j ( I − S − ) X i .(2) (cid:0) ∇ T LRT ( X ) (cid:1) ( ij ) , ( i (cid:48) j (cid:48) ) = N − X (cid:62) i S − ( e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) ) S − e j + δ ii (cid:48) e (cid:62) j ( I − S − ) e j (cid:48) .Proof. We use T as a shorthand for T LRT .(1). By deﬁnition, we have ∂ ( ij ) T ( X ) = N (cid:0) ∂ ( ij ) tr( S ) − ∂ ( ij ) log det S (cid:1) . For the ﬁrst partial derivative, using ∂ ( ij ) X k = δ ik e j , we have ∂ ( ij ) tr( S ( X )) = N − (cid:88) k ∂∂X ij tr( X k X (cid:62) k )= N − (cid:88) k δ ik tr (cid:2) e j X (cid:62) k + X k e (cid:62) j (cid:3) = N − (cid:88) k δ ik · X kj = 2 N − X ij . (8.1)For the second partial derivative, using the well-known fact that ∇ log det A = A − for any invertible and symmetric matrix A (see e.g., [BV04, SectionA.4.1]), we have ∂ ( ij ) log det S = (cid:88) k,(cid:96) ∂ log det S∂S k(cid:96) ∂S k(cid:96) ∂X ij ( ∗ ) = (cid:88) k,(cid:96) ( S − ) k(cid:96) · N ( δ jk X i(cid:96) + δ j(cid:96) X ik )= 1 N (cid:88) (cid:96) ( S − ) j(cid:96) X i(cid:96) + (cid:88) k ( S − ) kj X ik = 2 N ( XS − ) ij , (8.2) where in ( ∗ ), we use ∂S k(cid:96) ∂X ij = 1 N ∂∂X ij (cid:104) Xe k , Xe (cid:96) (cid:105) = 1 N ( δ jk X i(cid:96) + δ j(cid:96) X ik ) . (8.3)Combining (8.1) and (8.2) yields the ﬁrst claim.(2). Again by deﬁnition, we have ∂ ( ij )( i (cid:48) j (cid:48) ) T ( X ) = N (cid:0) ∂ ( ij )( i (cid:48) j (cid:48) ) tr( S ) − ∂ ( ij )( i (cid:48) j (cid:48) ) log det S (cid:1) . For the ﬁrst derivative, it follows from (8.1) that ∂ ( ij )( i (cid:48) j (cid:48) ) tr( S ) = 2 N − ∂ ( i (cid:48) j (cid:48) ) X ij = 2 N − δ ii (cid:48) δ jj (cid:48) . (8.4)For the second derivative, it follows from (8.2) that ∂ ( ij )( i (cid:48) j (cid:48) ) log det S = 2 N ∂∂X i (cid:48) j (cid:48) X (cid:62) i S − e j = 2 N (cid:16) ∂X i ∂X i (cid:48) j (cid:48) (cid:62) S − e j + X (cid:62) i ∂S − ∂X i (cid:48) j (cid:48) e j (cid:17) ( ∗∗ ) = 2 N (cid:16) δ ii (cid:48) e (cid:62) j (cid:48) S − e j − N − X (cid:62) i S − (cid:0) e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) (cid:1) S − e j (cid:17) , (8.5)where in ( ∗∗ ) we use the following calculation with the help of (8.3): ∂S − ∂X i (cid:48) j (cid:48) = − S − ∂S∂X i (cid:48) j (cid:48) S − = − S − (cid:16) (cid:88) k(cid:96) e k e (cid:62) (cid:96) ∂S k(cid:96) ∂X i (cid:48) j (cid:48) (cid:17) S − = − N S − · (cid:16) (cid:88) k(cid:96) e k e (cid:62) (cid:96) ( δ j (cid:48) k X i (cid:48) (cid:96) + δ j (cid:48) (cid:96) X i (cid:48) k ) (cid:17) S − = − N S − (cid:0) e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) (cid:1) S − . (8.6)We obtain the second claim by combining (8.4) and (8.5). (cid:3) Normal approximation.

Proof of Theorem 3.1.

We again shorthand T LRT;Σ by T . By Lemma 8.1, (cid:107)∇ T ( X ) (cid:107) F = (cid:88) i,j (cid:0) ∂ ( ij ) T ( X ) (cid:1) = (cid:88) i (cid:107) ( I − S − ) X i (cid:107) ≤ (cid:107) S − (cid:107) (cid:107) I − S (cid:107) (cid:88) i (cid:107) X i (cid:107) . Using Lemma 7.3 and Lemma 7.4, E (cid:107)∇ T ( X ) (cid:107) F (cid:46) E (cid:18) (cid:107) S − (cid:107) (cid:107) I − S (cid:107) (cid:88) i (cid:107) X i (cid:107) (cid:19) = (cid:88) i,i (cid:48) E (cid:20) (cid:107) S − (cid:107) (cid:107) I − S (cid:107) (cid:107) X i (cid:107) (cid:107) X i (cid:48) (cid:107) (cid:21) ≤ (cid:88) i,i (cid:48) E / (cid:107) S − (cid:107) · E / (cid:107) I − S (cid:107) · E / (cid:107) X i (cid:107) · E / (cid:107) X i (cid:48) (cid:107) (cid:46) N · (cid:16)(cid:114) pN (cid:17) · p · p = p . (8.7)Again by Lemma 8.1, the second derivatives are ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = N − X (cid:62) i S − e j (cid:48) X (cid:62) i (cid:48) S − e j + N − X (cid:62) i S − X i (cid:48) ( S − ) jj (cid:48) + δ ii (cid:48) (cid:0) I − S − (cid:1) jj (cid:48) ≡ (cid:0) T + T + T (cid:1) ( ij ) , ( i (cid:48) j (cid:48) ) . Recall the deﬁnition of U (cid:96),m in Proposition 7.1-(1). Then( T ) ( ij ) , ( i (cid:48) j (cid:48) ) = N − (cid:88) (¯ i ¯ j ) X (cid:62) i S − e ¯ j · X (cid:62) ¯ i S − e j · X (cid:62) ¯ i S − e j (cid:48) · X (cid:62) i (cid:48) S − e ¯ j = N − (cid:18) (cid:88) ¯ i e (cid:62) j S − X ¯ i X (cid:62) ¯ i S − e j (cid:48) (cid:19) · (cid:18) (cid:88) ¯ j X (cid:62) i S − e ¯ j e (cid:62) ¯ j S − X i (cid:48) (cid:19) = N − ( S − ) jj (cid:48) · X (cid:62) i S − X i (cid:48) = ( U , ) ( ij ) , ( i (cid:48) j (cid:48) ) , (8.8)and T = U , . Proposition 7.1-(1) entails that E (cid:107) T (cid:107) ∨ E (cid:107) T (cid:107) = O (1) . (8.9)On the other hand, T has a block diagonal structure with respect to theindex ( i, i (cid:48) ), so its spectral norm equals that of ( I − S − ) ∈ R p × p , and hence E (cid:107) T (cid:107) = E (cid:107) I − S − (cid:107) ≤ E (cid:2) (cid:107) S − (cid:107) (cid:107) I − S (cid:107) (cid:3) = O (1) . (8.10)Combining all the estimates above, we ﬁnd that E (cid:107)∇ T ( X ) (cid:107) (cid:46) E (cid:107) T (cid:107) + E (cid:107) T (cid:107) + E (cid:107) T (cid:107) = O (1) . (8.11)Let X (cid:48) be an independent copy of X and let X (cid:48) t ≡ √ tX + √ − tX (cid:48) ∈ R N × p .Let E (cid:48) denote expectation only with respect to X (cid:48) and¯ T ( X ) ≡ (cid:90) √ t (cid:10) ∇ T ( X ) , E (cid:48) ∇ T ( X (cid:48) t ) (cid:11) d t. Then by the Gaussian-Poincar´e inequalityVar (cid:0) ¯ T ( X ) (cid:1) ≤ E (cid:107)∇ ¯ T ( X ) (cid:107) F (cid:46) (cid:113) E (cid:107)∇ T ( X ) (cid:107) (cid:113) E (cid:107)∇ T ( X ) (cid:107) F (cid:46) p . The claim now follows from the second-order Poincar´e inequality in LemmaA.1 and Proposition 3.2-(4). (cid:3)

Ratio control.

Proof of Proposition 3.2.

We shorthand ( T LRT , m

Σ;LRT , σ

Σ;LRT , V

Σ;LRT ) by(

T, m Σ , σ Σ , V Σ ).(1). Recall that Z , . . . , Z n are i.i.d. samples from N (0 , I p ). By Lemma 8.1,with S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i , we have T Σ ( Z ) = Z Σ / (cid:0) I − Σ − / S − Z Σ − / (cid:1) Σ / = Z (Σ − S − Z ) . Hence with { λ j } pj =1 denoting the eigenvalues of Σ, we have V = E (cid:107) T Σ ( Z ) − T I ( Z ) (cid:107) F = E (cid:107) Z (Σ − I ) (cid:107) F = E tr (cid:0) (Σ − I ) Z (cid:62) Z (Σ − I ) (cid:1) = tr (cid:0) E Z (cid:62) Z (Σ − I ) (cid:1) = N (cid:107) Σ − I (cid:107) F = N (cid:88) j ( λ j − . (2). Note that m Σ = ( N/ E (cid:20) tr (cid:0) Σ / S Z Σ / (cid:1) − log det(Σ / S Z Σ / ) − p (cid:21) = ( N/ (cid:2) tr(Σ) − log det Σ − p (cid:3) − ( N/ E log det S Z , so m Σ − m I = ( N/ d S (Σ , I )= ( N/ p (cid:88) j =1 (cid:0) λ j − log λ j − (cid:1) (cid:38) N p (cid:88) j =1 (cid:2) | λ j − | ∧ ( λ j − (cid:3) . (3). It is shown by the proof of [CJ18, Theorem 1] that with µ n, , σ n, deﬁnedin [CJ18, Corollary 1], and Y n ≡ (cid:0) T ( X ) − µ n, (cid:1) / ( nσ n, ), for s ∈ ( − s , s )for some s >

0, lim n ∧ p →∞ ,n ≥ p +2 M Y n ( s ) = M N (0 , ( s ) = e s / , where M Y ( s ) ≡ E e sY denotes the moment generating function of a genericrandom variable Y . Now using that for any s ∈ (0 , s ), E Y n = 4 (cid:90) ∞ t P (cid:0) Y n > t (cid:1) d t ≤ (cid:90) ∞ t e − st M Y n ( s ) d t = (6 /s ) M Y n ( s ) , it follows that sup n E Y n < ∞ , and hence convergence of moments yields that E Y n → , E Y n →

1. This implies σ I / ( n σ n, ) = Var( Y n ) = E Y n − ( E Y n ) →

1. Hence the asymptotic formula for σ I holds. In particular, this means thatthere exists some suﬃciently large M such that for n ∧ p ≥ M, n ≥ p + 2, σ I ≥ M − · n σ n, = 12 M · n (cid:20) − pN − log (cid:18) − pN (cid:19)(cid:21) ( ∗ ) ≥ M n · p N ≥ p M , where in ( ∗ ) we used the inequality − x − log(1 − x ) ≥ x / x ∈ (0 , { λ j } pj =1 are eigenvalues of Σ. By (1)-(3), we only need toshow that for some universal C > (cid:113) N (cid:80) j ( λ j − (cid:0) N (cid:80) j (cid:0) | λ j − | ∧ ( λ j − (cid:1)(cid:1) (cid:87) σ I ≤ C (cid:0) σ I ∧ N (cid:1) / . (8.12) IGH DIMENSIONAL COVARIANCE TESTING 33

To see this, let ν j ≡ | λ j − | , and J ≡ { j ∈ [ p ] : ν j ≤ } , it suﬃces to prove (cid:113) N (cid:80) j ∈ J ν j (cid:87) (cid:113) N (cid:80) j ∈ J c ν j (cid:0) N (cid:80) j ∈ J ν j (cid:1) (cid:87) (cid:0) N (cid:80) j ∈ J c ν j (cid:1) (cid:87) σ I ≤ C (cid:0) σ I ∧ N (cid:1) / . (8.13)This follows asLHS of (8.13) ≤ (cid:113) N (cid:80) j ∈ J ν j (cid:0) N (cid:80) j ∈ J ν j (cid:1) (cid:87) σ I + √ N (cid:80) j ∈ J c ν j (cid:0) N (cid:80) j ∈ J c ν j (cid:1) (cid:87) σ I ≤ x ≥ (cid:0) x ∨ σ I x (cid:1) + N − / (cid:46) (cid:0) σ I ∧ N (cid:1) − / . The proof is complete. (cid:3) Proofs for Section 3.2

Evaluation of derivatives.Lemma 9.1.

Recall the form of T ( µ, Σ);LRT ( X ) in (3.8). Then for any ( i, j ) , ( i (cid:48) , j (cid:48) ) ∈ [ n ] × [ p ] ,(1) (cid:0) ∇ T ( µ, Σ);LRT ( X ) (cid:1) ( ij ) = (cid:0) X ( I − S − ∗ ) + n ¯ X (cid:62) S − ∗ (cid:1) ( ij ) = e (cid:62) j (cid:0) ( I − S − ∗ ) X i + S − ∗ ¯ X (cid:1) .(2) (cid:0) ∇ T ( µ, Σ);LRT ( X ) (cid:1) ( ij )( i (cid:48) j (cid:48) ) = n − ( X i − ¯ X ) (cid:62) S − ∗ (cid:0) e j (cid:48) ( X i (cid:48) − ¯ X ) (cid:62) + ( X i (cid:48) − ¯ X ) e (cid:62) j (cid:48) (cid:1) S − ∗ e j + δ ii (cid:48) e (cid:62) j ( I − S − ∗ ) e j (cid:48) + n − e (cid:62) j (cid:48) S − ∗ e j .Proof. We shorthand T ( µ, Σ);LRT as T .(1). By deﬁnition, we have ∂ ( ij ) T ( X ) = n (cid:2) ∂ ( ij ) tr( S ∗ ) − ∂ ( ij ) log det S ∗ + ∂ ( ij ) ¯ X (cid:62) ¯ X (cid:3) . For the ﬁrst term, using ∂ ( ij ) X k = δ ik e j and ∂ ( ij ) ¯ X = n − (cid:80) k ∂ ij X k = n − e j , we have ∂ ( ij ) tr( S ∗ ( X )) = n − (cid:88) k ∂∂X ij tr (cid:0) ( X k − ¯ X )( X k − ¯ X ) (cid:62) (cid:1) = n − (cid:88) k tr (cid:2) ( δ ik e j − n − e j )( X k − ¯ X ) (cid:62) + ( X k − ¯ X )( δ ik e j − n − e j ) (cid:62) (cid:3) = n − (cid:88) k δ ik · X k − ¯ X ) j = 2 n − ( X ij − ¯ X j ) . (9.1)For the second term, using the well-known fact that ∇ log det A = A − forany invertible and symmetric matrix A (see e.g., [BV04, Section A.4.1]), wehave ∂ ( ij ) log det S ∗ = (cid:88) k,(cid:96) ∂ log det S ∗ ∂ ( S ∗ ) k(cid:96) ∂ ( S ∗ ) k(cid:96) ∂X ij ∗ ) = (cid:88) k,(cid:96) ( S − ∗ ) k(cid:96) · n (cid:0) δ jk ( X i(cid:96) − ¯ X (cid:96) ) + δ j(cid:96) ( X ik − ¯ X k ) (cid:1) = 2 n (cid:88) (cid:96) ( S − ∗ ) j(cid:96) ( X i(cid:96) − ¯ X (cid:96) ) = 2 n (cid:0) ( X − n ¯ X (cid:62) ) S − ∗ (cid:1) ij , (9.2)where in ( ∗ ), we use ∂ ( S ∗ ) k(cid:96) ∂X ij = 1 n ∂∂X ij n (cid:88) m =1 (cid:0) ( X m − ¯ X ) (cid:62) e k (cid:1)(cid:0) ( X m − ¯ X ) (cid:62) e (cid:96) (cid:1) = 1 n (cid:2) δ jk ( X i(cid:96) − ¯ X (cid:96) ) + δ j(cid:96) ( X ik − ¯ X k ) (cid:3) . (9.3)Lastly, for the third term, we have ∂ ( ij ) ¯ X (cid:62) ¯ X = 2( ∂ ( ij ) ¯ X ) (cid:62) ¯ X = 2 n ¯ X j . (9.4)We combine (9.1)-(9.4) to obtain the ﬁrst claim.(2). Again by deﬁnition, we have ∂ ( ij )( i (cid:48) j (cid:48) ) T ( X ) = n (cid:2) ∂ ( ij )( i (cid:48) j (cid:48) ) tr( S ∗ ) − ∂ ( ij )( i (cid:48) j (cid:48) ) log det S ∗ + ∂ ( ij )( i (cid:48) j (cid:48) ) ¯ X (cid:62) ¯ X (cid:3) . For the ﬁrst term, it follows from (9.1) that ∂ ( ij )( i (cid:48) j (cid:48) ) tr( S ∗ ) = 2 n − ∂ ( i (cid:48) j (cid:48) ) ( X ij − ¯ X j ) = 2 n − (cid:0) δ ii (cid:48) − n − (cid:1) δ jj (cid:48) . (9.5)For the second term, it follows from (9.2) that ∂ ( ij )( i (cid:48) j (cid:48) ) log det S ∗ = 2 n ∂∂X i (cid:48) j (cid:48) ( X i − ¯ X ) (cid:62) S − ∗ e j = 2 n (cid:104) ∂ ( X i − ¯ X ) ∂X i (cid:48) j (cid:48) (cid:62) S − ∗ e j + ( X i − ¯ X ) (cid:62) ∂S − ∗ ∂X i (cid:48) j (cid:48) e j (cid:105) ( ∗∗ ) = 2 n (cid:104) ( δ ii (cid:48) − n − ) e (cid:62) j (cid:48) S − ∗ e j − n − ( X i − ¯ X ) (cid:62) S − ∗ (cid:0) e j (cid:48) ( X i (cid:48) − ¯ X ) (cid:62) + ( X i (cid:48) − ¯ X ) e (cid:62) j (cid:48) (cid:1) S − ∗ e j (cid:105) , (9.6)where in ( ∗∗ ) we use the following calculation with the help of (9.3): ∂S − ∗ ∂X i (cid:48) j (cid:48) = − S − ∗ ∂S ∗ ∂X i (cid:48) j (cid:48) S − ∗ = − S − ∗ (cid:104) (cid:88) k,(cid:96) e k e (cid:62) (cid:96) ∂ ( S ∗ ) k(cid:96) ∂X i (cid:48) j (cid:48) (cid:105) S − ∗ = − n S − ∗ · (cid:104) (cid:88) k,(cid:96) e k e (cid:62) (cid:96) (cid:0) δ j (cid:48) k ( X i (cid:48) (cid:96) − ¯ X (cid:96) ) + δ j (cid:48) (cid:96) ( X i (cid:48) k − ¯ X k ) (cid:1)(cid:105) S − ∗ = − n S − ∗ (cid:2) e j (cid:48) ( X i (cid:48) − ¯ X ) (cid:62) + ( X i (cid:48) − ¯ X ) e (cid:62) j (cid:48) (cid:3) S − ∗ . IGH DIMENSIONAL COVARIANCE TESTING 35

Lastly ∂ ( ij ) , ( i (cid:48) j (cid:48) ) ¯ X (cid:62) ¯ X = 2 n ∂ ( i (cid:48) j (cid:48) ) ¯ X j = 2 n δ jj (cid:48) . (9.7)We combine (9.5)-(9.7) to obtain the second claim. (cid:3) Normal approximation.

Proof of Theorem 3.5.

We again shorthand T ( µ, Σ);LRT as T . First we boundthe norm for the gradient: by Lemma 9.1-(1), (cid:107)∇ T ( X ) (cid:107) F = (cid:88) i,j (cid:0) X ( I − S − ∗ ) + n ¯ X (cid:62) S − ∗ (cid:1) ij ≤ (cid:88) i (cid:107) ( I − S − ∗ ) X i (cid:107) + 2 (cid:88) i,j (cid:0) ¯ X (cid:62) S − ∗ e j (cid:1) = 2 (cid:88) i (cid:107) ( I − S − ∗ ) X i (cid:107) + 2 n ¯ X (cid:62) S − ∗ ¯ X ≡ ( I ) + ( II ) . By essentially the same arguments as in (8.7) in the proof of Theorem 3.1,we have E ( I ) (cid:46) p , so we only need to handle ( II ): E ( II ) = 4 n · E (cid:0) ¯ X (cid:62) S − ∗ ¯ X ¯ X (cid:62) S − ∗ ¯ X (cid:1) ≤ n · E (cid:0) (cid:107) S − ∗ (cid:107) (cid:107) ¯ X (cid:107) (cid:1) ( ∗ ) = 4 n E (cid:107) S − ∗ (cid:107) · E (cid:107) ¯ X (cid:107) ∗∗ ) (cid:46) n · p n = p . Here ( ∗ ) follows from the fact that S ∗ is independent of ¯ X , and ( ∗∗ ) followsfrom Lemma 7.3. Combining the bounds we have E (cid:107)∇ T ( X ) (cid:107) F (cid:46) p .Next we bound the spectral norm for the Hessian. By Lemma 9.1-(2), ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = n − ( X i − ¯ X ) (cid:62) S − ∗ e j (cid:48) ( X i (cid:48) − ¯ X ) (cid:62) S − ∗ e j + n − ( X i − ¯ X ) (cid:62) S − ∗ ( X i (cid:48) − ¯ X )( S − ∗ ) jj (cid:48) + δ ii (cid:48) (cid:0) I − S − ∗ (cid:1) jj (cid:48) + n − ( S − ∗ ) jj (cid:48) ≡ (cid:0) T + T + T + T (cid:1) ( ij ) , ( i (cid:48) j (cid:48) ) . Using the same calculation as in (8.8), we have( T ) ( ij ) , ( i (cid:48) j (cid:48) ) = n − ( X i − ¯ X ) (cid:62) S − ∗ ( X i (cid:48) − ¯ X )( S − ∗ ) jj (cid:48) . Proposition 7.1-(2) entails that E (cid:107) T (cid:107) ∨ E (cid:107) T (cid:107) = O (1). Similar to (8.10), E (cid:107) T (cid:107) = O (1). For T , note that (cid:107) T (cid:107) op = n − sup u =( u ,...,u n ) ∈ R n × p (cid:80) nk =1 (cid:107) u k (cid:107) ≤ (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ≤ k,(cid:96) ≤ n u (cid:62) k S − ∗ u (cid:96) (cid:12)(cid:12)(cid:12)(cid:12) ≤ n − sup u =( u ,...,u n ) ∈ R n × p (cid:80) nk =1 (cid:107) u k (cid:107) ≤ (cid:88) ≤ k,(cid:96) ≤ n (cid:107) u k (cid:107)(cid:107) u (cid:96) (cid:107) · (cid:107) S − ∗ (cid:107) op ≤ (cid:107) S − ∗ (cid:107) op , where in the last inequality we use the Cauchy-Schwarz inequality that (cid:80) ≤ k,(cid:96) ≤ n (cid:107) u k (cid:107)(cid:107) u (cid:96) (cid:107) = (cid:0) (cid:80) nk =1 (cid:107) u k (cid:107) (cid:1) ≤ n (cid:80) k (cid:107) u k (cid:107) ≤ n . Hence E (cid:107) T (cid:107) = O (1). Combining the bounds we arrive at E (cid:107)∇ T ( X ) (cid:107) = O (1). The restof the proof proceeds along the lines in the proof of Theorem 3.1, with thehelp of the variance formula in Proposition 3.6-(3). (cid:3) Ratio control.

Proof of Proposition 3.6.

We shorthand ( T ( µ, Σ);LRT , m ( µ, Σ);LRT , σ ( µ, Σ);LRT , V ( µ, Σ);LRT )by (

T, m ( µ, Σ) , σ ( µ, Σ) , V ( µ, Σ) ). (1). Recall that Z , . . . , Z n are i.i.d. samplesfrom N (0 , I p ). Let S ∗ ,Z ≡ n − n (cid:88) i =1 ( Z i − ¯ Z )( Z i − ¯ Z ) (cid:62) . By Lemma 9.1, T ( µ, Σ) = (cid:18)(cid:0) Z Σ / + n µ (cid:62) (cid:1)(cid:0) I − Σ − / S ∗ ,Z Σ − / (cid:1) + n (cid:0) ¯ Z (cid:62) Σ / + µ (cid:62) (cid:1) · Σ − / S ∗ ,Z Σ − / (cid:19) Σ / = Z (Σ − S ∗ ,Z ) + n ¯ Z (cid:62) S ∗ ,Z + n µ (cid:62) Σ / . Hence with (cid:107) µ (cid:107) ≡ µ (cid:62) Σ µ and { λ pj =1 } pj =1 denoting the eigenvalues of Σ, V µ, Σ) = E (cid:13)(cid:13) T ( µ, Σ) ( Z ) − T (0 ,I ) ( Z ) (cid:13)(cid:13) F = E (cid:107) Z (Σ − I ) + n µ (cid:62) Σ / (cid:107) F = n (cid:0) (cid:107) Σ − I (cid:107) F + µ (cid:62) Σ µ (cid:1) = n (cid:20) (cid:88) j (cid:0) λ j − (cid:1) + (cid:107) µ (cid:107) (cid:21) . (2). Note that m ( µ, Σ) = n E (cid:2) tr (cid:0) Σ / S ∗ ,Z Σ / (cid:1) − log det(Σ / S ∗ ,Z Σ / ) − p + ¯ X (cid:62) ¯ X (cid:3) = n (cid:2) tr(Σ) − log det Σ − p + (cid:107) µ (cid:107) (cid:3) − n · E log det S ∗ ,Z , where the second equality follows as E tr( S ∗ ,Z ) = Nn E tr (cid:18) N − n (cid:88) k =1 ( Z k − ¯ Z )( Z k − ¯ Z ) (cid:62) (cid:19) = n − n E tr (cid:18) N − N (cid:88) i =1 Z k Z (cid:62) k (cid:19) = n − n tr(Σ) , and E ¯ X (cid:62) ¯ X = n − E (cid:16) (cid:88) k X k (cid:17) (cid:62) (cid:16) (cid:88) (cid:96) X (cid:96) (cid:17) = n − (cid:20)(cid:18) (cid:88) k (cid:54) = (cid:96) + (cid:88) k = (cid:96) (cid:19) E X (cid:62) k X (cid:96) (cid:21) = n − (cid:2) n ( n − (cid:107) µ (cid:107) + n (cid:0) (cid:107) µ (cid:107) + tr(Σ) (cid:1)(cid:3) = (cid:107) µ (cid:107) + n − tr(Σ) . IGH DIMENSIONAL COVARIANCE TESTING 37

Hence m ( µ, Σ) − m (0 ,I ) = n (cid:0) d S (Σ , I ) + (cid:107) µ (cid:107) (cid:1) . (3). The proof is the same as Proposition 3.2-(3) by invoking [CJ18, Theo-rem 2].(4). By (1)-(3), we only need to show that for some universal constant C > (cid:113) n (cid:0) (cid:80) j ( λ j − ∨ (cid:107) µ (cid:107) (cid:1)(cid:0) n (cid:80) j (cid:0) | λ j − | ∧ ( λ j − (cid:1)(cid:1) (cid:87) (cid:0) n (cid:107) µ (cid:107) (cid:1) (cid:87) σ (0 ,I ) ≤ C (cid:0) σ (0 ,I ) ∧ n (cid:1) / (9.8)holds. In view of (8.12), we only need to prove that (cid:113) n (cid:107) µ (cid:107) (cid:0) n (cid:80) j (cid:0) | λ j − | ∧ ( λ j − (cid:1)(cid:1) (cid:87) (cid:0) n (cid:107) µ (cid:107) (cid:1) (cid:87) σ (0 ,I ) ≤ C (cid:0) σ (0 ,I ) ∧ n (cid:1) / . As (cid:107) µ (cid:107) = (cid:107) µ (cid:107) + µ (cid:62) (Σ − I ) µ ≤ (cid:107) µ (cid:107) + (cid:107) µ (cid:107) · max j | λ j − | , with ν j = | λ j − | and J = { j ∈ [ p ] : ν j ≤ } , we only need to prove (cid:112) n (cid:107) µ (cid:107) (cid:0) n (cid:80) j ∈ J ν j (cid:1) (cid:87) (cid:0) n (cid:80) j ∈ J c ν j (cid:1) (cid:87) (cid:0) n (cid:107) µ (cid:107) (cid:1) (cid:87) σ (0 ,I ) ≤ σ / ,I ) , (9.9) (cid:112) n (cid:107) µ (cid:107) · max j ν j (cid:0) n (cid:80) j ∈ J ν j (cid:1) (cid:87) (cid:0) n (cid:80) j ∈ J c ν j (cid:1) (cid:87) (cid:0) n (cid:107) µ (cid:107) (cid:1) (cid:87) σ (0 ,I ) ≤ C (cid:0) σ (0 ,I ) ∧ n (cid:1) / . (9.10)To see (9.9), note thatLHS of (9.9) ≤ (cid:112) n (cid:107) µ (cid:107) (cid:0) n (cid:107) µ (cid:107) (cid:1) (cid:87) σ (0 ,I ) ≤ x ≥ (cid:0) x ∨ σ ,I x (cid:1) ≤ σ / ,I . To see (9.10), using that ab ≤ ( a + b ) /

2, we haveLHS of (9.10) (cid:46) n / (cid:107) µ (cid:107) (cid:0) n (cid:107) µ (cid:107) (cid:1) + n / max j ν j (cid:0) n (cid:80) j ∈ J ν j (cid:1) (cid:87) (cid:0) n (cid:80) j ∈ J c ν j (cid:1) (cid:87) σ (0 ,I ) ≤ n − / + n / (max j ν j ) · max j ν j > (cid:0) n (cid:80) j ∈ J c ν j (cid:1) + n / (max j ν j ) · max j ν j ≤ (cid:0) n max j ν j (cid:1) (cid:87) σ (0 ,I ) ≤ n − / + n / inf x ≥ (cid:0) nx ∨ σ (0 ,I ) x (cid:1) (cid:46) n − / (cid:95) σ − / ,I ) , as desired. (cid:3) Proofs for Section 3.3

Evaluation of derivatives.Lemma 10.1.

Recall the form of T LNW ( X ) in (3.9). We assume withoutloss of generality that µ = 0 . Then for any ( i, j ) , ( i (cid:48) , j (cid:48) ) ∈ [ N ] × [ p ] , (1) (cid:0) ∇ T LNW ( X ) (cid:1) ( ij ) = (cid:0) X ( S − I ) − (tr( S ) /N ) X (cid:1) ij = e (cid:62) j ( S − I ) X i − (tr( S ) /N ) X ij .(2) (cid:0) ∇ T LNW ( X ) (cid:1) ( ij )( i (cid:48) j (cid:48) ) = N − δ jj (cid:48) X (cid:62) i X i (cid:48) + N − X i (cid:48) j X ij (cid:48) + δ ii (cid:48) ( S − I ) jj (cid:48) − (2 /N ) X ij X i (cid:48) j (cid:48) − (tr( S ) /N ) δ ii (cid:48) δ jj (cid:48) .Furthermore, for any ( i (cid:96) , j (cid:96) ) ∈ [ N ] × [ p ] , (cid:96) = 1 , , , , ∂ ( i j )( i j )( i j )( i j ) T LNW ( X )= N − (cid:0) δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j (cid:1) − N − (cid:0) δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j (cid:1) . Proof.

We shorthand T LNW as T . As ∂ ij S ( X ) = N − ( e j X (cid:62) i + X i e (cid:62) j ), forthe ﬁrst-order derivatives we have ∂ ( ij ) T ( X ) = N (cid:18) tr (cid:2) ∂ ( ij ) ( S − I ) (cid:3) − N · S ) tr (cid:2) ∂ ( ij ) S (cid:3)(cid:19) = 12 tr (cid:2) ( S − I )( e j X (cid:62) i + X i e (cid:62) j ) (cid:3) − tr( S ) X ij N = (cid:0) X ( S − I ) (cid:1) ij − tr( S ) N X ij = e (cid:62) j ( S − I ) X i − tr( S ) N X ij . For the second-order derivatives we have ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = ∂ ( i (cid:48) j (cid:48) ) (cid:0) e (cid:62) j ( S − I ) X i (cid:1) − N − ∂ ( i (cid:48) j (cid:48) ) (cid:0) tr( S ) X ij (cid:1) = N − e (cid:62) j ( e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) ) · X i + δ ii (cid:48) e (cid:62) j ( S − I ) e j (cid:48) − N − (cid:0) (2 /N ) X ij X i (cid:48) j (cid:48) + δ ii (cid:48) δ jj (cid:48) tr( S ) (cid:1) = N − δ jj (cid:48) X (cid:62) i X i (cid:48) + N − X i (cid:48) j X ij (cid:48) + δ ii (cid:48) ( S − I ) jj (cid:48) − N − X ij X i (cid:48) j (cid:48) − N − tr( S ) δ ii (cid:48) δ jj (cid:48) . For the third-order derivatives we have ∂ ( i j )( i j )( i j ) T ( X )= N − δ j j ∂ ( i j ) ( X (cid:62) i X i ) + N − ∂ ( i j ) ( X i j X i j )+ N − δ i i e (cid:62) j (cid:0) e j X (cid:62) i + X i e (cid:62) j (cid:1) e j − N − ∂ ( i j ) (cid:0) X i j X i j (cid:1) − N − δ i i δ j j X i j = N − (cid:0) δ i i δ j j X i j + δ i i δ j j X i j (cid:1) + N − (cid:0) δ i i δ j j X i j + δ i i δ j j X i j (cid:1) + N − (cid:0) δ i i δ j j X i j + δ i i δ j j X i j (cid:1) − N − (cid:0) δ i i δ j j X i j + δ i i δ j j X i j + δ i i δ j j X i j (cid:1) . IGH DIMENSIONAL COVARIANCE TESTING 39

For the fourth-order derivatives we have ∂ ( i j )( i j )( i j )( i j ) T ( X )= N − (cid:0) δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j (cid:1) − N − (cid:0) δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j + δ i i δ i i δ j j δ j j (cid:1) . The proof is complete. (cid:3)

Normal approximation.

Proof of Theorem 3.8.

Let y ≡ p/N . We start by showing that E (cid:107)∇ T ( X ) (cid:107) ≤ C (1 ∨ y ) (10.1)for some absolute constant C >

0. Reorganizing the terms in Lemma 10.1,we have (cid:0) ∇ T ( X ) (cid:1) ( ij ) , ( i (cid:48) j (cid:48) ) = N − X (cid:62) i X i (cid:48) δ jj (cid:48) + N − X ij (cid:48) X i (cid:48) j − N − X i (cid:48) j (cid:48) X ij + δ ii (cid:48) e (cid:62) j (cid:48) ( S − I − N − tr( S ) I ) e j ≡ ( T , + T , − T , + T , ) ( ij ) , ( i (cid:48) j (cid:48) ) . Recall the deﬁnition of U (cid:96),m ;+ from Proposition 7.2. As T , = U , and( T , ) ( ij ) , ( i (cid:48) j (cid:48) ) = N − (cid:88) (¯ i ¯ j ) X ¯ ij X i ¯ j X i (cid:48) ¯ j X ¯ ij (cid:48) = N − (cid:18) (cid:88) ¯ i X ¯ ij X ¯ ij (cid:48) (cid:19)(cid:18) (cid:88) ¯ j X i ¯ j X i (cid:48) ¯ j (cid:19) = N − S jj (cid:48) X (cid:62) i X i (cid:48) = ( U , ) ( ij ) , ( i (cid:48) j (cid:48) ) , Proposition 7.2 entails that E (cid:107) T , (cid:107) ∨ E (cid:107) T , (cid:107) = O ((1 ∨ y ) ). For T , ,as (cid:107) T , (cid:107) op = (2 /N ) sup u,v ∈ B N × p (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ( ij ) , ( i (cid:48) j (cid:48) ) u ij X ij X i (cid:48) j (cid:48) v i (cid:48) j (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = (2 /N ) (cid:107) X (cid:107) F . Hence E (cid:107) T , (cid:107) = O ( y ) = O ((1 ∨ y ) ). For T , , it holds by the blockdiagonal structure that (cid:107) T , (cid:107) op = (cid:107) S − I − N − tr( S ) I (cid:107) op ≤ (cid:107) S − I (cid:107) op + N − tr( S ) . Hence it holds by Lemma 7.4 that E (cid:107) T , (cid:107) (cid:46) (cid:0) y ∨ √ y (cid:1) + N − · N − E (cid:107) X (cid:107) F (cid:46) (1 ∨ y ) . By collecting the estimates of T , - T , , we complete the proof of (10.1).Next we show that E (cid:107)∇ T ( X ) (cid:107) F (cid:46) p . This will be done by two estimatesbelow.( Estimate 1 ) By Lemma 10.1-(1), (cid:107)∇ T ( X ) (cid:107) F (cid:46) (cid:88) i (cid:107) ( S − I ) X i (cid:107) + N − tr ( S ) (cid:107) X (cid:107) F ≤ (cid:0) (cid:107) S − I (cid:107) + N − tr ( S ) (cid:1) (cid:88) i (cid:107) X i (cid:107) , so by Lemma 7.4 and Proposition 7.2, E (cid:107)∇ T ( X ) (cid:107) F (cid:46) E (cid:104)(cid:0) (cid:107) S − I (cid:107) + N − tr ( S ) (cid:1) (cid:88) i (cid:107) X i (cid:107) (cid:105) (cid:46) (cid:88) i,i (cid:48) E (cid:2)(cid:0) (cid:107) S − I (cid:107) + N − tr ( S ) (cid:1) (cid:107) X i (cid:107) (cid:107) X i (cid:48) (cid:107) (cid:3) ≤ (cid:88) i,i (cid:48) (cid:0) E / (cid:107) S − I (cid:107) + N − E / tr ( S ) (cid:17) · E / (cid:107) X i (cid:107) · E / (cid:107) X i (cid:48) (cid:107) (cid:46) N · (cid:104)(cid:16) pN (cid:17) + (cid:16) pN (cid:17) · E / (cid:107) S (cid:107) (cid:105) · p · p (cid:46) p (1 + y ) . ( Estimate 2 ) Note that ∇ T ( X ) = X (cid:0) S − N − tr( S ) I (cid:1) − X ≡ T , + T , . It is clear that E (cid:107) T , (cid:107) F (cid:46) N p . To handle T , , note that (cid:107) T , (cid:107) F = N tr (cid:16)(cid:0) S − N − tr( S ) I (cid:1) S (cid:17) = N tr (cid:16) S + N − tr ( S ) S − N − tr( S ) S (cid:17) = N (cid:104) tr( S ) + N − tr ( S ) − N − tr( S ) tr( S ) (cid:105) . Then using Lemma D.5, we have under the prescribed asymptotics that E (cid:107) T , (cid:107) F = N (cid:20) py + 3 py + p + 3 y + 3 y + 4 N − y + N − (cid:0) p + 6 py + 8 N − y (cid:1) − N − (cid:0) p y + p + py + 4( y + y ) + 4 N − y (cid:1)(cid:21) = p (cid:18) Np + 1 N + 3 p − N p − N (cid:19) = p (cid:2) O (( N ∧ p ) − ) (cid:3) + pN. Hence we have E (cid:107)∇ T ( X ) (cid:107) F = (cid:0) E (cid:107)∇ T ( X ) (cid:107) F (cid:1) + Var (cid:0) (cid:107)∇ T ( X ) (cid:107) F (cid:1) = O ( p (1 + y − )) + Var (cid:0) (cid:107)∇ T ( X ) (cid:107) F (cid:1) . (10.2)By the Gaussian-Poincar´e inequality, we haveVar (cid:0) (cid:107)∇ T ( X ) (cid:107) F (cid:1) ≤ E (cid:107)∇(cid:107)∇ T ( X ) (cid:107) F (cid:107) F = 4 E (cid:107) (cid:0) ∇ T ( X ) (cid:1) (cid:62) ∇ T ( X ) (cid:107) F ≤ E / (cid:107)∇ T ( X ) (cid:107) · E / (cid:107)∇ T ( X ) (cid:107) F . Combining the above display with (10.2) yields that E (cid:107)∇ T ( X ) (cid:107) F ≤ O ( p (1 + y − )) + 4 E / (cid:107)∇ T ( X ) (cid:107) · E / (cid:107)∇ T ( X ) (cid:107) F . IGH DIMENSIONAL COVARIANCE TESTING 41

Solving the quadratic inequality above and using (10.1), we arrive at E (cid:107)∇ T ( X ) (cid:107) F = O (cid:0) p (1 + y − ) ∨ E (cid:107)∇ T ( X ) (cid:107) (cid:1) = O ( p (1 + y − )) . Combining the above two estimates, we have E (cid:107)∇ T ( X ) (cid:107) F (cid:46) p max y ≥ min (cid:8) (1 + y ) , (1 + y − ) } (cid:16) p . The rest of the proof proceeds along the lines in the proof of Theorem 3.1,with the help of the variance formula in Proposition 3.9-(3). (cid:3)

Ratio control.

Proof of Proposition 3.9. (1). Recall that Z , . . . , Z n are i.i.d. samples from N (0 , I p ). By Lemma 10.1, with S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i , T Σ;LNW ( Z ) = (cid:20) Z Σ / (cid:0) Σ / S Z Σ / − I (cid:1) − N − tr(Σ S Z ) Z Σ / (cid:21) Σ / = Z Σ S Z Σ − Z Σ − N − tr(Σ S Z ) Z Σ , so T Σ;LNW ( Z ) − T I ;LNW ( Z )= (cid:104) Z Σ( S Z Σ − I ) − Z ( S Z − I ) (cid:105) − N (cid:104) tr(Σ S Z ) Z Σ − tr( S Z ) Z (cid:105) = (cid:104) Z Σ( S Z Σ − I ) − Z ( S Z Σ − I ) + Z ( S Z Σ − I ) − Z ( S Z − I ) (cid:105) − N (cid:104) tr(Σ S Z ) Z Σ − tr(Σ S Z ) Z + tr(Σ S Z ) Z − tr( S Z ) Z (cid:105) = Z (Σ − I )( S Z Σ − I ) + ZS Z (Σ − I ) − N tr(Σ S Z ) Z (Σ − I ) − N tr (cid:0) (Σ − I ) S Z (cid:1) Z ≡ V ( Z ) + V ( Z ) + V ( Z ) + V ( Z ) . Note that (cid:107) V ( Z ) (cid:107) F ≤ (cid:107) S Z Σ − I (cid:107) (cid:107) Z (Σ − I ) (cid:107) F ≤ (cid:107) S Z Σ − I (cid:107) (cid:107) Z (cid:107) (cid:107) Σ − I (cid:107) F , (cid:107) V ( Z ) (cid:107) F ≤ (cid:107) ZS Z (cid:107) (cid:107) Σ − I (cid:107) F ≤ (cid:107) Z (cid:107) (cid:107) S Z (cid:107) (cid:107) Σ − I (cid:107) F , (cid:107) V ( Z ) (cid:107) F ≤ N − tr (Σ S Z ) (cid:107) Z (Σ − I ) (cid:107) F ≤ p N − (cid:107) Σ (cid:107) (cid:107) S Z (cid:107) (cid:107) Z (Σ − I ) (cid:107) F , (cid:107) V ( Z ) (cid:107) F ≤ N − tr (cid:0) (Σ − I ) S Z (cid:1) (cid:107) Z (cid:107) F ≤ N − (cid:107) S Z (cid:107) F (cid:107) Z (cid:107) F (cid:107) Σ − I (cid:107) F ≤ pN − (cid:107) S Z (cid:107) (cid:107) Z (cid:107) F (cid:107) Σ − I (cid:107) F . Under p/N ≤ M , we have V (cid:46) M N (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F . (2). By Lemma D.4, with δ N ≡ N − − N − , m Σ = N (cid:20) E tr( S − I ) − N E tr ( S ) (cid:21) = N (cid:20) E tr( S ) − E tr( S ) + p − N E tr ( S ) (cid:21) = N (cid:20)(cid:0) N − (cid:1) tr(Σ ) + N − tr (Σ) − p − N − tr (Σ) − N − tr(Σ ) (cid:21) = N (cid:2) (1 + δ N ) tr(Σ ) − p (cid:3) . Hence m Σ − m I = N (cid:104) (1 + δ N ) tr(Σ − I ) − − I ) (cid:105) = N (cid:2) (cid:107) Σ − I (cid:107) F + δ N tr(Σ − I ) (cid:3) . (3). By the Plancherel’s theorem (i.e., [Cha14, formula (6.2)]), we have σ I ;LNW = (cid:88) ( ij ) (cid:2) E ∂ ( ij ) T ( X ) (cid:3) + 12! (cid:88) ( i j )( i j ) (cid:2) E ∂ ( i j )( i j ) T ( X ) (cid:3) + 13! (cid:88) ( i j )( i j )( i j ) (cid:2) E ∂ ( i j )( i j )( i j ) T ( X ) (cid:3) + 14! (cid:88) ( i j )( i j )( i j )( i j ) (cid:2) E ∂ ( i j )( i j )( i j )( i j ) T ( X ) (cid:3) ≡ ( I ) + ( II ) + ( III ) + ( IV ) . Terms ( I ) - ( IV ) are handled as follows: • To handle ( I ), note that E ∂ ( ij ) T ( X ) = E e (cid:62) j ( S − I ) X i − E (cid:2) (tr( S ) /N ) X ij (cid:3) . The ﬁrst term satisﬁes E e (cid:62) j ( S − I ) X i = E e (cid:62) j (cid:18) N N (cid:88) k =1 X k X (cid:62) k (cid:19) X i = N − e (cid:62) j E ( X i · (cid:107) X i (cid:107) )= N − e (cid:62) j E (cid:18) X i (cid:107) X i (cid:107) · (cid:107) X i (cid:107) (cid:19) = N − e (cid:62) j E (cid:18) X i (cid:107) X i (cid:107) (cid:19) · E (cid:107) X i (cid:107) = 0 . A similar identity holds for the second term, so ( I ) = 0. • ( II ) (cid:46) p/N = o ( p ) by noting that E ∂ ( i j )( i j ) T ( X ) = ( N − − N − ) · δ i i δ j j . • ( III ) = 0 by direct calculation. • ( IV ) = 6 p (cid:0) o (1) (cid:1) by direct calculation. IGH DIMENSIONAL COVARIANCE TESTING 43

The proof is now complete by collecting all of the estimates.(4). By (1)-(3), (cid:107) Σ (cid:107) op ≤ (cid:107) Σ − I (cid:107) F + 1 and the condition p/N ≤ M , we onlyneed to show that √ N (cid:107) Σ − I (cid:107) F (cid:87) (cid:113) N (cid:107) Σ − I (cid:107) F (cid:0) N (cid:107) Σ − I (cid:107) F − N δ N | tr(Σ − I ) | (cid:1) + (cid:87) σ I ;LNW ≤ C M ( σ I ;LNW ∧ N ) / . (10.3)Note that with { λ j } pj =1 denoting the eigenvalues of Σ, | tr(Σ − I ) | = (cid:12)(cid:12)(cid:12)(cid:12) p (cid:88) j =1 ( λ j − (cid:12)(cid:12)(cid:12)(cid:12) ≤ max j ( λ j + 1) · p (cid:88) j =1 | λ j − |≤ √ p ( (cid:107) Σ (cid:107) op + 1) (cid:107) Σ − I (cid:107) F (cid:46) M √ N ( (cid:107) Σ − I (cid:107) F ∨ (cid:107) Σ − I (cid:107) F , (10.4)so for N large enough, (10.3) is satisﬁed provided that √ N (cid:107) Σ − I (cid:107) F (cid:87) (cid:113) N (cid:107) Σ − I (cid:107) F (cid:0) N (cid:107) Σ − I (cid:107) F − C (cid:48) M √ N (cid:107) Σ − I (cid:107) F (cid:1) + (cid:87) σ I ;LNW ≤ C M ( σ I ;LNW ∧ N ) / . (10.5)To see this, note that the left hand side of the above display is bounded, upto a constant that may depend on M , by √ N (cid:107) Σ − I (cid:107) F ≤ C (cid:48) M σ I ;LNW + √ N (cid:107) Σ − I (cid:107) F > C (cid:48) M √ N (cid:107) Σ − I (cid:107) F (cid:87) (cid:113) N (cid:107) Σ − I (cid:107) F N (cid:107) Σ − I (cid:107) F ∨ σ I ;LNW (cid:46) σ I ;LNW + √ N (cid:107) Σ − I (cid:107) F N (cid:107) Σ − I (cid:107) F ∨ σ I ;LNW + (cid:113) N (cid:107) Σ − I (cid:107) F N (cid:107) Σ − I (cid:107) F ∨ σ I ;LNW ≤ σ I ;LNW + 1 N / + 1inf x ≥ (cid:0) x ∨ σ I ;LNW x (cid:1) ≤ RHS of (10.5) . This completes the proof. (cid:3)

Completing the proof for power expansion.

Proof of Theorem 3.10.

Abbreviate Ψ

LNW by Ψ. By Theorem 3.8 and Propo-sition 3.9, we have (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ( X ) − P (cid:18) N (cid:18) N · (cid:0) (cid:107) Σ − I (cid:107) F + Q LNW (Σ) (cid:1) σ I ;LNW , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . We only need to remove the residual term Q LNW (Σ). To see this, note thatby (10.4), | Q LNW (Σ) | ≤ C M N − / ( (cid:107) Σ − I (cid:107) F ∨ (cid:107) Σ − I (cid:107) F . So using Lemma 2.4 we have∆ P ≤ C α,M ( (cid:107) Σ − I (cid:107) F ∨ N / (cid:107) Σ − I (cid:107) F , where ∆ P ≡ P (cid:18) N (cid:18) N · (cid:0) (cid:107) Σ − I (cid:107) F + Q LNW (Σ) (cid:1) σ I ;LNW , (cid:19) > z α (cid:19) − P (cid:18) N (cid:18) N · (cid:107) Σ − I (cid:107) F σ I ;LNW , (cid:19) > z α (cid:19) . On the other hand, by anti-concentration of normal random variable,∆ P ≤ C M N / ( (cid:107) Σ − I (cid:107) F ∨ (cid:107) Σ − I (cid:107) F σ I ;LNW . Hence∆ P (cid:46) α,M ( (cid:107) Σ − I (cid:107) F ∨ N / (cid:107) Σ − I (cid:107) F (cid:94) N / ( (cid:107) Σ − I (cid:107) F ∨ (cid:107) Σ − I (cid:107) F σ I ;LNW ≤ (cid:107) Σ − I (cid:107) F > N / + (cid:107) Σ − I (cid:107) F ≤ (cid:20) N / (cid:107) Σ − I (cid:107) F (cid:94) N / (cid:107) Σ − I (cid:107) F σ I ;LNW (cid:21) ≤ N / + 1inf α ≥ (cid:0) x ∨ σ I ;LNW x (cid:1) (cid:16) σ I ;LNW ∧ N ) / . Similarly we may get a lower bound for ∆ P . The proof is complete. (cid:3) Proofs for Section 3.4

In the proof of this subsection, we write S n ≡ n − (cid:80) nk =1 X k X (cid:62) k , where X i ’s are i.i.d. N (0 , Σ).11.1.

Evaluation of derivatives.Lemma 11.1.

Recall the form of T CM ( X ) in (3.10). Then for any ( i, j ) , ( i (cid:48) , j (cid:48) ) ∈ [ n ] × [ p ] ,(1) ∂ ( ij ) T CM ( X ) = nn − X (cid:62) i ( S n − I ) e j − n − ( (cid:107) X i (cid:107) − X ij .(2) ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T CM ( X ) = nn − (cid:2) δ ii (cid:48) ( S n − I ) jj (cid:48) + n − X (cid:62) i X i (cid:48) δ jj (cid:48) + n − X ij (cid:48) X i (cid:48) j (cid:3) − n − (cid:2) δ ii (cid:48) X ij (cid:48) X ij + ( (cid:107) X i (cid:107) − δ ii (cid:48) δ jj (cid:48) (cid:3) .Proof. (1). Note that for any 1 ≤ k < (cid:96) ≤ n , we have ∂ ( ij ) h ( X k , X (cid:96) ) = ∂ ( ij ) ( X (cid:62) k X (cid:96) ) − ∂ ( ij ) ( X (cid:62) k X k + X (cid:62) (cid:96) X (cid:96) )= 2( X (cid:62) k X (cid:96) )( δ ki X (cid:96)j + δ (cid:96)i X kj ) − (cid:0) δ ik X ij + 2 δ (cid:96)i X ij (cid:1) = 2 δ ki (cid:2) ( X (cid:62) k X (cid:96) ) X (cid:96)j − X ij (cid:3) + 2 δ (cid:96)i (cid:2) ( X (cid:62) k X (cid:96) ) X kj − X ij (cid:3) . The above display entails that ∂ ( ij ) T CM ( X ) = n (cid:18) n (cid:19) − (cid:88) k<(cid:96) ∂ ( ij ) h ( X k , X (cid:96) )= n (cid:18) n (cid:19) − (cid:88) k<(cid:96) (cid:104) δ ki (cid:0) ( X (cid:62) k X (cid:96) ) X (cid:96)j − X ij (cid:1) + δ (cid:96)i (cid:0) ( X (cid:62) k X (cid:96) ) X kj − X ij (cid:1)(cid:105) IGH DIMENSIONAL COVARIANCE TESTING 45 = 2 n − (cid:88) k ∈ [ n ]: k (cid:54) = i (cid:2) ( X (cid:62) i X k ) X kj − X ij (cid:3) = 2 n − X (cid:62) i (cid:104) (cid:88) k ∈ [ n ]: k (cid:54) = i ( X k X (cid:62) k − I ) (cid:105) e j = 2 nn − X (cid:62) i ( S n − I ) e j − n − (cid:0) (cid:107) X i (cid:107) − (cid:1) X ij . (2). By (1), ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T CM ( X ) = 2 nn − ∂ ( i (cid:48) j (cid:48) ) (cid:2) X (cid:62) i ( S n − I ) e j (cid:3) − n − ∂ ( i (cid:48) j (cid:48) ) (cid:104) ( (cid:107) X i (cid:107) − X ij (cid:105) = 2 nn − (cid:104) δ ii (cid:48) ( S n − I ) jj (cid:48) + n − X (cid:62) i X i (cid:48) δ jj (cid:48) + n − X ij (cid:48) X i (cid:48) j (cid:105) − n − (cid:104) δ ii (cid:48) X ij (cid:48) X ij + ( (cid:107) X i (cid:107) − δ ii (cid:48) δ jj (cid:48) (cid:105) . The proof is then completed. (cid:3)

Normal approximation.

Proof of Theorem 3.11.

We abbreviate T CM as T . We ﬁrst bound the oper-ator norm of the Hessian. By Lemma 11.1-(2), we have ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = 2 nn − δ ii (cid:48) ( S n − I ) jj (cid:48) + 2 n − X (cid:62) i X i (cid:48) δ jj (cid:48) + 2 n − X ij (cid:48) X i (cid:48) j − n − δ ii (cid:48) X ij X ij (cid:48) − n − (cid:107) X i (cid:107) − δ ii (cid:48) δ jj (cid:48) ≡ ( T , + T , + T , − T , − T , ) ( ij ) , ( i (cid:48) j (cid:48) ) . In view of the proof of Theorem 3.8, we have E (cid:107) T , (cid:107) ∨ E (cid:107) T , (cid:107) ∨ E (cid:107) T , (cid:107) = O ((1 ∨ y ) ). To handle T , , note that (cid:107) T , (cid:107) op (cid:46) n − · sup u ∈ B n × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ( ij ) , ( i (cid:48) j (cid:48) ) u ij u i (cid:48) j (cid:48) δ ii (cid:48) X ij X ij (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = n − · sup u ∈ B n × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i,j,j (cid:48) u ij u ij (cid:48) X ij X ij (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = n − · sup u ∈ B n × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i (cid:16) (cid:88) j u (cid:62) i e j e (cid:62) j X i (cid:17) · (cid:16) (cid:88) j (cid:48) u (cid:62) i e j (cid:48) e (cid:62) j (cid:48) X i (cid:17)(cid:12)(cid:12)(cid:12)(cid:12) = n − · sup u ∈ B n × p (1) (cid:88) i (cid:0) u (cid:62) i X i (cid:1) = n − · sup u ∈ B n × p (1) (cid:88) i (cid:0) e (cid:62) i U X (cid:62) e i (cid:1) = n − · sup u ∈ B n × p (1) tr( U X (cid:62) XU (cid:62) )= n − · sup u ∈ B n × p (1) (cid:107) XU (cid:62) (cid:107) F ≤ n − (cid:107) X (cid:107) = (cid:107) S n (cid:107) op . Hence by Lemma 7.4, we have E (cid:107) T , (cid:107) (cid:46) ( y ∨ √ y ) = O ((1 ∨ y ) ). Tohandle T , , note that (cid:107) T , (cid:107) op (cid:46) n − · sup u ∈ B n × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ( ij ) , ( i (cid:48) j (cid:48) ) u ij u i (cid:48) j (cid:48) ( (cid:107) X i (cid:107) − δ ii (cid:48) δ jj (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = n − · sup u ∈ B n × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ij u ij ( (cid:107) X i (cid:107) − (cid:12)(cid:12)(cid:12)(cid:12) = n − · sup u ∈ B n × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) i (cid:107) u i (cid:107) ( (cid:107) X i (cid:107) − (cid:12)(cid:12)(cid:12)(cid:12) ≤ n − · sup u ∈ B n × p (1) (cid:88) i (cid:107) u i (cid:107) (cid:107) X i (cid:107) + n − = n − · max i ∈ [ n ] (cid:107) X i (cid:107) + n − . This entails that, with (cid:107) X i (cid:107) following χ ( p ), E (cid:107) T , (cid:107) (cid:46) n − E (cid:0) max i ∈ [ n ] (cid:107) X i (cid:107) (cid:1) + n − (cid:46) ( y log n ) ∨ . By putting together the estimates for T , - T , , we have that E (cid:107)∇ T ( X ) (cid:107) (cid:46) y log n ∨ . (11.1)Next we bound the norm of the gradient. By Lemma 11.1-(1), we have ∂ ( ij ) T ( X ) = 2 nn − X (cid:62) i ( S n − I ) e j − n − (cid:107) X i (cid:107) − X ij ≡ ( T , − T , ) ( ij ) . Then E (cid:107)∇ T ( X ) (cid:107) F = E (cid:107) T , (cid:107) F + E (cid:107) T , (cid:107) F − E tr( T (cid:62) , T , ) ≡ ( I ) + ( II ) − III ) . For ( I ), we have by Lemmas D.4 and D.5,( I ) = (cid:16) nn − (cid:17) · n · E tr (cid:104)(cid:0) S n − I (cid:1) S n (cid:105) = (cid:16) nn − (cid:17) · n · (cid:2) E tr( S n ) − E tr( S n ) + E tr( S n ) (cid:3) = (cid:16) nn − (cid:17) · n · ( py + py + 3 y + y + 4 yn − )= (cid:16) nn − (cid:17) · n (cid:2) n − p + n − p (1 + O ( n ∧ p ) − ) (cid:3) = 4 n ( n − p + 4 p (cid:2) O ( n ∧ p ) − (cid:3) . IGH DIMENSIONAL COVARIANCE TESTING 47

For ( II ), we have( II ) = (cid:16) n − (cid:17) E (cid:88) i,j (cid:0) (cid:107) X i (cid:107) − (cid:1) X i,j = (cid:16) n − (cid:17) · n · E (cid:0) (cid:107) X (cid:107) − (cid:107) X (cid:107) + (cid:107) X (cid:107) (cid:1) = 4 n ( n − p + O ( n − p ) . For (

III ), we have(

III ) = 2 nn − · n − · n · E X (cid:62) ( S n − I ) X ( (cid:107) X (cid:107) − n ( n − (cid:20) n − E (cid:18) X (cid:62) (cid:88) j X j X (cid:62) j X ( (cid:107) X (cid:107) − (cid:19) − E (cid:107) X (cid:107) + E (cid:107) X (cid:107) (cid:21) = 4 n ( n − (cid:20) n − (cid:16) E (cid:107) X (cid:107) − E (cid:107) X (cid:107) + ( n − E (cid:107) X (cid:107) − ( n − E (cid:107) X (cid:107) (cid:17) − E (cid:107) X (cid:107) + E (cid:107) X (cid:107) (cid:21) = 4 n ( n − (cid:0) E (cid:107) X (cid:107) − E (cid:107) X (cid:107) + E (cid:107) X (cid:107) (cid:1) = 4 n ( n − p + O ( n − p ) . Combine the estimates for ( I )-( III ) to yield that E (cid:107)∇ T ( X ) (cid:107) F = 4 p (1 + O ( n ∧ p ) − ). Then by following the same argument as in the proof of The-orem 3.8 and using (11.1), we again arrive at E (cid:107)∇ T ( X ) (cid:107) F = (cid:0) E (cid:107)∇ T ( X ) (cid:107) F (cid:1) + O (cid:0) E (cid:107)∇ T ( X ) (cid:107) (cid:1) = 16 p (cid:0) O ( n ∧ p ) − (cid:1) . The rest of the proof follows from the same lines in the proof of Theorem 3.1,with the help of the variance formula in Proposition 3.12-(3). The normalapproximation error bound then becomes a constant multiple of( y log n ∨ · pp = p log nn ∨ p = log nn (cid:95) p , as desired. (cid:3) Ratio control.

Proof of Proposition 3.12. (1). Recall that Z , . . . , Z n are i.i.d. samplesfrom N (0 , I p ). Let S X ≡ S n = n − (cid:80) ni =1 X i X (cid:62) i and S Z ≡ n − (cid:80) ni =1 Z i Z (cid:62) i .Then for any ( i, j ) ∈ [ n ] × [ p ], Lemma 11.1-(1) implies that (cid:0) T Σ ( Z ) (cid:1) ( ij ) = (cid:88) ¯ j (cid:0) ∇ T ( X ) (cid:1) ( i ¯ j ) (Σ / ) ¯ jj = 2 nn − (cid:88) ¯ j X (cid:62) i ( S X − I ) e ¯ j e (cid:62) ¯ j Σ / e j − n − (cid:107) X i (cid:107) − (cid:88) ¯ j X (cid:62) i e ¯ j e (cid:62) ¯ j Σ / e j = 2 nn − X (cid:62) i ( S X − I )Σ / e j − n − (cid:107) X i (cid:107) − X (cid:62) i Σ / e j = 2 nn − Z (cid:62) i (Σ S Z Σ − Σ) e j − n − (cid:107) Σ / Z i (cid:107) − Z (cid:62) i Σ e j . This entails that (cid:107) T Σ ( Z ) − T I ( Z ) (cid:107) F (cid:46) (cid:88) i,j (cid:104) Z (cid:62) i (Σ S Z Σ − Σ − S Z + I ) e j (cid:105) + n − (cid:88) i,j (cid:104) ( (cid:107) Σ / Z i (cid:107) − Z (cid:62) i Σ e j − ( (cid:107) Z i (cid:107) − Z (cid:62) i e j (cid:105) ≡ V + V . To handle V , note that V = n tr (cid:104) (Σ S Z Σ − Σ − S Z + I ) S Z (cid:105) ≤ n (cid:107) S Z (cid:107) op · (cid:107) (Σ − I ) S Z Σ + S Z (Σ − I ) − (Σ − I ) (cid:107) F (cid:46) n (cid:107) Σ − I (cid:107) F · (cid:107) S Z (cid:107) op (cid:0) (cid:107) S Z (cid:107) ∨ (cid:1) · (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) . Hence by Lemma 7.4, we have E V (cid:46) n (1 ∨ y ) (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F .To handle V , note that n V (cid:46) (cid:88) i,j (cid:104) ( (cid:107) Σ / Z i (cid:107) − Z (cid:62) i Σ e j − ( (cid:107) Z i (cid:107) − Z (cid:62) i Σ e j (cid:105) + (cid:88) i,j (cid:104) ( (cid:107) Z i (cid:107) − Z (cid:62) i Σ e j − Z (cid:62) i e j ) (cid:105) ≡ V , + V , . To handle V , , we have V , = (cid:88) i,j (cid:0) Z (cid:62) i (Σ − I ) Z i (cid:1) · (cid:0) Z (cid:62) i Σ e j ) = (cid:88) i (cid:0) Z (cid:62) i (Σ − I ) Z i (cid:1) · Z (cid:62) i Σ Z i = (cid:88) i tr (cid:0) (Σ − I ) Z i Z (cid:62) i (Σ − I ) Z i Z (cid:62) i Σ Z i Z (cid:62) i (cid:1) ≤ (cid:88) i tr (cid:0) (Σ − I ) Z i Z (cid:62) i (Σ − I ) (cid:1) · max i ∈ [ n ] (cid:107) Z i Z (cid:62) i Σ Z i Z (cid:62) i (cid:107) op ≤ n · (cid:107) Σ − I (cid:107) F · (cid:107) S Z (cid:107) op · (cid:107) Σ (cid:107) · max i ∈ [ n ] (cid:107) Z i (cid:107) , where in the above we repeatedly use the fact that tr( AB ) ≤ tr( A ) (cid:107) B (cid:107) op for any p.s.d. matrices A, B . Hence by Lemma 7.4, we have E V , ≤ n · (cid:107) Σ − I (cid:107) F · (cid:107) Σ (cid:107) · E / (cid:107) S Z (cid:107) · E / max i ∈ [ n ] (cid:107) Z i (cid:107) (cid:46) np (1 ∨ y )(log n ) · (cid:107) Σ (cid:107) (cid:107) Σ − I (cid:107) F . To handle V , , we have V , = (cid:88) i ( (cid:107) Z i (cid:107) − Z (cid:62) i (Σ − I ) Z i (cid:46) n · (cid:107) Σ − I (cid:107) F · (cid:107) S Z (cid:107) op · (cid:0) max i ∈ [ n ] (cid:107) Z i (cid:107) ∨ (cid:1) . Hence the same bound as above implies that E V , (cid:46) np (1 ∨ y )(log n ) (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F . Combining the estimates of E V , and E V , completes the proofof claim (1).(2) and (3). These follow directly from the mean and variance formula(3.11).(4). By (1)-(3), it remains to prove that (cid:113) n (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F n (cid:107) Σ − I (cid:107) F ∨ p ≤ C ( n ∧ p ) / holds for some universal constant C >

0. Using (cid:107) Σ (cid:107) op ≤ (cid:107) Σ − I (cid:107) op + 1 ≤(cid:107) Σ − I (cid:107) F + 1, it suﬃces to prove √ n (cid:107) Σ − I (cid:107) F (cid:87) (cid:113) n (cid:107) Σ − I (cid:107) F n (cid:107) Σ − I (cid:107) F ∨ p ≤ C ( n ∧ p ) / . This is weaker than the proven inequality (10.5). (cid:3)

Completing the proof for power expansion.

Proof of Theorem 3.14.

By Corollary 2.3 and Proposition 3.12, the error ofpower expansion is bounded by a constant multiple of (cid:16) log nn ∨ p (cid:17) + (cid:16) ∨ y ( n ∧ p ) / (cid:17) (cid:95) (cid:16) y / (1 ∨ y ) / (log n ) / ( n ∧ p ) / (cid:17) ≡ (cid:16) log nn ∨ p (cid:17) + ( I ) ∨ ( II ) . Using the condition p/n ≤ M , we have ( I ) (cid:46) M p − / . For ( II ), we have( II ) ≤ p / n − / (log n ) / ≤ n − / (log n ) / when p ≤ n and ( II ) (cid:46) M n − / (log n ) / otherwise. The proof is complete. (cid:3) Proofs for Section 4.1

Evaluation of derivatives.Lemma 12.1.

Recall the form of T LRT ,s ( X ) in (4.2) and the deﬁnition of b ( S ) in (4.3). Then for any ( i, j ) , ( i (cid:48) , j (cid:48) ) ∈ [ N ] × [ p ] ,(1) ∂ ( ij ) T LRT ,s ( X ) = (cid:0) X ( I − S − ) (cid:1) ( ij ) + (cid:0) /b ( S ) − (cid:1) X ij = e (cid:62) j (cid:2) ( I − S − ) X i + (cid:0) /b ( S ) − (cid:1) X i (cid:3) . (2) ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T LRT ,s ( X ) = N − X (cid:62) i S − ( e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) ) S − e j + δ ii (cid:48) e (cid:62) j ( I − S − ) e j (cid:48) + (cid:0) /b ( S ) − (cid:1) δ ii (cid:48) δ jj (cid:48) − (2 /N p ) X ij X i (cid:48) j (cid:48) /b ( S ) .Proof. (1). We shorthand T LRT ,s ( X ) as T . By deﬁnition, (8.1) and (8.3),we have ∂ ( ij ) T ( X ) = N (cid:0) p · ∂ ( ij ) log tr( S ) − ∂ ij log det S (cid:1) = N (cid:18) p ∂ ( ij ) tr( S )tr( S ) − p (cid:88) k,(cid:96) =1 ∂ log det S∂S k(cid:96) ∂S k(cid:96) ∂X ij (cid:19) = N (cid:20) pN X ij tr( S ) − p (cid:88) k,(cid:96) =1 ( S − ) k(cid:96) · N (cid:0) δ kj X i(cid:96) + δ (cid:96)j X ik (cid:1)(cid:21) = p tr( S ) X ij − p (cid:88) k =1 ( S − ) kj X ik = (cid:0) X ( I − S − ) (cid:1) ij + (cid:18) p tr( S ) − (cid:19) X ij . (2). By the previous part, we have ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = ∂ ( i (cid:48) j (cid:48) ) (cid:0) X ( I − S − ) (cid:1) ij + ∂ ( i (cid:48) j (cid:48) ) (cid:18) p tr( S ) − (cid:19) X ij ≡ ( I ) + ( II ) . The ﬁrst term above is already calculated in Lemma 8.1-(2):( I ) = N − X (cid:62) i S − ( e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) ) S − e j + δ ii (cid:48) e (cid:62) j ( I − S − ) e j (cid:48) . So we only need to evaluate the second term:( II ) = p · ∂ ( i (cid:48) j (cid:48) ) tr − ( S ) · X ij + (cid:16) p tr( S ) − (cid:17) ∂ ( i (cid:48) j (cid:48) ) X ij = − p · ∂ ( i (cid:48) j (cid:48) ) tr( S ) · X ij · tr − ( S ) + (cid:16) p tr( S ) − (cid:17) δ ii (cid:48) δ jj (cid:48) = − pN X ij X i (cid:48) j (cid:48) · tr − ( S ) + (cid:16) p tr( S ) − (cid:17) δ ii (cid:48) δ jj (cid:48) . The proof is complete. (cid:3)

Normal approximation.

Proof of Theorem 4.1.

We abbreviate T LRT ,s ( X ) as T . First we bound thenorm for the gradient. Comparing Lemmas 8.1-(1) and 12.1-(1), we onlyneed to control E (cid:107) (cid:0) b − ( S ) − (cid:1) X (cid:107) F = E (cid:0) N (cid:0) b − ( S ) − (cid:1) tr( S ) (cid:1) ≤ N p · E / b ( S ) · E / (cid:0) b − ( S ) − (cid:1) (cid:46) N p · (cid:16) pN (cid:17) = p . IGH DIMENSIONAL COVARIANCE TESTING 51

The inequality in the ﬁnal line of the above display follows as E b ( S ) ≤ E (cid:107) S (cid:107) (cid:46) , (12.1) E (cid:0) b − ( S ) − (cid:1) = E / b − ( S ) · E / (cid:0) b ( S ) − (cid:1)

16 ( ∗ ) (cid:46) ( pN − ) . (12.2)Here ( ∗ ) follows from Lemma D.4-(3). Now by combining with (8.7) derivedin the proof of Theorem 3.1, we see that E (cid:107)∇ T ( X ) (cid:107) F (cid:46) p .Next we bound the spectral norm of the Hessian. Comparing Lemmas8.1-(1) and 12.1-(1), we only need to control the spectral norms of T and T , where ( T ) ( ij ) , ( i (cid:48) j (cid:48) ) ≡ (cid:0) b − ( S ) − (cid:1) δ ii (cid:48) δ jj (cid:48) , ( T ) ( ij ) , ( i (cid:48) j (cid:48) ) ≡ − N p X ij X i (cid:48) j (cid:48) · b − ( S ) . For T , clearly (cid:107) T (cid:107) op = | /b ( S ) − | , so E (cid:107) T (cid:107) = E (cid:0) /b ( S ) − (cid:1) (cid:46) ( p/N ) by (12.1). For T , note that (cid:107) T (cid:107) op = 2 N p · b ( S ) sup u,v ∈ B N × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ( ij ) , ( i (cid:48) j (cid:48) ) u ij X ij X i (cid:48) j (cid:48) v i (cid:48) j (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = 2 N p · b ( S ) (cid:107) X (cid:107) F = 2 b ( S ) . So E (cid:107) T (cid:107) (cid:46) E b − ( S ) = O (1) by Lemma D.4-(3). By combining with (8.11)derived in the proof of Theorem 3.1, we see that E (cid:107)∇ T ( X ) (cid:107) = O (1). Therest of the proof proceeds along the lines in the proof of Theorem 3.1, withthe help of the variance formula in Proposition 4.2-(3). (cid:3) Ratio control.

Proof of Proposition 4.2.

We abbreviate ( T LRT ,s , m Σ;LRT ,s , σ Σ;LRT ,s , V Σ;LRT ,s as ( T, m Σ; s , σ Σ; s , V Σ; s ), and assume without loss of generality that b (Σ) =tr(Σ) /p = 1 (otherwise we may replace Σ by Σ · b − (Σ)).(1). By Lemma 12.1, with S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i , we have T Σ; s = (cid:20) Z Σ / ( I − Σ − / S − Z Σ − / ) + (cid:18) b (Σ / S Z Σ / ) − (cid:19) Z Σ / (cid:21) Σ / = Z (Σ − S − Z ) + (cid:18) b (Σ / S Z Σ / ) − (cid:19) Z Σ= Z Σ b (Σ / S Z Σ / ) − ZS − Z . Hence V s = E (cid:107) T Σ; s − T I ; s (cid:107) F = E (cid:13)(cid:13)(cid:13)(cid:13) Z Σ b (Σ / S Z Σ / ) − Zb ( S Z ) (cid:13)(cid:13)(cid:13)(cid:13) F ≤ (cid:26) E (cid:20)(cid:18) b (Σ / S Z Σ / ) − b ( S Z ) (cid:19) (cid:107) Z Σ (cid:107) F (cid:21) + E (cid:2) b − ( S Z ) (cid:107) Z (Σ − I ) (cid:107) F (cid:3)(cid:27) ≡ E (cid:0) ( I ) + ( II ) (cid:1) . We bound ( I ) and ( II ) separately:( I ) = b − (Σ / S Z Σ / ) b − ( S Z ) b (cid:0) (Σ − I ) S Z (cid:1) (cid:107) Z Σ (cid:107) F ≤ b − (Σ / S Z Σ / ) b − ( S Z ) (cid:107) S Z (cid:107) · (cid:0) (cid:107) Σ (cid:107) F /p (cid:1) (cid:107) Z (cid:107) (cid:107) Σ − I (cid:107) F ;( II ) ≤ b − ( S Z ) · (cid:107) Z (cid:107) (cid:107) Σ − I (cid:107) F . Using Lemmas D.2 and 7.4, we have V s (cid:46) ( p − (cid:107) Σ − I (cid:107) F + 1) N (cid:107) (Σ − I ) (cid:107) F . On the other hand, a trivial bound for V s is V s = E (cid:13)(cid:13)(cid:13)(cid:13) Z Σ b (Σ / S Z Σ / ) − Zb ( S Z ) (cid:13)(cid:13)(cid:13)(cid:13) F (cid:46) E b − (Σ / S Z Σ / ) (cid:107) Z Σ (cid:107) F + E b − ( S Z ) (cid:107) Z (cid:107) F (cid:46) N (cid:0) (cid:107) Σ − I (cid:107) F ∨ p (cid:1) . Collecting the two bounds, we have V s (cid:46) (cid:2)(cid:0) p − (cid:107) Σ − I (cid:107) F + 1 (cid:1) N (cid:107) (Σ − I ) (cid:107) F (cid:3) (cid:94) N (cid:0) (cid:107) Σ − I (cid:107) F ∨ p (cid:1) (cid:16) N (cid:107) (Σ − I ) (cid:107) F . (2). As m Σ; s = N (cid:104) p · E log tr(Σ S Z ) − log det(Σ) − p log p − E log det( S Z ) (cid:105) , by Lemma D.3 we have m Σ; s − m I ; s = N (cid:2) − log det(Σ) + Q s (Σ) (cid:3) , where | Q s (Σ) | ≡ | p (cid:0) E log tr(Σ S Z ) − E log tr( S Z ) (cid:1) | (cid:46) N − (cid:110) b (Σ ) + e − cN (cid:2) b / (Σ ) (cid:3)(cid:111) (cid:46) N − (cid:104) b (Σ ) (cid:105) (cid:46) N − b (Σ ) , where the last inequality follows as b (Σ ) = p − (cid:80) pj =1 λ j ≥ p − ( (cid:80) pj =1 λ j ) =1.(3). Recall T LRT deﬁned in (3.4). Deﬁne∆( X ) ≡ T LRT ( X ) − T LRT ,s ( X ) . IGH DIMENSIONAL COVARIANCE TESTING 53

Then for any ε >

0, there exists some C ε > X , . . . , X n are i.i.d. N (0 , I p )), (cid:2) (1 − ε ) σ I ;LRT − C ε Var I (∆) (cid:3) + ≤ σ I ;LRT ,s ≤ (1 + ε ) σ I ;LRT + C ε Var I (∆) . (12.3)We will now bound Var I (∆). By Lemmas 8.1-(1) and 12.1-(1), we have forany i, j ∈ [ N ] × [ p ] ∂ ( ij ) ∆( X ) = ∂ ( ij ) T LRT ( X ) − ∂ ( ij ) T LRT ,s ( X ) = X ij (cid:2) b − ( S ) − (cid:3) . By the Gaussian-Poincar´e inequality [BLM13, Theorem 3.20],Var I ∆( X ) ≤ E (cid:2) b − ( S ) − (cid:3) (cid:107) X (cid:107) F = N p E (cid:2) b ( S ) − (cid:3) b − ( S ) ≤ N p · E / (cid:0) b ( S ) − (cid:1) · E / b − ( S ) ( ∗ ) (cid:46) N p · ( N p ) − = 1 . Here ( ∗ ) follows from Lemma D.4-(3). Hence by choosing ε in (12.3) to bedecaying to 0 slowly enough, σ I ;LRT and σ I ;LRT ,s share the same asymptoticformula in Proposition 3.2-(3).(4). By (1)-(2), and using that b (Σ ) = (cid:107) Σ (cid:107) F /p , we only need to prove thatfor a given constant C >

0, there exists some constant C = C ( C ) > (cid:113) N (cid:107) Σ − I (cid:107) F (cid:16) − N log det(Σ) − C (1 + (cid:107) Σ (cid:107) F p ) − C e − cN (cid:0) (cid:107) Σ (cid:107) F p / + 1 (cid:1)(cid:17) + (cid:87) σ I ; s ≤ C (cid:0) σ I ; s ∧ N (cid:1) / . Equivalently, with λ = ( λ , . . . , λ p ) ∈ (0 , ∞ ) p and ¯ λ ≡ p − (cid:80) j λ j = 1, weonly need to show (cid:113) N (cid:80) j ( λ j − (cid:16) N (cid:80) j − log(1 + ( λ j − − C − C (cid:0) (cid:80) j λ j p (cid:1) − C e − cN ( (cid:80) j λ j ) / p / (cid:17) + (cid:87) σ I ; s is at most a multiple of (cid:0) σ I ; s ∧ N (cid:1) − / . Let J ≡ { j : | λ j − | ≤ } and J c ≡ { j : | λ j − | > } . As | λ j − | (cid:46) p , so the ﬁrst term in the denominatorbecomes N (cid:88) j (cid:2) − log(1 + ( λ j − λ j − (cid:3) − C − C (cid:80) j λ j p − C e − cN ( (cid:80) j λ j ) / p / (cid:38) N (cid:88) j ( λ j − ∧ | λ j − | − C p − (cid:88) j ( λ j − − C . Next, by breaking the summation in (cid:80) j ( λ j − into J and J c , the abovedisplay equals N (cid:88) j ∈ J ( λ j − + N (cid:88) j ∈ J | λ j − | − C (cid:80) j ∈ J ( λ j − + (cid:80) j ∈ J c ( λ j − p − C ≥ ( N − C p − ) (cid:88) j ∈ J ( λ j − + ( N − O (1)) (cid:88) j ∈ J c | λ j − | − C ≥ N (cid:88) j ( λ j − ∧ | λ j − | − C for N and p large enough. Hence with ν j ≡ | λ j − | , we only need to showthat for given C > (cid:113) N (cid:80) j ∈ J ν j (cid:87) (cid:113) N (cid:80) j ∈ J c ν j (cid:16) N (cid:80) j ∈ J ν j + N (cid:80) j ∈ J c ν j − C (cid:17) + (cid:87) σ I ; s ≤ C (cid:0) σ I ; s ∧ N (cid:1) / . Equivalently, we only need to show (cid:113) N (cid:80) j ∈ J ν j (cid:0) N (cid:80) j ∈ J ν j − C (cid:1) + (cid:87) σ I ; s ≤ Cσ / I ; s , (12.4) (cid:113) N (cid:80) j ∈ J c ν j (cid:0) N (cid:80) j ∈ J c ν j − C (cid:1) + (cid:87) σ I ; s ≤ CN / . (12.5)To see these inequalities, note that the left side of (12.4) is bounded by N (cid:80) j ∈ J ν j ≤ C (2 C ) / σ I ; s + N (cid:80) j ∈ J ν j > C (cid:113) N (cid:80) j ∈ J ν j ( N/ (cid:80) j ∈ J ν j (cid:87) σ I ; s (cid:46) σ I ; s + 1inf x ≥ (cid:0) x ∨ σ I ; s x (cid:1) (cid:46) σ − / I ; s . Also, the left side of (12.5) is bounded by √ N (cid:80) j ∈ J c ν j (cid:0) N (cid:80) j ∈ J c ν j − C (cid:1) + (cid:87) σ I ; s ≤ N (cid:80) j ∈ Jc ν j ≤ C (2 C ) / √ N σ I ; s + N (cid:80) j ∈ Jc ν j > C √ N (cid:80) j ∈ J c ν j N (cid:80) j ∈ J c ν j (cid:87) σ I ; s (cid:46) √ N σ I ; s + 1 √ N (cid:46) N / , proving the claim. (cid:3) IGH DIMENSIONAL COVARIANCE TESTING 55

Completing the proof for power expansion.

Proof of Theorem 4.3.

The proof is similar to that of Theorem 3.10, weprovide some details for the convenience of the reader. Without loss ofgenerality we assume b (Σ) = 1. Abbreviate Ψ LRT ,s by Ψ and Q LRT ,s (Σ) by Q (Σ). By Theorem 4.1 and Proposition 4.2, we have (cid:12)(cid:12)(cid:12)(cid:12) E Σ Ψ( X ) − P (cid:18) N (cid:18) N · (cid:0) − log det (cid:0) Σ (cid:1) + Q (Σ) (cid:1) σ I ; s , (cid:19) > z α (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ C · p − / . We only need to remove the residual term Q (Σ). To this end, we claim that | ∆ P | ≤ C α (cid:104) N Q (Σ) σ I ; s (cid:94) Q (Σ) | log det (cid:0) Σ (cid:1) | (cid:105) . (12.6)where ∆ P ≡ P (cid:18) N (cid:18) N · (cid:0) − log det (cid:0) Σ (cid:1) + Q (Σ) (cid:1) σ I ; s , (cid:19) > z α (cid:19) − P (cid:18) N (cid:18) − N log det (cid:0) Σ (cid:1) σ I ; s , (cid:19) > z α (cid:19) . Here the ﬁrst bound in (12.6) is by anti-concentration of the normal distri-bution, and the second bound in (12.6) follows from Lemma 2.4.Let { λ j } pj =1 be the eigenvalues of Σ so that (cid:80) pj =1 λ j = p . Then by (4.4), Q (Σ) (cid:46) N p ) − (cid:80) pj =1 λ j . Hence using the bound σ I ; s ≥ cp , (12.6) entailsthat | ∆ P | ≤ C (cid:48) α · (cid:104) p − (cid:80) pj =1 λ j p (cid:94) ( N p ) − (cid:80) pj =1 λ j (cid:80) pj =1 λ j − log λ j − (cid:105) . (12.7)If max j λ j ≤

10, we use the ﬁrst bound in (12.7) to conclude that ∆ P (cid:46) α p − . Otherwise, by writing J ≡ { j ∈ [ p ] : | λ j − | ≥ } and J c ≡ [ p ] \ J , thesecond bound in (12.7) yields that | ∆ P | (cid:46) α ( N p ) − (cid:80) pj =1 λ j (cid:80) pj =1 | λ j − | ∧ ( λ j − (cid:46) ( N p ) − (cid:80) j ∈ J ( λ j − + ( N p ) − ( | J | + | J c | ) (cid:80) j ∈ J | λ j − |≤ ( N p ) − (cid:80) j ∈ J ( λ j − (cid:80) j ∈ J | λ j − | + N − (cid:80) j ∈ J | λ j − |≡ ( I ) + ( II ) . Now ( II ) (cid:46) N − as max j λ j >

10, and ( I ) satisﬁes( I ) ≤ ( N p ) − max j ∈ J | λ j − | (cid:46) N − by using the trivial bound that max j λ j ≤ p due to the normalization b (Σ) =1. The proof is complete. (cid:3) Proofs for Section 4.2

Evaluation of derivatives.Lemma 13.1.

Recall the form of T J ( X ) in (4.5) and the deﬁnition of b (cid:96) ( S ) in (4.3). Then the following hold:(1) For the ﬁrst-order partial derivatives: for any ( i, j ) ∈ [ N ] × [ p ] , ∂ ( ij ) T J ( X ) = (cid:16) XSb ( S ) − X · b ( S ) b ( S ) (cid:17) ij = X (cid:62) i Se j b ( S ) − X ij b ( S ) b ( S ) . (2) For the second-order partial derivatives: for any ( i, j ) , ( i (cid:48) , j (cid:48) ) ∈ [ N ] × [ p ] , ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T J ( X )= b ( S ) − (cid:0) N − δ jj (cid:48) X (cid:62) i X i (cid:48) + N − X i (cid:48) j X ij (cid:48) + δ ii (cid:48) S jj (cid:48) (cid:1) − δ ii (cid:48) δ jj (cid:48) b ( S ) b ( S )+ X ij X i (cid:48) j (cid:48) b ( S ) b ( S ) N p − b ( S ) N p (cid:104) X (cid:62) i Se j · X i (cid:48) j (cid:48) + X (cid:62) i (cid:48) Se j (cid:48) · X ij (cid:105) . Proof.

We abbreviate T J ( X ) by T ( X ) and write b = b ( S ) in the proof if noconfusion could arise.(1). Note that ∂ ij S ( X ) = N − ( e j X (cid:62) i + X i e (cid:62) j ), ∂ ( ij ) tr( S ) = 2 N − X ij and ∂ ( ij ) b ( S ) = 2 N p X ij ,∂ ( ij ) b ( S ) = 2 N p tr (cid:0) S ( e j X (cid:62) i + X i e (cid:62) j ) (cid:1) = 4 N p X (cid:62) i Se j . (13.1)For the ﬁrst-order derivatives we have ∂ ( ij ) T ( X ) = N (cid:20) (cid:18) Sb − I (cid:19) ∂ ( ij ) (cid:18) Sb (cid:19)(cid:21) = N (cid:20)(cid:18) Sb − I (cid:19) · N − ( e j X (cid:62) i + X i e (cid:62) j ) b − N p ) − SX ij b (cid:21) = 12 b tr (cid:2) ( S − bI )( e j X (cid:62) i + X i e (cid:62) j ) (cid:3) − X ij b p tr (cid:2) ( S − bI ) S (cid:3) = ( XS ) ij b − X ij b − (cid:20) X ij b b − X ij b (cid:21) = ( XS ) ij b − X ij · b b ≡ T , ( ij ) ( X ) − T , ( ij ) ( X ) . (2). For the second-order derivatives, ∂ ( i (cid:48) j (cid:48) ) T , ( ij ) ( X ) = ∂ ( i (cid:48) j (cid:48) ) ( X (cid:62) i Se j ) b − ( X (cid:62) i Se j ) ∂ ( i (cid:48) j (cid:48) ) b b = δ ii (cid:48) e (cid:62) j (cid:48) Se j + N − X (cid:62) i ( e j (cid:48) X (cid:62) i (cid:48) + X i (cid:48) e (cid:62) j (cid:48) ) e j b − pN X (cid:62) i Se j · X i (cid:48) j (cid:48) b = N − δ jj (cid:48) X (cid:62) i X i (cid:48) + N − X i (cid:48) j X ij (cid:48) + δ ii (cid:48) S jj (cid:48) b − pN X (cid:62) i Se j · X i (cid:48) j (cid:48) b , IGH DIMENSIONAL COVARIANCE TESTING 57 ∂ ( i (cid:48) j (cid:48) ) T , ( ij ) ( X ) = δ ii (cid:48) δ jj (cid:48) b b + X ij · ∂ ( i (cid:48) j (cid:48) ) (cid:20) b b (cid:21) = δ ii (cid:48) δ jj (cid:48) b b + X ij · (cid:20) X (cid:62) i (cid:48) Se j (cid:48) b N p − b X i (cid:48) j (cid:48) b N p (cid:21) = δ ii (cid:48) δ jj (cid:48) b b − X ij X i (cid:48) j (cid:48) b b N p + 4 X (cid:62) i (cid:48) Se j (cid:48) · X ij b N p .

Combining the above two displays, we have ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = b − (cid:0) N − δ jj (cid:48) X (cid:62) i X i (cid:48) + N − X i (cid:48) j X ij (cid:48) + δ ii (cid:48) S jj (cid:48) (cid:1) − δ ii (cid:48) δ jj (cid:48) b b + X ij X i (cid:48) j (cid:48) b b N p − b N p (cid:104) X (cid:62) i Se j · X i (cid:48) j (cid:48) + X (cid:62) i (cid:48) Se j (cid:48) · X ij (cid:105) . The proof is complete. (cid:3)

Normal approximation.

Proof of Theorem 4.4.

We abbreviate T J by T and write b = b ( S ) in theproof if no confusion could arise. First we bound the operator norm of theHessian. By Lemma 13.1-(2), ∂ ( ij ) , ( i (cid:48) j (cid:48) ) T ( X ) = b − (cid:0) N − δ jj (cid:48) X (cid:62) i X i (cid:48) + N − X i (cid:48) j X ij (cid:48) + δ ii (cid:48) S jj (cid:48) (cid:1) − δ ii (cid:48) δ jj (cid:48) b b + X ij X i (cid:48) j (cid:48) b b N p − b N p (cid:104) X (cid:62) i Se j · X i (cid:48) j (cid:48) + X (cid:62) i (cid:48) Se j (cid:48) · X ij (cid:105) ≡ ( T − T + T − T ) ( ij ) , ( i (cid:48) j (cid:48) ) . Following the proof of Theorem 3.8 along with Lemma D.2, we have E (cid:107) T (cid:107) (cid:46) (1 ∨ y ) . Next for T , we have by Lemma D.2 and Lemma 7.4 that E (cid:107) T (cid:107) (cid:46) E ( b · b − ) ≤ E / b · E / b − (cid:46) E / (cid:107) S (cid:107) (cid:46) (1 ∨ y ) . The operator norm of T can be similarly bounded by E (cid:107) T (cid:107) = 6 ( N p ) E (cid:104)(cid:16) b b (cid:17) (cid:107) X (cid:107) F (cid:105) (cid:46) ( N p ) − E / b · E / b − E / (cid:107) X (cid:107) F (cid:46) ( N p ) − · E / b · ( N p ) (cid:46) (1 ∨ y ) . Lastly, (cid:107) T (cid:107) op (cid:46) b N p · sup u,v ∈ B N × p (1) (cid:12)(cid:12)(cid:12)(cid:12) (cid:88) ( ij ) , ( i (cid:48) j (cid:48) ) X (cid:62) i Se j · X i (cid:48) j (cid:48) u ij v i (cid:48) j (cid:48) (cid:12)(cid:12)(cid:12)(cid:12) = 1 b N p · sup u,v ∈ B N × p (1) (cid:12)(cid:12)(cid:12)(cid:12)(cid:16) (cid:88) i,j X (cid:62) i Se j u ij (cid:17)(cid:16) (cid:88) i (cid:48) j (cid:48) X i (cid:48) j (cid:48) v i (cid:48) j (cid:48) (cid:17)(cid:12)(cid:12)(cid:12)(cid:12) ≤ b N p · (cid:107) XS (cid:107) F · (cid:107) X (cid:107) F ≤ b N p · (cid:107) S (cid:107) op (cid:107) X (cid:107) F . Hence by Lemma 7.4 and Lemma D.2, E (cid:107) T ( X ) (cid:107) (cid:46) (1 ∨ y ) . Puttingtogether the bounds for T - T yields that E (cid:107)∇ T ( X ) (cid:107) (cid:46) (1 ∨ y ) . Next we bound the norm of the gradient. We will show that E (cid:107)∇ T ( X ) (cid:107) F (cid:46) p by considering the two cases p/N ≤ p/N > Case p/N ≤

1) By Lemma 13.1-(1), we may write ∇ T ( X ) = b − X (cid:0) b − S − I (cid:1) − b − X · b (cid:0) b − S − I (cid:1) , so (cid:107)∇ T ( X ) (cid:107) F (cid:46) b − (cid:107) X (cid:107) F (cid:107) S − bI (cid:107) + b − (cid:107) X (cid:107) F (cid:107) S − bI (cid:107) (cid:46) (cid:107) X (cid:107) F (cid:0) b − (cid:107) S − I (cid:107) + b − | b − | + b − (cid:107) S − I (cid:107) + b − | b − | (cid:1) . By Lemma 7.4 and Lemma D.4, it holds under the condition p/N ≤ E (cid:107) T ( X ) (cid:107) F (cid:46) ( N p ) (cid:0) ( N − p ) + ( N − p ) (cid:1) (cid:46) p . ( Case p/N >

1) By Lemma 13.1-(1), we have E (cid:107)∇ T ( X ) (cid:107) F = E (cid:20)(cid:13)(cid:13)(cid:13)(cid:13) XSb (cid:13)(cid:13)(cid:13)(cid:13) F + (cid:13)(cid:13)(cid:13)(cid:13) Xb b (cid:13)(cid:13)(cid:13)(cid:13) F − (cid:28) XSb , Xb b (cid:29) (cid:21) = N p · E (cid:20) bb − b b (cid:21) = N p · E ( bb − b ) + N p · E ( bb − b )( b − − ≡ ( I ) + ( II ) . To handle ( I ), it holds by Lemma D.5-(4)(5) that under p > N ,( I ) = Np E (cid:2) tr( S ) tr( S ) − tr ( S ) (cid:3) = Np N − O ( N p ) = O ( p ) . To handle ( II ), it holds by Lemmas D.2, D.4-(3), and D.5-(6) that under p > N ,( II ) = N p · E ( bb − b ) b − (1 − b ) ≤ N p · E / ( bb − b ) E / b − E / ( b − ≤ N p · (cid:16) | E ( bb − b ) | + Var / ( bb − b ) (cid:17) · E / b − E / ( b − = N p · O ( N − p ) · O (1) · O (( N p ) − / ) = o ( p ) . Putting together the estimates for ( I ) and ( II ) yield that E (cid:107)∇ T ( X ) (cid:107) F = O ( p ) under the considered case p > N . The rest of the proof proceeds alongthe lines in the proof of Theorem 3.8, with the help of the variance formulain Proposition 4.5-(3). The normal approximation error bound becomes aconstant multiple of (1 ∨ y ) · pp = pn ∨ p = 1 n ∧ p , as desired. (cid:3) Ratio control.

Proof of Proposition 4.5.

We assume without loss of generality that b (Σ) =tr(Σ) /p = 1 (otherwise we replace Σ by Σ · b − (Σ)).(1). By Lemma 13.1, with S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i , we have T Σ;J = (cid:26) Z Σ / Σ / S Z Σ / b (Σ / S Z Σ / ) − Z Σ / b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) (cid:27) Σ / = Z Σ S Z Σ b (Σ / S Z Σ / ) − Z Σ b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) , so T Σ;J − T I ;J = (cid:26) Z Σ S Z Σ b (Σ / S Z Σ / ) − ZS Z b ( S Z ) (cid:27) − (cid:26) Z Σ b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) − Z b ( S Z ) b ( S Z ) (cid:27) = (cid:26) Z Σ S Z Σ b (Σ / S Z Σ / ) − ZS Z b ( S Z ) (cid:27) − Z (cid:26) b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) − b ( S Z ) b ( S Z ) (cid:27) − ( Z Σ − Z ) · b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) ≡ V ( Z ) + V ( Z ) + V ( Z ) . We will handle the Frobenius norms of V ( Z ) , V ( Z ) , V ( Z ) separately below.For V ( Z ), (cid:107) V ( Z ) (cid:107) F (cid:46) (cid:13)(cid:13)(cid:13)(cid:13) Z Σ S Z Σ b (Σ / S Z Σ / ) − ZS Z b (Σ / S Z Σ / ) (cid:13)(cid:13)(cid:13)(cid:13) F + (cid:13)(cid:13)(cid:13)(cid:13) ZS Z b (Σ / S Z Σ / ) − ZS Z b ( S Z ) (cid:13)(cid:13)(cid:13)(cid:13) F = (cid:107) Z Σ S Z Σ − ZS Z (cid:107) F · b (Σ / S Z Σ / )+ (cid:107) ZS Z (cid:107) F · (cid:20) b (Σ / S Z Σ / ) − b ( S Z ) b (Σ / S Z Σ / ) b ( S Z ) (cid:21) ≡ V , + V , . Note that V , (cid:46) b − (Σ / S Z Σ / ) (cid:0) (cid:107) Z Σ S Z (Σ − I ) (cid:107) F + (cid:107) Z (Σ − I ) S Z (cid:107) F (cid:1) (cid:46) (cid:104) b − (Σ / S Z Σ / ) · (cid:107) S Z (cid:107) (cid:105) · (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) · (cid:107) Z (cid:107) (cid:107) Σ − I (cid:107) F ,V , ≤ (cid:107) S Z (cid:107) (cid:107) Z (cid:107) F b − (Σ / S Z Σ / ) b − ( S Z ) × (cid:0) tr((Σ − I ) S Z ) /p (cid:1) (cid:0) b (Σ / S Z Σ / ) ∨ b ( S Z ) (cid:1) (cid:46) (cid:104) (cid:107) S Z (cid:107) b − (Σ / S Z Σ / ) b − ( S Z ) (cid:0) b (Σ / S Z Σ / ) ∨ b ( S Z ) (cid:1)(cid:105) × p − (cid:107) Z (cid:107) F (cid:107) Σ − I (cid:107) F . So under p/N ≤ M , by Lemma D.2 and Lemma 7.4, we have E (cid:107) V ( Z ) (cid:107) F (cid:46) M N (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F . For V ( Z ), (cid:107) V ( Z ) (cid:107) F = (cid:107) Z (cid:107) F (cid:16) b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) − b ( S Z ) b ( S Z ) (cid:17) (cid:46) (cid:107) Z (cid:107) F (cid:26)(cid:16) b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) − b ( S Z ) b (Σ / S Z Σ / ) (cid:17) + (cid:16) b ( S Z ) b (Σ / S Z Σ / ) − b ( S Z ) b ( S Z ) (cid:17) (cid:27) ≡ V , + V , . Note that (cid:0) b (Σ / S Z Σ / ) − b ( S Z ) (cid:1) = p − tr (cid:0) S Z Σ S Z Σ − S Z (cid:1) (cid:46) p − (cid:110) tr (cid:0) S Z (Σ − I ) S Z Σ (cid:1) + tr (cid:0) S Z (Σ − I ) (cid:1)(cid:111) (cid:46) p − (cid:107) S Z (cid:107) (cid:0) (cid:107) Σ (cid:107) F /p + 1 (cid:1) (cid:107) Σ − I (cid:107) F (13.2) (cid:46) p − (cid:107) S Z (cid:107) (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F , so V , (cid:46) (cid:104) b − (Σ / S Z Σ / ) (cid:107) S Z (cid:107) (cid:105) · (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) · (cid:0) p − (cid:107) Z (cid:107) F (cid:1) · (cid:107) Σ − I (cid:107) F ,V , ≤ (cid:107) Z (cid:107) F b − (Σ / S Z Σ / ) b − ( S Z ) b ( S Z ) (cid:0) b (Σ / S Z Σ / ) − b ( S Z ) (cid:1) × (cid:16) b (Σ / S Z Σ / ) + b (Σ / S Z Σ / ) b ( S Z ) + b ( S Z ) (cid:17) (cid:46) (cid:104) b − (Σ / S Z Σ / ) b − ( S Z ) (cid:0) b (Σ / S Z Σ / ) ∨ b ( S Z ) (cid:1) (cid:107) S Z (cid:107) (cid:105) × p − (cid:107) Z (cid:107) F (cid:107) Σ − I (cid:107) F . Hence under p/N ≤ M , by Lemma D.2 and Lemma 7.4, we have E (cid:107) V ( Z ) (cid:107) F (cid:46) M N (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F . Lastly, recall that tr(Σ) = p so using trace H¨older inequality we havetr( S Z Σ S Z Σ) ≤ tr(Σ) (cid:107) S Z Σ S Z (cid:107) op ≤ p (cid:107) S Z (cid:107) (cid:107) Σ (cid:107) op , so V ( Z ) satisﬁes (cid:107) V ( Z ) (cid:107) F ≤ p − (cid:107) Σ − I (cid:107) F · (cid:107) Z (cid:107) · b − (Σ S Z ) · tr ( S Z Σ S Z Σ) ≤ (cid:104) b − (Σ / S Z Σ / ) (cid:107) S Z (cid:107) (cid:105) · (cid:107) Σ (cid:107) · (cid:107) Z (cid:107) (cid:107) Σ − I (cid:107) F Hence under p/N ≤ M , by Lemma D.2 and Lemma 7.4, we have E (cid:107) V ( Z ) (cid:107) F (cid:46) M N (cid:0) (cid:107) Σ (cid:107) ∨ (cid:1) (cid:107) Σ − I (cid:107) F . Combining the estimates proves the claim.(2). Recall the normalization b (Σ) = 1. Note that E (cid:20) b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) (cid:21) = E b (Σ / S Z Σ / ) E b (Σ / S Z Σ / ) + E (cid:20) b (Σ / S Z Σ / ) (cid:18) b (Σ / S Z Σ / ) − E b (Σ / S Z Σ / ) (cid:19)(cid:21) ( ∗ ) = (1 + N − ) b (Σ) b (Σ) + pN ) / ( N p ) + E (cid:20) b (Σ / S Z Σ / ) (cid:18) b (Σ / S Z Σ / ) − E b (Σ / S Z Σ / ) (cid:19)(cid:21) = b (Σ) b (Σ) + pN + (cid:26) E (cid:20) b (Σ / S Z Σ / ) (cid:18) b (Σ / S Z Σ / ) − E b (Σ / S Z Σ / ) (cid:19)(cid:21) + (cid:20) (1 + N − ) b (Σ) b (Σ) + pN (cid:21)(cid:18)

11 + 2 tr(Σ ) · ( N p ) − − (cid:19) + N − b (Σ) b (Σ) (cid:27) IGH DIMENSIONAL COVARIANCE TESTING 61 ≡ b (Σ) b (Σ) + pN + R (Σ) . Here we use Lemma D.4-(1) in ( ∗ ) and R (Σ) = E (cid:20) b (Σ / S Z Σ / ) (cid:18) b (Σ / S Z Σ / ) − E b (Σ / S Z Σ / ) (cid:19)(cid:21) + (cid:20) (1 + N − ) b (Σ) b (Σ) + pN (cid:21)(cid:18)

11 + 2 tr(Σ ) · ( N p ) − − (cid:19) + N − b (Σ) b (Σ) ≡ R (Σ) + R (Σ) + R (Σ) . As m Σ;J = N E tr (cid:16) Σ / S Z Σ / b (Σ / S Z Σ / ) − I (cid:17) = N p (cid:26) E (cid:20) b (Σ / S Z Σ / ) b (Σ S Z ) (cid:21) − (cid:27) , we have m Σ;J − m I ;J = N p (cid:0) p − (cid:107) Σ − I (cid:107) F + R (Σ) − R ( I ) (cid:1) Now we handle R (cid:96) (Σ) − R (cid:96) ( I ) for (cid:96) = 1 , , (cid:96) = 1, (cid:12)(cid:12) R (Σ) − R ( I ) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) b (Σ / S Z Σ / ) (cid:18) b (Σ / S Z Σ / ) − E b (Σ / S Z Σ / ) (cid:19) − b ( S Z ) (cid:18) b ( S Z ) − E b ( S Z ) (cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤ (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20)(cid:0) b (Σ / S Z Σ / ) − b ( S Z ) (cid:1)(cid:18) b (Σ / S Z Σ / ) − E b (Σ / S Z Σ / ) (cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) b ( S Z ) (cid:18)(cid:18) b (Σ / S Z Σ / ) − b ( S Z ) (cid:19) − (cid:18) E b (Σ / S Z Σ / ) − E b ( S Z ) (cid:19)(cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≡ R , + R , . The term R , can be handled as follows: by (13.2) Lemmas 7.4, D.2, andD.4, under p/N ≤ M , R , (cid:46) E / (cid:0) b (Σ / S Z Σ / ) − b ( S Z ) (cid:1) · E / b − (Σ / S Σ / ) · Var / (cid:0) b (Σ / S Z Σ / ) (cid:1) · (cid:0) E b (Σ / S Σ / ) (cid:1) − (cid:46) M p − · (cid:104) p − / (cid:0) p − / (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F (cid:105) · Var / (cid:0) tr (Σ S Z ) (cid:1) (cid:46) M ( N / p ) − (cid:0) p − (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F . For R , , we have by Lemmas 7.4 and D.4 that, under p/N ≤ M , R , = (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) b ( S Z ) (cid:18) b ( S Z ) − b (Σ / S Z Σ / ) b (Σ / S Z Σ / ) b ( S Z ) − E b ( S Z ) − E b (Σ / S Z Σ / ) E b (Σ / S Z Σ / ) E b ( S Z ) (cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤ E b ( S Z ) b − (Σ / S Z Σ / ) b − ( S Z ) × (cid:12)(cid:12) b ( S Z ) − b (Σ / S Z Σ / ) − E (cid:0) b ( S Z ) − b (Σ / S Z Σ / ) (cid:1)(cid:12)(cid:12) + (cid:12)(cid:12) E (cid:0) b ( S Z ) − b (Σ / S Z Σ / ) (cid:1)(cid:12)(cid:12) · E b ( S Z ) × (cid:12)(cid:12)(cid:12)(cid:12) b (Σ / S Z Σ / ) b ( S Z ) − E b (Σ / S Z Σ / ) E b ( S Z ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:46) M Var / (cid:0) b ( S Z ) − b (Σ / S Z Σ / ) (cid:1) + (cid:12)(cid:12) E (cid:0) b ( S Z ) − b (Σ / S Z Σ / ) (cid:1)(cid:12)(cid:12) · (cid:0) Var / ( b (Σ / S Z Σ / )) ∨ Var / ( b ( S Z )) (cid:1) ( ∗ ) (cid:46) M ( N / p ) − (cid:107) Σ − I (cid:107) F + p − / (cid:107) Σ − I (cid:107) F · ( N / p ) − ( (cid:107) Σ (cid:107) F ∨ p / ) (cid:46) ( N / p ) − (cid:0) p − / (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F . Here in ( ∗ ) we use the fact that (cid:12)(cid:12) E (cid:0) b ( S Z ) − b (Σ / S Z Σ / ) (cid:1)(cid:12)(cid:12) (cid:46) p − E / tr (cid:0) (Σ − I ) S Z (cid:1) ≤ p − / (cid:107) Σ − I (cid:107) F · E / (cid:107) S Z (cid:107) (cid:46) M p − / (cid:107) Σ − I (cid:107) F . Hence | R (Σ) − R ( I ) | (cid:46) M ( N / p ) − (cid:0) p − (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F . For (cid:96) = 2, with a (Σ) ≡ / (cid:0) ) / ( N p ) (cid:1) − | a (Σ) | ≤ /N and | a ( I ) | ≤ / ( N p )), we have R (Σ) = (1 + N − ) b (Σ) a (Σ) + N − p a (Σ) , so (cid:12)(cid:12) R (Σ) − R ( I ) (cid:12)(cid:12) (cid:46) M (cid:12)(cid:12) b (Σ) a (Σ) − b ( I ) a ( I ) (cid:12)(cid:12) + | a (Σ) − a ( I ) | ≡ R , + R , . The two terms R , , R , can be handled as follows: using tr(Σ ) ≤ p under b (Σ) = 1, we have R , (cid:46) b (Σ) | a (Σ) − a ( I ) | + | a ( I ) || b (Σ) − b ( I ) | (cid:46) p − tr(Σ )( N p ) − | tr (cid:0) Σ − I (cid:1) | + ( N p ) − · p − · | tr(Σ − I ) | (cid:46) ( N p / ) − (cid:0) p − / (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F ,R , (cid:46) ( N p / ) − (cid:0) p − / (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F , so (cid:12)(cid:12) R (Σ) − R ( I ) (cid:12)(cid:12) (cid:46) M ( N p / ) − (cid:0) (cid:107) Σ (cid:107) F /p / + 1 (cid:1) (cid:107) Σ − I (cid:107) F . For (cid:96) = 3, (cid:12)(cid:12) R (Σ) − R ( I ) (cid:12)(cid:12) = N − (cid:12)(cid:12) b (Σ) − b ( I ) (cid:12)(cid:12) (cid:46) ( N p / ) − (cid:0) p − / (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F . IGH DIMENSIONAL COVARIANCE TESTING 63

Now with Q J (Σ) ≡ p (cid:0) R (Σ) − R ( I ) (cid:1) , we have | Q J (Σ) | (cid:46) M p max { ( N / p ) − , ( N p / ) − } (cid:0) p − (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F (cid:46) M N − / (cid:0) p − (cid:107) Σ (cid:107) F + 1 (cid:1) (cid:107) Σ − I (cid:107) F , and m Σ;J − m I ;J = N (cid:0) (cid:107) Σ − I (cid:107) F + Q J (Σ) (cid:1) . (3). Recall T LNW deﬁned in (3.9). Let∆( X ) ≡ T LNW ( X ) − T J ( X ) . Then for any ε >

0, there exists some C ε > X , . . . , X n are i.i.d. N (0 , I p )), (cid:2) (1 − ε ) σ I ;LNW − C ε Var I (∆) (cid:3) + ≤ σ I ;J ≤ (1 + ε ) σ I ;LNW + C ε Var I (∆) . (13.3)We will now bound Var I (∆). By Lemmas 10.1-(1) and 13.1-(1), we have forany i, j ∈ [ N ] × [ p ] ∂ ( ij ) ∆( X ) = ∂ ( ij ) T LNW ( X ) − ∂ ( ij ) T J ( X )= (cid:0) X ( S − I ) − N − tr( S ) X (cid:1) ij − (cid:20) Xb (cid:16) Sb − I (cid:17) − Xb ( S ) · b (cid:16)(cid:16) Sb ( S ) − I (cid:17) (cid:17)(cid:21) ij = (cid:20) X ( S − I ) − Xb (cid:16) Sb − I (cid:17)(cid:21) ij + (cid:20) Xb (cid:0) ( S − b ( S ) I ) (cid:1)(cid:0) b − ( S ) − (cid:1)(cid:21) ij + (cid:20) X (cid:16) b ( S ) − b ( S ) − N − tr( S ) (cid:17)(cid:21) ij ≡ (cid:0) ∆ + ∆ + ∆ (cid:1) ij . We now handle ∆ -∆ separately below. For ∆ , by Lemmas 7.4, D.2, andD.4, we have E (cid:107) ∆ (cid:107) F (cid:46) E b − (1 − b ) (cid:107) X ( S − I ) (cid:107) F + E b − (1 − b ) (cid:107) XS (cid:107) F ≤ N E b − (1 − b ) (cid:107) S (cid:107) op (cid:107) S − I (cid:107) F + N E b − ( b − tr( S ) (cid:46) N · ( pN ) − · (1 ∨ y )( N − p ) + N E / b − E / ( b − E / tr ( S ) ( ∗ ) (cid:46) o ( p ) + N · O (1) · ( N p ) − · O ( N − p ∨ p ) = o ( p ) . Here in ( ∗ ) the ﬁrst bound follows by direct calculation and the secondbound follows as: by Lemma D.5-(7), E tr ( S ) ≤ E tr ( S ) (cid:107) S (cid:107) ≤ E / tr ( S ) · E / (cid:107) S (cid:107) (cid:46) p · ( y ∨

1) = O ( N − p ∨ p ) . For ∆ , using b (cid:0) ( S − b ( S ) I ) (cid:1) ≤ (cid:107) ( S − b ( S ) I ) (cid:107) op (cid:46) (cid:107) S (cid:107) ∨ b ( S ), we have E (cid:107) ∆ (cid:107) F (cid:46) E b − ( b ∨ b ∨ b − (cid:0) (cid:107) S (cid:107) ∨ b (cid:1) (cid:107) X (cid:107) F (cid:46) ( pN ) − · ( pN ) · E / (cid:0) (cid:107) S (cid:107) ∨ b (cid:1) (cid:16) (1 ∨ y ) = o ( p ) . For ∆ , let h ( S ) ≡ b ( S ) − b ( S ) − N − tr( S ), we have E (cid:107) ∆ (cid:107) F (cid:46) E (cid:107) X (cid:107) F h ( S ) ≤ N E / tr ( S ) · E / h ( S ) (cid:46) N p · (cid:104)(cid:0) E h ( S ) (cid:1) + Var (cid:0) h ( S ) (cid:1) + E (cid:107)∇ h ( S ) (cid:107) F (cid:105) / , where the last inequality follows since E h ( S ) = [ E h ( S )] + Var( h ( S )) ≤ (cid:2) E h ( S )) + 2 Var ( h ( S )) + Var( h ( S ) (cid:3) ≤ E h ( S )) + 2 Var ( h ( S )) + 4 E h ( S ) (cid:107)∇ h ( S ) (cid:107) F ≤ E h ( S )) + 2 Var ( h ( S )) + 4 τ E h ( S ) + C τ E (cid:107)∇ h ( S ) (cid:107) F and choosing, say, τ = 1 /

8. For E h ( S ), Lemma D.4 yields the direct evalu-ation E h ( S ) = (1 + N − ) p + N − p p − p + 2 N − pp − pN = (1 + N − ) + pN − (cid:16) N p (cid:17) − pN = 1 N − N p = O ( N − ) . For Var (cid:0) h ( S ) (cid:1) , the Gaussian-Poincar´e inequality [BLM13, Theorem 3.20]yields thatVar (cid:0) h ( S ) (cid:1) ≤ E (cid:88) ij (cid:0) ∂ ( ij ) h ( S ) (cid:1) ∗∗ ) = (cid:88) ij E (cid:18) X (cid:62) i Se j N p − X ij b ( S ) N p − X ij N (cid:19) (cid:46) ( N p ) − (cid:0) E (cid:107) XS (cid:107) F + E b ( S ) (cid:107) X (cid:107) F (cid:1) + N − E (cid:107) X (cid:107) F (cid:46) ( N p ) − E tr( S ) (cid:107) S (cid:107) op + ( N p ) − ( N p ) + N − ( N p ) (cid:46) ( N p ) − E / tr ( S ) E / (cid:107) S (cid:107) + ( N p ) − + pN − (cid:46) ( N p ) − · p · (1 ∨ y ) + ( N p ) − + pN − = o ( N − p ) . Here ( ∗∗ ) follows from (13.1). Lastly E (cid:107)∇ h ( S ) (cid:107) F can be bounded similarly: E (cid:107)∇ h ( S ) (cid:107) F (cid:46) ( N p ) − (cid:0) E (cid:107) XS (cid:107) F + E b ( S ) (cid:107) X (cid:107) F (cid:1) + N − E (cid:107) X (cid:107) F = o ( N − p ) . Now by the Gaussian-Poincar´e inequality [BLM13, Theorem 3.20],Var I (∆) ≤ E (cid:88) ij (cid:0) ∂ ( ij ) ∆( X ) (cid:1) (cid:46) E (cid:107) ∆ (cid:107) F + E (cid:107) ∆ (cid:107) F + E (cid:107) ∆ (cid:107) F = o ( p ) . As σ I ;LNW ∼ p / → ∞ whenever N ∧ p → ∞ , by taking ε in (13.3) slowlydecaying to 0 we conclude σ I ;LNW ∼ σ I ;J .(4). By (1)-(2), as (cid:107) Σ (cid:107) F /p (cid:46) (cid:107) Σ − I (cid:107) F /p + 1 (cid:46) (cid:107) Σ − I (cid:107) F ∨ (cid:107) Σ − I (cid:107) F ≤ (cid:107) Σ (cid:107) F + √ p ≤ p + √ p under tr(Σ) = p ], we only need to provethat given C , we may ﬁnd some constant C > √ N (cid:107) Σ − I (cid:107) F (cid:0) ∨ (cid:107) Σ − I (cid:107) F (cid:1)(cid:0) N (cid:107) Σ − I (cid:107) F − C N / (cid:107) Σ − I (cid:107) F (cid:0) ∨ (cid:107) Σ − I (cid:107) F (cid:1)(cid:1) + (cid:87) σ I ;J ≤ C ( σ I ;J ∧ N ) / . (13.4) IGH DIMENSIONAL COVARIANCE TESTING 65

Write α = (cid:107) Σ − I (cid:107) F , we only need to prove that √ N α (cid:87) √ N α (cid:0) N α − C N / α (cid:1) + (cid:87) σ I ;J ≤ C ( σ I ;J ∧ N ) / . (13.5)This follows asLHS of (13.5) (cid:46) α ≤ C N − / σ I ; J + α> C N − / √ N α (cid:87) √ N α N α (cid:87) σ I ;J (cid:46) σ I ; J + 1 N / inf α ≥ (cid:0) α ∨ σ I ;J /Nα (cid:1) + 1 N / (cid:46) σ I ; J + 1 σ / I ;J + 1 N / (cid:16) σ I ;J ∧ N ) / . The proof is complete. (cid:3)

Completing of the proof for power expansion.

Proof of Theorem 4.6.

The proof essentially follows that of Theorem 3.10 bynoting that the key property used therein is | Q LNW (Σ) | ≤ C M N − / ( (cid:107) Σ − I (cid:107) F ∨ (cid:107) Σ − I (cid:107) F , while here we have | Q J (Σ · b − (Σ)) | ≤ C M N − / (cid:107) Σ · b − (Σ) − I (cid:107) F ≤ N − / ( (cid:107) Σ · b − (Σ) − I (cid:107) F ∨ (cid:107) Σ · b − (Σ) − I (cid:107) F . (cid:3) Appendix A. Second-order Poincar´e inequality

The main tool used for proving normal approximations is the followingsecond-order Poincar´e inequality due to [Cha09]. Recall that W , ( γ n ) isthe Gaussian Sobolev space deﬁned in (1.11). Lemma A.1 (Second-order Poincar´e inequality) . Let ξ be an n -dimensionalstandard normal random vector. Let F : R n → R be an element of W , ( γ n ) .Let ξ (cid:48) be an independent copy of ξ . Deﬁne T : R n → R by T ( y ) ≡ (cid:90) √ t (cid:68) ∇ F ( y ) , E ξ (cid:48) ∇ F ( √ ty + √ − tξ (cid:48) ) (cid:69) d t. Then with W ≡ F ( ξ ) , d TV (cid:18) W − E W (cid:112) Var( W ) , N (0 , (cid:19) ≤ (cid:112) Var( T ( ξ ))Var( W ) . Appendix B. Sobolev regularity of matrix functionals

Lemma B.1.

The following hold:(1) Let f : R N × p → R be deﬁned by f ( X ) = log det( X (cid:62) X ) . If N ≥ p +1 , then f ∈ W , ( γ N × p ) provided itself and its pointwise ﬁrst derivatives live in L ( γ N × p ) , f ∈ W , ( γ N × p ) provided itself, its pointwise ﬁrst, and secondderivatives live in L ( γ N × p ) . In particular, if N, p are large enough with p/N ≤ − ε for some ε ∈ (0 , , then f ∈ W , ( γ N × p ) . (2) Let g (cid:96) : R N × p → R be deﬁned by g (cid:96) ( X ) = tr − (cid:96) ( X (cid:62) X ) for (cid:96) ∈ N . Then g (cid:96) ∈ W , ( γ N × p ) provided itself and its pointwise ﬁrst derivatives live in L ( γ N × p ) , g (cid:96) ∈ W , ( γ N × p ) provided itself, its pointwise ﬁrst, and secondderivatives live in L ( γ N × p ) . In particular, there exists some N (cid:96) ∈ N suchthat for N ≥ N (cid:96) , g (cid:96) ∈ W , ( γ N × p ) .Proof. (1). We ﬁrst prove the claim involving f ∈ W , ( γ N × p ). Let W r,p ( R d )be the standard Sobolev class on R d (cf. [Bog98, Chapter 1.5]) and recallthat C ∞ ( R d ) is the class of smooth functions on R d with compact support.By [Bog98, Proposition 1.5.2], we only need to verify that ζf ∈ W , ( R N × p ),i.e., ζf ∈ L ( R N × p ) and its ﬁrst partial derivatives (in the sense of distri-butions) live in L ( R N × p ) for every ζ ∈ C ∞ ( R N × p ). ζf ∈ L ( R N × p ) followsfrom N ≥ p +1 > p . Using the absolute continuity on line characterization ofthe space W , ( R N × p ) (cf. [Maz11, Section 1.1.3]), we only need to show that ζf is absolutely continuous on almost all straight lines that are parallel tocoordinate axes and the ﬁrst pointwise derivatives of ζf belong to L ( R N × p ).As ζ has compact support, the latter requirement is satisﬁed by the assump-tion that f and its ﬁrst pointwise derivatives live in L ( γ N × p ). To show thealmost absolute continuity, we only need to do so for f on a compact subsetof R N × p . Identify X ∈ R N × p in the matrix form X = [ X · · · X p ] where X j ∈ R N for 1 ≤ j ≤ p , and in the coordinate form X = ( X (cid:62) , . . . , X (cid:62) p ).Let L j ≡ { ( X (cid:62) , . . . , X (cid:62) p ) ∈ R N × p : X j ∈ lin( X , . . . , X j − , X j +1 , . . . , X p ) } ,and π − ( ij ) : R N × p → R N × p − be the natural projection that excludes ( X j ) i .Then π − ( ij ) ( L j (cid:48) ) is a subset of R N × p − of Lebesgue measure 0 for each( i, j ) ∈ [ N ] × [ p ] , j (cid:48) ∈ [ p ] under the condition p ≤ N −

1, as we may write L j (cid:48) = (cid:110)(cid:0) X (cid:62) , . . . , X (cid:62) j (cid:48) − , (cid:88) j (cid:54) = j (cid:48) γ j X (cid:62) j , X (cid:62) j (cid:48) +1 , . . . , X (cid:62) p (cid:1) : X j ∈ R N , γ j ∈ R , j (cid:54) = j (cid:48) (cid:111) . Hence with L ≡ ∪ j L j , π − ( ij ) ( L ) is a subset of R N × p − of Lebesgue measure0 for every ( i, j ) ∈ [ N ] × [ p ]. In particular, this means that for any X − ( ij ) / ∈ π − ( ij ) ( L ), the map f along the line x ( ij ) (cid:55)→ ( x ( ij ) , X − ( ij ) ) does not touch { X ∈ R N × p : det( X (cid:62) X ) = 0 } , and hence is locally Lipschitz as ∇ f ( X ) =2 X ( X (cid:62) X ) − . This veriﬁes the almost absolute continuity property, andhence f ∈ W , ( γ N × p ) provided itself and its ﬁrst pointwise derivatives livein L ( γ N × p ).The veriﬁcation of f ∈ W , ( γ N × p ) under L integrability of the pointwisederivatives up to the second order is the same, upon noting the derivativeshave singularities only at { X ∈ R N × p : det( X (cid:62) X ) = 0 } (the precise deriva-tive formula is given in Lemma 8.1).The last assertion follows from Lemma 7.3 and (8.11) that establishesthe L integrability of the pointwise ﬁrst and second derivatives, and thestraightforward veriﬁcation of the L integrability of f itself.(2). The singularity of g (cid:96) occurs only at tr( X (cid:62) X ) = (cid:80) j (cid:107) X j (cid:107) = 0, i.e., X = 0. The almost absolute continuity on line characterization is therefore IGH DIMENSIONAL COVARIANCE TESTING 67 easily veriﬁed. The L integrability of the derivatives up to the second orderfollows from Lemma D.2. (cid:3) Appendix C. Proof of Lemma 7.3

Proof of Lemma 7.3.

Write S Z for S in the proof for simplicity. Let λ bethe smallest eigenvalue of S , and y ≡ ( p − /N < − ε . By [RV09, Theorem1.1], on an event E with probability at least 1 − e − cN (1 − y ) , λ ≥ c (1 − √ y ) for some absolute constant c >

0. A similar estimate can be obtained usingrigidity estimate for the eigenvalues of the sample covariance matrix, e.g.,[PY14, Theorem 3.1(iii)]. Hence E (cid:107) S − (cid:107) q op = E (cid:107) S − (cid:107) q op E + E (cid:107) S − (cid:107) q op E c ≤ c − q (cid:0) − √ y (cid:1) − q + E / (cid:107) S − (cid:107) q op · e − cN (1 − y ) / . (C.1)Now we give an upper bound for E (cid:107) S − (cid:107) q op . Let r ≡ ( N − p − / k , we write κ (cid:96) k if κ = ( k , k , . . . ), with convention k ≥ k ≥ . . . , is a partition of k , i.e., (cid:80) i k i = k . Let C κ denote the zonal polynomial (cf. [Mui82, Chapter 7])with respect to the partition κ . Then it follows from [Mui82, Corollary 9.7.4]that, for any x > P ( (cid:107) S − (cid:107) op > x ) = 1 − P ( λ > /x )= 1 − e − Np x pr (cid:88) k =0 (cid:88) κ (cid:96) k : k ≤ r C κ (cid:0) N I/ (2 x ) (cid:1) k != e − Np x (cid:104) ∞ (cid:88) k =0 (cid:0) N p/ (2 x ) (cid:1) k k ! − pr (cid:88) k =0 (cid:88) κ (cid:96) k : k ≤ r C κ (cid:0) N I/ (2 x ) (cid:1) k ! (cid:105) = e − Np x (cid:104) ∞ (cid:88) k = pr +1 (cid:0) N p/ (2 x ) (cid:1) k k ! + pr (cid:88) k =0 k ! (cid:110)(cid:18) N p x (cid:19) k − (cid:88) κ (cid:96) k : k ≤ r C κ (cid:16) N I x (cid:17)(cid:111)(cid:105) ( ∗ ) = e − Np x (cid:104) ∞ (cid:88) k = pr +1 (cid:0) N p/ (2 x ) (cid:1) k k ! + pr (cid:88) k =0 k ! (cid:88) κ (cid:96) k : k >r C κ (cid:16) N I x (cid:17)(cid:105) ( ∗∗ ) = e − Np x (cid:104) ∞ (cid:88) k = pr +1 (cid:0) N p/ (2 x ) (cid:1) k k ! + pr (cid:88) k = r +1 (cid:0) N/ (2 x ) (cid:1) k k ! (cid:88) κ (cid:96) k : k >r C κ (cid:0) I (cid:1)(cid:17)(cid:105) ( ∗∗∗ ) ≤ e − Np x · ∞ (cid:88) k = r +1 (cid:0) N p/ (2 x ) (cid:1) k k ! . Here ( ∗ ) follows from [Mui82, Deﬁnition 7.2.1, (iii)]: for any k ≥ t > (cid:88) κ (cid:96) k C κ ( t · I ) = (cid:2) tr( t · I ) (cid:3) k = ( tp ) k ; (C.2) ( ∗∗ ) follows from the fact that for each k and partition κ of k , C κ is ahomogeneous polynomial of order k ; ( ∗ ∗ ∗ ) follows from the non-negativityof zonal polynomial for I (cf. [Mui82, Corollary 7.2.4]) and an applicationof (C.2) with t = 1: (cid:88) κ (cid:96) k : k >r C κ (cid:0) I (cid:1) ≤ (cid:88) κ (cid:96) k C κ (cid:0) I (cid:1) = p k . Hence by using the fact that for any k ≥ q + 1, (cid:90) ∞ e − Np x (cid:18) N p x (cid:19) k · x q − d x = (cid:18) N p (cid:19) q (cid:90) ∞ e − y y k − q − d y = (cid:18) N p (cid:19) q ( k − q − , we have for every r ≥ q E (cid:107) S − (cid:107) q op = 2 q (cid:90) ∞ x q − P ( (cid:107) S − (cid:107) op > x ) d x ≤ (cid:18) N p (cid:19) q ∞ (cid:88) k = r +1 k ( k − · · · ( k − q )= (cid:18) N p (cid:19) q ∞ (cid:88) k = r +1 q · (cid:110) k − · · · ( k − q ) − k · · · ( k − q − (cid:111) = (cid:18) N p (cid:19) q q r ( r − · · · ( r − q + 1) (cid:46) q ( N p ) q r q . (C.3)Combining (C.1) and (C.3), as p/N ≤ − ε , E (cid:107) S − (cid:107) q op ≤ C qε + C ε N q e − c ε N (cid:46) q,ε r = ( N − p − / r is not an integer,write S = N − N S (cid:48) + N X N X (cid:62) N , where S (cid:48) ≡ N − (cid:80) N − i =1 X i X (cid:62) i . Then usingSherman-Morrison formula, S − = NN − S (cid:48) ) − − N ( N − · ( S (cid:48) ) − X N X (cid:62) N ( S (cid:48) ) − N − X (cid:62) N ( S (cid:48) ) − X N ≡ NN − S (cid:48) ) − − R. (C.5)As X (cid:62) N ( S (cid:48) ) − X N ≥ (cid:107) X N (cid:107) /λ max ( S (cid:48) ), we have E (cid:107) R (cid:107) q op ≤ (cid:16) NN − (cid:17) q · E (cid:20) (cid:107) ( S (cid:48) ) − X N X (cid:62) N ( S (cid:48) ) − (cid:107) q op λ max ( S (cid:48) ) q (cid:107) X N (cid:107) q (cid:21) (cid:46) q E (cid:16) λ max ( S (cid:48) ) q λ min ( S (cid:48) ) q (cid:17) ≤ E / (cid:107) S (cid:48) (cid:107) q op · E / (cid:107) ( S (cid:48) ) − (cid:107) q op (cid:46) q,ε . The claim for r not being an integer follows from the decomposition (C.4)and the estimate above. (cid:3) IGH DIMENSIONAL COVARIANCE TESTING 69

Appendix D. Moment and concentration (in)equalities fortrace functionals

Lemma D.1.

Let Z ∈ R N × p be a random matrix whose entries are i.i.d. N (0 , , and A ∈ R p × p . Then E (cid:107) ZA (cid:107) F ≤ N (cid:107) A (cid:62) A (cid:107) F + N (cid:107) A (cid:107) F ≤ N (cid:107) A (cid:107) F . Proof. As (cid:107) ZA (cid:107) F = tr( ZAA (cid:62) Z (cid:62) ) = tr( A (cid:62) Z (cid:62) ZA ), we have E (cid:107) ZA (cid:107) F = E tr ( A (cid:62) Z (cid:62) ZA ) = Var (cid:0) tr( A (cid:62) Z (cid:62) ZA ) (cid:1) + (cid:0) E tr( A (cid:62) Z (cid:62) ZA ) (cid:1) = Var (cid:0) tr( A (cid:62) Z (cid:62) ZA ) (cid:1) + N tr ( A (cid:62) A ) . Further note that for any ( i, j ) ∈ [ N ] × [ p ], ∂ ( ij ) tr( A (cid:62) Z (cid:62) ZA ) = tr (cid:0) A (cid:62) ( e i e (cid:62) j ) (cid:62) ZA (cid:1) + tr (cid:0) A (cid:62) Z (cid:62) e i e (cid:62) j A (cid:1) = ( ZAA (cid:62) ) ij + ( AA (cid:62) Z (cid:62) ) ji = 2( ZAA (cid:62) ) ij , so by Gaussian-Poincar´e inequality,Var (cid:0) tr( A (cid:62) Z (cid:62) ZA ) (cid:1) ≤ E (cid:88) i,j (cid:0) ∂ ( ij ) tr( A (cid:62) Z (cid:62) ZA ) (cid:1) = 4 E (cid:107) ZAA (cid:62) (cid:107) F = 4 E tr( ZAA (cid:62) AA (cid:62) Z (cid:62) ) = 4 N tr( AA (cid:62) AA (cid:62) ) . Finally note thattr( AA (cid:62) AA (cid:62) ) = (cid:107) A (cid:62) A (cid:107) F = (cid:88) i λ i ( A ) ≤ (cid:18) (cid:88) i λ i ( A ) (cid:19) = (cid:107) A (cid:107) F = tr ( A (cid:62) A ) . The claim follows. (cid:3)

Lemma D.2.

Let S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i where Z i ’s are i.i.d. N (0 , I ) in R p . Then there exists some universal constant C > such that for anynon-negative deﬁnite matrix Σ and any t > , P (cid:18) N (cid:12)(cid:12)(cid:0) tr(Σ S Z ) − tr(Σ) (cid:1)(cid:12)(cid:12) > t (cid:19) ≤ (cid:18) − t C ( N (cid:107) Σ (cid:107) F + (cid:107) Σ (cid:107) op t ) (cid:19) . Consequently, P (cid:0) tr(Σ S Z ) < tr(Σ) / (cid:1) ≤ e − cN for some universal c > .Furthermore, for any (cid:96) ∈ Z such that (cid:96) ≥ − N/ , there exists some C (cid:96) > such that [recall (4.3)] E b (cid:96) (Σ / S Z Σ / ) ≤ C (cid:96) · b (cid:96) (Σ) . Proof.

Let X i ≡ Σ / Z i . Then tr(Σ S Z ) = N − (cid:80) Ni =1 Z (cid:62) i Σ Z i = N − (cid:80) Ni =1 (cid:107) X i (cid:107) ,and E tr(Σ S Z ) = E (cid:107) X i (cid:107) = tr(Σ). By Hanson-Wright inequality (cf. [BLM13,pp.39]), E exp (cid:18) λ N (cid:88) i =1 ( (cid:107) X i (cid:107) − E (cid:107) X i (cid:107) ) (cid:19) ≤ exp (cid:18) λ · N (cid:107) Σ (cid:107) F − λ (cid:107) Σ (cid:107) op (cid:19) , so by [BLM13, Theorem 2.3], we have P (cid:18) N (cid:12)(cid:12)(cid:0) tr(Σ S Z ) − tr(Σ) (cid:1)(cid:12)(cid:12) > t (cid:19) ≤ (cid:18) − t C ( N (cid:107) Σ (cid:107) F + (cid:107) Σ (cid:107) op t ) (cid:19) . In particular, with t ≡ N tr(Σ) /

2, we have P (cid:0) tr(Σ S Z ) < tr(Σ) / (cid:1) ≤ exp (cid:18) − N tr (Σ) C (cid:0) N (cid:107) Σ (cid:107) F + N (cid:107) Σ (cid:107) op tr(Σ) (cid:1) (cid:19) ≤ e − cN . For the expectation bound, let { λ j } pj =1 be the eigenvalues of Σ and assumewithout loss of generality that (cid:80) pj =1 λ j = 1. Then E tr (cid:96) ( S Z Σ) = E (cid:16) N N (cid:88) i =1 Z (cid:62) i Σ Z i (cid:17) = E (cid:16) N N (cid:88) i =1 Z (cid:62) i diag( λ , . . . , λ p ) Z i (cid:17) (cid:96) = N (cid:96) E (cid:18) N (cid:88) i =1 p (cid:88) j =1 λ j Z ij (cid:19) (cid:96) ≡ N (cid:96) E (cid:18) p (cid:88) j =1 λ j Y j (cid:19) (cid:96) ( ∗ ) ≤ N (cid:96) E p (cid:88) j =1 λ j Y (cid:96)j ( ∗∗ ) (cid:46) (cid:96) N (cid:96) · p (cid:88) j =1 λ j N (cid:96) = 1 . Here ( ∗ ) follows as the map x (cid:55)→ x (cid:96) is convex on (0 , ∞ ) for (cid:96) ∈ Z , and ( ∗∗ )follows from the following calculations: • If (cid:96) ∈ Z ≥ , E Y (cid:96) = E (cid:0) χ ( N ) (cid:1) (cid:96) (cid:46) (cid:96) N (cid:96) . • If (cid:96) ∈ Z ≤− and (cid:96) ≥ − N/

2, then E Y (cid:96) = E (cid:0) χ − ( N ) (cid:1) − (cid:96) = (cid:90) x − (cid:96) − N Γ( N/ x − N − e − x d x = 2 (cid:96) Γ( N/ (cid:96) )Γ( N/ (cid:46) (cid:96) N (cid:96) . The proof is complete. (cid:3)

Lemma D.3.

Let S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i where Z i ’s are i.i.d. N (0 , I ) , and Σ ∈ R p × p be a non-negative deﬁnite matrix. Recall the deﬁnition of b (cid:96) (Σ) in (4.3). Then for some universal constants C, c > , (cid:12)(cid:12) E log tr(Σ S Z ) − log tr(Σ) (cid:12)(cid:12) ≤ b (cid:2) (Σ · b − (Σ)) (cid:3) N p + Ce − cN (cid:16) b / (cid:2) (Σ · b − (Σ)) (cid:3) ( N p ) / (cid:95) (cid:17) . Proof.

Let E ≡ { tr(Σ( S Z − I )) / tr(Σ) ≥ − / } . By Lemma D.2, P ( E c ) ≤ e − cN for some universal constant c >

0. As | log(1 + x ) − x | ≤ x for x ≥ − / (cid:12)(cid:12) E log tr(Σ S Z ) − log tr(Σ) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12) E log (cid:18) (cid:0) Σ( S Z − I ) (cid:1) tr(Σ) (cid:19)(cid:0) E + E c (cid:1)(cid:12)(cid:12)(cid:12)(cid:12) IGH DIMENSIONAL COVARIANCE TESTING 71 ≤ (cid:12)(cid:12)(cid:12)(cid:12) E tr (cid:0) Σ( S Z − I ) (cid:1) tr(Σ) E (cid:12)(cid:12)(cid:12)(cid:12) + 4 E (cid:18) tr (cid:0) Σ( S Z − I ) (cid:1) tr(Σ) (cid:19) + E (cid:20) log (cid:16) tr(Σ S Z )tr(Σ) (cid:17) E c (cid:21) ≤ E / (cid:20) tr (cid:0) Σ( S Z − I ) (cid:1) tr(Σ) (cid:21) P / ( E c ) + 4 E (cid:18) tr (cid:0) Σ( S Z − I ) (cid:1) tr(Σ) (cid:19) + E / log (cid:16) tr(Σ S Z )tr(Σ) (cid:17) · P / ( E c ) ≡ ( I ) + ( II ) + ( III ) . To handle ( I ), note that by Gaussian-Poincar´e inequality [BLM13, Theorem3.20], E tr (cid:0) Σ( S Z − I ) (cid:1) ≤ E (cid:88) i,j (cid:2) ∂ ( ij ) tr (cid:0) Σ( S Z − I ) (cid:1)(cid:3) = E (cid:88) i,j (cid:20) N − tr (cid:18) Σ (cid:88) k (cid:0) δ ik e j Z (cid:62) k + δ ik Z k e (cid:62) j (cid:1)(cid:19)(cid:21) = E (cid:88) i,j (cid:2) N − tr (cid:0) Σ e j Z (cid:62) i + Σ Z i e (cid:62) j (cid:1)(cid:3) = 4 N (cid:88) i,j E Z (cid:62) i Σ e j e (cid:62) j Σ Z i = 4 tr(Σ ) N , so ( I ) = 2 e − cN/ tr / (Σ ) N / tr(Σ) = 2 e − cN/ · b / (cid:2) (Σ · b − (Σ)) (cid:3) ( N p ) / . The second term has closed-form expression: by Lemma D.4-(1),( II ) = 2 tr(Σ ) N tr (Σ) . To handle (

III ), by using 0 ≤ log x ≤ x − x ≥ − x − ≤ log x < x ∈ (0 , E log (cid:16) tr(Σ S Z )tr(Σ) (cid:17) = E (cid:20) log (cid:16) tr(Σ S Z )tr(Σ) (cid:17) (cid:110) tr(Σ S Z )tr(Σ) ≥ (cid:111)(cid:21) + E (cid:20) log (cid:16) tr(Σ S Z )tr(Σ) (cid:17) (cid:110) tr(Σ S Z )tr(Σ) < (cid:111)(cid:21) ≤ E (cid:20) tr (cid:0) Σ( S Z − I ) (cid:1) tr(Σ) (cid:21) + E (cid:20) tr(Σ)tr(Σ S Z ) (cid:21) (cid:46) tr(Σ ) N tr (Σ) + 1 , where in the last inequality we apply Lemma D.2. Hence( III ) (cid:46) e − cN (cid:104) tr / (Σ ) N / tr(Σ) (cid:95) (cid:105) . The proof is complete by collecting the bounds. (cid:3)

Lemma D.4.

Let S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i where Z i ’s are i.i.d. N (0 , I ) in R p , and Σ ∈ R p × p be a non-negative deﬁnite matrix.(1) There exists some absolute C > such that E tr (cid:2)(cid:0) Σ / S Z Σ / (cid:1) (cid:3) = (cid:0) N − (cid:1) tr(Σ ) + N − tr (Σ) , E tr (Σ / S Z Σ / ) = tr (Σ) + 2 N − tr(Σ ) , E tr (cid:2)(cid:0) Σ / S Z Σ / (cid:1) (cid:3) ≤ C (cid:104) N − (1 ∨ ( p/N )) tr(Σ ) + tr (Σ ) + N − tr (Σ) (cid:105) . (2) There exists some absolute C > such that Var (cid:0) tr(Σ S Z ) (cid:1) ≤ N − (cid:107) Σ (cid:107) F , Var (cid:0) tr (Σ / S Z Σ / ) (cid:1) ≤ C ( N − tr (Σ)) · N (cid:107) Σ (cid:107) F , Var (cid:0) tr (Σ / S Z Σ / ) − tr ( S Z ) (cid:1) (cid:46) (cid:0) N − tr (Σ − I ) (cid:1) · N (cid:107) Σ (cid:107) F + ( N − p ) · N (cid:107) Σ − I (cid:107) F , Var (cid:0) tr (cid:2)(cid:0) Σ / S Z Σ / (cid:1) (cid:3)(cid:1) ≤ CN − (cid:2) ∨ ( N − p ) (cid:3) tr(Σ ) . (3) Recall that b (Σ) = tr(Σ) /p from (4.3). For any (cid:96) ∈ N , E | b (Σ / S Z Σ / ) − b (Σ) | (cid:96) ≤ C (cid:0) (cid:107) Σ (cid:107) F N − / p − (cid:1) (cid:96) for some constant C = C ( (cid:96) ) .Proof. Let X i ’s be i.i.d. N (0 , Σ). We write S ≡ Σ / S Z Σ / in the prooffor simplicity.(1). Note that E tr( S ) = N − E tr (cid:20) (cid:88) i,j X i X (cid:62) i X j X (cid:62) j (cid:21) = N − (cid:20) (cid:88) i (cid:54) = j E tr (cid:0) X i X (cid:62) i X j X (cid:62) j (cid:1) + (cid:88) i = j E ( X (cid:62) i X j ) (cid:21) = N − (cid:104) N ( N −

1) tr(Σ ) + N E ( Z (cid:62) Σ Z ) (cid:105) ( ∗ ) = N − (cid:104) N ( N + 1) tr(Σ ) + N tr (Σ) (cid:105) = (cid:0) N − (cid:1) tr(Σ ) + N − tr (Σ) , and E tr ( S ) = E (cid:18) N − N (cid:88) i =1 X (cid:62) i X i (cid:19) = N − (cid:88) i,j E X (cid:62) i X i X (cid:62) j X j = N − (cid:20) (cid:88) i (cid:54) = j E (cid:107) X i (cid:107) E (cid:107) X j (cid:107) + (cid:88) i E ( X (cid:62) i X i ) (cid:21) = N − (cid:20) N ( N − (cid:0) E Z (cid:62) Σ Z (cid:1) + N E (cid:0) Z (cid:62) Σ Z (cid:1) (cid:21) ( ∗∗ ) = N − (cid:104) N ( N −

1) tr (Σ) + N (cid:0) tr (Σ) + 2 tr(Σ ) (cid:1)(cid:105) = tr (Σ) + 2 N − tr(Σ ) . IGH DIMENSIONAL COVARIANCE TESTING 73

Here ( ∗ ) , ( ∗∗ ) follow by the following calculations: E Z (cid:62) Σ Z = E (cid:18) (cid:88) j λ j Z j (cid:19) = tr(Σ) , E ( Z (cid:62) Σ Z ) = E (cid:18) (cid:88) j λ j Z j (cid:19) = 3 (cid:88) j λ j + (cid:88) j (cid:54) = j (cid:48) λ j λ j (cid:48) = 2 (cid:88) j λ j + (cid:18) (cid:88) j λ j (cid:19) = tr (Σ) + 2 tr(Σ ) , where λ , . . . , λ p are the eigenvalues of Σ. The ﬁnal one follows as E tr (cid:2)(cid:0) Σ / S Z Σ / (cid:1) (cid:3) = Var (cid:0) tr (cid:2)(cid:0) Σ / S Z Σ / (cid:1) (cid:3)(cid:1) + (cid:16) E tr (cid:2)(cid:0) Σ / S Z Σ / (cid:1) (cid:3)(cid:17) (cid:46) N − (cid:0) ∨ ( N − p ) (cid:1) tr(Σ ) + tr (Σ ) + N − tr (Σ) . The last inequality used (2) to be proved below.(2). For the ﬁrst variance bound, note that ∂∂Z ij tr(Σ S Z ) = N − tr (cid:0) Σ( e j Z (cid:62) i + Z i e (cid:62) j ) (cid:1) = 2 N − ( Z Σ) ij , so Gaussian-Poincar´e inequality yields thatVar (cid:0) tr(Σ S Z ) (cid:1) ≤ E (cid:88) i,j (cid:20) ∂∂Z ij tr (Σ S Z ) (cid:21) = 4 N − E (cid:107) Z Σ (cid:107) F = 4 N − (cid:107) Σ (cid:107) F . For the second variance bound, note that ∂∂Z ij tr (Σ S Z ) = 2 tr(Σ S Z ) · N − tr (cid:0) Σ( e j Z (cid:62) i + Z i e (cid:62) j ) (cid:1) = 4 N − tr(Σ S Z )( Z Σ) ij , so Gaussian-Poincar´e inequality yields thatVar (cid:0) tr (Σ S Z ) (cid:1) ≤ E (cid:88) i,j (cid:20) ∂∂Z ij tr (Σ S Z ) (cid:21) = 16 N − E tr (Σ S Z ) (cid:107) Z Σ (cid:107) F ( ∗ ) ≤ N − tr (Σ) · E / (cid:107) Z Σ (cid:107) F ( ∗∗ ) (cid:46) ( N − tr (Σ)) · N (cid:107) Σ (cid:107) F . Here in ( ∗ ) we use Lemma D.2, and in ( ∗∗ ) we use Lemma D.1.For the third variance bound, note that ∂∂Z ij (cid:0) tr (Σ S Z ) − tr ( S Z ) (cid:1) = 4 N − (cid:0) tr(Σ S Z )( Z Σ) ij − tr( S Z ) Z ij (cid:1) . Hence Var (cid:0) tr (Σ S Z ) − tr ( S Z ) (cid:1) (cid:46) N − E tr (cid:0) (Σ − I ) S Z (cid:1) (cid:107) Z Σ (cid:107) F + N − E tr ( S Z ) (cid:107) Z (Σ − I ) (cid:107) F (cid:46) (cid:2) N − tr (Σ − I ) (cid:3) · N (cid:107) Σ (cid:107) F + ( N − p ) · N (cid:107) Σ − I (cid:107) F . For the fourth variance bound, note that ∂∂Z ij tr(Σ S Z Σ S Z ) = 2 N − tr (cid:104) Σ S Z Σ( e j Z (cid:62) i + Z i e (cid:62) j ) (cid:105) = 4 N − (cid:0) Z Σ S Z Σ (cid:1) ij , so by Gaussian-Poincar´e inequality and Lemma 7.4,Var (cid:0) tr(Σ S Z Σ S Z ) (cid:1) ≤ N − E (cid:107) Z Σ S Z Σ (cid:107) F = 16 N − E tr (cid:0) Z Σ S Z ΣΣ S Z Σ Z (cid:62) (cid:1) = 16 N − E tr (cid:0) Σ S Z ΣΣ S Z Σ S Z (cid:1) ≤ N − E (cid:107) S Z (cid:107) tr(Σ ) (cid:46) N − (cid:0) ∨ ( N − p ) (cid:1) tr(Σ ) . (3). This follows by integrating the tail of | b (Σ / S Z Σ / ) − b (Σ) | in LemmaD.2: E | b (Σ / S Z Σ / ) − b (Σ) | (cid:96) = (cid:90) ∞ (cid:96)t (cid:96) − P (cid:0) | b (Σ / S Z Σ / ) − b (Σ) | > t (cid:1) d t (cid:46) (cid:96) (cid:90) ∞ t (cid:96) − e − Np (cid:107) Σ (cid:107) F t d t + (cid:90) ∞ t (cid:96) − e − Np (cid:107) Σ (cid:107) op t d t (cid:46) (cid:96) (cid:0) (cid:107) Σ (cid:107) F N − / p − (cid:1) (cid:96) + (cid:0) (cid:107) Σ (cid:107) op N − p − (cid:1) (cid:96) (cid:16) (cid:0) (cid:107) Σ (cid:107) F N − / p − (cid:1) (cid:96) . The proof is complete. (cid:3)

Lemma D.5.

Let S Z ≡ N − (cid:80) Ni =1 Z i Z (cid:62) i where Z i ’s are i.i.d. N (0 , I ) in R p . With y ≡ p/N the following hold:(1) E tr( S Z ) = py + 3 py + p + 3 y + 3 y + 4 y/N .(2) E tr ( S Z ) = p + 6 py + 8 y/N .(3) E tr( S Z ) tr( S Z ) = p y + p + py + 4( y + y ) + 4 y/N .(4) E tr ( S Z ) equals N − (cid:2) N p ( p + 2)( p + 4)( p + 6) + N ( N − (cid:0) p ( p + 2) (cid:1) + 2 N ( N − p ( p + 2) + 4 N ( N − p ( p + 2)( p + 4)+ 4 N ( N − N − p ( p + 2) + 2 N ( N − N − p ( p + 2)+ N ( N − N − N − p (cid:3) . (5) E tr( S Z ) tr( S Z ) equals N − (cid:2) N p ( p + 2)( p + 4)( p + 6) + N ( N − p ( p + 2)( p + 4)+ 3 N ( N − p ( p + 2) + 3 N ( N − p ( p + 2)( p + 4)+ 3 N ( N − N − p ( p + 2) + 3 N ( N − N − p ( p + 2)+ N ( N − N − N − p (cid:3) . (6) Var (cid:0) b ( S Z ) b ( S Z ) − b ( S Z ) (cid:1) = O ( p /N ) in the asymptotic regime p >N → ∞ .(7) For any k, (cid:96) ∈ N , E tr k ( S (cid:96)Z ) ≤ Cp k(cid:96) for some constant C = C ( k, (cid:96) ) > . IGH DIMENSIONAL COVARIANCE TESTING 75

Proof.

Write S Z for S in the proof for simplicity. Recall that if R follows achi-squared distribution with an integer ν degrees of freedom, then E R = ν + 2 ν, E R = ν + 6 ν + 8 ν, E R = ν ( ν + 2)( ν + 4)( ν + 6) . Hence (1)-(3) follows from the following calculations: We have E tr( S ) = N − · E (cid:88) i ,i ,i ( Z (cid:62) i Z i )( Z (cid:62) i Z i )( Z (cid:62) i Z i )= N − · (cid:16) (cid:88) | ( i ,i ,i ) | =1 E (cid:107) Z (cid:107) + (cid:88) | ( i ,i ,i ) | =2 E (cid:107) Z (cid:107) + (cid:88) | ( i ,i ,i ) | =3 E (cid:107) Z (cid:107) (cid:17) = N − · (cid:2) N · E ( χ p ) + (3 N − N ) E ( χ p ) + N ( N − N − · p (cid:3) = N − · (cid:2) N ( p + 6 p + 8 p ) + (3 N − N )( p + 2 p ) + N ( N − N − p (cid:3) = N − · (cid:2) ( N p + 3 N p + N p ) + 3( N p + N p ) + 4 N p (cid:3) = N − p + 3 N − p + p + 3( N − p ) + 3 N − p + 4 N − p = py + 3 py + p + 3 y + 3 y + 4 N − y, and E tr ( S ) = N − E (cid:18) N (cid:88) i =1 (cid:107) Z i (cid:107) (cid:19) = N − E ( χ Np ) = N − ( N p + 6 N p + 8 N p ) = p + 6 N − p + 8 N − p = p + 6 py + 8 N − y, and E tr( S ) tr( S ) = N − E (cid:16) N (cid:88) i =1 (cid:107) Z i (cid:107) (cid:17)(cid:16) N (cid:88) i ,i =1 ( Z (cid:62) i Z i ) (cid:17) = N − E N (cid:88) i ,i ,i =1 (cid:107) Z i (cid:107) ( Z (cid:62) i Z i ) = N − (cid:104) (cid:88) | ( i ,i ,i ) | =3 E (cid:107) Z (cid:107) · E ( Z (cid:62) Z ) + (cid:88) ( i = i ) (cid:54) = i E (cid:107) Z (cid:107) · E (cid:107) Z (cid:107) + (cid:88) | ( i ,i ) | = | ( i ,i ,i ) | =2 E (cid:107) Z (cid:107) ( Z (cid:62) Z ) + (cid:88) | ( i ,i ,i ) | =1 E (cid:107) Z (cid:107) (cid:105) = N − (cid:2) N ( N − N − p + ( N − N )( p + 2 p )+ 2( N − N ) (cid:0) p + 2 p (cid:1) + N (cid:0) p + 6 p + 8 p (cid:1)(cid:3) = N − (cid:2) ( N p + N p ) + N p + 4( N p + N p ) + 4 N p (cid:3) = p y + p + py + 4( y + y ) + 4 N − y. (4). By deﬁnition, we have E tr ( S ) = N − E (cid:16) N (cid:88) i ,i (cid:48) =1 ( X (cid:62) i X i (cid:48) ) (cid:17) = N − N (cid:88) i ,i (cid:48) ,i ,i (cid:48) =1 E ( X (cid:62) i X i (cid:48) ) ( X (cid:62) i X i (cid:48) ) . (D.1)The right hand side of (D.1) breaks into (cid:80) i =1 A i , where A , A - A , A - A ,and A correspond to the cases where ( i , i (cid:48) , i , i (cid:48) ) take 1 , , , • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 1 value, there are N such summands eachof which take the value E (cid:107) X (cid:107) = p ( p + 2)( p + 4)( p + 6). • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 2 values with ( i = i (cid:48) ) (cid:54) = ( i = i (cid:48) ), there are N ( N −

1) such summands each of which takes the value E (cid:107) X (cid:107) (cid:107) X (cid:107) = (cid:0) E (cid:107) X (cid:107) (cid:1) = p ( p + 2) . • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 2 values with ( i = i ) (cid:54) = ( i (cid:48) = i (cid:48) ) or( i = i (cid:48) ) (cid:54) = ( i = i (cid:48) ), there are 2 N ( N −

1) such summands each of whichtakes the value E ( X (cid:62) X ) ( X (cid:62) X ) = E ( X (cid:62) X ) = E (cid:16) p (cid:88) j =1 X ,j X ,j (cid:17) = p (cid:88) j ,j ,j ,j =1 (cid:0) E X ,j X ,j X ,j X ,j (cid:1) = (cid:88) | ( j ,j ,j ,j ) | =1 (cid:0) E X ,j X ,j X ,j X ,j (cid:1) + (cid:88) | ( j ,j ,j ,j ) | =2 (cid:0) E X ,j X ,j X ,j X ,j (cid:1) = p · + 3 p ( p − · p ( p + 2) . • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 2 values of the form ( i = i = i (cid:48) ) (cid:54) = i (cid:48) orits variants, there are 4 N ( N −

1) such summands each of which takes thevalue E (cid:107) X (cid:107) ( X (cid:62) X ) = E tr( (cid:107) X (cid:107) X X (cid:62) X X (cid:62) ) = E (cid:107) X (cid:107) = p ( p + 2)( p + 4) . • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 3 values of the form ( i = i ) (cid:54) = i (cid:48) (cid:54) = i (cid:48) orits variants, there are 4 N ( N − N −

2) such summands each of whichtakes the value E ( X (cid:62) X ) ( X (cid:62) X ) = E (cid:107) X (cid:107) = p ( p + 2). • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 3 values of the form ( i = i (cid:48) ) (cid:54) = i (cid:54) = i (cid:48) orits variants, there are 2 N ( N − N −

2) such summands each of whichtakes the value E ( X (cid:62) X ) ( X (cid:62) X ) = p · E (cid:107) X (cid:107) = p ( p + 2). • ( A ) When ( i , i (cid:48) , i , i (cid:48) ) take 4 values, there are N ( N − N − N − E ( X (cid:62) X ) ( X (cid:62) X ) = p . IGH DIMENSIONAL COVARIANCE TESTING 77 (5). By deﬁnition, we have E (cid:2) tr( S ) tr( S ) (cid:3) = N − E (cid:16) N (cid:88) i =1 (cid:107) X i (cid:107) (cid:17)(cid:16) N (cid:88) j ,j ,j =1 (cid:0) X (cid:62) j X j (cid:1)(cid:0) X (cid:62) j X j (cid:1)(cid:0) X (cid:62) j X j (cid:1)(cid:17) = N − N (cid:88) i,j ,j ,j =1 E (cid:104) (cid:107) X i (cid:107) (cid:0) X (cid:62) j X j (cid:1)(cid:0) X (cid:62) j X j (cid:1)(cid:0) X (cid:62) j X j (cid:1)(cid:105) . (D.2)The right hand side of (D.2) breaks into (cid:80) i =1 B i , where B , B - B , B - B , and B correspond to the cases where ( i, j , j , j ) take 1 , , , • ( B ) When ( i, j , j , j ) takes 1 value, there are N such summands in(D.2), each of which takes the value E (cid:107) X (cid:107) = p ( p + 2)( p + 4)( p + 6). • ( B ) When ( i, j , j , j ) take 2 values with i (cid:54) = ( j = j = j ), there are N ( N −

1) such summands each of which takes the value E (cid:107) X (cid:107) (cid:107) X (cid:107) = p ( p + 2)( p + 4). • ( B ) When ( i, j , j , j ) take 2 values of the form ( i = j ) (cid:54) = ( j = j ) andits variants, there are 3 N ( N −

1) such summands each of which takes thevalue E (cid:107) X (cid:107) ( X (cid:62) X ) (cid:107) X (cid:107) ( X (cid:62) X ) = E tr (cid:16) (cid:107) X (cid:107) X X (cid:62) · (cid:107) X (cid:107) X X (cid:62) (cid:17) = tr (cid:16) E (cid:107) X (cid:107) X X (cid:62) (cid:17) ∗ ) = tr (cid:2) ( p + 2) I p (cid:3) = p ( p + 2) . Here in ( ∗ ) we use the following fact by direct calculation (cid:16) E (cid:107) X (cid:107) X X (cid:62) (cid:17) k(cid:96) = E (cid:104)(cid:0) p (cid:88) m =1 X ,m (cid:1) X ,k X ,(cid:96) (cid:105) = ( p + 2) δ k(cid:96) . • ( B ) When ( i, j , j , j ) take 2 values of the form ( i = j = j ) (cid:54) = j orits variants, there are 3 N ( N −

1) such summands each of which takes thevalue E (cid:107) X (cid:107) ( X (cid:62) X ) = E (cid:107) X (cid:107) = p ( p + 2)( p + 4). • ( B ) When ( i, j , j , j ) take 3 values of the form ( i = j ) (cid:54) = j (cid:54) = j orits variants, there are 3 N ( N − N −

2) such summands each of whichtakes the value E (cid:107) X (cid:107) ( X (cid:62) X )( X (cid:62) X )( X (cid:62) X ) = E (cid:107) X (cid:107) = p ( p + 2). • ( B ) When ( i, j , j , j ) take 3 values of the form i (cid:54) = ( j = j ) (cid:54) = j orits variants, there are 3 N ( N − N −

2) such summands each of whichtakes the value E (cid:107) X (cid:107) (cid:107) X (cid:107) ( X (cid:62) X ) = E (cid:107) X (cid:107) · E (cid:107) X (cid:107) = p ( p + 2). • ( B ) When ( i, j , j , j ) take 4 values, there are N ( N − N − N − E (cid:2) (cid:107) X (cid:107) ( X (cid:62) X )( X (cid:62) X ) · ( X (cid:62) X ) (cid:3) = p .(6). Let F ( X ) ≡ b ( S ) b ( S ) − b ( S ). Then using for any ( i, j ) ∈ [ N ] × [ p ] ∂ ij b = 2( N p ) − X ij , ∂ ij b = 4( N p ) − X (cid:62) i Se j , ∂ ij b = 6( N p ) − X (cid:62) i S e j , we have ∂ ij F ( X ) = 2( N p ) − X ij · b + 6( N p ) − X (cid:62) i S e j · b − N p ) − X (cid:62) i Se j · b = ( N p ) − (cid:16) b X + 6 bXS − b XS (cid:17) ( ij ) . Hence by the Gaussian-Poincar´e inequality, we have by direct calculationVar( F ( X )) ≤ E (cid:107)∇ F ( X ) (cid:107) F = ( N p ) − E (cid:0) bb + 36 b b + 32 b b − bb b (cid:1) = ( N p ) − E (cid:104)

28 tr( S ) tr ( S ) + 36 tr ( S ) tr( S ) + 32 tr ( S ) tr( S ) −

96 tr( S ) tr( S ) tr( S ) (cid:105) . The rest of the proof follows from similar arguments as in (4) and (5) byexplicit calculation and cancellation of higher order terms; we omit the de-tails.(7). This follows directly from the property of the chi-squared distribution: E tr k ( S (cid:96) ) = N − k(cid:96) E (cid:104) N (cid:88) i ,...,i (cid:96) =1 ( X (cid:62) i X i ) · · · ( X (cid:62) i (cid:96) − X i (cid:96) )( X (cid:62) i (cid:96) X i ) (cid:105) k ≤ N − k(cid:96) E (cid:16) N (cid:88) i ,...,i (cid:96) =1 (cid:107) X i (cid:107) · · · (cid:107) X i (cid:96) (cid:107) (cid:17) k = N − k(cid:96) E (cid:16) N (cid:88) i =1 (cid:107) X i (cid:107) (cid:17) k(cid:96) = N − k(cid:96) E (cid:0) χ ( N p ) (cid:1) k(cid:96) (cid:46) k,(cid:96) p k(cid:96) . The proof is complete. (cid:3)

References [And58] T. W. Anderson,

An introduction to multivariate statistical analysis , WileyPublications in Statistics, John Wiley & Sons, Inc., New York; Chapman &Hall, Ltd., London, 1958.[BBAP05] Jinho Baik, G´erard Ben Arous, and Sandrine P´ech´e,

Phase transition ofthe largest eigenvalue for nonnull complex sample covariance matrices , Ann.Probab. (2005), no. 5, 1643–1697.[BD05] Melanie Birke and Holger Dette, A note on testing the covariance matrix forlarge dimension , Statist. Probab. Lett. (2005), no. 3, 281–289.[BJYZ09] Zhidong Bai, Dandan Jiang, Jian-Feng Yao, and Shurong Zheng, Corrections toLRT on large-dimensional covariance matrix by RMT , Ann. Statist. (2009),no. 6B, 3822–3840.[BLM13] St´ephane Boucheron, G´abor Lugosi, and Pascal Massart, Concentration in-equalities: A nonasymptotic theory of independence , Oxford University Press,Oxford, 2013.[Bog98] Vladimir I. Bogachev,

Gaussian measures , Mathematical Surveys and Mono-graphs, vol. 62, American Mathematical Society, Providence, RI, 1998.[BS04] Zhidong Bai and Jack W. Silverstein,

CLT for linear spectral statistics of large-dimensional sample covariance matrices , Ann. Probab. (2004), no. 1A, 553–605. IGH DIMENSIONAL COVARIANCE TESTING 79 [BV04] Stephen Boyd and Lieven Vandenberghe,

Convex Optimization , CambridgeUniversity Press, Cambridge, 2004.[Cha09] Sourav Chatterjee,

Fluctuations of eigenvalues and second order Poincar´e in-equalities , Probab. Theory Related Fields (2009), no. 1-2, 1–40.[Cha14] ,

Superconcentration and related topics , Springer Monographs in Math-ematics, Springer, Cham, 2014.[CJ18] Huijun Chen and Tiefeng Jiang,

A study of two high-dimensional likelihoodratio tests under alternative hypotheses , Random Matrices Theory Appl. (2018), no. 1, 1750016, 23.[CM13] T. Tony Cai and Zongming Ma, Optimal hypothesis testing for high dimensionalcovariance matrices , Bernoulli (2013), no. 5B, 2359–2388.[CZZ10] Song Xi Chen, Li-Xin Zhang, and Ping-Shou Zhong, Tests for high-dimensionalcovariance matrices , J. Amer. Statist. Assoc. (2010), no. 490, 810–819.[Eat83] Morris L. Eaton,

Multivariate statistics , Wiley Series in Probability and Math-ematical Statistics: Probability and Mathematical Statistics, John Wiley &Sons, Inc., New York, 1983, A vector space approach.[HSS20] Qiyang Han, Bodhisattva Sen, and Yandi Shen,

High dimensional asymptoticsof likelihood ratio tests in gaussian sequence model under convex constraint ,arXiv preprint arXiv:2010.03145 (2020).[Jia19] Tiefeng Jiang,

Determinant of sample correlation matrix with application , Ann.Appl. Probab. (2019), no. 3, 1356–1397.[JJY12] Dandan Jiang, Tiefeng Jiang, and Fan Yang, Likelihood ratio tests for co-variance matrices of high-dimensional normal distributions , J. Statist. Plann.Inference (2012), no. 8, 2241–2256.[Joh71] S. John,

Some optimal multivariate tests , Biometrika (1971), 123–127.[Joh01] Iain M. Johnstone, On the distribution of the largest eigenvalue in principalcomponents analysis , Ann. Statist. (2001), no. 2, 295–327.[JQ15] Tiefeng Jiang and Yongcheng Qi, Likelihood ratio tests for high-dimensionalnormal distributions , Scand. J. Stat. (2015), no. 4, 988–1009.[JY13] Tiefeng Jiang and Fan Yang, Central limit theorems for classical likelihood ratiotests for high-dimensional normal distributions , Ann. Statist. (2013), no. 4,2029–2074.[KL17] Vladimir Koltchinskii and Karim Lounici, Concentration inequalities and mo-ment bounds for sample covariance operators , Bernoulli (2017), no. 1, 110–133.[LW02] Olivier Ledoit and Michael Wolf, Some hypothesis tests for the covariance ma-trix when the dimension is large compared to the sample size , Ann. Statist. (2002), no. 4, 1081–1102.[Maz11] Vladimir Maz’ya, Sobolev spaces with applications to elliptic partial diﬀerentialequations , vol. 342, Springer, Heidelberg, 2011.[Mui82] Robb J. Muirhead,

Aspects of multivariate statistical theory , John Wiley &Sons, Inc., New York, 1982, Wiley Series in Probability and MathematicalStatistics.[Nag73] Hisao Nagao,

On some test criteria for covariance matrix , Ann. Statist. (1973), 700–709.[NP12] Ivan Nourdin and Giovanni Peccati, Normal approximations with Malliavincalculus , Cambridge Tracts in Mathematics, vol. 192, Cambridge UniversityPress, Cambridge, 2012, From Stein’s method to universality.[OMH13] Alexei Onatski, Marcelo J. Moreira, and Marc Hallin,

Asymptotic power ofsphericity tests for high-dimensional data , Ann. Statist. (2013), no. 3, 1204–1231.[OMH14] , Signal detection in high dimension: the multispiked case , Ann. Statist. (2014), no. 1, 225–254. [PY14] Natesh S. Pillai and Jun Yin, Universality of covariance matrices , Ann. Appl.Probab. (2014), no. 3, 935–1001.[RV09] Mark Rudelson and Roman Vershynin, Smallest singular value of a randomrectangular matrix , Comm. Pure Appl. Math. (2009), no. 12, 1707–1739.[Sri05] Muni S. Srivastava, Some tests concerning the covariance matrix in high di-mensional data , J. Japan Statist. Soc. (2005), no. 2, 251–272.[vdV98] Aad van der Vaart, Asymptotic Statistics , Cambridge Series in Statistical andProbabilistic Mathematics, vol. 3, Cambridge University Press, Cambridge,1998.[WY13] Qinwen Wang and Jianfeng Yao,

On the sphericity test with large-dimensionalobservations , Electron. J. Stat. (2013), 2164–2192.[ZBY15] Shurong Zheng, Zhidong Bai, and Jianfeng Yao, Substitution principle for CLTof linear spectral statistics of high-dimensional sample covariance matrices withapplications to hypothesis testing , Ann. Statist. (2015), no. 2, 546–591.(Q. Han) Department of Statistics, Rutgers University, Piscataway, NJ 08854,USA.

Email address : [email protected] (T. Jiang) School of Statistics, University of Minnesota, Minneapolis, MN55455, USA.

Email address : [email protected] (Y. Shen) Department of Statistics, University of Washington, Seattle, WA98105, USA.

Email address ::