Multi-level Thresholding Test for High Dimensional Covariance Matrices
Submitted to the Annals of Statistics
By Song Xi Chen, Bin Guo and Yumou Qiu

Peking University, Southwestern University of Finance and Economics, and Iowa State University

We consider testing the equality of two high-dimensional covariance matrices by carrying out a multi-level thresholding procedure, which is designed to detect sparse and faint differences between the covariances. A novel U-statistic composition is developed to establish the asymptotic distribution of the thresholding statistics, in conjunction with the matrix blocking and the coupling techniques. We propose a multi-thresholding test that is shown to be powerful in detecting sparse and weak differences between two covariance matrices. The test is shown to have an attractive detection boundary and to attain the optimal minimax rate in the signal strength under different regimes of high dimensionality and sparsity of the signal. Simulation studies are conducted to demonstrate the utility of the proposed test.
1. Introduction.
Understanding the dependence among data components is an important goal in high-dimensional data analysis, as different dependence structures lead to different inference procedures. For instance, in Hotelling's test for the mean (Hotelling, 1931) and in Fisher's linear discriminant analysis, the pooled covariance estimate is used under the assumption of a common covariance matrix between the two samples. For high-dimensional data, covariance matrices are utilized in the form of their inverses to enhance the signal strength in the innovated Higher Criticism test for high-dimensional means (Hall and Jin, 2010) and in Gaussian graphical models (Liu, 2013; Ren et al., 2015). In genetic studies, covariances are widely used to understand the interactions among genes, to study functionally related genes (Yi et al., 2007), and to construct and compare co-expression genetic networks (de la Fuente, 2010).

As a multivariate statistical procedure is likely constructed based on a specific dependence structure of the data, testing for the equality of two covariance matrices $\Sigma_1$ and $\Sigma_2$ from two populations has been an enduring task. John (1971), Gupta and Giri (1973), Nagao (1973) and Perlman (1980) presented studies under the conventional fixed-dimensional setting; see Anderson (2003) for a comprehensive review. Modern high-dimensional data have generated a renewed interest under the so-called "large $p$, small $n$" paradigm. For Gaussian data with the dimension $p$ and the sample size $n$ of the same order, Schott (2007) and Srivastava and Yanagihara (2010) proposed two-sample tests based on the distance measure $\|\Sigma_1 - \Sigma_2\|_F^2$, the squared Frobenius norm of the difference between the two covariances. Bai et al. (2009) considered a corrected likelihood ratio test via large-dimensional random matrix theory.

For nonparametric settings without explicitly restricting $p$ and the sample sizes, Li and Chen (2012) proposed an $\ell_2$-test based on a linear combination of U-statistics which is an unbiased estimator of $\|\Sigma_1 - \Sigma_2\|_F^2$. Qiu and Chen (2012) studied an $\ell_2$-test for the bandedness of a covariance. Cai et al. (2013) proposed a test based on the maximal standardized differences (an $\ell_{\max}$-type formulation) between the entries of the two sample covariance matrices. Chang et al. (2017) constructed a simulation-based approach to approximate the distribution of the maximal statistics. Studies have shown that the $\ell_2$-tests are powerful for detecting dense and weak differences in the covariances, while the $\ell_{\max}$-formulation is powerful against sparse and strong signal settings.

Detecting rare and faint signals has attracted much attention in high-dimensional statistical inference. The existing studies have largely concentrated on the mean problems (Fan, 1996; Donoho and Jin, 2004; Delaigle et al., 2011; Zhong et al., 2013; Qiu et al., 2018), while studies for covariance matrices are much fewer. Arias-Castro et al. (2012) investigated near-optimal testing rules for detecting nonzero correlations in a one-sample setting for Gaussian data with clustered nonzero signals.

The aim of this paper is to enhance the power in testing differences between two covariances when the differences are both sparse and faint, which is the most challenging setting for signal detection and brings about the issue of the optimal detection boundary for covariance matrices. We introduce thresholding on the $\ell_2$-formulation of Li and Chen (2012) to remove the non-signal-bearing entries of the covariances, which reduces the overall noise (variance) level of the test statistic and increases the signal-to-noise ratio of the testing problem. The formulation may be viewed as a parallel development to the thresholding methods for detecting differences in the means, for instance the Higher Criticism (HC) test of Donoho and Jin (2004), Hall and Jin (2010) and Delaigle et al. (2011), and the $\ell_2$-thresholding formulation in Fan (1996), Zhong et al. (2013) and Qiu et al. (2018). However, compared with the studies on thresholding tests for the means, there is little work on thresholding tests for covariance matrices beyond a discussion in Donoho and Jin (2015), largely due to the difficulty in treating the dependence among the entries of the sample covariance matrices.

To overcome the theoretical difficulty, we adopt a matrix version of the blocking method to partition the matrix entries into big square blocks separated by small rectangular blocks. The coupling technique is used to construct a U-statistic equivalent to the thresholding test statistic based on the covariance matrix block partition. The equivalent U-statistic formulation allows establishing the martingale central limit theorem (Hall and Heyde, 1980) for the asymptotic distribution of the test statistic. A multi-thresholding test procedure is proposed to make the test adaptive to the unknown signal strength and sparsity. Under the setting of rare and faint differences between the two covariances, the power of the proposed test is studied and its detection boundary is derived, which shows the benefits of the multi-thresholding over the existing two-sample covariance tests.

The paper is organized as follows. We introduce the setting of the covariance testing in Section 2. The thresholding statistic and the multi-level thresholding test are proposed in Sections 3 and 4, with the power and detection boundary established in Section 5. Simulation studies and discussions are presented in Sections 6 and 7, respectively. Proofs and a real data analysis are relegated to the appendix and the supplementary material (SM).

Keywords and phrases: $\beta$-mixing, Covariance matrix, High dimensionality, Detection boundary, Rare and faint signal, Thresholding.
2. Preliminary.
Suppose that there are two independent samples of $p$-dimensional random vectors $X_1, \ldots, X_{n_1} \overset{\text{i.i.d.}}{\sim} F_1$ and $Y_1, \ldots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} F_2$ drawn from two distributions $F_1$ and $F_2$, respectively, where $X_k = (X_{k1}, \ldots, X_{kp})^T$, $Y_k = (Y_{k1}, \ldots, Y_{kp})^T$, $n_1$ and $n_2$ are the sample sizes, and "i.i.d." stands for "independent and identically distributed". Let $\mu_1 = (\mu_{11}, \ldots, \mu_{1p})^T$ and $\mu_2 = (\mu_{21}, \ldots, \mu_{2p})^T$ be the means of $F_1$ and $F_2$, and $\Sigma_1 = (\sigma_{1ij})_{p \times p}$ and $\Sigma_2 = (\sigma_{2ij})_{p \times p}$ be the covariance matrices of $F_1$ and $F_2$, respectively. Let $\Psi_1 = (\rho_{1ij})_{p \times p}$ and $\Psi_2 = (\rho_{2ij})_{p \times p}$ be the corresponding correlation matrices. We consider testing

(2.1)  $H_0: \Sigma_1 = \Sigma_2$ vs. $H_a: \Sigma_1 \ne \Sigma_2$

under a high-dimensional setting where $p \gg n_1, n_2$.

Let $\Delta = \Sigma_1 - \Sigma_2 = (\delta_{ij})$, where $\delta_{ij} = \sigma_{1ij} - \sigma_{2ij}$ are the component-wise differences between $\Sigma_1$ and $\Sigma_2$, $q = p(p+1)/2$, and let $n = n_1 n_2/(n_1 + n_2)$ be the effective sample size of the testing problem.

While hypothesis (2.1) offers all possible alternatives against the equality of the two covariances, we consider in this study a subset of the alternatives that constitutes the most challenging setting, with the number of non-zero $\delta_{ij}$ being rare and the magnitude of the non-zero $\delta_{ij}$ being faint; see Donoho and Jin (2004) and Hall and Jin (2010) for similar settings in the context of testing means. Let $m_a$ denote the number of nonzero $\delta_{ij}$ for $i \le j$. We assume a sparse setting such that $m_a = \lfloor q^{1-\beta} \rfloor$ for a $\beta \in (1/2, 1)$, where $\beta$ is the sparsity parameter and $\lfloor \cdot \rfloor$ is the integer truncation function. We note that $\beta \in (0, 1/2]$ is the dense case, under which the testing is easier. The faintness of the signals is characterized by

(2.2)  $\delta_{ij} = \sqrt{2 r_{0,ij} \log(q)/n} = \sqrt{4 r_{0,ij} \log(p)/n}\,\{1 + o(1)\}$ if $\delta_{ij} \ne 0$

for $r_{0,ij} > 0$. As shown in Theorem 3, $\sqrt{\log(p)/n}$ in (2.2) is the minimum rate for successful signal detection under the sparse setting. Specifically, our analysis focuses on a special case of (2.1) such that

(2.3)  $H_0: \delta_{ij} = 0$ for all $1 \le i \le j \le p$ vs. $H_a$: $m_a = \lfloor q^{1-\beta} \rfloor$ nonzero $\delta_{ij}$ with strength specified in (2.2).

Here, the signal strength $r_{0,ij}$ together with $\beta \in (1/2, 1)$ constitutes the rare and faint signal setting, which has been used to evaluate tests on means and regression coefficients (Donoho and Jin, 2004; Hall and Jin, 2010; Zhong et al., 2013; Qiu et al., 2018). Our proposed test is designed to achieve high power under $H_a$ of (2.3), which offers the most challenging setting for detecting unequal covariances, as shown in Theorem 3.

Hypotheses (2.3) are composite null versus composite alternative. Under the null, although the two covariances are the same, they can take different values; and under the alternative, no prior distribution is assumed on the locations of the nonzero $\delta_{ij}$. This is different from the simple null versus simple alternative setting of Donoho and Jin (2004). The derivation of the optimal detection boundary for such composite hypotheses is more difficult, as shown in the later analysis.

Let $\{\pi_{\ell,p}\}_{\ell=1}^{p!}$ denote all possible permutations of $\{1, \ldots, p\}$, and let $X_k(\pi_{\ell,p})$ and $Y_k(\pi_{\ell,p})$ be the reorderings of $X_k$ and $Y_k$ corresponding to a permutation $\pi_{\ell,p}$. We assume that there is a permutation $\pi_{\ell^*,p}$ such that $X_k(\pi_{\ell^*,p})$ and $Y_k(\pi_{\ell^*,p})$ are weakly dependent, defined via the $\beta$-mixing (Bradley, 2005). As the proposed statistic in (3.1) is of the $\ell_2$-type and is invariant to permutations of $X_k$ and $Y_k$, there is no need to know $\pi_{\ell^*,p}$.

Let $\{X_k\} = \{X_k(\pi_{\ell^*,p})\}$ and $\{Y_k\} = \{Y_k(\pi_{\ell^*,p})\}$ to simplify notation. Let $\mathcal{F}_{m_a}^{m_b}(X_k) = \sigma\{X_{kj}: m_a \le j \le m_b\}$ and $\mathcal{F}_{m_a}^{m_b}(Y_k) = \sigma\{Y_{kj}: m_a \le j \le m_b\}$ be the $\sigma$-fields generated by $\{X_k\}$ and $\{Y_k\}$ for $1 \le m_a \le m_b \le p$. The $\beta$-mixing coefficients are $\zeta_{x,p}(h) = \sup_{1 \le m \le p-h} \zeta\{\mathcal{F}_1^m(X_k), \mathcal{F}_{m+h}^p(X_k)\}$ and $\zeta_{y,p}(h) = \sup_{1 \le m \le p-h} \zeta\{\mathcal{F}_1^m(Y_k), \mathcal{F}_{m+h}^p(Y_k)\}$ (Bradley, 2005), where for two $\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$,

$\zeta(\mathcal{A}, \mathcal{B}) = \frac{1}{2} \sup \sum_{l_1=1}^{u_1} \sum_{l_2=1}^{u_2} \big| P(A_{l_1} \cap B_{l_2}) - P(A_{l_1}) P(B_{l_2}) \big|.$

Here, the supremum is taken over all finite partitions $\{A_{l_1} \in \mathcal{A}\}_{l_1=1}^{u_1}$ and $\{B_{l_2} \in \mathcal{B}\}_{l_2=1}^{u_2}$ of the sample space, and $u_1, u_2 \in \mathbb{Z}^+$, the set of positive integers.

Let $\bar{X} = \sum_{k=1}^{n_1} X_k/n_1$ and $\bar{Y} = \sum_{k=1}^{n_2} Y_k/n_2$ be the two sample means, where $\bar{X} = (\bar{X}_1, \ldots, \bar{X}_p)^T$ and $\bar{Y} = (\bar{Y}_1, \ldots, \bar{Y}_p)^T$. Let

$\hat{\Sigma}_1 = (\hat{\sigma}_{1ij}) = \frac{1}{n_1}\sum_{k=1}^{n_1} (X_k - \bar{X})(X_k - \bar{X})^T \quad \text{and} \quad \hat{\Sigma}_2 = (\hat{\sigma}_{2ij}) = \frac{1}{n_2}\sum_{k=1}^{n_2} (Y_k - \bar{Y})(Y_k - \bar{Y})^T,$

and $\kappa = \lim_{n_1, n_2 \to \infty} n_1/(n_1 + n_2)$. Moreover, let $\theta_{1ij} = \mathrm{var}\{(X_{ki} - \mu_{1i})(X_{kj} - \mu_{1j})\}$ and $\theta_{2ij} = \mathrm{var}\{(Y_{ki} - \mu_{2i})(Y_{kj} - \mu_{2j})\}$;

$\rho^{(1)}_{ij,lm} = \mathrm{Cor}\{(X_{ki} - \mu_{1i})(X_{kj} - \mu_{1j}),\ (X_{kl} - \mu_{1l})(X_{km} - \mu_{1m})\}$, and $\rho^{(2)}_{ij,lm} = \mathrm{Cor}\{(Y_{ki} - \mu_{2i})(Y_{kj} - \mu_{2j}),\ (Y_{kl} - \mu_{2l})(Y_{km} - \mu_{2m})\}$.

Both $\theta_{1ij}$ and $\theta_{2ij}$ can be estimated by

$\hat{\theta}_{1ij} = \frac{1}{n_1}\sum_{k=1}^{n_1}\{(X_{ki} - \bar{X}_i)(X_{kj} - \bar{X}_j) - \hat{\sigma}_{1ij}\}^2 \quad \text{and} \quad \hat{\theta}_{2ij} = \frac{1}{n_2}\sum_{k=1}^{n_2}\{(Y_{ki} - \bar{Y}_i)(Y_{kj} - \bar{Y}_j) - \hat{\sigma}_{2ij}\}^2.$

As $\hat{\theta}_{1ij}/n_1 + \hat{\theta}_{2ij}/n_2$ is ratio-consistent to the variance of $\hat{\sigma}_{1ij} - \hat{\sigma}_{2ij}$, we define a standardized difference between $\hat{\sigma}_{1ij}$ and $\hat{\sigma}_{2ij}$ as $M_{ij} = F_{ij}^2$ for

$F_{ij} = \frac{\hat{\sigma}_{1ij} - \hat{\sigma}_{2ij}}{(\hat{\theta}_{1ij}/n_1 + \hat{\theta}_{2ij}/n_2)^{1/2}}, \quad 1 \le i \le j \le p.$

Cai et al. (2013) proposed a maximum statistic $M_n = \max_{1 \le i \le j \le p} M_{ij}$ that targets the largest signal between $\Sigma_1$ and $\Sigma_2$. Li and Chen (2012) proposed an $\ell_2$-test that aims at $\|\Sigma_1 - \Sigma_2\|_F^2$. Donoho and Jin (2015) briefly discussed the possibility of applying the Higher Criticism (HC) statistic for testing $H_0: \Sigma = I_p$ with Gaussian data.

We are to propose a test by carrying out multi-level thresholding on $\{M_{ij}\}$ to filter out potential signals via an $\ell_2$-formulation, and show that such thresholding leads to a more powerful test than both the maximum test and the $\ell_2$-type tests when the signals are rare and faint.
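For readers who wish to experiment, the standardized differences $M_{ij} = F_{ij}^2$ above translate directly into code. The following is a minimal numpy sketch under the definitions of this section; the function name `standardized_differences` is ours and not from the paper:

```python
import numpy as np

def standardized_differences(X, Y):
    # M_ij = F_ij^2 with
    # F_ij = (hat sigma_{1,ij} - hat sigma_{2,ij}) / sqrt(hat theta_{1,ij}/n1 + hat theta_{2,ij}/n2),
    # using the divisor-n_i sample covariance estimators defined in Section 2.
    n1, p = X.shape
    n2 = Y.shape[0]
    Xc = X - X.mean(axis=0)          # column-centered first sample
    Yc = Y - Y.mean(axis=0)          # column-centered second sample
    S1 = Xc.T @ Xc / n1              # hat Sigma_1
    S2 = Yc.T @ Yc / n2              # hat Sigma_2
    # hat theta_{1,ij} = n1^{-1} sum_k {(X_ki - Xbar_i)(X_kj - Xbar_j) - hat sigma_{1,ij}}^2
    #                  = n1^{-1} sum_k (Xc_ki Xc_kj)^2 - hat sigma_{1,ij}^2
    Th1 = (Xc**2).T @ (Xc**2) / n1 - S1**2
    Th2 = (Yc**2).T @ (Yc**2) / n2 - S2**2
    F = (S1 - S2) / np.sqrt(Th1 / n1 + Th2 / n2)
    return F**2
```

Only the upper triangle $1 \le i \le j \le p$ of the returned matrix is used by the test statistics below; the matrix is symmetric by construction.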
3. Thresholding statistics for covariance matrices.
By the moderate deviation result in Lemma 2 in the SM, under Assumptions 1A (or 1B), 2, 3 and $H_0$ of (2.1), $P\{\max_{1 \le i \le j \le p} M_{ij} > 4\log(p)\} \to 0$ as $n, p \to \infty$. This implies that a threshold level of $4\log(p)$ is asymptotically too large under the null hypothesis, and suggests a smaller threshold $\lambda_p(s) = 4 s \log(p)$ for a thresholding parameter $s \in (0, 1)$. This leads to the thresholding statistic

(3.1)  $T_n(s) = \sum_{1 \le i \le j \le p} M_{ij}\, I\{M_{ij} > \lambda_p(s)\},$

where $I(\cdot)$ denotes the indicator function.

Statistic $T_n(s)$ removes the small standardized differences $M_{ij}$ between $\hat{\Sigma}_1$ and $\hat{\Sigma}_2$. Compared with the $\ell_2$-statistic of Li and Chen (2012), $T_n(s)$ keeps only large $M_{ij}$ after filtering out the potentially insignificant ones. By removing the smaller $M_{ij}$'s, the variance of $T_n(s)$ is much reduced from that of Li and Chen (2012), which translates to a larger power, as shown in the next section. Compared to the $\ell_{\max}$-test of Cai et al. (2013), whose power is determined by the maximum of the $M_{ij}$, the thresholding statistic uses not only the largest $M_{ij}$ but all relatively large entries. This enhances the ability to detect weak signals, as reflected in the power and the detection boundary in Section 5.

Let $C$ be a positive constant whose value may change in the context. For two real sequences $\{a_n\}$ and $\{b_n\}$, $a_n \sim b_n$ means that there are two positive constants $c_1$ and $c_2$ such that $c_1 \le a_n/b_n \le c_2$ for all $n$. We make the following assumptions in our analysis.

Assumption 1A. As $n \to \infty$, $p \to \infty$ and $\log p \sim n^{\varpi}$ for a $\varpi \in (0, 1/5)$.

Assumption 1B. As $n \to \infty$, $p \to \infty$ and $n \sim p^{\xi}$ for a $\xi \in (0, 2)$.

Assumption 2. There exists a positive constant $\tau$ such that

(3.2)  $\tau < \min_{1 \le i \le p}\{\sigma_{1ii}, \sigma_{2ii}\} \le \max_{1 \le i \le p}\{\sigma_{1ii}, \sigma_{2ii}\} < \tau^{-1}$ and

(3.3)  $\min_{i,j}\{\theta_{1ij}/(\sigma_{1ii}\sigma_{1jj}),\ \theta_{2ij}/(\sigma_{2ii}\sigma_{2jj})\} > \tau.$

Assumption 3. There exist positive constants $\eta_0$ and $C$ such that for all $|t| < \eta_0$, $E[\exp\{t(X_{ki} - \mu_{1i})^2\}] \le C$ and $E[\exp\{t(Y_{ki} - \mu_{2i})^2\}] \le C$ for $i = 1, \ldots, p$.

Assumption 4. There exists a small positive constant $\rho$ such that

(3.4)  $\max\{|\rho_{1ij}|, |\rho_{2ij}|\} < 1 - \rho$ for any $i \ne j$, and $\max\{|\rho^{(1)}_{ij,lm}|, |\rho^{(2)}_{ij,lm}|\} < 1 - \rho$ for any $(i,j) \ne (l,m)$.

Assumption 5. There is a permutation $\pi_{\ell^*,p}$ of the data sequences $\{X_{kj}\}_{j=1}^p$ and $\{Y_{kj}\}_{j=1}^p$ such that the permuted sequences are $\beta$-mixing with the mixing coefficients satisfying $\max\{\zeta_{x,p}(h), \zeta_{y,p}(h)\} \le C\gamma^h$ for a constant $\gamma \in (0, 1)$, any $p \in \mathbb{Z}^+$ and positive integer $h \le p - 1$.

Assumptions 1A and 1B prescribe the exponential and polynomial growth of $p$ relative to $n$, respectively. Assumption 2 prescribes that $\theta_{1ij}$ and $\theta_{2ij}$ are bounded away from zero to ensure that the denominators of the $M_{ij}$ are bounded away from zero with probability approaching 1. Assumption 3 assumes that the distributions of $X_{ki}$ and $Y_{ki}$ are sub-Gaussian. Sub-Gaussianity is commonly assumed in the high-dimensional literature (Bickel and Levina, 2008a; Cai et al., 2013; Xue et al., 2012). Assumption 4 regulates the correlations among the variables in $X_k$ and $Y_k$, and subsequently the correlations among $\{F_{ij}\}$, where $M_{ij} = F_{ij}^2$.

The $\beta$-mixing Assumption 5 is made for the unknown variable permutation $\pi_{\ell^*,p}$. Similar mixing conditions for the column-wise dependence were made in Delaigle et al. (2011) and Zhong et al. (2013) for thresholding tests of means. If $\{X_{kj}\}_{j=1}^p$ and $\{Y_{kj}\}_{j=1}^p$ are both Markov chains (the vector sequences under the variable permutation), Theorem 3.3 in Bradley (2005) provides conditions for the processes being $\beta$-mixing. If $\{X_{kj}\}_{j=1}^p$ and $\{Y_{kj}\}_{j=1}^p$ are linear processes with i.i.d. innovation processes $\{\epsilon_{x,kj}\}_{j=1}^p$ and $\{\epsilon_{y,kj}\}_{j=1}^p$, which include the ARMA processes as a special case, then they are $\beta$-mixing provided the innovation processes are absolutely continuous (Mokkadem, 1988). The latter condition is particularly weak. Under the Gaussian distribution, any covariance that matches the covariance of an ARMA process up to a permutation will be $\beta$-mixing. Furthermore, normally distributed data with banded covariance or block diagonal covariance after certain variable permutations also satisfy this assumption. The $\beta$-mixing coefficients are assumed to decay at an exponential rate in Assumption 5 to simplify the proofs, while arithmetic rates can be entertained at the expense of more technical details.

There are implications of the $\beta$-mixing on $\sigma_{1ij}$ and $\sigma_{2ij}$ due to Davydov's inequality, which potentially restricts the signal level $\delta_{ij} = \sqrt{2 r_{0,ij}\log(q)/n}$. However, as the $\beta$-mixing is assumed for the unknown permutation $\pi_{\ell^*,p}$, which is likely not the ordering of the observed data, the restriction would be minimal. In the unlikely event that the observed order of the data matches that under $\pi_{\ell^*,p}$, the $\beta$-mixing would imply that the signals appear near the main diagonals of $\Sigma_1$ and $\Sigma_2$. However, as the power of the test is determined by the detectable signal strength at or larger than the order $\sqrt{\log(q)/n}$, the effect of the $\beta$-mixing on the alternative hypothesis and the power is limited as long as there exists a portion of differences with standardized strength above the detection boundary $\rho^*(\beta, \xi)$ established in Propositions 3 and 4.

Let $\mu_{T_n,0}(s)$ and $\sigma^2_{T_n,0}(s)$ be the mean and variance of the thresholding statistic $T_n(s)$, respectively, under $H_0$. Let $\phi(\cdot)$ and $\bar{\Phi}(\cdot)$ be the density and survival functions of $N(0,1)$, and recall $q = p(p+1)/2$. The following proposition provides expansions of $\mu_{T_n,0}(s)$ and $\sigma^2_{T_n,0}(s)$.

Proposition 1. Under Assumptions 1A or 1B and Assumptions 2-5, we have $\mu_{T_n,0}(s) = \tilde{\mu}_{T_n,0}(s)\{1 + O(\lambda_p^{3/2}(s) n^{-1/2})\}$, where

$\tilde{\mu}_{T_n,0}(s) = q\{2\lambda_p^{1/2}(s)\phi(\lambda_p^{1/2}(s)) + 2\bar{\Phi}(\lambda_p^{1/2}(s))\}.$

In addition, under either (i) Assumption 1A with $s > 1/2$ or (ii) Assumption 1B with $s > 1/2 - \xi/4$, $\sigma^2_{T_n,0}(s) = \tilde{\sigma}^2_{T_n,0}(s)\{1 + o(1)\}$, where

$\tilde{\sigma}^2_{T_n,0}(s) = q\big[2\{\lambda_p^{3/2}(s) + 3\lambda_p^{1/2}(s)\}\phi(\lambda_p^{1/2}(s)) + 6\bar{\Phi}(\lambda_p^{1/2}(s))\big].$

From Proposition 1, we see that the main orders $\tilde{\mu}_{T_n,0}(s)$ and $\tilde{\sigma}^2_{T_n,0}(s)$ of $\mu_{T_n,0}(s)$ and $\sigma^2_{T_n,0}(s)$ are known and are solely determined by $p$ and $s$, and hence can be readily used to estimate the mean and variance of $T_n(s)$. The smaller-order term $\lambda_p^{3/2}(s) n^{-1/2}$ in $\mu_{T_n,0}(s)$ is useful in analyzing the performance of the thresholding test, as in (4.2) later. Compared to the variance of the thresholding statistic on the means (Zhong et al., 2013), the exact main order $\tilde{\sigma}^2_{T_n,0}(s)$ of $\sigma^2_{T_n,0}(s)$ requires a minimum bound on the threshold levels, which is due to the more complex dependence among $\{M_{ij} I(M_{ij} > \lambda_p(s))\}$. More discussion regarding this is provided after Theorem 1.

Next, we derive the asymptotic distribution of $T_n(s)$ at a given $s$. The testing for the covariances involves a more complex dependence structure than those in time series and spatial data. In particular, although the data vector is $\beta$-mixing under a permutation, the vectorization of $(M_{ij})_{p \times p}$ is not necessarily a mixing sequence, as the sample covariances in the same row or column are dependent since they share common segments of data. As a result, the conventional blocking plus coupling approach (Berbee, 1979) for mixing series is insufficient to establish the asymptotic distribution of $T_n(s)$.

To tackle the challenge, we first use a combination of the matrix blocking, as illustrated in Figure 4 in the Appendix, and the coupling method. Due to the circular dependence of the sample covariances, this only produces independence among the big matrix blocks with non-overlapping indices, and those matrix blocks that share common indices are still dependent. To respect this reality, we introduce a novel U-statistic representation (A.6), which allows the use of the martingale central limit theorem on the U-statistic representation to attain the asymptotic normality of $T_n(s)$.

Theorem 1. Suppose Assumptions 2-5 are satisfied. Then, under $H_0$ of (2.1), and either (i) Assumption 1A with $s > 1/2$ or (ii) Assumption 1B with $s > 1/2 - \xi/4$, we have

$\sigma^{-1}_{T_n,0}(s)\{T_n(s) - \mu_{T_n,0}(s)\} \overset{d}{\to} N(0, 1)$ as $n, p \to \infty$.

As the dependence between $M_{i_1 j_1} I\{M_{i_1 j_1} > \lambda_p(s)\}$ and $M_{i_2 j_2} I\{M_{i_2 j_2} > \lambda_p(s)\}$ decreases as the threshold level $s$ increases, the restriction on $s$ in Theorem 1 is to control the dependence among the thresholded sample covariances in $T_n(s)$. Under Assumption 1B, which prescribes the polynomial growth of $p$, the minimum threshold level that guarantees the Gaussian limit of $T_n(s)$ can be chosen as close to 0 as $\xi$ approaches 2. Compared to the thresholding statistic on the means (Zhong et al., 2013), the thresholding on the covariance matrices requires a larger threshold level in order to control the dependence among the entries of the sample covariances.
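As a concrete sketch, the single-level statistic $T_n(s)$ and the main-order null moments of Proposition 1 can be coded directly. This is an illustrative numpy implementation (the names `T_stat` and `null_moments` are ours):

```python
import numpy as np
from math import erfc, log, pi, sqrt, exp

def phi(t):
    # standard normal density
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def surv(t):
    # standard normal survival function, \bar{Phi}(t)
    return 0.5 * erfc(t / sqrt(2.0))

def T_stat(M, s, p):
    # T_n(s): sum of M_ij over 1 <= i <= j <= p exceeding lambda_p(s) = 4 s log(p)
    lam = 4.0 * s * log(p)
    m = M[np.triu_indices(p)]
    return float(m[m > lam].sum())

def null_moments(s, p):
    # Main-order null mean and variance from Proposition 1, with t = lambda_p^{1/2}(s):
    #   mu  = q {2 t phi(t) + 2 surv(t)},
    #   var = q [2 (t^3 + 3 t) phi(t) + 6 surv(t)],  q = p(p+1)/2.
    q = p * (p + 1) / 2.0
    t = sqrt(4.0 * s * log(p))
    mu = q * (2.0 * t * phi(t) + 2.0 * surv(t))
    var = q * (2.0 * (t**3 + 3.0 * t) * phi(t) + 6.0 * surv(t))
    return mu, var
```

The closed forms in `null_moments` follow from $E\{Z^2 I(Z^2 > t^2)\} = 2t\phi(t) + 2\bar{\Phi}(t)$ for $Z \sim N(0,1)$, which is how the main orders depend only on $p$ and $s$.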
4. Multi-Thresholding test.
To formulate the multi-thresholding test, we first construct a single-level thresholding test based on Theorem 1. From Proposition 1, we note that $\tilde{\sigma}_{T_n,0}(s)/\sigma_{T_n,0}(s) \to 1$. Let $\hat{\mu}_{T_n,0}(s)$ be an estimate of $\mu_{T_n,0}(s)$ that satisfies

(4.1)  $\hat{\mu}_{T_n,0}(s) - \mu_{T_n,0}(s) = o_p\{\tilde{\sigma}_{T_n,0}(s)\}.$

By Slutsky's theorem, under (4.1), the conclusion of Theorem 1 is still valid if $\mu_{T_n,0}(s)$ and $\sigma_{T_n,0}(s)$ are replaced by $\hat{\mu}_{T_n,0}(s)$ and $\tilde{\sigma}_{T_n,0}(s)$, respectively. A natural choice of $\hat{\mu}_{T_n,0}(s)$ is the main-order term $\tilde{\mu}_{T_n,0}(s)$ given in Proposition 1. According to the expansion of $\mu_{T_n,0}(s)$,

(4.2)  $\dfrac{\mu_{T_n,0}(s) - \tilde{\mu}_{T_n,0}(s)}{\tilde{\sigma}_{T_n,0}(s)} = O\{\lambda_p^{5/4}(s)\, p^{1-s}\, n^{-1/2}\},$

which converges to zero under Assumption 1B and $s > 1 - \xi/2$. Therefore, we reject the null hypothesis of (2.1) if

(4.3)  $T_n(s) > \tilde{\mu}_{T_n,0}(s) + z_{\alpha}\,\tilde{\sigma}_{T_n,0}(s),$

where $z_{\alpha}$ is the upper $\alpha$ quantile of $N(0,1)$, for a given threshold level $s$.

It is noted that Condition (4.1) is to simplify the analysis of the thresholding statistic. When estimators satisfying (4.1) are not available, we may choose $\hat{\mu}_{T_n,0}(s) = \tilde{\mu}_{T_n,0}(s)$, while the lower threshold bound has to be chosen as $1 - \xi/2$. More accurate estimators of $\mu_{T_n,0}(s)$ can be constructed by establishing higher-order expansions for $\mu_{T_n,0}(s)$ and then correcting for the bias empirically. Delaigle et al. (2011) found that more precise moderate deviation results can be derived for bootstrap-calibrated t-statistics, which provides a more accurate estimator for the mean.

Existing works (Donoho and Jin, 2004; Delaigle et al., 2011) have shown that, for detecting rare and faint signals in means, a single-level thresholding cannot make the testing procedure adaptive to the unknown signal strength and sparsity. However, utilizing many thresholding levels can capture the underlying sparse and faint signals. This is the path we take for the covariance testing problem.

Let $\hat{T}_n(s) = \tilde{\sigma}^{-1}_{T_n,0}(s)\{T_n(s) - \hat{\mu}_{T_n,0}(s)\}$ be the standardization of $T_n(s)$. We construct a multi-level thresholding statistic by maximizing $\hat{T}_n(s)$ over a range of thresholds. This is in the same spirit as the HC test of Donoho and Jin (2004) and the multi-thresholding test of Zhong et al. (2013) for the means. Define the multi-level thresholding statistic

(4.4)  $V_n(s_0) = \sup_{s \in \mathcal{S}(s_0)} \hat{T}_n(s),$

where $\mathcal{S}(s_0) = (s_0, 1 - \eta]$ for a lower bound $s_0$ and an arbitrarily small positive constant $\eta$. From Theorem 1, a choice of $s_0$ is either $1/2$ or $1/2 - \xi/4$, corresponding to $p$ having the exponential or polynomial growth, respectively. Define

(4.5)  $\mathcal{S}_n(s_0) = \{s_{ij}: s_{ij} = M_{ij}/(4\log(p)) \text{ and } s_0 < s_{ij} \le 1 - \eta\}.$

Since both $\hat{\mu}_{T_n,0}(s)$ and $\tilde{\sigma}_{T_n,0}(s)$ are monotone decreasing in $s$, $V_n(s_0)$ can be attained on $\mathcal{S}_n(s_0)$ such that

(4.6)  $V_n(s_0) = \sup_{s \in \mathcal{S}_n(s_0)} \hat{T}_n(s).$

This reduces the computation to a finite number of threshold levels. The asymptotic distribution of $V_n(s_0)$ is given in the following theorem.

Theorem 2. Suppose the conditions of Theorem 1 and (4.1) hold. Then, under $H_0$ of (2.1),

$P\{a(\log(p))\, V_n(s_0) - b(\log(p), s_0, \eta) \le x\} \to \exp(-e^{-x}),$

where $a(y) = (2\log(y))^{1/2}$ and $b(y, s_0, \eta) = 2\log(y) + 2^{-1}\log\log(y) - 2^{-1}\log(\pi) + \log(1 - s_0 - \eta)$.

This leads to an asymptotic $\alpha$-level multi-thresholding test (MTT) that rejects $H_0$ if

(4.7)  $V_n(s_0) > \{q_{\alpha} + b(\log(p), s_0, \eta)\}/a(\log(p)),$

where $q_{\alpha}$ is the upper $\alpha$ quantile of the Gumbel distribution. The test is adaptive to the unknown signal strength and sparsity, as revealed in the next section. However, the convergence of $V_n(s_0)$ to its limit can be slow, which may cause a certain degree of size distortion. To speed up the convergence, we present a parametric bootstrap procedure with estimated covariances to approximate the null distribution of $V_n(s_0)$ in Section 6.
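The multi-level statistic (4.4)-(4.6) and the rejection rule (4.7) can be sketched as follows. This is an illustrative numpy implementation that plugs the main-order moments of Proposition 1 into the Gumbel calibration of Theorem 2; the names `V_stat` and `mtt_critical_value` are ours, and in practice the bootstrap calibration of Section 6 would replace the asymptotic critical value:

```python
import numpy as np
from math import erfc, log, pi, sqrt, exp

def _phi(t):
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def _surv(t):
    return 0.5 * erfc(t / sqrt(2.0))

def V_stat(M, p, s0, eta=0.05):
    # V_n(s0) of (4.4)-(4.6): maximize the standardized T_n(s) over the
    # data-driven grid S_n(s0), i.e. thresholds lambda = M_ij with
    # s_ij = M_ij / (4 log p) in (s0, 1 - eta].
    q = p * (p + 1) / 2.0
    logp4 = 4.0 * log(p)
    m = M[np.triu_indices(p)]
    lams = np.unique(m[(m > s0 * logp4) & (m <= (1.0 - eta) * logp4)])
    V = -np.inf
    for lam in lams:
        t = sqrt(lam)                                   # lambda^{1/2}
        mu = q * (2.0 * t * _phi(t) + 2.0 * _surv(t))   # tilde mu_{T_n,0}(s)
        sd = sqrt(q * (2.0 * (t**3 + 3.0 * t) * _phi(t) + 6.0 * _surv(t)))
        # ">=" realizes the supremum from the left of each grid point
        T = float(m[m >= lam].sum())
        V = max(V, (T - mu) / sd)
    return V

def mtt_critical_value(p, s0, eta=0.05, alpha=0.05):
    # Rejection threshold in (4.7): {q_alpha + b(log p, s0, eta)} / a(log p),
    # with a(.), b(.) taken from the Gumbel limit of Theorem 2 as stated above.
    y = log(p)
    a = sqrt(2.0 * log(y))
    b = 2.0 * log(y) + 0.5 * log(log(y)) - 0.5 * log(pi) + log(1.0 - s0 - eta)
    q_alpha = -log(-log(1.0 - alpha))  # upper-alpha Gumbel quantile
    return (q_alpha + b) / a
```

The test rejects $H_0$ when `V_stat(M, p, s0)` exceeds `mtt_critical_value(p, s0, alpha=alpha)`.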
5. Power and detection boundary.
We evaluate the power performance of the proposed thresholding test (4.7) under the alternative hypothesis (2.3) by deriving its detection boundary, and demonstrate its superiority over the $\ell_2$-type and $\ell_{\max}$-type tests.

A detection boundary is a phase transition diagram in terms of the signal strength and sparsity parameters $(r, \beta)$. We first outline the notion in the context of testing for high-dimensional means. Donoho and Jin (2004) considered testing hypotheses for the means of $p$ independent $N(\mu_j, 1)$ random variables:

(5.1)  $H_0^{(m)}: \mu_j = 0$ for all $j$ vs. $H_a^{(m)}: \mu_1, \ldots, \mu_p \overset{\text{i.i.d.}}{\sim} (1-\epsilon)\nu_0 + \epsilon\nu_{\mu_a},$

where $\epsilon = p^{-\beta}$, $\mu_a = \sqrt{2 r \log(p)}$, $\beta \in (0, 1)$ and $r \in (0, 1)$, and $\nu_0$ and $\nu_{\mu_a}$ denote the point mass distributions at $0$ and $\mu_a$, respectively. The high dimensionality is reflected by $p \to \infty$. Let

(5.2)  $\rho(\beta) = \begin{cases} \max\{0,\ \beta - 1/2\} & \text{if } 0 < \beta \le 3/4, \\ (1 - \sqrt{1 - \beta})^2 & \text{if } 3/4 < \beta < 1. \end{cases}$

It is known that $r = \rho(\beta)$ is the optimal detection boundary for hypotheses (5.1) under the Gaussian distributed data setting of Donoho and Jin (2004), in the sense that (i) for any test of hypothesis (5.1),

(5.3)  $P(\text{Reject } H_0^{(m)} \mid H_0^{(m)}) + P(\text{Not reject } H_0^{(m)} \mid H_a^{(m)}) \to 1$ if $r < \rho(\beta)$;

and (ii) there exists a test such that

(5.4)  $P(\text{Reject } H_0^{(m)} \mid H_0^{(m)}) + P(\text{Not reject } H_0^{(m)} \mid H_a^{(m)}) \to 0$ if $r > \rho(\beta)$,

as $n, p \to \infty$. Donoho and Jin (2004) showed that the HC test attains this detection boundary, and thus is optimal. They also derived phase transition diagrams for non-Gaussian data. See Zhong et al. (2013) and Qiu et al. (2018) for other constructions for testing means and regression coefficients that also have $r = \rho(\beta)$ as the detection boundary, which is not necessarily optimal under nonparametric data distributions.

Define the standardized signal strength

(5.5)  $r_{ij} = r_{0,ij}/\{(1-\kappa)\theta_{1ij} + \kappa\theta_{2ij}\}$ for $\sigma_{1ij} \ne \sigma_{2ij},$

by recognizing that the denominator is the main-order term of the variance of $\sqrt{n}(\hat{\sigma}_{1ij} - \hat{\sigma}_{2ij})$. Under Gaussian distributions, $\theta_{1ij} = \sigma_{1ii}\sigma_{1jj} + \sigma_{1ij}^2$ and $\theta_{2ij} = \sigma_{2ii}\sigma_{2jj} + \sigma_{2ij}^2$. Under the alternative hypothesis in (2.3), since the difference between $\sigma_{1ij}$ and $\sigma_{2ij}$ is at most of the order $\sqrt{\log(p)/n}$, we have $r_{ij} = r_{0,ij}/(\sigma_{1ii}\sigma_{1jj} + \sigma_{1ij}^2)\{1 + O(\sqrt{\log(p)/n})\}$.

Define the maximal and minimal standardized signal strengths

(5.6)  $\bar{r} = \max_{(i,j):\, \sigma_{1ij} \ne \sigma_{2ij}} r_{ij}$ and $\underline{r} = \min_{(i,j):\, \sigma_{1ij} \ne \sigma_{2ij}} r_{ij}.$

Let

$\mathcal{C}(\beta, \bar{r}, \underline{r}) = \big\{(\Sigma_1, \Sigma_2): \text{under } H_a \text{ of (2.3) such that } m_a = \lfloor q^{1-\beta} \rfloor$, the maximal and minimal standardized signal strengths are $\bar{r}$ and $\underline{r}$, respectively, and Assumptions 2, 4 and 5 are satisfied$\big\}$

be the class of covariance matrices with sparse and weak differences. For any $(\Sigma_1, \Sigma_2) \in \mathcal{C}(\beta, \bar{r}, \underline{r})$, let $\mu_{T_n,1}(s)$ and $\sigma^2_{T_n,1}(s)$ be the mean and variance of $T_n(s)$ under $H_a$ in (2.3), and let

$\text{Power}_n(\Sigma_1, \Sigma_2) = P\big[V_n(s_0) > \{q_{\alpha} + b(\log(p), s_0, \eta)\}/a(\log(p)) \mid \Sigma_1, \Sigma_2\big]$

be the power of the MTT in (4.7). Let $\text{SNR}(s) = \{\mu_{T_n,1}(s) - \mu_{T_n,0}(s)\}/\sigma_{T_n,1}(s)$ be the signal-to-noise ratio under $H_a$ in (2.3). Note that

$V_n(s_0) = \max_{s \in \mathcal{S}(s_0)} \frac{\sigma_{T_n,1}(s)}{\tilde{\sigma}_{T_n,0}(s)}\Big\{\frac{T_n(s) - \mu_{T_n,1}(s)}{\sigma_{T_n,1}(s)} - \frac{\hat{\mu}_{T_n,0}(s) - \mu_{T_n,0}(s)}{\sigma_{T_n,1}(s)} + \text{SNR}(s)\Big\}.$

Thus, the power of the MTT is critically determined by $\text{SNR}(s)$.

The next proposition gives the mean and variance of $T_n(s)$ under $H_a$ of (2.3) with the same standardized signal strength $r^*$, corresponding to the cases where $r_{ij} = \bar{r}$ for all $\sigma_{1ij} \ne \sigma_{2ij}$ ($r^* = \bar{r}$) and $r_{ij} = \underline{r}$ for all $\sigma_{1ij} \ne \sigma_{2ij}$ ($r^* = \underline{r}$). Let $L_p$ be a multi-$\log(p)$ term which may change in the context.

Proposition 2. Under Assumptions 1A or 1B, 2-5 and $H_a$ in (2.3) with $r_{ij} = r^*$ for all $\sigma_{1ij} \ne \sigma_{2ij}$, $\mu_{T_n,1}(s) = \mu_{T_n,0}(s) + \mu_{T_n,a}(s)$, where

$\mu_{T_n,a}(s) = L_p\, q^{1-\beta}\, I(s < r^*) + L_p\, q^{1-\beta}\, p^{-2(\sqrt{s} - \sqrt{r^*})^2}\, I(s > r^*).$

In addition, under either (i) Assumption 1A with $s > 1/2$ or (ii) Assumption 1B with $s > 1/2 - \xi/4$,

$\sigma^2_{T_n,1}(s) = L_p\, q^{1-\beta}\, p^{-2(\sqrt{s} - \sqrt{r^*})^2}\, I(s > r^*) + L_p\, q^{1-\beta}\, I(s < r^*) + L_p\, q\, p^{-2s}.$

From Proposition 2, via the maximal and minimal signal strengths defined in (5.6), the detection boundary of the proposed MTT is established in Propositions 3 and 4 below. As shown in the previous section, a lower threshold bound $s_0$ is needed to control the dependence among the entries of the sample covariance matrices. The restriction on the threshold levels leads to a slightly higher detection boundary compared with that given in (5.2). Before proceeding further, let us define a family of detection boundaries indexed by $\xi \in [0, 2]$ that connects $p$ and $n$ via $n \sim p^{\xi}$:

(5.7)  $\rho^*(\beta, \xi) = \begin{cases} \frac{1}{4}\big(\sqrt{2 - \xi} - \sqrt{3 - 4\beta - \xi/2}\big)^2, & 1/2 < \beta \le 5/8 - \xi/16, \\ \beta - 1/2, & 5/8 - \xi/16 < \beta \le 3/4, \\ (1 - \sqrt{1 - \beta})^2, & 3/4 < \beta < 1. \end{cases}$

It is noted that the phase diagrams $\rho^*(\beta, \xi)$ are only defined over $\beta \in (1/2, 1)$, and that $\rho^*(\beta, \xi) \ge \rho(\beta)$ for $\beta \in (1/2, 1)$ and any $\xi \in [0, 2]$. Note that $n \sim p^{\xi}$ for $\xi \in (0, 2)$ is as prescribed in Assumption 1B, a case considered in Delaigle et al. (2011) in the context of mean testing.
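To make the relationship between $\rho^*(\beta, \xi)$ and $\rho(\beta)$ concrete, the two boundaries can be coded directly from (5.7) and (5.2). The following is a small sketch (function names are ours), assuming $\beta \in (1/2, 1)$ and $\xi \in [0, 2]$, which in particular exhibits the continuity of the pieces and the reduction $\rho^*(\beta, 2) = \rho(\beta)$:

```python
from math import sqrt

def rho_star(beta, xi):
    # Detection boundary rho*(beta, xi) of (5.7), with s0 = 1/2 - xi/4.
    if not (0.5 < beta < 1.0 and 0.0 <= xi <= 2.0):
        raise ValueError("need beta in (1/2, 1) and xi in [0, 2]")
    if beta <= 5.0 / 8.0 - xi / 16.0:
        return 0.25 * (sqrt(2.0 - xi) - sqrt(3.0 - 4.0 * beta - xi / 2.0))**2
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - sqrt(1.0 - beta))**2

def rho_mean(beta):
    # Optimal boundary rho(beta) of (5.2) for the Gaussian means problem
    # (Donoho and Jin, 2004).
    if beta <= 0.75:
        return max(0.0, beta - 0.5)
    return (1.0 - sqrt(1.0 - beta))**2
```

A quick check shows that `rho_star` is decreasing in `xi` (a faster-growing sample size lowers the boundary) and agrees with `rho_mean` at `xi = 2`, matching the discussion surrounding Propositions 3 and 4.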
Proposition 3. Under Assumptions 1B, 2–5, (4.1) and the alternative hypothesis (2.3), for s₀ = 1/2 − ξ/4, an arbitrarily small ε > 0, and a series of nominal sizes α_n = Φ̄((log p)^ε) → 0, as n, p → ∞,
(i) if r̲ > ρ*(β, ξ), then inf_{(Σ₁,Σ₂)∈C(β, r̲, r̄)} Power_n(Σ₁, Σ₂) → 1;
(ii) if r̄ < ρ*(β, ξ), then sup_{(Σ₁,Σ₂)∈C(β, r̲, r̄)} Power_n(Σ₁, Σ₂) → 0.

Proposition 3 shows that the power of the proposed MTT over the class C(β, r̲, r̄) is determined by β and by the minimum and maximum standardized signal strengths. More importantly, ρ*(β, ξ) in (5.7) is the detection boundary of the MTT: the power converges to 1 if r̲ is above this boundary, and diminishes to 0 if r̄ is below it. The detection boundaries ρ*(β, ξ) are displayed in Figure 1 for three values of ξ.

Fig 1: The detection boundary ρ*(β, ξ) in (5.7) of the proposed multi-level thresholding test with s₀ = 1/2 − ξ/4 under n = p^ξ for three values of ξ, together with the two pieces (in dashed and dotted curves) that constitute the optimal detection boundary ρ(β) for testing means given in (5.2).

Note that ρ(β) in (5.2) is the detection boundary of the MTT for s₀ = 0, which corresponds to ξ = 2 and is the lowest one in the family. It can be shown that ρ*(β, ξ) approaches ρ(β) as ξ → 2; namely, if n ∼ p², we have ρ*(β, 2) = ρ(β), which is the optimal detection boundary for testing the means with uncorrelated Gaussian data. Restricting s ≥ s₀ = 1/2 − ξ/4 elevates the detection boundary ρ*(β, ξ) of the proposed MTT for 1/2 < β ≤ 3/4 − ξ/16 as a price for controlling the size of the test. Similar results on the influence of the lower threshold bound on testing means were given in Delaigle et al. (2011).

The following proposition shows that ρ*(β, 0) is the detection boundary when the dimension p grows exponentially fast with n, which can be viewed as a degenerate polynomial-growth case with ξ = 0.

Proposition 4. Under Assumptions 1A, 2–5, (4.1) and the alternative hypothesis (2.3), for s₀ = 1/2, an arbitrarily small ε > 0, and a series of nominal sizes α_n = Φ̄((log p)^ε) → 0, as n, p → ∞,
(i) if r̲ > ρ*(β, 0), then inf_{(Σ₁,Σ₂)∈C(β, r̲, r̄)} Power_n(Σ₁, Σ₂) → 1;
(ii) if r̄ < ρ*(β, 0), then sup_{(Σ₁,Σ₂)∈C(β, r̲, r̄)} Power_n(Σ₁, Σ₂) → 0.

As ρ*(β, 0) ≥ ρ*(β, ξ) for any ξ ∈ (0, 2], the faster growth rate of p leads to a higher detection boundary, which may be viewed as a sacrifice of power due to the higher dimensionality.

From Cai et al. (2013), the power of the ℓ_max-test converges to 1 if max_{1≤i≤j≤p} |σ_{1,ij} − σ_{2,ij}|(θ_{1,ij}/n₁ + θ_{2,ij}/n₂)^{−1/2} exceeds a constant multiple of √(log p), which requires r̄ to exceed a fixed constant. This requirement for the ℓ_max-test is stronger than that r_{ij} ∈ (0,
1) required by the MTT in this paper. Also, the ℓ₂-test of Li and Chen (2012) does not have non-trivial power for β > 1/2. Hence, the proposed MTT is more powerful than both the ℓ₂-tests and ℓ_max-tests in detecting sparse and weak signals.

Propositions 3 and 4 indicate that the MTT can detect the differences between the unequal covariances Σ₁ and Σ₂ at the order c_a √(log(p)/n) for some positive constant c_a. We now show that the order √(log(p)/n) is minimax optimal. Let W_α be the collection of all α-level tests for hypotheses (2.1) under Gaussian distributions and Assumptions 2, 4 and 5; namely, P(W_α = 1 | H₀) ≤ α for any W_α ∈ W_α. Note that (3.2) and (3.4) are sufficient conditions for (3.3) and the second part of Assumption 4 under Gaussian distributions, respectively. Define a class of covariance matrices with differences at least of order {log(p)/n}^{1/2}:

C(β, c) = {(Σ₁, Σ₂): under H_a of (2.3) such that m_a = ⌊q^{1−β}⌋, r_{1,ij} ≥ c for all σ_{1,ij} ≠ σ_{2,ij}, and Assumptions 2, 4 and 5 are satisfied}.

Having Assumptions 4 and 5 in C(β, c) and W_α is for comparing the power performance of the MTT with the minimax rate. Compared with the covariance class C(β, r̲, r̄), C(β, c) places no constraint on the maximal signal strength. For Gaussian data, θ_{1,ij} and θ_{2,ij} are bounded in terms of τ, which specifies the bounds in (3.2); thus, the standardized signal strength r_{1,ij} is bounded below by a constant multiple of c for all σ_{1,ij} ≠ σ_{2,ij}. For the MTT, from Propositions 3 and 4, inf_{(Σ₁,Σ₂)∈C(β,c)} Power_n(Σ₁, Σ₂) → 1 as n, p → ∞ for a large constant c. The following theorem shows that the lower bound {log(p)/n}^{1/2} for the signal in C(β, c) is the optimal rate; namely, there is no α-level test that can distinguish H_a from H₀ in (2.3) with probability approaching 1 uniformly over the class C(β, c) for some c > 0.

Theorem 3. For Gaussian distributed data, under Assumptions 1B, 2, 4 and 5, for any τ > 0, 0 < ω < 1 − α and max{3/4, (3 − ξ)/4} < β < 1, there exists a constant c > 0 such that, as n, p → ∞, sup_{W_α∈W_α} inf_{(Σ₁,Σ₂)∈C(β,c)} P(W_α = 1) ≤ 1 − ω.
As Propositions 3 and 4 have shown that the proposed MTT can detect signals at the rate {log(p)/n}^{1/2} for β > 1/2, the MTT is at least minimax rate optimal for β > max{3/4, (3 − ξ)/4}. Compared to Theorem 4 of Cai et al. (2013), by studying the alternative structures in C(β, c), we extend the minimax result from the highly sparse signal regime 3/4 < β < 1 to max{3/4, (3 − ξ)/4} < β < 1, which offers a wider range of signal sparsity. The optimality under 1/2 < β ≤ max{3/4, (3 − ξ)/4} requires investigation in a separate effort.

Obtaining the lower and upper bounds of the detectable signal strength at the rate √(log(p)/n) requires more sophisticated derivation. These two bounds could be the same under certain conditions for testing one-sample covariances. However, for the two-sample test, the lower and upper bounds may not match, due to the composite null hypothesis in (2.3). More discussion on this issue is given in Section 7.
6. Simulation Results.
We report results from simulation experiments designed to evaluate the performance of the proposed two-sample MTT under high dimensionality with sparse and faint signals. We also compare the proposed test with the tests in Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC) and Cai et al. (2013) (CLX).

In the simulation studies, the two random samples {X_k}_{k=1}^{n₁} and {Y_k}_{k=1}^{n₂} were respectively generated from

(6.1) X_k = Σ₁^{1/2} Z_{1k} and Y_k = Σ₂^{1/2} Z_{2k},

where {Z_{1k}} and {Z_{2k}} are i.i.d. random vectors from a common population. We considered two distributions for the innovation vectors Z_{1k} and Z_{2k}: (i) N(0, I_p); (ii) a Gamma distribution where the components of Z_{1k} and Z_{2k} were i.i.d. standardized Gamma(4, 2) with mean 0 and variance 1. To design the covariances Σ₁ and Σ₂, let Σ₁⁽⁰⁾ = D Σ⁽*⁾ D, where D = diag(d₁, ..., d_p) with elements generated from a uniform distribution and Σ⁽*⁾ = (σ*_{ij}) a positive definite correlation matrix. Once generated, D was held fixed throughout the simulation. Two designs of Σ⁽*⁾, given in (6.2) and (6.3), were considered: Design 1 prescribed an auto-regressive structure with σ*_{ij} decaying geometrically in |i − j|, and Design 2 was block diagonal with block size 4, assigning a common positive correlation to the entries within each block {4k − 3, ..., 4k} for k = 1, ..., ⌊p/4⌋. The matrix D created heterogeneity across the dimensions of the data.

To generate scenarios of sparse and weak signals under the alternative hypothesis, we chose

(6.4) Σ₁⁽⋆⁾ = Σ₁⁽⁰⁾ + ε_c I_p and Σ₂⁽⋆⁾ = Σ₁⁽⁰⁾ + U + ε_c I_p,

where U = (u_{kl})_{p×p} is a banded symmetric matrix and ε_c is a positive number guaranteeing the positive definiteness of Σ₂⁽⋆⁾. The m_p ≍ q^{1−β} nonzero off-diagonal entries of U were arranged within a band whose width is determined by m_p, and were all set to a common value of order √(r log(p)/n), so that the standardized signal strength is governed by r. We set ε_c = |min{λ_min(Σ₁⁽⁰⁾ + U), 0}| + 0.05, where λ_min(A) denotes the minimum eigenvalue of a matrix A. Since λ_min(Σ₂⁽⋆⁾) ≥ λ_min(Σ₁⁽⁰⁾ + U) + ε_c > 0, both Σ₁⁽⋆⁾ and Σ₂⁽⋆⁾ were positive definite under both Designs 1 and 2. Under the null hypothesis, we chose Σ₁ = Σ₂ = Σ₁⁽⁰⁾ in (6.1), while under the alternative hypothesis Σ₁ = Σ₁⁽⋆⁾ and Σ₂ = Σ₂⁽⋆⁾. The simulated data were generated by reordering X_k and Y_k from (6.1) according to a randomly selected permutation π_p of {1, ..., p}; once generated, π_p was held fixed throughout the simulation. To mimic the regime of sparse and faint signals, we varied β and r: first, we fixed β and increased r over an equally spaced grid; then, we fixed r = 0.6 and varied β. We chose (n₁, n₂) as (60, 60), (80, 80), (100, 100) and (120, 120), with the corresponding dimension p = 175, 277,
396 and 530, according to p ≈ 0.25·n₁^{1.6}. We chose η = 0.05 in (4.5) and took μ̂_{T_n,0}(s) = μ̃_{T_n,0}(s). The process was replicated 500 times for each setting of the simulation.

Since the convergence of V_n(s) to the Gumbel distribution given in (4.7) can be slow when the sample size is small, we employed a bootstrap procedure in conjunction with a consistent covariance estimator proposed by Rothman (2012), which ensures the positive definiteness of the estimated covariance. Since Σ₁ = Σ₂ under the null hypothesis, the two samples {X_k}_{k=1}^{n₁} and {Y_k}_{k=1}^{n₂} were pooled together to estimate Σ₁. Denote the estimator of Rothman (2012) as Σ̂₁. For the b-th bootstrap resample, we drew n₁ samples X* and n₂ samples Y* independently from N(0, Σ̂₁). Then, the bootstrap test statistic V_n^{*(b)}(s) was obtained based on X* and Y*. This procedure was repeated B = 250 times to obtain the bootstrap sample {V_n^{*(1)}(s), ..., V_n^{*(B)}(s)} of the proposed multi-thresholding statistic under the null hypothesis. The bootstrap empirical null distribution of the proposed statistic was F̂(x) = B^{−1} Σ_{b=1}^{B} I{V_n^{*(b)}(s) ≤ x}, and the bootstrap p-value was 1 − F̂(V_n(s)), where V_n(s) was the multi-thresholding statistic from the original sample. We reject the null hypothesis if this p-value is smaller than the nominal significance level α = 0.05. The validity of the bootstrap approximation can be justified in two key steps. First, if we generate the "parametric bootstrap samples" from the two normal distributions with the true population covariance matrices, then by Theorems 1 and 2 the bootstrap versions of the single-level and multi-level thresholding statistics have the same limiting Gaussian and extreme value distributions, respectively. Second, we can replace the true covariance above by the consistently estimated covariance matrix Σ̂₁ (Rothman, 2012), which is positive definite; the justification of the bootstrap procedure can then be completed by showing the consistency of Σ̂₁, extending the results in Rothman (2012).

Table 1 reports the empirical sizes of the proposed multi-thresholding test using the limiting Gumbel distribution for the critical value (denoted as MTT) and the bootstrap calibration described above (MTT-BT), together with the three existing methods, at the nominal level 0.05 for the Gaussian and Gamma distributed random vectors. We observe that the MTT based on the asymptotic distribution exhibited some size distortion when the sample size was small; as the sample size increased, the sizes of the MTT moved closer to the nominal level. The CLX and SY tests also experienced some size distortion under the Gamma scenario in smaller samples. The proposed multi-thresholding test with the bootstrap calibration (MTT-BT) performed consistently well under all the scenarios with accurate empirical sizes.
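Schematically, the bootstrap calibration above amounts to: simulate B data sets from the fitted null model, recompute the statistic on each, and read the p-value off the resulting empirical null distribution. A minimal sketch, with the MTT statistic and the Rothman (2012) estimator abstracted into generic statistic and sampler arguments (the function names and the toy statistic are ours, not the paper's):

```python
import random

def bootstrap_pvalue(statistic, sampler, observed, B=250):
    """Parametric bootstrap calibration: draw B synthetic data sets from
    the fitted null model, recompute the statistic, and locate the
    observed value in the resulting empirical null distribution."""
    null_stats = [statistic(sampler()) for _ in range(B)]
    F_hat = sum(t <= observed for t in null_stats) / B   # empirical null CDF
    return 1.0 - F_hat                                   # bootstrap p-value

# toy illustration with a simple |mean|-type statistic (our choice, not V_n(s))
random.seed(0)
data = [random.gauss(0, 1) for _ in range(50)]
stat = lambda d: abs(sum(d)) / len(d) ** 0.5
pval = bootstrap_pvalue(stat, lambda: [random.gauss(0, 1) for _ in range(50)], stat(data))
assert 0.0 <= pval <= 1.0
```

The null hypothesis is rejected when the returned p-value falls below the nominal level.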
This shows that the bootstrap distribution offered a more accurate approximation than the limiting Gumbel distribution to the distribution of the test statistic V_n(s) under the null hypothesis.

Figure 2 displays the empirical powers with respect to different signal strengths r for covariance matrix Designs 1 and 2 with n₁ = n₂ = 80, p = 277 and n₁ = n₂ = 100, p = 396 under the Gaussian distribution, respectively. Figure 3 reports the empirical powers under different sparsity (β) levels when the signal strength r was fixed at 0.6. Simulation results on the powers under the Gamma distribution are available in the SM. It is noted that at the larger values of β, only a small number of entries differed between Σ₁ and Σ₂ among a total of q = 38503 and 78606 unique entries for p = 277 and 396, respectively. To make the powers comparable across methods, we adjusted the critical values of the tests by their respective empirical null distributions so that the actual sizes were approximately equal to the nominal level 5%.

Table 1: Empirical sizes for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai et al. (2013) (CLX) and the proposed multi-level thresholding test based on the limiting distribution calibration in (4.7) (MTT) and the bootstrap calibration (MTT-BT) for Designs 1 and 2 under the Gaussian and Gamma distributions with the nominal level of 0.05.

p    (n1, n2)    SY     LC     CLX    MTT    MTT-BT
Gaussian Design 1
175  (60, 60)    0.048  0.058  0.054  0.088  0.058
277  (80, 80)    0.052  0.052  0.058  0.064  0.056
396  (100, 100)  0.042  0.046  0.058  0.064  0.054
530  (120, 120)  0.056  0.048  0.050  0.056  0.046
Gaussian Design 2
175  (60, 60)    0.060  0.048  0.052  0.094  0.048
277  (80, 80)    0.040  0.060  0.040  0.064  0.052
396  (100, 100)  0.052  0.042  0.044  0.090  0.048
530  (120, 120)  0.050  0.046  0.044  0.060  0.054
Gamma Design 1
175  (60, 60)    0.046  0.060  0.066  0.110  0.056
277  (80, 80)    0.060  0.050  0.044  0.076  0.044
396  (100, 100)  0.046  0.052  0.046  0.066  0.054
530  (120, 120)  0.060  0.056  0.048  0.060  0.048
Gamma Design 2
175  (60, 60)    0.070  0.056  0.066  0.108  0.056
277  (80, 80)    0.060  0.058  0.068  0.112  0.044
396  (100, 100)  0.060  0.050  0.044  0.068  0.046
530  (120, 120)  0.054  0.056  0.048  0.056  0.048

Due to the size adjustment, the MTT based on the limiting distribution and the MTT-BT based on the bootstrap calibration had the same test statistic, and hence the same power; we therefore report the power results only for the MTT-BT.

Figure 2 reveals that the power of the proposed MTT-BT was the highest among all the tests under all the scenarios. Even though the powers of the other tests improved as the signal strength r increased, the proposed MTT-BT maintained its lead over the whole range of r considered. We observe from Figure 3 that the proposed test also had the highest empirical power across the range of β. The powers of the MTT-BT at the high sparsity level (β ≥ 0.
7) were higher than those of the CLX test, which is known to perform well when the signal is sparse. We take this as an empirical confirmation of the attractive detection boundary of the proposed MTT established in the theoretical analysis of Section 5. The monotone decreasing pattern in the power profiles of the four tests reflects the reduction in the number of signals as β increases. It is noted that the two ℓ₂-norm based tests SY and LC are known to have good power when the signals are dense, i.e. β ≤ 0.5. This was well reflected in Figure 3: the two tests had powers comparable to the MTT-BT for small β. Once β was larger than 0.5, the powers of both SY and LC started to decline quickly and were surpassed by the CLX test, consistent with the results of Figure 2; the ℓ₂-tests without regularization incorporate too many uninformative dimensions, which lowers their signal-to-noise ratios.
7. Discussion.
For establishing the asymptotic normality of the thresholding statistic T_n(s) in (3.1), the β-mixing condition (Assumption 5) can be weakened: a polynomial rate of the β-mixing coefficients can be assumed at the expense of more dedicated proofs. In that case, to prove Theorem 1, the lengths of the small and big segments of the matrix blocking (b and a in Figure 4) need to be chosen at polynomial rates of p, with orders depending on the decay rate of the β-mixing coefficients.

Although Theorem 3 provides the minimax rate √(log(p)/n) of the signal strength for testing hypotheses (2.3), the lower and upper bounds at this rate may not match due to the composite nature of the hypotheses for the two-sample test. To illustrate this point, let W ∈ W_α be the critical function of a test for the hypotheses (2.3), and let E_{0,Σ₁} and E_{Σ₁,Σ₂} be the expectations with respect to the data distribution under the null and alternative hypotheses, respectively. The derivation of the minimax bound starts from the following:

1 + α − sup_{W∈W_α} inf_{Σ₁,Σ₂} E_{Σ₁,Σ₂}(W) ≥ inf_{W∈W_α} sup_{Σ₁,Σ₂} {E_{0,Σ₁}(W) + E_{Σ₁,Σ₂}(1 − W)} ≥ sup_{Σ₁} inf_W sup_{Σ₂} {E_{0,Σ₁}(W) + E_{Σ₁,Σ₂}(1 − W)}.

In the last inequality above, the infimum over the test W is taken under fixed Σ₁, which essentially reduces the problem to one-sample hypothesis testing; a least favorable prior is then constructed on Σ₂ given that Σ₁ is known. As a test constructed under known Σ₁ need not control the type I error for the two-sample hypotheses (2.3), the resulting bound on the maxmin power for hypotheses (2.3) is not tight. It is for this reason that one may derive the tight minimax bound for the one-sample spherical hypothesis H₀: Σ = σ²I.
We will leave this problem as future work, especially for the unexplored region 1/2 < β < max{3/4, (3 − ξ)/4} in Theorem 3.

Fig 2: Empirical powers with respect to the signal strength r for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai et al. (2013) (CLX) and the proposed multi-level thresholding test with the bootstrap calibration (MTT-BT) for Designs 1 and 2 with Gaussian innovations at a fixed sparsity level β, for p = 277, n₁ = n₂ = 80 and p = 396, n₁ = n₂ = 100, respectively.

Fig 3: Empirical powers with respect to the sparsity level β for the tests of Srivastava and Yanagihara (2010) (SY), Li and Chen (2012) (LC), Cai et al. (2013) (CLX) and the proposed multi-level thresholding test with the bootstrap calibration (MTT-BT) for Designs 1 and 2 with Gaussian innovations under r = 0.6, for p = 277, n₁ = n₂ = 80 and p = 396, n₁ = n₂ = 100, respectively.

The proposed thresholding tests can be extended to testing for correlation matrices between the two populations. Recall that Ψ₁ = (ρ_{1,ij})_{p×p} and Ψ₂ = (ρ_{2,ij})_{p×p} are the correlation matrices of X_k and Y_k. Consider the hypotheses H₀: Ψ₁ = Ψ₂ vs. H_a: Ψ₁ ≠ Ψ₂. Let ρ̂_{1,ij} = σ̂_{1,ij}/(σ̂_{1,ii} σ̂_{1,jj})^{1/2} and ρ̂_{2,ij} = σ̂_{2,ij}/(σ̂_{2,ii} σ̂_{2,jj})^{1/2} be the sample correlations of the two groups. As for M_{ij}, the squared standardized difference M*_{ij} between the sample correlations can be constructed based on ρ̂_{1,ij} and ρ̂_{2,ij} and their estimated variances. Let

T*_n(s) = Σ_{1≤i≤j≤p} M*_{ij} I{M*_{ij} > λ_p(s)}

be the single-level thresholding statistic based on the sample correlations. Similar to the case of sample covariances, the moderate deviation results on ρ̂_{1,ij} − ρ̂_{2,ij} can be derived, and it can be shown that T*_n(s) has the same asymptotic distribution as T_n(s). The multi-thresholding test can be constructed similarly to (4.6) and (4.7).
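A schematic sketch of the single-level thresholding statistic on sample correlations described above; the variance proxy (1 − ρ̂²)²/n and the threshold form λ_p(s) = 2s log p are simplifying assumptions of ours, not the paper's exact estimators:

```python
import math, random

def corr_matrix(data):
    """Sample correlation matrix of data given as a list of n rows of length p."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / n
            for j in range(p)] for i in range(p)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)], n

def threshold_stat(X, Y, s):
    """Single-level thresholding statistic on sample correlation differences."""
    R1, n1 = corr_matrix(X)
    R2, n2 = corr_matrix(Y)
    p = len(R1)
    lam = 2 * s * math.log(p)                    # assumed threshold form lambda_p(s)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):                # diagonal entries are trivially equal
            v1 = (1 - R1[i][j] ** 2) ** 2 / n1   # large-sample variance proxy
            v2 = (1 - R2[i][j] ** 2) ** 2 / n2
            if v1 + v2 == 0:
                continue
            m = (R1[i][j] - R2[i][j]) ** 2 / (v1 + v2)
            if m > lam:
                total += m                       # accumulate exceedances
    return total

random.seed(1)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]
Y = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]
print(threshold_stat(X, Y, 0.3) >= threshold_stat(X, Y, 0.9))  # prints True
```

The monotone decrease of the statistic in s simply reflects the shrinking exceedance set as the threshold level rises.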
8. Appendix.
In this section, we provide the proof of Theorem 1; the proofs of all other propositions and theorems are relegated to the supplementary material (SM). Without loss of generality, we assume E(X) = 0 and E(Y) = 0. Let C and L_p denote, respectively, a generic constant and a multi-log(p) term, both of which may change from case to case.

Proof of Theorem 1. To prove Theorem 1, we propose a novel technique that constructs a U-statistic equivalent to T_n(s), based on a partition of the covariance into a group of big square blocks separated by small strips, as shown in Figure 4. Specifically, the indices {1, ..., p} are grouped into an alternating sequence of big segments of length a and small segments of length b: {1, ..., a}, {a+1, ..., a+b}, {a+b+1, ..., 2a+b}, {2a+b+1, ..., 2(a+b)}, ..., where b = o(a). Let d = ⌊p/(a+b)⌋ be the total number of pairs of big and small segments. The index sets of the big and the small segments are, respectively, S_m = {(m−1)(a+b)+1, ..., ma+(m−1)b} and R_m = {ma+(m−1)b+1, ..., m(a+b)} for m = 1, ..., d, together with a remainder set R_{d+1} = {d(a+b)+1, ..., p}. For the two-dimensional array {(i, j): 1 ≤ i ≤ j ≤ p}, the above index partition results in d(d−1)/2 big square blocks of size a × a, namely {I_{m₁m₂} = S_{m₁} × S_{m₂}: 1 ≤ m₁ < m₂ ≤ d}, colored in blue in Figure 4. They are separated by smaller horizontal and vertical rectangles of size a × b and square blocks of size b × b. There are also d residual triangular blocks with a(a+1)/2 elements each along the diagonal.

Fig 4: Matrix partition in the upper triangle of a covariance matrix.
Thesquare sub-matrices (in blue color) of size a are the bigger blocks, which areseparated by smaller size strips of width b (marked by the 45-degree lines).There are d triangle blocks along the diagonal plus remaining smaller sizeblocks in the residual set R d +1 which are not shown in the diagram.Let A ij ( s ) = L ij ( s ) − µ ,ij ( s ), where µ ,ij ( s ) = E ( L ij ( s ) | H ) and L ij ( s ) = M ij I ( M ij > λ p ( s )). Then, T n ( s ) − E { T n ( s ) } = P ≤ i ≤ j ≤ p A ij ( s ) under thenull hypothesis. Here, we drop the threshold level s in the notations A ij ( s ), L ij ( s ) and µ ,ij ( s ) for simplicity of the statement, when there is no confusion.Based on the matrix partition in Figure 4, T n ( s ) − E { T n ( s ) } can be dividedinto summation of A ij ( s ) over the big square blocks of size a × a , the smallstrips and the triangular blocks along the main diagonal.Let R = ∪ dm =1 R m be the collection of the indices in the small segments.From Figure 4, T n ( s ) − E { T n ( s ) } can be divided into four parts such that(A.1) T n ( s ) − E { T n ( s ) } = B ,n + B ,n + B ,n + B ,n , HRESHOLDING TEST FOR COVARIANCE where B ,n = X ≤ m 4. Note that N = a d ( d − / q (1+ o (1)), N ≤ dpb ≤ p b/ ( a + b ) = o ( p ), N = da / ≤ pa/ o ( p ) and N ≤| R d +1 | p ≤ ( a + b ) p = o ( p ). Similar as deriving Var { T n ( s ) } in Proposition1, we haveVar( B ,n ) = X i ∈ R or j ∈ R, i ≤ j Var( A ij )(A.3) + Cov (cid:18) X i ∈ R or j ∈ Ri ≤ j A i j , X i ∈ R or j ∈ Ri ≤ j A i j (cid:19) , (A.4)where Var( A ij ) = Var( L ij ) = v (0 , s ) { o (1) } ∼ L p p − s under the nullhypothesis, which is given in Lemma 5 in the SM. 
Notice that the summa-tion of the variances on the right side of (A.3) is bounded by L p p − s N = o (cid:0) Var { T n ( s ) } (cid:1) .For the covariance terms in (A.4), let d i j ,i j = min( | i − i | , | i − j | , | j − j | , | j − i | ) be the minimum coordinate distance between ( i , j ) and ( i , j ),and between ( i , j ) and ( j , i ), where i ≤ j and i ≤ j . For any fixed( i , j ) and a large positive constant M , by Assumption 5 and Davydov’sinequality (Corollary 1.1 of Bosq (1998), p.21), there exists a constant c > | Cov( L i j , L i j ) | ≤ Cγ d i j ,i j ≤ p − M for γ ∈ (0 , 1) and any d i j ,i j > c log p . Therefore, (cid:12)(cid:12)(cid:12)(cid:12) Cov (cid:18) X i ∈ R or j ∈ R, i ≤ j A i j , X i ∈ R or j ∈ R, i ≤ j A i j (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ≤ X i ∈ R or j ∈ R, i ≤ j (cid:26) N p − M + X d i j ,i j ≤ c log pi ∈ R or j ∈ R, i ≤ j (cid:12)(cid:12) Cov( A i j , A i j ) (cid:12)(cid:12)(cid:27) , S. X. CHEN, B. GUO AND Y. QIU where by Lemmas 5 and 6 in the SM, | Cov( A i j , A i j ) (cid:12)(cid:12) ≤ L p | ρ i j ,i j | p − s − ǫ + µ ,i j µ ,i j L p n − / for a small ǫ > 0. It follows that X d i j ,i j ≤ c log pi ∈ R or j ∈ R, i ≤ j (cid:12)(cid:12) Cov( A i j , A i j ) (cid:12)(cid:12) ≤ X | i − i |≤ c log p p X j =1 (cid:12)(cid:12) Cov( A i j , A i j ) (cid:12)(cid:12) + X | j − j |≤ c log p p X i =1 (cid:12)(cid:12) Cov( A i j , A i j ) (cid:12)(cid:12) , which is bounded by L p P pj =1 | ρ i j ,i j | p − s − ǫ + L p p − s n − / . It has beenshown that P pj =1 | ρ i j ,i j | ≤ C < ∞ in (S.23) in the SM. By choos-ing M large, the covariance term in (A.4) is bounded by L p N p − s − ǫ + L p N p − s n − / , which is a small order term of Var { T n ( s ) } if s > / s > / − ξ/ B ,n ) = o (cid:0) Var { T n ( s ) } (cid:1) .For B ,n , note that the triangles { ( i, j ) ∈ S m , i ≤ j } along the diagonal areat least b apart from each other, where b ∼ log( p ). 
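The bookkeeping behind the index sets S_m and R_m defined at the start of the proof can be illustrated as follows (the values of p, a and b here are arbitrary; in the proof a and b grow with p subject to b = o(a)):

```python
def block_partition(p, a, b):
    """Index sets of the matrix-blocking scheme: big segments S_m of
    length a separated by small segments R_m of length b (cf. Figure 4),
    plus the remainder set R_{d+1}."""
    d = p // (a + b)
    S = [list(range((m - 1) * (a + b) + 1, m * a + (m - 1) * b + 1))
         for m in range(1, d + 1)]
    R = [list(range(m * a + (m - 1) * b + 1, m * (a + b) + 1))
         for m in range(1, d + 1)]
    R.append(list(range(d * (a + b) + 1, p + 1)))    # remainder R_{d+1}
    return S, R

S, R = block_partition(103, 10, 3)
flat = sorted(i for seg in S + R for i in seg)
assert flat == list(range(1, 104))                   # disjoint cover of {1,...,103}
assert all(len(s) == 10 for s in S) and all(len(r) == 3 for r in R[:-1])
```

The assertions confirm that the big and small segments, together with the remainder set, form a disjoint cover of {1, ..., p}.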
The covariances between P i,j ∈ S m A ij and P i,j ∈ S m A ij are negligible for m = m . It follows thatVar( B ,n ) = X ≤ m ≤ d Var (cid:18) X i,j ∈ S m , i ≤ j A ij (cid:19) { o (1) } = X ≤ m ≤ d X i ,j ∈ S m , i ≤ j X i ,j ∈ S m , i ≤ j Cov( A i j , A i j ) { o (1) } , which is bounded by Cda v (0 , s ) = o ( L p a p − s ). This shows that Var( B ,n ) = o (cid:0) Var { T n ( s ) } (cid:1) when a ≪ p / . Here, for two positive sequences { c ,n } and { c ,n } , c ,n ≪ c ,n means that c ,n = o ( c ,n ). For B ,n , we haveVar( B ,n ) ≤ N v (0 , s ) { o (1) } + X j ∈ R d +1 X j ∈ R d +1 | Cov( A i j , A i j ) | = o ( p − s ) + X j ∈ R d +1 (cid:18) N p − M + X d i j ,i j ≤ c log pj ∈ R d +1 | Cov( A i j , A i j ) | (cid:19) . Similar to the case of Var( B ,n ), the last summation term above is equalto N p − M + L p N p − s − ǫ + L p N p − s n − / , which is a small order term ofVar { T n ( s ) } . Meanwhile, since N = q (1+ o (1)), following the same derivationof Proposition 1, it can be shown that Var( B ,n ) = Var( T n ( s )) { o (1) } . HRESHOLDING TEST FOR COVARIANCE Combining the above results, we see that Var( B l,n ) are at a smaller orderof Var { T n ( s ) } for l = 2 , . . . , 4. This together with (A.1) imply(A.5) T n ( s ) − E ( T n ( s )) p Var( T n ( s )) = B ,n p Var( T n ( s )) + o (1) . Therefore, to show the asymptotical normality of T n ( s ) − E { T n ( s ) } , it sufficesto focus on its main order term B ,n .Let Z S m = { X S m , Y S m } for m = 1 , . . . , d , where X S m = { X ki : 1 ≤ k ≤ n , i ∈ S m } and Y S m = { Y ki : 1 ≤ k ≤ n , i ∈ S m } are the segments of thetwo data matrices with the columns in S m . Notice that the summation of A ij in B ,n can be expressed as X i ∈ S m ,j ∈ S m A ij = f ( Z S m , Z S m )for some function f ( · , · ).Let F m b m a ( Z ) = σ { Z S m : m a ≤ m ≤ m b } be the σ -algebra generated by { Z S m } for 1 ≤ m a ≤ m b ≤ d . 
Let ζ z ( h ) = sup ≤ m ≤ d − h ζ {F m ( Z ) , F dm + h ( Z ) } be the β -mixing coefficient of the sequence Z S , . . . , Z S d . By Theorem 5.1 inBradley (2005) and Assumption 5, we have ζ z ( h ) ≤ n X k =1 ζ x,p ( hb ) + n X k =1 ζ y,p ( hb ) ≤ C ( n + n ) γ hb for some γ ∈ (0 , b = b − log( n + n ) / log( γ ) leads to ζ z ( h ) ≤ Cγ hb ( n + n ) − h ≤ Cγ hb . By Berbee’s theorem (page 516 in Athreya and Lahiri(2006)), there exist Z ∗ S independent of Z S such that P ( Z S = Z ∗ S ) = ζ { σ ( Z S ) , σ ( Z S ) } ≤ ζ z (1) ≤ Cγ b . By applying this theorem recursively,there exist Z S , Z ∗ S , . . . , Z ∗ S d that are mutually independent with each other,and P ( Z S m = Z ∗ S m ) ≤ Cγ b for m = 2 , . . . , d .Let D = ∪ dm =2 { Z S m = Z ∗ S m } , then P ( D ) ≤ Cdγ b . By choosing b = − c log( p ) / log( γ ) for a large positive number c , we have P ( D ) convergesto 0 at the rate p − c / ( a + b ). Notice that Var( B ,n ) is at the order p − s .Since E ( | B ,n I D | ) ≤ { Var( B ,n ) P ( D ) } / converges to 0 for a sufficientlylarge c , it follows that B ,n I D → c .Thus, by letting b = c max { log( p ) , log( n + n ) } for a large constant c > 0, there exists an array of mutually independent random vectors Z ∗ S , . . . , Z ∗ S d such that Z ∗ S m = Z S m with overwhelming probability for m = 1 , . . . , d and B ,n can be expressed as a U -statistic formulation on a sequence of mutuallyindependent random vectors as(A.6) B ,n = X m For simplicity of notations, we will drop the superscript ∗ in (A.6) in thefollowing proof. Now, we only need to establish the asymptotical normalityof B ,n under the expression (A.6).To this end, we first study the conditional distribution of M ij given the j th variable, where i ∈ S m , j ∈ S m and m = m . 
Recall that F ij = (ˆ σ ij − ˆ σ ij )(ˆ θ ij /n + ˆ θ ij /n ) − / is the standardization of ˆ σ ij − ˆ σ ij , where ˆ σ ij = ˜ σ ij − ¯ X i ¯ X j and ˆ σ ij =˜ σ ij − ¯ Y i ¯ Y j for ˜ σ ij = P n k =1 X ki X kj /n and ˜ σ ij = P n k =1 Y ki Y kj /n . Then, M ij = F ij . Note that the unconditional asymptotical distribution of F ij isstandard normal.Let E j ( · ), Var j ( · ) and Cov j ( · ) be the conditional mean, variance andcovariance given the j th variable, respectively. From the proof of Lemma 7 inthe SM, we have that E j (ˆ σ ij ) = E j (ˆ σ ij ) = 0, Var j (˜ σ ij ) = σ ii ˜ σ jj /n andCov j (˜ σ ij , ¯ X i ¯ X j ) = Var j ( ¯ X i ¯ X j ) = σ ii ( ¯ X j ) /n . It follows that Var j (ˆ σ ij ) = σ ii ˆ σ jj /n and Var j (ˆ σ ij ) = σ ii ˆ σ jj /n . In the proof of Lemma 7, it hasalso been shown that ˆ θ ij = σ ii ˆ σ jj + O p ( p log( p ) /n ) and ˆ θ ij = σ ii ˆ σ jj + O p ( p log( p ) /n ) given the j th variable. Similar results hold given the i thvariable. Therefore, F ij is still asymptotically standard normal distributedgiven either the i th or the j th variable. And, the moderate deviation resultsfrom Lemma 2.3 and Theorem 3.1 in Saulis and Statuleviˇcius (1991) forindependent but non-identically distributed variables can be applied to F ij ,given either one of the variables.Let F = {∅ , Ω } and F m = σ { Z S , . . . , Z S m } for m = 1 , , · · · , d bea sequence of σ -field generated by { Z S , . . . , Z S m } . Let E F m ( · ) denote theconditional expectation with respect to F m . Write B ,n = P dm =1 D m , where D m = ( E F m − E F m − ) B ,n . Then for every n, p , { D m , ≤ m ≤ d } is amartingale difference sequence with respect to the σ -fields { F m } ∞ m =0 . Let σ m = E F m − ( D m ). By the martingale central limit theorem (Chapter 3in Hall and Heyde, 1980), to show the asymptotical normality of B ,n , itsuffices to show(A.7) P dm =1 σ m Var( B ,n ) p −→ P dm =1 E ( D m )Var ( B ,n ) −→ . By the independence between { Z S , . . . 
, Z S d } , we have D m = m − X m =1 f ( Z S m , Z S m ) + X m >m E F m f ( Z S m , Z S m )(A.8) − m − X m =1 E F m − f ( Z S m , Z S m ) , HRESHOLDING TEST FOR COVARIANCE where for any m < m , E F m f ( Z S m , Z S m ) = E F m X i ∈ S m ,j ∈ S m A ij = X i ∈ S m ,j ∈ S m E i A ij . For m < m , let ˜ f ( Z S m , Z S m ) = X i ∈ S m ,j ∈ S m ˜ A ij for˜ A ij = M ij I ( M ij > λ p ( s )) − E F m { M ij I ( M ij > λ p ( s )) } , where i ∈ S m and j ∈ S m . We can decompose D m = D m, + D m, , where(A.9) D m, = m − X m =1 ˜ f ( Z S m , Z S m ) and D m, = X m >m E F m f ( Z S m , Z S m ) . Let G = { max k ,k ,i ( | X k i | , | Y k i | ) ≤ c √ log p } for a positive constant c .Under Assumption 3, P ( G c ) → p for a large c . Tostudy { σ m } , we focus on the set G . By Lemma 7 in the SM, we have E i { M ij I ( M ij > λ p ( s )) } = µ ,ij (cid:8) O ( L p n − / ) (cid:9) , which implies E i A ij = µ ,ij O ( L p n − / ). This leads to E F m f ( Z S m , Z S m ) = L p O ( a p − s n − / ) for m > m , and D m, = ( d − m ) L p O ( a p − s n − / ).From (A.9), we can write D m = D m, + 2 D m, D m, + D m, , where D m, =( d − m ) L p O ( a p − s n − ). Note that E F m − ( D m, D m, ) is equal to m − X m =1 X m >m X j ∈ S m ,j ∈ S m X j ∈ S m ,j ∈ S m E F m − { ˜ A j j E F m ( A j j ) } . Similar as applying the coupling method on the big segments Z S , . . . , Z S d of the variables, the j th and j th variables can be effectively viewed as in-dependent when | j − j | > c log p for some constant c > 0. Therefore, given F m − , E F m − { ˜ A j j E F m ( A j j ) } is negligible when | j − j | > c log p . Mean-while, notice that (cid:12)(cid:12) E F m − { ˜ A j j E F m ( A j j ) } (cid:12)(cid:12) ≤ O ( L p p − s n − / ) E F m − ( | ˜ A j j | )and E F m − ( | ˜ A j j | ) ≤ E F m − { M ij I ( M ij > λ p ( s )) } , which is at the order L p p − s . Therefore, we have (cid:12)(cid:12) E F m − ( D m, D m, ) (cid:12)(cid:12) ≤ O ( L p d a p − s n − / ) . S. X. CHEN, B. GUO AND Y. 
QIU Base on the above results, by choosing a ≪ √ n , σ m = E F m − ( D m ) can beexpressed as σ m = E F m − ( D m, ) + O ( L p d a p − s n − / ), where(A.10) E F m − ( D m, ) = m − X m ,m =1 X j ∈ S m j ∈ S m X j ∈ S m j ∈ S m E F m − ( ˜ A j j ˜ A j j ) . For the above summation in (A.10), note that when j = j , j = j , E F m − ( ˜ A j j ) = E j { M j j I ( M j j > λ p ( s )) } − µ ,j j (1 + o p (1)) . By Lemma 7 in the SM, we have E j { M j j I ( M j j > λ p ( s )) } = E ( L j j | H ) { o p (1) } , which implies E F m − ˜ A j j = Var { A j j | H } (1+ o p (1)), where L j j = M j j I ( M j j > λ p ( s )).Let ρ j j = Cor( X kj , X kj ) and ρ j j = Cor( Y kj , Y kj ) be the correla-tions. Let ˜ ρ j j = ˜ σ j j / (˜ σ j j ˜ σ j j ) / and ˜ ρ j j = ˜ σ j j / (˜ σ j j ˜ σ j j ) / .For j = j and j = j , by Lemma 7 in the SM, (cid:12)(cid:12) Cor ( j ,j ) (˜ σ j j − ˜ σ j j , ˜ σ j j − ˜ σ j j ) (cid:12)(cid:12) ≤ ˜ ρ j j , where ˜ ρ j j = max {| ˜ ρ j j | , | ˜ ρ j j |} . By Lemmas 6 and 7 in the SM, we have | E ( j ,j ) ( ˜ A j j ˜ A j j ) | ≤ L p ˜ ρ j j p − s ρj j { o p (1) } + O p ( L p p − s n − / ) . Similarly, for j = j and j = j , by Lemma 7, we have that (cid:12)(cid:12) Cor j (˜ σ j j − ˜ σ j j , ˜ σ j j − ˜ σ j j ) (cid:12)(cid:12) ≤ ρ j j and | E j ( ˜ A j j ˜ A j j ) | ≤ L p ρ j j p − s/ (1+ ρ j j ) { o p (1) } + O p ( L p p − s n − / ) , where ρ j j = max {| ρ j j | , | ρ j j |} . For j = j and j = j , we have (cid:12)(cid:12) Cor ( j ,j ) (˜ σ j j − ˜ σ j j , ˜ σ j j − ˜ σ j j ) (cid:12)(cid:12) ≤ ˜ ρ j j ρ j j . By Assumption 5 and Davydov’s inequality, for any positive constant M ,there exists a constant c > | E ( j ,j ) ( ˜ A j j ˜ A j j ) | ≤ Cγ | j − j | ≤ p − M for a constant γ ∈ (0 , 1) and | j − j | > c log p . For j and j close, byLemmas 6 and 7, it follows that | E ( j ,j ) ( ˜ A j j ˜ A j j ) | ≤ L p ˜ ρ j j ρ j j p − s/ (1+˜ ρ j j ρ j j ) { o p (1) } + O p ( L p p − s n − / ) . 
Combining all the different cases above for the indexes $(j_1, j_2, j_3, j_4)$ together, equation (A.10) can be decomposed as
\begin{align}
E_{\mathcal{F}_{m-1}}(D_{m,1}^2) &= a^2 (m-1) \operatorname{Var}(A_{12} \mid H_0)\{1 + o_p(1)\} \nonumber \\
&\quad + \sum_{m_1, m_2 = 1}^{m-1} \sum_{j_1 \in S_{m_1} \ne j_3 \in S_{m_2}} \sum_{j_2 \in S_m} E_{(j_1, j_3)}(\tilde A_{j_1 j_2} \tilde A_{j_3 j_2}) \tag{A.12} \\
&\quad + \sum_{m_1 = 1}^{m-1} \sum_{j_1 \in S_{m_1}} \sum_{j_2 \ne j_4 \in S_m} E_{j_1}(\tilde A_{j_1 j_2} \tilde A_{j_1 j_4}) \tag{A.13} \\
&\quad + \sum_{m_1, m_2 = 1}^{m-1} \sum_{j_1 \in S_{m_1} \ne j_3 \in S_{m_2}} \sum_{j_2 \ne j_4 \in S_m} E_{\mathcal{F}_{m-1}}(\tilde A_{j_1 j_2} \tilde A_{j_3 j_4}). \tag{A.14}
\end{align}
Note that $\rho_{j_1 j_3} = 0$ for $m_1 \ne m_2$ due to the independence between $Z_{S_{m_1}}$ and $Z_{S_{m_2}}$. Under Assumption 3, we also have $|\tilde\rho_{j_1 j_3} - \rho_{j_1 j_3}| \le L_p n^{-1/2}$ for $j_1 \in S_{m_1}$ and $j_3 \in S_{m_2}$. The term in (A.12) is bounded by
\[
a^3 (m-1)^2 L_p n^{-1/2} p^{-4s} + a (m-1) \sum_{j_1 \ne j_3 \in S_{m_1}} L_p \rho_{j_1 j_3}\, p^{-4s/(1 + \rho_{j_1 j_3})}.
\]
By Assumption 5 and Davydov's inequality, for any $M > 0$, there exists a constant $c_0 > 0$ such that $\rho_{j_1 j_3} \le p^{-M}$ for $|j_1 - j_3| > c_0 \log p$. Therefore, the summation of $L_p \rho_{j_1 j_3} p^{-4s/(1 + \rho_{j_1 j_3})}$ over $j_1 \ne j_3 \in S_{m_1}$ is bounded by
\[
\sum_{|j_1 - j_3| \le c_0 \log p} L_p \rho_{j_1 j_3}\, p^{-4s/(1 + \rho_{j_1 j_3})} + \sum_{|j_1 - j_3| > c_0 \log p} L_p p^{-M - 2s} \le a L_p\, p^{-4s/(2 - \epsilon)}
\]
for a small positive constant $\epsilon > 0$. For (A.13), similarly, we have
\[
\Bigl| \sum_{m_1 = 1}^{m-1} \sum_{j_1 \in S_{m_1}} \sum_{j_2 \ne j_4 \in S_m} E_{j_1}(\tilde A_{j_1 j_2} \tilde A_{j_1 j_4}) \Bigr| \le a^2 (m-1)\, O_p(L_p n^{-1/2} p^{-4s}) + a (m-1) \sum_{j_2 \ne j_4 \in S_m} L_p \rho_{j_2 j_4}\, p^{-4s/(1 + \rho_{j_2 j_4})},
\]
which is bounded by $a^2 (m-1) O_p(L_p n^{-1/2} p^{-4s}) + a^2 (m-1) L_p p^{-4s/(2 - \epsilon)}$. For the last term in (A.14), by choosing $M$ in (A.11) sufficiently large, it is bounded by $a^3 (m-1)^2 L_p n^{-1/2} p^{-4s} + a^2 (m-1) L_p^2 p^{-4s/(2 - \epsilon)}$.

Notice that $\sigma_m^2 = E_{\mathcal{F}_{m-1}}(D_{m,1}^2) + O_p(L_p^2 d^2 a^3 p^{-4s} n^{-1/2})$ by choosing $a \ll \sqrt{n}$. Summing up all the terms in (A.12)–(A.14), up to a multiplication of $1 + o_p(1)$, we have that
\[
\sum_{m=1}^d \sigma_m^2 = \frac{a^2 d (d-1)}{2} \operatorname{Var}(A_{12} \mid H_0) + O_p(p^3 L_p^2 n^{-1/2} p^{-4s}) + O(p^2 L_p p^{-2s - \epsilon}),
\]
where $a^2 d (d-1) \operatorname{Var}(A_{12} \mid H_0)/2 = \operatorname{Var}(B_{0,n} \mid H_0)(1 + o(1))$.
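Davydov's inequality, used with Assumption 5 here and above, is quoted below in one standard form (the universal constant varies across textbooks; see, e.g., Bosq (1998) from the reference list):

```latex
% Davydov's inequality: if X is measurable with respect to a
% sigma-field A and Y with respect to B, with alpha-mixing coefficient
% alpha(A, B), then for exponents q, r, t with 1/q + 1/r + 1/t = 1,
\[
  \bigl|\operatorname{Cov}(X, Y)\bigr|
  \;\le\; C\, \alpha(\mathcal{A}, \mathcal{B})^{1/t}\, \|X\|_{q}\, \|Y\|_{r},
\]
% where C is a universal constant. Since the alpha-mixing coefficient
% is dominated by the beta-mixing coefficient, geometric beta-mixing
% gives a covariance bound of order gamma^{|j_1 - j_3|}, which falls
% below p^{-M} (up to the constant C) once
%   |j_1 - j_3| > c_0 log p   with   c_0 = M / log(1/gamma).
```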
Since $L_p p^{2 - 2s - \epsilon} = o\{\operatorname{Var}(B_{0,n} \mid H_0)\}$, it follows that
\[
\sum_{m=1}^d \sigma_m^2 = \operatorname{Var}(B_{0,n} \mid H_0)(1 + o(1)) + O_p(p^3 L_p^2 n^{-1/2} p^{-4s}).
\]
Note that $p^3 L_p^2 n^{-1/2} p^{-4s} = o(L_p p^{2-2s})$ for any $n$ and $p$ when $s > 1/2$. Given $n = p^{\xi}$ for $\xi \in (0, 1)$, $p^3 L_p^2 n^{-1/2} p^{-4s}$ is at a small order of $\operatorname{Var}(B_{0,n} \mid H_0) = L_p p^{2-2s}$ if $s > 1/2 - \xi/4$, which proves the first claim of (A.7).

For the second claim of (A.7), notice that $D_m = D_{m,1} + D_{m,2}$, where $|D_{m,2}| \le d L_p O(a^2 p^{-2s} n^{-1/2})$. Given $a \ll \sqrt{n}$, we have $\sum_{m=1}^d d^4 L_p^4 a^8 p^{-8s} n^{-2} \ll \operatorname{Var}^2(B_{0,n} \mid H_0)$ when $s > 1/4 - \xi/8$. Since $D_m^4 \le 8(D_{m,1}^4 + D_{m,2}^4)$, to show the second claim of (A.7), we only need to focus on $D_{m,1}^4$, which is
\begin{align}
&\sum_{m_1=1}^{m-1} \tilde f^4(Z_{S_{m_1}}, Z_{S_m}) + c_1 \sum^{*}_{m_1, m_2} \tilde f^2(Z_{S_{m_1}}, Z_{S_m})\, \tilde f^2(Z_{S_{m_2}}, Z_{S_m}) \nonumber \\
&\quad + c_2 \sum^{*}_{m_1, m_2, m_3} \tilde f^2(Z_{S_{m_1}}, Z_{S_m})\, \tilde f(Z_{S_{m_2}}, Z_{S_m})\, \tilde f(Z_{S_{m_3}}, Z_{S_m}) \tag{A.15} \\
&\quad + \sum^{*}_{m_1, m_2, m_3, m_4} \tilde f(Z_{S_{m_1}}, Z_{S_m})\, \tilde f(Z_{S_{m_2}}, Z_{S_m})\, \tilde f(Z_{S_{m_3}}, Z_{S_m})\, \tilde f(Z_{S_{m_4}}, Z_{S_m}), \nonumber
\end{align}
where $\sum^{*}$ indicates summation over distinct indices smaller than $m$, and $c_1$ and $c_2$ are two positive constants.

Note that the expectation of the last term in (A.15) equals the expectation of its conditional expectation given $Z_{S_m}$, where the conditional expectation is bounded by
\begin{align*}
&\Bigl| \sum^{*}_{m_1, m_2, m_3, m_4} \{E_{S_m} \tilde f(Z_{S_{m_1}}, Z_{S_m})\} \{E_{S_m} \tilde f(Z_{S_{m_2}}, Z_{S_m})\} \{E_{S_m} \tilde f(Z_{S_{m_3}}, Z_{S_m})\} \{E_{S_m} \tilde f(Z_{S_{m_4}}, Z_{S_m})\} \Bigr| \\
&\qquad \le \Bigl( \sum_{m_1 < m} \bigl|E_{S_m} \tilde f(Z_{S_{m_1}}, Z_{S_m})\bigr| \Bigr)^4 = O\{a^8 (m-1)^4 L_p^4 p^{-8s} n^{-2}\}.
\end{align*}
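The two sparsity conditions just stated are pure rate arithmetic. A sketch of the check, taking the remainder and variance orders at the values displayed above ($p^3 L_p^2 n^{-1/2} p^{-4s}$ against $\operatorname{Var}(B_{0,n} \mid H_0) \asymp L_p p^{2-2s}$ for the first claim, and the fourth-moment bound against $\operatorname{Var}^2$ for the second):

```latex
% First claim: with n = p^{xi},
%   p^{3} L_p^{2} n^{-1/2} p^{-4s} / (L_p p^{2-2s})
%     = L_p p^{1 - 2s - xi/2} -> 0
%     iff  s > 1/2 - xi/4 .
% Second claim: with a << n^{1/2} = p^{xi/2} and d = p/a,
%   ( sum_{m=1}^{d} d^{4} L_p^{4} a^{8} p^{-8s} n^{-2} ) / (L_p^{2} p^{4-4s})
%     = L_p^{2}\, p\, a^{3} n^{-2} p^{-4s}
%     \le L_p^{2}\, p^{1 - 4s - xi/2} -> 0
%     iff  s > 1/4 - xi/8 .
```

The binding condition is the first one, since $1/2 - \xi/4 > 1/4 - \xi/8$ for $\xi < 2$.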
Note that the summation of this quantity over $1 \le m \le d$ is a small order term of $\operatorname{Var}^2(B_{0,n} \mid H_0)$ given $s > 1/4 - \xi/8$. Similar to $\sigma_m^2$, $E_{S_m} \tilde f^2(Z_{S_{m_1}}, Z_{S_m})$ is equal to
\[
\sum_{j_1 \in S_{m_1}, j_2 \in S_m} \sum_{j_3 \in S_{m_1}, j_4 \in S_m} E_{S_m}(\tilde A_{j_1 j_2} \tilde A_{j_3 j_4}) = O\{L_p a^2 p^{-2s}\},
\]
which leads to
\[
E_{S_m} \sum^{*}_{m_1, m_2} \tilde f^2(Z_{S_{m_1}}, Z_{S_m})\, \tilde f^2(Z_{S_{m_2}}, Z_{S_m}) = O\{a^4 (m-1)^2 L_p^2 p^{-4s}\}
\]
and
\[
E_{S_m} \sum^{*}_{m_1, m_2, m_3} \tilde f^2(Z_{S_{m_1}}, Z_{S_m})\, \tilde f(Z_{S_{m_2}}, Z_{S_m})\, \tilde f(Z_{S_{m_3}}, Z_{S_m}) = O\{a^6 (m-1)^3 L_p^3 p^{-6s} n^{-1}\}.
\]
The summations of the two terms above over $1 \le m \le d$ are at smaller orders of $\operatorname{Var}^2(B_{0,n} \mid H_0)$. Also notice that
\[
E \sum_{m_1=1}^{m-1} \tilde f^4(Z_{S_{m_1}}, Z_{S_m}) \le \sum_{m_1=1}^{m-1} a^6 \sum_{i \in S_{m_1}, j \in S_m} E \tilde A_{ij}^4 = a^8 (m-1) L_p p^{-2s}.
\]
Since $\sum_{m=1}^d a^8 (m-1) L_p p^{-2s} = a^6 L_p p^{2-2s}/2 \ll L_p^2 p^{4-4s}$ if $a \ll p^{(1-s)/3}$, the second claim of (A.7) is valid given $a \ll \min\{n^{1/2}, p^{(1-s)/3}\}$ and $s > 1/4 - \xi/8$. This proves the asymptotic normality of $T_n(s)$ for $s > 1/2 - \xi/4$ under $H_0$ of (2.1) by choosing $a \ll \min\{n^{1/2}, p^{(1-s)/3}\}$, $b \sim \max\{\log(p), \log(n_1 + n_2)\}$ and $b \ll a$. $\square$

References.

Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: John Wiley & Sons.
Arias-Castro, E., Bubeck, S. and Lugosi, G. (2012). Detection of correlations. The Annals of Statistics, 412–435.
Athreya, K. and Lahiri, S. (2006). Measure Theory and Probability Theory. New York: Springer.
Bai, Z. D., Jiang, D. D., Yao, J. F. and Zheng, S. R. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. The Annals of Statistics, 3822–3840.
Bai, Z. D. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices. New York: Springer.
Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability, 1275–1294.
Berbee, H. (1979). Random Walks with Stationary Increments and Renewal Theory. Amsterdam: Mathematical Centre.
Bickel, P. and Levina, E. (2008a).
Regularized estimation of large covariance matrices. The Annals of Statistics, 199–227.
Bickel, P. and Levina, E. (2008b). Covariance regularization by thresholding. The Annals of Statistics, 2577–2604.
Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes (2nd ed.). New York: Springer.
Bradley, R. (2005). Basic properties of strong mixing conditions: a survey and some open questions. Probability Surveys, 107–144.
Cai, T., Liu, W. D. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 265–277.
Chang, J. Y., Zhou, W., Zhou, W. X. and Wang, L. (2017). Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics, 31–41.
Delaigle, A., Hall, P. and Jin, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 283–301.
de la Fuente, A. (2010). From differential expression to differential networking–identification of dysfunctional regulatory networks in diseases. Trends in Genetics, 326–333.
Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics, 962–994.
Donoho, D. and Jin, J. (2015). Higher criticism for large-scale inference, especially for rare and weak effects. Statistical Science, 1–25.
Fan, J. (1996). Test of significance based on wavelet thresholding and Neyman's truncation. Journal of the American Statistical Association, 674–688.
Gupta, D. S. and Giri, N. (1973). Properties of tests concerning covariance matrices of normal distributions. The Annals of Statistics, 1222–1224.
Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press.
Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. The Annals of Statistics, 1686–1732.
Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics, 54–65.
Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Mathematical Methods of Statistics, 47–69.
John, S. (1971). Some optimal multivariate tests. Biometrika, 123–127.
Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 908–940.
Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control. The Annals of Statistics, 2948–2978.
Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic Processes and their Applications, 309–315.
Nagao, H. (1973). On some test criteria for covariance matrix. The Annals of Statistics, 700–709.
Perlman, M. D. (1980). Unbiasedness of the likelihood ratio tests for equality of several covariance matrices and equality of several multivariate normal populations. The Annals of Statistics, 247–263.
Qiu, Y. and Chen, S. X. (2012). Test for bandedness of high-dimensional covariance matrices and bandwidth estimation. The Annals of Statistics, 1285–1314.
Qiu, Y., Chen, S. X. and Nettleton, D. (2018). Detecting rare and faint signals via thresholding maximum likelihood estimators. The Annals of Statistics, 895–923.
Ren, Z., Sun, T., Zhang, C. H. and Zhou, H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. The Annals of Statistics, 991–1026.
Rothman, A. J. (2012). Positive definite estimators of large covariance matrices. Biometrika, 539–550.
Saulis, L. and Statulevičius, V. A. (1991). Limit Theorems for Large Deviations. Dordrecht: Kluwer Academic.
Schott, J. R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics and Data Analysis, 6535–6542.
Srivastava, M. S. and Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 1319–1329.
Tran, L. T. (1990). Kernel density estimation on random fields. Journal of Multivariate Analysis, 37–53.
Xue, L. Z., Ma, S. Q. and Zou, H. (2012). Positive-definite ℓ₁-penalized estimation of large covariance matrices. Journal of the American Statistical Association, 1480–1491.
Yi, G., Sze, S. H. and Thon, M. R. (2007). Identifying clusters of functionally related genes in genomes. Bioinformatics, 1053–1060.
Zhong, P. S., Chen, S. X. and Xu, M. Y. (2013). Tests alternative to higher criticism for high dimensional means under sparsity and column-wise dependence. The Annals of Statistics, 2820–2851.

Guanghua School of Management and
Center for Statistical Science
Peking University
Beijing, 100871, China
E-mail: [email protected]

Center of Statistical Research and
School of Statistics
Southwestern University of Finance and Economics
Chengdu, Sichuan, 611130, China
E-mail: [email protected]