[PDF] On the Phase Transition of Wilk's Phenomenon

Abstract

Wilk's theorem, which offers universal chi-squared approximations for likelihood ratio tests, is widely used in many scientific hypothesis testing problems. For modern datasets with increasing dimension, researchers have found that the conventional Wilk's phenomenon of the likelihood ratio test statistic often fails. Although new approximations have been proposed in high dimensional settings, there still lacks a clear statistical guideline regarding how to choose between the conventional and newly proposed approximations, especially for moderate-dimensional data. To address this issue, we develop the necessary and sufficient phase transition conditions for Wilk's phenomenon under popular tests on multivariate mean and covariance structures. Moreover, we provide an in-depth analysis of the accuracy of chi-squared approximations by deriving their asymptotic biases. These results may provide helpful insights into the use of chi-squared approximations in scientific practices.

Full PDF

OOn the Phase Transition of Wilk’s Phenomenon B Y Y INQIU H E Department of Statistics, University of Michigan, MI 48109, U.S.A. [email protected] O M ENG , Z

HENGHAO Z ENG

University of Science and Technology of China, Anhui, 230026, P.R. China [email protected], [email protected]

AND G ONGJUN X U Department of Statistics, University of Michigan, MI 48109, U.S.A. [email protected] S UMMARY

Wilk’s theorem, which offers universal chi-squared approximations for likelihood ratio tests, is widelyused in many scientiﬁc hypothesis testing problems. For modern datasets with increasing dimension, re-searchers have found that the conventional Wilk’s phenomenon of the likelihood ratio test statistic oftenfails. Although new approximations have been proposed in high dimensional settings, there still lacks aclear statistical guideline regarding how to choose between the conventional and newly proposed approx-imations, especially for moderate-dimensional data. To address this issue, we develop the necessary andsufﬁcient phase transition conditions for Wilk’s phenomenon under popular tests on multivariate meanand covariance structures. Moreover, we provide an in-depth analysis of the accuracy of chi-squared ap-proximations by deriving their asymptotic biases. These results may provide helpful insights into the useof chi-squared approximations in scientiﬁc practices.

Some key words : Wilk’s phenomenon, phase transition

1. I

NTRODUCTION

The likelihood ratio test is a standard testing method for many hypothesis testing problems due to itsnice statistical properties (Anderson, 2003; Muirhead, 2009). Under the low-dimensional setting with aﬁxed number of parameters p and large sample size n , classic theorems offer general asymptotic resultsfor various likelihood ratio test statistics. One of the most celebrated and fundamental results is Wilks’theorem, which states that, under the null hypothesis, twice the negative log-likelihood ratio asymptoti-cally approaches a χ f distribution, where f is the difference of the degrees of freedom between the nulland alternative hypotheses. The popularly used Bartlett correction provides a general rescaling strategythat further improves the ﬁnite sample accuracy of the chi-squared approximations (Cordeiro and Cribari-Neto, 2014; Barndorff-Nielsen and Hall, 1988). Similar Wilk’s phenomenon and Bartlett correction werealso studied for empirical likelihood (Owen, 1990; DiCiccio et al., 1991; Chen and Cui, 2006).Despite the extensive literature on the Wilk’s-type phenomenon of likelihood ratio tests under ﬁnitedimensions, it is of emerging interest to study the large n , diverging p asymptotic regions in a widevariety of modern applications. To understand how large the dimension p can be to ensure the validity ofthe classical Wilk’s phenomenon, various works establish sufﬁcient conditions on the growth rate of p as n increases. For instance, Portnoy (1988) showed that the chi-squared approximation of the likelihood ratio C (cid:13) a r X i v : . [ m a t h . S T ] A ug H E ET AL . test statistic for a simple hypothesis in canonical exponential families holds if p/n / → . Moreover,Hjort et al. (2009), Chen et al. (2009), and Tang and Leng (2010) studied the empirical likelihood ratiostatistic when p → ∞ . Particularly, Chen et al. (2009) argued that p/n / → is likely to be the bestrate for the chi-squared approximation of general empirical likelihood ratio test, and showed that forthe least-squares empirical likelihood, a simpliﬁed version of the empirical likelihood, the chi-squaredapproximation holds if p/n / → . The effect of data dimension was also studied in other inferenceproblems; see, for example, Portnoy (1985), He and Shao (2000), and Wang (2011).When the dimension p further increases, researchers have found that the chi-squared approximationsbased on Wilk’s theorem often become inaccurate, resulting in the failure of the corresponding likelihoodratio tests. To address this issue, various corrections and alternative approximations for the likelihood ratiotests have been proposed. For example, when p is asymptotically proportional to n , namely, p/n → y ∈ (0 , as n → ∞ , Bai et al. (2009), Jiang and Yang (2013), and Jiang and Qi (2015) proposed normalapproximations for the corrected likelihood ratio tests on testing mean vectors and covariance matrices.Zheng (2012), Bai et al. (2013), and He et al. (2020) proposed normal approximations for correctedlikelihood ratio tests in multivariate linear regression models. Furthermore, Sur and Cand`es (2019), Suret al. (2019), and Cand`es and Sur (2020) studied the phase transition of the maximum likelihood estimatorfor the logistic regression, and proposed a rescaled chi-squared approximation for the likelihood ratio test.Despite the proposed distributional theory of the likelihood ratio tests for low- or high-dimensionaldata, there still lacks a quantitative guideline on which approximation should be chosen to use in practice,especially for moderate-dimensional data. For instance, when analyzing a dataset with the number of pa-rameters p ≤ and sample size n = 100 , the chi-squared approximation may be considered as reliable.However, when studying a data set with moderate dimension, e.g., p is between to and sample size n = 100 , it may be unclear to practitioners whether they can still apply the classical chi-squared approxi-mations or they should turn to other high-dimensional asymptotic results. To address this practical issue,it is of interest to investigate the phase transition boundary where the chi-squared approximation startsto fail as p increases, and also characterize the approximation accuracy. Theoretically, this needs a deepunderstanding of the limiting behavior of the likelihood ratio test statistics from low to high dimensions.In this work, we focus on several standard likelihood ratio tests on multivariate mean and covariancestructures that are widely used in biomedical and social sciences (Pituch and Stevens, 2015; Cleff, 2019).For each considered likelihood ratio test, we derive its phase transition boundary of Wilk’s phenomenonand also provide an in-depth analysis of the accuracy of the chi-squared approximation. First, in terms ofthe phase transition boundary, we establish the necessary and sufﬁcient condition for Wilk’s theorem tohold when p increases with n . Speciﬁcally, we show that the chi-squared approximations hold if and onlyif p/n d → , where the value of d depends on the testing problem and whether the Bartlett correction isused. Interestingly, the proposed phase transition boundaries resonate with the abovementioned literature(e.g., Portnoy, 1988; Chen et al., 2009), which mostly focused on sufﬁcient conditions without the Bartlettcorrection. Second, we provide a detailed characterization of the asymptotic bias of each chi-squaredapproximation. Speciﬁcally, we consider two local asymptotic regimes, depending on whether Wilk’stheorem holds or not. Under the asymptotic regime when Wilk’s theorem holds, the derived asymptoticbias sharply characterizes the convergence rate of the distribution of the likelihood ratio test statisticto the limiting chi-squared distribution, and thus provides a useful measure on the accuracy of the chi-squared approximation. When Wilk’s theorem fails, the derived asymptotic bias describes the unignorablediscrepancy between the chi-squared approximation and the true distribution of the likelihood ratio teststatistic. As illustrated in the simulation studies, our theoretical results of the phase transition boundariesand the asymptotic biases may provide a helpful guideline on the use of the chi-squared approximationsin practice. 2. R ESULTS OF O NE -S AMPLE T ESTS

In this section, we present the theoretical results under three one-sample testing problems. We alsoobtain similar results for other multiple-sample testing problems, which are introduced in §

4, and please n the Phase Transition of Wilk’s Phenomenon see their details in the Supplementary Material. Under one-sample problems, suppose x , . . . , x n ∈ R p are independent and identically distributed random vectors with distribution N p ( µ, Σ) , which denotes a p -variate multivariate normal distribution with mean vector µ and covariance matrix Σ . We deﬁne x = n − (cid:80) ni =1 x i and A = (cid:80) ni =1 ( x i − x )( x i − x ) T , and denote the determinant and the trace of A by | A | and tr( A ) , respectively. We next introduce the considered testing problems and the corresponding likelihoodratio tests (Anderson, 2003; Muirhead, 2009).(I) Testing Speciﬁed Value for the Mean Vector.

This test examines whether the population mean vector µ is equal to a speciﬁed vector µ ∈ R p , that is, H : µ = µ against H a : H is not true. Through thetransformation x i − µ , we consider, without loss of generality, µ = (0 , . . . , T . Then, the likelihoodratio test statistic is Λ n = | A | n/ ( A + n ¯ x ¯ x T ) − n/ . When p is ﬁxed and n → ∞ , under the null hypothe-sis, the classical chi-squared approximation without correction is − n d −→ χ f , where d −→ representsthe convergence in distribution and f = p , and the chi-squared approximation with the Bartlett correctionis − ρ log Λ n d −→ χ f , where ρ = 1 − (1 + p/ /n .(II) Testing the Sphericity of the Covariance Matrix.

This test examines whether the covariance matrix Σ is proportional to an identity matrix; that is, H : Σ = λ I p against H a : H is not true, where λ > isan unspeciﬁed constant and I p denotes the p × p identity matrix. The likelihood ratio test statistic is Λ n = | A | ( n − / { tr( A ) /p } − p ( n − / . When p is ﬁxed and n → ∞ , under the null hypothesis, the chi-squaredapproximation is − n d −→ χ f , where f = ( p − p + 2) / , and the chi-squared approximation withthe Bartlett correction is − ρ log Λ n d −→ χ f , where ρ = 1 − { n − p } − (2 p + p + 2) .(III) Joint Testing Speciﬁed Values for the Mean Vector and Covariance Matrix.

Consider a speci-ﬁed vector µ ∈ R p and a speciﬁed positive-deﬁnite matrix Σ ∈ R p × p . We study the test H : µ = µ and Σ = Σ against H a : H is not true. By applying the transformation Σ − / ( x i − µ ) , we assume,without loss of generality, that µ = 0 and Σ = I p . Then, the likelihood ratio test statistic is Λ n =( e/n ) np/ | A | n/ exp {− tr( A ) / − nx T x/ } . When p is ﬁxed and n → ∞ , under the null hypothesis,the chi-squared approximation is − n d −→ χ f , where f = p ( p + 3) / , and the chi-squared approxi-mation with the Bartlett correction is − ρ log Λ n d −→ χ f , where ρ = 1 − { n ( p + 3) } − (2 p + 9 p + 11) . For the likelihood ratio tests of the above three testing problems, Theorem 2.1 gives the phase transitionboundaries of the chi-squared approximations without and with the Bartlett correction.T

HEOREM

Assume n > p + 1 for all n ≥ and n − p → ∞ as n → ∞ . Under H , for the chi-squared approximations without and with the Bartlett correction of each likelihood ratio test in (I)–(III),we have the following necessary and sufﬁcient conditions:(i) sup α ∈ (0 , | pr {− n > χ f ( α ) } − α | → if and only if p/n d → (ii) sup α ∈ (0 , | pr {− ρ log Λ n > χ f ( α ) } − α | → if and only if p/n d → , where the values of d and d under the three testing problems are listed in the table below.(I) Mean (II) Covariance (III) Joint(i) without correction d : / / / (ii) with correction d : / / / In Theorem 2.1, n > p + 1 is assumed to ensure the existence of the likelihood ratio tests. We nextdiscuss the obtained phase transition boundaries of the classical chi-squared approximations without cor-rection. When only testing mean parameters, Theorem 2.1 suggests that the chi-squared approximationholds if and only if p/n / → . This asymptotic regime is similarly assumed in Portnoy (1988), whichconsidered testing p natural parameters in exponential families. However, Portnoy (1988) only showedthe sufﬁciency of p/n / → for the chi-squared approximation to be applied, and did not establish thenecessary and sufﬁcient result, which is essential for understanding the phase transition behaviors. In ad-dition, when the likelihood ratio tests involve covariance matrices as in (II) and (III), Theorem 2.1 showsthat the chi-squared approximation holds if and only if p/n / → , which is consistent with the discus- H E ET AL . sion in Chen et al. (2009). Particularly, under certain regularity conditions, Chen et al. (2009) establishedthat the chi-squared approximation of the empirical likelihood ratio test holds if p/n / → . The authorsfurther argued that p/n / → is likely to be the best rate for p , because it is the necessary and sufﬁcientcondition for the convergence of the sample covariance matrix to the true covariance matrix Σ under thetrace norm when the eigenvalues of Σ are bounded. The analysis provides an intuitive explanation for thephase transition boundaries obtained above, and our necessary and sufﬁcient result would serve as anothersupport for their conjecture, despite the different problem settings in Chen et al. (2009) and here.Additionally, for the chi-squared approximations with the Bartlett correction, Theorem 2.1 also explic-itly characterizes their phase transition boundaries, which generally achieve a larger asymptotic regionthan those without correction. When p is ﬁxed, the Bartlett correction serves as a rescaling strategy thatcan improve the convergence rate of the likelihood ratio statistic from O ( n − ) to O ( n − ) ; however, when p grows with sample size n , the classical result cannot apply directly. Alternatively, the results in Theorem2.1 provide a precise illustration of how the Bartlett correction improves the chi-squared approximationsin terms of the phase transition boundaries.The phase transition boundaries in Theorem 2.1 give the necessary and sufﬁcient conditions on theasymptotic regimes of ( n, p ) in Wilk’s phenomenon. When applying the likelihood ratio test in practice, itis desired to have a better understanding of the accuracy of the chi-squared approximation, especially nearits phase transition boundary. The following Theorem 2.2 characterizes the accuracy of each chi-squaredapproximation for tests (I)–(III) when Wilk’s phenomenon holds. Speciﬁcally, we consider the asymptoticregime where ( n, p ) satisﬁes the corresponding necessary and sufﬁcient condition in Theorem 2.1, i.e., p/n d → and p/n d → for the chi-squared approximations without and with the Bartlett correction,respectively.T HEOREM

For each likelihood ratio test (I)–(III), let d i , i = 1 , take the corresponding valuesin Theorem . . Let z α denote the upper α -level quantile of the standard normal distribution. Consider p → ∞ as n → ∞ . Then under H , given α ∈ (0 , ,(i) when p/n d → , the chi-squared approximation satisﬁes pr {− n > χ f ( α ) } − α = ϑ ( n, p ) √ π exp (cid:18) − z α (cid:19) + o (cid:18) p /d n (cid:19) ; (1) (ii) when p/n d → , the chi-squared approximation with the Bartlett correction satisﬁes pr {− ρ log Λ n > χ f ( α ) } − α = ϑ ( n, p ) √ π exp (cid:18) − z α (cid:19) + o (cid:18) p /d n (cid:19) . (2) The values of ϑ ( n, p ) and ϑ ( n, p ) under three testing problems (I)–(III) are listed below. (I) ϑ ( n, p ) = p + 2 p n √ f , ϑ ( n, p ) = p ( p − ρn ) √ f ; (II) ϑ ( n, p ) = p (2 p + 3 p − − /p n − √ f , ϑ ( n, p ) = ( p − p − p + 2)(2 p + 6 p + 3 p + 2)144 p ρ ( n − √ f ; (III) ϑ ( n, p ) = p (cid:0) p + 9 p + 11 (cid:1) n √ f , ϑ ( n, p ) = p (2 p + 18 p + 49 p + 36 p − p + 3)( ρn ) √ f . In Theorem 2.2, the forms of ϑ ( n, p ) and ϑ ( n, p ) are derived from a nontrivial calculation of cer-tain complicated inﬁnite series (see Eq. (B.20) and (B.28) in the Supplementary Material). We can seethat for each test, ϑ ( n, p ) and ϑ ( n, p ) are of orders of p /d n − and p /d n − , respectively. It fol-lows that ϑ ( n, p ) exp( − z α / / √ π in (1) is the leading term for the chi-squared approximation bias pr {− n > χ f ( α ) } − α , and therefore can be used to measure the accuracy of the chi-squaredapproximation. Similar conclusion also holds for ϑ ( n, p ) exp( − z α / / √ π in (2) when using the chi-squared approximation with the Bartlett correction. We demonstrate the usefulness of (1) and (2) in prac-tice by simulation studies in § n the Phase Transition of Wilk’s Phenomenon In the above discussion, we focus on the local asymptotic regime when Wilk’s phenomenon holds,and the derived bias describes the accuracy of the chi-squared approximation. When p further increasesbeyond this local asymptotic regime, the chi-squared approximation starts to fail, and the approximationbias becomes asymptotically unignorable. The following Theorem 2.3 characterizes such unignorablebiases of the chi-squared approximations. Particularly, we consider the local asymptotic regime p/n → ,which includes the case when Wilk’s theorem fails, that is, p/n d (cid:54)→ for the chi-squared approximation,and p/n d (cid:54)→ for the chi-squared approximation with the Bartlett correction.T HEOREM

Assume p → ∞ and p/n → as n → ∞ . For each likelihood ratio test (I)–(III), under H , there exists a small constant δ ∈ (0 , such that for any α ∈ (0 , ,(i) the chi-squared approximation satisﬁes pr (cid:8) − n > χ f ( α ) (cid:9) − α = ¯Φ (cid:26) χ f ( α ) + 2 µ n nσ n (cid:27) − α + O (cid:26)(cid:16) pn (cid:17) − δ + f − − δ (cid:27) , (3) where ¯Φ( · ) = 1 − Φ( · ) , and Φ( · ) denotes the cumulative distribution function of the standard nor-mal distribution;(ii) the chi-squared approximation with the Bartlett correction satisﬁes pr (cid:8) − ρ log Λ n > χ f ( α ) (cid:9) − α = ¯Φ (cid:40) χ f ( α ) + 2 ρµ n ρnσ n (cid:41) − α + O (cid:26)(cid:16) pn (cid:17) − δ + f − − δ (cid:27) . (4) The values of µ n and σ n under each problem are listed below, where L x,p = log(1 − p/x ) for x > p . (I) µ n = n (cid:110)(cid:16) n − p − (cid:17) ( L n,p − L n − ,p ) + L n,p + pL n, (cid:111) , σ n = 12 ( L n,p − L n − ,p ); (II) µ n = − n − (cid:8) ( n − p − / L n − ,p + p (cid:9) , σ n = − (cid:18) pn − L n − ,p (cid:19) ( n − n ; (III) µ n = − n (cid:8) ( n − p − / L n − ,p + p (cid:9) − p , σ n = − (cid:18) pn − L n − ,p (cid:19) . Theorem 2.3 is derived by quantifying the difference between the characteristic functions of log Λ n and a normal distribution (see Lemma B.3.2 in the Supplementary Material). The local asymptotic regime p/n → is assumed mainly for the technical simplicity of evaluating the asymptotic expansions of thecharacteristic functions. Under the conditions of Theorem 2.3, ¯Φ[ { χ f ( α ) + 2 µ n } / (2 nσ n )] − α in (3)can be approximated by ¯Φ { z α + ( f + 2 µ n ) / (2 nσ n ) } − ¯Φ( z α ) , where ( f + 2 µ n ) / (2 nσ n ) is of the or-der of pn − d (see Remark B.3.2 in the Supplementary Material). Consequently, when the chi-squaredapproximation fails, i.e., pn − d (cid:54)→ , we know that ¯Φ[ { χ f ( α ) + 2 µ n } / (2 nσ n )] − α in (3) character-izes the corresponding unignorable bias of the chi-squared approximation. Similarly, we can show that ¯Φ[ { χ f ( α ) + 2 ρµ n } / (2 ρnσ n )] − α can be approximated by ¯Φ { z α + ( f + 2 ρµ n ) / (2 ρnσ n ) } − ¯Φ( z α ) ,where ( f + 2 ρµ n ) / (2 ρnσ n ) is of the order of p /d n − . Therefore, when the chi-squared approxima-tion with the Bartlett correction fails, i.e., pn − d (cid:54)→ , we know that (4) characterizes the correspondingunignorable approximation bias. Remark . . . Although the above discussions consider p/n d (cid:54)→ and p/n d (cid:54)→ , (3) and (4) inTheorem 2.3 also hold under the asymptotic regimes p/n d → and p/n d → examined in Theorem2.2. However, since Theorems 2.2 and 2.3 focus on different asymptotic regimes and are proved usingdifferent techniques, we can show that when p/n d → and p/n d → , (3) and (4) have an additionalremainder term O { ( p/n ) (1 − δ ) / + f − (1 − δ ) / } compared to (1) and (2), respectively; see Remark B.3.2 inthe Supplementary Material. Therefore, under the asymptotic regimes of Theorem 2.2, (1) and (2) providea sharper characterization of the accuracy of the chi-squared approximations than (3) and (4), respectively. H E ET AL .

3. S

IMULATIONS

We conduct simulation studies to evaluate the ﬁnite-sample performance of the theoretical results. Par-ticularly, under the null hypothesis of the one-sample tests, we generate data with µ = (0 , . . . , T and Σ = I p and use α = 0 . . We next consider problem (III), jointly testing mean and covariance, as an illus-tration example, and present the results of the chi-squared approximation without the Bartlett correction.For test (III) with the Bartlett correction and problems (I)–(II), testing mean and covariance separately,the simulation results are similar and thus presented in § A.3 of the Supplementary Material.First, to examine the phase transition boundary in Theorem 2.1, we take p = (cid:98) n (cid:15) (cid:99) , where n ∈{ , , , } , (cid:15) ∈ { / , . . . , / } , and (cid:98)·(cid:99) denotes the ﬂoor function. We plot the empir-ical type-I error versus (cid:15) in Part (a) of Fig. 1, which is based on 1,000 simulation replications. We can seethat for all considered sample sizes, the empirical type-I errors start to inﬂate around (cid:15) = 1 / , matchingthe phase transition boundary d = 1 / of test (III) in Theorem 2.1. Similar results are obtained for othertests as shown in the Supplementary Material. E m p i r i c a l t y pe - I e rr o r (a) E m p i r i c a l t y pe - I e rr o r / B i a s (b) E m p i r i c a l t y pe - I e rr o r / B i a s (c)Fig. 1: Chi-squared approximation without the Bartlett correction for test (III): (a) Empirical type-I errorfor n = 100 (cross), (asterisk), (square), and (triangle); the theoretical phase transitionboundary (cid:15) = 1 / (vertical dashed line). (b) Empirical type-I error for n = 500 (asterisk); asymptoticbias ϑ ( n, p ) exp( − z α / / √ π in (1) (dot); the difference between the empirical type-I error and theasymptotic bias in (1) (circle). (c) Empirical type-I error for n = 500 (asterisk); the maximum bias overthe bias in (1) and the bias ¯Φ[ { χ f ( α ) + 2 µ n } / (2 nσ n )] − α in (3) (dot); the location where the bias in(3) starts to dominate the bias in (1) (plus sign); the difference between the empirical type-I error and themaximum bias (circle).Second, we numerically evaluate the asymptotic biases in Theorems 2.2 and 2.3 with p = (cid:98) n (cid:15) (cid:99) , where n ∈ { , } and (cid:15) ∈ (0 , . Parts (b) and (c) in Fig. 1 present the results with n = 500 , while theresults with n = 100 are similar and thus reported in the Supplementary Material. Part (b) shows that theasymptotic bias in (1) can be an informative indicator of the failure of Wilk’s theorem. Particularly, as (cid:15) increases, the asymptotic bias in (1) increases accordingly. At the (cid:15) values where the empirical type-I errorbegins to inﬂate (e.g. (cid:15) ∈ [0 . , . ), the difference between the empirical type-I error and the asymptoticbias is still close to 0.05 as shown in the circle line, suggesting that (1) can approximate the bias well.When (cid:15) further increases beyond the phase transition boundary (e.g. (cid:15) > . ), the asymptotic bias keepsincreasing, and its large value indicates the failure of the chi-squared approximation, even though it nowunderestimates the approximation bias in this regime. To better characterize the approximation bias when (cid:15) is beyond the phase transition boundary, we can combine the results in Theorem 2.3 together with thosein Theorem 2.2. Speciﬁcally, Part (c) shows that taking the maximum over the two asymptotic biases in (1)and (3) gives a good evaluation of the approximation bias for a full range of (cid:15) , below or above the phasetransition boundary. We also ﬁnd that using (3) itself does not evaluate the approximation bias well forsmall (cid:15) (results are not presented). Based on our theoretical and numerical results, when applying Wilk’stheorem, we would recommend practitioners to compare the asymptotic bias, either (1) or the maximumover (1) and (3), with a small threshold value that they may specify beforehand, e.g., 0.01-0.02. If the n the Phase Transition of Wilk’s Phenomenon asymptotic bias is larger than the threshold, the chi-squared approximation should not be directly used,and other methods would be needed.4. R ESULTS OF O THER T ESTS

In addition to three one-sample tests in §

2, we also obtain similar theoretical and numerical resultsfor other four popular testing problems in the Supplementary Material. Particularly, we consider threemultiple sample tests: (IV) Testing the equality of several mean vectors; (V) Testing the equality of severalcovariance matrices; (VI) Jointly testing the equality of several mean vectors and covariance matrices. Wealso study (VII) Testing independence between multiple vectors. Similarly to the results in §

2, for eachlikelihood ratio test, we establish not only the phase transition boundary of Wilk’s theorem, but also theapproximation biases under the two asymptotic regimes, where Wilk’s theorem holds or not, respectively.Please see the details in § A of the Supplementary Material.5. D

ISCUSSION

This study derives the phase transition boundary and characterizes the approximation bias of Wilk’stheorem in seven standard likelihood ratio tests. It is interesting to see that the phase transition bound-ary generally depends on the problem setting and whether the Bartlett correction is used or not, whichemphasizes the necessity of statistically-principled guidelines. The approximation bias of Wilk’s theo-rem was also recently studied by Anastasiou and Reinert (2018), which derived an explicit bound of thechi-squared approximation bias for a general family of regular likelihood ratio test statistics. However,as noted in that paper, their bounds are generally not optimized. It is thus of interest to further studythe necessary and sufﬁcient conditions for Wilk’s phenomenon and the approximation accuracy in sucha general setting. Beyond the regular parametric inference problems, Wilk’s-type phenomenon has alsobeen studied in geometrically irregular parametric models (Drton and Williams, 2011; Chen et al., 2018),and extended to nonparametric models and statistical learning theory (e.g., Fan et al., 2000, 2001; Fanand Zhang, 2004; Boucheron and Massart, 2011). Understanding the phase transition behavior of Wilk’sphenomenon for the likelihood ratio tests would shed light on studying the general Wilk’s phenomenonunder these complicated statistical models. Besides the likelihood ratio tests, similar phase transition phe-nomena can also occur for other popular test statistics. For instance, Xu et al. (2019) recently studied theapproximation theory for Pearson’s chi-squared statistics when the number of cells is large, and demon-strated a similar phase transition phenomenon that the asymptotic distribution of the test statistic can beeither a chi-squared or a normal distribution. It is interesting to further investigate the phase transitionboundaries of these tests. A

CKNOWLEDGEMENT

The authors are grateful to the editor, Professor Paul Fearnhead, an associate editor and three refereesfor their valuable comments and suggestions. This research was partially supported by the U.S. NationalScience Foundation. S

UPPLEMENTARY MATERIAL

The supplementary material available at

Biometrika online includes theoretical results for the other fourtesting problems in Section 4, additional simulation studies, and the proofs of the theorems.R

EFERENCES

Abramowitz, M. and I. A. Stegun (1970).

Handbook of mathematical functions with formulas, graphs, and mathe-matical tables (9th ed.), Volume 55. US Government printing ofﬁce. H E ET AL . Anastasiou, A. and G. Reinert (2018). Bounds for the asymptotic distribution of the likelihood ratio. arXiv preprintarXiv:1806.03666 .Anderson, T. (2003).

An Introduction to Multivariate Statistical Analysis . Wiley Series in Probability and Statistics.Wiley.Bai, Z., D. Jiang, J.-F. Yao, and S. Zheng (2009). Corrections to LRT on large-dimensional covariance matrix byRMT.

The Annals of Statistics 37 (6B), 3822–3840.Bai, Z., D. Jiang, J.-f. Yao, and S. Zheng (2013). Testing linear hypotheses in high-dimensional regressions.

Statis-tics 47 (6), 1207–1223.Barndorff-Nielsen, O. and P. Hall (1988). On the level-error after Bartlett adjustment of the likelihood ratio statistic.

Biometrika 75 (2), 374–378.Boucheron, S. and P. Massart (2011). A high-dimensional Wilks phenomenon.

Probability theory and relatedﬁelds 150 (3-4), 405–433.Cand`es, E. J. and P. Sur (2020). The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression.

The Annals of Statistics .Chen, S. X. and H. Cui (2006). On bartlett correction of empirical likelihood in the presence of nuisance parameters.

Biometrika 93 (1), 215–220.Chen, S. X., L. Peng, and Y.-L. Qin (2009). Effects of data dimension on empirical likelihood.

Biometrika 96 (3),711–722.Chen, Y., J. Huang, Y. Ning, K.-Y. Liang, and B. G. Lindsay (2018). A conditional composite likelihood ratio testwith boundary constraints.

Biometrika 105 (1), 225–232.Cleff, T. (2019).

Applied Statistics and Multivariate Data Analysis for Business and Economics: A Modern ApproachUsing SPSS, Stata, and Excel . Springer.Cordeiro, G. M. and F. Cribari-Neto (2014).

An introduction to Bartlett correction and bias reduction . Springer.DiCiccio, T., P. Hall, and J. Romano (1991). Empirical likelihood is Bartlett-correctable. the Annals of Statistics 19 (2),1053–1061.Drton, M. and B. Williams (2011). Quantifying the failure of bootstrap likelihood ratio tests.

Biometrika 98 (4),919–934.Fan, J., H.-N. Hung, and W.-H. Wong (2000). Geometric understanding of likelihood ratio statistics.

Journal of theAmerican Statistical Association 95 (451), 836–841.Fan, J., C. Zhang, and J. Zhang (2001). Generalized likelihood ratio statistics and Wilks phenomenon.

The Annals ofstatistics 29 (1), 153–193.Fan, J. and W. Zhang (2004). Generalised likelihood ratio tests for spectral density.

Biometrika 91 (1), 195–209.He, X. and Q.-M. Shao (2000). On parameters of increasing dimensions.

Journal of Multivariate Analysis 73 (1),120–135.He, Y., T. Jiang, J. Wen, and G. Xu (2020). Likelihood ratio test in multivariate linear regression: from low to highdimension.

Statistica Sinica .Hjort, N. L., I. W. McKeague, and I. Van Keilegom (2009). Extending the scope of empirical likelihood.

The Annalsof Statistics 37 (3), 1079–1111.Jiang, T. and Y. Qi (2015). Likelihood ratio tests for high-dimensional normal distributions.

Scandinavian Journal ofStatistics 42 (4), 988–1009.Jiang, T. and F. Yang (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normaldistributions.

The Annals of Statistics 41 (4), 2029–2074.Luke, Y. L. (1969).

Special functions and their approximations , Volume 2. Academic press.Muirhead, R. J. (2009).

Aspects of multivariate statistical theory , Volume 197. John Wiley & Sons.Owen, A. (1990). Empirical likelihood ratio conﬁdence regions.

The Annals of Statistics , 90–120.Pituch, K. A. and J. P. Stevens (2015).

Applied multivariate statistics for the social sciences: Analyses with SAS andIBMs SPSS . Routledge.Portnoy, S. (1985). Asymptotic behavior of M estimators of p regression parameters when p /n is large; II. Normalapproximation. The Annals of Statistics , 1403–1417.Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of param-eters tends to inﬁnity.

The Annals of Statistics , 356–366.Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992).

Numerical recipes in Fortran 77: the artof scientiﬁc computing . Cambridge university press.Sur, P. and E. J. Cand`es (2019). A modern maximum-likelihood theory for high-dimensional logistic regression.

Proceedings of the National Academy of Sciences 116 (29), 14516–14525.Sur, P., Y. Chen, and E. J. Cand`es (2019). The likelihood ratio test in high-dimensional logistic regression is asymp-totically a rescaled chi-square.

Probability Theory and Related Fields 175 (1-2), 487–558.Tang, C. Y. and C. Leng (2010). Penalized high-dimensional empirical likelihood.

Biometrika 97 (4), 905–920.Ushakov, N. G. (2011).

Selected topics in characteristic functions . Walter de Gruyter.Van der Vaart, A. W. (2000).

Asymptotic statistics , Volume 3. Cambridge university press.Wang, L. (2011). GEE analysis of clustered binary data with diverging number of covariates.

The Annals of Statis-tics 39 (1), 389–417. n the Phase Transition of Wilk’s Phenomenon Whittaker, E. T. and G. N. Watson (1996).

A course of modern analysis . Cambridge university press.Xu, M., D. Zhang, and W. B. Wu (2019). Pearson’s chi-squared statistics: approximation theory and beyond.

Biometrika 106 (3), 716–723.Zheng, S. (2012). Central limit theorems for linear spectral statistics of large dimensional F -matrices. Annales del’IHP Probabilit´es et statistiques 48 (2), 444–476.Zwillinger, D. (2002).

CRC standard mathematical tables and formulae (31st ed.). CRC press.

Supplementary Material for“On the Phase Transition of Wilk’s Phenomenon”

In this supplementary material, we present additional results in § A. Particularly, the theoretical resultsfor tests (IV)–(VI) and test (VII) are given in § A.1 and § A.2, respectively. All the simulations for tests(I)–(VII) are provided in § A.3. We next present the proofs for the testing problem (III) as an illustrationexample in § B, where the corresponding results in Theorems 2.1–2.3 are proved in §§ B.1–B.3, respec-tively. The proofs for other tests are similar and given in § C. The technical lemmas are proved in § D. A Additional Results 10

A.1 Multiple-Sample Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10A.2 Testing Independence between Multiple Vectors . . . . . . . . . . . . . . . . . . . . . . 12A.3 Additional Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

B Proof Illustration with Problem (III) 23

B.1 Proof of Theorem 2.1 (III) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23B.2 Proof of Theorems 2.2 (III) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25B.3 Proof of Theorems 2.3 (III) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

C Proofs of Other Problems 30

C.1 Proofs of Theorems 2.1, A.1 & A.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30C.2 Proofs of Propositions A.1 & A.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39C.3 Proofs of Theorems 2.2, A.2 & A.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41C.4 Proofs of Theorems 2.3, A.3, & A.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

D Proofs of Assisted Lemmas 52

D.1 Results on Asymptotic Expansions of the Gamma Function . . . . . . . . . . . . . . . . 52D.2 Lemmas for Theorems 2.2, A.2 & A.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 55D.3 Lemmas for Theorems 2.3, A.3, & A.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 740 H E ET AL . A. A

DDITIONAL R ESULTS

A.1 . Multiple-Sample Tests

This subsection presents the theoretical results of three multiple-sample tests (IV)–(VI). Under themultiple-sample problems, let k denote the number of samples, which is assumed to be ﬁxed comparedto the sample size. In each sample i = 1 , . . . , k, the observations x i , · · · , x in i are independent andidentically distributed N p ( µ i , Σ i ) random vectors. In this subsection, we deﬁne x i = n − i (cid:80) n i j =1 x ij and A i = (cid:80) n i j =1 ( x ij − x i )( x ij − x i ) T for i = 1 , . . . , k , and let A = A + . . . + A k and n = n + . . . + n k .We next brieﬂy review the likelihood ratio tests for the problems (IV)–(VI).(IV) Testing the Equality of Several Mean Vectors.

Consider H : µ = . . . = µ k agains H a : H is not true, where the covariances of the k samples are assumed to be the same. Deﬁne B = (cid:80) ki =1 n i ( x i − x )( x i − x ) T and x = n − (cid:80) ki =1 n i x i . Then, the likelihood ratio test statistic is Λ n = | A | n/ | A + B | − n/ . When p is ﬁxed and n → ∞ , the chi-squared approximation is − n d −→ χ f ,where f = ( k − p , and the chi-squared approximation with the Bartlett correction is − ρ log Λ n d −→ χ f ,where ρ = 1 − { k + p ) / } /n .(V) Testing the Equality of Several Covariance Matrices.

Consider H : Σ = . . . = Σ k against H a : H is not true. For this test, Λ n = | A | − ( n − k ) / ( n − k ) ( n − k ) p/ × (cid:81) ki =1 ( n i − − ( n i − p/ | A i | ( n i − / is the modiﬁed likelihood ratio test statistic with the unbiasedness property. When p is ﬁxed and min ≤ i ≤ k n i → ∞ , the chi-squared approximation is − n d −→ χ f , where f = p ( p + 1)( k − / , and the chi-squared approximation with the Bartlett correction is − ρ log Λ n d −→ χ f , where ρ = 1 −{ p + 1)( k − } − (2 p + 3 p − { (cid:80) ki =1 ( n i − − − ( n − k ) − } . (VI) Joint Testing the Equality of Mean Vectors and Covariance Matrices.

Consider H : µ = . . . = µ k , Σ = . . . = Σ k against H a : H is not true. The likelihood ratio test statistic is Λ n = n pn/ | A + B | − n/ × (cid:81) ki =1 n − pn i / i | A i | n i / . When p is ﬁxed and min ≤ i ≤ k n i → ∞ , the chi-squaredapproximation is − n d −→ χ f , where f = p ( k − p + 3) / , and the chi-squared approxima-tion with the Bartlett correction is − ρ log Λ n d −→ χ f , where ρ = 1 − { k − p + 3) } − (2 p + 9 p +11)( (cid:80) ki =1 n − i − n − ) . For the likelihood ratio tests (IV)–(VI), Theorem A.1 gives the phase transition boundaries of the chi-squared approximations without and with the Bartlett correction.T

HEOREM

A.1.

Assume n i > p + 1 for i = 1 , . . . , k , and there exists a constant δ ∈ (0 , such that δ < n i /n j < δ − for any ≤ i, j ≤ k. Under H , for the chi-squared approximations without and withthe Bartlett correction, we have the following necessary and sufﬁcient conditions:(i) sup α ∈ (0 , | pr {− n > χ f ( α ) } − α | → if and only if p/n d → (ii) when p = o ( n ) , sup α ∈ (0 , | pr {− ρ log Λ n > χ f ( α ) } − α | → if and only if p/n d → ,where the values of d and d under the three testing problems are listed in the table below.(IV) Mean (V) Covariance (VI) Joint(i) without correction d : / / / (ii) with correction d : / / / In Theorem A.1, the boundedness of n i /n j suggests that the sizes of all the samples are compara-ble. The additional regularity condition p = o ( n ) in (ii) speciﬁes a local asymptotic region, which is ofpractical interest, and simulation studies suggest that the conclusion can hold more generally without thiscondition. With a ﬁxed k , the phase transition boundaries in Theorem A.1 are parallel to those in The-orem 2.1, and the analyses after Theorem 2.1 apply to Theorem A.1 similarly. Particularly, examiningcovariances or not will yield different phase transition boundaries in the three problems. When k alsoincreases with n , the phase transition boundaries would involve k, p , and n , as illustrated in the followingproposition. n the Phase Transition of Wilk’s Phenomenon P ROPOSITION

A.1.

Consider n > p + k , n − k → ∞ , and n − p → ∞ . For Λ n in problem (IV), un-der H , as n → ∞ ,(i) sup α ∈ (0 , | pr {− n > χ f ( α ) } − α | → if and only if √ pk ( p + k ) /n → (ii) sup α ∈ (0 , | pr {− ρ log Λ n > χ f ( α ) } − α | → if and only if √ pk ( p + k ) /n → . Proposition A.1 suggests that the total number of samples k and the dimension of each observation p play symmetric roles in the phase transition boundary of problem (IV). When k is ﬁxed, Proposition A.1is consistent with Theorem A.1. To further illustrate the cases with increasing k , we consider p = (cid:98) n (cid:15) (cid:99) and k = (cid:98) n η (cid:99) , where < (cid:15), η < and (cid:98)·(cid:99) denotes the ﬂoor of a number. Then the two phase transitionboundaries in Proposition A.1 become (i) max { (cid:15), η } + ( (cid:15) + η ) / < and (ii) max { (cid:15), η } + ( (cid:15) + η ) / < , respectively. Speciﬁcally, for (i), when (cid:15) is close to 0, the largest value of η is around / , and viceversa; when (cid:15) = η , suggesting p and k are of the same order, the largest value of (cid:15) is / . For (ii), when (cid:15) is close to 0, the largest value of η is around / , and vice versa; when (cid:15) = η , the largest value of (cid:15) becomes / .In addition to the phase transition boundaries above, the following Theorem A.2, similarly to Theorem2.2, further characterizes the accuracy of each chi-squared approximation for tests (IV)–(VI) when Wilk’stheorem holds. Speciﬁcally, we consider p/n d → and p/n d → for the chi-squared approximationswithout and with the Bartlett correction, respectively.T HEOREM

A.2.

Assume that there exists a constant δ ∈ (0 , such that δ < n i /n j < δ − for any ≤ i, j ≤ k, and p → ∞ as n → ∞ . For each likelihood ratio test (IV)–(VI), let d i , i = 1 , take thecorresponding values in Theorem A. . Then under H , for any α ∈ (0 , ,(i) when p/n d → , (1) in Theorem . holds with the value of ϑ ( n, p ) listed below;(ii) when p/n d → , (2) in Theorem . holds with the values of ϑ ( n, p ) listed below.Let D n,r = (cid:80) ki =1 n − ri − n − r and ˜ D n,r = (cid:80) ki =1 ( n i − − r − ( n − k ) − r .(IV) Mean: ϑ ( n, p ) = p ( k − p + 2 + k )4 n √ f ,ϑ ( n, p ) = ( k − p ( p + k − k − n ρ √ f ; (V) Covariance: ϑ ( n, p ) = ˜ D n, p (2 p + 3 p − √ f ,ϑ ( n, p ) = p ( p + 1)24 ρ √ f (cid:110) ( p − p + 2) ˜ D n, − k − − ρ ) (cid:111) ; (VI) Joint: ϑ ( n, p ) = D n, p (cid:0) p + 9 p + 11 (cid:1) √ f ,ϑ ( n, p ) = p ( p + 3)24 ρ √ f (cid:110) ( p + 1)( p + 2) D n, − k − − ρ ) (cid:111) . Theorem A.2 shows that for multiple-sample tests (IV)–(VI), (1) and (2) in Theorem 2.2 still hold.However, the values of ϑ ( n, p ) and ϑ ( n, p ) depend on the testing problems, and are different fromthose in Theorem 2.2. Similarly to Theorem 2.2, in each test (IV)–(VI), we also know that ϑ ( n, p ) and ϑ ( n, p ) are of the orders of p /d n − and p /d n − , respectively. Then ϑ ( n, p ) exp( − z α / / √ π in (1)and ϑ ( n, p ) exp( − z α / / √ π in (2) are the leading terms of the biases of the chi-squared approximationswithout and with the Bartlett correction, respectively. We can similarly use the derived asymptotic biasesto measure the approximation accuracy, and please see the simulation studies for multiple-sample tests(IV)–(VI) in § A.3.2 H E ET AL . Theorem A.2 focuses on the local asymptotic regime of ( n, p ) when Wilk’s theorem holds. When p further increases such that Wilk’s theorem fails, the biases of the chi-squared approximations becomeunignorable. The following Theorem A.3 characterizes such unignorable biases of the chi-squared approx-imations in testing problems (IV)–(VI). Similarly to Theorem 2.3, we consider a general local asymptoticregime p/n → , which includes the case when Wilk’s theorem fails, i.e., p/n d (cid:54)→ and p/n d (cid:54)→ forthe chi-squared approximations without and with the Bartlett correction, respectively.T HEOREM

A.3.

Assume that there exists a constant δ ∈ (0 , such that δ < n i /n j < δ − for any ≤ i, j ≤ k. Moreover, assume p → ∞ and p/n i → as n i → ∞ . For each likelihood ratio test (I)–(III), under H , for any α ∈ (0 , , (3) and (4) in Theorem . hold under three testing problems (IV)–(VI)with µ n and σ n listed below.(IV) Mean: µ n = n (cid:8) ( n − p − k − / L n − ,p − L n − k,p ) + ( k − L n − ,p + pL n − ,k − (cid:9) ,σ n = 12 (cid:0) L n − ,p − L n − k,p (cid:1) ; (V) Covariance: µ n = 12 k (cid:88) i =1 ( n i − (cid:110) ( n − p − k − / L n − k,p − ( n i − p − / L n i − ,p (cid:111) ,σ n = ( n − k ) n (cid:40) L n − k,p − k (cid:88) i =1 (cid:18) n i − n − k (cid:19) L n i − ,p (cid:41) ; (VI) Joint: µ n = 12 (cid:34) − kp + n (cid:16) n − p − (cid:17) L n,p − k (cid:88) i =1 (cid:26) p n i + n i (cid:16) n i − p − (cid:17) L n i − ,p (cid:27)(cid:35) ,σ n = 12 (cid:32) L n,p − k (cid:88) i =1 n i n × L n i − ,p (cid:33) . Theorem A.3 shows that (3) and (4) still hold for multiple-sample tests (IV)–(VI), where the valuesof µ n and σ n depend on the speciﬁc testing problem. Similarly to Theorem 2.3, the analysis in RemarkB.3.2 also applies here, and we know that when pn − d (cid:54)→ , (3) characterizes the unignorable biases forthe chi-squared approximation, and when pn − d (cid:54)→ , (4) characterizes the unignorable biases for the chi-squared approximation with the Bartlett correction. Moreover, the analysis in Remark 2.0.1 also appliessimilarly to the multiple-sample tests (IV)–(VI), and thus is not repeated here.A.2 . Testing Independence between Multiple Vectors This subsection studies testing the independence between k sets of multivariate normal variables. Sup-pose x , . . . , x n ∈ R p are independent and identically distributed N p ( µ, Σ) random vectors, and we par-tition x i and Σ as x i = ( ξ T i , . . . , ξ T ik ) T and Σ = (Σ jl ) ≤ j,l ≤ k , respectively, where ξ i,j is of size p j × , Σ jl is a p j × p l sub-matrix of Σ , and (cid:80) kj =1 p j = p . In this subsection, we deﬁne x = n − (cid:80) ni =1 x i , ¯ ξ j = n − (cid:80) ni =1 ξ ij , A = (cid:80) ni =1 ( x i − x )( x i − x ) T , and A jj = (cid:80) ni =1 ( ξ ij − ¯ ξ j )( ξ ij − ¯ ξ j ) T . (VII) Testing Independence of Subvectors of Multivariate Normal Distribution.

For the multivari-ate normal distribution, testing the independence between k sets of vectors ξ i, , . . . , ξ i,k is equiv-alent to testing H : Σ jl = 0 , for ≤ j < l ≤ k , against H a : H is not true. The likelihood ratiostatistic is Λ n = | A | n/ (cid:81) kj =1 | A jj | − n/ . When p , . . . , p k are ﬁxed, the chi-squared approximation is − n d −→ χ f , where f = ( p − (cid:80) ki =1 p i ) / the chi-squared approximation with the Bartlett correc-tion is − ρ log Λ n d −→ χ f , where ρ = 1 − (3 / n ) − − { n ( p − (cid:80) ki =1 p i ) } − ( p − (cid:80) ki =1 p i ) . Theorem A.4 below gives the phase transition boundaries of the chi-squared approximations withoutand with the Bartlett correction for test (VII).T

HEOREM

A.4.

Assume n > p + 1 and there exists δ ∈ (0 , such that δ < p i /p j < δ − for ≤ i, j ≤ k . For Λ n in problem (VII), under H , as n → ∞ , n the Phase Transition of Wilk’s Phenomenon (i) sup α ∈ (0 , | pr {− n > χ f ( α ) } − α | → if and only if p/n / → (ii) when p = o ( n ) , sup α ∈ (0 , | pr {− ρ log Λ n > χ f ( α ) } − α | → if and only if p/n / → . The phase transition boundaries in Theorem A.4 are consistent with those in Theorems 2.1 and A.1 fortesting problems (II), (III), (V), and (VI). This is reasonable because testing independence between multi-variate normal vectors examines the structures of covariance matrices. In Theorem A.4, the boundednessof p i /p j suggests that the dimensions of the multiple vectors are comparable. The following PropositionA.2 relaxes this constraint for k = 2 , a case closely related to the canonical correlation analysis.P ROPOSITION

A.2.

Consider n > p + p and n − max { p , p } → ∞ . For Λ n in problem (VII), un-der H , as n → ∞ ,(i) sup α ∈ (0 , | pr {− n > χ f ( α ) } − α | → if and only if √ p p ( p + p ) /n → (ii) sup α ∈ (0 , | pr {− ρ log Λ n > χ f ( α ) } − α | → if and only if √ p p ( p + p ) /n → . Proposition A.2 shows that the effects of p and p on the phase transition boundaries are symmetric. Tofurther illustrate, consider p = (cid:98) n (cid:15) (cid:99) and p = (cid:98) n η (cid:99) , where < (cid:15), η < . Then the two phase transitionboundaries in Proposition A.2 become (i) max { (cid:15), η } + ( (cid:15) + η ) / < and (ii) max { (cid:15), η } + ( (cid:15) + η ) / < , respectively. When (cid:15) = η , i.e., p and p are of the same order, the largest value of (cid:15) and η achievable is(i) / and (ii) / respectively, which are consistent with Theorem A.4. When η is close to 0, the largestvalue of (cid:15) is (i) / and (ii) / respectively. Therefore when one set of the vectors is of ﬁnite dimension,the chi-squared approximations without and with the Bartlett correction can be applied when p/n / → and p/n / → , respectively. This demonstrates an interesting phenomenon that for the phase transitionboundary, the growth rate of p changes as the ratio of p and p varies.Similarly to Theorems 2.2 and A.2, the following Theorem A.5 further characterizes the accuracy ofthe chi-squared approximation under the asymptotic regime where p satisﬁes the corresponding necessaryand sufﬁcient conditions in Theorem A.4.T HEOREM

A.5.

Assume that there exists δ ∈ (0 , such that δ < p i /p j < δ − for ≤ i, j ≤ k , and p → ∞ as n → ∞ . Let d = 1 / and d = 2 / as in Theorem A. . For Λ n in problem (VII), under H ,for any α ∈ (0 , ,(i) when p/n d → , (1) in Theorem . holds with the value of ϑ ( n, p ) below;(ii) when p/n d → , (2) in Theorem . holds with the value of ϑ ( n, p ) below.Let D p,r = p r − (cid:80) kj =1 p rj . Then ϑ ( n, p ) = 2 D p, + 9 D p, n √ f , ϑ ( n, p ) = 1( ρn ) √ f (cid:18) D p, − D p, − D p, D p, (cid:19) . Similar to Theorems 2.2 and A.2, Theorem A.5 focuses on the local asymptotic regime when Wilk’stheorem holds, and we know from a similar analysis that (1) and (2) provide useful information on theaccuracy of the chi-squared approximations. Please see the simulations for test (VII) in § A.3. When p further increases such that Wilk’s theorem fails, the following Theorem A.6 characterizes the unignorablechi-squared approximation biases for test (VII) similarly as in Theorems 2.3 and A.3.T HEOREM

A.6.

Assume that there exists δ ∈ (0 , such that δ < p i /p j < δ − for ≤ i, j ≤ k , and p → ∞ and p/n → as n → ∞ . For Λ n in problem (VII), under H , as n → ∞ , for any α ∈ (0 , , (3) and (4) in Theorem . hold with µ n and σ n listed below. µ n = n (cid:20) − (cid:18) n − p − (cid:19) L n − ,p + k (cid:88) j =1 (cid:26)(cid:18) n − p j − (cid:19) L n − ,p j (cid:27)(cid:21) ,σ n = 12 (cid:18) − L n − ,p + k (cid:88) j =1 L n − ,p j (cid:19) . H E ET AL . Note that Theorem A.6 is analogous to Theorems 2.3 and A.3, and therefore similar analyses andconclusions as in Remarks 2.0.1 and B.3.2 also hold for test (VII), which are not repeated here.A.3 . Additional Simulations

We next introduce the simulation settings of each test and afterwards analyze the numerical results.A.3.1 . One-Sample Tests (I)–(III).

Similarly to Section 3, under the null hypothesis of each one-sample test (I)–(III), we set µ = (0 , . . . , T and Σ = I p . (1) On the phase transition boundaries. We take p = (cid:98) n (cid:15) (cid:99) , where n ∈ { , , , } and (cid:15) ∈{ / , . . . , / } . We next plot the empirical type-I error rates (over 1000 replications) versus (cid:15) foreach chi-squared approximation in Fig. 2. We still include the results in § (2) On the asymptotic biases. To evaluate the asymptotic biases in Theorems 2.2 and 2.3, we take p = (cid:98) n (cid:15) (cid:99) , where n ∈ { , } and (cid:15) ∈ (0 , . The results of n = 100 and (over 3000 replications) aregiven in Fig. 4 and Fig. 5, respectively. In each setting, the range of (cid:15) is chosen such that the largestempirical type-I error is below 0.5.To facilitate the presentation of ﬁgures and the discussions below, we deﬁne (cid:36) = ϑ ( n, p ) exp( − z α / / √ π, (cid:36) = ¯Φ (cid:2)(cid:8) χ f ( α ) + 2 µ n (cid:9) / (2 nσ n ) (cid:3) − α,(cid:36) = ϑ ( n, p ) exp( − z α / / √ π, (cid:36) = ¯Φ (cid:2)(cid:8) χ f ( α ) + 2 ρµ n (cid:9) / (2 ρnσ n ) (cid:3) − α. Then (cid:36) , (cid:36) , (cid:36) , and (cid:36) denote the asymptotic biases in (1)–(4), respectively. For each test in Fig. 4and Fig. 5, we plot (cid:36) and (cid:36) in the subﬁgures in the columns (a) and (c), respectively. Similarly to §

3, to better characterize each approximation bias when (cid:15) is beyond the corresponding phase transitionboundary, we combine the results in Theorem 2.2 and those in Theorem 2.3. Speciﬁcally, in the column(b) of Fig. 4 and Fig. 5, we plot M c ( (cid:36) , (cid:36) ) ≡ (cid:36) { (cid:36) < c } + max { (cid:36) , (cid:36) } { (cid:36) ≥ c } , where {·} denotes an indicator function, and c denotes a small positive threshold, and we choose c = 0 . in thesimulations. This deﬁnition of M c ( (cid:36) , (cid:36) ) suggests that (cid:36) is used when the approximation bias issmaller than c , and max { (cid:36) , (cid:36) } is used when the approximation bias becomes larger. Similarly, wedeﬁne M c ( (cid:36) , (cid:36) ) ≡ (cid:36) { (cid:36) < c } + max { (cid:36) , (cid:36) } { (cid:36) ≥ c } , and plot it in the column (d) of Fig. 4and Fig. 5. Remark A. . . For each chi-squared approximation, max { (cid:36) , (cid:36) } already characterizes the bias wellmost of the time. We use M c ( (cid:36) , (cid:36) ) instead of max { (cid:36) , (cid:36) } because (cid:36) can mistakenly indicate alarge bias under small (cid:15) , especially when n is small. Compared to max { (cid:36) , (cid:36) } , M c ( (cid:36) , (cid:36) ) does notuse (cid:36) when (cid:36) indicates that the bias is still small. As long as c is sufﬁciently small but not too closeto zero, M c ( (cid:36) , (cid:36) ) will not take the wrong value given by (cid:36) , and thus gives a good evaluation ofthe approximation bias under a wide range of (cid:15) values. Despite the difference between M c ( (cid:36) , (cid:36) ) and max { (cid:36) , (cid:36) } , we note that M c ( (cid:36) , (cid:36) ) is equal to max { (cid:36) , (cid:36) } under most cases. For instance, inall our simulations with n = 500 and c = 0 . , M c ( (cid:36) , (cid:36) ) = max { (cid:36) , (cid:36) } . Thus in §

3, we did nothighlight this difference. When the Bartlett correction is used, we know that similar analysis applies to max { (cid:36) , (cid:36) } and M c ( (cid:36) , (cid:36) ) .A.3.2 . Multiple-Sample Tests (IV)–(VI). Consider k = 3 , n = n = n , and n = n + n + n . Un-der the null hypothesis of each multiple-sample test (IV)–(VI), we set µ i = (0 , . . . , T , and Σ i = I p for i = 1 , , . (1) On the phase transition boundaries. Let p = (cid:98) n (cid:15) (cid:99) , where n = n + n + n and n i ∈{ , , , } for i = 1 , , . We then plot the empirical type-I error rates (over 1000 replica-tions) versus (cid:15) for each chi-squared approximation in Fig. 3. (2) On the asymptotic biases. To evaluate the asymptotic biases in Theorems A.2 and A.3, we take p = (cid:98) n (cid:15) (cid:99) , where n = n + n + n , n i ∈ { , } for i = 1 , , , and (cid:15) ∈ (0 , . The results of n i = 100 and (over 3000 replications) are given in Fig. 6 and Fig. 7, respectively. Similarly to Fig. 4 and Fig. n the Phase Transition of Wilk’s Phenomenon

5, in each row of Fig. 6 and Fig. 7, the lines with dot markers in the four columns (a)–(d) give (cid:36) , M c ( (cid:36) , (cid:36) ) , (cid:36) , and M c ( (cid:36) , (cid:36) ) , respectively.A.3.3 . Testing Independence between Multiple Tests (VII). Consider k = 3 . Under the null hypothesisof test (VII), we set µ = (0 , . . . , T and Σ = I p . (1) On the phase transition boundaries. Let p = (cid:98) n (cid:15) (cid:99) , where (cid:15) ∈ { / , / , . . . , / } and n ∈{ , , , } . Under each ( n, p ) , we set p = p = (cid:98) p/ (cid:99) and p = p − p − p , and then plotthe empirical type I error (over 1000 replications) versus (cid:15) in Fig. 3. (2) On the asymptotic biases. To evaluate the asymptotic biases in Theorems A.5 and A.6, we set p = (cid:98) n (cid:15) (cid:99) ,where n ∈ { , } and (cid:15) ∈ (0 , . Under each ( n, p ) , we take p = p = (cid:98) p/ (cid:99) and p = p − p − p .The results of n = 100 and (over 3000 replications) are given in Fig. 8 and Fig. 9, respectively.Similarly to Figures 4–7, in Fig. 8 and Fig. 9, the lines with dot markers in the four columns (a)–(d) give (cid:36) , M c ( (cid:36) , (cid:36) ) , (cid:36) , and M c ( (cid:36) , (cid:36) ) , respectively.We next analyze the simulation results. First, as shown in Figures 2 and 3, the theoretical phase tran-sition boundary, denoted by a vertical line, is observed to be consistent with where each chi-squaredapproximation starts to fail. For instance, the two plots in the ﬁrst row of Fig. 2 show that for test (I),the type-I error rates of the chi-squared approximations without and with the Bartlett correction begin toinﬂate when (cid:15) is around / and / , respectively. These are consistent with d = 2 / and d = 4 / fortest (I) in Theorem 2.1. Similarly for other tests, we can see that the numerical results are also consistentwith the corresponding conclusions in Theorems 2.1, A.1, and A.4.Second, similarly to §

3, the results in Figures 4–9 show that the derived theoretical asymptotic biasesprovide good evaluations of the corresponding chi-squared approximation biases. From the subﬁguresin the column (a) of Figures 4–9, we can see that as (cid:15) increases, the empirical type-I error inﬂates, and (cid:36) also increases accordingly. At the (cid:15) values where the type-I error begins to inﬂate, the differencebetween the empirical type-I error and (cid:36) is close to 0.05, as shown by the circle line, which suggests that (cid:36) approximates the chi-squared approximation bias pr {− n > χ f ( α ) } − α well in this regime.When (cid:15) further increases beyond the corresponding phase transition boundary, the asymptotic bias (cid:36) keeps increasing, and its large value indicates the failure of the chi-squared approximation, even thoughnow (cid:36) underestimates the approximation bias in this regime. To better characterize the approximationbias when (cid:15) is beyond the phase transition boundary, we combine (cid:36) and (cid:36) by plotting M c ( (cid:36) , (cid:36) ) inthe column (b) of Figures 4–9. The results suggest that utilizing the two asymptotic biases in (1) and in (3)together can give a good evaluation of the approximation bias under a wide range of (cid:15) values, either belowor above the phase transition boundary. Moreover, in each subﬁgure in the column (b), we also highlightthe location with x -axis (cid:15) ∗ where M c ( (cid:36) , (cid:36) ) starts to be larger than (cid:36) (the plus sign). When (cid:15) < (cid:15) ∗ , M c ( (cid:36) , (cid:36) ) = (cid:36) , indicating that (cid:36) approximates the bias better than (cid:36) does in this regime, while (cid:36) performs better than (cid:36) when (cid:15) ≥ (cid:15) ∗ . Similarly, for the chi-squared approximation with the Bartlettcorrection, similar conclusions can be obtained by the results in the columns (c) and (d) of Figures 4–9.The simulations under the ﬁnite sample suggest that the derived asymptotic biases can be used as prac-tical guidelines for the considered likelihood ratio tests. Speciﬁcally, when using the chi-squared approxi-mation in each test, similarly to our recommendation in §

3, the practitioners can compare the asymptoticbias, either (cid:36) or M c ( (cid:36) , (cid:36) ) , with a small threshold value that they may specify in advance, e.g.,0.01–0.02. If the asymptotic bias is larger than the threshold, the chi-squared approximation should not bedirectly used, and other methods would be needed. In addition, when using the chi-squared approximationwith the Bartlett correction in each test, we can compare the asymptotic bias, either (cid:36) or M c ( (cid:36) , (cid:36) ) with the pre-speciﬁed threshold value. Similarly, if the asymptotic bias is larger than the threshold, thechi-squared approximation with the Bartlett correction should not be directly applied, and other methodswould be needed.6 H E ET AL . T e s t (I) E m p i r i c a l t y pe - I e rr o r E m p i r i c a l t y pe - I e rr o r T e s t (II) E m p i r i c a l t y pe - I e rr o r E m p i r i c a l t y pe - I e rr o r T e s t (III) E m p i r i c a l t y pe - I e rr o r (i) Without the Bartlett correction E m p i r i c a l t y pe - I e rr o r (ii) With the Bartlett correction correction Fig. 2: One-sample tests (I)–(III). Rows 1-3 give the results for tests (I)–(III), respectively. Columns (i) and(ii) correspond to the chi-squared approximations without and with the Bartlett correction, respectively.Within each subﬁgure: empirical type-I error versus (cid:15) with n = 100 (cross), (asterisk), (square),and (triangle); theoretical phase transition boundary (vertical dashed line). n the Phase Transition of Wilk’s Phenomenon T e s t (I V ) E m p i r i c a l t y pe - I e rr o r E m p i r i c a l t y pe - I e rr o r T e s t ( V ) E m p i r i c a l t y pe - I e rr o r E m p i r i c a l t y pe - I e rr o r T e s t ( V I) E m p i r i c a l t y pe - I e rr o r E m p i r i c a l t y pe - I e rr o r T e s t ( V II) E m p i r i c a l t y pe - I e rr o r (i) Without the Bartlett correction E m p i r i c a l t y pe - I e rr o r (ii) With the Bartlett correctionFig. 3: Multiple-sample tests (IV)–(VI) and the independence test (VII). Rows 1-4 give results for tests(IV)–(VII), respectively. Columns (i) and (ii) are for the chi-squared approximations without and with theBartlett correction, respectively. Within each subﬁgure, please see the caption description in Fig. 2.8 H E ET AL . Test(I) . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias

Test(II) . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . . Empirical type-I error / Bias

Test(III) . . . . . . . . . . . Empirical type-I error / Bias ( a ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . Empirical type-I error / Bias ( b ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( c ) W it h t h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( d ) W it h t h e B a r tl e tt c o rr ec ti on F i g . : O n e - s a m p l e t e s t s (I) – (III) w h e n n = . R o w s r e s e n tt h e r e s u lt s f o r t e s t s (I) – (III) , r e s p ec ti v e l y . F o rf ou r c o l u m n s i n eac h r o w : ( a ) W it hou t t h e B a r tl e tt c o rr ec ti on : e m p i r i ca lt yp e -I e rr o r v e r s u s (cid:15) ( a s t e r i s k ) ; (cid:36) , i . e ., t h ea s y m p t o ti c b i a s i n ( )( do t ) ;t h e d i ff e r e n ce b e t w ee n t h ee m p i r i ca lt yp e -I e rr o r a nd (cid:36) ( c i r c l e ) . ( b ) W it hou tt h e B a r tl e tt c o rr ec ti on : e m p i r i ca lt yp e -I e rr o r v e r s u s (cid:15) ( a s t e r i s k ) ; M c ( (cid:36) , (cid:36) ) w it h c = . ( do t ) ;t h e l o ca ti on w it h x - a x i s (cid:15) ∗ s a ti s f y i ng M c ( (cid:36) , (cid:36) ) = (cid:36) w h e n (cid:15) < (cid:15) ∗ a nd M c ( (cid:36) , (cid:36) ) > (cid:36) w h e n (cid:15) ≥ (cid:15) ∗ ( p l u ss i gn ) ;t h e d i ff e r e n ce b e t w ee n t h ee m p i r i ca lt yp e -I e rr o r a nd M c ( (cid:36) , (cid:36) ) ( c i r c l e ) . ( c ) W it h t h e B a r tl e tt c o rr ec ti on : e m p i r i ca lt yp e -I e rr o r v e r s u s (cid:15) ( a s t e r i s k ) ; (cid:36) , i . e ., t h ea s y m p t o ti c b i a s i n ( )( do t ) ;t h e d i ff e r e n ce b e t w ee n t h ee m p i r i ca lt yp e -I e rr o r a nd (cid:36) ( c i r c l e ) . ( d ) W it h t h e B a r tl e tt c o rr ec ti on : e m p i r i ca lt yp e -I e rr o r v e r s u s (cid:15) ( a s t e r i s k ) ; M c ( (cid:36) , (cid:36) ) w it h c = . ( do t ) ;t h e l o ca ti on w it h x - a x i s (cid:15) ∗ s a ti s f y i ng M c ( (cid:36) , (cid:36) ) = (cid:36) w h e n (cid:15) < (cid:15) ∗ a nd M c ( (cid:36) , (cid:36) ) > (cid:36) w h e n (cid:15) ≥ (cid:15) ∗ ( p l u ss i gn ) ;t h e d i ff e r e n ce b e t w ee n t h e e m p i r i ca lt yp e -I e rr o r a nd M c ( (cid:36) , (cid:36) ) ( c i r c l e ) . n the Phase Transition of Wilk’s Phenomenon Test(I) . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias

Test(II) . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . Empirical type-I error / Bias

Test(III) . . . . . . . . . . Empirical type-I error / Bias ( a ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . Empirical type-I error / Bias ( b ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( c ) W it h t h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( d ) W it h t h e B a r tl e tt c o rr ec ti on F i g . : O n e - s a m p l e t e s t s (I) – (III) w h e n n = . R o w s r e s e n tt h e r e s u lt s f o r t e s t s (I) – (III) , r e s p ec ti v e l y . F o rf ou r c o l u m n s i n eac h r o w , p l ea s e s ee t h e ca p ti ond e s c r i p ti on i n F i g . . H E ET AL . Test(IV) . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . . Empirical type-I error / Bias

Test(V) . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . Empirical type-I error / Bias

Test(V) . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias . . . . . . . . . . . . . Empirical type-I error / Bias

Test(VI) . . . . . . . . . . . Empirical type-I error / Bias ( a ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . Empirical type-I error / Bias ( b ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . . Empirical type-I error / Bias ( c ) W it h t h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . . Empirical type-I error / Bias ( d ) W it h t h e B a r tl e tt c o rr ec ti on F i g . : M u lti p l e - s a m p l e t e s t s (I V ) – ( V I) w h e n n = . R o w s r e s e n tt h e r e s u lt s f o r t e s t s (I V ) – ( V I) , r e s p ec ti v e l y . F o rf ou r c o l u m n s i n eac h r o w , p l ea s e s ee t h eca p ti ond e s c r i p ti on i n F i g . . H E ET AL . . . . . . . . . . . Empirical type-I error / Bias ( a ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . Empirical type-I error / Bias ( b ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( c ) W it h t h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( d ) W it h t h e B a r tl e tt c o rr ec ti on F i g . : I nd e p e nd e n ce t e s t ( V II) w h e n n = : f o r c o l u m n s ( a ) – ( d ) , p l ea s e s ee t h eca p ti ond e s c r i p ti on i n F i g . . . . . . . . . . . . . Empirical type-I error / Bias ( a ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . Empirical type-I error / Bias ( b ) W it hou tt h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( c ) W it h t h e B a r tl e tt c o rr ec ti on . . . . . . . . . . . . Empirical type-I error / Bias ( d ) W it h t h e B a r tl e tt c o rr ec ti on F i g . : I nd e p e nd e n ce t e s t ( V II) w h e n n = : f o r c o l u m n s ( a ) – ( d ) , p l ea s e s ee t h eca p ti ond e s c r i p ti on i n F i g . . n the Phase Transition of Wilk’s Phenomenon B. P

ROOF I LLUSTRATION WITH P ROBLEM (III)In this section, we illustrate the proofs of Theorems 2.1–2.3 by focusing on the testing problem (III),which jointly tests the the one-sample mean vector and covariance matrix. Other testing problems (I)–(II)and (IV)–(VII) can be proved following a similar analysis, and are discussed in Section C. We deﬁne somenotation to facilitate the proofs. For two sequences of numbers { a n ; n ≥ } and { b n ; n ≥ } , a n = O ( b n ) denotes lim sup n →∞ | a n /b n | < ∞ ; a n = o ( b n ) denotes lim n →∞ a n /b n = 0 ; a n = Θ( b n ) representsthat a n = O ( b n ) and b n = O ( a n ) hold simultaneously; a n ∼ b n denotes lim n →∞ | a n /b n | = 1 .B.1 . Proof of Theorem . (III) When p is ﬁxed, the chi-squared approximations hold by the classical multivariate analysis (Anderson,2003; Muirhead, 2009). Therefore, without loss of generality, the proofs below focus on p → ∞ .Deriving the necessary and sufﬁcient conditions for the chi-squared approximations requires the correctunderstanding of the limiting behavior of log Λ n under both low and high dimensions. Particularly, weexamine the limiting distribution of the log likelihood ratio test statistic log Λ n based on the momentgenerating function of log Λ n , that is, E { exp( t log Λ n ) } . For Λ n in question (III), by Theorem 8.5.3 andCorollary 8.5.4 in Muirhead (2009), we have that under H , E { exp( t log Λ n ) } = E(Λ tn ) = (cid:18) en (cid:19) npt/ (1 + t ) − np (1+ t ) / × Γ p [ { n (1 + t ) − } / p { ( n − / } , (B.1)where Γ p ( · ) is the multivariate Gamma function; see Deﬁnition 2.1.10 in Muirhead (2009).When p is ﬁxed, the moment generating function of − n approximates that of a chi-squaredvariable χ f , where f = p ( p + 3) / ; see, Sections 8.2.4 and 8.5 in Muirhead (2009). When p → ∞ , Jiangand Yang (2013) and Jiang and Qi (2015) derived an approximate expansion of the multivariate Gammafunction, and their Theorem 5 utilized (B.1) to show that under the conditions of Theorem 2.1, E[exp { s ( − n + 2 µ n ) / (2 nσ n ) } ] → exp( s / , (B.2)where exp( s / is the moment generating function of N (0 , , and µ n = − (cid:26) n (2 n − p −

3) log (cid:18) − pn − (cid:19) + 2( n + 1) p (cid:27) , (B.3) σ n = − (cid:26) pn − (cid:18) − pn − (cid:19)(cid:27) . (B.4)We next prove (i) in Theorem 2.1 when p → ∞ based on (B.2). Particularly, we write sup α ∈ (0 , (cid:12)(cid:12) pr {− n > χ f ( α ) } − α (cid:12)(cid:12) = sup α ∈ (0 , (cid:12)(cid:12)(cid:12) pr( T n > q n,α ) − ¯Φ( q n,α ) + ¯Φ( q n,α ) − ¯Φ( z α ) (cid:12)(cid:12)(cid:12) , (B.5)where T n = ( − n + 2 µ n ) / (2 nσ n ) , q n,α = { χ f ( α ) + 2 µ n } / (2 nσ n ) , and ¯Φ( · ) = 1 − Φ( · ) with Φ( · ) being the cumulative distribution function of N (0 , . Since (B.2) suggests that T n converges to N (0 , in distribution, and the cumulative distribution function of N (0 , is continuous, by P´olya-Cantelli Lemma (see, e.g., Lemma 2.11 in Van der Vaart (2000)), we have sup α ∈ (0 , | pr( T n > q n,α ) − ¯Φ( q n,α ) | → . Consequently, (B.5) → if and only if sup α ∈ (0 , | ¯Φ( q n,α ) − ¯Φ( z α ) | → , which is equiv-alent to sup α ∈ (0 , | q n,α − z α | → , as ¯Φ( · ) is a continuous and strictly decreasing function with boundedderivative. Since χ f can be viewed as a summation over f independent χ variables, and f → ∞ as p → ∞ , we can apply BerryEsseen theorem to χ f variable, and obtain sup α ∈ (0 , (cid:12)(cid:12) { χ f ( α ) − f } / (cid:112) f − z α (cid:12)(cid:12) = O ( f − / ) . (B.6)4 H E ET AL . Therefore, sup α ∈ (0 , | q n,α − z α | → is equivalent to (cid:112) f × (2 nσ n ) − → , (B.7) ( O (1) + f + 2 µ n ) × (2 nσ n ) − → . (B.8)Following similar analysis, we know that under the conditions of Theorem 2.1 and p → ∞ , for the chi-squared approximation with the Bartlett correction, sup α ∈ (0 , | pr {− ρ log Λ n > χ f ( α ) } − α | holds ifand only if (cid:112) f × (2 nρσ n ) − → , (B.9) ( O (1) + f + 2 ρµ n ) × (2 nρσ n ) − → . (B.10)We next examine (B.7)–(B.8) and (B.9)–(B.10) for the chi-squared approximation without and with theBartlett correction, respectively. (III.i) The chi-squared approximation. We next discuss two cases lim n →∞ p/n = 0 and lim n →∞ p/n = C ∈ (0 , , respectively. Case (III.i.1) lim n →∞ p/n = 0 . Under this case, we prove that (B.7) holds. As √ f ∼ p , it is equivalentto show that p/ (2 nσ n ) → . By Taylor’s expansion of σ n in (B.4), we have σ n = − pn − − log (cid:18) − pn − (cid:19) = p n − + o (cid:18) p n (cid:19) , and therefore √ f × (2 nσ n ) − → . We next show that (B.8) holds if and only if p /n → . Given(B.7) and √ f ∼ p , (B.8) is equivalent to ( f + 2 µ n ) /p → . By p/n = o (1) and Taylor’s expansion of log(1 − x ) , for µ n in (B.3), we have µ n /p = − n + 1) + n (2 n − p − (cid:26) n − p n − + p n − + O (cid:18) p n (cid:19)(cid:27) (B.11) = − n + 1) + (2 n − p − (cid:26) p n −

1) + p n − (cid:27) + 2 + o (1) + O (cid:18) p n (cid:19) = − p − n − p − p n −

1) + (2 n − p − p n − + o (1) + O (cid:18) p n (cid:19) . As f /p = p + 3 , we obtain × ( f + 2 µ n ) /p = − p + { n − − p − } p n −

1) + 2 p n −

1) + o (1) + O (cid:18) p n (cid:19) (B.12) = − p n −

1) + o (1) + O (cid:18) p n (cid:19) . Therefore when p/n → , (B.8) holds if and only if p /n → . Case (III.i.2) lim n →∞ p/n = C ∈ (0 , . Under this case, we have (cid:112) f × (2 nσ n ) − ∼ p (2 nσ n ) − ∼ C (2 σ n ) − . (B.13)If C = 1 , σ n → ∞ and thus (B.13) → . If C ∈ (0 , , we have C (2 σ n ) − ∼ C [ − { C + log(1 − C ) } ] − / < when < C < . In summary, (B.7) does not hold, which suggests that the chi-squaredapproximation fails.Finally, we consider a general sequence p/n = p n /n ∈ [0 , , where we write p as p n to emphasize that p changes with n . Similarly, we also write f as f n . Note that a sequence converges if and only if everysubsequence converges. For the sequence { p n /n } , by the BolzanoWeierstrass theorem, we can furthertake a subsequence { n t } such that p n t /n t → C ∈ [0 , . If C ∈ (0 , , the above analysis still applies,which shows that the chi-squared approximation fails. Alternatively, if all the subsequences of { p/n } n the Phase Transition of Wilk’s Phenomenon converge to , we know p/n → . In summary, the above analysis shows that (B.7) and (B.8) hold if andonly if p /n → . (III.ii) The chi-squared approximation with the Bartlett correction. Similarly to the analysis above, wediscuss two cases lim n →∞ p/n = 0 and lim n →∞ p/n = C ∈ (0 , , respectively. Case (III.ii.1) lim n →∞ p/n = 0 . Under this case, we know (B.9) holds since ρ = 1 + O ( p/n ) → and p/ (2 nσ n ) → as shown in Case (III.i.1) above. Given (B.9), deriving the condition for (B.10) is equiva-lent to examine when p − ( f + 2 ρµ n ) → . Following the analysis of (B.12), we further obtain × ( f + 2 µ n ) /p = ( p + 3) − n + 1) + n (2 n − p − (cid:88) j =1 p j − j ( n − j + O (cid:18) p n (cid:19) (B.14) = − p n − − p n − + O (cid:18) p n (cid:19) + o (1) . We write ρ = 1 − ∆ n where ∆ n = { n ( p + 3) } − (2 p + 9 p + 11) , which is O ( p/n ) . By (B.12), wehave µ n /p = − p − − p / { n − } + o (1) + O ( p n − ) . Together with (B.14), we have × ( f + 2 ρµ n ) /p = 2 × ( f + 2 µ n ) /p − n × µ n /p (B.15) = − p n − − p n − − ∆ n (cid:26) − p − − p n − (cid:27) + O (cid:18) p n (cid:19) + o (1)= − p n − − p n − + 2 p ( p + 3)6 n ( p + 3) + 2 p × p n ( p + 3) × n −

1) + O (cid:18) p n (cid:19) + o (1)= − p n + O (cid:18) p n (cid:19) + o (1) . Therefore under this case (B.10) holds if and only if p /n → . Case (III.ii.2):

When lim n →∞ p/n = C ∈ (0 , , we have ρ → − C/ and (cid:112) f × (2 nρσ n ) − ∼ C × (1 − C/ − (2 σ n ) − . Similarly to the Case (III.i.2) above, if C = 1 , (B.9) → ; if C ∈ (0 , , we have C (1 − C/ − (2 σ n ) − ∼ C (1 − C/ − [ − { C + log(1 − C ) } ] − / < when < C < . In summary, (B.9)does not hold, which suggests the failure of the chi-squared approximation with the Bartlett correction.For a general sequence p/n = p n /n ∈ [0 , , the analysis of taking subsequences above can be ap-plied similarly. In summary, we know that for the likelihood ratio test in problem (III), the chi-squaredapproximation with the Bartlett correction holds if and only if p /n → . B.2 . Proof of Theorem . (III) Similarly to § B.1, in this subsection, we prove Theorem 2.2 for problem (III) as an illustration example,while the proofs of other problems are similar and the details are provided in § C.3. Particularly, we proveTheorem 2.2 for problem (III) by examining the characteristic function of − η log Λ n , where η = 1 or η = ρ , and ρ is the corresponding Bartlett correction factor, given in §

2. The following Lemma B.2.1gives an asymptotic expansion for the characteristic function E { exp( − itη log Λ n ) } , where the notation i is reserved to denote the solution of the equation x = − , i.e., the imaginary unit.L EMMA

B.2.1.

Under H of the testing problem (III), when η = 1 or η = ρ with the Bartlett correc-tion factor ρ in § , the characteristic function of − η log Λ n satisﬁes that for a given integer L , when p L +2 /n L → , E { exp( − itη log Λ n ) } = (1 − it ) − f/ exp (cid:34) L − (cid:88) l =1 ς l (cid:8) (1 − it ) − l − (cid:9) + O (cid:18) p L +2 n L (cid:19)(cid:35) , H E ET AL . where f = p ( p + 3) / is the corresponding degrees of freedom, and ς l = ( − l +1 l ( l + 1) p (cid:88) j =1 (cid:40) B l +1 (cid:18) (1 − η ) n − j (cid:19) − (cid:18) (1 − η ) n (cid:19) l +1 (cid:41) (cid:16) ηn (cid:17) − l . (B.16) For any integer l ≥ , B l ( · ) represents the Bernoulli polynomial of degree l ; see, e.g., Eq. (25) in Section8.2.4 of Muirhead (2009).Proof. Section D.2.1 on Page 55. (cid:3)

With Lemma B.2.1, we next prove (1) and (2) in Theorem 2.2 for the chi-squared approximations withoutand with the Bartlett correction, respectively. (i) The chi-squared approximation.

When ρ = 1 , as B l +1 ( · ) is a polynomial of order l + 1 , we have ς l = O ( p l +2 n − l ) for l ≥ , and we can check that ς = Θ( p n − ) ; see (B.23). Thus when p /n → , ς l → for l ≥ . Let Ψ( t ) = E { exp( − it log Λ n ) } . Then by Lemma B.2.1, Ψ( t ) = (1 − it ) − f/ (cid:40) exp (cid:34) (cid:88) l =1 ς l (cid:8) (1 − it ) − l − (cid:9) + O (cid:0) p n − (cid:1)(cid:35)(cid:41) . (B.17)By Taylor’ expansion, we can write exp[ ς l { (1 − it ) − l − } ] = 1 + V l ( t ) , where V l ( t ) = ∞ (cid:88) v =1 ς vl v ! v (cid:88) w =0 (cid:18) vw (cid:19) (1 − it ) − lw ( − v − w . (B.18)Then by (B.17) and p /n → , we have Ψ( t ) = ˜Ψ( t ) { O ( p /n ) } , where ˜Ψ( t ) = (1 − it ) − f/ (cid:8) V ( t ) (cid:9)(cid:8) V ( t ) (cid:9) = (1 − it ) − f/ + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) (1 − it ) − f/ − w ( − v − w (B.19) + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) (1 − it ) − f/ − w ( − v − w + (cid:88) v ≥

1; 0 ≤ w ≤ v v ≥

1; 0 ≤ w ≤ v ς v ς v v ! v ! (cid:18) v w (cid:19)(cid:18) v w (cid:19) (1 − it ) − f − w − w ( − v − w + v − w . Note that (1 − it ) − f/ is the characteristic function of χ f distribution. Following similar analysis toSection 8.5 in Anderson (2003), we use the inversion property of the characteristic function, and then by(B.19), we obtain that Pr( − n ≤ x ) (B.20) = (cid:40) Pr( χ f ≤ x ) + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +2 w ≤ x )( − v − w + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +4 w ≤ x )( − v − w + (cid:88) v ≥

1; 0 ≤ w ≤ v v ≥

1; 0 ≤ w ≤ v ς v ς v v ! v ! (cid:18) v w (cid:19)(cid:18) v w (cid:19) Pr( χ f +2 w +4 w ≤ x )( − v − w + v − w (cid:41)(cid:26) O (cid:18) p n (cid:19)(cid:27) . (From (B.19) to (B.20), Fubini’s theorem is implicitly used to exchange the order of the inﬁnite sumand the integration of characteristic functions.) We next utilize the following Propositions B.1 and B.2 toevaluate (B.20). n the Phase Transition of Wilk’s Phenomenon P ROPOSITION

B.1.

Given an integer h ∈ { , , , } , when x = χ f ( α ) , there exists a constant C suchthat as f → ∞ , v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +2 hw ≤ x )( − v − w = O ( v ! C v f − v/ ) (B.21) uniformly over v ≥ .Proof. Please see Section D.2.4 on Page 58. (cid:3) P ROPOSITION

B.2.

For ( h , h ) = (1 , or ( h , h ) = (2 , , when x = χ f ( α ) , there exists a con-stant C such that as f → ∞ , v (cid:88) w =0 v (cid:88) w =0 (cid:18) v w (cid:19)(cid:18) v w (cid:19) Pr( χ f +2 h w +2 h w ≤ x )( − v − w + v − w = O { v ! v ! C v + v f − ( v + v ) / } uniformly over v , v ≥ .Proof. Please see Section D.2.5 on Page 63. (cid:3)

Remark B. . . In Propositions B.1 and B.2, C denotes a universal constant and its value can change.This is similarly used in the following proofs. In addition, for a series { b v,f } that depends on positiveintegers v and f , we say b v,f = O ( v ! C v f − v/ ) as f → ∞ and uniformly over v ≥ , if there exists aconstant C such that sup v ≥ lim sup f →∞ | b v,f / ( v ! C v f − v/ ) | < ∞ .When x = χ f ( α ) and f → ∞ , we apply Proposition B.1 with h = 1 and h = 2 , and Proposition B.2 with ( h , h ) = (1 , to (B.20). Then as ς = Θ( p n − ) , ς = O ( p n − ) , and f = Θ( p ) , when p → ∞ and p /n → , we obtain Pr( − n ≤ x ) = Pr( χ f ≤ x ) + ς (cid:8) Pr( χ f +2 ≤ x ) − Pr( χ f ≤ x ) (cid:9) + o ( p /n ) . (B.22)We next compute ς . Particularly, for the chi-squared approximation, ρ = 1 , and then by (B.16), ς = 12 p (cid:88) j =1 B (cid:18) − j (cid:19)(cid:16) n (cid:17) − = 124 n p (cid:0) p + 9 p + 11 (cid:1) , (B.23)where we use B ( z ) = z − z + 1 / ; see, e.g., Eq. (26) in Section 8.2.4 of Muirhead (2009). To ﬁnishthe proof of (1), we use the following lemma.L EMMA

B.2.2.

When x = χ f ( α ) and f → ∞ , for h ∈ { , , , } , Pr( χ f +2 h ≤ x ) − Pr( χ f ≤ x ) = − h (cid:88) k =1 (cid:26) Γ (cid:18) f h − k + 1 (cid:19)(cid:27) − (cid:16) x (cid:17) f + h − k e − x/ (B.24) = − h √ f π exp (cid:18) − z α (cid:19) (cid:110) O ( f − / ) (cid:111) . (B.25) Proof.

Please see Section D.2.3 on Page 57. (cid:3) As p → ∞ , f → ∞ . Then by (B.22) and (B.23), and applying Lemma B.2.2 with h = 1 , (1) is proved,where ϑ ( n, p ) = ς / √ f . (ii) The chi-squared approximation with the Bartlett correction. Similarly to the proof in Part (i) above,we prove (2) by examining the expansion of the characteristic function in Lemma B.2.1. In particular, forthe chi-squared approximation with the Bartlett correction, we note that the Bartlett correction factor ρ ischosen such that ς = 0 (see Section 8.5.3 in Muirhead (2009)). This can be checked by plugging ρ = 1 − H E ET AL . { n ( p + 3) } − (2 p + 9 p + 11) into (B.16) to calculate ς . In addition, by B ( z ) = z − z / z/ (see, e.g., Eq. (26) in Section 8.2.4 of Muirhead (2009)), we calculate ς = p (2 p + 18 p + 49 p + 36 p − p + 3)( ρn ) , (B.26)and therefore ς = Θ( p n − ) . We redeﬁne Ψ( t ) = E { exp( − itρ log Λ n ) } . Then when p /n → , byLemma B.2.1, we have Ψ( t ) = (1 − it ) − f/ (cid:40) exp (cid:34) (cid:88) l =2 ς l (cid:8) (1 − it ) − l − (cid:9) + O ( p n − ) (cid:35)(cid:41) , (B.27)where we use ς = 0 . Similarly to (B.19), we have Ψ( t ) = (1 − it ) − f/ { V ( t ) }{ V ( t ) }{ O ( p n − ) } . Moreover, similarly to (B.20), we obtain pr( − ρ log Λ n ≤ x ) (B.28) = (cid:40) Pr( χ f ≤ x ) + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) pr( χ f +4 w ≤ x )( − v − w + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +6 w ≤ x )( − v − w + (cid:88) v ≥

1; 0 ≤ w ≤ v v ≥

1; 0 ≤ w ≤ v ς v ς v v ! v ! (cid:18) v w (cid:19)(cid:18) v w (cid:19) Pr( χ f +4 w +6 w ≤ x )( − v − w + v − w (cid:41)(cid:26) O (cid:16) p n (cid:17)(cid:27) . When x = χ f ( α ) and f → ∞ , we apply Proposition B.1 with h = 2 and h = 3 , and Proposition B.2 with ( h , h ) = (2 , to (B.28). Then as ς = Θ( p /n ) , ς = O ( p /n ) , and f = Θ( p ) , we know that when p → ∞ and p /n → , Pr( − ρ log Λ n ≤ x ) = Pr( χ f ≤ x ) + ς (cid:8) Pr( χ f +4 ≤ x ) − Pr( χ f ≤ x ) (cid:9) + o ( p /n ) . (B.29)By (B.26) and (B.29), and applying Lemma B.2.2 with h = 2 , we prove (2), where ϑ ( n, p ) = 2 ς / √ f .B.3 . Proof of Theorem . (III) In this section, we prove Theorem 2.3 also by examining the characteristic function of the likelihoodratio test statistic. In particular, motivated by the limit in (B.2), we study the standardized test statistic ( − n + 2 µ n )(2 nσ n ) − , where the values of µ n and σ n are given in Theorem 2.3. Under H of thetesting problem (III), by (B.1), the characteristic function of ( − n + 2 µ n ) / (2 nσ n ) is E (cid:40) exp (cid:18) is × − n + 2 µ n nσ n (cid:19)(cid:41) (B.30) = (cid:18) en (cid:19) − npti/ (1 − ti ) − np (1 − ti ) / Γ p [ { n (1 − ti ) − } / p { ( n − / } exp (cid:18) µ n sinσ n (cid:19) , where i denotes the imaginary unit and t = s/ ( nσ n ) . Then the proof of Theorem 2.3 utilizes the followinginequality result of the characteristic function.L EMMA

B.3.1 (T

HEOREM

SHAKOV , 2011)).

Let G ( x ) and G ( x ) be two distributionfunctions with characteristic functions ψ ( s ) and ψ ( s ) , respectively. If G ( x ) has a derivative and sup x G (cid:48) ( x ) ≤ a < ∞ , then for any positive T and any b ≥ / (2 π ) , sup x (cid:12)(cid:12) G ( x ) − G ( x ) (cid:12)(cid:12) ≤ b (cid:90) T − T (cid:12)(cid:12)(cid:12)(cid:12) ψ ( s ) − ψ ( s ) s (cid:12)(cid:12)(cid:12)(cid:12) ds + cT , where c is a constant that depends on a and b . n the Phase Transition of Wilk’s Phenomenon We next prove (3) and (4) in Theorem 2.3 for the chi-squared approximations without and with the Bartlettcorrection, respectively. (i) Chi-squared approximation.

We prove (3) by using Lemma B.3.1 to derive an upper bound of thedifference G ( x ) − G ( x ) , where we consider G ( x ) = Pr (cid:18) − n + 2 µ n nσ n ≤ x (cid:19) , G ( x ) = Φ( x ); here Φ( x ) denotes the cumulative distribution function of the standard normal distribution. Then thecharacteristic function of G ( x ) is ψ ( s ) = (B.30), and the characteristic function of G ( x ) is ψ ( s ) =exp( − s / . To quantify ψ ( s ) − ψ ( s ) , we use the following Lemma B.3.2.L EMMA

B.3.2.

When s = o (min { ( n/p ) / , f / } ) , log ψ ( s ) − log ψ ( s ) = O (cid:16) pn (cid:17) s + (cid:18) p + pn (cid:19) O (cid:0) s (cid:1) + O (cid:18) s √ f (cid:19) . (B.31) Proof.

Please see Section D.3.1 on Page 74. (cid:3)

By Lemmas B.3.1 and B.3.2, we take T = min { ( n/p ) (1 − δ ) / , f (1 − δ ) / } , where δ ∈ (0 , is a smallconstant, and then sup x (cid:12)(cid:12) G ( x ) − G ( x ) (cid:12)(cid:12) ≤ b (cid:90) T − T ψ ( s ) (cid:26) O (cid:16) pn (cid:17) + (cid:18) p + pn (cid:19) O ( s ) + O (cid:18) s √ f (cid:19)(cid:27) ds + cT . (B.32)Since (cid:82) T − T ψ ( s ) < ∞ , (cid:82) T − T ψ ( s ) s < ∞ , and (cid:82) T − T ψ ( s ) s < ∞ , by f = Θ( p ) and (B.32), sup x (cid:12)(cid:12) G ( x ) − G ( x ) (cid:12)(cid:12) = O (cid:26)(cid:16) pn (cid:17) (1 − δ ) / + f − (1 − δ ) / (cid:27) . Consider x = { χ f ( α ) + 2 µ n } (2 nσ n ) − , and then G ( x ) − G ( x ) gives Pr (cid:8) − n ≤ χ f ( α ) (cid:9) − Φ (cid:40) χ f ( α ) + 2 µ n nσ n (cid:41) = O (cid:26)(cid:16) pn (cid:17) (1 − δ ) / + f − (1 − δ ) / (cid:27) . (B.33)Then (3) is proved by ¯Φ( · ) = 1 − Φ( · ) and Pr {− n > χ f ( α ) } = 1 − Pr {− n ≤ χ f ( α ) } . (ii) Chi-squared approximation with the Bartlett correction. To prove (4), we still use (B.32). Nowconsider x = { χ f ( α ) + 2 ρµ n } (2 ρnσ n ) − , and then G ( x ) − G ( x ) gives Pr (cid:8) − ρ log Λ n ≤ χ f ( α ) (cid:9) − Φ (cid:40) χ f ( α ) + 2 ρµ n ρnσ n (cid:41) = O (cid:26)(cid:16) pn (cid:17) (1 − δ ) / + f − (1 − δ ) / (cid:27) . Remark B. . . Although Theorem 2.3 is inspired by the limit in (B.2), which was ﬁrst establishedin Jiang and Yang (2013), Theorem 2.3 differs from the existing results by further characterizing theconvergence rate of (B.2) by Lemma B.3.2. Particularly, Jiang and Yang (2013) proved (B.2) when s isconsidered ﬁxed and the convergence rate is not examined. On the other hand, Lemma B.3.2 allows s changes with n and p , and the difference between the two characteristic functions is characterized by(B.31). Technically, establishing (B.31) requires a careful investigation of the asymptotic expansion of thegamma functions, where the technical details are given in Sections D.1 and D.3. Remark B. . . Since χ f can be viewed as a summation over f independent χ variables, by apply-ing the central limit theorem, we have χ f ( α ) = √ f z α + f + O (1) , where z α denote the upper α -levelquantile of the standard normal distribution. For the problem (III), note that µ n and σ n in Theorem0 H E ET AL . nσ n / √ f = 1 + O ( p/n ) . Consequently, when f → ∞ and p/n → , Φ (cid:26) χ f ( α ) + 2 µ n nσ n (cid:27) = Φ (cid:18) z α + f + 2 µ n nσ n (cid:19) + O (cid:18) √ f (cid:19) + O (cid:16) pn (cid:17) . Moreover, by (B.12), ( f + 2 µ n ) / (2 nσ n ) ∼ − p / (6 n ) when p/n → . Thus − ( f + 2 µ n ) / (2 nσ n ) = √ ϑ ( n, p ) + o ( p /d n − ) , which is of the order of p /d n − with d = 1 / . When p/n d → , by α = ¯Φ( z α ) and Taylor’s series of ¯Φ( · ) at z α , ¯Φ (cid:18) z α + f + 2 µ n nσ n (cid:19) − α = ϑ ( n, p ) √ π exp (cid:18) − z α (cid:19) + o (cid:18) p /d n (cid:19) , which suggests that the ﬁrst two terms in the right hand side of (3) are consistent with (1). Similarly, forthe chi-squared approximation with the Bartlett correction, when f → ∞ and p/n → , Φ (cid:26) χ f ( α ) + 2 ρµ n ρnσ n (cid:27) = Φ (cid:18) z α + f + 2 ρµ n ρnσ n (cid:19) + O (cid:18) √ f (cid:19) + O (cid:16) pn (cid:17) . By (B.15), we have − ( f + 2 ρµ n ) / (2 ρnσ n ) = √ ϑ ( n, p ) + o ( p /d n − ) , which is of the order of p /d n − with d = 2 / . Thus when p /d n − → , we also know that the ﬁrst two terms in the righthand side of (4) are consistent with (2). For other likelihood ratio tests (II)–(VI), similar conclusions alsohold by the proofs in Section C.1.C. P ROOFS OF O THER P ROBLEMS

In this section, we provide the proofs of other testing problems following similar arguments to that inSection B. Particularly, for tests (I)–(II) and (IV)–(VII), Theorems 2.1, A.1 and A.4 are proved in SectionC.1; Theorems 2.2, A.2 and A.5 are proved in Section C.3, Theorems 2.3, A.3, and A.6 are proved inSection C.4. Propositions A.1 and A.2 are proved in Section C.2.C.1 . Proof of Theorems . , A. & A. p is ﬁxed, the chi-squared approximations hold by the classical multivariate analysis (Anderson,2003; Muirhead, 2009). Therefore, without loss of generality, the proofs below focus on p → ∞ . In addi-tion, we note that the analysis of taking subsequences in Section B.1 can be used similarly in the followingproofs, and thus we consider without loss of generality that the sequence p/n has a limit below. We nextstudy six likelihood ratio tests in the following subsections separately.C.1.1 . Proof of Theorem . (I): Testing One-Sample Mean Vector Similarly to the proof above, wederive the necessary and sufﬁcient conditions for the chi-squared approximations by examining the mo-ment generating functions. Note that testing one-sample mean vector can be viewed as testing coefﬁcientvector µ of the multivariate linear regression x i = 1 × µ + (cid:15) i , where (cid:15) i ∼ N (0 , Σ) . Motivated by the ap-proximate expansion of multivariate Gamma function in Jiang and Yang (2013), He et al. (2020) studiedthe moment generating function of the likelihood ratio test in high-dimensional multivariate linear regres-sion. Particularly, by Theorem 3 in He et al. (2020), we know that when n, p → ∞ and n − p → ∞ , (B.2)holds with µ n = n (cid:26) ( n − p − /

2) log ( n − p )( n − n ( n − − p ) + log (cid:16) − pn (cid:17) + p log (cid:18) − n (cid:19)(cid:27) , (C.1) σ n = 12 (cid:26) log (cid:16) − pn (cid:17) − log (cid:18) − pn − (cid:19)(cid:27) . (C.2)Following the analysis in Section B.1, we know that to derive the necessary and sufﬁcient conditionsfor the chi-squared approximations without and with the Bartlett correction, it is equivalent to examine(B.7)–(B.8) and (B.9)–(B.10), respectively, with µ n in (C.1) and σ n in (C.2). n the Phase Transition of Wilk’s Phenomenon (I.i) The chi-squared approximation. When p/n → , we apply Theorem 1 in He et al. (2020), and knowthat (B.7)–(B.8) hold if and only if p /n → . When p/n → C ∈ (0 , , we have σ n = log (cid:26) (cid:16) − pn − (cid:17) − pn ( n − (cid:27) ∼ Cn (1 − C ) , and then √ f / (2 nσ n ) = √ p/ (2 nσ n ) → √ − C < . Therefore (B.7) fails, which suggests that theclassical chi-squared approximation fails. (I.ii) The chi-squared approximation with the Bartlett correction. When p/n → , we apply Theorem 2in He et al. (2020), and know that (B.9)–(B.10) hold if and only if p /n → . When p/n → C ∈ (0 , and n − p → ∞ , we have ρ ∼ − C/ , and then √ f / (2 nρσ n ) = (1 − C/ − √ p/ (2 nσ n ) → (1 − C/ − √ − C < . Therefore (B.9) fails, which suggests that the classical chi-squared approximationwith the Bartlett correction fails.C.1.2 . Proof of Theorem . (II): Testing One-Sample Covariance Matrix Similarly to the proof inSection B.1, by Theorem 1 in Jiang and Yang (2013) and Jiang and Qi (2015), we know that under theconditions of our Theorem 2.1 and p → ∞ , (B.2) holds with µ n = − ( n − p − n −

12 ( n − p − /

2) log (cid:18) − pn − (cid:19) , (C.3) σ n = − (cid:26) pn − (cid:18) − pn − (cid:19)(cid:27) × ( n − n . (C.4)Following the analysis above, we know that to derive the necessary and sufﬁcient conditions for the chi-squared approximations without and with the Bartlett correction, it is equivalent to examine (B.7)–(B.8)and (B.9)–(B.10), respectively, with µ n in (C.3) and σ n in (C.4). As analyzed in Section B.1, it sufﬁces todiscuss two cases lim n →∞ p/n = 0 and lim n →∞ p/n = C ∈ (0 , below. (II.i) The chi-squared approximation.Case (II.i.1) lim n →∞ p/n = 0 . As √ f ∼ p , and (C.4) and (B.4) are asymptotically the same, by theproof in Section B.1, we know that (B.7) holds under this case. We next show that (B.8) holds if and onlyif p /n → . By (B.7) and √ f ∼ p , (B.8) is equivalent to p − ( f + 2 µ n ) → . By Taylor’s expansion of µ n in (C.3), we obtain µ n = − ( n − p n − n − p − / (cid:26) pn − p n − + p n − + O (cid:18) p n (cid:19)(cid:27) . Through calculations, we obtain p − ( f + 2 µ n ) = p − × (cid:26) − p p ( n − p )2( n −

1) + p n n − + o ( p ) + O (cid:18) p n (cid:19)(cid:27) = p − (cid:26) − p n + o ( p ) + O (cid:18) p n (cid:19)(cid:27) = − p n { o (1) } + o (1) , which goes to 0 if and only if p /n → . Case (II.i.2) lim n →∞ p/n = C ∈ (0 , . Similarly, as (C.4) and (B.4) are asymptotically equal, we canapply the analysis same as Section B.1, and know that the chi-squared approximation fails under thiscase. (II.ii) The chi-squared approximation with the Bartlett correction.Case (II.ii.1) lim n →∞ p/n = 0 . Under this case, we know (B.9) holds since ρ = 1 + O ( p/n ) → and p/ (2 nσ n ) → as shown above. Given (B.9), to prove (B.10), it is equivalent to prove p − ( f + 2 ρµ n ) → . By Taylor’s expansion of µ n in (C.4), we have µ n = − p ( n − n − p − / n − (cid:26) pn − p n − + p n − + p n − + O (cid:18) p n (cid:19)(cid:27) . H E ET AL . After calculations, we obtain ρµ n = − p (cid:18) p + 12 (cid:19) + p n −

1) + p ( n − p )2( n − − p ( n − p )6( n − + p ( n − p )3( n − − p n n − + p n n − + o ( p ) + O (cid:18) p n (cid:19) . It follows that f + 2 ρµ n = − p p ( n − p )2( n −

1) + p n − − p ( n − p )6( n − + p ( n − p )3( n − + 5 p n n − + o ( p ) + O (cid:18) p n (cid:19) = − p n + o ( p ) + O (cid:18) p n (cid:19) . Therefore p − { f + µ n ρ ( n − } → if and only if p /n → . Case (II.ii.2) lim n →∞ p/n = C ∈ (0 , . Under this case, we have ρ → − C/ . Similarly, as (C.4) (B.4)are asymptotically equal, we can apply the proof same as in Section B.1, and know that the chi-squaredapproximation with the Bartlett correction also fails under this case.C.1.3 . Proof of Theorem A. (IV): Testing the Equality of Several Mean Vectors Note that testing theequality of several mean vectors can be viewed as testing the coefﬁcient matrix in multivariate linearregression; see, Section 10.7 in Muirhead (2009). Similarly to Section C.1.1, by Theorem 3 in He et al.(2020), we know that when n, p → ∞ and n − p → ∞ , (B.2) holds with µ n = n (cid:40) ( n − p − k − /

2) log ( n − − p )( n − k )( n − p − k )( n − (C.5) + ( k −

1) log ( n − − p )( n −

1) + p log ( n − k )( n − (cid:41) ,σ n = 12 (cid:26) log (cid:18) − pn − (cid:19) − log (cid:18) − pn − k (cid:19)(cid:27) . (C.6)Following the analysis in Section B.1, we know to derive the necessary and sufﬁcient conditions for thechi-squared approximations without and with the Bartlett correction, it is equivalent to examine (B.7)–(B.8) and (B.9)–(B.10), respectively, with µ n in (C.5) and σ n in (C.6). (IV.i) The chi-squared approximation. When p/n → , we apply Theorem 1 in He et al. (2020), andknow that (B.7)–(B.8) hold if and only if p /n → . When p/n → C ∈ (0 , and n − p → ∞ , we have σ n ∼ C ( k − / { n (1 − C ) } , and then √ f / (2 nσ n ) = (cid:112) k − p/ (2 nσ n ) → √ − C < . There-fore (B.7) fails, which suggests that the classical chi-squared approximation fails. (IV.ii) The chi-squared approximation with the Bartlett correction. When p/n → , we apply Theorem 2in He et al. (2020), and know that (B.9)–(B.10) hold if and only if p /n → . When p/n → C ∈ (0 , and n − p → ∞ , we have ρ ∼ − C/ , and then √ f / (2 nρσ n ) = (1 − C/ − √ p/ (2 nσ n ) → (1 − C/ − √ − C < . Therefore (B.9) fails, which suggests that the classical chi-squared approximationwith the Bartlett correction fails.C.1.4 . Proof of Theorem A. (V): Testing the Equality of Several Covariance Matrices Similarly tothe proof in Section B.1, by Theorem 4 in Jiang and Yang (2013) and Jiang and Qi (2015), we know that n the Phase Transition of Wilk’s Phenomenon under the conditions of Theorem A.1 and p → ∞ , (B.2) holds with µ n = 14 (cid:26) ( n − k )(2 n − p − k −

1) log (cid:18) − pn − k (cid:19) (C.7) − k (cid:88) i =1 ( n i −

1) (2 n i − p −

3) log (cid:18) − pn i − (cid:19)(cid:41) ,σ n = ( n − k ) n (cid:40) log (cid:18) − pn − k (cid:19) − k (cid:88) i =1 (cid:18) n i − n − k (cid:19) log (cid:18) − pn i − (cid:19)(cid:41) . (C.8)Following the analysis in Section B.1, we next derive the equivalent conditions for (B.7)–(B.8) and (B.9)–(B.10), respectively, with µ n in (C.7) and σ n in (C.8). (V.i) The chi-squared approximation.Case (V.i.1) lim n →∞ p/n = 0 . Under this case, we show that (B.7) holds. By Taylor’s expansion, σ n = ( n − k ) n (cid:34) − pn − k − p n − k ) + k (cid:88) i =1 (cid:18) n i − n − k (cid:19) (cid:26) pn i − p n i − (cid:27) + O (cid:18) p n (cid:19)(cid:35) = ( n − k ) n (cid:40) − pn − k + k (cid:88) i =1 p ( n i − n − k ) − p n − k ) + kp n − k ) + O (cid:18) p n (cid:19)(cid:41) = ( k − p n { o (1) } , where we use n i = Θ( n ) . As √ f ∼ p √ k − , we have (B.7) holds. Given (B.7), we know that (B.8) isequivalent to (2 f + 4 µ n ) / (2 p √ k − → . Through Taylor’s expansion, we obtain µ n = − p (2 n − p − k − − ( n − p ) p n − k − n − p ) p n − k ) + o (cid:18) p n (cid:19) + o ( p )+ k (cid:88) i =1 p (2 n i − p −

3) + k (cid:88) i =1 ( n i − p ) p ( n i −

1) + k (cid:88) i =1 n i − p ) p n i − + o (cid:18) p n (cid:19) = p ( p − kp − k + 1) + p n − k ) − k (cid:88) i =1 p n i −

1) + o (cid:18) p n (cid:19) + o ( p ) . By f = p ( p + 1)( k − / , we have f + 4 µ n = p (cid:32) n − k − k (cid:88) i =1 n i − (cid:33) + o (cid:18) p n (cid:19) + o ( p ) = Θ( p /n ) + o ( p ) , (C.9)where we use the fact that ( n − k ) − − (cid:80) ki =1 ( n i − − > . It follows that (2 f + 4 µ n ) / (2 p √ k −

1) =Θ( p /n ) , which converges to 0 if and only if p /n → . Case (V.i.2) lim n →∞ p/n = C ∈ (0 , . Under this case, we show that (B.7) and (B.8) do not hold atthe same time. Particularly, (B.7) and (B.8) together induce µ n + n σ n ) / (2 f ) → , which indicates µ n + n σ n ) n − → , and thus g ( C ) = 0 , where we deﬁne g ( C ) = (2 − C ) log(1 − C ) − k (cid:88) i =1 δ i (2 δ i − C ) log(1 − Cδ − i ) , and we assume n i /n → δ i ∈ (0 , for i = 1 , . . . , k. As p/n = ( p/n i ) × ( n i /n ) < n i /n , we have for C ∈ (0 , min i =1 ,...,k δ i ] by taking deriva-4 H E ET AL . tive of g ( C ) . Speciﬁcally, by (cid:80) ki =1 δ i = 1 and calculations, we have g (cid:48) ( C ) = k (cid:88) i =1 δ i (cid:8) − log(1 − C ) − (1 − C ) − + log(1 − Cδ − i ) + δ i ( δ i − C ) − (cid:9) ,g (cid:48)(cid:48) ( C ) = k (cid:88) i =1 δ i × C (cid:8) − (1 − C ) − + ( δ i − C ) − (cid:9) . When < C ≤ δ i < for i = 1 , . . . , k , we have g (cid:48)(cid:48) ( C ) > and thus g (cid:48) ( C ) is a monotonically increasingfunction of C . As g (cid:48) (0) = 0 , g (cid:48) ( C ) > when < C < and then g ( C ) is also monotonically increas-ing. By g (0) = 0 , we further obtain g ( C ) > when < C < , which contradicts with g ( C ) = 0 .As a result, we know (B.7) and (B.8) do not hold simultaneously, which suggests that the chi-squaredapproximation fails. (V.ii) The chi-squared approximation with the Bartlett correction. When lim n →∞ p/n = 0 , since ρ =1 + O ( p/n ) → and (B.7) is proved above, we know (B.9) holds. Given (B.9), as f ∼ p ( k − / ,to prove (B.10), it is equivalent to show (2 f + 4 ρµ n ) /p → , which is also equivalent to (2 f + 4 µ n − n µ n ) /p → , where we redeﬁne in this subsection that ∆ n = 2 p + 3 p − p + 1)( k − × ˜ D n, , ˜ D n, = k (cid:88) i =1 n i − − n − k . Similarly to the analysis of (C.9), through Taylor’s expansion of µ n in (C.7), we obtain f + 4 µ n = − p × ˜ D n, − p × ˜ D n, + o (cid:18) p n (cid:19) + o ( p ) , (C.10)where ˜ D n, = (cid:80) ki =1 ( n i − − − ( n − k ) − . Moreover, by (C.9) and ∆ n = O ( p/n ) = o (1) , we have n µ n = ∆ n (cid:18) − p × ˜ D n, − f (cid:19) + o (cid:18) p n (cid:19) + o ( p ) , (C.11)Combining (C.10) and (C.11), we have f + 4 µ n − n µ n = − p × ˜ D n, − p × ˜ D n, + ∆ n (cid:18) p × ˜ D n, + 2 f (cid:19) + o (cid:18) p n (cid:19) + o ( p ) , = p k − (cid:110) D n, − k −

1) ˜ D n, (cid:111) + o (cid:18) p n (cid:19) + o ( p ) , (C.12)where we use ˜ D n, = Θ( n − ) , ˜ D n, = Θ( n − ) , ∆ n = p ˜ D n, / { k − } + o ( p/n ) , and n f = p ˜ D n, / o ( p ) . We next show that (C.12) = Θ( p n − ) . In particular, in this subsection, we redeﬁne δ i = ( n i − / ( n − k ) , which satisﬁes (cid:80) ki =1 δ i = 1 . Then by the deﬁnitions of ˜ D n, and ˜ D n, , we calculate that ( n − k ) × { D n, − k −

1) ˜ D n, } = (5 − k ) k (cid:88) i =1 δ − i + 2 (cid:88) ≤ i (cid:54) = j ≤ k δ − i δ − j − k (cid:88) i =1 δ − i + 3 k − . (C.13) n the Phase Transition of Wilk’s Phenomenon As δ − i δ − j ≤ δ − i + δ − j , we have(C.13) ≤ (3 − k ) k (cid:88) i =1 δ − i − k (cid:88) i =1 δ − i + 3 k − ≤ (3 − k ) k (cid:88) i =1 δ − i − k + 3 k − , (C.14)where in the last inequality, we use (cid:80) ki =1 δ − i ≥ k ( (cid:80) ki =1 δ i ) − = k . Therefore (C.14) < when k ≥ . When k = 2 , as δ + δ = 1 , we have δ − + δ − = δ − δ − and (C.13) = − (cid:80) i =1 δ − i − (cid:80) i =1 δ − i + 5 . As (cid:80) i =1 δ − i ≥ , (C.13) < − × + 5 < . In summary, we know (C.13) < for k ≥ , and thus (C.12) = Θ( p n − ) . If follows that (2 f + 4 ρµ n ) /p → if and only if p /n → . Insummary, we know for testing problem (V), the chi-squared approximation with the Bartlett correctionworks if and only if p /n → . C.1.5 . Proof of Theorem A. (VI): Joint Testing the Equality of Several Mean Vectors and CovarianceMatrices Similarly to the proof in Section B.1, by Theorem 3 in Jiang and Yang (2013) and Jiang andQi (2015), we know that under the conditions of Theorem A.1 and p → ∞ , (B.2) holds with µ n = 14 (cid:40) − kp − k (cid:88) i =1 pn i − nL n,p (2 p − n + 3) + k (cid:88) i =1 n i L n i − ,p (2 p − n i + 3) (cid:41) , (C.15) σ n = 12 (cid:32) L n,p − k (cid:88) i =1 n i n × L n i − ,p (cid:33) , (C.16)where L n,p = log(1 − p/n ) . Following Section B.1, we next derive the equivalent conditions for (B.7)–(B.8) and (B.9)–(B.10), respectively, with µ n in (C.15) and σ n in (C.16). (VI.i) The chi-squared approximation.Case (VI.i.1) lim n →∞ p/n = 0 . Under this case, we show that (B.7) holds. As − log(1 − x ) = x + x / O ( x ) and n i = Θ( n ) , we obtain σ n = = k (cid:88) i =1 n i n (cid:26) pn i − p n i − (cid:27) − pn − p n + O (cid:18) p n (cid:19) = k (cid:88) i =1 n i n (cid:18) pn i + pn i + p n i (cid:19) − pn − p n + O (cid:18) p n (cid:19) = kpn + ( k − p n + O (cid:18) p n (cid:19) , where in the second equation, we use ( n i − − = n − i + n − i + O ( n − i ) and ( n i − − = n − i + O ( n − i ) . It follows that nσ n ∼ p √ k − . By √ f ∼ p √ k − , we have (B.7). Given (B.7), we knowthat (B.8) is equivalent to (2 f + 4 µ n ) /p → . As p/n = o (1) , through Taylor’s expansion, we obtain − n (2 p − n + 3) L n,p = n (2 p − n + 3) (cid:26) pn + p n + p n + O (cid:18) p n (cid:19)(cid:27) (C.17) = p (cid:26) p + p n − n − p − p n + 3 + O (cid:18) p n (cid:19) + o (1) (cid:27) = p (cid:26) p + p n − n + 3 + O (cid:18) p n (cid:19) + o (1) (cid:27) . H E ET AL . Similarly, by Taylor’s expansion and n i = Θ( n ) , we have − n i (2 p − n i + 3) L n i − ,p (C.18) = n i (2 p − n i + 3) (cid:26) pn i − p n i − + p n i − + O (cid:18) p n (cid:19)(cid:27) = n i (2 p − n i + 3) (cid:26) pn i + pn i + p n i + p n i + O (cid:18) p n (cid:19) + O (cid:18) p n (cid:19)(cid:27) = p (cid:26) p + p n i − n i + 3 − O (cid:18) p n i (cid:19) + o (1) (cid:27) , where in the second equation, we use ( n i − − = n − i + n − i + O ( n − i ) and ( n i − − a = n − ai + O ( n − i ) for integers a ≥ . Combining (C.17) and (C.18), we obtain f + 4 µ n = 2 f − kp + p (cid:40) (1 − k ) p + p (cid:16) n − k (cid:88) i =1 n i (cid:17) + 3 − k (cid:41) + O (cid:16) p n (cid:17) + o ( p ) (C.19) = p (cid:16) n − k (cid:88) i =1 n i (cid:17) + O (cid:16) p n (cid:17) + o ( p ) . As n − − (cid:80) ki =1 n − i = Θ( n − ) , we have f + 4 µ n = Θ( p n − ) . Therefore we know (2 f + 4 µ n ) /p → if and only if p /n → . Case (VI.i.2) lim n →∞ p/n = C ∈ (0 , . In this subsection, we redeﬁne δ i = n i /n ∈ (0 , . Then n σ n f → C ( k − × (cid:110) log(1 − C ) − k (cid:88) i =1 δ i log(1 − Cδ − i ) (cid:111) , where < C ≤ δ i < . Therefore (B.7) induces g ( C ) = 0 , where we deﬁne g ( C ) = log(1 − C ) − k (cid:88) i =1 δ i log(1 − Cδ − i ) − ( k − C / . By taking derivative of g ( C ) , we obtain g (cid:48) (0) = 0 , g (cid:48)(cid:48) (0) = 0 , and g (cid:48)(cid:48)(cid:48) ( C ) = 2( C − − k (cid:88) i =1 δ i ( C − δ i ) = k (cid:88) i =1 δ i (1 − δ i )( C − δ i C + δ i + δ i )(1 − C ) ( δ i − C ) . As C − δ i C + δ i + δ i is a monotonically decreasing function of C when < C ≤ δ i < , and it equals δ i ( δ i − > when C = δ i , we have g (cid:48)(cid:48)(cid:48) ( C ) > for < C ≤ δ i . It follows that g ( C ) is a monoton-ically increasing function when < C ≤ δ i < . As g (0) = 0 , we have g ( C ) > , which contradictswith g ( C ) = 0 . Therefore, we know that (B.7) does not hold under this case, which implies that thechi-squared approximation fails. (VI.ii) The chi-squared approximation with the Bartlett correction. When lim n →∞ p/n = 0 , since ρ = 1 + O ( p/n ) → and (B.7) is proved above, we know (B.9) holds. Given (B.9), as f ∼ p ( k − / ,to prove (B.10), it is equivalent to show (2 f + 4 ρµ n ) /p → , which is equivalent to (2 f + 4 µ n − n µ n ) /p → , where in this subsection, we redeﬁne ∆ n = 2 p + 9 p + 116( p + 3)( k − × D n, , D n, = k (cid:88) i =1 n i − n . Similarly to (C.17), through Taylor’s expansion, we further have n (2 p − n + 3) r n = p (cid:26) p − n + 3 + p n + p n + O (cid:16) p n (cid:17) + o (1) (cid:27) . n the Phase Transition of Wilk’s Phenomenon In addition, similarly to (C.18), we have n i (2 p − n i + 3) r n (cid:48) i = p (cid:26) p − n i + 3 − p n i + p n i + O (cid:16) p n i (cid:17) + o (1) (cid:27) . (C.20)It follows that f + 4 µ n = − p D n, − p D n, + O (cid:16) p n (cid:17) + o ( p ) , (C.21)where D n, = (cid:80) ki =1 n − i − n − . Moreover, by (C.19) and ∆ n = O ( p/n ) = o (1) , n µ n = ∆ n (cid:16) − p D n, − f (cid:17) + O (cid:16) p n (cid:17) + o ( p ) . (C.22)Combining (C.21) and (C.22), we obtain f + 4 µ n − n µ n = p k − { D n, − k − D n, } + O (cid:16) p n (cid:17) + o ( p ) , (C.23)where we use D n, = Θ( n − ) , D n, = Θ( n − ) , ∆ n = pD n, / { k − } + o ( p/n ) , and n f = p D n, / o ( p ) . Following the analysis of (C.13), we know (C.23) = Θ( p n − ) . Therefore, (2 f + ρµ n ) /p → if and only if p /n → , which suggests that the chi-squared approximation with theBartlett correction holds if and only if p /n → . C.1.6 . Proof of Theorem A. (VII): Testing Independence between Multiple Vectors Similarly to theproof in Section B.1, by Theorem 2 in Jiang and Yang (2013) and Jiang and Qi (2015), we know thatunder the conditions of Theorem A.4 and p → ∞ , (B.2) holds with µ n = n  − (cid:18) n − p − (cid:19) L n − ,p + k (cid:88) j =1 (cid:26)(cid:18) n − p j − (cid:19) L n − ,p j (cid:27) (C.24) σ n = 12 (cid:18) − L n − ,p + k (cid:88) j =1 L n − ,p j (cid:19) . (C.25)Following the analysis in Section B.1, we next derive the equivalent conditions for (B.7)–(B.8) and (B.9)–(B.10), respectively, with µ n in (C.24) and σ n in (C.25). (VII.i) The chi-squared approximation.Case (VI.i.1) lim n →∞ p/n = 0 . Under this case, we show that (B.7) holds. Through Taylor’s expansion, σ n = pn − p n − − k (cid:88) i =1 (cid:26) p i n − p i n − (cid:27) + O (cid:18) p n (cid:19) = p − (cid:80) ki =1 p i n − + O (cid:18) p n (cid:19) . Recall that f = p − (cid:80) ki =1 p i , and thus (B.7) holds. As f = Θ( p ) undert the conditions of TheoremA.4, given (B.7), we know (B.8) is equivalent to (2 f + 4 µ n ) /p → . Similarly to the analysis of (C.18),through Taylor’s expansion, we have n (2 n − p − L n − ,p = p (cid:110) p + p n − n + 1 + O (cid:16) p n (cid:17) + o (1) (cid:111) ,n (2 n − p i − L n − ,p i = p i (cid:110) p i + p i n − n + 1 + O (cid:16) p i n (cid:17) + o (1) (cid:111) . H E ET AL . It follows that f + 4 µ n = p − k (cid:88) i =1 p i − p (cid:16) p + p n − n + 1 (cid:17) + k (cid:88) i =1 p i (cid:16) p i + p i n − n + 1 (cid:17) + O (cid:16) p n (cid:17) + o ( p )= 13 n (cid:18) k (cid:88) i =1 p i − p (cid:19) + O (cid:16) p n (cid:17) + o ( p ) . Under the conditions of Theorem A.4, we have (cid:80) ki =1 p i − p = Θ( p ) . Thus (2 f + 4 µ n ) /p → if andonly if p /n → , which suggests that the chi-squared approximation holds if and only if p /n → . Case (VI.i.2) lim n →∞ p/n = C ∈ (0 , . Under this case, we show that (B.7) and (B.8) do not hold atthe same time. Particularly, as f = Θ( p ) and p/n → C ∈ (0 , , (B.7) induces (2 n σ n − f ) /n → ,and (B.8) induces ( f + 2 µ n ) /n → . Therefore, (B.7) and (B.8) together give (2 n σ n + 2 µ n ) /n → . Suppose lim n →∞ p i /n = C i ∈ (0 , . It follows that (cid:80) ki =1 C i = C , and (2 n σ n + 2 µ n ) /n → − g ( C ) + k (cid:88) i =1 g ( C i ) , (C.26)where g ( C ) = (2 − C ) log(1 − C ) . Note that g ( C ) is a strictly concave function of C ∈ (0 , and g (0) = 0 . By the property of strictly concave function, we have k (cid:88) i =1 g ( C i ) = k (cid:88) i =1 g ( C × C i /C ) > k (cid:88) i =1 g ( C ) × C i /C = g ( C ) , where we use (cid:80) ki =1 C i = C. Therefore when C ∈ (0 , , the right hand side of (C.26) > , which contra-dicts with (2 n σ n + 2 µ n ) /n → . We thus know that (B.7) and (B.8) do not hold simultaneously, whichsuggests that the chi-squared approximation fails. (VII.ii) The chi-squared approximation with the Bartlett correction.

When lim n →∞ p/n = 0 , since ρ =1 + O ( p/n ) → and (B.7) is proved above, we know (B.9) holds. Given (B.9), as f = Θ( p ) , to prove(B.10), it is equivalent to show (2 f + 4 ρµ n ) /p → , which is equivalent to (2 f + 4 µ n − n µ n ) /p → , where in this subsection, we redeﬁne ∆ n = 2 × D p, + 9 × D p, n × D p, , D p, = p − k (cid:88) i =1 p i , D p, = p − k (cid:88) i =1 p i . Similarly to (C.20), through Taylor’s expansion, we further obtain n (2 n − p − L n − ,p = p (cid:110) p − n + 1 + p n + p n + O (cid:16) p n (cid:17) + o (1) (cid:111) ,n (2 n − p i − L n − ,p i = p i (cid:110) p i − n + 1 + p i n + p i n + O (cid:16) p i n (cid:17) + o (1) (cid:111) . It follows that f + 4 µ n = − n D p, − n D p, + O (cid:16) p n (cid:17) + o ( p ) , (C.27)where D p, = p − (cid:80) ki =1 p i . Moreover, as ∆ n = Θ( p/n ) , by (C.27) and f = D p, , we have n µ n = ∆ n (cid:16) − n D p, − D p, (cid:17) + O (cid:16) p n (cid:17) + o ( p ) . n the Phase Transition of Wilk’s Phenomenon As ∆ n = D p, / (3 nD p, ) + O ( n − ) , we calculate that f + 4 µ n − n µ n (C.28) = − n D p, − n D p, + D p, nD p, (cid:16) n D p, + D p, (cid:17) + O (cid:16) p n (cid:17) + o ( p )= − n D p, (3 D p, D p, − D p, ) + O (cid:16) p n (cid:17) + o ( p ) . We next prove (C.28) = Θ( p n − ) by showing D p, D p, − D p, = Θ( p ) . Speciﬁcally, by the def-initions of D p, , D p, , and D p, , we write D p, D p, − D p, (C.29) = p (cid:16) p − k (cid:88) i =1 p i (cid:17) + 2 p (cid:16) − p k (cid:88) i =1 p i + k (cid:88) i =1 p i (cid:17) + 2 p (cid:16) p k (cid:88) i =1 p i − k (cid:88) i =1 p i (cid:17) + (cid:16) − p + k (cid:88) i =1 p i (cid:17) k (cid:88) i =1 p i + 2 (cid:40) k (cid:88) i =1 p i k (cid:88) i =1 p i − (cid:16) k (cid:88) i =1 p i (cid:17) (cid:41) . Using p = (cid:80) ki =1 p i , we obtain p k (cid:88) i =1 p αi − k (cid:88) i =1 p α +1 i = (cid:88) i (cid:54) = j p i p αj , p (cid:88) i (cid:54) = j p i p j − (cid:88) i (cid:54) = j p i p j = (cid:88) i (cid:54) = j (cid:54) = l p i p j p l , (C.30)where integer ≤ α ≤ , and we use (cid:80) i (cid:54) = j and (cid:80) i (cid:54) = j (cid:54) = l to denote the summation (cid:80) ≤ i (cid:54) = j ≤ k and (cid:80) ≤ i (cid:54) = j (cid:54) = l ≤ k for simplicity. By (C.30), we calculate that(C.29) = p (cid:88) i (cid:54) = j (cid:54) = l p i p j p l + 2 p (cid:88) i (cid:54) = j p i p j − (cid:88) i (cid:54) = j p i p j k (cid:88) l =1 p l − (cid:88) i (cid:54) = j p i p j + 2 (cid:88) i (cid:54) = j p i p j > p (cid:88) i (cid:54) = j p i p j − (cid:88) i (cid:54) = j p i p j k (cid:88) l =1 p l − (cid:88) i (cid:54) = j p i p j = 2 (cid:16) k (cid:88) i =1 p i + (cid:88) i (cid:54) = j p i p j (cid:17) (cid:88) i (cid:54) = j p i p j − (cid:88) i (cid:54) = j p i p j − (cid:88) i (cid:54) = j (cid:54) = l p i p j p l − (cid:88) i (cid:54) = j p i p j > . Therefore (C.29) = Θ( p ) and then (C.28) = Θ( p n − ) . Thus (2 f + 4 ρµ n ) /p → if and only if p /n → , which suggests that the chi-squared approximation with the Bartlett correction holds if andonly if p /n → . C.2 . Proofs of Propositions A. & A. p → ∞ and p/n has a limit.C.2.1 . Proof of Proposition A. n, p →∞ , n − k → ∞ , and n − p → ∞ , (B.2) holds with µ n in (C.5) and σ n (C.6). Moreover, to derive thenecessary and sufﬁcient conditions for the chi-squared approximations without and with the Bartlett cor-rection, it is equivalent to examine (B.7)–(B.8) and (B.9)–(B.10), respectively, with µ n in (C.5) and σ n in(C.6). (i) The chi-squared approximation. (i.1) When p/n → and k/n → , we apply Theorem 1 in Heet al. (2020), and know that (B.7)–(B.8) hold if and only if √ pk ( p + k ) /n → . (i.2) When p/n → C ∈ (0 , and k/n → , we have f ∼ C ( k − n and σ n ∼ C ( k − / { n (1 − C ) } . It follows that0 H E ET AL . √ f / (2 nσ n ) ∼ √ − C < . Thus (B.7) fails, which suggests that the chi-squared approximation fails.(i.3) When p/n → and k/n → C ∈ (0 , , by applying the symmetric substitution technique in Sec-tion 10.4 of Muirhead (2009), we can switch k and p and analyze similarly as in the case (i.2) above.Therefore we know the chi-squared approximation also fails here. (i.4) When p/n → C ∈ (0 , and k/n → C ∈ (0 , , we know < C + C ≤ as p + k < n . By the constraint, it then sufﬁces toconsider C , C ∈ (0 , . Note that σ n ∼ log { (1 − C )(1 − C ) } − log(1 − C − C ) and f /n ∼ C C . Thus (B.7) induces g ( C , C ) = 0 where g ( C , C ) = C C − log { (1 − C )(1 − C ) } +log(1 − C − C ) . If C + C = 1 , g ( C , C ) → −∞ . We next consider < C + C < . By cal-culations, we have g (0 , C ) = 0 , and dd C g ( C , C ) = C { ( C − C + C ) − C } (1 − C )(1 − C − C ) < , where we use C , C ∈ (0 , and < C + C < . Similarly to the previous analyses, we know that g ( C , C ) is monotonically decreasing for C ∈ (0 , and thus g ( C , C ) < , as C ∈ (0 , and g (0 , C ) = 0 . Therefore (B.7) fails, which suggests that the classical chi-squared approximation fails. (ii) The chi-squared approximation with the Bartlett correction. (ii.1) When p/n → and k/n → , weapply Theorem 2 in He et al. (2020), and know that (B.9)–(B.10) hold if and only if √ pk ( p + k ) /n → . (ii.2) When p/n → C ∈ (0 , and k/n → , we have ρ ∼ − C/ , and the proof of part (IV.ii)in Section C.1.3 can be applied here similarly. Thus the chi-squared approximation fails. (ii.3) When p/n → and k/n → C ∈ (0 , , we know the chi-squared approximation also fails by switching k and p symmetrically as in the case (i.3) above. (ii.4) When p/n → C ∈ (0 , and k/n → C ∈ (0 , ,we know < C + C ≤ as p + k < n . Similarly to the case (i.4) above, we consider C , C ∈ (0 , and C + C < . Here ρ ∼ − ( C + C ) / and then (B.9) induces g ( C , C ) = 0 , where g ( C , C ) = 2 C C − (2 − C − C )[log { (1 − C )(1 − C ) } − log(1 − C − C )] . By calculations,we have g (0 , C ) = 0 , and dd C g ( C , C ) | C =0 = − C / (1 − C ) < , d d C g ( C , C ) = − C { ( C + C )( C −

2) + 2 } (1 − C ) (1 − C − C ) < , where we use ( C + C )( C −

2) + 2 > as < C + C < and − < C − < − . Similarly tothe analysis above, we know that g ( C , C ) < and thus (B.9) fails, which suggests that the chi-squaredapproximation with the Bartlett correction fails.C.2.2 . Proof of Proposition A. (i) The chi-squared approximation. (i.1) When p /n → and p /n → , we apply Theorem 1 in He et al. (2020), and know that (B.7)–(B.8) hold if and onlyif √ p p ( p + p ) /n → . (i.2) When p /n → C ∈ (0 , and p /n → , we have f ∼ Cnp and σ n ∼ Cp / { n (1 − C ) } . Then √ f / (2 nσ n ) ∼ √ − C < suggesting the failure of (B.7) and thusthe chi-squared approximation fails. (i.3) When p /n → and p /n → C ∈ (0 , , the chi-squared ap-proximation also fails by the symmetric substitution technique in Section C.2.1. (i.4) When p /n → C ∈ (0 , and p /n → C ∈ (0 , , we have σ n ∼ log { (1 − C )(1 − C ) } − log(1 − C − C ) and f /n ∼ C C . It follows that the analysis in case (i.4) of Section C.2.1 can be applied similarly, and weobtain the same conclusion, that is, (B.7) fails and then the chi-squared approximation fails. (ii) The chi-squared approximation with the Bartlett correction. (ii.1) When p /n → and p /n → , we apply Theorem 2 in He et al. (2020), and know that (B.9)–(B.10) hold if and only if √ p p ( p + p ) /n → . (ii.2) When p /n → C ∈ (0 , and p /n → , we have ρ ∼ − C/ , andthen √ f / (2 nρσ n ) = (1 − C/ − √ p/ (2 nσ n ) → (1 − C/ − √ − C < . Therefore (B.9) fails,which suggests that the classical chi-squared approximation with the Bartlett correction fails. (ii.3) When p /n → and p /n → C ∈ (0 , , similar conclusion holds by the symmetric substitution technique asabove. (ii.4) When p /n → C ∈ (0 , and p /n → C ∈ (0 , , we have ρ ∼ − ( C + C ) / . It fol- n the Phase Transition of Wilk’s Phenomenon lows that the analysis in case (ii.4) of Section C.2.1 can be applied similarly. Then we obtain the sameconclusion, that is, (B.9) fails and the chi-squared approximation with the Bartlett correction fails.C.3 . Proofs of Theorems . , A. & A. − η log Λ n when η = 1 and ρ ; here ρ denotes the corresponding Bartlett correction factor of each test.By Eq. (20)–(23) in Section 8.2.4 of Muirhead (2009), we know that for the testing problems (I)–(II)and (IV)–(VII), the characteristic functions of the likelihood ratio test statistics take the following generalform: log E { exp( − itη log Λ n ) } = ϕ ( t ) − ϕ (0) , (C.31)where ϕ ( t ) = 2 itη  K (cid:88) k =1 ξ ,k log ξ ,k − K (cid:88) j =1 ξ ,j log ξ ,j  + K (cid:88) k =1 log Γ (cid:8) ηξ ,k (1 − it ) + τ ,k + υ ,k (cid:9) − K (cid:88) j =1 log Γ (cid:8) ηξ ,j (1 − it ) + τ ,j + υ ,j (cid:9) ,i denotes the imaginary unit, τ ,k = (1 − η ) ξ ,k , and τ ,j = (1 − η ) ξ ,j . We next consider η = 1 and ρ for the chi-squared approximation without and with the Bartlett correction, respectively. The valuesof ρ , K , K , ξ ,k , ξ ,j , υ ,k , and υ ,j depend on the testing problem, and thus take different valuesin the following subsections. Moreover, by Muirhead (2009), in each problem, we have (cid:80) K k =1 ξ ,k = (cid:80) K j =1 ξ ,k , the degrees of freedom f is f = −  K (cid:88) k =1 υ ,k − K (cid:88) j =1 υ ,j −

12 ( K − K )  , (C.32)and the Bartlett correction ρ takes the value ρ = 1 − f  K (cid:88) k =1 υ ,k − υ ,k + ξ ,k − K (cid:88) j =1 υ ,j − υ ,j + ξ ,j  . (C.33)In the following proofs, we use Lemma C.3.1 below to obtain an asymptotic expansion of each character-istic function.L EMMA

C.3.1.

For a ﬁnite integer L , when η = 1 or ρ , p/n → , and R n,L (in (C.34) below) con-verges to , log E { exp( − itη log Λ n ) } = − f − it ) + L − (cid:88) l =1 ς l (cid:8) (1 − it ) − l − (cid:9) + R n,L , where ς l = ( − l +1 l ( l + 1)  K (cid:88) k =1 B l +1 ( τ ,k + υ ,k )( η × ξ ,k ) l − K (cid:88) j =1 B l +1 ( τ ,j + υ ,j )( η × ξ ,j ) l  , H E ET AL . B l +1 ( · ) denotes the ( l + 1) -th Bernoulli polynomial; see, e.g., Eq. (25) in Section 8.2.4 of Muirhead(2009), and R n,L denotes the remainder which is of the order of R n,L = O (cid:32) K (cid:88) k =1 | τ ,k + υ ,k | L +1 | ηξ ,k | L + K (cid:88) j =1 | τ ,j + υ ,j | L +1 | ηξ ,j | L (cid:33) . (C.34) Proof.

Please see Section D.2.16 on Page 73. (cid:3)

We next examine each testing problem based on Lemma C.3.1.C.3.1 . Proof of Theorem . (I): Testing One-Sample Mean Vector Recall that in Section C.1.1, wemention that testing one-sample mean vector can be viewed as testing coefﬁcient vector of a multivariatelinear regression model. By Section 10.5 in Muirhead (2009), we know that in this problem, K = 1 , K = 1 , ξ , = n/ , ξ , = n/ , υ , = − p/ , υ , = 0 , f = p and ρ = 1 − ( p/ /n . We next dis-cuss the chi-squared approximation without and with the Bartlett correction, respectively. (i) Chi-squared approximation. Consider ρ = 1 and p /n → . Then τ , = τ , = 0 , ς l = ( − l +1 l ( l + 1) × n/ l (cid:110) B l +1 (cid:16) − p (cid:17) − B l +1 (0) (cid:111) , (C.35)and for any ﬁnite integer L , R n,L = O ( p L +1 n − L ) . Since B l +1 ( · ) is a polynomial of order l + 1 , then ς l = O ( p l +1 /n l ) . By Lemma C.3.1, when p /n → , R n, = O ( p n − ) → , and E { exp( − it log Λ n ) } = (1 − it ) − f (cid:89) l =1 exp (cid:104) ς l (cid:8) (1 − it ) − l − (cid:9)(cid:105)(cid:8) O ( p n − ) (cid:9) = (1 − it ) − f (cid:8) V ( t ) + V ( t ) + V ( t ) V ( t ) (cid:9)(cid:8) O ( p n − ) (cid:9) , where V l ( t ) is deﬁned as in (B.18) on Page 26. Then similarly to the proof in Section B.2, by the inversionproperty of the characteristic function, we obtain Pr( − n ≤ x ) (C.36) = (cid:40) Pr( χ f ≤ x ) + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +2 w ≤ x )( − v − w + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +4 w ≤ x )( − v − w + (cid:88) v ≥

1; 0 ≤ w ≤ v v ≥

1; 0 ≤ w ≤ v ς v ς v v ! v ! (cid:18) v w (cid:19)(cid:18) v w (cid:19) Pr( χ f +2 w +4 w ≤ x )( − v − w + v − w (cid:41)(cid:26) O (cid:16) p n (cid:17)(cid:27) . When x = χ f ( α ) , by Propositions B.1 and B.2, and ς l = O ( p l +1 /n l ) , we have Pr( − n ≤ x ) = Pr( χ f ≤ x ) + ς (cid:8) Pr( χ f +2 ≤ x ) − Pr( χ f ≤ x ) (cid:9) + o ( p / /n ) . Particularly, by Lemma B.2.2,

Pr( χ f +2 ≤ x ) − Pr( χ f ≤ x ) = − √ f π exp (cid:18) − z α (cid:19) (cid:110) O ( f − / ) (cid:111) , and we compute ς = ( p + 2 p ) / (4 n ) . In Theorem 2.2, we have ϑ ( n, p ) = ς / √ f . (ii) Chi-squared approximation with the Bartlett correction. By choosing the Bartlett correction factor ρ as in (C.33), we have ς = 0 ; see, e.g., Section 8.2.4 in Muirhead (2009). Speciﬁcally, in this problem, n the Phase Transition of Wilk’s Phenomenon ρ = 1 − ( p + 2) / (2 n ) , ρξ , = ρξ , = n/ − ( p + 2) / , τ , = τ , = ( p + 2) / , υ , = − p/ , υ , =0 , and then ς l = ( − l +1 l ( l + 1)( ρ × ξ , ) l (cid:26) B l +1 (cid:18) − p − (cid:19) − B l +1 (cid:18) p + 24 (cid:19)(cid:27) . We calculate ς = p ( p − { ρn ) } − , ς = 0 , and ς l = O ( p l +1 n − l ) for l ≥ . Similarly to the proofin Section B.2, when p /n → , we have E { exp( − itρ log Λ n ) } = (1 − it ) − f (cid:8) V ( t ) + V ( t ) + V ( t ) V ( t ) (cid:9)(cid:8) O ( p /n ) (cid:9) , and thus Pr( − ρ log Λ n ≤ x ) (C.37) = (cid:40) Pr( χ f ≤ x ) + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +4 w ≤ x )( − v − w + ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +8 w ≤ x )( − v − w + (cid:88) v ≥

1; 0 ≤ w ≤ v v ≥

1; 0 ≤ w ≤ v ς v ς v v ! v ! (cid:18) v w (cid:19)(cid:18) v w (cid:19) Pr( χ f +4 w +8 w ≤ x )( − v − w + v − w (cid:41)(cid:26) O (cid:16) p n (cid:17)(cid:27) . Note that ς = Θ( p n − ) and ς = Θ( p n − ) . By applying proposition B.1 with h = 2 , ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +4 w ≤ x )( − v − w = ∞ (cid:88) v =1 (cid:110) O (cid:0) ς p − / (cid:1)(cid:111) v = Θ( p / n − ) . By applying proposition B.1 with h = 4 , we have ∞ (cid:88) v =1 ς v v ! v (cid:88) w =0 (cid:18) vw (cid:19) Pr( χ f +8 w ≤ x )( − v − w = ∞ (cid:88) v =1 (cid:110) O (cid:0) ς p − / (cid:1)(cid:111) v = O (cid:0) p / n − (cid:1) = o ( p / n − ) , and (cid:88) v ≥

1; 0 ≤ w ≤ v v ≥

1; 0 ≤ w ≤ v ς v ς v v ! v ! (cid:18) v w (cid:19)(cid:18) v w (cid:19) Pr( χ f +4 w +8 w ≤ x )( − v − w + v − w = (cid:88) v ≥ (cid:8) O (cid:0) ς p − / (cid:1)(cid:9) v (cid:88) v ≥ { O ( ς ) } v v ! = o (cid:0) p / n − (cid:1) . In summary, by (C.37),

Pr( − ρ log Λ n ≤ x ) = Pr( χ f ≤ x ) + ς (cid:110) Pr( χ f +4 ≤ x ) − Pr( χ f ≤ x ) (cid:111) + o (cid:0) p / n − (cid:1) . Particularly, by Lemma B.2.2,

Pr( χ f +4 ≤ x ) − Pr( χ f ≤ x ) = − √ f π exp (cid:18) − z α (cid:19) (cid:110) O ( f − / ) (cid:111) . In Theorem 2.2 (I), ϑ ( n, p ) = 2 ς / √ f .C.3.2 . Proof of Theorem . (II): Testing One-Sample Covariance Matrix In this problem, by Section8.3.3 in Muirhead (2009), we know f = ( p + 2)( p − / , and (cid:114) K = p , K = 1 ;4 H E ET AL . (cid:114) ξ ,k = ( n − / , υ ,k = − ( k − / for k = 1 , . . . , K ; (cid:114) ξ , = p ( n − / , υ , = 0 . (i) Chi-squared approximation. Consider ρ = 1 and p /n → . Then τ ,k = 0 for k = 1 , . . . , K , τ , = 0 , and ς l = ( − l +1 l ( l + 1) (cid:40) p (cid:88) k =1 (cid:18) n − (cid:19) l B l +1 (cid:18) − k − (cid:19) − p ( n − B l +1 (0) (cid:41) , which satisﬁes ς l = O ( p l +2 /n l ) . By Lemma C.3.1, E { exp( − it log Λ n ) } = (1 − it ) − f (cid:8) V ( t ) + V ( t ) + V ( t ) V ( t ) (cid:9)(cid:8) O ( p /n ) (cid:9) , where V l ( t ) is deﬁned as in (B.18). Similarly to Section B.2, by the inversion property of the characteristicfunctions, and Propositions B.1 and B.2, we obtain (B.22). We calculate ς = 12 (cid:34) p (cid:88) k =1 n − (cid:40)(cid:18) − k − (cid:19) − (cid:18) − k − (cid:19) + 16 (cid:41) − p ( n − × (cid:35) = 2 p + 3 p − p − /p n − . The conclusion then follows by Lemma B.2.2 and ϑ ( n, p ) = ς / √ f . (ii) Chi-squared approximation with the Bartlett correction. In this problem, consider ρ = 1 − p + p + 26 p ( n − , and p /n → . Then τ ,k = (2 p + p + 2) / (12 p ) for k = 1 , . . . , p , and τ , = (2 p + p + 2) / . It fol-lows that ς l = ( − l +1 l ( l + 1) (cid:26) ρ ( n − (cid:27) − l (cid:40) p (cid:88) k =1 B l +1 (cid:18) p + p + 212 p − k − (cid:19) − p − l B l +1 (cid:18) p + p + 212 (cid:19)(cid:41) . In particular, we calculate ς = ( p − p − p + 2)288 p ρ ( n − (2 p + 6 p + 3 p + 2) . Similarly to Section B.2, by the inversion property of the characteristic functions, and Propositions B.1and B.2, we obtain (B.29). The conclusion then follows by Lemma B.2.2 and ϑ ( n, p ) = 2 ς / √ f .C.3.3 . Proof of Theorem A. (IV): Testing the Equality of Several Mean Vectors Recall that in SectionC.1.3, we show that this testing problem can be viewed as testing the coefﬁcient matrix in multivariatelinear regression. Then by Eq. (3) in Section 10.5.3 in Muirhead (2009), we know that in this problem, f = ( k − p , and (cid:114) K = k − , K = k − ; (cid:114) ξ ,j = n/ , υ ,j = − ( j + p ) / , j = 1 , . . . , k − ; (cid:114) ξ ,j = n/ , υ ,j = − j / , j = 1 , . . . , k − . (i) Chi-squared approximation. It follows that ς l = ( − l +1 l ( l + 1) (cid:18) n (cid:19) l  k − (cid:88) j =1 B l +1 (cid:18) − j + p (cid:19) − k − (cid:88) j =1 B l +1 (cid:18) − j (cid:19) , n the Phase Transition of Wilk’s Phenomenon which is O ( p l +1 n − l ) when k is ﬁnite. In particular, we calculate ς = p ( k − p + 2 + k )4 n . Applying similar analysis to that in Section C.3.1, the conclusion follows by ϑ ( n, p ) = ς / √ f . (ii) Chi-squared approximation with the Bartlett correction. In this problem, ρ = 1 − n ( p + k + 2) . It follows that ς l = ( − l +1 l ( l + 1) (cid:18) ρn (cid:19) l  k − (cid:88) j =1 B l +1 (cid:26) (1 − ρ ) n − ( j + p )2 (cid:27) − k − (cid:88) j =1 B l +1 (cid:26) (1 − ρ ) n − j (cid:27) . We calculate that ς = ( k − p ( p + k − k − ρ n . Similarly to Section C.3.1, the conclusion then follows by ϑ ( n, p ) = 2 ς / √ f .C.3.4 . Proof of Theorem A. (V): Testing the Equality of Several Covariance Matrices In this problem,by Section 8.2.4 in Muirhead (2009), we have f = p ( p + 1)( k − / , and (cid:114) K = kp, K = p ; (cid:114) ξ ,j = ( n r − / , j = ( r − p + 1 , . . . , rp , ( r = 1 , . . . , k ) ; (cid:114) υ ,j = − ( r − / , j = r, p + r, . . . , ( k − p + r , ( r = 1 , . . . , p ) ; (cid:114) ξ ,j = ( n − k ) / , υ ,j = − ( j − / , j = 1 , . . . , p . (i) Chi-squared approximation. Consider ρ = 1 and p /n → . Then ς l = ( − l +1 l ( l + 1) (cid:34) k (cid:88) r =1 p (cid:88) r =1 (cid:18) n r − (cid:19) l B l +1 (cid:18) − r − (cid:19) − p (cid:88) j =1 (cid:18) n − k (cid:19) l B l +1 (cid:18) − j − (cid:19)(cid:35) , which satisﬁes ς l = O ( p l +2 /n l ) . Particularly, ς = (cid:18) k (cid:88) i =1 n i − − n − k (cid:19) p (2 p + 3 p − . Following similar analysis to that in Section B.2, the conclusion then follows by ϑ ( n, p ) = ς / √ f . (ii) Chi-squared approximation with the Bartlett correction. In this problem, ρ = 1 − (2 p + 3 p − p + 1)( k − (cid:32) k (cid:88) i =1 n i − − n − k (cid:33) , and we consider p /n → . In this problem, ς l = ( − l +1 l ( l + 1) (cid:34) k (cid:88) r =1 p (cid:88) r =1 B l +1 { (1 − ρ )( n r − / − ( r − / }{ ρ ( n r − / } l − p (cid:88) j =1 B l +1 { (1 − ρ )( n − k ) / − ( j − / }{ ρ ( n − k ) / } l (cid:35) . H E ET AL . Note that (1 − ρ )( n − k ) and (1 − ρ )( n r − are of the order of Θ( p ) , B l +1 ( · ) is a polynomial of order l + 1 , and k is ﬁnite. Then for l ≥ , ς l = O ( p l +2 /n l ) . In particular, we calculate ς = p ( p + 1)48 ρ (cid:34) ( p − p + 2) (cid:40) k (cid:88) i =1 n i − − n − k ) (cid:41) − k − − ρ ) (cid:35) . Similarly to Section B.2, the conclusion then follows by ϑ ( n, p ) = 2 ς / √ f .C.3.5 . Proof of Theorem A. (VI): Joint Testing the Equality of Several Mean Vectors and CovarianceMatrices In this problem, by Section 10.8.2 in Muirhead (2009), we have f = ( k − p ( p + 3) / , and (cid:114) K = kp, K = p ; (cid:114) ξ ,j = n r / , j = ( r − p + 1 , . . . , rp , ( r = 1 , . . . , k ) ; (cid:114) υ ,j = − r/ , j = r, p + r, . . . , ( k − p + r , ( r = 1 , . . . , p ) ; (cid:114) ξ ,j = n/ , υ ,j = − j / , ( j = 1 , . . . , p ) . (i) Chi-squared approximation. Consider ρ = 1 and p /n → . It follows that ς l = ( − l +1 l ( l + 1)  k (cid:88) r =1 p (cid:88) r =1 B l +1 ( − r / n r / l − p (cid:88) j =1 B l +1 ( − j/ n/ l  . Particularly, we compute ς = (cid:32) k (cid:88) r =1 n r − n (cid:33) p (cid:0) p + 9 p + 11 (cid:1) . Following similar analysis to that in Section B.2, the conclusion then follows by ϑ ( n, p ) = ς / √ f . (ii) Chi-squared approximation with the Bartlett correction. In this problem, ρ = 1 − (cid:32) k (cid:88) r =1 n r − n (cid:33) (cid:0) p + 9 p + 11 (cid:1) k − p + 3) . It follows that ς = 0 and for l ≥ , ς l = ( − l +1 l ( l + 1)  k (cid:88) r =1 p (cid:88) r =1 B l +1 { (1 − ρ ) n r / − r / } ( ρn r / l − p (cid:88) j =1 B l +1 { (1 − ρ ) n/ − j/ } ( ρn/ l  . Particularly, we calculate ς = 1 ρ (cid:40) p ( p + 1)( p + 2)( p + 3)48 (cid:18) k (cid:88) i =1 n i − n (cid:19) − p ( k − p + 3)8 (1 − ρ ) (cid:41) . Applying similar analysis to that in Section B.2, the conclusion then follows by ϑ ( n, p ) = 2 ς / √ f .C.3.6 . Proof of Theorem A. (VII): Testing Independence between Multiple Vectors In this problem,by Section 11.2.4 in Muirhead (2009), we have f = ( p − (cid:80) kj =1 p j ) / , and (cid:114) K = p, K = p ; (cid:114) ξ ,j = n/ , υ ,j = − j / , j = 1 , . . . , p ; (cid:114) ξ , p + ... + p r − + j = n/ , υ , p + ... + p r − + j = − j / , r = 1 , . . . , k , j = 1 , . . . , p r . n the Phase Transition of Wilk’s Phenomenon (i) Chi-squared approximation. Consider ρ = 1 and p /n → . It follows that ς l = ( − l +1 l ( l + 1)  p (cid:88) j =1 B l +1 ( − j / n/ l − k (cid:88) r =1 p r (cid:88) j =1 B l +1 ( − j / n/ l  . Particularly, ς = 2( p − (cid:80) kj =1 p j ) + 9( p − (cid:80) kj =1 p j )24 n . Following similar analysis to that in Section B.2, the conclusion then follows by ϑ ( n, p ) = ς / √ f . (ii) Chi-squared approximation with the Bartlett correction. In this problem, ρ = 1 − D p, + 9 D p, nD p, where D p,r = p r − (cid:80) kj =1 p rj . Then ς l = ( − l +1 l ( l + 1)  p (cid:88) j =1 B l +1 { (1 − ρ ) n/ − j / } ( ρn/ l − k (cid:88) r =1 p r (cid:88) j =1 B l +1 { (1 − ρ ) n/ − j / } ( ρn/ l  . In particular, we calculate ς = 1( ρn ) (cid:32) D p, − D p, − D p, D p, (cid:33) . Applying similar analysis to that in Section B.2, the conclusion then follows by ϑ ( n, p ) = 2 ς / √ f .C.4 . Proofs of Theorems . , A. , & A. ψ ( s ) = exp( − s / , and we let ψ ( s ) be the characteristic function of ( − n + 2 µ n ) / (2 nσ n ) , where Λ n denotes the corresponding likelihood ratio test statistic, and µ n and σ n take the corresponding values given in Theorems 2.3, A.3, & A.6. By the analysis in SectionB.3, we know that it sufﬁces to prove the results similar to Lemma B.3.2 on Page 29. In particular, inthe following subsections, we prove that under H of each test, when s = o (min { ( n/p ) / , f / } ) , thecharacteristic functions satisfy log ψ ( s ) − log ψ ( s ) = O (cid:18) pn + 1 √ f (cid:19) s + (cid:18) p + pn (cid:19) O (cid:0) s (cid:1) + O (cid:18) s √ f (cid:19) . (C.38)C.4.1 . Proof of Theorem . (I): Testing One-Sample Mean Vector Recall that in Section C.1.1, wemention that testing one-sample mean vector can be viewed as testing coefﬁcient vector of a multivariatelinear regression model. By Section 10.5.3 in Muirhead (2009), we have log ψ ( s ) = log Γ (cid:8) n (1 − ti ) − p (cid:9) Γ (cid:8) ( n − p ) (cid:9) − log Γ (cid:8) n (1 − ti ) (cid:9) Γ (cid:0) n (cid:1) + µ n sinσ n , where t = s/ ( nσ n ) . By (B.7), t = s/ ( nσ n ) = O ( s/ √ f ) . By Lemma D.1.3 (on Page 53), log Γ (cid:8) n (1 − ti ) − p (cid:9) Γ (cid:8) ( n − p ) (cid:9) = (cid:26)

12 ( n − p ) − nti (cid:27) log (cid:26)

12 ( n − p ) − nti (cid:27) + 12 nti −

12 ( n − p ) log (cid:26)

12 ( n − p ) (cid:27) + nti n − p ) + O (cid:18) tn + t (cid:19) . H E ET AL . Similarly, we have log Γ { n (1 − ti ) } Γ( n ) = (cid:26) n (1 − ti )2 (cid:27) log (cid:26) n (1 − ti )2 (cid:27) + 12 nti − n (cid:16) n (cid:17) + ti O (cid:18) tn + t (cid:19) . It follows that log ψ ( s ) = g (cid:18) − nti (cid:19) − g (0) + µ n sinσ n + O (cid:18) ptn + t (cid:19) , where we deﬁne in this subsection that g ( z ) = { ( n − p ) / z } log { ( n − p ) / z } − ( n/ z ) log( n/ z ) . Following the proof of Lemma D.3.3 (see Section D.3.4 on Page 77), we similarly obtain g (cid:18) − nti (cid:19) − g (0) = g (1)0 (0) × (cid:18) − nti (cid:19) − g (2)0 (0)2 n t O ( pt ) , where g (1)0 (0) = log (cid:16) − pn (cid:17) , g (2)0 (0) = 2 pn ( n − p ) . Recall that nσ n / √ f → by (B.7). Then by Taylor’s series and f = p , g (2)0 (0) n = 4 n σ n (cid:110) O (cid:16) pn (cid:17)(cid:111) = 4 n σ n + O (cid:18) p n (cid:19) . Moreover, by Taylor’s series, we have ng (1)0 (0) − µ n = O ( p/n ) . In summary, by t = s/ ( nσ n ) and nσ n = Θ( √ p ) , we obtain log ψ ( s ) = − µ n sinσ n − n σ n s nσ n ) + µ n sinσ n + O (cid:18) psn (cid:19) + O (cid:18) pn + 1 p (cid:19) s + O (cid:18) s √ p (cid:19) . Then (C.38) is proved.C.4.2 . Proof of Theorem . (II): Testing One-Sample Covariance Matrix By Corollary 8.3.6 in Muir-head (2009), we have log ψ ( s ) = − p ( n − ti p + log Γ p { ( n − − ti ) } Γ p { ( n − } + log Γ { p ( n − } Γ { p ( n − − ti ) } + µ n ti. By (B.7) and f = Θ( p ) , nσ n = Θ( p ) . Then as t = s/ ( nσ n ) , the conditions in Lemma D.3.1 (on Page74 ) are satisﬁed and we have log Γ p { ( n − − ti ) / } Γ p { ( n − / } = − ( n − β n, ti n − β n, t β n, (cid:26) − ( n − ti (cid:27) + O (cid:18) p tn (cid:19) + (cid:18) p + pn (cid:19) O (cid:0) p t (cid:1) + O (cid:0) p t (cid:1) , where β n, , β n, , and β n, ( · ) are deﬁned in Lemma D.3.1. In addition, we can apply Lemma D.1.3 andobtain log Γ { p ( n − / } Γ { p ( n − − ti ) / } = − p (cid:26) n −

12 (1 − ti ) (cid:27) log (cid:20) p (cid:26) n −

12 (1 − ti ) (cid:27)(cid:21) + p ( n − p ( n − − p ( n − ti − ti + O (cid:18) tpn + t (cid:19) . By the deﬁnition of β n, ( · ) in Lemma D.3.1, we have log Γ { p ( n − / } Γ { p ( n − / − pnti/ } = − β n, (cid:26) − ( n − ti (cid:27) − p ( n − ti (1 − log p )2 + O (cid:0) t + t (cid:1) . n the Phase Transition of Wilk’s Phenomenon Since µ n = ( β n, + p )( n − / , n σ = β n, ( n − , t = s/ ( nσ n ) , and nσ n = Θ( p ) , log ψ ( s ) − log ψ ( s ) = O (cid:18) pn + 1 p (cid:19) s + O (cid:18) p + pn (cid:19) s + O (cid:18) s p (cid:19) . C.4.3 . Proof of Theorem A. (IV): Testing the Equality of Several Mean Vectors By (C.31) and theanalysis in Section C.3.3, we have log ψ ( s ) = k − (cid:88) j =1 (cid:34) log Γ (cid:8) ( n − j − p ) − nti (cid:9) Γ (cid:8) ( n − j − p ) (cid:9) − log Γ (cid:8) ( n − j ) − nti (cid:9) Γ (cid:8) ( n − j ) (cid:9) (cid:35) + µ n sinσ n , where t = s/ ( nσ n ) . By Lemma D.1.3, log Γ (cid:8) ( n − j − p ) − nti (cid:9) Γ (cid:8) ( n − j − p ) (cid:9) = (cid:26)

12 ( n − j − p ) − nti (cid:27) log (cid:26)

12 ( n − j − p ) − nti (cid:27) − n − j − p n − j − p nti O ( t + t ) . Applying similar analysis, we obtain log Γ (cid:8) ( n − j − nti ) (cid:9) Γ (cid:8) ( n − j ) (cid:9) = (cid:18) n − j − nti (cid:19) log (cid:18) n − j − nti (cid:19) − n − j n − j nti O ( t + t ) . It follows that log ψ ( s ) = (cid:80) k − j =1 { g j ( nti/ − g j (0) } + µ n si/ ( nσ n ) + O ( t + t ) , where we deﬁne inthis subsection that g j ( z ) = (cid:18) n − j − p − z (cid:19) log (cid:18) n − j − p − z (cid:19) − (cid:18) n − j − z (cid:19) log (cid:18) n − j − z (cid:19) . Following similar proof to that of Lemma D.3.3 (see Section D.3.4), we obtain k − (cid:88) j =1 { g j ( nti ) − g j (0) } = k − (cid:88) j =1 g (1) j (0) nti − n t k − (cid:88) j =1 g (2) j (0) + O ( pt ) , (C.39)where g (1) j (0) = log (cid:18) n − j (cid:19) − log (cid:18) n − j − p (cid:19) , g (2) j (0) = 2 n − j − p − n − j . Note that k − (cid:88) j =1 g (2) j (0) = k − (cid:88) j =1 p ( n − j − p )( n − j ) = p ( k − n − p − n (cid:26) O (cid:18) kn (cid:19)(cid:27) , and σ n = log (cid:26) p ( k − n − k )( n − p − (cid:27) = p ( k − n − p − n (cid:26) O (cid:18) kn (cid:19)(cid:27) . Thus (cid:80) k − j =1 g (2) j (0)(4 σ n ) − = 1 + O ( n − ) . In addition, k − (cid:88) j =1 g (1) j (0) = log Γ( n − n − k ) − log Γ( n − p − n − p − k ) . H E ET AL . We then apply Lemma D.1.1 to expand the log Γ( · ) function, and calculate k − (cid:88) j =1 g (1) j (0) = − (cid:18) n − p − k − (cid:19) (cid:26) log (cid:18) − pn − (cid:19) − log (cid:18) − pn − k (cid:19)(cid:27) − p log (cid:18) − k − n − (cid:19) − ( k −

1) log (cid:18) − pn − (cid:19) + O ( n − ) . Therefore (cid:80) k − j =1 g (1) j (0) = − µ n /n + O ( n − ) . Then by (C.39), t = s/ ( nσ n ) , nσ n = Θ( f / ) , and f =Θ( p ) , we have log ψ ( s ) = (cid:8) − µ n /n + O ( n − ) (cid:9) nti − n σ n t (cid:8) O ( n − ) (cid:9) + µ n ti + O (cid:0) t + t + pt (cid:1) = − s O (cid:18) √ f (cid:19) s + O (cid:18) pn + 1 f (cid:19) s + O (cid:18) s √ f (cid:19) . By log ψ ( s ) = − s / , (C.38) is proved.C.4.4 . Proof of Theorem A. (V): Testing the Equality of Several Covariance Matrices By (C.31) andthe analysis in Section C.3.4, we have log ψ ( s ) = log Γ p (cid:8) ( n − k ) (cid:9) Γ p (cid:8) ( n − k )(1 − ti ) (cid:9) + k (cid:88) j =1 log Γ p (cid:8) ( n j − − ti ) (cid:9) Γ p (cid:8) ( n j − (cid:9) − p  ( n − k ) log( n − k ) − k (cid:88) j =1 ( n j −

1) log( n j −  ti µ n sinσ n , where t = s/ ( nσ n ) . By Lemma D.3.1, we can expand log Γ p ( · ) and obtain log ψ ( s ) = − µ n ti − n σ n t µ n ti + R n ( t ) , (C.40)where the calculations of µ n and σ n are similar to that in Section A.5 of Jiang and Qi (2015), and thusthe details are skipped here. In (C.40), R n ( t ) denotes the remainder term of the expansion. Since LemmaD.3.1 is used, we know that the remainder term satisﬁes R n ( t ) = O (cid:16) pn (cid:17) s + (cid:18) p + pn (cid:19) s + O (cid:18) s p (cid:19) . By t = s/ ( nσ n ) and (C.40), (C.38) is obtained.C.4.5 . Proof of Theorem A. (VI): Joint Testing the Equality of Several Mean Vectors and CovarianceMatrices By Corollary 10.8.3 in Muirhead (2009), log ψ ( s ) = log Γ p (cid:8) ( n − (cid:9) Γ p (cid:8) ( n − − nti (cid:9) + k (cid:88) j =1 log Γ p (cid:8) ( n j − − n j ti (cid:9) Γ p (cid:8) ( n j − (cid:9) − p (cid:18) n log n − k (cid:88) j =1 n j log n j (cid:19) ti µ n sinσ n , n the Phase Transition of Wilk’s Phenomenon where t = s/ ( nσ n ) . By Lemma D.3.1, log Γ p (cid:8) ( n j − − n j ti (cid:9) Γ p (cid:8) ( n j − (cid:9) = (cid:20) pn j + (cid:18) n j − p − (cid:19) n j log (cid:18) − pn j − (cid:19)(cid:21) ti (C.41) + (cid:26) pn j − (cid:18) − pn j − (cid:19)(cid:27) n j t (cid:37) n j ( t ) + R n ( t ) , where for an integer l , we deﬁne (cid:37) l ( t ) = p (cid:26)(cid:18) l −

12 + lt (cid:19) log (cid:18) l −

12 + lt (cid:19) − l −

12 log l − (cid:27) , (C.42)and R n ( t ) denotes the remainder term and it is of the order of R n ( t ) = O (cid:18) ptn (cid:19) + O (cid:18) p + pn (cid:19) p t + O (cid:0) p t (cid:1) . (C.43)In addition, to evaluate log ψ ( s ) , we also use Lemma C.4.1 below.L EMMA

C.4.1.

Under the conditions of Theorem A. , as p/n → and t = s/ ( nσ n ) = O ( s/ √ f ) , n t log (cid:18) − pn − (cid:19) = n t log (cid:16) − pn (cid:17) + O (cid:16) pn (cid:17) t , (C.44) (cid:26) ( n − p − / n log (cid:18) − pn − (cid:19)(cid:27) t = (cid:110) ( n − p − / n log (cid:16) − pn (cid:17)(cid:111) t − pt + O (cid:16) pn (cid:17) t. Moreover, for (cid:37) l ( t ) deﬁned in (C.42) , we have − (cid:37) n ( t ) + k (cid:88) j =1 (cid:37) n j ( t ) = (cid:18) − k − n log n + k (cid:88) j =1 n j log n j (cid:19) tp O (cid:18) ptn + pt (cid:19) . (C.45) Proof.

Please see Section D.3.5 on Page 77. (cid:3)

By Lemma C.4.1 and the expansions of gamma functions in (C.41), we calculate log ψ ( s ) (C.46) = (cid:26) p − (cid:18) n − p − (cid:19) n log (cid:16) − pn (cid:17) + k (cid:88) j =1 (cid:18) n j − p − (cid:19) n j log (cid:18) − pn j − (cid:19)(cid:27) ti −  n L n,p − k (cid:88) j =1 n j L n j − ,p  t − p  (1 − k ) − n log n + k (cid:88) j =1 n j log n j  ti − p (cid:18) n log n − k (cid:88) j =1 n j log n j (cid:19) ti µ n sinσ n + R n ( t ) , where R n ( t ) denotes the remainder term of (C.46), which is of the order same as that in (C.43), whereaswe mention that the exact value of R n ( t ) can change. Then we obtain (C.38) by t = s/ ( nσ n ) and nσ n =Θ( f / ) .C.4.6 . Proof of Theorem A. (VII): Testing Independence between Multiple Vectors By Theorem11.2.3 in Muirhead (2009), we know log ψ ( s ) = log Γ p { ( n − − nti } Γ p { ( n − } + k (cid:88) j =1 Γ p j { ( n − } Γ p j { ( n − − nti } + µ n sinσ n , H E ET AL . where t = s/ ( nσ n ) . By Lemma D.3.1, we can expand log Γ p ( · ) and obtain log ψ ( s ) =  p + (cid:18) n − p − (cid:19) L n − ,p − k (cid:88) j =1 (cid:26) p j + (cid:18) n − p j − (cid:19) L n − ,p j (cid:27) nti  pn − L n − ,p − k (cid:88) j =1 (cid:18) p j n − L n − ,p j (cid:19) n t (cid:18) p − k (cid:88) j =1 p j (cid:19) (cid:26) n (1 − ti )2 log n (1 − ti )2 − n n (cid:27) + µ n sinσ n + R n ( t ) , where R n ( t ) denotes the remainder term and its order satisﬁes R n ( t ) = O (cid:18) ptn (cid:19) + O (cid:18) p + pn (cid:19) p t + O (cid:0) p t (cid:1) . Then we obtain (C.38) by noticing p − (cid:80) kj =1 p j = 0 and t = O ( s/p ) .D. P ROOFS OF A SSISTED L EMMAS

D.1 . Results on Asymptotic Expansions of the Gamma Functions

In this section, we provide some results on asymptotic expansions of the gamma functions, which arerepeatedly used in the proofs. We ﬁrst give the following Lemma D.1.1 on the expansion of log Γ( z ) ,which also provides the basis for other lemmas below. Lemma D.1.1 and its proof can be found in 12.33of Whittaker and Watson (1996).L EMMA

D.1.1.

Suppose that a complex number z satisﬁes Re ( z ) ≥ (cid:15) > and | arg( z ) | ≤ π/ − (cid:15) with (cid:15) > and < (cid:15) < π/ being given in advance. When | z | → ∞ , and an even integer L , we have log Γ( z ) = (cid:18) z − (cid:19) log z − z + log √ π + L − (cid:88) l =1 ( − l +1 B l +1 (0) l ( l + 1) z l + R L ( z ) , (D.1) where B l +1 ( · ) represents the Bernoulli polynomial of order l + 1 , and | R L ( z ) | = O (cid:18) | B L +2 (0) | ( L + 1)( L + 2) | z | L +1 (cid:19) . Particularly, we know B l (0) = 0 when l is odd and l ≥ . In Lemma D.1.1, if we take L = 2 and z as a real number, by B (0) = 1 / , we have log Γ( z ) = (cid:18) z − (cid:19) log z − z + log √ π + 112 z + O ( z − ) . (D.2)Given Lemma D.1.1, we next prove two additional lemmas on asymptotic expansions of the gammafunctions.L EMMA

D.1.2.

Suppose a complex number z + a satisﬁes Re ( z + a ) ≥ (cid:15) > and | arg( z + a ) | ≤ π/ − (cid:15) with (cid:15) > and < (cid:15) ≤ π/ being given in advance. Assume | a | → ∞ as | z | → ∞ and | a | = o ( | z | ) . For a ﬁnite even L , when | a | L +1 / | z | L → , log Γ( z + a ) = (cid:18) z + a − (cid:19) log z − z + log √ π + L − (cid:88) l =1 ( − l +1 B l +1 ( a ) l ( l + 1) z l + O (cid:18) | a | L +1 | z | L (cid:19) . n the Phase Transition of Wilk’s Phenomenon Proof.

Please see Section D.1.1 on Page 53. (cid:3) L EMMA

D.1.3.

For a real number x → ∞ and a real number b = o ( x ) , log Γ( x + bi )Γ( x ) = ( x + bi ) log( x + bi ) − x log x − bi − bi x + O (cid:18) b + b x (cid:19) , where i denotes the imaginary unit.Proof. Please see Section D.1.2 on Page 54. (cid:3)

D.1.1 . Proof of Lemma D. . (on Page 52) By (D.1), for a ﬁnite even L , we have log Γ( z + a ) (D.3) = (cid:18) z + a − (cid:19) log( z + a ) − ( z + a ) + log √ π + L − (cid:88) l =1 ( − l +1 B l +1 (0) l ( l + 1)( z + a ) l + O (cid:0) | z + a | − L − (cid:1) = (cid:18) z + a − (cid:19) log z − z + (cid:18) z + a − (cid:19) log (cid:16) az (cid:17) − a + log √ π + L − (cid:88) l =1 ( − l +1 B l +1 (0) l ( l + 1) z l (cid:16) az (cid:17) − l + O (cid:0) | z + a | − L − (cid:1) . By Taylor’s expansion, (cid:18) z + a − (cid:19) log (cid:16) az (cid:17) − a = L − (cid:88) k =1 ( − k +1 z k (cid:26) a k +1 k ( k + 1) − k a k (cid:27) + O (cid:18) | a | L +1 | z | L (cid:19) . (D.4)Note that B (0) = 1 and B (0) = − / . Thus(D.4) = L − (cid:88) k =1 ( − k +1 k ( k + 1) z k (cid:26) B (0) a k +1 + (cid:18) k + 11 (cid:19) B (0) a k (cid:27) + O (cid:18) | a | L +1 | z | L (cid:19) . (D.5)In addition, by Taylor’s expansion, when L is ﬁnite, L − (cid:88) l =1 ( − l +1 B l +1 (0) l ( l + 1) z l (cid:16) az (cid:17) − l (D.6) = L − (cid:88) l =1 ( − l +1 B l +1 (0) l ( l + 1) z l (cid:40) L − − l (cid:88) s =0 ( − s (cid:18) l + s − s (cid:19) a s z s + O (cid:16) | a/z | L − l (cid:17)(cid:41) = L − (cid:88) k =1 k (cid:88) t =1 ( − k +1 B t +1 (0) t ( t + 1) z k ( k − t − k − t )! a k − t + O (cid:16) | a/z | L (cid:17) = L − (cid:88) k =1 k +1 (cid:88) t =2 ( − k +1 B t (0) k ( k + 1) z k (cid:18) k + 1 t (cid:19) a k +1 − t + O (cid:16) | a/z | L (cid:17) . Combining (D.3), (D.5), and (D.6), we obtain log Γ( z + a ) = (cid:18) z + a − (cid:19) log z − z + log √ π + L − (cid:88) k =1 ( − k +1 k ( k + 1) z k (cid:40) k +1 (cid:88) t =0 (cid:18) k + 1 t (cid:19) B t (0) a k +1 − t (cid:41) + O (cid:18) | a | L +1 | z | L (cid:19) . H E ET AL . By the property of the Bernoulli polynomials, B k +1 ( a ) = (cid:80) k +1 t =0 B t (0) a k +1 − t ; see, e.g., Eq. (13) on Page21 in Luke (1969). Therefore the lemma is proved.D.1.2 . Proof of Lemma D. . (on Page 53) By Binet’s second formula of the gamma function, it canbe obtained that for a complex number z with positive real part, and any integer L ≥ , log Γ( z ) = (cid:18) z − (cid:19) log z − z + log √ π + L (cid:88) l =1 B l (0)(2 l − l ) z l − + 2( − L z L − (cid:90) ∞ (cid:90) t u L d uu + z d te πt − please see Page 252 in Whittaker and Watson (1996) for details. Take L = 1 , and by B (0) = 1 / , wehave log Γ( x ) = (cid:18) x − (cid:19) log x − x + log √ π + 112 x − x (cid:90) ∞ (cid:18)(cid:90) t u x + u d u (cid:19) d te πt − . Similarly, we have log Γ( x + bi ) = (cid:18) x + bi − (cid:19) log( x + bi ) − ( x + bi ) + log √ π + 112( x + bi ) − x + bi (cid:90) ∞ (cid:18)(cid:90) t u ( x + bi ) + u d u (cid:19) d te πt − . It follows that log Γ( x + bi )Γ( x ) (D.7) = ( x + bi ) log( x + bi ) − x log x − bi −

12 log (cid:18) bix (cid:19) + 112 (cid:18) x + bi − x (cid:19) + ˜ R , where ˜ R = − (cid:90) ∞ (cid:90) t (cid:20) u ( x + bi ) { ( x + bi ) + u } − u x ( x + u ) (cid:21) d u d te πt − . To evaluate ˜ R , we note that u ( x + bi ) { ( x + bi ) + u } − u x ( x + u )= − u x × b x i − b x + b x i { (1 + b x i ) + u x } (1 + b x i ) { (1 + b x i ) + u x } (1 + u x )= − u b x x (1 + b x i )(1 + u x ) × (cid:20) i − b x (1 + b x i ) + u x + i (cid:21) , where for easy presentation, we let b x = b/x and u x = u/x . Since b = o ( x ) , | (1 + b x i ) − | is bounded.Moreover, we also know (1 + u x ) − and |{ (1 + b x i ) + u x } − | are bounded. It follows that there existsa constant C such that | ˜ R | ≤ Cb x x (cid:90) ∞ (cid:18)(cid:90) t u d u (cid:19) d te πt − O (cid:18) bx (cid:19) , where we use (cid:82) ∞ t ( e πt − − d t is a constant; see 7.2 in Whittaker and Watson (1996). Lemma D.1.3is then obtained by (D.7) and log (cid:18) bix (cid:19) = bix + O (cid:18) b x (cid:19) , x + bi − x = O (cid:18) bx (cid:19) . n the Phase Transition of Wilk’s Phenomenon D.2 . Lemmas for Theorems . , A. & A. . Proof of Lemma B. . (on Page 25) By (B.1), we can write log E { exp( − itη log Λ n ) } = G + G + G , where in this subsection, we let G = − iηnpt log (cid:18) en (cid:19) , G = − np − iηt ) log(1 − iηt ) ,G = log Γ p (cid:18) n − − ηnit (cid:19) − log Γ p (cid:16) n − (cid:17) . By the property of multivariate gamma function; see, e.g., Theorem 2.1.12 in Muirhead (2009), we obtain G = p (cid:88) j =1 log Γ (cid:26) n − ηit ) − j (cid:27) − p (cid:88) j =1 log Γ (cid:18) n − j (cid:19) = p (cid:88) j =1 (cid:20) log Γ (cid:26) ηn − it ) + n (1 − η ) − j (cid:27) − log Γ (cid:26) ηn n (1 − η ) − j (cid:27)(cid:21) . We ﬁrst examine G . When η = 1 or η = ρ , for ≤ j ≤ p , n (1 − η ) − j = O ( p ) and ηn = Θ( n ) .As p = o ( n ) , |{ n (1 − η ) − j }{ ηn (1 − it ) } − | = O ( p/n ) = o (1) . Then we can apply Lemma D.1.2 onPage 52, and obtain log Γ (cid:26) ηn − it ) + n (1 − η ) − j (cid:27) = (cid:26) ηn − it ) + n (1 − η ) − j − (cid:27) log (cid:110) ηn − it ) (cid:111) − ηn − it ) + log √ π + L − (cid:88) l =1 ( − l +1 l ( l + 1) B l +1 (cid:26) n (1 − η )2 − j (cid:27) (cid:110) ηn − it ) (cid:111) − l + O (cid:18) p L +1 n L (cid:19) , and log Γ (cid:26) ηn n (1 − η ) − j (cid:27) = (cid:26) ηn n (1 − η ) − j − (cid:27) log ηn − ηn √ π + L − (cid:88) l =1 ( − l +1 l ( l + 1) B l +1 (cid:26) n (1 − η )2 − j (cid:27) (cid:16) ηn (cid:17) − l + O (cid:18) p L +1 n L (cid:19) . It follows that G = − ηpnti log (cid:16) n e (cid:17) − pηnit log η + pn − iηt ) log(1 − it ) − p (cid:88) j =1 j + 12 log(1 − it )+ L − (cid:88) l =1 ( − l +1 l ( l + 1) p (cid:88) j =1 B l +1 (cid:26) n (1 − η )2 − j (cid:27) (cid:16) ηn (cid:17) − l (cid:8) (1 − it ) − l − (cid:9) + O (cid:18) p L +2 n L (cid:19) . H E ET AL . We next examine G . By − iηt = η (1 − it ) + 1 − η , and Taylor’s expansion, (1 − iηt ) log(1 − iηt )= { η (1 − it ) + 1 − η } log { η (1 − it ) } + 1 − η + (1 − η ) L − (cid:88) l =1 ( − l +1 l ( l + 1) (cid:18) − ηη (cid:19) l (1 − it ) − l + O (cid:8) (1 − η ) L +1 (cid:9) . As log(1) = (1 − iη ×

0) log(1 − iη ×

0) = 0 , by applying Taylor’s expansion similarly as above, (1 − iηt ) log(1 − iηt ) − log(1)= − iηt log η (1 − it ) + log(1 − it )+ (1 − η ) L − (cid:88) l =1 ( − l +1 l ( l + 1) (cid:18) − ηη (cid:19) l (cid:8) (1 − it ) − l − (cid:9) + O { (1 − η ) L +1 } . As (1 − η ) /η = { (1 − η ) n/ } / ( ηn/ , G = − L − (cid:88) l =1 ( − l +1 l ( l + 1) p (cid:88) j =1 (cid:26) (1 − η ) n (cid:27) l +1 (cid:16) ηn (cid:17) − l (cid:8) (1 − it ) − l − (cid:9) + iηnpt log η (1 − it ) − np − it ) + O (cid:8) (1 − η ) L +1 pn (cid:9) . In summary, as − η = O ( p/n ) when η = 1 or ρ , we have G + G + G = − p (cid:88) j =1 j + 12 log(1 − it ) + L − (cid:88) l =1 ς l (cid:8) (1 − it ) − l − (cid:9) + O (cid:18) p L +2 n L (cid:19) , where ς l = ( − l +1 l ( l + 1) p (cid:88) j =1 (cid:34) B l +1 (cid:26) (1 − η ) n − j (cid:27) − (cid:26) (1 − η ) n (cid:27) l +1 (cid:35) (cid:16) ηn (cid:17) − l . Particularly, as B l +1 ( · ) is a polynomial of order l + 1 and (1 − η ) n = O ( p ) , we have ς l = O ( p l +2 n − l ) .D.2.2 . Notation of the ﬁnite difference and computation rules In the following, we prove PropositionsB.1 and B.2 and Lemma B.2.2 based on the calculus of the ﬁnite difference. To facilitate the proofs,we introduce some notation. Given x , deﬁne a function with respect to the degrees of freedom f as F x ( f ) = P ( χ f ≤ x ) . Let ∆ h represent a forward difference operator with step h , that is, ∆ h ( F x , f ) = F x ( f + 2 h ) − F x ( f ) . For an integer v ≥ , it follows that the v -th order forward difference is ∆ v h ( F x , f ) = v (cid:88) w =0 (cid:18) vw (cid:19) ( − v − w F ( f + 2 hw ) , where ∆ h ( F x , f ) = ∆ h ( F x , f ) . Particularly, when h = 1 , we have ∆ v ( F x , f ) = v (cid:88) w =0 (cid:18) vw (cid:19) ( − v − w P ( χ f +2 w ≤ x ); when h = 2 , ∆ v ( F x , f ) = v (cid:88) w =0 (cid:18) vw (cid:19) ( − v − w P ( χ f +4 w ≤ x ) . In the following proofs, we use several rules of the ﬁnite difference operator listed in Lemmas D.2.1–D.2.3below, which can be found in Section 3.7 of Zwillinger (2002). n the Phase Transition of Wilk’s Phenomenon L EMMA

D.2.1 (L

EIBNIZ RULE ). For two functions F ( f ) and G ( f ) , and two positive integers v and h , ∆ vh ( F G, f ) = v (cid:88) w =0 (cid:18) vw (cid:19) ∆ wh ( F, f )∆ v − wh ( G, f + hw ) . L EMMA

D.2.2 (L

INEARITY RULE ). For two constants C and C , two functions F ( f ) and G ( f ) , andtwo positive integers v and h , the linear combination C F ( f ) + C G ( f ) satisﬁes ∆ vh ( C F + C G, f ) = C ∆ vh ( F ) + C ∆ vh ( G ) . L EMMA

D.2.3.

For a function F ( f ) and positive integers v , v , h , and h , ∆ v h ∆ v h ( F, f ) = ∆ v h ∆ v h ( F, f ) = ∆ v h ∆ v − h { ∆ h ( F, f ) } = ∆ v h ∆ v − h { ∆ h ( F, f ) } . Based on the notation and lemmas on the ﬁnite difference, we ﬁrst prove Lemma B.2.2 in Section D.2.3,and then use Lemma B.2.2 to prove Propositions B.1 and B.2 in Sections D.2.4 and D.2.5, respectively.D.2.3 . Proof of Lemma B. . (on Page 27) We prove (B.24) in Lemma B.2.2 from the cumulativedistribution function of the chi-squared distribution. In particular, by the probability density of χ f , wehave Pr (cid:0) χ f ≤ x (cid:1) = γ ( f / , x/ f / , where γ ( m, x ) is the lower incomplete gamma function deﬁned as γ ( m, x ) = (cid:82) x t m − e − t d t , and Γ( m ) is the gamma function deﬁned as Γ( m ) = (cid:82) ∞ t m − e − t d t ; see, e.g., Section 6.2 in Press et al. (1992).Thus for an integer h , ∆ h ( F x , f ) = Γ (cid:0) f (cid:1) γ (cid:0) f + h, x (cid:1) − Γ (cid:0) f + h (cid:1) γ (cid:0) f , x (cid:1) Γ (cid:0) f + h (cid:1) Γ (cid:0) f (cid:1) , where ∆ h ( F x , f ) = Pr( χ f +2 h ≤ x ) − Pr( χ f ≤ x ) following the notation in Section D.2.2. By integra-tion by parts, we have Γ( m + 1) = m Γ( m ) , and then Γ( m + h ) = h (cid:89) k =1 ( m + h − k )Γ( m ) . (D.8)Similarly, we have γ ( m + 1 , x ) = mγ ( m, x ) − x m e − x , and then γ ( m + h, x ) = h (cid:89) k =1 ( m + h − k ) γ ( m, x ) − h (cid:88) k =1 k − (cid:89) t =1 ( m + h − t ) x m + h − k e − x ; this recurrence formulas can also be found in Sections 6.3 and 6.5 in Abramowitz and Stegun (1970). Itfollows that ∆ h ( F x , f ) = − (cid:80) hk =1 (cid:81) k − t =1 ( f / h − t )( x/ f + h − k e − x/ (cid:81) ht =1 ( f / h − t ) × Γ( f /

2) = − h (cid:88) k =1 ( x/ f + h − k e − x/ Γ ( f / h − k + 1) . Therefore (B.24) is proved.We next prove (B.25) in Lemma B.2.2 based on (B.24) by discussing h ∈ { , , , } , respectively. (1). We ﬁrst consider h = 1 . Under this case, ∆ ( F x , f ) = − ( x/ f/ e − x/ Γ( f / . (D.9)8 H E ET AL . By (D.2), as f → ∞ , Γ( f /

2) = ( f / f/ − / e − f/ √ π { O ( f − ) } . Moreover, by Γ( f / f / f / , we have f / x/ f/ e − x/ = 1 √ f π (cid:18) xf (cid:19) f/ exp (cid:26) f − x O ( f − ) (cid:27) = 1 √ f π exp (cid:26) f − x f (cid:18) x − ff (cid:19) + O ( f − ) (cid:27) . When x = χ f ( α ) , we have x = f + √ f { z α + O ( f − / ) } by (B.6). Then by Taylor’s series, ∆ ( F x , f ) = 1 √ f π exp (cid:26) − ( x − f ) f + O ( f − / ) (cid:27) = 1 √ f π exp (cid:18) − z α (cid:19) { O ( f − / ) } . (2). When h = 2 , by (B.24), (D.8), and x = f + √ f { z α + O ( f − / ) } , we have ∆ ( F x , f ) = − x f + 1 × ∆ ( F x , f ) + ∆ ( F x , f ) = − √ f π exp (cid:18) − z α (cid:19) { O ( f − / ) } . (3). When h = 3 , similarly by (B.24), (D.8), and x = f + √ f { z α + O ( f − / ) } , we have ∆ ( F x , f ) = − ( x ) ( f + 2)( f + 1) ∆ ( F x , f ) + ∆ ( F x , f ) = − √ f π exp (cid:18) − z α (cid:19) { O ( f − / ) } . (4). When h = 4 , similarly by (B.24), (D.8), and x = f + √ f { z α + O ( f − / ) } , we have ∆ ( F x , f ) = − ( x ) ( f + 3)( f + 2)( f + 1) ∆ ( F x , f ) + ∆ ( F x , f )= − √ f π exp (cid:18) − z α (cid:19) { O ( f − / ) } . In summary, (B.25) is proved.D.2.4 . Proof of Proposition B. (on Page 27) We prove Proposition B.1 based on the notation inSection D.2.2 and Lemma B.2.2, which is proved in Section D.2.3 above. Particularly, we write the lefthand side of (B.21) as ∆ v h ( F x , f ) below. By (B.25), we know (B.21) holds for v = 1 and h ∈ { , , , } .We next prove (B.21) for v ≥ when h ∈ { , , , } , respectively. (Part I) Proof for h = 1 . When v = 2 , by (B.24), we have ∆ ( F x , f ) = − f + 2) (cid:16) x (cid:17) f +1 e − x/ + 1Γ( f + 1) (cid:16) x (cid:17) f e − x/ . Then we can write ∆ ( F x , f ) = A ( f ) Q ( f ) , where we deﬁne Q ( f ) = ∆ ( F x , f ) , and A ( f ) = x/ ( f + 2) − . (D.10)Note that Q ( f ) = O ( f − / ) by (B.25), and A ( f ) = O ( f − / ) by (B.6) when x = χ f ( α ) . Therefore,(B.21) holds for h = 1 and v = 2 .We next prove (B.21) for h = 1 and v > by the mathematical induction. Assume that there existssome constant C such that uniformly for integers ≤ k ≤ v − , ∆ k ( F x , f ) = O ( k ! C k f − k/ ) , that is, uniformly for integers ≤ k ≤ v − , ∆ k − ( Q , f ) = O ( k ! C k f − k/ ) . (D.11) n the Phase Transition of Wilk’s Phenomenon We next prove ∆ v ( F x , f ) = O ( v ! C v f − v/ ) . By the deﬁnition of Q ( f ) and A ( f ) , we have ∆ v ( F x , f ) = ∆ v − ( Q , f ) = ∆ v − ( A Q , f ) . By Lemma D.2.1, ∆ v − ( A Q , f ) = v − (cid:88) w =0 (cid:18) v − w (cid:19) ∆ w ( A , f )∆ v − − w ( Q , f + 2 w ) . (D.12)To evaluate (D.12), by (D.11), for ≤ w ≤ v − , we have ∆ v − − w ( Q , f + 2 w ) = O (cid:8) ( v − w − C v − w − f − ( v − w − / (cid:9) . In addition, to evaluate ∆ w ( A , f ) in (D.12), we use the following Lemma D.2.4.L EMMA

D.2.4.

When x = χ f ( α ) and f → ∞ , A ( f ) = √ z α f − / { O ( f − ) } , and for any in-teger w ≥ , ∆ w ( A , f ) = x × ( − w w w ! 1 (cid:81) w +1 k =1 ( f + 2 k ) . (D.13) Thus there exists a constant C such that (D.13) is of the order of O ( w ! C w f − w ) as f → ∞ uniformly for w ≥ .Proof. Please see Section D.2.6 on Page 67. (cid:3)

By Lemma D.2.4, (D.12) gives that as f → ∞ , ∆ v − ( A Q , f ) (D.14) = O ( f − / ) × O (cid:8) ( v − C v − f − ( v − / (cid:9) + v − (cid:88) w =1 (cid:18) v − w (cid:19) O (cid:0) w ! C w f − w (cid:1) × O (cid:8) ( v − w − C v − w − f − ( v − w − / (cid:9) = ( v − C v − O ( f − v/ ) + v − (cid:88) w =1 ( v − v − w − C v − O { f − ( w − / × f − v/ } = O ( v ! C v f − v/ ) , where in the last equation, we use v − w − ≤ v − and O { f − ( w − / } = O (1) when w ≥ . We notethat there exists a constant C such that the last equation in (D.14) holds uniformly for v ≥ . In summary,we obtain (B.21) for h = 1 . (Part II) Proof for h = 2 . By (B.24), (D.9) and (D.10), ∆ ( F x , f ) = Q ( f ) + Q ( f ) , (D.15)where we deﬁne Q ( f ) = − f + 2) (cid:16) x (cid:17) f +1 e − x/ . Then by (D.15) and Lemma D.2.2, we have ∆ v ( F x , f ) = ∆ v − ( Q , f ) + ∆ v − ( Q , f ) . Therefore, to prove (B.21) for h = 2 , it sufﬁces to prove ∆ v − ( Q , f ) = O ( v ! C v f − v/ ) , ∆ v − ( Q , f ) = O ( v ! C v f − v/ ) . (D.16)0 H E ET AL . As Q ( f ) = Q ( f − , it sufﬁces to prove (D.16), and we next use the mathematical induction. Notethat (D.16) holds for v = 1 since ∆ ( Q , f ) = Q ( f ) = O ( f − / ) by the proof of (B.25). In addition,for v = 2 , we have ∆ ( Q , f ) = Q ( f + 4) − Q ( f ) = A ( f ) Q ( f ) , (D.17)where A ( f ) = ( x ) ( f + 3)( f + 2) − . Note that Q ( f ) = O ( f − / ) , and when x = χ f ( α ) , we have A ( f ) = O ( f − / ) by (B.6). Therefore, ∆ ( Q , f ) = O ( f − ) , i.e., (D.16) holds for v = 2 . For v ≥ , we next use the mathematical induction,where we assume for integers ≤ w ≤ v − , ∆ w ( Q , f ) = O { ( w + 1)! C w +1 f − ( w +1) / } , (D.18)and prove (D.16). By (D.17), ∆ v − ( Q , f ) = ∆ v − ( A Q , f ) . Then by Lemma D.2.1, ∆ v − ( A Q , f ) = v − (cid:88) w =0 (cid:18) v − w (cid:19) ∆ w ( A , f )∆ v − − w ( Q , f + 4 w ) . (D.19)We next prove (D.16) by (D.18), (D.19) and the following Lemma D.2.5.L EMMA

D.2.5.

When x = χ f ( α ) , A ( f ) = 2 √ z α f − / { O ( f − / ) } . Moreover, there exists aconstant C such that uniformly for any integer w ≥ , ∆ w ( A , f ) = O (cid:26) ( w + 1)! C w w (cid:89) t =1 ( f + 2 t ) − (cid:27) . (D.20) Proof.

Please see Section D.2.7 on Page 67. (cid:3)

By Lemma D.2.5 and (D.18), we have(D.19) = O ( f − / ) × O (cid:8) ( v − C v − f − ( v − / (cid:9) + v − (cid:88) w =1 (cid:18) v − w (cid:19) O (cid:26) ( w + 1)! C w w (cid:89) t =1 ( f + 2 t ) − ( v − − w )! C ( v − − w ) f − v − − w (cid:27) = O (cid:8) ( v − C v − f − v (cid:9) + v − (cid:88) w =1 O (cid:8) ( v − v − − w ) C v − f − v (cid:9) ( w + 1) f w +12 (cid:81) wt =1 ( f + 2 t ) . (D.21)To evaluate (D.21), we note that when w = 1 and , ( w + 1) f ( w +1) / { (cid:81) wt =1 ( f + 2 t ) } − = O ( f (1 − w ) / ) ; when w ≥ , as f → ∞ , ( w + 1) f ( w +1) / (cid:81) wt =1 ( f + 2 t ) ≤ w + 12 w f ( w +1) / f w − = O (1) uniformly over w ≥ . Moreover, by (cid:80) v − w =1 ( v − v − − w ) ≤ v ! , we obtain (D.19) = O ( v ! C v f − v/ ) . (Part III) Proof for h = 3 . By (B.24), ∆ ( F x , f ) = Q ( f ) + Q ( f ) + Q ( f ) , (D.22)where we deﬁne Q ( f ) = − f + 3) (cid:16) x (cid:17) f +2 e − x/ . n the Phase Transition of Wilk’s Phenomenon Then by (D.22) and Lemma D.2.2, ∆ v ( F x , f ) = ∆ v − ( Q , f ) + ∆ v − ( Q , f ) + ∆ v − ( Q , f ) . Since Q ( f ) = Q ( f − and Q ( f ) = Q ( f − , it sufﬁces to prove ∆ v − ( Q , f ) = O ( v ! C v f − v/ ) . (D.23)We next prove (D.23) by the mathematical induction. Note that (D.23) holds for v = 1 since ∆ ( Q , f ) = Q ( f ) = O ( f − / ) by the proof of (B.25) in Section D.2.3. In addition, for v = 2 , ∆ ( Q , f ) = Q ( f + 6) − Q ( f ) = A ( f ) Q ( f ) , (D.24)where A ( f ) = (cid:89) k =1 A ,k ( f ) − , A ,k ( f ) = xf + 4 + 2 k . Note that A ( f ) = O ( f − / ) when x = χ f ( α ) by (B.6). Moreover, as Q ( f ) = O ( f − / ) , ∆ ( Q , f ) = O ( f − ) , i.e., (D.23) holds for v = 2 . For v ≥ , we next use the mathematical induction, where we assumefor integers ≤ w ≤ v − , ∆ w ( Q , f ) = O { ( w + 1)! C w +1 f − ( w +1) / } , (D.25)and prove (D.23). By (D.24), ∆ v − ( Q , f ) = ∆ v − ( A Q , f ) . Then by Lemma D.2.1, ∆ v − ( A Q , f ) = v − (cid:88) w =0 (cid:18) v − w (cid:19) ∆ w ( A , f )∆ v − − w ( Q , f + 6 w ) . (D.26)We next prove (D.26) by (D.25) and the following Lemma D.2.6.L EMMA

D.2.6.

When x = χ f ( α ) , A ( f ) = 3 √ z α f − / { O ( f − / ) } . Moreover, there exists aconstant C such that uniformly for any integer w ≥ , ∆ w ( A , f ) = O (cid:26) ( w + 2)! C w w (cid:89) t =1 ( f + 2 t ) − (cid:27) . Proof.

Please see Section D.2.8 on Page 68. (cid:3)

Then by (D.25) and Lemma D.2.6,(D.26) = O ( f − / ) × O (cid:8) ( v − C v − f − ( v − / (cid:9) + v − (cid:88) w =1 (cid:18) v − w (cid:19) O (cid:26) ( w + 2)! C w w (cid:89) t =1 ( f + 2 t ) − ( v − − w )! C v − − w f − ( v − − w ) / (cid:27) = O (cid:8) ( v − C v − f − v/ (cid:9) + v − (cid:88) w =1 O (cid:8) ( v − C v f − v/ (cid:9) ( w + 2)( w + 1) f ( w +1) / (cid:81) wt =1 ( f + 2 t ) . Note that when w ≤ , ( w + 2)( w + 1) f ( w +1) / (cid:81) wt =1 ( f + 2 t ) − = O { f (1 − w ) / } ; when w ≥ , ( w + 2)( w + 1) f ( w +1) / (cid:81) wt =1 ( f + 2 t ) ≤ ( w + 2)( w + 1) w ( w − f (5 − w ) / = O (1) as f → ∞ uniformly over w ≥ . It follows that (D.26) = O ( v ! C v f − v ) and thus (D.23) is proved. (Part IV) Proof for h = 4 . By (B.24), ∆ ( F x , f ) = Q ( f ) + Q ( f ) + Q ( f ) + Q ( f ) , (D.27)2 H E ET AL . where we deﬁne Q ( f ) = − f + 4) (cid:16) x (cid:17) f +3 e − x/ . Then by (D.27) and Lemma D.2.1, ∆ v ( F x , f ) = ∆ v − ( Q , f ) + ∆ v − ( Q , f ) + ∆ v − ( Q , f ) + ∆ v − ( Q , f ) . Since Q ( f ) = Q ( f − , Q ( f ) = Q ( f − , and Q ( f ) = Q ( f − , it sufﬁces to prove ∆ v − ( Q , f ) = O ( v ! C v f − v/ ) . (D.28)We next prove (D.28) by the mathematical induction. Note that (D.28) holds for v = 1 since ∆ ( Q , f ) = Q ( f ) = O ( f − / ) by the proof of (B.25) in Section D.2.3. In addition, for v = 2 , wehave ∆ ( Q , f ) = Q ( f + 8) − Q ( f ) = A ( f ) Q ( f ) , (D.29)where A ( f ) = (cid:89) k =1 A ,k ( f ) − , A ,k ( f ) = xf + 6 + 2 k . Note that A ( f ) = O ( f − / ) as x = f + √ f { z α + O ( f − / ) } . Moreover, as Q ( f ) = O ( f − / ) , ∆ ( Q , f ) = O ( f − ) , i.e., (D.28) holds for v = 2 . For v ≥ , we next use the mathematical induction,where we assume for integers ≤ w ≤ v − , ∆ w ( Q , f ) = O { ( w + 1)! C w +1 f − ( w +1) / } , (D.30)and prove (D.28). By (D.29), ∆ v − ( Q , f ) = ∆ v − ( A Q , f ) . Then by Lemma D.2.1, ∆ v − ( A Q , f ) = v − (cid:88) w =0 (cid:18) v − w (cid:19) ∆ w ( A , f )∆ v − − w ( Q , f + 8 w ) . (D.31)We next prove (D.31) by (D.30), (D.31) and the following Lemma D.2.7.L EMMA

D.2.7.

When x = χ f ( α ) , A ( f ) = 4 √ z α f − / { O ( f − / ) } . Moreover, there exists aconstant C such that as f → ∞ , ∆ w ( A , f ) = O (cid:26) ( w + 3)! C w w (cid:89) t =1 ( f + 2 t ) − (cid:27) holds uniformly for any integer w ≥ Proof.

Please see Section D.2.9 on Page 68. (cid:3)

Then by (D.30) and Lemma D.2.7,(D.31) = O ( f − / ) × O (cid:8) ( v − C v − f − ( v − / (cid:9) + v − (cid:88) w =1 (cid:18) v − w (cid:19) O (cid:26) ( w + 3)! C w w (cid:89) t =1 ( f + 2 t ) − ( v − − w )! C v − − w f − ( v − − w ) / (cid:27) = O (cid:8) ( v − C v − f − v/ (cid:9) + v − (cid:88) w =1 O (cid:8) ( v − C v f − v/ (cid:9) ( w + 3)( w + 2)( w + 1) f w +12 (cid:81) wt =1 ( f + 2 t ) . n the Phase Transition of Wilk’s Phenomenon Note that when w ≤ , ( w + 3)( w + 2)( w + 1) f ( w +1) / (cid:81) wt =1 ( f + 2 t ) − = O { f (1 − w ) / } ; when w ≥ , as f → ∞ , ( w + 3)( w + 2)( w + 1) f ( w +1) / (cid:81) wt =1 ( f + 2 t ) ≤ ( w + 3)( w + 2)( w + 1) w ( w − w − f (7 − w ) / = O (1) holds uniformly over w ≥ . It follows that (D.31) = O ( v ! C v f − v ) and thus (D.28) is proved.D.2.5 . Proof of Proposition B. (on Page 27) Similar to the proof of Proposition B.1 in SectionD.2.5, we prove Proposition B.2 using the notation in Section D.2.2 and Lemma B.2.2. We next discuss ( h , h ) = (1 , and ( h , h ) = (2 , in (Part I) and (Part II) below, respectively. (Part I) Proof for h = 1 and h = 2 . Based on the notation in Section D.2.2, it is equivalent to provethat there exists some constant C such that when x = χ f ( α ) , as f → ∞ , ∆ v ∆ v ( F x , f ) = O (cid:8) v ! v ! C v + v f − ( v + v ) / (cid:9) , (D.32)uniformly for integers v , v ≥ .When v = 0 or v = 0 , (D.32) holds by Proposition B.1. When v = v = 1 , by (D.9), we have ∆ ∆ ( F x , f ) = − f + 3) (cid:16) x (cid:17) f +2 e − x/ + 1Γ( f + 1) (cid:16) x (cid:17) f e − x/ = D , ( f )∆ ( F x , f ) , where D , ( f ) = x ( f + 4)( f + 2) − . (D.33)As D , ( f ) = O ( f − / ) and ∆ ( F x , f ) = O ( f − / ) , (D.32) holds for v = v = 1 . We next prove(D.32) by the mathematical induction. Particularly, we assume for integers s ≤ v and s ≤ v , ∆ s ∆ s ( F x , f ) = O (cid:8) s ! s ! C s + s f − ( s + s ) / (cid:9) , (D.34)and prove that (D.34) also holds for ( s , s ) = ( v + 1 , v ) and ( s , s ) = ( v , v + 1) , i.e., ∆ v ∆ v +12 ( F x , f ) and ∆ v +14 ∆ v ( F x , f ) , respectively. Step I.1. ∆ v ∆ v +12 ( F x , f ) . Recall that we deﬁne Q ( f ) = ∆ ( F x , f ) . It follows that (D.34) gives thatfor integers s ≤ v − and s ≤ v ∆ s ∆ s ( Q , f ) = O (cid:8) ( s + 1)! s ! C s + s +1 f − ( s + s +1) / (cid:9) . (D.35)It is then equivalent to prove that (D.35) holds for ( s , s ) = ( v , v ) , i.e., ∆ v ∆ v ( Q , f ) . By ∆ ( Q , f ) = A ( f ) Q ( f ) , (see the deﬁnitions in (D.10)), and Lemmas D.2.1 and D.2.2, ∆ v ∆ v ( Q , f )= v − (cid:88) w =0 (cid:18) v − w (cid:19) ∆ v (cid:110) ∆ w ( A , f )∆ v − − w ( Q , f + 2 w ) (cid:111) = v − (cid:88) w =0 (cid:18) v − w (cid:19) v (cid:88) w =0 (cid:18) v w (cid:19) ∆ w ∆ w ( A , f )∆ v − w ∆ v − − w ( Q , f + 2 w + 4 w ) . (D.36)To evaluate (D.36), we use the following Lemma D.2.8.L EMMA

D.2.8.

For two integers w and w satisfying w + w ≥ , there exists some constant C suchthat as f → ∞ , ∆ w ∆ w ( A , f ) = ( w + w )! O (cid:32) C w + w w + w (cid:89) k =1 f + 2 k (cid:33) H E ET AL . uniformly over w + w ≥ .Proof. Please see Section D.2.10 on Page 69. (cid:3)

By Lemma D.2.8 and the assumption (D.35), we have(D.36) = v − (cid:88) w =0 (cid:18) v − w (cid:19) v (cid:88) w =0 (cid:18) v w (cid:19) ( w + w )! O (cid:32) w + w (cid:89) k =1 f + 2 k (cid:33) C v + v +1 × ( v − w )!( v − w )! O (cid:8) ( f + 2 w + 4 w ) − ( v − w + v − w ) / (cid:9) = v − (cid:88) w =0 ( v − v − w ) v (cid:88) w =0 v ! C v + v +1 O { f − ( v + v +1) / }× ( w + w )! w ! w ! w + w (cid:89) k =1 f + 2 k × f ( w + w +1) / . We next use the following Lemma D.2.9.L

EMMA

D.2.9.

For integers w , w , and f , ( w + w )! w ! w ! w + w (cid:89) k =1 f + 2 k × f ( w + w +1) / = O (cid:8) − ( w + w − / (cid:9) . Proof.

Please see Section D.2.11 on Page 70. (cid:3)

It follows that by Lemma D.2.9,(D.36) = v − (cid:88) w =0 ( v − v − w ) v (cid:88) w =0 v ! 1( √ w + w − C v + v +1 O { f − ( v + v +1) / } = O { v ! v ! C v + v +1 f − ( v + v +1) / } , (D.37)which is O { ( v + 1)! v ! C v + v +1 f − ( v + v +1) / } as v < v + 1 . Therefore, we obtain ∆ v ∆ v ( Q , f ) = O { ( v + 1)! v ! C v + v +1 f − ( v + v +1) / } . Step I.2. ∆ v +14 ∆ v ( F x , f ) . By (D.15), ∆ v +14 ∆ v ( F x , f ) = ∆ v ∆ v +14 ( F x , f ) = ∆ v ∆ v ( Q , f ) + ∆ v ∆ v +12 ( F x , f ) . By (D.37), we have ∆ v ∆ v +12 ( F x , f ) = O { v ! v ! C v + v +1 f − ( v + v +1) / } . Therefore, it remains toprove ∆ v ∆ v ( Q , f ) = O { v !( v + 1)! C v + v +1 f − ( v + v +1) / } . By (D.17) and Lemma D.2.1, ∆ v ∆ v ( Q , f )= ∆ v (cid:40) v − (cid:88) w =0 (cid:18) v − w (cid:19) ∆ w ( A , f )∆ v − − w ( Q , f + 4 w ) (cid:41) = v − (cid:88) w =0 (cid:18) v − w (cid:19) v (cid:88) w =0 (cid:18) v w (cid:19) ∆ w ∆ w ( A , f )∆ v − w ∆ v − − w ( Q , f + 4 w + 2 w ) . (D.38)To evaluate (D.38) through the mathematical induction, by (D.15) and (D.34), we can assume that orintegers s ≤ v and s ≤ v − , ∆ s ∆ s ( Q , f ) = O (cid:8) s !( s + 1)! C s + s +1 f − ( s + s +1) / (cid:9) . (D.39)In addition, we use the following Lemma D.2.10. n the Phase Transition of Wilk’s Phenomenon L EMMA

D.2.10.

For two integers w and w satisfying w + w ≥ , ∆ w ∆ w ( A , f ) = ( w + w + 1)! O (cid:32) C w + w +1 w + w (cid:89) k =1 f + 2 k (cid:33) . Proof.

Please see Section D.2.12 on Page 71. (cid:3)

Combining (D.39) and Lemma D.2.10, we obtain ∆ v ∆ v ( Q , f ) = O { v ! v ! C v + v +1 f − ( v + v +1) / } similarly to (D.37) in Step I.1 . As v < v + 1 , we have ∆ v ∆ v ( Q , f ) = O { v !( v + 1)! C v + v +1 × f − ( v + v +1) / } . (Part II) Proof for h = 2 and h = 3 . In this part, we prove ∆ v ∆ v ( F x , f ) = O (cid:8) v ! v ! C v + v f − ( v + v ) / (cid:9) , (D.40)as f → ∞ and uniformly for integers v , v ≥ .When v = 0 or v = 0 , (D.40) holds by Proposition B.1. When v = v = 1 , note that ∆ ( F x , f ) = Q ( f ) + Q ( f ) by (D.15). Then we have ∆ ∆ ( F x , f ) = ∆ ( Q , f ) + ∆ ( Q , f ) . Particularly, ∆ ( Q , f ) = D , ( f ) Q ( f ) , D , ( f ) = x ( f + 6)( f + 4)( f + 2) − (D.41) ∆ ( Q , f ) = D , ( f ) Q ( f ) , D , ( f ) = x ( f + 8)( f + 6)( f + 4) − . By the proof of (B.25), Q ( f ) = O ( f − / ) and Q ( f ) = O ( f − / ) . In addition, for x = χ f ( α ) , by (B.6), D , ( f ) = O ( f − / ) and D , ( f ) = O ( f − / ) . Therefore, (D.40) holds for v = 1 and v = 1 . When v > or v > , by (D.15), ∆ v ∆ v ( F x , f ) = ∆ v ∆ v − ( Q , f ) + ∆ v ∆ v − ( Q , f ) . It sufﬁces to prove ∆ v ∆ v − ( Q , f ) = O (cid:8) v ! v ! C v + v f − ( v + v ) / (cid:9) , (D.42) ∆ v ∆ v − ( Q , f ) = O (cid:8) v ! v ! C v + v f − ( v + v ) / (cid:9) . (D.43)We next prove (D.42) and (D.43) by the mathematical induction, respectively.First, to prove (D.42), we apply the mathematical induction considering increasing v and v in thefollowing Step II.1 and

Step II.2 , respectively.

Step II.1.

We assume for ≤ s ≤ v − and ≤ s ≤ v , ∆ s ∆ s ( Q , f ) = O { ( s + 1)! s ! C s + s +1 f − ( s + s +1) / } , (D.44)and then prove (D.42). Note that ∆ ( Q , f ) = D , ( f ) Q ( f ) , where D , ( f ) is deﬁned in (D.33). Thenby the Leibniz rule in Lemma D.2.1, ∆ v ∆ v − ( Q , f )= ∆ v ∆ v − ( D , Q , f )= v (cid:88) k =0 v − (cid:88) k =0 (cid:18) v − k (cid:19)(cid:18) v k (cid:19) ∆ k ∆ k ( D , , f ) × ∆ v − k ∆ v − − k ( Q , f + 4 k + 6 k ) . (D.45)To evaluate (D.45), we use the following Lemma D.2.11.6 H E ET AL . L EMMA

D.2.11.

For integers k + k ≥ , there exists some constant C such that ∆ k ∆ k ( D , , f ) = ( k + k + 1)! O (cid:32) C k + k k + k (cid:89) t =1 f + 2 t (cid:33) , as f → ∞ and uniformly over k + k ≥ .Proof. Please see Section D.2.13 on Page 72. (cid:3)

Then applying similar analysis to that of (D.36) and (D.37) in

Part I above, we obtain (D.42) by theassumption (D.44) and Lemma D.2.11.

Step II.2.

We assume for ≤ s ≤ v − and ≤ s ≤ v − , (D.44) holds, and then prove (D.42).By (D.41) and the Leibniz rule in Lemma D.2.1, ∆ v ∆ v − ( Q , f ) (D.46) = ∆ v − ∆ v − ( D , Q , f )= v − (cid:88) k =0 v − (cid:88) k =0 (cid:18) v − k (cid:19)(cid:18) v − k (cid:19) ∆ k ∆ k ( D , , f ) × ∆ v − − k ∆ v − − k ( Q , f + 4 k + 6 k ) . Similarly to the analysis of (D.45), we use the following Lemma D.2.12 to evaluate (D.46).L

EMMA

D.2.12.

For integers k + k ≥ , there exists a constant C such that ∆ k ∆ k ( D , , f ) = ( k + k + 2)! O (cid:32) C k + k k + k (cid:89) t =1 f + 2 t (cid:33) , as f → ∞ and uniformly over k + k ≥ .Proof. Please see Section D.2.14 on Page 72. (cid:3)

Since we assume (D.44) holds for ≤ s ≤ v − and ≤ s ≤ v − , then by Lemma D.2.12,(D.46) = v − (cid:88) k =0 v − (cid:88) k =0 (cid:18) v − k (cid:19)(cid:18) v − k (cid:19) ( k + k + 2)!( v − − k )!( v − k )! × C v + v f − ( v + v ) / O (cid:32) f − ( k + k +1) / k + k (cid:89) t =1 f + 2 t (cid:33) = C v + v f − ( v + v ) / ( v − v − v (cid:88) k =0 v − (cid:88) k =0 ( v − k ) × ( k + k + 2)! k ! k ! O (cid:32) f − ( k + k +1) / k + k (cid:89) t =1 f + 2 t (cid:33) . We next use the following Lemma D.2.13 to evaluate (D.46).L

EMMA

D.2.13.

For integers k + k ≥ , as f → ∞ , ( k + k + 2)! k ! k ! O (cid:40) f − ( k + k +1) / k + k (cid:89) t =1 f + 2 t (cid:41) = O { − ( k + k − / } . Proof.

Please see Section D.2.15 on Page 72. (cid:3)

Then by Lemma D.2.13, we obtain ∆ v ∆ v − ( Q , f ) = O { v ! v ! C v + v f − ( v + v ) / } similarly to(D.37). In summary, combining Step II.1 and

Step II.2 , we ﬁnish the proof of (D.42). n the Phase Transition of Wilk’s Phenomenon Second, to prove (D.43), we can use the mathematical induction similarly to the proof of (D.42). Theanalysis would be very similar and the details are thus skipped.D.2.6 . Proof of Lemma D. . (on Page 59) When x = χ f ( α ) , by (B.6), we have x = f + √ f { z α + O ( f − / ) } , and then A ( f ) = √ z α f − / { O ( f − ) } . We next prove (D.13) by the mathematicalinduction. For w = 1 , we compute ∆ ( A , f ) = A ( f + 2) − A ( f ) = − x × × f + 2)( f + 4) . Therefore (D.13) holds when w = 1 . We next assume (D.13) holds, and prove the conclusion holds for ∆ w +12 ( A , f ) . Particularly, ∆ w +12 ( A , f ) = x × ( − w w w ! (cid:40) (cid:81) w +2 k =2 ( f + 2 k ) − (cid:81) w +1 k =1 ( f + 2 k ) (cid:41) = x × ( − w +1 w +1 ( w + 1)! 1 (cid:81) w +2 k =1 ( f + 2 k ) . In summary, Lemma D.2.4 is proved.D.2.7 . Proof of Lemma D. . (on Page 60) When x = χ f ( α ) , by (B.6), we have x = f + √ f { z α + O ( f − / ) } , and then A ( f ) = 2 √ z α f − / { O ( f − ) } . We next prove (D.20). Note that we can write A ( f ) = A , ( f ) A , ( f ) − , where we deﬁne A , ( f ) = xf + 4 and A , ( f ) = xf + 6 . By Lemmas D.2.1 and D.2.2, when w ≥ , ∆ w ( A , f ) = ∆ w ( A , A , , f ) = w (cid:88) k =0 (cid:18) wk (cid:19) ∆ k ( A , , f )∆ w − k ( A , , f + 4 k ) . (D.47)To prove (D.47) = O ( w ! C w f − w ) , we next evaluate ∆ k ( A , , f ) and ∆ w − k ( A , , f + 4 k ) .In particular, we prove that ∆ k ( A , , f ) = ( − k k k ! x × (cid:81) k +1 t =1 ( f + 4 t ) (D.48)by the mathematical induction. When k = 1 , ∆ ( A , , f ) = xf + 8 − xf + 4 = x × ( − f + 4)( f + 8) . Thus (D.48) holds for k = 1 . We next assume (D.48) holds and prove the conclusion for ∆ k +14 ( A , , f ) .Speciﬁcally, ∆ k +14 ( A , , f ) = ( − k k k ! x (cid:40) (cid:81) k +2 t =2 ( f + 4 t ) − (cid:81) k +1 t =1 ( f + 4 t ) (cid:41) = ( − k +1 k +1 ( k + 1)! x (cid:81) k +2 t =1 ( f + 4 t ) . In summary, (D.48) is proved. Moreover, as A , ( f ) = A , ( f + 2) , we have ∆ k ( A , , f ) = ∆ k ( A , , f + 2) = ( − k k k ! x (cid:81) k +1 t =1 ( f + 2 + 4 t ) . H E ET AL . It follows that ∆ w − k ( A , , f + 4 k ) = ( − w − k w − k ( w − k )! x { (cid:81) w +1 t = k +1 ( f + 2 + 4 t ) } − . Then by(D.47), there exists a constant C such that (cid:12)(cid:12) ∆ w ( A , A , , f ) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) w (cid:88) k =0 (cid:18) wk (cid:19) ( − w k !( w − k )! x (cid:81) k +1 t =1 ( f + 4 t ) (cid:81) w +1 t = k +1 ( f + 2 + 4 t ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ w ! C w w (cid:88) k =0 x (cid:81) w +2 t =1 ( f + 2 t ) . As x = χ f ( α ) = O ( f ) , we obtain that (D.20) holds as f → ∞ and uniformly for any integer w ≥ .D.2.8 . Proof of Lemma D. . (on Page 61) When x = χ f ( α ) , by (B.6), we have x = f + √ f { z α + O ( f − / ) } , and then A ( f ) = 3 √ z α f − / { O ( f − / ) } . We next consider ∆ w ( A , f ) for w ≥ . As A ( f ) = (cid:81) l =1 A ,l ( f ) − , ∆ w ( A , f ) = w (cid:88) k =0 k (cid:88) k =0 (cid:18) k k (cid:19)(cid:18) wk (cid:19) ∆ k ( A , , f )∆ k − k ( A , , f + 6 k )∆ w − k ( A , , f + 6 k ) . Similarly to the proofs of Lemma D.2.4 in Section D.2.6, for A ,l ( f ) , l ∈ { , , } , we can obtain that forany integer w ≥ and l ∈ { , , } ∆ w ( A ,l , f ) = ( − w w ! x × (cid:81) wt =0 ( f + 4 + 2 l + 6 t ) . It follows that ∆ w ( A , f ) = w (cid:88) k =0 k (cid:88) k =0 (cid:18) k k (cid:19)(cid:18) wk (cid:19) ( − w k !( k − k )!( w − k )! × x (cid:40) k +1 (cid:89) t =1 ( f + 6 t ) k +1 (cid:89) t = k +1 ( f + 6 t + 2) w +1 (cid:89) t = k +1 ( f + 6 t + 4) (cid:41) − . As (cid:0) k k (cid:1)(cid:0) wk (cid:1) k !( k − k )!( w − k )! = w ! , (cid:80) wk =0 (cid:80) k k =0 ≤ ( w + 1) , and x = χ f ( α ) = O ( f ) , thereexists a constant C such that as f → ∞ and uniformly over w ≥ , ∆ w ( A , f ) = O (cid:26) ( w + 2)! C w w (cid:89) t =1 ( f + 2 t ) − (cid:27) . D.2.9 . Proof of Lemma D. . (on Page 62) When x = χ f ( α ) , by (B.6), we have x = f + √ f { z α + O ( f − / ) } , and then A ( f ) = 4 √ z α f − / { O ( f − / ) } . We next prove the conclusion for w ≥ . As A ( f ) = (cid:81) l =1 A ,l ( f ) − , ∆ w ( A , f ) = w (cid:88) k =0 k (cid:88) k =0 k (cid:88) k =0 (cid:18) wk (cid:19)(cid:18) k k (cid:19)(cid:18) k k (cid:19) ∆ k ( A , , f ) × ∆ k − k ( A , , f + 8 k ) × ∆ k − k ( A , , f + 8 k ) × ∆ w − k ( A , , f + 8 k ) . Similarly to the proof of Lemma D.2.4 in Section D.2.6, for A ,l ( f ) , l ∈ { , , , } , we can obtain thatfor any integer w ≥ , ∆ w ( A ,l , f ) = ( − w w ! x × (cid:81) wt =0 ( f + 6 + 2 l + 8 t ) . n the Phase Transition of Wilk’s Phenomenon It follows that ∆ w ( A , f ) = w (cid:88) k =0 k (cid:88) k =0 k (cid:88) k =0 (cid:18) wk (cid:19)(cid:18) k k (cid:19)(cid:18) k k (cid:19) ( − w k !( k − k )!( k − k )!( w − k )! × x (cid:26) k +1 (cid:89) t =1 ( f + 8 t ) k +1 (cid:89) t = k +1 ( f + 8 t + 2) k +1 (cid:89) t = k +1 ( f + 8 t + 4) w +1 (cid:89) k +1 ( f + 8 t + 6) (cid:27) − . As (cid:0) wk (cid:1)(cid:0) k k (cid:1)(cid:0) k k (cid:1) k !( k − k )!( k − k )!( w − k )! = w ! , (cid:80) wk =0 (cid:80) k k =0 (cid:80) k k =0 ≤ ( w + 1) , and x = O ( f ) , there exists a constant C such that ∆ w ( A , f ) = O (cid:26) ( w + 3)! C w w (cid:89) t =1 ( f + 2 t ) − (cid:27) . D.2.10 . Proof of Lemma D. . (on Page 63) By the proof of Lemma D.2.4, we have ∆ w ( A , f ) = ( − w w w ! x w +1 (cid:89) s =1 A ,s ( f ) , where A ,s ( f ) = 1 / ( f + 2 s ) . It follows that ∆ w (cid:8) ∆ w ( A , f ) (cid:9) = x ( − w w !∆ w (cid:40) w +1 (cid:89) s =1 A ,s ( f ) (cid:41) . (D.49)To prove Lemma D.2.8, by x = χ f ( α ) = O ( f ) and (D.49), it sufﬁces to prove ∆ w (cid:40) w +1 (cid:89) s =1 A ,s ( f ) (cid:41) = ( w + w )! w ! O (cid:40) C w + w w + w +1 (cid:89) s =1 ( f + 2 s ) − (cid:41) . (D.50)We next prove (D.50) by the mathematical induction. Consider w = 0 ﬁrst. Similarly to the proof ofLemma D.2.4, for each integer ≤ s ≤ w + 1 , we have ∆ w ( A ,s , f ) = w !( − w w (cid:89) k =0 ( f + 2 s + 4 k ) . (D.51)Thus (D.50) holds for w = 0 . We then assume for integers ≤ l ≤ w , ∆ w (cid:40) l (cid:89) s =1 A ,s ( f ) (cid:41) = ( w + l − l − O (cid:40) w + l (cid:89) k =1 ( f + 2 k ) − (cid:41) , (D.52)and prove (D.50). By the Leibniz rule in Lemma D.2.1, ∆ w (cid:40) w +1 (cid:89) s =1 A ,s ( f ) (cid:41) = w (cid:88) k =0 (cid:18) w k (cid:19) ∆ k (cid:40) w (cid:89) s =1 A ,s ( f ) (cid:41) ∆ w − k ( A ,w +1 , f + 4 k ) . (D.53)0 H E ET AL . Then by (D.51) and (D.52), we obtain(D.53) = w (cid:88) k =0 (cid:18) w k (cid:19) ( k + w − w − O (cid:32) C w + k w + k (cid:89) s =1 f + 2 s (cid:33) × O (cid:40) ( w − k )! C w − k w − k (cid:89) s =0 f + 4 k + 2( w + 1) + 4 s (cid:41) = C w + w w (cid:88) k =0 w ! (cid:18) k + w − k (cid:19) O (cid:32) w + w +1 (cid:89) s =1 f + 2 s (cid:33) . By the hockey-stick identity, (cid:80) w k =0 (cid:0) k + w − k (cid:1) = (cid:0) w + w w (cid:1) . Therefore, (D.50) is proved and then (D.49)follows.D.2.11 . Proof of Lemma D. . (on Page 64) We next prove Lemma D.2.9 by discussing the caseswhen w + w is odd and even, respectively.(1) When w + w is odd, ( w + w + 1) / is an integer, and then ( w + w )! w + w (cid:89) k =1 f + 2 k × f ( w + w +1) / ≤ ( w + w )! w + w (cid:89) k =( w + w +1) / k ≤ − ( w + w − / w + w +1) / (cid:89) k =1 k. To prove Lemma D.2.9, it now sufﬁces to prove that there exists a constant C such that w ! w ! ( w + w +1) / (cid:89) k =1 k ≤ C. (D.54)To prove (D.54), we use the following Lemma D.2.14.L EMMA

D.2.14 (F

ACTORIAL BOUND ). For any integer w ≥ , (cid:16) we (cid:17) w e ≤ w ! ≤ (cid:18) w + 1 e (cid:19) w +1 e. Proof.

This is a known bound on factorial in literature, and is obtained by (cid:82) w ln x d x ≤ (cid:80) wx =1 ln x ≤ (cid:82) w ln( x + 1)d x . (cid:3) Assume without loss of generality that w ≥ w , and then by Lemma D.2.14, w ! w ! ( w + w +1) / (cid:89) k =1 k ≤ e (cid:18) ew (cid:19) w (cid:18) ew (cid:19) w (cid:18) w + w + 32 e (cid:19) ( w + w +3) / = 1 e (cid:18) e w w w + w + 32 e (cid:19) w (cid:18) e w w + w + 32 e (cid:19) ( w − w ) / (cid:18) w + w + 32 e (cid:19) / . (D.55)As w + w + 3 ≤ w , there exists a constant C such that(D.55) ≤ C (cid:18) ew (cid:19) w (cid:18) ew (cid:19) ( w − w ) / ( w + w + 3) / . n the Phase Transition of Wilk’s Phenomenon When w − w ≥ ,(D.55) ≤ C (cid:18) ew (cid:19) w (cid:18) ew (cid:19) ( w − w − / (cid:26) e ( w + w + 3) w (cid:27) / , which is bounded. When ≤ w − w ≤ ,(D.55) ≤ C (cid:18) ew (cid:19) w (2 w + 5) / , which is also bounded. In summary, (D.55) is bounded.(2) When w + w is even, similarly, we have ( w + w )! w + w (cid:89) k =1 f + 2 k × f ( w + w +1) / ≤ − ( w + w ) / − w + w ) / (cid:89) k =1 k. To prove Lemma D.2.9, it now sufﬁces to prove that there exists a constant C such that w ! w ! ( w + w ) / (cid:89) k =1 k ≤ C. Similar analysis can be applied and the conclusions follow.D.2.12 . Proof of Lemma D. . (on Page 65) When w = 0 , we know Lemma D.2.10 holdsby Lemma D.2.5. Recall that we write A ( f ) = A , ( f ) A , ( f ) − in Section D.2.7. Thus when w + w ≥ , ∆ w ∆ w ( A , f ) = ∆ w ∆ w ( A , A , , f ) . By the Leibniz rule in Lemma D.2.1, ∆ w ∆ w ( A , A , , f )= w (cid:88) k =0 w (cid:88) k =0 (cid:18) w k (cid:19)(cid:18) w k (cid:19) ∆ k ∆ k ( A , , f )∆ w − k ∆ w − k ( A , , f + 2 k + 4 k ) . (D.56)Following the proof of Lemma D.2.8, we have when k + k ≥ , ∆ k ∆ k ( A , , f ) = ( k + k )! O (cid:32) C k + k k + k (cid:89) s =1 f + 2 s (cid:33) , and when w + w − k − k ≥ , ∆ w − k ∆ w − k ( A , , f + 2 k + 4 k ) = ( w + w − k − k )! O (cid:32) C w + w − k − k w + w (cid:89) s = k + k +1 f + 2 s (cid:33) . Therefore,(D.56) = w ! w ! w (cid:88) k =0 w (cid:88) k =0 (cid:18) k + k k (cid:19)(cid:18) w + w − k − k w − k (cid:19) O (cid:32) C w + w w + w (cid:89) s =1 f + 2 s (cid:33) . By the ChuVandermonde identity, w (cid:88) k =0 w (cid:88) k =0 (cid:18) k + k k (cid:19)(cid:18) w + w − k − k w − k (cid:19) = w + w (cid:88) m =0 w (cid:88) s =0 (cid:18) ms (cid:19)(cid:18) w + w − mw − s (cid:19) = ( w + w + 1) (cid:18) w + w w (cid:19) . H E ET AL . Then ∆ w ∆ w ( A , f ) = ( w + w + 1)! O { C w + w (cid:81) w + w s =1 ( f + 2 s ) − } . D.2.13 . Proof of Lemma D. . (on Page 65) By the deﬁnition of D , ( f ) , when k + k ≥ , ∆ k ∆ k ( D , , f ) = x ∆ k ∆ k ( A , A , , f ) , where recall that we deﬁne A ,t = 1 / ( f + 2 t ) for integers t . By the Leibniz rule in Lemma D.2.1, ∆ k ∆ k ( A , A , , f ) = k (cid:88) s =0 k (cid:88) s =0 (cid:18) k s (cid:19)(cid:18) k s (cid:19) ∆ s ∆ s ( A , , f )∆ k − s ∆ k − s ( A , , f + 4 s + 6 k ) Following the proof of Lemma D.2.8 in Section D.2.10, we similarly have ∆ s ∆ s ( A , , f ) = ( s + s )! O (cid:32) C s + s s + s +1 (cid:89) k =1 f + 2 k (cid:33) . Then following the proof of Lemma D.2.10 in Section D.2.12, we obtain Lemma D.2.11. The analysiswill be very similar and thus the details are skipped.D.2.14 . Proof of Lemma D. . (on Page 66) Note that we can write D , ( f ) = x (cid:81) k =1 A ,k ( f ) − . By the Leibniz rule in Lemma D.2.1, ∆ k ∆ k ( D , , f ) = k (cid:88) s =0 s (cid:88) s =0 (cid:18) k s (cid:19)(cid:18) s s (cid:19) k (cid:88) t =0 t (cid:88) t =0 (cid:18) k t (cid:19)(cid:18) t t (cid:19) x × ∆ t ∆ s ( A , , f ) × ∆ t − t ∆ s − s ( A , , f + 6 s + 4 t )∆ k − t ∆ k − s ( A , , f + 6 s + 4 t ) . Following the proof of Lemma D.2.8 in Section D.2.10, we similarly have that for integers t + s ≥ , and l ∈ { , , } , ∆ t ∆ s ( A ,l ) = ( t + s )! O (cid:32) C t + s t + s +1 (cid:89) m =1 f + 2 m (cid:33) . By x = χ f ( α ) = O ( f ) , ∆ k ∆ k ( D , , f ) = k (cid:88) s =0 s (cid:88) s =0 k (cid:88) t =0 t (cid:88) t =0 (cid:18) k s (cid:19)(cid:18) s s (cid:19)(cid:18) k t (cid:19)(cid:18) t t (cid:19) ( t + s )!( t + s − t − s )! × ( k + k − t − s ) × O (cid:32) k + k (cid:89) m =1 f + 2 m (cid:33) . Similarly to the proof of Lemma D.2.10 in Section D.2.12, by the ChuVandermonde identity, we obtain ∆ k ∆ k ( D , , f ) = k (cid:88) s =0 k (cid:88) t =0 k ! k ! (cid:18) k + k − s − t k − s (cid:19)(cid:18) s + t s (cid:19) ( s + t + 1)= ( k + k + 2)! × O (cid:32) k + k (cid:89) m =1 f + 2 m (cid:33) , where we use s + s + 1 ≤ k + k + 1 in the second equation.D.2.15 . Proof of Lemma D. . (on Page 66) We prove Lemma D.2.13 similarly to the proof ofLemma D.2.9 in Section D.2.11 by discussing k + k is odd and even, respectively. n the Phase Transition of Wilk’s Phenomenon (1) When k + k is odd, similarly to the analysis of (D.55), we assume without loss of generality that k ≥ k , and obtain ( k + k + 2)! k ! k ! O (cid:32) f − ( k + k +1) / k + k (cid:89) t =1 f + 2 t (cid:33) ≤ − ( k + k − / k ! k ! ( k + k + 2)( k + k + 1) ( k + k +1) / (cid:89) t =1 t. (D.57)Note that ( k + k + 2)( k + k + 1) k ! k ! ( k + k +1) / (cid:89) t =1 t ≤ C (cid:18) e k k k + k + 32 e (cid:19) k (cid:18) e k k + k + 32 e (cid:19) ( k − k ) / ( k + k + 3) / ≤ C (cid:18) ek (cid:19) k (cid:18) ek (cid:19) ( k − k − / (cid:26) e ( k + k + 3) k (cid:27) / . (D.58)When k − k ≥ , we can see that (D.58) is bounded. When k − k ≤ , we have(D.58) ≤ C (cid:18) k k (cid:19) (5 − k + k ) / (cid:18) ek (cid:19) ( k + k − / (cid:18) k + k + 3 k (cid:19) / , which suggests that (D.58) is bounded. In summary, we know (D.58) is bounded, and therefore (D.57) = O { − ( k + k − / } .(2) When k + k is even, similar analysis can be applied, and then Lemma D.2.13 is proved.D.2.16 . Proof of Lemma C. . (on Page 41) We prove Lemma C.3.1 based on (C.31). In each testingproblem, we have | τ ,k + υ ,k | / | ηξ ,k | = o (1) ; see Sections C.3.1–C.3.6. Then under the conditions ofLemma C.3.1, we can apply Lemma D.1.2 and obtain for ≤ k ≤ K , log Γ (cid:8) ηξ ,k (1 − it ) + τ ,k + υ ,k (cid:9) = (cid:26) ηξ ,k (1 − it ) + τ ,k + υ ,k − (cid:27) log (cid:8) ηξ ,k (1 − it ) (cid:9) − ηξ ,k (1 − it ) + log √ π + L − (cid:88) l =1 ( − l +1 B l +1 ( τ ,k + υ ,k ) l ( l + 1) (cid:110) ηξ ,k (1 − it ) (cid:111) − l + O (cid:16) | τ ,k + υ ,k | L +1 / | ηξ ,k | L (cid:17) . Applying similar expansion to log Γ( ηξ ,k + τ ,k + υ ,k ) , we obtain log Γ (cid:8) ηξ ,k (1 − it ) + τ ,k + υ ,k (cid:9) − log Γ (cid:0) ηξ ,k + τ ,k + υ ,k (cid:1) = (cid:18) ηξ ,k + τ ,k + υ ,k − (cid:19) log(1 − it ) − itηξ ,k log (cid:8) ηξ ,k (1 − it ) (cid:9) + 2 itηξ ,k + L − (cid:88) l =1 ( − l +1 B l +1 ( τ ,k + υ ,k ) l ( l + 1)( ηξ ,k ) l (cid:110) (1 − it ) − l − (cid:111) + O (cid:16) | τ ,k + υ ,k | L +1 / | ηξ ,k | L (cid:17) . H E ET AL . Similarly, for ≤ j ≤ K , we have log Γ (cid:8) ηξ ,j (1 − it ) + τ ,j + υ ,j (cid:9) − log Γ (cid:0) ηξ ,j + τ ,j + υ ,j (cid:1) = (cid:18) ηξ ,j + τ ,j + υ ,j − (cid:19) log(1 − it ) − itηξ ,j log (cid:8) ηξ ,j (1 − it ) (cid:9) + 2 itηξ ,j + L − (cid:88) l =1 ( − l +1 B l +1 ( τ ,j + υ ,j ) l ( l + 1)( ηξ ,j ) l (cid:110) (1 − it ) − l − (cid:111) + O (cid:16) | τ ,j + υ ,j | L +1 / | ηξ ,j | L (cid:17) . Then by the form of ϕ ( t ) in (C.31), we calculate(C.31) = 2 itη  K (cid:88) k =1 ξ ,k log ξ ,k − K (cid:88) j =1 ξ ,j log ξ ,j  + (cid:26) K (cid:88) k =1 ( ξ ,k + τ ,k + υ ,k − / − K (cid:88) j =1 ( ξ ,j + τ ,j + υ ,j − / (cid:27) log(1 − it ) − itη  K (cid:88) k =1 ξ ,k log ξ ,k − K (cid:88) j =1 ξ ,j log ξ ,j  − itη (log η −  K (cid:88) k =1 ξ ,k − K (cid:88) j =1 ξ ,j  + L − (cid:88) l =1 ς l (cid:110) (1 − it ) − l − (cid:111) + O (cid:32) K (cid:88) k =1 | τ ,k + υ ,k | L +1 | ηξ ,k | L + K (cid:88) j =1 | τ ,j + υ ,j | L +1 | ηξ ,j | L (cid:33) . By the facts that τ ,k = ηξ ,k , τ ,j = ηξ ,j , and (cid:80) K k =1 ξ ,k = (cid:80) K j =1 ξ ,k , Lemma C.3.1 is proved.D.3 . Lemmas for Theorems . , A. , & A. . Proof of Lemma B. . (on Page 29) By (B.30) on Page 28, log ψ ( s ) = − pnti en − pn (1 − ti )2 log(1 − ti ) + log Γ p { ( n − / − nti/ } Γ p { ( n − / } + µ n ti, where t = s/ ( nσ n ) . We next examine log ψ ( s ) by the following Lemma D.3.1.L EMMA

D.3.1.

Let { p = p n ; n ≥ } , { m = m n ; n ≥ } , { t n ; n ≥ } , and { s n ; n ≥ } satisfy that (i) p n → ∞ and p n = o ( n ) ; (ii) there exists (cid:15) ∈ (0 , such that (cid:15) ≤ m n /n ≤ (cid:15) − ; (iii) t = t n = O ( ns/p ) ;(iv) s = s n = o (min { ( n/p ) / , f / } ) . Then as n → ∞ , log Γ p (cid:0) m − + ti (cid:1) Γ p (cid:0) m − (cid:1) = β m, ti − β m, t + β m, ( ti ) + O (cid:18) p tm (cid:19) + (cid:18) p + pm (cid:19) O (cid:18) p t m (cid:19) + O (cid:16) p t m (cid:17) , where β m, = − (cid:26) p + (cid:18) m − p − (cid:19) log (cid:18) − pm − (cid:19)(cid:27) ; β m, = − (cid:26) pm − (cid:18) − pm − (cid:19)(cid:27) ; β m, ( ti ) = p (cid:26)(cid:18) m −

12 + ti (cid:19) log (cid:18) m −

12 + ti (cid:19) − m −

12 log m − (cid:27) . Proof.

Please see Section D.3.2 on Page 75. (cid:3) n the Phase Transition of Wilk’s Phenomenon By (B.7) and f = Θ( p ) , we know t = s/ ( nσ n ) = O ( s/p ) . Thus we can apply Lemma D.3.1 and expand log Γ p { ( n − / − nti/ } Γ p { ( n − / } = − nβ n, ti − β n, n t β n, (cid:18) − nti (cid:19) + O (cid:18) p tn (cid:19) + (cid:18) p + pn (cid:19) O (cid:0) p t (cid:1) + O (cid:0) p t (cid:1) . We next use the following Lemma D.3.2 to evaluate β n, ( − nti/ .L EMMA

D.3.2.

When p = p n → ∞ , p = o ( n ) , and t = t n = O ( s/p ) with s = s n = o (min { ( n/p ) / , f / } ) , β n, (cid:18) − nti (cid:19) = − pnti n pn (1 − ti )2 log(1 − ti ) + pti O (cid:18) pt + ptn (cid:19) . Proof.

Please see Section D.3.3 on Page 76. (cid:3)

It follows that log ψ ( s ) = − { p ( n −

1) + nβ n, } ti − β n, n t µ n ti + O (cid:18) p tn (cid:19) + (cid:18) p + pn (cid:19) O (cid:0) p t (cid:1) + O (cid:0) p t (cid:1) . Since σ n = β n, / , µ n = { p ( n −

1) + nβ n, } / , and t = s/ ( nσ n ) , log ψ ( s ) = − s O (cid:18) psn (cid:19) + (cid:18) p + pn (cid:19) O ( s ) + O (cid:18) s p (cid:19) , where we use t = O ( s/p ) . As log ψ ( s ) = − s / , (B.31) is proved.D.3.2 . Proof of Lemma D. . (on Page 74) By the property of the multivariate gamma function; see,e.g., Theorem 2.1.12 in Muirhead (2009), log Γ p (cid:0) m − + ti (cid:1) Γ p (cid:0) m − (cid:1) = p (cid:88) j =1 log Γ (cid:0) m − j + ti (cid:1) Γ (cid:0) m − j (cid:1) . (D.59)Then by Lemma D.1.3 on Page 53, log Γ (cid:0) m − j + ti (cid:1) Γ (cid:0) m − j (cid:1) = p (cid:88) j =1 (cid:34) (cid:18) m − j ti (cid:19) log (cid:18) m − j ti (cid:19) − (cid:18) m − j (cid:19) log (cid:18) m − j (cid:19) (D.60) − ti − tim − j + O (cid:26) t + t ( m − j ) (cid:27)(cid:35) , as m → ∞ uniformly for all ≤ j ≤ p. Note that t/ ( m − j ) = t/m + ( t/m ) × { j/ ( m − j ) } , and then p (cid:88) j =1 tim − j = ptim + O (cid:18) p m (cid:19) ti. (D.61)By (D.60) and (D.61), we obtain as m → ∞ ,(D.59) = p (cid:88) j =1 (cid:34) (cid:18) m − j ti (cid:19) log (cid:18) m − j ti (cid:19) − (cid:18) m − j (cid:19) log (cid:18) m − j (cid:19)(cid:35) (D.62) − ( m + 1) ptim + O (cid:18) p m t + pm t (cid:19) . H E ET AL . For ≤ j ≤ p , deﬁne g j ( z ) = (cid:18) m − j z (cid:19) log (cid:18) m − j z (cid:19) − (cid:18) m −

12 + z (cid:19) log (cid:18) m −

12 + z (cid:19) , where the real part of z > − ( m − p ) / . It follows that the “ (cid:80) pj =1 ” term in the ﬁrst row of (D.62) is equalto p (cid:26)(cid:18) m −

12 + ti (cid:19) log (cid:18) m −

12 + ti (cid:19) − m −

12 log m − (cid:27) + p (cid:88) j =1 { g j ( ti ) − g j (0) } . (D.63)To evaluate (D.63), we use the following Lemma D.3.3.L EMMA

D.3.3.

Let p = p m such that ≤ p < m , p → ∞ and p/m → as m → ∞ . When t = t m = O ( ms/p ) with s = s m = o (min { ( m/p ) / , p / } ) , we have that, as m → ∞ , p (cid:88) j =1 { g j ( ti ) − g j (0) } = ν ,m ti − ν ,m t + O (cid:18) p tm (cid:19) + (cid:18) p + pm (cid:19) O (cid:18) p t m (cid:19) + O (cid:16) p t m (cid:17) , where ν ,m = (cid:18) p − m + 32 (cid:19) log (cid:18) − pm − (cid:19) − m − m p, (D.64) ν ,m = − (cid:26) pm − (cid:18) − pm − (cid:19)(cid:27) . Proof.

Please see Section D.3.4 on Page 77. (cid:3)

Then by Lemma D.3.3,(D.63) = p (cid:26)(cid:18) m −

12 + ti (cid:19) log (cid:18) m −

12 + ti (cid:19) − m −

12 log m − (cid:27) + ν ,m ti − ν ,m t + O (cid:18) p tm (cid:19) + (cid:18) p + pm (cid:19) O (cid:18) p t m (cid:19) + O (cid:16) p t m (cid:17) . In summary, Lemma D.3.1 can be proved by noticing β m, = ν ,m − ( m + 1) pm , β m, = ν ,m / β m, ( ti ) = p (cid:26)(cid:18) m −

12 + ti (cid:19) log (cid:18) m −

12 + ti (cid:19) − m −

12 log m − (cid:27) . D.3.3 . Proof of Lemma D. . (on Page 75) By Taylor’s series, p − β n, ( − nti/

2) = − nti n − nti (cid:18) − ti − n (cid:19) + n −

12 log (cid:18) − ti − tin − (cid:19) = − nti n − nti − ti ) + nti n (1 − ti ) + O (cid:18) ntn (cid:19) + n −

12 log(1 − ti ) − n − ti ( n − − ti ) + n − O (cid:18) t n (cid:19) = − nti n n (1 − ti ) −

12 log(1 − ti ) + O (cid:18) t + t n (cid:19) . n the Phase Transition of Wilk’s Phenomenon It follows that β n, ( − nti/

2) = − pnti n pn (1 − ti )2 log(1 − ti ) + pti O (cid:18) pt + ptn (cid:19) . D.3.4 . Proof of Lemma D. . (on Page 76) The ﬁrst-order derivatives of g j ( z ) is g (1) j ( z ) = log (cid:18) m − j z (cid:19) − log (cid:18) m −

12 + z (cid:19) , and for l ≥ , the l -th order derivatives of g j ( z ) is g ( l ) j ( z ) = ( − l − ( l − (cid:40)(cid:18) m − j z (cid:19) − ( l − − (cid:18) m −

12 + z (cid:19) − ( l − (cid:41) = ( − l − ( l − (cid:18) m −

12 + z (cid:19) − ( l − l − (cid:88) v =1 (cid:18) l − v (cid:19) (cid:18) j − m − j + 2 z (cid:19) v . By Taylor’s expansion, g j ( ti ) − g j (0) = (cid:80) ∞ l =1 g ( l ) j (0) z l /l ! . In particular, g (1) j (0) = log( m − j ) − log( m − , g (2) j (0) = 2 m − j − m − . When z = ti , t = t m = O ( ms/p ) , and l ≥ , as j − / ( m − j + 2 z ) = O ( p/m ) = o (1) , g ( l ) j (0) z l /l ! = O (cid:18) m l − pm t l (cid:19) = O (cid:18) pm l (cid:19) t l . As t/m = O ( s/p ) = o (1) , p (cid:88) j =1 { g j ( ti ) − g j (0) } = p (cid:88) j =1 g (1) j (0) ti − p (cid:88) j =1 g (2) j (0) t + O (cid:16) p t m (cid:17) . By Lemma A.2 in Jiang and Qi (2015), p (cid:88) j =1 g (1) j (0) = ν ,m + O ( ν ,m ) , p (cid:88) j =1 g (2) j (0) = ν ,m (cid:26) O (cid:18) p + pm (cid:19)(cid:27) , where ν ,m and ν ,m are deﬁned in (D.64). In summary, p (cid:88) j =1 { g j ( ti ) − g j (0) } = ν ,m ti − ν ,m t + O ( ν ,m ) t + ν ,m O (cid:18) p + pn (cid:19) t + O (cid:16) p t m (cid:17) . Then Lemma D.3.3 follows by ν ,m = O ( p /m ) . D.3.5 . Proof of Lemma C. . (on Page 51) By Taylor’s series, we have (C.44). In addition, for (C.45),note that we can write p − (cid:37) l ( t ) = l −

12 log (cid:18) ltl − (cid:19) + lt (cid:18) l −

12 + lt (cid:19) . H E ET AL . By Taylor’s series log x = log a + (cid:80) L − l =1 ( − l − l − ( x/a − l + O { ( x/a − L } , we obtain (cid:37) l ( t ) p = l (cid:18) t + tl − (cid:19) −

12 log (cid:18) ltl − (cid:19) + lt (cid:26) l (1 + t )2 − (cid:27) = l t ) + lt l − t ) − lt l −

1) + lt l (1 + t )2 − t t ) + O (cid:18) tl + t (cid:19) = l (1 + t )2 log(1 + t ) + lt l − t O (cid:18) tl + t (cid:19) . Then by n = (cid:80) kj =1 n j , we have − (cid:37) n ( t ) + k (cid:88) j =1 (cid:37) n j ( t ) = (cid:18) − k − n log n + k (cid:88) j =1 n j log n j (cid:19) tp O (cid:18) ptn + pt (cid:19) . R EFERENCES

Abramowitz, M. and I. A. Stegun (1970).

Handbook of mathematical functions with formulas, graphs, and mathe-matical tables (9th ed.), Volume 55. US Government printing ofﬁce.Anastasiou, A. and G. Reinert (2018). Bounds for the asymptotic distribution of the likelihood ratio. arXiv preprintarXiv:1806.03666 .Anderson, T. (2003).

The Annals of Statistics 37 (6B), 3822–3840.Bai, Z., D. Jiang, J.-f. Yao, and S. Zheng (2013). Testing linear hypotheses in high-dimensional regressions.

Statis-tics 47 (6), 1207–1223.Barndorff-Nielsen, O. and P. Hall (1988). On the level-error after Bartlett adjustment of the likelihood ratio statistic.

Biometrika 75 (2), 374–378.Boucheron, S. and P. Massart (2011). A high-dimensional Wilks phenomenon.

The Annals of Statistics .Chen, S. X. and H. Cui (2006). On bartlett correction of empirical likelihood in the presence of nuisance parameters.

Biometrika 93 (1), 215–220.Chen, S. X., L. Peng, and Y.-L. Qin (2009). Effects of data dimension on empirical likelihood.

Biometrika 96 (3),711–722.Chen, Y., J. Huang, Y. Ning, K.-Y. Liang, and B. G. Lindsay (2018). A conditional composite likelihood ratio testwith boundary constraints.

Biometrika 105 (1), 225–232.Cleff, T. (2019).

Applied Statistics and Multivariate Data Analysis for Business and Economics: A Modern ApproachUsing SPSS, Stata, and Excel . Springer.Cordeiro, G. M. and F. Cribari-Neto (2014).

Biometrika 98 (4),919–934.Fan, J., H.-N. Hung, and W.-H. Wong (2000). Geometric understanding of likelihood ratio statistics.

Journal of theAmerican Statistical Association 95 (451), 836–841.Fan, J., C. Zhang, and J. Zhang (2001). Generalized likelihood ratio statistics and Wilks phenomenon.

The Annals ofstatistics 29 (1), 153–193.Fan, J. and W. Zhang (2004). Generalised likelihood ratio tests for spectral density.

Biometrika 91 (1), 195–209.He, X. and Q.-M. Shao (2000). On parameters of increasing dimensions.

Journal of Multivariate Analysis 73 (1),120–135.He, Y., T. Jiang, J. Wen, and G. Xu (2020). Likelihood ratio test in multivariate linear regression: from low to highdimension.

Statistica Sinica . n the Phase Transition of Wilk’s Phenomenon Hjort, N. L., I. W. McKeague, and I. Van Keilegom (2009). Extending the scope of empirical likelihood.