Local Asymptotic Normality of the spectrum of high-dimensional spiked F-ratios

Abstract

We consider two types of spiked multivariate F distributions: a scaled distribution with the scale matrix equal to a rank-one perturbation of the identity, and a distribution with trivial scale, but rank-one non-centrality. The norm of the rank-one matrix (spike) parameterizes the joint distribution of the eigenvalues of the corresponding F matrix. We show that, for a spike located above a phase transition threshold, the asymptotic behavior of the log ratio of the joint density of the eigenvalues of the F matrix to their joint density under a local deviation from this value depends only on the largest eigenvalue $\lambda_1$. Furthermore, $\lambda_1$ is asymptotically normal, and the statistical experiment of observing all the eigenvalues of the F matrix converges in the Le Cam sense to a Gaussian shift experiment that depends on the asymptotic mean and variance of $\lambda_1$. In particular, the best statistical inference about a sufficiently large spike in the local asymptotic regime is based on the largest eigenvalue only. As a by-product of our analysis, we establish joint asymptotic normality of a few of the largest eigenvalues of the multi-spiked F matrix when the corresponding spikes are above the phase transition threshold.


arXiv [math.ST]

Prathapasinghe Dharmawansa, Iain M. Johnstone, and Alexei Onatski. July 18, 2018

Key words: Spiked F-ratio, Local Asymptotic Normality, multivariate F distribution, phase transition, super-critical regime, asymptotic normality of eigenvalues, limits of statistical experiments.

In this paper we establish the Local Asymptotic Normality (LAN) of the statistical experiments of observing the eigenvalues of the F-ratio, $B^{-1}A$, of two high-dimensional independent Wishart matrices, $A$ and $B$. We consider two situations. First, both $A$ and $B$ are central Wisharts whose dimensionality and degrees of freedom grow proportionally, and whose covariance parameters differ by a matrix of rank one. Second, $A$ and $B$ have the same covariance parameter, but $A$ is a non-central Wishart with a non-centrality parameter of rank one. In both cases, the joint distribution of the eigenvalues of $B^{-1}A$ depends on the norm of the rank-one matrix, which we call a spike. We find that the considered statistical experiments are LAN under a local parameterization of the spike when the localization point lies above a phase transition threshold.

Many classical multivariate statistical tests are based on the eigenvalues of F-ratio matrices. For example, all tests of the equality of two covariance matrices and of the general linear hypothesis in the Multivariate Linear Model described in Muirhead's (1982) chapters 8 and 10 are of this form. Contemporary statistical applications often require that the dimensionality of the F-ratio and its degrees of freedom be large and comparable. Therefore, we consider the asymptotic regime where the dimensionality and the degrees of freedom diverge to infinity at the same rate.

Our requirement that the parameters of the two Wisharts differ by a rank-one matrix can be linked to situations where the alternative hypothesis is characterized by the presence of one factor or signal, which is absent from the data under the null. Inference conditional on factors requires considering non-central F-ratios, whereas unconditional inference leads to F-ratios with unequal covariances.

The main result of this paper can be summarized as follows. We show that the asymptotic behavior of the log ratio of the joint density of the eigenvalues of $B^{-1}A$, which corresponds to a sufficiently large value of the spike, to their joint density under a local deviation from this value depends only on the largest eigenvalue $\lambda_1$. Furthermore, $\lambda_1$ is asymptotically normal, and the statistical experiment of observing all the eigenvalues of $B^{-1}A$ converges in the Le Cam sense to a Gaussian shift experiment that depends on the asymptotic mean and variance of $\lambda_1$. In particular, the best statistical inference about a sufficiently large spike in the local asymptotic regime is based on the largest eigenvalue only.

We derive an explicit formula for the phase transition threshold demarcating the area of the sufficiently large spikes. In a general framework, where the parameters of $A$ and $B$ may differ by a matrix $\Delta$ of finite rank, we show that, when the norm of $\Delta$ is below the threshold, any finite number of the largest eigenvalues of $B^{-1}A$ almost surely converge to the upper boundary of the support of the limiting spectral distribution of $B^{-1}A$, derived by Wachter (1980).
In contrast, when $m$ of the largest eigenvalues of $\Delta$ are above the threshold, we find that the $m$ largest eigenvalues of $B^{-1}A$ almost surely converge to locations strictly above the upper boundary of Wachter's distribution, and that their local fluctuations about these limits are asymptotically jointly normal.

In a setting of two independent and not necessarily normal samples, the phase transition phenomenon has been studied in Nadakuditi and Silverstein (2010). They obtain a formula for the threshold, and establish the almost sure limits of the $m$ largest eigenvalues for the case where $\Delta$ describes the difference between the covariance matrices of the two samples. The limiting distribution of fluctuations above the threshold is described in their paper as an open problem. Our paper solves this problem for the case of two normal samples.

The phase transition phenomenon for a single Wishart matrix has also been a subject of active recent research. Baik et al (2005) study the joint distributions of a few of the largest eigenvalues of complex Wisharts with spiked covariance parameters. They derive the asymptotic distributions of a few of the largest eigenvalues, which turn out to be different depending on whether the sizes of the corresponding spikes are below, at, or above a phase transition threshold, the situations often referred to as the sub-critical, critical, and super-critical regimes. A similar transition takes place for real Wisharts. Paul (2007) establishes asymptotic normality of the fluctuations of a few of the largest eigenvalues in the super-critical regime of the real case. Féral and Péché (2009), Benaych-Georges et al (2011) and Bao et al (2014a) show that the fluctuations in the sub-critical real case have the Tracy-Widom distribution, while Mo (2012) and Bloemendal and Virág (2011, 2013) establish the asymptotic distribution of a different type in the critical regime.
In a setting of two normal samples, Bao et al (2014b) study the almost sure limits of the sample canonical correlations when the population canonical correlations are below and when they are above a phase transition threshold.

Our results on the joint asymptotic normality of the largest eigenvalues in the super-critical regime for F-ratios can be used to make statistical inference about the eigenvalues of the "ratio" of the population covariances of $A$ and $B$, or the eigenvalues of the non-centrality parameter of $A$. The estimates of these eigenvalues play an important role in MANOVA and discriminant analysis, and can also be used in constructing modified model selection criteria as discussed in Sheena et al (2004). Further, they may be important in applications as diverse as constructing genetic selection indices and describing a degree of financial turbulence (see Hayes and Hill (1981), and Kritzman and Li (2010)).

We expect that our asymptotic normality results can be extended to the case of the "ratio" of two sample covariance matrices constructed from non-normal samples. In the one-sample case, such an extension of Paul's (2007) asymptotic normality results has been done in Bai and Yao (2008). In this paper, we focus on normal data. This focus is dictated by our main goal: establishing the LAN property of the statistical experiments of observing the eigenvalues of $B^{-1}A$. To reach this goal, we derive an asymptotic approximation to a log likelihood process by representing it in the form of a contour integral, and applying the Laplace approximation method.
The explicit form of the joint distribution of the eigenvalues of $B^{-1}A$ is known only in the normal case, and we need such an explicit form for our analysis.

A decision-theoretic approach to the finite sample estimation of the eigenvalues of the "ratio" of the population covariances of $A$ and $B$, or the eigenvalues of the non-centrality parameter of $A$, was taken in many previous studies (see Sheena et al (2004), Bilodeau and Srivastava (1992), and references therein). In one of the first such studies, Muirhead and Verathaworn (1985) explain that the ideal decision-theoretic approach that directly analyzes expected loss with respect to the joint distribution of the eigenvalues of $B^{-1}A$ "does not seem feasible due primarily to the complexity of the distribution of the ordered latent roots..." Instead, they focus on deriving an optimal estimator from a particular class.

Our LAN result makes possible an asymptotic implementation of the ideal decision-theoretic approach. We overcome the complexity of the joint distribution of the eigenvalues by using a tractable contour integral representation of the log likelihood process, which was obtained in the single-spike case by Dharmawansa and Johnstone (2014). In the multiple-spike case, a similar representation involves multiple contour integrals (see Passemier et al (2014)). An asymptotic analysis of such a multiple integral requires substantial additional effort, and we leave it for future research.

It is interesting to contrast the LAN result in the super-critical regime with the asymptotic behavior of the log likelihood ratio in the case of a sub-critical spike. In a separate research effort, we follow Onatski et al (2013), who analyze the log likelihood ratio in the sub-critical regime for the case of a single Wishart matrix, to show that the experiment of observing the eigenvalues of $B^{-1}A$ in the sub-critical regime is not of the LAN type. Furthermore, the log-likelihood process turns out to depend only on a smooth functional of the empirical distribution of all the eigenvalues of $B^{-1}A$, so that asymptotically efficient inference procedures may ignore the information contained in $\lambda_1$ altogether. The results of this sub-critical analysis will be published elsewhere.

The rest of the paper is structured as follows. In the next section, we describe our setting. In Section 3, we derive the almost sure limits of a few of the largest eigenvalues of the F-ratio. In Section 4, we establish the asymptotic normality of the eigenvalue fluctuations in the super-critical regime. In Section 5, we derive an asymptotic approximation to the joint distribution of the eigenvalues of $B^{-1}A$ for the special case of a single super-critical spike. In Section 6, we show that the likelihood ratio in the local parameter space is asymptotically equivalent to a centered and scaled largest eigenvalue, and establish the LAN property. Section 7 concludes.

Suppose that $A \sim W_p(n_1 + k, \Sigma_1, \Omega)$ and $B \sim W_p(n_2, \Sigma_2)$ are independent non-central and central Wishart matrices, respectively. For the non-centrality parameter $\Omega$, we use a symmetric version of the definition in Muirhead (1982, p. 442). That is, if $Z$ is an $n \times p$ matrix distributed as $N(M, I_n \otimes \Sigma)$, then $Z'Z \sim W_p(n, \Sigma, \Omega)$ with the non-centrality parameter $\Omega = \Sigma^{-1/2} M'M \Sigma^{-1/2}$. We will consider two different settings for the parameters $\Sigma_1$, $\Sigma_2$, and $\Omega$.

Setting 1 (Spiked covariance): $\Sigma_2 = \Sigma$, $\Sigma_1 = \Sigma^{1/2}(I_p + VhV')\Sigma^{1/2}$, and $\Omega = 0$. Here $\Sigma^{1/2}$ is the symmetric square root of a positive definite matrix $\Sigma$; $V$ is a $p \times k$ matrix of nuisance parameters with orthonormal columns, and $h = \operatorname{diag}\{h_1, \dots, h_k\}$ is the diagonal matrix of the "covariance spikes" with $h_1 > \dots > h_k$.

Setting 2 (Spiked non-centrality): $\Sigma_1 = \Sigma$, $\Sigma_2 = \Sigma$, and $\Omega = (n_1 + k)VhV'$, where $\Sigma$, $V$, and $h$ are as defined above, but $h_j$ with $j = 1, \dots, k$ are interpreted as "non-centrality spikes."

We are interested in the behavior of the eigenvalues of $F \equiv (B/n_2)^{-1} A/n_A$, where $n_A = n_1 + k$, as $n_1$, $n_2$, and $p$ grow so that $p/n_1 \to c_1$ and $p/n_2 \to c_2$ with $0 < c_i < 1$, while $k$, the number of spikes, remains fixed. In what follows, we will assume that $\Sigma = I_p$. This assumption is without loss of generality because the eigenvalues of $F$ do not change under the transformation $A \mapsto \Sigma^{-1/2}A\Sigma^{-1/2}$, $B \mapsto \Sigma^{-1/2}B\Sigma^{-1/2}$.

It is convenient to think of $A/n_A$ as a sample covariance matrix $XX'/n_A$ of the sample $X$ having the factor structure
$$X = VF' + \varepsilon \qquad (1)$$
with $V$, $F$, and $\varepsilon$ playing the roles of the factor loadings, factors, and idiosyncratic terms, respectively. Matrices $F$ and $\varepsilon$ are mutually independent, and independent from $B$. The distribution of $\varepsilon$ is $N(0, I_p \otimes I_{n_A})$, and the distribution of $F$ depends on the setting. For Setting 1, $F \sim N(0, I_{n_A} \otimes h)$, whereas for Setting 2, $F$ is a deterministic matrix such that $F'F/n_A = h$.
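The factor structure (1) reproduces the two settings of this section. The short check below uses only the definitions above together with $\Sigma = I_p$; $X_{\cdot j}$ and $F_{j\cdot}$ (our notation) denote the $j$-th column of $X$ and the $j$-th row of $F$.

$$
\begin{aligned}
\text{Setting 1:}\quad & X_{\cdot j} = V F_{j\cdot}' + \varepsilon_{\cdot j} \overset{\text{i.i.d.}}{\sim} N\bigl(0,\ I_p + VhV'\bigr),\ j = 1, \dots, n_A, \quad\text{so}\quad XX' \sim W_p\bigl(n_A,\ I_p + VhV'\bigr);\\
\text{Setting 2:}\quad & \mathbb{E}\,X' = FV', \quad\text{so}\quad XX' \sim W_p\bigl(n_A,\ I_p,\ \Omega\bigr), \quad \Omega = (FV')'(FV') = VF'FV' = n_A VhV'.
\end{aligned}
$$

Since $n_A = n_1 + k$, the first line is the distribution of $A$ in Setting 1 and the second is its distribution in Setting 2.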
With this interpretation, Settings 1 and 2 describe, respectively, distributions which are unconditional and conditional on the factors. In both cases the spike parameters $h_j$, $j = 1, \dots, k$, measure the factors' variability.

We would like to introduce a convenient representation of the eigenvalues of $F$, which we will denote as $\lambda_{p1} \ge \dots \ge \lambda_{pp}$. First, note that $\lambda_{pj}$, $j = 1, \dots, p$, are invariant with respect to the simultaneous transformations
$$A \mapsto UAU' \equiv n_A\tilde H \quad\text{and}\quad B \mapsto UBU' \equiv n_2E, \qquad (2)$$
where $U$ is a random matrix uniformly distributed over the orthogonal group $O(p)$. Under the assumption that $\Sigma = I_p$, matrix $n_2E$ is distributed as $W_p(n_2, I_p)$ and is independent from $\tilde H$. Matrix $\tilde H$ has the form $\tilde X\tilde X'/n_A$, where
$$\tilde X = \tilde VF' + \tilde\varepsilon,$$
with $\tilde\varepsilon \sim N(0, I_p \otimes I_{n_A})$ independent from $\tilde V$, and $\tilde V$ being a random $p \times k$ matrix uniformly distributed on the Stiefel manifold of orthogonal $k$-frames in $\mathbb{R}^p$. We can think of $\tilde V$ as having the form $\tilde V = v(v'v)^{-1/2} \equiv vW_v^{-1/2}$, where $v \sim N(0, I_p \otimes I_k)$ and $W_v \equiv v'v$ is Wishart $W_k(p, I_k)$. Further, let $O_F \in O(n_A)$ be such that the submatrix of its first $k$ columns equals $F(F'F)^{-1/2}$, and let $\hat X = \tilde XO_F$. Clearly,
$$\tilde H = \tilde X\tilde X'/n_A = \hat X\hat X'/n_A, \qquad (3)$$
and matrix $\hat X$ has the form $\hat X = [\,vW_v^{-1/2}h^{1/2}W_F^{1/2},\ 0\,] + \hat\varepsilon$, with $0$ a $p \times n_1$ block of zeros, where $v$, $W_F$ and $\hat\varepsilon$ are mutually independent and independent from $E$; $\hat\varepsilon \sim N(0, I_p \otimes I_{n_A})$; and the distribution of $W_F$ depends on the setting. For Setting 1, $W_F \sim W_k(n_A, I_k)$, whereas for Setting 2, $W_F = n_AI_k$.

Finally, let us denote the submatrix of the first $k$ columns of $\hat\varepsilon$ as $u$. Then
$$\hat X\hat X' = \xi\xi' + n_1H, \qquad (4)$$
where $n_1H \sim W_p(n_1, I_p)$, $H$ and $\xi\xi'$ are mutually independent, and independent from $E$, and
$$\xi = vW_v^{-1/2}h^{1/2}W_F^{1/2} + u. \qquad (5)$$
Using (2), (3), and (4), we obtain the convenient representation for the eigenvalues announced above. Let $\hat x_{p1} \ge \dots \ge \hat x_{pp}$ be the roots of the equation
$$\det\left(\xi\xi'/n_1 + H - xE\right) = 0. \qquad (6)$$
Then
$$\lambda_{pj} = n_1\hat x_{pj}/(n_1 + k). \qquad (7)$$
This representation is convenient because the roots of (6) can be viewed and analyzed as perturbations of the roots of the equation $\det(H - xE) = 0$ caused by adding the low-rank matrix $\xi\xi'/n_1$ to $H$.

If $x \in \mathbb{R}$ is such that $H - xE$ is invertible, then
$$\left(\xi\xi'/n_1 + H - xE\right)^{-1} = S - S\xi\left(I_k + \xi'S\xi/n_1\right)^{-1}\xi'S/n_1,$$
where $S \equiv (H - xE)^{-1}$. Moreover, by the matrix determinant lemma, $\det(\xi\xi'/n_1 + H - xE) = \det(H - xE)\det(I_k + \xi'S\xi/n_1)$. Therefore, if $x$ is a root of the equation
$$\det\left(I_k + \xi'(H - xE)^{-1}\xi/n_1\right) = 0, \qquad (8)$$
then it also solves (6), and hence, the asymptotic behavior of the roots of (6) can be inferred from that of the random matrix-valued function
$$M(x) = \xi'(H - xE)^{-1}\xi/n_1. \qquad (9)$$
This is the main idea of the analysis in the next section of the paper.

Let $n \equiv (n_1, n_2)$ and $c \equiv (c_1, c_2)$. We will denote the asymptotic regime where $n_1$, $n_2$, and $p$ grow so that $p/n_1 \to c_1$ and $p/n_2 \to c_2$ with $c_j \in (0, 1)$ as $p, n \to_c \infty$. As follows from Wachter's (1980) work, as $p, n \to_c \infty$, the empirical distribution of the eigenvalues of $E^{-1}H$ converges in probability to the distribution with density
$$\frac{1 - c_2}{2\pi}\,\frac{\sqrt{(b_+ - \lambda)(\lambda - b_-)}}{\lambda(c_1 + c_2\lambda)}\,\mathbf{1}\{b_- \le \lambda \le b_+\}. \qquad (10)$$
The upper and the lower boundaries of the support of this density are
$$b_\pm = \left(\frac{1 \pm r}{1 - c_2}\right)^2, \quad\text{where } r = \sqrt{c_1 + c_2 - c_1c_2}.$$
The results of Silverstein and Bai (1995) and Silverstein (1995) show that the empirical distribution converges not only in probability, but also almost surely (a.s.). Furthermore, as follows from Theorem 1.1 of Bai and Silverstein (1998), the largest eigenvalue of $E^{-1}H$ a.s. converges to $b_+$.

The latter convergence, together with (7) and Weyl's inequalities for the eigenvalues of a sum of two Hermitian matrices (see Theorem 4.3.7 in Horn and Johnson (1985)), implies that the $(k+1)$-th largest eigenvalue of $F$, $\lambda_{p,k+1}$, a.s. converges to $b_+$. Those of the $k$ largest eigenvalues that remain separated from $b_+$ as $p, n \to_c \infty$ must correspond to solutions of (8). Below, we study these solutions in detail.

Lemma 1 For any $x > b_+$, as $p, n \to_c \infty$,
$$\frac1p\operatorname{tr}\left[(H - xE)^{-1}\right] \xrightarrow{a.s.} m_x(0) \qquad (11)$$
and
$$\frac1p\operatorname{tr}\left[\frac{d}{dx}(H - xE)^{-1}\right] \xrightarrow{a.s.} \frac{d}{dx}m_x(0), \qquad (12)$$
where $m_x(0) = \lim_{z\to 0}m_x(z)$, and $m_x(z) \in \mathbb{C}^+$ is analytic in $z \in \mathbb{C}^+$, and satisfies the equation
$$z - \frac{1}{1 + c_1m_x(z)} = -\frac{1}{m_x(z)} - \frac{x}{1 - c_2xm_x(z)}. \qquad (13)$$
Proof:

Let $x \in \mathbb{R}$ be such that $x > b_+$, and let $F_x(\lambda)$ be the empirical distribution function of the eigenvalues of $H - xE$. For any $z \in \mathbb{C}^+$, let
$$\hat m_x(z) = \int(\lambda - z)^{-1}\,dF_x(\lambda)$$
be the Stieltjes transform of $F_x(\lambda)$. Note that matrix $H - xE$ can be represented in the form $YTY'/p$, where $Y \sim N(0, I_p \otimes I_{n_1+n_2})$ and $T$ is a diagonal matrix with the first $n_1$ and the last $n_2$ diagonal elements equal to $p/n_1$ and $-xp/n_2$, respectively. Therefore, by Theorem 1.1 of Silverstein and Bai (1995), for any $z \in \mathbb{C}^+$, $\hat m_x(z)$ a.s. converges to $m_x(z) \in \mathbb{C}^+$, which is an analytic function in the domain $z \in \mathbb{C}^+$ that solves the functional equation (13).

By Theorem 1.1 of Bai and Silverstein (1998), the largest eigenvalue of $E^{-1}H$ a.s. converges to $b_+$. Therefore, for any $x > b_+$, the largest eigenvalue of $H - xE$ is a.s. asymptotically bounded away from the positive semi-axis. Hence, $\hat m_x(z)$ is analytic and bounded in a small disc $D$ around $z = 0$ for all sufficiently large $p$ and $n$, a.s. By Vitali's theorem (see Titchmarsh (1960), p. 168), $\hat m_x(z)$ a.s. converges to an analytic function in $D$. Since, in $D \cap \mathbb{C}^+$, the limiting function is $m_x(z)$, we have
$$\frac1p\operatorname{tr}\left[(H - xE)^{-1}\right] = \hat m_x(0) \xrightarrow{a.s.} m_x(0),$$
where $m_x(0) = \lim_{z\to 0}m_x(z)$. Further, $\frac1p\operatorname{tr}\left[(H - \zeta E)^{-1}\right]$ is an analytic bounded function of $\zeta$ in a small disk $D_x$ around $x$, for all sufficiently large $p$ and $n$, a.s. Therefore, by Vitali's theorem its a.s. limit $f(\zeta)$ is analytic in $D_x$, and
$$\frac1p\operatorname{tr}\left[\frac{d}{d\zeta}(H - \zeta E)^{-1}\right] \xrightarrow{a.s.} \frac{d}{d\zeta}f(\zeta)$$
in $D_x$. On the other hand, we know that $f(\zeta) = m_\zeta(0)$ for real $\zeta \in D_x$. Therefore, we have (12). □

Lemma 2 For any $x > b_+$, as $p, n \to_c \infty$,
$$\left\|M(x) - (h + c_1I_k)\frac1p\operatorname{tr}\left[(H - xE)^{-1}\right]\right\| \xrightarrow{a.s.} 0$$
and
$$\left\|\frac{d}{dx}M(x) - (h + c_1I_k)\frac1p\operatorname{tr}\left[\frac{d}{dx}(H - xE)^{-1}\right]\right\| \xrightarrow{a.s.} 0,$$
where $\|\cdot\|$ denotes the spectral norm. Proof:

These convergences follow from (5), (9), and Lemma 3 stated below. □
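The condition $x > b_+$ in Lemmas 1 and 2 refers to the upper edge of the Wachter density (10). A minimal numerical sketch, assuming only the formulas stated above, computes $b_\pm$ and checks that (10) has unit mass and mean $1/(1 - c_2)$ (the limit of $p^{-1}\operatorname{tr}E^{-1}$, since $H$ and $E$ are independent and $\mathbb{E}H = I_p$). The substitution $\lambda = \frac{b_+ + b_-}{2} + \frac{b_+ - b_-}{2}\cos\theta$ removes the square-root behavior at the edges, so a plain midpoint rule in $\theta$ is very accurate. The function names and the values $c_1 = 0.3$, $c_2 = 0.2$ are illustrative choices, not part of the paper.

```python
import math


def wachter_edges(c1, c2):
    """Return (b_minus, b_plus), the support edges of density (10)."""
    r = math.sqrt(c1 + c2 - c1 * c2)
    return ((1 - r) / (1 - c2)) ** 2, ((1 + r) / (1 - c2)) ** 2


def wachter_density(lam, c1, c2):
    """Density (10) of the limiting spectral distribution of E^{-1}H."""
    b_minus, b_plus = wachter_edges(c1, c2)
    if not b_minus <= lam <= b_plus:
        return 0.0
    root = math.sqrt(max((b_plus - lam) * (lam - b_minus), 0.0))
    return (1 - c2) / (2 * math.pi) * root / (lam * (c1 + c2 * lam))


def wachter_moment(power, c1, c2, n_nodes=2000):
    """Integral of lam**power against density (10).

    Uses lam = mid + half * cos(theta), so |d lam / d theta| = half * sin(theta),
    and applies the midpoint rule in theta on (0, pi); the transformed
    integrand is smooth, so convergence is very fast.
    """
    b_minus, b_plus = wachter_edges(c1, c2)
    mid, half = (b_plus + b_minus) / 2, (b_plus - b_minus) / 2
    total = 0.0
    for j in range(n_nodes):
        theta = math.pi * (j + 0.5) / n_nodes
        lam = mid + half * math.cos(theta)
        total += lam ** power * wachter_density(lam, c1, c2) * half * math.sin(theta)
    return total * math.pi / n_nodes


if __name__ == "__main__":
    c1, c2 = 0.3, 0.2
    print(wachter_edges(c1, c2))       # support [b_-, b_+]
    print(wachter_moment(0, c1, c2))   # total mass, close to 1
    print(wachter_moment(1, c1, c2))   # mean, close to 1 / (1 - c2) = 1.25
```

The change of variables is the design choice that matters here: integrating (10) directly in $\lambda$ with an equally spaced rule would converge slowly because of the square-root edges.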

Lemma 3

Let $C$ be a random $p \times p$ matrix, independent from $u$ and $v$, which are as defined in Section 2, and such that $p\|C\|$ is bounded for all sufficiently large $p$, a.s. Then, as $p \to \infty$,
$$\left\|v'Cv - (\operatorname{tr}C)I_k\right\| \xrightarrow{a.s.} 0 \quad\text{and}\quad \left\|v'Cu\right\| \xrightarrow{a.s.} 0.$$
Proof:

This lemma follows from the Borel-Cantelli lemma and the upper bounds on the fourth moments of the entries of $v'Cv - (\operatorname{tr}C)I_k$ and $v'Cu$ established by Lemma 2.7 of Bai and Silverstein (1998). □

Lemma 4 (i) For any $\varepsilon > 0$, the $k$ eigenvalues of $M(x)$ are strictly increasing functions of $x \in (b_+ + \varepsilon, \infty)$ for sufficiently large $p$ and $n$, a.s.; (ii) $m_x(0)$ is a strictly increasing, continuous function of $x \in (b_+, \infty)$; (iii) $\lim_{x\to\infty}m_x(0) = 0$, and $\lim_{x\downarrow b_+}m_x(0)(h_i + c_1) < -1$ if and only if $h_i > \bar h$, where
$$\bar h = \frac{c_2 + r}{1 - c_2}.$$
Proof:

Let $\mu \in (0, \infty)$ be the largest eigenvalue of $E^{-1}H$. For any $x_1 > x_2 > \mu$, the matrix $(x_1E - H)^{-1} - (x_2E - H)^{-1}$ is negative definite, a.s. Part (i) follows from this, from the definition (9), and from the fact that $\mu \xrightarrow{a.s.} b_+$. Part (i) together with Lemmas 1 and 2 implies that $m_x(0)$ is increasing on $(b_+, \infty)$. It is strictly increasing because, otherwise, (13) would not be satisfied for some $z \in \mathbb{C}^+$ that are sufficiently close to zero. The continuity follows from the analyticity of $m_x(0)$ established in the proof of Lemma 1. Finally, $\lim_{x\to\infty}m_x(0) = 0$ is implied by (ii) and (11). Equation (13) implies that
$$\lim_{x\downarrow b_+}m_x(0) = \frac{c_2 - 1}{(r + 1)r},$$
which, in its turn, implies the second statement of (iii). □

Let $\hat x_{p1} \ge \dots \ge \hat x_{pk}$ be the solutions of equation (8). By Lemmas 1, 2, and 4, if
$$h_1 > \dots > h_m > \bar h > h_{m+1} > \dots > h_k, \qquad (14)$$
then $\hat x_{pi} \xrightarrow{a.s.} x_i$, where $x_i$, $i = 1, \dots, m$, are such that
$$1 + (h_i + c_1)\,m_{x_i}(0) = 0 \qquad (15)$$
and $m_{x_i}(0)$ satisfies (13) with $z = 0$ and $x$ replaced by $x_i$. In particular,
$$\frac{1}{1 + c_1m_{x_i}(0)} - \frac{1}{m_{x_i}(0)} - \frac{x_i}{1 - c_2x_im_{x_i}(0)} = 0. \qquad (16)$$
Combining (15) and (16), we obtain
$$\frac{1}{h_i} + 1 - \frac{x_i}{h_i + c_1 + c_2x_i} = 0,$$
which implies that
$$x_i = \frac{(h_i + c_1)(h_i + 1)}{h_i - c_2(h_i + 1)}. \qquad (17)$$
By (7), $n_1\hat x_{pi}/(n_1 + k)$, $i = 1, \dots, m$, must be the $m$ largest eigenvalues of $F$, and thus, $x_i$, $i = 1, \dots, m$, describe their a.s. limits. Since there are only $m$ roots of (8) that are asymptotically separated from $b_+$ and are located above $b_+$, the other $k - m$ of the largest eigenvalues of $F$ must a.s. converge to $b_+$. To summarize, the following proposition holds. Proposition 5

Suppose that $h_1 > \dots > h_m > \bar h > h_{m+1} > \dots > h_k$. Then, for $i \le m$, the $i$-th largest eigenvalue of $F$ a.s. converges to $x_i$ defined in (17). For $m < i \le k$, the $i$-th largest eigenvalue a.s. converges to $b_+$.

As follows from Proposition 5, $\bar h = (c_2 + r)/(1 - c_2)$ is the phase transition threshold for the eigenvalues of the spiked F-ratio $F$. The value of this threshold diverges to infinity when $c_2 \to 1$. Note that, when $c_2$ is close to one, the smallest eigenvalue of $B/n_2$ is close to zero, which makes $(B/n_2)^{-1}$ a particularly bad estimator of the inverse of the population covariance, $\Sigma^{-1}$. When $c_2 \to 0$, the phase transition threshold converges to $\sqrt{c_1}$, which is the phase transition threshold for the eigenvalues of a single spiked Wishart matrix. In such a case, $x_i$ converges to $(h_i + c_1)(h_i + 1)/h_i$, which is the a.s. limit of the $i$-th largest eigenvalue of the spiked Wishart when the $i$-th spike $h_i$ is above $\sqrt{c_1}$.

In what follows, we will assume that (14) holds, so that only $m$ eigenvalues of $F$ separate from the bulk asymptotically. We would like to study their fluctuations around the corresponding a.s. limits. Proposition 5 shows that the limits $x_i$ depend on $c_1$ and $c_2$. Because of this dependence, the rate of the convergence has to depend on the rates of the convergences $p/n_1 \to c_1$ and $p/n_2 \to c_2$. However, as will be shown below, the latter rates do not affect the fluctuations of $\lambda_{pi}$ around
$$x_{pi} = \frac{(h_i + c_{1p})(h_i + 1)}{h_i - c_{2p}(h_i + 1)},$$
which are obtained from $x_i$ by replacing $c_1$ and $c_2$ by $c_{1p} = p/n_1$ and $c_{2p} = p/n_2$.

Similar to $x_i$, which are linked to the Stieltjes transform of the limiting spectral distribution of $H - xE$ via (15), $x_{pi}$ can also be linked to the limiting Stieltjes transform, albeit under a slightly different asymptotic regime. Precisely, let $m^p_x(z)$ be the Stieltjes transform of the limiting spectral distribution of $H - xE$ as $n_1$, $n_2$, and $p$ grow so that $p/n_1$ and $p/n_2$ remain fixed. Then, similarly to (15), we have
$$1 + (h_i + c_{1p})\,m^p_{x_{pi}}(0) = 0. \qquad (18)$$
This link will be useful in our analysis below, where we maintain the assumption that $p/n_1$ and $p/n_2$ are not necessarily fixed, but converge to $c_1$ and $c_2$, respectively. Recall that, by (7), $\lambda_{pi} = n_1\hat x_{pi}/(n_1 + k)$, where $\hat x_{pi}$, $i = 1, \dots, m$, satisfy (8).
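The threshold $\bar h$, the limits (17), and the link (15) can be checked numerically. The sketch below is illustrative and assumes only equations (13)-(17): it solves (16) for $m_x(0)$ by bisection, without using the closed form, and compares the root with $-1/(h_i + c_1)$. The bracket uses $\lim_{x\downarrow b_+}m_x(0) = (c_2 - 1)/((r+1)r)$ from the proof of Lemma 4; after clearing denominators, (16) is quadratic in $m$, and exactly one of its two roots lies in that bracket. All function names and numerical values are our own choices.

```python
import math


def threshold(c1, c2):
    """Phase transition threshold h_bar = (c2 + r) / (1 - c2), r = sqrt(c1 + c2 - c1*c2)."""
    r = math.sqrt(c1 + c2 - c1 * c2)
    return (c2 + r) / (1 - c2)


def upper_edge(c1, c2):
    """Upper edge b_+ of the Wachter support."""
    r = math.sqrt(c1 + c2 - c1 * c2)
    return ((1 + r) / (1 - c2)) ** 2


def spike_limit(h, c1, c2):
    """Formula (17): a.s. limit of the eigenvalue of F attached to a spike h > h_bar."""
    return (h + c1) * (h + 1) / (h - c2 * (h + 1))


def z_of_m(m, x, c1, c2):
    """Equation (13) solved for z; setting the result to zero gives (16)."""
    return 1 / (1 + c1 * m) - 1 / m - x / (1 - c2 * x * m)


def m_at_zero(x, c1, c2, iters=200):
    """m_x(0) for x > b_+, found by bisection on (m_edge, 0).

    m_edge = (c2 - 1) / (r (r + 1)) is the limit of m_x(0) as x decreases to b_+.
    For x > b_+, z_of_m is negative at m_edge and tends to +infinity as m -> 0-.
    """
    r = math.sqrt(c1 + c2 - c1 * c2)
    lo, hi = (c2 - 1) / (r * (r + 1)), -1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if z_of_m(mid, x, c1, c2) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    c1, c2 = 0.3, 0.2
    h_bar = threshold(c1, c2)
    # At the threshold, formula (17) returns exactly the upper edge b_+.
    print(h_bar, spike_limit(h_bar, c1, c2), upper_edge(c1, c2))
    for h in (1.5, 3.0, 5.0):
        x = spike_limit(h, c1, c2)
        # The numerical root of (16) and -1/(h + c1) from (15) should agree.
        print(h, x, m_at_zero(x, c1, c2), -1 / (h + c1))
```

Algebraically, $\bar h + 1 = (1 + r)/(1 - c_2)$, $\bar h + c_1 = r(1 + r)/(1 - c_2)$ and $\bar h - c_2(\bar h + 1) = r$, so (17) evaluated at $\bar h$ equals $b_+$; the script simply confirms this in floating point.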
Clearly, the asymptotic distributions of $\sqrt p(\lambda_{pi} - x_{pi})$ and $\sqrt p(\hat x_{pi} - x_{pi})$, $i = 1, \dots, m$, coincide. Therefore, below we will study the asymptotic behavior of $\sqrt p(\hat x_{pi} - x_{pi})$, $i = 1, \dots, m$. By the standard Taylor expansion argument,
$$\sqrt p(\hat x_{pi} - x_{pi}) = -\frac{\sqrt p\det\mathcal M(x_{pi})}{\frac{d}{dx}\det\mathcal M(x_{pi}) + \frac12(\hat x_{pi} - x_{pi})\frac{d^2}{dx^2}\det\mathcal M(\tilde x_{pi})}, \qquad (19)$$
$i = 1, \dots, m$, where $\mathcal M(x) = I_k + M(x)$, and $\tilde x_{pi} \in [x_{pi}, \hat x_{pi}]$. We have
$$\frac{d}{dx}\det\mathcal M(x_{pi}) = \det\mathcal M(x_{pi})\operatorname{tr}S(x_{pi}),$$
and
$$\frac{d^2}{dx^2}\det\mathcal M(x_{pi}) = \det\mathcal M(x_{pi})\left\{\operatorname{tr}R(x_{pi}) + \left(\operatorname{tr}S(x_{pi})\right)^2 - \operatorname{tr}\left[S(x_{pi})^2\right]\right\},$$
where $S(x) = \mathcal M(x)^{-1}\frac{d}{dx}M(x)$ and $R(x) = \mathcal M(x)^{-1}\frac{d^2}{dx^2}M(x)$. Since the event that $\det\mathcal M(x_{pi}) = 0$ or $1 + M_{ii}(x_{pi}) = 0$ for some $i = 1, \dots, m$ happens with probability zero, we can simultaneously multiply the numerator and denominator of (19) by $(1 + M_{ii}(x_{pi}))/\det\mathcal M(x_{pi})$ to obtain
$$\sqrt p(\hat x_{pi} - x_{pi}) = -\frac{\sqrt p\,(1 + M_{ii}(x_{pi}))}{s(x_{pi}) + \frac12(\hat x_{pi} - x_{pi})\,\delta(x_{pi})}, \qquad (20)$$
where
$$s(x_{pi}) = (1 + M_{ii}(x_{pi}))\operatorname{tr}S(x_{pi}), \qquad \delta(x_{pi}) = (1 + M_{ii}(x_{pi}))\left\{\operatorname{tr}R(x_{pi}) + \left(\operatorname{tr}S(x_{pi})\right)^2 - \operatorname{tr}\left[S(x_{pi})^2\right]\right\}.$$
Lemma 6

For any $i = 1, \dots, m$, we have: (i) $s(x_{pi}) \xrightarrow{P} (h_i + c_1)\frac{d}{dx}m_{x_i}(0)$; (ii) $\delta(x_{pi}) = O(1)$ a.s. Proof: By Lemmas 1 and 2,
$$\frac{d}{dx}M(x_{pi}) \xrightarrow{a.s.} (h + c_1I_k)\frac{d}{dx}m_{x_i}(0). \qquad (21)$$
Further,
$$(1 + M_{ii}(x_{pi}))\left(I_k + M(x_{pi})\right)^{-1} \xrightarrow{a.s.} \operatorname{diag}\{0, \dots, 0, 1, 0, \dots, 0\} \qquad (22)$$
with 1 at the $i$-th place on the diagonal. The latter convergence follows from the fact that $I_k + M(x_{pi})$ can be viewed as a small perturbation of the diagonal matrix $I_k + (h + c_1I_k)m_{x_i}(0)$, which has non-zero diagonal elements, except at the $i$-th position. The eigenvalue perturbation formulae (see, for example, (2.33) on p. 79 of Kato (1980)) then lead to (22). Combining (21) and (22), and using the definition of $s(x_{pi})$, we obtain (i).

To establish (ii), we note that $(1 + M_{ii}(x_{pi}))\operatorname{tr}R(x_{pi}) = O_P(1)$ by an argument similar to that used to establish (i). Further, $(\operatorname{tr}S(x_{pi}))^2 - \operatorname{tr}[S(x_{pi})^2]$ is a linear function of the only eigenvalue of $S(x_{pi})$ that diverges to infinity. By the eigenvalue perturbation formulae, such an eigenvalue equals $(1 + M_{ii}(x_{pi}))^{-1}O(1)$ a.s. Therefore,
$$(1 + M_{ii}(x_{pi}))\left(\left(\operatorname{tr}S(x_{pi})\right)^2 - \operatorname{tr}\left[S(x_{pi})^2\right]\right) = O(1),$$
which concludes the proof of (ii). □

Equation (20), Lemma 6, and the Slutsky theorem imply that, for the purpose of establishing convergence in distribution of $\sqrt p(\hat x_{pi} - x_{pi})$, $i = 1, \dots, m$, we may focus on the numerator of (20),
$$Z_{ii}(x_{pi}) \equiv \sqrt p\,(1 + M_{ii}(x_{pi})) = \sqrt p\left(M_{ii}(x_{pi}) - (h_i + c_{1p})\,m^p_{x_{pi}}(0)\right),$$
where the last equality follows from (18).

The random variable $Z_{ii}$ is the entry of the matrix
$$Z(x_{pi}) = \sqrt p\left(M(x_{pi}) - (h + c_{1p}I_k)\,m^p_{x_{pi}}(0)\right)$$
that belongs to the $i$-th row and the $i$-th column. Let us now introduce new notation. Let
$$D = (W_F/n_1)^{1/2}h^{1/2}(W_v/p)^{-1/2}, \quad G = (H - x_{pi}E)^{-1}/p, \quad \Delta_F = \sqrt{n_1}\left((W_F/n_1)^{1/2} - I_k\right), \quad\text{and}\quad \Delta_v = \sqrt p\,(W_v/p - I_k).$$
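The decomposition below can be traced directly to the notation just introduced. As a reading aid (our own intermediate step, obtained by direct substitution), inserting (5) into (9) and using $G = (H - x_{pi}E)^{-1}/p$ gives the exact identity in the first line; the second line follows from $(W_F/n_1)^{1/2} = I_k + \Delta_F/\sqrt{n_1}$ and $(W_v/p)^{-1} - I_k = -p^{-1/2}\Delta_v(W_v/p)^{-1}$:

$$
\begin{aligned}
M(x_{pi}) &= \frac{p}{n_1}\,\xi'G\,\xi = D\,v'Gv\,D' + \sqrt{c_{1p}}\,\bigl(D\,v'Gu + u'Gv\,D'\bigr) + c_{1p}\,u'Gu,\\
\sqrt p\,(\operatorname{tr}G)\,(DD' - h) &= Z^{(2)} + Z^{(3)} + Z^{(4)}.
\end{aligned}
$$

Splitting $v'Gv = (v'Gv - I_k\operatorname{tr}G) + I_k\operatorname{tr}G$, treating $u'Gu$ in the same way, and subtracting $\sqrt p\,(h + c_{1p}I_k)\,m^p_{x_{pi}}(0)$ then produces the seven terms $Z^{(1)}, \dots, Z^{(7)}$.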
Then, using equations (9) and (5), we obtain the following decomposition:
$$Z(x_{pi}) = \sum_{j=1}^{7}Z^{(j)},$$
where
$$
\begin{aligned}
Z^{(1)} &= D\sqrt p\left(v'Gv - I_k\operatorname{tr}G\right)D',\\
Z^{(2)} &= (\operatorname{tr}G)\,D(W_v/p)^{-1/2}h^{1/2}\sqrt{c_{1p}}\,\Delta_F,\\
Z^{(3)} &= (\operatorname{tr}G)\sqrt{c_{1p}}\,\Delta_Fh^{1/2}(W_v/p)^{-1}h^{1/2},\\
Z^{(4)} &= -(\operatorname{tr}G)\,h^{1/2}\Delta_v(W_v/p)^{-1}h^{1/2},\\
Z^{(5)} &= \sqrt{c_{1p}}\sqrt p\left(Dv'Gu + u'GvD'\right),\\
Z^{(6)} &= c_{1p}\sqrt p\left(u'Gu - I_k\operatorname{tr}G\right), \quad\text{and}\\
Z^{(7)} &= (h + c_{1p}I_k)\sqrt p\left(\operatorname{tr}G - m^p_{x_{pi}}(0)\right).
\end{aligned}
$$
For the last term, $Z^{(7)}$, we prove the following lemma.

Lemma 7 $Z^{(7)} \xrightarrow{P} 0$. Proof:

The proof of this lemma will appear in a separate work. Had $x_{pi}$ been negative, $H - x_{pi}E$ would have had the form $YTY'$ with $Y \sim N(0, I_p \otimes I_{n_1+n_2})$ and a positive definite diagonal $T$ with converging spectral distribution. The lemma would then have followed from the results of Bai and Silverstein (2004). Our proof extends Bai and Silverstein's (2004) arguments to the case of positive $x_{pi}$, for which $T$ is indefinite. □

Further, the asymptotic behavior of the terms $Z^{(2)}$ and $Z^{(3)}$ differs depending on the setting. Recall that for Setting 1, $W_F \sim W_k(n_A, I_k)$. Then, since $\Delta_F = \sqrt{n_1}(W_F/n_1 - I_k)/2 + o_P(1)$, a standard CLT together with Lemma 1 implies that the $(i,i)$ entry satisfies
$$\left(Z^{(2)} + Z^{(3)}\right)_{ii} \xrightarrow{d} N\left(0,\ 2c_1m_{x_i}^2(0)\,h_i^2\right). \qquad (23)$$
The latter limit is independent from the limits of $Z^{(j)}$, $j \ne 2, 3$, because $W_F$ is independent from $u$ and $v$.

In contrast, for Setting 2, we have $W_F = n_AI_k$, and $\Delta_F = o(1)$. Therefore,
$$Z^{(2)} + Z^{(3)} \xrightarrow{P} 0. \qquad (24)$$

Let us now establish the convergence of $Z^{(j)}$, $j \le 6$, $j \ne 2, 3$. Let $l_i$ and $L_i$ be such that $[l_i, L_i]$ includes the support of the limiting spectral distribution, $G_{x_i}$, of $H - x_{pi}E$. Moreover, let $[l_i, L_i]$ be such that none of the eigenvalues $\lambda^{(i)}_{p1}, \dots, \lambda^{(i)}_{pp}$ of $H - x_{pi}E$ lies outside $[l_i, L_i]$ for sufficiently large $p$, a.s. Further, let $g_q$ with $q = 1, \dots, Q$, where $Q$ is an arbitrary positive integer, be functions which are continuous on $[l_i, L_i]$, and let $\zeta$ denote a $p \times m$ matrix with i.i.d. $N(0, 1)$ entries. Finally, let
$$\Theta = \{(q, s, t) : q = 1, \dots, Q;\ 1 \le s \le t \le m\}.$$
The following lemma is a slight modification of Lemma 13 of the Supplementary Appendix in Onatski (2012).

Lemma 8

The joint distribution of the random variables
$$\left\{\frac{1}{\sqrt p}\sum_{j=1}^p g_q\!\left(\lambda^{(i)}_{pj}\right)\left(\zeta_{js}\zeta_{jt} - \delta_{st}\right),\ (q, s, t) \in \Theta\right\}$$
weakly converges to a multivariate normal. The covariance between components $(q, s, t)$ and $(q_1, s_1, t_1)$ of the limiting distribution is equal to $0$ when $(s, t) \ne (s_1, t_1)$, and to $(1 + \delta_{st})\int g_q(\lambda)g_{q_1}(\lambda)\,dG_{x_i}(\lambda)$ when $(s, t) = (s_1, t_1)$.
Proof:

For the reader's convenience, we provide a proof of this lemma in the Appendix. □
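As a worked special case of Lemma 8 (our own bookkeeping, not an additional result), take $Q = 2$, $g_1(\lambda) = \lambda^{-1}$ and $g_2(\lambda) = 1$. Since $m_{x_i}(z)$ is the Stieltjes transform of $G_{x_i}$, we have $\int\lambda^{-1}dG_{x_i}(\lambda) = m_{x_i}(0)$ and $\int\lambda^{-2}dG_{x_i}(\lambda) = m'_{x_i}(0)$, where the prime denotes the derivative in $z$. Writing $S^{(s,t)}_g = p^{-1/2}\sum_{j=1}^p g(\lambda^{(i)}_{pj})(\zeta_{js}\zeta_{jt} - \delta_{st})$ (notation used only here), Lemma 8 gives

$$
\operatorname{Var}S^{(s,s)}_{g_1} \to 2\,m'_{x_i}(0), \qquad
\operatorname{Var}S^{(s,s)}_{g_2} \to 2, \qquad
\operatorname{Cov}\bigl(S^{(s,s)}_{g_1}, S^{(s,s)}_{g_2}\bigr) \to 2\,m_{x_i}(0), \qquad
\operatorname{Var}S^{(s,t)}_{g_1} \to m'_{x_i}(0)\ \ (s < t).
$$

Using $D \to h^{1/2}$ and $\operatorname{tr}G \to m_{x_i}(0)$, one has $Z^{(1)}_{ii} \approx h_iS^{(s,s)}_{g_1}$ and $Z^{(4)}_{ii} \approx -h_im_{x_i}(0)S^{(s,s)}_{g_2}$ for the column $s$ of $\zeta$ that corresponds to $v_i$, so that

$$
\operatorname{Var}\bigl(Z^{(1)}_{ii} + Z^{(4)}_{ii}\bigr) \to 2h_i^2m'_{x_i}(0) - 4h_i^2m^2_{x_i}(0) + 2h_i^2m^2_{x_i}(0) = 2h_i^2\bigl(m'_{x_i}(0) - m^2_{x_i}(0)\bigr).
$$

Adding the variances $4c_1h_im'_{x_i}(0)$ of $Z^{(5)}_{ii}$ and $2c_1^2m'_{x_i}(0)$ of $Z^{(6)}_{ii}$ gives $2(h_i + c_1)^2m'_{x_i}(0) - 2h_i^2m^2_{x_i}(0)$, which is the variance in (26).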

Note that all entries of $Z^{(j)}$, $j \le 6$, $j \ne 2, 3$, are linear combinations of the terms having the form considered in Lemma 8, with weights converging in probability to finite constants. Take, for example, $Z^{(1)}$. Its entries are linear combinations of the entries of
$$\frac{1}{\sqrt p}v'(H - x_{pi}E)^{-1}v - \frac{I_k}{\sqrt p}\operatorname{tr}(H - x_{pi}E)^{-1},$$
which, in turn, can be represented in the form $\frac{1}{\sqrt p}\sum_{j=1}^p\left(\lambda^{(i)}_{pj}\right)^{-1}\left(\zeta_{js}\zeta_{jt} - \delta_{st}\right)$. The matrix $\zeta$ is obtained by multiplying $[u, v]$ from the left by the transpose of the eigenvector matrix of $H - x_{pi}E$.

Lemma 8 implies that the vector $\left(Z^{(1)}_{ii}, Z^{(4)}_{ii}, Z^{(5)}_{ii}, Z^{(6)}_{ii}\right)$ converges in distribution to a four-dimensional normal vector with zero mean and the following covariance matrix (here and below, $m'_{x_i}(0)$ denotes the derivative of $m_{x_i}(z)$ with respect to $z$ at $z = 0$):
$$\begin{pmatrix} 2h_i^2m'_{x_i}(0) & -2h_i^2m^2_{x_i}(0) & 0 & 0\\ -2h_i^2m^2_{x_i}(0) & 2h_i^2m^2_{x_i}(0) & 0 & 0\\ 0 & 0 & 4c_1h_im'_{x_i}(0) & 0\\ 0 & 0 & 0 & 2c_1^2m'_{x_i}(0)\end{pmatrix}.$$
Combining this result with Lemma 7 and convergences (23) and (24), we obtain, for Setting 1,
$$Z_{ii}(x_{pi}) \xrightarrow{d} N\left(0,\ 2(h_i + c_1)^2m'_{x_i}(0) - 2h_i^2(1 - c_1)m^2_{x_i}(0)\right), \qquad (25)$$
and, for Setting 2,
$$Z_{ii}(x_{pi}) \xrightarrow{d} N\left(0,\ 2(h_i + c_1)^2m'_{x_i}(0) - 2h_i^2m^2_{x_i}(0)\right). \qquad (26)$$
To establish the joint convergence of $Z_{ii}(x_{pi})$, $i = 1, \dots, m$, we need another lemma. For each $i = 1, \dots, m$, let $g^{(i)}_q$, with $q = 1, \dots, Q$, be functions continuous on $[l_i, L_i]$.

Lemma 9

For any set of pairs { ( s i , t i ) : i = 1 , ..., m } such that ( s i , t i ) = ( s i , t i ) forany i = i , the joint distribution of random variables (cid:26) √ p X pj =1 g ( i ) q (cid:16) λ ( i ) pj (cid:17) ( ζ js i ζ jt i − δ s i t i ) , i = 1 , ..., m (cid:27) weakly converges to a multivariate normal. The covariance between components i and i of the limiting distribution is equal to when i = i . The proof of this lemma is very similar to that of Lemma 8, and we omit it to savespace. Lemma 9 implies that Z ii ( x pi ) , i = 1 , ..., m jointly converge to an m -dimensionalnormal vector with a diagonal covariance matrix. This result, together with equation(20), Lemma 6, and convergences (25, 26) establish the following Lemma. Lemma 10

The joint asymptotic distribution of √ p ( λ pi − x pi ) , i = 1 , ..., m is normal,with diagonal covariance matrix. For Setting 1, the i -th diagonal element of the covari-ance matrix equals h i + c ) m ′ x i (0) − h i (1 − c ) m x i (0)( h i + c ) (cid:0) dd x m x i (0) (cid:1) . (27) For Setting 2, it equals h i + c ) m ′ x i (0) − h i m x i (0)( h i + c ) (cid:0) dd x m x i (0) (cid:1) . (28)15n the Appendix, we establish the following explicit expressions for m x i (0) , m ′ x i (0) , and dd x m x i (0) : m x i (0) = ( h i + c ) − , (29) m ′ x i (0) = − h i ( h i + c ) (cid:16) c + c (1 + h i ) − h i (cid:17) , (30)d m x i (0) / d x = − ( c (1 + h i ) − h i ) ( h i + c ) (cid:16) c + c (1 + h i ) − h i (cid:17) . (31)Using (29), (30), and (31) in (27) and (28), we obtain Proposition 11

For any h > ... > h m > ¯ h ≡ ( c + r ) / (1 − c ) , the joint asymptoticdistribution of √ p ( λ pi − x pi ) , i = 1 , ..., m is normal with diagonal covariance matrix.For Setting 1, √ p ( λ pi − x pi ) d → N  , r h i ( h i + 1) (cid:16) h i − c ( h i + 1) − c (cid:17) ( c − h i + c h i )  , (32) whereas for Setting 2, √ p ( λ pi − x pi ) d → N  , t h i ( h i + 1) (cid:16) h i − c ( h i + 1) − c (cid:17) ( c − h i + c h i )  . (33) Here r = c + c − c c ,t = c + c − c (cid:0) h i − c (cid:1) (1 + h i ) , and x pi = ( h i + p/n ) ( h i + 1) h i − ( h i + 1) p/n . Remark 12

It is straightforward to verify that t < r as long as h i > ¯ h . Therefore, theasymptotic variance of λ i is smaller for Setting 2 than for Setting 1. This accords withintuition because, as discussed above, Setting 2 corresponds to the asymptotic analysisconditional on factors F , whereas Setting 1 corresponds to the unconditional asymptoticanalysis. The factors’ variance adds to the asymptotic variance of λ i . Remark 13

For Setting 1, when c → , the asymptotic variance of λ i converges to thecorrect asymptotic variance c ( h i + 1) (cid:0) h i − c (cid:1) /h i f the largest eigenvalue of the spiked Wishart model. Non-centrality spikes in Wishartdistribution were considered in Onatski (2007). The limit of the asymptotic variance in(33) when c → coincides with the formula for the asymptotic variance derived there. From now on, let us consider the case of a single spike, which is located above the phasetransition threshold ¯ h . That is, assume that k = m = 1 , and let h = h p . We would liketo study the asymptotic behavior of the ratio of the joint densities of all the eigenvaluesof F that correspond to H : h p = h and to H : h p = h + γ/ √ p, where h > ¯ h is ﬁxed and γ is a local parameter.Following James (1964) and Khatri (1967), we can write the joint density of theeigenvalues of F in Setting 1 as f (Λ; h p ) = Z p (Λ)(1 + h p ) n A / F (cid:18) n h p h p + 1 V V ′ , α p Λ( I p + α p Λ) − (cid:19) , and in Setting 2 as f (Λ; h ) = Z p (Λ)exp { h p n A / } F (cid:18) n n A n A h p V V ′ , α p Λ( I p + α p Λ) − (cid:19) , where F and F are the hypergeometric functions of two matrix arguments, α p = n A /n , n = n A + n , Λ = diag { λ p , · · · , λ pp } , and Z pj (Λ) , j = 1 , , depend on n A , n , p and Λ, but not on h p . The joint densities are evaluated at the observed values of theeigenvalues.To facilitate analysis, we use Proposition 1 of Dharmawansa and Johnstone (2014)to rewrite f (Λ; h p ) and f (Λ; h p ) as shown in the following lemma. Lemma 14

Consider the region C \ (1 , ∞ ) in the complex plane. Let ˜ K be a contourdeﬁned in that region which starts at −∞ , encircles ˜ λ pj = α p λ pj / (1 + α p λ pj ) , j =1 , ..., p, counter-clockwise and returns to −∞ . Then we have f (Λ; h p ) = C p (Λ) k p ( h p ) 12 πi Z ˜ K (cid:18) − h p h p z (cid:19) p − n − p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z, (34)17 nd f (Λ; h p ) = C p (Λ) k p ( h p ) 12 πi (35) × Z ˜ K F (cid:18) n − p + 22 , n A − p + 22 ; n A h p z (cid:19) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z where C pj (Λ) , j = 1 , , depend on n A , n , p and Λ , but not on h p , k p ( h p ) = (1 + h p ) p − − nA h − p/ p , and k p ( h p ) = exp {− n A h p / } h − p/ p . We will now derive an asymptotic approximation to the contour integrals in (34) and(35). First, we will analyze (34) and then turn to (35).

Let us deform the contour ˜ K , without changing the integral’s value with probabilityapproaching one as p, n → c ∞ , as shown in Figure 1. Let K = K + ∪ K + with K + = K +1 ∪ K +2 ∪ K +3 ∪ K +4 , where K +1 = n z : ℑ ( z ) ≥ | z − ˜ λ p | = ǫ o , K +2 = { z : z ∈ [˜ x , ˜ λ p − ǫ ] } , K +3 = { z : ℜ ( z ) = ˜ x and 0 ≤ ℑ ( z ) ≤ ˜ x } , and K +4 = { z : ℜ ( z ) ≤ ˜ x and ℑ ( z ) = ˜ x } . Here ǫ > αb + / (1 + αb + ) < ˜ x < αx / (1 + αx ) with α = lim α p = c /c ,and x = lim x p = ( h + c ) ( h + 1) h − ( h + 1) c . (36)As follows from our results in the previous section, ˜ λ p a.s. → αx / (1 + αx ) and ˜ λ p a.s. → αb + / (1 + αb + ), so ˜ x ∈ (cid:16) ˜ λ p , ˜ λ p (cid:17) for suﬃciently large p and n , a.s.Consider the following integral over the deformed contour K I p ( γ, Λ) = Z K F ( z ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z, (37)where F ( z ) ≡ (cid:18) − h p h p z (cid:19) p − n − . For two sequences of random variables { ξ p } and { η p } , we will write ξ p P ∼ η p if and only18 x i ˜ x − i ˜ x Im ( z ) K +2 K +3 K +4 Re ( z ) α p λ p α p λ p +1 K +1 α p λ p α p λ p +1 Figure 1: The contour K .if ξ p /η p converges in probability to 1 as p, n → c ∞ . We have the following lemma. Lemma 15

Under the hypothesis that h p = h , uniformly in γ from any compact subsetof R I p ( γ, Λ) P ∼ F (cid:16) ˜ λ p (cid:17) (cid:18) − πpH (cid:19) p Y j =2 (cid:16) ˜ λ p − ˜ λ pj (cid:17) − , where the principal branches of the square roots are used, and H = h (1 − c )( µ − p b + )( µ − p b − )( c + c µ )2 c c ( h − c µ ) µ ( c + h ) (38) with µ = h + 1 . Proof:

Let K j = K + j ∪ K + j for j = 1 , ..., . Using this notation, we can decompose(37) as I p ( γ, Λ) = I , ,p ( γ, Λ) + I , ,p ( γ, Λ) (39)where I , ,p ( γ, Λ) is the part of the integral corresponding to K ∪ K , and I , ,p ( θ, λ ) isthe part corresponding to the rest of the contour, K ∪ K . Our strategy is to show thatthe integral I p ( γ, Λ) is asymptotically equivalent to I , ,p ( γ, Λ), the integral I , ,p ( γ, Λ)being asymptotically dominated by I , ,p ( γ, Λ).Let us ﬁrst focus on I , ,p ( γ, Λ). Since the singularity of the integrand at ˜ λ p is ofthe inverse square root type, as the radius ǫ of K converges to zero, the integral over19 converges to zero too. Therefore, we have I , ,p ( γ, Λ) = 2 I ,p ( γ, Λ) , (40)where I ,p ( γ, Λ) = Z ˜ x ˜ λ p F ( z ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z. Changing the variable of integration from z to x = ˜ λ p − z, we arrive at I ,p ( θ, λ ) = − Z ˜ λ p − ˜ x F (cid:16) ˜ λ p − x (cid:17) p Y j =1 (cid:16) ˜ λ p − ˜ λ pj − x (cid:17) − d x = F (cid:16) ˜ λ p (cid:17) ( − Z ˜ λ p − ˜ x exp {− pf p ( x ) } x − d x where f p ( x ) = β p (cid:18) h p (1 + α p λ p )1 + h p + α p λ p x (cid:19) + 12 p p X j =2 ln (cid:16) ˜ λ p − ˜ λ pj − x (cid:17) with β p = (2 + n − p ) /p . Now the integral R ˜ λ p − ˜ x exp {− pf p ( x ) } x − d x can be eval-uated using standard Laplace approximation steps (see Olver (1997), section 7.3) asfollows.First, let us show that the derivative dd x f p ( x ) is continuous and positive on x ∈ [0 , ˜ λ p − ˜ x ] for suﬃciently large p and n , a.s. We havedd x f p ( x ) = β p h p (1 + α p λ p )( h p + (1 + α p λ p )(1 + h p x )) − p p X j =2 (cid:16) ˜ λ p − ˜ λ pj − x (cid:17) − . Therefore, the continuity follows from the fact that, when h p = h , λ p a.s. → x ≡ ( h + c ) ( h + 1) h − ( h + 1) c and λ p a.s. → b + . In order to establish the positivity, we ﬁrst obtainmin x ∈ [0 , ˜ λ p − ˜ x ] dd x f p ( x ) = β p h p h p − h p ˜ x − p p X j =2 (cid:16) ˜ x − ˜ λ pj (cid:17) − . 
It is straightforward to verify that the above equation can be represented in the following20orm min x ∈ [0 , ˜ λ p − ˜ x ] dd x f p ( x ) = β p h p (1 + αx )1 + h p + αx + p − p (1 + αx )+ 12 α p p (1 + αx ) p X j =2 ( λ pj − αx /α p ) − . where x = ˜ x / ( α (1 − ˜ x )) . Therefore, we obtainmin x ∈ [0 , ˜ λ p − ˜ x ] dd x f p ( x ) a.s. → Ψ( x , h ) (41)where Ψ( x , h ) = βh (1 + αx )2(1 + h + αx ) + 12 (1 + αx ) + 12 α (1 + αx ) m ( x ) ,β = c − + c − − , and m ( x ) = lim z → x m ( z ) with m ( z ) being the Stieltjes transform of the limiting spec-tral distribution of F , that is the distribution with density (10).Since m ( x ) is increasing on x ∈ ( b + , ∞ ), we haveΨ( x , h ) > lim x ↓ b + Ψ( x , h ) . Moreover, noting the fact that lim x ↓ b + Ψ( x , h ) is an increasing function of h and h > ¯ h ≡ ( c + r ) / (1 − c ) = p b + −

1, we obtainΨ( x , h ) > lim x ↓ b + Ψ( x , h ) > lim x ↓ b + Ψ (cid:0) x , ¯ h (cid:1) . (42)Finally, direct calculations, which are not reported here to save space, show that, as x converges to b + from the right, m ( x ) → − / ( b + − p b + ) . (43)This in turn gives lim x ↓ b + Ψ (cid:0) x, ¯ h (cid:1) = 0 (44)which establishes the positivity.Since λ p a.s. → x , we have f ′ p (0) ≡ dd x f p ( x ) (cid:12)(cid:12)(cid:12)(cid:12) x =0 a.s. → H , H = r c c (1 + αx ) h h + αx + 12 (1 + αx ) + 12 α (1 + αx ) m ( x ) . Direct calculations show that m ( x ) = lim z → x m ( z ) = − (1 + h ) / ( x h ) , (45)which, after some algebraic manipulations, gives (38).We may now exploit the approach given in Olver (1997, pp. 81-82) to yield Z ˜ λ p − ˜ x e − pf p ( x ) x − d x P ∼ (cid:18) πpf ′ p (0) (cid:19) e − pf p (0) . Therefore, we obtain I ,p ( γ, Λ) P ∼ F (cid:16) ˜ λ p (cid:17) (cid:18) − πpH (cid:19) p Y j =2 (cid:16) ˜ λ p − ˜ λ pj (cid:17) − . (46)As Lemma 16 below shows, I , ,p ( γ, Λ) is asymptotically dominated by I ,p ( θ, λ ), whichcompletes the proof. (cid:3) Lemma 16

Under the hypothesis that h p = h , uniformly in γ from any compact subsetof R I , ,p ( γ, Λ) = o P ( I ,p ( γ, Λ)) . (47) Proof:

Let us ﬁrst consider the integral over the contour K . For z ∈ K , we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) F ( z ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < F (cid:16) ˜ λ p (cid:17) e − pf p ( ˜ λ p − ˜ x ) (cid:16) ˜ λ p − ˜ x (cid:17) − . Also, in view of (41), (42), and (44), we have f p (˜ λ p − ˜ x ) > f p (0) + ǫ , for suﬃcientlylarge p and n , a.s., where ǫ >

0. Therefore, using (46), we conclude Z K F ( z ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z = o P ( I ,p ( γ, Λ)) . (48)Now consider the integral over the contour K . We have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Z K F ( z ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) < p Y j =1 (cid:12)(cid:12)(cid:12) ˜ λ pj − ˜ x (cid:12)(cid:12)(cid:12) − Z ˜ x −∞ F ( x ) d x = 4 1 + h p h p ( n − p ) (cid:18) − h p h p ˜ x (cid:19) F (˜ x ) p Y j =1 (cid:12)(cid:12)(cid:12) ˜ λ pj − ˜ x (cid:12)(cid:12)(cid:12) − . F (˜ x ) p Y j =1 (cid:16) ˜ λ pj − ˜ x (cid:17) − = F (cid:16) ˜ λ p (cid:17) e − pf p ( ˜ λ p − ˜ x ) (cid:16) ˜ λ p − ˜ x (cid:17) − , we can follow a similar procedure to that outlined above to obtain Z K F ( z ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z = o P ( I ,p ( γ, Λ)) . This along with (48) gives (47). (cid:3)

Consider the following integral J p ( γ, Λ) = Z ˜ K F ( ζ ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z, where F ( ζ ) ≡ F ( n A u + 1 , n A v + 1; n A ζ )with u = n A + n − p n A , v = n A − p n A , and ζ = h p z. In Johnstone and Onatski (2014) (Theorem 5), the following result is derived. As p, n → c ∞ , F ( ζ ) = C ( p, n ) e − n A ϕ ( ζ ) √ πn A i (cid:0) ψ ( ζ ) + O (cid:0) n − A (cid:1)(cid:1) , (49)where O (cid:0) n − A (cid:1) is uniform for ζ that do not approach zero or negative semi-axis and R ( ζ ) ≥ − u + v , C ( p, n ) = Γ ( n A v + 1) Γ ( n A ( u − v ) + 1)Γ ( n A u + 1) ,ϕ ( ζ ) = ( u − v ) ln ( u − v ) + v ln ( z + + v ) − u ln ( z + + u ) − z + , (50)where the principal branches of the logarithms are chosen, z + = 12 (cid:26) ζ − v + q ( ζ − v ) + 4 uζ (cid:27) , (51)23 x Im ( z ) 1 Re ( z ) α p λ p α p λ p +1 C +1 C +2 C +3 C +4 α p λ p α p λ p +12( − u + v ) h p Figure 2: The contour C .where the principal branch of the square root is chosen when R ( ζ ) ≥ − u + v and theother branch is chosen when Re ζ < − u + v , and ψ ( ζ ) = " ( z + − ζ ) s uz − u − v ( z + − ζ ) − , where the branch of the square root is chosen so that √− − i. We will deform the contour ˜ K , without changing the integral’s value with probabilityapproaching one as p, n → c ∞ , as shown in Figure 2. Formally, C = C + ∪ C + with C + = C +1 ∪ C +2 ∪ C +3 ∪ C +4 , where C +1 = { z : ℑ ( z ) ≥ | z − ˜ λ p | = ǫ } , C +2 = { z : z ∈ [˜ x , ˜ λ p − ǫ ] } , C +3 = { z : z = 2 ζ/h p s.t. R ( ζ ) ≥ − u + v, ℑ ( ζ ) ≥ , | z + + u | = | z + u |} , C +4 = { z : z = 2 ζ/h p s.t. R ( ζ ) < − u + v, and ℑ ( z + ) = | z + u |} . Here z = 12 (cid:26) ζ − v + q ( ζ − v ) + 4 uζ (cid:27) , and (52) ζ = h p x . 
emma 17 Under the hypothesis that h p = h , uniformly in γ from any compact subsetof R J p ( γ, Λ) P ∼ Z ( p, n , h ) e − n A ϕ (cid:16) hp ˜ λ p (cid:17) (cid:18) − πpH (cid:19) / p Y j =2 (cid:16) ˜ λ p − ˜ λ pj (cid:17) − , where Z ( p, n , h ) = C ( p, n ) √ πp c + c µ p c µ + c − c + 2 c µ , and H and µ are as deﬁned in Lemma 15. Proof : Similar to the case of Setting 1, we split J p ( γ, Λ) into two parts J p ( γ, Λ) = J , ,p ( γ, Λ) + J , ,p ( γ, Λ) , where J , ,p ( γ, Λ) is the part of the integral corresponding to C ∪ C , and J , ,p ( θ, λ ) isthe part corresponding to the rest of the contour, C ∪ C . Furthermore, J , ,p ( γ, Λ) P ∼ J ,p ( γ, Λ) , (53)where J ,p ( γ, Λ) = Z ˜ x ˜ λ p C ( p, n ) e − n A ϕ ( ζ ) √ πn A i ψ ( ζ ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z. In contrast to (40), we only have the asymptotic equivalence in (53) because we are usingthe uniform asymptotic approximation (49) to deﬁne J ,p ( γ, Λ) . After the change of the variable of integration, ζ x = h p ˜ λ p − ζ, we obtain J ,p ( γ, Λ) = − Z hp ( ˜ λ p − ˜ x ) C ( p, n ) e − n A ϕ ( ζ ) √ πn A i ψ ( ζ ) 2 h p p Y j =1 (cid:18) h p ζ − ˜ λ pj (cid:19) − d x. This can be rewritten as J ,p ( γ, Λ) = (cid:18) − h p (cid:19) C ( p, n ) √ πn A i Z hp ( ˜ λ p − ˜ x ) e − n A g p ( ζ ) ψ ( ζ ) x − d x, where g p ( ζ ) = ϕ ( ζ ) + 12 n A X pj =2 ln (cid:18) h p ζ − ˜ λ pj (cid:19) , and ζ = h p λ p − x. Following the approach in the above analysis in the case of Setting 1, we now wouldlike to show that the derivative dd x g p ( h p ˜ λ p − x ) is continuous and positive on x ∈ [0 , h p (cid:16) ˜ λ p − ˜ x (cid:17) ] for suﬃciently large p and n , a.s. This is equivalent to showing that25 d ζ g p ( ζ ) is continuous and negative on ζ ∈ [ h p ˜ x , h p ˜ λ p ] for suﬃciently large p and n , a.s.It is straightforward to verify that z + satisﬁes the quadratic equation z + ( v − ζ ) z + − uζ = 0 , and ζ = z + ( z + + v ) z + + u . 
(54)From this, and the deﬁnition (51) of z + , we obtain that, z + > ζ for positive ζ, anddd ζ z + = z + + u z + + v − ζ = ( u + z + ) uv + 2 uz + + z > . (55)On the other hand,dd z + ϕ ( ζ ) = vz + + v − uz + + u − − uv + 2 uz + + z ( v + z + ) ( u + z + ) < ϕ ( ζ ) is strictly decreasing function of ζ. Furthermore, it is a convex function of ζ >

0. Indeed, d d z ϕ ( ζ ) = − v ( z + + v ) + u ( z + + u ) = ( u − v ) (cid:0) z − uv (cid:1) ( z + + v ) ( z + + u ) , and, using (54) and (55), we also haved d ζ z + = − u ( u + z + ) u − v (cid:0) uv + 2 uz + + z (cid:1) . Therefore, we obtaind d ζ ϕ ( ζ ) = d d z ϕ ( ζ ) (cid:18) dd ζ z + (cid:19) + dd z + ϕ ( ζ ) d d ζ z + = ( u + z + ) ( u − v )( v + z + ) (cid:0) uv + 2 uz + + z (cid:1) > . Therefore, ϕ ( ζ ) is, indeed, convex for positive ζ, and has a continuous derivative.Further, it is straightforward to see that w ( ζ ) ≡ n A X pj =2 ln (cid:18) h p ζ − ˜ λ pj (cid:19)

26s a strictly increasing concave function of ζ > h p ˜ λ p . This implies thatmax ζ ∈ h hp ˜ x , hp ˜ λ p i dd ζ g p ( ζ ) < dd ζ ϕ ( ζ ) (cid:12)(cid:12)(cid:12)(cid:12) ζ = hp ˜ λ p + dd ζ w ( ζ ) (cid:12)(cid:12)(cid:12)(cid:12) ζ = hp ˜ x = − uv + 2 uz + + z ( v + z + ) ( u + z + ) (cid:18) u ( u − v )2 z + ( u + z + ) (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) ζ = hp ˜ λ p − ph p n A p − p (1 + αx ) − ph p n A α p p (1 + αx ) p X j =2 ( λ j − αx /α p ) − The right hand side of the latter equality a.s. converges toΠ( x , h ) = − c c ( h + 1) − − c h (1 + αx ) − c h α (1 + αx ) m ( x ) , where m ( z ) is the Stieltjes transform of the limiting spectral distribution of F . Since m ( x ) is an increasing function of x > b + ,Π( x , h ) < lim x ↓ b + Π( x , h ) . On the other hand, using (43), we getlim x ↓ b + Π( x , h ) = − c c ( h + 1) − r ( r + c ) c h (1 − c ) ( r + 1) . (56)Note that, considered as a function of h > ¯ h, lim x ↓ b + Π( x , h ) may have positivederivative only when lim x ↓ b + Π( x , h ) < . Indeed,dd h lim x ↓ b + Π( x , h ) = c c ( h + 1) − r ( r + c ) c h (1 − c ) ( r + 1) < h c c ( h + 1) − r ( r + c ) c h (1 − c ) ( r + 1) ! If the latter expression is positive for h > ¯ h >

0, then lim x ↓ b + Π( x , h ) is clearlynegative. Therefore, lim x ↓ b + Π( x , h ) < max (cid:26) , lim x ↓ b + Π( x , ¯ h ) (cid:27) . But, using the deﬁnition ¯ h = ( c + r ) / (1 − c ) in (56), we obtainlim x ↓ b + Π( x , ¯ h ) = − c (1 − c ) c (1 + r ) − r ( r + c ) c ( r + 1) = 0 . ζ ∈ h hp ˜ x , hp ˜ λ p i dd ζ g p ( ζ ) is a.s. negative for suﬃciently large p and n . Now, since λ p a.s. → x , we have g ′ p (0) ≡ − dd ζ g p ( ζ ) (cid:12)(cid:12)(cid:12)(cid:12) ζ = hp ˜ λ p a.s. → R , where R = c c ( h + 1) + 1 + c h (1 + αx ) + c αh (1 + αx ) m ( x ) . Using (45), (36) and (38), we obtain R = 2 c H /h . Exploiting the approach given in Olver (1997, pp. 81-82), we obtain Z hp ( ˜ λ p − ˜ x ) e − n A g p ( ζ ) ψ ( ζ ) x − d x P ∼ (cid:18) πn A g ′ p (0) (cid:19) e − n A g p (cid:16) hp ˜ λ p (cid:17) ψ (cid:18) h αx αx ) (cid:19) . On the other hand, direct calculation shows that ψ (cid:18) h αx αx ) (cid:19) = i √ c + c + c h ) √ c p c µ + c − c + 2 c µ , and g p (cid:18) h p λ p (cid:19) = ϕ (cid:18) h p λ p (cid:19) + 12 n A X pj =2 ln (cid:16) ˜ λ p − ˜ λ pj (cid:17) . Therefore, J ,p ( γ, Λ) P ∼ Z ( p, n , h ) (cid:18) − πpH (cid:19) e − n A ϕ (cid:16) hp ˜ λ p (cid:17) p Y j =2 (cid:16) ˜ λ p − ˜ λ pj (cid:17) − , (57)where Z ( p, n , h ) = C ( p, n ) √ πp c + c µ p c µ + c − c + 2 c µ . As Lemma 18 below shows, J , ,p ( γ, Λ) is asymptotically dominated by J ,p ( θ, λ ), whichcompletes the proof. (cid:3) Lemma 18

Under the hypothesis that h p = h , uniformly in γ from any compact subsetof R J , ,p ( γ, Λ) = o P ( J ,p ( γ, Λ)) . Proof:

Let us ﬁrst consider the integral J ,p ( γ, Λ) over the contour C . For z ∈ C , bydeﬁnition, we have R ( ζ ) ≡ R ( h p z/ ≥ − u + v . Therefore, the uniform approximation2849) is still valid, and we have J ,p ( γ, Λ) P ∼ Z C C ( p, n ) e − n A ϕ ( ζ ) √ πn A i ψ ( ζ ) p Y j =1 (cid:16) z − ˜ λ pj (cid:17) − d z (58)= Z C C ( p, n ) e − n A g p ( ζ ) √ πn A i ψ ( ζ ) (cid:16) z − ˜ λ p (cid:17) − d z. Let us show that, for ζ = h p z/ z ∈ C , R g p ( ζ ) > g p ( h p ˜ x / . (59)Recall that g p ( ζ ) = ϕ ( ζ ) + 12 n A X pj =2 ln (cid:18) h p ζ − ˜ λ pj (cid:19) , and ϕ ( ζ ) = ( u − v ) ln ( u − v ) + v ln ( z + + v ) − u ln ( z + + u ) − z + . By deﬁnition of C , as z moves along C away from ˜ x , ζ is changing so that z + movesalong a circle with center at − u and radius z + u, where z is as deﬁned in (52).In particular, | z + + u | remains constant, R ( − z + ) increases, and, since v < u, | z + + v | increases too. Overall, R ( ϕ ( ζ )) = ( u − v ) ln ( u − v ) + v ln | z + + v | − u ln | z + + u | + R ( − z + )is increasing. Note also that | ζ | = | z + | | z + + v | / | z + + u | must increase, which impliesthat (cid:12)(cid:12)(cid:12) h p ζ − ˜ λ pj (cid:12)(cid:12)(cid:12) is increasing for all j ≥ , and thus, R (cid:18) n A X pj =2 ln (cid:18) h p ζ − ˜ λ pj (cid:19)(cid:19) is increasing too. This implies (59).On the other hand, in the above proof of Lemma 17 we have shown that dd ζ g p ( ζ ) iscontinuous and negative on ζ ∈ [ h p ˜ x , h p ˜ λ p ] . Hence, there must exist

C > ζ = h p z/ z ∈ C , (cid:12)(cid:12)(cid:12) e − n A g p ( ζ ) (cid:12)(cid:12)(cid:12) ≤ e − n A C e − n A ϕ (cid:16) hp ˜ λ p (cid:17) p Y j =2 (cid:16) ˜ λ p − ˜ λ pj (cid:17) − . This inequality, together with (57) and (58) imply that J ,p ( γ, Λ) = o P ( J ,p ( γ, Λ)) . J ,p ( γ, Λ) = o P ( J ,p ( γ, Λ)) follows from pp. 29-31 of Johnstone andOnatski (2014). (cid:3)

Let us denote the likelihood ratio by L p ( γ, Λ) = f (Λ; h p ) f (Λ; h ) . (60)From Lemmas 14 and 15, we obtain the following expression f (Λ; h p ) = 12 πi C p (Λ) k p ( h p ) I p ( γ, Λ) . Using Lemma 15, we obtain L p ( γ, Λ) P ∼ k p ( h p ) k p ( h )  − h p h p ˜ λ p − h h ˜ λ p  p − n − . (61)Consider a new local parameter θ = γ/ω ( h ) , where ω ( h ) = 2 h (1 + h ) r ( h − c (1 + h )) . We have the following lemma.

Lemma 19

Let Under the null hypothesis that h = h , uniformly in θ from any compactsubset of R , ln L p ( γ, Λ) = θ √ p ( λ p − x p ) − θ τ ( h ) + o P (1) where x p = ( h + p/n ) ( h + 1) h − ( h + 1) p/n , and τ ( h ) = 2 r h ( h + 1) (cid:16) h − c ( h + 1) − c (cid:17) ( c − h + c h ) . roof: Taking the logarithm of (61) yieldsln L p ( γ, Λ) = n + 2 − p ln − ˜ λ p h h ! − ln − ˜ λ p h p h p !! − (cid:18) p − (cid:19) ln h p h + (cid:18) p − n − (cid:19) ln 1 + h p h + o P (1) . (62)Moreover, we have the following expansionsln − ˜ λ p h h ! − ln − ˜ λ p h p h p ! = p − γ ˜ λ p (1 + h ) (cid:16) h (1 − ˜ λ p ) (cid:17) − p − γ ˜ λ p (1 + h ) (1 + h (1 − ˜ λ p )) (63)+ p − γ ˜ λ p h ) (1 + h (1 − ˜ λ p )) + o P ( p − ) , ln 1 + h p h = p − γ

11 + h − p − γ h ) + o ( p − ) , (64)and ln h p h = p − γh − − p − γ h − + o ( p − ) . (65)Finally, using (63), (64), and (65) in (62) and noting the fact that λ − x p a.s. → , weobtain the statement of the lemma by straightforward algebraic manipulations. (cid:3) Lemma 19 together with the asymptotic normality of √ p ( λ p − x p ) established inProposition 11 imply, via Le Cam’s First Lemma (see van der Vaart (1998), p.88),that the sequences of the probability measures { P h ,p } and n P h + γ/ √ p,p o describingthe joint distribution of the eigenvalues of F under the null H : h p = h and un-der the local alternative H : h p = h + γ/ √ p are mutually contiguous. Moreover,the experiments (cid:16) P h + θ ω ( h ) / √ p,p : θ ∈ R (cid:17) converge to the Gaussian shift experiment (cid:0) N (cid:0) θ , τ ( h ) (cid:1) : θ ∈ R (cid:1) . In particular, these experiments are LAN . Let us denote the likelihood ratio by L p ( γ, Λ) = f (Λ; h p ) f (Λ; h ) . (66)From Lemmas 14 and 17, we obtain the following expression f (Λ; h p ) = 12 πi C p (Λ) k p ( h p ) J p ( γ, Λ) . L p ( γ, Λ) P ∼ exp h − n A X j =1 ( a j ( h p ) − a j ( h )) i , (67)where a ( h ) = h + ln h ,a ( h ) = −  h λ p − v + s(cid:18) h λ p − v (cid:19) + 4 u h λ p  ,a ( h ) = − u ln   h λ p − v + s(cid:18) h λ p − v (cid:19) + 4 u h λ p  , and a ( h ) = ( u − v ) ln   − h λ p − v + s(cid:18) h λ p − v (cid:19) + 4 u h λ p  . We would like, ﬁrst, to expand a j ( h p ) − a j ( h ) , with j = 1 , ..., , in the power seriesof γ/ √ p up to, and including, the terms of order O P (cid:16) p (cid:17) . For a , we have a ( h p ) − a ( h ) = h + 12 h γ √ p − h γ p . (68)For a , note that (cid:18) h p λ p − v (cid:19) + 4 u h p λ p = (cid:18) h λ p − v (cid:19) + 4 u h λ p + (cid:18) h λ p + 2 u − v (cid:19) γ √ p ˜ λ p + γ p ˜ λ p . Using this expression and the facts that, when h p = h , ˜ λ p a.s. 
→ ( h + 1) ( c + h ) h (1 + c /c + h ) , u → c /c − c , and v → − c , we obtain after some algebra, a ( h p ) − a ( h ) = − ˜ λ p h ˜ λ p + 2 u − vS ! γ √ p + C (2) γ p + o P (cid:18) p (cid:19) , (69)where S = s(cid:18) h λ p − v (cid:19) + 4 u h λ p , C (2) = c ( h + 1) ( c + h ) ( c + c − c c ) ( c + c + h c )2 h (cid:0) c h + c + c + c h + c + 2 h c (cid:1) . For a , we have a ( h p ) − a ( h ) = − u ˜ λ p (cid:16) h ˜ λ p + 2 u − v + S (cid:17) (cid:18)(cid:16) h ˜ λ p − v (cid:17) + 4 u h ˜ λ p + (cid:16) h ˜ λ p − v (cid:17) S (cid:19) γ √ p (70)+ C (2) C (3) γ p + o P (cid:18) p (cid:19) , where C (3) = ( c + c − c c ) c ( c + h ) + (cid:0) c h + c + c + c h + c + 2 h c (cid:1) ( c + c + c h )2 c c ( c + h ) . Finally, for a , we obtain a ( h p ) − a ( h ) = ( u − v ) ˜ λ p (cid:16) h ˜ λ p + 2 u − v − S (cid:17) (cid:18)(cid:16) h ˜ λ p − v (cid:17) + 4 u h ˜ λ p + (cid:16) − h ˜ λ p − v (cid:17) S (cid:19) γ √ p (71) − C (2) C (4) γ p + o P (cid:18) p (cid:19) , where C (4) = c + c + c h c ( c + h ) + ( c + c − c c ) (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) c ( c + h ) ( c + c + c h ) . Summing up the γ /p terms in the expansions (68-71), we obtain that the γ /p term inthe expansion of X j =1 ( a j ( h p ) − a j ( h )), which we will refer as T , equals T = − c c + c + c h − h + 2 c h h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) γ p . (72)Now let ∆ = √ p ( λ p − x p ) , where x p = ( h + p/n ) ( h + 1) h − ( h + 1) p/n ,

33o that ˜ λ p = λ p n n + λ p = x p + ∆ / √ p n n + x p + ∆ / √ p = ( h + 1) ( p/n + h ) h (1 + n /n + h ) + ∆ √ p c c ( c + c h − h ) h ( c + c + c h ) + o P (cid:18) √ p (cid:19) , Our next goal is to expand the weights on γ/ √ p in expansions (68-71) into power seriesof ∆ / √ p up to the linear term only.For (69), we have − ˜ λ p h ˜ λ p + 2 u − vS ! = τ (2)0 + h τ (2)11 + τ (2)12 i ∆ √ p + o P (cid:18) √ p (cid:19) , where τ (2)11 = − c ( c + c h − h ) h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) ,τ (2)12 = c ( h + 1) ( c + h ) ( c + c − c c ) ( c + c h − h ) h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) , and τ (2)0 is a complicated function of h , p, n , and n , which we do not report here.For (70), we have − u ˜ λ p (cid:16) h ˜ λ p + 2 u − v + S (cid:17) (cid:18)(cid:16) h ˜ λ p − v (cid:17) + 4 u h ˜ λ p + (cid:16) h ˜ λ p − v (cid:17) S (cid:19) = τ (3)0 + h τ (3)11 + τ (3)12 i ∆ √ p + o P (cid:18) √ p (cid:19) , where τ (3)11 = c ( c − h + c h ) ( c + c − c c ) ( c + c + c h ) (cid:0) c + c + c h − c + 2 c h (cid:1) h c (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) ,τ (3)12 = − c ( h + 1) ( c − h + c h ) ( c + c − c c )2 h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) , and τ (3)0 is a complicated function of h , p, n , and n , which we do not report here.For (71), we have( u − v ) ˜ λ p (cid:16) h ˜ λ p + 2 u − v − S (cid:17) (cid:18)(cid:16) h ˜ λ p − v (cid:17) + 4 u h ˜ λ p + (cid:16) − h ˜ λ p − v (cid:17) S (cid:19) = τ (4)0 + h τ (4)11 + τ (4)12 i ∆ √ p + o P (cid:18) √ p (cid:19) , τ (4)11 = c ( c − h + c h ) ( c + c − c c ) (cid:0) − c − c + c h + c + 2 c c + 2 c c h (cid:1) c h ( c + c + c h ) (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) ,τ (4)12 = − c ( h + 1) ( c − h + c h ) ( c + c − c c )2 h ( c + c + c h ) (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) , and τ (4)0 is a complicated function of h , p, n , and n , which we do not report here.We have veriﬁed, using Maple symbolic algebra software, that τ (2) + τ (3) + τ (4) 
= −

12 1 + h h , which is exactly the negative of the term on γ/ √ p in (68). Hence, the term on γ/ √ p inthe expansion of X j =1 ( a j ( h p ) − a j ( h )) is zero. Further, we have veriﬁed that X j =2 (cid:16) τ ( j )11 + τ ( j )12 (cid:17) = − c ( c − h + c h ) h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) . This equality, together with (67) and (72) imply thatln L p ( γ, Λ) P ∼

12 ( c − h + c h ) h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) γ ∆ (73)+ 14 c + c + c h − h + 2 c h h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) γ . Consider a diﬀerent local parameter θ = γ/ω ( h ) , where ω ( h ) = 2 h (cid:0) c + c + c h + c + 2 c h + 2 c h (cid:1) ( h − c (1 + h )) . Asymptotic approximation (73) implies the following lemma.

Lemma 20

Under the null hypothesis that h = h , uniformly in θ from any compactsubset of R , ln L p ( γ, Λ) = θ √ p ( λ p − x p ) − θ τ ( h ) + o P (1)35 here x p = ( h + p/n ) ( h + 1) h − ( h + 1) p/n , and τ ( h ) = 2 h (cid:16) h − c (1 + h ) − c (cid:17) (cid:16) ( c + c ) (1 + h ) − c (cid:0) h − c (cid:1)(cid:17) ( c − h + c h ) . Similarly to the case of Setting 1, Lemma 20 together with the asymptotic normalityof √ p ( λ p − x p ) established in Proposition 11 imply, via Le Cam’s First Lemma (seevan der Vaart (1998), p.88), that the sequences of the probability measures { P h ,p } and n P h + γ/ √ p,p o describing the joint distribution of the eigenvalues of F under the null H : h p = h and under the local alternative H : h p = h + γ/ √ p are mutually contiguous.Moreover, the experiments (cid:16) P h + θ ω ( h ) / √ p,p : θ ∈ R (cid:17) converge to the Gaussian shiftexperiment (cid:0) N (cid:0) θ , τ ( h ) (cid:1) : θ ∈ R (cid:1) . In particular, these experiments are LAN . In this paper, we establish the Local Asymptotic Normality of the experiments of observ-ing the eigenvalues of the F-ratio F ≡ ( B/n ) − A/n A of two large-dimensional Wishartmatrices. The experiments are parameterized by the value of a single spike that de-scribes the “ratio” of the covariance parameters of A and B , or, in the case of equalcovariance parameters, the non-centrality parameter of A . We ﬁnd that the asymptoticbehavior of the log ratio of the joint density of the eigenvalues of F , which correspondsto a super-critical spike, to their joint density under a local deviation from this valuedepends only on the largest eigenvalue λ p . This implies, in particular, that the beststatistical inference about a super-critical spike in the local asymptotic regime is basedon the largest eigenvalue only.As a by-product of our analysis, in a multi-spike setting, we establish the jointasymptotic normality of a few of the largest eigenvalues of F that correspond to thesuper-critical spikes. 
We derive an explicit formulas for the almost sure limits of theseeigenvalues, and for the asymptotic variances of their ﬂuctuations around these limits. This work was supported in part by NIH grant 5R01 EB 001988 (PD, IMJ), the SimonsFoundation Math + X program (PD), NSF grant DMS 1407813 (IMJ), and the the J.M.Keynes Fellowships Fund, University of Cambridge (AO).36

Appendix

We will need the following two lemmas.

Lemma 21 (McLeish 1974) Let { X pj , G pj , j = 1 , ..., p } be a martingale diﬀerence arrayon the probability triple (Ω , G , P ) . If the following conditions are satisﬁed: a) Lindeberg’scondition: for all ε > , X j R | X pj | >ε X pj d P → as p → ∞ ; b) X j X pj P → , then X j X pj d → N (0 , . Proof:

This is a consequence of Theorem (2.3) of McLeish (1974). Two conditions ofthe theorem: i) max j ≤ p | X pj | is uniformly bounded in L norm, and ii) max j ≤ p | X pj | P → (cid:3) Lemma 22 (Hall and Heyde) Let { X pj , G pj , j = 1 , ..., p } be a martingale diﬀerence ar-ray, and deﬁne V pJ = X Jj =1 E (cid:16) X pj |G p,j − (cid:17) and U pJ = X Jj =1 X pj for J = 1 , ..., p .Suppose that the conditional variances V pp are tight, that is sup p P (cid:0) V pp > ε (cid:1) → as ε → ∞ , and that the conditional Lindeberg condition holds, that is, for all ε > , X j E h X pj {| X pj | > ε } |G p,j − i P → . Then max J (cid:12)(cid:12)(cid:12) U pJ − V pJ (cid:12)(cid:12)(cid:12) P → . Proof:

This is a shortened version of Theorem 2.23 in Hall and Heyde (1980). (cid:3)

Let $f_q(\lambda)$, $q = 1, \ldots, Q$, be such that $f_q(\lambda) = g_q(\lambda)$ for $\lambda \in [l_i, L_i]$ and $f_q(\lambda) = 0$ otherwise. Consider the random variables
$$X_{pj} = \frac{1}{\sqrt{p}} \sum_{(q,s,t) \in \Theta} \gamma_{qst}\, f_q\bigl(\lambda^{(i)}_{pj}\bigr)\bigl(\zeta_{js}\zeta_{jt} - \delta_{st}\bigr),$$
where $\gamma_{qst}$ are some constants. Let $\mathcal{G}_{pJ}$ be the $\sigma$-algebra generated by $\lambda^{(i)}_{p1}, \ldots, \lambda^{(i)}_{pp}$ and $\zeta_{js}$ with $j = 1, \ldots, J$; $s = 1, \ldots, m$. Clearly, $\{X_{pj}, \mathcal{G}_{pj}, j = 1, \ldots, p\}$ form a martingale difference array.

Let $K$ be the number of different triples $(q, s, t) \in \Theta$, and consider an arbitrary order in $\Theta$. In Hölder's inequality
$$\sum_{a=1}^{K} y_a z_a \le \Bigl(\sum_{a=1}^{K} y_a^{b}\Bigr)^{1/b} \Bigl(\sum_{a=1}^{K} z_a^{c}\Bigr)^{1/c},$$
which holds for $y_a \ge 0$, $z_a \ge 0$, $b > 1$, $c > 1$, and $1/b + 1/c = 1$, take $y_a = \bigl|\frac{1}{\sqrt{p}}\gamma_{qst} f_q\bigl(\lambda^{(i)}_{pj}\bigr)(\zeta_{js}\zeta_{jt} - \delta_{st})\bigr|$, where $(q, s, t)$ is the $a$-th triple in $\Theta$, $z_a = 1$, and $b = 2 + \delta$ for some $\delta > 0$. Then, since $|X_{pj}| \le \sum_a y_a$ and $b/c = 1 + \delta$, the inequality implies that
$$|X_{pj}|^{2+\delta} \le K^{1+\delta} R_i^{2+\delta} \sum_{(q,s,t) \in \Theta} \Bigl|\frac{1}{\sqrt{p}}\gamma_{qst}\bigl(\zeta_{js}\zeta_{jt} - \delta_{st}\bigr)\Bigr|^{2+\delta}, \tag{74}$$
where $R_i = \max_{q=1,\ldots,Q} \sup_{\lambda \in [l_i, L_i]} |g_q(\lambda)|$. Since the $\zeta_{js}$ are i.i.d. $N(0,1)$, each summand on the right-hand side of (74) has expectation of order $p^{-1-\delta/2}$, so (74) implies that $\sum_{j=1}^{p} E|X_{pj}|^{2+\delta} \to 0$ as $p \to \infty$, which means that the Lyapunov condition holds for $X_{pj}$. As is well known, Lyapunov's condition implies Lindeberg's condition. Hence, condition a) of Lemma 21 is satisfied for $X_{pj}$.

Let us consider $\sum_{j=1}^{p} X_{pj}^2$. Since convergence in mean implies convergence in probability, the conditional Lindeberg condition is satisfied for $X_{pj}$ because the unconditional Lindeberg condition is satisfied, as checked above. Further, in the notation of Lemma 22, it is easy to see that
$$V_{pp}^2 = \sum_{q,q'} \Bigl[\Bigl(\sum_{1 \le s \le t \le m} \gamma_{qst}\gamma_{q'st}(1 + \delta_{st})\Bigr) \frac{1}{p}\sum_{j=1}^{p} f_q\bigl(\lambda^{(i)}_{pj}\bigr) f_{q'}\bigl(\lambda^{(i)}_{pj}\bigr)\Bigr].$$
The convergence of the empirical distribution of $\lambda^{(i)}_{p1}, \ldots, \lambda^{(i)}_{pp}$ to $G_{x_i}$ and the equality of $g_q$ and $f_q$ on the support of $G_{x_i}$ imply that
$$V_{pp}^2 \xrightarrow{P} \Sigma \equiv \sum_{q,q'} \Bigl[\Bigl(\sum_{1 \le s \le t \le m} \gamma_{qst}\gamma_{q'st}(1 + \delta_{st})\Bigr) \int g_q(\lambda)\, g_{q'}(\lambda)\, dG_{x_i}(\lambda)\Bigr].$$
In particular, $V_{pp}^2$ is tight and Lemma 22 applies. Therefore, $\sum_{j=1}^{p} X_{pj}^2$ converges in probability to the same limit $\Sigma$ as $V_{pp}^2$. Thus, by Lemma 21 (applied to $X_{pj}/\Sigma^{1/2}$ when $\Sigma > 0$; when $\Sigma = 0$ the sum converges to zero in probability), we get $\sum_{j=1}^{p} X_{pj} \xrightarrow{d} N(0, \Sigma)$. Finally, let
$$Y_{pj} = \frac{1}{\sqrt{p}} \sum_{(q,s,t) \in \Theta} \gamma_{qst}\, g_q\bigl(\lambda^{(i)}_{pj}\bigr)\bigl(\zeta_{js}\zeta_{jt} - \delta_{st}\bigr).$$
Since $\Pr\bigl(\sum_{j=1}^{p} X_{pj} = \sum_{j=1}^{p} Y_{pj}\bigr) \to 1$ as $p \to \infty$, we have $\sum_{j=1}^{p} Y_{pj} \xrightarrow{d} N(0, \Sigma)$. Lemma 8 follows from this convergence via the Cramér-Wold device. □

Derivation of (29), (30), and (31)

Expression (29) immediately follows from (15). Next, differentiating identity (13) with respect to $z$, we obtain
$$1 + \frac{c_1 m_x'(z)}{\bigl(1 + c_1 m_x(z)\bigr)^2} = \frac{m_x'(z)}{m_x(z)^2} + \frac{-x^2 c_2 m_x'(z)}{\bigl(1 - c_2 x\, m_x(z)\bigr)^2}.$$
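The two specializations at $z = 0$, $x = x_i$ carried out next are linear in $m'_{x_i}(0)$ and in $dm_{x_i}(0)/dx$, so they can be checked in exact rational arithmetic. The sketch below assumes that definition (17) reads $x_i = (h_i + c_1)(h_i + 1)/(h_i - c_2(h_i + 1))$, the same form as $x_p$ in Setting 2 (an assumption, since (17) is not reproduced in this section), and uses (75) for $m_{x_i}(0)$. Solving the displays gives $m'_{x_i}(0) = h_i^2/[(h_i + c_1)^2(h_i^2 - c_1 - c_2(1 + h_i)^2)]$ and $dm_{x_i}(0)/dx = (h_i - c_2(1 + h_i))^2/[(h_i + c_1)^2(h_i^2 - c_1 - c_2(1 + h_i)^2)]$; these are our closed forms, not quotations of (30) and (31). The helper names are hypothetical.

```python
from fractions import Fraction


def z0_derivatives(h, c1, c2):
    """Solve the two z = 0, x = x_i displays exactly for m'_{x_i}(0) and dm_{x_i}(0)/dx."""
    h, c1, c2 = Fraction(h), Fraction(c1), Fraction(c2)
    if h - c2 * (h + 1) <= 0:
        raise ValueError("need h > c2 * (1 + h)")
    m = -1 / (h + c1)                               # equation (75)
    x = (h + c1) * (h + 1) / (h - c2 * (h + 1))     # ASSUMED form of definition (17)
    # z-derivative at z = 0, rearranged as 1 = m' * k
    k = 1 / m ** 2 - c1 / (1 + c1 * m) ** 2 - c2 * x ** 2 / (1 - c2 * x * m) ** 2
    m_prime = 1 / k
    # x-derivative at z = 0, rearranged as (dm/dx) * k = 1 / (1 - c2 x m)^2
    dm_dx = 1 / (k * (1 - c2 * x * m) ** 2)
    return x, m_prime, dm_dx


def closed_forms(h, c1, c2):
    """Closed forms obtained by simplifying k with (75) and the assumed (17)."""
    h, c1, c2 = Fraction(h), Fraction(c1), Fraction(c2)
    d = (h + c1) ** 2 * (h ** 2 - c1 - c2 * (1 + h) ** 2)
    return h ** 2 / d, (h - c2 * (1 + h)) ** 2 / d
```

The key simplification is $1 - c_2 x_i m_{x_i}(0) = h_i/(h_i - c_2(1 + h_i))$, which turns $x_i^2/(1 - c_2 x_i m_{x_i}(0))^2$ into $(h_i + c_1)^2(h_i + 1)^2/h_i^2$.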
Setting $z = 0$ and $x = x_i$, and using the fact that
$$m_{x_i}(0) = -(h_i + c_1)^{-1}, \tag{75}$$
which follows from (15), we obtain
$$1 + \frac{c_1 m_{x_i}'(0)}{\bigl(1 - c_1 (h_i + c_1)^{-1}\bigr)^2} = \frac{m_{x_i}'(0)}{(h_i + c_1)^{-2}} + \frac{-x_i^2 c_2 m_{x_i}'(0)}{\bigl(1 + c_2 x_i (h_i + c_1)^{-1}\bigr)^2}.$$
Using the definition (17) of $x_i$, we obtain
$$1 + \frac{c_1 m_{x_i}'(0)}{\bigl(1 - c_1 (h_i + c_1)^{-1}\bigr)^2} = \frac{m_{x_i}'(0)}{(h_i + c_1)^{-2}} - \frac{(h_i + c_1)^2 (h_i + 1)^2 c_2 m_{x_i}'(0)}{h_i^2},$$
which implies (30). Finally, differentiating identity (13) with respect to $x$, we obtain
$$\frac{c_1\, dm_x(z)/dx}{\bigl(1 + c_1 m_x(z)\bigr)^2} = \frac{dm_x(z)/dx}{\bigl(m_x(z)\bigr)^2} + \frac{-\bigl(1 - c_2 x\, m_x(z)\bigr) - x\bigl(c_2 m_x(z) + c_2 x\, dm_x(z)/dx\bigr)}{\bigl(1 - c_2 x\, m_x(z)\bigr)^2}.$$
Setting $z = 0$ and $x = x_i$, we obtain
$$\frac{c_1\, dm_{x_i}(0)/dx}{\bigl(1 + c_1 m_{x_i}(0)\bigr)^2} = \frac{dm_{x_i}(0)/dx}{\bigl(m_{x_i}(0)\bigr)^2} + \frac{-1 - c_2 x_i^2\, dm_{x_i}(0)/dx}{\bigl(1 - c_2 x_i\, m_{x_i}(0)\bigr)^2}.$$
This equality, the definition (17) of $x_i$, and equation (75) imply (31).

References

[1] Bai, Z.D. and Silverstein, J.W. (1998) "No Eigenvalues Outside the Support of the Limiting Spectral Distribution of Large-Dimensional Sample Covariance Matrices," The Annals of Probability, Vol. 26, pp. 316-345.
[2] Bai, Z.D. and J. Yao (2008) "Central limit theorems for eigenvalues in a spiked population model," Annales de l'Institut Henri Poincaré - Probabilités et Statistiques 44, 447-474.
[3] Baik, J., G. Ben Arous and S. Péché (2005) "Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices," Annals of Probability 33, 1643-1697.
[4] Bao, Z., J. Hu, G. Pan, and W. Zhou (2014b) "Canonical correlation coefficients of high-dimensional normal vectors: finite rank case," arXiv 1407.7194.
[5] Bao, Z., G. Pan, and W. Zhou (2014a) "Universality for the largest eigenvalue of sample covariance matrices with general population," arXiv: 1304.5690v6.
[6] Benaych-Georges, F., A. Guionnet, and M. Maida (2011) "Fluctuations of the Extreme Eigenvalues of Finite Rank Deformations of Random Matrices," Electronic Journal of Probability 16, 1621-1662.
[7] Bilodeau, M. and M. S. Srivastava (1992) "Estimation of the Eigenvalues of $\Sigma_1\Sigma_2^{-1}$," Journal of Multivariate Analysis 41, 1-13.
[8] Bloemendal, A. and B. Virág (2013) "Limits of spiked random matrices I," Probability Theory and Related Fields 156, 795-825.
[9] Bloemendal, A. and B. Virág (2011) "Limits of spiked random matrices II," arXiv: 1109.3704v1.
[10] Dharmawansa, P. and I. M. Johnstone (2014) "Joint density of eigenvalues in spiked multivariate models," Stat 3, no. 1, 240-249.
[11] Féral, D. and S. Péché (2009) "The largest eigenvalues of sample covariance matrices for a spiked population: diagonal case," Journal of Mathematical Physics 50, 073302.
[12] Hall, P., and C.C. Heyde (1980) Martingale limit theory and its application, New York: Academic Press.
[13] Hayes, J. F. and W. G. Hill (1981) "Modification of Estimates of Parameters in the Construction of Genetic Selection Indices ('Bending')," Biometrics Vol. 37, No. 3, pp. 483-493.
[14] Horn, R. A., and C. R. Johnson (1985) Matrix Analysis, Cambridge University Press.
[15] James, A. T. (1964) "Distributions of matrix variates and latent roots derived from normal samples," Annals of Mathematical Statistics 35, 475-501.
[16] Johnstone, I. M. and A. Onatski (2014) "Likelihood ratio analysis in the sub-critical regime. Case 4," manuscript Case4version7.pdf, in preparation.
[17] Kato, T. (1980) Perturbation Theory for Linear Operators, Springer-Verlag. Berlin, Heidelberg, New York.
[18] Khatri, C. G. (1967) "Some distributional problems associated with the characteristic roots of $S_1S_2^{-1}$," Ann. Math. Stat., vol. 38, no. 3, pp. 944-948.
[19] Kritzman, M. and Y. Li (2010) "Skulls, Financial Turbulence, and Risk Management," Financial Analysts Journal 66, 30-41.
[20] McLeish, D.L. (1974) "Dependent Central Limit Theorems and Invariance Principles," Annals of Probability, Vol. 2, No. 4, pp. 620-628.
[21] Mo, M.Y. (2012) "The rank 1 real Wishart spiked model," Communications on Pure and Applied Mathematics 65, 1528-1638.
[22] Muirhead, R.J. (1982) Aspects of Multivariate Statistical Theory. John Wiley & Sons, Hoboken, New Jersey.
[23] Muirhead, R.J. and T. Verathaworn (1985) "On estimating the latent roots of $\Sigma_1\Sigma_2^{-1}$," in Krishnaiah, P.R. (ed.) Multivariate Analysis - VI, Elsevier Science Publishers B.V., 431-447.
[24] Nadakuditi, R.R. and J. W. Silverstein (2010) "Fundamental Limit of Sample Generalized Eigenvalue Based Detection of Signals in Noise Using Relatively Few Signal-Bearing and Noise-Only Samples," IEEE Journal of Selected Topics in Signal Processing 4 (3), 468-480.
[25] Olver, F.W.J. (1997) Asymptotics and Special Functions
Journal of Econometrics
Real and Complex Analysis, 3rd edition. McGraw-Hill series in higher mathematics.
[34] Sheena, Y., A.K. Gupta, and Y. Fujikoshi (2004) "Estimation of the eigenvalues of noncentrality parameter in matrix variate noncentral beta distribution," Annals of the Institute of Statistical Mathematics 56, 101-125.
[35] Silverstein, J.W. (1995) "Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices," Journal of Multivariate Analysis 55, 331-339.
[36] Silverstein, J.W. and Bai, Z.D. (1995) "On the empirical distribution of eigenvalues of large dimensional random matrices," Journal of Multivariate Analysis 54, 175-192.
[37] Silverstein, J.W. and S. Choi (1995) "Analysis of the Limiting Spectral Distribution of Large Dimensional Random Matrices," Journal of Multivariate Analysis 54, 295-309.
[38] Titchmarsh, E. C. (1960) The Theory of Functions, 2nd ed. Oxford, England. Oxford University Press.
[39] van der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge University Press.
[40] Wachter, K. (1980) "The limiting empirical measure of multiple discriminant ratios,"