Inference on Eigenvalues of Wishart Distribution Using Asymptotics with respect to the Dispersion of Population Eigenvalues
Yo Sheena ∗ and Akimichi Takemura † April, 2007
Abstract
In this paper we derive some new and practical results on testing and interval estimation problems for the population eigenvalues of a Wishart matrix, based on the asymptotic theory for block-wise infinite dispersion of the population eigenvalues. This new type of asymptotic theory has been developed by the present authors in Takemura and Sheena (2005) and Sheena and Takemura (2007a,b), where it was applied to the point estimation problem of the population covariance matrix in a decision-theoretic framework. In this paper we apply it to some testing and interval estimation problems. We show that the approximation based on this type of asymptotics is generally much better than the traditional large-sample asymptotics for these problems.

Key words and phrases: eigenvalues of covariance matrix, Wishart distribution, test on eigenvalues, interval estimation of eigenvalues
∗ Department of Economics, Shinshu University
† Graduate School of Information Science and Technology, University of Tokyo

1 Introduction

Let S = (s_ij) be distributed according to the Wishart distribution W_p(n, Σ), where p is the dimension, n is the degrees of freedom and Σ is the covariance matrix. Let λ_1 ≥ ··· ≥ λ_p > 0 be the eigenvalues of Σ. In this paper we consider some testing and interval estimation problems for the eigenvalues of Σ. Our aim is to give practical solutions to these problems based on the asymptotic theory for block-wise infinite dispersion of the population eigenvalues. In view of the intractability of the finite-sample exact distribution of the sample eigenvalues, the large-sample asymptotic approximation is usually used. There exists an extensive literature on improving the first-order large-sample approximation by an asymptotic expansion (see Siotani et al. (1985) for a comprehensive treatment). However, for a moderate or small value of the sample size n, the large-sample asymptotic theory often gives a poor approximation, and in these cases asymptotic expansions tend to give an even larger error. On the other hand, we find that the approximation based on block-wise infinite dispersion of the population eigenvalues works well even for a small sample size n.

The first problem we consider in this paper is testing the one-sided null hypothesis on the m-th population eigenvalue,

    H_0^(m): λ_m ≥ λ*_m.    (1)

For testing H_0^(m) it is natural to consider a one-sided rejection region based on the m-th sample eigenvalue l_m of S. We show that the least favorable distribution is given by λ*_m = λ_1 = ··· = λ_m and 0 = λ_{m+1} = ··· = λ_p. This is exactly the situation covered by the asymptotic theory for block-wise infinite dispersion, which therefore gives an explicit solution to the testing problem of H_0^(m).

The second problem is the interval estimation of the largest population eigenvalue λ_1 in terms of the largest sample eigenvalue l_1 of S. We will show that the confidence interval based on block-wise infinite dispersion gives a much better coverage probability than the conventional large-sample asymptotics.

The third problem is testing the hypothesis of equality of the several smallest eigenvalues: λ_{m+1} = ··· = λ_p. This problem is important in determining the rank of the systematic part in a multivariate variance component model. We consider an approximation to the null distribution of the likelihood ratio criterion under block-wise infinite dispersion of the population eigenvalues. Again this type of asymptotics gives a much better approximation than the large-sample asymptotics.

The organization of the paper is as follows. In Section 2 we set up notations and give some preliminary results on the asymptotic theory for block-wise infinite dispersion of the population eigenvalues. In Section 3 we study the above three problems: 1) the one-sided test for a population eigenvalue in Section 3.1; 2) interval estimation for extreme eigenvalues in Section 3.2; 3) testing equality of the smallest eigenvalues in Section 3.3.

2 Preliminary results

Denote the spectral decompositions of Σ and S by

    Σ = ΓΛΓ′,    (2)
    S = GLG′,    (3)

where G, Γ ∈ O(p), the group of p × p orthogonal matrices, and Λ = diag(λ_1, ..., λ_p), L = diag(l_1, ..., l_p) are the diagonal matrices of the eigenvalues λ_1 ≥ ··· ≥ λ_p > 0 of Σ and l_1 ≥ ··· ≥ l_p > 0 of S, respectively. We use the notations λ = (λ_1, ..., λ_p) and l = (l_1, ..., l_p) hereafter.
We guarantee the uniqueness (almost surely) of the decomposition (3) by requiring that

    G̃ = (g̃_ij) = Γ′G    (4)

has positive diagonal elements. In Takemura and Sheena (2005) we considered what happens to appropriately normalized components of S if the population eigenvalues become infinitely dispersed, i.e.,

    (λ_2/λ_1, λ_3/λ_2, ..., λ_p/λ_{p−1}) → (0, ..., 0).
In Sheena and Takemura (2007b) we generalized the asymptotic result of Takemura and Sheena (2005) to the case when the population eigenvalues are block-wise infinitely dispersed. Let the population eigenvalues be parameterized as follows:

    λ_i = ξ_i α, if i = 1, ..., m;    λ_i = ξ_i β, if i = m + 1, ..., p,    (5)

where the ξ_i's are fixed and the "asymptotic parameters" α and β vary. When we say the population eigenvalues are "(two-)block-wise infinitely dispersed", it means that

    β/α → 0.    (6)

The above notation is used as a general notation including specific convergences (divergences) such as (α, β) → (∞, 1), (α, β) → (1, 0) and so on. More precisely, the operation lim_{β/α→0} f(α, β) means lim_{i→∞} f(α_i, β_i) for any specific sequences α_i, β_i, i = 1, 2, ..., such that β_i/α_i → 0 as i → ∞.

As Sheena and Takemura (2007b) indicates, an appropriate normalization for the sample eigenvalues is given by

    d_i = l_i/α, if i = 1, ..., m;    d_i = l_i/β, if i = m + 1, ..., p,    (7)

while (4) also serves as an appropriate normalization for the sample eigenvectors. For the normalized population and sample eigenvalues, we use the following notations:

    ξ_(1) = (ξ_1, ..., ξ_m),  ξ_(2) = (ξ_{m+1}, ..., ξ_p),  d_(1) = (d_1, ..., d_m),  d_(2) = (d_{m+1}, ..., d_p),
    Ξ_1 = diag(ξ_1, ..., ξ_m),  Ξ_2 = diag(ξ_{m+1}, ..., ξ_p),  D_1 = diag(d_1, ..., d_m),  D_2 = diag(d_{m+1}, ..., d_p).

Now we state the basic theorem on the asymptotic distributions of d_(1) and d_(2).
Theorem 1 Suppose that we have two independent Wishart matrices W̃_1 ∼ W_m(n, Ξ_1), W̃_2 ∼ W_{p−m}(n − m, Ξ_2) and that their spectral decompositions are given by

    W̃_1 = G̃_1 D̃_1 G̃_1′,  D̃_1 = diag(d̃_1, ..., d̃_m),  d̃_(1) = (d̃_1, ..., d̃_m),
    W̃_2 = G̃_2 D̃_2 G̃_2′,  D̃_2 = diag(d̃_{m+1}, ..., d̃_p),  d̃_(2) = (d̃_{m+1}, ..., d̃_p),

where G̃_1 ∈ O(m), G̃_2 ∈ O(p − m), d̃_1 ≥ ··· ≥ d̃_m and d̃_{m+1} ≥ ··· ≥ d̃_p. Then as β/α → 0,

    d_i →_d d̃_i,  i = 1, ..., p.
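Theorem 1 is easy to check by simulation. The following minimal Python sketch (an illustration added here, not code from the original papers) takes p = 3, m = 1, ξ_1 = ξ_2 = ξ_3 = 1, α = 1 and a small β; the one-dimensional limit W̃_1 reduces to a χ²(n) variable.

```python
import numpy as np
from scipy.stats import wishart, chi2

rng = np.random.default_rng(0)
n, alpha, beta, N = 20, 1.0, 1e-4, 50_000

# S ~ W_3(n, Lambda) with block-wise dispersed Lambda = diag(alpha, beta, beta)
S = wishart(df=n, scale=np.diag([alpha, beta, beta])).rvs(size=N, random_state=rng)
l = np.sort(np.linalg.eigvalsh(S), axis=1)[:, ::-1]   # l_1 >= l_2 >= l_3
d = l / np.array([alpha, beta, beta])                 # normalization (7)

# limit: eigenvalue of W_1(n, 1), i.e. chi2(n), and eigenvalues of W_2(n-1, I_2)
W2 = wishart(df=n - 1, scale=np.eye(2)).rvs(size=N, random_state=rng)
lim = np.column_stack([chi2.rvs(n, size=N, random_state=rng),
                       np.sort(np.linalg.eigvalsh(W2), axis=1)[:, ::-1]])

print("means of d_i  :", d.mean(axis=0))
print("means of limit:", lim.mean(axis=0))
```

With β as small as 10^{-4}, the sample means (and, more generally, the empirical distributions) of the normalized eigenvalues should already be close to those of the limiting eigenvalues.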
Proof. Using Lemma 1 of Sheena and Takemura (2007b), we prove the convergence of the moment generating function. Let

    x(G, l, λ, α, β) = exp( α^{−1} Σ_{i=1}^m l_i θ_i + β^{−1} Σ_{i=m+1}^p l_i θ_i ) = exp( Σ_{i=1}^p d_i θ_i ),  |θ_i| < 3^{−1} min_j ξ_j^{−1}, ∀i.

Notice that (19) in Lemma 1 of Sheena and Takemura (2007b) is satisfied, since

    x(ΓG, l, λ, α, β) ≤ exp( α^{−1} Σ_{i=1}^m l_i |θ_i| + β^{−1} Σ_{i=m+1}^p l_i |θ_i| )
                      ≤ exp( 3^{−1} α^{−1} Σ_{i=1}^m l_i ξ_i^{−1} + 3^{−1} β^{−1} Σ_{i=m+1}^p l_i ξ_i^{−1} )
                      = exp( 3^{−1} Σ_{i=1}^p l_i λ_i^{−1} )
                      ≤ exp( 3^{−1} tr GLG′Λ^{−1} ),  ∀G ∈ O(p), ∀l ∈ {l | l_1 ≥ ··· ≥ l_p ≥ 0}.

For the last inequality, see e.g. Marshall and Olkin (1979), Ch. 20.A.1. Since x(d, q, ξ, α, β; Γ, H^(τ)) = exp( Σ_{i=1}^p d_i θ_i ), trivially we have

    x̄_Γ( H^(τ) G(q_1, q_2, 0), d, Q, ξ ) = exp( Σ_{i=1}^p d_i θ_i ).

Therefore we have

    lim_{β/α→0} E[ exp( Σ_{i=1}^p d_i θ_i ) ] = E[ exp( Σ_{i=1}^m d̃_i(W̃_1) θ_i + Σ_{i=m+1}^p d̃_i(W̃_2) θ_i ) ].

3 Inference on the population eigenvalues

The asymptotic result in the previous section has various possible applications to inference on the population eigenvalues. We give three inference problems as interesting applications.
3.1 One-sided test for a population eigenvalue

Consider the null hypothesis on the m-th (m = 1, ..., p) population eigenvalue,

    H_0^(m): λ_m ≥ λ*_m,

against the alternative

    H_1^(m): λ_m < λ*_m.

The need for testing H_0^(m) arises in some practical cases, for example:

• In principal component analysis, λ* (= λ*_1 = ··· = λ*_p) may be a cut-off value, and a test for H_0^(m) is repeatedly carried out starting from m = 1 until H_0^(m) is rejected. This is one of the methods for deciding the dimension of the principal components (see the code sketch below).

• Let x_i (i = 1, ..., p) be the return of the i-th asset in finance, and suppose x = (x_1, ..., x_p) is distributed according to the p-dimensional normal distribution N_p(0, Σ). H_1^(1) is equivalent to the assertion a′Σa < λ*_1 for all a = (a_1, ..., a_p) with ‖a‖ = 1. If H_0^(1) is rejected, it means that the group of assets x is stable in view of volatility, since the variance of any portfolio in the group never exceeds λ*_1.

A natural rejection region for testing H_0^(m) is given by l_m ≤ l*_m(γ) for a given significance level γ. The following lemma and Theorem 1 give the critical point l*_m(γ).
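The sequential procedure in the first example can be sketched in Python as follows. This is our own illustrative code, not the authors'; all function names are hypothetical, and the critical point l*_m(γ) is the one justified by Theorem 2 below (exact via a χ² percentile for m = 1, approximated by Monte Carlo for m > 1).

```python
import numpy as np
from scipy.stats import wishart, chi2

def crit_point(m, n, lam_star, gamma, N=100_000, seed=0):
    """Lower 100*gamma% point of the smallest eigenvalue of W_m(n, lam_star I_m)."""
    if m == 1:
        return lam_star * chi2.ppf(gamma, df=n)        # exact for m = 1
    rng = np.random.default_rng(seed)
    W = wishart(df=n, scale=lam_star * np.eye(m)).rvs(size=N, random_state=rng)
    return np.quantile(np.linalg.eigvalsh(W)[:, 0], gamma)  # Monte Carlo

def pca_dimension(sample_eigs, n, lam_star, gamma=0.05):
    """Test H_0^(m): lambda_m >= lam_star for m = 1, 2, ...; stop at first rejection."""
    for m, l_m in enumerate(sample_eigs, start=1):     # sample_eigs in descending order
        if l_m <= crit_point(m, n, lam_star, gamma):   # reject H_0^(m)
            return m - 1                               # last dimension that survived
    return len(sample_eigs)
```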
Lemma 1 For any positive c,

    sup_{H_0^(m)} P_Λ( l_m ≤ c ) = lim_{β→0} P_Λ̄( l_m ≤ c ),

where Λ̄ = diag(λ̄_1, ..., λ̄_p), λ̄_1 = ··· = λ̄_m = λ*_m, λ̄_{m+1} = ··· = λ̄_p = β.
Proof. According to Theorem 1 of Anderson and Das Gupta (1964), P_Λ(l_m ≤ c) is a monotonically decreasing function of each λ_i (i = 1, ..., p); hence

    P_Λ( l_m ≤ c ) ≤ P_Λ̄( l_m ≤ c ),  where β = λ_p.

Furthermore, P_Λ̄(l_m ≤ c) is monotonically increasing as β goes to zero. By the result of Theorem 1 with α = 1, ξ_i = λ*_m (i = 1, ..., m) and ξ_i = 1 (i = m + 1, ..., p),

    lim_{β→0} P_Λ̄( l_m ≤ c ) = P( l̃_m ≤ c ),

where l̃_m is distributed as the smallest eigenvalue of W_m(n, λ*_m I_m). Therefore we have the following result.
Theorem 2 For testing the hypothesis H_0^(m) against H_1^(m), a test with significance level γ is given by the rejection region

    l_m ≤ l*_m(γ),

where l*_m(γ) is the lower 100γ% point of the smallest eigenvalue of W_m(n, λ*_m I_m).

For the analytic calculation of l*_m(γ), see Thompson (1962) and Hanumara and Thompson (1968). In the case m = 1, which is practically the most important, it is given by λ*_1 χ²_n(γ), where χ²_n(γ) is the lower 100γ% point of the χ² distribution with n degrees of freedom.
3.2 Interval estimation for extreme eigenvalues

In this subsection we present a new way of constructing a confidence interval for the extreme population eigenvalues. Let λ_1 ≤ f(l) be a one-sided interval estimate with confidence level γ. For example, in the second case in Section 3.1, the maximum volatility over all possible portfolios among the assets x is estimated to be less than or equal to f(l).

However, if we use the exact finite-sample distribution theory, it is not easy to find an appropriate f(l) for a given γ, even if we only consider an interval of the simplest form λ_1 ≤ c l_1 with some constant c. (Note that l_i/λ_i (i = 1, ..., p) is bounded in probability; see Lemma 1 of Takemura and Sheena (2005).) Therefore a large-sample approximation is usually employed (e.g. Theorem 13.5.1 of Anderson (2003)):

    √n ( l_i/n − λ_i ) →_d N(0, 2λ_i²),  i = 1, ..., p.

Let z_γ denote the upper 100γ% point of the standard normal distribution. Since

    P( √(n/2) ( l_1/(nλ_1) − 1 ) ≥ z_γ ) = P( l_1 ≥ (√(2n) z_γ + n) λ_1 ) → γ  as n → ∞,

we have an approximate confidence interval

    λ_1 ≤ (√(2n) z_γ + n)^{−1} l_1,    (8)

with confidence level close to γ for sufficiently large n.

Now we propose an alternative approximation. Suppose m = 1 in Theorem 1; then as β/α goes to zero, d_1 →_d d̃_1 = W̃_1. Since W̃_1/ξ_1 ∼ χ²(n),

    l_1/λ_1 = d_1/ξ_1 →_d χ²(n)

as β/α goes to zero. From this asymptotics we can construct an approximate interval

    λ_1 ≤ (χ²_γ(n))^{−1} l_1,    (9)

where χ²_γ(n) is the upper 100γ% point of the χ² distribution with n degrees of freedom. The interval (9) has confidence level approximately γ when β/α is sufficiently close to zero.

We are interested in how large an n for (8), or how small a β/α for (9), is required to get a practically sufficient approximation. Because of the difficulty of theoretical evaluation, we carried out a simulation study with the fixed parameters p = 3, m = 1, ξ_1 = ξ_2 = ξ_3 = 1, α = 1, while we selected different values of n (5, 10, 20, 50, 100, 500, 1000) and β (1.0, 0.9, 0.8, 0.6, 0.5, 0.3, 0.1, 0.01, 0.001). For each case, 50000 Wishart random matrices were generated. We present the results in Table 1. The numbers under L1 (U1) indicate the proportion of cases in which λ_1 falls within the interval (8) with γ = 0.95 (0.05); those under L2 (U2) give the corresponding proportions for the interval (9).
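A minimal Python sketch of this coverage simulation (our own illustration for a single (β, n) cell, not the authors' code; the coverage event for (8) is evaluated in the equivalent form l_1 ≥ (√(2n) z_γ + n) λ_1 so that a negative factor is handled correctly):

```python
import numpy as np
from scipy.stats import wishart, chi2, norm

rng = np.random.default_rng(1)
n, beta, gamma, N = 10, 0.1, 0.95, 50_000
lam1 = 1.0                                     # alpha = xi_1 = 1

S = wishart(df=n, scale=np.diag([lam1, beta, beta])).rvs(size=N, random_state=rng)
l1 = np.linalg.eigvalsh(S)[:, -1]              # largest sample eigenvalue

z = norm.ppf(1 - gamma)                        # upper 100*gamma% point z_gamma
cover8 = np.mean(l1 >= (np.sqrt(2 * n) * z + n) * lam1)    # interval (8)
cover9 = np.mean(l1 >= chi2.ppf(1 - gamma, df=n) * lam1)   # interval (9)
print(cover8, cover9)
```

For n = 10 and β = 0.1 the two printed values should be close to the L1 and L2 entries .992 and .959 of Table 1, up to simulation error.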
Numbers in bold mean that they are within ±.01 of the desired value, hence the approximation may be good enough for many practical purposes.

Table 1: Approximated Interval Estimation

 β         1.0                       0.9                       0.8
           U1    U2    L1    L2      U1    U2    L1    L2      U1    U2    L1    L2
 n = 5     .402  .322  1.00  1.00    .341  .267  1.00  1.00    .276  .209  1.00  1.00
 n = 10    .416  .345  1.00  1.00    .327  .264  1.00  1.00    .247  .192  1.00  1.00
 n = 20    .419  .361  1.00  1.00    .301  .250  1.00  1.00    .207  .165  1.00  1.00
 n = 50    .416  .373  1.00  1.00    .256  .222  1.00  1.00    .153  .129  1.00  1.00
 n = 100   .413  .380  1.00  1.00    .212  .189  1.00  1.00    .124  .109  .999  .999
 n = 500   .405  .389  1.00  1.00    .118  .111  .999  .999    .080  .074  .984  .981
 n = 1000  .407  .395  1.00  1.00    .097  .093  .995  .994    .070  .067  .972  .970

 β         .6                        .5                        .3
           U1    U2    L1    L2      U1    U2    L1    L2      U1    U2    L1    L2
 n = 5     .167  .120  1.00  1.00    .133  .094  1.00  1.00    .095  .067  1.00  .998
 n = 10    .139  .104  1.00  1.00    .111  .083  1.00  .999    .084  .063  .999  .989
 n = 20    .112  .089  1.00  .998    .094  .073  .999  .995    .075  .058  .990  .974
 n = 50    .089  .075  .995  .991    .080  .068  .988  .980    .069  .058  .974  .961
 n = 100   .078  .068  .984  .978    .072  .064  .975  .968    .062  .055  .965  .957
 n = 500   .064  .059  .963  .959    .059  .055  .959  .955    .057  .053  .958  .954
 n = 1000  .060  .057  .960  .958    .058  .055  .956  .953    .055  .053  .956  .954

 β         .1                        .01                       .001
           U1    U2    L1    L2      U1    U2    L1    L2      U1    U2    L1    L2
 n = 5     .075  .055                      .049        .953    .071  .052        .951
 n = 10    .072  .055  .992  .959    .068  .052  .989  .951    .065  .048  .988  .950
 n = 20    .066  .053  .979  .956    .066  .052  .976  .951    .063  .050  .974  .948
 n = 50    .061  .051  .966  .953    .058  .048  .964  .950    .059  .050  .965  .950
 n = 100   .057  .050  .961  .951    .057  .051  .961  .951    .056  .049  .960  .951
 n = 500   .055  .051  .954  .950    .054  .050  .956  .952    .053  .049  .954  .950
 n = 1000  .052  .050  .954  .951    .053  .050  .953  .949    .052  .049  .954  .951

We can summarize the results as follows.

1. In every case (9) gives a better approximation than (8).

2. Even when β is as large as 0.3, (9) already gives a good approximation. In that sense, the approximate interval (9) seems robust. When β is smaller than or equal to 0.1, (9) works well even with small samples such as n = 5 or 10, while (8) needs samples as large as 100 or 500.

3. When β is between 0.5 and 0.6, both approximations need large samples such as 500 or 1000. If β is larger than 0.6, they need samples larger than 1000 for a good approximation.

Similarly, we can make an interval estimate for the smallest eigenvalue λ_p. Let m = p − 1; then as β/α goes to zero, d_p →_d d̃_p = W̃_2. Since W̃_2/ξ_p ∼ χ²(n − p + 1),

    l_p/λ_p = d_p/ξ_p →_d χ²(n − p + 1)

as β/α goes to zero. Using this fact, we can estimate λ_p to lie in the interval

    λ_p ≤ (χ²_γ(n − p + 1))^{−1} l_p,    (10)

at confidence level approximately γ when β/α is sufficiently close to zero.

We now compare (8) and (9) more closely in view of the known results on the asymptotic expansion of the distribution of sample eigenvalues. Let

    A_n = √(n/2) ( l_1/(nλ_1) − 1 ).    (11)

The asymptotic expansion of A_n up to the order n^{−1/2} is given by (see Sugiura (1973))

    F_{A_n}(t) = Φ(t) − (√2 φ(t))/(3√n) [ (t² − 1) + (1/2) Σ_{i=2}^p λ_i/(λ_1 − λ_i) ] + o(n^{−1/2}).    (12)
Now suppose x_i, i = 1, ..., n, are independently and identically distributed according to the χ²(1) distribution. The normalized variables

    x̃_i = (x_i − 1)/√2,  i = 1, ..., n,

have zero mean and unit variance. Let

    B_n = (1/√n) Σ_{i=1}^n x̃_i = √(n/2) ( Σ_{i=1}^n x_i / n − 1 ).    (13)

The asymptotic expansion of B_n up to the order n^{−1/2} is given by

    F_{B_n}(t) = Φ(t) − (√2 φ(t))/(3√n) (t² − 1) + o(n^{−1/2}).    (14)
Comparing (12) and (14), we notice that if t > 1, then the absolute value of the second term in (14) is smaller than that of (12) by the margin

    (1/2) Σ_{i=2}^p λ_i/(λ_1 − λ_i).    (15)
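The behavior of the margin (15) is easy to tabulate; the following short Python computation (ours, with arbitrary example spectra not taken from the paper) shows it shrinking as λ_1 separates from the rest:

```python
import numpy as np

def margin(lam):                       # eigenvalues sorted in descending order
    lam = np.asarray(lam, dtype=float)
    return 0.5 * np.sum(lam[1:] / (lam[0] - lam[1:]))

# the correction term (15) shrinks as lambda_1 moves away from the others
for lam1 in (2.0, 5.0, 20.0, 100.0):
    print(lam1, margin([lam1, 1.0, 0.5]))
```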
Since l_1/λ_1 is asymptotically distributed as χ²(n) when the largest population eigenvalue is infinitely separated from the others, l_1/λ_1 in (11) is then distributed similarly to Σ_{i=1}^n x_i in (13). In this case (15) vanishes and the two expansions (12) and (14) coincide. It is natural to conjecture that when the largest population eigenvalue λ_1 is positioned far away from the smaller eigenvalues, we can make an "easier" inference on λ_1. The fact that the term (15) shrinks in that situation supports this conjecture, as do our simulation results.

3.3 Testing equality of the smallest eigenvalues

As in the introduction of Section 11.7.3 of Anderson (2003), the equality of the p − m smallest population eigenvalues,

    λ_{m+1} = ··· = λ_p (= σ², say),    (16)

is equivalent to the covariance structure

    Σ = Φ + σ² I_p,

where Φ, a positive semidefinite matrix of rank m, represents the variance-covariance matrix of a systematic part and σ² I_p arises from measurement error. If hypothesis (16) is accepted, it suggests that the systematic part might consist of m independent factors. The need for testing (16) also arises in principal component analysis when the dimension of the principal components has to be decided. Once (16) is accepted and σ² is sufficiently small, which might require another hypothesis test, we could ignore the last p − m principal components.

The likelihood ratio statistic for testing (16) is given (see e.g. Theorem 9.6.1 of Muirhead (1982)) by

    V = ( Π_{i=m+1}^p l_i ) (p − m)^{p−m} / ( Σ_{i=m+1}^p l_i )^{p−m},

and the critical region is V ≤ c(γ) for a given significance level γ.

In order to give the critical point c(γ), we traditionally make use of the asymptotic convergence

    −n log V →_d χ²( (p − m + 2)(p − m − 1)/2 ),  as n → ∞.    (17)

Bartlett adjustment and further refinements of this asymptotic result are found in Section 9.6 of Muirhead (1982). From this convergence, the approximate critical point is given as

    c(γ) = exp( −n^{−1} χ²_γ( (p − m + 2)(p − m − 1)/2 ) ).    (18)

On the other hand, we can approximate the critical point c(γ) based on the asymptotic result in Theorem 1. We can expect this approach to yield a good approximation, since in testing hypothesis (16) we often encounter the situation where the eigenvalues λ_{m+1}, ..., λ_p are much smaller than the other eigenvalues. The hypothesis (16) with small σ² corresponds to the case ξ_{m+1} = ··· = ξ_p = 1 in Theorem 1. Consequently, we can approximate the distribution of V in (17) by the distribution of

    Ṽ = ( Π_{i=m+1}^p d_i ) (p − m)^{p−m} / ( Σ_{i=m+1}^p d_i )^{p−m},

where d_i (i = m + 1, ..., p) are the eigenvalues of a Wishart matrix distributed as W_{p−m}(n − m, I_{p−m}). Even under the distribution W_{p−m}(n − m, I_{p−m}), it is not easy to derive analytic expressions for the percentage points of Ṽ. For p − m = 2, the distribution function is explicitly given (see 10.7.3 of Anderson (2003)) by

    P( Ṽ ≤ v ) = v^{(n−m−1)/2},

which gives the critical point c(γ) as

    c(γ) = γ^{2/(n−m−1)}.    (19)

Generally, a numerical calculation is needed for the exact evaluation of the critical points. For this problem, refer to Consul (1967) and Pillai and Nagarsenker (1971).
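For concreteness, the two critical points, together with a Monte Carlo percentile of Ṽ for general p − m, can be computed along the following lines (a sketch under our own naming; the Monte Carlo routine is only one of many possible numerical approaches):

```python
import numpy as np
from scipy.stats import wishart, chi2

def c_chi2(n, p, m, gamma):
    """Critical point (18), from the chi-square limit (17)."""
    f = (p - m + 2) * (p - m - 1) / 2
    return np.exp(-chi2.ppf(1 - gamma, df=f) / n)

def c_exact(n, m, gamma):
    """Critical point (19), exact for p - m = 2."""
    return gamma ** (2 / (n - m - 1))

def c_mc(n, p, m, gamma, N=200_000, seed=2):
    """Monte Carlo lower 100*gamma% point of V-tilde under W_{p-m}(n - m, I)."""
    rng = np.random.default_rng(seed)
    q = p - m
    d = np.linalg.eigvalsh(wishart(df=n - m, scale=np.eye(q))
                           .rvs(size=N, random_state=rng))
    V = d.prod(axis=1) * q**q / d.sum(axis=1) ** q
    return np.quantile(V, gamma)

print(c_chi2(20, 3, 1, 0.05), c_exact(20, 1, 0.05), c_mc(20, 3, 1, 0.05))
```

For p = 3 and m = 1 the Monte Carlo percentile should agree with (19) up to simulation error, which gives a quick check of the routine.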
We made a simulation to compare the above two methods. Let p = 3, m = 1 and consider testing the hypothesis λ_2 = λ_3. We examined the accuracy of the two critical points (18) and (19) with γ = 0.05 and γ = 0.01
by simulating the probability of V being smaller than these critical points when λ_2 = λ_3, that is, the probability of an error of the first kind. We put λ_1 = 1, λ_2 = β, λ_3 = β and varied both β and n. Table 2 shows the results, where the labels 5(1)%1 and 5(1)%2 indicate that the numbers below correspond to the critical points (18) and (19), respectively, with γ = 0.05 (0.01). Numbers in bold mean that they are within ±.001 of the desired value.

Table 2: Simulated Type 1 Error

 β         1.0                       0.9                       0.8
           5%1   5%2   1%1   1%2     5%1   5%2   1%1   1%2     5%1   5%2   1%1   1%2
 n = 5     .140  .041  .051  .008    .142  .041  .053  .008    .141  .042  .054  .008
 n = 10    .063  .033  .015  .005    .064  .033  .016  .006    .067  .036  .017  .007
 n = 20    .038  .026  .007  .004    .039  .028  .008  .005    .041  .030  .008  .005
 n = 50    .025  .021  .004  .003    .027  .024  .004  .003    .032  .029  .006  .005
 n = 100   .022  .020  .003  .003    .025  .023  .003  .003    .034  .031  .005  .005
 n = 500   .016  .016  .002  .002    .029  .029  .005  .005    .043  .043  .008  .008
 n = 1000  .017  .017  .002  .002    .035  .035  .006  .006    .048  .047  .009  .009

 β         .6                        .5                        .3
           5%1   5%2   1%1   1%2     5%1   5%2   1%1   1%2     5%1   5%2   1%1   1%2
 n = 5     .145  .044  .055  .008    .148  .044  .056  .009    .158  .047  .059  .009
 n = 10    .072  .039  .019  .008    .076  .041  .021  .008    .086  .046  .023  .009
 n = 20    .051  .036  .011  .006    .057  .042  .012  .005    .067  .049  .016  .009
 n = 50    .046  .040  .009  .008    .053  .047  .011  .009    .056  .050  .012  .010
 n = 100   .050  .047  .010  .009    .050  .047  .010  .009    .052  .049  .010  .009
 n = 500   .050  .050  .010  .009    .050  .050  .010  .010    .052  .051  .010  .010
 n = 1000  .051  .050  .010  .010    .051  .050  .010  .010    .051  .051  .010  .010

 β         .1                        .01                       .001
           5%1   5%2   1%1   1%2     5%1   5%2   1%1   1%2     5%1   5%2   1%1   1%2
 n = 5     .161  .049  .061  .010    .164  .050  .063  .011    .167  .051  .064  .010
 n = 10    .091  .051  .026  .011    .092  .051  .025  .009    .090  .050  .024  .010
 n = 20    .066  .049  .015  .010    .067  .050  .016  .010    .067  .050  .016  .010
 n = 50    .057  .051  .012  .010    .057  .051  .012  .010    .055  .048  .012  .010
 n = 100   .053  .050  .011  .010    .053  .049  .011  .010    .054  .051  .011  .010
 n = 500   .051  .051  .010  .010    .050  .049  .010  .010    .051  .050  .011  .010
 n = 1000  .050  .050  .009  .009    .051  .051  .010  .010    .051  .050  .010  .010
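The simulation behind Table 2 can be reproduced along these lines (our own sketch for a single (β, n) cell, not the authors' code):

```python
import numpy as np
from scipy.stats import wishart, chi2

rng = np.random.default_rng(3)
p, m, n, beta, gamma, N = 3, 1, 20, 0.1, 0.05, 50_000

S = wishart(df=n, scale=np.diag([1.0, beta, beta])).rvs(size=N, random_state=rng)
l = np.sort(np.linalg.eigvalsh(S), axis=1)       # ascending order
d = l[:, :p - m]                                 # the p - m smallest eigenvalues
V = d.prod(axis=1) * (p - m) ** (p - m) / d.sum(axis=1) ** (p - m)

c18 = np.exp(-chi2.ppf(1 - gamma, df=(p - m + 2) * (p - m - 1) / 2) / n)
c19 = gamma ** (2 / (n - m - 1))
print(np.mean(V <= c18), np.mean(V <= c19))      # empirical type-1 errors
```

For β = 0.1 and n = 20 the printed values should be close to the entries .066 and .049 of Table 2, up to simulation error.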
We can summarize the results as follows.

1. If β ≥ 0.8, both (18) and (19) need a large sample size. Especially when β is as large as 1.0 or 0.9, more than 1000 samples are required for a good approximation. There is no meaningful difference between the two critical points.

2. If β equals 0.6 or 0.5, 50 (sometimes 20) samples are enough to give a good approximation for both (18) and (19). There is no significant difference between the two critical points.

3. If β < 0.5, (19) shows significantly better performance than (18). Even with a sample as small as 5, (19) gives very accurate approximations. The critical point (19) is robust in the sense that it already gives an excellent approximation when the smallest eigenvalues are 0.3 times as large as the largest eigenvalue.
References

[1] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed., Wiley, New Jersey.

[2] Anderson, T. W. and Das Gupta, S. (1964). A monotonicity property of the power functions of some tests of the equality of two covariance matrices. Ann. Math. Statist., 35, 1059-1063.

[3] Consul, P. C. (1967). On the exact distributions of the criterion W for testing sphericity in a p-variate normal distribution. Ann. Math. Statist., 38, 1170-1174.

[4] Hanumara, R. C. and Thompson, W. A. (1968). Percentage points of the extreme roots of a Wishart matrix. Biometrika, 55, 505-512.

[5] Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications, Academic Press, California.

[6] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory, Wiley, New York.

[7] Pillai, K. C. S. and Nagarsenker, B. N. (1971). On the distribution of the sphericity test criterion in classical and complex normal populations having unknown covariance matrices. Ann. Math. Statist., 42, 764-767.

[8] Sheena, Y. and Takemura, A. (2007a). An asymptotic expansion of Wishart distribution when the population eigenvalues are infinitely dispersed. Statistical Methodology, 4, 158-184.

[9] Sheena, Y. and Takemura, A. (2007b). Asymptotic distribution of Wishart matrix for block-wise dispersion of population eigenvalues. Journal of Multivariate Analysis, doi:10.1016/j.jmva.2007.04.001

[10] Siotani, M., Hayakawa, T. and Fujikoshi, Y. (1985). Modern Multivariate Statistical Analysis: A Graduate Course and Handbook, American Science Press, Columbus, Ohio.

[11] Sugiura, N. (1973). Derivatives of the characteristic root of a symmetric or Hermitian matrix with two applications in multivariate analysis. Comm. Statist., 1, 393-417.

[12] Takemura, A. and Sheena, Y. (2005). Distribution of eigenvalues and eigenvectors of Wishart matrix when the population eigenvalues are infinitely dispersed and its application to minimax estimation of covariance matrix. Journal of Multivariate Analysis, 94, 271-299.

[13] Thompson, W. A. (1962). Estimation of dispersion parameters. J. Res. Natl. Bur. Standards Sec. B, 66.