Annals of the Institute of Statistical Mathematics manuscript No. (will be inserted by the editor)
The local power of the gradient test
Artur J. Lemonte · Silvia L. P. Ferrari
Received: date / Revised: date
Abstract
The asymptotic expansion of the distribution of the gradient test statistic is derived for a composite hypothesis under a sequence of Pitman alternative hypotheses converging to the null hypothesis at rate $n^{-1/2}$, $n$ being the sample size. Comparisons of the local powers of the gradient, likelihood ratio, Wald and score tests reveal no uniform superiority property. The power performance of all four criteria in the one-parameter exponential family is examined.

Keywords
Asymptotic expansions · Chi-square distribution · Gradient test · Likelihood ratio test · Pitman alternative · Power function · Score test · Wald test
We gratefully acknowledge grants from FAPESP and CNPq (Brazil).

A. J. Lemonte, S. L. P. Ferrari
Departamento de Estatística, Universidade de São Paulo, São Paulo/SP, 05508-090, Brazil
E-mail: [email protected]

The most commonly used large sample tests are the likelihood ratio (Wilks 1938), Wald (Wald 1943) and Rao score (Rao 1948) tests. Recently, Terrell (2002) proposed a new test statistic that shares the same first order asymptotic properties with the likelihood ratio ($LR$), Wald ($W$) and Rao score ($S_R$) statistics. The new statistic, referred to as the gradient statistic ($S_T$), is markedly simple. In fact, Rao (2005) wrote: "The suggestion by Terrell is attractive as it is simple to compute. It would be of interest to investigate the performance of the [gradient] statistic." The present paper goes in this direction.

Let $x = (x_1, \ldots, x_n)^\top$ be a random vector of $n$ independent observations with probability density function $\pi(x\,|\,\theta)$ that depends on a $p$-dimensional vector of unknown parameters $\theta = (\theta_1, \ldots, \theta_p)^\top$. Consider the problem of testing the composite null hypothesis $H_0: \theta_2 = \theta_2^{(0)}$ against $H_1: \theta_2 \neq \theta_2^{(0)}$, where $\theta = (\theta_1^\top, \theta_2^\top)^\top$, $\theta_1 = (\theta_1, \ldots, \theta_q)^\top$ and $\theta_2 = (\theta_{q+1}, \ldots, \theta_p)^\top$, with $\theta_2^{(0)}$ representing a $(p-q)$-dimensional fixed vector. Let $\ell = \ell(\theta) = \sum_{l=1}^{n} \log \pi(x_l\,|\,\theta)$ be the total log-likelihood function, and let $U(\theta) = \partial\ell/\partial\theta = (U_1(\theta)^\top, U_2(\theta)^\top)^\top$ be the corresponding total score function, partitioned following the partition of $\theta$. The unrestricted and restricted maximum likelihood estimators of $\theta$ are $\hat\theta = (\hat\theta_1^\top, \hat\theta_2^\top)^\top$ and $\tilde\theta = (\tilde\theta_1^\top, \theta_2^{(0)\top})^\top$, respectively. The gradient statistic for testing $H_0$ is
$$S_T = U(\tilde\theta)^\top (\hat\theta - \tilde\theta). \qquad (1)$$
Since $U_1(\tilde\theta) = 0$, the gradient statistic in (1) can be written as $S_T = U_2(\tilde\theta)^\top (\hat\theta_2 - \theta_2^{(0)})$.
Clearly, $S_T$ has a very simple form and does not involve knowledge of the information matrix, neither expected nor observed, and no matrices at all, unlike $W$ and $S_R$. Asymptotically, $S_T$ has a central chi-square distribution with $p-q$ degrees of freedom under $H_0$. Terrell (2002) points out that the gradient statistic "is not transparently non-negative, even though it must be so asymptotically." His Theorem 2 implies that if the log-likelihood function is concave and is differentiable at $\tilde\theta$, then $S_T \geq 0$.

In this paper we derive the asymptotic distribution of the gradient statistic for a composite null hypothesis under a sequence of Pitman alternatives converging to the null hypothesis at rate $n^{-1/2}$. In other words, the sequence of alternative hypotheses is $H_n: \theta_2 = \theta_2^{(0)} + n^{-1/2}\epsilon$, where $\epsilon = (\epsilon_{q+1}, \ldots, \epsilon_p)^\top$. Similar results for the likelihood ratio and Wald tests were obtained by Hayakawa (1975) and, for the score test, by Harris & Peers (1980). Comparison of local power properties of the competing tests will be performed. Our results will be specialized to the case of the one-parameter exponential family. A brief discussion closes the paper.

Our notation follows that of Hayakawa (1975, 1977). We introduce the following log-likelihood derivatives
$$y_r = n^{-1/2}\,\frac{\partial\ell}{\partial\theta_r}, \qquad y_{rs} = n^{-1}\,\frac{\partial^2\ell}{\partial\theta_r\,\partial\theta_s}, \qquad y_{rst} = n^{-3/2}\,\frac{\partial^3\ell}{\partial\theta_r\,\partial\theta_s\,\partial\theta_t},$$
their arrays $y = (y_1, \ldots, y_p)^\top$, $Y = ((y_{rs}))$, $Y_{...} = ((y_{rst}))$, the corresponding cumulants $\kappa_{rs} = E(y_{rs})$, $\kappa_{r,s} = E(y_r y_s)$, $\kappa_{rst} = n^{1/2} E(y_{rst})$, $\kappa_{r,st} = n^{1/2} E(y_r y_{st})$, $\kappa_{r,s,t} = n^{1/2} E(y_r y_s y_t)$, and their arrays $K = ((\kappa_{r,s}))$, $K_{...} = ((\kappa_{rst}))$, $K_{.,..} = ((\kappa_{r,st}))$ and $K_{.,.,.} = ((\kappa_{r,s,t}))$. We make the same assumptions as in Hayakawa (1975). In particular, it is assumed that the $\kappa$'s are all $O(1)$ and that they are not functionally independent; for instance, $\kappa_{r,s} = -\kappa_{rs}$.
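As a concrete scalar example of these quantities (our addition, not in the original text), take i.i.d. observations from the exponential density $\pi(x\,|\,\theta) = \theta e^{-\theta x}$, so that $\log\pi(x\,|\,\theta) = \log\theta - \theta x$. The per-observation cumulants are then

```latex
\kappa_{\theta\theta} = -\frac{1}{\theta^{2}}, \qquad
\kappa_{\theta,\theta} = \frac{1}{\theta^{2}} = -\kappa_{\theta\theta}, \qquad
\kappa_{\theta\theta\theta} = \frac{2}{\theta^{3}}, \qquad
\kappa_{\theta,\theta\theta} = 0, \qquad
\kappa_{\theta,\theta,\theta} = -\frac{2}{\theta^{3}},
```

and one can check the Bartlett-type relation $\kappa_{\theta\theta\theta} + 3\kappa_{\theta,\theta\theta} + \kappa_{\theta,\theta,\theta} = 0$ in addition to $\kappa_{\theta,\theta} = -\kappa_{\theta\theta}$.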
Relations among them were first obtained by Bartlett (1953a,b). Also, it is assumed that $Y$ is non-singular and that $K$ is positive definite with inverse $K^{-1} = ((\kappa^{r,s}))$, say. For triple-suffix quantities we use the following summation notation:
$$K_{...} \circ a \circ b \circ c = \sum_{r,s,t=1}^{p} \kappa_{rst}\, a_r b_s c_t, \qquad K_{.,..} \circ M \circ b = \sum_{r,s,t=1}^{p} \kappa_{r,st}\, m_{rs} b_t,$$
where $M$ is a $p \times p$ matrix and $a$, $b$ and $c$ are $p \times 1$ column vectors. The partition $\theta = (\theta_1^\top, \theta_2^\top)^\top$ induces the corresponding partitions
$$Y = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix}, \qquad K = \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix}, \qquad K^{-1} = \begin{bmatrix} K^{11} & K^{12} \\ K^{21} & K^{22} \end{bmatrix}, \qquad a = (a_1^\top, a_2^\top)^\top,$$
etc. Also, for sums whose first index is restricted to the last $p-q$ coordinates,
$$K_{...}^{(2)} \circ a \circ b \circ c = \sum_{r=q+1}^{p} \sum_{s,t=1}^{p} \kappa_{rst}\, a_r b_s c_t,$$
with $K_{.,..}^{(2)}$ defined analogously.

Using a procedure analogous to that of Hayakawa (1975), we can write the asymptotic expansion of $S_T$ for the composite hypothesis, up to order $n^{-1/2}$, as
$$S_T = -(Zy + \xi)^\top Y (Zy + \xi) - \frac{1}{2\sqrt{n}}\, K_{...} \circ (Zy + \xi) \circ Y^{-1}y \circ Y^{-1}y - \frac{1}{2\sqrt{n}}\, K_{...} \circ (Zy + \xi) \circ (Z_0 y - \xi) \circ (Z_0 y - \xi) + O_p(n^{-1}),$$
where $Z = Y^{-1} - Z_0$,
$$Z_0 = \begin{bmatrix} Y_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}, \qquad \xi = \begin{bmatrix} Y_{11}^{-1} Y_{12} \\ -I_{p-q} \end{bmatrix} \epsilon,$$
$I_{p-q}$ being the identity matrix of order $p-q$.

We can now use a multivariate Edgeworth Type A series expansion of the joint density function of $y$ and $Y$ up to order $n^{-1/2}$ (Peers 1971), which has the form
$$f = f_0\left[\, 1 + \frac{1}{6\sqrt{n}}\left( K_{.,.,.} \circ K^{-1}y \circ K^{-1}y \circ K^{-1}y - 3\, K_{.,.,.} \circ K^{-1} \circ K^{-1}y \right) - \frac{1}{2\sqrt{n}}\, K_{.,..} \circ K^{-1}y \circ D \,\right] + O(n^{-1}),$$
where
$$f_0 = (2\pi)^{-p/2}\, |K|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, y^\top K^{-1} y \right\} \prod_{r,s=1}^{p} \delta(y_{rs} - \kappa_{rs}),$$
$D = ((d_{bc}))$ and $d_{bc} = \delta'(y_{bc} - \kappa_{bc})/\delta(y_{bc} - \kappa_{bc})$, with $\delta(\cdot)$ being the Dirac delta function (Bracewell 1999), to obtain the moment generating function of $S_T$, $M(t)$ say. From $f$ and the asymptotic expansion of $S_T$ up to order $n^{-1/2}$, we arrive, after long algebra, at
$$M(t) = (1-2t)^{-(p-q)/2} \exp\left( \frac{t}{1-2t}\,\frac{\epsilon^\top K_{22.1}\, \epsilon}{2} \right) \times \left[\, 1 + \frac{1}{\sqrt{n}}\left( A_1 d + A_2 d^2 + A_3 d^3 \right) \right] + O(n^{-1}),$$
where $d = 2t/(1-2t)$, $K_{22.1} = K_{22} - K_{21} K_{11}^{-1} K_{12}$,
$$A_1 = -\frac{1}{4}\left\{ K_{...} \circ K^{-1} \circ \epsilon^* + 4 K_{.,..} \circ A \circ \epsilon^* + K_{...} \circ A \circ \epsilon^* - 2 K_{.,..} \circ \epsilon^* \circ \epsilon^* \circ \epsilon^* + (K_{...}^{(2)} + K_{.,..}^{(2)}) \circ \epsilon \circ \epsilon^* \circ \epsilon^* \right\},$$
$$A_2 = -\frac{1}{4}\left\{ K_{...} \circ K^{-1} \circ \epsilon^* - K_{...} \circ A \circ \epsilon^* - 2 K_{.,..} \circ \epsilon^* \circ \epsilon^* \circ \epsilon^* \right\}, \qquad A_3 = -\frac{1}{12}\, K_{...} \circ \epsilon^* \circ \epsilon^* \circ \epsilon^*,$$
$$\epsilon^* = \begin{bmatrix} K_{11}^{-1} K_{12} \\ -I_{p-q} \end{bmatrix} \epsilon, \qquad A = \begin{bmatrix} K_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$
When $n \to \infty$, $M(t) \to (1-2t)^{-(p-q)/2} \exp\{t\lambda/(1-2t)\}$, where $\lambda = \epsilon^\top K_{22.1}\, \epsilon/2$, and hence the limiting distribution of $S_T$ is a non-central chi-square distribution with $p-q$ degrees of freedom and non-centrality parameter $\lambda$. Under $H_0$, i.e. when $\epsilon = 0$, $M(t) = (1-2t)^{-(p-q)/2} + O(n^{-1})$ and, as expected, $S_T$ has a central chi-square distribution with $p-q$ degrees of freedom up to an error of order $n^{-1}$. Also, from $M(t)$ we may obtain the first three moments of $S_T$ up to order $n^{-1/2}$ as
$$\mu_1'(S_T) = p - q + \lambda + 2A_1/\sqrt{n}, \qquad \mu_2(S_T) = 2(p - q + 2\lambda) + 8(A_1 + A_2)/\sqrt{n}$$
and
$$\mu_3(S_T) = 8(p - q + 3\lambda) + 48(A_1 + 2A_2 + A_3)/\sqrt{n}.$$
The moment generating function of $S_T$ in a neighborhood of $\theta_2 = \theta_2^{(0)}$ can be written, after some algebra, as
$$M(t) = (1-2t)^{-(p-q)/2} \exp\left( \frac{t}{1-2t}\,\frac{\epsilon^\top K_{22.1}^{\dagger}\, \epsilon}{2} \right) \times \left[\, 1 + \frac{1}{\sqrt{n}} \sum_{k=0}^{3} a_k (1-2t)^{-k} \right] + O(n^{-1}),$$
where
$$a_1 = \frac{1}{4}\left\{ K_{...}^{\dagger} \circ (K^{-1})^{\dagger} \circ (\epsilon^*)^{\dagger} - (4K_{.,..} + 3K_{...})^{\dagger} \circ A^{\dagger} \circ (\epsilon^*)^{\dagger} - (K_{...} + 2K_{.,..})^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} - (K_{...}^{(2)} + K_{.,..}^{(2)})^{\dagger} \circ \epsilon \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \right\},$$
$$a_2 = -\frac{1}{4}\left\{ K_{...}^{\dagger} \circ (K^{-1} - A)^{\dagger} \circ (\epsilon^*)^{\dagger} - (K_{...} + 2K_{.,..})^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \right\},$$
$$a_3 = -\frac{1}{12}\, K_{...}^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger} \circ (\epsilon^*)^{\dagger}, \qquad (2)$$
and $a_0 = -(a_1 + a_2 + a_3)$. The symbol "$\dagger$" denotes evaluation at $\theta = (\theta_1^\top, \theta_2^{(0)\top})^\top$. Inverting $M(t)$, we arrive at the following theorem, our main result.

Theorem 1
The asymptotic expansion of the distribution of the gradient statistic for testing a composite hypothesis under a sequence of local alternatives converging to the null hypothesis at rate $n^{-1/2}$ is
$$\Pr(S_T \leq x) = G_{f,\lambda}(x) + \frac{1}{\sqrt{n}} \sum_{k=0}^{3} a_k\, G_{f+2k,\lambda}(x) + O(n^{-1}), \qquad (3)$$
where $G_{m,\lambda}(x)$ is the cumulative distribution function of a non-central chi-square variate with $m$ degrees of freedom and non-centrality parameter $\lambda$. Here, $f = p - q$, $\lambda = \epsilon^\top K_{22.1}^{\dagger}\, \epsilon/2$ and the $a_k$'s are given in (2).

If $q = 0$, the null hypothesis is simple, $\epsilon^* = -\epsilon$ and $A = 0$. Therefore, an immediate consequence of Theorem 1 is the following corollary.

Corollary 1
The asymptotic expansion of the distribution of the gradient statisticfor testing a simple hypothesis under a sequence of local alternatives convergingto the null hypothesis at rate n − / is given by (3) with f = p , λ = ǫ ⊤ K † ǫ / , a = K † ... ◦ ǫ ◦ ǫ ◦ ǫ / , a = −{ K † ... ◦ ( K − ) † ◦ ǫ − K † .,.. ◦ ǫ ◦ ǫ ◦ ǫ } / , a = { K † ... ◦ ( K − ) † ◦ ǫ − ( K ... + 2 K .,.. ) † ◦ ǫ ◦ ǫ ◦ ǫ } / and a = K † ... ◦ ǫ ◦ ǫ ◦ ǫ / . To first order S T , LR , W and S R have the same asymptotic distributional proper-ties under either the null or local alternative hypotheses. Up to an error of order n − the corresponding criteria have the same size but their powers differ in the n − / term. The power performance of the different tests may then be compared basedon the expansions of their power functions ignoring terms or order less than n − / .Harris & Peers (1980) presented a study of local power, up to order n − / , for thelikelihood ratio, Wald and score tests. They showed that none of the criteria is uni-formly better than the others.Let S i ( i = 1 , , , ) be, respectively, the likelihood ratio, Wald, score and gradi-ent statistics. We can write their local powers as Π i = 1 − Pr( S i ≤ x ) = Pr( S i > x ) ,where Pr( S i ≤ x ) = G p − q,λ ( x ) + 1 √ n X k =0 a ik G p − q +2 k,λ ( x ) + O ( n − ) . The coefficients that define the local powers of the likelihood ratio and Wald testsare given in Hayakawa (1975), those corresponding to the score and gradient testsare given in Harris & Peers (1980) and in (2), respectively. All of them are compli-cated functions of joint cumulants of log-likelihood derivatives but we can draw thefollowing general conclusions: – all the four tests are locally biased; – if K ... = , the likelihood ratio, Wald and gradient tests have identical localpowers; – if K ... = 2 K .,.,. 
, the score and gradient tests have identical local powers.

Further classifications are possible for appropriate subspaces of the parameter space; see, for instance, Harris & Peers (1980) and Hayakawa & Puri (1985). Therefore, there is no uniform superiority of one test with respect to the others. Hence the gradient test, which is very simple to compute, as pointed out by C. R. Rao, is an attractive alternative to the likelihood ratio, Wald and score tests.
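Numerically, the local power expansion is straightforward to evaluate once the coefficients are known. The sketch below is our illustration (the function names are ours, and the coefficients $a_{ik}$ must be supplied, e.g. from (2) or from the references above): it builds $G_{m,\lambda}$ in pure Python from the Poisson-mixture representation of the non-central chi-square distribution.

```python
import math

def _reg_lower_gamma(a, z, terms=500):
    """Regularized lower incomplete gamma P(a, z), via its power series."""
    if z <= 0.0:
        return 0.0
    total, term = 0.0, 1.0 / a
    for k in range(1, terms):
        total += term
        term *= z / (a + k)      # next term of sum_k z^k / (a (a+1) ... (a+k))
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_cdf(x, df):
    """Central chi-square CDF with df degrees of freedom."""
    return _reg_lower_gamma(df / 2.0, x / 2.0)

def ncx2_cdf(x, df, lam, jmax=200):
    """G_{df,lam}(x): Poisson(lam/2) mixture of central chi-square CDFs."""
    w = math.exp(-lam / 2.0)
    total = 0.0
    for j in range(jmax):
        total += w * chi2_cdf(x, df + 2 * j)
        w *= (lam / 2.0) / (j + 1)
    return total

def local_power(x, f, lam, a, n):
    """Pi = 1 - [G_{f,lam}(x) + n**-0.5 * sum_k a_k G_{f+2k,lam}(x)], a = (a_0..a_3)."""
    tail = ncx2_cdf(x, f, lam)
    corr = sum(ak * ncx2_cdf(x, f + 2 * k, lam) for k, ak in enumerate(a))
    return 1.0 - (tail + corr / math.sqrt(n))
```

With all $a_k = 0$ this reduces to the first-order power $1 - G_{f,\lambda}(x)$; the $n^{-1/2}$ correction is what separates the four tests.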
Let $x = (x_1, \ldots, x_n)^\top$ be a random sample of size $n$, with each $x_l$ having probability density function $\pi(x; \theta) = \exp\{t(x; \theta)\}$, where $\theta$ is a scalar parameter. To test $H_0: \theta = \theta^{(0)}$, where $\theta^{(0)}$ is a fixed known constant, the likelihood ratio, Wald, score and gradient statistics are, respectively,
$$S_1 = 2\sum_{l=1}^{n} \{ t(x_l; \hat\theta) - t(x_l; \theta^{(0)}) \}, \qquad S_2 = n(\hat\theta - \theta^{(0)})^2 K(\hat\theta),$$
$$S_3 = \frac{\left( \sum_{l=1}^{n} t'(x_l; \theta^{(0)}) \right)^2}{n K(\theta^{(0)})}, \qquad S_4 = (\hat\theta - \theta^{(0)}) \sum_{l=1}^{n} t'(x_l; \theta^{(0)}),$$
where $\hat\theta$ is the maximum likelihood estimator of $\theta$ and $K = K(\theta)$ denotes the Fisher information for a single observation. Under $H_0$, all four statistics have a central chi-square distribution with one degree of freedom asymptotically.

Now, let $\kappa_{\theta\theta} = E\{t''(x;\theta)\}$, $\kappa_{\theta\theta\theta} = E\{t'''(x;\theta)\}$, $\kappa_{\theta,\theta\theta} = E\{t''(x;\theta)\, t'(x;\theta)\}$, $\kappa^{\theta,\theta} = -\kappa_{\theta\theta}^{-1}$, etc., where primes denote derivatives with respect to $\theta$; for instance, $t''(x;\theta) = d^2 t(x;\theta)/d\theta^2$. The asymptotic expansion of the distribution of the gradient statistic for the null hypothesis $H_0: \theta = \theta^{(0)}$ under the sequence of local alternatives $H_n: \theta = \theta^{(0)} + n^{-1/2}\epsilon$ is given by (3) with $f = 1$, $\lambda = K^{\dagger}\epsilon^2/2$,
$$a_0 = (\kappa_{\theta\theta\theta} - \kappa_{\theta,\theta,\theta})^{\dagger}\, \epsilon^3/12, \qquad a_1 = -\{ \kappa_{\theta\theta\theta}^{\dagger} (\kappa^{\theta,\theta})^{\dagger}\, \epsilon - \kappa_{\theta,\theta\theta}^{\dagger}\, \epsilon^3 \}/4,$$
$$a_2 = \{ \kappa_{\theta\theta\theta}^{\dagger} (\kappa^{\theta,\theta})^{\dagger}\, \epsilon - (\kappa_{\theta\theta\theta} + 2\kappa_{\theta,\theta\theta})^{\dagger}\, \epsilon^3 \}/4, \qquad a_3 = \kappa_{\theta\theta\theta}^{\dagger}\, \epsilon^3/12.$$

We now specialize to the case where $\pi(x;\theta)$ belongs to the one-parameter exponential family. Let
$$t(x;\theta) = -\log \zeta(\theta) - \alpha(\theta)\, d(x) + v(x),$$
where $\alpha(\cdot)$, $\zeta(\cdot)$, $d(\cdot)$ and $v(\cdot)$ are known functions. Also, $\alpha(\cdot)$ and $\zeta(\cdot)$ are assumed to have their first three derivatives continuous, with $\zeta(\cdot) > 0$, and $\alpha'(\theta)$ and $\beta'(\theta)$ different from zero for all $\theta$ in the parameter space, where $\beta(\theta) = \zeta'(\theta)/\{\zeta(\theta)\alpha'(\theta)\}$.
Since $K = \alpha'(\theta)\beta'(\theta)$, $\sum_{l=1}^{n} t(x_l;\theta) = -n\{\log\zeta(\theta) + \alpha(\theta)\bar d - \bar v\}$ and $\sum_{l=1}^{n} t'(x_l;\theta) = -n\alpha'(\theta)\{\beta(\theta) + \bar d\}$, with $\bar d = \sum_{l=1}^{n} d(x_l)/n$ and $\bar v = \sum_{l=1}^{n} v(x_l)/n$, we have
$$S_1 = 2n\left[ \log\left\{ \frac{\zeta(\theta^{(0)})}{\zeta(\hat\theta)} \right\} + \{\alpha(\theta^{(0)}) - \alpha(\hat\theta)\}\,\bar d \right], \qquad S_2 = n(\hat\theta - \theta^{(0)})^2\, \alpha'(\hat\theta)\beta'(\hat\theta),$$
$$S_3 = \frac{n\,\alpha'(\theta^{(0)})\{\beta(\theta^{(0)}) + \bar d\}^2}{\beta'(\theta^{(0)})}, \qquad S_4 = n(\theta^{(0)} - \hat\theta)\,\alpha'(\theta^{(0)})\{\beta(\theta^{(0)}) + \bar d\}.$$
Let $\alpha' = \alpha'(\theta)$, $\alpha'' = \alpha''(\theta)$, $\beta' = \beta'(\theta)$ and $\beta'' = \beta''(\theta)$. It can be shown that
$$\kappa_{\theta\theta} = -\alpha'\beta', \qquad \kappa_{\theta\theta\theta} = -(2\alpha''\beta' + \alpha'\beta''), \qquad \kappa_{\theta,\theta\theta} = \alpha''\beta', \qquad \kappa_{\theta,\theta,\theta} = \alpha'\beta'' - \alpha''\beta'.$$
The coefficients $a_{ik}$ ($i = 1, 2, 3, 4$; $k = 0, 1, 2, 3$) that define the local powers of the tests that use $S_1$, $S_2$, $S_3$ and $S_4$ are obtained by evaluating the corresponding general expressions (those above for the gradient test, and those in Hayakawa (1975) and Harris & Peers (1980) for the other three) at these cumulants. They are functions of $\alpha'$, $\alpha''$, $\beta'$, $\beta''$ and $\epsilon$ only, and several of them coincide up to sign or a constant factor. For the gradient test, for instance, the expressions above yield
$$a_{40} = -(\alpha''\beta' + 2\alpha'\beta'')\,\epsilon^3/12, \qquad a_{41} = \frac{1}{4}\left\{ \frac{(2\alpha''\beta' + \alpha'\beta'')\,\epsilon}{\alpha'\beta'} + \alpha''\beta'\,\epsilon^3 \right\},$$
$$a_{42} = \frac{1}{4}\left\{ \alpha'\beta''\,\epsilon^3 - \frac{(2\alpha''\beta' + \alpha'\beta'')\,\epsilon}{\alpha'\beta'} \right\}, \qquad a_{43} = -(2\alpha''\beta' + \alpha'\beta'')\,\epsilon^3/12.$$
If $\alpha(\theta) = \theta$, $\pi(x;\theta)$ corresponds to a one-parameter natural exponential family. In this case, $\alpha' = 1$, $\alpha'' = 0$ and the $a_{ik}$'s simplify considerably.

We now present some analytical comparisons among the local powers of the four tests for a number of distributions within the one-parameter exponential family. Let $\Pi_i$ and $\Pi_j$ be the power functions, up to order $n^{-1/2}$, of the tests that use the statistics $S_i$ and $S_j$, respectively, with $i \neq j$ and $i, j = 1, 2, 3, 4$. We have
$$\Pi_i - \Pi_j = \frac{1}{\sqrt{n}} \sum_{k=0}^{3} (a_{jk} - a_{ik})\, G_{1+2k,\lambda}(x). \qquad (4)$$
It is well known that
$$G_{m,\lambda}(x) - G_{m+2,\lambda}(x) = 2\, g_{m+2,\lambda}(x), \qquad (5)$$
where $g_{\nu,\lambda}(x)$ is the probability density function of a non-central chi-square random variable with $\nu$ degrees of freedom and non-centrality parameter $\lambda$.
From (4) and (5), we can state the following comparisons among the powers of the four tests. Here, we assume that $\theta > \theta^{(0)}$; the opposite inequalities hold if $\theta < \theta^{(0)}$.

1. Normal ($\theta > 0$, $-\infty < \mu < \infty$ and $x \in \mathbb{R}$):
 – $\mu$ known: $\alpha(\theta) = (2\theta)^{-1}$, $\zeta(\theta) = \theta^{1/2}$, $d(x) = (x-\mu)^2$ and $v(x) = -\{\log(2\pi)\}/2$; $\Pi > \Pi > \Pi > \Pi$.
 – $\theta$ known: $\alpha(\mu) = -\mu/\theta$, $\zeta(\mu) = \exp\{\mu^2/(2\theta)\}$, $d(x) = x$ and $v(x) = -\{x^2/\theta + \log(2\pi\theta)\}/2$; $\Pi_1 = \Pi_2 = \Pi_3 = \Pi_4$.
2. Inverse normal ($\theta > 0$, $\mu > 0$ and $x > 0$):
 – $\mu$ known: $\alpha(\theta) = \theta$, $\zeta(\theta) = \theta^{-1/2}$, $d(x) = (x-\mu)^2/(2\mu^2 x)$ and $v(x) = -\{\log(2\pi x^3)\}/2$; $\Pi > \Pi > \Pi = \Pi$.
 – $\theta$ known: $\alpha(\mu) = \theta/(2\mu^2)$, $\zeta(\mu) = \exp\{-\theta/\mu\}$, $d(x) = x$ and $v(x) = -\{\theta/x - \log(\theta/(2\pi x^3))\}/2$; $\Pi > \Pi > \Pi > \Pi$.
3. Gamma ($k > 0$, $k$ known, $\theta > 0$ and $x > 0$): $\alpha(\theta) = \theta$, $\zeta(\theta) = \theta^{-k}$, $d(x) = x$ and $v(x) = (k-1)\log(x) - \log\{\Gamma(k)\}$, $\Gamma(\cdot)$ being the gamma function; $\Pi > \Pi > \Pi = \Pi$.
4. Truncated extreme value ($\theta > 0$ and $x > 0$): $\alpha(\theta) = \theta^{-1}$, $\zeta(\theta) = \theta$, $d(x) = \exp(x) - 1$ and $v(x) = x$; $\Pi > \Pi > \Pi > \Pi$.
5. Pareto ($\theta > 0$, $k > 0$, $k$ known and $x > k$): $\alpha(\theta) = 1 + \theta$, $\zeta(\theta) = (\theta k^{\theta})^{-1}$, $d(x) = \log(x)$ and $v(x) = 0$; $\Pi > \Pi > \Pi = \Pi$.
6. Laplace ($\theta > 0$, $-\infty < k < \infty$, $k$ known and $x \in \mathbb{R}$): $\alpha(\theta) = \theta^{-1}$, $\zeta(\theta) = 2\theta$, $d(x) = |x - k|$ and $v(x) = 0$; $\Pi > \Pi > \Pi > \Pi$.
7. Power ($\theta > 0$, $\phi > 0$, $\phi$ known and $0 < x < \phi$): $\alpha(\theta) = 1 - \theta$, $\zeta(\theta) = \theta^{-1}\phi^{\theta}$, $d(x) = \log(x)$ and $v(x) = 0$; $\Pi > \Pi > \Pi = \Pi$.
The gradient test can be an interesting alternative to the classic large-sample tests, namely the likelihood ratio, Wald and Rao score tests. It is competitive with the other three tests since, as we showed, none is uniformly superior to the others in terms of second order local power. Unlike the Wald and the score statistics, the gradient statistic does not require obtaining, estimating or inverting an information matrix, which can be an advantage in complex problems.

Theorem 3 in Terrell (2002) points to another important feature of the gradient test. It suggests that we can, in general, improve the chi-square approximation to the null distribution of the gradient statistic by using a less biased estimator of $\theta$. It is well known that the maximum likelihood estimator can be bias-corrected using the results of Cox & Snell (1968) or the approach proposed by Firth (1993). The effect of replacing the maximum likelihood estimator by its bias-corrected versions will be studied in future research. Note that, unlike $LR$ and $S_R$, the gradient statistic is not invariant under non-linear reparameterizations; the same is true of $W$. However, we can improve its performance under the null hypothesis by choosing a parameterization under which the maximum likelihood estimator is nearly unbiased.

Our results are quite general and can be specialized to important classes of statistical models, such as generalised linear models. Local power comparisons of the three usual large-sample tests in generalised linear models are presented by Cordeiro et al. (1994) and Ferrari et al. (1997). The extension of their studies to include the gradient test will be reported elsewhere.

As a final remark, the power comparisons performed in the present paper consider the four tests in their original form, i.e. they are not corrected to achieve local unbiasedness; see Rao & Mukerjee (1997) and references therein for this alternative approach.
In fact, this approach can be explored in future work for the gradient test.

References

Bartlett, M. S. (1953a). Approximate confidence intervals. Biometrika 40, 12–19.
Bartlett, M. S. (1953b). Approximate confidence intervals, II. More than one unknown parameter. Biometrika 40, 306–317.
Bracewell, R. (1999). The Fourier Transform and Its Applications, 3rd ed. New York: McGraw-Hill.
Cordeiro, G. M., Botter, D. A. & Ferrari, S. L. P. (1994). Nonnull asymptotic distributions of three classic criteria in generalised linear models. Biometrika 81, 709–720.
Cox, D. R. & Snell, E. J. (1968). A general definition of residuals (with discussion). Journal of the Royal Statistical Society B 30, 248–275.
Ferrari, S. L. P., Botter, D. A. & Cribari-Neto, F. (1997). Local power of three classic criteria in generalised linear models with unknown dispersion. Biometrika 84, 482–485.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.
Harris, P. & Peers, H. W. (1980). The local power of the efficient score test statistic. Biometrika 67, 525–529.
Hayakawa, T. (1975). The likelihood ratio criterion for a composite hypothesis under a local alternative. Biometrika 62, 451–460.
Hayakawa, T. (1977). The likelihood ratio criterion and the asymptotic expansion of its distribution. Annals of the Institute of Statistical Mathematics 29, 359–378.
Hayakawa, T. & Puri, M. L. (1985). Asymptotic expansions of the distributions of some test statistics. Annals of the Institute of Statistical Mathematics 37, 95–108.
Peers, H. W. (1971). Likelihood ratio and associated test criteria. Biometrika 58, 577–587.
Rao, C. R. (1948). Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proceedings of the Cambridge Philosophical Society 44, 50–57.
Rao, C. R. (2005). Score test: historical review and recent developments. In Advances in Ranking and Selection, Multiple Comparisons, and Reliability, N. Balakrishnan, N. Kannan & H. N. Nagaraja, eds. Boston: Birkhäuser.
Rao, C. R. & Mukerjee, R. (1997). Comparison of LR, score, and Wald tests in a non-iid setting. Journal of Multivariate Analysis 60, 99–110.
Terrell, G. R. (2002). The gradient statistic. Computing Science and Statistics 34, 206–215.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54, 426–482.
Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics 9, 60–62.