Some parametric tests based on sample spacings
Rahul Singh and Neeraj Misra
Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, India
Abstract
Assume that we have a random sample from an absolutely continuous distribution (univariate or multivariate) with a known functional form and some unknown parameters. In this paper, we study several parametric tests symmetrically based on sample spacings. The asymptotic properties of these tests are investigated under the simple null hypothesis and under a sequence of local alternatives converging to the null hypothesis. The asymptotic properties of the proposed tests are also studied under a composite null hypothesis. It is observed that these tests have properties similar to those of the likelihood ratio test. To assess the finite sample performance of the proposed tests, we have carried out an extensive numerical study. The proposed tests can be used in some situations where the likelihood ratio test does not exist due to unboundedness of the likelihood function.
Keywords:
Asymptotic distribution, generalised spacings estimator, hypothesis test, likelihood ratio test, multivariate spacings, nearest neighbour, sample spacings.
1 Introduction

Let X_1, X_2, ..., X_n be independent and identically distributed (iid) random variables from an absolutely continuous distribution function F_η, η ∈ Θ ⊆ R^p. We assume that, for every θ ∈ Θ, the functional form of F_θ is specified and that the true parameter η ∈ Θ is unknown. Here we are interested in tests for simple and composite hypotheses concerning the unknown true parameter. There are many ways to approach this problem. A popular method is to develop test statistics based on sample spacings. Let X_{1:n}, X_{2:n}, ..., X_{n:n} be the order statistics corresponding to X_1, X_2, ..., X_n. Let X_{0:n} = −∞ and X_{n+1:n} = ∞. For any positive integer m (< n), the m-step disjoint sample spacings are defined as

    D^(m)_{j,n}(θ) = F_θ(X_{jm:n}) − F_θ(X_{(j−1)m:n}),   j = 1, 2, ..., ⌊(n+1)/m⌋,   (1)

where, for any real number x, ⌊x⌋ denotes the largest integer not exceeding x. For any positive integer m = m(n), sufficiently smaller than n, we can take M = ⌊(n+1)/m⌋ ≈ (n+1)/m. For stating asymptotic results, without loss of generality, we can take M to be an integer. Suppose θ_0 ∈ Θ is pre-specified. For testing H_0 : η = θ_0, a useful test statistic based on disjoint sample spacings has the form S^(m)_{φ,n}(θ_0), where

    S^(m)_{φ,n}(θ) = (1/M) Σ_{j=1}^{M} φ(M D^(m)_{j,n}(θ)),   θ ∈ Θ,   (2)

for some real-valued convex function φ defined on the positive half-line.

Email: [email protected], [email protected]
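As a quick illustration of (1)-(2), the disjoint m-spacings and the statistic S^(m)_{φ,n}(θ) can be computed as follows. This is our own sketch, not from the paper; the helper names and the Exp(1) null below are illustrative choices:

```python
import numpy as np

def m_spacings_statistic(x, m, cdf, phi):
    """S^(m)_{phi,n}(theta) of eq. (2), built from the disjoint m-spacings of eq. (1).

    cdf evaluates F_theta at the hypothesised theta; phi is convex on (0, inf)."""
    n = len(x)
    # F_theta at the augmented order statistics, with F(X_{0:n}) = 0, F(X_{n+1:n}) = 1
    u = np.concatenate(([0.0], np.sort(cdf(np.asarray(x))), [1.0]))
    M = (n + 1) // m                      # number of disjoint m-spacings
    D = np.diff(u[m * np.arange(M + 1)])  # D_j = F(X_{jm:n}) - F(X_{(j-1)m:n})
    return np.mean(phi(M * D))

# Greenwood-type statistic, phi(x) = x^2, for an Exp(1) null hypothesis
rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=99)
S = m_spacings_statistic(x, m=2, cdf=lambda t: 1.0 - np.exp(-t), phi=lambda v: v ** 2)
```

Under the null, each M·D_j behaves approximately like a mean-one Gamma(m, 1)/m variable, so the Greenwood-type value above concentrates near 1 + 1/m.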
The choices of the function φ corresponding to some popular test statistics are as follows: φ(x) = x² gives the Greenwood statistic (Greenwood, 1946); φ(x) = −log(x) gives the log spacings statistic (Moran, 1951); φ(x) = x log(x) gives the relative entropy spacings statistic (Misra & van der Meulen, 2001); power-type choices such as φ(x) = x^r and φ(x) = |x − 1|^r (r > 0) have also been used.

For m = 1 (the simple spacings case), under quite general conditions on the underlying distribution and the function φ, Sethuraman & Rao (1970) established the asymptotic normality of the test statistic S^(1)_{φ,n}(θ_0) under the simple null hypothesis H_0 : η = θ_0. Del Pino (1979) extended this result to any finite m. Mirakhmedov (2005) further extended these results to situations where m is allowed to grow with n (m → ∞) such that m = o(n). Goodness of fit tests based on (2) can detect alternatives converging to the null distribution at a rate of n^{−1/4} or slower, and within this class the Greenwood test statistic is asymptotically locally most powerful in terms of Pitman efficiency (Sethuraman & Rao, 1970).

Cheng & Amin (1983) and Ranneby (1984) studied estimation of the true unknown parameter η and proposed maximising the parametric function (2) with φ(x) = log(x), x >
0. The estimator so obtained is called the maximum spacings product estimator (MSPE). Ghosh & Jammalamadaka (2001) continued this study by considering general φ. Under quite general conditions they showed that such an estimator has asymptotic properties similar to those of the maximum likelihood estimator (MLE). Such an estimator is known as a generalised spacings estimator (GSE). Ekström et al. (2020) extended the results of Ghosh & Jammalamadaka (2001) to situations where m is any finite, but fixed, positive integer, or m → ∞ such that m = o(n).

S^(m)_{φ,n}(θ_0) alone cannot be used as a test statistic, since it does not contain information regarding the alternative hypothesis. Suppose that η̂ is a √n-consistent estimator of η. Then, for large n, under H_0 : η = θ_0, S^(m)_{φ,n}(η̂) is expected to be close to S^(m)_{φ,n}(θ_0). Hence, some distance function measuring the departure of S^(m)_{φ,n}(θ_0) from S^(m)_{φ,n}(η̂) can be used as a test statistic, where large values of the distance indicate incompatibility of the data with H_0. Based on this idea, Torabi (2006) proposed parametric tests based on simple spacings. For m = 1, Ekström (2013) further studied such parametric tests and showed that they have asymptotic properties similar to likelihood ratio tests. The present paper can be seen as an extension of that work.

For multivariate random vectors, Zhou & Jammalamadaka (1993) generalised the concept of univariate spacings using nearest neighbour balls. Kuljus & Ranneby (2015) proved consistency of GSEs for multivariate observations. Kuljus & Ranneby (2020) found that GSEs for multivariate observations have asymptotic properties similar to those of the MLE. In this paper we study parametric tests based on multivariate sample spacings and find that they have asymptotic properties similar to their univariate analogues. We have also performed an extensive numerical study to assess the finite sample performance of the proposed tests.
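For concreteness, a generalised spacings estimate in the spirit of the MSPE/GSE discussed above can be obtained by numerically minimising (2) with φ(x) = −log(x). The exponential model, the search bounds and the function names below are our own illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def S_mlog(theta, x, m):
    """S^(m)_{-log,n}(theta) for the Exp(theta) family, F_theta(t) = 1 - exp(-theta t)."""
    n = len(x)
    u = np.concatenate(([0.0], np.sort(1.0 - np.exp(-theta * x)), [1.0]))
    M = (n + 1) // m
    D = np.diff(u[m * np.arange(M + 1)])
    return -np.mean(np.log(np.clip(M * D, 1e-300, None)))  # guard against zero spacings

rng = np.random.default_rng(1)
x = rng.exponential(scale=0.5, size=200)       # true failure rate eta = 2
fit = minimize_scalar(S_mlog, bounds=(1e-3, 50.0), args=(x, 2), method="bounded")
theta_hat = fit.x                              # generalised spacings estimate of eta
```

With n = 200 the estimate lands close to the true failure rate 2, in line with the consistency results cited above.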
Throughout, χ²_d(δ) denotes a non-central chi-square random variable with d degrees of freedom and non-centrality parameter δ, and χ²_d denotes a central chi-square random variable with d degrees of freedom. R denotes the real line and, for any positive integer p, R^p denotes the p-dimensional Euclidean space. Convergence in probability and convergence in distribution are denoted by →_p and →_d, respectively.

The rest of the paper is arranged as follows. Tests for univariate distributions are discussed in Section 2. In Section 3, tests for multivariate distributions are discussed. A numerical study to assess the finite sample performance of our tests is given in Section 4. A discussion based on the study is given in Section 5, and all proofs are given in the Appendix.

2 Tests for univariate distributions

Let η ∈ Θ be the true and unknown value of the parameter. Based on a random sample X_1, X_2, ..., X_n from F_η, we aim to test the hypothesis

    H_0 : η = θ_0 against H_A : η ≠ θ_0,   (3)

where θ_0 ∈ Θ is pre-specified. Let φ : (0, ∞) → R be a convex function. As a measure of the departure of F_θ, θ ∈ Θ, from F_η, Csiszár (1977) defined the φ-divergence of F_θ with respect to F_η as

    S_φ(θ, η) := ∫_{−∞}^{∞} φ(f_θ(x)/f_η(x)) f_η(x) dx,   θ ∈ Θ,   (4)

where f_τ is the density corresponding to the distribution F_τ, τ ∈ Θ. If φ(x) = −log(x), x >
0, the φ-divergence is known as the Kullback-Leibler (KL) divergence. Using Jensen's inequality, we have inf_{θ∈Θ} S_φ(θ, η) = S_φ(η, η) = φ(1). Thus, if η̂ is a suitable estimator of η, then

    T_{φ,n}(θ_0) = S_φ(θ_0, η̂) − inf_{θ∈Θ} S_φ(θ, η̂)

can be used as a measure of departure from the null hypothesis H_0 : η = θ_0. Since inf_{θ∈Θ} S_φ(θ, η) = S_φ(η, η), a suitable estimator of η is one that minimises S_φ(θ, η) with respect to θ ∈ Θ. Since S_φ(θ, η) involves the unknown η, an appropriate approximation of S_φ(θ, η) may be used for this purpose. Note that (n/m){F_θ(X_{j+m:n}) − F_θ(X_{j:n})} is a non-parametric histogram estimator of f_θ(x)/f_η(x), x ∈ [X_{j:n}, X_{j+m:n}) (see Prakasa Rao (1983)). This suggests that S^(m)_{φ,n}(θ), defined by (2), is an appropriate approximation of S_φ(θ, η), and that θ̂^(m)_{φ,n} = arg inf_{θ∈Θ} S^(m)_{φ,n}(θ) is a reasonable estimator of η. Thus

    T^(m)_{φ,n}(θ_0) = S^(m)_{φ,n}(θ_0) − inf_{θ∈Θ} S^(m)_{φ,n}(θ) = S^(m)_{φ,n}(θ_0) − S^(m)_{φ,n}(θ̂^(m)_{φ,n}),   (5)

is a suitable statistic for testing H_0 : η = θ_0, and significantly large values of T^(m)_{φ,n}(θ_0) provide evidence of departure from H_0 : η = θ_0.

Ekström et al. (2020) assumed that m = o(n) and, under quite general conditions, proved that θ̂^(m)_{φ,n} is a consistent estimator of the true parameter η and has an asymptotic normal distribution, the convergence being uniform, i.e.,

    lim_{n→∞} sup_x | P_η(√n (θ̂^(m)_{φ,n} − η) ≤ x) − Φ(x √(I(η)/σ²_{φ,m})) | = 0,   (6)

where

    σ²_{φ,m} = [m Var(ζ_m φ′(ζ_m)) + (2m + 1) μ²_{φ,m} − 2m μ_{φ,m} E(ζ²_m φ′(ζ_m))] / [E(ζ²_m φ″(ζ_m))]²,   ζ_m ∼ (1/m) Gamma(m, 1),   (7)

μ_{φ,m} = E(ζ_m φ′(ζ_m)), and Gamma(m,
1) denotes the gamma distribution with shape parameter m and scale parameter 1.

Earlier, the special case m = 1 was studied by Ghosh and Jammalamadaka (2001) and similar results were obtained. For m = 1 and φ(x) = −log(x), θ̂^(m)_{φ,n} was the first estimator based on sample spacings to be studied (Cheng and Amin (1983); Ranneby (1984)). Observe that, using the mean value theorem, for some X̃_j ∈ (X_{j:n}, X_{j+m:n}) we have

    T^(m)_{−log,n}(θ_0) = −(1/M) Σ_{j=1}^{M} log(M D^(m)_{j,n}(θ_0)) − inf_{θ∈Θ} [−(1/M) Σ_{j=1}^{M} log(M D^(m)_{j,n}(θ))]
    = sup_{θ∈Θ} (1/M) Σ_{j=1}^{M} log(D^(m)_{j,n}(θ) / D^(m)_{j,n}(θ_0))
    = sup_{θ∈Θ} (1/M) Σ_{j=1}^{M} log(f_θ(X̃_j) / f_{θ_0}(X̃_j))
    = (1/M) log( sup_{θ∈Θ} Π_{j=1}^{M} f_θ(X̃_j) / Π_{j=1}^{M} f_{θ_0}(X̃_j) ).   (8)

Thus, asymptotically, the test based on T^(m)_{−log,n} seems to be equivalent to the likelihood ratio test. Likelihood ratio tests are quite popular and are known to have nice asymptotic properties. Ekström (2013) showed that tests based on T^(1)_{−log,n}(θ_0) have properties similar to the likelihood ratio test. We use the m-spacings based test statistic

    T̃^(m)_{φ,n}(θ_0) = 2n T^(m)_{φ,n}(θ_0) / (σ²_{φ,m} E(ζ²_m φ″(ζ_m))),   (9)

for testing H_0 : η = θ_0 against H_A : η ≠ θ_0.
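To make the test concrete: for φ(x) = −log(x) the constants in (9) reduce to one (ζ²_m φ″(ζ_m) = 1 identically, and this φ is the locally most powerful choice discussed in the text), so the statistic is simply 2n T^(m)_{−log,n}(θ_0). The following sketch for the Exp(θ) family is our own illustration, with hypothetical helper names:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def S_mlog(theta, x, m):
    n = len(x)
    u = np.concatenate(([0.0], np.sort(1.0 - np.exp(-theta * x)), [1.0]))
    M = (n + 1) // m
    D = np.diff(u[m * np.arange(M + 1)])
    return -np.mean(np.log(np.clip(M * D, 1e-300, None)))

def spacings_test(x, m, theta0, alpha=0.05):
    """~T^(m)_{-log,n}(theta0) = 2n [S(theta0) - inf_theta S(theta)]; compare with chi^2_1."""
    n = len(x)
    S0 = S_mlog(theta0, x, m)
    S_hat = minimize_scalar(S_mlog, bounds=(1e-3, 50.0), args=(x, m), method="bounded").fun
    stat = 2.0 * n * (S0 - S_hat)
    return stat, bool(stat > chi2.ppf(1.0 - alpha, df=1))

rng = np.random.default_rng(2)
stat0, rej0 = spacings_test(rng.exponential(1.0, size=500), m=2, theta0=1.0)        # H0 true
stat1, rej1 = spacings_test(rng.exponential(1.0 / 1.8, size=500), m=2, theta0=1.0)  # H0 false
```

Under H_0 the statistic is approximately χ²_1, so stat0 is typically small, while stat1 grows roughly linearly with n under a fixed alternative.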
For proving the theoretical results of this paper, we will require the following assumptions.

(A1) The distributions in the family {F_θ, θ ∈ Θ} have common support, and the true parameter η is an interior point of Θ;

(A2) the distributions in the family {F_θ, θ ∈ Θ} are distinguishable, i.e., if θ_1 ≠ θ_2 then F_{θ_1}(x) ≠ F_{θ_2}(x) for some x ∈ R, and F_θ is differentiable with respect to θ ∈ Θ;

(A3) φ : (0, ∞) → R is strictly convex and thrice continuously differentiable, and Var(ζ_m φ′(ζ_m)), E(ζ²_m φ″(ζ_m)) and E(ζ³_m φ‴(ζ_m)) are finite and bounded away from zero, where Z_1, ..., Z_m are iid standard exponential variates and ζ_m = (1/m)(Z_1 + ... + Z_m);

(A4) f_η(x), F_η^{−1}(x), ∂f_η(x)/∂x and ∂²f_θ(x)/(∂x ∂θ_j)|_{θ=η} are continuous in x, and ∂f_θ(x)/∂θ_j, ∂²f_θ(x)/(∂θ_j ∂θ_k) and ∂³f_θ(x)/(∂θ_j ∂θ_k ∂θ_l) are continuous in x as well as in θ in an open neighbourhood of η, for all j, k, l ∈ {1, 2, ..., p};

(A5) ∫ |∂²f_θ(F_η^{−1}(u))/(∂θ_j ∂θ_k)|_{θ=η}| / f_η(F_η^{−1}(u)) du < ∞ and

    I_{j,k}(θ) = ∫_{−∞}^{∞} (∂f_θ(x)/∂θ_j)(∂f_θ(x)/∂θ_k) / f_θ(x) dx < ∞ for all θ in an open neighbourhood of η,

and the Fisher information matrix I(θ) = ((I_{j,k}(θ))) is positive definite for every θ in an open neighbourhood of η;

(A6) lim_{t→∞} min{0, φ(t)}/t = 0, and |φ(t)| ≤ a(t^{−b} + t^c) for all t >
0, where a, b and c are nonnegative constants.

The following convex functions are of special interest, as they satisfy the conditions required for ensuring consistency and asymptotic normality of θ̂^(m)_{φ,n}, with lim_{m→∞} σ²_{φ,m} = 1 (cf. Ekström et al., 2020):

    φ(x) = φ_γ(x) = { (γ(1 + γ))^{−1} (x^{γ+1} − 1), if γ ≠ −1, 0;  −log x, if γ = −1;  x log x, if γ = 0. }   (10)

Under assumptions (A1), (A2) and (A6), Ekström et al. (2020) proved that θ̂^(m)_{φ,n} is a consistent estimator of the true parameter η. Ekström et al. (2020) commented that a multi-parameter version of result (6) is possible, but they did not state the result explicitly. The multivariate version is desirable for our purpose, and is stated below.

Theorem 1.
Let X_1, X_2, ..., X_n be iid observations from F_η. Suppose that m = o(n), and assume that conditions (A1)-(A5) hold. If θ̂^(m)_{φ,n} = arg inf_{θ∈Θ} S^(m)_{φ,n}(θ) is a consistent estimator of the true parameter η, then

(a) for finite m, √n (θ̂^(m)_{φ,n} − η) →_d N(0, σ²_{φ,m} I(η)^{−1});

(b) for m → ∞ such that m = o(n) and lim_{m→∞} σ²_{φ,m} = 1, we have √n (θ̂^(m)_{φ,n} − η) →_d N(0, I(η)^{−1}).

Theorem 1 can be proved using the Cramér-Wold device and arguments similar to those in the proof of Theorem 2 of Ekström et al. (2020), so we skip the details here. We have the following result.
Theorem 2.
Let m be a fixed positive integer. Assume that θ̂^(m)_{φ,n} = arg inf_{θ∈Θ} S^(m)_{φ,n}(θ) is a consistent estimator of the true parameter η ∈ Θ ⊆ R^p. Then, under the assumptions of Theorem 1, we have

(a) under H_0 : η = θ_0, T̃^(m)_{φ,n}(θ_0) →_d χ²_p as n → ∞; and

(b) under η_n = θ_0 + Δ n^{−1/2}, where Δ ∈ R^p is a constant vector, T̃^(m)_{φ,n}(θ_0) →_d χ²_p(σ^{−2}_{φ,m} Δᵗ I(θ_0) Δ) as n → ∞.

The special case of Theorem 2 for m = 1 was proved by Ekström (2013). Thus, for testing (3) based on the statistic T̃^(m)_{φ,n}(θ_0), we may take the rejection region ω = {T̃^(m)_{φ,n}(θ_0) ≥ c_α}, where α is the size of the test and the critical value c_α is such that ∫_{c_α}^{∞} dF_{χ²_p}(y) = α, with F_{χ²_p} the distribution function of χ²_p. The power of the test for local alternatives η_n = θ_0 + Δ n^{−1/2} around θ_0 is given by

    β*(θ_0) = ∫_{c_α}^{∞} dF_{χ²_p(σ^{−2}_{φ,m} Δᵗ I(θ_0) Δ)}(y),   (11)

where F_{χ²_p(σ^{−2}_{φ,m} Δᵗ I(θ_0) Δ)} denotes the distribution function of χ²_p(σ^{−2}_{φ,m} Δᵗ I(θ_0) Δ). To maximise the power, it is evident that the function φ should be chosen so that σ²_{φ,m} is minimised. Ekström et al. (2020) noted that σ²_{φ,m} ≥ 1 for all φ and m and that, for finite m, the minimum value is attained if and only if φ(x) = a log(x) + bx + c, x >
0, where a <
0. Thus, for finite m, within the class of tests based on T̃^(m)_{φ,n}(θ_0), the test based on φ(x) = −log(x), x > 0, is asymptotically locally most powerful. For m → ∞, we have the following result.

Theorem 3.
Suppose that m → ∞ such that m = o(n), and lim_{m→∞} σ²_{φ,m} = 1. Assume that θ̂^(m)_{φ,n} = arg inf_{θ∈Θ} S^(m)_{φ,n}(θ) is a consistent estimator of the true parameter η ∈ Θ ⊆ R^p. Then, under the assumptions of Theorem 1, we have

(a) under H_0 : η = θ_0, T̃^(m)_{φ,n}(θ_0) →_d χ²_p as n → ∞; and

(b) under η_n = θ_0 + Δ n^{−1/2}, T̃^(m)_{φ,n}(θ_0) →_d χ²_p(Δᵗ I(θ_0) Δ) as n → ∞.

The functions φ defined in (10) satisfy the assumption lim_{m→∞} σ²_{φ,m} = 1 (cf. Ekström et al., 2020). Theorem 3 suggests that, if m → ∞ such that m = o(n), then all tests based on φ with lim_{m→∞} σ²_{φ,m} = 1 have the same asymptotic power.

Observe that we can use two different convex functions φ_1 and φ_2 in (9), one for testing and the other for estimation. By doing so we obtain a larger class of tests, based on the test statistic

    T̃^(m)_{φ_1,φ_2,n}(θ_0) = 2n [S^(m)_{φ_1,n}(θ_0) − S^(m)_{φ_1,n}(θ̂^(m)_{φ_2,n})] / (σ²_{φ_1,m} E(ζ²_m φ_1″(ζ_m))).   (12)

By using φ_2(x) = −log(x) in (12), we get an asymptotically locally efficient testing procedure based on sample spacings for fixed m. When m → ∞ such that m = o(n), by choosing any φ_2 such that lim_{m→∞} σ²_{φ_2,m} = 1, we get an asymptotically locally efficient testing procedure based on sample spacings. Define

    T̃^(m)*_{φ,n}(θ_0) := 2n (S^(m)_{φ,n}(θ_0) − S^(m)_{φ,n}(θ̂^(m)_{−log,n})) / E(ζ²_m φ″(ζ_m)).

It is possible to extend Theorems 2-3 to the statistic in (12), as stated in the following results:
Corollary 1.
Suppose that m is finite. Then, under the assumptions of Theorem 2, we have

(a) under H_0, T̃^(m)*_{φ,n}(θ_0) →_d χ²_p as n → ∞; and

(b) under η_n = θ_0 + Δ n^{−1/2}, T̃^(m)*_{φ,n}(θ_0) →_d χ²_p(Δᵗ I(θ_0) Δ) as n → ∞.

Corollary 2.
Suppose that m → ∞ such that m = o(n), and lim_{m→∞} σ²_{φ_2,m} = 1. Then, under the assumptions of Theorem 3, we have

(a) under H_0, T̃^(m)_{φ_1,φ_2,n}(θ_0) →_d χ²_p as n → ∞; and

(b) under η_n = θ_0 + Δ n^{−1/2}, T̃^(m)_{φ_1,φ_2,n}(θ_0) →_d χ²_p(Δᵗ I(θ_0) Δ) as n → ∞.

From Corollaries 1 and 2, it follows that tests based on T̃^(m)*_{φ,n} and on T̃^(m)_{φ_1,φ_2,n}(θ_0), with lim_{m→∞} σ²_{φ_2,m} = 1, are asymptotically equivalent to the likelihood ratio test (Sen & Singer (1994)). Here we can also adopt the p-value approach for parametric tests. It is easily seen that, for testing (3), the tests based on T̃^(m)_{φ,n}(θ_0) have nested critical regions, i.e., ω_α ⊂ ω_{α′} if α < α′. In the case of nested critical regions, it is beneficial to determine not only whether the verdict is to accept or reject the null hypothesis at the given level of significance, but also the least level of significance, p̂(X) = inf{α : X ∈ ω_α}, at which the null hypothesis would be rejected for the available observation set (cf. Lehmann 1986, p. 75). The advantage of the p-value approach is that it enables one to reach the verdict of the test at any other level of significance as well.

The testing procedure discussed above can also be extended to composite hypotheses. For testing a composite hypothesis, it is pragmatic to present the testing problem as

    H_0 : h(η) = 0 against H_A : h(η) ≠ 0,   (13)

where Θ ⊆ R^p and h = (h_1, h_2, ..., h_r) : R^p → R^r is a vector-valued function such that, for every θ ∈ Θ, the p × r matrix H(θ) = ((∂h_j(θ)/∂θ_i)) exists, H(θ) is continuous in θ, and rank(H(θ)) = r. Under the above setup, an extension of the test statistic (9) is

    T^(m)_{φ,n}(h) = 2n [inf_{θ : h(θ)=0} S^(m)_{φ,n}(θ) − inf_{θ∈Θ} S^(m)_{φ,n}(θ)] / (σ²_{φ,m} E(ζ²_m φ″(ζ_m))).
(14)

Significantly large positive values of the test statistic T^(m)_{φ,n}(h) result in rejection of the null hypothesis. We have the following result regarding the asymptotic behaviour of T^(m)_{φ,n}(h) under the null hypothesis.

Theorem 4.
Let m = o(n) (m finite or m → ∞) and suppose that the assumptions of Theorem 1 hold. Assume that θ̂^(m)_{φ,n} = arg inf_{θ∈Θ} S^(m)_{φ,n}(θ) is a consistent estimator of the true parameter η ∈ Θ ⊆ R^p. Then, under H_0, T^(m)_{φ,n}(h) →_d χ²_r as n → ∞.

This result gives another similarity between spacings based tests and the likelihood ratio test (Sen & Singer (1994)). Observe that the test procedure based on T^(m)_{φ,n}(h) can be used for testing a regression model when the errors follow a mixture or a heavy tailed distribution and the likelihood ratio test is not applicable.

This idea can also be extended to a nonparametric setting. Suppose that we have a random sample from a distribution function F ∈ F, where F is a family of absolutely continuous distribution functions on the real line. To test H_0 : F = F_0 against H_A : F ≠ F_0, where F_0 ∈ F is completely specified, an extension of the test statistic (9) is

    T^(m)_{φ,n}(F_0) = S^(m)_{φ,n}(F_0) − inf_{F∈F} S^(m)_{φ,n}(F),   (15)

where

    D^(m)_{j,n}(F) = F(X_{jm:n}) − F(X_{(j−1)m:n}),   j = 1, 2, ..., ⌊(n+1)/m⌋,   F ∈ F,

and S^(m)_{φ,n}(F) = (1/M) Σ_{j=1}^{M} φ(M D^(m)_{j,n}(F)). Clearly, inf_{F∈F} S^(m)_{φ,n}(F) is attained for the distribution which puts equal mass on every spacing. Thus, in such a case, the test statistic T^(m)_{φ,n}(F_0) is equivalent to S^(m)_{φ,n}(F_0). Under H_0, the U_i := F_0(X_i) are independent uniform random variables over [0, 1]. The asymptotic distribution of S^(m)_{φ,n}(F_0), under the null and under local alternatives converging to the null hypothesis at the rate n^{−1/4}, was first obtained in the case of simple spacings by Sethuraman & Rao (1970), and in the case of m-spacings, for fixed m, by Del Pino (1979). Test statistics of the type S^(m)_{φ,n}(F_0), which are symmetrically based on spacings, cannot detect alternatives converging to the null at a rate faster than n^{−1/4}.
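As an illustration of the nonparametric use of (15) (our own sketch), with U_i = F_0(X_i) the statistic S^(m)_{φ,n}(F_0) is just an m-spacings statistic for uniformity; here with the Greenwood choice φ(x) = x²:

```python
import numpy as np
from scipy.stats import norm

def uniform_spacings_stat(u, m, phi):
    """S^(m)_{phi,n}(F0) computed from U_i = F0(X_i); cf. eq. (15)."""
    n = len(u)
    v = np.concatenate(([0.0], np.sort(u), [1.0]))
    M = (n + 1) // m
    D = np.diff(v[m * np.arange(M + 1)])
    return np.mean(phi(M * D))

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
g_null = uniform_spacings_stat(norm.cdf(x), m=1, phi=lambda t: t ** 2)            # correct F0
g_alt = uniform_spacings_stat(norm.cdf(x, loc=0.5), m=1, phi=lambda t: t ** 2)    # misspecified F0
```

Under H_0 the Greenwood value concentrates near 2 (the second moment of a standard exponential); misspecifying F_0 makes the transformed spacings uneven and inflates the statistic.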
Holst and Rao (1981) considered test statistics asymmetrically based on spacings and found that such tests can discriminate alternatives converging to the null distribution at a rate of n^{−1/2}, as in the case of the Kolmogorov-Smirnov and Cramér-von Mises tests.

3 Tests for multivariate distributions

The results of the previous section can be extended to multivariate distributions for m = 1. Spacings for multivariate observations are defined in terms of nearest neighbour balls. Let X_1, X_2, ..., X_n be independent and identically distributed d-dimensional random vectors from an absolutely continuous distribution function F_η, η ∈ Θ ⊆ R^p. The nearest neighbour distance of X_i is defined as

    R_n(i) := min_{j ≠ i} ||X_i − X_j||,   i = 1, 2, ..., n,

where ||·|| is some distance measure on R^d. Let B(x, r) = {y ∈ R^d : ||x − y|| ≤ r} be the closed ball with centre x ∈ R^d and radius r (> 0). Denote the nearest neighbour of X_i by X_{i(nn)} and the nearest neighbour ball of X_i by B_n(X_i) := {y ∈ R^d : ||X_i − y|| ≤ R_n(i)}. Let f_θ and P_θ, respectively, be the density function and the probability measure corresponding to the distribution F_θ, θ ∈ Θ. Define random variables ξ_{i,n} as

    ξ_{i,n}(θ) = n P_θ(B_n(X_i)) = n ∫_{B_n(X_i)} dP_θ(y),   θ ∈ Θ, i = 1, 2, ..., n.

Ranneby et al. (2005) extended the idea of the maximum spacing estimator to multivariate observations and defined the multivariate maximum spacing estimator as θ̂_n = arg sup_{θ∈Θ} (1/n) Σ_{i=1}^{n} log(ξ_{i,n}(θ)). Kuljus & Ranneby (2015) extended this idea to any strictly convex function φ : (0, ∞) → (0, ∞) such that φ has a minimum at x = 1. They defined the generalised spacing function S_{φ,n}(θ) and the generalised spacing estimator η̂_{φ,n} as follows:

    S_{φ,n}(θ) = (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(θ)),   θ ∈ Θ;   η̂_{φ,n} = arg min_{θ∈Θ} S_{φ,n}(θ).   (16)

If the minimiser does not exist, the estimator can be suitably modified along the lines suggested by Kuljus & Ranneby (2015).
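A sketch of computing ξ_{i,n}(θ) and the criterion in (16) follows. This is our own illustration: it assumes a N(μ, I_2) model and uses the sup-norm as the distance measure ||·||, so that P_θ of a nearest neighbour ball factorises into a product of interval probabilities; the helper names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import norm

def xi(x, mu):
    """xi_{i,n}(theta) = n P_theta(B_n(X_i)) for a N(mu, I_d) model under the sup-norm."""
    n = len(x)
    # sup-norm nearest neighbour distances R_n(i); k=2 because the closest point is X_i itself
    r = cKDTree(x).query(x, k=2, p=np.inf)[0][:, 1]
    lo = x - r[:, None] - mu
    hi = x + r[:, None] - mu
    return n * np.prod(norm.cdf(hi) - norm.cdf(lo), axis=1)

def S_phi(x, mu, phi=lambda t: t - np.log(t)):   # strictly convex with minimum at t = 1
    return np.mean(phi(xi(x, np.asarray(mu, dtype=float))))

rng = np.random.default_rng(5)
x = rng.normal(size=(400, 2))
S_true = S_phi(x, [0.0, 0.0])   # criterion at the true mean
S_off = S_phi(x, [1.0, 1.0])    # criterion at a misspecified mean
```

Minimising S_phi over μ would give the generalised spacing estimate η̂_{φ,n} of (16); here we only check that the criterion is smaller at the true parameter than at a misspecified one.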
For simplicity we assume that the minimiser exists. Consider the following notation:

    q(x) = x φ′(x),

    σ²_q = q(0)² + ∫_0^∞ ∫_0^∞ k(s, t) dq(s) dq(t) + 2 q(0) ∫_0^∞ k(0, t) dq(t),

    k(s, t) = e^{−t} − t e^{−s−t} + e^{−s−t} ∫_{W(s,t)} (e^{β(s,t,x)} − 1) dx,   0 ≤ s ≤ t ≤ ∞,

where W(s, t) = {x ∈ R^d : r_1 ≤ ||x|| ≤ r_1 + r_2} and β(s, t, x) = ∫_{B(0,r_1) ∩ B(x,r_2)} dz, with t and s denoting the volumes of the balls B(0, r_1) and B(0, r_2), respectively.

Under some general assumptions, Kuljus & Ranneby (2020) proved the asymptotic normality of η̂_{φ,n}, i.e.,

    √n (η̂_{φ,n} − η) →_d N(0, (σ²_q / b²_φ) I(η)^{−1}) as n → ∞,   (17)

where b_φ = E(Z² φ″(Z)) with Z a standard exponential random variable, σ²_q is as defined above, and I(η) is the Fisher information matrix.

For a specified θ_0 ∈ Θ, we wish to test the null hypothesis H_0 : η = θ_0 against the alternative H_A : η ≠ θ_0. Define

    T_{φ,n}(θ_0) = (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(θ_0)) − inf_{θ∈Θ} (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(θ)) = (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(θ_0)) − (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(η̂_{φ,n}))

and

    T̃_{φ,n}(θ_0) = (2n b_φ / σ²_q) T_{φ,n}(θ_0).   (18)

For testing H_0 : η = θ_0 against H_A : η ≠ θ_0, similarly to the previous discussion, we reject H_0 for large values of the test statistic T̃_{φ,n}(θ_0). The following theorem gives the asymptotic distribution of T̃_{φ,n}(θ_0) under the null and under a local alternative sequence.

Theorem 5.
Assume that η̂_{φ,n} = arg min_{θ∈Θ} S_{φ,n}(θ) is a consistent estimator of the true parameter η ∈ Θ ⊆ R^p. Then, under the assumptions of Theorem 1 of Kuljus & Ranneby (2020), we have

(a) under H_0, T̃_{φ,n}(θ_0) →_d χ²_p as n → ∞; and

(b) under η_n = θ_0 + Δ n^{−1/2}, T̃_{φ,n}(θ_0) →_d χ²_p((b²_φ / σ²_q) Δᵗ I(θ_0) Δ) as n → ∞.

Using Theorem 5, the asymptotic test procedure for a level α test is exactly the same as in the univariate case. The asymptotic power of this test for local alternatives is also the same as in the univariate case.

This procedure can be extended to testing composite hypotheses for absolutely continuous multivariate distributions. Practically, a composite hypothesis testing problem can be written as

    H_0 : h(η) = 0 against H_A : h(η) ≠ 0,   (19)

where h = (h_1, h_2, ..., h_r) : R^p → R^r is a vector-valued function such that the p × r matrix H(θ) = ((∂h_j(θ)/∂θ_i)) exists and is continuous in θ, and rank(H(θ)) = r. An extension of the test is based on the statistic

    T_{φ,n}(h) = (2n b_φ / σ²_q) [inf_{θ : h(θ)=0} (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(θ)) − inf_{θ∈Θ} (1/n) Σ_{i=1}^{n} φ(ξ_{i,n}(θ))],   (20)

and large positive values of the test statistic result in rejection of the null hypothesis.
Assume that η̂_{φ,n} = arg inf_{θ∈Θ} S_{φ,n}(θ) is a consistent estimator of the true parameter η ∈ Θ ⊆ R^p. Then, under the assumptions of Theorem 1 of Kuljus & Ranneby (2020) and under H_0, we have T_{φ,n}(h) →_d χ²_r as n → ∞.

Similarly to the univariate case, we can use two different convex functions φ_1 and φ_2, one for estimation and the other for testing H_0, and obtain results similar to those stated in Theorems 5-6.

Let d be a semi-metric, which means that it satisfies the following conditions: d(s, t) ≥ 0 for all s and t; d(s, t) = 0 iff s = t; and d(s, t) = d(t, s). Suppose that d(s, ·) is thrice continuously differentiable and that τ_{φ,m} = ∂d(s, E(ζ_m))/∂s |_{s=E(ζ_m)} exists and is non-zero. Then all the test statistics (univariate and multivariate) can be modified by replacing the difference by the semi-metric and dividing by τ_{φ,m}, and analogues of the theorems and corollaries discussed above can be obtained (cf. Ekström, 2013).

4 Numerical study

In this section we evaluate the finite sample performance of the proposed tests and compare their empirical powers with those of likelihood ratio based tests. One of the essential properties of a good test is that its type-I error rate should be close to the level of the test; if the type-I error rate is away from the level, the conclusions drawn may be spurious. Obviously, both the type-I error rate and the power of the test depend on n and m. In the following numerical study we have taken 10,

Define m_opt(n) to be the value of m for which the type-I error rate is closest to the level of the test, for a given sample size n; m_opt(n) may also depend on the hypothesized model.

Example 1.
Suppose that we want to test H_0 : η = 1 against the alternative H_A : η ≠ 1 based on a random sample from the exponential distribution with failure rate parameter η ∈ (0, ∞) = Θ. Let X_1, ..., X_n be an available random sample from the distribution. Under H_0, T̃^(m)_{φ,n}(1) →_d χ²_1 as n → ∞ for m = o(n). We take α = 0.
05 as the level of significance. Then H_0 is rejected if T̃^(m)_{φ,n}(1) > χ²_{1,0.05} ≈ 3.84; we first consider n = 100, m = 1 and φ(x) = −log(x). We have studied the performance of the test procedure based on T̃^(m)_{φ,n}(1) for different values of n and m, and for several fixed alternatives θ. We found that m_opt(n) increases as n increases and, for the sample sizes considered, m_opt(n) ≤
5. Figure 1 gives the empirical powers of the test for n = 200 and φ(x) = −log(x); here m_opt(n) = 3.

Figure 1: The empirical powers of T̃^(1)_{−log,n}(1) (hollow circles), T̃^(m_opt(n))_{−log,n}(1) (solid circles) and the likelihood ratio test (hollow squares), for n = 200.

For failure rates greater than 1, the likelihood ratio test performs better than the tests based on T̃^(m)_{−log,n}(1), but for failure rates less than 1, the tests based on T̃^(m)_{−log,n}(1) outperform the likelihood ratio test. For other choices of φ, the tests based on T̃^(m)_{φ,n}(1) perform slightly worse.

If the assumptions (A1)-(A6) are not satisfied, the limiting distribution of T̃^(m)_{φ,n} may not be chi-square (cf. Ekström, 2013). Sometimes, due to unboundedness of the likelihood function, the likelihood ratio test cannot be used, but tests based on T̃^(m)_{φ,n}(θ_0) may still be applicable, as illustrated in the following example.

Example 2.
Consider a population with distribution function

    F_η(x) = (1/2) Φ(x − μ) + (1/2) Φ((x − μ)/σ),   −∞ < x < ∞,

where η = (μ, σ) ∈ R × (0, ∞). Let X_1, ..., X_n be a random sample from this distribution. Suppose that we want to test H_0 : η = (0,
1) against H_A : η ≠ (0, 1). The distribution of T̃^(1)_{−log,n}(0,
1) is well approximated by the limiting distribution for n = 225, under the null hypothesis as well as under local alternatives. Figure 2 shows that, for n = 225, the distribution of T̃^(m)_{−log,n}(0,
1) is better approximated by the limiting chi-square distribution for m = m_opt = 2 than for m = 1. For the local alternative η_n = (0, 1) + Δ n^{−1/2}, we take Δ = (3,
3); then, for n = 225, η_n = (0.2, 1.2). The distribution of T̃^(m)_{−log,n}(0,
1) for m = m_opt(n) = 2 is closer to the limiting non-central chi-square distribution than for the case m = 1. For other choices of φ, similar observations were made.

Figure 2: Q-Q plots (empirical quantiles against quantiles of the limiting distribution) of 1000 replicates of T̃^(m)_{−log,n}(θ_0), n = 225, under H_0, for m = 1 and m = 2: empirical distribution against the limiting χ²_2 distribution.

We observed that m_opt(n) depends not only on n but also on the underlying model. To obtain m_opt(n), we propose the following algorithm based on the hypothesized model and the data.

Step-I Compute θ̂^(1)_{φ,n} using the given data and the hypothesized model.

Step-II For some large positive integer B, on each occasion b ∈ {1, 2, ..., B}, draw a bootstrap sample x*_{b1}, x*_{b2}, ..., x*_{bn} from the population with distribution F_{θ̂^(1)_{φ,n}}. Let M be a set of suitably chosen natural numbers, containing the competing choices of m.
For each m ∈ M compute type-I error rate for the test, based on the B bootstrap samples.Step-III Take m opt ( n ) = (cid:8) m ∈ M : type-I error rate is closest to the level for the test based on ˜ T ( m ) φ,n (cid:9) .We have taken M to be the set of all natural numbers ≤ n .If m opt ( n ) has multiple values, we take minimum of those m opt ( n ). For the models considered inthe examples, as n increases type-I error rates remain close to the level upto certain large value of m . For sample size n ≥
50, our proposed tests for m opt ( n ), perform similar to likelihood ratio test. Example 3.
Suppose that X_1, ..., X_n is a random sample from a bivariate normal population N_2(η, Σ), where η = (μ_1, μ_2)^t ∈ R² and Σ is a specified 2 × 2 covariance matrix. Consider the testing problem H_0: η = (0, 0)^t against H_A: η ≠ (0, 0)^t. To examine the finite sample performance of the proposed test statistic T̃_{φ,n}(θ_0), we take n = 100 and φ(x) = −log(x) + x − 1. Under H_0, the distribution of T̃_{φ,n}(θ_0) is reasonably close to the limiting χ²_2 distribution (see Figure 4). To assess the performance of the test under local alternatives, we take ∆ = (1, 1)^t, so that θ_n = θ_0 + ∆n^{−1/2} = (0.1, 0.1)^t. Figure 4 shows that under this alternative, the distribution of T̃_{φ,n}(θ_0) is reasonably close to the limiting distribution χ²_2((b²_φ/σ²_q) ∆^t I(θ_0)∆). For other choices of φ, the performance of the corresponding test statistics is similar. In this case the likelihood ratio test performs better than the proposed tests.

Figure 3: Q-Q plots of 1000 replicates of T̃^(m)_{−log,n}(θ_0), n = 225, under the alternative θ_n = θ_0 + ∆n^{−1/2} with ∆ = (3, ·)^t: empirical quantiles against quantiles of the limiting χ²(∆^t I(θ_0)∆) distribution, for m = 1 (left panel) and m = 2 (right panel).

Example 4.
Consider a bivariate population with probability measure

  (1/2) N_2((μ_1, μ_2)^t, I_2) + (1/2) N_2((μ_1, μ_2)^t, [1 ρ; ρ 1]),

where N_2(τ, Σ) denotes the probability measure of a bivariate normal distribution with mean vector τ and covariance matrix Σ. Here η = (μ_1, μ_2, ρ)^t ∈ R² × (−1, 1). Let X_1, X_2, ..., X_n be a random sample from this population. We want to test H_0: η = (0, 0, 0)^t against H_A: η ≠ (0, 0, 0)^t. With (μ_1, μ_2) fixed at the sample mean, the likelihood increases unboundedly as ρ approaches 1, so in this situation the likelihood ratio test cannot be used. For this problem, the conditions required for the proposed tests are satisfied. To evaluate the finite sample performance of the proposed test statistic T̃_{φ,n}(θ_0), we take n = 225 and φ(x) = −log(x) + x − 1. Under H_0, the distribution of T̃_{φ,n}(θ_0) is reasonably close to the limiting χ²_3 distribution (see Figure 5). To assess the performance of the test under local alternatives, we take ∆ = (0.·, 0.·, 0.·)^t and θ_n = θ_0 + ∆n^{−1/2}. Figure 5 shows that under this alternative, the distribution of T̃_{φ,n}(θ_0) is reasonably close to the limiting distribution χ²_3((b²_φ/σ²_q) ∆^t I(θ_0)∆).

Figure 4: Left: Q-Q plot of 1000 replicates of T̃_{φ,n}(θ_0), n = 100 and φ(x) = −log(x) + x − 1, under H_0, empirical quantiles against quantiles of the limiting χ²_2 distribution. Right: the corresponding plot under the alternative θ_n = θ_0 + ∆n^{−1/2} with ∆ = (1, 1)^t and limiting distribution χ²_2((b²_φ/σ²_q) ∆^t I(θ_0)∆).

In this paper, we have studied several new parametric tests for univariate and multivariate absolutely continuous distributions, based on sample spacings, for simple as well as composite hypotheses. For univariate distributions, the test based on T̃^(m)_{φ,n}(θ_0) has, under fairly general conditions, asymptotic properties similar to those of the likelihood ratio test. For finite m, the test corresponding to φ(x) = −log(x) is the asymptotically efficient one. For m → ∞ such that m = o(n), there are multiple asymptotically efficient tests; for example, any test corresponding to φ = φ_γ (see (10)) is asymptotically efficient. For multivariate random variables, spacings are defined in terms of nearest neighbour balls. For simple multivariate spacings, the distribution of T̃_{φ,n} is similar to that of the likelihood ratio test statistic; in this case we do not have an asymptotically efficient test. The simulation study shows that, for finite sample sizes (n ≥ 50), the distributions of all the test statistics (corresponding to different convex functions φ) are reasonably close to their limiting distributions under the null hypothesis as well as under local alternatives. For certain distributions the likelihood ratio test does not exist (due to unboundedness of the likelihood function), and in such situations tests based on spacings may be useful.
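The non-existence of the likelihood ratio test in Example 4 can be checked numerically: fixing (μ_1, μ_2) at one observation and letting ρ → 1 drives the likelihood to infinity. A minimal sketch, with both normal densities written out explicitly; the small sample below is hand-picked for illustration, not from the paper:

```python
import numpy as np

def mixture_loglik(x, mu, rho):
    """Log-likelihood of (1/2) N2(mu, I) + (1/2) N2(mu, [[1, rho], [rho, 1]])."""
    d = x - mu                                   # deviations from the common mean
    q1 = np.sum(d ** 2, axis=1)                  # quadratic form for Sigma = I
    det = 1.0 - rho ** 2                         # determinant of the second Sigma
    q2 = (d[:, 0] ** 2 - 2 * rho * d[:, 0] * d[:, 1] + d[:, 1] ** 2) / det
    f1 = np.exp(-q1 / 2) / (2 * np.pi)
    f2 = np.exp(-q2 / 2) / (2 * np.pi * np.sqrt(det))
    return float(np.sum(np.log(0.5 * f1 + 0.5 * f2)))

x = np.array([[0.0, 0.0], [2.0, -1.0], [-1.0, 2.0], [1.0, 1.0], [-2.0, -2.0]])
mu = x[0]                                        # centre the mixture at one observation
lls = [mixture_loglik(x, mu, rho) for rho in (0.5, 0.99, 1 - 1e-9)]
# lls is increasing: the second component's density at x[0] behaves like
# 1 / sqrt(1 - rho^2), so the likelihood is unbounded as rho -> 1
```

This degeneracy is exactly why the likelihood ratio test fails here, while the spacings statistics, which depend on the model only through the probability transform F_θ, remain well defined.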
Appendix
Proof of Theorem 2.
For simplicity, let us denote S^(m)_{φ,n}(θ_0), T^(m)_{φ,n}(θ_0) and θ̂^(m)_{φ,n}, respectively, by S, T and θ̂. Also denote

  S'_j(τ) = ∂S(θ)/∂θ_j |_{θ=τ},  S''_{j,k}(τ) = ∂²S(θ)/(∂θ_j ∂θ_k) |_{θ=τ},  S'''_{j,k,l}(τ) = ∂³S(θ)/(∂θ_j ∂θ_k ∂θ_l) |_{θ=τ},

and δ = θ_0 − θ̂, with δ_j as the j-th component of δ. Using Taylor's expansion about θ̂, we have

  T = Σ_{j=1}^p δ_j S'_j(θ̂) + (1/2) Σ_{j=1}^p Σ_{k=1}^p δ_j δ_k ( S''_{jk}(θ̂) + (1/3) Σ_{l=1}^p δ_l S'''_{jkl}(θ*) ),   (21)

where θ* is a point on the line segment joining θ_0 and θ̂. Under H_0 we have θ̂ →p θ_0, which implies that θ̂ lies in any open neighbourhood of θ_0 in Θ with probability tending to one. By assumption we have S'_j(θ̂) = 0, j = 1, ..., p. Observe that, using Taylor's expansion about θ_0, we have

  S''_{jk}(θ̂) = S''_{jk}(θ_0) − Σ_{l=1}^p δ_l S'''_{jkl}(θ**),   (22)

where θ** is a point on the line segment joining θ_0 and θ̂. Using arguments as those in the proof of Theorem 2 of Ekström et al. (2020), we have, for j, k = 1, 2, ..., p,

  S''_{jk}(θ_0) + (1/3) Σ_{l=1}^p δ_l S'''_{jkl}(θ*) − Σ_{l=1}^p δ_l S'''_{jkl}(θ**) = S''_{jk}(θ_0) + o_p(1) →p E(ζ²_m φ''(ζ_m)) I_{j,k}(θ_0).   (23)

From (21)-(23) and Theorem 1, we get

  2nT = E(ζ²_m φ''(ζ_m)) n Σ_{j=1}^p Σ_{k=1}^p δ_j δ_k I_{jk}(θ_0) + o_p(1) = E(ζ²_m φ''(ζ_m)) n (θ_0 − θ̂)^t I(θ_0)(θ_0 − θ̂) + o_p(1),   (24)

and thus, under H_0,

  T̃^(m)_{φ,n} = 2nT / (E(ζ²_m φ''(ζ_m)) σ²_{φ,m}) = (n/σ²_{φ,m}) (θ_0 − θ̂)^t I(θ_0)(θ_0 − θ̂) + o_p(1) →d χ²_p.   (25)

This concludes the proof of part (a). For η_n = θ_0 + ∆n^{−1/2}, observe that

  √n (θ̂ − θ_0) = √n (θ̂ − θ_n) + ∆ →d N(∆, σ²_{φ,m} I(θ_0)^{−1}).   (26)

Hence, using arguments similar to those in part (a), we get

  T̃_{φ,n} →d χ²_p(σ^{−2}_{φ,m} ∆^t I(θ_0)∆).   (27)

This completes the proof of part (b). Theorem 3 can be proved using arguments similar to those in Theorem 2 and part (b) of Theorem 1.

Figure 5: Left: Q-Q plot of 1000 replicates of T̃_{φ,n}(θ_0), n = 225 and φ(x) = −log(x) + x − 1, under H_0, empirical quantiles against quantiles of the limiting χ²_3 distribution. Right: the corresponding plot under the alternative θ_n = θ_0 + ∆n^{−1/2} with ∆ = (0.·, 0.·, 0.15)^t and limiting distribution χ²_3((b²_φ/σ²_q) ∆^t I(θ_0)∆).

Proof of Corollary 2.
From Theorem 1 and arguments like those in Corollary 1 of Ekström et al. (2020),

  √n (θ̂^(m)_{φ_1,n} − θ_0) →d N(0, I(θ_0)^{−1}).   (28)

Using (21)-(23) and arguments like those in the proof of Theorem 2, we have, under H_0,

  T̃^(m)_{φ_1,φ_2,n} = 2n (S_{φ_2,n}(θ_0) − S_{φ_2,n}(θ̂_{φ_1,n})) / (E(ζ²_m φ''_2(ζ_m)) σ²_{φ_2,m}) = n (θ_0 − θ̂_{φ_1,n})^t I(θ_0)(θ_0 − θ̂_{φ_1,n}) + o_p(1) →d χ²_p.   (29)

Similarly, for η_n = θ_0 + ∆n^{−1/2},

  √n (θ̂_{φ_1,n} − θ_0) →d N(∆, I(θ_0)^{−1}),   (30)

and thus

  T̃^(m)_{φ_1,φ_2,n} = n (θ_0 − θ̂_{φ_1,n})^t I(θ_0)(θ_0 − θ̂_{φ_1,n}) + o_p(1) →d χ²_p(∆^t I(θ_0)∆).   (31)

Corollary 1 can be proved using similar arguments as in Corollary 2.

Proof of Theorem 4.
The testing problem can be equivalently written as

  H_0: η = g(β) against H_A: η ≠ g(β),   (32)

where g = (g_1, g_2, ..., g_p): R^{p−r} → R^p is a vector-valued function such that the p × (p − r) matrix G(β) = ((∂g_i(β)/∂β_j)) exists and is continuous in β, and rank(G(β)) = p − r. For example, let p = 3, η = (η_1, η_2, η_3)^t and h(η) = η_1 − η_2; then β = (β_1, β_2)^t and g(β) = (β_1, β_1, β_2)^t. For the testing problem (32), the proof is similar to that of Theorem 5.6.3 of Sen & Singer (1994).

Proof of Theorem 5.
This can be proved similarly to Theorem 2, using the results of Kuljus & Ranneby (2015) and Kuljus & Ranneby (2020).
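The pattern used throughout these proofs, a normalized quadratic form n(θ_0 − θ̂)^t I(θ_0)(θ_0 − θ̂) converging to χ²_p under H_0 as in (25) and to a noncentral χ²_p under local alternatives as in (27), can be illustrated by simulation. The sketch below uses the sample mean of a N_p(θ, I_p) sample as a stand-in estimator (so I(θ) = I_p); the spacings estimators of the paper satisfy the same limits up to the σ²_{φ,m} factor.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
p, n, reps = 2, 400, 2000
theta0 = np.zeros(p)
delta = np.array([1.0, 1.0])                 # local-alternative direction

q_null, q_alt = np.empty(reps), np.empty(reps)
for r in range(reps):
    # under H0: sample from N_p(theta0, I); theta_hat is the sample mean
    x = rng.normal(theta0, 1.0, size=(n, p))
    d = theta0 - x.mean(axis=0)
    q_null[r] = n * d @ d                    # n (theta0 - theta_hat)^t I (theta0 - theta_hat)
    # under theta_n = theta0 + delta / sqrt(n): same statistic, noncentral limit
    x = rng.normal(theta0 + delta / np.sqrt(n), 1.0, size=(n, p))
    d = theta0 - x.mean(axis=0)
    q_alt[r] = n * d @ d

# empirical rejection rates at the chi^2_p 95% cutoff
cut = stats.chi2.ppf(0.95, df=p)
size = np.mean(q_null > cut)                 # should be near the level 0.05
power = np.mean(q_alt > cut)                 # governed by the chi^2_p(delta^t delta) tail
```

The rejection rate under H_0 stays near the nominal level, while under θ_n it is pulled up by the noncentrality ∆^t I(θ_0)∆, mirroring parts (a) and (b) of Theorem 2.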
Proof of Theorem 6.
This can be proved similarly to Theorem 4, using the results of Kuljus & Ranneby (2015) and Kuljus & Ranneby (2020).
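As a computational supplement, the bootstrap algorithm for choosing m_opt(n) (Steps I-III, stated before Example 3) can be sketched as follows. The statistic computed here is the raw S^(m)_{φ,n}(θ) of (2) built from the disjoint spacings (1); the fitting routine `fit`, the model `sampler`, and the critical value `crit` (standing in for the cutoff of the standardized T̃^(m)_{φ,n}) are hypothetical user-supplied placeholders, so this illustrates the control flow rather than the paper's implementation.

```python
import numpy as np

def disjoint_spacings_stat(x, cdf, m, phi):
    """S^(m)_{phi,n}(theta) of (2): the average of phi(M * D^(m)_{j,n}(theta))
    over the M = floor((n+1)/m) disjoint m-step spacings of (1)."""
    n = len(x)
    # F(X_{0:n}) = 0, F(X_{1:n}), ..., F(X_{n:n}), F(X_{n+1:n}) = 1
    u = np.concatenate(([0.0], np.sort(cdf(x)), [1.0]))
    M = (n + 1) // m
    D = np.diff(u[m * np.arange(M + 1)])     # D^(m)_{j,n}, j = 1, ..., M
    return float(np.mean(phi(M * D)))

def m_opt_bootstrap(x, fit, cdf_family, sampler, phi, crit,
                    level=0.05, B=500, Ms=(1, 2, 3, 4, 5), seed=None):
    """Steps I-III: fit theta, resample B times under the fitted model, and pick
    the m whose bootstrap type-I error rate is closest to the nominal level
    (taking the smallest such m on ties)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = fit(x)                                    # Step I
    rates = {}
    for m in Ms:                                          # Step II
        rej = sum(
            disjoint_spacings_stat(sampler(theta_hat, n, rng),
                                   cdf_family(theta_hat), m, phi) > crit(m, n)
            for _ in range(B)
        )
        rates[m] = rej / B
    best = min(abs(r - level) for r in rates.values())    # Step III
    return min(m for m in Ms if abs(rates[m] - level) == best)
```

For instance, for an Exp(θ) model one could take `fit = lambda x: 1/np.mean(x)`, `cdf_family = lambda t: (lambda y: 1 - np.exp(-t*y))` and `sampler = lambda t, n, r: r.exponential(1/t, n)`; the paper itself standardizes S^(m)_{φ,n} into T̃^(m)_{φ,n} and compares it with its χ² critical value.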
References

[1] Cheng, R. C. H., & Amin, N. A. K. (1983). Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society: Series B (Methodological), 45(3), 394-403.
[2] Csiszár, I. (1977). Information measures: a critical survey. Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp. 73-86.
[3] Del Pino, G. E. (1979). On the asymptotic distribution of k-spacings with applications to goodness-of-fit tests. The Annals of Statistics, 1058-1065.
[4] Ekström, M. (2013). Powerful parametric tests based on sum-functions of spacings. Scandinavian Journal of Statistics, 40, 886-898.
[5] Ekström, M., Mirakhmedov, S. M., & Jammalamadaka, S. R. (2020). A class of asymptotically efficient estimators based on sample spacings. TEST, 29(3), 617-636.
[6] Ghosh, K., & Jammalamadaka, S. R. (2001). A general estimation method using spacings. Journal of Statistical Planning and Inference, 93, 71-82.
[7] Greenwood, M. (1946). The statistical study of infectious diseases. Journal of the Royal Statistical Society, 109(2), 85-110.
[8] Holst, L., & Rao, J. S. (1981). Asymptotic spacings theory with applications to the two-sample problem. Canadian Journal of Statistics, 9(1), 79-89.
[9] Kimball, B. F. (1947). Some basic theorems for developing tests of fit for the case of the non-parametric probability distribution function, I. The Annals of Mathematical Statistics, 18(4), 540-548.
[10] Kuljus, K., & Ranneby, B. (2015). Generalized maximum spacing estimation for multivariate observations. Scandinavian Journal of Statistics, 42(4), 1092-1108.
[11] Kuljus, K., & Ranneby, B. (2020). Asymptotic normality of generalized maximum spacing estimators for multivariate observations. Scandinavian Journal of Statistics, 47, 968-989.
[12] Lehmann, E. L., & Casella, G. (1998). Theory of Point Estimation, 2nd edn. Springer, New York.
[13] Mirakhmedov, S. A. (2005). Lower estimation of the remainder term in the CLT for a sum of the functions of k-spacings. Statistics & Probability Letters, 73(4), 411-424.
[14] Misra, N., & van der Meulen, E. C. (2001). A new test of uniformity based on overlapping sample spacings. Communications in Statistics - Theory and Methods, 30(7), 1435-1470.
[15] Moran, P. A. P. (1951). The random division of an interval - Part II. Journal of the Royal Statistical Society: Series B (Methodological), 13(1), 147-150.
[16] Pitman, E. J. (2018). Some Basic Theory for Statistical Inference: Monographs on Applied Probability and Statistics. CRC Press.
[17] Prakasa Rao, B. L. S. (1983). Nonparametric Functional Estimation. Academic Press, Orlando.
[18] Ranneby, B. (1984). The maximum spacing method. An estimation method related to the maximum likelihood method. Scandinavian Journal of Statistics, 93-112.
[19] Ranneby, B., Jammalamadaka, S. R., & Teterukovskiy, A. (2005). The maximum spacing estimation for multivariate observations. Journal of Statistical Planning and Inference, 129(1-2), 427-446.
[20] Rao, J. S. (1969). Some contributions to the analysis of circular data (Doctoral dissertation, Indian Statistical Institute, Kolkata).
[21] Sen, P. K., & Singer, J. M. (1994). Large Sample Methods in Statistics: An Introduction with Applications (Vol. 25). CRC Press.
[22] Sethuraman, J., & Rao, J. S. (1970). Pitman efficiencies of tests based on spacings. In Nonparametric Techniques in Statistical Inference (M. L. Puri, ed.).
[23] Torabi, H. (2006). A new method for hypotheses testing using spacings. Statistics & Probability Letters, 76(13), 1345-1347.
[24] Zhou, S., & Jammalamadaka, S. R. (1993). Goodness of fit in multidimensions based on nearest neighbour distances. Journal of Nonparametric Statistics, 2(3), 271-284.