Bayesian nonparametric tests for multivariate locations
arXiv [math.ST]. Electronic Journal of Statistics (Submitted)
Vol. 0 (0000), ISSN: 1935-7524
Date: July 2, 2020
Indrabati Bhattacharya and Subhashis Ghosal
Department of Statistics, North Carolina State University
e-mail: [email protected]; [email protected]

Abstract:
In this paper, we propose Bayesian non-parametric tests for one-sample and two-sample multivariate location problems. We model the underlying distributions using a Dirichlet process prior. For the one-sample problem, we compute a Bayesian credible set of the multivariate spatial median and accept the null hypothesis if the credible set contains the null value. For the two-sample problem, we form a credible set for the difference of the spatial medians of the two samples and we accept the null hypothesis of equality if the credible set contains zero. We derive the local asymptotic power of the tests under shrinking alternatives, and also present a simulation study to compare the finite-sample performance of our testing procedures with existing parametric and non-parametric tests.

Keywords and phrases:
Bayesian nonparametrics, hypothesis testing, credible region, Pitman alternatives.
1. Introduction
Several frequentist testing procedures for multivariate locations are available in the literature, both parametric and non-parametric. The best-known parametric procedure is Hotelling's $T^2$-test, which is based on the multivariate mean vector and the covariance matrix and relies on the assumption of multivariate normality. This technique performs well if the assumption of multivariate normality is nearly correct, but suffers heavily otherwise, or in the presence of outliers. Non-parametric and robust alternatives based on signs and ranks have been quite popular over the years (Oja and Randles (2004)).

The notions of signs and ranks are based on the "ordering" of the data points, but in the multivariate setting there is no objective basis for ordering. The notions are generalized to higher dimensions using $\ell_1$-objective functions (see Section 2). The existing one-sample location problem has the following setup. Suppose that we have $n$ observations $Y = (Y_1, \ldots, Y_n)$, $Y_i \in \mathbb{R}^k$, from a distribution $P(y - \theta)$, where $P(\cdot - \theta)$ is a $k$-variate continuous distribution centered at $\theta = (\theta_1, \ldots, \theta_k)^T$. Our objective is to test the hypothesis
\[ H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0. \tag{1} \]
The existing non-parametric test procedures are based on the spatial sign vector $U$, the multivariate spatial rank $R$, and the multivariate spatial signed rank $Q$, which are defined respectively by
\[ U(y) = \begin{cases} \|y\|^{-1} y, & y \neq 0, \\ 0, & y = 0, \end{cases} \tag{2} \]
\[ R(y, Y) = \frac{1}{n} \sum_{i=1}^{n} U(y - Y_i), \tag{3} \]
\[ Q(y) = \frac{1}{2} \left[ R(y, Y) + R(y, -Y) \right]. \tag{4} \]
The estimator of the location associated with the spatial signs in (2) is the spatial median
\[ \hat\theta_n = \arg\min_{\theta \in \mathbb{R}^k} \mathbb{P}_n \|Y - \theta\|, \tag{5} \]
where $\mathbb{P}_n = n^{-1} \sum_{i=1}^{n} \delta_{Y_i}$ is the empirical measure. The objective functions (3) and (4) give rise to multivariate Hodges-Lehmann estimators (Oja and Randles (2004)).

The p-values of these multivariate sign and rank-based tests rely on a limiting chi-square distribution of the test statistics. Provided the underlying distribution is elliptically symmetric, i.e., its density is of the form
\[ f(y - \theta) = |\Sigma|^{-1/2} g\big( (y - \theta)^T \Sigma^{-1} (y - \theta) \big), \]
with symmetry center $\theta$ and a positive definite scatter matrix $\Sigma$, its center of symmetry, location parameter, mean (when it exists) and spatial median are the same. In this paper, we construct Bayesian non-parametric testing procedures for multivariate locations using the spatial median. Such a procedure is attractive because it provides a credible set for the spatial median, hence a testing criterion can be formulated without depending on asymptotics. In other words, here we focus on the objective function of type (2) and propose a non-parametric Bayesian testing procedure. We assume that the observations are drawn from a random distribution $P$, and we put a Dirichlet process prior on it (details are given in Section 3). From $P$, we can infer about its median functional
\[ \theta(P) = \arg\min_{\theta \in \mathbb{R}^k} P\big( \|Y - \theta\| - \|Y\| \big), \tag{6} \]
where $Pf = \int f \, dP$. Samples from the posterior distribution of $\theta(P)$ can be obtained easily by posterior simulation. Thus, we can form a credible region for $\theta(P)$, and our decision is based on whether the value $\theta_0$ falls into this credible set. For elliptically symmetric distributions, this testing procedure effectively studies the one-sample location problem described above, but it can be used for a wider range of distributions $P$, where we study the null hypothesis $H_0: \theta(P) = 0$. We show that our testing procedure is asymptotically non-parametric and further compute the asymptotic power function under Pitman (contiguous) alternatives along possible directions. The two-sample test can be formulated in a similar way.

The rest of this paper is organized as follows. In Section 2, we give an overview of the existing multivariate testing procedures. In Section 3, we describe our Bayesian non-parametric test procedures. Section 4 gives the local asymptotic power under contiguous alternatives and Section 5 presents a simulation study. All the proofs are given in Section 6.
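The spatial sign, spatial rank, signed rank and spatial median of (2)-(5) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the spatial median is computed with a Weiszfeld-type fixed-point iteration, which is one common choice and is an assumption here since the paper does not name an optimizer.

```python
import numpy as np

def spatial_sign(y):
    """Row-wise spatial sign U(y) = y / ||y|| for y != 0, and 0 for y = 0."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    norms = np.linalg.norm(y, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)          # avoid division by zero
    return np.where(norms > 0, y / safe, 0.0)

def spatial_rank(y, Y):
    """Spatial rank R(y, Y): average spatial sign of y - Y_i over the sample."""
    diff = np.asarray(y, dtype=float) - np.asarray(Y, dtype=float)
    return spatial_sign(diff).mean(axis=0)

def spatial_signed_rank(y, Y):
    """Spatial signed rank Q(y) = [R(y, Y) + R(y, -Y)] / 2."""
    Y = np.asarray(Y, dtype=float)
    return 0.5 * (spatial_rank(y, Y) + spatial_rank(y, -Y))

def spatial_median(Y, weights=None, tol=1e-10, max_iter=1000):
    """Weiszfeld-type iteration for argmin_theta sum_i w_i ||Y_i - theta||.

    With weights=None this targets the sample spatial median in (5).
    """
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    theta = np.average(Y, axis=0, weights=w)        # start from the weighted mean
    for _ in range(max_iter):
        # distances are clipped so an iterate sitting on a data point is harmless
        d = np.maximum(np.linalg.norm(Y - theta, axis=1), 1e-12)
        coef = w / d
        new_theta = coef @ Y / coef.sum()
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta
```

At the sample spatial median the spatial rank is approximately zero, which gives a simple numerical check of the output of `spatial_median`.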
2. Overview of existing tests
We begin this section by describing existing non-parametric testing procedures for one-sample location problems, and later move on to two-sample and several-sample problems. Let $Y_1, \ldots, Y_n \in \mathbb{R}^k$ be $n$ observations from a $k$-variate probability distribution $P$. According to Sirkiä et al. (2007), the non-parametric testing methods can be classified as based on the multivariate spatial sign function $U$, the multivariate spatial rank $R$, and the multivariate spatial signed rank $Q$, which are defined in (2)-(4). The test statistic based on the score function $T(Y)$, which is a general notation for the score functions described above, is given by $n^{-1} \sum_{i=1}^{n} T(Y_i)$. Under $H_0$, $n^{-1/2} \sum_{i=1}^{n} T(Y_i) \rightsquigarrow N_k(0, \Sigma)$, where $\Sigma = P\{T(Y) T(Y)^T\}$. The usual estimator for $\Sigma$ is $\hat\Sigma = n^{-1} \sum_{i=1}^{n} T(Y_i) T(Y_i)^T$. The assumption of elliptical symmetry is needed to decide the appropriate cut-off for constructing the test procedure. Under $H_0$,
\[ Q^2 = n \Big\| \hat\Sigma^{-1/2} \frac{1}{n} \sum_{i=1}^{n} T(Y_i) \Big\|^2 \rightsquigarrow \chi^2_k, \]
where $\rightsquigarrow$ denotes convergence in distribution, and $\chi^2_k$ denotes a chi-square distribution with $k$ degrees of freedom (Sirkiä et al. (2007)). For elliptically symmetric distributions, $Q^2$ is strictly distribution-free (Oja and Randles (2004)). An approximate p-value can be obtained from the above chi-square distribution. For small sample sizes, a conditionally distribution-free p-value can be obtained under the assumption of directional symmetry (under which $(Y - \theta)/\|Y - \theta\|$ has the same distribution as $(\theta - Y)/\|\theta - Y\|$). This p-value can be obtained as $E_\delta[\mathbb{1}\{Q^2_\delta \geq Q^2\}]$, where $E_\delta$ is the expectation under the uniform distribution of $\delta = (\delta_1, \ldots, \delta_n)$ over the $2^n$ $n$-dimensional vectors with each component being $+1$ or $-1$, and $Q^2_\delta$ is the value of the test statistic for the data points $\delta_1 Y_1, \ldots, \delta_n Y_n$ (Oja and Randles (2004)).

The one-sample testing procedures have been naturally extended to two samples. Suppose that we have two independent random samples $Y^{(j)}_1, \ldots, Y^{(j)}_{n_j}$ from $k$-variate distributions $P(\cdot - \theta^{(j)})$, $j = 1, 2$. We test the hypothesis
\[ H_0: \theta^{(1)} = \theta^{(2)} \quad \text{against} \quad H_1: \theta^{(1)} \neq \theta^{(2)}. \]
Sirkiä et al. (2007) developed a testing procedure using the general score function $T(Y)$ based on the following inner standardization approach. First, a $k \times k$ matrix $H$ and a $k$-vector $h$ have to be found such that, for $Z^{(j)}_i = H(Y^{(j)}_i - h)$, $i = 1, \ldots, n_j$, $j = 1, 2$,
\[ \frac{1}{n} \sum_{j=1}^{2} \sum_{i=1}^{n_j} T(Z^{(j)}_i) = 0, \qquad \frac{k}{n} \sum_{j=1}^{2} \sum_{i=1}^{n_j} T(Z^{(j)}_i) T(Z^{(j)}_i)^T = \Big\{ \frac{1}{n} \sum_{j=1}^{2} \sum_{i=1}^{n_j} \|T(Z^{(j)}_i)\|^2 \Big\} I_k, \]
where $n = n_1 + n_2$, and $I_k$ denotes the identity matrix of order $k \times k$. The test statistic has the form
\[ Q^2 = \frac{k \sum_{j=1}^{2} n_j \big\| n_j^{-1} \sum_{i=1}^{n_j} T(Z^{(j)}_i) \big\|^2}{n^{-1} \sum_{j=1}^{2} \sum_{i=1}^{n_j} \|T(Z^{(j)}_i)\|^2}. \]
It has been shown that $Q^2$ has a limiting chi-square distribution with $k$ degrees of freedom. Thus, for large samples, a p-value can be constructed using the quantiles of the chi-square distribution. For smaller samples, an approximate p-value can be obtained using a conditionally distribution-free permutation version of the test (Sirkiä et al. (2007)). This approach has been extended to a general number $c$ of samples as well.
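The one-sample statistic $Q^2$ and its sign-change p-value described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Oja and Randles (2004) verbatim: the score $T$ is taken to be the spatial sign of $Y_i - \theta_0$, the expectation over all $2^n$ sign vectors is approximated by Monte Carlo with `n_flips` random sign vectors, and all function names are hypothetical.

```python
import numpy as np

def sign_scores(Y, theta0):
    """Spatial-sign scores T(Y_i) = (Y_i - theta0) / ||Y_i - theta0|| (zero rows stay zero)."""
    Z = np.asarray(Y, dtype=float) - np.asarray(theta0, dtype=float)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms > 0, norms, 1.0)

def q_squared(T):
    """Q^2 = n * T_bar' Sigma_hat^{-1} T_bar with Sigma_hat = n^{-1} sum_i T_i T_i'."""
    n = len(T)
    T_bar = T.mean(axis=0)
    Sigma_hat = T.T @ T / n
    # ||Sigma_hat^{-1/2} v||^2 equals v' Sigma_hat^{-1} v
    return float(n * T_bar @ np.linalg.solve(Sigma_hat, T_bar))

def sign_flip_pvalue(Y, theta0, n_flips=2000, seed=0):
    """Monte Carlo approximation of E_delta[1{Q^2_delta >= Q^2}]."""
    rng = np.random.default_rng(seed)
    T = sign_scores(Y, theta0)
    q_obs = q_squared(T)
    exceed = 0
    for _ in range(n_flips):
        delta = rng.choice([-1.0, 1.0], size=len(T))
        # flipping Y_i - theta0 flips its spatial sign; Sigma_hat is unchanged
        if q_squared(delta[:, None] * T) >= q_obs:
            exceed += 1
    return q_obs, exceed / n_flips
```

For large samples, `q_obs` would instead be compared with a quantile of the $\chi^2_k$ distribution, as described above.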
3. Bayesian Non-parametric Tests
Suppose that we have $n$ observations $Y_1, \ldots, Y_n \in \mathbb{R}^k$ from a $k$-dimensional distribution $P$. We choose a non-parametric Bayesian approach, i.e., we impose a prior on the underlying random distribution $P$, and we form a credible set based on the posterior distribution of the spatial-median functional
\[ \theta(P) = \arg\min_{\theta \in \mathbb{R}^k} P\{ \|Y - \theta\| - \|Y\| \}. \tag{7} \]
The most commonly used prior on $P$ is a Dirichlet process prior with centering measure $\alpha$, denoted by DP($\alpha$) (see Chapter 4 of Ghosal and van der Vaart (2017)). A Dirichlet process prior can alternatively be denoted by DP($MG$), where $M = |\alpha|$, and $\bar\alpha = \alpha/M$ has cumulative distribution function $G$. The notations DP($\alpha$) and DP($MG$) will be used interchangeably in this paper.

The process DP($\alpha$) is a conjugate prior for i.i.d. observations from $P$, and the posterior distribution of $P$ given $Y_1, \ldots, Y_n$ is DP($\alpha + n\mathbb{P}_n$). The exact posterior distribution of $\theta(P)$ cannot be obtained analytically, but posterior samples can be drawn via the stick-breaking construction of a Dirichlet process (Chapter 4 of Ghosal and van der Vaart (2017)). If $\xi_1, \xi_2, \ldots \overset{iid}{\sim} \bar\alpha$ and $V_1, V_2, \ldots \overset{iid}{\sim} \mathrm{Be}(1, M)$ are independent random variables and $W_j = V_j \prod_{l=1}^{j-1} (1 - V_l)$, then $P = \sum_{j=1}^{\infty} W_j \delta_{\xi_j} \sim \mathrm{DP}(M\bar\alpha)$. Thus the posterior distribution of $P$ given $Y_1, \ldots, Y_n$ takes the same form $\sum_{j=1}^{\infty} W_j \delta_{\xi_j}$ with $V_1, V_2, \ldots \overset{iid}{\sim} \mathrm{Be}(1, M + n)$ and atoms $\xi_j$ drawn i.i.d. from the normalized posterior base measure $(\alpha + n\mathbb{P}_n)/(M + n)$. For computation, we use a truncated approximation $P_N = \sum_{j=1}^{N} W_j \delta_{\xi_j}$ to the stick-breaking representation. Thus, a posterior $100(1 - \alpha)\%$ credible region can be formed using the following steps.

• Draw $V_j \overset{iid}{\sim} \mathrm{Be}(1, M + n)$, $j = 1, \ldots, N - 1$, and set $V_N = 1$. Then calculate the stick-breaking weights as $W_1 = V_1$, $W_j = V_j \prod_{l=1}^{j-1} (1 - V_l)$, $j = 2, \ldots, N$.
• For each $j = 1, \ldots, N$, with probability $M/(M + n)$ draw $\xi_j$ from $G$, and with probability $n/(M + n)$ draw $\xi_j$ from $\mathbb{P}_n$.
• Draw posterior samples $\hat\theta_b$, $b = 1, \ldots, B$, as
\[ \hat\theta_b = \arg\min_{\theta} \sum_{j=1}^{N} W_{jb} \|Y_{jb} - \theta\|, \]
where $W_{jb}$ and $Y_{jb}$ denote the weight $W_j$ and the atom $\xi_j$ generated in the $b$th repetition of the first two steps.
• Compute the posterior mean $\bar\theta = B^{-1} \sum_{b=1}^{B} \hat\theta_b$ and the posterior covariance matrix $S = B^{-1} \sum_{b=1}^{B} (\hat\theta_b - \bar\theta)(\hat\theta_b - \bar\theta)'$.
• A $100(1 - \alpha)\%$ credible set for $\theta(P)$ is then given by
\[ C(Y_1, \ldots, Y_n; \alpha) = \{ \theta : (\theta - \bar\theta)' S^{-1} (\theta - \bar\theta) \leq r_{1-\alpha} \}, \]
where $r_{1-\alpha}$ is the $100(1 - \alpha)$th percentile of $(\hat\theta_b - \bar\theta)' S^{-1} (\hat\theta_b - \bar\theta)$, $b = 1, \ldots, B$.
• We reject $H_0$ if $\theta_0 \notin C(Y_1, \ldots, Y_n; \alpha)$.

The non-informative limit of the posterior as $M \to 0$, namely DP($n\mathbb{P}_n$), is called the Bayesian bootstrap distribution. Its centering measure is $\mathbb{P}_n$, and a random distribution generated from it is supported on the observation points. If we choose the non-informative limit of the posterior Dirichlet process, we do not need to generate posterior samples from the full Dirichlet process; we only need to sample $n$ independent and identically distributed (i.i.d.) observations from an exponential distribution with parameter 1 and normalize them by their sum to obtain the weights on the observations, which saves some computational cost.

Theorem 1. The one-sample Bayesian non-parametric test for $H_0: \theta(P) = \theta_0$ is asymptotically distribution-free, i.e.,
\[ P_{\theta_0}\big( \theta_0 \in C(Y_1, \ldots, Y_n; \alpha) \big) \to 1 - \alpha, \]
for any $P_{\theta_0}$ such that $\theta(P_{\theta_0}) = \theta_0$.

As we have already mentioned, the testing procedure is constructed using only the posterior samples, without relying on any asymptotic properties.
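The steps of the one-sample procedure can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the base distribution $G$ is taken to be $N(0, I_k)$, the weighted spatial median is computed by a Weiszfeld-type iteration, the truncation level defaults to $N = 10n$ (a hypothetical choice made so that the leftover stick mass is negligible), and the function names are illustrative.

```python
import numpy as np

def weighted_spatial_median(points, w, tol=1e-8, max_iter=500):
    """Weiszfeld-type iteration for argmin_theta sum_j w_j * ||points_j - theta||."""
    theta = w @ points / w.sum()
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(points - theta, axis=1), 1e-12)
        c = w / d
        new_theta = c @ points / c.sum()
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

def dp_one_sample_test(Y, theta0, M=2.0, N=None, B=500, alpha=0.05, seed=0):
    """Return (reject, theta_bar, S) for H0: theta(P) = theta0.

    Assumption: the base distribution G is N(0, I_k).
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    N = 10 * n if N is None else N
    draws = np.empty((B, k))
    for b in range(B):
        # Step 1: truncated stick-breaking weights with V_N = 1
        V = rng.beta(1.0, M + n, size=N)
        V[-1] = 1.0
        W = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
        # Step 2: each atom comes from G w.p. M/(M+n), else from the empirical measure
        atoms = Y[rng.integers(0, n, size=N)]
        from_G = rng.random(N) < M / (M + n)
        atoms[from_G] = rng.standard_normal((int(from_G.sum()), k))
        # Step 3: spatial median of the sampled discrete distribution
        draws[b] = weighted_spatial_median(atoms, W)
    # Step 4: posterior mean and covariance of the sampled medians
    theta_bar = draws.mean(axis=0)
    centered = draws - theta_bar
    S = centered.T @ centered / B
    S_inv = np.linalg.inv(S)
    # Step 5: empirical 100(1 - alpha)th percentile of the quadratic form
    r = np.quantile(np.einsum("bi,ij,bj->b", centered, S_inv, centered), 1 - alpha)
    # Step 6: reject when theta0 falls outside the credible ellipsoid
    d0 = np.asarray(theta0, dtype=float) - theta_bar
    return bool(d0 @ S_inv @ d0 > r), theta_bar, S
```

The Bayesian bootstrap variant would replace Steps 1 and 2 by normalized Exp(1) weights on the observed data points.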
The Bayesian non-parametric testing procedure for the two-sample location problem can be constructed by generalizing the one-sample procedure. Suppose that we have $n_1$ observations $Y^{(1)}_1, \ldots, Y^{(1)}_{n_1} \in \mathbb{R}^k$ from a distribution $P^{(1)}$, and $n_2$ observations $Y^{(2)}_1, \ldots, Y^{(2)}_{n_2} \in \mathbb{R}^k$ from $P^{(2)}$. We want to test the hypothesis
\[ H_0: \theta(P^{(1)}) - \theta(P^{(2)}) = 0 \quad \text{against} \quad H_1: \theta(P^{(1)}) - \theta(P^{(2)}) \neq 0. \]
As we have previously mentioned, if $P^{(1)} = P(\cdot - \theta^{(1)})$ and $P^{(2)} = P(\cdot - \theta^{(2)})$ are elliptically symmetric distributions, then this problem boils down to studying the two-sample location problem $H_0: \theta^{(1)} - \theta^{(2)} = 0$ against $H_1: \theta^{(1)} - \theta^{(2)} \neq 0$. We put a DP($MG$) prior on both $P^{(1)}$ and $P^{(2)}$, for some $M > 0$ and base distribution $G$. Thus $P^{(1)}$ and $P^{(2)}$ have the stick-breaking representations $P^{(1)} = \sum_{j=1}^{\infty} W^{(1)}_j \delta_{\xi^{(1)}_j}$ and $P^{(2)} = \sum_{j=1}^{\infty} W^{(2)}_j \delta_{\xi^{(2)}_j}$ respectively, where the weights $W^{(1)}_j$ and $W^{(2)}_j$, $j = 1, 2, \ldots$, are built from stick-breaking variables drawn from $\mathrm{Be}(1, M)$, and $\xi^{(1)}_j$ and $\xi^{(2)}_j$, $j = 1, 2, \ldots$, are i.i.d. samples from $G$. We truncate both sets of weights at $N$ and construct a $100(1 - \alpha)\%$ posterior credible set through the following steps.

• Draw $V^{(1)}_j \overset{iid}{\sim} \mathrm{Be}(1, M + n_1)$, $j = 1, \ldots, N - 1$, and set $V^{(1)}_N = 1$. Then the stick-breaking weights are $W^{(1)}_1 = V^{(1)}_1$, $W^{(1)}_j = V^{(1)}_j \prod_{l=1}^{j-1} (1 - V^{(1)}_l)$, $j = 2, \ldots, N$. Similarly draw $V^{(2)}_j$ from $\mathrm{Be}(1, M + n_2)$ and construct $W^{(2)}_j$ accordingly.
• Next, for $l = 1, 2$, draw each of the atoms $Y^{(l)}_{1b}, \ldots, Y^{(l)}_{Nb}$ from $G$ with probability $M/(M + n_l)$, and from $\mathbb{P}_{n_l}$ with probability $n_l/(M + n_l)$.
• Draw posterior samples $\hat\theta^{(1)}_b$ and $\hat\theta^{(2)}_b$, $b = 1, \ldots, B$, as
\[ \hat\theta^{(1)}_b = \arg\min_{\theta} \sum_{j=1}^{N} W^{(1)}_{jb} \|Y^{(1)}_{jb} - \theta\|, \qquad \hat\theta^{(2)}_b = \arg\min_{\theta} \sum_{j=1}^{N} W^{(2)}_{jb} \|Y^{(2)}_{jb} - \theta\|. \]
• Compute $\bar\theta^{(l)} = B^{-1} \sum_{b=1}^{B} \hat\theta^{(l)}_b$ and $S^{(l)} = B^{-1} \sum_{b=1}^{B} (\hat\theta^{(l)}_b - \bar\theta^{(l)})(\hat\theta^{(l)}_b - \bar\theta^{(l)})'$, for $l = 1, 2$.
• A $100(1 - \alpha)\%$ credible set for $\theta(P^{(1)}) - \theta(P^{(2)})$ is then given by
\[ C(Y^{(1)}_1, \ldots, Y^{(1)}_{n_1}, Y^{(2)}_1, \ldots, Y^{(2)}_{n_2}; \alpha) = \big\{ \theta_1 - \theta_2 : (\theta_1 - \theta_2 - \bar\theta^{(1)} + \bar\theta^{(2)})' (S^{(1)} + S^{(2)})^{-1} (\theta_1 - \theta_2 - \bar\theta^{(1)} + \bar\theta^{(2)}) \leq r_{1-\alpha} \big\}, \tag{8} \]
where $r_{1-\alpha}$ is the $100(1 - \alpha)$th percentile of $(\hat\theta^{(1)}_b - \hat\theta^{(2)}_b - \bar\theta^{(1)} + \bar\theta^{(2)})' (S^{(1)} + S^{(2)})^{-1} (\hat\theta^{(1)}_b - \hat\theta^{(2)}_b - \bar\theta^{(1)} + \bar\theta^{(2)})$, $b = 1, \ldots, B$.
• We reject $H_0$ if $0 \notin C(Y^{(1)}_1, \ldots, Y^{(1)}_{n_1}, Y^{(2)}_1, \ldots, Y^{(2)}_{n_2}; \alpha)$.

Again, we can consider the non-informative limit of the posterior Dirichlet processes, and use the Bayesian bootstrap to form the credible region.

Theorem 2. The two-sample Bayesian non-parametric test is asymptotically non-parametric, i.e., for $0 < \alpha < 1$ and $\theta_0 \in \mathbb{R}^k$,
\[ (P^{(1)}_{\theta_0} \times P^{(2)}_{\theta_0})\big( 0 \in C(Y^{(1)}_1, \ldots, Y^{(1)}_{n_1}, Y^{(2)}_1, \ldots, Y^{(2)}_{n_2}; \alpha) \big) \to 1 - \alpha, \]
for any $P^{(1)}_{\theta_0}, P^{(2)}_{\theta_0}$ such that $\theta(P^{(1)}_{\theta_0}) = \theta(P^{(2)}_{\theta_0}) = \theta_0$.
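The two-sample procedure in its Bayesian bootstrap form (the non-informative limit $M \to 0$) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the Dirichlet weights are generated by normalizing i.i.d. Exp(1) variables, the weighted spatial median uses a Weiszfeld-type iteration, and the function names are hypothetical.

```python
import numpy as np

def weighted_spatial_median(points, w, tol=1e-8, max_iter=500):
    """Weiszfeld-type iteration for argmin_theta sum_j w_j * ||points_j - theta||."""
    theta = w @ points / w.sum()
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(points - theta, axis=1), 1e-12)
        c = w / d
        new_theta = c @ points / c.sum()
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

def bayesian_bootstrap_medians(Y, B, rng):
    """B posterior draws of the spatial median under the Bayesian bootstrap."""
    E = rng.exponential(1.0, size=(B, len(Y)))
    W = E / E.sum(axis=1, keepdims=True)      # rows are Dirichlet(1, ..., 1) weights
    return np.array([weighted_spatial_median(Y, w) for w in W])

def bb_two_sample_test(Y1, Y2, B=500, alpha=0.05, seed=0):
    """Return (reject, center, S) for H0: theta(P1) - theta(P2) = 0."""
    rng = np.random.default_rng(seed)
    t1 = bayesian_bootstrap_medians(np.asarray(Y1, dtype=float), B, rng)
    t2 = bayesian_bootstrap_medians(np.asarray(Y2, dtype=float), B, rng)
    c1, c2 = t1 - t1.mean(axis=0), t2 - t2.mean(axis=0)
    S = c1.T @ c1 / B + c2.T @ c2 / B          # S^(1) + S^(2)
    S_inv = np.linalg.inv(S)
    center = t1.mean(axis=0) - t2.mean(axis=0)
    dev = (t1 - t2) - center
    # empirical 100(1 - alpha)th percentile of the quadratic form in (8)
    r = np.quantile(np.einsum("bi,ij,bj->b", dev, S_inv, dev), 1 - alpha)
    # zero lies outside the credible set when its quadratic form exceeds r
    return bool(center @ S_inv @ center > r), center, S
```

A call such as `bb_two_sample_test(Y1, Y2)` with arrays of shapes `(n1, k)` and `(n2, k)` returns the decision together with the centre and scale of the credible ellipsoid.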
4. Asymptotic Power under Contiguous Alternatives
In this section, we analyze the local asymptotic power of the proposed Bayesian non-parametric tests, i.e., the limiting power under a sequence of alternatives converging to the null value. For the one-sample problem, we consider differentiable in quadratic mean (DQM) densities $\mathcal{P} = \{ p_\theta = dP_\theta/d\mu : \theta \in \mathbb{R}^k \}$, i.e., there exists a vector-valued measurable function $\dot\ell_\theta : \mathbb{R}^k \to \mathbb{R}^k$ such that, for $h \to 0$,
\[ \int \Big( \sqrt{p_{\theta+h}} - \sqrt{p_\theta} - \frac{1}{2} h^T \dot\ell_\theta \sqrt{p_\theta} \Big)^2 d\mu = o(\|h\|^2). \]
We consider shrinking alternatives of the form
\[ H_{1n}: \theta = \theta_0 + \frac{h}{\sqrt{n}}, \tag{9} \]
for models $P_\theta \in \mathcal{P}$. Then we derive the limiting power for the sequence of distributions $P_{\theta_0 + h/\sqrt{n}} \in \mathcal{P}$. As a consequence of the DQM condition, the models $P^n_{\theta + h/\sqrt{n}}$ satisfy the local asymptotic normality (LAN) condition, i.e., there exist a matrix $I_\theta$ and a random vector $\Delta_{n,\theta} \rightsquigarrow N_k(0, I_\theta)$ such that, for every converging sequence $h_n \to h$,
\[ \log \frac{dP^n_{\theta + h_n/\sqrt{n}}}{dP^n_\theta} = h^T \Delta_{n,\theta} - \frac{1}{2} h^T I_\theta h + o_{P^n_\theta}(1). \tag{10} \]
In this context, specifically $\Delta_{n,\theta} = n^{-1/2} \sum_{i=1}^{n} \dot\ell_\theta(Y_i)$, and $I_\theta = P_\theta \dot\ell_\theta \dot\ell_\theta^T$. The next theorem gives the limiting power for the one-sample test under a sequence of alternatives of the form $H_{1n}$ for the DQM models. Before stating the theorem, we introduce some notation. Define
\[ U_{\theta,P} = P\left( \frac{(Y - \theta)(Y - \theta)^T}{\|Y - \theta\|^2} \right), \tag{11} \]
\[ V_{\theta,P} = P\left\{ \frac{1}{\|Y - \theta\|} \left( I_k - \frac{(Y - \theta)(Y - \theta)^T}{\|Y - \theta\|^2} \right) \right\}. \tag{12} \]
Let $P^\star$ be the true distribution of $Y$, i.e., the true value of $P$, and let $\theta^\star$ be the spatial median of the true distribution $P^\star$, i.e., $\theta^\star = \theta(P^\star)$.

Theorem 3. For a sequence of shrinking alternatives of the form (9), i.e., under a sequence of differentiable in quadratic mean (DQM) models $P_{\theta_0 + h/\sqrt{n}} \in \mathcal{P}$, the asymptotic power of the one-sample Bayesian non-parametric test for $H_0: \theta(P) = \theta_0$ is given by
\[ F_\chi\big( \chi^2_{k;\alpha};\, k,\, \delta' (V^{-1}_{\theta_0,P_{\theta_0}} U_{\theta_0,P_{\theta_0}} V^{-1}_{\theta_0,P_{\theta_0}})^{-1} \delta \big), \]
where
\[ \delta = P_{\theta_0}\left( -V^{-1}_{\theta_0,P_{\theta_0}} \frac{Y - \theta_0}{\|Y - \theta_0\|} \dot\ell^T_{\theta_0} I^{-1}_{\theta_0} h \right), \tag{13} \]
and $F_\chi(x; k, \delta)$ is the CDF of a non-central chi-square distribution with $k$ degrees of freedom and non-centrality parameter $\delta$, with $\chi^2_{k;\alpha}$ being the $100(1 - \alpha)$th percentile of a $\chi^2_k$ distribution.

For the two-sample problem, we again consider DQM models $P^{(1)}_{\theta_0 + h_1/\sqrt{n_1}}, P^{(2)}_{\theta_0 + h_2/\sqrt{n_2}} \in \mathcal{P}$, i.e., the contiguous alternatives are of the form
\[ H_{1n}: \theta^{(j)}_{n_j} = \theta_0 + \frac{h_j}{\sqrt{n_j}}, \quad j = 1, 2, \tag{14} \]
with $n = n_1 + n_2$, such that $n_1/n \to \lambda$ and $n_2/n \to 1 - \lambda$. The following theorem gives the limiting power of the two-sample test under contiguous alternatives of the form (14), and the notation from Theorem 3 directly translates to the next theorem.

Theorem 4. For a sequence of shrinking alternatives of the form (14), i.e., for a sequence of DQM models $P^{(1)}_{\theta_0 + h_1/\sqrt{n_1}}, P^{(2)}_{\theta_0 + h_2/\sqrt{n_2}} \in \mathcal{P}$, the asymptotic power of the two-sample Bayesian non-parametric test for testing $H_0: \theta(P^{(1)}) = \theta(P^{(2)}) = \theta_0$, for $\theta_0 \in \mathbb{R}^k$, is given by
\[ F_\chi\Big( \chi^2_{k;\alpha};\, k,\, \delta' \big( \lambda^{-1} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} U_{\theta_0,P^{(1)}_{\theta_0}} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} + (1 - \lambda)^{-1} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} U_{\theta_0,P^{(2)}_{\theta_0}} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} \big)^{-1} \delta \Big), \]
where
\[ \delta = \frac{1}{\sqrt{\lambda}} P^{(1)}_{\theta_0}\Big\{ -V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} \frac{Y^{(1)} - \theta_0}{\|Y^{(1)} - \theta_0\|} \dot\ell^{(1)T}_{\theta_0} (I^{(1)}_{\theta_0})^{-1} h_1 \Big\} + \frac{1}{\sqrt{1 - \lambda}} P^{(2)}_{\theta_0}\Big\{ -V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} \frac{Y^{(2)} - \theta_0}{\|Y^{(2)} - \theta_0\|} \dot\ell^{(2)T}_{\theta_0} (I^{(2)}_{\theta_0})^{-1} h_2 \Big\}, \tag{15} \]
for any $\theta_0 \in \mathbb{R}^k$.
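As a worked illustration of the quantities appearing in Theorem 3, consider the special case (an assumption made here for concreteness, not a case singled out in the text) $k = 2$ and $P_{\theta_0} = N_2(\theta_0, I_2)$. Write $Y - \theta_0 = R\,u$ with $R = \|Y - \theta_0\|$ and $u$ uniformly distributed on the unit circle, independent of $R$. Then $E[uu^T] = \tfrac{1}{2} I_2$, and since $R$ has density $r e^{-r^2/2}$ on $(0, \infty)$, $E[R^{-1}] = \int_0^\infty e^{-r^2/2}\,dr = \sqrt{\pi/2}$. Plugging into (11) and (12),
\[ U_{\theta_0,P_{\theta_0}} = \tfrac{1}{2} I_2, \qquad V_{\theta_0,P_{\theta_0}} = E[R^{-1}] \big( I_2 - E[uu^T] \big) = \sqrt{\pi/8}\; I_2, \]
\[ V^{-1}_{\theta_0,P_{\theta_0}} U_{\theta_0,P_{\theta_0}} V^{-1}_{\theta_0,P_{\theta_0}} = \frac{8}{\pi} \cdot \frac{1}{2}\, I_2 = \frac{4}{\pi}\, I_2 , \]
so the non-centrality parameter in Theorem 3 reduces to $\delta' (V^{-1} U V^{-1})^{-1} \delta = (\pi/4) \|\delta\|^2$. For this location family $\dot\ell_{\theta}(y) = y - \theta$ and $I_\theta = I_2$, and since $E[R] = \sqrt{\pi/2}$, evaluating (13) gives
\[ \delta = -V^{-1}_{\theta_0,P_{\theta_0}} E[R]\, E[uu^T]\, h = -\sqrt{8/\pi}\, \sqrt{\pi/2}\; \tfrac{1}{2}\, h = -h, \]
so that the non-centrality parameter equals $(\pi/4) \|h\|^2$. The factor $\pi/4 \approx 0.785$ is the familiar asymptotic efficiency of the spatial median relative to the sample mean for bivariate Gaussian data.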
5. Simulation Study
We perform a simulation study to demonstrate the finite-sample performance of the proposed one-sample and two-sample Bayesian non-parametric tests. We compare our tests with Hotelling's $T^2$-test and the spatial sign and rank tests. The underlying distributions are bivariate Gaussian, bivariate $t$ (both elliptically symmetric), and bivariate gamma (asymmetric). The bivariate gamma distribution is constructed using a Gaussian copula (Xue-Kun Song (2000)). To describe the construction briefly, let $Y_1, \ldots, Y_k$ be $k$ univariate gamma random variables with distribution Ga($s, r$), whose cumulative distribution functions (CDF) and probability density functions (PDF) are denoted by $F_j$ and $f_j$, $j = 1, \ldots, k$. Then the joint density of $Y = (Y_1, \ldots, Y_k)$ is given by
\[ g(y, s, r, V) = c_\phi\{ F_1, \ldots, F_k \mid V \} \prod_{j=1}^{k} f_j(y_j, s, r), \]
where $c_\phi(\cdot \mid V)$ denotes the density of the $k$-dimensional Gaussian copula. For comparison, we choose a general version of Hotelling's $T^2$-test, where the Gaussian assumption is relaxed to the existence of second moments. Here the p-value is based on a chi-square approximation instead of the usual $F$-distribution. For the one-sample test, we consider a sample size of $n = 100$, and for the two-sample test we choose $n_1 = 100$, $n_2 = 90$. We calculate the power, i.e., the proportion of times the null hypothesis is rejected out of 2000 replications. The location parameters are chosen suitably to show a good range of powers, and the scatter matrices are chosen to be $I_2$. For our testing method, we choose a DP($\alpha$) prior with $\alpha = 2 \times N(0, I_2)$.

Table 1
Power for testing $H_0: \theta(P) = 0$ for bivariate Gaussian, bivariate $t$ (with 1 degree of freedom), and bivariate gamma distributions with different location parameters ($\theta$).

θ               NPBayes   Sign Test   Rank Test   Hotelling's T²
Bivariate Gaussian Distribution
(0, 0)          0.050     0.046       0.051       0.055
(0.05, 0.05)    0.139     0.086       0.084       0.099
(0.1, 0.05)     0.169     0.125       0.141       0.156
(0.1, -0.1)     0.221     0.188       0.213       0.234
Bivariate t Distribution
(0, 0)          0.050     0.053       0.041       0.02
(0.05, 0.05)    0.174     0.058       0.053       0.025
(0.1, 0.05)     0.179     0.094       0.082       0.018
(0.1, -0.1)     0.201     0.171       0.197       0.026
Bivariate Gamma Distribution
(0, 0)          0.050     0.016       0.025       0.294
(0.05, 0.05)    0.027     0.021       0.039       0.528
(0.1, 0.05)     0.029     0.013       0.058       0.607
(0.1, -0.1)     0.034     0.009       0.018       0.255

Table 2
Power for testing $H_0: \theta^{(1)} = \theta^{(2)}$ for bivariate Gaussian, bivariate $t$ (with 1 degree of freedom), and bivariate gamma distributions with different location parameters ($\theta^{(1)}$ and $\theta^{(2)}$).

θ(1)      θ(2)          NPBayes   Sign Test   Rank Test   Hotelling's T²
Bivariate Gaussian Distribution
(0, 0)    (0, 0)        0.050     0.057       0.051       0.037
(0, 0)    (0.1, 0)      0.135     0.091       0.085       0.083
(0, 0)    (0.1, 0.1)    0.225     0.098       0.122       0.136
(0, 0)    (0, 0.3)      0.402     0.337       0.346       0.146
Bivariate t Distribution
(0, 0)    (0, 0)        0.059     0.041       0.052       0.011
(0, 0)    (0.1, 0)      0.141     0.060       0.074       0.026
(0, 0)    (0.1, 0.1)    0.158     0.087       0.099       0.022
(0, 0)    (0, 0.3)      0.307     0.248       0.213       0.023
Bivariate Gamma Distribution
(0, 0)    (0, 0)        0.024     0.020       0.019       0.017
(0, 0)    (0.1, 0)      0.020     0.019       0.018       0.025
(0, 0)    (0.1, 0.1)    0.030     0.015       0.014       0.030
(0, 0)    (0, 0.3)      0.033     0.018       0.023       0.028

Tables 1 and 2 show the power values. In most settings our test procedures attain a size close to the nominal level 0.05, and they outperform the other procedures in most cases. When the underlying distributions are not Gaussian, our method performs better than the other methods, especially compared with Hotelling's $T^2$-test.

Remark 1. Here we have considered tests for multivariate locations based on spatial medians, but these tests can be constructed using multivariate $\ell_1$-medians (with $\ell_p$-norms) as well. For some fixed $p > 1$, the $\ell_1$-median for a $k$-variate distribution $P$ can be defined as
\[ \theta_p(P) = \arg\min_{\theta \in \mathbb{R}^k} P\{ \|Y - \theta\|_p - \|Y\|_p \}. \]
Bernstein-von Mises theorems for $\ell_1$-medians are available in the literature (Bhattacharya and Ghosal (2019)). Hence the expressions for local asymptotic powers under shrinking alternatives can be obtained using those theorems.
6. Proofs
Before giving the proofs of the main theorems, we need a couple of auxiliary results. The first one gives convergence results for the posterior mean $\bar\theta$ and covariance matrix $S$.

Lemma 1. Suppose the true distribution of $Y_1, \ldots, Y_n \in \mathbb{R}^k$ is $P^\star$, and the following conditions hold for $P^\star$.
1. The distribution $P^\star$ has a density that is bounded on bounded subsets of $\mathbb{R}^k$.
2. The spatial median of $P^\star$, $\theta^\star = \theta(P^\star)$, is unique.
Then $\bar\theta = \hat\theta_n + o_{P^\star}(n^{-1/2})$ and $nS = V^{-1}_{\theta^\star,P^\star} U_{\theta^\star,P^\star} V^{-1}_{\theta^\star,P^\star} + o_{P^\star}(1)$, where $\hat\theta_n$ is the sample spatial median of $Y_1, \ldots, Y_n$.

Proof. The posterior distribution of $\theta(P)$ can be approximated by a Gaussian distribution in the Bernstein-von Mises sense (Bhattacharya and Ghosal (2019)), i.e., given $Y_1, \ldots, Y_n$,
\[ \sqrt{n}\,(\theta(P) - \hat\theta_n) \rightsquigarrow N_k\big(0, V^{-1}_{\theta^\star,P^\star} U_{\theta^\star,P^\star} V^{-1}_{\theta^\star,P^\star}\big). \]
Let $\mathbb{B}_n$ be the Bayesian bootstrap process defined by $\mathbb{B}_n = \sum_{i=1}^{n} W_{ni} \delta_{Y_i}$, where $(W_{n1}, \ldots, W_{nn})$ follows a Dirichlet distribution $\mathrm{Dir}(n; 1, \ldots, 1)$, and let $\theta(\mathbb{B}_n) = \arg\min_\theta \mathbb{B}_n \|Y - \theta\|$. It has been shown in Theorem 3.1 in Bhattacharya and Ghosal (2019) that, asymptotically, $\theta(P)$ is a Bayesian bootstrapped analog of a Z-estimator. Thus, our problem boils down to showing the first and second moment consistency of the bootstrap Z-estimator $\theta(\mathbb{B}_n)$.

Cheng (2015) proved the consistency of the bootstrap moment estimators for the class of exchangeably weighted bootstraps (see Section 2.2 of Cheng (2015)). The Bayesian bootstrap weights fall into the class of exchangeable bootstrap weights, and we have to show that the $\ell_1$ criterion function $m_\theta(y) = -\|y - \theta\| + \|y\|$ satisfies the following two sufficient conditions. Let $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P^\star)$ denote the empirical process and $\mathbb{G}^\star_n = \sqrt{n}(\mathbb{B}_n - \mathbb{P}_n)$ denote the bootstrap empirical process.
1. Let $\Theta$ be the compact parameter space. For any $\theta \in \Theta$,
\[ P^\star(m_\theta - m_{\theta^\star}) \lesssim -\|\theta - \theta^\star\|^2. \]
2. Define $\mathcal{N}_\delta = \{ m_\theta - m_{\theta^\star} : \|\theta - \theta^\star\| \leq \delta \}$. We have to show
\[ \big( E_X \|\mathbb{G}_n\|^{p'}_{\mathcal{N}_\delta} \big)^{1/p'} \lesssim \delta, \tag{16} \]
\[ \big( E_{XW} \|\mathbb{G}^\star_n\|^{p'}_{\mathcal{N}_\delta} \big)^{1/p'} \lesssim \delta, \tag{17} \]
for some $p' > 2$.

The parameter can be restricted to a compact subset of $\mathbb{R}^k$ with high probability. In Theorem 3.1 of Bhattacharya and Ghosal (2019), it has been shown that for sufficiently small $\epsilon > 0$ and $K > 0$ such that $P(\|Y\| \leq K) > 1 - \epsilon$, given $Y_1, \ldots, Y_n$, $\|\theta(\mathbb{B}_n)\| \leq K$ with high $P^{\star n}$-probability, which implies that asymptotically, given $Y_1, \ldots, Y_n$, $\|\theta(P)\| \leq K$ with high $P^{\star n}$-probability.

After fixing $K > 0$, we choose $\Theta = \{\theta : \|\theta\| \leq K\}$. Since $\Theta$ is compact, Condition 1 can be shown from a Taylor series expansion around $\theta^\star$:
\[ P^\star m_\theta - P^\star m_{\theta^\star} = (\theta - \theta^\star)' P^\star \dot m_{\theta^\star} + \frac{(\theta - \theta^\star)' V_{\theta^\star,P^\star} (\theta - \theta^\star)}{2} + o(\|\theta - \theta^\star\|^2). \tag{18} \]
Since $\theta^\star$ is the maximizer of $P^\star m_\theta$, $P^\star \dot m_{\theta^\star}$ vanishes. The matrix $V_{\theta^\star,P^\star}$ is negative definite, and hence the second term on the right-hand side of (18) is bounded above by $-c\|\theta - \theta^\star\|^2$, for a positive constant $c$.

Before proving Condition 2, we introduce some notation. For any class of functions $\mathcal{A}$ and metric $\ell$, its $\epsilon$-bracketing number is denoted by $N_{[\,]}(\epsilon, \mathcal{A}, \ell)$. The corresponding bracketing entropy integral is defined as
\[ J_{[\,]}(\delta, \mathcal{A}, \ell) = \int_0^\delta \sqrt{\log N_{[\,]}(\epsilon, \mathcal{A}, \ell)}\; d\epsilon. \]
Following Cheng (2015), a simple sufficient condition for (16) is the following global Lipschitz condition
\[ |m_\theta(x) - m_{\theta^\star}(x)| \leq M(x) \|\theta - \theta^\star\| \tag{19} \]
for any $\theta \in \Theta$, and
\[ J_{[\,]}(1, \mathcal{N}_\delta, L_2(P^\star)) + \|M\|_{\psi_{p'}} < \infty, \tag{20} \]
for some $p' > 2$, where $\|\cdot\|_{\psi_p}$ is the Orlicz norm with respect to the convex function $\psi_p(t) = \exp(t^p) - 1$. By the triangle inequality, $|m_\theta(y) - m_{\theta^\star}(y)| \leq \|\theta - \theta^\star\|$, so (19) holds with $M(y) = 1$. Since $M(y) = 1$ for every $y$, we just have to show $J_{[\,]}(1, \mathcal{N}_\delta, L_2(P^\star)) < \infty$. By Example 19.7 in Van der Vaart (1998), since $|m_\theta(y) - m_{\theta'}(y)| \leq \|\theta - \theta'\|$ for every $\theta, \theta' \in \Theta$, there exists a constant $K$ such that
\[ N_{[\,]}(\epsilon, \mathcal{N}_\delta, L_2(P^\star)) \leq K \left( \frac{\mathrm{diam}\,\Theta}{\epsilon} \right)^k, \quad \text{for every } 0 < \epsilon < \mathrm{diam}\,\Theta. \]
Then the entropy is of the order $\log(1/\epsilon)$. By a change of variable, it can be shown that $J_{[\,]}(1, \mathcal{N}_\delta, L_2(P^\star)) < \infty$.

The next lemma gives a Bernstein-von Mises theorem for the difference of spatial medians for two independent samples $Y^{(1)}_1, \ldots, Y^{(1)}_{n_1} \sim P^{(1)}$ and $Y^{(2)}_1, \ldots, Y^{(2)}_{n_2} \sim P^{(2)}$. The sample spatial medians are denoted by $\hat\theta^{(1)}_{n_1}$ and $\hat\theta^{(2)}_{n_2}$ respectively. We put a DP($\alpha$) prior on both $P^{(1)}$ and $P^{(2)}$, and construct a posterior for $\theta(P^{(1)}) - \theta(P^{(2)})$. The asymptotic result follows almost immediately from Theorem 3.1 in Bhattacharya and Ghosal (2019).

Lemma 2. Suppose the following conditions hold.
1. The true distributions $P^{\star(1)}$ and $P^{\star(2)}$ have probability densities that are bounded on compact subsets of $\mathbb{R}^k$.
2. The true spatial medians, $\theta^{\star(1)} = \theta(P^{\star(1)})$ and $\theta^{\star(2)} = \theta(P^{\star(2)})$, are unique.
Then, denoting $n = n_1 + n_2$, such that $n_1/n \to \lambda$ and $n_2/n \to 1 - \lambda$,
(i) $\sqrt{n}\,(\hat\theta^{(1)}_{n_1} - \theta^{\star(1)} - \hat\theta^{(2)}_{n_2} + \theta^{\star(2)}) \rightsquigarrow N_k\big(0,\ \lambda^{-1} V^{-1}_{\theta^{\star(1)},P^{\star(1)}} U_{\theta^{\star(1)},P^{\star(1)}} V^{-1}_{\theta^{\star(1)},P^{\star(1)}} + (1 - \lambda)^{-1} V^{-1}_{\theta^{\star(2)},P^{\star(2)}} U_{\theta^{\star(2)},P^{\star(2)}} V^{-1}_{\theta^{\star(2)},P^{\star(2)}}\big)$;
(ii) given $Y^{(1)}_1, \ldots, Y^{(1)}_{n_1}, Y^{(2)}_1, \ldots, Y^{(2)}_{n_2}$, $\sqrt{n}\,(\theta(P^{(1)}) - \hat\theta^{(1)}_{n_1} - \theta(P^{(2)}) + \hat\theta^{(2)}_{n_2}) \rightsquigarrow N_k\big(0,\ \lambda^{-1} V^{-1}_{\theta^{\star(1)},P^{\star(1)}} U_{\theta^{\star(1)},P^{\star(1)}} V^{-1}_{\theta^{\star(1)},P^{\star(1)}} + (1 - \lambda)^{-1} V^{-1}_{\theta^{\star(2)},P^{\star(2)}} U_{\theta^{\star(2)},P^{\star(2)}} V^{-1}_{\theta^{\star(2)},P^{\star(2)}}\big)$.

Proof. From Theorem 3.1 in Bhattacharya and Ghosal (2019), for $j = 1, 2$,
(i) $\sqrt{n_j}\,(\hat\theta^{(j)}_{n_j} - \theta^{\star(j)}) \rightsquigarrow N_k\big(0, V^{-1}_{\theta^{\star(j)},P^{\star(j)}} U_{\theta^{\star(j)},P^{\star(j)}} V^{-1}_{\theta^{\star(j)},P^{\star(j)}}\big)$;
(ii) given $Y^{(j)}_1, \ldots, Y^{(j)}_{n_j}$, $\sqrt{n_j}\,(\theta(P^{(j)}) - \hat\theta^{(j)}_{n_j}) \rightsquigarrow N_k\big(0, V^{-1}_{\theta^{\star(j)},P^{\star(j)}} U_{\theta^{\star(j)},P^{\star(j)}} V^{-1}_{\theta^{\star(j)},P^{\star(j)}}\big)$.
Since $\sqrt{n} = \sqrt{n/n_j}\,\sqrt{n_j}$ with $n_1/n \to \lambda$ and $n_2/n \to 1 - \lambda$, the conclusion follows from the independence of the two samples.
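Proof of Theorem 1. The probability of accepting the null hypothesis under the null is
\[ P_{\theta_0}\big( \theta_0 \in C(Y_1, \ldots, Y_n) \big) = P_{\theta_0}\big( (\bar\theta - \theta_0)' S^{-1} (\bar\theta - \theta_0) \leq r_{1-\alpha} \big). \]
Using Lemma 1 and Theorem 3.1 in Bhattacharya and Ghosal (2019), given $Y_1, \ldots, Y_n$,
\[ (\theta(P) - \bar\theta)^T S^{-1} (\theta(P) - \bar\theta) = n (\theta(P) - \bar\theta)^T (nS)^{-1} (\theta(P) - \bar\theta) \rightsquigarrow \chi^2_k, \tag{21} \]
which implies that $r_{1-\alpha} = \chi^2_{k;\alpha} + o_{P_{\theta_0}}(1)$. The weak convergence in (21) uses the fact that if $X \sim N_k(0, I_k)$, then $X^T X \sim \chi^2_k$. Next, again using Lemma 1, Theorem 3.1 in Bhattacharya and Ghosal (2019) and Slutsky's theorem,
\[ P_{\theta_0}\big( \theta_0 \in C(Y_1, \ldots, Y_n) \big) = P_{\theta_0}\big( (\bar\theta - \theta_0)' S^{-1} (\bar\theta - \theta_0) \leq r_{1-\alpha} \big) \]
\[ = P_{\theta_0}\big( n (\hat\theta_n - \theta_0)' (V^{-1}_{\theta_0,P_{\theta_0}} U_{\theta_0,P_{\theta_0}} V^{-1}_{\theta_0,P_{\theta_0}})^{-1} (\hat\theta_n - \theta_0) + o_{P_{\theta_0}}(1) \leq \chi^2_{k;\alpha} + o_{P_{\theta_0}}(1) \big) \to 1 - \alpha. \]

Proof of Theorem 2.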
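The proof is similar to that of Theorem 1. Using Lemma 2, under $H_0$, $\sqrt{n}\,(\hat\theta^{(1)}_{n_1} - \hat\theta^{(2)}_{n_2})$ converges to a Gaussian distribution with mean 0 and covariance matrix
\[ \lambda^{-1} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} U_{\theta_0,P^{(1)}_{\theta_0}} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} + (1 - \lambda)^{-1} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} U_{\theta_0,P^{(2)}_{\theta_0}} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}}. \]
Using Lemma 1, Lemma 2 and Slutsky's theorem,
\[ n (\bar\theta^{(1)} - \bar\theta^{(2)})' \{ n (S^{(1)} + S^{(2)}) \}^{-1} (\bar\theta^{(1)} - \bar\theta^{(2)}) \rightsquigarrow \chi^2_k, \]
which implies that $r_{1-\alpha} = \chi^2_{k;\alpha} + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1)$. Next, using Lemma 2, under $H_0$,
\[ (P^{(1)}_{\theta_0} \times P^{(2)}_{\theta_0}) \big[ (\bar\theta^{(1)} - \bar\theta^{(2)})' (S^{(1)} + S^{(2)})^{-1} (\bar\theta^{(1)} - \bar\theta^{(2)}) \leq r_{1-\alpha} \big] \]
\[ = (P^{(1)}_{\theta_0} \times P^{(2)}_{\theta_0}) \Big( n (\hat\theta^{(1)}_{n_1} - \hat\theta^{(2)}_{n_2})' \Big( \frac{1}{\lambda} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} U_{\theta_0,P^{(1)}_{\theta_0}} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} + \frac{1}{1 - \lambda} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} U_{\theta_0,P^{(2)}_{\theta_0}} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} \Big)^{-1} (\hat\theta^{(1)}_{n_1} - \hat\theta^{(2)}_{n_2}) + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1) \leq \chi^2_{k;\alpha} + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1) \Big) \to 1 - \alpha. \]

Proof of Theorem 3.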
It is well known that the models $P^n_{\theta_0}$ and $P^n_{\theta_0 + h/\sqrt{n}}$ are mutually contiguous. Under $H_0$, using Theorem 5.23 in Van der Vaart (1998), $\hat\theta_n$ can be written as
\[ \sqrt{n}\,(\hat\theta_n - \theta_0) = -\frac{1}{\sqrt{n}} \sum_{i=1}^{n} V^{-1}_{\theta_0,P_{\theta_0}} \frac{Y_i - \theta_0}{\|Y_i - \theta_0\|} + o_{P_{\theta_0}}(1). \]
Let $L_n$ denote the log-likelihood ratio $L_n = \log(dP^n_{\theta_0 + h/\sqrt{n}} / dP^n_{\theta_0})$. By the central limit theorem, $(\sqrt{n}\,(\hat\theta_n - \theta_0), L_n)$ converges jointly to a $(k + 1)$-dimensional Gaussian distribution in which the asymptotic covariance between $\sqrt{n}\,(\hat\theta_n - \theta_0)$ and $L_n$ is
\[ \delta = P_{\theta_0}\Big( -V^{-1}_{\theta_0,P_{\theta_0}} \frac{Y - \theta_0}{\|Y - \theta_0\|} \dot\ell^T_{\theta_0} I^{-1}_{\theta_0} h \Big). \]
Then by Le Cam's third lemma (Example 6.7, Van der Vaart (1998)), under $P_{\theta_0 + h/\sqrt{n}}$, $\sqrt{n}\,(\hat\theta_n - \theta_0)$ converges weakly to a Gaussian distribution with mean $\delta$. Following the arguments used in Theorem 1, the local asymptotic power of the test is given by
\[ P_{\theta_0 + h/\sqrt{n}}\big( (\bar\theta - \theta_0)' S^{-1} (\bar\theta - \theta_0) \leq r_{1-\alpha} \big) = P_{\theta_0 + h/\sqrt{n}}\big( n (\hat\theta_n - \theta_0)' (V^{-1}_{\theta_0,P_{\theta_0}} U_{\theta_0,P_{\theta_0}} V^{-1}_{\theta_0,P_{\theta_0}})^{-1} (\hat\theta_n - \theta_0) + o_{P_{\theta_0}}(1) \leq \chi^2_{k;\alpha} + o_{P_{\theta_0}}(1) \big). \]
Under $P_{\theta_0 + h/\sqrt{n}}$, $n (\hat\theta_n - \theta_0)' (V^{-1}_{\theta_0,P_{\theta_0}} U_{\theta_0,P_{\theta_0}} V^{-1}_{\theta_0,P_{\theta_0}})^{-1} (\hat\theta_n - \theta_0)$ tends to a non-central chi-square distribution with $k$ degrees of freedom and non-centrality parameter $\delta' (V^{-1}_{\theta_0,P_{\theta_0}} U_{\theta_0,P_{\theta_0}} V^{-1}_{\theta_0,P_{\theta_0}})^{-1} \delta$, which gives us the asymptotic power given in the statement of Theorem 3.

Proof of Theorem 4.
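The proof proceeds along the lines of Theorem 3. The models $\{P^{(1)}_{\theta_0} \times P^{(2)}_{\theta_0}\}$ and $\{P^{(1)}_{\theta_0 + h_1/\sqrt{n}} \times P^{(2)}_{\theta_0 + h_2/\sqrt{n}}\}$ are mutually contiguous. From Theorem 5.23 in Van der Vaart (1998), the sample spatial medians have the following linearizations:
\[
\sqrt{n}(\hat\theta^{(1)}_n - \theta_0) = -\frac{1}{\sqrt{n}} \sum_{i=1}^{n} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} \frac{Y^{(1)}_i - \theta_0}{\|Y^{(1)}_i - \theta_0\|} + o_{P^{(1)}_{\theta_0}}(1), \tag{22}
\]
\[
\sqrt{n}(\hat\theta^{(2)}_n - \theta_0) = -\frac{1}{\sqrt{n}} \sum_{i=1}^{n} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} \frac{Y^{(2)}_i - \theta_0}{\|Y^{(2)}_i - \theta_0\|} + o_{P^{(2)}_{\theta_0}}(1). \tag{23}
\]
Subtracting (23) from (22), under $H_0$,
\[
\sqrt{n}(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n) = -\bigg\{ \frac{1}{\sqrt{n\lambda}} \sum_{i=1}^{n} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} \frac{Y^{(1)}_i - \theta_0}{\|Y^{(1)}_i - \theta_0\|} - \frac{1}{\sqrt{n(1-\lambda)}} \sum_{i=1}^{n} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} \frac{Y^{(2)}_i - \theta_0}{\|Y^{(2)}_i - \theta_0\|} \bigg\} + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1).
\]
Define the log-likelihood ratio $L'_N = \log(\mathrm{d}P^{(1)}_{\theta_0 + h_1/\sqrt{n}}\, \mathrm{d}P^{(2)}_{\theta_0 + h_2/\sqrt{n}} / \mathrm{d}P^{(1)}_{\theta_0}\, \mathrm{d}P^{(2)}_{\theta_0})$, which admits the expansion
\[
L'_N = \frac{1}{\sqrt{n}} h_1^{T} \sum_{i=1}^{n} \dot\ell^{(1)}_{\theta_0}(Y^{(1)}_i) - h_1^{T} I^{(1)}_{\theta_0} h_1 + \frac{1}{\sqrt{n}} h_2^{T} \sum_{i=1}^{n} \dot\ell^{(2)}_{\theta_0}(Y^{(2)}_i) - h_2^{T} I^{(2)}_{\theta_0} h_2 + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1).
\]
By the central limit theorem, the joint distribution of $\sqrt{n}(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n)$ and $L'_N$ tends to a $(k+1)$-dimensional Gaussian distribution with mean zero and covariance between the two components
\[
\delta = \frac{1}{\sqrt{\lambda}} P^{(1)}_{\theta_0}\Big\{ -V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} \frac{Y^{(1)} - \theta_0}{\|Y^{(1)} - \theta_0\|}\, \dot\ell^{(1)T}_{\theta_0} \big(I^{(1)}_{\theta_0}\big)^{-1} h_1 \Big\} + \frac{1}{\sqrt{1-\lambda}} P^{(2)}_{\theta_0}\Big\{ -V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} \frac{Y^{(2)} - \theta_0}{\|Y^{(2)} - \theta_0\|}\, \dot\ell^{(2)T}_{\theta_0} \big(I^{(2)}_{\theta_0}\big)^{-1} h_2 \Big\}.
\]
Again, by Le Cam's third lemma, under $P^{(1)}_{\theta_0 + h_1/\sqrt{n}} \times P^{(2)}_{\theta_0 + h_2/\sqrt{n}}$, $\sqrt{n}(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n)$ converges weakly to a Gaussian distribution with mean $\delta$.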
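To make the role of the mean shift $\delta$ concrete, the following minimal Python sketch evaluates $P(Z'\Sigma^{-1}Z > c)$ for $Z \sim N_k(\delta, \Sigma)$, using the standard fact that $Z'\Sigma^{-1}Z$ is then non-central chi-square with $k$ degrees of freedom and non-centrality parameter $\delta'\Sigma^{-1}\delta$. It is an illustration under stated assumptions, not the authors' implementation: the function names (`chi2_cdf`, `ncx2_cdf`, `local_power`) are hypothetical, the cutoff $c$ is taken to be the upper-$\alpha$ quantile of the central $\chi^2_k$ distribution, and $\delta$ and $\Sigma$ are supplied by the user, whereas in practice they depend on the unknown distributions.

```python
import math


def chi2_cdf(x, k):
    """Central chi-square CDF with k degrees of freedom, computed from the
    power series of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * k, 0.5 * x
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def chi2_quantile(p, k):
    """Quantile of the central chi-square distribution by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_cdf(x, k, ncp):
    """Non-central chi-square CDF as a Poisson(ncp / 2) mixture of central
    chi-square CDFs with k + 2j degrees of freedom (moderate ncp only)."""
    half = 0.5 * ncp
    weight = math.exp(-half)  # Poisson weight for j = 0
    total = 0.0
    j = 0
    while j < 10000:
        total += weight * chi2_cdf(x, k + 2 * j)
        j += 1
        weight *= half / j
        if j > half and weight < 1e-17:
            break
    return total


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def local_power(delta, sigma, alpha=0.05):
    """P(Z' sigma^{-1} Z > c) for Z ~ N(delta, sigma), with c the upper-alpha
    quantile of the central chi-square with k = len(delta) degrees of freedom."""
    k = len(delta)
    ncp = sum(d * s for d, s in zip(delta, solve(sigma, delta)))
    crit = chi2_quantile(1.0 - alpha, k)
    return 1.0 - ncx2_cdf(crit, k, ncp)


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(local_power([1.0, 0.5], [[1.0, 0.2], [0.2, 2.0]]))
```

With $\delta = 0$ the function returns the nominal level $\alpha$ up to numerical error, and the returned probability increases with the non-centrality parameter $\delta'\Sigma^{-1}\delta$.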
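Thus, following the arguments used in Theorem 2, the asymptotic power is given by
\[
\begin{aligned}
&P^{(1)}_{\theta_0 + h_1/\sqrt{n}} \times P^{(2)}_{\theta_0 + h_2/\sqrt{n}}\big\{(\bar\theta^{(1)}_n - \bar\theta^{(2)}_n)'(S^{(1)} + S^{(2)})^{-1}(\bar\theta^{(1)}_n - \bar\theta^{(2)}_n) \le r_{1-\alpha}\big\} \\
&\quad = P^{(1)}_{\theta_0 + h_1/\sqrt{n}} \times P^{(2)}_{\theta_0 + h_2/\sqrt{n}}\Big\{ n(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n)'\big(\lambda^{-1} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} U_{\theta_0,P^{(1)}_{\theta_0}} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} + (1-\lambda)^{-1} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} U_{\theta_0,P^{(2)}_{\theta_0}} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}}\big)^{-1}(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n) \\
&\qquad\qquad + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1) \le \chi^2_{k;\alpha} + o_{P^{(1)}_{\theta_0}}(1) + o_{P^{(2)}_{\theta_0}}(1)\Big\}.
\end{aligned}
\]
Therefore, under $P^{(1)}_{\theta_0 + h_1/\sqrt{n}} \times P^{(2)}_{\theta_0 + h_2/\sqrt{n}}$,
\[
n(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n)'\Big(\frac{1}{\lambda} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} U_{\theta_0,P^{(1)}_{\theta_0}} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} + \frac{1}{1-\lambda} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} U_{\theta_0,P^{(2)}_{\theta_0}} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}}\Big)^{-1}(\hat\theta^{(1)}_n - \hat\theta^{(2)}_n)
\]
tends to a non-central chi-square distribution with non-centrality parameter
\[
\delta'\big(\lambda^{-1} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} U_{\theta_0,P^{(1)}_{\theta_0}} V^{-1}_{\theta_0,P^{(1)}_{\theta_0}} + (1-\lambda)^{-1} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}} U_{\theta_0,P^{(2)}_{\theta_0}} V^{-1}_{\theta_0,P^{(2)}_{\theta_0}}\big)^{-1}\delta,
\]
which gives the asymptotic power.

References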
Bhattacharya, I. and Ghosal, S. (2019). Bayesian Inference on Multivariate Medians and Quantiles. arXiv preprint arXiv:1909.10110.
Cheng, G. (2015). Moment consistency of the exchangeably weighted bootstrap for semiparametric M-estimation. Scandinavian Journal of Statistics, 42(3):665–684.
Ghosal, S. and van der Vaart, A. (2017). Fundamentals of Nonparametric Bayesian Inference, volume 44. Cambridge University Press.
Oja, H. and Randles, R. H. (2004). Multivariate nonparametric tests. Statistical Science, 19(4):598–605.
Sirkiä, S., Taskinen, S., Nevalainen, J., and Oja, H. (2007). Multivariate nonparametrical methods based on spatial signs and ranks: The R package SpatialNP.
Van der Vaart, A. W. (1998). Asymptotic Statistics, volume 3. Cambridge University Press.
Xue-Kun Song, P. (2000). Multivariate dispersion models generated from Gaussian copula. Scandinavian Journal of Statistics, 27(2):305–320.