Global comparisons of medians and other quantiles in a one-way design when there are tied values
Rand R. Wilcox
Dept. of Psychology, University of Southern California
June 25, 2015

ABSTRACT

For J ≥ 2 independent groups, the paper considers the problem of testing the global hypothesis that the J groups have a common population median or identical quantiles, with an emphasis on the quartiles. Classic rank-based methods are sometimes suggested for comparing medians, but it is well known that under general conditions they do not adequately address this goal. Extant methods based on the usual sample median are unsatisfactory when there are tied values except for the special case J = 2. A variation of the percentile bootstrap used in conjunction with the Harrell–Davis quantile estimator performs well in simulations. The method is illustrated with data from the Well Elderly 2 study.

Keywords: Tied values; bootstrap methods; Harrell–Davis estimator; projection distances; Well Elderly 2 study

1 Introduction

For J independent random variables, let θ_j be the population median or some other quantile associated with the jth variable (j = 1, ..., J). The paper considers the problem of testing

H_0: θ_1 = · · · = θ_J, (1)

particularly when there are tied values. The focus is on comparing the medians as well as the upper or lower quartiles, but it is evident that the results are relevant when comparing other quantiles instead.

For the special case J = 2, the Wilcoxon–Mann–Whitney (WMW) test is sometimes suggested for comparing medians, but it is well known that under general conditions it does not accomplish this goal (e.g., Hettmansperger, 1984; Fung, 1980). Roughly, the reason is that for two independent random variables, X and Y, it is not based on an estimate of θ_1 − θ_2, but rather on an estimate of P(X < Y). Another concern is that when distributions differ, under general conditions the WMW test uses the wrong standard error (e.g., Cliff, 1996; Wilcox, 2012). More generally, the Kruskal–Wallis test, which reduces to the WMW test when J = 2, does not test (1).

Yet another possible approach is to use the usual sample median in conjunction with a permutation test. However, results in Romano (1990) establish that this approach is unsatisfactory as well.

For a random sample X_1, ..., X_n, let X_(1) ≤ · · · ≤ X_(n) denote the observations written in ascending order and let M_j denote the usual sample median based on the random sample from the jth group. That is, if the number of observations, n, is odd, M_j = X_(m), where m = (n + 1)/2. If n is even, M_j = (X_(m) + X_(m+1))/2, where now m = n/2.
A natural way of proceeding is to estimate θ_j with M_j and use some test statistic that is based in part on an estimate of the standard error of M_j. When sampling from a continuous distribution, where tied values occur with probability zero, an effective method that can be used when J = 2 or when J > 2 was studied by Bonett and Price (2002). Many estimates of the standard error of M_j have been derived, but extant results indicate that all of them can perform poorly when tied values can occur (Wilcox, 2012). Wilcox (2006) found a slight extension of a standard percentile bootstrap method that performs well when testing (1) with tied values and J = 2. But Wilcox (2012) notes that in terms of testing (1) when J > 2, evidently no method has been found that performs well in simulations when there are tied values.
There is another complication when working with the usual sample median. It is well known that, when sampling from a continuous distribution, M_j is asymptotically normal under certain regularity conditions. However, when sampling from a discrete distribution with a finite sample space, M_j does not converge to a normal distribution. More broadly, when estimating quantiles using a single order statistic, or a weighted average of two order statistics, assuming asymptotic normality is generally unsatisfactory when dealing with discrete distributions where tied values occur.

As an illustration, consider a beta-binomial distribution having the probability function

P(x) = B(x + r, m − x + s) / ((m + 1) B(x + 1, m − x + 1) B(r, s)), (2)

where B is the complete beta function, r > 0, s > 0 and x = 0, 1, ..., m.
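To make the discreteness concrete, here is a minimal R sketch of the probability function (2) and of sampling from it via the standard beta mixture representation; the names dbetabinom and rbetabinom are illustrative, not from any package.

```r
# Probability function (2); the complete beta function B is beta() in R.
dbetabinom <- function(x, m, r, s) {
  beta(x + r, m - x + s) / ((m + 1) * beta(x + 1, m - x + 1) * beta(r, s))
}

# Sampling: X | p ~ binomial(m, p) with p ~ beta(r, s) yields (2).
rbetabinom <- function(n, m, r, s) rbinom(n, size = m, prob = rbeta(n, r, s))

sum(dbetabinom(0:6, 30, 1, 3))  # about .52: x = 6 is the .52 quantile below
```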
Consider m = 30. Then the cardinality of the sample space is 31 and, as is evident, if the sample size is n > 31, tied values occur with probability one. The left panel of Figure 1 shows a plot of 3000 sample medians generated from a beta-binomial distribution with r = 1 and s = 3 based on a sample size n = 100. (So the beta-binomial distribution is skewed to the right, P(x) is monotonic decreasing and x = 6 corresponds to the .52 quantile.) The right panel is the same as the left, only now n = 500. As is evident, the sampling distribution has not moved closer to a normal distribution and indeed the cardinality of the sample space has decreased, indicating that any method for making inferences based on the sample median that assumes asymptotic normality can perform poorly.

Figure 1: The left panel shows a plot of 3000 sample medians when sampling from a beta-binomial distribution, n = 100, m = 30, r = 1, s = 3. The right panel is a plot of the 3000 medians when n = 500.

For the special case where the goal is to compare two independent groups, a method for comparing quantiles that deals effectively with tied values is to use a percentile bootstrap in conjunction with the quantile estimator derived by Harrell and Davis (1982); see Wilcox et al. (2014). The Harrell–Davis estimator uses a weighted average of all the order statistics. The result is a sampling distribution that is typically well approximated by a continuous distribution. Consider again the right panel of Figure 1 and note that the cardinality of the sample space is only five. That is, only five values for the sample median were observed among the 3000 estimates. In contrast, if the Harrell–Davis estimator is used, there are no tied values among all 3000 estimates. But the method studied by Wilcox et al. is limited to J = 2; there are no results on how best to proceed when testing (1) with J > 2 groups. Using the usual sample median for the situation at hand, preliminary simulations found that it performs poorly in terms of controlling the Type I error probability. Results in Wilcox et al. (2014) suggest that if the Harrell–Davis estimator is used instead, reasonably good control over the Type I error probability might be obtained. So the goal here is to determine the extent to which this is the case.

Note that in terms of characterizing the typical value of some random variable, the population median is an obvious choice. However, situations are encountered where differences that occur in the tails of a distribution have substantive interest (e.g., Doksum & Sievers, 1976; Wilcox et al., 2014). This issue can be addressed by comparing quantiles other than the median, which can help provide a deeper understanding of how distributions differ. This point is illustrated in section 4.

2 The Proposed Method

To describe the Harrell and Davis (1982) estimate of the qth quantile, let Y be a random variable having a beta distribution with parameters a = (n + 1)q and b = (n + 1)(1 − q). That is, the probability density function of Y is

Γ(a + b) / (Γ(a)Γ(b)) y^(a−1) (1 − y)^(b−1), 0 ≤ y ≤ 1,

where Γ is the gamma function. Let

W_i = P((i − 1)/n ≤ Y ≤ i/n).

For a random sample X_1, ..., X_n, let X_(1) ≤ · · · ≤ X_(n) denote the observations written in ascending order. The Harrell–Davis estimate of θ_q, the qth quantile, is

θ̂_q = Σ_{i=1}^n W_i X_(i). (3)
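Computationally, the weights W_i follow immediately from the beta distribution function, so (3) takes only a few lines of R; this is a minimal sketch, with hd an illustrative name (the WRS package supplies its own implementation).

```r
# Harrell-Davis estimate (3) of the q-th quantile.
hd <- function(x, q = 0.5) {
  n <- length(x)
  a <- (n + 1) * q
  b <- (n + 1) * (1 - q)
  i <- seq_len(n)
  w <- pbeta(i / n, a, b) - pbeta((i - 1) / n, a, b)  # the weights W_i
  sum(w * sort(x))                                    # weighted order statistics
}
```

Because every order statistic receives positive weight, two samples rarely produce exactly the same estimate, which is what makes the estimator attractive when tied values occur.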
In terms of its standard error, Sfakianakis and Verginis (2006) show that in some situations the Harrell–Davis estimator competes well with alternative estimators that again use a weighted average of all the order statistics, but there are exceptions. (Sfakianakis and Verginis derived alternative estimators that have advantages over the Harrell–Davis estimator in some situations. But when sampling from heavy-tailed distributions, the standard errors of their estimators can be substantially larger than the standard error of θ̂_q.) Additional comparisons of various estimators are reported by Parrish (1990), Sheather and Marron (1990), as well as Dielman, Lowry and Pfaffenberger (1994). The only certainty is that no single estimator dominates in terms of efficiency. For example, the Harrell–Davis estimator has a smaller standard error than the usual sample median when sampling from a normal distribution or a distribution that has relatively light tails, but for sufficiently heavy-tailed distributions, the reverse is true (Wilcox, 2012, p. 87).

Let δ_jk = θ_j − θ_k. Roughly, the strategy for testing (1) is to test

H_0: δ_jk = 0, ∀ j < k, (4)

via a percentile bootstrap method. If the null hypothesis is true, then 0, a vector of zeros having length L = (J² − J)/2, should be nested fairly deeply within a cloud of bootstrap estimates of the δ_jk.
Moreover, the depth of 0 can be used to compute a p-value, as will be indicated.

A natural way of measuring the depth of 0 within a bootstrap cloud is via Mahalanobis distance. Note, however, that the δ_jk parameters are linearly dependent. This indicates that Mahalanobis distance can fail from a computational point of view because the bootstrap covariance matrix will be singular. This proved to be the case, so the strategy here is to measure the depth of 0 using a method that does not require a covariance matrix. For completeness, note that the issue of a singular covariance matrix could be avoided by using the first group as a reference group and testing

H_0: θ_1 − θ_2 = θ_1 − θ_3 = · · · = θ_1 − θ_J.

In terms of a Type I error, this approach is reasonable, but in terms of power, this is not necessarily the case. The reason is that power can depend on which group is used as the reference group, because the choice of the reference group impacts the magnitude of the differences between the medians that are compared.

To describe the details of the proposed test of (1) via (4), let X_ij be a random sample from the jth group (i = 1, ..., n_j; j = 1, ..., J). Generate a bootstrap sample from the jth group by resampling with replacement n_j observations from group j. Let θ̂*_j be the estimate of the qth quantile for group j based on this bootstrap sample and let δ̂*_jk = θ̂*_j − θ̂*_k, j < k. Repeat this process B times, yielding δ̂*_bjk, b = 1, ..., B. Here, B = 600 is used in order to avoid overly high execution time and because this choice has been found to provide reasonably good control over the Type I error probability when dealing with related bootstrap techniques (e.g., Wilcox, 2012). However, in terms of power, there might be a practical advantage to using a larger choice for B (Racine & MacKinnon, 2007).
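In R, the bootstrap step can be sketched as follows, using the hd function given earlier; boot_deltas is an illustrative name, not part of any package.

```r
# Generate the B x L matrix of bootstrap differences delta*_bjk, j < k,
# where groups is a list of J numeric vectors and L = (J^2 - J)/2.
boot_deltas <- function(groups, q = 0.5, B = 600) {
  J <- length(groups)
  pairs <- t(combn(J, 2))               # all (j, k) with j < k; L rows
  out <- matrix(NA_real_, nrow = B, ncol = nrow(pairs))
  for (b in seq_len(B)) {
    est <- vapply(groups, function(x) hd(sample(x, replace = TRUE), q), 0)
    out[b, ] <- est[pairs[, 1]] - est[pairs[, 2]]
  }
  out
}
```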
A portion of the strategy used here is based on measuring the depth of a point in a multivariate data cloud using a projection-type method, which provides an approximation of halfspace depth (Wilcox, 2012, section 6.2.5). For notational convenience, momentarily focus on an n × p matrix of data, Y. Let τ̂ be some measure of location based on Y. For simplicity, the marginal medians (based on the usual sample median) are used. Let U_i = Y_i − τ̂ (i = 1, ..., n) and C_i = U_i U_i'. For any i and j (i = 1, ..., n; j = 1, ..., n), let

W_ij = Σ_{k=1}^p U_ik U_jk,

T_ij = (W_ij / C_i)(U_i1, ..., U_ip) (5)

and

D_ij = ||T_ij||,

where ||T_ij|| is the Euclidean norm associated with the vector T_ij. Let

d_ij = D_ij / (q_2i − q_1i),

where q_2i and q_1i are estimates of the upper and lower quartiles, respectively, based on D_i1, ..., D_in. Here, q_1i and q_2i are estimated with the so-called ideal fourths (e.g., Frigge et al., 1989), which are computed as follows. Let j be the integer portion of (n/4) + (5/12) and let

h = (n/4) + (5/12) − j.

The lower quartile is estimated with

q_1i = (1 − h) D_i(j) + h D_i(j+1), (6)

where D_i(1) ≤ · · · ≤ D_i(n). Letting k = n − j + 1, the upper quartile is estimated with

q_2i = (1 − h) D_i(k) + h D_i(k−1). (7)

The projection distance of Y_j, relative to the cloud of points represented by Y, is the maximum value of d_ij, the maximum being taken over i = 1, ..., n. This measure of depth is nearly the same as the measure derived by Donoho and Gasko (1992); the only difference is that they used the median absolute deviation (MAD) as a measure of scale rather than the interquartile range. MAD has a higher breakdown point, but using the interquartile range has been found to perform better in various situations (Wilcox, 2012). This might be due to the poor efficiency of MAD, but the extent to which this is the case is unclear. Perhaps using MAD would perform well in the simulations reported here, but this is left to future investigations.
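The computation just described is easy to express in R; the following sketch follows (5), (6) and (7) directly, with ideal_fourths and proj_dist illustrative names.

```r
# Ideal fourths (6) and (7) applied to a numeric vector.
ideal_fourths <- function(x) {
  x <- sort(x)
  n <- length(x)
  j <- floor(n / 4 + 5 / 12)
  h <- n / 4 + 5 / 12 - j
  k <- n - j + 1
  c((1 - h) * x[j] + h * x[j + 1],    # lower quartile, (6)
    (1 - h) * x[k] + h * x[k - 1])    # upper quartile, (7)
}

# Projection distance of each row of Y relative to the cloud Y.
proj_dist <- function(Y) {
  tau <- apply(Y, 2, median)          # center: marginal medians
  U <- sweep(Y, 2, tau)               # U_i = Y_i - tau
  n <- nrow(U)
  dmax <- numeric(n)
  for (i in seq_len(n)) {
    Ci <- sum(U[i, ]^2)               # C_i = U_i U_i'
    if (Ci == 0) next                 # Y_i is at the center: no direction
    D <- abs(U %*% U[i, ]) / sqrt(Ci) # D_ij = ||T_ij||, j = 1, ..., n
    qs <- ideal_fourths(D)
    dmax <- pmax(dmax, as.vector(D) / (qs[2] - qs[1]))  # running max of d_ij
  }
  dmax
}
```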
Now create a (B + 1) × L matrix, G, where the first B rows are based on the δ̂*_bjk, b = 1, ..., B, and L = (J² − J)/2. That is, row b consists of the L values δ̂*_bjk for all j < k. Row B + 1 of G is the vector 0 having length L. Then, from general theoretical results in Liu and Singh (1997), a (generalized) p-value can be computed based on the relative distance of 0. Compute the projection distance for each row of G. The distance associated with the bth row is denoted by K_b and the distance for the null vector (row B + 1) is denoted by K_0. Then a generalized p-value is

1 − (1/B) Σ_{b=1}^B I(K_0 ≥ K_b),

where the indicator function I(K_0 ≥ K_b) = 1 if K_0 ≥ K_b, otherwise I(K_0 ≥ K_b) = 0. This will be called method Q.
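Combining the sketches given above, method Q amounts to a few lines of R; q_test is an illustrative name (the released implementation is the function Qanova mentioned in section 5).

```r
# Method Q: generalized p-value from the depth of the null vector 0.
q_test <- function(groups, q = 0.5, B = 600) {
  G <- rbind(boot_deltas(groups, q, B),  # first B rows of G
             0)                          # row B + 1: the null vector 0
  K <- proj_dist(G)                      # projection distance of each row
  K0 <- K[B + 1]
  1 - mean(K0 >= K[seq_len(B)])          # 1 - (1/B) sum I(K_0 >= K_b)
}
```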
3 Simulation Results

Simulations were used to study the small-sample properties of method Q when there are J = 4 groups. Results are reported when comparing medians as well as the lower and upper quartiles. Estimated Type I error probabilities, α̂, were based on 4000 replications. Both continuous and discrete distributions were used. The four continuous distributions were normal, symmetric and heavy-tailed, asymmetric and light-tailed, and asymmetric and heavy-tailed. More precisely, if Z has a standard normal distribution, then

W = ((exp(gZ) − 1)/g) exp(hZ²/2), if g > 0,
W = Z exp(hZ²/2), if g = 0,

has a g-and-h distribution, where g and h are parameters that determine the first four moments. The four distributions used here were the standard normal (g = h = 0), a symmetric heavy-tailed distribution (g = 0, h = 0.2), an asymmetric light-tailed distribution (g = 0.2, h = 0) and an asymmetric heavy-tailed distribution (g = 0.2, h = 0.2). Table 1 summarizes the skewness (κ_1) and kurtosis (κ_2) for each distribution. Additional properties of the g-and-h distribution are summarized by Hoaglin (1985).

Table 1: Some properties of the g-and-h distribution.

g     h     κ_1    κ_2
0.0   0.0   0.00     3.00
0.0   0.2   0.00    21.46
0.2   0.0   0.61     3.68
0.2   0.2   2.81   155.98
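Observations from a g-and-h distribution are generated by transforming standard normal deviates as above; a minimal R sketch (rgh is an illustrative name):

```r
# Generate n observations from a g-and-h distribution.
rgh <- function(n, g = 0, h = 0) {
  Z <- rnorm(n)
  if (g > 0) ((exp(g * Z) - 1) / g) * exp(h * Z^2 / 2)
  else Z * exp(h * Z^2 / 2)
}
```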
As for situations where tied values can occur, consider a discrete distribution with a sample space having cardinality N. A goal in the simulations was to get some sense about how well method Q controls the Type I error probability when N is small. Roughly, as the likelihood of tied values increases, at what point does method Q break down? Here, results are reported when data are generated from a beta-binomial distribution for which the cardinality of the sample space is N = m + 1 = 11 and N = m + 1 = 21. The choices for (r, s) were (3, 3), which yields a symmetric distribution, as well as (1, 3) and (1, 9), which yield skewed distributions.

First consider the four g-and-h distributions when testing at the .05 level and the groups have a common sample size of n = 20. As indicated in Table 2, the estimated Type I error probability ranges between .025 and .062. Although the importance of a Type I error depends on the situation, Bradley (1978) suggests that, as a general guide, when testing at the .05 level the actual level should not drop below .025 or exceed .075. In Table 2, the estimates were in this range.

Table 2: Estimated Type I error probability using method Q, continuous distributions, α = .05.

q     g     h     n = 20   n = 50
0.25  0.0   0.0   0.058    0.059
0.25  0.0   0.2   0.031    0.046
0.25  0.2   0.0   0.061    0.058
0.25  0.2   0.2   0.038    0.055
0.50  0.0   0.0   0.059    0.061
0.50  0.0   0.2   0.046    0.057
0.50  0.2   0.0   0.062    0.062
0.50  0.2   0.2   0.054    0.056
0.75  0.2   0.0   0.049    0.054
0.75  0.2   0.2   0.025    0.038

A possible criticism of the results in Table 2 is that they are based on only 4000 replications. Consequently, some comments about the precision of the estimates in Table 2 are provided. Assuming Bradley's criterion is reasonable, first consider the issue of whether the actual level is less than or equal to .075. Using the method in Pratt (1968) to compute a two-sided .95 confidence interval for the actual level, none of the confidence intervals corresponding to the estimates in Table 2 contain .075. As for concluding that the actual level is greater than .025, α̂ ≥ .032 is required, and there are only two situations where the estimate is less than .032, both of which occurred for n = 20 when comparing the quartiles.

For normal distributions, a simulation was run with n = 100 as an additional check on how the method performs as n gets large. The estimated Type I error probability was .058.

For the beta-binomial distributions, estimated Type I error probabilities are shown in Table 3. For m = 20, control over the Type I error probability is reasonably good. For m = 10, it is evident that control over the probability of a Type I error can be unsatisfactory, particularly for the more skewed choices of (r, s). Even with m = 20, method Q can be unsatisfactory when n = 20, the highest estimate of the actual level being .085. For this particular situation, increasing the sample size of two of the groups to 40, the estimate is .064. With all sample sizes equal to 30 and m = 10, the estimate is .065.

Table 3: Estimated probability of a Type I error using method Q when sampling from a beta-binomial distribution, sample sizes n = 20 and 50, α = .05, where the cardinality of the sample space is N = m + 1. The columns give α̂ for each choice of q, r and s when (n, m) = (20, 10), (20, 20), (50, 10) and (50, 20).
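To indicate how an entry of Table 3 is obtained, here is a sketch of one simulation cell using the q_test and rbetabinom functions sketched earlier; a reduced number of replications is used only to keep the run time modest (the reported estimates use 4000).

```r
set.seed(1)
pv <- replicate(500, {   # 500 replications for illustration only
  groups <- replicate(4, rbetabinom(20, m = 20, r = 1, s = 3),
                      simplify = FALSE)
  q_test(groups, q = 0.5)
})
mean(pv <= 0.05)         # estimated Type I error probability
```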
Precise details regarding the rate of convergence to the nominal level are impossible to determine with only 4000 replications, but it is evident that the rate of convergence can depend on the nature of the distribution. Among the discrete distributions considered here for which the cardinality of the sample space is 21, n ≥ 20 suffices in terms of achieving an estimated Type I error probability reasonably close to the nominal .05 level when comparing the medians. But for m = 10 (the cardinality of the sample space is 11) and the most skewed choice of (r, s), n ≥ 180 is required when comparing medians. With n = 100, for example, the estimate of the actual level exceeds .09, in which case the .95 confidence interval for the actual level does not contain .075.

A few simulations were performed using a discrete distribution where the null hypothesis is true but not all of the distributions are identical. All indications are that when comparing medians, again the Type I error probability is controlled reasonably well when n = m = 20. Consider, for example, a beta-binomial distribution where r = s = 3. Then 11 is the .54 quantile. Now suppose that for all four groups data are generated from a discrete distribution such that P(X ≤ x) corresponds to a beta-binomial distribution where r = s = 3 provided that x ≤ 11, but that otherwise some of the cumulative distributions differ. So the distributions are not all identical in the right tail, but the hypothesis of equal population medians is true. Consider in particular the situation where, for three of the groups, the probability function, say f(x), corresponds to a beta-binomial probability function when x < 15, while for x ≥ 15 the remaining probability mass is spread evenly over the right tail; that is, f(x) = Σ_{y≥15} P(y)/c, where c is the number of values in the right tail and P(x) again indicates the beta-binomial distribution given by (2). Now the estimated Type I error probability when comparing the population medians is .057. If instead, for x = 15, 16, ..., f(x) is taken to be P(21), P(20), ..., P(15), respectively, the estimated Type I error probability is .048. However, when comparing the lower quartiles, control over the Type I error probability now exceeds .09. Increasing the sample sizes to 40, this problem persisted. With all sample sizes equal to 50, control over the Type I error probability is reasonably good, the estimate being .064.
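A sketch of how such a tail-modified probability function can be constructed and sampled follows, using the dbetabinom function given earlier; taking m = 21 (so that the reversed tail assignment P(21), ..., P(15) is well defined) is an assumption, not something stated explicitly above.

```r
m <- 21; r <- s <- 3
p <- dbetabinom(0:m, m, r, s)            # beta-binomial probabilities (2)
f <- p
tail <- 16:(m + 1)                       # indices of x = 15, ..., m
f[tail] <- sum(p[tail]) / length(tail)   # spread the tail mass evenly
# reversed-tail variant: f[tail] <- rev(p[tail])
rtail <- function(n) sample(0:m, n, replace = TRUE, prob = f)
```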
4 An Illustration

Method Q is illustrated using data from the Well Elderly 2 study (Jackson et al., 2009; Clark et al., 1997). Generally, the study dealt with the efficacy of a particular intervention strategy aimed at improving the physical and emotional health of older adults. One particular issue was whether four educational groups differed in terms of a measure of meaningful activity prior to intervention. The four groups were high school graduate, some college or technical school, four years of college and post-graduate school. For convenience, these groups are designated G1, G2, G3 and G4 henceforth. Meaningful activity was measured with the sum of 29 Likert items, where the possible responses for each item were 0, 1, 2, 3 and 4. Higher scores reflect higher levels of meaningful activity. The sample sizes were 62, 81, 110 and 125, respectively.

Figure 2: Boxplots for the four groups compared in the Well Elderly 2 study. G1 = high school graduate, G2 = some college or technical school, G3 = 4 years of college and G4 = post-graduate school.

Figure 2 shows boxplots for each of the four groups, which suggest that more pronounced differences occur based on the lower quartile compared to the upper quartile. Applying method Q, the p-values corresponding to the .25, .5 and .75 quantiles were 0, .074 and .294, respectively. So in terms of participants who score relatively high on meaningful activities, no significant difference is found, but a significant result is found for low levels of meaningful activity, as reflected by the .25 quantiles.
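In terms of the sketches given earlier, the analysis amounts to one call per quantile; g1, ..., g4 are placeholders for the four groups' meaningful activity scores, and the released R function Qanova noted in section 5 performs the same computation.

```r
groups <- list(g1, g2, g3, g4)   # placeholder vectors, not shown here
sapply(c(0.25, 0.50, 0.75), function(q) q_test(groups, q))
```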
5 Concluding Remarks
All indications are that if the cardinality of the sample space is N > 20 and the sample size is n ≥ 30, reasonably good control over the Type I error probability will be achieved using method Q when the goal is to compare the population medians. When comparing the quartiles, n ≥ 50 might be required. Of course, simulations do not guarantee that this will be the case for all practical situations that might be encountered. But the main point is that no other method has been found that performs even tolerably well in simulations when tied values are likely to occur, except for the special case of J = 2 groups.

There are many variations of the method used here. For example, in various situations, weighted bootstrap methods have been suggested when dealing with robust estimators; see, for example, Salibian-Barrera and Zamar (2002) and the papers they cite. There are several alternatives to the Harrell–Davis estimator that use a weighted sum of all the order statistics, and there are variations of the projection distance that was used. Perhaps there are situations where some combination of these methods provides a practical advantage over the method used here, but this remains to be determined. The main point is that the method in this paper is the only known method that continues to perform reasonably well when there are tied values.

Finally, method Q can be applied with the R function Qanova, which has been added to the R package WRS, available via R-Forge. This function is also stored in the file Rallfun-v28, which can be downloaded from http://dornsife.usc.edu/cf/labs/wilcox/wilcox-faculty-display.cfm.

REFERENCES

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.

Bonett, D. G. & Price, R. M. (2002). Statistical inference for a linear function of medians: Confidence intervals, hypothesis testing, and sample size requirements. Psychological Methods, 7, 370–383.

Clark, F., Azen, S. P., Zemke, R., Jackson, J., Carlson, M., Mandel, D., Hay, J., Josephson, K., Cherry, B., Hessel, C., Palmer, J., & Lipson, L. (1997). Occupational therapy for independent-living older adults: A randomized controlled trial.
Journal of the American Medical Association, 278, 1321–1326.

Cliff, N. (1996). Ordinal Methods for Behavioral Data Analysis. Mahwah, NJ: Erlbaum.

Dielman, T., Lowry, C. & Pfaffenberger, R. (1994). A comparison of quantile estimators. Communications in Statistics–Simulation and Computation, 23, 355–371.

Doksum, K. A. & Sievers, G. L. (1976). Plotting with confidence: Graphical comparisons of two populations. Biometrika, 63, 421–434.

Donoho, D. L. & Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Annals of Statistics, 20, 1803–1827.

Frigge, M., Hoaglin, D. C. & Iglewicz, B. (1989). Some implementations of the boxplot. American Statistician, 43, 50–54.

Fung, K. Y. (1980). Small sample behaviour of some nonparametric multi-sample location tests in the presence of dispersion differences. Statistica Neerlandica, 34, 189–196.

Harrell, F. E. & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69, 635–640.

He, X., Simpson, D. G. & Portnoy, S. L. (1990). Breakdown robustness of tests. Journal of the American Statistical Association, 85, 446–452.

Hettmansperger, T. P. (1984). Statistical Inference Based on Ranks. New York: Wiley.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distributions. In D. Hoaglin, F. Mosteller & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley, pp. 461–515.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.

Jackson, J., Mandel, D., Blanchard, J., Carlson, M., Cherry, B., Azen, S., Chou, C.-P., Jordan-Marsh, M., Forman, T., White, B., Granger, D., Knight, B., & Clark, F. (2009). Confronting challenges in intervention research with ethnically diverse older adults: The USC Well Elderly II trial. Clinical Trials, 6, 90–104.

Liu, R. Y. & Singh, K. (1997). Notions of limiting P values based on data depth and bootstrap. Journal of the American Statistical Association, 92, 266–277.

Parrish, R. S. (1990). Comparison of quantile estimators in normal sampling. Biometrics, 46, 247–257.

Pratt, J. W. (1968). A normal approximation for binomial, F, beta, and other common, related tail probabilities, I. Journal of the American Statistical Association, 63, 1457–1483.

Racine, J. & MacKinnon, J. G. (2007). Simulation-based tests that can use any number of simulations. Communications in Statistics–Simulation and Computation, 36, 357–365.

Radloff, L. (1977). The CES-D scale: A self report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.

Romano, J. P. (1990). On the behavior of randomization tests without a group invariance assumption. Journal of the American Statistical Association, 85, 686–692.

Salibian-Barrera, M. & Zamar, R. H. (2002). Bootstrapping robust estimates of regression. Annals of Statistics, 30, 556–582.

Sfakianakis, M. E. & Verginis, D. G. (2006). A new family of nonparametric quantile estimators. Communications in Statistics–Simulation and Computation, 37, 337–345.

Sheather, S. J. & Marron, J. S. (1990). Kernel quantile estimators. Journal of the American Statistical Association, 85, 410–416.

Wilcox, R. R. (2006). Comparing medians. Computational Statistics & Data Analysis, 51, 1934–1943.

Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd Edition). San Diego, CA: Academic Press.

Wilcox, R. R., Erceg-Hurn, D., Clark, F. & Carlson, M. (2014). Comparing two independent groups via the lower and upper quantiles. Journal of Statistical Computation and Simulation, 84, 1543–1551.