Double-bootstrap methods that use a single double-bootstrap simulation
Jinyuan Chang and Peter Hall

Department of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia

[email protected]   [email protected]
Abstract
We show that, when the double bootstrap is used to improve performance of bootstrap methods for bias correction, techniques based on using a single double-bootstrap sample for each single-bootstrap sample can be particularly effective. In particular, they produce third-order accuracy for much less computational expense than is required by conventional double-bootstrap methods. However, this improved level of performance is not available for the single double-bootstrap methods that have been suggested to construct confidence intervals or distribution estimators.
Keywords:
Bias correction; Bias estimation; Confidence intervals; Distribution estimation; Edgeworth expansion; Second-order correctness; Third-order correctness.
Double-bootstrap methods that use a single simulation at the second bootstrap level have been studied in at least one context for more than a decade. An early contribution was made by White (2000), although in the setting of diagnosing the overuse of a dataset, rather than speeding up Monte Carlo simulation for general applications of the bootstrap. Davidson & Mackinnon (2001, 2002), and the same authors in a number of subsequent papers accessible via Mackinnon (2006) and Davidson & Mackinnon (2007), introduced the concept independently and explored its applications. Giacomini et al. (2013) christened the technique the warp-speed double-bootstrap method, nomenclature that we shall use here too, and demonstrated that the approach is asymptotically consistent. All this work is for the case of distribution estimation and its application to constructing confidence intervals and hypothesis tests.

In statistics the conventional double bootstrap is used in two main classes of problems: (i) to improve the effectiveness of bias correction, and (ii) to improve the coverage accuracy of confidence intervals. In problem (i), an application of the double bootstrap reduces the order of magnitude of bias by the factor $O(n^{-1})$, and in problem (ii) it reduces coverage error by the factor $O(n^{-1/2})$ for one-sided confidence intervals, and $O(n^{-1})$ for two-sided intervals. In the setting of problem (i), it is not clear whether there exists a version of warp-speed methodology for bias correction, and whether, should it exist, it successfully reduces the order of magnitude of bias. Call these questions 1 and 2, respectively. In problem (ii), it is unclear whether the warp-speed double bootstrap is as effective as the conventional double bootstrap, in the sense of offering the above levels of improved accuracy; we shall refer to this as question 3. In the present paper we show that the answers to questions 1 and 2 are positive, but that the answer to question 3 is negative. In particular, the warp-speed bootstrap does not reduce the order of magnitude of coverage error of a confidence interval.

There is an extensive literature on conventional double-bootstrap methods, particularly in the context of improving the coverage accuracy of single-bootstrap methods. The first mention of the double bootstrap in this setting apparently was by Hall (1986), followed quickly by contributions of Beran (1987, 1988); see also Hall & Martin (1988). The approach suggested by Hall (1992, Chap. 3) allows general multiple bootstrap methods to be developed together, so that different settings do not require separate treatment; however, details of properties of the technique seem to be very problem-specific. Efron (1983) was the first to use the double bootstrap in any setting, in that paper working in the context of estimating the error rate of classifiers. Research on optimising the trade-off between the numbers of simulations in the first and second stages of the conventional double bootstrap, in the context of distribution estimation and constructing confidence intervals, includes that of Booth & Hall (1994), Booth & Presnell (1998) and Lee & Young (1999).

It has become conventional to assess performance of the bootstrap in terms of Edgeworth expansions, not least because that approach enables theoretical properties to be developed in the very broad context addressed by Bhattacharya & Ghosh (1978). The resulting approximations are valid, in absolute rather than relative terms, uniformly in the tails.
An alternative approach, based on large-deviation probabilities, is valid in relative terms; see e.g. Hall (1990). However, it requires either more stringent assumptions or specialised methods that, at least at present, are not available in the context of the models used by Bhattacharya & Ghosh (1978). In the setting of absolute rather than relative accuracy, arbitrarily far out into the tails, the results in this paper take the result of consistency, demonstrated by Giacomini et al. (2013), much further.

Let $\theta = f(\mu)$ be a parameter expressible as a known function, $f$, of a $p$-variate mean, $\mu$, and let $\bar X$ denote an unbiased estimator of $\mu = (\mu_1, \ldots, \mu_p)^T$. Our estimator of $\theta$ is the same function of a sample mean, $\bar X$:
$$\hat\theta = f(\bar X). \eqno(1)$$
The smooth function $f$ maps a point $x$ in $p$-variate Euclidean space to a point on the real line. We do not insist that $\bar X$ be a mean of $n$, say, independent and identically distributed random $p$-vectors, since it might be the case that $\bar X = (\bar X_1, \ldots, \bar X_p)^T$, with
$$\bar X_j = \frac{1}{n_j} \sum_{i=1}^{n_j} X_{ji},$$
where, for each $j$, the variables $X_{ji}$, $1 \le i \le n_j$, are independent with $E(X_{ji}) = \mu_j$, and the $n_j$s are not all equal. Nevertheless, in mathematical terms we shall assume that the $n_j$s are all functions of an integer parameter $n$, and that each $n_j \asymp n$; that is, each ratio $n_j/n$ is bounded away from zero and infinity as $n \to \infty$.

These issues are related to dependence relationships among the random variables $X_{ji}$, which should be reflected in resampling methodology. In our theoretical work we shall suppose that:

either (i) each $n_j = n$ and the vectors $(X_{1i}, \ldots, X_{pi})^T$, for $i \ge 1$, are independent and identically distributed; or (ii) the $X_{ji}$s are totally independent, for $1 \le i \le n_j$ and $1 \le j \le p$, and in this case, for each $j \in \{1, \ldots, p\}$ the variables $X_{j1}, X_{j2}, \ldots$ are identically distributed, and $n_j \asymp n$. (2)

Each of (i) and (ii) above can be generalized, for example to hybrid cases where, for positive integers $p_1, \ldots, p_r$ that satisfy $\sum_{j=1}^r p_j = p$, and defining $q_0 = 0$ and $q_j = \sum_{k=1}^j p_k$, the vectors $V_{ji} = (X_{q_j + 1, i}, \ldots, X_{q_{j+1}, i})^T$, for $0 \le j \le r - 1$ and $i \ge 1$, are completely independent, and for each $j$ the vectors $V_{ji}$, for $i \ge 1$, are identically distributed. Bootstrap methods that reflect these properties can be constructed readily, and theory providing authoritative support in this setting can be developed, but for the sake of brevity, in our theoretical work we shall restrict attention to cases where (2) holds.
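To make the two resampling schemes concrete, here is a minimal illustrative sketch of our own (not part of the original algorithm statements), using numpy with hypothetical function names; it implements first-level resampling under (2)(i) and (2)(ii), and second-level resampling simply applies the same functions to a first-level resample.

```python
import numpy as np

rng = np.random.default_rng(12345)

def resample_case_i(X):
    # Model (2)(i): X is an n x p array of i.i.d. p-vectors.  Resample whole
    # rows with replacement, so dependence between coordinates is preserved.
    n = X.shape[0]
    return X[rng.integers(0, n, size=n)]

def resample_case_ii(columns):
    # Model (2)(ii): `columns` is a list of p one-dimensional arrays, the
    # j-th of length n_j; all variables are totally independent, so each
    # column is resampled separately, and independently, with replacement.
    return [c[rng.integers(0, len(c), size=len(c))] for c in columns]
```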
Bias-corrected estimators of $\theta$, based on the conventional bootstrap and the double bootstrap, respectively, are given by
$$\hat\theta_{\rm bc} = 2\hat\theta - E(\hat\theta^* \mid \mathcal{X}), \qquad \hat\theta_{\rm bcc} = 3\hat\theta - 3E(\hat\theta^* \mid \mathcal{X}) + E(\hat\theta^{**} \mid \mathcal{X}). \eqno(3)$$
Here $\mathcal{X} = \{X_{ji} : 1 \le i \le n_j,\ 1 \le j \le p\}$ denotes the original dataset, $\hat\theta^*$ is the version of $\hat\theta$ computed from a resample $\mathcal{X}^*$ drawn randomly, with replacement, from $\mathcal{X}$, in a manner that reflects appropriately the dependence structure, and $\hat\theta^{**}$ is the version of $\hat\theta$ computed from $\mathcal{X}^{**}$, which in turn is drawn randomly with replacement from $\mathcal{X}^*$, again reflecting dependence.

Monte Carlo approximations to the quantities $\hat\theta_{\rm bc}$ and $\hat\theta_{\rm bcc}$ in (3) are given respectively by
$$\tilde\theta_{\rm bc} = 2\hat\theta - \frac{1}{B} \sum_{b=1}^B \hat\theta^*_b, \qquad \tilde\theta_{\rm bcc} = 3\hat\theta - \frac{3}{B} \sum_{b=1}^B \hat\theta^*_b + \frac{1}{BC} \sum_{b=1}^B \sum_{c=1}^C \hat\theta^{**}_{bc}, \eqno(4)$$
where $\hat\theta^*_b$ denotes the $b$th out of $B$ independent and identically distributed, conditional on $\mathcal{X}$, versions of $\hat\theta^*$, computed from respective resamples $\mathcal{X}^*_b$ drawn by sampling randomly, with replacement, from the data in $\mathcal{X}$; and $\hat\theta^{**}_{bc}$ is the $c$th out of $C$ independent and identically distributed, conditional on $\mathcal{X}$ and $\mathcal{X}^*$, versions of $\hat\theta^{**}$, and is computed from a resample $\mathcal{X}^{**}_{bc}$ drawn by sampling randomly, with replacement, from $\mathcal{X}^*_b$.

2.3 Bootstrap algorithms

Reflecting the model at (1), we can express $\hat\theta^*_b$ and $\hat\theta^{**}_{bc}$ in (4) as $\hat\theta^*_b = f(\bar X^*_b)$ and $\hat\theta^{**}_{bc} = f(\bar X^{**}_{bc})$, where $\bar X^*_b = (\bar X^*_{b1}, \ldots, \bar X^*_{bp})^T$ and $\bar X^{**}_{bc} = (\bar X^{**}_{bc1}, \ldots, \bar X^{**}_{bcp})^T$; $\bar X^*_{bj}$ denotes the mean of data in the resample $\mathcal{X}^*_{bj} = \{X^*_{bj1}, \ldots, X^*_{bjn_j}\}$, and $\bar X^{**}_{bcj}$ is the mean of data in the re-resample $\mathcal{X}^{**}_{bcj} = \{X^{**}_{bcj1}, \ldots, X^{**}_{bcjn_j}\}$ drawn by sampling with replacement from $\mathcal{X}^*_{bj}$. The resampling operations at the first bootstrap level are undertaken by resampling the vectors $X_i = (X_{1i}, \ldots, X_{pi})^T$ randomly, with replacement, if (2)(i) holds, or by resampling the $X_{ji}$s randomly and completely independently, conditional on $\mathcal{X}$ and with replacement, if (2)(ii) obtains; resampling at the second bootstrap level is undertaken analogously.

In Theorem 1 in section 5.1 we shall show that if $C \to \infty$, no matter how slowly, as $n$ and $B$ diverge, then the asymptotic distribution of the Monte Carlo simulation error incurred when constructing $\tilde\theta_{\rm bcc}$ at (4) is the same as it would be if $C = \infty$. In particular, not only is the error of order $(nB)^{-1/2}$; the large-sample limiting distribution of the relevant asymptotically normal random variable, which has standard deviation proportional to $(nB)^{-1/2}$, and which describes in relative detail the accuracy of Monte Carlo bootstrap simulation, is identical to the limiting distribution that would arise if $C = \infty$. Moreover, if $C$ is held fixed then the order of magnitude, $(nB)^{-1/2}$, remains unchanged, but the standard deviation of the large-sample limiting distribution referred to above changes by a constant factor. This result is critical. It demonstrates the relatively small gains that are to be achieved by taking $C$ to be large, and argues in favour of taking $C = 1$, for example. This is the analogue, for bias correction, of the warp-speed bootstrap for distribution estimation when constructing confidence intervals. Therefore the order of magnitude of Monte Carlo simulation error in $\tilde\theta_{\rm bcc}$ is unchanged even if $C$ is held fixed.
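The estimators at (4) translate directly into code. The following sketch is ours, not the authors'; it assumes model (2)(i), data stored in an $n \times p$ numpy array, and a user-supplied smooth function f of the mean vector. It returns both $\tilde\theta_{\rm bc}$ and $\tilde\theta_{\rm bcc}$, with the small fixed value of $C$ that the result above recommends.

```python
import numpy as np

def bias_corrected_estimators(X, f, B=1000, C=1, rng=None):
    # Monte Carlo approximations (4): theta_bc (single bootstrap) and
    # theta_bcc (double bootstrap, C second-level resamples per b).
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    theta_hat = f(X.mean(axis=0))
    first_level, second_level = [], []
    for _ in range(B):
        Xstar = X[rng.integers(0, n, size=n)]            # X*_b, drawn from X
        first_level.append(f(Xstar.mean(axis=0)))
        for _ in range(C):
            Xstar2 = Xstar[rng.integers(0, n, size=n)]   # X**_bc, from X*_b
            second_level.append(f(Xstar2.mean(axis=0)))
    theta_bc = 2 * theta_hat - np.mean(first_level)
    theta_bcc = 3 * theta_hat - 3 * np.mean(first_level) + np.mean(second_level)
    return theta_bc, theta_bcc
```

For instance, `f = lambda m: np.sin(m[0])` with `C=1` corresponds to the single double-bootstrap bias correction whose performance is studied in section 4.1.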
Incidentally, the order of magnitude, $(nB)^{-1/2}$, should be compared with that of the uncorrected bias that remains after applying the bias correction that leads to $\tilde\theta_{\rm bcc}$; it is $n^{-3}$. Therefore, unless $B$ is of order $n^5$ or larger, for the regular bootstrap, the orders of magnitude involving $B$, discussed above, dominate the error in the bias correction.

As in section 2.1 we shall assume that the parameter $\theta$ can be represented as $f(\mu)$, where the function $f : \mathbb{R}^p \to \mathbb{R}$ is known, and $\mu = E(X)$ is an unknown $p$-vector of parameters, estimated by $\bar X = n^{-1} \sum_{i=1}^n X_i$, where $\mathcal{X} = \{X_1, \ldots, X_n\}$ is a random sample of data vectors. Here and below we use model (2)(i) for the data, but only minor modifications are needed if (2)(ii) is employed instead. In such cases, provided that $f$ is sufficiently smooth and $\hat\theta$ is given by (1), the asymptotic variance, $n^{-1}\sigma^2$, of $\hat\theta$ is estimated root-$n$ consistently by $n^{-1}\hat\sigma^2$, where
$$\hat\sigma^2 = \sum_{j_1=1}^p \sum_{j_2=1}^p f_{j_1}(\bar X)\, f_{j_2}(\bar X)\, \frac{1}{n} \sum_{i=1}^n (X_{j_1 i} - \bar X_{j_1})(X_{j_2 i} - \bar X_{j_2}).$$
Here, given a $p$-vector $x = (x_1, \ldots, x_p)^T$, and integers $j_1, \ldots, j_r$ between 1 and $p$, and assuming that $f$ has $r$ well-defined derivatives with respect to each variable, we put $f_{j_1 \ldots j_r}(x) = (\partial/\partial x_{j_1}) \cdots (\partial/\partial x_{j_r})\, f(x)$. The above definitions of $\hat\theta$ and $\hat\sigma$ are used in (5) below.

Let $R$, referred to as the "root" by Giacomini et al. (2013), be given by either of the formulae
$$R = n^{1/2}(\hat\theta - \theta), \qquad R = n^{1/2}(\hat\theta - \theta)/\hat\sigma. \eqno(5)$$
Here $\hat\theta$ and $\hat\sigma$ are estimators of the parameters $\theta$ and $\sigma$ computed from the random sample $\mathcal{X}$, and $\sigma^2$ denotes the asymptotic variance of $n^{1/2}\hat\theta$. The warp-speed bootstrap of Giacomini et al. (2013), closely related to suggestions by White (2000) and Davidson & Mackinnon (2002, 2007), can be defined as follows.

As in section 2, let $\mathcal{X}^*_b$, for $1 \le b \le B$, be drawn randomly, with replacement, from $\mathcal{X}$, and be independent conditional on $\mathcal{X}$. Draw $\mathcal{X}^{**}_b$, denoting a single double-bootstrap resample, by sampling randomly, with replacement, from $\mathcal{X}^*_b$ for $b = 1, \ldots, B$, in such a manner that these re-resamples are independent, conditional on $\mathcal{X}$ and $\mathcal{X}^*_1, \ldots, \mathcal{X}^*_B$. In the context of section 2, $\mathcal{X}^{**}_b$ would be one of the resamples $\mathcal{X}^{**}_{b1}, \ldots, \mathcal{X}^{**}_{bC}$ which were drawn by resampling from $\mathcal{X}^*_b$, but on the present occasion we require only one of these resamples.

Let $\hat\theta^*_b$ and $\hat\theta^{**}_b$ denote the versions of $\hat\theta$ computed from $\mathcal{X}^*_b$ and $\mathcal{X}^{**}_b$, respectively, instead of $\mathcal{X}$, and write $\hat\sigma^*_b$ and $\hat\sigma^{**}_b$ for the corresponding versions of $\hat\sigma$. If $R$ is given by one of the formulae at (5), define
$$R^*_b = n^{1/2}(\hat\theta^*_b - \hat\theta), \qquad R^*_b = n^{1/2}(\hat\theta^*_b - \hat\theta)/\hat\sigma^*_b, \eqno(6)$$
$$R^{**}_b = n^{1/2}(\hat\theta^{**}_b - \hat\theta^*_b), \qquad R^{**}_b = n^{1/2}(\hat\theta^{**}_b - \hat\theta^*_b)/\hat\sigma^{**}_b, \eqno(7)$$
in the respective cases, and put
$$\hat F^*_B(x) = \frac{1}{B} \sum_{b=1}^B I(R^*_b \le x), \qquad \tilde F^*_B(x) = \frac{1}{B} \sum_{b=1}^B I(R^{**}_b \le x). \eqno(8)$$
Then $\hat F^*_B$ is the conventional single-bootstrap, Monte Carlo approximation to the distribution function $F$ of $R$, and the limit of $\hat F^*_B$, as $B \to \infty$, is the conventional single-bootstrap approximation to $F$. The function $\tilde F^*_B$ is a short-cut, warp-speed, double-bootstrap approximation to $F$.
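As an illustration of our own, not taken from the original, the warp-speed roots at (6)–(7) can be computed as follows in the special case $\theta = \mu$ with $p = 1$, where $\hat\sigma$ is simply the sample standard deviation; the empirical distribution functions of the two returned arrays are $\hat F^*_B$ and $\tilde F^*_B$ at (8).

```python
import numpy as np

def warp_speed_roots(X, B=1000, rng=None):
    # Percentile-t roots R*_b and R**_b of (6)-(7) for theta = mu, p = 1,
    # with a single double-bootstrap resample per b, i.e. C = 1.
    rng = rng or np.random.default_rng()
    n = len(X)
    xbar = X.mean()
    R1, R2 = np.empty(B), np.empty(B)
    for b in range(B):
        Xs = X[rng.integers(0, n, size=n)]       # X*_b, resampled from X
        Xss = Xs[rng.integers(0, n, size=n)]     # X**_b, resampled from X*_b
        R1[b] = n ** 0.5 * (Xs.mean() - xbar) / Xs.std()
        R2[b] = n ** 0.5 * (Xss.mean() - Xs.mean()) / Xss.std()
    return R1, R2
```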
Given a nominal coverage level $\alpha \in (0, 1)$ of a confidence interval, define $x = \hat x^*_\alpha$ to be the solution of the equation $\tilde F^*_B(x) = \alpha$, and similarly let $\hat x_\alpha$ be the solution of $\hat F^*_B(x) = \alpha$. If $R$ is given by either of the two expressions in (5), consider the respective confidence intervals
$$I^*_{b\alpha} = (\hat\theta^*_b - n^{-1/2} \hat x^*_\alpha,\ \infty), \qquad I^*_{b\alpha} = (\hat\theta^*_b - n^{-1/2} \hat\sigma^*_b \hat x^*_\alpha,\ \infty), \eqno(9)$$
which are bootstrap versions of the respective intervals
$$I_\alpha = (\hat\theta - n^{-1/2} \hat x_\alpha,\ \infty), \qquad I_\alpha = (\hat\theta - n^{-1/2} \hat\sigma \hat x_\alpha,\ \infty). \eqno(10)$$
In either case, our estimator of the probability $p_\alpha$ that the interval $I_\alpha$ covers $\theta$ is given by
$$\hat p_{B\alpha} = \frac{1}{B} \sum_{b=1}^B I(\hat\theta \in I^*_{b\alpha}). \eqno(11)$$
We take the final interval to be $I_{\hat\beta_{B\alpha}}$, where $\beta = \hat\beta_{B\alpha}$ denotes the solution of $\hat p_{B\beta} = \alpha$.

Earlier warp-speed bootstrap methodology is a little ambiguous in the percentile-$t$ setting, i.e. in the context of the second definition in each of (5)–(7), where the technique is not completely clear from the algorithms of White (2000), Davidson & Mackinnon (2001, 2002) and Giacomini et al. (2013, pp. 570–571). In particular it is unclear from Giacomini et al. (2013) when, or whether, the estimator $\hat\sigma$ should be replaced by its single- or double-bootstrap forms, $\hat\sigma^*$ and $\hat\sigma^{**}$, for example in (6)–(9). The choices we have made are appropriate, however; in particular the algorithm would not be second-order accurate, or third-order accurate in the case of the double bootstrap, if we were simply to use $\hat\sigma$ in those instances.

In section 5.2 we shall show that in the percentile-$t$ case, using the case $B = \infty$ as a benchmark, the approach suggested above produces quantile estimators that are identical to those obtained using the standard single-bootstrap method, up to an error of order $n^{-3/2}$. In particular, they do not reduce the $O(n^{-1})$ coverage error of single-bootstrap methods. Similar results hold for percentile-method bootstrap procedures.
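The calibration step can be sketched as follows (again our own illustration, with a simple grid search standing in for a proper root-finder; `theta_star` and `sigma_star` are arrays of $\hat\theta^*_b$ and $\hat\sigma^*_b$, and `R2` is the array of roots $R^{**}_b$ returned by the previous sketch):

```python
import numpy as np

def calibrated_level(theta_hat, theta_star, sigma_star, R2, n, alpha):
    # Coverage estimator (11) and the calibrated level solving
    # p_hat(beta) = alpha, percentile-t case, warp-speed quantiles from R2.
    def p_hat(beta):
        x_star = np.quantile(R2, beta)            # solves F~*_B(x) = beta
        lower = theta_star - n ** -0.5 * sigma_star * x_star
        return np.mean(theta_hat > lower)         # fraction of b with theta_hat in I*_{b,beta}
    grid = np.linspace(0.001, 0.999, 999)
    values = np.array([p_hat(beta) for beta in grid])
    return grid[np.argmin(np.abs(values - alpha))]   # beta_hat_{B,alpha}
```

The final interval is then $I_{\hat\beta_{B\alpha}}$ of (10), computed with $\hat x_{\hat\beta_{B\alpha}}$, the $\hat\beta_{B\alpha}$-level quantile of the $R^*_b$s.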
Here we report the results of a simulation study comparing the performances of five different bootstrap methods for bias correction: the single bootstrap, the conventional double bootstrap, and the suggested alternative method involving only $C = 1$, 2, 5 or 10 double-bootstrap replications. The data were of two types, drawn either from the exponential distribution with density $2^{-1} e^{-x/2}$ on the positive half-line, or from the log-normal distribution. These two distributions both have nonzero skewness and nonzero kurtosis, making them challenging for the bootstrap. The parameter of interest also took two forms, both of them nonlinear: either $\theta = f(\mu) = \mu^3$ or $\theta = \sin(\mu)$, where $\mu$ was the population mean. In such cases there is a term of order $n^{-2}$ in the bias expansion, which cannot be eliminated by the single bootstrap but can be removed by the double bootstrap. This is reflected in our simulation results, which show that the double bootstrap provides better bias correction than the single-bootstrap method.

Sample size, $n$, was chosen in steps of 20 between 20 and 80; the number of simulations, $B$, in the first bootstrap step was set equal to $n^2$, for each of the bootstrap methods; and the number of simulations, $C$, for the second bootstrap step in the conventional double bootstrap was taken to be the integer part of $10 B^{1/2}$, which we write as $\lfloor 10 B^{1/2} \rfloor$. The choice of $B^{1/2}$ here was suggested by Booth & Hall (1994) in the context of confidence intervals, and gives an expression for $C$ that is orders of magnitude larger than is obtained using relatively small, fixed $C$. For example, when $n = 20$ the value of $C = \lfloor 10 B^{1/2} \rfloor$ is between 20 and 200 times the values $C = 1$, 2, 5 or 10 used to simulate the alternative approach to double-bootstrap methods; when $n = 80$ the respective factors are 80 to 800.

From equation (4),
$$\frac{1}{B} \sum_{b=1}^B \hat\theta^*_b - \hat\theta \qquad \hbox{and} \qquad \frac{3}{B} \sum_{b=1}^B \hat\theta^*_b - \frac{1}{BC} \sum_{b=1}^B \sum_{c=1}^C \hat\theta^{**}_{bc} - 2\hat\theta$$
provide the estimates of the true bias of $\hat\theta$, i.e. $E(\hat\theta) - \theta$, via the single bootstrap and the double bootstrap, respectively. Empirical approximations to bias, computed by averaging over the results of 5,000 Monte Carlo trials in each case, are reported in Tables 1–2 in the Supplementary Material, and the ratios of these approximations to the true bias are graphed in Figure 1. The figure shows that, for the values of $B$ used in our analysis, there is little to choose between performance when using $C = 1$ and $C = \lfloor 10 B^{1/2} \rfloor$.

Figure 1: Performance of bootstrap methods for bias correction. First and second rows show results for the exponential distribution and the log-normal distribution, respectively; left- and right-hand panels show results for $\theta = \mu^3$ and $\theta = \sin(\mu)$, respectively. In each panel the graphs represent the single-bootstrap method (−⋆−) and double-bootstrap methods with $C = 1$ (··· + ···), $C = 2$ (··· ◦ ···), $C = 5$ (··· × ···), $C = 10$ (··· ♦ ···) and $C = \lfloor 10 B^{1/2} \rfloor$ (··· □ ···), respectively.

In this section we illustrate the coverage performance of bootstrap confidence intervals, with nominal coverage 0.9,
for the population means of the two distributions considered in section 4.1, i.e. the exponential and log-normal distributions. Sample size $n$ was taken equal to 20 and 40 in each case; $B$ was increased from 200 to 700 in steps of 100, as indicated on the horizontal axis of each panel; and one-sided and two-sided equal-tailed bootstrap confidence intervals were considered, each using either the percentile or percentile-$t$ bootstrap, implemented via the single bootstrap, the conventional double bootstrap, with $C = \lfloor B^{1/2} \rfloor$, and the warp-speed bootstrap, i.e. the double bootstrap with $C = 1$. This choice of $C$ was suggested by Lee & Young (1999). To provide a perspective different from that in section 4.1, in the present section we graph coverage as a function of $B$ for fixed $n$, rather than as a function of $n$ for fixed $B$ as in section 4.1. Results in the two settings can of course be expressed in the same way; the conclusions do not alter.

Results for sample size $n = 20$, with each point on each graph based on 5,000 Monte Carlo simulations, are presented in Figure 2.
It can be seen that, for each confidence-interval type, the conventional double-bootstrap method gives greater coverage accuracy than the single-bootstrap and warp-speed methods. Results for sample size $n = 40$ are similar, and are reported in the Supplementary Material.

Our main regularity condition, in addition to the model assumptions (1) and (2), is the following:

(i) $f(x)$ is differentiable six times with respect to any combination of the $p$ components of $x$, and those derivatives, as well as $f$ itself, are uniformly bounded; and (ii) the data $X_{ji}$ have at least six finite moments, and $E(|X_{ji}|^6)$ is bounded uniformly in $i$ and $j$. (12)
Figure 2: Performance of bootstrap methods for confidence intervals when $n = 20$. First and second rows show results for the exponential distribution and the log-normal distribution, respectively; left- and right-hand panels show results for one-sided and two-sided equal-tailed confidence intervals, respectively. In each panel the graphs represent single-bootstrap percentile (−⋆−), single-bootstrap percentile-$t$ (− · ⋆ · −), conventional double-bootstrap percentile (−□−), conventional double-bootstrap percentile-$t$ (− · □ · −), warp-speed percentile (−♦−) and warp-speed percentile-$t$ (− · ♦ · −) methods.

Condition (12) can be generalized, but (for example) if we relax significantly the condition of boundedness of $f$ and its derivatives, in (12)(i), then we need to strengthen the assumption about the tails of the distributions of the $X_{ji}$s, in (12)(ii). We shall define
$$\tau^2 = E\bigg[\bigg\{\sum_{j=1}^p (X_j - \mu_j)\, f_j(\mu)\bigg\}^2\bigg]. \eqno(13)$$

In Theorem 1, below, we decompose the bias-corrected estimators $\tilde\theta_{\rm bc}$, based on the single bootstrap, and $\tilde\theta_{\rm bcc}$, based on the double bootstrap, as follows:
$$\tilde\theta_{\rm bc} = U_{\rm bc} + V_{\rm bc}, \qquad \tilde\theta_{\rm bcc} = U_{\rm bcc} + V_{\rm bcc}. \eqno(14)$$
Here $U_{\rm bc}$ and $U_{\rm bcc}$ are the "ideal" versions of $\tilde\theta_{\rm bc}$ and $\tilde\theta_{\rm bcc}$, respectively, that we would obtain if we were to do an infinite number of simulations, i.e. if we were to take $B = C = \infty$; and $V_{\rm bc}$ and $V_{\rm bcc}$ denote error terms arising from doing only a finite number of Monte Carlo simulations.

Part (d) of Theorem 1 shows that the error terms $V_{\rm bc}$, in the case of the single bootstrap, and $V_{\rm bcc}$, for the double bootstrap, both equal $O_p\{(nB)^{-1/2}\}$, and that this is the exact order, regardless of the selection of $C$ in the second bootstrap stage. Although the Monte Carlo error terms in the single bootstrap and the double bootstrap share the same convergence rate, equations (15) show that the double bootstrap provides a higher degree of accuracy, in terms of bias correction, than the single bootstrap if we take $B = C = \infty$. Part (d) also implies that if $B$ is sufficiently large, or more precisely if $n^5 = O(B)$, then the Monte Carlo error is of the same order as, or of smaller order than, the deterministic remainders in (15). These are the main theoretical findings of Theorem 1.
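Part (d) lends itself to a direct numerical check. The sketch below is our own (model (2)(i), `f` as in the earlier sketches); it repeats the Monte Carlo step with the data held fixed and estimates the ratio ${\rm var}(V_{\rm bcc} \mid \mathcal{X}) / {\rm var}(V_{\rm bc} \mid \mathcal{X})$, which by part (d) should be close to $4 + C^{-1}$ when $n$ and $B$ are large.

```python
import numpy as np

def conditional_variance_ratio(X, f, B=2000, C=1, reps=200, rng=None):
    # Estimates var(V_bcc | X) / var(V_bc | X); Theorem 1(d) predicts
    # a value near 4 + 1/C.  The data X are held fixed throughout.
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    v_bc, v_bcc = np.empty(reps), np.empty(reps)
    for r in range(reps):
        s1, s2 = 0.0, 0.0
        for _ in range(B):
            Xs = X[rng.integers(0, n, size=n)]
            s1 += f(Xs.mean(axis=0))
            for _ in range(C):
                Xss = Xs[rng.integers(0, n, size=n)]
                s2 += f(Xss.mean(axis=0))
        v_bc[r] = -s1 / B                        # V_bc, up to a constant in X
        v_bcc[r] = -3 * s1 / B + s2 / (B * C)    # V_bcc, up to a constant in X
    return v_bcc.var() / v_bc.var()
```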
Theorem 1. Assume that the data are generated according to either of the models at (2), that (12) holds, and that $B = B(n) \to \infty$ as $n \to \infty$. Then: (a) equations (14) hold, where $U_{\rm bc}$ and $U_{\rm bcc}$ are functions of $\mathcal{X}$ alone, and in particular do not involve $\mathcal{X}^*$ or $\mathcal{X}^{**}$, and satisfy
$$E(U_{\rm bc}) = \theta + O(n^{-2}), \qquad E(U_{\rm bcc}) = \theta + O(n^{-3}); \eqno(15)$$
and $V_{\rm bc}$ and $V_{\rm bcc}$ are functions of both $\mathcal{X}$ and $\mathcal{X}^*$ (and also of $\mathcal{X}^{**}$, in the case of $V_{\rm bcc}$), and satisfy $E(V_{\rm bc} \mid \mathcal{X}) = E(V_{\rm bcc} \mid \mathcal{X}) = 0$. (b) Both $U_{\rm bc}$ and $U_{\rm bcc}$ equal $\hat\theta + O_p(n^{-1})$, and both satisfy the same central limit theorem as $\hat\theta$. (c) In particular, both $U_{\rm bc}$ and $U_{\rm bcc}$ are asymptotically normally distributed with mean $\theta$ and a variance, $\sigma_n^2$ say, which has the property that $n \sigma_n^2$ is bounded as $n \to \infty$. (d) Conditional on $\mathcal{X}$, $V_{\rm bc}$ and $V_{\rm bcc}$ are asymptotically normally distributed with zero means and variances of size $(nB)^{-1}$, and if $C = C(n) \to \infty$ as $n \to \infty$ then the ratio of the variances converges to 1 as $n$ diverges. In the case of (2)(i) the asymptotic variances of $V_{\rm bc}$ and $V_{\rm bcc}$, both conditional on $\mathcal{X}$ and unconditionally, are $(Bn)^{-1}\tau^2$ and $(4 + C^{-1})(Bn)^{-1}\tau^2$, respectively.

In connection with part (d) it can be shown that, if $C$ diverges (no matter how slowly) as $n$ increases, the asymptotic distribution of the error is the same as it would be if $C = \infty$. If $\sigma_n^2$ is as in part (c) then, under the model (2)(i), there exists a positive constant $c$ such that $n \sigma_n^2 = c + o(1)$ as $n \to \infty$. However, this is not necessarily correct under the model (2)(ii), since in that setting we do not require the ratios $n_j/n$ to converge. In the context of (2)(i), formulae for $U_{\rm bc}$ and $U_{\rm bcc}$ are given at (A9) and (A10), respectively, in the Supplementary Material.

The orders of magnitude of the remainders in (15) are exact when skewness and kurtosis are nonzero. It follows from part (b) of Theorem 1 that, in the case $B = C = \infty$, $\tilde\theta_{\rm bc}$ and $\tilde\theta_{\rm bcc}$ satisfy identical central limit theorems, and in particular both have the same asymptotic variances.

We shall assume that $X$, which represents a generic $p$-vector $X_i = (X_{1i}, \ldots, X_{pi})^T$, where $1 \le i \le n$ and (2)(i) holds, satisfies the following multivariate version of Cramér's continuity condition (Hall, 1992):
$$\limsup_{\|t\| \to \infty} \big| E\{\exp(i t^T X)\} \big| < 1. \eqno(16)$$
On this occasion, $i$ denotes $\sqrt{-1}$.
For brevity we shall treat in detail only the percentile-$t$ case, evidenced by the second formula in each of (5)–(7), and discuss the percentile method briefly below Theorem 2.

Let $\Phi$ and $\phi$ denote the standard normal distribution and density functions, respectively. Assume that an unknown scalar parameter $\theta$ can be written as $\theta = f(\mu)$, where $\mu = E(X)$, and that our estimator of $\theta$ is $\hat\theta = f(\bar X)$, as at (1), where $\bar X = n^{-1} \sum_{i=1}^n X_i$. Methods of Bhattacharya & Ghosh (1978) can be used to prove that, under conventional assumptions such as those in Theorem 2 below,
$$G(x) \equiv {\rm pr}\{n^{1/2}(\hat\theta - \theta)/\hat\sigma \le x\} = \Phi(x) + \sum_{j=1}^3 n^{-j/2} Q_j(x)\, \phi(x) + n^{-2} A_n(x), \eqno(17)$$
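Expansion (17) is easy to visualise in a special case. For the Studentised mean, the first polynomial is known to be $Q_1(x) = \gamma(2x^2 + 1)/6$, where $\gamma$ denotes the population skewness (see e.g. Hall, 1987); the following sketch, our own illustration rather than part of the development here, compares the empirical value of $G(x)$ for exponential data with mean 2 (so $\gamma = 2$) against the normal approximation and the one-term Edgeworth correction.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials, x = 20, 200_000, 1.5
X = rng.exponential(scale=2.0, size=(trials, n))
T = n ** 0.5 * (X.mean(axis=1) - 2.0) / X.std(axis=1)   # Studentised mean
gamma = 2.0                                             # skewness of the mean-2 exponential
edgeworth = norm.cdf(x) + n ** -0.5 * gamma * (2 * x ** 2 + 1) / 6 * norm.pdf(x)
print(np.mean(T <= x), norm.cdf(x), edgeworth)          # empirical vs Phi vs Edgeworth
```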
where $Q_j$ is a polynomial of degree $3j - 1$, and is an even or odd function according as $j$ is odd or even, respectively; and the remainder $A_n(x)$ satisfies $\sup_{n \ge 1} \sup_{-\infty < x < \infty} |A_n(x)| < \infty$. Write $\hat x_\alpha$ for the $\alpha$-level quantile of the appropriate bootstrap distribution: in the percentile case it solves
$${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta) \le \hat x_\alpha \mid \mathcal{X}\} = \alpha, \eqno(19)$$
and in the percentile-$t$ case it solves
$${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le \hat x_\alpha \mid \mathcal{X}\} = \alpha. \eqno(20)$$
If we were to use the percentile bootstrap method to construct a one-sided confidence interval for $\theta$, the interval would be $(\hat\theta - n^{-1/2} \hat x_\alpha, \infty)$, with $\hat x_\alpha$ as at (19); if we were to use the percentile-$t$ bootstrap method, it would be $(\hat\theta - n^{-1/2} \hat\sigma \hat x_\alpha, \infty)$, where $\hat x_\alpha$ is as at (20); and if we were to employ the warp-speed bootstrap method, it would be $(\hat\theta - n^{-1/2} \hat\sigma \hat x_{\hat\beta_\alpha}, \infty)$, as discussed in section 3.2, where $\hat\beta_\alpha$ denotes the limit, as $B \to \infty$, of the quantity $\hat\beta_{B\alpha}$ introduced there. However, we shall show in Theorem 2 that $\hat x_{\hat\beta_\alpha} = \hat x_\alpha + O_p(n^{-3/2})$, and so the endpoints of standard percentile-$t$ and warp-speed bootstrap confidence intervals differ only at order $n^{-3/2}$. This signals that conventional arguments, based on Edgeworth expansions, can be used to prove that the standard percentile-$t$ confidence interval, and its warp-speed bootstrap variant, have identical coverage error up to and including terms of order $n^{-1}$, and of course that can be done under the assumptions of Theorem 2. Since, as is well known, the coverage error of the percentile-$t$ interval is genuinely of order $n^{-1}$ (Hall, 1986), it follows that the warp-speed bootstrap does not improve on that accuracy.

Theorem 2. Assume that model (2)(i) applies; that the function $f$, in the definition $\theta = f(\mu)$, has five bounded derivatives; and that (16) holds, $E(\|X\|^K) < \infty$ for sufficiently large $K > 0$, and $B = \infty$. Then $\hat x_{\hat\beta_\alpha} = \hat x_\alpha + O_p(n^{-3/2})$.

The appropriate number of moments that should be assumed for general Edgeworth or Cornish–Fisher expansions, even in relatively simple, non-bootstrap cases, is awkward to determine. For example, the argument of Bhattacharya & Ghosh (1978) requires at least six moments in the case of the Studentised mean, whereas it is known that three moments are sufficient; see e.g. Hall (1987). Even if we were to develop, in full detail, a proof of Theorem 2 based on the methods of Bhattacharya & Ghosh (1978), the number of moments we would need to assume would be unduly generous, and so we instead refer to the number simply as $K$. We choose not to provide such a detailed development here. However, the number of derivatives is relatively easy to address, and the theorem provides detail in that respect.

Let
$$\tilde F^*(x) = {\rm pr}\big\{n^{1/2}(\hat\theta^{**} - \hat\theta^*)/\hat\sigma^{**} \le x \,\big|\, \mathcal{X}\big\},$$
which is the limit of $\tilde F^*_B(x)$, defined in (8), as $B \to \infty$. Then $\hat x_{\hat\beta_\alpha}$ is the solution of $\tilde F^*(x) = \alpha$. Our focus on the case $B = \infty$ deserves comment. In the early days of the bootstrap, $B = \infty$ was seen as "the statistical bootstrap method," and the case of finite $B$ was interpreted as a Monte Carlo approximation to the bootstrap. Indeed, taking $B < \infty$ was viewed more as an issue to be addressed in computational or numerical terms, rather than statistical ones. Reflecting this, for about eight years from the mid 1980s considerable effort was spent developing efficient computational methods for undertaking bootstrap resampling. However, by the early 1990s computers had become so fast that this area of research had largely disappeared. This remains the case today; taking $B$ in the thousands, without using numerical devices to increase simulation efficiency, is now the rule rather than the exception. The difference between such large values of $B$, and using the mathematical ideal value $B = \infty$, is particularly small.
Conclusion and discussion

We have investigated the role played by $C$, the number of resamples used in the second bootstrap stage, in double-bootstrap methods for bias correction and confidence intervals. Specifically, we have shown that the double bootstrap is largely insensitive to the choice of $C$ in the context of bias correction. Indeed, double-bootstrap methods with fixed $C$ can produce third-order accuracy, much as do conventional double-bootstrap methods with diverging $C$. This result demonstrates the effectiveness, for bias correction, of using the double bootstrap with a single double-bootstrap simulation. Although existing work shows that the warp-speed double bootstrap ($C = 1$) can improve accuracy in hypothesis testing, there has not been, until now, any theoretical underpinning of its performance in the context of confidence intervals. However, when only a single bootstrap resample is used in the second bootstrap stage to construct confidence intervals, the order of magnitude of coverage error is not improved relative to that for the single bootstrap.

Supplementary material

Supplementary Material is available for the theoretical proofs of Theorems 1 and 2, and for additional simulation results relating to sections 4.1 and 4.2.

References

Beran, R. (1987). Prepivoting to reduce level error in confidence sets. Biometrika 74, 457–468.

Beran, R. (1988). Prepivoting test statistics: a bootstrap view of asymptotic refinements. J. Amer. Statist. Assoc. 83, 687–697.

Bhattacharya, R.N. & Ghosh, J.K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434–451.

Booth, J.G. & Hall, P. (1994). Monte Carlo approximation and the iterated bootstrap. Biometrika 81, 331–340.

Booth, J.G. & Presnell, B. (1998). Allocation of Monte Carlo resources for the iterated bootstrap. J. Comput. Graph. Statist. 7, 92–112.

Davidson, R. & Mackinnon, J.G. (2001). Improving the reliability of bootstrap tests. Queen's Institute for Economic Research Discussion Paper No. 995, revised.

Davidson, R. & Mackinnon, J.G. (2002). Fast double bootstrap tests of nonnested linear regression models. Econometric Rev. 21, 417–427.

Davidson, R. & Mackinnon, J.G. (2007). Improving the reliability of bootstrap tests with the fast double bootstrap. Comput. Statist. Data Anal. 51, 3259–3281.

Davison, A.C., Hinkley, D.V. & Schechtman, E. (1986). Efficient bootstrap simulation. Biometrika 73, 555–566.

Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. J. Amer. Statist. Assoc. 78, 316–331.

Giacomini, R., Politis, D.N. & White, H. (2013). A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory 29, 567–589.

Hall, P. (1986). On the bootstrap and confidence intervals. Ann. Statist. 14, 1431–1452.

Hall, P. (1987). Edgeworth expansion for Student's t statistic under minimal moment conditions. Ann. Probab. 15, 920–931.

Hall, P. (1988). On symmetric bootstrap confidence intervals. J. Roy. Statist. Soc. Ser. B 50, 35–45.

Hall, P. (1990). On the relative performance of bootstrap and Edgeworth approximations of a distribution function. J. Multivariate Anal. 35, 108–129.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.

Hall, P. & Martin, M.A. (1988). On bootstrap resampling and iteration. Biometrika 75, 661–671.

Lee, S.M.S. & Young, G.A. (1999). The effect of Monte Carlo approximation on coverage error of double-bootstrap confidence intervals. J. Roy. Statist. Soc. Ser. B 61, 353–366.

Mackinnon, J.G. (2006). Applications of the fast double bootstrap.
Queen's Economics Department Working Paper No. 1023.

White, H. (2000). A reality check for data snooping. Econometrica 68, 1097–1126.

Supplementary material for "Double-bootstrap methods that use a single double-bootstrap simulation"

Jinyuan Chang and Peter Hall
Department of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia

A Proof of Theorem 1

In view of (12), Taylor expansion can be used to derive the following formulae:
$$\hat\theta = \theta + \sum_{s=1}^4 \frac{1}{s!} \sum_{j_1=1}^p \cdots \sum_{j_s=1}^p (\bar X_{j_1} - \mu_{j_1}) \cdots (\bar X_{j_s} - \mu_{j_s})\, f_{j_1 \ldots j_s}(\mu) + O_p(n^{-5/2}) \eqno({\rm A}1)$$
and
$$E(\hat\theta) = \theta + \sum_{s=2}^4 \frac{1}{s!} \sum_{j_1=1}^p \cdots \sum_{j_s=1}^p E\{(\bar X_{j_1} - \mu_{j_1}) \cdots (\bar X_{j_s} - \mu_{j_s})\}\, f_{j_1 \ldots j_s}(\mu) + O(n^{-3}), \eqno({\rm A}2)$$
where the remainder term $R_n$ that is denoted by $O_p(n^{-5/2})$ in (A1) satisfies $E(R_n) = O(n^{-3})$. Define
$$\xi_{j_1 j_2} = {\rm cov}(X_{j_1}, X_{j_2}), \qquad \xi_{j_1 j_2 j_3} = E\{(X_{j_1} - \mu_{j_1})(X_{j_2} - \mu_{j_2})(X_{j_3} - \mu_{j_3})\},$$
$$\xi_{j_1 j_2 j_3 j_4} = \xi_{j_1 j_2}\, \xi_{j_3 j_4} + \xi_{j_1 j_3}\, \xi_{j_2 j_4} + \xi_{j_1 j_4}\, \xi_{j_2 j_3}.$$
Then, if (2)(i) holds,
$$E\{(\bar X_{j_1} - \mu_{j_1})(\bar X_{j_2} - \mu_{j_2})\} = n^{-1} \xi_{j_1 j_2},$$
$$E\{(\bar X_{j_1} - \mu_{j_1})(\bar X_{j_2} - \mu_{j_2})(\bar X_{j_3} - \mu_{j_3})\} = n^{-2} \xi_{j_1 j_2 j_3},$$
$$E\{(\bar X_{j_1} - \mu_{j_1})(\bar X_{j_2} - \mu_{j_2})(\bar X_{j_3} - \mu_{j_3})(\bar X_{j_4} - \mu_{j_4})\} = n^{-2} \xi_{j_1 j_2 j_3 j_4} + O(n^{-3}).$$
Hence, by (A2),
$$E(\hat\theta) = \theta + \frac{1}{2n} \sum_{j_1=1}^p \sum_{j_2=1}^p \xi_{j_1 j_2} f_{j_1 j_2}(\mu) + \frac{1}{6 n^2} \sum_{j_1=1}^p \sum_{j_2=1}^p \sum_{j_3=1}^p \xi_{j_1 j_2 j_3} f_{j_1 j_2 j_3}(\mu) + \frac{1}{24 n^2} \sum_{j_1=1}^p \cdots \sum_{j_4=1}^p \xi_{j_1 j_2 j_3 j_4} f_{j_1 j_2 j_3 j_4}(\mu) + O(n^{-3}) = \theta + n^{-1} \gamma_2 + n^{-2}(\gamma_3 + \gamma_4) + O(n^{-3}), \eqno({\rm A}3)$$
where, for $r = 2, 3, 4$,
$$\gamma_r = \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p \xi_{j_1 \ldots j_r} f_{j_1 \ldots j_r}(\mu).$$

If (2)(ii) holds, instead of (2)(i), and if we define $\sigma_j^2 = \xi_{jj}$, and write $I(E)$ for the indicator function of an event $E$, then the following relations obtain:
$$E\{(\bar X_{j_1} - \mu_{j_1})(\bar X_{j_2} - \mu_{j_2})\} = n_{j_1}^{-1} I(j_1 = j_2)\, \sigma_{j_1}^2,$$
$$E\{(\bar X_{j_1} - \mu_{j_1})(\bar X_{j_2} - \mu_{j_2})(\bar X_{j_3} - \mu_{j_3})\} = n_{j_1}^{-2} I(j_1 = j_2 = j_3)\, \xi_{j_1 j_1 j_1},$$
and
$$E\{(\bar X_{j_1} - \mu_{j_1}) \cdots (\bar X_{j_4} - \mu_{j_4})\} = (n_{j_1} n_{j_3})^{-1} I(j_1 = j_2)\, I(j_3 = j_4)\, \sigma_{j_1}^2 \sigma_{j_3}^2 + (n_{j_1} n_{j_2})^{-1} I(j_1 = j_3)\, I(j_2 = j_4)\, \sigma_{j_1}^2 \sigma_{j_2}^2 + (n_{j_1} n_{j_2})^{-1} I(j_1 = j_4)\, I(j_2 = j_3)\, \sigma_{j_1}^2 \sigma_{j_2}^2 + O(n^{-3}).$$
Therefore we can write (A2) as
$$E(\hat\theta) = \theta + n^{-1} \gamma^{(1)} + n^{-2} \gamma^{(2)} + O(n^{-3}), \eqno({\rm A}4)$$
where the quantities $\gamma^{(1)}$ and $\gamma^{(2)}$ may depend on $n$ but are bounded as $n \to \infty$. Property (A4) is the analogue, in the context of (2)(ii) rather than (2)(i), of (A3).

To explore properties of Monte Carlo approximations to the quantities $E(\hat\theta^* \mid \mathcal{X})$ and $E(\hat\theta^{**} \mid \mathcal{X})$ (compare (3) and (4)), observe first that, analogously to (A1),
$$\hat\theta^* = f(\bar X^*) = \hat\theta + \sum_{r=1}^4 \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p (\bar X^*_{j_1} - \bar X_{j_1}) \cdots (\bar X^*_{j_r} - \bar X_{j_r})\, f_{j_1 \ldots j_r}(\bar X) + O_p(n^{-5/2}),$$
$$\hat\theta^{**} = f(\bar X^{**}) = \hat\theta^* + \sum_{r=1}^4 \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p (\bar X^{**}_{j_1} - \bar X^*_{j_1}) \cdots (\bar X^{**}_{j_r} - \bar X^*_{j_r})\, f_{j_1 \ldots j_r}(\bar X^*) + O_p(n^{-5/2}).$$
Averaging these formulae over bootstrap replicates we obtain the following expansions:
$$S_{\rm bc} \equiv \frac{1}{B} \sum_{b=1}^B \hat\theta^*_b = \hat\theta + \sum_{r=1}^4 \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p f_{j_1 \ldots j_r}(\bar X)\, \frac{1}{B} \sum_{b=1}^B (\bar X^*_{b j_1} - \bar X_{j_1}) \cdots (\bar X^*_{b j_r} - \bar X_{j_r}) + O_p(n^{-5/2}), \eqno({\rm A}5)$$
$$S_{\rm bcc} \equiv \frac{1}{BC} \sum_{b=1}^B \sum_{c=1}^C \hat\theta^{**}_{bc} = \frac{1}{B} \sum_{b=1}^B \hat\theta^*_b + \sum_{r=1}^4 \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p \frac{1}{B} \sum_{b=1}^B f_{j_1 \ldots j_r}(\bar X^*_b)\, \frac{1}{C} \sum_{c=1}^C (\bar X^{**}_{bc j_1} - \bar X^*_{b j_1}) \cdots (\bar X^{**}_{bc j_r} - \bar X^*_{b j_r}) + O_p(n^{-5/2}). \eqno({\rm A}6)$$
In view of (12), the remainder terms $R_n$, say, that are denoted by $O_p(n^{-5/2})$ in (A5) and (A6) satisfy $E(R_n) = O(n^{-3})$.
Define
$$\hat\xi_{j_1 j_2} = \frac{1}{n} \sum_{i=1}^n (X_{j_1 i} - \bar X_{j_1})(X_{j_2 i} - \bar X_{j_2}), \qquad \hat\xi_{j_1 j_2 j_3} = \frac{1}{n} \sum_{i=1}^n (X_{j_1 i} - \bar X_{j_1})(X_{j_2 i} - \bar X_{j_2})(X_{j_3 i} - \bar X_{j_3}),$$
$$\hat\xi_{j_1 j_2 j_3 j_4} = \hat\xi_{j_1 j_2}\, \hat\xi_{j_3 j_4} + \hat\xi_{j_1 j_3}\, \hat\xi_{j_2 j_4} + \hat\xi_{j_1 j_4}\, \hat\xi_{j_2 j_3}, \qquad \hat\eta_r = \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p \hat\xi_{j_1 \ldots j_r} f_{j_1 \ldots j_r}(\bar X),$$
the latter for $r = 2, 3, 4$. In the discussion below we shall assume, for the sake of definiteness, that the data are generated by the model (2)(i); the case of model (2)(ii) is similar.

Suppose first that we use the regular bootstrap, both for resampling $\mathcal{X}^*_b$ from $\mathcal{X}$ and for resampling $\mathcal{X}^{**}_{bc}$ from $\mathcal{X}^*_b$. Then the conditional expected values of the non-remainder terms on the right-hand sides of (A5) and (A6) satisfy the following identities, respectively:
$$E\bigg\{\hat\theta + \sum_{r=1}^4 \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p f_{j_1 \ldots j_r}(\bar X)\, \frac{1}{B} \sum_{b=1}^B (\bar X^*_{b j_1} - \bar X_{j_1}) \cdots (\bar X^*_{b j_r} - \bar X_{j_r}) \,\bigg|\, \mathcal{X}\bigg\} = \hat\theta + n^{-1} \hat\eta_2 + n^{-2}(\hat\eta_3 + \hat\eta_4) + O_p(n^{-3}), \eqno({\rm A}7)$$
$$E\bigg\{\frac{1}{B} \sum_{b=1}^B \hat\theta^*_b + \sum_{r=1}^4 \frac{1}{r!} \sum_{j_1=1}^p \cdots \sum_{j_r=1}^p \frac{1}{B} \sum_{b=1}^B f_{j_1 \ldots j_r}(\bar X^*_b)\, \frac{1}{C} \sum_{c=1}^C (\bar X^{**}_{bc j_1} - \bar X^*_{b j_1}) \cdots (\bar X^{**}_{bc j_r} - \bar X^*_{b j_r}) \,\bigg|\, \mathcal{X}\bigg\} = \hat\theta + n^{-1}(2 - n^{-1})\hat\eta_2 + n^{-2}(3\hat\eta_3 + 2\hat\eta_4) + 2 n^{-2}(\hat\eta_3 + \hat\eta_4) + O_p(n^{-3}), \eqno({\rm A}8)$$
where, as before, the expected values of the $O_p(n^{-3})$ remainder terms equal $O(n^{-3})$.

Recall the definitions of $\tilde\theta_{\rm bc}$ and $\tilde\theta_{\rm bcc}$ at (4), and define
$$U_{\rm bc} \equiv E(\tilde\theta_{\rm bc} \mid \mathcal{X}) = 2\hat\theta - E(S_{\rm bc} \mid \mathcal{X}), \qquad U_{\rm bcc} \equiv E(\tilde\theta_{\rm bcc} \mid \mathcal{X}) = 3\{\hat\theta - E(S_{\rm bc} \mid \mathcal{X})\} + E(S_{\rm bcc} \mid \mathcal{X}).$$
Then (A7) and (A8) imply that $U_{\rm bc} = U_{\rm bc}' + O_p(n^{-3})$ and $U_{\rm bcc} = U_{\rm bcc}' + O_p(n^{-3})$, where the expected values of the $O_p(n^{-3})$ remainder terms equal $O(n^{-3})$, and
$$U_{\rm bc}' = \hat\theta - \{n^{-1} \hat\eta_2 + n^{-2}(\hat\eta_3 + \hat\eta_4)\}, \eqno({\rm A}9)$$
$$U_{\rm bcc}' = \hat\theta - n^{-1}(1 + n^{-1})\hat\eta_2 + n^{-2}(3\hat\eta_3 + 2\hat\eta_4) - n^{-2}(\hat\eta_3 + \hat\eta_4). \eqno({\rm A}10)$$
Therefore $U_{\rm bc}$ and $U_{\rm bcc}$ both equal $\hat\theta + O_p(n^{-1})$, as claimed in part (b) of Theorem 1.

Put $V_{\rm bc} = \tilde\theta_{\rm bc} - E(\tilde\theta_{\rm bc} \mid \mathcal{X})$ and $V_{\rm bcc} = \tilde\theta_{\rm bcc} - E(\tilde\theta_{\rm bcc} \mid \mathcal{X})$. Employing (A3) and the properties
$$E(\hat\eta_2) = (1 - n^{-1})\gamma_2 + n^{-1}(3\gamma_3 + 2\gamma_4) + O(n^{-2}), \qquad E(\hat\eta_r) = \gamma_r + O(n^{-1}) \eqno({\rm A}11)$$
for $r = 3, 4$, we deduce that $E(\tilde\theta_{\rm bc}) = E(U_{\rm bc}) = \theta + O(n^{-2})$, and that $V_{\rm bc} = \tilde\theta_{\rm bc} - U_{\rm bc}$ is a function of both $\mathcal{X}$ and $\mathcal{X}^*$, satisfying $E(V_{\rm bc} \mid \mathcal{X}) = 0$ (in the context of (2)(i)) and ${\rm var}(V_{\rm bc} \mid \mathcal{X}) = \{1 + o_p(1)\}(Bn)^{-1}\tau^2$. Central limit theorems for $U_{\rm bc}$ and $V_{\rm bc}$ follow from Lindeberg's theorem. In the context of (2)(i), those parts of (15) and (b)–(d), in Theorem 1, that pertain to the single-bootstrap estimator $\tilde\theta_{\rm bc}$ follow from these properties. (The exactness of the orders of magnitude of remainders in (15) can be proved by deriving concise formulae for those terms, using (A9)–(A11).)

The results discussed two paragraphs above also imply that $E(\tilde\theta_{\rm bcc}) = E(U_{\rm bcc}) = \theta + O(n^{-3})$, and of course $V_{\rm bcc} = \tilde\theta_{\rm bcc} - U_{\rm bcc}$ is a function of $\mathcal{X}$, $\mathcal{X}^*$ and $\mathcal{X}^{**}$ satisfying $E(V_{\rm bcc} \mid \mathcal{X}) = 0$.
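For orientation, in the scalar case $p = 1$ (a specialisation we add purely for illustration, writing $\hat\sigma^2 = \hat\xi_{11}$ and $\hat\kappa_3 = \hat\xi_{111}$, so that $\hat\xi_{1111} = 3\hat\sigma^4$) the quantities above reduce to
$$\hat\eta_2 = \tfrac12\, \hat\sigma^2 f''(\bar X), \qquad \hat\eta_3 = \tfrac16\, \hat\kappa_3 f'''(\bar X), \qquad \hat\eta_4 = \tfrac18\, \hat\sigma^4 f''''(\bar X),$$
$$U_{\rm bc}' = f(\bar X) - \frac{\hat\sigma^2}{2n}\, f''(\bar X) - \frac{1}{n^2}\Big\{\frac{\hat\kappa_3}{6}\, f'''(\bar X) + \frac{\hat\sigma^4}{8}\, f''''(\bar X)\Big\},$$
so that the single-bootstrap correction at (A9) subtracts an empirical version of each term of the bias expansion (A3).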
Note too that, in the context of (2)(i),
$$(BC)^2\, {\rm var}(S_{\rm bcc} - S_{\rm bc} \mid \mathcal{X}) \sim_p {\rm var}\bigg\{\sum_{b=1}^B \sum_{c=1}^C \sum_{j=1}^p f_j(\bar X^*_b)(\bar X^{**}_{bcj} - \bar X^*_{bj}) \,\bigg|\, \mathcal{X}\bigg\} = E\bigg[\bigg\{\sum_{b=1}^B \sum_{c=1}^C \sum_{j=1}^p f_j(\bar X^*_b)(\bar X^{**}_{bcj} - \bar X^*_{bj})\bigg\}^2 \,\bigg|\, \mathcal{X}\bigg]$$
$$\sim_p E\bigg[\bigg\{\sum_{b=1}^B \sum_{c=1}^C \sum_{j=1}^p f_j(\mu)(\bar X^{**}_{bcj} - \bar X^*_{bj})\bigg\}^2 \,\bigg|\, \mathcal{X}\bigg] = E\bigg(E\bigg[\bigg\{\sum_{b=1}^B \sum_{c=1}^C \sum_{j=1}^p f_j(\mu)(\bar X^{**}_{bcj} - \bar X^*_{bj})\bigg\}^2 \,\bigg|\, \mathcal{X}, \mathcal{X}^*\bigg] \,\bigg|\, \mathcal{X}\bigg)$$
$$= C\, E\bigg(E\bigg[\bigg\{\sum_{b=1}^B \sum_{j=1}^p f_j(\mu)(\bar X^{**}_{b1j} - \bar X^*_{bj})\bigg\}^2 \,\bigg|\, \mathcal{X}, \mathcal{X}^*\bigg] \,\bigg|\, \mathcal{X}\bigg) = C\, E\bigg[\bigg\{\sum_{b=1}^B \sum_{j=1}^p f_j(\mu)(\bar X^{**}_{b1j} - \bar X^*_{bj})\bigg\}^2 \,\bigg|\, \mathcal{X}\bigg]$$
$$= BC\, E\bigg[\bigg\{\sum_{j=1}^p f_j(\mu)(\bar X^{**}_{1j} - \bar X^*_{1j})\bigg\}^2 \,\bigg|\, \mathcal{X}\bigg] \sim_p BC\, n^{-1} \tau^2,$$
and ${\rm cov}(S_{\rm bcc} - S_{\rm bc}, S_{\rm bc} \mid \mathcal{X}) = o_p\{(nB)^{-1}\}$. Therefore,
$${\rm var}(\tilde\theta_{\rm bcc} \mid \mathcal{X}) = {\rm var}(V_{\rm bcc} \mid \mathcal{X}) = {\rm var}(S_{\rm bcc} - 3 S_{\rm bc} \mid \mathcal{X}) = {\rm var}(S_{\rm bcc} - S_{\rm bc} \mid \mathcal{X}) - 4\, {\rm cov}(S_{\rm bcc} - S_{\rm bc}, S_{\rm bc} \mid \mathcal{X}) + 4\, {\rm var}(S_{\rm bc} \mid \mathcal{X}) = (nB)^{-1}(4 + C^{-1})\tau^2 + o_p\{(nB)^{-1}\}.$$
Much as in the case of $\tilde\theta_{\rm bc}$, it can be proved from (A10) and (A11) that $E(\tilde\theta_{\rm bcc}) = E(U_{\rm bcc}) = \theta + O(n^{-3})$. If (2)(i) holds then these properties, and Lindeberg's central limit theorem, imply those parts of Theorem 1 that pertain to the double-bootstrap estimator $\tilde\theta_{\rm bcc}$. Cases where the model (2)(ii) holds are similar.

B Proof of Theorem 2

Consider first the solution $\beta = \beta_\alpha$, say, of the equation
$${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le x_\beta\} = \alpha, \eqno({\rm A}12)$$
where $x = x_\beta$ is the solution of
$${\rm pr}\{n^{1/2}(\hat\theta - \theta)/\hat\sigma \le x\} = \beta. \eqno({\rm A}13)$$
Note that
$${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le x \mid \mathcal{X}\} = \Phi(x) + n^{-1/2} \hat Q_1(x)\phi(x) + \cdots + n^{-3/2} \hat Q_3(x)\phi(x) + n^{-2} \hat A_n(x), \eqno({\rm A}14)$$
where the remainder $\hat A_n(x)$ satisfies $\sup_{-\infty < x < \infty} |\hat A_n(x)| = O_p(1)$, and $\hat Q_j$ denotes the version of $Q_j$, in (17), in which population moments are replaced by their empirical counterparts. It can be deduced from (A14), and from the analogous expansion of (A13), that the solution of (A12) is identical, up to terms of order $n^{-3/2}$, to the solution $x = x_\alpha$ of equation (A13) when $\beta = \alpha$ there, and in particular $x_{\beta_\alpha} = x_\alpha + O(n^{-3/2})$. Therefore,
$$x_{\beta_\alpha} = z_\alpha + n^{-1/2} Q_{\rm cf1}(z_\alpha) + n^{-1} Q_{\rm cf2}(z_\alpha) + O(n^{-3/2}), \eqno({\rm A}17)$$
where $z_\alpha = \Phi^{-1}(\alpha)$ and $Q_{\rm cf1}$ and $Q_{\rm cf2}$ are the polynomials appearing in the Cornish–Fisher expansion of $x_\alpha$.

Recall that the distribution function estimator with which we are working is the version of the second formula in (8) when $B = \infty$ and $C = 1$:
$$\tilde F^*(x) = {\rm pr}\{n^{1/2}(\hat\theta^{**} - \hat\theta^*)/\hat\sigma^{**} \le x \mid \mathcal{X}\},$$
where $\hat\theta^*$ is computed from $\mathcal{X}^*$, and $\hat\theta^{**}$ and $\hat\sigma^{**}$ are computed from $\mathcal{X}^{**}$. Since we are taking $B = \infty$ in our analysis, $\hat x_\alpha$, defined below (8) in the case of finite $B$, is now given by the limit as $B \to \infty$ of that definition, i.e. the solution in $x$ of ${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le x \mid \mathcal{X}\} = \alpha$. In this notation, $\hat\beta_\alpha$ is defined to be the solution in $\beta$ of the equation $\tilde F^*(\hat x_\beta) = \alpha$, i.e. the solution in $\beta$ of
$${\rm pr}\{n^{1/2}(\hat\theta^{**} - \hat\theta^*)/\hat\sigma^{**} \le \hat x_\beta \mid \mathcal{X}\} = \alpha. \eqno({\rm A}18)$$
Now, the solution in $\beta$ of (A18) is an estimator of the solution $\beta = \beta_\alpha$ of ${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le x_\beta\} = \alpha$, where $x = x_\beta$ is the solution of (A13). That is, a representation of $\hat x_{\hat\beta_\alpha}$ as a Cornish–Fisher expansion is identical to the analogous representation of $x_{\beta_\alpha}$, except that moments of $X$ are replaced by the corresponding moments of $X^*$ conditional on $\mathcal{X}$. Since the Cornish–Fisher expansion of $x_{\beta_\alpha}$ is given by (A17), up to and including terms of order $n^{-1}$, then
$$\hat x_{\hat\beta_\alpha} = z_\alpha + n^{-1/2} \hat Q_{\rm cf1}(z_\alpha) + n^{-1} \hat Q_{\rm cf2}(z_\alpha) + O_p(n^{-3/2}).$$
This is identical to the expansion of $\hat x_\alpha$, the solution of ${\rm pr}\{n^{1/2}(\hat\theta^* - \hat\theta)/\hat\sigma^* \le x \mid \mathcal{X}\} = \alpha$, up to and including terms of order $n^{-1}$, and so $\hat x_{\hat\beta_\alpha} = \hat x_\alpha + O_p(n^{-3/2})$, as had to be proved.

C Simulation results

In this section we provide the simulation results for sections 4.1 and 4.2.

C.1 Bias estimation in section 4.1

Tables 1 and 2 report the empirical approximations to bias, computed by averaging over the results of 5,000 Monte Carlo trials, in the settings of the exponential distribution and the log-normal distribution, respectively.

Table 1: Bias estimation based on different bootstrap methods for $\mu^3$ and $\sin(\mu)$ with the Exp(2) distribution. Entries are multiplied by $10^2$; the values in brackets denote the ratios of the estimated biases to the true bias.

n                                 20          40          60          80
theta = mu^3:
true bias                     115.1658     57.0163     38.1427     28.6419
single                        129.7612     62.6221     41.3012     30.7055
                              [1.1267]    [1.0983]    [1.0828]    [1.0720]
double with C = 1             125.9539     61.2805     40.8512     30.2225
                              [1.0937]    [1.0748]    [1.0710]    [1.0552]
double with C = 2             125.1125     61.4080     40.6490     30.2391
                              [1.0864]    [1.0770]    [1.0657]    [1.0558]
double with C = 5             125.3128     61.3515     40.5743     30.2928
                              [1.0881]    [1.0760]    [1.0638]    [1.0576]
double with C = 10            125.6812     61.4801     40.5936     30.2841
                              [1.0913]    [1.0783]    [1.0643]    [1.0573]
double with C = ⌊10B^{1/2}⌋   125.5125     61.4068     40.6418     30.2630
                              [1.0898]    [1.0770]    [1.0655]    [1.0566]
theta = sin(mu):
true bias                      -8.4970     -4.4585     -2.9896     -2.2458
single                         -6.2578     -3.8283     -2.7155     -2.1012
                              [0.7365]    [0.8587]    [0.9083]    [0.9356]
double with C = 1              -7.8440     -4.3452     -2.9636     -2.2358
                              [0.9231]    [0.9746]    [0.9913]    [0.9955]
double with C = 2              -7.8299     -4.3505     -2.9557     -2.2359
                              [0.9215]    [0.9758]    [0.9887]    [0.9956]
double with C = 5              -7.8483     -4.3475     -2.9526     -2.2383
                              [0.9237]    [0.9751]    [0.9876]    [0.9967]
double with C = 10             -7.8521     -4.3499     -2.9541     -2.2380
                              [0.9241]    [0.9756]    [0.9881]    [0.9965]
double with C = ⌊10B^{1/2}⌋    -7.8520     -4.3480     -2.9555     -2.2371
                              [0.9241]    [0.9752]    [0.9886]    [0.9961]

Table 2: Bias estimation based on different bootstrap methods for $\mu^3$ and $\sin(\mu)$ with the log-normal, $\exp\{N(0,1)\}$, distribution. Entries are multiplied by $10^2$; the values in brackets denote the ratios of the estimated biases to the true bias.

n                                 20          40          60          80
theta = mu^3:
true bias                     116.4471     55.6341     36.9453     27.9352
single                        150.1797     66.8400     42.5223     31.3730
                              [1.2897]    [1.2014]    [1.1510]    [1.1231]
double with C = 1             128.1239     59.6595     39.0303     29.2126
                              [1.1003]    [1.0724]    [1.0564]    [1.0457]
double with C = 2             131.4972     59.7961     39.2092     29.2521
                              [1.1292]    [1.0748]    [1.0613]    [1.0471]
double with C = 5             127.7990     59.7409     39.0654     29.1772
                              [1.0975]    [1.0738]    [1.0574]    [1.0445]
double with C = 10            129.5233     59.4563     39.0700     29.1729
                              [1.1123]    [1.0687]    [1.0575]    [1.0443]
double with C = ⌊10B^{1/2}⌋   128.8509     59.5656     39.1011     29.1925
                              [1.1065]    [1.0707]    [1.0584]    [1.0450]
theta = sin(mu):
true bias                      -9.8256     -5.6652     -3.9217     -2.9741
single                         -6.1373     -4.3128     -3.2181     -2.5383
                              [0.6246]    [0.7613]    [0.8206]    [0.8535]
double with C = 1              -8.1200     -5.2653     -3.7202     -2.8340
                              [0.8264]    [0.9294]    [0.9486]    [0.9529]
double with C = 2              -8.0672     -5.2670     -3.7275     -2.8318
                              [0.8210]    [0.9297]    [0.9505]    [0.9522]
double with C = 5              -8.0785     -5.2651     -3.7201     -2.8321
                              [0.8222]    [0.9294]    [0.9486]    [0.9523]
double with C = 10             -8.0812     -5.2684     -3.7214     -2.8320
                              [0.8225]    [0.9300]    [0.9489]    [0.9522]
double with C = ⌊10B^{1/2}⌋    -8.0796     -5.2667     -3.7228     -2.8324
                              [0.8223]    [0.9297]    [0.9493]    [0.9524]

C.2 Performance of n = 40 in section 4.2

Figure 3 shows the empirical coverage of the confidence intervals constructed by the different bootstrap methods when the sample size is $n = 40$.
Figure 3: Performance of bootstrap methods for confidence intervals when $n = 40$. First and second rows show results for the exponential distribution and the log-normal distribution, respectively; left- and right-hand panels show results for one-sided and two-sided equal-tailed confidence intervals, respectively. In each panel the graphs represent single-bootstrap percentile (−⋆−), single-bootstrap percentile-$t$ (− · ⋆ · −), conventional double-bootstrap percentile (−□−), conventional double-bootstrap percentile-$t$ (− · □ · −), warp-speed percentile (−♦−) and warp-speed percentile-$t$ (− · ♦ · −) methods.