Prepivoted permutation tests
Colin B. Fogarty∗

Abstract
We present a general approach to constructing permutation tests that are both exact for the null hypothesis of equality of distributions and asymptotically correct for testing equality of parameters of distributions while allowing the distributions themselves to differ. These robust permutation tests transform a given test statistic by a consistent estimator of its limiting distribution function before enumerating its permutation distribution. This transformation, known as prepivoting, aligns the unconditional limiting distribution for the test statistic with the probability limit of its permutation distribution. Through prepivoting, the tests permute one minus an asymptotically valid $p$-value for testing the null of equality of parameters. We describe two approaches for prepivoting within permutation tests, one directly using asymptotic normality and the other using the bootstrap. We further illustrate that permutation tests using bootstrap prepivoting can improve the order of the error in rejection probability relative to competing transformations when testing equality of parameters, while maintaining exactness under equality of distributions. Simulation studies highlight the versatility of the proposal, illustrating the restoration of asymptotic validity to a wide range of permutation tests conducted when only the parameters of distributions are equal.

Introduction

Permutation tests are becoming increasingly popular for inference in applications ranging from field trials in economics to microarray analyses in genomics. In many applied domains permutation tests are viewed as appealing nonparametric substitutes for parametric $t$-tests in small-sample regimes (Ludbrook and Dudley, 1998; Ruxton, 2006). Unfortunately, while permutation tests provide exact tests for the null hypothesis of equality of distributions, they do not generally provide valid inference for the equality of parameters of distributions, even asymptotically.
For instance, a permutation test which permutes the difference in sample means does not generally provide an asymptotically valid test for equality of population means (Romano, 1990). While it may be prudent to avoid the $t$-test in small-sample regimes, replacing it with a permutation test generally requires a reformulation of the null hypothesis being tested. Absent this, practitioners may draw unwarranted conclusions from a permutation test's rejection of the null hypothesis.

Misconceptions surrounding the findings to which practitioners are entitled motivate the development of robust permutation tests. Here robustness does not refer to any particular connection with the field of robust statistics, but rather to a desired robustness against natural misinterpretations when implementing a permutation test and assessing its results. A robust permutation test maintains the exactness of the procedure under the null of equality in distributions while, through a suitably chosen test statistic, providing asymptotically valid inference for the null hypothesis that certain parameters of the distributions are equal even when the distributions differ.

∗Operations Research and Statistics Group, MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge MA 02142 (e-mail: [email protected])

Building upon earlier work, Chung and Romano (2013) showed that for asymptotically linear estimators of a parameter $\theta(\cdot)$, simply permuting a studentized version of these estimators suffices to yield an asymptotically robust permutation test. Chung and Romano (2016b) extend these results to the multivariate setting, suggesting permuting a modified Hotelling $T^2$ statistic using an unpooled covariance estimator. The success of studentization in the aforementioned problems is tied to the standard normal (univariate) and chi-squared (multivariate) limiting distributions attained after studentization.
Both the unconditional asymptotic distribution of the test statistic and the limit of the reference distribution generated by permutation align after studentization, resulting in asymptotic correctness of permutation inference for testing equality of parameters even when the distributions themselves differ. Studentization only produces a robust permutation test when it yields an asymptotic pivot for both the unconditional and permutation distributions, hence failing to provide a general resolution.

Prepivoting is the transformation of a test statistic by an empirical estimate of its cumulative distribution function. The transformation is motivated by the probability integral transform: if we had access to the true distribution function $H_{T_n}(x, P)$ of a continuous random variable $T_n$, then $H_{T_n}(T_n, P)$ would be distributed as a uniform random variable on $[0,1]$. We describe two forms of prepivoting that can restore asymptotic validity to permutation tests when testing the null of equality for parameters based upon estimators admitting asymptotically linear representations. The first, termed Gaussian prepivoting, makes direct use of the asymptotic normality stemming from asymptotic linearity. It is shown that this transformation is equivalent to studentization whenever studentization would produce an asymptotic pivot, but provides an asymptotic pivot for a broader class of test statistics. Gaussian prepivoting both sheds light upon the true purpose of studentization when it is successful and provides general methodology for situations where studentization alone would fail. Cohen and Fogarty (2020) describe Gaussian prepivoting within randomization tests in finite population causal inference as a way to create tests that are exact for Fisher's sharp null while asymptotically conservative for Neyman's weak null. The second, bootstrap prepivoting, forms an estimate of the distribution function using the bootstrap.
Bootstrap prepivoting was introduced in Beran (1987, 1988) as a means of improving the order of approximation for bootstrap confidence intervals and hypothesis tests. Chung and Romano (2016b) first used bootstrap prepivoting within a permutation test to restore first-order correctness to multivariate permutation tests permuting the maximum absolute $t$ statistic. In this work we highlight the broad generality with which the idea succeeds, along with circumstances in which bootstrap prepivoting provides higher-order accuracy relative to competing methods for constructing robust permutation tests.

This work suggests the following general approach to constructing robust permutation tests: rather than permuting a seemingly natural test statistic for assessing equality of parameters, instead permute one minus a large-sample $p$-value known to be asymptotically valid under the null of equality of parameters. Theoretical developments and simulation studies illustrate the benefits that permuting large-sample $p$-values may confer in terms of reduced errors in rejection probability.

Let $P_1, \dots, P_k$ be $d$-variate distributions for $k$ populations. For each $i = 1, \dots, k$ let $X_{i1}, \dots, X_{in_i}$ be independent and identically distributed random variables distributed according to $P_i$, and let the $k$ samples be jointly independent. Let $n = \sum_{i=1}^k n_i$, and impose the following condition on $n_i/n$:

Condition 1. For $i = 1, \dots, k$, $n_i/n \to p_i$ with $0 < p_i < 1$ as $n \to \infty$, with $n_i/n - p_i = O(n^{-1/2})$.

Let $X_i$ denote the $n_i \times d$ matrix whose $j$th row contains $X_{ij}$. Write $Z$ for the $n \times d$ matrix stacking $X_1, \dots, X_k$ on top of one another, and let $\mathcal{I}_i$ be the row indices of $Z$ containing samples from population $i$, i.e. $\mathcal{I}_1 = \{1, \dots, n_1\}$, $\mathcal{I}_2 = \{n_1 + 1, \dots, n_1 + n_2\}, \dots, \mathcal{I}_k = \{n - n_k + 1, \dots, n\}$. Let $P^n = \prod_{i=1}^k \prod_{j=1}^{n_i} P_i$ be the distribution of $Z$, and let $P^\infty$ be the corresponding infinite product measure sending $n \to \infty$.
We will generally drop the dependence on the sample size and refer to both $P^n$ and $P^\infty$ as $P$.

Let $\pi = \{\pi(1), \dots, \pi(n)\}$ be a permutation of $\{1, \dots, n\}$. Let $Z_\pi$ reflect the matrix $Z$ with its rows permuted according to $\pi$, so that the $j$th row of $Z_\pi$ contains the $\pi(j)$th row of $Z$. Let $Z_{\pi i} = Z_\pi(\mathcal{I}_i)$ be the $n_i \times d$ matrix containing the $n_i$ rows of $Z_\pi$ whose row indices are within $\mathcal{I}_i$. For instance, the $j$th row of $Z_{\pi i}$ contains the $\pi\{\mathcal{I}_i(j)\}$th row of $Z$. Define $\hat{P}_{\pi i}$ as the empirical distribution of $Z_{\pi i}$, and $\hat{P}_\pi = \prod_{i=1}^k \prod_{j=1}^{n_i} \hat{P}_{\pi i}$. The permutation distribution for a possibly multivariate statistic $T_n(Z_\pi)$ is
$$R_{T_n}(x, \hat{P}) = \frac{1}{n!} \sum_{\pi \in G_n} 1\{T_n(Z_\pi) \le x\} = \mathrm{pr}\{T_n(Z_\Pi) \le x \mid Z\},$$
where $G_n$ is the set of all $n!$ permutations $\pi$, $\Pi$ is uniformly distributed on $G_n$ and is independent of $Z$, and $1(A)$ is an indicator that the event $A$ occurred.

Consider the null hypothesis of equality of distributions $H_P: P_1 = \dots = P_k$, and let $T_n(\cdot)$ be a scalar test statistic. The level-$\alpha$ permutation test for $H_P$ based upon the test statistic $T_n$ rejects the null hypothesis through the decision rule $\varphi_{T_n}(\alpha) = 1\{T_n(Z) > R^{-1}_{T_n}(1-\alpha, \hat{P})\}$, where $R^{-1}_{T_n}(1-\alpha, \hat{P}) = \inf\{x : R_{T_n}(x, \hat{P}) \ge 1-\alpha\}$. Through this construction, $E\{\varphi_{T_n}(\alpha)\} \le \alpha$ for any $n$. Control in finite samples stems from the order statistics forming a complete sufficient statistic for the common unknown distribution under $H_P$: exact tests must then have Neyman structure, and the distribution of $T_n(Z)$ given the order statistics is precisely the permutation distribution $R_{T_n}(\cdot, \hat{P})$ (Lehmann and Romano, 2005, Theorem 5.11.1). A level exactly equal to $\alpha$ may be attained through a suitably randomized decision rule when $T_n(Z) = R^{-1}_{T_n}(1-\alpha, \hat{P})$.
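In practice $\varphi_{T_n}(\alpha)$ is approximated by sampling permutations uniformly from $G_n$ rather than enumerating all $n!$ of them. The sketch below is our own illustration for the two-sample difference in sample means; the function names are hypothetical and not from the paper.

```python
import random

def diff_in_means(a, b):
    """Example statistic T_n: difference in sample means."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_p_value(x, y, stat, n_perm=2000, seed=0):
    """Monte-Carlo approximation to the two-sample permutation test of
    H_P: P_1 = P_2.  Pools the samples, shuffles the pooled rows, and
    reports the proportion of permuted statistics at least as large as
    the observed one; counting the identity permutation keeps the
    p-value valid at any sample size under H_P."""
    rng = random.Random(seed)
    z = list(x) + list(y)
    n1 = len(x)
    t_obs = stat(x, y)
    count = 1  # the identity permutation
    for _ in range(n_perm):
        rng.shuffle(z)
        if stat(z[:n1], z[n1:]) >= t_obs:
            count += 1
    return count / (n_perm + 1)
```

Under $H_P$ this $p$-value is exact up to Monte-Carlo error; when only the means of the two distributions are equal it need not be asymptotically valid, which is the failure mode the remainder of the paper addresses.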
Suppose that interest lies not in the null of equality of distributions themselves but rather in the null of equality of parameters of distributions,
$$H_{\theta(P)}: \theta_\ell(P_1) = \cdots = \theta_\ell(P_k) \text{ for all } \ell = 1, \dots, d,$$
against the alternative that $\theta_\ell(P_i) \ne \theta_\ell(P_{i'})$ for some $\ell = 1, \dots, d$ and some $i \ne i'$, where $\theta_\ell(\cdot)$ is a real-valued parameter. We suppose the existence of estimators $\hat{\theta}_\ell(\cdot)$ for each $\ell$ which satisfy the following:

Condition 2.
There exist asymptotically linear estimators $\hat{\theta}_\ell(\cdot)$ such that for each $i = 1, \dots, k$ and each $\ell = 1, \dots, d$,
$$n_i^{1/2}\{\hat{\theta}_\ell(X_i) - \theta_\ell(P_i)\} = n_i^{-1/2} \sum_{j=1}^{n_i} f_{P_i,\ell}(X_{ij,\ell}) + o_{P_i}(1),$$
with $E\{f_{P_i,\ell}(X_{i1,\ell})\} = 0$ and $E\{f_{P_i,\ell}(X_{i1,\ell})^2\} < \infty$. Furthermore, let $Y_1, \dots, Y_n$ be independent and identically distributed from the mixture $\bar{P} = \sum_{i=1}^k p_i P_i$ and let $Y$ be the $n \times d$ matrix whose $j$th row is $Y_j$. Then,
$$n^{1/2}\{\hat{\theta}_\ell(Y) - \theta_\ell(\bar{P})\} = n^{-1/2} \sum_{j=1}^{n} f_{\bar{P},\ell}(Y_{j,\ell}) + o_{\bar{P}}(1),$$
with $E\{f_{\bar{P},\ell}(Y_{1,\ell})\} = 0$ and $E\{f_{\bar{P},\ell}(Y_{1,\ell})^2\} < \infty$.

Let $\Sigma_i$ be the covariance matrix for $(f_{P_i,1}(X_{i1,1}), \dots, f_{P_i,d}(X_{i1,d}))^T$ for $i = 1, \dots, k$, and let $\bar{\Sigma}$ be the covariance matrix for $(f_{\bar{P},1}(Y_{1,1}), \dots, f_{\bar{P},d}(Y_{1,d}))^T$ with $Y_1$ distributed as in Condition 2. Let $\hat{\Theta}(Z_\pi)$ be the $d \times k$ matrix with $\hat{\theta}_\ell(Z_{\pi i})$ in its $\{\ell, i\}$ entry. Moving forwards we will generally use $\hat{\Theta} = \hat{\Theta}(Z)$ and $\hat{\Theta}_\pi = \hat{\Theta}(Z_\pi)$ as shorthand, and we will use similar notation for other estimators. Let $\hat{C}_\pi$ be a (potentially random) matrix of column contrasts of dimension $k \times m$, i.e. a matrix whose $m$ columns all sum to zero. We will begin with test statistics $T_n$ satisfying the following:

Condition 3. $T_n(\cdot)$ is of the form
$$T_n(Z_\pi) = g\{n^{1/2} \hat{\Theta}_\pi \hat{C}_\pi, \hat{\eta}_\pi\}, \quad (1)$$
where (a) $g : \mathbb{R}^{d \times m} \times \Xi \to \mathbb{R}$ is a jointly continuous function in both of its arguments; (b) $\hat{C}$ and $\hat{C}_\Pi$ converge in probability to contrast matrices $C$ and $\bar{C}$ respectively; and (c) $\hat{\eta}$ and $\hat{\eta}_\Pi$ converge in probability to values $\eta, \bar{\eta} \in \Xi$ respectively, where $\Pi$ is uniformly distributed over $G_n$ and is independent of $Z$.

As will be demonstrated in §
5, many common test statistics for testing equality in parameters may be written in this form. Without loss of generality, in what follows we assume that we reject the null for large values of $T_n$. Let $H_{T_n}(x, P)$ be the true distribution function for $T_n$, and suppose that $H_{\theta(P)}$ holds but $P_i \ne P_{i'}$ for some $i, i' = 1, \dots, k$. Let $H_T(x, P)$ be the limiting distribution function for $T_n$, and suppose that the permutation distribution $R_{T_n}(x, \hat{P})$ converges in probability to $R_T(x, P)$ at all continuity points of $R_T(x, P)$. Then, even for seemingly natural test statistics $T_n$ satisfying Conditions 2 and 3, it can occur that $\sup_{x \in \mathbb{R}} |H_T(x, P) - R_T(x, P)| > 0$. As a result, the permutation test $\varphi_{T_n}(\alpha)$ may be anti-conservative for $H_{\theta(P)}$ even in the limit. Perhaps the most prominent example of this is the permutation distribution of the difference in means in the two-sample univariate case, which fails to control the Type I error rate even asymptotically under the null of equality of expectations of the distributions unless either the variances of the two distributions are equal or the sample sizes are equal; see Romano (1990) or Chung and Romano (2013, Example 2.1) for additional details.

This potential for anti-conservativeness for statistics satisfying Conditions 2 and 3 may be understood through the following theorem comparing the unconditional limit distribution of $n^{1/2}\mathrm{vec}(\hat{\Theta}\hat{C})$ to the probability limit of its permutation distribution, where $\mathrm{vec}(\cdot)$ is the columnwise vectorization operator.

Theorem 1.
Suppose Conditions 1 and 2 hold and let $V_n(Z_\pi) = n^{1/2}\mathrm{vec}(\hat{\Theta}_\pi \hat{C}_\pi)$. Unconditionally, $V_n(Z)$ converges in distribution to a multivariate normal random variable, with mean zero and covariance
$$(C^T \otimes I_{d \times d})\, \Gamma\, (C^T \otimes I_{d \times d})^T, \quad (2)$$
where $\otimes$ is the Kronecker product, $I_{d \times d}$ is the $d \times d$ identity matrix, and $\Gamma$ is the direct sum of the matrices $p_i^{-1}\Sigma_i$ $(i = 1, \dots, k)$, i.e. the block-diagonal matrix of dimension $kd \times kd$ with $p_i^{-1}\Sigma_i$ in the $i$th block of dimension $d \times d$.

The permutation distribution of $V_n(\cdot)$, $R_{V_n}(t, \hat{P})$, instead converges weakly in probability to the law of a multivariate normal distribution, with mean zero and covariance
$$(\bar{C}^T \otimes I_{d \times d})\, \bar{\Gamma}\, (\bar{C}^T \otimes I_{d \times d})^T, \quad (3)$$
where $\bar{\Gamma}$ is the direct sum of the matrices $p_i^{-1}\bar{\Sigma}$ $(i = 1, \dots, k)$.

Under $H_{\theta(P)}$, $\bar{\Gamma}$ and $\bar{C}$ need not equal $\Gamma$ and $C$, such that the covariances governing the true limit distribution and the probability limit of the permutation distribution differ. As the true limiting distribution and the limiting permutation distribution for $T_n$ are simply push-forward measures of the above multivariate normals determined by the function $g$ in (1), this can corrupt the limiting size of $\varphi_{T_n}$ under $H_{\theta(P)}$.

Remark 1.
Given asymptotic linearity, establishing asymptotic normality for the unconditional distribution of $V_n(Z)$ is a standard application of the central limit theorem and is omitted. The proof of weak convergence in probability of $R_{V_n}$ to the distribution function of the mean-zero multivariate Gaussian with covariance given in (3) is an extension of steps within the proof of Theorem 3.1 of Chung and Romano (2013) to the multivariate setting. In sketching the proof in the Appendix, we highlight the essential role of the contrast matrix $\hat{C}(Z_\pi)$ in the definitions of $V_n(Z_\pi)$ and $T_n(Z_\pi)$. Under our assumptions, $R_{V_n}$ converges weakly in probability to the law of a mean-zero multivariate normal with covariance given above if and only if $\hat{C}(Z_\Pi)$ converges in probability to a contrast matrix. In short, the contrast matrix justifies application of Lemma 4.1 of Dümbgen and Del Conte-Zerial (2013), as the independence required in their (D2) is satisfied; see also Lemma A.1 of Chung and Romano (2016b). Example 5.3 of Chung and Romano (2013) shows what occurs when $\hat{C}_\Pi$ does not converge in probability to a contrast matrix: the permutation distribution will not converge weakly in probability to any fixed law $Q$.

Let $H_T(x, P)$ be the limiting law for a scalar test statistic $T_n(Z)$, let $R_T(x, P)$ represent the probability limit of $R_{T_n}(x, \hat{P})$, and suppose that both $H_T(x, P)$ and $R_T(x, P)$ are continuous and strictly increasing. Suppose that $F_{T_n}(x, \hat{P})$ is an estimate of $H_{T_n}(x, P)$, the distribution function for $T_n(Z)$. Consider now the true distribution $H_{F_n}$ of $F_{T_n}\{T_n(Z), \hat{P}\}$, along with the permutation distribution $R_{F_n}$ of the form
$$R_{F_n}(x, \hat{P}) = \frac{1}{n!} \sum_{\pi \in G_n} 1[F_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\} \le x] = \mathrm{pr}[F_{T_n}\{T_n(Z_\Pi), \hat{P}_\Pi\} \le x \mid Z].$$
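For concreteness, the prepivoted permutation distribution above can be sketched for the two-sample difference in means, taking $F_{T_n}$ to be the standard normal distribution function applied to the studentized statistic. This is our own hypothetical illustration, not the paper's code; in this simple case the estimated distribution function does not depend on the permuted sample, so prepivoting is a monotone transformation, consistent with the equivalence to studentization when studentization yields an asymptotic pivot.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def studentized_diff(a, b):
    """Two-sample t statistic with unpooled variance estimates."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((u - ma) ** 2 for u in a) / (len(a) - 1)
    vb = sum((u - mb) ** 2 for u in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def prepivoted_permutation_test(x, y, alpha=0.05, n_perm=2000, seed=0):
    """Permutes F{T(Z_pi)}: each permuted sample is reduced to one minus
    a large-sample p-value before the permutation distribution is
    enumerated, and the test rejects when the observed F{T(Z)} exceeds
    the 1 - alpha permutation quantile."""
    rng = random.Random(seed)
    z = list(x) + list(y)
    n1 = len(x)
    f_obs = normal_cdf(studentized_diff(x, y))
    f_perm = []
    for _ in range(n_perm):
        rng.shuffle(z)
        f_perm.append(normal_cdf(studentized_diff(z[:n1], z[n1:])))
    f_perm.sort()
    # 1 - alpha quantile of the permuted prepivoted statistics
    cutoff = f_perm[min(n_perm - 1, math.ceil((1 - alpha) * (n_perm + 1)) - 1)]
    return f_obs > cutoff
```

Each permuted value $F_{T_n}\{T_n(Z_\pi), \hat P_\pi\}$ is one minus a large-sample $p$-value, and the permutation quantile of these values replaces the fixed cutoff $1-\alpha$.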
In words, $F_{T_n}\{T_n(Z), \hat{P}\}$ transforms $T_n(Z)$ by an estimate of its distribution function, in so doing approximating a transformation that would give rise to a random variable distributed uniformly on $[0,1]$ under continuity. Theorem 2 shows that if $F_{T_n}(x, \hat{P}_\Pi)$ also suitably estimates $R_T(x, P)$, the limiting distributions $H_F$ and $R_F$ will be the same.

Theorem 2. Let $A_{\theta(P)}$ be a subset of $H_{\theta(P)}$. Suppose that for all $(P_1, \dots, P_k) \in A_{\theta(P)}$ the limits $R_T(x, P)$ and $H_T(x, P)$ are continuous and strictly increasing. Suppose that the estimator $F_{T_n}$ satisfies
$$\sup_{x \in \mathbb{R}} |F_{T_n}(x, \hat{P}) - H_{T_n}(x, P)| \to 0; \qquad \sup_{x \in \mathbb{R}} |F_{T_n}(x, \hat{P}_\Pi) - R_{T_n}(x, \hat{P})| \to 0$$
in probability as $n \to \infty$ for each element of $A_{\theta(P)}$. Then, for each element of $A_{\theta(P)}$,
$$\sup_{x \in \mathbb{R}} |H_{F_n}(x, P) - U(x)| \to 0; \qquad \sup_{x \in \mathbb{R}} |R_{F_n}(x, \hat{P}) - U(x)| \to 0$$
in probability, where $U(x)$ is the distribution function for a uniform random variable on $[0,1]$.

The theorem indicates that after transformation by $F_{T_n}$, the true distribution $H_{F_n}$ and the permutation distribution $R_{F_n}$ both converge to the same limits $H_F(x, P) = R_F(x, P) = U(x)$, even if the limits $H_T$ and $R_T$ are not the same. Applying Lemma 11.2.1 and Corollary 11.2.3 of Lehmann and Romano (2005) yields the following corollary:

Corollary 1.
Suppose the setup of Theorem 2 and consider the permutation test
$$\varphi_{F_n}(\alpha) = 1[F_{T_n}\{T_n(Z), \hat{P}\} > R^{-1}_{F_n}(1-\alpha, \hat{P})].$$
Then for each $(P_1, \dots, P_k) \in A_{\theta(P)}$, $\lim E\{\varphi_{F_n}(\alpha)\} = \alpha$. That is, the test $\varphi_{F_n}(\alpha)$ is asymptotically pointwise level $\alpha$ for all elements of $A_{\theta(P)}$.

Permuting asymptotically valid p-values

Suppose inference is desired for $H_{\theta(P)}$ but that the requirement of exactness when $P_1 = \dots = P_k$ is dropped. If the estimated distribution function $F_{T_n}$ satisfies $\sup_{x \in \mathbb{R}} |F_{T_n}(x, \hat{P}) - H_{T_n}(x, P)| \to 0$ in probability over a subset $A_{\theta(P)} \subset H_{\theta(P)}$ with a continuous and strictly increasing limiting distribution function $H_T(x, P)$, then an asymptotically valid $p$-value over this subset of $H_{\theta(P)}$ is $p_{\text{val}} = 1 - F_{T_n}\{T_n(Z), \hat{P}\}$. Rejecting the null if $1 - F_{T_n}\{T_n(Z), \hat{P}\} \le \alpha$, equivalently $F_{T_n}\{T_n(Z), \hat{P}\} \ge 1-\alpha$, provides a pointwise asymptotically level-$\alpha$ test over $A_{\theta(P)}$. The permutation test $\varphi_{F_n}(\alpha)$ instead rejects with nominal level $\alpha$ when $F_{T_n}\{T_n(Z), \hat{P}\} > R^{-1}_{F_n}(1-\alpha, \hat{P})$.

The permutation distribution $R_{F_n}(x, \hat{P})$ calculates, for each permutation $\pi$, what one minus the large-sample $p$-value using $F_{T_n}$ would be with an observed sample $Z_\pi$. If $\sup_{x \in \mathbb{R}} |F_{T_n}(x, \hat{P}_\Pi) - R_{T_n}(x, \hat{P})| \to 0$ in probability, then $R^{-1}_{F_n}(1-\alpha, \hat{P}) \to U^{-1}(1-\alpha) = 1-\alpha$ in probability. In words, the $1-\alpha$ quantile of one minus the permuted large-sample $p$-values converges in probability to $1-\alpha$. While the usual test would reject when one minus the $p$-value constructed using $F_{T_n}$ falls at or above $1-\alpha$, the permutation test $\varphi_{F_n}(\alpha)$ replaces $1-\alpha$ with the permutation quantile $R^{-1}_{F_n}(1-\alpha, \hat{P})$, which converges in probability to $1-\alpha$. Replacing $1-\alpha$ by the $1-\alpha$ quantile of this permutation distribution preserves asymptotic correctness under $A_{\theta(P)} \subset H_{\theta(P)}$, while additionally providing exactness should $P_1 = \dots = P_k$.

Given an asymptotically valid $p$-value, constructing a robust permutation test under a given restriction on $H_{\theta(P)}$ amounts to confirming that the permutation distribution of the large-sample $p$-value converges weakly in probability to the distribution function of a uniform random variable on $[0,1]$. This task is greatly simplified through the theoretical advances presented in Chung and Romano (2013) and Chung and Romano (2016b). While many approaches exist for constructing robust permutation tests that are catered to specific test statistics $T_n$, we now present two general approaches for constructing asymptotically valid $p$-values whose permutation distributions have the required limiting behavior. The first uses the asymptotic normality of $V_n(Z)$ and $V_n(Z_\Pi)$ proven in Theorem 1 to compute an asymptotically valid $p$-value using a suitably constructed covariance estimator. The second computes the $p$-value using a bootstrap null distribution for $H_{\theta(P)}$.

Gaussian prepivoting

Our first approach uses the asymptotic normality proven in Theorem 1. To proceed, we require the existence of a covariance estimator $\hat{\Sigma}$ of the following form:
Condition 4.
For each $i$, $\hat{\Sigma}(X_i)$ converges in probability to $\Sigma_i$. Furthermore, $\hat{\Sigma}(Y)$ converges in probability to $\bar{\Sigma}$, where $Y_1, \dots, Y_n$ are independent and identically distributed from the mixture $\bar{P} = \sum_{i=1}^k p_i P_i$ and $Y$ is an $n \times d$ matrix whose $j$th row is $Y_j$.

For any permutation $\pi \in G_n$, let $\hat{\Gamma}_\pi = \hat{\Gamma}(Z_\pi)$ be the block-diagonal matrix of dimension $kd \times kd$ with $\hat{\Sigma}(Z_{\pi i})$ in the $i$th of $k$ blocks. Consider the distributional estimator
$$K_{T_n}(x, \hat{P}_\pi) = \gamma^{(kd)}_{0, \hat{\Gamma}_\pi}\left[\left\{a : g(\mathrm{vec}^{-1}_{d,k}(a)\, \hat{C}_\pi, \hat{\eta}_\pi) \le x\right\}\right], \quad (4)$$
where $\mathrm{vec}^{-1}_{d,k}(a)$ takes a vector $a$ of length $kd$ and builds a $d \times k$ matrix columnwise (such that the first $d$ elements of $a$ fill the first column of $\mathrm{vec}^{-1}_{d,k}(a)$, the second $d$ elements fill the second column, and so forth), and $\gamma^{(p)}_{\mu, \Lambda}(\mathcal{A})$ is the probability that a $p$-variate Gaussian random variable with mean parameter $\mu$ and covariance parameter $\Lambda$ falls within a set $\mathcal{A}$, i.e.
$$\gamma^{(p)}_{\mu, \Lambda}(\mathcal{A}) = \frac{1}{\sqrt{\det(2\pi\Lambda)}} \int_{a \in \mathcal{A}} \exp\left\{-\frac{1}{2}(a-\mu)^T \Lambda^{-1}(a-\mu)\right\} da.$$

Under Condition 2, $1 - K_{T_n}\{T_n(Z), \hat{P}\}$ is the usual asymptotically valid right-tail $p$-value for $H_{\theta(P)}$ leveraging asymptotic normality.

Proposition 1.
Suppose that $H_T(x, P)$ and $R_T(x, P)$ are continuous and strictly increasing and that Conditions 1-4 hold. Then, Theorem 2 applies to the distributional estimator $K_{T_n}$ defined in (4).

The proof is presented in the Appendix. Given Theorem 1, it is a straightforward application of the continuous mapping theorem and Slutsky's theorem (both in the conventional case and for use within permutation distributions).
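As a hypothetical illustration of the Monte-Carlo evaluation of (4) discussed in Remark 2 below, consider the simplest case: $k = 2$ groups, $d = 1$, and the non-studentized root $T_n = n^{1/2}(\hat\theta_1 - \hat\theta_2)$ with contrast $C = (1, -1)^T$, for which the Gaussian law in (4) has variance $\hat\sigma_1^2/\hat p_1 + \hat\sigma_2^2/\hat p_2$. The function name below is ours, not the paper's.

```python
import math
import random

def gaussian_prepivot_mc(t, var1, var2, p1, p2, n_draws=20000, seed=0):
    """Monte-Carlo version of the distributional estimate (4) for the
    scalar root T_n = n^{1/2}(mean_1 - mean_2): with contrast
    C = (1, -1)^T the limiting law is N(0, var1/p1 + var2/p2), so K(t)
    is estimated by the fraction of Gaussian draws at or below t."""
    rng = random.Random(seed)
    sd = math.sqrt(var1 / p1 + var2 / p2)
    return sum(rng.gauss(0.0, sd) <= t for _ in range(n_draws)) / n_draws
```

In this scalar case (4) has the closed form $\Phi\{t / (\hat\sigma_1^2/\hat p_1 + \hat\sigma_2^2/\hat p_2)^{1/2}\}$, against which the Monte-Carlo estimate can be checked; as noted in Remark 2, replacing the exact probability by such an estimate does not corrupt exactness under $H_P$.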
Remark 2.
For some choices of $T_n$ the transformation (4) will have a familiar form, being the cumulative distribution function of a known distribution such as the normal or $\chi^2$ distributions. When this is not the case, the probability required for the computation of (4) may be replaced with a Monte-Carlo estimate from $B$ draws of a multivariate normal with mean zero and covariance $\hat{\Gamma}_\pi$. Importantly, using a Monte-Carlo estimate does not corrupt exactness of the test when $P_1 = \dots = P_k$.

Bootstrap prepivoting

The transformation outlined above exploits the specific form (1) of $T_n$, and requires the existence of a covariance estimator $\hat{\Sigma}$ satisfying Condition 4. We now describe circumstances under which asymptotic validity for permutation tests when testing equality of parameters may be restored with the aid of bootstrap prepivoting (Beran, 1987, 1988). This avoids the need for an estimator $\hat{\Sigma}$, but will also be shown to be attractive even when a convenient covariance estimator exists. This approach was first employed by Chung and Romano (2016b) to restore asymptotic validity for the permutation distribution of the max $t$-statistic under the null of equality of multivariate means. We now demonstrate that the approach is applicable to test statistics satisfying Condition 3 under additional regularity conditions on the involved estimators.

For a given permutation $\pi \in G_n$ and for $i = 1, \dots, k$ let $Z^*_{\pi i 1}, \dots, Z^*_{\pi i n_i}$ be independent and identically distributed draws from $\hat{P}_{\pi i}$, the empirical distribution of $Z_{\pi i}$. Let $Z^*_{\pi i}$ be the $n_i \times d$ matrix whose $j$th row contains $Z^*_{\pi i j}$, and write $Z^*_\pi$ for the $n \times d$ matrix stacking $Z^*_{\pi 1}, \dots, Z^*_{\pi k}$ on top of one another. Write $\hat{\Theta}^*_\pi = \hat{\Theta}(Z^*_\pi)$, $\hat{C}^*_\pi = \hat{C}(Z^*_\pi)$, and $\hat{\eta}^*_\pi = \hat{\eta}(Z^*_\pi)$.

Let $\breve{T}_n$ be the bootstrap modification of $T_n$, defined as
$$\breve{T}_n(Z^*_\pi, Z_\pi) = g\{n^{1/2}(\hat{\Theta}^*_\pi - \hat{\Theta}_\pi)\hat{C}^*_\pi, \hat{\eta}^*_\pi\}, \quad (5)$$
and define the bootstrap distribution function $J_{T_n}(x, \hat{P}_\pi)$ as
$$J_{T_n}(x, \hat{P}_\pi) = \mathrm{pr}\{\breve{T}_n(Z^*_\pi, Z_\pi) \le x \mid Z_\pi\}. \quad (6)$$
The modification $\breve{T}_n$ is required for centering the bootstrap null distribution, accomplished through subtracting $\hat{\Theta}_\pi$ in (5). The bootstrap prepivoted test statistic is then $J_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}$. For the bootstrap to be applicable in our setup, additional restrictions on the estimators are required.

Condition 5.
Let $X^*_{i1}, \dots, X^*_{in_i}$ be independent and identically distributed draws from $\hat{P}_i$, and let $X^*_i$ be the $n_i \times d$ matrix whose $j$th row is $X^*_{ij}$. Then, for $i = 1, \dots, k$,
$$n_i^{1/2}\{\hat{\theta}_\ell(X^*_i) - \hat{\theta}_\ell(X_i)\} = n_i^{-1/2} \sum_{j=1}^{n_i} \{f_{P_i,\ell}(X^*_{ij,\ell}) - f_{P_i,\ell}(X_{ij,\ell})\} + E_{in_i},$$
where $E_{in_i}$ satisfies $\mathrm{pr}(n_i^{1/2} E_{in_i} \ge \epsilon \mid X_i) \to 0$ in probability as $n_i \to \infty$ for any $\epsilon > 0$. Furthermore, let $Y_1, \dots, Y_n$ be distributed according to the mixture $\bar{P} = \sum_{i=1}^k p_i P_i$, and let $Y^*_1, \dots, Y^*_n$ be independent and identically distributed draws from the empirical distribution of $Y_1, \dots, Y_n$, with $Y$ and $Y^*$ being $n \times d$ matrices whose $j$th rows are $Y_j$ and $Y^*_j$. Then,
$$n^{1/2}\{\hat{\theta}_\ell(Y^*) - \hat{\theta}_\ell(Y)\} = n^{-1/2} \sum_{j=1}^{n} \{f_{\bar{P},\ell}(Y^*_{j,\ell}) - f_{\bar{P},\ell}(Y_{j,\ell})\} + \bar{E}_n,$$
where $\bar{E}_n$ satisfies $\mathrm{pr}(n^{1/2} \bar{E}_n \ge \epsilon \mid Y) \to 0$ in probability as $n \to \infty$ for any $\epsilon > 0$.

Condition 6. $\hat{\eta}^*$ and $\hat{\eta}^*_\Pi$ satisfy
$$\mathrm{pr}\{|\hat{\eta}(Z^*) - \eta| > \epsilon \mid Z\} \to 0; \qquad \mathrm{pr}\{|\hat{\eta}(Z^*_\Pi) - \bar{\eta}| > \epsilon \mid Z_\Pi\} \to 0$$
in probability as $n \to \infty$ for any $\epsilon > 0$. The analogous statements hold for $\hat{C}(Z^*)$ and $\hat{C}(Z^*_\Pi)$ with limiting values $C$ and $\bar{C}$ respectively.

Proposition 2. Suppose that $H_T(x, P)$ and $R_T(x, P)$ are continuous and strictly increasing and that Conditions 1-3 and 5-6 hold. Then, Theorem 2 applies to $J_{T_n}$ defined in (6).

The proof of this proposition closely mirrors that of Theorem 2.6 of Chung and Romano (2016b). The result for the permutation distribution $R_{J_n}$ is proven in the Appendix, while the result for the true unconditional distribution $H_{J_n}$ is standard under these conditions and omitted. Liu et al. (1989) describe the representation given in Condition 5 and sufficient conditions giving rise to it. Of particular interest, Hadamard differentiability of $\theta(\cdot)$ at $P_1, \dots, P_k$ and $\bar{P}$ is sufficient for Condition 5 to hold.
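A minimal sketch of the bootstrap distribution function (6), again for the scalar root $T_n = n^{1/2}(\text{mean}_1 - \text{mean}_2)$: groups are resampled from their own empirical distributions, and the bootstrap root is recentred at the group means, mirroring the subtraction of $\hat\Theta_\pi$ in (5). This is our own hypothetical illustration, not the paper's R code.

```python
import math
import random

def bootstrap_prepivot(a, b, n_boot=500, seed=0):
    """Estimate J{T_n, P_hat} of (6) for the root
    T_n = n^{1/2}(mean(a) - mean(b)).  Each bootstrap draw resamples
    within groups and uses the centred root of (5), subtracting the
    observed group means so the bootstrap law is centred at the null;
    the returned value is the fraction of bootstrap roots at or below
    the observed statistic, i.e. one minus a bootstrap p-value."""
    rng = random.Random(seed)
    n = len(a) + len(b)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    t_obs = math.sqrt(n) * (ma - mb)
    count = 0
    for _ in range(n_boot):
        ma_star = sum(rng.choice(a) for _ in a) / len(a)
        mb_star = sum(rng.choice(b) for _ in b) / len(b)
        if math.sqrt(n) * ((ma_star - ma) - (mb_star - mb)) <= t_obs:
            count += 1
    return count / n_boot
```

Within the permutation test, each permuted sample $Z_\pi$ would be reduced to this value of $J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}$ before the permutation distribution $R_{J_n}$ is enumerated.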
Verifying Condition 6 for mean-like statistics often involves applying certain triangular-array weak laws, such as that of Bickel and Freedman (1981, Theorem 2.2, part b) or Lehmann and Romano (2005, Lemma 15.4.1). Lemma 4.1 of Dümbgen and Del Conte-Zerial (2013) shows that Condition 6 may equivalently be expressed in terms of the unconditional convergence in probability of $\hat{\eta}(Z^*)$, $\hat{\eta}(Z^*_\Pi)$, $\hat{C}(Z^*)$, and $\hat{C}(Z^*_\Pi)$, accounting for randomness in both the observed and the bootstrap sample for $Z^*$, and in the observed sample, the bootstrap sample, and the permutation draw for $Z^*_\Pi$; see also Bücher and Kojadinovic (2019).

In practice both the bootstrap law $J_{T_n}$ and the permutation distribution $R_{J_n}$ will be approximated through Monte-Carlo simulation. In §B of the Appendix we provide pseudocode illustrating the general implementation. R code for the test statistics described herein and in §A is available on the author's website. The package boot within R greatly simplifies implementation, as it can be used to generate both permutation and bootstrap distributions for generic test statistics while allowing for parallel computation.

Given the additional computational burden imposed by embedding a bootstrap scheme within a permutation test, it is natural to ask when bootstrap prepivoting improves upon Gaussian prepivoting. To investigate this we posit asymptotic expansions for the distribution functions associated with $T_n$. Rather than providing primitive conditions under which the expansions exist for specific test statistics, such as those of the form specified in Condition 3, in our exposition we will highlight interesting consequences of the expansions assuming their existence. Generally the expansions hold under the smooth function model of Bhattacharya and Ghosh (1978); see also Hall (1992).

Condition 7.
$H_{T_n}$ and $R_{T_n}$ admit asymptotic expansions of the following form for some $j \ge 1$:
$$H_{T_n}(x, P) = A_{T_n}(x, P) + n^{-j/2} a_{T_n}(x, P) + O(n^{-(j+1)/2}); \quad (7)$$
$$R_{T_n}(x, \hat{P}) = B_{T_n}(x, \hat{P}) + n^{-j/2} b_{T_n}(x, \hat{P}) + O_P(n^{-(j+1)/2}), \quad (8)$$
uniformly in $x$, where the functions on the right-hand side of the expansion are continuous in $x$.

Condition 8. The bootstrap estimator $J_{T_n}$ for the distribution of $T_n$ admits the following expansions for some $j \ge 1$:
$$J_{T_n}(x, \hat{P}) = K_{T_n}(x, \hat{P}) + n^{-j/2} k_{T_n}(x, \hat{P}) + O_P(n^{-(j+1)/2}); \quad (9)$$
$$J_{T_n}(x, \hat{P}_\Pi) = K_{T_n}(x, \hat{P}_\Pi) + n^{-j/2} k_{T_n}(x, \hat{P}_\Pi) + O_P(n^{-(j+1)/2}), \quad (10)$$
uniformly in $x$, where the functions on the right-hand side of the expansion are continuous in $x$. Furthermore, $a_{T_n}(x, P) - k_{T_n}(x, \hat{P})$ and $b_{T_n}(x, \hat{P}) - k_{T_n}(x, \hat{P}_\Pi)$ are both $O_P(n^{-1/2})$ uniformly in $x$.

Condition 9.
For some $\tilde{j} \ge 1$, $K_{T_n}$ satisfies
$$K_{T_n}(x, \hat{P}) - A_{T_n}(x, P) = O_P(n^{-\tilde{j}/2}); \qquad K_{T_n}(x, \hat{P}_\Pi) - B_{T_n}(x, \hat{P}) = O_P(n^{-\tilde{j}/2})$$
uniformly in $x$. Moreover, uniformly in $x$, $k_{T_n}(x, \hat{P})$ satisfies
$$k_{T_n}(x, \hat{P}) - a_{T_n}(x, P) = O_P(n^{-1/2}); \qquad k_{T_n}(x, \hat{P}_\Pi) - b_{T_n}(x, \hat{P}) = O_P(n^{-1/2}).$$

Let $H_{J_n}$ and $R_{J_n}$ be the unconditional and permutation distributions for the bootstrap prepivoted $J_{T_n}\{T_n(Z), \hat{P}\}$. The following theorems provide the order of errors in rejection probability using a bootstrap prepivoted permutation test under $H_{\theta(P)}$.

Theorem 3.
Suppose that Conditions 7-9 hold. Consider replacing the test statistic $T_n(Z_\pi)$ with the bootstrap prepivoted $J_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}$. Let $U(x)$ be the distribution function of a uniform random variable on $[0,1]$. Then, uniformly over $x$,
$$H_{J_n}(x, P) = U(x) + O(n^{-j^*/2}); \qquad R_{J_n}(x, \hat{P}) = U(x) + O_P(n^{-j^*/2}),$$
where $j^* = \min\{j, \tilde{j}\}$. As a result, $H_{J_n}(x, P) - R_{J_n}(x, \hat{P}) = O_P(n^{-j^*/2})$ uniformly in $x$.

Theorem 4.
Consider the setup of Theorem 3, but strengthen Condition 9 such that $K_{T_n}(x, \hat{P}_\Pi) = K_{T_n}(x)$, $A_{T_n}(x, P) = A_{T_n}(x)$, and $B_{T_n}(x, \hat{P}) = B_{T_n}(x)$. If $A_{T_n}(x) = B_{T_n}(x) = K_{T_n}(x)$, then
$$H_{J_n}(x, P) = U(x) + O(n^{-(j+1)/2}); \qquad R_{J_n}(x, \hat{P}) = U(x) + O_P(n^{-(j+1)/2}),$$
such that $H_{J_n}(x, P) - R_{J_n}(x, \hat{P}) = O_P(n^{-(j+1)/2})$ uniformly in $x$.

The proofs of Theorems 3 and 4 are presented in the Appendix, and closely follow the presentation of Cases 1 and 2 in Beran (1988). In §C of the Appendix, we discuss how repeated applications of bootstrap prepivoting may be used to further improve the order of the error in rejection probability. If the asymptotic expansions exist with the required number of terms, $\ell$ applications of bootstrap prepivoting would yield an error of order $O(n^{-(j^\dagger + \ell - 1)/2})$, where $j^\dagger = j^*$ under the conditions of Theorem 3 and $j^\dagger = j + 1$ under the conditions of Theorem 4. The required layers of bootstrapping become computationally burdensome as the number of iterations increases while providing diminishing improvements to the order of approximation, making anything beyond two applications of bootstrap prepivoting within a permutation test unappealing in most settings.

Assuming that the expansions are valid, Theorems 3 and 4 show that whether or not the bootstrap furnishes a higher-order correction depends upon whether or not $A_{T_n} = B_{T_n} = K_{T_n}$. Under the setup of these theorems the Gaussian prepivoted test statistic $K_{T_n}\{T_n(Z), \hat{P}\}$ results in a test with an error in rejection probability of $O(n^{-j^*/2})$, the same order recovered through Theorem 3. As a result, bootstrap prepivoting can only provide an improvement in the order of the error in rejection probability over Gaussian prepivoting when $T_n$ is an asymptotic pivot.

We now suggest that Gaussian and bootstrap prepivoting may be profitably combined. Suppose that the covariance estimator $\hat{\Sigma}(\cdot)$ used in the Gaussian prepivoting transformation satisfies Condition 6 for $\hat{\Gamma}(Z^*)$ and $\hat{\Gamma}(Z^*_\Pi)$.
Consider using bootstrap prepivoting on a Gaussian prepivoted test statistic, yielding the bootstrap distribution function
$$J_{K_n}(x, \hat{P}_\pi) = \mathrm{pr}\left[ K_{T_n}\{\breve{T}_n(Z^*_\pi, Z_\pi), \hat{P}^*_\pi\} \leq x \mid Z_\pi \right],$$
where $\hat{P}^*_\pi = \prod_{i=1}^k \prod_{j=1}^{n_i} \hat{P}^*_{\pi i}$ and $\hat{P}^*_{\pi i}$ is the empirical distribution of $Z^*_{\pi i}$. The corresponding prepivoted test statistic is $J_{K_n}[K_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}, \hat{P}_\pi]$. After Gaussian prepivoting, by Theorem 1 the lead term of the expansions for both the unconditional and permutation distributions of $K_{T_n}\{T_n(Z), \hat{P}\}$ is $A_{K_n}(x, P) = B_{K_n}(x, \hat{P}) = U(x)$, the distribution function of the standard uniform. Meanwhile, $K_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}$ as defined in (4) is itself a test statistic of the form (1) whenever $T_n(Z_\pi)$ is, and $K_{T_n}\{\breve{T}_n(Z^*_\pi, Z_\pi), \hat{P}^*_\pi\}$ is of the form (5) whenever $\breve{T}_n(Z^*_\pi, Z_\pi)$ is; see the proof of Proposition 1 for details. Therefore, using Proposition 2, the lead term in the relevant bootstrap expansions would also be $U(x)$ in regular cases. So long as the expansions exist, Theorem 4 then applies when bootstrapping a Gaussian prepivoted test statistic.

As previously discussed, Gaussian prepivoting requires a suitable covariance estimator $\hat{\Sigma}(\cdot)$, used to form $\hat{\Gamma}(Z_\pi)$ in (4). If a natural covariance estimator is unavailable, applying two layers of bootstrap prepivoting can also provide a higher-order correction; see § C of the Appendix for details.
In situations where expressions for the expansions (7)-(8) are known in closed form, permutation tests with higher-order correctness under $H_\theta(P)$ can be attained using suitable estimates for the terms in the expansions. This can avoid the computational costs of embedding the bootstrap within a permutation test, or can be used in conjunction with the bootstrap to further improve the order of the error in rejection probability. The approach closely follows the preceding developments. Define
$$E_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\} = K_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\} + n^{-j/2}\, k_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}. \tag{11}$$
Let $H_{E_n}$ and $R_{E_n}$ be the unconditional and permutation distributions of $E_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}$. The following theorems show that the order of the difference between $H_{E_n}(x, P)$ and $R_{E_n}(x, \hat{P})$ depends upon whether $K_{T_n}(x, \cdot)$ aligns with the lead terms in (7) and (8), just as in Theorems 3 and 4.

Theorem 5.
Suppose that Conditions 7 and 9 hold. Then, $H_{E_n}(x, P) - R_{E_n}(x, \hat{P}) = O_P(n^{-\bar{j}/2})$ uniformly in $x$, with $\bar{j} = \min\{\tilde{j}, j\}$.

Theorem 6.
Suppose that Conditions 7 and 9 hold. Suppose further that $K_{T_n}(x, \hat{P}_\Pi) = K_{T_n}(x)$, $A_{T_n}(x, P) = A_{T_n}(x)$ in (7), and $B_{T_n}(x, \hat{P}) = B_{T_n}(x)$ in (8), with $K_{T_n}(x) = A_{T_n}(x) = B_{T_n}(x)$. Then, $H_{E_n}(x, P) - R_{E_n}(x, \hat{P}) = O_P(n^{-(j+1)/2})$ uniformly in $x$.

The proofs of Theorems 5 and 6 parallel those of Theorems 3 and 4, replacing $J_{T_n}$ with $E_{T_n}$, $H_{J_n}$ with $H_{E_n}$, and $R_{J_n}$ with $R_{E_n}$ in the proofs. Theorem 6 shows the potential benefit of adding an estimate of the next term of the expansion of $T_n$ to a Gaussian prepivoted test statistic whenever the base statistic $T_n$ is already pivotal: an improvement in the order of the error in rejection probability may be attained without using the bootstrap. Moreover, one may apply bootstrap prepivoting to $E_{T_n}$ in order to attain the same higher-order accuracy as applying a double bootstrap (so long as the required expansions exist for $E_{T_n}$). Theorem 5 illustrates that when $T_n$ is not pivotal, there is no benefit from adding an estimate of the subsequent term for the order of the error in rejection probability relative to simply using the Gaussian prepivoted test statistic, much as there was no benefit to applying bootstrap prepivoting in this case. Under Conditions 7 and 9 and assuming the setup of Theorem 5, the Gaussian prepivoted test statistic $K_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\}$ also satisfies $H_{K_n}(x, P) - R_{K_n}(x, \hat{P}) = O_P(n^{-\bar{j}/2})$.

We now present simulation studies to illustrate the benefits of prepivoted permutation tests. In this section we focus on properties of these tests under $H_\theta(P)$, examining both asymptotic correctness and higher-order accuracy for various prepivoted permutation tests while illustrating the impropriety of other seemingly natural permutation tests. The primary benefit of using prepivoted permutation tests over alternative large-sample tests for $H_\theta(P)$ is the exactness of inference for any $n$ should the distributions also be equal.
While we do not showcase this property in simulation studies, it in combination with the theoretical results and simulation studies presented here provides compelling motivation for deploying prepivoted permutation tests in practice.

Sections 5.2-5.4 concern inference for equality of population means, denoted $H_\mu(P)$, with scenarios varying in whether the two- or $k$-sample problem is under consideration and whether the response is univariate or multivariate. In these examples, we will let $\hat{\mu}_{\pi i} = \hat{\mu}(Z_{\pi i})$ be the sample mean of $Z_{\pi i}$, with $\hat{\mu}_\pi = (\hat{\mu}_{\pi 1}, \ldots, \hat{\mu}_{\pi k})^T$; $\hat{\mu}_{\pi i}$ may be scalar or vector-valued depending upon the context. $\hat{\Sigma}_{\pi i}$ will be either the sample variance or the sample covariance matrix for $Z_{\pi i}$, and $\hat{\Gamma}_\pi$ will be the $kd \times kd$ block-diagonal matrix with $(n/n_i)\hat{\Sigma}_{\pi i}$ in the $i$th of $k$ blocks. Test statistics will involve one of two contrast matrices. First, let $C$ be the matrix of pairwise contrasts between the $k$ group means. For instance, when $k = 2$, $C = (1, -1)^T$; for $k = 3$, $C$ has columns $(1, -1, 0)^T$, $(1, 0, -1)^T$, and $(0, 1, -1)^T$. Second, let $b = (n_1/n, \ldots, n_k/n)^T$, let $B$ be the $k \times k$ matrix with $B_{ij} = b_i$ for all $j$, and define $C = I_{k \times k} - B$. Finally, a few common hypothesis tests for $H_\mu(P)$ assume homogeneity of variances, with the corresponding statistics using the usual pooled estimator of the variance. Through our examples we will show that bootstrap and Gaussian prepivoting may be applied to these statistics to restore asymptotic validity even when variances are heterogeneous. To define the pooled variance, let $A$ be the $n \times k$ matrix whose $i$th column indicates membership in the $i$th group, such that $A_{ji} = 1$ if $j \in I_i$ and $0$ otherwise. Let $H = A(A^TA)^{-1}A^T$, define $\hat{\nu}_\pi = Z_\pi^T(I_{n \times n} - H)Z_\pi/(n - k)$ as the pooled variance estimator, and let $\hat{\Lambda}_\pi$ be the $kd \times kd$ block-diagonal matrix with $(n/n_i)\hat{\nu}_\pi$ in the $i$th of $k$ blocks.
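The projection form of the pooled variance estimator above can be sketched as follows. This is a minimal illustration in Python; the function name and argument layout are ours, not the paper's.

```python
import numpy as np

# A minimal sketch of the pooled variance estimator nu_hat = Z^T (I - H) Z / (n - k)
# for univariate responses. Z is a vector of responses and groups holds labels
# 0, ..., k-1; the helper name is illustrative.
def pooled_variance(Z, groups, k):
    n = Z.shape[0]
    A = np.zeros((n, k))
    A[np.arange(n), groups] = 1.0              # A[j, i] = 1 if unit j lies in group i
    H = A @ np.linalg.inv(A.T @ A) @ A.T       # projection onto group-wise means
    return Z @ (np.eye(n) - H) @ Z / (n - k)   # Z^T (I - H) Z / (n - k)
```

Since $I - H$ projects onto residuals from the group means, this agrees with summing squared within-group deviations and dividing by $n - k$.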
In these examples the required convergence in probability of $\hat{\nu}(Z_\Pi)$, $\hat{\Lambda}(Z_\Pi)$ and $\hat{\Gamma}(Z_\Pi)$ under Condition 3 may be verified using Lemma 5.3 of Chung and Romano (2013).

5.2 Nonparametric Behrens-Fisher

We begin with the two-sample, univariate case, and compare the performance of various permutation tests when only equality of expectations holds. The candidate test statistics are
$$T_n(Z_\pi) = n^{1/2}\hat{\mu}_\pi^T C = n^{1/2}(\hat{\mu}_{\pi 1} - \hat{\mu}_{\pi 2});$$
$$S_n(Z_\pi) = T_n(Z_\pi)\{C^T\hat{\Gamma}_\pi C\}^{-1/2} = n^{1/2}(\hat{\mu}_{\pi 1} - \hat{\mu}_{\pi 2})\{(n/n_1)\hat{\Sigma}_{\pi 1} + (n/n_2)\hat{\Sigma}_{\pi 2}\}^{-1/2};$$
$$K_{T_n}\{T_n(Z_\pi), \hat{P}_\pi\} = K_{S_n}\{S_n(Z_\pi), \hat{P}_\pi\} = \Phi\{S_n(Z_\pi)\};$$
$$E_{S_n}\{S_n(Z_\pi), \hat{P}_\pi\} = \Phi\{S_n(Z_\pi)\} + \frac{1}{6}\,\hat{\xi}(Z_\pi)\{C^T\hat{\Gamma}_\pi C\}^{-3/2}\,\phi\{S_n(Z_\pi)\}\left[2\{S_n(Z_\pi)\}^2 + 1\right]n^{-1/2},$$
where $\hat{\xi}(Z_\pi) = (n/n_1)^2\hat{\kappa}_{\pi 1} - (n/n_2)^2\hat{\kappa}_{\pi 2}$, $\hat{\kappa}_{\pi i}$ is the sample central third moment for group $i$, and $\Phi(\cdot)$ and $\phi(\cdot)$ are the cumulative distribution function and density function for the standard normal distribution.

$T_n$ is the difference in means. From Theorem 1, this test statistic may yield anti-conservative inference under $H_\mu(P)$. $K_{T_n}$ applies the Gaussian prepivoting approach to $T_n$. $S_n$ is the usual $t$-statistic studentized using the unpooled variance estimator, advocated in Chung and Romano (2013) to restore first-order accuracy to permutation tests under $H_\mu(P)$. Observe that $K_{T_n}$ is a monotone increasing function of $S_n$, such that permutation tests using $S_n$ and $K_{T_n}$ will yield identical $p$-values. Due to this equivalence, the success of studentization for restoring first-order accuracy can be understood in light of Proposition 1.
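The four candidate statistics can be sketched as follows. The function and variable names are illustrative, and the Edgeworth-corrected statistic $E_{S_n}$ follows the display as reconstructed here, so its constants should be treated as an assumption rather than the paper's exact formula.

```python
import numpy as np
from scipy.stats import norm

# Sketch of the four Behrens-Fisher candidate statistics for two univariate
# samples x1, x2 (names illustrative). The E_Sn line mirrors the reconstructed
# Edgeworth correction above and should be read as an assumption.
def behrens_fisher_stats(x1, x2):
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    m1, m2 = x1.mean(), x2.mean()
    Tn = np.sqrt(n) * (m1 - m2)                                    # unstudentized
    gamma = (n / n1) * x1.var(ddof=1) + (n / n2) * x2.var(ddof=1)  # C^T Gamma C
    Sn = Tn / np.sqrt(gamma)                                       # studentized
    KTn = norm.cdf(Sn)                                             # Gaussian prepivot
    kap1 = ((x1 - m1) ** 3).mean()                                 # third central moments
    kap2 = ((x2 - m2) ** 3).mean()
    xi = (n / n1) ** 2 * kap1 - (n / n2) ** 2 * kap2
    ESn = KTn + xi * gamma ** (-1.5) * norm.pdf(Sn) * (2 * Sn ** 2 + 1) / (6 * np.sqrt(n))
    return Tn, Sn, KTn, ESn
```

Because $\Phi$ is strictly increasing, permuting `KTn` and permuting `Sn` give identical $p$-values, as noted above.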
$E_{S_n}$ applies (11) to $S_n$, with the second term in $E_{S_n}$ representing an estimate of the second term in the Edgeworth expansion for $S_n$ (Abramovitch and Singh, 1985). The performance of $E_{S_n}$ will reflect Theorem 6 due to pivotality of the lead term in the expansions for $S_n$. Sufficient conditions for Conditions 7 and 8 to hold for $T_n$ are finiteness of $E\|X_i\|^{r+\delta}$ for $i = 1, 2$, some $\delta > 0$, and a suitable moment order $r$, along with continuity of $P_i$ for at least one $i$. For $S_n$ and $K_{T_n}$, a moment condition of higher order along with the aforementioned continuity assumption is sufficient, while a still higher-order moment condition and the continuity assumption are sufficient for the expansions for $E_{S_n}$ to satisfy Conditions 7 and 8. Validity of the expansions (7) and (9) follows from classical arguments in the $k$-sample case under these assumptions; see Babu and Singh (1983), Hall and Martin (1988), and Hall (1992) among many. The expansions for the randomization distributions (8) and (10) are less conventional but can nonetheless be established using existing results. Validity of (10) can be established through various approaches, such as through results from Liu et al. (1989) for the bootstrap under non-independent and identically distributed models. Interestingly, we can alternatively use the contiguity results in Chung and Romano (2013, Lemma 5.3) to leverage results on Edgeworth expansions for the bootstrap in the $k$-sample problem in the special case that all $k$ samples arise as independent and identically distributed draws from a common distribution $\bar{P}$. Essential for applying these contiguity results is the fact that the moment conditions and continuity requirements on the individual populations are all that is required for the conventional $k$-sample case. For (8), one can use results from finite population survey sampling under the assumption that the finite population is itself generated from a superpopulation model.
For instance, both $T_n$ and $S_n$ may be expressed in the form required for Theorem 2 of Babu and Singh (1985); see also Booth and Hall (1993), Booth et al. (1994) and Babu and Bai (1996) for related results. In particular, results in Babu and Bai (1996) provide for the existence of the expansions with remainder $O_P(n^{-j/2})$ for the required values of $j$.

Table 1: Simulated Type I error rates for permutation tests under $H_\mu(P)$ in the univariate, two-sample setting. For each simulation $n_1$ is a fixed proportion of $n$, and the nominal level of the tests is $\alpha = 0.05$. $T_n$ and $S_n$ are the unstudentized and studentized differences in means respectively. $K_{T_n}$ applies (4) to $T_n$, and $E_{S_n}$ applies (11) to $S_n$.

             Original                                Bootstrap Prepivot
             T_n      S_n      K_Tn     E_Sn        T_n      S_n      K_Tn     E_Sn
  ERP O(.)   1        n^-1/2   n^-1/2   n^-1        n^-1/2   n^-1     n^-1     n^-3/2
  n = 50     0.13     0.12     0.12     0.09        0.13     0.08     0.08     0.07
  n = 100    0.11     0.09     0.09     0.06        0.10     0.06     0.06     0.05
  n = 250    0.11     0.08     0.08     0.06        0.08     0.05     0.05     0.05
  n = 1000   0.09     0.06     0.06     0.05        0.06     0.05     0.05     0.05
  n = 5000   0.09     0.05     0.05     0.05        0.06     0.05     0.05     0.05

For each test we will reject for large values of the test statistic, with desired Type I error rate of $\alpha = 0.05$. For each test statistic we further consider the permutation distributions after bootstrap prepivoting, yielding 8 test statistics in total. In each simulation scenario $X_{11}, \ldots, X_{1n_1}$ will be independent and identically distributed as $-\{E(1/8) - 8\}$ and $X_{21}, \ldots, X_{2n_2}$ will be independent and identically distributed as $E(1/5) - 5$, where $E(\lambda)$ is the exponential distribution with rate $\lambda$. The sample sizes $n_1$ and $n_2$ will be varied across scenarios, with $n = n_1 + n_2$. In each scenario we simulate 10,000 data sets. Permutation tests within each data set are based upon 999 draws from the permutation distribution. For the bootstrap prepivoted test statistics, 500 bootstrap samples are drawn for each permutation.

The first row of Table 1 shows the expected accuracy of the resulting permutation distributions. Permuting $T_n$ itself will be asymptotically invalid because it is not asymptotically pivotal. Theorem 3 applies to the bootstrap prepivoted transform of $T_n$, while Theorem 4 applies to the bootstrap prepivoted transforms of the remaining test statistics. For $S_n$ and $K_{S_n}$, the order of the error in rejection probability is $O(n^{-1/2})$ before applying bootstrap prepivoting, while the rate is $O(n^{-1})$ for $E_{S_n}$ before applying bootstrap prepivoting by Theorem 6.

Table 1 shows the results of our simulation study with various choices of $n_1$ and $n_2$. Even with the largest values of $n_1$ and $n_2$, $T_n$ does not provide a rejection rate close to the nominal level. This can be understood in light of Theorem 1 through a comparison of the covariances (2) and (3): the unconditional limit distribution of $T_n$ has a larger variance than that of the probability limit of $R_{T_n}$, inflating the rejection rate even asymptotically. For the remaining 7 test statistics, for $n_1$ and $n_2$ sufficiently large the rejection rates approach the nominal level, a reflection of Propositions 1 and 2. For smaller values of $n$, for each test statistic bootstrap prepivoting brings the rejection rate closer to the nominal level. Furthermore, both before and after applying bootstrap prepivoting, $E_{S_n}$ comes closer to the nominal error rate than $K_{T_n}$ or $S_n$, which are in turn closer to the nominal rate than $T_n$. This reflects the developments in § 4. As desired, these higher-order improvements are particularly noticeable at small values of $n$. Comparing the extremes, at $n = 50$ the bootstrap prepivoted version of $E_{S_n}$ has an estimated Type I error rate of 0.07 compared to 0.12 for $S_n$ without bootstrap prepivoting. At $n = 100$, these values are 0.05 and 0.09 respectively. Even at $n = 1000$, the permutation test based upon $S_n$ has a rejection rate slightly above the nominal level despite asymptotic validity.

The permutation tests using $E_{S_n}$ and its bootstrap prepivoted transformation provide appealing approaches to the nonparametric Behrens-Fisher problem. The permutation test using $E_{S_n}$ is exact for any sample size if $P_1 = P_2$, and provides an error in rejection probability of order $O(n^{-1})$ under $H_\mu(P)$. After applying bootstrap prepivoting to $E_{S_n}$, the order of the error in rejection probability for the resulting permutation test is reduced to $O(n^{-3/2})$ while maintaining exactness if $P_1 = P_2$. Through additional applications of bootstrap prepivoting to $E_{S_n}$ the error in rejection probability can be further reduced under $H_\mu(P)$ while maintaining exactness when $P_1 = P_2$; see the discussion of bootstrap iteration in § C of the Appendix for more details.

5.3 Multivariate two-sample tests

In this section we keep $k = 2$ but now consider multivariate responses. Three common test statistics for testing $H_\mu(P)$ in this situation are the pooled and unpooled Hotelling statistics along with the max absolute $t$ statistic ($L_n$, $S_n$, and $M_n$ respectively):
$$L_n(Z_\pi) = n(\hat{\mu}_\pi C)^T(C^T\hat{\Lambda}_\pi C)^{-1}(\hat{\mu}_\pi C);$$
$$S_n(Z_\pi) = n(\hat{\mu}_\pi C)^T(C^T\hat{\Gamma}_\pi C)^{-1}(\hat{\mu}_\pi C);$$
$$M_n(Z_\pi) = \max_{1 \leq \ell \leq d} \frac{n^{1/2}|(\hat{\mu}_\pi C)_\ell|}{\{(C^T\hat{\Gamma}_\pi C)_{\ell\ell}\}^{1/2}}.$$
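The three multivariate statistics can be sketched as follows for two samples with $k = 2$; the helper name is illustrative, and with this contrast $\hat\mu_\pi C$ reduces to the difference in mean vectors.

```python
import numpy as np

# Sketch of the pooled Hotelling (Ln), unpooled Hotelling (Sn), and max
# absolute t (Mn) statistics for two multivariate samples X1, X2 with rows as
# observations. Gamma and Lam play the roles of C^T Gamma C and C^T Lambda C.
def multivariate_stats(X1, X2):
    n1, n2 = X1.shape[0], X2.shape[0]
    n = n1 + n2
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    Gamma = (n / n1) * np.cov(X1, rowvar=False) + (n / n2) * np.cov(X2, rowvar=False)
    A1 = X1 - X1.mean(axis=0)
    A2 = X2 - X2.mean(axis=0)
    nu = (A1.T @ A1 + A2.T @ A2) / (n - 2)          # pooled covariance matrix
    Lam = (n / n1 + n / n2) * nu
    Ln = n * diff @ np.linalg.solve(Lam, diff)      # pooled Hotelling statistic
    Sn = n * diff @ np.linalg.solve(Gamma, diff)    # unpooled Hotelling statistic
    Mn = np.max(np.sqrt(n) * np.abs(diff) / np.sqrt(np.diag(Gamma)))  # max |t|
    return Ln, Sn, Mn
```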
In addition to investigating the performance of these three test statistics under $H_\mu(P)$, we consider the impact of (a) bootstrap prepivoting the original statistics; (b) Gaussian prepivoting the original statistics; and (c) bootstrap prepivoting the Gaussian prepivoted versions of the statistics, yielding 12 total test statistics. Chung and Romano (2016b) considered the use of the unpooled version of Hotelling's statistic $S_n(Z_\pi)$ along with the bootstrap prepivoted version of the max absolute $t$ statistic $J_{M_n}\{M_n(Z_\pi), \hat{P}_\pi\}$ for inference on $H_\mu(P)$ using permutation tests. They further demonstrated that without further modification, $L_n(Z_\pi)$ cannot be used for inference for $H_\theta(P)$.

Applying Gaussian prepivoting to the unpooled Hotelling statistic yields the test statistic $G_d\{S_n(Z_\pi)\}$, where $G_{df}$ is the distribution function of a $\chi^2$ random variable with $df$ degrees of freedom. As this is a monotone increasing transformation of $S_n$, permutation $p$-values attained after Gaussian prepivoting will be identical to those attained without Gaussian prepivoting. Through this equivalence, Proposition 1 proves the asymptotic validity of permutation inference using $S_n$ for testing equality in means. Applying Gaussian prepivoting to $L_n(Z_\pi)$ requires computing the distribution function for general quadratic forms of multivariate Gaussians; we use the algorithm of Davies (1980), implemented by the function davies within the R package CompQuadForm, to accomplish this. Applying Gaussian prepivoting to $M_n(Z_\pi)$ requires the computation of the probability that correlated mean-zero Gaussians fall within a rectangular region symmetric about the origin. While this can be accomplished using the pmvnorm function in R, for improved speed we proceed along the lines of Remark 2, drawing $B = 1000$ Monte-Carlo draws from a multivariate normal with mean zero and covariance $\hat{\Gamma}_\pi$ in order to approximate (4) within each permutation.
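The Monte-Carlo approximation to the Gaussian prepivot of the max absolute $t$ statistic can be sketched as follows. The function name, the default for $B$, and the seed handling are our illustrative choices.

```python
import numpy as np

# Monte-Carlo approximation, along the lines of Remark 2, to the Gaussian
# prepivot of the max absolute t statistic: draw B mean-zero multivariate
# normals with covariance Gamma and record the fraction of draws whose
# studentized maximum is at most the observed statistic m_obs.
def gaussian_prepivot_max_t(m_obs, Gamma, B=1000, seed=None):
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(Gamma.shape[0]), Gamma, size=B)
    max_t = np.abs(draws / np.sqrt(np.diag(Gamma))).max(axis=1)
    return float((max_t <= m_obs).mean())      # Monte-Carlo estimate of (4) at m_obs
```

Within a permutation test, this transformation would be recomputed with $\hat\Gamma_\pi$ for each permutation $\pi$.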
Bootstrap prepivoting simply computes the bootstrap distribution of (5), with the function $g$ within Condition 3 varying with the test statistic being employed.

Our response variables are $d = 15$ dimensional. For $j = 1, \ldots, n_1$ we draw $X_{1j}$ independently and identically distributed as $\exp\{\mathcal{N}(0, I_{15 \times 15})\}$, where $\mathcal{N}$ reflects the multivariate normal distribution and $0$ is a vector of length 15 with zeroes in each element. For $j = 1, \ldots, n_2$ we then draw $X_{2j}$ independently and identically distributed as $\mathcal{N}(\mu, V)$, where $V_{\ell\ell} = 1$, $V_{\ell\ell'}$ equals a common positive constant for $\ell \neq \ell'$, and $\mu_\ell = \exp(0.5)$, for $\ell, \ell' = 1, \ldots, 15$. We set $n_1$ and $n_2$ as fixed proportions of $n$. We simulate 5000 data sets using the above generative model with different values for $n$. For each data set, permutation inference proceeds using 999 permutations. When bootstrap prepivoting is used, we draw 200 bootstrap draws for each permutation $\pi$.

Table 2: Simulated Type I error rates for permutation tests under $H_\mu(P)$ in the multivariate, two-sample setting. For each simulation $n_1$ is a fixed proportion of $n$, and the nominal level of the tests is $\alpha = 0.05$. Within each set of columns, left to right the test statistics are the pooled Hotelling $T$-squared, unpooled Hotelling $T$-squared, and max absolute $t$ statistic.

          Original           Gaussian Prepivot   Bootstrap Prepivot   Boot-after-Gauss
  n       L_n   S_n   M_n    L_n   S_n   M_n     L_n   S_n   M_n      L_n   S_n   M_n
  150     0.82  0.37  0.33   0.32  0.37  0.31    0.16  0.16  0.11     0.12  0.16  0.12
  500     0.75  0.17  0.20   0.16  0.17  0.19    0.10  0.11  0.08     0.08  0.11  0.08
  1000    0.70  0.11  0.14   0.10  0.11  0.13    0.07  0.08  0.06     0.06  0.08  0.06
  2000    0.69  0.08  0.10   0.08  0.08  0.10    0.06  0.06  0.06     0.05  0.06  0.06
  5000    0.69  0.07  0.08   0.07  0.07  0.08    0.06  0.06  0.05     0.05  0.05  0.05

Table 2 shows the results for various choices of $n$. By comparing the covariances (2) and (3) in Theorem 1, tests permuting $L_n$ and $M_n$ will not have the correct level under $H_\mu(P)$ even asymptotically, though the magnitude of the error is more alarming for $L_n$ than for $M_n$ in this simulation. A permutation test using $S_n$ on the other hand will be asymptotically correct under $H_\mu(P)$ by Proposition 1 because it is equivalent to a Gaussian prepivoted test statistic; note the equality of the rejection rates for $S_n$ before and after Gaussian prepivoting. The Gaussian prepivoted transformations of $L_n$ and $M_n$ restore asymptotic validity under $H_\mu(P)$, as evidenced as $n$ increases in the second set of columns. Comparing the second and third sets of columns, bootstrap prepivoting outperforms Gaussian prepivoting, resulting in tests that come much closer to attaining the nominal level. For $L_n$ in particular, applying bootstrap prepivoting after Gaussian prepivoting further improves the error rate in small samples relative to either (a) simply applying bootstrap prepivoting to $L_n$ itself; or (b) simply applying Gaussian prepivoting to $L_n$ without applying the bootstrap. The superiority of the tests in the fourth set of columns (bootstrap prepivoting after Gaussian prepivoting) for both $S_n$ and $L_n$ aligns with Theorem 4 and the preceding discussion. For $M_n$, while the bootstrap prepivoted test statistics both outperform the asymptotically valid Gaussian prepivoted transform, there is no noticeable difference between the two modes of bootstrap prepivoting in the third and fourth sets of columns.
5.4 The $k$-sample problem: analysis of variance, Tukey-Kramer, and an alternative robust test

We now consider univariate responses in the multi-group setting, taking $k = 4$. Common test statistics for inference on $H_\mu(P)$ with $k > 2$ include the $F$-statistic arising from an analysis of variance computation, $F_n$, and the Tukey-Kramer range statistic $T_n$:
$$F_n(Z_\pi) = n\{\hat{\mu}_\pi C\,\mathrm{diag}(b^{1/2})\}\{\hat{\mu}_\pi C\,\mathrm{diag}(b^{1/2})\}^T / \{(k-1)\hat{\nu}_\pi\};$$
$$T_n(Z_\pi) = \max_{1 \leq \ell \leq k(k-1)/2} \frac{n^{1/2}|\hat{\mu}_\pi C|_\ell}{\{(C^T\hat{\Lambda}_\pi C)_{\ell\ell}\}^{1/2}},$$
where $\mathrm{diag}(b^{1/2})$ is a $k \times k$ diagonal matrix with $b_i^{1/2} = (n_i/n)^{1/2}$ in the $i$th diagonal. The conventional reference distributions make an assumption of equality of variances across the $k$ groups, an assumption not embedded within $H_\mu(P)$. Hence the usual reference distributions do not generally satisfy Theorem 2 in the heteroskedastic case. Unlike in previous examples, Gaussian prepivoting for $F_n$ and $T_n$ yields distributions with neither closed-form distribution functions nor fast numerical approximation algorithms. While the transformation (4) can be approximated through Monte-Carlo simulation as described in Remark 2, we instead proceed by applying bootstrap prepivoting to $T_n$ and $F_n$ to restore asymptotic correctness.

In addition to these two statistics, we also consider the robust statistic suggested in Chung and Romano (2013, Equation 3.2), shown to provide a robust permutation test under $H_\mu(P)$ in their Theorem 3.1. To express this statistic in the form (1), let $\hat{D}_\pi$ be a $k \times k$ matrix with identical columns where $\hat{D}_{\pi ij} = (n_i/\hat{\Sigma}_{\pi i})/\{\sum_{\ell=1}^k (n_\ell/\hat{\Sigma}_{\pi\ell})\}$ for all $j$, and let $\hat{C}_\pi = I_{k \times k} - \hat{D}_\pi$. Observe that $\hat{C}_\pi$ is a random matrix of column contrasts. Denoting their robust $k$-sample statistic as $W_n(Z_\pi)$,
$$W_n(Z_\pi) = \sum_{i=1}^k (n_i/n)(1/\hat{\Sigma}_{\pi i})\{(n^{1/2}\hat{\mu}_\pi \hat{C}_\pi)_i\}^2 = \sum_{i=1}^k \frac{n_i}{\hat{\Sigma}_{\pi i}}\left(\hat{\mu}_{\pi i} - \frac{\sum_{\ell=1}^k n_\ell\hat{\mu}_{\pi\ell}/\hat{\Sigma}_{\pi\ell}}{\sum_{\ell=1}^k n_\ell/\hat{\Sigma}_{\pi\ell}}\right)^2.$$
$W_n$ also satisfies Condition 3.
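The robust $k$-sample statistic can be sketched from its second displayed form as follows; the function name is illustrative, and sample variances play the role of $\hat\Sigma_{\pi i}$.

```python
import numpy as np

# Sketch of the robust k-sample statistic W_n: a precision-weighted sum of
# squared deviations of the group means from a precision-weighted grand mean.
# samples is a list of k one-dimensional arrays.
def W_n(samples):
    ns = np.array([len(x) for x in samples], dtype=float)
    mus = np.array([x.mean() for x in samples])
    sig = np.array([x.var(ddof=1) for x in samples])
    w = ns / sig                                   # precision weights n_i / Sigma_hat_i
    center = (w * mus).sum() / w.sum()             # weighted grand mean
    return float((w * (mus - center) ** 2).sum())
```

With $k = 2$ this reduces to the squared studentized difference in means, $(\hat\mu_1 - \hat\mu_2)^2/(\hat\Sigma_1/n_1 + \hat\Sigma_2/n_2)$.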
For this statistic, Gaussian prepivoting through (4) returns the test statistic $G_{k-1}(W_n)$, where $G_{df}$ is the distribution function of a $\chi^2$ random variable with $df$ degrees of freedom. This is a monotone increasing transformation of $W_n$, such that the permutation distributions of the original test statistic and the Gaussian prepivoted test statistic yield identical $p$-values. Proposition 1 thus provides an alternative justification for the success of this test statistic: it is equivalent to a Gaussian prepivoted test statistic. As with $T_n$ and $F_n$, we investigate permutation tests with both $W_n$ and its bootstrap prepivoted modification.

We draw 5000 data sets each of size $n$, with the four group sizes $n_1, \ldots, n_4$ fixed proportions of $n$. The scale parameters $\sigma_1, \ldots, \sigma_4$ differ across the four groups, with $\sigma_3 = 0.40$ and $\sigma_4 = 0.25$. For $i = 1, \ldots, 4$, we take $X_{ij}$ independent and identically distributed as $X_{ij} \sim \exp\{\mathcal{N}(0, \sigma_i^2)\} - \exp(\sigma_i^2/2)$ for $j = 1, \ldots, n_i$. For each data set, we conduct permutation tests using the original and bootstrap prepivoted statistics based upon 999 draws from the permutation distribution. When bootstrap prepivoting is used, we take 200 draws for each permutation $\pi$.

Table 3 shows the results for various choices of $n$. As a consequence of Theorem 1, permutation tests based upon $F_n$ and $T_n$ will be asymptotically invalid under $H_\mu(P)$. This is reflected in the first set of columns, where these tests reject well above the nominal level even at $n = 2000$. The permutation test using $W_n$ will result in an asymptotically correct procedure by Proposition 1 because it is equivalent to its Gaussian prepivoted transformation. The second set of columns reflects Proposition 2: bootstrap prepivoting has restored first-order correctness to the permutation tests using $F_n$ and $T_n$ under $H_\mu(P)$, and the levels of these tests approach $\alpha = 0.05$ as $n$ increases. We also see that the bootstrap prepivoted version of $W_n$ provides an improvement in the error in rejection rate relative to the tests using $W_n$ itself, particularly at $n = 100$. This highlights the benefits of applying bootstrap prepivoting even with test statistics whose permutation tests are already asymptotically valid for the null of equality of parameters of distributions, as suggested by Theorem 4.

In § A of the Appendix, we consider multivariate analysis of variance with both $k > 2$ and $d > 2$. The results are qualitatively similar to those presented thus far. Wilks' Lambda, the Pillai-Bartlett Trace, the Lawley-Hotelling Trace, and Roy's Greatest Root are all of the form required in Condition 3, but permutation tests based upon the statistics themselves can be invalid under $H_\mu(P)$. Applying Gaussian or bootstrap prepivoting restores asymptotic validity to permutation tests conducted under $H_\mu(P)$.

Table 3: Simulated Type I error rates for permutation tests under $H_\mu(P)$ in the $k$-sample, univariate setting. We set $k = 4$. In each simulation the group sizes are fixed proportions of $n$, and the nominal level of the tests is $\alpha = 0.05$. Within each set of columns, left to right the test statistics are the $F$-statistic from an analysis of variance, the Tukey-Kramer max statistic, and the statistic of Chung and Romano (2013, Equation 3.2).

          Original             Bootstrap Prepivot
  n       F_n   T_n   W_n      F_n   T_n   W_n
  100     0.22  0.23  0.12     0.08  0.09  0.09
  250     0.21  0.22  0.08     0.07  0.07  0.06
  1000    0.20  0.22  0.06     0.06  0.06  0.06
  2000    0.20  0.22  0.05     0.05  0.05  0.05

We also consider testing equality of medians in the two-sample case by way of bootstrap prepivoting. This represents a case where the higher-order developments of § 4 do not apply: while the bootstrap distributional estimators for the unstudentized and studentized medians are consistent, asymptotic expansions for these distributions do not exist (Hall and Martin, 1989; Sakov and Bickel, 2000).

The robust permutation tests presented in this work are exact under the null of equality of distributions, while providing asymptotically valid inference for equality of parameters so long as asymptotically linear estimators exist. The conditions in this work do not encompass permutation tests based upon $k$-sample $U$-statistics such as Wilcoxon's rank sum test; see Chung and Romano (2016a) for a discussion of robust permutation tests for two-sample $U$-statistics. There it is shown that while qualitatively similar ideas work for constructing robust permutation tests, the proofs vary considerably between the case of asymptotically linear estimators and the case of $U$-statistics. The remedies presented in this work amount to permuting one minus an asymptotically valid $p$-value, one approach permuting $p$-values based upon asymptotic normality and the other permuting bootstrap $p$-values. These are by no means the only available options. For instance, instead of the bootstrap, one could also construct $p$-values using subsampling. For the nonparametric Behrens-Fisher problem presented in § 5.2, one could instead permute the $p$-value from Welch's unpooled $t$-test using the Welch-Satterthwaite degrees of freedom. While not explicitly explored here, our use of permutation tests based upon asymptotically valid $p$-values has immediate consequences for power against contiguous alternatives to $H_\theta(P)$: the prepivoted permutation tests have the same limiting local power as the large-sample tests whose $p$-values they permute.
See Chung and Romano (2013, Remark 2.3) for further discussion. Hence there is nothing lost in terms of first-order power properties for the prepivoted permutation tests relative to large-sample alternatives, while much is gained by providing an exact test for $P_1 = \cdots = P_k$.

The possibility of higher-order corrections described in § 4 provides further motivation for permuting bootstrap $p$-values within permutation tests instead of other competitors: tests using bootstrap prepivoting maintain exactness under equality of distributions, while the order of approximation may be driven down under equality of parameters through bootstrap iteration. The computation required by embedding a bootstrap scheme within a permutation test makes repeated iterations of the bootstrap practically infeasible. When available, bootstrapping a statistic satisfying the conditions of Theorem 6 can provide an improvement in the order of approximation without requiring an additional layer of bootstrapping. An interesting area for future research is the potential use of weighted bootstrap iteration (Lee and Young, 2003) within permutation tests, which is known to provide subsequent improvements of $O(n^{-1})$ rather than $O(n^{-1/2})$ with each bootstrap iteration for bootstrap hypothesis tests.

Appendix
A Additional simulation studies
A.1 Multivariate analysis of variance with prepivoted permutation tests
When both $k > 2$ and $d > 2$, the usual tests for $H_\mu(P)$ are based upon the eigenvalues of the product of the model sum of squares matrix and the inverse of the residual sum of squares matrix. Using the notation established in the main text, define
$$D_n(Z_\pi) = n\left\{\hat{\mu}_\pi C\,\mathrm{diag}(b^{1/2})\right\}\left\{\hat{\mu}_\pi C\,\mathrm{diag}(b^{1/2})\right\}^T \hat{\nu}_\pi^{-1}/(k-1).$$
Let $\lambda_{\pi j}$ be the $j$th eigenvalue of $D_n(Z_\pi)$. We consider the following common test statistics:
$$\text{Wilks' Lambda} = \prod_{j=1}^k \{1/(1 + \lambda_{\pi j})\};$$
$$\text{Pillai-Bartlett Trace} = \sum_{j=1}^k \{\lambda_{\pi j}/(1 + \lambda_{\pi j})\};$$
$$\text{Lawley-Hotelling Trace} = \sum_{j=1}^k \lambda_{\pi j};$$
$$\text{Roy's Greatest Root} = \max_{1 \leq j \leq k} \lambda_{\pi j}.$$
For each of these test statistics, we consider permutation inference using both the statistic itself and the bootstrap prepivoted transformations of these statistics. We note that unlike the other statistics considered in the main text, for Wilks' Lambda evidence against the null is suggested by small values of the test statistic; we simply take the negative of the test statistic and compute the right-tail permutation $p$-value as discussed in the manuscript. The other three tests reject with large values of the test statistic.

We simulate 5000 data sets of size $n$ with $k = 4$ groups, with the group sizes $n_1, \ldots, n_4$ fixed proportions of $n$. The outcome variable is of dimension $d = 10$. Let $R(\rho)$ denote a $10 \times 10$ correlation matrix with equal correlations $\rho$. The parameters $\rho_1, \ldots, \rho_4$ and $\sigma_1, \ldots, \sigma_4$ differ across the groups, with $\rho_4 = 0.9$, $\sigma_1 = 1$, and $\sigma_4 = 0.4$. For $j = 1, \ldots, n_i$ and $i = 1, \ldots, 4$, we generate outcome variables as
$$X_{ij} \sim \exp\left\{\mathcal{N}\left(0, \sigma_i^2 R(\rho_i)\right)\right\} - (\exp(\sigma_i^2/2), \ldots, \exp(\sigma_i^2/2))^T,$$
with the vector $(\exp(\sigma_i^2/2), \ldots, \exp(\sigma_i^2/2))^T$ being of length 10. In each data set we conduct permutation inference using 999 permutations. When bootstrap prepivoting is applied, we use 200 bootstrap draws for each permutation. We investigate performance at $n = 200$, $1000$, and $2000$.

Table S1: Simulated Type I error rates for permutation tests under $H_\mu(P)$ in the $k$-sample, multivariate setting. We set $k = 4$. In each simulation the group sizes are fixed proportions of $n$, and the nominal level of the tests is $\alpha = 0.05$. Within each set of four columns, left to right the test statistics are Wilks' Lambda, the Pillai-Bartlett Trace, the Lawley-Hotelling Trace, and Roy's Greatest Root.

          Original                   Bootstrap Prepivot
  n       W     P-B   L-H   R       W     P-B   L-H   R
  200     0.73  0.72  0.74  0.77    0.11  0.10  0.10  0.13
  1000    0.72  0.72  0.73  0.78    0.07  0.07  0.07  0.08
  2000    0.71  0.71  0.71  0.78    0.07  0.06  0.06  0.07

Table S1 shows the results. We see that the permutation tests using the original test statistics have rejection rates far above the nominal level even at $n = 2000$, and by Theorem 1 this would persist asymptotically. The bootstrap prepivoted tests approach the nominal level as $n$ increases, reflecting Proposition 2.

A.2 Equality of medians
When testing equality of medians in the two-sample setting, Chung and Romano (2013, Example 2.2 and Section 4) suggest permuting a studentized difference in medians using the bootstrap standard error. Here we demonstrate that the bootstrap could instead be used to prepivot the unstudentized difference in medians, providing an alternative to studentization for restoring asymptotic validity for permutation tests performed in the absence of group invariance. Let $\hat{m}_{\pi i}$ be the median of the responses in group $i$, and let $\hat{v}_\pi$ be the bootstrap estimator of the variance for $n^{1/2}(\hat{m}_{\pi 1} - \hat{m}_{\pi 2})$; see Chung and Romano (2013, § 4) for a closed form expression for $\hat{v}_\pi$. Define
$$T_n(Z_\pi) = n^{1/2}(\hat{m}_{\pi 1} - \hat{m}_{\pi 2}); \qquad S_n(Z_\pi) = T_n(Z_\pi)/\hat{v}_\pi^{1/2}.$$
We will compare inference based upon $T_n$ and $S_n$ to tests applying bootstrap prepivoting to $T_n$ and $S_n$. We note that while the higher-order developments of § 4 do not apply in this setting, first-order results continue to hold. Let $X_{1j}$ be independent and identically distributed draws from a standard normal distribution for $j = 1, \ldots, n_1$, and let $X_{2j}$ be independent and identically distributed draws from a normal distribution with mean zero and standard deviation 5 for $j = 1, \ldots, n_2$. For each data set permutation inference is based upon 999 permutations. For bootstrap prepivoted test statistics, 500 bootstrap draws are used for each permutation.

Table S2 shows the results for various choices of $n$. We first note that unlike the case of the difference in means, the permutation test using $T_n$ does not yield an asymptotically valid test for equality of medians even when $n_1 = n_2$. We see that $S_n$ yields an asymptotically valid test by Proposition 1 because it is equivalent to a Gaussian prepivoted test statistic, and that the bootstrap prepivoted versions of $T_n$ and $S_n$ are also asymptotically valid through an application of Proposition 2.

Table S2: Simulated Type I error rates for permutation tests testing equality of medians in the two-sample, univariate case for various values of $n_1$ and $n_2$. The nominal level of the tests is $\alpha = 0.05$.

               Original        Boot. Pre.
  (n_1, n_2)   T_n    S_n      T_n    S_n
  (13, 13)     0.22   0.12     0.14   0.10
  (51, 51)     0.23   0.07     0.09   0.06

B Pseudocode for bootstrap prepivoted permutation tests
Algorithm 1 presents pseudocode for implementing a bootstrap prepivoted permutation test. To perform Gaussian prepivoting, simply replace $\hat J_{T_n}$ with $K_{T_n}$ as defined in (4), which eliminates the need for any draws from the bootstrap distribution.

C Bootstrap iteration
Through iterated applications of bootstrap prepivoting one can attain further refinements to the order of approximation for permutation tests conducted when only $H_\theta(P)$ holds, provided the expansions admit additional terms. Begin once again with a test statistic $T_n$ of the form (1). The first application of bootstrap prepivoting yields the test statistic $J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}$, the transform of $T_n(Z_\pi)$ by its bootstrap distribution function. In the second round of bootstrap prepivoting, $J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}$ is itself transformed by a bootstrap estimate of its distribution function for each $\pi \in \mathcal{P}_n$. For a given permutation $\pi \in \mathcal{P}_n$, one first constructs $\hat P^*_{\pi i}$ as the empirical distribution of $Z^*_{\pi i 1}, \ldots, Z^*_{\pi i n_i}$ (the bootstrap samples from $\hat P_{\pi i}$) for $i = 1, \ldots, k$. Then, for each $i$, one takes $n_i$ independent and identically distributed draws from $\hat P^*_{\pi i}$, call them $Z^{**}_{\pi i 1}, \ldots, Z^{**}_{\pi i n_i}$, in so doing taking a bootstrap sample from the initial bootstrap sample in group $i$. Let $Z^{**}_{\pi i}$ be the $n_i \times d$ matrix whose $j$th row contains $Z^{**}_{\pi i j}$, and write $Z^{**}_\pi$ for the $n \times d$ matrix stacking $Z^{**}_{\pi 1}, \ldots, Z^{**}_{\pi k}$ on top of one another. Define $J^{(1)}_{J_n}(x, \hat P_\pi)$ as

\[ J^{(1)}_{J_n}(x, \hat P_\pi) = \mathrm{pr}[\,J_{T_n}\{\breve T_n(Z^{**}_\pi, Z^*_\pi), \hat P^*_\pi\} \le x \mid Z_\pi\,]. \]

Regardless of whether or not Theorems 3 or 4 applied to the original test statistic, under Conditions 7--9 and after the first application of bootstrap prepivoting we have that $H_{J_n}(x, P) = U(x) + O(n^{-j^*/2})$ and $R_{J_n}(x, \hat P) = U(x) + O_P(n^{-j^*/2})$, with $j^* = j + 1$ under the setting of Theorem 4 or $j^* = j$ under the setting of Theorem 3. Denote now $H^{(1)}_{J_n} = H_{J_n}$ and $R^{(1)}_{J_n} = R_{J_n}$ to reinforce that these are the true and permutation distributions after a single application of bootstrap prepivoting.

Algorithm 1: Prepivoted permutation test using bootstrap prepivoting
Input: The observed data $Z$; the test statistic $T_n(\cdot)$ and its bootstrap modification $\breve T_n(\cdot, \cdot)$; the number of bootstrap and permutation draws nboot and nperm.
Result: The $p$-value for the bootstrap prepivoted test statistic.

Compute $T_n(Z)$.
for b = 1 to nboot do
    Draw $n_i$ observations $Z^*_{i1}, \ldots, Z^*_{i n_i}$ from $\hat P_i$ (the empirical distribution of $Z_i$) for $i = 1, \ldots, k$.
    Stack $Z^*_1, \ldots, Z^*_k$ as a matrix $Z^*$.
    Compute $t_b = \breve T_n(Z^*, Z)$.
end
Compute $\hat J_{T_n}\{T_n(Z), \hat P\} = \mathrm{nboot}^{-1} \sum_{b=1}^{\mathrm{nboot}} 1\{t_b \le T_n(Z)\}$.
for p = 1 to nperm do
    Draw $\pi$ uniformly from the set of permutations $\mathcal{P}_n$.
    Create $Z_\pi$ by rearranging the rows of $Z$ based upon the permutation $\pi$.
    Compute $T_n(Z_\pi)$.
    for b = 1 to nboot do
        Draw $n_i$ observations $Z^*_{\pi i 1}, \ldots, Z^*_{\pi i n_i}$ from $\hat P_{\pi i}$ (the empirical distribution of $Z_{\pi i}$) for $i = 1, \ldots, k$.
        Stack $Z^*_{\pi 1}, \ldots, Z^*_{\pi k}$ as a matrix $Z^*_\pi$.
        Compute $t_{\pi b} = \breve T_n(Z^*_\pi, Z_\pi)$.
    end
    Compute $j_p = \hat J_{T_n}\{T_n(Z_\pi), \hat P_\pi\} = \mathrm{nboot}^{-1} \sum_{b=1}^{\mathrm{nboot}} 1\{t_{\pi b} \le T_n(Z_\pi)\}$.
end
return $p_{\mathrm{val}} = (1 + \mathrm{nperm})^{-1}\big[1 + \sum_{p=1}^{\mathrm{nperm}} 1\{j_p \ge \hat J_{T_n}\{T_n(Z), \hat P\}\}\big]$.

$J^{(1)}_{J_n}(x, \hat P)$ and $J^{(1)}_{J_n}(x, \hat P_\Pi)$ estimate these distributions. Supposing the existence of higher-order expansions for $H^{(1)}_{J_n}$, $R^{(1)}_{J_n}$ and their bootstrap analogues, in regular cases they would take the form

\[ H^{(1)}_{J_n}(x, P) = U(x) + n^{-j^*/2}\,\tilde a_{T_n}(x, P) + O(n^{-(j^*+1)/2}); \]
\[ R^{(1)}_{J_n}(x, \hat P) = U(x) + n^{-j^*/2}\,\tilde b_{T_n}(x, \hat P) + O_P(n^{-(j^*+1)/2}); \]
\[ J^{(1)}_{J_n}(x, \hat P) = U(x) + n^{-j^*/2}\,\tilde k_{T_n}(x, \hat P) + O_P(n^{-(j^*+1)/2}); \]
\[ J^{(1)}_{J_n}(x, \hat P_\Pi) = U(x) + n^{-j^*/2}\,\tilde k_{T_n}(x, \hat P_\Pi) + O_P(n^{-(j^*+1)/2}). \]

Consider the test statistic $J^{(1)}_{J_n}[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}, \hat P_\pi]$ and suppose that $\tilde a_{T_n}(x, P) - \tilde k_{T_n}(x, \hat P)$ and $\tilde b_{T_n}(x, \hat P) - \tilde k_{T_n}(x, \hat P_\Pi)$ are both $O_P(n^{-1/2})$ uniformly over $x$. The argument underpinning Theorem 4 would then apply because the lead terms in the expansions all equal $U(x)$.
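For concreteness, the single-level procedure of Algorithm 1 can be sketched in a few lines of Python for the two-sample difference in means. This is a minimal illustration, not the paper's general construction: the function names are ours, and taking the group-mean difference recentered at the observed group means as the bootstrap modification $\breve T_n$ is an assumption appropriate to this simple choice of $T_n$.

```python
import numpy as np

def t_stat(z, n1):
    # A hypothetical choice of T_n: the unstudentized difference in
    # means between the first n1 observations and the rest, scaled by sqrt(n).
    return np.sqrt(z.size) * (z[:n1].mean() - z[n1:].mean())

def boot_cdf(z, n1, t, nboot, rng):
    # Bootstrap distribution function of the recentered statistic
    # (the modification \breve T_n), evaluated at t: each group is
    # resampled from its own empirical distribution and recentered
    # at that group's sample mean.
    g1, g2 = z[:n1], z[n1:]
    b1 = rng.choice(g1, size=(nboot, g1.size), replace=True)
    b2 = rng.choice(g2, size=(nboot, g2.size), replace=True)
    tb = np.sqrt(z.size) * ((b1.mean(axis=1) - g1.mean())
                            - (b2.mean(axis=1) - g2.mean()))
    return np.mean(tb <= t)

def prepivoted_perm_test(z, n1, nperm=999, nboot=500, seed=0):
    # Permutation p-value for the bootstrap prepivoted statistic:
    # compare the observed prepivoted value against its values over
    # uniformly drawn permutations of the rows of z.
    rng = np.random.default_rng(seed)
    j_obs = boot_cdf(z, n1, t_stat(z, n1), nboot, rng)
    exceed = 0
    for _ in range(nperm):
        zp = rng.permutation(z)  # draw pi uniformly; rearrange rows of Z
        exceed += boot_cdf(zp, n1, t_stat(zp, n1), nboot, rng) >= j_obs
    return (1 + exceed) / (1 + nperm)
```

The inner bootstrap is vectorized over the nboot resamples, so the cost is roughly nperm times that of a single bootstrap; the second-level transform described above would wrap another resampling layer around `boot_cdf`.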
Letting $H^{(2)}_{J_n}$ and $R^{(2)}_{J_n}$ be the true and permutation distributions for $J^{(1)}_{J_n}[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}, \hat P_\pi]$, the statistic after two applications of bootstrap prepivoting, we would have $H^{(2)}_{J_n}(x, P) - R^{(2)}_{J_n}(x, \hat P) = O_P(n^{-(j^*+1)/2})$. This translates into an improvement from $O(n^{-j^*/2})$ to $O(n^{-(j^*+1)/2})$ in the error in rejection probability under $H_\theta(P)$ when permuting $J^{(1)}_{J_n}[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}, \hat P_\pi]$.

Further iterations of bootstrap prepivoting may be applied, so long as the asymptotic expansions exist in the required form at each iteration. For instance, the third application of bootstrap prepivoting applies the bootstrap transform

\[ J^{(2)}_{J_n}(x, \hat P_\pi) = \mathrm{pr}\big(J^{(1)}_{J_n}[J_{T_n}\{\breve T_n(Z^{***}_\pi, Z^{**}_\pi), \hat P^{**}_\pi\}, \hat P^*_\pi] \le x \mid Z_\pi\big), \]

where for $i = 1, \ldots, k$, $Z^{***}_{\pi i 1}, \ldots, Z^{***}_{\pi i n_i}$ are independent and identically distributed draws from $\hat P^{**}_{\pi i}$, the empirical distribution of $Z^{**}_{\pi i 1}, \ldots, Z^{**}_{\pi i n_i}$, and $\hat P^{**}_\pi = \prod_{i=1}^k \prod_{j=1}^{n_i} \hat P^{**}_{\pi i}$. In general, the $\ell$th application yields $H^{(\ell)}_{J_n}(x, P) - R^{(\ell)}_{J_n}(x, \hat P) = O_P(n^{-(j^* + \ell - 1)/2})$ uniformly in $x$ if the necessary expansions exist.

D Theorem 1 and the role of contrast matrices
Under Conditions 1 and 2, unconditional asymptotic normality for $V_n(Z) = \mathrm{vec}\{\hat\Theta(Z)\hat C(Z)\}$ follows immediately from the central limit theorem and Slutsky's theorem. The form of the covariance given in (2) may be derived using Henderson and Searle (1979, Equation 6). To study the probability limit of $R_{V_n}(x)$ at all points $x$, we use a technique devised by Hoeffding (1952) and generalized by Dümbgen and Del Conte-Zerial (2013) and Chung and Romano (2013, 2016c), which characterizes weak convergence in probability of permutation distributions in terms of the joint unconditional convergence of suitable test statistics. Adapting Lemma A.1 of Chung and Romano (2016c) to our setting:

Lemma 1.
Let $\Pi$ and $\Pi'$ be independent and identically distributed uniformly over $\mathcal{P}_n$, let $Z$ be distributed according to $P^n$ as before, and let $Z$ be independent of $\Pi$ and $\Pi'$. Then $R_{V_n}$ converges weakly in probability to some law $Q$ if and only if $\{V_n(Z_\Pi), V_n(Z_{\Pi'})\}$ converges in distribution to $(U, U')$, where $U$ and $U'$ are independent and identically distributed with law $Q$.

Hence in order for $R_{V_n}$ to converge weakly in probability to a fixed law $Q$, $\{V_n(Z_\Pi), V_n(Z_{\Pi'})\}$ must converge jointly in distribution to random variables distributed according to $Q$ that are asymptotically independent of one another. Under Conditions 1 and 2 the limit law $Q$ for $V_n$ must be that of a multivariate Gaussian. Therefore, asymptotic independence of the limiting random variables $U$ and $U'$ can be established by showing that $\mathrm{Cov}(U_i, U'_{i'}) = 0$ for all $i, i'$.

Let $\Theta(\bar P)$ be the $d \times k$ matrix with $\theta(\bar P)$ in each of the $k$ columns, and let $\bar Z$ contain $n$ independent and identically distributed draws from the mixture distribution $\bar P$. We now apply the Cramér-Wold device to directly leverage the proof of Theorem 3.1 in Chung and Romano (2013) for the case $d = 1$. Chung and Romano (2013) use slightly different scaling in their proof. To relate their results to ours, define $\tilde\theta_i(\bar P) = n_i^{1/2}\theta(\bar P)$ and $\hat{\tilde\theta}(\bar Z_{\Pi i}) = n_i^{1/2}\hat\theta(\bar Z_{\Pi i})$. Let $\tilde\Theta(\bar P)$ be the $d \times k$ matrix with $\tilde\theta_i(\bar P)$ in the $i$th column, and define $\hat{\tilde\Theta}(Z_\Pi)$ analogously.

The proof of Theorem 3.1 in Chung and Romano (2013) establishes asymptotic normality for $[\mathrm{vec}\{\hat{\tilde\Theta}(\bar Z_\Pi) - \tilde\Theta(\bar P)\}, \mathrm{vec}\{\hat{\tilde\Theta}(\bar Z_{\Pi'}) - \tilde\Theta(\bar P)\}]^T$ when $d = 1$ under Conditions 1 and 2.
Applying the Cramér-Wold device and recalling the relationship between $\hat\Theta(Z_\Pi)$ and $\hat{\tilde\Theta}(Z_\Pi)$, it follows that $[n^{1/2}\mathrm{vec}\{\hat\Theta(\bar Z_\Pi) - \Theta(\bar P)\}, n^{1/2}\mathrm{vec}\{\hat\Theta(\bar Z_{\Pi'}) - \Theta(\bar P)\}]^T$ converges in distribution to a mean zero multivariate Gaussian $\{\mathrm{vec}(D), \mathrm{vec}(D')\}$, where $D$ and $D'$ are the limit distributions of $n^{1/2}\{\hat\Theta(\bar Z_\Pi) - \Theta(\bar P)\}$ and $n^{1/2}\{\hat\Theta(\bar Z_{\Pi'}) - \Theta(\bar P)\}$ respectively and are of dimension $d \times k$. The covariance matrices for $\mathrm{vec}(D)$ and $\mathrm{vec}(D')$ are identical and equal $\bar\Gamma$, a $kd \times kd$ block-diagonal matrix with $\bar\Sigma/p_i$ in the $i$th of $k$ blocks as described in the statement of Theorem 1. On the off-diagonal, modifying Equation (S25) of Chung and Romano (2013) for our scaling we have for any columns $i, i'$ and any rows $j, j'$

\[ \mathrm{cov}(D_{ji}, D'_{j'i'}) = \bar\Sigma_{jj'}, \tag{1} \]

where $D_{ji}$ and $D'_{j'i'}$ are the $\{j, i\}$ and $\{j', i'\}$ entries of $D$ and $D'$.

Lemma 1 cannot be applied directly because (1) does not equal zero. Recall from Condition 3 that $\bar C$ is a matrix of column contrasts and is assumed to be the probability limit of $\hat C(Z_\Pi)$. We now consider the limit distribution of $(n^{1/2}\mathrm{vec}[\{\hat\Theta(\bar Z_\Pi) - \Theta(\bar P)\}\bar C], n^{1/2}\mathrm{vec}[\{\hat\Theta(\bar Z_{\Pi'}) - \Theta(\bar P)\}\bar C])^T$, which equals $[n^{1/2}\mathrm{vec}\{\hat\Theta(\bar Z_\Pi)\bar C\}, n^{1/2}\mathrm{vec}\{\hat\Theta(\bar Z_{\Pi'})\bar C\}]^T$ because $\bar C$ is a matrix of column contrasts and the columns of $\Theta(\bar P)$ are identical. This vector converges in distribution to $\{\mathrm{vec}(D\bar C), \mathrm{vec}(D'\bar C)\}^T$ with $(D, D')$ defined as before. Choose any two columns $\ell$ and $\ell'$ of $\bar C$ and any two rows $j$ and $j'$ of $\hat\Theta(\bar Z_\Pi)$, and consider the limiting value of the covariance between $n^{1/2}\sum_{i=1}^k \hat\theta_j(\bar Z_{\Pi i})\bar C_{i\ell}$ and $n^{1/2}\sum_{i=1}^k \hat\theta_{j'}(\bar Z_{\Pi' i})\bar C_{i\ell'}$, i.e. $\mathrm{cov}(\sum_{i=1}^k D_{ji}\bar C_{i\ell}, \sum_{i=1}^k D'_{j'i}\bar C_{i\ell'})$:

\[ \mathrm{cov}\Big(n^{1/2}\sum_{i=1}^k \hat\theta_j(\bar Z_{\Pi i})\bar C_{i\ell},\; n^{1/2}\sum_{i=1}^k \hat\theta_{j'}(\bar Z_{\Pi' i})\bar C_{i\ell'}\Big) \to \bar\Sigma_{jj'}\sum_{i=1}^k \bar C_{i\ell}\sum_{i=1}^k \bar C_{i\ell'} = 0, \tag{2} \]

where the last equality follows because $\bar C$ is a matrix of column contrasts, so each of its columns sums to zero.
This holds for any columns $\ell$ and $\ell'$ of $\bar C$ and for any $\hat\theta_j(\bar Z_\Pi)$, $\hat\theta_{j'}(\bar Z_{\Pi'})$. As a result, $\mathrm{cov}[n^{1/2}\mathrm{vec}\{\hat\Theta(\bar Z_\Pi)\bar C\}, n^{1/2}\mathrm{vec}\{\hat\Theta(\bar Z_{\Pi'})\bar C\}]$ converges to a $kd \times kd$ matrix containing all zeroes. Using asymptotic normality, it follows that $\mathrm{vec}\{n^{1/2}\hat\Theta(\bar Z_\Pi)\bar C\}$ and $\mathrm{vec}\{n^{1/2}\hat\Theta(\bar Z_{\Pi'})\bar C\}$ are asymptotically independent. The form of the covariance given in (3) may again be derived from Henderson and Searle (1979, Equation 6).

The above proof applied to independent permutations of $\bar Z$, containing $n$ independent and identically distributed draws from $\bar P$. To complete the proof for $Z$, containing $n_i$ samples from $P_i$ for $i = 1, \ldots, k$, we appeal to the coupling construction described in Section 5.3 of Chung and Romano (2013); see the proof of Theorem 3.1 of Chung and Romano (2013) for details. Lemma 1 along with Slutsky's theorem for permutation distributions may then be applied to conclude that the permutation distribution $R_{V_n}$ converges weakly in probability to the law of a mean zero multivariate Gaussian with covariance (3), as desired.

The above derivation shows the necessity of the columns of $\bar C$ being contrasts under our conditions. Suppose some column $\ell$ of $\bar C$ were not a contrast. Then (2) with $\ell = \ell'$ and taking $j = j'$ for any $j$ would only be guaranteed to be zero in the degenerate case where $\bar\Sigma_{jj} = 0$ for all $j$, which is disallowed under Condition 2. If (2) does not equal zero, asymptotic independence cannot be achieved, and hence the permutation distribution for $n^{1/2}\mathrm{vec}\{\hat\Theta(Z)\hat C(Z)\}$ cannot converge weakly in probability to the law of any fixed distribution by Lemma 1.

E Proof of Theorem 2
We prove the statement for $H_{F_n}(x, P)$, the proof for $R_{F_n}(x, P)$ being analogous. Let $F^-_{T_n}(\alpha, \hat P) = \inf\{x : F_{T_n}(x, \hat P) \ge \alpha\}$ be defined as before, and define $\tilde F^-_{T_n}(\alpha, \hat P) = \sup\{x : F_{T_n}(x, \hat P) \le \alpha\}$. Recalling that $H_{F_n}(x, P) = \mathrm{pr}[F_{T_n}\{T_n(Z), \hat P\} \le x]$, we have

\[ \mathrm{pr}\{T_n(Z) \le \tilde F^-_{T_n}(x, \hat P)\} \le \mathrm{pr}[F_{T_n}\{T_n(Z), \hat P\} \le x] \le \mathrm{pr}\{T_n(Z) \le F^-_{T_n}(x, \hat P)\}. \]

As $H_T(x, P)$ is continuous and strictly increasing as a function of $x$, we have by Lemma 11.2.1 of Lehmann and Romano (2005) that $F^-_{T_n}(x, \hat P)$ converges in probability to $H^-_T(x, P)$ for any $x \in [0, 1]$, and $\tilde F^-_{T_n}(x, \hat P)$ also converges in probability to $H^-_T(x, P)$ for any $x \in [0, 1]$. Consequently, $\mathrm{pr}\{T_n(Z) \le F^-_{T_n}(x, \hat P)\}$ and $\mathrm{pr}\{T_n(Z) \le \tilde F^-_{T_n}(x, \hat P)\}$ both converge to $H_T\{H^-_T(x, P), P\} = U(x)$, the distribution function of a uniform random variable on $[0, 1]$, implying convergence of $\mathrm{pr}[F_{T_n}\{T_n(Z), \hat P\} \le x]$ to $U(x)$ at all points $x \in [0, 1]$.

F Proof of Proposition 1
Under Condition 3, the function

\[ h(\Gamma, n^{1/2}\Theta C, \eta, C) = \gamma^{(kd)}_{0,\Gamma}\big[\{a : g(\mathrm{vec}^{-1}_{d,k}(a)C, \eta) \le g(n^{1/2}\Theta C, \eta)\}\big], \]

where $\gamma^{(kd)}_{0,\Gamma}$ denotes the $N(0, \Gamma)$ measure on $\mathbb{R}^{kd}$, is jointly continuous in $(\Gamma, n^{1/2}\Theta C, \eta, C)$; see Lemma B of Cohen and Fogarty (2020) for a proof. Moreover, observe that by Theorem 1 and the continuous mapping theorem, $h\{\Gamma, n^{1/2}\hat\Theta(Z)C, \eta, C\}$ converges in distribution to a standard uniform, and the permutation distribution of $h\{\bar\Gamma, n^{1/2}\hat\Theta(Z)\bar C, \bar\eta, \bar C\}$ converges weakly in probability to the law of a standard uniform distribution. Note that $K_{T_n}\{T_n(Z_\pi), \hat P_\pi\} = h\{\hat\Gamma(Z_\pi), n^{1/2}\hat\Theta(Z_\pi)\hat C(Z_\pi), \hat\eta(Z_\pi), \hat C(Z_\pi)\}$. That $\{\hat C(Z), \hat\eta(Z), \hat\Gamma(Z)\}$ converges in probability to $(C, \eta, \Gamma)$ by Conditions 3 and 4 provides the conclusion of the proposition pertaining to the unconditional law $H_{K_n}$ through Theorem 1, Slutsky's theorem, and the continuous mapping theorem. Next, $\{\hat C(Z_\Pi), \hat\eta(Z_\Pi), \hat\Gamma(Z_\Pi)\}$ converges in probability to $(\bar C, \bar\eta, \bar\Gamma)$ by Conditions 3 and 4, with the result for $\hat\Gamma(Z_\Pi)$ using contiguity results in Lemma 5.3 of Chung and Romano (2013), available to us under Condition 1. Hence, Slutsky's theorem and the continuous mapping theorem for permutation distributions (Chung and Romano, 2016c, Lemmas A.4 and A.5) in concert with Theorem 1 then provide the result for the permutation distribution $R_{K_n}$.

G Proof of Proposition 2
The proof closely follows that of Theorem 2.6 of Chung and Romano (2016c), differing primarily in the condition used to ensure bootstrap consistency. For each $i$ let $G_{n,i}(x, \hat P_{\pi i})$ be the bootstrap estimator for the distribution of $n_i^{1/2}\{\hat\theta(Z_{\pi i}) - \theta(\bar P)\}$, i.e. the distribution of $n_i^{1/2}\{\hat\theta(Z^*_{\pi i}) - \hat\theta(Z_{\pi i})\}$ given $Z_\pi$ when $Z^*_{\pi i}$ contains $n_i$ independent and identically distributed draws from $\hat P_{\pi i}$. Fix $\delta > 0$ and $\varepsilon > 0$, and define the set $\mathcal{G}_n$ as

\[ \mathcal{G}_n = \{\pi : \sup_x |G_{n,i}(x, \hat P_{\pi i}) - G_i(x, \bar P)| < \delta,\; i = 1, \ldots, k\} \cap \{\pi : \mathrm{pr}\{|\hat\eta(Z^*_\pi) - \bar\eta| > \varepsilon \wedge |\hat C(Z^*_\pi) - \bar C| > \varepsilon \mid Z_\pi\} < \delta\}, \]

where $G_i(x, \bar P)$ is the distribution function of a multivariate Gaussian with mean zero and covariance $\bar\Sigma/p_i$. Let $\mathcal{G}^c_n$ be the complement of $\mathcal{G}_n$. The randomization distribution $R_{J_n}(x, \hat P)$ of $J_{T_n}\{T_n(Z_\pi), \hat P_\pi\}$ can be expressed as

\[ R_{J_n}(x, \hat P) = \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\} \le x] + \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}^c_n} 1[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\} \le x]. \]

We begin by showing that $|\mathcal{G}_n|/|\mathcal{P}_n|$ converges to 1 in probability. To do so, we can show that for each $i$ and for any $\delta, \varepsilon > 0$,

\[ \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{P}_n} 1\{\sup_x |G_{n,i}(x, \hat P_{\pi i}) - G_i(x, \bar P)| < \delta\} \xrightarrow{P} 1, \]

and that

\[ \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{P}_n} 1[\mathrm{pr}\{|\hat\eta(Z^*_\pi) - \bar\eta| > \varepsilon \wedge |\hat C(Z^*_\pi) - \bar C| > \varepsilon \mid Z_\pi\} < \delta] \xrightarrow{P} 1. \]

It is sufficient to show that for each $i$

\[ \mathrm{pr}(\sup_x |G_{n,i}(x, \hat P_{\Pi i}) - G_i(x, \bar P)| > \delta) \to 0, \tag{3} \]

and that

\[ \mathrm{pr}[\mathrm{pr}\{|\hat\eta(Z^*_\Pi) - \bar\eta| > \varepsilon \wedge |\hat C(Z^*_\Pi) - \bar C| > \varepsilon \mid Z_\Pi\} < \delta] \to 1. \tag{4} \]

(4) holds for any $\varepsilon, \delta$ by Condition 6. To prove (3), we use the contiguity results presented in Lemma 5.3 of Chung and Romano (2013), which are at our disposal under Condition 1. Letting $\hat Q_i$ be the empirical distribution of $Y_1, \ldots, Y_{n_i}$ when the $Y_j$ are independent and identically distributed according to $\bar P$, Lemma 5.3 of Chung and Romano (2013) shows that (3) holds if $\mathrm{pr}(\sup_x |G_{n,i}(x, \hat Q_i) - G_i(x, \bar P)| > \delta) \to 0$ for $i = 1, \ldots, k$, and this holds by the bootstrap central limit theorem of Liu et al. (1989). Hence, (3) holds for any $\delta > 0$, implying the weak consistency of the bootstrap for $G_{n,i}(x, \hat P_{\Pi i})$ for all $i$. Furthermore, observe that given any $Z_\pi$, the random variables $n_i^{1/2}\{\hat\theta(Z^*_{\pi i}) - \hat\theta(Z_{\pi i})\}$ are jointly independent for $i = 1, \ldots, k$ by construction. Therefore, an application of the Cramér-Wold device yields the joint convergence of $[n_1^{1/2}\{\hat\theta(Z^*_{\pi 1}) - \hat\theta(Z_{\pi 1})\}, \ldots, n_k^{1/2}\{\hat\theta(Z^*_{\pi k}) - \hat\theta(Z_{\pi k})\}]^T$ to $k$ independent multivariate Gaussians, with componentwise distribution functions $G_i(x, \bar P)$ for $i = 1, \ldots, k$.

For any $\delta > 0$ and $\varepsilon > 0$ we may thus write $R_{J_n}$ as

\[ R_{J_n}(x, \hat P) = \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\} \le x] + o_P(1). \tag{5} \]

Recall the construction of $\breve T_n(Z^*_\pi, Z_\pi)$ in (5) used in defining $J_{T_n}$,

\[ \breve T_n(Z^*_\pi, Z_\pi) = g\{n^{1/2}(\hat\Theta^*_\pi - \hat\Theta_\pi)\hat C^*_\pi, \hat\eta^*_\pi\}, \]

where the $i$th column of $\hat\Theta^*_\pi - \hat\Theta_\pi$ is $\hat\theta(Z^*_{\pi i}) - \hat\theta(Z_{\pi i})$. Applying the continuous mapping theorem and Slutsky's theorem, we then have that $J_{T_n}(x, \hat P_\pi)$ converges in probability to $R_T(x, P)$ at all points $x$, where $R_T(x, P)$ is the distribution of $g(U, \bar\eta)$ and $U$ has a multivariate normal distribution with covariance given by (3) in the manuscript. Therefore, with probability tending to one, the remaining term in (5) is bounded for any $\epsilon > 0$ by

\[ \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[R_T\{T_n(Z_\pi), P\} \le x - \epsilon] \le \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\} \le x] \le \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[R_T\{T_n(Z_\pi), P\} \le x + \epsilon]. \]

Using Theorem 1 and the continuous mapping theorem for permutation distributions (Chung and Romano, 2016c, Lemma A.6), we know that the permutation distribution of $T_n$, $R_{T_n}(x, \hat P)$, also converges in probability to $R_T(x, P)$ at all points $x$, with $R_T(\cdot, P)$ continuous and strictly increasing at $R^-_T(\cdot, P)$ by assumption.
Again applying the continuous mapping theorem for permutation distributions, we have

\[ \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[R_T\{T_n(Z_\pi), P\} \le x - \epsilon] \xrightarrow{P} x - \epsilon; \quad \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[R_T\{T_n(Z_\pi), P\} \le x + \epsilon] \xrightarrow{P} x + \epsilon, \]

such that for any $\epsilon > 0$, in the limit,

\[ x - \epsilon \le \frac{1}{|\mathcal{P}_n|}\sum_{\pi \in \mathcal{G}_n} 1[J_{T_n}\{T_n(Z_\pi), \hat P_\pi\} \le x] \le x + \epsilon. \]

Sending $\epsilon$ to zero completes the proof.

H Proof of Theorem 3
Proof.
Observe that $K_{T_n}(x, \hat P) = A_{T_n}(x, P) + O_P(n^{-\tilde j/2})$ and that $A_{T_n}(x, P) = H_{T_n}(x, P) + O(n^{-j/2})$. By continuity of $a_{T_n}(x, P)$ and (7), we have $\mathrm{pr}[H_{T_n}\{T_n(Z), P\} \le x] = U(x) + O(n^{-j/2})$ uniformly in $x$. Therefore, we have

\[ \mathrm{pr}[K_{T_n}\{T_n(Z), \hat P\} \le x] = \mathrm{pr}[H_{T_n}\{T_n(Z), P\} + O_P(n^{-\tilde j/2}) + O(n^{-j/2}) \le x] = U(x) + O(n^{-j'/2}) \]

uniformly in $x$, where $j' = \min\{j, \tilde j\}$. As a result,

\[ \mathrm{pr}[J_{T_n}\{T_n(Z), \hat P\} \le x] = \mathrm{pr}[K_{T_n}\{T_n(Z), \hat P\} + O_P(n^{-j/2}) \le x] = U(x) + O(n^{-j'/2}) \]

uniformly in $x$. The analogous derivation yields that

\[ R_{J_n}(x, \hat P) = \mathrm{pr}\big[J_{T_n}\{T_n(Z_\Pi), \hat P_\Pi\} \le x \mid Z\big] = U(x) + O_P(n^{-j'/2}) \]

uniformly in $x$, such that $H_{J_n}(x, P) - R_{J_n}(x, \hat P) = O_P(n^{-j'/2})$ uniformly in $x$.

I Proof of Theorem 4
Proof. As $a_{T_n}(x, P) - k_{T_n}(x, \hat P) = O_P(n^{-1/2})$ uniformly in $x$ under the conditions of the theorem, we have

\[ H_{T_n}(x, P) - J_{T_n}(x, \hat P) = n^{-j/2}\{a_{T_n}(x, P) - k_{T_n}(x, \hat P)\} + O_P(n^{-(j+1)/2}) = O_P(n^{-(j+1)/2}) \]

uniformly in $x$. Further observe that under the null, by continuity of $a_{T_n}(x, P)$,

\[ \mathrm{pr}[H_{T_n}\{T_n(Z), P\} \le x] = U(x) + O(n^{-(j+1)/2}) \]

uniformly in $x$. As a consequence,

\[ H_{J_n}(x, P) = \mathrm{pr}[J_{T_n}\{T_n(Z), \hat P\} \le x] = \mathrm{pr}[H_{T_n}\{T_n(Z), P\} \le x + O_P(n^{-(j+1)/2})] = U(x) + O(n^{-(j+1)/2}) \]

uniformly in $x$. An analogous derivation gives

\[ R_{T_n}(x, \hat P) - J_{T_n}(x, \hat P_\Pi) = O_P(n^{-(j+1)/2}); \quad \mathrm{pr}[R_{T_n}\{T_n(Z_\Pi), \hat P\} \le x \mid Z] = U(x) + O_P(n^{-(j+1)/2}), \]

uniformly in $x$, such that

\[ R_{J_n}(x, \hat P) = \mathrm{pr}\big[J_{T_n}\{T_n(Z_\Pi), \hat P_\Pi\} \le x \mid Z\big] = \mathrm{pr}\big[R_{T_n}\{T_n(Z_\Pi), \hat P\} \le x + O_P(n^{-(j+1)/2}) \mid Z\big] = U(x) + O_P(n^{-(j+1)/2}) \]

uniformly in $x$. This yields $H_{J_n}(x, P) - R_{J_n}(x, \hat P) = O_P(n^{-(j+1)/2})$ uniformly in $x$.

References

Abramovitch, L. and Singh, K. (1985). Edgeworth corrected pivotal statistics and the bootstrap. The Annals of Statistics, 13(1):116–132.
Babu, G. J. and Bai, Z. (1996). Mixtures of global and local Edgeworth expansions and their applications. Journal of Multivariate Analysis, 59(2):282–307.
Babu, G. J. and Singh, K. (1983). Inference on means using the bootstrap. The Annals of Statistics, 11(3):999–1003.
Babu, G. J. and Singh, K. (1985). Edgeworth expansions for sampling without replacement from finite populations. Journal of Multivariate Analysis, 17(3):261–278.
Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika, 74(3):457–468.
Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83(403):687–697.
Bhattacharya, R. N. and Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. The Annals of Statistics, 6(2):434–451.
Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. The Annals of Statistics, pages 1196–1217.
Booth, J. G., Butler, R. W., and Hall, P. (1994). Bootstrap methods for finite populations. Journal of the American Statistical Association, 89(428):1282–1289.
Booth, J. G. and Hall, P. (1993). An improvement of the jackknife distribution function estimator. The Annals of Statistics, 21(3):1476–1485.
Bücher, A. and Kojadinovic, I. (2019). A note on conditional versus joint unconditional weak convergence in bootstrap consistency results. Journal of Theoretical Probability, 32(3):1145–1165.
Chung, E. and Romano, J. P. (2013). Exact and asymptotically robust permutation tests. The Annals of Statistics, 41(2):484–507.
Chung, E. and Romano, J. P. (2016a). Asymptotically valid and exact permutation tests based on two-sample U-statistics. Journal of Statistical Planning and Inference, 168:97–105.
Chung, E. and Romano, J. P. (2016b). Multivariate and multiple permutation tests. Journal of Econometrics, 193(1):76–91.
Chung, E. and Romano, J. P. (2016c). Multivariate and multiple permutation tests. Journal of Econometrics, 193(1):76–91.
Cohen, P. L. and Fogarty, C. B. (2020). Gaussian prepivoting for finite population causal inference. arXiv preprint arXiv:2002.06654.
Davies, R. B. (1980). Algorithm AS 155: The distribution of a linear combination of χ² random variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3):323–333.
Dümbgen, L. and Del Conte-Zerial, P. (2013). On low-dimensional projections of high-dimensional distributions. In From Probability to Statistics and Back: High-Dimensional Models and Processes, A Festschrift in Honor of Jon A. Wellner, pages 91–104. Institute of Mathematical Statistics.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer Science & Business Media.
Hall, P. and Martin, M. (1988). On the bootstrap and two-sample problems. Australian Journal of Statistics, 30(1):179–192.
Hall, P. and Martin, M. A. (1989). A note on the accuracy of bootstrap percentile method confidence intervals for a quantile. Statistics & Probability Letters, 8(3):197–200.
Henderson, H. V. and Searle, S. (1979). Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Canadian Journal of Statistics, 7(1):65–81.
Hoeffding, W. (1952). The large-sample power of tests based on permutations of observations. The Annals of Mathematical Statistics, pages 169–192.
Janssen, A. (1997). Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens-Fisher problem. Statistics and Probability Letters, 36(1):9–21.
Lee, S. M. and Young, G. A. (2003). Prepivoting by weighted bootstrap iteration. Biometrika, 90(2):393–410.
Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses. Springer Science & Business Media.
Liu, R. Y., Singh, K., and Lo, S.-H. (1989). On a representation related to the bootstrap. Sankhyā: The Indian Journal of Statistics, Series A, (2):168–177.
Ludbrook, J. and Dudley, H. (1998). Why permutation tests are superior to t and F tests in biomedical research. The American Statistician, 52(2):127–132.
Neuhaus, G. (1993). Conditional rank tests for the two-sample problem under random censorship. The Annals of Statistics, 21(4):1760–1779.
Omelka, M. and Pauly, M. (2012). Testing equality of correlation coefficients in two populations via permutation methods. Journal of Statistical Planning and Inference, 142(6):1396–1406.
Pauly, M. (2011). Discussion about the quality of F-ratio resampling tests for comparing variances. Test, 20(1):163–179.
Romano, J. P. (1990). On the behavior of randomization tests without a group invariance assumption. Journal of the American Statistical Association, 85(411):686–692.
Ruxton, G. D. (2006). The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test. Behavioral Ecology, 17(4):688–690.
Sakov, A. and Bickel, P. J. (2000). An Edgeworth expansion for the m out of n bootstrapped median.