Asymptotic optimality of the Westfall--Young permutation procedure for multiple testing under dependence
The Annals of Statistics
Institute of Mathematical Statistics, 2011
ASYMPTOTIC OPTIMALITY OF THE WESTFALL–YOUNG PERMUTATION PROCEDURE FOR MULTIPLE TESTING UNDER DEPENDENCE

By Nicolai Meinshausen, Marloes H. Maathuis and Peter Bühlmann

University of Oxford, ETH Zürich and ETH Zürich
Test statistics are often strongly dependent in large-scale multiple testing applications. Most corrections for multiplicity are unduly conservative for correlated test statistics, resulting in a loss of power to detect true positives. We show that the Westfall–Young permutation method has asymptotically optimal power for a broad class of testing problems with a block-dependence and sparsity structure among the tests, when the number of tests tends to infinity.
1. Introduction.
We consider multiple hypothesis testing where the underlying tests are dependent. Such testing problems arise in many applications, in particular in the fields of genomics and genome-wide association studies [9, 18, 24], but also in astronomy and other fields [21, 26]. Popular multiple-testing procedures include the Bonferroni–Holm method [19], which strongly controls the family-wise error rate (FWER), and the Benjamini–Yekutieli procedure [3], which controls the false discovery rate (FDR), both under arbitrary dependence structures between test statistics. If test statistics are strongly dependent, these procedures have low power to detect true positives. The reasons for this loss of power are well known: loosely speaking, many strongly dependent test statistics carry only the information equivalent to fewer "effective" tests. Hence, instead of correcting among many multiple tests, one would in principle only need to correct for the smaller number of "effective" tests. Moreover, when controlling some error measure of false positives, an oracle would only need to adjust among the tests corresponding to true negatives. In large-scale sparse multiple testing situations,
Received June 2011; revised November 2011. N. Meinshausen and M. H. Maathuis contributed equally to this work.
AMS 2000 subject classifications.
Key words and phrases.
Multiple testing under dependence, Westfall–Young procedure, permutations, familywise error rate, asymptotic optimality, high-dimensional inference, sparsity, rank-based nonparametric tests.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2011, Vol. 39, No. 6, 3369–3391. This reprint differs from the original in pagination and typographic detail.
this latter issue is usually less important, since the number of true positives is typically small, and the number of true negatives is close to the overall number of tests.

The dependence among tests can be taken into account by using the permutation-based Westfall–Young method [33], already widely used in practice (e.g., [6, 36]). Under the assumption of subset-pivotality (see Section 2.3 for a definition), this method strongly controls the FWER under any kind of dependence structure [32].

In this paper we show that the Westfall–Young permutation method is an optimal procedure in the following sense. We introduce a single-step oracle multiple testing procedure, by defining a single threshold such that all hypotheses with $p$-values below this threshold are rejected (see Section 2.2). The oracle threshold is the largest threshold that still guarantees the desired level of the testing procedure. The oracle threshold is unknown in practice if the dependence among test statistics and the set of true null hypotheses are unknown. We show that the single-step Westfall–Young threshold approximates the oracle threshold for a broad class of testing problems with a block-dependence and sparsity structure among the tests, when the number of tests tends to infinity. Our notion of asymptotic optimality relative to an oracle threshold is on a general level and applies for any specified test statistic. The power of a multiple testing procedure also depends on the data-generating distribution and the chosen individual test(s); we do not discuss this aspect here.
Instead, our goal is to analyze optimality once the individual tests have been specified.

Our optimality result has an immediate consequence for large-scale multiple testing: it is not possible to improve on the power of the Westfall–Young permutation method while still controlling the FWER when considering single-step multiple testing procedures for a large number of tests and assuming only a block-dependence and sparsity structure among the tests (and no additional modeling assumptions about the dependence or clustering/grouping). Hence, in such situations, there is no need to consider ad hoc proposals that are sometimes used in practice, at least when taking the viewpoint that multiple-testing-adjusted $p$-values should be as model free as possible.

1.1. Related work.
There is a small but growing literature on optimality in multiple testing under dependence. Sun and Cai [30] studied and proposed optimal decision procedures in a two-state hidden Markov model, while Genovese et al. [12] and Roeder and Wasserman [28] looked at the intriguing possibility of incorporating prior information by $p$-value weighting. The effect of correlation between test statistics on the level of FDR control was studied in Benjamini and Yekutieli [3] and Benjamini et al. [2]; see also Blanchard and Roquain [4] for FDR control under dependence. Furthermore, Clarke and Hall [7] discuss the effect of dependence and clustering when using "wrong" methods based on independence assumptions for controlling the (generalized) FWER and FDR. The effect of dependence on the power of Higher Criticism was examined in Hall and Jin [16, 17]. Another viewpoint is given by Efron [10], who proposed a novel empirical choice of an appropriate null distribution for large-scale significance testing. We do not propose new methodology in this manuscript but instead study the asymptotic optimality of the widely used Westfall–Young permutation method [33] for dependent test statistics.
2. Single-step oracle procedure and the Westfall–Young method.
After introducing some notation, we define our notion of a single-step oracle threshold and describe the Westfall–Young permutation method.

2.1. Preliminaries and notation.
Let $W$ be a data matrix containing $n$ independent realizations of an $m$-dimensional random variable $X = (X_1, \ldots, X_m)$ with distribution $P_m$ and possibly some additional deterministic response variables $y$.

Prototype of data matrix $W$. To make this more concrete, consider the following setting that fits the examples described in Section 3.2. Let $y$ be a deterministic variable, and allow the distribution of $X = X_y$ to depend on $y$. For each value $y^{(i)}$, $i = 1, \ldots, n$, we observe an independent sample $X^{(i)} = (X^{(i)}_1, \ldots, X^{(i)}_m)$ of $X = X_{y^{(i)}}$. We then define $W$ to be an $(m+1) \times n$-dimensional matrix by setting $W_{1,i} = y^{(i)}$ for $i = 1, \ldots, n$ and $W_{j+1,i} = X^{(i)}_j$ for $j = 1, \ldots, m$ and $i = 1, \ldots, n$. Thus, the first row of $W$ contains the $y$-variables, and the $i$th column of $W$ corresponds to the $i$th data sample $(y^{(i)}, X^{(i)})$.

Based on $W$, we want to test $m$ null hypotheses $H_j$, $j = 1, \ldots, m$, concerning the $m$ components $X_1, \ldots, X_m$ of $X$. For concrete examples, see Section 3.2. Let $I(P_m) \subseteq \{1, \ldots, m\}$ be the indices of the true null hypotheses, and let $I'(P_m)$ be the indices of the true alternative hypotheses, that is, $I'(P_m) = \{1, \ldots, m\} \setminus I(P_m)$. Let $P_0$ be a distribution under the complete null hypothesis, that is, $I(P_0) = \{1, \ldots, m\}$. We denote the class of all distributions under the complete null hypothesis by $\mathcal{P}_0$.

Suppose that the same test is applied for all hypotheses, and let $S_n \subseteq [0,1]$ denote the set of $p$-values this test can take. Thus, $S_n = [0,1]$ for $t$-tests and related approaches, while $S_n$ is discrete for permutation tests and rank-based tests. Let $p_j(W)$, $j = 1, \ldots, m$, be the $p$-values for the $m$ hypotheses, based on the chosen test and the data $W$.
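As a concrete illustration of the prototype construction, the following minimal NumPy sketch builds $W$ from $y$ and $X$ and applies a permutation to its first row only (the function names here are ours, introduced for illustration, not taken from the paper):

```python
import numpy as np

def make_W(y, X):
    """Prototype (m+1) x n data matrix: the first row holds the
    y-values, and row j+1 holds variable j across the n samples."""
    return np.vstack([np.asarray(y)[None, :], np.asarray(X).T])

def permute_W(W, g):
    """Action gW of a permutation g of {0, ..., n-1}: permute the
    first row (the y-values) and leave all other rows unchanged."""
    gW = W.copy()
    gW[0] = W[0, g]
    return gW

# toy usage: n = 5 samples, m = 3 variables
y = np.array([1, 2, 1, 2, 1])
X = np.arange(15).reshape(5, 3)   # row i is sample X^{(i)}
W = make_W(y, X)                  # shape (4, 5)
gW = permute_W(W, np.array([4, 3, 2, 1, 0]))
```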
2.2. Single-step oracle multiple testing procedure.
Suppose that we knew the true set of null hypotheses $I(P_m)$ and the distribution of $\min_{j \in I(P_m)} p_j(W)$ under $P_m$ (which is of course not true in practice). Then we could define the following single-step oracle multiple testing procedure: reject $H_j$ if $p_j(W) \le c_{m,n}(\alpha)$, where $c_{m,n}(\alpha)$ is the $\alpha$-quantile of $\min_{j \in I(P_m)} p_j(W)$ under $P_m$:
$$c_{m,n}(\alpha) = \max\Big\{ s \in S_n : P_m\Big( \min_{j \in I(P_m)} p_j(W) \le s \Big) \le \alpha \Big\}. \qquad (1)$$
Throughout, we define the maximum of the empty set to be zero, corresponding to a threshold $c_{m,n}(\alpha)$ that leads to zero rejections.

This oracle procedure controls the FWER at level $\alpha$, since, by definition,
$$P_m(H_j \text{ is rejected for at least one } j \in I(P_m)) = P_m\Big( \min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha) \Big) \le \alpha,$$
and it is optimal in the sense that values $c \in S_n$ with $c > c_{m,n}(\alpha)$ no longer control the FWER at level $\alpha$.

2.3. Single-step Westfall–Young multiple testing procedure.
The Westfall–Young permutation method is based on the idea that under the complete null hypothesis, the distribution of $W$ is invariant under a certain group of transformations $\mathcal{G}$; that is, for every $g \in \mathcal{G}$, $gW$ and $W$ have the same distribution under $P \in \mathcal{P}_0$. Romano and Wolf [29] refer to this as the "randomization hypothesis." In the sequel, $\mathcal{G}$ is the collection of all permutations $g$ of $\{1, \ldots, n\}$, so that the number of elements $|\mathcal{G}|$ equals $n!$.

Prototype permutation group $\mathcal{G}$ acting on the prototype data matrix $W$. In the examples in Section 3.2, $W$ is a prototype data matrix as described in Section 2.1. The prototype permutation $g \in \mathcal{G}$ leads to a matrix $gW$ obtained by permuting the first row of $W$ (i.e., permuting the $y$-variables). For all examples in Section 3.2, under the complete null hypothesis $P \in \mathcal{P}_0$, the distribution of $gW$ is then identical to the distribution of $W$ for all $g \in \mathcal{G}$, so that the randomization hypothesis is satisfied. We suppress the dependence of $|\mathcal{G}|$ on the sample size $n$ for notational simplicity.

The single-step Westfall–Young critical value is a random variable, defined as follows:
$$\hat{c}_{m,n}(\alpha) = \max\Big\{ s \in S_n : \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} 1\Big\{ \min_{j=1,\ldots,m} p_j(gW) \le s \Big\} \le \alpha \Big\} = \max\Big\{ s \in S_n : P^*\Big( \min_{j=1,\ldots,m} p_j(W) \le s \Big) \le \alpha \Big\},$$
where $1\{\cdot\}$ denotes the indicator function, and $P^*$ represents the permutation distribution
$$P^*(f(W) \le x) = \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} 1\{ f(gW) \le x \} \qquad (2)$$
for any function $f(\cdot)$ mapping $W$ into $\mathbb{R}$. In other words, $\hat{c}_{m,n}(\alpha)$ is the $\alpha$-quantile of the permutation distribution of $\min_{j=1,\ldots,m} p_j(W)$. Our main result (Theorem 1) shows that under some conditions, the Westfall–Young threshold $\hat{c}_{m,n}(\alpha)$ approaches the oracle threshold $c_{m,n}(\alpha)$.

It is easy to see that the Westfall–Young permutation method provides weak control of the FWER, that is, control of the FWER under the complete null hypothesis.
Under the assumption of subset-pivotality, it also provides strong control of the FWER [33], that is, control of the FWER under any set $I(P_m)$ of true null hypotheses. Subset-pivotality means that the distribution of $\{p_j(W) : j \in K\}$ is identical under the restrictions $\bigcap_{j \in K} H_j$ and $\bigcap_{j \in I(P_m)} H_j$ for all possible subsets $K \subseteq I(P_m)$ of true null hypotheses. Subset-pivotality is not a necessary condition for strong control; see, for example, Romano and Wolf [29], Westfall and Troendle [31] and Goeman and Solari [13].
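In practice, the critical value $\hat{c}_{m,n}(\alpha)$ is approximated with a random subset of permutations rather than all $n!$ of them. The following is a minimal sketch under simplifying assumptions of our own (binary labels and unit-variance Gaussian data, so that a two-sample $z$-test yields valid marginal $p$-values; any valid marginal test could be substituted):

```python
import math
import numpy as np

def marginal_pvalues(y, X):
    """Two-sided two-sample z-test p-values for each of the m columns
    of X, assuming unit-variance data (a simplifying assumption made
    for this sketch only)."""
    g0, g1 = X[y == 0], X[y == 1]
    se = math.sqrt(1.0 / len(g0) + 1.0 / len(g1))
    z = (g1.mean(axis=0) - g0.mean(axis=0)) / se
    # two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

def westfall_young_threshold(y, X, alpha=0.05, n_perm=1000, seed=0):
    """Monte Carlo single-step Westfall-Young critical value: the
    alpha-quantile of the permutation distribution of the minimal
    p-value over all m hypotheses, with n_perm random permutations
    standing in for the full group of n! permutations."""
    rng = np.random.default_rng(seed)
    min_p = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)   # permute the first row of W
        min_p[b] = marginal_pvalues(y_perm, X).min()
    return np.quantile(min_p, alpha)

# toy usage: two groups of 10 samples, m = 200 tests, complete null
y = np.repeat([0, 1], 10)
X = np.random.default_rng(1).normal(size=(20, 200))
c_hat = westfall_young_threshold(y, X, alpha=0.05, n_perm=500)
```

Under a complete null with $m = 200$ tests, the returned threshold lies far below $\alpha$, as expected for the $\alpha$-quantile of a minimum over many $p$-values.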
3. Asymptotic optimality of Westfall–Young.
We consider the framework where the number of hypotheses $m$ tends to infinity. This framework is suitable for high-dimensional settings arising, for example, in microarray experiments or genome-wide association studies.

3.1. Assumptions.

(A1) Block-independence: the $p$-values of all true null hypotheses adhere to a block-independence structure that is preserved under permutations in $\mathcal{G}$. Specifically, there exists a partition $A_1, \ldots, A_{B_m}$ of $\{1, \ldots, m\}$ such that for any pair of permutations $g, g' \in \mathcal{G}$,
$$\min_{\tilde{g} \in \{g, g'\}} \min_{j \in A_b \cap I(P_m)} p_j(\tilde{g}W), \qquad b = 1, \ldots, B_m,$$
are mutually independent under $P_m$. Here, the number of blocks is denoted by $B = B_m$. [We assume without loss of generality that $A_b \cap I(P_m) \neq \emptyset$ for all $b = 1, \ldots, B$, meaning that there is at least one true null hypothesis in each block; otherwise, the condition would be required only for blocks with $A_b \cap I(P_m) \neq \emptyset$.]

(A2) Sparsity: the number of alternative hypotheses that are true under $P_m$ is small compared to the number of blocks, that is, $|I'(P_m)|/B_m \to 0$ as $m \to \infty$.

(A3) Block-size: the maximum size of a block, $m_{B_m} := \max_{b=1,\ldots,B_m} |A_b|$, is of smaller order than the square root of the number of blocks, that is, $m_{B_m} = o(\sqrt{B_m})$ as $m \to \infty$.

(B1) Let $G$ be a random permutation taken uniformly from $\mathcal{G}$. Under $P_m$, the joint distribution of $\{p_j(W) : j \in I(P_m)\}$ is identical to the joint distribution of $\{p_j(GW) : j \in I(P_m)\}$.

(B2) Let $P^*$ be the permutation distribution in (2). There exists a constant $r < \infty$ such that for $s = c_{m,n}(\alpha) \in S_n$ and all $W$,
$$r^{-1} s \le P^*(p_j(W) \le s) \le r s \qquad \text{for all } j = 1, \ldots, m. \qquad (3)$$

(B3) The $p$-values corresponding to true null hypotheses are uniformly distributed; that is, for all $j \in I(P_m)$ and $s \in S_n$, we have $P_m(p_j(W) \le s) = s$.

A sufficient condition for the block-independence assumption (A1) is that for every fixed pair of permutations $g, g' \in \mathcal{G}$ the blocks of random variables $\{p_j(gW), p_j(g'W) : j \in A_b \cap I(P_m)\}$ are mutually independent for $b = 1, \ldots, B_m$. This condition is implied by block-independence of the $m$ last rows of the prototype $W$ for the examples discussed in Section 3.2 and for the prototype $\mathcal{G}$ as in Section 2.3. The block-independence assumption captures an essential characteristic of large-scale testing problems: a test statistic is often strongly correlated with a number of other test statistics but not at all with the remaining tests.

The sparsity assumption (A2) is appropriate in many contexts. Most genome-wide association studies, for example, aim to discover just a few locations on the genome that are associated with prevalence of a certain disease [20, 23]. Furthermore, assumption (A3) requires that the range of (block-)dependence is not too large, which seems reasonable in genomic applications: for example, when there are many different groups of genes (e.g., pathways), each of them not too large in cardinality, a block-dependence structure seems appropriate.

We now consider assumptions (B1)–(B3), supposing that we work with a prototype data matrix $W$ and a prototype permutation group $\mathcal{G}$ as described in Sections 2.1 and 2.3. Assumption (B1) is satisfied if each $p$-value $p_j(W)$ only depends on the 1st and $(j+1)$th rows of $W$. Moreover, subset-pivotality is satisfied in this setting. Assumption (B3) is satisfied for any test with valid type I error control. Assumption (B2) is fulfilled with $r = 1$ if for all $W$
$$P_G(p_j(GW) \le s \mid W) = s, \qquad j = 1, \ldots, m,\ s \in S_n, \qquad (4)$$
where $P_G$ is the probability with respect to a random permutation $G$ taken uniformly from $\mathcal{G}$, so that the left-hand side of (4) equals $P^*(p_j(W) \le s)$ in (3). Note that assumptions (B1) and (B3) together imply that
$$P_{m,G}(p_j(GW) \le s) = s, \qquad j \in I(P_m),\ s \in S_n, \qquad (5)$$
where the probability $P_{m,G}$ is with respect to a random draw of the data $W$ and a random permutation $G$ taken uniformly from $\mathcal{G}$. Thus, assumption (B2) holds if (5) is true for all $j = 1, \ldots, m$ when conditioned on the observed data. Section 3.2 discusses three concrete examples that satisfy assumptions (B1)–(B3) and subset-pivotality.

Remark.
For our theorems in Section 3.3, it would be sufficient if (3) held only with probability converging to 1 when sampling a random $W$. We keep the deterministic bound, however, since it is notationally easier, the extension is direct, and we are mostly interested in rank-based and conditional tests for which the deterministic bound holds.

3.2. Examples.
We now give three examples that satisfy assumptions (B1)–(B3), as well as subset-pivotality. As in Section 2.1, let $y$ be a deterministic scalar class variable and $X = (X_1, \ldots, X_m)$ an $m$-dimensional vector of random variables, where the distribution of $X = X_y$ can depend on $y$. Let the prototype data matrix $W$ and the prototype group of permutations $\mathcal{G}$ be defined as in Sections 2.1 and 2.3, respectively. In all examples, we work with tests with valid type I error control, and each $p$-value $p_j(W)$ only depends on the 1st and $(j+1)$th rows of $W$. Hence, assumptions (B1), (B3) and subset-pivotality are satisfied, and we focus on assumption (B2) in the remainder.

For the examples in Sections 3.2.1 and 3.2.2, we assume that there exists a $\mu(y) \in \mathbb{R}^m$ and an $m$-dimensional random variable $Z = (Z_1, \ldots, Z_m)$ such that
$$X = X_y = \mu(y) + Z. \qquad (6)$$
We omit the dependence of $X = X_y$ on $y$ in the following for notational simplicity.

3.2.1. Location-shift models.
We consider two-sample testing problems for location shifts, similar to Example 5 of Romano and Wolf [29]. Using the notation in (6), $y \in \{1, 2\}$ is a binary class variable, and the marginal distributions of $Z$ are assumed to have a median of zero. We are interested in testing the null hypotheses
$$H_j : \mu_j(1) = \mu_j(2), \qquad j = 1, \ldots, m,$$
versus the corresponding two-sided alternatives,
$$H'_j : \mu_j(1) \neq \mu_j(2), \qquad j = 1, \ldots, m.$$

We now discuss location-shift tests that satisfy assumption (B2). First, note that all permutation tests satisfy (B2) with $r = 1$, since the $p$-values in a permutation test are defined to fulfill $P^*(p_j(W) \le s) = s$ for all $s \in S_n$. Permutation tests are often recommended in biomedical research [22] and other large-scale location-shift testing applications due to their robustness with respect to the underlying distributions. For example, one can use the Wilcoxon test. Another example is a "permutation $t$-test": choose the $p$-value $p_j(W)$ as the proportion of permutations for which the absolute value of the $t$-test statistic is larger than or equal to the observed absolute value of the $t$-test statistic for $H_j$. Then condition (B2) is fulfilled with $r = 1$, with the added advantage that inference is exact, and the type I error is guaranteed even if the distributional Gaussian assumption for the $t$-test is not fulfilled [14]. Computationally, such a "permutation $t$-test" procedure seems to involve two rounds of permutations: one for the computation of the marginal $p$-value and one for the Westfall–Young method; see (2). However, the marginal permutation $p$-value can be inferred from the permutations in the Westfall–Young method, as in Meinshausen [25], so just a single round of permutations is necessary.

3.2.2. Marginal association.
Suppose that we have a continuous variable $y$ in formula (6). Based on the observed data, we want to test the null hypotheses of no association between variable $X_j$ and $y$, that is,
$$H_j : \mu_j(y) \text{ is constant in } y, \qquad j = 1, \ldots, m,$$
versus the corresponding two-sided alternatives. A special case is the test for linear marginal association, where the functions $\mu_j(y)$ for $j = 1, \ldots, m$ are assumed to be of the form $\mu_j(y) = \gamma_j + \beta_j y$, and the test of no linear marginal association is based on the null hypotheses
$$H_j : \beta_j = 0, \qquad j = 1, \ldots, m.$$
Rank-based correlation tests like Spearman's or Kendall's correlation coefficient are examples of tests that fulfill assumption (B2). Alternatively, a "permutation correlation-test" could be used, analogous to the "permutation $t$-test" described in Section 3.2.1.

3.2.3. Contingency tables.
Contingency tables are our final example. Let $y \in \{1, 2, \ldots, K_y\}$ be a class variable with $K_y$ distinct values. Likewise, assume that the random variable $X$ is discrete and that each component of $X$ can take $K_x$ distinct values, $X = (X_1, \ldots, X_m) \in \{1, 2, \ldots, K_x\}^m$.

As an example, in many genome-wide association studies, the variables of interest are single nucleotide polymorphisms (SNPs). Each SNP $j$ (denoted by $X_j$) can in general take three distinct values, and it is of interest to see whether there is a relation between the occurrence rate of these categories and a category of a person's health status $y$ [5, 15, 20].

Based on the observed data, we want to test the null hypothesis for $j = 1, \ldots, m$ that the distribution of $X_j$ does not depend on $y$:
$$H_j : P(X_j = k \mid y) = P(X_j = k) \qquad \text{for all } k \in \{1, \ldots, K_x\} \text{ and } y \in \{1, \ldots, K_y\}.$$
The available data for hypothesis $H_j$ are contained in the 1st and $(j+1)$th rows of $W$. These data can be summarized in a contingency table, and Fisher's exact test can be used. Since the test is conditional on the marginal distributions, we have that $P(p_j(GW) \le s \mid W) = s$ for a random permutation $G \in \mathcal{G}$, and (B2) is fulfilled with $r = 1$.

3.3. Main result.
We now look at the properties of the Westfall–Young permutation method and show asymptotic optimality in the sense that, with probability converging to 1 as the number of tests increases, the estimated Westfall–Young threshold $\hat{c}_{m,n}(\alpha)$ is at least as large as the optimal oracle threshold $c_{m,n}(\alpha - \delta)$, where $\delta > 0$ can be arbitrarily small.

Theorem 1. Assume (A1)–(A3) and (B1)–(B3). Then for any $\alpha \in (0,1)$ and any $\delta \in (0, \alpha)$,
$$P_m\{\hat{c}_{m,n}(\alpha) \ge c_{m,n}(\alpha - \delta)\} \to 1 \qquad \text{as } m \to \infty. \qquad (7)$$

We note that the sample size $n$ can be fixed and does not need to tend to infinity. However, if the range of $p$-values $S_n$ is discrete, the sample size must increase with $m$ to avoid a trivial result where the oracle threshold $c_{m,n}(\alpha - \delta)$ vanishes; see also Theorem 2, where this is made explicit for the Wilcoxon test in the location-shift model of Section 3.2.1.

Theorem 1 implies that the actual level of the Westfall–Young procedure converges to the desired level (up to possible discretization effects; see Section 3.4). To appreciate the statement in Theorem 1 in terms of power gain, consider a simple example. Assume that the $m$ hypotheses form $B$ blocks. In the most extreme scenario, test statistics are perfectly dependent within each block. In such a scenario, the oracle threshold (1) for each individual $p$-value is then
$$1 - \sqrt[B]{1 - \alpha},$$
which is larger than, but very closely approximated by, $\alpha/B$ for large values of $B$. Thus, when controlling the FWER at level $\alpha$, hypotheses can be rejected when their $p$-values are less than $1 - \sqrt[B]{1-\alpha}$, and certainly when their $p$-values are less than $\alpha/B$. However, the value of $B$ and the block-dependence structure between hypotheses are unknown in practice. With a Bonferroni correction for the FWER at level $\alpha$, hypotheses can be rejected when their $p$-values are less than $\alpha/m$. If $m \gg B$, the power loss compared to the procedure with the oracle threshold is substantial, since the Bonferroni method is really controlling at an effective level of size $\alpha B/m$ instead of $\alpha$. Theorem 1, in contrast, implies that the effective level under the Westfall–Young procedure converges to the desired level (up to possible discretization effects).
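To put numbers on this comparison, take $\alpha = 0.05$, $B = 100$ perfectly dependent blocks and $m = 10{,}000$ tests (illustrative values of our own choosing):

```python
alpha, B, m = 0.05, 100, 10_000

oracle = 1 - (1 - alpha) ** (1 / B)  # per-test threshold under perfect within-block dependence
approx = alpha / B                   # Bonferroni over the B "effective" tests
bonf = alpha / m                     # plain Bonferroni over all m tests

# oracle (about 5.13e-4) is slightly larger than alpha/B = 5e-4,
# and two orders of magnitude larger than alpha/m = 5e-6
```

The oracle threshold exceeds $\alpha/B$ only marginally, while plain Bonferroni is smaller by the factor $m/B = 100$, which is exactly the power loss mechanism described above.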
3.4. Discretization effects with Wilcoxon test.
We showed in the last section that the Westfall–Young threshold is asymptotically equivalent to the oracle threshold under the stated assumptions. In this section we look in more detail at the difference between the nominal and effective levels of the oracle multiple testing procedure. Controlling at nominal level $\alpha$, the effective oracle level is defined as
$$\alpha^- = P_m\Big\{ \min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha) \Big\}. \qquad (8)$$
By definition, $\alpha^-$ is less than or equal to $\alpha$. We now examine under which assumptions the effective level $\alpha^-$ can be replaced by the nominal level $\alpha$. As a concrete example, we work with the following assumptions:

(W) The test is a two-sample Wilcoxon test with equal sample sizes $n_1 = n_2 = n/2$, applied to a location-shift model as defined in Section 3.2.1.

(A3′) Block-size: the maximum size of a block satisfies $m_{B_m} = O(1)$ as $m \to \infty$.

The restriction to equal sample sizes in (W) is only for technical simplicity. We then obtain the following result about the discretization error.

Theorem 2.
Assume (W). Then the oracle critical value $c_{m,n}(\alpha)$ is strictly positive when $n \ge 2\log_2(m/\alpha) + 2$. When assuming in addition (A1), (A2) and (A3′), the results of Theorem 1 hold, and, for any $\alpha \in (0,1)$, we have $\alpha^- \to \alpha$ as $m, n \to \infty$ such that $n/\log(m) \to \infty$.

The first result in Theorem 2 says that the oracle critical value for the test defined in (W) is nontrivial, even when the number of tests grows almost exponentially with the sample size. Hence, in this setting the result from Theorem 1 still applies in a nontrivial way. The second result in Theorem 2 gives sufficient criteria for the effective oracle level $\alpha^-$ to converge to $\alpha$. It is conceivable that this result can also be obtained under a milder assumption than (A3′), but this requires a detailed study of the Wilcoxon $p$-values, and we leave this for future work. The main takeaway message is that discreteness of the $p$-values does not change the optimality result fundamentally.
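A rough self-contained check of why a logarithmic sample size suffices: with equal group sizes $n/2$, the smallest attainable two-sided Wilcoxon $p$-value is $2/\binom{n}{n/2}$, and since $\binom{n}{n/2} \ge 2^{n/2}$, a sample size of order $\log_2(m/\alpha)$ already pushes this minimum below $\alpha/m$. This is a back-of-the-envelope verification with our own constants, not a restatement of the theorem's exact condition:

```python
from math import ceil, comb, log2

def min_wilcoxon_p(n):
    """Smallest attainable two-sided p-value of a two-sample Wilcoxon
    rank-sum test with equal group sizes n/2 (n even): the two most
    extreme of the comb(n, n/2) equally likely group assignments."""
    return 2 / comb(n, n // 2)

alpha, m = 0.05, 10_000
n = 2 * ceil(log2(m / alpha)) + 2   # logarithmic in m: here n = 38
# min_wilcoxon_p(n) is now far below alpha/m = 5e-6, so even a
# Bonferroni-sized threshold is attainable and c_{m,n}(alpha) > 0
```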
4. Empirical results.
The power of the Westfall–Young procedure has already been examined empirically in several studies. Westfall et al. [34] includes a comparison with the Bonferroni–Holm method, reporting a gain in power when using the Westfall–Young procedure. Its focus is on "genetic effects in association studies," including genotype (SNP-type) analysis and also gene expression microarray analysis. Becker and Knapp [1] apply the Westfall–Young permutation procedure and report a substantial gain in power over Bonferroni correction in the context of haplotype analysis. Yekutieli and Benjamini [37] and Reiner et al. [27] discuss the gain of resampling in terms of power, although their focus is mainly on FDR controlling methods. Also Dudoit et al. [8] and Ge et al. [11] report that resampling-based methods such as the Westfall–Young permutation procedure have clear advantages in terms of power.

Here, we look at a few simulated examples to study the finite-sample properties and compare with the asymptotic results of Theorem 1. Data for a two-sample location-shift model as in Section 3.2.1 with $m$ hypotheses and equal sample sizes of 50 are generated from a multivariate Gaussian distribution with unit variances and (i) a Toeplitz correlation matrix with correlations $\rho_{i,j} = \rho^{|i-j|}$ for three values of $\rho$ and $1 \le i, j \le m$, and (ii) a block covariance model, where correlations within all blocks (of size 50 each) are set to the same value $\rho$ (again for three values) and to 0 outside of each block. Ten alternative hypotheses are picked at random from the first 100 components by applying a shift of 0.75, whereas the remaining $m - 10$ components are left unshifted. The FWER is controlled at level $\alpha = 0.05$. The step-down version of the Bonferroni correction is the Bonferroni–Holm procedure [19], and the step-down version of the Westfall–Young procedure is given in Westfall and Young [33]. The oracle threshold (both single-step and step-down) is approximated on a separate set of 1000 simulations, and the Westfall–Young method uses 1000 permutations for each simulation.

The following main results emerge: the Westfall–Young method is very close in power to the oracle procedure for all values of $m$ in the block model, giving support to the asymptotic results of Theorem 1. Moreover, the Westfall–Young and the oracle procedures are also very similar in the Toeplitz model, indicating that Theorem 1 may generalize beyond block-independence models. The power gains of the Westfall–Young procedure, compared to Bonferroni–Holm, are substantial: between 20% and 250% for the considered scenarios, where the largest gains are achieved in settings with a large number of hypotheses and high correlations. Finally, the difference between step-down and single-step methods is very small in these sparse high-dimensional settings for all three multiple testing methods.

It might be unexpected that the power of the Westfall–Young procedure is slightly larger than that of the oracle procedure in two settings with very high correlations. This is due to the finite number of simulations when approximating the oracle threshold, the finite number of permutations in the Westfall–Young procedure and the finite number of simulation runs.

Fig. 1. Average number of true positives for the Toeplitz model (top row) and the block model (bottom row). The number of hypotheses increases from m = 100 (left panel) to m = 10,000 (right panel), and the correlation parameter ρ is varied over three values in each model. The results are shown for the Bonferroni correction (single-step and corresponding step-down Bonferroni–Holm; light color), the oracle procedure (single-step and step-down; gray color) and the Westfall–Young permutation procedure (single-step and step-down; dark color).

The family-wise error rate is between 0.03 and 0.04 for both the oracle and Westfall–Young procedures, and below 0.02 for the Bonferroni correction in the Toeplitz model. The nominal level of $\alpha = 0.05$ is sometimes exceeded in the block model (again because only a finite number of simulations is used), where both the oracle and Westfall–Young procedures attain a family-wise error rate between 0.04 and 0.07 in all settings.
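The two covariance designs can be sketched as follows ($\rho = 0.8$ and $m = 200$ here are illustrative choices of our own, not the exact simulation settings of this section):

```python
import numpy as np

def toeplitz_corr(m, rho):
    """Toeplitz correlation matrix with entries rho**|i-j|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def block_corr(m, rho, block=50):
    """Block correlation matrix: rho within consecutive blocks of the
    given size, 0 between blocks, 1 on the diagonal."""
    C = np.zeros((m, m))
    for start in range(0, m, block):
        C[start:start + block, start:start + block] = rho
    np.fill_diagonal(C, 1.0)
    return C

# generate two-sample location-shift data under the block design
rng = np.random.default_rng(0)
m, n_per_group = 200, 50
L = np.linalg.cholesky(block_corr(m, 0.8))
X = rng.normal(size=(2 * n_per_group, m)) @ L.T  # correlated Gaussian rows
X[n_per_group:, :10] += 0.75                     # shift 10 alternatives in group 2
```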
The computational cost of the Westfall–Young procedure scales approximately linearly with the number $m$ of hypotheses. When using 1000 permutations, computing the Westfall–Young threshold takes about 1.4 seconds per 100 hypotheses on a 3 GHz CPU. For the largest setting of $m = 10{,}000$ hypotheses, the threshold could thus be computed in just over 2 minutes. This computational cost seems acceptable, even for very large-scale testing problems.
5. Discussion.
We considered asymptotic optimality of large-scale multiple testing under dependence within a nonparametric framework. We showed that, under certain assumptions, the Westfall–Young permutation method is asymptotically optimal in the following sense: with probability converging to 1 as the number of tests increases, the Westfall–Young critical value for multiple testing at nominal level $\alpha$ is greater than or equal to the unknown oracle threshold at level $\alpha - \delta$ for any $\delta > 0$. This implies that the actual level of the Westfall–Young procedure converges to the effective oracle level $\alpha^-$. To investigate the possible impact of discrete $p$-values, we studied a specific example and provided sufficient conditions that ensure that $\alpha^-$ converges to $\alpha$.

We gave several examples that satisfy subset-pivotality and our assumptions (B1)–(B3) [while assumptions (A1)–(A3) concern the unknown data-generating distribution]. Most of these examples involve rank-based or permutation tests. These tests are appropriate for very high-dimensional testing problems: if the number of tests is in the thousands or even millions, extreme tail probabilities are required to claim significance, and these tail probabilities are more trustworthy under a nonparametric test than under a parametric one.

If the hypotheses are strongly dependent, the gain in power of the Westfall–Young method compared to a simple Bonferroni correction can be very substantial. This is a well-known empirical fact, and we have established here that this improvement is also optimal in the asymptotic framework we considered.

Our theoretical results could be expanded to include step-down procedures like Bonferroni–Holm [19] and the step-down Westfall–Young method [11, 33]. The distinction between single-step and step-down procedures is very marginal in our sparse high-dimensional framework, though, as reported in Section 4, since the number of rejected hypotheses is orders of magnitude smaller than the total number of hypotheses.
6. Proofs.
After introducing some additional notation in Section 6.1, the proof of Theorem 1 is given in Section 6.2 and the proof of Theorem 2 in Section 6.3.

6.1. Additional notation.
Let $p^{(b)}(W)$ be the minimum $p$-value over all true null hypotheses in the $b$th block:
$$p^{(b)}(W) = \min_{j \in A_b \cap I(P_m)} p_j(W), \qquad b = 1, \ldots, B,$$
and let $\pi_b(c)$ denote the probability under $P_m$ that $p^{(b)}(W)$ is less than or equal to a constant $c \in [0, 1]$:
$$\pi_b(c) = P_m(p^{(b)}(W) \le c), \qquad b = 1, \ldots, B.$$
Throughout, we denote the expected value, the variance and the covariance under $P_m$ by $E_m$, $\mathrm{Var}_m$ and $\mathrm{Cov}_m$, respectively.

6.2. Proof of Theorem 1.
Let $\alpha' \in (0, 1)$ and $\delta' \in (0, \alpha')$. Let $\delta = \delta'/2$ and $\alpha = \alpha' - \delta'$. Then writing expression (7) in terms of $\alpha'$ and $\delta'$ is equivalent to
$$P_m\{\hat c_{m,n}(\alpha + 2\delta) \ge c_{m,n}(\alpha)\} \to 1 \qquad \text{as } m \to \infty.$$
By definition,
$$\hat c_{m,n}(\alpha + 2\delta) = \max\Bigl\{s \in S_n : P^*\Bigl(\min_{j \in \{1, \ldots, m\}} p_j(W) \le s\Bigr) \le \alpha + 2\delta\Bigr\}.$$
We thus have to show that
$$P_m\Bigl\{P^*\Bigl(\min_{j \in \{1, \ldots, m\}} p_j(W) \le c_{m,n}(\alpha)\Bigr) \le \alpha + 2\delta\Bigr\} \to 1 \qquad \text{as } m \to \infty.$$
First, we show in Lemma 1 that there exists an $M < \infty$ such that
$$P^*\Bigl(\min_{j \in \{1, \ldots, m\}} p_j(W) \le c_{m,n}(\alpha)\Bigr) \le P^*\Bigl(\min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha)\Bigr) + \delta$$
for all $m > M$ and for all $W$. This result is mainly due to the sparsity assumption (A2). Second, we show in Lemma 2 that
$$P_m\Bigl\{P^*\Bigl(\min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha)\Bigr) \le \alpha + \delta\Bigr\} \to 1 \qquad \text{as } m \to \infty. \tag{10}$$
Theorem 1 follows by combining these two results.

Lemma 1.
Let $\alpha \in (0, 1)$, $\delta \in (0, \alpha)$, and assume (A1), (A2), (B2) and (B3). Then there exists an $M < \infty$ such that
$$P^*\Bigl(\min_{j \in \{1, \ldots, m\}} p_j(W) \le c_{m,n}(\alpha)\Bigr) \le P^*\Bigl(\min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha)\Bigr) + \delta$$
for all $m > M$ and for all $W$.

Proof.
Note that $c_{m,n}(\alpha) \in S_n$ by definition. Using the union bound, we have, for all $s \in S_n$ and all $W$,
$$P^*\Bigl(\min_{j \in \{1, \ldots, m\}} p_j(W) \le s\Bigr) \le P^*\Bigl(\min_{j \in I(P_m)} p_j(W) \le s\Bigr) + \sum_{j \in I'(P_m)} P^*(p_j(W) \le s). \tag{11}$$
Hence, we only need to show that there exists an $M < \infty$ such that
$$\sum_{j \in I'(P_m)} P^*(p_j(W) \le c_{m,n}(\alpha)) \le \delta \tag{12}$$
for all $m > M$ and all $W$. By assumption (B2) with constant $r$,
$$\sum_{j \in I'(P_m)} P^*(p_j(W) \le c_{m,n}(\alpha)) \le |I'(P_m)| \, r \, c_{m,n}(\alpha) = r \frac{|I'(P_m)|}{B} \, B c_{m,n}(\alpha). \tag{13}$$
Since $|I'(P_m)|/B \to 0$ as $m \to \infty$ by assumption (A2), and $B c_{m,n}(\alpha)$ is bounded above by $-\log(1 - \alpha)$ under assumptions (A1) and (B3) (see Lemma 3), we can choose an $M < \infty$ such that the right-hand side of (13) is bounded above by $\delta$ for all $m > M$. This proves the claim in (12) and completes the proof. $\Box$

Lemma 2.
Let $\alpha > 0$ and $\delta > 0$ and assume (A1), (A3) and (B1)–(B3). Then
$$P_m\Bigl\{P^*\Bigl(\min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha)\Bigr) \le \alpha + \delta\Bigr\} \to 1 \qquad \text{as } m \to \infty.$$

Proof.
Let $\varepsilon > 0$. The statement in the lemma is equivalent to showing that there exists an $M < \infty$ such that
$$P_m\Bigl\{P^*\Bigl(\min_{j \in I(P_m)} p_j(W) > c_{m,n}(\alpha)\Bigr) < 1 - \alpha - \delta\Bigr\} < \varepsilon \tag{14}$$
for all $m > M$. By definition,
$$P^*\Bigl(\min_{j \in I(P_m)} p_j(W) > c_{m,n}(\alpha)\Bigr) = \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} 1\Bigl\{\min_{j \in I(P_m)} p_j(gW) > c_{m,n}(\alpha)\Bigr\} = \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} R(g, W), \tag{15}$$
where $R(g, W) := 1\{\min_{j \in I(P_m)} p_j(gW) > c_{m,n}(\alpha)\}$. (We suppress the dependence on $m$, $n$, $P_m$ and $\alpha$ for notational simplicity.) Let $G$ be a random permutation, chosen uniformly in $\mathcal{G}$, and let $\mathbf{1}$ denote the identity permutation. Then, by assumption (B1), it follows that
$$E_m\Bigl(\frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} R(g, W)\Bigr) = E_{m,G} R(G, W) = E_m R(\mathbf{1}, W).$$
By definition of $c_{m,n}(\alpha)$ [see (1)],
$$E_m R(\mathbf{1}, W) = P_m\Bigl(\min_{j \in I(P_m)} p_j(W) > c_{m,n}(\alpha)\Bigr) \ge 1 - \alpha.$$
Hence, the desired result (14) follows from a Markov inequality as soon as one can show that the variance of (15) vanishes as $m \to \infty$, that is, if
$$\mathrm{Var}_m\Bigl(\frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} R(g, W)\Bigr) = \frac{1}{|\mathcal{G}|^2} \sum_{g, g' \in \mathcal{G}} \mathrm{Cov}_m(R(g, W), R(g', W)) = o(1) \tag{16}$$
as $m \to \infty$.

Let $G, G'$ be two random permutations, drawn independently and uniformly from $\mathcal{G}$. Then
$$\mathrm{Cov}_{m,G,G'}(R(G, W), R(G', W)) = \frac{1}{|\mathcal{G}|^2} \sum_{g, g' \in \mathcal{G}} \mathrm{Cov}_m(R(g, W), R(g', W)).$$
Hence, in order to show (16), we only need to show that $\mathrm{Cov}_{m,G,G'}(R(G, W), R(G', W)) = o(1)$ for $m \to \infty$. Define
$$R_b(g, W) := 1\{p^{(b)}(gW) > c_{m,n}(\alpha)\}, \tag{17}$$
so that $R(g, W) = \prod_{b=1}^B R_b(g, W)$. We then need to prove that, as $m \to \infty$,
$$E_{m,G,G'}\Bigl(\prod_{b=1}^B R_b(G, W) R_b(G', W)\Bigr) - \Bigl(E_{m,G}\Bigl(\prod_{b=1}^B R_b(G, W)\Bigr)\Bigr)^2 = o(1). \tag{18}$$
Using assumption (A1), the left-hand side in (18) can be written as
$$\prod_{b=1}^B E_{m,G,G'}\{R_b(G, W) R_b(G', W)\} - \prod_{b=1}^B [E_{m,G}\{R_b(G, W)\}]^2.$$
Note that $E_{m,G,G'}\{R_b(G, W) R_b(G', W)\}$ and $[E_{m,G}\{R_b(G, W)\}]^2$ are bounded between 0 and 1. For sequences of numbers $a_1, \ldots, a_B$ and $b_1, \ldots, b_B$ that are bounded between 0 and 1, the following inequality holds:
$$\Bigl|\prod_{j=1}^B a_j - \prod_{j=1}^B b_j\Bigr| = \Bigl|\sum_{j=1}^B \Bigl\{(a_j - b_j) \Bigl(\prod_{k<j} a_k\Bigr) \Bigl(\prod_{k>j} b_k\Bigr)\Bigr\}\Bigr| \le \sum_{j=1}^B |a_j - b_j|.$$
It therefore suffices to show that
$$\sum_{b=1}^B \bigl(E_{m,G,G'}\{R_b(G, W) R_b(G', W)\} - [E_{m,G}\{R_b(G, W)\}]^2\bigr) = o(1). \tag{19}$$
Define
$$\mu_b(W) := E_{m,G}\{R_b(G, W) \mid W\} = P^*\{p^{(b)}(W) > c_{m,n}(\alpha)\}. \tag{20}$$
Conditional on $W$, the variables $R_b(G, W)$ and $R_b(G', W)$ are independent Bernoulli($\mu_b(W)$) random variables. By Lemma 4, the support of the distribution $F_b$ of $\mu_b(W)$ is contained in $[1 - \log\{1/(1 - \alpha)\} r^2 m_B B^{-1}, 1]$ under assumptions (A1), (B1) and (B2). Hence, using Lemma 5, it follows that
$$0 \le E_{m,G,G'}\{R_b(G, W) R_b(G', W)\} - [E_{m,G}\{R_b(G, W)\}]^2 \le (\log\{1/(1 - \alpha)\} r^2 m_B B^{-1})^2.$$
Since $m_B = o(\sqrt{B})$ under assumption (A3), claim (19) follows. $\Box$

Lemma 3.
Under assumptions (A1) and (B3), we have
$$B c_{m,n}(\alpha) \le \sum_{b=1}^B \pi_b(c_{m,n}(\alpha)) \le \log\{1/(1 - \alpha)\}. \tag{21}$$

Proof.
Let $b \in \{1, \ldots, B\}$ and $j_b \in I(P_m) \cap A_b$. Then
$$\pi_b\{c_{m,n}(\alpha)\} \ge P_m(p_{j_b}(W) \le c_{m,n}(\alpha)) = c_{m,n}(\alpha), \tag{22}$$
where the inequality follows from the definition of $\pi_b(\cdot)$, and the equality follows from assumption (B3) and the fact that $c_{m,n}(\alpha) \in S_n$. Summing (22) over $b = 1, \ldots, B$ yields the first inequality of (21).

To prove the second inequality of (21), note that assumption (A1) and the definition of $c_{m,n}(\alpha)$ imply that
$$1 - \prod_{b=1}^B [1 - \pi_b\{c_{m,n}(\alpha)\}] \le \alpha. \tag{23}$$
The maximum of $\sum_{b=1}^B \pi_b\{c_{m,n}(\alpha)\}$ under constraint (23) is obtained when $\pi_1\{c_{m,n}(\alpha)\} = \cdots = \pi_B\{c_{m,n}(\alpha)\}$. This implies $\pi_b\{c_{m,n}(\alpha)\} \le 1 - (1 - \alpha)^{1/B}$ for all $b = 1, \ldots, B$, so that
$$\sum_{b=1}^B \pi_b\{c_{m,n}(\alpha)\} \le B - B(1 - \alpha)^{1/B},$$
and this is bounded above by $-\log(1 - \alpha)$ for all values of $B$. $\Box$

Lemma 4.
Assume (A1), (B1) and (B2). Let $F_b$ be the distribution of $\mu_b(W)$, where $\mu_b(W)$ is defined in (20). Then
$$\mathrm{support}(F_b) \subseteq [1 - \log\{1/(1 - \alpha)\} r^2 m_B B^{-1}, 1].$$
Proof.
Using assumption (B2) with constant $r$ and the union bound, it holds that
$$1 - \mu_b(W) = P^*\{p^{(b)}(W) \le c_{m,n}(\alpha)\} \le r |A_b| c_{m,n}(\alpha).$$
Since $m_B = \max_{b=1,\ldots,B} |A_b|$, the support of $F_b$ is thus in the interval $[1 - m_B r c_{m,n}(\alpha), 1]$, and
$$c_{m,n}(\alpha) \le \log\{1/(1 - \alpha)\} r B^{-1}. \tag{24}$$
To see that (24) holds, we first show that
$$1 - \alpha \le P_m\Bigl\{\min_{j \in I(P_m)} p_j(W) > c_{m,n}(\alpha)\Bigr\} \le (1 - c_{m,n}(\alpha)/r)^B. \tag{25}$$
The first inequality in (25) follows directly from the definition of $c_{m,n}(\alpha)$; see (1). To prove the second inequality, note that assumption (A1) implies that
$$P_m\Bigl\{\min_{j \in I(P_m)} p_j(W) > c_{m,n}(\alpha)\Bigr\} = \prod_{b=1}^B P_m\{p^{(b)}(W) > c_{m,n}(\alpha)\}. \tag{26}$$
By assumption (B1) and the law of iterated expectations,
$$P_m\{p^{(b)}(W) > c_{m,n}(\alpha)\} = P_{m,G}\{p^{(b)}(GW) > c_{m,n}(\alpha)\} = E_m\{P_{m,G}\{p^{(b)}(GW) > c_{m,n}(\alpha) \mid W\}\}. \tag{27}$$
By assumption (B2), the conditional probability within each block satisfies
$$P_{m,G}\{p^{(b)}(GW) > c_{m,n}(\alpha) \mid W\} = P^*\{p^{(b)}(W) > c_{m,n}(\alpha)\} \le 1 - P^*\{p_{j_b}(W) \le c_{m,n}(\alpha)\} \le 1 - c_{m,n}(\alpha)/r, \tag{28}$$
where $j_b \in I(P_m) \cap A_b$. Since the right-hand side of (28) does not depend on $W$, the same bound holds for (27), where we also take the expectation over $W$. Using this result in (26), the second inequality in (25) follows.

Finally, (25) implies $c_{m,n}(\alpha) \le r\{1 - (1 - \alpha)^{1/B}\}$. Since $B(1 - (1 - \alpha)^{1/B}) \le -\log(1 - \alpha)$ for all values of $B$, it follows that $1 - (1 - \alpha)^{1/B} \le -\log(1 - \alpha) B^{-1}$. This proves (24) and completes the proof. $\Box$

Lemma 5.
Let $U$ be a real-valued random variable with support $[a, b] \subset [0, 1]$. Suppose that the distribution of the two random variables $X_1$ and $X_2$, conditional on $U = u$, is given by $X_1, X_2 \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Bernoulli}(u)$. Then
$$0 \le E(X_1 X_2) - E(X_1) E(X_2) \le (b - a)^2.$$

Proof. By the assumption that $X_1$ and $X_2$ are Bernoulli conditional on $U$, it follows that $E(X_1 \mid U) = E(X_2 \mid U) = U$. Combining this with the law of iterated expectation and the fact that $X_1$ and $X_2$ are conditionally independent given $U$, we obtain
$$E(X_1 X_2) = E_U\{E(X_1 X_2 \mid U)\} = E_U\{E(X_1 \mid U) E(X_2 \mid U)\} = E(U^2).$$
Moreover, we have $E(X_1) = E_U\{E(X_1 \mid U)\} = E(U)$ and similarly $E(X_2) = E(U)$. Hence,
$$E(X_1 X_2) - E(X_1) E(X_2) = E(U^2) - \{E(U)\}^2 = \mathrm{Var}(U).$$
Finally, $0 \le \mathrm{Var}(U) \le (b - a)^2$ by the assumption on the support of $U$. $\Box$

Proof of Theorem 2.
First, note that (W) implies (B1)–(B3). Using the union bound and assumption (B3), it holds for any $s \in S_n$ that $ms$ is an upper bound for $P_m(\min_{j \in I(P_m)} p_j(W) \le s)$. Hence,
$$c_{m,n}(\alpha) = \max\Bigl\{s \in S_n : P_m\Bigl(\min_{j \in I(P_m)} p_j(W) \le s\Bigr) \le \alpha\Bigr\} \ge \max\{s \in S_n : ms \le \alpha\}. \tag{29}$$
This implies that the oracle critical value is larger than zero if the set $\{s \in S_n : ms \le \alpha\}$ is nonempty, which is the case if $\min(S_n) \le \alpha/m$. The smallest possible two-sided Wilcoxon $p$-value is $\min(S_n) = 2\{(n/2)!\}^2/n! \le 2^{1-n/2}$. Hence, it is sufficient to require that $2^{1-n/2} \le \alpha/m$, or equivalently, that $n \ge 2\log_2(m/\alpha) + 2$.

Note that (A3$'$) implies (A3). Hence, under assumptions (W), (A1), (A2) and (A3$'$), the result in Theorem 1 applies.

Let $\alpha \in (0, 1)$. We will show that $\alpha^- \to \alpha$ as $m, n \to \infty$ such that $n/\log(m) \to \infty$, where $\alpha^-$ was defined in (8). Define $c^+_{m,n}(\alpha) := \min\{s \in S_n : s > c_{m,n}(\alpha)\}$. Using the definition of $\alpha^-$ and assumption (A1), we have
$$\alpha^- = P_m\Bigl(\min_{j \in I(P_m)} p_j(W) \le c_{m,n}(\alpha)\Bigr) = 1 - \prod_{b=1}^B [1 - \pi_b\{c_{m,n}(\alpha)\}] \tag{30}$$
$$= 1 - \prod_{b=1}^B [1 - \pi_b\{c^+_{m,n}(\alpha)\} + \pi_b\{c^+_{m,n}(\alpha)\} - \pi_b\{c_{m,n}(\alpha)\}].$$
Define the function $g_{m,n}: \prod_{b=1}^B [0, \pi_b\{c^+_{m,n}(\alpha)\}] \to \mathbb{R}$ by
$$g_{m,n}(u) := g_{m,n}(u_1, \ldots, u_B) := 1 - \prod_{b=1}^B [1 - \pi_b\{c^+_{m,n}(\alpha)\} + u_b],$$
so that the right-hand side of (30) equals $g_{m,n}(w)$, where $w_b := \pi_b\{c^+_{m,n}(\alpha)\} - \pi_b\{c_{m,n}(\alpha)\}$ for $b = 1, \ldots, B$. A first-order Taylor expansion of $g_{m,n}(w)$ around $(0, \ldots, 0)$ yields
$$\alpha^- = g_{m,n}(w) = g_{m,n}(0) + \sum_{b=1}^B w_b \frac{\partial g_{m,n}(u)}{\partial u_b}\Big|_{u=0} + R, \tag{31}$$
where $R = o(\sum_{b=1}^B w_b)$. For all $b = 1, \ldots, B$, we have
$$\frac{\partial g_{m,n}(u)}{\partial u_b}\Big|_{u=0} = -\prod_{j=1, j \ne b}^B [1 - \pi_j\{c^+_{m,n}(\alpha)\}] = -\frac{1 - g_{m,n}(0)}{1 - \pi_b\{c^+_{m,n}(\alpha)\}} \ge -\frac{1 - g_{m,n}(0)}{1 - m_B c^+_{m,n}(\alpha)},$$
where the inequality follows from $\pi_b\{c^+_{m,n}(\alpha)\} \le m_B c^+_{m,n}(\alpha)$ for $b = 1, \ldots, B$, by the union bound and assumption (B3). Plugging this into (31) yields
$$\alpha^- \ge g_{m,n}(0) - \frac{1 - g_{m,n}(0)}{1 - m_B c^+_{m,n}(\alpha)} \sum_{b=1}^B w_b + R \tag{32}$$
$$= g_{m,n}(0) \biggl(1 + \frac{\sum_{b=1}^B w_b}{1 - m_B c^+_{m,n}(\alpha)}\biggr) - \frac{\sum_{b=1}^B w_b}{1 - m_B c^+_{m,n}(\alpha)} + R.$$
The definition of $c^+_{m,n}(\alpha)$ implies that $g_{m,n}(0) > \alpha$ for all $m$ and $n$. Hence, if
$$\sum_{b=1}^B w_b \to 0 \quad \text{and} \quad m_B c^+_{m,n}(\alpha) \to 0 \tag{33}$$
as $m, n \to \infty$ such that $n/\log(m) \to \infty$, then the right-hand side of (32) converges to $\alpha$ and the proof is complete.

We first consider $\sum_{b=1}^B w_b$. By definition, there is no value $s' \in S_n$ such that $c_{m,n}(\alpha) < s' < c^+_{m,n}(\alpha)$. Hence,
$$w_b = P_m\Bigl\{\min_{j \in A_b \cap I(P_m)} p_j(W) = c^+_{m,n}(\alpha)\Bigr\} \le m_B \max_{j \in A_b \cap I(P_m)} P_m\{p_j(W) = c^+_{m,n}(\alpha)\} = m_B\{c^+_{m,n}(\alpha) - c_{m,n}(\alpha)\},$$
where the inequality follows from the union bound, and the last equality is due to assumption (B3). This implies
$$\sum_{b=1}^B w_b \le B m_B\{c^+_{m,n}(\alpha) - c_{m,n}(\alpha)\} = B c_{m,n}(\alpha) \, m_B \biggl(\frac{c^+_{m,n}(\alpha)}{c_{m,n}(\alpha)} - 1\biggr).$$
Similarly, we have
$$m_B c^+_{m,n}(\alpha) = B c_{m,n}(\alpha) \frac{m_B}{B} \frac{c^+_{m,n}(\alpha)}{c_{m,n}(\alpha)}.$$
Note that $B c_{m,n}(\alpha) \le \log\{1/(1 - \alpha)\}$ by Lemma 3 and $m_B = O(1)$ (and hence $B \to \infty$) by assumption (A3$'$). Hence, in order to prove (33), it suffices to show that
$$c^+_{m,n}(\alpha)/c_{m,n}(\alpha) \to 1 \qquad \text{as } m, n \to \infty,\ n/\log(m) \to \infty. \tag{34}$$
Let the ordered $p$-values in $S_n$, based on a two-sided Wilcoxon test with equal sample sizes $n/2$, be denoted by $s_0 < s_1 < \cdots < s_{r_n}$, where $r_n = \lfloor n^2/8 \rfloor$.
It is well known that
$$s_i = \frac{2\{(n/2)!\}^2}{n!} \sum_{j=0}^i q_{n/2}(j)$$
for $i = 0, \ldots, r_n - 1$ and $s_{r_n} = 1$, where $q_n(j)$ is the number of integer partitions of $j$ such that neither the number of parts nor the part magnitudes exceed $n$ [and $q_n(0) = 1$] [35].

Let $i_{m,n}$ satisfy $s_{i_{m,n}} = c_{m,n}(\alpha)$. Then
$$\frac{c^+_{m,n}(\alpha)}{c_{m,n}(\alpha)} = \frac{\sum_{j=0}^{i_{m,n}+1} q_{n/2}(j)}{\sum_{j=0}^{i_{m,n}} q_{n/2}(j)}.$$
This ratio converges to 1 if $i_{m,n} \to \infty$. Recall that $c_{m,n}(\alpha) \ge \max\{s \in S_n : s \le \alpha/m\}$ [see (29)]. Hence,
$$c^+_{m,n}(\alpha) = \frac{2\{(n/2)!\}^2}{n!} \sum_{j=0}^{i_{m,n}+1} q_{n/2}(j) > \alpha/m.$$
Since $2m\{(n/2)!\}^2/n! \le m 2^{1-n/2} \to 0$ as $m, n \to \infty$ such that $n/\log(m) \to \infty$, we have that under these conditions $i_{m,n} \to \infty$ and $c^+_{m,n}(\alpha)/c_{m,n}(\alpha) \to 1$. Thus (34) holds and hence implies (33), which completes the proof. $\Box$
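The combinatorial facts used in this proof can be checked numerically. The Python sketch below is our own illustration, not part of the paper: it computes the smallest two-sided Wilcoxon $p$-value $2\{(n/2)!\}^2/n!$, checks the bound $2^{1-n/2}$ and the resulting sample-size condition $n \ge 2\log_2(m/\alpha) + 2$, and evaluates the restricted partition counts $q_n(j)$ by a simple recursion, cross-checked against the classical identity that partitions fitting in an $n \times n$ box are counted by $\binom{2n}{n}$ in total.

```python
from functools import lru_cache
from math import comb, factorial, log2

def min_wilcoxon_p(n):
    """Smallest two-sided Wilcoxon p-value for two groups of size n/2 (n even)."""
    h = n // 2
    return 2 * factorial(h) ** 2 / factorial(n)   # equals 2 / comb(n, h)

@lru_cache(maxsize=None)
def _count(j, max_part, parts_left):
    # partitions of j with parts of size <= max_part and at most parts_left parts
    if j == 0:
        return 1
    if max_part == 0 or parts_left == 0:
        return 0
    skip = _count(j, max_part - 1, parts_left)              # no part of this size
    take = _count(j - max_part, max_part, parts_left - 1) if j >= max_part else 0
    return skip + take

def q(n, j):
    """q_n(j): partitions of j with at most n parts, each of magnitude at most n."""
    return _count(j, n, n)

# bound on the smallest p-value, and the sample size needed so min(S_n) <= alpha/m
for n in (4, 10, 20, 40):
    assert min_wilcoxon_p(n) <= 2 ** (1 - n / 2)
m, alpha = 10_000, 0.05
n_required = 2 * log2(m / alpha) + 2   # with this n, 2^{1-n/2} equals alpha/m

# cross-check of the partition counts: partitions fitting in an n x n box
# are counted in total by the central binomial coefficient C(2n, n)
for n in (2, 3, 4, 5):
    assert sum(q(n, j) for j in range(n * n + 1)) == comb(2 * n, n)
```

The recursion is the standard one for partitions in a bounded box; for the moderate $n$ relevant here it is fast, and the box-partition identity gives an independent check that $q_n(\cdot)$ sums the full Wilcoxon null distribution.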
Acknowledgments.
We would like to thank two referees for constructive comments.
REFERENCES

[1] Becker, T. and Knapp, M. (2004). A powerful strategy to account for multiple testing in the context of haplotype analysis. The American Journal of Human Genetics.
[2] Benjamini, Y., Krieger, A. M. and Yekutieli, D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika.
[3] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist.
[4] Blanchard, G. and Roquain, É. (2009). Adaptive false discovery rate control under independence and dependence. J. Mach. Learn. Res.
[5] Bond, G. L., Hu, W. and Levine, A. (2005). A single nucleotide polymorphism in the MDM2 gene: From a molecular and cellular explanation to clinical effect. Cancer Research.
[6] Cheung, V. G., Spielman, R. S., Ewens, K. G., Weber, T. M., Morley, M. and Burdick, J. T. (2005). Mapping determinants of human gene expression by regional and genome-wide association. Nature.
[7] Clarke, S. and Hall, P. (2009). Robustness of multiple testing procedures against dependence. Ann. Statist.
[8] Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci.
[9] Dudoit, S. and van der Laan, M. J. (2008). Multiple Testing Procedures with Applications to Genomics. Springer, New York. MR2373771
[10] Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc.
[11] Ge, Y., Dudoit, S. and Speed, T. P. (2003). Resampling-based multiple testing for microarray data analysis. Test.
[12] Genovese, C. R., Roeder, K. and Wasserman, L. (2006). False discovery control with p-value weighting. Biometrika.
[13] Goeman, J. J. and Solari, A. (2010). The sequential rejection principle of family-wise error control. Ann. Statist.
[14] Good, P. I. (2011). Permutation tests. In Analyzing the Large Number of Variables in Biomedical and Satellite Imagery.
[15] Goode, E. L., Dunning, A. M., Kuschel, B., Healey, C. S., Day, N. E., Ponder, B. A. J., Easton, D. F. and Pharoah, P. P. D. (2002). Effect of germ-line genetic variation on breast cancer survival in a population-based study. Cancer Research.
[16] Hall, P. and Jin, J. (2008). Properties of higher criticism under strong dependence. Ann. Statist.
[17] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist.
[18] Hirschhorn, J. N. and Daly, M. J. (2005). Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics.
[19] Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Stat.
[20] Kruglyak, L. (1999). Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genetics.
[21] Liang, C.-L., Rice, J. A., de Pater, I., Alcock, C., Axelrod, T., Wang, A. and Marshall, S. (2004). Statistical methods for detecting stellar occultations by Kuiper belt objects: The Taiwanese–American occultation survey. Statist. Sci.
[22] Ludbrook, J. and Dudley, H. (1998). Why permutation tests are superior to t and F tests in biomedical research. Amer. Statist.
[23] Marchini, J., Donnelly, P. and Cardon, L. R. (2005). Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics.
[24] McCarthy, M. I., Abecasis, G. R., Cardon, L. R., Goldstein, D. B., Little, J., Ioannidis, J. P. A. and Hirschhorn, J. N. (2008). Genome-wide association studies for complex traits: Consensus, uncertainty and challenges. Nature Reviews Genetics.
[25] Meinshausen, N. (2006). False discovery control for multiple tests of association under general dependence. Scand. J. Stat.
[26] Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann. Statist.
[27] Reiner, A., Yekutieli, D. and Benjamini, Y. (2003). Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics.
[28] Roeder, K. and Wasserman, L. (2009). Genome-wide significance levels and weighted hypothesis testing. Statist. Sci.
[29] Romano, J. P. and Wolf, M. (2005). Exact and approximate stepdown methods for multiple hypothesis testing. J. Amer. Statist. Assoc.
[30] Sun, W. and Cai, T. T. (2009). Large-scale multiple testing under dependence. J. R. Stat. Soc. Ser. B Stat. Methodol.
[31] Westfall, P. H. and Troendle, J. F. (2008). Multiple testing with minimal assumptions. Biom. J.
[32] Westfall, P. H. and Young, S. S. (1989). p-value adjustments for multiple tests in multivariate binomial models. J. Amer. Statist. Assoc.
[33] Westfall, P. H. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley, New York.
[34] Westfall, P. H., Zaykin, D. V. and Young, S. S. (2002). Multiple tests for genetic effects in association studies. In Biostatistical Methods: Methods in Molecular Biology (S. Looney, ed.).
[35] Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin.
[36] Winkelmann, J., Schormair, B., Lichtner, P., Ripke, S., Xiong, L., Jalilzadeh, S., Fulda, S., Pütz, B., Eckstein, G., Hauk, S. et al. (2007). Genome-wide association study of restless legs syndrome identifies common variants in three genomic regions. Nature Genetics.
[37] Yekutieli, D. and Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J. Statist. Plann. Inference.

N. Meinshausen
Department of Statistics
University of Oxford
United Kingdom
E-mail: [email protected]