Adapting BH to One- and Two-Way Classified Structures of Hypotheses
Shinjini Nandi∗ and Sanat K. Sarkar†
Department of Statistical Science, Temple University
Abstract
Multiple testing literature contains ample research on controlling false discoveries for hypotheses classified according to one criterion, which we refer to as one-way classified hypotheses. Although simultaneous classification of hypotheses according to two different criteria, resulting in two-way classified hypotheses, does often occur in scientific studies, no such research has taken place yet, as far as we know, under this structure. This article produces procedures, both in their oracle and data-adaptive forms, for controlling the overall false discovery rate (FDR) across all hypotheses while effectively capturing the underlying one- or two-way classification structure. They have been obtained by using results associated with the weighted Benjamini-Hochberg (BH) procedure in their more general forms, providing guidance on how to adapt the original BH procedure to the underlying one- or two-way classification structure through an appropriate choice of the weights. The FDR is maintained non-asymptotically by our proposed procedures in their oracle forms under positive regression dependence on subset of null p-values (PRDS) and in their data-adaptive forms under independence of the p-values. Possible control of the FDR for our data-adaptive procedures in certain scenarios involving dependent p-values has been investigated through simulations. The fact that our suggested procedures can be superior to contemporary practices has been demonstrated through their applications in simulated scenarios and to real-life data sets. While the procedures proposed here for two-way classified hypotheses are new, the data-adaptive procedure obtained for one-way classified hypotheses is an alternative to, and often more powerful than, those proposed in Hu et al. (2010).

Keywords:
One-way grouped BH, Two-way grouped BH, Data-adaptive one-way grouped BH, Data-adaptive two-way grouped BH

∗Shinjini Nandi is a PhD student at the Department of Statistical Science, Temple University (email: [email protected]). †Sanat K. Sarkar is Professor at the Department of Statistical Science, Temple University, Philadelphia, PA 19122 (email: [email protected]). This work is based on Nandi's doctoral research. Sarkar's research was supported by NSF Grants DMS-1208735 and DMS-1309273.

arXiv: [stat.ME] Mar

Introduction
Large-scale multiple testing problems often involve classifying a set of hypotheses into several groups. In some cases, the families/groups might be formed naturally due to characteristics of the underlying scientific experiment. In other situations, a certain feature attributable to each hypothesis might serve as the basis of partition. Grouping of hypotheses due to any well-defined argument, whether natural or artificial, benefits analyses. In all such instances, since the classification is due to a single criterion, we refer to the setup as 'one-way classification' of hypotheses. Multiple testing procedures adapted to such an arrangement of hypotheses incorporate the information of similar characteristics within each group. Consequently, they can address problems specific to such structures, and they usually have more power and better control over false discoveries than their counterparts that ignore group structures. One-way classified hypotheses have been widely investigated in the literature. Existing multiple testing procedures have been revamped (in Pacifico et al. (2004), Benjamini and Heller (2007), etc.) to accommodate such a layout. Hu et al. (2010) introduced a weighted BH procedure for one-way grouped hypotheses that assigns weights to each group proportional to the number of null hypotheses in it, before applying the BH procedure to the weighted hypotheses pooled together across all groups. Ignatiadis et al. (2016) suggested a data-driven weighted procedure to test similarly classified hypotheses. Any set of weights that depend on external covariates and satisfy some simple constraints can be considered for a testing procedure similar to the method suggested in Hu et al. (2010). The optimum set of weights is chosen subject to maximization of power using data-based optimization techniques.

In many situations, a set of hypotheses might be classified according to more than one norm of classification.
Just like one-way classified hypotheses, multiple interesting features or the nature of the experiment may determine the norms. For example, brain imaging studies involving fMRI data (Foygel Barber and Ramdas (2015)), geographical studies involving data collected through satellite remote sensing (Clements et al. (2014)), studies in genetics involving microarray time course experiments (Sun and Wei (2011)), etc., comprise spatio-temporal data. The multitude of hypotheses arising out of such data can be clustered into groups formed through aggregation of neighboring spatial units and/or related time points. Other examples can be found in bioinformatics, studies involving association between genes and proteins, and genome-wide association studies that involve analysis of association of SNPs with different regions of the brain (Stein et al. (2010)). Examples where more than two types of classification are imposed simultaneously on a set of hypotheses are very rare. If a set of hypotheses is classified in exactly two different ways, we call it a set of 'two-way classified hypotheses'.

In such cases, researchers are most interested in the hypotheses that emerge as significant when effects due to both classifications are factored in. The scope of existing multiple testing procedures is limited to one-way classified data, and such methods are incapable of gauging the simultaneous effect of two-way classification. Some efforts made to study such structures in Stein et al. (2010), Sun and Wei (2011), etc., involve repetitive application of one-way classification multiple testing procedures. Broadly speaking, in the first step, one of the two classifications is prioritized over the other. Considering the hypotheses as classified only due to this factor, significant groups and/or individual hypotheses are determined. In the second step, these significant elements are further tested for significance due to the second grouping criterion, and finally the set of significant hypotheses is determined.
Foygel Barber and Ramdas (2015) and Ramdas et al. (2017) discuss multi-way classification and suggest an algorithm that recursively applies the BH procedure to all partitions created and selects as significant the set of hypotheses that are rejected in all partitions.

The goal of this article is to suggest a new data-adaptive multiple testing procedure for one-way classified hypotheses and to broaden the scope of multiple testing procedures to two-way classified hypotheses. In Section 2, we describe existing methodologies that serve as the background for developing our new methods. This is followed in Section 3 by a description of the one-way classification model, the weighted multiple testing procedure, and our proposed data-adaptive version of it. In Section 4, we introduce the two-way classification model and modify the multiple testing procedures for one-way classified hypotheses to suit the new layout. We also discuss the corresponding data-adaptive procedures that can be applied to multiple hypotheses subjected to such classification. We establish that our proposed data-adaptive methods (both one-way and two-way) are adequate for finite sets of hypotheses, at least under independence. Section 5 demonstrates through simulation studies that the performances of our suggested methods are superior to existing practices in most practical scenarios. Though our suggested data-adaptive procedures are proven to control false discoveries for independent hypotheses, simulations show that, for suitable choices of parameters, they are also applicable to positively dependent hypotheses in certain scenarios involving high density of signals. To illustrate its utility, our proposed method is applied to a dataset on prevalence of microbial communities involving two-way classified hypotheses in Section 6, and its performance is compared with that of an existing method. We end our paper with some concluding remarks in Section 7.
In this section, we recall some existing results on weighted and data-adaptive weighted p-value based FDR controlling procedures for testing a set of hypotheses with no specific group structure, and present them in their general forms to set the stage for developing similar procedures in the larger domain of one- and two-way classified hypotheses. The discussions surrounding these results will provide ideas on the basic methodological steps that we will take to develop our proposed newer procedures in the next section.

Consider simultaneous testing of a set of $N$ hypotheses $H_1, \ldots, H_N$ based on their respective p-values $P_1, \ldots, P_N$ subject to a control over
$$\mathrm{FDR} = E\left[\frac{V_N}{\max\{R_N, 1\}}\right],$$
with $R_N$ and $V_N$ being the total numbers of rejected and falsely rejected null hypotheses, respectively, under the following assumption:

Assumption 1. $P_i \sim U(0,1)$ for each $i \in I_0$, with $I_0$ being the set of indexes of null hypotheses.

Regarding dependence among the p-values, we assume that they are positively regression dependent on subset (PRDS) of null p-values, as defined below generally for any set of random variables $X_1, \ldots, X_k$:

Condition 1.
A set of random variables $X_1, \ldots, X_k$ is said to be positively regression dependent on a particular subset $S$ of these random variables if $E[\phi(X_1, \ldots, X_k) \mid X_i = x]$ is non-decreasing in $x$, for each $X_i \in S$ and for any (coordinatewise) non-decreasing function $\phi$ of $(X_1, \ldots, X_k)$.

Clearly, independent p-values are PRDS. For examples of non-independent p-values satisfying Condition 1, the readers are referred to Benjamini and Yekutieli (2001), Sarkar (2002), and Sarkar (2008). A weaker form of positive dependence, with $E[\phi(X_1, \ldots, X_k) \mid X_i = x]$ replaced by $E[\phi(X_1, \ldots, X_k) \mid X_i \le x]$, is often assumed in the literature in the context of BH type FDR controlling procedures (Finner et al. (2009), Sarkar (2008)). This condition could have been used instead of Condition 1 in this paper without affecting our results relying on such a condition.

Let us now recall the definition of the BH procedure for a single group of hypotheses in its more general form in terms of weighted p-values.

Definition 1.
For a set of $N$ hypotheses, suppose that the $i$th p-value $P_i$ is assigned a non-stochastic weight $w_i \ge 0$, for $i = 1, \ldots, N$. The weighted BH procedure at level $\alpha$ corresponding to these weights is a stepup procedure with the critical constants $i\alpha/N$, $i = 1, \ldots, N$; i.e., it orders the weighted p-values $P^w_i = w_iP_i$, $i = 1, \ldots, N$, in increasing order as $P^w_{(i)}$, $i = 1, \ldots, N$, and rejects the hypotheses $H_{(1)}, \ldots, H_{(R)}$ corresponding to $P^w_{(1)}, \ldots, P^w_{(R)}$, where
$$R = \max\left\{1 \le j \le N : P^w_{(j)} \le \frac{j\alpha}{N}\right\},$$
provided the maximum exists; otherwise, it rejects none.

Result 1. The FDR of the weighted BH procedure based on p-values satisfying the PRDS condition is bounded above by $\frac{\alpha}{N}\sum_{i \in I_0} w_i$.

A proof of this result using techniques from Sarkar (2002) is provided in the Appendix. Result 1 serves as our foundation. It leads to systematic development of our proposed procedures in their oracle forms through an appropriate choice of weights suited to either one- or two-way classification structures levied on the set of hypotheses, before we construct their appropriate data-adaptive versions. More specifically, one can determine weights that satisfy
$$\sum_{i \in I_0} w_i^{-1} = N, \qquad (1)$$
and appropriately capture the underlying classification structure, to develop a weighted BH procedure in its oracle form that controls the FDR at $\alpha$ across all hypotheses, conservatively under PRDS and exactly under independence, before constructing an appropriate data-adaptive version of it. For instance, the choice of weights $w_i = \pi_0 = |I_0|/N$, for all $i = 1, \ldots, N$, satisfying this condition yields the single-group BH procedure in its oracle form. A data-adaptive version of it is one that uses the existing data to estimate $\pi_0$. Several such data-adaptive single-group BH procedures have been put forward in the literature, for example, in Benjamini and Hochberg (2000), Storey et al.
(2004), Sarkar (2008), and Blanchard and Roquain (2009).

The same condition is also satisfied by the weights chosen by Hu et al. (2010) in their construction of a weighted BH procedure in its oracle form, referred to as the one-way grouped BH procedure, in the context of testing one-way classified hypotheses. Its control over the FDR under PRDS is also proven there and can now be seen to follow from Result 1, which is more general. It will be revisited in the next section, where we introduce a data-adaptive version of it that is not only different from the data-adaptive procedures originally proposed in Hu et al. (2010), but also more preferred in a non-asymptotic setting, where its FDR is theoretically shown to be controlled, at least under independence.

The next section will also contain newer procedures, in their oracle as well as data-adaptive forms, that we introduce in this article for testing two-way classified hypotheses. The non-asymptotic FDR control of all these new data-adaptive procedures under independence is established using the following result.

Result 2.
The FDR of a data-adaptive weighted BH procedure with (coordinatewise) non-decreasing estimated weight functions $\hat{w}_i(P) > 0$, $i = 1, \ldots, N$, is bounded above by $\alpha$ under independence if
$$E\left[\sum_{i \in I_0} \hat{w}_i^{-1}(P^{(-i)}, 0)\right] \le N, \qquad (2)$$
where $\hat{w}_i(P^{(-i)}, 0)$ represents $\hat{w}_i$ as a function of $P^{(-i)} = \{P_1, \ldots, P_N\} \setminus P_i$ with $P_i = 0$.

A proof of this result can be seen in Sarkar (2008). We introduce below a newer class of estimates, expressing some of the existing ones in a more general form, which offers a wider scope of data-dependent adaptation of the BH procedure to both one- and two-way group structures of hypotheses with proven non-asymptotic FDR control under independence. The following lemma will be useful in checking the inequality in (2) for this larger class of estimates:
Lemma 1
Let $R_N(\lambda) = \sum_{i=1}^N I(P_i \le \lambda)$, $V_N(\lambda) = \sum_{i \in I_0} I(P_i \le \lambda)$, and $R^{(-i)}_{N-1}(\lambda) = \sum_{j (\ne i) = 1}^N I(P_j \le \lambda)$, for a fixed $\lambda \in (0,1)$. Then, for any non-negative real valued function $f$ of $R_N(\lambda)$, we have the following result:
$$E\left\{\sum_{i \in I_0} \frac{(1-\lambda)\, f\big(R^{(-i)}_{N-1}(\lambda)\big)}{N - R^{(-i)}_{N-1}(\lambda)}\right\} \le E\{f(R_N(\lambda))\}. \qquad (3)$$

Proof.
The inequality in (3) follows from the fact that $N_0 - V_N(\lambda) \le N - R_N(\lambda)$, where $N_0 = |I_0|$, and so the right-hand side of that inequality is greater than or equal to
$$E\left\{\frac{[N_0 - V_N(\lambda)]\, f(R_N(\lambda))}{[N - R_N(\lambda)] \vee 1}\right\} = E\left\{\sum_{i \in I_0} \frac{I(P_i > \lambda)\, f\big(R^{(-i)}_{N-1}(\lambda) + I(P_i \le \lambda)\big)}{[N - R^{(-i)}_{N-1}(\lambda) - I(P_i \le \lambda)] \vee 1}\right\},$$
which reduces to the left-hand side of (3) since the $P_i$'s are independent.
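The rewriting of $(N_0 - V_N)f(R_N)/\{(N - R_N)\vee 1\}$ as a sum over the null indexes is a finite algebraic identity that can be checked numerically for any fixed p-value vector. The sketch below is our own illustration (the function and variable names are ours, not the paper's):

```python
import numpy as np

def null_sum_form(p, null_idx, lam, f):
    """Sum over null i of I(P_i > lam) * f(R^(-i) + I(P_i <= lam)) /
    max(N - R^(-i) - I(P_i <= lam), 1), where R^(-i) counts the p-values
    <= lam with P_i left out, as in the proof of Lemma 1."""
    p = np.asarray(p, dtype=float)
    N = p.size
    total = 0.0
    for i in null_idx:
        mask = np.ones(N, dtype=bool)
        mask[i] = False
        R_minus = int(np.sum(p[mask] <= lam))   # R^(-i)_{N-1}(lam)
        ind = int(p[i] <= lam)
        total += (p[i] > lam) * f(R_minus + ind) / max(N - R_minus - ind, 1)
    return total

# Direct form (N_0 - V_N) f(R_N) / max(N - R_N, 1) for a fixed p-vector.
p = np.array([0.01, 0.2, 0.35, 0.6, 0.9])
null_idx, lam = [1, 3, 4], 0.3
f = lambda r: r + 1.0
R_N = int(np.sum(p <= lam))
V_N = sum(int(p[i] <= lam) for i in null_idx)
direct = (len(null_idx) - V_N) * f(R_N) / max(len(p) - R_N, 1)
```

Here `direct` and `null_sum_form(p, null_idx, lam, f)` agree exactly; taking expectations termwise, with independent p-values and each null $P_i$ exceeding $\lambda$ with probability $1-\lambda$, then yields the left-hand side of (3).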
Remark 1. It is important to note, as we proceed to use Lemma 1 to develop procedures under more complex structures of hypotheses in the next section, that $R$ will be subscripted differently under different structural settings, since its definition should correctly reflect the number of hypotheses involved.

Using the same notations as in Hu et al. (2010), let us suppose that the $N$ hypotheses to be simultaneously tested are split into $m$ non-overlapping groups according to some criterion, with $n_{g\cdot}$ pairs of hypothesis and corresponding p-value, $(H_{gi}, P_{gi})$, $i = 1, \ldots, n_{g\cdot}$, falling in group $g$, and $N = \sum_{g=1}^m n_{g\cdot}$. Let $n_g$ be the number of null hypotheses in group $g$ and $I_g \subseteq \{1, \ldots, n_{g\cdot}\}$ be the corresponding set of sub-indexes associated with $i$ in group $g$. The set of indexes of all null hypotheses among all hypotheses can then be expressed as $I_0 = \bigcup_{g=1}^m I_g$. Let $\pi_g = n_g/n_{g\cdot}$ be the proportion of true nulls in group $g$, so that $\pi_0$, the proportion of true nulls in the entire set of $N$ hypotheses, can be expressed as $\pi_0 = \sum_{g=1}^m n_{g\cdot}\pi_g/N$.

One-way grouped BH, shortly one-way GBH, is an oracle procedure. It is defined by Hu et al. (2010) as a weighted BH procedure with the weights formulated in terms of $\pi_g$, for $g = 1, \ldots, m$, assuming they are known, in a way that allows the BH procedure to effectively adapt to the present structural setting of the hypotheses. We revisit it in the following sub-section, before developing our newly proposed data-adaptive version of it later in this section.

It is a weighted BH procedure with
$$w_g = \frac{\pi_g(1 - \pi_0)}{1 - \pi_g} \qquad (4)$$
being assigned as weight to $P_{gi}$, for each $i = 1, \ldots, n_{g\cdot}$ and $g = 1, \ldots, m$, assuming these proportions are all known. Hu et al. (2010) referred to it as simply the grouped BH, shortly GBH, procedure, but as said above, we will refer to it here as the one-way GBH procedure.
Since
$$\sum_{g=1}^m \sum_{i \in I_g} w_g^{-1} = \frac{1}{1-\pi_0} \sum_{g=1}^m \frac{n_{g\cdot}\pi_g(1-\pi_g)}{\pi_g} = \frac{1}{1-\pi_0} \sum_{g=1}^m n_{g\cdot}(1-\pi_g) = N, \qquad (5)$$
the equality in (1) is satisfied by these weights, and so we have the following theorem, which of course was proved in Hu et al. (2010) using different arguments.

Theorem 1.
The one-way GBH procedure controls the overall FDR under PRDS and Assumption 1.
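To make the construction concrete, the following self-contained Python sketch (our own illustration; the function names are ours) computes the oracle weights (4) from known group null proportions and applies the weighted BH step-up of Definition 1 to the pooled weighted p-values:

```python
import numpy as np

def weighted_bh(pvals, weights, alpha):
    """Weighted BH step-up (Definition 1): order P_i^w = w_i P_i and reject
    the R hypotheses with the smallest weighted p-values, where
    R = max{ j : P_(j)^w <= j * alpha / N } (R = 0 if no such j exists)."""
    pw = np.asarray(weights, dtype=float) * np.asarray(pvals, dtype=float)
    N = pw.size
    order = np.argsort(pw)                       # increasing weighted p-values
    below = pw[order] <= alpha * np.arange(1, N + 1) / N
    reject = np.zeros(N, dtype=bool)
    if below.any():
        R = int(np.max(np.nonzero(below)[0])) + 1
        reject[order[:R]] = True
    return reject

def oneway_gbh_weights(pi_g, n_gdot):
    """Oracle one-way GBH weights (4): w_g = pi_g (1 - pi_0) / (1 - pi_g),
    with pi_0 = sum_g n_{g.} pi_g / N the overall null proportion."""
    pi_g, n_gdot = np.asarray(pi_g, dtype=float), np.asarray(n_gdot)
    pi0 = np.sum(n_gdot * pi_g) / np.sum(n_gdot)
    return pi_g * (1.0 - pi0) / (1.0 - pi_g)

# Identity (5): summing w_g^{-1} over the n_g = n_{g.} pi_g nulls gives N.
pi_g, n_gdot = np.array([0.5, 0.9]), np.array([10, 20])
w = oneway_gbh_weights(pi_g, n_gdot)
null_sum = np.sum(n_gdot * pi_g / w)   # equals N = 30 here
```

The one-way GBH procedure is then `weighted_bh(p, np.repeat(w, n_gdot), alpha)` for the pooled p-value vector `p`; with a single group and $w_1 = \pi_0$, this reduces to the oracle single-group adaptive BH procedure.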
There is a Bayesian justification behind the choice of these weights, as articulated by Hu et al. (2010). However, a look at these weights from a different point of view seems to provide further insight into their effectiveness under the current setting. For a group with a small proportion of true nulls, $\pi_g$ would be small. At the same time, the group would have higher odds of being significant relative to other groups, as measured by $(1-\pi_g)/(1-\pi_0)$. Consequently, the weight associated with that group gets deflated, facilitating easier rejection of its members when the weighted BH procedure is applied to all the hypotheses. This sort of interpretation for the weights guides us in understanding how to estimate them, differently from Hu et al. (2010), and in constructing the data-adaptive version of the one-way GBH procedure that we describe below.

3.2 Data-Adaptive One-Way GBH Procedure

We propose this procedure by considering the one-way GBH and replacing the weight $w_g$ in it by the following:
$$\hat{w}_g = \frac{(n_{g\cdot} - R_{n_{g\cdot}} + 1)(R_N + m - 1)}{N(1-\lambda)R_{n_{g\cdot}}}, \quad g = 1, \ldots, m, \qquad (6)$$
where, for some fixed $\lambda \in (0,1)$, $R_{n_{g\cdot}} \equiv R_{n_{g\cdot}}(\lambda) = \sum_{i=1}^{n_{g\cdot}} I(P_{gi} \le \lambda)$ and $R_N = \sum_{g=1}^m R_{n_{g\cdot}}$. We refer to this procedure as a data-adaptive one-way GBH procedure.

The idea of using this type of estimate for each $w_g$ came from its alternative interpretation noted above. We estimate $\pi_g$ by $\hat{\pi}_g = (n_{g\cdot} - R_{n_{g\cdot}} + 1)/n_{g\cdot}(1-\lambda)$, which is a slight adjustment (considered by Storey et al. (2004)), from $n_{g\cdot} - R_{n_{g\cdot}}$ to $n_{g\cdot} - R_{n_{g\cdot}} + 1$, made in the estimate originally used by Storey (2002) for the proportion of true nulls in the context of single-group multiple testing.
To estimate $(1-\pi_0)/(1-\pi_g)$, we propose estimating $(1-\pi_g)/(1-\pi_0)$, which compares the proportion of false nulls in group $g$ with that among all hypotheses, by $N R_{n_{g\cdot}}/\{n_{g\cdot}(R_N + m - 1)\}$, having made a slight adjustment to its natural estimate $N R_{n_{g\cdot}}/n_{g\cdot}R_N$, and then inverting this estimate. When $m = 1$, the $\hat{w}_g$ in expression (6) reduces to that of Storey et al. (2004).
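The estimate (6) depends on the data only through the group-wise counts $R_{n_{g\cdot}}$ and is easy to compute. The following sketch is our own illustration (names such as `adaptive_oneway_weights`, and `lam` standing for $\lambda$, are ours):

```python
import numpy as np

def adaptive_oneway_weights(pval_groups, lam):
    """Estimated weights (6) of the data-adaptive one-way GBH:
    w_hat_g = (n_{g.} - R_g + 1)(R_N + m - 1) / (N (1 - lam) R_g),
    where R_g = #{i : P_gi <= lam} and R_N = sum_g R_g.
    pval_groups: list of arrays of p-values, one array per group."""
    R = np.array([int(np.sum(np.asarray(p) <= lam)) for p in pval_groups])
    n = np.array([len(p) for p in pval_groups])
    N, m, RN = n.sum(), len(pval_groups), R.sum()
    with np.errstate(divide="ignore"):   # R_g = 0 gives an infinite weight
        return (n - R + 1) * (RN + m - 1) / (N * (1.0 - lam) * R)

groups = [np.array([0.01, 0.2, 0.03, 0.6]), np.array([0.7, 0.8, 0.4, 0.9])]
w_hat = adaptive_oneway_weights(groups, lam=0.5)
```

The signal-rich first group receives the smaller weight, so its members are easier to reject, in line with the interpretation of the oracle weights given above; for $m = 1$ the formula collapses to $(n - R + 1)/\{n(1-\lambda)\}$, a Storey-type estimate of the null proportion.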
Theorem 2. The above data-adaptive one-way GBH procedure controls the overall FDR under independence among all p-values and Assumption 1.
Proof. The theorem will follow if we can show that the estimated weights, treated as functions of all the p-values, satisfy the two conditions in Result 2: (i) $\hat{w}_g$ is non-decreasing in $P_{gi}$ for each $g$, and (ii) the inequality in (2) holds.

The first condition follows by noting that both $n_{g\cdot} - R_{n_{g\cdot}} + 1$ and $(R_N + m - 1)/R_{n_{g\cdot}} = 1 + \sum_{g' \ne g}(R_{n_{g'\cdot}} + 1)/R_{n_{g\cdot}}$ are non-increasing in $R_{n_{g\cdot}}$, which itself is non-increasing in each $P_{gi}$.

To show that the second condition is also satisfied, let us first express $\hat{w}_g$ as a function of $P$, the set of all p-values, i.e., as $\hat{w}_g(P)$, for each $g$. Then, note that if we set $P_{gi}$ at 0 for a particular pair $(g, i \in I_g)$, we get
$$\hat{w}_g(P_{-(g,i)}, 0) = \frac{n_{g\cdot} - R^{(-i)}_{n_{g\cdot}-1}}{N(1-\lambda)} \cdot \frac{R^{(-i)}_{n_{g\cdot}-1} + \sum_{g' \ne g} R_{n_{g'\cdot}} + m}{R^{(-i)}_{n_{g\cdot}-1} + 1}, \quad g = 1, \ldots, m,$$
where $R^{(-i)}_{n_{g\cdot}-1} = \sum_{i' \ne i} I(P_{gi'} \le \lambda)$. Thus, we have
$$E\left[\sum_{g=1}^m \sum_{i \in I_g} \hat{w}_g^{-1}(P_{-(g,i)}, 0)\right] = N\,E\left[\sum_{g=1}^m \sum_{i \in I_g} \frac{(1-\lambda)\,f\big(R^{(-i)}_{n_{g\cdot}-1},\, \sum_{g' \ne g} R_{n_{g'\cdot}}\big)}{n_{g\cdot} - R^{(-i)}_{n_{g\cdot}-1}}\right], \qquad (7)$$
where
$$f\Big(x, \sum_{g' \ne g} R_{n_{g'\cdot}}\Big) = \frac{x+1}{x + \sum_{g' \ne g} R_{n_{g'\cdot}} + m}.$$
Applying Lemma 1 to the expectation on the right-hand side of equation (7) with respect to the p-values in the $g$th group, and completing the expectation with respect to all p-values, we see that the left-hand side of equation (7) is less than or equal to
$$N\,E\left\{\sum_{g=1}^m \frac{R_{n_{g\cdot}} + 1}{R_{n_{g\cdot}} + \sum_{g' \ne g} R_{n_{g'\cdot}} + m}\right\} = N\,E\left\{\sum_{g=1}^m \frac{R_{n_{g\cdot}} + 1}{R_N + m}\right\} = N\,E\left\{\frac{R_N + m}{R_N + m}\right\} = N;$$
i.e., the second condition is also satisfied. Thus, the theorem is proved.

Suppose the $N$ hypotheses can be classified simultaneously according to two criteria and laid out in an $m \times n$ matrix, with $n_{gh} \ge 1$ hypotheses at the intersection of the $g$th row and the $h$th column, i.e., in the $(g,h)$th cell of the matrix. We will consider two different scenarios, one involving only one hypothesis per cell (i.e., $n_{gh} = 1$) and the other involving multiple hypotheses per cell (i.e., $n_{gh} > 1$).

$n_{gh} = 1$

Let $n_g$ be the number of true nulls in the $g$th row, for $g = 1, \ldots, m$, and $m_h$ be the number of true nulls in the $h$th column, for $h = 1, \ldots, n$. The subsets of indexes of true nulls associated with $h$ in the $g$th row and $g$ in the $h$th column are, respectively, $I_g \subseteq \{1, \ldots, n\}$ and $I_h \subseteq \{1, \ldots, m\}$. Consequently, the set of indexes of true nulls among the entire set of hypotheses can be expressed as $I_0 = \bigcup_{g=1}^m I_g = \bigcup_{h=1}^n I_h$.

The proportion of true nulls in the $g$th row is defined as $\pi_g = n_g/n$, and that in the $h$th column as $\pi_h = m_h/m$. The proportion of true nulls in the entire set of $N = mn$ hypotheses is $\pi_0 = \sum_{g=1}^m \pi_g/m = \sum_{h=1}^n \pi_h/n$.

The hypothesis $H_{gh}$ at the intersection of the $g$th row and $h$th column is affected by both its parent row and column, which motivates us to consider assigning the following weight to the $P_{gh}$ corresponding to $H_{gh}$, assuming all these proportions are known:
$$w_{gh} = \left[\frac{1}{2}\left\{\frac{1}{\pi_g}\cdot\frac{1-\pi_g}{1-\pi_0} + \frac{1}{\pi_h}\cdot\frac{1-\pi_h}{1-\pi_0}\right\}\right]^{-1}, \quad g = 1, \ldots, m,\ h = 1, \ldots, n, \qquad (8)$$
to simultaneously account for both row and column effects. The weighted BH procedure applied to the $N$ hypotheses based on the weighted p-values $P^w_{gh} = w_{gh}P_{gh}$, for $g = 1, \ldots, m$, $h = 1, \ldots, n$, is one of our proposed procedures in its oracle form, which we refer to as an Oracle Two-Way GBH procedure.

This weight is a simple extension of that from the one- to the two-way classification setting. If the parent row has a low proportion of true nulls $\pi_g$, it subsequently has large odds of being significant relative to other rows, as indicated by $(1-\pi_g)/(1-\pi_0)$. This reduces the weight $w_{gh}$, making $H_{gh}$ more likely to be rejected. The weight is similarly affected by the parent column. We assume that both classifications have equal impacts on the individual hypothesis, and so the weight is a function of the simple mean of the contributions from the two parent groups.

Noting that $w_{gh}$ can be expressed as $w_{gh}^{-1} = \frac{1}{2}(w_g^{-1} + w_h^{-1})$, with $w_g$ along the rows being defined in expression (4) and $w_h$ being defined similarly along the columns as $w_h = \pi_h(1-\pi_0)/(1-\pi_h)$, $h = 1, \ldots, n$, one sees from (5) that
$$\sum_{g=1}^m \sum_{h \in I_g} w_{gh}^{-1} = \frac{1}{2}\sum_{g=1}^m \sum_{h \in I_g} w_g^{-1} + \frac{1}{2}\sum_{h=1}^n \sum_{g \in I_h} w_h^{-1} = \frac{1}{2}N + \frac{1}{2}N = N, \qquad (9)$$
since $\sum_{g=1}^m \sum_{h \in I_g} = \sum_{h=1}^n \sum_{g \in I_h}$. Thus, the equality in (1) is satisfied for the weights in expression (8), and so we can state the following theorem from Result 1 without offering a proof.

Theorem 3.
The Oracle Two-Way GBH procedure based on the weights in (8) controls the overall FDR under PRDS and Assumption 1.

Remark 2. The weight in (8) can be customized, still satisfying (9), to suit variable influence of the row and column classifications. As an example, the weight can be adapted to reflect the imbalance between the numbers of rows and columns as follows:
$$w_{gh} = \left[\frac{1}{m+n}\left\{\frac{m}{\pi_g}\cdot\frac{1-\pi_g}{1-\pi_0} + \frac{n}{\pi_h}\cdot\frac{1-\pi_h}{1-\pi_0}\right\}\right]^{-1}, \quad g = 1, \ldots, m,\ h = 1, \ldots, n. \qquad (10)$$
These weights additionally account for the proportions of groups along the rows and along the columns out of the total number of groups. Clearly, if $m = n$, they reduce to those in (8). Such a choice of weights in adapting the BH procedure to a two-way classification structure is not unique, and there remains scope for other choices depending on variable factors or external information.

We consider the Oracle Two-Way GBH procedure in Theorem 3 and estimate the weights in it by the following:
$$\hat{w}_{gh} = \left[\frac{N(1-\lambda)}{2}\left\{\frac{R_{n_g}}{(n - R_{n_g} + 1)(R_N + m - 1)} + \frac{R_{m_h}}{(m - R_{m_h} + 1)(R_N + n - 1)}\right\}\right]^{-1}, \quad g = 1, \ldots, m,\ h = 1, \ldots, n, \qquad (11)$$
where $R_{n_g} \equiv R_{n_g}(\lambda) = \sum_{h=1}^n I(P_{gh} \le \lambda)$, $R_{m_h} \equiv R_{m_h}(\lambda) = \sum_{g=1}^m I(P_{gh} \le \lambda)$, and $R_N \equiv R_N(\lambda) = \sum_{g=1}^m R_{n_g}(\lambda) = \sum_{h=1}^n R_{m_h}(\lambda)$, for some fixed $\lambda \in (0,1)$.

We have the following theorem as an extension of Theorem 2 from the one- to the two-way classification setting.

Theorem 4.
The above Data-Adaptive Two-Way GBH procedure controls the overall FDR under independence among all p-values and Assumption 1.

Proof. This theorem can be proved based on the same arguments that were used to prove Theorem 2 using Result 2. First, note that $\hat{w}_{gh}^{-1} = \frac{1}{2}(\tilde{w}_g^{-1} + \tilde{w}_h^{-1})$, where
$$\tilde{w}_g = \frac{(n - R_{n_g} + 1)(R_N + m - 1)}{N(1-\lambda)R_{n_g}} \quad \text{and} \quad \tilde{w}_h = \frac{(m - R_{m_h} + 1)(R_N + n - 1)}{N(1-\lambda)R_{m_h}}.$$
$\hat{w}_{gh}$ is non-decreasing in $P_{gh}$, since both $\tilde{w}_g$ and $\tilde{w}_h$ are non-decreasing in $P_{gh}$, which can be proved exactly the way it was proved for the $\hat{w}_g$ in Theorem 2. Moreover, as proved in Theorem 2 for $\hat{w}_g$ using Lemma 1, we have the following inequalities for $\tilde{w}_g$ and $\tilde{w}_h$ under the assumption of independence among all p-values:
$$E\left[\sum_{g=1}^m \sum_{h \in I_g} \tilde{w}_g^{-1}(P_{-(g,h)}, 0)\right] \le N, \qquad E\left[\sum_{h=1}^n \sum_{g \in I_h} \tilde{w}_h^{-1}(P_{-(g,h)}, 0)\right] \le N,$$
from which we see that
$$E\left[\sum_{g=1}^m \sum_{h \in I_g} \hat{w}_{gh}^{-1}(P_{-(g,h)}, 0)\right] = \frac{1}{2}\,E\left[\sum_{g=1}^m \sum_{h \in I_g} \tilde{w}_g^{-1}(P_{-(g,h)}, 0)\right] + \frac{1}{2}\,E\left[\sum_{h=1}^n \sum_{g \in I_h} \tilde{w}_h^{-1}(P_{-(g,h)}, 0)\right] \le N;$$
i.e., inequality (2) in Result 2 holds. Thus, the theorem is proved.
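For an $m \times n$ matrix of p-values with one hypothesis per cell, the estimate (11) simply averages the inverses of the row- and column-wise analogues $\tilde{w}_g$ and $\tilde{w}_h$ of the one-way estimate (6). A minimal sketch, our own illustration with hypothetical names:

```python
import numpy as np

def adaptive_twoway_weights(P, lam):
    """Estimated weights (11): w_hat_gh^{-1} = (tw_g^{-1} + tw_h^{-1}) / 2,
    built from row counts R_{n_g}, column counts R_{m_h}, and their total R_N.
    P: an m x n array of p-values, one per cell."""
    P = np.asarray(P, dtype=float)
    m, n = P.shape
    N = m * n
    below = P <= lam
    Rg = below.sum(axis=1)          # R_{n_g}, one count per row
    Rh = below.sum(axis=0)          # R_{m_h}, one count per column
    RN = int(below.sum())
    with np.errstate(divide="ignore"):   # an empty row/column gets infinite weight
        tw_g = (n - Rg + 1) * (RN + m - 1) / (N * (1.0 - lam) * Rg)
        tw_h = (m - Rh + 1) * (RN + n - 1) / (N * (1.0 - lam) * Rh)
        inv = 0.5 * (1.0 / tw_g[:, None] + 1.0 / tw_h[None, :])
        w = 1.0 / inv
    return w

P = [[0.01, 0.03],
     [0.70, 0.20]]
w_hat = adaptive_twoway_weights(P, lam=0.5)
```

In this toy example the cell lying in both the fully active row and the fully active column receives the smallest weight (1.0), while the cell in the least active row and column receives the largest (4.0), so rejections are steered toward signal-rich rows and columns.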
Remark 3. The estimate of the weight considered above is stated in its simplest form. Like its oracle counterpart, it can also be modified as
$$\hat{w}_{gh} = \left[\frac{N(1-\lambda)}{m+n}\left\{\frac{mR_{n_g}}{(n - R_{n_g} + 1)(R_N + m - 1)} + \frac{nR_{m_h}}{(m - R_{m_h} + 1)(R_N + n - 1)}\right\}\right]^{-1}, \quad g = 1, \ldots, m,\ h = 1, \ldots, n. \qquad (12)$$
In addition to accounting for the row and column effects, this expression also accounts for the difference in the numbers of rows and columns and accordingly emphasizes the corresponding effects on the individual hypothesis. Theorem 4 can also be stated for the adaptive two-way GBH with one hypothesis per cell in terms of these alternative weights.

$n_{gh} > 1$

Let $n_{g\cdot} = \sum_{h=1}^n n_{gh}$ and $n_{\cdot h} = \sum_{g=1}^m n_{gh}$ be the total numbers of hypotheses, respectively, in the $g$th row and $h$th column, so that
$$N = \sum_{g=1}^m n_{g\cdot} = \sum_{h=1}^n n_{\cdot h} = \sum_{g=1}^m \sum_{h=1}^n n_{gh}.$$
Let $n_{gh}^0$ be the number of true nulls in the $(g,h)$th cell and $I_{gh} \subseteq \{1, \ldots, n_{gh}\}$ be the corresponding subset of indexes of true nulls. The overall set of indexes of the true nulls is
$$I_0 = \bigcup_{g=1}^m \bigcup_{h=1}^n I_{gh}.$$
The proportion of true nulls in the $(g,h)$th cell of the $m \times n$ matrix is $\pi_{gh} = n_{gh}^0/n_{gh}$. This helps to define $\pi_g = \sum_{h=1}^n n_{gh}\pi_{gh}/\sum_{h=1}^n n_{gh}$, $\pi_h = \sum_{g=1}^m n_{gh}\pi_{gh}/\sum_{g=1}^m n_{gh}$, and
$$N\pi_0 = \sum_{g=1}^m n_{g\cdot}\pi_g = \sum_{h=1}^n n_{\cdot h}\pi_h = \sum_{g=1}^m \sum_{h=1}^n n_{gh}\pi_{gh}.$$
Suppose $P_{ghk}$ is the $k$th p-value in the $(g,h)$th cell, and $H_{ghk}$ is the corresponding hypothesis. Assuming that all the proportions mentioned above are known, we consider assigning the following weight to $P_{ghk}$, for each $k = 1, \ldots, n_{gh}$, to capture the underlying two-way classification structure of the hypotheses, and refer to the resulting weighted BH procedure as an Oracle Two-Way GBH$_{>1}$ procedure.
$$w_{gh} = \left[\frac{1}{4}\left\{\frac{1}{\pi_{gh}}\left(\frac{1-\pi_{gh}}{1-\pi_g} + \frac{1-\pi_{gh}}{1-\pi_h}\right) + \left(\frac{1}{\pi_g}\cdot\frac{1-\pi_g}{1-\pi_0} + \frac{1}{\pi_h}\cdot\frac{1-\pi_h}{1-\pi_0}\right)\right\}\right]^{-1} \qquad (13)$$
Expressing $w_{gh}$ as $w_{gh}^{-1} = \frac{1}{4}\big(w_{1,gh}^{-1} + w_{2,gh}^{-1} + w_g^{-1} + w_h^{-1}\big)$, where
$$w_{1,gh} = \frac{\pi_{gh}(1-\pi_g)}{1-\pi_{gh}}, \quad w_{2,gh} = \frac{\pi_{gh}(1-\pi_h)}{1-\pi_{gh}}, \quad w_g = \frac{\pi_g(1-\pi_0)}{1-\pi_g}, \quad w_h = \frac{\pi_h(1-\pi_0)}{1-\pi_h},$$
one can see that
$$\sum_{g=1}^m \sum_{h=1}^n \sum_{k \in I_{gh}} \left[w_{1,gh}^{-1} + w_{2,gh}^{-1} + w_g^{-1} + w_h^{-1}\right] = 4N;$$
that is, the equality in (1) is satisfied by the weights in (13). Therefore, we can state the following from Result 1 without a proof.

Theorem 5.
The two-way GBH procedure with multiple hypotheses per cell based on the weights in (13) controls the overall FDR conservatively under PRDS and Assumption 1.

Remark 4. Of course, one can consider defining the two-way GBH procedure with multiple hypotheses per cell based on other types of weights subject to the equality in (1). For instance, following the preceding case of one hypothesis per cell in the two-way classification setup, a natural choice of weight assigned to the hypotheses in the $(g,h)$th cell would be
$$w_{gh} = \left[\frac{1}{2}\left\{\frac{1}{\pi_g}\cdot\frac{1-\pi_g}{1-\pi_0} + \frac{1}{\pi_h}\cdot\frac{1-\pi_h}{1-\pi_0}\right\}\right]^{-1}. \qquad (14)$$
The choice of weight for the two-way GBH procedure in Theorem 5 contains an additional term that depends on the ratio of the proportion of signals in each cell to the same proportions in the parent row and column. Owing to the unequal numbers of members at the intersections in the two-way layout, further modifications of the weights would be complicated. However, if there is an equal number, say $p > 0$, of hypotheses at each cell, we can further edit the weights in (14) as
$$w_{gh} = \left[\frac{1}{p(m+n)}\left\{\frac{mp}{\pi_g}\cdot\frac{1-\pi_g}{1-\pi_0} + \frac{np}{\pi_h}\cdot\frac{1-\pi_h}{1-\pi_0}\right\}\right]^{-1}, \qquad (15a)$$
and those in the procedure in Theorem 5 as
$$w_{gh} = \left[\frac{1}{p(m+n)}\left\{\frac{p}{\pi_{gh}}\left(\frac{1-\pi_{gh}}{1-\pi_g} + \frac{1-\pi_{gh}}{1-\pi_h}\right) + \left(\frac{p(m-1)}{\pi_g}\cdot\frac{1-\pi_g}{1-\pi_0} + \frac{p(n-1)}{\pi_h}\cdot\frac{1-\pi_h}{1-\pi_0}\right)\right\}\right]^{-1}. \qquad (15b)$$

We consider the Two-Way GBH$_{>1}$ procedure in Theorem 5 and replace its weight $w_{gh}$ by
$$\hat{w}_{gh} = \left[\frac{1}{4}\left\{\frac{1-\lambda}{n_{gh} - R_{n_{gh}} + 1}\left(\frac{n_{g\cdot}R_{n_{gh}}}{R_{n_{g\cdot}} + n - 1} + \frac{n_{\cdot h}R_{n_{gh}}}{R_{n_{\cdot h}} + m - 1}\right) + N(1-\lambda)\left(\frac{R_{n_{g\cdot}}}{\{n_{g\cdot} - R_{n_{g\cdot}} + 1\}\{R_N + m - 1\}} + \frac{R_{n_{\cdot h}}}{\{n_{\cdot h} - R_{n_{\cdot h}} + 1\}\{R_N + n - 1\}}\right)\right\}\right]^{-1} \qquad (16)$$
for $g = 1, \ldots, m$, $h = 1, \ldots, n$, where, for some fixed $\lambda$, $R_{n_{gh}} \equiv R_{n_{gh}}(\lambda) = \sum_{k=1}^{n_{gh}} I(P_{ghk} \le \lambda)$, $R_{n_{g\cdot}} \equiv R_{n_{g\cdot}}(\lambda) = \sum_{h=1}^n R_{n_{gh}}(\lambda)$, $R_{n_{\cdot h}} \equiv R_{n_{\cdot h}}(\lambda) = \sum_{g=1}^m R_{n_{gh}}(\lambda)$, and
$$R_N = \sum_{g=1}^m R_{n_{g\cdot}} = \sum_{h=1}^n R_{n_{\cdot h}} = \sum_{g=1}^m \sum_{h=1}^n R_{n_{gh}}.$$
It is referred to as a Data-Adaptive Two-Way GBH$_{>1}$ procedure.

Theorem 6. The above Data-Adaptive Two-Way GBH$_{>1}$ procedure controls the overall FDR under independence among all p-values and Assumption 1.

Proof.
Again, this theorem will be proved using the same arguments as in the proof of Theorem 4, by verifying that the conditions in Result 2 are satisfied by the weight functions $\hat w_{gh}$; i.e., $\hat w_{gh}$ is increasing in each $P_{ghk}$, and the following inequality holds with $P_{ghk}$ set to 0 in it: $\sum_{g=1}^{m} \sum_{h=1}^{n} \sum_{k \in I_{0,gh}} \hat w_{gh}^{-1}(P_{-(g,h,k)}, 0) \le N$.

As in proving Theorem 4, let us consider $\hat w_{gh}$ in terms of the following representation: $\hat w_{gh}^{-1} = \tilde w_{1,gh}^{-1} + \tilde w_{2,gh}^{-1} + \tilde w_{g}^{-1} + \tilde w_{h}^{-1}$, where

$$ \tilde w_{1,gh} = \frac{n_{gh} - R_{n_{gh}} + 1}{n_{g\cdot}(1-\lambda)} \cdot \frac{R_{n_{g\cdot}} + n - 1}{R_{n_{gh}}}, \qquad \tilde w_{2,gh} = \frac{n_{gh} - R_{n_{gh}} + 1}{n_{\cdot h}(1-\lambda)} \cdot \frac{R_{n_{\cdot h}} + m - 1}{R_{n_{gh}}}, $$
$$ \tilde w_{g} = \frac{n_{g\cdot} - R_{n_{g\cdot}} + 1}{N(1-\lambda)} \cdot \frac{R_N + m - 1}{R_{n_{g\cdot}}}, \qquad \tilde w_{h} = \frac{n_{\cdot h} - R_{n_{\cdot h}} + 1}{N(1-\lambda)} \cdot \frac{R_N + n - 1}{R_{n_{\cdot h}}}. \quad (17) $$

As argued before in proving Theorems 2 and 4, each of the four weights in (17) can be shown to satisfy the same two properties that we intend to show for $\hat w_{gh}$. In other words, $\hat w_{gh}$ satisfies the desired two conditions in Result 2, and hence the theorem is proved.

Remark 5
Other choices of weights can be suggested, such as

$$ \hat w_{gh} = \left[ \frac{N(1-\lambda)}{2} \left\{ \frac{R_{g\cdot}}{(n_{g\cdot} - R_{g\cdot} + 1)(R_N + m - 1)} + \frac{R_{\cdot h}}{(n_{\cdot h} - R_{\cdot h} + 1)(R_N + n - 1)} \right\} \right]^{-1}. \quad (18) $$

As in the oracle case, these weights can be further modified to be more informative if there are an equal number of hypotheses (p > 0) at each cell. The modified choice corresponding to expression (16) would be

$$ \hat w_{gh} = \left[ \frac{1}{(m+n)p} \left\{ \frac{p(1-\lambda)}{n_{gh} - R_{n_{gh}} + 1} \left( \frac{n_{g\cdot} R_{n_{gh}}}{R_{n_{g\cdot}} + n - 1} + \frac{n_{\cdot h} R_{n_{gh}}}{R_{n_{\cdot h}} + m - 1} \right) + N(1-\lambda) \left( \frac{p(m-1) R_{n_{g\cdot}}}{\{n_{g\cdot} - R_{n_{g\cdot}} + 1\}\{R_N + m - 1\}} + \frac{p(n-1) R_{n_{\cdot h}}}{\{n_{\cdot h} - R_{n_{\cdot h}} + 1\}\{R_N + n - 1\}} \right) \right\} \right]^{-1} $$
$$ \forall\, g = 1, \ldots, m,\ h = 1, \ldots, n, \quad (19) $$

and the modified choice corresponding to expression (18) is

$$ \hat w_{gh} = \left[ \frac{N(1-\lambda)}{(m+n)p} \left\{ \frac{mp\, R_{g\cdot}}{(n_{g\cdot} - R_{g\cdot} + 1)(R_N + m - 1)} + \frac{np\, R_{\cdot h}}{(n_{\cdot h} - R_{\cdot h} + 1)(R_N + n - 1)} \right\} \right]^{-1}. \quad (20) $$

We carried out extensive simulation studies to investigate the performances of our proposed procedures in Theorems 2-6, in terms of FDR control and power (the expected proportion of correctly rejected false nulls among all false nulls), against their relevant competitors. This section discusses these results.
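Every procedure compared below reduces, once its weights are fixed, to a step-up test on the weighted p-values $P_i/w_i$ with critical constants $i\alpha/N$. A minimal sketch of that common core (generic weights; the function name and interface are ours, not the paper's):

```python
import numpy as np

def weighted_stepup(pvals, weights, alpha):
    """Step-up procedure on weighted p-values P_i / w_i with constants i*alpha/N.

    Reduces to the ordinary BH procedure when all weights equal 1.
    Returns a boolean rejection indicator per hypothesis.
    """
    pvals = np.asarray(pvals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    N = pvals.size
    pw = pvals / weights                      # weighted p-values
    order = np.argsort(pw)
    sorted_pw = pw[order]
    # largest r with P^w_(r) <= r * alpha / N
    thresh = np.arange(1, N + 1) * alpha / N
    below = sorted_pw <= thresh
    reject = np.zeros(N, dtype=bool)
    if below.any():
        r = np.max(np.nonzero(below)[0]) + 1  # number of rejections
        reject[order[:r]] = True
    return reject
```

With unit weights this is exactly the BH procedure; a weight above 1 relaxes the threshold for its hypothesis, which is how the group- or cell-level information enters.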
Here, our study was designed to compare the performance of the Data-Adaptive One-Way GBH procedure in Theorem 2 with the following three relevant competitors, the first two of which were considered in Hu et al. (2010) as extensions to the one-way classification setting of the single-group data-adaptive BH procedures proposed, respectively, in Benjamini and Hochberg (2000) and Benjamini et al. (2006).
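The two single-group null-proportion estimators underlying the first two competitors, and the BH step-up they rely on, can be sketched as follows (a minimal rendering; the function names are ours):

```python
import numpy as np

def bh_rejections(pvals, alpha):
    """Number of rejections made by the ordinary BH step-up at level alpha."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    below = p <= np.arange(1, n + 1) * alpha / n
    return int(np.nonzero(below)[0].max() + 1) if below.any() else 0

def pi0_lsl(pvals):
    """Lowest-slope (LSL) estimate of the proportion of true nulls
    (Benjamini and Hochberg, 2000): compute l_i = (n - i + 1)/(1 - p_(i)),
    stop at the first i with l_i > l_{i-1}, and return
    min((floor(l_i) + 1)/n, 1)."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    l = (n - np.arange(1, n + 1) + 1) / (1.0 - p)
    for i in range(1, n):
        if l[i] > l[i - 1]:
            return min((np.floor(l[i]) + 1.0) / n, 1.0)
    return 1.0

def pi0_tst(pvals, alpha):
    """Two-stage (TST) estimate (Benjamini, Krieger and Yekutieli, 2006):
    1 - r/n, with r the number of BH rejections at level alpha/(1 + alpha)."""
    n = len(pvals)
    r = bh_rejections(pvals, alpha / (1.0 + alpha))
    return (n - r) / n
```

In the grouped versions below these estimators are simply applied separately to the p-values of each group.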
LSL (Lowest-Slope) Grouped BH: One-Way GBH procedure with π_g in equation (4) estimated, for each g, by

$$ \hat\pi^{\mathrm{LSL}}_g = \min\left( \frac{\lfloor l_{g,i} \rfloor + 1}{n},\, 1 \right), \qquad l_{g,i} = \frac{n - i + 1}{1 - P_{g,(i)}}, $$

with i being the smallest index such that $l_{g,i} > l_{g,i-1}$, and $P_{g,(i)}$ the i-th smallest p-value in the g-th group.

TST (Two-Stage) Grouped BH:
One-Way GBH procedure with π_g in equation (4) estimated, for each g, by

$$ \hat\pi^{\mathrm{TST}}_g = \frac{n - r_g}{n}, $$

with r_g being the number of rejections obtained by applying the non-adaptive BH procedure to the p-values in the g-th group at level α/(1 + α).

Naive Adaptive BH:
The usual data-adaptive BH with the following estimate of π:

$$ \hat\pi = \frac{N - R_N + 1}{N(1-\lambda)}, \qquad \text{with } R_N = \sum_{g=1}^{m} \sum_{i=1}^{n} I(P_{g,i} \le \lambda), \quad (21) $$

for some fixed λ ∈ (0, 1).

The following steps were taken to simulate values of FDR and power for the aforementioned procedures.

1. Generate θ_{g⋆}, for g = 1, …, m, as a random sample from Ber(1 − π_⋆);
2. For each g such that θ_{g⋆} = 1, generate θ_{⋆|g} = (θ_{1|g}, …, θ_{n|g}) as a random vector of n i.i.d. Ber(1 − π_2);
3. Given (θ_{g⋆}, θ_{⋆|g}), g = 1, …, m, generate m independent n-dimensional random vectors X_g = (X_{g,1}, …, X_{g,n}), g = 1, …, m, as follows:
$$ X_g = \mu\, \theta_{g\star}\, \theta_{\star|g} + \sqrt{1-\rho_g}\, Z_g + \sqrt{\rho_g}\, Z_{g0} \mathbf{1}_n, $$
for some 0 ≤ ρ_g < 1, having generated {Z_{g0}, Z_g = (Z_{g1}, …, Z_{gn})^T} as a random vector of n + 1 i.i.d. N(0, 1) samples, for g = 1, …, m;
4. Apply each procedure at FDR level α = 0.05 for testing H_{g,i}: E(X_{g,i}) = 0 against K_{g,i}: E(X_{g,i}) > 0, simultaneously for all g = 1, …, m, i = 1, …, n, in terms of the corresponding p-values, and note the proportions of false rejections among all rejections and of correct rejections among all false nulls;
5. Repeat Steps 1-4 200 times to simulate the values of FDR and power for each procedure by averaging the corresponding proportions obtained in Step 4.

Remark 6
Our modeling of E(X_g) in terms of (θ_{g⋆}, θ_{⋆|g}) allowed us to split the state of each hypothesis at two levels, group and individual, enabling us to regulate the density of signals in the entire set of hypotheses through the following representation of the proportion of true nulls among all hypotheses:

$$ \pi = 1 - (1 - \pi_{\star})(1 - \pi_{2}). \quad (22) $$

We fixed m = 50, n = 100, µ = 0 for true null hypotheses, and µ = 3 for true signals.

We had two main objectives in our simulation study regarding the performance of our proposed Data-Adaptive One-Way GBH procedure in Theorem 2: (i) to investigate how well it performs among all four procedures under independence, when all of them are theoretically known to control FDR, and (ii) to investigate if it can possibly control FDR under PRDS, in view of the fact that such control is yet to be theoretically proved.

Figures 1-2 display the findings of the first type of investigation, with λ = 0.5. Figure 1 considers situations where signals are distributed evenly across all groups, with π_⋆ = 0 and π_2 (i.e., π, by equation (22)) being allowed to vary. As seen from this figure, our proposed procedure performs better than the LSL and TST GBH procedures. It controls FDR less conservatively and is more powerful than these counterparts at all levels of density of true signals. However, its performance is quite similar to that of the adaptive BH procedure, owing to the signals being uniformly distributed across all groups. Since signals may potentially be non-uniformly distributed across the groups, we also considered a scenario where only half the groups may contain significant members; see Figure 2. Our proposed procedure is remarkably more powerful in this case than the other methods.

Figure 3 displays the findings of the second type of investigation. As seen from it, our proposed procedure can potentially control FDR in scenarios where the concentration of signals is high and for certain choices of λ, preferably λ < α. A few such scenarios, with varying density of signals uniformly distributed in all groups (i.e., π_⋆ = 0) and choices of λ < α, are shown in this figure.
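The data-generating scheme in Steps 1-3 above can be sketched as follows (a minimal illustration; the function and parameter names, e.g. `pi_group` for the group-level proportion, are ours):

```python
import math
import numpy as np

def simulate_one_way(m=50, n=100, mu=3.0, pi_group=0.0, pi_ind=0.8,
                     rho=0.0, seed=None):
    """Generate one-way classified test statistics and one-sided p-values.

    Group-level states theta_g ~ Ber(1 - pi_group) flag whether group g may
    contain signals; given theta_g = 1, the individual states within group g
    are i.i.d. Ber(1 - pi_ind).  Noise within a group is equicorrelated:
    X_g = mu * theta + sqrt(1 - rho) * Z_g + sqrt(rho) * Z_g0 * 1_n.
    """
    rng = np.random.default_rng(seed)
    theta_group = rng.binomial(1, 1.0 - pi_group, size=m)
    theta = theta_group[:, None] * rng.binomial(1, 1.0 - pi_ind, size=(m, n))
    z = rng.standard_normal((m, n))            # Z_g, independent part
    z_shared = rng.standard_normal((m, 1))     # Z_g0, one draw per group
    x = mu * theta + math.sqrt(1 - rho) * z + math.sqrt(rho) * z_shared
    # one-sided p-values for H: E(X) = 0 vs K: E(X) > 0
    sf = np.vectorize(lambda t: 0.5 * math.erfc(t / math.sqrt(2.0)))
    return sf(x), theta
```

Setting `rho = 0` gives the independence scenarios of Figures 1-2, while a positive `rho` induces the within-group equicorrelation used for the PRDS investigation in Figure 3.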
Figure 1: FDR and Power comparisons of the proposed Data-Adaptive One-Way GBH with other methods, under independence (m = 50, n = 100, ρ = 0, π_⋆ = 0, π_2 varying).
Figure 2: FDR and Power comparisons of the Data-Adaptive One-Way GBH with other methodsapplied to independent one-way classified hypotheses when true signals are unevenly distributed( m = 50 , n = 100 , ρ = 0 , π (cid:5) = 0 . , π ) .18 .00 0.01 0.02 0.03 0.04 . . . . . . p = 0.70 l F DR One−way Adapted BHAdaptive BH 0.00 0.01 0.02 0.03 0.04 . . . . . . p = 0.75 l . . . . . . p = 0.80 l . . . . . . p = 0.90 l (a) FDR Comparisons . . . . . . p = 0.70 l P o w e r One−way Adapted BHAdaptive BH 0.00 0.01 0.02 0.03 0.04 . . . . . . p = 0.75 l . . . . . . p = 0.80 l . . . . . . p = 0.90 l (b) Power Comparisons Figure 3: Comparison of the Data-Adaptive One-Way GBH with the naive Adaptive BH methodunder PRDS condition for varying choices of 0 < λ < . α ) ( m = 50 , n = 100 , ρ = 0 . , π (cid:5) =0 , π ) . This section presents results associated with our simulation study that focused on investigatingthe performances of our proposed (i) Oracle Two-Way GBH procedure (Theorem 3) against theusual single-group BH procedure and the p-filter algorithm (Foygel Barber and Ramdas (2015),Ramdas et al. (2017)) in their oracle forms, and (ii) Data-Adaptive Two-Way GBH procedure(Theorem 4) against the Naive Data-Adaptive BH in terms of FDR control and power undernormal distributional settings. The simulation setting here is a natural extension of that in the above section. More specifically,it consists of the following steps:1. Generate Θ mn as an m × n random matrix of i.i.d. Ber(1 − π rc ), θ m as a random vectorof m i.i.d. Ber(1 − π r ), and θ n as a random vector of n i.i.d. Ber(1 − π c );19. Obtain Θ = Θ mn (cid:63) θ m Tn (cid:63) m θ Tn , (23)(with A(cid:63)B denoting the Hadamard product between matrices A and B , and a representingthe a -dimensional vector of 1’s),3. 
Given Θ, generate a random m × n matrix X = ((X_{gh})) as follows:

$$ X = \mu \Theta + \sqrt{(1-\rho_r)(1-\rho_c)}\, Z_{mn} + \sqrt{(1-\rho_r)\rho_c}\, Z_m \mathbf{1}_n^T + \sqrt{\rho_r(1-\rho_c)}\, \mathbf{1}_m Z_n^T + \sqrt{\rho_r \rho_c}\, Z_0 \mathbf{1}_m \mathbf{1}_n^T, $$

having generated Z_{mn} as an m × n random matrix, Z_m as an m-dimensional random vector, and Z_n as an n-dimensional random vector, each comprising i.i.d. N(0, 1) entries, and Z_0 as an additional N(0, 1) random variable;
4. Apply each procedure at FDR level α = 0.05 for testing H_{gh}: E(X_{gh}) = 0 against K_{gh}: E(X_{gh}) > 0, simultaneously for all g = 1, …, m, h = 1, …, n, in terms of the corresponding p-values, and note the proportions of false rejections among all rejections and of correct rejections among all false nulls;
5. Repeat Steps 1-4 200 times to simulate the values of FDR and power for each procedure by averaging the corresponding proportions obtained in Step 4.

Remark 7
Note that vec(X) ~ N_{mn}(vec(µΘ), Σ_c ⊗ Σ_r), where

$$ \Sigma_r = (1-\rho_r) I_n + \rho_r \mathbf{1}_n \mathbf{1}_n^T, \quad \rho_r \in [0,1); \qquad \Sigma_c = (1-\rho_c) I_m + \rho_c \mathbf{1}_m \mathbf{1}_m^T, \quad \rho_c \in [0,1). $$

Thus, the test statistics are allowed to have different types of dependence structure by appropriately setting the value of ρ_r and/or ρ_c at 0.

Also, as seen from equation (23), the hidden state of each row and each column, in terms of being significant or not, has been factored into that of the hypothesis lying at their intersection. This enables us to incorporate the true effect of the underlying two-way classification structure into our simulation study. Specifically, we can regulate the density of signals in the entire matrix using the following:

$$ \pi = 1 - (1 - \pi_{rc})(1 - \pi_r)(1 - \pi_c), $$

representing the proportion of true nulls in the entire set of mn hypotheses in terms of the proportions of rows (1 − π_r) and columns (1 − π_c) containing signals.

Simulation Findings

We fixed m = 50, n = 100, µ = 0 for true null hypotheses, and µ = 3 for true signals.

Comparison of Oracle Procedures:
Here, we wanted to make two types of investigation of the Oracle Two-Way GBH's performance, under independence as well as under PRDS, compared to the other oracle procedures being considered: one in terms of identifying signals and the other in terms of FDR control and power. The findings are displayed in Figures 4-6.

For the first type of investigation, in the 50 × 100 matrix we arranged the significant hypotheses in two 15 × 15 blocks and along the diagonal of another 15 × 15 block, as shown in Figure 4a. This arrangement helps to analyze the performance of a multiple testing procedure when the signals are dense (in the two blocks) as well as when they are sparse (along the diagonal). The performance of each method based on one trial is shown in the remaining plots in Figure 4, in Figures 4b-4d for the independence case and in Figures 4e-4g for the PRDS case. Our procedure is seen to be the most successful in identifying clustered signals, and almost as efficient as the BH procedure when the signals are sparse. The performances of the p-filter algorithm and the BH procedure are comparable. The BH procedure better identifies sparse signals, although only under independence, and it makes a marginally higher number of false rejections than the p-filter process.

For the second type of investigation, we varied the density of true signals in the 50 × 100 matrix, using π_r and π_c to regulate the proportions of significant rows and columns. For each choice of (π_r, π_c), we varied π_rc between 0 and 1 to determine the density of signals in the significant rows and columns. We evaluated the performance of each of the three methods in terms of simulated FDR and power at each level of 1 − π_rc. The results are displayed in Figure 5 for the independent case and in Figure 6 for the PRDS case. Our proposed Oracle Two-Way GBH procedure performs better than either the p-filter algorithm or the BH procedure in terms of both FDR control and power; performances of the p-filter process and the BH procedure are comparable. As the density of true signals increases, the proposed method maintains control of FDR at level α and is more powerful than the other two procedures.

Figure 4: Comparison of the proposed oracle Two-way GBH with one hypothesis per cell, with other methods for one trial; panels: (a) Signals, (b) Two-way GBH, (c) p-filter, (d) BH, (e) Two-way GBH, (f) p-filter, (g) BH. (a) shows the layout of the significant hypotheses.
(b), (c) and (d) show the performances of the proposed two-way GBH, the p-filter process and the BH procedure when the hypotheses are independent; (e), (f) and (g) show the performances of these methods for the same setup, respectively, when there is positive dependence among the hypotheses. We choose ρ_r = 0. , ρ_c = 0. .

Figure 5: Comparison of the oracle Two-way GBH procedure with other methods, under independence; panels (a) and (b) show FDR and Power comparisons, respectively, for (π_r, π_c) = (0, 0), (0.3, 0.7), (0.5, 0.5), (0.8, 0.2), with π_rc varying. Set of parameters used is (m = 50, n = 100, ρ_r = 0, ρ_c = 0, π_r, π_c, π_rc).

Comparison of Data-Adaptive Procedures: Here, our focus had been to investigate the following two questions regarding the performance of our proposed Data-Adaptive Two-Way GBH in Theorem 4 compared to its natural competitor, the Naive Data-Adaptive BH: (i) How well does it perform under independence, when both are theoretically known to control FDR? (ii) Can it possibly control FDR under PRDS, in view of the fact that such control is yet to be theoretically proved for either of these procedures?

Figures 7 and 8 display the findings of these investigations. Figure 7, which summarizes the results associated with answering question (i) (with λ = 0.5), shows our proposed method performing similarly to the Naive Data-Adaptive BH when the signals are uniformly distributed (π_r = π_c = 0), while seeming significantly more powerful in the other situations. For the PRDS case we chose ρ_r = 0. , ρ_c = 0. ; Figure 8 suggests that our proposed method possibly can control FDR under PRDS when there is a high density of signals across all row and column groups; however, the choice of λ seems crucial in such situations, and its value should be chosen in the range (0, α).
Figure 6: Comparison of the oracle Two-way GBH procedure with other methods, for hypotheses with the PRDS property; panels (a) and (b) show FDR and Power comparisons, respectively, for (π_r, π_c) = (0, 0), (0.3, 0.7), (0.5, 0.5), (0.8, 0.2), with π_rc varying. Set of parameters used is (m = 50, n = 100, ρ_r = 0. , ρ_c = 0. , π_r, π_c, π_rc).

Figure 7: Comparison of the data-adaptive Two-way GBH procedure with the naive Adaptive BH method, under independence; panels (a) and (b) show FDR and Power comparisons, respectively, for (π_r, π_c) = (0, 0), (0.3, 0.7), (0.5, 0.5), (0.8, 0.2), with π_rc varying. Set of parameters used is (m = 50, n = 100, ρ_r = 0, ρ_c = 0, π_r, π_c, π_rc).

Figure 8: Comparison of the data-adaptive Two-way GBH procedure with the naive Adaptive BH procedure, for hypotheses with the PRDS property, for varying choices of 0 < λ < α (= 0.05); panels correspond to π_rc = 0.70, 0.75, 0.80, 0.90. Set of parameters used is (m = 50, n = 100, ρ_r = 0. , ρ_c = 0. , π_r = 0, π_c = 0, π_rc).

Two-way Classified Hypotheses - Multiple hypotheses at each intersection

This section presents results from our simulation study carried out to investigate the performances of (i) the Oracle Two-Way GBH> procedure (Theorem 5) against the single-group BH procedure in its oracle form and (ii) the Data-Adaptive Two-Way GBH> (Theorem 6) against the naive Adaptive BH, in terms of FDR control and power under normal distributional settings.

We considered the case where n_gh = p for all (g, h), so that our data-generating process had to be designed to produce a random pair of third-order tensors of dimension m × n × p, (X, Θ), consisting of the normally distributed test statistics and the Bernoulli hidden states of the corresponding hypotheses, respectively. The following were the steps in that process:

1. Generate Θ_{mnp} as an m × n × p dimensional random tensor of i.i.d. Ber(1 − π_rc), θ_m as a random vector of m i.i.d. Ber(1 − π_r), and θ_n as a random vector of n i.i.d. Ber(1 − π_c);
2. Obtain
$$ \Theta = \Theta_{mnp} \star (\theta_m \circ \mathbf{1}_n \circ \mathbf{1}_p) \star (\mathbf{1}_m \circ \theta_n \circ \mathbf{1}_p) $$
(with a ∘ b denoting the outer product between the vectors a and b);
3. Given Θ, generate X as an m × n × p dimensional tensor having the tensor normal distribution given below, using its vectorized form:
$$ \mathrm{vec}(X) \sim N_{mnp}(\mathrm{vec}(\mu\Theta), \Sigma_p \otimes \Sigma_c \otimes \Sigma_r), $$
where Σ_r = (1 − ρ_r) I_n + ρ_r 1_n 1_n^T, ρ_r ∈ [0, 1); Σ_c = (1 − ρ_c) I_m + ρ_c 1_m 1_m^T, ρ_c ∈ [0, 1); and Σ_p = (1 − ρ_p) I_p + ρ_p 1_p 1_p^T, ρ_p ∈ [0, 1).

Let X_{ghk} be the k-th test statistic in the (g, h)th cell. The statistics can have different types of positive dependence structure determined through appropriate choices of the correlation coefficients ρ_r, ρ_c and ρ_p; if there is independence along any dimension of the tensor X, the corresponding correlation coefficient is set to 0. We considered the problem of testing H_{ghk}: E(X_{ghk}) = 0 against K_{ghk}: E(X_{ghk}) >
0, simultaneously for all g = 1, …, m, h = 1, …, n, k = 1, …, p. So, the next two steps in our simulation study were the following:

4. Apply each of the aforementioned procedures at FDR level α = 0.05, and note the proportions of false rejections among all rejections and of correct rejections among all false nulls;
5. Repeat Steps 1-4 200 times to simulate the values of FDR and power for each procedure by averaging the corresponding proportions noted in Step 4.
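The Kronecker-structured noise in Step 3 need not be drawn by factoring an mn × mn covariance matrix; the additive representation used in the one-hypothesis-per-cell case extends directly and requires only independent standard-normal building blocks. A sketch for the matrix case (the function name is ours; the same idea adds a layer-indexed block for the tensor case):

```python
import math
import numpy as np

def two_way_noise(m, n, rho_row, rho_col, seed=None):
    """Draw an m x n noise matrix with standard normal margins from the
    additive representation

      sqrt((1-a)(1-b)) Z + sqrt((1-a) b) z_r 1' + sqrt(a (1-b)) 1 z_c' +
      sqrt(a b) z0 1 1',   with a = rho_row, b = rho_col,

    so every entry is N(0, 1), entries are equicorrelated within each row
    and within each column, and cells sharing neither a row nor a column
    retain the product correlation a * b through the global term z0.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((m, n))        # fully independent part
    z_row = rng.standard_normal((m, 1))    # one draw shared along each row
    z_col = rng.standard_normal((1, n))    # one draw shared along each column
    z0 = rng.standard_normal()             # one draw shared by all cells
    a, b = rho_row, rho_col
    return (math.sqrt((1 - a) * (1 - b)) * z
            + math.sqrt((1 - a) * b) * z_row
            + math.sqrt(a * (1 - b)) * z_col
            + math.sqrt(a * b) * z0)
```

Setting both correlation parameters to 0 recovers i.i.d. N(0, 1) noise, matching the independence scenarios of the study.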
We fixed m = 50, n = 100, p = 10, and set µ at 0 for true null hypotheses and at 3 for all true signals. The rest of the parameters were varied to generate different situations and analyze the performance of our method in those settings. The combinations of parameters (π_r, π_c) chosen are similar to those in the case of two-way classification with one hypothesis per cell. For each combination of values for (π_r, π_c), 1 − π_rc was varied between 0 and 1; signals are sparse for smaller values of 1 − π_rc, and their density increases with its value.

Comparison of Oracle Procedures:
We wanted to investigate the performance of the Oracle Two-Way GBH> in Theorem 5, under both independence and the PRDS condition, against the usual single-group BH in terms of FDR control and power. The findings are displayed in Figures 9 (for the independent case) and 10 (for the PRDS case, corresponding to ρ_r = 0. , ρ_c = 0.4, and ρ_p = 0. ).

Comparison of Data-Adaptive Procedures: As before, our focus in this case was to investigate the following two questions regarding the performance of our proposed Data-Adaptive Two-Way GBH> in Theorem 6 compared to the naive Adaptive BH: (i) How well does it perform under independence, when both are theoretically known to control FDR? (ii) Can it possibly control FDR under PRDS, in view of the fact that such control is yet to be theoretically proved for either of these procedures? Figures 11 and 12 display the findings of these investigations. Figure 11, which summarizes the results associated with answering question (i) (with λ = 0.5), shows the two procedures performing comparably when the signals are uniformly distributed over the m × n grid (which occurs when π_r = π_c = 0). However, our proposed method is more powerful when the signals are not uniformly distributed, as displayed for the other combinations of the (π_r, π_c) values. Figure 12 shows that, as in the case of two-way classification with one hypothesis per cell, our proposed Data-Adaptive Two-Way GBH> can possibly control FDR under PRDS with choices of λ < α, a few instances being shown in the figure, when there is a high density of signals across all row and column groups.

We apply our two-way classified method to a microbial abundance dataset to illustrate its application in real scientific problems. We consider the
GlobalPatterns dataset available through the Bioconductor package phyloseq.

Figure 9: Comparison of the oracle Two-way GBH> procedure with the BH method, applied to independent hypotheses; panels (a) and (b) show FDR and Power comparisons, respectively, for (π_r, π_c) = (0, 0), (0.3, 0.7), (0.5, 0.5), (0.8, 0.2), with π_rc varying. Set of parameters used is (m = 50, n = 100, p = 10, ρ_r = 0, ρ_c = 0, ρ_p = 0, π_r, π_c, π_rc).

Figure 10: Comparison of the oracle Two-way GBH> procedure with the BH method, when the hypotheses satisfy the PRDS condition; panels (a) and (b) show FDR and Power comparisons, respectively. Set of parameters used is (m = 50, n = 100, p = 10, ρ_r = 0. , ρ_c = 0. , ρ_p = 0. , π_r, π_c, π_rc).

Figure 11: Comparison of the data-adaptive Two-way GBH> procedure with the naive Adaptive BH method, when the hypotheses are independent; panels (a) and (b) show FDR and Power comparisons, respectively. Set of parameters used is (m = 50, n = 100, p = 10, ρ_r = 0, ρ_c = 0, ρ_p = 0, π_r, π_c, π_rc).

Figure 12: Comparison of the data-adaptive Two-way GBH> procedure and the naive Adaptive BH procedure, applied to hypotheses with the PRDS property, for varying choices of 0 < λ < α (= 0.05); panels correspond to π_rc = 0.70, 0.75, 0.80, 0.90. Set of parameters used is (m = 50, n = 100, p = 10, ρ_r = 0. , ρ_c = 0. , ρ_p = 0. , π_r = 0, π_c = 0, π_rc).

The data was first studied in Caporaso et al. (2011) to analyze the prevalence of microbial communities in different environments. The data consist of 19216 microbes, identified by their Operational Taxonomic Unit (OTU) numbers, obtained from 26 samples of 9 different environments, which include a mock environment. The environments are characterized by 7 variables. Classification of the microbes according to their 7 taxonomic ranks is provided, along with the phylogenetic tree describing the relationships among the microbes. The data record abundance patterns of each microbe across the nine sample environments. Since microbes closely related at the tips of the phylogenetic tree have similar characteristics, it is quite likely that they have similar abundance patterns, which renders a positive dependence in the data.
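Arranging such per-microbe, per-environment p-values into the two-way classified layout amounts to indexing each test by (taxonomic family, environment) and computing the per-cell counts that the data-adaptive weights need. A schematic with a toy table and hypothetical column names (this is not the phyloseq API):

```python
import pandas as pd

# Toy long-format table: one row per (microbe, environment) test.
tests = pd.DataFrame({
    "family":      ["F1", "F1", "F2", "F2", "F2", "F3"],
    "environment": ["Soil", "Ocean", "Soil", "Ocean", "Feces", "Soil"],
    "pvalue":      [0.001, 0.20, 0.03, 0.50, 0.04, 0.90],
})

# Cell sizes n_gh; unequal numbers of hypotheses per cell are allowed.
n_gh = tests.groupby(["family", "environment"]).size()

# Storey-type counts R_{n_gh}(lambda): p-values at or below lambda, per cell.
lam = 0.5
r_gh = (tests["pvalue"] <= lam).groupby(
    [tests["family"], tests["environment"]]).sum()
```

Row and column counts (per family and per environment) then follow by summing these cell-level quantities, exactly as in the definitions following expression (16).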
A smaller subset of this dataset, consisting of data on only the microbes specific to the Chlamydiae bacteria taxon, was studied by Sankaran and Holmes (2014). For their analysis, they classified 21 microbes into four groups formed according to their taxonomic families and invoked the Grouped BH procedure suggested by Hu et al. (2010) to find which particular microbes are significantly abundant across the environments.

We perform the analysis on a larger scale on the entire GlobalPatterns dataset. A linear regression is fit from the data on each microbe's prevalence to the environment types. Each p-value corresponds to a particular microbe and an environment; the p-value P_ij, corresponding to the i-th microbe and j-th environment, answers the question "Is the i-th microbe abundantly present in the j-th environment?" In contrast to the analysis provided in Sankaran and Holmes (2014), we consider the p-values to be in a two-way classified structure. Considering the microbes as individual groups furnishes m = 19216 groups, and together with n = 9 environments we obtain a two-way structure of dimensions 19216 × 9. We work with N = 120942 hypotheses classified into a grid of m = 334 families along rows and n = 9 environments along columns. Since there are unequal numbers of members (min: 1, max: 1658) in each family, we use the data-adaptive method for two-way classified hypotheses with unequal numbers of hypotheses in each cell, with weights as given in expression (16). The method identified 7584 hypotheses as significant. In comparison, the adaptive BH procedure, applied to the entire set of hypotheses, identified 7377 hypotheses as significant.

Figure 13: Comparison of the data-adaptive procedure for two-way classified hypotheses with multiple hypotheses per cell, with the adaptive BH procedure, when applied to the microbiome dataset; panels: (a) discoveries made in microbial families, (b) number of microbes discovered in each environment.
Concluding Remarks
In this article we have introduced a well-founded framework for one- and two-way classified hypotheses, and an effective yet simple multiple testing method to test such hypotheses. Through simulations and data analysis, we have established that existing multiple testing procedures that do not take into consideration the layout of the hypotheses are not sufficiently efficient for such structures. Our proposed method in its oracle form controls FDR for independent hypotheses as well as for positively dependent hypotheses that satisfy the PRDS property. The corresponding data-adaptive procedure maintains control of FDR non-asymptotically for independent hypotheses. Simulation studies show that it is also capable of controlling FDR for hypotheses with the PRDS property, under certain conditions and when the density of signals is high. The method is flexible, as it adapts itself to one-way or two-way classification structures depending on the choice of weights. We suggest generic weights suitable for such structures of hypotheses, and these weights can be modified to incorporate additional information appropriate to any particular situation.

In essence, this article explores an underdeveloped area of multiple testing: adapting standard procedures to structures of hypotheses more complex than those they were initially designed for, in order to gain efficiency. Hypotheses exhibiting complex structures, especially in the form of being classified according to multiple criteria, are becoming more and more prevalent in modern statistical investigations, with the current boom of Big Data producing massive amounts of data from various sources. However, research focused on developing methods that efficiently accommodate such structural information has been taking place at a pace much slower than one would hope for. Some advances have indeed been made in the one-way classification setting (Hu et al. (2010), Liu et al. (2016), and Sarkar and Zhao (2017)), but there is still scope for advancing that setting further to provide a fuller coverage of it. Moreover, no advancement has been made yet in the direction of adapting methods to the two-way classification setting and beyond. This article makes a significant contribution in this broader domain. There remains scope for improving the proposed method through specific choices of weights suitable to specific scenarios, and for producing newer methods that can effectively and efficiently be extended, possibly, to multi-way classification settings.

We conclude this section with some open issues to be resolved in future research. In extending the Oracle One-Way GBH from the one- to the two-way classification setting, before constructing a data-adaptive version of it, we have proposed using certain specific combinations of the row and column weights (see (8), (10), (13), (14), (15a) and (15b)). However, it would be worthwhile to investigate whether these weights can be combined in an optimal manner. The data-adaptive procedures here have been proposed by estimating weights using Storey et al. (2004) type estimates of the proportions of true nulls. Developing alternative data-adaptive procedures using other types of estimates of these proportions would be an important undertaking.
A Appendix
A.1 Proof of Result 1
The FDR of a stepup procedure based on the weighted p-values and any set of critical constants c_1 ≤ ⋯ ≤ c_N can be expressed as follows [see, e.g., Sarkar (2002)]:

$$ \mathrm{FDR} = \sum_{r=1}^{N} \frac{1}{r} \sum_{i \in I_0} \Pr\left( P^w_i \le c_r,\ R^w_{(-i)} = r - 1 \right) = \sum_{i \in I_0} \Pr(P^w_i \le c_1) + \sum_{r=1}^{N-1} \sum_{i \in I_0} E\left[ \Pr\left( R^w_{(-i)} \ge r \mid P^w_i \right) \left\{ \frac{I(P^w_i \le c_{r+1})}{r+1} - \frac{I(P^w_i \le c_r)}{r} \right\} \right], \quad (24) $$

(assuming c_0 = 0 and 0/0 = 0), with $R^w_{(-i)}$ representing the number of rejections in the stepup procedure based on the weighted p-values $(P^w_1, \ldots, P^w_N) \setminus \{P^w_i\}$ and the critical constants c_i, i = 2, …, N. With c_i = ic_1, i = 1, …, N, the FDR is bounded above by $\sum_{i \in I_0} \Pr(P^w_i \le c_1)$ under PRDS, which can be shown by making use of the following observations for each i ∈ I_0.

• For each r = 1, …, N − 1,

$$ E\left[ \Pr\left( R^w_{(-i)} \ge r \mid P^w_i \right) \left\{ \frac{I(P^w_i \le c_{r+1})}{r+1} - \frac{I(P^w_i \le c_r)}{r} \right\} \right] \le \Pr\left( R^w_{(-i)} \ge r \mid P^w_i = c_r \right) \left\{ \frac{\Pr(P^w_i \le c_{r+1})}{r+1} - \frac{\Pr(P^w_i \le c_r)}{r} \right\} \le 0. \quad (25) $$

The first inequality in (25) follows from the following two results: (i) $\Pr(R^w_{(-i)} \ge r \mid P^w_i) = E\{ I(R^w_{(-i)} \ge r) \mid P^w_i \}$ is non-increasing in $P^w_i$, since $I(R^w_{(-i)} \ge r)$ is a non-increasing function of the weighted p-values, and the PRDS condition on the p-values translates to that on the weighted p-values; and (ii) $I(P^w_i \le c_{r+1})/(r+1) - I(P^w_i \le c_r)/r$ changes sign from − to + at $P^w_i = c_r$ as $P^w_i$ increases. The second inequality follows from the fact that

$$ \frac{\Pr(P^w_i \le c_{r+1})}{r+1} - \frac{\Pr(P^w_i \le c_r)}{r} = \frac{\min\{(r+1)c_1 w_i, 1\}}{r+1} - \frac{\min\{r c_1 w_i, 1\}}{r} \le 0. $$

• For r = 0, the expectation in (25) equals $\Pr(P^w_i \le c_1)$, and so with $c_1 = \alpha/N$,

$$ \sum_{i \in I_0} \Pr(P^w_i \le c_1) = \sum_{i \in I_0} \min\left\{ \frac{\alpha w_i}{N}, 1 \right\} \le \frac{\alpha}{N} \sum_{i \in I_0} w_i. $$

Thus, Result 1 is proved.

References

Benjamini, Y. and R. Heller (2007).
False discovery rates for spatial signals. Journal of the American Statistical Association 102(480), 1272–1281.

Benjamini, Y. and Y. Hochberg (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics 25(1), 60–83.

Benjamini, Y., A. M. Krieger, and D. Yekutieli (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3), 491–507.

Benjamini, Y. and D. Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29(4), 1165–1188.

Blanchard, G. and E. Roquain (2009). Adaptive false discovery rate control under independence and dependence. J. Mach. Learn. Res. 10, 2837–2871.

Caporaso, J. G., C. L. Lauber, W. A. Walters, D. Berg-Lyons, C. A. Lozupone, P. J. Turnbaugh, N. Fierer, and R. Knight (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A 108(Suppl 1), 4516–4522.

Clements, N., S. K. Sarkar, Z. Zhao, and D.-Y. Kim (2014). Applying multiple testing procedures to detect change in East African vegetation. Ann. Appl. Stat. 8(1), 286–308.

Finner, H., T. Dickhaus, and M. Roters (2009). On the false discovery rate and an asymptotically optimal rejection curve. Ann. Statist. 37(2), 596–618.

Foygel Barber, R. and A. Ramdas (2015). The p-filter: multi-layer FDR control for grouped hypotheses. arXiv e-prints.

Hu, J. X., H. Zhao, and H. H. Zhou (2010). False discovery rate control with groups. Journal of the American Statistical Association 105(491), 1215–1227.

Ignatiadis, N., B. Klaus, J. B. Zaugg, and W. Huber (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods 13, 577.

Liu, Y., S. K. Sarkar, and Z. Zhao (2016). A new approach to multiple testing of grouped hypotheses. Journal of Statistical Planning and Inference 179, 1–14.

Pacifico, M. P., C. Genovese, I. Verdinelli, and L. Wasserman (2004). False discovery control for random fields. Journal of the American Statistical Association 99(468), 1002–1014.

Ramdas, A., R. Foygel Barber, M. J. Wainwright, and M. I. Jordan (2017). A unified treatment of multiple testing with prior knowledge using the p-filter. arXiv e-prints.

Sankaran, K. and S. Holmes (2014). structSSI: Simultaneous and selective inference for grouped or hierarchically structured data. Journal of Statistical Software 59(13), 1–21.

Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30(1), 239–257.

Sarkar, S. K. (2008). On methods controlling the false discovery rate. Sankhyā: The Indian Journal of Statistics, Series A 70(2), 135–168.

Sarkar, S. K. and Z. Zhao (2017). Local false discovery rate based methods for multiple testing of one-way classified hypotheses. arXiv e-prints.

Stein, J. L., X. Hua, S. Lee, A. J. Ho, A. D. Leow, A. W. Toga, A. J. Saykin, L. Shen, T. Foroud, N. Pankratz, M. J. Huentelman, D. W. Craig, J. D. Gerber, A. N. Allen, J. J. Corneveaux, B. M. DeChairo, S. G. Potkin, M. W. Weiner, and P. M. Thompson (2010). Voxelwise genome-wide association study (vGWAS). NeuroImage 53(3), 1160–1174.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 479–498.

Storey, J. D., J. E. Taylor, and D. Siegmund (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(1), 187–205.

Sun, W. and Z. Wei (2011). Multiple testing for pattern identification, with applications to microarray time-course experiments.