[PDF] Adjusting the Benjamini-Hochberg method for controlling the false discovery rate in knockoff assisted variable selection

Abstract

This paper revisits the knockoff-based multiple testing setup considered in Barber & Candes (2015) for variable selection applied to a linear regression model with n\ge 2d, where n is the sample size and d is the number of explanatory variables. The BH method based on ordinary least squares estimates of the regressions coefficients is adjusted to this setup, making it a valid p-value based FDR controlling method that does not rely on any specific correlation structure of the explanatory variables. Simulations and real data applications demonstrate that our proposed method in its original form and its data-adaptive version incorporating estimated proportion of truly unimportant explanatory variables are powerful competitors of the FDR controlling methods in Barber & Candes (2015).

Full PDF

AAdjusting the Benjamini-Hochberg method forcontrolling the false discovery rate in knockoﬀ assistedvariable selection

Sanat K. Sarkar ∗ and Cheng Yong Tang ∗ Abstract

This paper revisits the knockoﬀ-based multiple testing setup considered in Barber& Cand`es (2015) for variable selection applied to a linear regression model with n ≥ d ,where n is the sample size and d is the number of explanatory variables. The BH methodbased on ordinary least squares estimates of the regressions coeﬃcients is adjusted tothis setup, making it a valid p -value based FDR controlling method that does not rely onany speciﬁc correlation structure of the explanatory variables. Simulations and real dataapplications demonstrate that our proposed method in its original form and its data-adaptive version incorporating estimated proportion of truly unimportant explanatoryvariables are powerful competitors of the FDR controlling methods in Barber & Cand`es(2015). Key words and phrases : False discovery rate (FDR); Knockoﬀ; Multiple testing; Variableselection.

This paper is motivated by the seminal work of Barber & Cand`es (2015) on variable selec-tion in the setting of a linear regression model, with the selection process being guided bydistribution-free methods controlling the false discovery rate (FDR) of Benjamini & Hochberg(1995) (BH, 1995). The novelty of this selection procedure lies in the construction of knock-oﬀs allowing formation of a framework for the underlying multiple testing problem thatproduces powerful FDR controlling algorithms. It is widely applicable, since p -value calcu-lations and knowledge of any speciﬁc correlation structure of the explanatory variables are ∗ Department of Statistical Science, Temple University, 1810 Liacouras Walk, Philadelphia, Pennsylvania19122-6083, U.S.A. Emails: [email protected], [email protected] a r X i v : . [ s t a t . M E ] F e b o longer required. Considering ﬁxed-design linear regression model with n ≥ d , where n isthe sample size and d is the number of explanatory variables, Barber & Cand`es (2015) putforward their ground-breaking idea of constructing knockoﬀs, before introducing their FDRcontrolling algorithms and providing numerical evidence of their remarkable performance inpractical applications.A considerable amount of research has taken place in the knockoﬀ domain, with paperswritten on variable selection centered around the knockoﬀ-based FDR controlling algorithmsproposed by Barber & Cand`es (2015). The BH in its original form is often consideredin these papers as a competing FDR controlling method; see, for example, Cand`es et al.(2018); Barber & Cand`es (2019), and Xing et al. (2019). However, it would have been moremeaningful if the BH method were adjusted to the knockoﬀ setting before being considered asthe competitor. So, adapting BH to the knockoﬀ setting, and thus establishing it as a valid,powerful FDR controlling method under this setting that does not rely on the correlationstructure of the explanatory variables is an important objective. This is the motivation thatdrives our research in this article. As in Barber & Cand`es (2015), let us consider the classical linear regression model: Y = Xβ + (cid:15), (2.1)where Y ∈ R d is the response vector, X = ( X , . . . , X d ) ∈ R n × d is design matrix of rank d ≤ n and consists of the known vectors of observations on d explanatory variables X , . . . , X d , β = ( β , . . . , β d ) ∈ R d is the unknown vector of regression coeﬃcients, and (cid:15) ∼ N d (0 , σ I )is the Gaussian noise with unknown variance. We assume n ≥ d , as in Barber & Cand`es(2015), and refer to the model in (2.1) as the one with this assumption in the rest of thepaper.Variable selection under this model can be framed as a multiple testing problem, with H j : β j = 0 as the null hypothesis being tested against β j (cid:54) = 0, simultaneously for j =1 , . . . , d . The explanatory variables corresponding to the rejected null hypotheses accordingto a multiple testing procedure using some estimates of the regressing coeﬃcients can beselected/discovered as the important variables. The FDR, formally deﬁned by Benjamini &Hochberg (1995) as FDR = E (cid:18) V max( R, (cid:19) , V being the number of falsely selected/discovered variables and R being the total ofselected variables, is a powerful measure of potential errors associated with such a selectionprocedure. Benjamini & Hochberg (1995) also introduced their procedure, famously knownas the BH procedure, which is one of the most commonly used FDR controlling proceduresunder similar multiple testing scenarios, albeit under certain positive dependence conditionssatisﬁed by the underlying test statistics or the p -values; see, for example, Benjamini & Yeku-tieli (2001) and Sarkar (2002). Unfortunately, however, it is inapplicable in variable selection,since the required positive dependence condition among the test statistics exhibited throughΣ = X (cid:48) X is not generally satisﬁed, except in the very special case when the explanatoryvariables are uncorrelated.Recognizing the limitation of using BH in variable selection, Barber & Cand`es (2015)put forward their innovative idea of using knockoﬀs to construct procedures controlling theFDR in variable selection. These procedures not only do not rely on any distributionalassumptions and any speciﬁc correlation structure among the explanatory variables, but alsocan consider any type of estimates of the regression coeﬃcients, such as those obtained usingLasso. More speciﬁcally, considering the case of n ≥ d , they constructed (cid:101) X , the knockoﬀcopy of X = ( X , . . . , X d ), satisfying (cid:101) X (cid:48) (cid:101) X = Σ and (cid:101) X (cid:48) X = Σ − D, where D = diag { s } , with s ∈ R d + being such that 2Σ − D is positive deﬁnite. The followingis an explicit representation of the knockoﬀ: (cid:101) X = X Σ − (Σ − D ) + (cid:101) U (2 D − D Σ − D ) , (2.2)where (cid:101) U ∈ R n × d is orthonormal to the column space of X and (2 D − D Σ − D ) is positivesquare root of 2 D − D Σ − D , which exists due to the above condition on D .Once the knockoﬀ (cid:101) X is constructed, the augmented version of the model in (2.1), with X replaced by ( X, (cid:101) X ), is used to estimate the pair of regression coeﬃcient vectors associatedwith X j and (cid:101) X j , for each j = 1 , . . . , d , using an estimation method. Given a statistic thattends to have larger positive values when β j (cid:54) = 0 than when β j = 0, and one would have usedunder the model in (1) to test H j against β j (cid:54) = 0, i.e., X j is important or not, there will be d pairs of such statistics ( Z i , (cid:101) Z i ), i = 1 , . . . d , when the augmented model is ﬁtted. Among otherchoices, the test statistic Z j can be deﬁned as the value of the penalty parameter associatedwith the Lasso solution path at which X j ﬁrst enters the Lasso model. Barber & Cand`es(2015) propose their FDR controlling procedures using W j = ( Z j ∨ (cid:101) Z j )[2 ( Z j > (cid:101) Z j ) − V / max( R, W j ’s.In this article, we take a diﬀerent approach. It aims at directly controlling the FDRthrough the use of a p -value based BH type procedure, and yet, does not rely on any speciﬁccorrelation structure of Σ. It is driven by our observation that the knockoﬀ (cid:101) X in (2.2), oncecreated, yields the following two distinct estimators of β : (cid:98) β = (2Σ − D ) − ( X + (cid:101) X ) (cid:48) Y = (cid:98) β OLS + (2Σ − D ) − (2 D − D Σ − D ) (cid:101) U (cid:48) Y and (cid:98) β = D − ( X − (cid:101) X ) (cid:48) Y = (cid:98) β OLS − D − (2 D − D Σ − D ) (cid:101) U (cid:48) Y, where (cid:98) β OLS is the ordinary least squares (OLS) estimator of β under the model in (2.1). Theseestimators are independently distributed under Gaussian noise assumption as N d ( β, σ (2Σ − D ) − ) and N d ( β, σ D − ), respectively. Thus, we note that the knockoﬀ allows creating twoindependent settings for multiple testing of the H j ’s, one of which is free of the correlationstructure of the explanatory variables. It is a novel framework where the BH can now workas a valid FDR controlling procedure, either independently by itself or integratively with theBoferroni.More speciﬁcally, let us deﬁne the vectors of test statistics T = ( T , . . . , T d ) (cid:48) = 1 (cid:98) σ √ (cid:8) (2Σ − D ) − (cid:9) ] − (cid:98) β and T = ( T , . . . , T d ) (cid:48) = 1 (cid:98) σ √ D (cid:98) β . The estimated noise variance (cid:98) σ , obtained from the linear model with the augmented designmatrix ( X, (cid:101) X ), is distributed independently of (cid:98) β and (cid:98) β as σ χ n − d / ( n − d ). From these,we obtain P ( t ) i = P ( t n − d > T i ) and P ( t ) i = P ( t n − d > T i ) , (2.3)with t n − d being the t distribution with n − d degrees of freedom, which are the pairsof p -values, each providing a test for H i , for i = 1 , . . . , d . Now, the BH can be used,either by itself based on P ( t ) = ( P ( t )1 , . . . , P ( t ) d ) or by amalgamating it with the Bonferronimethod based on P ( t ) = ( P ( t )1 , . . . , P ( t ) d ), to control the FDR. More generally, one canconsider screening the potentially signiﬁcant hypotheses by using the Bonferroni methodbased on P ( t ) before incorporating the corresponding p -values into the BH method based4n P ( t ) = ( P ( t )1 , . . . , P ( t ) d ) to determine which of these should be ﬁnally rejected to achievethe desired control over the FDR. A formal description of this method, to be called theBonferroni-BH, is presented in the next section considering a more general setting, bothin its original form that is agnostic to the unknown proportion of true nulls and in itsdata-adaptive form that incorporates an estimate of the proportion of truly unimportantexplanatory variables. Let us present our methodology in a general multiple testing scenario, which is not necessarilyrestricted to the aforementioned linear model and is referred to as paired-multiple testing,where p -values appear in pairs for each of the null hypotheses. More speciﬁcally, supposethat a pair of p -values, ( P ( x ) i , P ( y ) i ), each providing a test for a hypothesis H i associated withthe same parameter shared by a pair of generic random variables ( X i , Y i ), is available, for i = 1 , . . . , d . We consider the problem of simultaneous testing of these d null hypothesescontrolling the false discovery rate (FDR) at level α using both sets of p -values P ( x ) =( P ( x )1 , . . . , P ( x ) d ) and P ( y ) = ( P ( y )1 , . . . , P ( y ) d ) under the following assumptions: Assumption 1

For each i ∈ I = { j : H j is null } , ( P ( x ) i , P ( y ) i ) ∼ C ( u, v ) (bivariate copula). Assumption 2

The components of one of the two sets of p -values, say, P ( y ) , are positivelydependent in the sense of being PRDS, i.e., E (cid:110) φ ( P ( y )1 , . . . , P ( y ) d ) | P ( y ) i (cid:111) ↑ P ( y ) i for each i ∈ I , and for any (co-ordinatewise) non-decreasing function φ of ( P ( y )1 , . . . , P ( y ) d ). Assumption 2 will be referred to as Assumption 2 ∗ in the special case when the p -valuesare independent.Our proposed FDR controlling algorithm under this setting is deﬁned below: Method 1.

Bonferroni-BH.Step 1 Given α ∈ (0 , (cid:101) P i = 1 − ( P ( x ) i ≤ λ )(1 − P ( y ) i ), for i = 1 , . . . , d , for some ﬁxed λ ∈ [ α, (cid:101) P (1) ≤ · · · ≤ (cid:101) P ( d ) as the ordered versions of the (cid:101) P i ’s, and α i ( λ ) = iC − (cid:0) αd | λ (cid:1) /λ ,for i = 1 , . . . , d , where C − ( u | v ) is such that C ( C − ( u | v ) , v ) = u , ﬁnd (cid:101) R ( λ ) = max (cid:110) j : (cid:101) P ( j ) ≤ α j ( λ ) (cid:111) , provided the maximum exists; otherwise, let it be 0.5tep 3 Reject H ( i ) , the hypothesis correpsonding to (cid:101) P ( i ) , ∀ i ≤ (cid:101) R ( λ ). Theorem 1

The Bonferroni-BH controls the FDR at π α , where π is the proportion of truenulls, under Assumptions 1 and 2, if(i) P ( x ) and P ( y ) are independent, or(ii) P ( x ) and P ( y ) are dependent in the following way:Given two sets of p -values U ( x ) = ( U ( x )1 , . . . , U ( x ) d ) and U ( y ) = ( U ( y )1 , . . . , U ( y ) d ) satisfyingAssumptions 1 and 2 ∗ for testing the same set of d hypotheses, and a positive valuedrandom variable Z distributed independently of U ( x ) and U ( y ) , P ( x ) i = P ( U ( x ) i , Z ) and P ( y ) i = P ( U ( y ) i , Z ) , for some (co-ordinatewise) increasing function P such that, for each i ∈ I , the condi-tional density of each of P ( x ) i and P ( y ) i at u given Z = z , say, f ( u | z ) , is totally positiveof order two (TP ) in ( u, z ) ; i.e., the following inequality holds: f ( u | z ) f ( u (cid:48) | z (cid:48) ) ≥ f ( u (cid:48) | z ) f ( u | z (cid:48) ) , for all u ≤ u (cid:48) , z ≤ z (cid:48) .Remark 1 .(i) The sets of p -values P ( t ) and P ( t ) in (2.3) (which are P ( x ) and P ( y ) , respectively, inTheorem 1) satisfy Assumptions 1 and 2 ∗ . Also, they are independent (or approximatelyindependent) if σ is known (or if n is suﬃciently large so that (cid:98) σ ≈ σ ), with T and T being deﬁned with (cid:98) σ replaced by σ and the corresponding sets of p -values being generatedusing χ distribution. They are dependent in the sense mentioned in Theorem (ii) when σ is unknown and n is not suﬃciently large. In other words, we propose in this article theBonferroni-BH as an FDR controlling method for variable selection applied to model (1).(ii) When Σ is diagonal, the Bonferroni-BH reduces to the original FDR controlling BHmethod based on the OLS estimate of β under the model in (1). So, the Bonferroni-BH canbe regarded as the knockoﬀ-adaptive/adjusted version of the BH.(iii) The parameter λ acts as a conduit between the Bonferroni method using only P ( x ) and the BH method using only P ( y ) . It takes Method 1 from the Bonferroni when λ = α/d to the BH when λ = 1. The choice of λ , although arbitrary, is important. A knowledge ofthe sparsity of signals or the extent of dependence between the explanatory variables canprovide some insight into its proper choice. Without such information, however, we proposeusing λ = √ α , as results of numerical studies presented in the next section seem to suggest6hat Method 1 with this choice of λ can have signiﬁcantly improved performance comparedto its natural competitors.(iv) If both P ( x ) and P ( y ) were PRDS, an alternative and possibly more powerful FDRcontrolling method than the Bonferroni-BH could be constructed, with a new name paired-BH, by considering the original BH based on max( P ( x ) i , P ( y ) i ), i = 1 , . . . , d .(v) As in the regular BH, the Bonferroni-BH can be adapted to the existing data throughestimating π , potentially providing a tighter control of the FDR at α . A class of suchadaptive versions of the Bonferroni-BH is given below. Method 2.

Adaptive Bonferroni-BH.Step 1. Using P ( y ) , deﬁne an estimator (cid:98) π ( y )0 ≡ (cid:98) π ( P ( y ) ) of π , which is increasing in P ( y ) andsatisﬁes the following inequality: E (cid:40)(cid:88) i ∈ I [1 / (cid:98) π ( y )0( − i ) )] (cid:41) ≤ d, (3.1)with (cid:98) π ( y )0( − i ) ≡ (cid:98) π ( P ( y )( − i ) , P ( y ) i = 0).Step 2. Let P ∗ i = (cid:40) P ( x ) i > λ (cid:98) π ( y )0 P ( y ) i if P ( x ) i ≤ λ, for i = 1 , . . . , d. Step 3 Find R ∗ ( λ ) = max { i : P ∗ ( i ) ≤ iαλd } . Step 4 Reject the hypothesis corresponding to P ∗ ( i ) for all i ≤ R ∗ ( λ ). Theorem 2

The Adaptive Bonferroni-BH controls the FDR at α under Assumptions 1 and2*. Remark 2. There are several diﬃdent choices for (cid:98) π ( y )0 satisfying the required conditionsin Theorem 2; see, for example, Sarkar (2008). Among these, the one in Storey et al. (2004),as deﬁned below (cid:98) π ( y )0 = d − (cid:80) di =1 ( P ( y ) i ≤ η ) + 1 d (1 − η ) , for some ﬁxed 0 < η <

1, is frequently chosen. We will consider using this estimate in thisarticle to numerically study the performance of our proposed adaptive Bonferroni-BH.7

Simulations

This section presents the results of an extensive simulation study we conducted to numericallyinvestigate the performances of our proposed Bonferroni-BH method and its data-adaptiveversion in terms FDR control and power (expected proportion of correct discoveries). Thedata were generated following the model in (2.1), with the components of the model matrixbeing generated from correlated standard normal random variables having an AR(1) struc-ture whose correlation coeﬃcient was set at 0.5. The knockoﬀ copy was created followingthe method of Barber & Cand`es (2015) by implementing the “Equi-Correlated Knockoﬀs”available therein, by choosing D = diag { s } = diag { s, . . . , s } , with s = 2 ξ min (Σ) ∨

1, where ξ min (Σ) is the smallest eigenvalue of Σ. With the knockoﬀ method of Barber & Cand`es (2015)being the relevant competitor of our proposed methods, we benchmarked the performances ofall these methods against the original BH method based on the (cid:98) β statistic, which is knownto control the FDR exactly at π α .The following sets of values for the diﬀerent parameters provided various settings for oursimulation study: • Sample size: n ∈ { , , , } ; • Predictor dimensionality: d ∈ { . n, . n, . n, . n } ; • The number of important variables: k ∈ { , (cid:100) . d (cid:101) , (cid:100) . d (cid:101) , (cid:100) . d (cid:101) , (cid:100) . d (cid:101) , (cid:100) . d (cid:101)} where (cid:100)·(cid:101) is the ceiling function; • Signal strength: β = · · · = β k = a ∈ { , , , , } for diﬀerent amplitude a ; • FDR level: α ∈ { . , . , . , . } . • The thresholding parameter for Bonferroni: λ = √ α • The tuning parameter for estimation of true nulls: η = 0 . k , covering casesfrom no signal ( k = 0, all nulls are true) to diﬀerent percentages of signals among thehypotheses. As seen from Figures 1 and 2, while all these methods control the FDR, asexpected since it was theoretically established, the Barber-Candes method is seen to beextremely conservative when there are no or few signals, most often not making any discovery,8ven when the sample size is as large as 1 , k = 0 . d , at α = 0 .

05 and 0 .

10, respectively. In these two ﬁgures, the impact of thesignal strength on the performances of all the procedures is also examined. The ﬁrst rowsin these ﬁgures show consistent pattern in the FDR control. While the FDR is controlledfor each of the methods, the knockoﬀ method remains conservative when the signals arerelatively weak and the sample size is not large. The second rows in these ﬁgures show verypromising power performance of the proposed methods, especially when the sample size issmaller and the FDR level is α = 0 .

05, lower than 0 .

10. Additionally, the Bonferroni-BH isseen to have better power when adapted to the data via estimating the proportion of truenulls, with both having better power than the (cid:98) β -only based BH method.With the results in Figures 3 and 4 being for a moderate setting where 20% of thepredictors are contributing, the overreaching conclusion that can be made from our entiresimulation study is that the knockoﬀ method has better performance at larger FDR levels,with denser signals, and larger sample sizes. Whereas, our proposed Bonferroni-BH andAdaptive Bonferroni-BH perform well at smaller FDR levels, with sparser signals, and smallersample sizes. In other words, our proposed methods can perform quite well relative to theBarber-Candes method in some instances, making them competitive among FDR controllingmultiple testing procedures for variable selection. We analyzed the same data set that was considered in Barber & Cand`es (2015). This dataset has been described and studied in Rhee et al. (2006); and it is available to the public.The data set contains drug resistance measurements and genotype information from samplesof HIV-1. Separated data sets are available for resistance, respectively, to protease inhibitors(PIs), to nucleoside reverse transcriptase (RT) inhibitors (NRTIs), and to nonnucleoside RTinhibitors (NNRTIs). The same data preparing and pre-processing steps as those in Barber& Cand`es (2015) were followed.We considered analyzing the resistance to protease inhibitors (PIs) data. In this dataset, there are 7 drugs. We considered the log-fold increase of lab-tested drug resistance asthe response variable, and the same model matrix as that in Barber & Cand`es (2015). Wealso compared the selected mutations for various methods with existing treatment-selectedmutation (TSM) panels (Rhee et al., 2005). As pointed out in Barber & Cand`es (2015), theTSM list is a reference consisting of mutations associated with the general PI class of drugs,and so the TSM list is not expected to be speciﬁc to the individual drugs.9igure 1: Simulated FDR for α = 0 .

05. When k = 0, all null hypotheses are true, when k (cid:54) = 0, the amplitude is 6; B-BH – the proposed Bonferroni-BH procedure with λ = √ α ;AB-BH – the adaptive Bonferroni-BH procedure; Beta 2 only refers to the approach using (cid:98) β only; Knockoﬀ – the knockoﬀ based method of Barber & Cand`es (2015).10igure 2: Simulated FDR for α = 0 .

1. When k = 0, all null hypotheses are true, when k (cid:54) = 0,the amplitude is 6; B-BH – the proposed Bonferroni-BH procedure with λ = √ α ; AB-BH– the adaptive Bonferroni-BH procedure; Beta 2 only refers to the approach using (cid:98) β only;Knockoﬀ – the knockoﬀ based method of Barber & Cand`es (2015).11igure 3: Simulated FDR (1st row) and power (2nd row) when some null hypotheses are nottrue; the FDR level is α = 0 . λ = √ α ; AB-BH – the adaptiveBonferroni-BH procedure; Beta 2 only refers to the approach using (cid:98) β only; Knockoﬀ –the knockoﬀ based method of Barber & Cand`es (2015).12igure 4: Simulated FDR (1st row) and power (2nd row) when some null hypotheses are nottrue; the FDR level is α = 0 . λ = √ α ; AB-BH – the adaptiveBonferroni-BH procedure; Beta 2 only refers to the approach using (cid:98) β only; Knockoﬀ –the knockoﬀ based method of Barber & Cand`es (2015).13e considered three diﬀerent FDR levels: α = 0 . α = 0 .

1, and α = 0 .

2. The corre-sponding results are reported in Figures 5, 6, and 7, respectively. We compare our proposedmethods, the Bonferroni-BH and the Adaptive Bonferrni-BH, with the knockoﬀ method ofBarber & Cand`es (2015), and the original BH method at level α using (cid:98) β OLS , where the lattertwo are replicated from the study of Barber & Cand`es (2015).The ﬁndings are summarized as follows. For the ATV drug, our methods generally makeno or few discoveries for all FDR levels. The sample size of this group is smaller, less thanhalf of almost all other six groups. This might be due to the reason that the uncertaintylevel in the estimations is relatively too high for the testing methods to detect more signals.No discovery is also the case for the knockoﬀ method when α = 0 .

05 for the ATV drug. Forthe other six drugs, our proposed Bonferroni-BH and Adaptive Bonferroni-BH approachesidentify reasonable number of discoveries, and the validations with the TSM reference alsoseem reasonable. More speciﬁcally, we observe that our methods are stable for α = 0 . , . .

2. That is, the Bonferroni-BH and the Adaptive Bonferroni-BH are making consistentlymore discoveries with increasing level of FDR control, and the portions of the relative falsediscoveries with reference to the TSM list are also consistent. For the knockoﬀ method, weobserve that it does not oﬀer any discovery for six out of seven drugs at α = 0 .

05, and forone out of seven drugs at α = 0 .

1. A plausible reason is that the HIV-1 data set is likelyto contain relatively sparse signals (less than 15% if referred to the TSM list), in whichcase, as our simulations above indicated, the Knockoﬀ method won’t perform well with lowerFDR levels. However, when α = 0 .

2, the knockoﬀ method detects more discoveries than ourmethods do.

The take-home message from our research in this article is that, for variable selection appliedto linear regression models, the BH can be adjusted to the Barber-Cand`es knockoﬀ basedmultiple testing setup, making it work as a valid p -value based FDR controlling procedurewithout relying on any speciﬁc dependence structure of Σ. It supports our view that suchan adjusted version of the BH method should be considered as a relevant competitor, ratherthan its original, non-knockoﬀ version, in a comparative study of FDR controlling methodsunder the same setting. The Bonferroni-BH and its adaptive version incorporating estimatedproportion of true nulls for varying choices of λ that we propose here represent such a class ofadjusted BH procedures. However, as stated in Remark 1(iii), these procedures correspondingto λ = √ α are our preferred choices among them.Our simulation study seems to indicate that the distribution-free approach in Barber &14igure 5: Real data example for α = 0 .

05. Blue represents the number of discovers that arein the TSM list; and yellow represents the number of discoveries that not in the TSM list.The total number of HIV-1 protease positions in the TSM list is 34.BH – the BH procedure; Knockoﬀ – the knockoﬀ based method of Barber & Cand`es (2015);B-BH – the proposed Bonferroni-BH procedure; AB-BH – the adaptive Bonferroni-BH pro-cedure. 15igure 6: Real data example for α = 0 .

1. Blue represents the number of discovers that arein the TSM list; and yellow represents the number of discoveries that are not in the TSMlist. The total number of HIV-1 protease positions in the TSM list is 34.BH – the BH procedure; Knockoﬀ – the knockoﬀ based method of Barber & Cand`es (2015);B-BH – the proposed Bonferroni-BH procedure; AB-BH – the adaptive Bonferroni-BH pro-cedure. 16igure 7: Real data example for α = 0 .

2. Blue represents the number of discovers that arein the TSM list; and yellow represents the number of discoveries that are not in the TSMlist. The total number of HIV-1 protease positions in the TSM list is 34.BH – the BH procedure; Knockoﬀ – the knockoﬀ based method of Barber & Cand`es (2015);B-BH – the proposed Bonferroni-BH procedure; AB-BH – the adaptive Bonferroni-BH pro-cedure. 17and`es (2015) may not always perform satisfactorily. Plausible explanations for this are:(i). It aims at controlling a conservative estimate of the false discovery proportion (FDP).(ii). While estimating the FDP, it considers estimating the number of false discoveries usingthe (cid:101) Z i ’s that are relatively small since each of them is likely to come from the nulldistribution if the corresponding explanatory variable is truly unimportant. As a result,it is methodologically indiﬀerent to some strong signals, resulting possible loss of powerin making important discoveries, especially in a scenario where there are fewer butstronger signals.(iii). When the FDR level is more stringent in the sense of α being relatively small, theestimation of the number of false discoveries may be subject to higher level of variance,which inevitably transfers to the subsequent decisions on discoveries. This could bemore severe when the sample size is relatively small.In contrast, by incorporating the distribution of the test statistics and utilizing the scopeof using the knockoﬀs, our approach is more advantageous in detecting sparse signals andhandling multiple testing problems with more stringent FDR levels.Adjusting the BH to the knockoﬀ setting is in essence same as adapting it to the un-known correlation structure of the underlying test statistics. Adapting BH to the unknowndependence/correlation structure of the test statistics or p -values without losing its ultimatecontrol over the FDR has been an outstanding open problem in multiple testing. Our researchin this article can be seen to make some contribution towards answering this open problemunder Gaussian distributional setting. Although focused mainly on multiple testing of theregression coeﬃcients in a linear regression model, it shows that the BH can be successfullyadapted to the dependence structure if two independent settings of the underlying testingproblem can be created using the existing data, with one being free of the dependence struc-ture, so that BH can be applied, either by itself or in conjunction with the Bonferroni, tocontrol the FDR. This idea can be extended to other multiple testing problems, at least underGaussian distributional settings. Notably among them is multiple testing of the multivariatenormal means with arbitrary dependence structure, as brieﬂy described below.Given X = ( X , . . . , X n ) ∼ N nd ( µ (cid:48) n , I d ⊗ Σ) , where n ≥ d , 1 n = (1 , . . . , (cid:48) , µ = ( µ , . . . , µ d ) (cid:48) and Σ, assumed known for the time being,is positive deﬁnite, consider testing of H i : µ i = 0 simultaneously for i = 1 , . . . , d subject tocontrolling the FDR. It is well-known in the multiple testing literature that if Σ is diagonal orsatisﬁes certain positive dependence condition that can be qualitatively assessed through the18orrelations and the alternatives are all right- or left-sided, then the BH can provably controlthe FDR. Whether or not the BH or a suitable data-adaptive version of it can provablycontrol the FDR when the alternatives are all two-sided is still an unsettled issue in theliterature. Our research in this article suggests for the ﬁrst time, as far as we know, apath towards settling this issue positively through a paired- multiple testing scenario, asdescribed in Section 3. We will delve into this important problem, along with broadeningour current research to scenarios with 2 d > n and high-dimensional multiple testing, in ournext communications.Further research is needed to provide a fuller, more comprehensive study of our proposedBonferroni-BH in a paired-multiple testing scenario considered in this article. For instance: • There are other ways of estimating the proportion of true nulls using P ( y ) , as noted inRemark 2. A more speciﬁc comparative study involving diﬀerent adaptive versions of theBonferroni-BH corresponding to these diﬀerent types of estimate can reveal a more completepicture on opportunities in terms of having a balance between controlling FDR and enhancingpower. • There are other possible ways of amalgamating the Bonferroni using P ( x ) and theBH using P ( y ) , yielding other FDR controlling methods. For instance, one can considerestimating the proportion of true nulls from P ( y ) using a BH type algorithm, before using itin constructing an FDR controlling method in terms of the Bonferroni method based on P ( x ) with an improved common threshold that incorporates this estimate.We intend to pursue these lines of research in our future communications as well. Acknowledgements

We thank Edo Airoldi for some useful suggestions.

References

Barber, R. F. & Cand`es, E. J. (2015). Controlling the false discovery rate via knockoﬀs.

The Annals of Statistics , 2055–2085. Barber, R. F. & Cand`es, E. J. (2019). A knockoﬀ ﬁlter for high-dimensional selectiveinference.

The Annals of Statistics , 2504–2537. Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practicaland powerful approach to multiple testing.

Journal of the Royal Statistical Society: SeriesB (Methodological) , 289–300. 19 enjamini, Y. & Yekutieli, D. (2001). The control of the false discovery rate in multipletesting under dependency.

The Annals of Statistics , 1165–1188. Blanchard, G. & Roquain, E. (2008). Two simple suﬃcient conditions for FDR control.

Electronic Journal of Statistics , 963–992. Cand`es, E. , Fan, Y. , Janson, L. & Lv, J. (2018). Panning for gold: ‘model-x’ knockoﬀsfor high dimensional controlled variable selection.

Journal of the Royal Statistical Society:Series B (Statistical Methodology) , 551–577. Rhee, S.-Y. , Fessel, W. J. , Zolopa, A. R. , Hurley, L. , Liu, T. , Taylor, J. , Nguyen, D. P. , Slome, S. , Klein, D. , Horberg, M. , Flamm, J. , Follansbee, S. , Schapiro, J. M. & Shafer, R. W. (2005). HIV-1 protease and reverse-transcriptasemutations: Correlations with antiretroviral therapy in subtype b isolates and implicationsfor drug-resistance surveillance.

The Journal of Infectious Diseases , 456–465.

Rhee, S.-Y. , Taylor, J. , Wadhera, G. , Ben-Hur, A. , Brutlag, D. L. & Shafer,R. W. (2006). Genotypic predictors of human immunodeﬁciency virus type 1 drug resis-tance.

Proceedings of the National Academy of Sciences , 17355–17360.

Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testingprocedures.

The Annals of Statistics , 239–257. Sarkar, S. K. (2008). On methods controlling the false discovery rate.

Sankhy˜a: TheIndian Journal of Statistics , 135–168. Storey, J. D. , Taylor, J. E. & Siegmund, D. (2004). Strong control, conservative pointestimation and simultaneous conservative consistency of false discovery rates: a uniﬁedapproach.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) ,187–205. Xing, X. , Zhao, Z. & Liu, J. S. (2019). Controlling false discovery rate using gaussianmirrors.