Modified estimator for the proportion of true null hypotheses under discrete setup with proven FDR control by the adaptive Benjamini-Hochberg procedure
aa r X i v : . [ m a t h . S T ] S e p Modified estimator for the proportion of true null hypothesesunder discrete setup with proven FDR control by the adaptiveBenjamini-Hochberg procedure
Aniket BiswasDepartment of StatisticsDibrugarh UniversityDibrugarh, Assam, India-786004Email: [email protected]
Gaurangadeb ChattopadhyayDepartment of StatisticsUniversity of CalcuttaKolkata, West Bengal, India-700019Email: [email protected]
Abstract
Some crucial issues about a recently proposed estimator for the proportion of true null hy-potheses ( π ) under discrete setup are discussed. An estimator for π is introduced under thesame setup. The estimator may be seen as a modification of a very popular estimator for π ,originally proposed under the assumption of continuous test statistics. It is shown that adap-tive Benjamini-Hochberg procedure remains conservative with the new estimator for π beingplugged in. Keywords : Multiple hypotheses testing, RNA Sequence data, Fisher’s exact test.
MSC 2010:
In this era of next generation sequencing data, application of multiple testing procedures in dis-crete paradigm have become increasingly popular over the last decade. The classical Benjamini-Hochberg (henceforth BH) procedure controls FDR at a prefixed level when the p -values cor-responding to the true null hypotheses are uniform (Benjamini and Hochberg 1995). Theadditional assumptions for proving the control over FDR are that the p -values correspondingto the true null hypotheses are independent among themselves and they are independent withthe p -values corresponding to the false null hypotheses. Following the steps in the proof ofFDR control by BH procedure, it is worth noting that the procedure is conservative in twodimensions. The first one is that the BH procedure originally controls FDR at level π q insteadof the prefixed level q ∈ (0 ,
1) and thus a reasonable estimate of π can inflate the cutoffs foreach of the ordered p -values to bring more rejections with the same control over FDR. It isworth mentioning that, the BH procedure was originally introduced for multiple continuoustest-statistics where the tested nulls are usually simple. Despite the above fact, BH procedureaccommodates multiple discrete test statistics as the p -values under this setup are stochasticallylarger than the uniform variate. Unfortunately, super-uniformity of the p -values correspond-ing to the true null hypotheses makes the procedure conservative under discrete setup. Heyse(2011) noted the fact and modified the BH procedure (henceforth BHH procedure) for discretetest statistics. However, the first cause of being conservative is valid for both continuous anddiscrete paradigms. A huge literature is available on estimating π under continuous set-up1Storey 2002, Storey et al. 2004, Langaas et al. 2005, Benjamini et al. 2006, Wang et al. 2011,Cheng et al. 2015, Biswas (1) 2020 among many others). These estimators for π become veryconservative when applied in discrete setup. Dialsingh et al. (2015) and Chen et al. (2018) arethe only works particularly focusing on the issue of estimating π under discrete setup till date.Suitable estimate of π is used to construct more powerful adaptive FDR controlling procedures(Benjamini et al. 2006, Sarkar 2008, Blanchard and Roquain 2009, Chen and Doerge 2014).The current work introduces a new estimator for the proportion of true null hypotheses underdiscrete setup. The focus is obviously on the desired result that the adaptive FDR controllingalgorithms constructed by plugging in the new estimator for π is conservative. The work ofChen et al. (2018) and two related reports need a revisit to motivate the current work.The rest of the article is organized as follows. In the next section we introduce some usefulnotations, discuss some key developments and motivate the current research. We present theestimator in Section 3. The theoretical result on adaptive algorithms being conservative isstated and proved in Section 4. We conclude the article by pointing out the possible futureadditions to the article. Suppose H , H , ..., H m are m related but independent hypotheses to be tested. From empir-ical Bayesian motivation given in Storey (2002), H i ’s are iid Bernoulli random variables for i ∈ I = { , , ..., m } with parameter π and H i = 1 denotes that the i -th null hypothesis istrue. Let T = { i ∈ I : H i = 1 } , the set of indices corresponding to the true null hypothesesand F = { i ∈ I : H i = 0 } , the same corresponding to the false null hypotheses. We denote thenumber of true null hypotheses by m and m = P i ∈I H i ∼ Binomial( m, π ). H i ’s and hence m are unobserved. The parameter of interest π is required and it is usually estimated from theset of available p -values { p , p , ..., p m } , p i being the p -value corresponding to H i , i ∈ I . We usethe notation p to denote p -value irrespective of it being a random variable or an observed valueand the meaning holds according to the context. Under discrete multiple testing setup, multipleFisher’s exact tests (FET) or Binomial tests (BT) are performed. For details on two-populationFET, BT and their relevance in identifying differentially expressed genes from RNA sequencedata available on two different study groups, see Chen (2019).For a suitably chosen tuning parameter τ ∈ (0 , { i ∈ F : p i > τ } has cardinality 0. Let I denote the usual indicator function such that I ( x ) = 1 if thestatement x is true and I ( x ) = 0, otherwise. As discussed earlier, p ∼ Uniform(0 ,
1) when thetested null is simple and the test statistics are continuous. Thus under continuous setup withthe additional assumption, P i ∈I I ( p i > τ ) ∼ Binomial( m , − τ ). Noting that E ( m ) = mπ ,a conservative estimator for π isˆ π ( τ ) = 1 m (1 − τ ) X i ∈I I ( p i > τ ) . (2. 1)Storey et al. (2004) further introduced a conservative bias in ˆ π ( τ ) and the modified estimatoris ˆ π S ( τ ) = 1 m (1 − τ ) + 1 m (1 − τ ) X i ∈I I ( p i > τ ) . (2. 2)Though the distributions of the true null p -values are same under the continuous setup, dis-tribution of the false null p -values are different and thus should be labelled by i . Denote thedistribution function of p i for i ∈ F by G i . Then ˆ π S ( τ ) has conservative bias B = 1 m (1 − τ ) + 1 m (1 − τ ) X i ∈F [1 − G i ( τ )] . (2. 3)2owever, if ˆ π S ( τ ) is used for estimating π under discrete setup, another positive bias termgets introduced. For discrete tests, the true null p -values lose homogeneity and we denotedistribution function of p i for i ∈ T by F i . Under discrete multiple testing setup, the bias inˆ π S ( τ ) is B = 1 m (1 − τ ) + 1 m (1 − τ ) X i ∈T [ τ − F i ( τ )] + 1 m (1 − τ ) X i ∈F [1 − G i ( τ )] . (2. 4)Henceforth we continue our discussion by assuming that the p -values are obtained from discretetests. As mentioned in Section 1, p stochastically dominates Uniform(0 ,
1) random variablewhen it is obtained from a discrete test. Thus the additional bias B − B is non-negative since F i ( τ ) ≤ τ . Let S i denote the support of p i . It is to be noted that B = B iff τ ∈ S i for all i ∈ I ,that is τ ∈ ∩ i ∈I S i . Storey (2002) suggested a bootstrap routine to select appropriate τ undercontinuous setup and Chen et al. (2018) worked with τ = 0 . τ is prefixed and there is no guarantee whatsoever that τ can be chosen from ∩ i ∈I S i . We illustrate the situation with a numerical example for FETusing the notations in section 2.1 of Chen (2019). Consider X i ∼ Binomial( q i , N ) for i = 1 , H : q = q versus H : q = q . (2. 5)for discrete tests, two-sided p -values are defined in many ways but to maintain super-uniformityof the p -values, we work with the usual one similar to Chen et al. (2018). A concise discus-sion on super-uniform and sub-uniform p -values for discrete test interested readers may referto Chen (2019). If the total observed count c = 1, least possible p -values is 1. Thus if a singletest with c = 1, is to be performed among the m tests, ∩ i ∈I S i is essentially 1. Thus, the onlypossible choice of τ for B = B is 1 which is clearly an absurd choice. However, for all practicalpurposes the rows with total observed count 1 are removed from the data matrix for furtheranalysis. Henceforth we assume that the working dataset does not contain any row with totalcount equal to 1. Still the problem persists as follows. Assume m = 3 FETs, N = 5 and totalobserved counts are 2 , S = { . , . } , S = { . , . } and S = { . , . , . } . Again the only possible choice of τ for B = B is 1. Thusit is clear that while testing m hypotheses simultaneously under discrete setup, use of ˆ π S ( τ )amounts to B − B extra bias eventually.Chen et al. (2018) came up with an idea to not work with prefixed τ . In fact, the authorssuggested to use different guiding values for different p -values in ˆ π S ( τ ), to overcome the abovementioned shortcoming. The idea is to fix tuning parameter τ and determine a guiding value λ i ∈ S i only for p i depending on the fixed τ . The estimator ˆ π G in Chen et al. (2018) can beperceived to be an improvement over ˆ π S since use of customized guiding values vanishes theextra bias term as demonstrated therein. Chen et al. (2018) used multiple tuning parametersfor possible reduction in variance of the estimator in the same spirit of Jiang and Doerge(2008). The estimator was proven to be able to construct conservative adaptive BH algorithm.Unfortunately, the proof of the result suffers from a critical mistake as pointed out in Biswas(2) (2020). However, a revised version of the proof makes the assumption that the set oftuning parameters { τ j : j = 1 , , ..., n } are chosen such that ∪ nj =1 { τ j } ⊂ ∩ i ∈I S i (Chen andDoerge 2020). The authors also mentioned that such a choice may not exist for multipletuning parameters and thus only option is to work with a single tuning parameter. Underthis restrictive assumption on τ the tuning parameter, λ i as defined in Chen et al. (2018) isidentically τ ∈ ∩ i ∈I S i for all i ∈ I . Thus under the assumption ˆ π G becomes identical to ˆ π S for a fixed τ . It is obvious that the shortcomings of ˆ π S discussed in the previous paragraph arealso present for the recently developed ˆ π G , under the assumption made in the revised proof.The above facts necessitate further investigation. In this article, we propose a new estimatorfor π in similar spirit of the work in Chen et al. (2018). Choice of the tuning parameters3 τ j : j = 1 , , ..., n } are not restricted by the support of the p -values for the new estimator.However, for each p i and τ j , a threshold value λ ij ∈ S i is obtained in a different way from λ ij ’s in Chen et al. (2018). Despite the liberal choices of tuning parameters we have been ableto prove that conservative adaptive FDR controlling step-up procedures can be constructedby plugging in the new estimator. We also point out that under the assumption in Chen andDoerge (2020), the new estimator is same as ˆ π G and hence ˆ π S . Though the new estimator isdeveloped from a novel perspective in this article, it may be thought of as a generalization ofthe estimator in Chen et al. (2018) or a modification of the estimator in Storey et al. (2004)under discrete paradigm with proven FDR control for the adaptive BH procedure. Define q i = inf S i for i ∈ I and ν = max { q i : 1 ≤ i ≤ m } . We assume that ν < c = 1 are to be removed as already mentioned in Section2. The estimator and the desired result do not require this assumption but we continue withthe reasonable assumption for two reasons. The first one is that the data reduction techniquemakes further algebraic treatment straight forward and the second one is that the techniquereduces the number of tests to perform by screening out those rows of the dataset for whichtesting the corresponding hypothesis is futile due to extremely less available information. Onemay refer to the data analysis sections of Chen et al. (2018) and Chen (2019) for verificationof the discussed usual practice. Now we discuss the estimator through the following steps. Algorithm 1 • Set a sequence of tuning parameters ν ≤ τ ≤ τ ≤ ... ≤ τ n < • For each i ∈ I and j ∈ J = { , , ..., n } , define T ij = { λ ∈ S i : λ ≥ τ j } λ ij = inf { λ : λ ∈ T ij } η j = max ≤ i ≤ m λ ij . • For each j ∈ J , define the following trial estimator for π . β ( τ j ) = 1 m (1 − η j ) + 1 − τ j m (1 − η j ) X i ∈I I ( p i > λ ij )1 − λ ij If β ( τ j ) is greater than 1, we take β ( τ j ) = 1. • The final estimator for π is ˆ π H = 1 n X j ∈J β ( τ j ) . Note that the assumption ∪ i ∈J { τ j } ⊂ ∩ i ∈I S i = ⇒ τ j = η j , hence ˆ π H and ˆ π G are identical fora fixed set of tuning parameters. Otherwise for each j ∈ J , η j ≥ τ j and thus ˆ π H dominatesˆ π G almost surely. This small amount of conservative bias enables us to prove the desired resulttheoretically without the impractical assumption taken up in Chen and Doerge (2020). However,we already mentioned that ˆ π G and ˆ π S are identical under the same assumption and thus a goodpoint of investigation is to validate whether the conservative bias present in ˆ π S is more thanthe bias present in ˆ π H . 4 .1 Bias of the estimator From the definition of ˆ π H , Bias(ˆ π H ) = (1 /n )Bias[ β ( τ j )]. Now using the same notations as inSection 2 E [ β ( τ j )] = 1 m (1 − η j ) + 1 − τ j m (1 − η j ) "X i ∈T − F i ( λ ij )1 − λ ij + X i ∈F − G i ( λ ij )1 − λ ij . (3. 6)As defined in Algorithm 1, λ ij ∈ S i and hence F i ( λ ij ) = λ ij for each i ∈ I and j ∈ J . Thus P i ∈T (1 − F i ( λ ij )) / (1 − λ ij ) in 3. 6 is equal to m and E ( m ) = m (1 − π ) as discussed inSection 2. Therefore Bias [ β ( τ j )] = 1 m (1 − η j ) + (cid:18) − − τ j − η j (cid:19) π + 1 − τ j m (1 − η j ) X i ∈F − G i ( λ ij )1 − λ ij . (3. 7)With τ = τ j in B , the bias structure of ˆ π S we get Bias [ˆ π ( τ j )] = 1 m (1 − τ j ) + 1 m (1 − τ j ) X i ∈T [ τ − F i ( τ j )] + 1 m (1 − τ j ) X i ∈F [1 − G i ( τ j )] . (3. 8)Note that under that assumption τ j = η j for each j ∈ J , the bias in β ( τ j ) is identical to ˆ π j asexpected. For the general setting, the first term in (3. 7) dominates the first term in (3. 8). Theother terms are not directly comparable. Numerical comparison of bias of the new estimatorwith the estimator with ˆ π S (0 .
5) under different simulation settings will provide a deeper insight.
Denote the set of available p -values { p , p , ..., p m } by p . Let us introduce the following notation. p k = { p , ..., p k − , , p k +1 , ..., p m } Accordingly we define β k ( τ j ) and ˆ π H k as β ( τ j ) and ˆ π H computed by replacing p by p k , respec-tively for each k ∈ I and j ∈ J .We briefly revisit the BH algorithm. First p is ordered p (1) ≤ p (2) ≤ ... ≤ p ( m ) and thehypothesis corresponding to p ( i ) is renamed H ( i ) , i ∈ I . Define ˆ k = max ≤ i ≤ m { i : p ( i ) ≤ iα/m } for some prefixed level α ∈ (0 , k exists, we reject H (1) , H (2) , ..., H (ˆ k ) and acceptthe remaining hypotheses. If such a ˆ k does not exist, then we do not reject any hypotheses.The adaptive version of BH algorithm (Henceforth ABH) takes the choice of ˆ k = max ≤ i ≤ m { i :ˆ π H p ( i ) ≤ iα/m } and proceeds in similar line with the BH algorithm. The adjusted p -values ofthe BH algorithm and the BHH algorithm are discussed in Heyse (2011). The new estimatorfor π can similarly be plugged into the BHH algorithm to construct adaptive BHH algorithm(Henceforth ABHH) (see Chen et al. 2018). However, the BHH and hence ABHH algorithmsare difficult to express in terms of linear cut-offs for ordered p -values as BH and BHH. Thus thetechnique we opt for proving FDR control by ABH cannot be directly applied to ABHH. Due tothe improved performance of BHH over BH algorithm, numerical investigation regarding FDRcontrol and real life application of ABHH are worth performing. Theorem 1 : If the p -values are independent of each other and BH procedure controls the FDRat a prefixed level α ∈ (0 , π H plugged-in ABH procedure controls the FDR at α .5 roof of Theorem 1 : Obviously ˆ π H = H ( p , p , ..., p m ), a function of the available p -valuesfor a fixed set of tuning parameters. The functional reciprocal of ˆ π H be G ( p , p , ..., p m ), forsome function G : (0 , m → R + . Noting that the BH procedure is a linear step-up procedure,we apply Theorem 11 of Blanchard and Roquain (2009) (A particular case was done earlier inBenjamini et al. (2006)). Thus to prove Theorem 1 we need to validate the following sufficientconditions on ˆ π H .1. G is a component-wise non-decreasing function.2. For each k ∈ T , E (cid:2) / ˆ π H k (cid:3) ≤ /π .For p ′ i ≥ p i for some i ∈ I , I ( p ′ i > λ ij ) ≥ I ( p i ≥ λ ij ) for each j ∈ J . Other terms in theexpression of ˆ π H are free of any p i . From the above facts it is trivial to justify the first condition.The second condition is the crucial one. From the definition of ˆ π H in Algorithm 1 and thenotations introduced earlier in this section, using AM-HM inequality we get1ˆ π H k = n P j ∈J β k ( τ j ) ≤ n X j ∈J β k ( τ j )and hence E (cid:20) π H k (cid:21) ≤ n X j ∈J E (cid:20) β k ( τ j ) (cid:21) . Thus for proving the second condition, it is sufficient to show for each j ∈ J that E (cid:20) β k ( τ j ) (cid:21) ≤ E (cid:20) π (cid:21) . (4. 9)Now for a fixed j ∈ J β ( τ j ) = 1 m (1 − η j ) + 1 − τ j m (1 − η j ) "X i ∈I I ( p i > λ ij )1 − λ ij and hence β k ( τ j ) = 1 m (1 − η j ) + 1 − τ j m (1 − η j ) X i ∈I k I ( p i > λ ij )1 − λ ij . (4. 10)Here I k = { , ..., k − , , k + 1 , ..., m } . For k ∈ T , denote T − { k } by T k . Replacing I ( p i >λ ij ) / (1 − λ ij ) for i ∈ I k − T k by 0 in equation (4. 10) we get mβ k ( τ j ) ≥ − η j + 1 − τ j − η j X i ∈T k I ( p i > λ ij )1 − λ ij (4. 11)and hence E (cid:20) β k ( τ j ) (cid:21) ≤ m (1 − η j ) E
11 + (1 − τ j ) P i ∈I I ( p i >λ ij )1 − λ ij . (4. 12)For all i ∈ I , λ ij ≥ τ j and thus (1 − λ ij ) − ≥ (1 − τ j ) − . Replacing (1 − λ ij ) − by (1 − τ j ) − inright hand side (RHS) of equation (4. 12) we get E
11 + (1 − τ j ) P i ∈I I ( p i >λ ij )1 − λ ij ≤ E "
11 + P i ∈T k I ( p i > λ ij ) (4. 13)6nd hence E (cid:20) β k ( τ j ) (cid:21) ≤ m (1 − η j ) E "
11 + P i ∈T k I ( p i > λ ij ) . (4. 14)Since η j = max i ∈I λ ij , inequality in equation (4. 14) is preserved by replacing λ ij by η j for each i ∈ I RHS of (4. 14). E (cid:20) β k ( τ j ) (cid:21) ≤ m (1 − η j ) E "
11 + P i ∈T k I ( p i > η j ) (4. 15)For each i ∈ I , p i ≥ U i ∼ Uniform(0 ,
1) almost surely. Therefore I ( p i > η j ) ≥ I ( U i ≥ η j )almost surely. Since p i ’s are independent of each other U i ’s are considered to be so. Thus E (cid:20) β k ( τ j ) (cid:21) ≤ m (1 − η j ) E "
11 + P i ∈T k I ( U i > η j ) . (4. 16)Since U i ’s are independent, B = P i ∈T k I ( U i > η j ) ∼ Binomial( m − , − η j ). Applying Lemma1 of Benjamini et al. (2006) in equation (4. 16) we get E (cid:20)
11 + B (cid:21) ≤ m (1 − η j ) . (4. 17)Thus using (4. 17) in equation (4. 16) we get the result in (4. 9). Hence the proof. This article contains some theoretical developments regarding the proposed estimator for theproportion of true null hypotheses. Work on a data-driven choice of the tuning parametersis currently ongoing. Upon arriving at a suitable selection technique, future work consistsof performance exploration through extensive simulation studies. The next version will alsoaccommodate a real life application to establish the new methodology. We are expecting toreport the final results in a revised version of the current article within a short period of time.
References
Benjamini, Y., and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerfulapproach to multiple testing,”
Journal of the Royal Statistical Society: Series B (Statistical Methodol-ogy) , 289-300.Benjamini, Y., Krieger, A. M., and Yekutieli, D. (2006), “Adaptive linear step-up procedures thatcontrol the false discovery rate,”
Biometrika , 93(3), 491-507.Biswas, A. (2020), “Estimating the proportion of true null hypotheses based on sum of p -values andapplication in microarray data,” Communications in Statistics - Simulation and Computation , DOI:10.1080/03610918.2020.1800036Biswas, A. (2020), “Regarding Paper “Multiple testing with discrete data: Proportion of true nullhypotheses and two adaptive FDR procedures” by Xiongzhi Chen, Rebecca W. Doerge, and Joseph F.Heyse,”
Biometrical journal . Biometrische Zeitschrift.Blanchard, G., and Roquain, E. (2009), “Adaptive false discovery rate control under independence anddependence,”
Journal of Machine Learning Research , 10, 2837–2871. hen, X., and Doerge, R. (2014), “Generalized estimators for multiple testing: Proportion of true nullsand false discovery rate,” Retrieved from http://arxiv.org/abs/1410.4274.Chen, X., Doerge, R. W., and Heyse, J. F. (2018), “Multiple testing with discrete data: Proportionof true null hypotheses and two adaptive FDR procedures,” Biometrical Journal , 60(4), 761-779.Chen, X. (2020), “False discovery rate control for multiple testing based on discrete p-values,”
Bio-metrical Journal , 62(4), 1060-1079.Chen, X., and Doerge, R. W. (2020), “Comments on Dr. Aniket Biswas’ Letter to the Editor,”
Biomet-rical Journal .Cheng, Y., Gao, D., and Tong, T. (2015), “Bias and variance reduction in estimating the proportion oftrue-null hypotheses,”
Biostatistics , 16(1), 189-204.Dialsingh, I., Austin, S. R., and Altman, N. S. (2015), “Estimating the proportion of true null hy-potheses when the statistics are discrete,”
Bioinformatics , 31(14), 2303-2309.Heyse, J. F. (2011), “A false discovery rate procedure for categorical data,” In M. Bhattacharjee, S.K. Dhar, & S. Subramanian (Eds.),
Recent advancesin biostatistics: False discovery rates, survival anal-ysis, and related topics (Vol. 4, chap. 3, pp. 43–58). Singapore, SI: World Scientific.Jiang, H., and Doerge, R. W. (2008), ”Estimating the proportion of true null hypotheses for multi-ple comparisons,”
Cancer Informatics , 6, 117693510800600001.Langaas, M., Lindqvist, B. H., and Ferkingstad, E. (2005), “Estimating the proportion of true nullhypotheses, with application to DNA microarray data,”
Journal of the Royal Statistical Society: SeriesB (Statistical Methodology),
Sankhy¯a: The Indian Jour-nal of Statistics, Series A (2008-), 135-168.Storey, J. D. (2002), “A direct approach to false discovery rates,”
Journal of the Royal Statistical Society:Series B (Statistical Methodology),
Journal of the RoyalStatistical Society: Series B (Statistical Methodology) , 66(1), 187-205.Wang, H. Q., Tuominen, L. K., and Tsai, C. J. (2010), “SLIM: a sliding linear model for estimatingthe proportion of true null hypotheses in datasets with dependence structures,”
Bioinformatics , 27(2),225-231., 27(2),225-231.