Bound on FWER for correlated normal distribution
arXiv preprint [math.ST]
Nabaneet Das and Subir K. Bhandari
Indian Statistical Institute, Kolkata
August 7, 2019
Abstract
In this paper, our main focus is to obtain an asymptotic bound on the familywise error rate (FWER) of a Bonferroni-type procedure in the simultaneous hypothesis testing problem when the observations corresponding to the individual hypotheses are correlated. In particular, we consider the sequence of null hypotheses Hᵢ : Xᵢ ∼ N(0, 1) (i = 1, 2, ..., n) and an equicorrelated structure on the sequence (X₁, ..., Xₙ). A distribution-free bound on the FWER under the equicorrelated setup can be found in [19]. However, the upper bound provided in [19] does not remain bounded as the number of hypotheses n grows, and as a result the FWER is highly overestimated for any particular choice of distribution (e.g. the normal). In the equicorrelated normal setup, we show that the FWER is asymptotically a convex function of the correlation ρ, and hence an upper bound on the FWER of the Bonferroni-α procedure is α(1 − ρ). This implies that Bonferroni's method actually controls the FWER at a much smaller level than the desired level of significance in the positively correlated case, and it necessitates a correlation correction.

Introduction

Multiple hypothesis testing has been one of the liveliest areas of research in statistics for the past few decades. The biggest challenge in this area comes from the fact that the models involve an extensive collection of unknown parameters, and one has to draw simultaneous inference on a large number of hypotheses, mainly with the goal of ensuring good overall performance rather than focusing too much on the individual problems. Very often, data sets from modern scientific investigations, in fields such as biology, astronomy and economics, require such simultaneous testing of thousands of hypotheses.

Various measures of error rate have been proposed over the years. One hard-line frequentist approach is to control the familywise error rate (FWER), defined as the probability of making at least one false rejection in a family of hypothesis testing problems. Bonferroni's bound provides the classical FWER control method, although the step-up and step-down algorithms of [11], [17], [?, 12], [10] improve upon Bonferroni's method in terms of power.
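Under independence, testing each of the n hypotheses at level α/n gives FWER = 1 − (1 − α/n)ⁿ ≤ α, which is the guarantee that Bonferroni's bound formalizes. A quick numerical sketch of this identity (our illustration, not from the paper; the function name is ours):

```python
# FWER of the Bonferroni procedure when the n test statistics are
# independent: each test is done at level alpha/n, so
#   FWER = 1 - (1 - alpha/n)^n  <=  alpha,
# with the gap approaching alpha - (1 - exp(-alpha)) as n grows.

def bonferroni_fwer_indep(alpha: float, n: int) -> float:
    """Exact FWER of Bonferroni under independence."""
    return 1.0 - (1.0 - alpha / n) ** n

for n in (10, 100, 10_000):
    print(n, round(bonferroni_fwer_indep(0.05, n), 6))
```

The realized FWER is strictly below α for every n > 1, which already hints at the conservativeness that correlation only amplifies.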
While Holm's procedure controls the FWER in general, the other algorithms depend heavily on the independence of the p-values of the individual hypotheses. [3], [4] and [7] provide excellent reviews of the whole theory.

One of the main limitations of these classical methods, which control the FWER in the strong sense, is their conservative nature, which results in a lack of power. A substantial improvement in power has been achieved by considering the false discovery rate (FDR) criterion proposed by [1]; see, for example, [2], [20], [18], [16] and [13] for further details. [14] and [7] provide excellent accounts of the literature on FDR.

However, most of these works concern independent observations, and very little literature covers correlated variables. [14] reviews FDR control under dependence. [6] clearly shows the effect of correlation on the summary statistics by pointing out that the correlation penalty depends on the root mean square (rms) of the correlations. An excellent review of this line of work can be found in [7]. All these works make it clear that the FWER and the FDR should be treated more carefully when correlation is present.

Some distribution-free bounds on the FWER can be found in [19], using Chebyshev-type inequalities. But precisely because these inequalities are distribution-free, the FWER is highly overestimated for particular choices of distribution (e.g. the normal), and the inequalities are not of much use for a large number of hypotheses.

In this work we consider the equicorrelated normal distribution and obtain a sharper bound on the FWER of the Bonferroni-type FWER control procedure. We show that, asymptotically (that is, for a large number of hypotheses), FWER(ρ) is a convex function of ρ ∈ [0, 1], and hence the FWER of the Bonferroni-α procedure is bounded by α(1 − ρ). This suggests a necessary correlation correction in the Bonferroni procedure. While the bound provided in [19] does not remain bounded as the number of hypotheses grows, the bound obtained here remains stable even as n → ∞ and gives a clearer picture of the effect of correlation on the FWER. This is probably the first attempt at describing the classical FWER asymptotically in terms of ρ. In further communication we expect to attack the same problem in terms of the root mean square of the correlations, as in [6].

The setup

Let X₁, X₂, ... be a sequence of observations, and let the null hypotheses be Hᵢ : Xᵢ ∼ N(0, 1), i = 1, 2, .... Many single-hypothesis testing problems focus on one-sided tests which reject the null hypothesis for large values of the observation. We consider here the one-sided test: reject Hᵢ if Xᵢ > c, where c is chosen according to the required significance level of the test. The two-sided problem can be handled in a similar manner.

Under this setup, a classical measure of type-I error is the FWER, the probability of falsely rejecting at least one null hypothesis, which happens if Xᵢ > c for some i; the probability is computed under the intersection null hypothesis ∩ⁿᵢ₌₁ Hᵢ = {Xᵢ ∼ N(0, 1) ∀ i = 1, ..., n}. When we compute under the intersection null hypothesis, the FWER and the FDR coincide, so this study also gives an idea of the behaviour of the FDR under this setup. Thus

FWER = P(at least one false rejection) = P(Xᵢ > c for some i | H₀).

Suppose Corr(Xᵢ, Xⱼ) = ρ for all i ≠ j, with ρ ≥ 0, and let

H(ρ) = 1 − FWER(ρ) = P(Xᵢ ≤ c, ∀ i = 1, 2, ..., n).
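The equicorrelated structure above is easy to simulate through the standard one-factor representation of an equicorrelated Gaussian vector. A small check (our sketch; the constants are illustrative) that this construction produces Corr(Xᵢ, Xⱼ) = ρ:

```python
# Equicorrelated N(0,1) variables via the one-factor construction
#   X_i = sqrt(rho)*T + sqrt(1-rho)*E_i,  with T, E_1, ..., E_n iid N(0,1),
# which gives Var(X_i) = 1 and Corr(X_i, X_j) = rho for i != j.
import numpy as np

rng = np.random.default_rng(0)
rho, n, reps = 0.5, 5, 200_000

t = rng.standard_normal((reps, 1))                      # shared factor
x = np.sqrt(rho) * t + np.sqrt(1 - rho) * rng.standard_normal((reps, n))

corr = np.corrcoef(x, rowvar=False)                     # empirical 5x5 correlation
off_diag = corr[~np.eye(n, dtype=bool)]
print(off_diag.round(3))                                # all entries close to rho
```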
Theorem 3.1  Suppose each Hᵢ is tested at size αₙ. If limₙ→∞ nαₙ = α ∈ (0, 1), then H″(ρ) ≤ 0 as n → ∞, and hence H(ρ) is asymptotically a concave function on [0, 1].

Note:
1. For ρ = 0 (under independence), we must have FWER = 1 − (1 − αₙ)ⁿ ≈ nαₙ.
2. For ρ = 1 (when Xᵢ = Xⱼ a.s. for all i ≠ j), we must have FWER = αₙ (because one rejection would imply rejection of all the null hypotheses).

Suppose y = L(ρ) denotes the line joining (0, 1 − (1 − αₙ)ⁿ) and (1, αₙ). The following corollary describes the asymptotic behaviour of the FWER as a function of ρ.

Corollary 3.1.1  As n → ∞, FWER(ρ) is bounded above by the line L(ρ).

In the following sections we provide a proof of this theorem.

H(ρ) and its derivatives

Under the framework described above, the sequence {Xₙ}ₙ≥₁ is exchangeable under H₀, i.e. (X_{i₁}, ..., X_{i_k}) ∼ N_k(0, (1 − ρ)I_k + ρJ_k), where J_k is the k × k matrix of ones. Then we may write

Xₖ = θ + Zₖ, ∀ k ≥ 1,

where θ is a mean-zero normal random variable independent of the sequence {Zₙ}ₙ≥₁, and the Zᵢ's are i.i.d. normal random variables. Since Cov(Xᵢ, Xⱼ) = ρ, we must have Var(θ) = ρ, i.e. θ ∼ N(0, ρ) and Zₙ ∼ i.i.d. N(0, 1 − ρ) for all n ≥ 1. Hence

H(ρ) = P(θ + Zᵢ ≤ c ∀ i = 1, 2, ..., n) = E_θ[Φⁿ((c − θ)/√(1 − ρ))] = E[Φⁿ((c + √ρ Z)/√(1 − ρ))],

where Z ∼ N(0, 1) and Φ is the c.d.f. of the N(0, 1) distribution. If we define d = (c + √ρ Z)/√(1 − ρ), then H(ρ) = E[Φⁿ(d)]. An application of the dominated convergence theorem yields

H′(ρ) = E[n Φⁿ⁻¹(d) φ(d) G(ρ, Z)],   (1)

where G(ρ, Z) = ∂d/∂ρ = (1/(2(1 − ρ)^{3/2}))[c + Z/√ρ] and φ(·) is the N(0, 1) p.d.f. Again by the D.C.T.,

H″(ρ) = E[n Φⁿ⁻²(d) φ(d)(a G²(ρ, Z) + b Φ(d) G(ρ, Z) + c Φ(d)/(4ρ(1 − ρ)^{3/2}))],   (2)

where a = (n − 1)φ(d) − dΦ(d) and b = (4ρ − 1)/(2ρ(1 − ρ)).

Let us define α₁ = Φ(−d) (not to be confused with α = lim nαₙ). Then

H″(ρ) = E[n(1 − α₁)ⁿ⁻² φ(d)(a G²(ρ, Z) + b(1 − α₁) G(ρ, Z) + c(1 − α₁)/(4ρ(1 − ρ)^{3/2}))]
      = ∫₋∞^∞ n(1 − α₁)ⁿ⁻² φ(d)[a G²(ρ, z) + b(1 − α₁) G(ρ, z) + c(1 − α₁)/(4ρ(1 − ρ)^{3/2})] φ(z) dz.
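The representation H(ρ) = E[Φⁿ(d)] reduces the n-dimensional orthant probability to a one-dimensional integral, so the claims of Theorem 3.1 can be examined numerically at a finite n. The sketch below is our illustration (Gauss-Hermite quadrature and the grid of ρ values, kept away from the endpoints, are arbitrary choices):

```python
# Evaluate H(rho) = E[Phi^n((c + sqrt(rho) Z)/sqrt(1 - rho))] by
# Gauss-Hermite quadrature, using E[g(Z)] = sum_i w_i g(sqrt(2) x_i)/sqrt(pi),
# and compare FWER = 1 - H with the asymptotic bound alpha*(1 - rho)
# for the Bonferroni choice alpha_n = alpha/n.
import numpy as np
from scipy.stats import norm

def H(rho: float, n: int, alpha: float, nodes: int = 300) -> float:
    c = norm.ppf(1.0 - alpha / n)            # per-test cutoff of size alpha/n
    x, w = np.polynomial.hermite.hermgauss(nodes)
    z = np.sqrt(2.0) * x
    d = (c + np.sqrt(rho) * z) / np.sqrt(1.0 - rho)
    # Phi^n(d) computed as exp(n * log Phi(d)) for numerical stability
    return float(np.sum(w * np.exp(n * norm.logcdf(d))) / np.sqrt(np.pi))

n, alpha = 10_000, 0.05
rhos = np.arange(0.3, 0.95, 0.05)
fwer = np.array([1.0 - H(r, n, alpha) for r in rhos])

print(np.all(fwer <= alpha * (1.0 - rhos)))   # bound on this grid
print(np.all(np.diff(fwer, 2) >= -1e-5))      # discrete convexity check
```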
Proof of the main theorem

The proof involves three steps. Throughout, z₀ denotes the point at which α₁(z₀) = Φ(−d(z₀)) = 1/n.

• Step 1: The second and third terms in H″(ρ) → 0 as n → ∞ (the proof is given in the appendix, Lemma 6.1).
• Step 2: If G(ρ, z₀) > 0, the first term is asymptotically ≤ 0.
• Step 3: If G(ρ, z₀) ≤ 0, the first term → 0 as n → ∞.

We proceed with the proof of Step 2.

We know that xΦ(−x) ∼ φ(x) for large x, and d = (c + √ρ Z)/√(1 − ρ) → ∞ a.s. as n → ∞ (because c → ∞ as n → ∞). So

a = (n − 1)φ(d) − dΦ(d) ∼ (n − 1)dα₁ − d(1 − α₁) = d(nα₁ − 1),

and hence, by Lemma 6.2,

E[n(1 − α₁)ⁿ⁻² φ(d) a G²(ρ, Z)] ∼ E[n(1 − α₁)ⁿ⁻² φ(d) d(nα₁ − 1) G²(ρ, Z)].

Changing the variable of integration from z to α₁ (note that ∂α₁/∂z = −φ(d)√(ρ/(1 − ρ)), so φ(d) dz ∝ dα₁),

E[n(1 − α₁)ⁿ⁻² φ(d) d(nα₁ − 1) G²(ρ, Z)] = ∫₋∞^∞ n(nα₁ − 1)(1 − α₁)ⁿ⁻² φ(d) · d G²(ρ, z) φ(z) dz ∝ ∫₀¹ n(nα₁ − 1)(1 − α₁)ⁿ⁻² K(α₁) dα₁,

where K(α₁) = d G²(ρ, z) φ(z), z being the point at which Φ(−d(z)) = α₁. In this step we have assumed G(ρ, z₀) > 0.

Define fₙ(α) = n(nα − 1)(1 − α)ⁿ⁻². Then fₙ(α) > (<) 0 ⟺ α > (<) 1/n, and ∫₀¹ fₙ(α) dα = 0.

Since Φ(−d(z₀)) = 1/n < 1/2 for large n, we have d(z₀) > 0, and α₁ is a decreasing function of z. Since G(ρ, z₀) > 0, we can say that K(α₁(z)) > K(α₁(z′)) for all z′ < z₀ < z, i.e. K is larger on {α₁ < 1/n} than on {α₁ > 1/n}. Hence

∫₀^{1/n} |fₙ(α)| K(α) dα ≥ K(1/n) ∫₀^{1/n} |fₙ(α)| dα = K(1/n) ∫_{1/n}^1 fₙ(α) dα ≥ ∫_{1/n}^1 fₙ(α) K(α) dα.

Since fₙ(α) < 0 for all α ∈ (0, 1/n), this means ∫₀¹ fₙ(α) K(α) dα ≤ 0, and therefore

E[n(1 − α₁)ⁿ⁻² φ(d) d(nα₁ − 1) G²(ρ, Z)] ≤ 0.
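The two properties of fₙ used above, the sign change at α = 1/n and the zero integral over [0, 1], are easy to confirm numerically (an illustrative check at a small n):

```python
# f_n(alpha) = n(n*alpha - 1)(1 - alpha)^(n-2) changes sign at alpha = 1/n
# and integrates to exactly 0 over [0, 1].
from scipy.integrate import quad

def f(a: float, n: int) -> float:
    return n * (n * a - 1.0) * (1.0 - a) ** (n - 2)

n = 50
total, abserr = quad(f, 0.0, 1.0, args=(n,))
print(total)                      # numerically indistinguishable from 0

assert f(0.5 / n, n) < 0.0 < f(2.0 / n, n)
```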
Step 3. In this part we assume G(ρ, z₀) ≤ 0. The real line is broken into three disjoint regions, {α₁ > 2 log n/n}, {α₁ < 1/(n(log n)³)} and {1/(n(log n)³) ≤ α₁ ≤ 2 log n/n}, and we show that the integral over each region → 0 as n → ∞.

First recall that each Hᵢ is tested at size αₙ, with Hᵢ rejected if Xᵢ > c, so that αₙ = Φ(−c). As n → ∞, c → ∞ by the condition limₙ→∞ nαₙ = α ∈ (0, 1). For large c, Φ(−c) ∼ φ(c)/c = 1/(√(2π) c e^{c²/2}), so

1/n ∼ α/(√(2π) c e^{c²/2}).   (i)

In particular n = Θ(c e^{c²/2}) and log n = Θ(c²), where xₙ = Θ(yₙ) means that there exist c₁, c₂ > 0 and M ∈ ℕ with c₁yₙ ≤ xₙ ≤ c₂yₙ for all n ≥ M. Also note that G(ρ, z) ≤ 0 implies z ≤ −√ρ c, i.e. |z| ≥ √ρ c, so the relevant z take very large negative values.

• Case 1: {α₁ > 2 log n/n}. Here (1 − α₁)ⁿ⁻² < (1 − 2 log n/n)ⁿ⁻², and using the fact that (1 − x/n)ⁿ ≤ e^{−x} for all x > 0 and n ∈ ℕ, we get (1 − α₁)ⁿ⁻² ≤ C/n² for a constant C > 0. So

E[n(1 − α₁)ⁿ⁻² φ(d) a G²(ρ, Z) I(α₁ > 2 log n/n)] ≤ C E[(a G²(ρ, Z)/n) I(α₁ > 2 log n/n)].

Now a = (n − 1)φ(d) − dΦ(d) ≤ (n − 1)|d| ≤ (n − 1)(c + √ρ|Z|)/√(1 − ρ) and G²(ρ, Z) ≤ (c + |Z|/√ρ)²/(4(1 − ρ)³), so the right-hand side is bounded by

C E[((n − 1)/n) · ((c + √ρ|Z|)/√(1 − ρ)) · ((c + |Z|/√ρ)²/(4(1 − ρ)³)) I(Z ≤ −√ρ c)],

which → 0 as n → ∞: on {Z ≤ −√ρ c} the Gaussian tail contributes a factor of order e^{−ρc²/2} = Θ((c/n)^ρ), which decays faster than any power of c = Θ(√log n).

• Case 2: {α₁ < 1/(n(log n)³)}. For all large d > 0 we have φ(d) ≤ 2dα₁ (since φ(d)/Φ(−d) ≤ d + 1/d). This implies

|a| = |(n − 1)φ(d) − dΦ(d)| ≤ 2n|d|α₁ + |d|,

and since nα₁ < 1/(log n)³ < 1 on this region, |a| ≤ 3|d| for large n. Using φ(d) ≤ 2dα₁ once more,

n(1 − α₁)ⁿ⁻² φ(d)|a| ≤ 6 d² nα₁ ≤ 6 d²/(log n)³,

so that

E[n(1 − α₁)ⁿ⁻² φ(d) a G²(ρ, Z) I(α₁ < 1/(n(log n)³))] ≤ 6 E[(d² G²(ρ, Z)/(log n)³) I(α₁ < 1/(n(log n)³))].

From the bounds in Case 1,

d² G²(ρ, Z) ≤ (c + √ρ|Z|)²(c + |Z|/√ρ)²/(4(1 − ρ)⁴),

and since log n = Θ(c²) by (i), E[d² G²(ρ, Z)]/(log n)³ = O((log n)²/(log n)³) → 0 as n → ∞.

• Case 3: {1/(n(log n)³) ≤ α₁ ≤ 2 log n/n}. Since G(ρ, z) ≤ 0 forces z ≤ −√ρ c, z takes very large negative values here. Note that K(α₁) = d G²(ρ, z) φ(z) has the form p(z, c)φ(z) for a polynomial p in z and c, and since lim_{|x|→∞} p(x, c) e^{−x²/2} = 0, the supremum of |p(z, c)φ(z)| over z ≤ z₀ tends to 0 as n → ∞ (|z| ≥ √ρ c there, and the Gaussian factor beats every polynomial in c). Also,

∫_{1/n}^1 |fₙ(α)| dα = ∫₁ⁿ (k − 1)(1 − k/n)ⁿ⁻² dk ≤ C ∫₀^∞ x e^{−x} dx < ∞,

so ∫_{1/n}^1 |fₙ(α)| dα is bounded, and hence

E[n(1 − α₁)ⁿ⁻² φ(d) d(nα₁ − 1) G²(ρ, Z) I(1/n < α₁ ≤ 2 log n/n)] = ∫ fₙ(α) K(α) dα → 0.

For the remaining sub-region {1/(n(log n)³) ≤ α₁ ≤ 1/n}, suppose Φ(−d(z₁)) = 1/(n(log n)³) and write z₁ = z₀ + δₙ, so that

Φ(−d(z₀))/Φ(−d(z₁)) = (log n)³.

Using dᵢΦ(−dᵢ) ∼ φ(dᵢ) for d₀ = d(z₀) and d₁ = d(z₁), this gives

log(d₁/d₀) + (d₁² − d₀²)/2 ∼ 3 log log n,

and since d₁ > d₀ > 0,

d₁ − d₀ = √(ρ/(1 − ρ)) δₙ ≤ C log log n/(d₀ + d₁).

Since d₀ = Φ⁻¹(1 − 1/n) = Θ(√log n), the right-hand side → 0, i.e. δₙ → 0 as n → ∞. Consequently, for α ∈ [1/(n(log n)³), 1/n],

|K(α)| ≤ max_{z ∈ [z₀, z₀ + δₙ]} |p(z, c)φ(z)| → 0,

by continuity of p(·, c)φ(·), δₙ → 0 and the vanishing of p(z₀, c)φ(z₀). Finally, substituting α = k/n and using |nα − 1| ≤ 1 on this sub-region,

∫_{1/(n(log n)³)}^{1/n} |fₙ(α)| dα ≤ ∫ (1 − k/n)ⁿ⁻² dk ≤ C ∫₀^∞ e^{−k} dk < ∞,

so the integral over this sub-region → 0 as well. Combining the three cases completes Step 3.

From the above, H″(ρ) ≤ 0, and hence FWER″(ρ) ≥ 0 as n → ∞. This means that for large n, FWER(ρ) is convex and is therefore bounded above by L(ρ) on [0, 1]. From this we can conclude the following.
Theorem 5.1  For large n, FWER(ρ) ≤ αₙ + (1 − ρ)[1 − αₙ − (1 − αₙ)ⁿ].

For large n, 1 − (1 − αₙ)ⁿ ≈ nαₙ, and this implies FWER(ρ) ≤ αₙ[n − (n − 1)ρ]. Bonferroni's method suggests taking αₙ = α/n if we want to maintain the FWER at level α, and this choice satisfies the condition of Theorem 3.1. When αₙ = α/n, we have αₙ(n − (n − 1)ρ) → α(1 − ρ) as n → ∞. This implies that the FWER of the Bonferroni procedure is asymptotically bounded by α(1 − ρ).
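Read constructively, the bound suggests a correlation-corrected Bonferroni cutoff: testing each hypothesis at level α/(n(1 − ρ)) instead of α/n still keeps the asymptotic FWER within α, since a Bonferroni procedure at overall level α/(1 − ρ) has asymptotic FWER at most (α/(1 − ρ))(1 − ρ) = α. This is our reading of the result, not a procedure spelled out in the paper, and it requires ρ to be known and α/(1 − ρ) < 1:

```python
# Correlation-corrected Bonferroni cutoff (illustrative; assumes rho known).
# rho = 0 recovers the classical alpha/n rule; rho > 0 lowers the cutoff,
# recovering some of the power Bonferroni gives away under correlation.
from scipy.stats import norm

def bonferroni_cutoff(alpha: float, n: int, rho: float = 0.0) -> float:
    alpha_adj = alpha / (1.0 - rho)
    if not 0.0 < alpha_adj < 1.0:
        raise ValueError("need alpha/(1 - rho) in (0, 1)")
    return norm.ppf(1.0 - alpha_adj / n)

c_plain = bonferroni_cutoff(0.05, 10_000)
c_corrected = bonferroni_cutoff(0.05, 10_000, rho=0.5)
print(c_plain, c_corrected)       # the corrected cutoff is smaller
```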
Appendix

Lemma 6.1  The second and third terms in H″(ρ) → 0 as n → ∞.

Proof: We do this only for the third term; the other follows similarly, and the argument parallels Step 3 of the main theorem. The third term in H″(ρ) is proportional to

E[nc(1 − α₁)ⁿ⁻² φ(d)] = ∫₋∞^∞ nc(1 − α₁)ⁿ⁻² φ(d) φ(z) dz.

Writing φ(z) = M(α₁) and changing the variable as before,

∫₋∞^∞ nc(1 − α₁)ⁿ⁻² φ(d) φ(z) dz ∝ c ∫₀ⁿ (1 − k/n)ⁿ⁻² M(k/n) dk.

Since φ(d) ≤ 2dα₁ for large d, on {α₁ < 1/(n(log n)³)} we have ncφ(d) ≤ 2cd/(log n)³, and since c = Θ(√log n) and E[d] = O(√log n), the contribution of this region → 0. Also, if α₁ > 2 log n/n, then (1 − α₁)ⁿ⁻² ≤ C/n², so the contribution of {α₁ > 2 log n/n} is O(c/n) → 0 as well.

For the middle region, let Φ(−d(z₀ − δ′ₙ)) = 2 log n/n and Φ(−d(z₀ + δ″ₙ)) = 1/(n(log n)³). As in Step 3 of the main theorem, max{δ′ₙ, δ″ₙ} → 0 as n → ∞. Hence, if 1/(n(log n)³) < α₁ < 2 log n/n, then

c M(α₁) ≤ max_{z ∈ [z₀ − δ′ₙ, z₀ + δ″ₙ]} c φ(z) → 0 as n → ∞

(one checks from (i) that z₀ → −∞ at rate √log n, so cφ(z) → 0 uniformly on this shrinking window), while ∫ (1 − k/n)ⁿ⁻² dk over the corresponding range stays bounded. Thus E[nc(1 − α₁)ⁿ⁻² φ(d) I(1/(n(log n)³) < α₁ < 2 log n/n)] → 0, and combining the three regions, the third term in H″(ρ) → 0.

Lemma 6.2  E[n(1 − α₁)ⁿ⁻² φ(d) G²(ρ, Z) |a − d(nα₁ − 1)|] → 0 as n → ∞.

Proof: Note that |a − d(nα₁ − 1)| = (n − 1)|φ(d) − dΦ(−d)|. If d ≤ 0, then α₁ ≥ 1/2 and hence (1 − α₁)ⁿ⁻² ≤ (1/2)ⁿ⁻²; since n = Θ(c e^{c²/2}), this immediately implies

E[n(1 − α₁)ⁿ⁻² φ(d) G²(ρ, Z) |a − d(nα₁ − 1)| I(d ≤ 0)] → 0.

For d > 0, |φ(d) − dΦ(−d)| ≤ Kα₁ for a constant K > 0, so

E[n(1 − α₁)ⁿ⁻² φ(d) G²(ρ, Z) |a − d(nα₁ − 1)| I(d > 0)] ≤ K E[n² α₁ (1 − α₁)ⁿ⁻² φ(d) G²(ρ, Z) I(d > 0)].

An argument similar to the proof of Lemma 6.1 shows that only the region 1/(n(log n)³) < α₁ < 2 log n/n matters, and the integral over this region → 0.

Simulation results

Theorem 5.1 tells us that, for large n, the FWER of Bonferroni's method (with level of significance α) is asymptotically bounded above by α(1 − ρ). In order to verify this result empirically, simulation results are provided in Table 1. In our simulation experiments we considered ρ = 0, 0.1, 0.3, 0.5, 0.7, 0.9 and α = 0.01, 0.05, 0.1, 0.4, 0.6, 0.7. For each combination (ρ, α), 10000 replications were made to estimate the FWER (the estimate obtained is denoted by F̂WER). In each replication we generated 10000 equicorrelated normal random variables, each with mean 0 and variance 1. Bonferroni's method rejects Hᵢ at overall level α if Xᵢ exceeds the (1 − α/n)-th quantile of the N(0, 1) distribution. In each replication we noted whether or not any of the 10000 variables exceeded that cut-off, and F̂WER was then obtained as the proportion of replications with at least one rejection.
Each F̂WER obtained at a combination (ρ, α) is compared with α(1 − ρ), the upper bound given by Theorem 5.1. It is impressive that in all the cases F̂WER is substantially smaller than α(1 − ρ) (except at one combination with α = 0.01, where the difference is not noteworthy). All these observations suggest that, in the positively correlated setup, Bonferroni's method actually controls the FWER at a much smaller level than the desired level of significance, which makes the method even more conservative in this case.
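A scaled-down version of this experiment (fewer replications than the 10000 used above, and our own implementation of the equicorrelated draws) can be run as follows:

```python
# Monte Carlo estimate of the FWER of Bonferroni's method for n = 10000
# equicorrelated N(0,1) statistics, compared with the bound alpha*(1 - rho).
# Replications are processed in chunks to keep memory bounded.
import numpy as np
from scipy.stats import norm

def estimate_fwer(rho: float, alpha: float, n: int = 10_000,
                  reps: int = 2_000, chunk: int = 500, seed: int = 1) -> float:
    rng = np.random.default_rng(seed)
    c = norm.ppf(1.0 - alpha / n)                 # Bonferroni cutoff at alpha/n
    hits = 0
    for start in range(0, reps, chunk):
        m = min(chunk, reps - start)
        t = rng.standard_normal((m, 1))           # shared factor
        e = rng.standard_normal((m, n))           # idiosyncratic parts
        x_max = (np.sqrt(rho) * t + np.sqrt(1.0 - rho) * e).max(axis=1)
        hits += int((x_max > c).sum())
    return hits / reps

for rho in (0.3, 0.5, 0.7, 0.9):
    est = estimate_fwer(rho, alpha=0.05)
    print(rho, est, 0.05 * (1.0 - rho))           # estimate vs. the bound
```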
             α = 0.01   α = 0.05   α = 0.1   α = 0.4   α = 0.6   α = 0.7
ρ = 0.9  F̂WER      …
         α(1−ρ)    0.001      0.005      0.01      0.04      0.06      0.07
ρ = 0.7  F̂WER      …
         α(1−ρ)    0.003      0.015      0.03      0.12      0.18      0.21
ρ = 0.5  F̂WER      …
         α(1−ρ)    0.005      0.025      0.05      0.2       0.3       0.35
ρ = 0.3  F̂WER      …
         α(1−ρ)    0.007      0.035      0.07      0.28      0.42      0.49
ρ = 0.1  F̂WER      …
         α(1−ρ)    0.009      0.045      0.09      0.36      0.54      0.63
ρ = 0    F̂WER      …
         α(1−ρ)    0.01       0.05       0.1       0.4       0.6       0.7

Table 1: Simulation results

References

[1] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
[2] Yoav Benjamini and Wei Liu. A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference, 82(1-2):163–170, 1999.
[3] Sandrine Dudoit, Juliet Popper Shaffer, and Jennifer C. Boldrick. Multiple hypothesis testing in microarray experiments. Statistical Science, 18(1):71–103, 2003.
[4] Sandrine Dudoit and Mark J. van der Laan. Multiple Testing Procedures with Applications to Genomics. Springer Science & Business Media, 2007.
[5] Bradley Efron. Correlation and large-scale simultaneous significance testing. Journal of the American Statistical Association, 102(477):93–103, 2007.
[6] Bradley Efron. Correlated z-values and the accuracy of large-scale statistical estimates. Journal of the American Statistical Association, 105(491):1042–1055, 2010.
[7] Bradley Efron. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, volume 1. Cambridge University Press, 2012.
[8] Bradley Efron. Size, power and false discovery rates. The Annals of Statistics, 35(4):1351–1377, 2007.
[9] Bradley Efron, Robert Tibshirani, John D. Storey, and Virginia Tusher. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96(456):1151–1160, 2001.
[10] Yosef Hochberg. A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800–802, 1988.
[11] Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, pages 65–70, 1979.
[12] Gerhard Hommel. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75(2):383–386, 1988.
[13] Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Control of the false discovery rate under dependence using the bootstrap and subsampling. Test, 17(3):417, 2008.
[14] Sanat K. Sarkar. On methods controlling the false discovery rate. Sankhyā: The Indian Journal of Statistics, Series A (2008-), pages 135–168, 2008.
[15] Sanat K. Sarkar. Some results on false discovery rate in stepwise multiple testing procedures. The Annals of Statistics, 30(1):239–257, 2002.
[16] Sanat K. Sarkar. Stepup procedures controlling generalized FWER and generalized FDR. The Annals of Statistics, 35(6):2405–2420, 2007.
[17] R. John Simes. An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73(3):751–754, 1986.
[18] John D. Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):479–498, 2002.
[19] Yung Liang Tong. Probability Inequalities in Multivariate Distributions. Academic Press, 2014.
[20] Daniel Yekutieli and Yoav Benjamini. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82(1-2):171–196, 1999.