TFisher Tests: Optimal and Adaptive Thresholding for Combining $p$-Values
Hong Zhang
Department of Mathematical Sciences
Worcester Polytechnic Institute, Worcester, MA 01609
E-mail: [email protected]
Tiejun Tong
Department of Mathematics
Hong Kong Baptist University, Kowloon Tong, Hong Kong
E-mail: [email protected]
John Landers
Department of Neurology
University of Massachusetts Medical School, Worcester, MA 01655
E-mail: [email protected]
Zheyang Wu
Department of Mathematical Sciences
Program of Bioinformatics and Computational Biology
Program of Data Science
Worcester Polytechnic Institute, Worcester, MA 01609
E-mail: [email protected]
January 16, 2018

Author's Footnote: Hong Zhang is Doctoral Candidate, Department of Mathematical Sciences, Worcester Polytechnic Institute. Mailing address: 100 Institute Rd, Worcester, MA 01609, USA (E-mail: [email protected]). Tiejun Tong is Associate Professor, Department of Mathematics, Hong Kong Baptist University. Mailing address: FSC1201, Fong Shu Chuen Building, Kowloon Tong, Hong Kong (E-mail: [email protected]). John Landers is Professor, Department of Neurology, University of Massachusetts Medical School. Mailing address: 55 Lake Avenue North, AS6-1053, Worcester, MA 01655, USA (E-mail: [email protected]). Zheyang Wu is Corresponding Author and Associate Professor, Department of Mathematical Sciences, Program of Bioinformatics and Computational Biology, and Program of Data Science, Worcester Polytechnic Institute. Mailing address: 100 Institute Rd, Worcester, MA 01609, USA (E-mail: [email protected]). The research was supported in part by the NSF grant DMS-1309960.

Abstract
For testing a group of hypotheses, numerous $p$-value combination methods have been developed and widely applied since the 1930s. Some methods (e.g., the minimal $p$-value) are optimal for sparse signals, while others (e.g., Fisher's combination) are optimal for dense signals. To address a wide spectrum of signal patterns, this paper proposes a unifying family of statistics, called TFisher, with general $p$-value truncation and weighting schemes. Analytical calculations for the $p$-value and the statistical power of TFisher under general hypotheses are given. Optimal truncation and weighting parameters are studied based on Bahadur Efficiency (BE) and the proposed Asymptotic Power Efficiency (APE), which is superior to BE for studying the signal detection problem. A soft-thresholding scheme is shown to be optimal for signal detection in a large space of signal patterns. When prior information on the signal pattern is unavailable, an omnibus test, oTFisher, can adapt to the given data. Simulations evidence the accuracy of the calculations and validate the theoretical properties. The TFisher tests were applied to analyzing a whole exome sequencing data set of amyotrophic lateral sclerosis. The relevant tests and calculations have been implemented in an R package TFisher and published on CRAN.
Keywords: $p$-value combination tests, optimality, statistical power, signal detection, exome sequencing analysis, Bahadur efficiency, asymptotic power efficiency.

1. INTRODUCTION

The $p$-value combination approach is an important statistical strategy for information-aggregated decision making. It is foundational to many applications such as meta-analysis, data integration, and signal detection. In this approach a group of input $p$-values $P_i$, $i = 1, \ldots, n$, are combined to form a single statistic for testing a property of the whole group. For example, in meta-analysis each $p$-value corresponds to the significance level of one study, and a group of similar studies and their $p$-values are combined to test a common scientific hypothesis. In the scenario of signal detection, each $p$-value is for one feature factor, and the $p$-values of a group of factors are combined to determine whether some of those factors are associated with a given outcome. In either scenario, regardless of the original data variation, $p$-values provide a commonly scaled statistical evidence from various sources (i.e., studies or factors); therefore the $p$-value combination approach can be considered as combining information from different sources to make a reliable conclusion. Indeed, $p$-value combination can provide extra power over non-combination methods. In signal detection, for example, weak signals could be detectable as a group but not recoverable as individuals (Donoho and Jin, 2004; Jin and Ke, 2016).

The question is how we should combine a given group of $p$-values. One of the earliest methods is Fisher's combination statistic proposed in the 1930s (Fisher, 1932), which is simply the product of all $p$-values, or equivalently its monotonic log transformation:
$$T = \prod_{i=1}^{n} P_i \quad \Leftrightarrow \quad W = -2\log(T) = -2\sum_{i=1}^{n} \log(P_i). \quad (1)$$
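As a small numerical sketch (illustrative Python, not the paper's R package TFisher), Fisher's combination statistic and its exact null $p$-value follow from the fact that $W \sim \chi^2_{2n}$ under $H_0$:

```python
import numpy as np
from scipy.stats import chi2

def fisher_combination(pvalues):
    """Fisher's combination: W = -2 * sum(log p_i), which follows a chi-squared
    distribution with 2n degrees of freedom when the n p-values are i.i.d.
    Uniform[0, 1] under the null."""
    p = np.asarray(pvalues, dtype=float)
    w = float(-2.0 * np.sum(np.log(p)))
    return w, float(chi2.sf(w, df=2 * len(p)))

w, pv = fisher_combination([0.5, 0.5, 0.5, 0.5])
# w = 8*log(2), roughly 5.545; the combined p-value is roughly 0.698
```

Note that four moderately small $p$-values of 0.5 do not combine into strong evidence; the chi-squared tail quantifies exactly how much joint evidence the product carries.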
(1)Fisher’s combination enjoys asymptotic optimality over any possible ways of combining p -valueswhen all p -values represent “signals”, e.g., all studies are positive or all features are associated(Littell and Folks, 1971, 1973). In this sense, the log-transformation of Fisher’s combination issuperior to other transformation functions, e.g., the inverse Gaussian Z-transformation (Stoufferet al., 1949; Whitlock, 2005). However, in real applications, it is often the case that only partof the p -values are related to signals. One example is in the meta-analysis of differential geneexpression, where the positive outcomes could happen in one or some of the studies only (Song andTseng, 2014). Another example is in detecting genetic associations for a group of genetic markers,where some of these markers are associated but some others are not (Hoh et al., 2001; Su et al.,4016). In fact, it has been shown that when true signals are in a very small proportion, e.g., atthe level of n − α with α ∈ (3 / , p -value as thestatistic (Tippert, 1931). However, the minimal p -value may no longer be optimal for denser weaksignals, e.g., under α ∈ (1 / , /
4) (Donoho and Jin, 2004; Wu et al., 2014). Thus, between the twoends of the classic methods – the optimality of Fisher’s combination for very dense signals and theoptimality of minimal p -value method for very sparse signals – a straightforward idea is to combinea subgroup of smaller p -values that more likely represent true signals. Following this idea, styles oftruncation methods were proposed. For example, the truncated product method (TPM) statisticis defined as (Zaykin et al., 2002, 2007): T = n (cid:89) i =1 P I ( P i ≤ τ ) i ⇔ W = − T ) = n (cid:88) i =1 − P i ) I ( P i ≤ τ ) , (2)where I ( · ) is the indicator function and τ is the threshold of truncation. A variation of TPM iscalled the rank truncation product (RTP) method, in which τ is set as the k th smallest p -value fora given k (Dudbridge and Koeleman, 2003; Kuo and Zaykin, 2011).Truncation-based methods have been widely applied in various practical studies and showndesirable performance. For example, many papers have been published in the genome-wide associ-ation studies (Biernacka et al. (2012); Dai et al. (2014); Dudbridge and Koeleman (2003); Li andTseng (2011); Yu et al. (2009), and others). However, there is a lack of theoretical study on thebest choice of τ . Two ad hoc intuitions were considered. One is a “natural” choice of τ = 0 . τ as the true proportion of signals. In Sections 5 and 6 of this paper, however,we will show that in general neither of the two intuitions gives the best choice of τ .Moreover, even if we can get the best τ for TPM, would it be an optimal statistic? The answeris still no. In fact, besides truncation, the statistical power could be improved through properlyweighting the p -values. In this paper, we propose a general weighting and truncation frameworkthrough a family of statistics called TFisher. We provide accurate analytical calculations for both p -value and statistical power of TFisher under general hypotheses. 
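To make the TPM concrete, here is a small illustrative sketch (the helper names are ours; the paper's own implementation is the R package TFisher). It computes the statistic in (2) and estimates its null $p$-value by Monte Carlo, since after truncation the null distribution is no longer a plain chi-squared:

```python
import numpy as np

def tpm_stat(pvalues, tau=0.05):
    """TPM statistic: sum of -2*log(p_i) over the p-values with p_i <= tau."""
    p = np.asarray(pvalues, dtype=float)
    return float(np.sum(-2.0 * np.log(p[p <= tau])))

def tpm_pvalue_mc(pvalues, tau=0.05, n_mc=20000, seed=0):
    """Monte Carlo null p-value: compare the observed statistic against TPM
    statistics computed on i.i.d. Uniform[0,1] p-values (the null in (9))."""
    rng = np.random.default_rng(seed)
    w_obs = tpm_stat(pvalues, tau)
    n = len(pvalues)
    w_null = np.array([tpm_stat(rng.uniform(size=n), tau) for _ in range(n_mc)])
    return float(np.mean(w_null >= w_obs))
```

An exact null calculation that avoids Monte Carlo is given by formula (11) in Section 3.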
For the signal detection problem, theoretical optimality of the truncation and weighting schemes is systematically studied based on Bahadur Efficiency (BE), as well as on a more sophisticated measure proposed here, the Asymptotic Power Efficiency (APE). The results show that in a large parameter space, TPM and RTP are not optimal; the optimal method coordinates weighting and truncation in a soft-thresholding manner. This result provides an interesting connection to a rich literature of shrinkage and penalty methods in the context of de-noising and model selection (Abramovich et al., 2006; Donoho, 1995).

When prior information on signal patterns is unavailable, an omnibus test, called oTFisher, is proposed to obtain a data-adaptive weighting and truncation scheme. In general, an omnibus test does not guarantee the highest power for all signal patterns, but it often provides a robust solution that performs reasonably well in most scenarios. In the literature, omnibus tests mostly depend on computationally intensive simulations or permutations (Lee et al., 2012; Li and Tseng, 2011; Lin et al., 2016; Yu et al., 2009). In order to reduce the computation and improve the stability and accuracy, we provide an analytical calculation for determining the statistical significance of oTFisher.

The remainder of the paper is organized as follows. The problem formulation is given in Section 2, where the definitions of TFisher and the settings of hypotheses are clarified. For the whole TFisher family under finite $n$, we provide analytical calculations for the $p$-values in Section 3 and the statistical power in Section 4. Theoretical studies of optimality based on BE and APE are given in Section 5. With extensive simulations, Section 6 demonstrates that our analytical calculations are accurate and that our theoretical studies reflect the reality well. Section 7 shows an application of the TFisher tests to analyzing a whole exome sequencing data set for finding putative disease genes of amyotrophic lateral sclerosis.
Concluding remarks are given in Section 8. Detailed proofs of lemmas and theorems and the supplementary figures are given in the Supplementary Materials.

2. TFISHER TESTS AND HYPOTHESES

2.1 TFisher

With the input $p$-values $P_i$, $i = 1, \ldots, n$, the TFisher family extends Fisher's $p$-value combination to a general weighting and truncation scheme. The general formula of TFisher statistics can be equivalently written as
$$T = \prod_{i=1}^{n} \left(\frac{P_i}{\tau_{2i}}\right)^{I(P_i \le \tau_1)} \quad \Leftrightarrow \quad W = -2\log(T) = \sum_{i=1}^{n} \left(-2\log(P_i) + 2\log(\tau_{2i})\right) I(P_i \le \tau_1), \quad (3)$$
where $\tau_1$ is the truncation parameter that excludes too-big $p$-values and the $\tau_{2i}$ are the weighting parameters for the $p$-values. This statistic family unifies a broad range of $p$-value combination methods. When $\tau_1 = \tau_{2i} = 1$, the statistic is the traditional Fisher's combination statistic. When $\tau_1 \in (0, 1)$ and $\tau_{2i} = 1$, it becomes the truncated product method (TPM) (Zaykin et al., 2002). When $\tau_1 = P_{(k)}$ and $\tau_{2i} = 1$ for a given $k$, where $P_{(1)} \le \ldots \le P_{(n)}$ are the ordered input $p$-values, it becomes the rank truncation product method (RTP) (Dudbridge and Koeleman, 2003). When $\tau_1 = 1$ and $\tau_{2i} = P_i^{1-\lambda_i}$, it leads to the power-weighted $p$-value combination statistic $T = \prod_{i=1}^{n} P_i^{\lambda_i}$ (Good, 1955; Li and Tseng, 2011). For the simplicity of theoretical studies, in what follows we restrict to constant parameters $\tau_1$ and $\tau_{2i} = \tau_2$. Such a two-parameter definition corresponds to the dichotomous mixture model in the classic signal detection setting, such as those specified in (10) and (22).

The weighting and truncation scheme is also related to thresholding methods in a rich literature of shrinkage estimation, de-noising and model selection (Abramovich et al., 2006; Donoho, 1995). In particular, when $\tau_1 = \tau$ and $\tau_2 = 1$, TFisher corresponds to the hard-thresholding (i.e., TPM):
$$W_h = \sum_{i=1}^{n} \left(-2\log(P_i)\right) I(P_i \le \tau). \quad (4)$$
When $\tau_1 = \tau_2 = \tau$, TFisher is a soft-thresholding method:
$$W_s = \sum_{i=1}^{n} \left(-2\log(P_i) + 2\log(\tau)\right)_{+}, \quad (5)$$
where $(x)_+ = \max\{x, 0\}$. The soft-thresholding could have three benefits over the hard-thresholding here. First, a value $\tau \in (0, 1)$ downscales the significance of the original $p$-values, which could reduce the type I error rate in the related context of multiple hypothesis testing. Secondly, even though $E_{H_1}(W_s) - E_{H_0}(W_s) < E_{H_1}(W_h) - E_{H_0}(W_h)$, $W_s$ has a much smaller variance, which could make it more powerful than $W_h$. Thirdly, $W_s$ has a better weighting scheme for small $p$-values. To see this point, Figure 1 illustrates that the hard-thresholding scheme, represented by the curve $-2\log(P_i) I(P_i \le \tau)$, is discontinuous at the cutoff $\tau$. In contrast, the soft-thresholding scheme $2(-\log(P_i) + \log(\tau))_+$ is pushed down to be a smoothed curve. The more steeply dropping curve of the soft-thresholding gives relatively heavier weights to the smaller $p$-values that are more likely associated with true signals. In Section 5, we will provide theoretical results for the functional relationships between signal patterns and the optimal $\tau_1$ and $\tau_2$. The soft-thresholding will be shown to be mostly optimal, which is consistent with the conclusion in shrinkage analysis (Donoho, 1995).

Figure 1: Comparison between the hard-thresholding curve $-2\log(P_i) I(P_i \le \tau)$ (black) and the soft-thresholding curve $2(-\log(P_i) + \log(\tau))_+$ (green dot).

In practice, the best $\tau_1$ and $\tau_2$ are difficult to determine. However, we can apply an omnibus test, called oTFisher, which adapts the choice of these parameters to the given data. oTFisher does not guarantee the highest power, but it often provides a robust test that performs reasonably well over most signal patterns. In general, oTFisher adaptively chooses the $\tau_1$ and $\tau_2$ that give the smallest $p$-value over the space of $(0, 1] \times (0, +\infty)$:
$$W_o = \min_{\tau_1, \tau_2} G_{\tau_1, \tau_2}\left(W(\tau_1, \tau_2)\right),$$
where $G_{\tau_1, \tau_2}$ is the survival function of $W(\tau_1, \tau_2)$ defined in (3) under the null hypothesis. For practical computation, we study a discrete domain over $(\tau_{1j}, \tau_{2j})$ for $j = 1, \ldots, m$:
$$W_o = \min_{j} G_j(W_j). \quad (6)$$
As we will show in theory and in simulations, a grid of $\tau_{1j} = \tau_{2j} \in (0, 1)$ over small, intermediate and large values in $(0, 1)$ could perform sufficiently well in most cases.

2.2 Hypotheses

To answer the key question of how $p$-values should be combined, we keep in mind that the performance, in particular the statistical power, of different methods depends on the setting of the null and alternative hypotheses. A general setting for the group testing problem is given in the following. For independent and identically distributed (i.i.d.) input statistics $X_1, \ldots, X_n$, we aim at testing the null and alternative hypotheses:
$$H_0: X_i \sim F_0 \text{ for all } i \quad \text{vs.} \quad H_1: X_i \sim F_1 \text{ for all } i, \quad (7)$$
where $F_j$, $j = 0, 1$, denote arbitrary continuous cumulative distribution functions (CDFs). Based on the given $H_0$, the corresponding input $p$-values are
$$P_i = \bar{F}_0(X_i), \quad (8)$$
where $\bar{F}_0 = 1 - F_0$ denotes the survival function of the null distribution. Note that the one-sided $p$-value definition in (8) actually covers two-sided tests too. This is because $F_0$ is arbitrary, and the statistics can simply be replaced by $X_i' = |X_i| \sim F_0'$ whenever the signs of the input statistics have meaningful directionality (e.g., protective and deleterious effects of mutations in genetic association studies). Also note that the i.i.d. assumption in (7) is for the convenience of power calculation. If $p$-value calculation of TFisher is the only concern in a data analysis, the null hypothesis can be generalized to
$$H_0: \text{Independent } X_i \sim F_{0i}, \quad \text{or equivalently,} \quad H_0: P_i \overset{i.i.d.}{\sim} \text{Uniform}[0, 1], \quad i = 1, \ldots, n. \quad (9)$$
That is, the TFisher tests can be applied in meta-analysis or integrative analysis of heterogeneous data, where the input test statistics could potentially follow different distributions.

A particularly interesting scenario is the signal detection problem, where the target is to test the existence of "signals" in a group of statistics. Usually the test statistics are, or can be approximated by, the Gaussian distribution. Thus the problem is to test the null hypothesis of all "noises" versus the alternative hypothesis that a proportion of signals exist:
$$H_0: X_i \overset{i.i.d.}{\sim} N(0, \sigma^2) \quad \text{vs.} \quad H_1: X_i \overset{i.i.d.}{\sim} \epsilon N(\mu, \sigma^2) + (1 - \epsilon) N(0, \sigma^2), \quad i = 1, \ldots, n. \quad (10)$$
Here the zero mean indicates the noise, the non-zero mean $\mu$ represents the signal strength, and $\epsilon \in (0, 1]$ represents the proportion of the signals. The signal patterns are characterized by the parameter space of $(\epsilon, \mu)$. For simplicity, we assume the variance $\sigma^2$ is known or can be accurately estimated, which is equivalent to assuming $\sigma = 1$ without loss of generality (otherwise, the data can be rescaled by $\sigma$).

3. TFISHER DISTRIBUTION UNDER $H_0$

In this section we provide the calculation for the exact null distribution of TFisher in (3) when $\tau_1$ and $\tau_2$ are given. Based on that, an asymptotic approximation for the null distribution of oTFisher in (6) is also provided. Thus the $p$-values of TFisher and oTFisher can be quickly and accurately calculated in practical applications.

3.1 Exact Distribution at Given $\tau_1$ and $\tau_2$

Consider the general null hypothesis in (9). Let $U_i \overset{i.i.d.}{\sim} \text{Uniform}[0, 1]$, $i = 1, \ldots, n$, and let $N$ be the number of $U_i$ less than or equal to $\tau_1$. The TFisher statistic in (3) can be written as
$$W(\tau_1, \tau_2) = \sum_{i=1}^{N} -2\log\left(\frac{\tau_1}{\tau_2} U_i\right).$$
For a fixed positive integer $k \ge 1$, it is easy to check that
$$P\left(\sum_{i=1}^{k} -2\log\left(\frac{\tau_1}{\tau_2} U_i\right) \ge w\right) = \bar{F}_{\chi^2_{2k}}\left(w + 2k\log\left(\frac{\tau_1}{\tau_2}\right)\right),$$
where $\bar{F}_{\chi^2_{2k}}(x)$ is the survival function of a chi-squared distribution with $2k$ degrees of freedom. Since $N \sim \text{Binomial}(n, \tau_1)$, $W$ can be viewed as a compound of this shifted chi-squared distribution and the binomial distribution:
$$P(W \ge w) = (1 - \tau_1)^n I_{\{w \le 0\}} + \sum_{k=1}^{n} \binom{n}{k} \tau_1^k (1 - \tau_1)^{n-k}\, \bar{F}_{\chi^2_{2k}}\left(w + 2k\log\left(\frac{\tau_1}{\tau_2}\right)\right).$$
We can further simplify the above formula by noting the relationship between $\bar{F}_{\chi^2_{2k}}(x)$ and the upper incomplete gamma function $\Gamma(s, x)$:
$$\bar{F}_{\chi^2_{2k}}(x) = \int_{x}^{+\infty} \frac{u^{k-1} e^{-u/2}}{2^k (k-1)!}\, du = \int_{x/2}^{+\infty} \frac{y^{k-1} e^{-y}}{(k-1)!}\, dy = \frac{\Gamma(k, x/2)}{(k-1)!} = e^{-x/2} \sum_{j=0}^{k-1} \frac{(x/2)^j}{j!}.$$
Thus the null survival function of $W$ is given by
$$P(W \ge w) = (1 - \tau_1)^n I_{\{w \le 0\}} + \sum_{k=1}^{n} \binom{n}{k} \tau_1^k (1 - \tau_1)^{n-k} e^{-w/2} \left(\frac{\tau_2}{\tau_1}\right)^k \sum_{j=0}^{k-1} \frac{[w/2 + k\log(\tau_1/\tau_2)]^j}{j!} = (1 - \tau_1)^n I_{\{w \le 0\}} + e^{-w/2} \sum_{k=1}^{n} \sum_{j=0}^{k-1} \binom{n}{k} \tau_2^k (1 - \tau_1)^{n-k} \frac{[w + 2k\log(\tau_1/\tau_2)]^j}{(2j)!!}. \quad (11)$$
Note that the formula is not continuous in the first term because of the truncation at $\tau_1$. As a special case, for the soft-thresholding statistic with $\tau_1 = \tau_2 = \tau$, we have
$$P(W_s \ge w) = (1 - \tau)^n I_{\{w \le 0\}} + e^{-w/2} \sum_{k=1}^{n} \sum_{j=0}^{k-1} \binom{n}{k} \tau^k (1 - \tau)^{n-k} \frac{w^j}{(2j)!!}.$$
For given $\tau_1$ and $\tau_2$, this $p$-value calculation is exact. As evidenced by simulations, Figure 2 shows that formula (11) provides a perfect null distribution curve for the TFisher family $W$ in (3).

Figure 2: The right-tail distribution curve of $W(\tau_1, \tau_2)$ under $H_0$, comparing simulated curves with the exact formula (11).
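Formula (11) above can be implemented directly. The sketch below (an illustrative Python port, not the CRAN TFisher package) uses the chi-squared survival function in place of the double-factorial expansion, which is numerically equivalent:

```python
import numpy as np
from scipy.stats import binom, chi2

def tfisher_null_sf(w, n, tau1, tau2):
    """Exact null survival P(W >= w) for the TFisher statistic W(tau1, tau2):
    a Binomial(n, tau1) mixture of shifted chi-squared tails."""
    if w <= 0:
        return 1.0
    k = np.arange(1, n + 1)
    shift = w + 2.0 * k * np.log(tau1 / tau2)
    # chi2.sf returns 1 for negative arguments, as the shifted tail requires
    return float(np.sum(binom.pmf(k, n, tau1) * chi2.sf(shift, df=2 * k)))
```

At $\tau_1 = \tau_2 = 1$ the statistic reduces to Fisher's combination, so the value must agree with the $\chi^2_{2n}$ tail, and for $w$ close to 0 it approaches $1 - (1 - \tau_1)^n$, the chance that at least one $p$-value survives truncation.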
3.2 Calculation for Omnibus Test

For the omnibus test oTFisher in (6), noting that each $G_j$ is monotone, we have
$$P\left(\min_{j} G_j(W_j) > t\right) = P\left(W_j(P_1, \ldots, P_n) < w_j,\; j = 1, \ldots, m\right), \quad (12)$$
where for each $j$ and given $(\tau_{1j}, \tau_{2j})$, the exact value of $w_j \equiv G_j^{-1}(t)$ can be calculated by (11). These $W_j$'s are functions of the same set of input $p$-values, and therefore they are dependent on each other. Fortunately, since $W_j = \sum_{i=1}^{n} -2\log(P_i/\tau_{2j})\, I(P_i \le \tau_{1j})$, by the Central Limit Theorem (CLT) the statistics $(W_1, \ldots, W_m)$ asymptotically follow the multivariate normal (MVN) distribution with mean vector $\mu = (\mu_1, \ldots, \mu_m)$ and covariance matrix $\Sigma$, where
$$\mu_j = E(W_j) = 2n\tau_{1j}\left(1 + \log\left(\frac{\tau_{2j}}{\tau_{1j}}\right)\right),$$
$$\Sigma_{jk} = \text{Cov}(W_j, W_k) = 4n\tau_{jk} + 4n\left[\tau_{jk}\left(1 + \log\left(\frac{\tau_{2j}}{\tau_{jk}}\right)\right)\left(1 + \log\left(\frac{\tau_{2k}}{\tau_{jk}}\right)\right) - \tau_{1j}\tau_{1k}\left(1 + \log\left(\frac{\tau_{2j}}{\tau_{1j}}\right)\right)\left(1 + \log\left(\frac{\tau_{2k}}{\tau_{1k}}\right)\right)\right], \quad (13)$$
where $\tau_{jk} = \min\{\tau_{1j}, \tau_{1k}\}$. Note that under the special case of the soft-thresholding with $\tau_{1j} = \tau_{2j} = \tau_j$, the two formulas can be readily simplified (assuming $\tau_j \le \tau_k$) as
$$\mu_j = 2n\tau_j, \quad \Sigma_{jk} = 4n\tau_j\left[2 - \tau_k + \log\left(\frac{\tau_k}{\tau_j}\right)\right].$$
Thus we can approximate the $p$-value of oTFisher by the asymptotic distribution of the $W_j$'s:
$$P\left(\min_{j} G_j(W_j) > w_o\right) \approx P\left(W_j' < w_j,\; j = 1, \ldots, m\right), \quad (14)$$
where $(W_1', \ldots, W_m') \sim \text{MVN}(\mu, \Sigma)$, and $\mu$ and $\Sigma$ are given in (13). The multivariate normal probabilities can be efficiently computed, e.g., by Genz (1992). Figure 3 shows the left-tail probability of $W_o$, which corresponds to the $p$-value because a smaller $W_o$ indicates stronger evidence against the null. The figure shows that the calculation method is accurate even for small $n$, and the accuracy improves as $n$ increases. The calculation is slightly conservative, which guarantees that the type I error rate will be sufficiently controlled in real applications.
4. TFISHER DISTRIBUTION UNDER GENERAL $H_1$

In this section we provide a methodology for calculating the distribution of TFisher in (3) under the general $H_0$ and $H_1$ in (7), and thus the statistical power. Even though the calculation is derived asymptotically, it possesses a high accuracy for small to moderate $n$.

For any given CDFs $F_0$ and $F_1$ in (7), we define a monotone transformation function on $[0, 1]$:
$$D(x) = \begin{cases} x & \text{under } H_0: F_0, \\ \bar{F}_1(\bar{F}_0^{-1}(x)) & \text{under } H_1: F_1 \ne F_0. \end{cases} \quad (15)$$

Figure 3: The left-tail null distribution of $W_o$ over a grid of $\tau_{1j} = \tau_{2j} = \tau_j$, comparing simulated curves with the approximate calculation.

For any random $p$-value $P_i$ in (8), we have $D(P_i) \sim \text{Uniform}[0, 1]$ under either $H_0$ or $H_1$. Furthermore, we define the function
$$\delta(x) = D(x) - x, \quad (16)$$
which provides a metric for the difference between $H_0$ and $H_1$. For example, for any level $\alpha$ test, $\delta(\alpha)$ represents the difference between the statistical power and the size. For any random $p$-value $P$, $\delta(P)$ measures a stochastic difference between the $p$-value distribution under $H_1$ versus that under $H_0$.

The TFisher statistic can be written as
$$W = \sum_{i=1}^{n} -2\log\left(\frac{P_i}{\tau_2}\right) I(P_i \le \tau_1) = \sum_{i=1}^{n} Y_i, \quad (17)$$
where $Y_i \equiv -2\log\left(\frac{D^{-1}(U_i)}{\tau_2}\right) I\left(D^{-1}(U_i) \le \tau_1\right)$, and the $U_i = D(P_i)$ are i.i.d. Uniform[0, 1]. For general $F_0$ and $F_1$, the $D$ function could be complicated and exact calculation could be difficult. Here we propose an asymptotic approximation for the distribution of $W$ under $H_1$. Note that since $W$ is the sum of i.i.d. random variables, it is asymptotically normal by the CLT. However, for small to moderate $n$ and for a small truncation parameter $\tau_1$, the normal approximation is not very accurate. Here we use a three-parameter $(\xi, \omega, \alpha)$ skew normal distribution (SN) to accommodate the departure from normality (Azzalini, 1985). Specifically, we approximate $W$ by
$$W \overset{D}{\approx} \text{SN}(\xi, \omega, \alpha),$$
where the probability density function of SN is
$$f(x) = \frac{2}{\omega}\, \phi\left(\frac{x - \xi}{\omega}\right) \Phi\left(\alpha\, \frac{x - \xi}{\omega}\right),$$
with $\phi$ and $\Phi$ being the probability density function and the CDF of $N(0, 1)$. The parameters $(\xi, \omega, \alpha)$ are obtained by solving the equations of the first three moments:
$$\xi = \mu_1 - \left(\frac{2\mu_3}{4 - \pi}\right)^{1/3}, \quad \omega = \sqrt{\sigma^2 + \left(\frac{2\mu_3}{4 - \pi}\right)^{2/3}}, \quad \alpha = \text{sgn}(\mu_3) \sqrt{\frac{\pi (2\mu_3)^{2/3}}{2\sigma^2 (4 - \pi)^{2/3} + (2 - \pi)(2\mu_3)^{2/3}}},$$
where
$$\mu_1 = E(W) = nE(Y), \quad \sigma^2 = \text{Var}(W) = n\left[E(Y^2) - E(Y)^2\right], \quad \mu_3 = E\left(W - E(W)\right)^3 = n\, E\left[(Y - E(Y))^3\right],$$
with
$$E(Y^k) = \int_0^{D(\tau_1)} \left(-2\log\left(\frac{D^{-1}(u)}{\tau_2}\right)\right)^k du, \quad k = 1, 2, 3.$$
As shown by Figure 4, the SN approximation for calculating statistical power is accurate even for small $n$ and $\tau_1$.
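The moment-matching step above is easy to verify numerically. The following sketch (illustrative Python; the function names are ours) recovers $(\xi, \omega, \alpha)$ from a given mean, variance, and third central moment, and the round trip through the skew normal's moment formulas confirms the algebra:

```python
import numpy as np

def sn_from_moments(mu1, sig2, mu3):
    """Method-of-moments fit of SN(xi, omega, alpha) to the first three moments."""
    c = np.cbrt(2.0 * mu3 / (4.0 - np.pi))  # equals omega*delta*sqrt(2/pi)
    xi = mu1 - c
    omega = np.sqrt(sig2 + c * c)
    alpha = np.sign(mu3) * np.sqrt(
        np.pi * c * c / (2.0 * sig2 + (2.0 - np.pi) * c * c))
    return xi, omega, alpha

def sn_moments(xi, omega, alpha):
    """Mean, variance, and third central moment of SN(xi, omega, alpha)."""
    delta = alpha / np.sqrt(1.0 + alpha * alpha)
    b = omega * delta * np.sqrt(2.0 / np.pi)
    return xi + b, omega**2 - b**2, (4.0 - np.pi) / 2.0 * b**3
```

Starting from any $(\xi, \omega, \alpha)$, computing its moments with `sn_moments` and feeding them to `sn_from_moments` returns the original parameters, which is exactly the inversion used to fit the SN approximation to $W$.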
We have also studied other distribution-approximation techniques, including the generalized normal distribution (Nadarajah, 2005; Varanasi and Aazhang, 1989), the first- and second-order Edgeworth expansions (DasGupta, 2008), the saddlepoint approximation (Daniels, 1954; Lugannani and Rice, 1980), etc. Based on our simulation results (not reported in this paper to save space), we note that the SN approximation provides a better accuracy for calculating the power of TFisher with small $\tau_1$ under small $n$.

Figure 4: The right-tail distribution of $W$ under the alternative hypotheses of the Gaussian mixture in (10), comparing simulated curves with the skew-normal approximation (Approx. SN) and the normal approximation (Approx. N).

5. ASYMPTOTIC OPTIMALITY FOR SIGNAL DETECTION

In this section, we study the asymptotic performance and optimality within the TFisher family in (3). The subscript $n$ is explicitly added to indicate that the asymptotics is driven by $n \to \infty$. Overall, both studies of BE and APE consistently conclude that the soft-thresholding with $\tau_1 = \tau_2$ is optimal or close to optimal in a broad space of the signal parameters $(\epsilon, \mu)$, whereas Fisher's method (i.e., no truncation) and TPM (i.e., the hard-thresholding) are not. The functional relationship between the optimal $(\tau_1^*, \tau_2^*)$ and $(\epsilon, \mu)$ by APE better reflects the patterns of statistical power than that by BE in real data analysis.

5.1 Properties Based on Bahadur Efficiency

BE was first introduced by Bahadur (1960) to study the large sample property of test statistics. Consider a test $T_n = T(X_1, \ldots, X_n)$, where $X_1, \ldots, X_n$ are random samples. Denote by $L_n(t) = P_{H_0}(T_n > t)$ the survival function of $T_n$ under $H_0$, and by $L_n(t \mid \theta)$ the survival function under $H_1$. Under $H_1$, if
$$\lim_{n \to \infty} -\frac{2}{n} \log L_n(T_n \mid \theta) = c_T(\theta) \in (0, \infty), \quad (18)$$
we call the constant $c_T(\theta)$ the Bahadur Efficiency (BE, or Bahadur exact slope) of $T_n$ (Nikitin, 1995).
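The limit in (18) can be watched empirically. For Fisher's combination under the Gaussian mixture in (10), the null survival of $W_n$ is the exact $\chi^2_{2n}$ tail, so $-\frac{2}{n}\log L_n(W_n)$ can be computed for a single simulated data set at increasing $n$. A sketch (the parameter values here are our own choices for illustration):

```python
import numpy as np
from scipy.stats import chi2, norm

def slope_estimate(n, eps=0.5, mu=1.5, seed=7):
    """One-sample estimate of -(2/n) log L_n(W_n) for Fisher's combination
    under H1: X_i ~ eps*N(mu, 1) + (1 - eps)*N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n) + mu * (rng.uniform(size=n) < eps)
    p = norm.sf(x)                       # one-sided p-values as in (8)
    w = -2.0 * np.sum(np.log(p))
    # logsf avoids underflow: the p-value is astronomically small for large n
    return float(-2.0 / n * chi2.logsf(w, df=2 * n))
```

Evaluating `slope_estimate` at growing $n$ shows the sequence settling around a positive constant, which is the empirical counterpart of the limit defining the slope.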
Since $L_n(T_n \mid \theta)$ is actually the $p$-value under $H_1$, $c_T(\theta)$ indicates how quickly the $p$-value decays to zero. Thus, BE indicates how much the null and alternative distributions of $T_n$ are separated in an asymptotic sense. It is also related to the minimal sample size $n$ that is necessary for the test to reach a given statistical power at a given significance level (Bahadur, 1967). If another test $T_n'$ has $c_{T'}(\theta) > c_T(\theta)$, then $T_n'$ is said to be Bahadur asymptotically more efficient than $T_n$. Here, for the signal detection problem defined in (10), the parameter $\theta$ is the vector $(\epsilon, \mu)$.

Note that under the hypothesis settings in (7) and (10), the input statistics $X_1, \ldots, X_n$ can be regarded as the input samples for the $p$-value combination tests, e.g., in (3). Thus the number $n$ of tests to be combined can be regarded as the sample size $n$ in the Bahadur asymptotics given in (18). This setting is similar to some BE studies for $p$-value combination methods (e.g., Abu-Dayyeh et al. (2003)), but different from others where the input statistics $X_i$ are related to the sample size (e.g., Littell and Folks (1971, 1973)).

To calculate $c_T(\theta)$, one can apply a composition method (cf. Theorem 1.2.2 in Nikitin (1995)). Specifically, if (i) $T_n \overset{P}{\to} g(\theta)$ under $H_1$, and (ii) the tail property of the $p$-value under $H_0$ satisfies $\lim_{n \to \infty} -\frac{2}{n} \log L_n(t) = f(t)$, where $f(t)$ is continuous on an open interval $I$ and $g(\theta) \in I$ for all $\theta$ under $H_1$, then $c_T(\theta) = f(g(\theta))$. Note that the convergence $T_n \overset{P}{\to} g(\theta)$ under $H_1$ implies that the variance of $T_n$ converges to 0 under $H_1$. Thus BE contains the variance information only under $H_0$. We state this important property as a remark.

Remark 1.
Bahadur efficiency does not incorporate information on the variance of the statistic under $H_1$.

Now we calculate the BE of any TFisher statistic $W_n(\tau_1, \tau_2)$ in (3). Considering the equivalent test statistic $T_n = W_n/n$ and following (17) and the Law of Large Numbers, under $H_1$ we have
$$\frac{W_n}{n} \overset{P}{\to} E_1 = E_{H_1}(Y_i) = \int_0^{\tau_1} -2\log\left(\frac{u}{\tau_2}\right) D'(u)\, du.$$
Note that
$$P(W_n/n > t) = P\left(\frac{\frac{1}{n}\sum_{i=1}^{n} Y_i - E_0}{\sqrt{V_0/n}} > \frac{t - E_0}{\sqrt{V_0/n}}\right),$$
where $E_0$ and $V_0$ denote the mean and variance of $Y_i$ under $H_0$, respectively:
$$E_0 = E_{H_0}(Y_i) = 2\tau_1\left(1 - \log\tau_1 + \log\tau_2\right), \quad V_0 = \text{Var}_{H_0}(Y_i) = 4\tau_1\left(1 + (1 - \tau_1)(1 - \log\tau_1 + \log\tau_2)^2\right). \quad (19)$$
Considering the statistic under $H_0$, by the CLT and Mill's ratio we have
$$\lim_{n \to \infty} -\frac{2}{n} \log P(W_n/n > t) = \frac{(t - E_0)^2}{V_0}.$$
Thus the BE of $W_n$ is
$$c(\epsilon, \mu; \tau_1, \tau_2) = \frac{(E_1 - E_0)^2}{V_0} = \frac{\Delta^2}{V_0}, \quad \text{where } \Delta = E_1 - E_0. \quad (20)$$
The signal parameters $\epsilon$ and $\mu$ are involved through the $D'(u)$ function in the expression of $E_1$. The formula does not contain information on the variance of the statistic under $H_1$, as stated in Remark 1.

The BE-optimal $\tau_1$ and $\tau_2$ are the ones that maximize $c(\epsilon, \mu; \tau_1, \tau_2)$. Under the general hypotheses in (7), based on the metric $\delta(x)$ for the difference between $H_0$ and $H_1$ defined in (16), Lemma 1 gives a loose condition for the soft-thresholding being "first-order optimal" in the sense that it reaches the stationary point of the maximization. It means that in a very general case of arbitrary $H_1$, the soft-thresholding with $\tau_1 = \tau_2$ may provide a promising choice for the construction of a powerful test.

Lemma 1.
Consider TFisher statistics $W_n(\tau_1, \tau_2)$ in (3) under the general hypotheses in (7). With $\delta(x)$ in (16), if $\tau^*$ is a solution of the equation
$$\int_0^{x} \log(u)\, d\delta(u) = \delta(x)\left(\log(x) - \frac{2 - x}{1 - x}\right), \quad (21)$$
then the soft-thresholding with $\tau_1 = \tau_2 = \tau^*$ satisfies the first-order conditions for maximizing $c(\epsilon, \mu; \tau_1, \tau_2)$ in (20).

Equation (21) can be easily checked, and it is often satisfied in broad cases, e.g., the signal detection problem defined by the Gaussian mixture model in (10). However, before getting the specific maximizers $\tau_1^*$ and $\tau_2^*$ of BE, we study their first-order property in a more general case than the Gaussian: hypotheses based on a general mixture model with arbitrary continuous CDFs $G_0$ and $G_1$:
$$H_0: X_i \overset{i.i.d.}{\sim} G_0 \quad \text{vs.} \quad H_1: X_i \overset{i.i.d.}{\sim} \epsilon G_1 + (1 - \epsilon) G_0, \quad (22)$$
where the proportion $\epsilon \in (0, 1)$ can be considered as the signal proportion. Lemma 2 gives a somewhat surprising result: the maximizers $\tau_1^*$ and $\tau_2^*$ of BE are irrelevant to $\epsilon$.

Lemma 2.
Consider TFisher statistics $W_n(\tau_1, \tau_2)$ in (3) under the hypotheses of the mixture model in (22); the maximizers $\tau_1^*$ and $\tau_2^*$ of $c(\epsilon, \mu; \tau_1, \tau_2)$ do not depend on $\epsilon$.

The result of Lemma 2 becomes not so surprising if we consider the limitation of BE as stated in Remark 1. In particular, the denominator $V_0$ of BE in (20) represents the variation of the test under $H_0$, which is irrelevant to $\epsilon$. BE is related to $H_1$ only through the difference of the means $E_1 - E_0$, which is proportional to $\epsilon$ in the same way no matter what $\tau_1$ and $\tau_2$ are.

For the signal detection problem defined by the Gaussian mixture model in (10), Theorem 1 gives a sufficient condition that guarantees the soft-thresholding will reach a local maximum.

Theorem 1.
Consider the TFisher statistic W_n(τ₁, τ₂) in (3) under the signal detection problem in (10), and follow the same notation as in Lemma 1. The solution τ∗ of equation (21) exists for any ε ∈ (0, 1) and µ > µ₀, a positive constant. Furthermore, if τ∗ also satisfies the condition

δ(τ∗)/δ′(τ∗) = [1 − τ∗ − Φ(Φ⁻¹(1 − τ∗) − µ)] / [exp(µΦ⁻¹(1 − τ∗) − µ²/2) − 1] > −(1 − τ∗)/2,

then the soft-thresholding with τ₁ = τ₂ = τ∗ guarantees a local maximum of c(ε, µ; τ₁, τ₂) in (20). In particular, any τ∗ > Φ̄(µ/2) satisfies the above condition.

Theorem 1 illustrates that for the signal detection problem, if µ is not too small, the optimal τ∗ can be calculated. The theorem does not guarantee that the maximum is unique. However, since we have the closed form of BE in (20), we can always study its properties numerically. Fixing ε = 0.5 and µ = 0.5, 1, and 1.5, Figure 5 gives the numerical values of c(ε, µ; τ₁, τ₂) over a grid of τ₁ ∈ (0, 1) and τ₂ ∈ (0, 3) with step size 0.01. It shows that the local maximum is unique. More numerical studies under various setups of µ and ε also confirm that the maximum is likely unique (results not shown here to save space).

Figure 5: 3D surface of BE c(ε, µ; τ₁, τ₂) over τ₁ and τ₂, with ε = 0.5. Left panel: µ = 0.5, where the maximizer has τ₂∗ = 1.28 > τ₁∗. Middle panel: µ = 1, where τ₁∗ = τ₂∗. Right panel: µ = 1.5, where τ₁∗ = τ₂∗ = 0.05. The corresponding global maximum c∗ is marked in each panel.

Regarding the global maximizers of c(ε, µ; τ₁, τ₂), the left panel of Figure 6 shows the values of the maximizers τ₁∗, τ₂∗ over µ; the right panel shows the global and restricted maxima. A few observations can be made. First, the soft-thresholding with τ₁∗ = τ₂∗ is globally optimal for maximizing BE when µ > 0.78. It indicates that the lower bound of 0.85 for this cutoff given in Theorem 1 is pretty tight. Secondly, when µ is larger than this cutoff, τ₁∗ = τ₂∗ = τ∗ is a decreasing function of the signal strength µ. That is, the stronger the signals, the more beneficial the truncation becomes. When the signals are weaker, i.e., when µ is less than the cutoff, the optimal τ₁∗ and τ₂∗ can differ: τ₁∗ is close to 1, but τ₂∗ can be larger than 1. It means that for weak signals we should not truncate much, but instead should give a heavier weight to the smaller p-values through τ₂. Thirdly, even when the soft-thresholding is not optimal, it still gives a very similar value of c(ε, µ; τ₁, τ₂). That can be seen from the right panel of Figure 6: when µ is small, the various methods have similar c(ε, µ; τ₁, τ₂) values, all close to 0; when µ is large, the optimal soft-thresholding is significantly better than the optimal hard-thresholding (TPM), and both are better than Fisher's method (no truncation), while the difference between the soft-thresholding and the globally optimal method remains practically negligible.

Figure 6: BE-optimality over µ values. Left panel: global maximizers τ₁∗ and τ₂∗ of BE c(ε, µ; τ₁, τ₂) over µ. Right panel: maxima of BE over µ. Optimal: globally maximal BE; Soft: maximal BE under the restriction τ₁ = τ₂; TPM: maximal BE under the restriction τ₂ = 1; Fisher: BE at τ₁ = τ₂ = 1. ε = 0.5.

A limitation of BE is that it connects the signal parameters to statistical power only through the mean difference, which may not be accurate in real data analysis. In particular, BE in (20) is related to H₁ only through the difference of the means E₁ − E₀, but not the variance. However, in reality a given statistic can have significantly different variations under the null and the alternative.
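This null-versus-alternative variance gap is easy to see by simulation. Below is a small Monte Carlo sketch in Python (our own illustration; the signal parameters are arbitrary, and the statistic is written in the common TFisher form W = Σᵢ −2 log(Pᵢ/τ₂)·1{Pᵢ ≤ τ₁}, here with soft-thresholding τ₁ = τ₂ = 0.05):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def tfisher(p, tau1, tau2):
    # TFisher-type statistic: sum of -2*log(p_i/tau2) over the p-values with p_i <= tau1
    keep = p <= tau1
    return np.sum(-2.0 * np.log(p[keep] / tau2))

n, eps, mu, tau, reps = 50, 0.1, 2.0, 0.05, 20000
w0 = np.empty(reps)                    # soft-thresholding statistic under H0
w1 = np.empty(reps)                    # ... under the Gaussian mixture alternative
for r in range(reps):
    x = rng.standard_normal(n)
    w0[r] = tfisher(stats.norm.sf(x), tau, tau)
    x[:rng.binomial(n, eps)] += mu     # plant a Binomial(n, eps) number of signals
    w1[r] = tfisher(stats.norm.sf(x), tau, tau)

print("H0:  mean %5.2f  var %6.2f" % (w0.mean(), w0.var()))
print("H1:  mean %5.2f  var %6.2f" % (w1.mean(), w1.var()))
```

Under these settings the variance under H₁ is several times the null variance — precisely the information that BE discards.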
To address this limitation, we develop a new asymptotic metric, called the Asymptotic Power Efficiency (APE), which takes such variance differences into consideration.

APE is defined through a more direct and accurate asymptotic approximation of the statistical power. Following equation (17), under H₀ the CLT gives

P_{H₀}(W_n > nE₀ + z_α √(nV₀)) → α,

where E₀ and V₀ are defined in (19) and z_α is the (1 − α) quantile of N(0, 1). Thus nE₀ + z_α √(nV₀) serves as the asymptotic critical value for W_n. Accordingly, the asymptotic power is

P_{H₁}(W_n > nE₀ + z_α √(nV₀)) = P( (W_n − nE₁)/√(nV₁) > z_α √(V₀/V₁) − √n (E₁ − E₀)/√V₁ ),

where

V₁ = Var_{H₁}(Y_i) = ∫₀^{τ₁} log²(u) D′(u) du − (∫₀^{τ₁} log(u) D′(u) du)² + 2(D(τ₁) − 1) log(τ₂) ∫₀^{τ₁} log(u) D′(u) du + log²(τ₂) D(τ₁)(1 − D(τ₁)).

The rescaled critical value

a(ε, µ; τ₁, τ₂) = z_α √(V₀/V₁) − √n ∆/√V₁    (23)

is called the APE. Since (W_n − nE₁)/√(nV₁) → N(0, 1), the smaller the a(ε, µ; τ₁, τ₂), the bigger the asymptotic power, and thus the more "efficient" a test is. BE and APE are consistent in the sense that under both, the bigger the mean difference ∆, the more efficient a test is. Meanwhile, APE is more sophisticated, as it accounts for the differences of both the means and the variances under the alternative versus the null.

When n is large, a(ε, µ; τ₁, τ₂) is dominated by the √n term. We therefore define

b(ε, µ; τ₁, τ₂) = ∆/√V₁    (24)

as another measure of the performance of a statistic, called the Asymptotic Power Rate (APR). Note that APR is similar to BE except that the denominator refers to the variance under the alternative H₁. Since APR is more directly related to statistical power than BE, this formula indicates that the variance of a statistic under the alternative hypothesis can be more relevant to its power than its null variance. The next theorem indicates that the soft-thresholding method is a promising candidate for maximizing b(ε, µ; τ₁, τ₂), as long as the signal strength µ is not too small and the signal proportion ε is not too large.

Theorem 2.
Consider the TFisher statistic W_n(τ₁, τ₂) in (3) under the signal detection problem in (10). When µ > µ₀ (with µ₀ as in Theorem 1) and

ε < h_b(µ) = (1 + g̃₁(µ)) / ((g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)),

where g̃_k(µ) = ∫₀¹ log^k(u)(exp(µΦ⁻¹(1 − u) − µ²/2) − 1) du, the soft-thresholding with τ₁ = τ₂ = τ∗, for some τ∗, is a stationary point of b(ε, µ; τ₁, τ₂) in (24).

Theorem 2 bounds the signal proportion ε from above. Moreover, we give a similar theorem concerning the APE, which further allows the number of tests n and the significance level α to play a role in determining the theoretical boundary for the soft-thresholding to be promising.

Theorem 3.
Follow the assumptions and notations in Theorem 2. There exists a lower bound µ′ > 0 such that if µ > µ′ and

ε < h_a(µ) = [(1 − c_n)(1 + g̃₁(µ)) + 2g̃₁(µ) + g̃₂(µ)] / [(1 − c_n)((g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)) + 2g̃₁(µ) + g̃₂(µ)],

where c_n = √n/z_α, then the soft-thresholding with τ₁ = τ₂ = τ∗, for some τ∗, is a stationary point of a(ε, µ; τ₁, τ₂) in (23).

Theorems 2 and 3 show that when ε is not too large and µ is not too small, soft-thresholding is promising. Figure 7 shows that as n becomes larger, the theoretical boundary defined by a(ε, µ; τ₁, τ₂) gets closer to the boundary defined by b(ε, µ; τ₁, τ₂), as expected. Under finite n, the advantage of soft-thresholding is even more prominent, because the curve with n = 50 covers a bigger parameter space than the other two curves.

Figure 7: The boundaries defined by h_b(µ) (Theorem 2, black) and h_a(µ) (Theorem 3, α = 0.05; n = 50 and, in cyan, n = 5000). The soft-thresholding τ₁ = τ₂ = τ∗, for some τ∗, satisfies the first-order condition of maximizing b(ε, µ; τ₁, τ₂) or a(ε, µ; τ₁, τ₂) for all (ε, µ) below the corresponding boundary curve.

We further study numerically the optimal points based on APE. At n = 50 and α = 0.05, the left panels of Figure 8 fix ε (row 1: ε = 0.01; row 2: ε = 0.1) and plot the maximizers τ₁∗, τ₂∗ over µ. The pattern is consistent with that for BE in Figure 6: the soft-thresholding is indeed globally optimal when µ is large enough, and τ∗ is a decreasing function of µ. Moreover, the smaller the ε, the smaller the µ cutoff that guarantees the soft-thresholding to be optimal. When µ is smaller than the cutoff, both τ₁∗ and τ₂∗ can be large, indicating a light truncation and a significance-upscaling weighting of the p-values. The right panels of Figure 8 fix µ (row 1: µ = 1; row 2: µ = 2) and plot the maximizers τ₁∗, τ₂∗ over ε. Consistent with our theorems, the soft-thresholding is indeed globally optimal when ε is not too large (i.e., sparse signals). Such optimal τ∗ is proportional to the signal proportion ε. The τ∗/ε ratio is a decreasing function of µ and can be larger or smaller than 1. Thus, the best cutoff τ∗ is not a "natural" value 0.05 as suggested in the literature (Zaykin et al., 2002); neither is it simply the signal proportion ε. Instead, there is a functional relationship between τ∗ and the signal pattern defined by ε and µ, as given here.

Figure 8: The global maximizer (τ₁∗, τ₂∗) of a(ε, µ; τ₁, τ₂) when n = 50, α = 0.05. From top to bottom, left column: ε = 0.01 or 0.1; right column: µ = 1 or 2.

When ε is big or µ is small, the soft-thresholding may not be optimal based on APE. However, when that happens, the practically meaningful difference is likely small, because those regions correspond to the true statistical power being close to 0 or 1. Figure 9 compares the statistical power of the globally optimal TFisher (with maximizers (τ₁∗, τ₂∗) of APE) and of the optimal soft-thresholding (under the restriction τ₁ = τ₂ = τ∗). The two power curves match almost perfectly, even in regions where the soft-thresholding may not be globally optimal in theory. Here the optimization is done by a fine grid search over τ₁ and τ₂, and the statistical power is calculated by the method provided in Section 4. The result suggests that we may almost always focus on the soft-thresholding within the TFisher family.

Figure 9: Power comparison between the globally optimal TFisher statistic (at the global maximizers (τ₁∗, τ₂∗) of APE) and the optimal soft-thresholding TFisher (at the restricted maximizers τ₁ = τ₂ = τ∗ of APE). The number of tests n = 50, the type I error α = 0.
1. Right: µ = 2.

6. STATISTICAL POWER COMPARISON FOR SIGNAL DETECTION

In this section, we focus on statistical power for the signal detection problem in (10). First, we show that our analytical power calculation is accurate compared with simulations. Then, we compare the statistical power of different methods and demonstrate their relative performance.

The statistical power calculation combines the calculations for the null distribution (for controlling the type I error), given in Section 3, and for the alternative distribution, given in Section 4. Here we evidence the accuracy of these calculation methods by comparing the statistical power obtained by calculation with that obtained by simulation. Figure 10 shows that, even for relatively small n, the statistical power calculation is accurate under various model parameter setups.

Figure 10: Statistical power by calculation versus simulation for signal detection, under three settings of (τ₁, τ₂). Type I error rate α = 0.05. Simu: by simulation; Calc SN: by calculation.

Next, we compare various methods in the TFisher family: the optimal TFisher with the global maximizers τ₁∗, τ₂∗ of APE in (23), soft-thresholding with fixed τ₁ = τ₂ = 0.05, the soft-thresholding omnibus test oTFisher with τ₁ = τ₂ adaptive over a four-point grid in (0, 1], Fisher's method with τ₁ = τ₂ = 1, and TPM with τ₁ = 0.05 and τ₂ = 1. Figure 11 illustrates the power over the signal strength µ at various numbers n of input p-values (by row) and expected numbers nε of signals (by column). Figure 12 illustrates the power over the signal proportion ε at various n (by row) and signal strengths µ (by column). Several observations can be made from these two figures. First, unsurprisingly, the optimal TFisher is always the best over all settings. Actually, in most of these cases the optimal TFisher corresponds to the soft-thresholding with τ₁∗ = τ₂∗, and when the two maximizers are not equal, the power difference is almost always negligible (see Figures 8 and 9). Secondly, the soft-thresholding oTFisher is a relatively robust method over various signal patterns. It is often close to the best, and never the worst. In fact, its power is often close to the power of the statistic with the parameters it adaptively chooses. For example, if oTFisher chooses τ₁ = τ₂ = 0.
05, it gives a similar but slightly lower power than TFisher with the same fixed parameters. The slight loss of power is possibly due to the variation of the adaptive choice. Thirdly, the soft-thresholding TFisher with fixed τ₁ = τ₂ = 0.05 has a clear advantage when the signal proportion ε is small. It also has a clear disadvantage when the signals are dense. The original Fisher's method, which is also a special case of soft-thresholding, shows the opposite pattern. Meanwhile, the relative advantage of Soft-0.05 versus Fisher is also related to the signal strength µ. Consistent with the theoretical study of both BE and APE, the larger the µ, the smaller the optimal τ∗ should be. This phenomenon is evidenced by panel 3-3 of Figure 11 and panel 1-3 of Figure 12: when ε is relatively big, say around 0.1 and 0.2, Soft-0.05 can still be better than Fisher at large µ. Lastly, the hard-thresholding TPM-0.05 is mostly not among the best. In particular, it has a clear disadvantage relative to Soft-0.05 for detecting sparse signals.

Finally, we compare the power of three omnibus tests: oTFisher with soft-thresholding, the adaptive TPM (ATPM, hard-thresholding), and the adaptive RTP (ARTP). ARTP was shown to have the highest power among a group of adaptive set-based methods for genetic association testing (Su et al., 2016; Yu et al., 2009). Supplementary Figures S1 and S2 in the Supplementary Materials illustrate the power of the optimal TFisher and the three omnibus tests under the same settings as Figures 11 and 12, respectively. The key result is that oTFisher dominates both ATPM and ARTP across all settings of signal patterns. ARTP can be better than ATPM for sparser and stronger signals, but the opposite is true for denser and weaker signals.

In summary, the pattern of the power comparison reflects the theoretical study in Section 5 well. The soft-thresholding restriction τ₁ = τ₂ = τ is the right strategy for reaching the optimal statistic in most cases.
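To make this summary concrete, here is a rough Monte Carlo power comparison in Python (our own sketch: the (ε, µ) settings, replication counts, and empirical calibration of critical values are illustrative choices, not the paper's analytical calculation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def w_stat(p, tau1, tau2):
    # TFisher-type statistic: sum of -2*log(p_i/tau2) over the p-values with p_i <= tau1
    keep = p <= tau1
    return np.sum(-2.0 * np.log(p[keep] / tau2))

def mc_power(n, eps, mu, tau1, tau2, reps=4000, alpha=0.05):
    # empirical critical value under the null, then rejection rate under H1
    null = np.array([w_stat(stats.norm.sf(rng.standard_normal(n)), tau1, tau2)
                     for _ in range(reps)])
    crit = np.quantile(null, 1.0 - alpha)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        x[:rng.binomial(n, eps)] += mu
        hits += w_stat(stats.norm.sf(x), tau1, tau2) > crit
    return hits / reps

n = 50
pw = {}
for name, (eps, mu) in {"sparse": (0.05, 3.0), "dense": (0.4, 0.6)}.items():
    pw[name] = {"soft05": mc_power(n, eps, mu, 0.05, 0.05),   # tau1 = tau2 = 0.05
                "fisher": mc_power(n, eps, mu, 1.0, 1.0)}     # no truncation
    print(name, pw[name])
```

With these illustrative settings, Soft-0.05 clearly outperforms Fisher in the sparse strong regime, while in the dense weak regime the gap narrows or reverses, consistent with Figures 11 and 12.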
The optimal τ∗ is related to the signal pattern defined by both parameters ε and µ. If we know the signal pattern, e.g., a small ε (especially when µ is big at the same time), then we should choose a small τ. However, if no such prior information is available in a study, then the soft-thresholding oTFisher with a grid of τ over small, moderate, and large values in (0, 1) will likely be a robust solution.

Figure 11: The power comparison over signal strength µ. Type I error rate α = 0.
05. Soft 0.05: soft-thresholding at τ₁ = τ₂ = 0.05; TPM 0.05: hard-thresholding at τ₁ = 0.05, τ₂ = 1; Fisher: Fisher's combination at τ₁ = τ₂ = 1; Optimal: optimal TFisher at the maximizers τ₁∗, τ₂∗ of APE; Omnibus: soft-thresholding oTFisher with τ₁ = τ₂ adaptive over a four-point grid in (0, 1].

Figure 12: The power comparison over signal proportion ε. Type I error rate α = 0.05. Soft 0.05: soft-thresholding at τ₁ = τ₂ = 0.05; TPM 0.05: hard-thresholding at τ₁ = 0.05, τ₂ = 1; Fisher: Fisher's combination at τ₁ = τ₂ = 1; Optimal: optimal TFisher at the maximizers τ₁∗, τ₂∗ of APE; Omnibus: soft-thresholding oTFisher with τ₁ = τ₂ adaptive over a four-point grid in (0, 1].

7. ALS EXOME-SEQ DATA ANALYSIS

p-value combination methods have been widely used in genetic association studies, but most of them are based on hard-thresholding, including the TPM and RTP methods (Biernacka et al., 2012; Dai et al., 2014; Dudbridge and Koeleman, 2003; Hoh et al., 2001; Li and Tseng, 2011; Su et al., 2016; Yu et al., 2009). In this section we apply and assess the soft-thresholding TFisher by analyzing a whole exome sequencing data set of amyotrophic lateral sclerosis (ALS). ALS is a neurodegenerative disorder resulting from motor neuron death. It is the most common motor neuron disease in adults (Motor Neuron Diseases Fact Sheet, NINDS). ALS is a brutal disease that causes patients to lose muscle strength and coordination, even for breathing and swallowing, while leaving their sense of pain unaffected. ALS is uniformly fatal, usually within five years. Genetics plays a critical role in ALS; its heritability is estimated at about 61% (Smith et al., 2014). The identification of ALS genes is foundational to elucidating disease pathogenesis, developing disease models, and designing targeted therapeutics. Despite numerous advances in ALS gene detection, known genes can explain only a small proportion (about 10%) of cases (Cirulli et al., 2015).

Exome sequencing data are obtained by next-generation sequencing technology that sequences all protein-coding genes in a genome, i.e., the exome. This approach identifies genetic variants that alter protein sequences and may thereby affect diseases. It provides a good balance between sequencing depth and cost compared with whole-genome sequencing.
Our data come from the ALS Sequencing Consortium, and the data cleaning and single nucleotide variant (SNV) filtering process follows the same steps as the original study (Smith et al., 2014). Specifically, we focused on SNVs that occur at highly conserved positions (with a positive GERP score (Davydov et al., 2010)) or that represent stop-gain or stop-loss mutations (Liu et al., 2016). SNVs with low genotyping quality (e.g., a high missing rate) were excluded. For each gene, the input p-values from the group of SNVs within that gene generate a TFisher statistic, and the summary p-value of this statistic is then obtained to measure how significantly the gene is associated. Here we apply a logistic regression model to obtain the input SNV p-values, which allows adjusting for other covariates such as non-genetic factors. Specifically, let Y_k be the binary indicator of case (Y_k = 1) or control (Y_k = 0) status for the kth individual, k = 1, ..., N. Let G_k = (G_{k1}, ..., G_{kn})′ denote the genotype vector of the n SNVs in the given gene, and let Z_k = (1, Z_{k1}, Z_{k2})′ be the vector of the intercept and the covariates of gender and country of origin. The logistic regression model is

logit(E(Y_k | G_k, Z_k)) = G_k′ β + Z_k′ γ,

where β and γ are the coefficient vectors. The null hypothesis is that none of the SNVs in the gene are associated, and thus the gene is not associated:

H₀: β_i = 0, i = 1, ..., n.

To test this null hypothesis, we adopt a classic marginal score statistic (McCullagh and Nelder, 1989; Schaid et al., 2002)

U_i = Σ_{k=1}^N G_{ki}(Y_k − Ỹ_k), i = 1, ..., n,

where Ỹ_k is the fitted probability of being a case under H₀. It can be shown that under H₀ the vector of statistics U = (U_1, ..., U_n)′ →_D N(0, Σ) as N → ∞, where Σ can be estimated by

Σ̂ = G′WG − G′WZ(Z′WZ)⁻¹Z′WG,

where G = (G_{ki}) and Z = (Z_{ki}) are the corresponding design matrices and W = diag(Ỹ_k(1 − Ỹ_k)) is a diagonal matrix.
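The gene-level input statistics described above can be sketched as follows (a Python illustration of our own; the function name and the Newton–Raphson fitting details are not from the paper):

```python
import numpy as np

def marginal_score_stats(y, G, Z):
    """Marginal score statistics U_i = sum_k G_ki (y_k - yhat_k) and their
    estimated null covariance Sigma = G'WG - G'WZ (Z'WZ)^{-1} Z'WG,
    where yhat is fitted from the logistic null model with covariates Z only."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(50):                       # Newton-Raphson fit of the null model
        mu = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve((Z.T * w) @ Z, Z.T @ (y - mu))
        gamma += step
        if np.max(np.abs(step)) < 1e-10:
            break
    yhat = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
    W = yhat * (1.0 - yhat)                   # diagonal of the weight matrix
    U = G.T @ (y - yhat)                      # one score statistic per SNV
    Sigma = (G.T * W) @ G - ((G.T * W) @ Z) @ np.linalg.solve((Z.T * W) @ Z, (Z.T * W) @ G)
    return U, Sigma
```

The de-correlation X = Σ̂^{−1/2} U then yields approximately independent standard normal inputs, whose two-sided p-values feed the TFisher statistic.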
After de-correlation we obtain the input statistics X = Σ̂^{−1/2} U →_D N(0, I_{n×n}), and the input p-values 2P(N(0, 1) > |X_i|), which are asymptotically i.i.d. Uniform[0, 1] under H₀. Thus the p-value calculation methods given in Section 3 can be applied to any TFisher or oTFisher statistic.

The left panel of Figure 13 gives the Q-Q plot of the gene-level p-values of the TFisher statistic at τ₁ = τ₂ = 0.05. Because of the truncation, it is natural that some genes have p-values equal to 1 (indicated by the flat part of the dots). This happens when a gene contains only a few SNVs and their marginal p-values, the inputs of its TFisher statistic, are all large, say larger than 0.05 here. Such genes are likely not associated anyway, so the truncation does not prevent the type I error rate from being well controlled at the gene level. The majority of the p-values still lie along the diagonal, as expected. The right panel of Figure 13 provides the Q-Q plot of the gene-level p-values of the oTFisher test, in which the parameters τ₁ = τ₂ adapt over a three-point grid. The top-ranked genes by both methods are consistent, which is reasonable because the signals, i.e., the ALS-associated SNVs, are expected to be a small proportion of all the SNVs.

Figure 13: Q-Q plots of p-values based on soft-thresholding tests. Left: τ₁ = τ₂ = 0.
05. Right: omnibus with τ₁ = τ₂ adaptive over a three-point grid.

To the best of our knowledge, most of these top-ranked genes have not been directly reported in genetic association studies of ALS, even though they are plausibly related to ALS from the functionality perspective, as discussed below. This result indicates that the TFisher tests could contribute extra power over existing methods for the discovery of novel disease genes. Certainly, the result is based on very limited data; further statistical and biological validation is needed to clarify the genetic mechanisms of these genes in ALS.

The biological relevance of the top-ranked genes is briefly discussed here. Gene SMAP1 (a group of 8 SNVs) is among significant clusters of altered genes in the frontal cortex of ALS samples (Andrés-Benito et al., 2017). The STRING protein-protein network (Szklarczyk et al., 2014) shows that it has a strong connection with LRRK2, a gene associated with late-onset Parkinson's disease (PD), a neurodegenerative disease closely related to ALS (Bonifati, 2006). Gene SLC22A24 (12 SNVs) has a reported statistical association with Alzheimer's disease, another neurodegenerative disease closely related to ALS (Ayers et al., 2016). Furthermore, the STRING network shows that SLC22A24 has strong connections with two ALS-related genes: AMACR and C7orf10. AMACR is the gene of AMACR deficiency, a neurological disorder similar to ALS; both initiate and slowly worsen in later adulthood. C7orf10 is associated with ALS types 3 and 4 (Fanning et al., 2012). Gene OSMR (8 SNVs) has been found to be critically involved in neuronal function regulation and protection (Guo et al., 2015). It is also associated with the IL31RA functional receptor, which is a critical neuroimmune link between TH2 cells and sensory nerves (Cevikbas et al., 2014). Gene TBX6 (8 SNVs) is involved in the regulation of neural development and maturation (Chapman and Papaioannou, 1998); moreover, in a novel stem cell therapy of ALS, TBX6 and its associated SOX2 play a critical role (S Pandya et al., 2012). Gene VAX2 (7 SNVs) plays a functional role in specifying the dorsoventral forebrain. It has a direct protein-protein interaction with the ALS gene CHMP2B (Cox et al., 2010), as well as a direct STRING connection with SIX3. Gene GFRA1 (4 SNVs) encodes a member of the glial cell line-derived neurotrophic factor receptor (GDNFR) family. It has direct STRING connections with two ALS-related genes: RAP1A, which is associated with ALS through influencing the activation of Nox2, a modifier of survival in ALS (Carter et al., 2009), and PIK3CA, an up-regulated gene in the ALS mouse model (de Oliveira et al., 2014).

8. DISCUSSION

We proposed and studied a family of Fisher-type p-value combination tests, TFisher, with a general weighting and truncation scheme, of which many existing methods are special cases. For the signal detection problem, we studied the optimal TFisher statistics that maximize the BE and the APE. As a result, we showed that soft-thresholding is nearly the best choice, better than the TPM and RTP methods used in a rich literature of applied statistics.

From the theoretical perspective, the studies of BE and APE revealed the rules for best weighting and truncating the input p-values in order to best reveal true signals. Our results validated a general principle: when the signals are sparse and strong, more relative weight should be given to the smallest p-values; when the signals are dense and weak, a more "flat" weighting scheme is appropriate. Meanwhile, the original magnitudes of the p-values often need to be downscaled by the parameter τ₂ ∈ (0, 1), with the best amount of downscaling depending on the H₀/H₁-defining parameters. Based on this idea, the statistic family could be further generalized, and powerful methods could be obtained for specific testing problems in the future.

From the practical perspective, this paper provided analytical calculations of both the p-value and the statistical power for a broad family of TFisher statistics under general hypotheses. Data-adaptive omnibus tests can also be applied to real data with unknown signal patterns. A data analysis pipeline for genetic association studies was illustrated, and a list of putative ALS genes was identified and discussed.

9. SUPPLEMENTARY MATERIALS

The supplementary materials provide detailed proofs of all lemmas and theorems, as well as supplementary figures for the statistical power comparisons among the data-adaptive methods oTFisher, ATPM, and ARTP.

9.1 Proofs for Lemmas and Theorems

Proof of Lemma 1
The first-order partial derivative of c(θ; τ₁, τ₂) = ∆²/V₀ with respect to τ₁ satisfies

∂c(θ; τ₁, τ₂)/∂τ₁ ∝ 2V₀ ∂∆/∂τ₁ − ∆ ∂V₀/∂τ₁,

where

∂∆/∂τ₁ = log(τ₂/τ₁)(D′(τ₁) − 1),
∂V₀/∂τ₁ = log²(τ₁/τ₂) − 2τ₁ log(τ₁/τ₂)(log(τ₁/τ₂) − 1).

At τ₁ = τ₂, both partial derivatives equal 0, so the first-order condition in τ₁ holds automatically there.

We further examine the partial derivatives with respect to τ₂ and evaluate them at τ₁ = τ₂ = τ:

∂∆/∂τ₂ |_{τ₁=τ₂} = (D(τ) − τ)/τ;  ∂V₀/∂τ₂ |_{τ₁=τ₂} = 2(1 − τ);
∆ |_{τ₁=τ₂} = ∫₀^τ −log(u)(D′(u) − 1) du + log(τ)(D(τ) − τ);  V₀ |_{τ₁=τ₂} = τ(2 − τ).

Thus

∂c(θ; τ₁, τ₂)/∂τ₂ |_{τ₁=τ₂} = 0
⇔ (D(τ) − τ)(2 − τ) − (1 − τ) ∫₀^τ −log(u)(D′(u) − 1) du − (D(τ) − τ)(1 − τ) log(τ) = 0
⇔ ∫₀^τ −log(u)(D′(u) − 1) du = (D(τ) − τ)((2 − τ)/(1 − τ) − log(τ)).

Proof of Lemma 2
The Bahadur efficiency is c(θ; τ₁, τ₂) = (E₁ − E₀)²/V₀ = ∆²/V₀, where V₀ is irrelevant to H₁ and thus to ε. On the other hand, ∆ = ∫₀^{τ₁} −log(u/τ₂)(D′(u) − 1) du. We can show that ∆ = ε g(τ₁, τ₂, µ), which is equivalent to showing that D′(u) − 1 is proportional to ε.

By (15), D(x) = 1 − F₁(F₀⁻¹(1 − x)), where F₀(x) = G₀(x) and F₁(x) = (1 − ε)G₀(x) + εG₁(x; µ). We can further write

D(x) = 1 − (1 − ε)G₀(G₀⁻¹(1 − x)) − εG₁(G₀⁻¹(1 − x); µ)
     = 1 − (1 − ε)(1 − x) − εG₁(G₀⁻¹(1 − x); µ)
     = x + ε − εx − εG₁(G₀⁻¹(1 − x); µ).

Therefore

D(x) − x = ε(1 − x − G₁(G₀⁻¹(1 − x); µ)),
D′(x) − 1 = ε(G₁′(G₀⁻¹(1 − x); µ)/G₀′(G₀⁻¹(1 − x)) − 1).

This completes the proof.

Proof of Theorem 1
By Lemma 1, the first-order conditions for maximizing c(θ; τ₁, τ₂) in (20) reduce to equation (21). Note that for the Gaussian mixture model in (10),

D′(x) − 1 = ε(exp(µΦ⁻¹(1 − x) − µ²/2) − 1)  and  D(x) − x = ε(1 − x − Φ(Φ⁻¹(1 − x) − µ)).

Therefore the optimal τ₁ = τ₂ = τ∗ does not depend on ε. Let

f(τ) = (D(τ) − τ)((2 − τ)/(1 − τ) − log(τ)) + ∫₀^τ log(u)(D′(u) − 1) du,

so that a root of equation (21) is a root of f. Note that f(0) = 0 and f′(0) > 0. A sufficient condition for the existence of the root τ∗ is f(1) < 0, where f(1) = 1 − D′(1) + ∫₀¹ log(u)(D′(u) − 1) du. This is equivalent to

ε + ε ∫₀¹ log(u)(exp(µΦ⁻¹(1 − u) − µ²/2) − 1) du < 0,

i.e.,

exp(−µ²/2) ∫₀¹ −log(u) exp(µΦ⁻¹(1 − u)) du > 2 ⇔ µ > µ₀.

Next we examine the second-order derivatives. In generic form, ∂²c(θ; τ₁, τ₂)/∂τᵢ∂τⱼ, i, j ∈ {1, 2}, expand in terms of the first- and second-order derivatives of ∆ and V₀. Again ∂∆/∂τ₁ |_{τ₁=τ₂} = 0 and ∂V₀/∂τ₁ |_{τ₁=τ₂} = 0, which simplifies the expansions. The relevant terms evaluated at τ₁ = τ₂ = τ∗ include

∂²V₀/∂τ₁² = 2;  ∂²V₀/∂τ₁∂τ₂ = −2;
∂²∆/∂τ₁² = −(D′(τ∗) − 1)/τ∗;  ∂²∆/∂τ₂² = −(D(τ∗) − τ∗)/τ∗²;  ∂²∆/∂τ₁∂τ₂ = (D′(τ∗) − 1)/τ∗;
∆ = (D(τ∗) − τ∗)(2 − τ∗)/(1 − τ∗);  V₀ = τ∗(2 − τ∗).

The condition ∂²c(θ; τ₁, τ₂)/∂τ₂² |_{τ₁=τ₂=τ∗} < 0 is equivalent to D(1) < D(τ∗) + (1 − τ∗)D′(τ∗), which is always true because D(x) is a concave function. Finally, after some algebra, the determinant condition

[(∂²c/∂τ₁²)(∂²c/∂τ₂²) − (∂²c/∂τ₁∂τ₂)²] |_{τ₁=τ₂=τ∗} > 0

is equivalent to (D(τ∗) − τ∗)/(D′(τ∗) − 1) > −(1 − τ∗)/2. One sufficient condition for this inequality is exp(µΦ⁻¹(1 − τ∗) − µ²/2) < 1, i.e., τ∗ > 1 − Φ(µ/2) = Φ̄(µ/2).

Proof of Theorem 2
Taking the partial derivative of b(θ; τ₁, τ₂) = ∆/√V₁ with respect to τ₁, we have

∂b(θ; τ₁, τ₂)/∂τ₁ ∝ 2V₁ ∂∆/∂τ₁ − ∆ ∂V₁/∂τ₁,

where

∂V₁/∂τ₁ = log²(τ₁) D′(τ₁) − 2 log(τ₁) D′(τ₁) ∫₀^{τ₁} log(u) D′(u) du + 2 D′(τ₁) log(τ₂) ∫₀^{τ₁} log(u) D′(u) du + 2(D(τ₁) − 1) log(τ₂) log(τ₁) D′(τ₁) + log²(τ₂)(D′(τ₁) − 2D(τ₁)D′(τ₁)).

Therefore

∂V₁/∂τ₁ |_{τ₁=τ₂} = log²(τ) D′(τ)[1 + 2(D(τ) − 1) + 1 − 2D(τ)] = 0.

Together with ∂∆/∂τ₁ |_{τ₁=τ₂} = 0, as shown in the proof of Lemma 1, we have

∂b(θ; τ₁, τ₂)/∂τ₁ |_{τ₁=τ₂} = 0.

The choice τ₁ = τ₂ = τ∗ thus meets the first-order conditions for optimality if ∂b(θ; τ₁, τ₂)/∂τ₂ |_{τ₁=τ₂=τ∗} = 0 has a solution τ∗. This is equivalent to solving

(2V₁ ∂∆/∂τ₂ − ∆ ∂V₁/∂τ₂) |_{τ₁=τ₂=τ∗} = 0,

where

∆ |_{τ₁=τ₂} = −∫₀^τ log(u) D′(u) du + D(τ) log(τ) − τ;  ∂∆/∂τ₂ |_{τ₁=τ₂} = (D(τ) − τ)/τ;
V₁ |_{τ₁=τ₂} = ∫₀^τ log²(u) D′(u) du − (∫₀^τ log(u) D′(u) du)² + 2(D(τ) − 1) log(τ) ∫₀^τ log(u) D′(u) du + log²(τ) D(τ)(1 − D(τ));
∂V₁/∂τ₂ |_{τ₁=τ₂} = (2(D(τ) − 1)/τ)(∫₀^τ log(u) D′(u) du − D(τ) log(τ)).

Plugging these in and simplifying, we need to solve f_b(τ) = 0, where f_b is an explicit function of τ built from g₁(τ), g₂(τ), D(τ), and log(τ), with g_k(τ) = g_k(τ; ε, µ) = ∫₀^τ log^k(u) D′(u) du. It is easy to check that f_b(0) = 0 and f_b′(0) < 0. A sufficient condition for the existence of a root is that f_b(1) > 0, where

f_b(1) = (g₁(1))² + D′(1) g₁(1) − (1 − D′(1)) g₂(1).

Noticing that g₁(1) = ε g̃₁(µ) − 1 and g₂(1) = ε g̃₂(µ) + 2, this condition is equivalent to

ε[(g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)] > 1 + g̃₁(µ).

Since (g̃₁(µ))² − g̃₂(µ) − g̃₁(µ) < 0, the right-hand side 1 + g̃₁(µ) needs to be negative as well, and the sufficient conditions for the existence of a root become

µ > µ₀ and ε < (1 + g̃₁(µ)) / ((g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)),

where µ₀ is the same constant as given in Theorem 1.

Proof of Theorem 3
Taking the partial derivative of a(θ; τ₁, τ₂) = (z_α √V₀ − √n ∆)/√V₁ with respect to τ₁, we have

∂a(θ; τ₁, τ₂)/∂τ₁ ∝ (V₁ ∂V₀/∂τ₁ − V₀ ∂V₁/∂τ₁) − (√(nV₀)/z_α)(2V₁ ∂∆/∂τ₁ − ∆ ∂V₁/∂τ₁).

Since ∂V₀/∂τ₁ |_{τ₁=τ₂} = ∂V₁/∂τ₁ |_{τ₁=τ₂} = ∂∆/∂τ₁ |_{τ₁=τ₂} = 0, it follows that

∂a(θ; τ₁, τ₂)/∂τ₁ |_{τ₁=τ₂} = 0.

The choice τ₁ = τ₂ = τ∗ thus meets the first-order conditions for optimality if ∂a(θ; τ₁, τ₂)/∂τ₂ |_{τ₁=τ₂=τ∗} = 0 has a solution τ∗. This is equivalent to solving

[(V₁ ∂V₀/∂τ₂ − V₀ ∂V₁/∂τ₂) − (√(nV₀)/z_α)(2V₁ ∂∆/∂τ₂ − ∆ ∂V₁/∂τ₂)] |_{τ₁=τ₂=τ∗} = 0,

where V₀ |_{τ₁=τ₂} = τ(2 − τ), ∂V₀/∂τ₂ |_{τ₁=τ₂} = 2(1 − τ), and the rest of the terms are given in the proof of Theorem 2. The equation simplifies to f_c(τ) = 0, where f_c is an explicit function of τ built from g₁(τ), g₂(τ), D(τ), log(τ), and c_τ = c_n √(τ(2 − τ)). Here f_c(0) = 0 and f_c′(0) > 0. The condition f_c(1) < 0 is equivalent to

(1 − c_n) ε[(g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)] − (1 − c_n)(1 + g̃₁(µ)) + ε(2g̃₁(µ) + g̃₂(µ)) − (2g̃₁(µ) + g̃₂(µ)) < 0.

For c_n large enough, (1 − c_n)[(g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)] + 2g̃₁(µ) + g̃₂(µ) > 0. Therefore the sufficient conditions for the existence of the stationary point τ∗ are

µ > µ′, where µ′ satisfies (1 − c_n)[1 + g̃₁(µ′)] + 2g̃₁(µ′) + g̃₂(µ′) = 0, and
ε < [(1 − c_n)(1 + g̃₁(µ)) + 2g̃₁(µ) + g̃₂(µ)] / [(1 − c_n)((g̃₁(µ))² − g̃₂(µ) − g̃₁(µ)) + 2g̃₁(µ) + g̃₂(µ)] = h_a(µ).

Figure S1: Power comparison between the optimal and adaptive tests over signal strength µ. Type I error rate α = 0.
05. Optimal: optimal TFisher at maximizers τ ∗ , τ ∗ of APE; ARTP:adaptive RTP with adaptive K ∈ { , . n, . n, n } ; oTFisher: soft-thresholding omnibus TFisherwith adaptive τ ∈ { . , . , . , } ; ATPM: adaptive TPM (hard-thresholding) with adaptive τ ∈ { . , . , . , } . 39igure S2: Power comparison between the optimal and adaptive tests over signal proportion (cid:15) .Type I error rate α = 0 .
05. Optimal: optimal TFisher at maximizers τ ∗ , τ ∗ of APE; ARTP:adaptive RTP with adaptive K ∈ { , . n, . n, n } ; oTFisher: soft-thresholding omnibus TFisherwith adaptive τ ∈ { . , . , . , } ; ATPM: adaptive TPM (hard-thresholding) with adaptive τ ∈ { . , . , . , } . 40EFERENCESFelix Abramovich, Yoav Benjamini, David L Donoho, and Iain M Johnstone. Adapting to unknownsparsity by controlling the false discovery rate. The Annals of Statistics , 34(2):584–653, 2006.Walid A Abu-Dayyeh, Marwan A Al-Momani, and Hassen A Muttlak. Exact bahadur slope forcombining independent tests for normal and logistic distributions.
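The closed-form null moments used in the proof above can be checked by simulation. The sketch below is a minimal standard-library illustration, not the code of the TFisher package: it draws null p-values, forms the soft-thresholding terms Yᵢ = −2 log(pᵢ/τ) 1{pᵢ ≤ τ}, and compares the empirical moments with the closed forms E[Y] = 2τ and Var[Y] = 4τ(2 − τ); dividing the variance by 4 recovers τ(2 − τ), assuming the proof's V₀ is the per-term variance standardized by that factor.

```python
import math
import random

def soft_term(p, tau):
    """Soft-thresholding TFisher term: -2*log(p/tau) if p <= tau, else 0."""
    return -2.0 * math.log(p / tau) if p <= tau else 0.0

def null_moments_mc(tau, n=200_000, seed=1):
    """Monte Carlo mean and variance of one term under H0 (p ~ Uniform(0,1))."""
    rng = random.Random(seed)
    ys = [soft_term(rng.random(), tau) for _ in range(n)]
    m = sum(ys) / n
    v = sum((y - m) ** 2 for y in ys) / (n - 1)
    return m, v

tau = 0.3
m, v = null_moments_mc(tau)
# Closed forms under H0: E[Y] = 2*tau, Var[Y] = 4*tau*(2 - tau),
# so Var[Y]/4 = tau*(2 - tau), matching V0 at tau2 = tau1 = tau.
print(m, 2 * tau)               # empirical mean vs. 2*tau = 0.6
print(v, 4 * tau * (2 - tau))   # empirical variance vs. 4*tau*(2 - tau) = 2.04
```

The closed forms follow from E[Y] = ∫₀^τ −2 log(p/τ) dp = 2τ and E[Y²] = ∫₀^τ 4 log²(p/τ) dp = 8τ, so Var[Y] = 8τ − 4τ² = 4τ(2 − τ).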
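The adaptive tests compared in the figure captions choose their tuning parameter from the data. As a rough sketch of how such an omnibus test can be calibrated (an illustration with an assumed τ grid and plain Monte Carlo, not the algorithm implemented in the CRAN TFisher package), one can compute the soft-thresholding statistic over a grid of τ values, convert each to a marginal p-value against simulated null statistics, take the minimum marginal p-value as the adaptive statistic, and calibrate that minimum against its own null distribution.

```python
import math
import random

def soft_stat(pvals, tau):
    """Soft-thresholding TFisher statistic with tau1 = tau2 = tau."""
    return sum(-2.0 * math.log(p / tau) for p in pvals if p <= tau)

def otfisher_pvalue(pvals, taus=(0.01, 0.05, 0.5, 1.0), B=500, seed=7):
    """Monte Carlo omnibus p-value: smallest marginal p-value over the tau
    grid, calibrated against the null distribution of that minimum."""
    rng = random.Random(seed)
    n = len(pvals)
    # Null replicates: all taus evaluated on the SAME null sample, so the
    # dependence across the grid is preserved.
    null_stats = []
    for _ in range(B):
        u = [rng.random() for _ in range(n)]
        null_stats.append([soft_stat(u, t) for t in taus])

    def min_marginal_p(stats):
        # Empirical marginal p-value for each tau, then take the minimum.
        return min((1 + sum(ns[j] >= stats[j] for ns in null_stats)) / (B + 1)
                   for j in range(len(taus)))

    obs = min_marginal_p([soft_stat(pvals, t) for t in taus])
    null_min = [min_marginal_p(ns) for ns in null_stats]
    return (1 + sum(m <= obs for m in null_min)) / (B + 1)

# Illustrative use: null p-values vs. p-values containing a few strong signals.
random.seed(0)
null_p = otfisher_pvalue([random.random() for _ in range(20)])
sig_p = otfisher_pvalue([1e-8] * 5 + [random.random() for _ in range(15)])
print(null_p, sig_p)
```

The two-layer Monte Carlo is the price of adaptivity: the minimum of correlated marginal p-values is not uniform under the null, so it must be recalibrated against its own simulated null distribution. The τ grid and B here are illustrative choices.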