Estimating The Proportion of Signal Variables Under Arbitrary Covariance Dependence
X. Jessie Jeng∗
Department of Statistics, North Carolina State University
August 27, 2020
Abstract
Accurately estimating the proportion of signals hidden in a large number of noise variables is of interest in many scientific inquiries. In this paper, we consider realistic but theoretically challenging settings with arbitrary covariance dependence between variables. We define the mean absolute correlation (MAC) to measure the overall dependence strength and investigate the performance of a family of estimators over the full range of MAC. We explicate the joint effect of MAC and signal sparsity on the performance of the family of estimators and discover that the most powerful estimator under independence is no longer the most effective when the MAC dependence is strong enough. Motivated by this theoretical insight, we propose a new estimator to better adapt to arbitrary covariance dependence. The proposed method compares favorably to several existing methods in extensive finite-sample settings with strong to weak covariance dependence and with real dependence structures from genetic association studies.
Keywords: Dependence adaptivity; High-dimensional data; Lower bound estimator; Sparse signal

∗ Address for correspondence: Department of Statistics, North Carolina State University, 2311 Stinson Dr., Raleigh, NC 27695-8203, USA. Email: [email protected]

1 Introduction
We consider the problem of estimating the proportion of information-bearing signals that are sparsely located among a large number of noise variables. This problem is of great interest in many scientific inquiries. For example, many multiple testing methods need to implement estimates of the signal proportion to calculate the local false discovery rate (Efron, 2007), to derive the q-value (Storey, 2003), and to further improve the power of existing multiple comparison approaches (Storey, 2002; Finner and Gontscharuk, 2009). Moreover, in many multi-stage signal discovery studies, a good estimate of the signal proportion can assist efficient pre-screening, signal identification, and sample size calculation (Cai and Sun, 2017). A recent line of research, which focuses on retaining a high proportion of signals through efficient false negative control, also relies on the estimation of the signal proportion as a benchmark for signal inclusion (Jeng et al., 2016, 2019; Jeng and Hu, 2020).

Although estimation of the signal proportion is widely requested, methodology development has met two major challenges. First, signals of different sparsity levels often call for different estimation methods, and the sparsity levels are unknown a priori. Second, the set of variables under investigation may have complex dependence structures. There exist a number of rigorously developed methods. Most of them, however, assume independence between variables (Genovese and Wasserman, 2004; Meinshausen and Rice, 2006; Jin and Cai, 2007a) and consider sparsity levels within a certain range (Cai et al., 2007; Jin, 2008). A nice review of the existing methods can be found in Chen (2019). More recent developments extend the study to specific dependence structures of the variables, such as block-diagonal covariance matrices (Jeng et al., 2019), and to certain dependence conditions that facilitate accurate precision matrix estimation and bias mitigation in linear regression (Jeng and Chen, 2019).
There still lacks a method that consistently estimates the signal proportion under arbitrary covariance dependence when the sparsity of signals is unknown and possibly falls in a wide range. Such an estimator can have far-reaching impact in real practice.

In this paper, we define the mean absolute correlation (MAC) to measure the overall dependence strength and investigate the performance of a family of estimators over the full range of MAC. We explicate the joint effect of MAC and signal sparsity on the performance of the family of estimators and discover that the most powerful estimator under independence is no longer the most effective when the MAC dependence is strong enough given a signal sparsity level, or when signals are sparse enough given a MAC level. Motivated by this theoretical insight, we propose a new estimator to better adapt to arbitrary covariance dependence. The new method is compared with several popular methods in extensive simulation examples, including strong to weak covariance dependence between variables and real dependence structures from genetic association studies. Although the winner among the existing methods changes over different settings, the performance of the new method is either comparable to or better than the performance of that winner in each setting. We also apply the new method to analyze two real datasets. The first dataset is from an expression quantitative trait loci (eQTL) study with 8657 candidate single-nucleotide polymorphisms (SNPs), for which the overall dependence in terms of the MAC level is rather weak. The second dataset is from a classical association study with microarray data, where 4088 candidate genes possess much stronger overall dependence with a high MAC level. We compare the estimate of the new method with those of the existing methods and relate the results to the simulation study with similar settings. The comparison results indicate better adaptivity of the new method to complex real dependence structures.

The rest of the paper is organized as follows. Section 2 first introduces a family of lower bound estimators and develops a general result on their estimation consistency under dependence. Then, for specific estimators in the family, the joint effects of covariance dependence and signal sparsity are explicated and compared in terms of power under dependence. Consequently, a new, more powerful estimator is developed for applications with arbitrary covariance dependence. Section 3 compares the proposed method with existing methods in simulation examples. Section 4 applies the new and existing methods to real genetic association studies and relates the results to the simulation examples that adopt similar settings. Section 5 concludes the work with further discussions. Technical proofs and supporting figures and tables are provided in the Appendices.
2 Method and Theory

2.1 A Family of Lower Bound Estimators
Denote $I_1$ and $I_0$ as the sets of indices for signal and noise variables, respectively. We consider the marginal distribution of the $p$ variables as
\[
X_j \sim F_0 \cdot 1\{j \in I_0\} + F_1 \cdot 1\{j \in I_1\}, \qquad j = 1, \ldots, p,
\]
where $F_0$ and $F_1$ are the null and signal distributions, respectively. Define the signal proportion $\pi = |I_1|/p$. We assume that $F_0$ is continuous and known a priori. All the other components are unknown. Our goal is to estimate the signal proportion $\pi$ without the need to specify $F_1$ or to identify which variables are signals.

Meinshausen and Rice (2006) introduced a family of proportion estimators that are built upon the empirical process of $p$-values under independence. This family of estimators has been proved to provide lower bound estimates for the true proportion $\pi$. Members in the family are indexed by the choice of a bounding function, and it has been shown that the estimator with bounding function $\delta(u) = \sqrt{u(1-u)}$ has the best overall performance for signals of different sparsity levels. This conclusion, unfortunately, does not hold anymore under arbitrary dependence. When variables are arbitrarily dependent, the limiting distribution of the empirical process of $p$-values is generally unknown and may not even have an analytic expression. Moreover, the dependence effect mingles with signal sparsity to influence the performances of different estimators in the family. These difficulties substantially complicate the estimation problem and motivate us to develop new techniques to study the family of estimators.

For presentation simplicity, we perform the inverse normal transformation $Z_j = \Phi^{-1}(F_0(X_j))$, where $\Phi^{-1}$ is the inverse of the cumulative distribution function of a standard normal random variable. Then, we have
\[
Z_j \sim \Phi \cdot 1\{j \in I_0\} + G \cdot 1\{j \in I_1\}, \qquad j = 1, \ldots, p, \qquad (1)
\]
where $G$ denotes the signal distribution after the inverse normal transformation, which remains unknown.

Next, we construct a modified family of estimators to accommodate dependence among the $Z_j$.
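The inverse normal transformation above is a one-liner in practice. The following minimal sketch assumes, purely for illustration, that the known null $F_0$ is standard exponential; after the transformation, the null variables are standard normal:

```python
import numpy as np
from scipy.stats import norm, expon

# Inverse normal transformation Z_j = Phi^{-1}(F_0(X_j)).
# The null F_0 is taken to be standard exponential purely for illustration.
rng = np.random.default_rng(0)
x_null = expon.rvs(size=100_000, random_state=rng)  # noise variables drawn from F_0
z = norm.ppf(expon.cdf(x_null))                     # transformed scores

# Under the null, the transformed scores are (approximately) standard normal.
print(round(float(np.mean(z)), 2), round(float(np.std(z)), 2))
```

Any continuous, fully known $F_0$ works the same way; only the `expon.cdf` call changes.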
Let
\[
\bar{W}_p(t) = p^{-1} \sum_{j=1}^p 1\{|W_j| > t\},
\]
where $(W_1, \ldots, W_p)$ follow the joint null distribution of $(Z_1, \ldots, Z_p)$. Denote $\bar{\Phi}(t) = 1 - \Phi(t)$. For a given function $\delta(t)$ that is strictly positive on $(0, \infty)$, define
\[
V_{p,\delta} = \sup_{t>0} \frac{|\bar{W}_p(t) - 2\bar{\Phi}(t)|}{\delta(t)}. \qquad (2)
\]
Apparently, $V_{p,\delta}$ varies with the choice of $\delta(t)$, and $\delta(t)$ is called a bounding function. For a given $\delta(t)$ and a control level $\alpha$, we define the corresponding bounding sequence $c_{p,\delta}$ as a function of $p$, $\delta$, and $\alpha$ that satisfies the following properties:

(a) $p\, c_{p,\delta} > p_0\, c_{p_0,\delta}$, where $p_0 = |I_0|$, and

(b) $P(V_{p,\delta} > c_{p,\delta}) < \alpha$ for all $p$.

From the above properties, it can be seen that $c_{p,\delta}$ is a high-probability upper bound of $V_{p,\delta}$, while $V_{p_0,\delta}$ relies on only noise variables. Both $V_{p,\delta}$ and $c_{p,\delta}$ carry the information of dependence among the variables. Also, the absolute sign in the numerator of (2) stabilizes $V_{p,\delta}$, as $\bar{W}_p(t) - 2\bar{\Phi}(t)$ may not be asymptotically symmetric anymore under arbitrary dependence.

Then, given the observed $Z_j$, a family of estimators indexed by $\delta$ is constructed as
\[
\hat{\pi}_\delta = \sup_{t>0} \frac{\bar{F}_p(t) - 2\bar{\Phi}(t) - c_{p,\delta}\,\delta(t)}{1 - 2\bar{\Phi}(t)}, \qquad (3)
\]
where $\bar{F}_p(t) = p^{-1} \sum_{j=1}^p 1\{|Z_j| > t\}$.
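To make the construction concrete, here is a minimal sketch of (2) and (3), assuming (only for illustration) that the joint null is i.i.d. standard normal so that $V_{p,\delta}$ can be simulated directly; the function names and the fixed evaluation grid standing in for the supremum over $t > 0$ are our own choices, not the paper's:

```python
import numpy as np
from scipy.stats import norm

def bounding_seq(p, delta, alpha=0.1, reps=200, grid=None, rng=None):
    """Simulate c_{p,delta}: the (1 - alpha) quantile of V_{p,delta} in (2),
    under an assumed i.i.d. N(0,1) joint null."""
    rng = rng if rng is not None else np.random.default_rng(0)
    grid = np.linspace(0.1, 5.0, 200) if grid is None else grid
    phibar = norm.sf(grid)                                 # Phibar(t) = 1 - Phi(t)
    d = delta(grid)
    v = np.empty(reps)
    for r in range(reps):
        w = rng.standard_normal(p)
        wbar = np.mean(np.abs(w)[:, None] > grid, axis=0)  # empirical null tail Wbar_p(t)
        v[r] = np.max(np.abs(wbar - 2 * phibar) / d)
    return np.quantile(v, 1 - alpha)

def pi_hat(z, delta, c, grid=None):
    """Lower bound estimator (3):
    sup_t [Fbar_p(t) - 2 Phibar(t) - c delta(t)] / (1 - 2 Phibar(t))."""
    grid = np.linspace(0.1, 5.0, 200) if grid is None else grid
    phibar = norm.sf(grid)
    fbar = np.mean(np.abs(z)[:, None] > grid, axis=0)      # empirical tail Fbar_p(t)
    vals = (fbar - 2 * phibar - c * delta(grid)) / (1 - 2 * phibar)
    return max(float(vals.max()), 0.0)
```

For instance, the bounding function $\delta(t) = [\bar{\Phi}(t)]^{1/2}$ discussed later corresponds to `delta = lambda t: norm.sf(t) ** 0.5`.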
It can be seen that the $\hat{\pi}_\delta$ family of estimators hinges on the choices of the bounding function $\delta(t)$ and the corresponding bounding sequence $c_{p,\delta}$. As the bounding sequence $c_{p,\delta}$ implies a normal range of $\bar{F}_p(t)$ if all variables are noise, the information carried by the observed $\bar{F}_p(t)$ that exceeds the normal range presents evidence for the existence of signals. This version of the lower bound family is for signals of two-sided effects. Minor changes to accommodate one-sided signal effects are straightforward. More details on the numerical implementation of $\hat{\pi}_\delta$ can be found at the end of Section 2.3.

As shown in Meinshausen and Rice (2006), most estimators in the original family are not uniformly effective for signals of different sparsity levels. More specifically, the signal proportion can be re-parameterized as $\pi = p^{-\gamma}$ with $\gamma \in (0, 1/2)$ representing the relatively dense case and $\gamma \in [1/2, 1)$ representing the sparse case. Consistency of different estimators in the original family was proved under independence for $\gamma$ in the dense and sparse cases separately. Here, we consider the estimation problem in the more challenging setting with arbitrary dependence and modify the family of estimators to accommodate the dependence. In order to provide theoretical insight for the modified family, we impose a monotonicity constraint on the bounding function $\delta(t)$, which allows us to present the following general result on all the estimators in the family for signals in the full range of sparsity levels with $\gamma \in (0, 1)$.

Lemma 2.1
Consider model (1). For a given bounding function $\delta(t)$, if there exists a bounding sequence $c_{p,\delta}$ satisfying the properties in (a) and (b), then
\[
P(\hat{\pi}_\delta < \pi) \ge 1 - \alpha. \qquad (4)
\]
On the other hand, for $\pi$ satisfying $0 < \pi \ll 1$, if $\delta(t)$ is non-increasing with respect to $t$, and $G = G_p$ such that $G_p(\tau) \to 0$ or $G_p(-\tau) \to 1$ for some $\tau \gg \max\{\delta^{-1}(\pi/c_{p,\delta}),\, 1\}$, then
\[
P(\hat{\pi}_\delta > (1-\epsilon)\pi) \to 1 \qquad (5)
\]
for any constant $\epsilon > 0$.

The lower bound property of $\hat{\pi}_\delta$ in (4) holds as long as a bounding sequence $c_{p,\delta}$ satisfying (a) and (b) can be found. On the other hand, the upper bound property of $\hat{\pi}_\delta$ in (5) holds under certain conditions on the signal distribution $G$, which essentially say that the signal effect, either positive ($G < \Phi$) or negative ($G > \Phi$), is strong enough. When both the lower bound and upper bound conditions are satisfied for a given $\delta(t)$ and a degenerating $\alpha$ such that $\alpha = \alpha_p \to 0$, $\hat{\pi}_\delta$ consistently estimates the true signal proportion, i.e., for any constant $\epsilon > 0$,
\[
P((1-\epsilon)\pi \le \hat{\pi}_\delta < \pi) \to 1.
\]
The above results hold for arbitrarily dependent variables and for sparse signals that are not of a fixed proportion of all the variables.
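The lower bound guarantee (4) is easy to probe numerically. The following sketch, in which all settings (dimension, sparsity, signal mean, $\alpha$, the evaluation grid) are illustrative assumptions rather than the paper's choices, checks how often $\hat{\pi}_\delta < \pi$ under independence:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo check of the lower-bound property (4) under independence.
rng = np.random.default_rng(0)
p, s, mu, alpha = 500, 50, 3.0, 0.1          # pi = s / p = 0.1 (illustrative values)
grid = np.linspace(0.5, 4.5, 120)
phibar = norm.sf(grid)
delta = np.sqrt(phibar)                      # bounding function delta(t) = Phibar(t)^{1/2}

# Bounding sequence: (1 - alpha) quantile of simulated V_{p,delta} under the null.
V = [np.max(np.abs(np.mean(np.abs(rng.standard_normal(p))[:, None] > grid, axis=0)
                   - 2 * phibar) / delta) for _ in range(200)]
c = np.quantile(V, 1 - alpha)

# Fraction of replicates with pi_hat < pi; property (4) says it should be >= 1 - alpha.
below = 0
for _ in range(200):
    z = rng.standard_normal(p)
    z[:s] += mu                              # plant s signals of mean mu
    fbar = np.mean(np.abs(z)[:, None] > grid, axis=0)
    pi_hat = np.max((fbar - 2 * phibar - c * delta) / (1 - 2 * phibar))
    below += pi_hat < s / p
print(below / 200)   # typically close to 1
```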
2.2 Effects of Dependence and Signal Sparsity on Specific Estimators

As individual estimators in the $\hat{\pi}_\delta$ family hinge on the choice of the bounding function, we further study their consistency with specific $\delta(t)$ functions. We focus on $\delta(t)$ of the form $\delta(t) = [\bar{\Phi}(t)]^\theta$, $\theta \in [0, 1]$. The resulting $\hat{\pi}_\delta$ are closely related to the existing estimators, whose consistency has been studied under independence in the literature (Meinshausen and Rice (2006) and the references therein). Valuable insights can be obtained by comparing the estimators' performances under independence and dependence, which later help us construct a more powerful estimator under dependence.

In order to explicate the effect of dependence on $\hat{\pi}_\delta$, we assume that
\[
(Z_1, \ldots, Z_p) \sim N_p(\mu, \Sigma), \qquad (6)
\]
where $\mu$ is a $p$-dimensional sparse vector with $\mu_j = A_j \cdot 1\{j \in I_1\}$, $A_j \neq 0$, and $\Sigma$ is an arbitrary correlation matrix, i.e., $\Sigma_{ij} = \mathrm{Corr}(Z_i, Z_j)$. We define the mean absolute correlation (MAC) to calibrate the covariance dependence as
\[
\bar{\rho}_\Sigma = \sum_{i=1}^p \sum_{j=1}^p |\Sigma_{ij}| \,/\, p^2. \qquad (7)
\]
A larger value of $\bar{\rho}_\Sigma$ indicates stronger overall dependence.

Moreover, we employ a discretization technique from Arias-Castro et al. (2011) and Jeng and Chen (2019) as follows. Define $T = [1, \sqrt{2\log p}\,] \cap \mathbb{N}$ and the discretized version of $V_{p,\delta}$ as
\[
V^*_{p,\delta} = \max_{t \in T} \frac{|\bar{W}_p(t) - 2\bar{\Phi}(t)|}{\delta(t)}. \qquad (8)
\]
Denote $c^*_{p,\delta}$ as the bounding sequence based on $V^*_{p,\delta}$, and define the corresponding proportion estimator as
\[
\hat{\pi}^*_\delta = \max_{t \in T} \frac{\bar{F}_p(t) - 2\bar{\Phi}(t) - c^*_{p,\delta}\,\delta(t)}{1 - 2\bar{\Phi}(t)}. \qquad (9)
\]
Next, we explicate how the MAC level interacts with signal sparsity and signal intensity to influence the consistency of $\hat{\pi}^*_\delta$ with $\delta(t) = [\bar{\Phi}(t)]^\theta$. Very different results are discovered for $\theta \in [0, 1/2]$ and $\theta \in (1/2, 1]$.

Theorem 2.1
Consider model (6). Let $\delta(t) = [\bar{\Phi}(t)]^\theta$ with $\theta \in [0, 1/2]$. Then, there exists a bounding sequence
\[
c^*_{p,\delta} = O\Big(\sqrt{\bar{\rho}_\Sigma (\log p)^{\theta + 1/2}}\Big)
\]
that satisfies properties (a) and (b), and the corresponding estimator $\hat{\pi}^*_\delta$ satisfies $P(\hat{\pi}^*_\delta < \pi) \ge 1 - \alpha$. Moreover, for $\pi$ satisfying $0 < \pi \ll 1$, if $G = G_p$ such that $G_p(\tau) \to 0$ or $G_p(-\tau) \to 1$ for some
\[
\tau \gg \max\Bigg\{1, \ \bar{\Phi}^{-1}\bigg(\frac{\pi^{1/\theta}}{\bar{\rho}_\Sigma^{1/(2\theta)} (\log p)^{(\theta+1/2)/(2\theta)}}\bigg)\Bigg\}, \qquad (10)
\]
then $P(\hat{\pi}^*_\delta > (1-\epsilon)\pi) \to 1$ for any constant $\epsilon > 0$.

The above theorem says that for $\delta(t) = [\bar{\Phi}(t)]^\theta$ with $\theta \in [0, 1/2]$, there exists a bounding sequence $c^*_{p,\delta}$ whose order increases with respect to $\bar{\rho}_\Sigma$, $p$, and $\theta$, respectively. The second part of the theorem provides a condition on $G$ in (10) for the consistency of $\hat{\pi}^*_\delta$. This condition is easier to satisfy for less sparse signals (larger $\pi$) or less dependent variables (smaller $\bar{\rho}_\Sigma$). Note that $\bar{\Phi}^{-1}\big(\pi^{1/\theta} / (\bar{\rho}_\Sigma^{1/(2\theta)} (\log p)^{(\theta+1/2)/(2\theta)})\big)$ is well-defined only for $\pi < \sqrt{\bar{\rho}_\Sigma (\log p)^{\theta+1/2}}$. In the case $\pi \ge \sqrt{\bar{\rho}_\Sigma (\log p)^{\theta+1/2}}$, condition (10) is simply $\tau \gg 1$. For the special case with independent variables, $\bar{\rho}_\Sigma = 1/p$ and condition (10) degenerates to
\[
\tau \gg \max\Big\{1, \ \bar{\Phi}^{-1}\big(\pi^{1/\theta} p^{1/(2\theta)} / (\log p)^{(\theta+1/2)/(2\theta)}\big)\Big\},
\]
which agrees with the sufficient and necessary condition in Meinshausen and Rice (2006) for the consistency of $\hat{\pi}_\delta$ under independence. The comparison can be made by adopting the same parameterization as in Theorem 3 of their paper with $\pi = p^{-\gamma}$, $\gamma \in [1/2, 1)$, $\nu = \theta$, and $\kappa = 2$.

Results in Theorem 2.1 explicate how the performance of $\hat{\pi}^*_\delta$ with $\delta(t) = [\bar{\Phi}(t)]^\theta$, $\theta \in [0, 1/2]$, deteriorates as the MAC dependence gets stronger. These results, however, cannot be extended to $\hat{\pi}^*_\delta$ with $\theta \in (1/2, 1]$.

Theorem 2.2
Consider model (6). Let $\delta(t) = [\bar{\Phi}(t)]^\theta$ with $\theta \in (1/2, 1]$. Then, there exists a bounding sequence $c^*_{p,\delta} = O(\sqrt{\log p})$ that satisfies the properties (a) and (b), and the corresponding estimator $\hat{\pi}^*_\delta$ satisfies $P(\hat{\pi}^*_\delta < \pi) \ge 1 - \alpha$. Moreover, for $\pi$ satisfying $0 < \pi \ll 1$, if $G = G_p$ such that $G_p(\tau) \to 0$ or $G_p(-\tau) \to 1$ for some
\[
\tau \gg \bar{\Phi}^{-1}\bigg(\frac{\pi^{1/\theta}}{(\log p)^{1/(2\theta)}}\bigg), \qquad (11)
\]
then $P(\hat{\pi}^*_\delta > (1-\epsilon)\pi) \to 1$ for any constant $\epsilon > 0$.

This theorem shows that for $\delta(t) = [\bar{\Phi}(t)]^\theta$ with $\theta \in (1/2, 1]$, there exists a bounding sequence $c^*_{p,\delta}$ whose order does not involve the MAC dependence level $\bar{\rho}_\Sigma$. Moreover, the intensity condition in (11) does not involve $\bar{\rho}_\Sigma$, which means that $\hat{\pi}^*_\delta$ with $\theta \in (1/2, 1]$ is consistent under (11) no matter how strong the MAC dependence is.

Although the above analyses are for the discretized version of $\hat{\pi}_\delta$, they provide important insight on the performance of the $\hat{\pi}_\delta$ family under dependence, which is also supported by extensive simulation studies in Section 3.

2.3 A New Dependence-Adaptive Estimator

The study in Section 2.2 reveals that covariance dependence has different effects on different estimators in the $\hat{\pi}_\delta$ family. As condition (10) gets more relaxed as $\bar{\rho}_\Sigma$ decreases, estimators with $\theta \in [0, 1/2]$ are more powerful under weaker dependence in terms of the MAC level. On the other hand, when MAC dependence is strong enough, estimators with $\theta \in (1/2, 1]$ could be more powerful, as condition (11) does not change with $\bar{\rho}_\Sigma$. Motivated by these findings, we propose to construct a new estimator of the form
\[
\hat{\pi}_{adpt} = \max\{\hat{\pi}_\delta, \ \delta \in \Delta\},
\]
where $\Delta$ is the set of $\delta(t)$ functions that render the most powerful estimators in different dependence scenarios. Note that all the $\delta(t)$ functions in $\Delta$ result in conservative estimators with the lower bound property $P(\hat{\pi}_\delta < \pi) \ge 1 - \alpha$ as stated in Lemma 2.1. Therefore, the new estimator $\hat{\pi}_{adpt}$ naturally inherits the lower bound property.

The power of $\hat{\pi}_{adpt}$ depends on the candidate $\delta(t)$ functions in $\Delta$. We consider $\delta(t) = [\bar{\Phi}(t)]^\theta$ with specific $\theta$ values. Based on the results in Theorem 2.1, we find that among the estimators with $\theta \in [0, 1/2]$, the most powerful one has $\theta = 1/2$, as condition (10) is most relaxed at this $\theta$. On the other hand, based on the results in Theorem 2.2, the most powerful estimator among those with $\theta \in (1/2, 1]$ has $\theta = 1$. The above analysis helps us narrow down to two candidates that are most powerful in their own $\theta$ groups. Comparing these two candidates, we can see that $\theta = 1/2$ dominates $\theta = 1$ when covariance dependence is relatively weak, with $\bar{\rho}_\Sigma \le \pi/\sqrt{\log p}$, and $\theta = 1$ dominates when covariance dependence is strong, with $\bar{\rho}_\Sigma > \pi/\sqrt{\log p}$. Since neither candidate dominates the other under arbitrary dependence, our new estimator is constructed as
\[
\hat{\pi}_{adpt} = \max\{\hat{\pi}_{0.5}, \ \hat{\pi}_1\}, \qquad (12)
\]
where $\hat{\pi}_{0.5}$ denotes $\hat{\pi}_\delta$ with $\delta(t) = [\bar{\Phi}(t)]^{1/2}$, and $\hat{\pi}_1$ denotes $\hat{\pi}_\delta$ with $\delta(t) = \bar{\Phi}(t)$. $\hat{\pi}_{adpt}$ is likely to be comparable to $\hat{\pi}_{0.5}$ and outperform $\hat{\pi}_1$ under relatively weak dependence, and to be comparable to $\hat{\pi}_1$ and outperform $\hat{\pi}_{0.5}$ when covariance dependence is strong.

Numerical Implementation. We conclude this section with additional notes on the numerical implementation of $\hat{\pi}_{adpt}$. Specifically, we simulate $(w_1, \ldots, w_p)$ following the joint null distribution of the $Z_j$. When the joint null distribution is unknown in real applications, $(w_1, \ldots, w_p)$ can often be simulated non-parametrically. For example, when $(Z_1, \ldots, Z_p)$ represent the associations between a set of explanatory variables and a response variable, a common practice to simulate $(w_1, \ldots, w_p)$ is to randomly shuffle only the sample of the response variable to remove the potential associations. More details on such permutation approaches can be found in Westfall and Young (1993). Then, following (2), we simulate $V_{p,0.5}$ and $V_{p,1}$, corresponding to $\theta = 1/2$ and $\theta = 1$, by taking the supremums over $t = w_1, \ldots, w_p$. Next, for a pre-specified control level $\alpha$, we simulate $c_{p,0.5}$ and $c_{p,1}$ as the $(1-\alpha)$th quantiles of 1000 replicates of $V_{p,0.5}$ and $V_{p,1}$, respectively. The simulated $c_{p,0.5}$ and $c_{p,1}$ are implemented to calculate $\hat{\pi}_{0.5}$ and $\hat{\pi}_1$ as in (3) by taking the maximums over $t = z_1, \ldots, z_p$, where $z_1, \ldots, z_p$ are the observed variables after the inverse normal transformation. Finally, the new estimator $\hat{\pi}_{adpt}$ is calculated by (12).
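The implementation described above can be sketched end-to-end as follows. For simplicity this sketch simulates the joint null as i.i.d. N(0,1) rather than by permutation, uses a fixed grid in place of the observed values, and assumes illustrative dimensions, signal strength, and control level ($\alpha = 0.1$); none of these choices are prescribed by the paper:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p, pi_true, mu = 2000, 0.1, 5.0        # illustrative dimensions and signal strength
z = rng.standard_normal(p)
z[: int(pi_true * p)] += mu            # plant sparse signals

grid = np.linspace(0.1, 5.0, 200)      # evaluation grid standing in for t = z_1, ..., z_p
phibar = norm.sf(grid)
deltas = {0.5: phibar ** 0.5, 1.0: phibar}   # delta(t) = Phibar(t)^theta, theta = 1/2, 1

# Step 1: simulate the bounding sequences c_{p,0.5} and c_{p,1} from the null.
reps, alpha = 200, 0.1                 # alpha = 0.1 is an assumed control level
V = {th: np.empty(reps) for th in deltas}
for r in range(reps):
    wbar = np.mean(np.abs(rng.standard_normal(p))[:, None] > grid, axis=0)
    for th, d in deltas.items():
        V[th][r] = np.max(np.abs(wbar - 2 * phibar) / d)
c = {th: np.quantile(V[th], 1 - alpha) for th in deltas}

# Step 2: evaluate each lower-bound estimator (3) and take the maximum as in (12).
fbar = np.mean(np.abs(z)[:, None] > grid, axis=0)
ests = {th: np.max((fbar - 2 * phibar - c[th] * d) / (1 - 2 * phibar))
        for th, d in deltas.items()}
pi_adpt = max(0.0, *ests.values())     # conservative estimate of the signal proportion
```

With weak (here, no) dependence, $\hat{\pi}_{0.5}$ typically drives the maximum; under strong dependence $\hat{\pi}_1$ takes over, which is the point of (12).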
3 Simulation Studies

In the following simulation examples, we consider six dependence structures: (a)-(d) are commonly observed correlation structures in the literature, and (e)-(f) are real correlation structures from genetic association studies. In all the examples, $\Sigma_{ii} = 1$, $i = 1, \ldots, p$.

(a) Autoregressive. $\Sigma_{ij} = r^{|i-j|}$ with a fixed $r \in (0, 1)$.

(b) Equal correlation. $\Sigma_{ij} = 0.5$ for $i \neq j$.

(c) Block correlation. $\Sigma$ has square diagonal blocks. The off-diagonal elements in the blocks are 0.5, and the elements outside the blocks are zero.

(d) Sparse correlation. $\Sigma$ has nonzero elements randomly located. The data generation process is similar to Model 3 in Cai et al. (2013). Let $\Sigma^* = (\sigma_{ij})$, where $\sigma_{ii} = 1$, the off-diagonal $\sigma_{ij}$, $i < j$, equal a fixed constant times independent $\mathrm{Bernoulli}(1, 0.1)$ variables, and $\sigma_{ji} = \sigma_{ij}$. Then $\Sigma = (\Sigma^* + \delta I)/(1 + \delta)$, where $\delta = |\lambda_{\min}(\Sigma^*)|$ plus a small constant, which ensures positive definiteness.

(e) SNP correlation. $\Sigma$ is the sample correlation matrix of the real SNP data on Chromosome 21 from 90 individuals in the International HapMap project.

(f) Gene correlation. $\Sigma$ is the sample correlation matrix of the real gene expression data from 71 individuals in a riboflavin production study.

We generate test statistics $Z_1, \ldots, Z_p \sim N((\mu_1, \ldots, \mu_p), \Sigma)$ and set $p = 2000$ for cases (a)-(d) above. Case (e) has $p = 8657$, which is the number of SNPs in the dataset, and case (f) has $p = 4088$, which is the number of genes in the dataset. Additional details of the datasets can be found in Section 4. The block size in case (c) is set as $400 \times 400$. $(\mu_1, \ldots, \mu_p)$ is a sparse vector with randomly located non-zero elements. We consider both relatively sparse signals with $\pi = 0.02$ and more dense signals with $\pi = 0.1$.

We first demonstrate the MAC levels as defined in (7) for the dependence structures in (a)-(f) above and report the realized values for $c_{p,0.5}$ and $c_{p,1}$. Recall that a larger value of $\bar{\rho}_\Sigma$ indicates stronger overall covariance dependence. It can be seen in Table 1 that $\bar{\rho}_\Sigma$ is fairly small for cases (a) and (d), moderately small for cases (c) and (e), and fairly large for cases (b) and (f). Moreover, $c_{p,0.5}$ seems to vary positively with $\bar{\rho}_\Sigma$, whereas $c_{p,1}$ does not show such a tendency. The numerical results are demonstrated more clearly in Figure 1 and are consistent with Theorems 2.1 and 2.2 about the dependence effect on the two different bounding sequences.

[Table 1: MAC and bounding sequences. Columns: Autocorr, Equal corr, Block corr, Sparse corr, SNP corr, Gene corr; rows: $\bar{\rho}_\Sigma$, $c_{p,0.5}$, $c_{p,1}$.]

We compare the new estimator $\hat{\pi}_{adpt}$ with $\hat{\pi}_{0.5}$ and $\hat{\pi}_1$ from the estimator family, as well as with two other existing methods. Besides the various dependence structures in (a)-(f), we consider sparse and relatively dense signals with $\pi = 0.02$ and $0.1$, respectively, and varying signal intensity with non-zero $\mu_j = 3, 4, 5, 6$. The comparisons are organized into two sets of examples.

The first set of examples demonstrates the performances of $\hat{\pi}_{adpt}$, $\hat{\pi}_{0.5}$, and $\hat{\pi}_1$, as they all possess the lower bound property. Recall the theoretical results in Sections 2.2 and 2.3 that $\hat{\pi}_{0.5}$ may outperform (or underperform) $\hat{\pi}_1$ when dependence is weak (or strong) enough or signals are less (or more) sparse. We observe these tendencies in Figures 2-7. Specifically, the autocorrelation case (Figure 2) has a small $\bar{\rho}_\Sigma$; $\hat{\pi}_{0.5}$ has comparable results to those of $\hat{\pi}_1$ for small $\pi = 0.02$ and outperforms $\hat{\pi}_1$ for larger $\pi = 0.1$. The equal correlation case (Figure 3) has the largest $\bar{\rho}_\Sigma = 0.5$. It shows that $\hat{\pi}_1$ outperforms $\hat{\pi}_{0.5}$ for both $\pi = 0.02$ and $0.1$. The block diagonal case (Figure 4) has a moderate $\bar{\rho}_\Sigma = 0.1$. It shows that $\hat{\pi}_1$ outperforms $\hat{\pi}_{0.5}$ for small $\pi$ and is comparable to $\hat{\pi}_{0.5}$ for larger $\pi$. The sparse correlation case (Figure 5) has the smallest $\bar{\rho}_\Sigma$; $\hat{\pi}_{0.5}$ outperforms $\hat{\pi}_1$ for both $\pi = 0.02$ and $0.1$. The SNP correlation case (Figure 6) has a moderately small $\bar{\rho}_\Sigma$; $\hat{\pi}_1$ is slightly better for small $\pi$, and $\hat{\pi}_{0.5}$ is better for larger $\pi$. The gene correlation case (Figure 7) has a large $\bar{\rho}_\Sigma$; $\hat{\pi}_1$ outperforms $\hat{\pi}_{0.5}$ for both small and larger $\pi$. In all these examples, the new estimator $\hat{\pi}_{adpt}$ always matches the winner of $\hat{\pi}_{0.5}$ and $\hat{\pi}_1$ and exhibits generally better adaptivity to different dependence structures and signal sparsity levels.

[Figure 2: Comparison under autocorrelation. "est 0.05", "est 1", and "est adapt" represent $\hat{\pi}_{0.5}$, $\hat{\pi}_1$, and $\hat{\pi}_{adpt}$, respectively. The top row has $\pi = 0.02$, and the bottom row has $\pi = 0.1$. True $\pi$ values are highlighted by the red horizontal lines.]

[Figure 3: Comparison under equal correlation. Notations and symbols are the same as in Figure 2.]

[Figure 4: Comparison under block dependence. Notations and symbols are the same as in Figure 2.]

[Figure 5: Comparison under sparse dependence. Notations and symbols are the same as in Figure 2.]

[Figure 6: Comparison under SNP dependence. Notations and symbols are the same as in Figure 2.]

[Figure 7: Comparison under gene dependence. Notations and symbols are the same as in Figure 2.]

The second set of examples compares the performance of $\hat{\pi}_{adpt}$ with the estimator $\hat{\pi}_{GW}$ developed in Genovese and Wasserman (2004) and $\hat{\pi}_{JC}$ developed in Jin and Cai (2007b). These two existing methods have been studied for relatively dense signals under independence. Tables 2 and 3 show that for the cases (a)-(f), $\hat{\pi}_{GW}$ and $\hat{\pi}_{JC}$ tend to over-estimate the true $\pi$ when dependence is strong (such as in (b) Equal correlation and (f) Gene correlation) or signals are sparse (such as for $\pi = 0.02$). In comparison, $\hat{\pi}_{adpt}$ seems to be generally more accurate and stable over different dependence structures and $\pi$ values.

[Table 2: Mean values and standard deviations (in brackets) of $\hat{\pi}_{adpt}$, $\hat{\pi}_{GW}$, and $\hat{\pi}_{JC}$ when signals are relatively sparse with $\pi = 0.02$, for $\mu = 3, 4, 5, 6$ under the six dependence structures.]

4 Real Data Applications

We apply the proposed method to two real datasets. The first dataset is from an eQTL study with the goal to identify SNPs that potentially govern the expression of gene CCT8 on chromosome 21. This gene has been found to be relevant to Down Syndrome (Bradic et al., 2011; Fan et al., 2012). We obtain the SNP data of unaffected subjects from the International HapMap project (http://zzz.bwh.harvard.edu/plink/res.shtml) and the gene expression data from ftp://ftp.sanger.ac.uk/pub/genevar/.
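The MAC level in (7) is straightforward to compute from any estimated correlation matrix, such as the sample correlation matrices used for cases (e)-(f) and for the real datasets below. A minimal sketch, with simulated stand-in data whose dimensions are illustrative rather than those of the real datasets:

```python
import numpy as np

def mac(sigma):
    """Mean absolute correlation (7): average of |Sigma_ij| over all p^2 pairs."""
    p = sigma.shape[0]
    return np.abs(sigma).sum() / p ** 2

# Sanity check on an equal-correlation matrix, where the MAC is known in closed form.
p = 200
sigma_eq = np.full((p, p), 0.5)
np.fill_diagonal(sigma_eq, 1.0)
print(mac(sigma_eq))  # (p + 0.5 * p * (p - 1)) / p^2 = 0.5025 for p = 200

# MAC of a sample correlation matrix, as used for the real-data cases.
rng = np.random.default_rng(3)
x = rng.standard_normal((90, p))         # stand-in for an n-by-p data matrix
sigma_hat = np.corrcoef(x, rowvar=False)
print(round(mac(sigma_hat), 3))          # small-sample noise inflates this above 1/p
```

Note the second print: even for independent variables, estimation noise in the sample correlations makes the realized MAC noticeably larger than the population value $1/p$, which is worth keeping in mind when reading Table 1.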
Our data includes 90 samples from the Asian population (45 Japanese in Tokyo, Japan (JPT), and 45 Han Chinese in Beijing). We consider SNPs without missing values, which results in 8657 candidate SNPs.

As in Bradic et al. (2011) and Fan et al. (2012), we use marginal linear regression to derive test statistics for the associations between each SNP and the expression level of CCT8. A histogram of the test statistics is presented in Figure 8, where the long and thin right tail suggests the existence of sparse signals. The MAC level of the SNPs is rather weak, and the realized values of $c_{p,0.5}$ and $c_{p,1}$ have been demonstrated in Table 1, case (e). We apply the proposed estimator $\hat{\pi}_{adpt}$ and the existing estimators $\hat{\pi}_{GW}$ and $\hat{\pi}_{JC}$ to the dataset. $\hat{\pi}_{adpt}$ is smaller than $\hat{\pi}_{GW}$ and $\hat{\pi}_{JC}$ in this setting and is expected to be closer to the true $\pi$.

[Table 3: Mean values and standard deviations (in brackets) of $\hat{\pi}_{adpt}$, $\hat{\pi}_{GW}$, and $\hat{\pi}_{JC}$ when signals are relatively dense with $\pi = 0.1$, for $\mu = 3, 4, 5, 6$ under the six dependence structures.]

The second real application example has microarray data from a study on riboflavin (vitamin B2) production in Bacillus subtilis. This dataset is available at annualreviews.org/doi/suppl/10.1146/annurev-statistics-022513-115545 and has been studied in Buhlmann et al. (2014). The dataset includes the expression levels of 4088 genes and the logarithm of the riboflavin production rate of 71 individuals. Marginal regression coefficients are used as test statistics for the associations between genes and riboflavin production. The histogram of the test statistics is presented in Figure 10, which suggests a larger signal proportion than that in the first real data example. Figure 11 shows the heatmap of the correlation matrix of the first 50 genes, which indicates a more complicated dependence structure. The MAC level of the genes is much higher, and the realized values of $c_{p,0.5}$ and $c_{p,1}$ have been demonstrated in Table 1, case (f).

[Figure 10: Histogram of test statistics for gene expression association.]

[Figure 11: Heatmap of the absolute value of correlations for 50 gene expressions in the riboflavin production study.]

In this example, $\hat{\pi}_{adpt}$ is again smaller than $\hat{\pi}_{GW}$ and $\hat{\pi}_{JC}$ and is expected to be closer to the true $\pi$.

5 Conclusion and Discussion
Estimating the proportion of sparse signals under dependence is notoriously difficult due to the involved joint effects of dependence, signal sparsity, and signal intensity. In this paper, we define the MAC level to measure arbitrary covariance dependence and explicate the effect of MAC dependence on a family of estimators. Different from the result developed under independence that there exists a single most powerful estimator in the family, we find that no single estimator in the family is most powerful across different dependence scenarios. We identify candidate estimators that are most powerful in different scenarios and develop a new estimator that better adapts to arbitrary covariance dependence. The new estimator inherits the lower bound property of the family, which is to provide a conservative estimate with high probability. This property is valuable in real applications as it requires no conditions on the unknown signals. Moreover, the new estimator is generally at least as powerful as the individual members of the lower bound family under arbitrary covariance dependence, and it compares favorably to other popular methods in extensive numerical examples including weak to strong covariance dependence and real dependence structures from genetic association studies.
Appendix
This section presents the proofs of Lemma 2.1, Theorem 2.1, and Theorem 2.2. The symbol C denotes a genetic, finite constant whose values can be different at different occurrences. We first show P (ˆ π δ < π ) ≥ − α . Let Z j = Z j for j ∈ I and Z j = Z j for j ∈ I . Denote p = | I | and s = | I | . Then¯ F p ( t ) = p − (cid:88) j ∈ I {| Z j | >t } + p − (cid:88) j ∈ I {| Z j | >t } ≤ p − (cid:88) j ∈ I {| Z j | >t } + p − s = (1 − π ) p − (cid:88) j ∈ I {| Z j | >t } + π. P (ˆ π δ > π ) ≤ P (cid:18) sup t> (cid:110) (1 − π ) (cid:16) p − (cid:88) j ∈ I {| Z j | >t } − t ) (cid:17) − c p,δ δ ( t ) (cid:111) > (cid:19) ≤ P (cid:18) sup t> (cid:110) p − (cid:88) j ∈ I {| Z j | >t } − t ) − c p ,δ δ ( t ) (cid:111) > (cid:19) ≤ P ( V p > c p ,δ ) = α, where the second and last inequalities are by properties (a) and (b) of c p,δ , respectively. Theclaim P (ˆ π δ < π ) ≥ − α follows.Next, we show P (ˆ π δ > (1 − (cid:15) ) π ) →
1. Because ˆ π δ > ¯ F p ( t ) − t ) − c p,δ δ ( t ) for any t > F p ( t ) = 1 − πp (cid:88) j ∈ I {| Z j | >t } + πs (cid:88) j ∈ I {| Z j | >t } , then ˆ π δ π − > − π c p,δ δ ( t ) + 1 − ππ (cid:16) p − (cid:88) j ∈ I {| Z j | >t } − t ) (cid:17) + (cid:16) s − (cid:88) j ∈ I {| Z j | >t } − (cid:17) − t ) (13)for any t >
0. Now, set t in (13) at τ such that τ (cid:29) max { δ − ( π/c p,δ ) , } . We will show thateach term on the right hand side of (13) at t = τ is of o p (1).First, by the monotonicity of δ ( t ) and the condition τ (cid:29) δ − ( π/c p,δ ), we have δ ( τ ) = o ( π/c p,δ ) and the first term A = − c p,δ δ ( τ ) /π = o (1).Consider the second term A = π − (1 − π ) (cid:16) p − (cid:80) j ∈ I {| Z j | >τ } − τ ) (cid:17) in (13). Thefollowing lemma is proved in Section 5.2. Lemma 5.1
For any τ (cid:29) max { δ − ( π/c p,δ ) , } , π − (cid:16) p − (cid:80) j ∈ I {| Z j | >τ } − τ ) (cid:17) = o p (1) . Lemma 5.1 and the condition π = o (1) are enough to show A = o p (1).23or the third term A = s − (cid:80) j ∈ I {| Z j | >τ } − P ( | A | > a ) = P (1 − s − (cid:88) j ∈ I {| Z j | >τ } > a ) ≤ a − (1 − P ( | Z j | > τ ))= a − ( G ( τ ) − G ( − τ )) = o (1)for any fixed a >
0, where the third step is by Z j ∼ G for j ∈ I , and the last the step is bythe condition G ( τ ) → G ( − τ ) → A = − τ ) = o (1) given τ (cid:29) P (ˆ π δ /π > − (cid:15) ) → Recall the definitions of V p,δ in (2) and P (cid:32) sup t> | p − (cid:80) pj =1 {| W j | >t } − t ) | δ ( t ) > c p,δ (cid:33) < α. where W , . . . , W p ∼ N p (0 , Σ). Then, for t = τ , P (cid:32) | p − (cid:80) pj =1 {| W j | >τ } − τ ) | π > c p,δ δ ( τ ) π (cid:33) < α. Given τ (cid:29) δ − ( π/c p,δ ), we have c p,δ δ ( τ ) /π = o (1) and, consequently, | p − (cid:80) pj =1 {| W j | >τ } − τ ) | π = o p (1) . (14)Decompose the left hand side above as π − (cid:12)(cid:12)(cid:12) p − (cid:88) pj =1 {| W j | >τ } − τ ) (cid:12)(cid:12)(cid:12) ≥ π − (cid:12)(cid:12)(cid:12) p − (cid:88) j ∈ I {| W j | >τ } − τ ) (cid:12)(cid:12)(cid:12) − π − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p − p (cid:88) j =1 {| W j | >τ } − p − (cid:88) j ∈ I {| W j | >τ } (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) π − (cid:12)(cid:12)(cid:12) p − (cid:88) pj =1 {| W j | >τ } − p − (cid:88) j ∈ I {| W j | >τ } (cid:12)(cid:12)(cid:12) = π − (cid:12)(cid:12)(cid:12) p − (cid:88) j ∈ I {| W j | >τ } − πp − (cid:88) j ∈ I {| W j | >τ } (cid:12)(cid:12)(cid:12) ≤ s − (cid:88) j ∈ I {| W j | >τ } + p − (cid:88) j ∈ I {| W j | >τ } . (15)Since τ (cid:29)
First, we show that given $\delta(t) = [\bar\Phi(t)]^\theta$ with $\theta \in [0, 1/2]$, the threshold $c^*_{p,\delta} = C\sqrt{\bar\rho_\Sigma (\log p)^{\theta+1/2}}$, with a large enough constant $C$, satisfies properties (a) $p\,c^*_{p,\delta} > p_0\,c^*_{p_0,\delta}$ and (b) $P(V^*_{p,\delta} > c^*_{p,\delta}) < \alpha$ for all $p$.

Consider property (a). Define $\Sigma_0$ as the covariance matrix of $W_j$, $j \in I_0$, and
$$\bar\rho_{\Sigma_0} = \sum_{i \in I_0}\sum_{j \in I_0} |\Sigma_{ij}|\,\big/\,p_0^2.$$
It can be shown that
$$\bar\rho_\Sigma > p^{-2}\sum_{i \in I_0}\sum_{j \in I_0} |\Sigma_{ij}| = (1-\pi)^2\, p_0^{-2}\sum_{i \in I_0}\sum_{j \in I_0} |\Sigma_{ij}| = (1-\pi)^2\, \bar\rho_{\Sigma_0}.$$
Then it follows that
$$c^*_{p,\delta} > C(1-\pi)\sqrt{\bar\rho_{\Sigma_0} (\log p)^{\theta+1/2}} > (1-\pi)\,c^*_{p_0,\delta} = (p_0/p)\,c^*_{p_0,\delta},$$
and property (a) is verified.

Next consider property (b). By Chebyshev's inequality and direct calculation,
$$P\big(V^*_{p,\delta} > c^*_{p,\delta}\big) \le \big(c^*_{p,\delta}\big)^{-2}\, E\big([V^*_{p,\delta}]^2\big) = \big(c^*_{p,\delta}\big)^{-2}\, E\Bigg[\max_{t \in T}\bigg(\frac{|\bar W_p(t) - 2\bar\Phi(t)|}{[\bar\Phi(t)]^\theta}\bigg)^{2}\Bigg].$$
Let $A(t) = [\bar\Phi(t)]^{-2\theta}\big(\bar W_p(t) - 2\bar\Phi(t)\big)^2$. It can be shown that
$$E\Big(\max_{t \in T} A(t)\Big) = \int_0^\infty P\Big(\max_{t \in T} A(t) > c\Big)\,dc \le \int_0^\infty \sum_{t \in T} P\big(A(t) > c\big)\,dc = \sum_{t \in T} E[A(t)] \le C\sqrt{\log p}\cdot\max_{t \in T} E[A(t)] = C\sqrt{\log p}\cdot\max_{t \in T}\Big\{[\bar\Phi(t)]^{-2\theta}\,\mathrm{Var}\big(\bar W_p(t)\big)\Big\}.$$
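The Mill's ratio control used repeatedly in these bounds rests on the standard inequalities $\phi(t)\,t/(1+t^2) \le \bar\Phi(t) \le \phi(t)/t$ for $t > 0$, which give $[\bar\Phi(t)]^{-1} \le C\,t\,e^{t^2/2}$ for $t \ge 1$. A small numerical verification (not part of the paper; the constant $2\sqrt{2\pi}$ is one convenient, non-sharp choice):

```python
import numpy as np
from scipy.stats import norm

# Mill's ratio bounds: phi(t)*t/(1+t^2) <= Phibar(t) <= phi(t)/t for t > 0.
t = np.linspace(1.0, 6.0, 501)
phibar = norm.sf(t)
lower = norm.pdf(t) * t / (1 + t**2)
upper = norm.pdf(t) / t
print(np.all(lower <= phibar), np.all(phibar <= upper))  # True True

# Hence 1/Phibar(t) <= C * t * exp(t^2/2) on t >= 1, e.g. with C = 2*sqrt(2*pi).
ratio = 1.0 / (phibar * t * np.exp(t**2 / 2))
print(ratio.max() <= 2 * np.sqrt(2 * np.pi))  # True
```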
The following lemma provides the order of $\mathrm{Var}(\bar W_p(t))$.

Lemma 5.2
For $W_1, \ldots, W_p \sim N_p(0, \Sigma)$ and $\bar\rho_\Sigma$ in (7),
$$\mathrm{Var}\big(\bar W_p(t)\big) = O\big(\bar\rho_\Sigma\, e^{-t^2/2}\big). \quad (16)$$
Therefore,
$$[\bar\Phi(t)]^{-2\theta}\,\mathrm{Var}\big(\bar W_p(t)\big) \le C[\bar\Phi(t)]^{-2\theta}\,\bar\rho_\Sigma\, e^{-t^2/2} \le C\big(t\,e^{t^2/2}\big)^{2\theta}\,\bar\rho_\Sigma\, e^{-t^2/2} \le C(\log p)^\theta\,\bar\rho_\Sigma\, e^{(\theta - 1/2)t^2} \le C\,\bar\rho_\Sigma\,(\log p)^\theta,$$
where the first step above is by Lemma 5.2, the second step is by Mill's ratio, the third step is by $t \in T$, and the last step is by $\theta \in [0, 1/2]$. It follows that
$$P\big(V^*_{p,\delta} > c^*_{p,\delta}\big) \le C\big(c^*_{p,\delta}\big)^{-2}\,\bar\rho_\Sigma\,(\log p)^{\theta+1/2},$$
and $c^*_{p,\delta} = O\big(\sqrt{\bar\rho_\Sigma\,(\log p)^{\theta+1/2}}\big)$ follows.

Next, we demonstrate the upper bound property of $\hat\pi^*_\delta$. Based on the general results in Lemma 2.1, it is sufficient to show
$$\delta^{-1}\bigg(\frac{\pi}{c^*_{p,\delta}}\bigg) \le C\,\bar\Phi^{-1}\bigg(\frac{\pi^{1/\theta}}{\bar\rho_\Sigma^{1/(2\theta)}\,(\log p)^{(\theta+1/2)/(2\theta)}}\bigg),$$
which is straightforward given $\delta(t) = [\bar\Phi(t)]^\theta$.

Proof of Lemma 5.2. Write
$$\mathrm{Var}\big(\bar W_p(t)\big) = p^{-2}\sum_{j=1}^p \mathrm{Var}\big(1_{\{|W_j|>t\}}\big) + p^{-2}\sum_{i \ne j} \mathrm{Cov}\big(1_{\{|W_i|>t\}},\, 1_{\{|W_j|>t\}}\big).$$
By Mill's ratio,
$$p^{-2}\sum_{j=1}^p \mathrm{Var}\big(1_{\{|W_j|>t\}}\big) \le p^{-1}\,2\bar\Phi(t)\big(1 - 2\bar\Phi(t)\big) \le Cp^{-1} e^{-t^2/2}.$$
For $p^{-2}\sum_{i \ne j} \mathrm{Cov}(1_{\{|W_i|>t\}}, 1_{\{|W_j|>t\}})$, we have
$$\mathrm{Cov}\big(1_{\{|W_i|>t\}},\, 1_{\{|W_j|>t\}}\big) = 4\bigg(\int_{-\infty}^{-t}\int_{-\infty}^{-t} f(x, y)\,dx\,dy - \int_{-\infty}^{-t}\phi(x)\,dx\int_{-\infty}^{-t}\phi(y)\,dy\bigg) \le C|\Sigma_{ij}|\,e^{-t^2/2},$$
where the last step follows from Corollary 2.1 in Li and Shao (2002). Combining the above with the definition of $\bar\rho_\Sigma$ results in (16).
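The order in (16) can be probed by simulation. The sketch below is illustrative only, not from the paper; it uses an equicorrelated $\Sigma$ with $\Sigma_{ij} = \rho$ for $i \ne j$, for which $\bar\rho_\Sigma = (p + p(p-1)\rho)/p^2$, and checks that the Monte Carlo variance of $\bar W_p(t)$ stays within a modest multiple of $\bar\rho_\Sigma e^{-t^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, rho, t, B = 2000, 0.2, 2.0, 2000  # illustrative choices

# Equicorrelated Sigma: Sigma_ij = rho for i != j, 1 on the diagonal.
rho_bar = (p + p * (p - 1) * rho) / p**2  # MAC-type average of |Sigma_ij|

wbar = np.empty(B)
for b in range(B):
    u = rng.standard_normal()                                   # shared factor
    W = np.sqrt(rho) * u + np.sqrt(1 - rho) * rng.standard_normal(p)
    wbar[b] = np.mean(np.abs(W) > t)                            # \bar W_p(t)

var_mc = wbar.var()
bound = rho_bar * np.exp(-t**2 / 2)  # the order claimed in (16)
print(var_mc <= 5 * bound)  # True
```

Under strong equicorrelation the dominant variance contribution comes from the shared factor, which is exactly what the $\bar\rho_\Sigma$ term in (16) captures.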
Finally, consider $\delta(t) = [\bar\Phi(t)]^\theta$ with $\theta \in (1/2, 1]$. It is easy to see that $c^*_{p,\delta} = C\sqrt{\log p}$ satisfies property (a) $p\,c^*_{p,\delta} > p_0\,c^*_{p_0,\delta}$. For property (b), by Markov's inequality,
$$P\big(V^*_{p,\delta} > c^*_{p,\delta}\big) \le \big(c^*_{p,\delta}\big)^{-1}\,E\bigg(\max_{t \in T}\frac{|\bar W_p(t) - 2\bar\Phi(t)|}{[\bar\Phi(t)]^\theta}\bigg).$$
Let $B(t) = [\bar\Phi(t)]^{-\theta}\,|\bar W_p(t) - 2\bar\Phi(t)|$, and by similar arguments as in Section 5.3, we have
$$E\Big[\max_{t \in T} B(t)\Big] \le C\sqrt{\log p}\cdot\max_{t \in T} E[B(t)].$$
Moreover,
$$E[B(t)] \le [\bar\Phi(t)]^{-\theta}\big(E[\bar W_p(t)] + 2\bar\Phi(t)\big) = 4[\bar\Phi(t)]^{1-\theta} \le 4$$
for $\theta \in (1/2, 1]$. Consequently,
$$P\big(V^*_{p,\delta} > c^*_{p,\delta}\big) \le C\big(c^*_{p,\delta}\big)^{-1}\sqrt{\log p} < \alpha,$$
where the last step is by $c^*_{p,\delta} = C\sqrt{\log p}$ with a large enough constant $C$.

Next, we demonstrate the upper bound property of $\hat\pi^*_\delta$ with $\delta(t) = [\bar\Phi(t)]^\theta$, $\theta \in (1/2, 1]$. It is sufficient to show
$$\delta^{-1}\bigg(\frac{\pi}{c^*_{p,\delta}}\bigg) \le C\,\bar\Phi^{-1}\bigg(\frac{\pi^{1/\theta}}{(\log p)^{1/(2\theta)}}\bigg),$$
which is straightforward given $\delta(t) = [\bar\Phi(t)]^\theta$.

References
Arias-Castro, E., E. J. Candès, and Y. Plan (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. The Annals of Statistics 39 (5), 2533–2556.
Bradic, J., J. Fan, and W. Wang (2011). Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. Journal of the Royal Statistical Society: Series B 73 (3), 325–349.
Bühlmann, P., M. Kalisch, and L. Meier (2014). High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and Its Application 1, 255–278.
Cai, T., W. Liu, and Y. Xia (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association 108 (501), 265–277.
Cai, T. and W. Sun (2017). Optimal screening and discovery of sparse signals with applications to multistage high-throughput studies. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79 (1), 197.
Cai, T. T., J. Jin, and M. G. Low (2007). Estimation and confidence sets for sparse normal mixtures. The Annals of Statistics 35 (6), 2421–2449.
Chen, X. (2019). Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations. Journal of Multivariate Analysis 173, 724–744.
Efron, B. (2007). Size, power and false discovery rates. The Annals of Statistics 35 (4), 1351–1377.
Fan, J., X. Han, and W. Gu (2012). Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association 107 (499), 1019–1035.
Finner, H. and V. Gontscharuk (2009). Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (5), 1031–1048.
Genovese, C. and L. Wasserman (2004). A stochastic process approach to false discovery control. The Annals of Statistics 32, 1035–1061.
Jeng, X. J. and X. Chen (2019). Variable selection via adaptive false negative control in linear regression. Electronic Journal of Statistics 13 (2), 5306–5333.
Jeng, X. J., Z. J. Daye, W. Lu, and J.-Y. Tzeng (2016). Rare variants association analysis in large-scale sequencing studies at the single locus level. PLoS Computational Biology 12 (6), e1004993.
Jeng, X. J. and Y. Hu (2020). Dual control of testing errors in high-dimensional data analysis. arXiv preprint arXiv:2006.15667.
Jeng, X. J., T. Zhang, and J.-Y. Tzeng (2019). Efficient signal inclusion with genomic applications. Journal of the American Statistical Association 114 (528), 1787–1799.
Jin, J. (2008). Proportion of non-zero normal means: universal oracle equivalences and uniformly consistent estimators. Journal of the Royal Statistical Society: Series B 70 (3), 461–493.
Jin, J. and T. T. Cai (2007a). Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons.
Journal of the American Statistical Association 102 (478), 495–506.
Li, W. V. and Q.-M. Shao (2002). A normal comparison inequality and its applications.
Probability Theory and Related Fields 122 (4), 494–508.
Meinshausen, N. and J. Rice (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. The Annals of Statistics 34 (1), 373–393.
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (3), 479–498.
Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics 31 (6), 2013–2035.
Westfall, P. H. and S. S. Young (1993).