FDR control for multiple hypothesis testing on composite nulls

Abstract

Multiple hypothesis testing often involves composite nulls, i.e., nulls that are associated with two or more distributions. In many cases, it is reasonable to assume that there is a prior distribution on these distributions, although the prior is unknown. When the number of distributions under true nulls is finite, we show that under this assumption the false discovery rate (FDR) can be controlled using $p$-values computed under constraints imposed by the empirical distribution of the observations. Compared with FDR control using $p$-values defined as the maximum significance level over all null distributions, the proposed FDR control can have substantially more power.



By Zhiyi Chi ∗ Department of Statistics, University of Connecticut


1. Introduction.

In hypothesis testing, a relatively simple case is where the data associated with true nulls and those associated with false nulls each follow a common distribution ("simple versus simple") [4, 6]. In many cases, however, either the data associated with true nulls follow different distributions ("composite nulls") or those associated with false nulls follow different distributions ("composite alternatives"). In the current literature on multiple testing, once appropriate test statistics such as $p$-values are computed, testing procedures based on the statistics usually do not distinguish between the simple and composite cases [11, 10, 16, 7, 14]. At the time a procedure is applied, it has only the test statistics available. For this reason, how the test statistics are defined plays an important role in the overall performance of the procedure.

For composite nulls, $p$-values are usually defined as maximum probabilities over all null distributions [10]. Alternatively, following the random-effects extension for composite alternatives [6], a Bayesian approach to calculating $p$-values can be used. Specifically, one assumes that there is a known prior distribution on the null distributions. Since the overall distribution of the data associated with true nulls is then determined by an integral of the null distributions weighted by the prior, the composite case is essentially reduced to the simple one.

The focus of this article lies between the above two approaches. The underlying premise is that there is a prior distribution on the null distributions, but the prior is unknown. The basic observation is that, in the presence of a large number of nulls, the empirical distribution of the data provides useful information on the prior. More specifically, the mixture of the null distributions, multiplied by the population fraction of true nulls, is dominated by the empirical distribution of the data plus a small margin. This constrains the set of possible priors. We explore this observation for the case where there are only a finite number of null distributions. On the one hand, the $p$-values are still calculated as maximum probabilities. On the other hand, the maximization is over a range of linear combinations of the null distributions, with the coefficients being constrained. As a result, the $p$-values can be computed by linear programming.

The article does not consider the case of composite alternatives. The position here is that, since oftentimes no information on the distributions under false nulls is available, it is sensible to regard data associated with false nulls as being sampled from a single overall distribution.

Although our focus is the evaluation of $p$-values under constraints, we start with Section 2 on FDR control using maximum probabilities without constraints. That the Benjamini-Hochberg (BH) procedure can control the FDR in this case is known [3]. The purpose of the section is to set up a suitable framework for the following sections, by giving a more general description of the BH procedure and indicating where constrained maximization may be introduced.

Section 3 considers two ways to compute $p$-values. The first is sequential: the $p$-value of each observation is obtained under linear constraints imposed by observations whose $p$-values have already been computed. In the second, the $p$-values are in principle computed for all observations simultaneously under the linear constraints imposed by the entire data. Both types of $p$-values are then processed by the BH procedure. Analytically, it is easier to establish FDR control based on the first type of $p$-values because the sequential computation allows one to use a stopping time argument [15]. On the other hand, since more constraints are imposed on the second type of $p$-values, one might expect them to lead to more improvement in multiple testing. However, the simulation study reported in Section 4 indicates that the two types of $p$-values lead to similar performance of multiple testing. Some possible explanations for this are given at the end of Section 4. The study shows that the BH procedure is substantially more powerful when using either of the two types of $p$-values than when using $p$-values computed by the usual unconstrained maximization. In addition to power, we also compare the FDR and positive FDR (pFDR) realized by the $p$-values.

The results in Section 4 indicate that in general, for the case of composite nulls, the prior on the null distributions cannot be estimated consistently. Basically, this is because the constraints imposed by the data cannot yield exact details of the prior, and also because the above two ways of evaluating $p$-values usually select different linear combinations of the null distributions for different observations. This is in contrast to the simple case, where the fraction of true nulls can be estimated consistently [2, 8, 15]. Conceptually, it is of interest to ask whether there are conditions that allow the prior on the null distributions to be estimated consistently. In Section 5, for the case where there are only a finite number of null distributions, a necessary and sufficient condition is given for the consistent estimation of the prior using maximum likelihood estimation (MLE). Note that, in the MLE, the distribution under false nulls is unknown, and the data are treated as though all are sampled from true nulls. An example is given to show that for any finite set of linearly independent null distributions, one can construct a large class of distributions that satisfy the condition.

Section 6 contains a brief discussion. Most technical details are collected in the Appendix.

∗ Research partially supported by NSF DMS 0706048 and NIH MH 68028.
AMS 2000 subject classifications: Primary 62G10, 62H15; secondary 62G20.
Keywords and phrases: Multiple testing, FDR, composite null, empirical process, stopping time, DKW inequality.

1.1. Assumptions and notation.

Let $\{F_\theta, \theta \in \Theta\}$ be a family of distributions on $\mathbb{R}^d$. Given random observations $X_1, \ldots, X_n \in \mathbb{R}^d$, the composite nulls to be tested are

$$H_i : X_i \sim F_\theta \ \text{for some } \theta \in \Theta.$$

Each $F_\theta$ is a null distribution.

Our discussion will be under the following random mixture model. The distribution under false nulls is $G \notin \{F_\theta, \theta \in \Theta\}$ and the fraction of false nulls among all nulls is $a \in (0, 1)$. There is a prior distribution $\nu$ on $\Theta$. The data are sampled as follows. Define a probability measure $\mu$ on $\Theta \cup \{*\}$, where $*$ is any element not in $\Theta$, such that $\mu(\{*\}) = a$ and $\mu(A) = (1 - a)\nu(A)$ for $A \subset \Theta$. Sample $\eta_1, \ldots, \eta_n \overset{\text{iid}}{\sim} \mu$. If $\eta_i = *$, then sample $X_i \sim G$; otherwise, sample $X_i \sim F_{\eta_i}$. Thus $\eta_i$ can be thought of as the identity of $X_i$, indexing the distribution $X_i$ is sampled from.

Throughout we make two assumptions. First, $\nu$ is unknown. Indeed, if $\nu$ is known, then under a true $H_i$, $X_i \sim F = \int F_\theta\, \nu(d\theta)$ and thus the composite null can be reduced to a simple null. Second, $G$ is unknown. This assumption is especially intended for the case where $\Theta$ is finite. Indeed, if $G$ is known, then for $n \gg 1$, both $a$ and $\nu$ can be estimated accurately by the MLE, which reduces the testing problem to one involving only simple nulls.

Recall that for a multiple testing procedure, if $R$ is the number of rejected nulls and $V$ that of rejected true nulls, then

$$\mathrm{FDR} = E\left[\frac{V}{R \vee 1}\right], \qquad \mathrm{pFDR} = E\left[\frac{V}{R} \,\middle|\, R > 0\right].$$

Furthermore, if there are $n$ nulls and $N$ of them are true, then

$$\text{power} = E\left[\frac{R - V}{(n - N) \vee 1}\right].$$
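The random mixture model and the three performance measures can be made concrete with a short simulation. The sketch below is only an illustration under assumed ingredients: the two normal nulls, the normal alternative, the prior weights, the ad hoc cut-off at -3 and the helper names `sample_mixture` and `realized_measures` are hypothetical and are not taken from the article. It returns, for one data set, the realized ratios whose expectations define the FDR and the power; averaging the first ratio over repetitions with R > 0 gives an estimate of the pFDR.

```python
import random

def sample_mixture(n, a, nu, null_samplers, alt_sampler, rng):
    """Draw (eta_i, X_i), i = 1..n, from the random mixture model.

    eta_i is None with probability a (the symbol * in the text, X_i ~ G);
    otherwise eta_i = k with probability (1 - a) * nu[k] and X_i ~ F_k.
    """
    etas, xs = [], []
    for _ in range(n):
        if rng.random() < a:
            etas.append(None)
            xs.append(alt_sampler(rng))
        else:
            k = rng.choices(range(len(nu)), weights=nu)[0]
            etas.append(k)
            xs.append(null_samplers[k](rng))
    return etas, xs

def realized_measures(rejected, etas):
    """Return R, V, V / (R v 1) and (R - V) / ((n - N) v 1) for one data set."""
    R = len(rejected)
    V = sum(1 for i in rejected if etas[i] is not None)   # rejected true nulls
    n = len(etas)
    N = sum(1 for e in etas if e is not None)             # number of true nulls
    return R, V, V / max(R, 1), (R - V) / max(n - N, 1)

rng = random.Random(0)
nulls = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(-1.0, 1.0)]  # illustrative F_1, F_2
alt = lambda r: r.gauss(-4.0, 1.0)                                   # illustrative G
etas, xs = sample_mixture(2000, 0.05, [0.7, 0.3], nulls, alt, rng)

# Ad hoc rejection rule used only to exercise the definitions (not a BH rule).
rejected = [i for i, x in enumerate(xs) if x <= -3.0]
R, V, fdp, tpp = realized_measures(rejected, etas)
```

Here `None` plays the role of the extra element $*$, so the labels `etas` are available only because the data are simulated; a testing procedure sees `xs` alone.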

2. Testing based on maximum probabilities.

Usually, a description of a multiple testing procedure starts with $p$-values, treating them as already available. For our later discussion, it is useful to start with how $p$-values are computed. The $p$-values are absent in the continuous version of our description, but explicit in the discrete version.

Let $\{D_t : t \in \mathcal{I}\}$ be a family of Borel sets in $\mathbb{R}^d$ satisfying the following conditions, where $\mathcal{I} \ne \emptyset$ is an open interval in $\mathbb{R}$.

D1. The family is increasing and right-continuous, i.e. $D_t = \bigcap_{s > t,\, s \in \mathcal{I}} D_s$ for $t \in \mathcal{I}$.
D2. $\bigcup_{t \in \mathcal{I}} D_t = \mathbb{R}^d$.
D3. $G\big(\bigcap_{t \in \mathcal{I}} D_t\big) = F_\theta\big(\bigcap_{t \in \mathcal{I}} D_t\big) = 0$, $\theta \in \Theta$.

For each $\theta \in \Theta$, define

$$\phi_\theta(t) = \begin{cases} F_\theta(D_t) & \text{if } t \in \mathcal{I}, \\ 0 & \text{if } t \le \inf \mathcal{I}, \\ 1 & \text{if } t \ge \sup \mathcal{I}, \end{cases} \qquad (2.1)$$

i.e., $\phi_\theta(t)$ is the significance level of the region $D_t$ under $F_\theta$. By D2 and D3, $\phi_\theta$ is nondecreasing and continuous at $\inf \mathcal{I}$ and $\sup \mathcal{I}$. Denote

$$M(t) = \sup_\theta \phi_\theta(t), \qquad (2.2)$$

i.e., $M(t)$ is the significance level of $D_t$ associated with $\{F_\theta, \theta \in \Theta\}$. It is nondecreasing with $M(t) = 0$ for $t \le \inf \mathcal{I}$ and $M(t) = 1$ for $t \ge \sup \mathcal{I}$.

We can regard $M(t)$ as $\sup_\mu \int \phi_\theta(t)\, d\mu(\theta)$, where the supremum is taken over all possible probability measures $\mu$ on $\Theta$. By our assumption, there is a prior $\nu$ on $\Theta$. If there is no information on the value of $\nu$, then the supremum is justified. If, on the other hand, it is known that $\nu$ satisfies certain conditions, then it makes sense to use the conditions to constrain the supremum, even though the conditions may not uniquely determine $\nu$. This may yield an $M(t)$ closer to $\int \phi_\theta(t)\, d\nu(\theta)$, which improves the performance of multiple testing.

Once $M(t)$ is in place, the BH procedure can be applied. The procedure can be described in two ways. The continuous version features a stopping time that may simplify the analysis of FDR control (cf. [15]), while the discrete one is easier to implement. For $t \in \mathcal{I}$, denote

$$R_n(t) = \sum_{i=1}^n \mathbf{1}\{X_i \in D_t\}, \qquad V_n(t) = \sum_{i=1}^n \mathbf{1}\{X_i \in D_t,\ \eta_i \in \Theta\}.$$

Procedure 2.1. Given control parameter $\alpha \in (0, 1)$, let

$$I_R = \left\{ t \in \mathcal{I} : \frac{M(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n} \right\}.$$

If $I_R \ne \emptyset$, set $\tau = \sup I_R$ and reject $H_i$ if and only if $X_i \in D_\tau$. Otherwise, set $\tau = \inf \mathcal{I}$ and accept all $H_i$.

To describe the discrete version of Procedure 2.1, define

$$s(x) = \inf\{t \in \mathcal{I} : x \in D_t\}, \qquad s_i = s(X_i), \quad i = 1, \ldots, n. \qquad (2.3)$$

By D2, the set in (2.3) is nonempty, so $s(x)$ is well-defined and $s(x) < \sup \mathcal{I}$.

Proposition 2.1. Under D1-3, the following statements hold.
1) $s_i \in \mathcal{I}$ almost surely.
2) For any $t \in \mathcal{I}$, $s_i \le t \iff X_i \in D_t$ and hence $R_n(t) = \sum_i \mathbf{1}\{s_i \le t\}$.
3) Given $\theta$, if $X_i \sim F_\theta$, then $s_i \sim \phi_\theta$.
4) For $i = 1, \ldots, n$, the distribution function of $s_i$ is $Q(t) = (1 - a) \int \phi_\theta(t)\, \nu(d\theta) + a\, G(D_t)$.
5) If $\phi_\theta \in C(\mathbb{R})$ for all $\theta$, then $M(t)$ is left-continuous.

By Proposition 2.1, $\phi_\theta(s_i)$ is the $p$-value of $X_i$ under $F_\theta$. Therefore, $M(s_i)$ can be used as a $p$-value under the composite null $H_i$ [10].

Procedure 2.2. Let $s_{(1)} \le \ldots \le s_{(n)}$ be the order statistics of $s_i$ and $s_{(0)} = \inf \mathcal{I}$. Reject $H_i$ if and only if $s_i \le s_{(R)}$, where

$$R = \max\left\{ i \ge 0 : \frac{M(s_{(i)})}{\alpha} \le \frac{i}{n} \right\}.$$

Proposition 2.2. Suppose $\phi_\theta \in C(\mathbb{R})$ for all $\theta$. Then Procedures 2.1 and 2.2 are the same, and both have $\mathrm{FDR} \le (1 - a)\alpha$.

In single hypothesis tests, nested rejection regions are usually indexed by significance level. For FDR control, other indices can be used. This allows one to think about the rejection regions in more natural terms and also avoids problems when different regions have the same significance level.

Example 2.1. Suppose $X_i \in \mathbb{R}$. To use lower-tail probabilities as $p$-values, set $D_t = (-\infty, t]$, $t \in \mathcal{I} = \mathbb{R}$. Then $s_i = X_i$ and $\phi_\theta(s_i) = F_\theta(X_i)$. To use upper-tail probabilities as $p$-values, set $D_t = [-t, \infty)$, $t \in \mathcal{I} = \mathbb{R}$. Then $s_i = -X_i$ and $\phi_\theta(s_i) = F_\theta([-s_i, \infty)) = F_\theta([X_i, \infty))$. Suppose each $F_\theta$ is continuous at 0. If we use $D_t = [-t, t]$, $t \in \mathcal{I} = [0, \infty)$, then $s_i = |X_i|$ and $\phi_\theta(s_i) = F_\theta([-|X_i|, |X_i|])$.
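The discrete version of the BH procedure with maximum probabilities can be sketched in a few lines for the lower-tail case of the example, where $s_i = X_i$ and the $p$-value of $X_i$ is $\max_k F_k(X_i)$. The code below is a minimal illustration, not the article's implementation: the two unit-variance normal nulls, the five data points and the function names `bh_reject` and `max_pvalues` are assumptions chosen so that the result can be checked by hand.

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_pvalues(xs, null_cdfs):
    """Lower-tail p-values under a composite null: max over the null CDFs."""
    return [max(F(x) for F in null_cdfs) for x in xs]

def bh_reject(pvals, alpha):
    """Step-up rule: R = max{i : p_(i) <= alpha * i / n}; reject the R smallest."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    R = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / n:
            R = rank
    return set(order[:R])

# Hypothetical nulls N(0, 1) and N(-1, 1); small values of x are evidence against H_i.
null_cdfs = [lambda x: norm_cdf(x, 0.0), lambda x: norm_cdf(x, -1.0)]
xs = [-4.0, -3.5, 0.0, 0.5, 1.0]
pvals = max_pvalues(xs, null_cdfs)
rejected = bh_reject(pvals, alpha=0.25)   # indices of the rejected nulls
```

With these inputs the two most negative observations have $p$-values of roughly 0.0013 and 0.0062, below the thresholds 0.05 and 0.10, while the remaining $p$-values exceed 0.8, so exactly the first two nulls are rejected.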

3. Testing based on constrained maximum probabilities.

3.1. Outlines.

Testing using maximum probabilities can be very conservative. Our goal is to find alternative methods when $\Theta$ is a finite set $\{\theta_k, k = 1, \ldots, L\}$. The probability measure $\nu$ on $\Theta$ can now be specified by $\boldsymbol{\nu} = (\nu_1, \ldots, \nu_L)^\top$ with $\nu_k = \nu(\{\theta_k\})$. Henceforth, a letter in boldface stands for an $L$-dimensional vector. Denote $F_k = F_{\theta_k}$ and $\phi_k(t) = \phi_{\theta_k}(t)$. In this section, we assume that all $F_k$ and hence all $\phi_k(t)$ are continuous. Denote $F_n(t) = R_n(t)/n$, i.e. the empirical distribution function based on $s_1, \ldots, s_n$ defined in (2.3).

Instead of $M(t) = \max_k \phi_k(t)$ as in Procedure 2.1, for finite $\Theta$ the proposed functions have the general form

$$M_n(t) = \sup\{\mathbf{c}^\top \boldsymbol{\phi}(t) : \mathbf{c} \in C,\ \mathbf{c}^\top \boldsymbol{\phi} \in A_{n,t}\},$$

where $C$ is a suitable subset of

$$\Delta = \{\mathbf{c} \in [0, 1]^L : c_1 + \cdots + c_L \le 1\}$$

and, for $n \ge 1$ and $t \in \mathcal{I}$, $A_{n,t}$ is a family of functions on $\mathcal{I}$. In general, $C$ is constructed based on deterministic knowledge on $\boldsymbol{\nu}$ and $a$. On the other hand, $A_{n,t}$ is constructed based on the data and hence both $M_n(t)$ and $A_{n,t}$ may be random. If $C = \Delta$ and $A_{n,t}$ is the entire family of functions on $\mathcal{I}$, then $M_n(t)$ is $\max_i \phi_i(t)$ and we recover Procedure 2.1. By adding conditions to make $C$ or $A_{n,t}$ smaller, $M_n(t)$ can be smaller than $\max_i \phi_i(t)$, which may result in higher power. In particular, if $C = \{\boldsymbol{\nu}\}$, then $M_n(t) = \boldsymbol{\nu}^\top \boldsymbol{\phi}(t)$, which reduces the testing problem to the one for simple nulls.

Oftentimes, there is no direct knowledge on $\boldsymbol{\nu}$ or $a$, so one has to set $C = \Delta$; constraints on $\mathbf{c}$ are indirectly imposed through the condition $\mathbf{c}^\top \boldsymbol{\phi} \in A_{n,t}$. Then $M_n(t)$ takes the form

$$M_n(t) = \sup\{\mathbf{c}^\top \boldsymbol{\phi}(t) : \mathbf{c} \in \Delta,\ \mathbf{c}^\top \boldsymbol{\phi} \in A_{n,t}\}. \qquad (3.1)$$

In Section 4, we will consider the case where $C$ can be chosen smaller than $\Delta$, and in Section 5, a case where substantial knowledge on $\boldsymbol{\nu}$ can be attained by estimation.

Recall that $(1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(t)$ is the population fraction of true nulls with $X_i \in D_t$. In order for $M_n(t)$ not to underestimate this fraction, a basic requirement is $M_n(t) \ge (1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(t)$. In general, since $A_{n,t}$ is random, this requires that $A_{n,t}$ have the property that as long as $n$ is large enough, with probability close to 1, $(1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi} \in A_{n,t}$ for all $t \in \mathcal{I}$.

A basic fact to use in order to satisfy the condition is that, almost surely, as $n \to \infty$,

$$\sup_t |F_n(t) - Q(t)| \to 0,$$

where $Q(t)$ is the distribution function of $s_i = s(X_i)$ defined in (2.3), i.e.

$$Q(t) = (1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(t) + a\, G(D_t).$$

Then, with probability close to 1, $(1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(t)$ is less than $F_n(t)$ plus a small margin. Moreover, $Q(t) - (1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(t) = a\, G(D_t)$ is increasing in $t$. Then for $n \gg 1$, with probability close to 1,

$$F_n(u) - (1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(u) > F_n(v) - (1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(v) - \epsilon_n, \quad \text{for all } u > v.$$

Therefore, in calculating $M_n(t)$, the maximization can be constrained to those $\mathbf{c}$ such that, when they replace $(1 - a)\boldsymbol{\nu}$, the inequalities still hold.

3.2. Construction using data sequentially.

Given the relative ease of establishing FDR control by using a stopping time as the random cut-off for rejection, we first consider a construction of $A_{n,t}$ that allows a stopping time to be defined.

Incorporating the facts discussed just now, a basic form of $A_{n,t}$ is

$$A_{n,t} = \left\{ h \in C(\mathcal{I}) : \begin{array}{l} h(s_i) \le F_n(s_i) + \epsilon_n \ \text{for } s_i \ge t, \\ F_n(t_2) - F_n(t_1) \ge h(t_2) - h(t_1) - \epsilon_n \ \text{for } t_1, t_2 \in T_n \text{ with } t \le t_1 < t_2 \end{array} \right\}, \qquad (3.2)$$

where $T_n \subset \mathcal{I}$ is a finite set of points. Although $T_n$ can contain any number of points, to reduce computation, the number of points in $T_n$ needs to be relatively small.

It is easy to see that $M_n(t) = 0$ if $t \le \inf \mathcal{I}$. Some other useful properties of $M_n(t)$ are as follows.

Lemma 3.1. $M_n$ is always nondecreasing. Furthermore, if $\phi_i \in C(\mathbb{R})$ for all $i$, then almost surely, 1) $M_n$ is continuous at every $t$ other than $s_1, \ldots, s_n$ and 2) it is left-continuous and has a right-hand limit at each $s_i$.

The continuous and discrete versions of the BH procedure using $M_n(t)$ are described below. As with Procedures 2.1 and 2.2, the two versions are equivalent. As in Procedure 2.1, the random variable $\tau$ in the continuous version is a stopping time.

Procedure 3.1. Given control parameter $\alpha \in (0, 1)$, let

$$I_R = \left\{ t \in \mathcal{I} : \frac{M_n(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n} \right\}.$$

If $I_R \ne \emptyset$, set $\tau = \sup I_R$ and reject $H_i$ if and only if $s_i \le \tau$. Otherwise, set $\tau = \inf \mathcal{I}$ and accept all $H_i$.

Equivalently, sort $s_i$ into $s_{(1)} \le \ldots \le s_{(n)}$ and set $s_{(0)} = \inf \mathcal{I}$. Reject $H_i$ if and only if $s_i \le s_{(R)}$, where

$$R = \max\left\{ i \ge 0 : \frac{M_n(s_{(i)})}{\alpha} \le \frac{R_n(s_{(i)}) \vee 1}{n} \right\}.$$

For each $i$, $M_n(s_{(i)})$ is the maximum of $\mathbf{c}^\top \boldsymbol{\phi}(s_{(i)})$, with $c_k$ satisfying
1) $c_k \ge 0$, $\sum_k c_k \le 1$;
2) $\mathbf{c}^\top \boldsymbol{\phi}(s_{(j)}) \le F_n(s_{(j)}) + \epsilon_n$ for $j \ge i$;
3) $F_n(t_2) - F_n(t_1) \ge \sum_{k=1}^L c_k [\phi_k(t_2) - \phi_k(t_1)] - \epsilon_n$ for $t_1, t_2 \in T_n$ with $s_{(i)} \le t_1 < t_2$.

All the constraints are linear. As a result, $M_n(s_{(i)})$ can be computed by linear programming. The computation is termed sequential because each $M_n(s_{(i)})$ is computed based on the data no smaller than $s_{(i)}$. Therefore, if we imagine that the $s_{(i)}$ are input one by one, starting with the largest one, then $M_n(s_{(i)})$ can be computed only after all $s_{(j)}$, $j \ge i$, have been input.

The FDR control of Procedure 3.1 is given in the next result. The main tools for the proof are a martingale stopping time argument and the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [12].

Theorem 3.1. Suppose 1) $\phi_i \in C(\mathbb{R})$, 2) $\boldsymbol{\nu}^\top \boldsymbol{\phi}(t) > 0$ for all $t \in \mathcal{I}$ and 3) $G(D_t)$ is continuous in $t$. Then for $n \ge$ […], provided $\exp(-n\epsilon_n^2) \le$ […], Procedure 3.1 satisfies

$$\mathrm{FDR} \le \alpha + 2(1 + |T_n|) \exp(-n\epsilon_n^2) + E\left[\frac{\mathbf{1}\{R > 0\}}{R \vee 1}\right].$$

The bound contains terms in addition to $\alpha$. For appropriate $\epsilon_n$ and $T_n$, the term $2(1 + |T_n|)\exp(-n\epsilon_n^2)$ is $o(1)$ as $n \to \infty$. Under certain conditions, $R$ is of the same order as $n$ and hence the bound shows that the FDR can be asymptotically controlled at $\alpha$. However, the simulation study in Section 4 indicates that usually the realized FDR is substantially lower than $\alpha$, which is reasonable because $M_n(t)$ is an overestimate of $(1 - a)\boldsymbol{\nu}^\top \boldsymbol{\phi}(t)$.

3.3. Construction using entire data.

In place of $A_{n,t}$, which depends on $t$, we can use a single family of functions $A_n$. In order to impose the maximum amount of linear constraints, $A_n$ should incorporate all $X_i$. Based on the same considerations underlying (3.2), we define

$$A_n = \left\{ h \in C(\mathcal{I}) : \begin{array}{l} h(s_i) \le F_n(s_i) + \epsilon_n \ \text{for all } s_i, \\ F_n(t_2) - F_n(t_1) \ge h(t_2) - h(t_1) - \epsilon_n \ \text{for } t_1, t_2 \in T_n \text{ with } t_1 < t_2 \end{array} \right\}. \qquad (3.3)$$

Corresponding to (3.1), for $t \in \mathcal{I}$, define

$$M_n(t) = \sup\{\mathbf{c}^\top \boldsymbol{\phi}(t) : \mathbf{c} \in \Delta,\ \mathbf{c}^\top \boldsymbol{\phi} \in A_n\}. \qquad (3.4)$$

It is easy to see that $M_n$ is nondecreasing. Therefore, corresponding to Procedure 3.1, the following BH procedure obtains.

Procedure 3.2. Given control parameter $\alpha \in (0, 1)$, let

$$I_R = \left\{ t \in \mathcal{I} : \frac{M_n(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n} \right\}.$$

If $I_R \ne \emptyset$, set $\tau = \sup I_R$ and reject $H_i$ if and only if $s_i \le \tau$. Otherwise, set $\tau = \inf \mathcal{I}$ and accept all $H_i$.

Equivalently, sort $s_i$ into $s_{(1)} \le \ldots \le s_{(n)}$ and set $s_{(0)} = \inf \mathcal{I}$. Reject $H_i$ if and only if $s_i \le s_{(R)}$, where

$$R = \max\left\{ i \ge 0 : \frac{M_n(s_{(i)})}{\alpha} \le \frac{R_n(s_{(i)}) \vee 1}{n} \right\}.$$

As in Procedure 3.1, $M_n(s_{(i)})$ can be computed using linear programming. For comparison, we list the constraints for the maximization. For each $i$, $M_n(s_{(i)})$ is the maximum of $\mathbf{c}^\top \boldsymbol{\phi}(s_{(i)})$, with $c_k$ satisfying
1) $c_k \ge 0$, $\sum_k c_k \le 1$;
2) $\mathbf{c}^\top \boldsymbol{\phi}(s_{(j)}) \le F_n(s_{(j)}) + \epsilon_n$ for all $j = 1, \ldots, n$;
3) $F_n(t_2) - F_n(t_1) \ge \sum_{k=1}^L c_k [\phi_k(t_2) - \phi_k(t_1)] - \epsilon_n$ for all $t_1, t_2 \in T_n$ with $t_1 < t_2$.

It is worth pointing out that although the set of constraints on $\mathbf{c}$ is the same for all $s_i$, the value of $\mathbf{c}$ that yields $M_n(s_i)$ differs across $i$ because the objective vectors $\boldsymbol{\phi}(s_i)$ are different.

Unlike in Procedures 2.1 and 3.1, since $\tau$ in Procedure 3.2 is determined by the entire $s_1, \ldots, s_n$, it is not a stopping time.
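As a rough illustration of how the sequential constraints of Procedure 3.1 and the global constraints of Procedure 3.2 turn $M_n(s_{(i)})$ into a small linear program, the sketch below treats a toy case with $L = 2$ normal nulls and lower-tail regions $D_t = (-\infty, t]$. It is not the implementation used in the article: the helper names `maximize_2d` and `constrained_pvalues`, the brute-force vertex enumeration (workable only because $L = 2$), the margin `eps`, the simulated data, and the omission of the increment constraints on $T_n$ are all assumptions made for brevity.

```python
import math
import random
from itertools import combinations

def norm_cdf(x, mu=0.0):
    """Distribution function of N(mu, 1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def maximize_2d(obj, rows, tol=1e-9):
    """Maximize obj . c over {c in R^2 : a . c <= b for every (a, b) in rows}.

    Brute force: a bounded, feasible 2-D linear program attains its maximum
    at a vertex, i.e. at the intersection of two constraint lines.
    Returns None if no feasible vertex is found.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue                                  # parallel lines
        c = ((b1 * a2[1] - a1[1] * b2) / det,         # Cramer's rule
             (a1[0] * b2 - b1 * a2[0]) / det)
        if all(a[0] * c[0] + a[1] * c[1] <= b + tol for a, b in rows):
            val = obj[0] * c[0] + obj[1] * c[1]
            if best is None or val > best:
                best = val
    return best

def constrained_pvalues(xs, means, eps, sequential):
    """M_n(x_i) under constraints 1) and 2); constraint 3) on T_n is omitted."""
    n = len(xs)
    Fn = {x: (r + 1) / n for r, x in enumerate(sorted(xs))}   # empirical CDF
    phi = lambda x: (norm_cdf(x, means[0]), norm_cdf(x, means[1]))
    # c_1 >= 0, c_2 >= 0, c_1 + c_2 <= 1
    base = [((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0), ((1.0, 1.0), 1.0)]
    out = []
    for x in xs:
        # Sequential: only observations y >= x constrain c; global: all do.
        rows = base + [(phi(y), Fn[y] + eps) for y in xs
                       if (y >= x or not sequential)]
        out.append(maximize_2d(phi(x), rows))
    return out

rng = random.Random(1)
means = (0.0, -1.0)                                       # hypothetical nulls
xs = [rng.gauss(rng.choice(means), 1.0) for _ in range(36)]
xs += [rng.gauss(-3.0, 1.0) for _ in range(4)]            # hypothetical false nulls
p_seq = constrained_pvalues(xs, means, eps=0.05, sequential=True)
p_glb = constrained_pvalues(xs, means, eps=0.05, sequential=False)
p_max = [max(norm_cdf(x, m) for m in means) for x in xs]
```

Because the global feasible set is contained in the sequential one, which in turn is contained in $\Delta$, the three $p$-values satisfy `p_glb[i] <= p_seq[i] <= p_max[i]` up to numerical tolerance. For realistic $n$ and $L$ a proper linear programming solver would replace the vertex enumeration.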
Because the martingale stopping time argument cannot be used to establish FDR control for finite $n$, we will work out an asymptotic statement instead.

For $s \in \mathbb{R}$ and $S \subset \mathbb{R}$, denote the distance from $s$ to $S$ by $d(s, S) = \inf\{|s - t| : t \in S\}$. Define $\delta(S, T) = \sup\{d(s, S) : s \in T\}$ for $S, T \subset \mathbb{R}$. A sequence $S_n$ of finite sets is said to be increasingly dense in $T$ if for any $r > 0$, $\delta(S_n, T \cap [-r, r]) \to 0$ as $n \to \infty$.

Theorem 3.2. Suppose 1) all $\phi_i$ are continuous and $\mathbf{c}^\top \boldsymbol{\phi}$ is strictly increasing in $\mathcal{I}$ […], 2) $G(D_t)$ is continuous in $t$, and 3) as $n \to \infty$, $\epsilon_n \to 0$, $n\epsilon_n^2 \to \infty$ and $T_n$ is increasingly dense in $\mathcal{I}$. Then, under Assumption A given below, for Procedure 3.2, $\lim_{n \to \infty} \mathrm{FDR} \le \alpha$. Furthermore, asymptotically the procedure is equivalent to the one that rejects $H_i$ if and only if $s_i \le t^*$, where $t^*$ is defined in (3.6) below.

Intuitively, as $n \to \infty$, in a certain sense $A_n$ should tend to $A = \{h \in C(\mathcal{I}) : Q - h \ge 0\}$. Consequently, $M_n(t)$ should tend to

$$m(t) = \sup\{\mathbf{c}^\top \boldsymbol{\phi}(t) : \mathbf{c} \in \Delta,\ \mathbf{c}^\top \boldsymbol{\phi} \in A\}. \qquad (3.5)$$

If this is true, then, as in [6], the asymptotics of the FDR as $n \to \infty$ may be characterized by a fixed point derived from $m(t)$ and $Q(t)$. Let

$$t^* = \sup\{t \in \mathcal{I} : m(t) \le \alpha Q(t)\}. \qquad (3.6)$$

Assumption A. $t^* \in \mathcal{I}$ and there is $t_0 < t^*$ such that $m(t) < \alpha Q(t)$ on $(t_0, t^*)$.
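The margin $\epsilon_n$ used throughout this section rests on the uniform closeness of $F_n$ to $Q$. A quick numerical check of that fact, using the standard form of the DKW inequality, $P(\sup_t |F_n(t) - Q(t)| > \epsilon) \le 2\exp(-2n\epsilon^2)$, can be sketched as follows. When $Q$ is continuous, the values $Q(s_i)$ are iid uniform on $(0, 1)$, so a uniform sample suffices. The function names, the sample size and the tolerance `delta` are illustrative assumptions, and the constants in the article's own bounds may differ from the ones used here.

```python
import math
import random

def dkw_margin(n, delta):
    """Smallest eps with 2 * exp(-2 * n * eps**2) <= delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def sup_distance_uniform(sample):
    """sup_t |F_n(t) - t| for a sample from Uniform(0, 1).

    The supremum is attained at a jump of the empirical CDF, so it is
    enough to look just before and just after each order statistic.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)
n = 5000
sample = [rng.random() for _ in range(n)]
eps = dkw_margin(n, delta=1e-6)
dist = sup_distance_uniform(sample)
within_margin = dist <= eps
```

For $n = 5000$ and `delta = 1e-6` the margin is about 0.038, which gives a sense of how loose or tight the constraint $h(s_i) \le F_n(s_i) + \epsilon_n$ is at the sample size used in Section 4.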

4. Numerical study.

4.1. Setup.

Because the properties of $M_n(t)$ in (3.1) and (3.4) are hard to keep track of, it is difficult to analyze the power and pFDR of Procedures 3.1 and 3.2. We resort to numerical simulations to get a handle on these two quantities. For comparison, Procedure 2.1 and the BH procedure with the prior probabilities $\nu_1, \ldots, \nu_L$ being known are also included.

We only consider univariate observations. To use lower-tail $p$-values, we set $D_t = (-\infty, t]$. By (2.3), if an observation $X$ takes value $x$, then $s(X) = x$ and hence $\phi_k(s(X)) = F_k(x)$, the left-tail $p$-value of $X$ under $F_k$. Also, given observations $X_1, \ldots, X_n$, since $R_n(t) = \sum_{i=1}^n \mathbf{1}\{X_i \in D_t\}$, $R_n(X_i)$ is the rank of $X_i$.

In each simulation, we draw iid samples $X_1, \ldots, X_n$ from a mixture distribution $(1 - a)\sum_{k=1}^L \nu_k F_k(x) + a\, G(x)$, where $G \ne F_1, \ldots, F_L$. To test the nulls

$$H_i : X_i \sim F_k \ \text{for some } k, \quad i = 1, \ldots, n,$$

we compute four types of $p$-values:
1) $p_{i,\mathrm{seq}} = M_n(X_i)$ defined by (3.1) and (3.2), where "seq" in the subscript stands for "sequential", indicating that as the calculation of $M_n(X_i)$ proceeds to smaller $X_i$, linear constraints are added sequentially;
2) $p_{i,\mathrm{glb}} = M_n(X_i)$ defined by (3.3) and (3.4), where "glb" in the subscript stands for "global", indicating that the $M_n(X_i)$ are calculated under linear constraints imposed by all $X_1, \ldots, X_n$;
3) $p_{i,\max} = \max_k F_k(X_i)$;
4) $p_{i,\mathrm{mix}} = \sum_k \nu_k F_k(X_i)$, i.e., the $p$-value of $X_i$ when the values of $\nu_1, \ldots, \nu_L$ are known.

The computation of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ is done by linear programming. By (3.1) and (3.4), both are maxima of $\mathbf{c}^\top \mathbf{F}(X_i) = c_1 F_1(X_i) + \cdots + c_L F_L(X_i)$. In the simulations, the constraints are a little different from the basic ones given in (3.2) and (3.3). However, the analysis is the same.

Denote by $\bar\Gamma^*(z; \alpha, \beta)$ the $z$-th upper-tail quantile of the Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$. For $i = 1, \ldots, n$, to compute $p_{i,\mathrm{seq}}$, the constraints on $c_1, \ldots, c_L$ are
1) $c_k \ge 0$, $\sum_k c_k \le 1$;
2) $\mathbf{c}^\top \mathbf{F}(X_j) \le u(X_j)$ for $X_j \ge X_i$, where $u(X_j) = n$[…]$\,\bar\Gamma^*(n$[…]$;\ R_n(X_i),\ $[…]$)$ if $R_n(X_j) \le n$[…], and $u(X_j) = F_n(X_j) + \epsilon_n$ otherwise;
3) $F_n(t_2) - F_n(t_1) \ge \mathbf{c}^\top [\mathbf{F}(t_2) - \mathbf{F}(t_1)] - \epsilon_n$ for $t_1, t_2 \in T_n$ with $X_i \le t_1 < t_2$.

[…] .95 to any $1/\beta$ with $\beta \in (0,$ […] $\alpha$, Procedure 3.1 using $p_{i,\mathrm{seq}}$ computed under the above constraints obtains

$$\mathrm{FDR} \le \alpha + r_n + E\left[\frac{\mathbf{1}\{R > 0\}}{R \vee 1}\right], \qquad (4.1)$$

with $r_n \to 0$ as $n \to \infty$.

With similar modifications to (3.3), for $i = 1, \ldots, n$, to compute $p_{i,\mathrm{glb}}$, the constraints on $c_1, \ldots, c_L$ are
1) $c_k \ge 0$, $\sum_k c_k \le 1$;
2) $\mathbf{c}^\top \mathbf{F}(X_j) \le u(X_j)$ for all $X_j \ge X_i$; and
3) $F_n(t_2) - F_n(t_1) \ge \mathbf{c}^\top [\mathbf{F}(t_2) - \mathbf{F}(t_1)] - \epsilon_n$ for all $t_1, t_2 \in T_n$ with $t_1 < t_2$.

We then apply the BH procedure to the above $p$-values: specifically, Procedure 3.1 to $p_{i,\mathrm{seq}}$, Procedure 3.2 to $p_{i,\mathrm{glb}}$, Procedure 2.2 to $p_{i,\max}$, and the BH procedure to $p_{i,\mathrm{mix}}$. For each set of $F_1, \ldots, F_L$ and $G$, we draw 1000 iid samples of $X_1, \ldots, X_n$ with $n = 5000$. In this case, $r_n \le$ […] in (4.1); see Appendix A.4. The power, FDR and pFDR of each procedure are calculated by averaging over the samples. Throughout, $a = 0.05$ […]. The $p$-values $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are computed by the R linear programming package glpk.

4.2. Results.

We conduct 5 groups of simulations. The parameters of the simulations are shown in Table 1.

Table 1. Parameters for the simulations. $F_k$ are null distributions, $\nu_k$ their prior probabilities, and $G$ the distribution under false nulls. In each simulation, $a = 0.05$. $t_{n,c}$ denotes the noncentral $t$ distribution with $n$ df and noncentrality $c$.

$F_1, \ldots, F_L$ | $\nu_1, \ldots, \nu_L$ | $G$
N(0, […]), N(−[…], […]), N(−[…], 1) | .75, .15, .1 | N(−[…], […])
t_{[…]}, t_{[…],−[…]}, t_{[…],−[…]} | .75, .15, .1 | t_{[…],−[…]}
N(0, […]), N(−[…], […]), N(−[…], 1) | .6, .25, .15 | N(−[…], […])
N(0, […]), N(−[…], […].[…]), N(−[…], […].5) | .75, .15, .1 | N(−[…], […])
N(μ, […]), μ = 0, −[…], −[…], −[…], −[…] | […] | N(−[…], […])

The results of the simulations are summarized in Table 2. In all the simulations, the control parameter $\alpha$ is equal to 0.25. As expected, because the $p_{i,\mathrm{mix}}$ incorporate $\nu_1, \ldots, \nu_L$, which is information not accessible to the other types of $p$-values, they yield the highest power by a substantial margin. On the other hand, $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ yield substantially higher power than $p_{i,\max}$. This shows that even when $\nu_1, \ldots, \nu_L$ are unknown, by utilizing properties of empirical processes to reduce overestimation of $p$-values, the power of the BH procedure can still be significantly improved.

In agreement with known results [1, 15], the FDR attained by using $p_{i,\mathrm{mix}}$ or $p_{i,\max}$ is close to or lower than $(1 - a)\alpha = 0.2375$. The gap between the FDR attained by $p_{i,\max}$ and $(1 - a)\alpha$ indicates that testing based on $p_{i,\max}$ can be very conservative. On the other hand, in all the simulations, the FDR attained by using $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$ lies between the above two, substantially lower than the first one but substantially higher than the second. Together with the simulation result on power, this shows that multiple testing based on $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ is more conservative than that based on $p_{i,\mathrm{mix}}$, but can be much less conservative than that based on $p_{i,\max}$.

Table 2. Performance of the BH procedure applied to different types of $p$-values in simulations 1–5. In each simulation, the control parameter is set at $\alpha = 0.25$.

Simulations 1 and 2 (columns: power, FDR, pFDR for simul. 1; power, FDR, pFDR for simul. 2)
$p_{i,\mathrm{seq}}$ | .495 | 8.61 × 10^−[…] | […] × 10^−[…] | .236 | 8.57 × 10^−[…] | […] × 10^−[…]
$p_{i,\mathrm{glb}}$ | .494 | 8.60 × 10^−[…] | […] × 10^−[…] | .235 | 8.57 × 10^−[…] | […] × 10^−[…]
$p_{i,\max}$ | .223 | 2.55 × 10^−[…] | […] × 10^−[…] | .035 | 2.87 × 10^−[…] | […] × 10^−[…]
$p_{i,\mathrm{mix}}$ | .770 | .238 | .238 | .634 | .238 | .238

Simulations 3 and 4 (columns: power, FDR, pFDR for simul. 3; power, FDR, pFDR for simul. 4)
$p_{i,\mathrm{seq}}$ | .449 | .103 | .103 | 4.82 × 10^−[…] | […] × 10^−[…] | .465
$p_{i,\mathrm{glb}}$ | .449 | .102 | .102 | 4.82 × 10^−[…] | […] × 10^−[…] | .465
$p_{i,\max}$ | .229 | 3.77 × 10^−[…] | […] × 10^−[…] | […] × 10^−[…] | […] × 10^−[…] | .523
$p_{i,\mathrm{mix}}$ | .685 | .236 | .236 | .144 | .226 | .259

Simulation 5 (columns: power, FDR, pFDR)
$p_{i,\mathrm{seq}}$ | […] × 10^−[…] | […] × 10^−[…] | […] × 10^−[…]
$p_{i,\mathrm{glb}}$ | […] × 10^−[…] | […] × 10^−[…] | […] × 10^−[…]
$p_{i,\max}$ | […] × 10^−[…] | […] × 10^−[…] | […] × 10^−[…]
$p_{i,\mathrm{mix}}$ | .448 | .239 | .239

The conservativeness of multiple testing based on the $p$-values other than $p_{i,\mathrm{mix}}$ does not necessarily help the control of the pFDR. In simulations 1 and 3, for each type of $p$-value, the power is relatively high, implying $P(R \ge 1) \approx 1$ […] $p_{i,\max}$ is low ($\le$ .[…]) […] $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$, the pFDR and FDR are similar to each other. The worst case is simulation 4, where the pFDR is almost twice as high as the control parameter $\alpha = .25$ when $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$ are used. Observe that in simulation 4, negative observations with large absolute values are more likely to be associated with true nulls than with false nulls. This explains the poor control of the pFDR by the BH procedure using $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$.

To see in more detail why $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ in general yield better multiple testing results than $p_{i,\max}$, we compare the plots of the $p$-values. Because all the procedures in the study are variants of the BH procedure, it is more informative to compare the plots of $p_{(i)}/(i/n) = n p_{(i)}/i$ than to compare those of $p_{(i)}$, $i = 1, \ldots, n$, where $p_{(i)}$ is the $i$th smallest $p$-value of a given type. Figure 1 displays the plots of $n\bar p_{(i)}/i$ versus $i/n$ in the simulations, where $\bar p_{(i)}$ is the average over the repetitions. The figure clearly shows that for small $i/n$, $n p_{(i),\mathrm{seq}}/i$ and $n p_{(i),\mathrm{glb}}/i$ are similar to each other, both are substantially lower than $n p_{(i),\max}/i$, and both increase more rapidly than $n p_{(i),\mathrm{mix}}/i$. This is consistent with the observation that multiple testing using $p_{i,\mathrm{seq}}$ and that using $p_{i,\mathrm{glb}}$ perform similarly in terms of power, FDR and pFDR, and in general both have higher power than multiple testing using $p_{i,\max}$ at the same value of $\alpha$.

We next look at how $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are computed by linear programming. For each $p_{i,\mathrm{seq}}$ or $p_{i,\mathrm{glb}}$, denote by $c_{1,i}, \ldots, c_{L,i}$ the values of the coefficients that yield the $p$-value under the corresponding constraints. After the $p$-values are sorted, let $c_{k,(i)}$ be the values corresponding to $p_{(i),\mathrm{seq}}$ or $p_{(i),\mathrm{glb}}$. We plot $c_{k,(i)}$ versus $i/n$ for $k = 1, \ldots, L$. Figure 2 shows the plots for simulations 1 and 5. The plots for the other simulations are qualitatively similar. As can be seen, although $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ in the simulations are similar, this is not the case for the corresponding coefficients $c_{k,i}$.
For each $k$, when $i/n$ is small, $c_{k,(i)}$ for the two types of $p$-values are similar. However, as $i/n$ increases, to compute $p_{i,\mathrm{seq}}$, essentially only one $c_k$ stays nonzero. In all the simulations, this unique $c_k$ is associated with the last distribution of the null, i.e., $F_L$, which also has the smallest sup-norm distance from $G$ among all $F_k$. In contrast, to compute $p_{i,\mathrm{glb}}$, more complicated combinations of $c_1, \ldots, c_L$ are picked. This difference between the coefficients for $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ may be partially attributed to how linear programming is implemented by the package used. However, it also indicates that linear programming may not yield consistent estimation of $c_1, \ldots, c_L$.

Note that in Figure 2, for small $i/n$, the sum of $c_{k,(i)}$ is considerably smaller than 0.4. Since $a = 1 - \sum c_k$, this would imply the fraction of false nulls could be as high as 0.6, which is improbable in many cases. This raises the possibility that, by imposing some constraint on the sum of $c_k$, the power may be improved. Recall that $a = 0.05$ in the simulation study. We simulate the scenario where it is known that $a \le 0.1$. For both $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$, the first constraint on $c_1, \ldots, c_L$ is expanded to become

1') $c_k \ge 0$, $0.9 \le \sum c_k \le 1$.

Denote the $p$-values computed with the expanded linear constraints by $p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$, and those computed previously still by $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$. In Table 3, the power and pFDR of the BH procedure when applied to the $p$-values are compared. In all the cases, the FDR is substantially lower than $(1-a)\alpha$ […] the SD of $\frac{R-V}{n-N}$ over 1000 repetitions is reported. Recall $R$ is the number of rejections, $V$ that of false rejections, $n = 5000$ is the total number of nulls, and $N$ is the number of true nulls. In simulations 1–3, there is a small but significant increase in power by using $p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$. This is not the case in simulations 4 and 5, where the power is very low for all 4 types of $p$-values.

simul.            1                                           2
                  power       SD((R−V)/(n−N))  pFDR           power       SD((R−V)/(n−N))  pFDR
p_{i,seq}         .495        5.20 × 10^−…     … × 10^−…      .236        6.42 × 10^−…     … × 10^−…
p_{i,glb}         .494        5.20 × 10^−…     … × 10^−…      .235        6.42 × 10^−…     … × 10^−…
p'_{i,seq}        .541        5.25 × 10^−…     .103           .296        6.83 × 10^−…     … × 10^−…
p'_{i,glb}        .541        5.25 × 10^−…     .103           .296        6.83 × 10^−…     … × 10^−…

simul.            3                                           4
                  power       SD((R−V)/(n−N))  pFDR           power         SD((R−V)/(n−N))  pFDR
p_{i,seq}         .449        5.35 × 10^−…     .103           4.82 × 10^−…  … × 10^−…        .465
p_{i,glb}         .449        5.35 × 10^−…     .102           4.82 × 10^−…  … × 10^−…        .465
p'_{i,seq}        .473        5.38 × 10^−…     .113           6.26 × 10^−…  … × 10^−…        .479
p'_{i,glb}        .473        5.38 × 10^−…     .113           6.26 × 10^−…  … × 10^−…        .479

simul.            5
                  power       SD((R−V)/(n−N))  pFDR
p_{i,seq}         … × 10^−…   … × 10^−…        … × 10^−…
p_{i,glb}         … × 10^−…   … × 10^−…        … × 10^−…
p'_{i,seq}        … × 10^−…   … × 10^−…        … × 10^−…
p'_{i,glb}        … × 10^−…   … × 10^−…        … × 10^−…

Table 3

Performance of the BH procedure applied to $p$-values computed under different linear constraints: $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are the same as in Table 2, $p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$ are computed with the additional constraint $c_1 + \cdots + c_L \ge 0.9$. For each simulation, $\frac{R-V}{n-N}$ is the fraction of rejected false nulls among all false nulls in a repetition. The SD is obtained over 1000 repetitions.

In Figure 3, we compare the plots of $np_{(i)}/i$ for the $p$-values. Since all rejections occur when $i \ll n$, we only compare the plots with $i/n \le 0.05$. It is seen that for small $i/n$, the plots for $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are very close to each other, explaining why the performances of the BH procedure based on the two types of $p$-values are similar. Likewise, the plots for $p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$ are very close to each other, and in simulations 1–3, both are significantly lower than the plots of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$, which explains the improved power yielded by $p'_{i,\mathrm{seq}}$ and $p'_{i,\mathrm{glb}}$. Finally, comparing Figures 2 and 4, we can see that the extra constraint $c_1 + \cdots + c_L \ge 0.9$ […] $p'_{i,\mathrm{seq}}$ with $i \ll n$, the linear programming sets two coefficients nonzero, as opposed to only one for $p_{i,\mathrm{seq}}$.

From the above results, it is seen that the performances of the BH procedure based on $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ are close to each other, even though the latter are subject to more constraints. The reason seems to lie in how $p_{(i),\mathrm{seq}}$ are computed. The evaluation of $p_{(i),\mathrm{seq}}$ incorporates the constraints imposed by $s_{(j)}$ with $j \ge i$. For small $i$, this set of constraints differs only by a small fraction from those imposed by the entire set of $s_{(j)}$. Under regular conditions, constraints imposed by $s_{(j)}$ with $j < i$ will not change the maximization substantially. This implies that for small $i$, $p_{(i),\mathrm{seq}}$ and $p_{(i),\mathrm{glb}}$ are close to each other, as can be seen from Figure 3. Since the BH procedure only rejects nulls with small $p$-values, its performance based on either type of $p$-values will be similar.
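The comparison of the curves $np_{(i)}/i$ used in this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the toy $p$-values are hypothetical, and the code implements the plain BH step-up rule on an arbitrary list of $p$-values, not the constrained $p$-value computation of the paper.

```python
def scaled_sorted_pvalues(pvalues):
    """Return the curve n * p_(i) / i, i = 1..n, for the sorted p-values."""
    n = len(pvalues)
    return [n * p / rank for rank, p in enumerate(sorted(pvalues), start=1)]


def bh_rejections(pvalues, alpha):
    """Indices rejected by the BH step-up procedure at level alpha."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Same test as p_(i) <= alpha * i / n, written on the plotted scale.
        if n * pvalues[idx] / rank <= alpha:
            cutoff = rank  # keep the largest rank that passes (step-up)
    return sorted(order[:cutoff])


if __name__ == "__main__":
    toy = [0.01, 0.04, 0.03, 0.5]  # hypothetical p-values
    print(scaled_sorted_pvalues(toy))
    print(bh_rejections(toy, alpha=0.1))
```

On this scale the BH procedure rejects up to the last index at which the curve is at or below $\alpha$, which is why a lower curve for small $i/n$ translates into more rejections.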

5. MLE for prior probabilities of nulls.

Let $\Theta = \{\theta_1, \ldots, \theta_L\}$. As indicated in Section 4, for composite nulls, in general the prior $\boldsymbol\nu$ may not be estimated consistently. In this section, we consider under what conditions $\boldsymbol\nu$ can be estimated consistently. Under the setup in Section 1, suppose each $F_k$ has a density $f_k$ and $G$ is absolutely continuous with respect to the distribution under true nulls. By the Radon-Nikodym theorem, $G$ has a density $\rho(x)\,\boldsymbol\nu^\top \mathbf{f}(x)$ with $\rho(x) \ge 0$. Then the data $X_1, \ldots, X_n$ are iid with density

$$q(x) = [1 - a + a\rho(x)]\,\boldsymbol\nu^\top \mathbf{f}(x).$$

Pretending all the nulls are true, the MLE for $\boldsymbol\nu$ is

$$\hat{\boldsymbol\nu}_n = \arg\sup_{\mathbf{c} \in S} \sum_{i=1}^n \ln[\mathbf{c}^\top \mathbf{f}(X_i)],$$

where $S$ is a suitable set. Usually, one would choose $S = \{\mathbf{c} \in [0,1]^L : \sum c_k = 1\}$ because by the definition of prior probabilities, $\nu_k \ge 0$ and $\sum \nu_k = 1$. For the reason described below, we shall make the setting a little more general. Still suppose that the distribution under true nulls is a linear combination of $F_k$. However, now $\nu_k$ are allowed to be negative. In this setting, it is better to regard $f_k$ merely as a basis for a set of densities. Then set

$$S = \{\mathbf{c} : c_1 + \cdots + c_L = 1,\ \mathbf{c}^\top \mathbf{f} \ge 0\}. \tag{5.1}$$

A reason for this choice of $S$ can be seen when density functions under true nulls are linearly dependent. In this case, it is desirable to pick a basis from them, say $f_1, \ldots, f_L$, and represent the others as $g_j = \sum_k \lambda_{jk} f_k$. By linear dependence, $\lambda_{jk}$ can be negative. Let the mixture density under true nulls be $\mathbf{a}^\top \mathbf{f} + \mathbf{b}^\top \mathbf{g}$, with $\sum a_k + \sum b_j = 1$ and $a_k, b_j \ge 0$. By representing it as $\boldsymbol\nu^\top \mathbf{f}$, we get $\nu_k = a_k + \sum_j b_j \lambda_{jk}$, which can be negative. On the other hand, $\sum \nu_k = 1$ and $\boldsymbol\nu^\top \mathbf{f} \ge 0$. Therefore, $S$ in (5.1) contains $\boldsymbol\nu$.

Recall that if $A \subset \mathbb{R}^d$, its interior is $A^o = \{\mathbf{x} : B(\mathbf{x}, r) \subset A \text{ for some } r > 0\}$, where $B(\mathbf{x}, r) = \{\mathbf{z} : |z_k - x_k| < r,\ k = 1, \ldots, d\}$. By this definition, $S^o = \emptyset$. However, regarding $S$ as a subset of $\{\mathbf{c} : \sum c_k = 1\}$, we have

$$S^o = \{\mathbf{c} : \text{for some } r > 0,\ \mathbf{c} + \mathbf{v} \in S\ \ \forall\, \mathbf{v} \in B(\mathbf{0}, r) \text{ with } \textstyle\sum v_k = 0\}.$$

Both $S$ and $S^o$ are convex. Since $S$ contains all $\mathbf{c}$ with $c_k \ge 0$ and $\sum c_k = 1$, $S^o \ne \emptyset$.

Proposition 5.1. Suppose $\int q\,|\ln f_k| < \infty$ and $f_1, \ldots, f_L$ are linearly independent. Let $a \in (0, 1)$. If $\boldsymbol\nu \in S^o$, then

$$\hat{\boldsymbol\nu}_n \xrightarrow{P} \boldsymbol\nu \iff \int \rho f_k = 1 \text{ for all } k.$$

Apparently, if $\rho \equiv 1$, then $\int \rho f_k = 1$. A question is whether nontrivial $\rho \ge 0$ […] $\int \rho\,(f_k - f_1) = 0$, provided $f_k \in L^2$, one might search for $\rho$ among functions in $L^2$ that are orthogonal to $f_k - f_1$. However, such functions are not always nonnegative. Moreover, oftentimes $f_k \notin L^2$. The construction below avoids these potential problems and seems to be general.

Example. We only consider how to construct $\rho \ge 0$ […] $E = \{x : \boldsymbol\nu^\top \mathbf{f}(x) > 0\}$. The general case follows the same idea. The main step is to find bounded $\psi_1, \ldots, \psi_L \in C(\mathbb{R}^d)$, such that the $L \times L$ matrix $M = (M_{ik})$ is nonsingular, where $M_{ik} = \int \psi_i f_k$. Once this is done, to construct $\rho$, fix $\phi \ge 0$ in $C(\mathbb{R}^d)$ such that $\int \phi f_k < \infty$ and $\sup_{x \in E} \phi(x) = \infty$. Such $\phi$ always exist. By $\det M \ne 0$, there are unique $a_1, \ldots, a_L \in \mathbb{R}$, such that $\sum_i a_i M_{ik} = 1 - \int \phi f_k$ for each $k$. Then $\int h f_k = 1$, where $h = \phi + \sum a_i \psi_i$. It is easy to see $h \in C(\mathbb{R}^d)$ is lower bounded and $\sup_{x \in E} h(x) = \infty$. Then for $c > 0$ small enough, $\rho = 1 - c + ch \in C(\mathbb{R}^d)$ is nonnegative with $\sup_{x \in E} \rho(x) = \infty$ and $\int \rho f_k = 1 - c + c \int h f_k = 1$.

To see that $\psi_1, \ldots, \psi_L$ as above exist, recall

$$\det M = \sum_\sigma \mathrm{sgn}(\sigma) \prod_k \int f_{\sigma(k)} \psi_k = \int \sum_\sigma \mathrm{sgn}(\sigma) \prod_k f_{\sigma(k)}(x_k)\,\psi_k(x_k)\,d\mathbf{x} = \int \prod_k \psi_k(x_k)\,\det[f_i(x_k)]\,d\mathbf{x},$$

where the sum is over all permutations $\sigma$ of $1, \ldots, L$ and $\mathrm{sgn}(\sigma)$ is the sign of $\sigma$. Denote $D(\mathbf{x}) = \det[f_i(x_k)]$.
Since $|D(\mathbf{x})| \le \sum_\sigma \prod_k f_{\sigma(k)}(x_k)$, $D \in L^1$. Because $f_1, \ldots, f_L$ are linearly independent, we claim

$$\ell(\mathbf{x} : D(\mathbf{x}) \ne 0) \ne 0, \tag{5.2}$$

where $\ell$ is the Lebesgue measure. If (5.2) holds, then the characteristic function of $D$ is nonzero. Therefore, there are $t_1, \ldots, t_L \ne 0$, such that $\int e^{i(t_1 x_1 + \cdots + t_L x_L)} D(\mathbf{x})\,d\mathbf{x} \ne 0$. It follows that there are $\psi_k(x)$ of the form $\sin(t_k x)$ or $\cos(t_k x)$, such that $\det M \ne 0$.

We use induction to prove (5.2). For $L = 2$, if $D(\mathbf{x}) = 0$ a.e., then $f_1(x_1) f_2(x_2) = f_1(x_2) f_2(x_1)$, a.e. Integrating over $x_2$ yields $f_1(x_1) = f_2(x_1)$ a.e., contradicting the assumption that $f_1$ and $f_2$ are linearly independent. For $L > 2$, suppose (5.2) holds for $L - 1$ linearly independent $f_i$. Now

$$D(\mathbf{x}) = \sum_{i=1}^L (-1)^{L+i} f_i(x_L)\,M_i(x_1, \ldots, x_{L-1}),$$

where $M_i(x_1, \ldots, x_{L-1})$ is the determinant of the $(L-1) \times (L-1)$ matrix consisting of $f_l(x_k)$, $l \ne i$, $k = 1, \ldots, L-1$. Given $x_1, \ldots, x_{L-1}$, $D(\mathbf{x})$ is a linear combination of $f_i(x_L)$. Therefore, if $D(\mathbf{x}) = 0$ a.e., then, by the linear independence of $f_i(x)$, $M_i(x_1, \ldots, x_{L-1}) = 0$ a.e. for each $i = 1, \ldots, L$. However, this contradicts the induction hypothesis.
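As a rough numerical companion to the MLE $\hat{\boldsymbol\nu}_n$ of this section, the Python sketch below maximizes $\sum_i \ln[\mathbf{c}^\top \mathbf{f}(X_i)]$ over the probability simplex $\{c_k \ge 0, \sum c_k = 1\}$ with the standard EM fixed-point update for mixture weights with known component densities. Everything specific in it is an assumption made for illustration: the simplex is used instead of the more general set $S$ in (5.1), the two unit-variance normal densities, the weights $(0.7, 0.3)$ and the function names are hypothetical, and EM is a stand-in optimizer rather than a method taken from the paper. The data are drawn from the null mixture itself, i.e., the case $\rho \equiv 1$, in which the condition $\int \rho f_k = 1$ holds trivially.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(weights, dens):
    """Sum over i of ln[c^T f(X_i)], with dens[i][k] = f_k(X_i)."""
    return sum(math.log(sum(w * d for w, d in zip(weights, row))) for row in dens)


def em_mixture_weights(dens, n_iter=200):
    """EM updates c_k <- (1/n) * sum_i c_k f_k(X_i) / c^T f(X_i) on the simplex."""
    n, L = len(dens), len(dens[0])
    w = [1.0 / L] * L  # start from uniform weights
    for _ in range(n_iter):
        new_w = [0.0] * L
        for row in dens:
            mix = sum(wk * d for wk, d in zip(w, row))
            for k in range(L):
                new_w[k] += w[k] * row[k] / mix
        w = [v / n for v in new_w]
    return w


def simulate_densities(n=2000, seed=0, means=(0.0, 3.0), true_w=(0.7, 0.3)):
    """Draw X_i from the (hypothetical) null mixture and evaluate each f_k(X_i)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        mu = means[0] if rng.random() < true_w[0] else means[1]
        xs.append(rng.gauss(mu, 1.0))
    return [[normal_pdf(x, m) for m in means] for x in xs]


if __name__ == "__main__":
    dens = simulate_densities()
    print(em_mixture_weights(dens))
```

Because the update keeps every weight nonnegative, this sketch cannot reach points of $S$ with negative coordinates; handling the general $S$ would require a constrained optimizer.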

6. Discussion.

In the article, we have focused on the case of finitely composite nulls, where true nulls are only associated with a finite number of distributions. Formally, it is straightforward to generalize the constrained maximization to the case of infinitely composite nulls. However, usually the maximization will involve infinitely many degrees of freedom and it becomes unclear how to accommodate this with a finite number of observations. A more direct approach might be to partition the set of null distributions into a finite number of subsets and use the envelopes of the subsets to compute $p$-values. To be more specific, given a partition $\Theta_1, \ldots, \Theta_L$ of $\Theta$, let $u_k(t) = \sup_{\theta \in \Theta_k} \phi_\theta(t)$ and $l_k(t) = \inf_{\theta \in \Theta_k} \phi_\theta(t)$. Then define, for example, $M_n(t) = \sup\{\mathbf{c}^\top \mathbf{u}(t) : \mathbf{c} \in \Delta,\ \mathbf{c}^\top \mathbf{l}(t)$ is dominated by $F_n(t)$ up to a small margin$\}$. Unfortunately, some of the constraints available to the finitely composite case can no longer be used. Another issue is how to select the partition. Too coarse a partition will only yield loose constraints on $c_k$ and too fine a partition will result in many degrees of freedom. Either way, the obtained $M_n(t)$ may not be much different from the unconstrained maximum probability.

As is known, FDR control can be realized by the local FDRs [5]. For the simple case, the local FDR at $x$ is $(1-a) f(x)/h(x)$, where $a$ may be replaced with 0, $f$ is the density under true nulls, and $h$ is the overall density of the data $X_1, \ldots, X_n$ or an estimate of the density. For the finitely composite case where the null distributions have densities $f_1, \ldots, f_L$, we may derive a conservative estimate of the local FDR by $\rho(x)/h(x)$, where

$$\rho(x) = \sup\{\mathbf{c}^\top \mathbf{f}(x) : \mathbf{c} \in \Delta \text{ and } \mathbf{c}^\top \mathbf{f} \le h\}.$$

Alternatively, if the dimension of $X_i$ is high, then we may work on $s_i = s(X_i)$, with the local FDR defined as $\rho(s_i)/h(s_i)$, where $h$ is now the overall density of $s_1, \ldots, s_n$ or an estimate, while

$$\rho(t) = \sup\{\mathbf{c}^\top \boldsymbol\phi(t) : \mathbf{c} \in \Delta \text{ and } \mathbf{c}^\top \boldsymbol\phi \le h\}.$$

It is worth pointing out that, unlike the simple case, the BH procedure based on $M_n(t)$ and the FDR control based on $\rho(x)/h(x)$ are no longer equivalent. The reason is that $M_n(t)$ is of the form $\max_{\mathbf{c}} \int \mathbf{c}^\top \boldsymbol\phi$. The density of $M_n(t)$, if it exists, in general is different from the $\max_{\mathbf{c}} \mathbf{c}^\top \boldsymbol\phi$ that is associated with the local FDR. It remains to be seen how much the two approaches differ.

Appendix.

In this section, we give proofs of the theoretical statements of the article. The Lebesgue measure on $\mathbb{R}^d$ will be denoted by $\ell$. For any nondecreasing function $f$ defined on $\mathbb{R}$ and $x \in \mathbb{R}$, if $A := \{t : f(t) \le x\} \ne \emptyset$, define $f^*(x) = \sup A$, otherwise, define $f^*(x) = -\infty$. By this definition, if $f$ is left-continuous and $x \in f(\mathcal{I})$, then $f(f^*(x)) = x$.

A.1. Proofs for Section 2.

Proof of Proposition 2.1.

Since $s_i = -\infty \iff X_i \in D_t$ for all $t$, by D3 and the random mixture model, the probability of the event is 0, hence proving 1). By the right-continuity of $D_t$, $s_i \le t \iff X_i \in D_s$ for all $s > t \iff X_i \in D_t$, yielding 2). By $P(s_i \le t) = P(X_i \in D_t) = \phi_\theta(t)$, 3) holds and 4) follows from 3) and the random mixture model. To get 5), given $t$, for any $\epsilon > 0$, there is $\theta \in \Theta$ such that $M(t) \le \phi_\theta(t) + \epsilon$. By D3, $M(s) \ge \phi_\theta(s) \to \phi_\theta(t)$ as $s \uparrow t$, giving $M(t-) + \epsilon \ge M(t)$. Since $M$ is nondecreasing and $\epsilon$ is arbitrary, this implies $M(t-) = M(t)$.

Proof of Proposition 2.2.

To see that Procedures 2.1 and 2.2 are the same, by Proposition 2.1,

$$\text{Procedure 2.1 accepts } H_i \iff s_i > \tau \iff \frac{M(t)}{\alpha} > \frac{R_n(t)}{n}\ \ \forall\, t \ge s_i.$$

Because $M(t)$ is nondecreasing and $R_n(t)$ is a nondecreasing step function that has jumps only at $s_i$,

$$\text{Procedure 2.1 accepts } H_i \iff \frac{M(s_j)}{\alpha} > \frac{R_n(s_j)}{n}\ \ \forall\, s_j \ge s_i.$$

Taking into account the possibility of ties, it is not hard to see that the condition on the right hand side is equivalent to $s_i > s_{(R)}$, which implies Procedures 2.1 and 2.2 always reject the same set of nulls.

By the random mixture model, for $X_i$ under true nulls, the distribution of $F_{\eta_i}(D_{s_i})$ is a mixture of those of $\phi_\theta(s(X))$ under $F_\theta$, $\theta \in \Theta$. By Proposition 2.1, under $F_\theta$, $\phi_\theta(s(X)) \sim \mathrm{Unif}(0,1)$. […] $X_i$ under true nulls, $s_i$ are iid $\sim \mathrm{Unif}(0,1)$. […] $M(s_i)$. Since $M(s_i) \ge F_{\eta_i}(D_{s_i})$, under true nulls, $P(M(s_i) \le x) \le P(F_{\eta_i}(D_{s_i}) \le x) = x$. The proof then follows from Theorem 5.1 and the comment that follows in [3].

A.2. Proofs for Section 3.

First, note that for Procedures 3.1 and 3.2, the number of rejections and that of false rejections are $R = R_n(\tau)$ and $V = V_n(\tau)$, respectively.

Proof of Lemma 3.1.

Let $s < t$. Then $\mathcal{A}_{n,s} \subset \mathcal{A}_{n,t}$ and $\mathbf{c}^\top \boldsymbol\phi(s) \le \mathbf{c}^\top \boldsymbol\phi(t)$ for any $\mathbf{c} \in \Delta$, giving $M_n(s) \le M_n(t)$. Thus $M_n$ is nondecreasing. Next suppose $\phi_i \in C(\mathbb{R})$ for all $i$.

1) Given $t$, as $0 < t - u \ll 1$, $[u, t)$ has no point in $\mathcal{T}_n$ and, almost surely, no $s_i$. Thus $\mathcal{A}_{n,u} = \mathcal{A}_{n,t}$. Let $K = \{\mathbf{c} \in \Delta : \mathbf{c}^\top \boldsymbol\phi \in \mathcal{A}_{n,t}\}$. It is seen that $K$ is compact and $\mathbf{c}^\top \boldsymbol\phi(s)$ is a uniformly continuous function in $(\mathbf{c}, s) \in K \times \mathcal{I}$. Then $\sup_{\mathbf{c} \in K} \mathbf{c}^\top \boldsymbol\phi(s)$ is continuous in $s$, yielding $M_n(u) \to M_n(t)$ as $u \uparrow t$. Thus $M_n$ is left-continuous.

2) Since $M_n$ is nondecreasing, $M_n$ has a right-hand limit at every $t$. It only remains to be shown that at every $t \notin \{s_1, \ldots, s_n\}$, $M_n$ is right-continuous. Now, as $0 < u - t \ll 1$, $[t, u)$ contains no point in $\mathcal{T}_n$ and no $s_i$, yielding $\mathcal{A}_{n,u} = \mathcal{A}_{n,t}$. Then the right-continuity follows from the same argument for the left-continuity.

In addition to Lemma 3.1, we need a few lemmas to prove Theorem 3.1. For $t \in \mathcal{I}$, define the $\sigma$-field

$$\mathcal{F}_t = \mathcal{F}(R_n(t-),\ V_n(t-),\ R_n(s),\ V_n(s) : s \ge t).$$

Then $\{\mathcal{F}_t, t \in \mathcal{I}\}$ is a backward filtration, i.e., $\mathcal{F}_t \subset \mathcal{F}_s$ for $t > s$.

Lemma A.2.1. Suppose $\phi_i \in C(\mathbb{R})$ for all $i$. Then for $t \in \mathbb{R}$, $M_n(t)$ is $\mathcal{F}_t$-measurable.

Proof.

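It suffices to show that given $a \ge 0$, $\{M_n(t) \le a\} \in \mathcal{F}_t$ for $t \in \mathcal{I}$. For $\mathbf{c} \in \Delta$, $\mathbf{c}^\top \boldsymbol\phi \in C(\mathbb{R})$ and $\{\mathbf{c}^\top \boldsymbol\phi \in \mathcal{A}_{n,t}\} = E_1 \cap E_2$, where

$$E_1 = \{\mathbf{c}^\top \boldsymbol\phi(s_i) \le F_n(s_i) + \epsilon_n \text{ for } s_i \ge t\},$$

$$E_2 = \{F_n(t_2) - F_n(t_1) \ge \mathbf{c}^\top[\boldsymbol\phi(t_2) - \boldsymbol\phi(t_1)] - \epsilon_n \text{ for } t_i \in \mathcal{T}_n \cap [t, \tau] \text{ with } t_1 < t_2\}.$$

Note $E_1 = \{\mathbf{c}^\top \boldsymbol\phi(s) \le R_n(s)/n + \epsilon_n\ \forall\, s \ge t \text{ with } R_n(s) > R_n(s-)\}$. Since $R_n(s-) \in \mathcal{F}_t$ for $s \ge t$, it can be seen that $E_1 \in \mathcal{F}_t$. On the other hand, $E_2 \in \mathcal{F}_t$. Therefore, $\{\mathbf{c}^\top \boldsymbol\phi \in \mathcal{A}_{n,t}\} \in \mathcal{F}_t$.

Since $\mathbf{c}^\top \boldsymbol\phi \in \mathcal{A}_{n,t}$ implies $\mathbf{r}^\top \boldsymbol\phi \in \mathcal{A}_{n,t}$ for any $\mathbf{r} \in \mathbb{Q}^L \cap \Delta$ with $r_i \le c_i$, where $\mathbb{Q}$ is the set of rational numbers,

$$M_n(t) = \sup\{\mathbf{r}^\top \boldsymbol\phi(t) : \mathbf{r} \in \mathbb{Q}^L \cap \Delta,\ \mathbf{r}^\top \boldsymbol\phi \in \mathcal{A}_{n,t}\}.$$

Notice that $\mathbf{r}^\top \boldsymbol\phi(t)$ is nonrandom. Then

$$\{M_n(t) \le a\} = \bigcap_{\mathbf{r} \in \mathbb{Q}^L \cap \Delta \text{ s.t. } \mathbf{r}^\top \boldsymbol\phi(t) > a} \{\mathbf{r}^\top \boldsymbol\phi \notin \mathcal{A}_{n,t}\} \in \mathcal{F}_t.$$

The next goal is to show $\tau$ is a stopping time of the backward filtration $\mathcal{F}_t$. If $\sup \mathcal{I} = \infty$, then $\tau$ has to start at $\infty$. To get around this problem, we use truncations. Let $\mathcal{I}_R$ be as in Procedure 3.1. Given $c < \sup \mathcal{I}$, define

$$\mathcal{I}_c = \mathcal{I}_R \cap (-\infty, c], \qquad \tau_c = \begin{cases} \sup \mathcal{I}_c & \text{if } \mathcal{I}_c \ne \emptyset, \\ \inf \mathcal{I} & \text{otherwise.} \end{cases}$$

Lemma A.2.2. As $c \uparrow \sup \mathcal{I}$, $\tau_c \uparrow \tau$ a.s.

Proof.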

It suffices to show $\tau < \sup \mathcal{I}$ a.s. By definition, $\tau \le \sup \mathcal{I}$. The event $\{\tau = \sup \mathcal{I}\}$ implies there are $t_k \uparrow \sup \mathcal{I}$, such that $M_n(t_k) \le \alpha [R_n(t_k) \vee 1]/n$. By Lemma 3.1, $M_n(t_k) \to M_n(\sup \mathcal{I}) = 1$ a.s. On the other hand, $[R_n(t_k) \vee 1]/n \le 1$. Therefore, $P(\tau = \sup \mathcal{I}) = 0$.

Lemma A.2.3. Suppose $\phi_k \in C(\mathbb{R})$. Then 1) there is $t_0 > \inf \mathcal{I}$ such that for any $c \in \mathcal{I}$, $\tau_c \ge t_0$, 2) for $c \in \mathcal{I}$, $\tau_c$ is a stopping time of the backward filtration $\{\mathcal{F}_t, t \in (\inf \mathcal{I}, c]\}$.

Proof.

1) Let $u(t) := \phi_1(t) + \cdots + \phi_L(t)$. Then $M_n(t) \le u(t)$ and

$$\tau_c \ge \sup\left\{t \in (\inf \mathcal{I}, c] : \frac{u(t)}{\alpha} \le \frac{R_n(t) \vee 1}{n}\right\} \ge t_0 := \sup\left\{t \in (\inf \mathcal{I}, c] : \frac{u(t)}{\alpha} \le \frac{1}{n}\right\}.$$

Since $\phi_k \in C(\mathbb{R})$ and $\phi_k(t) \to 0$ as $t \to \inf \mathcal{I}$, the set on the right hand side is nonempty, yielding $t_0 > \inf \mathcal{I}$.

2) By definition, $\tau_c$ is a stopping time of the backward filtration $\mathcal{F}_t$ if $\{\tau_c \ge t\} \in \mathcal{F}_t$ for every $t \in (\inf \mathcal{I}, c]$. Denote $E = \{\tau_c \ge t\}$. We first show

$$E = \left\{\exists\, s \in [t, c] \text{ such that } \frac{M_n(s)}{\alpha} \le \frac{R_n(s) \vee 1}{n}\right\}. \tag{A.1}$$

The right hand side of (A.1) equals $\{\mathcal{I}_c \cap [t, c] \ne \emptyset\}$, which is a subset of $E$. On the other hand, the difference between the two events is

$$\{\tau_c \ge t,\ \mathcal{I}_c \cap [t, c] = \emptyset\} = \{\mathcal{I}_c \ne \emptyset,\ \mathcal{I}_c \cap [t, c] = \emptyset,\ \sup \mathcal{I}_c \ge t\} \subset \left\{\frac{M_n(t)}{\alpha} > \frac{R_n(t) \vee 1}{n},\ \exists\, t_k \uparrow t \text{ with } \frac{M_n(t_k)}{\alpha} \le \frac{R_n(t_k) \vee 1}{n}\right\}.$$

Since by Lemma 3.1 $M_n$ is left-continuous, $M_n(t_k) \to M_n(t)$. On the other hand, $R_n(t_k) \to R_n(t-) \le R_n(t)$. Thus, the last event is empty and (A.1) holds. Note that by similar argument,

$$M_n(\tau_c)/\alpha \le [R_n(\tau_c) \vee 1]/n. \tag{A.2}$$

Let $A = \{M_n(t)/\alpha \le [R_n(t) \vee 1]/n\}$. Then $A \subset E$ and $A \in \mathcal{F}_t$. We next show $E = A \cup \Gamma$, where $\Gamma = \bigcap_{k=1}^\infty \bigcup_{r \in \mathbb{Q} \cap (t, c]} \Gamma_{r,k}$, with

$$\Gamma_{r,k} = \left\{\frac{M_n(r)}{\alpha} \le \frac{R_n(r + 1/k) \vee 1}{n}\right\}.$$

Once this is done, by $M_n(r) \in \mathcal{F}_r$ (cf. Lemma A.2.1) and $R_n(r + 1/k) \in \mathcal{F}_r$, $\Gamma_{r,k} \in \mathcal{F}_r \subset \mathcal{F}_t$ for any $r > t$. Then $E \in \mathcal{F}_t$.

Note $E - A$ implies $\tau_c > t$, which in turn implies there are $r_k \in \mathbb{Q}$ with $t < r_k < \tau_c < r_k + 1/k$. By

$$\frac{M_n(r_k)}{\alpha} \le \frac{M_n(\tau_c)}{\alpha} \le \frac{R_n(\tau_c) \vee 1}{n} \le \frac{R_n(r_k + 1/k) \vee 1}{n},$$

$\Gamma_{r_k,k}$ holds for all $k$. Thus $E - A \subset \Gamma$.

It remains to show that $\Gamma \subset E$. Suppose there are $r_k \in \mathbb{Q} \cap (t, c]$ with $M_n(r_k)/\alpha \le [R_n(r_k + 1/k) \vee 1]/n$. Then $r_k$ has a subsequence, say, itself, that converges to some $s \in [t, c]$. Since $M_n$ is nondecreasing and left-continuous, while $R_n$ is nondecreasing and right-continuous,

$$\frac{M_n(s)}{\alpha} \le \lim_k \frac{M_n(r_k)}{\alpha} \le \lim_k \frac{R_n(r_k + 1/k) \vee 1}{n} \le \frac{R_n(s) \vee 1}{n}.$$

Therefore $s \in \mathcal{I}_c$ and $\mathcal{I}_c \cap [t, c] \ne \emptyset$. Thus $\Gamma \subset E$.

Lemma A.2.4. For $n \ge 1$, denote

$$\Gamma_n = \{(1-a)\,\boldsymbol\nu^\top \boldsymbol\phi \in \mathcal{A}_{n,t},\ \forall\, t \in \mathcal{I}\}. \tag{A.3}$$

Suppose $Q$ is continuous. Then, provided $\exp(-2n\epsilon_n^2) \le 1/2$,

$$P(\Gamma_n) \ge 1 - (1 + |\mathcal{T}_n|) \exp\{-2n\epsilon_n^2\}.$$

Proof.

Since $Q$ is continuous, by the DKW inequality [12], for $\lambda > 0$ and $n \ge 1$, as long as $\exp(-2n\lambda^2) \le 1/2$,

$$P\{\sup(Q - F_n) \ge \lambda\} \le \exp(-2n\lambda^2).$$

By $Q(t) = (1-a)\,\boldsymbol\nu^\top \boldsymbol\phi(t) + aG(D_t)$,

$$P\{(1-a)\,\boldsymbol\nu^\top \boldsymbol\phi(t) \ge F_n(t) + \lambda \text{ for some } t\} \le \exp(-2n\lambda^2).$$

The DKW inequality also implies that, given $x \in \mathbb{R}$,

$$P\left(\sup_{t \ge x} \{[Q(t) - Q(x)] - [F_n(t) - F_n(x)]\} \ge \lambda\right) \le \exp(-2n\lambda^2). \tag{A.4}$$

Assuming (A.4) is true for now, it follows that

$$P(Q(t) - Q(t_i) \ge F_n(t) - F_n(t_i) + \lambda \text{ for some } t_i \in \mathcal{T}_n \text{ and } t > t_i) \le |\mathcal{T}_n| \exp(-2n\lambda^2).$$

Since $Q(t) - Q(t_i) \ge (1-a)\,\boldsymbol\nu^\top[\boldsymbol\phi(t) - \boldsymbol\phi(t_i)]$ for $t > t_i$, by letting $\lambda = \epsilon_n$, the Lemma then follows.

Finally, to get (A.4), let $y = Q(x)$. By quantile transformation,

$$\sup_{t \ge x} \{[Q(t) - Q(x)] - [F_n(t) - F_n(x)]\} \sim \xi = \sup_{s \ge y} \{s - y - [G_n(s) - G_n(y)]\},$$

where $G_n$ is the empirical distribution of $U_i = Q(X_i)$. Since $U_i$ are iid $\sim \mathrm{Unif}(0,1)$, $V_i = U_i - y + \mathbf{1}\{U_i \le y\}$ are iid $\sim \mathrm{Unif}(0,1)$ as well and $\xi = \sup_{0 \le s \le 1-y} [s - G'_n(s)]$, where $G'_n$ is the empirical distribution of $V_i$. Applying the DKW inequality to $\xi$, it is seen that (A.4) follows.

Proof of Theorem 3.1.

By Proposition 2.1, under true $H_i$, $s_i \sim \boldsymbol\nu^\top \boldsymbol\phi$, which is continuous and positive on $\mathcal{I}$. As a result, $\{V_n(t-)/\boldsymbol\nu^\top \boldsymbol\phi(t),\ \mathcal{F}_t,\ t \in \mathcal{I}\}$ is a left-continuous backward martingale.

Fix $c \in \mathcal{I}$. By Lemma A.2.3, $\tau_c$ is a stopping time of $\{\mathcal{F}_t, t \in (\inf \mathcal{I}, c]\}$ with $\tau_c \ge t_0 > \inf \mathcal{I}$. Then $\boldsymbol\nu^\top \boldsymbol\phi(\tau_c) > 0$ and $V_n(\tau_c-)/[\boldsymbol\nu^\top \boldsymbol\phi(\tau_c)]$ is well-defined. By the optional sampling theorem (cf. [9], Ch. 1, Thm 3.22),

$$E\left[\frac{V_n(\tau_c-)}{\boldsymbol\nu^\top \boldsymbol\phi(\tau_c)}\right] = E\left[\frac{V_n(c-)}{\boldsymbol\nu^\top \boldsymbol\phi(c)}\right] = (1-a)n.$$

Let $c \uparrow \sup \mathcal{I}$. By Lemma A.2.2, $\tau_c \uparrow \tau$. Because $V_n(\tau_c-) \uparrow V_n(\tau-) \le n$, $\phi_k(\tau_c) \uparrow \phi_k(\tau)$ and $\boldsymbol\nu^\top \boldsymbol\phi(\tau_c) \ge \boldsymbol\nu^\top \boldsymbol\phi(t_0) > 0$, by dominated convergence,

$$E\left[\frac{V_n(\tau-)}{\boldsymbol\nu^\top \boldsymbol\phi(\tau)}\right] = (1-a)n. \tag{A.5}$$

On the other hand, because $Q$ is continuous, by Lemma A.2.4, with $\Gamma_n$ defined as in (A.3),

$$E\left[\frac{V_n(\tau)}{R_n(\tau) \vee 1}\right] = E\left[\frac{V_n(\tau-)}{R_n(\tau) \vee 1}\right] + E\left[\frac{V_n(\tau) - V_n(\tau-)}{R_n(\tau) \vee 1}\right] \le E\left[\frac{V_n(\tau-)}{R_n(\tau) \vee 1}\,\Big|\,\Gamma_n\right] P(\Gamma_n) + P(\Gamma_n^c) + E\left[\frac{V_n(\tau) - V_n(\tau-)}{R_n(\tau) \vee 1}\right].$$

From (A.2), $M_n(\tau)/\alpha \le [R_n(\tau) \vee 1]/n$. On the other hand, conditional on $\Gamma_n$, $M_n(\tau) \ge (1-a)\,\boldsymbol\nu^\top \boldsymbol\phi(\tau)$. Thus, by (A.5),

$$E\left[\frac{V_n(\tau-)}{R_n(\tau) \vee 1}\,\Big|\,\Gamma_n\right] P(\Gamma_n) \le E\left[\frac{\alpha V_n(\tau-)/n}{(1-a)\,\boldsymbol\nu^\top \boldsymbol\phi(\tau)}\,\Big|\,\Gamma_n\right] P(\Gamma_n) \le E\left[\frac{\alpha V_n(\tau-)/n}{(1-a)\,\boldsymbol\nu^\top \boldsymbol\phi(\tau)}\right] = \alpha.$$

By Lemma A.2.4, $P(\Gamma_n^c) \le (1 + |\mathcal{T}_n|) \exp(-2n\epsilon_n^2)$. Finally, note that $R_n(\tau) = 0$ implies $V_n(\tau) - V_n(\tau-) = 0$, while $V_n(\tau) - V_n(\tau-) \ge 2$ implies that at least two true nulls share the same $s_i$. Since $s_i$ under true nulls are iid with a density, the probability of the latter event is 0. Therefore, $V_n(\tau) - V_n(\tau-) \le \mathbf{1}\{R > 0\}$ a.s. This then finishes the proof.

We next prove Theorem 3.2. For $n \ge 1$, define

$$\Gamma_n = \{\mathbf{c} \in \Delta : \mathbf{c}^\top \boldsymbol\phi \in \mathcal{A}_n\}.$$

For each $r > 0$, corresponding to (3.3), define

$$\Gamma_r = \left\{\mathbf{c} \in \Delta : \mathbf{c}^\top \boldsymbol\phi(t) \le Q(t) + r,\ \ Q(t_2) - Q(t_1) \ge \mathbf{c}^\top[\boldsymbol\phi(t_2) - \boldsymbol\phi(t_1)] - r,\ \ t_1 \le t_2\right\}.$$

Both $\Gamma_n$ and $\Gamma_r$ are nonempty since they contain $\mathbf{0}$. It is not hard to see that $\Gamma_n$ and $\Gamma_r$ are convex and closed, with $\Gamma_r$ being increasing in $r$ and $\Gamma_0 = \bigcap_{r > 0} \Gamma_r$. Also, whereas $\Gamma_n$ are random, $\Gamma_r$ are nonrandom.

Observe that for each $t \in \mathcal{I}$,

$$M_n(t) = \sup\{\mathbf{c}^\top \boldsymbol\phi(t) : \mathbf{c} \in \Gamma_n\}, \qquad m(t) = \sup\{\mathbf{c}^\top \boldsymbol\phi(t) : \mathbf{c} \in \Gamma_0\}. \tag{A.6}$$

Because $\Gamma_n$ is compact, there is a random $\mathbf{c}(t) \in \Gamma_n$, such that

$$M_n(t) = \mathbf{c}(t)^\top \boldsymbol\phi(t). \tag{A.7}$$

As commented after Theorem 3.2, we need to get $M_n \to m$. One way to do this is to first get $\Gamma_n \to \Gamma_0$, which is formalized below.

Lemma A.2.5. Let $r > 0$. Then under the conditions of Theorem 3.2, $P(\Gamma_0 \subset \Gamma_n \subset \Gamma_r) \to 1$.

Proof.

By the assumptions, Q ( t ) is continuous. Let E n = (cid:26) sup t | F n ( t ) − Q ( t ) | ≤ ǫ n / (cid:27) . Then, as in the proof of Lemma A.2.4, for n ≥

1, as long as exp( − nǫ n / ≤ / P ( E cn ) ≤ (cid:8) − nǫ n / (cid:9) . It is not hard to see that E n implies Γ ⊂ Γ n .As nǫ n → ∞ , P (Γ ⊂ Γ n ) ≥ P ( E n ) → c ⊤ φ is supported by and strictly increasing in I , almost surely, as n → ∞ , the set of s i under true nulls is increasingly dense in I , and thus so is S n = { s , . . . , s n } . Because φ k and Q are continuous distribution functions,they are equicontinuous. Given r >

0, ﬁx

C > δ >

0, such thatmax k [ φ k ( − C ) + 1 − φ k ( C )] + Q ( − C ) + 1 − Q ( C ) < r, max k | φ k ( s ) − φ k ( t ) | + | Q ( s ) − Q ( t ) | < r, if | s − t | < δ. Let E ′ n = { δ ( S n , [ − C, C ]) + δ ( T n , [ − C, C ]) < δ } . Conditional on E n ∩ E ′ n ,if t ∈ [ − C, C ], then | Q ( t ) − F n ( t ) | ≤ ǫ n and there is s i with | t − s i | < δ . Let c ∈ Γ n . By c k ≥ c + · · · + c L ≤ c ⊤ φ ( s i ) ≤ F n ( s i ) + ǫ n , c ⊤ φ ( t ) ≤ c ⊤ φ ( s i ) + max k | φ k ( t ) − φ k ( s i ) |≤ F n ( s i ) + ǫ n + r ≤ Q ( s i ) + 2 ǫ n + r< Q ( t ) + 2 ǫ n + 2 r. If t ≤ − C , then c ⊤ φ ( t ) ≤ max φ k ( − C ) ≤ r ≤ Q ( t ) + r . If t ≥ C , then c ⊤ φ ( t ) ≤ ≤ Q ( t ) + r . In any case, c ⊤ φ ( t ) ≤ Q ( t ) + 2 ǫ n + 2 r .Similarly, for t < t , it can be shown that c ⊤ [ φ ( t ) − φ ( t )] < Q ( t ) − Q ( t ) + 3 ǫ n + 4 r . As a result, c ∈ Γ σ , with σ = 3 ǫ n + 4 r . Then E n ∩ E ′ n ⊂{ Γ n ⊂ Γ σ } . Because ǫ n → P ( E n ∩ E ′ n ) → r is arbitrary, the proofis complete. . CHI/FDR CONTROL FOR COMPOSITE NULLS Lemma

A.2.6 . Suppose a < . Then, under the conditions of Theorem3.2, as n → ∞ , P ( M n ∈ C ( R )) → and sup | M n − m | P → . Also, m ∈ C ( R ) . Proof.

Because each φ k is bounded, nondecreasing and continuous, φ isuniformly continuous on R . Since Γ n is compact, c ⊤ φ ( t ), c ∈ Γ n as a familyof functions in t are equicontinuous and uniformly bounded. It follows that M n ∈ C ( R ). Likewise, since Γ is compact, m ∈ C ( R ).Given σ >

0, since Γ r is compact and Γ r ↓ Γ as r ↓

0, there is r > c ∈ Γ r , d ( c , Γ ) < σ . Conditional on Γ ⊂ Γ n , by (A.6), m ( t ) ≤ M n ( t ) for all t . On the other hand, conditional on Γ n ⊂ Γ r , for any t , there is c ( t ) ∈ Γ such that | c ( t ) − c ( t ) | ≤ σ , where c ( t ) is deﬁned as in(A.7). Then | M n ( t ) − c ( t ) ⊤ φ ( t ) | ≤ | c ( t ) − c ( t ) || φ ( t ) | ≤ √ Lσ = ⇒ M n ( t ) ≤ c ( t ) ⊤ φ ( t ) + √ Lσ ≤ m ( t ) + √ Lσ.

Thus, { Γ ⊂ Γ n ⊂ Γ r } ⊂ { ≤ M n ( t ) − m ( t ) ≤ √ Lσ all t } . Because σ isarbitrary, by Lemma A.2.5, sup | M n − m | P → Proof of Theorem 3.2.

The proof follows closely the one in [6]. ByAssumption A and the continuity of m and Q , for any 0 < ǫ ≪ t ∗ − t , δ = min ( inf t ∈ ( t + ǫ,t ∗ − ǫ ) [ αQ ( t ) − m ( t )] , inf t>t ∗ + ǫ [ m ( t ) − αQ ( t )] ) > . Let Q n ( t ) = [ R n ( t ) ∨ /n . As n → ∞ , because sup | Q n − Q | P → | M n − m | P →

0, the probability thatmin ( inf t ∈ ( t + ǫ,t ∗ − ǫ ) [ αQ n ( t ) − M n ( t )] , inf t>t ∗ + ǫ [ M n ( t ) − αQ n ( t )] ) ≥ δ/ P ( | τ − t ∗ | ≤ ǫ ) →

1. Therefore, τ P → t ∗ , which leadsto the last claim of the theorem. Since t ∗ > inf I and c ⊤ φ ( t ) is strictlyincreasing, Q ( t ∗ ) ≥ (1 − a ) ν ⊤ φ ( t ∗ ) >

0. By the Week Law of Large Numbersand dominated convergence,FDR = E (cid:20) V n ( τ ) /nQ n ( τ ) (cid:21) → (1 − a ) ν ⊤ φ ( t ∗ ) Q ( t ∗ ) ≤ m ( t ∗ ) Q ( t ∗ ) = α, where the last equality is due to the continuity of m and Q at t ∗ . A.3. Proofs for Section 5.

We need two lemmas for the proof of Proposition 5.1.

Lemma A.3.1. Suppose $f_1, \ldots, f_L$ are linearly independent. Then $S$ in (5.1) is a convex compact set.

Proof.

It is easy to see that $S$ is convex and closed, so it suffices to show $S$ is bounded. Suppose there are $\mathbf{c}_l \in S$ with $|\mathbf{c}_l| \to \infty$ as $l \to \infty$. Since $c_{l1} + \cdots + c_{lL} = 1$, this implies $\max_k c_{lk} \to \infty$ and $\min_k c_{lk} \to -\infty$. There is a subsequence of $\mathbf{c}_l$ and a partition of $\Theta$ into $\{\theta_{i_1}, \ldots, \theta_{i_r}\}$ and $\{\theta_{j_1}, \ldots, \theta_{j_t}\}$, with $r > 0$, $t > 0$ and $r + t = L$, such that $c_{l i_s} \ge 0$ and $c_{l j_s} < 0$ for all $\mathbf{c}_l$ in the subsequence. Without loss of generality, assume $c_{li} \ge 0$ for $i = 1, \ldots, r$ and $c_{li} < 0$ for $i = r+1, \ldots, L$. Denote $d_{li} = -c_{l,r+i}$ for $i = 1, \ldots, t$. Then for every $x$,

$$\sum_{k=1}^r c_{lk} f_k(x) \ge \sum_{k=1}^t d_{lk} f_{r+k}(x), \qquad \sum_{k=1}^r c_{lk} = 1 + M_l, \quad \text{with } M_l = \sum_{k=1}^t d_{lk}.$$

Divide both sides of the inequality by $M_l$ and let $l \to \infty$. Since $M_l \to \infty$, there is a sequence of $l$ along which $(c_{l1}, \ldots, c_{lr})^\top/M_l$ and $(d_{l1}, \ldots, d_{lt})^\top/M_l$ have limits, say $(u_1, \ldots, u_r)^\top$ and $(v_1, \ldots, v_t)^\top$. Then

$$\sum_{k=1}^r u_k f_k(x) \ge \sum_{k=1}^t v_k f_{r+k}(x), \quad \text{all } x.$$

It is easy to see that $u_k \ge 0$, $v_k \ge 0$ and $\sum u_k = \sum v_k = 1$. Because the integrals of both sides are equal to 1, equality must hold. As a result, $f_1, \ldots, f_L$ are linearly dependent, which is a contradiction.

Lemma A.3.2. Suppose $\int q\,|\ln f_k| < \infty$ for all $k$.

1) For $\mathbf{c} \in S^o$ and $r > 0$, if $\mathbf{c} + \mathbf{v} \in S\ \forall\, \mathbf{v} \in B(\mathbf{0}, r)$ with $\sum v_k = 0$, then

$$\mathbf{c}^\top \mathbf{f}(x) \ge r[M_f(x) - m_f(x)], \quad \text{all } x, \tag{A.1}$$

where $M_f(x) = \max_k f_k(x)$ and $m_f(x) = \min_k f_k(x)$.

2) For any $\mathbf{c} \in S^o$, $\ln(\mathbf{c}^\top \mathbf{f}) \in L^1(Q)$.

3) Let $\ell(\mathbf{c}) := \int q \ln(\mathbf{c}^\top \mathbf{f})$. Then $\ell \in C(S^o)$.

4) For any $\mathbf{c} \in S^o$ and $x$, $\mathbf{c}^\top \mathbf{f}(x) = 0 \iff \mathbf{f}(x) = \mathbf{0}$.

5) If $f_1, \ldots, f_L$ are linearly independent, then $\ell$ is strictly concave in $S^o$.

Proof.

1) For any $\mathbf{v} \in B(\mathbf{0}, r)$ with $\sum v_k = 0$, by $(\mathbf{c} + \mathbf{v})^\top \mathbf{f}(x) \ge 0$, $\mathbf{c}^\top \mathbf{f}(x) \ge -\sum v_k f_k(x)$. Let $v_k = -r$ if $k = \min\{i : f_i(x) = M_f(x)\}$, $v_k = r$ if $k = \min\{i : f_i(x) = m_f(x)\}$, and $v_k = 0$ otherwise. Then (A.1) follows.

2) Let $t^+ = t \vee 0$ and $t^- = (-t) \vee 0$. By Lemma A.3.1, $\sum c_k^-$ and $\sum c_k^+ = \sum c_k^- + 1$ are bounded on $S$. Fix $\lambda \in (0, 1)$ such that $(1 - \lambda) \sum c_k^- \le \lambda/2$ on $S$. If $M_f(x) > m_f(x)/\lambda$, then by (A.1), $\mathbf{c}^\top \mathbf{f}(x) \ge r(\lambda^{-1} - 1)\,m_f(x)$. If $M_f(x) \le m_f(x)/\lambda$, then

$$\mathbf{c}^\top \mathbf{f}(x) = \sum c_k^+ f_k(x) - \sum c_k^- f_k(x) \ge \sum c_k^+\,m_f - \sum c_k^-\,M_f \ge \left(\sum c_k^+ - \frac{1}{\lambda} \sum c_k^-\right) m_f = \left[1 - \left(\frac{1}{\lambda} - 1\right) \sum c_k^-\right] m_f \ge m_f/2.$$

Thus, there is a constant $\kappa > 0$ such that $\mathbf{c}^\top \mathbf{f}(x) \ge \kappa (r \wedge 1)\,m_f(x)$. On the other hand, $\mathbf{c}^\top \mathbf{f}(x) \le \sum c_k^+\,M_f(x) \le \kappa' M_f(x)$, where $\kappa' < \infty$ is another constant. As a result,

$$|\ln[\mathbf{c}^\top \mathbf{f}(x)]| \le \max\left(\left|\ln[\kappa' M_f(x)]\right|,\ \left|\ln[\kappa (r \wedge 1)\,m_f(x)]\right|\right). \tag{A.2}$$

Then by $\ln f_k \in L^1(Q)$, $\ln(\mathbf{c}^\top \mathbf{f}) \in L^1(Q)$.

3) Follows from (A.2) and dominated convergence.

4) If $\mathbf{c}^\top \mathbf{f}(x) = 0$, then by (A.1), $M_f(x) = m_f(x)$ and hence $f_k(x)$ are all equal. As a result, $f_k(x) = \mathbf{c}^\top \mathbf{f}(x) = 0$.

5) For $\mathbf{c}_1, \mathbf{c}_2 \in S^o$ and $\theta \in (0, 1)$, since $S^o$ is convex, $\mathbf{c} := (1 - \theta)\mathbf{c}_1 + \theta \mathbf{c}_2 \in S^o$. Because $\ln z$ is strictly concave on $(0, \infty)$, $(1 - \theta)\ell(\mathbf{c}_1) + \theta \ell(\mathbf{c}_2) \le \ell(\mathbf{c})$, with "=" $\iff \mathbf{c}_1^\top \mathbf{f}(x) = \mathbf{c}_2^\top \mathbf{f}(x)$ for $x$ with $q(x) > 0$. On the other hand, if $q(x) = 0$, then $\mathbf{c}^\top \mathbf{f}(x) = 0$ and by 4), $\mathbf{f}(x) = \mathbf{0}$. Therefore, "=" implies $\mathbf{c}_1^\top \mathbf{f}(x) = \mathbf{c}_2^\top \mathbf{f}(x)$ for all $x$. Since $f_k$ are linearly independent, it follows that "=" $\iff \mathbf{c}_1 = \mathbf{c}_2$. Therefore, $\ell$ is strictly concave.

Proof of Proposition 5.1.

By Lemma A.3.2, for c ∈ S o and X ∼ Q ,ln[ c ⊤ f ( X )] ∈ L , so by the Weak Law of Large Numbers, as n → ∞ , n − P ni =1 ln[ c ⊤ f ( X i )] P → ℓ ( c ). Since S is compact and ℓ is continuous andstrictly concave on S o , by standard argument, if ℓ has a maximum point in S o , then the point is unique and ˆ ν n converges in probability to it. Thus, toﬁnish the proof, it suﬃces to show that ν is the maximum point of ℓ ( c ) ifand only if R ρf k = 1.Let π be the map c → ( c , . . . , c L − ) ⊤ and d k ( x ) = f k ( x ) − f L ( x ), k < L .Since c + · · · + c L = 1 for c ∈ S , then c ⊤ f ( x ) = f L ( x ) + L − X k =1 c k [ f k ( x ) − f L ( x )] = f L ( x ) + π ( c ) ⊤ d ( x ) . Denote h ( u , x ) = f L ( x ) + u ⊤ d ( x ) and H ( u ) = R q ( x ) ln h ( u , x ) dx . Then ℓ ( c ) = H ( π ( c )). Since ℓ is strictly concave in S o , so is H on Γ o , with Γ = π ( S ) = { u : f L + u ⊤ d ≥ } . Note that π : S → Γ is bijective with π − ( u ) =( u , − P u k ) and π ( S o ) = Γ o . It remains to be seen that H is diﬀerentiablein Γ o , with ∂H ( u ) ∂u k = Z q ( x ) d k ( x ) h ( u , x ) dx, k = 1 , . . . , L − . . CHI/FDR CONTROL FOR COMPOSITE NULLS Once this obtains, by the strict concavity of H and ( ν , . . . , ν L − ) ⊤ ∈ Γ o , ν is the maximum point of ℓ ⇐⇒ ( ν , . . . , ν L − ) is the maximum point of H ⇐⇒ Z q ( x ) d k ( x ) ν ⊤ f ( x ) dx = 0 ⇐⇒ Z [1 − a + aρ ( x )][ f k ( x ) − f L ( x )] dx = 0 (a) ⇐⇒ Z ρf k = Z ρf L (b) ⇐⇒ Z ρf k = 1 all k, where (a) is due to a > R f k = 1 and (b) is due to the fact that R ρf k being all equal implies each being equal to R ρ ν ⊤ f = 1.Given u ∈ Γ o , ﬁx r > B ( u , r ) ⊂ Γ o . It is not hard to seethat there is σ >

0, such that for any v ∈ B ( u , r ), π − ( v ) + w ∈ S o , ∀ w ∈ B ( , σ ) with P w k = 0. Then by (A.1), h ( v , x ) ≥ σ [ M f ( x ) − m f ( x )] , all v ∈ B ( u , r ) and x. (A.3)For x with q ( x ) > h ( u + v , x ) > ∀ v ∈ B ( , r ).Therefore, ln[ h ( u + v , x ) /h ( u , x )] is well-deﬁned and by Taylor’s expansion,ln h ( u + v , x ) h ( u , x ) = L − X k =1 " d k ( x ) v k h ( u , x ) − d k ( x ) v k h ( u + z v , x ) for some z = z ( v , x ) ∈ [0 , u + z v ∈ B ( u , r ), by Lemma A.3.2, h ( u + z v , x ) > h ( u + z v , x ) ≥ σ [ M f ( x ) − m f ( x )]. On the otherhand, | d k ( x ) | ≤ M f ( x ) − m k ( x ). Thus | d k ( x ) /h ( u + z v , x ) | ≤ /σ . Likewise, | d k ( x ) /h ( u ) | ≤ /σ . As a result, H ( u + v ) − H ( u ) = Z q ( x ) ln h ( u + v , x ) h ( u , x ) dx = L − X k =1 v k Z q ( x ) d k ( x ) h ( u , x ) dx + O ( | v | ) , which ﬁnishes the proof. A.4. Proofs for Section 4.

Proof of (4.1).
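The proof of (4.1) combines two standard facts: the order statistics of iid uniform variables are distributed as normalized partial sums of iid unit exponentials, and the sum of $m$ such exponentials satisfies the lower-tail bound $P(\xi_1 + \cdots + \xi_m < \beta m) \le (\beta e^{1-\beta})^m$ for $\beta \in (0, 1)$. As a rough numerical sanity check, and only as a sketch that is not part of the paper, the Python code below evaluates this bound, compares it with a Monte Carlo estimate, and evaluates the union-bound term $a_n[1/n + (\beta e^{1-\beta})^{n+1}]$. The function names, the seed, and in particular the value of $a_n$ used in the demo are assumptions made for illustration.

```python
import math
import random


def chernoff_lower_tail(beta: float, m: int) -> float:
    """Upper bound (beta * e^(1 - beta))^m on P(xi_1 + ... + xi_m < beta * m)
    for iid unit-rate exponentials xi_i and 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    # Work on the log scale to avoid underflow for large m.
    return math.exp(m * (math.log(beta) + 1.0 - beta))


def empirical_lower_tail(beta: float, m: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(xi_1 + ... + xi_m < beta * m)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(rng.expovariate(1.0) for _ in range(m))
        hits += total < beta * m
    return hits / reps


def union_term(n: int, beta: float, a_n: float) -> float:
    """The term a_n * [1/n + (beta * e^(1 - beta))^(n + 1)]."""
    return a_n * (1.0 / n + chernoff_lower_tail(beta, n + 1))


if __name__ == "__main__":
    # Small case where simulation is cheap: the estimate should not exceed the bound.
    print(empirical_lower_tail(0.5, 20, 20000), chernoff_lower_tail(0.5, 20))
    # beta = 0.95 and n = 5000 as in the text; a_n = n ** 0.5 is a hypothetical choice.
    print(union_term(5000, 0.95, 5000 ** 0.5))
```

The bound is computed on the log scale so that it stays numerically stable for large $n$. Because $\beta e^{1-\beta} < 1$ on $(0, 1)$, the exponential part decays geometrically in $n$, so for moderate $\beta$ the union-bound term is dominated by $a_n/n$.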

Recall that the overall distribution under true nulls is $\sum_{k=1}^L \nu_k F_k$ and the distribution of $X_1, \dots, X_n$ is $Q = (1-a)\nu^\top F + aG$. Then $(1-a)\nu^\top F(X_i) \le Q(X_i)$. By assumption, $Q$ is continuous, which implies that the $Q(X_i)$ are iid $\sim \mathrm{Unif}(0, 1)$. Hence, for the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$,
\[
\left( Q(X_{(k)}),\ 1 \le k \le n \right) \sim \left( \frac{\xi_1 + \cdots + \xi_k}{\xi_1 + \cdots + \xi_{n+1}},\ 1 \le k \le n \right),
\]
where $\xi_1, \dots, \xi_{n+1}$ are iid with density $e^{-x} \mathbf{1}\{x \ge 0\}$. By the exponential inequality, for $\beta \in (0, 1)$, $P(\xi_1 + \cdots + \xi_{n+1} < \beta(n+1)) \le (\beta e^{1-\beta})^{n+1}$. Therefore, for each $i \le a_n$,
\[
\begin{aligned}
P\left( Q(X_{(i)}) \le \bar\Gamma^*(1/n; i, 1/\beta) \right)
&= P\left( \frac{\sum_{j=1}^{i} \xi_j}{\sum_{j=1}^{n+1} \xi_j} \le \frac{1}{n}\, g^*(1/n; i, 1/\beta) \right) \\
&\ge P\left( \sum_{j=1}^{i} \xi_j \le \beta\, g^*(1/n; i, 1/\beta),\ \sum_{j=1}^{n+1} \xi_j \ge \beta n \right) \\
&\ge P\left( \frac{1}{\beta} \sum_{j=1}^{i} \xi_j \le g^*(1/n; i, 1/\beta) \right) - P\left( \sum_{j=1}^{n+1} \xi_j < \beta n \right).
\end{aligned}
\]
Because $\beta^{-1} \sum_{j=1}^{i} \xi_j$ follows the Gamma distribution with shape parameter $i$ and scale parameter $1/\beta$, the above inequalities yield
\[
P\left( Q(X_{(i)}) \le \bar\Gamma^*(1/n; i, 1/\beta) \right) \ge 1 - \frac{1}{n} - (\beta e^{1-\beta})^{n+1}.
\]
As a result,
\[
\begin{aligned}
P\left( (1-a)\nu^\top F(X_{(i)}) \le \bar\Gamma^*(1/n; i, 1/\beta),\ \text{all } i \le a_n \right)
&\ge P\left( Q(X_{(i)}) \le \bar\Gamma^*(1/n; i, 1/\beta),\ \text{all } i \le a_n \right) \\
&\ge 1 - a_n \left[ \frac{1}{n} + (\beta e^{1-\beta})^{n+1} \right].
\end{aligned}
\]
Following the proof of Theorem 3.1,
\[
\mathrm{FDR} \le \alpha + E\left[ \frac{\mathbf{1}\{R > 0\}}{R \vee 1} \right] + \underbrace{2(1 + |\mathcal{T}_n|) \exp(-2n\epsilon_n^2) + a_n \left[ \frac{1}{n} + (\beta e^{1-\beta})^{n+1} \right]}_{r_n}.
\]
Note that $\beta e^{1-\beta} < 1$. With $\epsilon_n = \sqrt{\ln n / n}$ and $|\mathcal{T}_n| = \lfloor (\ln n)^{[\ldots]} \rfloor$, it is easy to see that $r_n \to 0$ as $n \to \infty$. Furthermore, for $a_n = n^{[\ldots]}$, $\beta = 0.95$, and $n = 5000$, $r_n \approx [\ldots] \times 10^{-[\ldots]}$.

REFERENCES

[1] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol., 1, 289–300.
[2] Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educ. Behav. Statist., 1, 60–83.
[3] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 4, 1165–1188.
[4] Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statist. Sci., 1, 1351–1377.
[5] Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. G. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 456, 1151–1160.
[6] Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B Stat. Methodol., 3, 499–517.
[7] Genovese, C. and Wasserman, L. (2006). Exceedance control of the false discovery proportion. J. Amer. Statist. Assoc., 476, 1408–1417.
[8] Jin, J. and Cai, T. T. (2007). Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J. Amer. Statist. Assoc., 478, 495–506.
[9] Karatzas, I. and Shreve, S. E. (1991). Brownian motion and stochastic calculus, Second ed. Graduate Texts in Mathematics, Vol. […]. Springer-Verlag, New York.
[10] Lehmann, E. L. and Romano, J. P. (2005). Generalizations of the familywise error rate. Ann. Statist., 3, 1138–1154.
[11] Lehmann, E. L., Romano, J. P., and Shaffer, J. P. (2005). On optimality of stepdown and stepup multiple test procedures. Ann. Statist., 3, 1084–1108.
[12] Massart, P. (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab., 3, 1269–1283.
[13] R Development Core Team. (2005). R: A language and environment for statistical computing […]
[14] […] Ann. Statist., 1, 337–363.
[15] Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol., 1, 187–205.
[16] van der Laan, M. J., Dudoit, S., and Pollard, K. S. (2004). Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Stat. Appl. Genet. Mol. Biol. 3, Art. 15, 27 pp. (electronic).

Department of Statistics
University of Connecticut
215 Glenbrook Road, U-4120
Storrs, CT 06269
E-mail: [email protected]

Fig 1. Plots of $n \bar p_{(i)}/i$ versus $i/n$ in simulations 1–5 for different types of $p$-values: $p_{i,\mathrm{seq}}$ ("lp-sequential"), $p_{i,\mathrm{glb}}$ ("lp-global"), $p_{i,\max}$ ("max"), and $p_{i,\mathrm{mix}}$ ("mix"). [Panels: Simulations 1–5.]

Fig 2. Plots of $c_{k,(i)}$ versus $i/n$, $k = 1, \dots, n$, in simulations 1 and 5, where $c_{1,(i)}, \dots, c_{L,(i)}$ are the coefficients to attain $p_{(i),\mathrm{seq}}$ (left) or $p_{(i),\mathrm{glb}}$ (right). [Panels: Simulations 1 and 5.]

Fig 3. Plots of $n \bar p_{(i)}/i$ versus $i/n$ in simulations 1–5, with $i/n \le 0.05$. The plots with open symbols are those of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ as in Figure 1. The plots with closed symbols are those of $p_{i,\mathrm{seq}}$ and $p_{i,\mathrm{glb}}$ computed with the extra constraint $c_1 + \cdots + c_L \ge$ […]. [Panels: Simulations 1–5.]

Fig 4. Plots of $c_{k,(i)}$ versus $i/n$, $k = 1, \dots, n$, in simulations 1 and 5, where $c_{1,(i)}, \dots, c_{L,(i)}$ are the coefficients to attain $p'_{(i),\mathrm{seq}}$ (left) or $p'_{(i),\mathrm{glb}}$ (right), under the constraint $c_1 + \cdots + c_L \ge$ […] $p_{(i),\mathrm{seq}}$ and $p_{(i),\mathrm{glb}}$. [Panels: Simulations 1 and 5.]