Empirical Bayes cumulative \ell-value multiple testing procedure for sparse sequences
Kweku Abraham, Ismaël Castillo and Étienne Roquain
arXiv [math.ST]
Université Paris–Saclay, e-mail: [email protected]
Sorbonne Université, Laboratoire Probabilités, Statistique et Modélisation, 4, place Jussieu, 75005 Paris, France, e-mail: [email protected]; [email protected]
Abstract:
In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate for the first time its behaviour from the frequentist point of view. Given a spike–and–slab prior on the high-dimensional sparse unknown parameter, one can easily compute posterior probabilities of coming from the spike, which correspond to the well known local-fdr values [25], also called $\ell$–values. The spike–and–slab weight parameter is calibrated in an empirical Bayes fashion, using marginal maximum likelihood. The multiple testing procedure under study, called here the cumulative $\ell$–value procedure, ranks coordinates according to their empirical $\ell$–values and thresholds so that the cumulative ranked sum does not exceed a user–specified level $t$. We validate the use of this method from the multiple testing perspective: for alternatives of appropriately large signal strength, the false discovery rate (FDR) of the procedure is shown to converge to the target level $t$, while its false negative rate (FNR) goes to 0. We complement this study by providing convergence rates for the method. Additionally, we prove that the $q$–value multiple testing procedure [40, 14] shares similar convergence rates in this model.
MSC2020 subject classifications:
Primary 62G20; secondary 62G07, 62G15.
Keywords and phrases:
Bayesian nonparametrics, spike–and–slab priors, multiple testing, false discovery rate, empirical Bayes, local-fdr.
imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021
Contents
A Auxiliary results
B Notation
References
1. Introduction
Multiple testing problems are ubiquitous and encountered in applications as diverse as genomics, imaging, and astrophysics. The seminal paper of Benjamini and Hochberg [7] introduced the False Discovery Rate (FDR) as a criterion for multiple testing and provided a procedure controlling it, the so-called Benjamini–Hochberg procedure. Subsequent papers adapted this procedure in different contexts [9, 8, 10, 36, 33, 27, 17, 21, 32, 20, 11, 5, 6, 28]. We focus here on another class of multiple testing procedures, also widely used in practice, consisting of empirical Bayesian procedures. These have been made popular in particular through the two-group model [25] and a series of papers by Efron [22, 23, 24]; see also [42, 3, 41] for several extensions. More specifically, the local FDR (called $\ell$–value here) can be seen as a Bayesian quantity corresponding to the probability of being under the null distribution conditionally on the value of the test statistic. This probability is typically estimated by plugging in estimators of model aspects, which follows the general philosophy of empirical Bayes methods. Using $\ell$–values instead of $p$–values is often considered to be more powerful [42], which explains the popularity of these significance measures in practical applications, including genomic data and biostatistics [34, 18, 45, 29, 2, 39, 26] but also other applied fields, such as neuro-imaging, as in e.g. [31]. In addition, the detection ability of $\ell$–values can be increased further by adding structure on the null configurations via a latent model, such as a hidden Markov model [43, 1] or a stochastic block model [35], or via covariates [12].

Despite their popular practical use, Bayesian multiple testing methods remain much less understood from the theoretical point of view than $p$–value based approaches. Decision-theoretic arguments inspire most practical algorithms based on the Bayesian distribution (see among others the $\ell$–, C$\ell$– and $q$–value procedures defined below).
Such arguments are theoretically justified under the assumption that the data has been generated from a model which includes specific random modelling of the latent parameters, and this random modelling can be seen as a Bayesian prior. Yet, in practice, especially in sparsity problems, specification of prior aspects such as the number of effective parameters and the distribution of alternative means is delicate. In the frequentist–Bayes literature, an alternative is to look for prior distributions that can be proved to have optimal or near-optimal behaviour from the frequentist point of view (see Section 1.4 below for general references). This leads to the question of studying Bayesian multiple testing procedures in the frequentist sense. From the perspective of multiple testing theory, the goal is to design procedures that are robust with respect to the latent modelling, which is in line with the classical strong error rate control [19].

While most of the literature on multiple testing for Bayesian methods has focused on latent variable modelling with a random 'signal' parameter, we thus focus here on the case of any deterministic signal. There are very few works so far in this setting (we present a brief literature review in Section 1.4) and the present work can be seen as a continuation of [14]. In that work, a family of spike–and–slab prior distributions was considered and frequentist properties of two multiple testing procedures were investigated in the sparse sequence model: the $\ell$–value procedure, where testing is based on the posterior probability that a given null hypothesis is true, and the $q$–value procedure, based on the Bayesian probability of the null given the hypothetical event that the data exceeds the value actually observed. A different procedure very popular in practice is one based on cumulative ranked $\ell$–values, called the C$\ell$–value procedure below. This procedure was conjectured to have desirable frequentist properties in [14].
The aim of the present paper is to confirm this conjecture: the C$\ell$–value procedure is studied here for the first time from the frequentist perspective in the setting of sparse deterministic hypotheses. We now proceed to introducing in more detail the model, the inferential goals, and the multiple testing procedures to be considered. Notation used in the paper is collected in Appendix B for the reader's convenience.
Model.
Consider the Gaussian sequence model, for $\theta = (\theta_1, \dots, \theta_n) \in \mathbb{R}^n$,
\[ X_i = \theta_i + \varepsilon_i, \qquad 1 \le i \le n, \tag{1} \]
where the noise variables $(\varepsilon_i)_{i \le n}$ are assumed to be iid standard Gaussians $\mathcal{N}(0,1)$, with density denoted by $\phi$. We assume that there exists a true (unknown) vector $\theta_0 \in \mathbb{R}^n$ that is sparse: specifically, if $\|\theta\|_{\ell_0} := \#\{1 \le i \le n : \theta_i \neq 0\}$ denotes the number of non-zero coordinates of $\theta$, we assume that $\theta_0 \in \ell_0(s_n)$ for a sequence $s_n \to \infty$ satisfying $s_n/n \to 0$ as $n \to \infty$, where for $s \ge 0$,
\[ \ell_0(s) = \{\theta \in \mathbb{R}^n : \|\theta\|_{\ell_0} \le s\}. \tag{2} \]
The distribution of the data under the true $\theta_0$ is given by
\[ P_{\theta_0} = \bigotimes_{i=1}^n \mathcal{N}(\theta_{0,i}, 1), \]
where $\theta_0$ satisfies the sparsity constraint (2) but is otherwise arbitrary and non–random. To make inference on $\theta$, we follow a Bayesian approach and endow $\theta$ with a prior distribution $\Pi$. Using Bayes' formula one can then form the posterior distribution $\Pi[\cdot \mid X]$, which is the conditional distribution of $\theta$ given $X$ in the Bayesian framework. The choice of $\Pi$ (and the corresponding posterior $\Pi[\cdot \mid X]$) will be specified in more detail in Section 1.3.1 below. To assess the validity of inference using $\Pi[\cdot \mid X]$, we study the behaviour of the latter (or of aspects of it used to build a testing procedure) in probability under the true frequentist distribution $P_{\theta_0}$.

Multiple testing inferential problem, FDR and FNR.
We consider the multiple testing problem of determining for which $i$ we have signal, that is, $\theta_{0,i} \neq 0$. More formally, we analyse a procedure $\varphi(X) = (\varphi_i(X))_{1 \le i \le n}$, taking values in $\{0,1\}^n$, that for each coordinate $i$ guesses whether or not signal is present. To evaluate the quality of such a procedure $\varphi$, one needs to consider certain risk or loss functions. Here we focus on the most popular such risks, defined as follows: the FDR (false discovery rate) is the average proportion of errors among the positives, while the FNR (false negative rate) is the average proportion of errors among the true non-zero signals.

More precisely, first define the false discovery proportion (FDP) at $\theta_0$ by
\[ \mathrm{FDP}(\varphi) = \mathrm{FDP}(\varphi; \theta_0) := \frac{\sum_{i=1}^n \mathbf{1}\{\theta_{0,i} = 0,\ \varphi_i = 1\}}{1 \vee \left(\sum_{i=1}^n \varphi_i\right)}. \tag{3} \]
Then the FDR at $\theta_0$ is given by
\[ \mathrm{FDR}(\varphi) = \mathrm{FDR}(\varphi; \theta_0) := E_{\theta_0}[\mathrm{FDP}(\varphi)]. \tag{4} \]
Similarly, the false negative rate (FNR) at $\theta_0$ is defined as
\[ \mathrm{FNR}(\varphi) = \mathrm{FNR}(\varphi; \theta_0) := E_{\theta_0}\left[ \frac{\sum_{i \le n} \mathbf{1}\{\theta_{0,i} \neq 0,\ \varphi_i = 0\}}{1 \vee \left(\sum_{i \le n} \mathbf{1}\{\theta_{0,i} \neq 0\}\right)} \right]. \tag{5} \]
To use classical testing terminology, the FDR can be interpreted as a type I error rate, while the FNR corresponds to a type II error rate. The former is ubiquitous and the latter, with the current (non-random) choice of denominator, has been widely used in recent contributions.

The aim of multiple testing in this setting is to find procedures that keep both types of error under control. Inevitably, in the sparse setting in model (1), achieving this will require some signal strength assumption (see (22) below).

For $w \in (0,1)$, let $\Pi_w = \Pi_{w,\gamma}$ denote a spike–and–slab prior for $\theta$, where, for $G$ a distribution with density $\gamma$,
\[ \Pi_w = \left((1-w)\delta_0 + wG\right)^{\otimes n}. \tag{6} \]
That is, under $\Pi_w$, the coordinates of $\theta$ are independent, and are either exactly equal to 0, with probability $1-w$, or are drawn from the 'slab' density $\gamma$. When the Bayesian model holds, the data $X$ follows a mixture distribution, with each coordinate $X_i$ independently having density $(1-w)\phi + wg$, where $g$ denotes the convolution $\phi \star \gamma$. This shares similarities with the well-known two-group model in the multiple testing literature [25]: the only difference is that here the alternative (i.e. the slab) is fixed a priori, rather than estimated from the data.

In this work, we consider in particular a 'quasi-Cauchy' alternative, wherein
\[ g(x) = (2\pi)^{-1/2}\, x^{-2}\left(1 - e^{-x^2/2}\right), \qquad x \in \mathbb{R}. \tag{7} \]
The references [30, 14] consider more generally a family of heavy-tailed distributions governed by a parameter $\kappa \in [1,2]$; the quasi-Cauchy alternative corresponds to $\kappa = 2$, and we note that most of the calculations in the current paper work unchanged in the Laplace case $\kappa = 1$. Some, however, require minor adjustment, and in particular, one should expect a slightly different rate of convergence of the FDR to $t$ in Theorems 2 and 3 below.

The posterior distribution $\Pi_w(\cdot \mid X)$ can be explicitly derived as
\[ \theta \mid X \sim \bigotimes_{i=1}^n \left( \ell_{i,w}(X)\,\delta_0 + (1 - \ell_{i,w}(X))\, G_{X_i} \right), \]
where $G_x$ is the distribution with density $\gamma_x(u) := \phi(x-u)\gamma(u)/g(x)$ and
\[ \ell_{i,w}(X) = \Pi_w(\theta_i = 0 \mid X) = \ell(X_i; w), \quad 1 \le i \le n, \qquad \ell(x; w) = \frac{(1-w)\phi(x)}{(1-w)\phi(x) + w g(x)} \in (0,1), \quad x \in \mathbb{R}. \tag{8} \]
The quantities $\ell_{i,w}(X)$, $1 \le i \le n$, are called the $\ell$–values. Note that $w \mapsto \ell_{i,w}(X)$ is decreasing. For short, we sometimes write $\ell_i(X)$ or $\ell_{i,w}$ for $\ell_{i,w}(X)$ when not ambiguous. In words, each $\ell_{i,w}(X)$ corresponds to the posterior probability that the measurement $X_i$ comes from the null, this probability being computed in the Bayesian model with the spike–and–slab prior (6). Let us underline that, in the usual multiple testing terminology of the two-group model, the posterior probability $\ell_{i,w}(X)$ corresponds to the $i$-th local fdr of the data, when the alternative density is $g$, the null density is $\phi$, and the proportion of true nulls is $1-w$; see, e.g., [24].

In the empirical Bayes framework, one first estimates $w$ empirically from the data using, for example, the maximum (marginal) likelihood estimator, defined as the maximiser (which exists almost surely, in view of Lemma 4)
\[ \hat{w} = \operatorname*{argmax}_{w \in [1/n,\, 1]} L(w), \tag{9} \]
where $L(w)$ denotes the marginal log-likelihood function for $w$, which can be expressed as
\[ L(w) = \sum_{i=1}^n \log \phi(X_i) + \sum_{i=1}^n \log\left(1 + w\beta(X_i)\right), \qquad \beta(x) := \frac{g}{\phi}(x) - 1. \tag{10} \]
The resulting empirical Bayes (EB) posterior is simply $\Pi_{\hat{w}}[\cdot \mid X]$. Finding a maximiser $\hat{w}$ and simulating from this distribution, or calculating aspects such as the posterior mean or median, can be done in a fast and efficient way and has been implemented in the EBayesThresh R package. From the theoretical perspective, a lot of progress has been made in the last few years in understanding the behaviour of the empirical Bayes posterior, in connection with the study of Bayesian procedures in sparsity settings, and we briefly review such results in Section 1.4 below.

We start by recalling the notions of the Bayesian FDR and posterior FDR under a prior distribution, see [38], and then define the three Bayesian multiple testing procedures of interest for this work.
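As an illustration of the empirical Bayes step described above, the following minimal Python sketch evaluates the $\ell$–values (8) for the quasi-Cauchy alternative (7) and approximates the maximiser (9) of the marginal likelihood (10). It is not the EBayesThresh implementation: the function names `beta`, `l_value`, `score` and `mmle_weight` are hypothetical, and the bisection on the derivative of $L$ is one possible choice, relying on the fact that $L$ is concave in $w$ (its derivative $\sum_i \beta(X_i)/(1 + w\beta(X_i))$ is nonincreasing in $w$).

```python
import math


def beta(x: float) -> float:
    """beta(x) = g(x)/phi(x) - 1 for the quasi-Cauchy g of (7)."""
    if abs(x) < 1e-8:
        return -0.5  # limit of (exp(x^2/2) - 1)/x^2 - 1 as x -> 0
    # g/phi = (exp(x^2/2) - 1)/x^2; this overflows for |x| above roughly 37
    return math.expm1(0.5 * x * x) / (x * x) - 1.0


def l_value(x: float, w: float) -> float:
    """l(x; w) of (8), rewritten as (1 - w) / (1 + w * beta(x))."""
    return (1.0 - w) / (1.0 + w * beta(x))


def score(w: float, xs: list) -> float:
    """Derivative in w of the marginal log-likelihood L(w) of (10)."""
    return sum(beta(x) / (1.0 + w * beta(x)) for x in xs)


def mmle_weight(xs: list, iters: int = 60) -> float:
    """Approximate maximiser of L over [1/n, 1] by bisection on the score."""
    lo, hi = 1.0 / len(xs), 1.0
    if score(lo, xs) <= 0.0:   # L is nonincreasing on [1/n, 1]
        return lo
    if score(hi, xs) >= 0.0:   # L is nondecreasing on [1/n, 1]
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [0.0] * 8 + [5.0] * 2          # toy data: 8 nulls, 2 strong signals
    w_hat = mmle_weight(data)
    print(w_hat, [round(l_value(x, w_hat), 4) for x in data])
```

On the toy data, the two observations equal to 5 receive $\ell$–values close to 0 and the eight observations equal to 0 receive $\ell$–values well above any usual target level, in line with the interpretation of $\ell_{i,w}(X)$ as a posterior null probability.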
BFDR and postFDR.
The Bayesian FDR is the FDR when, instead of having a fixed $\theta_0$, the parameter $\theta$ is truly generated from the prior $\Pi_w$:
\[ \mathrm{BFDR}(\varphi) = \mathrm{BFDR}_{w,\gamma}(\varphi) := E_{\theta \sim \Pi_w} \mathrm{FDR}(\varphi; \theta), \tag{11} \]
and the posterior FDR is the FDR obtained by drawing $\theta$ from its posterior:
\[ \mathrm{postFDR}_w(\varphi) := E_{\Pi_w(\cdot \mid X)}[\mathrm{FDP}(\varphi)] = \frac{\sum_{i=1}^n \ell_{i,w}(X)\,\varphi_i}{1 \vee \left(\sum_{i=1}^n \varphi_i\right)}. \tag{12} \]
Note that $\mathrm{postFDR}_w(\varphi)$ decreases as $w$ increases (for a fixed procedure $\varphi$), as a result of the monotonicity of the $\ell$–values (see Lemma 4).

$\ell$–value procedure. Let us consider a family of multiple testing procedures $\varphi = \varphi_{\lambda,w}$ based on $\ell$–value thresholding as follows. For any given level $\lambda \in (0,1]$, set
\[ \varphi_{\lambda,w}(X) = \left(\mathbf{1}\{\ell_{i,w}(X) < \lambda\}\right)_{1 \le i \le n}. \tag{13} \]
The $\ell$–value procedure at level $t$ is then defined by $\varphi_{t,\hat{w}}(X)$, for $\hat{w}$ as in (9).

C$\ell$–value procedure. Given the collection of procedures (13) for different thresholds $\lambda$, another way to choose $\lambda$ is to ensure the posterior FDR (12) is controlled at a level as close as possible to the target level $t$. This gives the C$\ell$–value procedure defined, for $\hat{w}$ as in (9), as
\[ \varphi^{C\ell} = \varphi_{\hat{\lambda},\hat{w}}, \qquad \hat{\lambda} = \hat{\lambda}(\hat{w}, t) := \sup\{\lambda \in (0,1] : \mathrm{postFDR}_{\hat{w}}(\varphi_{\lambda,\hat{w}}) \le t\}. \tag{14} \]
This is a reformulation of the procedure considered in, e.g., [34, 42]. The original expression of $\varphi^{C\ell}$ in these references (using cumulative sums rather than the level $\lambda$) can be derived from the observation that we necessarily threshold at one of the observed $\ell$–values (or rather, $\hat{\ell}_i = \ell_{i,\hat{w}}(X)$ values) since the posterior FDR only changes when we cross such a value. The threshold is $\hat{\lambda} = \hat{\ell}_{(\hat{K}+1)}$, with $\hat{\ell}_{(i)}$ denoting the $i$th order statistic of $\{\ell_{i,\hat{w}} : 1 \le i \le n\}$, and we therefore reject the null hypotheses for indices corresponding to the $\hat{K}$ smallest $\hat{\ell}$–values, where $\hat{K}$ is defined by
\[ \frac{1}{\hat{K}} \sum_{i=1}^{\hat{K}} \hat{\ell}_{(i)} \le t < \frac{1}{\hat{K}+1} \sum_{i=1}^{\hat{K}+1} \hat{\ell}_{(i)}. \tag{15} \]
(By convention the left inequality automatically holds in the case $\hat{K} = 0$. If the right inequality is not satisfied for any $\hat{K} < n$, we set $\hat{K} = n$ and $\hat{\lambda} = 1$.) Note that $\hat{K}$ is well defined and unique, by monotonicity of the average of nondecreasing numbers. This monotonicity also makes clear the following dichotomy, which will prove useful in the sequel: for all $t \in (0,1)$ and $\lambda \in (0,1]$,
\[ \mathrm{postFDR}_{\hat{w}}(\varphi_{\lambda,\hat{w}}) \le t \iff \lambda \le \hat{\lambda}. \tag{16} \]
This indicates that the supremum in (14) is a maximum. Also observe that $\mathrm{postFDR}_{\hat{w}}(\varphi_{t,\hat{w}}) \le t$, so that $\hat{\lambda} \ge t$ and the $\ell$–value procedure is always more conservative than the C$\ell$–value procedure.

Footnote (on the order statistics): In principle we define the order statistics so that repeats are allowed, defining them by the traits $\{\ell_i, i \le n\} = \{\ell_{(j)}, j \le n\}$ as a multiset ($\forall x \in \mathbb{R}$, $\#\{i : \ell_i = x\} = \#\{i : \ell_{(i)} = x\}$) and $\ell_{(1)} \le \ell_{(2)} \le \dots \le \ell_{(n)}$. When $\ell_{(\hat{K})} = \ell_{(\hat{K}-1)}$, in fact $\varphi^{C\ell}$ as defined in (14) rejects fewer than $\hat{K}$ hypotheses. However, with probability 1, the $\hat{\ell}$ values are all distinct, due to the Gaussianity of $X_i$ and the strict increasingness of the map $x \mapsto \ell_{i,w}(x)$, see Lemma 4.

$q$–value procedure. Another way to calibrate a procedure $\varphi_i = \mathbf{1}\{|X_i| \ge x\}$ in order to control the (B)FDR is to further simplify the expectation of a ratio defining the BFDR and instead consider the ratio of expectations,
\[ q(x; w) = \frac{E_{\theta \sim \Pi_w} \sum_{i=1}^n \mathbf{1}\{\theta_i = 0\}\, \mathbf{1}\{|X_i| \ge |x|\}}{E_{\theta \sim \Pi_w} \sum_{i=1}^n \mathbf{1}\{|X_i| \ge |x|\}} = \frac{(1-w)\bar{\Phi}(|x|)}{(1-w)\bar{\Phi}(|x|) + w\bar{G}(|x|)}, \qquad x \in \mathbb{R}, \]
where $\bar{\Phi}$ and $\bar{G}$ denote the upper tail functions of the densities $\phi$ and $g$ respectively. The $q$–values are then given by
\[ q_{i,w}(X) = q(X_i; w) = \frac{(1-w)\bar{\Phi}(|X_i|)}{(1-w)\bar{\Phi}(|X_i|) + w\bar{G}(|X_i|)}, \qquad 1 \le i \le n, \tag{17} \]
and the $q$–value procedure is defined by thresholding the $q$–values at the target level $t$:
\[ \varphi^{q\text{–val}}(X) = \left(\mathbf{1}\{q_{i,\hat{w}}(X) < t\}\right)_{1 \le i \le n}. \tag{18} \]
Thanks to monotonicity of both the $q$– and $\ell$–values (see Lemma 4), $\varphi^{q\text{–val}}$ lies in the class (13), so that $\varphi^{q\text{–val}} = \varphi_{\lambda_q,\hat{w}}$ for some $\lambda_q = \lambda_q(\hat{w}, t)$. As with the $\ell$–values, we sometimes write $q_{i,w}$ for $q_{i,w}(X)$.

Rationale behind these procedures for FDR control.
Let us now give some intuition behind the introduction of such procedures. Consider $\varphi_{t,w}$, $\varphi_{\hat{\lambda}(w,t),w}$, and $\varphi_{\lambda_q(w,t),w}$; that is, the $\ell$–, C$\ell$– and $q$–value procedures respectively, but with a fixed value of $w$. All three control the Bayesian FDR (BFDR) at level $t$ under the prior $\Pi_w$: for the first and third procedures, see Proposition 1 in [14]; for the C$\ell$–value procedure with fixed $w$, since $\mathrm{postFDR}_w(\varphi_{\hat{\lambda}(w,t),w}) \le t$, we directly have $\mathrm{BFDR}(\varphi_{\hat{\lambda}(w,t),w}) \le t$ by taking expectations. Hence, from the decision–theoretical perspective, if the prior $\Pi_w$ is "correct", these procedures are bona fide for the purpose of controlling the BFDR. Note that this says nothing when the procedures are constructed using a random $w$, which (as in (9)) is typically what is done in practice. In addition, to derive frequentist properties, the procedure has to be evaluated under a fixed truth $\theta_0$, which takes it even further from the previous decision-theoretic argument. Yet, one can expect that for $n$ large, $\hat{w}$ (and consequently the plug-in posterior $\Pi_{\hat{w}}[\cdot \mid X]$) concentrates in an appropriate way, giving the hope, validated by Theorem 1 below for the C$\ell$–value procedure with strong signals, that the frequentist FDR at $\theta_0$ can still be controlled.

$q$–value and C$\ell$–value: some differences. Note that, as originally introduced by [40], $q(x)$ corresponds to $P_{(\theta,X)}(\theta_i = 0 \mid |X_i| \ge x)$. Hence, the $q$–value $q_{i,w}(X)$ corresponds to the conditional probability $q_{i,w}(X(\omega)) = P_{\theta \sim \Pi_w}(\theta_i = 0 \mid |X_i| \ge |X_i(\omega)|)$. Nevertheless, it is not based solely on the posterior $\Pi_w(\cdot \mid X)$ but rather on the joint distribution of $(\theta, X)$: in the conditioning, the event $|X_i| \ge |X_i(\omega)|$ involves measurements $X_i$ more extreme than the observed one $X_i(\omega)$. By contrast, the C$\ell$–value procedure depends only on the observed event and not on other events that one hypothetically could have observed.
From a philosophical point of view, it follows that while both procedures adhere to multiple testing principles, the C$\ell$–value procedure more closely aligns with Bayesian principles. This potentially also has positive implications for computation, since the C$\ell$–value procedure can be calculated directly from $\ell$–values, while computation of $q$–values must be done separately and can be more involved for more complicated priors/models.

Frequentist analysis of $\Pi_{\hat{w}}[\cdot \mid X]$. Recently, a number of works have analysed different aspects of inference for the EB–posterior distribution, mostly from the estimation perspective. The paper [30] pioneered this study by establishing that the posterior median and mean converge at minimax rates over sparse classes for the quadratic risk. The posterior distribution itself was studied in [13] and results on frequentist coverage of Bayesian credible sets were obtained in [16]. This connects to the analysis of Bayesian methods in high-dimensional settings, where a variety of prior distributions (e.g. different types of spike–and–slab priors, continuous shrinkage priors including the horseshoe or mixture of Gaussians) and methods (e.g. empirical Bayes, fully Bayes, variational Bayes) have been considered. We refer to [4] for a review of the rapidly growing literature on the subject.

Summary of results from [14]. Given the optimality properties of the empirical Bayes posterior distribution for estimation and confidence sets yielded by the above results, it is natural to ask whether efficient multiple testing procedures can be constructed from it. In [14], this question was considered in the present setting (under an additional 'polynomial sparsity' condition) and the following was obtained:
• the $\ell$–value procedure controls the FDR, uniformly over all sparse alternatives. Its FDR converges to 0 at a slow (logarithmic) rate. For alternatives with large enough "signal strength", the $\ell$–value procedure has a vanishing FNR.
• the $q$–value procedure controls the FDR close to the target level, uniformly over all sparse alternatives. For alternatives with large enough signal strength, the $q$–value procedure has FDR converging to the target level, and a vanishing FNR.

A numerical study in the same reference confirmed the excellent behaviour of the $\ell$–, $q$– and C$\ell$–value procedures in practice, with some differences between the procedures appearing, as expected from the above theoretical results: in particular, the $\ell$–value procedure is slightly too conservative and has FDR tending to 0 for any level $t \in (0,1)$; that is, the $\ell$–value procedure does not exactly follow the FDR–scale. The $q$–value procedure was shown on the other hand to scale "correctly" in terms of FDR by having its FDR going to the target level $t$. The simulations suggested, and our results below confirm, that the C$\ell$–value procedure is able to adjust an $\ell$–value thresholding procedure to follow the FDR–scale by choosing a threshold $\hat{\lambda} \ge t$ which will in fact be shown to converge to 1.

Previous work on Bayesian methods for deterministic hypotheses.
While particularly good behaviour is often reported in simulations for the empirical FDR using procedures based on empirical Bayes principles (see e.g. Section 5, Figure 7, in [44] for the use of an empirical Bayes–calibrated horseshoe prior), the paper [14] described above and [37], which considered continuous shrinkage priors, are among the very few providing frequentist guarantees on FDR control. One goal of the present paper is to obtain further results in this direction.
A model often considered in the literature on multiple testing is the following:
\[ \theta = (\theta_1, \dots, \theta_n) \sim Q; \tag{19} \]
\[ X_i \mid \theta_i \overset{\text{indep.}}{\sim} g_{\theta_i}, \tag{20} \]
where the $\theta_i$'s are random latent states, say taking values in $\{0,1\}$, $Q$ is a probability distribution on such states, and $g_{\theta_i}$ is the density of the data point $X_i$ given one is in the state $\theta_i$. When the $\theta_i$'s are independent, one recovers the so-called two-group model [25]. Another setting of interest is the case where $Q$ follows a Markov chain, in which case the model (19)–(20) is a Hidden Markov Model (HMM). The work [43] derived results for the C$\ell$–value multiple testing procedure in the case of parametric assumptions on the emission densities of the HMM, while the nonparametric setting for emission densities has recently been considered in [1]. Other examples include two–sample multiple testing [12] and graph–data with underlying stochastic block-model structures [35]. Such latent variable approaches can be interpreted as Bayesian methods if we consider the layer (19) as a prior distribution. The FDR control provided in those works is thus a BFDR control in the terminology used in the current paper: that is, an FDR control integrated over the prior, as in (11). In other words, the prior distribution is considered to be "true", and the main challenge of these studies is to deal with the estimation of the (hyper-)parameters $Q$ and e.g. $g_0$, $g_1$. By contrast, in the sparse setting considered here, we are able to control the FDR without assuming the latent structure (19)–(20) is genuinely true.

While [14] studied $\ell$– and $q$–value procedures, the commonly used C$\ell$–value procedure was left aside. Its theoretical study is more involved, because it is "doubly empirical", with random choices of both $\hat{w}$ and $\hat{\lambda}$.
Our main results here can be informally described as follows:
• the C$\ell$–value procedure is, to the best of our knowledge, analysed for the first time in the sparse frequentist setting. By extending ideas from [14], we prove that the C$\ell$–value procedure controls the FDR at a user–predefined target level and has a vanishing FNR, for suitably large non–zero signals.
• the convergence rates of the FDR and FNR of the C$\ell$–value procedure to the target level and to 0 respectively are studied. One obtains logarithmic rates, confirming a conjecture formulated in [14], Section S–8 (in the supplementary material [15]).
• the convergence rates of the FDR and FNR are also studied for the $q$–value procedure, thereby complementing Theorems 3 and 4 of [14].
2. Main results
Let us define a 'strong signal class' of $\theta_0$'s with exactly $s_n$ non-zero entries, each of which is "large". For $\theta_0 \in \ell_0(s_n)$, denote by $S_0$ the support of $\theta_0$,
\[ S_0 = \{i : \theta_{0,i} \neq 0\}. \tag{21} \]
Then for a sequence $v_n \to \infty$ we define the strong signal class
\[ \ell_0(s_n; v_n) = \left\{ \theta_0 \in \ell_0(s_n) : |\theta_{0,i}| \ge \sqrt{2\log(n/s_n)} + v_n \text{ for } i \in S_0,\ |S_0| = s_n \right\}. \tag{22} \]

Theorem 1.
Fix $t \in (0,1)$. Consider any sequence $s_n \to \infty$ such that $s_n/n \to 0$ and any sequence $v_n \to \infty$. Then, as $n \to \infty$,
\[ \sup_{\theta_0 \in \ell_0(s_n; v_n)} \left| \mathrm{FDR}(\varphi^{C\ell}; \theta_0) - t \right| \to 0, \tag{23} \]
\[ \sup_{\theta_0 \in \ell_0(s_n; v_n)} \mathrm{FNR}(\varphi^{C\ell}; \theta_0) \to 0. \tag{24} \]

Let us emphasise that the conclusion of Theorem 1 does not mention the prior, holding for any deterministic $\theta_0$ in the strong signal class, not only for non-zero entries of $\theta_0$ drawn from the quasi-Cauchy distribution. Moreover, this frequentist consistency result holds uniformly across the strong signal set $\ell_0(s_n; v_n)$.

The key novelty required in the proof of Theorem 1 relative to the proofs in [14] for the $\ell$–value and $q$–value procedures is that, as a result of the "doubly empirical" nature of the C$\ell$–value procedure, we must not only control the size and impact of fluctuations of $\hat{w}$ about some central value $w^*$, but also of $\hat{\lambda}$ around some $\lambda^*$.

Another key novelty is the weakening of conditions on $v_n$ and on $s_n$. In [14] it is assumed that there exists some $\nu < 1$ such that $s_n \le n^{\nu}$, but we are able to prove Theorem 1 without this 'polynomial sparsity' condition. The boundary assumption of [14] is equivalent to granting that $v_n \ge b(\log(n/s_n))^{1/2}$ for some $b > 0$, whereas here we assume only that $v_n \to \infty$. This new condition is sharp: if $v_n$ is upper bounded by a constant, then for $t$ small enough (depending on the upper bound), one cannot simultaneously achieve $\limsup_n \mathrm{FDR}(\psi) \le t$ and $\mathrm{FNR}(\psi) \to 0$ for $\psi = \varphi^{C\ell}$, or indeed for any multiple testing procedure $\psi$. This is proved, along with other properties concerning the sharp boundary, in an upcoming work by the current authors.

The following result strengthens the conclusion of Theorem 1, showing that the FDR converges to $t$ from above and obtaining a precise rate of convergence, at the cost of requiring mild extra conditions on $s_n$ and $v_n$.

Theorem 2.
In the setting of Theorem 1, assume also that $s_n \ge (\log n)^{[\ldots]}$, $v_n \ge [\ldots](n/s_n))^{[\ldots]}$. Then there exist constants $c, C, C' > 0$ depending on $t$ such that uniformly over $\theta_0 \in \ell_0(s_n; v_n)$, for all $n$ large enough we have
\[ c\,\frac{\log\log(n/s_n)}{\log(n/s_n)} \le \mathrm{FDR}(\varphi^{C\ell}) - t \le C\,\frac{\log\log(n/s_n)}{\log(n/s_n)}, \tag{25} \]
\[ \mathrm{FNR}(\varphi^{C\ell}) \le C' \left(\log\frac{n}{s_n}\right)^{-[\ldots]}. \tag{26} \]

Remarks.
i. In fact we prove the stronger false discovery proportion result (implying (25)) that for some $c, C > 0$, writing $\varepsilon_n = \log\log(n/s_n)/\log(n/s_n)$, we have
\[ c\,\varepsilon_n \le \mathrm{FDP}(\varphi^{C\ell}) - t \le C\,\varepsilon_n \quad \text{with probability at least } 1 - o(\varepsilon_n), \]
and correspondingly for the false negative proportion.
ii. The bound $s_n \ge (\log n)^{[\ldots]}$ can be relaxed to $s_n \ge b(\log n)^{[\ldots]}/\log\log n$ for some large enough constant $b = b(t)$: see Lemma 19.

Let us compare this result to the $q$–value procedure. It was proved in [14] that, in a similar setting to that of Theorem 1, the analogous conclusion holds for the $q$–value procedure: it has FDR tending to $t$ (Theorem 3 therein) and FNR converging to 0 (Theorem 4 therein). No convergence rate was provided. In the simulations of [14], the FDR of the C$\ell$–value procedure seems larger than that of the $q$–value procedure, which could suggest that the convergence rate of the C$\ell$–value procedure is slower, as might be an expected consequence of the fact that the C$\ell$–value procedure uses the two empirical quantities $\hat{w}$ and $\hat{\lambda}$ while the $q$–value procedure uses only $\hat{w}$. The following result shows that this intuition is not correct: the two procedures have the same convergence rate for the FDR.

Theorem 3.
In the setting of Theorem 1, we have both $\sup_{\theta_0 \in \ell_0(s_n; v_n)} |\mathrm{FDR}(\varphi^{q\text{–val}}; \theta_0) - t| \to 0$ and $\sup_{\theta_0 \in \ell_0(s_n; v_n)} \mathrm{FNR}(\varphi^{q\text{–val}}; \theta_0) \to 0$. Furthermore, in the setting of Theorem 2 there exist constants $c, C, C' > 0$ depending on $t$ such that uniformly over $\theta_0 \in \ell_0(s_n; v_n)$, for all $n$ large enough we have
\[ c\,\frac{\log\log(n/s_n)}{\log(n/s_n)} \le \mathrm{FDR}(\varphi^{q\text{–val}}; \theta_0) - t \le C\,\frac{\log\log(n/s_n)}{\log(n/s_n)}; \tag{27} \]
\[ \mathrm{FNR}(\varphi^{q\text{–val}}; \theta_0) \le C' \left(\log\frac{n}{s_n}\right)^{-[\ldots]}. \tag{28} \]

The key change relative to Theorem 3 in [14] is, as noted, that we obtain explicit convergence rates. As with Theorems 1 and 2 we have also weakened the boundary condition relative to [14], and allowed for sparsities $s_n$ only of slightly smaller order than $n$ rather than requiring polynomial sparsity.

The proof relies on the concentration of $\hat{w}$ and $\hat{\lambda}$. One shows (Lemmas 5 and 7) that $\hat{w}$ concentrates near a (deterministic) value $w^*$, of order slightly larger than $s_n/n$, that roughly maximizes the expectation of the log-likelihood (10). Recalling that $S_0$ denotes the support of $\theta_0$, $S_0 = \{i : \theta_{0,i} \neq 0\}$, the signal strength assumption ensures that for $i \in S_0$, with high probability $\ell_{i,w^*} \approx 0$. Hence, using that $\hat{\lambda} \ge t > 0$, we obtain
\[ \sum_{i \in S_0} \varphi^{C\ell}_i \approx s_n; \quad \text{and more precisely} \quad \sum_{i \in S_0} (1 - \varphi^{C\ell}_i) = o(s_n), \]
see Lemma 9. This implies that the FNR of $\varphi^{C\ell}$ tends to 0. For the FDR result, let $V_{\lambda,w}$, $\lambda, w \in [0,1]$, denote the number of false discoveries of $\varphi_{\lambda,w}$, that is
\[ V_{\lambda,w} = \sum_{i \notin S_0} \mathbf{1}\{\ell_{i,w} < \lambda\}. \tag{29} \]
One shows (Lemmas 6 and 8) that with high probability, $\hat{\lambda}$ is close to the solution $\lambda^*$ to
\[ E[V_{\lambda^*,w^*}] \left( E_{\theta_0 = 0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*] - t \right) = t s_n. \]
This is because $V_{\lambda^*,w^*}$ and $\sum_{i \notin S_0} \ell_{i,w^*} \mathbf{1}\{\ell_{i,w^*} < \lambda^*\}$ concentrate around their means (Lemma 10 and the proof of Lemma 8), ensuring that with high probability (recall $\ell_{i,w^*} \approx 0$ for $i \in S_0$)
\[ \mathrm{postFDR}_{w^*}(\varphi_{\lambda^*,w^*}) \approx \frac{\sum_{i \notin S_0} \ell_{i,w^*} \mathbf{1}\{\ell_{i,w^*} < \lambda^*\}}{s_n + \sum_{i \notin S_0} \mathbf{1}\{\ell_{i,w^*} < \lambda^*\}} \approx \frac{\sum_{i \notin S_0} E[\ell_{i,w^*}(X) \mathbf{1}\{\ell_{i,w^*} < \lambda^*\}]}{E[V_{\lambda^*,w^*}] + s_n} = \frac{E[V_{\lambda^*,w^*}]\, E_{\theta_0 = 0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*]}{E[V_{\lambda^*,w^*}] + s_n} = t. \]
Then, again using concentration of $V_{\lambda^*,w^*}$,
\[ \mathrm{FDR}(\varphi^{C\ell}) \approx \frac{E[V_{\lambda^*,w^*}]}{s_n + E[V_{\lambda^*,w^*}]} = \frac{t s_n / \left(E_{\theta_0=0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*] - t\right)}{s_n + t s_n / \left(E_{\theta_0=0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*] - t\right)} = \frac{t}{E_{\theta_0=0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*]} \approx t\left(1 + \left(1 - E_{\theta_0=0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*]\right)\right), \]
with the last approximation following from a Taylor expansion. Finally, one notes (Lemma 6) that $E_{\theta_0=0}[\ell_{1,w^*} \mid \ell_{1,w^*} < \lambda^*]$ converges to 1 (from below) at a rate $\varepsilon_n = \log\log(n/s_n)/\log(n/s_n)$.

The errors arising each time $\approx$ is invoked above depend on the sparsity $s_n$ and on the boundary separation sequence $v_n$, and are shown in the setting of Theorem 2 to be of smaller order than $\varepsilon_n$, so that this concludes the (sketch) proof.

Benefits of the C$\ell$–value procedure. The key contribution of this paper is to analyse the C$\ell$–value procedure.
This procedure, like the $q$– and $\ell$–value procedures, is in wide use in multiple testing and does not need our advocacy, but let us nevertheless highlight some advantages.

Taking a Bayesian perspective, $\ell$–value procedures, though optimal for classification problems, are less adapted to the FDR scale than $q$– and $C\ell$–value procedures. Indeed, when the prior $\Pi_w$ correctly specifies the data distribution for some known $w$, these latter procedures achieve BFDR control at close to the target level, while the $\ell$–value procedure typically has noticeably smaller BFDR (recall the discussion in Section 1.3.2, and Proposition 1 in [14]). Similarly, from the frequentist point of view, the results herein and in [14] show that in strong signal settings with a (non-random) sparse parameter $\theta_0$, the $C\ell$– and $q$–value procedures make full use of their “budget” of false discoveries in order to make more true discoveries, while the $\ell$–value procedure undershoots the user-specified target FDR level and so is conservative.

Another approach to adjusting the $\ell$–value procedure to the FDR scale would be to use a deterministic threshold $\lambda^* = \lambda^*_n(t)$ chosen so that the FDR tends to $t$ asymptotically. In view of Lemma 6, the appropriate choice would have $1 - \lambda^*$ of order $(\log(n/s_n))^{-1}$, which depends on the unknown sparsity $s_n$. The $C\ell$–value procedure can be seen as one way to make an appropriate choice adaptively to $s_n$.

While the $C\ell$–value procedure shares multiple testing optimality properties with the $q$–value procedure, from a computational point of view it has more in common with the $\ell$–value procedure. Indeed, given the $\ell$–values, it is trivial to compute the $C\ell$–value procedure, while it may remain difficult to compute the $q$–values, requiring an extra integration step. See [35] for an example of $q$–value computations for Gaussian mixtures.
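To illustrate how little is needed beyond the $\ell$–values themselves, the following minimal Python sketch applies the cumulative rule to a given list of $\ell$–values: it rejects the $k$ smallest ones, with $k$ the largest number for which their running average stays at or below $t$. The function names `cl_value_procedure` and `l_value_procedure` and the toy inputs are illustrative assumptions, not taken from the paper; ties and the empirical Bayes estimation of $w$ are ignored.

```python
def cl_value_procedure(l_values, t):
    """Reject the k smallest l-values, k being the largest count whose running mean is <= t.

    Illustrative sketch: assumes the l-values are already computed and distinct.
    """
    order = sorted(range(len(l_values)), key=lambda i: l_values[i])
    reject = [False] * len(l_values)
    running_sum = 0.0
    for k, i in enumerate(order, start=1):
        running_sum += l_values[i]
        # Running means of sorted values are non-decreasing, so we can stop
        # at the first prefix whose average exceeds the target level t.
        if running_sum / k > t:
            break
        reject[i] = True
    return reject


def l_value_procedure(l_values, t):
    """Plain l-value thresholding, for comparison: reject when l_i < t."""
    return [l < t for l in l_values]


if __name__ == "__main__":
    toy = [0.01, 0.5, 0.02, 0.9, 0.2]  # hypothetical l-values
    print(cl_value_procedure(toy, 0.1))
    print(l_value_procedure(toy, 0.1))
```

On this toy input the cumulative rule rejects three coordinates while plain thresholding at $t$ rejects two, in line with the $\ell$–value procedure being the more conservative of the two.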
General signal regimes.
The proofs herein are for strong signals. As noted after Theorem 1, with weaker signals it is impossible to achieve both small FDR and vanishing FNR. Simulations in [14] do, however, suggest that even without a signal strength assumption the $C\ell$–value procedure may control the FDR at close to the target level. When the $\ell$–value procedure makes no discoveries (i.e. every $\hat\ell$–value is larger than the target level $t$) the $C\ell$–value procedure also makes no discoveries, so that the proofs in [14] controlling the FDR of the $\ell$–value procedure for very weak signals also apply to the $C\ell$–value procedure. It remains to study “intermediate” signals, strong enough that the $C\ell$–value procedure makes some discoveries but weaker than the class $\ell_0(s_n, v_n)$ analysed here. We believe that the proofs in the intermediate setting require addressing significant extra technicalities, specifically in constructing a set $[\lambda_-, \lambda_+]$ with $(1 - \lambda_-)$ and $(1 - \lambda_+)$ of the same order that contains $\hat\lambda$ with probability tending to 1.

Further comparison to the latent variable approach.
We discussed in Section 1.5 some differences between the current approach, wherein the prior is used as a tool only and we target results uniform in the class $\ell_0(s_n; v_n)$, and the approach of modelling $\theta$ as having genuinely been drawn from a “prior” known up to some (hyper-)parameters. Results in the two settings are complementary, since uniform guarantees demonstrate the “robustness” of the Bayesian approach. However, in the current setting it is essential to choose an uninformative prior, hence the heavy (Cauchy) tails of the slab distribution, while in the latent variable setting one must use a correctly specified “prior” to obtain optimal results. Relatedly, sparsity is critical for the current approach so that the influence of the (fixed, and arbitrary apart from the strong signal assumption) alternatives is not too great. In contrast, in a latent variables setting one typically has dense signal, and density is moreover helpful in such a setting since it allows accurate estimation of the distribution of the data under the alternative. As noted after Theorem 1, one success of the current work is to remove the need for polynomial sparsity: this eliminates a gap between the two approaches, allowing our current theorems to work right up to border cases of near density.
3. Proofs of the main results
Throughout the proofs we use the following notation: for a real sequence $(a_n)_{n \in \mathbb{N}}$ and a non-negative sequence $(b_n)_{n \in \mathbb{N}}$, we write $a_n \lesssim b_n$, $b_n \gtrsim a_n$ or $a_n = O(b_n)$ if there exists a constant $C$ such that $|a_n| \le C b_n$ for all $n$ large enough; we write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$; we write $a_n \ll b_n$ or $a_n = o(b_n)$ if $a_n / b_n \to 0$ as $n \to \infty$; and we write $a_n \sim b_n$ if $a_n / b_n \to 1$ as $n \to \infty$. We may also write, for example, $f(w) \sim g(w)$ as $w \to 0$ if $(f/g)(w) \to$
1, and correspondingly. imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing To make the given sketch argument rigorous, we define precise upper and lower bounds w ± and λ ± in place of the central quantities w ∗ , λ ∗ . There are four parameters governing convergencerates throughout the proof. For a constant α > ν n = αs − / n (log s n ) / , (30) δ n = (log( n/s n )) − , (31) ε n = δ n log log( n/s n ) , (32) ρ n = e − v n / . (33)[Recall that the ‘strong signal assumption’ of Theorem 1 is that θ ∈ ℓ ( s n , v n ).] Note that δ n = o ( ε n ) . (34)In the setting of Theorem 2 we further have ν n = o ( δ n ) , (35) ρ n ≤ δ n , (36)the former following from the fact that u ( u/ log u ) − / is decreasing on u > e and the assump-tion that s n ≥ (log n ) and the latter from the assumption that v n ≥ n/s n )) / .Recalling the definition (10) of β and defining ˜ m, m as in [14] by˜ m ( w ) = − E θ =0 h β ( X )1 + wβ ( X ) i (37) m ( τ, w ) = E θ , = τ h β ( X )1 + wβ ( X ) i , (38)we let w ± be the (almost surely unique) solutions to X i ∈ S m ( θ ,i , w − ) = (1 + ν n )( n − s n ) ˜ m ( w − ) , (39) X i ∈ S m ( θ ,i , w + ) = (1 − ν n )( n − s n ) ˜ m ( w + ) . (40)Note that equations solved by w + , w − are close to the expected score equation E [ S ( w )] = 0, where S ( w ) = L ′ ( w ) = n X i =1 β ( X i , w ) , β ( x, w ) = β ( x )1 + wβ ( x ) . (41)While it is shown in [14] that solutions exist for ν n = ν a fixed positive constant, strengtheningthis conclusion to allow ν n → w − ≤ w + to (39) and (40) for n large enough, for any α >
0, by Lemma 5.

Let
$$F_w(x) = P_{\theta_0=0}(\ell_{1,w} \le x), \tag{42}$$
and for some $A = A(t) > 0$ define $\lambda_\pm$ as the solutions to
$$(n - s_n) F_{w_-}(\lambda_+) \big( E_{\theta_0=0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+] - t \big) = t s_n + A s_n \nu_n, \tag{43}$$
$$(n - s_n) F_{w_+}(\lambda_-) \big( E_{\theta_0=0}[\ell_{1,w_-} \mid \ell_{1,w_+} < \lambda_-] - t \big) = t s_n - A s_n \max(\nu_n, \rho_n, \delta_n). \tag{44}$$
Note that unique solutions $\lambda_- < \lambda_+$ to (43) and (44) exist by Lemma 6.

Section 4 will provide a number of core lemmas which allow a clear exposition of the proofs of Theorems 1 and 2. In particular, Lemmas 7–10 collectively tell us, via a union bound, that there exists an event $\mathcal{A}$ of probability at least $1 - \nu_n$ on which, for some $a = a(t) > 0$ and with $K_n := \#\{ i \in S_0 : \ell_{i,w_-} > \delta_n \}$,
$$\begin{aligned}
&\hat w \in (w_-, w_+), \\
&K_n \le s_n (\rho_n + \nu_n), \\
&\hat\lambda \in [\lambda_-, \lambda_+], \\
&V_{\lambda_+,w_+} \le E[V_{\lambda_+,w_+}] + a s_n \nu_n, \\
&V_{\lambda_-,w_-} \ge E[V_{\lambda_-,w_-}] - a s_n \nu_n.
\end{aligned} \tag{45}$$

FNR control. By monotonicity of the $\ell$–values (Lemma 4) and the fact that $\lambda_-$ is bounded away from zero (as implied by Lemma 6) we note that for $n$ large we have on $\mathcal{A}$
$$\#\{ i \in S_0 : \varphi^{C\ell}_i = 0 \} \le \#\{ i \in S_0 : \ell_{i,w_-} \ge \lambda_- \} \le K_n, \tag{46}$$
which in particular allows us to immediately deduce the FNR control (24):
$$\mathrm{FNR}(\varphi^{C\ell}) \le E_{\theta_0} \Big( \frac{K_n}{s_n} \mathbf{1}_{\mathcal{A}} + \mathbf{1}_{\mathcal{A}^c} \Big) \le \rho_n + \nu_n + P_{\theta_0}(\mathcal{A}^c) \le \rho_n + 2\nu_n \to 0. \tag{47}$$
In the setting of Theorem 2, the fact that $\max(\rho_n, \nu_n) \le \delta_n$ (recall (35) and (36)) implies the FNR claim (26).

FDR upper bound.
We turn now to the control of the false discovery rate. By monotonicity (see Lemma 4), on the event $\mathcal{A}$, the number $V_{\hat\lambda,\hat w}$ of false discoveries made by $\varphi^{C\ell}$ lies between $V_{\lambda_-,w_-}$ and $V_{\lambda_+,w_+}$. By Lemma 6 we see for a constant $D = D(t) > 0$ that
$$E[V_{\lambda_+,w_-}] = (n - s_n) F_{w_-}(\lambda_+) \le (1 + D \max(\varepsilon_n, \nu_n))\, t (1-t)^{-1} s_n.$$
Lemma 11 tells us that $E[V_{\lambda_+,w_+}] \le (1 + B \max(\nu_n, \rho_n, \delta_n)) E[V_{\lambda_+,w_-}]$ for some constant $B$, hence for some constant $D' > D$ we deduce using $\delta_n = o(\varepsilon_n)$ that for $n$ large enough we have
$$E[V_{\lambda_+,w_+}] \le (1 + D' \max(\varepsilon_n, \nu_n, \rho_n))\, t (1-t)^{-1} s_n.$$
Since for $a, b > 0$, the map $x \mapsto x/(a+x)$ is increasing and the map $x \mapsto b/(a+x)$ is decreasing on $x > -a$, using also (45) and (46) we deduce
$$\begin{aligned}
\mathrm{FDP}(\varphi_{\hat\lambda,\hat w}) &\le \frac{V_{\hat\lambda,\hat w}}{V_{\hat\lambda,\hat w} + s_n - K_n} \mathbf{1}_{\mathcal{A}} + \mathbf{1}_{\mathcal{A}^c} \\
&\le \frac{E[V_{\lambda_+,w_+}] + a s_n \nu_n}{E[V_{\lambda_+,w_+}] + a s_n \nu_n + s_n - K_n} + \mathbf{1}_{\mathcal{A}^c} \\
&\le \frac{(1 + D' \max(\varepsilon_n, \nu_n, \rho_n))\, t (1-t)^{-1} s_n + a s_n \nu_n}{(1 + D' \max(\varepsilon_n, \nu_n, \rho_n))\, t (1-t)^{-1} s_n + s_n - s_n(\rho_n + (1-a)\nu_n)} + \mathbf{1}_{\mathcal{A}^c} \\
&\le \frac{t + D' t \varepsilon_n + a' \max(\nu_n, \rho_n)}{1 + D' t \varepsilon_n - a' \max(\nu_n, \rho_n)} + \mathbf{1}_{\mathcal{A}^c},
\end{aligned}$$
for some $a' = a'(t) > 0$. Taking expectations, using that $P_{\theta_0}(\mathcal{A}^c) \le \nu_n$ and that
$$\frac{t + D' t \varepsilon_n}{1 + D' t \varepsilon_n} = t + \frac{D' t (1-t) \varepsilon_n}{1 + D' t \varepsilon_n} \le t + D' t (1-t) \varepsilon_n,$$
by Taylor expanding we see that for some constant $A' = A'(t)$, for $n$ large we have
$$\mathrm{FDR}(\varphi_{\hat\lambda,\hat w}) \le t + t(1-t) D' \varepsilon_n + A' \max(\nu_n, \rho_n).$$
The right side converges to $t$ in the settings of Theorems 1 and 2. In the latter setting we note $\max(\nu_n, \rho_n) = o(\varepsilon_n)$ by (34)–(36), and the upper bound in (25) follows.

FDR lower bound.
For the lower bound, note by Lemma 6 that for a constant d > E [ V λ − ,w + ] ≥ t − t s n (cid:0) dε n − At max( ν n , ρ n ) (cid:1) , for n large. Thus, by Lemma 11 and for B the constant thereof, using that δ n = o ( ε n ) we see thatfor some constants A ′ , d ′ depending on t and for n larger than some N = N ( t ) we have E [ V λ − ,w − ] ≥ (1 − B max( ν n , ρ n , δ n ))(1 + dε n − At max( ν n , ρ n )) t (1 − t ) − s n ≥ (1 + d ′ ε n − A ′ max( ν n , ρ n )) t (1 − t ) − s n , hence, using (45) and upper bounding the number of true discoveries by s n ,FDP( ϕ ˆ λ, ˆ w ) ≥ E [ V λ − ,w − ] − as n ν n s n + E [ V λ − ,w − ] − as n ν n A ≥ (1 + d ′ ε n − A ′ max( ν n , ρ n )) t (1 − t ) − s n − as n ν n s n + (1 + d ′ ε n − A ′ max( ν n , ρ n )) t (1 − t ) − s n − as n ν n − A c ≥ t + d ′ tε n − a ′ max( ν n , ρ n )1 + d ′ tε n − a ′ max( ν n , ρ n ) − A c , for a ′ = A ′ t + a (1 − t ). Similarly to the upper bound we note that for large nt + d ′ tε n d ′ tε n = t + d ′ t (1 − t ) ε n d ′ tε n ≥ t + 0 . t (1 − t ) d ′ ε n , so that Taylor expanding and taking expectations, recalling that P θ ( A c ) ≤ ν n , we obtain for some A ′′ = A ′′ ( t ) FDR( ϕ ˆ λ, ˆ w ) ≥ t + 0 . t (1 − t ) d ′ ε n − A ′′ max( ν n , ρ n ) . Again the right side tends to t in the settings of both Theorems 1 and 2. In the latter setting,for all n greater than some N = N ( t ), we have 0 . t (1 − t ) d ′ ε n > A ′′ max( ν n , ρ n ), and the lowerbound in (25) follows. Let us prove Theorem 3 in the setting of Theorem 2; the proof with the weaker conditions ofTheorem 1 is similar and omitted. Fix θ ∈ ℓ ( s n ; v n ) and let S denote the support of θ . As withthe proof of Theorems 1 and 2, by Lemmas 7 and 9 there exists an event A of probability at least1 − ν n on which, for K n := { i ∈ S : ℓ i,w − > δ n } ,ˆ w ∈ ( w − , w + ) ,K n ≤ s n ( ρ n + ν n ) . 
By monotonicity of the q –values (Lemma 4) it will be enough to consider the tests ( { q i,w Define S ′ w = { i ∈ S : q i,w < t } , so that FNR( ϕ q –val ) = s − n E θ [ s n − S ′ ˆ w ].In view of the fact that ℓ i,w − ≥ q i,w − (see Lemma 20) we note that for n large we have S ′ w − = X i ∈ S { q i,w − < t } ≥ X i ∈ S { ℓ i,w − < t } ≥ s n − K n , imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing so that on A , using monotonicity of q –values, S ′ ˆ w ≥ S ′ w − ≥ s n (1 − ν n − ρ n ) . In the current setting max( ν n , ρ n ) ≤ δ n , so thatFNR( ϕ q –val ) = s − n E θ [ s n − S ′ ˆ w ] ≤ E θ (cid:0) ( ν n + ρ n ) A + A c (cid:1) ≤ ρ n + 2 ν n ≤ δ n , proving (28).We proceed with the proof of the FDR lower bound. As with the proofs in the C ℓ –value case,the key remaining steps are to prove the concentration of and to control the expectation of thenumber of false positives, and we begin with the latter. Second step: bounding the expected number of false positives. Define r : (0 , → [0 , ∞ ) and χ : (0 , → [0 , ∞ ) by r ( w, t ) = wt (1 − w )(1 − t ) , (48) χ ( x ) = ( ¯Φ / ¯ G ) − ( x ) . (49)Note that χ is well-defined and strictly decreasing because ¯Φ / ¯ G itself is strictly decreasing on [0 , ∞ ) (see Lemma 4). Moreover, recalling the definition (17) of the q –values, we note that for any w ∈ [0 , and t ∈ [0 , , { q i,w < t } = {| X i | > χ ( r ( w, t )) } . We write V ′ w = X i/ ∈ S { q i,w < t } = X i/ ∈ S {| X i | > χ ( r ( w, t )) } for the number of false positives of the multiple testing procedure ( { q i,w < t } ) ≤ i ≤ n . Note that V ′ w is increasing in w ∈ (0 , (Lemma 4) and by definition of χ satisfies E θ V ′ w = 2( n − s n )Φ( χ ( r ( w, t ))) = ( n − s n ) r ( w, t )2 G ( χ ( r ( w, t ))) , provided r ( w, t ) ≤ . 
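The equivalence $\{ q_{i,w} < t \} = \{ |X_i| > \chi(r(w,t)) \}$ used above can be checked numerically. The sketch below is a rough illustration under loudly stated assumptions: it uses a standard Cauchy tail as a stand-in for $\bar G$ (the exact $\bar G$ of the paper is not reproduced here), it takes the $q$–value to be $(1-w)\bar\Phi / ((1-w)\bar\Phi + w \bar G)$ evaluated at $|x|$, and it inverts $\bar\Phi / \bar G$ by bisection. All function names are hypothetical.

```python
import math


def phi_bar(x):
    """Standard normal upper tail P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def g_bar(x):
    """Stand-in slab tail (ASSUMPTION): standard Cauchy upper tail."""
    return 0.5 - math.atan(x) / math.pi


def r(w, t):
    """r(w, t) = w t / ((1 - w)(1 - t)), as in (48)."""
    return w * t / ((1.0 - w) * (1.0 - t))


def chi(x, hi=40.0, iters=200):
    """Invert the decreasing map phi_bar / g_bar on [0, hi] by bisection."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi_bar(mid) / g_bar(mid) > x:
            lo = mid  # ratio still too large: the solution lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q_value(x, w):
    """q-value under the assumed form (ASSUMPTION, mirroring the l-value formula)."""
    a = (1.0 - w) * phi_bar(abs(x))
    return a / (a + w * g_bar(abs(x)))


if __name__ == "__main__":
    w, t = 0.01, 0.1  # hypothetical weight and target level
    threshold = chi(r(w, t))
    # Observations just above the threshold are rejected, just below are not.
    print(threshold, q_value(threshold + 0.01, w) < t, q_value(threshold - 0.01, w) < t)
```

With this stand-in, thresholding the $q$–value at $t$ and thresholding $|X_i|$ at $\chi(r(w,t))$ select the same observations; a different $\bar G$ only changes the numerical value of the threshold.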
From Lemma 12, we have ˜ m ( w ) (cid:18) c log log(1 /w )log(1 /w ) (cid:19) ≤ G ( χ ( r ( w, t ))) ≤ ˜ m ( w ) (cid:18) c ′ log log(1 /w )log(1 /w ) (cid:19) (50)for w small enough (smaller than some threshold possibly depending on t ). Using the defini-tion (39) of w − to translate from ˜ m to m , Lemma 15 to lower bound m ( θ ,i , w − ) , and that w − ≍ ( s n /n ) log( n/s n ) / (hence log log(1 /w − ) / log(1 /w − ) ≍ log log( n/s n ) / log( n/s n ) = ε n ) byLemma 5, we obtain E θ V ′ w − ≥ ( n − s n ) w − − w − t − t ˜ m ( w − ) (cid:18) c log log(1 /w − )log(1 /w − ) (cid:19) ≥ t − t w − X i ∈ S m ( θ ,i , w − )(1 + ν n ) − (1 + cε n ) ≥ s n t − t (1 + ν n ) − (1 − ρ n ) (1 + c ε n ) ≥ s n t − t (1 + c ε n ) , for some constants c , c > , because (1 + ν n ) − = 1 − O ( ν n ) and max( ν n , ρ n ) = o ( ε n ) . Third step: concentration of the number of false positives. Recalling that ν n = αs − / n (log s n ) / ,we see from an application of Bernstein’s inequality (Lemma 21) and the above that for someconstant c > and for a = a ( t ) large enough, P θ ( V ′ w − − E θ V ′ w − ≥ − aν n E θ V ′ w − ) ≤ exp {− (3 / a ν n E θ V ′ w − } ≤ e − c a s n ν n t/ (1 − t ) ≤ s − / n . imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Fourth step: deriving the FDR lower bound. Using the previous steps and upper bounding thenumber of true positives by s n , we obtain by using again the monotonicity of the q –values andthat the map x x/ ( a + x ) is increasing on x > − a that FDP( θ , ϕ q –val ) ≥ V ′ ˆ w V ′ ˆ w + s n ≥ V ′ w − V ′ w − + s n { ˆ w ≥ w − }≥ (1 − aν n ) E θ V ′ w − (1 − aν n ) E θ V ′ w − + s n { ˆ w ≥ w − } { V ′ w − − E θ V ′ w − ≥ − aν n E θ V ′ w − } , for a = a ( t ) as above. 
Taking the expectation and using the bounds we have attained on proba-bilities, we find FDR( θ , ϕ q –val ) ≥ (1 − aν n ) E θ V ′ w − /s n (1 − aν n ) E θ V ′ w − /s n + 1 − ν n . Using the previously obtained bound on E θ V ′ w − and the fact that (1 − aν n )(1 + c ε n ) ≥ cε n for some c > and n large enough, we find that (1 − aν n ) E θ V ′ w − /s n (1 − aν n ) E θ V ′ w − /s n + 1 ≥ (1 + cε n ) t (1 + cε n ) t + 1 − t = t + cε n t cε n t = t + cε n t (1 − t )1 + cε n t ≥ t + 0 . t (1 − t ) cε n , and we deduce the FDR lower bound. Fifth step: deriving the FDR upper bound. Recall that on the event A , for n large we have both S ′ w − ≥ s n (1 − ν n − ρ n ) and w − ≤ ˆ w ≤ w + . Again using that x x/ ( a + x ) is increasing and herealso that x b/ ( a + x ) is decreasing on x > − a , FDP( θ , ϕ q –val ) ≤ V ′ w + ( V ′ w + + S ′ w − ) ∨ A + A c ≤ V ′ w + V ′ w + + s n (1 − ν n − ρ n ) A + A c . Here one could use a concentration argument as for the lower bound, but noting that x x/ ( a + x ) is convex, we bypass the need for this by appealing to Jensen’s inequality to obtain FDR( θ , ϕ q –val ) ≤ E θ V ′ w + E θ V ′ w + + s n (1 − ν n − ρ n ) + ν n . (51)For upper bounding E θ V ′ w + , we proceed as for the lower bound part: using (50), the definition(40) of w + , Lemma 15, and that w + ≍ ( s n /n ) log( n/s n ) / (so that log log(1 /w + ) / log(1 /w + ) ≍ ε n and w + = o ( ε n ) ) by Lemma 5, we find E θ V ′ w + ≤ ( n − s n ) r ( w + , t ) ˜ m ( w + ) (1 + c ′ ε n ) ≤ t (1 − t ) − (1 − w + ) − s n (1 − ν n ) − (1 + c ′ ε n ) ≤ t (1 − t ) − s n (1 + cε n ) , for any c > c ′ , for n larger than some N ( t ) , using again that ν n = o ( ε n ) . Substituting into (51)and recalling that we also have ρ n = o ( ε n ) yields FDR( θ , ϕ q –val ) ≤ t (1 + cε n ) t (1 + cε n ) + (1 − t )(1 − ν n − ρ n ) + ν n ≤ t + tcε n tcε n + o ( ε n ) ≤ t + t (1 − t ) cε n + o ( ε n ) . This completes the upper-bound and hence the proof. imsart-generic ver. 
2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing 4. Core lemmas The following monotonicity results are mostly clear from the definitions. Lemma 4 (Monotonicity) . We have the following monotonicity results, all of which may be non-strict unless specified.As w ∈ (0 , increases, with other parameters fixed (note that we typically apply these resultswith n increasing and w = w n decreasing), ℓ i, ( X ) ≥ ℓ i,w ( X ) ↓ (strictly) q i, ( X ) ≥ q i,w ( X ) ↓ (strictly) V λ,w ↑ ( n − s n ) , λ ∈ (0 , V ′ w ↑ ( n − s n )postFDR w ( ϕ ) ↓ u ( ϕ λ,w ) ↑ n n X i =1 ℓ i,u , u ∈ (0 , L ′ ( w ) = S ( w ) ↓ P ni =1 β ( X i )1+ β ( X i ) (a.s. strictly).For fixed w, w ′ ∈ (0 , , as λ ∈ [0 , increases, V λ,w ↑ n − s n ,F w ( λ ) ↑ (strictly) E θ =0 [ ℓ ,w | ℓ ,w ′ < λ ] ↑ E θ =0 [ ℓ ,w ] Finally, we note that ( φ/g )( x ) and ( ¯Φ / ¯ G )( x ) decrease strictly as x ∈ [0 , ∞ ) increases. The following lemmas then form the core of the proofs of Theorems 1 and 2. Some ancillaryresults used in the proofs of these lemmas are relegated to Appendix A. Lemma 5. Under the assumptions of Theorem 1, define ν n as in (30) with α > arbitrary.Then there exist unique solutions w − , w + to (39) and (40) satisfying s n /n ≤ w − ≤ w + . ( s n /n )(log( n/s n )) / . More sharply, for w ∈ { w − , w + } , we have w ≍ s n ( n − s n ) − ˜ m ( w ) − ≍ s n ( n − s n ) − log( n/s n ) / ≍ ( s n /n )(log n/s n ) / . Lemma 6. In the setting of Theorem 1, for any constant A there exist unique solutions λ − < λ + to (43) and (44) , and these solutions satisfy − λ − ≍ − λ + ≍ δ n , (52) with suppressed constants depending on t . 
We further note that for some constants $C, c > 0$ depending on $t$,
$$E_{\theta_0=0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+] \ge 1 - C \varepsilon_n, \tag{53}$$
$$E_{\theta_0=0}[\ell_{1,w_-} \mid \ell_{1,w_+} < \lambda_-] \le 1 - c \varepsilon_n, \tag{54}$$
and that for some $D, d > 0$ depending on $t$, recalling $F_w(\lambda) := P_{\theta_0=0}(\ell_{1,w} < \lambda)$,
$$(n - s_n) F_{w_-}(\lambda_+) \le \frac{t}{1-t}\, s_n \big( 1 + D \max(\varepsilon_n, \nu_n) \big), \tag{55}$$
$$(n - s_n) F_{w_+}(\lambda_-) \ge \frac{t}{1-t}\, s_n \big( 1 + d \varepsilon_n - A t \max(\nu_n, \rho_n) \big), \tag{56}$$
for all $n$ large enough.

Lemma 7. Under the assumptions of Theorem 1, recalling the definition (9) of $\hat w$ and the definitions (39) and (40) of $w_\pm$, we have
$$P_{\theta_0}(\hat w \notin (w_-, w_+)) = o(\nu_n), \tag{57}$$
provided the constant $\alpha$ in the definition (30) of $\nu_n$ is large enough.

Lemma 8. Under the assumptions of Theorem 1, recalling the definition (14) of $\hat\lambda$ as the threshold of $\varphi^{C\ell}$ and the definitions (43) and (44) of $\lambda_\pm$, we have
$$P_{\theta_0}(\hat\lambda \notin [\lambda_-, \lambda_+]) = o(\nu_n), \tag{58}$$
provided the constant $A = A(t)$ is large enough in the definitions of $\lambda_\pm$.

Lemma 9. In the setting of Theorem 1, recall that $S_0$ denotes the support of $\theta_0$ as in (21) and define the (random) set $S_1 = \{ i \in S_0 : \ell_{i,w_-} \le \delta_n \}$, where $w_-$ is as in (39). Then, defining
$$K_n = |S_0 \setminus S_1| = \#\{ i \in S_0 : \ell_{i,w_-} > \delta_n \}, \tag{59}$$
for all $n$ large enough we have
$$P_{\theta_0}\big( K_n / s_n > \rho_n + \nu_n \big) = o(\nu_n), \tag{60}$$
provided the constant $\alpha$ in the definition (30) of $\nu_n$ is large enough.

Lemma 10. In the setting of Theorem 1, define $V_{\lambda,w}$ as in (29). Then
$$P_{\theta_0}\big( |V_{\lambda_+,w_+} - E[V_{\lambda_+,w_+}]| > a s_n \nu_n \big) = o(\nu_n),$$
for some constant $a = a(t)$. The same holds upon replacing one or both of $\lambda_+$ and $w_+$ respectively with $\lambda_-$ and $w_-$.

Lemma 11. In the setting of Theorem 1, recall the definitions (29), (39), (40), (43) and (44) of $V_{\lambda,w}$, $w_\pm$, and $\lambda_\pm$.
Then for some constant B > , EV λ + ,w + ≤ EV λ + ,w − (cid:16) B max( ν n , ρ n , δ n ) (cid:17) , (61) EV λ − ,w − ≥ EV λ − ,w + (cid:16) − B max( ν n , ρ n , δ n ) (cid:17) . (62) We here define two final quantities which appear in the proofs, closely related to χ as defined in(49): recalling the definition β ( x ) = ( g/φ )( x ) − from (10), we set ξ ( x ) = ( φ/g ) − ( x ) , x ∈ (0 , ( φ/g )(0)] (63) ζ ( w ) = β − (1 /w ) , w ∈ (0 , . (64)Note the relationship ζ ( w ) = ξ ( w/ (1 + w )) . (65) Proof of Lemma 4. Strict monotonicity in w of ℓ i,w , q i,w is immediate from the definitions (8)and (17): for example, ℓ i,w = (1 − w ) φ ( X i )(1 − w ) φ ( X i ) + wg ( X i ) = 11 + ( w/ (1 − w ))( g/φ )( X i ) decreases as w increases because ( g/φ )( X i ) > . Non-strict monotonicity in w of V λ,w , V ′ w , postFDR w ( ϕ ) follows immediately. The monotonicity of V λ,w in λ is also clear (and note that ℓ i,w < for w ∈ (0 , so that ( ϕ ,w ) i = 1 for all i ). To see that postFDR u ( ϕ λ,w ) ≥ postFDR u ( ϕ λ,w ) if w ≥ w , imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing note that changing w does not change the ordering of the ℓ i,w values, only their magnitudes,since smaller ℓ values correspond to larger values of | X i | . It follows that ℓ i,w and ℓ i,w both selectcoordinates i in order of increasing ℓ i,u values. Since the ℓ i,w are monotonic in w , we see that ϕ λ,w selects every i selected by ϕ λ,w , so that postFDR u ( ϕ λ,w ) , which can be viewed as the average ofthe selected ℓ i,u values (cf. (15)), is no smaller than postFDR u ( ϕ λ,w ) .Strict decreasingness of φ/g is immediate from the definition (7), and implies the same of ¯Φ / ¯ G (see [14, Lemma S-9]). In view of the explicit expression for F w in Lemma 16, its strictmonotonicity follows from that of ¯Φ , ξ = ( φ/g ) − and r ( w, λ ) = wλ (1 − w ) − (1 − λ ) − . 
Similarlythe score function S ( w ) = L ′ ( w ) defined in (41) can be seen, by differentiating, to be strictlydecreasing on the event where there exists i such that β ( X i ) = 0 , which has probability 1 because β ( x ) = ( g/φ )( x ) − is strictly increasing and the X i ’s have non-atomic distributions.For monotonicity of E θ =0 [ ℓ ,w | ℓ ,w ′ < λ ] in λ , first note that, writing ξ w ( λ ) = ξ ( r ( w, λ )) , adirect calculation yields { ℓ i,w < λ } = {| X i | > ξ w ( λ ) } . It follows that { ℓ ,w ′ < λ } = { ℓ ,w < ξ − w ◦ ξ w ′ ( λ ) } , and hence that we can express the expectation as E θ =0 [ ℓ ,w | ℓ ,w ′ < λ ] = Z ◦ ξ − w ◦ ξ w ′ ( λ ) , Z ( x ) = E θ =0 [ ℓ ,w | ℓ ,w < x ] . It suffices, since ξ w is decreasing, to note that Z is increasing, which is intuitively clear and formallyfollows from the following calculations: writing U = ℓ ,w , for b > a we have E [ U | U < b ] = E [ U | U < a ] Pr( U < a | U < b ) + E [ U | a ≤ U < b ] Pr( U ≥ a | U < b ) Then, since E [ U | a ≤ U < b ] ≥ a ≥ E [ U | U < a ] , we deduce that E [ U | U < b ] ≥ E [ U | U < a ](Pr( U < a | U < b ) + Pr( U ≥ a | U < b )) = E [ U | U < a ] . Proof of Lemma 5. We claim that, for some constant C > , X i ∈ S m ( θ ,i , s n /n ) > (1 + ν n )( n − s n ) ˜ m ( s n /n ) (66) X i ∈ S m (cid:0) θ ,i , C ( s n /n )(log n/s n ) / (cid:1) < (1 − ν n )( n − s n ) ˜ m (cid:0) C ( s n /n )(log n/s n ) / (cid:1) , (67)at least for large enough n . Existence of w ± satisfying s n /n ≤ w − ≤ w + . ( s n /n )(log n/s n ) / thenfollows from the intermediate value theorem, since ˜ m is continuous, increasing and non-negativeand m ( τ, · ) is continuous and decreasing for each fixed τ (see Lemma 20).To prove the claim, note that asymptotically as w → with w ≥ s n /n , by Lemma 15 we havefor some c, c ′ > c (log(1 /w )) − / ≤ ˜ m ( w ) ≤ c ′ (log(1 /w )) − / , / (2 w ) ≤ m ( θ ,i , w ) ≤ /w. 
It follows that the left side of (66) is of order n , while the right side is of the smaller order n log( n/s n ) − / . It also follows that X i ∈ S m ( θ ,i , C ( s n /n )(log n/s n ) / ) ≤ C − n (log( n/s n )) − / , (1 − ν n )( n − s n ) ˜ m ( C ( s n /n )(log n/s n ) / ) & ( n − s n ) log( n/s n ) − / for n large, where the suppressed constant does not depend on C (or α ), so that the right side of(67) upper bounds the left for C large enough, as claimed. imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing To prove the sharper asymptotics, observe by definition that for w ∈ { w − , w + } we have X i ∈ S m ( θ ,i , w ) = (1 ± ν n )( n − s n ) ˜ m ( w ) . Since s n /n ≤ w ≤ ( s n /n )(log n/s n ) / we may use the bounds on m given above to see that theleft side is ≍ s n w − . We also note that log(1 /w ) ≍ log( n/s n ) , so that the bounds on ˜ m given aboveyield ˜ m ( w ) ≍ log( n/s n ) − / . The result follows, noting also that s n /n → so n − s n ≍ n . Proof of Lemma 6. We prove the results for λ + ; the proofs for λ − are almost identical. We beginby showing that any solution to (43) is necessarily unique. Indeed, since the right side is positive,any solution necessarily lies in the set { λ : E θ =0 [ ℓ ,w + | ℓ ,w − < λ ] > t } . On this set, since λ E θ =0 [ ℓ ,w + | ℓ ,w − < λ ] − t is a non-decreasing positive function and F w − is a strictly increasing non-negative function (see Lemma 4), the left side of (43) is strictlyincreasing, yielding the claimed uniqueness of any solution.Next, Lemma 5 tells us that w − ≍ ( s n /n )(log n/s n ) / , so that log(1 /w − ) ≍ log( n/s n ) and w / − /δ n → , hence by Lemma 16 (with c = 1 / ) we have for any constant κ > F w − (1 − κδ n ) ≍ κ − δ − n w − (log(1 /w − )) − / ≍ κ − n − s n . [All suppressed constants in this proof will be independent of κ .] 
Similarly, by Lemma 17 we have − E θ =0 [ ℓ ,w + | ℓ ,w − < − κδ n ] ≍ κδ n log(1 /δ n ) = κε n . (68)Inserting these bounds we see that the left side of (43) is bounded above and below by a constanttimes κ − ( n − s n ) n − s n (1 − t − O ( κε n )) ≍ κ − s n . For κ large enough (depending on t ) this is smaller than the right side of (43) and for κ smallit is larger. The left side is continuous in λ + (see Lemmas 16 and 17) while the right side isfixed, so we deduce by the intermediate value theorem the existence of a solution λ + satisfying − Cδ n ≤ λ + ≤ − cδ n for constants C, c > , so that (52) is proved.The expectation result (53) now follows immediately from (68). The bound (55) for F w − ( λ + ) is obtained by rearranging the definition (43), inserting the bound for E θ =0 [ ℓ ,w + | ℓ ,w − ≤ λ + ] ,and using that (1 − x ) − = 1 + O ( x ) as x → . [For the bound on F w + ( λ − ) , one also recalls that δ n = o ( ε n ) .]Finally, to see that λ − < λ + , observe that ℓ ,w − > ℓ ,w + (Lemma 4) so that for λ > t , E θ =0 [( ℓ ,w − − t ) { ℓ ,w + < λ } ] − E θ =0 [( ℓ ,w + − t ) { ℓ ,w − < λ } ]= E θ =0 [( ℓ ,w − − ℓ ,w + ) { ℓ ,w − < λ } ] + E θ =0 [( ℓ ,w − − t ) { ℓ ,w + < λ ≤ ℓ ,w − } ] ≥ . Since (52) shows that λ − > t for n large, we apply this with λ = λ − to deduce that the left sideof (43) evaluated at λ − is smaller than its right side: F w − ( λ − )( E θ =0 [ ℓ ,w + | ℓ ,w − < λ − ] − t ) = E θ =0 [( ℓ ,w + − t ) { ℓ ,w − < λ − } ] ≤ E θ =0 [( ℓ ,w − − t ) { ℓ ,w + < λ − } ]= F w + ( λ − )( E θ =0 [ ℓ ,w − | ℓ ,w + < λ − ] − t )= ts n − As n max( ν n , ρ n , δ n ) n − s n < ts n + As n ν n n − s n . Since the right side of (43) is constant and the left side increases with λ + (as noted above whenshowing uniqueness), this implies that λ + > λ − . imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Proof of Lemma 7. 
We follow the proof of Lemmas S-3 in [14], with the essential difference thatwe do not use the polynomial sparsity but rather the strong signal assumption. Let us prove P θ ( ˆ w < w − ) = o ( ν n ) for α large enough in the definition of ν n , the proof that P θ ( ˆ w > w + ) = o ( ν n ) being similar.Let S = L ′ denote the score function as in (41). Since ˆ w maximises L ( w ) , necessarily S ( ˆ w ) ≤ or ˆ w = 1 . If ˆ w < w − then only the former may hold, and we also deduce by monotonicity of S (Lemma 4) that S ( w − ) ≤ S ( ˆ w ) ≤ . Finally, by strict monotonicity of S , if S ( w − ) = 0 thennecessarily w − = ˆ w . This implies that { ˆ w < w − } ⊂ { S ( w − ) < } , hence P θ ( ˆ w < w − ) ≤ P θ ( S ( w − ) < 0) = P θ ( S ( w − ) − E θ S ( w − ) < − E θ S ( w − ))= P θ n X i =1 W i < − E ! , where we have introduced the notation W i = β ( X i , w ) − m ( θ ,i , w − ) and E = E θ S ( w − ) = P ni =1 m ( θ ,i , w − ) . For n large | W i | ≤ M = 2 /w − a.s. (see Lemma 20), so that we may scale thevariables W i to apply the Bernstein inequality (Lemma 21) and obtain P θ ( ˆ w < w − ) ≤ e − . E / ( V + M E/ , where V = P ni =1 Var( W i ) ≤ P ni =1 E θ m ( θ ,i , w ) , for m ( θ ,i , w ) = E θ ( β ( X i , w ) ) . In view ofthe definition (39) of w − , we have E = X i ∈ S m ( θ ,i , w − ) − ( n − s n ) ˜ m ( w − ) = ν n ( n − s n ) ˜ m ( w − ) . We also note, using the strong signal assumption and the bounds on m in Lemma 20 that forsome constants C, M > and n larger than some universal threshold, V ≤ X i : | θ ,i | >M m ( θ ,i , w − ) + X i : θ ,i =0 m (0 , w − ) ≤ Cw − X i ∈ S m ( θ ,i , w − ) + C ( n − s n ) ¯Φ( ζ ( w − )) w − , with ζ defined as in (64). By a standard normal tail bound and the definition of ζ , we have ¯Φ( ζ ( w − )) ≍ φ ( ζ ( w − )) /ζ ( w − ) ≍ w − g ( ζ ( w − )) /ζ ( w − ) , which is of order w − ˜ m ( w − ) /ζ ( w − ) because ˜ m ( w − ) ≍ ζ ( w − ) g ( ζ ( w − )) (see Lemma 20). 
Using the latter, and the fact that ζ ( w − ) → ∞ (Lemma 20), in combination with (39) gives V . nw − − ˜ m ( w − ) + nw − − ˜ m ( w − ) /ζ ( w − ) . nw − − ˜ m ( w − ) , so that V + M E/ E . nw − − ˜ m ( w − )( ν n ( n − s n ) ˜ m ( w − )) + 1 w − ν n ( n − s n ) ˜ m ( w − ) . ν n nw − ˜ m ( w − ) . This implies that P θ ( ˆ w < w − ) ≤ e − cν n nw − ˜ m ( w − ) for some constant c > . Now, by Lemma 5,we have nw − ˜ m ( w − ) ≍ s n . Hence, recalling the definition ν n = αs − / n (log s n ) / from eq. (30),we deduce that ν n nw − ˜ m ( w − ) ≥ log s n if the constant α is large enough, and hence the aboveprobability is bounded above by s − / n = o ( ν n ) . Proof of Lemma 8. Let B be an event whose complement has probability P θ ( B c ) = o ( ν n ) onwhich, with K n := { i ∈ S : ℓ i,w − > δ n } , ˆ w ∈ ( w − , w + ) ,K n ≤ s n ( ρ n + ν n ) ,V λ + ,w − ≤ E [ V λ + ,w − ] + as n ν n ,V λ − ,w + ≥ E [ V λ − ,w + ] − as n ν n ; (69) imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing note that such an event exists by Lemmas 7, 9 and 10, the proofs of which are independent ofLemma 8. Recall that ˆ λ is characterised by the posterior FDR: postFDR ˆ w ( ϕ λ, ˆ w ) := P ni =1 ℓ i, ˆ w { ( ϕ λ, ˆ w ) i = 1 } ∨ ( P ni =1 { ( ϕ λ, ˆ w ) i = 1 } ≤ t ⇐⇒ λ ≤ ˆ λ. Thus, it is enough to bound the posterior FDRs of ϕ λ − , ˆ w , ϕ λ + , ˆ w above and below respectively by t . We prove the upper and lower bound separately, which suffices by a union bound. Upper bound, postFDR ˆ w ( ϕ λ + , ˆ w ) > t with probability at least − o ( ν n ) . On the event B , mono-tonicity (see Lemma 4) allows us to deduce that postFDR ˆ w ( ϕ λ + , ˆ w ) ≥ postFDR ˆ w ( ϕ λ + ,w − ) ≥ postFDR w + ( ϕ λ + ,w − ) ≥ P i S ℓ i,w + { ℓ i,w − < λ + } s n + V λ + ,w − , (70)where to obtain the last line we have used that P i ∈ S { ℓ i,w − < λ + } ≤ s n and P i ∈ S ℓ i,w + { ℓ i,w − <λ + } ≥ . 
We apply Bernstein's inequality (see Lemma 21) with, for some $a$ (indeed, the same $a$ as in (69), coming originally from Lemma 10, works),
\[
u = a s_n \nu_n, \qquad U_i = -\ell_{i,w_+}\mathbf 1\{\ell_{i,w_-} < \lambda_+\}, \quad i \notin S_0.
\]
Note that
\[
\sum_{i \notin S_0} E[U_i] = -E[V_{\lambda_+,w_-}]\, E_{\theta_0 = 0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+], \qquad \sum_{i \notin S_0} \mathrm{Var}(U_i) \le E V_{\lambda_+,w_-} \asymp s_n.
\]
For $a$ large enough we deduce that
\[
P_{\theta_0}\Big(\sum_{i \notin S_0} \ell_{i,w_+}\mathbf 1\{\ell_{i,w_-} < \lambda_+\} < E[V_{\lambda_+,w_-}]\, E_{\theta_0 = 0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+] - a s_n \nu_n\Big) \le s_n^{-1/2}. \tag{71}
\]
Then by a union bound we see that on an event $\mathcal C \subset \mathcal B$ of probability at least $P(\mathcal B) - s_n^{-1/2} = 1 - o(\nu_n)$, the numerator in the final line of (70) is lower bounded by
\[
E[V_{\lambda_+,w_-}]\, E_{\theta_0 = 0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+] - a s_n \nu_n = (n - s_n) F_{w_-}(\lambda_+)\, E_{\theta_0 = 0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+] - a s_n \nu_n.
\]
Recalling also that on $\mathcal B$ we have $V_{\lambda_+,w_-} \le E[V_{\lambda_+,w_-}] + a s_n \nu_n = (n - s_n) F_{w_-}(\lambda_+) + a s_n \nu_n$, we deduce that
\[
\mathrm{postFDR}_{\hat w}(\varphi_{\lambda_+,\hat w}) \ge \mathbf 1_{\mathcal C}\, \frac{(n - s_n) F_{w_-}(\lambda_+)\, E_{\theta_0 = 0}[\ell_{1,w_+} \mid \ell_{1,w_-} < \lambda_+] - a s_n \nu_n}{s_n + (n - s_n) F_{w_-}(\lambda_+) + a s_n \nu_n}.
\]
Substituting for the first term in the numerator from the definition (43), we find that, for $A > (1 + t)a$,
\[
\mathrm{postFDR}_{\hat w}(\varphi_{\lambda_+,\hat w}) \ge \mathbf 1_{\mathcal C}\Big(t + \frac{(A - (1 + t)a)\, s_n \nu_n}{s_n + (n - s_n) F_{w_-}(\lambda_+) + a s_n \nu_n}\Big),
\]
and the right side is strictly larger than $t$ on $\mathcal C$, so that indeed $\hat\lambda \le \lambda_+$, at least for $n$ large enough, on the event $\mathcal C$.

Lower bound, $\mathrm{postFDR}_{\hat w}(\varphi_{\lambda_-,\hat w}) \le t$ with probability at least $1 - o(\nu_n)$. On the event $\mathcal B$, recalling (69), and using monotonicity of the $\ell$-values (Lemma 4) and the fact that $\lambda_-$ is bounded away from zero (Lemma 6), we see that
\[
\#\{i \in S_0 : (\varphi_{\lambda_-,w_+})_i = 0\} \le \#\{i \in S_0 : \ell_{i,w_+} > \delta_n\} \le \#\{i \in S_0 : \ell_{i,w_-} > \delta_n\} = K_n \le s_n(\rho_n + \nu_n).
\]
Since ℓ i,w + ≤ for all i , we also note that X i ∈ S ℓ i,w + ≤ K n + X i ∈ S δ n ≤ s n ( ρ n + ν n + δ n ) . Then on B , monotonicity arguments as used for the upper bound yield postFDR ˆ w ( ϕ λ − , ˆ w ) ≤ postFDR w − ( ϕ λ − ,w + ) ≤ P i S ℓ i,w − { ℓ i,w + < λ − } + s n ( ρ n + ν n + δ n ) s n − s n ( ρ n + ν n ) + V λ − ,w + . (72)Applying Bernstein’s inequality as for the upper bound, here with variables U i = ℓ i,w − { ℓ i,w + <λ − } , i S , we deduce that there is an event C ′ ⊂ B of probability at least − o ( ν n ) such that postFDR ˆ w ( ϕ λ − , ˆ w ) C ′ ≤ ( n − s n ) F w + ( λ − ) E θ =0 [ ℓ ,w − | ℓ ,w + < λ − ] + s n ( ρ n + (1 + a ) ν n + δ n ) s n + ( n − s n ) F w + ( λ − ) − s n ( ρ n + (1 + a ) ν n ) . Substituting for ( n − s n ) F w + ( λ − ) E θ =0 [ ℓ ,w − | ℓ ,w + < λ − ] in the numerator from the definition(44) of λ − , the right side is upper bounded by t if A is large enough, so that indeed λ − ≤ ˆ λ on C ′ . Proof of Lemma 9. Let u n = 5 log log( n/s n ) and define S = { i ∈ S : | X i | > p n/s n ) + u n } . First, we show that S ⊂ S . From Lemma 20 we have, for ξ = ( φ/g ) − , ξ ( u ) ≤ (cid:16) /u ) + 2 log log(1 /u ) + 6 log 2 (cid:17) / . The right side is decreasing in u so, recalling that w − ≥ s n /n (Lemma 5), we see that ξ evaluatedat u = ( w − / (2 log( n/s n ))) is upper bounded by the right side evaluated at u = s n / (2 n log( n/s n )) ,hence ξ (cid:16) w − n/s n ) (cid:17) ≤ p n/s n ) + 4 log log( n/s n ) + 2 log log log( n/s n ) + 8 log 2 + 2 log log 2 ≤ p n/s n ) + u n , for n large. Consequently we see that if | x | > p n/s n ) + u n , then φ ( x ) /g ( x ) = ξ − ( x ) > w − / (2 log( n/s n )) = w − δ n , so that ℓ i,w − ( x ) = (cid:16) w − w − gφ ( x ) (cid:17) − ≤ δ n , and indeed S ⊂ S .Next, observe, by Taylor expanding, that p n/s n ) + u n = p n/s n )+ o (1) . 
We deducethat for i ∈ S \ S , necessarily the noise variable ε i in (1) satisfies | ε i | > v n / , so that | S \ S | ≤| S \ S | ≤ N, where N is the binomial N = { i ∈ S : | ε i | > v n / } . Applying Bernstein’sinequality (Lemma 21) with U i = (cid:8) | ε i | > v n / (cid:9) , u = max (cid:0) EN, ν n (cid:1) ≥ X i ∈ S Var U i , imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing we see that Pr( N > EN + u ) ≤ exp (cid:16) − u / u + u/ (cid:17) = o ( ν n ) , for large enough constant α in the definition (30) of ν n . Finally, note that EN = 2 s n ¯Φ( v n / ≤ s n ρ n for ρ n = e − v n / as defined in (33), at least for n large, as a consequence of the standard tailbound ¯Φ( x ) ≍ φ ( x ) /x ≪ e − x / as x → ∞ . Proof of Lemma 10. Lemma 5 tells us that s n /n . w . ( s n /n )(log n/s n ) / for w ∈ { w − , w + } ,and Lemma 6 tells us that − λ ≍ δ n for λ ∈ { λ − , λ + } . Then V λ,w = P i S { ℓ i,w < λ } followsa binomial distribution, whose mean we deduce by Lemma 16 satisfies E [ V λ,w ] = ( n − s n ) F w ( λ ) ≍ ( n − s n ) w (1 − λ ) − (log(1 /w )) − / ≍ ( n − s n ) w (log( n/s n )) − / , so that again appealing to Lemma 5, we have E [ V λ,w ] ≍ s n . We apply Bernstein’s inequalityLemma 21 with, for some a = a ( t ) , U i = { ℓ i,w < λ } , u = as n ν n . Then P i S Var( U i ) ≤ E [ V λ,w ] ≍ s n so that for a constant C , larger than / for a large enough, P θ ( | V λ,w − E [ V λ,w ] | ≥ u ) ≤ − C log s n ) ≤ s − / n = o ( ν n ) . Proof of Lemma 11. We prove the control (61) for E [ V λ + ,w + ] ; the proof for E [ V λ − ,w − ] is almostidentical. 
By Lemma 16 we note that EV λ,w = ( n − s n ) F w ( λ ) = 2( n − s n ) ¯Φ( ξ ( r ( w, λ ))) for any λ, w ∈ (0 , , where we recall the definitions r ( w, λ ) = wλ (1 − w ) − (1 − λ ) − , ξ = ( φ/g ) − , so thatour goal is to bound E [ V λ + ,w + ] E [ V λ + ,w − ] − (cid:0) ξ ( r ( w + , λ + )) (cid:1) ¯Φ (cid:0) ξ ( r ( w − , λ + )) (cid:1) − . Write r ± = r ( w ± , λ + ) and ξ ± = ξ ( r ± ) (the notation ξ + is to link to r + , not to claim that ξ + ≥ ξ − ). As a consequence of Lemmas 5 and 6, log(1 /r − ) ≍ log(1 /r + ) ≍ log( n/s n ) = δ − n . Recalling that ξ ( u ) ∼ ( − u ) / as u → (see Lemma 20) it follows that ξ ± → ∞ , hence by astandard normal tail bound (also in Lemma 20) we have ≤ ¯Φ( ξ + )¯Φ( ξ − ) − ≤ (1 + ξ − ) ξ − ξ − ξ + φ ( ξ + ) φ ( ξ − ) − O (cid:16) max (cid:16) ξ − , ξ − ξ + − , φ ( ξ + ) φ ( ξ − ) − (cid:17)(cid:17) , provided the right hand side tends to zero, using that for a n , b n → , (1 + a n )(1 + b n ) − O (max( a n , b n )) . That ξ ( u ) ∼ ( − u ) / as u → implies ξ − − − O ((log 1 /r − ) − ) = O ( δ n ) .Next, by Lemma 14 we have ξ − − ξ = O (1) , hence ξ − ξ + − ξ − − ξ ξ + ξ − + ξ = O ((log n/s n ) − / ) = o ( δ n ) . It remains to control φ ( ξ + ) /φ ( ξ − ) − . By the definition of ξ we have φ ( ξ + ) φ ( ξ − ) = r + g ( ξ + ) r − g ( ξ − ) . Lemma 13 tells us that r + r − − O (max( ν n , ρ n )) , imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing so that it suffices to show g ( ξ + ) /g ( ξ − ) − O ( δ n ) . From the explicit definition (7) of g , we have g ( ξ + ) g ( ξ − ) − ξ − ξ − e − ξ / − e − ξ − / − . Observe that, for n large, − e − ξ / − e − ξ − / − e − ξ − / − e − ξ / − e − ξ − / ≤ e − ξ − / . The lower bound on ξ in Lemma 20 implies that ξ ( u ) ≥ p /u ) for u small, so that e − ξ − ≤ r − , which is of smaller order than δ n (note that r − ≍ ( s n /n )(log n/s n ) / as a consequence of Lemmas 5and 6). 
Noting that the bound attained above for ξ − /ξ + − also bounds ξ − /ξ − , we deducethat φ ( ξ + ) /φ ( ξ − ) − is suitably bounded and the lemma follows. Acknowledgements This work has been supported by ANR-16-CE40-0019 (SansSouci), ANR-17-CE40-0001 (BASICS),by the GDR ISIS through the "projets exploratoires" program (project TASTY), and by a publicgrant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabExLMH. imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Appendix A: Auxiliary resultsLemma 12. For universal constants c, c ′ > , for all t ∈ (0 , , there exists ω ( t ) such that for w ≤ ω ( t ) , ˜ m ( w ) (cid:18) c log log(1 /w )log(1 /w ) (cid:19) ≤ G ( χ ( r ( w, t ))) ≤ ˜ m ( w ) (cid:18) c ′ log log(1 /w )log(1 /w ) (cid:19) . (73) Proof of Lemma 12. The proof relies on the following inequalities (see Lemma 20): for universalconstants C , C > and w small enough, G ( ζ ( w ))(1 − C ζ ( w ) − ) ≤ ˜ m ( w ) ≤ C ζ ( w ) − + 2 G ( ζ ( w )) . (74)Let us now prove the lower bound. By Lemma 20, for a universal constant c > , and w smallenough (smaller than a threshold that might depend on t ), ζ ( w ) − χ ( r ( w, t )) ≥ c /w ) ζ ( w ) . Hence,since g in nonincreasing on a vicinity of + ∞ , we have for w small enough G ( χ ( r ( w, t ))) − G ( ζ ( w )) = Z ζ ( w ) χ ( r ( w,t )) g ( u ) du ≥ ( ζ ( w ) − χ ( r ( w, t ))) g ( ζ ( w )) ≥ c ′ log log(1 /w ) ζ ( w ) , for a universal constant c ′ > . Combining the last display with (74) leads to ˜ m ( w ) ≤ Cζ ( w ) − + 2 G ( χ ( r ( w, t ))) − c ′ log log(1 /w ) ζ ( w ) ≤ G ( χ ( r ( w, t ))) − c ′ log log(1 /w ) ζ ( w ) , for w small enough. 
The lower bound now follows from ˜ m ( w ) ≍ /ζ ( w ) and ζ ( w ) ≍ (log(1 /w )) / (see Lemma 20).For the upper bound part, we proceed similarly: let us first prove that, for an universal constant c > , for w small enough (smaller than a threshold that might depend on t ), ζ ( w ) − χ ( r ( w, t )) ≤ c log log(1 /w ) ζ ( w ) . (75)This comes from Lemma 20: for w small enough, ζ ( w ) − χ ( r ( w, t )) ≤ /w ) + 2 log log(1 /w ) − − w )(1 − t ) / ( tw )) + log(log((1 − w )(1 − t ) / ( tw ))) + C + C ′ ≤ /w ) . This leads to (75). Now, proceeding as for the lower bound, we have G ( χ ( r ( w, t ))) − G ( ζ ( w )) = Z ζ ( w ) χ ( r ( w,t )) g ( u ) du ≤ ( ζ ( w ) − χ ( r ( w, t ))) g ( χ ( r ( w, t ))) ≤ c ′ log log(1 /w ) ζ ( w ) , Combining the latter with (74) gives G ( χ ( r ( w, t )) ≤ ˜ m ( w )(1 − C ζ ( w ) − ) − + 2 c ′ log log(1 /w ) ζ ( w ) , which implies the upper bound. imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Lemma 13. Define w ± as in (39) and (40) , define λ + as in (43) , and recall the definitions (30) and (33) of ν n , ρ n and (48) of r . Then w + /w − − O (max( ν n , ρ n )) , r ( w + , λ + ) r ( w − , λ + ) − O (max( ν n , ρ n )) . Proof. We have w + ≥ w − (Lemma 5), hence we focus on bounding w + /w − − from above. SinceLemma 5 also tells us that w − ≥ s n /n and implies that log(1 /w − ) ≍ log(1 /w + ) ≍ log( n/s n ) , weuse Lemma 15 to bound m in the definitions (39) and (40) of w − and w + , and deduce that (1 − ν n )( n − s n ) w + ˜ m ( w + ) ≤ s n , (1 + ν n )( n − s n ) w − ˜ m ( w − ) ≥ s n (1 − ρ n ) . Taking the ratio, we deduce that w + ˜ m ( w + ) w − ˜ m ( w − ) ≤ (1 − ν n ) − (1 + ν n )(1 − ρ n ) − . 
Then, since w + ≥ w − and ˜ m is increasing, we see that w + w − − ≤ w + ˜ m ( w + ) w − ˜ m ( w − ) − O (max( ν n , ρ n )) , as claimed.Finally, since w + , w − → , we deduce that − w − − w + − w + /w − ) − − w + ) /w − = o ( w + /w − − , hence r ( w + , λ + ) r ( w − , λ + ) − O (cid:16) max (cid:16) w + w − − , − w − − w + − (cid:17)(cid:17) = O (max( ν n , ρ n )) . Lemma 14. Define w ± , λ ± , ξ, r as in (39) , (40) , (43) , (44) , (48) and (63) . Then ξ ( r ( w − , λ + )) − ξ ( r ( w + , λ + )) = O (1) Proof. Write r ± = r ( w ± , λ + ) and ξ ± = ξ ( r ± ) . Lemma 20 gives us the near matching upper andlower bounds on ξ that for u ∈ (0 , small enough we have ξ ( u ) ≤ (2 log(1 /u ) + 2 log log(1 /u ) + 6 log 2) / ,ξ ( u ) ≥ (2 log(1 /u ) + 2 log log(1 /u ) + 2 log 2) / . Using these bounds and monotonicity of ξ := ( φ/g ) − (which follows from the fact that of φ/g isdecreasing on x ≥ as in Lemma 4) we deduce that ≤ ξ − − ξ ≤ (cid:16) r + r − (cid:17) + 2 log log(1 /r − ) − /r + ) + 4 log 2 . (76)Observe that log log(1 /r − ) − log log(1 /r + ) = log (cid:16) log(1 /r − )log(1 /r + ) (cid:17) = log (cid:16) r + /r − )log(1 /r + ) (cid:17) . Using the standard bound log(1 + x ) ≤ x for x > − and the fact that r + → (by Lemmas 5and 6), this last expression is upper bounded by log( r + /r − )log(1 /r + ) = o (log( r + /r − )) , and, using Lemma 13, we similarly have log( r + /r − ) ≤ r + r − − O (max( ν n , ρ n )) = o (1) . Inserting into (76) we see that ξ − − ξ = O (1) , as claimed. imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Lemma 15. 
There exists constants ω ∈ (0 , and c, c ′ > such that for any sequence s n /n → and v n → ∞ , for all θ ∈ ℓ ( s n , v n ) , for any i such that θ ,i = 0 , we have for any w ∈ [ s n /n, ω ] , (1 − ρ n ) w − ≤ m ( θ ,i , w ) ≤ w − , (77) c (log(1 /w )) − / ≤ ˜ m ( w ) ≤ c ′ (log(1 /w )) − / , (78) where we recall that ρ n = e − v n / as in (33) .Proof. Lemma 20 tells us that ˜ m ( w ) ≍ ζ ( w ) − and ζ ( w ) ∼ (2 log(1 /w )) − / , yielding (78). It alsotells us, regarding m , that there exists c > such that for all x ∈ R and all w ∈ (0 , ,m ( x, w ) ≤ min( w, c ) − , (79)so that the upper bound in (77) is immediate upon choosing ω = min( c , .It remains to show the lower bound on m . This lower bound is a sharpening of Lemma S-29in [14] and is proved similarly. By assumption, if, for some i , | θ ,i | 6 = 0 , then we may assume bysymmetry of m that µ = θ ,i > and we further have µ ≥ p n/s n ) + v n . Writing p = p ( n, w ) = v n ζ ( w ) and a = 1+0 . p , using monotonicity of φ/g and hence of β (Lemma 4),we have for w such that w | β (0) | < / , wm ( µ, w ) = Z | x | >aζ ( w ) wβ ( x )1 + wβ ( x ) φ ( x − µ ) dx + Z aζ ( w ) − aζ ( w ) wβ ( x )1 + wβ ( x ) φ ( x − µ ) dx ≥ Z x>aζ ( w ) wβ ( x )1 + wβ ( x ) φ ( x − µ ) dx − Z aζ ( w ) − aζ ( w ) φ ( x − µ ) dx ≥ wβ ( aζ ( w ))1 + wβ ( aζ ( w )) Φ( aζ ( w ) − µ ) − (1 − Φ( aζ ( w ) − µ )) . Increasingness of β implies that ζ is decreasing, so that also using Lemma 20 and a Taylor expan-sion, we have, for some ∆ n → , aζ ( w ) − µ ≤ ζ ( w ) − p n/s n ) − . v n ≤ ζ ( s n /n ) − p n/s n ) − . v n ≤ ∆ n − . v n . By standard properties of ¯Φ , including the tail bound ¯Φ( x ) ≍ φ ( x ) /x , − ¯Φ(∆ n − . v n ) = ¯Φ(0 . v n − ∆ n ) ≪ e − (0 . v n − ∆ n ) / ≤ e − (0 . v n ) / e v n ∆ n / ≪ ρ n . In particular, we have, for n large, − ¯Φ( aζ ( w ) − µ ) ≤ − ¯Φ(∆ n − . v n ) ≤ ρ n / . 
Additionally, wβ ( aζ ( w )) = β ( aζ ( w )) /β ( ζ ( w )) = (( g/φ )( aζ ( w )) − / (( g/φ )( ζ ( w )) − tendsquickly to infinity: wβ ( aζ ( w )) & g ( aζ ( w )) g ( ζ ( w )) φ ( ζ ( w )) φ ( aζ ( w )) & φ ( ζ ( w )) φ ( aζ ( w )) = e ( a − ζ ( w ) / ≫ e v n . / ≫ ρ − n . In particular, we see that, for n large, wβ ( aζ ( w ))1 + wβ ( aζ ( w )) = 1 − 11 + wβ ( aζ ( w )) ≥ − wβ ( aζ ( w )) ≥ − ρ n / . Inserting these bounds we find that wm ( µ, w ) ≥ (1 − ρ n / − ρ n / − ρ n / ≥ − ρ n . imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Lemma 16. The function F w ( λ ) = P θ ,i =0 ( ℓ i,w ≤ λ ) is continuous and strictly increasing in λ .Assume that w = w n and λ = λ n ∈ (0 , satisfy λ → and w/ (1 − λ ) → . Then F w ( λ ) = 2 ¯Φ( ξ ( r ( w, λ ))) ≍ w (1 − λ ) − (log((1 − λ ) /w )) − / as n → ∞ , where ξ = ( φ/g ) − and r ( w, t ) = w (1 − w ) − t (1 − t ) − . If in fact w c / (1 − λ ) → forsome c < then F w ( λ ) ≍ w (1 − λ ) − (log(1 /w )) − / . Proof. A direct calculation, as needed also in proving Lemma 4, yields ℓ i ( X ) ≤ t ⇐⇒ | X i | ≥ ξ ( r ( w, t )) , (80)so that F w ( x ) = 2 ¯Φ( ξ ( r ( w, x ))) as claimed and hence F w is continuous.Next, we use a standard Gaussian tail bound, the definition of ξ , the definition (7) of g in thequasi-Cauchy case, the fact that r ( w, λ ) ≍ w/ (1 − λ ) as w → and λ → , and the fact that ξ ( u ) ≍ (log(1 /u )) / as u → (see Lemma 20) to see that ¯Φ( ξ ( r ( w, λ )) ≍ φ ( ξ ( r ( w, λ )) ξ ( r ( w, λ ) ≍ r ( w, λ ) g ( ξ ( r ( w, λ )) ξ ( r ( w, λ )) ≍ r ( w, λ ) ξ ( r ( w, λ )) − ≍ w − λ (cid:16) log (cid:16) − λw (cid:17)(cid:17) − / , as claimed. Note that log((1 − λ ) /w ) ≤ log(1 /w ) , and that when w c / (1 − λ ) → we have log((1 − λ ) /w ) & log(1 /w − c ) ≍ log(1 /w ) . Lemma 17. 
Suppose for sequences w = w ,n , w = w ,n and λ = λ n taking values in [0 , that λ → , that both w /w and w /w are bounded, and that w c / (1 − λ ) → for some c < . Then − E θ =0 [ ℓ ,w ( X ) | ℓ ,w ( X ) < λ ] ≍ (1 − λ ) log(1 / (1 − λ )) , Let us also note here that for fixed w , w , E θ =0 [ ℓ ,w ( X ) | ℓ ,w ( X ) < λ ] is continuous in λ .Proof. Recall the definitions β ( x ) = gφ ( x ) − , ζ ( w ) = β − (1 /w ) , ξ = ( φ/g ) − , and recall that ℓ ,w ( X ) < λ if and only if | X | > ξ ( r ( w, λ )) , see (80). Using symmetry of the densities φ and g wesee that for all w , w ∈ (0 , , E θ =0 [ ℓ ,w ( X ) | ℓ ,w ( X ) < λ ] = R ∞ ξ w (1 − w ) φ ( x )(1 − w ) φ ( x )+ w g ( x ) φ ( x ) d x ¯Φ( ξ w ) , where we have introduced the notation ξ w := ξ ( r ( w , λ )) . The expression on the right is continuousat any λ such that the denominator is bounded away from zero, i.e. at any λ = 0 , hence the sameis true of the conditional expectation.Write h w ( x ) = w β ( x ) φ ( x ) / (1 + w β ( x )) . For w , w small enough, the following bounds hold: φ ( x ) / ≤ h w ( x ) ≤ φ ( x ) , x ∈ [ ζ ( w ) , ∞ ); w g ( x ) / ≤ h w ( x ) ≤ w g ( x ) x ∈ [ ξ w , ζ ( w )] . To obtain these inequalities we have used monotonicity of φ/g and hence β , and the fact that β ( ζ ( w )) = 1 /w . The first inequalities then follow from the expression h w ( x ) = ( w β ( x )1+ w β ( x ) ) φ ( x ) ,while the latter inequalities result from the expression h w ( x ) = w g ( x )( − ( φ/g )( x )1+ w β ( x ) ) and the factthat ( φ/g )( x ) ≤ / for x large enough. By assumption there exists C > such that w ≤ Cw for all n large enough, and note that also λ ≥ C/ ( C + 1) by further increasing n if necessary.Recalling the relationship (65) and using that decreasingness of φ/g (Lemma 4) implies the sameof ξ = ( φ/g ) − , we then have ζ ( w ) = ξ ( w / (1 + w )) ≥ ξ ( w ) ≥ ξ ( Cw ) ≥ ξ ( r ( w , λ )) = ξ w . imsart-generic ver. 
2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing In addition, since g is decreasing for x large, we have w g ( ζ ( w )) / ≤ h w ( x ) ≤ w g ( ξ w ) , x ∈ [ ξ w , ζ ( w )] . Then Z ∞ ξ w φ ( x )(1 − w ) φ ( x ) + w g ( x ) φ ( x ) d x = Z ∞ ξ w 11 + w β ( x ) φ ( x ) d x = Z ∞ ξ w φ ( x ) d x − Z ∞ ξ w h w ( x ) d x = ¯Φ( ξ w ) − Z ζ ( w ) ξ w h w ( x ) d x − Z ∞ ζ ( w ) h w ( x ) d x ≥ ¯Φ( ξ w ) − ( ζ ( w ) − ξ w ) w g ( ξ w ) − ¯Φ( ζ ( w )) . We can similarly upper bound the integral, so we deduce the inequalities (1 − w )¯Φ( ξ w ) h ¯Φ( ξ w ) − ¯Φ( ζ ( w )) − ( ζ ( w ) − ξ w ) w g ( ξ w ) i ≤ E θ =0 [ ℓ ,w ( X ) | ℓ ,w ( X ) < λ ] ≤ (1 − w )¯Φ( ξ w ) h ¯Φ( ξ w ) − 12 ¯Φ( ζ ( w )) − 14 ( ζ ( w ) − ξ w ) w g ( ζ ( w )) i . (81)Now, let us study in detail the order of each term. First, usual normal tail bounds, the definitionof ζ , the definition (7) of g in the quasi-Cauchy case and Lemma 20 (which tells us that ζ ( w ) ≍ (log 1 /w ) ) imply that for w small enough ¯Φ( ζ ( w )) ≍ φ ( ζ ( w )) ζ ( w ) ≍ w g ( ζ ( w )) ζ ( w ) ≍ w ζ ( w ) − ≍ w log − / (1 /w ) . Similarly to the proof of Lemma 16, observe that (1 − λ ) /w ≥ w c − and for n large and hencethat log((1 − λ ) /w ) ≍ log(1 /w ) . Using the definition of ξ and Lemma 20 (which tells us that ξ ( u ) ≍ log(1 /u ) ), we then obtain ¯Φ( ξ w ) ≍ φ ( ξ w ) ξ w = r ( w , λ ) g ( ξ w ) ξ w ≍ w − λ ξ − w ≍ w − λ log − / (cid:18) − λw (cid:19) ≍ w − λ log − / (cid:18) w (cid:19) . We deduce that ≤ ¯Φ( ζ ( w )) / ¯Φ( ξ w ) . − λ .We apply Lemma 18 with w = w and with t ∈ (0 , such that r ( w , t ) = r ( w , λ ) . Observingthat − t = r ( w , t ) 1 − w tw = r ( w , λ ) 1 − w tw ≍ λ − λ ≍ − λ , so that w c / (1 − t ) → , we deduce that ζ ( w ) − ξ w ≍ log(1 / (1 − t ))log(1 /w ) / ≍ log(1 / (1 − λ ))log(1 /w ) / . 
Again using that log((1 − λ ) /w ) ≍ log(1 /w ) , it follows that ( ζ ( w ) − ξ w ) w g ( ζ ( w )) ≍ ( ζ ( w ) − ξ w ) w g ( ξ w ) ≍ (1 − λ ) log(1 / (1 − λ )) ¯Φ( ξ w ) , since we showed above that w (1 − λ ) − log − / (1 /w ) ≍ ¯Φ( ξ w ) . Feeding these bounds into (81)yields that for some c , c , c > − E θ [ ℓ ,w | ℓ ,w < λ ] ≥ w + c (1 − w )((1 − λ ) log(1 / (1 − λ )) , − E θ [ ℓ ,w | ℓ ,w < λ ] ≤ w + c (1 − λ ) + c ((1 − λ ) log(1 / (1 − λ )) . The lower bound follows upon discarding the term w and noting that − w ≥ / for n large;for the upper bound we note that w + c (1 − λ ) = o (cid:0) (1 − λ ) log(1 / (1 − λ )) (cid:1) . imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing Lemma 18. Suppose for sequences w = w n and t = t n taking values in [0 , that t → and w c / (1 − t ) → for some c < . Then ζ ( w ) − ξ ( r ( w, t )) ≥ for n large enough and, as n → ∞ , ζ ( w ) − ξ ( r ( w, t )) ≍ log(1 / (1 − t ))(log(1 /w )) / . (82) Proof. For − t ≤ / and w/ (1 − t ) ≤ . , we have r ( w, t ) = tw (1 − t )(1 − w ) ≥ wt − t ≥ . w − t , so that log log(1 /r ( w, t )) ≤ log(log(2) + log((1 − t ) /w ) ≤ log(2) + log log((1 − t ) /w ) . Hence, using boundson ζ and ξ from Lemma 20 and noting that log(1 /w ) ≥ log((1 − t ) /w ) and that log(1 / (1 − w )) isbounded, we see that for − t and w/ (1 − t ) small enough we have for constants c, c ′ ζ ( w ) − ξ ( r ( w, t )) ≥ /w ) + 2 log log(1 /w ) + c − (2 log(1 /r ( w, t )) + 2 log log(1 /r ( w, t )) + 6 log 2) ≥ t/ (1 − t )) + 2 log(log(1 /w ) / log((1 − t ) /w )) + c ′ ≥ / (1 − t )) + c ′ ≥ log(1 / (1 − t )) . Conversely, for w ≤ / , we have r ( w, t ) = tw (1 − t )(1 − w ) ≤ w − t , so that log log(1 /r ( w, t )) ≥ log(log(0 . 5) + log((1 − t ) /w ) ≥ log(0 . 5) + log log((1 − t ) /w ) provided n is large enough that log((1 − t ) /w ) + log(0 . ≥ . − t ) /w ) . 
Note also, as in the proof of Lemma 16 that thecondition on w c / (1 − λ ) → implies that log(1 /w ) / log((1 − t ) /w ) is bounded. Again using boundson ζ and ξ from Lemma 20, for − t and w/ (1 − t ) small enough we deduce that for constants C, C ′ , C ′′ we have ζ ( w ) − ξ ( r ( w, t )) ≤ /w ) + 2 log log(1 /w ) + C − (2 log(1 /r ( w, t )) + 2 log log(1 /r ( w, t )) + 2 log 2) ≤ t/ (1 − t )) + 2 log(log(1 /w ) / log((1 − t ) /w )) + C ′ ≤ / (1 − t )) + C ′′ . This entails ζ ( w ) − ξ ( r ( w, t )) ≍ log( t/ (1 − t )) ζ ( w ) + ξ ( r ( w, t )) , and the result thus follows from ζ ( w ) ≤ ζ ( w ) + ξ ( r ( w, t )) ≤ ζ ( w ) (the latter being implied by theabove calculations) and ζ ( w ) ≍ (log(1 /w )) / . Lemma 19. For any β > there exists c = c ( β ) > such that for any s n > c (log n ) / log log n satisfying n/s n → ∞ we have for n large enough log log( n/s n )log( n/s n ) ≥ α (cid:16) log s n s n (cid:17) / . Consequently, the conclusions of Theorem 2 hold upon replacing the assumption s n ≥ (log n ) with s n ≥ b (log n ) / log log n for some large enough b .Proof. Write p ( s ) = log log( n/s )log( n/s ) , q ( s ) = (cid:16) log ss (cid:17) / . Since u − log u ≤ ( u ′ ) − log u ′ for u ≥ u ′ ≥ e , we see that p ( s n ) ≥ p (1) (apply with u = log n , u ′ = log( n/s n ) , and note u ′ > e for n large enough since n/s n → ∞ ). Similarly we notice that q is decreasing, at least on s > e , so that q ( s − ) ≥ q ( s n ) for s − := c (log n ) / log log n . It thereforesuffices to show that p (1) ≥ βq ( s − ) .Observe that for n large enough we have s − ≤ (log n ) , hence log s − ≤ n. It followsthat s − / log s − ≥ c (log n/ log log n ) . Thus, q ( s − ) ≤ (cid:16) c (cid:17) / log log n log n = ( c / − / p (1) . imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. 
The result follows for $c = 2\beta^2$.

To see that the proof of Theorem 2 holds under the weaker condition on $s_n$, note that the lower bound on $s_n$ was not assumed for any of the core lemmas, and in the proof of the theorem itself was only used to show that for any $\beta > 0$, $\nu_n \le \beta\varepsilon_n$ for $n$ large enough.

Finally, for the reader's convenience, we gather together some results whose proofs are omitted because they can be found elsewhere. The following lemma collects results from [14]; we remark that while the setting of that paper assumes polynomial sparsity, the results gathered here do not depend on that assumption. Some of the following results are originally stated with dependence on $g$ and a related parameter $\kappa \in [1, 2]$; here, with $g$ explicitly given in (7), we substitute $\kappa = 2$ and use the bounds $\|g\|_\infty := \sup_x |g(x)| \le 1/\sqrt{2\pi}$ and $x^{-2}/(2\sqrt{2\pi}) \le g(x) \le x^{-2}/\sqrt{2\pi}$ for $|x| \ge 2$ to simplify expressions.

Lemma 20 (Results from [14]).
a. Lemma S-10: $\ell_{i,w_-}(X) \ge q_{i,w_-}(X)$.
b. Lemma S-12: $\xi(u) \sim (2\log(1/u))^{1/2}$, and more precisely, for $u$ small enough,
\[
\big(2\log(1/u) + 2\log\log(1/u) + 2\log 2\big)^{1/2} \le \xi(u) \le \big(2\log(1/u) + 2\log\log(1/u) + 6\log 2\big)^{1/2}.
\]
c. Lemma S-14: $\zeta(w) \sim (2\log(1/w))^{1/2}$. More precisely, for constants $c, C \in \mathbb R$ and for $w$ small enough,
\[
\big(2\log(1/w) + 2\log\log(1/w) + c\big)^{1/2} \le \zeta(w) \le \big(2\log(1/w) + 2\log\log(1/w) + C\big)^{1/2}.
\]
d. Proof of Lemma S-15: $\zeta(w) - \chi(r(w, t)) \ge c\,\dfrac{\log\log(1/w)}{\zeta(w)}$ for a universal constant $c > 0$, for $w$ small enough (smaller than a threshold that might depend on $t$).
e. Eq. (S-15): for some constant $C' > 0$ and $u \in (0, 1)$ small enough, $\chi(u) \ge \big(2\log(1/u) - \log\log(1/u) - C'\big)^{1/2}$.
f. Lemma S-20: there exists $c_1 > 0$ such that for any $x \in \mathbb R$ and $w \in (0, 1]$, $|\beta(x, w)| \le (w \wedge c_1)^{-1}$.
g. Lemma S-21: there exists $c_1 > 0$ such that $m_1(x, w) \le (\min(c_1, w))^{-1}$ for all $x \in \mathbb R$. The function $\tilde m$ is continuous, non-negative and increasing. For any fixed $\tau$ the function $w \mapsto m_1(\tau, w)$ is continuous and decreasing.
h. Lemma S-23: $\tilde m(w) \asymp \zeta(w)\, g(\zeta(w)) \asymp \zeta(w)^{-1}$.
i. Proof of Lemma S-23: for universal constants $C_1, C_2 > 0$ and $w$ small enough, $\tilde m(w) \le C_2\zeta(w)^{-3} + 2\bar G(\zeta(w))$ and $\tilde m(w) \ge 2\bar G(\zeta(w))\big(1 - C_1\zeta(w)^{-2}\big)$.
j. Lemma S-26, Corollary S-28: for $m_2(\theta_{0,i}, w) = E_{\theta_0}(\beta(X_i, w)^2)$, there exist constants $C, \omega_0, M_0 > 0$ such that for all $w \le \omega_0$ and all $\tau \ge M_0$,
\[
m_2(0, w) \le C\,\bar\Phi(\zeta(w))\, w^{-2}, \qquad m_2(\tau, w) \le C\, m_1(\tau, w)\, w^{-1}.
\]
k. Lemma S-40: $\bar\Phi(x) \sim x^{-1}\phi(x)$ as $x \to \infty$. More precisely, for all $x > 0$,
\[
\frac{x^2}{1 + x^2}\,\frac{\phi(x)}{x} \le \bar\Phi(x) \le \frac{\phi(x)}{x}.
\]

Lemma 21 (Bernstein's inequality). Let $U_i$, $1 \le i \le n$, be independent random variables taking values in $[0, 1]$. Then, for any $u > 0$,
\[
P\Big(\sum_{i=1}^n (U_i - E[U_i]) \ge u\Big) \le \exp\Big(-\frac{0.5\,u^2}{\sum_{i=1}^n \mathrm{Var}(U_i) + u/3}\Big),
\]
and
\[
P\Big(\Big|\sum_{i=1}^n (U_i - E[U_i])\Big| \ge u\Big) \le 2\exp\Big(-\frac{0.5\,u^2}{\sum_{i=1}^n \mathrm{Var}(U_i) + u/3}\Big).
\]

Appendix B: Notation

$X = (X_1, \dots, X_n)$: the data, with $X_i = \theta_{0,i} + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. Gaussians, $\varepsilon_i \sim N(0, 1)$.
$\theta_0$: the unknown true parameter in $\ell_0(s_n, v_n)$.
$P_\theta$: the law of $X$ with parameter $\theta$; $E_\theta$ the associated expectation.
$\ell_0(s) = \{\theta \in \mathbb R^n : \#\{1 \le i \le n : \theta_i \ne 0\} = \|\theta\|_{\ell_0} \le s\}$.
$S_\theta = \{i : \theta_i \ne 0\}$: the support of a vector $\theta \in \ell_0(s)$; $S_0$ denotes the support of $\theta_0$.
$\ell_0(s_n, v_n) = \{\theta \in \ell_0(s_n) : |\theta_i| \ge \sqrt{2\log(n/s_n)} + v_n \text{ for } i \in S_\theta,\ |S_\theta| = s_n\}$, with $s_n \to \infty$, $n/s_n \to \infty$, $v_n \to \infty$. (Theorems 2 and 3 additionally impose lower bounds on $s_n$ and $v_n$; see their statements.)
$\Pi_w$: the spike-and-slab prior (6), under which $\theta_i = 0$ with probability $1 - w$ and is drawn from some (implicitly defined) density $\gamma$ with probability $w$, independently of the other $\theta_j$.
$\Pi_w(\cdot \mid X)$: the induced posterior on $\theta$, see before (8).
$\phi, g$: the standard Gaussian density and the quasi-Cauchy density $g(x) = (2\pi)^{-1/2} x^{-2}(1 - e^{-x^2/2})$, which respectively are the densities of $X_i$ under $\theta_i = 0$ and under $\theta_i \sim \gamma$.
$\bar\Phi, \bar G$: the upper tail functions for $\phi, g$, e.g. $\bar\Phi(x) = \int_x^\infty \phi(t)\,dt$.
$\ell_{i,w}(X) = \Pi_w(\theta_i = 0 \mid X) = \dfrac{(1 - w)\phi(X_i)}{(1 - w)\phi(X_i) + w g(X_i)}$. Also just denoted $\ell_{i,w}$, or $\ell_i(X)$ at times.
$q_{i,w}(X) = \dfrac{(1 - w)\bar\Phi(|X_i|)}{(1 - w)\bar\Phi(|X_i|) + w\bar G(|X_i|)}$ as in (17). Also just denoted $q_{i,w}$ at times.
$L(w)$: the log-likelihood (10); $S(w) = L'(w) = \sum_{i=1}^n \beta(X_i, w)$ the score function.
$\beta(x) = (g/\phi)(x) - 1$, $\beta(x, w) = \beta(x)/(1 + w\beta(x))$.
$\zeta(w) = \beta^{-1}(1/w)$, $w \in (0, 1]$.
$\xi = (\phi/g)^{-1}$, $\chi = (\bar\Phi/\bar G)^{-1}$.
$\tilde m$: $\tilde m(w) = -E_0\,\beta(X, w) = -\int_{-\infty}^{\infty} \beta(t, w)\phi(t)\,dt$.
$m_1$: $m_1(\tau, w) = E_\tau[\beta(X, w)] = \int_{-\infty}^{\infty} \beta(t, w)\phi(t - \tau)\,dt$.
$r(w, t) = wt(1 - w)^{-1}(1 - t)^{-1}$.
$F_w(x) = P_{\theta_0 = 0}(\ell_i(X) < x) = 2\bar\Phi(\xi(r(w, x)))$.
FDP, FDR, FNR: the usual false discovery proportion, false discovery rate, and false negative rate, see (3)–(5).
BFDR: the Bayesian FDR, i.e. the FDR averaged over draws of $\theta$ from the prior (see (11)).
$\mathrm{postFDR}_w(\varphi) = \dfrac{\sum_{i=1}^n \ell_{i,w}\varphi_i}{1 \vee (\sum_{i=1}^n \varphi_i)}$.
$\hat w = \operatorname{argmax}_{w \in [1/n, 1]} L(w)$: the maximum likelihood estimator for $w$.
$w_\pm$: quantities which will be used to upper and lower bound $\hat w$ with high probability, defined in (39) and (40).
$\hat\lambda = \sup\{\lambda : \mathrm{postFDR}_{\hat w}(\varphi_{\lambda,\hat w}) \le t\}$.
$\lambda_\pm$: quantities which will be used to upper and lower bound $\hat\lambda$ with high probability, defined in (43) and (44).
$\varphi_{\lambda,w}(X) = (\mathbf 1\{\ell_{i,w}(X) < \lambda\})_{i \le n}$.
$\varphi^{C\ell} = \varphi_{\hat\lambda,\hat w}$.
$\varphi^{q\text{-val}} = (\mathbf 1\{q_{i,\hat w} < t\})_{1 \le i \le n}$.
$V_{\lambda,w} = \#\{i \notin S_0 : \ell_{i,w} < \lambda\}$: the number of false discoveries made by $\varphi_{\lambda,w}$.
$V'_w = \#\{i \notin S_0 : q_{i,w} < t\}$.
$\nu_n, \delta_n, \rho_n, \varepsilon_n$: see (30)–(33) ($\nu_n = \alpha(s_n/\log s_n)^{-1/2}$, $\delta_n = (\log(n/s_n))^{-1}$, $\varepsilon_n = \delta_n \log\log(n/s_n)$, $\rho_n = e^{-v_n^2/8}$). In the setting of Theorem 2, $\varepsilon_n$ is the largest of these asymptotically, see (35) and (36).
$K_n = \#\{i \in S_0 : \ell_{i,w_-} > \delta_n\}$.
$\lesssim, \gtrsim, \asymp, \sim, \ll, o, O$: for sequences $a_n, b_n$, $a_n \lesssim b_n$ or $a_n = O(b_n)$ means ($b_n \ge 0$ and) there exists a constant $C$ such that $|a_n| \le C b_n$, with $C$ independent of $n$ (and of other arguments of $a, b$). $a_n \gtrsim b_n$ means $b_n \lesssim a_n$. $a_n \asymp b_n$ means $a_n \lesssim b_n$ and $a_n \gtrsim b_n$. $a_n \sim b_n$ means $a_n/b_n \to 1$, and $a_n \ll b_n$ or $a_n = o(b_n)$ means $a_n/b_n \to 0$. For functions $f, g$, all these relations are defined correspondingly.

References

[1] K. Abraham, I. Castillo, and E. Gassiat. Multiple testing in nonparametric Hidden Markov models: An Empirical Bayes approach. 2021. arXiv preprint 2101.03838.
[2] D. Amar, R. Shamir, and D. Yekutieli. Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate. PLoS Computational Biology, 13(8):e1005700, 2017.
[3] D. Azriel and A. Schwartzman. The empirical distribution of a large number of correlated normal variables. Journal of the American Statistical Association, 110(511):1217–1228, 2015.
[4] S. Banerjee, I. Castillo, and S. Ghosal. Bayesian inference in high-dimensional models. 2021. Book chapter to appear in Springer volume on data science, arXiv preprint 2101.04491.
[5] R. F. Barber and E. J. Candès. Controlling the false discovery rate via knockoffs. Ann. Statist., 43(5):2055–2085, 2015.
[6] R. F. Barber, E. J. Candès, et al. A knockoff filter for high-dimensional selective inference. The Annals of Statistics, 47(5):2504–2537, 2019.
[7] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B, 57(1):289–300, 1995.
[8] Y. Benjamini, A. M. Krieger, and D. Yekutieli.
Adaptive linear step-up procedures thatcontrol the false discovery rate. Biometrika , 93(3):491–507, 2006.[9] Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testingunder dependency. Ann. Statist. , 29(4):1165–1188, 2001.[10] G. Blanchard and E. Roquain. Adaptive false discovery rate control under independence anddependence. J. Mach. Learn. Res. , 10:2837–2871, 2009.[11] M. Bogdan, E. van den Berg, C. Sabatti, W. Su, and E. J. Candès. SLOPE—adaptive variableselection via convex optimization. Ann. Appl. Stat. , 9(3):1103–1140, 2015.[12] T. T. Cai, W. Sun, and W. Wang. Covariate-assisted ranking and screening for large-scaletwo-sample inference.[13] I. Castillo and R. Mismer. Empirical Bayes analysis of spike and slab posterior distributions. Electron. J. Stat. , 12(2):3953–4001, 2018.[14] I. Castillo and E. Roquain. On spike and slab empirical Bayes multiple testing. Ann. Statist. ,48(5):2548–2574, 2020.[15] I. Castillo and E. Roquain. Supplement to “On spike and slab empirical Bayes multipletesting". Ann. Statist. , 2020.[16] I. Castillo and B. Szabó. Spike and slab empirical Bayes sparse credible sets. Bernoulli ,26(1):127–158, 2020.[17] X. Chen, R. W. Doerge, and J. F. Heyse. Multiple testing with discrete data: Proportion oftrue null hypotheses and two adaptive FDR procedures. Biom. J. , 60(4):761–779, 2018.[18] E. P. Consortium et al. Identification and analysis of functional elements in 1% of the humangenome by the encode pilot project. Nature , 447(7146):799, 2007.[19] T. Dickhaus. Simultaneous statistical inference: With applications in the life sciences .Springer, Heidelberg, 2014.[20] S. Döhler, G. Durand, and E. Roquain. New FDR bounds for discrete and heterogeneoustests. Electron. J. Statist. , 12(1):1867–1900, 2018.[21] G. Durand. Adaptive p -value weighting with power optimality. Electron. J. Stat. , 13(2):3336–3385, 2019.[22] B. Efron. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. 
J.Am. Stat. Assoc. , 99(465):96–104, 2004.[23] B. Efron. Size, power and false discovery rates. Ann. Statist. , 35(4):1351–1377, 2007.[24] B. Efron. Microarrays, empirical Bayes and the two-groups model. Statist. Sci. , 23(1):1–22,2008.[25] B. Efron, R. Tibshirani, J. D. Storey, and V. Tusher. Empirical Bayes analysis of a microarrayexperiment. J. Amer. Statist. Assoc. , 96(456):1151–1160, 2001.[26] D. Gerard and M. Stephens. Empirical Bayes shrinkage and false discovery rate estimation,allowing for unwanted variation. Biostatistics , 07 2018. imsart-generic ver. 2020/08/06 file: CLvalues-procedures_EJS_submission.tex date: February 18, 2021 . Abraham, I. Castillo, E. Roquain/ C ℓ –value multiple testing [27] N. Ignatiadis, B. Klaus, J. Zaugg, and W. Huber. Data-driven hypothesis weighting increasesdetection power in genome-scale multiple testing. Nature methods , 13:577–580, 05 2016.[28] A. Javanmard, H. Javadi, et al. False discovery rate control via debiased lasso. ElectronicJournal of Statistics , 13(1):1212–1253, 2019.[29] W. Jiang and W. Yu. Controlling the joint local false discovery rate is more powerful thanmeta-analysis methods in joint analysis of summary statistics from multiple genome-wideassociation studies. Bioinformatics , 33(4):500–507, 12 2016.[30] I. M. Johnstone and B. W. Silverman. Needles and straw in haystacks: Empirical Bayesestimates of possibly sparse sequences. Ann. Statist. , 32(4):1594–1649, 2004.[31] N. Lee, A.-Y. Kim, C.-H. Park, and S.-H. Kim. An improvement on local FDR analysisapplied to functional MRI data. Journal of neuroscience methods , 267:115–125, 2016.[32] A. Li and R. F. Barber. Multiple testing with the structure-adaptive Benjamini-Hochbergalgorithm. J. R. Stat. Soc. Ser. B. Stat. Methodol. , 81(1):45–74, 2019.[33] W. Liu et al. Gaussian graphical model estimation with false discovery rate control. TheAnnals of Statistics , 41(6):2948–2978, 2013.[34] P. Müller, G. Parmigiani, C. Robert, and J. Rousseau. 
Optimal sample size for multipletesting: The case of gene expression microarrays. J. Amer. Statist. Assoc. , 99(468):990–1001,2004.[35] T. Rebafka, E. Roquain, and F. Villers. Graph inference with clustering and false discoveryrate control. 2019. Arxiv preprint 1907.10176.[36] E. Roquain and M. van de Wiel. Optimal weighting for false discovery rate control. Electron.J. Stat. , 3:678–711, 2009.[37] J.-B. Salomond. Risk quantification for the thresholding rule for multiple testing using gaus-sian scale mixtures. 2017. Arxiv preprint 1711.08705.[38] S. K. Sarkar, T. Zhou, and D. Ghosh. A general decision theoretic formulation of procedurescontrolling FDR and FNR from a Bayesian perspective. Statist. Sinica , 18(3):925–945, 2008.[39] M. Stephens. False discovery rates: A new deal. Biostatistics , 18(2):275–294, 10 2016.[40] J. D. Storey. The positive false discovery rate: A Bayesian interpretation and the q -value. Ann. Statist. , 31(6):2013–2035, 2003.[41] L. Sun and M. Stephens. Solving the empirical Bayes normal means problem with correlatednoise. 2018. Arxiv preprint 1812.07488.[42] W. Sun and T. T. Cai. Oracle and adaptive compound decision rules for false discovery ratecontrol. J. Amer. Statist. Assoc. , 102(479):901–912, 2007.[43] W. Sun and T. T. Cai. Large-scale multiple testing under dependence. J. R. Stat. Soc. Ser.B Stat. Methodol. , 71(2):393–424, 2009.[44] S. van der Pas, B. Szabó, and A. van der Vaart. Uncertainty quantification for the horseshoe(with discussion). Bayesian Anal. , 12(4):1221–1274, 2017.[45] R. W. Zablocki, A. J. Schork, R. A. Levine, O. A. Andreassen, A. M. Dale, and W. K.Thompson. Covariate-modulated local false discovery rate for genome-wide association stud-ies. Bioinformatics , 30(15):2098–2104, 2014., 30(15):2098–2104, 2014.
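As a small illustration of the asymptotic relations defined in the Notation section, the following worked example may help. The sequences $a_n = 2n + \sqrt{n}$ and $b_n = n$ are hypothetical choices made only for this illustration and do not appear elsewhere in the paper.

```latex
% Illustrative sequences (assumed for this example only), for n >= 1:
%   a_n = 2n + \sqrt{n},   b_n = n.
\begin{align*}
  |a_n| = 2n + \sqrt{n} &\le 3n = 3\, b_n
    && \Rightarrow\; a_n \lesssim b_n \ \text{(take } C = 3\text{)}, \\
  b_n = n &\le 2n + \sqrt{n} = a_n
    && \Rightarrow\; a_n \gtrsim b_n \ \text{(take } C = 1\text{)}, \\
  \text{hence}\quad a_n &\asymp b_n, \\
  \frac{a_n}{2 b_n} = 1 + \frac{1}{2\sqrt{n}} &\to 1
    && \Rightarrow\; a_n \sim 2 b_n, \\
  \frac{\sqrt{n}}{n} = n^{-1/2} &\to 0
    && \Rightarrow\; \sqrt{n} \ll n, \ \text{i.e. } \sqrt{n} = o(n).
\end{align*}
```

Note that $a_n \sim b_n$ fails here, since $a_n / b_n \to 2$: the relation $\asymp$ only requires the ratio to stay bounded above and below by constants, whereas $\sim$ requires it to converge to exactly 1.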