Large-scale simultaneous inference under dependence
Jinjin Tian, Xu Chen, Eugene Katsevich, Jelle Goeman, Aaditya Ramdas
Carnegie Mellon University; Leiden University Medical Center; University of Pennsylvania
{jinjint,aramdas}@stat.cmu.edu, {X.Chen.MS, J.J.Goeman}@…, [email protected]
Abstract
Simultaneous, post-hoc inference is desirable in large-scale hypothesis testing as it allows for exploration of data while deciding on criteria for proclaiming discoveries. It was recently proved that all admissible post-hoc inference methods for the number of true discoveries must be based on closed testing. In this paper we investigate tractable and efficient closed testing with local tests of different properties, such as monotonicity, symmetry and separability, meaning that the test thresholds a monotonic or symmetric function, or a function of sums of test scores for the individual hypotheses. This class includes well-known global null tests by Fisher, Stouffer and Rüschendorf, as well as newly proposed ones based on harmonic means and Cauchy combinations. Under monotonicity, we propose a new linear time statistic ("coma") that quantifies the cost of multiplicity adjustments. If the tests are also symmetric and separable, we develop several fast (mostly linear-time) algorithms for post-hoc inference, making closed testing tractable. Paired with recent advances in global null tests based on generalized means, our work immediately instantiates a series of simultaneous inference methods that can handle many complex dependence structures and signal compositions. We provide guidance on choosing from these methods via a theoretical investigation of the conservativeness and sensitivity of different local tests, as well as simulations that find analogous behavior for local tests and full closed testing. One result of independent interest is the following: if $P_1, \ldots, P_d$ are $p$-values from a multivariate Gaussian with arbitrary covariance, then their arithmetic average $\bar{P}$ satisfies $\Pr(\bar{P} \le t) \le t$ for $t \le \tfrac{1}{2d}$.

1 Introduction

In large-scale hypothesis testing problems, choosing the criteria for proclaiming discoveries, or even picking an error metric, can be tricky before researchers look at their data.
A much more flexible approach is to perform simultaneous (and thus post-hoc) inference, which allows the researcher to examine the whole data set and compare data-dependent guarantees on any subsets that they like, before finally rejecting a set of null hypotheses along with the associated guarantee. Simultaneous inference methods are typically designed to control the false discovery proportion (FDP) for all possible choices of selections simultaneously (e.g. [1]), sometimes by controlling the family-wise error rate (FWER). It was recently proved that optimal post-hoc methods must be based on closed testing [2, 3, 4]. One big obstacle that prevents closed testing from being popular in practice is its exponential computation time in the worst case. Further, the complex nature of the closure process makes it hard to theoretically quantify conservativeness and power.

The key to dealing with these obstacles boils down to the building block of closed testing, which is a local test for every subset of the hypotheses, testing for the presence of a signal in at least one of the hypotheses in the subset (in other words, global null testing for each subset of hypotheses). Such a local test is typically designed as a $p$-value combination test, which combines the evidence against the individual hypotheses in the subset into a single test statistic. Formally speaking, consider a set of hypotheses $H_1, \ldots, H_m$, each a collection of probability measures defined on the same space $(\Omega, \mathcal{F})$, where $Q^\star$ is the true (unknown) distribution that generates the data. A hypothesis $H_i$ is true if $Q^\star \in H_i$, and the global null hypothesis is specified by

$$\bigcap_{i=1}^m H_i := \{H_i \text{ is true, for all } i \in [m]\}. \qquad (1)$$

Assume that we construct some test statistic, or score, $T_i$ which captures evidence refuting $H_i$, and satisfying

$$\Pr_{H_i}\{T_i \le C_i(x)\} \le x, \quad \forall x \in [0, 1], \qquad (2)$$

for some corresponding critical value $C_i$. (The scores are high when $H_i$ is true.)
One common choice is $p$-values, where $T_i = P_i$ and $C_i(x) \equiv x$. Then, global null testing can be done in the following way: combine those scores using a function $f$ and find a calibration function $g$ such that

$$\Pr_{\bigcap_{i=1}^m H_i}\{f(T_1, \ldots, T_m) \le g(m, x)\} \le x, \quad \text{for all } x \in [0, 1], \qquad (3)$$

holds under the assumed dependence structure (if any) among the scores. Note that in general $f$ and $g$ can also depend on the scores themselves; however, in this paper we consider specifically the case where $f$ and $g$ are fixed, with $f$ a function of the scores only and $g$ a function of the cardinality of the hypothesis set only. These cases already cover a large proportion of existing global null tests, and simplify the analysis throughout the paper. We call a global null testing method of the form (3) monotonic if $f$ is monotonic in each of its arguments; symmetric if $f$ remains unchanged on permuting its arguments; and separable if $f = \sum_{i=1}^m f_i(T_i)$ for some $f_i$. Monotonicity and symmetry are two rather common features of a global null test, while separability is more distinctive. These three properties define a majority of existing global null tests, including famous examples like Fisher's combination test [5], Stouffer's combination method [6], and Rüschendorf's results [7] about the arithmetic mean of $p$-values, as well as recent advances like the Cauchy [8] and harmonic mean combinations [9]. When the local tests used within closed testing are monotonic, symmetric and separable, fast shortcuts can be derived. Given only monotonicity of local tests, we develop a novel statistic called coma (defined in Section 2.1) to explicitly quantify the price of simultaneity under closed testing, which can be computed in linear time.
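To make the separability notion concrete, here is a minimal sketch (ours, assuming independent uniform $p$-values, i.e. the textbook calibrations rather than the dependence-robust ones studied later) of two classical separable combinations: Fisher's test thresholds $\sum_i -2\log p_i$, and Stouffer's thresholds $\sum_i \Phi^{-1}(1-p_i)$; both are sums of per-hypothesis transforms.

```python
import math
from statistics import NormalDist

def fisher_pvalue(pvals):
    """Fisher's combination: stat = sum(-2*log p_i) is chi^2 with 2m
    degrees of freedom under m independent uniform p-values."""
    m = len(pvals)
    stat = -2.0 * sum(math.log(p) for p in pvals)
    # chi^2 survival function for even df = 2m, via the exact Poisson sum
    half = stat / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(m))

def stouffer_pvalue(pvals):
    """Stouffer's combination: sum of z-scores Phi^{-1}(1 - p_i) is N(0, m)
    under independence; again a sum of transformed scores, hence separable."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals)
    return 1.0 - nd.cdf(z / math.sqrt(len(pvals)))

print(fisher_pvalue([0.01, 0.04, 0.30, 0.50]))
print(stouffer_pvalue([0.01, 0.04, 0.30, 0.50]))
```

Both functions are invariant to permuting their inputs (symmetry), and decreasing any $p_i$ only strengthens the rejection (monotonicity).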
Given both monotonicity and symmetry of local tests, a quadratic-time shortcut for finding simultaneous FDP confidence bounds was developed by Goeman and Solari [10], and later a quadratic-time variant for simultaneous FWER control was presented by Dobriban [11]. We show that both shortcuts can be reduced to linear time (Algorithms 1, 2) if the local tests are also separable. Our findings make simultaneity tractable and affordable in practice, as the associated computation times are less than that of the initial sorting step. These fast shortcuts allow us to apply recent advances that develop separable global null tests, such as the generalized mean based combination methods [12, 13, 14], in large-scale real-world applications. Specifically, we obtain a class of novel methods for simultaneous inference, which we found rich enough to contain powerful solutions that adapt to various dependence assumptions and signal distributions. We further study this adaptivity via a careful quantification of the balance between the conservativeness caused by the need to protect against unknown dependence, and test power. Specifically, we calibrate against the intermediate setting of arbitrary Gaussian correlation (rather than the two extremes, independence and arbitrary dependence), and investigate the asymptotic power under our derived calibration. The theoretical findings with regard to local tests are then empirically confirmed to be preserved after closure.

The paper outline is as follows. In Section 2, we derive linear time algorithms for three kinds of tasks for closed testing using a symmetric, monotonic and separable local test: (1) simultaneity assessment (e.g. computing the cost of simultaneity for a single subset of hypotheses chosen pre-hoc or post-hoc), (2) simultaneous inference (e.g. type-I error bounds and FDP error bounds for a single subset of hypotheses), and (3) automatic post-hoc selection (e.g.
selection of the largest set of hypotheses with a predefined error level for its post-hoc FDP bound). Then we focus on the multivariate Gaussian setting to formally evaluate a class of local tests satisfying our requirements based on generalized means. Specifically, in Section 3.2, we derive the asymptotically valid calibrated threshold for positively equicorrelated Gaussians, which allows us to calculate the price paid to protect against different levels of dependence using different combination choices. Then we calculate closed-form asymptotic power expressions under different signal settings in Section 3.3, and reason about the sweet spot for different combination methods. Finally, we confirm that our qualitative conclusions about local tests are preserved after closure, using simulations in Section 4. A conclusion including takeaways for practitioners and future directions is provided in Section 5.

Recall that we are interested in testing hypotheses $H_1, \ldots, H_m$, each represented by a collection of probability measures on the measurable space $(\Omega, \mathcal{F})$, where $Q^\star$ is the true (unknown) measure that generates the data. We call a hypothesis $H_i$ null if $Q^\star \in H_i$, and non-null otherwise. We denote by $H_S := \bigcap_{i \in S} H_i$ the intersection hypothesis corresponding to index set $S$, which is null if and only if $H_i$ is null for all $i \in S$. In particular, we let $H_\emptyset$ equal the set of all probability measures on $(\Omega, \mathcal{F})$, so the null hypothesis $H_\emptyset$ is always true. Let $\mathcal{H} := \{i : Q^\star \in H_i\}$ denote the (unknown) set of null hypotheses that are true.

The non-null hypotheses are usually of more interest, often serving as an important reference for variable selection and scientific discovery. Therefore we often call the non-null hypotheses signals. A common goal is to identify a large set of hypotheses which contains mostly signals.
In other words, we wish to proclaim a set of "discoveries" while controlling the number or fraction of false discoveries (i.e. the null hypotheses that were incorrectly proclaimed as discoveries).

For a set $S \subseteq [m]$ indexing the hypotheses, define its (unknown) numbers of false and true discoveries as

$$\epsilon(S) := |S \cap \mathcal{H}|, \qquad \delta(S) := |S \setminus \mathcal{H}|, \qquad (4)$$

respectively. We wish to find $t_\alpha(S) \in \{0, 1\}$ and $e_\alpha(S) \in \{0, 1, \ldots, |S|\}$ such that:

Type-I error control: $\Pr\{\mathbf{1}\{\delta(S) \ne 0\} \ge t_\alpha(S)\} \ge 1 - \alpha$, (5)

False Discovery Proportion (FDP) control: $\Pr\{\epsilon(S) \le e_\alpha(S)\} \ge 1 - \alpha$, (6)

where $t_\alpha(S)$ indicates whether we reject $H_S$ or not, and $e_\alpha(S)$ provides an upper bound on the number of non-signals in $S$. Specifically, Type-I error control guarantees that with high probability, $S$ is not rejected if it contains only nulls; while FDP control guarantees that with high probability, the number of false discoveries in the set $S$ is upper bounded. Naturally, we prefer $t_\alpha(S)$ to be one if possible, and $e_\alpha(S)$ to be as small as possible. The slightly odd formalism in (5) is simply to draw parallels with the definitions that follow.

To freely examine several arbitrary sets $S$ and then select a set, we need extra corrections to ensure post-hoc validity of error guarantees. In other words, we would need to convert the above high probability guarantees for an individual set $S$ into one for all possible sets simultaneously. Formally, we desire

Simultaneous Type-I error control: $\Pr\{\mathbf{1}\{\delta(S) \ne 0\} \ge \bar{t}_\alpha(S) \text{ for all } S \subseteq [m]\} \ge 1 - \alpha$, (7)

for some $\bar{t}_\alpha(S) \in \{0, 1\}$ as before, and we would like to design an $\bar{e}_\alpha(S) \in \{0, 1, \ldots, |S|\}$ such that

Simultaneous FDP control: $\Pr\{\epsilon(S) \le \bar{e}_\alpha(S) \text{ for all } S \subseteq [m]\} \ge 1 - \alpha$. (8)

Closed form expressions for $\bar{t}(\cdot)$ and $\bar{e}(\cdot)$ can be derived in special cases [?], but only bounds based on closed testing can be admissible [15]. Closed testing was initially proposed by Marcus et al.
[2], who suggested using

$$\bar{t}_\alpha(S) = \mathbf{1}\{t_\alpha(J) = 1 \text{ for all } J \supseteq S\}. \qquad (9)$$

It was later noticed by Goeman and Solari [10] that the same procedure also yields an expression for $\bar{e}_\alpha(S)$:

$$\bar{e}_\alpha(S) = \max\{|I| : I \subseteq S,\ \bar{t}_\alpha(I) \ne 1\}, \qquad (10)$$

which is the size of the largest subset of $S$ that is not rejected by closed testing. In this closed testing framework, $t_\alpha$ defined in (5) is also called a local test, which is just a valid $\alpha$-level test of the composite hypothesis $H_S$, while $\bar{t}_\alpha$ is the corresponding post-hoc version. We denote the set of composite hypotheses rejected locally (before closure) as $U_\alpha$, and after closure as $X_\alpha$, that is,

$$U_\alpha = \{S \subseteq [m] : t_\alpha(S) = 1\}, \quad \text{and} \quad X_\alpha = \{S \subseteq [m] : \bar{t}_\alpha(S) = 1\}. \qquad (11)$$

One may be concerned that simultaneity has a large statistical cost (paid in power) in practice. To address this concern, we propose a novel statistic called coma in Section 2.1 that quantifies the cost of multiplicity adjustments. The statistic is invariant to the testing level $\alpha$, and costs only linear time to compute as long as the local tests are monotonic. Another practical concern with regard to imposing simultaneity is the heavy computation time, which is exponential in $m$ in general. However, for certain local tests $t_\alpha$, one can design efficient shortcuts for computation. In Section 2.2, we present linear time shortcuts for calculating both $\bar{t}_\alpha$ and $\bar{e}_\alpha$, for local tests that are monotonic, symmetric and separable. Fast evaluation of a single set $S$ with simultaneous guarantees in turn enables effective post-hoc selection among multiple sets of interest: we also develop linear and quadratic shortcuts for automatic selection of the largest set $S$ with a prespecified bound $\bar{e}_\alpha$.

Before we proceed, we introduce a special class of local tests based on generalized means as discussed by Vovk and Wang [12], since we will repeatedly use them as motivating examples. Consider the following combinations of $p$-values
$p_1, \ldots, p_m$, indexed by $r \in [-\infty, \infty]$:

$$M_r(p_1, \ldots, p_m) := \begin{cases} \max_{i \in [m]} p_i, & \text{if } r = \infty; \\ \left(\prod_{i=1}^m p_i\right)^{1/m}, & \text{if } r = 0; \\ \left(\frac{1}{m}\sum_{i=1}^m p_i^r\right)^{1/r}, & \text{if } r \in (-\infty, 0) \cup (0, \infty); \\ \min_{i \in [m]} p_i, & \text{if } r = -\infty, \end{cases} \qquad (12)$$

which corresponds to the arithmetic mean when $r = 1$, the geometric mean when $r = 0$, and the harmonic mean when $r = -1$. For simplicity, we use $M_{r,m}$ to stand for $M_r(p_1, \ldots, p_m)$ throughout the paper. Denote

$$t^{(r)}_\alpha(S) := \mathbf{1}\{M_r((p_i)_{i \in S}) \le c_r(|S|, \alpha)\}, \qquad (13)$$

where $c_r(|S|, \alpha)$ is a critical value depending only on $|S|$ and $\alpha$ for each $r$. Then $t^{(r)}_\alpha(S)$ is a valid local test, and the corresponding class

$$\mathcal{T}_\alpha := \{t^{(r)}_\alpha : r \in [-\infty, \infty]\} \qquad (14)$$

is rich enough to contain many famous local test choices like the Bonferroni ($r = -\infty$) method, Fisher's combination ($r = 0$), and the recent harmonic mean combination method ($r = -1$); its members also have a simple enough structure that we can summarize their nature with a single univariate parameter $r$.

We now quantify the cost of post-hoc inference via closed testing by developing a statistic called coma, which stands for the
COst of Multiplicity Adjustment arising from requiring valid post-hoc inference. This statistic is invariant to $\alpha$ by design, and costs only linear time to compute with monotonic local tests.

To construct coma so that it is invariant to the test level $\alpha$, we intentionally use the adjusted $p$-value, which is defined as the smallest $\alpha$ under which the test would be rejected. Formally, for a set $S$ among a series of hypotheses $H_1, \ldots, H_m$, denote the adjusted $p$-value based on $S$ using local testing rule $t_\alpha$ as

$$p(S) := \inf\{\alpha \in [0, 1] : t_\alpha(S) = 1\},$$

and the adjusted $p$-value for $S$ after going through the closed testing procedure as

$$\bar{p}(S) := \inf\{\alpha \in [0, 1] : \bar{t}_\alpha(S) = 1\}. \qquad (15)$$

Then coma is defined as follows.

Definition 1 (cost of multiplicity adjustment). For any $S \subseteq [m]$, define

$$\text{coma}(S) := \bar{p}(S) / p(S) \qquad (16)$$

as the cost of multiplicity adjustment when testing $H_S$.

Note that coma($S$) is a data-dependent quantity which depends on the choice of local test. As a quick example, coma($S$) $= m/|S|$ if $t_\alpha$ is Bonferroni. This example concurs with the intuition that the cost of multiplicity grows with the total dimension $m$, but decreases with the subset dimension $|S|$. The following result presents a more general expression for coma.

Theorem 1.
For any $S \subseteq [m]$, we have

$$\bar{p}(S) = \max\Big\{\sup_{J \supset S} p(J),\ p(S)\Big\}. \qquad (17)$$

In particular, if the local test is monotonic, we have the linear time expression

$$\bar{p}(S) = \max\{p(J_i) : 1 \le i \le |S^c|,\ J_i = [m] \setminus I_i\}, \qquad (18)$$

where $I_i$ is the set of indices of the hypotheses associated with the $i$ smallest scores in $S^c$.

Theorem 1 is proved in Appendix A. In the following, we examine how coma varies with the size of $S$ for local tests based on the generalized means $\mathcal{T}_\alpha$ in (14), using the calibration derived by Vovk and Wang [12] under arbitrary dependence. Figure 1 plots coma versus the choice of local test for a set $S$ of size 20 out of $m = 200$ hypotheses in total, using equicorrelated Gaussian data. We can see that coma with local test $t^{(r)}_\alpha$ with positive $r$ is generally smaller than that with negative $r$, while the order statistics based procedures, Simes and Bonferroni, behave similarly. This indicates that one would prefer to use $t^{(r)}_\alpha$ with positive $r$ if one does not want too different results on changing from pre-hoc to post-hoc. On the other hand, Figure 2 plots coma versus the size of the target set $S$ (with the total number of hypotheses remaining $m = 200$). Besides the consistent observation that positive $r$ has lower coma, we can additionally see that coma($S$) generally decreases with the size of $S$, which agrees with our intuition that lower resolution post-hoc inference should cost less.

Figure 1: coma($S$) versus different local test procedures under different extents of dependence. We simulate the data to follow an equicorrelated Gaussian, where we set the total number of hypotheses $m = 200$ and the size of the set $S$ as 20. We set the signal proportion outside $S$ as …, and the signal proportion inside $S$ as …, with signal strength (i.e. the mean of the Gaussian) $\mu = 2$. The results are averaged over … trials.

Figure 2: coma($S$) versus the size of $S$ using different local test procedures under different extents of dependence. The data are simulated as in Figure 1, with the total number of hypotheses $m = 200$ and signal strength $\mu = 2$; the results are averaged over … trials.

As mentioned before, special designs of local tests in closed testing can lead to shortcuts with much lower computation time for evaluating either $\bar{t}_\alpha$ (9) or $\bar{e}_\alpha$ (10). Below we present a linear time shortcut for evaluating $\bar{t}_\alpha$ whenever the local test $t_\alpha$ is monotonic, as well as for evaluating $\bar{e}_\alpha$ whenever the local tests are additionally symmetric and separable. We start by defining these terms, which are in fact all fairly common and reasonable designs of local tests. Recall that the local test is an indicator function of whether to reject $S$ or not:

$$t_\alpha(S) : \mathbb{R}^{|S|} \to \{0, 1\}. \qquad (19)$$

Similar to Dobriban [11], we treat the hypotheses identically. Then it makes sense to use the same local testing rule for each subset of a fixed size $s$, that is, $t_\alpha$ satisfying the following global symmetry condition.

Condition 1. (Global symmetry)
A local test has global symmetry if it has the form

$$t_\alpha(S) = \mathbf{1}\{f(|S|; \{T_i\}_{i \in S}) \le \alpha\}, \qquad (20)$$

for some function $f$ that only depends on the size of $S$ and the list of scores, but not on the actual indices in $S$.

There are two other commonly satisfied conditions, symmetry and monotonicity, which make the computation manageable: quadratic time shortcuts for simultaneous FDP inference and FWER control have been developed by Goeman and Solari [10] and Dobriban [11], respectively, under these two conditions.
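Before stating these conditions formally, it is worth seeing what the shortcuts are accelerating: the definitions (9) and (10) can be evaluated directly by enumerating all supersets and subsets, at exponential cost in $m$. The sketch below (ours, using a Bonferroni rule as an illustrative local test; the function names are hypothetical) does exactly that for small $m$.

```python
from itertools import combinations

def bonferroni_local(pvals, J, alpha):
    """An illustrative alpha-level local test for H_J (Bonferroni):
    reject if the smallest p-value in J clears alpha/|J|."""
    return min(pvals[i] for i in J) <= alpha / len(J)

def closed_test(pvals, S, alpha):
    """Brute-force closure, exponential in m: t_bar(S) via (9) and
    e_bar(S) via (10), with Bonferroni as the (hypothetical) local test."""
    m, S = len(pvals), frozenset(S)

    def t_bar(A):
        if not A:
            return False  # H_empty is always true, hence never rejected
        rest = [i for i in range(m) if i not in A]
        # (9): reject A after closure iff every superset J of A is locally rejected
        return all(bonferroni_local(pvals, A | set(extra), alpha)
                   for k in range(len(rest) + 1)
                   for extra in combinations(rest, k))

    # (10): size of the largest subset of S not rejected by closed testing
    e_bar = max(k for k in range(len(S) + 1)
                if k == 0 or any(not t_bar(frozenset(I))
                                 for I in combinations(S, k)))
    return t_bar(S), e_bar

# (True, 1): H_{0,2} is rejected, with at most 1 false discovery in {0, 2}
print(closed_test([0.001, 0.002, 0.8, 0.9], {0, 2}, alpha=0.05))
```

The shortcuts below replace this double enumeration with linear-time scans over sorted scores.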
Condition 2. (Monotonicity)
A local test of the form (20) is called monotonic if for any $s \ge 1$ and any two sets of scores $(T_1, \ldots, T_s)$ and $(T'_1, \ldots, T'_s)$ with $T_i \le T'_i$ for all $i = 1, \ldots, s$, we have

$$f(s; T_1, \ldots, T_s) \ge f(s; T'_1, \ldots, T'_s). \qquad (21)$$

Condition 3. (Symmetry)
A local test $t_\alpha$ of the form (20) is called symmetric if for any $s \ge 1$, any set of scores $(T_1, \ldots, T_s)$, and any permutation $(i_1, \ldots, i_s)$ of $(1, \ldots, s)$, we have

$$f(s; T_1, \ldots, T_s) = f(s; T_{i_1}, \ldots, T_{i_s}). \qquad (22)$$

Another condition which we find can further reduce the computation time is separability. This condition has been relatively under-emphasised in the past closed testing literature; however, it has a long history in the global null testing literature, with increased recent interest [12, 9, 14].

Condition 4. (Separability)
A local test of the form (20) is called separable if for any $s \ge 1$ and any set of scores $(T_1, \ldots, T_s)$, there exists a series of transformation functions $\{h_k\}_{k=1}^s$ on $\mathbb{R}$ and a function $g$ such that

$$f(s; T_1, \ldots, T_s) - \alpha = \sum_{k=1}^s h_k(T_k) - g(s, \alpha). \qquad (23)$$

Monotonicity is a reasonable requirement for a local test. Several well-known global null tests are monotonic, for example Fisher's test, Simes' test and the Bonferroni test, as well as the tests based on generalized means. As for symmetry and separability, recall the class of local tests $\mathcal{T}_\alpha$ defined in (14). It is easy to check that each of its elements $t^{(r)}_\alpha$ is monotonic and symmetric for all $r$, and separable iff $r \ne \pm\infty$.

Remark. A local test $t_\alpha$ with both symmetry and separability must admit the following form:

$$t_\alpha(S) = \mathbf{1}\Big\{\sum_{k=1}^{|S|} h(T_{i_k}) \le g(|S|, \alpha)\Big\}, \qquad (24)$$

that is, the transformation functions in the summation are the same for each hypothesis.

Now we are ready to present our linear time shortcuts to evaluate either $\bar{t}_\alpha$ (9) or $\bar{e}_\alpha$ (10). Note that we sort the scores in ascending order in Algorithm 1 in order to have easier tracking of indices, since the algorithm is a step-down procedure. The proof of Theorem 2 is in Appendix B.

Theorem 2.
Consider testing $m$ hypotheses post-hoc via closed testing using a monotonic local test $t_\alpha$.

(a) For any set $S \subseteq [m]$, let $J^\star_i$ be the set of indices associated with the $i$ largest scores in $S^c$. Then,

$$\bar{t}_\alpha(S) = \prod_{0 \le i \le |S^c|} t_\alpha(I_i), \quad \text{where } I_i = S \cup J^\star_i, \qquad (25)$$

which requires at most $O(m)$ computation.

(b) Additionally, if the local test is also symmetric and separable, then Algorithm 1 returns $\bar{e}_\alpha(S)$ in (10) with at most $O(m)$ computation.

Remark. Note that the local test $t^{(r)}_\alpha$ is not separable when $r = \pm\infty$, so the shortcut in Theorem 2 for evaluating the corresponding $\bar{e}^{(r)}_\alpha$ is not applicable. However, these tests lead to consonant closed testing, as proved by Lemma 3 in Appendix C, and one interesting fact pointed out by Goeman and Solari [10] is that for consonant closed testing, the simultaneous FDP bound for a given set reduces to counting the number of its elementary hypotheses that closed testing cannot reject, therefore reducing to identifying the set of elementary hypotheses rejected after closure. For $r = -\infty$, this is just Holm's method, while for $r = \infty$, this is just checking whether we can reject the largest $p$-value, to decide whether to reject everything or nothing.

We have presented procedures for fast inference on a single set $S$ picked freely by users. For users who have no idea which candidate set to evaluate, Theorem 3 provides shortcuts for automatic selection among a sequence of incremental sets: finding the largest one among them with FDP bounded by $\gamma \in [0, 1]$.

Theorem 3.
Consider testing $m$ hypotheses post-hoc via closed testing at level $\alpha$, and a series of incremental candidate sets to reject: $S_1 \subset S_2 \subset \cdots \subset S_n \subseteq [m]$ with $|S_i| = i$ for all $i \in [n]$. Algorithm 3 returns the largest set $S_k$ such that $\bar{e}_\alpha(S_k) \le \gamma |S_k|$, given any desired FDP bound $\gamma \in [0, 1]$. If we additionally require the local test to be monotonic, symmetric and separable, then

(a) Algorithm 3 costs at most $O(m^2)$ computation;

(b) Algorithm 3 reduces to Algorithm 2 if $\gamma \equiv 0$ and $S_k$ is the set of indices of the hypotheses with the $k$ smallest scores, which costs at most $O(m)$ computation.

The validity of Algorithm 3 in Theorem 3 does not require any assumption on the local test or presorting of $p$-values, and it needs $m$ iterations in the worst case. In practice we expect fewer iterations to be needed, as the false discoveries are ruled out in batches quickly. In particular, for the special case stated in part (b) of Theorem 3, the task costs at most linear time. The proof of Theorem 3 is in Appendix E.

Remark. Algorithm 2 in Theorem 3 is in fact just the shortcut for finding the largest hypothesis set to reject with strong FWER control among all $m$ hypotheses.

(A closed testing procedure is consonant if the local tests for every composite hypothesis $S \subseteq [m]$ are chosen in such a way that rejection of $S$ after closure implies rejection of at least one of its elementary hypotheses after closure.)

Algorithm 1:
Shortcut for evaluating the post-hoc false discovery bound $\bar{e}_\alpha(S)$

Input:
A sequence of sorted scores $T_1, \ldots, T_m$ which satisfies $T_1 \ge \cdots \ge T_m$; a local test rule of the form (24) with a monotonically increasing transformation function $h$ and thresholding function $g$; confidence level $\alpha$; candidate rejection set $S = \{i_1, i_2, \ldots, i_s\}$ and its complement $S^c = \{j_1, j_2, \ldots, j_{m-s}\}$ with $i_1 < i_2 < \cdots < i_s$ and $j_1 < j_2 < \cdots < j_{m-s}$.

Output:
High probability $(1 - \alpha)$ simultaneous bound $\bar{e}_\alpha(S)$ on the number of false discoveries in $S$.

Initialization: transformed candidate-set scores $u_1, \ldots, u_s$, where $u_d = h(T_{i_d})$ for $1 \le d \le s$; transformed complementary-set scores $v_1, \ldots, v_{m-s}$, where $v_d = h(T_{j_d})$ for $1 \le d \le m - s$; boundary transformed scores $v_0 = \max(u_1, v_1)$ and $u_{s+1} = \min(u_s, v_{m-s}) - 1$; iteration indices $k \leftarrow 1$, $a \leftarrow 1$, $b \leftarrow 0$; accumulated score $Q = 0$.

while $a - k - b \le m - s$ and $k + b \le s$ do
    if $u_{k+b+1} \ge v_{a-k-b}$ or $a = 1$ then
        $Q \leftarrow Q + u_{k+b+1}$; $b \leftarrow b + 1$
    else
        $Q \leftarrow Q + v_{a-k-b}$
    end
    while $k \le \min(s, a)$ and $Q > g(a, \alpha)$ do
        if $b > 0$ then
            $b \leftarrow b - 1$
        else
            $Q \leftarrow Q + u_{k+1} - v_{a-k}$
        end
        $k \leftarrow k + 1$
    end
    $a \leftarrow a + 1$
end
return $k - 1$

The performance of closed testing based post-hoc inference is largely determined by the properties of its building blocks, the local tests. Therefore, in order to provide better guidance for applying the newly derived shortcuts introduced in Section 2, we look into the properties of different global null tests, particularly the generalized mean based ones (i.e. $t^{(r)}_\alpha$ defined in (13)), since our shortcuts apply to these.

Vovk and Wang [12] first summarized the class of generalized mean based combination methods, and derived closed-form calibration under arbitrary dependence for different combination choices, using results based on robust risk aggregation. Before we summarize their results, we would like to point out that the case $m = 2$ is curiously not given for the harmonic mean, i.e. $M_{-1,2}$. Lemma 1 addresses this missing piece, with proof located in Appendix F.

Lemma 1.
For any pair of valid $p$-values $p_1, p_2$, we have

$$\Pr_{H_1 \cap H_2}\big\{M_{-1,2}(p_1, p_2) \le \alpha/2\big\} \le \alpha, \qquad (26)$$

where $M_{-1}$ represents the harmonic mean function (12). In particular, if $p_1$ and $p_2$ are marginally standard uniform, then a copula exists such that the equality is achieved.

Now we summarize the results for calibration under arbitrary dependence [12], together with our newly derived complement Lemma 1, in the following Lemma 2, as it will be our benchmark to compare with.
Lemma 2. (Vovk and Wang [12]) For $M_{r,m}$ defined in (12), $\alpha_{r,m} M_{r,m}$ is a valid $p$-value, where

$$\alpha_{r,m} := \begin{cases} (r+1)^{1/r}, & \text{if } r \in (-1, \infty]; \\ \big((y_m + 1)/(y_m + m)\big)\,\mathbf{1}\{m \ge 3\} + m\,\mathbf{1}\{m \le 2\}, & \text{if } r = -1; \\ \frac{r}{r+1}\, m^{1 + 1/r}, & \text{if } r \in [-\infty, -1), \end{cases} \qquad (27)$$

and $y_m$ is the unique strictly positive solution of $y = m((y+1)\log(y+1) - y)$. In particular, for $r \in \{-\infty, 0, \infty\}$, we define $\alpha_{r,m}$ via the limits $\lim_{r \to \infty} (r+1)^{1/r} = 1$, $\lim_{r \to 0} (r+1)^{1/r} = e$, and $\lim_{r \to -\infty} \frac{r}{r+1} m^{1+1/r} = m$.

Follow-up work [13, 14] explored the conservativeness of such calibration under some special dependence structures: Wilson [9] derived asymptotically valid (as $m \to \infty$) calibration under independence using the generalized central limit theorem, and empirically studied performance when the independence assumption is broken; Chen et al. [14] compared generalized mean based combinations with order statistics based combinations, and proved that only the Cauchy combination (and its analog, the harmonic mean) and the Simes combination pay no price in calibration to achieve validity under assumptions ranging from independence to full dependence (i.e. correlation one). Figure 3 summarizes all the cases (including ours) where theoretically valid calibrations have been derived. Note that, before our work, almost no results had been derived in cases other than the two extremes (the independence case and the arbitrary dependence case): Chen et al. [14] provided some theoretical justification in the pairwise Gaussian scenario, but only for the harmonic mean. As for common intermediate dependence structures like the multivariate Gaussian case, most work has only explored them experimentally. Therefore, as shown in Figure 3, we work towards filling in this gap, by deriving calibration under one of the intermediate cases, the equicorrelated Gaussian setting, which contains both extremes as well as intermediate dependence levels. Later, we also investigate the performance of our calibration, by analyzing the asymptotic type-I error and power under different settings.
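To make Lemma 2 operational, here is a small hypothetical transcription (ours) of its closed-form cases, paired with a direct implementation of (12). We skip $r = -1$, whose constant requires numerically solving for $y_m$, and we read the $r < -1$ multiplier as $\frac{r}{r+1} m^{1+1/r}$, which is consistent with the stated limit of $m$ as $r \to -\infty$.

```python
import math

def gen_mean(pvals, r):
    """Generalized mean M_r of p-values, as in (12)."""
    m = len(pvals)
    if r == math.inf:
        return max(pvals)
    if r == -math.inf:
        return min(pvals)
    if r == 0:
        return math.exp(sum(math.log(p) for p in pvals) / m)  # geometric mean
    return (sum(p ** r for p in pvals) / m) ** (1.0 / r)

def vw_multiplier(r, m):
    """Multiplier a_{r,m} from (27), such that a_{r,m} * M_{r,m} is a valid
    p-value under arbitrary dependence (closed-form cases only; r = -1
    would need a numerical root-find for y_m and is omitted here)."""
    if r == math.inf:
        return 1.0
    if r == 0:
        return math.e
    if r == -math.inf:
        return float(m)  # Bonferroni: m * min(p)
    if r > -1:
        return (r + 1) ** (1.0 / r)
    if r < -1:
        return (r / (r + 1)) * m ** (1.0 + 1.0 / r)
    raise ValueError("r = -1 requires solving for y_m numerically")

p = [0.02, 0.10, 0.40, 0.80]
for r in (1, 0, -2, -math.inf):
    print(r, min(1.0, vw_multiplier(r, len(p)) * gen_mean(p, r)))
```

For example, $r = 1$ gives the multiplier $2$ (twice the arithmetic mean is a valid $p$-value), and $r = -\infty$ recovers Bonferroni.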
In particular, our calibration recovers existing work in scenarios where independence provably has the highest inflated type-I error to be calibrated among the alternatives, while our theoretical performance investigation justifies the interesting behaviours noticed in earlier experimental studies [9, 14]: for the generalized mean based methods, a choice of positive $r$ performs better under heavy dependence, and calibrating it under independence gives a high false positive rate overall; while a choice of negative $r$ performs badly under heavy dependence, and calibrating it under independence gives a low false positive rate overall.

Before we present the main results, we first motivate our focus on the equicorrelated Gaussian model. Consider a Gaussian sequence model for the observations:

$$(X_{m1}, X_{m2}, \ldots, X_{mm}) \sim N_m(\mu_m, \Sigma_m), \qquad (28)$$

where $\mu_m = (\mu_{m1}, \ldots, \mu_{mm})$, and each entry $\mu_{mi} \overset{iid}{\sim} \mu_m B_m$, with $\mu_m > 0$ a scalar and $B_m$ a Bernoulli random variable with parameter $\pi_m$. Additionally, we assume $\Sigma_m \in \mathcal{M}_m$, where $\mathcal{M}_m$ is the set of all $m \times m$ correlation matrices with off-diagonal elements in $[-\tfrac{1}{m-1}, 1]$. In particular, we denote the set of all equicorrelation matrices as $\mathcal{M}^E_m$, which is the subset of $\mathcal{M}_m$ with all equal off-diagonal elements.

Suppose we are testing the global null hypothesis $\bigcap_{i=1}^m H_{mi} = \{\mu_{mi} = 0, \forall i\}$ at level $\alpha$. We consider a one-sided $p$-value $p_{mi} = \Phi(-X_{mi})$ for each elementary hypothesis (where $\Phi$ is the CDF of a standard normal), and combine them using a generalized mean $t^{(r)}_\alpha$ (13).
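The equicorrelated instance of this model admits a cheap common-factor simulation, $X_i = \sqrt{\rho}\, Z_0 + \sqrt{1-\rho}\, Z_i$, which gives the right marginals and pairwise correlation $\rho$. The Monte Carlo sketch below (ours; the function names are hypothetical, and the threshold $c = \alpha/2$ is the arbitrary-dependence calibration for $r = 1$ from Lemma 2) estimates the null rejection probability of a generalized-mean test.

```python
import math
import random
from statistics import NormalDist

def equicorrelated_pvals(m, rho, rng):
    """One draw from the global null with equicorrelated Sigma:
    X_i = sqrt(rho)*Z0 + sqrt(1-rho)*Z_i, then p_i = Phi(-X_i)."""
    nd = NormalDist()
    z0 = rng.gauss(0.0, 1.0)
    return [nd.cdf(-(math.sqrt(rho) * z0
                     + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)))
            for _ in range(m)]

def null_rejection_rate(m, rho, r, c, n_sims=2000, seed=0):
    """Monte Carlo estimate of the probability, under the global null,
    that the generalized mean M_r of the p-values falls below c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p = equicorrelated_pvals(m, rho, rng)
        if (sum(q ** r for q in p) / m) ** (1.0 / r) <= c:
            hits += 1
    return hits / n_sims

# Arithmetic mean (r = 1) with the arbitrary-dependence threshold c = alpha/2:
print(null_rejection_rate(m=50, rho=0.5, r=1, c=0.025))
```

Varying $\rho$ and $r$ in such a simulation is one way to see empirically how conservative the arbitrary-dependence calibration is at intermediate correlations.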
Denote the corresponding type-I error given the correlation matrix $\Sigma$ with respect to different $r$ as follows:

$$\widetilde{\alpha}_m(\Sigma, r, c) := \Pr_{\bigcap_{i=1}^m H_{mi}}\left( \Big(\frac{1}{m}\sum_{i=1}^m p_{mi}^r\Big)^{1/r} \le c \right), \qquad (29)$$

where $c$ is a correction/calibration threshold to account for dependence, which could be an absolute constant or potentially depend on $r$, $m$ and $\alpha$.

[Figure 3 here: a grid of dependence regimes (independence, equicorrelated Gaussian, arbitrary dependence) against ranges of $r$, marking for each cell whether the calibration is closed-form or numerical, and attributing it to Fisher, Wilson, Chen et al., Vovk et al., or this work.]

Figure 3: Summary of regimes in which we know how to calibrate generalized means of p-values. We omit explicit expressions as there is sometimes no analytical formula, but thresholds can be calculated numerically. For explicit expressions, we refer readers to the corresponding references mentioned in the text.

Remark. From the monotonicity of the generalized mean with respect to $r$, one can easily verify by contradiction that $\widetilde{\alpha}_m(\Sigma, r, c) \le \alpha$ implies $c \le \alpha$.

Proposition 1.
Fix any $m \ge 1$ and any $r \ge 1$. Provided $c$ is sufficiently small (of order $1/m$), we have
$$\sup_{\Sigma \in \mathcal{M}_m} \widetilde{\alpha}_m(\Sigma, r, c) \;=\; \sup_{\Sigma \in \mathcal{M}_m^E} \widetilde{\alpha}_m(\Sigma, r, c) \;=\; \widetilde{\alpha}_m(\mathbf{1}_m \mathbf{1}_m^T, r, c), \quad (30)$$
where $\mathbf{1}_m$ is the $m$-dimensional vector of all ones.

Proposition 1 indicates that, for all $r \ge 1$ and appropriately small $\alpha$, we only need to calibrate against the fully dependent case to have validity across the whole correlation space $\mathcal{M}_m$. The proof of Proposition 1 is in Appendix G, where we use the convexity of the function $\Phi(-x)^r$ for $r \ge 1$ and $x > 0$, and the fact that (multivariate) Gaussianity is preserved under linear transformations. It is unclear whether the restriction on $\alpha$ can be entirely removed, but it could perhaps be slightly relaxed by constant factors. A particularly interesting corollary is obtained when $r = 1$, summarized below.

Corollary 1.
Let $\Sigma \in \mathcal{M}_m$ be an arbitrary positive definite Gaussian correlation matrix (with possibly negative entries). Let $X \sim N(0, \Sigma)$ and let $P_i = \Phi(-X_i)$ for $i = 1, \ldots, m$. Then the arithmetic average ($r = 1$) of the $p$-values, $\bar{P} := \frac{1}{m} \sum_{i=1}^m P_i$, satisfies
$$\sup_{\Sigma \in \mathcal{M}_m} \Pr(\bar{P} \le \alpha) \le \alpha$$
for any $\alpha$ below a threshold of order $1/m$.

It is possible that a variant of Proposition 1 also holds for $r < 1$, but we have currently found it technically intractable to prove. Nevertheless, for the sake of simplicity and interpretability, we next consider the intermediate case of equicorrelated Gaussians, which was observed in extensive simulations to be worse than the other commonly used correlation structures when $r < 1$, while also containing the perfectly correlated case of Proposition 1. In particular, we consider only positive correlation, since the positive semi-definite requirement on the correlation matrix forces the range of negative $\rho$ to be in $(-\tfrac{1}{m-1}, 0]$, which vanishes as $m \to \infty$.

Definition 2 (Positively equicorrelated Gaussian). For each $m$, the observations $X_{m1}, \ldots, X_{mm}$ follow the model in (28), but with a positively equicorrelated $\Sigma_m$ having $\rho_{ij} \equiv \rho \in [0, 1]$ for all $i \ne j \in [m]$.

Formally, in the following Section 3.2 and Section 3.3, we consider the model defined in Definition 2, and we write $\widetilde{\alpha}_m(\Sigma, r, c)$ in (29) as $\widetilde{\alpha}_m(\rho, r, c)$ for simplicity. We study the asymptotic ($m \to \infty$) behaviour of the calibrated $\widetilde{\alpha}_m(\rho, r, c)$ for fixed $\alpha$, and investigate how the power varies as a function of the correlation $\rho$ for different choices of $r$ and different signal settings.

In this subsection, we derive the asymptotic calibration under the positively equicorrelated Gaussian model in Definition 2.
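Corollary 1 is easy to probe numerically. The sketch below is our illustration (not the paper's code): it uses the standard one-factor decomposition of an equicorrelated Gaussian, formalized in (35) below, and the "order $1/m$" threshold is an assumption standing in for the corollary's exact bound:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
# Phi(-t) for the standard normal, vectorized via the complementary error function.
Phi_minus = np.vectorize(lambda t: 0.5 * erfc(t / sqrt(2)))

def mean_pvalue_level(rho, m=20, alpha=None, trials=20_000):
    """Estimate Pr(mean of the m p-values <= alpha) under the equicorrelated
    global null, using X_i = sqrt(rho)*Z + sqrt(1-rho)*Z_i."""
    if alpha is None:
        alpha = 1.0 / (2 * m)  # assumed stand-in for the order-1/m regime
    z = rng.standard_normal(trials)                 # shared factor Z
    zi = rng.standard_normal((trials, m))           # idiosyncratic noise
    pbar = Phi_minus(np.sqrt(rho) * z[:, None] + np.sqrt(1 - rho) * zi).mean(axis=1)
    return float((pbar <= alpha).mean()), alpha
```

For $\rho = 1$ the average is a single uniform p-value, so the estimated level should sit near $\alpha$; for $\rho = 0$ the average of many uniforms concentrates near $1/2$, so the estimate falls far below $\alpha$.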
First, we formally define the asymptotic calibration of concern, and then present our closed-form solution under the positively equicorrelated Gaussian model. Typically, the asymptotic ($m \to \infty$) type-I error would be defined as
$$A^\star(r) := \limsup_{m \to \infty} \sup_{\rho \in [0,1]} \widetilde{\alpha}_m(\rho, r, c). \quad (31)$$
However, we found (31) to be intractable; specifically, before taking the outer limit, taking the supremum with respect to $\rho$ for fixed $m$ is analytically infeasible under the positively equicorrelated Gaussian model. Therefore, we settle for an alternative (weaker) definition of the target type-I error as the following surrogate limit:
$$A(r) := \sup_{\rho \in [0,1]} \limsup_{m \to \infty} \widetilde{\alpha}_m(\rho, r, c). \quad (32)$$
Note that $A^\star(r) \ge A(r)$ deterministically; that is, control over the surrogate asymptotic type-I error is weaker. Denote by $c_r(m, \alpha)$ the highest calibrated threshold $c$ that achieves $A(r) \le \alpha$, that is
$$c_r(m, \alpha) := \sup\Big\{c : \sup_{\rho \in [0,1]} \limsup_{m \to \infty} \widetilde{\alpha}_m(\rho, r, c) \le \alpha\Big\}, \quad (33)$$
and the corresponding limiting type-I error
$$\widetilde{\alpha}(\rho, r, \alpha) := \limsup_{m \to \infty} \widetilde{\alpha}_m(\rho, r, c_r(m, \alpha)). \quad (34)$$
In the following, we derive a closed-form expression for $c_r(m, \alpha)$, and the corresponding $\widetilde{\alpha}(\rho, r, \alpha)$, under the positively equicorrelated Gaussian model. Note that in this setting the observations can be written as
$$X_i = \sqrt{\rho}\, Z + \sqrt{1 - \rho}\, Z_i, \quad \text{for all } i = 1, 2, \ldots, m, \quad (35)$$
where $Z \sim N(0,1)$, $Z_i \overset{iid}{\sim} N(0,1)$ for all $i = 1, 2, \ldots, m$, and $Z \perp\!\!\!\perp \{Z_i\}_{i=1}^m$. The corresponding one-sided $p$-values are
$$p_i = \Phi(-X_i) = \Phi\big(-\sqrt{\rho}\, Z - \sqrt{1-\rho}\, Z_i\big). \quad (36)$$
Here we drop the index $m$, as the distribution of $X$ does not change with $m$, and the same holds for the $p$-values. An important consequence of this decomposition is the following conditional independence: $p_1, p_2, \ldots, p_m$ are i.i.d.
conditional on $Z$. (37)

(To see that $A^\star(r) \ge A(r)$, observe that $\sup_{\rho \in [0,1]} \widetilde{\alpha}_m(\rho, r, c) \ge \widetilde{\alpha}_m(\rho_0, r, c)$ for all $\rho_0 \in [0,1]$ and all $m$. Taking $\limsup_m$ on both sides maintains the inequality, as does taking a further $\sup_{\rho_0}$ on both sides.)

Denote the conditional expectation of $p_i^r$ given $Z = z$, as a function of $z$, by
$$g_{\rho,r}(z) := E[p_i^r \mid Z = z] = \int \Phi\big(-\sqrt{\rho}\, z - \sqrt{1-\rho}\, x\big)^r \phi(x)\, dx. \quad (38)$$
Noting that $g_{0,r}(z)$ is a constant, we enforce $g_{0,r}^{-1}(\cdot) \equiv \infty$.

Theorem 4.
Under the positively equicorrelated Gaussian setting, given $\alpha \in (0, 1)$:
(a) if $r > 0$, then $\widetilde{\alpha}(\rho, r, \alpha) = \Phi\big(-g_{\rho,r}^{-1}(\alpha^r)\big)$, and $c_r(m, \alpha) = \min\big\{\alpha, (\tfrac{1}{r+1})^{1/r}\big\}$;
(b) if $-1 < r \le 0$, then $\widetilde{\alpha}(\rho, r, \alpha) = \Phi\big(-g_{\rho,r}^{-1}(c_r(m, \alpha)^r)\big)$, and $c_r(m, \alpha) = \big(\sup_{\rho \in [0,1]} g_{\rho,r}(-\Phi^{-1}(\alpha))\big)^{1/r}$ is not a function of $m$, where $g_{\rho,r}$ is defined in (38);
(c) if $r = -1$, then $\widetilde{\alpha}(\rho, r, \alpha) = \alpha\, \mathbb{I}\{\rho = 0\}$, and $c_r(m, \alpha) \approx \frac{\alpha}{\alpha \log m + 1}$ as $\alpha \to 0$;
(d) if $r < -1$, then $\widetilde{\alpha}(\rho, r, \alpha) = \alpha\, \mathbb{I}\{\rho = 0\}$, and $c_r(m, \alpha) \approx \alpha\, m^{1/|r| - 1}$ as $\alpha \to 0$.
In all four cases, we have $c_r(m, \alpha) \le \alpha$.

The proof of Theorem 4 is in Appendix H, where we mainly use the decomposition described above and a generalized law of large numbers. From Theorem 4, we see that the calibrated threshold $c_r(m, \alpha)$ under the positively equicorrelated Gaussian is less conservative than that under arbitrary correlation in Lemma 2: for $r > 0$, by a factor of $(r+1)^{1/r}$; for $r = 0$, by a factor of $\frac{|\log \alpha| + 1}{|\log \alpha|}$; for $r = -1$, by a factor of $\frac{\log m}{\alpha \log m + 1}$; and for $r < -1$, by a factor of $\frac{|r|}{|r| - 1}$. Figure 4 displays these ratios over a range of $r$. We see that, as $|r| \to \infty$, our positively equicorrelated Gaussian calibration is almost the same as the Vovk calibration for arbitrary dependence, indicating that we do not pay much of a price when moving from positive equicorrelation to arbitrary dependence; while as $|r| \to 0$, the positively equicorrelated Gaussian calibration is much tighter than that for arbitrarily correlated Gaussians in [12].

Figure 4: The theoretical ratio of the calibrated threshold under the positively equicorrelated Gaussian (Cal G-dep) to that under arbitrary dependence (Cal A-dep), versus $r$.

Next, we conduct some large-scale (in terms of $m$) simulations to justify the surrogate control in Theorem 4. Explicitly, we compare our derived asymptotic type-I error $\widetilde{\alpha}(\rho, r, \alpha)$ under the surrogate calibration (i.e. $A(r) \le \alpha$) with the type-I error $\widetilde{\alpha}^\star_m(\rho, r, \alpha)$ under the ideal calibration (i.e. $A^\star(r) \le \alpha$), that is
$$\widetilde{\alpha}^\star_m(\rho, r, \alpha) := \Pr_{\bigcap_{i=1}^m H_{mi}}\left( \left(\frac{1}{m}\sum_{i=1}^m p_{mi}^r\right)^{1/r} \le c^\star_r(m, \alpha) \right) \quad (39)$$
with
$$c^\star_r(m, \alpha) := \sup\Big\{c : \sup_{\rho \in [0,1]} \widetilde{\alpha}_m(\rho, r, c) \le \alpha\Big\}. \quad (40)$$
Figure 5 empirically shows that, for $r > -1$, $c_r(m, \alpha)$ also approximately achieves the control $A^\star(r) \le \alpha$, in the sense that $\widetilde{\alpha}(\rho, r, \alpha) \approx \widetilde{\alpha}^\star_m(\rho, r, \alpha)$ for each $\rho \in [0, 1]$ when $m$ is large enough.

Figure 5: The asymptotic uniform type-I error $\widetilde{\alpha}(\rho, r, \alpha)$ using the calibrated positively equicorrelated Gaussian test (dotted line, identical in both subplots), and the empirical point-wise type-I error $\widetilde{\alpha}^\star_m(\rho, r, \alpha)$ (solid lines) for two values of $m$ (left and right subplots) and several values of $r > -1$, versus the correlation $\rho$, estimated via simulation (averaging over many trials).

For $r \le -1$ the approximation is much looser, as the convergence (for point-wise $\rho$) in the generalized law of large numbers is much slower and beyond feasible simulation scope.
Nevertheless, from Figure 6 one may see a trend of convergence within the currently feasible magnitudes of $m$: the "worst case" correlation for fixed $m, r, \alpha$, that is $\arg\max_{\rho \in [0,1]} \widetilde{\alpha}^\star_m(\rho, r, \alpha)$, slowly approaches $0$ as $m$ grows, at a rate that depends on $\alpha$ (faster for $\alpha$ away from $0$); and the point-wise type-I error $\widetilde{\alpha}^\star_m(\rho, r, \alpha)$ slowly approaches $\widetilde{\alpha}(\rho, r, \alpha) = \alpha\, \mathbb{I}\{\rho = 0\}$ as $m$ grows.

Figure 6: The empirical point-wise type-I error $\widetilde{\alpha}^\star_m(\rho, r, \alpha)$ (solid lines) for several values of $m$ and different methods of adjusting for dependence, estimated via simulation (averaging over many trials), versus the correlation $\rho$, for different $r \le -1$ at two confidence levels $\alpha$ (first and second columns). The dashed vertical lines indicate the "worst case" correlation for fixed $m, r, \alpha$, that is $\arg\max_{\rho \in [0,1]} \widetilde{\alpha}^\star_m(\rho, r, \alpha)$.

In this subsection, we study the power obtained with the calibrated threshold $c_r(m, \alpha)$ derived in Section 3.2. We look at the cases $r > 0$ and $r \le -1$ separately, under different alternative settings. Given $\rho$ and $\alpha$, denote the power function for a given $r$ and dimension $m$ by
$$\widetilde{\beta}_m(\rho, r, \alpha) := \Pr\left( \left(\frac{1}{m}\sum_{i=1}^m p_{mi}^r\right)^{1/r} \le c_r(m, \alpha) \right), \quad (41)$$
where $c_r(m, \alpha)$ is specified in Theorem 4. In the following, we are interested in the asymptotic behaviour of $\widetilde{\beta}_m(\rho, r, \alpha)$ under different settings of $\mu_m$ and $\pi_m$ in the positively equicorrelated Gaussian model (see Definition 2); that is, we intend to derive an expression for the asymptotic power
$$\widetilde{\beta}(\rho, r, \alpha) := \lim_{m \to \infty} \widetilde{\beta}_m(\rho, r, \alpha). \quad (42)$$

Theorem 5.
Fix $r > 0$, and consider the positively equicorrelated Gaussian model in Definition 2, where $\lim_{m\to\infty} \mu_m = \mu \in [0, \infty]$ and $\lim_{m\to\infty} \pi_m = \pi \in [0, 1]$. Then, for any $\alpha \in \big(0, (\tfrac{1}{r+1})^{1/r}\big)$ and $\rho \in [0, 1]$, the asymptotic power is
$$\widetilde{\beta}(\rho, r, \alpha) = \lim_{m \to \infty} \Pr\Big\{ \pi_m\, g_{\rho,r}(Z + \mu_m/\sqrt{\rho}) + (1 - \pi_m)\, g_{\rho,r}(Z) \le \alpha^r \Big\} \in [\widetilde{\alpha}(\rho, r, \alpha), 1], \quad (43)$$
with $g_{\rho,r}$ defined in (38), $\widetilde{\alpha}(\rho, r, \alpha) \in [0, \alpha]$ defined in (34), and $Z$ a standard Gaussian random variable. In particular:
• if $\pi = 1$, then
$$\widetilde{\beta}(\rho, r, \alpha) = \begin{cases} 1, & \text{if } \mu = \infty; \\ \Phi\big(-g_{\rho,r}^{-1}(\alpha^r) + \mu/\sqrt{\rho}\big) \in [\widetilde{\alpha}(\rho, r, \alpha), 1], & \text{if } 0 < \mu < \infty; \\ \widetilde{\alpha}(\rho, r, \alpha), & \text{if } \mu = 0. \end{cases} \quad (44)$$
• if $0 < \pi < 1$, then
$$\widetilde{\beta}(\rho, r, \alpha) = \begin{cases} \Phi\Big(-g_{\rho,r}^{-1}\big(\tfrac{\alpha^r}{1-\pi}\big)\Big), & \text{if } \mu = \infty; \\ \Pr\Big\{ \pi g_{\rho,r}(Z + \mu/\sqrt{\rho}) + (1-\pi) g_{\rho,r}(Z) \le \alpha^r \Big\} \in \Big[\widetilde{\alpha}(\rho, r, \alpha),\ \Phi\Big(-g_{\rho,r}^{-1}\big(\tfrac{\alpha^r}{1-\pi}\big)\Big)\Big], & \text{if } 0 < \mu < \infty; \\ \widetilde{\alpha}(\rho, r, \alpha), & \text{if } \mu = 0. \end{cases} \quad (45)$$
• if $\pi = 0$, then $\widetilde{\beta}(\rho, r, \alpha) = \widetilde{\alpha}(\rho, r, \alpha)$, no matter what value $\mu$ takes.

The proof of Theorem 5 is in Appendix I, where we use a triangular-array version of the generalized law of large numbers. From Theorem 5, we can first verify some intuitive facts: the power achieves optimality (i.e. asymptotically goes to one) in the perfect case ($\pi = 1$ with $\mu = \infty$), while it drops to the lower bound $\widetilde{\alpha}(\rho, r, \alpha)$ in the worst case ($\pi = 0$ or $\mu = 0$). Theorem 5 also indicates some surprising findings that are somewhat exclusive to the case $r > 0$: the asymptotic power does not go to one even in the full-signal setting ($\pi = 1$) as long as the signal is not perfect ($\mu < \infty$), and similarly in the perfect-signal setting ($\mu = \infty$) as long as there are some non-signals ($\pi < 1$).
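The function $g_{\rho,r}$ appearing in Theorems 4 and 5 has no closed form in general, but the integral in (38) is one-dimensional and easy to evaluate numerically. The following is our illustrative sketch (not the paper's prescribed method), using probabilists' Gauss-Hermite quadrature:

```python
import numpy as np
from math import erfc, sqrt, pi

def g(rho, r, z, n=80):
    """Evaluate g_{rho,r}(z) = E[ Phi(-sqrt(rho)*z - sqrt(1-rho)*X)^r ]
    for X ~ N(0,1), via Gauss-Hermite quadrature for the weight exp(-x^2/2)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n)
    # Phi(-t) = 0.5 * erfc(t / sqrt(2)) evaluated at t = sqrt(rho) z + sqrt(1-rho) x
    phi_vals = np.array([0.5 * erfc((sqrt(rho) * z + sqrt(1 - rho) * x) / sqrt(2))
                         for x in nodes])
    return float(np.sum(weights * phi_vals ** r) / sqrt(2 * pi))
```

Sanity checks: for $\rho = 0$ the value is the constant $E[U^r] = 1/(r+1)$ (for $r > -1$, with $U$ uniform), and for $\rho > 0$ and $r > 0$ the function is decreasing in $z$, which is what makes the inverse $g_{\rho,r}^{-1}$ in the theorems well defined.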
These counter-intuitive behaviours stem from the nature of the combinations with $r > 0$: the combination is essentially a weighted average of the $p$-values under a monotonically increasing transformation, and is thus dominated by large $p$-values. Therefore, for $r > 0$, as long as a non-vanishing fraction of observations is likely to generate large $p$-values (e.g. non-signals or weak signals), we lose power.

As an explicit example of Theorem 5, consider $r = 1$, $\pi \in (0, 1)$, $\mu = \infty$ for simplicity. In this case the combination is asymptotically $(1 - \pi)$ times the arithmetic mean of the null $p$-values. Therefore, from the definition of $\widetilde{\beta}(\rho, r, \alpha)$, we know $\widetilde{\beta}(0, r, \alpha) = \Phi\big(-g_{0,r}^{-1}(\tfrac{\alpha}{1-\pi})\big) = \Phi(-\infty) = 0$ from the convention for $g_{0,r}^{-1}$, which agrees with Theorem 5. This example indicates a more general phenomenon: when $r \ge 1$, as long as there are some non-signals ($\pi < 1$), the asymptotic power under independence is always $0$, no matter how strong the signal is (i.e. how large $\mu$ is). On the other hand, if $\rho = 1$, then we have $\widetilde{\beta}(1, r, \alpha) = \min\{1, \tfrac{\alpha}{1-\pi}\}$, which is close to one only when $\pi$ is close to $1 - \alpha$ or larger. From this simple example, we explain the behaviour observed in the experiments of [12, 13]: the combination choices with $r \ge 1$ are powerless unless there are many strong signals under heavy dependence.

In the following, we study the case $r \le -1$. First, we look at the setting with moderate signal strength, that is $\mu_m = o(\sqrt{\log m})$. The following Theorem 6 shows that, as long as the signal is not strong enough, the asymptotic power for $r \le -1$ always degenerates, no matter how dense the signal is.

Theorem 6.
For $r \le -1$, consider the positively equicorrelated Gaussian model in Definition 2 with $\mu_m = o(\sqrt{\log m})$ and $\lim_{m\to\infty} \pi_m = \pi \in [0, 1]$. Then, for all $\rho \in [0, 1]$ and $\alpha > 0$,
$$\widetilde{\beta}(\rho, r, \alpha) = \alpha\, \mathbb{I}\{\rho = 0\}. \quad (46)$$
The proof of Theorem 6 is in Appendix J, where we mainly use limit theory for infinitely divisible random variables. Theorem 6 indicates that, as long as the signal is not strong enough, the combinations with $r \le -1$ are powerless except under independence. Despite the robustness under dependence observed for $r \le -1$ in the experiments conducted by Wilson [9], this robustness diminishes as the number of hypotheses goes to infinity, and the method eventually becomes highly sensitive to dependence. This phenomenon arises because the gap between the calibrated thresholds for different $\rho$ grows at least at a polynomial rate (i.e. $O(m^\epsilon)$ for some $\epsilon > 0$); the conservativeness arising from calibration therefore grows with the number of hypotheses, resulting in high sensitivity to dependence in the limit.

In the following, we study the setting where the signal is strong enough, specifically when $\mu_m \ge O(\sqrt{\log m})$.

Theorem 7.
For $r \le -1$, consider the positively equicorrelated Gaussian model in Definition 2 with $\mu_m = \sqrt{c \log m}$ for some $c > 0$. Fix $\rho \in [0, 1]$, and suppose one of the following is satisfied:
(a) $\lim_{m\to\infty} \pi_m = \pi > 0$, and $\sqrt{c} > \sqrt{2} - \sqrt{2(1 - \rho)}$;
(b) $\pi_m = m^{\gamma - 1}$, where $0 < \gamma < 1$, and $\sqrt{c} > \sqrt{2} - \sqrt{2\gamma(1 - \rho)}$.
Then, for all $\alpha > 0$,
$$\widetilde{\beta}(\rho, r, \alpha) = 1. \quad (47)$$
The proof of Theorem 7 is in Appendix K, where we use a sandwiching argument similar to that of Liu and Xie [8]. Theorem 7(a) indicates that, in order to achieve full power, the signal strength needs to be larger under heavier dependence. This agrees with intuition, since the power is fundamentally related to the tail of the transformed $p$-value $p_{mi}^r$, which is thinner under heavy dependence and heavier under weak dependence. Theorem 7(b), following the sparse setting of [16], states that the asymptotic power reaches one as long as the signal strength $c$ and the signal sparsity $\gamma$ clear the detection boundary determined by $\rho$: $\sqrt{c} > \sqrt{2} - \sqrt{2\gamma(1 - \rho)}$. Note that the detection boundary in (b) grows with $\rho$, indicating that under heavier dependence the signal needs to be stronger or denser to reach the sweet spot.

In Section 3 we derived a collection of theoretical results for the local test $t^{(r)}_\alpha$ (13) under the equicorrelated Gaussian model (Definition 2) in an asymptotic setting ($m \to \infty$), indicating that positive $r$ works better for heavy dependence and weak dense signals, while negative $r$ works better for weak dependence and strong sparse signals. In this section we present empirical evidence that these behaviours of the local test are largely preserved after going through closed testing.

As the theoretical results for the local test in Section 3 are only asymptotic, while closed testing must consider all subsets of $[m]$, our theoretical results do not apply to a large proportion of them.
On the other hand, calibrating for subsets of every size up to $m$ is computationally expensive; therefore, we use the following approximation: calibrate for a few sizes in $[m]$ and then interpolate to the whole of $[m]$ (see Figure 7 for the case $m = 1000$). This empirical calibration with interpolation works well in terms of maintaining correct error control, and gives nontrivial power (see Figures 8-11).

Figure 7: Calibration under the $\rho$-equicorrelated Gaussian for $m \in [1000]$, for a worst-case $\rho \in [0, 1]$ (calculated using a grid of width 0.01). We compute the empirical $\alpha$-level calibrated cutoff $c^\star_r(m, \alpha)$ (black dots) for $(\frac{1}{m}\sum_{i=1}^m p_i^r)^{1/r}$ over a grid of values of $m$ and $r$, via averaging over many trials. We then approximate $c^\star_r(m, \alpha)$ for all $m \in [1000]$ by fitting a smooth line $\hat{c}^\star_r(m, \alpha)$ (red line) over the whole range $m \in [1000]$, and use the fitted value as our final calibration. For comparison, we also plot the theoretical calibrated cutoff $c_r(m, \alpha)$ (see definition (33)) derived for pointwise asymptotic type-I control in Section 3. In addition, for small $m$ we simply use the empirical calibration directly, for accuracy.

Remark. Figure 7 provides reasonable evidence that, for $r \ge 1$, the worst-case dependence is not achieved by $\rho = 1$ (perfect correlation): if it were, there would be no violation of type-I error, but we observe above that the achieved error exceeds $\alpha$.

In the rest of this section, we consider $m = 200$ tests, each based on different samples which are not necessarily independent; specifically, we assume the positively equicorrelated Gaussian model (Definition 2) among the samples of the $m$ tests, and test whether a given set of data has zero mean. In particular, we consider $\mu_m \equiv \mu$ and $\pi_m \equiv \pi$ for all $m$.
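A minimal version of this calibrate-then-interpolate scheme can be sketched as follows. This is our illustration, with a much coarser grid and far fewer trials than Figure 7; the grid values, trial count, and the log-log polynomial fit are our assumptions, not the paper's exact choices:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)
Phi_minus = np.vectorize(lambda t: 0.5 * erfc(t / sqrt(2)))  # Phi(-t)

def empirical_cutoff(m, r, rho_grid=(0.0, 0.5, 1.0), alpha=0.05, trials=2000):
    """Empirical alpha-level cutoff for M_r under the global null, taken as the
    smallest alpha-quantile across candidate correlations (worst case over rho)."""
    cuts = []
    for rho in rho_grid:
        z = rng.standard_normal(trials)
        zi = rng.standard_normal((trials, m))
        p = Phi_minus(np.sqrt(rho) * z[:, None] + np.sqrt(1 - rho) * zi)
        stat = (p ** r).mean(axis=1) ** (1.0 / r)
        cuts.append(np.quantile(stat, alpha))
    return float(min(cuts))

# Calibrate on a coarse grid of sizes, then interpolate to all m in between.
ms = np.array([10, 30, 60])
cuts = [empirical_cutoff(int(m), r=-1) for m in ms]
coef = np.polyfit(np.log(ms), np.log(cuts), deg=1)  # smooth fit in log-log scale
cutoff_at = lambda m: float(np.exp(np.polyval(coef, np.log(m))))
```

Taking the smallest quantile over the $\rho$ grid is what makes the cutoff valid simultaneously for every correlation on the grid; the smooth fit then supplies cutoffs for intermediate values of $m$ without re-simulating each size.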
In Figures 8-11 we investigate the algorithms presented in Theorem 3: Algorithm 1 for finding the largest subset of $[m]$ such that the FDP is controlled below $\gamma$, and Algorithm 2 for finding the largest subset of $[m]$ such that the FWER is controlled below $\alpha$. Specifically, Figures 8-9 show the results for FDP control and Figures 10-11 show the results for FWER control. We see that both FDP and FWER are controlled as desired. When signals are strong enough (i.e. $\mu$ large enough), $r < 0$ often has nontrivial power (close to one) compared with $r \ge 0$, while both are powerless otherwise (i.e. $\mu$ not large enough). For weak signals specifically, we observe that $r > 0$ has higher power than $r < 0$, especially under strong dependence (large $\rho$) and high signal density (large $\pi$). These findings generally agree with our asymptotic theory for the local test in Section 3: $r < 0$ achieves almost perfect power in settings with sparse strong signals, but is powerless when signals are not strong enough, in which case $r > 0$ works better (especially under heavy dependence and dense signals).

Figure 8: The empirical FDP and power versus the signal proportion under different settings, using the fitted calibration in Figure 7 and the algorithm in Theorem 3, with $m = 200$, averaging over 1000 trials.

Figure 9: The empirical FDP and power versus the correlation $\rho$ under different settings, using the fitted calibration in Figure 7 and the algorithm in Theorem 3, with $m = 200$, averaging over 1000 trials.

Figure 10: The empirical FWER and power versus the signal proportion under different settings, using the fitted calibration in Figure 7 and the algorithm in Theorem 3, with $m = 200$, averaging over 1000 trials.

Figure 11: The empirical FWER and power versus the correlation $\rho$ under different settings, using the fitted calibration in Figure 7 and the algorithm in Theorem 3, with $m = 200$, averaging over 1000 trials.

In this paper we investigated the general case of closed testing with local tests that enjoy a special property we call separability, meaning that the test is a function of a sum of test scores for the individual hypotheses. Combining separability, symmetry and monotonicity of the local tests, we derived a class of novel, fast algorithms for various types of simultaneous inference. We paired our algorithms with recent advances in separable global null tests, namely the generalized-mean based methods summarized in [12], and obtained a series of simultaneous inference methods that can handle many complex dependence structures and signal compositions. We provided guidance on choosing among these methods via a theoretical investigation of the conservativeness and sensitivity of different choices of local test in an equicorrelated Gaussian model. Specifically, we found that:
• when signals are weak, all simultaneous inference methods are powerless, though the ones with positive $r$ perform somewhat better when signals are dense and highly correlated;
• when signals are strong, methods with negative $r$ are often able to achieve full power while methods with positive $r$ often remain powerless, except when signals are also dense, in which case the two are comparable.

The following remaining problems are left to future work. First, some of the asymptotic theory does not agree well with empirical results at the magnitudes of $m$ we can currently afford to simulate: especially in the negative $r$ case, the surrogate calibration leads to asymptotic results far from the high-dimensional regimes arising in practice. Bridging this gap between theory and practice may deserve more attention.
Secondly, in this work we mainly focused on the equicorrelated Gaussian case; deriving a tight calibration under arbitrarily correlated Gaussians would be intriguing, though much harder. Finally, the power analysis was conducted theoretically only for the local test: a formal theoretical analysis after closure would be desirable, though we expect it to be much harder.

References

[1] Gilles Blanchard, Pierre Neuvial, Etienne Roquain, et al. Post hoc confidence bounds on false positives using reference families.
Annals of Statistics, 48(3):1281-1303, 2020. [2] Ruth Marcus, Eric Peritz, and K Ruben Gabriel. On closed testing procedures with special reference to ordered analysis of variance.
Biometrika , 63(3):655–660, 1976.[3] Christopher R Genovese and Larry Wasserman. Exceedance control of the false discovery proportion.
Journal of the American Statistical Association, 101(476):1408-1417, 2006. [4] Jelle J Goeman, Rosa J Meijer, Thijmen JP Krebs, and Aldo Solari. Simultaneous control of all false discovery proportions in large-scale multiple hypothesis testing.
Biometrika , 106(4):841–856, 2019.[5] Ronald A Fisher. Statistical methods for research workers. In
Breakthroughs in Statistics, pages 66-70. Springer, 1992. [6] Samuel A Stouffer, Edward A Suchman, Leland C DeVinney, Shirley A Star, and Robin M Williams Jr. The American soldier: adjustment during army life. (Studies in social psychology in World War II). 1949. [7] Ludger Rüschendorf. Random variables with maximum sums.
Advances in Applied Probability, pages 623-632, 1982. [8] Yaowu Liu and Jun Xie. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115(529):393-402, 2020. [9] Daniel J Wilson. The harmonic mean p-value for combining dependent tests. Proceedings of the National Academy of Sciences, 116(4):1195-1200, 2019. [10] Jelle J Goeman and Aldo Solari. Multiple testing for exploratory research.
Statistical Science , 26(4):584–597, 2011.[11] Edgar Dobriban. Fast closed testing for exchangeable local tests.
Biometrika , 107(3):761–768, 2020.[12] Vladimir Vovk and Ruodu Wang. Combining p-values via averaging.
Biometrika, 107(4):791-808, 2020. [13] Daniel J Wilson. Generalized mean p-values for combining dependent tests: comparison of generalized central limit theorem and robust risk analysis. Wellcome Open Research, 5(55):55, 2020. [14] Yuyu Chen, Peng Liu, Ken Seng Tan, and Ruodu Wang. Trade-off between validity and efficiency of merging p-values under arbitrary dependence. arXiv preprint arXiv:2007.12366, 2020. [15] Jelle Goeman, Jesse Hemerik, and Aldo Solari. Only closed testing procedures are admissible for controlling false discovery proportions. The Annals of Statistics, 2021+. [16] David Donoho and Jiashun Jin. Higher criticism for detecting sparse heterogeneous mixtures.
The Annals of Statistics, 32(3):962-994, 2004. [17] Maurice J Frank, Roger B Nelsen, and Berthold Schweizer. Best-possible bounds for the distribution of a sum—a problem of Kolmogorov.
Probability Theory and Related Fields, 74(2):199-211, 1987. [18] Vladimir V Uchaikin and Vladimir M Zolotarev.
Chance and stability: stable distributions and their applications. Walter de Gruyter, 2011. [19] B. V. Gnedenko and A. N. Kolmogorov.
Limit distributions for sums of independent random variables .Addison-Wesley Mathematics Series. Addison-Wesley, Cambridge, MA, 1954. Translated and annotatedby K. L. Chung. With an Appendix by J. L. Doob. MR:0062975. Zbl:0056.36001.[20] Robert D Gordon. Values of Mills’ ratio of area to bounding ordinate and of the normal probabilityintegral for large values of the argument.
The Annals of Mathematical Statistics, 12(3):364-366, 1941.
Appendices
Appendix A Proof of Theorem 1
By definition, $t_\alpha(S) = 1$ is equivalent to $p(S) \le \alpha$ for any set $S$. Then, noting that for any $\alpha \in [0, 1]$,
$$\bar{t}_\alpha(S) = 1 \;\Leftrightarrow\; \{p(J) \le \alpha \text{ for all } J \supseteq S\} \;\Leftrightarrow\; \{p(S) \le \alpha,\ \bar{t}_\alpha(J) = 1 \text{ for all } J \supset S\} \;\Leftrightarrow\; \{p(S) \le \alpha,\ \bar{p}(J) \le \alpha \text{ for all } J \supset S\} \;\Leftrightarrow\; \alpha \ge \max\Big\{\sup_{J \supset S} \bar{p}(J),\ p(S)\Big\},$$
we have, by definition,
$$\bar{p}(S) = \inf\Big\{\alpha \in [0, 1] : \alpha \ge \max\Big\{\sup_{J \supset S} \bar{p}(J),\ p(S)\Big\}\Big\} = \max\Big\{\sup_{J \supset S} \bar{p}(J),\ p(S)\Big\}. \quad (48)$$
Given the monotonicity of $t_\alpha$ and (48), we have $\sup_{J \supset S} \bar{p}(J) = \sup_{1 \le i \le |S^c|} \bar{p}(J_i)$, where $J_i$ is the set of all hypothesis indices in $[m]$ excluding those associated with the $i$ smallest scores in $S^c$. It is clear that the right-hand side of the above expression requires at most $|S^c|$ evaluations. This finishes the proof.

Appendix B Proof of Theorem 2
Without loss of generality, we assume that all the scores (e.g. $p$-values) are already sorted in descending order, that is $T_1 \ge T_2 \ge \cdots \ge T_m$. Denote $S = \{i_1, i_2, \ldots, i_s\}$, with $1 \le s \le m$ and $i_1 < i_2 < \cdots < i_s$; and denote by $S^c = \{j_1, j_2, \ldots, j_{m-s}\}$ the complement of $S$, with $j_1 < j_2 < \cdots < j_{m-s}$.

Proof of part (a).
By definition of $\bar{t}_\alpha(S)$, we can write it as
$$\bar{t}_\alpha(S) = \prod_{J : S \subseteq J \subseteq [m]} t_\alpha(J) = \prod_{0 \le i \le |S^c|}\ \prod_{J_i \subseteq S^c,\ |J_i| = i} t_\alpha(S \cup J_i). \quad (49)$$
Using the monotonicity of the local test $t_\alpha$, we have
$$t_\alpha(S \cup J_i) \ge t_\alpha(S \cup J_i^\star), \quad \text{where } J_i^\star = \{j_1, \ldots, j_i\}, \quad (50)$$
for all $J_i \subseteq S^c$ with $|J_i| = i$ and $0 \le i \le |S^c|$, where $J_i^\star$ is the set of hypothesis indices associated with the $i$ largest scores in $S^c$. Combining (49) and (50), and the fact that $t_\alpha$ and $\bar{t}_\alpha$ are both $0/1$ indicators, we get
$$\bar{t}_\alpha(S) = \prod_{0 \le i \le |S^c|} t_\alpha(S \cup J_i^\star) := \prod_{0 \le i \le |S^c|} t_\alpha(I_i), \quad \text{where } I_i = S \cup J_i^\star, \quad (51)$$
as claimed, which requires at most $O(m)$ computation in terms of local tests.

Proof of part (b).
To prove the validity of Algorithm 1, we first focus on the crucial line 9, and claim that when the event
$$E := \{\text{line 9 is evaluated with } k \le \min\{s, a\}\} \quad (52)$$
happens, the following four statements are true:
(i) $Q = \sum_{j=1}^{k+b} u_j + \sum_{l=1}^{a-k-b} v_l$; (ii) $b \ge 0$; (iii) $v_{a-k-b} \ge u_{k+b+1}$; (iv) $u_{k+b} \ge v_{a-k-b+1}$, if $b > 0$.

We show this by induction. The first time the event $E$ defined in (52) happens, we have $k = a = 1$, $b = 0$ and $Q = u_1$, so (i) and (ii) hold; (iii) holds since $v_1 \ge u_2$, and for (iv), since $b = 0$ there is nothing to prove.

Now assume that (i)-(iv) held the previous time the event $E$ happened. Let $a'$, $k'$, $b'$ and $Q'$ be the values of $a, k, b$ and $Q$ during that previous step. There are five ways for the event $E$ to happen again, which we can characterize by the way $a, k, b$ are updated. We discuss these ways one by one.

1. In this case we update $a = a'$; $b = b' - 1$; $k = k' + 1$. We have $b \ge 0$ since $b' > 0$. We have $Q = Q'$, and (i) holds since $k + b = k' + b'$. By the induction hypothesis, $v_{a-k-b} = v_{a'-k'-b'} \ge u_{k'+b'+1} = u_{k+b+1}$. If $b > 0$, then also $b' > 0$, so, by the induction hypothesis, $u_{k+b} = u_{k'+b'} \ge v_{a'-k'-b'+1} = v_{a-k-b+1}$.

2. In this case we update $a = a'$; $b = 0$; $k = k' + 1$. Clearly, (ii) holds. We have
$$Q = \sum_{j=1}^{k'+1} u_j + \sum_{l=1}^{a'-k'-1} v_l,$$
which reduces to (i). By the induction hypothesis, $v_{a-k-b} = v_{a'-k'-b'-1} \ge v_{a'-k'-b'} \ge u_{k'+b'+1} = u_{k+b} \ge u_{k+b+1}$, so (iii) follows. Since $b = 0$ there is nothing to prove for (iv).

3. In this case we update $a = a' + 1$; $b = b' + 1$; $k = k'$. We get (ii) from the induction assumption since $b = b' + 1 \ge 1$. We obtain (i) since
$$Q = \sum_{j=1}^{k'+b'+1} u_j + \sum_{l=1}^{a'-k'-b'} v_l.$$
By the induction hypotheses, $v_{a-k-b} = v_{a'-k'-b'} \ge u_{k'+b'+1} = u_{k+b} \ge u_{k+b+1}$. Also, $b > 0$ and $u_{k+b} = u_{k'+b'+1} \ge v_{a'-k'-b'+1} = v_{a-k-b+1}$.

4. In this case we update $a = a' + 1$, $k = k'$ and $b = b'$.
We get (ii) trivially, and (i) since
$$Q = \sum_{j=1}^{k'+b'} u_j + \sum_{l=1}^{a'-k'-b'+1} v_l.$$
We have (iii) since $v_{a-k-b} = v_{a'-k'-b'+1} \ge u_{k'+b'+1} = u_{k+b+1}$. If $b > 0$, then $b' > 0$. By the induction hypothesis, we get (iv), since $u_{k+b} = u_{k'+b'} \ge v_{a'-k'-b'+1} = v_{a-k-b} \ge v_{a-k-b+1}$.

5. This happens only if $k = a$ at the previous evaluation of line 9. In this case $b' = 0$, or (i) would be ill-defined. Consequently, we update $a = a' + 1$, $k = k' + 1$, and, since $u_{k+1} \le v_1$ by definition of $v$, $b = 0$, so (ii) holds. We get (i) since
$$Q = \sum_{j=1}^{k'+1} u_j + \sum_{l=1}^{a'-k'} v_l.$$
Since $a' = k' + b'$ and $b' = 0$, we have (iii) because $v_{a-k-b} = v_1 \ge u_1 \ge u_{k+b+1}$. There is nothing to prove for (iv) since $b = 0$.

Since we have exhausted the possibilities to get from one occurrence of the event $E$ to the next, the above analysis proves (i)-(iv). It follows from (i)-(iv) that, in line 9, $Q = W_{a,k}$, where
$$W_{a,k} = \max\{Q_I : |I \cap S| \ge k,\ |I| = a\}.$$
To see why this is true, note that by (i), $Q$ is a sum of $a$ terms, of which at least $k$ are from $S$. The sum is the largest possible such sum, since the $k$ largest scores in $S$ are used, and, by (iii) and (iv), among the $a - k$ remaining scores, the smallest score included in the sum is larger than or equal to the largest score not included. Note that if $k \le k'$, then $W_{a,k} \ge W_{a,k'}$.

Now, suppose $e_\alpha(S) = e > 0$. Then there exists some $I \subseteq S$ with $|I| = e$ and some $J \supseteq I$ such that $Q_J > g(|J|, \alpha)$. In the algorithm, if $a = |J|$ and $k \le e$, we have $Q = W_{a,k} \ge W_{a,e} \ge Q_J > g(|J|, \alpha)$, so the algorithm enters the while loop in line 9, incrementing $k$ while keeping $a$ fixed. This step is repeated at least until $k \ge e + 1$. Since $k$ is non-decreasing over the steps of the algorithm, it returns $k - 1 \ge e$.
The same holds trivially if e = 0.

If e_α(S) = e < s, then for every I ⊆ S with |I| > e we have t_α(I) = 1, so for all J ⊇ I we have Q_J ≤ g(|J|, α). In particular, this holds for the worst-case set, so for every e + 1 ≤ a ≤ m, we have W_{a,e+1} ≤ g(a, α). If k = e + 1, therefore, the algorithm never enters the while loop in line 9, and consequently never increments k further. The algorithm therefore ends with k ≤ e + 1 and returns k − 1 ≤ e. The same holds trivially if e = s.

Finally, since k − 1 ≥ e and k − 1 ≤ e, we have k − 1 = e.

Appendix C Discussion of consonance
First, we formally state the definition of consonance.
Definition 3. (Consonance)
A closed testing procedure is consonant if the local tests for the composite hypotheses S ⊆ [m] are chosen in such a way that rejection of S after closure implies rejection of at least one of its elementary hypotheses after closure.

Lemma 3.
The closed testing procedure using the local test t_α^{(r)} defined in (13) is consonant if and only if r = ±∞.

Proof. We prove this by analyzing the different values of r case by case. First, we show that closed testing using the local test t_α^{(r)} in (13) is consonant when r = ±∞.

1. When r = ∞, note that

t_α^{(∞)}(S) = I{ max_{i∈S} p_i ≤ α }.   (53)

Therefore, rejecting S after closure implies rejecting all sets containing it locally, including the set [m], which in turn indicates rejection of all sets locally, and hence after closure as well; in particular, all subsets of S are rejected after closure. In conclusion, the corresponding closed testing procedure is consonant when r = ∞.

2. When r = −∞, note that

t_α^{(−∞)}(S) = I{ |S| min_{i∈S} p_i ≤ α }.   (54)

Rejection of S after closure means that

t_α^{(−∞)}(B) = 1 for all B ∈ B := { I ⊆ [m] : S ⊆ I },   (55)

and in particular t_α^{(−∞)}(S) = 1, which in turn gives us

t_α^{(−∞)}(A) = 1 for all A ∈ A := { I ⊆ [m] : I ⊆ S, arg min_{i∈S} p_i ∈ I },   (56)

from the expression in (54). On the other hand, note that for any A ∈ A and J ⊇ A, we have either J ⊇ S or J ⊉ S. In the former case, J is rejected locally due to fact (55). In the latter case, if |J| ≤ |S|, we have |J| min_{i∈J} p_i ≤ |S| min_{i∈A} p_i = |S| min_{i∈S} p_i ≤ α; if |J| > |S|, there exists B ∈ B with |B| = |J| and min_{i∈J} p_i ≤ min_{i∈B} p_i (take B = S ∪ J′ for any J′ ⊆ J ∖ S with |J′| = |J| − |S|), which implies |J| min_{i∈J} p_i ≤ |B| min_{i∈B} p_i ≤ α due to fact (55). Therefore J is still rejected locally. Hence at least one elementary subset of S, in particular the one attaining min_{i∈S} p_i, is rejected after closure; that is, the corresponding closed testing procedure is consonant.

Next, we use counterexamples to show that closed testing using the local test t_α^{(r)} in (13) is not consonant when r ≠ ±∞.

1. When 0 < r < ∞, note that

t_α^{(r)} = I{ Σ_{i=1}^m p_i^r ≤ m α^r/(r+1) }.

Let β_{r,α} = α^r/(r+1); the local testing rule becomes Σ_{i=1}^m p_i^r ≤ m β_{r,α}.
Note that 0 ≤ β_{r,α} ≤ 1/(r+1) ≤ 1/2 for any r ≥ 1; in general, choose α small enough that 2β_{r,α} ≤ 1. For the case m = 3, take p_1^r = β_{r,α}/2, p_2^r = β_{r,α}/2 and p_3^r = 2β_{r,α}. We then have p_1^r + p_2^r + p_3^r ≤ 3β_{r,α} and p_1^r + p_2^r ≤ 2β_{r,α}, while p_1^r + p_3^r > 2β_{r,α}, p_2^r + p_3^r > 2β_{r,α} and p_3^r > β_{r,α}. Therefore, we reject H_1 ∩ H_2 ∩ H_3 and H_1 ∩ H_2, but neither H_1 nor H_2 after closure; hence the rejection of H_1 ∩ H_2 is not consonant.

2. When r = 0, note that

t_α^{(0)} = I{ Σ_{i=1}^m log(1/p_i) ≥ m log(e/α) }.

Let β_α = log(e/α) and q_i = log(1/p_i); the local testing rule becomes Σ_{i=1}^m q_i ≥ m β_α. Note that 1 ≤ β_α < ∞ and 0 ≤ q_i ≤ ∞. For m = 3 and any α ∈ (0, 1), let q_1 = 1.2β_α, q_2 = 1.2β_α and q_3 = 0.7β_α. Then q_1 + q_2 + q_3 ≥ 3β_α and q_1 + q_2 ≥ 2β_α, while q_1 + q_3 < 2β_α, q_2 + q_3 < 2β_α and q_3 < β_α. Therefore we reject H_1 ∩ H_2 ∩ H_3 and H_1 ∩ H_2 after closure, but neither H_1 nor H_2, which indicates that the rejection of H_1 ∩ H_2 is not consonant.

3. When −1 < r < 0, note that

t_α^{(r)} = I{ Σ_{i=1}^m p_i^r ≥ m α^r/(r+1) }.

Let β_{r,α} = α^r/(r+1); the local testing rule becomes Σ_{i=1}^m p_i^r ≥ m β_{r,α}. Note that 1 ≤ 1/(r+1) ≤ β_{r,α} < ∞ for any −1 < r < 0; choose α small enough that β_{r,α} ≥ 2, so that the values below satisfy the constraint p_i^r ≥ 1. For the case m = 3, take p_1^r = 1.2β_{r,α}, p_2^r = 1.2β_{r,α} and p_3^r = 0.7β_{r,α}. Then p_1^r + p_2^r + p_3^r ≥ 3β_{r,α} and p_1^r + p_2^r ≥ 2β_{r,α}, while p_1^r + p_3^r < 2β_{r,α}, p_2^r + p_3^r < 2β_{r,α} and p_3^r < β_{r,α}. Therefore, we reject H_1 ∩ H_2 ∩ H_3 and H_1 ∩ H_2, but neither H_1 nor H_2 after closure; hence the rejection of H_1 ∩ H_2 is not consonant.

4. When r = −1, note that

t_α^{(−1)} = I{ Σ_{i=1}^m 1/p_i ≥ e m log m / α }.

Let β_α = e/α and q_i = 1/p_i; the testing rule becomes Σ_{i=1}^m q_i ≥ m β_α log m. For the case m = 5, let q_1 = q_2 = 2β_α log 5 and q_3 = q_4 = q_5 = (1/3)β_α log 5. Then we have

Σ_{i=1}^5 q_i = 5β_α log 5,  Σ_{i=1}^4 q_i = (14/3)β_α log 5 ≥ 4β_α log 4,  and  Σ_{i∈{1,3,4,5}} q_i = 3β_α log 5 ≤ 4β_α log 4.

Therefore, we locally reject ∩_{i=1}^5 H_i and ∩_{i=1}^4 H_i, but not ∩_{i∈{1,3,4,5}} H_i.
Therefore, after closure, we reject ∩_{i=1}^5 H_i and ∩_{i=1}^4 H_i, but we do not reject H_1, H_2, H_3 or H_4; hence the rejection of ∩_{i=1}^4 H_i is not consonant.

5. When −∞ < r < −1, the local test takes the form

t_α^{(r)} = I{ Σ_{i=1}^m p_i^r ≥ m^{−r} β_{r,α} }

for a constant β_{r,α} depending only on r and α, with β_{r,α} → ∞ as α → 0. Let t = −r and q_i = 1/p_i; the local test becomes Σ_{i=1}^m q_i^t ≥ m^t β_{r,α}. For m large enough that (m/(m−1))^t < 2, let q_1^t = q_2^t = m^t β_{r,α}/2 and q_3^t = ⋯ = q_m^t = 1, choosing α small enough that β_{r,α} ((m−1)^t − m^t/2) > m − 2. Then Σ_{i=1}^m q_i^t ≥ m^t β_{r,α} and Σ_{i=1}^{m−1} q_i^t ≥ (m−1)^t β_{r,α}, while Σ_{i=1,i≠2}^m q_i^t < (m−1)^t β_{r,α} and, for any I ⊆ [m] ∖ {1, 2}, Σ_{i∈I} q_i^t < |I|^t β_{r,α}. Therefore we reject ∩_{i=1}^{m−1} H_i after closure (its only supersets, itself and [m], are rejected locally), but we cannot reject any elementary hypothesis after closure; hence the rejection of ∩_{i=1}^{m−1} H_i is not consonant.

Appendix D Algorithms for post-hoc auto-selection shortcuts
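The shortcuts in this appendix can be validated against a brute-force implementation of closed testing itself. A reference sketch (exponential in m, for toy examples only), assuming a separable local test that rejects a set S exactly when Σ_{i∈S} u_i ≤ g(|S|):

```python
from itertools import combinations

def closed_testing_fwer_set(u, g):
    """Elementary hypotheses (0-indexed) rejected by closed testing: H_i is
    rejected iff every set containing i is rejected by the local test, which
    here rejects S iff sum_{j in S} u_j <= g(|S|)."""
    m = len(u)
    local = {S: sum(u[j] for j in S) <= g(len(S))
             for size in range(1, m + 1)
             for S in combinations(range(m), size)}
    return [i for i in range(m)
            if all(ok for S, ok in local.items() if i in S)]
```

For instance, with u = [5, 1], g(1) = 4 and g(2) = 7, only the second hypothesis survives every intersection test, so the output is [1].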
Algorithm 2:
Shortcut for auto selection of largest rejection set with zero FDP
Input:
A sequence of sorted scores T_1, …, T_m satisfying T_1 ≥ ⋯ ≥ T_m; a local test rule of the form (24) with a monotonically increasing transformation function h and thresholding function g; confidence level α. Output:
Largest set S with zero false discoveries among all possible subsets of [m]; equivalently, the set of individual hypotheses with strong FWER control at level α.
Initialization: transformed scores u_1, …, u_m, where u_i = h(T_i) for 1 ≤ i ≤ m; iteration-related indices k ← 1, s ← 1; accumulated score Q ← u_k; layer-wise threshold c ← g(s, α).
while k < m and s < m do
    if Q > c then
        if s ≥ k then
            c ← c − u_k
            Q ← Q − u_k
        else
            Q ← Q − u_k + u_{k+1}
        end
        k ← k + 1
    else
        c ← c + g(s+1, α) − g(s, α)
        if s ≥ k then
            Q ← Q + u_{k+1}
        else
            c ← c − u_s
        end
        s ← s + 1
    end
end
return S = {k, …, m}

Algorithm 3:
Shortcut for auto selection of largest rejection sets with bounded FDP
Input: confidence level α ∈ (0, 1); desired FDP bound γ ∈ (0, 1); incremental candidate sets S_1 ⊂ S_2 ⊂ ⋯ ⊂ S_n with |S_k| = k.
Output: the largest S_k such that e_α(S_k) ≤ γ|S_k|.
Initialization: k ← n
while k ≥ 1 do
    e ← e_α(S_k)
    if e/k ≤ γ then
        return S_k
    else
        k ← ⌊(k − e)/(1 − γ)⌋
    end
end
return ∅

Appendix E Proof for Theorem 3
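For reference alongside the proof below, Algorithm 3 admits a direct transcription. In this sketch, e_alpha(k) is an assumed user-supplied routine evaluating the simultaneous bound e_α(S_k):

```python
import math

def auto_select_bounded_fdp(e_alpha, n, gamma):
    """Algorithm 3: return the largest k with e_alpha(S_k) <= gamma * k,
    skipping candidate sizes in batches as justified by (57); 0 means
    the empty selection."""
    k = n
    while k >= 1:
        e = e_alpha(k)
        if e <= gamma * k:
            return k
        # no j with floor((k - e)/(1 - gamma)) < j < k can qualify
        k = math.floor((k - e) / (1 - gamma))
    return 0
```

With e_alpha(k) = max(0, k − 5) and γ = 0.1, starting from n = 20 the batch update jumps directly from k = 20 to k = 5.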
We first prove the general case. Denote e_i = e_α(S_i) and d_i = i − e_i. Consider the iteration at which S_i is under consideration, with i ∈ [n]. If e_i/i ≤ γ, then we return S_i; otherwise, we need to look for j < i such that e_j/j ≤ γ. Note that, for j < i with j > d_i/(1−γ), we have

d_j/j ≤(*) d_i/j < 1 − γ, and in turn e_j/j > γ,   (57)

where (*) is true by Lemma 3 in Goeman et al. [15]. Therefore we cannot have e_j/j ≤ γ for any j < i with j > d_i/(1−γ), so we may directly skip those iterations in batches, which gives Algorithm 3.

In the following, we prove the result under the assumption that the local tests are monotonic, symmetric and separable. Part (a) is obvious from Theorem 2, so we prove part (b). Without loss of generality, assume the m scores are sorted as T_1 ≥ T_2 ≥ ⋯ ≥ T_m.

First, we prove that the largest set S ⊆ [m] with e_α(S) = 0 admits strong FWER control at level α. Note, from the definition (10) of e_α(S), that for any S with e_α(S) = 0, each of its elementary subsets is rejected by closed testing at level α; conversely, for any set S that is a collection of elementary hypotheses rejected by closed testing at level α, each of its subsets is also rejected by closed testing, and hence e_α(S) = 0. Therefore, the largest set S ⊆ [m] with e_α(S) = 0 is the collection of all elementary hypotheses rejected by closed testing at level α. Recalling the well-known fact that the collection of all elementary hypotheses rejected by closed testing at level α has strong FWER control at level α, we have proved the claim.

Next, we show that finding the collection of all elementary hypotheses rejected by closed testing at level α is equivalent to finding a cutoff in the ordered scores. By the monotonicity of the local test, it is easy to see that, for any k ∈ [m], if closed testing rejects the hypothesis corresponding to T_k, then it must also reject those corresponding to T_i for all i > k.
Therefore, the final rejection set must be of the form {T_{k⋆}, …, T_m}, where k⋆ is the cutoff in the ordered scores that we are interested in finding.

Finally, we show that Algorithm 2 is constructed to find this cutoff, by searching from the largest score and stopping at the first one (our cutoff k⋆) rejected by closed testing. Note that we reject H_k via closed testing if and only if each composite hypothesis containing it is rejected locally. Using the monotonicity of the local test, this is equivalent to saying that, for each s = 1, …, m, we have:

Σ_{i=1}^s h(T_i) ≤ g(s, α), if s ≥ k;
h(T_k) + Σ_{i=1}^{s−1} h(T_i) ≤ g(s, α), otherwise.   (58)

With a simple rearrangement, one may see that Algorithm 2 starts with k = 1, increases k by 1 in each of its updates of k, and stops at the first k for which (58) is satisfied, which is the cutoff k⋆ of our interest.

Appendix F Proof for Lemma 1
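Before the proof, a small numerical companion: the key step below maximizes 1 − u_1^{−1} − u_2^{−1} over u_1 + u_2 = 4/α, with maximum 1 − α attained at u_1 = u_2 = 2/α. A grid-search sketch (illustrative only):

```python
def frechet_bound(alpha, grid=100000):
    """Maximize 1 - 1/u1 - 1/u2 over u1 + u2 = 4/alpha with u1, u2 >= 1,
    then return one minus that maximum, which the proof shows equals alpha."""
    c = 4.0 / alpha
    best = float("-inf")
    for i in range(1, grid):
        u1 = 1.0 + (c - 2.0) * i / grid   # u1 sweeps (1, c - 1)
        best = max(best, 1.0 - 1.0 / u1 - 1.0 / (c - u1))
    return 1.0 - best
```

For example, frechet_bound(0.05) returns 0.05 up to grid error.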
We have

Pr_{H_1∩H_2}{ M_{−1}(p_1, p_2) ≤ α/2 } = Pr_{H_1∩H_2}{ p_1^{−1} + p_2^{−1} ≥ 4/α } = sup_{θ∈H_1∩H_2} P_θ( p_1^{−1} + p_2^{−1} ≥ 4/α ).

Let F_{1,θ} ≥ F and F_{2,θ} ≥ F be the CDFs of p_1^{−1} and p_2^{−1}, respectively, with F(x) = (1 − x^{−1}) I{x ≥ 1}. By Theorem 3.1 in Frank et al. [17], if W is the lower Fréchet-Hoeffding bound from copula theory (which is itself a copula in two dimensions), we have

P_θ( p_1^{−1} + p_2^{−1} ≥ 4/α ) ≤ 1 − sup_{u_1+u_2=4/α} W{ F_{1,θ}(u_1), F_{2,θ}(u_2) }
= 1 − sup_{u_1+u_2=4/α} max{ F_{1,θ}(u_1) + F_{2,θ}(u_2) − 1, 0 }
≤ 1 − sup_{u_1+u_2=4/α, u_1≥1, u_2≥1} max{ 1 − u_1^{−1} − u_2^{−1}, 0 }
= 1 − max(1 − α/2 − α/2, 0) = α,

where the last supremum is attained at u_1 = u_2 = 2/α. By Theorem 3.2 of Frank et al. [17], a copula exists such that the first inequality is exact. The second inequality is exact if p_1 and p_2 are marginally standard uniform.

Appendix G Proof for Proposition 1
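The proof below reduces the problem to the upper tail of a Gaussian average whose variance, in the equicorrelated case, is σ²_Σ = 1/m + ((m−1)/m)ρ. A small numerical sketch of that final quantity (illustrative; only the standard normal CDF is needed):

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def upper_tail_equicorr(c_prime, m, rho):
    """1 - Phi(C'/sigma_Sigma), where sigma_Sigma^2 = 1/m + (m-1)*rho/m is
    the variance of the average of m standard Gaussians with common
    correlation rho."""
    sigma = math.sqrt(1.0 / m + (m - 1) * rho / m)
    return 1.0 - normal_cdf(c_prime / sigma)
```

For C′ > 0 this is increasing in ρ, and at ρ = 1 it equals 1 − Φ(C′), matching the full-dependence supremum in (64)-(66).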
We write X_{mi} simply as X_i in this proof, for brevity. Note that

α̃_m(Σ, r, c) := Pr_{∩_{i=1}^m H_{mi}}{ ( (1/m) Σ_{i=1}^m p_{mi}^r )^{1/r} ≤ c } = Pr{ (1/m) Σ_{i=1}^m h_r(X_i) ≤ C },   (59)

where h_r(x) := sign(r) Φ(−x)^r and C := c^r (we consider r ≥ 1, so sign(r) = 1). Note that h_r is a convex function for x ≥ 0 when r ≥ 1. Indeed, taking the second derivative of h_r with respect to x, we have

d²h_r(x)/dx² = sign(r) r Φ(−x)^{r−2} φ(x) [ x Φ(−x) + (r−1) φ(x) ],

which shares its sign with x Φ(−x) + (r−1) φ(x). Since x Φ(−x) is positive for x > 0 (and upper bounded by a constant t⋆ ≈ 0.17), we infer that for r ≥ 1, h_r(x) is a convex function for x ≥ 0, as claimed. Hence, by Jensen's inequality,

Pr{ (1/m) Σ_{i=1}^m h_r(X_i) ≤ C } ≤ Pr{ h_r( (1/m) Σ_{i=1}^m X_i ) ≤ C }, if X_i > 0 ∀ i ∈ [m].   (60)

In the following, we prove that under the condition C < 2^{−r}/m, the condition { X_i > 0 ∀ i ∈ [m] } is implied by (1/m) Σ_{i=1}^m h_r(X_i) ≤ C. Formally, note that

(1/m) Σ_{i=1}^m h_r(X_i) ≤ C ⟹ max_{i∈[m]} h_r(X_i) ≤ mC ⟺ min_{i∈[m]} X_i ≥ −Φ^{−1}( (mC)^{1/r} ) >(*) 0,   (61)

where (*) is true due to our condition C < 2^{−r}/m. Therefore, the event { X_i > 0, ∀ i ∈ [m] } is implied by the event { (1/m) Σ_{i=1}^m h_r(X_i) ≤ C }, given that C < 2^{−r}/m, which together with (60) gives us

Pr{ (1/m) Σ_{i=1}^m p_i^r ≤ C } = Pr{ (1/m) Σ_{i=1}^m h_r(X_i) ≤ C } ≤ Pr{ h_r( (1/m) Σ_{i=1}^m X_i ) ≤ C } = Pr{ Φ( −(1/m) Σ_{i=1}^m X_i )^r ≤ C } = Pr{ (1/m) Σ_{i=1}^m X_i ≥ −Φ^{−1}(C^{1/r}) } = Pr_{X∼N(0,Σ)}{ (1/m) 1ᵀX ≥ C′ } =(*) Pr_{Z∼N(0,σ²_Σ)}{ Z ≥ C′ } = 1 − Φ( C′/σ_Σ ),   (62)

where C′ = −Φ^{−1}(C^{1/r}) > 0 and σ²_Σ = (1/m²) 1ᵀΣ1 ∈ R, with 1 the vector of all ones in R^m.
In particular, (*) is true because Gaussianity is preserved under affine transformations. On the other hand, under full dependence (i.e., ρ_{ij} ≡ 1 for all i, j), we have

Pr{ (1/m) Σ_{i=1}^m p_i^r ≤ C } = Pr{ p_1^r ≤ C } = Pr{ Φ(−X_1)^r ≤ C } = Pr{ X_1 ≥ C′ } = 1 − Φ(C′).   (63)

Therefore, combining (62) and (63), and the fact that C′ > 0, we have

1 − Φ(C′) ≤ sup_{Σ∈M_m} Pr{ (1/m) Σ_{i=1}^m p_i^r ≤ C } ≤ sup_{Σ∈M_m} [ 1 − Φ(C′/σ_Σ) ]   (64)
=(a) sup_{Σ∈M_m^E} [ 1 − Φ(C′/σ_Σ) ] =(b) sup_{ρ∈[−1/(m−1),1]} [ 1 − Φ( C′ / √(1/m + ((m−1)/m)ρ) ) ] = 1 − Φ(C′),   (65)

where we recall that M_m is the class of all correlation matrices, and M_m^E is the class of all equicorrelation matrices with correlation ρ ∈ [−1/(m−1), 1]. Specifically, (a) is true since σ_Σ only depends on the average of all entries of Σ, and (b) is true since σ²_Σ = 1/m + ((m−1)/m)ρ for any Σ ∈ M_m^E. In conclusion, we have

sup_{Σ⪰0} Pr{ (1/m) Σ_{i=1}^m p_i^r ≤ C } = 1 − Φ(C′),   (66)

for all r ≥ 1, and the supremum is achieved at full dependence. Transforming back to the original representation in (29) completes the proof.

Appendix H Proof for Theorem 4
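The conditional mean g_{ρ,r}(z) = E[p^r | Z = z] is central to the proof below, and its monotonicity in z (non-increasing for r > 0) is what allows the inversion in (82). A quadrature sketch for r > 0 (midpoint rule; illustrative accuracy only):

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g(rho, r, z, lo=-8.0, hi=8.0, n=4000):
    """g_{rho,r}(z) = integral of Phi(-sqrt(rho)*z - sqrt(1-rho)*x)^r phi(x) dx."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += normal_cdf(-math.sqrt(rho) * z
                            - math.sqrt(1.0 - rho) * x) ** r * normal_pdf(x)
    return total * h
```

At ρ = 0 the value is E[U^r] = 1/(r+1) for a uniform p-value, independent of z.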
Recalling the decomposition (35), we can rewrite α̃_m(ρ, r, c) as follows, which makes the link to the generalized law of large numbers clearer:

α̃_m(ρ, r, c) = E_Z[ Pr{ ( (1/m) Σ_{i=1}^m p_i^r )^{1/r} ≤ c | Z = z } ] = E_Z[ Pr{ sign(r) (1/m) Σ_{i=1}^m p_i^r ≤ sign(r)·C | Z = z } ],   (67)

where we use the conditional independence among {p_i}_{i=1}^m given Z, and write C := c^r. Then we have

lim sup_{m→∞} α̃_m(ρ, r, c) = lim sup_{m→∞} E_Z[ Pr{ sign(r) (1/m) Σ_{i=1}^m p_i^r ≤ sign(r)·C | Z = z } ] = E_Z[ lim sup_{m→∞} Pr{ sign(r) (1/m) Σ_{i=1}^m p_i^r ≤ sign(r)·C | Z = z } ],   (68)

where the exchange of limit and expectation is justified by dominated convergence, since the conditional probability is bounded by 1, which is integrable with respect to the normal density. In the following, we focus on quantifying the limit of the conditional probability in (68), for which we need the following lemma, which characterizes the distribution of p_i^r for r ≠ ±∞.

Lemma 4.
Denote the CDF of p_i^r (with p_i defined in (36)) conditional on Z = z by F_{r,ρ,z}, and the corresponding density by f_{r,ρ,z}. Then:

F_{r,ρ,z}(y) = Φ( (sign(r) Φ^{−1}(y^{1/r}) + √ρ z) / √(1−ρ) ), and f_{r,ρ,z}(y) = O( y^{1/((1−ρ)r) − 1} ) as y^{1/r} → 0,   (69)

where for r = 0 (with y = log p_i) the bound is interpreted as f_{r,ρ,z}(y) = O( exp(y/(1−ρ)) ) as y → −∞.

Proof. Without loss of generality, we only prove the case r ≥ 0. First, when r > 0, we have

F_{r,ρ,z}(y) = Pr{ p_i^r ≤ y | Z = z } = Pr{ Φ(−√ρ z − √(1−ρ) Z_i) ≤ y^{1/r} } = Pr{ Z_i ≤ (Φ^{−1}(y^{1/r}) + √ρ z)/√(1−ρ) } = Φ( (Φ^{−1}(y^{1/r}) + √ρ z)/√(1−ρ) ),   (70)

using the symmetry of Z_i, and the density satisfies

f_{r,ρ,z}(y) = dF_{r,ρ,z}(y)/dy ∝ (y^{1/r−1}/(r√(1−ρ))) φ( (Φ^{−1}(y^{1/r}) + √ρ z)/√(1−ρ) ) / φ(Φ^{−1}(y^{1/r}))   (71)
∝ (y^{1/r−1}/(r√(1−ρ))) exp( −[ ρ Φ^{−1}(y^{1/r})² + 2√ρ z Φ^{−1}(y^{1/r}) ] / (2(1−ρ)) ).   (72)

Using the approximation Φ^{−1}(x) = O(−√(2 log(1/x))) as x → 0, we have

f_{r,ρ,z}(y) = O( y^{1/((1−ρ)r) − 1} ) as y → 0.   (73)

For r = 0, we have

F_{r,ρ,z}(y) = Pr{ log p_i ≤ y | Z = z } = Pr{ Φ(−√ρ z − √(1−ρ) Z_i) ≤ exp(y) } = Pr{ Z_i ≤ (Φ^{−1}(exp(y)) + √ρ z)/√(1−ρ) } = Φ( (Φ^{−1}(exp(y)) + √ρ z)/√(1−ρ) ),   (74)

and the density satisfies

f_{r,ρ,z}(y) ∝ (exp(y)/√(1−ρ)) φ( (Φ^{−1}(exp(y)) + √ρ z)/√(1−ρ) ) / φ(Φ^{−1}(exp(y)))   (75)
∝ (exp(y)/√(1−ρ)) exp( −[ ρ Φ^{−1}(exp(y))² + 2√ρ z Φ^{−1}(exp(y)) ] / (2(1−ρ)) ).   (76)

Using the same approximation, we have

f_{r,ρ,z}(y) = O( exp( y/(1−ρ) ) ) as y → −∞, i.e. as p_i → 0.   (77)

(a) & (b): r > −1. When r > −1, Lemma 4 implies that E[p_1^r | Z = z] < ∞ for any ρ ∈ [0, 1); therefore, by the law of large numbers, we have

(1/m) Σ_{i=1}^m p_i^r | Z = z →d E[p_1^r | Z = z],   (78)

where →d denotes convergence in distribution.
Therefore,

lim sup_{m→∞} Pr{ sign(r) (1/m) Σ_{i=1}^m p_i^r ≤ sign(r)·C | Z = z } = Pr{ sign(r) E[p_1^r | Z = z] ≤ sign(r)·C },   (79)

and hence

lim sup_{m→∞} α̃_m(ρ, r, c) = E_Z[ Pr{ sign(r) E[p_1^r | Z = z] ≤ sign(r)·C } ] := h(ρ, r, C).   (80)

Recalling the conditional mean g_{ρ,r}(z) := E[p_i^r | Z = z] from (38), we have

g_{ρ,r}(z) = ∫ Φ(−√ρ z − √(1−ρ) x)^r φ(x) dx = (1/√(1−ρ)) ∫ φ( (y − √ρ z)/√(1−ρ) ) Φ(−y)^r dy,   (81)

with φ the standard normal p.d.f. From the expression in (81), it is easy to see that g_{ρ,r}(z) is monotonically non-increasing in z when r ≥ 0, and monotonically non-decreasing in z when r < 0. Therefore, using this monotonicity, we have the explicit expression

h(ρ, r, C) = Φ( −g_{ρ,r}^{−1}(C) ).   (82)

Recall the relationship C ≡ c^r and the definition of c_r(m, α):

c_r(m, α) := sup{ c : sup_{ρ∈[0,1)} lim sup_{m→∞} α̃_m(ρ, r, c) ≤ α }.

For r > 0, plugging (82) into (80) yields the closed-form expression

c_r(m, α) = ( inf_{ρ∈[0,1)} sup{ C : Φ(−g_{ρ,r}^{−1}(C)) ≤ α } )^{1/r}.

Using the monotonicity of g_{ρ,r}(x) in x, and the monotonicity of g_{ρ,r}(−Φ^{−1}(α)) in ρ (one can verify the sign of the derivative with respect to ρ), we can further simplify c_r(m, α) as

c_r(m, α) = ( inf_{ρ∈[0,1)} sup{ C : C ≤ g_{ρ,r}(−Φ^{−1}(α)) } )^{1/r} = ( inf_{ρ∈[0,1)} g_{ρ,r}(−Φ^{−1}(α)) )^{1/r} = min{ α, (1/(r+1))^{1/r} },   (83)

the two candidate values arising from the endpoints ρ → 1 (giving α) and ρ = 0 (giving (1/(r+1))^{1/r}, since g_{0,r} ≡ 1/(r+1)). Similarly, for −1 < r ≤ 0, we have the closed-form expression

c_r(m, α) = ( sup_{ρ∈[0,1)} sup{ C : Φ(−g_{ρ,r}^{−1}(C)) ≤ α } )^{1/r},

which satisfies the requirements and can be simplified as

c_r(m, α) = ( sup_{ρ∈[0,1)} g_{ρ,r}(−Φ^{−1}(α)) )^{1/r}.   (84)

Finally, we have

α̃(ρ, r, α) = lim sup_{m→∞} α̃_m(ρ, r, c_r(m, α)) = { Φ( −g_{ρ,r}^{−1}(α^r) ), if r > 0; Φ( −g_{ρ,r}^{−1}(c_r(m, α)^r) ), if −1 < r ≤ 0 },   (85)

and

c_r(m, α) = { min{ α, (1/(r+1))^{1/r} }, if r > 0; ( sup_{ρ∈[0,1)} g_{ρ,r}(−Φ^{−1}(α)) )^{1/r}, if −1 < r ≤ 0 }.   (86)

(c) & (d): r ≤ −1. When r ≤ −1, things get trickier, since by Lemma 4, E[p_i^r | Z = z] may not exist. In the following, we use the stable law stated in Lemma 5 to derive the asymptotic behaviour of α̃_m(ρ, r, c) for r ≤ −1.

Lemma 5. (Generalized LLN [18]) Consider a sequence of i.i.d. random variables X_1, X_2, …, X_m sharing the distribution of X, where X has support on [1, ∞) and density f satisfying

f(x) = O( x^{−(β+1)} ), as x → ∞, with β > 0.

Denote X̄_m := (1/m) Σ_{i=1}^m X_i. Then:
(a) if 0 < β < 1, then m^{1−1/β} X̄_m →d Y;
(b) if β = 1, then X̄_m − log m →d Y;
(c) if 1 < β < 2, then m^{1−1/β}( X̄_m − E[X] ) →d Y;
(d) if β ≥ 2, then X̄_m →d E[X],
where Y is some random variable with the same tail behaviour as X.

Then, for r ≤ −1, from Lemma 4 we have β = 1/((1−ρ)|r|). Let

C(α, r, m, ρ) := {
C_{α,r} m^{(1−ρ)|r|−1}, if 0 ≤ ρ < 1 + 1/r;
C_{α,r} + log m, if ρ = 1 + 1/r;
C_{α,r} m^{(1−ρ)|r|−1} + E[p_1^r | Z = z], if 1 + 1/r < ρ < 1 + 1/(2r);
C_{α,r} + E[p_1^r | Z = z], if 1 + 1/(2r) ≤ ρ ≤ 1,
}   (87)

where C_{α,r} is a constant depending only on α and r that we specify later.
Using Lemma 5, we have

lim_{m→∞} Pr{ (1/m) Σ_{i=1}^m p_i^r ≥ C(α, r, m, ρ) | Z = z } = Pr{ Y ≥ C_{α,r} | Z = z } =(*) Pr{ p_1^r ≥ C_{α,r} | Z = z } + o(1) = 1 − F_{r,ρ,z}(C_{α,r}) + o(1), as α → 0,   (88)

where Y is the limiting random variable from Lemma 5, which shares the tail behaviour of p_1^r; this yields the approximation (*).

Recalling the definitions in (29) and (34), our goal is to find c such that sup_{ρ∈[0,1]} lim sup_{m→∞} α̃_m(ρ, r, c) ≤ α, or equivalently to find C such that sup_{ρ∈[0,1]} lim sup_{m→∞} α̃_m(ρ, r, C^{1/r}) ≤ α. Note that C(α, r, m, ρ) is monotonically non-increasing in ρ, and C(α, r, m, 0) dominates C(α, r, m, ρ) for any 0 < ρ ≤ 1. Therefore, to calibrate for arbitrary ρ ∈ [0, 1], that is, to find a critical value that does not depend on ρ, we have no choice but to set C = C(α, r, m, 0), and hence

sup_{ρ∈[0,1]} lim sup_{m→∞} α̃_m(ρ, r, C(α, r, m, 0)^{1/r}) = lim sup_{m→∞} α̃_m(0, r, C(α, r, m, 0)^{1/r}) = E_Z[ 1 − F_{r,0,Z}(C_{α,r}) ] = E_Z[ Φ( Φ^{−1}(C_{α,r}^{1/r}) ) ] = C_{α,r}^{1/r} ≤ α,

which indicates that we should set C_{α,r} = α^r to achieve the upper bound. Therefore we have

c_r(m, α) = C(α, r, m, 0)^{1/r} = { α m^{1/|r| − 1}, if r < −1; α/(1 + α log m), if r = −1 },   (89)

and correspondingly

α̃(ρ, r, α) = lim sup_{m→∞} α̃_m(ρ, r, c_r(m, α)) = α · I{ρ = 0},

where the last equality is due to the nature of the stable law: the tail behaviour determines the growth rate, and the mismatch of growth rates leads to a degenerate asymptotic probability. This finishes the proof of Theorem 4.

Appendix I Proof for Theorem 5
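The limiting power expressions below can be evaluated numerically once g_{ρ,r} and its inverse are available. A self-contained sketch for the fully dense case π = 1 with r > 0 and 0 < μ < ∞, combining midpoint quadrature with bisection (the numerical tolerances here are illustrative assumptions):

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g(rho, r, z, lo=-8.0, hi=8.0, n=2000):
    """Conditional mean g_{rho,r}(z), by midpoint quadrature (r > 0)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += normal_cdf(-math.sqrt(rho) * z
                            - math.sqrt(1.0 - rho) * x) ** r * normal_pdf(x)
    return total * h

def g_inverse(rho, r, target, lo=-12.0, hi=12.0, iters=60):
    """Invert the non-increasing map z -> g_{rho,r}(z) by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(rho, r, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power_dense(rho, r, alpha, mu):
    """Limiting power in (96) for pi = 1 and finite mu:
    Phi(-g^{-1}_{rho,r}(alpha^r) + mu / sqrt(rho))."""
    return normal_cdf(-g_inverse(rho, r, alpha ** r) + mu / math.sqrt(rho))
```

At μ = 0 this reduces to the asymptotic size Φ(−g_{ρ,r}^{−1}(α^r)), and the power increases with μ.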
We are interested in calculating the asymptotic power using the calibrated threshold c_r(m, α) derived in Theorem 4. In particular, the power can be rewritten as

β̃_m(ρ, r, α) := Pr{ sign(r) (1/m) Σ_{i=1}^m p_{mi}^r ≤ sign(r) C_r(m, α) },   (90)

where C_r(m, α) = c_r(m, α)^r and p_{mi} = Φ(−X_{mi}) for all i. Using a decomposition similar to that in the proof of Theorem 4, we have

X_{mi} = μ_{mi} + √ρ Z + √(1−ρ) Z_i, for all i = 1, 2, …, m, and
p_{mi} = Φ(−X_{mi}) = Φ( −μ_{mi} − √ρ Z − √(1−ρ) Z_i ), with p_{m1}, p_{m2}, …, p_{mm} i.i.d. conditional on Z,   (91)

where Z ∼ N(0, 1), Z_i ∼iid N(0, 1), {Z} ⊥⊥ {μ_{mi}, Z_i}_{i=1}^m, and μ_{mi} = μ_m B_{mi} with B_{mi} ∼iid Bernoulli(π_m) for all i = 1, 2, …, m. The asymptotic power is given by

β̃(ρ, r, α) = lim_{m→∞} β̃_m(ρ, r, α) = E_Z[ lim_{m→∞} Pr{ sign(r) (1/m) Σ_{i=1}^m p_{mi}^r ≤ sign(r) C_r(m, α) | Z } ].   (92)

When r > −1, we can use the law of large numbers for triangular arrays, namely

sup_m E[ p_{mi}^r | Z ] < ∞ ⟹ (1/m) Σ_{i=1}^m p_{mi}^r | Z − E[ p_{mi}^r | Z ] →p 0,   (93)

and get

lim_{m→∞} Pr{ (1/m) Σ_{i=1}^m p_{mi}^r ≤ C_r(m, α) | Z } = lim_{m→∞} Pr{ E[p_{m1}^r | Z] ≤ α^r } = lim_{m→∞} Pr{ π_m E[p_{m1}^r | Z, μ_{m1} = μ_m] + (1 − π_m) E[p_{m1}^r | Z, μ_{m1} = 0] ≤ α^r }.   (94)

Combining (92) and (94), we have that, when r > 0,

β̃(ρ, r, α) = lim_{m→∞} β̃_m(ρ, r, α) = Pr{ π g_{ρ,r}(Z + μ/√ρ) + (1 − π) g_{ρ,r}(Z) ≤ α^r },   (95)

where g_{ρ,r} is defined in (38). From this expression, the following cases can be specified:

• if π = 1, then
β̃(ρ, r, α) = { 1, if μ = ∞; Φ( −g_{ρ,r}^{−1}(α^r) + μ/√ρ ), if 0 < μ < ∞; α̃(ρ, r, α), if μ = 0 }.   (96)

• if 0 < π < 1, then
β̃(ρ, r, α) = { Φ( −g_{ρ,r}^{−1}( α^r/(1−π) ) ), if μ = ∞; Pr{ π g_{ρ,r}(Z + μ/√ρ) + (1 − π) g_{ρ,r}(Z) ≤ α^r }, if 0 < μ < ∞; α̃(ρ, r, α), if μ = 0 }.   (97)

• if π = 0, then β̃(ρ, r, α) ≡ α̃(ρ, r, α), no matter what value μ takes.

This completes the proof.

Appendix J Proof for Theorem 6
When r ≤ −1, there is not necessarily a triangular-array version of the stable law. We instead utilize a more general version of such limit theorems: as long as the triangular array satisfies the uniformly asymptotically negligible (UAN) condition, that is, for any ε > 0,

lim_{m→∞} max_i Pr{ |Y_{mi}| > ε } = 0,   (98)

then Σ_i Y_{mi} converges (after centering) to an infinitely divisible distribution under certain conditions. The specific statement is given in the following Lemma 6.

Lemma 6. (Theorem 3.2.2 in [19]) Consider a triangular array {Y_{mk}, k = 1, …, k_m} such that the UAN condition is fulfilled, that is, for any ε > 0,

lim_{m→∞} max_k μ_{mk}{ |y| > ε } = 0,   (99)

where μ_{mk} is the distribution of Y_{mk}, and denote S_m := Y_{m1} + ⋯ + Y_{m,k_m}. Then there exists a deterministic sequence a_m such that S_m − a_m converges weakly to an infinitely divisible random variable Y if and only if the following conditions are fulfilled:

1. for any A = (−∞, x) with x < 0, and A = (x, ∞) with x > 0, such that ν(∂A) = 0,

ν(A) := lim_{m→∞} Σ_{k=1}^{k_m} μ_{mk}(A)   (100)

is a Lévy measure, i.e. a σ-finite Borel measure on R ∖ {0} such that ∫_{R∖{0}} min{1, x²} ν(dx) < ∞;

2. moreover,

lim_{τ→0} lim sup_{m→∞} Σ_{k=1}^{k_m} Var( Y_{mk} I{|Y_{mk}| < τ} ) = lim_{τ→0} lim inf_{m→∞} Σ_{k=1}^{k_m} Var( Y_{mk} I{|Y_{mk}| < τ} ) = σ² < ∞.   (101)

In particular, Y has the characteristic exponent

φ(t) = −σ²t²/2 + ∫_{R∖{0}} ( e^{itx} − 1 − itx I{|x| ≤ 1} ) ν(dx),   (102)

and a_m can be chosen as

a_m = Σ_{k=1}^{k_m} ∫_{|x|<1} x μ_{mk}(dx) + o(1),   (103)

given that ν({x : |x| = 1}) = 0.

In our case, let Y_{mi} = m^{r}( p_{mi}^r − a_{r,m} ) | Z, where a_{r,m} = 0 if r < −1, and a_{r,m} = log m if r = −1. We first check the UAN condition (98).
Note that

lim_{m→∞} max_i Pr{ |m^r (p_{mi}^r − a_{r,m})| > ε | Z = z } = lim_{m→∞} Pr{ |m^r (p_{mi}^r − a_{r,m})| > ε | Z }
= lim_{m→∞} [ Pr{ p_{mi}^r > m^{−r} ε + a_{r,m} | Z } + Pr{ p_{mi}^r < −m^{−r} ε + a_{r,m} | Z } ]
= lim_{m→∞} [ π_m Pr{ p_{mi}^r > m^{−r} ε + a_{r,m} | Z, μ_{mi} = μ_m } + (1 − π_m) Pr{ p_{mi}^r > m^{−r} ε + a_{r,m} | Z, μ_{mi} = 0 }
+ π_m Pr{ p_{mi}^r < −m^{−r} ε + a_{r,m} | Z, μ_{mi} = μ_m } + (1 − π_m) Pr{ p_{mi}^r < −m^{−r} ε + a_{r,m} | Z, μ_{mi} = 0 } ]
= lim_{m→∞} [ π_m Φ( (Φ^{−1}((m^{−r} ε + a_{r,m})^{1/r}) + μ_m + √ρ z)/√(1−ρ) ) + (1 − π_m) Φ( (Φ^{−1}((m^{−r} ε + a_{r,m})^{1/r}) + √ρ z)/√(1−ρ) )
+ π_m Φ( −(Φ^{−1}((−m^{−r} ε + a_{r,m})^{1/r}) + μ_m + √ρ z)/√(1−ρ) ) + (1 − π_m) Φ( −(Φ^{−1}((−m^{−r} ε + a_{r,m})^{1/r}) + √ρ z)/√(1−ρ) ) ].   (104)

For r < −1, we have a_{r,m} = 0, and thus (104) simplifies to

lim_{m→∞} π_m Φ( (Φ^{−1}( ε^{1/r}/m ) + μ_m + √ρ z)/√(1−ρ) ),   (105)

while for r = −1, we have a_{r,m} = log m, and (104) simplifies to

lim_{m→∞} π_m Φ( (Φ^{−1}( (mε + log m)^{−1} ) + μ_m + √ρ z)/√(1−ρ) ),   (106)

which behaves in the same way. Therefore, in order to make (105) and (106) go to zero, we only need μ_m to grow more slowly than |Φ^{−1}(1/m)| = O(√log m); that is, μ_m = o(√log m) suffices.

First, we consider the case ρ > 0, under which we will prove that, for each i, Y_{mi} = o_p(1) when r < −1, and Y_{mi} = o_p(log m) when r = −1, as m → ∞.
We prove this by applying Lemma 6, verifying conditions 1 and 2 therein. As for condition 1 in Lemma 6 for r ≤ −1, defining ν(x) := 1 − lim_{m→∞} m Pr{ Y_{mi} > x } for all x > 0, it can be simplified to checking that −ν(1) + ∫
Using the calibrated threshold c_r(m, α) derived in Theorem 4, we have

β̃_m(ρ, r, α) = Pr{ (1/m) Σ_{i=1}^m p_{mi}^r ≥ C_r(m, α) } = Pr{ Σ_{i=1}^m m^r ( p_{mi}^r − a_{rm} ) ≥ α^r },   (126)

where C_r(m, α) = c_r(m, α)^r, a_{rm} = 0 for r < −1, and a_{rm} = log m for r = −1. Therefore, we only need to prove that Σ_{i=1}^m m^r ( p_{mi}^r − a_{rm} ) → ∞ with probability one. Since

Σ_{i=1}^m m^r ( p_{mi}^r − a_{rm} ) ≥ max_i{ m^r ( p_{mi}^r − a_{rm} ) } = ( m min_i{ p_{mi} } )^r − m^r a_{rm},   (127)

and with part (a) we have

min_i{ p_{mi} } = Φ( −√(1−ρ) max_i{ Z_i + μ_{mi}/√(1−ρ) } − √ρ Z )
= Φ( −√(1−ρ) √(2 log m) − μ_m − √ρ Z ) + o_p(1)
= O_p( m^{−(√(1−ρ) + √c)²} ),   (128)

writing μ_m = √(2c log m). Therefore, we have

( m min_i{ p_{mi} } )^r − m^r a_{rm} = O_p( m^{−r( (√(1−ρ) + √c)² − 1 )} ) → ∞ w.p. 1, since √c > 1 − √(1−ρ),   (129)

and hence we have proved the claim in part (a). Similarly, for part (b) we have

min_i{ p_{mi} } = Φ( −√(1−ρ) max_i{ Z_i + μ_{mi}/√(1−ρ) } − √ρ Z )   (130)
≤ Φ( −√(1−ρ) √(2γ log m) − μ_m − √ρ Z ) + o_p(1)   (131)
= O_p( m^{−(√(γ(1−ρ)) + √c)²} ).   (132)

Therefore, we have

( m min_i{ p_{mi} } )^r − m^r a_{rm} = O_p( m^{−r( (√(γ(1−ρ)) + √c)² − 1 )} ) → ∞ w.p. 1, since √c > 1 − √(γ(1−ρ)),