Inference for Moment Inequalities: A Constrained Moment Selection Procedure
Rami V. Tabri∗ and Christopher D. Walker†

Abstract
Inference in models where the parameter is defined by moment inequalities is of interest in many areas of economics. This paper develops a new method for improving the performance of generalized moment selection (GMS) testing procedures in finite samples. The method modifies GMS tests by tilting the empirical distribution in its moment selection step by an amount that maximizes the empirical likelihood subject to the restrictions of the null hypothesis. We characterize sets of population distributions on which a modified GMS test is (i) asymptotically equivalent to its non-modified version to first order, and (ii) superior to its non-modified version according to local power when the sample size is large enough. An important feature of the proposed modification is that it remains computationally feasible even when the number of moment inequalities is large. We report simulation results that show the modified tests control size well and have markedly improved local power over their non-modified counterparts.
Keywords: empirical likelihood, moment inequality model, statistical information.
JEL Classification: C12, C14, C21

∗ School of Economics, The University of Sydney, Sydney, New South Wales 2006, Australia, Tel: +61 2 9351 3092, Fax: +61 2 9351 4341, Email: [email protected].
† Corresponding author. Department of Economics, Harvard University, Cambridge, MA 02138, United States of America, Email: [email protected].

1 Introduction
Statistical inference in models defined by moment inequalities is a frequently encountered topic in econometrics. Examples of applications include games of entry with multiple equilibria (e.g., Ciliberto and Tamer, 2009), single/multiple agent optimization problems (e.g., Pakes et al., 2015), censored and missing data (e.g., Manski and Tamer, 2002; Imbens and Manski, 2004), model selection tests (e.g., Shi, 2015, and Hsu and Shi, 2017), event-study designs (e.g., Rambachan and Roth, 2019), stochastic dominance comparisons (e.g., Whang, 2019), and New-Keynesian DSGE models (e.g., Moon and Schorfheide, 2009). This paper considers inference for a finite-dimensional parameter defined by a finite number of unconditional moment inequalities.

We suppose that there exists a true value of the parameter $\theta \in \Theta \subseteq \mathbb{R}^d$ that satisfies the moment inequality restrictions

  $E_F\big(g_j(W_i, \theta)\big) \ge 0, \quad j = 1, \dots, J,$  (1.1)

where $\{g_j(\cdot, \theta) : j = 1, \dots, J\}$ are known real-valued functions, $\{W_i : i \le n\}$ are independent and identically distributed (i.i.d.) with unknown distribution $F$, and $W_i \in \mathbb{R}^{\dim(W_i)}$. Under these moment conditions, the set $\Theta_I(F) \equiv \{\theta \in \Theta : E_F\big(g_j(W_i,\theta)\big) \ge 0 \ \forall j = 1, \dots, J\}$ denotes the so-called identified set, while any $\theta \in \Theta_I(F)$ is termed an identifiable parameter. Thus, the true value of the parameter might not be uniquely identified by $F$ and the economic model.

We are interested in confidence sets for $\theta$ constructed by test inversion. The test is based on a statistic $T_n$, for testing individual hypotheses for each $\theta$ of the form

  $H_0 : \theta \in \Theta_I(F)$ versus $H_1 : \theta \notin \Theta_I(F)$.  (1.2)

Inference in this model is challenging because the pointwise limiting null distribution of conventional test statistics is discontinuous in the parameter: the dependence on the parameter is through the index set of moment inequalities in (1.1) that are binding. In particular, a moment inequality enters the pointwise asymptotic null distribution of the test statistic $T_n$ whenever it holds as an equality. Tests of (1.2) that have good properties incorporate information about which moments $E_F\big(g_j(W_i,\theta)\big)$ are "positive", in order to exclude them from the computation of a critical value. Tests of this sort are known as two-step procedures in the literature, examples of which include Andrews and Soares (2010), Canay (2010), Andrews and Barwick (2012a), and Romano et al. (2014). The first step of those testing procedures uses the data to determine whether the moment inequalities (1.1) are close to or far from being equalities. The second step uses the outcome of the first step to yield information about which moment inequalities are "positive" when constructing tests of (1.2).

The literature on two-step tests of (1.2) is vast, and almost all of these tests use the sample-analogue estimator of the moments $E_F\big(g_j(W_i,\theta)\big)$ in the first step to determine the slackness of the moment inequalities. This feature ignores the information present in the restrictions (1.1), because the sample-analogue estimator does not exploit the fact that the moments satisfy these restrictions under the null hypothesis in (1.2). Thus, we conjecture that implementing this information in such tests can improve their accuracy in finite samples under the null and alternative hypotheses.
This paper provides such a modification for the broad class of generalized moment selection (GMS) testing procedures put forward by Andrews and Soares (2010), and finds that our conjecture is in the right direction.

We propose a modification of GMS testing procedures that implements the information present in (1.1) using the method of empirical likelihood (Owen, 2001). The modification is to replace the sample-analogue estimator of the moments in the first step of the GMS procedure with its constrained empirical likelihood counterpart, where the constraints are the moment inequalities (1.1). We label this modification constrained moment selection (CMS). For a given test statistic and moment selection function, the CMS and GMS tests only differ in terms of which moments they select for the computation of the critical value in tests of (1.2). The motivation for our proposal is that the detection of the "positive" moment inequalities in the first step would be more accurate because we are using additional information that is available to us, which the sample-analogue estimator of the moments ignores. Consequently, the CMS procedure alters the GMS critical value for testing (1.2) in a data-dependent way that incorporates the information contained in (1.1) through a reduction of the parameter space for $F$. For this reason, we expect CMS tests of (1.2) to be more accurate than their GMS counterparts in finite samples.

This paper characterises the parameter space for $(\theta, F)$ over which the CMS and GMS testing procedures are asymptotically equivalent, to first order, under the null, local alternatives, and distant alternatives. We focus, though, on the GMS class of testing procedures in which the moment selection function is given by the moment selection $t$-test. This focus is without loss of generality, as the results extend naturally, with appropriate modifications, to the more general setup in Andrews and Soares (2010) using their assumptions. This means that for a given test statistic, CMS tests inherit all of the asymptotic properties of GMS tests. Specifically, under the null, CMS confidence sets are asymptotically valid with uniformity over the parameter space, not asymptotically conservative, and not asymptotically similar. Furthermore, CMS tests of (1.2) have greater asymptotic local power than tests based on subsampling or fixed asymptotic critical values, and are consistent against distant alternatives. The parameter space imposes only three conditions in addition to the conditions that define the parameter space Andrews and Soares (2010) introduce. These conditions are part of Assumption GEL in Andrews and Guggenberger (2009): (i) a uniform bound on the variances of the moment functions, (ii) a lower bound on the determinant of their correlation matrix, and (iii) a regularity condition on an estimator of the degree of slackness of the moments arising from the dual formulation of the constrained empirical likelihood problem. Collectively, the conditions that define our parameter space enable the use of results from Andrews and Guggenberger (2009) on constrained empirical likelihood estimation in our proofs of the aforementioned asymptotic results.

While GMS and CMS are asymptotically equivalent procedures, we characterise local alternatives under which the power of CMS tests dominates their GMS counterparts for sufficiently large, but finite, samples. These are directions in the alternative that have some non-violated moment inequalities (SNVIs) and a non-negative correlational structure.
That is, configurations where some of the moments $E_F\big(g_j(W_i,\theta)\big)$ under the alternative hypothesis are "positive", and the covariance matrix of $\{g_j(W_i,\theta) : j = 1, \dots, J\}$ has non-negative entries only. The non-negative correlational structure arises in empirical applications; see, for example, Lok and Tabri (in press), who point to that structure for moment inequalities characterising stochastic dominance comparisons. It is quite difficult to determine the extent of this difference in local powers analytically. However, using a Monte Carlo simulation experimental design based on Andrews and Barwick (2012a), who focus on finite-sample comparisons of the maximum null rejection probability (MNRP), we show using the modified method of moments (MMM) statistic that along such local alternatives the differences in MNRP-corrected powers of CMS and GMS tests can be approximately 36 percentage points when $J = 4$ and $n = 250$, which is strikingly large. See Section 4 for more details.

The two-step tests in this literature that exploit the information (1.1) are the procedures put forward by Andrews and Guggenberger (2009) and Canay (2010). They implement this information using (generalised) empirical likelihood. Andrews and Guggenberger (2009) and Canay (2010) develop subsampling and bootstrap tests of (1.2), respectively, using empirical-likelihood-type test statistics. Both tests have correct asymptotic size in a uniform sense and are shown not to be asymptotically conservative. However, Canay (2010)'s test has higher asymptotic power because it is a GMS procedure. More generally, Andrews and Soares (2010) show that the asymptotic power of GMS tests dominates that of subsampling and plug-in asymptotic tests. A disadvantage of Canay (2010)'s procedure is that it may be more computationally burdensome than other GMS tests. Thus, our modification of GMS tests can improve finite-sample performance without incurring a high computational cost.

Andrews and Barwick (2012a) proposed a refinement of GMS termed refined moment selection (RMS) and discussed the reasons why such an approach is preferable. However, the RMS procedure is quite computationally expensive when $J > 10$. By contrast, the CMS procedure remains computationally feasible when $J$ is large. The reason is that the constrained empirical likelihood optimization problem it is based upon has a strictly concave objective function and a convex feasible set, and the choice variables enter linearly into the constraints. As a consequence, there is a unique global solution to this optimization problem, and its implementation involves an off-the-shelf programming routine. More recently, Romano et al. (2014) proposed a two-step testing procedure for moment inequalities that is similar in spirit to the RMS procedure and remains computationally feasible when $J$ is large. An important distinction between the CMS testing procedure and these tests is that, like GMS tests, neither of them exploits the information present in the moment inequality constraints (1.1), because they employ the sample-analogue estimator of the moments in their first step.

We examine the finite-sample performance of CMS tests using the MMM and adjusted quasi-likelihood-ratio (AQLR) test statistics in Monte Carlo simulations based on the experimental design in Andrews and Barwick (2012a). The experiment compares the performance of CMS to its GMS, RMS, and RSW counterparts in terms of MNRP and MNRP-corrected local power.
The inclusion of the RMS and RSW procedures in the simulation experiment is to benchmark the performance of CMS. Overall, the simulation results showcase the value of implementing the information (1.1) in the CMS procedure in terms of finite-sample size and power properties, and corroborate its theoretically superior performance over GMS. The simulation results also show that the performance of CMS and RMS tests based on the AQLR statistic is comparable. This finding is encouraging, as the RMS test has desirable asymptotic properties but can be computationally expensive when $J$ is large, while the CMS procedure is inexpensive to compute.

The idea of exploiting information on parameters defined by constraints for improving performance in statistical problems, through constrained estimation, is one of the most natural ideas in statistics. The literature on constrained estimation via tilting the empirical distribution overlaps with this paper, where the problem is that the constraints/information are not adequately reflected by the empirical distribution (e.g., Hall and Presnell, 1999). Tilting the empirical distribution allows one to incorporate information selectively into a statistical procedure without changing the procedure itself. Lok and Tabri (in press) apply this idea to modifying two-step bootstrap tests for restricted stochastic dominance orderings using empirical likelihood and semi-infinite programming. The parameter of interest in their setup is infinite-dimensional and there is a continuum of moment inequality restrictions, which are defined by moment functions that have a particular form. The form of the moment functions in their setup yields a correlational structure that facilitates the analysis of such moment inequalities. Contrastingly, in the setup of this paper, $J$ is finite and the form of the moment functions $\{g_j(\cdot,\theta) : j = 1, \dots, J\}$ is arbitrary. The implementation of empirical likelihood in their setup has a data-driven number of inequalities that increases with the sample size, which can be as large as 500 in moderate sample sizes. The ability of empirical likelihood to straightforwardly execute with a large number of moment inequality restrictions transfers to the CMS procedure for models with large $J$. This computational feasibility of CMS is an important feature of our approach. Similar to Lok and Tabri (in press), this paper is also part of the econometrics literature on shape restrictions (e.g., Chetverikov et al., 2018, and the references therein), as the inequalities (1.1) can be thought of as finite-dimensional analogues of shape restrictions on nonparametric functions.

We organize the paper as follows. Section 2 introduces the statistical framework, as well as the GMS and CMS procedures. Section 3 introduces the main results of the paper. Section 4 reports the results of Monte Carlo simulations, and Section 5 concludes.

For notational simplicity, throughout the paper we write partitioned column vectors as $h = (h_1, h_2)$ rather than $h = (h_1', h_2')'$. Let $\mathbb{R}_+ = \{x \in \mathbb{R} : x \ge 0\}$, $\mathbb{R}_{+,\infty} = \mathbb{R}_+ \cup \{+\infty\}$, $\mathbb{R}_{[+\infty]} = \mathbb{R} \cup \{+\infty\}$, and $\mathbb{R}_{[\pm\infty]} = \mathbb{R} \cup \{\pm\infty\}$, let ":=" denote the definitional identity, and let $\bar{A}$ denote the closure of a set $A$.

2 The Statistical Framework

The object of interest is a parameter $\theta \in \Theta \subseteq \mathbb{R}^d$, $d < +\infty$, defined by a finite number of known moment functions $g_j : \mathcal{W} \times \Theta \to \mathbb{R}$ that satisfy the following unconditional moment inequality restrictions:

  $E_F\big(g_j(W, \theta)\big) \ge 0 \quad \forall j \in \mathcal{J},$  (2.1)

where $F$ denotes the true distribution of the observed data $W$ and $\mathcal{J} := \{1, \dots, J\}$ with $J < \infty$.
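As a stylized illustration of (2.1) (an example we add for concreteness, in the spirit of the missing-data applications of Manski and Tamer, 2002), suppose the researcher observes an interval $[W^L, W^U]$ that is known to contain an unobserved outcome, and let $\theta$ denote the outcome's mean. With $W = (W^L, W^U)$, $g_1(W, \theta) = W^U - \theta$, and $g_2(W, \theta) = \theta - W^L$, the restrictions (2.1) read

  $E_F(W^U) - \theta \ge 0 \quad \text{and} \quad \theta - E_F(W^L) \ge 0,$

so that $\Theta_I(F) = [E_F(W^L), E_F(W^U)]$: the mean is partially identified, and every point of this interval is an identifiable parameter.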
In general, the identified set, $\Theta_I(F) = \{\theta \in \Theta : E_F(g_j(W,\theta)) \ge 0 \ \forall j \in \mathcal{J}\}$, is not a singleton, meaning that the parameter is partially identified.

The moment inequality model is given by the following definition.

Definition 1. [Moment Inequality Model] Let $\mathcal{F}$ be the set of parameters $(\theta, F)$ that satisfy:
1. $\theta \in \Theta \subseteq \mathbb{R}^d$.
2. $\{W_i : i \ge 1\}$ are i.i.d. under $F$.
3. $E_F\big(g_j(W_i,\theta)\big) \ge 0$ for $j \in \mathcal{J}$.
4. $\sigma^2_{F,j}(\theta) := \mathrm{Var}_F\big(g_j(W_i,\theta)\big) \in [\varepsilon_*, M_*]$ for some $M_* > \varepsilon_* > 0$.
5. $\Omega(\theta, F) \in \Psi$, where $\Omega(\theta,F)$ is the $J \times J$ correlation matrix of $\{g_j(W_i,\theta), j = 1, \dots, J\}$, and $\Psi$ is the space of correlation matrices whose determinant is greater than $\varepsilon > 0$.
6. There exist $\delta > 0$ and $M < +\infty$ such that $E_F\big|g_j(W_i,\theta)/\sigma_{F,j}(\theta)\big|^{2+\delta} \le M$ for all $j \in \mathcal{J}$.

All of the conditions in this definition, except for Conditions 4 and 5, are those presented in (2.2) of Andrews and Soares (2010). Condition 4 is a strengthening of Condition (v) in Andrews and Soares (2010) so that the variances of the moment functions are uniformly bounded. Condition 5 specifies the nonsingularity of the matrix $\Omega(\theta,F)$. These conditions are relatively unrestrictive and are part of Assumption GEL in Andrews and Guggenberger (2009). Furthermore, they arise frequently in papers that consider empirical likelihood inference for moment inequalities (e.g., Canay, 2010, and Lok and Tabri, in press).

For a given value of the parameter, $\theta = \theta_0$, we invert tests of the hypothesis $H_0 : \theta_0 \in \Theta_I(F)$ to construct confidence sets of the form $CS_n = \{\theta \in \Theta : T_n(\theta) \le c_{1-\alpha}(\theta)\}$, where $T_n(\theta)$ denotes a test statistic and $c_{1-\alpha}(\theta)$ is a critical value for tests with nominal level $\alpha \in (0, 1/2)$. $CS_n$ is a uniformly valid confidence set for $\theta$ if

  $\liminf_{n \to +\infty} \inf_{(\theta,F) \in \mathcal{F}} P_F\big(T_n(\theta) \le c_{1-\alpha}(\theta)\big) \ge 1 - \alpha,$  (2.2)

where $P_F(\cdot)$ is the probability measure induced by repeated sampling from $F$. Uniformity is essential in order for asymptotic size to be a good approximation to the finite-sample size of confidence sets, because the test statistic exhibits a discontinuity in its asymptotic distribution (as a function of the distribution generating the data), but not in its finite-sample distribution. Discontinuities of this type can create asymptotic size problems that are analogous to those that arise with parameters that are near a boundary (e.g., Andrews and Guggenberger, 2009).

A test statistic is a function $S : \mathbb{R}^J_{[+\infty]} \times \mathcal{V}^{J \times J} \to \mathbb{R}$ given by $T_n(\theta) := S\big(n^{1/2}\hat{g}_n(\theta), \hat{\Sigma}_n(\theta)\big)$, where $\mathcal{V}^{J \times J}$ is the set of invertible $J \times J$ variance matrices, $\hat{g}_n(\theta) = \big[n^{-1}\sum_{i=1}^n g_1(W_i,\theta), \dots, n^{-1}\sum_{i=1}^n g_J(W_i,\theta)\big]^{\top}$, $g(W_i,\theta) = \big[g_1(W_i,\theta), \dots, g_J(W_i,\theta)\big]^{\top}$, and $\hat{\Sigma}_n(\theta) = n^{-1}\sum_{i=1}^n (g(W_i,\theta) - \hat{g}_n(\theta))(g(W_i,\theta) - \hat{g}_n(\theta))^{\top}$. Two examples are the modified method of moments (MMM) and adjusted quasi-likelihood-ratio (AQLR) statistics. In the context of the moment inequality model $\mathcal{F}$ given by Definition 1, these test statistics are defined as

  $S_1\big(n^{1/2}\hat{g}_n(\theta), \hat{\Sigma}_n(\theta)\big) = n\sum_{j=1}^J \big(\min\{0, \hat{g}_{n,j}(\theta)/\hat{\sigma}_{n,j}(\theta)\}\big)^2$ and  (2.3)

  $S_{2A}\big(n^{1/2}\hat{g}_n(\theta), \hat{\Sigma}_n(\theta)\big) = n \inf_{t \in \mathbb{R}^J_{+,\infty}} (\hat{g}_n(\theta) - t)^{\top}\big(\tilde{\Sigma}_n(\theta)\big)^{-1}(\hat{g}_n(\theta) - t),$  (2.4)

respectively, where $\tilde{\Sigma}_n(\theta) = \hat{\Sigma}_n(\theta) + \max\{0, 0.012 - \det(\hat{\Omega}_n(\theta))\}\hat{D}_n(\theta)$, $\hat{\Omega}_n(\theta) = \hat{D}_n^{-1/2}(\theta)\hat{\Sigma}_n(\theta)\hat{D}_n^{-1/2}(\theta)$, and $\hat{D}_n(\theta) = \mathrm{diag}\,\hat{\Sigma}_n(\theta)$, where $\mathrm{diag}\,\hat{\Sigma}_n(\theta)$ is a diagonal matrix with dimensions equal to those of $\hat{\Sigma}_n(\theta)$ whose diagonal elements equal those of $\hat{\Sigma}_n(\theta)$.
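For concreteness, a minimal sketch in R of the MMM statistic (2.3) (our own illustration, not the paper's code; `gmat` denotes the $n \times J$ matrix with rows $g(W_i, \theta)^{\top}$):

```r
# Minimal sketch: the MMM statistic S_1 in (2.3). gmat is the n x J matrix
# whose (i, j) entry is g_j(W_i, theta); the 1/n variance convention matches
# the definition of \hat{Sigma}_n(theta) above.
mmm_stat <- function(gmat) {
  n <- nrow(gmat)
  gbar <- colMeans(gmat)                            # \hat g_n(theta)
  sigma <- sqrt(colMeans(sweep(gmat, 2, gbar)^2))   # \hat sigma_{n,j}(theta)
  n * sum(pmin(0, gbar / sigma)^2)                  # n * sum_j min{0, gbar_j/sigma_j}^2
}
```

The AQLR statistic (2.4) is analogous but requires minimizing a quadratic form over $t \in \mathbb{R}^J_+$, which is itself a small convex program.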
2.1 The GMS Procedure

The point of departure for establishing that (2.2) holds for the GMS procedure is to consider the asymptotic distribution of $T_n(\theta)$ under a suitable sequence of null distributions. For any sequence $\{F_n : n \ge 1\}$ in the model of the null hypothesis, the test statistic satisfies

  $T_n(\theta) \xrightarrow{d} S\big(\Omega_0^{1/2} Z^* + h_1, \Omega_0\big)$, where $Z^* \sim N(0_J, I_J)$,  (2.5)

$h_1 \in \mathbb{R}^J_{+,\infty}$, and $\Omega_0$ is a $J \times J$ correlation matrix. Specifically, this large-sample result (2.5) follows from the form of the test statistic, the Central Limit Theorem, and the convergence in probability of the sample correlation matrix. The vector $h_1 = (h_{1,1}, \dots, h_{1,J})$ has elements given by $\lim_{n \to +\infty} n^{1/2} E_{F_n}\big(g_j(W_i,\theta)\big)/\sigma_{F_n,j}(\theta)$ and measures the degree of slackness of the moment inequalities. The crux of this asymptotic construction is that the limiting distribution in (2.5) now depends continuously on the degree of slackness of the moment inequalities via the parameter $h_1$, which reflects the finite-sample situation.

The asymptotic implementation of the GMS critical value is the $1-\alpha$ quantile of a data-dependent version of the asymptotic null distribution in (2.5). It replaces $\Omega_0$ by a consistent estimator and replaces $h_1$ with a function $\varphi : \mathbb{R}^J \times \Psi \to \mathbb{R}^J_{+,\infty}$, which measures the slackness of the moment inequalities through $\hat{\xi}_n(\theta) = \kappa_n^{-1} n^{1/2} \hat{D}_n^{-1/2}(\theta)\hat{g}_n(\theta)$, where $\{\kappa_n : n \ge 1\}$ is a divergent sequence of scalars (Andrews and Soares, 2010). The GMS critical value, $\hat{c}_n(\theta_0, 1-\alpha)$, is the $1-\alpha$ quantile of

  $L_n(\theta_0, Z^*) = S\big(\hat{\Omega}_n^{1/2}(\theta_0) Z^* + \varphi(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0)),\, \hat{\Omega}_n(\theta_0)\big),$  (2.6)

where $Z^* \sim N(0_J, I_J)$ and is independent of $\{W_i : i \ge 1\}$. That is,

  $\hat{c}_n(\theta_0, 1-\alpha) := \inf\big\{x \in \mathbb{R} : P\big(L_n(\theta_0, Z^*) \le x\big) \ge 1-\alpha\big\},$  (2.7)

where $P\big(L_n(\theta_0, Z^*) \le x\big)$ denotes the conditional CDF at $x$ of $L_n(\theta_0, Z^*)$, conditional upon $(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0))$. In practice, the calculation of $\hat{c}_n(\theta_0, 1-\alpha)$ is by simulating $L_n(\theta_0, Z^*)$ using $R$ i.i.d. draws from $Z^* \sim N(0_J, I_J)$ and computing the $1-\alpha$ quantile of the empirical CDF from $\{L_n(\theta_0, Z^*_r) : r = 1, \dots, R\}$.

Alternatively, one may compute the GMS critical value using the bootstrap. We briefly describe this approach. Let $\{W_i^* : i \le n\}$ be a bootstrap sample drawn from the empirical distribution of the data $\{W_i : i \le n\}$, and define $\hat{g}_n^*(\theta) = n^{-1}\sum_{i=1}^n g(W_i^*, \theta)$, $\hat{\Sigma}_n^*(\theta) = n^{-1}\sum_{i=1}^n (g(W_i^*,\theta) - \hat{g}_n^*(\theta))(g(W_i^*,\theta) - \hat{g}_n^*(\theta))^{\top}$, $\hat{D}_n^*(\theta) = \mathrm{diag}\,\hat{\Sigma}_n^*(\theta)$, and $\hat{\Omega}_n^*(\theta) = (\hat{D}_n^*(\theta))^{-1/2}\hat{\Sigma}_n^*(\theta)(\hat{D}_n^*(\theta))^{-1/2}$. The bootstrap implementation of the GMS procedure replaces $L_n(\theta_0, Z^*)$ in (2.6) with

  $L_n(\theta_0, \{W_i^* : i \le n\}) = S\big(G_n^*(\theta_0) + \varphi(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0)),\, \hat{\Omega}_n^*(\theta_0)\big)$, where $G_n^*(\theta_0) = n^{1/2}(\hat{D}_n^*(\theta_0))^{-1/2}(\hat{g}_n^*(\theta_0) - \hat{g}_n(\theta_0)),$

and defines a critical value analogous to (2.7). In practice, this critical value is the empirical $1-\alpha$ quantile of the bootstrap statistics $\{L_n(\theta_0, \{W_{i,r}^* : i \le n\}) : r = 1, \dots, R\}$, where $\{\{W_{i,r}^* : i \le n\} : r = 1, \dots, R\}$ are bootstrap samples drawn from the empirical distribution of the data $\{W_i : i \le n\}$.
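A minimal sketch of the simulated critical value (2.7) for the MMM statistic, using the 'moment selection $t$-test' choice $\varphi^{(1)}$ defined in (2.8) below and $\kappa_n = (\ln n)^{1/2}$ (again our own illustration, not the paper's code):

```r
# Minimal sketch: simulated GMS critical value (2.7) for the MMM statistic,
# with the selection function phi^(1) of (2.8) and kappa_n = sqrt(log n).
gms_critical_value <- function(gmat, alpha = 0.05, R = 1000) {
  n <- nrow(gmat); J <- ncol(gmat)
  gbar <- colMeans(gmat)
  Sigma <- crossprod(sweep(gmat, 2, gbar)) / n          # \hat Sigma_n(theta)
  Dhalf <- sqrt(diag(Sigma))                            # \hat D_n^{1/2}(theta) diagonal
  xi <- sqrt(n) * gbar / (sqrt(log(n)) * Dhalf)         # \hat xi_n(theta)
  keep <- xi <= 1                                       # phi_j^(1) = 0 iff xi_j <= 1
  Omega_half <- chol(Sigma / tcrossprod(Dhalf))         # a square root of \hat Omega_n
  draws <- replicate(R, {
    Zstar <- drop(crossprod(Omega_half, rnorm(J)))      # \hat Omega_n^{1/2} Z*
    sum(pmin(0, Zstar[keep])^2)                         # S_1 with selected moments only
  })
  quantile(draws, 1 - alpha, names = FALSE)
}
# The GMS test rejects H_0 when mmm_stat(gmat) > gms_critical_value(gmat).
```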
The asymptotic results of this paper hold for the bootstrap provided that $G_n^*(\theta_{n,h}) \xrightarrow{d} \Omega_0^{1/2} Z^*$, where the convergence is conditional on $\{W_i : i \le n\}$ for almost every sample path, for all sequences $\{(\theta_{n,h}, F_{n,h}) : n \ge 1\}$ in $\mathcal{F}$.

There are numerous choices for $\varphi$ and $\{\kappa_n : n \ge 1\}$. Chernozhukov et al. (2007) and Andrews and Soares (2010) recommend using $\kappa_n = (\ln n)^{1/2}$. Another option is to set $\kappa_n = (2\ln\ln n)^{1/2}$, which is used in Canay (2010). Our main results set $\varphi = \varphi^{(1)}$, where

  $\varphi_j^{(1)}(\xi, \Omega) = 0$ if $\xi_j \le 1$ and $\varphi_j^{(1)}(\xi, \Omega) = +\infty$ if $\xi_j > 1$, for $j \in \mathcal{J}$,  (2.8)

which is referred to as the 'moment selection $t$-test' because it resembles a $t$-test with deterministic critical value $\kappa_n$. The decision reflects the recommendations of Andrews and Barwick (2012a), and is essentially without loss of generality because our results extend to any choice of $\varphi$ that satisfies the assumptions of Andrews and Soares (2010). Appendix F.1 discusses how to generalize our results to other suitable choices of $\varphi$.

The advantage of the GMS procedure is that it asymptotically detects the "positive" moments $E_F(g_j(W_i,\theta))$ and excludes them from the computation of the critical value, so as to mimic the discontinuity in the asymptotic null distribution of $T_n(\theta)$. This ability of GMS tests to detect such moments is the source of its improvements over the subsampling and plug-in procedures under the null and alternative hypotheses.

2.2 The CMS Procedure

Although GMS tests are computationally simple and have desirable asymptotic properties, their performance in finite samples depends crucially on how well they detect the "positive" moments, so as to omit them from the computation of the critical value. Their use of the sample-analogue estimator of the moments for detecting the positive moments does not implement the information embedded in (2.1), and implementing this information appropriately can improve the detection accuracy of "positive" moments in finite samples.

For a given moment selection function $\varphi$, the CMS procedure implements the information present in (2.1) through a surgical modification of the GMS procedure. The modification is to replace $\hat{g}_n(\theta)$ with its constrained empirical likelihood counterpart, where the constraints impose the inequality restrictions (2.1). Specifically, CMS replaces $\hat{\xi}_n(\theta)$ with $\acute{\xi}_n(\theta) = \kappa_n^{-1} n^{1/2} \hat{D}_n^{-1/2}(\theta)\acute{g}_n(\theta)$, where $\acute{g}_n(\theta) = \sum_{i=1}^n \acute{p}_i\, g(W_i, \theta)$ and the probabilities $\acute{p}_1, \dots, \acute{p}_n$ solve

  $\max_{p_1,\dots,p_n} \Big\{\sum_{i=1}^n \ln(p_i) : \sum_{i=1}^n p_i g_j(W_i,\theta) \ge 0 \ \forall j \in \mathcal{J}, \ \sum_{i=1}^n p_i = 1, \ p_i \ge 0 \ \forall i\Big\},$  (2.9)

and then computes a critical value as described in (2.7), but replaces $\varphi(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0))$ with $\varphi(\acute{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0))$ in (2.6). The CMS modification of GMS can easily be applied to all choices of $\varphi$ and $\{\kappa_n : n \ge 1\}$ presented in Andrews and Soares (2010) because it only replaces $\hat{g}_n(\theta)$ with $\acute{g}_n(\theta)$. The estimator $\acute{g}_{n,j}(\theta)$ of $E_F\big(g_j(W,\theta)\big)$ is more accurate than $\hat{g}_{n,j}(\theta)$ because the optimization problem (2.9), which gives rise to $\acute{g}_n(\theta)$, imposes the correct constraint $E_F\big(g_j(W,\theta)\big) \ge 0$, while $\hat{g}_n(\theta)$ ignores such information. Thus, when $E_F\big(g_j(W,\theta)\big) > 0$, $\varphi_j(\acute{\xi}_n(\theta), \hat{\Omega}_n(\theta))$ detects this configuration more reliably than $\varphi_j(\hat{\xi}_n(\theta), \hat{\Omega}_n(\theta))$, and therefore takes it into account by delivering a critical value that is suitable for the case where this moment inequality is omitted.
This feature of CMS leads to it having better finite-sample properties than GMS under the null and alternative hypotheses.

The CMS procedure is not computationally expensive because the empirical likelihood optimization problem (2.9) has a strictly concave objective function and a convex feasible set that is characterised by affine functions of the choice variables (Owen, 2001). This means that the optimization problem (2.9) has a unique global solution, and it can be computed numerically using standard optimization routines in software such as Matlab, R, or GAUSS. This computational simplicity of the optimization problem (2.9) is an important feature of the CMS procedure.
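Concretely, (2.9) can be handed to any off-the-shelf convex solver; a minimal sketch using the R package CVXR (our own illustration, not necessarily the routine used for the simulations in Section 4):

```r
# Minimal sketch: the constrained EL weights of (2.9) via the CVXR package.
library(CVXR)
cms_el_weights <- function(gmat) {        # gmat: n x J matrix of g_j(W_i, theta)
  n <- nrow(gmat)
  p <- Variable(n)
  constraints <- list(
    sum(p) == 1,                          # weights form a probability vector
    p >= 0,
    t(gmat) %*% p >= 0                    # sum_i p_i g_j(W_i, theta) >= 0, j in J
  )
  result <- solve(Problem(Maximize(sum(log(p))), constraints))
  drop(result$getValue(p))                # \acute p_1, ..., \acute p_n
}
```

The tilted moment estimate is then $\acute{g}_n(\theta)$ = `drop(t(gmat) %*% cms_el_weights(gmat))`, and the CMS critical value re-uses the GMS construction with the slackness measure built from $\acute{g}_n(\theta)$ instead of $\hat{g}_n(\theta)$.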
Remark 1. One can 'fully constrain' the CMS procedure by using restricted estimators of the correlation matrix. In this case, we evaluate $\varphi(\acute{\xi}_n^{FC}(\theta), \acute{\Omega}_n(\theta))$, where

  $\acute{\xi}_n^{FC}(\theta) = \kappa_n^{-1} n^{1/2} \acute{D}_n^{-1/2}(\theta)\acute{g}_n(\theta), \quad \acute{\Omega}_n(\theta) = \acute{D}_n^{-1/2}(\theta)\acute{\Sigma}_n(\theta)\acute{D}_n^{-1/2}(\theta),$
  $\acute{\Sigma}_n(\theta) = \sum_{i=1}^n \acute{p}_i\,(g(W_i,\theta) - \acute{g}_n(\theta))(g(W_i,\theta) - \acute{g}_n(\theta))^{\top}, \quad \text{and} \quad \acute{D}_n(\theta) = \mathrm{diag}\,\acute{\Sigma}_n(\theta).$

In simulations not presented in this paper, we find limited practical difference between $\varphi(\acute{\xi}_n^{FC}(\theta), \acute{\Omega}_n(\theta))$ and $\varphi(\acute{\xi}_n(\theta), \hat{\Omega}_n(\theta))$. Consequently, the rest of the paper focuses on $\varphi(\acute{\xi}_n(\theta), \hat{\Omega}_n(\theta))$ because it is simpler to show that there are power advantages over GMS.
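Continuing the sketch above, the fully constrained quantities of Remark 1 follow directly from the tilted weights (again our own illustration):

```r
# Weighted mean, covariance, and correlation under the EL weights \acute p.
p_acute <- cms_el_weights(gmat)
g_acute <- drop(t(gmat) %*% p_acute)                  # \acute g_n(theta)
centered <- sweep(gmat, 2, g_acute)
Sigma_acute <- crossprod(centered * sqrt(p_acute))    # sum_i p_i (g_i - g)(g_i - g)^T
Dhalf_acute <- sqrt(diag(Sigma_acute))
Omega_acute <- Sigma_acute / tcrossprod(Dhalf_acute)  # \acute Omega_n(theta)
```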
3 Main Results

We start by introducing the assumptions that beget the main results of this paper. They are conditions on the test statistic $S$, the moment selection function $\varphi$, and the parameter space $\mathcal{F}$. The assumptions on $S$ we consider are from Andrews and Soares (2010), and are stated as Assumptions 1-7 in Appendix B for ease of exposition. Recall that we set $\varphi = \varphi^{(1)}$ in (2.8), and the main results we present are based on this choice of moment selection function. It should be noted that this choice of $\varphi$ is without loss of generality, as one can employ assumptions identical to those in Andrews and Soares (2010) on $\varphi$ to deduce the same conclusions, because the CMS procedure does not alter the moment selection function in the GMS procedure. See Appendix F.1 for the details on other choices of $\varphi$.

The first assumption concerns the sequence $\{\kappa_n : n \ge 1\}$.

Assumption K.
1. $\kappa_n \to +\infty$ as $n \to +\infty$.
2. $\kappa_n^{-1} n^{1/2} \to +\infty$ as $n \to +\infty$.

The conditions in this assumption are not restrictive: the aforementioned examples of $\{\kappa_n : n \ge 1\}$ satisfy them. The 'optimal' choice of $\{\kappa_n : n \ge 1\}$ is an important question, but the goal of our paper is more modest: to demonstrate how incorporating statistical information can improve finite-sample inference for moment inequalities in a computationally simple way, and, for this purpose, our analysis conditions on an arbitrary choice of $\{\kappa_n : n \ge 1\}$. For our Monte Carlo experiment (Section 4), we set $\kappa_n = (\ln n)^{1/2}$, which is the recommended choice in Chernozhukov et al. (2007) and Andrews and Soares (2010).

The next assumption we present is the first part of Part (d) of Assumption GEL in Andrews and Guggenberger (2009). It is helpful in establishing that $\acute{g}_n(\theta)$ is a uniformly consistent estimator of the moments under the null hypothesis $H_0 : \theta_0 \in \Theta_I(F)$. To introduce this assumption, for each $t \in \mathbb{R}^J$, define $g_i(t, \theta) = g(W_i, \theta) - t$. The vector $t$ is a nuisance parameter that captures the slackness of the moment inequalities. Using the dual formulation of the empirical likelihood problem (2.9), the amount of slackness is captured by

  $\acute{t}_n = \arg\min_{t \in \mathbb{R}^J_+} \sup_{\lambda \in \acute{\Lambda}_n(t,\theta)} n^{-1}\sum_{i=1}^n \ln\big(1 - \lambda^{\top} g_i(t,\theta)\big),$

where $\acute{\Lambda}_n(t,\theta) = \{\lambda \in \mathbb{R}^J : \lambda^{\top} g_i(t,\theta) \in Q \ \forall i = 1, \dots, n\}$ and $Q$ is an open interval of $\mathbb{R}$ containing 0. This reformulation of the empirical likelihood problem (2.9) is feasible because the linear constraint qualification applies to it. The part of Assumption GEL we include in our setup is a regularity condition concerning the uniform asymptotic behavior of $\acute{t}_n$, and is stated in terms of the following reparametrization of $\mathcal{F}$.
Definition 2. Let $\Gamma$ be defined as the set of all $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ such that, for some $(\theta, F) \in \mathcal{F}$:
1. $\mathcal{F}$ is defined in Definition 1.
2. $\gamma_1 = \big(E_F(g_1(W_i,\theta))/\sigma_{F,1}(\theta), \dots, E_F(g_J(W_i,\theta))/\sigma_{F,J}(\theta)\big)$.
3. $\gamma_2 = \big(\theta, \mathrm{vech}_*(\Omega(\theta,F))\big)$, where $\mathrm{vech}_*(\Omega(\theta,F))$ is the vector of lower off-diagonal elements of $\Omega(\theta,F)$.
4. $\gamma_3 = F$.

Andrews and Soares (2010) indicate that there is a one-to-one mapping from $\gamma$ to $(\theta, F)$; see Appendix A of their paper for the details. Denote by $\{\gamma_{n,h} : n \ge 1\} \subset \Gamma$ a sequence of parameters in $\Gamma$ such that $n^{1/2}\gamma_{n,h,1} \to h_1 \in \mathbb{R}^J_{+,\infty}$ and $\gamma_{n,h,2} \to h_2 \in \mathbb{R}^q_{[\pm\infty]}$ as $n \to \infty$, where $q = \dim(\Theta) + \dim(\mathrm{vech}_*(\Omega(\theta,F)))$. The part of Assumption GEL that we include in our setup is given by the following assumption.
Assumption T. For all subsequences $\{w_n\}$ of $\{n\}$ and all sequences $\{\gamma_{w_n,h} : n \ge 1\} \subset \Gamma$ and corresponding $\{(\theta_{w_n,h}, F_{w_n,h}) : n \ge 1\} \subset \mathcal{F}$,

  $\acute{t}_{w_n} = \arg\min_{t \in \mathbb{R}^J_+} \sup_{\lambda \in \acute{\Lambda}_{w_n}(t,\theta_{w_n,h})} w_n^{-1}\sum_{i=1}^{w_n} \ln\big(1 - \lambda^{\top} g_i(t, \theta_{w_n,h})\big)$

exists and satisfies $\sup_{n \ge 1} \|\acute{t}_{w_n}\|_{\ell_2^J} \le K$ with probability approaching 1 as $n \to +\infty$ for some constant $K < +\infty$, where $\|\cdot\|_{\ell_2^J}$ is the usual Euclidean norm on $\mathbb{R}^J$.

We also include an assumption from Andrews and Soares (2010) for the case in which $\mathrm{Int}(\Theta_I(F)) \ne \emptyset$ for some data-generating process in the model. It is required to show that when there are no binding moment inequalities, the maximum asymptotic coverage probability is equal to 1.
Assumption M. There exists $(\theta, F) \in \mathcal{F}$ that satisfies $E_F\big(g_j(W_i,\theta)\big) > 0$ for all $j \in \mathcal{J}$.

3.1 Asymptotic Size of CMS Confidence Sets

We now present the first main result of the paper. It mirrors Theorem 1 of Andrews and Soares (2010), which concerns the asymptotic size of GMS confidence sets. Denote by $\acute{c}_n(\theta_0, 1-\alpha)$ the CMS critical value under the nominal level $1-\alpha$ for testing the null hypothesis $H_0 : \theta_0 \in \Theta_I(F)$.
Theorem 1. Suppose $S$ satisfies Assumptions 1-3, $\varphi = \varphi^{(1)}$ in (2.8), the sequence $\{\kappa_n : n \ge 1\}$ satisfies Part 1 of Assumption K, and $\alpha \in (0, 1/2)$. Furthermore, let $\mathcal{F}_+ = \{(\theta,F) \in \mathcal{F} : \text{Assumption T holds}\}$ and $\mathcal{F}_{++} = \{(\theta,F) \in \mathcal{F} : \text{Assumptions T and M hold}\}$. Then the nominal level $(1-\alpha)$ CMS confidence set based on the statistic $T_n(\theta)$ satisfies the following statements:
1. $\liminf_{n \to +\infty} \inf_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_n(\theta) \le \acute{c}_n(\theta, 1-\alpha)\big) \ge 1-\alpha$.
2. $\liminf_{n \to +\infty} \inf_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_n(\theta) \le \acute{c}_n(\theta, 1-\alpha)\big) = 1-\alpha$, if in addition $S$ and $\{\kappa_n : n \ge 1\}$ satisfy Assumption 7 and Part 2 of Assumption K, respectively.
3. $\limsup_{n \to +\infty} \sup_{(\theta,F) \in \mathcal{F}_{++}} P_F\big(T_n(\theta) \le \acute{c}_n(\theta, 1-\alpha)\big) = 1$.
Proof. See Appendix C.1. □
The first result of Theorem 1 establishes the uniform validity of CMS confidence sets over the parameter space $\mathcal{F}_+$, and the second result of this theorem shows that they are not asymptotically conservative. The third result shows that the maximum coverage probability of CMS confidence sets is equal to 1 over the parameter space $\mathcal{F}_{++}$. The parameter space $\mathcal{F}_+$ is a subset of the one used in Theorem 1 of Andrews and Soares (2010), because it imposes Assumption T and Condition 4 in Definition 1 in addition to the conditions they set for their parameter space.

The proof of Theorem 1 establishes that the CMS and GMS procedures are asymptotically equivalent with uniformity over the parameter space $\mathcal{F}_+$. The essence of this result is that for every sequence $\{\gamma_{n,h} : n \ge 1\} \subset \Gamma$ such that Assumption T holds, along the corresponding sequence $\{(\theta_{w_n,h}, F_{w_n,h}) : n \ge 1\} \subset \mathcal{F}_+$ we have $\acute{\xi}_{w_n}(\theta_{w_n,h}) = \hat{\xi}_{w_n}(\theta_{w_n,h}) + o_p(1)$. This asymptotic equivalence is a consequence of $\acute{g}_{w_n}(\theta_{w_n,h}) - \hat{g}_{w_n}(\theta_{w_n,h}) = O_p(w_n^{-1/2})$ (see Lemma D.3 in Appendix D) and Assumption K on $\kappa_{w_n}$. In particular, these arguments are used after re-writing the expression of $\acute{\xi}_{w_n}(\theta_{w_n,h})$ in terms of $\hat{\xi}_{w_n}(\theta_{w_n,h})$, as such:

  $\acute{\xi}_{w_n}(\theta_{w_n,h}) = \kappa_{w_n}^{-1} w_n^{1/2} \hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})\acute{g}_{w_n}(\theta_{w_n,h}) = \kappa_{w_n}^{-1} w_n^{1/2} \hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})\big(\acute{g}_{w_n}(\theta_{w_n,h}) - \hat{g}_{w_n}(\theta_{w_n,h})\big) + \hat{\xi}_{w_n}(\theta_{w_n,h}),$  (3.1)

to obtain the asymptotic equivalence.
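To spell out the final step (a one-line calculation we add for completeness): the diagonal entries of $\hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})$ are $O_p(1)$ by Condition 4 of Definition 1, so the rate from Lemma D.3 and Assumption K give

  $\kappa_{w_n}^{-1} w_n^{1/2} \hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})\big(\acute{g}_{w_n}(\theta_{w_n,h}) - \hat{g}_{w_n}(\theta_{w_n,h})\big) = \kappa_{w_n}^{-1} w_n^{1/2}\, O_p(w_n^{-1/2}) = O_p(\kappa_{w_n}^{-1}) = o_p(1),$

since $\kappa_n \to +\infty$.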
3.2 Limiting Local Power Function of CMS Tests

This section employs the setup in Section 8 of Andrews and Soares (2010) to show that the limiting local power functions of the CMS tests coincide with their GMS counterparts when the null parameter space is $\mathcal{F}_+ = \{(\theta,F) \in \mathcal{F} : \text{Assumption T holds}\}$. For sequences of parameters $\{(\theta_{n,*}, F_n) : n \ge 1\}$, consider the testing problem

  $H_0 : E_{F_n}\big(g_j(W_i, \theta_{n,*})\big) \ge 0 \ \forall j \in \mathcal{J}$ vs. $H_1 : H_0$ is false,  (3.2)

where $\theta_{n,*} = \theta_n + \eta n^{-1/2}(1 + o(1))$ for all $n \ge 1$, $(\theta_n, F_n) \in \mathcal{F}_+$ for all $n \ge 1$, and $\eta \in \mathbb{R}^d$, where $d = \dim(\theta_n) < +\infty$. The idea is to study the behavior of the testing procedure along sequences of parameters $\{(\theta_{n,*}, F_n) : n \ge 1\}$ that differ locally from a point in the true parameter space $\mathcal{F}_+$ by $O(n^{-1/2})$. The local power function is defined as $P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big)$, where $P_{F_n}(\cdot)$ is the probability measure induced by random sampling from $F_n$ for all $n \ge 1$. The objective is to derive an expression for the limiting local power function, $\lim_{n \to +\infty} P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big)$, and compare it to its GMS counterpart.

To this end, we introduce technical assumptions for deriving the limiting local power function for CMS tests. These assumptions are from Section 8 of Andrews and Soares (2010).
Assumption LA1. The true parameters $\{(\theta_n, F_n) : n \ge 1\}$ satisfy:
1. $\theta_n = \theta_{n,*} - \eta n^{-1/2}(1+o(1))$ for some $\eta \in \mathbb{R}^d$, $\theta_{n,*} \to \theta_0$ and $F_n \to F_0$ as $n \to +\infty$, where $(\theta_0, F_0) \in \mathcal{F}_+$.
2. For each $j \in \mathcal{J}$, there exists $h_{1,j} \in \mathbb{R}_{+,\infty}$ such that $n^{1/2} E_{F_n}\big(g_j(W_i,\theta_n)\big)/\sigma_{F_n,j}(\theta_n) \to h_{1,j}$ as $n \to +\infty$.
3. $\sup\big\{E_{F_n}|g_j(W_i,\theta_{n,*})/\sigma_{F_n,j}(\theta_{n,*})|^{2+\delta} : n \ge 1\big\} < +\infty$ for all $j \in \mathcal{J}$ for some $\delta > 0$.

The first two parts of Assumption LA1 show that the sequence of true parameters, $\{\theta_n : n \ge 1\}$, is $n^{-1/2}$-local to the sequence of parameters under the null hypothesis, $\{\theta_{n,*} : n \ge 1\}$, and provide the limit of the sequence of normalised moment functions when evaluated at the sequence of true parameters $\{\theta_n : n \ge 1\}$. The third part of this assumption is a uniform integrability condition that permits the use of stochastic limit theorems for triangular arrays of row-wise i.i.d. random variables. The second assumption is as follows.

Assumption LA2. $\Pi(\theta, F) := (\partial/\partial\theta^{\top})[D^{-1/2}(\theta,F) E_F(g(W_i,\theta))] \in \mathbb{R}^{J \times \dim(\Theta)}$ exists and is a continuous function in a neighbourhood of $(\theta_0, F_0)$.

Both Assumptions LA1 and LA2 are important for proving the large-sample properties of CMS tests under $n^{-1/2}$-local alternatives. Namely, they allow one to mean-value expand the normalised moment functions under $H_0$ around $\theta = \theta_n$ and show that

  $\lim_{n \to \infty} n^{1/2} D^{-1/2}(\theta_{n,*}, F_n) E_{F_n}\big(g(W_i, \theta_{n,*})\big) = h_1 + \Pi(\theta_0, F_0)\eta,$

which can then be used to show that $T_n(\theta_{n,*}) \xrightarrow{d} J_{h_1,\eta}$, where $J_{h_1,\eta}$ is the distribution function of $S\big(\Omega_0^{1/2} Z^* + h_1 + \Pi(\theta_0,F_0)\eta, \Omega_0\big)$ and $Z^* \sim N(0_J, I_J)$ (Andrews and Soares, 2010).

Assumption LA3. $\lim_{n \to +\infty} \kappa_n^{-1} n^{1/2} D^{-1/2}(\theta_n, F_n) E_{F_n}\big(g(W_i,\theta_n)\big) = \pi_1 \in \mathbb{R}^J_{+,\infty}$.

The last assumption involves the set $C(\varphi) = \{\tilde{\pi}_1 \in \mathbb{R}^J_{[+\infty]} : \forall j \in \mathcal{J}$, either $\tilde{\pi}_{1,j} = +\infty$ or $\varphi_j(\xi, \Omega) \to \varphi_j(\tilde{\pi}_1, \Omega_0)$ as $(\xi, \Omega) \to (\tilde{\pi}_1, \Omega_0)\}$. Loosely, $C(\varphi)$ is the set of all vectors in $\mathbb{R}^J_{[+\infty]}$ for which $\varphi$ is continuous at $(\tilde{\pi}_1, \Omega_0)$. With $\varphi = \varphi^{(1)}$, this set is $C(\varphi^{(1)}) = \{\tilde{\pi}_1 \in \mathbb{R}^J_{[+\infty]} : \tilde{\pi}_{1,j} \ne 1, \ \forall j \in \mathcal{J}\}$.

Assumption LA4. 1. $\pi_1 \in C(\varphi^{(1)})$, and 2. $P_{F_0}\big(S(\Omega_0^{1/2} Z^* + \varphi^{(1)}(\pi_1, \Omega_0), \Omega_0) \le x\big)$ is continuous and strictly increasing at $x = c_{\pi_1}(\varphi^{(1)}, 1-\alpha)$, the $1-\alpha$ quantile of the distribution function of $S\big(\Omega_0^{1/2} Z^* + \varphi^{(1)}(\pi_1, \Omega_0), \Omega_0\big)$.

Assumptions LA3 and LA4 are imposed so that we can use Theorem 2(a) of Andrews and Soares (2010) to obtain the form of the GMS limiting local power function.

Next, we present the second main result of this paper. This result states that GMS and CMS tests are asymptotically equivalent, to first order, under $n^{-1/2}$-local alternatives.
Theorem 2. Suppose $S$ satisfies Assumptions 1-5, $\varphi = \varphi^{(1)}$ in (2.8), the sequence $\{\kappa_n : n \ge 1\}$ satisfies Assumption K, and Assumptions LA1-LA4 hold. Then $\lim_{n \to +\infty} P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big) = 1 - J_{h_1,\eta}\big(c_{\pi_1}(\varphi, 1-\alpha)\big)$.
Proof. See Appendix C.2. □
The intuition behind Theorem 2 is essentially the same as for Theorem 1. For a given sequence $\{(\theta_{n,*}, F_n) : n \ge 1\}$ of $n^{-1/2}$-local alternatives, we show $\acute{\xi}_n(\theta_{n,*}) = \hat{\xi}_n(\theta_{n,*}) + o_p(1)$. This asymptotic equivalence is a consequence of applying $\acute{g}_n(\theta_{n,*}) - \hat{g}_n(\theta_{n,*}) = O_p(n^{-1/2})$ (see Lemma E.6 in Appendix E) and Assumption K to a decomposition of $\acute{\xi}_n(\theta_{n,*})$ identical to (3.1). Therefore, the pairs $(\acute{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*}))$ and $(\hat{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*}))$ are asymptotically equivalent along sequences of $n^{-1/2}$-local alternatives. As this is the only point of difference between CMS and GMS, Theorem 2 follows from Theorem 2(a) of Andrews and Soares (2010). An important corollary to Theorem 2 is that CMS inherits the first-order improvements that GMS exhibits over subsampling and plug-in asymptotic critical values (see Andrews and Soares, 2010).

3.3 Finite-Sample Local Power Comparisons

While Theorem 2 establishes the equality of the limiting local power functions of CMS and GMS tests under $n^{-1/2}$-local alternatives, this section presents results that characterize sequences of local alternatives under which the power of CMS tests dominates their GMS counterparts for sufficiently large, but finite, samples. First, we must establish when it is meaningful to compare tests along sequences of $n^{-1/2}$-local alternatives. Under the conditions of Part 2 of Theorem 1, for every $r > 0$ there exist $N_{r,1}, N_{r,2} \in \mathbb{Z}_+$ (depending on $r$) such that

  $\Big|\sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \acute{c}_m(\theta, 1-\alpha)\big) - \alpha\Big| \le r \quad \forall n \ge N_{r,1}$ and  (3.3)

  $\Big|\sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \hat{c}_m(\theta, 1-\alpha)\big) - \alpha\Big| \le r \quad \forall n \ge N_{r,2},$  (3.4)

by the definition of the limit superior (with respect to $n$). Then, by the triangle inequality,

  $\Big|\sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \acute{c}_m(\theta, 1-\alpha)\big) - \sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \hat{c}_m(\theta, 1-\alpha)\big)\Big| \le 2r$

holds for all $n \ge N_r = \max\{N_{r,1}, N_{r,2}\}$. In words, given an error tolerance $r$, the tails of the sequences of exact sizes of CMS and GMS tests are within $r$ of $\alpha$, and within $2r$ of each other, when $n \ge N_r$. Thus, given $r$ (e.g., 0.0001), it is meaningful to compare the rejection probabilities along sequences of local alternatives when $n \ge N_r$.

Let $\mathcal{H}$ denote the set of all sequences $\{(\theta_{n,*}, F_n) : n \ge 1\}$ that satisfy Assumptions LA1 and LA2. The family we consider for the comparisons is defined as

  $\mathcal{M} = \big\{\{(\theta_{n,*}, F_n) : n \ge 1\} \in \mathcal{H} : \Omega(\theta_{n,*}, F_n)$ has non-negative off-diagonal elements $\forall n\big\}.$  (3.5)
For $\{(\theta_{n,*}, F_n) : n \ge 1\} \in \mathcal{H}$, let $\hat{\Upsilon}_n(\theta_{n,*}) := \{j \in \mathcal{J} : \varphi_j^{(1)}(\hat{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*})) = 0\}$ and $\acute{\Upsilon}_n(\theta_{n,*}) := \{j \in \mathcal{J} : \varphi_j^{(1)}(\acute{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*})) = 0\}$ for each $n \ge 1$. We have the following result.
Theorem 3. Let $\mathcal{M}$ be as in (3.5). Suppose that $S$ satisfies Part 1 of Assumption 1, $\varphi = \varphi^{(1)}$ in (2.8), and the sequence $\{\kappa_n : n \ge 1\}$ satisfies Assumption K. For every $\{(\theta_{n,*}, F_n) : n \ge 1\} \in \mathcal{M}$, there exists $N(\theta_{n,*}, F_n) \in \mathbb{Z}_+$ such that

  $P_{F_n}\big(T_n(\theta_{n,*}) > \hat{c}_n(\theta_{n,*}, 1-\alpha)\big) \le P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big) \quad \forall n \ge N(\theta_{n,*}, F_n).$  (3.6)

If in addition $S$ satisfies Part 1 of Assumption 2 and Part 2 of Assumption 5, and the event

  $\big\{\acute{\Upsilon}_n(\theta_{n,*}) \subsetneq \hat{\Upsilon}_n(\theta_{n,*})\big\} \cap \big\{\hat{c}_n(\theta_{n,*}, 1-\alpha) > 0\big\} \cap \big\{\acute{c}_n(\theta_{n,*}, 1-\alpha) < T_n(\theta_{n,*}) \le \hat{c}_n(\theta_{n,*}, 1-\alpha)\big\}$

has positive probability for each $n \ge N(\theta_{n,*}, F_n)$, then the weak inequalities in (3.6) are strict.

Proof. See Appendix C.3. □
Theorem 3 states that the rejection probabilities of CMS tests are no less than their GMS counterparts in large enough, but finite, sample sizes under local alternatives in $\mathcal{M}$. It also provides a sufficient condition for the ordering to hold strictly. Thus, for each sequence of local alternatives in $\mathcal{M}$ and small $r > 0$, the local power of a CMS test is larger than its GMS counterpart when $n \ge \max\{N(\theta_{n,*}, F_n), N_r\}$, where $N_r = \max\{N_{r,1}, N_{r,2}\}$ with $N_{r,1}$ and $N_{r,2}$ defined in (3.3) and (3.4), respectively.

The key message from Theorem 3 is that a comparison of GMS and CMS tests based on first-order asymptotics can be misleading, as it does not reflect the finite-sample situation for certain local alternatives. The result of Theorem 3 is similar to Corollary 6.1 of Lok and Tabri (in press); however, it is important to note that their result is specific to moment inequalities arising from restricted stochastic dominance orderings. Consequently, Theorem 3 provides a nontrivial extension of their result to the moment inequality model with finitely many inequalities and arbitrary moment functions, when the off-diagonal elements of $\Omega(\theta_{n,*}, F_n)$ are non-negative for each $n$.

At the heart of this result is the marriage of the non-negative correlational structure on $\Omega(\theta_{n,*}, F_n)$ and constrained empirical likelihood estimation. This marriage begets $\hat{g}_{n,j}(\theta_{n,*}) \le \acute{g}_{n,j}(\theta_{n,*})$ with probability approaching 1 for all sequences in $\mathcal{M}$ (see Lemma E.8). This ordering of the estimators implies that $\varphi_j^{(1)}(\acute{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*})) \ge \varphi_j^{(1)}(\hat{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*}))$ holds with probability approaching 1 for all sequences in $\mathcal{M}$ (see Lemma E.9). It is this ordering of the moment selection functions under such sequences that gives rise to the result of Theorem 3.

While Theorem 3 indicates that the local powers of the GMS and CMS tests can be ordered under a class of local alternatives $\mathcal{M}$ for large enough $n$, it does not specify the extent of the discrepancy in the local powers. It is quite difficult to determine the extent of this discrepancy analytically. However, Section 4 presents Monte Carlo evidence that the discrepancy that Theorem 3 implies can be very large for local alternative sequences which have some non-violated inequalities (SNVIs). That is, sequences $\{(\theta_{n,*}, F_n) : n \ge 1\}$ in $\mathcal{M}$ where there exists $j \in \mathcal{J}$ such that $E_{F_n}\big(g_j(W_i, \theta_{n,*})\big) > 0$ for all $n$ and $\lim_{n \to +\infty} E_{F_n}\big(g_j(W_i, \theta_{n,*})\big) = 0$.
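The mechanism behind Lemma E.8 is easy to see numerically; a small illustration (our own, reusing `cms_el_weights()` from the sketch in Section 2) with one violated and one slack moment under strong positive correlation:

```r
# Illustration: under non-negative correlation, EL tilting toward the
# constraint set raises the estimates of the slack moments as well, so that
# \hat g_{n,j} <= \acute g_{n,j} componentwise (cf. Lemma E.8).
set.seed(1)
n <- 250
Omega <- matrix(c(1, 0.9, 0.9, 1), 2, 2)
gmat <- sweep(matrix(rnorm(n * 2), n, 2) %*% chol(Omega), 2, c(-0.1, 0.1), "+")
g_hat <- colMeans(gmat)                               # sample-analogue estimator
g_acute <- drop(t(gmat) %*% cms_el_weights(gmat))     # EL-tilted estimator
rbind(g_hat, g_acute)   # typically g_acute >= g_hat in both components
```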
3.4 Consistency against Distant Alternatives

This section shows that CMS tests are consistent against distant alternatives. Distant alternatives include fixed alternatives and alternatives that differ from the null by more than $O(n^{-1/2})$. The next assumption is useful for deducing this result, and it is the same one introduced by Andrews and Soares (2010) in Section 9 of their paper.

Assumption DA. Let $g^*_{n,j} = E_{F_n}\big(g_j(W_i,\theta_{n,*})\big)/\sigma_{F_n,j}(\theta_{n,*})$ for each $j \in \mathcal{J}$, and $\upsilon_n = \max_{j \in \mathcal{J}}\{-g^*_{n,j}\}$.
1. $n^{1/2}\upsilon_n \to +\infty$ as $n \to +\infty$.
2. $\Omega(\theta_{n,*}, F_n) \to \Omega_1$, $\Omega_1 \in \Psi$.

The key part of this assumption is the first part, which indicates that there exists $j \in \mathcal{J}$ such that $g^*_{n,j} < 0$, with a violation of larger order than $O(n^{-1/2})$. This condition differs from the setup with $n^{-1/2}$-local alternatives, where the sequences of alternatives $\{(\theta_{n,*}, F_n) : n \ge 1\}$ are within an $n^{-1/2}$-neighbourhood of $\mathcal{F}_+$. We have the following result.
Theorem 4. Suppose $S$ satisfies Assumptions 1, 3, 4 and 6, $\varphi = \varphi^{(1)}$ in (2.8), the sequence $\{\kappa_n : n \ge 1\}$ satisfies Assumption K, and Assumption DA holds. Then $\lim_{n \to +\infty} P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big) = 1$.
Proof. See Appendix C.4. □
4 Monte Carlo Simulations

This section studies the finite-sample performance of the CMS procedure and compares it to the GMS procedure using a simulation experiment based on the designs in Andrews and Barwick (2012a). The study uses the test statistics $S_1$ in (2.3) and $S_{2A}$ in (2.4), the recommended moment selection function $\varphi = \varphi^{(1)}$ in (2.8), and the recommended localisation parameter $\kappa_n = (\ln n)^{1/2}$. The nominal level is set to $\alpha = 0.05$, and we considered sample sizes $n = 50$, 100, and 250. We also include (i) the RSW testing procedure of Romano et al. (2014) based on $S_1$ and $S_{2A}$, and (ii) the recommended RMS testing procedure, which combines $S = S_{2A}$, $\varphi = \varphi^{(1)}$, and $\kappa$-auto (a data-driven choice of $\kappa_n$), as additional benchmarks in studying the finite-sample performance of CMS; see Appendices G.2 and G.1, respectively, for further details on these testing procedures. Only bootstrap versions of the tests were implemented, with 10000 bootstrap samples per Monte Carlo replication. The computations were implemented using R.

For a given $\theta$, the null hypothesis is $H_0 : \theta \in \Theta_I(F)$. The experimental design in Andrews and Barwick (2012a,b) is a general formulation of that testing problem that does not require the specification of a particular form for the moment functions $\{g_j(\cdot,\theta) : j = 1, \dots, J\}$. They note that the finite-sample properties of tests of $H_0$ depend on the moment functions only through (i) the vector $\mu = [E_F\big(g_1(W_i,\theta)\big), \dots, E_F\big(g_J(W_i,\theta)\big)]$, (ii) the correlation matrix $\Omega = \mathrm{Corr}\big(g_1(W_i,\theta), \dots, g_J(W_i,\theta)\big)$, and (iii) the distribution of the mean-zero, variance-$I_J$ random vector $Z^{\dagger} = [Z_1^{\dagger}, \dots, Z_J^{\dagger}]$, where

  $Z_j^{\dagger} = \mathrm{Var}_F^{-1/2}\big(g_j(W_i,\theta)\big)\big(g_j(W_i,\theta) - E_F\big(g_j(W_i,\theta)\big)\big), \quad j = 1, \dots, J.$

We consider the case $Z^{\dagger} \sim N(0_J, I_J)$ and three correlation matrices, $\Omega_{\mathrm{Neg}}$, $\Omega_{\mathrm{Zero}}$, and $\Omega_{\mathrm{Pos}}$, which exhibit negative, zero, and positive correlations.

The assertion of the null hypothesis in this general formulation is $H_0 : \mu_j \ge 0 \ \forall j = 1, \dots, J$. For comparisons under the null hypothesis, we follow Andrews and Barwick (2012a) by comparing the tests' maximum null rejection probabilities (MNRPs). The MNRPs are computed over the mean vectors $\mu$ in the null parameter space given the correlation matrix $\Omega \in \{\Omega_{\mathrm{Neg}}, \Omega_{\mathrm{Zero}}, \Omega_{\mathrm{Pos}}\}$ and under the assumption of normally distributed moment inequalities. Based on simulation evidence, they conjecture that the MNRPs occur for mean vectors $\mu$ whose elements are 0's and $+\infty$'s. Thus, given a nominal level $\alpha$, they compute MNRP results over the set of mean vectors $\mu$ which have that form. The results we report are for $J = 2$, 4, and 10.
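A minimal sketch of the data-generating step under this design (our own illustration; the correlation matrix shown is an AR(1)-type stand-in, not the paper's exact $\Omega_{\mathrm{Pos}}$):

```r
# Minimal sketch: draw the n x J matrix of moment evaluations for the design,
# whose rows are mu + Omega^{1/2} Z_dagger with Z_dagger ~ N(0_J, I_J).
simulate_moments <- function(n, mu, Omega) {
  J <- length(mu)
  Zdag <- matrix(rnorm(n * J), n, J)
  sweep(Zdag %*% chol(Omega), 2, mu, "+")
}
Omega_ar1 <- 0.9^abs(outer(1:4, 1:4, "-"))  # illustrative positive Toeplitz matrix
gmat <- simulate_moments(250, c(-1, 1, 1, 1) / sqrt(250), Omega_ar1)  # SNVI-style mu
```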
The matrix $\Omega_{\mathrm{Zero}}$ equals the $J$-dimensional identity matrix. The matrices $\Omega_{\mathrm{Neg}}$ and $\Omega_{\mathrm{Pos}}$ are Toeplitz matrices whose successive off-diagonal correlations alternate in sign in the first case and are all non-negative in the second; the specific correlation vectors for $J = 2$, 4, and 10 are those in Andrews and Barwick (2012a). As in Andrews and Barwick (2012a), the simulation study treats the correlation matrices as unknown in the implementation of all of the tests.

For power comparisons, we also follow Andrews and Barwick (2012a,b). They compare the power of different tests by comparing their empirical power for a chosen set of alternative parameter vectors $\mu \in \mathbb{R}^J$ for a given correlation matrix $\Omega \in \{\Omega_{\mathrm{Neg}}, \Omega_{\mathrm{Zero}}, \Omega_{\mathrm{Pos}}\}$. The sets of $\mu$ vectors in the alternative are similar to the ones described in Andrews and Barwick (2012a,b). We adjust those sets so as to compare the local power properties of the testing procedures. The adjustment is as follows. For each $J \in \{2, 4, 10\}$, the set of $\mu$ vectors is given by $\mathcal{M}_{J,n}(\Omega) = \{\mu/\sqrt{n} : \mu \in \mathcal{M}_J(\Omega)\}$, where the set $\mathcal{M}_J(\Omega)$ of $\mu$ vectors is described in Section 7.1 of Andrews and Barwick (2012b). The $\mu$ vectors in $\mathcal{M}_{J,n}(\Omega)$ are scaled versions of those in $\mathcal{M}_J(\Omega)$, where the scaling is by $n^{-1/2}$ to create the $n^{-1/2}$-local alternatives. There are 7 elements in $\mathcal{M}_{J,n}(\Omega)$ for $J = 2$ and 40 for $J = 10$, with the intermediate count for $J = 4$ given in Andrews and Barwick (2012b). We omit their description for brevity.

As the MNRPs of the tests can differ in finite samples, the simulation results on power comparisons are based on an MNRP correction that is similar to the one employed by Andrews and Barwick (2012a). For each test statistic $S$, the MNRP correction of the CMS, GMS and RMS procedures is to add a constant based on the true matrix $\Omega$ to their corresponding critical values, so that their resulting MNRPs match that of the RSW testing procedure with nominal level $\alpha = 0.05$; see Section G.3 in the appendix for the details.
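A rough sketch of this correction step (our own reconstruction of the idea; `reject_prob()` is an assumed helper that simulates a test's rejection probability when the constant `const` is added to its critical values):

```r
# Sketch of the MNRP correction: choose the additive constant so that the
# test's simulated MNRP over the null mean vectors matches a target MNRP
# (here, that of the RSW test at the same configuration).
mnrp_correction <- function(reject_prob, null_mus, target) {
  mnrp <- function(const) max(sapply(null_mus, function(mu) reject_prob(const, mu)))
  uniroot(function(const) mnrp(const) - target, interval = c(0, 10))$root
}
```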
The simulation studies in Andrews and Barwick (2012a) and Romano et al. (2014) compare tests under the alternative using average MNRP-corrected power, where the average is computed over alternative $\mu$ vectors in $\mathcal{M}_J(\Omega)$. We report simulation results graphically using boxplots of the MNRP-corrected local powers over the sets of $\mu$ vectors $\mathcal{M}_{J,n}(\Omega)$ for the 54 different combinations of $(J, \Omega, S, n)$ for each of the CMS, GMS and RSW procedures, and 9 different combinations of $(J, \Omega, S_{2A}, n)$ for the recommended RMS test. Additionally, we report average MNRP-corrected local powers of the different tests across the aforementioned configurations using the symbol ⊕ in these plots.

While the average MNRP-corrected power is a useful criterion for comparing tests across $\mu$ vectors in a given set $\mathcal{M}_{J,n}(\Omega)$, it does not convey the whole picture of the tests' performance over elements in $\mathcal{M}_{J,n}(\Omega)$. Reporting boxplots, as we do, reveals the variation in powers of the tests across elements in $\mathcal{M}_{J,n}(\Omega)$, thus presenting a broader and more extensive approach to comparing the tests under the alternative. These plots are especially useful for detecting differences in the performances of tests when the averages of their MNRP-corrected powers are close but exhibit different distributional variations in MNRP-corrected power across $\mu$ vectors in $\mathcal{M}_{J,n}(\Omega)$.

4.1 MNRP Comparisons

As in Andrews and Soares (2010), Andrews and Barwick (2012a), and Romano et al. (2014), empirical MNRPs are simulated as the maximum rejection probability over all $\mu$ vectors whose components are 0 and $+\infty$, with at least one component equal to zero. Table 1 reports the MNRPs for the tests. Each experiment used 10000 Monte Carlo replications when $J \in \{2, 4\}$ and 2500 when $J = 10$.

[Table 1: MNRPs of the GMS, CMS, RSW, and RMS tests based on $S_1$ and $S_{2A}$, for $J \in \{2, 4, 10\}$, $\Omega \in \{\Omega_{\mathrm{Neg}}, \Omega_{\mathrm{Zero}}, \Omega_{\mathrm{Pos}}\}$, and $n \in \{50, 100, 250\}$.]

Overall, the procedures achieve a satisfactory performance in all cases considered. The RMS and RSW tests perform the best, as their MNRPs are closest to the 5% nominal level across all of the cases considered. For the RSW procedure, the MNRPs fall within the ranges [.043, .056] and [.043, .055] when using the $S_{2A}$ and $S_1$ test statistics, respectively. For the RMS test, the MNRPs fall into the range [.042, .053]. The CMS tests over-reject the null slightly: the MNRPs fall within the ranges [.049, .067] and [.049, .061] when using the $S_{2A}$ and $S_1$ test statistics, respectively. Table 1 also shows that CMS tests have better MNRPs than their GMS versions, as the latter tend to over-reject more: the GMS MNRPs fall within the ranges [.049, .083] and [.049, .078] for $S_{2A}$ and $S_1$, respectively. The largest MNRPs arise in the configurations where $\Omega = \Omega_{\mathrm{Neg}}$, and these MNRPs increase with larger $J$, for the CMS, GMS and RSW tests. However, the MNRPs of all of these tests do get closer to the 5% nominal level with larger sample sizes, across all configurations, and for CMS tests this numerical result is a consequence of Theorem 1.

While we do not have a theoretical result on improved size control of CMS tests over their GMS versions, Table 1 provides simulation-based evidence of such an improvement. Hence, these results point to the potential benefit of implementing the information (1.1), as we do, in two-step testing procedures under the null. The next section presents simulation results on the MNRP-corrected power of these tests under local alternatives, and illustrates the result of Theorem 3.

4.2 MNRP-Corrected Local Power Comparisons

Figures 1 and 2 below report boxplots of the MNRP-corrected powers of the tests under $S_{2A}$ and $S_1$, respectively. The results can be summarised as follows. For each test statistic, the MNRP-corrected power values of the tests are generally distributed in a similar way in configurations where $\Omega = \Omega_{\mathrm{Neg}}$, and all of the tests have comparable average powers in those configurations. By contrast, in configurations where $\Omega = \Omega_{\mathrm{Zero}}$, for each test statistic, the boxplots show that the RSW tests' MNRP-corrected power values tend to (i) be more dispersed (as shown by the lengths of their boxes), (ii) have a wider overall range, and (iii) have lower average power in comparison to the remaining tests, which all behave similarly, as can be seen from their boxplots. For example, the average power of the RSW test when $S = S_{2A}$, $J = 10$, and $n = 250$ is approximately equal to 0.57, while the averages of the remaining procedures in that scenario are all approximately equal to 0.66, which is a large difference.

More noticeable differences in the tests' performance arise in configurations where $\Omega = \Omega_{\mathrm{Pos}}$. For each test statistic, there is evidence for the following ranking in terms of average MNRP-corrected power, uniformly in $J$ and $n$: CMS in first place, GMS in second place, and RSW in third place, with the RMS test tied in first place with the CMS-$S_{2A}$ test. The boxplots also show:

• The MNRP-corrected power values for the RMS and CMS-$S_{2A}$ tests are generally distributed in a similar way for each $n$ and $J$, except when $J = 2$, where the CMS-$S_{2A}$ power values are slightly more dispersed (as shown by the lengths of the boxes) than their RMS counterparts for each $n$.

• For each $S$, the MNRP-corrected power values of CMS tests are markedly less dispersed and have smaller overall ranges than their GMS and RSW counterparts.

• The difference among the CMS, GMS and RSW tests in these configurations with $S = S_1$ can be strikingly large in terms of average power; for example, with $J = 4$ and $n = 250$, the average powers of the CMS, GMS and RSW tests are approximately equal to 0.75, 0.65, and 0.60, respectively. By contrast, with $S = S_{2A}$ the difference among these tests is less pronounced, which is on account of using a more effective test statistic. For example, in the aforementioned configuration, the average powers of the CMS, GMS and RSW tests are approximately equal to 0.76, 0.74, and 0.73, respectively. However, this less pronounced difference in average powers does not mean that these procedures behave similarly, as evidenced by the radically different boxplots of the tests' power values.
Figure 1: Boxplots of MNRP-corrected powers of $S_{2A}$-based tests. For each configuration, the symbol ⊕ marks the location of the average MNRP-corrected power of a test.
Figure 2: Boxplots of MNRP-corrected powers of $S_1$-based tests. For each configuration, the symbol ⊕ marks the location of the average MNRP-corrected power of a test.

Table 2: MNRP-Corrected Average Powers: $\Omega = \Omega_{\mathrm{Pos}}$

J    n    GMS-S_1  GMS-S_2A  CMS-S_1  CMS-S_2A  RSW-S_1  RSW-S_2A  RMS
2    50   0.658    0.676     0.677    0.691     0.611    0.652     0.692
2    100  0.654    0.681     0.682    0.697     0.620    0.661     0.700
2    250  0.662    0.687     0.689    0.701     0.630    0.672     0.709
4    50   0.675    0.743     0.752    0.759     0.575    0.715     0.750
4    100  0.667    0.747     0.728    0.763     0.587    0.729     0.760
4    250  0.656    0.746     0.738    0.761     0.593    0.733     0.761
10   50   0.692    0.782     0.787    0.803     0.557    0.746     0.797
10   100  0.696    0.796     0.799    0.819     0.573    0.765     0.825
10   250  0.695    0.810     0.802    0.830     0.585    0.782     0.837
The result of Theorem 2 implies that the average power of CMS and GMS tests should get closertogether with larger sample sizes. The simulations reflect this implication across all configurations, butindicate that it happens slowly when Ω = Ω
Pos . Consequently, there is simulation-based evidence that showsthe implementation of the information (2.1), as we do with CMS, may not improve the local power of GMStests for configurations in which Ω = Ω Pos . The reason is that the boxplots for MNRP-corrected powervalues of CMS and GMS tests are generally quite similar in those configurations. By contrast, the resultof Theorem 3 points to such an improvement in local power for configurations in which Ω = Ω
Pos , and thisresult is reflected in the simulations as described above.Table 2 reports the average MNRP-corrected powers of the tests when Ω = Ω
Pos and we use theseresults to further contextualize the local power improvement associated with CMS tests over GMS and RSWtests. We benchmark our analysis to RMS because simulation evidence in Andrews and Barwick (2012a)suggests that it is superior in terms of asymptotic average power and is therefore the recommended test.The CMS- S A and RMS tests are neck and neck as their average powers are essentially identical and achievethe highest average powers in all of those scenarios, with the CMS- S test having slightly lower averagepowers than those tests. For a given S , the RSW tests are the worst performing, as they achieve the lowestaverage powers in each corresponding scenario, and the difference between them and the RMS test can bequite large. For example, when J = 10 and n = 250, the difference between RSW- S and RMS is 0.252, andwith RSW- S A it is 0.055 which is a much smaller on account of using a more effective test statistic. TheCMS- S test dominates the GMS- S test in each of those scenarios, where the difference can be as large as10 percentage points – see the scenarios with J = 10. Consequently, the importance of incorporating thestatistical information from the constraints, as we do with CMS, picks up the difference in average powersbetween the RMS and GMS test when S = S A and most of the difference when S = S , in each of the thosescenarios.While the focus above has been on average power, for individual µ vectors the power differences can bemassive with Ω = Ω Pos . Consider, for example, the element µ/ √ n ∈ M ,n (Ω Pos ) with µ = ( − . , , , (cid:124) .This mean vector is an example of an SNVI local alternative. Table 3 reports the MNRP-corrected powerestimates for the tests under this local alternative for n = 50 , , µ/ √ n ∈ M ,n (Ω Pos ) with µ = ( − . , , , (cid:124) . J n
GMS- S GMS- S A CMS- S CMS- S A RSW- S RSW- S A RMS50 0.3 0.668 0.684 0.733 0.237 0.633 0.7344 100 0.275 0.663 0.625 0.726 0.236 0.645 0.74250 0.256 0.665 0.616 0.718 0.234 0.654 0.744
Figure 3: ECDFs of MNRP-corrected CMS (solid line), GMS (dashed line), RMS (dash-dot line),and RSW (dotted line) critical values using the S (MMM) and S A (AQLR) test statistics. • There can be extremely large power improvements associated with CMS relative to RSW and GMSwhen S = S . Indeed, the improvement in power of CMS over GMS is approximately 36 percentagepoints and 40 percentages points over RSW. • The improvements persist with S = S A , but are not as large. The AQLR statistic results in CMSexperiencing a six percentage point improvement over GMS and eight percentages point improvementover RSW. In absolute terms, all procedures experience higher local power with S = S A . • The MNRP-corrected powers of CMS- S A are comparable to their RMS counterparts.To gain a deeper insight into the behavior of the tests under this local alternative, Figure 3 reports theempirical distribution functions (ECDFs) of the MNRP-corrected critical values for n = 250 . The focus onthis sample size is without loss of generality as similar graphs of the critical values’ ECDFs arise in all ofthe other values of n we considered. For either test statistic, the ECDFs in Figure 3 show strong evidenceof a first-order stochastic dominance ranking among the critical values of the CMS, GMS, and RSW, tests.Specifically, for both types of test statistics, there is evidence for the ordering ´ c n ≤ ˆ c n ≤ ˇ c n , where ˇ c n denotesthe RSW critical value. By contrast, the ECDF of the recommended RMS test crosses that of CMS with S = S A , which means that there isn’t evidence of a clear ordering of their critical values. Overall, thedifferences between the ECDFs is quite striking and indicates that there is a big difference in the behaviorof the tests even in moderately large sample sizes. The stochastic ordering of the CMS and GMS criticalvalues is a reflection of Theorem 3 and provides evidence for local power improvements under SNVI local lternatives which have positively correlated moment functions. Finally, we discuss the behavior of the RSWprocedure. The RSW procedure rejects on the event { M n ( β ) (cid:42) R J + } T { S > ˇ c n } , where M n ( β ) is a lowerconfidence rectangle that is used to detect “positive” moments in the first step of their two-step procedure(see Appendix G.2). Across the two test statistics, our simulations indicate that (i) the event { M n ( β ) (cid:42) R J + } occurs with empirical probability close to 1, and (ii) in 9465 times out of 10000 Monte Carlo replications,their critical value ˇ c n corresponds to the case where none of the moment inequalities have been omitted fromits calculation. These findings show the RSW procedure fails to reliably detect the “positive” moments in µ/ √ n in most of the 10000 Monte Carlo replications, resulting in it having low empirical power. This paper has proposed a surgical modification of the generalized moment selection (GMS) procedure putforward by Andrews and Soares (2010) that improves its performance, called constrained moment selection(CMS). The basic idea of the CMS procedure is to use empirical likelihood to incorporate the informationembedded in the moment inequality constraints into the moment selection step of the GMS procedure. Ouranalyses highlights the importance of using this information to more reliably detect the binding moments,which is the source of the improvement of CMS over GMS tests.There are a number of directions for future research. 
Although we focus on modifying GMS tests, theintuition of incorporating the information embedded in the identified set transcends this choice and weconjecture that similar finite-sample benefits would arise in a similar modification to the two-step procedureof Romano et al. (2014). There is also an emerging literature that focuses on testing with ‘many’ moments,where the number of inequalities grow exponentially with the sample size (e.g., Chernozhukov et al., 2019,and Bai et al., 2019). Extending the empirical likelihood modification to such testing procedures mayimprove their performance, but different theoretical tools must be employed to account for the increasingnumber of constraints. Finally, our paper is related to the semi-infinite programming empirical likelihoodprocedure proposed by Lok and Tabri (in press) for two-step bootstrap tests of stochastic dominance, wherethe continuum of unconditional moment inequalities is akin to inference for conditional moment inequalities.Their results are limited to restricted stochastic dominance tests and it would be interesting to extend theinsights from this paper to the general conditional moment inequality models of Andrews and Shi (2013,2017).
We are grateful to Jonathan Roth for providing valuable comments. We are also appreciative of feedback fromparticipants at the Graduate Student Workshop in Econometrics, Harvard University. The computationsin this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science ResearchComputing Group at Harvard University. All errors are our own.
References
Andrews, D. W. K. and Barwick, P. J. (2012a). Inference for parameters defined by moment inequalities: Arecommended moment selection procedure.
Econometrica , 80(6):2805–2826. ndrews, D. W. K. and Barwick, P. J. (2012b). Supplement to “inference for parameters defined by momentinequalities: A recommended moment selection procedure”. Econometrica , 80(6):2805–2826.Andrews, D. W. K. and Guggenberger, P. (2009). Validity of subsampling and “plug-in asymptotic” inferencefor parameters defined by moment inequalities.
Econometric Theory , 25(3):669–709.Andrews, D. W. K. and Shi, X. (2013). Inference based on conditional moment inequalities.
Econometrica ,81(2):609–666.Andrews, D. W. K. and Shi, X. (2017). Inference based on many conditional moment inequalities.
Journalof Econometrics , 196(2):275–287.Andrews, D. W. K. and Soares, G. (2010). Inference for parameters defined by moment inequalities usinggeneralized moment selection.
Econometrica , 78(1):119–157.Bai, Y., Santos, A., and Shaikh, A. (2019). A practical method for testing many moment inequalities.
University of Chicago, Becker Friedman Institute for Economics Working Paper , (2019-116).Canay, I. A. (2010). El inference for partially identified models: Large deviations optimality and bootstrapvalidity.
Journal of Econometrics , 156(2):408–425.Chernozhukov, V., Chetverikov, D., and Kato, K. (2019). Inference on causal and structural parametersusing many moment inequalities.
The Review of Economic Studies , 86(5):1867–1900.Chernozhukov, V., Hong, H., and Tamer, E. (2007). Estimation and confidence regions for parameter setsin econometric models.
Econometrica , 75(5):1243–1284.Chetverikov, D., Santos, A., and Shaikh, A. M. (2018). The Econometrics of Shape Restrictions.
AnnualReview of Economics , 10:31–63.Ciliberto, F. and Tamer, E. (2009). Market structure and multiple equilibria in airline markets.
Econometrica ,77(6):1791–1828.Guggenberger, P. and Smith, R. J. (2005). Generalized empirical likelihood estimators and tests underpartial, weak, and strong identification.
Econometric Theory , pages 667–709.Hall, P. and Presnell, B. (1999). Intentionally biased bootstrap methods.
Journal of the Royal StatisticalSociety. Series B (Statistical Methodology) , 61(1):143–158.Hsu, Y.-C. and Shi, X. (2017). Model-selection tests for conditional moment restriction models.
TheEconometrics Journal , 20(1):52–85.Imbens, G. W. and Manski, C. F. (2004). Confidence intervals for partially identified parameters.
Econo-metrica , 72(6):1845–1857.Lok, T. M. and Tabri, R. V. (in press). An Improved Bootstrap Test for Restricted Stochastic Dominance.
Journal of Econometrics .Manski, C. F. and Tamer, E. (2002). Inference on regressions with interval data on a regressor or outcome.
Econometrica , 70(2):519–546. oon, H. R. and Schorfheide, F. (2009). Estimation with overidentifying inequality moment conditions. Journal of Econometrics , 153(2):136–154.Owen, A. B. (2001).
Empirical likelihood . Chapman and Hall/CRC.Pakes, A., Porter, J., Ho, K., and Ishii, J. (2015). Moment inequalities and their application.
Econometrica ,83(1):315–334.Rambachan, A. and Roth, J. (2019). An honest approach to parallel trends.Romano, J. P., Shaikh, A. M., and Wolf, M. (2014). A practical two-step method for testing momentinequalities.
Econometrica , 82(5):1979–2002.Rudin, W. (1976).
Principles of mathematical analysis , volume 3. McGraw-hill New York.Shi, X. (2015). Model selection tests for moment inequality models.
Journal of Eonometrics , 187:1–17.Whang, Y.-J. (2019).
Econometric Analysis of Stochastic Dominance: Concepts, Methods, Tools, and Ap-plications . Themes in Modern Econometrics. Cambridge University Press. Outline
This Appendix provides supplementary material to this paper. It is organized as follows. • Section B lists the complete set of assumptions on the test statistic that Andrews and Soares (2010)use in their work. We use these conditions in the proofs of the main results in the paper. • Section C presents the proofs of the results in the paper: Theorems 1, 2, 3, and 4. • Section D presents technical lemmas used in the proof of Theorem 1. • Section E presents technical lemmas used in the proofs of Theorems 2 and 3. • Section G.1 outlines the refined moment selection procedure of Andrews and Barwick (2012a). • Section G.2 outlines the two-step procedure of Romano et al. (2014). • Section G.3 details the MNRP corrections.
B Test Statistic Assumptions
Assumption 1.
1. Monotonicity: S ( g, Σ) is nonincreasing in g for all ( g, Σ) ∈ R J × R J × J .2. Invariance: S ( g, Σ) = S ( Dg, D Σ D ) for all g ∈ R J , Σ ∈ R J × J and positive definite diagonal matrix of Σ , D ∈ R J × J .3. Nonnegativity: S ( g, Ω) ≥ for all ( g, Ω) ∈ R J × Ψ .4. Continuity: S ( g, Ω) is a continuous function of g ∈ R J and Ω ∈ Ψ . Assumption 2.
For any h ∈ R J + , ∞ , Ω ∈ Ψ , Z ∗ ∼ N (0 J , I J ) , and x ∈ R , the distribution function of S (Ω Z ∗ + h , Ω) is 1. continuous at x > , 2. strictly increasing in x > unless h = [ ∞ , ..., ∞ ] > ∈ R J + , ∞ ,and 3. does not exceed / at x = 0 when h = 0 J . Assumption 3.
A necessary and sufficient condition for S ( g, Ω) > is that there exists j ∈ J that satisfies g j < , where g = ( g , ..., g J ) > and Ω ∈ Ψ . Assumption 4.
Let Z ∗ ∼ N (0 J , I J ) , α ∈ (0 , ) , and c (Ω , − α ) be the (1 − α ) -quantile of the distributionof S (Ω Z ∗ , Ω) . We assume1. The distribution function of S (Ω Z ∗ , Ω) is continuous at c (Ω , − α ) for all Ω ∈ Ψ .2. c (Ω , − α ) is a uniformly continuous function of Ω ∈ Ψ . Assumption 5.
1. Let v ∈ R J [+ ∞ ] and Ω ∈ Ψ be arbitrary. The distribution function of S (Ω Z ∗ + v, Ω) is a) continuous for x > and is b) strictly increasing at x > unless v = [ ∞ , ..., ∞ ] > ∈ R J + , ∞ .2. For all g , g ∗ ∈ R J + , ∞ that satisfy g ∗ (cid:31) g , we assume that P ( S (Ω Z ∗ + g , Ω) ≤ x ) < P ( S (Ω Z ∗ + g ∗ , Ω) ≤ x ) , where x > . We apply the definition of uniform continuity provided in Rudin (1976). That is, a function f : X → Y , where( X, d X ) and ( Y, d Y ) are metric spaces, is uniformly continuous if ∀ ε > , ∃ δ := δ ( ε ) > s.t. ∀ x, y ∈ X, d X ( x, y ) <δ = ⇒ d Y ( f ( x ) , f ( y )) < ε . The relation ‘ b (cid:31) a ’ means that every element in a is less than or equal to every element in b and the inequalityholds strictly for each least one element. ssumption 6. There exists χ > such that for each a ∈ R ++ , S ( ag, Ω) = a χ S ( g, Ω) for all g ∈ R J and Ω ∈ Ψ . Assumption 7.
Let h ,j : F → R + , ∞ given by h ,j ( θ, F ) = ∞ if E F ( g j ( W i , θ )) > and h ,j ( θ, F ) = 0 if E F ( g j ( W i , θ )) = 0 , and define h ( θ, F ) = [ h , ( θ, F ) , ..., h ,J ( θ, F )] > . Moreover, let Ω( θ, F ) := lim n →∞ Corr F ( n ˆ g n ( θ )) .There exists ( θ, F ) ∈ F such that the distribution of S (Ω ( θ, F ) Z ∗ + h ( θ, F ) , Ω( θ, F )) is continuous at its − α quantile, where Z ∗ ∼ N (0 J , I J ) . C Proofs of Theorems
We introduce notation: ˆ ϕ n ( θ ) := ϕ (cid:0) ˆ ξ n ( θ ) , ˆΩ n ( θ ) (cid:1) , ´ ϕ n ( θ ) := ϕ (cid:0) ´ ξ n ( θ ) , ˆΩ n ( θ ) (cid:1) , ˆ L n ( θ, Z ∗ ) := S (cid:0) ˆΩ n ( θ ) Z ∗ + ϕ ( ˆ ξ n ( θ ) , ˆΩ n ( θ )) , ˆΩ n ( θ ) (cid:1) , and ˆ L n ( θ, Z ∗ ) := S (cid:0) ˆΩ n ( θ ) Z ∗ + ϕ ( ´ ξ n ( θ ) , ˆΩ n ( θ )) , ˆΩ n ( θ ) (cid:1) for each θ ∈ Θ. We areassuming that ϕ = ϕ (1) ; see the discussion in Section F.1 for the general case. C.1 Theorem 1
Proof.
We present an outline of the proof and then the steps in detail.
Outline.
Lemma D.2 establishes the feasible set in the empirical likelihood optimisation problem (2.9) isnon-empty with probability tending to one uniformly over F + . Consequently, the constrained estimator ofthe moments exists and is unique with probability tending to one uniformly over F + . With this technicalresult in mind, the proof has four steps. First, we show that { ´ ϕ n ( θ ) = ˆ ϕ n ( θ ) } occurs with probabilityapproaching 1 as n → + ∞ with uniformity over F + . In the second step, we use the first result to show thatfor any α ∈ (0 , ) and any r > {| ´ c n ( θ, − α ) − ˆ c n ( θ, − α ) | < r } occurs with probability tending to 1 as n → + ∞ , uniformly over F + . In the third step, we use step 2 to show thatlim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ´ c n ( θ, − α ) (cid:17) = lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ˆ c n ( θ, − α ) (cid:17) . In the final step, we prove all three statements in the theorem simultaneously by invoking Theorem 1 ofAndrews and Soares (2010).
Step 1.
The complement rule for probability measures implies that it suffices to showlim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n ( θ ) = ´ ϕ n ( θ ) (cid:17) = 0 , which amounts to proving lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) = 0 or each j ∈ { , ..., J } . Indeed, (cid:8) ˆ ϕ n ( θ ) = ´ ϕ n ( θ ) (cid:9) = S Jj =1 (cid:8) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:9) implieslim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:18) ˆ ϕ n ( θ ) = ´ ϕ n ( θ ) (cid:19) ≤ J X j =1 lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) by the finite subadditivity of probability measures and basic properties of the supremum.To this end, fix j ∈ { , ..., J } arbitrarily. Recognizing that { ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) } = { ˆ ϕ n,j ( θ ) > ´ ϕ n,j ( θ ) } S { ˆ ϕ n,j ( θ ) < ´ ϕ n,j ( θ ) } , it follows that P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) = P F (cid:16) ˆ ϕ n,j ( θ ) > ´ ϕ n,j ( θ ) (cid:17) + P F (cid:16) ˆ ϕ n,j ( θ ) < ´ ϕ n,j ( θ ) (cid:17) = P F (cid:16) ˆ ξ n,j ( θ ) > , ´ ξ n,j ( θ ) ≤ (cid:17) + P F (cid:16) ˆ ξ n,j ( θ ) ≤ , ´ ξ n,j ( θ ) > (cid:17) ≤ P F (cid:16) ˆ g n,j ( θ ) > ´ g n,j ( θ ) (cid:17) + P F (cid:16) ˆ g n,j ( θ ) < ´ g n,j ( θ ) (cid:17) = P F (cid:16) ˆ g n,j ( θ ) = ´ g n,j ( θ ) (cid:17) where the second equality and the inequality hold by definition of ϕ (1) . Lemma D.3 then is invoked toestablish that lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ g n,j ( θ ) = ´ g n,j ( θ ) (cid:17) = 0and thereforelim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) ≤ lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ g n,j ( θ ) = ´ g n,j ( θ ) (cid:17) = 0which completes the proof of Step 1. Step 2.
We use step 1 to show that the event (cid:8) ˆ c n ( θ, − α ) = ´ c n ( θ, − α ) (cid:9) occurs with probabilityapproaching 0 as n → + ∞ , with uniformity over F + . This follows immediately from step 1 because P F (cid:16) ˆ c n ( θ, − α ) = ´ c n ( θ, − α ) (cid:17) ≤ P F (cid:16) ´ ϕ n ( θ ) = ˆ ϕ n ( θ ) (cid:17) where the inequality holds because ˆ L n ( θ, Z ∗ ) and ´ L n ( θ, Z ∗ ) only differ through the realization of the momentselection function a.s. [ Z ∗ ]. Step 1 and the squeeze rule then implies thatlim sup n → + ∞ sup ( θ,F ) ∈F + P F (ˆ c n ( θ, − α ) = ´ c n ( θ, − α )) = 0 . Step 3.
The result established in the second step allows us to conclude that (cid:8) T n ( θ ) ≤ ´ c n ( θ, − α ) (cid:9) = (cid:8) T n ( θ ) ≤ ˆ c n ( θ, − α ) + o p (1) (cid:9) uniformly over F + . The uniformity implies thatlim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ´ c n ( θ, − α ) (cid:17) = lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ˆ c n ( θ, − α ) (cid:17) . tep 4. The previous step established that the asymptotic confidence sizes of GMS and CMS are equal.Combine this with the fact that F + ⊆ F and apply Theorem 1 in Andrews and Soares (2010) to concludeall three statements in the theorem simultaneously. (cid:4) C.2 Theorem 2
Proof.
We present an outline of the proof and then the steps in detail.
Outline.
Lemma E.2 establishes the feasible set in the empirical likelihood optimisation problem (2.9) isnon-empty with probability tending to one under local alternatives that satisfy Assumption LA1 and LA2,i.e., local alternatives in the set H . Consequently, the constrained estimator of the moments exists and isunique with probability tending to one, under these local alternatives. With this technical result in mind,the proof has four steps and is similar to the proof of Theorem 1. First, we show { ´ ϕ n ( θ n, ∗ ) = ˆ ϕ n ( θ n, ∗ ) } occurs with probability approaching 1 as n → + ∞ for any sequence { ( θ n, ∗ , F n ) : n ≥ } . Next, we showthat { ˆ c n ( θ n, ∗ , − α ) = ´ c n ( θ n, ∗ , − α ) } is an event that occurs with probability approaching 0 as n → + ∞ along { ( θ n, ∗ , F n ) : n ≥ } . In the third step, we use the second step to conclude thatlim n → + ∞ P F n (cid:0) T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:1) = lim n → + ∞ P F n (cid:0) T n ( θ n, ∗ ) ≤ ´ c n ( θ n, ∗ , − α ) (cid:1) for all sequences { ( θ n, ∗ , F n ) : n ≥ } . In the fourth step, we invoke Part A of Theorem 2 in Andrews andSoares (2010) to establish the result. Step 1.
Step 1 follows a similar line of reasoning to the same step in Theorem 1. We pick an arbitrarysequences of n − -local alternatives { ( θ n, ∗ , F n ) : n ≥ } and show thatlim n → + ∞ P F n (cid:0) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:1) = 0 . To do this, we recognize that (cid:8) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:9) = S Jj =1 (cid:8) ˆ ϕ n,j ( θ n, ∗ ) = ´ ϕ n,j ( θ n, ∗ ) (cid:9) and therefore that P F n (cid:16) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:17) ≤ J X j =1 P F n (cid:16) ˆ ϕ n,j ( θ n, ∗ ) = ´ ϕ n,j ( θ n, ∗ ) (cid:17) ≤ J X j =1 P F n (cid:16) ˆ g n,j ( θ n, ∗ ) = ´ g n,j ( θ n, ∗ ) (cid:17) using identical reasoning to the corresponding result in the proof of Theorem 1, except replace θ with θ n, ∗ and F with F n, ∗ . It follows then thatlim n → + ∞ P F n (cid:16) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:17) ≤ J X j =1 lim n → + ∞ P F n (cid:16) ˆ g n,j ( θ n, ∗ ) = ´ g n,j ( θ n, ∗ ) (cid:17) = 0where the second equality holds by Lemma E.6. tep 2. The proof of step 2 is almost identical to step 2 in Theorem 1. We use the exact same reasoningas Step 2 of Theorem 1 to conclude that P F n (cid:16) ´ c n ( θ n, ∗ , − α ) = ˆ c n ( θ n, ∗ , − α ) (cid:17) ≤ P F n (cid:16) ´ ϕ n ( θ n, ∗ ) = ˆ ϕ n ( θ n, ∗ ) (cid:17) ∀ n ≥ n → + ∞ P F n (cid:16) ´ c n ( θ n, ∗ , − α ) = ˆ c n ( θ n, ∗ , − α ) (cid:17) = 0 following step 1. Step 3.
The result established in the second step allows us to conclude that (cid:8) T n ( θ n, ∗ ) ≤ ´ c n ( θ n, ∗ , − α ) (cid:9) = (cid:8) T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) + o p (1) (cid:9) along any sequence { ( θ n, ∗ , F n ) : n ≥ } . As such,lim n → + ∞ P F n (cid:16) T n ( θ n, ∗ ) ≤ ´ c n ( θ n, ∗ , − α ) (cid:17) = lim n → + ∞ P F n (cid:16) T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:17) for all sequences { ( θ n, ∗ , F n ) : n ≥ } . Step 4.
The previous step established that the n − -local power functions of GMS and CMS are equivalentto first order. We can then apply Part A of Theorem 2 in Andrews and Soares (2010) to conclude thetheorem. (cid:4) C.3 Theorem 3
For the proof of Theorem 3, we let A ∗ n,α denote the event (cid:26) ´Υ n ( θ n, ∗ ) (cid:40) ˆΥ n ( θ n, ∗ ) (cid:27) \ (cid:26) ˆ c n ( θ n, ∗ , − α ) > (cid:27) \ (cid:26) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:27) . We also let { W i,n : i ≤ n } denote the n th row of the triangular array induced by { ( θ n, ∗ , F n ) : n ≥ } . C.3.1 Proof of Theorem 3
Proof.
We outline the argument and then prove the result in detail.
Outline.
Lemma E.2 establishes the feasible set in the empirical likelihood optimisation problem (2.9) isnon-empty with probability tending to one under local alternatives that satisfy Assumption LA1 and LA2,i.e., local alternatives in the set H . Consequently, the constrained estimator of the moments exists and isunique with probability tending to one, under these local alternatives. With this technical result in mind,the proof has three steps. First, we show (cid:8) ´ c n ( θ n, ∗ , − α ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) occurs with probability tendingto 1 along any sequence { ( θ n, ∗ , F n ) : n ≥ } ∈ M . This allows us to conclude the first part of the theorem.In the second step, we show that the event A ∗ n,α implies that (cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c n ( θ n, ∗ , − α ) (cid:9) . In the finalstep, we conclude the strict ordering of the rejection probabilities. Step 1.
Let { ( θ n, ∗ , F n ) : n ≥ } ∈ M . Lemma E.9 states T Jj =1 (cid:8) ´ ϕ n,j ( θ n, ∗ ) ≥ ˆ ϕ n,j ( θ n, ∗ ) (cid:9) with probabilityapproaching 1 along { ( θ n, ∗ , F n ) : n ≥ } . It follows from Part 1 of Assumption 1 that (cid:8) ´ L n ( θ n, ∗ , Z ∗ ) ≤ ˆ L n ( θ n, ∗ , Z ∗ ) a.s. [ Z ∗ ] (cid:9) with probability approaching 1 under { ( θ n, ∗ , F n ) : n ≥ } . Consequently, (cid:8) ´ c n ( θ n, ∗ , − α ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) occurs with probability approaching 1 along { ( θ n, ∗ , F n ) : n ≥ } . Thus, there xists N ( θ n, ∗ , F n ) ≥ P F n ( T n ( θ n, ∗ ) > ˆ c n ( θ n, ∗ , − α )) ≤ P F n ( T n ( θ n, ∗ ) > ´ c n ( θ n, ∗ , − α )) for all n ≥ N ( θ n, ∗ , F n ). Step 2.
The event A ∗ n,α implies the event (cid:8) ´Υ n ( θ n, ∗ ) (cid:40) ˆΥ n ( θ n, ∗ ) (cid:9) T (cid:8) ˆ c n ( θ n, ∗ , − α ) > (cid:9) , which allowsus to apply Part 1 of Assumption 2 and Part 2 of Assumption 5 to deduce that 1 − α = P (cid:16) ˆ L n ( θ n, ∗ , Z ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:17) < P (cid:16) ´ L n ( θ n, ∗ , Z ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:17) a.s. (cid:2) { W i,n : i ≤ n } (cid:3) . Applying Part 1 of Assumption2 again, we conclude that A ∗ n,α ⊆ (cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c ( θ n, ∗ , − α ) (cid:9) . This completes step 2. Step 3.
Since A ∗ n,α ⊆ (cid:8) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) by construction, we use Step 2to deduce A ∗ n,α ⊆ (cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c n (1 − α, θ n, ∗ ) (cid:9) T (cid:8) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) . Consequently, if P F n ( A ∗ n,α ) > P F n ( T n ( θ n, ∗ ) > ´ c n ( θ n, ∗ , − α )) − P F n ( T n ( θ n, ∗ ) > ˆ c n ( θ n, ∗ , − α ))= P F n (cid:16)(cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c n (1 − α, θ n, ∗ ) (cid:9) \ (cid:8) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9)(cid:17) ≥ P F n ( A ∗ n,α )where the inequality uses monotonicity of probability measures. (cid:4) C.4 Theorem 4
In the proof of Theorem 4, we use the notation ν n ( θ n, ∗ ) := D − n ( θ n, ∗ ) n (cid:0) ˆ g n ( θ n, ∗ ) − E F n g ( W i , θ n, ∗ ) (cid:1) . Proof.
Our approach is based on the proof for the corresponding result in Andrews and Soares (2010). Forease of exposition, we outline the proof and then provide the details.
Outline.
For { w n : n ≥ } any subsequence of { n } , it suffices to show that there exists a furthersubsequence { u n : n ≥ } such that lim n →∞ P F un (cid:0) T u n ( θ u n , ∗ ) > ´ c u n ( θ u n , ∗ , − α ) (cid:1) = 1. In Step 1, we definethe sub-subsequence. In Step 2, we show that ( u n υ u n ) − χ T u n ( θ u n , ∗ ) has a positive probability limit, where χ > u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α ) is zero. Inthe final step, we use Step 2 and Step 3 to establish that lim n →∞ P F un (cid:0) T u n ( θ u n , ∗ ) > ´ c u n ( θ u n , ∗ , − α ) (cid:1) = 1. Step 1.
Consider any subsequence { w n : n ≥ } of { n } . We take { u n : n ≥ } so that g ∗ u n /υ u n → e ∈ [ − , + ∞ ] J as n → + ∞ , where g ∗ u n = [ E F un ( g ( W i , θ u n , ∗ )) /σ F un , ( θ u n , ∗ ) , ..., E F un ( g J ( W i , θ u n , ∗ )) /σ F un ,J ( θ u n , ∗ )] > , and υ u n = max ≤ j ≤ J {− g ∗ u n ,j } . This is the sub-subsequence considered in Andrews and Soares (2010). Step 2.
Since we make no modification to the test statistic, we can follow the same argument as (S3.2) inthe Supplement to Andrews and Soares (2010) to conclude that ( u n υ u n ) − χ T u n ( θ ∗ u n ) p → S ( e, Ω ) >
0, where he inequality holds by Assumption 3. The argument for the convergence in probability is provided below( u n υ u n ) − χ T u n ( θ ∗ u n ) = ( u n υ u n ) − χ S (cid:16) ˆ D − u n ( θ u n , ∗ ) D ( θ u n , ∗ ) (cid:0) ν u n ( θ u n , ∗ ) + u n g ∗ u n (cid:1) , ˆΩ u n ( θ u n , ∗ ) (cid:17) = S (cid:16) o p (1) + υ − u n g ∗ n , Ω + o p (1) (cid:17) p → S ( e, Ω )where the first equality is algebraic manipulation and Part 2 of Assumption 1, the second equality is As-sumption 6 and an application of the WLLN and Lyupanov CLT for triangular arrays of row-wise i.i.d.random variables and Part 2 of Distant Alternatives Assumption 1, and the convergence in probability holdsby the construction of the sub-subsequence in Step 1 and Part 4 of Assumption 1. This completes Step 2. Step 3.
We now establish that ( u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α ) = o p (1) along { ( θ u n , ∗ , F u n ) : n ≥ } . Part 1and 3 of Assumption 1 and the fact that ϕ (1) ∈ R J + , ∞ yield0 ≤ S (cid:0) ˆΩ u n ( θ u n , ∗ ) Z ∗ + ϕ ( ´ ξ u n ( θ u n , ∗ ) , ˆΩ u n ( θ u n , ∗ )) , ˆΩ u n ( θ u n , ∗ ) (cid:1) ≤ S ( ˆΩ u n ( θ u n , ∗ ) Z ∗ , ˆΩ u n ( θ u n , ∗ )) a.s. [ Z ∗ ]. Consequently, the CMS critical value satisfies0 ≤ ´ c u n ( θ u n , ∗ , − α ) ≤ c ( ˆΩ u n ( θ u n , ∗ ) , − α ) p → c (Ω , − α ) = O p (1) (C.1)where c ( ˆΩ u n ( θ u n , ∗ ) , − α ) is the 1 − α quantile of S ( ˆΩ u n ( θ u n , ∗ ) Z ∗ , ˆΩ u n ( θ u n , ∗ )) and the convergence inprobability holds by Part 2 of Assumption 4 and ˆΩ u n p → Ω along { ( θ u n , ∗ , F u n ) : n ≥ } by the weak law oflarge numbers for triangular arrays of row-wise i.i.d. random variables and Part 2 of Distant AlternativesAssumption 1. Since υ u n > n ≥ ≤ ( u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α ) ≤ ( u n υ u n ) − χ c ( ˆΩ u n ( θ u n , ∗ ) , − α ) p → u n υ u n → ∞ . Step 4.
Combine Step 2 and Step 3 to conclude that P F un ( T u n ( θ u n , ∗ ) > ´ c u n ( θ u n , ∗ , − α )) (C.3)= P F un (( u n υ u n ) − χ T u n ( θ u n , ∗ ) > ( u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α )) (C.4) → P ( S ( e, Ω ) >
0) = 1 (C.5)as n → ∞ , where the equality is invokes the scale equivariance of quantiles. (cid:4) D Technical Lemmas for Confidence Sets
D.1 Establishing Uniformity
The following lemma validates the subsequence approach to establishing uniformity.
Lemma D.1.
Let { V n ( θ ) : n ≥ } be a sequence of events indexed by θ ∈ Θ . The following is true: im inf n → + ∞ P F wn,h ( V w n ( θ w n ,h )) = 1 for any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + implies lim inf n → + ∞ inf ( θ,F ) ∈F + P F ( V n ( θ )) = 1 . Proof.
We outline the argument and then provide the details.
Outline.
The proof employs the direct method. In the first step, we use the definition of infimum toconstruct a subsequence { (˜ θ ∗ w n ,h , ˜ F ∗ w n ,h ) : n ≥ } in F + such that for each n ≥ ( θ,F ) ∈F + P F ( V w n ( θ )) + 2 − w n > P ˜ F ∗ wn,h (cid:0) V w n (˜ θ ∗ w n ,h ) (cid:1) . In the second step, we combine this with the assumption that lim inf n → + ∞ P F wn,h ( V w n ( θ w n ,h )) = 1 for anysubsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + to conclude the result. Step 1.
As the smallest subsequential limit, the limit inferior implies the existence of a subsequence { w n : n ≥ } of { n } such thatlim n → + ∞ inf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) = lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:0) V n ( θ ) (cid:1) . Consider the subsequence (cid:8) inf ( θ,F ) ∈F + P F ( V w n ( θ )) : n ≥ (cid:9) . For each n ≥ η >
0, there exists(˜ θ w n ,h,η , ˜ F w n ,h,η ) ∈ F + such that inf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) + η > P F wn,h,η (cid:0) V w n ( θ w n ,h,η ) (cid:1) , by definition of theinfimum. Consequently, there exists a subsequence { (˜ θ ∗ w n ,h , ˜ F ∗ w n ,h ) : n ≥ } in F + that satisfiesinf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) + 2 − w n > P ˜ F ∗ wn,h (cid:0) V w n (˜ θ ∗ w n ,h ) (cid:1) (D.1)for each n ≥
1. This completes the first step.
Step 2.
If lim inf n → + ∞ P F wn,h ( V w n ( θ w n ,h )) = 1 for any subsequence { ( θ w n ,h , F w n ) : n ≥ } in F + , thenlim inf n → + ∞ P ˜ F ∗ wn,h (cid:0) V w n (˜ θ ∗ w n ,h ) (cid:1) = 1 by construction. Taking the limit inferior on both sides of (D.1), weconclude that lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) = 1 by the squeeze rule. (cid:4) D.2 Restricted Estimator
CMS is based on the following empirical likelihood primal problem,sup p ( n X i =1 ln( p i ) : n X i =1 p i g j ( W i , θ ) ≥ ∀ j ∈ J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) , (D.2)where p ∈ R n and I := { , ..., n } . A feasible solution to (D.2) is denoted by ´ p ∈ R n and is the uniquemaximiser because the empirical likelihood problem is a strictly convex program (see Owen, 2001).We now establish that the feasible set is non-empty with probability tending to one uniformly over F + . Lemma D.2.
Define the random set C n ( θ ) = ( ( p , ..., p n ) > ∈ R n : n X i =1 p i g j ( W i , θ ) ≥ ∀ j ∈ J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) or all θ ∈ Θ . The following is true: lim sup n →∞ sup ( θ,F ) ∈F + P F ( C n ( θ ) = ∅ ) = 0 . Proof.
We outline the proof and then provide the details.
Outline.
The proof proceeds by the direct method and, in accordance with Lemma D.1, we only need toshow that lim sup n → + ∞ P F wn,h ( C w n ( θ w n ,h ) = ∅ ) = 0 for any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + .In the first step, we establish the result for sequences { ( θ n,h , F n,h ) : n ≥ } in F + using the union boundand the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables. In Step 2, wegeneralize the argument to subsequences and complete the proof. Step 1.
We start by proving the result along sequences { ( θ n,h , F n,h ) : n ≥ } in F + . Consider an arbitrarysequence { ( θ n,h , F n,h ) : n ≥ } in F + . Recognizing that the standard simplex S n = { p ∈ R n : P ni =1 p i =1 , p i ≥ ∀ i ∈ I} 6 = ∅ , it follows that n C n ( θ n,h ) = ∅ o = n ∀ p ∈ S n , ∃ j := j ( p ) ∈ J s.t. n X i =1 p i g j ( W i , θ n,h ) < o , and therefore P F n ( C n ( θ n,h ) = ∅ ) ≤ J X j =1 P F n,h (cid:18) n n X i =1 g j ( W i , θ n,h ) < (cid:19) where the first inequality holds by the finite subadditivity of probability measures and because ( n , ..., n ) > ∈S n for each n ≥
1. We then apply the weak law of large numbers for triangular arrays of row-wise i.i.d.random variables to conclude thatlim sup n → + ∞ P F n,h ( C n ( θ n,h ) = ∅ ) ≤ J X j =1 lim sup n → + ∞ P F n,h (cid:18) n n X i =1 g j ( W i , θ n,h ) < (cid:19) = 0where the equality holds because E F n g j ( W i , θ n,h ) ≥ n ≥ { ( θ n,h , F n ) : n ≥ } is asequence in F . Since { ( θ n,h , F n ) : n ≥ } was arbitrary, we establish the result along sequences. Step 2.
To establish the result for subsequences { w n : n ≥ } of { n } , just replaces n with w n in theprevious argument. (cid:4) In order to prove technical results, we reformulate the primal problem as one with equality constraints inorder to make use of lemmas in Andrews and Guggenberger (2009) (hereafter, AG09). Let t ∈ R J + denotea nuisance parameter vector where the j th element measures the slackness of corresponding moment. Thevector t allows us to formulate the empirical likelihood primal problem as a parameterized optimizationproblem as follows, EL ( t ) := sup p ( n X i =1 ln( p i ) : n X i =1 p i g i ( t, θ ) = 0 J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) , (D.3) here g i ( t, θ ) := g ( W i , θ ) − t and 0 J denotes the zero vector in J -dimensional Euclidean space and theempirical likelihood probabilities (´ p , ..., ´ p n ) are the solution to sup t ∈ R J + EL ( t ).A more convenient representation of the probabilities arises through the saddlepoint form of the empiricallikelihood problem. The Lagrangian for the constrained optimization problem (D.3) is L = n X i =1 ln( p i ) + nλ > n X i =1 p i g i ( t, θ ) + ω n X i =1 p i − ! . (D.4)Note that the non-negativity constraints are ignored as p i = 0 for some i ∈ I is never optimal. The firstorder conditions are ∂ L ∂p i = 1 p i + nλ > g i ( t, θ ) + ω = 0 ∀ i ∈ I (D.5) ∂ L ∂λ = n n X i =1 p i g i ( t, θ ) = 0 J (D.6) ∂ L ∂ω = n X i =1 p i − . (D.7)Multiplying p i with the corresponding first order condition in (D.5) and then summing over I gives ω = − n .Substituting ω = − n into (D.5), we obtain that p i ( λ, t ) = 1 n (cid:0) − λ > g i ( t, θ ) (cid:1) ∀ i = 1 , ..., n. (D.8)Substituting ( p ( λ, t ) , ..., p n ( λ, t )) into the empirical log-likelihood function implies the saddle point repre-sentation of the empirical likelihood probleminf t ∈ R J + sup λ ∈ ´Λ n n n X i =1 ln (cid:16) − λ > g i ( t, θ ) (cid:17) , (D.9)where ´Λ n := { λ ∈ R J : λ > g i ( t, θ ) ∈ Q } and Q is an open subset of R that contains 0. The saddle pointproblem (D.9) is presented in AG09, which implies that the useful lemmas in that paper can be invoked toestablish the uniform validity of CMS. D.3 Lemmas Relating to the Restricted Estimator
The next results establish the uniform consistency of the restricted empirical likelihood estimator of themean and variance over F + . We must define some more notation before proceeding. For any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + , let (´ t n , ´ λ n ) ∈ R J + × ´Λ n denote the solution to (D.9) evaluated at the subse-quence (i.e. replace θ with θ w n,h ). The construction of the feasible set implies ´ t w n = P ni =1 ´ p i g ( W i , θ w n ,h ).Lemma D.2 establishes that the estimator exists with probability approaching 1 uniformly over F + . Allsubsequent analysis assumes the event {C n ( θ ) = ∅} occurs so that the estimator exists, where the randomset C n ( θ ) was defined in Lemma D.2.Define an empirical process { ˆ g n ( t ) : t ∈ R J + } given by ˆ g n ( t ) = n − P ni =1 g i ( t, θ ) for each t ∈ R J + . Since F + satisfies Assumption GEL of AG09, we invoke Lemma 6 and the subsequent remark in their paper and state hat ˆ g w n (´ t w n ) = O p ( w − n ) for any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + . This gives us a uniform rateof convergence result for difference between the constrained and unconstrained estimator of the momentsand in the statement || · || ‘ J denotes the Euclidean norm for R J . Lemma D.3.
Let ´ g n ( θ ) := P ni =1 ´ p i g ( W i , θ ) and ˆ g n ( θ ) := n − P ni =1 g ( W i , θ ) for each θ ∈ Θ . The followingis true: || ´ g n ( θ ) − ˆ g n ( θ ) || ‘ J = O p ( n − ) uniformly over F + .Proof. The proof follows by the direct method. Since we want to show that ´ g n ( θ ) − ˆ g n ( θ ) = O p ( n − )with uniformity over F + , it suffices to show that ´ g w n ( θ w n ,h ) − ˆ g w n ( θ w n ,h ) = O p ( w − n ) for all subsequences { ( θ w n ,h , F w n ,h ) : n ≥ } (see Lemma D.1). Observing that ˆ g w n ( θ w n ,h ) − ´ g w n ( θ w n ,h ) = ˆ g w n (´ t w n ) for anysubsequence { ( θ w n ,h , F w n ,h ) : n ≥ } , we apply Lemma 6 of AG09 and conclude that ˆ g n ( θ w n ,h ) − ´ g n ( θ w n ,h ) = O p ( w − n ), which completes the proof. (cid:4) For the next lemma, we must introduce some more notation. Let Mat J × J ( R ) denote the vector space of J × J matrices over R . For each A ∈ Mat J × J ( R ), let || A || ‘ J × J := J X i =1 J X j =1 | a ij | ! . This is the
Frobenius norm . We let´Σ n ( θ ) := n X i =1 ´ p i (cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1)(cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1) > denote the constrained estimator of the moment covariance matrix andˆΣ n ( θ ) := 1 n n X i =1 (cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1)(cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1) > denote the unconstrained estimator of the moment covariance matrix for each θ ∈ Θ. Lemma D.4.
For each r > , lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16)(cid:12)(cid:12)(cid:12)(cid:12) ´Σ n ( θ ) − ˆΣ n ( θ ) (cid:12)(cid:12)(cid:12)(cid:12) ‘ J × J < r (cid:17) = 1 Proof.
Due to the length of the proof, we provide an outline and then detailed steps.
Outline.
In accordance with Lemma D.1, it suffices to show that ´Σ w n ( θ w n ,h ) = ˆΣ w n ( θ w n ,h )+ o p (1) for anysubsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + . To do this, we first prove the result along sequences. First, weestablish a preliminary result that states that max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | = o p (1) for any arbitrary sequence { ( θ n,h , F n ) : n ≥ } in F + . In Step 2, we show that ´Σ n ( θ n,h ) = P ni =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + o p (1) holds for an arbitrary sequence { ( θ n,h , F n,h ) : n ≥ } in F + . In Step 3, we use Step 1 toshow that P ni =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > = ˆΣ n ( θ n,h ) + o p (1) along the sequence { ( θ n,h , F n,h ) : n ≥ } . This completes the proof for sequences. In Step 4, we generalize the result tosubsequences { w n : n ≥ } of { n } . tep 1. We first establish the preliminary result that max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | = o p (1) along { ( θ n,h , F n,h ) : n ≥ } in F + . The Cauchy-Schwarz inequality yields thatmax ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | ≤ || ´ λ n || ‘ J max ≤ i ≤ n || g i (´ t n , θ n,h ) || ‘ J . (D.10)Assumption T lets us apply Part (ii) Lemma 3 of AG09 that statesmax ≤ i ≤ n || g i (´ t n , θ n,h ) || ‘ J = O p ( n δ )and also apply Lemma 5 in AG09 that states || ´ λ n || ‘ J = O p ( n − ) along { ( θ n,h , F n,h ) : n ≥ } . Combiningthese with (D.10), we deduce thatmax ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | ≤ O p ( n − ) O p ( n δ ) = O p ( n − δ δ ) ) = o p (1)along { ( θ n,h , F n,h ) : n ≥ } . The result established is essential in the third step. Step 2.
We can decompose g ( W i , θ n,h ) − ´ g n ( θ n,h ) = g ( W i , θ n,h ) − ˆ g n ( θ n,h ) + ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) andtherefore ´Σ n ( θ n,h ) = n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1) > + (cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1) n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + (cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1)(cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1) > = n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + o p (1)along { ( θ n,h , F n ) : n ≥ } , where the second equality holds by Lemma D.3. Consequently, we need to showthat P ni =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > = ˆΣ n ( θ n,h )+ o p (1) along { ( θ n,h , F n,h ) : n ≥ } .This completes the task for Step 2. Step 3.
Let A i ( θ n,h ) := (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > . The decomposition n X i =1 ´ p i A i ( θ n,h ) = n X i =1 (cid:18) ´ p i − n (cid:19) A i ( θ n,h ) + ˆΣ n ( θ n,h ) + o p (1) eans that we need to show P ni =1 (´ p i − n ) A i ( θ n,h ) = o p (1) along { ( θ n,h , F n ) : n ≥ } . By definition of ´ p i , itfollows that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 (cid:18) ´ p i − n (cid:19) A i ( θ n,h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ‘ J × J ≤ n n X i =1 (cid:12)(cid:12)(cid:12)(cid:12) ´ λ > n g i (´ t n , θ n,h )1 − ´ λ > n g i (´ t n , θ n,h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A i ( θ n,h ) (cid:12)(cid:12)(cid:12)(cid:12) ‘ J × J ≤ max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | − max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | n n X i =1 || A i ( θ n,h ) || ‘ J × J ! = o p (1) O p (1)= o p (1)where the first inequality holds by the triangle inequality, the second holds by the reverse triangle inequalityand the definition of maximum, and the first equality holds by Step 1 and the weak law of large numbers fortriangular arrays of row-wise i.i.d. random variables. This establishes that P ni =1 (cid:0) ´ p i − n (cid:1) A i ( θ n,h ) = o p (1)along { ( θ n,h , F n ) : n ≥ } and, in combination with the result in Step 2, we conclude that ´Σ n ( θ n,h ) =ˆΣ n ( θ n,h ) + o p (1) along sequences { ( θ n,h , F n,h ) : n ≥ } in F + . Step 3.
To generalize to subsequences { w n : n ≥ } of { n } , just replace n with w n and repeat Steps 1, 2and 3. (cid:4) E Technical Lemmas for Local Power
E.1 A Preliminary Lemma
This technical result shows that along any sequence of { ( θ n, ∗ , F n ) : n ≥ } that satisfies Assumption LA1,max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | = O p ( n δ ) = o p ( n ) for any j ∈ J , where δ > ≤ i ≤ n max ≤ j ≤ J | g j ( W i , θ n, ∗ ) | = o p ( n ) and, by the equivalence ofnorms in Euclidean space, max ≤ i ≤ n || g j ( W i , θ n, ∗ ) || ‘ J = o p ( n ). The result is used in the proofs of LemmaE.5, Lemma E.6, and Lemma E.7. Lemma E.1.
For any sequence { ( θ n, ∗ , F n ) : n ≥ } of n − -local alternatives that satisfies LA1, the followingis true: max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | = O p ( n δ ) for each j ∈ J , where δ > is defined in Assumption LA1.Proof. We outline the proof and then provide the steps.
Outline.
The proof is similar to that of equation (2.4) in Guggenberger and Smith (2005). In the firststep, we choose an appropriate
C >
0. In the second step, we apply the union bound and Markov’s inequalityto establish the result.
Step 1.
Fix r > j ∈ J , and { ( θ n, ∗ , F n ) : n ≥ } arbitrarily. We know that K := sup n ≥ E F n | g j ( W i , θ n, ∗ ) | δ < + ∞ by Assumption LA1 and can therefore choose C >
K/C < r . Such a constant
C > tep 2. We know that P F n (cid:16) max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | ≤ ( Cn ) δ (cid:17) ≤ n X i =1 P F n (cid:16) | g j ( W i , θ n, ∗ ) | δ > nC (cid:17) ≤ KC < r where the first inequality applies the union bound, the second follows from Markov’s inequality and takingthe supremum of { E F n | g j ( W i , θ n, ∗ ) | δ : n ≥ } , and the third holds by the construction of C . We haveconstructed an upper bound (i.e. K/C ) that is does not depend on n , so we can take the supremum toconclude that sup n ≥ P F n (cid:16) max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | ≤ ( Cn ) δ (cid:17) < r and complete the proof. (cid:4) E.2 Restricted Estimator Under Local Alternatives
The restricted empirical likelihood problem issup p ,...,p n ( n X i =1 ln( p i ) (cid:12)(cid:12)(cid:12)(cid:12) n X i =1 p i g ( W i , θ n, ∗ ) ≥ J , n X i =1 p i = 1 , p i ≥ ∀ i = 1 , ..., n ) . (E.1)The Lagrangian is L ( p , ..., p n , λ ( θ n, ∗ ) , ω ( θ n, ∗ )) = n X i =1 ln( p i ) + ω − n X i =1 p i ! − nλ > n X i =1 p i g ( W i , θ n, ∗ ) ! (E.2)and the Karusch-Kuhn-Tucker (KKT) conditions are ∂ L ∂p i = 1 p i − ω − nλ g ( W i , θ n, ∗ ) = 0 , ∀ i = 1 , ..., n (E.3) λ j ≤ , n X i =1 p i g j ( W i , θ n, ∗ ) ≥ , ∀ j ∈ J (E.4) n X i =1 p i = 1 , λ j n X i =1 p i g j ( W i , θ n, ∗ ) = 0 ∀ j ∈ J . (E.5)From the Karusch-Kuhn-Tucker conditions, we have that´ p i = 1 n (cid:18)
11 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) (cid:19) (E.6)where g b ( W i , θ n, ∗ ) denotes the vector of estimating functions for the moments that are deemed bindingby the Karusch-Kuhn-Tucker conditions and ´ λ n,b is the vector of Lagrange multipliers that corresponds to g b ( W i , θ n, ∗ ). Substituting (E.6) into (E.2), we obtain the dual representation of the empirical likelihoodproblem, sup λ ∈ R J − ( n ln( n ) + n X i =1 ln (cid:0) λ > g ( W i , θ n, ∗ ) (cid:1)) . (E.7) he existence of Lagrange multipliers holds because of the fact that the constraints are affine functions ofthe choice variables in the primal problem (E.2). E.3 Technical Results Relating to the Constrained Estimator
Recall that the set H is defined as the set of all local alternatives { ( θ n, ∗ , F n ) : n ≥ } that satisfy Assump-tions LA1 and LA2. Lemma E.2.
For each { ( θ n, ∗ , F n ) : n ≥ } ∈ H , define the random set C n ( θ n, ∗ ) = ( ( p , ..., p n ) > ∈ R n : n X i =1 p i g j ( W i , θ n, ∗ ) ≥ ∀ j ∈ J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) . Then lim n → + ∞ P F n (cid:0) C n ( θ n, ∗ ) = ∅ (cid:1) = 0 for each { ( θ n, ∗ , F n ) : n ≥ } ∈ H .Proof. We start with an outline and the provide the details.
Outline.
The proof proceeds by the direct method. In Step 1, we establish we establish that it sufficesto show that P F n (ˆ g n,j ( θ n, ∗ ) < → n → + ∞ for each j ∈ J . In Step 2, we establish the result using amean-value expansion and the WLLN for triangular arrays of row-wise i.i.d. random variables. Step 1.
The proof of the first step follows the a similar argument to that of Lemma D.2. We know that (cid:8) C n ( θ n, ∗ ) = ∅ (cid:9) = n ∀ p ∈ S n , ∃ j := j ( p ) ∈ J s.t. P ni =1 p i g j ( W i , θ n, ∗ ) < (cid:9) , where S n is the standardsimplex. That is, S n := { p ∈ R n : P ni =1 p i = 1 , p i ≥ ∀ i ∈ I} . Since ( n − , ..., n − ) > ∈ S n , it follows that (cid:8) C n ( θ n, ∗ ) = ∅ (cid:9) ⊆ J [ j =1 ( n n X i =1 g j ( W i , θ n, ∗ ) < ) and therefore lim n → + ∞ P F n (cid:16) C n ( θ n, ∗ ) = ∅ (cid:17) ≤ J X j =1 lim n → + ∞ P F n n n X i =1 g j ( W i , θ n, ∗ ) < ! . Hence it suffices to show that lim n → + ∞ P F n (ˆ g n,j ( θ n, ∗ ) <
0) = 0 for each j ∈ J . Step 2.
For each j ∈ J , we can mean-value expand E F n g j ( W i , θ n, ∗ ) /σ F n ,j ( θ n, ∗ ) around { ( θ n , F n ) : n ≥ } ∈ F and conclude that E F n g j ( W i , θ n, ∗ ) σ F n ,j ( θ n, ∗ ) = E F n g j ( W i , θ n ) σ F n ,j ( θ n ) + O ( n − )for each n ≥
1. So by the weak law of large numbers for triangular arrays of row-wise i.i.d. data, it followsthat lim n → + ∞ P F n (ˆ g n,j ( θ n, ∗ ) <
0) = 0, and therefore lim n → + ∞ P F n ( C n ( θ n, ∗ ) = ∅ ) = 0. (cid:4) Lemma E.2 is an important intermediate technical result because it allows us to conclude that along anysequence { ( θ n, ∗ , F n ) : n ≥ } ∈ H , the empirical likelihood estimator exists with probability approaching 1.In all of the subsequent results, it is implicit that the event {C n ( θ n, ∗ ) = ∅} occurs. emma E.3. Define ˆ g n,b ( θ n, ∗ ) := n − P ni =1 g b ( W i , θ n, ∗ ) . The following result holds for any sequence { ( θ n, ∗ , F n ) : n ≥ } of n − -local alternatives: P F n (cid:0) (´ λ n,b ) > ˆ g n,b ( θ n, ∗ ) ≥ (cid:1) = 1 for each n ≥ .Proof. We outline the proof and then provide details.
Outline.
The proof employs the direct method. The first step shows that for any sequence { ( θ n, ∗ , F n ) : n ≥ } , log(1 + (´ λ n,b ) > ˆ g n,b ( θ n, ∗ )) ≥ Step 1.
Let ´ λ n denote a feasible solution to the dual problem (E.7) under an arbitrary sequence { ( θ n, ∗ , F n ) : n ≥ } . Since the dual variables for the slack inequalities are equal to zero with probability 1 under theKarusch-Kuhn-Tucker conditions, we have that ´ λ > n g ( W i , θ n, ∗ ) = (´ λ n,b ) > g b ( W i , θ n, ∗ ) and the following holdswith probability equal to 1:0 ≤ n n X i =1 ln (cid:16) λ n,b ) > g b ( W i , θ n, ∗ ) (cid:17) ≤ ln (cid:18) λ n,b ) > n n X i =1 g b ( W i , θ n, ∗ ) (cid:19) (E.8)where the first inequality holds as 2 P ni =1 ln (cid:16) λ n,b ) > g b ( W i , θ n, ∗ ) (cid:17) is the empirical likelihood ratio statisticfor testing the null hypothesis (see Canay (2010)) and the second holds by Jensen’s inequality. This impliesthat log(1 + (´ λ n,b ) > ˆ g n,b ( θ n, ∗ )) ≥ n ≥ Step 2.
For any x ∈ R , ln(1 + x ) ≥ x ≥
0. Consequently, we use the conclusion of Step 1 toconclude that (´ λ n,b ) > ˆ g n,b ( W i , θ n, ∗ ) ≥ n ≥ (cid:4) Lemma E.4.
Define random index set ´ B % n := { j ∈ J : ´ g n,j ( θ n, ∗ ) = 0 } and deterministic index set C := { j ∈ J : lim n →∞ E F n (cid:0) g j ( W i , θ n, ∗ ) (cid:1) = 0 } . If Assumptions LA1 and LA2 hold, then the following resultis true for any sequence of n − -local alternatives { ( θ n, ∗ , F n ) : n ≥ } : lim n →∞ P F n ( ´ B % n ⊆ C ) = 1 (E.9) where P F n ( · ) is the probability measure induced by repeated sampling from F n .Proof. The proof has multiple steps so we present an outline and then the steps in detail.
Outline.
We want to show that the event (cid:8) ´ B % n ⊆ C (cid:9) occurs with probability approaching 1 alongany sequence { ( θ n, ∗ , F n ) : n ≥ } . This involves three steps. In Step 1, we use the complement rule todeduce that this is equivalent to showing that (cid:8) ´ B % n ∩ C c = ∅ (cid:9) occurs with probability approaching 0 along { ( θ n, ∗ , F n ) : n ≥ } . In Step 2, we characterize the event (cid:8) ´ B % n ∩ C c = ∅ (cid:9) . In Step 3, we argue that that theevent (cid:8) ´ B % n ∩ C c = ∅ (cid:9) occurs with probability approaching 0 along { ( θ n, ∗ , F n ) : n ≥ } . Step 1.
Let { ( θ n, ∗ , F n ) : n ≥ } be an arbitrary sequence of n − -local alternatives. By the complementrule, lim n →∞ P F n ( ´ B % n ⊆ C ) = 1 − lim n →∞ P F n ( ´ B % n ∩ C c = ∅ ) . (E.10)So to show lim n →∞ P F n ( ´ B % n ⊆ C ) = 1, it suffices to show that lim n →∞ P F n ( ´ B % n ∩ C c = ∅ ) = 0. tep 2. On the event { ´ B % n ∩ C c = ∅} , there exists j ∈ J such that ´ g n,j ( θ n, ∗ ) = 0 and lim n →∞ E F n ( g j ( W i , θ n, ∗ )) >
0. The deduction that lim n → + ∞ E F n g j ( W i , θ n, ∗ ) > E F n g j ( W i , θ n, ∗ ) /σ F n ,j ( θ n, ∗ )around θ n , which yields E F n g j ( W i , θ n, ∗ ) σ F n ,j ( θ n, ∗ ) = E F n g j ( W i , θ n ) σ F n ,j ( θ n ) + O ( n − )and therefore E F n g j ( W i , θ n, ∗ ) is asymptotically nonnegative because { ( θ n , F n ) : n ≥ } ∈ F . The expansionis valid under LA2. So all we need to show that with probability approaching 1 it is not possible for´ g n,j ( θ n, ∗ ) = 0 and lim n → + ∞ E F n g j ( W i , θ n, ∗ ) > j ∈ J . Step 3.
Suppose that there exists j ∈ J such that ´ g n,j ( θ n, ∗ ) = 0 and lim n → + ∞ E F n g j ( W i , θ n, ∗ ) >
0. Wefirst note that ´ g n,j ( θ n, ∗ ) = 0 implies ´ g n,j ( θ n, ∗ ) ≥ ˆ g n,j ( θ n, ∗ ) because0 = ´ g n,j ( θ n, ∗ ) = 1 n n X i =1 g j ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) ! ≥ ˆ g n,j ( θ n, ∗ )1 + (´ λ n,b ) > ˆ g b ( θ n, ∗ ) ≥ ˆ g n,j ( θ n, ∗ ) . by Jensen’s inequality and the fact that ´ λ n,b ) > ˆ g b ( θ n, ∗ ) ≥ g n,j ( θ n, ∗ ), we have that´ g n,j ( θ n, ∗ ) ≥ ˆ g n,j ( θ n, ∗ ) − E F n (ˆ g n,j ( θ n, ∗ )) + E F n (ˆ g n,j ( θ n, ∗ )) = o p (1) + E F n ( g j ( W i , θ n, ∗ )) (E.11)along { ( θ n, ∗ , F n : n ≥ } by the weak law of large numbers for triangular arrays of row-wise i.i.d. ran-dom variables and the unbiasedness of ˆ g n,j ( θ n, ∗ ) for E F n ( g j ( W i , θ n, ∗ )). If we send n → + ∞ , we deducethat the probability limit of ´ g n,j ( θ n, ∗ ) is strictly positive by the ordering in (E.11) and the fact thatlim n → + ∞ E F n g j ( W i , θ n, ∗ ) >
0. Consequently, lim n → + ∞ P F n ( ´ B % n ∩ C c = ∅ ) = 0 along { ( θ n, ∗ , F n : n ≥ } .Combining this result with Step 1, we complete the proof. (cid:4) Lemma E.5.
Let B n := | ´ B % n | , ´Λ := { ´ λ n,b | ( E. , ( E. , ( E. hold } ⊆ R B n , and ||·|| ‘ Bn denote the Euclideannorm on R B n . If Assumptions LA1 and LA2 hold, then sup ´ λ n,b ∈ ´Λ n || ´ λ n,b || ‘ Bn = O p ( n − ) along any sequence { ( θ n, ∗ , F n ) : n ≥ } .Proof. The proof proceeds by the direct method. Given the length of the proof, we outline the argumentand then provide detailed steps.
Outline.
Our goal is to show that for any { ( θ n, ∗ , F n ) : n ≥ } , sup ´ λ n,b ∈ ´Λ n || ´ λ n,b || ‘ Bn = O p ( n − ). Thisinvolves three steps. In the first step, we do some algebra to relate the Karusch-Kuhn-Tucker conditionsto || ´ λ n,b || ‘ Bn . In the second step, we derive a bound relating || ´ λ n,b || ‘ Bn and the sample moments of theinequalities that are binding under the Karusch-Kuhn-Tucker conditions. In the third step, we use standardlimit theorems for triangular arrays of row-wise i.i.d. random variables and the bound derived in Step 2 toconclude the result. Step 1.
Let { ( θ n, ∗ , F n ) : n ≥ } be arbitrary. The Karusch-Kuhn-Tucker conditions dictate that ´ λ n satisfies 1 n n X i =1 g b ( W i , θ n, ∗ )1 + (´ λ n ) > g ( W i , θ n, ∗ ) = 0 B n . (E.12) nder complementary slackness ´ λ n = (´ λ n,b , J − B n ), which implies that (E.12) is equivalent to1 n n X i =1 g b ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) = 0 B n . (E.13)Let { β n : n ≥ } be a sequence of unit vectors in R B n that satisfy β n || ´ λ n,b || ‘ Bn = ´ λ n,b . We take the innerproduct between β n and (E.13), which gives us β > n n n X i =1 g b ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) ! = 0 (E.14)If we define X i := (´ λ n,b ) > g b ( W i , θ n, ∗ ) for all i = 1 , ..., n and use the transformation11 + X i = 1 − X i X i for all i = 1 , ..., n , then we have that β > n n n X i =1 g b ( W i , θ n, ∗ ) ! = β > n n n X i =1 (cid:0) g b ( W i , θ n, ∗ ) (cid:1) (´ λ n,b ) > g b ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) ! (E.15)= || ´ λ n,b || ‘ Bn β > n n n X i =1 g b ( W i , θ n, ∗ ) g b ( W i , θ n, ∗ ) > λ n,b ) > g b ( W i , θ n, ∗ ) ! β n . (E.16)where the last equality holds by the definition of { β n : n ≥ } . Step 2.
Let ˆΣ n,b ( θ n, ∗ ) denote the sample analogue estimator of the covariance matrix of g b ( W i , θ n, ∗ ). Wewill relate ˆΣ n,b ( θ n, ∗ ) to the RHS of (E.16). Since ´ B % n ⊆ C with probability approaching 1 (Lemma E.4), wehave that ˆΣ n,b ( θ n, ∗ ) = 1 n n X i =1 g b ( W i , θ n, ∗ ) g b ( W i , θ n, ∗ ) > . (E.17)with probability approaching 1 along { ( θ n, ∗ , F n ) : n ≥ } . Since p i >
we have that $1 + X_i > 0$ for $i=1,\dots,n$, which implies that, with probability tending to 1 along $\{(\theta_{n,*},F_n): n\ge 1\}$,
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n \le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} \frac{g_b(W_i,\theta_{n,*})\,g_b(W_i,\theta_{n,*})^\top}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\beta_n\,(1+X_{\max}), \tag{E.18}
\]
where $X_{\max} := \max_{1\le i\le n}|(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})|$. Applying the Cauchy–Schwarz inequality, we have that
\[
|(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})| \le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}, \tag{E.19}
\]
implying that
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n \le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} \frac{g_b(W_i,\theta_{n,*})\,g_b(W_i,\theta_{n,*})^\top}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\beta_n\,\big(1+\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}Z^*_n\big), \tag{E.20}
\]
where $Z^*_n := \max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}$. Applying the equality in (E.16) to the right-hand side of (E.20), we conclude that
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\Bigg(\beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n - \beta_n^\top\Bigg(Z^*_n\,\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\Bigg)\Bigg) \le \beta_n^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\Bigg). \tag{E.21}
\]

Step 3.
By Lemma E.1, we have that $Z^*_n = o_p(n^{1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Since $\acute B^{\varrho}_n \subseteq C$ for large $n$, we can apply the Lyapunov CLT to $n^{-1/2}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})$ to conclude that $n^{-1}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*}) = O_p(n^{-1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Finally, LA1 implies that
\[
0 < a + o_p(1) \le \beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n \le b + o_p(1) \tag{E.22}
\]
along the sequence $\{(\theta_{n,*},F_n): n\ge 1\}$, where $a$ and $b$ are the smallest and largest eigenvalues of the variance matrix of the binding moments in the population. These limiting results allow us to conclude that
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}} \le \frac{O_p(n^{-1/2})}{a + o_p(1)} \quad \forall\,\acute\lambda_{n,b}\in\acute\Lambda_n \tag{E.23}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. We have shown that the positive random variable $\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}$ is bounded above by a random variable that is $O_p(n^{-1/2})$, which implies
\[
\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}} = O_p(n^{-1/2}) \tag{E.24}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. $\Box$
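To make the rate in Lemma E.5 concrete, the following minimal Python sketch (our illustration, not part of the formal argument) computes the empirical likelihood multiplier for a set of moments restricted to equal zero, i.e., it solves the dual problem $\max_{\lambda}\sum_i \log(1+\lambda^\top g_b(W_i,\theta))$, whose first-order condition is (E.12), and tracks $\sqrt{n}\,\|\acute\lambda_{n,b}\|$ across sample sizes when the population means drift at the $n^{-1/2}$ rate. The data-generating choices (two moments, Gaussian draws, drift coefficient 0.5) are arbitrary.

```python
# Numerical illustration of Lemma E.5 (ours): the EL multiplier on the binding
# moments scales like O_p(n^{-1/2}) when the means drift at rate n^{-1/2}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def el_multiplier(G):
    """Maximize sum_i log(1 + lam'G_i), the EL dual for moments restricted to zero;
    its first-order condition is the sample analogue of (E.12)."""
    n, B = G.shape

    def neg_dual(lam):
        t = 1.0 + G @ lam
        if np.any(t <= 1e-10):           # stay inside the domain 1 + lam'g_i > 0
            return np.inf
        return -np.sum(np.log(t))

    res = minimize(neg_dual, np.zeros(B), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    return res.x

for n in (250, 1000, 4000):
    draws = []
    for _ in range(200):
        # two "binding" moments with means of order n^{-1/2} (local drift)
        G = rng.normal(loc=0.5 / np.sqrt(n), scale=1.0, size=(n, 2))
        draws.append(np.sqrt(n) * np.linalg.norm(el_multiplier(G)))
    print(f"n = {n:5d}:  median of sqrt(n)*||lambda|| = {np.median(draws):.3f}")
```

The printed medians settle near a constant as $n$ grows, which is the numerical footprint of $\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}} = O_p(n^{-1/2})$.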
Lemma E.6. Let $\acute g_n(\theta_{n,*})$ and $\hat g_n(\theta_{n,*})$ denote the restricted and unrestricted estimators of the moments, respectively, under $n^{-1/2}$-local alternatives. If the sequence $\{(\theta_{n,*},F_n): n\ge 1\}$ satisfies Assumptions LA1 and LA2, then $\|\hat g_n(\theta_{n,*}) - \acute g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$.

Proof. Due to the length of the proof, we present an outline and then the steps in detail.
Outline.
Our goal is to show $\|\hat g_n(\theta_{n,*}) - \acute g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$ for any sequence $\{(\theta_{n,*},F_n): n\ge 1\}$. This involves three steps. In Step 1, we show that proving $\|\hat g_n(\theta_{n,*}) - \acute g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$ along a sequence of $n^{-1/2}$-local alternatives amounts to establishing the result coordinate-wise. In Step 2, we use Lemma E.5 to deduce that showing $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = O_p(n^{-1/2})$ only requires showing $\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. In Step 3, we show the required result and complete the proof.

Step 1.
Let $\{(\theta_{n,*},F_n): n\ge 1\}$ be arbitrary and let $\{e_1,\dots,e_J\}$ denote the standard basis for $\mathbb{R}^J$. The triangle inequality and the unit length of the basis vectors allow us to conclude that
\[
\|\acute g_n(\theta_{n,*}) - \hat g_n(\theta_{n,*})\|_{\ell^2_J} \le \sum_{j=1}^{J} \big\|e_j\big(\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})\big)\big\|_{\ell^2_J} = \sum_{j=1}^{J} |\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})|.
\]
Consequently, a sufficient condition for $\|\acute g_n(\theta_{n,*}) - \hat g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ is that, for each $j\in J$, $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = O_p(n^{-1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$.

Step 2. Consider $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})|$, where $j\in J$ is fixed arbitrarily. We conclude that
\[
|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = \Bigg|\sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g_j(W_i,\theta_{n,*})\Bigg|
= \Bigg|\frac{1}{n}\sum_{i=1}^{n}\Bigg(1 - \frac{1}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\,g_j(W_i,\theta_{n,*})\Bigg|
\]
\[
= \Bigg|\frac{1}{n}\sum_{i=1}^{n}\Bigg(\frac{(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\,g_j(W_i,\theta_{n,*})\Bigg|
= \Bigg|\acute\lambda_{n,b}^\top \sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\Bigg|
\]
\[
\le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\Bigg\|\sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\Bigg\|_{\ell^2_{B_n}}
\le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}
\]
\[
\le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}},
\]
where the first inequality holds by the Cauchy–Schwarz inequality, the second by the triangle inequality, and the final one by the definition of the least upper bound. By Lemma E.5, it therefore suffices to show that
\[
\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(1),
\]
and this is what we do in Step 3.

Step 3.
By definition of $\acute p_i$ and the fact that $\frac{1}{1+x} = 1 - \frac{x}{1+x}$, one can show
\[
\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = E_{n,1}(\theta_{n,*}) - E_{n,2}(\theta_{n,*}), \tag{E.25}
\]
where
\[
E_{n,1}(\theta_{n,*}) := \frac{1}{n}\sum_{i=1}^{n} \|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} \tag{E.26}
\]
and
\[
E_{n,2}(\theta_{n,*}) := \frac{1}{n}\sum_{i=1}^{n}\Bigg(\frac{(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}. \tag{E.27}
\]
First, note that $E_{n,1}(\theta_{n,*}) = O_p(1)$ under $\{(\theta_{n,*},F_n): n\ge 1\}$ by the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables and LA1. Regarding (E.27), it is easy to see that $(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*}) = o_p(1)$ uniformly in $i$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ because
\[
\max_{1\le i\le n}|(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})| \le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(n^{-1/2})\,o_p(n^{1/2}) = o_p(1)
\]
by the Cauchy–Schwarz inequality, Lemma E.1, and Lemma E.5. This implies that
\[
E_{n,2}(\theta_{n,*}) = \underbrace{(\acute\lambda_{n,b})^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}\Bigg)}_{E_{n,3}(\theta_{n,*})} + o_p(1) \tag{E.28}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. The Cauchy–Schwarz inequality, the triangle inequality, and the definition of the least upper bound imply that
\[
|E_{n,3}(\theta_{n,*})| \le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}\,\frac{1}{n}\sum_{i=1}^{n}\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} \tag{E.29}
\]
\[
= O_p(n^{-1/2})\,o_p(n^{1/2})\,O_p(1) \tag{E.30}
\]
\[
= o_p(1) \tag{E.31}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$, where the equality holds by Lemma E.5, Lemma E.1, the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables, and LA1. This result implies that $E_{n,3}(\theta_{n,*})$, and hence $E_{n,2}(\theta_{n,*})$, is $o_p(1)$; combined with $E_{n,1}(\theta_{n,*}) = O_p(1)$, it follows that (E.25) is $O_p(1)$, which was required to show $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = O_p(n^{-1/2})$. $\Box$
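The pivotal step in the proof above is the algebraic identity $\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top\sum_i \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})$, which holds exactly for any multiplier value, not only the Karush–Kuhn–Tucker solution. The scalar ($B_n = 1$) numerical check below is our own illustration (the moment designs are arbitrary); it solves the first-order condition by bisection and verifies the identity to machine precision.

```python
# Numerical check (ours, scalar case B_n = 1) of the identity used in Step 2 of
# Lemma E.6: g_hat_j - g_acute_j = lambda * sum_i p_i * g_b(W_i) * g_j(W_i).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n = 500
gb = rng.normal(0.02, 1.0, n)               # "binding" moment, mean near zero
gj = 0.5 * gb + rng.normal(0.3, 1.0, n)     # second moment, correlated with gb

# EL first-order condition for the multiplier on the restriction sum_i p_i gb_i = 0
foc = lambda lam: np.mean(gb / (1.0 + lam * gb))
lo = -1.0 / gb.max() + 1e-6                 # domain: 1 + lam * gb_i > 0 for all i
hi = -1.0 / gb.min() - 1e-6
lam = brentq(foc, lo, hi)

p = 1.0 / (n * (1.0 + lam * gb))            # EL probabilities
g_hat, g_acute = gj.mean(), np.sum(p * gj)  # unrestricted vs restricted estimate

lhs = g_hat - g_acute
rhs = lam * np.sum(p * gb * gj)
print(f"lambda = {lam:.4f}, lhs = {lhs:.6f}, rhs = {rhs:.6f}")
assert abs(lhs - rhs) < 1e-10               # the identity holds exactly
```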
E.4 Technical Results for Power Comparison

The first result establishes the consistency of the constrained estimator of the covariance matrix along $n^{-1/2}$-local alternatives. As in Lemma D.4, $\|\cdot\|_{\ell^2_{J\times J}}$ denotes the Frobenius norm.

Lemma E.7.
Let $\acute\Sigma_n(\theta_{n,*}) := \sum_{i=1}^{n} \acute p_i\,\big(g(W_i,\theta_{n,*}) - \acute g_n(\theta_{n,*})\big)\big(g(W_i,\theta_{n,*}) - \acute g_n(\theta_{n,*})\big)^\top$ and $\Sigma(\theta_{n,*},F_n) := \mathrm{Cov}_{F_n}\big(g(W_i,\theta_{n,*})\big)$. If Assumptions LA1 and LA2 hold, then $\forall r > 0$ and $\forall\{(\theta_{n,*},F_n): n\ge 1\}$,
\[
\lim_{n\to\infty} P_{F_n}\Big(\|\acute\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} < r\Big) = 1.
\]

Proof.
The proof proceeds by the direct method. Although the argument is linear, it has a few steps, so we outline the proof and then provide the details.
Outline.
We consider an arbitrary sequence $\{(\theta_{n,*},F_n): n\ge 1\}$ of $n^{-1/2}$-local alternatives and show that $\|\acute\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} = o_p(1)$. To do this, there are three steps. In the first step, we deduce that it is sufficient to show $\big\|\sum_{i=1}^{n}(n^{-1} - \acute p_i)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\big\|_{\ell^2_{J\times J}} = o_p(1)$. In Step 2, we show $\big\|\sum_{i=1}^{n}(n^{-1} - \acute p_i)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\big\|_{\ell^2_{J\times J}} \le o_p(1)\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}$. Subsequently, we show that $\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1)$ in Step 3.

Step 1. Fix $\{(\theta_{n,*},F_n): n\ge 1\}$ arbitrarily. By the triangle inequality,
\[
\|\acute\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} \le \|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}} \tag{E.32}
\]
\[
\quad + \|\hat\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} \tag{E.33}
\]
\[
= \|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}} + o_p(1) \tag{E.34}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$, where the equality holds by the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables. Decomposing $\|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}}$, we obtain
\[
\|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}} \le \Bigg\|\sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\Bigg\|_{\ell^2_{J\times J}} + o_p(1) \tag{E.35}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$, where the inequality holds by the triangle inequality, Lemma E.6, and the continuous mapping theorem. Consequently, the result boils down to showing that the first term in (E.35) is $o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$.

Step 2.
Following a derivation similar to that in Lemma E.6, it can be shown that
\[
\Bigg\|\sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\Bigg\|_{\ell^2_{J\times J}} = \Bigg\|\sum_{i=1}^{n} \acute p_i\,(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})\;g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\Bigg\|_{\ell^2_{J\times J}} \tag{E.36}
\]
\[
\le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}\,\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}, \tag{E.37}
\]
where the inequality holds by the triangle inequality, the Cauchy–Schwarz inequality, and the definition of the least upper bound. Since
\[
\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(n^{-1/2})\,o_p(n^{1/2}) = o_p(1)
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$ by Lemma E.5 and Lemma E.1, it suffices to show that
\[
\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1)
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. This is the task of Step 3.

Step 3. Decompose $\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}$ as follows:
\[
\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = \frac{1}{n}\sum_{i=1}^{n}\frac{\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}
\]
\[
= \frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} - \frac{1}{n}\sum_{i=1}^{n}\frac{(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}
\]
\[
\le \frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} - \frac{(\acute\lambda_{n,b})^\top\,\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})},
\]
where the inequality holds by Jensen's inequality. By the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables and LA1, $\frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Now, let
\[
E_{n,4}(\theta_{n,*}) := \frac{(\acute\lambda_{n,b})^\top\,\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})}
\]
and notice that
\[
|E_{n,4}(\theta_{n,*})| \le \Bigg(\frac{\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}}{1+(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})}\Bigg)\Bigg(\frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}\Bigg). \tag{E.38}
\]
The numerator of (E.38) is $o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ by Lemma E.5, Lemma E.1, and the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables. The denominator is $1 + o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ because
\[
|(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})| \le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\|\hat g_{n,b}(\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(n^{-1/2})\,O_p(n^{-1/2}) = o_p(1), \tag{E.39}
\]
where the inequality follows from the Cauchy–Schwarz inequality and the definition of the least upper bound, and the equality holds by Lemma E.5 and a Lyapunov CLT for triangular arrays of row-wise i.i.d. random variables. Note that we do not need to recenter because $\acute B^{\varrho}_n \subseteq C$ with probability approaching 1 as $n\to\infty$. Thus, $\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1) + o_p(1) = O_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. $\Box$

The next lemma establishes an ordering of the restricted and unrestricted estimators of the moments that holds with probability approaching 1 when the moments are nonnegatively correlated.
Lemma E.8. Let $M$ be as in (3.5). For any $\{(\theta_{n,*},F_n): n\ge 1\}\in M$ and $j\in J$,
\[
\lim_{n\to+\infty} P_{F_n}\big(\hat g_{n,j}(\theta_{n,*}) \le \acute g_{n,j}(\theta_{n,*})\big) = 1.
\]

Proof.
We outline the steps to the proof and then provide details.
Outline.
The first step shows that $\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top \sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})$ for any $j\in J$. The second step uses the sign restrictions on the elements of $\{\Omega(\theta_{n,*},F_n): n\ge 1\}$ and on $\acute\lambda_{n,b}$ to conclude the result.

Step 1.
Fix $j\in J$ and $\{(\theta_{n,*},F_n): n\ge 1\}\in M$ arbitrarily. Using a derivation similar to that presented in Lemma E.6, we have that
\[
\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g_j(W_i,\theta_{n,*}) \tag{E.40}
\]
\[
= \acute\lambda_{n,b}^\top \sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*}). \tag{E.41}
\]

Step 2.
Let $\Xi(\theta_{n,*},F_n)$ denote the $B_n\times 1$ vector whose elements are the covariances between $g_j(W_i,\theta_{n,*})$ and the elements of $g_b(W_i,\theta_{n,*})$. From (E.41), we can write
\[
\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top\Bigg(\sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*}) - \Xi(\theta_{n,*},F_n) + \Xi(\theta_{n,*},F_n)\Bigg). \tag{E.42}
\]
From Lemma E.7, we have that $\sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*}) - \Xi(\theta_{n,*},F_n) = o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Hence,
\[
\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top\big(o_p(1) + \Xi(\theta_{n,*},F_n)\big). \tag{E.43}
\]
Since the Karush–Kuhn–Tucker conditions dictate that $\acute\lambda_{n,b,k} \le 0$ for each $k\in\{1,\dots,B_n\}$ and $\Xi(\theta_{n,*},F_n)$ is a vector of nonnegative terms, we have that $\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) \le 0$ with probability approaching 1 as $n\to\infty$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. $\Box$

The next result provides an ordering of the elementwise moment selection functions that holds with probability approaching 1 under nonnegative correlation. For $a,b\in\mathbb{R}^J_{[\pm\infty]}$, the relation $a\succcurlyeq b$ means that $a_j \ge b_j$ for each $j\in J$.

Lemma E.9.
Let $M$ be as in (3.5). Then $\forall\{(\theta_{n,*},F_n): n\ge 1\}\in M$,
\[
\lim_{n\to\infty} P_{F_n}\Big(\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big) = 1.
\]

Proof.
The proof uses the direct method. We present an outline and then the steps in detail.
Outline.
The proof involves two short steps. Step 1 shows that an ordering of the restricted and unrestricted estimators implies an ordering of the moment selection functions. Step 2 invokes Lemma E.8 to establish the result.

Step 1. Since $\acute\xi_{n,j}(\theta_{n,*})$ and $\hat\xi_{n,j}(\theta_{n,*})$ are just $\acute g_{n,j}(\theta_{n,*})$ and $\hat g_{n,j}(\theta_{n,*})$, respectively, scaled by the common positive factor $\hat\sigma^{-1}_{n,j}(\theta_{n,*})\,\kappa_n^{-1}\,\sqrt{n}$, it follows that
\[
\big\{\acute g_n(\theta_{n,*}) \succcurlyeq \hat g_n(\theta_{n,*})\big\} \subseteq \big\{\acute\xi_n(\theta_{n,*}) \succcurlyeq \hat\xi_n(\theta_{n,*})\big\} \tag{E.44}
\]
\[
\subseteq \Big\{\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big\}, \tag{E.45}
\]
where the second set inclusion holds because $\varphi^{(1)}(\xi,\Omega)$ is nondecreasing in $\xi$.

Step 2.
Step 1 and the monotonicity of probability measures yield
\[
P_{F_n}\big(\acute g_n(\theta_{n,*}) \succcurlyeq \hat g_n(\theta_{n,*})\big) \le P_{F_n}\Big(\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big)
\]
for each $n\ge 1$. So for any $\{(\theta_{n,*},F_n): n\ge 1\}\in M$, we invoke Lemma E.8 to conclude that
\[
\lim_{n\to+\infty} P_{F_n}\Big(\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big) = 1. \qquad\Box
\]

F Further Theoretical Discussion
F.1 Other GMS Functions
F.1.1 GMS Assumptions
We restate the GMS assumptions in Andrews and Soares (2010) to aid the discussion in the next subsection. We restrict $\Omega\in\Psi$ in the statements to accord with the assumptions imposed on $\mathcal F$.

Assumption GMS 1.
For each $j\in J$: 1. $\varphi_j(\xi,\Omega)$ is continuous for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$ with $\xi_j = 0$, and 2. $\varphi_j(\xi,\Omega) = 0$ for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$ with $\xi_j = 0$.

Assumption GMS 2. $\kappa_n \to +\infty$ as $n\to+\infty$.

Assumption GMS 3.
For each $j\in J$, $\varphi_j(\xi,\Omega)\to+\infty$ as $(\xi,\Omega)\to(\xi^*,\Omega^*)$ for any $(\xi^*,\Omega^*)\in\mathbb{R}^J_{[+\infty]}\times\Psi$ with $\xi^*_j = +\infty$.

Assumption GMS 4. $\kappa_n^{-1}\,n^{1/2} \to +\infty$ as $n\to+\infty$.

Assumption GMS 6.
For each $j\in J$, $\varphi_j(\xi,\Omega) \ge 0$ for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$.

Assumption GMS 7.
For each $j\in J$, $\varphi_j(\xi,\Omega) \ge \min\{0,\xi_j\}$ for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$.

We do not list Assumption GMS 5 because it is required only to compare moment selection and subsampling critical values, a topic we do not discuss formally in our paper. GMS2 and GMS4 combine to form Assumption K in the paper.
F.1.2 Alternative Choices of $\varphi$

The main theoretical results in the paper assume that $\varphi = \varphi^{(1)}$, but there are many other choices for $\varphi$. These include
\[
\varphi^{(2)}_j(\xi,\Omega) = \psi(\xi_j), \qquad \varphi^{(3)}_j(\xi,\Omega) = \max(0,\xi_j), \qquad \varphi^{(4)}_j(\xi,\Omega) = \xi_j,
\]
where $\psi(\cdot)$ is nondecreasing and satisfies $\psi(x) = 0$ if $x\le a_L$, $\psi(x)\in[0,\infty]$ if $x\in(a_L,a_U)$, and $\psi(x)=\infty$ if $x\ge a_U$ (Andrews and Soares, 2010). Another choice is the modified MSC function defined as
\[
\varphi^{(5)}_j(\xi,\Omega) = \begin{cases} 0 & \text{if } c_j(\xi,\Omega) = 1, \\ \infty & \text{if } c_j(\xi,\Omega) = 0, \end{cases}
\]
where $c := (c_1(\xi,\Omega),\dots,c_J(\xi,\Omega))$ solves the integer program $\min_{c\in\{0,1\}^J}\{S(-c\odot\xi,\Omega) - \zeta(|c|)\}$ for some increasing function $\zeta(\cdot)$, writing $c\odot\xi$ for the elementwise product $(c_1\xi_1,\dots,c_J\xi_J)$. (If $c_j = 0$ and $\xi_j = +\infty$, the convention $c_j\xi_j = 0$ is adopted.) Modified MSC uses the information embedded in the off-diagonals of the correlation matrix $\Omega$ in a computationally expensive way, whereas $\varphi^{(k)}$, $k\in\{1,2,3,4\}$, does not (Andrews and Soares, 2010).

Our decision to focus on $\varphi = \varphi^{(1)}$ is essentially without loss of generality because the results can be generalized to any $\varphi$ that satisfies the assumptions of Andrews and Soares (2010). To see this, first recall that Lemma D.3 implies that for any $r>0$,
\[
\lim_{n\to+\infty}\,\inf_{(\theta,F)\in\mathcal F_+} P_F\Big(\big\|\big(\acute\xi_n(\theta),\hat\Omega_n(\theta)\big) - \big(\hat\xi_n(\theta),\hat\Omega_n(\theta)\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1,
\]
and Lemma E.6 implies that for any $r>0$ and any $\{(\theta_{n,*},F_n): n\ge 1\}\in\mathcal H$ that satisfies Assumptions LA1 and LA2,
\[
\lim_{n\to+\infty} P_{F_n}\Big(\big\|\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) - \big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1,
\]
where $\|\cdot\|_{\ell^2_{J\times\Psi}} := \|\cdot\|_{\ell^2_J} + \|\cdot\|_{\ell^2_{J\times J}}$ in both statements. These convergence results are enough to extend our asymptotic size and limiting local power results to any $\varphi$ that satisfies Assumptions GMS1–GMS4, with appropriate modifications to notation. Indeed, we can replicate the arguments in the proofs of Theorem 1 and Theorem 2 in Andrews and Soares (2010) (with modifications to notation). The same comment applies to Theorem 4 in their paper because the use of a constrained estimator does not challenge the validity of GMS7.

The ordering of the local power functions also extends. The weak ordering of the local power functions extends to $\varphi^{(k)}$, $k\in\{2,3,4,5\}$, because that result only requires $\varphi_j(\xi,\Omega)$ to be nondecreasing in $\xi$. However, the strict ordering does not apply under $\varphi^{(4)}$ because we require $\varphi_j(\xi,\Omega)\ge 0$ for each $j\in J$ in order to invoke Part 2 of Assumption 5, effectively restricting attention to those $\varphi$ that satisfy GMS6. We do not view this as a serious limitation, especially given that $\varphi = \varphi^{(1)}$ is the choice recommended by Andrews and Barwick (2012a). A final technical point is that to generalize Theorem 3, we must replace the event $\{\acute\Upsilon_n(\theta_{n,*}) \subsetneq \hat\Upsilon_n(\theta_{n,*})\}$ in the statement of Theorem 3 with the more general event $\{\varphi(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})) \succ \varphi(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*}))\}$, because the former uses the specific form of $\varphi^{(1)}$. (For any vectors $a,b\in\mathbb{R}^J_{[+\infty]}$, the relation $a\succ b$ means that $a_j \ge b_j$ for each $j$ with at least one strict inequality.)
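For concreteness, the following Python sketch (ours) implements $\varphi^{(1)}$–$\varphi^{(4)}$, with `np.inf` standing in for $+\infty$. The threshold of 1 in $\varphi^{(1)}$ follows Andrews and Soares (2010); the particular $\psi$ and the constants `a_L`, `a_U` in $\varphi^{(2)}$ are merely one admissible choice and are our assumptions.

```python
# Illustrative implementations (ours) of the GMS functions discussed above.
import numpy as np

def phi1(xi, Omega=None):
    # phi^(1): hard thresholding -- drop moment j from the critical value
    # computation (send it to +infinity) when xi_j > 1.
    return np.where(xi > 1.0, np.inf, 0.0)

def phi2(xi, Omega=None, a_L=1.0, a_U=2.0):
    # phi^(2): smooth transition, with psi(x) = 0 for x <= a_L, psi
    # nondecreasing on (a_L, a_U), and psi = +infinity for x >= a_U.
    out = np.zeros_like(xi, dtype=float)
    mid = (xi > a_L) & (xi < a_U)
    out[mid] = (xi[mid] - a_L) / (a_U - xi[mid])   # one nondecreasing choice of psi
    out[xi >= a_U] = np.inf
    return out

def phi3(xi, Omega=None):
    return np.maximum(0.0, xi)       # phi^(3)_j = max(0, xi_j)

def phi4(xi, Omega=None):
    return np.asarray(xi, float)     # phi^(4)_j = xi_j

xi = np.array([-0.5, 0.4, 1.3, 5.0])  # xi_j = kappa_n^{-1} sqrt(n) g_hat_j / sigma_hat_j
for f in (phi1, phi2, phi3, phi4):
    print(f.__name__, f(xi))
```

Note that only $\varphi^{(1)}$ and $\varphi^{(2)}$ drop sufficiently slack moments outright, which is what drives the strict local power ordering discussed above.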
F.2 Elaboration on Remark 1

In Remark 1, we state that one can 'fully constrain' the CMS procedure. This involves using the empirical likelihood estimator of the covariance matrix, $\acute\Sigma_n(\theta)$, and of the correlation matrix, $\acute\Omega_n(\theta) = \acute D_n^{-1/2}(\theta)\,\acute\Sigma_n(\theta)\,\acute D_n^{-1/2}(\theta)$. Lemmas D.3 and D.4 imply that for any $r>0$,
\[
\lim_{n\to+\infty}\,\inf_{(\theta,F)\in\mathcal F_+} P_F\Big(\big\|\big(\acute\xi^{FC}_n(\theta),\acute\Omega_n(\theta)\big) - \big(\hat\xi_n(\theta),\hat\Omega_n(\theta)\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1.
\]
So simple modifications of the arguments in the proof of Theorem 1 establish the validity of fully constrained confidence sets. Similarly, Lemmas E.6 and E.7 allow us to conclude that for any $r>0$ and any $\{(\theta_{n,*},F_n): n\ge 1\}\in\mathcal H$ that satisfies Assumptions LA1 and LA2,
\[
\lim_{n\to+\infty} P_{F_n}\Big(\big\|\big(\acute\xi^{FC}_n(\theta_{n,*}),\acute\Omega_n(\theta_{n,*})\big) - \big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1,
\]
implying that adjustments to the proof of Theorem 2 extend the limiting local power results to the fully constrained case. It is difficult to establish a general ordering between $\acute\xi^{FC}_n(\cdot)$ and $\hat\xi_n(\cdot)$, so it is unclear whether the finite-sample $n^{-1/2}$-local power comparisons hold in the fully constrained case. The consistency against distant alternatives also extends because GMS7 and the use of the constrained estimator imply that $\varphi_j\big(\acute\xi^{FC}_n(\theta_{n,*}),\acute\Omega_n(\theta_{n,*})\big) \ge 0$ for each $j\in J$, so we can similarly bound the fully constrained critical value from above by the plug-in asymptotic critical value.
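Computationally, 'fully constraining' only means swapping the equal weights $n^{-1}$ for the EL probabilities $\acute p_i$ when forming second moments. A minimal sketch (ours; `G` and `p` are assumed to come from the tilting step illustrated earlier):

```python
# Minimal sketch (ours) of the fully constrained second-moment objects:
# replace the equal weights 1/n with the EL probabilities p_i. `G` is the
# n x J matrix with rows g(W_i, theta); `p` is assumed to sum to one.
import numpy as np

def el_cov_corr(G, p):
    g_acute = p @ G                      # restricted (EL-tilted) moment estimate
    Gc = G - g_acute                     # center at the tilted mean
    Sigma = (Gc * p[:, None]).T @ Gc     # Sigma_acute = sum_i p_i (g_i - g)(g_i - g)'
    d = np.sqrt(np.diag(Sigma))
    Omega = Sigma / np.outer(d, d)       # Omega_acute = D^{-1/2} Sigma_acute D^{-1/2}
    return Sigma, Omega
```

With equal weights `p = np.full(n, 1/n)`, the function reproduces the unconstrained estimators, which makes the swap easy to unit-test.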
G Further Simulation Details

G.1 Outline of RMS
Andrews and Barwick (2012a) present a modification of the GMS procedure. From an implementation standpoint, the approach is essentially the same as GMS except that:
1. They replace $\kappa_n$ with a data-driven tuning parameter $\hat\kappa := \kappa(\hat\delta_n(\cdot))$, where $\hat\delta_n(\cdot)$ is the minimum off-diagonal element of $\hat\Omega_n(\cdot)$.
2. They add a size-correction factor $\hat\eta := \eta_1(\hat\delta_n(\cdot)) + \eta_2(J)$ to the GMS critical value that results from using the tuning parameter $\hat\kappa$.
The need to size-correct reflects the fact that $\hat\kappa$ is a finite constant plus $o_p(1)$ rather than a divergent sequence; this method of data-driven tuning parameters is referred to as $\kappa$-auto (Andrews and Barwick, 2012a). A schematic implementation is sketched below.
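Schematically, the recipe reduces to a lookup plus an additive shift. The functions $\kappa(\cdot)$, $\eta_1(\cdot)$, and $\eta_2(\cdot)$ are tabulated in Andrews and Barwick (2012a); the numerical values in this sketch are illustrative placeholders only, not their published tables.

```python
# Schematic of the RMS / kappa-auto recipe (ours). The constants returned by
# kappa_auto, eta1, and eta2 are PLACEHOLDERS, not the Andrews-Barwick tables.
import numpy as np

def delta_hat(Omega):
    """Minimum off-diagonal element of the estimated correlation matrix."""
    J = Omega.shape[0]
    return Omega[~np.eye(J, dtype=bool)].min()

def kappa_auto(delta):   # placeholder for the tabulated kappa(delta)
    return 2.2 if delta < -0.5 else 1.9

def eta1(delta):         # placeholder for the tabulated eta_1(delta)
    return 0.05 if delta < -0.5 else 0.02

def eta2(J):             # placeholder for the tabulated eta_2(J)
    return 0.01 * J

def rms_critical_value(gms_cv_at, Omega):
    """gms_cv_at: callable mapping a kappa value to the GMS critical value
    computed with that kappa; Omega: estimated J x J correlation matrix."""
    d = delta_hat(Omega)
    return gms_cv_at(kappa_auto(d)) + eta1(d) + eta2(Omega.shape[0])
```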
G.2 Outline of the Two-Step Procedure

We outline the two-step procedure of Romano et al. (2014) to aid understanding of the simulation results. The procedure needs some modification because we test $H_0: \mu\in\mathbb{R}^J_+$ rather than $H_0: \mu\in\mathbb{R}^J_-$. To this end, let $\mathcal F = \{F = N(\mu,\Sigma) : (\mu,\Sigma)\in\mathbb{R}^J\times\Psi\}$, $\mathcal F_0 = \{F\in\mathcal F : \mu\in\mathbb{R}^J_+\}$, and assume that the correlation matrix $\Sigma$ is known. The following steps describe a level-$\alpha$ test of $H_0: F\in\mathcal F_0$ versus $H_1: F\in\mathcal F\setminus\mathcal F_0$ using a random sample $\{W_i: i=1,\dots,n\}\overset{\text{iid}}{\sim} F$ (a Python sketch follows the list):

1. Compute the test statistic $T_n = S(\sqrt{n}\,\hat g_n,\hat\Sigma_n)$.
2. Generate bootstrap samples $\{W^*_{i,b}: i=1,\dots,n\}$, $b=1,\dots,B$, by sampling with replacement from the data $\{W_i: i=1,\dots,n\}$.
3. Compute a lower confidence rectangle $M_n(\beta) = \big\{\mu\in\mathbb{R}^J : \min_{1\le j\le J}\big[\hat\sigma^{-1}_{n,j}\sqrt{n}(\mu_j - \hat g_{n,j})\big] \ge K^{-1}_n(\beta)\big\}$, where $K^{-1}_n(\beta)$ is the $\beta$-quantile of $\big\{\min_{1\le j\le J}\big[(\hat\sigma^*_{n,j,b})^{-1}\sqrt{n}(\hat g_{n,j} - \hat g^*_{n,j,b})\big] : b=1,\dots,B\big\}$, $\hat g^*_{n,j,b} = n^{-1}\sum_{i=1}^n W^*_{i,j,b}$, and $(\hat\sigma^*_{n,j,b})^2 = n^{-1}\sum_{i=1}^n (W^*_{i,j,b} - \hat g^*_{n,j,b})^2$, for $b=1,\dots,B$ and $j=1,\dots,J$. This determines which components of $\mu$ are 'positive'.
4. Compute bootstrap test statistics $\{T^*_{n,b}: b=1,\dots,B\}$, where $T^*_{n,b} = S\big((\hat D^*_{n,b})^{-1/2}\sqrt{n}(\hat g^*_{n,b} - \hat g_n) + (\hat D^*_{n,b})^{-1/2}\sqrt{n}\,\lambda^*,\;\hat\Sigma^*_{n,b}\big)$ and $\lambda^*\in\mathbb{R}^J_+$ with $\lambda^*_j = \max\{0,\,n^{-1/2}\hat\sigma_{n,j}K^{-1}_n(\beta) + \hat g_{n,j}\}$ for $j=1,\dots,J$.
5. Compute the critical value $c^{RSW}_n(1-\alpha+\beta)$, defined as the $1-\alpha+\beta$ quantile of $\{T^*_{n,b}: b=1,\dots,B\}$.
6. Reject $H_0$ at significance level $\alpha$ if $T_n > c^{RSW}_n(1-\alpha+\beta)$ and $M_n(\beta)\nsubseteq\mathbb{R}^J_+$.

Following the choice of Romano et al. (2014), we set $\beta = \alpha/10$ for all simulations.
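A compact sketch of steps 1–6 (ours) is given below. It takes $S$ to be the modified method of moments form $S(m,\Sigma) = \sum_j \min\{0, m_j/\sigma_j\}^2$; that particular choice of $S$, the data-generating design in the usage example, and the default settings are our assumptions, and any other admissible $S$ slots in unchanged.

```python
# Sketch (ours) of the two-step test described above, for H0: mu in R^J_+.
import numpy as np

def S(m, sig):
    # modified method of moments: penalize negative studentized moments
    return np.sum(np.minimum(m / sig, 0.0) ** 2)

def rsw_two_step(W, alpha=0.05, B=999, seed=0):
    rng = np.random.default_rng(seed)
    n, J = W.shape
    beta = alpha / 10.0                       # RSW's recommended choice
    g, sig = W.mean(0), W.std(0)

    T = S(np.sqrt(n) * g, sig)                # Step 1: test statistic

    idx = rng.integers(0, n, size=(B, n))     # Step 2: bootstrap draws
    g_star = W[idx].mean(1)                   # (B, J) bootstrap means
    sig_star = W[idx].std(1)                  # (B, J) bootstrap std devs

    # Step 3: lower confidence rectangle for mu
    root = (np.sqrt(n) * (g - g_star) / sig_star).min(1)
    K_beta = np.quantile(root, beta)
    lower = g + sig * K_beta / np.sqrt(n)     # componentwise lower bounds

    # Step 4: recentred bootstrap statistics, lambda*_j = max{0, lower bound}
    lam = np.maximum(0.0, lower)
    T_star = np.array([S(np.sqrt(n) * (g_star[b] - g + lam), sig_star[b])
                       for b in range(B)])

    # Steps 5-6: critical value and decision
    c = np.quantile(T_star, 1.0 - alpha + beta)
    return (T > c) and np.any(lower < 0.0)    # reject H0?

# Usage example: J = 4 moments, two of them mildly violating mu >= 0
rng = np.random.default_rng(1)
W = rng.normal([0.2, 0.2, -0.15, -0.15], 1.0, size=(400, 4))
print("reject H0:", rsw_two_step(W))
```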
G.3 MNRP Corrections
We outline the MNRP corrections used for the finite-sample $n^{-1/2}$-local power results, an essential ingredient for a fair comparison of the procedures under the alternative. For a given pair $(J,\Omega)$, let $p^{RSW}_{n,R} \equiv p^{RSW}_{n,R}(J,\Omega)$ denote the maximum null rejection probability for the two-step procedure of Romano et al. (2014) based on $R$ Monte Carlo simulations and sample size $n$. For $t\in\{GMS, CMS, RMS\}$, the random variable $\delta^t_{n,R} \equiv \delta^t_{n,R}(J,\Omega)$ is the $(1 - p^{RSW}_{n,R})$-empirical quantile based on the simulated process $\{T^{*,t}_{n,r} - c^{*,t}_{n,r} : r=1,\dots,R\}$, where $(T^{*,t}_{n,r}, c^{*,t}_{n,r})$ correspond to the mean vector $\mu^{*,t}$ that maximizes the null rejection probability for test $t$. We add $\delta^t_{n,R}$ to the corresponding critical value in the power results to ensure that all procedures have the same MNRP. Indeed, by construction,
\[
\hat P^*_{R,t}\big(T^{*,t}_{n,r} - c^{*,t}_{n,r} \le \delta^t_{n,R}\big) = 1 - p^{RSW}_{n,R} \quad \forall\, t\in\{GMS, CMS, RMS\},
\]
where $\hat P^*_{R,t}(\cdot)$ denotes the simulation distribution of $\{T^{*,t}_{n,r} - c^{*,t}_{n,r} : r=1,\dots,R\}$.
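Operationally, the MNRP correction is a one-line quantile computation once the null simulations are in hand. A sketch (ours; `excess` and `p_rsw` are assumed to come from the Monte Carlo step described above):

```python
# Sketch (ours) of the MNRP correction: shift test t's critical value so that
# all tests share the RSW procedure's maximum null rejection probability.
import numpy as np

def mnrp_shift(excess, p_rsw):
    """excess: array of T*_{n,r} - c*_{n,r}, r = 1..R, simulated at the null
    mean mu* that maximizes test t's rejection rate; p_rsw: the benchmark MNRP."""
    return np.quantile(excess, 1.0 - p_rsw)   # delta^t_{n,R}

# Usage: in the power simulations, reject when T - c > delta rather than
# T - c > 0, so each corrected test rejects with probability p_rsw under its
# least-favourable null configuration, by construction.
```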