Inference for Moment Inequalities: A Constrained Moment Selection Procedure
Rami V. Tabri∗ and Christopher D. Walker†

Abstract
Inference in models where the parameter is defined by moment inequalities is of interest in many areas of economics. This paper develops a new method for improving the performance of generalized moment selection (GMS) testing procedures in finite samples. The method modifies GMS tests by tilting the empirical distribution in its moment selection step by an amount that maximizes the empirical likelihood subject to the restrictions of the null hypothesis. We characterize sets of population distributions on which a modified GMS test is (i) asymptotically equivalent to its non-modified version to first order, and (ii) superior to its non-modified version according to local power when the sample size is large enough. An important feature of the proposed modification is that it remains computationally feasible even when the number of moment inequalities is large. We report simulation results that show the modified tests control size well and have markedly improved local power over their non-modified counterparts.
Keywords: empirical likelihood, moment inequality model, statistical information.
JEL Classification: C12, C14, C21

∗ School of Economics, The University of Sydney, Sydney, New South Wales 2006, Australia, Tel: +61 2 9351 3092, Fax: +61 2 9351 4341, Email: [email protected].
† Corresponding author. Department of Economics, Harvard University, Cambridge, MA 02138, United States of America, Email: [email protected].

1 Introduction
Statistical inference in models defined by moment inequalities is a frequently encountered topic in econometrics. Examples of applications include games of entry with multiple equilibria (e.g., Ciliberto and Tamer, 2009), single/multiple agent optimization problems (e.g., Pakes et al., 2015), censored and missing data (e.g., Manski and Tamer, 2002; Imbens and Manski, 2004), model selection tests (e.g., Shi, 2015, and Hsu and Shi, 2017), event-study designs (e.g., Rambachan and Roth, 2019), stochastic dominance comparisons (e.g., Whang, 2019), and New-Keynesian DSGE models (e.g., Moon and Schorfheide, 2009). This paper considers inference for a finite-dimensional parameter defined by a finite number of unconditional moment inequalities.

We suppose that there exists a true value of the parameter $\theta \in \Theta \subseteq \mathbb{R}^d$ that satisfies the moment inequality restrictions

  $E_F\big(g_j(W_i, \theta)\big) \ge 0, \quad j = 1, \dots, J,$  (1.1)

where $\{g_j(\cdot, \theta) : j = 1, \dots, J\}$ are known real-valued functions, $\{W_i : i \le n\}$ are independent and identically distributed (i.i.d.) with unknown distribution $F$, and $W_i \in \mathbb{R}^{\dim(W_i)}$. Under these moment conditions, the set $\Theta_I(F) \equiv \{\theta \in \Theta : E_F\big(g_j(W_i,\theta)\big) \ge 0 \ \forall j = 1, \dots, J\}$ denotes the so-called identified set, while any $\theta \in \Theta_I(F)$ is termed an identifiable parameter. Thus, the true value of the parameter might not be uniquely identified by $F$ and the economic model.

We are interested in confidence sets for $\theta$ constructed by test inversion. The test is based on a statistic $T_n$, for testing individual hypotheses for each $\theta$ of the form

  $H_0 : \theta \in \Theta_I(F)$ versus $H_1 : \theta \notin \Theta_I(F)$.  (1.2)

Inference in this model is challenging because the pointwise limiting null distribution of conventional test statistics is discontinuous in the parameter: the dependence on the parameter is through the index set of moment inequalities in (1.1) that are binding. In particular, a moment inequality enters the pointwise asymptotic null distribution of the test statistic $T_n$ whenever it holds as an equality. Tests of (1.2) that have good properties incorporate information about which moments $E_F\big(g_j(W_i,\theta)\big)$ are "positive", in order to exclude them from the computation of a critical value. Tests of this sort are known as two-step procedures in the literature, examples of which include Andrews and Soares (2010), Canay (2010), Andrews and Barwick (2012a), and Romano et al. (2014). The first step of those testing procedures uses the data to determine whether the moment inequalities (1.1) are close to or far from being equalities. The second step uses the outcome of the first step to yield information about which moment inequalities are "positive" when constructing tests of (1.2).

The literature on two-step tests of (1.2) is vast, and almost all of these tests use the sample-analogue estimator of the moments $E_F\big(g_j(W_i,\theta)\big)$ in the first step to determine the slackness of the moment inequalities. This feature ignores the information present in the restrictions (1.1), because the sample-analogue estimator does not exploit the fact that the moments satisfy these restrictions under the null hypothesis in (1.2). Thus, we conjecture that implementing this information in such tests can improve their accuracy in finite samples under the null and alternative hypotheses.
This paper provides such a modification for the broad class of generalized moment selection (GMS) testing procedures put forward by Andrews and Soares (2010), and finds that our conjecture is in the right direction.

We propose a modification of GMS testing procedures that implements the information present in (1.1) using the method of empirical likelihood (Owen, 2001). The modification is to replace the sample-analogue estimator of the moments in the first step of the GMS procedure with its constrained empirical likelihood counterpart, where the constraints are the moment inequalities (1.1). We label this modification constrained moment selection (CMS). For a given test statistic and moment selection function, the CMS and GMS tests only differ in terms of which moments they select for the computation of the critical value in tests of (1.2). The motivation for our proposal is that the detection of the "positive" moment inequalities in the first step would be more accurate because we are using additional information that is available to us, which the sample-analogue estimator of the moments ignores. Consequently, the CMS procedure alters the GMS critical value for testing (1.2) in a data-dependent way that incorporates the information contained in (1.1) through a reduction of the parameter space for $F$. For this reason, we expect CMS tests of (1.2) to be more accurate than their GMS counterparts in finite samples.

This paper characterises the parameter space for $(\theta, F)$ over which the CMS and GMS testing procedures are asymptotically equivalent, to first order, under the null, local alternatives, and distant alternatives. We focus, though, on the GMS class of testing procedures in which the moment selection function is given by the moment selection $t$-test. This focus is without loss of generality, as the results extend naturally, with appropriate modifications, to the more general setup in Andrews and Soares (2010) using their assumptions. This means that for a given test statistic, CMS tests inherit all of the asymptotic properties of GMS tests. Specifically, under the null, CMS confidence sets are asymptotically valid with uniformity over the parameter space, not asymptotically conservative, and not asymptotically similar. Furthermore, CMS tests of (1.2) have greater asymptotic local power than tests based on subsampling or fixed asymptotic critical values, and are consistent against distant alternatives. The parameter space imposes only three conditions in addition to the conditions that define the parameter space Andrews and Soares (2010) introduce. These conditions are part of Assumption GEL in Andrews and Guggenberger (2009): (i) a uniform bound on the variances of the moment functions, (ii) a lower bound on the determinant of their correlation matrix, and (iii) a regularity condition on an estimator of the degree of slackness of the moments arising from the dual formulation of the constrained empirical likelihood problem. Collectively, the conditions that define our parameter space enable the use of results from Andrews and Guggenberger (2009) on constrained empirical likelihood estimation in our proofs of the aforementioned asymptotic results.

While GMS and CMS are asymptotically equivalent procedures, we characterise local alternatives under which the power of CMS tests dominates their GMS counterparts for sufficiently large, but finite, samples. These are directions in the alternative that have some non-violated moment inequalities (SNVIs) and a non-negative correlational structure.
That is, configurations where some of the moments $E_F\big(g_j(W_i,\theta)\big)$ under the alternative hypothesis are "positive", and the covariance matrix of $\{g_j(W_i,\theta) : j = 1, \dots, J\}$ has non-negative entries only. The non-negative correlational structure arises in empirical applications; see, for example, Lok and Tabri (in press), who point to that structure for moment inequalities characterising stochastic dominance comparisons. It is quite difficult to determine the extent of this difference in local powers analytically. However, using a Monte Carlo simulation experimental design based on Andrews and Barwick (2012a), who focus on finite-sample comparisons of the maximum null rejection probability (MNRP), we show using the modified method of moments (MMM) statistic that along such local alternatives the differences in MNRP-corrected powers of CMS and GMS tests can be approximately 36 percentage points when $J = 4$ and $n = 250$, which is strikingly large. See Section 4 for more details.

The two-step tests in this literature that exploit the information (1.1) are the procedures put forward by Andrews and Guggenberger (2009) and Canay (2010). They implement this information using (generalised) empirical likelihood. Andrews and Guggenberger (2009) and Canay (2010) develop subsampling and bootstrap tests of (1.2), respectively, using empirical-likelihood-type test statistics. Both tests have correct asymptotic size in a uniform sense and are shown not to be asymptotically conservative. However, Canay (2010)'s test has higher asymptotic power because it is a GMS procedure. More generally, Andrews and Soares (2010) show that the asymptotic power of GMS tests dominates that of subsampling and plug-in asymptotic tests. A disadvantage of Canay (2010)'s procedure is that it may be more computationally burdensome than other GMS tests. Thus, our modification of GMS tests can improve finite-sample performance without incurring a high computational cost.

Andrews and Barwick (2012a) proposed a refinement of GMS termed refined moment selection (RMS) and discussed the reasons why such an approach is preferable. However, the RMS procedure is quite computationally expensive when $J > 10$. By contrast, the CMS procedure remains computationally feasible when $J$ is large. The reason is that the constrained empirical likelihood optimization problem it is based upon has a strictly concave objective function and a convex feasible set, and the choice variables enter linearly into the constraints. As a consequence, there is a unique global solution to this optimization problem, and its implementation involves an off-the-shelf programming routine. More recently, Romano et al. (2014) proposed a two-step testing procedure for moment inequalities that is similar in spirit to the RMS procedure and remains computationally feasible when $J$ is large. An important distinction between the CMS testing procedure and these tests is that, like GMS tests, neither of them exploits the information present in the moment inequality constraints (1.1), because they employ the sample-analogue estimator of the moments in their first step.

We examine the finite-sample performance of CMS tests using the MMM and adjusted quasi-likelihood-ratio (AQLR) test statistics in Monte Carlo simulations based on the experimental design in Andrews and Barwick (2012a). The experiment compares the performance of CMS to its GMS, RMS, and RSW counterparts in terms of MNRP and MNRP-corrected local power.
The inclusion of the RMS and RSW procedures in the simulation experiment is to benchmark the performance of CMS. Overall, the simulation results showcase the value of implementing the information (1.1) in the CMS procedure in terms of finite-sample size and power properties, and corroborate its theoretically superior performance over GMS. The simulation results also show that the performance of CMS and RMS tests based on the AQLR statistic is comparable. This finding is encouraging, as the RMS test has desirable asymptotic properties but can be computationally expensive when $J$ is large, while the CMS procedure is inexpensive to compute.

The idea of exploiting information on parameters defined by constraints for improving performance in statistical problems, through constrained estimation, is one of the most natural ideas in statistics. The literature on constrained estimation via tilting the empirical distribution overlaps with this paper, where the problem is that the constraints/information are not adequately reflected by the empirical distribution (e.g., Hall and Presnell, 1999). Tilting the empirical distribution allows one to incorporate information selectively into a statistical procedure without changing the procedure itself. Lok and Tabri (in press) apply this idea to modifying two-step bootstrap tests for restricted stochastic dominance orderings using empirical likelihood and semi-infinite programming. The parameter of interest in their setup is infinite-dimensional and there is a continuum of moment inequality restrictions, which are defined by moment functions that have a particular form. The form of the moment functions in their setup yields a correlational structure that facilitates the analysis of such moment inequalities. Contrastingly, in the setup of this paper, $J$ is finite and the form of the moment functions $\{g_j(\cdot,\theta) : j = 1, \dots, J\}$ is arbitrary. The implementation of empirical likelihood in their setup has a data-driven number of inequalities that increases with the sample size, which can be as large as 500 in moderate sample sizes. The ability of empirical likelihood to straightforwardly execute with a large number of moment inequality restrictions transfers to the CMS procedure for models with large $J$. This computational feasibility of CMS is an important feature of our approach. Similar to Lok and Tabri (in press), this paper is also part of the econometrics literature on shape restrictions (e.g., Chetverikov et al., 2018, and the references therein), as the inequalities (1.1) can be thought of as finite-dimensional analogues of shape restrictions on nonparametric functions.

We organize the paper as follows. Section 2 introduces the statistical framework, as well as the GMS and CMS procedures. Section 3 introduces the main results of the paper. Section 4 reports the results of Monte Carlo simulations, and Section 5 concludes.

For notational simplicity, throughout the paper we write partitioned column vectors as $h = (h_1, h_2)$ rather than $h = (h_1', h_2')'$. Let $\mathbb{R}_+ = \{x \in \mathbb{R} : x \ge 0\}$, $\mathbb{R}_{+,\infty} = \mathbb{R}_+ \cup \{+\infty\}$, $\mathbb{R}_{[+\infty]} = \mathbb{R} \cup \{+\infty\}$, and $\mathbb{R}_{[\pm\infty]} = \mathbb{R} \cup \{\pm\infty\}$, let ":=" denote the definitional identity, and let $\bar{A}$ denote the closure of a set $A$.

2 The Statistical Framework

The object of interest is a parameter $\theta \in \Theta \subseteq \mathbb{R}^d$, $d < +\infty$, defined by a finite number of known moment functions $g_j : \mathcal{W} \times \Theta \to \mathbb{R}$ that satisfy the following unconditional moment inequality restrictions:

  $E_F\big(g_j(W, \theta)\big) \ge 0 \quad \forall j \in \mathcal{J},$  (2.1)

where $F$ denotes the true distribution of the observed data $W$ and $\mathcal{J} := \{1, \dots, J\}$ with $J < \infty$.
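As a stylized illustration of (2.1) (an example we add for concreteness, in the spirit of the missing-data applications of Manski and Tamer, 2002), suppose the researcher observes an interval $[W^L, W^U]$ that is known to contain an unobserved outcome, and let $\theta$ denote the outcome's mean. With $W = (W^L, W^U)$, $g_1(W, \theta) = W^U - \theta$, and $g_2(W, \theta) = \theta - W^L$, the restrictions (2.1) read

  $E_F(W^U) - \theta \ge 0 \quad \text{and} \quad \theta - E_F(W^L) \ge 0,$

so that $\Theta_I(F) = [E_F(W^L), E_F(W^U)]$: the mean is partially identified, and every point of this interval is an identifiable parameter.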
In general, the identified set, $\Theta_I(F) = \{\theta \in \Theta : E_F(g_j(W,\theta)) \ge 0 \ \forall j \in \mathcal{J}\}$, is not a singleton, meaning that the parameter is partially identified.

The moment inequality model is given by the following definition.

Definition 1. [Moment Inequality Model] Let $\mathcal{F}$ be the set of parameters $(\theta, F)$ that satisfy:
1. $\theta \in \Theta \subseteq \mathbb{R}^d$.
2. $\{W_i : i \ge 1\}$ are i.i.d. under $F$.
3. $E_F\big(g_j(W_i,\theta)\big) \ge 0$ for $j \in \mathcal{J}$.
4. $\sigma^2_{F,j}(\theta) := \mathrm{Var}_F\big(g_j(W_i,\theta)\big) \in [\varepsilon_*, M_*]$ for some $M_* > \varepsilon_* > 0$.
5. $\Omega(\theta, F) \in \Psi$, where $\Omega(\theta,F)$ is the $J \times J$ correlation matrix of $\{g_j(W_i,\theta), j = 1, \dots, J\}$, and $\Psi$ is the space of correlation matrices whose determinant is greater than $\varepsilon > 0$.
6. There exist $\delta > 0$ and $M < +\infty$ such that $E_F\big|g_j(W_i,\theta)/\sigma_{F,j}(\theta)\big|^{2+\delta} \le M$ for all $j \in \mathcal{J}$.

All of the conditions in this definition, except for Conditions 4 and 5, are those presented in (2.2) of Andrews and Soares (2010). Condition 4 is a strengthening of Condition (v) in Andrews and Soares (2010) so that the variances of the moment functions are uniformly bounded. Condition 5 specifies the nonsingularity of the matrix $\Omega(\theta,F)$. These conditions are relatively unrestrictive and are part of Assumption GEL in Andrews and Guggenberger (2009). Furthermore, they arise frequently in papers that consider empirical likelihood inference for moment inequalities (e.g., Canay, 2010, and Lok and Tabri, in press).

For a given value of the parameter, $\theta = \theta_0$, we invert tests of the hypothesis $H_0 : \theta_0 \in \Theta_I(F)$ to construct confidence sets of the form $CS_n = \{\theta \in \Theta : T_n(\theta) \le c_{1-\alpha}(\theta)\}$, where $T_n(\theta)$ denotes a test statistic and $c_{1-\alpha}(\theta)$ is a critical value for tests with nominal level $\alpha \in (0, 1/2)$. $CS_n$ is a uniformly valid confidence set for $\theta$ if

  $\liminf_{n \to +\infty} \inf_{(\theta,F) \in \mathcal{F}} P_F\big(T_n(\theta) \le c_{1-\alpha}(\theta)\big) \ge 1 - \alpha,$  (2.2)

where $P_F(\cdot)$ is the probability measure induced by repeated sampling from $F$. Uniformity is essential in order for asymptotic size to be a good approximation to the finite-sample size of confidence sets, because the test statistic exhibits a discontinuity in its asymptotic distribution (as a function of the distribution generating the data), but not in its finite-sample distribution. Discontinuities of this type can create asymptotic size problems that are analogous to those that arise with parameters that are near a boundary (e.g., Andrews and Guggenberger, 2009).

A test statistic is a function $S : \mathbb{R}^J_{[+\infty]} \times \mathcal{V}^{J \times J} \to \mathbb{R}$ given by $T_n(\theta) := S\big(n^{1/2}\hat{g}_n(\theta), \hat{\Sigma}_n(\theta)\big)$, where $\mathcal{V}^{J \times J}$ is the set of invertible $J \times J$ variance matrices, $\hat{g}_n(\theta) = \big[n^{-1}\sum_{i=1}^n g_1(W_i,\theta), \dots, n^{-1}\sum_{i=1}^n g_J(W_i,\theta)\big]^{\top}$, $g(W_i,\theta) = \big[g_1(W_i,\theta), \dots, g_J(W_i,\theta)\big]^{\top}$, and $\hat{\Sigma}_n(\theta) = n^{-1}\sum_{i=1}^n (g(W_i,\theta) - \hat{g}_n(\theta))(g(W_i,\theta) - \hat{g}_n(\theta))^{\top}$. Two examples are the modified method of moments (MMM) and adjusted quasi-likelihood-ratio (AQLR) statistics. In the context of the moment inequality model $\mathcal{F}$ given by Definition 1, these test statistics are defined as

  $S_1\big(n^{1/2}\hat{g}_n(\theta), \hat{\Sigma}_n(\theta)\big) = n\sum_{j=1}^J \big(\min\{0, \hat{g}_{n,j}(\theta)/\hat{\sigma}_{n,j}(\theta)\}\big)^2$ and  (2.3)

  $S_{2A}\big(n^{1/2}\hat{g}_n(\theta), \hat{\Sigma}_n(\theta)\big) = n \inf_{t \in \mathbb{R}^J_{+,\infty}} (\hat{g}_n(\theta) - t)^{\top}\big(\tilde{\Sigma}_n(\theta)\big)^{-1}(\hat{g}_n(\theta) - t),$  (2.4)

respectively, where $\tilde{\Sigma}_n(\theta) = \hat{\Sigma}_n(\theta) + \max\{0, 0.012 - \det(\hat{\Omega}_n(\theta))\}\hat{D}_n(\theta)$, $\hat{\Omega}_n(\theta) = \hat{D}_n^{-1/2}(\theta)\hat{\Sigma}_n(\theta)\hat{D}_n^{-1/2}(\theta)$, and $\hat{D}_n(\theta) = \mathrm{diag}\,\hat{\Sigma}_n(\theta)$, where $\mathrm{diag}\,\hat{\Sigma}_n(\theta)$ is a diagonal matrix with dimensions equal to those of $\hat{\Sigma}_n(\theta)$ whose diagonal elements equal those of $\hat{\Sigma}_n(\theta)$.
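For concreteness, a minimal sketch in R of the MMM statistic (2.3) (our own illustration, not the paper's code; `gmat` denotes the $n \times J$ matrix with rows $g(W_i, \theta)^{\top}$):

```r
# Minimal sketch: the MMM statistic S_1 in (2.3). gmat is the n x J matrix
# whose (i, j) entry is g_j(W_i, theta); the 1/n variance convention matches
# the definition of \hat{Sigma}_n(theta) above.
mmm_stat <- function(gmat) {
  n <- nrow(gmat)
  gbar <- colMeans(gmat)                            # \hat g_n(theta)
  sigma <- sqrt(colMeans(sweep(gmat, 2, gbar)^2))   # \hat sigma_{n,j}(theta)
  n * sum(pmin(0, gbar / sigma)^2)                  # n * sum_j min{0, gbar_j/sigma_j}^2
}
```

The AQLR statistic (2.4) is analogous but requires minimizing a quadratic form over $t \in \mathbb{R}^J_+$, which is itself a small convex program.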
2.1 The GMS Procedure

The point of departure for establishing that (2.2) holds for the GMS procedure is to consider the asymptotic distribution of $T_n(\theta)$ under a suitable sequence of null distributions. For any sequence $\{F_n : n \ge 1\}$ in the model of the null hypothesis, the test statistic satisfies

  $T_n(\theta) \xrightarrow{d} S\big(\Omega_0^{1/2} Z^* + h_1, \Omega_0\big)$, where $Z^* \sim N(0_J, I_J)$,  (2.5)

$h_1 \in \mathbb{R}^J_{+,\infty}$, and $\Omega_0$ is a $J \times J$ correlation matrix. Specifically, this large-sample result (2.5) follows from the form of the test statistic, the Central Limit Theorem, and the convergence in probability of the sample correlation matrix. The vector $h_1 = (h_{1,1}, \dots, h_{1,J})$ has elements given by $\lim_{n \to +\infty} n^{1/2} E_{F_n}\big(g_j(W_i,\theta)\big)/\sigma_{F_n,j}(\theta)$ and measures the degree of slackness of the moment inequalities. The crux of this asymptotic construction is that the limiting distribution in (2.5) now depends continuously on the degree of slackness of the moment inequalities via the parameter $h_1$, which reflects the finite-sample situation.

The asymptotic implementation of the GMS critical value is the $1-\alpha$ quantile of a data-dependent version of the asymptotic null distribution in (2.5). It replaces $\Omega_0$ by a consistent estimator and replaces $h_1$ with a function $\varphi : \mathbb{R}^J \times \Psi \to \mathbb{R}^J_{+,\infty}$, which measures the slackness of the moment inequalities through $\hat{\xi}_n(\theta) = \kappa_n^{-1} n^{1/2} \hat{D}_n^{-1/2}(\theta)\hat{g}_n(\theta)$, where $\{\kappa_n : n \ge 1\}$ is a divergent sequence of scalars (Andrews and Soares, 2010). The GMS critical value, $\hat{c}_n(\theta_0, 1-\alpha)$, is the $1-\alpha$ quantile of

  $L_n(\theta_0, Z^*) = S\big(\hat{\Omega}_n^{1/2}(\theta_0) Z^* + \varphi(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0)),\, \hat{\Omega}_n(\theta_0)\big),$  (2.6)

where $Z^* \sim N(0_J, I_J)$ and is independent of $\{W_i : i \ge 1\}$. That is,

  $\hat{c}_n(\theta_0, 1-\alpha) := \inf\big\{x \in \mathbb{R} : P\big(L_n(\theta_0, Z^*) \le x\big) \ge 1-\alpha\big\},$  (2.7)

where $P\big(L_n(\theta_0, Z^*) \le x\big)$ denotes the conditional CDF at $x$ of $L_n(\theta_0, Z^*)$, conditional upon $(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0))$. In practice, the calculation of $\hat{c}_n(\theta_0, 1-\alpha)$ is by simulating $L_n(\theta_0, Z^*)$ using $R$ i.i.d. draws from $Z^* \sim N(0_J, I_J)$ and computing the $1-\alpha$ quantile of the empirical CDF from $\{L_n(\theta_0, Z^*_r) : r = 1, \dots, R\}$.

Alternatively, one may compute the GMS critical value using the bootstrap. We briefly describe this approach. Let $\{W_i^* : i \le n\}$ be a bootstrap sample drawn from the empirical distribution of the data $\{W_i : i \le n\}$, and define $\hat{g}_n^*(\theta) = n^{-1}\sum_{i=1}^n g(W_i^*, \theta)$, $\hat{\Sigma}_n^*(\theta) = n^{-1}\sum_{i=1}^n (g(W_i^*,\theta) - \hat{g}_n^*(\theta))(g(W_i^*,\theta) - \hat{g}_n^*(\theta))^{\top}$, $\hat{D}_n^*(\theta) = \mathrm{diag}\,\hat{\Sigma}_n^*(\theta)$, and $\hat{\Omega}_n^*(\theta) = (\hat{D}_n^*(\theta))^{-1/2}\hat{\Sigma}_n^*(\theta)(\hat{D}_n^*(\theta))^{-1/2}$. The bootstrap implementation of the GMS procedure replaces $L_n(\theta_0, Z^*)$ in (2.6) with

  $L_n(\theta_0, \{W_i^* : i \le n\}) = S\big(G_n^*(\theta_0) + \varphi(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0)),\, \hat{\Omega}_n^*(\theta_0)\big)$, where $G_n^*(\theta_0) = n^{1/2}(\hat{D}_n^*(\theta_0))^{-1/2}(\hat{g}_n^*(\theta_0) - \hat{g}_n(\theta_0)),$

and defines a critical value analogous to (2.7). In practice, this critical value is the empirical $1-\alpha$ quantile of the bootstrap statistics $\{L_n(\theta_0, \{W_{i,r}^* : i \le n\}) : r = 1, \dots, R\}$, where $\{\{W_{i,r}^* : i \le n\} : r = 1, \dots, R\}$ are bootstrap samples drawn from the empirical distribution of the data $\{W_i : i \le n\}$.
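A minimal sketch of the simulated critical value (2.7) for the MMM statistic, using the 'moment selection $t$-test' choice $\varphi^{(1)}$ defined in (2.8) below and $\kappa_n = (\ln n)^{1/2}$ (again our own illustration, not the paper's code):

```r
# Minimal sketch: simulated GMS critical value (2.7) for the MMM statistic,
# with the selection function phi^(1) of (2.8) and kappa_n = sqrt(log n).
gms_critical_value <- function(gmat, alpha = 0.05, R = 1000) {
  n <- nrow(gmat); J <- ncol(gmat)
  gbar <- colMeans(gmat)
  Sigma <- crossprod(sweep(gmat, 2, gbar)) / n          # \hat Sigma_n(theta)
  Dhalf <- sqrt(diag(Sigma))                            # \hat D_n^{1/2}(theta) diagonal
  xi <- sqrt(n) * gbar / (sqrt(log(n)) * Dhalf)         # \hat xi_n(theta)
  keep <- xi <= 1                                       # phi_j^(1) = 0 iff xi_j <= 1
  Omega_half <- chol(Sigma / tcrossprod(Dhalf))         # a square root of \hat Omega_n
  draws <- replicate(R, {
    Zstar <- drop(crossprod(Omega_half, rnorm(J)))      # \hat Omega_n^{1/2} Z*
    sum(pmin(0, Zstar[keep])^2)                         # S_1 with selected moments only
  })
  quantile(draws, 1 - alpha, names = FALSE)
}
# The GMS test rejects H_0 when mmm_stat(gmat) > gms_critical_value(gmat).
```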
The asymptotic results of this paper hold for the bootstrap provided that $G_n^*(\theta_{n,h}) \xrightarrow{d} \Omega_0^{1/2} Z^*$, where the convergence is conditional on $\{W_i : i \le n\}$ for almost every sample path, for all sequences $\{(\theta_{n,h}, F_{n,h}) : n \ge 1\}$ in $\mathcal{F}$.

There are numerous choices for $\varphi$ and $\{\kappa_n : n \ge 1\}$. Chernozhukov et al. (2007) and Andrews and Soares (2010) recommend using $\kappa_n = (\ln n)^{1/2}$. Another option is to set $\kappa_n = (2\ln\ln n)^{1/2}$, which is used in Canay (2010). Our main results set $\varphi = \varphi^{(1)}$, where

  $\varphi_j^{(1)}(\xi, \Omega) = 0$ if $\xi_j \le 1$ and $\varphi_j^{(1)}(\xi, \Omega) = +\infty$ if $\xi_j > 1$, for $j \in \mathcal{J}$,  (2.8)

which is referred to as the 'moment selection $t$-test' because it resembles a $t$-test with deterministic critical value $\kappa_n$. The decision reflects the recommendations of Andrews and Barwick (2012a), and is essentially without loss of generality because our results extend to any choice of $\varphi$ that satisfies the assumptions of Andrews and Soares (2010). Appendix F.1 discusses how to generalize our results to other suitable choices of $\varphi$.

The advantage of the GMS procedure is that it asymptotically detects the "positive" moments $E_F(g_j(W_i,\theta))$ and excludes them from the computation of the critical value, so as to mimic the discontinuity in the asymptotic null distribution of $T_n(\theta)$. This ability of GMS tests to detect such moments is the source of its improvements over the subsampling and plug-in procedures under the null and alternative hypotheses.

2.2 The CMS Procedure

Although GMS tests are computationally simple and have desirable asymptotic properties, their performance in finite samples depends crucially on how well they detect the "positive" moments, so as to omit them from the computation of the critical value. Their use of the sample-analogue estimator of the moments for detecting the positive moments does not implement the information embedded in (2.1), and implementing this information appropriately can improve the detection accuracy of "positive" moments in finite samples.

For a given moment selection function $\varphi$, the CMS procedure implements the information present in (2.1) through a surgical modification of the GMS procedure. The modification is to replace $\hat{g}_n(\theta)$ with its constrained empirical likelihood counterpart, where the constraints impose the inequality restrictions (2.1). Specifically, CMS replaces $\hat{\xi}_n(\theta)$ with $\acute{\xi}_n(\theta) = \kappa_n^{-1} n^{1/2} \hat{D}_n^{-1/2}(\theta)\acute{g}_n(\theta)$, where $\acute{g}_n(\theta) = \sum_{i=1}^n \acute{p}_i\, g(W_i, \theta)$ and the probabilities $\acute{p}_1, \dots, \acute{p}_n$ solve

  $\max_{p_1,\dots,p_n} \Big\{\sum_{i=1}^n \ln(p_i) : \sum_{i=1}^n p_i g_j(W_i,\theta) \ge 0 \ \forall j \in \mathcal{J}, \ \sum_{i=1}^n p_i = 1, \ p_i \ge 0 \ \forall i\Big\},$  (2.9)

and then computes a critical value as described in (2.7), but replaces $\varphi(\hat{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0))$ with $\varphi(\acute{\xi}_n(\theta_0), \hat{\Omega}_n(\theta_0))$ in (2.6). The CMS modification of GMS can easily be applied to all choices of $\varphi$ and $\{\kappa_n : n \ge 1\}$ presented in Andrews and Soares (2010) because it only replaces $\hat{g}_n(\theta)$ with $\acute{g}_n(\theta)$. The estimator $\acute{g}_{n,j}(\theta)$ of $E_F\big(g_j(W,\theta)\big)$ is more accurate than $\hat{g}_{n,j}(\theta)$ because the optimization problem (2.9), which gives rise to $\acute{g}_n(\theta)$, imposes the correct constraint $E_F\big(g_j(W,\theta)\big) \ge 0$, while $\hat{g}_n(\theta)$ ignores such information. Thus, when $E_F\big(g_j(W,\theta)\big) > 0$, $\varphi_j(\acute{\xi}_n(\theta), \hat{\Omega}_n(\theta))$ detects this configuration more reliably than $\varphi_j(\hat{\xi}_n(\theta), \hat{\Omega}_n(\theta))$, and therefore takes it into account by delivering a critical value that is suitable for the case where this moment inequality is omitted.
This feature of CMS leads to it having better finite-sample properties than GMS under the null and alternative hypotheses.

The CMS procedure is not computationally expensive because the empirical likelihood optimization problem (2.9) has a strictly concave objective function and a convex feasible set that is characterised by affine functions of the choice variables (Owen, 2001). This means that the optimization problem (2.9) has a unique global solution, and it can be computed numerically using standard optimization routines in software such as Matlab, R, or GAUSS. This computational simplicity of the optimization problem (2.9) is an important feature of the CMS procedure.
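Concretely, (2.9) can be handed to any off-the-shelf convex solver; a minimal sketch using the R package CVXR (our own illustration, not necessarily the routine used for the simulations in Section 4):

```r
# Minimal sketch: the constrained EL weights of (2.9) via the CVXR package.
library(CVXR)
cms_el_weights <- function(gmat) {        # gmat: n x J matrix of g_j(W_i, theta)
  n <- nrow(gmat)
  p <- Variable(n)
  constraints <- list(
    sum(p) == 1,                          # weights form a probability vector
    p >= 0,
    t(gmat) %*% p >= 0                    # sum_i p_i g_j(W_i, theta) >= 0, j in J
  )
  result <- solve(Problem(Maximize(sum(log(p))), constraints))
  drop(result$getValue(p))                # \acute p_1, ..., \acute p_n
}
```

The tilted moment estimate is then $\acute{g}_n(\theta)$ = `drop(t(gmat) %*% cms_el_weights(gmat))`, and the CMS critical value re-uses the GMS construction with the slackness measure built from $\acute{g}_n(\theta)$ instead of $\hat{g}_n(\theta)$.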
Remark 1. One can 'fully constrain' the CMS procedure by using restricted estimators of the correlation matrix. In this case, we evaluate $\varphi(\acute{\xi}_n^{FC}(\theta), \acute{\Omega}_n(\theta))$, where

  $\acute{\xi}_n^{FC}(\theta) = \kappa_n^{-1} n^{1/2} \acute{D}_n^{-1/2}(\theta)\acute{g}_n(\theta), \quad \acute{\Omega}_n(\theta) = \acute{D}_n^{-1/2}(\theta)\acute{\Sigma}_n(\theta)\acute{D}_n^{-1/2}(\theta),$
  $\acute{\Sigma}_n(\theta) = \sum_{i=1}^n \acute{p}_i\,(g(W_i,\theta) - \acute{g}_n(\theta))(g(W_i,\theta) - \acute{g}_n(\theta))^{\top}, \quad \text{and} \quad \acute{D}_n(\theta) = \mathrm{diag}\,\acute{\Sigma}_n(\theta).$

In simulations not presented in this paper, we find limited practical difference between $\varphi(\acute{\xi}_n^{FC}(\theta), \acute{\Omega}_n(\theta))$ and $\varphi(\acute{\xi}_n(\theta), \hat{\Omega}_n(\theta))$. Consequently, the rest of the paper focuses on $\varphi(\acute{\xi}_n(\theta), \hat{\Omega}_n(\theta))$ because it is simpler to show that there are power advantages over GMS.
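Continuing the sketch above, the fully constrained quantities of Remark 1 follow directly from the tilted weights (again our own illustration):

```r
# Weighted mean, covariance, and correlation under the EL weights \acute p.
p_acute <- cms_el_weights(gmat)
g_acute <- drop(t(gmat) %*% p_acute)                  # \acute g_n(theta)
centered <- sweep(gmat, 2, g_acute)
Sigma_acute <- crossprod(centered * sqrt(p_acute))    # sum_i p_i (g_i - g)(g_i - g)^T
Dhalf_acute <- sqrt(diag(Sigma_acute))
Omega_acute <- Sigma_acute / tcrossprod(Dhalf_acute)  # \acute Omega_n(theta)
```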
3 Main Results

We start by introducing the assumptions that beget the main results of this paper. They are conditions on the test statistic $S$, the moment selection function $\varphi$, and the parameter space $\mathcal{F}$. The assumptions on $S$ we consider are from Andrews and Soares (2010), and are stated as Assumptions 1-7 in Appendix B for ease of exposition. Recall that we set $\varphi = \varphi^{(1)}$ in (2.8), and the main results we present are based on this choice of moment selection function. It should be noted that this choice of $\varphi$ is without loss of generality, as one can employ assumptions identical to those in Andrews and Soares (2010) on $\varphi$ to deduce the same conclusions, because the CMS procedure does not alter the moment selection function in the GMS procedure. See Appendix F.1 for the details on other choices of $\varphi$.

The first assumption concerns the sequence $\{\kappa_n : n \ge 1\}$.

Assumption K.
1. $\kappa_n \to +\infty$ as $n \to +\infty$.
2. $\kappa_n^{-1} n^{1/2} \to +\infty$ as $n \to +\infty$.

The conditions in this assumption are not restrictive: the aforementioned examples of $\{\kappa_n : n \ge 1\}$ satisfy them. The 'optimal' choice of $\{\kappa_n : n \ge 1\}$ is an important question, but the goal of our paper is more modest: to demonstrate how incorporating statistical information can improve finite-sample inference for moment inequalities in a computationally simple way, and, for this purpose, our analysis conditions on an arbitrary choice of $\{\kappa_n : n \ge 1\}$. For our Monte Carlo experiment (Section 4), we set $\kappa_n = (\ln n)^{1/2}$, which is the recommended choice in Chernozhukov et al. (2007) and Andrews and Soares (2010).

The next assumption we present is the first part of Part (d) of Assumption GEL in Andrews and Guggenberger (2009). It is helpful in establishing that $\acute{g}_n(\theta)$ is a uniformly consistent estimator of the moments under the null hypothesis $H_0 : \theta_0 \in \Theta_I(F)$. To introduce this assumption, for each $t \in \mathbb{R}^J$, define $g_i(t, \theta) = g(W_i, \theta) - t$. The vector $t$ is a nuisance parameter that captures the slackness of the moment inequalities. Using the dual formulation of the empirical likelihood problem (2.9), the amount of slackness is captured by

  $\acute{t}_n = \arg\min_{t \in \mathbb{R}^J_+} \sup_{\lambda \in \acute{\Lambda}_n(t,\theta)} n^{-1}\sum_{i=1}^n \ln\big(1 - \lambda^{\top} g_i(t,\theta)\big),$

where $\acute{\Lambda}_n(t,\theta) = \{\lambda \in \mathbb{R}^J : \lambda^{\top} g_i(t,\theta) \in Q \ \forall i = 1, \dots, n\}$ and $Q$ is an open interval of $\mathbb{R}$ containing 0. This reformulation of the empirical likelihood problem (2.9) is feasible because the linear constraint qualification applies to it. The part of Assumption GEL we include in our setup is a regularity condition concerning the uniform asymptotic behavior of $\acute{t}_n$, and is stated in terms of the following reparametrization of $\mathcal{F}$.
Definition 2. Let $\Gamma$ be defined as the set of all $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ such that, for some $(\theta, F) \in \mathcal{F}$:
1. $\mathcal{F}$ is defined in Definition 1.
2. $\gamma_1 = \big(E_F(g_1(W_i,\theta))/\sigma_{F,1}(\theta), \dots, E_F(g_J(W_i,\theta))/\sigma_{F,J}(\theta)\big)$.
3. $\gamma_2 = \big(\theta, \mathrm{vech}_*(\Omega(\theta,F))\big)$, where $\mathrm{vech}_*(\Omega(\theta,F))$ is the vector of lower off-diagonal elements of $\Omega(\theta,F)$.
4. $\gamma_3 = F$.

Andrews and Soares (2010) indicate that there is a one-to-one mapping from $\gamma$ to $(\theta, F)$; see Appendix A of their paper for the details. Denote by $\{\gamma_{n,h} : n \ge 1\} \subset \Gamma$ a sequence of parameters in $\Gamma$ such that $n^{1/2}\gamma_{n,h,1} \to h_1 \in \mathbb{R}^J_{+,\infty}$ and $\gamma_{n,h,2} \to h_2 \in \mathbb{R}^q_{[\pm\infty]}$ as $n \to \infty$, where $q = \dim(\Theta) + \dim(\mathrm{vech}_*(\Omega(\theta,F)))$. The part of Assumption GEL that we include in our setup is given by the following assumption.
Assumption T. For all subsequences $\{w_n\}$ of $\{n\}$ and all sequences $\{\gamma_{w_n,h} : n \ge 1\} \subset \Gamma$ and corresponding $\{(\theta_{w_n,h}, F_{w_n,h}) : n \ge 1\} \subset \mathcal{F}$,

  $\acute{t}_{w_n} = \arg\min_{t \in \mathbb{R}^J_+} \sup_{\lambda \in \acute{\Lambda}_{w_n}(t,\theta_{w_n,h})} w_n^{-1}\sum_{i=1}^{w_n} \ln\big(1 - \lambda^{\top} g_i(t, \theta_{w_n,h})\big)$

exists and satisfies $\sup_{n \ge 1} \|\acute{t}_{w_n}\|_{\ell_2^J} \le K$ with probability approaching 1 as $n \to +\infty$ for some constant $K < +\infty$, where $\|\cdot\|_{\ell_2^J}$ is the usual Euclidean norm on $\mathbb{R}^J$.

We also include an assumption from Andrews and Soares (2010) for the case in which $\mathrm{Int}(\Theta_I(F)) \ne \emptyset$ for some data-generating process in the model. It is required to show that when there are no binding moment inequalities, the maximum asymptotic coverage probability is equal to 1.
Assumption M. There exists $(\theta, F) \in \mathcal{F}$ that satisfies $E_F\big(g_j(W_i,\theta)\big) > 0$ for all $j \in \mathcal{J}$.

3.1 Asymptotic Size of CMS Confidence Sets

We now present the first main result of the paper. It mirrors Theorem 1 of Andrews and Soares (2010), which concerns the asymptotic size of GMS confidence sets. Denote by $\acute{c}_n(\theta_0, 1-\alpha)$ the CMS critical value under the nominal level $1-\alpha$ for testing the null hypothesis $H_0 : \theta_0 \in \Theta_I(F)$.
Theorem 1. Suppose $S$ satisfies Assumptions 1-3, $\varphi = \varphi^{(1)}$ in (2.8), the sequence $\{\kappa_n : n \ge 1\}$ satisfies Part 1 of Assumption K, and $\alpha \in (0, 1/2)$. Furthermore, let $\mathcal{F}_+ = \{(\theta,F) \in \mathcal{F} : \text{Assumption T holds}\}$ and $\mathcal{F}_{++} = \{(\theta,F) \in \mathcal{F} : \text{Assumptions T and M hold}\}$. Then the nominal level $(1-\alpha)$ CMS confidence set based on the statistic $T_n(\theta)$ satisfies the following statements:
1. $\liminf_{n \to +\infty} \inf_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_n(\theta) \le \acute{c}_n(\theta, 1-\alpha)\big) \ge 1-\alpha$.
2. $\liminf_{n \to +\infty} \inf_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_n(\theta) \le \acute{c}_n(\theta, 1-\alpha)\big) = 1-\alpha$, if in addition $S$ and $\{\kappa_n : n \ge 1\}$ satisfy Assumption 7 and Part 2 of Assumption K, respectively.
3. $\limsup_{n \to +\infty} \sup_{(\theta,F) \in \mathcal{F}_{++}} P_F\big(T_n(\theta) \le \acute{c}_n(\theta, 1-\alpha)\big) = 1$.
Proof. See Appendix C.1. □
The first result of Theorem 1 establishes the uniform validity of CMS confidence sets over the parameter space $\mathcal{F}_+$, and the second result of this theorem shows that they are not asymptotically conservative. The third result shows that the maximum coverage probability of CMS confidence sets is equal to 1 over the parameter space $\mathcal{F}_{++}$. The parameter space $\mathcal{F}_+$ is a subset of the one used in Theorem 1 of Andrews and Soares (2010), because it imposes Assumption T and Condition 4 in Definition 1 in addition to the conditions they set for their parameter space.

The proof of Theorem 1 establishes that the CMS and GMS procedures are asymptotically equivalent with uniformity over the parameter space $\mathcal{F}_+$. The essence of this result is that for every sequence $\{\gamma_{n,h} : n \ge 1\} \subset \Gamma$ such that Assumption T holds, along the corresponding sequence $\{(\theta_{w_n,h}, F_{w_n,h}) : n \ge 1\} \subset \mathcal{F}_+$ we have $\acute{\xi}_{w_n}(\theta_{w_n,h}) = \hat{\xi}_{w_n}(\theta_{w_n,h}) + o_p(1)$. This asymptotic equivalence is a consequence of $\acute{g}_{w_n}(\theta_{w_n,h}) - \hat{g}_{w_n}(\theta_{w_n,h}) = O_p(w_n^{-1/2})$ (see Lemma D.3 in Appendix D) and Assumption K on $\kappa_{w_n}$. In particular, these arguments are used after re-writing the expression of $\acute{\xi}_{w_n}(\theta_{w_n,h})$ in terms of $\hat{\xi}_{w_n}(\theta_{w_n,h})$, as such:

  $\acute{\xi}_{w_n}(\theta_{w_n,h}) = \kappa_{w_n}^{-1} w_n^{1/2} \hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})\acute{g}_{w_n}(\theta_{w_n,h}) = \kappa_{w_n}^{-1} w_n^{1/2} \hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})\big(\acute{g}_{w_n}(\theta_{w_n,h}) - \hat{g}_{w_n}(\theta_{w_n,h})\big) + \hat{\xi}_{w_n}(\theta_{w_n,h}),$  (3.1)

to obtain the asymptotic equivalence.
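To spell out the final step (a one-line calculation we add for completeness): the diagonal entries of $\hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})$ are $O_p(1)$ by Condition 4 of Definition 1, so the rate from Lemma D.3 and Assumption K give

  $\kappa_{w_n}^{-1} w_n^{1/2} \hat{D}_{w_n}^{-1/2}(\theta_{w_n,h})\big(\acute{g}_{w_n}(\theta_{w_n,h}) - \hat{g}_{w_n}(\theta_{w_n,h})\big) = \kappa_{w_n}^{-1} w_n^{1/2}\, O_p(w_n^{-1/2}) = O_p(\kappa_{w_n}^{-1}) = o_p(1),$

since $\kappa_n \to +\infty$.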
3.2 Limiting Local Power Function of CMS Tests

This section employs the setup in Section 8 of Andrews and Soares (2010) to show that the limiting local power functions of the CMS tests coincide with their GMS counterparts when the null parameter space is $\mathcal{F}_+ = \{(\theta,F) \in \mathcal{F} : \text{Assumption T holds}\}$. For sequences of parameters $\{(\theta_{n,*}, F_n) : n \ge 1\}$, consider the testing problem

  $H_0 : E_{F_n}\big(g_j(W_i, \theta_{n,*})\big) \ge 0 \ \forall j \in \mathcal{J}$ vs. $H_1 : H_0$ is false,  (3.2)

where $\theta_{n,*} = \theta_n + \eta n^{-1/2}(1 + o(1))$ for all $n \ge 1$, $(\theta_n, F_n) \in \mathcal{F}_+$ for all $n \ge 1$, and $\eta \in \mathbb{R}^d$, where $d = \dim(\theta_n) < +\infty$. The idea is to study the behavior of the testing procedure along sequences of parameters $\{(\theta_{n,*}, F_n) : n \ge 1\}$ that differ locally from a point in the true parameter space $\mathcal{F}_+$ by $O(n^{-1/2})$. The local power function is defined as $P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big)$, where $P_{F_n}(\cdot)$ is the probability measure induced by random sampling from $F_n$ for all $n \ge 1$. The objective is to derive an expression for the limiting local power function, $\lim_{n \to +\infty} P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big)$, and compare it to its GMS counterpart.

To this end, we introduce technical assumptions for deriving the limiting local power function for CMS tests. These assumptions are from Section 8 of Andrews and Soares (2010).
Assumption LA1. The true parameters $\{(\theta_n, F_n) : n \ge 1\}$ satisfy:
1. $\theta_n = \theta_{n,*} - \eta n^{-1/2}(1+o(1))$ for some $\eta \in \mathbb{R}^d$, $\theta_{n,*} \to \theta_0$ and $F_n \to F_0$ as $n \to +\infty$, where $(\theta_0, F_0) \in \mathcal{F}_+$.
2. For each $j \in \mathcal{J}$, there exists $h_{1,j} \in \mathbb{R}_{+,\infty}$ such that $n^{1/2} E_{F_n}\big(g_j(W_i,\theta_n)\big)/\sigma_{F_n,j}(\theta_n) \to h_{1,j}$ as $n \to +\infty$.
3. $\sup\big\{E_{F_n}|g_j(W_i,\theta_{n,*})/\sigma_{F_n,j}(\theta_{n,*})|^{2+\delta} : n \ge 1\big\} < +\infty$ for all $j \in \mathcal{J}$ for some $\delta > 0$.

The first two parts of Assumption LA1 show that the sequence of true parameters, $\{\theta_n : n \ge 1\}$, is $n^{-1/2}$-local to the sequence of parameters under the null hypothesis, $\{\theta_{n,*} : n \ge 1\}$, and provide the limit of the sequence of normalised moment functions when evaluated at the sequence of true parameters $\{\theta_n : n \ge 1\}$. The third part of this assumption is a uniform integrability condition that permits the use of stochastic limit theorems for triangular arrays of row-wise i.i.d. random variables. The second assumption is as follows.

Assumption LA2. $\Pi(\theta, F) := (\partial/\partial\theta^{\top})[D^{-1/2}(\theta,F) E_F(g(W_i,\theta))] \in \mathbb{R}^{J \times \dim(\Theta)}$ exists and is a continuous function in a neighbourhood of $(\theta_0, F_0)$.

Both Assumptions LA1 and LA2 are important for proving the large-sample properties of CMS tests under $n^{-1/2}$-local alternatives. Namely, they allow one to mean-value expand the normalised moment functions under $H_0$ around $\theta = \theta_n$ and show that

  $\lim_{n \to \infty} n^{1/2} D^{-1/2}(\theta_{n,*}, F_n) E_{F_n}\big(g(W_i, \theta_{n,*})\big) = h_1 + \Pi(\theta_0, F_0)\eta,$

which can then be used to show that $T_n(\theta_{n,*}) \xrightarrow{d} J_{h_1,\eta}$, where $J_{h_1,\eta}$ is the distribution function of $S\big(\Omega_0^{1/2} Z^* + h_1 + \Pi(\theta_0,F_0)\eta, \Omega_0\big)$ and $Z^* \sim N(0_J, I_J)$ (Andrews and Soares, 2010).

Assumption LA3. $\lim_{n \to +\infty} \kappa_n^{-1} n^{1/2} D^{-1/2}(\theta_n, F_n) E_{F_n}\big(g(W_i,\theta_n)\big) = \pi_1 \in \mathbb{R}^J_{+,\infty}$.

The last assumption involves the set $C(\varphi) = \{\tilde{\pi}_1 \in \mathbb{R}^J_{[+\infty]} : \forall j \in \mathcal{J}$, either $\tilde{\pi}_{1,j} = +\infty$ or $\varphi_j(\xi, \Omega) \to \varphi_j(\tilde{\pi}_1, \Omega_0)$ as $(\xi, \Omega) \to (\tilde{\pi}_1, \Omega_0)\}$. Loosely, $C(\varphi)$ is the set of all vectors in $\mathbb{R}^J_{[+\infty]}$ for which $\varphi$ is continuous at $(\tilde{\pi}_1, \Omega_0)$. With $\varphi = \varphi^{(1)}$, this set is $C(\varphi^{(1)}) = \{\tilde{\pi}_1 \in \mathbb{R}^J_{[+\infty]} : \tilde{\pi}_{1,j} \ne 1, \ \forall j \in \mathcal{J}\}$.

Assumption LA4. 1. $\pi_1 \in C(\varphi^{(1)})$, and 2. $P_{F_0}\big(S(\Omega_0^{1/2} Z^* + \varphi^{(1)}(\pi_1, \Omega_0), \Omega_0) \le x\big)$ is continuous and strictly increasing at $x = c_{\pi_1}(\varphi^{(1)}, 1-\alpha)$, the $1-\alpha$ quantile of the distribution function of $S\big(\Omega_0^{1/2} Z^* + \varphi^{(1)}(\pi_1, \Omega_0), \Omega_0\big)$.

Assumptions LA3 and LA4 are imposed so that we can use Theorem 2(a) of Andrews and Soares (2010) to obtain the form of the GMS limiting local power function.

Next, we present the second main result of this paper. This result states that GMS and CMS tests are asymptotically equivalent, to first order, under $n^{-1/2}$-local alternatives.
Theorem 2. Suppose $S$ satisfies Assumptions 1-5, $\varphi = \varphi^{(1)}$ in (2.8), the sequence $\{\kappa_n : n \ge 1\}$ satisfies Assumption K, and Assumptions LA1-LA4 hold. Then $\lim_{n \to +\infty} P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big) = 1 - J_{h_1,\eta}\big(c_{\pi_1}(\varphi, 1-\alpha)\big)$.
Proof. See Appendix C.2. □
The intuition behind Theorem 2 is essentially the same as for Theorem 1. For a given sequence $\{(\theta_{n,*}, F_n) : n \ge 1\}$ of $n^{-1/2}$-local alternatives, we show $\acute{\xi}_n(\theta_{n,*}) = \hat{\xi}_n(\theta_{n,*}) + o_p(1)$. This asymptotic equivalence is a consequence of applying $\acute{g}_n(\theta_{n,*}) - \hat{g}_n(\theta_{n,*}) = O_p(n^{-1/2})$ (see Lemma E.6 in Appendix E) and Assumption K to a decomposition of $\acute{\xi}_n(\theta_{n,*})$ identical to (3.1). Therefore, the pairs $(\acute{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*}))$ and $(\hat{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*}))$ are asymptotically equivalent along sequences of $n^{-1/2}$-local alternatives. As this is the only point of difference between CMS and GMS, Theorem 2 follows from Theorem 2(a) of Andrews and Soares (2010). An important corollary to Theorem 2 is that CMS inherits the first-order improvements that GMS exhibits over subsampling and plug-in asymptotic critical values (see Andrews and Soares, 2010).

3.3 Finite-Sample Local Power Comparisons

While Theorem 2 establishes the equality of the limiting local power functions of CMS and GMS tests under $n^{-1/2}$-local alternatives, this section presents results that characterize sequences of local alternatives under which the power of CMS tests dominates their GMS counterparts for sufficiently large, but finite, samples. First, we must establish when it is meaningful to compare tests along sequences of $n^{-1/2}$-local alternatives. Under the conditions of Part 2 of Theorem 1, for every $r > 0$ there exist $N_{r,1}, N_{r,2} \in \mathbb{Z}_+$ (depending on $r$) such that

  $\Big|\sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \acute{c}_m(\theta, 1-\alpha)\big) - \alpha\Big| \le r \quad \forall n \ge N_{r,1}$ and  (3.3)

  $\Big|\sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \hat{c}_m(\theta, 1-\alpha)\big) - \alpha\Big| \le r \quad \forall n \ge N_{r,2},$  (3.4)

by the definition of the limit superior (with respect to $n$). Then, by the triangle inequality,

  $\Big|\sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \acute{c}_m(\theta, 1-\alpha)\big) - \sup_{m \ge n} \sup_{(\theta,F) \in \mathcal{F}_+} P_F\big(T_m(\theta) > \hat{c}_m(\theta, 1-\alpha)\big)\Big| \le 2r$

holds for all $n \ge N_r = \max\{N_{r,1}, N_{r,2}\}$. In words, given an error tolerance $r$, the tails of the sequences of exact sizes of CMS and GMS tests are within $r$ of $\alpha$, and within $2r$ of each other, when $n \ge N_r$. Thus, given $r$ (e.g., 0.0001), it is meaningful to compare the rejection probabilities along sequences of local alternatives when $n \ge N_r$.

Let $\mathcal{H}$ denote the set of all sequences $\{(\theta_{n,*}, F_n) : n \ge 1\}$ that satisfy Assumptions LA1 and LA2. The family we consider for the comparisons is defined as

  $\mathcal{M} = \big\{\{(\theta_{n,*}, F_n) : n \ge 1\} \in \mathcal{H} : \Omega(\theta_{n,*}, F_n)$ has non-negative off-diagonal elements $\forall n\big\}.$  (3.5)
For $\{(\theta_{n,*}, F_n) : n \ge 1\} \in \mathcal{H}$, let $\hat{\Upsilon}_n(\theta_{n,*}) := \{j \in \mathcal{J} : \varphi_j^{(1)}(\hat{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*})) = 0\}$ and $\acute{\Upsilon}_n(\theta_{n,*}) := \{j \in \mathcal{J} : \varphi_j^{(1)}(\acute{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*})) = 0\}$ for each $n \ge 1$. We have the following result.
Theorem 3. Let $\mathcal{M}$ be as in (3.5). Suppose that $S$ satisfies Part 1 of Assumption 1, $\varphi = \varphi^{(1)}$ in (2.8), and the sequence $\{\kappa_n : n \ge 1\}$ satisfies Assumption K. For every $\{(\theta_{n,*}, F_n) : n \ge 1\} \in \mathcal{M}$, there exists $N(\theta_{n,*}, F_n) \in \mathbb{Z}_+$ such that

  $P_{F_n}\big(T_n(\theta_{n,*}) > \hat{c}_n(\theta_{n,*}, 1-\alpha)\big) \le P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big) \quad \forall n \ge N(\theta_{n,*}, F_n).$  (3.6)

If in addition $S$ satisfies Part 1 of Assumption 2 and Part 2 of Assumption 5, and the event

  $\big\{\acute{\Upsilon}_n(\theta_{n,*}) \subsetneq \hat{\Upsilon}_n(\theta_{n,*})\big\} \cap \big\{\hat{c}_n(\theta_{n,*}, 1-\alpha) > 0\big\} \cap \big\{\acute{c}_n(\theta_{n,*}, 1-\alpha) < T_n(\theta_{n,*}) \le \hat{c}_n(\theta_{n,*}, 1-\alpha)\big\}$

has positive probability for each $n \ge N(\theta_{n,*}, F_n)$, then the weak inequalities in (3.6) are strict.

Proof. See Appendix C.3. □
Theorem 3 states that the rejection probabilities of CMS tests are no less than their GMS counterparts in large enough, but finite, sample sizes under local alternatives in $\mathcal{M}$. It also provides a sufficient condition for the ordering to hold strictly. Thus, for each sequence of local alternatives in $\mathcal{M}$ and small $r > 0$, the local power of a CMS test is larger than its GMS counterpart when $n \ge \max\{N(\theta_{n,*}, F_n), N_r\}$, where $N_r = \max\{N_{r,1}, N_{r,2}\}$ with $N_{r,1}$ and $N_{r,2}$ defined in (3.3) and (3.4), respectively.

The key message from Theorem 3 is that a comparison of GMS and CMS tests based on first-order asymptotics can be misleading, as it does not reflect the finite-sample situation for certain local alternatives. The result of Theorem 3 is similar to Corollary 6.1 of Lok and Tabri (in press); however, it is important to note that their result is specific to moment inequalities arising from restricted stochastic dominance orderings. Consequently, Theorem 3 provides a nontrivial extension of their result to the moment inequality model with finitely many inequalities and arbitrary moment functions, when the off-diagonal elements of $\Omega(\theta_{n,*}, F_n)$ are non-negative for each $n$.

At the heart of this result is the marriage of the non-negative correlational structure on $\Omega(\theta_{n,*}, F_n)$ and constrained empirical likelihood estimation. This marriage begets $\hat{g}_{n,j}(\theta_{n,*}) \le \acute{g}_{n,j}(\theta_{n,*})$ with probability approaching 1 for all sequences in $\mathcal{M}$ (see Lemma E.8). This ordering of the estimators implies that $\varphi_j^{(1)}(\acute{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*})) \ge \varphi_j^{(1)}(\hat{\xi}_n(\theta_{n,*}), \hat{\Omega}_n(\theta_{n,*}))$ holds with probability approaching 1 for all sequences in $\mathcal{M}$ (see Lemma E.9). It is this ordering of the moment selection functions under such sequences that gives rise to the result of Theorem 3.

While Theorem 3 indicates that the local powers of the GMS and CMS tests can be ordered under a class of local alternatives $\mathcal{M}$ for large enough $n$, it does not specify the extent of the discrepancy in the local powers. It is quite difficult to determine the extent of this discrepancy analytically. However, Section 4 presents Monte Carlo evidence that the discrepancy that Theorem 3 implies can be very large for local alternative sequences which have some non-violated inequalities (SNVIs). That is, sequences $\{(\theta_{n,*}, F_n) : n \ge 1\}$ in $\mathcal{M}$ where there exists $j \in \mathcal{J}$ such that $E_{F_n}\big(g_j(W_i, \theta_{n,*})\big) > 0$ for all $n$ and $\lim_{n \to +\infty} E_{F_n}\big(g_j(W_i, \theta_{n,*})\big) = 0$.
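The mechanism behind Lemma E.8 is easy to see numerically; a small illustration (our own, reusing `cms_el_weights()` from the sketch in Section 2) with one violated and one slack moment under strong positive correlation:

```r
# Illustration: under non-negative correlation, EL tilting toward the
# constraint set raises the estimates of the slack moments as well, so that
# \hat g_{n,j} <= \acute g_{n,j} componentwise (cf. Lemma E.8).
set.seed(1)
n <- 250
Omega <- matrix(c(1, 0.9, 0.9, 1), 2, 2)
gmat <- sweep(matrix(rnorm(n * 2), n, 2) %*% chol(Omega), 2, c(-0.1, 0.1), "+")
g_hat <- colMeans(gmat)                               # sample-analogue estimator
g_acute <- drop(t(gmat) %*% cms_el_weights(gmat))     # EL-tilted estimator
rbind(g_hat, g_acute)   # typically g_acute >= g_hat in both components
```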
3.4 Consistency against Distant Alternatives

This section shows that CMS tests are consistent against distant alternatives. Distant alternatives include fixed alternatives and alternatives that differ from the null by more than $O(n^{-1/2})$. The next assumption is useful for deducing this result, and it is the same one introduced by Andrews and Soares (2010) in Section 9 of their paper.

Assumption DA. Let $g^*_{n,j} = E_{F_n}\big(g_j(W_i,\theta_{n,*})\big)/\sigma_{F_n,j}(\theta_{n,*})$ for each $j \in \mathcal{J}$, and $\upsilon_n = \max_{j \in \mathcal{J}}\{-g^*_{n,j}\}$.
1. $n^{1/2}\upsilon_n \to +\infty$ as $n \to +\infty$.
2. $\Omega(\theta_{n,*}, F_n) \to \Omega_1$, $\Omega_1 \in \Psi$.

The key part of this assumption is the first part, which indicates that there exists $j \in \mathcal{J}$ such that $g^*_{n,j} < 0$, with a violation of larger order than $O(n^{-1/2})$. This condition differs from the setup with $n^{-1/2}$-local alternatives, where the sequences of alternatives $\{(\theta_{n,*}, F_n) : n \ge 1\}$ are within an $n^{-1/2}$-neighbourhood of $\mathcal{F}_+$. We have the following result.
Theorem 4. Suppose $S$ satisfies Assumptions 1, 3, 4 and 6, $\varphi = \varphi^{(1)}$ in (2.8), the sequence $\{\kappa_n : n \ge 1\}$ satisfies Assumption K, and Assumption DA holds. Then $\lim_{n \to +\infty} P_{F_n}\big(T_n(\theta_{n,*}) > \acute{c}_n(\theta_{n,*}, 1-\alpha)\big) = 1$.
Proof. See Appendix C.4. □
4 Monte Carlo Simulations

This section studies the finite-sample performance of the CMS procedure and compares it to the GMS procedure using a simulation experiment based on the designs in Andrews and Barwick (2012a). The study uses the test statistics $S_1$ in (2.3) and $S_{2A}$ in (2.4), the recommended moment selection function $\varphi = \varphi^{(1)}$ in (2.8), and the recommended localisation parameter $\kappa_n = (\ln n)^{1/2}$. The nominal level is set to $\alpha = 0.05$, and we considered sample sizes $n = 50$, 100, and 250. We also include (i) the RSW testing procedure of Romano et al. (2014) based on $S_1$ and $S_{2A}$, and (ii) the recommended RMS testing procedure, which combines $S = S_{2A}$, $\varphi = \varphi^{(1)}$, and $\kappa$-auto (a data-driven choice of $\kappa_n$), as additional benchmarks in studying the finite-sample performance of CMS; see Appendices G.2 and G.1, respectively, for further details on these testing procedures. Only bootstrap versions of the tests were implemented, with 10000 bootstrap samples per Monte Carlo replication. The computations were implemented using R.

For a given $\theta$, the null hypothesis is $H_0 : \theta \in \Theta_I(F)$. The experimental design in Andrews and Barwick (2012a,b) is a general formulation of that testing problem that does not require the specification of a particular form for the moment functions $\{g_j(\cdot,\theta) : j = 1, \dots, J\}$. They note that the finite-sample properties of tests of $H_0$ depend on the moment functions only through (i) the vector $\mu = [E_F\big(g_1(W_i,\theta)\big), \dots, E_F\big(g_J(W_i,\theta)\big)]$, (ii) the correlation matrix $\Omega = \mathrm{Corr}\big(g_1(W_i,\theta), \dots, g_J(W_i,\theta)\big)$, and (iii) the distribution of the mean-zero, variance-$I_J$ random vector $Z^{\dagger} = [Z_1^{\dagger}, \dots, Z_J^{\dagger}]$, where

  $Z_j^{\dagger} = \mathrm{Var}_F^{-1/2}\big(g_j(W_i,\theta)\big)\big(g_j(W_i,\theta) - E_F\big(g_j(W_i,\theta)\big)\big), \quad j = 1, \dots, J.$

We consider the case $Z^{\dagger} \sim N(0_J, I_J)$ and three correlation matrices, $\Omega_{\mathrm{Neg}}$, $\Omega_{\mathrm{Zero}}$, and $\Omega_{\mathrm{Pos}}$, which exhibit negative, zero, and positive correlations.

The assertion of the null hypothesis in this general formulation is $H_0 : \mu_j \ge 0 \ \forall j = 1, \dots, J$. For comparisons under the null hypothesis, we follow Andrews and Barwick (2012a) by comparing the tests' maximum null rejection probabilities (MNRPs). The MNRPs are computed over the mean vectors $\mu$ in the null parameter space given the correlation matrix $\Omega \in \{\Omega_{\mathrm{Neg}}, \Omega_{\mathrm{Zero}}, \Omega_{\mathrm{Pos}}\}$ and under the assumption of normally distributed moment inequalities. Based on simulation evidence, they conjecture that the MNRPs occur for mean vectors $\mu$ whose elements are 0's and $+\infty$'s. Thus, given a nominal level $\alpha$, they compute MNRP results over the set of mean vectors $\mu$ which have that form. The results we report are for $J = 2$, 4, and 10.
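A minimal sketch of the data-generating step under this design (our own illustration; the correlation matrix shown is an AR(1)-type stand-in, not the paper's exact $\Omega_{\mathrm{Pos}}$):

```r
# Minimal sketch: draw the n x J matrix of moment evaluations for the design,
# whose rows are mu + Omega^{1/2} Z_dagger with Z_dagger ~ N(0_J, I_J).
simulate_moments <- function(n, mu, Omega) {
  J <- length(mu)
  Zdag <- matrix(rnorm(n * J), n, J)
  sweep(Zdag %*% chol(Omega), 2, mu, "+")
}
Omega_ar1 <- 0.9^abs(outer(1:4, 1:4, "-"))  # illustrative positive Toeplitz matrix
gmat <- simulate_moments(250, c(-1, 1, 1, 1) / sqrt(250), Omega_ar1)  # SNVI-style mu
```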
The matrix $\Omega_{\mathrm{Zero}}$ equals the $J$-dimensional identity matrix. The matrices $\Omega_{\mathrm{Neg}}$ and $\Omega_{\mathrm{Pos}}$ are Toeplitz matrices whose successive off-diagonal correlations alternate in sign in the first case and are all non-negative in the second; the specific correlation vectors for $J = 2$, 4, and 10 are those in Andrews and Barwick (2012a). As in Andrews and Barwick (2012a), the simulation study treats the correlation matrices as unknown in the implementation of all of the tests.

For power comparisons, we also follow Andrews and Barwick (2012a,b). They compare the power of different tests by comparing their empirical power for a chosen set of alternative parameter vectors $\mu \in \mathbb{R}^J$ for a given correlation matrix $\Omega \in \{\Omega_{\mathrm{Neg}}, \Omega_{\mathrm{Zero}}, \Omega_{\mathrm{Pos}}\}$. The sets of $\mu$ vectors in the alternative are similar to the ones described in Andrews and Barwick (2012a,b). We adjust those sets so as to compare the local power properties of the testing procedures. The adjustment is as follows. For each $J \in \{2, 4, 10\}$, the set of $\mu$ vectors is given by $\mathcal{M}_{J,n}(\Omega) = \{\mu/\sqrt{n} : \mu \in \mathcal{M}_J(\Omega)\}$, where the set $\mathcal{M}_J(\Omega)$ of $\mu$ vectors is described in Section 7.1 of Andrews and Barwick (2012b). The $\mu$ vectors in $\mathcal{M}_{J,n}(\Omega)$ are scaled versions of those in $\mathcal{M}_J(\Omega)$, where the scaling is by $n^{-1/2}$ to create the $n^{-1/2}$-local alternatives. There are 7 elements in $\mathcal{M}_{J,n}(\Omega)$ for $J = 2$ and 40 for $J = 10$, with the intermediate count for $J = 4$ given in Andrews and Barwick (2012b). We omit their description for brevity.

As the MNRPs of the tests can differ in finite samples, the simulation results on power comparisons are based on an MNRP correction that is similar to the one employed by Andrews and Barwick (2012a). For each test statistic $S$, the MNRP correction of the CMS, GMS and RMS procedures is to add a constant based on the true matrix $\Omega$ to their corresponding critical values, so that their resulting MNRPs match that of the RSW testing procedure with nominal level $\alpha = 0.05$; see Section G.3 in the appendix for the details.
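A rough sketch of this correction step (our own reconstruction of the idea; `reject_prob()` is an assumed helper that simulates a test's rejection probability when the constant `const` is added to its critical values):

```r
# Sketch of the MNRP correction: choose the additive constant so that the
# test's simulated MNRP over the null mean vectors matches a target MNRP
# (here, that of the RSW test at the same configuration).
mnrp_correction <- function(reject_prob, null_mus, target) {
  mnrp <- function(const) max(sapply(null_mus, function(mu) reject_prob(const, mu)))
  uniroot(function(const) mnrp(const) - target, interval = c(0, 10))$root
}
```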
The simulation studies in Andrews and Barwick (2012a) and Romano et al. (2014) compare tests under the alternative using average MNRP-corrected power, where the average is computed over alternative $\mu$ vectors in $\mathcal{M}_J(\Omega)$. We report simulation results graphically using boxplots of the MNRP-corrected local powers over the sets of $\mu$ vectors $\mathcal{M}_{J,n}(\Omega)$ for the 54 different combinations of $(J, \Omega, S, n)$ for each of the CMS, GMS and RSW procedures, and 9 different combinations of $(J, \Omega, S_{2A}, n)$ for the recommended RMS test. Additionally, we report average MNRP-corrected local powers of the different tests across the aforementioned configurations using the symbol ⊕ in these plots.

While the average MNRP-corrected power is a useful criterion for comparing tests across $\mu$ vectors in a given set $\mathcal{M}_{J,n}(\Omega)$, it does not convey the whole picture of the tests' performance over elements in $\mathcal{M}_{J,n}(\Omega)$. Reporting boxplots, as we do, reveals the variation in powers of the tests across elements in $\mathcal{M}_{J,n}(\Omega)$, thus presenting a broader and more extensive approach to comparing the tests under the alternative. These plots are especially useful for detecting differences in the performances of tests when the averages of their MNRP-corrected powers are close but exhibit different distributional variations in MNRP-corrected power across $\mu$ vectors in $\mathcal{M}_{J,n}(\Omega)$.

4.1 MNRP Comparisons

As in Andrews and Soares (2010), Andrews and Barwick (2012a), and Romano et al. (2014), empirical MNRPs are simulated as the maximum rejection probability over all $\mu$ vectors whose components are 0 and $+\infty$, with at least one component equal to zero. Table 1 reports the MNRPs for the tests. Each experiment used 10000 Monte Carlo replications when $J \in \{2, 4\}$ and 2500 when $J = 10$.

[Table 1: MNRPs of the GMS, CMS, RSW, and RMS tests based on $S_1$ and $S_{2A}$, for $J \in \{2, 4, 10\}$, $\Omega \in \{\Omega_{\mathrm{Neg}}, \Omega_{\mathrm{Zero}}, \Omega_{\mathrm{Pos}}\}$, and $n \in \{50, 100, 250\}$.]

Overall, the procedures achieve a satisfactory performance in all cases considered. The RMS and RSW tests perform the best, as their MNRPs are closest to the 5% nominal level across all of the cases considered. For the RSW procedure, the MNRPs fall within the ranges [.043, .056] and [.043, .055] when using the $S_{2A}$ and $S_1$ test statistics, respectively. For the RMS test, the MNRPs fall into the range [.042, .053]. The CMS tests over-reject the null slightly: the MNRPs fall within the ranges [.049, .067] and [.049, .061] when using the $S_{2A}$ and $S_1$ test statistics, respectively. Table 1 also shows that CMS tests have better MNRPs than their GMS versions, as the latter tend to over-reject more: the GMS MNRPs fall within the ranges [.049, .083] and [.049, .078] for $S_{2A}$ and $S_1$, respectively. The largest MNRPs arise in the configurations where $\Omega = \Omega_{\mathrm{Neg}}$, and these MNRPs increase with larger $J$, for the CMS, GMS and RSW tests. However, the MNRPs of all of these tests do get closer to the 5% nominal level with larger sample sizes, across all configurations, and for CMS tests this numerical result is a consequence of Theorem 1.

While we do not have a theoretical result on improved size control of CMS tests over their GMS versions, Table 1 provides simulation-based evidence of such an improvement. Hence, these results point to the potential benefit of implementing the information (1.1), as we do, in two-step testing procedures under the null. The next section presents simulation results on the MNRP-corrected power of these tests under local alternatives, and illustrates the result of Theorem 3.

4.2 MNRP-Corrected Local Power Comparisons

Figures 1 and 2 below report boxplots of the MNRP-corrected powers of the tests under $S_{2A}$ and $S_1$, respectively. The results can be summarised as follows. For each test statistic, the MNRP-corrected power values of the tests are generally distributed in a similar way in configurations where $\Omega = \Omega_{\mathrm{Neg}}$, and all of the tests have comparable average powers in those configurations. By contrast, in configurations where $\Omega = \Omega_{\mathrm{Zero}}$, for each test statistic, the boxplots show that the RSW tests' MNRP-corrected power values tend to (i) be more dispersed (as shown by the lengths of their boxes), (ii) have a wider overall range, and (iii) have lower average power in comparison to the remaining tests, which all behave similarly, as can be seen from their boxplots. For example, the average power of the RSW test when $S = S_{2A}$, $J = 10$, and $n = 250$ is approximately equal to 0.57, while the averages of the remaining procedures in that scenario are all approximately equal to 0.66, which is a large difference.

More noticeable differences in the tests' performance arise in configurations where $\Omega = \Omega_{\mathrm{Pos}}$. For each test statistic, there is evidence for the following ranking in terms of average MNRP-corrected power, uniformly in $J$ and $n$: CMS in first place, GMS in second place, and RSW in third place, with the RMS test tied in first place with the CMS-$S_{2A}$ test. The boxplots also show:

• The MNRP-corrected power values for the RMS and CMS-$S_{2A}$ tests are generally distributed in a similar way for each $n$ and $J$, except when $J = 2$, where the CMS-$S_{2A}$ power values are slightly more dispersed (as shown by the lengths of the boxes) than their RMS counterparts for each $n$.

• For each $S$, the MNRP-corrected power values of CMS tests are markedly less dispersed and have smaller overall ranges than their GMS and RSW counterparts.

• The difference among the CMS, GMS and RSW tests in these configurations with $S = S_1$ can be strikingly large in terms of average power; for example, with $J = 4$ and $n = 250$, the average powers of the CMS, GMS and RSW tests are approximately equal to 0.75, 0.65, and 0.60, respectively. By contrast, with $S = S_{2A}$ the difference among these tests is less pronounced, which is on account of using a more effective test statistic. For example, in the aforementioned configuration, the average powers of the CMS, GMS and RSW tests are approximately equal to 0.76, 0.74, and 0.73, respectively. However, this less pronounced difference in average powers does not mean that these procedures behave similarly, as evidenced by the radically different boxplots of the tests' power values.
Figure 1: Boxplots of MNRP-corrected powers of $S_{2A}$-based tests. For each configuration, the symbol ⊕ marks the location of the average MNRP-corrected power of a test.
Figure 2: Boxplots of MNRP-corrected powers of $S_1$-based tests. For each configuration, the symbol ⊕ marks the location of the average MNRP-corrected power of a test.

Table 2: MNRP-Corrected Average Powers: $\Omega = \Omega_{\mathrm{Pos}}$

J    n    GMS-S_1  GMS-S_2A  CMS-S_1  CMS-S_2A  RSW-S_1  RSW-S_2A  RMS
2    50   0.658    0.676     0.677    0.691     0.611    0.652     0.692
2    100  0.654    0.681     0.682    0.697     0.620    0.661     0.700
2    250  0.662    0.687     0.689    0.701     0.630    0.672     0.709
4    50   0.675    0.743     0.752    0.759     0.575    0.715     0.750
4    100  0.667    0.747     0.728    0.763     0.587    0.729     0.760
4    250  0.656    0.746     0.738    0.761     0.593    0.733     0.761
10   50   0.692    0.782     0.787    0.803     0.557    0.746     0.797
10   100  0.696    0.796     0.799    0.819     0.573    0.765     0.825
10   250  0.695    0.810     0.802    0.830     0.585    0.782     0.837
The result of Theorem 2 implies that the average power of CMS and GMS tests should get closertogether with larger sample sizes. The simulations reflect this implication across all configurations, butindicate that it happens slowly when Ω = Ω
Pos . Consequently, there is simulation-based evidence that showsthe implementation of the information (2.1), as we do with CMS, may not improve the local power of GMStests for configurations in which Ω = Ω Pos . The reason is that the boxplots for MNRP-corrected powervalues of CMS and GMS tests are generally quite similar in those configurations. By contrast, the resultof Theorem 3 points to such an improvement in local power for configurations in which Ω = Ω
Pos , and thisresult is reflected in the simulations as described above.Table 2 reports the average MNRP-corrected powers of the tests when Ω = Ω
Pos and we use theseresults to further contextualize the local power improvement associated with CMS tests over GMS and RSWtests. We benchmark our analysis to RMS because simulation evidence in Andrews and Barwick (2012a)suggests that it is superior in terms of asymptotic average power and is therefore the recommended test.The CMS- S A and RMS tests are neck and neck as their average powers are essentially identical and achievethe highest average powers in all of those scenarios, with the CMS- S test having slightly lower averagepowers than those tests. For a given S , the RSW tests are the worst performing, as they achieve the lowestaverage powers in each corresponding scenario, and the difference between them and the RMS test can bequite large. For example, when J = 10 and n = 250, the difference between RSW- S and RMS is 0.252, andwith RSW- S A it is 0.055 which is a much smaller on account of using a more effective test statistic. TheCMS- S test dominates the GMS- S test in each of those scenarios, where the difference can be as large as10 percentage points – see the scenarios with J = 10. Consequently, the importance of incorporating thestatistical information from the constraints, as we do with CMS, picks up the difference in average powersbetween the RMS and GMS test when S = S A and most of the difference when S = S , in each of the thosescenarios.While the focus above has been on average power, for individual µ vectors the power differences can bemassive with Ω = Ω Pos . Consider, for example, the element µ/ √ n ∈ M ,n (Ω Pos ) with µ = ( − . , , , (cid:124) .This mean vector is an example of an SNVI local alternative. Table 3 reports the MNRP-corrected powerestimates for the tests under this local alternative for n = 50 , , µ/ √ n ∈ M ,n (Ω Pos ) with µ = ( − . , , , (cid:124) . J n
GMS- S GMS- S A CMS- S CMS- S A RSW- S RSW- S A RMS50 0.3 0.668 0.684 0.733 0.237 0.633 0.7344 100 0.275 0.663 0.625 0.726 0.236 0.645 0.74250 0.256 0.665 0.616 0.718 0.234 0.654 0.744
Figure 3: ECDFs of MNRP-corrected CMS (solid line), GMS (dashed line), RMS (dash-dot line),and RSW (dotted line) critical values using the S (MMM) and S A (AQLR) test statistics. • There can be extremely large power improvements associated with CMS relative to RSW and GMSwhen S = S . Indeed, the improvement in power of CMS over GMS is approximately 36 percentagepoints and 40 percentages points over RSW. • The improvements persist with S = S A , but are not as large. The AQLR statistic results in CMSexperiencing a six percentage point improvement over GMS and eight percentages point improvementover RSW. In absolute terms, all procedures experience higher local power with S = S A . • The MNRP-corrected powers of CMS- S A are comparable to their RMS counterparts.To gain a deeper insight into the behavior of the tests under this local alternative, Figure 3 reports theempirical distribution functions (ECDFs) of the MNRP-corrected critical values for n = 250 . The focus onthis sample size is without loss of generality as similar graphs of the critical values’ ECDFs arise in all ofthe other values of n we considered. For either test statistic, the ECDFs in Figure 3 show strong evidenceof a first-order stochastic dominance ranking among the critical values of the CMS, GMS, and RSW, tests.Specifically, for both types of test statistics, there is evidence for the ordering ´ c n ≤ ˆ c n ≤ ˇ c n , where ˇ c n denotesthe RSW critical value. By contrast, the ECDF of the recommended RMS test crosses that of CMS with S = S A , which means that there isn’t evidence of a clear ordering of their critical values. Overall, thedifferences between the ECDFs is quite striking and indicates that there is a big difference in the behaviorof the tests even in moderately large sample sizes. The stochastic ordering of the CMS and GMS criticalvalues is a reflection of Theorem 3 and provides evidence for local power improvements under SNVI local lternatives which have positively correlated moment functions. Finally, we discuss the behavior of the RSWprocedure. The RSW procedure rejects on the event { M n ( β ) (cid:42) R J + } T { S > ˇ c n } , where M n ( β ) is a lowerconfidence rectangle that is used to detect “positive” moments in the first step of their two-step procedure(see Appendix G.2). Across the two test statistics, our simulations indicate that (i) the event { M n ( β ) (cid:42) R J + } occurs with empirical probability close to 1, and (ii) in 9465 times out of 10000 Monte Carlo replications,their critical value ˇ c n corresponds to the case where none of the moment inequalities have been omitted fromits calculation. These findings show the RSW procedure fails to reliably detect the “positive” moments in µ/ √ n in most of the 10000 Monte Carlo replications, resulting in it having low empirical power. This paper has proposed a surgical modification of the generalized moment selection (GMS) procedure putforward by Andrews and Soares (2010) that improves its performance, called constrained moment selection(CMS). The basic idea of the CMS procedure is to use empirical likelihood to incorporate the informationembedded in the moment inequality constraints into the moment selection step of the GMS procedure. Ouranalyses highlights the importance of using this information to more reliably detect the binding moments,which is the source of the improvement of CMS over GMS tests.There are a number of directions for future research. 
Although we focus on modifying GMS tests, theintuition of incorporating the information embedded in the identified set transcends this choice and weconjecture that similar finite-sample benefits would arise in a similar modification to the two-step procedureof Romano et al. (2014). There is also an emerging literature that focuses on testing with ‘many’ moments,where the number of inequalities grow exponentially with the sample size (e.g., Chernozhukov et al., 2019,and Bai et al., 2019). Extending the empirical likelihood modification to such testing procedures mayimprove their performance, but different theoretical tools must be employed to account for the increasingnumber of constraints. Finally, our paper is related to the semi-infinite programming empirical likelihoodprocedure proposed by Lok and Tabri (in press) for two-step bootstrap tests of stochastic dominance, wherethe continuum of unconditional moment inequalities is akin to inference for conditional moment inequalities.Their results are limited to restricted stochastic dominance tests and it would be interesting to extend theinsights from this paper to the general conditional moment inequality models of Andrews and Shi (2013,2017).
We are grateful to Jonathan Roth for providing valuable comments. We are also appreciative of feedback fromparticipants at the Graduate Student Workshop in Econometrics, Harvard University. The computationsin this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science ResearchComputing Group at Harvard University. All errors are our own.
References
Andrews, D. W. K. and Barwick, P. J. (2012a). Inference for parameters defined by moment inequalities: Arecommended moment selection procedure.
Econometrica , 80(6):2805–2826. ndrews, D. W. K. and Barwick, P. J. (2012b). Supplement to “inference for parameters defined by momentinequalities: A recommended moment selection procedure”. Econometrica , 80(6):2805–2826.Andrews, D. W. K. and Guggenberger, P. (2009). Validity of subsampling and “plug-in asymptotic” inferencefor parameters defined by moment inequalities.
Econometric Theory , 25(3):669–709.Andrews, D. W. K. and Shi, X. (2013). Inference based on conditional moment inequalities.
Econometrica ,81(2):609–666.Andrews, D. W. K. and Shi, X. (2017). Inference based on many conditional moment inequalities.
Journalof Econometrics , 196(2):275–287.Andrews, D. W. K. and Soares, G. (2010). Inference for parameters defined by moment inequalities usinggeneralized moment selection.
Econometrica , 78(1):119–157.Bai, Y., Santos, A., and Shaikh, A. (2019). A practical method for testing many moment inequalities.
University of Chicago, Becker Friedman Institute for Economics Working Paper , (2019-116).Canay, I. A. (2010). El inference for partially identified models: Large deviations optimality and bootstrapvalidity.
Journal of Econometrics , 156(2):408–425.Chernozhukov, V., Chetverikov, D., and Kato, K. (2019). Inference on causal and structural parametersusing many moment inequalities.
The Review of Economic Studies , 86(5):1867–1900.Chernozhukov, V., Hong, H., and Tamer, E. (2007). Estimation and confidence regions for parameter setsin econometric models.
Econometrica , 75(5):1243–1284.Chetverikov, D., Santos, A., and Shaikh, A. M. (2018). The Econometrics of Shape Restrictions.
AnnualReview of Economics , 10:31–63.Ciliberto, F. and Tamer, E. (2009). Market structure and multiple equilibria in airline markets.
Econometrica ,77(6):1791–1828.Guggenberger, P. and Smith, R. J. (2005). Generalized empirical likelihood estimators and tests underpartial, weak, and strong identification.
Econometric Theory , pages 667–709.Hall, P. and Presnell, B. (1999). Intentionally biased bootstrap methods.
Journal of the Royal StatisticalSociety. Series B (Statistical Methodology) , 61(1):143–158.Hsu, Y.-C. and Shi, X. (2017). Model-selection tests for conditional moment restriction models.
TheEconometrics Journal , 20(1):52–85.Imbens, G. W. and Manski, C. F. (2004). Confidence intervals for partially identified parameters.
Econo-metrica , 72(6):1845–1857.Lok, T. M. and Tabri, R. V. (in press). An Improved Bootstrap Test for Restricted Stochastic Dominance.
Journal of Econometrics .Manski, C. F. and Tamer, E. (2002). Inference on regressions with interval data on a regressor or outcome.
Econometrica , 70(2):519–546. oon, H. R. and Schorfheide, F. (2009). Estimation with overidentifying inequality moment conditions. Journal of Econometrics , 153(2):136–154.Owen, A. B. (2001).
Empirical likelihood . Chapman and Hall/CRC.Pakes, A., Porter, J., Ho, K., and Ishii, J. (2015). Moment inequalities and their application.
Econometrica ,83(1):315–334.Rambachan, A. and Roth, J. (2019). An honest approach to parallel trends.Romano, J. P., Shaikh, A. M., and Wolf, M. (2014). A practical two-step method for testing momentinequalities.
Econometrica , 82(5):1979–2002.Rudin, W. (1976).
Principles of mathematical analysis , volume 3. McGraw-hill New York.Shi, X. (2015). Model selection tests for moment inequality models.
Journal of Eonometrics , 187:1–17.Whang, Y.-J. (2019).
Econometric Analysis of Stochastic Dominance: Concepts, Methods, Tools, and Ap-plications . Themes in Modern Econometrics. Cambridge University Press. Outline
This Appendix provides supplementary material to this paper. It is organized as follows. • Section B lists the complete set of assumptions on the test statistic that Andrews and Soares (2010)use in their work. We use these conditions in the proofs of the main results in the paper. • Section C presents the proofs of the results in the paper: Theorems 1, 2, 3, and 4. • Section D presents technical lemmas used in the proof of Theorem 1. • Section E presents technical lemmas used in the proofs of Theorems 2 and 3. • Section G.1 outlines the refined moment selection procedure of Andrews and Barwick (2012a). • Section G.2 outlines the two-step procedure of Romano et al. (2014). • Section G.3 details the MNRP corrections.
B Test Statistic Assumptions
Assumption 1.
1. Monotonicity: S ( g, Σ) is nonincreasing in g for all ( g, Σ) ∈ R J × R J × J .2. Invariance: S ( g, Σ) = S ( Dg, D Σ D ) for all g ∈ R J , Σ ∈ R J × J and positive definite diagonal matrix of Σ , D ∈ R J × J .3. Nonnegativity: S ( g, Ω) ≥ for all ( g, Ω) ∈ R J × Ψ .4. Continuity: S ( g, Ω) is a continuous function of g ∈ R J and Ω ∈ Ψ . Assumption 2.
For any h ∈ R J + , ∞ , Ω ∈ Ψ , Z ∗ ∼ N (0 J , I J ) , and x ∈ R , the distribution function of S (Ω Z ∗ + h , Ω) is 1. continuous at x > , 2. strictly increasing in x > unless h = [ ∞ , ..., ∞ ] > ∈ R J + , ∞ ,and 3. does not exceed / at x = 0 when h = 0 J . Assumption 3.
A necessary and sufficient condition for S ( g, Ω) > is that there exists j ∈ J that satisfies g j < , where g = ( g , ..., g J ) > and Ω ∈ Ψ . Assumption 4.
Let Z ∗ ∼ N (0 J , I J ) , α ∈ (0 , ) , and c (Ω , − α ) be the (1 − α ) -quantile of the distributionof S (Ω Z ∗ , Ω) . We assume1. The distribution function of S (Ω Z ∗ , Ω) is continuous at c (Ω , − α ) for all Ω ∈ Ψ .2. c (Ω , − α ) is a uniformly continuous function of Ω ∈ Ψ . Assumption 5.
1. Let v ∈ R J [+ ∞ ] and Ω ∈ Ψ be arbitrary. The distribution function of S (Ω Z ∗ + v, Ω) is a) continuous for x > and is b) strictly increasing at x > unless v = [ ∞ , ..., ∞ ] > ∈ R J + , ∞ .2. For all g , g ∗ ∈ R J + , ∞ that satisfy g ∗ (cid:31) g , we assume that P ( S (Ω Z ∗ + g , Ω) ≤ x ) < P ( S (Ω Z ∗ + g ∗ , Ω) ≤ x ) , where x > . We apply the definition of uniform continuity provided in Rudin (1976). That is, a function f : X → Y , where( X, d X ) and ( Y, d Y ) are metric spaces, is uniformly continuous if ∀ ε > , ∃ δ := δ ( ε ) > s.t. ∀ x, y ∈ X, d X ( x, y ) <δ = ⇒ d Y ( f ( x ) , f ( y )) < ε . The relation ‘ b (cid:31) a ’ means that every element in a is less than or equal to every element in b and the inequalityholds strictly for each least one element. ssumption 6. There exists χ > such that for each a ∈ R ++ , S ( ag, Ω) = a χ S ( g, Ω) for all g ∈ R J and Ω ∈ Ψ . Assumption 7.
Let h ,j : F → R + , ∞ given by h ,j ( θ, F ) = ∞ if E F ( g j ( W i , θ )) > and h ,j ( θ, F ) = 0 if E F ( g j ( W i , θ )) = 0 , and define h ( θ, F ) = [ h , ( θ, F ) , ..., h ,J ( θ, F )] > . Moreover, let Ω( θ, F ) := lim n →∞ Corr F ( n ˆ g n ( θ )) .There exists ( θ, F ) ∈ F such that the distribution of S (Ω ( θ, F ) Z ∗ + h ( θ, F ) , Ω( θ, F )) is continuous at its − α quantile, where Z ∗ ∼ N (0 J , I J ) . C Proofs of Theorems
We introduce notation: ˆ ϕ n ( θ ) := ϕ (cid:0) ˆ ξ n ( θ ) , ˆΩ n ( θ ) (cid:1) , ´ ϕ n ( θ ) := ϕ (cid:0) ´ ξ n ( θ ) , ˆΩ n ( θ ) (cid:1) , ˆ L n ( θ, Z ∗ ) := S (cid:0) ˆΩ n ( θ ) Z ∗ + ϕ ( ˆ ξ n ( θ ) , ˆΩ n ( θ )) , ˆΩ n ( θ ) (cid:1) , and ˆ L n ( θ, Z ∗ ) := S (cid:0) ˆΩ n ( θ ) Z ∗ + ϕ ( ´ ξ n ( θ ) , ˆΩ n ( θ )) , ˆΩ n ( θ ) (cid:1) for each θ ∈ Θ. We areassuming that ϕ = ϕ (1) ; see the discussion in Section F.1 for the general case. C.1 Theorem 1
Proof.
We present an outline of the proof and then the steps in detail.
Outline.
Lemma D.2 establishes the feasible set in the empirical likelihood optimisation problem (2.9) isnon-empty with probability tending to one uniformly over F + . Consequently, the constrained estimator ofthe moments exists and is unique with probability tending to one uniformly over F + . With this technicalresult in mind, the proof has four steps. First, we show that { ´ ϕ n ( θ ) = ˆ ϕ n ( θ ) } occurs with probabilityapproaching 1 as n → + ∞ with uniformity over F + . In the second step, we use the first result to show thatfor any α ∈ (0 , ) and any r > {| ´ c n ( θ, − α ) − ˆ c n ( θ, − α ) | < r } occurs with probability tending to 1 as n → + ∞ , uniformly over F + . In the third step, we use step 2 to show thatlim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ´ c n ( θ, − α ) (cid:17) = lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ˆ c n ( θ, − α ) (cid:17) . In the final step, we prove all three statements in the theorem simultaneously by invoking Theorem 1 ofAndrews and Soares (2010).
Step 1.
The complement rule for probability measures implies that it suffices to showlim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n ( θ ) = ´ ϕ n ( θ ) (cid:17) = 0 , which amounts to proving lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) = 0 or each j ∈ { , ..., J } . Indeed, (cid:8) ˆ ϕ n ( θ ) = ´ ϕ n ( θ ) (cid:9) = S Jj =1 (cid:8) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:9) implieslim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:18) ˆ ϕ n ( θ ) = ´ ϕ n ( θ ) (cid:19) ≤ J X j =1 lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) by the finite subadditivity of probability measures and basic properties of the supremum.To this end, fix j ∈ { , ..., J } arbitrarily. Recognizing that { ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) } = { ˆ ϕ n,j ( θ ) > ´ ϕ n,j ( θ ) } S { ˆ ϕ n,j ( θ ) < ´ ϕ n,j ( θ ) } , it follows that P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) = P F (cid:16) ˆ ϕ n,j ( θ ) > ´ ϕ n,j ( θ ) (cid:17) + P F (cid:16) ˆ ϕ n,j ( θ ) < ´ ϕ n,j ( θ ) (cid:17) = P F (cid:16) ˆ ξ n,j ( θ ) > , ´ ξ n,j ( θ ) ≤ (cid:17) + P F (cid:16) ˆ ξ n,j ( θ ) ≤ , ´ ξ n,j ( θ ) > (cid:17) ≤ P F (cid:16) ˆ g n,j ( θ ) > ´ g n,j ( θ ) (cid:17) + P F (cid:16) ˆ g n,j ( θ ) < ´ g n,j ( θ ) (cid:17) = P F (cid:16) ˆ g n,j ( θ ) = ´ g n,j ( θ ) (cid:17) where the second equality and the inequality hold by definition of ϕ (1) . Lemma D.3 then is invoked toestablish that lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ g n,j ( θ ) = ´ g n,j ( θ ) (cid:17) = 0and thereforelim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ ϕ n,j ( θ ) = ´ ϕ n,j ( θ ) (cid:17) ≤ lim sup n → + ∞ sup ( θ,F ) ∈F + P F (cid:16) ˆ g n,j ( θ ) = ´ g n,j ( θ ) (cid:17) = 0which completes the proof of Step 1. Step 2.
We use step 1 to show that the event (cid:8) ˆ c n ( θ, − α ) = ´ c n ( θ, − α ) (cid:9) occurs with probabilityapproaching 0 as n → + ∞ , with uniformity over F + . This follows immediately from step 1 because P F (cid:16) ˆ c n ( θ, − α ) = ´ c n ( θ, − α ) (cid:17) ≤ P F (cid:16) ´ ϕ n ( θ ) = ˆ ϕ n ( θ ) (cid:17) where the inequality holds because ˆ L n ( θ, Z ∗ ) and ´ L n ( θ, Z ∗ ) only differ through the realization of the momentselection function a.s. [ Z ∗ ]. Step 1 and the squeeze rule then implies thatlim sup n → + ∞ sup ( θ,F ) ∈F + P F (ˆ c n ( θ, − α ) = ´ c n ( θ, − α )) = 0 . Step 3.
The result established in the second step allows us to conclude that (cid:8) T n ( θ ) ≤ ´ c n ( θ, − α ) (cid:9) = (cid:8) T n ( θ ) ≤ ˆ c n ( θ, − α ) + o p (1) (cid:9) uniformly over F + . The uniformity implies thatlim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ´ c n ( θ, − α ) (cid:17) = lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16) T n ( θ ) ≤ ˆ c n ( θ, − α ) (cid:17) . tep 4. The previous step established that the asymptotic confidence sizes of GMS and CMS are equal.Combine this with the fact that F + ⊆ F and apply Theorem 1 in Andrews and Soares (2010) to concludeall three statements in the theorem simultaneously. (cid:4) C.2 Theorem 2
Proof.
We present an outline of the proof and then the steps in detail.
Outline.
Lemma E.2 establishes the feasible set in the empirical likelihood optimisation problem (2.9) isnon-empty with probability tending to one under local alternatives that satisfy Assumption LA1 and LA2,i.e., local alternatives in the set H . Consequently, the constrained estimator of the moments exists and isunique with probability tending to one, under these local alternatives. With this technical result in mind,the proof has four steps and is similar to the proof of Theorem 1. First, we show { ´ ϕ n ( θ n, ∗ ) = ˆ ϕ n ( θ n, ∗ ) } occurs with probability approaching 1 as n → + ∞ for any sequence { ( θ n, ∗ , F n ) : n ≥ } . Next, we showthat { ˆ c n ( θ n, ∗ , − α ) = ´ c n ( θ n, ∗ , − α ) } is an event that occurs with probability approaching 0 as n → + ∞ along { ( θ n, ∗ , F n ) : n ≥ } . In the third step, we use the second step to conclude thatlim n → + ∞ P F n (cid:0) T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:1) = lim n → + ∞ P F n (cid:0) T n ( θ n, ∗ ) ≤ ´ c n ( θ n, ∗ , − α ) (cid:1) for all sequences { ( θ n, ∗ , F n ) : n ≥ } . In the fourth step, we invoke Part A of Theorem 2 in Andrews andSoares (2010) to establish the result. Step 1.
Step 1 follows a similar line of reasoning to the same step in Theorem 1. We pick an arbitrarysequences of n − -local alternatives { ( θ n, ∗ , F n ) : n ≥ } and show thatlim n → + ∞ P F n (cid:0) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:1) = 0 . To do this, we recognize that (cid:8) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:9) = S Jj =1 (cid:8) ˆ ϕ n,j ( θ n, ∗ ) = ´ ϕ n,j ( θ n, ∗ ) (cid:9) and therefore that P F n (cid:16) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:17) ≤ J X j =1 P F n (cid:16) ˆ ϕ n,j ( θ n, ∗ ) = ´ ϕ n,j ( θ n, ∗ ) (cid:17) ≤ J X j =1 P F n (cid:16) ˆ g n,j ( θ n, ∗ ) = ´ g n,j ( θ n, ∗ ) (cid:17) using identical reasoning to the corresponding result in the proof of Theorem 1, except replace θ with θ n, ∗ and F with F n, ∗ . It follows then thatlim n → + ∞ P F n (cid:16) ˆ ϕ n ( θ n, ∗ ) = ´ ϕ n ( θ n, ∗ ) (cid:17) ≤ J X j =1 lim n → + ∞ P F n (cid:16) ˆ g n,j ( θ n, ∗ ) = ´ g n,j ( θ n, ∗ ) (cid:17) = 0where the second equality holds by Lemma E.6. tep 2. The proof of step 2 is almost identical to step 2 in Theorem 1. We use the exact same reasoningas Step 2 of Theorem 1 to conclude that P F n (cid:16) ´ c n ( θ n, ∗ , − α ) = ˆ c n ( θ n, ∗ , − α ) (cid:17) ≤ P F n (cid:16) ´ ϕ n ( θ n, ∗ ) = ˆ ϕ n ( θ n, ∗ ) (cid:17) ∀ n ≥ n → + ∞ P F n (cid:16) ´ c n ( θ n, ∗ , − α ) = ˆ c n ( θ n, ∗ , − α ) (cid:17) = 0 following step 1. Step 3.
The result established in the second step allows us to conclude that (cid:8) T n ( θ n, ∗ ) ≤ ´ c n ( θ n, ∗ , − α ) (cid:9) = (cid:8) T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) + o p (1) (cid:9) along any sequence { ( θ n, ∗ , F n ) : n ≥ } . As such,lim n → + ∞ P F n (cid:16) T n ( θ n, ∗ ) ≤ ´ c n ( θ n, ∗ , − α ) (cid:17) = lim n → + ∞ P F n (cid:16) T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:17) for all sequences { ( θ n, ∗ , F n ) : n ≥ } . Step 4.
The previous step established that the n − -local power functions of GMS and CMS are equivalentto first order. We can then apply Part A of Theorem 2 in Andrews and Soares (2010) to conclude thetheorem. (cid:4) C.3 Theorem 3
For the proof of Theorem 3, we let A ∗ n,α denote the event (cid:26) ´Υ n ( θ n, ∗ ) (cid:40) ˆΥ n ( θ n, ∗ ) (cid:27) \ (cid:26) ˆ c n ( θ n, ∗ , − α ) > (cid:27) \ (cid:26) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:27) . We also let { W i,n : i ≤ n } denote the n th row of the triangular array induced by { ( θ n, ∗ , F n ) : n ≥ } . C.3.1 Proof of Theorem 3
Proof.
We outline the argument and then prove the result in detail.
Outline.
Lemma E.2 establishes the feasible set in the empirical likelihood optimisation problem (2.9) isnon-empty with probability tending to one under local alternatives that satisfy Assumption LA1 and LA2,i.e., local alternatives in the set H . Consequently, the constrained estimator of the moments exists and isunique with probability tending to one, under these local alternatives. With this technical result in mind,the proof has three steps. First, we show (cid:8) ´ c n ( θ n, ∗ , − α ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) occurs with probability tendingto 1 along any sequence { ( θ n, ∗ , F n ) : n ≥ } ∈ M . This allows us to conclude the first part of the theorem.In the second step, we show that the event A ∗ n,α implies that (cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c n ( θ n, ∗ , − α ) (cid:9) . In the finalstep, we conclude the strict ordering of the rejection probabilities. Step 1.
Let { ( θ n, ∗ , F n ) : n ≥ } ∈ M . Lemma E.9 states T Jj =1 (cid:8) ´ ϕ n,j ( θ n, ∗ ) ≥ ˆ ϕ n,j ( θ n, ∗ ) (cid:9) with probabilityapproaching 1 along { ( θ n, ∗ , F n ) : n ≥ } . It follows from Part 1 of Assumption 1 that (cid:8) ´ L n ( θ n, ∗ , Z ∗ ) ≤ ˆ L n ( θ n, ∗ , Z ∗ ) a.s. [ Z ∗ ] (cid:9) with probability approaching 1 under { ( θ n, ∗ , F n ) : n ≥ } . Consequently, (cid:8) ´ c n ( θ n, ∗ , − α ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) occurs with probability approaching 1 along { ( θ n, ∗ , F n ) : n ≥ } . Thus, there xists N ( θ n, ∗ , F n ) ≥ P F n ( T n ( θ n, ∗ ) > ˆ c n ( θ n, ∗ , − α )) ≤ P F n ( T n ( θ n, ∗ ) > ´ c n ( θ n, ∗ , − α )) for all n ≥ N ( θ n, ∗ , F n ). Step 2.
The event A ∗ n,α implies the event (cid:8) ´Υ n ( θ n, ∗ ) (cid:40) ˆΥ n ( θ n, ∗ ) (cid:9) T (cid:8) ˆ c n ( θ n, ∗ , − α ) > (cid:9) , which allowsus to apply Part 1 of Assumption 2 and Part 2 of Assumption 5 to deduce that 1 − α = P (cid:16) ˆ L n ( θ n, ∗ , Z ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:17) < P (cid:16) ´ L n ( θ n, ∗ , Z ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:17) a.s. (cid:2) { W i,n : i ≤ n } (cid:3) . Applying Part 1 of Assumption2 again, we conclude that A ∗ n,α ⊆ (cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c ( θ n, ∗ , − α ) (cid:9) . This completes step 2. Step 3.
Since A ∗ n,α ⊆ (cid:8) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) by construction, we use Step 2to deduce A ∗ n,α ⊆ (cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c n (1 − α, θ n, ∗ ) (cid:9) T (cid:8) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9) . Consequently, if P F n ( A ∗ n,α ) > P F n ( T n ( θ n, ∗ ) > ´ c n ( θ n, ∗ , − α )) − P F n ( T n ( θ n, ∗ ) > ˆ c n ( θ n, ∗ , − α ))= P F n (cid:16)(cid:8) ´ c n ( θ n, ∗ , − α ) < ˆ c n (1 − α, θ n, ∗ ) (cid:9) \ (cid:8) ´ c n ( θ n, ∗ , − α ) < T n ( θ n, ∗ ) ≤ ˆ c n ( θ n, ∗ , − α ) (cid:9)(cid:17) ≥ P F n ( A ∗ n,α )where the inequality uses monotonicity of probability measures. (cid:4) C.4 Theorem 4
In the proof of Theorem 4, we use the notation ν n ( θ n, ∗ ) := D − n ( θ n, ∗ ) n (cid:0) ˆ g n ( θ n, ∗ ) − E F n g ( W i , θ n, ∗ ) (cid:1) . Proof.
Our approach is based on the proof for the corresponding result in Andrews and Soares (2010). Forease of exposition, we outline the proof and then provide the details.
Outline.
For { w n : n ≥ } any subsequence of { n } , it suffices to show that there exists a furthersubsequence { u n : n ≥ } such that lim n →∞ P F un (cid:0) T u n ( θ u n , ∗ ) > ´ c u n ( θ u n , ∗ , − α ) (cid:1) = 1. In Step 1, we definethe sub-subsequence. In Step 2, we show that ( u n υ u n ) − χ T u n ( θ u n , ∗ ) has a positive probability limit, where χ > u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α ) is zero. Inthe final step, we use Step 2 and Step 3 to establish that lim n →∞ P F un (cid:0) T u n ( θ u n , ∗ ) > ´ c u n ( θ u n , ∗ , − α ) (cid:1) = 1. Step 1.
Consider any subsequence { w n : n ≥ } of { n } . We take { u n : n ≥ } so that g ∗ u n /υ u n → e ∈ [ − , + ∞ ] J as n → + ∞ , where g ∗ u n = [ E F un ( g ( W i , θ u n , ∗ )) /σ F un , ( θ u n , ∗ ) , ..., E F un ( g J ( W i , θ u n , ∗ )) /σ F un ,J ( θ u n , ∗ )] > , and υ u n = max ≤ j ≤ J {− g ∗ u n ,j } . This is the sub-subsequence considered in Andrews and Soares (2010). Step 2.
Since we make no modification to the test statistic, we can follow the same argument as (S3.2) inthe Supplement to Andrews and Soares (2010) to conclude that ( u n υ u n ) − χ T u n ( θ ∗ u n ) p → S ( e, Ω ) >
0, where he inequality holds by Assumption 3. The argument for the convergence in probability is provided below( u n υ u n ) − χ T u n ( θ ∗ u n ) = ( u n υ u n ) − χ S (cid:16) ˆ D − u n ( θ u n , ∗ ) D ( θ u n , ∗ ) (cid:0) ν u n ( θ u n , ∗ ) + u n g ∗ u n (cid:1) , ˆΩ u n ( θ u n , ∗ ) (cid:17) = S (cid:16) o p (1) + υ − u n g ∗ n , Ω + o p (1) (cid:17) p → S ( e, Ω )where the first equality is algebraic manipulation and Part 2 of Assumption 1, the second equality is As-sumption 6 and an application of the WLLN and Lyupanov CLT for triangular arrays of row-wise i.i.d.random variables and Part 2 of Distant Alternatives Assumption 1, and the convergence in probability holdsby the construction of the sub-subsequence in Step 1 and Part 4 of Assumption 1. This completes Step 2. Step 3.
We now establish that ( u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α ) = o p (1) along { ( θ u n , ∗ , F u n ) : n ≥ } . Part 1and 3 of Assumption 1 and the fact that ϕ (1) ∈ R J + , ∞ yield0 ≤ S (cid:0) ˆΩ u n ( θ u n , ∗ ) Z ∗ + ϕ ( ´ ξ u n ( θ u n , ∗ ) , ˆΩ u n ( θ u n , ∗ )) , ˆΩ u n ( θ u n , ∗ ) (cid:1) ≤ S ( ˆΩ u n ( θ u n , ∗ ) Z ∗ , ˆΩ u n ( θ u n , ∗ )) a.s. [ Z ∗ ]. Consequently, the CMS critical value satisfies0 ≤ ´ c u n ( θ u n , ∗ , − α ) ≤ c ( ˆΩ u n ( θ u n , ∗ ) , − α ) p → c (Ω , − α ) = O p (1) (C.1)where c ( ˆΩ u n ( θ u n , ∗ ) , − α ) is the 1 − α quantile of S ( ˆΩ u n ( θ u n , ∗ ) Z ∗ , ˆΩ u n ( θ u n , ∗ )) and the convergence inprobability holds by Part 2 of Assumption 4 and ˆΩ u n p → Ω along { ( θ u n , ∗ , F u n ) : n ≥ } by the weak law oflarge numbers for triangular arrays of row-wise i.i.d. random variables and Part 2 of Distant AlternativesAssumption 1. Since υ u n > n ≥ ≤ ( u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α ) ≤ ( u n υ u n ) − χ c ( ˆΩ u n ( θ u n , ∗ ) , − α ) p → u n υ u n → ∞ . Step 4.
Combine Step 2 and Step 3 to conclude that P F un ( T u n ( θ u n , ∗ ) > ´ c u n ( θ u n , ∗ , − α )) (C.3)= P F un (( u n υ u n ) − χ T u n ( θ u n , ∗ ) > ( u n υ u n ) − χ ´ c u n ( θ u n , ∗ , − α )) (C.4) → P ( S ( e, Ω ) >
0) = 1 (C.5)as n → ∞ , where the equality is invokes the scale equivariance of quantiles. (cid:4) D Technical Lemmas for Confidence Sets
D.1 Establishing Uniformity
The following lemma validates the subsequence approach to establishing uniformity.
Lemma D.1.
Let { V n ( θ ) : n ≥ } be a sequence of events indexed by θ ∈ Θ . The following is true: im inf n → + ∞ P F wn,h ( V w n ( θ w n ,h )) = 1 for any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + implies lim inf n → + ∞ inf ( θ,F ) ∈F + P F ( V n ( θ )) = 1 . Proof.
We outline the argument and then provide the details.
Outline.
The proof employs the direct method. In the first step, we use the definition of infimum toconstruct a subsequence { (˜ θ ∗ w n ,h , ˜ F ∗ w n ,h ) : n ≥ } in F + such that for each n ≥ ( θ,F ) ∈F + P F ( V w n ( θ )) + 2 − w n > P ˜ F ∗ wn,h (cid:0) V w n (˜ θ ∗ w n ,h ) (cid:1) . In the second step, we combine this with the assumption that lim inf n → + ∞ P F wn,h ( V w n ( θ w n ,h )) = 1 for anysubsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + to conclude the result. Step 1.
As the smallest subsequential limit, the limit inferior implies the existence of a subsequence { w n : n ≥ } of { n } such thatlim n → + ∞ inf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) = lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:0) V n ( θ ) (cid:1) . Consider the subsequence (cid:8) inf ( θ,F ) ∈F + P F ( V w n ( θ )) : n ≥ (cid:9) . For each n ≥ η >
0, there exists(˜ θ w n ,h,η , ˜ F w n ,h,η ) ∈ F + such that inf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) + η > P F wn,h,η (cid:0) V w n ( θ w n ,h,η ) (cid:1) , by definition of theinfimum. Consequently, there exists a subsequence { (˜ θ ∗ w n ,h , ˜ F ∗ w n ,h ) : n ≥ } in F + that satisfiesinf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) + 2 − w n > P ˜ F ∗ wn,h (cid:0) V w n (˜ θ ∗ w n ,h ) (cid:1) (D.1)for each n ≥
1. This completes the first step.
Step 2.
If lim inf n → + ∞ P F wn,h ( V w n ( θ w n ,h )) = 1 for any subsequence { ( θ w n ,h , F w n ) : n ≥ } in F + , thenlim inf n → + ∞ P ˜ F ∗ wn,h (cid:0) V w n (˜ θ ∗ w n ,h ) (cid:1) = 1 by construction. Taking the limit inferior on both sides of (D.1), weconclude that lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:0) V w n ( θ ) (cid:1) = 1 by the squeeze rule. (cid:4) D.2 Restricted Estimator
CMS is based on the following empirical likelihood primal problem,sup p ( n X i =1 ln( p i ) : n X i =1 p i g j ( W i , θ ) ≥ ∀ j ∈ J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) , (D.2)where p ∈ R n and I := { , ..., n } . A feasible solution to (D.2) is denoted by ´ p ∈ R n and is the uniquemaximiser because the empirical likelihood problem is a strictly convex program (see Owen, 2001).We now establish that the feasible set is non-empty with probability tending to one uniformly over F + . Lemma D.2.
Define the random set C n ( θ ) = ( ( p , ..., p n ) > ∈ R n : n X i =1 p i g j ( W i , θ ) ≥ ∀ j ∈ J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) or all θ ∈ Θ . The following is true: lim sup n →∞ sup ( θ,F ) ∈F + P F ( C n ( θ ) = ∅ ) = 0 . Proof.
We outline the proof and then provide the details.
Outline.
The proof proceeds by the direct method and, in accordance with Lemma D.1, we only need toshow that lim sup n → + ∞ P F wn,h ( C w n ( θ w n ,h ) = ∅ ) = 0 for any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + .In the first step, we establish the result for sequences { ( θ n,h , F n,h ) : n ≥ } in F + using the union boundand the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables. In Step 2, wegeneralize the argument to subsequences and complete the proof. Step 1.
We start by proving the result along sequences { ( θ n,h , F n,h ) : n ≥ } in F + . Consider an arbitrarysequence { ( θ n,h , F n,h ) : n ≥ } in F + . Recognizing that the standard simplex S n = { p ∈ R n : P ni =1 p i =1 , p i ≥ ∀ i ∈ I} 6 = ∅ , it follows that n C n ( θ n,h ) = ∅ o = n ∀ p ∈ S n , ∃ j := j ( p ) ∈ J s.t. n X i =1 p i g j ( W i , θ n,h ) < o , and therefore P F n ( C n ( θ n,h ) = ∅ ) ≤ J X j =1 P F n,h (cid:18) n n X i =1 g j ( W i , θ n,h ) < (cid:19) where the first inequality holds by the finite subadditivity of probability measures and because ( n , ..., n ) > ∈S n for each n ≥
1. We then apply the weak law of large numbers for triangular arrays of row-wise i.i.d.random variables to conclude thatlim sup n → + ∞ P F n,h ( C n ( θ n,h ) = ∅ ) ≤ J X j =1 lim sup n → + ∞ P F n,h (cid:18) n n X i =1 g j ( W i , θ n,h ) < (cid:19) = 0where the equality holds because E F n g j ( W i , θ n,h ) ≥ n ≥ { ( θ n,h , F n ) : n ≥ } is asequence in F . Since { ( θ n,h , F n ) : n ≥ } was arbitrary, we establish the result along sequences. Step 2.
To establish the result for subsequences { w n : n ≥ } of { n } , just replaces n with w n in theprevious argument. (cid:4) In order to prove technical results, we reformulate the primal problem as one with equality constraints inorder to make use of lemmas in Andrews and Guggenberger (2009) (hereafter, AG09). Let t ∈ R J + denotea nuisance parameter vector where the j th element measures the slackness of corresponding moment. Thevector t allows us to formulate the empirical likelihood primal problem as a parameterized optimizationproblem as follows, EL ( t ) := sup p ( n X i =1 ln( p i ) : n X i =1 p i g i ( t, θ ) = 0 J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) , (D.3) here g i ( t, θ ) := g ( W i , θ ) − t and 0 J denotes the zero vector in J -dimensional Euclidean space and theempirical likelihood probabilities (´ p , ..., ´ p n ) are the solution to sup t ∈ R J + EL ( t ).A more convenient representation of the probabilities arises through the saddlepoint form of the empiricallikelihood problem. The Lagrangian for the constrained optimization problem (D.3) is L = n X i =1 ln( p i ) + nλ > n X i =1 p i g i ( t, θ ) + ω n X i =1 p i − ! . (D.4)Note that the non-negativity constraints are ignored as p i = 0 for some i ∈ I is never optimal. The firstorder conditions are ∂ L ∂p i = 1 p i + nλ > g i ( t, θ ) + ω = 0 ∀ i ∈ I (D.5) ∂ L ∂λ = n n X i =1 p i g i ( t, θ ) = 0 J (D.6) ∂ L ∂ω = n X i =1 p i − . (D.7)Multiplying p i with the corresponding first order condition in (D.5) and then summing over I gives ω = − n .Substituting ω = − n into (D.5), we obtain that p i ( λ, t ) = 1 n (cid:0) − λ > g i ( t, θ ) (cid:1) ∀ i = 1 , ..., n. (D.8)Substituting ( p ( λ, t ) , ..., p n ( λ, t )) into the empirical log-likelihood function implies the saddle point repre-sentation of the empirical likelihood probleminf t ∈ R J + sup λ ∈ ´Λ n n n X i =1 ln (cid:16) − λ > g i ( t, θ ) (cid:17) , (D.9)where ´Λ n := { λ ∈ R J : λ > g i ( t, θ ) ∈ Q } and Q is an open subset of R that contains 0. The saddle pointproblem (D.9) is presented in AG09, which implies that the useful lemmas in that paper can be invoked toestablish the uniform validity of CMS. D.3 Lemmas Relating to the Restricted Estimator
The next results establish the uniform consistency of the restricted empirical likelihood estimator of themean and variance over F + . We must define some more notation before proceeding. For any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + , let (´ t n , ´ λ n ) ∈ R J + × ´Λ n denote the solution to (D.9) evaluated at the subse-quence (i.e. replace θ with θ w n,h ). The construction of the feasible set implies ´ t w n = P ni =1 ´ p i g ( W i , θ w n ,h ).Lemma D.2 establishes that the estimator exists with probability approaching 1 uniformly over F + . Allsubsequent analysis assumes the event {C n ( θ ) = ∅} occurs so that the estimator exists, where the randomset C n ( θ ) was defined in Lemma D.2.Define an empirical process { ˆ g n ( t ) : t ∈ R J + } given by ˆ g n ( t ) = n − P ni =1 g i ( t, θ ) for each t ∈ R J + . Since F + satisfies Assumption GEL of AG09, we invoke Lemma 6 and the subsequent remark in their paper and state hat ˆ g w n (´ t w n ) = O p ( w − n ) for any subsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + . This gives us a uniform rateof convergence result for difference between the constrained and unconstrained estimator of the momentsand in the statement || · || ‘ J denotes the Euclidean norm for R J . Lemma D.3.
Let ´ g n ( θ ) := P ni =1 ´ p i g ( W i , θ ) and ˆ g n ( θ ) := n − P ni =1 g ( W i , θ ) for each θ ∈ Θ . The followingis true: || ´ g n ( θ ) − ˆ g n ( θ ) || ‘ J = O p ( n − ) uniformly over F + .Proof. The proof follows by the direct method. Since we want to show that ´ g n ( θ ) − ˆ g n ( θ ) = O p ( n − )with uniformity over F + , it suffices to show that ´ g w n ( θ w n ,h ) − ˆ g w n ( θ w n ,h ) = O p ( w − n ) for all subsequences { ( θ w n ,h , F w n ,h ) : n ≥ } (see Lemma D.1). Observing that ˆ g w n ( θ w n ,h ) − ´ g w n ( θ w n ,h ) = ˆ g w n (´ t w n ) for anysubsequence { ( θ w n ,h , F w n ,h ) : n ≥ } , we apply Lemma 6 of AG09 and conclude that ˆ g n ( θ w n ,h ) − ´ g n ( θ w n ,h ) = O p ( w − n ), which completes the proof. (cid:4) For the next lemma, we must introduce some more notation. Let Mat J × J ( R ) denote the vector space of J × J matrices over R . For each A ∈ Mat J × J ( R ), let || A || ‘ J × J := J X i =1 J X j =1 | a ij | ! . This is the
Frobenius norm . We let´Σ n ( θ ) := n X i =1 ´ p i (cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1)(cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1) > denote the constrained estimator of the moment covariance matrix andˆΣ n ( θ ) := 1 n n X i =1 (cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1)(cid:0) g ( W i , θ ) − ´ g n ( θ ) (cid:1) > denote the unconstrained estimator of the moment covariance matrix for each θ ∈ Θ. Lemma D.4.
For each r > , lim inf n → + ∞ inf ( θ,F ) ∈F + P F (cid:16)(cid:12)(cid:12)(cid:12)(cid:12) ´Σ n ( θ ) − ˆΣ n ( θ ) (cid:12)(cid:12)(cid:12)(cid:12) ‘ J × J < r (cid:17) = 1 Proof.
Due to the length of the proof, we provide an outline and then detailed steps.
Outline.
In accordance with Lemma D.1, it suffices to show that ´Σ w n ( θ w n ,h ) = ˆΣ w n ( θ w n ,h )+ o p (1) for anysubsequence { ( θ w n ,h , F w n ,h ) : n ≥ } in F + . To do this, we first prove the result along sequences. First, weestablish a preliminary result that states that max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | = o p (1) for any arbitrary sequence { ( θ n,h , F n ) : n ≥ } in F + . In Step 2, we show that ´Σ n ( θ n,h ) = P ni =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + o p (1) holds for an arbitrary sequence { ( θ n,h , F n,h ) : n ≥ } in F + . In Step 3, we use Step 1 toshow that P ni =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > = ˆΣ n ( θ n,h ) + o p (1) along the sequence { ( θ n,h , F n,h ) : n ≥ } . This completes the proof for sequences. In Step 4, we generalize the result tosubsequences { w n : n ≥ } of { n } . tep 1. We first establish the preliminary result that max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | = o p (1) along { ( θ n,h , F n,h ) : n ≥ } in F + . The Cauchy-Schwarz inequality yields thatmax ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | ≤ || ´ λ n || ‘ J max ≤ i ≤ n || g i (´ t n , θ n,h ) || ‘ J . (D.10)Assumption T lets us apply Part (ii) Lemma 3 of AG09 that statesmax ≤ i ≤ n || g i (´ t n , θ n,h ) || ‘ J = O p ( n δ )and also apply Lemma 5 in AG09 that states || ´ λ n || ‘ J = O p ( n − ) along { ( θ n,h , F n,h ) : n ≥ } . Combiningthese with (D.10), we deduce thatmax ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | ≤ O p ( n − ) O p ( n δ ) = O p ( n − δ δ ) ) = o p (1)along { ( θ n,h , F n,h ) : n ≥ } . The result established is essential in the third step. Step 2.
We can decompose g ( W i , θ n,h ) − ´ g n ( θ n,h ) = g ( W i , θ n,h ) − ˆ g n ( θ n,h ) + ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) andtherefore ´Σ n ( θ n,h ) = n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1) > + (cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1) n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + (cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1)(cid:0) ˆ g n ( θ n,h ) − ´ g n ( θ n,h ) (cid:1) > = n X i =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > + o p (1)along { ( θ n,h , F n ) : n ≥ } , where the second equality holds by Lemma D.3. Consequently, we need to showthat P ni =1 ´ p i (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > = ˆΣ n ( θ n,h )+ o p (1) along { ( θ n,h , F n,h ) : n ≥ } .This completes the task for Step 2. Step 3.
Let A i ( θ n,h ) := (cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1)(cid:0) g ( W i , θ n,h ) − ˆ g n ( θ n,h ) (cid:1) > . The decomposition n X i =1 ´ p i A i ( θ n,h ) = n X i =1 (cid:18) ´ p i − n (cid:19) A i ( θ n,h ) + ˆΣ n ( θ n,h ) + o p (1) eans that we need to show P ni =1 (´ p i − n ) A i ( θ n,h ) = o p (1) along { ( θ n,h , F n ) : n ≥ } . By definition of ´ p i , itfollows that (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 (cid:18) ´ p i − n (cid:19) A i ( θ n,h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ‘ J × J ≤ n n X i =1 (cid:12)(cid:12)(cid:12)(cid:12) ´ λ > n g i (´ t n , θ n,h )1 − ´ λ > n g i (´ t n , θ n,h ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A i ( θ n,h ) (cid:12)(cid:12)(cid:12)(cid:12) ‘ J × J ≤ max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | − max ≤ i ≤ n | ´ λ > n g i (´ t n , θ n,h ) | n n X i =1 || A i ( θ n,h ) || ‘ J × J ! = o p (1) O p (1)= o p (1)where the first inequality holds by the triangle inequality, the second holds by the reverse triangle inequalityand the definition of maximum, and the first equality holds by Step 1 and the weak law of large numbers fortriangular arrays of row-wise i.i.d. random variables. This establishes that P ni =1 (cid:0) ´ p i − n (cid:1) A i ( θ n,h ) = o p (1)along { ( θ n,h , F n ) : n ≥ } and, in combination with the result in Step 2, we conclude that ´Σ n ( θ n,h ) =ˆΣ n ( θ n,h ) + o p (1) along sequences { ( θ n,h , F n,h ) : n ≥ } in F + . Step 3.
To generalize to subsequences { w n : n ≥ } of { n } , just replace n with w n and repeat Steps 1, 2and 3. (cid:4) E Technical Lemmas for Local Power
E.1 A Preliminary Lemma
This technical result shows that along any sequence of { ( θ n, ∗ , F n ) : n ≥ } that satisfies Assumption LA1,max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | = O p ( n δ ) = o p ( n ) for any j ∈ J , where δ > ≤ i ≤ n max ≤ j ≤ J | g j ( W i , θ n, ∗ ) | = o p ( n ) and, by the equivalence ofnorms in Euclidean space, max ≤ i ≤ n || g j ( W i , θ n, ∗ ) || ‘ J = o p ( n ). The result is used in the proofs of LemmaE.5, Lemma E.6, and Lemma E.7. Lemma E.1.
For any sequence { ( θ n, ∗ , F n ) : n ≥ } of n − -local alternatives that satisfies LA1, the followingis true: max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | = O p ( n δ ) for each j ∈ J , where δ > is defined in Assumption LA1.Proof. We outline the proof and then provide the steps.
Outline.
The proof is similar to that of equation (2.4) in Guggenberger and Smith (2005). In the firststep, we choose an appropriate
C >
0. In the second step, we apply the union bound and Markov’s inequalityto establish the result.
Step 1.
Fix r > j ∈ J , and { ( θ n, ∗ , F n ) : n ≥ } arbitrarily. We know that K := sup n ≥ E F n | g j ( W i , θ n, ∗ ) | δ < + ∞ by Assumption LA1 and can therefore choose C >
K/C < r . Such a constant
C > tep 2. We know that P F n (cid:16) max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | ≤ ( Cn ) δ (cid:17) ≤ n X i =1 P F n (cid:16) | g j ( W i , θ n, ∗ ) | δ > nC (cid:17) ≤ KC < r where the first inequality applies the union bound, the second follows from Markov’s inequality and takingthe supremum of { E F n | g j ( W i , θ n, ∗ ) | δ : n ≥ } , and the third holds by the construction of C . We haveconstructed an upper bound (i.e. K/C ) that is does not depend on n , so we can take the supremum toconclude that sup n ≥ P F n (cid:16) max ≤ i ≤ n | g j ( W i , θ n, ∗ ) | ≤ ( Cn ) δ (cid:17) < r and complete the proof. (cid:4) E.2 Restricted Estimator Under Local Alternatives
The restricted empirical likelihood problem issup p ,...,p n ( n X i =1 ln( p i ) (cid:12)(cid:12)(cid:12)(cid:12) n X i =1 p i g ( W i , θ n, ∗ ) ≥ J , n X i =1 p i = 1 , p i ≥ ∀ i = 1 , ..., n ) . (E.1)The Lagrangian is L ( p , ..., p n , λ ( θ n, ∗ ) , ω ( θ n, ∗ )) = n X i =1 ln( p i ) + ω − n X i =1 p i ! − nλ > n X i =1 p i g ( W i , θ n, ∗ ) ! (E.2)and the Karusch-Kuhn-Tucker (KKT) conditions are ∂ L ∂p i = 1 p i − ω − nλ g ( W i , θ n, ∗ ) = 0 , ∀ i = 1 , ..., n (E.3) λ j ≤ , n X i =1 p i g j ( W i , θ n, ∗ ) ≥ , ∀ j ∈ J (E.4) n X i =1 p i = 1 , λ j n X i =1 p i g j ( W i , θ n, ∗ ) = 0 ∀ j ∈ J . (E.5)From the Karusch-Kuhn-Tucker conditions, we have that´ p i = 1 n (cid:18)
11 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) (cid:19) (E.6)where g b ( W i , θ n, ∗ ) denotes the vector of estimating functions for the moments that are deemed bindingby the Karusch-Kuhn-Tucker conditions and ´ λ n,b is the vector of Lagrange multipliers that corresponds to g b ( W i , θ n, ∗ ). Substituting (E.6) into (E.2), we obtain the dual representation of the empirical likelihoodproblem, sup λ ∈ R J − ( n ln( n ) + n X i =1 ln (cid:0) λ > g ( W i , θ n, ∗ ) (cid:1)) . (E.7) he existence of Lagrange multipliers holds because of the fact that the constraints are affine functions ofthe choice variables in the primal problem (E.2). E.3 Technical Results Relating to the Constrained Estimator
Recall that the set H is defined as the set of all local alternatives { ( θ n, ∗ , F n ) : n ≥ } that satisfy Assump-tions LA1 and LA2. Lemma E.2.
For each { ( θ n, ∗ , F n ) : n ≥ } ∈ H , define the random set C n ( θ n, ∗ ) = ( ( p , ..., p n ) > ∈ R n : n X i =1 p i g j ( W i , θ n, ∗ ) ≥ ∀ j ∈ J , n X i =1 p i = 1 , p i ≥ ∀ i ∈ I ) . Then lim n → + ∞ P F n (cid:0) C n ( θ n, ∗ ) = ∅ (cid:1) = 0 for each { ( θ n, ∗ , F n ) : n ≥ } ∈ H .Proof. We start with an outline and the provide the details.
Outline.
The proof proceeds by the direct method. In Step 1, we establish we establish that it sufficesto show that P F n (ˆ g n,j ( θ n, ∗ ) < → n → + ∞ for each j ∈ J . In Step 2, we establish the result using amean-value expansion and the WLLN for triangular arrays of row-wise i.i.d. random variables. Step 1.
The proof of the first step follows the a similar argument to that of Lemma D.2. We know that (cid:8) C n ( θ n, ∗ ) = ∅ (cid:9) = n ∀ p ∈ S n , ∃ j := j ( p ) ∈ J s.t. P ni =1 p i g j ( W i , θ n, ∗ ) < (cid:9) , where S n is the standardsimplex. That is, S n := { p ∈ R n : P ni =1 p i = 1 , p i ≥ ∀ i ∈ I} . Since ( n − , ..., n − ) > ∈ S n , it follows that (cid:8) C n ( θ n, ∗ ) = ∅ (cid:9) ⊆ J [ j =1 ( n n X i =1 g j ( W i , θ n, ∗ ) < ) and therefore lim n → + ∞ P F n (cid:16) C n ( θ n, ∗ ) = ∅ (cid:17) ≤ J X j =1 lim n → + ∞ P F n n n X i =1 g j ( W i , θ n, ∗ ) < ! . Hence it suffices to show that lim n → + ∞ P F n (ˆ g n,j ( θ n, ∗ ) <
0) = 0 for each j ∈ J . Step 2.
For each j ∈ J , we can mean-value expand E F n g j ( W i , θ n, ∗ ) /σ F n ,j ( θ n, ∗ ) around { ( θ n , F n ) : n ≥ } ∈ F and conclude that E F n g j ( W i , θ n, ∗ ) σ F n ,j ( θ n, ∗ ) = E F n g j ( W i , θ n ) σ F n ,j ( θ n ) + O ( n − )for each n ≥
1. So by the weak law of large numbers for triangular arrays of row-wise i.i.d. data, it followsthat lim n → + ∞ P F n (ˆ g n,j ( θ n, ∗ ) <
0) = 0, and therefore lim n → + ∞ P F n ( C n ( θ n, ∗ ) = ∅ ) = 0. (cid:4) Lemma E.2 is an important intermediate technical result because it allows us to conclude that along anysequence { ( θ n, ∗ , F n ) : n ≥ } ∈ H , the empirical likelihood estimator exists with probability approaching 1.In all of the subsequent results, it is implicit that the event {C n ( θ n, ∗ ) = ∅} occurs. emma E.3. Define ˆ g n,b ( θ n, ∗ ) := n − P ni =1 g b ( W i , θ n, ∗ ) . The following result holds for any sequence { ( θ n, ∗ , F n ) : n ≥ } of n − -local alternatives: P F n (cid:0) (´ λ n,b ) > ˆ g n,b ( θ n, ∗ ) ≥ (cid:1) = 1 for each n ≥ .Proof. We outline the proof and then provide details.
Outline.
The proof employs the direct method. The first step shows that for any sequence { ( θ n, ∗ , F n ) : n ≥ } , log(1 + (´ λ n,b ) > ˆ g n,b ( θ n, ∗ )) ≥ Step 1.
Let ´ λ n denote a feasible solution to the dual problem (E.7) under an arbitrary sequence { ( θ n, ∗ , F n ) : n ≥ } . Since the dual variables for the slack inequalities are equal to zero with probability 1 under theKarusch-Kuhn-Tucker conditions, we have that ´ λ > n g ( W i , θ n, ∗ ) = (´ λ n,b ) > g b ( W i , θ n, ∗ ) and the following holdswith probability equal to 1:0 ≤ n n X i =1 ln (cid:16) λ n,b ) > g b ( W i , θ n, ∗ ) (cid:17) ≤ ln (cid:18) λ n,b ) > n n X i =1 g b ( W i , θ n, ∗ ) (cid:19) (E.8)where the first inequality holds as 2 P ni =1 ln (cid:16) λ n,b ) > g b ( W i , θ n, ∗ ) (cid:17) is the empirical likelihood ratio statisticfor testing the null hypothesis (see Canay (2010)) and the second holds by Jensen’s inequality. This impliesthat log(1 + (´ λ n,b ) > ˆ g n,b ( θ n, ∗ )) ≥ n ≥ Step 2.
For any x ∈ R , ln(1 + x ) ≥ x ≥
0. Consequently, we use the conclusion of Step 1 toconclude that (´ λ n,b ) > ˆ g n,b ( W i , θ n, ∗ ) ≥ n ≥ (cid:4) Lemma E.4.
Define random index set ´ B % n := { j ∈ J : ´ g n,j ( θ n, ∗ ) = 0 } and deterministic index set C := { j ∈ J : lim n →∞ E F n (cid:0) g j ( W i , θ n, ∗ ) (cid:1) = 0 } . If Assumptions LA1 and LA2 hold, then the following resultis true for any sequence of n − -local alternatives { ( θ n, ∗ , F n ) : n ≥ } : lim n →∞ P F n ( ´ B % n ⊆ C ) = 1 (E.9) where P F n ( · ) is the probability measure induced by repeated sampling from F n .Proof. The proof has multiple steps so we present an outline and then the steps in detail.
Outline.
We want to show that the event (cid:8) ´ B % n ⊆ C (cid:9) occurs with probability approaching 1 alongany sequence { ( θ n, ∗ , F n ) : n ≥ } . This involves three steps. In Step 1, we use the complement rule todeduce that this is equivalent to showing that (cid:8) ´ B % n ∩ C c = ∅ (cid:9) occurs with probability approaching 0 along { ( θ n, ∗ , F n ) : n ≥ } . In Step 2, we characterize the event (cid:8) ´ B % n ∩ C c = ∅ (cid:9) . In Step 3, we argue that that theevent (cid:8) ´ B % n ∩ C c = ∅ (cid:9) occurs with probability approaching 0 along { ( θ n, ∗ , F n ) : n ≥ } . Step 1.
Let { ( θ n, ∗ , F n ) : n ≥ } be an arbitrary sequence of n − -local alternatives. By the complementrule, lim n →∞ P F n ( ´ B % n ⊆ C ) = 1 − lim n →∞ P F n ( ´ B % n ∩ C c = ∅ ) . (E.10)So to show lim n →∞ P F n ( ´ B % n ⊆ C ) = 1, it suffices to show that lim n →∞ P F n ( ´ B % n ∩ C c = ∅ ) = 0. tep 2. On the event { ´ B % n ∩ C c = ∅} , there exists j ∈ J such that ´ g n,j ( θ n, ∗ ) = 0 and lim n →∞ E F n ( g j ( W i , θ n, ∗ )) >
0. The deduction that lim n → + ∞ E F n g j ( W i , θ n, ∗ ) > E F n g j ( W i , θ n, ∗ ) /σ F n ,j ( θ n, ∗ )around θ n , which yields E F n g j ( W i , θ n, ∗ ) σ F n ,j ( θ n, ∗ ) = E F n g j ( W i , θ n ) σ F n ,j ( θ n ) + O ( n − )and therefore E F n g j ( W i , θ n, ∗ ) is asymptotically nonnegative because { ( θ n , F n ) : n ≥ } ∈ F . The expansionis valid under LA2. So all we need to show that with probability approaching 1 it is not possible for´ g n,j ( θ n, ∗ ) = 0 and lim n → + ∞ E F n g j ( W i , θ n, ∗ ) > j ∈ J . Step 3.
Suppose that there exists j ∈ J such that ´ g n,j ( θ n, ∗ ) = 0 and lim n → + ∞ E F n g j ( W i , θ n, ∗ ) >
0. Wefirst note that ´ g n,j ( θ n, ∗ ) = 0 implies ´ g n,j ( θ n, ∗ ) ≥ ˆ g n,j ( θ n, ∗ ) because0 = ´ g n,j ( θ n, ∗ ) = 1 n n X i =1 g j ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) ! ≥ ˆ g n,j ( θ n, ∗ )1 + (´ λ n,b ) > ˆ g b ( θ n, ∗ ) ≥ ˆ g n,j ( θ n, ∗ ) . by Jensen’s inequality and the fact that ´ λ n,b ) > ˆ g b ( θ n, ∗ ) ≥ g n,j ( θ n, ∗ ), we have that´ g n,j ( θ n, ∗ ) ≥ ˆ g n,j ( θ n, ∗ ) − E F n (ˆ g n,j ( θ n, ∗ )) + E F n (ˆ g n,j ( θ n, ∗ )) = o p (1) + E F n ( g j ( W i , θ n, ∗ )) (E.11)along { ( θ n, ∗ , F n : n ≥ } by the weak law of large numbers for triangular arrays of row-wise i.i.d. ran-dom variables and the unbiasedness of ˆ g n,j ( θ n, ∗ ) for E F n ( g j ( W i , θ n, ∗ )). If we send n → + ∞ , we deducethat the probability limit of ´ g n,j ( θ n, ∗ ) is strictly positive by the ordering in (E.11) and the fact thatlim n → + ∞ E F n g j ( W i , θ n, ∗ ) >
0. Consequently, lim n → + ∞ P F n ( ´ B % n ∩ C c = ∅ ) = 0 along { ( θ n, ∗ , F n : n ≥ } .Combining this result with Step 1, we complete the proof. (cid:4) Lemma E.5.
Let B n := | ´ B % n | , ´Λ := { ´ λ n,b | ( E. , ( E. , ( E. hold } ⊆ R B n , and ||·|| ‘ Bn denote the Euclideannorm on R B n . If Assumptions LA1 and LA2 hold, then sup ´ λ n,b ∈ ´Λ n || ´ λ n,b || ‘ Bn = O p ( n − ) along any sequence { ( θ n, ∗ , F n ) : n ≥ } .Proof. The proof proceeds by the direct method. Given the length of the proof, we outline the argumentand then provide detailed steps.
Outline.
Our goal is to show that for any { ( θ n, ∗ , F n ) : n ≥ } , sup ´ λ n,b ∈ ´Λ n || ´ λ n,b || ‘ Bn = O p ( n − ). Thisinvolves three steps. In the first step, we do some algebra to relate the Karusch-Kuhn-Tucker conditionsto || ´ λ n,b || ‘ Bn . In the second step, we derive a bound relating || ´ λ n,b || ‘ Bn and the sample moments of theinequalities that are binding under the Karusch-Kuhn-Tucker conditions. In the third step, we use standardlimit theorems for triangular arrays of row-wise i.i.d. random variables and the bound derived in Step 2 toconclude the result. Step 1.
Let { ( θ n, ∗ , F n ) : n ≥ } be arbitrary. The Karusch-Kuhn-Tucker conditions dictate that ´ λ n satisfies 1 n n X i =1 g b ( W i , θ n, ∗ )1 + (´ λ n ) > g ( W i , θ n, ∗ ) = 0 B n . (E.12) nder complementary slackness ´ λ n = (´ λ n,b , J − B n ), which implies that (E.12) is equivalent to1 n n X i =1 g b ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) = 0 B n . (E.13)Let { β n : n ≥ } be a sequence of unit vectors in R B n that satisfy β n || ´ λ n,b || ‘ Bn = ´ λ n,b . We take the innerproduct between β n and (E.13), which gives us β > n n n X i =1 g b ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) ! = 0 (E.14)If we define X i := (´ λ n,b ) > g b ( W i , θ n, ∗ ) for all i = 1 , ..., n and use the transformation11 + X i = 1 − X i X i for all i = 1 , ..., n , then we have that β > n n n X i =1 g b ( W i , θ n, ∗ ) ! = β > n n n X i =1 (cid:0) g b ( W i , θ n, ∗ ) (cid:1) (´ λ n,b ) > g b ( W i , θ n, ∗ )1 + (´ λ n,b ) > g b ( W i , θ n, ∗ ) ! (E.15)= || ´ λ n,b || ‘ Bn β > n n n X i =1 g b ( W i , θ n, ∗ ) g b ( W i , θ n, ∗ ) > λ n,b ) > g b ( W i , θ n, ∗ ) ! β n . (E.16)where the last equality holds by the definition of { β n : n ≥ } . Step 2.
Let ˆΣ n,b ( θ n, ∗ ) denote the sample analogue estimator of the covariance matrix of g b ( W i , θ n, ∗ ). Wewill relate ˆΣ n,b ( θ n, ∗ ) to the RHS of (E.16). Since ´ B % n ⊆ C with probability approaching 1 (Lemma E.4), wehave that ˆΣ n,b ( θ n, ∗ ) = 1 n n X i =1 g b ( W i , θ n, ∗ ) g b ( W i , θ n, ∗ ) > . (E.17)with probability approaching 1 along { ( θ n, ∗ , F n ) : n ≥ } . Since p i >
we have that $1 + X_i > 0$ for $i=1,\dots,n$, which implies that, with probability tending to 1 along $\{(\theta_{n,*},F_n): n\ge 1\}$,
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n \le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} \frac{g_b(W_i,\theta_{n,*})\,g_b(W_i,\theta_{n,*})^\top}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\beta_n\,(1+X_{\max}), \tag{E.18}
\]
where $X_{\max} := \max_{1\le i\le n}|(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})|$. Applying the Cauchy–Schwarz inequality, we have that
\[
|(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})| \le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}, \tag{E.19}
\]
implying that
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n \le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\beta_n^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} \frac{g_b(W_i,\theta_{n,*})\,g_b(W_i,\theta_{n,*})^\top}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\beta_n\,\big(1+\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}Z^*_n\big), \tag{E.20}
\]
where $Z^*_n := \max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}$. Applying the equality in (E.16) to the right-hand side of (E.20), we conclude that
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\Bigg(\beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n - \beta_n^\top\Bigg(Z^*_n\,\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\Bigg)\Bigg) \le \beta_n^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\Bigg). \tag{E.21}
\]

Step 3.
By Lemma E.1, we have that $Z^*_n = o_p(n^{1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Since $\acute B^{\varrho}_n \subseteq C$ for large $n$, we can apply the Lyapunov CLT to $n^{-1/2}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})$ to conclude that $n^{-1}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*}) = O_p(n^{-1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Finally, LA1 implies that
\[
0 < a + o_p(1) \le \beta_n^\top\hat\Sigma_{n,b}(\theta_{n,*})\beta_n \le b + o_p(1) \tag{E.22}
\]
along the sequence $\{(\theta_{n,*},F_n): n\ge 1\}$, where $a$ and $b$ are the smallest and largest eigenvalues of the variance matrix of the binding moments in the population. These limiting results allow us to conclude that
\[
\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}} \le \frac{O_p(n^{-1/2})}{a + o_p(1)} \quad \forall\,\acute\lambda_{n,b}\in\acute\Lambda_n \tag{E.23}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. We have shown that the positive random variable $\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}$ is bounded above by a random variable that is $O_p(n^{-1/2})$, which implies
\[
\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}} = O_p(n^{-1/2}) \tag{E.24}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. $\Box$
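To make the rate in Lemma E.5 concrete, the following minimal Python sketch (our illustration, not part of the formal argument) computes the empirical likelihood multiplier for a set of moments restricted to equal zero, i.e., it solves the dual problem $\max_{\lambda}\sum_i \log(1+\lambda^\top g_b(W_i,\theta))$, whose first-order condition is (E.12), and tracks $\sqrt{n}\,\|\acute\lambda_{n,b}\|$ across sample sizes when the population means drift at the $n^{-1/2}$ rate. The data-generating choices (two moments, Gaussian draws, drift coefficient 0.5) are arbitrary.

```python
# Numerical illustration of Lemma E.5 (ours): the EL multiplier on the binding
# moments scales like O_p(n^{-1/2}) when the means drift at rate n^{-1/2}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def el_multiplier(G):
    """Maximize sum_i log(1 + lam'G_i), the EL dual for moments restricted to zero;
    its first-order condition is the sample analogue of (E.12)."""
    n, B = G.shape

    def neg_dual(lam):
        t = 1.0 + G @ lam
        if np.any(t <= 1e-10):           # stay inside the domain 1 + lam'g_i > 0
            return np.inf
        return -np.sum(np.log(t))

    res = minimize(neg_dual, np.zeros(B), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    return res.x

for n in (250, 1000, 4000):
    draws = []
    for _ in range(200):
        # two "binding" moments with means of order n^{-1/2} (local drift)
        G = rng.normal(loc=0.5 / np.sqrt(n), scale=1.0, size=(n, 2))
        draws.append(np.sqrt(n) * np.linalg.norm(el_multiplier(G)))
    print(f"n = {n:5d}:  median of sqrt(n)*||lambda|| = {np.median(draws):.3f}")
```

The printed medians settle near a constant as $n$ grows, which is the numerical footprint of $\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}} = O_p(n^{-1/2})$.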
Lemma E.6. Let $\acute g_n(\theta_{n,*})$ and $\hat g_n(\theta_{n,*})$ denote the restricted and unrestricted estimators of the moments, respectively, under $n^{-1/2}$-local alternatives. If the sequence $\{(\theta_{n,*},F_n): n\ge 1\}$ satisfies Assumptions LA1 and LA2, then $\|\hat g_n(\theta_{n,*}) - \acute g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$.

Proof. Due to the length of the proof, we present an outline and then the steps in detail.
Outline.
Our goal is to show $\|\hat g_n(\theta_{n,*}) - \acute g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$ for any sequence $\{(\theta_{n,*},F_n): n\ge 1\}$. This involves three steps. In Step 1, we show that proving $\|\hat g_n(\theta_{n,*}) - \acute g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$ along a sequence of $n^{-1/2}$-local alternatives amounts to establishing the result coordinate-wise. In Step 2, we use Lemma E.5 to deduce that showing $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = O_p(n^{-1/2})$ only requires showing $\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. In Step 3, we show the required result and complete the proof.

Step 1.
Let $\{(\theta_{n,*},F_n): n\ge 1\}$ be arbitrary and let $\{e_1,\dots,e_J\}$ denote the standard basis for $\mathbb{R}^J$. The triangle inequality and the unit length of the basis vectors allow us to conclude that
\[
\|\acute g_n(\theta_{n,*}) - \hat g_n(\theta_{n,*})\|_{\ell^2_J} \le \sum_{j=1}^{J} \big\|e_j\big(\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})\big)\big\|_{\ell^2_J} = \sum_{j=1}^{J} |\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})|.
\]
Consequently, a sufficient condition for $\|\acute g_n(\theta_{n,*}) - \hat g_n(\theta_{n,*})\|_{\ell^2_J} = O_p(n^{-1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ is that, for each $j\in J$, $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = O_p(n^{-1/2})$ along $\{(\theta_{n,*},F_n): n\ge 1\}$.

Step 2. Consider $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})|$, where $j\in J$ is fixed arbitrarily. We conclude that
\[
|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = \Bigg|\sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g_j(W_i,\theta_{n,*})\Bigg|
= \Bigg|\frac{1}{n}\sum_{i=1}^{n}\Bigg(1 - \frac{1}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\,g_j(W_i,\theta_{n,*})\Bigg|
\]
\[
= \Bigg|\frac{1}{n}\sum_{i=1}^{n}\Bigg(\frac{(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\,g_j(W_i,\theta_{n,*})\Bigg|
= \Bigg|\acute\lambda_{n,b}^\top \sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\Bigg|
\]
\[
\le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\Bigg\|\sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\Bigg\|_{\ell^2_{B_n}}
\le \|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}
\]
\[
\le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}},
\]
where the first inequality holds by the Cauchy–Schwarz inequality, the second by the triangle inequality, and the final one by the definition of the least upper bound. By Lemma E.5, it therefore suffices to show that
\[
\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(1),
\]
and this is what we do in Step 3.

Step 3.
By definition of $\acute p_i$ and the fact that $\frac{1}{1+x} = 1 - \frac{x}{1+x}$, one can show
\[
\sum_{i=1}^{n} \acute p_i\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = E_{n,1}(\theta_{n,*}) - E_{n,2}(\theta_{n,*}), \tag{E.25}
\]
where
\[
E_{n,1}(\theta_{n,*}) := \frac{1}{n}\sum_{i=1}^{n} \|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} \tag{E.26}
\]
and
\[
E_{n,2}(\theta_{n,*}) := \frac{1}{n}\sum_{i=1}^{n}\Bigg(\frac{(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}\Bigg)\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}. \tag{E.27}
\]
First, note that $E_{n,1}(\theta_{n,*}) = O_p(1)$ under $\{(\theta_{n,*},F_n): n\ge 1\}$ by the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables and LA1. Regarding (E.27), it is easy to see that $(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*}) = o_p(1)$ uniformly in $i$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ because
\[
\max_{1\le i\le n}|(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})| \le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(n^{-1/2})\,o_p(n^{1/2}) = o_p(1)
\]
by the Cauchy–Schwarz inequality, Lemma E.1, and Lemma E.5. This implies that
\[
E_{n,2}(\theta_{n,*}) = \underbrace{(\acute\lambda_{n,b})^\top\Bigg(\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\,\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}\Bigg)}_{E_{n,3}(\theta_{n,*})} + o_p(1) \tag{E.28}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. The Cauchy–Schwarz inequality, the triangle inequality, and the definition of the least upper bound imply that
\[
|E_{n,3}(\theta_{n,*})| \le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}\,\frac{1}{n}\sum_{i=1}^{n}\|g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} \tag{E.29}
\]
\[
= O_p(n^{-1/2})\,o_p(n^{1/2})\,O_p(1) \tag{E.30}
\]
\[
= o_p(1) \tag{E.31}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$, where the equality holds by Lemma E.5, Lemma E.1, the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables, and LA1. This result implies that $E_{n,3}(\theta_{n,*})$, and hence $E_{n,2}(\theta_{n,*})$, is $o_p(1)$; combined with $E_{n,1}(\theta_{n,*}) = O_p(1)$, it follows that (E.25) is $O_p(1)$, which was required to show $|\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*})| = O_p(n^{-1/2})$. $\Box$
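The pivotal step in the proof above is the algebraic identity $\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top\sum_i \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})$, which holds exactly for any multiplier value, not only the Karush–Kuhn–Tucker solution. The scalar ($B_n = 1$) numerical check below is our own illustration (the moment designs are arbitrary); it solves the first-order condition by bisection and verifies the identity to machine precision.

```python
# Numerical check (ours, scalar case B_n = 1) of the identity used in Step 2 of
# Lemma E.6: g_hat_j - g_acute_j = lambda * sum_i p_i * g_b(W_i) * g_j(W_i).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n = 500
gb = rng.normal(0.02, 1.0, n)               # "binding" moment, mean near zero
gj = 0.5 * gb + rng.normal(0.3, 1.0, n)     # second moment, correlated with gb

# EL first-order condition for the multiplier on the restriction sum_i p_i gb_i = 0
foc = lambda lam: np.mean(gb / (1.0 + lam * gb))
lo = -1.0 / gb.max() + 1e-6                 # domain: 1 + lam * gb_i > 0 for all i
hi = -1.0 / gb.min() - 1e-6
lam = brentq(foc, lo, hi)

p = 1.0 / (n * (1.0 + lam * gb))            # EL probabilities
g_hat, g_acute = gj.mean(), np.sum(p * gj)  # unrestricted vs restricted estimate

lhs = g_hat - g_acute
rhs = lam * np.sum(p * gb * gj)
print(f"lambda = {lam:.4f}, lhs = {lhs:.6f}, rhs = {rhs:.6f}")
assert abs(lhs - rhs) < 1e-10               # the identity holds exactly
```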
E.4 Technical Results for Power Comparison

The first result establishes the consistency of the constrained estimator of the covariance matrix along $n^{-1/2}$-local alternatives. As in Lemma D.4, $\|\cdot\|_{\ell^2_{J\times J}}$ denotes the Frobenius norm.

Lemma E.7.
Let $\acute\Sigma_n(\theta_{n,*}) := \sum_{i=1}^{n} \acute p_i\,\big(g(W_i,\theta_{n,*}) - \acute g_n(\theta_{n,*})\big)\big(g(W_i,\theta_{n,*}) - \acute g_n(\theta_{n,*})\big)^\top$ and $\Sigma(\theta_{n,*},F_n) := \mathrm{Cov}_{F_n}\big(g(W_i,\theta_{n,*})\big)$. If Assumptions LA1 and LA2 hold, then $\forall r > 0$ and $\forall\{(\theta_{n,*},F_n): n\ge 1\}$,
\[
\lim_{n\to\infty} P_{F_n}\Big(\|\acute\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} < r\Big) = 1.
\]

Proof.
The proof proceeds by the direct method. Although the argument is linear, it has a few steps, so we outline the proof and then provide the details.
Outline.
We consider an arbitrary sequence $\{(\theta_{n,*},F_n): n\ge 1\}$ of $n^{-1/2}$-local alternatives and show that $\|\acute\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} = o_p(1)$. To do this, there are three steps. In the first step, we deduce that it is sufficient to show $\big\|\sum_{i=1}^{n}(n^{-1} - \acute p_i)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\big\|_{\ell^2_{J\times J}} = o_p(1)$. In Step 2, we show $\big\|\sum_{i=1}^{n}(n^{-1} - \acute p_i)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\big\|_{\ell^2_{J\times J}} \le o_p(1)\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}$. Subsequently, we show that $\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1)$ in Step 3.

Step 1. Fix $\{(\theta_{n,*},F_n): n\ge 1\}$ arbitrarily. By the triangle inequality,
\[
\|\acute\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} \le \|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}} \tag{E.32}
\]
\[
\quad + \|\hat\Sigma_n(\theta_{n,*}) - \Sigma(\theta_{n,*},F_n)\|_{\ell^2_{J\times J}} \tag{E.33}
\]
\[
= \|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}} + o_p(1) \tag{E.34}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$, where the equality holds by the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables. Decomposing $\|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}}$, we obtain
\[
\|\acute\Sigma_n(\theta_{n,*}) - \hat\Sigma_n(\theta_{n,*})\|_{\ell^2_{J\times J}} \le \Bigg\|\sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\Bigg\|_{\ell^2_{J\times J}} + o_p(1) \tag{E.35}
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$, where the inequality holds by the triangle inequality, Lemma E.6, and the continuous mapping theorem. Consequently, the result boils down to showing that the first term in (E.35) is $o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$.

Step 2.
Following a derivation similar to that in Lemma E.6, it can be shown that
\[
\Bigg\|\sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\Bigg\|_{\ell^2_{J\times J}} = \Bigg\|\sum_{i=1}^{n} \acute p_i\,(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})\;g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\Bigg\|_{\ell^2_{J\times J}} \tag{E.36}
\]
\[
\le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}\,\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}, \tag{E.37}
\]
where the inequality holds by the triangle inequality, the Cauchy–Schwarz inequality, and the definition of the least upper bound. Since
\[
\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(n^{-1/2})\,o_p(n^{1/2}) = o_p(1)
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$ by Lemma E.5 and Lemma E.1, it suffices to show that
\[
\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1)
\]
along $\{(\theta_{n,*},F_n): n\ge 1\}$. This is the task of Step 3.

Step 3. Decompose $\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}$ as follows:
\[
\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = \frac{1}{n}\sum_{i=1}^{n}\frac{\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}
\]
\[
= \frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} - \frac{1}{n}\sum_{i=1}^{n}\frac{(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top g_b(W_i,\theta_{n,*})}
\]
\[
\le \frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} - \frac{(\acute\lambda_{n,b})^\top\,\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})},
\]
where the inequality holds by Jensen's inequality. By the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables and LA1, $\frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Now, let
\[
E_{n,4}(\theta_{n,*}) := \frac{(\acute\lambda_{n,b})^\top\,\frac{1}{n}\sum_{i=1}^{n} g_b(W_i,\theta_{n,*})\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}}{1+(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})}
\]
and notice that
\[
|E_{n,4}(\theta_{n,*})| \le \Bigg(\frac{\sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\max_{1\le i\le n}\|g_b(W_i,\theta_{n,*})\|_{\ell^2_{B_n}}}{1+(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})}\Bigg)\Bigg(\frac{1}{n}\sum_{i=1}^{n}\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}}\Bigg). \tag{E.38}
\]
The numerator of (E.38) is $o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ by Lemma E.5, Lemma E.1, and the weak law of large numbers for triangular arrays of row-wise i.i.d. random variables. The denominator is $1 + o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$ because
\[
|(\acute\lambda_{n,b})^\top \hat g_{n,b}(\theta_{n,*})| \le \sup_{\acute\lambda_{n,b}\in\acute\Lambda_n}\|\acute\lambda_{n,b}\|_{\ell^2_{B_n}}\,\|\hat g_{n,b}(\theta_{n,*})\|_{\ell^2_{B_n}} = O_p(n^{-1/2})\,O_p(n^{-1/2}) = o_p(1), \tag{E.39}
\]
where the inequality follows from the Cauchy–Schwarz inequality and the definition of the least upper bound, and the equality holds by Lemma E.5 and a Lyapunov CLT for triangular arrays of row-wise i.i.d. random variables. Note that we do not need to recenter because $\acute B^{\varrho}_n \subseteq C$ with probability approaching 1 as $n\to\infty$. Thus, $\sum_{i=1}^{n}\acute p_i\,\|g(W_i,\theta_{n,*})g(W_i,\theta_{n,*})^\top\|_{\ell^2_{J\times J}} = O_p(1) + o_p(1) = O_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. $\Box$

The next lemma establishes an ordering of the restricted and unrestricted estimators of the moments that holds with probability approaching 1 when the moments are nonnegatively correlated.
Lemma E.8. Let $M$ be as in (3.5). For any $\{(\theta_{n,*},F_n): n\ge 1\}\in M$ and $j\in J$,
\[
\lim_{n\to+\infty} P_{F_n}\big(\hat g_{n,j}(\theta_{n,*}) \le \acute g_{n,j}(\theta_{n,*})\big) = 1.
\]

Proof.
We outline the steps to the proof and then provide details.
Outline.
The first step shows that $\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top \sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*})$ for any $j\in J$. The second step uses the sign restrictions on the elements of $\{\Omega(\theta_{n,*},F_n): n\ge 1\}$ and on $\acute\lambda_{n,b}$ to conclude the result.

Step 1.
Fix $j\in J$ and $\{(\theta_{n,*},F_n): n\ge 1\}\in M$ arbitrarily. Using a derivation similar to that presented in Lemma E.6, we have that
\[
\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \sum_{i=1}^{n}\big(n^{-1} - \acute p_i\big)\,g_j(W_i,\theta_{n,*}) \tag{E.40}
\]
\[
= \acute\lambda_{n,b}^\top \sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*}). \tag{E.41}
\]

Step 2.
Let $\Xi(\theta_{n,*},F_n)$ denote the $B_n\times 1$ vector whose elements are the covariances between $g_j(W_i,\theta_{n,*})$ and the elements of $g_b(W_i,\theta_{n,*})$. From (E.41), we can write
\[
\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top\Bigg(\sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*}) - \Xi(\theta_{n,*},F_n) + \Xi(\theta_{n,*},F_n)\Bigg). \tag{E.42}
\]
From Lemma E.7, we have that $\sum_{i=1}^{n} \acute p_i\,g_b(W_i,\theta_{n,*})\,g_j(W_i,\theta_{n,*}) - \Xi(\theta_{n,*},F_n) = o_p(1)$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. Hence,
\[
\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) = \acute\lambda_{n,b}^\top\big(o_p(1) + \Xi(\theta_{n,*},F_n)\big). \tag{E.43}
\]
Since the Karush–Kuhn–Tucker conditions dictate that $\acute\lambda_{n,b,k} \le 0$ for each $k\in\{1,\dots,B_n\}$ and $\Xi(\theta_{n,*},F_n)$ is a vector of nonnegative terms, we have that $\hat g_{n,j}(\theta_{n,*}) - \acute g_{n,j}(\theta_{n,*}) \le 0$ with probability approaching 1 as $n\to\infty$ along $\{(\theta_{n,*},F_n): n\ge 1\}$. $\Box$

The next result provides an ordering of the elementwise moment selection functions that holds with probability approaching 1 under nonnegative correlation. For $a,b\in\mathbb{R}^J_{[\pm\infty]}$, the relation $a\succcurlyeq b$ means that $a_j \ge b_j$ for each $j\in J$.

Lemma E.9.
Let $M$ be as in (3.5). Then $\forall\{(\theta_{n,*},F_n): n\ge 1\}\in M$,
\[
\lim_{n\to\infty} P_{F_n}\Big(\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big) = 1.
\]

Proof.
The proof uses the direct method. We present an outline and then the steps in detail.
Outline.
The proof involves two short steps. Step 1 shows that an ordering of the restricted and unrestricted estimators implies an ordering of the moment selection functions. Step 2 invokes Lemma E.8 to establish the result.

Step 1. Since $\acute\xi_{n,j}(\theta_{n,*})$ and $\hat\xi_{n,j}(\theta_{n,*})$ are just $\acute g_{n,j}(\theta_{n,*})$ and $\hat g_{n,j}(\theta_{n,*})$, respectively, scaled by the common positive factor $\hat\sigma^{-1}_{n,j}(\theta_{n,*})\,\kappa_n^{-1}\,\sqrt{n}$, it follows that
\[
\big\{\acute g_n(\theta_{n,*}) \succcurlyeq \hat g_n(\theta_{n,*})\big\} \subseteq \big\{\acute\xi_n(\theta_{n,*}) \succcurlyeq \hat\xi_n(\theta_{n,*})\big\} \tag{E.44}
\]
\[
\subseteq \Big\{\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big\}, \tag{E.45}
\]
where the second set inclusion holds because $\varphi^{(1)}(\xi,\Omega)$ is nondecreasing in $\xi$.

Step 2.
Step 1 and the monotonicity of probability measures yield
\[
P_{F_n}\big(\acute g_n(\theta_{n,*}) \succcurlyeq \hat g_n(\theta_{n,*})\big) \le P_{F_n}\Big(\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big)
\]
for each $n\ge 1$. So for any $\{(\theta_{n,*},F_n): n\ge 1\}\in M$, we invoke Lemma E.8 to conclude that
\[
\lim_{n\to+\infty} P_{F_n}\Big(\varphi^{(1)}\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) \succcurlyeq \varphi^{(1)}\big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\Big) = 1. \qquad\Box
\]

F Further Theoretical Discussion
F.1 Other GMS Functions
F.1.1 GMS Assumptions
We restate the GMS assumptions in Andrews and Soares (2010) to aid the discussion in the next subsection. We restrict $\Omega\in\Psi$ in the statements to accord with the assumptions imposed on $\mathcal F$.

Assumption GMS 1.
For each $j\in J$: 1. $\varphi_j(\xi,\Omega)$ is continuous for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$ with $\xi_j = 0$, and 2. $\varphi_j(\xi,\Omega) = 0$ for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$ with $\xi_j = 0$.

Assumption GMS 2. $\kappa_n \to +\infty$ as $n\to+\infty$.

Assumption GMS 3.
For each $j\in J$, $\varphi_j(\xi,\Omega)\to+\infty$ as $(\xi,\Omega)\to(\xi^*,\Omega^*)$ for any $(\xi^*,\Omega^*)\in\mathbb{R}^J_{[+\infty]}\times\Psi$ with $\xi^*_j = +\infty$.

Assumption GMS 4. $\kappa_n^{-1}\,n^{1/2} \to +\infty$ as $n\to+\infty$.

Assumption GMS 6.
For each $j\in J$, $\varphi_j(\xi,\Omega) \ge 0$ for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$.

Assumption GMS 7.
For each $j\in J$, $\varphi_j(\xi,\Omega) \ge \min\{0,\xi_j\}$ for all $(\xi,\Omega)\in\mathbb{R}^J_{[+\infty]}\times\Psi$.

We do not list Assumption GMS 5 because it is required only to compare moment selection and subsampling critical values, a topic we do not discuss formally in our paper. GMS2 and GMS4 combine to form Assumption K in the paper.
F.1.2 Alternative Choices of $\varphi$

The main theoretical results in the paper assume that $\varphi = \varphi^{(1)}$, but there are many other choices for $\varphi$. These include
\[
\varphi^{(2)}_j(\xi,\Omega) = \psi(\xi_j), \qquad \varphi^{(3)}_j(\xi,\Omega) = \max(0,\xi_j), \qquad \varphi^{(4)}_j(\xi,\Omega) = \xi_j,
\]
where $\psi(\cdot)$ is nondecreasing and satisfies $\psi(x) = 0$ if $x\le a_L$, $\psi(x)\in[0,\infty]$ if $x\in(a_L,a_U)$, and $\psi(x)=\infty$ if $x\ge a_U$ (Andrews and Soares, 2010). Another choice is the modified MSC function defined as
\[
\varphi^{(5)}_j(\xi,\Omega) = \begin{cases} 0 & \text{if } c_j(\xi,\Omega) = 1, \\ \infty & \text{if } c_j(\xi,\Omega) = 0, \end{cases}
\]
where $c := (c_1(\xi,\Omega),\dots,c_J(\xi,\Omega))$ solves the integer program $\min_{c\in\{0,1\}^J}\{S(-c\odot\xi,\Omega) - \zeta(|c|)\}$ for some increasing function $\zeta(\cdot)$, writing $c\odot\xi$ for the elementwise product $(c_1\xi_1,\dots,c_J\xi_J)$. (If $c_j = 0$ and $\xi_j = +\infty$, the convention $c_j\xi_j = 0$ is adopted.) Modified MSC uses the information embedded in the off-diagonals of the correlation matrix $\Omega$ in a computationally expensive way, whereas $\varphi^{(k)}$, $k\in\{1,2,3,4\}$, does not (Andrews and Soares, 2010).

Our decision to focus on $\varphi = \varphi^{(1)}$ is essentially without loss of generality because the results can be generalized to any $\varphi$ that satisfies the assumptions of Andrews and Soares (2010). To see this, first recall that Lemma D.3 implies that for any $r>0$,
\[
\lim_{n\to+\infty}\,\inf_{(\theta,F)\in\mathcal F_+} P_F\Big(\big\|\big(\acute\xi_n(\theta),\hat\Omega_n(\theta)\big) - \big(\hat\xi_n(\theta),\hat\Omega_n(\theta)\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1,
\]
and Lemma E.6 implies that for any $r>0$ and any $\{(\theta_{n,*},F_n): n\ge 1\}\in\mathcal H$ that satisfies Assumptions LA1 and LA2,
\[
\lim_{n\to+\infty} P_{F_n}\Big(\big\|\big(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big) - \big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1,
\]
where $\|\cdot\|_{\ell^2_{J\times\Psi}} := \|\cdot\|_{\ell^2_J} + \|\cdot\|_{\ell^2_{J\times J}}$ in both statements. These convergence results are enough to extend our asymptotic size and limiting local power results to any $\varphi$ that satisfies Assumptions GMS1–GMS4, with appropriate modifications to notation. Indeed, we can replicate the arguments in the proofs of Theorem 1 and Theorem 2 in Andrews and Soares (2010) (with modifications to notation). The same comment applies to Theorem 4 in their paper because the use of a constrained estimator does not challenge the validity of GMS7.

The ordering of the local power functions also extends. The weak ordering of the local power functions extends to $\varphi^{(k)}$, $k\in\{2,3,4,5\}$, because that result only requires $\varphi_j(\xi,\Omega)$ to be nondecreasing in $\xi$. However, the strict ordering does not apply under $\varphi^{(4)}$ because we require $\varphi_j(\xi,\Omega)\ge 0$ for each $j\in J$ in order to invoke Part 2 of Assumption 5, effectively restricting attention to those $\varphi$ that satisfy GMS6. We do not view this as a serious limitation, especially given that $\varphi = \varphi^{(1)}$ is the choice recommended by Andrews and Barwick (2012a). A final technical point is that to generalize Theorem 3, we must replace the event $\{\acute\Upsilon_n(\theta_{n,*}) \subsetneq \hat\Upsilon_n(\theta_{n,*})\}$ in the statement of Theorem 3 with the more general event $\{\varphi(\acute\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})) \succ \varphi(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*}))\}$, because the former uses the specific form of $\varphi^{(1)}$. (For any vectors $a,b\in\mathbb{R}^J_{[+\infty]}$, the relation $a\succ b$ means that $a_j \ge b_j$ for each $j$ with at least one strict inequality.)
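For concreteness, the following Python sketch (ours) implements $\varphi^{(1)}$–$\varphi^{(4)}$, with `np.inf` standing in for $+\infty$. The threshold of 1 in $\varphi^{(1)}$ follows Andrews and Soares (2010); the particular $\psi$ and the constants `a_L`, `a_U` in $\varphi^{(2)}$ are merely one admissible choice and are our assumptions.

```python
# Illustrative implementations (ours) of the GMS functions discussed above.
import numpy as np

def phi1(xi, Omega=None):
    # phi^(1): hard thresholding -- drop moment j from the critical value
    # computation (send it to +infinity) when xi_j > 1.
    return np.where(xi > 1.0, np.inf, 0.0)

def phi2(xi, Omega=None, a_L=1.0, a_U=2.0):
    # phi^(2): smooth transition, with psi(x) = 0 for x <= a_L, psi
    # nondecreasing on (a_L, a_U), and psi = +infinity for x >= a_U.
    out = np.zeros_like(xi, dtype=float)
    mid = (xi > a_L) & (xi < a_U)
    out[mid] = (xi[mid] - a_L) / (a_U - xi[mid])   # one nondecreasing choice of psi
    out[xi >= a_U] = np.inf
    return out

def phi3(xi, Omega=None):
    return np.maximum(0.0, xi)       # phi^(3)_j = max(0, xi_j)

def phi4(xi, Omega=None):
    return np.asarray(xi, float)     # phi^(4)_j = xi_j

xi = np.array([-0.5, 0.4, 1.3, 5.0])  # xi_j = kappa_n^{-1} sqrt(n) g_hat_j / sigma_hat_j
for f in (phi1, phi2, phi3, phi4):
    print(f.__name__, f(xi))
```

Note that only $\varphi^{(1)}$ and $\varphi^{(2)}$ drop sufficiently slack moments outright, which is what drives the strict local power ordering discussed above.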
F.2 Elaboration on Remark 1

In Remark 1, we state that one can 'fully constrain' the CMS procedure. This involves using the empirical likelihood estimator of the covariance matrix, $\acute\Sigma_n(\theta)$, and of the correlation matrix, $\acute\Omega_n(\theta) = \acute D_n^{-1/2}(\theta)\,\acute\Sigma_n(\theta)\,\acute D_n^{-1/2}(\theta)$. Lemmas D.3 and D.4 imply that for any $r>0$,
\[
\lim_{n\to+\infty}\,\inf_{(\theta,F)\in\mathcal F_+} P_F\Big(\big\|\big(\acute\xi^{FC}_n(\theta),\acute\Omega_n(\theta)\big) - \big(\hat\xi_n(\theta),\hat\Omega_n(\theta)\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1.
\]
So simple modifications of the arguments in the proof of Theorem 1 establish the validity of fully constrained confidence sets. Similarly, Lemmas E.6 and E.7 allow us to conclude that for any $r>0$ and any $\{(\theta_{n,*},F_n): n\ge 1\}\in\mathcal H$ that satisfies Assumptions LA1 and LA2,
\[
\lim_{n\to+\infty} P_{F_n}\Big(\big\|\big(\acute\xi^{FC}_n(\theta_{n,*}),\acute\Omega_n(\theta_{n,*})\big) - \big(\hat\xi_n(\theta_{n,*}),\hat\Omega_n(\theta_{n,*})\big)\big\|_{\ell^2_{J\times\Psi}} < r\Big) = 1,
\]
implying that adjustments to the proof of Theorem 2 extend the limiting local power results to the fully constrained case. It is difficult to establish a general ordering between $\acute\xi^{FC}_n(\cdot)$ and $\hat\xi_n(\cdot)$, so it is unclear whether the finite-sample $n^{-1/2}$-local power comparisons hold in the fully constrained case. The consistency against distant alternatives also extends because GMS7 and the use of the constrained estimator imply that $\varphi_j\big(\acute\xi^{FC}_n(\theta_{n,*}),\acute\Omega_n(\theta_{n,*})\big) \ge 0$ for each $j\in J$, so we can similarly bound the fully constrained critical value from above by the plug-in asymptotic critical value.
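Computationally, 'fully constraining' only means swapping the equal weights $n^{-1}$ for the EL probabilities $\acute p_i$ when forming second moments. A minimal sketch (ours; `G` and `p` are assumed to come from the tilting step illustrated earlier):

```python
# Minimal sketch (ours) of the fully constrained second-moment objects:
# replace the equal weights 1/n with the EL probabilities p_i. `G` is the
# n x J matrix with rows g(W_i, theta); `p` is assumed to sum to one.
import numpy as np

def el_cov_corr(G, p):
    g_acute = p @ G                      # restricted (EL-tilted) moment estimate
    Gc = G - g_acute                     # center at the tilted mean
    Sigma = (Gc * p[:, None]).T @ Gc     # Sigma_acute = sum_i p_i (g_i - g)(g_i - g)'
    d = np.sqrt(np.diag(Sigma))
    Omega = Sigma / np.outer(d, d)       # Omega_acute = D^{-1/2} Sigma_acute D^{-1/2}
    return Sigma, Omega
```

With equal weights `p = np.full(n, 1/n)`, the function reproduces the unconstrained estimators, which makes the swap easy to unit-test.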
G Further Simulation Details

G.1 Outline of RMS
Andrews and Barwick (2012a) present a modification of the GMS procedure. From an implementation standpoint, the approach is essentially the same as GMS except that:
1. They replace $\kappa_n$ with a data-driven tuning parameter $\hat\kappa := \kappa(\hat\delta_n(\cdot))$, where $\hat\delta_n(\cdot)$ is the minimum off-diagonal element of $\hat\Omega_n(\cdot)$.
2. They add a size-correction factor $\hat\eta := \eta_1(\hat\delta_n(\cdot)) + \eta_2(J)$ to the GMS critical value that results from using the tuning parameter $\hat\kappa$.
The need to size-correct reflects the fact that $\hat\kappa$ is a finite constant plus $o_p(1)$ rather than a divergent sequence; this method of data-driven tuning parameters is referred to as $\kappa$-auto (Andrews and Barwick, 2012a). A schematic implementation is sketched below.
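Schematically, the recipe reduces to a lookup plus an additive shift. The functions $\kappa(\cdot)$, $\eta_1(\cdot)$, and $\eta_2(\cdot)$ are tabulated in Andrews and Barwick (2012a); the numerical values in this sketch are illustrative placeholders only, not their published tables.

```python
# Schematic of the RMS / kappa-auto recipe (ours). The constants returned by
# kappa_auto, eta1, and eta2 are PLACEHOLDERS, not the Andrews-Barwick tables.
import numpy as np

def delta_hat(Omega):
    """Minimum off-diagonal element of the estimated correlation matrix."""
    J = Omega.shape[0]
    return Omega[~np.eye(J, dtype=bool)].min()

def kappa_auto(delta):   # placeholder for the tabulated kappa(delta)
    return 2.2 if delta < -0.5 else 1.9

def eta1(delta):         # placeholder for the tabulated eta_1(delta)
    return 0.05 if delta < -0.5 else 0.02

def eta2(J):             # placeholder for the tabulated eta_2(J)
    return 0.01 * J

def rms_critical_value(gms_cv_at, Omega):
    """gms_cv_at: callable mapping a kappa value to the GMS critical value
    computed with that kappa; Omega: estimated J x J correlation matrix."""
    d = delta_hat(Omega)
    return gms_cv_at(kappa_auto(d)) + eta1(d) + eta2(Omega.shape[0])
```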
G.2 Outline of the Two-Step Procedure

We outline the two-step procedure of Romano et al. (2014) to aid understanding of the simulation results. The procedure needs some modification because we test $H_0: \mu\in\mathbb{R}^J_+$ rather than $H_0: \mu\in\mathbb{R}^J_-$. To this end, let $\mathcal F = \{F = N(\mu,\Sigma) : (\mu,\Sigma)\in\mathbb{R}^J\times\Psi\}$, $\mathcal F_0 = \{F\in\mathcal F : \mu\in\mathbb{R}^J_+\}$, and assume that the correlation matrix $\Sigma$ is known. The following steps describe a level-$\alpha$ test of $H_0: F\in\mathcal F_0$ versus $H_1: F\in\mathcal F\setminus\mathcal F_0$ using a random sample $\{W_i: i=1,\dots,n\}\overset{\text{iid}}{\sim} F$ (a Python sketch follows the list):

1. Compute the test statistic $T_n = S(\sqrt{n}\,\hat g_n,\hat\Sigma_n)$.
2. Generate bootstrap samples $\{W^*_{i,b}: i=1,\dots,n\}$, $b=1,\dots,B$, by sampling with replacement from the data $\{W_i: i=1,\dots,n\}$.
3. Compute a lower confidence rectangle $M_n(\beta) = \big\{\mu\in\mathbb{R}^J : \min_{1\le j\le J}\big[\hat\sigma^{-1}_{n,j}\sqrt{n}(\mu_j - \hat g_{n,j})\big] \ge K^{-1}_n(\beta)\big\}$, where $K^{-1}_n(\beta)$ is the $\beta$-quantile of $\big\{\min_{1\le j\le J}\big[(\hat\sigma^*_{n,j,b})^{-1}\sqrt{n}(\hat g_{n,j} - \hat g^*_{n,j,b})\big] : b=1,\dots,B\big\}$, $\hat g^*_{n,j,b} = n^{-1}\sum_{i=1}^n W^*_{i,j,b}$, and $(\hat\sigma^*_{n,j,b})^2 = n^{-1}\sum_{i=1}^n (W^*_{i,j,b} - \hat g^*_{n,j,b})^2$, for $b=1,\dots,B$ and $j=1,\dots,J$. This determines which components of $\mu$ are 'positive'.
4. Compute bootstrap test statistics $\{T^*_{n,b}: b=1,\dots,B\}$, where $T^*_{n,b} = S\big((\hat D^*_{n,b})^{-1/2}\sqrt{n}(\hat g^*_{n,b} - \hat g_n) + (\hat D^*_{n,b})^{-1/2}\sqrt{n}\,\lambda^*,\;\hat\Sigma^*_{n,b}\big)$ and $\lambda^*\in\mathbb{R}^J_+$ with $\lambda^*_j = \max\{0,\,n^{-1/2}\hat\sigma_{n,j}K^{-1}_n(\beta) + \hat g_{n,j}\}$ for $j=1,\dots,J$.
5. Compute the critical value $c^{RSW}_n(1-\alpha+\beta)$, defined as the $1-\alpha+\beta$ quantile of $\{T^*_{n,b}: b=1,\dots,B\}$.
6. Reject $H_0$ at significance level $\alpha$ if $T_n > c^{RSW}_n(1-\alpha+\beta)$ and $M_n(\beta)\nsubseteq\mathbb{R}^J_+$.

Following the choice of Romano et al. (2014), we set $\beta = \alpha/10$ for all simulations.
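A compact sketch of steps 1–6 (ours) is given below. It takes $S$ to be the modified method of moments form $S(m,\Sigma) = \sum_j \min\{0, m_j/\sigma_j\}^2$; that particular choice of $S$, the data-generating design in the usage example, and the default settings are our assumptions, and any other admissible $S$ slots in unchanged.

```python
# Sketch (ours) of the two-step test described above, for H0: mu in R^J_+.
import numpy as np

def S(m, sig):
    # modified method of moments: penalize negative studentized moments
    return np.sum(np.minimum(m / sig, 0.0) ** 2)

def rsw_two_step(W, alpha=0.05, B=999, seed=0):
    rng = np.random.default_rng(seed)
    n, J = W.shape
    beta = alpha / 10.0                       # RSW's recommended choice
    g, sig = W.mean(0), W.std(0)

    T = S(np.sqrt(n) * g, sig)                # Step 1: test statistic

    idx = rng.integers(0, n, size=(B, n))     # Step 2: bootstrap draws
    g_star = W[idx].mean(1)                   # (B, J) bootstrap means
    sig_star = W[idx].std(1)                  # (B, J) bootstrap std devs

    # Step 3: lower confidence rectangle for mu
    root = (np.sqrt(n) * (g - g_star) / sig_star).min(1)
    K_beta = np.quantile(root, beta)
    lower = g + sig * K_beta / np.sqrt(n)     # componentwise lower bounds

    # Step 4: recentred bootstrap statistics, lambda*_j = max{0, lower bound}
    lam = np.maximum(0.0, lower)
    T_star = np.array([S(np.sqrt(n) * (g_star[b] - g + lam), sig_star[b])
                       for b in range(B)])

    # Steps 5-6: critical value and decision
    c = np.quantile(T_star, 1.0 - alpha + beta)
    return (T > c) and np.any(lower < 0.0)    # reject H0?

# Usage example: J = 4 moments, two of them mildly violating mu >= 0
rng = np.random.default_rng(1)
W = rng.normal([0.2, 0.2, -0.15, -0.15], 1.0, size=(400, 4))
print("reject H0:", rsw_two_step(W))
```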
G.3 MNRP Corrections
We outline the MNRP corrections used for the finite-sample $n^{-1/2}$-local power results, an essential ingredient for a fair comparison of the procedures under the alternative. For a given pair $(J,\Omega)$, let $p^{RSW}_{n,R} \equiv p^{RSW}_{n,R}(J,\Omega)$ denote the maximum null rejection probability for the two-step procedure of Romano et al. (2014) based on $R$ Monte Carlo simulations and sample size $n$. For $t\in\{GMS, CMS, RMS\}$, the random variable $\delta^t_{n,R} \equiv \delta^t_{n,R}(J,\Omega)$ is the $(1 - p^{RSW}_{n,R})$-empirical quantile based on the simulated process $\{T^{*,t}_{n,r} - c^{*,t}_{n,r} : r=1,\dots,R\}$, where $(T^{*,t}_{n,r}, c^{*,t}_{n,r})$ correspond to the mean vector $\mu^{*,t}$ that maximizes the null rejection probability for test $t$. We add $\delta^t_{n,R}$ to the corresponding critical value in the power results to ensure that all procedures have the same MNRP. Indeed, by construction,
\[
\hat P^*_{R,t}\big(T^{*,t}_{n,r} - c^{*,t}_{n,r} \le \delta^t_{n,R}\big) = 1 - p^{RSW}_{n,R} \quad \forall\, t\in\{GMS, CMS, RMS\},
\]
where $\hat P^*_{R,t}(\cdot)$ denotes the simulation distribution of $\{T^{*,t}_{n,r} - c^{*,t}_{n,r} : r=1,\dots,R\}$.
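Operationally, the MNRP correction is a one-line quantile computation once the null simulations are in hand. A sketch (ours; `excess` and `p_rsw` are assumed to come from the Monte Carlo step described above):

```python
# Sketch (ours) of the MNRP correction: shift test t's critical value so that
# all tests share the RSW procedure's maximum null rejection probability.
import numpy as np

def mnrp_shift(excess, p_rsw):
    """excess: array of T*_{n,r} - c*_{n,r}, r = 1..R, simulated at the null
    mean mu* that maximizes test t's rejection rate; p_rsw: the benchmark MNRP."""
    return np.quantile(excess, 1.0 - p_rsw)   # delta^t_{n,R}

# Usage: in the power simulations, reject when T - c > delta rather than
# T - c > 0, so each corrected test rejects with probability p_rsw under its
# least-favourable null configuration, by construction.
```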