A Simple, Short, but Never-Empty Confidence Interval for Partially Identified Parameters
Jörg Stoye* — January 1, 2021
Abstract
This paper revisits the simple, but empirically salient, problem of inference on a real-valued parameter that is partially identified through upper and lower bounds with asymptotically normal estimators. A simple confidence interval is proposed and is shown to have the following properties:

• It is never empty or awkwardly short, including when the sample analog of the identified set is empty.

• It is valid for a well-defined pseudotrue parameter whether or not the model is well-specified.

• It involves no tuning parameters and minimal computation.

Computing the interval requires concentrating out one scalar nuisance parameter. In most cases, the practical result will be simple: To achieve 95% coverage, report the union of a simple 90% (!) confidence interval for the identified set and a standard 95% confidence interval for the pseudotrue parameter.

For uncorrelated estimators (notably, if bounds are estimated from distinct subsamples) and conventional coverage levels, validity of this simple procedure can be shown analytically. The case obtains in the motivating empirical application (de Quidt, Haushofer, and Roth, 2018), in which improvement over existing inference methods is demonstrated. More generally, simulations suggest that the novel confidence interval has excellent length and size control. This is partly because, in anticipation of never being empty, the interval can be made shorter than conventional ones in relevant regions of sample space.

* Department of Economics, Cornell University, [email protected]. Thanks to Johannes Haushofer, Jonathan de Quidt, and Chris Roth for an inquiry that motivated this work and for sharing and explaining their data. Financial support through NSF Grant SES-1824375 is gratefully acknowledged.

Introduction
Inference under partial identification is by now the subject of a broad literature. Only recently did attention turn to the following concern: If a partially identified model is misspecified, this may manifest in either an empty or, arguably worse, a misleadingly small confidence region. That is, misspecified inference can be spuriously precise. The reason is that most confidence regions used in partial identification invert tests of H₀ : θ ∈ Θ_I; here, θ is a parameter and Θ_I is the identified set. If H₀ is rejected at every θ, the confidence region is empty. If H₀ is barely not rejected at a few parameter values, the confidence region may be very small. This issue is empirically relevant. For example, an empty sample analog of Θ_I occurs in de Quidt, Haushofer, and Roth (2018), whose inquiry sparked the present research and whose data are reanalyzed below.

The literature on this issue is still young. Ponomareva and Tamer (2011) provide an early diagnosis. Kaido and White (2013) propose a notion of pseudotrue identified set and an estimator thereof. Molinari (2020) explains the issue in detail and highlights it as an important area for further investigation. The most thorough treatment is by Andrews and Kwon (2019), who emphasize the issue's importance and provide a general inference method that avoids spurious precision and ensures coverage of a pseudotrue identified set.

The present paper is in the spirit of Andrews and Kwon (2019). I focus on the simple but empirically salient case of a scalar parameter with upper and lower bounds whose estimators are jointly asymptotically normal. That is, I revisit the setting of Imbens and Manski (2004, without their superefficiency assumption) and Stoye (2009). For this setting, I propose a confidence interval with the following features:

• It is never empty nor very short (a lower bound on its length is reported later).
• It exhibits asymptotically guaranteed coverage uniformly over the identified set and additionally for a well-defined pseudotrue parameter.

• It tends to be shorter than more conventional intervals in benign cases, including in the empirical application.

• It is free of tuning parameters and trivial to compute.

For target coverage of 95% and for the special case of uncorrelated estimators, e.g. in this paper's empirical application, the confidence interval can be verbally defined as follows:

• Add ±1.64 standard errors to estimators of upper and lower bounds.

• Also compute an average of the estimators that is weighted by their standard errors, as well as the corresponding standard error. Add ±1.96 of those standard errors to the average.

• Report the union of the intervals.

[Footnote: See Manski (2003) for an early monograph, Tamer (2010) for a historical introduction, and Canay and Shaikh (2017) and Molinari (2020) for recent surveys that extensively cover inference.]

While this paper generally proposes a somewhat less "cute" procedure with broader applicability, this specialized finding is probably the most striking part. Neither of the above two intervals is valid by itself; it is just that their coverage events are correlated in exactly the right way.

Section 2 develops the proposal more formally and gives an intuition for why it works, though proofs are relegated to the Appendix. Section 3 provides a numerical illustration and Section 4 an application to the data that motivated this research. Section 5 concludes.
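The recipe above is easy to mechanize. The sketch below is my own illustration, not the paper's code: the function name and inputs are hypothetical, and it covers only the verbally defined uncorrelated case (ρ = 0, target coverage 95%).

```python
from statistics import NormalDist

def union_ci(th_lo, th_hi, se_lo, se_hi, alpha=0.05):
    """Union of the two intervals from the verbal recipe (rho = 0 case).

    `union_ci` and its argument names are illustrative, not from the paper.
    """
    z1 = NormalDist().inv_cdf(1 - alpha)      # one-sided value, ~1.645
    z2 = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided value, ~1.960
    # Interval aimed at the identified set: bounds +/- z1 standard errors.
    set_lo, set_hi = th_lo - z1 * se_lo, th_hi + z1 * se_hi
    # Average of the bound estimators weighted by standard errors,
    # and its standard error under rho = 0.
    th_star = (se_hi * th_lo + se_lo * th_hi) / (se_lo + se_hi)
    se_star = (2 ** 0.5) * se_lo * se_hi / (se_lo + se_hi)
    star_lo, star_hi = th_star - z2 * se_star, th_star + z2 * se_star
    if set_lo > set_hi:
        # First interval empty (strongly inverted bound estimates):
        # the union is just the interval around the weighted average.
        return star_lo, star_hi
    # Otherwise th_star lies inside the first interval, so the union is
    # itself an interval.
    return min(set_lo, star_lo), max(set_hi, star_hi)
```

For well-separated bound estimates, the result just widens the estimated bounds by roughly 1.64 standard errors; for strongly inverted estimates, only the interval around the weighted average survives.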
While the interpretation of what follows is inference on a scalar parameter θ, the only assumption is that one has well-behaved estimators of two other parameter values.

Assumption 1: There exist estimators (θ̂_L, θ̂_U) with probability limits (θ_L, θ_U) ∈ ℝ² such that
\[
\sqrt{n}\begin{pmatrix} \hat\theta_L - \theta_L \\ \hat\theta_U - \theta_U \end{pmatrix} \stackrel{d}{\to} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_L^2 & \rho\sigma_L\sigma_U \\ \rho\sigma_L\sigma_U & \sigma_U^2 \end{pmatrix} \right),
\]
where σ_L, σ_U > 0, and consistent estimators (σ̂_L, σ̂_U, ρ̂) →ᵖ (σ_L, σ_U, ρ) are available.

The motivation is that the researcher estimates an identified set Θ_I ≡ [θ_L, θ_U] containing a true parameter value θ. Assumption 1 is unrestrictive if, as in the empirical application, (θ̂_L, θ̂_U) are smooth functions of sample moments. It is unlikely to hold for intersection bounds (Andrews and Shi, 2013; Chernozhukov, Lee, and Rosen, 2013), and it will hold for bounds that result from projecting a higher-dimensional identified set (Bugni, Canay, and Shi, 2017; Kaido, Molinari, and Stoye, 2019), including components of partially identified vectors, only in benign cases.

The obvious estimator of Θ_I is [θ̂_L, θ̂_U], but defining a confidence interval is delicate. Following Imbens and Manski (2004), the literature mostly focuses on confidence intervals that (asymptotically) contain the true parameter value with prespecified probability (1 − α), irrespective of its location in Θ_I, i.e. confidence intervals that control inf_{θ∈Θ_I} Pr(θ ∈ CI). Finding such intervals is subtle because the nature of the testing problem qualitatively depends on the length ∆ ≡ θ_U − θ_L of Θ_I. Heuristically, this problem is one-sided if ∆ is "large" and two-sided if it is "short," i.e. near point identification. Ascertaining which case obtains is subject to difficulties reminiscent of post-model-selection inference (Leeb and Pötscher, 2005) and parameter-on-the-boundary issues (Andrews, 2000).

The literature on how to circumvent this issue is by now considerable.
Most approaches invert a test, that is, they report all values of θ for which H₀ : θ ∈ Θ_I was not rejected. [Footnote: Full disclaimer: I discovered it by simulation and initially assumed a bug.] Any such confidence set can be empty; in this paper's settings, that will happen if θ̂_L is much larger than θ̂_U, where the meaning of "much" varies across papers. This feature can be advertised as an embedded specification test but may not be wanted. Arguably even more problematic is that, if the model is misspecified, a test inversion confidence interval can be short, suggesting precision when the true issue is misspecification. A specification test will not resolve this: In this paper's setting, the best-practice such test (Bugni, Canay, and Shi, 2015) just reports whether the test inversion interval is empty.

Addressing this concern requires a notion of coverage for the case of misspecification, i.e. if θ_L > θ_U. Following Andrews and Kwon (2019), define the pseudotrue identified set
\[
\Theta_I^* \equiv \Theta_I \cup \{\theta^*\}, \qquad \theta^* \equiv \frac{\sigma_U\theta_L + \sigma_L\theta_U}{\sigma_L + \sigma_U}.
\]
This definition is natural because Θ*_I = argmin_θ max{(θ − θ_U)/σ_U, (θ_L − θ)/σ_L, 0}; thus, Θ*_I is the estimand implied by the frequent choice of max{(θ − θ̂_U)/σ̂_U, (θ̂_L − θ)/σ̂_L, 0} as test statistic. Note also that Θ*_I is never empty and that Θ*_I = Θ_I whenever Θ_I ≠ ∅.

The revised notion of validity of a confidence interval is as follows:

Definition 1: A confidence interval CI has asymptotic coverage of (1 − α) if
\[
\lim_{n\to\infty} \inf_{\theta\in\Theta_I^*} \Pr(\theta \in CI) \ge 1 - \alpha.
\]

Forcing coverage of θ* will ensure that the interval is nonempty and also that it is statistically interpretable as targeting Θ*_I. An obvious caveat is that, as with the related literature going back to White (1982), the coverage target's substantive relevance may not be clear if the model is in fact misspecified.
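As a quick numerical sanity check of the argmin characterization of Θ*_I (my own sketch; the parameter values are made up), a grid search recovers θ* in a misspecified example:

```python
# Hypothetical misspecified example: inverted population bounds.
theta_L, theta_U = 0.40, 0.30      # theta_L > theta_U, so Theta_I is empty
sig_L, sig_U = 2.0, 1.0

def violation(theta):
    """max{(theta - theta_U)/sigma_U, (theta_L - theta)/sigma_L, 0}."""
    return max((theta - theta_U) / sig_U, (theta_L - theta) / sig_L, 0.0)

# Closed form from the text: weighted average with weights sigma_U, sigma_L.
theta_star = (sig_U * theta_L + sig_L * theta_U) / (sig_L + sig_U)  # = 1/3

# Brute-force minimizer of the maximal studentized violation over a fine grid.
grid = [0.2 + i * 1e-5 for i in range(20001)]  # covers [0.2, 0.4]
theta_grid = min(grid, key=violation)
```

At θ*, the two studentized violations are exactly balanced, which is why the weighted average minimizes their maximum.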
As Andrews and Kwon (2019) elaborate, this has to be traded off against concerns with spurious precision.

While the coverage notion exactly mimics Andrews and Kwon (2019), the confidence interval will be quite different. It goes "back to basics" in that, like early entries in the literature (Imbens and Manski, 2004; Stoye, 2009), it essentially just adds a certain number of standard errors to estimated bounds. An advantage is computational and conceptual simplicity; with test inversion intervals, critical values generally depend on θ even in this simple setting and therefore must be computed many times. However, the main motivation is that the new interval performs well. Its heuristic definition is as follows:

• Compute an interval
\[
CI_{\Theta_I} \equiv \left[ \hat\theta_L - \frac{\hat\sigma_L}{\sqrt{n}}\hat{c},\ \hat\theta_U + \frac{\hat\sigma_U}{\sqrt{n}}\hat{c} \right],
\]
where ĉ depends on α and ρ̂; see Table 1.

• Also compute the estimator
\[
\hat\theta^* \equiv \frac{\hat\sigma_U\hat\theta_L + \hat\sigma_L\hat\theta_U}{\hat\sigma_L + \hat\sigma_U}
\]
and confidence interval
\[
CI_{\theta^*} \equiv \left[ \hat\theta^* - \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \hat\theta^* + \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right], \qquad \hat\sigma^* \equiv \frac{\sqrt{2(1+\hat\rho)}\,\hat\sigma_L\hat\sigma_U}{\hat\sigma_L + \hat\sigma_U}.
\]

• Report the union CI_{Θ_I} ∪ CI_{θ*}.

• We will not pre-estimate ∆ but set it to its globally least favorable value. We will, however, anticipate the conservative bias ensuing from taking unions of intervals. This bias is easy to estimate; in particular, there is no parameter-on-the-boundary issue.

• One might think that concentrating out ∆ will be very conservative. It turns out that this is not so. In most cases, ĉ = Φ⁻¹(1 − α), i.e. we can just use the one-sided critical value, at least to extremely high simulation accuracy. If ρ = 0 and for conventional coverage levels, this can be shown analytically.

[Footnote: That was the sales pitch in Stoye (2009), but not all referees were sold on it. The embedded specification test is analyzed in more detail by Andrews and Soares (2010).]

[Footnote: This equivalence does not generalize, but Andrews and Kwon (2019) show that in "slightly misspecified" parameter regimes, spuriously precise inference generally coexists with low power of specification tests.]

The new confidence interval is obviously never empty; indeed, its length cannot drop below 2σ̂* Φ⁻¹(1 − α/2)/√n.

Definition 2: The misspecification-adaptive confidence interval CI_MA is
\[
CI_{MA} \equiv \left[ \hat\theta_L - \frac{\hat\sigma_L}{\sqrt{n}}\hat{c},\ \hat\theta_U + \frac{\hat\sigma_U}{\sqrt{n}}\hat{c} \right] \cup \left[ \hat\theta^* - \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \hat\theta^* + \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right], \quad (2.1)
\]
where ĉ is the unique value of c solving
\[
\inf_{\Delta\ge 0} \Pr\left( Z_1 - \Delta - c \le 0 \le Z_2 + c \ \text{ or } \ |Z_1 + Z_2 - \Delta| \le \sqrt{2(1+\hat\rho)}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right) = 1 - \alpha, \qquad \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \hat\rho \\ \hat\rho & 1 \end{pmatrix} \right). \quad (2.2)
\]
If ρ = 0 is known and √2 Φ⁻¹(1 − α) ≥ Φ⁻¹(1 − α/2), just set ĉ = Φ⁻¹(1 − α).

Remark 1: The condition that √2 Φ⁻¹(1 − α) ≥ Φ⁻¹(1 − α/2) holds for α < .14, i.e. for coverage levels of 86% or higher.
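Condition (2.2) is straightforward to probe by simulation. The sketch below is my own illustration (with far fewer draws than the B = 4,000,000 behind Table 1): it evaluates the coverage probability in (2.2) at ρ = 0 and α = .05 using the one-sided critical value, on a grid of ∆.

```python
import random
from statistics import NormalDist

random.seed(0)
alpha = 0.05
rho = 0.0                                    # uncorrelated case
c = NormalDist().inv_cdf(1 - alpha)          # one-sided critical value, ~1.645
k = (2 * (1 + rho)) ** 0.5 * NormalDist().inv_cdf(1 - alpha / 2)
B = 200_000
draws = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(B)]

def coverage(delta):
    """Simulated probability of the event in (2.2) at the given Delta."""
    hits = sum(
        1 for z1, z2 in draws
        if (z1 - delta - c <= 0 <= z2 + c) or abs(z1 + z2 - delta) <= k
    )
    return hits / B

covs = {d: coverage(d) for d in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)}
```

Consistent with the analytical result for ρ = 0, the simulated coverage stays above .95 for every finite ∆ on the grid and approaches .95 from above for large ∆, so the one-sided critical value solves (2.2).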
Theorem 1: The confidence interval CI_MA achieves asymptotic coverage of (1 − α).

Proof. See Appendix A.

[Table 1: values of ĉ solving (2.2) over ∆ ∈ [0, ∞) for different coverages and correlations. For ρ ≤ .8, further simulations corroborate the one-sided critical value as exact solution.]

Expression (2.2) is numerically evaluated for different values of ρ and target coverages in Table 1. In particular, simulation with very high accuracy suggests that ĉ is just the one-sided critical value for ρ up to at least .8; it then gradually increases toward the two-sided critical value, which is easily seen to solve (2.2) for ρ̂ = 1.

Remark 2: Except for large positive ρ, infimal coverage of (1 − α) is attained in the limit as ∆ → ∞. For finite ∆, CI_MA is therefore nominally conservative. In principle, one could try to capture this by concentrating out ∆ over a more limited range, e.g. over a preliminary confidence interval with Bonferroni adjustment of subsequent inference. This could in principle lead to ĉ < Φ⁻¹(1 − α). I do not advocate it because, numerically, the infimum in (2.2) is well approximated for surprisingly small values of ∆. Therefore, the "inferential cost" of a pre-test, whether through adjustment of second-stage test size or through reliance on a tuning parameter, would typically not be recovered.

Remark 3: The literature on partial identification often focuses on uniform inference. This is because naïve inference methods may fail in cases of interest, e.g. as one approaches point identification. To prevent this, the literature has an informal requirement that inference be uniform over delicate nuisance parameters like (in this paper) ∆; see Molinari (2020, Section 4.3.2) for further discussion. CI_MA is obviously uniform in this sense because ∆ (and also the position of θ in Θ_I) is set to its globally least favorable value. To formally claim that inference is uniform over a large class of data generating processes, one would furthermore have to strengthen Assumption 1 so that consistency and asymptotic normality of bound estimators hold in a uniform sense. The exact nature of such strengthenings, and low-level assumptions that achieve them, are well understood (Andrews and Soares, 2010; Romano and Shaikh, 2008) and are omitted for brevity.

Remark 4: The notable difference in setting to Imbens and Manski (2004) is the absence of an implicit superefficiency condition on ∆̂ near true value 0.
That condition turns out to obtain if (and, in practice, only if) θ̂_U ≥ θ̂_L by construction (Stoye, 2009, Lemma 3). This case is empirically relevant: It applies to most missing-data bounds and also to bounds that rely on different truncations of observed probability measures (Horowitz and Manski, 1995; Lee, 2009), unless further refinements turn these into intersection bounds. If it obtains and other regularity conditions hold, the confidence interval in Imbens and Manski (2004) is valid, is expected to be rather efficient for small ∆ (because it uses superconsistency of ∆̂), and will obviously never be empty. Not coincidentally, this case is also characterized by the possibility of ρ ≈ 1; indeed, that is how superconsistency of ∆̂ arises. Whether this case applies can be ascertained before seeing any data, and I strongly suggest that users do so.

[Footnote: The table was generated by gridding and using B = 4,000,000 simulations. This is feasible on a run-of-the-mill netbook. The relevant simulation error is the coverage error at the suggested ĉ. Given B, it will be much smaller than what is routinely accepted in simulation-based, e.g. bootstrap, inference. For ρ ≤ .8, further simulations establish to high accuracy that coverage is first increasing and then decreasing in ∆ and minimized as ∆ → ∞, the same feature that is analytically proved for ρ = 0 and which justifies ĉ = Φ⁻¹(1 − α).]
Remark 5: I follow the bulk of the literature in focusing on uniform coverage of θ ∈ Θ*_I. The procedure is easily adapted to coverage of the entire set Θ*_I. Note that, by a Bonferroni argument, a critical value of ĉ = Φ⁻¹(1 − α/2) would always do, and also that (as can be seen from considering large ∆) only a large negative value of ρ̂ would cause ĉ to be appreciably lower.

The proof of Theorem 1 contains three steps. First, it is relatively routine to show that CI_MA would be valid if, in line with the heuristic definition, expression (2.2) explicitly took the infimum also over values of (σ_L, σ_U) as well as θ ∈ Θ_I. In a second step, we can concentrate out all of these. In particular, one can restrict attention to one of θ = θ_L or θ = θ_U; expression (2.2) arbitrarily chooses the latter. This finding is not obvious: For given ∆, coverage is not equally minimized at the interval's endpoints; it is only that the corresponding infima over ∆ ∈ [0, ∞) are the same. As a final flourish in this step, it turns out that asymptotic coverage at θ_U depends on (∆, σ_L, σ_U) only through ∆/σ_L. For the purpose of evaluating worst-case coverage over ∆ ≥ 0, we can therefore set both standard deviations to 1.

The final, and by far most delicate, step is that if ρ = 0, coverage is provably minimized as ∆ → ∞, justifying use of the one-sided critical value ĉ = Φ⁻¹(1 − α). To appreciate this claim, consider again the two components of CI_MA in (2.1). For α = .05, the left-hand interval's coverage of either θ_L or θ_U may be as low as .90 and approaches .95 from below as ∆ → ∞. The right-hand interval's coverage of these values is .95 at ∆ = 0 (where both coincide with θ*) but rapidly decreases to 0 as ∆ increases. That these effects aggregate to coverage uniformly above .95 is far from obvious and heavily relies on specific features of the bivariate Normal distribution.

Numerically, the final step extends to moderate ρ (see again Table 1), and the proof uses conservative bounds. Some analytic result of higher generality might, therefore, be available. However, for large positive ρ, coverage is minimized at small positive ∆. Therefore, if ρ is unknown, estimating it cannot be avoided. In particular, in view of Table 1, a pre-test for "small enough" ρ would be counterproductive: Since ĉ as a function of ρ is mostly completely flat, one would be unlikely to recover the inferential cost (in the sense of Remark 2) of the pre-test.

[Figure 1: panels (a) Coverage when ρ = 0; (b) Expected length when ρ = 0; (c) Coverage when ρ = 0.7; (d) Expected length when ρ = 0.7. Caption: Coverage (left panels) and expected length (right panels; length of true interval is subtracted) of CI_TI (blue), CI_TI ∪ CI_θ* (red), and the new proposal CI_MA (green). Horizontal axis is ∆ = θ_U − θ_L; negative values indicate increasing misspecification. Nominal coverage is 95% and is indicated by a black horizontal line.]

Figures 1 and 2 compare CI_MA with a test inversion interval CI_TI that arguably reflects the state of the established literature. It inverts a test of H₀ : θ ≤ θ_U, θ ≥ θ_L by taking the maximum (studentized) violation as test statistic, i.e. the same test statistic that generally yields Θ*_I as pseudotrue identified set. The critical value is based on a pre-test (specifically, a one-sided Wald test at a small fraction of α) that potentially discards one of the inequality constraints as nonbinding. Depending on the pre-test's result, the critical value is then either a simple one-sided critical value or computed by a simulation that takes ρ into account. In either case, the second-stage test is of correspondingly reduced size, so that the pre-test is accounted for by Bonferroni correction. The resulting test is inverted, and the critical value is recomputed as θ changes, making the interval considerably shorter than early entries in the literature (Imbens and Manski, 2004; Stoye, 2009). Compared to CI_MA, test inversion adds orders of magnitude of computational cost, though at a very low absolute level. I abstract from asymptotic approximation by drawing estimators straight from limiting distributions and taking (σ_L, σ_U, ρ) to be known. Interval length ∆ is denominated in estimator standard errors because √n σ_L = √n σ_U = 1 throughout.

[Footnote: The interval closely follows Romano, Shaikh, and Wolf (2014); other established methods (Andrews and Soares, 2010; Andrews and Barwick, 2012; Bugni, 2010; Canay, 2010) would inform similar constructions. As of writing of this manuscript, at least two rather distinct (from the preceding and from each other) proposals are in the pipeline (Andrews, Roth, and Pakes, 2019; Cox and Shi, 2020). Both invert a test and can be empty; Andrews, Roth, and Pakes (2019) also has a tuning parameter. They are compared in Cox and Shi (2020). A comparison of all these approaches in simple examples might be worthwhile.]

The comparison is extended into the misspecified range by letting ∆ take on negative values. The test inversion interval obviously undercovers in that range. To clarify comparisons, I also compute CI_TI ∪ CI_θ*. Recall that CI_MA can be loosely intuited as refining this construction by adjusting the critical value to account for union-taking. Nominal coverage is 95% throughout.

[Figure 2: panels (a) Coverage when ρ = −0.7; (b) Expected length when ρ = −0.7; (c) Coverage when ρ = 0.95; (d) Expected length when ρ = 0.95. Caption: Continuation of Figure 1. The last case (ρ = .95) illustrates a setting where ∆ → ∞ is not least favorable and where ĉ > Φ⁻¹(.95).]

Figure 1 illustrates the results for ρ = 0 (top panels) and ρ = 0.7 (bottom panels); Figure 2 does the same for ρ = −0.7 and ρ = 0.95. The last case is arguably contrived but serves to illustrate that ∆ → ∞ is not always least favorable. By the same token, this is the only case in which ĉ > Φ⁻¹(.95). Otherwise, the figures reveal a clear advantage of CI_MA: It is shorter, and this is also reflected in more precise size control and thereby more power of the implied test. The advantage is especially apparent for small positive ∆. What happens here is that the correction provided by CI_θ* allows CI_MA to transition to just adding 1.64 standard errors already at small, and even at mildly negative, estimated interval length ∆̂; that is, CI_MA just adds 1.64 standard errors over much of the relevant range. The slightly greater length of CI_TI relative to CI_MA for large ∆ reflects that CI_TI accounts for a pre-test.

One might wonder how Andrews and Kwon (2019) would perform in the example. While the exact answer depends on the choice of multiple tuning parameters, some qualitative considerations are as follows. Their interval starts from CI_TI and expands it in order to avoid spurious precision. As a result, it will be bounded from below in both length and coverage by the blue curves in Figures 1 and 2. In an initial refinement, Andrews and Kwon (2019) form the union between CI_TI and a never-empty confidence interval. Their preferred confidence interval does this only if an additional pre-test fails to reject misspecification. While this mitigates the effect of expanding CI_TI, the final confidence interval still contains CI_TI and considerably exceeds it for small positive ∆ (see their Section 8.1, whose setting resembles the present one). This will obviously be reflected in its statistical performance. Conversely, an intriguing feature of CI_MA is that it "spends" the "coverage capital" gained from ensuring nonemptiness by being shorter than CI_TI for interesting values of ∆. In fairness to Andrews and Kwon (2019), it appears far from obvious how to implement such a feature in their much more general setting.

The advantage of CI_MA fades out, and even reverses, in the special case where ρ → 1 but not ∆ → 0. In that limit, ĉ will converge to the two-sided critical value, whereas a pre-test will eventually recommend a one-sided test. While such scenarios can obviously be simulated, they arguably are contrived. The possibility of high ρ and correspondingly precise estimation of ∆ is empirically relevant, but it corresponds to the superefficiency case discussed in Remark 4 and therefore to small ∆ as well as to a case distinction that can be decided in pre-data analysis. Also, one could in principle fix this issue by layering a pre-test on top of CI_MA; however, as general advice in this matter, I stand by Remark 2.

[Footnote: Andrews and Kwon (2019) implement CI_TI through Andrews and Soares (2010) but point out that Romano, Shaikh, and Wolf (2014) could be used instead. The difference will be small in the present setting.]
Name                      [θ̂_L, θ̂_U]      CI_MA            CI_TI            rel. length
Ambiguity Aversion        [0.499, 0.557]   [0.459, 0.597]   [0.458, 0.598]   0.97
Effort: 1 cent bonus      [0.469, 0.484]   [0.448, 0.503]   [0.448, 0.504]   0.97
Effort: 0 cent bonus *    [0.343, 0.331]   [0.318, 0.356]   [0.315, 0.358]   0.91
Lying **                  [0.530, 0.537]   [0.512, 0.556]   [0.508, 0.560]   0.83
Time **                   [0.766, 0.770]   [0.722, 0.814]   [0.712, 0.824]   0.82
Trust Game 1              [0.430, 0.455]   [0.388, 0.493]   [0.387, 0.495]   0.96
Trust Game 2              [0.348, 0.398]   [0.328, 0.426]   [0.327, 0.427]   0.97
Ultimatum Game 1          [0.443, 0.470]   [0.422, 0.493]   [0.422, 0.494]   0.97
Ultimatum Game 2          [0.362, 0.413]   [0.342, 0.436]   [0.341, 0.436]   0.97

Table 2: Confidence intervals applied to data in de Quidt, Haushofer, and Roth (2018; compare select columns of their Table 1). Relative length refers to relative (of CI_MA over CI_TI) excess length beyond max{∆̂, 0}. Of special interest: Case (*) has inverted bound estimators, displayed with abuse of interval notation. Cases (**) are short (near point identified) estimated intervals.

De Quidt, Haushofer, and Roth (2018) estimate upper and lower bounds on behavioral parameters from different treatments in a between-subjects design, meaning that estimators are uncorrelated. At the same time, bounds can and did in fact invert, triggering an inquiry by the authors that led to the present paper.

Table 2 displays estimated bounds, CI_MA, and CI_TI for selected instances of the "weak bounds" data. This refers to a baseline setting before inducing experimenter demand. For more details, I refer to de Quidt, Haushofer, and Roth (2018), particularly their Figure 1 and corresponding explanations. The last column divides the length of CI_MA by the length of CI_TI, subtracting max{∆̂, 0} from both. Both intervals make full use of ρ = 0 being known. The comparison is between CI_MA and CI_TI; obviously, CI_TI ∪ CI_θ* would be larger than CI_TI.

The data include one case (*) where bound estimators are inverted and where, ex post, CI_MA = CI_θ*. [Footnote: This case would not have led any specification test to reject the model, even before taking multiple hypothesis testing into account.] There are also two cases (**) of short estimated intervals (relative to standard errors), i.e. of near point identification. Because CI_MA cannot be empty, one might have conjectured it to be the longer one in these cases. In fact, it is noticeably shorter in all of them: the effect of "spending coverage capital" from the nonemptiness correction dominates. In all other cases, both intervals effectively add 1.64 standard errors. [Footnote: In those cases, the small differences favoring CI_MA reflect Bonferroni adjustment for pre-tests, i.e. the specifics of Romano, Shaikh, and Wolf (2014). In cases where [θ_L, θ_U] is obviously "long," researchers will in practice be tempted to appeal to an asymptotic pre-test and just use 1.64 standard errors.]
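To illustrate the mechanics of case (*): the sketch below uses the inverted point estimates from Table 2 but purely hypothetical standard errors (the actual ones are not reproduced here). With these inputs, the set-directed piece of CI_MA is nonempty yet strictly contained in CI_θ*, so the union collapses to CI_θ*.

```python
from statistics import NormalDist

z1 = NormalDist().inv_cdf(0.95)   # one-sided critical value (rho = 0 known)
z2 = NormalDist().inv_cdf(0.975)  # two-sided critical value

# Point estimates from case (*) of Table 2; the standard errors are
# hypothetical round numbers, NOT the ones underlying Table 2.
th_lo, th_hi = 0.343, 0.331       # inverted bound estimates
se = 0.014                        # same hypothetical s.e. for both bounds

set_iv = (th_lo - z1 * se, th_hi + z1 * se)   # interval aimed at the set
th_star = (th_lo + th_hi) / 2                 # equal s.e.'s -> midpoint
se_star = se / (2 ** 0.5)                     # sqrt(2)*se*se/(2*se), rho = 0
star_iv = (th_star - z2 * se_star, th_star + z2 * se_star)
union = (min(set_iv[0], star_iv[0]), max(set_iv[1], star_iv[1]))
```

Mild inversion does not empty the set-directed interval, but the never-empty CI_θ* still determines both endpoints of the union, mirroring CI_MA = CI_θ* in case (*).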
Conclusion
For a simple, but empirically relevant, partial identification problem, I propose a confidence interval that has competitive size control and length, including in the misspecified case, while being extremely easy to compute. The most striking finding is that in many cases, a seemingly crude fix to a nominal 90% confidence interval ensures 95% coverage at little cost in terms of interval length and with practically zero computation. Simulations are encouraging, and the confidence interval improves on current best practice in application to recent lab experiments.

The approach is complementary to Andrews and Kwon (2019), from whom I take the broad motivation as well as the novel coverage requirement. Of course, their approach applies far beyond the present paper's simple setting. On the other hand, it has several tuning parameters and expands a conventional confidence interval, whereas the present proposal is tuning parameter free and compensates for expanding the conventional interval by reducing its standalone nominal coverage. A question of obvious interest, but also beyond my current reach, is whether this last feature can be usefully generalized. As it stands, the present proposal is limited to a specific setting but appears both practical and powerful when that setting obtains.
Proof of Theorem 1
Validity in the case ∆ < 0 is immediate: Only coverage of θ* is required in this case, and CI_θ* achieves that by itself. Since for ∆ ≥ 0 we have Θ*_I = Θ_I, it remains to show coverage of θ ∈ [θ_L, θ_U] assuming that θ_U ≥ θ_L. For the remainder of this proof, express the true value of θ as θ = λθ_U + (1 − λ)θ_L for some λ ∈ [0, 1].

Consider the interval CI*_MA, which is just like CI_MA except that, rather than by (2.2), a critical value c* is defined by setting inf_{∆≥0, λ∈[0,1]} Pr(E_{∆,λ,c*}) = 1 − α, where
\[
\begin{aligned}
E_{\Delta,\lambda,c} \equiv{} & \left\{ Z_1^* - \frac{\lambda}{\sigma_L}\Delta \le c \ \cap\ Z_2^* + \frac{1-\lambda}{\sigma_U}\Delta \ge -c \right\} \\
& \cup \left\{ Z_1^* + Z_2^* + \left( \frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L} \right)\Delta \in \left[ -\sqrt{2(1+\rho)}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \sqrt{2(1+\rho)}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right] \right\},
\end{aligned} \quad (A.1)
\]
\[
\begin{pmatrix} Z_1^* \\ Z_2^* \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\]
Note two differences to (2.2): The construction explicitly minimizes over both ∆ and λ, and it is infeasible in that population values of (σ_L, σ_U, ρ) are used.

Step 1 of the proof establishes validity of CI*_MA. Step 2 shows that λ can always be set to 1, transforming the above into (2.2). Step 3 establishes that if ρ = 0, one can furthermore take the limit as ∆ → ∞. The argument that (σ_L, σ_U, ρ) can be replaced with consistent estimators is omitted for brevity.

Step 1: Validity of CI*_MA. Write
\[
CI^*_{MA} = \left[ \hat\theta_L - \frac{\sigma_L}{\sqrt{n}}c^*,\ \hat\theta_U + \frac{\sigma_U}{\sqrt{n}}c^* \right] \cup \left[ \frac{\sigma_L\hat\theta_U + \sigma_U\hat\theta_L}{\sigma_L+\sigma_U} - \frac{\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \frac{\sigma_L\hat\theta_U + \sigma_U\hat\theta_L}{\sigma_L+\sigma_U} + \frac{\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right],
\]
where σ* ≡ √(2(1+ρ)) σ_L σ_U/(σ_L + σ_U) is the asymptotic standard deviation of √n(λ*θ̂_U + (1 − λ*)θ̂_L − θ*) and λ* ≡ σ_L/(σ_L + σ_U) is the mixture weight characterizing θ*. Define also standardized estimation errors
\[
(\bar\varepsilon_L, \bar\varepsilon_U) \equiv \sqrt{n}\left( \frac{\hat\theta_L - \theta_L}{\sigma_L},\ \frac{\hat\theta_U - \theta_U}{\sigma_U} \right).
\]
We have that θ ∈ CI ∗ MA if eitherˆ θ L − σ L √ n c ∗ ≤ λθ U + (1 − λ ) θ L ≤ ˆ θ U + σ U √ n c ∗ ⇐⇒ ˆ θ L − θ L ≤ λ ∆ + σ L √ n c ∗ , ˆ θ U − θ U ≥ − (1 − λ )∆ − σ U √ n c ∗ ⇐⇒ ¯ ε L ≤ λσ L √ n ∆ + c ∗ , ¯ ε U ≥ − − λσ U √ n ∆ − c ∗ [13]r σ L ˆ θ U + σ U ˆ θ L σ L + σ U − ( λθ U + (1 − λ ) θ L ) ∈ (cid:20) − σ ∗ √ n Φ − (cid:0) − α (cid:1) , σ ∗ √ n Φ − (cid:0) − α (cid:1)(cid:21) ⇐⇒ σ L (cid:16) θ U + σ U ¯ ε U √ n (cid:17) + σ U (cid:16) θ L + σ L ¯ ε L √ n (cid:17) σ L + σ U − ( λθ U + (1 − λ ) θ L ) ∈ (cid:20) − σ ∗ √ n Φ − (cid:0) − α (cid:1) , σ ∗ √ n Φ − (cid:0) − α (cid:1)(cid:21) ⇐⇒ σ L σ U σ L + σ U (¯ ε L + ¯ ε U ) + √ n (cid:18) σ L θ U + σ U θ L σ L + σ U − ( λθ U + (1 − λ ) θ L ) (cid:19) ∈ (cid:104) − σ ∗ Φ − (cid:0) − α (cid:1) , σ ∗ Φ − (cid:0) − α (cid:1)(cid:105) ⇐⇒ ¯ ε L + ¯ ε U + √ n σ L θ U + σ U θ L − ( σ L + σ U )( λθ U + (1 − λ ) θ L ) σ L σ U ∈ (cid:104) − (cid:112) ρ Φ − (cid:0) − α (cid:1) , (cid:112) ρ Φ − (cid:0) − α (cid:1)(cid:105) ⇐⇒ ¯ ε L + ¯ ε U + (cid:18) − λσ U − λσ L (cid:19) √ n ∆ ∈ (cid:104) − (cid:112) ρ Φ − (cid:0) − α (cid:1) , (cid:112) ρ Φ − (cid:0) − α (cid:1)(cid:105) . 
In sum,

\[
\Pr(\theta \in CI^*_{MA}) = \Pr\left(\left\{\bar\varepsilon_L - \frac{\lambda}{\sigma_L}\sqrt{n}\,\Delta \le c^* \;\cap\; \bar\varepsilon_U + \frac{1-\lambda}{\sigma_U}\sqrt{n}\,\Delta \ge -c^*\right\} \cup \left\{\bar\varepsilon_L + \bar\varepsilon_U + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\sqrt{n}\,\Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right)
\]
\[
\to \Pr\left(\left\{Z^*_1 - \frac{\lambda}{\sigma_L}\sqrt{n}\,\Delta \le c^* \;\cap\; Z^*_2 + \frac{1-\lambda}{\sigma_U}\sqrt{n}\,\Delta \ge -c^*\right\} \cup \left\{Z^*_1 + Z^*_2 + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\sqrt{n}\,\Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right)
\]
\[
\ge \inf_{\Delta \ge 0,\, \lambda \in [0,1]} \Pr\left(\left\{Z^*_1 - \frac{\lambda}{\sigma_L}\Delta \le c^* \;\cap\; Z^*_2 + \frac{1-\lambda}{\sigma_U}\Delta \ge -c^*\right\} \cup \left\{Z^*_1 + Z^*_2 + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right) = 1-\alpha,
\]

where the convergence uses Assumption 1 and the next step uses the definition of \( c^* \) and also observes that, since we take an infimum over \( \Delta \ge 0 \), we can drop the \( \sqrt{n} \) premultiplying \( \Delta \). In what follows, let \( E_{\Delta,\lambda,c} \) denote the event inside the last probability, with generic critical value \( c \) in place of \( c^* \).

Step 2: Concentrating out \( \lambda \). We first concentrate out \( \lambda \), for which \( \{0, 1\} \) are equally least favorable if \( \Delta \) is unrestricted. To see this, consider the reparameterization

\[
(X_1, X_2) \equiv \left(\frac{Z^*_1 + Z^*_2}{\sqrt{2}}, \frac{Z^*_2 - Z^*_1}{\sqrt{2}}\right) \iff (Z^*_1, Z^*_2) = \left(\frac{X_1 - X_2}{\sqrt{2}}, \frac{X_1 + X_2}{\sqrt{2}}\right) \tag{A.2}
\]

and observe that \( (X_1, X_2) \) are uncorrelated. Simple algebra yields

\[
E_{\Delta,\lambda,c} = \left\{X_1 - X_2 - \frac{\lambda}{\sigma_L}\sqrt{2}\,\Delta \le \sqrt{2}\,c \;\cap\; X_1 + X_2 + \frac{1-\lambda}{\sigma_U}\sqrt{2}\,\Delta \ge -\sqrt{2}\,c\right\} \cup \left\{X_1 + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\frac{\Delta}{\sqrt{2}} \in \left[-\sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}.
\]

Consider minimizing \( \Pr(E_{\Delta,\lambda,c}) \) subject to the constraint that

\[
\Delta = \frac{\sigma_L\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\,\beta
\]

for some fixed value \( \beta \ge 0 \). This is without loss of generality since one can minimize over \( \beta \) in a second step and every value of \( (\Delta, \lambda) \in [0,\infty) \times [0,1] \) is consistent with some \( \beta \ge 0 \). Rewriting both events in the form "\( \cdots \le X_1 \le \cdots \)", one can write

\[
E_{\Delta,\lambda,c}\Big|_{\Delta = \frac{\sigma_L\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\beta} = \left\{-X_2 - \sqrt{2}\,c - \frac{(1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta \le X_1 \le X_2 + \sqrt{2}\,c + \frac{\lambda\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta\right\}
\]
\[
\cup \left\{\frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} - \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right) \le X_1 \le \frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} + \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right\}
\]

and therefore

\[
\Pr\left(E_{\Delta,\lambda,c} \,\middle|\, X_2 = x\right)\Big|_{\Delta = \frac{\sigma_L\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\beta} = \Pr\Bigg(X_1 \in \left[-x - \sqrt{2}\,c - \frac{(1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta,\; x + \sqrt{2}\,c + \frac{\lambda\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta\right]
\]
\[
\cup \left[\frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} - \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right),\; \frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} + \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\Bigg)
\]

with the understanding that the first interval above is empty for small enough \( x \). Irrespective of the value taken by \( x \), both intervals are centered at \( \frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} \), an expression that increases in \( \lambda \) and takes value 0 at \( \lambda = \lambda^* = \sigma_L/(\sigma_L + \sigma_U) \). The intervals' length does not depend on \( \lambda \), and their union coincides with the larger of the two (whose identity depends on \( x \)). Again irrespective of the value of \( x \), \( X_1 \) is distributed normally around 0. By log-concavity of the normal distribution (or by taking derivatives), the above probability therefore increases in \( \lambda \) up to \( \lambda^* \) and decreases in \( \lambda \) thereafter conditionally on any \( x \), hence also unconditionally. Furthermore, plugging in \( \lambda \in \{0, 1\} \) reveals symmetry about 0: Switching \( \lambda \) from 0 to 1 is equivalent to leaving \( \lambda \) unchanged but replacing \( X_1 \) with \( -X_1 \). The probabilities of all intervals in the above display are, therefore, equally minimized at \( \lambda \in \{0, 1\} \) (although these minima correspond to different \( \Delta \)).
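The rotation (A.2) and the rewriting of \( E_{\Delta,\lambda,c} \) as a union of two intervals for \( X_1 \) are pointwise algebraic identities, so they can be verified draw by draw. The sketch below does this and also confirms that the two intervals share the claimed midpoint. The numerical constants (\( \alpha = 0.05 \), \( \rho = 0.3 \)) are illustrative assumptions; the identity itself does not depend on the joint law of \( (Z^*_1, Z^*_2) \).

```python
import math
import random

random.seed(1)
z = 1.959963984540054        # Phi^{-1}(1 - alpha/2), alpha = 0.05 (illustrative)
rho = 0.3                    # illustrative; the identity below is pointwise algebra

for _ in range(1000):
    sL, sU = random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)
    lam, c = random.random(), random.uniform(0.5, 3.0)
    beta = random.uniform(0.0, 3.0)
    den = lam * sU + (1 - lam) * sL
    dlt = sL * sU * beta / den            # the constraint on Delta
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)

    # event E_{Delta,lambda,c} in (Z1, Z2) coordinates
    ev_z = ((z1 - lam / sL * dlt <= c and z2 + (1 - lam) / sU * dlt >= -c)
            or abs(z1 + z2 + ((1 - lam) / sU - lam / sL) * dlt)
            <= math.sqrt(2 + 2 * rho) * z)

    # rotation (A.2) and the event as a union of two intervals for X1
    x1 = (z1 + z2) / math.sqrt(2)
    x2 = (z2 - z1) / math.sqrt(2)
    lo = -x2 - math.sqrt(2) * c - (1 - lam) * sL / den * math.sqrt(2) * beta
    hi = x2 + math.sqrt(2) * c + lam * sU / den * math.sqrt(2) * beta
    ctr = (lam * sU - (1 - lam) * sL) / den * beta / math.sqrt(2)
    hw = math.sqrt(1 + rho) * z           # half-width of the second interval
    ev_x = (lo <= x1 <= hi) or (ctr - hw <= x1 <= ctr + hw)

    assert ev_z == ev_x                     # same event after the rotation
    assert abs((lo + hi) / 2 - ctr) < 1e-9  # both intervals share the midpoint

print("reparameterization and common centering verified")
```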
This establishes that, if both of \( (\Delta, \lambda) \) are concentrated out globally, one can restrict attention to one of \( \lambda = 0 \) or \( \lambda = 1 \). We finally observe that the way in which \( \sigma_L \) enters

\[
E_{\Delta,1,c} = \left\{Z^*_1 - \frac{\Delta}{\sigma_L} \le c \;\cap\; Z^*_2 \ge -c\right\} \cup \left\{Z^*_1 + Z^*_2 - \frac{\Delta}{\sigma_L} \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}
\]

allows the simplification

\[
\inf_{\Delta \ge 0} \Pr(E_{\Delta,1,c}) = \inf_{\Delta \ge 0} \Pr\left(\left\{Z^*_1 - \Delta \le c \;\cap\; Z^*_2 \ge -c\right\} \cup \left\{Z^*_1 + Z^*_2 - \Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right).
\]

Step 3: For \( \rho = 0 \), concentrating out \( \Delta \). For the remainder of this proof, suppose \( \rho = 0 \). In view of step 2, also restrict attention to \( \lambda = 1 \). This step's main claim is that \( \Pr(E_{\Delta,1,c}) \) is first increasing and then decreasing (possibly, although not in fact, all increasing or all decreasing) in \( \Delta \ge 0 \). Suppose the claim is true; then it follows that \( \inf_{\Delta \in [0,\infty)} \Pr(E_{\Delta,1,c}) \) is attained either at \( \Delta = 0 \) or as \( \Delta \to \infty \). In the former case, \( \theta_U = \theta_L = \theta^* \), so that \( CI^*_{MA} \) is obviously conservative. The latter limit is easily seen to equal \( 1-\alpha \), and this is indeed the (unattained) infimal coverage.

It remains to show the main claim. Write \( \gamma = \sqrt{2}\,\Phi^{-1}(1-\alpha/2) \), so that (imposing \( \rho = 0 \)) we have

\[
E_{\Delta,1,c} = \left\{Z^*_1 - \Delta \le c \;\cap\; Z^*_2 \ge -c\right\} \cup \left\{Z^*_1 + Z^*_2 - \Delta \in [-\gamma, \gamma]\right\},
\]

where \( (Z^*_1, Z^*_2) \) is bivariate standard normal. We will henceforth think of \( \Pr(E_{\Delta,1,c}) \) as a function of \( \Delta \) with \( (c, \gamma) \) fixed. Note that the condition on critical values translates as \( 2c \ge \gamma \).

Using \( \Phi(\cdot) \) and \( \phi(\cdot) \) for the standard normal distribution and density functions, write

\[
\Pr(E_{\Delta,1,c} \mid Z^*_2 = z) = \begin{cases} \Phi(\gamma + \Delta - z) - \Phi(-\gamma + \Delta - z), & z < -c \\ \Phi(\gamma + \Delta - z), & -c \le z \le -c + \gamma \\ \Phi(\Delta + c), & z > -c + \gamma \end{cases}
\]

and therefore (the last step below will be elaborated after the display)

\[
\frac{d\Pr(E_{\Delta,1,c})}{d\Delta} = \frac{d\int_{-\infty}^{\infty} \Pr(E_{\Delta,1,c} \mid Z^*_2 = z)\,\phi(z)\,dz}{d\Delta} = \int_{-\infty}^{-c+\gamma} \phi(\gamma + \Delta - z)\phi(z)\,dz - \int_{-\infty}^{-c} \phi(-\gamma + \Delta - z)\phi(z)\,dz + \int_{-c+\gamma}^{\infty} \phi(\Delta + c)\phi(z)\,dz
\]
\[
= \underbrace{\frac{1}{\sqrt{2}}\left(\phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right) - \phi\!\left(\frac{-\gamma + \Delta}{\sqrt{2}}\right)\right)\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)}_{A} + \underbrace{\phi(\Delta + c)\,\Phi(c - \gamma)}_{B}. \tag{A.3}
\]

To see the last step, note first that \( \int_{-c+\gamma}^{\infty} \phi(\Delta + c)\phi(z)\,dz \) simplifies to \( B \). Next,

\[
(Z^*_1, Z^*_2) = (\gamma + \Delta - z, z) \iff (X_1, X_2) = \left(\frac{\gamma + \Delta}{\sqrt{2}}, \frac{2z - \gamma - \Delta}{\sqrt{2}}\right),
\]

where \( (X_1, X_2) \) is as in (A.2). Because \( \rho = 0 \) implies that \( (X_1, X_2) \) is standard bivariate normal, we have

\[
\int_{-\infty}^{-c+\gamma} \phi(\gamma + \Delta - z)\phi(z)\,dz = \int_{-\infty}^{-c+\gamma} \phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)\phi\!\left(\frac{2z - \gamma - \Delta}{\sqrt{2}}\right)dz = \frac{1}{\sqrt{2}}\int_{-\infty}^{(\gamma - \Delta - 2c)/\sqrt{2}} \phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)\phi(t)\,dt = \frac{1}{\sqrt{2}}\,\phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right).
\]
A similar computation for \( \int_{-\infty}^{-c} \phi(-\gamma + \Delta - z)\phi(z)\,dz \) and rearrangement of terms yield term \( A \) in (A.3).

Term \( A \) equals zero at \( \Delta = 0 \) and then becomes negative. Term \( B \) is positive throughout. Because all terms vanish as \( \Delta \to \infty \), it is not useful to directly take further derivatives. However, we can compare the terms' relative magnitude. In particular, we will see that \( |A|/|B| \) increases in \( \Delta \), hence \( d\Pr(E_{\Delta,1,c})/d\Delta \) has at most one sign change, and that sign change (if it occurs) is from positive to negative, establishing the claim.

To see monotonicity of \( |A|/|B| \), write

\[
\frac{|A|}{|B|} = \frac{1}{\sqrt{2}} \times \frac{\phi\!\left(\frac{-\gamma + \Delta}{\sqrt{2}}\right) - \phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)}{\phi(\Delta + c)} \times \frac{\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)}{\Phi(c - \gamma)} = \frac{1}{\sqrt{2}} \times \frac{\exp\!\left(-\frac{\gamma^2 + \Delta^2 - 2\gamma\Delta}{4}\right) - \exp\!\left(-\frac{\gamma^2 + \Delta^2 + 2\gamma\Delta}{4}\right)}{\exp\!\left(-\frac{\Delta^2 + c^2 + 2\Delta c}{2}\right)} \times \frac{\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)}{\Phi(c - \gamma)}
\]
\[
= \left(\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right) \times \text{const.},
\]

where "const." absorbs terms that do not depend on \( \Delta \). The derivative of this expression with respect to \( \Delta \) (dropping the multiplicative constant) is

\[
\underbrace{\left(\frac{\Delta + 2c + \gamma}{2}\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \frac{\Delta + 2c - \gamma}{2}\exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)}_{C}\,\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right) - \frac{1}{\sqrt{2}}\left(\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)\phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)
\]
\[
\ge C\,\frac{\frac{\Delta + 2c - \gamma}{\sqrt{2}}}{\left(\frac{\Delta + 2c - \gamma}{\sqrt{2}}\right)^2 + 1}\,\phi\!\left(\frac{\Delta + 2c - \gamma}{\sqrt{2}}\right) - \frac{1}{\sqrt{2}}\left(\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)\phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right),
\]

using that \( C \ge 0 \) and \( \Phi(-t) \ge \frac{t}{t^2+1}\,\phi(t) \). In order to sign this, divide through by \( \phi(\cdots) \) (both values are the same by symmetry of \( \phi(\cdot) \)) as well as by \( \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right) \), multiply through by \( \sqrt{2}\left(\left(\frac{\Delta + 2c - \gamma}{\sqrt{2}}\right)^2 + 1\right) \), and rearrange terms to conclude that the last expression above has the same sign as

\[
\left(\frac{\Delta + 2c + \gamma}{\sqrt{2}} \times \frac{\Delta + 2c - \gamma}{\sqrt{2}} - \left(\frac{(\Delta + 2c - \gamma)^2}{2} + 1\right)\right)\exp(\gamma\Delta) - \left(\frac{\Delta + 2c - \gamma}{\sqrt{2}} \times \frac{\Delta + 2c - \gamma}{\sqrt{2}} - \left(\frac{(\Delta + 2c - \gamma)^2}{2} + 1\right)\right)
\]
\[
= \left(\frac{(\Delta + 2c + \gamma)(\Delta + 2c - \gamma)}{2} - \frac{(\Delta + 2c - \gamma)^2}{2} - 1\right)\exp(\gamma\Delta) + 1 = \left(\gamma(\Delta + 2c - \gamma) - 1\right)\exp(\gamma\Delta) + 1.
\]

At \( \Delta = 0 \), this simplifies to \( \gamma(2c - \gamma) \) and therefore is nonnegative if \( 2c \ge \gamma \). But one can also write

\[
\frac{d}{d\Delta}\left(\left(\gamma(\Delta + 2c - \gamma) - 1\right)\exp(\gamma\Delta) + 1\right) = \gamma\exp(\gamma\Delta) + \left(\gamma(\Delta + 2c - \gamma) - 1\right)\gamma\exp(\gamma\Delta) = \gamma^2(\Delta + 2c - \gamma)\exp(\gamma\Delta),
\]

which is again nonnegative if \( 2c \ge \gamma \). Thus, \( |A|/|B| \) is nondecreasing in \( \Delta \) for all \( \Delta \ge 0 \). This concludes the proof.

References

Andrews, D. W. K. (2000): "Inconsistency of the Bootstrap When a Parameter is on the Boundary of the Parameter Space," Econometrica, 68(2), 399–405.
Andrews, D. W. K., and P. J. Barwick (2012): "Inference for Parameters Defined by Moment Inequalities: A Recommended Moment Selection Procedure," Econometrica, 80(6), 2805–2826.

Andrews, D. W. K., and S. Kwon (2019): "Inference in Moment Inequality Models That Is Robust to Spurious Precision under Model Misspecification," Cowles Foundation Discussion Paper CFDP 2184R.

Andrews, D. W. K., and X. Shi (2013): "Inference Based on Conditional Moment Inequalities," Econometrica, 81(2), 609–666.

Andrews, D. W. K., and G. Soares (2010): "Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection," Econometrica, 78(1), 119–157.

Andrews, I., J. Roth, and A. Pakes (2019): "Inference for Linear Conditional Moment Inequalities," arXiv:1909.10062.

Bugni, F. A. (2010): "Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Identified Set," Econometrica, 78(2), 735–753.

Bugni, F. A., I. A. Canay, and X. Shi (2015): "Specification Tests for Partially Identified Models Defined by Moment Inequalities," Journal of Econometrics, 185(1), 259–282.

Bugni, F. A., I. A. Canay, and X. Shi (2017): "Inference for Subvectors and Other Functions of Partially Identified Parameters in Moment Inequality Models," Quantitative Economics, 8(1), 1–38.

Canay, I. (2010): "EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity," Journal of Econometrics, 156(2), 408–425.

Canay, I. A., and A. M. Shaikh (2017): Practical and Theoretical Advances in Inference for Partially Identified Models, vol. 2 of Econometric Society Monographs, pp. 271–306. Cambridge University Press.

Chernozhukov, V., S. Lee, and A. M. Rosen (2013): "Intersection Bounds: Estimation and Inference," Econometrica, 81(2), 667–737.

Cox, G., and X. Shi (2020): "Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference in Moment Inequality Models," arXiv:1907.06317.

de Quidt, J., J. Haushofer, and C. Roth (2018): "Measuring and Bounding Experimenter Demand," American Economic Review, 108(11), 3266–3302.

Horowitz, J. L., and C. F. Manski (1995): "Identification and Robustness with Contaminated and Corrupted Data," Econometrica, 63(2), 281–302.

Imbens, G. W., and C. F. Manski (2004): "Confidence Intervals for Partially Identified Parameters," Econometrica, 72(6), 1845–1857.

Kaido, H., F. Molinari, and J. Stoye (2019): "Confidence Intervals for Projections of Partially Identified Parameters," Econometrica, 87(4), 1397–1432.

Kaido, H., and H. White (2013): Estimating Misspecified Moment Inequality Models, pp. 331–361. Springer, New York.

Lee, D. S. (2009): "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects," The Review of Economic Studies, 76(3), 1071–1102.

Leeb, H., and B. M. Pötscher (2005): "Model Selection and Inference: Facts and Fiction," Econometric Theory, 21(1), 21–59.

Manski, C. F. (2003): Partial Identification of Probability Distributions (Springer Series in Statistics). Springer-Verlag, Berlin.

Molinari, F. (2020): "Microeconometrics with Partial Identification," Handbook of Econometrics, forthcoming.

Ponomareva, M., and E. Tamer (2011): "Misspecification in Moment Inequality Models: Back to Moment Equalities?," The Econometrics Journal, 14(2), 186–203.

Romano, J. P., and A. M. Shaikh (2008): "Inference for Identifiable Parameters in Partially Identified Econometric Models," Journal of Statistical Planning and Inference, 138(9), 2786–2807.

Romano, J. P., A. M. Shaikh, and M. Wolf (2014): "A Practical Two-Step Method for Testing Moment Inequalities," Econometrica, 82(5), 1979–2002.

Stoye, J. (2009): "More on Confidence Regions for Partially Identified Parameters," Econometrica, 77(4), 1299–1315.

Tamer, E. (2010): "Partial Identification in Econometrics," Annual Review of Economics, 2(1), 167–195.

White, H. (1982): "Maximum Likelihood Estimation of Misspecified Models," Econometrica, 50(1), 1–25.