A Simple, Short, but Never-Empty Confidence Interval for Partially Identified Parameters
Jörg Stoye* — January 1, 2021
Abstract
This paper revisits the simple, but empirically salient, problem of inference on a real-valued parameter that is partially identified through upper and lower bounds with asymptotically normal estimators. A simple confidence interval is proposed and is shown to have the following properties:

• It is never empty or awkwardly short, including when the sample analog of the identified set is empty.

• It is valid for a well-defined pseudotrue parameter whether or not the model is well-specified.

• It involves no tuning parameters and minimal computation.

Computing the interval requires concentrating out one scalar nuisance parameter. In most cases, the practical result will be simple: To achieve 95% coverage, report the union of a simple 90% (!) confidence interval for the identified set and a standard 95% confidence interval for the pseudotrue parameter.

For uncorrelated estimators (notably, if bounds are estimated from distinct subsamples) and conventional coverage levels, validity of this simple procedure can be shown analytically. The case obtains in the motivating empirical application (de Quidt, Haushofer, and Roth, 2018), in which improvement over existing inference methods is demonstrated. More generally, simulations suggest that the novel confidence interval has excellent length and size control. This is partly because, in anticipation of never being empty, the interval can be made shorter than conventional ones in relevant regions of sample space.

* Department of Economics, Cornell University, [email protected]. Thanks to Johannes Haushofer, Jonathan de Quidt, and Chris Roth for an inquiry that motivated this work and for sharing and explaining their data. Financial support through NSF Grant SES-1824375 is gratefully acknowledged.

Introduction
Inference under partial identification is by now the subject of a broad literature. Only recently did attention turn to the following concern: If a partially identified model is misspecified, this may manifest in either an empty or, arguably worse, a misleadingly small confidence region. That is, misspecified inference can be spuriously precise. The reason is that most confidence regions used in partial identification invert tests of H₀ : θ ∈ Θ_I; here, θ is a parameter and Θ_I is the identified set. If H₀ is rejected at every θ, the confidence region is empty. If H₀ is barely not rejected at a few parameter values, the confidence region may be very small. This issue is empirically relevant. For example, an empty sample analog of Θ_I occurs in de Quidt, Haushofer, and Roth (2018), whose inquiry sparked the present research and whose data are reanalyzed below.

The literature on this issue is still young. Ponomareva and Tamer (2011) provide an early diagnosis. Kaido and White (2013) propose a notion of pseudotrue identified set and an estimator thereof. Molinari (2020) explains the issue in detail and highlights it as an important area for further investigation. The most thorough treatment is by Andrews and Kwon (2019), who emphasize the issue's importance and provide a general inference method that avoids spurious precision and ensures coverage of a pseudotrue identified set.

The present paper is in the spirit of Andrews and Kwon (2019). I focus on the simple but empirically salient case of a scalar parameter with upper and lower bounds whose estimators are jointly asymptotically normal. That is, I revisit the setting of Imbens and Manski (2004, without their superefficiency assumption) and Stoye (2009). For this setting, I propose a confidence interval with the following features:

• It is never empty nor very short (a lower bound on its length is reported later).
• It exhibits asymptotically guaranteed coverage uniformly over the identified set and additionally for a well-defined pseudotrue parameter.

• It tends to be shorter than more conventional intervals in benign cases, including in the empirical application.

• It is free of tuning parameters and trivial to compute.

For target coverage of 95% and for the special case of uncorrelated estimators, e.g. in this paper's empirical application, the confidence interval can be verbally defined as follows:

• Add ±1.64 standard errors to estimators of upper and lower bounds.

• Also compute an average of the estimators that is weighted by their standard errors, as well as the corresponding standard error. Add ±1.96 of those standard errors to the average.

• Report the union of the intervals.

[Footnote: See Manski (2003) for an early monograph, Tamer (2010) for a historical introduction, and Canay and Shaikh (2017) and Molinari (2020) for recent surveys that extensively cover inference.]

While this paper generally proposes a somewhat less "cute" procedure with broader applicability, this specialized finding is probably the most striking part. Neither of the above two intervals is valid by itself; it is just that their coverage events are correlated in exactly the right way.

Section 2 develops the proposal more formally and gives an intuition for why it works, though proofs are relegated to the Appendix. Section 3 provides a numerical illustration and Section 4 an application to the data that motivated this research. Section 5 concludes.
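The recipe above is easy to mechanize. The sketch below is my own illustration, not the paper's code: the function name and inputs are hypothetical, and it covers only the verbally defined uncorrelated case (ρ = 0, target coverage 95%).

```python
from statistics import NormalDist

def union_ci(th_lo, th_hi, se_lo, se_hi, alpha=0.05):
    """Union of the two intervals from the verbal recipe (rho = 0 case).

    `union_ci` and its argument names are illustrative, not from the paper.
    """
    z1 = NormalDist().inv_cdf(1 - alpha)      # one-sided value, ~1.645
    z2 = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided value, ~1.960
    # Interval aimed at the identified set: bounds +/- z1 standard errors.
    set_lo, set_hi = th_lo - z1 * se_lo, th_hi + z1 * se_hi
    # Average of the bound estimators weighted by standard errors,
    # and its standard error under rho = 0.
    th_star = (se_hi * th_lo + se_lo * th_hi) / (se_lo + se_hi)
    se_star = (2 ** 0.5) * se_lo * se_hi / (se_lo + se_hi)
    star_lo, star_hi = th_star - z2 * se_star, th_star + z2 * se_star
    if set_lo > set_hi:
        # First interval empty (strongly inverted bound estimates):
        # the union is just the interval around the weighted average.
        return star_lo, star_hi
    # Otherwise th_star lies inside the first interval, so the union is
    # itself an interval.
    return min(set_lo, star_lo), max(set_hi, star_hi)
```

For well-separated bound estimates, the result just widens the estimated bounds by roughly 1.64 standard errors; for strongly inverted estimates, only the interval around the weighted average survives.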
While the interpretation of what follows is inference on a scalar parameter θ, the only assumption is that one has well-behaved estimators of two other parameter values.

Assumption 1: There exist estimators (θ̂_L, θ̂_U) with probability limits (θ_L, θ_U) ∈ ℝ² such that
\[
\sqrt{n}\begin{pmatrix} \hat\theta_L - \theta_L \\ \hat\theta_U - \theta_U \end{pmatrix} \stackrel{d}{\to} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_L^2 & \rho\sigma_L\sigma_U \\ \rho\sigma_L\sigma_U & \sigma_U^2 \end{pmatrix} \right),
\]
where σ_L, σ_U > 0, and consistent estimators (σ̂_L, σ̂_U, ρ̂) →ᵖ (σ_L, σ_U, ρ) are available.

The motivation is that the researcher estimates an identified set Θ_I ≡ [θ_L, θ_U] containing a true parameter value θ. Assumption 1 is unrestrictive if, as in the empirical application, (θ̂_L, θ̂_U) are smooth functions of sample moments. It is unlikely to hold for intersection bounds (Andrews and Shi, 2013; Chernozhukov, Lee, and Rosen, 2013), and it will hold for bounds that result from projecting a higher-dimensional identified set (Bugni, Canay, and Shi, 2017; Kaido, Molinari, and Stoye, 2019), including components of partially identified vectors, only in benign cases.

The obvious estimator of Θ_I is [θ̂_L, θ̂_U], but defining a confidence interval is delicate. Following Imbens and Manski (2004), the literature mostly focuses on confidence intervals that (asymptotically) contain the true parameter value with prespecified probability (1 − α), irrespective of its location in Θ_I, i.e. confidence intervals that control inf_{θ∈Θ_I} Pr(θ ∈ CI). Finding such intervals is subtle because the nature of the testing problem qualitatively depends on the length ∆ ≡ θ_U − θ_L of Θ_I. Heuristically, this problem is one-sided if ∆ is "large" and two-sided if it is "short," i.e. near point identification. Ascertaining which case obtains is subject to difficulties reminiscent of post-model-selection inference (Leeb and Pötscher, 2005) and parameter-on-the-boundary issues (Andrews, 2000).

The literature on how to circumvent this issue is by now considerable.
Most approaches invert a test, that is, they report all values of θ for which H₀ : θ ∈ Θ_I was not rejected. [Footnote: Full disclaimer: I discovered it by simulation and initially assumed a bug.] Any such confidence set can be empty; in this paper's settings, that will happen if θ̂_L is much larger than θ̂_U, where the meaning of "much" varies across papers. This feature can be advertised as an embedded specification test but may not be wanted. Arguably even more problematic is that, if the model is misspecified, a test inversion confidence interval can be short, suggesting precision when the true issue is misspecification. A specification test will not resolve this: In this paper's setting, the best-practice such test (Bugni, Canay, and Shi, 2015) just reports whether the test inversion interval is empty.

Addressing this concern requires a notion of coverage for the case of misspecification, i.e. if θ_L > θ_U. Following Andrews and Kwon (2019), define the pseudotrue identified set
\[
\Theta_I^* \equiv \Theta_I \cup \{\theta^*\}, \qquad \theta^* \equiv \frac{\sigma_U\theta_L + \sigma_L\theta_U}{\sigma_L + \sigma_U}.
\]
This definition is natural because Θ*_I = argmin_θ max{(θ − θ_U)/σ_U, (θ_L − θ)/σ_L, 0}; thus, Θ*_I is the estimand implied by the frequent choice of max{(θ − θ̂_U)/σ̂_U, (θ̂_L − θ)/σ̂_L, 0} as test statistic. Note also that Θ*_I is never empty and that Θ*_I = Θ_I whenever Θ_I ≠ ∅.

The revised notion of validity of a confidence interval is as follows:

Definition 1: A confidence interval CI has asymptotic coverage of (1 − α) if
\[
\lim_{n\to\infty} \inf_{\theta\in\Theta_I^*} \Pr(\theta \in CI) \ge 1 - \alpha.
\]

Forcing coverage of θ* will ensure that the interval is nonempty and also that it is statistically interpretable as targeting Θ*_I. An obvious caveat is that, as with the related literature going back to White (1982), the coverage target's substantive relevance may not be clear if the model is in fact misspecified.
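As a quick numerical sanity check of the argmin characterization of Θ*_I (my own sketch; the parameter values are made up), a grid search recovers θ* in a misspecified example:

```python
# Hypothetical misspecified example: inverted population bounds.
theta_L, theta_U = 0.40, 0.30      # theta_L > theta_U, so Theta_I is empty
sig_L, sig_U = 2.0, 1.0

def violation(theta):
    """max{(theta - theta_U)/sigma_U, (theta_L - theta)/sigma_L, 0}."""
    return max((theta - theta_U) / sig_U, (theta_L - theta) / sig_L, 0.0)

# Closed form from the text: weighted average with weights sigma_U, sigma_L.
theta_star = (sig_U * theta_L + sig_L * theta_U) / (sig_L + sig_U)  # = 1/3

# Brute-force minimizer of the maximal studentized violation over a fine grid.
grid = [0.2 + i * 1e-5 for i in range(20001)]  # covers [0.2, 0.4]
theta_grid = min(grid, key=violation)
```

At θ*, the two studentized violations are exactly balanced, which is why the weighted average minimizes their maximum.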
As Andrews and Kwon (2019) elaborate, this has to be traded off against concerns with spurious precision.

While the coverage notion exactly mimics Andrews and Kwon (2019), the confidence interval will be quite different. It goes "back to basics" in that, like early entries in the literature (Imbens and Manski, 2004; Stoye, 2009), it essentially just adds a certain number of standard errors to estimated bounds. An advantage is computational and conceptual simplicity; with test inversion intervals, critical values generally depend on θ even in this simple setting and therefore must be computed many times. However, the main motivation is that the new interval performs well. Its heuristic definition is as follows:

• Compute an interval
\[
CI_{\Theta_I} \equiv \left[ \hat\theta_L - \frac{\hat\sigma_L}{\sqrt{n}}\hat{c},\ \hat\theta_U + \frac{\hat\sigma_U}{\sqrt{n}}\hat{c} \right],
\]
where ĉ depends on α and ρ̂; see Table 1.

• Also compute the estimator
\[
\hat\theta^* \equiv \frac{\hat\sigma_U\hat\theta_L + \hat\sigma_L\hat\theta_U}{\hat\sigma_L + \hat\sigma_U}
\]
and confidence interval
\[
CI_{\theta^*} \equiv \left[ \hat\theta^* - \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \hat\theta^* + \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right], \qquad \hat\sigma^* \equiv \frac{\sqrt{2(1+\hat\rho)}\,\hat\sigma_L\hat\sigma_U}{\hat\sigma_L + \hat\sigma_U}.
\]

• Report the union CI_{Θ_I} ∪ CI_{θ*}.

• We will not pre-estimate ∆ but set it to its globally least favorable value. We will, however, anticipate the conservative bias ensuing from taking unions of intervals. This bias is easy to estimate; in particular, there is no parameter-on-the-boundary issue.

• One might think that concentrating out ∆ will be very conservative. It turns out that this is not so. In most cases, ĉ = Φ⁻¹(1 − α), i.e. we can just use the one-sided critical value, at least to extremely high simulation accuracy. If ρ = 0 and for conventional coverage levels, this can be shown analytically.

[Footnote: That was the sales pitch in Stoye (2009), but not all referees were sold on it. The embedded specification test is analyzed in more detail by Andrews and Soares (2010).]

[Footnote: This equivalence does not generalize, but Andrews and Kwon (2019) show that in "slightly misspecified" parameter regimes, spuriously precise inference generally coexists with low power of specification tests.]

The new confidence interval is obviously never empty; indeed, its length cannot drop below 2σ̂* Φ⁻¹(1 − α/2)/√n.

Definition 2: The misspecification-adaptive confidence interval CI_MA is
\[
CI_{MA} \equiv \left[ \hat\theta_L - \frac{\hat\sigma_L}{\sqrt{n}}\hat{c},\ \hat\theta_U + \frac{\hat\sigma_U}{\sqrt{n}}\hat{c} \right] \cup \left[ \hat\theta^* - \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \hat\theta^* + \frac{\hat\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right], \quad (2.1)
\]
where ĉ is the unique value of c solving
\[
\inf_{\Delta\ge 0} \Pr\left( Z_1 - \Delta - c \le 0 \le Z_2 + c \ \text{ or } \ |Z_1 + Z_2 - \Delta| \le \sqrt{2(1+\hat\rho)}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right) = 1 - \alpha, \qquad \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \hat\rho \\ \hat\rho & 1 \end{pmatrix} \right). \quad (2.2)
\]
If ρ = 0 is known and √2 Φ⁻¹(1 − α) ≥ Φ⁻¹(1 − α/2), just set ĉ = Φ⁻¹(1 − α).

Remark 1: The condition that √2 Φ⁻¹(1 − α) ≥ Φ⁻¹(1 − α/2) holds for α < .14, i.e. for coverage levels of 86% or higher.
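Condition (2.2) is straightforward to probe by simulation. The sketch below is my own illustration (with far fewer draws than the B = 4,000,000 behind Table 1): it evaluates the coverage probability in (2.2) at ρ = 0 and α = .05 using the one-sided critical value, on a grid of ∆.

```python
import random
from statistics import NormalDist

random.seed(0)
alpha = 0.05
rho = 0.0                                    # uncorrelated case
c = NormalDist().inv_cdf(1 - alpha)          # one-sided critical value, ~1.645
k = (2 * (1 + rho)) ** 0.5 * NormalDist().inv_cdf(1 - alpha / 2)
B = 200_000
draws = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(B)]

def coverage(delta):
    """Simulated probability of the event in (2.2) at the given Delta."""
    hits = sum(
        1 for z1, z2 in draws
        if (z1 - delta - c <= 0 <= z2 + c) or abs(z1 + z2 - delta) <= k
    )
    return hits / B

covs = {d: coverage(d) for d in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)}
```

Consistent with the analytical result for ρ = 0, the simulated coverage stays above .95 for every finite ∆ on the grid and approaches .95 from above for large ∆, so the one-sided critical value solves (2.2).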
Theorem 1: The confidence interval CI_MA achieves asymptotic coverage of (1 − α).

Proof. See Appendix A.

[Table 1: values of ĉ solving (2.2) over ∆ ∈ [0, ∞) for different coverages and correlations. For ρ ≤ .8, further simulations corroborate the one-sided critical value as exact solution.]

Expression (2.2) is numerically evaluated for different values of ρ and target coverages in Table 1. In particular, simulation with very high accuracy suggests that ĉ is just the one-sided critical value for ρ up to at least .8; it then gradually increases toward the two-sided critical value, which is easily seen to solve (2.2) for ρ̂ = 1.

Remark 2: Except for large positive ρ, infimal coverage of (1 − α) is attained in the limit as ∆ → ∞. For finite ∆, CI_MA is therefore nominally conservative. In principle, one could try to capture this by concentrating out ∆ over a more limited range, e.g. over a preliminary confidence interval with Bonferroni adjustment of subsequent inference. This could in principle lead to ĉ < Φ⁻¹(1 − α). I do not advocate it because, numerically, the infimum in (2.2) is well approximated for surprisingly small values of ∆. Therefore, the "inferential cost" of a pre-test, whether through adjustment of second-stage test size or through reliance on a tuning parameter, would typically not be recovered.

Remark 3: The literature on partial identification often focuses on uniform inference. This is because naïve inference methods may fail in cases of interest, e.g. as one approaches point identification. To prevent this, the literature has an informal requirement that inference be uniform over delicate nuisance parameters like (in this paper) ∆; see Molinari (2020, Section 4.3.2) for further discussion. CI_MA is obviously uniform in this sense because ∆ (and also the position of θ in Θ_I) is set to its globally least favorable value. To formally claim that inference is uniform over a large class of data generating processes, one would furthermore have to strengthen Assumption 1 so that consistency and asymptotic normality of bound estimators hold in a uniform sense. The exact nature of such strengthenings, and low-level assumptions that achieve them, are well understood (Andrews and Soares, 2010; Romano and Shaikh, 2008) and are omitted for brevity.

Remark 4: The notable difference in setting to Imbens and Manski (2004) is the absence of an implicit superefficiency condition on ∆̂ near true value 0.
That condition turns out to obtain if (and, in practice, only if) θ̂_U ≥ θ̂_L by construction (Stoye, 2009, Lemma 3). This case is empirically relevant: It applies to most missing-data bounds and also to bounds that rely on different truncations of observed probability measures (Horowitz and Manski, 1995; Lee, 2009), unless further refinements turn these into intersection bounds. If it obtains and other regularity conditions hold, the confidence interval in Imbens and Manski (2004) is valid, is expected to be rather efficient for small ∆ (because it uses superconsistency of ∆̂), and will obviously never be empty. Not coincidentally, this case is also characterized by the possibility of ρ ≈ 1; indeed, that is how superconsistency of ∆̂ arises. Whether this case applies can be ascertained before seeing any data, and I strongly suggest that users do so.

[Footnote: The table was generated by gridding and using B = 4,000,000 simulations. This is feasible on a run-of-the-mill netbook. The relevant simulation error is the coverage error at the suggested ĉ. Given B, it will be much smaller than what is routinely accepted in simulation-based, e.g. bootstrap, inference. For ρ ≤ .8, further simulations establish to high accuracy that coverage is first increasing and then decreasing in ∆ and minimized as ∆ → ∞, the same feature that is analytically proved for ρ = 0 and which justifies ĉ = Φ⁻¹(1 − α).]
Remark 5: I follow the bulk of the literature in focusing on uniform coverage of θ ∈ Θ*_I. The procedure is easily adapted to coverage of the entire set Θ*_I. Note that, by a Bonferroni argument, a critical value of ĉ = Φ⁻¹(1 − α/2) would always do, and also that (as can be seen from considering large ∆) only a large negative value of ρ̂ would cause ĉ to be appreciably lower.

The proof of Theorem 1 contains three steps. First, it is relatively routine to show that CI_MA would be valid if, in line with the heuristic definition, expression (2.2) explicitly took the infimum also over values of (σ_L, σ_U) as well as θ ∈ Θ_I. In a second step, we can concentrate out all of these. In particular, one can restrict attention to one of θ = θ_L or θ = θ_U; expression (2.2) arbitrarily chooses the latter. This finding is not obvious: For given ∆, coverage is not equally minimized at the interval's endpoints; it is only that the corresponding infima over ∆ ∈ [0, ∞) are the same. As a final flourish in this step, it turns out that asymptotic coverage at θ_U depends on (∆, σ_L, σ_U) only through ∆/σ_L. For the purpose of evaluating worst-case coverage over ∆ ≥ 0, we can therefore set both standard deviations to 1.

The final, and by far most delicate, step is that if ρ = 0, coverage is provably minimized as ∆ → ∞, justifying use of the one-sided critical value ĉ = Φ⁻¹(1 − α). To appreciate this claim, consider again the two components of CI_MA in (2.1). For α = .05, the left-hand interval's coverage of either θ_L or θ_U may be as low as .90 and approaches .95 from below as ∆ → ∞. The right-hand interval's coverage of these values is .95 at ∆ = 0 (where both coincide with θ*) but rapidly decreases to 0 as ∆ increases. That these effects aggregate to coverage uniformly above .95 is far from obvious and heavily relies on specific features of the bivariate Normal distribution.

Numerically, the final step extends to moderate ρ (see again Table 1), and the proof uses conservative bounds. Some analytic result of higher generality might, therefore, be available. However, for large positive ρ, coverage is minimized at small positive ∆. Therefore, if ρ is unknown, estimating it cannot be avoided. In particular, in view of Table 1, a pre-test for "small enough" ρ would be counterproductive: Since ĉ as a function of ρ is mostly completely flat, one would be unlikely to recover the inferential cost (in the sense of Remark 2) of the pre-test.

[Figure 1: panels (a) Coverage when ρ = 0; (b) Expected length when ρ = 0; (c) Coverage when ρ = 0.7; (d) Expected length when ρ = 0.7. Caption: Coverage (left panels) and expected length (right panels; length of true interval is subtracted) of CI_TI (blue), CI_TI ∪ CI_θ* (red), and the new proposal CI_MA (green). Horizontal axis is ∆ = θ_U − θ_L; negative values indicate increasing misspecification. Nominal coverage is 95% and is indicated by a black horizontal line.]

Figures 1 and 2 compare CI_MA with a test inversion interval CI_TI that arguably reflects the state of the established literature. It inverts a test of H₀ : θ ≤ θ_U, θ ≥ θ_L by taking the maximum (studentized) violation as test statistic, i.e. the same test statistic that generally yields Θ*_I as pseudotrue identified set. The critical value is based on a pre-test (specifically, a one-sided Wald test at a small fraction of α) that potentially discards one of the inequality constraints as nonbinding. Depending on the pre-test's result, the critical value is then either a simple one-sided critical value or computed by a simulation that takes ρ into account. In either case, the second-stage test is of correspondingly reduced size, so that the pre-test is accounted for by Bonferroni correction. The resulting test is inverted, and the critical value is recomputed as θ changes, making the interval considerably shorter than early entries in the literature (Imbens and Manski, 2004; Stoye, 2009). Compared to CI_MA, test inversion adds orders of magnitude of computational cost, though at a very low absolute level. I abstract from asymptotic approximation by drawing estimators straight from limiting distributions and taking (σ_L, σ_U, ρ) to be known. Interval length ∆ is denominated in estimator standard errors because √n σ_L = √n σ_U = 1 throughout.

[Footnote: The interval closely follows Romano, Shaikh, and Wolf (2014); other established methods (Andrews and Soares, 2010; Andrews and Barwick, 2012; Bugni, 2010; Canay, 2010) would inform similar constructions. As of writing of this manuscript, at least two rather distinct (from the preceding and from each other) proposals are in the pipeline (Andrews, Roth, and Pakes, 2019; Cox and Shi, 2020). Both invert a test and can be empty; Andrews, Roth, and Pakes (2019) also has a tuning parameter. They are compared in Cox and Shi (2020). A comparison of all these approaches in simple examples might be worthwhile.]

The comparison is extended into the misspecified range by letting ∆ take on negative values. The test inversion interval obviously undercovers in that range. To clarify comparisons, I also compute CI_TI ∪ CI_θ*. Recall that CI_MA can be loosely intuited as refining this construction by adjusting the critical value to account for union-taking. Nominal coverage is 95% throughout.

[Figure 2: panels (a) Coverage when ρ = −0.7; (b) Expected length when ρ = −0.7; (c) Coverage when ρ = 0.95; (d) Expected length when ρ = 0.95. Caption: Continuation of Figure 1. The last case (ρ = .95) illustrates a setting where ∆ → ∞ is not least favorable and where ĉ > Φ⁻¹(.95).]

Figure 1 illustrates the results for ρ = 0 (top panels) and ρ = 0.7 (bottom panels); Figure 2 does the same for ρ = −0.7 and ρ = 0.95. The last case is arguably contrived but serves to illustrate that ∆ → ∞ is not always least favorable. By the same token, this is the only case in which ĉ > Φ⁻¹(.95). Otherwise, the figures reveal a clear advantage of CI_MA: It is shorter, and this is also reflected in more precise size control and thereby more power of the implied test. The advantage is especially apparent for small positive ∆. What happens here is that the correction provided by CI_θ* allows CI_MA to transition to just adding 1.64 standard errors already at small, and even at mildly negative, estimated interval length ∆̂; that is, CI_MA just adds 1.64 standard errors over much of the relevant range. The slightly greater length of CI_TI relative to CI_MA for large ∆ reflects that CI_TI accounts for a pre-test.

One might wonder how Andrews and Kwon (2019) would perform in the example. While the exact answer depends on the choice of multiple tuning parameters, some qualitative considerations are as follows. Their interval starts from CI_TI and expands it in order to avoid spurious precision. As a result, it will be bounded from below in both length and coverage by the blue curves in Figures 1 and 2. In an initial refinement, Andrews and Kwon (2019) form the union between CI_TI and a never-empty confidence interval. Their preferred confidence interval does this only if an additional pre-test fails to reject misspecification. While this mitigates the effect of expanding CI_TI, the final confidence interval still contains CI_TI and considerably exceeds it for small positive ∆ (see their Section 8.1, whose setting resembles the present one). This will obviously be reflected in its statistical performance. Conversely, an intriguing feature of CI_MA is that it "spends" the "coverage capital" gained from ensuring nonemptiness by being shorter than CI_TI for interesting values of ∆. In fairness to Andrews and Kwon (2019), it appears far from obvious how to implement such a feature in their much more general setting.

The advantage of CI_MA fades out, and even reverses, in the special case where ρ → 1 but not ∆ → 0. In that limit, ĉ will converge to the two-sided critical value, whereas a pre-test will eventually recommend a one-sided test. While such scenarios can obviously be simulated, they arguably are contrived. The possibility of high ρ and correspondingly precise estimation of ∆ is empirically relevant, but it corresponds to the superefficiency case discussed in Remark 4 and therefore to small ∆ as well as to a case distinction that can be decided in pre-data analysis. Also, one could in principle fix this issue by layering a pre-test on top of CI_MA; however, as general advice in this matter, I stand by Remark 2.

[Footnote: Andrews and Kwon (2019) implement CI_TI through Andrews and Soares (2010) but point out that Romano, Shaikh, and Wolf (2014) could be used instead. The difference will be small in the present setting.]
Name                      [θ̂_L, θ̂_U]      CI_MA            CI_TI            rel. length
Ambiguity Aversion        [0.499, 0.557]   [0.459, 0.597]   [0.458, 0.598]   0.97
Effort: 1 cent bonus      [0.469, 0.484]   [0.448, 0.503]   [0.448, 0.504]   0.97
Effort: 0 cent bonus *    [0.343, 0.331]   [0.318, 0.356]   [0.315, 0.358]   0.91
Lying **                  [0.530, 0.537]   [0.512, 0.556]   [0.508, 0.560]   0.83
Time **                   [0.766, 0.770]   [0.722, 0.814]   [0.712, 0.824]   0.82
Trust Game 1              [0.430, 0.455]   [0.388, 0.493]   [0.387, 0.495]   0.96
Trust Game 2              [0.348, 0.398]   [0.328, 0.426]   [0.327, 0.427]   0.97
Ultimatum Game 1          [0.443, 0.470]   [0.422, 0.493]   [0.422, 0.494]   0.97
Ultimatum Game 2          [0.362, 0.413]   [0.342, 0.436]   [0.341, 0.436]   0.97

Table 2: Confidence intervals applied to data in de Quidt, Haushofer, and Roth (2018; compare select columns of their Table 1). Relative length refers to relative (of CI_MA over CI_TI) excess length beyond max{∆̂, 0}. Of special interest: Case (*) has inverted bound estimators, displayed with abuse of interval notation. Cases (**) are short (near point identified) estimated intervals.

De Quidt, Haushofer, and Roth (2018) estimate upper and lower bounds on behavioral parameters from different treatments in a between-subjects design, meaning that estimators are uncorrelated. At the same time, bounds can and did in fact invert, triggering an inquiry by the authors that led to the present paper.

Table 2 displays estimated bounds, CI_MA, and CI_TI for selected instances of the "weak bounds" data. This refers to a baseline setting before inducing experimenter demand. For more details, I refer to de Quidt, Haushofer, and Roth (2018), particularly their Figure 1 and corresponding explanations. The last column divides the length of CI_MA by the length of CI_TI, subtracting max{∆̂, 0} from both. Both intervals make full use of ρ = 0 being known. The comparison is between CI_MA and CI_TI; obviously, CI_TI ∪ CI_θ* would be larger than CI_TI.

The data include one case (*) where bound estimators are inverted and where, ex post, CI_MA = CI_θ*. [Footnote: This case would not have led any specification test to reject the model, even before taking multiple hypothesis testing into account.] There are also two cases (**) of short estimated intervals (relative to standard errors), i.e. of near point identification. Because CI_MA cannot be empty, one might have conjectured it to be the longer one in these cases. In fact, it is noticeably shorter in all of them: the effect of "spending coverage capital" from the nonemptiness correction dominates. In all other cases, both intervals effectively add 1.64 standard errors. [Footnote: In those cases, the small differences favoring CI_MA reflect Bonferroni adjustment for pre-tests, i.e. the specifics of Romano, Shaikh, and Wolf (2014). In cases where [θ_L, θ_U] is obviously "long," researchers will in practice be tempted to appeal to an asymptotic pre-test and just use 1.64 standard errors.]
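To illustrate the mechanics of case (*): the sketch below uses the inverted point estimates from Table 2 but purely hypothetical standard errors (the actual ones are not reproduced here). With these inputs, the set-directed piece of CI_MA is nonempty yet strictly contained in CI_θ*, so the union collapses to CI_θ*.

```python
from statistics import NormalDist

z1 = NormalDist().inv_cdf(0.95)   # one-sided critical value (rho = 0 known)
z2 = NormalDist().inv_cdf(0.975)  # two-sided critical value

# Point estimates from case (*) of Table 2; the standard errors are
# hypothetical round numbers, NOT the ones underlying Table 2.
th_lo, th_hi = 0.343, 0.331       # inverted bound estimates
se = 0.014                        # same hypothetical s.e. for both bounds

set_iv = (th_lo - z1 * se, th_hi + z1 * se)   # interval aimed at the set
th_star = (th_lo + th_hi) / 2                 # equal s.e.'s -> midpoint
se_star = se / (2 ** 0.5)                     # sqrt(2)*se*se/(2*se), rho = 0
star_iv = (th_star - z2 * se_star, th_star + z2 * se_star)
union = (min(set_iv[0], star_iv[0]), max(set_iv[1], star_iv[1]))
```

Mild inversion does not empty the set-directed interval, but the never-empty CI_θ* still determines both endpoints of the union, mirroring CI_MA = CI_θ* in case (*).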
Conclusion
For a simple, but empirically relevant, partial identification problem, I propose a confidence interval that has competitive size control and length, including in the misspecified case, while being extremely easy to compute. The most striking finding is that in many cases, a seemingly crude fix to a nominal 90% confidence interval ensures 95% coverage at little cost in terms of interval length and with practically zero computation. Simulations are encouraging, and the confidence interval improves on current best practice in application to recent lab experiments.

The approach is complementary to Andrews and Kwon (2019), from whom I take the broad motivation as well as the novel coverage requirement. Of course, their approach applies far beyond the present paper's simple setting. On the other hand, it has several tuning parameters and expands a conventional confidence interval, whereas the present proposal is tuning parameter free and compensates for expanding the conventional interval by reducing its standalone nominal coverage. A question of obvious interest, but also beyond my current reach, is whether this last feature can be usefully generalized. As it stands, the present proposal is limited to a specific setting but appears both practical and powerful when that setting obtains.
Proof of Theorem 1
Validity in the case ∆ < 0 is immediate: Only coverage of θ* is required in this case, and CI_θ* achieves that by itself. Since for ∆ ≥ 0 we have Θ*_I = Θ_I, it remains to show coverage of θ ∈ [θ_L, θ_U] assuming that θ_U ≥ θ_L. For the remainder of this proof, express the true value of θ as θ = λθ_U + (1 − λ)θ_L for some λ ∈ [0, 1].

Consider the interval CI*_MA, which is just like CI_MA except that, rather than by (2.2), a critical value c* is defined by setting inf_{∆≥0, λ∈[0,1]} Pr(E_{∆,λ,c*}) = 1 − α, where
\[
\begin{aligned}
E_{\Delta,\lambda,c} \equiv{} & \left\{ Z_1^* - \frac{\lambda}{\sigma_L}\Delta \le c \ \cap\ Z_2^* + \frac{1-\lambda}{\sigma_U}\Delta \ge -c \right\} \\
& \cup \left\{ Z_1^* + Z_2^* + \left( \frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L} \right)\Delta \in \left[ -\sqrt{2(1+\rho)}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \sqrt{2(1+\rho)}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right] \right\},
\end{aligned} \quad (A.1)
\]
\[
\begin{pmatrix} Z_1^* \\ Z_2^* \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\]
Note two differences to (2.2): The construction explicitly minimizes over both ∆ and λ, and it is infeasible in that population values of (σ_L, σ_U, ρ) are used.

Step 1 of the proof establishes validity of CI*_MA. Step 2 shows that λ can always be set to 1, transforming the above into (2.2). Step 3 establishes that if ρ = 0, one can furthermore take the limit as ∆ → ∞. The argument that (σ_L, σ_U, ρ) can be replaced with consistent estimators is omitted for brevity.

Step 1: Validity of CI*_MA. Write
\[
CI^*_{MA} = \left[ \hat\theta_L - \frac{\sigma_L}{\sqrt{n}}c^*,\ \hat\theta_U + \frac{\sigma_U}{\sqrt{n}}c^* \right] \cup \left[ \frac{\sigma_L\hat\theta_U + \sigma_U\hat\theta_L}{\sigma_L+\sigma_U} - \frac{\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right),\ \frac{\sigma_L\hat\theta_U + \sigma_U\hat\theta_L}{\sigma_L+\sigma_U} + \frac{\sigma^*}{\sqrt{n}}\,\Phi^{-1}\!\left(1-\frac{\alpha}{2}\right) \right],
\]
where σ* ≡ √(2(1+ρ)) σ_L σ_U/(σ_L + σ_U) is the asymptotic standard deviation of √n(λ*θ̂_U + (1 − λ*)θ̂_L − θ*) and λ* ≡ σ_L/(σ_L + σ_U) is the mixture weight characterizing θ*. Define also standardized estimation errors
\[
(\bar\varepsilon_L, \bar\varepsilon_U) \equiv \sqrt{n}\left( \frac{\hat\theta_L - \theta_L}{\sigma_L},\ \frac{\hat\theta_U - \theta_U}{\sigma_U} \right).
\]
We have that θ ∈ CI ∗ MA if eitherˆ θ L − σ L √ n c ∗ ≤ λθ U + (1 − λ ) θ L ≤ ˆ θ U + σ U √ n c ∗ ⇐⇒ ˆ θ L − θ L ≤ λ ∆ + σ L √ n c ∗ , ˆ θ U − θ U ≥ − (1 − λ )∆ − σ U √ n c ∗ ⇐⇒ ¯ ε L ≤ λσ L √ n ∆ + c ∗ , ¯ ε U ≥ − − λσ U √ n ∆ − c ∗ [13]r σ L ˆ θ U + σ U ˆ θ L σ L + σ U − ( λθ U + (1 − λ ) θ L ) ∈ (cid:20) − σ ∗ √ n Φ − (cid:0) − α (cid:1) , σ ∗ √ n Φ − (cid:0) − α (cid:1)(cid:21) ⇐⇒ σ L (cid:16) θ U + σ U ¯ ε U √ n (cid:17) + σ U (cid:16) θ L + σ L ¯ ε L √ n (cid:17) σ L + σ U − ( λθ U + (1 − λ ) θ L ) ∈ (cid:20) − σ ∗ √ n Φ − (cid:0) − α (cid:1) , σ ∗ √ n Φ − (cid:0) − α (cid:1)(cid:21) ⇐⇒ σ L σ U σ L + σ U (¯ ε L + ¯ ε U ) + √ n (cid:18) σ L θ U + σ U θ L σ L + σ U − ( λθ U + (1 − λ ) θ L ) (cid:19) ∈ (cid:104) − σ ∗ Φ − (cid:0) − α (cid:1) , σ ∗ Φ − (cid:0) − α (cid:1)(cid:105) ⇐⇒ ¯ ε L + ¯ ε U + √ n σ L θ U + σ U θ L − ( σ L + σ U )( λθ U + (1 − λ ) θ L ) σ L σ U ∈ (cid:104) − (cid:112) ρ Φ − (cid:0) − α (cid:1) , (cid:112) ρ Φ − (cid:0) − α (cid:1)(cid:105) ⇐⇒ ¯ ε L + ¯ ε U + (cid:18) − λσ U − λσ L (cid:19) √ n ∆ ∈ (cid:104) − (cid:112) ρ Φ − (cid:0) − α (cid:1) , (cid:112) ρ Φ − (cid:0) − α (cid:1)(cid:105) . 
In sum,

\[
\Pr(\theta \in CI^*_{MA}) = \Pr\left(\left\{\bar\varepsilon_L - \frac{\lambda}{\sigma_L}\sqrt{n}\,\Delta \le c^* \;\cap\; \bar\varepsilon_U + \frac{1-\lambda}{\sigma_U}\sqrt{n}\,\Delta \ge -c^*\right\} \cup \left\{\bar\varepsilon_L + \bar\varepsilon_U + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\sqrt{n}\,\Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right)
\]
\[
\to \Pr\left(\left\{Z^*_1 - \frac{\lambda}{\sigma_L}\sqrt{n}\,\Delta \le c^* \;\cap\; Z^*_2 + \frac{1-\lambda}{\sigma_U}\sqrt{n}\,\Delta \ge -c^*\right\} \cup \left\{Z^*_1 + Z^*_2 + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\sqrt{n}\,\Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right)
\]
\[
\ge \inf_{\Delta \ge 0,\, \lambda \in [0,1]} \Pr\left(\left\{Z^*_1 - \frac{\lambda}{\sigma_L}\Delta \le c^* \;\cap\; Z^*_2 + \frac{1-\lambda}{\sigma_U}\Delta \ge -c^*\right\} \cup \left\{Z^*_1 + Z^*_2 + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right) = 1-\alpha,
\]

where the convergence uses Assumption 1 and the next step uses the definition of \( c^* \) and also observes that, since we take an infimum over \( \Delta \ge 0 \), we can drop the \( \sqrt{n} \) premultiplying \( \Delta \). In what follows, let \( E_{\Delta,\lambda,c} \) denote the event inside the last probability, with generic critical value \( c \) in place of \( c^* \).

Step 2: Concentrating out \( \lambda \). We first concentrate out \( \lambda \), for which \( \{0, 1\} \) are equally least favorable if \( \Delta \) is unrestricted. To see this, consider the reparameterization

\[
(X_1, X_2) \equiv \left(\frac{Z^*_1 + Z^*_2}{\sqrt{2}}, \frac{Z^*_2 - Z^*_1}{\sqrt{2}}\right) \iff (Z^*_1, Z^*_2) = \left(\frac{X_1 - X_2}{\sqrt{2}}, \frac{X_1 + X_2}{\sqrt{2}}\right) \tag{A.2}
\]

and observe that \( (X_1, X_2) \) are uncorrelated. Simple algebra yields

\[
E_{\Delta,\lambda,c} = \left\{X_1 - X_2 - \frac{\lambda}{\sigma_L}\sqrt{2}\,\Delta \le \sqrt{2}\,c \;\cap\; X_1 + X_2 + \frac{1-\lambda}{\sigma_U}\sqrt{2}\,\Delta \ge -\sqrt{2}\,c\right\} \cup \left\{X_1 + \left(\frac{1-\lambda}{\sigma_U} - \frac{\lambda}{\sigma_L}\right)\frac{\Delta}{\sqrt{2}} \in \left[-\sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}.
\]

Consider minimizing \( \Pr(E_{\Delta,\lambda,c}) \) subject to the constraint that

\[
\Delta = \frac{\sigma_L\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\,\beta
\]

for some fixed value \( \beta \ge 0 \). This is without loss of generality since one can minimize over \( \beta \) in a second step and every value of \( (\Delta, \lambda) \in [0,\infty) \times [0,1] \) is consistent with some \( \beta \ge 0 \). Rewriting both events in the form "\( \cdots \le X_1 \le \cdots \)", one can write

\[
E_{\Delta,\lambda,c}\Big|_{\Delta = \frac{\sigma_L\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\beta} = \left\{-X_2 - \sqrt{2}\,c - \frac{(1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta \le X_1 \le X_2 + \sqrt{2}\,c + \frac{\lambda\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta\right\}
\]
\[
\cup \left\{\frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} - \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right) \le X_1 \le \frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} + \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right\}
\]

and therefore

\[
\Pr\left(E_{\Delta,\lambda,c} \,\middle|\, X_2 = x\right)\Big|_{\Delta = \frac{\sigma_L\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\beta} = \Pr\Bigg(X_1 \in \left[-x - \sqrt{2}\,c - \frac{(1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta,\; x + \sqrt{2}\,c + \frac{\lambda\sigma_U}{\lambda\sigma_U + (1-\lambda)\sigma_L}\sqrt{2}\,\beta\right]
\]
\[
\cup \left[\frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} - \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right),\; \frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} + \sqrt{1+\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\Bigg)
\]

with the understanding that the first interval above is empty for small enough \( x \). Irrespective of the value taken by \( x \), both intervals are centered at \( \frac{\lambda\sigma_U - (1-\lambda)\sigma_L}{\lambda\sigma_U + (1-\lambda)\sigma_L} \times \frac{\beta}{\sqrt{2}} \), an expression that increases in \( \lambda \) and takes value 0 at \( \lambda = \lambda^* = \sigma_L/(\sigma_L + \sigma_U) \). The intervals' length does not depend on \( \lambda \), and their union coincides with the larger of the two (whose identity depends on \( x \)). Again irrespective of the value of \( x \), \( X_1 \) is distributed normally around 0. By log-concavity of the normal distribution (or by taking derivatives), the above probability therefore increases in \( \lambda \) up to \( \lambda^* \) and decreases in \( \lambda \) thereafter conditionally on any \( x \), hence also unconditionally. Furthermore, plugging in \( \lambda \in \{0, 1\} \) reveals symmetry about 0: Switching \( \lambda \) from 0 to 1 is equivalent to leaving \( \lambda \) unchanged but replacing \( X_1 \) with \( -X_1 \). The probabilities of all intervals in the above display are, therefore, equally minimized at \( \lambda \in \{0, 1\} \) (although these minima correspond to different \( \Delta \)).
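The rotation (A.2) and the rewriting of \( E_{\Delta,\lambda,c} \) as a union of two intervals for \( X_1 \) are pointwise algebraic identities, so they can be verified draw by draw. The sketch below does this and also confirms that the two intervals share the claimed midpoint. The numerical constants (\( \alpha = 0.05 \), \( \rho = 0.3 \)) are illustrative assumptions; the identity itself does not depend on the joint law of \( (Z^*_1, Z^*_2) \).

```python
import math
import random

random.seed(1)
z = 1.959963984540054        # Phi^{-1}(1 - alpha/2), alpha = 0.05 (illustrative)
rho = 0.3                    # illustrative; the identity below is pointwise algebra

for _ in range(1000):
    sL, sU = random.uniform(0.5, 2.0), random.uniform(0.5, 2.0)
    lam, c = random.random(), random.uniform(0.5, 3.0)
    beta = random.uniform(0.0, 3.0)
    den = lam * sU + (1 - lam) * sL
    dlt = sL * sU * beta / den            # the constraint on Delta
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)

    # event E_{Delta,lambda,c} in (Z1, Z2) coordinates
    ev_z = ((z1 - lam / sL * dlt <= c and z2 + (1 - lam) / sU * dlt >= -c)
            or abs(z1 + z2 + ((1 - lam) / sU - lam / sL) * dlt)
            <= math.sqrt(2 + 2 * rho) * z)

    # rotation (A.2) and the event as a union of two intervals for X1
    x1 = (z1 + z2) / math.sqrt(2)
    x2 = (z2 - z1) / math.sqrt(2)
    lo = -x2 - math.sqrt(2) * c - (1 - lam) * sL / den * math.sqrt(2) * beta
    hi = x2 + math.sqrt(2) * c + lam * sU / den * math.sqrt(2) * beta
    ctr = (lam * sU - (1 - lam) * sL) / den * beta / math.sqrt(2)
    hw = math.sqrt(1 + rho) * z           # half-width of the second interval
    ev_x = (lo <= x1 <= hi) or (ctr - hw <= x1 <= ctr + hw)

    assert ev_z == ev_x                     # same event after the rotation
    assert abs((lo + hi) / 2 - ctr) < 1e-9  # both intervals share the midpoint

print("reparameterization and common centering verified")
```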
This establishes that, if both of \( (\Delta, \lambda) \) are concentrated out globally, one can restrict attention to one of \( \lambda = 0 \) or \( \lambda = 1 \). We finally observe that the way in which \( \sigma_L \) enters

\[
E_{\Delta,1,c} = \left\{Z^*_1 - \frac{\Delta}{\sigma_L} \le c \;\cap\; Z^*_2 \ge -c\right\} \cup \left\{Z^*_1 + Z^*_2 - \frac{\Delta}{\sigma_L} \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}
\]

allows the simplification

\[
\inf_{\Delta \ge 0} \Pr(E_{\Delta,1,c}) = \inf_{\Delta \ge 0} \Pr\left(\left\{Z^*_1 - \Delta \le c \;\cap\; Z^*_2 \ge -c\right\} \cup \left\{Z^*_1 + Z^*_2 - \Delta \in \left[-\sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right), \sqrt{2+2\rho}\,\Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\right]\right\}\right).
\]

Step 3: For \( \rho = 0 \), concentrating out \( \Delta \). For the remainder of this proof, suppose \( \rho = 0 \). In view of step 2, also restrict attention to \( \lambda = 1 \). This step's main claim is that \( \Pr(E_{\Delta,1,c}) \) is first increasing and then decreasing (possibly, although not in fact, all increasing or all decreasing) in \( \Delta \ge 0 \). Suppose the claim is true; then it follows that \( \inf_{\Delta \in [0,\infty)} \Pr(E_{\Delta,1,c}) \) is attained either at \( \Delta = 0 \) or as \( \Delta \to \infty \). In the former case, \( \theta_U = \theta_L = \theta^* \), so that \( CI^*_{MA} \) is obviously conservative. The latter limit is easily seen to equal \( 1-\alpha \), and this is indeed the (unattained) infimal coverage.

It remains to show the main claim. Write \( \gamma = \sqrt{2}\,\Phi^{-1}(1-\alpha/2) \), so that (imposing \( \rho = 0 \)) we have

\[
E_{\Delta,1,c} = \left\{Z^*_1 - \Delta \le c \;\cap\; Z^*_2 \ge -c\right\} \cup \left\{Z^*_1 + Z^*_2 - \Delta \in [-\gamma, \gamma]\right\},
\]

where \( (Z^*_1, Z^*_2) \) is bivariate standard normal. We will henceforth think of \( \Pr(E_{\Delta,1,c}) \) as a function of \( \Delta \) with \( (c, \gamma) \) fixed. Note that the condition on critical values translates as \( 2c \ge \gamma \).

Using \( \Phi(\cdot) \) and \( \phi(\cdot) \) for the standard normal distribution and density functions, write

\[
\Pr(E_{\Delta,1,c} \mid Z^*_2 = z) = \begin{cases} \Phi(\gamma + \Delta - z) - \Phi(-\gamma + \Delta - z), & z < -c \\ \Phi(\gamma + \Delta - z), & -c \le z \le -c + \gamma \\ \Phi(\Delta + c), & z > -c + \gamma \end{cases}
\]

and therefore (the last step below will be elaborated after the display)

\[
\frac{d\Pr(E_{\Delta,1,c})}{d\Delta} = \frac{d\int_{-\infty}^{\infty} \Pr(E_{\Delta,1,c} \mid Z^*_2 = z)\,\phi(z)\,dz}{d\Delta} = \int_{-\infty}^{-c+\gamma} \phi(\gamma + \Delta - z)\phi(z)\,dz - \int_{-\infty}^{-c} \phi(-\gamma + \Delta - z)\phi(z)\,dz + \int_{-c+\gamma}^{\infty} \phi(\Delta + c)\phi(z)\,dz
\]
\[
= \underbrace{\frac{1}{\sqrt{2}}\left(\phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right) - \phi\!\left(\frac{-\gamma + \Delta}{\sqrt{2}}\right)\right)\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)}_{A} + \underbrace{\phi(\Delta + c)\,\Phi(c - \gamma)}_{B}. \tag{A.3}
\]

To see the last step, note first that \( \int_{-c+\gamma}^{\infty} \phi(\Delta + c)\phi(z)\,dz \) simplifies to \( B \). Next,

\[
(Z^*_1, Z^*_2) = (\gamma + \Delta - z, z) \iff (X_1, X_2) = \left(\frac{\gamma + \Delta}{\sqrt{2}}, \frac{2z - \gamma - \Delta}{\sqrt{2}}\right),
\]

where \( (X_1, X_2) \) is as in (A.2). Because \( \rho = 0 \) implies that \( (X_1, X_2) \) is standard bivariate normal, we have

\[
\int_{-\infty}^{-c+\gamma} \phi(\gamma + \Delta - z)\phi(z)\,dz = \int_{-\infty}^{-c+\gamma} \phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)\phi\!\left(\frac{2z - \gamma - \Delta}{\sqrt{2}}\right)dz = \frac{1}{\sqrt{2}}\int_{-\infty}^{(\gamma - \Delta - 2c)/\sqrt{2}} \phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)\phi(t)\,dt = \frac{1}{\sqrt{2}}\,\phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right).
\]
A similar computation for \( \int_{-\infty}^{-c} \phi(-\gamma + \Delta - z)\phi(z)\,dz \) and rearrangement of terms yield term \( A \) in (A.3).

Term \( A \) equals zero at \( \Delta = 0 \) and then becomes negative. Term \( B \) is positive throughout. Because all terms vanish as \( \Delta \to \infty \), it is not useful to directly take further derivatives. However, we can compare the terms' relative magnitude. In particular, we will see that \( |A|/|B| \) increases in \( \Delta \), hence \( d\Pr(E_{\Delta,1,c})/d\Delta \) has at most one sign change, and that sign change (if it occurs) is from positive to negative, establishing the claim.

To see monotonicity of \( |A|/|B| \), write

\[
\frac{|A|}{|B|} = \frac{1}{\sqrt{2}} \times \frac{\phi\!\left(\frac{-\gamma + \Delta}{\sqrt{2}}\right) - \phi\!\left(\frac{\gamma + \Delta}{\sqrt{2}}\right)}{\phi(\Delta + c)} \times \frac{\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)}{\Phi(c - \gamma)} = \frac{1}{\sqrt{2}} \times \frac{\exp\!\left(-\frac{\gamma^2 + \Delta^2 - 2\gamma\Delta}{4}\right) - \exp\!\left(-\frac{\gamma^2 + \Delta^2 + 2\gamma\Delta}{4}\right)}{\exp\!\left(-\frac{\Delta^2 + c^2 + 2\Delta c}{2}\right)} \times \frac{\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)}{\Phi(c - \gamma)}
\]
\[
= \left(\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right) \times \text{const.},
\]

where "const." absorbs terms that do not depend on \( \Delta \). The derivative of this expression with respect to \( \Delta \) (dropping the multiplicative constant) is

\[
\underbrace{\left(\frac{\Delta + 2c + \gamma}{2}\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \frac{\Delta + 2c - \gamma}{2}\exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)}_{C}\,\Phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right) - \frac{1}{\sqrt{2}}\left(\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)\phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right)
\]
\[
\ge C\,\frac{\frac{\Delta + 2c - \gamma}{\sqrt{2}}}{\left(\frac{\Delta + 2c - \gamma}{\sqrt{2}}\right)^2 + 1}\,\phi\!\left(\frac{\Delta + 2c - \gamma}{\sqrt{2}}\right) - \frac{1}{\sqrt{2}}\left(\exp\!\left(\frac{\Delta^2}{4} + \Delta c + \frac{\gamma\Delta}{2}\right) - \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right)\right)\phi\!\left(\frac{\gamma - \Delta - 2c}{\sqrt{2}}\right),
\]

using that \( C \ge 0 \) and \( \Phi(-t) \ge \frac{t}{t^2+1}\,\phi(t) \). In order to sign this, divide through by \( \phi(\cdots) \) (both values are the same by symmetry of \( \phi(\cdot) \)) as well as by \( \exp\!\left(\frac{\Delta^2}{4} + \Delta c - \frac{\gamma\Delta}{2}\right) \), multiply through by \( \sqrt{2}\left(\left(\frac{\Delta + 2c - \gamma}{\sqrt{2}}\right)^2 + 1\right) \), and rearrange terms to conclude that the last expression above has the same sign as

\[
\left(\frac{\Delta + 2c + \gamma}{\sqrt{2}} \times \frac{\Delta + 2c - \gamma}{\sqrt{2}} - \left(\frac{(\Delta + 2c - \gamma)^2}{2} + 1\right)\right)\exp(\gamma\Delta) - \left(\frac{\Delta + 2c - \gamma}{\sqrt{2}} \times \frac{\Delta + 2c - \gamma}{\sqrt{2}} - \left(\frac{(\Delta + 2c - \gamma)^2}{2} + 1\right)\right)
\]
\[
= \left(\frac{(\Delta + 2c + \gamma)(\Delta + 2c - \gamma)}{2} - \frac{(\Delta + 2c - \gamma)^2}{2} - 1\right)\exp(\gamma\Delta) + 1 = \left(\gamma(\Delta + 2c - \gamma) - 1\right)\exp(\gamma\Delta) + 1.
\]

At \( \Delta = 0 \), this simplifies to \( \gamma(2c - \gamma) \) and therefore is nonnegative if \( 2c \ge \gamma \). But one can also write

\[
\frac{d}{d\Delta}\left(\left(\gamma(\Delta + 2c - \gamma) - 1\right)\exp(\gamma\Delta) + 1\right) = \gamma\exp(\gamma\Delta) + \left(\gamma(\Delta + 2c - \gamma) - 1\right)\gamma\exp(\gamma\Delta) = \gamma^2(\Delta + 2c - \gamma)\exp(\gamma\Delta),
\]

which is again nonnegative if \( 2c \ge \gamma \). Thus, \( |A|/|B| \) is nondecreasing in \( \Delta \) for all \( \Delta \ge 0 \). This concludes the proof.

References

Andrews, D. W. K. (2000): "Inconsistency of the Bootstrap When a Parameter is on the Boundary of the Parameter Space," Econometrica, 68(2), 399–405.
Andrews, D. W. K., and P. J. Barwick (2012): "Inference for Parameters Defined by Moment Inequalities: A Recommended Moment Selection Procedure," Econometrica, 80(6), 2805–2826.

Andrews, D. W. K., and S. Kwon (2019): "Inference in Moment Inequality Models That Is Robust to Spurious Precision under Model Misspecification," Cowles Foundation Discussion Paper CFDP 2184R.

Andrews, D. W. K., and X. Shi (2013): "Inference Based on Conditional Moment Inequalities," Econometrica, 81(2), 609–666.

Andrews, D. W. K., and G. Soares (2010): "Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection," Econometrica, 78(1), 119–157.

Andrews, I., J. Roth, and A. Pakes (2019): "Inference for Linear Conditional Moment Inequalities," arXiv:1909.10062.

Bugni, F. A. (2010): "Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Identified Set," Econometrica, 78(2), 735–753.

Bugni, F. A., I. A. Canay, and X. Shi (2015): "Specification Tests for Partially Identified Models Defined by Moment Inequalities," Journal of Econometrics, 185(1), 259–282.

Bugni, F. A., I. A. Canay, and X. Shi (2017): "Inference for Subvectors and Other Functions of Partially Identified Parameters in Moment Inequality Models," Quantitative Economics, 8(1), 1–38.

Canay, I. (2010): "EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity," Journal of Econometrics, 156(2), 408–425.

Canay, I. A., and A. M. Shaikh (2017): Practical and Theoretical Advances in Inference for Partially Identified Models, vol. 2 of Econometric Society Monographs, pp. 271–306. Cambridge University Press.

Chernozhukov, V., S. Lee, and A. M. Rosen (2013): "Intersection Bounds: Estimation and Inference," Econometrica, 81(2), 667–737.

Cox, G., and X. Shi (2020): "Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference in Moment Inequality Models," arXiv:1907.06317.

de Quidt, J., J. Haushofer, and C. Roth (2018): "Measuring and Bounding Experimenter Demand," American Economic Review, 108(11), 3266–3302.

Horowitz, J. L., and C. F. Manski (1995): "Identification and Robustness with Contaminated and Corrupted Data," Econometrica, 63(2), 281–302.

Imbens, G. W., and C. F. Manski (2004): "Confidence Intervals for Partially Identified Parameters," Econometrica, 72(6), 1845–1857.

Kaido, H., F. Molinari, and J. Stoye (2019): "Confidence Intervals for Projections of Partially Identified Parameters," Econometrica, 87(4), 1397–1432.

Kaido, H., and H. White (2013): Estimating Misspecified Moment Inequality Models, pp. 331–361. Springer, New York.

Lee, D. S. (2009): "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects," The Review of Economic Studies, 76(3), 1071–1102.

Leeb, H., and B. M. Pötscher (2005): "Model Selection and Inference: Facts and Fiction," Econometric Theory, 21(1), 21–59.

Manski, C. F. (2003): Partial Identification of Probability Distributions (Springer Series in Statistics). Springer-Verlag, Berlin.

Molinari, F. (2020): "Microeconometrics with Partial Identification," Handbook of Econometrics, forthcoming.

Ponomareva, M., and E. Tamer (2011): "Misspecification in Moment Inequality Models: Back to Moment Equalities?," The Econometrics Journal, 14(2), 186–203.

Romano, J. P., and A. M. Shaikh (2008): "Inference for Identifiable Parameters in Partially Identified Econometric Models," Journal of Statistical Planning and Inference, 138(9), 2786–2807.

Romano, J. P., A. M. Shaikh, and M. Wolf (2014): "A Practical Two-Step Method for Testing Moment Inequalities," Econometrica, 82(5), 1979–2002.

Stoye, J. (2009): "More on Confidence Regions for Partially Identified Parameters," Econometrica, 77(4), 1299–1315.

Tamer, E. (2010): "Partial Identification in Econometrics," Annual Review of Economics, 2(1), 167–195.

White, H. (1982): "Maximum Likelihood Estimation of Misspecified Models," Econometrica, 50(1), 1–25.