A Test of Non-Identifying Restrictions and Confidence Regions for Partially Identified Parameters

Abstract

We propose an easily implementable test of the validity of a set of theoretical restrictions on the relationship between economic variables, which do not necessarily identify the data generating process. The restrictions can be derived from any model of interactions, allowing censoring and multiple equilibria. When the restrictions are parameterized, the test can be inverted to yield confidence regions for partially identified parameters, thereby complementing other proposals, primarily Chernozhukov et al. [Chernozhukov, V., Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets in econometric models. Econometrica 75, 1243-1285].


Alfred Galichon and Marc Henry
École polytechnique, Paris and Université de Montréal

First draft: September 15, 2005
This draft: April 16, 2008

JEL Classification: C10, C12, C13, C14, C52, C61
Keywords: partial identification, mass transportation, specification test.

This research was partly carried out while the first author was visiting the Bendheim Center for Finance, Princeton University, and financial support from NSF grant SES 0350770 to Princeton University, from NSF grant SES 0532398, from the Program for Economic Research at Columbia University and from Chaire EDF-Calyon "Finance et Développement Durable" is gratefully acknowledged. We are grateful to Victor Chernozhukov, Pierre-André Chiappori, Guido Imbens and Bernard Salanié for encouragement, support and many helpful discussions. We also thank three anonymous referees, whose detailed and insightful comments helped significantly improve the paper, and we thank conference participants at Econometrics in Rio and seminar participants at Berkeley, Chicago, Columbia, École polytechnique, Harvard-MIT, MIT Sloan OR, Northwestern, NYU, Princeton, SAMSI, Stanford, the Weierstrass Institut and Yale for helpful comments (with the usual disclaimer). Correspondence address: Département d'économie, École polytechnique, 91128 Palaiseau, France and Département de sciences économiques, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal QC H3C 3J7, Canada.

Introduction

In several rapidly expanding areas of economic research, the identification problem is steadily becoming more acute. In policy and program evaluation (Manski, 1990) and more general contexts with censored or missing data (Molinari, 2003, Magnac and Maurin, 2008) and measurement error (Chen et al., 2005), ad hoc imputation rules lead to fragile inference. In demand estimation based on revealed preference (Blundell et al., 2005) the data is generically insufficient for identification.
In the analysis of social interactions (Brock and Durlauf, 2007, Manski, 2004), complex strategies to reduce the large dimensionality of the correlation structure are needed. In the estimation of models with complex strategic interactions and multiple equilibria (Tamer, 2003, D. Andrews et al., 2003, Pakes et al., 2004), assumptions on equilibrium selection mechanisms may not be available or acceptable.

More generally, in all areas of investigation with structural data insufficiencies or incompletely specified economic mechanisms, the hypothesized structure fails to identify a unique possible generating mechanism for the data that is actually observed. Hence, when the structure depends on unknown parameters, and even if a unique value of the parameter can still be construed as the true value in some well defined way, it does not correspond in a one-to-one mapping with a probability measure for the observed variables. We then call the structural restrictions non-identifying. In other words, even if we abstract from sampling uncertainty and assume the distribution of the observable variables is perfectly known, no unique parameter but a whole set of parameter values (hereafter called identified set in the terminology of Manski, 2005) will be compatible with it.

Once a theoretical description of an economic system is given, a natural question to consider is whether the structure can be rejected on the basis of data on its observable components. Marschak and Andrews, 1944 construct a collection of production functions that are compatible with structural restrictions and are not rejected by the data. We extend this approach within the general formulation of Koopmans and Reiersol, 1950, who define a structure as the combination of a binary relation between observed socioeconomic variables (market entry, insurance coverage, winning bids in auctions, etc.) and unobserved ones (productivity shocks, risk level, or risk attitude, valuations or information depending on the auction paradigm, etc.) and a generating mechanism for the unobserved variables. This setup is employed by Roehrig, 1988 and Matzkin, 1994, who analyze conditions for nonparametric identification of structures where the endogenous observable variables are functions of unobservable variables and exogenous observable ones.

Here, following Jovanovic, 1989, we allow the relation between observable and unobservable variables to be many-to-many, thereby including structures with multiple equilibria (when a value of the latent variables is associated with a set of values of the observable variables) and censored endogenous observable variables (where a value of the observable variable is associated with a set of values of the latent variables). We do not strive for identification conditions, but rather for the ability to reject such structures that are incompatible with data, as in the original work of Marschak and Andrews, 1944.

We show that such a goal can be attained in all generality (i.e. for any structure, involving discrete as well as continuous observable variables), through an appeal to the duality of mass transportation (see Villani, 2003 for a comprehensive account of the theory). Given any set of (possibly non-identifying) restrictions on the relation between latent and observable variables, and given the distribution $\nu$ of latent variables, the structure thus defined is compatible with the true distribution $P$ of the observable variables if and only if there exists a joint distribution with marginals $P$ and $\nu$ and such that the restrictions are almost surely respected. Otherwise, the data could not have been generated in such a way.
We show that the latter condition can be formulated as a mass transportation problem (the problem of transporting a given distribution of mass from an initial location to a different distribution of mass in a final location while minimizing a certain cost of transportation, as originally formulated by Monge, 1781). We show that this optimization problem has a dual formulation, an empirical version of which is a generalized Kolmogorov-Smirnov test statistic. We base a test of the restrictions in the structure on this statistic, whose asymptotic distribution we derive and approximate using the bootstrapped empirical process.

Once we have a test of the structure, we can form confidence regions for unknown parameters using the methodology of Anderson and Rubin, 1949, which consists in collecting all parameter values for which the structure is not rejected by the test at the desired significance level. The construction of such confidence regions has been the focus of much research lately (see for instance the thorough literature review in Chernozhukov et al., 2007). Unlike much of the econometric research on this issue, we do not restrict the analysis to models defined by moment inequalities. On the other hand, we consider structures in the sense of Koopmans and Reiersol, 1950, and hence parametric distributions for the latent variables. This, however, is a common assumption in empirical work with game theoretic models, as exemplified by D. Andrews et al., 2003, Ciliberto and Tamer, 2006, and more generally Ackerberg et al., 2007.

The paper is organized as follows. The next section is divided in four subsections. The first describes the setup; the second defines the hypothesis of compatibility of the structure with the data; the third explains how to construct a confidence region for the identified set, and the fourth reviews the related literature.
The second section is divided in three subsections. The first subsection describes and justifies the generalized Kolmogorov-Smirnov test of compatibility of the structure with the data; the second shows consistency of the test, and the third investigates size properties of the test in a Monte Carlo experiment. The last section concludes.

Consider the model of an economy which is composed of an observed variable $Y$ and a latent, unobserved variable $U$. Formally, $(Y, U)$ is a pair of random vectors defined on a common probability space. The pair $(Y, U)$ has probability law $\pi$, which is unknown. $Y$ represents the variables that are observable, and $U$ the variables that are unobservable. $Y$ may have discrete and continuous components. $Y$ may include variables of interest in their own right, and randomly censored or otherwise transformed versions of variables of interest. We call the law of the observable variables $P$. It is unknown, but the data available is a sample of independent and identically distributed vectors $(Y_1, \ldots, Y_n)$ with law $P$. $U$ includes random shocks and other unobserved heterogeneity components. The law $\pi$ of $(Y, U)$ can be decomposed into the unconditional distribution $P$ of $Y$ and the conditional distribution of $U$ given $Y$, namely $\pi_{U|Y}$. Throughout the paper it is supposed that $\pi_{U|Y}$ is unknown but fixed across observations.

The distribution of $U$ is parameterized by a vector $\theta_1 \in \Theta_1$, where $\Theta_1$ is an open subset of $\mathbb{R}^{d_1}$, and the law of $U$ is denoted $\nu_{\theta_1}$. Finally, an economic model is given to us in the form of a set of restrictions on the vector $(Y, U)$, which can be summarized without loss of generality by the relation $U \in \Gamma_{\theta_2}(Y)$, where $\Gamma_{\theta_2}$ is a many-to-many mapping, which is completely given except for the vector of structural parameters $\theta_2 \in \Theta_2$, where $\Theta_2$ is an open subset of $\mathbb{R}^{d_2}$. $\theta_1$ and $\theta_2$ may contain common components. We call $\theta$ the combination of the two, so that $\theta \in \Theta$, with $\Theta$ an open subset of $\mathbb{R}^{d_\theta}$, and $d_\theta \le d_1 + d_2$. From now on, we shall therefore denote the distribution of $U$ by $\nu_\theta$ and the many-to-many mapping by $\Gamma_\theta$. In all that follows, we assume that $\Gamma_\theta$ is measurable (a very weak requirement which is defined in the appendix), and has non-empty and closed values.

We are interested in testing the compatibility of the observed variables $Y$ with the model described by $(\Gamma, \nu)$. A related question is set-inference in a parametric model $(\Gamma_\theta, \nu_\theta)$: a confidence region for $\theta$ can be obtained by inverting the specification test, namely retaining the values of $\theta$ which are not rejected.
Note that if $\theta = (\beta, \eta)$, where $\beta$ are the parameters of interest and $\eta \in H$ are nuisance parameters, we can redefine the economic model restrictions as $U \in \Gamma_\beta(Y)$, where $\Gamma_\beta$ is defined by $\Gamma_\beta(y) = \bigcup_{\eta \in H} \Gamma_{(\beta,\eta)}(y)$ for all $y \in \mathbb{R}^{d_y}$. Hence we can assume again without loss of generality that $\theta$ is indeed the parameter of interest. As the main focus of the present paper is to derive a specification test, whenever there is no ambiguity we shall implicitly fix the parameter $\theta$ and drop it from our notation.

Example 1.

A prominent example for this set-up is provided by the class of models defined by a static game of interaction. Consider a game where the payoff function for player $j$, $j = 1, \ldots, J$, is given by $\Pi_j(S_j, S_{-j}, X_j, U_j; \theta)$, where $S_j$ is player $j$'s strategy and $S_{-j}$ is their opponents' strategies. $X_j$ is a vector of observable characteristics of player $j$ and $U_j$ a vector of unobservable determinants of the payoff. Finally, $\theta$ is a vector of parameters. Pure strategy equilibrium conditions define a many-to-many mapping $\Gamma_\theta$ from unobservable player characteristics $U$ to observable variables $Y = (S, X)$. More precisely, $\Gamma_\theta(s, x) = \{u \in \mathbb{R}^J : \Pi_j(s_j, s_{-j}, x_j, u_j; \theta) \ge \Pi_j(S, s_{-j}, x_j, u_j; \theta) \text{ for all } S \text{ and all } j\}$. When the strategies are discrete, this is the set-up considered by D. Andrews et al., 2003, Pakes et al., 2004, and Ciliberto and Tamer, 2006.

A special case of the latter example is given in Jovanovic, 1989 and will serve as our first illustrative example:

Pilot Example 1.

The payoff functions are $\Pi_1(Y_1, Y_2, U_1, U_2) = (\theta Y_2 - U_1) 1\{Y_1 = 1\}$ and $\Pi_2(Y_1, Y_2, U_1, U_2) = (\theta Y_1 - U_2) 1\{Y_2 = 1\}$, where $Y_i \in \{0, 1\}$ is firm $i$'s action, and $U = (U_1, U_2)'$ are exogenous costs. The firms know their costs; the analyst, however, knows only that $U$ is uniformly distributed on $[0,1]^2$, and that the structural parameter $\theta$ is in $(0,1]$. There are two pure strategy Nash equilibria. The first is $Y_1 = Y_2 = 0$ for all $U \in [0,1]^2$. The second is $Y_1 = Y_2 = 1$ for all $U \in [0,\theta]^2$ and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable $Y = Y_1 = Y_2$. Hence the structure is described by the many-to-many mapping: $\Gamma_\theta(1) = [0,\theta]^2$ and $\Gamma_\theta(0) = [0,1]^2$. In this case, since $Y$ is Bernoulli, we can characterize $P$ with the probability $p$ of observing a 1.

A second example illustrates the case with continuous observable variables:

Pilot Example 2.

Tinbergen, 1951 first spelt out the implications of skill and job requirement heterogeneity on the distribution of wages. We adopt a simplified version of the skill versus job requirements relation for illustrative purposes. Suppose one observes available jobs in an economy, each characterized by a set of characteristics $Y$ with distribution $P$. Workers' skills are unobserved, and are assumed for illustrative purposes to be characterized by an index $U \in \mathbb{R}$. Fulfillment of job $Y$ is known to require a range of skills $\Gamma_\theta(Y) = [\underline{s}_\theta(Y), \overline{s}_\theta(Y)]$. The distribution of skills is parameterized by $\nu_\theta$.

Identification of the parameter $\theta$ would require the correspondence between the law of the observations $P$ and the parameter vector $\theta$ to be a function. Compared to the setup described in Roehrig, 1988, there is the added complexity of the possibility that the observable variables have discrete components, and that the structure allows multiple equilibria. Conditions ensuring identification are likely to prove complicated and restrictive, and will often rule out multiple equilibria, which is the norm rather than the exception in example 1. We therefore eschew identification, and allow the relation between $P$ and $\theta$ to be many-to-many. Our objective is to conduct inference on the set $\Theta_I$ of parameter values that are compatible with the true law of the observable variables $P$.

Let us formally define compatibility of a given value $\theta$ of the parameter vector with a law $P$ for the observable variables $Y$. When $\theta$ is fixed, all the elements in the model are completely known. We therefore have a structure in the terminology of Koopmans and Reiersol, 1950 extended by Jovanovic, 1989. The structure is given by the law $\nu_\theta$ for $U$, and the many-to-many mapping $\Gamma_\theta$ linking $Y$ and $U$. We denote this structure by the triple $(P, \Gamma_\theta, \nu_\theta)$. Consider now the restrictions that $(P, \Gamma_\theta, \nu_\theta)$ imposes on the unknown $\pi$, the law of the vector of variables $(Y, U)$:
• Its marginal with respect to $Y$ is $P$,
• Its marginal with respect to $U$ is $\nu_\theta$,
• The economic restrictions $U \in \Gamma_\theta(Y)$ hold $\pi$-almost surely.

A probability law $\pi$ that satisfies the restrictions above may or may not exist. If and only if it does, we say that the structure $(P, \Gamma_\theta, \nu_\theta)$ is internally consistent, or simply that the value $\theta$ of the parameter is compatible with the law $P$ of the observable variables. If no value $\theta$ is found such that the structure is internally consistent, then the model restrictions are rejected.

Definition 1.

A structure $(P, \Gamma, \nu)$ for $(Y, U)$ given by a probability law $P$ for $Y$, a probability law $\nu$ for $U$ and a set of restrictions $U \in \Gamma(Y)$ is called internally consistent if there exists a law $\pi$ for the vector $(Y, U)$ with marginals $P$ and $\nu$ such that $\pi(\{U \in \Gamma(Y)\}) = 1$.

We can now define the identified set as the set of values of the parameters that achieve this internal consistency. They are observationally equivalent, since even though they may correspond to different $\pi$'s, they correspond to the same $P$.

Definition 2.

The identified set $\Theta_I = \Theta_I(P)$ is the set of values $\theta$ of the parameter vector such that the structure $(P, \Gamma_\theta, \nu_\theta)$ is internally consistent.

We illustrate the previous definitions with our pilot example:

Pilot example 1 continued

For a given value of $\theta$, the structure $(P, \Gamma_\theta, \nu_\theta)$ is defined by $p$, $\Gamma_\theta$ and the uniform distribution $\nu_\theta$ on $[0,1]^2$. $(P, \Gamma_\theta, \nu_\theta)$ is internally consistent if there exists a probability on $\{0,1\} \times [0,1]^2$ with marginal frequency $p$ of observing $Y = 1$, and uniform marginal distribution for the costs $U$, such that $Y = 1 \Rightarrow U \le \theta$ almost surely (where the last inequality is meant coordinate by coordinate).

The previous example illustrates the fact that definition 1 is not very easy to apply to derive the identified set in specific problems. We therefore propose a characterization of internal consistency which will prove more practical, and which, as we shall see in the next section, will motivate the construction of the statistic to test internal consistency.

Proposition 1.

A structure $(P, \Gamma, \nu)$ is internally consistent if and only if $\sup_{A \in \mathcal{B}} [P(A) - \nu(\Gamma(A))] = 0$, where $\mathcal{B}$ is the collection of measurable sets in the space of realizations of $Y$.

This proposition shows that checking internal consistency of a structure is equivalent to checking that the $P$-measure of a set is always dominated by the $\nu$-measure of the image of this set by $\Gamma$ (recall that the image of a set $A$ by a many-to-many mapping is defined by $\Gamma(A) = \bigcup_{a \in A} \Gamma(a)$). Note that it is relatively easy to show necessity, i.e. that the existence of $\pi$ satisfying the constraints (the definition of internal consistency) implies that $\sup_{A \in \mathcal{B}} [P(A) - \nu(\Gamma(A))] = 0$. Indeed, the definition of internal consistency implies that $Y \in A \Rightarrow U \in \Gamma(A)$, so that $1\{Y \in A\} \le 1\{U \in \Gamma(A)\}$, $\pi$-almost surely. Taking expectations, we have $E_\pi(1\{Y \in A\}) \le E_\pi(1\{U \in \Gamma(A)\})$, which yields $P(A) \le \nu(\Gamma(A))$ for every $A$, since $\pi$ has marginals $P$ and $\nu$; the supremum is then zero because $A = \emptyset$ gives equality. The converse (proved in the appendix) is far more involved, as it relies on mass transportation duality, where mass $P$ is transported into mass $\nu$ with 0-1 cost of transportation associated with violations of the restrictions $U \in \Gamma(Y)$.

Pilot example 1 continued

For a given $\theta$, it is now very easy to derive the condition for internal consistency of the structure. Indeed, all we need to check is that $\sup_{A \in 2^{\{0,1\}}} [P(A) - \nu_\theta(\Gamma_\theta(A))] = 0$ (where $2^B$ is the collection of all subsets of a set $B$), which only constrains $P(\{1\}) \le \nu_\theta([0,\theta]^2)$, hence $p \le \theta^2$. So the identified set for the structural parameter is $\Theta_I = [\sqrt{p}, 1]$.

Remark 1.

Further dimension reduction requires the determination of classes of sets $A$ on which to check the inequality between $P(A)$ and $\nu(\Gamma(A))$. This is needed for instance when the observable variables are discrete and take many different values, since checking the inequality for all subsets of the set of possible values would involve a very large number of operations. Galichon and Henry, 2006b address this issue with a theory of core determining classes.

Pilot example 2 continued

Fixing $\theta$ (and dropping it from the notation), the necessary and sufficient condition for internal consistency of the structure is that $P(A) \le \nu(\Gamma(A))$ for any measurable set $A$. Suppose for expositional purposes that the jobs are characterized by a real valued random variable $Y$, and that required skills are monotone in the sense that $\underline{s}$ and $\overline{s}$ are nondecreasing. As shown in Galichon and Henry, 2006b, the inequality needs to be checked only on sets of the form $A = (-\infty, y]$ and $A = (y, +\infty)$, for $y \in \mathbb{R}$, so that a necessary and sufficient condition for internal consistency of the structure is that $F_\nu(\underline{s}(y)) \le F(y) \le F_\nu(\overline{s}(y))$, where $F$ is the cumulative distribution function of jobs $Y$, and $F_\nu$ is the cumulative distribution function of skills $U$.

Given a sample $(Y_1, \ldots, Y_n)$ of independently and identically distributed realizations of $Y$, our objective is to construct a sequence of random sets $\Theta^\alpha_n$ such that for all $\theta \in \Theta_I$, $\lim_{n \to \infty} \Pr(\theta \in \Theta^\alpha_n) = 1 - \alpha$. In other words, we are concerned with constructing a region $\Theta^\alpha_n$ that covers each value of the identified set, as opposed to a region $\tilde\Theta$ that covers the identified set uniformly, i.e. such that $\Pr(\Theta_I \subseteq \tilde\Theta) = 1 - \alpha$. We do so by including in $\Theta^\alpha_n$ all the values of $\theta$ such that we fail to reject a test of internal consistency of $(P, \Gamma_\theta, \nu_\theta)$ with asymptotic level $1 - \alpha$. We shall demonstrate the construction of a test statistic $T_n(\theta)$ and a sequence $c^\alpha_n(\theta)$ such that, conditionally on the structure $(P, \Gamma_\theta, \nu_\theta)$ being internally consistent, the probability that $T_n(\theta) \le c^\alpha_n(\theta)$ is $1 - \alpha$ asymptotically, i.e.

$$\lim_{n \to \infty} \Pr\big(T_n(\theta) \le c^\alpha_n(\theta) \mid (P, \Gamma_\theta, \nu_\theta) \text{ is internally consistent}\big) = 1 - \alpha. \quad (1)$$

Hence we define our confidence region in the following way.

Table 1: Summary of the procedure

1. For a given value of $\theta$, calculate $\hat T_n(\theta) = \sqrt{n} \sup_{A \in \mathcal{C}_n} [P_n(A) - \nu_\theta(\Gamma_\theta(A))]$, where the collection of sets $\mathcal{C}_n$ is described in table 2, and $P_n$ is the empirical distribution of the sample $(Y_1, \ldots, Y_n)$, so that $P_n(A) = (1/n) \sum_{i=1}^n 1\{Y_i \in A\}$.
2. Choose a large integer $B$. Draw $B$ bootstrap samples $(Y^b_1, \ldots, Y^b_n)$, $b = 1, \ldots, B$, with replacement from the initial sample $(Y_1, \ldots, Y_n)$. For each bootstrap sample, calculate $T^b_n(\theta) = \sqrt{n} \sup_{A \in \mathcal{C}_{n,h_n}(\theta)} [P^b(A) - P_n(A)]$, where $P^b$ is the empirical distribution of the bootstrap sample, and $\mathcal{C}_{n,h_n}(\theta)$ is described in table 2. Order the $T^b_n(\theta)$'s increasingly and call $c^{\alpha*}(\theta)$ the $\lceil B(1-\alpha) \rceil$-th of them, i.e. their empirical $1 - \alpha$ quantile.
3. Include $\theta$ in the confidence region $\Theta^\alpha_n$ if and only if $\hat T_n(\theta) \le c^{\alpha*}(\theta)$.

Definition 3.

The $(1 - \alpha)$ confidence region for $\Theta_I$ is $\Theta^\alpha_n = \{\theta \in \Theta : T_n(\theta) \le c^\alpha_n(\theta)\}$.

The full procedure is summarized in table 1. It is clear from equation (1) and the above definition that our confidence region covers each element of the identified set with probability $1 - \alpha$ asymptotically. Hence, after a section devoted to discussing in detail our contribution within the literature on the topic, the remainder of this paper will be concerned with the construction of the statistic $T_n$ and sequence $c^\alpha_n$ with the required property (1).

Pilot example 1 continued

The test statistic is then $T_n(\theta) = \sqrt{n} \sup_{A \in 2^{\{0,1\}}} [P_n(A) - \nu_\theta(\Gamma_\theta(A))]$. Since $P_n(\emptyset) = \nu_\theta(\Gamma_\theta(\emptyset))$ and $P_n(\{1\}) - \nu_\theta(\Gamma_\theta(\{1\})) = p_n - \theta^2$, the test statistic is equal to $T_n(\theta) = \max\{\sqrt{n}(p_n - p) + \sqrt{n}(p - \theta^2), 0\}$, which tends to $\max\{\sqrt{p(1-p)}\, Z, 0\}$, where $Z$ is a standard normal random variable, if $p = \theta^2$, to $0$ if $p < \theta^2$, and to $+\infty$ if $p > \theta^2$. For any $\theta$ such that $p \le \theta^2$, $T_n(\theta)$ has the same limit as $\tilde T_n = \sup_{A \in \mathcal{C}_{h_n}} [\sqrt{n}(P_n(A) - P(A))]$, where $\mathcal{C}_{h_n}$ is equal to $\{\emptyset, \{0,1\}\}$ if $p_n < \theta^2 - h_n$ and $2^{\{0,1\}}$ if $p_n \ge \theta^2 - h_n$. Hence the confidence region $\Theta^\alpha_n$ is the set of $\theta$ values that are not rejected in a one-sided test of the null hypothesis $p \le \theta^2$ against the alternative $p > \theta^2$ based on the quantiles of the distribution of $\max\{\sqrt{n}(p^* - p_n), 0\}$ given the sample (where $p^*$ denotes the frequency of 1's in a bootstrap sample).

Table 2: Collection of sets

1. Take the sample $(Y_1, \ldots, Y_n)$. Write $Y_i = (D_i, C_i)$, where $D_i$ includes the discrete components, and $C_i$ the continuous components of the observable variables in the sample. Call $X_D$ the set of values taken by $D_i$. Then, $\mathcal{C}_n$ is the collection of sets of the form $A_D \times [-\infty, C_i]$ or its complement, where $i = 1, \ldots, n$, $A_D$ ranges over the subsets of $X_D$, and $[-\infty, C_i]$ denotes the hyper-rectangle bounded above by the components of $C_i$.
2. Given $h_n$ satisfying $h_n \ln\ln n + h_n^{-1} \sqrt{\ln\ln n / n} \to 0$ as $n \to \infty$ (e.g. $h_n = (\ln n)^{-1}$), take $\mathcal{C}_{n,h_n}(\theta) = \{A \in \mathcal{C}_n : P_n(A) \ge \nu_\theta(\Gamma_\theta(A)) - h_n\}$.

This paper appears to be the first to cast partial identification as a mass transportation problem. Somewhat related is the specific use of Fréchet-Hoeffding bounds on cell probabilities in Heckman et al., 1997 and Cross and Manski, 2002.

The literature on specification testing in econometrics is quite extensive (see the many references in D. Andrews, 1988 for Cramér-von Mises tests and D. Andrews, 1997 for the Kolmogorov-Smirnov type). Jovanovic, 1989 proposes to consider testing specifications with multiple equilibria and possible lack of identification with a generalization of the Kolmogorov-Smirnov specification test, which is exceedingly conservative unless the structure is nearly identified. The stochastic dominance tests of McFadden, 1989 (see also Linton et al., 2005 and references within) are also related to tests of partially identified structures based on the Kolmogorov-Smirnov statistic. The feasible version of our testing procedure and the use of the bootstrapped empirical process is related to D. Andrews, 1997.

The incompleteness of the structure to be tested raises boundary problems, which appear also in the estimation of models defined by moment inequalities (see Imbens and Manski, 2004 and the link drawn by Rosen, 2008 with the literature on constrained statistical testing, surveyed in Sen and Silvapulle, 2004) and stochastic dominance testing (see Linton et al., 2005). Here the asymptotic analysis is carried out via a localization of the empirical processes to treat the boundary problem, which is another major innovation of this paper. Also related is the analysis in Liu and Shao, 2003 of the likelihood ratio test when the likelihood is maximized on a set as opposed to a single point.

The related problem of constructing confidence regions for partially identified structural parameters is the focus of considerable recent research, following the recognition (advocated in Manski, 2005) that ad hoc identification conditions can considerably weaken inference drawn on their basis. Horowitz and Manski, 1998 propose confidence intervals that asymptotically cover interval identified sets with fixed probability. Beyond the interval case, Chernozhukov et al., 2007 propose a criterion function based method, where the criterion is maximized on a set, as opposed to a single point. The method allows the construction of confidence regions for the identified set and for each parameter value in the identified set. Chernozhukov et al., 2007 also specialize their method to the case of models defined by moment inequalities, with a quadratic criterion function.

The case of moment inequalities is also considered as a special case by Galichon and Henry, 2006a, Romano and Shaikh, 2008 and Romano and Shaikh, 2006 (see also Rosen, 2008 and Bugni, 2007). The present paper complements Chernozhukov et al., 2007 in that it justifies, via a mass transportation argument, the use of a generalized Kolmogorov-Smirnov criterion function in the extended Koopmans and Reiersol, 1950 setup presented here.
Note that our proposed use of the bootstrap only concerns the empirical process, as in D. Andrews, 2000, so that issues of validity related to bootstrapping the test statistic itself do not arise.

The Anderson and Rubin, 1949 approach taken here to construct confidence regions for parameter values within the identified set is also adopted in Chernozhukov et al., 2007, D. Andrews et al., 2003, Romano and Shaikh, 2008 among many others. D. Andrews et al., 2003 work in a similar framework to the present paper (they consider example 1), but restrict their analysis to discrete dependent variables, and use a projection method, so that their inference is likely to be more conservative.

Since confidence regions are asymptotically validated, as emphasized by Imbens and Manski, 2004, uniformity of the confidence region for parameter values is a desirable property for small sample accuracy. D. Andrews and Guggenberger, 2006 analyze uniformity of sub-sampling procedures. Romano and Shaikh, 2008 and Romano and Shaikh, 2006 give high level conditions for uniformity of sub-sampling procedures in the criterion-based approach, with specific conditions under which these results hold in case of regression with interval outcomes. Here, we propose to invert a test, which is shown to be asymptotically uniform in level in Galichon and Henry, 2008.

In related research, Beresteanu and Molinari, 2008 propose a direct analogy to central limit theorem based confidence regions in best linear prediction problems. The confidence region they propose for the identified set, in a problem of best linear prediction with interval outcomes, is the union of a collection of random sets that contain the identified set with pre-specified probability. The latter is obtained from central limit theorems for random sets (see Molchanov, 2005 for a comprehensive account of the theory). They propose one-sided and two-sided versions of their test.
The Beresteanu and Molinari, 2008 two-sided procedure does not suffer from discontinuity at the limit where the identified set is a singleton. However, by construction, Beresteanu and Molinari, 2008 only provide confidence regions for the whole set, which are typically larger than confidence regions for each point in the identified set.
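
To make the test-inversion approach of this paper concrete, the procedure summarized in table 1 can be sketched for pilot example 1, where $Y$ is binary and the collection of sets reduces to all subsets of $\{0,1\}$. This is only an illustrative sketch under stated assumptions: the function names (`subsets`, `nu_gamma`, `freq`, `in_region`), the grid of $\theta$ values, the number of bootstrap draws and the choice $h_n = 1/\ln n$ are ours, and the bootstrap statistic is scaled by $\sqrt{n}$ to match the bootstrapped empirical process $\sqrt{n}[P^* - P_n]$.

```python
import math
import random
from itertools import chain, combinations

def subsets(values):
    """All subsets of a finite collection, as frozensets."""
    values = list(values)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1))]

def nu_gamma(A, theta):
    """nu_theta(Gamma_theta(A)) in pilot example 1: U is uniform on [0,1]^2,
    Gamma(1) = [0,theta]^2 and Gamma(0) = [0,1]^2 (hypothetical helper)."""
    if not A:
        return 0.0
    return 1.0 if 0 in A else theta ** 2

def freq(sample, A):
    """Empirical probability of the set A."""
    return sum(y in A for y in sample) / len(sample)

def in_region(sample, theta, alpha=0.05, B=200, rng=None):
    """True if theta is not rejected by the bootstrap test (steps 1 to 3)."""
    rng = rng or random.Random(0)          # fixed seed: an assumption for reproducibility
    n = len(sample)
    h_n = 1.0 / math.log(n)                # example bandwidth, assumed here
    sets = subsets({0, 1})
    p_n = {A: freq(sample, A) for A in sets}
    # Step 1: test statistic over all sets.
    t_hat = math.sqrt(n) * max(p_n[A] - nu_gamma(A, theta) for A in sets)
    # Sets that are close to binding; the empty set always qualifies.
    local = [A for A in sets if p_n[A] >= nu_gamma(A, theta) - h_n]
    # Step 2: bootstrap the empirical process on the localized class.
    boot = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)
        boot.append(math.sqrt(n) * max(freq(resample, A) - p_n[A] for A in local))
    boot.sort()
    c = boot[math.ceil(B * (1 - alpha)) - 1]   # empirical 1 - alpha quantile
    # Step 3: keep theta if the statistic does not exceed the critical value.
    return t_hat <= c

# Usage: 50 ones out of 200 observations, so p_n = 0.25.
sample = [1] * 50 + [0] * 150
grid = [k / 10 for k in range(1, 11)]
region = [theta for theta in grid if in_region(sample, theta)]
```

In this example the retained values should lie close to the interval $[\sqrt{p_n}, 1]$, in line with the identified set $[\sqrt{p}, 1]$ derived earlier; for richer observables the enumeration of all subsets would be replaced by the class of sets described in table 2.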

As explained in the previous section, the construction of the confidence region relies on a test of internal consistency of the structure $(P, \Gamma_\theta, \nu_\theta)$ for a fixed $\theta$. We now explain the construction of our test statistic and decision rule, for the hypothesis of internal consistency of a structure $(P, \Gamma, \nu)$ defined by a probability law $\nu$ for $U$ and a set of constraints $U \in \Gamma(Y)$. The hypothesis that $(P, \Gamma, \nu)$ is internally consistent is equivalent to the existence of a law $\pi$ for $(Y, U)$ with marginals $P$ and $\nu$ and such that the constraints $U \in \Gamma(Y)$ hold $\pi$-almost surely. By proposition 1, this null hypothesis is also equivalent to

$$H_0 : \sup_{A \in \mathcal{B}} [P(A) - \nu(\Gamma(A))] = 0.$$

We propose the following statistic to test the null described above:

$$T_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))], \quad (2)$$

where $P_n$ is the empirical distribution of the sample (so that for any measurable set $A$, $P_n(A) = (1/n) \sum_{i=1}^n 1\{Y_i \in A\}$) and where $\mathcal{C}$ is defined in table 3.

This statistic is a generalized Kolmogorov-Smirnov specification test statistic in the sense that when $\Gamma$ has disjoint images (i.e. $\Gamma^{-1}$ is a function), $T_n$ is a multivariate Kolmogorov-Smirnov statistic for the test of the hypothesis that the structure is correctly specified, i.e. that the probability law $A \mapsto \nu(\Gamma(A))$ is indeed equal to the true law $P$ generating the observable variables $Y$. In the general case where $\Gamma$ is a many-to-many mapping, $A \mapsto \nu(\Gamma(A))$ is no longer a probability measure, since two sets $A$ and $B$ may be disjoint, and yet their images $\Gamma(A)$ and $\Gamma(B)$ are not, so that $\nu(\Gamma(A \cup B))$ may be strictly smaller than $\nu(\Gamma(A)) + \nu(\Gamma(B))$. This introduces significant complications in the asymptotic analysis of the statistic $T_n$, as explained in the following discussion.

Table 3: Collections of sets

1. Write $Y = (D, C)$, where $D$ includes the discrete components, and $C$ the continuous components with dimension $d_C$. Call $X_D$ the set of values taken by $D$. Then, $\mathcal{C}$ is the collection of sets of the form $A_D \times [-\infty, c]$ or its complement, where $c \in \mathbb{R}^{d_C}$, $A_D$ ranges over the subsets of $X_D$, and $[-\infty, c]$ is the hyper-rectangle bounded above by the components of $c$.
2. Given $h > 0$, define
$\mathcal{C}_b = \{A \in \mathcal{C} : P(A) = \nu(\Gamma(A))\}$,
$\mathcal{C}_{b,h} = \{A \in \mathcal{C} : P(A) \ge \nu(\Gamma(A)) - h\}$,
$\mathcal{C}_h = \{A \in \mathcal{C} : P_n(A) \ge \nu(\Gamma(A)) - h\}$.

We can write

$$T_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] = \sup_{A \in \mathcal{C}} \{G_n(A) + \sqrt{n} [P(A) - \nu(\Gamma(A))]\}, \quad (3)$$

where $G_n(A) := \sqrt{n} [P_n(A) - P(A)]$ is the empirical process. In the case of the classical Kolmogorov-Smirnov statistic (i.e. if $\Gamma^{-1}$ were a function), the term $P(A) - \nu(\Gamma(A))$ would vanish under the null hypothesis. Here, however, under the null we only have $P(A) \le \nu(\Gamma(A))$, so that the term $\sqrt{n} [P(A) - \nu(\Gamma(A))]$ will also contribute. Indeed, for any set $A \in \mathcal{C}$ such that $P(A) = \nu(\Gamma(A))$ (i.e. $A \in \mathcal{C}_b$ as defined in table 3), the only remaining term in the right-hand side of equation (3) is the empirical process. On the other hand, for any set $A \in \mathcal{C}$ such that $P(A) < \nu(\Gamma(A))$, $\sqrt{n} [P(A) - \nu(\Gamma(A))]$ will take increasingly large negative values and eventually dominate the expression inside the supremum in the right-hand side of equation (3), and such a set $A$ will not contribute to the supremum.
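
The two regimes just described can be illustrated numerically in pilot example 1, where the statistic reduces to $T_n(\theta) = \max\{\sqrt{n}(p_n - \theta^2), 0\}$. The following simulation is a minimal sketch, not part of the formal argument: the function names (`t_n`, `share_of_zeros`), the sample size, the number of replications and the seed are all assumptions made for illustration.

```python
import math
import random

def t_n(sample, theta):
    """T_n = sqrt(n) * max(p_n - theta^2, 0) for pilot example 1."""
    n = len(sample)
    p_n = sum(sample) / n
    return math.sqrt(n) * max(p_n - theta ** 2, 0.0)

def share_of_zeros(p, theta, n=1000, reps=400, seed=0):
    """Fraction of simulated Bernoulli(p) samples in which T_n is exactly zero."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(reps):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        zeros += t_n(sample, theta) == 0.0
    return zeros / reps

# Strict inequality P(A) < nu(Gamma(A)): the drift term dominates.
interior = share_of_zeros(p=0.20, theta=0.6)   # theta^2 = 0.36 > p
# Binding case P(A) = nu(Gamma(A)): only the empirical process remains.
boundary = share_of_zeros(p=0.36, theta=0.6)   # theta^2 = p
```

When $p < \theta^2$ the statistic should vanish in essentially every replication, whereas when $p = \theta^2$ it should be zero in roughly half of them, consistent with the limit $\max\{\sqrt{p(1-p)}\, Z, 0\}$.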
We show in the proof of theorem 1 that under a very mild assumption on the structure, the limit will only involve a supremum over sets in $\mathcal{C}_b$. Since $\mathcal{C}_b$ depends on $P$, it is unknown, and needs to be approximated by a data dependent class $\mathcal{C}_{h_n}$ defined in table 3 (namely $\mathcal{C}_h$ with $h = h_n$).

Definition 4. The test statistic $T_n$ is given by equation (2), and $c^\alpha_n$ is the $1 - \alpha$ quantile of $\tilde{T}_n := \sup_{A \in \mathcal{C}_{h_n}} G_n(A)$ (with $\mathcal{C}_h$ defined in table 3), i.e. $c^\alpha_n = \inf\{c : P(\tilde{T}_n \leq c) \geq 1 - \alpha\}$.

Assumption 1. There exists $K > 0$ and $0 < \eta < 1$ such that for all $A \in \mathcal{C}_{b,h}$, for $h > 0$ sufficiently small, there exists an $A_b \in \mathcal{C}_b$ such that $A_b \subseteq A$ and $d_H(A, A_b) \leq K h^\eta$. ($\mathcal{C}_b$ and $\mathcal{C}_h$ are defined in table 3, and $d_H$ denotes the Hausdorff metric, defined in the appendix.)

Remark 2. Assumption 1 is very mild, in the sense that it fails only in pathological cases, such as the case where $y \in \mathbb{R}$ and $y \mapsto P((-\infty, y]) - \nu(\Gamma((-\infty, y]))$ is $C^\infty$ with all derivatives equal to zero at some $y = y_0$ such that $(-\infty, y_0] \in \mathcal{C}_b$.

Assumption 2. $h_n$ satisfies $h_n \ln \ln n + h_n^{-1} \sqrt{\ln \ln n / n} \to 0$ as $n \to \infty$.

Remark 3. Note that assumption 2 is extremely mild, and it is satisfied for instance in case $h_n = (\ln n)^{-1}$ or in case $h_n$ satisfies $h_n n^\eta + h_n^{-1} n^{\eta - 1/2} \to 0$ as $n \to \infty$ for any $1/2 > \eta > 0$, however small.

Theorem 1.

Suppose $Y$ either takes values in a finite set or has density with respect to Lebesgue measure. Under assumptions 1 and 2, and using the notations of definition 4, we have

$$\lim_{n \to \infty} P(T_n \leq c^\alpha_n \mid (P, \Gamma, \nu) \text{ is internally consistent}) = 1 - \alpha.$$

Theorem 1 is not applicable directly for two reasons:

1. The quantile sequence $c^\alpha_n$ given in definition 4 is infeasible in that the statistic $\tilde{T}_n$ involves the empirical process $G_n = \sqrt{n}[P_n - P]$ with $P$ unknown.
2. The statistics $T_n$ and $\tilde{T}_n$ are defined as suprema over infinite collections of sets $\mathcal{C}$ and $\mathcal{C}_h$ (with $\mathcal{C}$ and $\mathcal{C}_h$ defined in table 3).

We show now that $T_n$ can be replaced by $\hat{T}_n$ defined in table 2, and that $c^\alpha_n$ can be replaced by $c^\alpha_*$, which is the $1 - \alpha$ quantile of $T^* := \sup_{A \in \mathcal{C}_{n,h_n}} G^*(A)$, where $G^* := \sqrt{n}[P^* - P_n]$ is the bootstrapped empirical process. We thereby justify the fully implementable procedure described in table 1. This feasible version of the test mirrors the feasible version of the conditional Kolmogorov-Smirnov test proposed by D. Andrews, 1997, albeit in generalized form (multivariate and incompletely specified).

To that end, we need a large support assumption and a log concavity assumption for the distribution of observable variables and a continuity assumption on the mapping $\Gamma$ to ensure that $\hat{T}_n$ has the same limit as $T_n$.

Assumption 3. In case $P$ has density with respect to Lebesgue measure, the density is bounded away from zero, absolutely continuous and log concave (note that log concave densities include the uniform, normal, beta, exponential and extreme value distributions).

Assumption 4. The functions $y \mapsto \nu(\Gamma((-\infty, y]))$ and $y \mapsto \nu(\Gamma((-\infty, y]^c))$ are Lipschitz, i.e. there exists some $k > 0$ such that $|\nu(\Gamma((-\infty, y])) - \nu(\Gamma((-\infty, y']))| \leq k \|y - y'\|$, and identically for $(-\infty, y]^c$.

Theorem 2. Under the assumptions of theorem 1 and assumptions 3 and 4, we have

$$\lim_{n \to \infty} P(\hat{T}_n \leq c^\alpha_* \mid (P, \Gamma, \nu) \text{ is internally consistent}) = 1 - \alpha$$

almost surely, conditionally on the sample.

Remark 4.

The conditions for the validity of the bootstrap procedure are no more restrictive than the conditions for theorem 1. The additional assumptions, which are more high level, are needed only to justify using the data driven class of sets $\mathcal{C}_n$ instead of $\mathcal{C}$. This follows the proposal in D. Andrews, 1997 in order to simplify the testing procedure as much as possible. However, an alternative feasible version of the test relies on a regular discretization $(y_k)_{k=1}^N$ of the space of continuous observable variables (thereby replacing $\mathcal{C}_n$ by the class of sets of the form $(-\infty, y_k]$, $(-\infty, y_k]^c$, $k = 1, \ldots, N$).

To complete the analysis of the test of internal consistency we give conditions under which the test is consistent. The class of alternatives we consider is the following:

$$H_a: \sup_{A \in \mathcal{C}} [P(A) - \nu(\Gamma(A))] \neq 0,$$

where $\mathcal{C}$ is defined in table 3. We choose this class of alternatives since it simplifies to the set of alternatives in a multivariate Kolmogorov-Smirnov goodness-of-fit test when $P$ is absolutely continuous with respect to Lebesgue measure and when $\Gamma^{-1}$ is a function. We have

Theorem 3. Under $H_a$ and the assumptions of theorem 1, $\lim_{n \to \infty} P(T_n \geq c^\alpha_n) = 1$.

Remark 5.

Notice that the validity of this consistency test is completely general, and, unlike theorem 1, the proof is a straightforward extension of the proof of consistency of the traditional Kolmogorov-Smirnov specification test (see for instance page 526 of Lehmann and Romano, 2005).

Small sample investigation of the properties of the test of internal consistency

We investigate the small sample properties of our test, and compare them to the properties of the Kolmogorov-Smirnov specification test in the identified case, in a small Monte Carlo experiment based on a special case of illustrative example 2.

We consider the following setup illustrated in figure 1: the structure is given by the correspondence $\Gamma(Y) = [\underline{s}(Y), \overline{s}(Y)]$ with $\underline{s}(Y) = \max(0, Y - s)$ and $\overline{s}(Y) = \min(1, Y + s)$, $s = 0.15$, and the latent variable $U$ has law $\nu$, which is the uniform distribution over $[0, 1]$. $Y$ has cumulative distribution function defined on $[0, 1]$ by F ( y ) = 0 for 0 ≤ y < s, = y − s for s ≤ y < s , = (1 + 4 s ) y − s − s for 1 + s ≤ y < − s , = y + s for 2 − s ≤ y < − s, = 1 for 1 − s ≤ y ≤ .

Figure 1: The correspondence $\Gamma$ is given by the shaded area, and the thick lines trace the inverse cumulative distribution function of $Y$.

Table 4: Rejection levels for the partially identified case.

Sample size    100      500      1000
α = 0.01       0.001    0.007    0.008
α = 0.05       0.010    0.024    0.029
α = 0.10       0.029    0.049    0.066

Table 5: Rejection levels for the exactly identified case.

Sample size    100      500      1000
α = 0.01       0.019    0.024    0.014
α = 0.05       0.074    0.079    0.050
α = 0.10       0.138    0.135    0.105

We perform 1000 repetitions of the following testing procedure, and we report the proportions of rejections out of these 1000 repetitions. We first generate a sample $(U_1, \ldots, U_n)$ of iid uniform $[0, 1]$ variables, for $n = 100, 500, 1000$, and construct the sample $(Y_1, \ldots, Y_n)$ as $(F^{-1}(U_1), \ldots, F^{-1}(U_n))$. $P_n$ is the empirical law of $(Y_1, \ldots, Y_n)$, and $\mathcal{C}_{n,h_n}$ is the collection of sets of the form $[0, Y_i]$, $i = 1, \ldots, n$, with $P_n[0, Y_i] = (1/n) \sum_{j=1}^n 1\{Y_j \leq Y_i\} \geq \nu(\Gamma([0, Y_i])) - h_n = \min[1, Y_i + s] - h_n$, or $[Y_i, 1]$, $i = 1, \ldots, n$, with $P_n[Y_i, 1] \geq \nu(\Gamma([Y_i, 1])) - h_n = \min[1, 1 - Y_i + s] - h_n$.

For each sample, we draw 1000 bootstrap samples $(Y^b_1, \ldots, Y^b_n)$, and call $P^b$ the law of the bootstrap sample. For each bootstrap sample, we calculate the maximum of the quantities $P^b[0, Y_i] - P_n[0, Y_i]$ for all $i$ such that $[0, Y_i] \in \mathcal{C}_{n,h_n}$ and $P^b[Y_i, 1] - P_n[Y_i, 1]$ for all $i$ such that $[Y_i, 1] \in \mathcal{C}_{n,h_n}$, and call this maximum $\max G^b$. Order the $\max G^b$ obtained for all bootstrap draws, and call $c^\alpha_*$ the $(1 - \alpha)1000$ largest, for $\alpha = 0.01, 0.05, 0.1$. Reject if $c^\alpha_*$ is smaller than the maximum of the quantities $P_n[0, Y_i] - \nu(\Gamma([0, Y_i]))$ and $P_n[Y_i, 1] - \nu(\Gamma([Y_i, 1]))$ for $i = 1, \ldots, n$. (We use MATLAB version 7.1 with random seed 777.)

The results are given in table 4 for the partially identified case ($s = 0.15$), and in table 5 we give the benchmark of the exactly identified case ($s = 0$ and $h_n = 1$), so that the test is a traditional Kolmogorov-Smirnov specification test. The results are given for $h_n$ on the boundary of the admissible rate, i.e. $h_n = \sqrt{\ln \ln n / n}$. This rate was chosen as a power maximizing rate (the rate that will ensure smaller quantiles, hence larger rejection rates). This is the only justification for a choice of rate that we can provide at this stage, as optimal rate choice is beyond the scope of this paper. In applications, it is recommended to provide results for different choices of rates, as one would typically do in density, nonparametric regression or spectral estimation. The rejection rates are low for small sample sizes and improve sharply when sample size increases.

To give a sense of the sensitivity of rejection rates to the choice of the tuning parameter $h_n$, table 6 reports rejection rates in the case of $\alpha = 0.01, 0.05, 0.10$ and $n = 100, 500, 1000$, for values of $h_n$ that are significantly above, and significantly below the initial choice of $h_n = \sqrt{\ln \ln n / n}$. For $n = 1000$, $\sqrt{\ln \ln n / n} = 0.\ldots$ and $h_n = 0.\ldots, 0.\ldots$. For $n = 500$, $\sqrt{\ln \ln n / n} = 0.\ldots$ and $h_n = 0.\ldots, 0.\ldots$. For $n = 100$, $\sqrt{\ln \ln n / n} = 0.\ldots$ and $h_n = 0.\ldots, 0.\ldots$.

Table 6

            h_n = 0.…   h_n = 0.…   h_n = 0.…   h_n = 0.…   h_n = 0.…   h_n = 0.…
α = 0.01    0.004       0           0.012       0.002       0.019       0.005
α = 0.05    0.026       0.006       0.049       0.017       0.058       0.022
α = 0.10    0.064       0.020       0.090       0.034       0.111       0.043

For $n = 100$, the rejection rates are sensitive to the choice of rate within the theoretical range (assumption 2) of tuning parameters. For $n = 500$, there is still sensitivity to the choice of $h_n$, somewhat less so for $n = 1000$. However, as in the case of bandwidth in kernel estimation or in local spectral estimation of time series, it is highly recommended to report empirical results with a good range of values of the tuning parameter $h_n$. Figure 2 graphs rejection rates against the tuning parameter to give a better sense of this sensitivity for sample size 500 and level 0.05. It is important also to note that higher values of the tuning parameter lead to less filtering, i.e. more sets are used in the computation of the supremum of the bootstrap empirical process, leading to larger quantiles, hence smaller rejection rates. Hence it also shows how crucial the filtering procedure is, since without it, the power of the test would be very poor.

Figure 2: Sensitivity to the tuning parameter. Sample size 500, level 0.05, tuning parameter ranging from 0.005 to 0.15 on the X axis, and rejection rates on the Y axis.

Conclusion

We propose a test of the specification of a structure in the sense of Koopmans and Reiersol, 1950, extended by Jovanovic, 1989, where observable variables and latent variables are related by a many-to-many mapping, thereby allowing censored observable variables and multiple equilibria. We apply mass transportation duality to derive a simple necessary and sufficient condition for compatibility of such structures and data in complete generality, and to justify the use of a generalized Kolmogorov-Smirnov test statistic. We propose a generically applicable and easily implementable procedure to test compatibility of structure and data, and to construct confidence regions for partially identified parameters specifying the structure. This work therefore complements other proposals, which tend to focus on models defined by moment inequalities. The small sample performance of the test is investigated in a Monte Carlo experiment, and is found to be comparable to the performance of the traditional Kolmogorov-Smirnov specification test statistic.

Appendix

Additional definitions

Definition 5. A many-to-many mapping $\Gamma : \mathbb{R}^d \rightrightarrows \mathbb{R}^d$ is called measurable if for each open set $O \subseteq \mathbb{R}^d$, $\Gamma^{-1}(O) = \{x \in \mathbb{R}^d \mid \Gamma(x) \cap O \neq \emptyset\}$ is a measurable subset of $\mathbb{R}^d$.

Definition 6. Calling $d$ the Euclidean metric, the Hausdorff metric $d_H$ between two sets $A_1$ and $A_2$ is defined by

$$d_H(A_1, A_2) = \max\left( \sup_{y \in A_1} \inf_{z \in A_2} d(y, z), \; \sup_{z \in A_2} \inf_{y \in A_1} d(y, z) \right).$$

Proofs of results in the main text

Proof of proposition 1: Since $\Gamma$ is closed valued, $\varphi(y, u) = 1\{u \notin \Gamma(y)\}$ is lower semicontinuous, so that we can apply lemma 1 below to yield

$$\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi = \sup_{f \oplus g \leq \varphi} (Pf + \nu g), \qquad (4)$$

where $f \oplus g \leq \varphi$ stands for $f(y) + g(u) \leq \varphi(y, u)$ for all $y, u$. Since the sup-norm of the cost function is 1 (the cost function is an indicator), the supremum in (4) is attained by pairs of functions $(f, g)$ in $\mathcal{F}$, defined by

$$\mathcal{F} = \{(f, g) \in L^1(P) \times L^1(\nu), \; 0 \leq f \leq 1, \; -1 \leq g \leq 0, \; f(y) + g(u) \leq 1\{u \notin \Gamma(y)\}, \; f \text{ upper semicontinuous}\}.$$

Now, $(f, g)$ can be written as a convex combination of pairs $(1_A, -1_B)$ in $\mathcal{F}$. Indeed, $f = \int_0^1 1\{f \geq x\} dx$ and $g = \int_0^1 -1\{g \leq -x\} dx$, and for all $x$, $1\{f \geq x\}(y) - 1\{g \leq -x\}(u) \leq 1\{u \notin \Gamma(y)\}$. Since the functional on the right-hand side of (4) is linear, the supremum is attained on such a pair $(1_A, -1_B)$. Hence, the right-hand side of (4) specializes to

$$\sup_{A \times B \subseteq D} (P(A) - \nu(B)). \qquad (5)$$

For $D = \{(y, u) : u \notin \Gamma(y)\}$, $A \times B \subseteq D$ means that if $y \in A$ and $u \in B$, then $u \notin \Gamma(y)$. In other words $u \in B$ implies $u \notin \Gamma(A)$, which can be written $B \subseteq \Gamma(A)^c$. Hence, the dual problem can be written

$$\sup_{\Gamma(A) \subseteq B^c} (P(A) - \nu(B)) = \sup_{\Gamma(A) \subseteq B} (P(A) - \nu(B)),$$

and the result follows immediately.

Lemma 1. If $\varphi : \mathcal{Y} \times \mathcal{U} \to \mathbb{R}$ is bounded, non-negative and lower semicontinuous, then

$$\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi = \sup_{f \oplus g \leq \varphi} (Pf + \nu g).$$

Proof of lemma 1: The left-hand side is immediately seen to be always larger than the right-hand side, so we show the reverse inequality. It is a specialization of the Monge-Kantorovich duality to zero-one cost, which can also be proved using Proposition (3.3) page 424 of Kellerer, 1984, but we give a direct proof due to N. Belili for completeness.

[a] Case where $\varphi$ is continuous and $\mathcal{U}$ and $\mathcal{Y}$ are compact.

Call $G$ the set of functions on $\mathcal{Y} \times \mathcal{U}$ strictly dominated by $\varphi$ and call $H$ the set of functions of the form $f + g$ with $f$ and $g$ continuous functions on $\mathcal{Y}$ and $\mathcal{U}$ respectively. Call $s(c) = Pf + \nu g$ for $c \in H$. It is a well defined linear functional, and is not identically zero on $H$. $G$ is convex and sup-norm open. Since $\varphi$ is continuous on the compact $\mathcal{Y} \times \mathcal{U}$, we have

$$s(c) \leq \sup f + \sup g < \sup \varphi$$

for all $c \in G \cap H$, which is non empty and convex. Hence, by the Hahn-Banach theorem, there exists a linear functional $\eta$ that extends $s$ on the space of continuous functions such that

$$\sup_G \eta = \sup_{G \cap H} s.$$

By the Riesz representation theorem, there exists a unique finite non-negative measure $\pi$ on $\mathcal{Y} \times \mathcal{U}$ such that $\eta(c) = \pi c$ for all continuous $c$. Since $\eta = s$ on $H$, we have

$$\int_{\mathcal{Y} \times \mathcal{U}} f(y) \, d\pi(y, u) = \int_{\mathcal{Y}} f(y) \, dP(y), \qquad \int_{\mathcal{Y} \times \mathcal{U}} g(u) \, d\pi(y, u) = \int_{\mathcal{U}} g(u) \, d\nu(u),$$

so that $\pi \in \mathcal{M}(P, \nu)$ and

$$\sup_{f \oplus g \leq \varphi} (Pf + \nu g) = \sup_{G \cap H} s = \sup_G \eta = \pi\varphi.$$

[b] $\mathcal{Y}$ and $\mathcal{U}$ are not necessarily compact, and $\varphi$ is continuous.

For all $n > 0$, there exist compact sets $K_n$ and $L_n$ such that

$$\max(P(\mathcal{Y} \setminus K_n), \nu(\mathcal{U} \setminus L_n)) \leq \frac{1}{n}.$$

Let $(a, b)$ be an element of $\mathcal{Y} \times \mathcal{U}$ and define two probability measures $\mu_n$ and $\nu_n$ with compact support by

$$\mu_n(A) = P(A \cap K_n) + P(A \setminus K_n) \delta_a(A), \qquad \nu_n(B) = \nu(B \cap L_n) + \nu(B \setminus L_n) \delta_b(B),$$

where $\delta$ denotes the Dirac measure. By [a] above, there exists $\pi_n$ with marginals $\mu_n$ and $\nu_n$ such that

$$\pi_n \varphi \leq \sup_{f \oplus g \leq \varphi} (Pf + \nu g) + \frac{\varphi(a, b)}{n}.$$

Since $(\pi_n)$ has weakly converging marginals, it is weakly relatively compact. Hence it contains a weakly converging subsequence with limit $\pi \in \mathcal{M}(P, \nu)$. By Skorohod's almost sure representation (see for instance theorem 11.7.2 page 415 of Dudley, 2002), there exists a sequence of random variables $X_n$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with law $\pi_n$ and a random variable $X$ on the same probability space with law $\pi$ such that $X$ is the almost sure limit of $(X_n)$. By Fatou's lemma, we then have

$$\liminf \pi_n \varphi = \liminf E\varphi(X_n) \geq E \liminf \varphi(X_n) = E\varphi(X) = \pi\varphi.$$

Hence we have the desired result.

[c] General case.

$\varphi$ is the pointwise supremum of a sequence of continuous bounded functions, so the result follows from upward $\sigma$-continuity of both $\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi$ and $\sup_{f \oplus g \leq \varphi} (Pf + \nu g)$ on the space of lower semicontinuous functions, shown in propositions (1.21) and (1.28) of Kellerer, 1984.

Proof of theorem 1: We show that $T_n$ and $\tilde{T}_n$ converge in distribution (notation $\rightsquigarrow$) to the same limit, which has a continuous distribution function. Hence, the result follows.

• Case where $Y = D$ discrete. Let $A_0$ be the subset of $X_D$ that achieves the maximum of $\delta(A) = P(A) - \nu(\Gamma(A))$ over $A \in \mathcal{C} \setminus \mathcal{C}_b$. Call $\delta_0 = \delta(A_0)$, and note that $\delta_0 < 0$. We have

$$T_n = \sup_{A \subseteq X_D} [G_n(A) + \sqrt{n}(P(A) - \nu(\Gamma(A)))] = \max\left\{ \sup_{\mathcal{C}_b} G_n, \; \sup_{A \subseteq X_D, \, A \notin \mathcal{C}_b} [G_n(A) + \sqrt{n}(P(A) - \nu(\Gamma(A)))] \right\}.$$

The second term in the maximum is bounded above by $\sup_{A \subseteq X_D, \, A \notin \mathcal{C}_b} G_n(A) + \sqrt{n}\delta_0$, whose limsup is almost surely non-positive.
Hence $T_n \rightsquigarrow \sup_{\mathcal{C}_b} G$ follows from the convergence of the empirical process. $\tilde{T}_n \rightsquigarrow \sup_{\mathcal{C}_b} G$ follows from the fact that, under assumption 2, for all $n$ sufficiently large, $\mathcal{C}_{h_n}$ is almost surely equal to $\mathcal{C}_b$.

• Case of $Y = C$ absolutely continuous with respect to Lebesgue measure. Consider two sequences of positive numbers $l_n$ and $h_n$ such that they both satisfy assumption 2, $l_n > h_n$ and $(l_n - h_n)^{-1} \sqrt{\ln \ln n / n} \to 0$. Notice that $\{\emptyset, \mathbb{R}^{d_C}\} \subseteq \mathcal{C}_b, \mathcal{C}_{b,h}, \mathcal{C}_h$ for any $h > 0$. Since $G_n(\mathbb{R}^{d_C}) = 0$, we therefore have $\sup_{\mathcal{C}_b} G_n$, $\sup_{\mathcal{C}_{b,l_n}} G_n$ and $\sup_{\mathcal{C}_{h_n}} G_n$ non-negative. Hence, calling $\zeta_n$ the indicator function of the event $\sup_{\mathcal{C}} G_n \leq (l_n - h_n)\sqrt{n}$, we can write

$$\zeta_n \sup_{\mathcal{C}_b} G_n \leq \zeta_n \max\left\{ \sup_{\mathcal{C}_b} [G_n + \sqrt{n}(P - \nu\Gamma)], \; \sup_{\mathcal{C} \setminus \mathcal{C}_b} [G_n + \sqrt{n}(P - \nu\Gamma)] \right\} \leq \zeta_n T_n \leq \zeta_n \sup_{\mathcal{C}_{h_n}} G_n \leq \zeta_n \sup_{\mathcal{C}_{b,l_n}} G_n,$$

where the first inequality holds because the left-hand side is equal to the first term in the right-hand side, the second inequality holds trivially as an equality since $\mathcal{C} = \mathcal{C}_b \cup \mathcal{C} \setminus \mathcal{C}_b$, the third inequality holds because on $\mathcal{C} \setminus \mathcal{C}_{h_n}$, we have by definition $G_n + \sqrt{n}(P - \nu\Gamma) = \sqrt{n}(P_n - \nu\Gamma) \leq -h_n \leq 0$, and the last inequality holds because on $\{\zeta_n = 1\}$, we have that $A \in \mathcal{C}_{h_n}$ implies $\nu\Gamma(A) \leq P_n(A) + h_n = P(A) + (P_n - P)(A) + h_n \leq P(A) + \sup_{\mathcal{C}} G_n / \sqrt{n} + h_n \leq P(A) + l_n - h_n + h_n = P(A) + l_n$, which implies that $A \in \mathcal{C}_{b,l_n}$.

By lemma 2 and Theorem 2.5.2 page 127 of van der Vaart and Wellner, 1996, we have that both $\sup_{\mathcal{C}_b} G_n$ and $\sup_{\mathcal{C}_{b,l_n}} G_n$ converge in distribution to $\sup_{\mathcal{C}_b} G$. It is shown below that $\zeta_n \to_p 1$, so that Slutsky's lemma (lemma 2.8 page 11 of van der Vaart, 1998) yields the weak convergence of $\zeta_n \sup_{\mathcal{C}_b} G_n$ and $\zeta_n \sup_{\mathcal{C}_{b,l_n}} G_n$ to the same limit, and hence that of $\zeta_n T_n$ and $\zeta_n \sup_{\mathcal{C}_{h_n}} G_n$. It follows from Slutsky's lemma again that $T_n \rightsquigarrow \sup_{\mathcal{C}_b} G$ and $\tilde{T}_n \rightsquigarrow \sup_{\mathcal{C}_b} G$. We now prove that $\zeta_n \to_p 1$. Indeed, for any $\epsilon > 0$, $P(|\zeta_n - 1| > \epsilon) = P(\zeta_n = 0) = P(\sup_{\mathcal{C}} G_n > (l_n - h_n)\sqrt{n}) \to 0$, since $(l_n - h_n)\sqrt{n} \gg \sqrt{\ln \ln n}$ by assumption.

Lemma 2. We have $\sup_{A \in \mathcal{C}_{b,h_n}} G_n(A) \rightsquigarrow \sup_{A \in \mathcal{C}_b} G(A)$.

Proof of lemma 2: Take a bandwidth sequence $l_n$ that satisfies assumption 2, and take $\mathcal{C}_{b,l_n}$ as in table 3. Under assumption 1, take $A \in \mathcal{C}_{b,l_n}$ and an $A_b \in \mathcal{C}_b$ such that $d_H(A, A_b) \leq \zeta_n = K l_n^\eta$ (we suppress the dependence of $A_b$ on $A$ for ease of notation). As $\mathcal{C}_b \subseteq \mathcal{C}_{b,l_n}$, one has

$$\sup_{A \in \mathcal{C}_b} G_n(A) \leq \sup_{A \in \mathcal{C}_{b,l_n}} G_n(A). \qquad (6)$$

Second, since $A_b \subseteq A$, one has

$$\sup_{A \in \mathcal{C}_{b,l_n}} G_n(A) = \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b) + G_n(A \setminus A_b)] \leq \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b)] + \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A \setminus A_b)].$$

If we have that $\sup_{A \in \mathcal{C}_{b,l_n}} |G_n(A \setminus A_b)| = O_{a.s.}(\sqrt{\zeta_n \ln \ln n})$, then

$$\sup_{A \in \mathcal{C}_{b,l_n}} G_n(A) = \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b)] + O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right), \qquad (7)$$

noting the dependence of $A_b$ on $A$ in the expression above. But since $A_b \in \mathcal{C}_b$, one has $\sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b)] \leq \sup_{A \in \mathcal{C}_b} G_n(A)$. This fact, along with (6) and (7), yields the result. We now show that we have indeed that

$$\sup_{A \in \mathcal{C}_{b,l_n}} |G_n(A \setminus A_b)| = O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right).$$

This relies on the construction of a local empirical process relative to the thin regions $A \setminus A_b$. First consider such a region. If $A \in \mathcal{C}_b$, the result holds trivially, so that we may assume that $A \in \mathcal{C}_{b,l_n} \setminus \mathcal{C}_b$, so that $A \setminus A_b$ is not empty. We distinguish the case where $A$ is a bounded rectangle, and the cases where $A$ is unbounded.

(i) $A$ is a bounded rectangle, i.e. of the form $(y_1, z_1) \times \ldots \times (y_{d_y}, z_{d_y})$, with $y_1, \ldots, y_{d_y}, z_1, \ldots, z_{d_y}$ real. Then, since $d_H(A, A_b) \leq \zeta_n$, $A_b$ is also a bounded rectangle, and $A \setminus A_b$ is the union of at least one (since $A$ and $A_b$ are distinct) and at most $f(d_y)$ (the number of faces of a rectangle in $\mathbb{R}^{d_y}$) rectangles with at least one dimension bounded by $\zeta_n$.

(ii) $A$ is an unbounded rectangle, i.e. of the same form as above, except that some of the edges are $+\infty$ or $-\infty$. Then $A_b$ is also an unbounded rectangle, and $A \setminus A_b$ is also the union of a finite number of rectangles with one dimension bounded by $\zeta_n$.

In both cases (i) and (ii), $A \setminus A_b$ is the union of a finite number of rectangles with at least one dimension bounded by $\zeta_n$. Hence if we control the supremum of the empirical process on one of these thin rectangles, when $A$ ranges over $\mathcal{C}_{b,l_n}$, we can control it on $A \setminus A_b$. Hence, it suffices to prove that

$$\sup_{A \in \mathcal{C}_{b,l_n}} |G_n(\varphi_n(A))| = O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right),$$

where $\varphi_n$ is the homothety that carries $A$ into one of the thin rectangles described above. As a homothety, $\varphi_n$ is invertible and bi-measurable, and since $\varphi_n(A)$ has at least one dimension bounded by $\zeta_n$, and $P$ is absolutely continuous with respect to Lebesgue measure, $P(\varphi_n(A)) = O(\zeta_n)$ uniformly when $A$ ranges over $\mathcal{C}_{b,l_n}$.
Now, for any $A \in \mathcal{C}_{b,l_n}$, we have

$$G_n(\varphi_n(A)) = \sqrt{n}[P_n(\varphi_n(A)) - P(\varphi_n(A))] = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( 1_{\varphi_n(A)}(Y_i) - E_P(1_{\varphi_n(A)}(Y)) \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( 1_A(\varphi_n^{-1}(Y_i)) - E_P(1_A(\varphi_n^{-1}(Y))) \right) := \sqrt{\zeta_n} \, L_n(1_A, \varphi_n),$$

where $L_n(1_A, \varphi_n)$ is defined as

$$\frac{1}{\sqrt{n\zeta_n}} \sum_{i=1}^n \left( 1_A(\varphi_n^{-1}(Y_i)) - E_P(1_A(\varphi_n^{-1}(Y))) \right)$$

to conform with the notation of Einmahl and Mason, 1997.

Conditions A(i)-A(iv) of the latter hold for $a_n = b_n = l_n$ and $a = 0$ under assumption 2, and conditions S(i)-S(iii) and F(ii) and F(iv)-F(viii) hold because $\mathcal{F}$ is here the class of indicator functions of $\mathcal{C}_{b,l_n}$, hence Donsker (see for instance example 2.6.1 page 135 of van der Vaart and Wellner, 1996). Hence Theorem 1.2 of Einmahl and Mason, 1997 holds, and

$$\sup_{A \in \mathcal{C}_{b,l_n}} |L_n(1_A, \varphi_n)| = O_{a.s.}\left(\sqrt{\ln \ln n}\right),$$

so that the desired result holds.

Proof of theorem 2: By theorem 2.4 page 857 of Giné and Zinn, 1990, the bootstrapped empirical process $G^*$ converges weakly to $G$ conditionally almost surely, so that

$$\sup_{A \in \mathcal{C}_{h_n}} G_n(A) \quad \text{and} \quad \sup_{A \in \mathcal{C}_{h_n}} G^*(A)$$

have the same continuous limit. There remains to show that $T_n$ and $\hat{T}_n$ have the same limit, and that $\sup_{A \in \mathcal{C}_{n,h_n}} G^*(A) = \sup_{A \in \mathcal{C}_{h_n}} G^*(A)$, so that the result follows. The latter derives from the fact that $G^*$ takes at most $n$ different values over $\mathcal{C}_{h_n}$, which are exhausted on $\mathcal{C}_{n,h_n}$. We now prove the former. First, notice that $\mathcal{C}_n \subseteq \mathcal{C}$ implies $\hat{T}_n \leq T_n$.

• Case where $Y = D$ discrete. In that case, there is $n_0$ such that for all $n \geq n_0$, $\mathcal{C}_n = \mathcal{C}$, and the result trivially follows.

• Case where $Y = C \in \mathbb{R}^{d_y}$ has a density with respect to Lebesgue measure. By Theorem 9.14 page 291 of Villani, 2003, there exists a one-to-one bi-measurable (i.e. both itself and its inverse are measurable) and Lipschitz (with constant 1) function $\phi : [0, 1]^{d_y} \to \mathbb{R}^{d_y}$ such that $Y = \phi(V)$ and $V$ is distributed uniformly on $[0, 1]^{d_y}$ ($\phi$ is called a generalized quantile transformation).

Hence, for any set $A \in \mathcal{C}$, we can write

$$P_n(A) = \frac{1}{n} \sum_{i=1}^n 1\{Y_i \in A\} = \frac{1}{n} \sum_{i=1}^n 1\{\phi(V_i) \in A\} = \frac{1}{n} \sum_{i=1}^n 1\{V_i \in \phi^{-1}(A)\} = \lambda_n(\phi^{-1}(A)),$$

where $\lambda_n$ denotes the empirical law associated with an iid sample of uniformly distributed variables on $[0, 1]^{d_y}$.

We have $\hat{T}_n - T_n = \sup_{A \in \mathcal{C}_n} [P_n(A) - \nu(\Gamma(A))] - \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))]$. We show that for all $\epsilon > 0$, there is an $n_0$ such that for all $n > n_0$,

$$\sup_{y \in \mathbb{R}^{d_y}} \inf_{j \in \{1, \ldots, n\}} \left\{ (P_n(-\infty, Y_j] - P_n(-\infty, y]) + (\nu(\Gamma((-\infty, y])) - \nu(\Gamma((-\infty, Y_j]))) \right\} < \epsilon,$$

and we can proceed similarly for sets of the form $(-\infty, y]^c$. The proof of the latter proceeds in three steps:

– By the results stated in the two paragraphs following equation (1) page 919 of Talagrand, 1994, we have for any $\eta > 0$,
$$\sup_{v \in [0, 1]^{d_y}} \min_{j \in \{1, \ldots, n\}} \|v - V_j\| = O_{a.s.}\left(n^{\eta - 1/\max(2, d_y)}\right).$$
Since $\phi$ is Lipschitz, the latter implies that
$$\sup_{y \in \mathbb{R}^{d_y}} \min_{j \in \{1, \ldots, n\}} \|y - Y_j\| = O_{a.s.}\left(n^{\eta - 1/\max(2, d_y)}\right).$$
Consider the mapping $y \mapsto j(y)$ which achieves the minimum of $\|y - Y_{j(y)}\|$. By assumption 4, we have for $n$ large enough, $\sup_{y \in \mathbb{R}^{d_y}} (\nu(\Gamma((-\infty, Y_{j(y)}])) - \nu(\Gamma((-\infty, y]))) < \epsilon/4$.

– We have $\sup_{y \in \mathbb{R}^{d_y}} (P(-\infty, y) - P(-\infty, Y_{j(y)})) < \epsilon/4$, since the set $(-\infty, y) \setminus (-\infty, Y_{j(y)}]$ shrinks uniformly.

– By Theorem 2.3 page 367 of Stute, 1984, we have $\sup_{A \subset \mathbb{R}^{d_y}} (P_n(A) - P(A)) < \epsilon/4$ for $n$ large enough, and the result follows.

Proof of theorem 3: Under $H_a$, there is a set $A_0$ in $\mathcal{C}$ such that $P(A_0) > \nu(\Gamma(A_0))$. Now the test statistic is

$$T_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] = \sup_{A \in \mathcal{C}} [G_n(A) + \sqrt{n}(P(A) - \nu(\Gamma(A)))] \geq G_n(A_0) + \sqrt{n}[P(A_0) - \nu(\Gamma(A_0))]. \qquad (8)$$

Hence,

$$T_n - \tilde{T}_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] - \sup_{A \in \mathcal{C}_{h_n}} G_n(A) \geq \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] - \sup_{A \in \mathcal{C}} G_n(A) \geq G_n(A_0) + \sqrt{n}[P(A_0) - \nu(\Gamma(A_0))] - \sup_{A \in \mathcal{C}} G_n(A),$$

where the first inequality follows from the fact that $\mathcal{C}_{h_n} \subseteq \mathcal{C}$, and the second inequality follows from (8). Since $P(A_0) > \nu(\Gamma(A_0))$, we have $\sqrt{n}[P(A_0) - \nu(\Gamma(A_0))] \to \infty$. Hence, since $G_n(A_0) - \sup_{A \in \mathcal{C}} G_n(A)$ is a tight sequence (this can be derived for instance from exponential bounds in 2.14.9 and 2.14.10 page 246 of van der Vaart and Wellner, 1996), we have $P(T_n \geq c^\alpha_n) \to 1$ for all $\alpha > 0$.

References

Ackerberg, D., Benkard, L., Berry, S., & Pakes, A. (2007). Econometric tools for analyzing market outcomes [Handbook of Econometrics, Volume 6A].
Anderson, T., & Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, pp. 46–63.
Andrews, D. (1988). Chi-squared diagnostic tests for econometric models. Econometrica, pp. 1419–1453.
Andrews, D. (1997). A conditional Kolmogorov test. Econometrica, pp. 1097–1128.
Andrews, D. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica, pp. 399–405.
Andrews, D., Berry, S., & Jia, P. (2003). Placing bounds on parameters of entry games in the presence of multiple equilibria [unpublished manuscript].
Andrews, D., & Guggenberger, P. (2006). The limit of finite sample size and a problem with subsampling [unpublished manuscript].
Beresteanu, A., & Molinari, F. (2008). Asymptotic properties for a class of partially identified models. Econometrica, pp. 763–814.
Blundell, R., Browning, M., & Crawford, I. (2005). Best nonparametric bounds on demand responses [unpublished manuscript].
Brock, B., & Durlauf, S. (2007). Identification of binary choice models with social interactions. Journal of Econometrics, pp. 52–75.
Bugni, F. (2007). Bootstrap methods for some partially identified models [unpublished manuscript].
Chen, X., Hong, H., & Tamer, E. (2005). Measurement error models with auxiliary data. Review of Economic Studies, pp. 343–366.
Chernozhukov, V., Hong, H., & Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, pp. 1243–1285.
Ciliberto, F., & Tamer, E. (2006). Market structure and multiple equilibria in airline markets [unpublished manuscript].
Cross, P., & Manski, C. (2002). Regressions, short and long. Econometrica, pp. 357–368.
Dudley, R. (2002). Real analysis and probability. Cambridge University Press.
Einmahl, U., & Mason, D. (1997). Gaussian approximation of local empirical processes indexed by functions. Probability Theory and Related Fields, pp. 283–311.
Galichon, A., & Henry, M. (2006a). Dilation bootstrap. A methodology for constructing confidence regions with partially identified models [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=934442].
Galichon, A., & Henry, M. (2006b). Inference in incomplete models [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=886907].
Galichon, A., & Henry, M. (2008). Universal power of Kolmogorov-Smirnov tests of under-identifying restrictions [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1123823].
Giné, E., & Zinn, S. (1990). Bootstrapping general empirical measures. Annals of Probability, pp. 851–859.
Heckman, J., Smith, J., & Clements, N. (1997). Making the most out of programme evaluation and social experiments: Accounting for heterogeneity in programme impacts. Review of Economic Studies, pp. 487–535.
Horowitz, J., & Manski, C. (1998). Censoring of outcomes and regressors due to survey nonresponse: Identification and estimation using weights and imputations. Journal of Econometrics, pp. 37–58.
Imbens, G., & Manski, C. (2004). Confidence intervals for partially identified parameters. Econometrica, pp. 1845–1859.
Jovanovic, B. (1989). Observable implications of models with multiple equilibria. Econometrica, pp. 1431–1437.
Kellerer, H. (1984). Duality theorems for marginal problems. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, pp. 399–432.
Koopmans, T., & Reiersol, O. (1950). The identification of structural characteristics. Annals of Mathematical Statistics, pp. 165–181.
Lehmann, E., & Romano, J. (2005). Testing statistical hypotheses. Springer: New York.
Linton, O., Maasoumi, E., & Whang, Y. (2005). Testing for stochastic dominance under general conditions: A subsampling approach. Review of Economic Studies, pp. 735–765.
Liu, X., & Shao, Y. (2003). Asymptotics for likelihood ratio tests under loss of identifiability. Annals of Statistics, pp. 807–832.
Magnac, T., & Maurin, E. (2008). Partial identification in monotone binary models: Discrete regressors and interval data [forthcoming in the Review of Economic Studies].
Manski, C. (1990). Nonparametric bounds on treatment effects. American Economic Review, pp. 319–323.
Manski, C. (2004). Social learning from private experiences: The dynamics of the selection problem. Review of Economic Studies, pp. 443–458.
Manski, C. (2005). Partial identification in econometrics [New Palgrave Dictionary of Economics, 2nd Edition].
Marschak, J., & Andrews, W. (1944). Random simultaneous equations and the theory of production. Econometrica, pp. 143–203.
Matzkin, R. (1994). Restrictions of economic theory in nonparametric methods. Handbook of Econometrics, vol 4, R. F. Engel and D. L. McFadden, eds., pp. 1–16.
McFadden, D. (1989). Testing for stochastic dominance. Studies in the Economics of Uncertainty (in honor of J. Hadar), Part II, T. Fomby and T. Seo, eds., pp. 113–134.
Molchanov, I. (2005). Theory of random sets. Springer: New York.
Molinari, F. (2003). Contaminated, corrupted and missing data [Northwestern University Ph.D.].
Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Académie Royale des Sciences de Paris.
Pakes, A., Porter, J., Ho, K., & Ishii, J. (2004). Moment inequalities and their application [unpublished manuscript].
Roehrig, C. (1988). Conditions for identification in parametric and nonparametric models. Econometrica, pp. 433–447.
Romano, J., & Shaikh, A. (2006). Inference for the identified set in partially identified econometric models [unpublished manuscript].
Romano, J., & Shaikh, A. (2008). Inference for identifiable parameters in partially identified econometric models [forthcoming in the Journal of Statistical Planning and Inference].
Rosen, A. (2008). Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities [forthcoming in the Journal of Econometrics].
Sen, P., & Silvapulle, M. (2004). Constrained statistical inference: Inequality, order and shape restrictions. Wiley-Interscience: New York.
Stute, W. (1984). The oscillation behaviour of empirical processes: The multivariate case. Annals of Probability, pp. 361–379.
Talagrand, M. (1994). The transportation cost from the uniform measure to the empirical measure in dimension greater or equal to three. Annals of Probability, pp. 919–959.
Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria. Review of Economic Studies, pp. 147–165.
Tinbergen, J. (1951). Some remarks on the distribution of labour incomes. International Economic Papers 1: Translations prepared for the economic association, Eds.: Alan T. Peacock et al., pp. 95–207.
van der Vaart, A. (1998). Asymptotic statistics. Cambridge University Press.
van der Vaart, A., & Wellner, J. (1996). Weak convergence and empirical processes. New York: Springer.
Villani, C. (2003).