A test of non-identifying restrictions and confidence regions for partially identified parameters
Alfred Galichon and Marc Henry
École polytechnique, Paris and Université de Montréal
First draft: September 15, 2005
This draft: April 16, 2008

Abstract
We propose an easily implementable test of the validity of a set of theoretical restrictions on the relationship between economic variables, which do not necessarily identify the data generating process. The restrictions can be derived from any model of interactions, allowing censoring and multiple equilibria. When the restrictions are parameterized, the test can be inverted to yield confidence regions for partially identified parameters, thereby complementing other proposals, primarily Chernozhukov et al., 2007.
JEL Classification: C10, C12, C13, C14, C52, C61
Keywords: partial identification, mass transportation, specification test.

This research was partly carried out while the first author was visiting the Bendheim Center for Finance, Princeton University, and financial support from NSF grant SES 0350770 to Princeton University, from NSF grant SES 0532398, from the Program for Economic Research at Columbia University and from Chaire EDF-Calyon “Finance et Développement Durable” is gratefully acknowledged. We are grateful to Victor Chernozhukov, Pierre-André Chiappori, Guido Imbens and Bernard Salanié for encouragement, support and many helpful discussions. We also thank three anonymous referees, whose detailed and insightful comments helped significantly improve the paper, and we thank conference participants at Econometrics in Rio and seminar participants at Berkeley, Chicago, Columbia, École polytechnique, Harvard-MIT, MIT Sloan OR, Northwestern, NYU, Princeton, SAMSI, Stanford, the Weierstrass Institut and Yale for helpful comments (with the usual disclaimer). Correspondence address: Département d’économie, École polytechnique, 91128 Palaiseau, France and Département de sciences économiques, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal QC H3C 3J7, Canada. E-mail: [email protected] [email protected].

Introduction

In several rapidly expanding areas of economic research, the identification problem is steadily becoming more acute. In policy and program evaluation (Manski, 1990) and more general contexts with censored or missing data (Molinari, 2003, Magnac and Maurin, 2008) and measurement error (Chen et al., 2005), ad hoc imputation rules lead to fragile inference. In demand estimation based on revealed preference (Blundell et al., 2005) the data is generically insufficient for identification.
In the analysis of social interactions (Brock and Durlauf, 2007, Manski, 2004), complex strategies to reduce the large dimensionality of the correlation structure are needed. In the estimation of models with complex strategic interactions and multiple equilibria (Tamer, 2003, D. Andrews et al., 2003, Pakes et al., 2004), assumptions on equilibrium selection mechanisms may not be available or acceptable.

More generally, in all areas of investigation with structural data insufficiencies or incompletely specified economic mechanisms, the hypothesized structure fails to identify a unique possible generating mechanism for the data that is actually observed. Hence, when the structure depends on unknown parameters, and even if a unique value of the parameter can still be construed as the true value in some well defined way, it does not correspond in a one-to-one mapping with a probability measure for the observed variables. We then call the structural restrictions non-identifying. In other words, even if we abstract from sampling uncertainty and assume the distribution of the observable variables is perfectly known, no unique parameter but a whole set of parameter values (hereafter called identified set in the terminology of Manski, 2005) will be compatible with it.

Once a theoretical description of an economic system is given, a natural question to consider is whether the structure can be rejected on the basis of data on its observable components. Marschak and Andrews, 1944 construct a collection of production functions that are compatible with structural restrictions and are not rejected by the data. We extend this approach within the general formulation of Koopmans and Reiersol, 1950, who define a structure as the combination of a binary relation between observed socioeconomic variables (market entry, insurance coverage, winning bids in auctions, etc.)
and unobserved ones (productivity shocks, risk level, or risk attitude, valuations or information depending on the auction paradigm, etc.) and a generating mechanism for the unobserved variables. This setup is employed by Roehrig, 1988 and Matzkin, 1994, who analyze conditions for nonparametric identification of structures where the endogenous observable variables are functions of unobservable variables and exogenous observable ones.

Here, following Jovanovic, 1989, we allow the relation between observable and unobservable variables to be many-to-many, thereby including structures with multiple equilibria (when a value of the latent variables is associated with a set of values of the observable variables) and censored endogenous observable variables (where a value of the observable variable is associated with a set of values of the latent variables). We do not strive for identification conditions, but rather for the ability to reject such structures that are incompatible with data, as in the original work of Marschak and Andrews, 1944.

We show that such a goal can be attained in all generality (i.e. for any structure, involving discrete as well as continuous observable variables), through an appeal to the duality of mass transportation (see Villani, 2003 for a comprehensive account of the theory). Given any set of (possibly non-identifying) restrictions on the relation between latent and observable variables, and given the distribution $\nu$ of latent variables, the structure thus defined is compatible with the true distribution $P$ of the observable variables if and only if there exists a joint distribution with marginals $P$ and $\nu$ and such that the restrictions are almost surely respected. Otherwise, the data could not have been generated in such a way.
We show that the latter condition can be formulated as a mass transportation problem (the problem of transporting a given distribution of mass from an initial location to a different distribution of mass in a final location while minimizing a certain cost of transportation, as originally formulated by Monge, 1781). We show that this optimization problem has a dual formulation, an empirical version of which is a generalized Kolmogorov-Smirnov test statistic. We base a test of the restrictions in the structure on this statistic, whose asymptotic distribution we derive, and approximate using the bootstrapped empirical process.

Once we have a test of the structure, we can form confidence regions for unknown parameters using the methodology of Anderson and Rubin, 1949, which consists in collecting all parameter values for which the structure is not rejected by the test at the desired significance level. The construction of such confidence regions has been the focus of much research lately (see for instance the thorough literature review in Chernozhukov et al., 2007). Unlike much of the econometric research on this issue, we do not restrict the analysis to models defined by moment inequalities. On the other hand, we consider structures in the sense of Koopmans and Reiersol, 1950, and hence parametric distributions for the latent variables. This, however, is a common assumption in empirical work with game theoretic models, as exemplified by D. Andrews et al., 2003, Ciliberto and Tamer, 2006, and more generally Ackerberg et al., 2007.

The paper is organized as follows. The next section is divided in four subsections. The first describes the setup; the second defines the hypothesis of compatibility of the structure with the data; the third explains how to construct a confidence region for the identified set, and the fourth reviews the related literature.
The second section is divided in three subsections. The first subsection describes and justifies the generalized Kolmogorov-Smirnov test of compatibility of the structure with the data; the second shows consistency of the test, and the third investigates size properties of the test in a Monte Carlo experiment. The last section concludes.

Consider the model of an economy which is composed of an observed variable $Y$ and a latent, unobserved variable $U$. Formally, $(Y, U)$ is a pair of random vectors defined on a common probability space. The pair $(Y, U)$ has probability law $\pi$, which is unknown. $Y$ represents the variables that are observable, and $U$ the variables that are unobservable. $Y$ may have discrete and continuous components. $Y$ may include variables of interest in their own right, and randomly censored or otherwise transformed versions of variables of interest. We call the law of the observable variables $P$. It is unknown, but the data available is a sample of independent and identically distributed vectors $(Y_1, \ldots, Y_n)$ with law $P$. $U$ includes random shocks and other unobserved heterogeneity components. The law $\pi$ of $(Y, U)$ can be decomposed into the unconditional distribution $P$ of $Y$ and the conditional distribution of $U$ given $Y$, namely $\pi_{U|Y}$. Throughout the paper it is supposed that $\pi_{U|Y}$ is unknown but fixed across observations.

The distribution of $U$ is parameterized by a vector $\theta_1 \in \Theta_1$, where $\Theta_1$ is an open subset of $\mathbb{R}^{d_1}$, and the law of $U$ is denoted $\nu_{\theta_1}$. Finally, an economic model is given to us in the form of a set of restrictions on the vector $(Y, U)$, which can be summarized without loss of generality by the relation $U \in \Gamma_{\theta_2}(Y)$, where $\Gamma_{\theta_2}$ is a many-to-many mapping, which is completely given except for the vector of structural parameters $\theta_2 \in \Theta_2$, where $\Theta_2$ is an open subset of $\mathbb{R}^{d_2}$. $\theta_1$ and $\theta_2$ may contain common components. We call $\theta$ the combination of the two, so that $\theta \in \Theta$, with $\Theta$ an open subset of $\mathbb{R}^{d_\theta}$, and $d_\theta \leq d_1 + d_2$. From now on, we shall therefore denote the distribution of $U$ by $\nu_\theta$ and the many-to-many mapping by $\Gamma_\theta$. In all that follows, we assume that $\Gamma_\theta$ is measurable (a very weak requirement which is defined in the appendix), and has non-empty and closed values.

We are interested in testing the compatibility of the observed variables $Y$ with the model described by $(\Gamma, \nu)$. A related question is set-inference in a parametric model $(\Gamma_\theta, \nu_\theta)$: a confidence region for $\theta$ can be obtained by inverting the specification test, namely retaining the values of $\theta$ which are not rejected.
Note that if $\theta = (\beta, \eta)$, where $\beta$ are the parameters of interest and $\eta \in H$ are nuisance parameters, we can redefine the economic model restrictions as $U \in \Gamma_\beta(Y)$, where $\Gamma_\beta$ is defined by $\Gamma_\beta(y) = \bigcup_{\eta \in H} \Gamma_{(\beta, \eta)}(y)$ for all $y \in \mathbb{R}^{d_y}$. Hence we can assume again without loss of generality that $\theta$ is indeed the parameter of interest.

As the main focus of the present paper is to derive a specification test, whenever there is no ambiguity we shall implicitly fix the parameter $\theta$ and drop it from our notations.

Example 1.
A prominent example for this set-up is provided by the class of models defined by a static game of interaction. Consider a game where the payoff function for player $j$, $j = 1, \ldots, J$, is given by $\Pi_j(S_j, S_{-j}, X_j, U_j; \theta)$, where $S_j$ is player $j$'s strategy and $S_{-j}$ is their opponents' strategies. $X_j$ is a vector of observable characteristics of player $j$ and $U_j$ a vector of unobservable determinants of the payoff. Finally, $\theta$ is a vector of parameters. Pure strategy equilibrium conditions define a many-to-many mapping $\Gamma_\theta$ from unobservable player characteristics $U$ to observable variables $Y = (S, X)$. More precisely, $\Gamma_\theta(s, x) = \{u \in \mathbb{R}^J : \Pi_j(s_j, s_{-j}, x_j, u_j; \theta) \geq \Pi_j(S, s_{-j}, x_j, u_j; \theta), \text{ for all } S \text{ and all } j\}$. When the strategies are discrete, this is the set-up considered by D. Andrews et al., 2003, Pakes et al., 2004, and Ciliberto and Tamer, 2006.

A special case of the latter example is given in Jovanovic, 1989 and will serve as our first illustrative example:
Pilot Example 1.
The payoff functions are $\Pi_1(Y_1, Y_2, U_1, U_2) = (\theta Y_2 - U_1) 1\{Y_1 = 1\}$ and $\Pi_2(Y_1, Y_2, U_1, U_2) = (\theta Y_1 - U_2) 1\{Y_2 = 1\}$, where $Y_i \in \{0, 1\}$ is firm $i$'s action, and $U = (U_1, U_2)'$ are exogenous costs. The firms know their costs; the analyst, however, knows only that $U$ is uniformly distributed on $[0, 1]^2$, and that the structural parameter $\theta$ is in $(0, 1]$. There are two pure strategy Nash equilibria. The first is $Y_1 = Y_2 = 0$ for all $U \in [0, 1]^2$. The second is $Y_1 = Y_2 = 1$ for all $U \in [0, \theta]^2$ and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable $Y = Y_1 = Y_2$. Hence the structure is described by the many-to-many mapping: $\Gamma_\theta(1) = [0, \theta]^2$ and $\Gamma_\theta(0) = [0, 1]^2$. In this case, since $Y$ is Bernoulli, we can characterize $P$ with the probability $p$ of observing a 1.

A second example illustrates the case with continuous observable variables:
Pilot Example 2.
Tinbergen, 1951 first spelt out the implications of skill and job requirement heterogeneity on the distribution of wages. We adopt a simplified version of the skill versus job requirements relation for illustrative purposes. Suppose one observes available jobs in an economy, each characterized by a set of characteristics $Y$ with distribution $P$. Workers' skills are unobserved, and are assumed for illustrative purposes to be characterized by an index $U \in \mathbb{R}$. Fulfillment of job $Y$ is known to require a range of skills $\Gamma_\theta(Y) = [\underline{s}_\theta(Y), \overline{s}_\theta(Y)]$. The distribution of skills is parameterized by $\nu_\theta$.

Identification of the parameter $\theta$ would require the correspondence between the law of the observations $P$ and the parameter vector $\theta$ to be a function. Compared to the setup described in Roehrig, 1988, there is the added complexity of the possibility that the observable variables have discrete components, and that the structure allows multiple equilibria. Conditions ensuring identification are likely to prove complicated and restrictive, and will often rule out multiple equilibria, which is the norm rather than the exception in example 1. We therefore eschew identification, and allow the relation between $P$ and $\theta$ to be many-to-many. Our objective is to conduct inference on the set $\Theta_I$ of parameter values that are compatible with the true law of the observable variables $P$.

Let us formally define compatibility of a given value $\theta$ of the parameter vector with a law $P$ for the observable variables $Y$. When $\theta$ is fixed, all the elements in the model are completely known. We therefore have a structure in the terminology of Koopmans and Reiersol, 1950 extended by Jovanovic, 1989. The structure is given by the law $\nu_\theta$ for $U$, and the many-to-many mapping $\Gamma_\theta$ linking $Y$ and $U$. We denote this structure by the triple $(P, \Gamma_\theta, \nu_\theta)$. Consider now the restrictions that $(P, \Gamma_\theta, \nu_\theta)$ imposes on the unknown $\pi$, the law of the vector of variables $(Y, U)$.
• Its marginal with respect to $Y$ is $P$,
• Its marginal with respect to $U$ is $\nu_\theta$,
• The economic restrictions $U \in \Gamma_\theta(Y)$ hold $\pi$ almost surely.

A probability law $\pi$ that satisfies the restrictions above may or may not exist. If and only if it does, we say that the structure $(P, \Gamma_\theta, \nu_\theta)$ is internally consistent, or simply that the value $\theta$ of the parameter is compatible with the law $P$ of the observable variables. If no value $\theta$ is found such that the structure is internally consistent, then the model restrictions are rejected.

Definition 1. A structure $(P, \Gamma, \nu)$ for $(Y, U)$ given by a probability law $P$ for $Y$, a probability law $\nu$ for $U$ and a set of restrictions $U \in \Gamma(Y)$ is called internally consistent if there exists a law $\pi$ for the vector $(Y, U)$ with marginals $P$ and $\nu$ such that $\pi(\{U \in \Gamma(Y)\}) = 1$.

We can now define the identified set as the set of values of the parameters that achieve this internal consistency. They are observationally equivalent, since even though they may correspond to different $\pi$'s, they correspond to the same $P$.

Definition 2.
The identified set $\Theta_I = \Theta_I(P)$ is the set of values $\theta$ of the parameter vector such that the structure $(P, \Gamma_\theta, \nu_\theta)$ is internally consistent.

We illustrate the previous definitions with our pilot example:
Pilot example 1 continued
For a given value of $\theta$, the structure $(P, \Gamma_\theta, \nu_\theta)$ is defined by $p$, $\Gamma_\theta$ and the uniform distribution $\nu_\theta$ on $[0, 1]^2$. $(P, \Gamma_\theta, \nu_\theta)$ is internally consistent if there exists a probability on $\{0, 1\} \times [0, 1]^2$ with marginal frequency $p$ of observing $Y = 1$, and uniform marginal distribution for the costs $U$, such that $Y = 1 \Rightarrow U \leq \theta$ almost surely (where the last inequality is meant coordinate by coordinate).

The previous example illustrates the fact that definition 1 is not very easy to apply to derive the identified set in specific problems. We therefore propose a characterization of internal consistency which will prove more practical, and which, as we shall see in the next section, will motivate the construction of the statistic to test internal consistency.

Proposition 1.
A structure $(P, \Gamma, \nu)$ is internally consistent if and only if $\sup_{A \in \mathcal{B}} [P(A) - \nu(\Gamma(A))] = 0$, where $\mathcal{B}$ is the collection of measurable sets in the space of realizations of $Y$.

This proposition shows that checking internal consistency of a structure is equivalent to checking that the $P$-measure of a set is always dominated by the $\nu$-measure of the image of this set by $\Gamma$ (recall that the image of a set $A$ by a many-to-many mapping is defined by $\Gamma(A) = \bigcup_{a \in A} \Gamma(a)$). Note that it is relatively easy to show necessity, i.e. that the existence of $\pi$ satisfying the constraints (the definition of internal consistency) implies that $\sup_{A \in \mathcal{B}} [P(A) - \nu(\Gamma(A))] = 0$. Indeed, the definition of internal consistency implies that $Y \in A \Rightarrow U \in \Gamma(A)$, so that $1\{Y \in A\} \leq 1\{U \in \Gamma(A)\}$, $\pi$-almost surely. Taking expectation, we have $E_\pi(1\{Y \in A\}) \leq E_\pi(1\{U \in \Gamma(A)\})$, which yields the result, since $\pi$ has marginals $P$ and $\nu$. The converse (proved in the appendix) is far more involved, as it relies on mass transportation duality, where mass $P$ is transported into mass $\nu$ with 0-1 cost of transportation associated with violations of the restrictions $U \in \Gamma(Y)$.

Pilot example 1 continued
For a given $\theta$, it is now very easy to derive the condition for internal consistency of the structure. Indeed, all we need to check is that $\sup_{A \in 2^{\{0,1\}}} [P(A) - \nu_\theta(\Gamma_\theta(A))] = 0$ (where $2^B$ is the collection of all subsets of a set $B$), which only constrains $P(\{1\}) \leq \nu_\theta([0, \theta]^2)$, hence $p \leq \theta^2$. So the identified set for the structural parameter is $\Theta_I = [\sqrt{p}, 1]$.

Remark 1.
Further dimension reduction requires the determination of classes of sets $A$ on which to check the inequality between $P(A)$ and $\nu(\Gamma(A))$. This is needed for instance when the observable variables are discrete and take many different values, since checking the inequality for all subsets of the set of possible values would involve a very large number of operations. Galichon and Henry, 2006b addresses this issue with a theory of core determining classes.

Pilot example 2 continued
Fixing $\theta$ (and dropping it from the notation), the necessary and sufficient condition for internal consistency of the structure is that $P(A) \leq \nu(\Gamma(A))$ for any measurable set $A$. Suppose for expositional purposes that the jobs are characterized by a real valued random variable $Y$, and that required skills are monotone in the sense that $\underline{s}$ and $\overline{s}$ are nondecreasing. As shown in Galichon and Henry, 2006b, the inequality needs to be checked only on sets of the form $A = (-\infty, y]$ and $A = (y, +\infty)$, for $y \in \mathbb{R}$, so that a necessary and sufficient condition for internal consistency of the structure is that $F_\nu(\underline{s}(y)) \leq F(y) \leq F_\nu(\overline{s}(y))$, where $F$ is the cumulative distribution function of jobs $Y$, and $F_\nu$ is the cumulative distribution function of skills $U$.

Given a sample $(Y_1, \ldots, Y_n)$ of independently and identically distributed realizations of $Y$, our objective is to construct a sequence of random sets $\Theta^\alpha_n$ such that for all $\theta \in \Theta_I$, $\lim_{n \to \infty} \Pr(\theta \in \Theta^\alpha_n) = 1 - \alpha$. In other words, we are concerned with constructing a region $\Theta^\alpha_n$ that covers each value of the identified set, as opposed to a region $\tilde{\Theta}$ that covers the identified set uniformly, i.e. such that $\Pr(\Theta_I \subseteq \tilde{\Theta}) = 1 - \alpha$. We do so by including in $\Theta^\alpha_n$ all the values of $\theta$ such that we fail to reject a test of internal consistency of $(P, \Gamma_\theta, \nu_\theta)$ with asymptotic level $1 - \alpha$. We shall demonstrate the construction of a test statistic $T_n(\theta)$ and a sequence $c^\alpha_n(\theta)$ such that, conditionally on the structure $(P, \Gamma_\theta, \nu_\theta)$ being internally consistent, the probability that $T_n(\theta) \leq c^\alpha_n(\theta)$ is $1 - \alpha$ asymptotically, i.e.

$$\lim_{n \to \infty} \Pr\left(T_n(\theta) \leq c^\alpha_n(\theta) \mid (P, \Gamma_\theta, \nu_\theta) \text{ is internally consistent}\right) = 1 - \alpha. \qquad (1)$$

Hence we define our confidence region in the following way.

Table 1: Summary of the procedure
1. For a given value of $\theta$, calculate $\hat{T}_n(\theta) = \sqrt{n} \sup_{A \in \mathcal{C}_n} [P_n(A) - \nu_\theta(\Gamma_\theta(A))]$, where the collection of sets $\mathcal{C}_n$ is described in table 2, and $P_n$ is the empirical distribution of the sample $(Y_1, \ldots, Y_n)$, so that $P_n(A) = (1/n) \sum_{i=1}^n 1\{Y_i \in A\}$.
2. Choose a large integer $B$. Draw $B$ bootstrap samples $(Y^b_1, \ldots, Y^b_n)$, $b = 1, \ldots, B$, with replacement from the initial sample $(Y_1, \ldots, Y_n)$. For each bootstrap sample, calculate $T^b_n(\theta) = \sqrt{n} \sup_{A \in \mathcal{C}_{n,h_n}(\theta)} [P^b_n(A) - P_n(A)]$, where $P^b_n$ is the empirical distribution of the bootstrap sample, and $\mathcal{C}_{n,h_n}(\theta)$ is described in table 2. Order the $T^b_n(\theta)$'s and call $c^\alpha_*(\theta)$ the $B(1 - \alpha)$ largest.
3. Include $\theta$ in the confidence region $\Theta^\alpha_n$ if and only if $\hat{T}_n(\theta) \leq c^\alpha_*(\theta)$.

Definition 3.
The $(1 - \alpha)$ confidence region for $\Theta_I$ is $\Theta^\alpha_n = \{\theta \in \Theta : T_n(\theta) \leq c^\alpha_n(\theta)\}$.

The full procedure is summarized in table 1. It is clear from equation (1) and the above definition that our confidence region covers each element of the identified set with probability $1 - \alpha$ asymptotically. Hence, after a section devoted to discussing in detail our contribution within the literature on the topic, the remainder of this paper will be concerned with the construction of the statistic $T_n$ and sequence $c^\alpha_n$ with the required property (1).

Pilot example 1 continued
The test statistic is then $T_n(\theta) = \sqrt{n} \sup_{A \in 2^{\{0,1\}}} [P_n(A) - \nu_\theta(\Gamma_\theta(A))]$. Since $P_n(\emptyset) = \nu_\theta(\Gamma_\theta(\emptyset))$ and $P_n(\{1\}) - \nu_\theta(\Gamma_\theta(\{1\})) = p_n - \theta^2$, the test statistic is equal to $T_n(\theta) = \max\{\sqrt{n}(p_n - p) + \sqrt{n}(p - \theta^2), 0\}$, which tends to $\max\{\sqrt{p(1 - p)}\, Z, 0\}$, where $Z$ is a standard normal random variable, if $p = \theta^2$, to 0 if $p < \theta^2$, and to $+\infty$ if $p > \theta^2$. For any $\theta$ such that $p \leq \theta^2$, $T_n(\theta)$ has the same limit as $\tilde{T}_n = \sup_{A \in \mathcal{C}_{h_n}} [\sqrt{n}(P_n(A) - P(A))]$, where $\mathcal{C}_{h_n}$ is equal to $\{\emptyset, \{0, 1\}\}$ if $p_n < \theta^2 - h_n$ and $2^{\{0,1\}}$ if $p_n \geq \theta^2 - h_n$. Hence the confidence region $\Theta^\alpha_n$ is the set of $\theta$ values that are not rejected in a one-sided test of the null hypothesis $p \leq \theta^2$ against the alternative $p > \theta^2$ based on the quantiles of the distribution of $\max\{\sqrt{n}(p^* - p_n), 0\}$ given the sample (where $p^*$ denotes the frequency of 1's in a bootstrap sample).

Table 2: Collection of sets
1. Take the sample $(Y_1, \ldots, Y_n)$. Write $Y_i = (D_i, C_i)$, where $D_i$ includes the discrete components, and $C_i$ the continuous components of the observable variables in the sample. Call $\mathcal{X}_D$ the set of values taken by $D_i$. Then, $\mathcal{C}_n$ is the collection of sets of the form $A_D \times [-\infty, C_i]$ or its complement, where $i = 1, \ldots, n$, $A_D$ ranges over the subsets of $\mathcal{X}_D$, and $[-\infty, C_i]$ denotes the hyper-rectangle bounded above by the components of $C_i$.
2. Given $h_n$ satisfying $h_n \ln \ln n + h_n^{-1} \sqrt{\ln \ln n / n} \to 0$ as $n \to \infty$ (e.g. $h_n = (\ln n)^{-1}$), take $\mathcal{C}_{n,h_n}(\theta) = \{A \in \mathcal{C}_n : P_n(A) \geq \nu_\theta(\Gamma_\theta(A)) - h_n\}$.

This paper appears to be the first to cast partial identification as a mass transportation problem. Somewhat related is the specific use of Fréchet-Hoeffding bounds on cell probabilities in Heckman et al., 1997 and Cross and Manski, 2002.

The literature on specification testing in econometrics is quite extensive (see the many references in D. Andrews, 1988 for Cramér-von Mises tests and D. Andrews, 1997 for the Kolmogorov-Smirnov type). Jovanovic, 1989 proposes to consider testing specifications with multiple equilibria and possible lack of identification with a generalization of the Kolmogorov-Smirnov specification test, which is exceedingly conservative unless the structure is nearly identified. The stochastic dominance tests of McFadden, 1989 (see also Linton et al., 2005 and references within) are also related to tests of partially identified structures based on the Kolmogorov-Smirnov statistic. The feasible version of our testing procedure and the use of the bootstrapped empirical process is related to D.
Andrews, 1997.

The incompleteness of the structure to be tested raises boundary problems, which appear also in the estimation of models defined by moment inequalities (see Imbens and Manski, 2004 and the link drawn by Rosen, 2008 with the literature on constrained statistical testing, surveyed in Sen and Silvapulle, 2004) and stochastic dominance testing (see Linton et al., 2005). Here the asymptotic analysis is carried out via a localization of the empirical processes to treat the boundary problem, which is another major innovation of this paper. Also related is the analysis in Liu and Shao, 2003 of the likelihood ratio test when the likelihood is maximized on a set as opposed to a single point.

The related problem of constructing confidence regions for partially identified structural parameters is the focus of considerable recent research, following the recognition (advocated in Manski, 2005) that ad hoc identification conditions can considerably weaken inference drawn on their basis. Horowitz and Manski, 1998 propose confidence intervals that asymptotically cover interval identified sets with fixed probability. Beyond the interval case, Chernozhukov et al., 2007 propose a criterion function based method, where the criterion is maximized on a set, as opposed to a single point. The method allows the construction of confidence regions for the identified set and for each parameter value in the identified set. Chernozhukov et al., 2007 also specialize their method to the case of models defined by moment inequalities, with a quadratic criterion function.

The case of moment inequalities is also considered as a special case by Galichon and Henry, 2006a, Romano and Shaikh, 2008 and Romano and Shaikh, 2006 (see also Rosen, 2008 and Bugni, 2007). The present paper complements Chernozhukov et al., 2007 in that it justifies, via a mass transportation argument, the use of a generalized Kolmogorov-Smirnov criterion function in the extended Koopmans and Reiersol, 1950 setup presented here.
Note that our proposed use of the bootstrap only concerns the empirical process, as in D. Andrews, 2000, so that issues of validity related to bootstrapping the test statistic itself do not arise.

The Anderson and Rubin, 1949 approach taken here to construct confidence regions for parameter values within the identified set is also adopted in Chernozhukov et al., 2007, D. Andrews et al., 2003, Romano and Shaikh, 2008 among many others. D. Andrews et al., 2003 work in a similar framework to the present paper (they consider example 1), but restrict their analysis to discrete dependent variables, and use a projection method, so that their inference is likely to be more conservative.

Since confidence regions are asymptotically validated, as emphasized by Imbens and Manski, 2004, uniformity of the confidence region for parameter values is a desirable property for small sample accuracy. D. Andrews and Guggenberger, 2006 analyze uniformity of sub-sampling procedures. Romano and Shaikh, 2008 and Romano and Shaikh, 2006 give high level conditions for uniformity of sub-sampling procedures in the criterion-based approach, with specific conditions under which these results hold in case of regression with interval outcomes. Here, we propose to invert a test, which is shown to be asymptotically uniform in level in Galichon and Henry, 2008.

In related research, Beresteanu and Molinari, 2008 propose a direct analogy to central limit theorem based confidence regions in best linear prediction problems. The confidence region they propose for the identified set, in a problem of best linear prediction with interval outcomes, is the union of a collection of random sets that contain the identified set with pre-specified probability. The latter is obtained from central limit theorems for random sets (see Molchanov, 2005 for a comprehensive account of the theory). They propose one-sided and two-sided versions of their test.
The Beresteanu and Molinari, 2008 two-sided procedure does not suffer from discontinuity at the limit where the identified set is a singleton. However, by construction, Beresteanu and Molinari, 2008 only provide confidence regions for the whole set, which are typically larger than identified regions for each point in the identified set.
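
To make the procedure of table 1 concrete, the following Python sketch applies it to pilot example 1, where the test statistic reduces to $\sqrt{n} \max\{p_n - \theta^2, 0\}$ and the bootstrap quantity to $\max\{\sqrt{n}(p^* - p_n), 0\}$. It is only an illustration under stated assumptions: the function names, the grid over $\theta$, the choice $h_n = 1/\ln n$, the number of bootstrap draws and the order-statistic convention for the quantile are all hypothetical choices made for this sketch, not part of the paper.

```python
import math
import random


def statistic(y, theta):
    """Test statistic for pilot example 1: sqrt(n) * max(p_n - theta^2, 0)."""
    n = len(y)
    p_n = sum(y) / n
    return math.sqrt(n) * max(p_n - theta ** 2, 0.0)


def bootstrap_critical_value(y, theta, alpha, n_boot, rng):
    """Bootstrap 1 - alpha quantile of max(sqrt(n) * (p* - p_n), 0)."""
    n = len(y)
    p_n = sum(y) / n
    h_n = 1.0 / math.log(n)  # assumed bandwidth; requires n > 1
    # Localization: the set {1} is dropped when p_n < theta^2 - h_n,
    # in which case the supremum over the remaining sets is zero.
    if p_n < theta ** 2 - h_n:
        return 0.0
    draws = []
    for _ in range(n_boot):
        resample = rng.choices(y, k=n)  # draw with replacement
        p_star = sum(resample) / n
        draws.append(math.sqrt(n) * max(p_star - p_n, 0.0))
    draws.sort()
    # Assumed convention: the ceil(B * (1 - alpha))-th smallest draw.
    k = min(n_boot, math.ceil(n_boot * (1.0 - alpha))) - 1
    return draws[k]


def confidence_region(y, grid, alpha=0.05, n_boot=200, seed=0):
    """Keep the grid values of theta that the test does not reject."""
    rng = random.Random(seed)
    return [
        theta for theta in grid
        if statistic(y, theta)
        <= bootstrap_critical_value(y, theta, alpha, n_boot, rng)
    ]


# Hypothetical sample: 50 ones out of 200 observations, so p_n = 0.25.
y = [1] * 50 + [0] * 150
grid = [i / 10 for i in range(1, 11)]
region = confidence_region(y, grid)
print(region)
```

With this hypothetical sample, values of $\theta$ whose square lies well below the observed frequency are rejected, while every grid value with $\theta^2 \geq p_n$ is retained, in line with the one-sided nature of the test in pilot example 1.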
As explained in the previous section, the construction of the confidence region relies on a test of internal consistency of the structure $(P, \Gamma_\theta, \nu_\theta)$ for a fixed $\theta$. We now explain the construction of our test statistic and decision rule, for the hypothesis of internal consistency of a structure $(P, \Gamma, \nu)$ defined by a probability law $\nu$ for $U$ and a set of constraints $U \in \Gamma(Y)$. The hypothesis that $(P, \Gamma, \nu)$ is internally consistent is equivalent to the existence of a law $\pi$ for $(Y, U)$ with marginals $P$ and $\nu$ and such that the constraints $U \in \Gamma(Y)$ hold $\pi$-almost surely. By proposition 1, this null hypothesis is also equivalent to

$$H_0 : \sup_{A \in \mathcal{B}} [P(A) - \nu(\Gamma(A))] = 0.$$

We propose the following statistic to test the null described above:

$$T_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))], \qquad (2)$$

where $P_n$ is the empirical distribution of the sample (so that for any measurable set $A$, $P_n(A) = (1/n) \sum_{i=1}^n 1\{Y_i \in A\}$) and where $\mathcal{C}$ is defined in table 3.

Table 3: Collections of sets
1. Write $Y = (D, C)$, where $D$ includes the discrete components, and $C$ the continuous components with dimension $d_C$. Call $\mathcal{X}_D$ the set of values taken by $D$. Then, $\mathcal{C}$ is the collection of sets of the form $A_D \times [-\infty, c]$ or its complement, where $c \in \mathbb{R}^{d_C}$, $A_D$ ranges over the subsets of $\mathcal{X}_D$, and $[-\infty, c]$ is the hyper-rectangle bounded above by the components of $c$.
2. Given $h > 0$, define
$\mathcal{C}_b = \{A \in \mathcal{C} : P(A) = \nu(\Gamma(A))\}$,
$\mathcal{C}_{b,h} = \{A \in \mathcal{C} : P(A) \geq \nu(\Gamma(A)) - h\}$,
$\mathcal{C}_h = \{A \in \mathcal{C} : P_n(A) \geq \nu(\Gamma(A)) - h\}$.

This statistic is a generalized Kolmogorov-Smirnov specification test statistic in the sense that when $\Gamma$ has disjoint images (i.e. $\Gamma^{-1}$ is a function), $T_n$ is a multivariate Kolmogorov-Smirnov statistic for the test of the hypothesis that the structure is correctly specified, i.e. that the probability law $A \mapsto \nu(\Gamma(A))$ is indeed equal to the true law $P$ generating the observable variables $Y$. In the general case where $\Gamma$ is a many-to-many mapping, $A \mapsto \nu(\Gamma(A))$ is no longer a probability measure, since two sets $A$ and $B$ may be disjoint, and yet their images $\Gamma(A)$ and $\Gamma(B)$ are not, so that $\nu(\Gamma(A \cup B))$ may be strictly smaller than $\nu(\Gamma(A)) + \nu(\Gamma(B))$. This introduces significant complications in the asymptotic analysis of the statistic $T_n$, as explained in the following discussion.

We can write

$$T_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] = \sup_{A \in \mathcal{C}} \left\{ G_n(A) + \sqrt{n} [P(A) - \nu(\Gamma(A))] \right\}, \qquad (3)$$

where $G_n(A) := \sqrt{n} [P_n(A) - P(A)]$ is the empirical process. In the case of the classical Kolmogorov-Smirnov statistic (i.e. if $\Gamma^{-1}$ were a function), the term $P(A) - \nu(\Gamma(A))$ would vanish under the null hypothesis. Here, however, under the null we only have $P(A) \leq \nu(\Gamma(A))$, so that the term $\sqrt{n} [P(A) - \nu(\Gamma(A))]$ will also contribute. Indeed, for any set $A \in \mathcal{C}$ such that $P(A) = \nu(\Gamma(A))$ (i.e. $A \in \mathcal{C}_b$ as defined in table 3), the only remaining term in the right-hand-side of equation (3) is the empirical process. On the other hand, for any set $A \in \mathcal{C}$ such that $P(A) < \nu(\Gamma(A))$, $\sqrt{n} [P(A) - \nu(\Gamma(A))]$ will take increasingly large negative values and eventually dominate the expression inside the supremum in the right-hand-side of equation (3), and such a set $A$ will not contribute to the supremum.
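
The two regimes in equation (3) can be seen numerically in pilot example 1, where for $A = \{1\}$ the empirical process term is $\sqrt{n}(p_n - p)$ and the drift term is $\sqrt{n}(p - \theta^2)$. The Python sketch below is a minimal illustration only: the function names, the simulated Bernoulli sample, the seed and the parameter values are assumptions made for this example.

```python
import math
import random


def decompose(y, p, theta):
    """Split sqrt(n) * (p_n - theta^2) for A = {1} into the two terms of (3)."""
    n = len(y)
    p_n = sum(y) / n
    g_n = math.sqrt(n) * (p_n - p)           # empirical process G_n({1})
    drift = math.sqrt(n) * (p - theta ** 2)  # sqrt(n) * [P(A) - nu(Gamma(A))]
    return g_n, drift


def simulate(p, theta, n, seed=0):
    """Draw a Bernoulli(p) sample of size n and decompose the statistic."""
    rng = random.Random(seed)
    y = [1 if rng.random() < p else 0 for _ in range(n)]
    return decompose(y, p, theta)


# theta = 0.5 puts {1} on the boundary (p = theta^2); theta = 0.8 does not.
for theta in (0.5, 0.8):
    for n in (100, 10000):
        g_n, drift = simulate(0.25, theta, n)
        print(theta, n, round(g_n, 3), round(drift, 3))
```

On the boundary the drift is identically zero, so only the empirical process term remains, whereas in the interior case the drift grows like $-\sqrt{n}$ and swamps the bounded-in-probability fluctuation of the empirical process.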
We show in the proof of theorem 1 that, under a very mild assumption on the structure, the limit will only involve a supremum over sets in $\mathcal{C}_b$. Since $\mathcal{C}_b$ depends on $P$, it is unknown, and needs to be approximated by a data dependent class $\mathcal{C}_{h_n}$ defined in table 3 (namely $\mathcal{C}_h$ with $h = h_n$).

Definition 4. The test statistic $T_n$ is given by equation (2), and $c_n^\alpha$ is the $1 - \alpha$ quantile of $\tilde{T}_n := \sup_{A \in \mathcal{C}_{h_n}} G_n(A)$ (with $\mathcal{C}_h$ defined in table 3), i.e. $c_n^\alpha = \inf\{c : P(\tilde{T}_n \leq c) \geq 1 - \alpha\}$.

Assumption 1. There exist $K > 0$ and $0 < \eta < 1$ such that for all $A \in \mathcal{C}_{b,h}$, for $h > 0$ sufficiently small, there exists an $A_b \in \mathcal{C}_b$ such that $A_b \subseteq A$ and $d_H(A, A_b) \leq K h^\eta$. ($\mathcal{C}_b$ and $\mathcal{C}_{b,h}$ are defined in table 3, and $d_H$ denotes the Hausdorff metric, defined in the appendix.)

Remark 2.
Assumption 1 is very mild, in the sense that it fails only in pathological cases, such as the case where $y \in \mathbb{R}$ and $y \mapsto P((-\infty, y]) - \nu(\Gamma((-\infty, y]))$ is $C^\infty$ with all derivatives equal to zero at some $y = y_0$ such that $(-\infty, y_0] \in \mathcal{C}_b$.

Assumption 2. $h_n$ satisfies $h_n \ln \ln n + h_n^{-1} \sqrt{\ln \ln n / n} \to 0$ as $n \to \infty$.

Remark 3. Assumption 2 is extremely mild: it is satisfied for instance in case $h_n = (\ln n)^{-1}$, or in case $h_n$ satisfies $h_n n^{\eta} + h_n^{-1} n^{\eta - 1/2} \to 0$ as $n \to \infty$ for any $1/2 > \eta > 0$, however small.

Theorem 1.
Suppose $Y$ either takes values in a finite set or has a density with respect to Lebesgue measure. Under assumptions 1 and 2, and using the notation of definition 4, we have

$$\lim_{n \to \infty} P(T_n \leq c_n^\alpha \mid (P, \Gamma, \nu) \text{ is internally consistent}) = 1 - \alpha.$$

Theorem 1 is not applicable directly for two reasons:
1. The quantile sequence $c_n^\alpha$ given in definition 4 is infeasible in that the statistic $\tilde{T}_n$ involves the empirical process $G_n = \sqrt{n} [P_n - P]$ with $P$ unknown.
2. The statistics $T_n$ and $\tilde{T}_n$ are defined as suprema over infinite collections of sets $\mathcal{C}$ and $\mathcal{C}_h$ (with $\mathcal{C}$ and $\mathcal{C}_h$ defined in table 3).

We now show that $T_n$ can be replaced by $\hat{T}_n$ defined in table 2, and that $c_n^\alpha$ can be replaced by $c^\alpha_*$, which is the $1 - \alpha$ quantile of $T^* := \sup_{A \in \mathcal{C}_{n,h_n}} G^*(A)$, where $G^* := \sqrt{n} [P^* - P_n]$ is the bootstrapped empirical process. We thereby justify the fully implementable procedure described in table 1. This feasible version of the test mirrors the feasible version of the conditional Kolmogorov-Smirnov test proposed by D. Andrews, 1997, albeit in generalized form (multivariate and incompletely specified).

To that end, we need a large support assumption and a log concavity assumption for the distribution of observable variables, and a continuity assumption on the mapping $\Gamma$, to ensure that $\hat{T}_n$ has the same limit as $T_n$.

Assumption 3. In case $P$ has a density with respect to Lebesgue measure, the density is bounded away from zero, absolutely continuous and log concave (note that log concave densities include the uniform, normal, beta, exponential and extreme value distributions).

Assumption 4.
The functions $y \mapsto \nu(\Gamma((-\infty, y]))$ and $y \mapsto \nu(\Gamma((-\infty, y]^c))$ are Lipschitz, i.e. there exists some $k > 0$ such that $|\nu(\Gamma((-\infty, y])) - \nu(\Gamma((-\infty, y']))| \leq k \|y - y'\|$, and identically for $(-\infty, y]^c$.

Theorem 2. Under the assumptions of theorem 1 and assumptions 3 and 4, we have

$$\lim_{n \to \infty} P(\hat{T}_n \leq c^\alpha_* \mid (P, \Gamma, \nu) \text{ is internally consistent}) = 1 - \alpha$$

almost surely, conditionally on the sample.

Remark 4.
The conditions for the validity of the bootstrap procedure are no more restrictive than the conditions for theorem 1. The additional assumptions, which are more high level, are needed only to justify using the data driven class of sets $\mathcal{C}_n$ instead of $\mathcal{C}$. This follows the proposal in D. Andrews, 1997 in order to simplify the testing procedure as much as possible. However, an alternative feasible version of the test relies on a regular discretization $(y_k)_{k=1}^N$ of the space of continuous observable variables (thereby replacing $\mathcal{C}_n$ by the class of sets of the form $(-\infty, y_k]$, $(-\infty, y_k]^c$, $k = 1, \ldots, N$).

To complete the analysis of the test of internal consistency we give conditions under which the test is consistent. The class of alternatives we consider is the following:

$$H_a: \sup_{A \in \mathcal{C}} [P(A) - \nu(\Gamma(A))] \neq 0,$$

where $\mathcal{C}$ is defined in table 3. We choose this class of alternatives since it simplifies to the set of alternatives in a multivariate Kolmogorov-Smirnov goodness-of-fit test when $P$ is absolutely continuous with respect to Lebesgue measure and when $\Gamma^{-1}$ is a function. We have

Theorem 3. Under $H_a$ and the assumptions of theorem 1, $\lim_{n \to \infty} P(T_n \geq c_n^\alpha) = 1$.

Remark 5.
Notice that the validity of this consistency test is completely general, and, unlike theorem 1, the proof is a straightforward extension of the proof of consistency of the traditional Kolmogorov-Smirnov specification test (see for instance page 526 of Lehmann and Romano, 2005).

.3 Small sample investigation of the properties of the test of internal consistency

We investigate the small sample properties of our test, and compare them to the properties of the Kolmogorov-Smirnov specification test in the identified case, in a small Monte Carlo experiment based on a special case of illustrative example 2.

We consider the following setup, illustrated in figure 1: the structure is given by the correspondence $\Gamma(Y) = [\underline{s}(Y), \overline{s}(Y)]$ with $\underline{s}(Y) = \max(0, Y - s)$ and $\overline{s}(Y) = \min(1, Y + s)$, $s = 0.15$, and the latent variable $U$ has law $\nu$, which is the uniform distribution over $[0, 1]$. $Y$ has cumulative distribution function defined on $[0, 1]$ by F ( y ) = 0 for 0 ≤ y < s, = y − s for s ≤ y < s , = (1 + 4 s ) y − s − s for 1 + s ≤ y < − s , = y + s for 2 − s ≤ y < − s, = 1 for 1 − s ≤ y ≤ .

Figure 1: The correspondence $\Gamma$ is given by the shaded area, and the thick lines trace the inverse cumulative distribution function of $Y$.

Table 4: Rejection levels for the partially identified case.

Sample size      100      500      1000
α = 0.01         0.001    0.007    0.008
α = 0.05         0.010    0.024    0.029
α = 0.10         0.029    0.049    0.066

Table 5: Rejection levels for the exactly identified case.

Sample size      100      500      1000
α = 0.01         0.019    0.024    0.014
α = 0.05         0.074    0.079    0.050
α = 0.10         0.138    0.135    0.105

We perform 1000 repetitions of the following testing procedure, and we report the proportions of rejections out of these 1000 repetitions. We first generate a sample $(U_1, \ldots, U_n)$ of iid uniform $[0, 1]$ variables, for $n = 100, 500, 1000$, and obtain the sample $(Y_1, \ldots, Y_n)$ as $(F^{-1}(U_1), \ldots, F^{-1}(U_n))$. $P_n$ is the empirical law of $(Y_1, \ldots, Y_n)$, and $\mathcal{C}_{n,h_n}$ is the collection of sets of the form $[0, Y_i]$, $i = 1, \ldots, n$, with

$$P_n[0, Y_i] = (1/n) \sum_{j=1}^n 1\{Y_j \leq Y_i\} \geq \nu(\Gamma([0, Y_i])) - h_n = \min[1, Y_i + s] - h_n,$$

or of the form $[Y_i, 1]$, $i = 1, \ldots, n$, with

$$P_n[Y_i, 1] \geq \nu(\Gamma([Y_i, 1])) - h_n = \min[1, 1 - Y_i + s] - h_n.$$

For each sample, we draw 1000 bootstrap samples $(Y_1^b, \ldots, Y_n^b)$, and call $P^b$ the law of the bootstrap sample. For each bootstrap sample, we calculate the maximum of the quantities $P^b[0, Y_i] - P_n[0, Y_i]$ for all $i$ such that $[0, Y_i] \in \mathcal{C}_{n,h_n}$ and $P^b[Y_i, 1] - P_n[Y_i, 1]$ for all $i$ such that $[Y_i, 1] \in \mathcal{C}_{n,h_n}$, and call this maximum $\max G^b$. Order the values of $\max G^b$ obtained for all bootstrap draws in increasing order, and call $c^\alpha_*$ the $(1 - \alpha) \times 1000$-th of them, for $\alpha = 0.01, 0.05, 0.1$. Reject if $c^\alpha_*$ is smaller than the maximum of the quantities $P_n[0, Y_i] - \nu(\Gamma([0, Y_i]))$ and $P_n[Y_i, 1] - \nu(\Gamma([Y_i, 1]))$ for $i = 1, \ldots, n$, in line with the definition of the statistic in equation (2).

We use MATLAB version 7.1 with random seed 777.

The results are given in table 4 for the partially identified case ($s = 0.15$), and in table 5 we give the benchmark of the exactly identified case ($s = 0$ and $h_n = 1$), so that the test is a traditional Kolmogorov-Smirnov specification test. The results are given for $h_n$ on the boundary of the admissible rate, i.e. $h_n = \sqrt{\ln \ln n / n}$. This rate was chosen as a power maximizing rate (the rate that will ensure smaller quantiles, hence larger rejection rates). This is the only justification for a choice of rate that we can provide at this stage, as optimal rate choice is beyond the scope of this paper. In applications, it is recommended to provide results for different choices of rates, as one would typically do in density, nonparametric regression or spectral estimation. The rejection rates are low for small sample sizes and improve sharply when sample size increases.

To give a sense of the sensitivity of rejection rates to the choice of the tuning parameter $h_n$, table 6 reports rejection rates in the case of $\alpha = 0.01, 0.05, 0.10$, $n = 100, 500, 1000$, and values of $h_n$ that are significantly above, and significantly below, the initial choice of $h_n = \sqrt{\ln \ln n / n}$. This benchmark is approximately $0.044$ for $n = 1000$, $0.060$ for $n = 500$ and $0.124$ for $n = 100$, and two alternative values of $h_n$ are considered for each sample size.

Table 6

             h_n = 0.   h_n = 0.   h_n = 0.   h_n = 0.   h_n = 0.   h_n = 0.
α = 0.01     0.004      0          0.012      0.002      0.019      0.005
α = 0.05     0.026      0.006      0.049      0.017      0.058      0.022
α = 0.10     0.064      0.020      0.090      0.034      0.111      0.043

For $n = 100$, the rejection rates are sensitive to the choice of rate within the theoretical range (assumption 2) of tuning parameters. For $n = 500$, there is still sensitivity to the choice of $h_n$, somewhat less so for $n = 1000$. However, as in the case of bandwidth in kernel estimation or in local spectral estimation of time series, it is highly recommended to report empirical results with a good range of values of the tuning parameter $h_n$. Figure 2 graphs rejection rates against the tuning parameter to give a better sense of this sensitivity for sample size 500 and level 0.05. It is important also to note that higher values of the tuning parameter lead to less filtering, i.e. more sets are used in the computation of the supremum of the bootstrap empirical process, leading to larger quantiles, hence smaller rejection rates. Hence it also shows how crucial the filtering procedure is, since without it, the power of the test would be very poor.

Figure 2: Sensitivity to the tuning parameter. Sample size 500, level 0.05, tuning parameter ranging from 0.005 to 0.15 on the X axis, and rejection rates on the Y axis.

Conclusion
We propose a test of the specification of a structure in the sense of Koopmans and Reiersol, 1950, extended by Jovanovic, 1989, where observable variables and latent variables are related by a many-to-many mapping, thereby allowing censored observable variables and multiple equilibria. We apply mass transportation duality to derive a simple necessary and sufficient condition for compatibility of such structures and data in complete generality, and to justify the use of a generalized Kolmogorov-Smirnov test statistic. We propose a generically applicable and easily implementable procedure to test compatibility of structure and data, and to construct confidence regions for partially identified parameters specifying the structure. This work therefore complements other proposals, which tend to focus on models defined by moment inequalities. The small sample performance of the test is investigated in a Monte Carlo experiment, and is found to be comparable to the performance of the traditional Kolmogorov-Smirnov specification test statistic.

Appendix
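As a supplementary illustration of the bootstrap testing procedure used in the small sample investigation, the following is a minimal Python sketch (the reported experiments were run in MATLAB, so this is not the authors' code). It assumes the interval correspondence $\Gamma(y) = [\max(0, y - s), \min(1, y + s)]$ with $\nu$ uniform on $[0, 1]$. The function name `consistency_test`, its arguments, and the uniform data generating process in the usage lines are hypothetical choices made for the example. The value 0 is always included among the bootstrap deviations, which corresponds to keeping the empty set in the class and guards against an empty selection.

```python
import numpy as np


def consistency_test(y, s, h, alpha=0.05, n_boot=1000, seed=None):
    """Bootstrap test of internal consistency for the assumed interval structure
    Gamma(y) = [max(0, y - s), min(1, y + s)] with nu uniform on [0, 1].

    Returns (reject, statistic, critical_value). Both sides of the comparison
    are left unscaled by sqrt(n), which does not change the decision.
    """
    rng = np.random.default_rng(seed)
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    # P_n[0, Y_i] and P_n[Y_i, 1] at every sample point.
    p_low = np.searchsorted(y, y, side="right") / n
    p_up = 1.0 - np.searchsorted(y, y, side="left") / n
    gap_low = p_low - np.minimum(1.0, y + s)      # P_n(A) - nu(Gamma(A)), A = [0, Y_i]
    gap_up = p_up - np.minimum(1.0, 1.0 - y + s)  # same quantity for A = [Y_i, 1]
    statistic = max(0.0, gap_low.max(), gap_up.max())
    # Data dependent class: keep the sets with P_n(A) >= nu(Gamma(A)) - h.
    keep_low = gap_low >= -h
    keep_up = gap_up >= -h
    boot_max = np.empty(n_boot)
    for b in range(n_boot):
        yb = np.sort(rng.choice(y, size=n, replace=True))
        pb_low = np.searchsorted(yb, y, side="right") / n
        pb_up = 1.0 - np.searchsorted(yb, y, side="left") / n
        deviations = np.concatenate(
            ((pb_low - p_low)[keep_low], (pb_up - p_up)[keep_up], [0.0])
        )
        boot_max[b] = deviations.max()
    critical_value = np.quantile(boot_max, 1.0 - alpha)
    return bool(statistic > critical_value), float(statistic), float(critical_value)


# Hypothetical data generating process: Y uniform on [0, 1]. It is compatible
# with the structure, since U = Y is uniform and satisfies U in Gamma(Y).
rng = np.random.default_rng(777)
n = 500
y = rng.uniform(size=n)
h_n = np.sqrt(np.log(np.log(n)) / n)  # boundary rate used in the experiment
print(consistency_test(y, s=0.15, h=h_n, alpha=0.05, n_boot=500, seed=1))
```

Smaller values of `h` keep fewer sets in the bootstrap supremum, which lowers the critical value and raises the rejection rate; this is the filtering effect discussed in the small sample investigation.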
Additional definitions
Definition 5. A many-to-many mapping $\Gamma: \mathbb{R}^d \rightrightarrows \mathbb{R}^d$ is called measurable if for each open set $O \subseteq \mathbb{R}^d$, $\Gamma^{-1}(O) = \{x \in \mathbb{R}^d \mid \Gamma(x) \cap O \neq \emptyset\}$ is a measurable subset of $\mathbb{R}^d$.

Definition 6. Calling $d$ the Euclidean metric, the Hausdorff metric $d_H$ between two sets $A_1$ and $A_2$ is defined by

$$d_H(A_1, A_2) = \max\left( \sup_{y \in A_1} \inf_{z \in A_2} d(y, z), \; \sup_{z \in A_2} \inf_{y \in A_1} d(y, z) \right).$$

Proofs of results in the main text
Proof of proposition 1: Since $\Gamma$ is closed valued, $\varphi(y, u) = 1\{u \notin \Gamma(y)\}$ is lower semicontinuous, so that we can apply lemma 1 below to yield

$$\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi = \sup_{f \oplus g \leq \varphi} (Pf + \nu g), \qquad (4)$$

where $f \oplus g \leq \varphi$ stands for $f(y) + g(u) \leq \varphi(y, u)$ for all $y, u$. Since the sup-norm of the cost function is 1 (the cost function is an indicator), the supremum in (4) is attained by pairs of functions $(f, g)$ in $\mathcal{F}$, defined by

$$\mathcal{F} = \{(f, g) \in L^1(P) \times L^1(\nu), \; 0 \leq f \leq 1, \; -1 \leq g \leq 0, \; f(y) + g(u) \leq 1\{u \notin \Gamma(y)\}, \; f \text{ upper semicontinuous}\}.$$

Now, $(f, g)$ can be written as a convex combination of pairs $(1_A, -1_B)$ in $\mathcal{F}$. Indeed, $f = \int_0^1 1\{f \geq x\}\, dx$ and $g = \int_0^1 -1\{g \leq -x\}\, dx$, and for all $x$, $1\{f \geq x\}(y) - 1\{g \leq -x\}(u) \leq 1\{u \notin \Gamma(y)\}$. Since the functional on the right-hand side of (4) is linear, the supremum is attained on such a pair $(1_A, -1_B)$. Hence, the right-hand side of (4) specializes to

$$\sup_{A \times B \subseteq D} (P(A) - \nu(B)). \qquad (5)$$

For $D = \{(y, u) : u \notin \Gamma(y)\}$, $A \times B \subseteq D$ means that if $y \in A$ and $u \in B$, then $u \notin \Gamma(y)$. In other words $u \in B$ implies $u \notin \Gamma(A)$, which can be written $B \subseteq \Gamma(A)^c$. Hence, the dual problem can be written

$$\sup_{\Gamma(A) \subseteq B^c} (P(A) - \nu(B)) = \sup_{\Gamma(A) \subseteq B} (P(A) - \nu(B)),$$

and the result follows immediately.

Lemma 1. If $\varphi: \mathcal{Y} \times \mathcal{U} \to \mathbb{R}$ is bounded, non-negative and lower semicontinuous, then

$$\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi = \sup_{f \oplus g \leq \varphi} (Pf + \nu g).$$

Proof of lemma 1: The left-hand side is immediately seen to be always larger than the right-hand side, so we show the reverse inequality. It is a specialization of the Monge-Kantorovich duality to zero-one cost, which can also be proved using Proposition (3.3) page 424 of Kellerer, 1984, but we give a direct proof due to N. Belili for completeness.

[a] Case where $\varphi$ is continuous and $\mathcal{U}$ and $\mathcal{Y}$ are compact.
Call $G$ the set of functions on $\mathcal{Y} \times \mathcal{U}$ strictly dominated by $\varphi$ and call $H$ the set of functions of the form $f + g$ with $f$ and $g$ continuous functions on $\mathcal{Y}$ and $\mathcal{U}$ respectively. Call $s(c) = Pf + \nu g$ for $c \in H$. It is a well defined linear functional, and is not identically zero on $H$. $G$ is convex and sup-norm open. Since $\varphi$ is continuous on the compact $\mathcal{Y} \times \mathcal{U}$, we have

$$s(c) \leq \sup f + \sup g < \sup \varphi$$

for all $c \in G \cap H$, which is non empty and convex. Hence, by the Hahn-Banach theorem, there exists a linear functional $\eta$ that extends $s$ on the space of continuous functions such that

$$\sup_G \eta = \sup_{G \cap H} s.$$

By the Riesz representation theorem, there exists a unique finite non-negative measure $\pi$ on $\mathcal{Y} \times \mathcal{U}$ such that $\eta(c) = \pi c$ for all continuous $c$. Since $\eta = s$ on $H$, we have

$$\int_{\mathcal{Y} \times \mathcal{U}} f(y)\, d\pi(y, u) = \int_{\mathcal{Y}} f(y)\, dP(y), \qquad \int_{\mathcal{Y} \times \mathcal{U}} g(u)\, d\pi(y, u) = \int_{\mathcal{U}} g(u)\, d\nu(u),$$

so that $\pi \in \mathcal{M}(P, \nu)$ and

$$\sup_{f \oplus g \leq \varphi} (Pf + \nu g) = \sup_{G \cap H} s = \sup_G \eta = \pi\varphi.$$

[b] $\mathcal{Y}$ and $\mathcal{U}$ are not necessarily compact, and $\varphi$ is continuous.
For all $n > 0$, there exist compact sets $K_n$ and $L_n$ such that

$$\max(P(\mathcal{Y} \setminus K_n), \nu(\mathcal{U} \setminus L_n)) \leq \frac{1}{n}.$$

Let $(a, b)$ be an element of $\mathcal{Y} \times \mathcal{U}$ and define two probability measures $\mu_n$ and $\nu_n$ with compact support by

$$\mu_n(A) = P(A \cap K_n) + P(A \setminus K_n)\, \delta_a(A), \qquad \nu_n(B) = \nu(B \cap L_n) + \nu(B \setminus L_n)\, \delta_b(B),$$

where $\delta$ denotes the Dirac measure. By [a] above, there exists $\pi_n$ with marginals $\mu_n$ and $\nu_n$ such that

$$\pi_n \varphi \leq \sup_{f \oplus g \leq \varphi} (Pf + \nu g) + \frac{\varphi(a, b)}{n}.$$

Since $(\pi_n)$ has weakly converging marginals, it is weakly relatively compact. Hence it contains a weakly converging subsequence with limit $\pi \in \mathcal{M}(P, \nu)$. By Skorohod's almost sure representation (see for instance theorem 11.7.2 page 415 of Dudley, 2002), there exists a sequence of random variables $X_n$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with law $\pi_n$ and a random variable $X$ on the same probability space with law $\pi$ such that $X$ is the almost sure limit of $(X_n)$. By Fatou's lemma, we then have

$$\liminf \pi_n \varphi = \liminf \mathbb{E}\varphi(X_n) \geq \mathbb{E} \liminf \varphi(X_n) = \mathbb{E}\varphi(X) = \pi\varphi.$$

Hence we have the desired result.

[c] General case.
$\varphi$ is the pointwise supremum of a sequence of continuous bounded functions, so the result follows from upward $\sigma$-continuity of both $\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi$ and $\sup_{f \oplus g \leq \varphi} (Pf + \nu g)$ on the space of lower semicontinuous functions, shown in propositions (1.21) and (1.28) of Kellerer, 1984.

Proof of theorem 1: We show that $T_n$ and $\tilde{T}_n$ converge in distribution (notation $\rightsquigarrow$) to the same limit, which has a continuous distribution function. Hence, the result follows.

• Case where $Y = D$ discrete. Let $A_0$ be the subset of $X_D$ that achieves the maximum of $\delta(A) = P(A) - \nu(\Gamma(A))$ over $A \in \mathcal{C} \setminus \mathcal{C}_b$. Call $\delta_0 = \delta(A_0)$, and note that $\delta_0 < 0$. We have

$$T_n = \sup_{A \in X_D} [G_n(A) + \sqrt{n}(P(A) - \nu(\Gamma(A)))] = \max\left\{ \sup_{\mathcal{C}_b} G_n, \; \sup_{A \in X_D \setminus \mathcal{C}_b} [G_n(A) + \sqrt{n}(P(A) - \nu(\Gamma(A)))] \right\}.$$

The second term in the maximum is bounded above by $\sup_{X_D \setminus \mathcal{C}_b} G_n + \sqrt{n}\,\delta_0$, whose limsup is almost surely non-positive.
Hence $T_n \rightsquigarrow \sup_{\mathcal{C}_b} G$ follows from the convergence of the empirical process. $\tilde{T}_n \rightsquigarrow \sup_{\mathcal{C}_b} G$ follows from the fact that, under assumption 2, for all $n$ sufficiently large, $\mathcal{C}_{h_n}$ is almost surely equal to $\mathcal{C}_b$.

• Case of $Y = C$ absolutely continuous with respect to Lebesgue measure. Consider two sequences of positive numbers $l_n$ and $h_n$ such that they both satisfy assumption 2, $l_n > h_n$ and $(l_n - h_n)^{-1} \sqrt{\ln \ln n / n} \to 0$. Notice that $\{\emptyset, \mathbb{R}^{d_C}\} \subseteq \mathcal{C}_b, \mathcal{C}_{b,h}, \mathcal{C}_h$ for any $h > 0$. Since $G_n(\mathbb{R}^{d_C}) = 0$, we therefore have $\sup_{\mathcal{C}_b} G_n$, $\sup_{\mathcal{C}_{b,l_n}} G_n$ and $\sup_{\mathcal{C}_{h_n}} G_n$ non-negative. Hence, calling $\zeta_n$ the indicator function of the event $\sup_{\mathcal{C}} G_n \leq (l_n - h_n)\sqrt{n}$, we can write

$$\zeta_n \sup_{\mathcal{C}_b} G_n \leq \zeta_n \max\left\{ \sup_{\mathcal{C}_b} [G_n + \sqrt{n}(P - \nu\Gamma)], \; \sup_{\mathcal{C} \setminus \mathcal{C}_b} [G_n + \sqrt{n}(P - \nu\Gamma)] \right\} \leq \zeta_n T_n \leq \zeta_n \sup_{\mathcal{C}_{h_n}} G_n \leq \zeta_n \sup_{\mathcal{C}_{b,l_n}} G_n,$$

where the first inequality holds because the left-hand side is equal to the first term in the right-hand side, the second inequality holds trivially as an equality since $\mathcal{C} = \mathcal{C}_b \cup (\mathcal{C} \setminus \mathcal{C}_b)$, the third inequality holds because on $\mathcal{C} \setminus \mathcal{C}_{h_n}$, we have by definition $G_n + \sqrt{n}(P - \nu\Gamma) = \sqrt{n}(P_n - \nu\Gamma) \leq -h_n \leq 0$, and the last inequality holds because on $\{\zeta_n = 1\}$, we have that $A \in \mathcal{C}_{h_n}$ implies $\nu\Gamma(A) \leq P_n(A) + h_n = P(A) + (P_n - P)(A) + h_n \leq P(A) + \sup_{\mathcal{C}} G_n / \sqrt{n} + h_n \leq P(A) + l_n - h_n + h_n = P(A) + l_n$, which implies that $A \in \mathcal{C}_{b,l_n}$.

By lemma 2 and Theorem 2.5.2 page 127 of van der Vaart and Wellner, 1996, we have that both $\sup_{\mathcal{C}_b} G_n$ and $\sup_{\mathcal{C}_{b,l_n}} G_n$ converge in distribution to $\sup_{\mathcal{C}_b} G$. It is shown below that $\zeta_n \to_p 1$, so that Slutsky's lemma (lemma 2.8 page 11 of van der Vaart, 1998) yields the weak convergence of $\zeta_n \sup_{\mathcal{C}_b} G_n$ and $\zeta_n \sup_{\mathcal{C}_{b,l_n}} G_n$ to the same limit, and hence that of $\zeta_n T_n$ and $\zeta_n \sup_{\mathcal{C}_{h_n}} G_n$. It follows from Slutsky's lemma again that $T_n \rightsquigarrow \sup_{\mathcal{C}_b} G$ and $\tilde{T}_n \rightsquigarrow \sup_{\mathcal{C}_b} G$. We now prove that $\zeta_n \to_p 1$. Indeed, for any $\epsilon > 0$, $P(|\zeta_n - 1| > \epsilon) = P(\zeta_n = 0) = P(\sup_{\mathcal{C}} G_n > (l_n - h_n)\sqrt{n}) \to 0$, since $(l_n - h_n)\sqrt{n} \gg \sqrt{\ln \ln n}$ by assumption.

Lemma 2. We have $\sup_{A \in \mathcal{C}_{b,h_n}} G_n(A) \rightsquigarrow \sup_{A \in \mathcal{C}_b} G(A)$.

Proof of lemma 2: Take a bandwidth sequence $l_n$ that satisfies assumption 2, and take $\mathcal{C}_{b,l_n}$ as in table 3. Under assumption 1, take $A \in \mathcal{C}_{b,l_n}$ and an $A_b \in \mathcal{C}_b$ such that $d_H(A, A_b) \leq \zeta_n = K l_n^\eta$ (we suppress the dependence of $A_b$ on $A$ for ease of notation; within this proof $\zeta_n$ denotes this bound and not the indicator used in the proof of theorem 1). As $\mathcal{C}_b \subseteq \mathcal{C}_{b,l_n}$, one has

$$\sup_{A \in \mathcal{C}_b} G_n(A) \leq \sup_{A \in \mathcal{C}_{b,l_n}} G_n(A). \qquad (6)$$

Second, since $A_b \subseteq A$, one has

$$\sup_{A \in \mathcal{C}_{b,l_n}} G_n(A) = \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b) + G_n(A \setminus A_b)] \leq \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b)] + \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A \setminus A_b)].$$

If we have that $\sup_{A \in \mathcal{C}_{b,l_n}} |G_n(A \setminus A_b)| = O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right)$, then

$$\sup_{A \in \mathcal{C}_{b,l_n}} G_n(A) = \sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b)] + O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right), \qquad (7)$$

noting the dependence of $A_b$ on $A$ in the expression above. But since $A_b \in \mathcal{C}_b$, one has $\sup_{A \in \mathcal{C}_{b,l_n}} [G_n(A_b)] \leq \sup_{A \in \mathcal{C}_b} G_n(A)$. This fact, along with (6) and (7), yields the result.

We now show that we have indeed that

$$\sup_{A \in \mathcal{C}_{b,l_n}} |G_n(A \setminus A_b)| = O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right).$$

This relies on the construction of a local empirical process relative to the thin regions $A \setminus A_b$. First consider such a region. If $A \in \mathcal{C}_b$, the result holds trivially, so that we may assume that $A \in \mathcal{C}_{b,l_n} \setminus \mathcal{C}_b$, so that $A \setminus A_b$ is not empty. We distinguish the case where $A$ is a bounded rectangle, and the cases where $A$ is unbounded.

(i) $A$ is a bounded rectangle, i.e. of the form $(y_1, z_1) \times \ldots \times (y_{d_y}, z_{d_y})$, with $y_1, \ldots, y_{d_y}, z_1, \ldots, z_{d_y}$ real. Then, since $d_H(A, A_b) \leq \zeta_n$, $A_b$ is also a bounded rectangle, and $A \setminus A_b$ is the union of at least one (since $A$ and $A_b$ are distinct) and at most $f(d_y)$ (the number of faces of a rectangle in $\mathbb{R}^{d_y}$) rectangles with at least one dimension bounded by $\zeta_n$.

(ii) $A$ is an unbounded rectangle, i.e. of the same form as above, except that some of the edges are $+\infty$ or $-\infty$. Then $A_b$ is also an unbounded rectangle, and $A \setminus A_b$ is also the union of a finite number of rectangles with one dimension bounded by $\zeta_n$.

In both cases (i) and (ii), $A \setminus A_b$ is the union of a finite number of rectangles with at least one dimension bounded by $\zeta_n$. Hence if we control the supremum of the empirical process on one of these thin rectangles, when $A$ ranges over $\mathcal{C}_{b,l_n}$, we can control it on $A \setminus A_b$. Hence, it suffices to prove that

$$\sup_{A \in \mathcal{C}_{b,l_n}} |G_n(\phi_n(A))| = O_{a.s.}\left(\sqrt{\zeta_n \ln \ln n}\right),$$

where $\phi_n$ is the homothety that carries $A$ into one of the thin rectangles described above. As a homothety, $\phi_n$ is invertible and bi-measurable, and since $\phi_n(A)$ has at least one dimension bounded by $\zeta_n$, and $P$ is absolutely continuous with respect to Lebesgue measure, $P(\phi_n(A)) = O(\zeta_n)$ uniformly when $A$ ranges over $\mathcal{C}_{b,l_n}$.
Now, for any $A \in \mathcal{C}_{b,l_n}$, we have

$$G_n(\phi_n(A)) = \sqrt{n} [P_n(\phi_n(A)) - P(\phi_n(A))] = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( 1_{\phi_n(A)}(Y_i) - E_P(1_{\phi_n(A)}(Y)) \right) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( 1_A(\phi_n^{-1}(Y_i)) - E_P(1_A(\phi_n^{-1}(Y))) \right) := \sqrt{\zeta_n}\, L_n(1_A, \phi_n),$$

where $L_n(1_A, \phi_n)$ is defined as

$$\frac{1}{\sqrt{n \zeta_n}} \sum_{i=1}^n \left( 1_A(\phi_n^{-1}(Y_i)) - E_P(1_A(\phi_n^{-1}(Y))) \right)$$

to conform with the notation of Einmahl and Mason, 1997.

Conditions A(i)-A(iv) of the latter hold for $a_n = b_n = l_n$ and $a = 0$ under assumption 2, and conditions S(i)-S(iii) and F(ii) and F(iv)-F(viii) hold because $\mathcal{F}$ is here the class of indicator functions of $\mathcal{C}_{b,l_n}$, hence Donsker (see for instance example 2.6.1 page 135 of van der Vaart and Wellner, 1996). Hence Theorem 1.2 of Einmahl and Mason, 1997 holds, and

$$\sup_{A \in \mathcal{C}_{b,l_n}} |L_n(1_A, \phi_n)| = O_{a.s.}\left(\sqrt{\ln \ln n}\right),$$

so that the desired result holds.

Proof of theorem 2: By theorem 2.4 page 857 of Giné and Zinn, 1990, the bootstrapped empirical process $G^*$ converges weakly to $G$ conditionally almost surely, so that

$$\sup_{A \in \mathcal{C}_{h_n}} G_n(A) \quad \text{and} \quad \sup_{A \in \mathcal{C}_{h_n}} G^*(A)$$

have the same continuous limit. There remains to show that $T_n$ and $\hat{T}_n$ have the same limit, and that $\sup_{A \in \mathcal{C}_{n,h_n}} G^*(A) = \sup_{A \in \mathcal{C}_{h_n}} G^*(A)$, so that the result follows. The latter derives from the fact that $G^*$ takes at most $n$ different values over $\mathcal{C}_{h_n}$, which are exhausted on $\mathcal{C}_{n,h_n}$. We now prove the former. First, notice that $\mathcal{C}_n \subseteq \mathcal{C}$ implies $\hat{T}_n \leq T_n$.

• Case where $Y = D$ discrete. In that case, there is $n_0$ such that for all $n \geq n_0$, $\mathcal{C}_n = \mathcal{C}$, and the result trivially follows.

• Case where $Y = C \in \mathbb{R}^{d_y}$ has a density with respect to Lebesgue measure. By Theorem 9.14 page 291 of Villani, 2003, there exists a one-to-one bi-measurable (i.e. both itself and its inverse are measurable) and Lipschitz (with constant 1) function $\phi: [0, 1]^{d_y} \to \mathbb{R}^{d_y}$ such that $Y = \phi(V)$ and $V$ is distributed uniformly on $[0, 1]^{d_y}$ ($\phi$ is called a generalized quantile transformation). Hence, for any set $A \in \mathcal{C}$, we can write

$$P_n(A) = \frac{1}{n} \sum_{i=1}^n 1\{Y_i \in A\} = \frac{1}{n} \sum_{i=1}^n 1\{\phi(V_i) \in A\} = \frac{1}{n} \sum_{i=1}^n 1\{V_i \in \phi^{-1}(A)\} = \lambda_n(\phi^{-1}(A)),$$

where $\lambda_n$ denotes the empirical law associated with an iid sample of uniformly distributed variables on $[0, 1]^{d_y}$.

We have $\hat{T}_n - T_n = \sup_{A \in \mathcal{C}_n} [P_n(A) - \nu(\Gamma(A))] - \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))]$. We show that for all $\epsilon > 0$, there is an $n_0$ such that for all $n > n_0$,

$$\sup_{y \in \mathbb{R}^{d_y}} \inf_{j \in \{1, \ldots, n\}} \left\{ (P_n(-\infty, Y_j] - P_n(-\infty, y]) + (\nu(\Gamma((-\infty, y])) - \nu(\Gamma((-\infty, Y_j]))) \right\} < \epsilon,$$

and we can proceed similarly for sets of the form $(-\infty, y]^c$. The proof of the latter proceeds in three steps:

- By the results stated in the two paragraphs following equation (1) page 919 of Talagrand, 1994, we have for any $\eta > 0$,
$$\sup_{v \in [0, 1]^{d_y}} \min_{j \in \{1, \ldots, n\}} \|v - V_j\| = O_{a.s.}\left( n^{\eta - 1/\max(2, d_y)} \right).$$
Since $\phi$ is Lipschitz, the latter implies that
$$\sup_{y \in \mathbb{R}^{d_y}} \min_{j \in \{1, \ldots, n\}} \|y - Y_j\| = O_{a.s.}\left( n^{\eta - 1/\max(2, d_y)} \right).$$
Consider the mapping $y \mapsto j(y)$ which achieves the minimum of $\|y - Y_{j(y)}\|$. By assumption 4, we have for $n$ large enough, $\sup_{y \in \mathbb{R}^{d_y}} (\nu(\Gamma((-\infty, Y_{j(y)}])) - \nu(\Gamma((-\infty, y]))) < \epsilon/4$.

- We have $\sup_{y \in \mathbb{R}^{d_y}} (P(-\infty, y) - P(-\infty, Y_{j(y)})) < \epsilon/4$, since the set $(-\infty, y) \setminus (-\infty, Y_{j(y)}]$ shrinks uniformly.

- By Theorem 2.3 page 367 of Stute, 1984, we have $\sup_{A \subset \mathbb{R}^{d_y}} (P_n(A) - P(A)) < \epsilon/4$ for $n$ large enough, and the result follows.

Proof of theorem 3: Under $H_a$, there is a set $A_0$ in $\mathcal{C}$ such that $P(A_0) > \nu(\Gamma(A_0))$. Now the test statistic is

$$T_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] = \sup_{A \in \mathcal{C}} [G_n(A) + \sqrt{n}(P(A) - \nu(\Gamma(A)))] \geq G_n(A_0) + \sqrt{n} [P(A_0) - \nu(\Gamma(A_0))]. \qquad (8)$$

Hence,

$$T_n - \tilde{T}_n = \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] - \sup_{A \in \mathcal{C}_{h_n}} G_n(A) \geq \sqrt{n} \sup_{A \in \mathcal{C}} [P_n(A) - \nu(\Gamma(A))] - \sup_{A \in \mathcal{C}} G_n(A) \geq G_n(A_0) + \sqrt{n} [P(A_0) - \nu(\Gamma(A_0))] - \sup_{A \in \mathcal{C}} G_n(A),$$

where the first inequality follows from the fact that $\mathcal{C}_{h_n} \subseteq \mathcal{C}$, and the second inequality follows from (8). Since $P(A_0) > \nu(\Gamma(A_0))$, we have $\sqrt{n} [P(A_0) - \nu(\Gamma(A_0))] \to \infty$. Hence, since $G_n(A_0) - \sup_{A \in \mathcal{C}} G_n(A)$ is a tight sequence (this can be derived for instance from exponential bounds in 2.14.9 and 2.14.10 page 246 of van der Vaart and Wellner, 1996), we have $P(T_n \geq c_n^\alpha) \to 1$ for all $\alpha > 0$.

References
Ackerberg, D., Benkard, L., Berry, S., & Pakes, A. (2007). Econometric tools for analyzing market outcomes [Handbook of Econometrics, Volume 6A].
Anderson, T., & Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 46–63.
Andrews, D. (1988). Chi-squared diagnostic tests for econometric models. Econometrica, 1419–1453.
Andrews, D. (1997). A conditional Kolmogorov test. Econometrica, 1097–1128.
Andrews, D. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica, 399–405.
Andrews, D., Berry, S., & Jia, P. (2003). Placing bounds on parameters of entry games in the presence of multiple equilibria [unpublished manuscript].
Andrews, D., & Guggenberger, P. (2006). The limit of finite sample size and a problem with subsampling [unpublished manuscript].
Beresteanu, A., & Molinari, F. (2008). Asymptotic properties for a class of partially identified models. Econometrica, 763–814.
Blundell, R., Browning, M., & Crawford, I. (2005). Best nonparametric bounds on demand responses [unpublished manuscript].
Brock, B., & Durlauf, S. (2007). Identification of binary choice models with social interactions. Journal of Econometrics, 52–75.
Bugni, F. (2007). Bootstrap methods for some partially identified models [unpublished manuscript].
Chen, X., Hong, H., & Tamer, E. (2005). Measurement error models with auxiliary data. Review of Economic Studies, 343–366.
Chernozhukov, V., Hong, H., & Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 1243–1285.
Ciliberto, F., & Tamer, E. (2006). Market structure and multiple equilibria in airline markets [unpublished manuscript].
Cross, P., & Manski, C. (2002). Regressions, short and long. Econometrica, 357–368.
Dudley, R. (2002). Real analysis and probability. Cambridge University Press.
Einmahl, U., & Mason, D. (1997). Gaussian approximation of local empirical processes indexed by functions. Probability Theory and Related Fields, 283–311.
Galichon, A., & Henry, M. (2006a). Dilation bootstrap. A methodology for constructing confidence regions with partially identified models [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=934442].
Galichon, A., & Henry, M. (2006b). Inference in incomplete models [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=886907].
Galichon, A., & Henry, M. (2008). Universal power of Kolmogorov-Smirnov tests of under-identifying restrictions [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1123823].
Giné, E., & Zinn, S. (1990). Bootstrapping general empirical measures. Annals of Probability, 851–859.
Heckman, J., Smith, J., & Clements, N. (1997). Making the most out of programme evaluation and social experiments: Accounting for heterogeneity in programme impacts. Review of Economic Studies, 487–535.
Horowitz, J., & Manski, C. (1998). Censoring of outcomes and regressors due to survey nonresponse: Identification and estimation using weights and imputations. Journal of Econometrics, 37–58.
Imbens, G., & Manski, C. (2004). Confidence intervals for partially identified parameters. Econometrica, 1845–1859.
Jovanovic, B. (1989). Observable implications of models with multiple equilibria. Econometrica, 1431–1437.
Kellerer, H. (1984). Duality theorems for marginal problems. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 399–432.
Koopmans, T., & Reiersol, O. (1950). The identification of structural characteristics. Annals of Mathematical Statistics, 165–181.
Lehmann, E., & Romano, J. (2005). Testing statistical hypotheses. Springer: New York.
Linton, O., Maasoumi, E., & Whang, Y. (2005). Testing for stochastic dominance under general conditions: A subsampling approach. Review of Economic Studies, 735–765.
Liu, X., & Shao, Y. (2003). Asymptotics for likelihood ratio tests under loss of identifiability. Annals of Statistics, 807–832.
Magnac, T., & Maurin, E. (2008). Partial identification in monotone binary models: Discrete regressors and interval data [forthcoming in the Review of Economic Studies].
Manski, C. (1990). Nonparametric bounds on treatment effects. American Economic Review, 319–323.
Manski, C. (2004). Social learning from private experiences: The dynamics of the selection problem. Review of Economic Studies, 443–458.
Manski, C. (2005). Partial identification in econometrics [New Palgrave Dictionary of Economics, 2nd Edition].
Marschak, J., & Andrews, W. (1944). Random simultaneous equations and the theory of production. Econometrica, 143–203.
Matzkin, R. (1994). Restrictions of economic theory in nonparametric methods. Handbook of Econometrics, vol 4, R. F. Engel and D. L. McFadden, eds., 1–16.
McFadden, D. (1989). Testing for stochastic dominance. Studies in the Economics of Uncertainty (in honor of J. Hadar), Part II, T. Fomby and T. Seo, eds., 113–134.
Molchanov, I. (2005). Theory of random sets. Springer: New York.
Molinari, F. (2003). Contaminated, corrupted and missing data [Northwestern University Ph.D.].
Monge, G. (1781). Mémoire sur la théorie des déblais et des remblais. Académie Royale des Sciences de Paris.
Pakes, A., Porter, J., Ho, K., & Ishii, J. (2004). Moment inequalities and their application [unpublished manuscript].
Roehrig, C. (1988). Conditions for identification in parametric and nonparametric models. Econometrica, 433–447.
Romano, J., & Shaikh, A. (2006). Inference for the identified set in partially identified econometric models [unpublished manuscript].
Romano, J., & Shaikh, A. (2008). Inference for identifiable parameters in partially identified econometric models [forthcoming in the Journal of Statistical Planning and Inference].
Rosen, A. (2008). Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities [forthcoming in the Journal of Econometrics].
Sen, P., & Silvapulle, M. (2004). Constrained statistical inference: Inequality, order and shape restrictions. Wiley-Interscience: New York.
Stute, W. (1984). The oscillation behaviour of empirical processes: The multivariate case. Annals of Probability, 361–379.
Talagrand, M. (1994). The transportation cost from the uniform measure to the empirical measure in dimension greater or equal to three. Annals of Probability, 919–959.
Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria. Review of Economic Studies, 147–165.
Tinbergen, J. (1951). Some remarks on the distribution of labour incomes. International Economic Papers 1: Translations prepared for the economic association, Eds.: Alan T. Peacock et al., 95–207.
van der Vaart, A. (1998). Asymptotic statistics. Cambridge University Press.
van der Vaart, A., & Wellner, J. (1996). Weak convergence and empirical processes. New York: Springer.
Villani, C. (2003).