Inference in Incomplete Models

Abstract

We provide a test for the specification of a structural model without identifying assumptions. We show the equivalence of several natural formulations of correct specification, which we take as our null hypothesis. From a natural empirical version of the latter, we derive a Kolmogorov-Smirnov statistic for Choquet capacity functionals, which we use to construct our test. We derive the limiting distribution of our test statistic under the null, and show that our test is consistent against certain classes of alternatives. When the model is given in parametric form, the test can be inverted to yield confidence regions for the identified parameter set. The approach can be applied to the estimation of models with sample selection, censored observables and to games with multiple equilibria.



Alfred Galichon and Marc Henry

Harvard University and Columbia University

First draft: September 15, 2005
This draft: May 26, 2006

JEL Classification: C10, C12, C13, C14, C52, C61

Keywords: partial identification, specification test, random correspondences, Core, selections, plausibility constraint, Monge-Kantorovich mass transportation problem, Kolmogorov-Smirnov test for capacity functionals.

This research was carried out while the first author was visiting the Bendheim Center for Finance, Princeton University, and financial support from NSF grant SES 0350770 to Princeton University and from the Conseil Général des Mines is gratefully acknowledged. The authors also wish to thank Gary Chamberlain, Xiaohong Chen, Victor Chernozhukov, Pierre-André Chiappori, Ronald Gallant, Peter Hansen, Han Hong, Guido Imbens, Michael Jansson, Massimo Marinacci, Rosa Matzkin, Francesca Molinari, Ulrich Mueller, Alexei Onatski, Ariel Pakes, Jim Powell, Peter Robinson, Bernard Salanié, Thomas Sargent, José Scheinkman, Jay Sethuraman, Azeem Shaikh, Chris Sims, Kyungchul Song and Edward Vytlacil and seminar participants at Berkeley, Columbia, École polytechnique, Harvard, MIT, NYU, Princeton, SAMSI and Stanford for helpful comments (with the usual disclaimer). Correspondence address: Department of Economics, Columbia University, 420 W 118th Street, New York, NY 10027, USA. [email protected]. This paper is now superseded by various papers by the same authors.

Introduction

In many contexts, the ability of econometric models to identify, hence estimate from observed frequencies, the distribution of residual uncertainty rests on strong prior assumptions that are difficult to substantiate and even to analyze within the economic decision problem.

A recent approach, pioneered by Manski, has been to forego such prior assumptions, thus giving up the ability to identify a single probability distribution for residual uncertainty, and allow instead for a set of distributions compatible with the empirical setup.
A variety of models have been analyzed in this way, whether partial identification stems from incompletely specified models (typically models with multiple equilibria) or from structural data insufficiencies (typically cases of data censoring). See Manski, 2005 for an up-to-date survey on the topic.

All these models with incomplete identification share the same fundamental structure: the residual uncertainty and the relevant observable quantities are linked by a many-to-many mapping instead of a one-to-one mapping as in the case of identification.

In this paper, we propose a general framework for conducting inference without additional assumptions such as equilibrium selection mechanisms necessary to identify the model (i.e. to ensure that the many-to-many mapping is actually one-to-one). The usual terminology for such models is "incomplete" or "partially identified."

In a parametric setting, the objective of inference in partially identified models is the estimation of the set of parameters (hereafter called identified set) which are compatible with the distribution of the observed data and an assessment of the quality of that estimation. For the latter objective, two routes have been taken.

Chernozhukov et al., 2002 initiated research to obtain regions that cover the identified set with a prescribed probability. They propose an M-estimation approach with a subsampling procedure to approximate quantiles of the supremum of the criterion function over the identified set. Shaikh, 2005 proposes an alternative M-estimation with subsampling procedure that nests the Chernozhukov et al., 2002 proposal.
M-estimation with subsampling is the only general proposal to date that does not rely on a conservative testing procedure, but the choice of criterion function in the M-estimation procedure is arbitrary, and may have a large effect on the confidence regions.

In related research, a more direct application of random set methods has been taken to achieve the goal of constructing confidence regions for the identified set: Shaikh and Vytlacil, 2005 consider a special model where the identified set is a deterministic mapping of a collection of expectations, and base inference on the sample analogs of these expectations. Beresteanu and Molinari, 2006 propose the use of central limit theorems for random sets to conduct inference in models with set valued data. However, the adaptation of delta theorems for random sets is required for this approach to attain its full potential.

The second route was initiated by Imbens and Manski, 2004 who considered the different problem of covering each element of the identified set, and demanded uniform coverage. Shaikh, 2005 shows that the M-estimation with subsampling procedure can also be applied to uniform coverage of elements of the identified set. Pakes et al., 2004 consider models that are defined by moment inequalities and propose a conservative procedure to form a confidence region for all parameters in the identified set based on inequalities testing ideas.
The procedure is conservative since the limiting distribution of the test statistic depends on the number of constraints that are actually binding, and unlike in the special one dimensional treatment response case analyzed by Imbens and Manski, 2004, no superefficient pre-test is available.

Still in the latter spirit, Andrews et al., 2004 consider entry games (and more generally games with discrete strategies) and propose a conservative procedure to form a confidence region for all parameters in the identified set based on the idea that the probability of a certain outcome is no larger than the probability that necessary conditions (such as Nash rationality constraints) are met.

The inference procedure proposed here is in the same spirit as this latter contribution, but it gives a full formalization of the idea in a very general framework, does not restrict the class of distributions of observables (hence allows estimation of games with continuous strategies as well as entry games), does not rely on resampling procedures (though they may be used as alternative quantile approximation devices), and provides an exact test as opposed to the conservative procedures considered above.

After a prelude to expound the ideas developed here in the familiar case of Kolmogorov-Smirnov specification testing, the general set-up is described (with some examples) in section 1. It comprises the specification of a structure (in the Koopmans terminology) with observable and unobservable variables (unobservable to the analyst but not necessarily to the economic agents) related by a many-to-many mapping as opposed to the one-to-one mapping required for identification. The structure is defined by the many-to-many mapping (which can comprise rationality constraints as before, as well as any constraints that are plausible within the theory) and a hypothesized distribution for the unobserved variables.
To fix ideas, we call $\Gamma$ the many-to-many mapping defining the structure, $\nu$ a hypothesized distribution of unobservables and $P$ the true distribution of observables.

Still in section 1, a characterization is given of what we mean by correct specification, viz. compatibility of the structure with the distribution of the observable variables, and it is shown that several natural ways of defining compatibility are in fact equivalent. They include (among other notions) a compatibility notion based on selections $\gamma$ of $\Gamma$ (i.e. functions such that $\gamma(y) \in \Gamma(y)$ for all $y$), a notion based on the existence of a joint probability that admits $\nu$ and $P$ as marginals and is supported on the region where the constraints implied by $\Gamma$ are satisfied, and the notion of maximum plausibility introduced by Dempster, 1967.

Second, in section 2, we show that the characterizations of correct specification of the structure are equivalent to the existence of a zero cost solution to a Monge-Kantorovich mass transportation problem, where mass is transported between distribution $P$ and distribution $\nu$ with zero-one cost associated with violation of the constraints implied by $\Gamma$. Note that a special case of the Monge-Kantorovich transportation problem is the well-known matching problem.

Third, still in section 2, this observation allows us to conduct inference using the empirical version of the mass transportation problem (with the unknown $P$ replaced by the empirical distribution $P_n$). Empirical formulations pertaining to the different characterizations of correct specification of the structure are compared, and several are found to be equivalent, whereas others differ according to the choice of probability metric. It turns out that the dual of the empirical problem yields a statistic that reduces to the familiar Kolmogorov-Smirnov specification test statistic in the identified case where $\Gamma$ is one-to-one.

The properties of this statistic are examined in section 3.
The classical Kolmogorov-Smirnov statistic tests the equality of two probability measures by checking their difference on a good class of sets (large enough to be convergence-determining, but small enough to allow asymptotic treatment). Here our test statistic checks that $P(A)$ is no larger than $\nu(\Gamma(A))$ for all $A$ in a similar class of sets. Since $\nu(\Gamma(A))$ is the probability of the sufficient conditions implied by $A$, we see the strong similarity with the Andrews et al., 2004 approach. Hence the dual empirical problem provides us with a computable test statistic, a distribution to compare it to, and a parallel with the classical case.

We derive the asymptotic distribution of our test statistic and describe how classes of alternatives against which our test has power are related to what we call core-determining classes of sets.

Finally, the fourth section shows simple implementation procedures, and the inversion of the test to construct a confidence region for the elements of the identified set of parameters when both $\Gamma$ and $\nu$ are specified in parametric form. If one is interested in testing structural hypotheses such as extra constraints implied by theory, within the framework of a partially identified model, the constraints should be rejected if the region they imply on the parameter set does not intersect with the identified set. Here the question can be answered directly by incorporating the extra constraints in the model and testing the restricted specification.
If, on the other hand, one is interested in reporting parameter value estimates with confidence bounds for policy analysis, the specification test can be inverted to provide confidence regions that cover the elements of the identified set with pre-determined probability, or confidence regions that cover the identified set itself.

At the end of this section, we discuss semi-nonparametric extensions of our approach to include models which do not specify a parametric family of hypothesized data generating processes for the unobservable variables. This includes as a special case models defined by moment inequalities, the full treatment of which is the subject of the companion paper Galichon and Henry, 2006.

The last section of the main text concludes, whereas proofs and additional results are collected in the appendix.

Prelude: complete model benchmark

Before we define incomplete model specifications, we give a short heuristic univariate description of the benchmark that we use and discuss the Kolmogorov-Smirnov specification test statistic that we are effectively generalizing in this paper.

For ease of notation, we consider observables $y \in \mathbb{R}$ and unobservables $u \in \mathbb{R}$ (also called "unobserved shocks", "latent variables", etc.). Abstracting from dependence on an unknown deterministic parameter, we define a "complete" structure as a pair $(\nu, \gamma)$, where $\nu$ is a data generating process for the unobservables, and $\gamma$ is a bijection from the set of observables to the set of unobservables, as in figure 1.

Figure 1: Bijective structure

If we call $P$ the true data-generating process for the observables, we say that the complete structure is well specified if $P(A) = \nu(\gamma(A))$ for all Borel sets $A$, which, by Dynkin's lemma, is equivalent to $P(A) = \nu(\gamma(A))$ for all cells $A$ of the form $(-\infty, y]$, $y \in \mathbb{R}$, which is immediately seen to be equivalent to

$$\sup_{A \in \mathcal{C}} \left( P(A) - \nu(\gamma(A)) \right) = 0, \tag{1}$$

where $\mathcal{C} = \{ (-\infty, y_1], (y_2, \infty) : (y_1, y_2) \in \mathbb{R}^2 \}$.

(1) is a programming problem, and it will turn out to be very fruitful to consider its Monge-Kantorovich dual formulation

$$\inf_{\pi \in \mathcal{M}(P, \nu)} \int_{\mathbb{R}^2} 1_{\{u \neq \gamma(y)\}} \, \pi(dy, du) = 0, \tag{2}$$

where $1_{\{x \in A\}}$ denotes the indicator function of the set $A$, and the infimum is taken over all joint probability measures with marginals $P$ and $\nu$. The latter is a mass transportation (or "generalized matching") problem, where mass is transported from the set of observables to the set of unobservables with zero-one cost of transportation associated with violations of the constraint $u = \gamma(y)$.

This formulation can be interpreted as the existence of a probability that is concentrated on the structure, or alternatively, as the existence of a coupling between the random variable $Y$ with law $P$ and the random variable $U$ with law $\nu$, i.e.
the existence of $\pi$ with marginals $P$ and $\nu$ such that

$$\pi(U \neq \gamma(Y)) = 0. \tag{3}$$

We shall show that this dual representation of the hypothesis of correct specification has a natural generalization to the case of incomplete structures.

Turning to empirical versions of the problem, we can consider the statistic obtained by replacing $P$ by the empirical distribution $P_n$ of a sample of independent and identically distributed variables with law $P$:

$$\inf_{\pi \in \mathcal{M}(P_n, \nu)} \int_{\mathbb{R}^2} 1_{\{u \neq \gamma(y)\}} \, \pi(dy, du), \tag{4}$$

where the infimum is taken over probabilities $\pi$ with marginals $P_n$ and $\nu$. By the above mentioned duality, the latter is equal to

$$\sup_{A \in \mathcal{B}_{\mathcal{Y}}} \left( P_n(A) - \nu(\gamma(A)) \right),$$

with $\mathcal{B}_{\mathcal{Y}}$ the class of Borel sets.

The last step is to determine a class of sets that is small enough to allow determination of the limiting behaviour of the statistic, i.e. we need the class of sets to be $P$-Donsker, and large enough that the values of $\nu(\gamma(\cdot))$ over all Borel sets are determined by its values on the restricted class. The class $\mathcal{C}$ satisfies both requirements, and the resulting test statistic is

$$\sup_{A \in \mathcal{C}} \left( P_n(A) - \nu(\gamma(A)) \right) = \sup_{y \in \mathbb{R}} \left| P_n(-\infty, y] - \nu(\gamma(-\infty, y]) \right|, \tag{5}$$

which is exactly the Kolmogorov-Smirnov specification test statistic.

We shall essentially follow these same steps to show equivalence between formulations of the hypothesis of correct specification and to derive a test of specification when the bijection $\gamma$ is replaced by a correspondence $\Gamma$, as in figure 2. Then we shall consider parameterized versions of the structure where both $\Gamma$ and $\nu$ depend on a parameter $\theta$, and form confidence regions with all values of $\theta$ such that the specification of model $(\Gamma_\theta, \nu_\theta)$ is not rejected.

Figure 2: Incomplete structure

We consider a very general econometric specification, thereby posing the problem exactly as in Jovanovic, 1989, which was an inspiration for this work. Variables under consideration are divided into two groups.

• Latent variables, $u \in \mathcal{U}$.
The vector $u$ is not observed by the analyst, but some of its components may be observed by the economic actors. $\mathcal{U}$ is a complete, metrizable and separable topological space (i.e. a Polish space).

• Observable variables, $y \in \mathcal{Y} = \mathbb{R}^{d_y}$. The vector $y$ is observed by the analyst.

The Borel sigma-algebras of $\mathcal{Y}$ and $\mathcal{U}$ will be respectively denoted $\mathcal{B}_{\mathcal{Y}}$ and $\mathcal{B}_{\mathcal{U}}$. Call $P$ the Borel probability measure that represents the true data generating process for the observable variables, and $\nu$ the hypothesized data generating process for the latent variables. The structure is given by a relation between observable and latent variables, i.e. a subset of $\mathcal{Y} \times \mathcal{U}$, which we shall write as a multi-valued mapping from $\mathcal{Y}$ to $\mathcal{U}$ denoted by $\Gamma$. Finally, the set of Borel probability measures on $(\mathcal{Y} \times \mathcal{U}, \sigma(\mathcal{B}_{\mathcal{Y}} \times \mathcal{B}_{\mathcal{U}}))$ with marginals $P$ and $\nu$ is denoted by $\mathcal{M}(P, \nu)$. Whenever there is no ambiguity, we shall adopt the de Finetti notation $\mu f$ to denote the integral of $f$ with respect to $\mu$.

Examples

Example 1: Sample selection and other models with missing counterfactuals.

The typical Heckman sample selection models require very strong and often implausible assumptions to guarantee identification. Weaker assumptions, such as certain forms of monotonicity, are plausible and restrict significantly the identified set without reducing it to a singleton. As an illustration of our formulation in this case, consider for instance the classical set-up in Heckman and Vytlacil, 2001. We observe $(Y, D, Z)$, where $Y$ is the outcome variable, $D$ is an indicator variable for the receipt of treatment, and $Z$ is a vector of instruments (we implicitly condition the model on exogenous observable covariates). The outcome variable is generated as follows:

$$Y = D Y_1 + (1 - D) Y_0,$$

where $Y_0$ is the binary potential outcome if the individual does not receive treatment, and $Y_1$ is the binary potential outcome if the individual does receive treatment. The model is completed with the specification of $D$ as follows:

$$D = 1_{\{g(Z) \geq U\}},$$

where $g$ is a measurable function and $U$ is uniformly distributed on $[0, 1]$ (without loss of generality). The model can be written in the form of a multi-valued mapping $\Gamma$ from observables to unobservables in the following way:

$$\begin{aligned}
(y, d, z) &\longmapsto \{ (u, y_1, y_0) \in \Gamma(y, d, z) \} \\
(1, 1, z) &\longmapsto [0, g(z)] \times \{1\} \times \{0, 1\} \\
(1, 0, z) &\longmapsto (g(z), 1] \times \{0, 1\} \times \{1\} \\
(0, 1, z) &\longmapsto [0, g(z)] \times \{0\} \times \{0, 1\} \\
(0, 0, z) &\longmapsto (g(z), 1] \times \{0, 1\} \times \{0\}
\end{aligned}$$

Example 2: Returns to schooling.

Consider a general specification for the returns to education, where income $Y$ is a function of years of education $E$, other observable characteristics $X$ and unobserved ability $U$ as $Y = G(E, X, U)$. $G$ can be inverted as a multi-valued mapping to yield a correspondence $U = \Gamma(Y, E, X)$.

Example 3: Censored data structures.

Models with top-censoring or positive censoring such as Tobit models fall in this class. A classic problem where identification fails is regression with interval censored outcomes: the observable variables are the pairs $(Y^*, Y_*, X)$ of upper and lower values for the dependent variable, and the explanatory variables. The correspondence describing the structure is

$$\Gamma_\theta(y^*, y_*, x) = [\, y_* - x'\theta, \; y^* + x'\theta \,].$$

Example 4: Games with multiple equilibria.

Very large classes of economic models become estimable with this approach, when one allows the object of interest to be the identified set of parameters as opposed to single parameter values. A simple class of examples is that of models defined by a set of Nash rationality constraints. Suppose the payoff function for player $j$, $j = 1, \ldots, J$, is given by

$$\Pi_j(S_j, S_{-j}, X_j, U_j; \theta),$$

where $S_j$ is player $j$'s strategy and $S_{-j}$ is their opponents' strategies. $X_j$ is a vector of observable characteristics of player $j$ and $U_j$ a vector of unobservable determinants of the payoff. Finally $\theta$ is a vector of parameters. Pure strategy Nash equilibrium conditions

$$\Pi_j(S_j, S_{-j}, X_j, U_j; \theta) \geq \Pi_j(S, S_{-j}, X_j, U_j; \theta), \quad \text{for all } S,$$

define a correspondence $\Gamma_\theta$ from unobservable player characteristics to observable variables $(S, X)$.

Example 5: Entry models.

Consider the special case of example 4 proposed by Jovanovic, 1989. The payoff functions are

$$\Pi_1(x_1, x_2, u) = (\lambda x_2 - u) 1_{\{x_1 = 1\}}, \qquad \Pi_2(x_1, x_2, u) = (\lambda x_1 - u) 1_{\{x_2 = 1\}},$$

where $x_i \in \{0, 1\}$ is firm $i$'s action, and $u$ is an exogenous cost. The firms know their cost; the analyst, however, knows only that $u \in [0, 1]$ and that $\lambda$ is in $(0, 1]$. There are two pure strategy Nash equilibria: $x_1 = x_2 = 0$ for all $u \in [0, 1]$, and $x_1 = x_2 = 1$ for all $u \in [0, \lambda]$ and zero otherwise. Since the two firms' actions are perfectly correlated, we shall denote them by a single binary variable $y = x_1 = x_2$. Hence the structure is described by the multi-valued mapping: $\Gamma(1) = [0, \lambda]$ and $\Gamma(0) = [0, 1]$. Since $y$ is Bernoulli, we can write $P = (1 - p, p)$ with $p$ the probability of a 1. For the distribution of $u$, we consider a parametric exponential family on $[0, 1]$.

We wish to develop a procedure to detect whether the structure $(\Gamma, \nu)$ and the distribution of observables are compatible. First we explain what we mean by compatible. We start by taking $P$, $\Gamma$ and $\nu$ as given and by considering three natural formalizations of compatibility: the first based on measurable selections of $\Gamma$, the second based on the existence of a suitable probability measure with marginals $P$ and $\nu$, and the third based on Dempster's notion of maximal plausibility.

Compatibility is very easily understood in the simple case where the link $\Gamma$ between latent and observable variables is parametric and $\Gamma = \gamma$ is measurable and single valued. Defining the image measure of $P$ by $\gamma$ by

$$P\gamma^{-1}(A) = P\{ y \in \mathcal{Y} \mid \gamma(y) \in A \}, \tag{6}$$

for all $A \in \mathcal{B}_{\mathcal{U}}$, we say that the structure is well specified if and only if $\nu = P\gamma^{-1}$. In the general case considered here, $\Gamma$ may not be single valued, and its images may not even be disjoint (which would be the case if it was the inverse image of a single valued mapping from $\mathcal{U}$ to $\mathcal{Y}$, i.e. a traditional function from latent to observable variables).
However, under a measurability assumption on $\Gamma$, we can construct an analogue of the image measure, which will now be a set $\mathrm{Core}(\Gamma, P)$ of Borel probability measures on $\mathcal{U}$ (defined by (10)), and the hypothesis of compatibility of the restrictions on latent variable distributions and on the structures linking latent and observable variables will naturally take the form

$$H_0 : \nu \in \mathrm{Core}(\Gamma, P). \tag{7}$$

Assumption 1:

$\Gamma$ has non-empty and closed values, and for each open set $O \subseteq \mathcal{U}$, $\Gamma^{-1}(O) = \{ y \in \mathcal{Y} \mid \Gamma(y) \cap O \neq \emptyset \} \in \mathcal{B}_{\mathcal{Y}}$.

To relate the present case to the intuition of the single-valued case, it is useful to think in terms of single-valued selections of the multi-valued mapping $\Gamma$, as in figure 3.

Figure 3: Selection of a correspondence

A measurable selection $\gamma$ of $\Gamma$ is a measurable function such that $\gamma(y) \in \Gamma(y)$ for all $y \in \mathcal{Y}$. The set of measurable selections of a multi-valued mapping $\Gamma$ that satisfies Assumption 1 is denoted $\mathrm{Sel}(\Gamma)$ (which is known to be non-empty by the Rokhlin-Kuratowski-Ryll-Nardzewski Theorem). To each selection $\gamma$ of $\Gamma$, we can associate the image measure of $P$, denoted $P\gamma^{-1}$, defined as in (6).

It would be tempting to reformulate the compatibility condition as the requirement that at least one selection $\gamma$ in $\mathrm{Sel}(\Gamma)$ is such that $\nu = P\gamma^{-1}$. However, such a requirement implies that $\gamma$ corresponds to the equilibrium that is always selected. Under such a requirement, if for a given observable value the structure does not specify which value of the latent variables gave rise to it, the latter is nonetheless fixed. Hence two identical observed realizations in the sample of observations necessarily arose from the same realization of the latent variables. We argue, however, that if the structure does not specify an equilibrium selection mechanism, there is no reason to assume that each observation is drawn from the same equilibrium.

Allowing endogenous equilibrium selection of unknown form is equivalent to allowing the existence of an arbitrary distribution on the set of $P\gamma^{-1}$ when $\gamma$ spans $\mathrm{Sel}(\Gamma)$ (as opposed to a mass on one particular $P\gamma^{-1}$). A Bayesian formulation of the problem would entail a specification of this distribution.
Here, we stick to the given specification in leaving it completely unspecified.

Hence, we argue that the correct reformulation of the compatibility condition is that $\nu$ can be written as a mixture of probability measures of the form $P\gamma^{-1}$, where $\gamma$ ranges over $\mathrm{Sel}(\Gamma)$. However, as the following example shows, even for the simplest multi-valued mapping, the set of measurable selections is very rich, let alone the set of their mixtures.

Example:

Consider the multi-valued mapping $\Gamma : [0, 1] \rightrightarrows [0, 1]$ defined by $\Gamma(x) = \{0, x\}$ for all $x$. The collection of measurable selections of $\Gamma$ is indexed by the class of Borel subsets of $[0, 1]$: the selections are the functions $\gamma_B$ such that $\gamma_B(x) = x \, 1_{\{x \in B\}}$ for any Borel subset $B$ of $[0, 1]$, where $1_{\{x \in B\}}$ denotes the indicator function which equals one when $x \in B$ and zero otherwise.

Hence, it will be imperative to give manageable equivalent representations of such a mixture, as is done in Theorem 1 below.

The second natural representation of compatibility of the distribution $P$ of observables and the structure $(\Gamma, \nu)$ is based on the existence of probability measures on the product $\mathcal{Y} \times \mathcal{U}$ that admit $P$ and $\nu$ as marginals.

In the benchmark case of $\Gamma = \gamma$ one-to-one, the structure imposes a stringent constraint on pairs $(y, u)$, namely that $u = \gamma(y)$. So the admissible region of the product space is the graph of $\gamma$, i.e. the set

$$\mathrm{Graph}\, \gamma = \{ (y, u) \in \mathcal{Y} \times \mathcal{U} : u = \gamma(y) \}.$$

The compatibility condition described above, namely $P\gamma^{-1} = \nu$, is equivalent to the existence of a probability measure on the product space that is supported by $\mathrm{Graph}\, \gamma$ (i.e. that gives probability zero outside the constrained region defined by the structure) and admits $P$ and $\nu$ as marginals.

This generalizes immediately to the case of $\Gamma$ multi-valued, as the existence of a probability measure that admits $P$ and $\nu$ as marginals, and that is supported on the constrained region

$$\mathrm{Graph}\, \Gamma = \{ (y, u) \in \mathcal{Y} \times \mathcal{U} : u \in \Gamma(y) \}, \tag{8}$$

in other words, a probability measure that admits $P$ and $\nu$ as marginals and gives probability zero to the event $U \notin \Gamma(Y)$, where $U$ and $Y$ are random elements with probability law $\nu$ and $P$ respectively (namely (12) below).

Dempster, 1967 suggests considering the smallest reliability that can be associated with the event $B \in \mathcal{B}_{\mathcal{U}}$ as the belief function

$$\underline{P}(B) = P\{ y \in \mathcal{Y} \mid \Gamma(y) \subseteq B \}$$

and the largest plausibility that can be associated with the event $B$ as the plausibility function

$$\overline{P}(B) = P\{ y \in \mathcal{Y} \mid \Gamma(y) \cap B \neq \emptyset \},$$

the two being linked by the relation

$$\overline{P}(B) = 1 - \underline{P}(B^c), \tag{9}$$

which prompted some authors to call them conjugates or duals of each other.

A natural way to construct a set of probability measures is to consider all probability measures that do not exceed the largest plausibility that can be associated with a set, and that, as a result of (9), are larger than the smallest reliability associated with a set. We thus form the core of the belief function:

$$\begin{aligned}
\mathrm{Core}(\Gamma, P) &= \{ \mu \in \Delta(\mathcal{U}) \mid \forall B \in \mathcal{B}_{\mathcal{U}}, \ \mu(B) \geq \underline{P}(B) \} \\
&= \{ \mu \in \Delta(\mathcal{U}) \mid \forall B \in \mathcal{B}_{\mathcal{U}}, \ \mu(B) \leq \overline{P}(B) \}
\end{aligned} \tag{10}$$

where the first equality can be taken as a definition, and the second follows immediately from (9).
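For finite sets, the belief function, the plausibility function and membership in the core (10) can be checked by direct enumeration. The following is a minimal sketch under assumptions that are not in the text: the three-point sets, the correspondence `Gamma`, the distribution `P` and the candidate measures `mu_in` and `mu_out` are all hypothetical illustrations.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite structure: Gamma maps each observable value
# to the set of latent values compatible with it.
Gamma = {
    "a1": frozenset({"b1", "b2"}),
    "a2": frozenset({"b2", "b3"}),
    "a3": frozenset({"b3"}),
}
# Hypothetical distribution P of the observables.
P = {"a1": Fraction(1, 2), "a2": Fraction(1, 4), "a3": Fraction(1, 4)}
LATENT = frozenset({"b1", "b2", "b3"})


def events(points):
    """Yield every subset of a finite set of points."""
    pts = sorted(points)
    for size in range(len(pts) + 1):
        for combo in combinations(pts, size):
            yield frozenset(combo)


def belief(P, Gamma, B):
    """Mass of observables y whose image Gamma(y) is contained in B."""
    return sum((p for y, p in P.items() if Gamma[y] <= B), Fraction(0))


def plausibility(P, Gamma, B):
    """Mass of observables y whose image Gamma(y) meets B."""
    return sum((p for y, p in P.items() if Gamma[y] & B), Fraction(0))


def in_core(mu, P, Gamma):
    """Check mu(B) <= plausibility(B) for every event B, as in (10)."""
    return all(
        sum((mu[u] for u in B), Fraction(0)) <= plausibility(P, Gamma, B)
        for B in events(mu)
    )


# Two hypothetical candidate distributions for the latent variable.
mu_in = {"b1": Fraction(1, 4), "b2": Fraction(1, 2), "b3": Fraction(1, 4)}
mu_out = {"b1": Fraction(3, 4), "b2": Fraction(1, 8), "b3": Fraction(1, 8)}

print(in_core(mu_in, P, Gamma), in_core(mu_out, P, Gamma))
```

In this example `mu_out` fails because it puts mass 3/4 on `b1`, while only `a1`, which has probability 1/2, can give rise to `b1`. Exhaustive enumeration is exponential in the number of latent values, so the sketch is meant for illustration only.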
It is well known that $\mathrm{Core}(\Gamma, P)$ is non-empty, and another natural representation of the compatibility of the distribution $P$ of observables with the structure $(\Gamma, \nu)$ is that $\nu$ belongs to $\mathrm{Core}(\Gamma, P)$, in other words, that $\nu$ satisfies $\nu(B) \leq P(\{ y \in \mathcal{Y} : \Gamma(y) \cap B \neq \emptyset \})$ for all $B \in \mathcal{B}_{\mathcal{U}}$. Figure 4 illustrates this requirement in the case of finite sets.

The name Core is standard in the literature to denote the set of probability measures satisfying (13). It seems to originate from D. Gillies' 1953 Princeton PhD thesis on "some theorems on n-person games." For finite sets, the core is non-empty by the Bondareva-Shapley theorem. In the present more general context, the non-emptiness of the core will follow from the equivalence of (i) and (iv) of Theorem 1 below, and the existence of measurable selections of $\Gamma$ under assumption 1.

Figure 4: In the finite case, the probability of an event of latent values is bounded above by the probability of the observable values that can give rise to it, and bounded below by the probability of the observable values that always give rise to it.

The following theorem shows that the three representations discussed above are, in fact, equivalent. In addition, two more equivalent formulations are presented that will be used in the empirical formulations in the next section.

Theorem 1:

Under assumption 1, the following statements are equivalent:

(i) $\nu$ is a mixture of images of $P$ by measurable selections of $\Gamma$ (i.e. $\nu$ is in the weak closed convex hull of $\{ P\gamma^{-1} ; \gamma \in \mathrm{Sel}(\Gamma) \}$).

(ii) There exists for $P$-almost all $y \in \mathcal{Y}$ a probability measure $\pi_\nu(y, \cdot)$ on $\mathcal{U}$ with support $\Gamma(y)$, such that

$$\nu(B) = \int_{\mathcal{Y}} \pi_\nu(y, B) \, P(dy), \quad \text{all } B \in \mathcal{B}_{\mathcal{U}}. \tag{11}$$

(iii) If $U$ and $Y$ are random elements with respective distributions $\nu$ and $P$, there exists a probability measure $\pi \in \mathcal{M}(P, \nu)$ that is supported on the admissible region, i.e. such that

$$\pi(U \notin \Gamma(Y)) = 0. \tag{12}$$

(iv) The probability assigned by $\nu$ to an event $B \in \mathcal{B}_{\mathcal{U}}$ is no greater than the largest plausibility associated with $B$ given $P$ and $\Gamma$, i.e.

$$\nu(B) \leq P(\{ y \in \mathcal{Y} : \Gamma(y) \cap B \neq \emptyset \}). \tag{13}$$

(v) For all $A \in \mathcal{B}_{\mathcal{Y}}$, we have

$$P(A) \leq \nu(\Gamma(A)). \tag{14}$$

Remark 1:

The weak topology on $\Delta(\mathcal{U})$, the set of probability measures on $\mathcal{U}$, is the topology of convergence in distribution. $\Delta(\mathcal{U})$ is also Polish, and the weak closed convex hull of $\{ P\gamma^{-1} ; \gamma \in \mathrm{Sel}(\Gamma) \}$ is indeed the collection of arbitrary mixtures of elements of $\{ P\gamma^{-1} ; \gamma \in \mathrm{Sel}(\Gamma) \}$.

Remark 2:

Notice that (11) looks like a disintegration of $\nu$, and indeed, when $\Gamma$ is the inverse image of a single-valued measurable function (i.e. when the structure is given by a single-valued measurable function from latent to observable variables), the probability kernel $\pi_\nu$ is exactly the $(P, \Gamma^{-1})$-disintegration of $\nu$, in other words, $\pi_\nu(y, \cdot)$ is the conditional probability measure on $\mathcal{U}$ under the condition $\Gamma^{-1}(u) = \{y\}$. Hence (11) has the interpretation that a random element with distribution $\nu$ can be generated as a draw from $\pi_\nu(y, \cdot)$ where $y$ is a realization of a random element with distribution $P$.

Remark 3:

As will be explained later, our test statistic will be based on violations of representation (v), which is the dual formulation of (iii) seen as a Monge-Kantorovich optimal mass transportation solution.
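In the finite case, the duality between (iii) and (v) can be checked numerically. The sketch below is an illustration under assumptions that are not in the text: the finite sets, `Gamma`, `P`, `nu_ok` and `nu_bad` are hypothetical, and the primal problem is solved by a plain augmenting-path maximum-flow routine, which is a convenient stand-in for a transportation solver rather than the authors' method. The smallest mass a coupling must place outside the graph of Γ is one minus the largest mass that can be moved along pairs $(y, u)$ with $u \in \Gamma(y)$, and it should coincide with the largest violation of (v).

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def max_zero_cost_mass(P, nu, Gamma):
    """Largest mass a coupling of P and nu can place on the graph of Gamma."""
    source, sink = ("s",), ("t",)
    cap = {}

    def add_edge(a, b, c):
        cap[(a, b)] = cap.get((a, b), 0) + c
        cap.setdefault((b, a), 0)  # residual edge

    total = sum(P.values())
    for y, p in P.items():
        add_edge(source, ("y", y), p)
        for u in Gamma[y]:
            add_edge(("y", y), ("u", u), total)  # effectively unbounded
    for u, q in nu.items():
        add_edge(("u", u), sink, q)

    adj = {}
    for a, b in cap:
        adj.setdefault(a, []).append(b)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b in adj.get(a, []):
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        path, b = [], sink
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[e] for e in path)
        for a, b in path:
            cap[(a, b)] -= push
            cap[(b, a)] += push
        flow += push


def dual_value(P, nu, Gamma):
    """Largest value of P(A) - nu(Gamma(A)) over all subsets A."""
    ys = sorted(P)
    best = 0
    for size in range(len(ys) + 1):
        for A in combinations(ys, size):
            image = set().union(*(Gamma[y] for y in A))
            gap = sum(P[y] for y in A) - sum(nu.get(u, 0) for u in image)
            best = max(best, gap)
    return best


# Hypothetical finite example.
Gamma = {"a1": {"b1", "b2"}, "a2": {"b2", "b3"}, "a3": {"b3"}}
P = {"a1": Fraction(1, 2), "a2": Fraction(1, 4), "a3": Fraction(1, 4)}
nu_ok = {"b1": Fraction(1, 4), "b2": Fraction(1, 2), "b3": Fraction(1, 4)}
nu_bad = {"b1": Fraction(3, 4), "b2": Fraction(1, 8), "b3": Fraction(1, 8)}

for nu in (nu_ok, nu_bad):
    print(1 - max_zero_cost_mass(P, nu, Gamma), dual_value(P, nu, Gamma))
```

Exact rational arithmetic via `Fraction` is used so that the two sides can be compared with equality rather than up to a tolerance.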

Remark 4:

Equivalence of (i) and (iii) is a generalization of proposition 1 of Jovanovic, 1989 to the case where $P$ is not necessarily atomless and $\mathcal{U}$ not necessarily compact. Notice that relative to Jovanovic, 1989, the roles of $\mathcal{Y}$ and $\mathcal{U}$ are reversed for the purposes of specification testing. As discussed in the second remark following proposition 1 mentioned above, atomlessness of the distribution of latent variables is innocuous as long as $\mathcal{U}$ is rich enough. However, atomlessness of the distribution of observables is not innocuous, since it rules out many of the relevant applications.

Note that since, as a multivalued function, $\Gamma$ is always invertible, and since Assumption 1 holds for $\Gamma$ if and only if it holds for $\Gamma^{-1}$, the roles of $P$ and $\nu$ can be interchanged in the formulations. In some cases, the symmetric formulation, with the roles of $P$ and $\nu$ interchanged, is useful, so we state it for completeness below:

Theorem 1':

Under assumption 1, the following statements are equivalent, and are also equivalent to each of the statements in Theorem 1:

(i’) $P$ is a mixture of images of $\nu$ by measurable selections of $\Gamma^{-1}$ (i.e. $P$ is in the weak closed convex hull of $\{\nu\gamma^{-1};\ \gamma \in \mathrm{Sel}(\Gamma^{-1})\}$).

(ii’) There exists for $\nu$-almost all $u \in \mathcal{U}$ a probability measure $\pi_P(u, \cdot)$ on $\mathcal{Y}$ with support $\Gamma^{-1}(u)$, such that
$$P(A) = \int_{\mathcal{U}} \pi_P(u, A)\, \nu(du), \quad \text{all } A \in \mathcal{B}_{\mathcal{Y}}. \tag{15}$$

(iii’) is identical to Theorem 1(iii).

(iv’) The probability assigned by $P$ to an event $A \in \mathcal{B}_{\mathcal{Y}}$ is no greater than the largest plausibility associated with $A$ given $\nu$ and $\Gamma^{-1}$, i.e.
$$P(A) \le \nu(\{u \in \mathcal{U} : \Gamma^{-1}(u) \cap A \ne \emptyset\}). \tag{16}$$

(v’) For all $B \in \mathcal{B}_{\mathcal{U}}$, we have
$$\nu(B) \le P(\Gamma^{-1}(B)). \tag{17}$$

Remark 1:

The reason for giving this second theorem is that some of the new formulations will be more amenable to forming empirical counterparts.
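As a concrete illustration of formulations (iv’) and (v’), the two families of inequalities can be checked by brute force when $\mathcal{Y}$ and $\mathcal{U}$ are finite. The sketch below is only illustrative: the function names (`powerset`, `image`, `preimage`, `holds_iv_prime`, `holds_v_prime`) and the toy structure are hypothetical, and exact rational arithmetic is used merely to avoid rounding when an inequality binds.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(points):
    """All subsets of a finite collection, as tuples."""
    pts = list(points)
    return chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))


def image(gamma, A):
    """Gamma(A): every u related to at least one y in A."""
    return set().union(*(gamma[y] for y in A))


def preimage(gamma, B):
    """Gamma^{-1}(B): every y whose set Gamma(y) meets B."""
    return {y for y, us in gamma.items() if us & set(B)}


def holds_iv_prime(P, nu, gamma):
    """Check P(A) <= nu(Gamma(A)) for every subset A of Y."""
    return all(
        sum(P[y] for y in A) <= sum(nu[u] for u in image(gamma, A))
        for A in powerset(P)
    )


def holds_v_prime(P, nu, gamma):
    """Check nu(B) <= P(Gamma^{-1}(B)) for every subset B of U."""
    return all(
        sum(nu[u] for u in B) <= sum(P[y] for y in preimage(gamma, B))
        for B in powerset(nu)
    )


# Toy structure (assumed for illustration only).
gamma = {0: {"a", "b"}, 1: {"b", "c"}}
nu = {u: Fraction(1, 3) for u in ("a", "b", "c")}
P_ok = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_bad = {0: Fraction(9, 10), 1: Fraction(1, 10)}

print(holds_iv_prime(P_ok, nu, gamma), holds_v_prime(P_ok, nu, gamma))
print(holds_iv_prime(P_bad, nu, gamma), holds_v_prime(P_bad, nu, gamma))
```

In this toy case the two checks agree, as the equivalence in the theorem requires; enumeration is of course only feasible for small finite sets.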

Each of the theoretical formulations of correct specification of the structure given in Theorems 1 and 1’ has empirical counterparts, obtained essentially by replacing $P$ by an estimate such as $P_n$ in the formulations. The equivalence of the theoretical formulations does not necessarily entail equivalence of the empirical counterparts, especially in the cases where they rely on a choice of distance on the (metrizable) space of probability measures on $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$ or $(\mathcal{U}, \mathcal{B}_{\mathcal{U}})$. Hence we need to consider the relations existing between the different empirical counterparts. We shall form our test statistic based on the empirical formulation relative to (v), so the reader may jump to section 2.4 without loss of continuity.

For this empirical formulation, we consider (i’) from Theorem 1’. We denote by $\mathrm{Core}(\Gamma^{-1}, \nu)$ the set of arbitrary mixtures of $\nu\gamma^{-1}$ when $\gamma$ spans $\mathrm{Sel}(\Gamma^{-1})$. Denoting by $d$ a choice of metric on the space of probability measures on $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$, the null can be reformulated as
$$d(P, \mathrm{Core}(\Gamma^{-1}, \nu)) := \inf_{\mu \in \mathrm{Core}(\Gamma^{-1}, \nu)} d(P, \mu) = 0.$$
Hence the empirical version is obtained by replacing $P$ by an estimate such as $P_n$ to yield $d(P_n, \mathrm{Core}(\Gamma^{-1}, \nu))$. It will naturally depend on the specific choice of metric.

To see the relation between this and other empirical formulations, consider the Kolmogorov-Smirnov metric defined by
$$d_{KS}(\mu_1, \mu_2) = \sup_{A \in \mathcal{B}_{\mathcal{Y}}} (\mu_1(A) - \mu_2(A))$$
for any two probability measures $\mu_1$ and $\mu_2$ on $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$. With this choice of metric, we can derive conditions under which the equalities
$$d_{KS}(P_n, \mathrm{Core}(\Gamma^{-1}, \nu)) = \inf_{\gamma \in \mathrm{Sel}(\Gamma^{-1})} \sup_{A \in \mathcal{B}_{\mathcal{Y}}} (P_n(A) - \nu\gamma^{-1}(A))$$
$$= \sup_{A \in \mathcal{B}_{\mathcal{Y}}} \inf_{\gamma \in \mathrm{Sel}(\Gamma)} (P_n(A) - \nu\gamma(A))$$
$$= \sup_{A \in \mathcal{B}_{\mathcal{Y}}} (P_n(A) - \nu(\Gamma(A)))$$
hold, and therefore this empirical formulation is equivalent to empirical formulations based on (iii), (iv), and (v) below.
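When the observables take values in a finite set, the last expression in the chain of equalities above, the supremum over $A$ of $P_n(A) - \nu(\Gamma(A))$, can be evaluated directly by enumerating subsets. The following is a minimal sketch under that finiteness assumption; the function name `ks_capacity_statistic` and the toy data are hypothetical and not part of the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import chain, combinations


def ks_capacity_statistic(sample, support, nu, gamma):
    """Return (sup_A [P_n(A) - nu(Gamma(A))], a maximizing set A).

    sample  : list of observed y values
    support : list of all possible y values (finite)
    nu      : dict u -> probability of u
    gamma   : dict y -> set of u compatible with y
    """
    n = len(sample)
    counts = Counter(sample)
    p_n = {y: Fraction(counts[y], n) for y in support}
    subsets = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    best = None
    for A in subsets:
        image = set().union(*(gamma[y] for y in A))  # Gamma(A)
        gap = sum(p_n[y] for y in A) - sum(nu[u] for u in image)
        if best is None or gap > best[0]:
            best = (gap, set(A))
    return best


# Toy data (assumed for illustration only).
gamma = {0: {"a", "b"}, 1: {"b", "c"}}
nu = {u: Fraction(1, 3) for u in ("a", "b", "c")}
sample = [0] * 7 + [1] * 3

stat, maximizer = ks_capacity_statistic(sample, [0, 1], nu, gamma)
print(stat, maximizer)
```

Because the empty set is always enumerated and contributes a gap of zero, the value returned is never negative.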
We consider (ii) from Theorem 1 and $d$ a metric on the space of probability measures on $(\mathcal{U}, \mathcal{B}_{\mathcal{U}})$. Under the null hypothesis, let $\pi_\nu$ be the family of kernels defined in (ii) of Theorem 1. Denoting by $\mu f$ the integral of a function $f$ by a measure $\mu$, we can write (ii) as $d(\nu, P\pi_\nu) = 0$, which admits $d(\nu, P_n\pi_\nu)$ as empirical counterpart, and the latter is equal to $d(P\pi_\nu, P_n\pi_\nu)$. A notable aspect of this empirical formulation is that for many choices of metric $d$ or indeed pseudo-metric (such as relative entropy), it will take the form of a functional of the empirical process $G_n := \sqrt{n}(P_n - P)$ applied to the functions $y \mapsto \pi_\nu(y)$. Different goodness-of-fit tests can therefore be generalized within a single framework. The difficulty here of course is that the kernel $\pi_\nu$ depends on the unknown $P$ in a complicated way through the integral equation (11).

Empirical representation relative to (iii)

In view of representation (iii) of Theorem 1, i.e. equation (12), the null can be reformulated as the following Monge-Kantorovich mass transportation problem
$$\min_{\pi \in \mathcal{M}(P, \nu)} \int_{\mathcal{Y} \times \mathcal{U}} 1_{\{u \notin \Gamma(y)\}}\, \pi(dy, du) = 0, \tag{18}$$
where the transportation cost function $1_{\{u \notin \Gamma(y)\}}$ is an indicator penalty for violation of the structure.

We now consider the empirical version of this Monge-Kantorovich problem, replacing $P$ by the empirical distribution $P_n$ to yield the functional
$$T^*(P_n, \Gamma, \nu) = \min_{\pi \in \mathcal{M}(P_n, \nu)} \int_{\mathcal{Y} \times \mathcal{U}} 1_{\{u \notin \Gamma(y)\}}\, \pi(dy, du). \tag{19}$$
We shall see below that it is equal to the empirical formulations relative to (iv) and (v).

Since formulations (iv) and (v) from Theorem 1 can be rewritten
$$\sup_{A \in \mathcal{B}_{\mathcal{Y}}} (P(A) - \nu(\Gamma(A))) = 0,$$
the following empirical formulation seems the most natural:
$$\sup_{A \in \mathcal{B}_{\mathcal{Y}}} (P_n(A) - \nu(\Gamma(A))).$$
The following Theorem states the equivalence between the latter and the empirical formulation derived from (iii):

Theorem 2:

The following equalities hold:
$$T^*(P_n, \Gamma, \nu) = \max_{f \oplus g \le \varphi} (P_n f + \nu g) \tag{20}$$
$$= \sup_{A \in \mathcal{B}_{\mathcal{Y}}} (P_n(A) - \nu(\Gamma(A))), \tag{21}$$
where $\varphi(y, u) = 1_{\{u \notin \Gamma(y)\}}$, and $f \oplus g \le \varphi$ signifies that the maximum in (20) is taken over all measurable functions $f$ on $\mathcal{Y}$ and $g$ on $\mathcal{U}$ such that for all $(y, u)$, $f(y) + g(u) \le \varphi(y, u)$.

We shall therefore take $T^*(P_n, \Gamma, \nu)$ as our starting point to construct a test statistic in the following section.

Specification test

We propose to adopt a test statistic based on the dual Monge-Kantorovich formulation (21), in other words a statistic that penalizes large values of (21). However, $T^*(P_n, \Gamma, \nu)$ seemingly involves checking condition (14) on all sets in $\mathcal{B}_{\mathcal{Y}}$. We need to elicit a reduced class of sets on which to check condition (14). Call such a reduced class $\mathcal{S}$; the resulting statistic is
$$T_{\mathcal{S}}(P_n, \Gamma, \nu) = \sup_{A \in \mathcal{S}} (P_n(A) - \nu(\Gamma(A))). \tag{22}$$
$\mathcal{S}$ is the result of a formal trade-off: it needs to be small enough to allow us to derive a limiting distribution for a suitable re-scaling of $T_{\mathcal{S}}(P_n, \Gamma, \nu)$, and large enough to determine the direction of the inequality $P - \nu\Gamma$, which corresponds to a requirement that our test retain power against fixed alternatives.

To illustrate these requirements, we start by considering two simple types of structures to be tested. First we shall consider bijective structures (which correspond to our “prelude”), then the case where $\mathcal{Y}$ is finite.

• Bijective structures:

In the case where $\Gamma = \gamma$ is single-valued and bijective, consider the following classes of cells in $\mathbb{R}^{d_y}$:
$$\mathcal{C} = \{(-\infty, y], (y, \infty) : y \in \mathbb{R}^{d_y}\}, \qquad \tilde{\mathcal{C}} = \{(-\infty, y] : y \in \mathbb{R}^{d_y}\}.$$
Notice that
$$\sup_{A \in \mathcal{C}} (P_n(A) - \nu(\gamma(A))) = \sup_{A \in \tilde{\mathcal{C}}} |P_n(A) - \nu(\gamma(A))|,$$
and the latter is the classical Kolmogorov-Smirnov specification test statistic. Hence the choice of $\mathcal{C}$ for our reduced class $\mathcal{S}$ is suitable on both counts: we know, as was discussed in the prelude, that $\mathcal{C}$ is a value-determining class for probability measures, hence checking the inequality $P - \nu\gamma$ on the reduced class is equivalent to checking it on all measurable sets. In addition, from Appendix A1, we know that this class is Vapnik-Červonenkis, and hence that $\sqrt{n}\, T_{\mathcal{C}}(P_n, \gamma, \nu) = \sup_{A \in \mathcal{C}} G_n(A)$ converges weakly to the supremum of a $P$-Brownian bridge, and the test of specification can be constructed based on approximations of the quantiles through simulations of the Brownian bridge or the bootstrap.

• Discrete observables:

In the case where the observables belong to a finite set, the power set $2^{\mathcal{Y}}$ is finite, hence Vapnik-Červonenkis. This will be sufficient to derive the limiting distribution of $\sqrt{n}\, T_{2^{\mathcal{Y}}}(P_n, \Gamma, \nu) = \sqrt{n} \sup_{A \in 2^{\mathcal{Y}}} (P_n(A) - \nu(\Gamma(A)))$. Since the class of all subsets is used, we do not need to worry about the competing requirement that the class determine the direction of the inequality $P - \nu\Gamma$.

We shall consider the two requirements on the class of sets $\mathcal{S}$ sequentially. First, in the next subsection, we derive the asymptotic distribution of $T_{\mathcal{S}}(P_n, \Gamma, \nu)$ for a given choice of $\mathcal{S}$. Then, in the following subsection, we examine the power of the test based on $T_{\mathcal{S}}(P_n, \Gamma, \nu)$, which amounts to linking the choice of the class of sets $\mathcal{S}$ with classes of alternatives.

We start with a short heuristic description of the behaviour of $T_{\mathcal{S}}(P_n, \Gamma, \nu)$ which will motivate some definitions and constructions. We then give specific sets of conditions for the asymptotic results to hold.

Under the null hypothesis $H_0$, we have $P(A) - \nu(\Gamma(A)) \le 0$ for all $A \in \mathcal{B}_{\mathcal{Y}}$. Recalling that $G_n$ is the empirical process $\sqrt{n}(P_n - P)$, we have
$$\sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma, \nu) = \sqrt{n} \sup_{A \in \mathcal{S}} (P_n(A) - \nu(\Gamma(A))) = \sup_{A \in \mathcal{S}} \left( G_n(A) + \sqrt{n}\,(P(A) - \nu(\Gamma(A))) \right).$$
Unlike the case of the classical Kolmogorov-Smirnov test, the second term in the previous display does not vanish under the null, since the “regions of indeterminacy” allow $\delta(A) := P(A) - \nu(\Gamma(A))$ to be strictly negative for some sets $A \in \mathcal{S}$. What we know at this stage is that under the null, we have
$$\sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma, \nu) = \sup_{A \in \mathcal{S}} \left( G_n(A) + \sqrt{n}\,(P(A) - \nu(\Gamma(A))) \right) \le \sup_{A \in \mathcal{S}} G_n(A),$$
but relying on this bound may lead to very conservative inference.

Note that $\delta$ is independent of $n$, so that the scaling factor $\sqrt{n}$ will pull the second term in the previous display to $-\infty$ for all the sets where the inequality is strict.
This prompts the following definition, illustrated in figure 5:

Figure 5: Examples of sets in $\mathcal{C}_b$ (symbolized by the arrows) in a correctly specified case ($P$ and $\nu$ are uniform, hence correct specification corresponds to the graph of $\Gamma$ containing the diagonal).

Definition 3.1:

We denote the subclass of sets from $\mathcal{S}$ where $P = \nu\Gamma$ by $\mathcal{S}_b$, i.e.
$$\mathcal{S}_b := \{A \in \mathcal{S} : P(A) = \nu(\Gamma(A))\}.$$
If the class $\mathcal{S}$ is a Vapnik-Červonenkis class of sets, the empirical process converges weakly to the $P$-Brownian bridge $G$, i.e. a tight centered Gaussian stochastic process with variance-covariance defined by
$$E\, G(A_1) G(A_2) = P(A_1 \cap A_2) - P(A_1) P(A_2),$$
and the convergence is uniform over the class $\mathcal{S}$ (i.e. the convergence is in $l^\infty(\mathcal{F})$, where $\mathcal{F}$ is the class of indicator functions of sets in $\mathcal{S}$), so that by the continuous mapping theorem, the supremum of the empirical process converges weakly to the supremum of the Brownian bridge (for details of the proof, see Appendix A1).

Under (mild) conditions that ensure that the function $\delta$ “takes off” frankly from zero on $\mathcal{S}_b$ to negative values on $\mathcal{S} \setminus \mathcal{S}_b$, the term $\sqrt{n}\,\delta$ dominates the oscillations of the empirical process, and the sets in $\mathcal{S} \setminus \mathcal{S}_b$ drop out from the supremum in the asymptotic expression, so that
$$\sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma, \nu) \rightsquigarrow \sup_{A \in \mathcal{S}_b} G(A), \tag{23}$$
where $\rightsquigarrow$ denotes weak convergence. Naturally, since $\mathcal{S}_b$ depends on the unknown $P$, we need to find a data dependent class of sets to approximate $\mathcal{S}_b$. By the Law of the Iterated Logarithm (see for instance page 476 of Dudley, 2003), we know that the empirical process $G_n$ is uniformly $O_p(\sqrt{\ln \ln n})$, so that if we construct the data dependent class as in Definition 3.2 below with a bandwidth sequence $h = h_n > 0$ such that
$$h_n + h_n^{-1} \sqrt{\frac{\ln \ln n}{n}} \to 0, \tag{24}$$
we shall pick out the sets in $\mathcal{S}_b$ asymptotically.

Definition 3.2:

We denote the data dependent subclass of sets from $\mathcal{S}$ where $P_n \ge \nu\Gamma - h$ by $\hat{\mathcal{S}}_{b,h}$, i.e.
$$\hat{\mathcal{S}}_{b,h} := \{A \in \mathcal{S} : P_n(A) \ge \nu(\Gamma(A)) - h\}.$$
This data dependent class of sets allows us to approximate the distribution of $T_{\mathcal{S}}(P_n, \Gamma, \nu)$ based on the following limiting result
$$\sup_{A \in \hat{\mathcal{S}}_{b,h_n}} G(A) \rightsquigarrow \sup_{A \in \mathcal{S}_b} G(A) \tag{25}$$
under requirement (24) on the bandwidth sequence $h_n$, and the additional requirement that
$$h_n (\ln \ln n) \to 0, \tag{26}$$
which allows us to control local oscillations of the empirical process as well. Note that (24) and (26) are very mild, as they are both satisfied whenever
$$h_n n^{-\zeta} + h_n^{-1} n^{\eta} \to 0, \quad \text{for some } -1/2 < \eta \le \zeta < 0. \tag{27}$$
Hence we shall be able to choose between the following methods for approximating quantiles of the distribution of $T_{\mathcal{S}}(P_n, \Gamma, \nu)$ and constructing rejection regions for our test statistic:

• We can simulate the Brownian bridge and compute the quantiles of the distribution of its supremum over the data dependent class $\hat{\mathcal{S}}_{b,h_n}$ for some choice of $h_n$.

• We can use a subsampling approximation of the quantiles of the distribution of $T_{\mathcal{S}}(P_n, \Gamma, \nu)$. Indeed, $\sup_{A \in \mathcal{S}_b} G(A)$ has continuous distribution function on $[0, +\infty)$, hence the subsampling approximation of quantiles is valid.

Before moving on to specific asymptotic results, we close this heuristic description with a discussion of the cases where the class of saturated sets $\mathcal{S}_b$ is the trivial class $\{\emptyset, \mathcal{Y}\}$. In such cases, the test statistic converges to zero if one chooses the scaling factor $\sqrt{n}$. A refinement of the test will therefore involve a faster rate of convergence, determined through the construction of a local empirical process tailored to the shape of $\nu\Gamma$ close to $\emptyset$ and to $\mathcal{Y}$.
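For a finite set of observables with the class of all subsets, the simulation method described above can be sketched in a few lines. The sketch rests on assumptions that are not in the text: the covariance of the Brownian bridge is evaluated at $P_n$ in place of the unknown $P$, the bandwidth $h_n = n^{-1/4}$ is just one illustrative choice, and the function names (`subsets`, `near_binding_class`, `bridge_sup_quantile`) and toy inputs are hypothetical.

```python
import math
import random
from itertools import combinations


def subsets(points):
    """All subsets of a finite collection, as Python sets."""
    pts = list(points)
    return [set(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def near_binding_class(p_n, nu, gamma, h):
    """Sets A with P_n(A) >= nu(Gamma(A)) - h, taking S to be all subsets."""
    selected = []
    for A in subsets(p_n):
        image = set().union(*(gamma[y] for y in A))  # Gamma(A)
        if sum(p_n[y] for y in A) >= sum(nu[u] for u in image) - h:
            selected.append(A)
    return selected


def bridge_sup_quantile(p_n, sets, alpha, draws=5000, seed=0):
    """Monte Carlo alpha-quantile of sup over `sets` of a P_n-Brownian bridge."""
    rng = random.Random(seed)
    sups = []
    for _ in range(draws):
        # Independent W_y ~ N(0, p_y); Z_y = W_y - p_y * sum(W) has covariance
        # p_y 1{y = z} - p_y p_z, so G(A) = sum of Z_y over y in A.
        w = {y: rng.gauss(0.0, math.sqrt(p)) for y, p in p_n.items()}
        total = sum(w.values())
        z = {y: w[y] - p_n[y] * total for y in p_n}
        sups.append(max(sum(z[y] for y in A) for A in sets))
    sups.sort()
    return sups[min(draws - 1, math.ceil(alpha * draws) - 1)]


# Toy inputs (assumed for illustration only).
n = 100
p_n = {0: 0.7, 1: 0.3}
nu = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
gamma = {0: {"a", "b"}, 1: {"b", "c"}}
h = n ** -0.25

selected = near_binding_class(p_n, nu, gamma, h)
critical_value = bridge_sup_quantile(p_n, selected, alpha=0.95)
print(selected, critical_value)
```

The test would then compare $\sqrt{n}$ times the statistic with `critical_value`. Since the empty set always belongs to the selected class and the bridge vanishes there, the simulated supremum is never negative.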
We now turn to specific conditions on the structure $(\Gamma, \nu)$ and the law $P$ of the observables such that results (23), which allows the subsampling approach, and (25), which then also allows the simulation approach, hold.

(a) Case where $\mathcal{Y}$ is finite and $\mathcal{S}$ is the class of all subsets, $\mathcal{S} = 2^{\mathcal{Y}}$.

In that case, we show in Theorem 3a below that both approaches to inference are valid.

Theorem 3a: If $\mathcal{Y}$ is finite and $\mathcal{S} = 2^{\mathcal{Y}}$, (23) and (25) hold.

(b) Case where $\mathcal{Y} = \mathbb{R}^{d_y}$, $P$ is absolutely continuous with respect to Lebesgue measure and $\mathcal{S} = \{(y_1, z_1) \times \ldots \times (y_{d_y}, z_{d_y}) : y_1, \ldots, y_{d_y}, z_1, \ldots, z_{d_y} \in \mathbb{R}\}$ or any subclass, such as the class $\mathcal{C}$ defined above.

As indicated above, the asymptotic results are derived under assumptions such that the function $\delta$ “takes off” frankly from zero. To make this precise, we introduce the following “frank separation” assumption. Recall that if $d$ is the Euclidean metric on $\mathcal{Y}$, the Hausdorff metric $d_H$ between two sets $A_1$ and $A_2$ is defined by
$$d_H(A_1, A_2) = \max\left( \sup_{y \in A_1} \inf_{z \in A_2} d(y, z),\ \sup_{z \in A_2} \inf_{y \in A_1} d(y, z) \right).$$
We need to ensure that on sets that are sufficiently distant from sets in $\mathcal{S}_b$ (where the inequality is binding), $\delta$ is sufficiently negative so that it dominates local oscillations of the empirical process. To formalize this, we define the subclass of $\mathcal{S}$ of sets such that the inequality is nearly binding.

Definition 3.3:

We denote the subclass of sets from $\mathcal{S}$ where $P \ge \nu\Gamma - h$ by $\mathcal{S}_{b,h}$, i.e.
$$\mathcal{S}_{b,h} := \{A \in \mathcal{S} : P(A) \ge \nu(\Gamma(A)) - h\}.$$

Note that since $P$ is absolutely continuous, considering only open intervals is without loss of generality.

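We can now state

Assumption FS (Frank Separation):

There exist $K > 0$ and $0 < \eta < 1$ such that for all $A \in \mathcal{S}_{b,h}$, for $h > 0$ sufficiently small, there exists an $A_b \in \mathcal{S}_b$ such that $A_b \subseteq A$ and $d_H(A, A_b) \le K h^{\eta}$.

Remark 1: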

Assumption FS is very mild, in the sense that it fails only in pathological cases, such as the case where $\mathcal{Y} = \mathbb{R}$, $\mathcal{S} = \mathcal{C}$, and $y \mapsto P((-\infty, y]) - \nu(\Gamma((-\infty, y]))$ is $C^\infty$ with all derivatives equal to zero at some $y = y_0$ such that $(-\infty, y_0] \in \mathcal{C}$.

Then, we have:

Theorem 3b:

Suppose assumptions FS and (27) hold and that $P$ is absolutely continuous with respect to Lebesgue measure. Then (23) and (25) hold.

The proof is based on the following lemma,

Lemma 3a:

Under the conditions of Theorem 3b, we have
$$\sup_{A \in \mathcal{S}_{b,h_n}} G_n(A) \rightsquigarrow \sup_{A \in \mathcal{S}_b} G(A),$$
which involves bounds on oscillations of the empirical process.

As mentioned before, to ensure consistency of our specification test statistic, we need to derive conditions on the structure $(\Gamma, \nu)$ and the law $P$ of observables such that all violations of the inequality $P \le \nu\Gamma$ will be detected asymptotically with a test based on the statistic $T_{\mathcal{S}}(P_n, \Gamma, \nu)$. Before giving specific results, we shall try to convey the extent of the difficulties involved, in comparison with the case of the classical Kolmogorov-Smirnov test which was developed in our prelude.

When testing the equality of two probability measures, as in the Kolmogorov-Smirnov test, we need a class of sets that will determine the value of the law $P$, since it will ensure that if the equality holds on this class of sets, it holds everywhere. To be more precise, we need a convergence determining class (see section 2.6 page 18 of van der Vaart, 1998) since our test is asymptotic.

When testing the inequality $P \le \nu\Gamma$, the situation is complicated in two ways. First, $\nu\Gamma$ is a set function, but it is generally not additive unless $\Gamma$ is bijective, and a convergence determining class is much harder to come by. Second, determining the value of $\nu\Gamma$ may not be sufficient, since it may not guarantee that the direction of the inequality $P \le \nu\Gamma$ will be maintained from the reduced convergence determining class to all measurable sets. We discuss these two points in the following subsections.

The set function $A \mapsto \nu(\Gamma(A))$ is a Choquet capacity functional (for definitions and properties, see Appendix A2), and the following lemma (lemma 1.14 of Salinetti and Wets, 1986) provides a convergence determining class in great generality. Recall that a closed ball $B(y, \eta)$ with center $y$ and radius $\eta$ is the set of points in $\mathcal{Y}$ whose distance to $y$ is less than or equal to $\eta$.
Define $\mathcal{S}_{SW}$ as the class of compact subsets of $\mathcal{Y}$ with the following two properties:

(C1) Elements of $\mathcal{S}_{SW}$ are finite unions of closed balls with positive radii,

(C2) Elements of $\mathcal{S}_{SW}$ are continuity sets for the Choquet capacity functional $A \mapsto \nu(\Gamma(A))$; in other words, if $A \in \mathcal{S}_{SW}$, then $\nu(\Gamma(\mathrm{cl}(A))) = \nu(\Gamma(\mathrm{int}(A)))$.

Then we have:

Lemma SW:

The class $\mathcal{S}_{SW}$ is convergence determining.

The class $\mathcal{S}_{SW}$ is not a Vapnik-Červonenkis class of sets since for any finite collection of points, there is a collection of finite unions of balls that shatters it (see Appendix A1). However, there is a natural restriction of this class which is. In the case where $\mathcal{Y} = \mathbb{R}^{d_y}$, $\mathcal{S}_{SW}$ can be redefined with rectangles instead of balls. Take an integer $K$. Define the class of finite unions of at most $K$ rectangles:
$$\mathcal{S}_K = \left\{ \bigcup_{k \le K} (y_k, z_k) : y_k, z_k \in \mathbb{R}^{d_y} \right\}.$$
Then we have

Lemma 3b: $\mathcal{S}_K$ is a Vapnik-Červonenkis class of sets.

Hence this class is amenable to asymptotic treatment.

The requirement, that we call “Core determining”, on the class $\mathcal{S}$ that $P(A) \le \nu(\Gamma(A))$ for all $A \in \mathcal{S}$ imply $P(A) \le \nu(\Gamma(A))$ for all measurable $A$ is apparently more stringent than the requirement that the values of the set function $\nu(\Gamma(\cdot))$ on all measurable sets be determined by its values on $\mathcal{S}$.

Definition 3.4:

A class $\mathcal{S}$ of subsets of $\mathcal{Y}$ is core determining for $(\Gamma, \nu)$ if
$$\sup_{\mathcal{S}} (P - \nu\Gamma) = 0 \implies \sup_{\mathcal{B}_{\mathcal{Y}}} (P - \nu\Gamma) = 0.$$
We have noted already the obvious fact:

Fact 1: $\mathcal{S} = 2^{\mathcal{Y}}$ is core determining for observables on a finite set $\mathcal{Y}$.

A close inspection of the proof of Theorem 2 shows the following fact:

Fact 2:

The class $\mathcal{F}_{\mathcal{Y}}$ of closed subsets of $\mathcal{Y}$ is core determining.

We now show that we can actually say much more by linking the core determining property with the convergence determining property, and showing that the class $\tilde{\mathcal{S}}_{SW}$ of finite unions of open balls with positive radii (or alternatively the class of finite unions of open rectangles) is core determining.

First, we need to consider the following assumptions on the structure:

Assumption (CD1): $\mathcal{Y}$ is a compact subset of $\mathbb{R}^{d_y}$, and $\mathcal{U}$ is a compact subset of $\mathbb{R}^{d_u}$.

Assumption (CD2): $P$ and $\nu$ are absolutely continuous with respect to Lebesgue measure.

Assumption (CD3):

There exists γ ∈ Sel(Γ) such that P ( A ) → ν ( γ ( A )) → • There exists γ ∈ Sel(Γ) injective, such that νγ (now a probability measure) is absolutelycontinuous with respect to P . • There exists γ ∈ Sel(Γ) and α > ν ( γ ( A )) ≤ αP ( A ) for all A measurable. Assumption (CD4):

$\Gamma$ is convex-valued, i.e. $\Gamma(y)$ is a convex set for all $y \in \mathcal{Y}$.

This assumption rules out some interesting cases, for instance when the graph of $\Gamma$ (defined in (8)) is the union of the graphs of two functions. However, our conditions are not minimal, and such cases could be treated under a different set of conditions.

We define the upper and lower envelopes of the graph of $\Gamma$ by

Definition 3.5:

The upper (resp. lower) envelope of Graph $\Gamma$ is the function $y \mapsto u(y) = \sup\{\Gamma(y)\}$ (resp. $y \mapsto l(y) = \inf\{\Gamma(y)\}$).

Assumption (CD5):

The upper and lower envelopes $u$ and $l$ of the graph of $\Gamma$ are Lipschitz, i.e. there exists $\kappa \ge 0$ such that for all $y_1, y_2 \in \mathcal{Y}$,
$$\max\left( |u(y_1) - u(y_2)|,\ |l(y_1) - l(y_2)| \right) \le \kappa\, |y_1 - y_2|.$$
To state our last assumption, we need an extra definition:

Deﬁnition 3.6:

A forking point of $\Gamma$ is a $y_0$ such that for any $\epsilon > 0$, there exist $y_1$ and $y_2$ in the open ball $B(y_0, \epsilon)$ such that $\Gamma(y_1)$ is a singleton, and $\Gamma(y_2)$ is not.

Assumption (CD6):

$\Gamma$ has at most a finite number of forking points.

Note that this is a technical assumption that is violated only in pathological cases, and that is akin to the Frank Separation Assumption (FS).

We can now state the result:

Theorem 3c:

Under assumptions (CD1)-(CD6), the class $\tilde{\mathcal{S}}_{SW}$ of finite unions of open balls with positive radii (or alternatively the class of finite unions of open rectangles) is core determining.

This result is fundamental in that it reduces the problem of checking consistency of the test based on the statistic $T_{\mathcal{S}}(P_n, \Gamma, \nu)$ to the problem of checking whether $P(A) \le \nu(\Gamma(A))$ for $A$ a finite union of balls (or rectangles) in $\mathbb{R}^{d_y}$ whenever $P \le \nu\Gamma$ on $\mathcal{S}$.

We shall now apply this reasoning to give some conditions on the structure $(\Gamma, \nu)$ under which the test based on statistic $T_{\mathcal{S}}(P_n, \Gamma, \nu)$ is consistent with $\mathcal{S} = \mathcal{C} = \{(-\infty, y], (y, \infty) : y \in \mathbb{R}\}$, such as in figure 6, and conditions under which the class $\mathcal{C}$ may not be core determining, but the class $\mathcal{S} = \mathcal{R} = \{(y, z) : y, z \in \mathbb{R}\}$ is. We thereby define classes of alternatives that our tests based on $T_{\mathcal{C}}(P_n, \Gamma, \nu)$ and $T_{\mathcal{R}}(P_n, \Gamma, \nu)$ have power against in case $\mathcal{Y} = \mathbb{R}$ and $P$ is absolutely continuous with respect to Lebesgue measure.

Figure 6: Violation of null that can be detected by the class of cells $\mathcal{C}$. Notice in particular that the inequality $P \le \nu\Gamma$ is violated on the set $A$ ($P$ and $\nu$ are uniform).

Theorem 3d:

If assumptions (CD1) and (CD2) are satisfied, and the graph of $\Gamma$ has increasing upper and lower envelopes, then $\mathcal{C}$ is core determining, and hence the specification test based on the statistic $T_{\mathcal{C}}(P_n, \Gamma, \nu)$ is consistent.

In figure 7, we show a case where the null hypothesis does not hold, but a test based on $T_{\mathcal{C}}(P_n, \Gamma, \nu)$ fails to detect it because of the lack of monotonicity of the upper envelope. In that case, we need the larger class of sets $\mathcal{R}$ to detect the departure from the null.

Figure 7: Violation of null that cannot be detected by the class of cells $\mathcal{C}$, but can be detected by the class of all intervals. Notice in particular that the inequality $P \le \nu\Gamma$ is violated on $A$ but not on $B$ ($P$ and $\nu$ are uniform).

The test of specification that we have developed can be applied to the construction of confidence regions in case the structure depends on unknown parameters. Let $\theta \in \Theta \subseteq \mathbb{R}^{d_\theta}$ be a vector of structural parameters, and let the model be given by $(\Gamma_\theta, \nu_\theta)$.

Definition 4.1:

The identified set $\Theta_I$ is defined as the set of all $\theta \in \Theta$ such that the null hypothesis $H_0(\theta)$ of compatibility of $(\Gamma_\theta, \nu_\theta)$ with $P$ (as defined in Theorems 1 and 1’) holds true.

This section is an outline of the application of our testing procedure to the construction of confidence regions for elements of the identified set and for the identified set itself.

To form a confidence region that covers (with at least some pre-determined probability) each parameter value that makes the structure compatible with the distribution of observables, we propose to invert our test statistic to form a confidence region for elements of $\Theta_I$. In other words, for a given $\alpha \in (0, 1)$, we need a region $CR_n$ such that, for all $\theta \in \Theta_I$,
$$\liminf_n P(\theta \in CR_n) \ge \alpha.$$
The confidence region obtained from inverting the test has the form
$$CR_n = \{\theta \in \Theta : \sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma_\theta, \nu_\theta) \le \hat{Q}_\alpha(\theta)\},$$
where $\mathcal{S}$ is a class of sets which is core determining for all $\theta \in \Theta$ and $\hat{Q}_\alpha(\theta)$ is an approximation of the $\alpha$ quantile of the distribution of $T_{\mathcal{S}}(P_n, \Gamma_\theta, \nu_\theta)$. A valid approximation can be obtained using either one of the two methods proposed at the end of section 3.1.1.

To form a region that covers the whole identified set with pre-determined probability, we need a region $CR^*_n$ such that $\liminf_n P(\Theta_I \subseteq CR^*_n) \ge \alpha$. The latter can be obtained using the method proposed by Chernozhukov et al., 2002 applied to the criterion function $(\sup_{A \in \mathcal{S}} (P(A) - \nu_\theta(\Gamma_\theta(A))))$ with sample criterion $T_{\mathcal{S}}(P_n, \Gamma_\theta, \nu_\theta)$ (under the condition that C1, C2, C4 and C5 of Chernozhukov et al., 2002 hold). A main contribution of this paper, therefore, is to provide the first natural and general choice of criterion function, and thereby pave the way for a comparison of criteria and a discussion of optimality.

We now spell out our procedures on a very simple example: example 5 of section 1.
The structure is described by the multi-valued mapping: $\Gamma(1) = [0, \lambda]$ and $\Gamma(0) = [0, 1]$. Since $y$ is Bernoulli, we can write $P = (1 - p, p)'$ with $p$ the probability of a 1. For the distribution of $u$, we consider a parametric exponential family on $[0, 1]$, where $\nu_\phi$ has distribution function $u^\phi$, with $\phi > 0$. Our parameter vector is therefore $\theta = (\lambda, \phi)'$.

The null hypothesis in this case is immediately seen to be equivalent to $p \le \lambda^\phi$ for a given value of the parameter vector. Indeed, the easiest formulation to use is probably formulation (v), which requires that $p = P(\{1\}) \le \nu(\Gamma(1)) = \nu[0, \lambda] = \lambda^\phi$. Hence $T_{2^{\{0,1\}}}(P_n, \Gamma_\theta, \nu_\theta) = p_n - \lambda^\phi$. Now, if $p = \lambda^\phi$, then $\mathcal{S}_b = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$ and then $\sqrt{n}(p_n - \lambda^\phi)$ converges weakly to a normal random variable with mean zero and variance $p(1 - p)$, whereas if $p < \lambda^\phi$, then $\mathcal{S}_b = \{\emptyset, \{0, 1\}\}$ and $\sqrt{n}(p_n - \lambda^\phi)$ converges to zero. In either case, for a given choice of sequence $h_n$, $\hat{\mathcal{S}}_{b,h_n}$ is equal to $\{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$ if $p_n \ge \lambda^\phi - h_n$ and $\{\emptyset, \{0, 1\}\}$ otherwise.

The $\alpha$ quantile of $\sqrt{n}\, T_{2^{\{0,1\}}}(P_n, \Gamma_\theta, \nu_\theta) = \sqrt{n}(p_n - \lambda^\phi)$ can be approximated with 0 if $p_n < \lambda^\phi - h_n$, and with the $\alpha$ quantile of the normal with mean zero and variance $p_n(1 - p_n)$ if $p_n \ge \lambda^\phi - h_n$. Alternatively, $Q_\alpha(\theta)$ can be approximated using subsampling (though it would be a serious case of overkill). The procedure would then be the following: consider all (or a large number $B_n$ of) the samples of size $b_n$ from the sample of size $n$ with $1/b_n + b_n/n \to 0$, and approximate $Q_\alpha(\theta)$ with
$$\hat{Q}_\alpha(\theta) = \inf\left\{ x : \frac{1}{B_n} \sum_{i=1}^{B_n} 1\{\sqrt{b}\, T_{\mathcal{S}}(P^i_b, \Gamma_\theta, \nu_\theta) \le x\} \ge \alpha \right\},$$
where $P^i_b$ is the empirical distribution of the $i$-th subsample. A confidence region is then
$$CR_n = \{\theta \in [0, 1] \times (0, +\infty) : \sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma_\theta, \nu_\theta) \le \hat{Q}_\alpha(\theta)\}.$$

Since structures are often given without a specification of the distribution of the unobservable variables, it is customary to assume only moment conditions, such as a given mean (taken to be equal to zero without loss of generality) and finite variance.
This includes as special cases structures defined by moment inequality conditions.

In such cases, a similar approach can be taken where the null is defined as the existence of a joint law supported on the set $\{u \in \Gamma_\theta(y)\}$ with marginal $P$ on $\mathcal{Y}$ and marginal on $\mathcal{U}$ satisfying some moment conditions. Calling $\mathcal{V}$ the set of laws that satisfy the said conditions, the dual formulation delivers a feasible version of the statistic
$$\inf_{\nu \in \mathcal{V}} \sup_{A \in \mathcal{S}} [P(A) - \nu(\Gamma_\theta(A))].$$
This involves a number of difficulties, which are the subject of a companion paper [GH:2006]. We only give here, as an illustration, the application of the method on a classic special case of example 3.

Suppose one observes income brackets with centers in $\mathcal{Y} = \{y_1, \ldots, y_k\}$ with $y_1 < \ldots < y_k$ and width $\delta$. True income is unobservable, and one is interested in the mean of true income. The model correspondence is given by $\Gamma(y) = (y - \delta/2, y + \delta/2)$. Let $p(y_i)$ (resp. $p_n(y_i)$) denote the true (resp. empirical) probability of $\{Y = y_i\}$.

Consider formulation (v’): $\nu \le P\Gamma^{-1}$ of the null hypothesis. Denoting $\Gamma^u(B) = \{y : \Gamma(y) \subseteq B\}$ for any $B \in \mathcal{B}_{\mathcal{U}}$, and writing $\phi^* = P\Gamma^{-1}$ and $\phi_* = P\Gamma^u$, we have (using Definition A2.6 and Lemma A2.2 in Appendix A2) that under the null, the expectation of any measurable function $f$ of the unobservable variables satisfies
$$\int_{Ch} f\, d\phi_* \le E f \le \int_{Ch} f\, d\phi^*.$$
With $\phi^*_n = P_n\Gamma^{-1}$ and $\phi_{n*} = P_n\Gamma^u$ the empirical versions of $\phi^*$ and $\phi_*$, the set $\left[\int_{Ch} f\, d\phi_{n*},\ \int_{Ch} f\, d\phi^*_n\right]$ estimates the identified set $\left[\int_{Ch} f\, d\phi_*,\ \int_{Ch} f\, d\phi^*\right]$. In the case considered here, where $f$ is the identity, this identified set equals
$$\left[ \sum_{i=1}^{k} (y_i - \delta/2)\, p(y_i),\ \sum_{i=1}^{k} (y_i + \delta/2)\, p(y_i) \right],$$
which is equal to
$$\left[ \sum_{i=1}^{k} (y_i - \delta/2) \left( p_n(y_i) - g_{n,i}/\sqrt{n} \right),\ \sum_{i=1}^{k} (y_i + \delta/2) \left( p_n(y_i) - g_{n,i}/\sqrt{n} \right) \right],$$
from which asymptotically valid confidence regions can be constructed, since $g_n = (g_{n,1}, \ldots, g_{n,k})'$, with $g_{n,i} = \sqrt{n}\,(p_n(y_i) - p(y_i))$, is asymptotically a Gaussian vector.

Conclusion

We have provided a coherent definition of correct specification of structures with no identifying assumptions. This definition is the result of the equivalence of several natural generalizations of the hypothesis of correct specification in the identified case. These theoretical formulations of correct specification have natural empirical counterparts, several of which are also shown to be equivalent, and a test of specification is based on the latter. When the structure is parameterized, this test can be inverted to yield confidence regions for the set of structural parameters for which the null hypothesis of correct specification is satisfied.

This work has the following natural extensions: First, the whole approach is articulated around the existence of a joint measure with given marginals, hence it is essentially parametric in nature, but can be naturally extended to a problem of existence of a joint probability measure with one marginal given (the distribution of observables) and moment conditions on the other marginal (the distribution of unobservable variables). This natural extension of our work will nest structures defined by moment inequalities, and therefore deliver a way to construct confidence regions in such cases. Second, the statistic we have used to examine correct specification can be derived from the Kolmogorov-Smirnov distance between the empirical distribution and the set of data generating processes implied by the structure. Other distances and pseudo-distances will generate different specification statistics, and relative entropy may be a particularly good candidate, in that it produces optimal inference in the special case of identified structures.

Appendix A: Additional concepts and results

A1: Convergence of the empirical process

We give here definitions and results that we use in our asymptotic analysis. The definition of a Vapnik-Červonenkis class of sets is given in section 2.6.1 page 134 of van der Vaart and Wellner, 1996 and reproduced here for the convenience of the reader.

Deﬁnition A1.1:

Let $\mathcal{S}$ be a collection of subsets of a set $\mathcal{X}$. An arbitrary set of $n$ points $\{x_1, \ldots, x_n\}$ possesses $2^n$ subsets. Say that $\mathcal{S}$ picks out a certain subset from $\{x_1, \ldots, x_n\}$ if this can be formed as the set $C \cap \{x_1, \ldots, x_n\}$ for a $C$ in $\mathcal{S}$. The collection $\mathcal{S}$ is said to shatter $\{x_1, \ldots, x_n\}$ if each of its $2^n$ subsets can be picked out in this manner. The Vapnik-Červonenkis index of the class $\mathcal{S}$ is the smallest $n$ for which no set of cardinality $n$ is shattered by $\mathcal{S}$. A Vapnik-Červonenkis class of sets is a class with finite Vapnik-Červonenkis index.

Fact A1:

The class of cells $\mathcal{C}$ is a Vapnik-Červonenkis class of sets (see Example 2.6.1 page 135 of van der Vaart and Wellner, 1996).

Definition A1.2:

The $P$-Brownian bridge is the tight centered Gaussian stochastic process with variance-covariance defined by $E\, G(A_1) G(A_2) = P(A_1 \cap A_2) - P(A_1) P(A_2)$.

Theorem A1.1: If $\mathcal{S}$ is a Vapnik-Červonenkis class of sets, the empirical process converges weakly to the $P$-Brownian bridge $G$, and the convergence is uniform over the class $\mathcal{S}$ (i.e. the convergence is in $l^\infty(\mathcal{F})$, where $\mathcal{F}$ is the class of indicator functions of sets in $\mathcal{S}$).

Proof of Theorem A1.1:

We assume that $\mathcal{S}$ is a Vapnik-Červonenkis class of sets. Call $\mathcal{F}$ the class of indicator functions of sets in $\mathcal{S}$, and call $V(\mathcal{F})$ the Vapnik-Červonenkis index of the corresponding class of sets. By Theorem 2.6.4 page 136 of van der Vaart and Wellner, 1996, there exists a constant $C$ such that for all probability measures $Q$ and all $0 < \varepsilon < 1$, the covering number (see definition 2.2.3 page 98 of van der Vaart and Wellner, 1996) of $\mathcal{F}$ in the $L_1(Q)$ metric, $N(\varepsilon, \mathcal{F}, L_1(Q))$, satisfies
$$N(\varepsilon, \mathcal{F}, L_1(Q)) \le C\, V(\mathcal{F})\, (4e)^{V(\mathcal{F})}\, (1/\varepsilon)^{V(\mathcal{F}) - 1}.$$
Hence, we have
$$\int_0^\infty \sup_Q \sqrt{\ln N(\varepsilon, \mathcal{F}, L_1(Q))}\; d\varepsilon < \infty.$$
Since $\mathcal{F}$ is a class of indicator functions, the above suffices to satisfy the conditions of Theorem 2.5.2 page 127 of van der Vaart and Wellner, 1996, and $\mathcal{F}$ is $P$-Donsker, which by definition means that $G_n$ converges weakly in $l^\infty(\mathcal{F})$.

By the continuous mapping theorem, we immediately have the following corollary:

Corollary A1.1: If $\mathcal{S}$ is a Vapnik-Červonenkis class of sets, then $\sup_{\mathcal{S}} G_n$ converges weakly to $\sup_{\mathcal{S}} G$.

A2: Choquet capacity functionals

We collect here all the definitions, equivalent representations and properties of Choquet capacity functionals (a.k.a. distributions of random sets or infinitely alternating capacities) that are useful for this paper. All the results presented here can be traced back to Choquet, 1953.

Take $\mathcal{X}$ a Polish space (complete metrizable and separable topological space) endowed with its Borel $\sigma$-algebra $\mathcal{B}$. For a sequence of numbers, $a_n \uparrow a$ (resp. $a_n \downarrow a$) denotes convergence in increasing (resp. decreasing) values, whereas for a sequence of sets, the notation $A_n \uparrow A$ (resp. $A_n \downarrow A$) denotes $A_n \subseteq A_{n+1}$ for all $n$ and $A = \bigcup_n A_n$ (resp. $A_{n+1} \subseteq A_n$ for all $n$ and $A = \bigcap_n A_n$). Finally, denote $\mathcal{F}$ (resp. $\mathcal{G}$) the set of closed (resp. open) subsets of $\mathcal{X}$, and for $A \in \mathcal{B}$, $\mathcal{F}_A = \{F \in \mathcal{F} : F \cap A \neq \emptyset\}$.

Definition A2.1:

A capacity is a set function $\varphi : \mathcal{B} \to \mathbb{R}$ satisfying
(i) $\varphi(\emptyset) = 0$ and $\varphi(\mathcal{X}) = 1$,
(ii) For any two Borel sets $A \subseteq B$, we have $\varphi(A) \le \varphi(B)$,
(iii) For all sequences of Borel sets $A_n \uparrow A$, we have $\varphi(A_n) \uparrow \varphi(A)$,
(iv) For all sequences of closed sets $F_n \downarrow F$, we have $\varphi(F_n) \downarrow \varphi(F)$.

Definition A2.2:

A capacity $\varphi$ is called infinitely alternating if for any $n$ and any sequence $A_1, \ldots, A_n$ of Borel sets,
$$\varphi\left(\bigcap_{i=1}^{n} A_i\right) \le \sum_{\emptyset \neq I \subseteq \{1, 2, \ldots, n\}} (-1)^{|I|+1}\, \varphi\left(\bigcup_{i \in I} A_i\right).$$
We call Choquet capacity functional an infinitely alternating capacity. Probability measures are special cases of Choquet capacity functionals, for which the alternating inequality of definition A2.2 holds as an equality (known as Poincaré's equality).

We now state that infinite alternation is a characteristic property of distributions of random sets (for a proof, see for instance section 2.1 of Matheron, 1975).

Theorem A2.1: $\varphi$ is a Choquet capacity functional (i.e. an infinitely alternating capacity) if and only if there exists a probability measure $P$ on $\mathcal{F}$ such that, for all $A \in \mathcal{B}$, $\varphi(A) = P(\mathcal{F}_A)$, and such a $P$ is unique.

$\varphi$ is therefore called the distribution of the random set associated with the probability measure $P$, which allows the following definition of convergence determining classes for a Choquet capacity functional:

Definition A2.3:

A class $\mathcal{C}$ of Borel subsets of $\mathcal{X}$ is called convergence determining for a Choquet capacity functional $\varphi$ if and only if the class $\{\mathcal{F}_A ; A \in \mathcal{C}\}$ is convergence determining for the probability measure $P$ associated to $\varphi$ as in Theorem A2.1.

We now look at the relation with measurable correspondences, defined as correspondences that satisfy Assumption 1 in the main text. Let $(\Omega, \mathcal{B}, P)$ be a probability space.

Definition A2.4:

A non-empty and closed valued correspondence $\Gamma : \Omega \rightrightarrows \mathcal{X}$ is called a measurable correspondence if for each open set $O \subseteq \mathcal{X}$, $\Gamma^{-1}(O) = \{\omega \in \Omega \mid \Gamma(\omega) \cap O \neq \emptyset\}$ belongs to $\mathcal{B}$.

If we define $\varphi$ by $\varphi(A) = P\{\omega \in \Omega \mid \Gamma(\omega) \cap A \neq \emptyset\}$, for all $A \in \mathcal{B}$, then $\varphi$ is a Choquet capacity functional (from section 26.8 page 209 of Choquet, 1953), and its core is defined by the following:

Definition A2.5: The core of $\varphi$ defined above is the set of probability measures that are set-wise dominated by $\varphi$, i.e. $\mathrm{Core}(\varphi) := \mathrm{Core}(\Gamma, P) = \{Q : Q(A) \le \varphi(A) \text{ for all } A \text{ measurable}\}$.

We add useful regularity properties of Choquet capacity functionals:

Lemma A2.1: If $\varphi$ is a Choquet capacity functional, by the Choquet Capacitability Theorem (section 38.2 page 232 of Choquet, 1953), in addition to properties (i)-(iv) of Definition A2.1, it satisfies
(v) $\varphi(A) = \sup\{\varphi(F) : F \subseteq A, F \in \mathcal{F}\}$ for all $A \in \mathcal{B}$,
(vi) $\varphi(A) = \inf\{\varphi(G) : A \subseteq G, G \in \mathcal{G}\}$ for all $A \in \mathcal{B}$.

Several notions extend integration to the case of non-additive measures. We only use explicitly the notion of Choquet integral, which we define below.

Definition A2.6:

The Choquet integral of a bounded measurable function $f$ with respect to a capacity $\varphi$ is defined by
$$\int_{Ch} f \, d\varphi = \int_0^\infty \varphi(\{f \ge x\}) \, dx + \int_{-\infty}^0 \big(\varphi(\{f \ge x\}) - 1\big) \, dx. \qquad (28)$$
The Choquet integral reduces to the Lebesgue integral when $\varphi$ is a probability measure. In addition, it has a very simple expression in case $\varphi$ is a Choquet capacity functional (see Theorem 1 of Castaldo et al., 2004).

Lemma A2.2: If $\varphi$ is a Choquet capacity functional, then for all $f$ bounded measurable, the Choquet integral of $f$ with respect to $\varphi$ is given by $\int_{Ch} f \, d\varphi = \sup_{Q \in \mathrm{Core}(\varphi)} \int f \, dQ$.

Appendix B: Proofs of the results in the main text

Reader’s guide to the proofs:

In the proof of Theorem 1, a result very close to (ii) $\iff$ (iv) is stated in Wasserman, 1990, but the proof is essentially omitted. The proof of (i) $\iff$ (iii) relies on Corollary 1 of Castaldo et al., 2004, which allows us to generalize Proposition 1 of Jovanovic, 1989. The proof of (iv) $\iff$ (v) is straightforward, whereas the proof of (iii) $\iff$ (v) is similar to Theorem 2. The latter is a simple application of lemma 1, which itself is a simplification of the main generalized Monge-Kantorovich duality theorem of Kellerer, 1984. Lemma 1[a] is lemma 11.8.5 of Dudley, 2003. The proof given here for completeness is due to N. Belili. The rest of Theorem 2 is a specialization of the duality result to zero-one cost, which can also be proved using Proposition (3.3) page 424 of Kellerer, 1984, but we give a direct proof to show that we can specialize to closed sets, a fact that we use in the discussion of the power of the test.

Theorem 3a is straightforward. Theorem 3b is structured around the inequality
$$\sup_{\mathcal{S}_b} G_n \le \sup_{\hat{\mathcal{S}}_{b,h_n}} G_n \le \sup_{\mathcal{S}_{b,l_n}} G_n,$$
which holds on an event of large enough probability, with suitable bandwidth sequences $h_n \ll l_n$. Then, lemma 3a shows that $\sup_{\mathcal{S}_{b,l_n}} G_n$ converges weakly to the same limit as $\sup_{\mathcal{S}_b} G_n$, namely $\sup_{\mathcal{S}_b} G$. Finally, the same reasoning is invoked to show that $\sup_{\hat{\mathcal{S}}_{b,h_n}} G$ also converges to the same limit (but for this we need to assume that the bandwidth satisfies condition (27) rather than (24) and (26)). Lemma 3a relies on the construction of a local empirical process relative to the thin sets $A \setminus A_b$, where $A$ is in $\mathcal{S}_{b,l_n}$ and $A_b$ is in $\mathcal{S}_b$ and is close to $A$ in terms of Hausdorff metric (hence the term "thin").

Lemma 3b, like Appendix A1, brings together some facts that are scattered in van der Vaart and Wellner, 1996. Theorem 3c uses the regularity properties of Choquet capacity functionals to show that finite unions of balls are core determining. Given a closed set $F$, using outer regularity of $P$ and a compactness argument, a decreasing sequence of finite unions of open balls is constructed that satisfies two requirements: it converges to $F$ both in $P$-measure and in Hausdorff distance. The regularity properties of the correspondence $\Gamma$ are then used to control the Hausdorff distance between the images by $\Gamma$ of $F$ and the approximating sequence. The absolute continuity of $\nu$ is then invoked to conclude, so that the sign of the inequality is maintained by continuity. Theorem 3d ties in the problem of finding core determining classes with the Monge-Kantorovich dual under zero-one cost: pairs $(1_F, -1_{\Gamma(F)})$ with $F$ in the larger class are shown to be convex combinations of pairs $(1_A, -1_{\Gamma(A)})$ with $A$ in the potential core determining class.

Proof of Theorem 1:

[a] We first show equivalences (i) $\iff$ (iv) $\iff$ (ii):

Call $\Delta(B)$ the set of all Borel probability measures with support $B$. Under Assumption 1, the map $y \mapsto \Delta(\Gamma(y))$ is a map from $\mathcal{Y}$ to the set of all non-empty convex sets of Borel probability measures on $\mathcal{U}$ which are closed with respect to the weak topology. Moreover, for any $f \in C_b(\mathcal{U})$, the set of all continuous bounded real functions on $\mathcal{U}$, the map
$$y \longmapsto \sup\left\{\int f \, d\mu : \mu \in \Delta(\Gamma(y))\right\} = \max_{u \in \Gamma(y)} f(u)$$
is $\mathcal{B}_{\mathcal{Y}}$-measurable, so that, by Theorem 3 of Strassen, 1965, for a given $\nu \in \Delta(\mathcal{U})$, there exists $\pi$ satisfying (11) with $\pi(y, \cdot) \in \Delta(\Gamma(y))$ for $P$-almost all $y$ if and only if
$$\int_{\mathcal{U}} f(u)\, \nu(du) \le \int_{\mathcal{Y}} \sup_{u \in \Gamma(y)} f(u)\, P(dy) \qquad (29)$$
for all $f \in C_b(\mathcal{U})$. Now, defining $\overline{P}$ as the set function
$$\overline{P} : B \mapsto P(\{y \in \mathcal{Y} : \Gamma(y) \cap B \neq \emptyset\}),$$
the right-hand side of (29) is shown in the following sequence of equalities to be equal to the integral of $f$ with respect to $\overline{P}$ in the sense of Choquet (defined by (28)).
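$$\begin{aligned}
\int_{\mathcal{Y}} \sup_{u \in \Gamma(y)} f(u)\, P(dy)
&= \int_0^\infty P\Big\{y \in \mathcal{Y} : \sup_{u \in \Gamma(y)} f(u) \ge x\Big\}\, dx + \int_{-\infty}^0 \Big(P\Big\{y \in \mathcal{Y} : \sup_{u \in \Gamma(y)} f(u) \ge x\Big\} - 1\Big)\, dx \\
&= \int_0^\infty P\big\{y \in \mathcal{Y} : \Gamma(y) \cap \{f \ge x\} \neq \emptyset\big\}\, dx + \int_{-\infty}^0 \Big(P\big\{y \in \mathcal{Y} : \Gamma(y) \cap \{f \ge x\} \neq \emptyset\big\} - 1\Big)\, dx \\
&= \int_0^\infty \overline{P}(\{f \ge x\})\, dx + \int_{-\infty}^0 \big(\overline{P}(\{f \ge x\}) - 1\big)\, dx \\
&= \int_{Ch} f \, d\overline{P},
\end{aligned}$$
where $\overline{P}(B) = P(\{y \in \mathcal{Y} : \Gamma(y) \cap B \neq \emptyset\})$ and the second equality uses the fact that the supremum of $f$ over $\Gamma(y)$ is attained.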
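The chain of equalities above can be checked numerically when both spaces are finite. The following is only a minimal sketch under stated assumptions: the two-point spaces, the correspondence `Gamma`, the weights `P` and the function `f` are hypothetical illustrations, not objects from the paper, and the helper names are ours. It evaluates the layer formula (28) for the upper probability and compares it with the expectation of the maximum of `f` over `Gamma(y)`.

```python
from fractions import Fraction

# Hypothetical finite example (not from the paper).
P = {0: Fraction(1, 2), 1: Fraction(1, 2)}      # law of the observable y
Gamma = {0: {"a"}, 1: {"a", "b"}}               # non-empty valued correspondence
f = {"a": Fraction(-1), "b": Fraction(2)}       # bounded function on U = {a, b}


def upper_probability(B, P=P, Gamma=Gamma):
    """P{y : Gamma(y) intersects B}, the capacity induced by Gamma."""
    return sum(p for y, p in P.items() if Gamma[y] & B)


def choquet_integral(f, capacity):
    """Layer formula (28) for a function on a finite set.

    For a capacity with capacity(U) = 1 and values t_1 < ... < t_m of f,
    (28) reduces to t_1 + sum_k (t_k - t_{k-1}) * capacity({f >= t_k}).
    """
    levels = sorted(set(f.values()))
    total = levels[0] * capacity(set(f))
    for lo, hi in zip(levels, levels[1:]):
        upper_set = {u for u, v in f.items() if v >= hi}
        total += (hi - lo) * capacity(upper_set)
    return total


def expected_max(f, P=P, Gamma=Gamma):
    """Integral over y of the maximum of f on Gamma(y)."""
    return sum(p * max(f[u] for u in Gamma[y]) for y, p in P.items())


lhs = expected_max(f)
rhs = choquet_integral(f, upper_probability)
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than up to a tolerance.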

By Theorem 1 of Castaldo et al., 2004, for any $f \in C_b(\mathcal{U})$,
$$\int_{Ch} f \, d\overline{P} = \max_{\gamma \in \mathrm{Sel}(\Gamma)} \int_{\mathcal{U}} f(u)\, P\gamma^{-1}(du),$$
so that (29) is equivalent to
$$\max_{\gamma \in \mathrm{Sel}(\Gamma)} \int_{\mathcal{U}} f(u)\, P\gamma^{-1}(du) \ge \int_{\mathcal{U}} f(u)\, \nu(du) \qquad (30)$$
for any $f \in C_b(\mathcal{U})$. If $\nu$ is in the weak closure of the set of convex combinations of elements of $\{P\gamma^{-1} : \gamma \in \mathrm{Sel}(\Gamma)\}$, then by linearity of the integral and the definition of weak convergence, (30) holds. Conversely, if $\nu$ satisfies (30), then it satisfies
$$\int_{Ch} f \, d\overline{P} \ge \int_{\mathcal{U}} f(u)\, \nu(du)$$
and by monotone continuity, we have for all $A \in \mathcal{B}_{\mathcal{U}}$, and $1_A$ the indicator function,
$$\int_{\mathcal{U}} 1_A(u)\, \nu(du) \le \int_{Ch} 1_A \, d\overline{P}.$$
Hence $\nu(A) \le \overline{P}(A)$ for all $A \in \mathcal{B}_{\mathcal{U}}$, which by Corollary 1 of Castaldo et al., 2004 implies that $\nu$ is the weak limit of a sequence of convex combinations of elements of $\{P\gamma^{-1} : \gamma \in \mathrm{Sel}(\Gamma)\}$, hence it is a mixture in the desired sense and the proof is complete.

[b] We now show equivalences (iii) $\iff$ (iv) $\iff$ (v):

Using theorem 2 below, it suffices to show that (13) is equivalent to $\nu(\Gamma(A)) \ge P(A)$ for all $A \in \mathcal{B}_{\mathcal{Y}}$. As previously, define $\overline{P}$ as the set function on $\mathcal{B}_{\mathcal{U}}$
$$\overline{P} : B \mapsto P(\{y \in \mathcal{Y} : \Gamma(y) \cap B \neq \emptyset\}).$$
Define also $\underline{P}$ as the set function
$$\underline{P} : B \mapsto P(\{y \in \mathcal{Y} : \Gamma(y) \subseteq B\}).$$
Since $\overline{P}(B) = 1 - \underline{P}(B^c)$, we have the well known equivalence between $\nu(B) \le \overline{P}(B)$ for all $B \in \mathcal{B}_{\mathcal{U}}$ and $\nu(B) \ge \underline{P}(B)$ for all $B \in \mathcal{B}_{\mathcal{U}}$. In particular, for $B = \Gamma(A)$ for any $A \in \mathcal{B}_{\mathcal{Y}}$, we have $\nu(B) \ge \underline{P}(B) = P(\{y \in \mathcal{Y} : \Gamma(y) \subseteq \Gamma(A)\})$. As $A \subseteq \{y \in \mathcal{Y} : \Gamma(y) \subseteq \Gamma(A)\}$, we have $\nu(\Gamma(A)) \ge P(A)$. Conversely, for a given $B \in \mathcal{B}_{\mathcal{U}}$, call $B^* = \{y \in \mathcal{Y} : \Gamma(y) \subseteq B\}$. Then, we have $\underline{P}(B) = P(B^*) \le \nu(\Gamma(B^*))$. The result follows from the observation that $\Gamma(B^*) \subseteq B$.

Proof of Theorem 1':

The proof completely parallels the proof of Theorem 1. The equivalence between 1(iii) and 1’(iii’) drivesthe equivalence of each of the formulations in Theorem 1’ with each of the formulations in Theorem 1.
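In a finite setting, the equivalence between the existence of a joint law supported on the graph of $\Gamma$ and the inequalities $\nu(\Gamma(A)) \ge P(A)$ for all $A$, which the proofs in this appendix obtain from the zero-one cost duality of Theorem 2, can be verified by brute force. The sketch below is an illustration under stated assumptions rather than the procedure of the paper: the spaces, weights and function names are hypothetical, the dual side enumerates all subsets $A$, and the primal side uses the fact that, for zero-one cost on finite spaces, the minimal mass a coupling must place outside the graph of $\Gamma$ equals one minus a maximum flow (computed here with a plain breadth-first augmenting-path routine).

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def dual_value(P, nu, Gamma):
    """max over subsets A of P(A) - nu(Gamma(A)); A = empty set gives 0."""
    ys = list(P)
    best = Fraction(0)
    for r in range(1, len(ys) + 1):
        for A in combinations(ys, r):
            image = set().union(*(Gamma[y] for y in A))
            value = sum(P[y] for y in A) - sum(nu[u] for u in image)
            best = max(best, value)
    return best


def primal_value(P, nu, Gamma):
    """min over couplings of the mass on {(y, u) : u not in Gamma(y)}."""
    cap = {}  # residual capacities of the network s -> y -> u -> t

    def add_edge(a, b, c):
        cap.setdefault(a, {})[b] = c
        cap.setdefault(b, {}).setdefault(a, Fraction(0))

    for y, p in P.items():
        add_edge("s", ("y", y), p)
        for u in Gamma[y]:
            add_edge(("y", y), ("u", u), Fraction(1))  # total mass is 1
    for u, q in nu.items():
        add_edge(("u", u), "t", q)

    flow = Fraction(0)
    while True:
        parent = {"s": None}
        queue = deque(["s"])
        while queue and "t" not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if "t" not in parent:
            return 1 - flow  # mass that cannot be matched inside the graph
        path, b = [], "t"
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        flow += push


# Hypothetical example: nu_bad puts too little mass on "a" to cover y = 0.
P = {0: Fraction(1, 2), 1: Fraction(1, 2)}
Gamma = {0: {"a"}, 1: {"a", "b"}}
nu_bad = {"a": Fraction(1, 4), "b": Fraction(3, 4)}
nu_ok = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

print(dual_value(P, nu_bad, Gamma), primal_value(P, nu_bad, Gamma))
print(dual_value(P, nu_ok, Gamma), primal_value(P, nu_ok, Gamma))
```

A value of zero on both sides corresponds to a correctly specified structure in this toy setting; a positive value is attained by a set $A$ for which $\nu(\Gamma(A)) < P(A)$. The subset enumeration is exponential in the number of points and is only meant for very small examples.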

Lemma 1: If $\varphi : \mathcal{Y} \times \mathcal{U} \to \mathbb{R}$ is bounded, non-negative and lower semicontinuous, then
$$\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi = \sup_{f \oplus g \le \varphi} (Pf + \nu g).$$

Proof of Lemma 1:

It can be shown to be a special case of corollary (2.18) of Kellerer, 1984; however, a direct proof is more transparent, so we give it here for completeness. The left-hand side is immediately seen to be always larger than the right-hand side, so we show the reverse inequality.

[a] Case where $\varphi$ is continuous and $\mathcal{U}$ and $\mathcal{Y}$ are compact.

Call $\mathcal{G}$ the set of functions on $\mathcal{Y} \times \mathcal{U}$ strictly dominated by $\varphi$ and call $\mathcal{H}$ the set of functions of the form $f + g$ with $f$ and $g$ continuous functions on $\mathcal{Y}$ and $\mathcal{U}$ respectively. Call $s(c) = Pf + \nu g$ for $c = f + g \in \mathcal{H}$. It is a well defined linear functional, and is not identically zero on $\mathcal{H}$. $\mathcal{G}$ is convex and sup-norm open. Since $\varphi$ is continuous on the compact $\mathcal{Y} \times \mathcal{U}$, we have
$$s(c) \le \sup f + \sup g < \sup \varphi$$
for all $c \in \mathcal{G} \cap \mathcal{H}$, which is non empty and convex. Hence, by the Hahn-Banach theorem, there exists a linear functional $\eta$ that extends $s$ on the space of continuous functions such that
$$\sup_{\mathcal{G}} \eta = \sup_{\mathcal{G} \cap \mathcal{H}} s.$$
By the Riesz representation theorem, there exists a unique finite non-negative measure $\pi$ on $\mathcal{Y} \times \mathcal{U}$ such that $\eta(c) = \pi c$ for all continuous $c$. Since $\eta = s$ on $\mathcal{H}$, we have
$$\int_{\mathcal{Y} \times \mathcal{U}} f(y)\, d\pi(y, u) = \int_{\mathcal{Y}} f(y)\, dP(y), \qquad \int_{\mathcal{Y} \times \mathcal{U}} g(u)\, d\pi(y, u) = \int_{\mathcal{U}} g(u)\, d\nu(u),$$
so that $\pi \in \mathcal{M}(P, \nu)$ and
$$\sup_{f \oplus g \le \varphi} (Pf + \nu g) = \sup_{\mathcal{G} \cap \mathcal{H}} s = \sup_{\mathcal{G}} \eta = \pi\varphi.$$

[b] $\mathcal{Y}$ and $\mathcal{U}$ are not necessarily compact, and $\varphi$ is continuous.

For all $n > 0$, there exist compact sets $K_n$ and $L_n$ such that
$$\max\big(P(\mathcal{Y} \setminus K_n),\, \nu(\mathcal{U} \setminus L_n)\big) \le \frac{1}{n}.$$
Let $(a, b)$ be an element of $\mathcal{Y} \times \mathcal{U}$ and define two probability measures $\mu_n$ and $\nu_n$ with compact support by
$$\mu_n(A) = P(A \cap K_n) + P(\mathcal{Y} \setminus K_n)\, \delta_a(A), \qquad \nu_n(B) = \nu(B \cap L_n) + \nu(\mathcal{U} \setminus L_n)\, \delta_b(B),$$
where $\delta$ denotes the Dirac measure. By [a] above, there exists $\pi_n$ with marginals $\mu_n$ and $\nu_n$ such that
$$\pi_n \varphi \le \sup_{f \oplus g \le \varphi} (Pf + \nu g) + \frac{\varphi(a, b)}{n}.$$
Since $(\pi_n)$ has weakly converging marginals, it is weakly relatively compact. Hence it contains a weakly converging subsequence with limit $\pi \in \mathcal{M}(P, \nu)$. By Skorohod's almost sure representation (see for instance theorem 11.7.2 page 415 of Dudley, 2003), there exists a sequence of random variables $X_n$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with law $\pi_n$ and a random variable $X$ on the same probability space with law $\pi$ such that $X$ is the almost sure limit of $(X_n)$. By Fatou's lemma, we then have
$$\liminf \pi_n \varphi = \liminf E\varphi(X_n) \ge E \liminf \varphi(X_n) = E\varphi(X) = \pi\varphi.$$
Hence we have the desired result.

[c] General case.

$\varphi$ is the pointwise supremum of a sequence of continuous bounded functions, so the result follows from upward $\sigma$-continuity of both $\inf_{\pi \in \mathcal{M}(P, \nu)} \pi\varphi$ and $\sup_{f \oplus g \le \varphi} (Pf + \nu g)$ on the space of lower semicontinuous functions, shown in propositions (1.21) and (1.28) of Kellerer, 1984.

Proof of Theorem 2:

Under assumption 1, $\Gamma$ is closed valued, hence $\varphi(y, u) = 1_{\{u \notin \Gamma(y)\}}$ is lower semicontinuous and (20) is a direct application of lemma 1 above.

We now show (21). Since the sup-norm of the cost function is 1 (the cost function is an indicator), the supremum in (20) is attained by pairs of functions $(f, g)$ in $\mathcal{F}$, defined by
$$\mathcal{F} = \big\{(f, g) \in L^1(P) \times L^1(\nu) :\ 0 \le f \le 1,\ -1 \le g \le 0,\ f(y) + g(u) \le 1_{\{u \notin \Gamma(y)\}},\ f \text{ upper semicontinuous}\big\}.$$
Now, $(f, g)$ can be written as a convex combination of pairs $(1_A, -1_B)$ in $\mathcal{F}$. Indeed, $f = \int_0^1 1_{\{f \ge x\}}\, dx$ and $g = \int_0^1 -1_{\{g \le -x\}}\, dx$, and for all $x$, $1_{\{f \ge x\}}(y) - 1_{\{g \le -x\}}(u) \le 1_{\{u \notin \Gamma(y)\}}$. Since the functional on the right-hand side of (20) is linear, the supremum is attained on such a pair $(1_A, -1_B)$. Hence, the right-hand side of (20) specializes to
$$\sup_{A \times B^c \subseteq D} \big(P(A) - \nu(B)\big). \qquad (31)$$
For $D = \{(y, u) : u \notin \Gamma(y)\}$, $A \times B^c \subseteq D$ means that if $y \in A$ and $u \notin B$, then $u \notin \Gamma(y)$. In other words $u \notin B$ implies $u \notin \Gamma(A)$, which can be written $\Gamma(A) \subseteq B$. Hence, the dual problem can be written
$$\sup_{\Gamma(A) \subseteq B} \big(P(A) - \nu(B)\big),$$
and (21) follows immediately.

Proof of Theorem 3a:

Let $A_0$ be the subset of $\mathcal{Y}$ that achieves the maximum of $\delta(A) = P(A) - \nu(\Gamma(A))$ over $A \in \mathcal{S} \setminus \mathcal{S}_b$. Call $\delta_0 = \delta(A_0)$, and note that $\delta_0 < 0$. We have
$$\sqrt{n}\, T_{\mathcal{Y}}(P_n, \Gamma, \nu) = \sup_{A \in \mathcal{Y}} \big[G_n(A) + \sqrt{n}\,(P(A) - \nu(\Gamma(A)))\big] = \max\Big\{\sup_{\mathcal{S}_b} G_n,\ \sup_{A \in \mathcal{Y} \setminus \mathcal{S}_b} \big[G_n(A) + \sqrt{n}\,(P(A) - \nu(\Gamma(A)))\big]\Big\}.$$
The second term in the maximum of the preceding display is dominated by
$$\sup_{\mathcal{Y} \setminus \mathcal{S}_b} G_n + \sqrt{n}\, \delta_0,$$
whose limsup is almost surely non-positive. Hence (23) follows from the convergence of the empirical process. (25) follows from the fact that, under (24), for all $n$ sufficiently large, $\hat{\mathcal{S}}_{b,h_n}$ is almost surely equal to $\mathcal{S}_b$.

Proof of Theorem 3b:

Consider two sequences of positive numbers $l_n$ and $h_n$ such that they both satisfy (27), $l_n > h_n$ and $(l_n - h_n)^{-1} \sqrt{\frac{\ln \ln n}{n}} \to 0$. Notice that $\{\emptyset, \mathcal{Y}\} \subseteq \mathcal{S}_b, \mathcal{S}_{b,h}, \hat{\mathcal{S}}_{b,h}$ for any $h > 0$. Since $G_n(\mathcal{Y}) = 0$, we therefore have $\sup_{\mathcal{S}_b} G_n$, $\sup_{\mathcal{S}_{b,l_n}} G_n$ and $\sup_{\hat{\mathcal{S}}_{b,h_n}} G_n$ non-negative. Hence, calling $\zeta_n$ the indicator function of the event $\sup_{\mathcal{S}} G_n \le (l_n - h_n)\sqrt{n}$, we can write
$$\begin{aligned}
\zeta_n \sup_{\mathcal{S}_b} G_n
&\le \zeta_n \max\Big\{\sup_{\mathcal{S}_b} \big[G_n + \sqrt{n}\,(P - \nu\Gamma)\big],\ \sup_{\mathcal{S} \setminus \mathcal{S}_b} \big[G_n + \sqrt{n}\,(P - \nu\Gamma)\big]\Big\} \\
&\le \zeta_n \sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma, \nu) \\
&\le \zeta_n \sup_{\hat{\mathcal{S}}_{b,h_n}} G_n \\
&\le \zeta_n \sup_{\mathcal{S}_{b,l_n}} G_n,
\end{aligned}$$
where the first inequality holds because the left-hand side is equal to the first term in the right-hand side, the second inequality holds trivially as an equality since $\mathcal{S} = \mathcal{S}_b \cup (\mathcal{S} \setminus \mathcal{S}_b)$, the third inequality holds because on $\mathcal{S} \setminus \hat{\mathcal{S}}_{b,h_n}$, we have by definition $G_n + \sqrt{n}\,(P - \nu\Gamma) = \sqrt{n}\,(P_n - \nu\Gamma) \le -h_n \le 0$, and the last inequality holds because on $\{\zeta_n = 1\}$, we have that $A \in \hat{\mathcal{S}}_{b,h_n}$ implies
$$\nu\Gamma(A) \le P_n(A) + h_n = P(A) + (P_n - P)(A) + h_n \le P(A) + \sup_{\mathcal{S}} G_n / \sqrt{n} + h_n \le P(A) + l_n - h_n + h_n = P(A) + l_n,$$
which implies that $A \in \mathcal{S}_{b,l_n}$.

By Lemma 3a and Appendix A1, we have that both $\sup_{\mathcal{S}_b} G_n$ and $\sup_{\mathcal{S}_{b,l_n}} G_n$ converge weakly to $\sup_{\mathcal{S}_b} G$. It is shown below that $\zeta_n \to_p 1$, so that Slutsky's lemma (lemma 2.8 page 11 of van der Vaart, 1998) yields the weak convergence of $\zeta_n \sup_{\mathcal{S}_b} G_n$ and $\zeta_n \sup_{\mathcal{S}_{b,l_n}} G_n$ to the same limit, and hence that of $\zeta_n \sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma, \nu)$ and $\zeta_n \sup_{\hat{\mathcal{S}}_{b,h_n}} G_n$. It follows from Slutsky's lemma again that
$$\sqrt{n}\, T_{\mathcal{S}}(P_n, \Gamma, \nu) \rightsquigarrow \sup_{\mathcal{S}_b} G \quad \text{and} \quad \sup_{\hat{\mathcal{S}}_{b,h_n}} G_n \rightsquigarrow \sup_{\mathcal{S}_b} G,$$
which proves (23).

We now prove that $\zeta_n \to_p 1$. Indeed, for any $\epsilon > 0$,
$$P(|\zeta_n - 1| > \epsilon) = P(\zeta_n = 0) = P\Big(\sup_{\mathcal{S}} G_n > (l_n - h_n)\sqrt{n}\Big) \to 0,$$
since $(l_n - h_n)\sqrt{n} \gg \sqrt{\ln \ln n}$ by assumption.

There remains to show (25). Defining $\xi_n$ as the indicator of the set
$$\Big\{-h_n \sqrt{n} \le \sup_{\mathcal{S}} G_n \le (l_n - h_n)\sqrt{n}\Big\},$$
we have the inequalities
$$\xi_n \sup_{\mathcal{S}_b} G \le \xi_n \sup_{\hat{\mathcal{S}}_{b,h_n}} G \le \xi_n \sup_{\mathcal{S}_{b,l_n}} G.$$
Indeed, the first inequality holds because $\sup_{\mathcal{S}} G_n \ge -h_n \sqrt{n}$ implies that $P_n(A) \ge P(A) - h_n$ for all $A$, hence that $\mathcal{S}_b \subseteq \hat{\mathcal{S}}_{b,h_n}$; and the second inequality holds because on $\{\xi_n = 1\}$, we have that $A \in \hat{\mathcal{S}}_{b,h_n}$ implies
$$\nu\Gamma(A) \le P_n(A) + h_n = P(A) + (P_n - P)(A) + h_n \le P(A) + \sup_{\mathcal{S}} G_n / \sqrt{n} + h_n \le P(A) + l_n - h_n + h_n = P(A) + l_n,$$
which implies that $A \in \mathcal{S}_{b,l_n}$.

By Lemma 3a suitably modified to apply to the oscillations of $G$ instead of the oscillations of $G_n$, we have that $\sup_{\mathcal{S}_{b,l_n}} G$ converges weakly to $\sup_{\mathcal{S}_b} G$. It is shown below that $\xi_n \to_p 1$, so that Slutsky's lemma yields the weak convergence of $\xi_n \sup_{\mathcal{S}_b} G$ and $\xi_n \sup_{\mathcal{S}_{b,l_n}} G$ to the same limit, and hence that of $\xi_n \sup_{\hat{\mathcal{S}}_{b,h_n}} G$. It follows from Slutsky's lemma again that
$$\sup_{\hat{\mathcal{S}}_{b,h_n}} G \rightsquigarrow \sup_{\mathcal{S}_b} G,$$
which proves (25).

We now prove that $\xi_n \to_p 1$. Indeed, for any $\epsilon > 0$,
$$P(|\xi_n - 1| > \epsilon) = P(\xi_n = 0) = P\Big(\sup_{\mathcal{S}} G_n > (l_n - h_n)\sqrt{n} \ \text{ or } \ \sup_{\mathcal{S}} G_n < -h_n \sqrt{n}\Big) \to 0,$$
since $(l_n - h_n)\sqrt{n} \gg \sqrt{\ln \ln n}$ and $h_n \sqrt{n} \gg \sqrt{\ln \ln n}$ by assumption.

Proof of Lemma 3a:

Take a bandwidth sequence $l_n$ that satisfies (27), and take $\mathcal{S}_{b,l_n}$ as in definition 3.3. Under assumption FS, take $A \in \mathcal{S}_{b,l_n}$ and an $A_b \in \mathcal{S}_b$ such that $d_H(A, A_b) \le \zeta_n = K l_n^{\eta}$ (we suppress the dependence of $A_b$ on $A$ for ease of notation). As $\mathcal{S}_b \subseteq \mathcal{S}_{b,l_n}$, one has
$$\sup_{A \in \mathcal{S}_b} G_n(A) \le \sup_{A \in \mathcal{S}_{b,l_n}} G_n(A). \qquad (32)$$
Second, since $A_b \subseteq A$, one has
$$\sup_{A \in \mathcal{S}_{b,l_n}} G_n(A) = \sup_{A \in \mathcal{S}_{b,l_n}} \big[G_n(A_b) + G_n(A \setminus A_b)\big] \le \sup_{A \in \mathcal{S}_{b,l_n}} \big[G_n(A_b)\big] + \sup_{A \in \mathcal{S}_{b,l_n}} \big[G_n(A \setminus A_b)\big].$$
If we have that $\sup_{A \in \mathcal{S}_{b,l_n}} |G_n(A \setminus A_b)| = O_{a.s.}\big(\sqrt{\zeta_n \ln \ln n}\big)$, then
$$\sup_{A \in \mathcal{S}_{b,l_n}} G_n(A) = \sup_{A \in \mathcal{S}_{b,l_n}} \big[G_n(A_b)\big] + O_{a.s.}\big(\sqrt{\zeta_n \ln \ln n}\big), \qquad (33)$$
noting the dependence of $A_b$ on $A$ in the expression above. But since $A_b \in \mathcal{S}_b$, one has $\sup_{A \in \mathcal{S}_{b,l_n}} [G_n(A_b)] \le \sup_{A \in \mathcal{S}_b} G_n(A)$. This fact, along with (32) and (33), yields the result.

We now show that we have indeed that
$$\sup_{A \in \mathcal{S}_{b,l_n}} |G_n(A \setminus A_b)| = O_{a.s.}\big(\sqrt{\zeta_n \ln \ln n}\big).$$
This relies on the construction of a local empirical process relative to the thin regions $A \setminus A_b$. First consider such a region. If $A \in \mathcal{S}_b$, the result holds trivially, so that we may assume that $A \in \mathcal{S}_{b,l_n} \setminus \mathcal{S}_b$, so that $A \setminus A_b$ is not empty. We distinguish the case where $A$ is a bounded rectangle, and the cases where $A$ is unbounded.
(i) $A$ is a bounded rectangle, i.e. of the form $(y_1, z_1) \times \ldots \times (y_{d_y}, z_{d_y})$, with $y_1, \ldots, y_{d_y}, z_1, \ldots, z_{d_y}$ real. Then, since $d_H(A, A_b) \le \zeta_n$, $A_b$ is also a bounded rectangle, and $A \setminus A_b$ is the union of at least one (since $A$ and $A_b$ are distinct) and at most $f(d_y)$ (the number of faces of a rectangle in $\mathbb{R}^{d_y}$) rectangles with at least one dimension bounded by $\zeta_n$.
(ii) $A$ is an unbounded rectangle, i.e. of the same form as above, except that some of the edges are $+\infty$ or $-\infty$. Then $A_b$ is also an unbounded rectangle, and $A \setminus A_b$ is also the union of a finite number of rectangles with one dimension bounded by $\zeta_n$.

In both cases (i) and (ii), $A \setminus A_b$ is the union of a finite number of rectangles with at least one dimension bounded by $\zeta_n$. Hence if we control the supremum of the empirical process on one of these thin rectangles, when $A$ ranges over $\mathcal{S}_{b,l_n}$, we can control it on $A \setminus A_b$.

Hence, it suffices to prove that
$$\sup_{A \in \mathcal{S}_{b,l_n}} |G_n(\varphi_n(A))| = O_{a.s.}\big(\sqrt{\zeta_n \ln \ln n}\big),$$
where $\varphi_n$ is the homothety that carries $A$ into one of the thin rectangles described above.

As a homothety, $\varphi_n$ is invertible and bi-measurable, and since $\varphi_n(A)$ has at least one dimension bounded by $\zeta_n$, and $P$ is absolutely continuous with respect to Lebesgue measure, $P(\varphi_n(A)) = O(\zeta_n)$ uniformly when $A$ ranges over $\mathcal{S}_{b,l_n}$. Now, for any $A \in \mathcal{S}_{b,l_n}$, we have
$$\begin{aligned}
G_n(\varphi_n(A)) &= \sqrt{n}\, \big[P_n(\varphi_n(A)) - P(\varphi_n(A))\big] \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big(1_{\{\varphi_n(A)\}}(Y_i) - E_P(1_{\{\varphi_n(A)\}}(Y))\big) \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big(1_A(\varphi_n^{-1}(Y_i)) - E_P(1_A(\varphi_n^{-1}(Y)))\big) \\
&:= \sqrt{\zeta_n}\, L_n(1_A, \varphi_n),
\end{aligned}$$
where $L_n(1_A, \varphi_n)$ is defined as
$$\frac{1}{\sqrt{n \zeta_n}} \sum_{i=1}^{n} \big(1_A(\varphi_n^{-1}(Y_i)) - E_P(1_A(\varphi_n^{-1}(Y)))\big)$$
to conform with the notation of Einmahl and Mason, 1997.

Conditions A(i)-A(iv) of the latter hold for $a_n = b_n = l_n$ and $a = 0$ under (27), and conditions S(i)-S(iii) and F(ii) and F(iv)-F(viii) hold because $\mathcal{F}$ is here the class of indicator functions of $\mathcal{S}_{b,l_n}$ which, as a subclass of $\mathcal{S}$, is a Vapnik-Červonenkis class of sets. Hence Theorem 1.2 of Einmahl and Mason, 1997 holds, and
$$\sup_{A \in \mathcal{S}_{b,l_n}} |L_n(1_A, \varphi_n)| = O_{a.s.}\big(\sqrt{\ln \ln n}\big)$$
so that the desired result holds.

Proof of Lemma 3b:

Consider the class of cells $\mathcal{S} = \{(y, z) : y, z \in \mathbb{R}^{d_y}\}$. It is a Vapnik-Červonenkis class. Indeed, if $d_y = 1$, its Vapnik-Červonenkis index is three, since $\mathcal{S}$ can pick out every subset of a set of cardinality 2, but can never pick out the subset $\{x, z\}$ of a set of three elements $x < y < z$. More generally, it can be shown that the Vapnik-Červonenkis index of $\mathcal{S}$ is $2d_y + 1$ (see Example 2.6.1 page 135 of van der Vaart and Wellner, 1996). Hence the class $\mathcal{S}_K$ is also Vapnik-Červonenkis. The latter follows from lemma 2.6.17(iii) page 147 of van der Vaart and Wellner, 1996 and the fact that it is contained in the $K$-iterated union $\mathcal{S} \sqcup \ldots \sqcup \mathcal{S}$, where the "square union" of two classes of sets $\mathcal{S}_1$ and $\mathcal{S}_2$ is defined by $\mathcal{S}_1 \sqcup \mathcal{S}_2 = \{A_1 \cup A_2 : A_1 \in \mathcal{S}_1, A_2 \in \mathcal{S}_2\}$.

Proof of Theorem 3c:

From Fact 2, we know that we can restrict attention to closed subsets of $\mathcal{Y}$. Take $F$ one such subset. By the outer regularity of Borel probability measures, for all $n$ there is an open set $O'_n$ such that $F \subseteq O'_n$ and $P(O'_n) \le P(F) + 1/n$. Since $O'_n$ is open, for each $y \in F$, there exists $r_y > 0$ such that the open ball $B(y, r_y)$ centered at $y$ with radius $r_y$ is included in $O'_n$, and by construction, the open set $\tilde{O}'_n = \bigcup_{y \in F} B(y, \min(r_y, 1/n))$ covers $F$. As a closed subset of a compact set, $F$ is compact. Hence we can call $O_n$ the finite sub-covering of $F$ extracted from $\tilde{O}'_n$. $O_n$ is therefore a finite union of open balls with positive radii, i.e. it belongs to $\tilde{\mathcal{S}}_{SW}$. By construction of $O_n$, we have $d_H(O_n, F) \le 1/n$, and we know that $\Gamma(F) \subseteq \Gamma(O_n)$, and we shall now show that $\nu(\Gamma(O_n))$ converges to $\nu(\Gamma(F))$ to yield the result that $\tilde{\mathcal{S}}_{SW}$ is core determining.

Consider the following partition $\mathcal{Y} = \mathcal{Y}^I \cup \mathcal{Y}^-_n \cup \mathcal{Y}^+_n$ with:
$$\mathcal{Y}^I = \{y \in \mathcal{Y} : \nu(\Gamma(y)) = 0\}, \quad \mathcal{Y}^-_n = \{y \in \mathcal{Y} : 0 < \nu(\Gamma(y)) < 1/n\}, \quad \mathcal{Y}^+_n = \{y \in \mathcal{Y} : \nu(\Gamma(y)) \ge 1/n\}.$$
Define $F^I = F \cap \mathcal{Y}^I$, $F^-_n = F \cap \mathcal{Y}^-_n$ and $F^+_n = F \cap \mathcal{Y}^+_n$, and similarly for $O_n$, with $O^I_n$ denoting $O_n \cap \mathcal{Y}^I$.

Consider first $O^I_n \setminus F^I$. Assumption (CD3) yields immediately that $\nu(\Gamma(O^I_n \setminus F^I)) \downarrow 0$.

Consider next $O^-_n \setminus F^-_n$. Under assumption (CD6), $\nu(\Gamma(\mathcal{Y}^-_n)) \downarrow 0$, hence $\nu(\Gamma(O^-_n \setminus F^-_n)) \downarrow 0$.

Consider finally $O^+_n \setminus F^+_n$. Consider the disjoint connected components of $\Gamma(O^+_n)$. Their $\nu$ measure is at least $1/n$ by construction, hence by the compactness of $\mathcal{U}$, the number $J_n$ of disjoint connected components of $\Gamma(O^+_n)$ is no greater than $n$. We have shown above that $d_H(O_n, F) < 1/n$, hence we have $d_H(O^+_n, F^+_n) < 1/n$. By assumption (CD5), this implies that $d_H(\Gamma(O^+_n), \Gamma(F^+_n)) = O(1/n)$. Hence for $n$ sufficiently large, all the disjoint connected components of $\Gamma(O^+_n)$ intersect $\Gamma(F^+_n)$. Call $(C_j)_{j=1}^{J_n}$ the disjoint connected components of $\Gamma(O^+_n)$. We have
$$\nu(\Gamma(O^+_n)) = \sum_{j=1}^{J_n} \nu(\Gamma(C_j)) = \sum_{j=1}^{J_n} \big(\nu(\Gamma(C_j)) + O(1/n)\big) = \nu(\Gamma(F^+_n)) + O(1/n),$$
where the second equality holds under assumption (CD2). Since $F^+_n \subseteq O^+_n$, we therefore have the desired result $\nu(\Gamma(O^+_n \setminus F^+_n)) \downarrow 0$, which completes the proof.

Proof of Theorem 3d:

From fact 2, we can restrict attention to closed subsets of $\mathcal{Y} = \mathbb{R}$. Call $\mathcal{Y}^I$ the subset of $\mathcal{Y}$ defined by $u(y) = l(y)$ $P$-almost surely (and therefore everywhere since $u$ and $l$ are increasing). Note that the restriction of $\nu\Gamma$ to $\mathcal{Y}^I$ is a probability measure. Consider a closed subset $F$ of $\mathcal{Y}$. Call $F^I = F \cap \mathcal{Y}^I$ (resp. $F^U = F \setminus F^I$) the intersection of $F$ with $\mathcal{Y}^I$ (resp. with its complement). Because of the monotonicity of the envelopes, $\nu(\Gamma(F)) = \nu(\Gamma(F^I)) + \nu(\Gamma(F^U))$, hence we only need to prove the result for closed subsets of $\mathcal{Y}^I$ and for closed subsets of $\mathcal{Y} \setminus \mathcal{Y}^I$.

Take $F$ a subset of $\mathcal{Y}^I$. The restriction $\nu\Gamma|_{\mathcal{Y}^I}$ of $\nu\Gamma$ to $\mathcal{Y}^I$ is a probability measure, and the class of sets $\mathcal{C}^I$ defined by $\mathcal{C}^I = \{A \subseteq \mathcal{Y} : A = \tilde{A} \cap \mathcal{Y}^I, \tilde{A} \in \mathcal{C}\}$ is value determining for $\nu\Gamma|_{\mathcal{Y}^I}$. By the monotonicity of the envelopes, we have $\nu(\Gamma(\tilde{A})) = \nu(\Gamma(A)) + \nu(\Gamma(\tilde{A} \setminus A))$ (with the notation of the definition of $\mathcal{C}^I$ above). Hence, if $\nu(\Gamma(A)) \ge P(A)$ for all $A \in \mathcal{C}$, then $\nu(\Gamma(A)) \ge P(A)$ for all $A \subseteq \mathcal{Y}^I$.

We can now restrict attention to the case where the upper and lower envelopes are distinct, in which case, for a closed set $F$, $\Gamma(F)$ has at most a countable number of connected parts, which we denote $C_n$, $n \in \mathbb{Z}$, ordered in the sense that $\inf C_n > \sup C_{n-1}$. By construction, each $C_n$ is the image by $\Gamma$ of a subset $F_n$ of $F$. $\Gamma$ being convex-valued, the monotonicity of the envelopes $u$ and $l$ implies upper-semicontinuity of $l$ and lower-semicontinuity of $u$. Therefore, $C_n = \Gamma(F_n) = \Gamma([\inf F_n, \sup F_n])$, and we deduce that $\nu\Gamma(F) = \nu\Gamma(\bigcup_n I_n)$ where $(I_n)_{n \in \mathbb{Z}}$ is a countable collection of disjoint closed intervals in $\mathbb{R}$. Hence if we show that $\nu\Gamma(I) \ge P(I)$ for any interval $I$, then we have $\nu\Gamma(F) = \sum_n \nu\Gamma(I_n) \ge \sum_n P(I_n) \ge P(F)$, and the inequality holds for $F$.

Now, for any $y_1 < y_2 \in \mathbb{R}$ we have
$$P(y_1, y_2] = P(y_1, +\infty) + P(-\infty, y_2] - 1 \le \nu\Gamma(y_1, +\infty) + \nu\Gamma(-\infty, y_2] - \nu(u(y) - l(y)) = \nu\Gamma(y_1, y_2],$$
where $u$ (resp. $l$) is the upper (resp. lower) envelope, and the result follows.

References

Andrews, D., Berry, S., & Jia, P. (2004). Confidence regions for parameters in discrete games with multiple equilibria, with an application to discount chain store location [unpublished manuscript].

Beresteanu, A., & Molinari, F. (2006). Asymptotic properties for a class of partially identified models [unpublished manuscript].

Castaldo, A., Maccheroni, F., & Marinacci, M. (2004). Random sets and their distributions. Sankhya (Series A), 409–427.

Chernozhukov, V., Hong, H., & Tamer, E. (2002). Inference on parameter sets in econometric models [unpublished manuscript].

Choquet, G. (1953). Théorie des capacités. Annales de l'Institut Fourier, 131–295.

Dempster, A. P. (1967). Upper and lower probabilities induced by a multi-valued mapping. Annals of Mathematical Statistics, 325–339.

Dudley, R. (2003). Real analysis and probability. Cambridge University Press.

Einmahl, U., & Mason, D. (1997). Gaussian approximation of local empirical processes indexed by functions. Probability Theory and Related Fields, 283–311.

Galichon, A., & Henry, M. (2006). A duality approach to inference in models defined by moment inequalities [unpublished manuscript].

Heckman, J., & Vytlacil, E. (2001). Instrumental variables, selection models and tight bounds on the average treatment effect. Econometric Evaluations of Labour Market Policies, Lechner, M., and F. Pfeiffer, eds., 1–16.

Imbens, G., & Manski, C. (2004). Confidence intervals for partially identified parameters. Econometrica, 1845–1859.

Jovanovic, B. (1989). Observable implications of models with multiple equilibria. Econometrica, 1431–1437.

Kellerer, H. (1984). Duality theorems for marginal problems. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 399–432.

Manski, C. (2005). Partial identification in econometrics [New Palgrave Dictionary of Economics, 2nd Edition].

Matheron, G. (1975). Random sets and integral geometry. New York: Wiley.

Pakes, A., Porter, J., Ho, K., & Ishii, J. (2004). Moment inequalities and their application [unpublished manuscript].

Salinetti, G., & Wets, R. (1986). On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima. Mathematics of Operations Research, 385–422.

Shaikh, A. (2005). Inference for a class of partially identified econometric models [unpublished manuscript].

Shaikh, A., & Vytlacil, E. (2005). Threshold crossing models and bounds on treatment effects: A nonparametric analysis [NBER Technical Working Paper 0307].

Strassen, V. (1965). The existence of probability measures with given marginals. Annals of Mathematical Statistics, 423–439.

van der Vaart, A. (1998). Asymptotic statistics. Cambridge University Press.

van der Vaart, A., & Wellner, J. (1996). Weak convergence and empirical processes. New York: Springer.

Wasserman, L. (1990). Prior envelopes based on belief functions. Annals of Statistics, 18