DILATION BOOTSTRAP
ALFRED GALICHON AND MARC HENRY
Abstract.
We propose a methodology for constructing confidence regions with partially identified models of general form. The region is obtained by inverting a test of internal consistency of the econometric structure. We develop a dilation bootstrap methodology to deal with sampling uncertainty without reference to the hypothesized economic structure. It requires bootstrapping the quantile process for univariate data and a novel generalization of the latter to higher dimensions. Once the dilation is chosen to control the confidence level, the unknown true distribution of the observed data can be replaced by the known empirical distribution and confidence regions can then be obtained as in Galichon and Henry, 2008 and Beresteanu et al., 2008.
JEL Classification: C15, C31.
Keywords: Partial identification, dilation bootstrap, quantile process, optimal matching.
Date: First version: May 2006. This version: October 22, 2012. Correspondence address: Alfred Galichon, École polytechnique, Département d'économie, 91128 Palaiseau, France. Marc Henry, Université de Montréal, Département de sciences économiques, 3150, Jean-Brillant, Montréal, Québec H3C 3J7, Canada.
Introduction
In several rapidly expanding areas of economic research, the identification problem is steadily becoming more acute. In policy and program evaluation Manski, 1990 and more general contexts with censored or missing data Shaikh and Vytlacil, 2010, Magnac and Maurin, 2008 and measurement error Chen et al., 2005, ad hoc imputation rules lead to fragile inference. In demand estimation based on revealed preference Blundell et al., 2008 the data is generically insufficient for identification. In the analysis of social interactions Brock and Durlauf, 2001, Manski, 2004, complex strategies to reduce the large dimensionality of the correlation structure are needed. In the estimation of models with complex strategic interactions and multiple equilibria Bjorn and Vuong, 1985, Tamer, 2003, assumptions on equilibrium selection mechanisms may not be available or acceptable.

More generally, in all areas of investigation with structural data insufficiencies or incompletely specified economic mechanisms, the hypothesized structure fails to identify a unique possible data generating mechanism for the data that is actually observed. In such cases, many traditional estimation and testing techniques become inapplicable and a framework for inference in incomplete models is developing, with an initial focus on estimation of the set of structural parameters compatible with the true data distribution (hereafter identified set). A question of particular relevance in applied work is how to construct valid confidence regions for the identified set.
Formal methodological proposals abound since the seminal work of Chernozhukov et al., 2007, but computational efficiency is still a major concern.

In the present work, we propose a methodology that clearly distinguishes how to deal with sampling uncertainty on the one hand, and model uncertainty on the other, so that unlike previous methodological proposals, search in the parameter space is conducted only once, thereby greatly reducing the computational burden. The key to this separation is to deal with sampling variability without any reference to the hypothesized structure, using a methodology we call the dilation method. This consists in dilating each point in the space of observable variables in such a way that the empirical probability (which is known) of a dilated set dominates the true probability (which is unknown) of the original set (before dilation). The unknown true probability (i.e. the true data generating mechanism) is then removed from the analysis, and we can proceed as if the problem were purely deterministic, hence apply the methods proposed in Galichon and Henry, 2008 and Beresteanu et al., 2008.

To construct confidence regions of level $1-\alpha$ for the identified set, such a dilation $y \Rightarrow J(y)$ (where $\Rightarrow$ denotes a one-to-many map) must satisfy $\tilde{Y}^* \in J(\tilde{Y})$ a.s. for some pair of random vectors $(\tilde{Y}^*, \tilde{Y})$, with probability $1-\alpha$, where $\tilde{Y}$ is drawn from the true distribution of observable variables and $\tilde{Y}^*$ is drawn from the empirical distribution relative to the observed sample. We propose a dilation bootstrap procedure to construct $J$, in which bootstrap realizations $Y^b_j$, $j = 1, \ldots, n$, are matched one-to-one with the original sample points $Y_j$, $j = 1, \ldots, n$, so as to minimize
$$\eta^b_n = \max_{j=1,\ldots,n} \| Y^b_j - Y_{\sigma(j)} \|,$$
where the permutation $\sigma$ defines the matching.
The $1-\alpha$ quantile of the distribution of $\eta^b_n$ then defines the radius of the dilation.

When the observable $Y$ is a random variable, the dilation bootstrap relies on bootstrapping the quantile process, as proposed by Doss and Gill, 1992. However, bootstrapping the quantile process relies on order statistics and had no higher dimensional generalization to date. This is now provided by the dilation bootstrap, which removes the constraint on dimension through the appeal to optimal matching. Although the problem of finding minimum cost matchings (called the assignment or marriage problem) is very familiar to economists, as far as we know, its application within an inference procedure is unprecedented.

The rest of the paper is organized as follows. The next section describes the econometric framework and introduces the Composition Theorem and the dilation method the latter justifies. Section 2.1 discusses the application of the Composition Theorem to constructing confidence regions for partially identified parameters. Section 2.3 presents the bootstrap feasible dilation and its theoretical underpinnings. Section 3 presents simulation evidence on the performance of the dilation bootstrap in comparison with alternative methods. Section 4 explains how the method extends to higher dimensions and discrete choice and the last section concludes.
1. Dilation method and Composition Theorem
We consider the problem of inference on the structural parameters of an economic model, when the latter are (possibly) only partially identified. The economic structure is defined as in Jovanovic, 1989, which generalizes Koopmans and Reiersøl, 1950. Variables under consideration are divided into two groups. Latent variables $U$ capture unobserved heterogeneity in the model. They are typically not observed by the analyst, but some of their components may be observed by the economic actors. Observable variables $Y$ include outcome variables and other observable heterogeneity. They are observed by the analyst and the economic actors. We call observable distribution $P$ the true probability distribution generating the observable variables, and denote by $\nu$ the probability distribution that generated the latent variables $U$. The econometric structure under consideration is given by a binary relation between observable and latent variables, i.e. a subset of $\mathcal{Y} \times \mathcal{U}$, which can be written without loss of generality as a correspondence from $\mathcal{U}$ to $\mathcal{Y}$.

Assumption 1 (Econometric specification). Observable variables $Y$, with realizations $y \in \mathcal{Y} \subseteq \mathbb{R}^{d_y}$, and latent variables $U$, with realizations $u \in \mathcal{U} \subseteq \mathbb{R}^{d_u}$, are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and satisfy the relation $Y \in G(U) \subseteq \mathcal{Y}$ almost surely.

Example 1 (Revealed Preferences). This approach is particularly well suited to revealed preference analysis. Suppose $X$ is the vector of observed choices made by an agent, possibly over several periods. Let $Z$ be a vector of observable variables defining the environment in which the agent made their choices. Call $Y = (X, Z)$ the vector of all observable variables. Suppose the agent maximized a utility $u(X, Z, U|\theta)$ under constraints $g(X, Z, U|\theta) \leq 0$ (budget constraints, etc.), where $\theta$ is a vector of structural parameters (including elasticities, risk aversion, etc.) and $U$ a random vector describing unobserved heterogeneity.
Call $D(U, X|\theta)$ the demand correspondence, i.e. the set of utility maximizing choices. Then we can define $G(U|\theta)$ by $Y \in G(U|\theta)$ if and only if $X \in D(U, X|\theta)$, and $G(U|\theta)$ exhausts all the information embodied in the utility maximization model.

Example 2 (Games). Another family of examples of our framework arises with parametric games. Let $N$ players with observable characteristics $X = (X_1, \ldots, X_N)$ and unobservable characteristics $U = (U_1, \ldots, U_N)$ have strategies $Z = (Z_1, \ldots, Z_N)$ and payoffs parameterized by $X$, $U$, $Z$ and $\theta$. For a given choice of equilibrium concept in pure strategies, call $C(X, U, \theta)$ the equilibrium correspondence, i.e. the set of pure strategy equilibrium profiles. Then the empirical content of the game is characterized by $Z \in C(X, U, \theta)$, which can be equivalently rewritten $Y \in G(U; \theta)$ with $Y = (Z, X)$.

We assume a parametric structure for the unobserved heterogeneity and the model linking unobserved heterogeneity variables to observable ones.
Assumption 2 (Correspondence). The correspondence $G : \mathcal{U} \Rightarrow \mathcal{Y}$ is known by the analyst up to a finite dimensional vector of parameters $\theta \in \Theta \subseteq \mathbb{R}^{d_\theta}$. It is denoted $G(\cdot\,; \theta)$. For all $\theta \in \Theta$, $G(\cdot\,; \theta)$ is measurable (i.e. the set $\{u : G(u; \theta) \cap A \neq \emptyset\}$ is measurable for each open subset $A$ of $\mathcal{Y}$) and has non empty and closed values.

Note that the measurability and closed values assumptions are very mild conditions. The assumption that the correspondence is non-empty, however, may be restrictive. In the revealed preferences example, we require that the demand correspondence be non empty. In the games example, we require existence of equilibrium.
Assumption 3 (Latent variables). The distribution $\nu$ of the unobservable variables $U$ is assumed to belong to a parametric family $\nu(\cdot|\theta)$, $\theta \in \Theta$. The same notation is used for the parameters of $\nu$ and $G$ to highlight the fact that they may have components in common.

The pair of random vectors $(Y, U)$ involved in the model is generated by a probability distribution that we denote $\pi$. Since the vector $U$ is unobservable, the probability distribution $\pi$ is not directly identifiable from the data. However, the econometric model imposes restrictions on $\pi$. The distribution of its component $Y$ is the observable distribution $P$. The distribution of its component $U$ is the hypothesized probability distribution $\nu(\cdot|\theta)$. Finally, the joint distribution is further restricted by the fact that it gives mass 0 to the event that the relation $Y \in G(U|\theta)$ is violated. For any given value of the structural parameter vector $\theta$, a joint distribution satisfying all these restrictions may or may not exist. If it does, it is generally non unique. The identified set $\Theta_I$ is the collection of values of the structural parameter vector $\theta$ for which such a joint probability distribution does indeed exist.
• If $\Theta_I = \emptyset$, the model is rejected.
• If $\Theta_I$ is a singleton, the parameter vector $\theta$ is point identified.
• Otherwise, the parameter $\theta$ is set identified.

The set $\Theta_I$, first formalized in this way in Galichon and Henry, 2006, is sometimes called "sharp identification region" to emphasize the fact that it exhausts all the information on the parameter available in the model. No value $\theta \in \Theta_I$ could be rejected on the basis of the knowledge of the model and the observable distribution $P$ only. Take a parameter value $\theta \in \Theta$. It belongs to the identified set $\Theta_I$ if and only if there exists a joint distribution satisfying the required restrictions, in other words, if and only if there exists a "version" of $U$, i.e. a random vector $\tilde{U}$ with the same distribution as $U$, namely $\nu(\cdot|\theta)$, such that $Y \in G(\tilde{U}|\theta)$ with probability 1. Hence, denoting by $X \sim \mu$ the statement "the random vector $X$ has probability distribution $\mu$," we can characterize the identified set in the following way, which we take as our formal definition.

Definition 1 (Identified set).
$$\Theta_I = \left\{ \theta \in \Theta \mid \exists\, \tilde{Y} \sim P,\ \tilde{U} \sim \nu(\cdot|\theta) : \mathbb{P}(\tilde{Y} \notin G(\tilde{U}|\theta)) = 0 \right\}.$$
Our inference method on the identified set will be based on a general way of combining sources of uncertainty (sampling uncertainty or data incompleteness) by composition of correspondences. Suppose the probability measure $Q$ on $\mathcal{Y}$ is the known distribution of a random vector $Z$ and that it is related to the true unknown distribution $P$ of the observed variables $Y$ by the following relation:

Assumption 4 (Dilation). There exists a correspondence $J : \mathcal{Y} \Rightarrow \mathcal{Y}$ such that $\mathbb{P}(\tilde{Z} \notin J(\tilde{Y})) \leq \beta$ for some $\tilde{Z} \sim Q$, $\tilde{Y} \sim P$ and $0 \leq \beta < 1$.

Assumption 4 characterizes the additional level of indeterminacy the analyst faces. The structural model is incomplete in the sense that the relation between unobserved heterogeneity $U$ and outcomes $Y$ is a many-to-many mapping. In addition, due to observability issues or sampling uncertainty, the distribution of outcomes $P$ is unknown and the relation between true outcome $Y$ and a variable $Z$ that we can simulate is also many-to-many.

Example 3 (Measurement error). Suppose true outcome $Y$ is mismeasured as $Z = Y + \epsilon$ and nothing is known about measurement error $\epsilon$ except that it is small, i.e. $\|\epsilon\| \leq \eta$ for some $\eta > 0$, with a degree of confidence $1-\beta$. In that case, Assumption 4 holds with the correspondence $J$ defined by $J(y) = B(y, \eta)$ for all $y \in \mathcal{Y}$, where $B(y, \eta)$ is the closed ball centered at $y$ with radius $\eta$.

Example 4 (Censored outcomes). Suppose the true outcome $Y$ is reported with censoring as $Z = J(Y)$, where $J(y)$ returns the minimum of $y$ and an upper bound $B > 0$. Assumption 4 is satisfied with $\beta = 0$.

The following theorem shows how the two levels of uncertainty can be combined without loss of information.

Theorem 1 (Composition Theorem). Under assumptions 1 to 4, there exist $\tilde{Z} \sim Q$ and $\tilde{U} \sim \nu$ such that $\mathbb{P}(\tilde{Z} \notin J \circ G(\tilde{U}|\theta)) \leq \beta$.
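For interval-valued correspondences on the real line, the composition $J \circ G$ appearing in Theorem 1 is easy to write down. The following Python sketch is only an illustration under stated assumptions: the model correspondence `G(u) = [u - 1, u + 1]`, the radius `eta = 0.5` and all function names are hypothetical choices, not objects defined in the text. It composes an interval-valued `G` with the ball dilation of Example 3 and checks that a mismeasured outcome stays inside the composed set.

```python
from typing import Callable, Tuple

Interval = Tuple[float, float]


def compose_with_ball(G: Callable[[float], Interval], eta: float) -> Callable[[float], Interval]:
    """Return u -> J o G(u) when J(y) = [y - eta, y + eta] and G(u) is a closed interval.

    J o G(u) is the union of the balls J(y) over y in G(u), i.e. G(u) enlarged by eta.
    """
    def composed(u: float) -> Interval:
        lo, hi = G(u)
        return (lo - eta, hi + eta)
    return composed


def contains(interval: Interval, point: float) -> bool:
    """True when the point lies in the closed interval."""
    lo, hi = interval
    return lo <= point <= hi


# Hypothetical model correspondence, chosen only for illustration.
def G(u: float) -> Interval:
    return (u - 1.0, u + 1.0)


eta = 0.5
JG = compose_with_ball(G, eta)

u, y, eps = 0.0, 0.8, -0.4      # y lies in G(u) and |eps| <= eta
z = y + eps                     # mismeasured outcome as in Example 3
print(JG(u))                    # (-1.5, 1.5)
print(contains(G(u), y), contains(JG(u), z))
```

The design choice here is to represent each set by its endpoints, which only works for convex values on the real line; for general correspondences the composed set would have to be built as a union.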
Theorem 1 implies that when the distribution $P$ of outcomes is unknown, the infeasible identified set $\Theta_I$ can be replaced by a feasible identified set
$$\tilde{\Theta}_I = \left\{ \theta \in \Theta \mid \exists\, \tilde{Z} \sim Q,\ \tilde{U} \sim \nu(\cdot|\theta) : \mathbb{P}(\tilde{Z} \notin J \circ G(\tilde{U}|\theta)) \leq \beta \right\}.$$

Proof of Theorem 1. (The current proof of Theorem 1, suggested by Alexei Onatski, is shorter and simpler than our original proof in previous versions of the paper. We are responsible for any remaining errors.) Under Assumptions 1 and 3, there is a pair $(Y, U)$ such that $Y \sim P$ and $U \sim \nu(\cdot|\theta)$ and $Y \in G(U|\theta)$ almost surely. Equivalently, the minimum over all pairs $(\tilde{Y}, \tilde{U})$, with $\tilde{Y} \sim P$ and $\tilde{U} \sim \nu(\cdot|\theta)$, of the quantity $E(1\{\tilde{Y} \notin G(\tilde{U}|\theta)\})$ is zero. By proposition 1 of Galichon and Henry, 2009 (hereafter denoted P1), the latter is equivalent to
$$\sup \left( P(A) - \nu(\{u \in \mathcal{U} : G(u|\theta) \cap A \neq \emptyset\} | \theta) \right) = 0, \qquad (1.1)$$
where the sup is over all Borel subsets $A$ of $\mathcal{Y}$. Similarly, by Assumption 4, the minimum over all pairs $(\tilde{Z}, \tilde{Y})$, with $\tilde{Z} \sim Q$ and $\tilde{Y} \sim P$, of the quantity $E(1\{\tilde{Z} \notin J(\tilde{Y})\})$ is smaller than or equal to $\beta$. By P1, the latter is equivalent to
$$\sup \left( Q(A) - P(\{y \in \mathcal{Y} : J(y) \cap A \neq \emptyset\}) \right) \leq \beta. \qquad (1.2)$$
Denote $J^{-1}(A) = \{y \in \mathcal{Y} : J(y) \cap A \neq \emptyset\}$. By (1.1), we have $P(J^{-1}(A)) \leq \nu(\{u \in \mathcal{U} : G(u|\theta) \cap J^{-1}(A) \neq \emptyset\} | \theta)$ for all Borel subsets $A$ of $\mathcal{Y}$. Hence, (1.2) yields
$$\sup \left( Q(A) - \nu(\{u \in \mathcal{U} : G(u|\theta) \cap J^{-1}(A) \neq \emptyset\} | \theta) \right) \leq \beta,$$
and therefore
$$\sup \left( Q(A) - \nu(\{u \in \mathcal{U} : J \circ G(u|\theta) \cap A \neq \emptyset\} | \theta) \right) \leq \beta, \qquad (1.3)$$
since $G(u|\theta) \cap J^{-1}(A) \neq \emptyset$ and $J \circ G(u|\theta) \cap A \neq \emptyset$ are equivalent. Finally, by a third application of P1, (1.3) is equivalent to $\beta$ weakly dominating the minimum of the quantity $E(1\{\tilde{Z} \notin J \circ G(\tilde{U}|\theta)\})$ over all pairs $(\tilde{Z}, \tilde{U})$ with $\tilde{Z} \sim Q$ and $\tilde{U} \sim \nu(\cdot|\theta)$, and the result follows. $\Box$

To illustrate the composition theorem, consider a special case of the revealed preference example 1 combined with measurement error, as in example 3. Suppose we observe the share $Y$ of risky assets in the portfolio of investors, who are assumed to maximize the expectation of a CARA utility function $u(Y, A; U) = \exp(-U[(1-Y) + YA])$, hence they are assumed to maximize $Y E(A) - U Y^2 \mathrm{var}(A)/2$, where $E(A)$ is the perceived mean of the risky asset $A$ and $\mathrm{var}(A)$ its perceived variance. We further suppose investors differ by their risk aversion $U$, for which the analyst hypothesizes an exponential distribution ($F_U(u; \theta) = \mathbb{P}(U \leq u; \theta) = 1 - e^{-\theta u}$), and by their perception of the riskiness of the asset, and all the analyst knows is a pair of bounds $(\underline{\lambda}, \bar{\lambda})$ such that $E(A)/\mathrm{var}(A) \in [\underline{\lambda}, \bar{\lambda}]$. The investor's maximization yields $Y = E(A)/(U \mathrm{var}(A))$, so that the model can be summarized by $Y \in G(U) = [\underline{\lambda}/U, \bar{\lambda}/U]$. Values $\underline{\lambda} = 50\%$ and $\bar{\lambda} = 200\%$ can be calibrated according to Weitzman, 2007. The true distribution of income $Y$ is unknown, but the true cumulative distribution of a mismeasured version $Z = Y + \epsilon$, with $\|\epsilon\| \leq \eta$ a.s., is $F_Z(z) = \mathbb{P}(Z \leq z) = \exp(-1/z)$. By Theorem 1, the identified set $\tilde{\Theta}_I$ can be derived from the composed correspondence $J \circ G : u \Rightarrow J \circ G(u) = [\underline{\lambda}/u - \eta, \bar{\lambda}/u + \eta]$, where $J : y \Rightarrow J(y) = B(y, \eta)$ is a dilation satisfying Assumption 4. The cumulative distribution of risk aversion satisfies
$$1 - e^{-\theta u} = \mathbb{P}(U \leq u) \in \left[ \mathbb{P}(\bar{\lambda}/u + \eta \leq Z),\ \mathbb{P}(\underline{\lambda}/u - \eta \leq Z) \right] = \left[ 1 - e^{-(\bar{\lambda}/u + \eta)^{-1}},\ 1 - e^{-(\underline{\lambda}/u - \eta)^{-1}} \right].$$
Hence, for all $u > 0$, $(\bar{\lambda} + \eta u)^{-1} \leq \theta \leq (\underline{\lambda} - \eta u)^{-1}$. Therefore, the identified set can be derived as $\tilde{\Theta}_I = [1/\bar{\lambda}, 1/\underline{\lambda}]$.

2. Dilation method and sampling uncertainty
2.1. Confidence regions.
The main application of the Composition Theorem 1 that we consider here is the construction of valid confidence regions for partially identified models, based on a sample of realizations of the observable variables.
Assumption 5 (Sampling). Let $(Y_1, \ldots, Y_n)$ be a sample of independent and identically distributed random vectors with distribution $P$ and let $P_n = \frac{1}{n} \sum_{j=1}^n \delta_{Y_j}$ be the empirical distribution associated with the sample.

We propose a new method to construct a confidence region for the identified set $\Theta_I$ of definition 1.

Definition 2 (Confidence region). A valid $\alpha$-confidence region for the identified set $\Theta_I$ is a sequence of random regions $\Theta^\alpha_n$ satisfying $\liminf_n \mathbb{P}(\Theta_I \subseteq \Theta^\alpha_n) \geq 1 - \alpha$.

As noted in Imbens and Manski, 2004, this is not the only way to define confidence regions in a partially identified setting, as one might also consider coverage (pointwise or uniform) of each value within the identified set. Here we concentrate on a situation where one cannot assume that any value within the identified set can be construed as the true value, so that the whole set is the object of interest. Moreover, a confidence region for the identified set is also a uniform confidence region for each of its elements. The construction of the confidence region is based on a new nonparametric way of controlling sampling uncertainty and its validity relies on a corollary to the Composition Theorem (Theorem 1 of Section 1). We construct sample based sets $J^\alpha_n$, where $\alpha \in (0, 1)$ is the desired confidence level, to account for the discrepancy between the empirical distribution $P_n$ associated with the sample and the true observable distribution $P$. We thereby obtain an analogue of Assumption 4:

Assumption 6 (Sample dilation). With probability $1 - \alpha_n$ such that $\limsup_n \alpha_n \leq \alpha$, conditionally on the sample $(Y_1, \ldots, Y_n)$, the sequence of correspondences $J^\alpha_n$ satisfies $\tilde{Y} \in J^\alpha_n(\tilde{Y}^*)$ almost surely for some $\tilde{Y} \sim P$, $\tilde{Y}^* \sim P_n$.

Heuristically, the region $J^\alpha_n$ satisfying Assumption 6 ensures that with suitable confidence, the realizations of the empirical distribution are caught by the enlarged realizations of the true distribution $J^\alpha_n(\tilde{Y})$. Once the dilation $J^\alpha_n$ is obtained, the Composition Theorem can be applied to prove the following:

Theorem 2. Under assumptions 1, 2, 3, 5 and 6, $\Theta^\alpha_n := \{\theta \in \Theta \mid \exists\, \tilde{Y}^* \sim P_n,\ \tilde{U} \sim \nu(\cdot|\theta) : \mathbb{P}(\tilde{Y}^* \notin J^\alpha_n \circ G(\tilde{U}|\theta)) = 0\}$ is a valid $\alpha$-confidence region for the identified set $\Theta_I$.

The dilation $J^\alpha_n$ is chosen to control the confidence level: indeed, by Proposition 1 of Galichon and Henry, 2009 (called P1 in the proof of Theorem 1), the statement $\exists\, \tilde{Y}^* \sim P_n,\ \tilde{U} \sim \nu(\cdot|\theta) : \mathbb{P}(\tilde{Y}^* \notin J^\alpha_n \circ G(\tilde{U}|\theta)) = 0$ is equivalent to $P(A) \leq P_n(J^\alpha_n(A))$ for all Borel subsets $A$ of $\mathcal{Y}$. Hence, the unknown distribution $P$ of an event $A$ is dominated by the empirical distribution of the dilation $J^\alpha_n(A)$ of the event $A$. As both $P_n$ and $\nu(\cdot|\theta)$ are known, the construction of $\Theta^\alpha_n$ is feasible and efficient methods to compute it were proposed in Galichon and Henry, 2008 and Beresteanu et al., 2008.

2.2. Oracle dilation.
We now turn to the question of how to construct the dilation $J^\alpha_n$ that satisfies Assumption 6. When $Y$ is a random variable, such dilation will be obtained from uniform confidence bands for the quantile process.
Definition 3 (Quantile process). Let $F$ be the cumulative distribution of $Y$. Let $Q(t)$, $t \in [0, 1]$, be the quantile function of $Y$, defined by $Q(t) = \inf\{y \in [0, 1] : F(y) \geq t\}$. Call $Q_n$ the empirical quantile relative to the sample $(Y_1, \ldots, Y_n)$. It is defined by $Q_n(t) = Y_{(j)}$ for $j - 1 < nt \leq j$ for each $j$, with $Y_{(j)}$ denoting the $j$th order statistic. The quantile process is defined as $q_n(t) := \sqrt{n}(Q_n(t) - Q(t))$.

The idea of the construction of dilations satisfying Assumption 6 is based on the quantile transformation. Indeed, letting $Z$ be a uniform random variable on $[0, 1]$ and defining $\tilde{Y} = Q(Z)$ and $\tilde{Y}^* = Q_n(Z)$, we have a pair of random variables $\tilde{Y}$ and $\tilde{Y}^*$ with respective probability distributions $P$ and $P_n$. Suppose a uniform confidence band is available for the quantile function of the form $\mathbb{P}(\eta_n := \sup_{0 \leq t \leq 1} |q_n(t)| \leq \tilde{c}_n(\alpha)) = 1 - \alpha_n$. Then, with probability $1 - \alpha_n$, we have $|\tilde{Y}^* - \tilde{Y}| = |Q_n(Z) - Q(Z)| \leq \tilde{c}_n(\alpha)/\sqrt{n}$ almost surely. Hence, the dilation $J^\alpha_n$ defined for all $y$ by $J^\alpha_n(y) = B(y, \tilde{c}_n(\alpha)/\sqrt{n})$ satisfies Assumption 6. Moreover, the choice of dilation $J^\alpha_n(y) = B(y, \tilde{c}_n(\alpha)/\sqrt{n})$ is optimal in the sense that, under the regularity conditions of Assumption 7, $|Q_n(Z) - Q(Z)|$ achieves the minimum of $|\tilde{Y}^* - \tilde{Y}|$ when $\tilde{Y}^*$ (respectively $\tilde{Y}$) ranges over the set of random variables with distribution $P_n$ (respectively $P$). Note that smaller dilations are desirable, as they maximize informativeness of the resulting confidence region.

The following conditions guarantee the existence of such uniform confidence bands for the quantile process.

Assumption 7 (Uniform quantile bands). The sample $\{Y_1, \ldots, Y_n\}$ is an iid sample of random variables with cumulative distribution function $F$ satisfying:
(i) $F(y)$ is twice continuously differentiable on its support $(a, b)$.
(ii) $F' = f > 0$ on $(a, b)$.
(iii) For some $\gamma > 0$, $\sup_{y \in (a,b)} F(y)(1 - F(y)) |f'(y)| / f(y)^2 \leq \gamma$.
(iv) $\limsup_{y \downarrow a} f(y) < \infty$ and $\limsup_{y \uparrow b} f(y) < \infty$.
(v) $f$ is nondecreasing (resp. nonincreasing) on an interval to the right of $a$ (resp. to the left of $b$).

A distribution function $F$ satisfying Assumption 7 is called tail monotonic with index $\gamma$ by Parzen, 1979. To indicate the mildness of Assumption 7, Parzen, 1979 gives the following example where it fails: $1 - F(y) = \exp(-y - C \sin y)$ with $0.[\ldots] < C < 1$. As shown below, under Assumption 7, asymptotic results on the empirical quantile process allow us to derive a dilation $J^\alpha_n$ that satisfies Assumption 6 for all $\alpha \in (0, 1)$. Define $c(\alpha)$ implicitly by $\mathbb{P}(\sup_{0 \leq t \leq 1} |B(t)| \leq c(\alpha)) = 1 - \alpha$, where $B(t)$ is a Gaussian process called a Brownian bridge. For any $\alpha \in (0, 1)$, [...]

Proposition 1 (Oracle dilation). Under assumptions 5 and 7, the dilation $J^\alpha_n$ defined for each $y$ by
$$J^\alpha_n(y) = \left[ y - \frac{c(\alpha)}{\sqrt{n} f(y)},\ y + \frac{c(\alpha)}{\sqrt{n} f(y)} \right]$$
satisfies Assumption 6.
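The quantile coupling behind this construction can be checked numerically. The Python sketch below is a hedged illustration, not the procedure of the text: it assumes $Y$ is uniform on $[0, 1]$, so that $Q(t) = t$ and $f \equiv 1$, and the function names, seed and sample size are hypothetical choices. It evaluates $\sup_t |Q_n(t) - Q(t)|$ from the order statistics and verifies that the coupled pair $(Q_n(Z), Q(Z))$ never differs by more than that amount.

```python
import math
import random


def empirical_quantile(sorted_sample, t):
    """Q_n(t) = Y_(j) for j - 1 < n t <= j (with Q_n(0) set to Y_(1))."""
    n = len(sorted_sample)
    j = min(n, max(1, math.ceil(n * t)))
    return sorted_sample[j - 1]


def sup_quantile_deviation_uniform(sorted_sample):
    """sup over t in [0, 1] of |Q_n(t) - t| when the true quantile function is Q(t) = t.

    On ((j - 1)/n, j/n] the empirical quantile is constant at Y_(j), so the
    supremum is approached at an endpoint of one of these intervals.
    """
    n = len(sorted_sample)
    return max(
        max(abs(y - (j - 1) / n), abs(y - j / n))
        for j, y in enumerate(sorted_sample, start=1)
    )


rng = random.Random(0)          # illustrative seed
n = 200                         # illustrative sample size
sample = sorted(rng.random() for _ in range(n))
eta_n = sup_quantile_deviation_uniform(sample)

# Quantile coupling: Y~ = Q(Z) = Z and Y~* = Q_n(Z) share the same uniform Z.
worst = 0.0
for _ in range(1000):
    z = rng.random()
    worst = max(worst, abs(empirical_quantile(sample, z) - z))

print(worst <= eta_n + 1e-12)   # True: the coupled pair stays within eta_n
print(eta_n * math.sqrt(n))     # sup of |q_n(t)| for this sample
```

With $f \equiv 1$, the interval of Proposition 1 reduces to a ball of radius $c(\alpha)/\sqrt{n}$ around $y$; for other distributions the radius would vary with $f(y)$, which is what makes the oracle dilation infeasible in practice.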
Proof of Proposition 1.
Under Assumption 7, we have the following strong approximation result in Csörgő, 1983, theorem 4.1.2 page 31:
$$\sup_{0 \leq t \leq 1} |f(Q(t)) q_n(t) - B_n(t)| = O(n^{-1/2 + \epsilon}), \quad \text{a.s.}$$
for $\epsilon > 0$, where $B_n(t)$ is a sequence of Brownian bridges. Hence, the interval
$$Q_n(t) - \frac{c(\alpha)}{\sqrt{n} f(Q(t))} \leq Q(t) \leq Q_n(t) + \frac{c(\alpha)}{\sqrt{n} f(Q(t))}$$
is an asymptotically valid uniform confidence band for $Q(t)$, $0 \leq t \leq 1$, of level $1 - \alpha$. Take $Z$ a uniform random variable on $[0, 1]$ and define $\tilde{Y}^* := Q_n(Z)$ and $\tilde{Y} := Q(Z)$. By the quantile transform, $\tilde{Y}^*$ has distribution $P_n$ and $\tilde{Y}$ has distribution $P$. Therefore, with probability tending to $1 - \alpha$, there exist $\tilde{Y}^* \sim P_n$ and $\tilde{Y} \sim P$ such that $\tilde{Y} - c(\alpha)/(\sqrt{n} f(\tilde{Y})) \leq \tilde{Y}^* \leq \tilde{Y} + c(\alpha)/(\sqrt{n} f(\tilde{Y}))$ almost surely, and the result follows. $\Box$

The dilation in proposition 1 is infeasible, as it depends on the unknown $f$ and it relies on quantiles $c(\alpha)$ that are difficult to compute. We develop a feasible alternative in our dilation bootstrap procedure in section 2.3. We resort to a bootstrap matching algorithm to construct feasible versions of the dilation above.

2.3. Bootstrap dilation.
To introduce the simple idea underlying the method, consider the sample $(Y_1, \ldots, Y_n)$ and a given bootstrap realization $(Y^b_1, \ldots, Y^b_n)$ as in figure 1. As before, $(Y_{(1)}, \ldots, Y_{(n)})$ are the order statistics associated with the sample and $(Y^b_{(1)}, \ldots, Y^b_{(n)})$ are the order statistics associated with the bootstrap realization (with arbitrary ranking of the ties). In the illustrative example of figure 1, the smallest observation of the initial sample $Y_{(1)}$ was drawn once in the bootstrap sample, the second smallest was not drawn, the third smallest was drawn once, the fourth smallest twice, and the largest $Y_{(n)}$ was drawn twice. The arrows in the figure represent the bijection that matches the $j$'th order statistic of the initial sample $Y_{(j)}$ with the $j$'th order statistic of the bootstrap sample $Y^b_{(j)}$ for each $j = 1, \ldots, n$.

Figure 1. Bootstrap Quantile Matching.

To achieve a bootstrap analog of Assumption 6, we need a dilation $J^b_n$ and a permutation $\sigma$ of $\{1, \ldots, n\}$ such that $Y_{(j)} \in J^b_n(Y^b_{\sigma(j)})$ for all $j = 1, \ldots, n$. One such permutation matches the order statistics of the initial sample with the order statistics of the bootstrap sample. In this matching in the example of figure 1, $Y_{(1)}$ is matched with $Y^b_{(1)}$, namely with itself. $Y_{(2)}$ was not drawn in the bootstrap sample, so it is matched with $Y^b_{(2)}$, which is equal to $Y_{(3)}$, for whom $Y_{(2)}$ is the second closest neighbor in Euclidean distance. $Y_{(3)}$ is the nearest neighbor of its match $Y^b_{(3)} = Y_{(4)}$, $Y_{(4)}$ is matched with itself, $Y_{(n-1)}$ is the nearest neighbor of its match $Y^b_{(n-1)} = Y_{(n)}$ and finally $Y_{(n)}$ is matched with itself. The longest distance between two matches is $\eta^b_n = |Y^b_{(n-1)} - Y_{(n-1)}| = |Y_{(n)} - Y_{(n-1)}|$. Hence, if $J^b_n(y) = B(y, \eta^b_n)$, $Y^b_{(j)} \in J^b_n(Y_{(j)})$ will be satisfied for all $j = 1, \ldots, n$ in this particular bootstrap sample realization. The chosen matching in figure 1 characterizes the bootstrap quantile process (see Definition 4) and it minimizes the largest deviation $\eta^b_n$, and hence produces the smallest dilation.

Definition 4 (Bootstrap quantile process). A bootstrap sample is a sample $(Y^b_1, \ldots, Y^b_n)$ of i.i.d. variables with distribution $P_n$. The quantile function of the distribution of the bootstrap sample (bootstrap quantile) is defined for each $t \in [0, 1]$ by $Q^b_n(t) = Y^b_{(j)}$ for $j - 1 < nt \leq j$. The bootstrap quantile process is defined as $q^b_n(t) := \sqrt{n}(Q^b_n(t) - Q_n(t))$. Call $\eta^b_n$ the maximum of the bootstrap quantile process.

In the illustrative example of figure 1, the bootstrap quantile process attains its maximum over $t \in [0, 1]$ at $t$ such that $n - 2 < nt \leq n - 1$, where $\eta^b_n = Y^b_{(n-1)} - Y_{(n-1)} = Y_{(n)} - Y_{(n-1)}$. In the population of bootstrap realizations, $\eta^b_n$ has distribution with $1 - \alpha$ quantile $c^*_n(\alpha)$. The latter can be approximated by simulation with a large number $B$ of bootstrap replications. We obtain $\eta^b_n$ for each $b = 1, \ldots, B$. Call $\hat{c}^*_n(\alpha)$ the $[B\alpha]$-th largest among the $\eta^b_n$'s (where $[\,.\,]$ denotes integer part) and $\hat{J}^{\alpha,*}_n(y) = B(y, \hat{c}^*_n(\alpha))$; then by construction, a proportion $1 - \alpha$ of the bootstrap samples indexed by $b = 1, \ldots, B$ will satisfy $Y^b_{(j)} \in \hat{J}^{\alpha,*}_n(Y_{(j)})$ for all $j = 1, \ldots, n$. By Theorem 2 of Singh, 1981 (see also Theorem 5.1 of Bickel and Freedman, 1981), the bootstrap quantile process $(q^b_n(t))_{t \in [0,1]}$ has almost surely the same uniform weak limit as the empirical quantile process $(q_n(t))_{t \in [0,1]}$ and we therefore have the following result on the validity of the bootstrap dilation.

Proposition 2 (Bootstrap dilation). Let $c^*_n(\alpha)$ be the $1 - \alpha$ quantile of the supremum $\eta^b_n$ of the bootstrap quantile process $(q^b_n(t))_{t \in [0,1]}$. Under assumptions 5 and 7, the dilation defined for each $y$ by $J^{\alpha,*}_n(y) = B(y, c^*_n(\alpha)/\sqrt{n})$ satisfies Assumption 6 almost surely.

Note that in the univariate case, the simulation approximation $\hat{c}^*_n(\alpha)$ to the quantile $c^*_n(\alpha)$ is very simple to derive. The simplest algorithm requires ordering the initial sample and each of the bootstrap samples and computing the maximum of $|Y^b_{(j)} - Y_{(j)}|$ over $j = 1, \ldots, n$. However, we have introduced, with figure 1 and the discussion above, an equivalent algorithm, which runs as follows: for each $b = 1, \ldots, B$, find the permutation $\sigma$ over $\{1, \ldots, n\}$ which minimizes the quantity $\max_j |Y^b_j - Y_{\sigma(j)}|$. Unlike the algorithm based on the order statistics, such an optimal matching or optimal assignment procedure can be performed regardless of dimension, and efficient algorithms and implementations are available.

3. Simulation evidence
We assess the small sample performance of the dilation bootstrap on the following simulation design. Observable variables $Y$ have a standard normal distribution, while unobserved heterogeneity variable $U$ is assumed to follow a normal distribution with mean $\theta$ and variance 1. The cumulative distribution of $U$ is denoted $F_U$. The model correspondence $G$ is defined for each $u$ by $G(u) = [u - 1, u + 1]$, so that the model is characterized by the relation $Y \in G(U) = [U - 1, U + 1]$. Therefore, the identified set can be immediately derived as $\Theta_I = [-1, 1]$. We simulate [...],000 initial samples of size $n = 50, 100, 500$. For each sample, with empirical distribution $P_n$, we compute $\hat{c}^*_n(\alpha)$ with 5,000 bootstrap replications, and use the dilation $\hat{J}^{\alpha,*}_n(y) = B(y, \hat{c}^*_n(\alpha))$, so that a parameter value $\theta$ belongs to the $(1 - \alpha)$-confidence region $\Theta_{CR}$ for $\Theta_I$ if and only if there exist $\tilde{Y}^* \sim P_n$ and $\tilde{U} \sim F_U(\,.\,; \theta)$ such that $\mathbb{P}^*(\tilde{Y}^* \in \hat{J}^{\alpha,*}_n \circ G(\tilde{U})) = 1$. Since $P_n$ and $F_U$ are known, the latter condition can be checked efficiently with the core determining class method of Galichon and Henry, 2008, section 2.3. We report Monte Carlo coverage probabilities in case of significance level $\alpha = 0.01$, $0.05$ and $0.10$.

Table 1. Rejection levels from the dilation bootstrap procedure.

Sample Size          50        100       500
$\alpha = 0.01$    0.0122    0.0118    0.0108
$\alpha = 0.05$    0.0324    0.0364    0.0438
$\alpha = 0.10$    0.0590    0.0648    0.0754
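The univariate computation of $\hat{c}^*_n(\alpha)$ used in this design can be sketched in a few lines. The Python code below is a minimal illustration under assumptions of ours: the function names, the seeds, the reduced number of bootstrap replications and the single simulated standard normal sample are illustrative choices, and the brute-force search over permutations is included only to check, on a tiny sample, that matching order statistics attains the minimum of $\max_j |Y^b_j - Y_{\sigma(j)}|$.

```python
import random
from itertools import permutations


def max_sorted_deviation(sample, boot):
    """eta_n^b = max_j |Y^b_(j) - Y_(j)|: match order statistics, take the largest gap."""
    return max(abs(b - y) for b, y in zip(sorted(boot), sorted(sample)))


def bottleneck_brute_force(sample, boot):
    """Minimum over all permutations sigma of max_j |Y^b_j - Y_sigma(j)| (tiny n only)."""
    return min(
        max(abs(b - sample[s]) for b, s in zip(boot, sigma))
        for sigma in permutations(range(len(sample)))
    )


def bootstrap_radius(sample, alpha, n_boot, rng):
    """[B * alpha]-th largest eta_n^b over n_boot bootstrap samples drawn from P_n."""
    n = len(sample)
    etas = sorted(
        (max_sorted_deviation(sample, [rng.choice(sample) for _ in range(n)])
         for _ in range(n_boot)),
        reverse=True,
    )
    k = max(1, int(n_boot * alpha))   # integer part of B * alpha, at least 1
    return etas[k - 1]


rng = random.Random(12345)            # illustrative seed

# Sorting and exhaustive optimal matching agree on a small example.
small = [rng.gauss(0.0, 1.0) for _ in range(6)]
small_boot = [rng.choice(small) for _ in range(6)]
gap = abs(max_sorted_deviation(small, small_boot) - bottleneck_brute_force(small, small_boot))
print(gap < 1e-12)

# Dilation radius for one simulated sample of size n = 50.
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
for alpha in (0.01, 0.05, 0.10):
    print(alpha, bootstrap_radius(sample, alpha, n_boot=1000, rng=random.Random(0)))
```

Sorting costs $O(n \log n)$ per bootstrap sample, whereas the exhaustive search is factorial in $n$ and is shown here only as a check; in higher dimensions one would replace it with a dedicated assignment solver.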
The most notable feature is the tendency to under reject in small samples, especially for true size $\alpha = 0.10$ but also for true size $\alpha = 0.05$. For true size $\alpha = 0.01$ on the other hand, the procedure displays slight over rejection in small samples. For comparison purposes, we also report coverage probabilities from the generic subsampling procedure for set coverage in Chernozhukov et al., 2007 based on the criterion function $\sqrt{n} \max(\max_{j=1,\ldots,n} [F_n(Y_j) + F_U(Y_j + 1)], \max_{j=1,\ldots,n} [-F_n(Y_j) + F_U(Y_j - 1)], \ldots)$. Results for subsample sizes [...], [...], 48 when $n = 50$; 85, [...], 95 for $n = 100$; and 425, [...], 475 for $n = 500$ are reported in Table 2. We find the procedure over rejects in all but one case, and there is moderate dependence on the choice of subsample size.

4. Extensions
The dilation method and dilation bootstrap have natural extensions to the case where observable variables $Y$ are multivariate and to the case where $Y$ is discrete. We consider both extensions in the following subsections.

4.1. Multivariate extension. Consider first the case where the random vector of observable variables $Y$ has dimension $d \geq 2$. This extension allows the consideration of multiple-equation models. Moreover, it is particularly relevant in this partially
Table 2. Rejection levels from the infeasible CHT procedure.

Sample    Subsample    $\alpha = 0.01$    $\alpha = 0.05$    $\alpha = 0.10$

Example 5 (Single equation model with endogeneity). Suppose the econometric model under consideration is $Z = f(X, U; \theta)$, where $Z$ and $X$ are observed random variables, $U$ is unobserved heterogeneity and $f$ is a function parameterized by $\theta$. Suppose no assumption is made on the dependence between $X$ and $U$. Define $Y = (X, Z)'$. Define the correspondence $G$ for each $u$ by $(x, z) \in G(u; \theta)$ if and only if $z = f(x, u; \theta)$. Then the model can be rewritten $Y \in G(U; \theta)$ as in Assumption 1.

In case $Y$ is multivariate, although Theorem 2 holds irrespective of dimension, the construction of a dilation satisfying Assumption 6 can no longer rely on the traditional quantile process as in Propositions 1 and 2. However, the quantity $\eta_n = \inf\{\|\tilde Y^* - \tilde Y\|_\infty : \tilde Y^* \sim P_n,\ \tilde Y \sim P\}$ is still well defined. When attained, it is achieved by a pair of random vectors $(\tilde Y^*, \tilde Y)$ with marginal distributions $P_n$ and $P$, which minimizes the largest deviation. Equivalently, there exist $\tilde Y^* \sim P_n$ and $\tilde Y \sim P$ such that $\tilde Y^*$ belongs to a closed ball $B(\tilde Y, \eta_n)$ centered on $\tilde Y$ and with radius $\eta_n$, i.e. such that $E[1\{\tilde Y^* \notin B(\tilde Y, \eta_n)\}] = 0$.

When $Y$ is uniformly distributed on the unit cube $[0, 1]^d$, the quantity $\eta_n$ is well studied in the probability literature. Hence, using asymptotic results on the quantity $\eta_n$ in the literature, specifically Leighton and Shor, 1989 for the case $d = 2$ and Shor and Yukich, 1991 for the case $d \geq 3$, we can derive analytical formulae for the dilation $J_n$:

Proposition 3 (Minimax matchings). There exist a constant $c > 0$ and a function $c_d > 0$ of the dimension $d$ of $Y$ such that $J_n(y) = B(y, c\,(\ln n)^{3/4}/\sqrt{n})$ satisfies Assumption 6 with $\alpha_n = n^{-c\sqrt{\ln n}}$ when $d = 2$, and $J_n(y) = B(y, c_d\,(\ln n / n)^{1/d})$ satisfies Assumption 6 for any $\alpha \in [0, 1]$ when $d \geq 3$.

However, the results of Proposition 3 only pertain to the uniform case and produce conservative confidence regions. More generally, we propose constructing suitable dilations based on the distribution of $\eta_n$.

Definition 5 (Minimax matching). Call $c_n(\alpha)$ the $1 - \alpha$ quantile of the distribution of $\eta_n = \inf\{\|\tilde Y^* - \tilde Y\|_\infty : \tilde Y^* \sim P_n,\ \tilde Y \sim P\}$.
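For two samples of equal size, the infimum in the definition of $\eta_n$ reduces to a minimum over permutations of the largest pairwise distance, which is a bottleneck assignment problem. The Python sketch below shows one possible way to compute it in any dimension; it is not the authors' implementation. The function names are hypothetical, the sup norm on $\mathbb{R}^d$ is an assumed choice of norm, and the threshold search combined with Kuhn's augmenting-path matching is a simple stand-in for faster specialized algorithms, suitable for small samples only.

```python
def sup_norm_distance(p, q):
    """Sup-norm distance between two points given as equal-length tuples."""
    return max(abs(a - b) for a, b in zip(p, q))


def has_perfect_matching(allowed):
    """Kuhn's augmenting-path test; allowed[i] lists the columns row i may use."""
    n = len(allowed)
    match_of_col = [-1] * n

    def try_assign(row, seen):
        for col in allowed[row]:
            if not seen[col]:
                seen[col] = True
                if match_of_col[col] == -1 or try_assign(match_of_col[col], seen):
                    match_of_col[col] = row
                    return True
        return False

    return all(try_assign(row, [False] * n) for row in range(n))


def bottleneck_matching_distance(points_a, points_b):
    """Minimum over pairings of the largest sup-norm distance within a pair."""
    n = len(points_a)
    if n == 0 or n != len(points_b):
        raise ValueError("samples must be non-empty and of equal size")
    dist = [[sup_norm_distance(p, q) for q in points_b] for p in points_a]
    # The optimal value is one of the pairwise distances: binary-search them.
    thresholds = sorted({d for row in dist for d in row})
    lo, hi = 0, len(thresholds) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        t = thresholds[mid]
        allowed = [[j for j in range(n) if dist[i][j] <= t] for i in range(n)]
        if has_perfect_matching(allowed):
            hi = mid      # a pairing within distance t exists
        else:
            lo = mid + 1  # t is too small
    return thresholds[lo]


# Hypothetical usage with two small bivariate samples.
a = [(0.0, 0.0), (1.0, 1.0)]
b = [(1.0, 1.2), (0.1, 0.0)]
eta = bottleneck_matching_distance(a, b)
```

The search is valid because feasibility is monotone in the threshold: if all points can be paired within distance $t$, they can also be paired within any larger distance.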
By construction, we then see that the ball $B(y, c_n(\alpha))$ is a suitable dilation, in the sense that it satisfies Assumption 6.

Proposition 4 (Multivariate oracle dilation). The dilation $J^\alpha_n$ defined for each $y$ by $J^\alpha_n(y) = B(y, c_n(\alpha))$ satisfies Assumption 6.

As for the approximation of $c_n(\alpha)$ to obtain a feasible dilation, once again, although the quantile process is no longer defined, the matching algorithm described in Section 2.3 is easily generalizable and delivers a bootstrap dilation approximation of $J^\alpha_n$. The general procedure is described as follows.

Bootstrap Algorithm:
• Consider bootstrap samples $(Y^b_1, \ldots, Y^b_n)$, $b = 1, \ldots, B$, drawn from $P_n$ and call $P^b_n$ the empirical distribution of sample $b$.
• For each bootstrap replication $b$, define $\eta^b_n = \min_\sigma \max_{j \in \{1, \ldots, n\}} \|Y^b_j - Y_{\sigma(j)}\|$, where $\sigma$ ranges over all permutations of $\{1, \ldots, n\}$.
• Let $\hat c^*_n(\alpha)$ be the $[B\alpha]$-th largest among the $\eta^b_n$, $b = 1, \ldots, B$, and for each $y$, set $\hat J^{\alpha,*}_n(y) = B(y, \hat c^*_n(\alpha))$.

The problem of finding the permutation that achieves $\eta^b_n$ is called bottleneck bipartite matching in the combinatorial optimization and operations research literature.

4.2. Case of discrete choice.
We now turn to the case of aggregate data from discrete choice. To fix ideas, consider a voting model, where $K$ parties are represented in $n$ electoral districts and observations $\hat p_{i,k}$, $i = 1, \ldots, n$ and $k = 1, \ldots, K$, are reported shares of votes for party $k$ in district $i$. Voter $l$ chooses the party that maximizes their utility $u_{i,k}(\theta) + \rho_{i,k} + \epsilon^l_{i,k}$, where $u_{i,k}(\theta)$ is a deterministic function of (observed covariates and) the unknown parameter $\theta$, $\rho_{i,k}$ are random district-party effects (independent of voters) and the $\epsilon^l_{i,k}$'s are i.i.d. type I extreme value random utilities. True vote shares for party $k$ in district $i$ satisfy $\ln p^*_{i,k}(\rho_{i,k}) = u_{i,k}(\theta) + \rho_{i,k} - \ln \sum_{k'} \exp(u_{i,k'}(\theta) + \rho_{i,k'})$. True shares $p^*_{i,k}$ are unobserved, however, due to the possibility of electoral fraud. Reported shares $p_{i,k}$ are assumed to satisfy $p_{i,k} \geq p^*_{i,k}$ when a representative of party $k$ is present during the vote count in district $i$. In districts where no party representative is present, the situation is equivalent to missing data on vote shares. Let $X_{i,k}$ be equal to 1 if a representative of party $k$ is present in district $i$ during vote count, and zero otherwise. We assume $X = (X_{i,k})_{i=1,\ldots,n;\,k=1,\ldots,K}$ is exogenous. The correspondence characterizing the model is

$$G\big((\rho_{i,k})_{k=1}^K \mid X; \theta\big) = \Big\{ (p_{i,k})_{k=1}^K : \sum_{k=1}^K p_{i,k} = 1;\ p_{i,k} \geq p^*_{i,k}(\rho_{i,k})\, X_{i,k}, \text{ each } k \Big\}.$$

District $i$ has $n_i$ voters. Call $\hat p_{i,k}$ the proportion of votes in district $i$ reported as going to party $k$ and write $\hat p_i = (\hat p_{i,k})_{k=1,\ldots,K}$. By the central limit theorem, $\sqrt{n_i}(\hat p_i - p_i)$ has Gaussian limiting distribution with zero mean and covariance matrix $V_i$, with diagonal elements $p_{i,k}(1 - p_{i,k})$ and off-diagonal elements $-p_{i,k}\, p_{i,k'}$.
Call $Z_i$ a random vector with distribution $N(0, V_i/n_i)$ and let $\eta_i$ be such that $P(Z_i \notin B(0, \eta_i)) = \alpha_i$, where $B(0, \eta_i)$ is the open ball centered at zero with radius $\eta_i$. Define the dilation $J^{\alpha_i}_{n_i}$ for each $p$ by $J^{\alpha_i}_{n_i}(p) = B(p, \eta_i)$. Then $J^\alpha_n(p) = \bigcup_i J^{\alpha_i}_{n_i}(p) = B(p, \max_i \eta_i)$ satisfies Assumption 6 for $\alpha = \limsup_n \prod_{i=1}^n \alpha_i$.

In the two-party case, call $\hat p_i$ the reported share of votes for party 1, $p_i$ the true or population reported share and $p^*_i$ the true share (absent reporting fraud). The true share satisfies $\ln p^*_i(\rho_i) = u_{i,1}(\theta) + \rho_{i,1} - \ln\big(\exp(u_{i,1}(\theta) + \rho_{i,1}) + \exp(u_{i,2}(\theta) + \rho_{i,2})\big)$. Because of fraud issues, all we know about the relation between $p_i$ and $p^*_i$ is the following:

$p_i \geq p^*_i$ if party 1 places an observer in district $i$.
$p_i \leq p^*_i$ if party 2 places an observer in district $i$.

Note that reported vote shares are equal to true vote shares in case both parties have observers present for vote count. Letting $X_{i,k}$ take value 1 if party $k$ places an observer in district $i$ and zero otherwise, the correspondence characterizing the model is

$$G(\rho_i \mid X, \theta) = \{ p_i : p_i \geq p^*_i(\rho_i)\, X_{i,1} \text{ and } (1 - p_i) \geq (1 - p^*_i(\rho_i))\, X_{i,2} \}$$
$$= \left[ \frac{X_{i,1} \exp(u_{i,1}(\theta) + \rho_{i,1})}{\exp(u_{i,1}(\theta) + \rho_{i,1}) + \exp(u_{i,2}(\theta) + \rho_{i,2})},\ 1 - \frac{X_{i,2} \exp(u_{i,2}(\theta) + \rho_{i,2})}{\exp(u_{i,1}(\theta) + \rho_{i,1}) + \exp(u_{i,2}(\theta) + \rho_{i,2})} \right].$$

By the central limit theorem, $\sqrt{n_i}(\hat p_i - p_i)$ has Gaussian limiting distribution with zero mean and variance $p_i(1 - p_i)$. Call $c_{\alpha_i/2}$ the quantile of level $1 - \alpha_i/2$ of the standard normal distribution, and let $\eta = \max_i \eta_i$ with $\eta_i = c_{\alpha_i/2} \sqrt{p_i(1 - p_i)/n_i}$. Then the dilation defined for each $p$ by $J^\alpha_n(p) = [p - \eta, p + \eta]$ satisfies Assumption 6 with $\alpha = \limsup_n \prod_i \alpha_i$. The composition of the dilation $J^\alpha_n$ and the correspondence $G$ yields

$$J^\alpha_n \circ G(\rho \mid X; \theta) = [X_1\, p^*(\rho) - \eta,\ 1 - X_2 (1 - p^*(\rho)) + \eta].$$
The region $\tilde\Theta_I$ containing all $\theta$ such that $\hat p \in J^\alpha_n \circ G(\rho \mid X, \theta)$ a.s. is therefore a valid confidence region for the identified set and can be computed efficiently using methods proposed in Galichon and Henry, 2008.
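To make the two-party dilation concrete, the following Python sketch computes the radius $\eta = \max_i \eta_i$ from district-level inputs and dilates a share into an interval. It is only an illustration under stated assumptions: the function names are hypothetical, the shares passed in stand for the population reported shares $p_i$ (in practice one might plug in the reported shares $\hat p_i$, which is an assumption not made in the text), and the levels $\alpha_i$ are taken as given.

```python
from math import sqrt
from statistics import NormalDist


def two_party_dilation_radius(shares, district_sizes, alphas):
    """Return eta = max_i c_{alpha_i/2} * sqrt(p_i (1 - p_i) / n_i)."""
    standard_normal = NormalDist()
    radii = []
    for p, n_i, alpha_i in zip(shares, district_sizes, alphas):
        # Standard normal quantile of level 1 - alpha_i / 2.
        c = standard_normal.inv_cdf(1 - alpha_i / 2)
        radii.append(c * sqrt(p * (1 - p) / n_i))
    return max(radii)


def dilate(p, eta):
    """Interval [p - eta, p + eta] obtained by dilating the share p."""
    return (p - eta, p + eta)


# Hypothetical usage: two districts with different numbers of voters.
eta = two_party_dilation_radius(
    shares=[0.5, 0.4], district_sizes=[100, 400], alphas=[0.05, 0.05]
)
lower, upper = dilate(0.5, eta)
```

Because the radius is the maximum over districts, the smallest districts and the shares closest to one half drive the width of the dilation in this sketch.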
Conclusion
We have proposed a method to combine several sources of uncertainty, such as missing or corrupted data and structural incompleteness in the model, through a composition of correspondences. We show that our composition theorem applies in particular to the construction of confidence regions in partially identified models of general form. In that case, the composition theorem is applied to the composition of the correspondence that defines the econometric structure and a dilation of the sample space that controls the significance level and allows one to replace the unknown distribution of observable data by the empirical distribution of the sample in the characterization of compatibility between model and data. An important computational advantage of this method over previously proposed confidence regions for partially identified parameters is that the dilation is performed independently of the structural parameter, and hence needs to be performed only once. The remaining search over the parameter space is purely deterministic. The dilation is obtained through a minimax matching procedure. It is equivalent to a uniform confidence band for the quantile process when the dimension of the endogenous variable is one; however, it has no parallel in higher dimensions. The method is shown to perform well in simulation experiments.
Acknowledgements
We thank Christian Bontemps, Gary Chamberlain, Victor Chernozhukov, Pierre-André Chiappori, Ivar Ekeland, Rustam Ibragimov, Guido Imbens, Thierry Magnac, Francesca Molinari, Alexei Onatski, Geert Ridder, Bernard Salanié, participants at the “Semiparametric and Nonparametric Methods in Econometrics” conference in Oberwolfach and seminar participants at BU, Brown, CalTech, Chicago, École polytechnique, Harvard, MIT Sloan, Northwestern, Toulouse, UCLA, UCSD and Yale for helpful comments (with the usual disclaimer). Both authors gratefully acknowledge financial support from NSF grant SES 0532398 and from Chaire AXA “Assurance des Risques Majeurs” and Chaire Société Générale “Risques Financiers”. Galichon’s research is partly supported by Chaire EDF-Calyon “Finance and Développement Durable” and FiME, Laboratoire de Finance des Marchés de l’Energie. Henry’s research is also partly supported by SSHRC Grant 410-2010-242.
References
Beresteanu, A., Molchanov, I., & Molinari, F. (2008). Sharp identification regions in models with convex predictions: Games, individual choice, and incomplete data [cemmap working paper CWP27/09].
Bickel, P., & Freedman, D. (1981). Some asymptotic theory for the bootstrap. Annals of Statistics, 1196–1217.
Bjorn, P., & Vuong, Q. (1985). Simultaneous equations models for dummy endogenous variables [Caltech Working Paper 537].
Blundell, R., Browning, M., & Crawford, I. (2008). Best nonparametric bounds on demand responses. Econometrica, 1227–1262.
Brock, W., & Durlauf, S. (2001). Discrete choice with social interactions. Review of Economic Studies, 235–265.
Chen, X., Hong, H., & Tamer, E. (2005). Measurement error models with auxiliary data. Review of Economic Studies, 343–366.
Chernozhukov, V., Hong, H., & Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 1243–1285.
Csörgő, M. (1983). Quantile processes with statistical applications. Regional Conference Series in Applied Mathematics.
Doss, H., & Gill, R. (1992). An elementary approach to weak convergence for quantile processes, with applications to censored survival data. Journal of the American Statistical Association, 869–877.
Galichon, A., & Henry, M. (2006). Inference in incomplete models [available from SSRN at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=886907].
Galichon, A., & Henry, M. (2008). Set identification in models with multiple equilibria [forthcoming in the Review of Economic Studies].
Galichon, A., & Henry, M. (2009). A test of non-identifying restrictions and confidence regions for partially identified parameters. Journal of Econometrics, 186–196.
Imbens, G., & Manski, C. (2004). Confidence intervals for partially identified parameters. Econometrica, 1845–1859.
Jovanovic, B. (1989). Observable implications of models with multiple equilibria. Econometrica, 1431–1437.
Koopmans, T., & Reiersøl, O. (1950). The identification of structural characteristics. Annals of Mathematical Statistics, 165–181.
Leighton, T., & Shor, P. (1989). Tight bounds for minimax grid matching with applications to the average case analysis of algorithms. Combinatorica, 161–187.
Magnac, T., & Maurin, E. (2008). Partial identification in monotone binary models: Discrete regressors and interval data. Review of Economic Studies, 835–864.
Manski, C. (1990). Nonparametric bounds on treatment effects. American Economic Review, 319–323.
Manski, C. (2004). Social learning from private experiences: The dynamics of the selection problem. Review of Economic Studies, 443–458.
Parzen, E. (1979). Non parametric statistical data modeling. Journal of the American Statistical Association, 105–131.
Shaikh, A., & Vytlacil, E. (2010). Partial identification in triangular systems of equations with binary dependent variables [forthcoming in Econometrica].
Shor, P., & Yukich, J. (1991). Minimax grid matching and empirical measures. Annals of Probability, 1338–1348.
Singh, K. (1981). On the asymptotic accuracy of Efron’s bootstrap. Annals of Statistics, 1187–1195.
Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria. Review of Economic Studies, 147–165.
Weitzman, M. (2007). Subjective expectations and asset return puzzles. American Economic Review, 97