Identification of multi-valued treatment effects with unobserved heterogeneity
Koki Fusejima∗
Graduate School of Economics, University of Tokyo
This version: October 12, 2020
Abstract
In this paper, we establish sufficient conditions for identifying the treatment effects on continuous outcomes in endogenous and multi-valued discrete treatment settings with unobserved heterogeneity. We employ the monotonicity assumption for multi-valued discrete treatments and instruments, and our identification condition is easy to interpret economically. Our result contrasts with related work by Chernozhukov and Hansen (2005) in this respect. We also establish identification of the local treatment effects in multi-valued treatment settings and derive closed-form expressions for the identified treatment effects. We give examples to verify the usefulness of our result.
Keywords:
Treatment effects, unobserved heterogeneity, identification, endogeneity, instrumental variables, monotonicity

∗Email: [email protected]. I would like to thank my advisor Katsumi Shimotsu, Hidehiko Ichimura, Yuichi Kitamura, Hiroaki Kaido, Takuya Ishihara, Ryo Imai, Ryota Yuasa, and the seminar participants at University of Tokyo, Otaru University of Commerce, and Hitotsubashi University for their helpful comments on this research. This research is supported by Grant-in-Aid for JSPS Research Fellow (20J20046) from the JSPS. All the errors are mine.
1 Introduction
Unobserved heterogeneity in treatment effects is important in many empirical studies in economics. As discussed in Heckman (2001), for example, economic theory and applications strongly suggest that causal effects of treatments or policy variables vary across individuals and subpopulations with the same observable characteristics. In the presence of such heterogeneity, different treatment effects can be defined for different subpopulations with the same unobservable characteristics. Quantile treatment effects are able to characterize heterogeneous impacts of treatments on different levels of unobserved components in terms of potential outcome quantiles. Under instrumental variable (IV) methods, the local treatment effect, first introduced by Imbens and Angrist (1994), is the treatment effect conditional on the unobservable subpopulation whose treatment states are affected by the instrument.

In this paper, we establish sufficient conditions for identifying the treatment effects on continuous outcomes in endogenous and multi-valued discrete treatment settings with unobserved heterogeneity. As is the case for any parameter, identification is a prerequisite for consistent estimation. IV methods provide a powerful tool to identify causal effects under treatment endogeneity. Instruments are discrete in many empirical applications, and we use only discrete instruments for identification.

For discrete treatments, treatments are implicitly or explicitly multi-valued in many applications. For example, households may receive different levels of transfers in anti-poverty programs, and participants in a training program may receive different hours of training.
It is important for the policy maker, who needs to decide which treatment level is appropriate, to compare the multi-valued treatment effects.

For the multi-valued endogenous treatment case, Chernozhukov and Hansen (2005) establish identification of the quantile treatment effects (on the observed populations) with a discrete instrument. However, there are two difficulties in directly applying their identification results in practice. First, their identification results require some numerical conditions on the conditional densities of the outcome variable, which are testable in principle but difficult to check in practice. Second, estimation based on the induced moment restriction is complicated by the non-smoothness and non-convexity of the corresponding generalized method of moments (GMM) objective function.

For the first point, if the treatment is binary, Vuong and Xu (2017) and Wüthrich (2019) show identification under an economically interpretable condition called "monotonicity." This condition is introduced by Imbens and Angrist (1994), and the potential outcome distributions are identified as closed-form expressions conditional on a subpopulation called "compliers" under this condition. Vuong and Xu (2017) and Wüthrich (2019) exploit limited variations of the instrument by matching these two conditional distributions and identify the full-population distributions of the potential outcomes as closed-form expressions. This idea of matching two distributions is introduced by Athey and Imbens (2006) in nonlinear difference-in-differences models.

For the second point, Wüthrich (2019) and Feng et al. (2020) develop plug-in estimation approaches based on these closed-form expressions when the treatment is binary. This estimation strategy naturally bypasses the challenges associated with optimizing the GMM objective function and remains computationally tractable.
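To illustrate the plug-in idea in the binary case, the complier c.d.f. formula implied by monotonicity (the binary-treatment special case of the closed-form expressions derived in Section 4) can be estimated by replacing population moments with sample means. The simulated design below (uniform selection heterogeneity, hypothetical outcome equations) is only an illustrative sketch, not the estimators of Wüthrich (2019) or Feng et al. (2020):

```python
import numpy as np

def complier_cdf1(y_grid, Y, D, Z):
    """Plug-in estimate of F_{Y_1 | compliers}(y) for binary D and binary Z:
    [E(1{Y<=y} D | Z=1) - E(1{Y<=y} D | Z=0)] / [E(D | Z=1) - E(D | Z=0)]."""
    z1, z0 = (Z == 1), (Z == 0)
    den = D[z1].mean() - D[z0].mean()                 # P(compliers)
    num = np.array([(D[z1] * (Y[z1] <= y)).mean()
                    - (D[z0] * (Y[z0] <= y)).mean() for y in y_grid])
    return num / den

rng = np.random.default_rng(0)
n = 200_000
V = rng.uniform(size=n)                  # unobserved selection heterogeneity
Z = rng.integers(0, 2, size=n)           # randomized binary instrument
D = (V <= 0.3 + 0.4 * Z).astype(int)     # compliers: V in (0.3, 0.7]
Y1, Y0 = 1.0 + V, V                      # hypothetical potential outcomes
Y = np.where(D == 1, Y1, Y0)             # observed outcome

F_hat = complier_cdf1(np.array([1.5]), Y, D, Z)
# Among compliers, Y_1 = 1 + V with V uniform on (0.3, 0.7], so F(1.5) = 0.5.
```

Each ingredient of the formula is a sample mean, so no numerical optimization is involved.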
For estimation based on the GMM objective function, reliable and practically useful methods have been developed, especially for parametric structural quantile models. See Chernozhukov and Hansen (2006), Chen and Lee (2018), Zhu (2018), and Kaido and Wüthrich (2018) for linear-in-parameters quantile models, and see Chernozhukov and Hong (2003) and de Castro et al. (2019) for nonlinear quantile models. Nonparametric estimation approaches are studied by Chernozhukov et al. (2007), Horowitz and Lee (2007), Chen and Pouzo (2009, 2012), and Gagliardini and Scaillet (2012).

The main contribution of this paper is as follows. We establish economically interpretable sufficient conditions for identifying the potential outcome distributions in multi-valued treatment settings, and we derive the closed-form expressions of the potential outcome distributions. We show that, in principle, identification is achieved without assuming numerical conditions on the distribution of the outcome variable, which may be useful when designing a social experiment where the outcome data will be collected later. We also establish identification of the local treatment effects in multi-valued treatment settings under our assumptions. This idea resembles that of Das (2005), who develops an estimation strategy based on the closed-form expression of the regression function with discrete endogenous treatments in the nonparametric regression model with an additive error term.

To this end, we generalize the monotonicity assumption to multi-valued treatments. Our assumption is close to the "unordered monotonicity" assumption introduced by Heckman and Pinto (2018), but we employ a weaker assumption that holds only on a particular subset of the pairs of values the instrument can take. The monotonicity assumption is originally assumed on all the pairs of instrument values, and this requirement for the pairs of values is also adopted in many studies.
However, as we see in our examples, this requirement may be too strong when the instrument is multi-valued. Recently, Mogstad et al. (2019) and Mountjoy (2019) discuss a similar problem when there are multiple instruments, for the binary and multi-valued unordered treatment cases respectively. They introduce weaker monotonicity assumptions that hold only on pairs of each element, holding all other elements fixed.

In multi-valued treatment settings, generally no two treatment states can be compared on the same subpopulation under a single monotonicity relationship because the selection mechanism becomes more complicated than in the binary treatment case. We overcome this difficulty by developing systems of monotonicity relationships that can be solved simultaneously for multiple comparisons of the treatment states.

There is a rich literature on identification of treatment effects with unobserved heterogeneity using IV methods. We review some papers that provide results directly relevant to this paper for identification of multi-valued treatment effects. For the treatment effects on the full population (or some observed subpopulations), Feng (2019) and Caetano and Escanciano (2020) establish identification when the instrument has smaller support than the treatment, using the observed covariates. Feng (2019) assumes the existence of exogenous covariates, where the covariates and the instrument are jointly independent of the unobservables in both the outcome and selection equations. Caetano and Escanciano (2020) assume the existence of continuous covariates in which the outcome equation has a separable structure.

Identification of local treatment effects in multi-valued treatment settings also has the difficulty arising from the complex selection mechanism.
Under the monotonicity assumption generalized in each study, Angrist and Imbens (1995) identify a weighted average of the local average treatment effects that compare treatment states t and t − 1, and Heckman et al. (2006) identify the treatment effects that compare treatment state t and the set of other states with continuous instruments. Heckman et al. (2006) also establish identification on a wider variety of subpopulations that are identified with continuous instruments. For a more general class of selection models, Lee and Salanié (2018) show a similar identification result for the average treatment effects that compare any two different treatment states t and t′.

In this paper, we identify the local treatment effect that compares any two different treatment states t and t′ with a discrete instrument using the systems of monotonicity relationships we develop. We establish identification when the outcome variable is continuously distributed under our monotonicity assumption with some additional assumptions on unobservable factors.

The remainder of the paper is organized as follows. In Section 2, we introduce notation, basic assumptions, and a map called the "counterfactual mapping." The counterfactual mapping is developed by Vuong and Xu (2017), and this map is an important tool for identification. In Section 3, we introduce the monotonicity assumptions and give an example where economic theory can justify our monotonicity assumption. In Section 4, we establish identification of the treatment effects using our monotonicity assumption. In Section 5, we apply our identification result to a real-world social experiment called "Moving to Opportunity." Section 6 concludes. Proofs of the main results and some auxiliary results are collected in Appendices A and B respectively. Some additional discussions are collected in Appendices C-H in the Supplemental Appendix.

Throughout this paper, we use the notations F_A and Q_A for the unconditional cumulative distribution function (c.d.f.) and quantile function (q.f.) of a scalar-valued random variable A, respectively.
Similarly, for a set D and random vectors B and C, F_{A|D,BC}(·|b, c) and Q_{A|D,BC}(·|b, c) denote the conditional c.d.f. and q.f. of A on D ∩ {(B, C) = (b, c)}, respectively. Let D° denote the interior of D.

Let (Ω, F, P) be a common probability space. Assume a finite collection of multiple treatment statuses (categorical or ordinal) indexed by t ∈ T where, without loss of generality, T = {0, 1, 2, . . . , k} with k ∈ N. Let Y_0, Y_1, . . . , Y_k with Y_t ∈ Y ⊂ R and E[|Y_t|] < ∞ denote the potential outcomes under each treatment level. The random variable T indicates which of the k + 1 potential outcomes is observed. Define D_t := 1{T = t}, where 1{A} is the indicator function of a set A. D_t is an indicator function of each treatment level. Then, the observed outcome can be represented as Y = Σ_{t=0}^k Y_t D_t. Assume that we also observe a random vector X ∈ X ⊂ R^r of covariates and a discrete random variable Z ∈ Z for the instrument, where Z contains at least k + 1 values. For x ∈ X, define U_t := F_{Y_t|X}(Y_t|x) conditional on X = x. U_t is called the "rank variable." The rank variable characterizes heterogeneity of outcomes for individuals with the same observed characteristics by the relative ranking in terms of potential outcomes. For notational simplicity, we assume that the supports of the conditional distributions of T, Y_t, Y, and Z given X = x are equal to T, Y, Y, and Z respectively. To simplify the proofs in the main paper, we also assume that the closure of Y° is equal to Y, and that F_{Y_t|X}(Y°|x) does not depend on t ∈ T. The results in this paper do not rely on these restrictions, and we relax them in Appendix D.

In this paper, for two different treatment levels t and t′, we are interested in the average treatment effect (ATE): E[Y_t] − E[Y_{t′}], and the quantile treatment effect (QTE): Q_{Y_t}(τ) − Q_{Y_{t′}}(τ), where τ ∈ (0, 1).
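As a simple numerical illustration of these objects (under rank invariance, with hypothetical quantile functions; the covariate X is suppressed), the ATE and QTE can be computed directly from potential outcomes generated by a common rank variable:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=100_000)       # rank variable, uniform on (0, 1)

# Hypothetical potential-outcome quantile functions for levels t and t'
Q_t = lambda u: 2.0 * u + 1.0       # Y_t  = Q_{Y_t}(U)
Q_tp = lambda u: u ** 2             # Y_t' = Q_{Y_t'}(U)
Y_t, Y_tp = Q_t(U), Q_tp(U)

ate = Y_t.mean() - Y_tp.mean()      # E[Y_t] - E[Y_t'] = 2 - 1/3
tau = 0.5
qte = np.quantile(Y_t, tau) - np.quantile(Y_tp, tau)  # 2 - 0.25 at tau = 0.5
```

Here the QTE compares individuals at the same rank U = τ, which is the sense in which quantile treatment effects characterize heterogeneity in unobserved components.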
We are also interested in the local treatment effects, which we introduce in Section 4.4. For identification of these treatment effects, it suffices to identify the conditional mean and q.f. of the potential outcomes given X = x. We identify them under the following set of assumptions. Chernozhukov and Hansen (2005), Vuong and Xu (2017), and Wüthrich (2019) employ similar sets of assumptions. (See Appendix H for the reason why we assume that the instrument does not have smaller support than the treatment, and see Remark 1 for related discussions.)

Assumption 1 (Instrument independence and rank similarity). For all x ∈ X, the following conditions hold:

(A1) Potential outcomes: For each t ∈ T, F_{Y_t|X}(·|x) is continuous.

(A2) Independence: Conditional on X = x, {U_t}_{t=0}^k are independent of Z.
(A3) Selection: T can be expressed as T = ρ(Z, X, V) for some unknown function ρ and random vector V.

(A4) Rank similarity: Conditional on (X, Z, V) = (x, z, v), {U_t}_{t=0}^k are identically distributed.

Part (A1) of Assumption 1 imposes continuity of the potential outcome c.d.f. Under part (A1) of Assumption 1, F_{Y_t|X}(y|x) is strictly increasing in y ∈ Y°, and Q_{Y_t|X}(τ|x) is strictly increasing in τ ∈ (0, 1). Chernozhukov and Hansen (2005) directly assume that Q_{Y_t|X}(τ|x) is strictly increasing in τ ∈ (0, 1) and additionally impose continuous differentiability around each quantile. Vuong and Xu (2017) assume that Q_{Y_t|X}(τ|x) is continuous and strictly increasing in τ ∈ (0, 1). Under part (A1) of Assumption 1, the rank variable U_t follows a uniform distribution on (0, 1) conditional on X = x, and Y_t and Q_{Y_t|X}(U_t|x) are identically distributed conditional on X = x. Hence, we can interpret the QTE as the treatment effect on individuals with the same level of unobserved heterogeneity at some level U_t = τ. Part (A2) of Assumption 1 imposes conditional independence between the potential outcomes and the instrument. Part (A3) of Assumption 1 states a general selection equation where the random vector V captures unobserved factors affecting selection into treatment. Part (A4) of Assumption 1 is called "rank similarity." Rank similarity is arguably strong, but this condition has important implications for identification and is consistent with many empirical situations. (We show these properties in Lemmas 5 and 6 in Appendix B. In this paper, we focus on Section 2 of Vuong and Xu (2017); Vuong and Xu (2017) consider several settings, and they relax some restrictions imposed on the outcome and selection equations in other sections.)

For the rank variable, we employ a slightly different definition from the original definition of Chernozhukov and Hansen (2005). Chernozhukov and Hansen (2005) define the rank variable U_t as a uniformly distributed random variable on (0, 1) that satisfies Y_t = Q_{Y_t|X}(U_t|x) conditional on X = x. The difference does not matter in our settings because Y_t and Q_{Y_t|X}(U_t|x) are identically distributed conditional on X = x. We can alternatively assume a stronger assumption called "rank invariance" that assumes U_t = U_{t′} for any t ≠ t′. (This condition corresponds to the nonseparable regression model Y = g(T, X, ε) where g(t, x, ·) is strictly increasing in a scalar error term ε that follows a uniform distribution on (0, 1), and the ITE is g(t, x, ε) − g(t′, x, ε). We can identify the ITE if we alternatively employ this setting with the additional assumptions assumed in Vuong and Xu (2017).)

The main statistical implication of Assumption 1 is that, for each t ∈ T and x ∈ X, the conditional q.f. of Y_t given X = x satisfies the following nonlinear moment condition (Chernozhukov and Hansen (2005), Theorem 1):

Σ_{t=0}^k F_{Y|TZX}( Q_{Y_t|X}(τ|x) | t, z, x ) p_t(z, x) = τ,    (2.1)

where p_t(z, x) is defined as

p_t(z, x) := P(T = t | Z = z, X = x).    (2.2)

This moment condition (2.1) does not identify Q_{Y_t|X}(τ|x) without additional assumptions. Roughly speaking, Chernozhukov and Hansen (2005) show that the key condition for point identification is full rank of Jacobian matrices characterized by the conditional densities of the observable outcome variable. The identification condition of Chernozhukov and Hansen (2005) is in principle directly testable, but this condition requires more than sufficiently strong correlation between the endogenous variable and the instruments, and this condition is difficult to interpret economically. Moreover, estimation based on (2.1) is complicated by the non-smoothness and non-convexity of the corresponding objective function, which occurs even for linear-in-parameters quantile models.

In this section, we introduce the counterfactual mapping in multi-valued treatment settings. We show that identification of the potential outcome distributions results in identification of the counterfactual mappings. For s, t ∈ T and x ∈ X, define φ^x_{s,t} : R → R as φ^x_{s,t}(y) := Q_{Y_t|X}( F_{Y_s|X}(y|x) | x ).
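As a minimal numerical sketch of this map (with directly simulated potential outcomes, which are never jointly observed in practice, and illustrative functional forms), the empirical analogue of φ_{s,t} = Q_{Y_t} ∘ F_{Y_s} carries each quantile of Y_s to the corresponding quantile of Y_t:

```python
import numpy as np

def ecdf(sample):
    """Empirical c.d.f. of a sample, returned as a callable F(y)."""
    s = np.sort(sample)
    return lambda y: np.searchsorted(s, y, side="right") / len(s)

def phi(sample_s, sample_t):
    """Empirical counterfactual mapping phi_{s,t}(y) = Q_{Y_t}(F_{Y_s}(y))."""
    F_s = ecdf(sample_s)
    return lambda y: np.quantile(sample_t, F_s(y))

rng = np.random.default_rng(0)
U = rng.uniform(size=100_000)   # common rank variable (rank invariance)
Y_s = np.exp(U)                 # Y_s = Q_{Y_s}(U), strictly increasing in U
Y_t = 2.0 * U + 1.0             # Y_t = Q_{Y_t}(U)

phi_st = phi(Y_s, Y_t)
# The median exp(0.5) of Y_s is mapped near the median 2.0 of Y_t, and
# phi_st(Y_s) has (approximately) the same distribution as Y_t.
```

This is exactly the distribution-matching idea mentioned above: once the two distributions being matched are identified, so is the map between them.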
φ^x_{s,t} is called the conditional "counterfactual" mapping from Y_s to Y_t given X = x because the potential outcomes are also called "counterfactual outcomes." Vuong and Xu (2017) define a similar mapping for the binary treatment case. From the definition, this mapping relates the quantiles of the conditional distribution of Y_s to those of Y_t given X = x. Under Assumption 1, this mapping is strictly increasing on Y°, and φ^x_{s,r} = φ^x_{t,r} ∘ φ^x_{s,t} holds for s, t, r ∈ T. Moreover, under the assumption that F_{Y_t|X}(Y°|x) does not depend on t ∈ T, an inverse mapping φ^{x,−1}_{s,t} exists on Y°, and φ^x_{t,s}(y) = φ^{x,−1}_{s,t}(y) holds for y ∈ Y°. Similarly, we define the unconditional counterfactual mapping for s, t ∈ T as φ_{s,t}(y) := Q_{Y_t}(F_{Y_s}(y)).

The following lemma shows that the potential outcome c.d.f.'s and means can be written as compositions of the counterfactual mappings and observable distributions. Vuong and Xu (2017) show a similar result for the binary treatment case.

Lemma 1 (Potential outcome c.d.f.'s and means via counterfactual mappings). Assume that Assumption 1 holds. Define p_t(z, x) as in (2.2).

(a) For each s ∈ T and x ∈ X, F_{Y_s|X}(y|x) for y ∈ Y° can be expressed as

F_{Y_s|X}(y|x) = Σ_{t=0}^k F_{Y|TZX}( φ^x_{s,t}(y) | t, z, x ) p_t(z, x).    (2.3)

(b) For each s ∈ T and x ∈ X, E[Y_s | X = x] can be expressed as

E[Y_s | X = x] = Σ_{t=0}^k E[ φ^x_{t,s}(Y) | T = t, Z = z, X = x ] p_t(z, x).    (2.4)

Lemma 1 follows from the fact that, for each s, t ∈ T, Y_s and φ^x_{t,s}(Y_t) are identically distributed conditional on (T, Z, X) = (t, z, x). We have this because the rank variables U_s and U_t are identically distributed conditional on (T, Z, X) = (t, z, x) under the rank similarity assumption.

Lemma 1 implies that, for each s ∈ T and x ∈ X, E[Y_s | X = x] and Q_{Y_s|X}(τ|x) for τ ∈ (0, 1) are identified as closed-form expressions if φ^x_{s,t} for t ∈ T is also identified as a closed-form expression. Hence, we establish sufficient conditions to identify the φ^x_{s,t}'s and derive their closed-form expressions.

Remark 1.
For Lemma 1, we only need part (a) in order to identify E[Y_s | X = x] and Q_{Y_s|X}(τ|x) for τ ∈ (0, 1). Identifying φ^x_{s,t}(y) for y ∈ Y° suffices for identification of the treatment effects because we assume that F_{Y_s|X}(y|x) is continuous in y ∈ Y, and that the closure of Y° is equal to Y. In Appendix D, we relax the assumption that the closure of Y° is equal to Y and derive the closed-form expressions of φ^x_{s,t} and F_{Y_s|X}(·|x) on a sufficiently large subset of Y. Strictly speaking, these results are required to precisely derive the closed-form expressions of E[Y_s | X = x] and Q_{Y_s|X}(τ|x) for τ ∈ (0, 1).

From this section, we suppress the conditioning variable X unless stated otherwise for simplicity.

In this section, we introduce the monotonicity assumptions. We first introduce the monotonicity assumption for the binary treatment case. This condition is first introduced by Imbens and Angrist (1994). Define P := {(z, z′) ∈ Z × Z : z ≠ z′} as the set of pairs of different values that the instrument can take. The monotonicity assumption imposes restrictions on these pairs.

Let T_z be the potential treatment state if Z had been externally set to z. Then, for each pair of values (z, z′) ∈ P, we can partition the population into four groups defined by T_z and T_{z′}. These four groups are {T_z = T_{z′} = 1}, {T_z = T_{z′} = 0}, {T_z = 0, T_{z′} = 1}, and {T_z = 1, T_{z′} = 0}. The first and second groups are those who do not change their choice, and the third and fourth groups are those who respond to a change in Z. Let C_{z,z′} := {T_z = 0, T_{z′} = 1}, and write the third and fourth groups as C_{z,z′} and C_{z′,z} respectively. The monotonicity assumption requires that either P(C_{z′,z}) = 0 and P(C_{z,z′}) > 0, or P(C_{z,z′}) = 0 and P(C_{z′,z}) > 0, for each pair (z, z′) ∈ P.
The group that is assumed to have positive probability is called the "compliers," and the group that is assumed to have zero probability is called the "defiers." This condition is equivalent to assuming that either T_z ≤ T_{z′} and P(C_{z,z′}) > 0, or T_z ≥ T_{z′} and P(C_{z′,z}) > 0, for each pair (z, z′) ∈ P.

Heckman and Pinto (2018) generalize this assumption to multi-valued treatment settings. For each z ∈ Z and t ∈ T, define a binary variable D^t_z := 1{T_z = t} as an indicator function of each potential treatment state if Z had been externally set to z. The preceding comparisons of the treatments for the binary treatment case can be translated into inequalities that compare the D^0's and D^1's. For example, for (z, z′) ∈ P, T_z ≤ T_{z′} is equivalent to D^1_z ≤ D^1_{z′} or D^0_z ≥ D^0_{z′} because we have D^1 = T and D^0 = 1 − T. Heckman and Pinto (2018) generalize this argument and assume that either D^t_z ≤ D^t_{z′} or D^t_z ≥ D^t_{z′} holds almost surely for each pair (z, z′) ∈ P and treatment level t ∈ T. They call this assumption "unordered monotonicity" because this condition can be assumed on unordered treatments.

However, as we see in the examples in Sections 3.2 and 5, imposing such conditions on all the pairs (z, z′) ∈ P may be too strong when the instrument is multi-valued. Hence, we employ a weaker assumption that imposes such conditions only on a subset of P. That subset is determined differently in each situation.

As we define C_{z,z′} = {T_z = 0, T_{z′} = 1} in the binary treatment case, define C^t_{z,z′} := {D^t_z = 0, D^t_{z′} = 1}. Our monotonicity assumption is characterized by inequalities such as D^t_z ≤ D^t_{z′} and D^t_z ≥ D^t_{z′}. We call them the "monotonicity inequalities of (z, z′)" in this paper. For (z, z′) ∈ P, if either D^t_z ≤ D^t_{z′} or D^t_z ≥ D^t_{z′} holds almost surely (conditional on X = x ∈ X) for all t ∈ T, we say that "monotonicity inequalities hold for (z, z′) (conditional on X = x)" in this paper.
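To fix ideas, the objects D^t_z and C^t_{z,z′} can be computed in a small simulation with a hypothetical three-valued threshold-crossing selection rule (the thresholds below are purely illustrative). The sketch also shows why the subset of P on which monotonicity inequalities hold can be strict: for the pair (0, 1) the inequality holds at t = 0 and t = 2 but fails at t = 1, since both C^1_{0,1} and C^1_{1,0} have positive probability.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.uniform(size=100_000)   # unobserved selection heterogeneity

# Hypothetical potential treatments T_z in {0, 1, 2} for z in {0, 1}:
# the instrument value z = 1 lowers both selection thresholds.
def T(z, V):
    a, b = (0.5, 0.2) if z == 0 else (0.8, 0.4)
    return np.where(V <= b, 2, np.where(V <= a, 1, 0))

T0, T1 = T(0, V), T(1, V)

def monotone(t):
    """Monotonicity inequality of (z, z') = (0, 1) at level t:
    either D^t_0 <= D^t_1 a.s. or D^t_0 >= D^t_1 a.s."""
    D0, D1 = (T0 == t), (T1 == t)
    p_comp = np.mean(~D0 & D1)   # P(C^t_{0,1})
    p_def = np.mean(D0 & ~D1)    # P(C^t_{1,0})
    return p_comp == 0.0 or p_def == 0.0

# Holds for t = 0 and t = 2, but fails for t = 1: some families leave
# treatment 1 for 2 while others enter treatment 1 from 0.
```

In this design the middle treatment level has "movers" in both directions, which is exactly the complication of the selection mechanism that multi-valued treatments introduce.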
We employ the following monotonicity assumption:

Assumption 2 (Instrument independence and monotonicity). For all x ∈ X, there exists a subset Λ of P such that the following conditions hold for each λ = (z, z′) ∈ Λ:

(A1) Independent instrument: Conditional on X = x, (Y_t, T_z) for t ∈ T and z ∈ λ are jointly independent of Z.

(A2) Monotonicity inequalities: Either P(D^t_z ≤ D^t_{z′} | X = x) = 1 or P(D^t_z ≥ D^t_{z′} | X = x) = 1 holds for all t ∈ T.

(A3) Instrument relevance: Either P(C^t_{z,z′} | X = x) > 0 or P(C^t_{z′,z} | X = x) > 0 holds for all t ∈ T.

(A4) Sufficient support: The support of the conditional distribution of Y_t given X = x and either C^t_{z,z′} or C^t_{z′,z} is Y.

(Angrist and Imbens (1995) establish an assumption for ordered treatments that either T_z ≤ T_{z′} or T_z ≥ T_{z′} holds almost surely for each pair (z, z′) ∈ P. This assumption is called "ordered monotonicity." Vuong and Xu (2017) assume that T = 1{p(Z, X) ≥ V} and V | X ∼ U(0, 1) hold for the binary treatment case. Heckman and Pinto (2018) show that an analogous separable structure for the unobserved variable holds for each D^t under the unordered monotonicity assumption. This implies that monotonicity inequalities hold for all the pairs of P, and we do not specify the selection equation in that way.)

In this paper, we call this subset Λ the "monotonicity subset." When T is binary, parts (A1)-(A3) of Assumption 2 are the monotonicity assumption for the binary treatment case. Part (A1) of Assumption 2 strengthens part (A2) of Assumption 1 and assumes that the potential outcomes and treatment are jointly independent of the instrument. Part (A2) of Assumption 2 assumes that monotonicity inequalities hold on a particular subset Λ of P. Part (A3) of Assumption 2, together with part (A2), assumes that the compliers always exist. Under these conditions, when monotonicity inequalities hold on (z, z′) ∈ Λ, we exclude the cases where neither D^t_z < D^t_{z′} nor D^t_z > D^t_{z′} can happen for some t ∈ T. We can interpret this condition as an instrument relevance condition that holds when the conditional covariance of D^t and Z given Z ∈ {z, z′} is not 0 for all t ∈ T. Part (A4) of Assumption 2 strengthens the instrument relevance condition and assumes that the compliers are sufficiently large. Vuong and Xu (2017) employ a similar condition for the binary treatment case. Under part (A4) of Assumption 2, each conditional c.d.f. of the potential outcome on compliers, as well as the unconditional c.d.f. of the potential outcome, is strictly increasing on Y°.
(From part (A3) of Assumption 1, T can be expressed as T = ρ(Z, V) for some unknown function ρ and an unobserved random vector V. A sufficient condition for part (A4) of Assumption 2 is that (U_t, V) has a rectangular support for all t ∈ T. This condition is often assumed for regression models with a selection equation; see Appendix E for the reason for this point. We show this statement in Lemma 5 in Appendix B, and see Appendix E for the proof of this point.)

3.2 Motivating examples
In this section, we give examples where economic interpretation leads to Assumption 2. Consider a social experiment of house purchase where each family can buy a house from k possible options located in different regions and labeled 1, . . . , k, and vouchers are randomly assigned that offer price discounts on the specified house (or houses). We assume that the discount rates are the same. (We need no assumptions on house prices, such as the house prices being the same, because we just compare the possible budget sets of each family given that the treatment is fixed at a particular state.) This kind of setting arises in various situations; we introduce a real-world social experiment in Section 5. We consider a situation where the researcher wants to evaluate where is the best place to raise a child, and the outcome of interest will be collected several years later. Let Y denote the outcome of interest, which can be regarded as a continuous random variable (such as child test score data). Let the treatment T denote the house choice, with T = 0 for not buying any house and, for i = 1, . . . , k, T = i for buying house i.

In this section, we consider three examples. Suppose the treatment T and the instrument Z we define in these cases are sufficiently correlated, and we assume parts (A3) and (A4) of Assumption 2 unless stated otherwise.

Example I. Let k = 2. Suppose there are three types of vouchers labeled a, b, and c such that

voucher a: offers a discount on houses 1 and 2,
voucher b: offers a discount on house 1,
voucher c: offers a discount on house 2.

Suppose these vouchers are randomly assigned to all the families. Let the instrument Z represent voucher assignment, taking values in Z = {a, b, c}. Part (A1) of Assumption 2 holds because vouchers are randomly assigned. First, we have two obvious monotonicity inequalities arising from different values of Z for each family ω ∈ Ω:

D^1_c(ω) ≤ D^1_b(ω),    (3.1)

and

D^2_b(ω) ≤ D^2_c(ω).    (3.2)

Inequality (3.1) states that the family is induced toward buying house 1 when the instrument changes from a voucher for house 2 to a voucher for house 1. Inequality (3.2) can be interpreted similarly.

Next, economic analysis generates additional choice restrictions. Heckman and Pinto (2018) consider a similar example of car purchase, and we follow their application of economic analysis. Let B_ω(z, t) be the budget set of family ω when its voucher assignment is z ∈ Z and treatment choice is t ∈ T. Family ω is assumed to maximize a utility function u_ω defined over consumption goods g and choice t. Then T_z(ω) as a function of z can be viewed as a choice function of family ω:

T_z(ω) = argmax_{t ∈ T} ( max_{g ∈ B_ω(z,t)} u_ω(g, t) ).    (3.3)

For the budget set B_ω(z, t) of family ω, we can naturally assume the following relationships:

B_ω(a, 0) = B_ω(b, 0) = B_ω(c, 0),    (3.4)
B_ω(c, 1) ⊂ B_ω(a, 1) = B_ω(b, 1),    (3.5)
B_ω(b, 2) ⊂ B_ω(a, 2) = B_ω(c, 2).    (3.6)

Relationship (3.4) holds because any voucher offers no discount, and hence produces the same budget set, if family ω does not buy any house. Relationship (3.5) examines the budget set of family ω if it purchases house 1: its budget set is enlarged if it has a voucher that subsidizes house 1 (voucher a or b) when compared to a voucher that does not affect the choice set (voucher c). Relationship (3.6) can be interpreted similarly. For two different treatment levels t and t′ and voucher assignments z and z′, revealed preference analysis generates the following choice rule:

D^t_z(ω) = 1 and B_ω(z, t) ⊆ B_ω(z′, t) and B_ω(z′, t′) ⊆ B_ω(z, t′) ⇒ D^{t′}_{z′}(ω) = 0.    (3.7)

(See the proof of Lemma L-1 of Pinto (2015) and Heckman and Pinto (2018) for this statement.)

This choice rule (3.7) makes intuitive sense. Suppose family ω buys house t if it has voucher z. Then its choice does not change to another house t′ unless voucher z enlarges its budget set for buying house t when compared to the other voucher z′, or the other voucher z′ enlarges its budget set for buying house t′ when compared to voucher z. Applying the choice rule (3.7) to the budget set relationships (3.4)-(3.6) generates six additional monotonicity inequalities in addition to (3.1)-(3.2). Table I summarizes these eight inequalities. From Table I, we may assume part (A2) of Assumption 2 for the subset of P that contains (a, b) and (a, c).

Table I: Monotonicity inequalities of Example I

Pair      t = 0             t = 1             t = 2
(a, b)    D^0_a ≤ D^0_b     D^1_a ≤ D^1_b     D^2_a ≥ D^2_b
(a, c)    D^0_a ≤ D^0_c     D^1_a ≥ D^1_c     D^2_a ≤ D^2_c
(b, c)                      D^1_b ≥ D^1_c     D^2_b ≤ D^2_c

These monotonicity inequalities also make intuitive sense. In the following discussion, we focus on D^0_a ≤ D^0_b; the other inequalities can be interpreted similarly. Suppose family ω does not buy a house even though it has a voucher that subsidizes houses 1 and 2 (voucher a).
Then family ω will not change its choice to buy a house if it has a voucher that offers the same discount rate but is restricted to either house 1 or 2. Hence, D^0_a = 1 ⇒ D^0_b = 1 holds.

Economic interpretation does not imply any monotonicity inequality between D^0_b and D^0_c, and (b, c) is not contained in the monotonicity subset. Suppose family ω does not buy a house if it has a voucher that subsidizes house 1 (voucher b). Then family ω may change its choice to house 2 if it has a voucher that subsidizes house 2 (voucher c) instead of voucher b. Hence, D^0_b > D^0_c may happen. But D^0_b < D^0_c may also happen by a similar argument.

Example II. Suppose there are k types of vouchers labeled 1, ..., k such that

voucher i: offers a discount on house i.

Suppose the families that volunteered to participate in the experiment are randomly placed in one of the following assignment groups: a control group where no voucher is assigned and, for i = 1, ..., k, group i where voucher i is assigned. Let the instrument Z represent voucher assignment, where Z = 0 denotes no voucher and Z = i denotes voucher i. Part (A1) of Assumption 2 holds because vouchers are randomly assigned. Similar to Example I, economic analysis generates the following monotonicity inequalities, which make intuitive sense:

D^i_i ≥ D^i_0 and D^j_i ≤ D^j_0 for i = 1, ..., k and j ∈ T \ {i}. (3.8)

Table II summarizes (3.8). From Table II, we may assume part (A2) of Assumption 2 for the subset of P that contains (1, 0), ..., (k, 0).

Table II: Monotonicity inequalities of Example II

            T = 0            T = 1            ···    T = k−1                    T = k
  (1, 0)    D^0_1 ≤ D^0_0    D^1_1 ≥ D^1_0    ···    D^{k−1}_1 ≤ D^{k−1}_0      D^k_1 ≤ D^k_0
  ⋮
  (k−1, 0)  D^0_{k−1} ≤ D^0_0  D^1_{k−1} ≤ D^1_0  ···  D^{k−1}_{k−1} ≥ D^{k−1}_0  D^k_{k−1} ≤ D^k_0
  (k, 0)    D^0_k ≤ D^0_0    D^1_k ≤ D^1_0    ···    D^{k−1}_k ≤ D^{k−1}_0      D^k_k ≥ D^k_0

Example III. Suppose houses 1, ..., k are placed in order of distance from the downtown (house 1 is the nearest and house k is the farthest), and there are k types of vouchers labeled 1, ..., k such that

voucher i: offers a discount on houses i, ..., k.

Suppose the families who volunteered to participate in the experiment are randomly placed in one of the following assignment groups: a control group where no voucher is assigned and, for i = 1, ..., k, group i where voucher i is assigned. Let the instrument Z represent voucher assignment, where Z = 0 denotes no voucher and Z = i denotes voucher i. Part (A1) of Assumption 2 holds because vouchers are randomly assigned. (See Appendix F for the budget set relationships of the families we assume for Cases II and III.) In this example, we may regard T and Z as ordered variables and expect them to be positively correlated. This is because a family whose voucher has a large number is more likely to choose a house far from the downtown, because it cannot use the voucher for houses near the downtown. Similar to Example I, economic analysis generates the following monotonicity inequalities, which make intuitive sense:

D^i_i ≥ D^i_{i+1} and D^j_i ≤ D^j_{i+1} for i = 1, ..., k − 1 and j ∈ T \ {i},
D^k_k ≥ D^k_0 and D^j_k ≤ D^j_0 for j ∈ T \ {k}. (3.9)

Table III summarizes (3.9). From Table III, we may assume part (A2) of Assumption 2 for the subset of P that contains (i, i + 1) for i = 1, ..., k − 1 and (k, 0).

Table III: Monotonicity inequalities of Example III

            T = 0              T = 1              ···    T = k−1                      T = k
  (1, 2)    D^0_1 ≤ D^0_2      D^1_1 ≥ D^1_2      ···    D^{k−1}_1 ≤ D^{k−1}_2        D^k_1 ≤ D^k_2
  ⋮
  (k−1, k)  D^0_{k−1} ≤ D^0_k  D^1_{k−1} ≤ D^1_k  ···    D^{k−1}_{k−1} ≥ D^{k−1}_k    D^k_{k−1} ≤ D^k_k
  (k, 0)    D^0_k ≤ D^0_0      D^1_k ≤ D^1_0      ···    D^{k−1}_k ≤ D^{k−1}_0        D^k_k ≥ D^k_0

In this section, we establish identification of the potential outcome distributions and the local treatment effects using our monotonicity assumption.
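Before turning to the formal results, the monotonicity inequalities (3.8) of Example II can be checked in a small simulation of the families' utility maximization. The prices, discount rate, and random utilities below are all hypothetical choices made only for illustration; the sketch shows that the voucher design alone, not the particular preference draws, generates the inequalities.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3                      # houses 1..k; choice 0 means no purchase
n = 5000                   # simulated families
price = np.array([0.0, 1.0, 1.2, 1.5])   # hypothetical prices; house 0 is free
discount = 0.3                            # hypothetical voucher discount rate

# Random utilities over the k+1 choices (unobserved heterogeneity across families).
u = rng.normal(size=(n, k + 1))

def choice(z):
    """Utility-maximizing choice under voucher z (z=0: no voucher;
    z=i: a discount on house i only, as in Example II)."""
    p = np.tile(price, (n, 1))
    if z > 0:
        p[:, z] *= (1 - discount)
    return np.argmax(u - p, axis=1)

t0 = choice(0)
for i in range(1, k + 1):
    ti = choice(i)
    # D^i_i >= D^i_0: voucher i never moves a family away from house i ...
    assert np.all((t0 == i) <= (ti == i))
    # ... and D^j_i <= D^j_0 for j != i: it never moves a family toward house j.
    for j in range(k + 1):
        if j != i:
            assert np.all((ti == j) <= (t0 == j))
```

Because the voucher lowers only the price of house i, every family's ranking of the other k alternatives is unchanged, which is exactly why the inequalities hold family by family.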
In this section, we introduce some basic identification results under our monotonicity assumption. We first establish identification of the compliers under our monotonicity assumption. The following lemma shows that, when D^t_z ≤ D^t_{z'} and P(C^t_{z,z'}) > 0 hold for (z, z') ∈ P and t ∈ T, the probability of C^t_{z,z'} and the conditional c.d.f. of Y_t given C^t_{z,z'} are identified as closed-form expressions. Heckman and Pinto (2018) show a similar result under the unordered monotonicity assumption.

Lemma 2 (Identification of the compliers). Assume that Assumption 2 holds, and that P(D^t_z ≤ D^t_{z'} | X = x) = 1 and P(C^t_{z,z'} | X = x) > 0 hold for (z, z') ∈ P, x ∈ X, and t ∈ T. Define p_t(z, x) as in (2.2). Then P(C^t_{z,z'} | X = x) and F_{Y_t|C^t_{z,z'} X}(y | x) for y ∈ Y are identified as

P(C^t_{z,z'} | X = x) = p_t(z', x) − p_t(z, x), with p_t(z', x) > p_t(z, x), (4.1)

and

F_{Y_t|C^t_{z,z'} X}(y | x) = [F_{Y|TZX}(y | t, z', x) p_t(z', x) − F_{Y|TZX}(y | t, z, x) p_t(z, x)] / [p_t(z', x) − p_t(z, x)]. (4.2)

With Lemma 2 at hand, we establish identification of the counterfactual mappings. In this section, we review the identification results of Vuong and Xu (2017) for the binary treatment case. Let the treatment be binary. Suppose T_z(ω) ≤ T_{z'}(ω) for all ω ∈ Ω and P(C_{z,z'}) > 0. From the definitions of D^t_z and C^t_{z,z'}, both D^1_z ≤ D^1_{z'} and P(C^1_{z,z'}) > 0, and D^0_{z'} ≤ D^0_z and P(C^0_{z',z}) > 0 hold, and the compliers C^1_{z,z'} and C^0_{z',z} are the same (we show this statement in Lemma 7 in Appendix B):

C^1_{z,z'} = C^0_{z',z} = C_{z,z'}. (4.3)

From (4.3), we obtain the following equation for the potential outcome conditional distributions given the compliers:

F_{Y_1|C_{z,z'}}(y) = F_{Y_0|C_{z,z'}}(φ_{1,0}(y)) for y ∈ Y°. (4.4)

Equation (4.4) follows from the following facts. First, under the rank similarity assumption, the rank variables U_0 and U_1 are identically distributed conditional on C_{z,z'}. Second, from the definition of the counterfactual mapping, if y ∈ Y° is the τ ∈ (0, 1) quantile of the distribution of Y_1, then φ_{1,0}(y) is the τ quantile of the distribution of Y_0. F_{Y_1|C_{z,z'}} and F_{Y_0|C_{z,z'}} are identified from Lemma 2, and φ_{1,0}(y) for y ∈ Y° is identified as φ_{1,0}(y) = Q_{Y_0|C_{z,z'}}(F_{Y_1|C_{z,z'}}(y)) by solving (4.4) for φ_{1,0}.

In multi-valued treatment settings, we generally cannot compare any two treatment states on the same compliers as in (4.4) under a single monotonicity relationship. This is because the relationships between the compliers become more complicated than (4.3) in the binary treatment case, and the compliers generally do not coincide with each other. We overcome this difficulty by developing systems of monotonicity relationships that can be solved simultaneously for the counterfactual mappings on more than one set of compliers.
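The binary-case solution φ_{1,0}(y) = Q_{Y_0|C}(F_{Y_1|C}(y)) reviewed above amounts to empirical quantile mapping once complier samples are available. The sketch below uses hypothetical quantile functions sharing one rank variable; it illustrates the mapping, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical complier potential outcomes with a common rank variable U:
# Y_0 = Q_0(U) and Y_1 = Q_1(U), so the true mapping is phi_{1,0} = Q_0 ∘ F_1.
u = rng.uniform(size=20000)
y0 = 1.0 + 2.0 * u          # Q_0(u) = 1 + 2u
y1 = np.exp(u)              # Q_1(u) = exp(u)

def phi_10(y, y1_sample, y0_sample):
    """Empirical counterfactual mapping Q_{Y_0|C}(F_{Y_1|C}(y))."""
    tau = np.mean(y1_sample <= y)          # empirical c.d.f. of Y_1 at y
    return np.quantile(y0_sample, tau)     # empirical quantile of Y_0 at tau

y = np.exp(0.5)                            # the tau = 0.5 value of Y_1
est = phi_10(y, y1, y0)
assert abs(est - (1.0 + 2.0 * 0.5)) < 0.05 # true phi_{1,0}(exp(0.5)) = 2
```

The estimate converges to the true mapping as the complier sample grows, since both the empirical c.d.f. and empirical quantile are consistent.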
In this section, we establish identification of the counterfactual mappings using the relationships of the compliers when the treatment is discrete in general. Let the treatment take k + 1 values (so T = {0, 1, ..., k}), and assume Assumptions 1 and 2 hold for the subset Λ of P.

We employ an identification condition with "sign values" that characterize the type of monotonicity relationship of each pair (z_1, z_2) contained in the monotonicity subset Λ. To this end, we first introduce sign values. Suppose that, for (z_1, z_2) ∈ Λ, there uniquely exists l(z_1, z_2) ∈ T such that either

D^{l(z_1,z_2)}_{z_1} ≥ D^{l(z_1,z_2)}_{z_2} and D^j_{z_1} ≤ D^j_{z_2} for j ∈ T \ {l(z_1, z_2)}

or

D^{l(z_1,z_2)}_{z_1} ≤ D^{l(z_1,z_2)}_{z_2} and D^j_{z_1} ≥ D^j_{z_2} for j ∈ T \ {l(z_1, z_2)}

holds almost surely. Then we call this value l(z_1, z_2) the "sign value of (z_1, z_2)," and we say that "(z_1, z_2) has a sign value" in this paper.

When the treatment takes three values, each (z_1, z_2) ∈ Λ has a sign value. This is because, if the monotonicity inequalities of (z_1, z_2) all point in the same direction, then no compliers exist and Λ cannot contain (z_1, z_2). Suppose that D^j_{z_1} ≤ D^j_{z_2} holds almost surely for all j = 0, 1, 2. Then D^0_{z_1} ≥ D^0_{z_2} holds almost surely because D^1_{z_1} ≤ D^1_{z_2} and D^2_{z_1} ≤ D^2_{z_2} imply 1 − D^0_{z_1} ≤ 1 − D^0_{z_2}. Hence, D^0_{z_1} = D^0_{z_2} holds almost surely and D^0_{z_1} < D^0_{z_2} cannot happen. A similar argument applies to treatment states 1 and 2, and D^j_{z_1} < D^j_{z_2} cannot happen for any j ∈ T, which violates part (A3) of Assumption 2.

With the sign values, we employ the following assumption, which requires k different types of monotonicity relationships to exist.

Assumption 3 (Existence of different types of monotonicity relationships). For all x ∈ X, the monotonicity subset Λ contains k pairs of instrument values λ_1, ..., λ_k such that the following condition holds: for i = 1, ..., k, writing λ_i = (λ_{i1}, λ_{i2}), there uniquely exist l_{λ_i} ∈ T with l_{λ_i} ≠ l_{λ_j} for i ≠ j such that

D^{l_{λ_i}}_{λ_{i1}} ≥ D^{l_{λ_i}}_{λ_{i2}} and D^j_{λ_{i1}} ≤ D^j_{λ_{i2}} for j ∈ T \ {l_{λ_i}}

hold almost surely conditional on X = x.

Under Assumption 3, the monotonicity subset Λ contains k pairs of instrument values λ_1, ..., λ_k such that each pair λ_i has a sign value, and the sign values l_{λ_i} are all different. Each l_{λ_i} thus characterizes the type of the monotonicity relationship of λ_i. Assumption 3 holds in Example I in Section 3.2. From Table I, the sign values of (a, b) and (a, c) are l(a, b) = 2 and l(a, c) = 1 respectively, and (a, b) and (a, c) induce different types of monotonicity relationships.

We can interpret Assumption 3 as an instrument relevance condition for multiple endogenous binary variables that requires a monotonic correlation between each endogenous variable and the instrument. We illustrate this point with Example I. To this end, we show that, for (a, b) and (a, c), different monotonicity relationships generate different relationships between compliers. We focus on (a, b). First, observe that, from the definition of the compliers, we have

C^2_{b,a} = {D^2_a = 1, D^0_b = 1} ∪ {D^2_a = 1, D^1_b = 1}. (4.5)

The sets {D^2_a = 1, D^0_b = 1} and {D^2_a = 1, D^1_b = 1} are disjoint because a family ω contained in C^2_{b,a} will choose either house 0 or house 1 if its voucher assignment is b. Second, we obtain

C^0_{a,b} = {D^2_a = 1, D^0_b = 1} and C^1_{a,b} = {D^2_a = 1, D^1_b = 1}. (4.6)

To see this, suppose a family ω contained in C^0_{a,b} chooses house 1 if its voucher assignment is a. Then, because D^1_a ≤ D^1_b holds, it would also choose house 1 when its voucher assignment is b, which contradicts the definition of C^0_{a,b}. Hence, it will choose house 2 if its voucher assignment is a. An analogous argument applies to a family ω contained in C^1_{a,b}, and it will choose house 2 if its voucher assignment is a. Therefore, the following relationship holds from (4.5) and (4.6):

C^2_{b,a} = C^0_{a,b} ∪ C^1_{a,b} and C^0_{a,b} ∩ C^1_{a,b} = ∅. (4.7)

For (a, c), an analogous discussion gives the following relationship:

C^1_{c,a} = C^0_{a,c} ∪ C^2_{a,c} and C^0_{a,c} ∩ C^2_{a,c} = ∅. (4.8)

These relationships (4.7) and (4.8) yield the preceding interpretation of Assumption 3. We focus on (4.7). Relationship (4.7) implies that whether the voucher assignment is a or b has a monotonic effect only on the house choice that concerns house l(a, b) = 2. Compared with voucher b, voucher a additionally offers a discount only on house 2, and only the preference for house 2 is affected by the difference between vouchers a and b. To see this precisely, observe that C^0_{a,b} and C^1_{a,b} are contained in C^2_{b,a} from (4.7). This implies that if family ω does not buy a house or chooses house 1 when its voucher assignment is b, it will either not change its choice or choose house 2 when its voucher assignment is a. An analogous discussion follows for (4.8): whether the voucher assignment is a or c has a monotonic effect only on the house choice that concerns house l(a, c) = 1.

The treatment T in Example I consists of two binary choices, namely whether to buy house i or not (D^i) for i = 1, 2. The value of D^0 is determined once the values of D^1, ..., D^k are specified, from the definition of the treatment. Therefore, with (4.7) and (4.8), there exist two pairs of instrument values such that each pair has a monotonic effect on each house choice D^i for i = 1, 2.

Assumption 3 also holds in Example II in Section 3.2. From Table II, for i = 1, ..., k, (i, 0) has a sign value l(i, 0) = i. Then, for i ≠ j, (i, 0) and (j, 0) induce different types of monotonicity relationships. The treatment T in Example II consists of k binary choices, namely whether to buy house i or not (D^i) for i = 1, ..., k, and there exist k pairs of instrument values such that each pair (i, 0) has a monotonic effect on each house choice D^i for i = 1, ..., k. As in Example I, the following relationships between the compliers hold for (1, 0), ..., (k, 0) from Table II (see Appendix G for the derivation of (4.9)):

C^i_{0,i} = ⋃_{j ≠ i} C^j_{i,0} for i = 1, ..., k. (4.9)

As in (4.7) and (4.8), (4.9) implies that whether the voucher assignment is i or no voucher has a monotonic effect only on the house choice that concerns house l(i, 0) = i. Compared with no voucher, voucher i additionally offers a discount only on house i, and only the preference for house i is affected by the difference between these two voucher assignments.

Assumption 3 also holds in Example III in Section 3.2. From Table III, each of (i, i + 1) for i = 1, ..., k − 1 and (k, 0) has a sign value: l(i, i + 1) = i and l(k, 0) = k. These k pairs of values then induce different types of monotonicity relationships.

The following lemma shows identification of the counterfactual mappings under Assumption 3:

Lemma 3 (Identification of counterfactual mappings from monotonicity). Assume that Assumptions 1-3 hold. Then, for all s, t ∈ T and x ∈ X, φ^x_{s,t}(y) for y ∈ Y° is identified.

Remark 2.
We do not need to derive the closed-form expressions of the φ^x_{s,t} to identify them. See the proof of Lemma 3 for general k ∈ T in Appendix A, and see Appendix C for the closed-form expressions of the φ^x_{s,t} when the treatment is discrete in general. As discussed in Section 2, E[Y_s | X = x] and Q_{Y_s|X}(τ | x) for τ ∈ (0, 1) are identified as closed-form expressions if φ^x_{s,t}(·) for t ∈ T are identified as closed-form expressions. Hence, we obtain the following theorem:

Theorem 1 (Identification of potential outcome c.d.f.'s and ASF's from monotonicity). Assume that Assumptions 1-3 hold. Then, for all s ∈ T and x ∈ X, E[Y_s | X = x] and Q_{Y_s|X}(τ | x) for τ ∈ (0, 1) are identified.

This result is interesting because identification is achieved without assuming numerical conditions on the distribution of the outcome variable, and the proposed sufficient condition is economically interpretable. This fact may be useful when designing a social experiment where the outcome data will be collected later.
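The building block behind these closed-form results is Lemma 2, whose expressions (4.1)-(4.2) can be illustrated on simulated data where the compliance types are known. The type shares, instrument design, and outcome shift below are hypothetical; the point is only that the observable right-hand sides recover the complier share and the complier c.d.f.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200000

# Hypothetical types for a fixed treatment level t: "always" chooses t under
# both instrument values, "complier" chooses t only under z', "never" does not.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.3, 0.2, 0.5])
z = rng.integers(0, 2, size=n)             # instrument: 0 plays z, 1 plays z'
d_t = (types == "always") | ((types == "complier") & (z == 1))

# Complier outcomes are shifted by 1 so the complier-specific c.d.f. is visible.
y = rng.normal(size=n) + np.where(types == "complier", 1.0, 0.0)

p_t = lambda zval: np.mean(d_t[z == zval])             # p_t(z, x) as in (2.2)
pc_hat = p_t(1) - p_t(0)                               # (4.1): complier share
assert abs(pc_hat - 0.2) < 0.01

# (4.2): complier c.d.f. of Y_t at y = 1.0, built from observable c.d.f.'s.
F = lambda yv, zval: np.mean(y[(z == zval) & d_t] <= yv)
F_c = (F(1.0, 1) * p_t(1) - F(1.0, 0) * p_t(0)) / pc_hat
assert abs(F_c - 0.5) < 0.05               # true complier c.d.f.: N(1,1) at 1
```

The always-takers' contribution cancels in the numerator of (4.2), which is why the mixture of observable c.d.f.'s isolates the compliers.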
In this section, we illustrate identification of the counterfactual mappings under Assumption 3 with Examples I and II in Section 3.2. We first illustrate identification when the treatment takes three values with Example I. As we derived (4.4) from (4.3) when the treatment is binary, for y ∈ Y°, we obtain the following equations from (4.7) and (4.8) respectively:

F_{Y_2|C^2_{b,a}}(y) = [F_{Y_0|C^0_{a,b}}(φ_{2,0}(y)) P(C^0_{a,b}) + F_{Y_1|C^1_{a,b}}(φ_{2,1}(y)) P(C^1_{a,b})] / P(C^2_{b,a}) (4.10)

and

F_{Y_1|C^1_{c,a}}(φ_{2,1}(y)) = [F_{Y_0|C^0_{a,c}}(φ_{2,0}(y)) P(C^0_{a,c}) + F_{Y_2|C^2_{a,c}}(y) P(C^2_{a,c})] / P(C^1_{c,a}). (4.11)

By Lemma 2, all the functions in (4.10) and (4.11) except for the counterfactual mappings are identified. When we fix (4.10) and (4.11) at any y_f ∈ Y°, these two equations can be regarded as nonlinear simultaneous equations in φ_{2,0}(y_f) and φ_{2,1}(y_f). We identify φ_{2,0}(y_f) and φ_{2,1}(y_f) by solving (4.10) and (4.11) simultaneously at y_f. Equations (4.10) and (4.11) are sufficiently different to identify φ_{2,0}(y_f) and φ_{2,1}(y_f) because these equations are generated from the monotonicity relationships with different sign values, namely l(a, b) = 2 and l(a, c) = 1 respectively. We cannot identify φ_{2,0}(y_f) and φ_{2,1}(y_f) from the single restriction (4.10).
In (4.10), the comparison of treatment states 2 and 1 and that of treatment states 2 and 0 are mixed, and (4.11) provides an additional restriction to identify φ_{2,0}(y_f) and φ_{2,1}(y_f). Therefore, φ_{2,0}(y_f) and φ_{2,1}(y_f) are identified as follows from (4.10) and (4.11):

φ_{2,1}(y_f) = sup{ y ∈ Y_f : [F_{Y_0|C^0_{a,b}}(φ^{y_f}_{1,0}(y)) P(C^0_{a,b}) + F_{Y_1|C^1_{a,b}}(y) P(C^1_{a,b})] / P(C^2_{b,a}) ≤ F_{Y_2|C^2_{b,a}}(y_f) }
           = inf{ y ∈ Y_f : [F_{Y_0|C^0_{a,b}}(φ^{y_f}_{1,0}(y)) P(C^0_{a,b}) + F_{Y_1|C^1_{a,b}}(y) P(C^1_{a,b})] / P(C^2_{b,a}) ≥ F_{Y_2|C^2_{b,a}}(y_f) } (4.12)

and

φ_{2,0}(y_f) = φ^{y_f}_{1,0}(φ_{2,1}(y_f)), (4.13)

where φ^{y_f}_{1,0}, whose domain is Y_f ⊂ Y, is defined as

φ^{y_f}_{1,0}(y) := Q_{Y_0|C^0_{a,c}}( [F_{Y_1|C^1_{c,a}}(y) P(C^1_{c,a}) − F_{Y_2|C^2_{a,c}}(y_f) P(C^2_{a,c})] / P(C^0_{a,c}) ) for y ∈ Y_f. (4.14)

The other counterfactual mappings on Y° are inversions or compositions of φ_{2,0} and φ_{2,1}, and they are also identified as closed-form expressions.

We next illustrate identification when the treatment is discrete in general with Example II. As we derived (4.4) from (4.3) when the treatment is binary, we obtain the following equations from (4.9):

F_{Y_i|C^i_{0,i}}(φ_{k,i}(y)) = Σ_{j ≠ i} F_{Y_j|C^j_{i,0}}(φ_{k,j}(y)) P(C^j_{i,0}) / P(C^i_{0,i}) for y ∈ Y° and i = 1, ..., k. (4.15)

By Lemma 2, all the functions in (4.15) except for the counterfactual mappings are identified. When we fix (4.15) at any y_{f,k} ∈ Y°, the k equations of (4.15) can be regarded as nonlinear simultaneous equations in φ_{k,j}(y_{f,k}) for j = 0, ..., k − 1. We identify these values by solving the k equations of (4.15) simultaneously at y_{f,k}. These k equations are sufficiently different to identify φ_{k,j}(y_{f,k}) for j = 0, ..., k − 1 because they are generated from the monotonicity relationships with different sign values, namely l(i, 0) = i for i = 1, ..., k. Therefore, φ_{k,j}(y_{f,k}) for j = 0, ..., k − 1 are identified. The other counterfactual mappings on Y° are inversions and compositions of φ_{k,j} for j = 0, ..., k − 1, and they are also identified.
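The simultaneous restrictions (4.10)-(4.11) can be solved numerically once the complier c.d.f.'s are replaced by empirical ones. In the sketch below, the complier rank distributions, sample sizes, and quantile functions are all hypothetical, and a simple grid search stands in for the sup/inf characterization in (4.12)-(4.14); the only point is that the two restrictions pin down φ_{2,0}(y_f) and φ_{2,1}(y_f) jointly.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical quantile functions sharing one rank variable U (rank invariance):
Q0, Q1, Q2 = (lambda u: u), (lambda u: u ** 2), (lambda u: np.sqrt(u))

# Rank draws for the complier groups: C^0_{a,b} and C^1_{a,b} partition
# C^2_{b,a}, and C^0_{a,c} and C^2_{a,c} partition C^1_{c,a}, as in (4.7)-(4.8).
u0_ab = rng.beta(2.0, 1.0, 40000)
u1_ab = rng.beta(1.0, 2.0, 60000)
u0_ac = rng.beta(2.0, 2.0, 50000)
u2_ac = rng.beta(1.0, 1.0, 50000)

def ecdf(sample):
    s = np.sort(sample)
    return lambda y: np.searchsorted(s, y, side="right") / s.size

F0_ab, F1_ab = ecdf(Q0(u0_ab)), ecdf(Q1(u1_ab))
F0_ac, F2_ac = ecdf(Q0(u0_ac)), ecdf(Q2(u2_ac))
F2_ba = ecdf(Q2(np.concatenate([u0_ab, u1_ab])))   # F_{Y_2 | C^2_{b,a}}
F1_ca = ecdf(Q1(np.concatenate([u0_ac, u2_ac])))   # F_{Y_1 | C^1_{c,a}}

yf = 0.8                          # fix equations (4.10)-(4.11) at y_f = 0.8
c1, c2 = F2_ba(yf), F2_ac(yf)

def residual(v0, v1):
    r1 = 0.4 * F0_ab(v0) + 0.6 * F1_ab(v1) - c1        # restriction (4.10)
    r2 = F1_ca(v1) - 0.5 * F0_ac(v0) - 0.5 * c2        # restriction (4.11)
    return r1 ** 2 + r2 ** 2

# Grid search for the pair (phi_{2,0}(y_f), phi_{2,1}(y_f)).
grid = np.linspace(0.01, 0.99, 99)
v0_hat, v1_hat = min(((a, b) for a in grid for b in grid),
                     key=lambda p: residual(*p))

# With Q2(u) = sqrt(u), the true mappings are phi_{2,0}(y) = y^2 and
# phi_{2,1}(y) = y^4, whatever the rank distribution.
assert abs(v0_hat - yf ** 2) < 0.05
assert abs(v1_hat - yf ** 4) < 0.05
```

The two zero-residual curves trade v0 off against v1 in opposite directions, because the pairs carry different sign values; this is the numerical counterpart of the "sufficiently different equations" argument above.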
In this section, we identify the local treatment effects in the multi-valued treatment setting. Suppose that the monotonicity inequalities hold for (z, z'), and that D^t_z ≤ D^t_{z'} holds almost surely for treatment level t. When the treatment is binary, the local treatment effect is the treatment effect for the compliers, and identification follows from identification of F_{Y_t|C_{z,z'}}, which is identified from Lemma 2. However, identification is not straightforward in multi-valued treatment settings because the relationships between the compliers become more complicated. We overcome this difficulty by using the systems of monotonicity relationships we developed to identify the counterfactual mappings, and we identify the local treatment effect that compares any two different treatment states t and t'. We establish identification when the outcome variable is continuously distributed under our monotonicity assumption with some additional assumptions on unobservable factors such as the rank similarity assumption. (Note that φ_{k,k}(y) = y for y ∈ Y° holds from the definition.)

For two different treatment levels t and t' and instrument values z and z', the subpopulation {D^{t'}_z = 1, D^t_{z'} = 1} changes its treatment choice from t' to t if the instrument value changes from z to z'. The local average treatment effect (LATE) that compares treatment states t and t' conditional on this subpopulation {D^{t'}_z = 1, D^t_{z'} = 1} is E[Y_t | D^{t'}_z = 1, D^t_{z'} = 1] − E[Y_{t'} | D^{t'}_z = 1, D^t_{z'} = 1], and the local quantile treatment effect (LQTE) conditional on {D^{t'}_z = 1, D^t_{z'} = 1} is Q_{Y_t|D^{t'}_z=1, D^t_{z'}=1}(τ) − Q_{Y_{t'}|D^{t'}_z=1, D^t_{z'}=1}(τ), where τ ∈ (0, 1). The versions conditional on X are similarly defined.

The important result for showing identification of the local treatment effects is that identification of the counterfactual mappings implies identification of the treatment effect conditional on the compliers.
As we obtained (4.6) for Example I in Section 3.2, if (z, z') has a sign value l(z, z') = t', then P(C^t_{z,z'}) = P(D^{t'}_z = 1, D^t_{z'} = 1) holds for t ∈ T \ {t'}. The following lemma shows identification of the conditional distribution of Y_{t'} given C^t_{z,z'} as well as that of Y_t given C^t_{z,z'}:

Lemma 4 (Identification of potential outcome conditional c.d.f.'s and ASF's given the compliers). Assume that Assumptions 1-3 hold, and that P(D^t_z ≤ D^t_{z'} | X = x) = 1 and P(C^t_{z,z'} | X = x) > 0 hold for t ∈ T, (z, z') ∈ P, and x ∈ X. Then, for all t' ∈ T, F_{Y_{t'}|C^t_{z,z'} X}(y | x) for y ∈ Y° and E[Y_{t'} | C^t_{z,z'}, X = x] can be expressed as

F_{Y_{t'}|C^t_{z,z'} X}(y | x) = F_{Y_t|C^t_{z,z'} X}(φ^x_{t',t}(y) | x) (4.16)

and

E[Y_{t'} | C^t_{z,z'}, X = x] = E[φ^x_{t,t'}(Y_t) | C^t_{z,z'}, X = x], (4.17)

and E[Y_{t'} | C^t_{z,z'}, X = x] and Q_{Y_{t'}|C^t_{z,z'} X}(τ | x) for τ ∈ (0, 1) are identified.

With Lemma 4 at hand, we obtain the following theorem that shows identification of the local treatment effects under our assumptions:

Theorem 2 (Identification of local potential outcome c.d.f.'s and ASF's). Assume that Assumptions 1-3 hold. Then, for all t, t' ∈ T and x ∈ X, there exists (z, z') ∈ P such that E[Y_s | D^{t'}_z = 1, D^t_{z'} = 1, X = x] and Q_{Y_s|D^{t'}_z=1, D^t_{z'}=1, X}(τ | x) for τ ∈ (0, 1) and s ∈ {t, t'} are identified.

Application
In this section, we apply our identification result to a real-world social experiment. Moving to Opportunity (MTO) is a housing experiment implemented by the U.S. Department of Housing and Urban Development (HUD) between 1994 and 1998. It was designed to evaluate the effects of relocating to low poverty neighborhoods on the outcomes of disadvantaged families living in high poverty urban neighborhoods in the United States. This project targeted over 4000 very low-income families with children under 18 living in public housing or private assisted housing projects in very high poverty areas of Baltimore, Boston, Chicago, Los Angeles, and New York City, whose poverty rates were more than 40 percent according to the 1990 US Census. This project randomly assigned tenant-based housing vouchers from the Section 8 program that could be used to subsidize housing costs if the family agreed to relocate to better neighborhoods. Eligible families who volunteered to participate in the project were placed in one of three assignment groups: experimental (about 40% of the sample), Section 8 (about 30% of the sample), or control (about 30% of the sample). Families assigned to the experimental group were offered Section 8 housing vouchers, but they were restricted to using their vouchers in a low poverty neighborhood, whose poverty rate was less than 10 percent according to the 1990 US Census, along with mobility counseling and help in leasing a new unit. After one year had passed, families in this group could use their voucher to move again if they wished, without any special constraints on location. Families assigned to the Section 8 group were offered regular Section 8 housing vouchers without any restriction on their place of use and whatever briefing and assistance the local Section 8 program regularly provided.
Families assigned to the control group were offered no voucher, but continued to be eligible for project-based housing assistance and whatever other social programs and services to which they would otherwise be entitled. An interim impacts evaluation (Orr et al. (2003)) was conducted in 2002 and assessed the effects in six study domains: (1) mobility, housing, and neighborhood; (2) physical and mental health; (3) child educational achievement; (4) youth delinquency and risky behavior; (5) employment and earnings; and (6) household income and public assistance receipt. The long term impacts evaluation (Sanbonmatsu et al. (2011)) was conducted in 2009 and 2010. See Orr et al. (2003), Sanbonmatsu et al. (2011), and Shroder and Orr (2012) for detailed information on this project.

Recent studies find evidence of neighborhood effects on adult employment. Aliprantis and Richter (2019) identify the LATE for moving to a higher-quality neighborhood under an ordered treatment model using neighborhood quality as an observed continuous measure of the treatment variable, and find positive effects on adult labor market outcomes and welfare receipt for the interim impacts evaluation data. Pinto (2015) nonparametrically identifies the LATE and ATE for moving to a low poverty neighborhood under an unordered treatment model using a proxy for neighborhood quality, and finds statistically significant positive effects on adult labor market outcomes for the interim impacts evaluation data.

In this paper, we take an approach similar to Pinto (2015) for the model and economic analysis. Let Y denote the outcome of interest, which is continuously distributed. Let the treatment T denote the relocation decision at the intervention onset, where T = 0 denotes no relocation, T = 1 denotes high poverty neighborhood relocation, and T = 2 denotes low poverty neighborhood relocation.
Let the instrument Z represent voucher assignment, where Z = a denotes no voucher (control group), Z = b denotes the Section 8 voucher, and Z = c denotes the experimental voucher. For simplicity, we suppress the other covariates X.

Part (A1) of Assumption 2 holds because vouchers are randomly assigned. The treatment T consists of two binary choices, namely whether to relocate to a high or a low poverty neighborhood or not (D^1 and D^2 respectively). As discussed in Section 4.2, our identification result follows when there exist two pairs of instrument values such that each pair has a monotonic effect on each relocation choice D^i for i = 1, 2. First, compared with no voucher (Z = a), the experimental voucher (Z = c) additionally subsidizes housing costs only for low poverty neighborhood relocation (D^2 = 1). Next, compared with the experimental voucher (Z = c), the Section 8 voucher (Z = b) additionally subsidizes housing costs only for high poverty neighborhood relocation (D^1 = 1). Hence, (b, c) and (c, a) have monotonic effects on the relocation choices D^1 and D^2 respectively, and our identification condition is satisfied. (Assuming the treatment variable is binary when it is in fact multi-valued may prevent us from drawing desirable conclusions. Aliprantis (2017) provides empirical evidence and theoretical arguments in favor of adopting a model with more than two treatment levels to evaluate the neighborhood effects of MTO.)

We proceed to confirm the preceding argument with monotonicity inequalities. As discussed in Section 3.2 for the examples of house purchase, the utility maximization problem (3.3) of each family ω with budget constraint B_ω(z, t) generates the monotonicity inequalities. As in the examples of house purchase, we assume the following relationships for the budget set B_ω(z, t) of family ω:

B_ω(a, 0) = B_ω(b, 0) = B_ω(c, 0), (5.1)
B_ω(a, 1) = B_ω(c, 1) ⊂ B_ω(b, 1), (5.2)
B_ω(a, 2) ⊂ B_ω(b, 2) = B_ω(c, 2). (5.3)

Relationships (5.1)-(5.3) can be interpreted in the same way as (3.4)-(3.6) in Section 3.2. Applying the choice rule (3.7) to budget set relationships (5.1)-(5.3) generates the monotonicity inequalities summarized in Table IV. From Table IV, we may assume part (A2) of Assumption 2 for the subset Λ of P that contains (c, a) and (b, c).

Table IV: Monotonicity inequalities of MTO

            T = 0            T = 1            T = 2
  (c, a)    D^0_c ≤ D^0_a    D^1_c ≤ D^1_a    D^2_c ≥ D^2_a
  (b, c)    D^0_b ≤ D^0_c    D^1_b ≥ D^1_c    D^2_b ≤ D^2_c

The monotonicity relationships of Table IV conform with our identification condition. Assume the other conditions of Assumptions 1 and 2. From Table IV, the sign values of (c, a) and (b, c) are l(c, a) = 2 and l(b, c) = 1 respectively. Hence, (c, a) and (b, c) induce different types of monotonicity relationships. Therefore, Assumption 3 holds and we can apply Theorems 1 and 2 to identify the treatment effects as closed-form expressions.

Relationship (5.1) is weaker than Assumption A-2 of Pinto (2015), and relationships (5.2) and (5.3) are the same as Assumption A-1 of Pinto (2015). The inequalities in Table IV and the statement of Lemma L-1 of Pinto (2015) are equivalent. Pinto (2015) further assumes that a neighborhood is a normal good and generates additional monotonicity inequalities beyond those in Table IV. The assumptions of Pinto (2015) lead to part (A2) of Assumption 2 for all the pairs in P, and the unordered monotonicity assumption holds. We adopt weaker assumptions because our assumptions are sufficient to identify the treatment effects.

ATEs for moving to low and high poverty neighborhoods are E[Y_2] − E[Y_0] and E[Y_1] − E[Y_0] respectively. They are identified from Theorem 1. LATEs for moving to low and high poverty neighborhoods are E[Y_2 | D^0_a = 1, D^2_c = 1] − E[Y_0 | D^0_a = 1, D^2_c = 1] and E[Y_1 | D^0_c = 1, D^1_b = 1] − E[Y_0 | D^0_c = 1, D^1_b = 1] respectively.
The subpopulation {D^0_a = 1, D^2_c = 1} represents the families that do not relocate when they are offered no voucher, but relocate to a low poverty neighborhood when they are offered the experimental voucher. The subpopulation {D^0_c = 1, D^1_b = 1} represents the families that do not relocate when they are offered the experimental voucher, but relocate to a high poverty neighborhood when they are offered the Section 8 voucher. These treatment effects are identified from Theorem 2. The corresponding quantile treatment effects are also identified. Therefore, when the outcome variable is continuously distributed, the treatment effects are nonparametrically identified as closed-form expressions without another observable variable for a measure of the treatment, under some additional assumptions on unobservable factors such as the rank similarity assumption.

In this paper, we establish sufficient conditions for identification of the treatment effects when the treatment is discrete and endogenous. We show that the monotonicity assumption is sufficient when it holds in an appropriate way, and this condition is economically interpretable. We also derive the closed-form expressions of the identified treatment effects.

For the estimation procedure, Wüthrich (2019) estimates the observable conditional c.d.f.'s, quantile functions, and probabilities in the closed-form expression semiparametrically by existing methods and constructs the estimator by plugging them into the closed-form expression. We can apply a similar approach to our closed-form expressions. Alternatively, especially for estimation of the QTE, we can apply the existing estimation methods under structural quantile models based on the GMM objective function after checking our identification conditions.

Appendix

A Proofs of the results in the main text
Proofs in this section use some auxiliary results (Lemmas 5-7) collected in Appendix B.
Proof of Lemma 1.
Observe that, for each s ∈ T and x ∈ X, we have

F_{Y_s|X}(y | x) = F_{Y_s|ZX}(y | z, x) = Σ_{t=0}^{k} F_{Y_s|TZX}(y | t, z, x) p_t(z, x) for y ∈ Y° (A.1)

and

E[Y_s | X = x] = E[Y_s | Z = z, X = x] = Σ_{t=0}^{k} E[Y_s | T = t, Z = z, X = x] p_t(z, x), (A.2)

where the first equalities in (A.1) and (A.2) hold from part (A2) of Assumption 1. Take any y* ∈ Y°. Then there exists τ* ∈ (0, 1) such that y* = Q_{Y_s|X}(τ* | x) holds from part (A1) of Assumption 1. This τ* can be expressed as τ* = F_{Y_s|X}(y* | x).

First, we show part (a). The required result (2.3) holds if we show

F_{Y_s|TZX}(y* | t, z, x) = F_{Y_t|TZX}(φ^x_{s,t}(y*) | t, z, x). (A.3)

We proceed to show (A.3). Under the rank similarity assumption, we have

F_{U_s|TZX}(τ* | t, z, x) = F_{U_t|TZX}(τ* | t, z, x). (A.4)

Then, observe that the following equations hold:

{U_s ≤ τ*} = {F_{Y_s|X}(Y_s | x) ≤ F_{Y_s|X}(y* | x)} = {Y_s ≤ y*} (A.5)

and

{U_t ≤ τ*} = {F_{Y_t|X}(Y_t | x) ≤ F_{Y_t|X}(φ^x_{s,t}(y*) | x)} = {Y_t ≤ φ^x_{s,t}(y*)}. (A.6)

The first equality in (A.5) holds from the definitions of U_s and τ*. The first equality in (A.6) holds from the definitions of U_t, τ*, and φ^x_{s,t}, and because F_{Y_t|X}(Q_{Y_t|X}(τ* | x) | x) = τ* holds from part (A1) of Assumption 1. The second equalities in (A.5) and (A.6) hold because F_{Y_t|X}(y | x) for t ∈ T are strictly increasing in y ∈ Y° by Lemma 5. Therefore, applying (A.5) and (A.6) to (A.4) leads to (A.3), and (2.3) holds for y ∈ Y° by applying (A.3) to (A.1).

Next, we show part (b). The required result (2.4) holds if we show the following equation:

E[Y_s | T = t, Z = z, X = x] = E[φ^x_{t,s}(Y_t) | T = t, Z = z, X = x]. (A.7)

We proceed to show (A.7). Observe that we have

{U_t ≤ τ*} = {Q_{Y_s|X}(U_t | x) ≤ Q_{Y_s|X}(τ* | x)} = {φ^x_{t,s}(Y_t) ≤ y*}. (A.8)

The first equality in (A.8) holds because Q_{Y_s|X}(τ | x) is strictly increasing in τ ∈ (0, 1), and the second holds from the definitions of U_t and φ^x_{t,s}. Then, applying (A.5) and (A.8) to (A.4) leads to

F_{Y_s|TZX}(y* | t, z, x) = F_{φ^x_{t,s}(Y_t)|TZX}(y* | t, z, x). (A.9)

Because F_{Y_t|X}(· | x) is continuous, U_t ∼ U(0, 1) conditional on X = x holds. Then φ^x_{t,s}(Y_t) = Q_{Y_s|X}(U_t | x) =_d Y_s conditional on X = x holds, and F_{φ^x_{t,s}(Y_t)|X}(· | x) is continuous. Hence, F_{Y_s|TZX}(· | t, z, x) and F_{φ^x_{t,s}(Y_t)|TZX}(· | t, z, x) are also continuous. Then, from the assumption that the closure of Y° is equal to Y, we have

F_{Y_s|TZX}(y | t, z, x) = F_{φ^x_{t,s}(Y_t)|TZX}(y | t, z, x) for y ∈ Y. (A.10)

Therefore, Y_s =_d φ^x_{t,s}(Y_t) conditional on (T, Z, X) = (t, z, x) holds, and we have (A.7). (2.4) holds by applying (A.7) to (A.2). ✷

Proof of Lemma 2.
Observe that we have C^t_{z,z'} = {D^t_{z'} = 1} \ {D^t_z = D^t_{z'} = 1} from the definition of C^t_{z,z'}, and that P(D^t_z ≤ D^t_{z'} | X = x) = 1 implies p_t(z, x) = P(D^t_z = D^t_{z'} = 1 | X = x). Hence, (4.1) holds and P(C^t_{z,z'} | X = x) > 0 is equivalent to p_t(z', x) > p_t(z, x). For y ∈ Y, an analogous argument gives

P(Y_t ≤ y, C^t_{z,z'} | X = x) = F_{Y|TZX}(y | t, z', x) p_t(z', x) − F_{Y|TZX}(y | t, z, x) p_t(z, x), (A.11)

and (4.2) holds from (4.1) and (A.11). ✷

For simplicity, we suppress the conditioning variable X in the proofs of Lemmas 3-4 unless stated otherwise. The proof of Lemma 3 becomes simpler when the treatment takes three values. We first show Lemma 3 for the case of k = 2.

Proof of Lemma 3 (for the case of k = 2). Suppose that the monotonicity subset Λ ⊂ P contains two pairs of instrument values (a, b), (d, c) such that the sign values are l(a, b) = 2 and l(d, c) = 1, and D^2_a ≥ D^2_b and D^1_d ≥ D^1_c hold almost surely, where a, b, c, d ∈ Z and these values may not be all different. Then, the types of the monotonicity relationships on (a, b) and (d, c) are different. In this proof, we assume d = a, so that the monotonicity inequalities correspond to those of Example I in Section 3.2. The proof does not rely on this assumption, and we can show this lemma without it similarly.

It suffices to show that φ_{2,0} and φ_{2,1} are identified on Y°. Identification of the other counterfactual mappings results from identification of φ_{2,0} and φ_{2,1}. To see this, we show that φ^{−1}_{s,t} exists on Y°, and that φ_{s,r} is identified on Y° if φ_{s,t} and φ_{t,r} are identified on Y° for s, t, r ∈ T. We first show that φ^{−1}_{s,t} exists on Y°. First, φ_{s,t} is strictly increasing on Y° from part (A1) of Assumption 1 and Lemmas 5 and 6. Second, φ_{s,t}(Y°) = Y° holds because we assume that F_{Y_t}(Y°) does not depend on t ∈ T. Hence, from the definition of φ_{s,t}, the inverse mapping φ^{−1}_{s,t} exists on Y°, and φ_{t,s}(y) = φ^{−1}_{s,t}(y) holds for y ∈ Y°.
We next show that φ s,r is identified on Y ◦ if φ s,t and φ t,r are identified on Y ◦ . First, φ s,r = φ t,r ◦ φ s,t holds because F Y t ( Q Y t ( τ )) = τ for τ ∈ (0 , 1) holds from part (A1) of Assumption 1. Second, φ s,t ( Y ◦ ) = Y ◦ holds. Hence, the stated result follows. From the preceding argument, y ∈ Y ◦ implies φ s,t ( y ) ∈ Y ◦ under the assumption that F Y t ( Y ◦ ) does not depend on t ∈ T . We use this property in this proof. We proceed to show that φ 2,0 and φ 2,1 are identified on Y ◦ . We divide the proof into parts (i)-(iii). Part (i).
In this part, for ( a, b ) , ( a, c ) ∈ Λ (note that we can assume them without loss of generality), we show that (4.10) and (4.11) hold for y ∈ Y . To this end, we first show P ( C b,a ) = P ( C a,b ) + P ( C a,b ) (A.12) and P ( C c,a ) = P ( C a,c ) + P ( C a,c ) . (A.13) We proceed to show (A.12). (A.13) follows from a similar argument. Observe that, from the definition, we have P ( C b,a ) = P ( D a = 1 , D b = 1) + P ( D a = 1 , D b = 1) (A.14) and P ( C a,b ) = P ( D a = 1 , D b = 1) + P ( D a = 1 , D b = 1) . (A.15) Note that P ( D a = 1 , D b = 1) ≤ P ( D a = 1 , D b = 0) = 0 because { D a = 1 , D b = 1 } is contained in { D a = 1 , D b = 0 } , and D a ≤ D b holds almost surely from part (A1) of Assumption 2. Hence, we have P ( C a,b ) = P ( D a = 1 , D b = 1) . (A.16) An analogous argument gives P ( C a,b ) = P ( D a = 1 , D b = 1) . (A.17) Therefore, applying (A.16) and (A.17) to (A.14) gives (A.12). With (A.12) and (A.13) at hand, we show that (4.10) and (4.11) hold for y ∈ Y . We proceed to show (4.10). (4.11) follows from a similar argument. Take any y ∗ ∈ Y ◦ . Then there exists τ ∗ ∈ (0 , 1) such that y ∗ = Q Y ( τ ∗ ) holds from part (A1) of Assumption 1. Observe that F U |C a,b ( τ ∗ ) = F U |C a,b ( τ ∗ ) and F U |C a,b ( τ ∗ ) = F U |C a,b ( τ ∗ ) (A.18) hold because { U s } 2 s =0 are identically distributed conditional on each C ta,b for t = 0 , 1 , 2, and hence F U |C b,a ( τ ∗ ) = [ F U |C a,b ( τ ∗ ) P ( C a,b ) + F U |C a,b ( τ ∗ ) P ( C a,b ) ] / P ( C b,a ) . (A.19) From (A.5) and (A.6) in the proof of Lemma 1, { U 0 ≤ τ ∗ } = { Y 0 ≤ y ∗ } and { U t ≤ τ ∗ } = { Y t ≤ φ 0,t ( y ∗ ) } for t = 1 , 2 hold, and (4.10) holds for y ∈ Y ◦ by applying them to (A.19). Part (ii).
From this part, we solve (4.10) and (4.11) for φ 2,0 and φ 2,1 simultaneously. Take any y f ∈ Y ◦ . Define φ y f ,0 as in (4.14). In this part, we show that φ y f ,0 satisfies (4.13). Observe that φ y f ,0 is the identified function that satisfies F Y |C c,a ( y ) = [ F Y |C a,c ( φ y f ,0 ( y )) P ( C a,c ) + F Y |C a,c ( y f ) P ( C a,c ) ] / P ( C c,a ) for y ∈ Y f . (A.20) This is because F Y |C a,c is continuous on Y from Assumptions 1 and 2 and Lemma 2, and F Y |C a,c ( Q Y |C a,c ( τ )) = τ holds for τ ∈ (0 , 1). Comparing (4.11) at y f and (A.20) at φ 2,1 ( y f ), we have F Y |C a,c ( φ 2,0 ( y f )) = F Y |C a,c ( φ y f ,0 ( φ 2,1 ( y f ))) . (A.21) Observe that φ 2,1 ( y f ) is contained in Y ◦ , and F Y |C a,c is strictly increasing on Y ◦ from Lemma 5. Therefore, (4.13) holds by taking the inverse of F Y |C a,c in (A.21). Part (iii).
In this part, we finally show that (4.12) and (4.13) are the unique solution to (4.10) and (4.11) at y f . First, we plug (4.13) into (4.10) at y f and obtain F Y |C b,a ( y f ) = [ F Y |C a,b ( φ y f ,0 ( φ 2,1 ( y f ))) P ( C a,b ) + F Y |C a,b ( φ 2,1 ( y f )) P ( C a,b ) ] / P ( C b,a ) . (A.22) Define a function G y f ,0 as G y f ,0 ( y ) := [ F Y |C a,b ( φ y f ,0 ( y )) P ( C a,b ) + F Y |C a,b ( y ) P ( C a,b ) ] / P ( C b,a ) . (A.23) Then we can write (A.22) as F Y |C b,a ( y f ) = G y f ,0 ( φ 2,1 ( y f )). Observe that G y f ,0 is strictly increasing on Y f ∩ Y ◦ because F Y |C a,b , F Y |C a,b , and φ y f ,0 are strictly increasing on Y f ∩ Y ◦ . φ 2,1 ( y f ) is contained in Y f ∩ Y ◦ , and we can solve (A.22) for φ 2,1 ( y f ) by taking the inverse of G y f ,0 . Hence, φ 2,1 ( y ) is identified at each y f ∈ Y ◦ as (4.12). Therefore, φ 2,0 and φ 2,1 are identified on Y ◦ , and the other counterfactual mappings are also identified because they are inversions or compositions of φ 2,0 and φ 2,1 . ✷ We next show Lemma 3 for general k ∈ T . Proof of Lemma 3 (for general k ∈ T ). Suppose that the monotonicity subset Λ ⊂ P contains k pairs of instrument values λ 1 , . . . , λ k such that each sign value is l λ i = i for i = 1 , . . . , k . For notational simplicity, let λ i = ( i, 0) and D ii ≥ D i0 hold almost surely. Then, the monotonicity inequalities correspond to those of Example II in Section 3.2, and the types of monotonicity relationships on ( i, 0) and ( j, 0) for i ≠ j are different. The proof does not rely on this assumption, and we can prove the lemma without it similarly. It suffices to show that φ k,0 , . . . , φ k,k−1 are identified on Y ◦ . As discussed in the proof for the case of k = 2, identification of the other counterfactual mappings follows from identification of φ k,0 , . . . , φ k,k−1 , and y ∈ Y ◦ implies φ s,t ( y ) ∈ Y ◦ under the assumption that F Y t ( Y ◦ ) does not depend on t ∈ T . We proceed to show that φ k,0 , . . . , φ k,k−1 are identified on Y ◦ . We divide the proof into parts (i)-(ii). We do not derive the closed-form expressions of the counterfactual mappings in this proof. We derive the closed-form expressions in Appendix C. Part (i).
In this part, for λ 1 , . . . , λ k ∈ Λ, we show (4.15). To this end, we first show P ( C i 0,i ) = Σ j ≠ i P ( C j i,0 ) for i = 1 , . . . , k. (A.24) We proceed to show (A.24). Observe that, from the definition, we have P ( C i 0,i ) = Σ j ≠ i P ( D ii = 1 , D j0 = 1) for i = 1 , . . . , k, (A.25) P ( C j i,0 ) = Σ l ≠ j P ( D j0 = 1 , D li = 1) for j ∈ T \ { i } . (A.26) Note that, for j, l ∈ T \ { i } and j ≠ l , we have P ( D j0 = 1 , D li = 1) ≤ P ( D l0 = 0 , D li = 1) = 0 because { D j0 = 1 , D li = 1 } is contained in { D l0 = 0 , D li = 1 } , and D li ≤ D l0 holds almost surely from Assumption 3. Hence, we have P ( C j i,0 ) = P ( D ii = 1 , D j0 = 1) for i = 1 , . . . , k and j ∈ T \ { i } . (A.27) Therefore, by plugging (A.27) into (A.25), we obtain (A.24). As we obtain (4.10) and (4.11) from (A.12) and (A.13) in the proof for the case of k = 2, we obtain (4.15) from (A.24) using Lemma 7. Part (ii).
In this part, we show that (4.15) is uniquely solved for φ k,0 , . . . , φ k,k−1 simultaneously. To this end, consider the following simultaneous equations of ( y 0 , . . . , y k ): F Y i |C i 0,i ( y i ) = Σ j ≠ i F Y j |C j i,0 ( y j ) P ( C j i,0 ) / P ( C i 0,i ) for i = 1 , . . . , k and y i ∈ Y ◦ . (A.28) If ( y 0 , . . . , y k ) = ( φ k,0 ( y k ) , . . . , φ k,k−1 ( y k ) , y k ) is the unique solution that satisfies (A.28), then φ k,0 ( y ) , . . . , φ k,k−1 ( y ) are identified for y ∈ Y ◦ in (4.15). We proceed to show that ( y 0 , . . . , y k ) = ( φ k,0 ( y k ) , . . . , φ k,k−1 ( y k ) , y k ) is the unique solution that satisfies (A.28). Suppose we have a solution ( y ′ 0 , . . . , y ′ k−1 , y k ) different from ( φ k,0 ( y k ) , . . . , φ k,k−1 ( y k ) , y k ) that also satisfies (A.28). We first consider the case of y ′ 0 < φ k,0 ( y k ). Then, from (A.28) with i = k , there exists j ∈ T \ { 0 , k } such that y ′ j > φ k,j ( y k ) holds. To see this, suppose y ′ j ≤ φ k,j ( y k ) holds for all j ∈ T \ { 0 , k } . Then, because F Y j |C j k,0 is strictly increasing on Y ◦ , we have F Y k |C k 0,k ( y ′ k ) > Σ j ≠ k F Y j |C j k,0 ( y ′ j ) P ( C j k,0 ) / P ( C k 0,k ) , (A.29) and (A.28) with i = k does not hold. Without loss of generality, suppose j = k − 1. Because φ k,k−1 is strictly increasing on Y ◦ , there exists y (1) > y k such that y ′ k−1 = φ k,k−1 ( y (1) ) holds. Then, from (A.28) with i = k − 1, there exists j ∈ T \ { 0 , k − 1 , k } such that y ′ j > φ k,j ( y (1) ) holds. To see this, suppose y ′ j ≤ φ k,j ( y (1) ) holds for all j ∈ T \ { 0 , k − 1 , k } . Then, by comparing (4.15) at y (1) and (A.28) with i = k − 1, using strict monotonicity of F Y j |C j k−1,0 on Y ◦ , we have F Y k−1 |C k−1 0,k−1 ( φ k,k−1 ( y (1) )) > Σ j ≠ k−1 F Y j |C j k−1,0 ( y ′ j ) P ( C j k−1,0 ) / P ( C k−1 0,k−1 ) , (A.30) and (A.28) with i = k − 1 does not hold. Without loss of generality, suppose j = k − 2. Because φ k,k−2 is strictly increasing in Y ◦ , there exists y (2) > y (1) such that y ′ k−2 = φ k,k−2 ( y (2) ) holds. Then, by repeating similar discussions, we can show from (A.28) with i = 2 , . . . , k that, without loss of generality, there exists y k < y (1) < · · · < y ( k −
Proof of Theorem 1. It suffices to show that, for each s ∈ T and x ∈ X , the conditional distribution of Y s given X = x is identified. Because φ xs,t ( y ) for y ∈ Y ◦ is identified from Lemma 3, F Y s | X ( y | x ) for y ∈ Y ◦ is identified from (2.3) in Lemma 1. Then, from the assumption that the closure of Y ◦ is equal to Y , F Y s | X ( y | x ) for y ∈ Y is identified. Therefore, the conditional distribution of Y s given X = x is identified. ✷ Proof of Lemma 4.
First, we show (4.16) and (4.17). Because U t and U t ′ are identically distributed conditional on C tz,z ′ by Lemma 7, we have F U t ′ |C tz,z ′ ( τ ) = F U t |C tz,z ′ ( τ ) for τ ∈ (0 , 1) . (A.32) Then, similar to the derivations of (A.6) and (A.8)-(A.10) in the proof of Lemma 1, we have { U t ≤ τ } = { Y t ≤ φ t ′ ,t ( y ) } for y ∈ Y ◦ (A.33) and Y t ′ d = φ xt,t ′ ( Y t ) conditional on C tz,z ′ . (A.34) Hence, applying (A.33) to (A.32) leads to (4.16), and (4.17) follows from (A.34). Next, we show that, for each t ′ ∈ T , the conditional distribution of Y t ′ given C tz,z ′ is identified. Because φ xt ′ ,t ( y ) for y ∈ Y ◦ is identified from Lemma 3, F Y t ′ |C tz,z ′ ( y ) for y ∈ Y ◦ is identified from (4.16). Then, from the assumption that the closure of Y ◦ is equal to Y , F Y t ′ |C tz,z ′ ( y ) for y ∈ Y is identified. Therefore, the conditional distribution of Y t ′ given C tz,z ′ is identified, and the stated result follows. ✷ Proof of Theorem 2.
From Assumption 3, for all t, t ′ ∈ T and x ∈ X , there exists ( z, z ′ ) ∈ P such that l ( z,z ′ ) ∈ { t, t ′ } . We show the case of l ( z,z ′ ) = t ′ . The case of l ( z,z ′ ) = t follows from a similar argument. From (A.27) in the proof of Lemma 3 for general k ∈ T , we have P ( C tz,z ′ | X = x ) = P ( D t ′ z = 1 , D tz ′ = 1 | X = x ). Therefore, the stated result follows by Lemma 4. ✷ B Auxiliary results
The following lemmas are used in the proofs in Appendix A.
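Before turning to the lemmas, it may help to fix ideas with a small numerical sketch (not part of the paper's formal argument) of the counterfactual mapping used throughout, φ s,t ( y ) = Q Y t ( F Y s ( y )): under rank similarity, ranking y in the distribution of Y s and reading off the same quantile of Y t recovers the outcome a unit of that rank would obtain under treatment t . The quantile functions g s and g t below are illustrative assumptions, not taken from the paper.

```python
import random
import bisect

random.seed(0)

# Illustrative model with rank similarity built in: a common rank U for each
# unit, with potential outcomes Y_s = g_s(U) = sqrt(U) and Y_t = g_t(U) = 1 + 2U.
u = [random.random() for _ in range(100_000)]
y_s = sorted(v ** 0.5 for v in u)       # sorted sample from Y_s
y_t = sorted(1.0 + 2.0 * v for v in u)  # sorted sample from Y_t

def phi(y, sample_s, sample_t):
    """Empirical counterfactual mapping phi_{s,t}(y) = Q_{Y_t}(F_{Y_s}(y))."""
    tau = bisect.bisect_right(sample_s, y) / len(sample_s)  # empirical F_{Y_s}(y)
    i = min(int(tau * len(sample_t)), len(sample_t) - 1)
    return sample_t[i]                                      # empirical Q_{Y_t}(tau)

# A unit with Y_s = 0.5 has rank U = 0.25, so its Y_t should be 1 + 2(0.25) = 1.5.
print(round(phi(0.5, y_s, y_t), 2))
```

With 100,000 draws the printed value is close to 1.5, matching the population mapping Q Y t ( F Y s (0.5)) = 1 + 2 · 0.25 in this illustrative model.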
Lemma 5 (Strict monotonicity on the interior of the support). Let W be a scalar-valued random variable whose support is W . Then, F W is strictly increasing on W ◦ . Proof of Lemma 5. Because W is the support of W , we have W = { w ∈ R : F W ( w + ε ) − F W ( w − ε ) > 0 for all ε > 0 } . (B.1) Consider w 1 , w 2 ∈ W ◦ with w 1 < w 2 . Then, there exists δ > 0 such that 0 < η ≤ δ ⇒ w 1 + η ∈ W holds. First, suppose w 2 − w 1 ≤ δ . Then, because ( w 1 + w 2 ) / 2 ∈ W , we have F W ( w 1 ) < F W ( w 2 ) from (B.1). Second, suppose w 2 − w 1 > δ . Then, because w 1 + δ/ 2 ∈ W , we have F W ( w 1 ) < F W ( w 1 + δ ) from (B.1). Hence, we have F W ( w 1 ) < F W ( w 2 ), and the stated result follows. ✷ Lemma 6 (Strict monotonicity of the quantile function). If F W is continuous, then Q W is strictly increasing on (0 , 1). Proof of Lemma 6. Take τ 1 , τ 2 ∈ (0 , 1) with τ 1 < τ 2 . Suppose that Q W ( τ 1 ) = Q W ( τ 2 ) holds. Because F W is continuous, F W ( W ) ∼ U (0 , 1) holds. Then, from Q W ( τ ) ≤ w ⇔ τ ≤ F W ( w ) for τ ∈ (0 , 1) and w ∈ R , we have 1 − τ i = P ( F W ( W ) ≥ τ i ) = P ( W ≥ Q W ( τ i )) for i = 1 , 2. Hence, we have τ 1 = τ 2 , which is a contradiction. Therefore, the stated result follows. ✷ Lemma 7 (Rank similarity on the compliers). Assume that Assumptions 1 and 2 hold, and that P ( D tz ≤ D tz ′ | X = x ) = 1 and P ( C tz,z ′ | X = x ) > 0 hold for ( z, z ′ ) ∈ P , x ∈ X , and t ∈ T . Then { U s } k s =0 are identically distributed conditional on C tz,z ′ and X = x . Proof of Lemma 7. As we show (4.2) of Lemma 2, we can show that, for τ ∈ (0 , 1) and t ′ ∈ T , F U t ′ |C tz,z ′ X ( τ | x ) = [ F U t ′ | T ZX ( τ | t, z ′ , x ) p t ( z ′ , x ) − F U t ′ | T ZX ( τ | t, z, x ) p t ( z, x ) ] / [ p t ( z ′ , x ) − p t ( z, x ) ] (B.2) holds. Under rank similarity, for z + ∈ Z , we have F U t ′ | T ZX ( τ | t, z + , x ) = F U t | T ZX ( τ | t, z + , x ) . (B.3) Combining (B.2) with (B.3) leads to F U t ′ |C tz,z ′ X ( τ | x ) = F U t |C tz,z ′ X ( τ | x ), and the stated result follows. ✷ References Aliprantis, D. (2017). Assessing the evidence on neighborhood effects from moving to opportunity. Empirical Economics, 52(3):925–954. Aliprantis, D. and Richter, F. (2019). Evidence of neighborhood effects from moving to opportunity: LATEs of neighborhood quality.
FRB of Cleveland Working Paper No. 12-08r3. Angrist, J. D. and Imbens, G. W. (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90(430):431–442. Athey, S. and Imbens, G. W. (2006). Identification and inference in nonlinear difference-in-differences models. Econometrica, 74(2):431–497. Caetano, C. and Escanciano, J. C. (2020). Identifying multiple marginal effects with a single instrument. Econometric Theory, pages 1–31. Chen, L.-Y. and Lee, S. (2018). Exact computation of GMM estimators for instrumental variable quantile regression models. Journal of Applied Econometrics, 33(4):553–567. Chen, X. and Pouzo, D. (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152(1):46–60. Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321. Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73(1):245–261. Chernozhukov, V. and Hansen, C. (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132(2):491–525. Chernozhukov, V. and Hong, H. (2003). An MCMC approach to classical estimation. Journal of Econometrics, 115(2):293–346. Chernozhukov, V., Imbens, G. W., and Newey, W. K. (2007). Instrumental variable estimation of nonseparable models. Journal of Econometrics, 139(1):4–14. Das, M. (2005). Instrumental variables estimators of nonparametric models with discrete endogenous regressors. Journal of Econometrics, 124(2):335–361. de Castro, L., Galvao, A. F., Kaplan, D. M., and Liu, X. (2019). Smoothed GMM for quantile models. Journal of Econometrics, 213(1):121–144. Feng, J. (2019). Matching points: Supplementing instruments with covariates in triangular models.
arXiv preprint arXiv:1904.01159. Feng, Q., Vuong, Q., and Xu, H. (2020). Estimation of heterogeneous individual treatment effects with endogenous treatments. Journal of the American Statistical Association, 115(529):231–240. Gagliardini, P. and Scaillet, O. (2012). Nonparametric instrumental variable estimation of structural quantile effects. Econometrica, 80(4):1533–1562. Heckman, J. J. (2001). Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of Political Economy, 109(4):673–748. Heckman, J. J. and Pinto, R. (2018). Unordered monotonicity. Econometrica, 86(1):1–35. Heckman, J. J., Urzua, S., and Vytlacil, E. (2006). Understanding instrumental variables in models with essential heterogeneity. The Review of Economics and Statistics, 88(3):389–432. Horowitz, J. L. and Lee, S. (2007). Nonparametric instrumental variables estimation of a quantile regression model. Econometrica, 75(4):1191–1208. Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467–475. Kaido, H. and Wuthrich, K. (2018). Decentralization estimators for instrumental variable quantile regression models. arXiv preprint arXiv:1812.10925. Lee, S. and Salanié, B. (2018). Identifying effects of multivalued treatments. Econometrica, 86(6):1939–1963. Mogstad, M., Torgovitsky, A., and Walters, C. (2019). The causal interpretation of two-stage least squares with multiple instrumental variables. Working Paper 25691, National Bureau of Economic Research. Mountjoy, J. (2019). Community colleges and upward mobility. Available at SSRN 3373801. Orr, L., Feins, J., Jacob, R., Beecroft, E., Sanbonmatsu, L., Katz, L. F., Liebman, J. B., and Kling, J. R. (2003). Moving to Opportunity: Interim Impacts Evaluation. Washington, DC: US Department of Housing and Urban Development, Office of Policy Development and Research. Pinto, R. (2015). Selection bias in a controlled experiment: The case of moving to opportunity.
Unpublished Ph.D. thesis, University of Chicago, Department of Economics. Sanbonmatsu, L., Ludwig, J., Katz, L. F., Gennetian, L. A., Duncan, G. J., Kessler, R. C., Adam, E., McDade, T. W., and Lindau, S. T. (2011). Moving to Opportunity for Fair Housing Demonstration Program: Final Impacts Evaluation. Washington, DC: US Department of Housing and Urban Development, Office of Policy Development and Research. Shroder, M. D. and Orr, L. L. (2012). Moving to opportunity: Why, how, and what next? Cityscape, 14(2):31–56. Vuong, Q. and Xu, H. (2017). Counterfactual mapping and individual treatment effects in nonseparable models with binary endogeneity. Quantitative Economics, 8(2):589–610. Wüthrich, K. (2019). A closed-form estimator for quantile treatment effects with endogeneity. Journal of Econometrics, 210(2):219–235. Zhu, Y. (2018). k-step correction for mixed integer linear programming: a new approach for instrumental variable quantile regressions and related problems. arXiv preprint arXiv:1805.06855. Supplement to “Identification of multi-valued treatment effects with unobserved heterogeneity” Koki Fusejima ∗ Graduate School of Economics, University of Tokyo. This version: October 12, 2020. This Supplemental Appendix is organized as follows. Appendix C derives the closed-form expressions of the counterfactual mappings for the general discrete treatment case. Appendix D relaxes some conditions assumed for simplicity in the main paper and precisely derives the closed-form expressions of the treatment effects. Appendix E shows some sufficient conditions for Assumption 2 introduced in Section 3.1. Appendix F shows the budget set relationships of the families we assume for Examples II and III in Section 2. Appendix G derives the relationships between the compliers of Example II in Section 3.2.
In Appendix H, we discuss the reason why we assume that the treatment does not have larger support than the instrument variable. For simplicity, we suppress the conditioning variable X unless stated otherwise. ∗ Email: [email protected] C Closed-form expressions of the counterfactual mappings in Lemma 3 C.1 Derivation of the closed-form expressions In this section, we derive the closed-form expressions of the φ s,t ’s in Lemma 3. Take any y f,k ∈ Y ◦ . Define G y f k−1,k ( y ) for y ∈ Y f as G y f k−1,k ( y ) := [ Σ k−2 j =0 F Y j |C j k,0 ( φ y f k−2,j ( y )) P ( C j k,0 ) + F Y k−1 |C k−1 k,0 ( y ) P ( C k−1 k,0 ) ] / P ( C k 0,k ) , (C.1) where φ y f k−2,j for j = 0 , . . . , k − 2 whose domains are Y f ⊂ Y are constructed to satisfy the following equations with φ y f k−2,k−2 ( y ) = y : F Y i |C i 0,i ( φ y f k−2,i ( y )) = [ Σ j ≠ i,k,k−1 F Y j |C j i,0 ( φ y f k−2,j ( y )) P ( C j i,0 ) + F Y k−1 |C k−1 i,0 ( y ) P ( C k−1 i,0 ) + F Y k |C k i,0 ( y f,k ) P ( C k i,0 ) ] / P ( C i 0,i ) for i = 1 , . . . , k − 2 and y ∈ Y f . (C.2) The closed-form expressions of φ k,j ( y f,k ) for j = 0 , . . . , k − 1 are φ k,k−1 ( y f,k ) = sup { y ∈ Y f : G y f k−1,k ( y ) ≤ F Y k |C k 0,k ( y f,k ) } = inf { y ∈ Y f : G y f k−1,k ( y ) ≥ F Y k |C k 0,k ( y f,k ) } , (C.3) φ k,j ( y f,k ) = φ y f k−2,j ( φ k,k−1 ( y f,k )) for j = 0 , . . . , k − 2 . (C.4) As discussed in the proof of Lemma 3 in Appendix A, other counterfactual mappings are also identified as closed-form expressions on Y ◦ because they are inversions or compositions of φ k,0 , . . . , φ k,k−1 . We proceed to show that (C.3) and (C.4) are the unique solution to (4.15) at y f,k . We divide the proof into parts (i)-(iii). (The domain Y f contains φ k,k−1 ( y f,k ), and when k = 2, (A.20) and (C.2) are the same.) Part (i). In this part, we construct φ y f k−2,j for j = 0 , . . . , k − 2. Y f ⊂ Y is the domain of φ y f k−2,j for j = 0 , . . . , k − 2, and Y f contains φ k,k−1 ( y f,k ).
By comparing (4.15) at y f,k and (C.2), these functions exist at φ k,k−1 ( y f,k ). To this end, consider the case of k = h . We can construct these functions for any h by starting the following discussion from the case of h = 3 and repeating the discussion inductively for h = 4 , 5 , . . . . Suppose we know how to construct φ y f k−2,j for j = 0 , . . . , k − 2 for the cases of k = 2 , . . . , h − 1. From (C.2) with k = h , for y f,h−1 ∈ Y f , we can construct φ y f h−2,j whose domain is Y f, 1 ⊂ Y as follows: φ y f h−2,h−2 ( y f,h−1 ) ∈ { y ∈ Y f, 1 : G y f h−2,h−1 ( y ) = F Y h−1 |C h−1 0,h−1 ( y f,h−1 ) } , (C.5) φ y f h−2,j ( y f,h−1 ) = φ y f h−3,j ( φ y f h−2,h−2 ( y f,h−1 )) for j = 0 , . . . , h − 3 , (C.6) where G y f h−2,h−1 is defined as G y f h−2,h−1 ( y ) := [ Σ j ≠ h,h−1 F Y j |C j h−1,0 ( φ y f h−3,j ( y )) P ( C j h−1,0 ) + F Y h |C h h−1,0 ( y f,h ) P ( C h h−1,0 ) ] / P ( C h−1 0,h−1 ) , and φ y f h−3,j for j = 0 , . . . , h − 3 satisfy F Y i |C i 0,i ( φ y f h−3,i ( y )) = [ Σ j ≠ i,h,h−1 F Y j |C j i,0 ( φ y f h−3,j ( y )) P ( C j i,0 ) + Σ h j = h−1 F Y j |C j i,0 ( y f,j ) P ( C j i,0 ) ] / P ( C i 0,i ) for i = 1 , . . . , h − 3 and y ∈ Y f, 1 . (C.7) We can construct φ y f h−2,j for j = 0 , . . . , h − 2 because we can construct φ y f h−3,j for j = 0 , . . . , h − 3 for the case of k = h − 1. Part (ii). In this part, we show that φ y f k−2,j for j = 0 , . . . , k − 2 satisfy (C.4). (We demonstrate the case of h = 3 in Appendix C.2.) Consider the following simultaneous equations of ( y 1 , . . . , y k−1 ): F Y i |C i 0,i ( y i ) = [ Σ j ≠ i,k F Y j |C j i,0 ( y j ) P ( C j i,0 ) + F Y k |C k i,0 ( y f,k ) P ( C k i,0 ) ] / P ( C i 0,i ) for y i ∈ Y ◦ and i = 1 , . . . , k − 1 . (C.8) If ( y 1 , . . . , y k−1 ) = ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) is the only solution that satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ), (C.4) follows from comparing (4.15) at y f,k and (C.2) at φ k,k−1 ( y f,k ). We proceed to show that ( y 1 , . . . , y k−1 ) = ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) is the only solution that satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ).
Suppose we have a solution ( y ′ 1 , . . . , y ′ k−2 , y k−1 ) different from ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) that also satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ). Then, as we show a contradiction for the cases of y ′ 1 < φ k,1 ( y k ), y ′ 1 > φ k,1 ( y k ), and y ′ 1 = φ k,1 ( y k ) in part (ii) of the proof of Lemma 3 for general k ∈ T , we can show that this supposition contradicts the fact that ( y ′ 1 , . . . , y ′ k−2 , y k−1 ) also satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ) for the cases of y ′ 1 < φ k,1 ( y f,k ), y ′ 1 > φ k,1 ( y f,k ), and y ′ 1 = φ k,1 ( y f,k ). Therefore, ( y 1 , . . . , y k−1 ) = ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) is the unique solution that satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ). Hence, (C.4) holds regardless of the uniqueness of φ y f k−2,j for j = 0 , . . . , k − 2. Part (iii). In this part, we finally show that (C.3) and (C.4) are the unique solutions to satisfy (4.15) at y f,k . First, we plug (C.4) into (4.15) with i = k : F Y k |C k 0,k ( y f,k ) = [ Σ k−2 j =0 F Y j |C j k,0 ( φ y f k−2,j ( φ k,k−1 ( y f,k ))) P ( C j k,0 ) + F Y k−1 |C k−1 k,0 ( φ k,k−1 ( y f,k )) P ( C k−1 k,0 ) ] / P ( C k 0,k ) . (C.9) Next, we solve (C.9) for φ k,k−1 ( y f,k ). To this end, we first show that, for j = 0 , . . . , k − 2 and y ∈ Y f ∩ Y ◦ , φ y f k−2,j satisfies the following properties: y > φ k,k−1 ( y f,k ) ⇒ φ y f k−2,j ( y ) > φ y f k−2,j ( φ k,k−1 ( y f,k )) (C.10) and y < φ k,k−1 ( y f,k ) ⇒ φ y f k−2,j ( y ) < φ y f k−2,j ( φ k,k−1 ( y f,k )) . (C.11) With (C.10) and (C.11) at hand, for y ∈ Y f ∩ Y ◦ , G y f k−1,k defined in (C.1) satisfies a property similar to (C.10) and (C.11) of φ y f k−2,j . We proceed to show (C.10). (C.11) follows from a similar argument using the reverse signs of inequality. Suppose y + > φ k,k−1 ( y f,k ) satisfies φ y f k−2,0 ( y + ) ≤ φ y f k−2,0 ( φ k,k−1 ( y f,k )). From (C.4), we have φ y f k−2,0 ( y + ) ≤ φ k,0 ( y f,k ).
Then, as we show a contradiction for the case of y ′ 0 < φ k,0 ( y k ) in part (ii) of the proof of Lemma 3 for general k ∈ T , we can show that this supposition contradicts the fact that φ y f k−2,j ( y + ) for j = k − 2 , . . . , 0 satisfy (C.2) at y + . For the case of y + > φ k,k−1 ( y f,k ) satisfying φ y f k−2,0 ( y + ) > φ y f k−2,0 ( φ k,k−1 ( y f,k )), consider the case of φ y f k−2,j ( y + ) ≤ φ y f k−2,j ( φ k,k−1 ( y f,k )) for some j ∈ T \ { 0 , k − 1 , k } , and we can show a contradiction in the same way as the case of y ′ 0 = φ k,0 ( y k ) and y ′ j ≠ φ k,j ( y k ) for some j ∈ T \ { 0 , k } in part (ii) of the proof of Lemma 3 for general k ∈ T . Therefore, (C.10) holds. Therefore, we can solve (C.9) for φ k,k−1 ( y f,k ), and the closed-form expression of φ k,k−1 ( y ) at each y f,k is derived as (C.3). Closed-form expressions of φ k,j for j = 0 , . . . , k − 2 are then derived by applying φ k,k−1 to (C.4). C.2 Supplement to Part (i) In this section, we apply part (i) of the preceding proof to the case of k = 3. Take any y f, ∈ Y . For y ∈ Y f , let φ y f , and φ y f , whose domains are Y f ⊂ Y be the functions that satisfy the following equations: F Y |C , ( y ) = [ F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( y f, ) P ( C , ) ] / P ( C , ) , (C.12) F Y |C , ( φ y f , ( y )) = [ F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( y f, ) P ( C , ) ] / P ( C , ) . (C.13) (C.12) and (C.13) are the same as (C.2) with k = 3.
Then, as we identify the counterfactual mappings in the proof of Lemma 3 for the case of k = 2, φ y f , and φ y f , are identified on Y f , and for y f, ∈ Y f , (C.5) and (C.6) with h = 3 become φ y f , ( y f, ) = { y ∈ Y f, : G y f , ( y ) = F Y |C , ( y f, ) } = sup { y ∈ Y f, : G y f , ( y ) ≤ F Y |C , ( y f, ) } = inf { y ∈ Y f, : G y f , ( y ) ≥ F Y |C , ( y f, ) } , (C.14) φ y f , ( y f, ) = φ y f , ( φ y f , ( y f, )) , (C.15) where φ y f , whose domain is Y f, ⊂ Y is the function that satisfies the following equation: F Y |C , ( y ) = [ F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( y f, ) P ( C , ) + F Y |C , ( y f, ) P ( C , ) ] / P ( C , ) . (C.16) (C.16) is the same as (C.7) with h = 3, and as we identify φ y f , for the case of k = 2 in the proof of Lemma 3, φ y f , in (C.16) is identified as φ y f , ( y ) = Q Y |C , ( [ F Y |C , ( y ) P ( C , ) − Σ 3 j =2 F Y j |C j , ( y f,j ) P ( C j , ) ] / P ( C , ) ) . D Additional discussion for identification as closed-form expressions In this section, we relax the assumptions that the closure of Y ◦ is equal to Y , and that F Y t | X ( Y ◦ | x ) does not depend on t ∈ T , and precisely derive the closed-form expressions of the treatment effects. To this end, we first define a subset of Y that is sufficiently large and contains Y ◦ , and then we derive the closed-form expressions of φ xs,t and the conditional c.d.f.'s of Y t on that subset of Y . Proofs of lemmas, propositions, and theorems are collected at the end of this section. Before we define the required subset of Y , we introduce some preliminary results that are useful in this section. Let V and W be scalar-valued random variables whose supports are V and W . Define V ∗ := { v ∈ R : F V ( v ) − F V ( v − ε ) > 0 for all ε > 0 } and W ∗ := { w ∈ R : F W ( w ) − F W ( w − ε ) > 0 for all ε > 0 } . The following lemmas show some useful properties of V ∗ and W ∗ . Lemma 8 (Size of W ∗ ). W contains W ∗ , and W ∗ contains W ◦ . Lemma 9 (Covering the quantiles).
(a) For all τ ∈ (0 , 1), Q W ( τ ) is contained in W ∗ . (b) If F W is continuous, then, for all τ ∈ (0 , 1), there exists w ∈ W ∗ such that F W ( w ) = τ holds. Lemma 10 (Identical supports). (a) If V ∗ ⊃ W ∗ holds, then V ⊃ W holds. (b) If F W is continuous and V ⊃ W holds, then V ∗ ⊃ W ∗ holds. Lemma 11 (Strict monotonicity on W ∗ ). F W is strictly increasing on W ∗ . Lemma 12 (Identical distributions). If F V ( w ) = F W ( w ) holds for all w ∈ W ∗ , and F W is continuous, then F V ( w ) = F W ( w ) holds for all w ∈ R . We now define the required subset of Y . For t ∈ T and x ∈ X , define Y ∗ := { y ∈ R : F Y t | X ( y | x ) − F Y t | X ( y − ε | x ) > 0 for all ε > 0 } . From Lemma 8, Y ∗ is a subset of Y and contains Y ◦ . The following proposition shows that φ xs,t and the conditional c.d.f.'s of Y t are identified on Y ∗ as well as Y ◦ : Proposition 1. Assume that Assumptions 1-3 hold. Define p t ( z, x ) as in (2.2). (a) For each s ∈ T and x ∈ X , F Y s | X ( y | x ) for y ∈ Y ∗ can be expressed as (2.3). (b) Assume that P ( D tz ≤ D tz ′ | X = x ) = 1 and P ( C tz,z ′ | X = x ) > 0 hold for t ∈ T , ( z, z ′ ) ∈ P , and x ∈ X . Then, for all t ′ ∈ T , F Y t ′ |C tz,z ′ ( y ) for y ∈ Y ∗ can be expressed as (4.16). (c) For all s, t ∈ T and x ∈ X , φ xs,t ( y ) for y ∈ Y ∗ is identified. Remark 3. Proposition 1 modifies the results in the main paper that hold on Y ◦ . Part (a) of Proposition 1 modifies part (a) of Lemma 1. Part (b) of Proposition 1 modifies Lemma 4. Part (c) of Proposition 1 modifies Lemma 3. The closed-form expressions derived in the proof of Lemma 3 for the case of k = 2 and in Appendix C also hold on Y ∗ . Y ∗ does not depend on t ∈ T and x ∈ X from Lemma 10.
We assume that the support of the conditional distribution of Y t given X = x is Y . In the proofs of Lemmas 1-4, we use strict monotonicity of F Y t | X ( y | x ) and F Y t |C tz,z ′ X ( y | x ) in y ∈ Y ◦ in order to derive the closed-form expressions of φ xs,t and the conditional c.d.f.'s of Y t on Y ◦ . From Lemma 11, F Y t | X ( y | x ) is strictly increasing in y ∈ Y ∗ under part (A1) of Assumption 1. From Lemmas 10 and 11, F Y t |C tz,z ′ X ( y | x ) for each ( z, z ′ ) ∈ P is also strictly increasing on Y ∗ under part (A4) of Assumption 2. Therefore, applying Lemmas 10 and 11 instead of Lemma 5 leads to the closed-form expressions of φ xs,t and the conditional c.d.f.'s of Y t on Y ∗ . We then apply Proposition 1 to precisely derive the closed-form expressions of the treatment effects. With Proposition 1 at hand, the following proposition shows that the treatment effects are identified as closed-form expressions: Proposition 2. Assume that Assumptions 1-3 hold. (a) For each s ∈ T and x ∈ X , Q Y s | X ( τ | x ) for τ ∈ (0 , 1) and E [ Y s | X = x ] can be expressed as Q Y s | X ( τ | x ) = inf { y ∈ Y ∗ : F Y s | X ( y | x ) ≥ τ } (D.1) and (2.4), respectively. (b) Assume that P ( D tz ≤ D tz ′ | X = x ) = 1 and P ( C tz,z ′ | X = x ) > 0 hold for t ∈ T , ( z, z ′ ) ∈ P , and x ∈ X . Then, for all t ′ ∈ T , Q Y t ′ |C tz,z ′ X ( τ | x ) for τ ∈ (0 , 1) and E [ Y t ′ |C tz,z ′ , X = x ] can be expressed as Q Y t ′ |C tz,z ′ X ( τ | x ) = inf { y ∈ Y ∗ : F Y t ′ |C tz,z ′ X ( y | x ) ≥ τ } (D.2) and (4.17), respectively. Remark 4. In the main paper, we show (2.4) and (4.17) in part (b) of Lemma 1 and Lemma 4, respectively. The proofs in the main paper use the assumption that the closure of Y ◦ is equal to Y . In the proof of Proposition 2, we show (2.4) and (4.17) without assuming that condition. Proposition 2 follows from the fact that Y ∗ is sufficiently large such that identification of the distribution on Y ∗ leads to identification of the whole distribution.
From Lemma 9, Y ∗ contains all the quantiles of the conditional distribution of Y t given X = x . Lemma 12 implies that F Y s | X ( ·| x ) on R is specified when F Y s | X ( ·| x ) on Y ∗ is specified. Hence, identification of F Y s | X ( ·| x ) on Y ∗ leads to identification of Q Y s | X ( ·| x ) on (0 , 1) and E [ Y s | X = x ]. Proofs of Propositions 1 and 2 imply that the theorems in the main paper do not rely on the assumptions that the closure of Y ◦ is equal to Y , and that F Y t | X ( Y ◦ | x ) does not depend on t ∈ T . We modify the proofs of the theorems accordingly. In the main paper, the assumption that the closure of Y ◦ is equal to Y is used to assure that identification on the interior of the support leads to identification on the whole support. However, Y ∗ is sufficiently large and does not require such a condition to identify the conditional distributions of Y t on Y . In the main paper, the assumption that F Y t | X ( Y ◦ | x ) does not depend on t ∈ T is used to assure the existence of an inverse mapping φ x −1 s,t on Y ◦ . However, φ x −1 s,t exists on Y ∗ from the fact that F Y t | X ( Y ∗ | x ) = (0 , 1) holds and that F Y t | X ( Y ∗ | x ) does not depend on t ∈ T . F Y t | X ( Y ∗ | x ) = (0 , 1) follows from the fact that Y ∗ contains all the quantiles of the conditional distribution of Y t given X = x from Lemma 9. We finally show Lemmas 8-12 and Propositions 1 and 2, and modify the proofs of Theorems 1 and 2. Proofs are as follows. Proof of Lemma 8. First, we show that W contains W ∗ . For w ∈ W ∗ , F W ( w + ε ) > F W ( w − ε ) holds for all ε > 0 because F W ( w + ε ) ≥ F W ( w ) holds. Hence, w is also contained in W , and W contains W ∗ . Next, we show that W ∗ contains W ◦ . Let w be a point not contained in W ∗ . Then, there exists ε > 0 such that F W ( w ) = F W ( w − ε ) holds. Suppose that w is contained in W ◦ . Then, there exists δ > 0 such that 0 < η ≤ δ ⇒ w − η ∈ W holds. If that η also satisfies η ≤ ε/ 2, then F W ( w ) > F W ( w − η ) holds because w − η is contained in W .
However, this contradicts the fact that $F_W(w-\eta) \ge F_W(w-\varepsilon)$ holds, because $F_W(w) = F_W(w-\varepsilon)$ holds. Hence, $w$ is not contained in $\mathcal{W}^\circ$, and $\mathcal{W}^*$ contains $\mathcal{W}^\circ$. Therefore, the stated result follows. ✷

Proof of Lemma 9. First, we show part (a). Suppose that there exists $\tau' \in (0,1)$ such that $Q_W(\tau')$ is not contained in $\mathcal{W}^*$. Then, there exists $\varepsilon > 0$ such that $F_W(Q_W(\tau')) = F_W(Q_W(\tau') - \varepsilon)$ holds. Because $F_W(Q_W(\tau')) \ge \tau'$ holds from the definition of $Q_W$, this implies $F_W(Q_W(\tau') - \varepsilon) \ge \tau'$, which contradicts the definition of $Q_W(\tau')$ as an infimum. Hence, $Q_W(\tau)$ is contained in $\mathcal{W}^*$ for all $\tau \in (0,1)$. Moreover, $F_W(Q_W(\tau)) = \tau$ holds for all $\tau \in (0,1)$ because $F_W$ is continuous. Therefore, the stated result follows by taking $w = Q_W(\tau)$ for each $\tau$. ✷

Proof of Lemma 10. First, we show part (a). Let $w \in \mathcal{W}$ satisfy $w \notin \mathcal{V}$. Then, there exists $\varepsilon > 0$ such that $F_V(w+\varepsilon) = F_V(w-\varepsilon)$ holds, because $F_V(w+\varepsilon) \ge F_V(w-\varepsilon)$ always holds. This implies that $(w-\varepsilon, w+\varepsilon)$ is not contained in $\mathcal{W}^*$ because $\mathcal{V}^* \supset \mathcal{W}^*$ holds and $\mathcal{V} \supset \mathcal{V}^*$ holds from Lemma 8. Then there exists $\tau \in (0,1)$ such that $F_W(\eta) = \tau$ holds for all $\eta \in (w-\varepsilon, w+\varepsilon)$. We proceed to show this property. Suppose that $\eta, \eta' \in (w-\varepsilon, w+\varepsilon)$ satisfy $\eta < \eta'$ and $F_W(\eta) < F_W(\eta')$. Then, $\eta < Q_W(F_W(\eta')) \le \eta'$ holds, and $Q_W(F_W(\eta'))$ is contained in $\mathcal{W}^*$ from Lemma 9. However, this contradicts the fact that $(w-\varepsilon, w+\varepsilon)$ is not contained in $\mathcal{W}^*$. Hence, the stated property follows. This property implies that $w$ is not contained in $\mathcal{W}$. However, this contradicts $w \in \mathcal{W}$. Therefore, $\mathcal{V} \supset \mathcal{W}$ holds.

Next, we show part (b). Let $w \in \mathcal{W}^*$ satisfy $w \notin \mathcal{V}^*$. Then there exists $\varepsilon > 0$ such that $F_V(w) = F_V(w-\varepsilon)$ holds, because $F_V(w) \ge F_V(w-\varepsilon)$ always holds. This implies that $(w-\varepsilon, w)$ is not contained in $\mathcal{W}$ because $\mathcal{V} \supset \mathcal{W}$ holds. Then there exists $\tau \in (0,1)$ such that $F_W(w-\eta) = \tau$ and $\tau < F_W(w)$ hold for all $0 < \eta < \varepsilon$. We proceed to show this property.
Suppose that $0 < \eta < \eta' < \varepsilon$ satisfy $F_W(w-\eta') < F_W(w-\eta)$. Then, $w-\eta' < Q_W(F_W(w-\eta)) \le w-\eta$ holds, and $Q_W(F_W(w-\eta))$ is contained in $\mathcal{W}^*$ from Lemma 9. However, this contradicts the fact that $(w-\varepsilon, w)$ is not contained in $\mathcal{W}$, because $Q_W(F_W(w-\eta))$ is also contained in $\mathcal{W}$ from Lemma 8. Hence, $F_W(w-\eta)$ is constant in $\eta$ on $(0, \varepsilon)$, and $\tau < F_W(w)$ holds because $w$ is contained in $\mathcal{W}^*$, so the stated property follows. However, this property contradicts the continuity of $F_W$. Therefore, $\mathcal{V}^* \supset \mathcal{W}^*$ holds. ✷

Proof of Lemma 11. Let $w_1, w_2 \in \mathcal{W}^*$ satisfy $w_1 < w_2$. Then, because $w_2$ is contained in $\mathcal{W}^*$, $F_W(w_2) > F_W(w_1)$ holds. Therefore, the stated result follows. ✷

Proof of Lemma 12. We show that $F_V(w) \ge F_W(w)$ holds for all $w \in \mathbb{R}$. Suppose that there exists $w \in \mathbb{R}$ such that $F_V(w) < F_W(w)$ holds. Then, for $F_V(w) < \tau' < F_W(w)$, there exists $w' \in \mathcal{W}^*$ such that $w' < w$ and $F_V(w') = F_W(w') = \tau'$ hold from Lemma 9. However, this implies that $\tau' \le F_V(w)$ holds, which contradicts $F_V(w) < \tau'$. Hence, $F_V(w) \ge F_W(w)$ holds for all $w \in \mathbb{R}$. We can show that $F_V(w) \le F_W(w)$ holds for all $w \in \mathbb{R}$ similarly. Therefore, the stated result follows. ✷

Proof of Proposition 1. For part (a), use Lemmas 10 and 11 instead of Lemma 5 in the proof of part (a) of Lemma 1. For part (b), use Lemmas 10 and 11 instead of Lemma 5 to show (4.16) in the proof of Lemma 4.

For part (c), it suffices to show that $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$ are identified on $\mathcal{Y}^*$. Identification of the other counterfactual mappings follows from identification of $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$, the fact that $\phi^{x,-1}_{s,t}$ exists on $\mathcal{Y}^*$, and the fact that $\phi^x_{s,r}$ is identified on $\mathcal{Y}^*$ if $\phi^x_{s,t}$ and $\phi^x_{t,r}$ are identified on $\mathcal{Y}^*$ for $s, t, r \in \mathcal{T}$. We first show that $\phi^{x,-1}_{s,t}$ exists on $\mathcal{Y}^*$. First, $\phi^x_{s,t}$ is strictly increasing on $\mathcal{Y}^*$ from part (A1) of Assumption 1 and Lemma 11. Second, $\phi^x_{s,t}(\mathcal{Y}^*) = \mathcal{Y}^*$ holds because $F_{Y_t|X}(\mathcal{Y}^*|x) = (0,1)$ holds and $F_{Y_t|X}(\mathcal{Y}^*|x)$ does not depend on $t \in \mathcal{T}$.
This is because $\mathcal{Y}^*$ contains all the quantiles of the conditional distribution of $Y_t$ given $X = x$ from Lemma 9. Hence, from the definition of $\phi^x_{s,t}$, an inverse mapping $\phi^{x,-1}_{s,t}$ exists on $\mathcal{Y}^*$, and $\phi^x_{t,s}(y) = \phi^{x,-1}_{s,t}(y)$ holds for $y \in \mathcal{Y}^*$. We next show that $\phi^x_{s,r}$ is identified on $\mathcal{Y}^*$ if $\phi^x_{s,t}$ and $\phi^x_{t,r}$ are identified on $\mathcal{Y}^*$. First, $\phi^x_{s,r} = \phi^x_{t,r} \circ \phi^x_{s,t}$ holds because $F_{Y_t|X}(Q_{Y_t|X}(\tau|x)|x) = \tau$ for $\tau \in (0,1)$ holds from part (A1) of Assumption 1. Second, $\phi^x_{s,t}(\mathcal{Y}^*) = \mathcal{Y}^*$ holds. Hence, the stated result follows. From the preceding argument, $y \in \mathcal{Y}^*$ implies $\phi^x_{s,t}(y) \in \mathcal{Y}^*$. We use this property in this proof.

For identification of $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$ on $\mathcal{Y}^*$, use Lemmas 10 and 11 instead of Lemma 5, and use the fact that $y \in \mathcal{Y}^*$ implies $\phi^x_{s,t}(y) \in \mathcal{Y}^*$. Therefore, $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$ are identified on $\mathcal{Y}^*$, and the other counterfactual mappings are also identified because they are inversions or compositions of $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$. To identify $\phi^x_{s,t}$ on $\mathcal{Y}^*$ when $\mathcal{T}$ is discrete in general, use Lemmas 10 and 11 instead of Lemma 5 in Appendix C. ✷

Proof of Proposition 2. We show part (a); part (b) follows from a similar argument. We first show (D.1). From Lemma 9, $F_{Y_t|X}(Q_{Y_t|X}(\tau|x)|x) = \tau$ and $Q_{Y_t|X}(\tau|x) \in \mathcal{Y}^*$ hold for all $\tau \in (0,1)$. Hence, the infimum in (D.1) is attained at $y = Q_{Y_t|X}(\tau|x)$, and (D.1) follows.

We next show (2.4). Similar to the proof of part (b) of Lemma 1, we have

$F_{Y_s|TZX}(y|t,z,x) = F_{\phi^x_{t,s}(Y_t)|TZX}(y|t,z,x)$ for $y \in \mathcal{Y}^*$. (D.3)

Applying Lemma 12 to (D.3) leads to

$F_{Y_s|TZX}(y|t,z,x) = F_{\phi^x_{t,s}(Y_t)|TZX}(y|t,z,x)$ for $y \in \mathbb{R}$. (D.4)

Hence, $Y_s \overset{d}{=} \phi^x_{t,s}(Y_t)$ conditional on $(T,Z,X) = (t,z,x)$ holds. Therefore, we obtain (2.4) as in the proof of part (b) of Lemma 1. ✷

Modified proof of Theorem 1. Because $\phi^x_{s,t}(y)$ for $y \in \mathcal{Y}^*$ is identified from part (c) of Proposition 1, $F_{Y_s|X}(y|x)$ for $y \in \mathcal{Y}^*$ is identified from part (a) of Proposition 1.
Hence, $Q_{Y_s|X}(\tau|x)$ for $\tau \in (0,1)$ and $E[Y_s|X=x]$ are identified from part (a) of Proposition 2. ✷

Modified proof of Theorem 2. We first modify the proof of Lemma 4 and show that $Q_{Y_{t'}|\mathcal{C}^t_{z,z'},X}(\tau|x)$ for $\tau \in (0,1)$ and $E[Y_{t'}|\mathcal{C}^t_{z,z'}, X = x]$ are identified for all $t' \in \mathcal{T}$ if $P(D^t_z \le D^t_{z'} \mid X = x) = 1$ and $P(\mathcal{C}^t_{z,z'} \mid X = x) > 0$ hold for $t \in \mathcal{T}$, $(z,z') \in \mathcal{P}$, and $x \in \mathcal{X}$. Because $\phi^x_{s,t}(y)$ for $y \in \mathcal{Y}^*$ is identified from part (c) of Proposition 1, $F_{Y_{t'}|\mathcal{C}^t_{z,z'},X}(y|x)$ for $y \in \mathcal{Y}^*$ is identified from part (b) of Proposition 1. Hence, $Q_{Y_{t'}|\mathcal{C}^t_{z,z'},X}(\tau|x)$ for $\tau \in (0,1)$ and $E[Y_{t'}|\mathcal{C}^t_{z,z'}, X = x]$ are identified from part (b) of Proposition 2. The stated result follows from applying this result instead of Lemma 4 to the proof in the main paper. ✷

E About the conditions for Assumption 2 in Section 3.1

In this section, we give some sufficient conditions for Assumption 2 introduced in Section 3.1. The following proposition provides sufficient conditions for parts (A3) and (A4) of Assumption 2.

Proposition 3. (a) Assume that parts (A1) and (A2) of Assumption 2 hold. Then, part (A3) of Assumption 2 holds when the covariance of $D^t$ and $Z$ conditional on $Z \in \{z, z'\}$ and $X = x$ is not 0 for all $t \in \mathcal{T}$.

(b) Assume that Assumption 1 and parts (A1)-(A3) of Assumption 2 hold. Then, part (A4) of Assumption 2 holds when the support of the joint conditional distribution of $U_t$ and $V$ given $X = x$ is the Cartesian product of the supports of the conditional distributions of $U_t$ and $V$ given $X = x$ for all $t \in \mathcal{T}$, where $T$ can be expressed as $T = \rho(Z, X, V)$ for some unknown function $\rho$ and an unobserved random vector $V$ from part (A3) of Assumption 1.

Proof of Proposition 3. First, we show part (a). Suppose that part (A3) of Assumption 2 does not hold. Then, there exists $(z, z') \in \Lambda$ such that $D^t_z \le D^t_{z'}$ holds almost surely for all $t \in \mathcal{T}$, but $P(\mathcal{C}^{t'}_{z,z'}) = 0$ holds for some $t' \in \mathcal{T}$.
Because we have $P(\mathcal{C}^{t'}_{z,z'}) = P(D^{t'} = 1 \mid Z = z') - P(D^{t'} = 1 \mid Z = z)$ by Lemma 2, we have $P(D^{t'} = 1 \mid Z = z') = P(D^{t'} = 1 \mid Z = z)$, which implies that $D^{t'}$ and $Z$ are independent conditional on $Z \in \{z, z'\}$. Hence, the covariance of $D^{t'}$ and $Z$ conditional on $Z \in \{z, z'\}$ is 0, and the stated result follows.

Next, we show part (b). Without loss of generality, assume that $P(\mathcal{C}^t_{z,z'}) > 0$. Let $\mathcal{U}_t$ be the support of the distribution of $U_t$. Let $\mathcal{V}_t$ and $\mathcal{W}_t$ be the supports of the conditional distributions of $U_t$ and $Y_t$ given $\mathcal{C}^t_{z,z'}$, respectively. We first show that $\mathcal{U}_t = \mathcal{V}_t$ holds. This follows from the fact that the compliers are characterized by $V$. To see this, we have

$P(\mathcal{C}^t_{z,z'}) = P(D^t_{z'} = 1) - P(D^t_z = 1) = P(\rho(z', V) = t) - P(\rho(z, V) = t)$ (E.1)

by Lemma 2. Because (E.1) shows that $\mathcal{C}^t_{z,z'}$ is an event determined by $V$, the product-support condition implies that conditioning on $\mathcal{C}^t_{z,z'}$ does not change the support of $U_t$.

We then show that $\mathcal{U}_t = \mathcal{V}_t$ implies $\mathcal{Y} = \mathcal{W}_t$. It suffices to show that $\mathcal{Y}^* = \mathcal{W}^*_t$ holds because $\mathcal{Y}^* = \mathcal{W}^*_t$ implies $\mathcal{Y} = \mathcal{W}_t$ from Lemma 10. We proceed to show that $\mathcal{Y}^* = \mathcal{W}^*_t$ holds. First, $\mathcal{Y}^* \supset \mathcal{W}^*_t$ follows from $\mathcal{Y} \supset \mathcal{W}_t$, and $\mathcal{Y} \supset \mathcal{W}_t$ holds from the definition of $\mathcal{W}_t$. Second, we show $\mathcal{Y}^* \subset \mathcal{W}^*_t$. For each $y^* \in \mathcal{Y}^*$, there exists $\tau^* \in (0,1)$ such that $y^* = Q_{Y_t}(\tau^*)$ holds from the definition of $\mathcal{Y}^*$. Take any $\varepsilon > 0$. Then, there exists $\delta > 0$ such that $y^* - \varepsilon = Q_{Y_t}(\tau^* - \delta)$ holds from the definition of $\mathcal{Y}^*$. Observe that $U_t \sim U(0,1)$ holds because $U_t = F_{Y_t}(Y_t)$ and $F_{Y_t}$ is continuous. Hence, $\tau^* \in \mathcal{U}^*_t$ and $F_{U_t}(\tau^*) - F_{U_t}(\tau^* - \delta) > 0$ hold. Then, $\tau^* \in \mathcal{V}^*_t$ and

$F_{U_t|\mathcal{C}^t_{z,z'}}(\tau^*) - F_{U_t|\mathcal{C}^t_{z,z'}}(\tau^* - \delta) > 0$ (E.2)

hold because $\mathcal{U}_t = \mathcal{V}_t$ implies $\mathcal{U}^*_t = \mathcal{V}^*_t$ from Lemma 10. This implies that

$F_{Y_t|\mathcal{C}^t_{z,z'}}(y^*) - F_{Y_t|\mathcal{C}^t_{z,z'}}(y^* - \varepsilon) > 0$ (E.3)

holds because

$\{U_t < \tau^*\} = \{Y_t < y^*\}$ and $\{U_t < \tau^* - \delta\} = \{Y_t < y^* - \varepsilon\}$ (E.4)

hold from the definition of $Q_{Y_t}$. Then applying (E.4) to (E.2) gives (E.3). Hence, $\mathcal{Y}^* \subset \mathcal{W}^*_t$ holds. Therefore, $\mathcal{Y}^* = \mathcal{W}^*_t$ holds and the stated result follows.
✷

F About the choice restrictions of Examples II and III in Section 3.2

In this section, we show the budget set relationships of the families that we assume for Examples II and III. As in Example I, economic analysis generates choice restrictions for Examples II and III from these budget set relationships.

F.1 Example II

First, we have the following obvious monotonicity inequalities arising from different values of $Z$ for each family $\omega \in \Omega$:

$D^i_j(\omega) \le D^i_i(\omega)$ for $i = 1, \ldots, k$ and $j \in \mathcal{T} \setminus \{i\}$. (F.1)

For $j \in \mathcal{T} \setminus \{0, i\}$, inequality (F.1) states that the family is induced toward buying house $i$ when the instrument changes from a voucher for house $j$ to a voucher for house $i$. For $j = 0$, inequality (F.1) states that the family is induced toward buying house $i$ when the instrument changes from no voucher to a voucher for house $i$.

Next, for the budget set $B_\omega(z, t)$ of family $\omega$, we can naturally assume the following relationships for $i = 1, \ldots, k$:

$B_\omega(j, 0) = B_\omega(i, 0)$ for $j \in \mathcal{T} \setminus \{i\}$, (F.2)

$B_\omega(l, i) = B_\omega(j, i) \subset B_\omega(i, i)$ for $j, l \in \mathcal{T} \setminus \{i\}$ and $j \ne l$. (F.3)

Relationship (F.2) holds because every voucher offers no discount, and hence produces the same budget set, if family $\omega$ does not buy any house. Relationship (F.3) describes the budget set of family $\omega$ if it purchases house $i$: the budget set of family $\omega$ is enlarged if it has a voucher that subsidizes house $i$, compared to a voucher that does not affect the choice set (voucher $j$ for $j \in \mathcal{T} \setminus \{i\}$ or no voucher). As in Example I, for $\omega \in \Omega$, applying the choice rule (3.7) to budget set relationships (F.2) and (F.3) generates the following monotonicity inequalities in addition to (F.1):

$D^j_i(\omega) \le D^j_0(\omega)$ for $i = 1, \ldots, k$ and $j \in \mathcal{T} \setminus \{i\}$. (F.4)

Therefore, from (F.1) and (F.4), we may assume the monotonicity inequalities of (3.8).
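The monotonicity inequalities (F.1) and (F.4) can be checked numerically in a toy version of Example II. The Python sketch below is my own illustration, not the paper's model: the quasi-linear utilities, the uniform prices, and the fixed voucher discount are all assumptions made only for this example. Each family buys the option with the highest net utility (option 0 is "buy no house"), a voucher for house $i$ lowers that house's price, and the resulting potential choice indicators satisfy (F.1) and (F.4) for every simulated family.

```python
import numpy as np

rng = np.random.default_rng(0)
k, discount, n_families = 3, 1.0, 500
T = range(k + 1)          # treatments: 0 (no house), 1, ..., k
Z = range(k + 1)          # vouchers:   0 (none),    1, ..., k

def choice(u, p, z):
    """Option chosen under voucher z: argmax of net utility (option 0 yields 0)."""
    net = [0.0] + [u[j] - p[j] + (discount if z == j + 1 else 0.0)
                   for j in range(k)]
    return int(np.argmax(net))

for _ in range(n_families):
    u = rng.normal(1.0, 1.0, k)   # utilities of houses 1..k
    p = rng.uniform(0.5, 1.5, k)  # prices of houses 1..k
    # Potential choice indicators D^t_z for every treatment/voucher pair.
    D = {(t, z): int(choice(u, p, z) == t) for t in T for z in Z}
    for i in range(1, k + 1):
        for j in T:
            if j != i:
                assert D[(i, j)] <= D[(i, i)]   # (F.1)
                assert D[(j, i)] <= D[(j, 0)]   # (F.4)
```

Both inequalities hold deterministically under this choice rule: granting a voucher for house $i$ can only make house $i$ more attractive, and removing it can only make every other option weakly more attractive.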
F.2 Example III

First, as in (F.1) in Example II, we have the following obvious monotonicity inequality arising from different values of $Z$ for each family $\omega \in \Omega$:

$D^k_0(\omega) \le D^k_k(\omega)$. (F.5)

Second, for the budget set $B_\omega(z, t)$ of family $\omega$, we can naturally assume the following relationships for $i = 1, \ldots, k$:

$B_\omega(j, 0) = B_\omega(i, 0)$ for $j \in \mathcal{T} \setminus \{i\}$, (F.6)

$B_\omega(0, i) = B_\omega(j, i)$ for $j \in \{i+1, \ldots, k\}$ and $B_\omega(0, i) \subset B_\omega(l, i)$ for $l \in \{1, \ldots, i\}$. (F.7)

Relationship (F.6) can be interpreted in the same way as relationship (F.2). Relationship (F.7) describes the budget set of family $\omega$ if it purchases house $i$: the budget set of family $\omega$ is enlarged if it has a voucher that subsidizes house $i$ (voucher $l$ for $l \in \{1, \ldots, i\}$) compared to a voucher that does not affect the choice set (voucher $j$ for $j \in \{i+1, \ldots, k\}$ or no voucher). As in Example I, for $\omega \in \Omega$, applying the choice rule (3.7) to budget set relationships (F.6) and (F.7) generates the following monotonicity inequalities in addition to (F.5):

$D^i_i(\omega) \ge D^i_{i+1}(\omega)$ and $D^j_i(\omega) \le D^j_{i+1}(\omega)$ for $i = 1, \ldots, k-1$ and $j \in \mathcal{T} \setminus \{i\}$, and $D^j_k(\omega) \le D^j_0(\omega)$ for $j \in \mathcal{T} \setminus \{k\}$. (F.8)

Therefore, from (F.5) and (F.8), we may assume the monotonicity inequalities of (3.9).

G The relationship between the compliers of Example II

In this section, we derive the relationships between the compliers of (4.9) in Section 3.2. First, for $(1, 0), \ldots, (k, 0)$ in Table II, observe that, from the definition of the compliers, we have

$\mathcal{C}^i_{0,i} = \bigcup_{j \ne i} \{D^i_i = 1, D^j_0 = 1\}$ for $i = 1, \ldots, k$. (G.1)

The sets $\{D^i_i = 1, D^j_0 = 1\}$ and $\{D^i_i = 1, D^l_0 = 1\}$ are disjoint for $j \ne l$. This is because, for $i = 1, \ldots, k$, a family $\omega$ contained in $\mathcal{C}^i_{0,i}$ will either buy a house other than house $i$ or buy no house if it has no voucher. Second, we obtain

$\mathcal{C}^j_{i,0} = \{D^i_i = 1, D^j_0 = 1\}$ for $i = 1, \ldots, k$ and $j \in \mathcal{T} \setminus \{i\}$. (G.2)

To see this, suppose that a family $\omega$ contained in $\mathcal{C}^j_{i,0}$ chooses house $l$, where $l \in \mathcal{T} \setminus \{0, i, j\}$, when its voucher assignment is $i$. Then, because $D^l_i \le D^l_0$ holds, it would also choose house $l$ when its voucher assignment is 0, which contradicts the definition of $\mathcal{C}^j_{i,0}$. Hence, it will choose house $i$ if its voucher assignment is $i$. Therefore, the relationships of (4.9) follow from (G.1) and (G.2).

H About the condition for discrete instruments

In this section, we discuss the reason why we assume that the treatment does not have larger support than the instrumental variable. Let $T$ take $k+1$ values, and assume that Assumptions 1 and 2 hold. We assume that $Z$ contains at least $k+1$ values, and we cannot apply our identification analysis if this condition does not hold.

Suppose that $Z$ contains only $k$ values and that $k$ pairs of instrument values $\lambda_1, \ldots, \lambda_k$ are contained in the monotonicity subset $\Lambda$. We show that at least one monotonicity relationship gives no additional information beyond the other monotonicity relationships. To this end, we first show that at least one pair of instrument values (without loss of generality, suppose $\lambda_k$) satisfies the following property:

For $\lambda_i$, both elements of $\lambda_i = (\lambda_{i1}, \lambda_{i2})$ are contained in at least one of the other pairs $\lambda_j$ for $j \in \{1, \ldots, k\} \setminus \{i\}$. (H.1)

Suppose instead that for every $l \in \{1, \ldots, k\}$, one of the elements of $\lambda_l$ is not contained in any of the other pairs. Then $Z$ needs to contain at least $k+1$ values, because we need at least one value contained in more than one pair and $k$ more values, one contained in each pair respectively.

We deal with the case in which the two elements of $\lambda_k = (\lambda_{k1}, \lambda_{k2})$ are contained in different pairs. When they are contained in the same pair (without loss of generality, suppose $\lambda_1$), the equations for the potential outcome conditional c.d.f.'s given the compliers are the same for $\lambda_1$ and $\lambda_k$, and the monotonicity relationship of $\lambda_k$ gives no additional information beyond the other relationships $\lambda_1, \ldots, \lambda_{k-1}$.

Suppose that $\lambda_k$ is the only pair that satisfies property (H.1). We show that the monotonicity relationship of $\lambda_k$ gives no additional information beyond the other relationships $\lambda_1, \ldots, \lambda_{k-1}$; a similar argument applies when other pairs also satisfy this property. Without loss of generality, suppose that $Z = \{0, 1, \ldots, k-1\}$, $\lambda_l = (0, l)$ for $l \in \{1, \ldots, k-1\}$, and $\lambda_k = (1, 2)$. Then, we have

$P(\mathcal{C}^t_{1,2}) = P(\mathcal{C}^t_{0,2}) - P(\mathcal{C}^t_{0,1})$ for $t \in \mathcal{T}$. (H.2)

Hence, the equation for the potential outcome conditional c.d.f.'s given the compliers of $\lambda_k$ is obtained by subtracting that of $\lambda_1$ from that of $\lambda_2$, and fewer than $k$ equations are induced from $\lambda_1, \ldots, \lambda_k$. We cannot identify the $k$ counterfactual mappings from only $k-1$ equations.
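The subtraction behind (H.2) can be illustrated with a small simulation. The sketch below is my own toy model, not the paper's: for a fixed treatment value, the potential treatment indicator is a threshold crossing $D_z = 1\{V \le c_z\}$ with cutoffs $c_0 \le c_1 \le c_2$, so that $D_0 \le D_1 \le D_2$ holds (monotonicity). The complier event between $z$ and $z'$ is $\{D_z = 0, D_{z'} = 1\}$, and its probability for the pair $(1,2)$ equals the difference of those for $(0,2)$ and $(0,1)$, so the pair $(1,2)$ contributes no equation beyond those of $(0,1)$ and $(0,2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.uniform(size=1_000_000)          # unobserved heterogeneity
c = {0: 0.2, 1: 0.5, 2: 0.8}             # illustrative cutoffs, c_0 <= c_1 <= c_2
D = {z: (V <= c[z]).astype(int) for z in c}   # monotone potential indicators

def p_complier(z, z_prime):
    """Empirical probability of the complier event {D_z = 0, D_{z'} = 1}."""
    return np.mean((D[z] == 0) & (D[z_prime] == 1))

lhs = p_complier(1, 2)
rhs = p_complier(0, 2) - p_complier(0, 1)
assert abs(lhs - rhs) < 1e-9   # (H.2) holds exactly, observation by observation
assert abs(lhs - 0.3) < 5e-3   # population value is c_2 - c_1 = 0.3
```

Under monotonicity the indicator of $\{c_1 < V \le c_2\}$ equals the indicator of $\{c_0 < V \le c_2\}$ minus that of $\{c_0 < V \le c_1\}$ for every draw, so the identity is exact and not merely approximate in large samples.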