Identification of multi-valued treatment effects with unobserved heterogeneity
Koki Fusejima∗
Graduate School of Economics, University of Tokyo
This version: October 12, 2020
Abstract
In this paper, we establish sufficient conditions for identifying the treatment effects on continuous outcomes in endogenous and multi-valued discrete treatment settings with unobserved heterogeneity. We employ the monotonicity assumption for multi-valued discrete treatments and instruments, and our identification condition is easy to interpret economically. Our result contrasts with related work by Chernozhukov and Hansen (2005) in this respect. We also establish identification of the local treatment effects in multi-valued treatment settings and derive closed-form expressions for the identified treatment effects. We give examples to verify the usefulness of our result.
Keywords:
Treatment effects, unobserved heterogeneity, identification, endogeneity, instrumental variables, monotonicity

∗Email: [email protected]. I would like to thank my advisor Katsumi Shimotsu, Hidehiko Ichimura, Yuichi Kitamura, Hiroaki Kaido, Takuya Ishihara, Ryo Imai, Ryota Yuasa, and the seminar participants at University of Tokyo, Otaru University of Commerce, and Hitotsubashi University for their helpful comments on this research. This research is supported by Grant-in-Aid for JSPS Research Fellow (20J20046) from the JSPS. All the errors are mine.
1 Introduction
Unobserved heterogeneity in treatment effects is important in many empirical studies in economics. As discussed in Heckman (2001), for example, economic theory and applications strongly suggest that causal effects of treatments or policy variables vary across individuals and subpopulations with the same observable characteristics. In the presence of such heterogeneity, different treatment effects can be defined for different subpopulations with the same unobservable characteristics. Quantile treatment effects are able to characterize heterogeneous impacts of treatments on different levels of unobserved components in terms of potential outcome quantiles. Under instrumental variable (IV) methods, the local treatment effect, first introduced by Imbens and Angrist (1994), is the treatment effect conditional on the unobservable subpopulation whose treatment states are affected by the instrument.

In this paper, we establish sufficient conditions for identifying the treatment effects on continuous outcomes in endogenous and multi-valued discrete treatment settings with unobserved heterogeneity. As is the case for any parameter, identification is a prerequisite for consistent estimation. IV methods provide a powerful tool to identify causal effects under treatment endogeneity. Instruments are discrete in many empirical applications, and we use only discrete instruments for identification.

For discrete treatments, treatments are implicitly or explicitly multi-valued in many applications. For example, households may receive different levels of transfers in anti-poverty programs, and participants in a training program may receive different hours of training.
It is important for the policy maker, who needs to decide which treatment level is appropriate, to compare the multi-valued treatment effects.

For the multi-valued endogenous treatment case, Chernozhukov and Hansen (2005) establish identification of the quantile treatment effects (on the observed populations) with a discrete instrument. However, there are two difficulties in directly applying their identification results in practice. First, their identification results require some numerical conditions on the conditional densities of the outcome variable, which are testable in principle but difficult to check in practice. Second, estimation based on the induced moment restriction is complicated by the non-smoothness and non-convexity of the corresponding generalized method of moments (GMM) objective function.

For the first point, if the treatment is binary, Vuong and Xu (2017) and Wüthrich (2019) show identification under an economically interpretable condition called "monotonicity." This condition is introduced by Imbens and Angrist (1994), and the potential outcome distributions are identified as closed-form expressions conditional on a subpopulation called "compliers" under this condition. Vuong and Xu (2017) and Wüthrich (2019) exploit limited variations of the instrument by matching these two conditional distributions and identify the full-population distributions of the potential outcomes as closed-form expressions. This idea of matching two distributions is introduced by Athey and Imbens (2006) in nonlinear difference-in-differences models.

For the second point, Wüthrich (2019) and Feng et al. (2020) develop plug-in estimation approaches based on these closed-form expressions when the treatment is binary. This estimation strategy naturally bypasses the challenges associated with optimizing the GMM objective function and remains computationally tractable.
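To illustrate the plug-in idea in the binary case, the complier c.d.f. formula implied by monotonicity (the binary-treatment special case of the closed-form expressions derived in Section 4) can be estimated by replacing population moments with sample means. The simulated design below (uniform selection heterogeneity, hypothetical outcome equations) is only an illustrative sketch, not the estimators of Wüthrich (2019) or Feng et al. (2020):

```python
import numpy as np

def complier_cdf1(y_grid, Y, D, Z):
    """Plug-in estimate of F_{Y_1 | compliers}(y) for binary D and binary Z:
    [E(1{Y<=y} D | Z=1) - E(1{Y<=y} D | Z=0)] / [E(D | Z=1) - E(D | Z=0)]."""
    z1, z0 = (Z == 1), (Z == 0)
    den = D[z1].mean() - D[z0].mean()                 # P(compliers)
    num = np.array([(D[z1] * (Y[z1] <= y)).mean()
                    - (D[z0] * (Y[z0] <= y)).mean() for y in y_grid])
    return num / den

rng = np.random.default_rng(0)
n = 200_000
V = rng.uniform(size=n)                  # unobserved selection heterogeneity
Z = rng.integers(0, 2, size=n)           # randomized binary instrument
D = (V <= 0.3 + 0.4 * Z).astype(int)     # compliers: V in (0.3, 0.7]
Y1, Y0 = 1.0 + V, V                      # hypothetical potential outcomes
Y = np.where(D == 1, Y1, Y0)             # observed outcome

F_hat = complier_cdf1(np.array([1.5]), Y, D, Z)
# Among compliers, Y_1 = 1 + V with V uniform on (0.3, 0.7], so F(1.5) = 0.5.
```

Each ingredient of the formula is a sample mean, so no numerical optimization is involved.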
For estimation based on the GMM objective function, reliable and practically useful methods have been developed, especially for parametric structural quantile models. See Chernozhukov and Hansen (2006), Chen and Lee (2018), Zhu (2018), and Kaido and Wüthrich (2018) for linear-in-parameters quantile models, and see Chernozhukov and Hong (2003) and de Castro et al. (2019) for nonlinear quantile models. Nonparametric estimation approaches are studied by Chernozhukov et al. (2007), Horowitz and Lee (2007), Chen and Pouzo (2009, 2012), and Gagliardini and Scaillet (2012).

The main contribution of this paper is as follows. We establish economically interpretable sufficient conditions for identifying the potential outcome distributions in multi-valued treatment settings, and we derive the closed-form expressions of the potential outcome distributions. We show that, in principle, identification is achieved without assuming numerical conditions on the distribution of the outcome variable, which may be useful when designing a social experiment where the outcome data will be collected later. We also establish identification of the local treatment effects in multi-valued treatment settings under our assumptions. This idea resembles that of Das (2005), who develops an estimation strategy based on the closed-form expression of the regression function with discrete endogenous treatments in the nonparametric regression model with an additive error term.

To this end, we generalize the monotonicity assumption to multi-valued treatments. Our assumption is close to the "unordered monotonicity" assumption introduced by Heckman and Pinto (2018), but we employ a weaker assumption that holds only on a particular subset of the pairs of values the instrument can take. The monotonicity assumption is originally assumed on all the pairs of instrument values, and this requirement for the pairs of values is also adopted in many studies.
However, as we see in our examples, this requirement may be too strong when the instrument is multi-valued. Recently, Mogstad et al. (2019) and Mountjoy (2019) discuss a similar problem when there are multiple instruments, for the binary and multi-valued unordered treatment cases respectively. They introduce weaker monotonicity assumptions that hold only on pairs of each element, holding all other elements fixed.

In multi-valued treatment settings, generally no two treatment states can be compared on the same subpopulation under a single monotonicity relationship because the selection mechanism becomes more complicated than in the binary treatment case. We overcome this difficulty by developing systems of monotonicity relationships that can be solved simultaneously for multiple comparisons of the treatment states.

There is a rich literature on identification of treatment effects with unobserved heterogeneity using IV methods. We review some papers that provide results directly relevant to this paper for identification of multi-valued treatment effects. For the treatment effects on the full population (or some observed subpopulations), Feng (2019) and Caetano and Escanciano (2020) establish identification when the instrument has smaller support than the treatment, using the observed covariates. Feng (2019) assumes the existence of exogenous covariates, where the covariates and the instrument are jointly independent of the unobservables in both the outcome and selection equations. Caetano and Escanciano (2020) assume the existence of continuous covariates in which the outcome equation has a separable structure.

Identification of local treatment effects in multi-valued treatment settings also has the difficulty arising from the complex selection mechanism.
Under the monotonicity assumption generalized in each study, Angrist and Imbens (1995) identify a weighted average of the local average treatment effects that compare treatment states t and t − 1, and Heckman et al. (2006) identify the treatment effects that compare treatment state t and the set of other states with continuous instruments. Heckman et al. (2006) also establish identification on a wider variety of subpopulations that are identified with continuous instruments. For a more general class of selection models, Lee and Salanié (2018) show a similar identification result for the average treatment effects that compare any two different treatment states t and t′.

In this paper, we identify the local treatment effect that compares any two different treatment states t and t′ with a discrete instrument using the systems of monotonicity relationships we develop. We establish identification when the outcome variable is continuously distributed under our monotonicity assumption with some additional assumptions on unobservable factors.

The remainder of the paper is organized as follows. In Section 2, we introduce notation, basic assumptions, and a map called the "counterfactual mapping." The counterfactual mapping is developed by Vuong and Xu (2017), and this map is an important tool for identification. In Section 3, we introduce the monotonicity assumptions and give an example where economic theory can justify our monotonicity assumption. In Section 4, we establish identification of the treatment effects using our monotonicity assumption. In Section 5, we apply our identification result to a real-world social experiment called "Moving to Opportunity." Section 6 concludes. Proofs of the main results and some auxiliary results are collected in Appendices A and B respectively. Some additional discussions are collected in Appendices C-H in the Supplemental Appendix.

Throughout this paper, we use the notations F_A and Q_A for the unconditional cumulative distribution function (c.d.f.) and quantile function (q.f.) of a scalar-valued random variable A, respectively.
Similarly, for a set D and random vectors B and C, F_{A|D,BC}(·|b, c) and Q_{A|D,BC}(·|b, c) denote the conditional c.d.f. and q.f. of A on D ∩ {(B, C) = (b, c)}, respectively. Let D° denote the interior of D.

Let (Ω, F, P) be a common probability space. Assume a finite collection of multiple treatment statuses (categorical or ordinal) indexed by t ∈ T where, without loss of generality, T = {0, 1, 2, . . . , k} with k ∈ N. Let Y_0, Y_1, . . . , Y_k with Y_t ∈ Y ⊂ R and E[|Y_t|] < ∞ denote the potential outcomes under each treatment level. The random variable T indicates which of the k + 1 potential outcomes is observed. Define D_t := 1{T = t}, where 1{A} is the indicator function of a set A. D_t is an indicator function of each treatment level. Then, the observed outcome can be represented as Y = Σ_{t=0}^k Y_t D_t. Assume that we also observe a random vector X ∈ X ⊂ R^r of covariates and a discrete random variable Z ∈ Z for the instrument, where Z contains at least k + 1 values. For x ∈ X, define U_t := F_{Y_t|X}(Y_t|x) conditional on X = x. U_t is called the "rank variable." The rank variable characterizes heterogeneity of outcomes for individuals with the same observed characteristics by the relative ranking in terms of potential outcomes. For notational simplicity, we assume that the supports of the conditional distributions of T, Y_t, Y, and Z given X = x are equal to T, Y, Y, and Z respectively. To simplify the proofs in the main paper, we also assume that the closure of Y° is equal to Y, and that F_{Y_t|X}(Y°|x) does not depend on t ∈ T. The results in this paper do not rely on these restrictions, and we relax them in Appendix D.

In this paper, for two different treatment levels t and t′, we are interested in the average treatment effect (ATE): E[Y_t] − E[Y_{t′}], and the quantile treatment effect (QTE): Q_{Y_t}(τ) − Q_{Y_{t′}}(τ), where τ ∈ (0, 1).
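As a simple numerical illustration of these objects (under rank invariance, with hypothetical quantile functions; the covariate X is suppressed), the ATE and QTE can be computed directly from potential outcomes generated by a common rank variable:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=100_000)       # rank variable, uniform on (0, 1)

# Hypothetical potential-outcome quantile functions for levels t and t'
Q_t = lambda u: 2.0 * u + 1.0       # Y_t  = Q_{Y_t}(U)
Q_tp = lambda u: u ** 2             # Y_t' = Q_{Y_t'}(U)
Y_t, Y_tp = Q_t(U), Q_tp(U)

ate = Y_t.mean() - Y_tp.mean()      # E[Y_t] - E[Y_t'] = 2 - 1/3
tau = 0.5
qte = np.quantile(Y_t, tau) - np.quantile(Y_tp, tau)  # 2 - 0.25 at tau = 0.5
```

Here the QTE compares individuals at the same rank U = τ, which is the sense in which quantile treatment effects characterize heterogeneity in unobserved components.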
We are also interested in the local treatment effects, which we introduce in Section 4.4. For identification of these treatment effects, it suffices to identify the conditional mean and q.f. of the potential outcomes given X = x. We identify them under the following set of assumptions. Chernozhukov and Hansen (2005), Vuong and Xu (2017), and Wüthrich (2019) employ similar sets of assumptions. (See Appendix H for the reason why we assume that the instrument does not have smaller support than the treatment, and see Remark 1 for related discussions.)

Assumption 1 (Instrument independence and rank similarity). For all x ∈ X, the following conditions hold:

(A1) Potential outcomes: For each t ∈ T, F_{Y_t|X}(·|x) is continuous.

(A2) Independence: Conditional on X = x, {U_t}_{t=0}^k are independent of Z.
(A3) Selection: T can be expressed as T = ρ(Z, X, V) for some unknown function ρ and random vector V.

(A4) Rank similarity: Conditional on (X, Z, V) = (x, z, v), {U_t}_{t=0}^k are identically distributed.

Part (A1) of Assumption 1 imposes continuity of the potential outcome c.d.f. Under part (A1) of Assumption 1, F_{Y_t|X}(y|x) is strictly increasing in y ∈ Y°, and Q_{Y_t|X}(τ|x) is strictly increasing in τ ∈ (0, 1). Chernozhukov and Hansen (2005) directly assume that Q_{Y_t|X}(τ|x) is strictly increasing in τ ∈ (0, 1) and additionally impose continuous differentiability around each quantile. Vuong and Xu (2017) assume that Q_{Y_t|X}(τ|x) is continuous and strictly increasing in τ ∈ (0, 1). Under part (A1) of Assumption 1, the rank variable U_t follows a uniform distribution on (0, 1) conditional on X = x, and Y_t and Q_{Y_t|X}(U_t|x) are identically distributed conditional on X = x. Hence, we can interpret the QTE as the treatment effect on individuals with the same level of unobserved heterogeneity at some level U_t = τ. Part (A2) of Assumption 1 imposes conditional independence between the potential outcomes and the instrument. Part (A3) of Assumption 1 states a general selection equation where the random vector V captures unobserved factors affecting selection into treatment. Part (A4) of Assumption 1 is called "rank similarity." Rank similarity is arguably strong, but this condition has important implications for identification and is consistent with many empirical situations. (We show these properties in Lemmas 5 and 6 in Appendix B. In this paper, we focus on Section 2 of Vuong and Xu (2017); Vuong and Xu (2017) consider several settings, and they relax some restrictions imposed on the outcome and selection equations in other sections.)

For the rank variable, we employ a slightly different definition from the original definition of Chernozhukov and Hansen (2005). Chernozhukov and Hansen (2005) define the rank variable U_t as a uniformly distributed random variable on (0, 1) that satisfies Y_t = Q_{Y_t|X}(U_t|x) conditional on X = x. The difference does not matter in our settings because Y_t and Q_{Y_t|X}(U_t|x) are identically distributed conditional on X = x. We can alternatively assume a stronger assumption called "rank invariance" that assumes U_t = U_{t′} for any t ≠ t′. (This condition corresponds to the nonseparable regression model Y = g(T, X, ε) where g(t, x, ·) is strictly increasing in a scalar error term ε that follows a uniform distribution on (0, 1), and the ITE is g(t, x, ε) − g(t′, x, ε). We can identify the ITE if we alternatively employ this setting with the additional assumptions assumed in Vuong and Xu (2017).)

The main statistical implication of Assumption 1 is that, for each t ∈ T and x ∈ X, the conditional q.f. of Y_t given X = x satisfies the following nonlinear moment condition (Chernozhukov and Hansen (2005), Theorem 1):

Σ_{t=0}^k F_{Y|TZX}( Q_{Y_t|X}(τ|x) | t, z, x ) p_t(z, x) = τ,    (2.1)

where p_t(z, x) is defined as

p_t(z, x) := P(T = t | Z = z, X = x).    (2.2)

This moment condition (2.1) does not identify Q_{Y_t|X}(τ|x) without additional assumptions. Roughly speaking, Chernozhukov and Hansen (2005) show that the key condition for point identification is full rank of Jacobian matrices characterized by the conditional densities of the observable outcome variable. The identification condition of Chernozhukov and Hansen (2005) is in principle directly testable, but this condition requires more than sufficiently strong correlation between the endogenous variable and the instruments, and this condition is difficult to interpret economically. Moreover, estimation based on (2.1) is complicated by the non-smoothness and non-convexity of the corresponding objective function, which occurs even for linear-in-parameters quantile models.

In this section, we introduce the counterfactual mapping in multi-valued treatment settings. We show that identification of the potential outcome distributions results in identification of the counterfactual mappings. For s, t ∈ T and x ∈ X, define φ^x_{s,t} : R → R as φ^x_{s,t}(y) := Q_{Y_t|X}( F_{Y_s|X}(y|x) | x ).
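As a minimal numerical sketch of this map (with directly simulated potential outcomes, which are never jointly observed in practice, and illustrative functional forms), the empirical analogue of φ_{s,t} = Q_{Y_t} ∘ F_{Y_s} carries each quantile of Y_s to the corresponding quantile of Y_t:

```python
import numpy as np

def ecdf(sample):
    """Empirical c.d.f. of a sample, returned as a callable F(y)."""
    s = np.sort(sample)
    return lambda y: np.searchsorted(s, y, side="right") / len(s)

def phi(sample_s, sample_t):
    """Empirical counterfactual mapping phi_{s,t}(y) = Q_{Y_t}(F_{Y_s}(y))."""
    F_s = ecdf(sample_s)
    return lambda y: np.quantile(sample_t, F_s(y))

rng = np.random.default_rng(0)
U = rng.uniform(size=100_000)   # common rank variable (rank invariance)
Y_s = np.exp(U)                 # Y_s = Q_{Y_s}(U), strictly increasing in U
Y_t = 2.0 * U + 1.0             # Y_t = Q_{Y_t}(U)

phi_st = phi(Y_s, Y_t)
# The median exp(0.5) of Y_s is mapped near the median 2.0 of Y_t, and
# phi_st(Y_s) has (approximately) the same distribution as Y_t.
```

This is exactly the distribution-matching idea mentioned above: once the two distributions being matched are identified, so is the map between them.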
φ^x_{s,t} is called the conditional "counterfactual" mapping from Y_s to Y_t given X = x because the potential outcomes are also called "counterfactual outcomes." Vuong and Xu (2017) define a similar mapping for the binary treatment case. From the definition, this mapping relates the quantiles of the conditional distribution of Y_s to those of Y_t given X = x. Under Assumption 1, this mapping is strictly increasing on Y°, and φ^x_{s,r} = φ^x_{t,r} ∘ φ^x_{s,t} holds for s, t, r ∈ T. Moreover, under the assumption that F_{Y_t|X}(Y°|x) does not depend on t ∈ T, an inverse mapping φ^{x,−1}_{s,t} exists on Y°, and φ^x_{t,s}(y) = φ^{x,−1}_{s,t}(y) holds for y ∈ Y°. Similarly, we define the unconditional counterfactual mapping for s, t ∈ T as φ_{s,t}(y) := Q_{Y_t}(F_{Y_s}(y)).

The following lemma shows that the potential outcome c.d.f.'s and means can be written as compositions of the counterfactual mappings and observable distributions. Vuong and Xu (2017) show a similar result for the binary treatment case.

Lemma 1 (Potential outcome c.d.f.'s and means via counterfactual mappings). Assume that Assumption 1 holds. Define p_t(z, x) as in (2.2).

(a) For each s ∈ T and x ∈ X, F_{Y_s|X}(y|x) for y ∈ Y° can be expressed as

F_{Y_s|X}(y|x) = Σ_{t=0}^k F_{Y|TZX}( φ^x_{s,t}(y) | t, z, x ) p_t(z, x).    (2.3)

(b) For each s ∈ T and x ∈ X, E[Y_s | X = x] can be expressed as

E[Y_s | X = x] = Σ_{t=0}^k E[ φ^x_{t,s}(Y) | T = t, Z = z, X = x ] p_t(z, x).    (2.4)

Lemma 1 follows from the fact that, for each s, t ∈ T, Y_s and φ^x_{t,s}(Y_t) are identically distributed conditional on (T, Z, X) = (t, z, x). We have this because the rank variables U_s and U_t are identically distributed conditional on (T, Z, X) = (t, z, x) under the rank similarity assumption.

Lemma 1 implies that, for each s ∈ T and x ∈ X, E[Y_s | X = x] and Q_{Y_s|X}(τ|x) for τ ∈ (0, 1) are identified as closed-form expressions if φ^x_{s,t} for t ∈ T is also identified as a closed-form expression. Hence, we establish sufficient conditions to identify the φ^x_{s,t}'s and derive their closed-form expressions.

Remark 1.
For Lemma 1, we only need part (a) in order to identify E[Y_s | X = x] and Q_{Y_s|X}(τ|x) for τ ∈ (0, 1). Identifying φ^x_{s,t}(y) for y ∈ Y° suffices for identification of the treatment effects because we assume that F_{Y_s|X}(y|x) is continuous in y ∈ Y, and that the closure of Y° is equal to Y. In Appendix D, we relax the assumption that the closure of Y° is equal to Y and derive the closed-form expressions of φ^x_{s,t} and F_{Y_s|X}(·|x) on a sufficiently large subset of Y. Strictly speaking, these results are required to precisely derive the closed-form expressions of E[Y_s | X = x] and Q_{Y_s|X}(τ|x) for τ ∈ (0, 1).

From this section, we suppress the conditioning variable X unless stated otherwise for simplicity.

In this section, we introduce the monotonicity assumptions. We first introduce the monotonicity assumption for the binary treatment case. This condition is first introduced by Imbens and Angrist (1994). Define P := {(z, z′) ∈ Z × Z : z ≠ z′} as the set of pairs of different values that the instrument can take. The monotonicity assumption imposes restrictions on these pairs.

Let T_z be the potential treatment state if Z had been externally set to z. Then, for each pair of values (z, z′) ∈ P, we can partition the population into four groups defined by T_z and T_{z′}. These four groups are {T_z = T_{z′} = 1}, {T_z = T_{z′} = 0}, {T_z = 0, T_{z′} = 1}, and {T_z = 1, T_{z′} = 0}. The first and second groups are those who do not change their choice, and the third and fourth groups are those who respond to a change in Z. Let C_{z,z′} := {T_z = 0, T_{z′} = 1}, and write the third and fourth groups as C_{z,z′} and C_{z′,z} respectively. The monotonicity assumption requires that either P(C_{z′,z}) = 0 and P(C_{z,z′}) > 0, or P(C_{z,z′}) = 0 and P(C_{z′,z}) > 0, for each pair (z, z′) ∈ P.
The group that is assumed to have positive probability is called the "compliers," and the group that is assumed to have zero probability is called the "defiers." This condition is equivalent to assuming that either T_z ≤ T_{z′} and P(C_{z,z′}) > 0, or T_z ≥ T_{z′} and P(C_{z′,z}) > 0, for each pair (z, z′) ∈ P.

Heckman and Pinto (2018) generalize this assumption to multi-valued treatment settings. For each z ∈ Z and t ∈ T, define a binary variable D^t_z := 1{T_z = t} as an indicator function of each potential treatment state if Z had been externally set to z. The preceding comparisons of the treatments for the binary treatment case can be translated into inequalities that compare the D^0's and D^1's. For example, for (z, z′) ∈ P, T_z ≤ T_{z′} is equivalent to D^1_z ≤ D^1_{z′} or D^0_z ≥ D^0_{z′} because we have D^1 = T and D^0 = 1 − T. Heckman and Pinto (2018) generalize this argument and assume that either D^t_z ≤ D^t_{z′} or D^t_z ≥ D^t_{z′} holds almost surely for each pair (z, z′) ∈ P and treatment level t ∈ T. They call this assumption "unordered monotonicity" because this condition can be assumed on unordered treatments.

However, as we see in the examples in Sections 3.2 and 5, imposing such conditions on all the pairs (z, z′) ∈ P may be too strong when the instrument is multi-valued. Hence, we employ a weaker assumption that imposes such conditions only on a subset of P. That subset is determined differently in each situation.

As we define C_{z,z′} = {T_z = 0, T_{z′} = 1} in the binary treatment case, define C^t_{z,z′} := {D^t_z = 0, D^t_{z′} = 1}. Our monotonicity assumption is characterized by inequalities such as D^t_z ≤ D^t_{z′} and D^t_z ≥ D^t_{z′}. We call them the "monotonicity inequalities of (z, z′)" in this paper. For (z, z′) ∈ P, if either D^t_z ≤ D^t_{z′} or D^t_z ≥ D^t_{z′} holds almost surely (conditional on X = x ∈ X) for all t ∈ T, we say that "monotonicity inequalities hold for (z, z′) (conditional on X = x)" in this paper.
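To fix ideas, the objects D^t_z and C^t_{z,z′} can be computed in a small simulation with a hypothetical three-valued threshold-crossing selection rule (the thresholds below are purely illustrative). The sketch also shows why the subset of P on which monotonicity inequalities hold can be strict: for the pair (0, 1) the inequality holds at t = 0 and t = 2 but fails at t = 1, since both C^1_{0,1} and C^1_{1,0} have positive probability.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.uniform(size=100_000)   # unobserved selection heterogeneity

# Hypothetical potential treatments T_z in {0, 1, 2} for z in {0, 1}:
# the instrument value z = 1 lowers both selection thresholds.
def T(z, V):
    a, b = (0.5, 0.2) if z == 0 else (0.8, 0.4)
    return np.where(V <= b, 2, np.where(V <= a, 1, 0))

T0, T1 = T(0, V), T(1, V)

def monotone(t):
    """Monotonicity inequality of (z, z') = (0, 1) at level t:
    either D^t_0 <= D^t_1 a.s. or D^t_0 >= D^t_1 a.s."""
    D0, D1 = (T0 == t), (T1 == t)
    p_comp = np.mean(~D0 & D1)   # P(C^t_{0,1})
    p_def = np.mean(D0 & ~D1)    # P(C^t_{1,0})
    return p_comp == 0.0 or p_def == 0.0

# Holds for t = 0 and t = 2, but fails for t = 1: some families leave
# treatment 1 for 2 while others enter treatment 1 from 0.
```

In this design the middle treatment level has "movers" in both directions, which is exactly the complication of the selection mechanism that multi-valued treatments introduce.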
We employ the following monotonicity assumption:

Assumption 2 (Instrument independence and monotonicity). For all x ∈ X, there exists a subset Λ of P such that the following conditions hold for each λ = (z, z′) ∈ Λ:

(A1) Independent instrument: Conditional on X = x, (Y_t, T_z) for t ∈ T and z ∈ λ are jointly independent of Z.

(A2) Monotonicity inequalities: Either P(D^t_z ≤ D^t_{z′} | X = x) = 1 or P(D^t_z ≥ D^t_{z′} | X = x) = 1 holds for all t ∈ T.

(A3) Instrument relevance: Either P(C^t_{z,z′} | X = x) > 0 or P(C^t_{z′,z} | X = x) > 0 holds for all t ∈ T.

(A4) Sufficient support: The support of the conditional distribution of Y_t given X = x and either C^t_{z,z′} or C^t_{z′,z} is Y.

(Angrist and Imbens (1995) establish an assumption for ordered treatments that either T_z ≤ T_{z′} or T_z ≥ T_{z′} holds almost surely for each pair (z, z′) ∈ P. This assumption is called "ordered monotonicity." Vuong and Xu (2017) assume that T = 1{p(Z, X) ≥ V} and V | X ∼ U(0, 1) hold for the binary treatment case. Heckman and Pinto (2018) show that an analogous separable structure for the unobserved variable holds for each D^t under the unordered monotonicity assumption. This implies that monotonicity inequalities hold for all the pairs of P, and we do not specify the selection equation in that way.)

In this paper, we call this subset Λ the "monotonicity subset." When T is binary, parts (A1)-(A3) of Assumption 2 are the monotonicity assumption for the binary treatment case. Part (A1) of Assumption 2 strengthens part (A2) of Assumption 1 and assumes that the potential outcomes and treatment are jointly independent of the instrument. Part (A2) of Assumption 2 assumes that monotonicity inequalities hold on a particular subset Λ of P. Part (A3) of Assumption 2, together with part (A2), assumes that the compliers always exist. Under these conditions, when monotonicity inequalities hold on (z, z′) ∈ Λ, we exclude the cases where neither D^t_z < D^t_{z′} nor D^t_z > D^t_{z′} can happen for some t ∈ T. We can interpret this condition as an instrument relevance condition that holds when the conditional covariance of D^t and Z given Z ∈ {z, z′} is not 0 for all t ∈ T. Part (A4) of Assumption 2 strengthens the instrument relevance condition and assumes that the compliers are sufficiently large. Vuong and Xu (2017) employ a similar condition for the binary treatment case. Under part (A4) of Assumption 2, each conditional c.d.f. of the potential outcome on compliers, as well as the unconditional c.d.f. of the potential outcome, is strictly increasing on Y°.
(From part (A3) of Assumption 1, T can be expressed as T = ρ(Z, V) for some unknown function ρ and an unobserved random vector V. A sufficient condition for part (A4) of Assumption 2 is that (U_t, V) has a rectangular support for all t ∈ T. This condition is often assumed for regression models with a selection equation; see Appendix E for the reason for this point. We show this statement in Lemma 5 in Appendix B, and see Appendix E for the proof of this point.)

3.2 Motivating examples
In this section, we give examples where economic interpretation leads to Assumption 2. Consider a social experiment of house purchase where each family can buy a house from k possible options located in different regions and labeled 1, . . . , k, and vouchers are randomly assigned that offer price discounts on the specified house (or houses). We assume that the discount rates are the same. (We need no assumptions on house prices, such as the house prices being the same, because we just compare the possible budget sets of each family given that the treatment is fixed at a particular state.) This kind of setting arises in various situations; we introduce a real-world social experiment in Section 5. We consider a situation where the researcher wants to evaluate where is the best place to raise a child, and the outcome of interest will be collected several years later. Let Y denote the outcome of interest, which can be regarded as a continuous random variable (such as child test score data). Let the treatment T denote the house choice, with T = 0 for not buying any house and, for i = 1, . . . , k, T = i for buying house i.

In this section, we consider three examples. Suppose the treatment T and the instrument Z we define in these cases are sufficiently correlated, and we assume parts (A3) and (A4) of Assumption 2 unless stated otherwise.

Example I. Let k = 2. Suppose there are three types of vouchers labeled a, b, and c such that

voucher a: offers a discount on houses 1 and 2,
voucher b: offers a discount on house 1,
voucher c: offers a discount on house 2.

Suppose these vouchers are randomly assigned to all the families. Let the instrument Z represent voucher assignment, taking values in Z = {a, b, c}. Part (A1) of Assumption 2 holds because vouchers are randomly assigned. First, we have two obvious monotonicity inequalities arising from different values of Z for each family ω ∈ Ω:

D^1_c(ω) ≤ D^1_b(ω),    (3.1)

and

D^2_b(ω) ≤ D^2_c(ω).    (3.2)

Inequality (3.1) states that the family is induced toward buying house 1 when the instrument changes from a voucher for house 2 to a voucher for house 1. Inequality (3.2) can be interpreted similarly.

Next, economic analysis generates additional choice restrictions. Heckman and Pinto (2018) consider a similar example of car purchase, and we follow their application of economic analysis. Let B_ω(z, t) be the budget set of family ω when its voucher assignment is z ∈ Z and treatment choice is t ∈ T. Family ω is assumed to maximize a utility function u_ω defined over consumption goods g and choice t. Then T_z(ω) as a function of z can be viewed as a choice function of family ω:

T_z(ω) = argmax_{t ∈ T} ( max_{g ∈ B_ω(z,t)} u_ω(g, t) ).    (3.3)

For the budget set B_ω(z, t) of family ω, we can naturally assume the following relationships:

B_ω(a, 0) = B_ω(b, 0) = B_ω(c, 0),    (3.4)
B_ω(c, 1) ⊂ B_ω(a, 1) = B_ω(b, 1),    (3.5)
B_ω(b, 2) ⊂ B_ω(a, 2) = B_ω(c, 2).    (3.6)

Relationship (3.4) holds because any voucher offers no discount, and hence produces the same budget set, if family ω does not buy any house. Relationship (3.5) examines the budget set of family ω if it purchases house 1: its budget set is enlarged if it has a voucher that subsidizes house 1 (voucher a or b) when compared to a voucher that does not affect the choice set (voucher c). Relationship (3.6) can be interpreted similarly. For two different treatment levels t and t′ and voucher assignments z and z′, revealed preference analysis generates the following choice rule:

D^t_z(ω) = 1 and B_ω(z, t) ⊆ B_ω(z′, t) and B_ω(z′, t′) ⊆ B_ω(z, t′) ⇒ D^{t′}_{z′}(ω) = 0.    (3.7)

(See the proof of Lemma L-1 of Pinto (2015) and Heckman and Pinto (2018) for this statement.)

This choice rule (3.7) makes intuitive sense. Suppose family ω buys house t if it has voucher z. Then its choice does not change to another house t′ unless voucher z enlarges its budget set for buying house t when compared to the other voucher z′, or the other voucher z′ enlarges its budget set for buying house t′ when compared to voucher z. Applying the choice rule (3.7) to the budget set relationships (3.4)-(3.6) generates six additional monotonicity inequalities in addition to (3.1)-(3.2). Table I summarizes these eight inequalities. From Table I, we may assume part (A2) of Assumption 2 for the subset of P that contains (a, b) and (a, c).

Table I: Monotonicity inequalities of Example I

Pair      t = 0             t = 1             t = 2
(a, b)    D^0_a ≤ D^0_b     D^1_a ≤ D^1_b     D^2_a ≥ D^2_b
(a, c)    D^0_a ≤ D^0_c     D^1_a ≥ D^1_c     D^2_a ≤ D^2_c
(b, c)                      D^1_b ≥ D^1_c     D^2_b ≤ D^2_c

These monotonicity inequalities also make intuitive sense. In the following discussion, we focus on D^0_a ≤ D^0_b; the other inequalities can be interpreted similarly. Suppose family ω does not buy a house even though it has a voucher that subsidizes houses 1 and 2 (voucher a).
Then family ω will not change its choice to buy a house if it has a voucher that offers the same discount rate but is restricted to either house 1 or 2. Hence, D^0_a = 1 ⇒ D^0_b = 1 holds.

Economic interpretation does not imply any monotonicity inequality between D^0_b and D^0_c, and (b, c) is not contained in the monotonicity subset. Suppose family ω does not buy a house if it has a voucher that subsidizes house 1 (voucher b). Then family ω may change its choice to house 2 if it has a voucher that subsidizes house 2 (voucher c) instead of voucher b. Hence, D^0_b > D^0_c may happen. But D^0_b < D^0_c may also happen by a similar argument.

Example II. Suppose there are k types of vouchers labeled 1, ..., k such that

voucher i: offers a discount on house i.

Suppose the families that volunteered to participate in the experiment are randomly placed in one of the following assignment groups: a control group where no voucher is assigned and, for i = 1, ..., k, group i where voucher i is assigned. Let the instrument Z represent voucher assignment, where Z = 0 denotes no voucher and Z = i denotes voucher i. Part (A1) of Assumption 2 holds because vouchers are randomly assigned. Similar to Example I, economic analysis generates the following monotonicity inequalities, which make intuitive sense:

D^i_i ≥ D^i_0 and D^j_i ≤ D^j_0 for i = 1, ..., k and j ∈ T \ {i}. (3.8)

Table II summarizes (3.8). From Table II, we may assume part (A2) of Assumption 2 for the subset of P that contains (1, 0), ..., (k, 0).

Table II: Monotonicity inequalities of Example II

            T = 0            T = 1            ···    T = k−1                    T = k
  (1, 0)    D^0_1 ≤ D^0_0    D^1_1 ≥ D^1_0    ···    D^{k−1}_1 ≤ D^{k−1}_0      D^k_1 ≤ D^k_0
  ⋮
  (k−1, 0)  D^0_{k−1} ≤ D^0_0  D^1_{k−1} ≤ D^1_0  ···  D^{k−1}_{k−1} ≥ D^{k−1}_0  D^k_{k−1} ≤ D^k_0
  (k, 0)    D^0_k ≤ D^0_0    D^1_k ≤ D^1_0    ···    D^{k−1}_k ≤ D^{k−1}_0      D^k_k ≥ D^k_0

Example III. Suppose houses 1, ..., k are placed in order of distance from the downtown (house 1 is the nearest and house k is the farthest), and there are k types of vouchers labeled 1, ..., k such that

voucher i: offers a discount on houses i, ..., k.

Suppose the families who volunteered to participate in the experiment are randomly placed in one of the following assignment groups: a control group where no voucher is assigned and, for i = 1, ..., k, group i where voucher i is assigned. Let the instrument Z represent voucher assignment, where Z = 0 denotes no voucher and Z = i denotes voucher i. Part (A1) of Assumption 2 holds because vouchers are randomly assigned. (See Appendix F for the budget set relationships of the families we assume for Cases II and III.) In this example, we may regard T and Z as ordered variables and expect them to be positively correlated. This is because a family whose voucher has a large number is more likely to choose a house far from the downtown, because it cannot use the voucher for houses near the downtown. Similar to Example I, economic analysis generates the following monotonicity inequalities, which make intuitive sense:

D^i_i ≥ D^i_{i+1} and D^j_i ≤ D^j_{i+1} for i = 1, ..., k − 1 and j ∈ T \ {i},
D^k_k ≥ D^k_0 and D^j_k ≤ D^j_0 for j ∈ T \ {k}. (3.9)

Table III summarizes (3.9). From Table III, we may assume part (A2) of Assumption 2 for the subset of P that contains (i, i + 1) for i = 1, ..., k − 1 and (k, 0).

Table III: Monotonicity inequalities of Example III

            T = 0              T = 1              ···    T = k−1                      T = k
  (1, 2)    D^0_1 ≤ D^0_2      D^1_1 ≥ D^1_2      ···    D^{k−1}_1 ≤ D^{k−1}_2        D^k_1 ≤ D^k_2
  ⋮
  (k−1, k)  D^0_{k−1} ≤ D^0_k  D^1_{k−1} ≤ D^1_k  ···    D^{k−1}_{k−1} ≥ D^{k−1}_k    D^k_{k−1} ≤ D^k_k
  (k, 0)    D^0_k ≤ D^0_0      D^1_k ≤ D^1_0      ···    D^{k−1}_k ≤ D^{k−1}_0        D^k_k ≥ D^k_0

In this section, we establish identification of the potential outcome distributions and the local treatment effects using our monotonicity assumption.
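Before turning to the formal results, the monotonicity inequalities (3.8) of Example II can be checked in a small simulation of the families' utility maximization. The prices, discount rate, and random utilities below are all hypothetical choices made only for illustration; the sketch shows that the voucher design alone, not the particular preference draws, generates the inequalities.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3                      # houses 1..k; choice 0 means no purchase
n = 5000                   # simulated families
price = np.array([0.0, 1.0, 1.2, 1.5])   # hypothetical prices; house 0 is free
discount = 0.3                            # hypothetical voucher discount rate

# Random utilities over the k+1 choices (unobserved heterogeneity across families).
u = rng.normal(size=(n, k + 1))

def choice(z):
    """Utility-maximizing choice under voucher z (z=0: no voucher;
    z=i: a discount on house i only, as in Example II)."""
    p = np.tile(price, (n, 1))
    if z > 0:
        p[:, z] *= (1 - discount)
    return np.argmax(u - p, axis=1)

t0 = choice(0)
for i in range(1, k + 1):
    ti = choice(i)
    # D^i_i >= D^i_0: voucher i never moves a family away from house i ...
    assert np.all((t0 == i) <= (ti == i))
    # ... and D^j_i <= D^j_0 for j != i: it never moves a family toward house j.
    for j in range(k + 1):
        if j != i:
            assert np.all((ti == j) <= (t0 == j))
```

Because the voucher lowers only the price of house i, every family's ranking of the other k alternatives is unchanged, which is exactly why the inequalities hold family by family.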
In this section, we introduce some basic identification results under our monotonicity assumption. We first establish identification of the compliers under our monotonicity assumption. The following lemma shows that, when D^t_z ≤ D^t_{z'} and P(C^t_{z,z'}) > 0 hold for (z, z') ∈ P and t ∈ T, the probability of C^t_{z,z'} and the conditional c.d.f. of Y_t given C^t_{z,z'} are identified as closed-form expressions. Heckman and Pinto (2018) show a similar result under the unordered monotonicity assumption.

Lemma 2 (Identification of the compliers). Assume that Assumption 2 holds, and that P(D^t_z ≤ D^t_{z'} | X = x) = 1 and P(C^t_{z,z'} | X = x) > 0 hold for (z, z') ∈ P, x ∈ X, and t ∈ T. Define p_t(z, x) as in (2.2). Then P(C^t_{z,z'} | X = x) and F_{Y_t|C^t_{z,z'} X}(y | x) for y ∈ Y are identified as

P(C^t_{z,z'} | X = x) = p_t(z', x) − p_t(z, x), with p_t(z', x) > p_t(z, x), (4.1)

and

F_{Y_t|C^t_{z,z'} X}(y | x) = [F_{Y|TZX}(y | t, z', x) p_t(z', x) − F_{Y|TZX}(y | t, z, x) p_t(z, x)] / [p_t(z', x) − p_t(z, x)]. (4.2)

With Lemma 2 at hand, we establish identification of the counterfactual mappings. In this section, we review the identification results of Vuong and Xu (2017) for the binary treatment case. Let the treatment be binary. Suppose T_z(ω) ≤ T_{z'}(ω) for all ω ∈ Ω and P(C_{z,z'}) > 0. From the definitions of D^t_z and C^t_{z,z'}, both D^1_z ≤ D^1_{z'} and P(C^1_{z,z'}) > 0, and D^0_{z'} ≤ D^0_z and P(C^0_{z',z}) > 0 hold, and the compliers C^1_{z,z'} and C^0_{z',z} are the same (we show this statement in Lemma 7 in Appendix B):

C^1_{z,z'} = C^0_{z',z} = C_{z,z'}. (4.3)

From (4.3), we obtain the following equation for the potential outcome conditional distributions given the compliers:

F_{Y_1|C_{z,z'}}(y) = F_{Y_0|C_{z,z'}}(φ_{1,0}(y)) for y ∈ Y°. (4.4)

Equation (4.4) follows from the following facts. First, under the rank similarity assumption, the rank variables U_0 and U_1 are identically distributed conditional on C_{z,z'}. Second, from the definition of the counterfactual mapping, if y ∈ Y° is the τ ∈ (0, 1) quantile of the distribution of Y_1, then φ_{1,0}(y) is the τ quantile of the distribution of Y_0. F_{Y_1|C_{z,z'}} and F_{Y_0|C_{z,z'}} are identified from Lemma 2, and φ_{1,0}(y) for y ∈ Y° is identified as φ_{1,0}(y) = Q_{Y_0|C_{z,z'}}(F_{Y_1|C_{z,z'}}(y)) by solving (4.4) for φ_{1,0}.

In multi-valued treatment settings, we generally cannot compare any two treatment states on the same compliers as in (4.4) under a single monotonicity relationship. This is because the relationships between the compliers become more complicated than (4.3) in the binary treatment case, and the compliers generally do not coincide with each other. We overcome this difficulty by developing systems of monotonicity relationships that can be solved simultaneously for the counterfactual mappings on more than one set of compliers.
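The binary-case solution φ_{1,0}(y) = Q_{Y_0|C}(F_{Y_1|C}(y)) reviewed above amounts to empirical quantile mapping once complier samples are available. The sketch below uses hypothetical quantile functions sharing one rank variable; it illustrates the mapping, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical complier potential outcomes with a common rank variable U:
# Y_0 = Q_0(U) and Y_1 = Q_1(U), so the true mapping is phi_{1,0} = Q_0 ∘ F_1.
u = rng.uniform(size=20000)
y0 = 1.0 + 2.0 * u          # Q_0(u) = 1 + 2u
y1 = np.exp(u)              # Q_1(u) = exp(u)

def phi_10(y, y1_sample, y0_sample):
    """Empirical counterfactual mapping Q_{Y_0|C}(F_{Y_1|C}(y))."""
    tau = np.mean(y1_sample <= y)          # empirical c.d.f. of Y_1 at y
    return np.quantile(y0_sample, tau)     # empirical quantile of Y_0 at tau

y = np.exp(0.5)                            # the tau = 0.5 value of Y_1
est = phi_10(y, y1, y0)
assert abs(est - (1.0 + 2.0 * 0.5)) < 0.05 # true phi_{1,0}(exp(0.5)) = 2
```

The estimate converges to the true mapping as the complier sample grows, since both the empirical c.d.f. and empirical quantile are consistent.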
In this section, we establish identification of the counterfactual mappings using the relationships of the compliers when the treatment is discrete in general. Let the treatment take k + 1 values (so T = {0, 1, ..., k}), and assume Assumptions 1 and 2 hold for the subset Λ of P.

We employ an identification condition with "sign values" that characterize the type of monotonicity relationship of each pair (z_1, z_2) contained in the monotonicity subset Λ. To this end, we first introduce sign values. Suppose that, for (z_1, z_2) ∈ Λ, there uniquely exists l(z_1, z_2) ∈ T such that either

D^{l(z_1,z_2)}_{z_1} ≥ D^{l(z_1,z_2)}_{z_2} and D^j_{z_1} ≤ D^j_{z_2} for j ∈ T \ {l(z_1, z_2)}

or

D^{l(z_1,z_2)}_{z_1} ≤ D^{l(z_1,z_2)}_{z_2} and D^j_{z_1} ≥ D^j_{z_2} for j ∈ T \ {l(z_1, z_2)}

holds almost surely. Then we call this value l(z_1, z_2) the "sign value of (z_1, z_2)," and we say that "(z_1, z_2) has a sign value" in this paper.

When the treatment takes three values, each (z_1, z_2) ∈ Λ has a sign value. This is because, if the monotonicity inequalities of (z_1, z_2) all point in the same direction, then no compliers exist and Λ cannot contain (z_1, z_2). Suppose that D^j_{z_1} ≤ D^j_{z_2} holds almost surely for all j = 0, 1, 2. Then D^0_{z_1} ≥ D^0_{z_2} holds almost surely because D^1_{z_1} ≤ D^1_{z_2} and D^2_{z_1} ≤ D^2_{z_2} imply 1 − D^0_{z_1} ≤ 1 − D^0_{z_2}. Hence, D^0_{z_1} = D^0_{z_2} holds almost surely and D^0_{z_1} < D^0_{z_2} cannot happen. A similar argument applies to treatment states 1 and 2, and D^j_{z_1} < D^j_{z_2} cannot happen for any j ∈ T, which violates part (A3) of Assumption 2.

With the sign values, we employ the following assumption, which requires k different types of monotonicity relationships to exist.

Assumption 3 (Existence of different types of monotonicity relationships). For all x ∈ X, the monotonicity subset Λ contains k pairs of instrument values λ_1, ..., λ_k such that the following condition holds: for i = 1, ..., k, writing λ_i = (λ_{i1}, λ_{i2}), there uniquely exist l_{λ_i} ∈ T with l_{λ_i} ≠ l_{λ_j} for i ≠ j such that

D^{l_{λ_i}}_{λ_{i1}} ≥ D^{l_{λ_i}}_{λ_{i2}} and D^j_{λ_{i1}} ≤ D^j_{λ_{i2}} for j ∈ T \ {l_{λ_i}}

hold almost surely conditional on X = x.

Under Assumption 3, the monotonicity subset Λ contains k pairs of instrument values λ_1, ..., λ_k such that each pair λ_i has a sign value, and the sign values l_{λ_i} are all different. Each l_{λ_i} thus characterizes the type of the monotonicity relationship of λ_i. Assumption 3 holds in Example I in Section 3.2. From Table I, the sign values of (a, b) and (a, c) are l(a, b) = 2 and l(a, c) = 1 respectively, and (a, b) and (a, c) induce different types of monotonicity relationships.

We can interpret Assumption 3 as an instrument relevance condition for multiple endogenous binary variables that requires a monotonic correlation between each endogenous variable and the instrument. We illustrate this point with Example I. To this end, we show that, for (a, b) and (a, c), different monotonicity relationships generate different relationships between compliers. We focus on (a, b). First, observe that, from the definition of the compliers, we have

C^2_{b,a} = {D^2_a = 1, D^0_b = 1} ∪ {D^2_a = 1, D^1_b = 1}. (4.5)

The sets {D^2_a = 1, D^0_b = 1} and {D^2_a = 1, D^1_b = 1} are disjoint because a family ω contained in C^2_{b,a} will choose either house 0 or house 1 if its voucher assignment is b. Second, we obtain

C^0_{a,b} = {D^2_a = 1, D^0_b = 1} and C^1_{a,b} = {D^2_a = 1, D^1_b = 1}. (4.6)

To see this, suppose a family ω contained in C^0_{a,b} chooses house 1 if its voucher assignment is a. Then, because D^1_a ≤ D^1_b holds, it would also choose house 1 when its voucher assignment is b, which contradicts the definition of C^0_{a,b}. Hence, it will choose house 2 if its voucher assignment is a. An analogous argument applies to a family ω contained in C^1_{a,b}, and it will choose house 2 if its voucher assignment is a. Therefore, the following relationship holds from (4.5) and (4.6):

C^2_{b,a} = C^0_{a,b} ∪ C^1_{a,b} and C^0_{a,b} ∩ C^1_{a,b} = ∅. (4.7)

For (a, c), an analogous discussion gives the following relationship:

C^1_{c,a} = C^0_{a,c} ∪ C^2_{a,c} and C^0_{a,c} ∩ C^2_{a,c} = ∅. (4.8)

These relationships (4.7) and (4.8) yield the preceding interpretation of Assumption 3. We focus on (4.7). Relationship (4.7) implies that whether the voucher assignment is a or b has a monotonic effect only on the house choice that concerns house l(a, b) = 2. Compared with voucher b, voucher a additionally offers a discount only on house 2, and only the preference for house 2 is affected by the difference between vouchers a and b. To see this precisely, observe that C^0_{a,b} and C^1_{a,b} are contained in C^2_{b,a} from (4.7). This implies that if family ω does not buy a house or chooses house 1 when its voucher assignment is b, it will either not change its choice or choose house 2 when its voucher assignment is a. An analogous discussion follows for (4.8): whether the voucher assignment is a or c has a monotonic effect only on the house choice that concerns house l(a, c) = 1.

The treatment T in Example I consists of two binary choices, namely whether to buy house i or not (D^i) for i = 1, 2. The value of D^0 is determined once the values of D^1, ..., D^k are specified, from the definition of the treatment. Therefore, with (4.7) and (4.8), there exist two pairs of instrument values such that each pair has a monotonic effect on each house choice D^i for i = 1, 2.

Assumption 3 also holds in Example II in Section 3.2. From Table II, for i = 1, ..., k, (i, 0) has a sign value l(i, 0) = i. Then, for i ≠ j, (i, 0) and (j, 0) induce different types of monotonicity relationships. The treatment T in Example II consists of k binary choices, namely whether to buy house i or not (D^i) for i = 1, ..., k, and there exist k pairs of instrument values such that each pair (i, 0) has a monotonic effect on each house choice D^i for i = 1, ..., k. As in Example I, the following relationships between the compliers hold for (1, 0), ..., (k, 0) from Table II (see Appendix G for the derivation of (4.9)):

C^i_{0,i} = ⋃_{j ≠ i} C^j_{i,0} for i = 1, ..., k. (4.9)

As in (4.7) and (4.8), (4.9) implies that whether the voucher assignment is i or no voucher has a monotonic effect only on the house choice that concerns house l(i, 0) = i. Compared with no voucher, voucher i additionally offers a discount only on house i, and only the preference for house i is affected by the difference between these two voucher assignments.

Assumption 3 also holds in Example III in Section 3.2. From Table III, each of (i, i + 1) for i = 1, ..., k − 1 and (k, 0) has a sign value: l(i, i + 1) = i and l(k, 0) = k. These k pairs of values then induce different types of monotonicity relationships.

The following lemma shows identification of the counterfactual mappings under Assumption 3:

Lemma 3 (Identification of counterfactual mappings from monotonicity). Assume that Assumptions 1-3 hold. Then, for all s, t ∈ T and x ∈ X, φ^x_{s,t}(y) for y ∈ Y° is identified.

Remark 2.
We do not need to derive the closed-form expressions of the φ^x_{s,t} to identify them. See the proof of Lemma 3 for general k ∈ T in Appendix A, and see Appendix C for the closed-form expressions of the φ^x_{s,t} when the treatment is discrete in general. As discussed in Section 2, E[Y_s | X = x] and Q_{Y_s|X}(τ | x) for τ ∈ (0, 1) are identified as closed-form expressions if φ^x_{s,t}(·) for t ∈ T are identified as closed-form expressions. Hence, we obtain the following theorem:

Theorem 1 (Identification of potential outcome c.d.f.'s and ASF's from monotonicity). Assume that Assumptions 1-3 hold. Then, for all s ∈ T and x ∈ X, E[Y_s | X = x] and Q_{Y_s|X}(τ | x) for τ ∈ (0, 1) are identified.

This result is interesting because identification is achieved without assuming numerical conditions on the distribution of the outcome variable, and the proposed sufficient condition is economically interpretable. This fact may be useful when designing a social experiment where the outcome data will be collected later.
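The building block behind these closed-form results is Lemma 2, whose expressions (4.1)-(4.2) can be illustrated on simulated data where the compliance types are known. The type shares, instrument design, and outcome shift below are hypothetical; the point is only that the observable right-hand sides recover the complier share and the complier c.d.f.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200000

# Hypothetical types for a fixed treatment level t: "always" chooses t under
# both instrument values, "complier" chooses t only under z', "never" does not.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.3, 0.2, 0.5])
z = rng.integers(0, 2, size=n)             # instrument: 0 plays z, 1 plays z'
d_t = (types == "always") | ((types == "complier") & (z == 1))

# Complier outcomes are shifted by 1 so the complier-specific c.d.f. is visible.
y = rng.normal(size=n) + np.where(types == "complier", 1.0, 0.0)

p_t = lambda zval: np.mean(d_t[z == zval])             # p_t(z, x) as in (2.2)
pc_hat = p_t(1) - p_t(0)                               # (4.1): complier share
assert abs(pc_hat - 0.2) < 0.01

# (4.2): complier c.d.f. of Y_t at y = 1.0, built from observable c.d.f.'s.
F = lambda yv, zval: np.mean(y[(z == zval) & d_t] <= yv)
F_c = (F(1.0, 1) * p_t(1) - F(1.0, 0) * p_t(0)) / pc_hat
assert abs(F_c - 0.5) < 0.05               # true complier c.d.f.: N(1,1) at 1
```

The always-takers' contribution cancels in the numerator of (4.2), which is why the mixture of observable c.d.f.'s isolates the compliers.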
In this section, we illustrate identification of the counterfactual mappings under Assumption 3 with Examples I and II in Section 3.2. We first illustrate identification when the treatment takes three values with Example I. As we derived (4.4) from (4.3) when the treatment is binary, for y ∈ Y°, we obtain the following equations from (4.7) and (4.8) respectively:

F_{Y_2|C^2_{b,a}}(y) = [F_{Y_0|C^0_{a,b}}(φ_{2,0}(y)) P(C^0_{a,b}) + F_{Y_1|C^1_{a,b}}(φ_{2,1}(y)) P(C^1_{a,b})] / P(C^2_{b,a}) (4.10)

and

F_{Y_1|C^1_{c,a}}(φ_{2,1}(y)) = [F_{Y_0|C^0_{a,c}}(φ_{2,0}(y)) P(C^0_{a,c}) + F_{Y_2|C^2_{a,c}}(y) P(C^2_{a,c})] / P(C^1_{c,a}). (4.11)

By Lemma 2, all the functions in (4.10) and (4.11) except for the counterfactual mappings are identified. When we fix (4.10) and (4.11) at any y_f ∈ Y°, these two equations can be regarded as nonlinear simultaneous equations in φ_{2,0}(y_f) and φ_{2,1}(y_f). We identify φ_{2,0}(y_f) and φ_{2,1}(y_f) by solving (4.10) and (4.11) simultaneously at y_f. Equations (4.10) and (4.11) are sufficiently different to identify φ_{2,0}(y_f) and φ_{2,1}(y_f) because these equations are generated from the monotonicity relationships with different sign values, namely l(a, b) = 2 and l(a, c) = 1 respectively. We cannot identify φ_{2,0}(y_f) and φ_{2,1}(y_f) from the single restriction (4.10).
In (4.10), the comparison of treatment states 2 and 1 and that of treatment states 2 and 0 are mixed, and (4.11) provides an additional restriction to identify φ_{2,0}(y_f) and φ_{2,1}(y_f). Therefore, φ_{2,0}(y_f) and φ_{2,1}(y_f) are identified as follows from (4.10) and (4.11):

φ_{2,1}(y_f) = sup{ y ∈ Y_f : [F_{Y_0|C^0_{a,b}}(φ^{y_f}_{1,0}(y)) P(C^0_{a,b}) + F_{Y_1|C^1_{a,b}}(y) P(C^1_{a,b})] / P(C^2_{b,a}) ≤ F_{Y_2|C^2_{b,a}}(y_f) }
           = inf{ y ∈ Y_f : [F_{Y_0|C^0_{a,b}}(φ^{y_f}_{1,0}(y)) P(C^0_{a,b}) + F_{Y_1|C^1_{a,b}}(y) P(C^1_{a,b})] / P(C^2_{b,a}) ≥ F_{Y_2|C^2_{b,a}}(y_f) } (4.12)

and

φ_{2,0}(y_f) = φ^{y_f}_{1,0}(φ_{2,1}(y_f)), (4.13)

where φ^{y_f}_{1,0}, whose domain is Y_f ⊂ Y, is defined as

φ^{y_f}_{1,0}(y) := Q_{Y_0|C^0_{a,c}}( [F_{Y_1|C^1_{c,a}}(y) P(C^1_{c,a}) − F_{Y_2|C^2_{a,c}}(y_f) P(C^2_{a,c})] / P(C^0_{a,c}) ) for y ∈ Y_f. (4.14)

The other counterfactual mappings on Y° are inversions or compositions of φ_{2,0} and φ_{2,1}, and they are also identified as closed-form expressions.

We next illustrate identification when the treatment is discrete in general with Example II. As we derived (4.4) from (4.3) when the treatment is binary, we obtain the following equations from (4.9):

F_{Y_i|C^i_{0,i}}(φ_{k,i}(y)) = Σ_{j ≠ i} F_{Y_j|C^j_{i,0}}(φ_{k,j}(y)) P(C^j_{i,0}) / P(C^i_{0,i}) for y ∈ Y° and i = 1, ..., k. (4.15)

By Lemma 2, all the functions in (4.15) except for the counterfactual mappings are identified. When we fix (4.15) at any y_{f,k} ∈ Y°, the k equations of (4.15) can be regarded as nonlinear simultaneous equations in φ_{k,j}(y_{f,k}) for j = 0, ..., k − 1. We identify these values by solving the k equations of (4.15) simultaneously at y_{f,k}. These k equations are sufficiently different to identify φ_{k,j}(y_{f,k}) for j = 0, ..., k − 1 because they are generated from the monotonicity relationships with different sign values, namely l(i, 0) = i for i = 1, ..., k. Therefore, φ_{k,j}(y_{f,k}) for j = 0, ..., k − 1 are identified. The other counterfactual mappings on Y° are inversions and compositions of φ_{k,j} for j = 0, ..., k − 1, and they are also identified.
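The simultaneous restrictions (4.10)-(4.11) can be solved numerically once the complier c.d.f.'s are replaced by empirical ones. In the sketch below, the complier rank distributions, sample sizes, and quantile functions are all hypothetical, and a simple grid search stands in for the sup/inf characterization in (4.12)-(4.14); the only point is that the two restrictions pin down φ_{2,0}(y_f) and φ_{2,1}(y_f) jointly.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical quantile functions sharing one rank variable U (rank invariance):
Q0, Q1, Q2 = (lambda u: u), (lambda u: u ** 2), (lambda u: np.sqrt(u))

# Rank draws for the complier groups: C^0_{a,b} and C^1_{a,b} partition
# C^2_{b,a}, and C^0_{a,c} and C^2_{a,c} partition C^1_{c,a}, as in (4.7)-(4.8).
u0_ab = rng.beta(2.0, 1.0, 40000)
u1_ab = rng.beta(1.0, 2.0, 60000)
u0_ac = rng.beta(2.0, 2.0, 50000)
u2_ac = rng.beta(1.0, 1.0, 50000)

def ecdf(sample):
    s = np.sort(sample)
    return lambda y: np.searchsorted(s, y, side="right") / s.size

F0_ab, F1_ab = ecdf(Q0(u0_ab)), ecdf(Q1(u1_ab))
F0_ac, F2_ac = ecdf(Q0(u0_ac)), ecdf(Q2(u2_ac))
F2_ba = ecdf(Q2(np.concatenate([u0_ab, u1_ab])))   # F_{Y_2 | C^2_{b,a}}
F1_ca = ecdf(Q1(np.concatenate([u0_ac, u2_ac])))   # F_{Y_1 | C^1_{c,a}}

yf = 0.8                          # fix equations (4.10)-(4.11) at y_f = 0.8
c1, c2 = F2_ba(yf), F2_ac(yf)

def residual(v0, v1):
    r1 = 0.4 * F0_ab(v0) + 0.6 * F1_ab(v1) - c1        # restriction (4.10)
    r2 = F1_ca(v1) - 0.5 * F0_ac(v0) - 0.5 * c2        # restriction (4.11)
    return r1 ** 2 + r2 ** 2

# Grid search for the pair (phi_{2,0}(y_f), phi_{2,1}(y_f)).
grid = np.linspace(0.01, 0.99, 99)
v0_hat, v1_hat = min(((a, b) for a in grid for b in grid),
                     key=lambda p: residual(*p))

# With Q2(u) = sqrt(u), the true mappings are phi_{2,0}(y) = y^2 and
# phi_{2,1}(y) = y^4, whatever the rank distribution.
assert abs(v0_hat - yf ** 2) < 0.05
assert abs(v1_hat - yf ** 4) < 0.05
```

The two zero-residual curves trade v0 off against v1 in opposite directions, because the pairs carry different sign values; this is the numerical counterpart of the "sufficiently different equations" argument above.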
In this section, we identify the local treatment effects in the multi-valued treatment setting. Suppose that the monotonicity inequalities hold for (z, z'), and that D^t_z ≤ D^t_{z'} holds almost surely for treatment level t. When the treatment is binary, the local treatment effect is the treatment effect for the compliers, and identification follows from identification of F_{Y_t|C_{z,z'}}, which is identified from Lemma 2. However, identification is not straightforward in multi-valued treatment settings because the relationships between the compliers become more complicated. We overcome this difficulty by using the systems of monotonicity relationships we developed to identify the counterfactual mappings, and we identify the local treatment effect that compares any two different treatment states t and t'. We establish identification when the outcome variable is continuously distributed under our monotonicity assumption with some additional assumptions on unobservable factors such as the rank similarity assumption. (Note that φ_{k,k}(y) = y for y ∈ Y° holds from the definition.)

For two different treatment levels t and t' and instrument values z and z', the subpopulation {D^{t'}_z = 1, D^t_{z'} = 1} changes its treatment choice from t' to t if the instrument value changes from z to z'. The local average treatment effect (LATE) that compares treatment states t and t' conditional on this subpopulation {D^{t'}_z = 1, D^t_{z'} = 1} is E[Y_t | D^{t'}_z = 1, D^t_{z'} = 1] − E[Y_{t'} | D^{t'}_z = 1, D^t_{z'} = 1], and the local quantile treatment effect (LQTE) conditional on {D^{t'}_z = 1, D^t_{z'} = 1} is Q_{Y_t|D^{t'}_z=1, D^t_{z'}=1}(τ) − Q_{Y_{t'}|D^{t'}_z=1, D^t_{z'}=1}(τ), where τ ∈ (0, 1). The versions conditional on X are similarly defined.

The important result for showing identification of the local treatment effects is that identification of the counterfactual mappings implies identification of the treatment effect conditional on the compliers.
As we obtained (4.6) for Example I in Section 3.2, if (z, z') has a sign value l(z, z') = t', then P(C^t_{z,z'}) = P(D^{t'}_z = 1, D^t_{z'} = 1) holds for t ∈ T \ {t'}. The following lemma shows identification of the conditional distribution of Y_{t'} given C^t_{z,z'} as well as that of Y_t given C^t_{z,z'}:

Lemma 4 (Identification of potential outcome conditional c.d.f.'s and ASF's given the compliers). Assume that Assumptions 1-3 hold, and that P(D^t_z ≤ D^t_{z'} | X = x) = 1 and P(C^t_{z,z'} | X = x) > 0 hold for t ∈ T, (z, z') ∈ P, and x ∈ X. Then, for all t' ∈ T, F_{Y_{t'}|C^t_{z,z'} X}(y | x) for y ∈ Y° and E[Y_{t'} | C^t_{z,z'}, X = x] can be expressed as

F_{Y_{t'}|C^t_{z,z'} X}(y | x) = F_{Y_t|C^t_{z,z'} X}(φ^x_{t',t}(y) | x) (4.16)

and

E[Y_{t'} | C^t_{z,z'}, X = x] = E[φ^x_{t,t'}(Y_t) | C^t_{z,z'}, X = x], (4.17)

and E[Y_{t'} | C^t_{z,z'}, X = x] and Q_{Y_{t'}|C^t_{z,z'} X}(τ | x) for τ ∈ (0, 1) are identified.

With Lemma 4 at hand, we obtain the following theorem that shows identification of the local treatment effects under our assumptions:

Theorem 2 (Identification of local potential outcome c.d.f.'s and ASF's). Assume that Assumptions 1-3 hold. Then, for all t, t' ∈ T and x ∈ X, there exists (z, z') ∈ P such that E[Y_s | D^{t'}_z = 1, D^t_{z'} = 1, X = x] and Q_{Y_s|D^{t'}_z=1, D^t_{z'}=1, X}(τ | x) for τ ∈ (0, 1) and s ∈ {t, t'} are identified.

Application
In this section, we apply our identification result to a real-world social experiment. Moving to Opportunity (MTO) is a housing experiment implemented by the U.S. Department of Housing and Urban Development (HUD) between 1994 and 1998. It was designed to evaluate the effects of relocating to low poverty neighborhoods on the outcomes of disadvantaged families living in high poverty urban neighborhoods in the United States. This project targeted over 4000 very low-income families with children under 18 living in public housing or private assisted housing projects in very high poverty areas of Baltimore, Boston, Chicago, Los Angeles, and New York City, whose poverty rates were more than 40 percent according to the 1990 US Census. This project randomly assigned tenant-based housing vouchers from the Section 8 program that could be used to subsidize housing costs if the family agreed to relocate to better neighborhoods. Eligible families who volunteered to participate in the project were placed in one of three assignment groups: experimental (about 40% of the sample), Section 8 (about 30% of the sample), or control (about 30% of the sample). Families assigned to the experimental group were offered Section 8 housing vouchers, but they were restricted to using their vouchers in a low poverty neighborhood, whose poverty rate was less than 10 percent according to the 1990 US Census, along with mobility counseling and help in leasing a new unit. After one year had passed, families in this group could use their voucher to move again if they wished, without any special constraints on location. Families assigned to the Section 8 group were offered regular Section 8 housing vouchers without any restriction on their place of use and whatever briefing and assistance the local Section 8 program regularly provided.
Families assigned to the control group were offered no voucher, but continued to be eligible for project-based housing assistance and whatever other social programs and services to which they would otherwise be entitled. An interim impacts evaluation (Orr et al. (2003)) was conducted in 2002 and assessed the effects in six study domains: (1) mobility, housing, and neighborhood; (2) physical and mental health; (3) child educational achievement; (4) youth delinquency and risky behavior; (5) employment and earnings; and (6) household income and public assistance receipt. The long term impacts evaluation (Sanbonmatsu et al. (2011)) was conducted in 2009 and 2010. See Orr et al. (2003), Sanbonmatsu et al. (2011), and Shroder and Orr (2012) for detailed information on this project.

Recent studies find evidence of neighborhood effects on adult employment. Aliprantis and Richter (2019) identify the LATE for moving to a higher-quality neighborhood under an ordered treatment model using neighborhood quality as an observed continuous measure of the treatment variable, and find positive effects on adult labor market outcomes and welfare receipt for the interim impacts evaluation data. Pinto (2015) nonparametrically identifies the LATE and ATE for moving to a low poverty neighborhood under an unordered treatment model using a proxy for neighborhood quality, and finds statistically significant positive effects on adult labor market outcomes for the interim impacts evaluation data.

In this paper, we take an approach similar to Pinto (2015) for the model and economic analysis. Let Y denote the outcome of interest, which is continuously distributed. Let the treatment T denote the relocation decision at the intervention onset, where T = 0 denotes no relocation, T = 1 denotes high poverty neighborhood relocation, and T = 2 denotes low poverty neighborhood relocation.
Let the instrument Z represent voucher assignment, where Z = a denotes no voucher (control group), Z = b denotes the Section 8 voucher, and Z = c denotes the experimental voucher. For simplicity, we suppress the other covariates X.

Part (A1) of Assumption 2 holds because vouchers are randomly assigned. The treatment T consists of two binary choices, namely whether to relocate to a high or a low poverty neighborhood or not (D^1 and D^2 respectively). As discussed in Section 4.2, our identification result follows when there exist two pairs of instrument values such that each pair has a monotonic effect on each relocation choice D^i for i = 1, 2. First, compared with no voucher (Z = a), the experimental voucher (Z = c) additionally subsidizes housing costs only for low poverty neighborhood relocation (D^2 = 1). Next, compared with the experimental voucher (Z = c), the Section 8 voucher (Z = b) additionally subsidizes housing costs only for high poverty neighborhood relocation (D^1 = 1). Hence, (b, c) and (c, a) have monotonic effects on the relocation choices D^1 and D^2 respectively, and our identification condition is satisfied. (Assuming the treatment variable is binary when it is in fact multi-valued may prevent us from drawing desirable conclusions. Aliprantis (2017) provides empirical evidence and theoretical arguments in favor of adopting a model with more than two treatment levels to evaluate the neighborhood effects of MTO.)

We proceed to confirm the preceding argument with monotonicity inequalities. As discussed in Section 3.2 for the examples of house purchase, the utility maximization problem (3.3) of each family ω with budget constraint B_ω(z, t) generates the monotonicity inequalities. As in the examples of house purchase, we assume the following relationships for the budget set B_ω(z, t) of family ω:

B_ω(a, 0) = B_ω(b, 0) = B_ω(c, 0), (5.1)
B_ω(a, 1) = B_ω(c, 1) ⊂ B_ω(b, 1), (5.2)
B_ω(a, 2) ⊂ B_ω(b, 2) = B_ω(c, 2). (5.3)

Relationships (5.1)-(5.3) can be interpreted in the same way as (3.4)-(3.6) in Section 3.2. Applying the choice rule (3.7) to budget set relationships (5.1)-(5.3) generates the monotonicity inequalities summarized in Table IV. From Table IV, we may assume part (A2) of Assumption 2 for the subset Λ of P that contains (c, a) and (b, c).

Table IV: Monotonicity inequalities of MTO

            T = 0            T = 1            T = 2
  (c, a)    D^0_c ≤ D^0_a    D^1_c ≤ D^1_a    D^2_c ≥ D^2_a
  (b, c)    D^0_b ≤ D^0_c    D^1_b ≥ D^1_c    D^2_b ≤ D^2_c

The monotonicity relationships of Table IV conform with our identification condition. Assume the other conditions of Assumptions 1 and 2. From Table IV, the sign values of (c, a) and (b, c) are l(c, a) = 2 and l(b, c) = 1 respectively. Hence, (c, a) and (b, c) induce different types of monotonicity relationships. Therefore, Assumption 3 holds and we can apply Theorems 1 and 2 to identify the treatment effects as closed-form expressions.

Relationship (5.1) is weaker than Assumption A-2 of Pinto (2015), and relationships (5.2) and (5.3) are the same as Assumption A-1 of Pinto (2015). The inequalities in Table IV and the statement of Lemma L-1 of Pinto (2015) are equivalent. Pinto (2015) further assumes that a neighborhood is a normal good and generates additional monotonicity inequalities beyond those in Table IV. The assumptions of Pinto (2015) lead to part (A2) of Assumption 2 for all the pairs in P, and the unordered monotonicity assumption holds. We adopt weaker assumptions because our assumptions are sufficient to identify the treatment effects.

ATEs for moving to low and high poverty neighborhoods are E[Y_2] − E[Y_0] and E[Y_1] − E[Y_0] respectively. They are identified from Theorem 1. LATEs for moving to low and high poverty neighborhoods are E[Y_2 | D^0_a = 1, D^2_c = 1] − E[Y_0 | D^0_a = 1, D^2_c = 1] and E[Y_1 | D^0_c = 1, D^1_b = 1] − E[Y_0 | D^0_c = 1, D^1_b = 1] respectively.
The subpopulation {D^0_a = 1, D^2_c = 1} represents the families that do not relocate when they are offered no voucher, but relocate to a low poverty neighborhood when they are offered the experimental voucher. The subpopulation {D^0_c = 1, D^1_b = 1} represents the families that do not relocate when they are offered the experimental voucher, but relocate to a high poverty neighborhood when they are offered the Section 8 voucher. These treatment effects are identified from Theorem 2. The corresponding quantile treatment effects are also identified. Therefore, when the outcome variable is continuously distributed, the treatment effects are nonparametrically identified as closed-form expressions without another observable variable for a measure of the treatment, under some additional assumptions on unobservable factors such as the rank similarity assumption.

In this paper, we establish sufficient conditions for identification of the treatment effects when the treatment is discrete and endogenous. We show that the monotonicity assumption is sufficient when it holds in an appropriate way, and this condition is economically interpretable. We also derive the closed-form expressions of the identified treatment effects.

For the estimation procedure, Wüthrich (2019) estimates the observable conditional c.d.f.'s, quantile functions, and probabilities in the closed-form expression semiparametrically by existing methods and constructs the estimator by plugging them into the closed-form expression. We can apply a similar approach to our closed-form expressions. Alternatively, especially for estimation of the QTE, we can apply the existing estimation methods under structural quantile models based on the GMM objective function after checking our identification conditions.

Appendix

A Proofs of the results in the main text
Proofs in this section use some auxiliary results (Lemmas 5-7) collected in Appendix B.
Proof of Lemma 1.
Observe that, for each s ∈ T and x ∈ X, we have

F_{Y_s|X}(y | x) = F_{Y_s|ZX}(y | z, x) = Σ_{t=0}^{k} F_{Y_s|TZX}(y | t, z, x) p_t(z, x) for y ∈ Y° (A.1)

and

E[Y_s | X = x] = E[Y_s | Z = z, X = x] = Σ_{t=0}^{k} E[Y_s | T = t, Z = z, X = x] p_t(z, x), (A.2)

where the first equalities in (A.1) and (A.2) hold from part (A2) of Assumption 1. Take any y* ∈ Y°. Then there exists τ* ∈ (0, 1) such that y* = Q_{Y_s|X}(τ* | x) holds from part (A1) of Assumption 1. This τ* can be expressed as τ* = F_{Y_s|X}(y* | x).

First, we show part (a). The required result (2.3) holds if we show

F_{Y_s|TZX}(y* | t, z, x) = F_{Y_t|TZX}(φ^x_{s,t}(y*) | t, z, x). (A.3)

We proceed to show (A.3). Under the rank similarity assumption, we have

F_{U_s|TZX}(τ* | t, z, x) = F_{U_t|TZX}(τ* | t, z, x). (A.4)

Then, observe that the following equations hold:

{U_s ≤ τ*} = {F_{Y_s|X}(Y_s | x) ≤ F_{Y_s|X}(y* | x)} = {Y_s ≤ y*} (A.5)

and

{U_t ≤ τ*} = {F_{Y_t|X}(Y_t | x) ≤ F_{Y_t|X}(φ^x_{s,t}(y*) | x)} = {Y_t ≤ φ^x_{s,t}(y*)}. (A.6)

The first equality in (A.5) holds from the definitions of U_s and τ*. The first equality in (A.6) holds from the definitions of U_t, τ*, and φ^x_{s,t}, and because F_{Y_t|X}(Q_{Y_t|X}(τ* | x) | x) = τ* holds from part (A1) of Assumption 1. The second equalities in (A.5) and (A.6) hold because F_{Y_t|X}(y | x) for t ∈ T are strictly increasing in y ∈ Y° by Lemma 5. Therefore, applying (A.5) and (A.6) to (A.4) leads to (A.3), and (2.3) holds for y ∈ Y° by applying (A.3) to (A.1).

Next, we show part (b). The required result (2.4) holds if we show the following equation:

E[Y_s | T = t, Z = z, X = x] = E[φ^x_{t,s}(Y_t) | T = t, Z = z, X = x]. (A.7)

We proceed to show (A.7). Observe that we have

{U_t ≤ τ*} = {Q_{Y_s|X}(U_t | x) ≤ Q_{Y_s|X}(τ* | x)} = {φ^x_{t,s}(Y_t) ≤ y*}. (A.8)

The first equality in (A.8) holds because Q_{Y_s|X}(τ | x) is strictly increasing in τ ∈ (0, 1), and the second holds from the definitions of U_t and φ^x_{t,s}. Then, applying (A.5) and (A.8) to (A.4) leads to

F_{Y_s|TZX}(y* | t, z, x) = F_{φ^x_{t,s}(Y_t)|TZX}(y* | t, z, x). (A.9)

Because F_{Y_t|X}(· | x) is continuous, U_t ∼ U(0, 1) conditional on X = x holds. Then φ^x_{t,s}(Y_t) = Q_{Y_s|X}(U_t | x) =_d Y_s conditional on X = x holds, and F_{φ^x_{t,s}(Y_t)|X}(· | x) is continuous. Hence, F_{Y_s|TZX}(· | t, z, x) and F_{φ^x_{t,s}(Y_t)|TZX}(· | t, z, x) are also continuous. Then, from the assumption that the closure of Y° is equal to Y, we have

F_{Y_s|TZX}(y | t, z, x) = F_{φ^x_{t,s}(Y_t)|TZX}(y | t, z, x) for y ∈ Y. (A.10)

Therefore, Y_s =_d φ^x_{t,s}(Y_t) conditional on (T, Z, X) = (t, z, x) holds, and we have (A.7). (2.4) holds by applying (A.7) to (A.2). ✷

Proof of Lemma 2.
Observe that we have C^t_{z,z'} = {D^t_{z'} = 1} \ {D^t_z = D^t_{z'} = 1} from the definition of C^t_{z,z'}, and that P(D^t_z ≤ D^t_{z'} | X = x) = 1 implies p_t(z, x) = P(D^t_z = D^t_{z'} = 1 | X = x). Hence, (4.1) holds and P(C^t_{z,z'} | X = x) > 0 is equivalent to p_t(z', x) > p_t(z, x). For y ∈ Y, an analogous argument gives

P(Y_t ≤ y, C^t_{z,z'} | X = x) = F_{Y|TZX}(y | t, z', x) p_t(z', x) − F_{Y|TZX}(y | t, z, x) p_t(z, x), (A.11)

and (4.2) holds from (4.1) and (A.11). ✷

For simplicity, we suppress the conditioning variable X in the proofs of Lemmas 3-4 unless stated otherwise. The proof of Lemma 3 becomes simpler when the treatment takes three values. We first show Lemma 3 for the case of k = 2.

Proof of Lemma 3 (for the case of k = 2). Suppose that the monotonicity subset Λ ⊂ P contains two pairs of instrument values (a, b), (d, c) such that the sign values are l(a, b) = 2 and l(d, c) = 1, and D^2_a ≥ D^2_b and D^1_d ≥ D^1_c hold almost surely, where a, b, c, d ∈ Z and these values may not be all different. Then, the types of the monotonicity relationships on (a, b) and (d, c) are different. In this proof, we assume d = a, so that the monotonicity inequalities correspond to those of Example I in Section 3.2. The proof does not rely on this assumption, and we can show this lemma without it similarly.

It suffices to show that φ_{2,0} and φ_{2,1} are identified on Y°. Identification of the other counterfactual mappings results from identification of φ_{2,0} and φ_{2,1}. To see this, we show that φ^{−1}_{s,t} exists on Y°, and that φ_{s,r} is identified on Y° if φ_{s,t} and φ_{t,r} are identified on Y° for s, t, r ∈ T. We first show that φ^{−1}_{s,t} exists on Y°. First, φ_{s,t} is strictly increasing on Y° from part (A1) of Assumption 1 and Lemmas 5 and 6. Second, φ_{s,t}(Y°) = Y° holds because we assume that F_{Y_t}(Y°) does not depend on t ∈ T. Hence, from the definition of φ_{s,t}, the inverse mapping φ^{−1}_{s,t} exists on Y°, and φ_{t,s}(y) = φ^{−1}_{s,t}(y) holds for y ∈ Y°.
We next show that φ s,r is identified on Y ◦ if φ s,t and φ t,r are identified on Y ◦ . First, φ s,r = φ t,r ◦ φ s,t holds because F Y t ( Q Y t ( τ )) = τ for τ ∈ (0 , 1) holds from part (A1) of Assumption 1. Second, φ s,t ( Y ◦ ) = Y ◦ holds. Hence, the stated result follows. From the preceding argument, y ∈ Y ◦ implies φ s,t ( y ) ∈ Y ◦ under the assumption that F Y t ( Y ◦ ) does not depend on t ∈ T . We use this property in this proof. We proceed to show that φ 2,0 and φ 2,1 are identified on Y ◦ . We divide the proof into parts (i)-(iii). Part (i).
In this part, for ( a, b ) , ( a, c ) ∈ Λ (note that we can assume them without loss of generality), we show that (4.10) and (4.11) hold for y ∈ Y . To this end, we first show P ( C b,a ) = P ( C a,b ) + P ( C a,b ) (A.12) and P ( C c,a ) = P ( C a,c ) + P ( C a,c ) . (A.13) We proceed to show (A.12). (A.13) follows from a similar argument. Observe that, from the definition, we have P ( C b,a ) = P ( D a = 1 , D b = 1) + P ( D a = 1 , D b = 1) (A.14) and P ( C a,b ) = P ( D a = 1 , D b = 1) + P ( D a = 1 , D b = 1) . (A.15) Note that P ( D a = 1 , D b = 1) ≤ P ( D a = 1 , D b = 0) = 0 because { D a = 1 , D b = 1 } is contained in { D a = 1 , D b = 0 } , and D a ≤ D b holds almost surely from part (A1) of Assumption 2. Hence, we have P ( C a,b ) = P ( D a = 1 , D b = 1) . (A.16) An analogous argument gives P ( C a,b ) = P ( D a = 1 , D b = 1) . (A.17) Therefore, applying (A.16) and (A.17) to (A.14) gives (A.12). With (A.12) and (A.13) at hand, we show that (4.10) and (4.11) hold for y ∈ Y . We proceed to show (4.10). (4.11) follows from a similar argument. Take any y ∗ ∈ Y ◦ . Then there exists τ ∗ ∈ (0 , 1) such that y ∗ = Q Y ( τ ∗ ) holds from part (A1) of Assumption 1. Observe that F U |C a,b ( τ ∗ ) = F U |C a,b ( τ ∗ ) and F U |C a,b ( τ ∗ ) = F U |C a,b ( τ ∗ ) (A.18) hold because { U s } 2 s =0 are identically distributed conditional on each C ta,b for t = 0 , 1 , 2, and hence F U |C b,a ( τ ∗ ) = [ F U |C a,b ( τ ∗ ) P ( C a,b ) + F U |C a,b ( τ ∗ ) P ( C a,b ) ] / P ( C b,a ) . (A.19) From (A.5) and (A.6) in the proof of Lemma 1, { U 0 ≤ τ ∗ } = { Y 0 ≤ y ∗ } and { U t ≤ τ ∗ } = { Y t ≤ φ 0,t ( y ∗ ) } for t = 1 , 2 hold, and (4.10) holds for y ∈ Y ◦ by applying them to (A.19). Part (ii).
From this part, we solve (4.10) and (4.11) for φ 2,0 and φ 2,1 simultaneously. Take any y f ∈ Y ◦ . Define φ y f ,0 as in (4.14). In this part, we show that φ y f ,0 satisfies (4.13). Observe that φ y f ,0 is the identified function that satisfies F Y |C c,a ( y ) = [ F Y |C a,c ( φ y f ,0 ( y )) P ( C a,c ) + F Y |C a,c ( y f ) P ( C a,c ) ] / P ( C c,a ) for y ∈ Y f . (A.20) This is because F Y |C a,c is continuous on Y from Assumptions 1 and 2 and Lemma 2, and F Y |C a,c ( Q Y |C a,c ( τ )) = τ holds for τ ∈ (0 , 1). Comparing (4.11) at y f and (A.20) at φ 2,1 ( y f ), we have F Y |C a,c ( φ 2,0 ( y f )) = F Y |C a,c ( φ y f ,0 ( φ 2,1 ( y f ))) . (A.21) Observe that φ 2,1 ( y f ) is contained in Y ◦ , and F Y |C a,c is strictly increasing on Y ◦ from Lemma 5. Therefore, (4.13) holds by taking the inverse of F Y |C a,c in (A.21). Part (iii).
In this part, we finally show that (4.12) and (4.13) are the unique solution to (4.10) and (4.11) at y f . First, we plug (4.13) into (4.10) at y f and obtain F Y |C b,a ( y f ) = [ F Y |C a,b ( φ y f ,0 ( φ 2,1 ( y f ))) P ( C a,b ) + F Y |C a,b ( φ 2,1 ( y f )) P ( C a,b ) ] / P ( C b,a ) . (A.22) Define a function G y f ,0 as G y f ,0 ( y ) := [ F Y |C a,b ( φ y f ,0 ( y )) P ( C a,b ) + F Y |C a,b ( y ) P ( C a,b ) ] / P ( C b,a ) . (A.23) Then we can write (A.22) as F Y |C b,a ( y f ) = G y f ,0 ( φ 2,1 ( y f )). Observe that G y f ,0 is strictly increasing on Y f ∩ Y ◦ because F Y |C a,b , F Y |C a,b , and φ y f ,0 are strictly increasing on Y f ∩ Y ◦ . φ 2,1 ( y f ) is contained in Y f ∩ Y ◦ , and we can solve (A.22) for φ 2,1 ( y f ) by taking the inverse of G y f ,0 . Hence, φ 2,1 ( y ) is identified at each y f ∈ Y ◦ as (4.12). Therefore, φ 2,0 and φ 2,1 are identified on Y ◦ , and the other counterfactual mappings are also identified because they are inversions or compositions of φ 2,0 and φ 2,1 . ✷ We next show Lemma 3 for general k ∈ T . Proof of Lemma 3 (for general k ∈ T ). Suppose that the monotonicity subset Λ ⊂ P contains k pairs of instrument values λ 1 , . . . , λ k such that each sign value is l λ i = i for i = 1 , . . . , k . For notational simplicity, let λ i = ( i, 0) and D ii ≥ D i0 hold almost surely. Then, the monotonicity inequalities correspond to those of Example II in Section 3.2, and the types of monotonicity relationships on ( i, 0) and ( j, 0) for i ≠ j are different. The proof does not rely on this assumption, and we can prove the lemma without it similarly. It suffices to show that φ k,0 , . . . , φ k,k−1 are identified on Y ◦ . As discussed in the proof for the case of k = 2, identification of the other counterfactual mappings follows from identification of φ k,0 , . . . , φ k,k−1 , and y ∈ Y ◦ implies φ s,t ( y ) ∈ Y ◦ under the assumption that F Y t ( Y ◦ ) does not depend on t ∈ T . We proceed to show that φ k,0 , . . . , φ k,k−1 are identified on Y ◦ . We divide the proof into parts (i)-(ii). We do not derive the closed-form expressions of the counterfactual mappings in this proof. We derive the closed-form expressions in Appendix C. Part (i).
In this part, for λ 1 , . . . , λ k ∈ Λ, we show (4.15). To this end, we first show P ( C i 0,i ) = Σ j ≠ i P ( C j i,0 ) for i = 1 , . . . , k. (A.24) We proceed to show (A.24). Observe that, from the definition, we have P ( C i 0,i ) = Σ j ≠ i P ( D ii = 1 , D j0 = 1) for i = 1 , . . . , k, (A.25) P ( C j i,0 ) = Σ l ≠ j P ( D j0 = 1 , D li = 1) for j ∈ T \ { i } . (A.26) Note that, for j, l ∈ T \ { i } and j ≠ l , we have P ( D j0 = 1 , D li = 1) ≤ P ( D l0 = 0 , D li = 1) = 0 because { D j0 = 1 , D li = 1 } is contained in { D l0 = 0 , D li = 1 } , and D li ≤ D l0 holds almost surely from Assumption 3. Hence, we have P ( C j i,0 ) = P ( D ii = 1 , D j0 = 1) for i = 1 , . . . , k and j ∈ T \ { i } . (A.27) Therefore, by plugging (A.27) into (A.25), we obtain (A.24). As we obtain (4.10) and (4.11) from (A.12) and (A.13) in the proof for the case of k = 2, we obtain (4.15) from (A.24) using Lemma 7. Part (ii).
In this part, we show that (4.15) is uniquely solved for φ k,0 , . . . , φ k,k−1 simultaneously. To this end, consider the following simultaneous equations of ( y 0 , . . . , y k ): F Y i |C i 0,i ( y i ) = Σ j ≠ i F Y j |C j i,0 ( y j ) P ( C j i,0 ) / P ( C i 0,i ) for i = 1 , . . . , k and y i ∈ Y ◦ . (A.28) If ( y 0 , . . . , y k ) = ( φ k,0 ( y k ) , . . . , φ k,k−1 ( y k ) , y k ) is the unique solution that satisfies (A.28), then φ k,0 ( y ) , . . . , φ k,k−1 ( y ) are identified for y ∈ Y ◦ in (4.15). We proceed to show that ( y 0 , . . . , y k ) = ( φ k,0 ( y k ) , . . . , φ k,k−1 ( y k ) , y k ) is the unique solution that satisfies (A.28). Suppose we have a solution ( y ′ 0 , . . . , y ′ k−1 , y k ) different from ( φ k,0 ( y k ) , . . . , φ k,k−1 ( y k ) , y k ) that also satisfies (A.28). We first consider the case of y ′ 0 < φ k,0 ( y k ). Then, from (A.28) with i = k , there exists j ∈ T \ { 0 , k } such that y ′ j > φ k,j ( y k ) holds. To see this, suppose y ′ j ≤ φ k,j ( y k ) holds for all j ∈ T \ { 0 , k } . Then, because F Y j |C j k,0 is strictly increasing on Y ◦ , we have F Y k |C k 0,k ( y ′ k ) > Σ j ≠ k F Y j |C j k,0 ( y ′ j ) P ( C j k,0 ) / P ( C k 0,k ) , (A.29) and (A.28) with i = k does not hold. Without loss of generality, suppose j = k − 1. Because φ k,k−1 is strictly increasing on Y ◦ , there exists y (1) > y k such that y ′ k−1 = φ k,k−1 ( y (1) ) holds. Then, from (A.28) with i = k − 1, there exists j ∈ T \ { 0 , k − 1 , k } such that y ′ j > φ k,j ( y (1) ) holds. To see this, suppose y ′ j ≤ φ k,j ( y (1) ) holds for all j ∈ T \ { 0 , k − 1 , k } . Then, by comparing (4.15) at y (1) and (A.28) with i = k − 1, using strict monotonicity of F Y j |C j k−1,0 on Y ◦ , we have F Y k−1 |C k−1 0,k−1 ( φ k,k−1 ( y (1) )) > Σ j ≠ k−1 F Y j |C j k−1,0 ( y ′ j ) P ( C j k−1,0 ) / P ( C k−1 0,k−1 ) , (A.30) and (A.28) with i = k − 1 does not hold. Without loss of generality, suppose j = k − 2. Because φ k,k−2 is strictly increasing in Y ◦ , there exists y (2) > y (1) such that y ′ k−2 = φ k,k−2 ( y (2) ) holds. Then, by repeating similar discussions, we can show from (A.28) with i = 2 , . . . , k that, without loss of generality, there exists y k < y (1) < · · · < y ( k −
Proof of Theorem 1. It suffices to show that, for each s ∈ T and x ∈ X , the conditional distribution of Y s given X = x is identified. Because φ xs,t ( y ) for y ∈ Y ◦ is identified from Lemma 3, F Y s | X ( y | x ) for y ∈ Y ◦ is identified from (2.3) in Lemma 1. Then, from the assumption that the closure of Y ◦ is equal to Y , F Y s | X ( y | x ) for y ∈ Y is identified. Therefore, the conditional distribution of Y s given X = x is identified. ✷ Proof of Lemma 4.
First, we show (4.16) and (4.17). Because U t and U t ′ are identically distributed conditional on C tz,z ′ by Lemma 7, we have F U t ′ |C tz,z ′ ( τ ) = F U t |C tz,z ′ ( τ ) for τ ∈ (0 , 1) . (A.32) Then, similar to the derivations of (A.6) and (A.8)-(A.10) in the proof of Lemma 1, we have { U t ≤ τ } = { Y t ≤ φ t ′ ,t ( y ) } for y ∈ Y ◦ (A.33) and Y t ′ d = φ xt,t ′ ( Y t ) conditional on C tz,z ′ . (A.34) Hence, applying (A.33) to (A.32) leads to (4.16), and (4.17) follows from (A.34). Next, we show that, for each t ′ ∈ T , the conditional distribution of Y t ′ given C tz,z ′ is identified. Because φ xt ′ ,t ( y ) for y ∈ Y ◦ is identified from Lemma 3, F Y t ′ |C tz,z ′ ( y ) for y ∈ Y ◦ is identified from (4.16). Then, from the assumption that the closure of Y ◦ is equal to Y , F Y t ′ |C tz,z ′ ( y ) for y ∈ Y is identified. Therefore, the conditional distribution of Y t ′ given C tz,z ′ is identified, and the stated result follows. ✷ Proof of Theorem 2.
From Assumption 3, for all t, t ′ ∈ T and x ∈ X , there exists ( z, z ′ ) ∈ P such that l ( z,z ′ ) ∈ { t, t ′ } . We show the case of l ( z,z ′ ) = t ′ . The case of l ( z,z ′ ) = t follows from a similar argument. From (A.27) in the proof of Lemma 3 for general k ∈ T , we have P ( C tz,z ′ | X = x ) = P ( D t ′ z = 1 , D tz ′ = 1 | X = x ). Therefore, the stated result follows by Lemma 4. ✷ B Auxiliary results
The following lemmas are used in the proofs in Appendix A.
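Before turning to the lemmas, it may help to fix ideas with a small numerical sketch (not part of the paper's formal argument) of the counterfactual mapping used throughout, φ s,t ( y ) = Q Y t ( F Y s ( y )): under rank similarity, ranking y in the distribution of Y s and reading off the same quantile of Y t recovers the outcome a unit of that rank would obtain under treatment t . The quantile functions g s and g t below are illustrative assumptions, not taken from the paper.

```python
import random
import bisect

random.seed(0)

# Illustrative model with rank similarity built in: a common rank U for each
# unit, with potential outcomes Y_s = g_s(U) = sqrt(U) and Y_t = g_t(U) = 1 + 2U.
u = [random.random() for _ in range(100_000)]
y_s = sorted(v ** 0.5 for v in u)       # sorted sample from Y_s
y_t = sorted(1.0 + 2.0 * v for v in u)  # sorted sample from Y_t

def phi(y, sample_s, sample_t):
    """Empirical counterfactual mapping phi_{s,t}(y) = Q_{Y_t}(F_{Y_s}(y))."""
    tau = bisect.bisect_right(sample_s, y) / len(sample_s)  # empirical F_{Y_s}(y)
    i = min(int(tau * len(sample_t)), len(sample_t) - 1)
    return sample_t[i]                                      # empirical Q_{Y_t}(tau)

# A unit with Y_s = 0.5 has rank U = 0.25, so its Y_t should be 1 + 2(0.25) = 1.5.
print(round(phi(0.5, y_s, y_t), 2))
```

With 100,000 draws the printed value is close to 1.5, matching the population mapping Q Y t ( F Y s (0.5)) = 1 + 2 · 0.25 in this illustrative model.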
Lemma 5 (Strict monotonicity on the interior of the support). Let W be a scalar-valued random variable whose support is W . Then, F W is strictly increasing on W ◦ . Proof of Lemma 5. Because W is the support of W , we have W = { w ∈ R : F W ( w + ε ) − F W ( w − ε ) > 0 for all ε > 0 } . (B.1) Consider w 1 , w 2 ∈ W ◦ with w 1 < w 2 . Then, there exists δ > 0 such that 0 < η ≤ δ ⇒ w 1 + η ∈ W holds. First, suppose w 2 − w 1 ≤ δ . Then, because ( w 1 + w 2 ) / 2 ∈ W , we have F W ( w 1 ) < F W ( w 2 ) from (B.1). Second, suppose w 2 − w 1 > δ . Then, because w 1 + δ/ 2 ∈ W , we have F W ( w 1 ) < F W ( w 1 + δ ) from (B.1). Hence, we have F W ( w 1 ) < F W ( w 2 ), and the stated result follows. ✷ Lemma 6 (Strict monotonicity of the quantile function). If F W is continuous, then Q W is strictly increasing on (0 , 1). Proof of Lemma 6. Take τ 1 , τ 2 ∈ (0 , 1) with τ 1 < τ 2 . Suppose that Q W ( τ 1 ) = Q W ( τ 2 ) holds. Because F W is continuous, F W ( W ) ∼ U (0 , 1) holds. Then, from Q W ( τ ) ≤ w ⇔ τ ≤ F W ( w ) for τ ∈ (0 , 1) and w ∈ R , we have 1 − τ i = P ( F W ( W ) ≥ τ i ) = P ( W ≥ Q W ( τ i )) for i = 1 , 2. Hence, we have τ 1 = τ 2 , which is a contradiction. Therefore, the stated result follows. ✷ Lemma 7 (Rank similarity on the compliers). Assume that Assumptions 1 and 2 hold, and that P ( D tz ≤ D tz ′ | X = x ) = 1 and P ( C tz,z ′ | X = x ) > 0 hold for ( z, z ′ ) ∈ P , x ∈ X , and t ∈ T . Then { U s } k s =0 are identically distributed conditional on C tz,z ′ and X = x . Proof of Lemma 7. As we show (4.2) of Lemma 2, we can show that, for τ ∈ (0 , 1) and t ′ ∈ T , F U t ′ |C tz,z ′ X ( τ | x ) = [ F U t ′ | T ZX ( τ | t, z ′ , x ) p t ( z ′ , x ) − F U t ′ | T ZX ( τ | t, z, x ) p t ( z, x ) ] / [ p t ( z ′ , x ) − p t ( z, x ) ] (B.2) holds. Under rank similarity, for z + ∈ Z , we have F U t ′ | T ZX ( τ | t, z + , x ) = F U t | T ZX ( τ | t, z + , x ) . (B.3) Combining (B.2) with (B.3) leads to F U t ′ |C tz,z ′ X ( τ | x ) = F U t |C tz,z ′ X ( τ | x ), and the stated result follows. ✷ References Aliprantis, D. (2017). Assessing the evidence on neighborhood effects from moving to opportunity. Empirical Economics, 52(3):925–954. Aliprantis, D. and Richter, F. (2019). Evidence of neighborhood effects from moving to opportunity: LATEs of neighborhood quality.
FRB of Cleveland Working Paper No. 12-08r3. Angrist, J. D. and Imbens, G. W. (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90(430):431–442. Athey, S. and Imbens, G. W. (2006). Identification and inference in nonlinear difference-in-differences models. Econometrica, 74(2):431–497. Caetano, C. and Escanciano, J. C. (2020). Identifying multiple marginal effects with a single instrument. Econometric Theory, pages 1–31. Chen, L.-Y. and Lee, S. (2018). Exact computation of GMM estimators for instrumental variable quantile regression models. Journal of Applied Econometrics, 33(4):553–567. Chen, X. and Pouzo, D. (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152(1):46–60. Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321. Chernozhukov, V. and Hansen, C. (2005). An IV model of quantile treatment effects. Econometrica, 73(1):245–261. Chernozhukov, V. and Hansen, C. (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132(2):491–525. Chernozhukov, V. and Hong, H. (2003). An MCMC approach to classical estimation. Journal of Econometrics, 115(2):293–346. Chernozhukov, V., Imbens, G. W., and Newey, W. K. (2007). Instrumental variable estimation of nonseparable models. Journal of Econometrics, 139(1):4–14. Das, M. (2005). Instrumental variables estimators of nonparametric models with discrete endogenous regressors. Journal of Econometrics, 124(2):335–361. de Castro, L., Galvao, A. F., Kaplan, D. M., and Liu, X. (2019). Smoothed GMM for quantile models. Journal of Econometrics, 213(1):121–144. Feng, J. (2019). Matching points: Supplementing instruments with covariates in triangular models.
arXiv preprint arXiv:1904.01159. Feng, Q., Vuong, Q., and Xu, H. (2020). Estimation of heterogeneous individual treatment effects with endogenous treatments. Journal of the American Statistical Association, 115(529):231–240. Gagliardini, P. and Scaillet, O. (2012). Nonparametric instrumental variable estimation of structural quantile effects. Econometrica, 80(4):1533–1562. Heckman, J. J. (2001). Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of Political Economy, 109(4):673–748. Heckman, J. J. and Pinto, R. (2018). Unordered monotonicity. Econometrica, 86(1):1–35. Heckman, J. J., Urzua, S., and Vytlacil, E. (2006). Understanding instrumental variables in models with essential heterogeneity. The Review of Economics and Statistics, 88(3):389–432. Horowitz, J. L. and Lee, S. (2007). Nonparametric instrumental variables estimation of a quantile regression model. Econometrica, 75(4):1191–1208. Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467–475. Kaido, H. and Wuthrich, K. (2018). Decentralization estimators for instrumental variable quantile regression models. arXiv preprint arXiv:1812.10925. Lee, S. and Salanié, B. (2018). Identifying effects of multivalued treatments. Econometrica, 86(6):1939–1963. Mogstad, M., Torgovitsky, A., and Walters, C. (2019). The causal interpretation of two-stage least squares with multiple instrumental variables. Working Paper 25691, National Bureau of Economic Research. Mountjoy, J. (2019). Community colleges and upward mobility. Available at SSRN 3373801. Orr, L., Feins, J., Jacob, R., Beecroft, E., Sanbonmatsu, L., Katz, L. F., Liebman, J. B., and Kling, J. R. (2003). Moving to Opportunity: Interim Impacts Evaluation. Washington, DC: US Department of Housing and Urban Development, Office of Policy Development and Research. Pinto, R. (2015). Selection bias in a controlled experiment: The case of moving to opportunity.
Unpublished Ph.D. thesis, University of Chicago, Department of Economics. Sanbonmatsu, L., Ludwig, J., Katz, L. F., Gennetian, L. A., Duncan, G. J., Kessler, R. C., Adam, E., McDade, T. W., and Lindau, S. T. (2011). Moving to Opportunity for Fair Housing Demonstration Program: Final Impacts Evaluation. Washington, DC: US Department of Housing and Urban Development, Office of Policy Development and Research. Shroder, M. D. and Orr, L. L. (2012). Moving to opportunity: Why, how, and what next? Cityscape, 14(2):31–56. Vuong, Q. and Xu, H. (2017). Counterfactual mapping and individual treatment effects in nonseparable models with binary endogeneity. Quantitative Economics, 8(2):589–610. Wüthrich, K. (2019). A closed-form estimator for quantile treatment effects with endogeneity. Journal of Econometrics, 210(2):219–235. Zhu, Y. (2018). k-step correction for mixed integer linear programming: a new approach for instrumental variable quantile regressions and related problems. arXiv preprint arXiv:1805.06855. Supplement to “Identification of multi-valued treatment effects with unobserved heterogeneity” Koki Fusejima ∗ Graduate School of Economics, University of Tokyo. This version: October 12, 2020. This Supplemental Appendix is organized as follows. Appendix C derives the closed-form expressions of the counterfactual mappings for the general discrete treatment case. Appendix D relaxes some conditions assumed for simplicity in the main paper and precisely derives the closed-form expressions of the treatment effects. Appendix E shows some sufficient conditions for Assumption 2 introduced in Section 3.1. Appendix F shows the budget set relationships of the families we assume for Examples II and III in Section 2. Appendix G derives the relationships between the compliers of Example II in Section 3.2.
In Appendix H, we discuss the reason why we assume that the treatment does not have larger support than the instrument variable. For simplicity, we suppress the conditioning variable X unless stated otherwise. ∗ Email: [email protected] C Closed-form expressions of the counterfactual mappings in Lemma 3 C.1 Derivation of the closed-form expressions In this section, we derive the closed-form expressions of the φ s,t ’s in Lemma 3. Take any y f,k ∈ Y ◦ . Define G y f k−1,k ( y ) for y ∈ Y f as G y f k−1,k ( y ) := [ Σ k−2 j =0 F Y j |C j k,0 ( φ y f k−2,j ( y )) P ( C j k,0 ) + F Y k−1 |C k−1 k,0 ( y ) P ( C k−1 k,0 ) ] / P ( C k 0,k ) , (C.1) where φ y f k−2,j for j = 0 , . . . , k − 2 whose domains are Y f ⊂ Y are constructed to satisfy the following equations with φ y f k−2,k−2 ( y ) = y : F Y i |C i 0,i ( φ y f k−2,i ( y )) = [ Σ j ≠ i,k,k−1 F Y j |C j i,0 ( φ y f k−2,j ( y )) P ( C j i,0 ) + F Y k−1 |C k−1 i,0 ( y ) P ( C k−1 i,0 ) + F Y k |C k i,0 ( y f,k ) P ( C k i,0 ) ] / P ( C i 0,i ) for i = 1 , . . . , k − 2 and y ∈ Y f . (C.2) The closed-form expressions of φ k,j ( y f,k ) for j = 0 , . . . , k − 1 are φ k,k−1 ( y f,k ) = sup { y ∈ Y f : G y f k−1,k ( y ) ≤ F Y k |C k 0,k ( y f,k ) } = inf { y ∈ Y f : G y f k−1,k ( y ) ≥ F Y k |C k 0,k ( y f,k ) } , (C.3) φ k,j ( y f,k ) = φ y f k−2,j ( φ k,k−1 ( y f,k )) for j = 0 , . . . , k − 2 . (C.4) As discussed in the proof of Lemma 3 in Appendix A, other counterfactual mappings are also identified as closed-form expressions on Y ◦ because they are inversions or compositions of φ k,0 , . . . , φ k,k−1 . We proceed to show that (C.3) and (C.4) are the unique solution to (4.15) at y f,k . We divide the proof into parts (i)-(iii). (The domain Y f contains φ k,k−1 ( y f,k ), and when k = 2, (A.20) and (C.2) are the same.) Part (i). In this part, we construct φ y f k−2,j for j = 0 , . . . , k − 2. Y f ⊂ Y is the domain of φ y f k−2,j for j = 0 , . . . , k − 2, and Y f contains φ k,k−1 ( y f,k ).
By comparing (4.15) at y f,k and (C.2), these functions exist at φ k,k−1 ( y f,k ). To this end, consider the case of k = h . We can construct these functions for any h by starting the following discussion from the case of h = 3 and repeating the discussion inductively for h = 4 , 5 , . . . . Suppose we know how to construct φ y f k−2,j for j = 0 , . . . , k − 2 for the cases of k = 2 , . . . , h − 1. From (C.2) with k = h , for y f,h−1 ∈ Y f , we can construct φ y f h−2,j whose domain is Y f, 1 ⊂ Y as follows: φ y f h−2,h−2 ( y f,h−1 ) ∈ { y ∈ Y f, 1 : G y f h−2,h−1 ( y ) = F Y h−1 |C h−1 0,h−1 ( y f,h−1 ) } , (C.5) φ y f h−2,j ( y f,h−1 ) = φ y f h−3,j ( φ y f h−2,h−2 ( y f,h−1 )) for j = 0 , . . . , h − 3 , (C.6) where G y f h−2,h−1 is defined as G y f h−2,h−1 ( y ) := [ Σ j ≠ h,h−1 F Y j |C j h−1,0 ( φ y f h−3,j ( y )) P ( C j h−1,0 ) + F Y h |C h h−1,0 ( y f,h ) P ( C h h−1,0 ) ] / P ( C h−1 0,h−1 ) , and φ y f h−3,j for j = 0 , . . . , h − 3 satisfy F Y i |C i 0,i ( φ y f h−3,i ( y )) = [ Σ j ≠ i,h,h−1 F Y j |C j i,0 ( φ y f h−3,j ( y )) P ( C j i,0 ) + Σ h j = h−1 F Y j |C j i,0 ( y f,j ) P ( C j i,0 ) ] / P ( C i 0,i ) for i = 1 , . . . , h − 3 and y ∈ Y f, 1 . (C.7) We can construct φ y f h−2,j for j = 0 , . . . , h − 2 because we can construct φ y f h−3,j for j = 0 , . . . , h − 3 for the case of k = h − 1. Part (ii). In this part, we show that φ y f k−2,j for j = 0 , . . . , k − 2 satisfy (C.4). (We demonstrate the case of h = 3 in Appendix C.2.) Consider the following simultaneous equations of ( y 1 , . . . , y k−1 ): F Y i |C i 0,i ( y i ) = [ Σ j ≠ i,k F Y j |C j i,0 ( y j ) P ( C j i,0 ) + F Y k |C k i,0 ( y f,k ) P ( C k i,0 ) ] / P ( C i 0,i ) for y i ∈ Y ◦ and i = 1 , . . . , k − 1 . (C.8) If ( y 1 , . . . , y k−1 ) = ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) is the only solution that satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ), (C.4) follows from comparing (4.15) at y f,k and (C.2) at φ k,k−1 ( y f,k ). We proceed to show that ( y 1 , . . . , y k−1 ) = ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) is the only solution that satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ).
Suppose we have a solution ( y ′ 1 , . . . , y ′ k−2 , y k−1 ) different from ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) that also satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ). Then, as we show a contradiction for the cases of y ′ 1 < φ k,1 ( y k ), y ′ 1 > φ k,1 ( y k ), and y ′ 1 = φ k,1 ( y k ) in part (ii) of the proof of Lemma 3 for general k ∈ T , we can show that this supposition contradicts the fact that ( y ′ 1 , . . . , y ′ k−2 , y k−1 ) also satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ) for the cases of y ′ 1 < φ k,1 ( y f,k ), y ′ 1 > φ k,1 ( y f,k ), and y ′ 1 = φ k,1 ( y f,k ). Therefore, ( y 1 , . . . , y k−1 ) = ( φ k,1 ( y f,k ) , . . . , φ k,k−1 ( y f,k )) is the unique solution that satisfies (C.8) with y k−1 = φ k,k−1 ( y f,k ). Hence, (C.4) holds regardless of the uniqueness of φ y f k−2,j for j = 0 , . . . , k − 2. Part (iii). In this part, we finally show that (C.3) and (C.4) are the unique solutions to satisfy (4.15) at y f,k . First, we plug (C.4) into (4.15) with i = k : F Y k |C k 0,k ( y f,k ) = [ Σ k−2 j =0 F Y j |C j k,0 ( φ y f k−2,j ( φ k,k−1 ( y f,k ))) P ( C j k,0 ) + F Y k−1 |C k−1 k,0 ( φ k,k−1 ( y f,k )) P ( C k−1 k,0 ) ] / P ( C k 0,k ) . (C.9) Next, we solve (C.9) for φ k,k−1 ( y f,k ). To this end, we first show that, for j = 0 , . . . , k − 2 and y ∈ Y f ∩ Y ◦ , φ y f k−2,j satisfies the following properties: y > φ k,k−1 ( y f,k ) ⇒ φ y f k−2,j ( y ) > φ y f k−2,j ( φ k,k−1 ( y f,k )) (C.10) and y < φ k,k−1 ( y f,k ) ⇒ φ y f k−2,j ( y ) < φ y f k−2,j ( φ k,k−1 ( y f,k )) . (C.11) With (C.10) and (C.11) at hand, for y ∈ Y f ∩ Y ◦ , G y f k−1,k defined in (C.1) satisfies a property similar to (C.10) and (C.11) of φ y f k−2,j . We proceed to show (C.10). (C.11) follows from a similar argument using the reverse signs of inequality. Suppose y + > φ k,k−1 ( y f,k ) satisfies φ y f k−2,0 ( y + ) ≤ φ y f k−2,0 ( φ k,k−1 ( y f,k )). From (C.4), we have φ y f k−2,0 ( y + ) ≤ φ k,0 ( y f,k ).
Then, as we show a contradiction for the case of y ′ 0 < φ k,0 ( y k ) in part (ii) of the proof of Lemma 3 for general k ∈ T , we can show that this supposition contradicts the fact that φ y f k−2,j ( y + ) for j = k − 2 , . . . , 0 satisfy (C.2) at y + . For the case of y + > φ k,k−1 ( y f,k ) satisfying φ y f k−2,0 ( y + ) > φ y f k−2,0 ( φ k,k−1 ( y f,k )), consider the case of φ y f k−2,j ( y + ) ≤ φ y f k−2,j ( φ k,k−1 ( y f,k )) for some j ∈ T \ { 0 , k − 1 , k } , and we can show a contradiction in the same way as the case of y ′ 0 = φ k,0 ( y k ) and y ′ j ≠ φ k,j ( y k ) for some j ∈ T \ { 0 , k } in part (ii) of the proof of Lemma 3 for general k ∈ T . Therefore, (C.10) holds. Therefore, we can solve (C.9) for φ k,k−1 ( y f,k ), and the closed-form expression of φ k,k−1 ( y ) at each y f,k is derived as (C.3). Closed-form expressions of φ k,j for j = 0 , . . . , k − 2 are then derived by applying φ k,k−1 to (C.4). C.2 Supplement to Part (i) In this section, we apply part (i) of the preceding proof to the case of k = 3. Take any y f, ∈ Y . For y ∈ Y f , let φ y f , and φ y f , whose domains are Y f ⊂ Y be the functions that satisfy the following equations: F Y |C , ( y ) = [ F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( y f, ) P ( C , ) ] / P ( C , ) , (C.12) F Y |C , ( φ y f , ( y )) = [ F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( y f, ) P ( C , ) ] / P ( C , ) . (C.13) (C.12) and (C.13) are the same as (C.2) with k = 3.
Then, as we identify the counterfactual mappings in the proof of Lemma 3 for the case of k = 2, φ y f , and φ y f , are identified on Y f , and for y f, ∈ Y f , (C.5) and (C.6) with h = 3 become φ y f , ( y f, ) = { y ∈ Y f, : G y f , ( y ) = F Y |C , ( y f, ) } = sup { y ∈ Y f, : G y f , ( y ) ≤ F Y |C , ( y f, ) } = inf { y ∈ Y f, : G y f , ( y ) ≥ F Y |C , ( y f, ) } , (C.14) φ y f , ( y f, ) = φ y f , ( φ y f , ( y f, )) , (C.15) where φ y f , whose domain is Y f, ⊂ Y is the function that satisfies the following equation: F Y |C , ( y ) = [ F Y |C , ( φ y f , ( y )) P ( C , ) + F Y |C , ( y f, ) P ( C , ) + F Y |C , ( y f, ) P ( C , ) ] / P ( C , ) . (C.16) (C.16) is the same as (C.7) with h = 3, and as we identify φ y f , for the case of k = 2 in the proof of Lemma 3, φ y f , in (C.16) is identified as φ y f , ( y ) = Q Y |C , ( [ F Y |C , ( y ) P ( C , ) − Σ 3 j =2 F Y j |C j , ( y f,j ) P ( C j , ) ] / P ( C , ) ) . D Additional discussion for identification as closed-form expressions In this section, we relax the assumptions that the closure of Y ◦ is equal to Y , and that F Y t | X ( Y ◦ | x ) does not depend on t ∈ T , and precisely derive the closed-form expressions of the treatment effects. To this end, we first define a subset of Y that is sufficiently large and contains Y ◦ , and then we derive the closed-form expressions of φ xs,t and the conditional c.d.f.'s of Y t on that subset of Y . Proofs of lemmas, propositions, and theorems are collected at the end of this section. Before we define the required subset of Y , we introduce some preliminary results that are useful in this section. Let V and W be scalar-valued random variables whose supports are V and W . Define V ∗ := { v ∈ R : F V ( v ) − F V ( v − ε ) > 0 for all ε > 0 } and W ∗ := { w ∈ R : F W ( w ) − F W ( w − ε ) > 0 for all ε > 0 } . The following lemmas show some useful properties of V ∗ and W ∗ . Lemma 8 (Size of W ∗ ). W contains W ∗ , and W ∗ contains W ◦ . Lemma 9 (Covering the quantiles).
(a) For all τ ∈ (0 , 1), Q W ( τ ) is contained in W ∗ . (b) If F W is continuous, then, for all τ ∈ (0 , 1), there exists w ∈ W ∗ such that F W ( w ) = τ holds. Lemma 10 (Identical supports). (a) If V ∗ ⊃ W ∗ holds, then V ⊃ W holds. (b) If F W is continuous and V ⊃ W holds, then V ∗ ⊃ W ∗ holds. Lemma 11 (Strict monotonicity on W ∗ ). F W is strictly increasing on W ∗ . Lemma 12 (Identical distributions). If F V ( w ) = F W ( w ) holds for all w ∈ W ∗ , and F W is continuous, then F V ( w ) = F W ( w ) holds for all w ∈ R . We now define the required subset of Y . For t ∈ T and x ∈ X , define Y ∗ := { y ∈ R : F Y t | X ( y | x ) − F Y t | X ( y − ε | x ) > 0 for all ε > 0 } . From Lemma 8, Y ∗ is a subset of Y and contains Y ◦ . The following proposition shows that φ xs,t and the conditional c.d.f.'s of Y t are identified on Y ∗ as well as Y ◦ : Proposition 1. Assume that Assumptions 1-3 hold. Define p t ( z, x ) as in (2.2). (a) For each s ∈ T and x ∈ X , F Y s | X ( y | x ) for y ∈ Y ∗ can be expressed as (2.3). (b) Assume that P ( D tz ≤ D tz ′ | X = x ) = 1 and P ( C tz,z ′ | X = x ) > 0 hold for t ∈ T , ( z, z ′ ) ∈ P , and x ∈ X . Then, for all t ′ ∈ T , F Y t ′ |C tz,z ′ ( y ) for y ∈ Y ∗ can be expressed as (4.16). (c) For all s, t ∈ T and x ∈ X , φ xs,t ( y ) for y ∈ Y ∗ is identified. Remark 3. Proposition 1 modifies the results in the main paper that hold on Y ◦ . Part (a) of Proposition 1 modifies part (a) of Lemma 1. Part (b) of Proposition 1 modifies Lemma 4. Part (c) of Proposition 1 modifies Lemma 3. The closed-form expressions derived in the proof of Lemma 3 for the case of k = 2 and in Appendix C also hold on Y ∗ . Y ∗ does not depend on t ∈ T and x ∈ X from Lemma 10.
We assume that the support of the conditional distribution of Y t given X = x is Y . In the proofs of Lemmas 1-4, we use strict monotonicity of F Y t | X ( y | x ) and F Y t |C tz,z ′ X ( y | x ) in y ∈ Y ◦ in order to derive the closed-form expressions of φ xs,t and the conditional c.d.f.'s of Y t on Y ◦ . From Lemma 11, F Y t | X ( y | x ) is strictly increasing in y ∈ Y ∗ under part (A1) of Assumption 1. From Lemmas 10 and 11, F Y t |C tz,z ′ X ( y | x ) for each ( z, z ′ ) ∈ P is also strictly increasing on Y ∗ under part (A4) of Assumption 2. Therefore, applying Lemmas 10 and 11 instead of Lemma 5 leads to the closed-form expressions of φ xs,t and the conditional c.d.f.'s of Y t on Y ∗ . We then apply Proposition 1 to precisely derive the closed-form expressions of the treatment effects. With Proposition 1 at hand, the following proposition shows that the treatment effects are identified as closed-form expressions: Proposition 2. Assume that Assumptions 1-3 hold. (a) For each s ∈ T and x ∈ X , Q Y s | X ( τ | x ) for τ ∈ (0 , 1) and E [ Y s | X = x ] can be expressed as Q Y s | X ( τ | x ) = inf { y ∈ Y ∗ : F Y s | X ( y | x ) ≥ τ } (D.1) and (2.4), respectively. (b) Assume that P ( D tz ≤ D tz ′ | X = x ) = 1 and P ( C tz,z ′ | X = x ) > 0 hold for t ∈ T , ( z, z ′ ) ∈ P , and x ∈ X . Then, for all t ′ ∈ T , Q Y t ′ |C tz,z ′ X ( τ | x ) for τ ∈ (0 , 1) and E [ Y t ′ |C tz,z ′ , X = x ] can be expressed as Q Y t ′ |C tz,z ′ X ( τ | x ) = inf { y ∈ Y ∗ : F Y t ′ |C tz,z ′ X ( y | x ) ≥ τ } (D.2) and (4.17), respectively. Remark 4. In the main paper, we show (2.4) and (4.17) in part (b) of Lemma 1 and Lemma 4, respectively. The proofs in the main paper use the assumption that the closure of Y ◦ is equal to Y . In the proof of Proposition 2, we show (2.4) and (4.17) without assuming that condition. Proposition 2 follows from the fact that Y ∗ is sufficiently large such that identification of the distribution on Y ∗ leads to identification of the whole distribution.
From Lemma 9, Y ∗ contains all the quantiles of the conditional distribution of Y t given X = x . Lemma 12 implies that F Y s | X ( ·| x ) on R is specified when F Y s | X ( ·| x ) on Y ∗ is specified. Hence, identification of F Y s | X ( ·| x ) on Y ∗ leads to identification of Q Y s | X ( ·| x ) on (0 , 1) and E [ Y s | X = x ]. Proofs of Propositions 1 and 2 imply that the theorems in the main paper do not rely on the assumptions that the closure of Y ◦ is equal to Y , and that F Y t | X ( Y ◦ | x ) does not depend on t ∈ T . We modify the proofs of the theorems accordingly. In the main paper, the assumption that the closure of Y ◦ is equal to Y is used to assure that identification on the interior of the support leads to identification on the whole support. However, Y ∗ is sufficiently large and does not require such a condition to identify the conditional distributions of Y t on Y . In the main paper, the assumption that F Y t | X ( Y ◦ | x ) does not depend on t ∈ T is used to assure the existence of an inverse mapping φ x −1 s,t on Y ◦ . However, φ x −1 s,t exists on Y ∗ from the fact that F Y t | X ( Y ∗ | x ) = (0 , 1) holds and that F Y t | X ( Y ∗ | x ) does not depend on t ∈ T . F Y t | X ( Y ∗ | x ) = (0 , 1) follows from the fact that Y ∗ contains all the quantiles of the conditional distribution of Y t given X = x from Lemma 9. We finally show Lemmas 8-12 and Propositions 1 and 2, and modify the proofs of Theorems 1 and 2. Proofs are as follows. Proof of Lemma 8. First, we show that W contains W ∗ . For w ∈ W ∗ , F W ( w + ε ) > F W ( w − ε ) holds for all ε > 0 because F W ( w + ε ) ≥ F W ( w ) holds. Hence, w is also contained in W , and W contains W ∗ . Next, we show that W ∗ contains W ◦ . Let w be a point not contained in W ∗ . Then, there exists ε > 0 such that F W ( w ) = F W ( w − ε ) holds. Suppose that w is contained in W ◦ . Then, there exists δ > 0 such that 0 < η ≤ δ ⇒ w − η ∈ W holds. If that η also satisfies η ≤ ε/ 2, then F W ( w ) > F W ( w − η ) holds because w − η is contained in W .
However, this contradicts the fact that $F_W(w-\eta) \ge F_W(w-\varepsilon)$ holds, because $F_W(w) = F_W(w-\varepsilon)$ holds. Hence, $w$ is not contained in $\mathcal{W}^\circ$, and $\mathcal{W}^*$ contains $\mathcal{W}^\circ$. Therefore, the stated result follows. ✷

Proof of Lemma 9. First, we show part (a). Suppose that there exists $\tau' \in (0,1)$ such that $Q_W(\tau')$ is not contained in $\mathcal{W}^*$. Then, there exists $\varepsilon > 0$ such that $F_W(Q_W(\tau')) = F_W(Q_W(\tau') - \varepsilon)$ holds. Because $F_W(Q_W(\tau')) \ge \tau'$ holds from the definition of $Q_W$, this implies $F_W(Q_W(\tau') - \varepsilon) \ge \tau'$, which contradicts the definition of $Q_W(\tau')$ as an infimum. Hence, $Q_W(\tau)$ is contained in $\mathcal{W}^*$ for all $\tau \in (0,1)$. Moreover, $F_W(Q_W(\tau)) = \tau$ holds for all $\tau \in (0,1)$ because $F_W$ is continuous. Therefore, the stated result follows by taking $w = Q_W(\tau)$ for each $\tau$. ✷

Proof of Lemma 10. First, we show part (a). Let $w \in \mathcal{W}$ satisfy $w \notin \mathcal{V}$. Then, there exists $\varepsilon > 0$ such that $F_V(w+\varepsilon) = F_V(w-\varepsilon)$ holds, because $F_V(w+\varepsilon) \ge F_V(w-\varepsilon)$ always holds. This implies that $(w-\varepsilon, w+\varepsilon)$ is not contained in $\mathcal{W}^*$ because $\mathcal{V}^* \supset \mathcal{W}^*$ holds and $\mathcal{V} \supset \mathcal{V}^*$ holds from Lemma 8. Then there exists $\tau \in (0,1)$ such that $F_W(\eta) = \tau$ holds for all $\eta \in (w-\varepsilon, w+\varepsilon)$. We proceed to show this property. Suppose that $\eta, \eta' \in (w-\varepsilon, w+\varepsilon)$ satisfy $\eta < \eta'$ and $F_W(\eta) < F_W(\eta')$. Then, $\eta < Q_W(F_W(\eta')) \le \eta'$ holds, and $Q_W(F_W(\eta'))$ is contained in $\mathcal{W}^*$ from Lemma 9. However, this contradicts the fact that $(w-\varepsilon, w+\varepsilon)$ is not contained in $\mathcal{W}^*$. Hence, the stated property follows. This property implies that $w$ is not contained in $\mathcal{W}$. However, this contradicts $w \in \mathcal{W}$. Therefore, $\mathcal{V} \supset \mathcal{W}$ holds.

Next, we show part (b). Let $w \in \mathcal{W}^*$ satisfy $w \notin \mathcal{V}^*$. Then there exists $\varepsilon > 0$ such that $F_V(w) = F_V(w-\varepsilon)$ holds, because $F_V(w) \ge F_V(w-\varepsilon)$ always holds. This implies that $(w-\varepsilon, w)$ is not contained in $\mathcal{W}$ because $\mathcal{V} \supset \mathcal{W}$ holds. Then there exists $\tau \in (0,1)$ such that $F_W(w-\eta) = \tau$ and $\tau < F_W(w)$ hold for all $0 < \eta < \varepsilon$. We proceed to show this property.
Suppose that $0 < \eta < \eta' < \varepsilon$ satisfy $F_W(w-\eta') < F_W(w-\eta)$. Then, $w-\eta' < Q_W(F_W(w-\eta)) \le w-\eta$ holds, and $Q_W(F_W(w-\eta))$ is contained in $\mathcal{W}^*$ from Lemma 9. However, this contradicts the fact that $(w-\varepsilon, w)$ is not contained in $\mathcal{W}$, because $Q_W(F_W(w-\eta))$ is also contained in $\mathcal{W}$ from Lemma 8. Hence, $F_W(w-\eta)$ is constant in $\eta$ on $(0, \varepsilon)$, and $\tau < F_W(w)$ holds because $w$ is contained in $\mathcal{W}^*$, so the stated property follows. However, this property contradicts the continuity of $F_W$. Therefore, $\mathcal{V}^* \supset \mathcal{W}^*$ holds. ✷

Proof of Lemma 11. Let $w_1, w_2 \in \mathcal{W}^*$ satisfy $w_1 < w_2$. Then, because $w_2$ is contained in $\mathcal{W}^*$, $F_W(w_2) > F_W(w_1)$ holds. Therefore, the stated result follows. ✷

Proof of Lemma 12. We show that $F_V(w) \ge F_W(w)$ holds for all $w \in \mathbb{R}$. Suppose that there exists $w \in \mathbb{R}$ such that $F_V(w) < F_W(w)$ holds. Then, for $F_V(w) < \tau' < F_W(w)$, there exists $w' \in \mathcal{W}^*$ such that $w' < w$ and $F_V(w') = F_W(w') = \tau'$ hold from Lemma 9. However, this implies that $\tau' \le F_V(w)$ holds, which contradicts $F_V(w) < \tau'$. Hence, $F_V(w) \ge F_W(w)$ holds for all $w \in \mathbb{R}$. We can show that $F_V(w) \le F_W(w)$ holds for all $w \in \mathbb{R}$ similarly. Therefore, the stated result follows. ✷

Proof of Proposition 1. For part (a), use Lemmas 10 and 11 instead of Lemma 5 in the proof of part (a) of Lemma 1. For part (b), use Lemmas 10 and 11 instead of Lemma 5 to show (4.16) in the proof of Lemma 4.

For part (c), it suffices to show that $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$ are identified on $\mathcal{Y}^*$. Identification of the other counterfactual mappings follows from identification of $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$, the fact that $\phi^{x,-1}_{s,t}$ exists on $\mathcal{Y}^*$, and the fact that $\phi^x_{s,r}$ is identified on $\mathcal{Y}^*$ if $\phi^x_{s,t}$ and $\phi^x_{t,r}$ are identified on $\mathcal{Y}^*$ for $s, t, r \in \mathcal{T}$. We first show that $\phi^{x,-1}_{s,t}$ exists on $\mathcal{Y}^*$. First, $\phi^x_{s,t}$ is strictly increasing on $\mathcal{Y}^*$ from part (A1) of Assumption 1 and Lemma 11. Second, $\phi^x_{s,t}(\mathcal{Y}^*) = \mathcal{Y}^*$ holds because $F_{Y_t|X}(\mathcal{Y}^*|x) = (0,1)$ holds and $F_{Y_t|X}(\mathcal{Y}^*|x)$ does not depend on $t \in \mathcal{T}$.
This is because $\mathcal{Y}^*$ contains all the quantiles of the conditional distribution of $Y_t$ given $X = x$ from Lemma 9. Hence, from the definition of $\phi^x_{s,t}$, an inverse mapping $\phi^{x,-1}_{s,t}$ exists on $\mathcal{Y}^*$, and $\phi^x_{t,s}(y) = \phi^{x,-1}_{s,t}(y)$ holds for $y \in \mathcal{Y}^*$. We next show that $\phi^x_{s,r}$ is identified on $\mathcal{Y}^*$ if $\phi^x_{s,t}$ and $\phi^x_{t,r}$ are identified on $\mathcal{Y}^*$. First, $\phi^x_{s,r} = \phi^x_{t,r} \circ \phi^x_{s,t}$ holds because $F_{Y_t|X}(Q_{Y_t|X}(\tau|x)|x) = \tau$ for $\tau \in (0,1)$ holds from part (A1) of Assumption 1. Second, $\phi^x_{s,t}(\mathcal{Y}^*) = \mathcal{Y}^*$ holds. Hence, the stated result follows. From the preceding argument, $y \in \mathcal{Y}^*$ implies $\phi^x_{s,t}(y) \in \mathcal{Y}^*$. We use this property in this proof.

For identification of $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$ on $\mathcal{Y}^*$, use Lemmas 10 and 11 instead of Lemma 5, and use the fact that $y \in \mathcal{Y}^*$ implies $\phi^x_{s,t}(y) \in \mathcal{Y}^*$. Therefore, $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$ are identified on $\mathcal{Y}^*$, and the other counterfactual mappings are also identified because they are inversions or compositions of $\phi^x_{k,j}$ for $j = 0, \ldots, k-1$. To identify $\phi^x_{s,t}$ on $\mathcal{Y}^*$ when $\mathcal{T}$ is discrete in general, use Lemmas 10 and 11 instead of Lemma 5 in Appendix C. ✷

Proof of Proposition 2. We show part (a); part (b) follows from a similar argument. We first show (D.1). From Lemma 9, $F_{Y_t|X}(Q_{Y_t|X}(\tau|x)|x) = \tau$ and $Q_{Y_t|X}(\tau|x) \in \mathcal{Y}^*$ hold for all $\tau \in (0,1)$. Hence, the infimum in (D.1) is attained at $y = Q_{Y_t|X}(\tau|x)$, and (D.1) follows.

We next show (2.4). Similar to the proof of part (b) of Lemma 1, we have

$F_{Y_s|TZX}(y|t,z,x) = F_{\phi^x_{t,s}(Y_t)|TZX}(y|t,z,x)$ for $y \in \mathcal{Y}^*$. (D.3)

Applying Lemma 12 to (D.3) leads to

$F_{Y_s|TZX}(y|t,z,x) = F_{\phi^x_{t,s}(Y_t)|TZX}(y|t,z,x)$ for $y \in \mathbb{R}$. (D.4)

Hence, $Y_s \overset{d}{=} \phi^x_{t,s}(Y_t)$ conditional on $(T,Z,X) = (t,z,x)$ holds. Therefore, we obtain (2.4) as in the proof of part (b) of Lemma 1. ✷

Modified proof of Theorem 1. Because $\phi^x_{s,t}(y)$ for $y \in \mathcal{Y}^*$ is identified from part (c) of Proposition 1, $F_{Y_s|X}(y|x)$ for $y \in \mathcal{Y}^*$ is identified from part (a) of Proposition 1.
Hence, $Q_{Y_s|X}(\tau|x)$ for $\tau \in (0,1)$ and $E[Y_s|X=x]$ are identified from part (a) of Proposition 2. ✷

Modified proof of Theorem 2. We first modify the proof of Lemma 4 and show that $Q_{Y_{t'}|\mathcal{C}^t_{z,z'},X}(\tau|x)$ for $\tau \in (0,1)$ and $E[Y_{t'}|\mathcal{C}^t_{z,z'}, X = x]$ are identified for all $t' \in \mathcal{T}$ if $P(D^t_z \le D^t_{z'} \mid X = x) = 1$ and $P(\mathcal{C}^t_{z,z'} \mid X = x) > 0$ hold for $t \in \mathcal{T}$, $(z,z') \in \mathcal{P}$, and $x \in \mathcal{X}$. Because $\phi^x_{s,t}(y)$ for $y \in \mathcal{Y}^*$ is identified from part (c) of Proposition 1, $F_{Y_{t'}|\mathcal{C}^t_{z,z'},X}(y|x)$ for $y \in \mathcal{Y}^*$ is identified from part (b) of Proposition 1. Hence, $Q_{Y_{t'}|\mathcal{C}^t_{z,z'},X}(\tau|x)$ for $\tau \in (0,1)$ and $E[Y_{t'}|\mathcal{C}^t_{z,z'}, X = x]$ are identified from part (b) of Proposition 2. The stated result follows from applying this result instead of Lemma 4 to the proof in the main paper. ✷

E About the conditions for Assumption 2 in Section 3.1

In this section, we give some sufficient conditions for Assumption 2 introduced in Section 3.1. The following proposition provides sufficient conditions for parts (A3) and (A4) of Assumption 2.

Proposition 3. (a) Assume that parts (A1) and (A2) of Assumption 2 hold. Then, part (A3) of Assumption 2 holds when the covariance of $D^t$ and $Z$ conditional on $Z \in \{z, z'\}$ and $X = x$ is not 0 for all $t \in \mathcal{T}$.

(b) Assume that Assumption 1 and parts (A1)-(A3) of Assumption 2 hold. Then, part (A4) of Assumption 2 holds when the support of the joint conditional distribution of $U_t$ and $V$ given $X = x$ is the Cartesian product of the supports of the conditional distributions of $U_t$ and $V$ given $X = x$ for all $t \in \mathcal{T}$, where $T$ can be expressed as $T = \rho(Z, X, V)$ for some unknown function $\rho$ and an unobserved random vector $V$ from part (A3) of Assumption 1.

Proof of Proposition 3. First, we show part (a). Suppose that part (A3) of Assumption 2 does not hold. Then, there exists $(z, z') \in \Lambda$ such that $D^t_z \le D^t_{z'}$ holds almost surely for all $t \in \mathcal{T}$, but $P(\mathcal{C}^{t'}_{z,z'}) = 0$ holds for some $t' \in \mathcal{T}$.
Because we have $P(\mathcal{C}^{t'}_{z,z'}) = P(D^{t'} = 1 \mid Z = z') - P(D^{t'} = 1 \mid Z = z)$ by Lemma 2, we have $P(D^{t'} = 1 \mid Z = z') = P(D^{t'} = 1 \mid Z = z)$, which implies that $D^{t'}$ and $Z$ are independent conditional on $Z \in \{z, z'\}$. Hence, the covariance of $D^{t'}$ and $Z$ conditional on $Z \in \{z, z'\}$ is 0, and the stated result follows.

Next, we show part (b). Without loss of generality, assume that $P(\mathcal{C}^t_{z,z'}) > 0$. Let $\mathcal{U}_t$ be the support of the distribution of $U_t$. Let $\mathcal{V}_t$ and $\mathcal{W}_t$ be the supports of the conditional distributions of $U_t$ and $Y_t$ given $\mathcal{C}^t_{z,z'}$, respectively. We first show that $\mathcal{U}_t = \mathcal{V}_t$ holds. This follows from the fact that the compliers are characterized by $V$. To see this, we have

$P(\mathcal{C}^t_{z,z'}) = P(D^t_{z'} = 1) - P(D^t_z = 1) = P(\rho(z', V) = t) - P(\rho(z, V) = t)$ (E.1)

by Lemma 2. Because (E.1) shows that $\mathcal{C}^t_{z,z'}$ is an event determined by $V$, the product-support condition implies that conditioning on $\mathcal{C}^t_{z,z'}$ does not change the support of $U_t$.

We then show that $\mathcal{U}_t = \mathcal{V}_t$ implies $\mathcal{Y} = \mathcal{W}_t$. It suffices to show that $\mathcal{Y}^* = \mathcal{W}^*_t$ holds because $\mathcal{Y}^* = \mathcal{W}^*_t$ implies $\mathcal{Y} = \mathcal{W}_t$ from Lemma 10. We proceed to show that $\mathcal{Y}^* = \mathcal{W}^*_t$ holds. First, $\mathcal{Y}^* \supset \mathcal{W}^*_t$ follows from $\mathcal{Y} \supset \mathcal{W}_t$, and $\mathcal{Y} \supset \mathcal{W}_t$ holds from the definition of $\mathcal{W}_t$. Second, we show $\mathcal{Y}^* \subset \mathcal{W}^*_t$. For each $y^* \in \mathcal{Y}^*$, there exists $\tau^* \in (0,1)$ such that $y^* = Q_{Y_t}(\tau^*)$ holds from the definition of $\mathcal{Y}^*$. Take any $\varepsilon > 0$. Then, there exists $\delta > 0$ such that $y^* - \varepsilon = Q_{Y_t}(\tau^* - \delta)$ holds from the definition of $\mathcal{Y}^*$. Observe that $U_t \sim U(0,1)$ holds because $U_t = F_{Y_t}(Y_t)$ and $F_{Y_t}$ is continuous. Hence, $\tau^* \in \mathcal{U}^*_t$ and $F_{U_t}(\tau^*) - F_{U_t}(\tau^* - \delta) > 0$ hold. Then, $\tau^* \in \mathcal{V}^*_t$ and

$F_{U_t|\mathcal{C}^t_{z,z'}}(\tau^*) - F_{U_t|\mathcal{C}^t_{z,z'}}(\tau^* - \delta) > 0$ (E.2)

hold because $\mathcal{U}_t = \mathcal{V}_t$ implies $\mathcal{U}^*_t = \mathcal{V}^*_t$ from Lemma 10. This implies that

$F_{Y_t|\mathcal{C}^t_{z,z'}}(y^*) - F_{Y_t|\mathcal{C}^t_{z,z'}}(y^* - \varepsilon) > 0$ (E.3)

holds because

$\{U_t < \tau^*\} = \{Y_t < y^*\}$ and $\{U_t < \tau^* - \delta\} = \{Y_t < y^* - \varepsilon\}$ (E.4)

hold from the definition of $Q_{Y_t}$. Then applying (E.4) to (E.2) gives (E.3). Hence, $\mathcal{Y}^* \subset \mathcal{W}^*_t$ holds. Therefore, $\mathcal{Y}^* = \mathcal{W}^*_t$ holds and the stated result follows.
✷

F About the choice restrictions of Examples II and III in Section 3.2

In this section, we show the budget set relationships of the families that we assume for Examples II and III. As in Example I, economic analysis generates choice restrictions for Examples II and III from these budget set relationships.

F.1 Example II

First, we have the following obvious monotonicity inequalities arising from different values of $Z$ for each family $\omega \in \Omega$:

$D^i_j(\omega) \le D^i_i(\omega)$ for $i = 1, \ldots, k$ and $j \in \mathcal{T} \setminus \{i\}$. (F.1)

For $j \in \mathcal{T} \setminus \{0, i\}$, inequality (F.1) states that the family is induced toward buying house $i$ when the instrument changes from a voucher for house $j$ to a voucher for house $i$. For $j = 0$, inequality (F.1) states that the family is induced toward buying house $i$ when the instrument changes from no voucher to a voucher for house $i$.

Next, for the budget set $B_\omega(z, t)$ of family $\omega$, we can naturally assume the following relationships for $i = 1, \ldots, k$:

$B_\omega(j, 0) = B_\omega(i, 0)$ for $j \in \mathcal{T} \setminus \{i\}$, (F.2)

$B_\omega(l, i) = B_\omega(j, i) \subset B_\omega(i, i)$ for $j, l \in \mathcal{T} \setminus \{i\}$ and $j \ne l$. (F.3)

Relationship (F.2) holds because every voucher offers no discount, and hence produces the same budget set, if family $\omega$ does not buy any house. Relationship (F.3) describes the budget set of family $\omega$ if it purchases house $i$: the budget set of family $\omega$ is enlarged if it has a voucher that subsidizes house $i$, compared to a voucher that does not affect the choice set (voucher $j$ for $j \in \mathcal{T} \setminus \{i\}$ or no voucher). As in Example I, for $\omega \in \Omega$, applying the choice rule (3.7) to budget set relationships (F.2) and (F.3) generates the following monotonicity inequalities in addition to (F.1):

$D^j_i(\omega) \le D^j_0(\omega)$ for $i = 1, \ldots, k$ and $j \in \mathcal{T} \setminus \{i\}$. (F.4)

Therefore, from (F.1) and (F.4), we may assume the monotonicity inequalities of (3.8).
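The monotonicity inequalities (F.1) and (F.4) can be checked numerically in a toy version of Example II. The Python sketch below is my own illustration, not the paper's model: the quasi-linear utilities, the uniform prices, and the fixed voucher discount are all assumptions made only for this example. Each family buys the option with the highest net utility (option 0 is "buy no house"), a voucher for house $i$ lowers that house's price, and the resulting potential choice indicators satisfy (F.1) and (F.4) for every simulated family.

```python
import numpy as np

rng = np.random.default_rng(0)
k, discount, n_families = 3, 1.0, 500
T = range(k + 1)          # treatments: 0 (no house), 1, ..., k
Z = range(k + 1)          # vouchers:   0 (none),    1, ..., k

def choice(u, p, z):
    """Option chosen under voucher z: argmax of net utility (option 0 yields 0)."""
    net = [0.0] + [u[j] - p[j] + (discount if z == j + 1 else 0.0)
                   for j in range(k)]
    return int(np.argmax(net))

for _ in range(n_families):
    u = rng.normal(1.0, 1.0, k)   # utilities of houses 1..k
    p = rng.uniform(0.5, 1.5, k)  # prices of houses 1..k
    # Potential choice indicators D^t_z for every treatment/voucher pair.
    D = {(t, z): int(choice(u, p, z) == t) for t in T for z in Z}
    for i in range(1, k + 1):
        for j in T:
            if j != i:
                assert D[(i, j)] <= D[(i, i)]   # (F.1)
                assert D[(j, i)] <= D[(j, 0)]   # (F.4)
```

Both inequalities hold deterministically under this choice rule: granting a voucher for house $i$ can only make house $i$ more attractive, and removing it can only make every other option weakly more attractive.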
F.2 Example III

First, as in (F.1) in Example II, we have the following obvious monotonicity inequality arising from different values of $Z$ for each family $\omega \in \Omega$:

$D^k_0(\omega) \le D^k_k(\omega)$. (F.5)

Second, for the budget set $B_\omega(z, t)$ of family $\omega$, we can naturally assume the following relationships for $i = 1, \ldots, k$:

$B_\omega(j, 0) = B_\omega(i, 0)$ for $j \in \mathcal{T} \setminus \{i\}$, (F.6)

$B_\omega(0, i) = B_\omega(j, i)$ for $j \in \{i+1, \ldots, k\}$ and $B_\omega(0, i) \subset B_\omega(l, i)$ for $l \in \{1, \ldots, i\}$. (F.7)

Relationship (F.6) can be interpreted in the same way as relationship (F.2). Relationship (F.7) describes the budget set of family $\omega$ if it purchases house $i$: the budget set of family $\omega$ is enlarged if it has a voucher that subsidizes house $i$ (voucher $l$ for $l \in \{1, \ldots, i\}$) compared to a voucher that does not affect the choice set (voucher $j$ for $j \in \{i+1, \ldots, k\}$ or no voucher). As in Example I, for $\omega \in \Omega$, applying the choice rule (3.7) to budget set relationships (F.6) and (F.7) generates the following monotonicity inequalities in addition to (F.5):

$D^i_i(\omega) \ge D^i_{i+1}(\omega)$ and $D^j_i(\omega) \le D^j_{i+1}(\omega)$ for $i = 1, \ldots, k-1$ and $j \in \mathcal{T} \setminus \{i\}$, and $D^j_k(\omega) \le D^j_0(\omega)$ for $j \in \mathcal{T} \setminus \{k\}$. (F.8)

Therefore, from (F.5) and (F.8), we may assume the monotonicity inequalities of (3.9).

G The relationship between the compliers of Example II

In this section, we derive the relationships between the compliers of (4.9) in Section 3.2. First, for $(1, 0), \ldots, (k, 0)$ in Table II, observe that, from the definition of the compliers, we have

$\mathcal{C}^i_{0,i} = \bigcup_{j \ne i} \{D^i_i = 1, D^j_0 = 1\}$ for $i = 1, \ldots, k$. (G.1)

The sets $\{D^i_i = 1, D^j_0 = 1\}$ and $\{D^i_i = 1, D^l_0 = 1\}$ are disjoint for $j \ne l$. This is because, for $i = 1, \ldots, k$, a family $\omega$ contained in $\mathcal{C}^i_{0,i}$ will either buy a house other than house $i$ or buy no house if it has no voucher. Second, we obtain

$\mathcal{C}^j_{i,0} = \{D^i_i = 1, D^j_0 = 1\}$ for $i = 1, \ldots, k$ and $j \in \mathcal{T} \setminus \{i\}$. (G.2)

To see this, suppose that a family $\omega$ contained in $\mathcal{C}^j_{i,0}$ chooses house $l$, where $l \in \mathcal{T} \setminus \{0, i, j\}$, when its voucher assignment is $i$. Then, because $D^l_i \le D^l_0$ holds, it would also choose house $l$ when its voucher assignment is 0, which contradicts the definition of $\mathcal{C}^j_{i,0}$. Hence, it will choose house $i$ if its voucher assignment is $i$. Therefore, the relationships of (4.9) follow from (G.1) and (G.2).

H About the condition for discrete instruments

In this section, we discuss the reason why we assume that the treatment does not have larger support than the instrumental variable. Let $T$ take $k+1$ values, and assume that Assumptions 1 and 2 hold. We assume that $Z$ contains at least $k+1$ values, and we cannot apply our identification analysis if this condition does not hold.

Suppose that $Z$ contains only $k$ values and that $k$ pairs of instrument values $\lambda_1, \ldots, \lambda_k$ are contained in the monotonicity subset $\Lambda$. We show that at least one monotonicity relationship gives no additional information beyond the other monotonicity relationships. To this end, we first show that at least one pair of instrument values (without loss of generality, suppose $\lambda_k$) satisfies the following property:

For $\lambda_i$, both elements of $\lambda_i = (\lambda_{i1}, \lambda_{i2})$ are contained in at least one of the other pairs $\lambda_j$ for $j \in \{1, \ldots, k\} \setminus \{i\}$. (H.1)

Suppose instead that for every $l \in \{1, \ldots, k\}$, one of the elements of $\lambda_l$ is not contained in any of the other pairs. Then $Z$ needs to contain at least $k+1$ values, because we need at least one value contained in more than one pair and $k$ more values, one contained in each pair respectively.

We deal with the case in which the two elements of $\lambda_k = (\lambda_{k1}, \lambda_{k2})$ are contained in different pairs. When they are contained in the same pair (without loss of generality, suppose $\lambda_1$), the equations for the potential outcome conditional c.d.f.'s given the compliers are the same for $\lambda_1$ and $\lambda_k$, and the monotonicity relationship of $\lambda_k$ gives no additional information beyond the other relationships $\lambda_1, \ldots, \lambda_{k-1}$.

Suppose that $\lambda_k$ is the only pair that satisfies property (H.1). We show that the monotonicity relationship of $\lambda_k$ gives no additional information beyond the other relationships $\lambda_1, \ldots, \lambda_{k-1}$; a similar argument applies when other pairs also satisfy this property. Without loss of generality, suppose that $Z = \{0, 1, \ldots, k-1\}$, $\lambda_l = (0, l)$ for $l \in \{1, \ldots, k-1\}$, and $\lambda_k = (1, 2)$. Then, we have

$P(\mathcal{C}^t_{1,2}) = P(\mathcal{C}^t_{0,2}) - P(\mathcal{C}^t_{0,1})$ for $t \in \mathcal{T}$. (H.2)

Hence, the equation for the potential outcome conditional c.d.f.'s given the compliers of $\lambda_k$ is obtained by subtracting that of $\lambda_1$ from that of $\lambda_2$, and fewer than $k$ equations are induced from $\lambda_1, \ldots, \lambda_k$. We cannot identify the $k$ counterfactual mappings from only $k-1$ equations.
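The subtraction behind (H.2) can be illustrated with a small simulation. The sketch below is my own toy model, not the paper's: for a fixed treatment value, the potential treatment indicator is a threshold crossing $D_z = 1\{V \le c_z\}$ with cutoffs $c_0 \le c_1 \le c_2$, so that $D_0 \le D_1 \le D_2$ holds (monotonicity). The complier event between $z$ and $z'$ is $\{D_z = 0, D_{z'} = 1\}$, and its probability for the pair $(1,2)$ equals the difference of those for $(0,2)$ and $(0,1)$, so the pair $(1,2)$ contributes no equation beyond those of $(0,1)$ and $(0,2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.uniform(size=1_000_000)          # unobserved heterogeneity
c = {0: 0.2, 1: 0.5, 2: 0.8}             # illustrative cutoffs, c_0 <= c_1 <= c_2
D = {z: (V <= c[z]).astype(int) for z in c}   # monotone potential indicators

def p_complier(z, z_prime):
    """Empirical probability of the complier event {D_z = 0, D_{z'} = 1}."""
    return np.mean((D[z] == 0) & (D[z_prime] == 1))

lhs = p_complier(1, 2)
rhs = p_complier(0, 2) - p_complier(0, 1)
assert abs(lhs - rhs) < 1e-9   # (H.2) holds exactly, observation by observation
assert abs(lhs - 0.3) < 5e-3   # population value is c_2 - c_1 = 0.3
```

Under monotonicity the indicator of $\{c_1 < V \le c_2\}$ equals the indicator of $\{c_0 < V \le c_2\}$ minus that of $\{c_0 < V \le c_1\}$ for every draw, so the identity is exact and not merely approximate in large samples.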