Filtered and Unfiltered Treatment Effects with Targeting Instruments

Abstract

Multivalued treatments are commonplace in applications. We explore the use of discrete-valued instruments to control for selection bias in this setting. We establish conditions under which counterfactual averages and treatment effects are identified for heterogeneous complier groups. These conditions restrict (i) the unobserved heterogeneity in treatment assignment, (ii) how the instruments target the treatments, and optionally (iii) the extent to which counterfactual averages are heterogeneous. We allow for limitations in the analyst's information via the concept of a filtered treatment. Finally, we illustrate the usefulness of our framework by applying it to data from the Student Achievement and Retention Project and the Head Start Impact Study.


arXiv [econ.EM]

Filtered and Unfiltered Treatment Effects with Targeting Instruments ∗

Sokbae Lee † and Bernard Salanié ‡

July 22, 2020

Abstract

Multivalued treatments are commonplace in applications. We explore the use of discrete-valued instruments to control for selection bias in this setting. We establish conditions under which counterfactual averages and treatment effects are identified for heterogeneous complier groups. These conditions require a combination of assumptions that restrict both the unobserved heterogeneity in treatment assignment and how the instruments target the treatments. We introduce the concept of filtered treatment, which takes into account limitations in the analyst's information. Finally, we illustrate the usefulness of our framework by applying it to data from the Student Achievement and Retention Project and the Head Start Impact Study.

Keywords: Identification, selection, multivalued treatments, discrete instruments, unordered monotonicity, factorial design.

∗ This work is in part supported by the European Research Council (ERC-2014-CoG-646917-ROMIA) and by the UK Economic and Social Research Council through research grant ES/P008909/1 to the CeMMAP.
† Department of Economics, Columbia University and Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
‡ Department of Economics, Columbia University.

Introduction

Much of the literature on the evaluation of treatment effects has concentrated on the paradigmatic "binary/binary" example, in which both treatment and instrument only take two values. Multivalued treatments are common in actual policy implementations, however; and multivalued instruments are just as frequent. Many different programs aim to help train job seekers for instance, and each of them has its own eligibility rules. Tax and benefit regimes distinguish many categories of taxpayers and eligible recipients. The choice of a college and major has many dimensions too, and responds to a variety of financial help programs and other incentives. Randomized experiments in economics resort more and more to factorial designs; they have a long tradition in applied statistics, starting with Fisher in the 1920s. As the training, education choice, and tax-benefit examples illustrate, multivalued treatments are often also subject to selection on unobservables. We explore in this paper the use of discrete-valued instruments in order to control for selection bias when evaluating discrete-valued treatments. We establish conditions under which counterfactual averages and treatment effects are identified for various (sometimes composite) complier groups.
These conditions require a combination of assumptions that restrict both the unobserved heterogeneity in treatment assignment and the configuration of the instruments themselves.

Existing work on multivalued treatments under selection on observables includes Imbens (2000), Cattaneo (2010), and Ao, Calonico, and Lee (2019) among others. The literature that uses discrete-valued instruments to evaluate treatment effects under selection on unobservables is more sparse. On the theoretical side, Angrist and Imbens (1995) analyzed two-stage least-squares (TSLS) estimation when the treatment takes a finite number of ordered values. Heckman, Urzua, and Vytlacil (2006, 2008) showed how treatment effects can be identified in discrete choice models for the ordered and unordered cases, respectively. More recently, Heckman and Pinto (2018) focused on unordered treatments and introduced the notion of "unordered monotonicity" under which treatment assignment is formally analogous to an additively separable discrete choice model. Several recent papers have studied the case of binary treatments with multiple instruments, as well as binary instruments with multivalued or continuous treatments. For the former, Mogstad, Torgovitsky, and Walters (2019, 2020) and Goff (2020) analyzed the identifying power of different monotonicity assumptions. For the latter, Torgovitsky (2015), D'Haultfoeuille and Février (2015), Huang, Khalil, and Yildiz (2019), Caetano and Escanciano (2020) and Feng (2020) developed identification results for different models. On the applied side, Kirkeboen, Leuven, and Mogstad (2016) used discrete instruments to obtain TSLS estimates of returns to different fields of study. Kline and Walters […]

Footnote (on factorial designs): Muralidharan, Romero, and Wüthrich (2019) review recent applications of factorial designs.

[…] filtered treatment $D$; underlying it is an unfiltered treatment $T$. Treatment effects are of course harder to identify in the filtered model.
The concept of filtering is linked to our earlier work (Lee and Salanié, 2018), which allowed for limited violations of unordered monotonicity and used continuous instruments to identify marginal treatment effects.

Moreover, both filtering and the multiplicity of treatments and instruments may give rise to a bewildering number of cases. In the binary/binary model, the analyst can usually take for granted that switching on the binary instrument makes treatment (weakly) more likely for any observation. With multiple instrument values and multiple treatments, the correspondence is less clear. We start by imposing the unordered monotonicity property of Heckman and Pinto (2018) on the unfiltered treatment model. Under unordered monotonicity, it is natural to speak of an instrument targeting an unfiltered treatment by increasing its relative "mean utility". Most of our paper relies on the assumption of strict targeting, which obtains when each instrument only promotes the treatments it targets.

Footnote (on the binary/binary model): This is satisfied under the LATE-monotonicity assumption (e.g., Imbens and Angrist, 1994; Vytlacil, 2002).

To illustrate, consider the effect of various programs $T$ on some outcomes $Y$. Let each instrument value $z$ stand for a policy regime, under which the access to some programs is made easier or harder than in a control group. Under unordered monotonicity, this translates into a profile of relative mean utilities of any treatment $t$ under the policy regimes $z \in \mathcal{Z}$. We say that an instrument value $z$ targets a treatment $t$ when it maximizes its relative mean utility. Suppose that each policy regime consists of values of subsidies for a subset of the programs, and that these subsidies enter mean utilities additively. Then a policy regime $z$ targets a treatment $t$ if it has the highest subsidy for this program among all policy regimes. Strict targeting requires that all policy regimes $z$ that do not target $t$ have the same (lower) subsidy for $t$.
It is easy to translate this property in the other examples cited at the beginning of the introduction.

With complete treatment data (the unfiltered treatment), combining unordered monotonicity and strict targeting allows us to point-identify the size of some complier groups and the corresponding treatment effects, and to partially identify others. When the data on treatments is filtered, unordered monotonicity may not carry over to the filtered treatment $D$ (an observation already in Lee and Salanié (2018)); and strict targeting generally does not. Nevertheless, they confer enough underlying structure to the mapping from instruments to filtered treatments that we can still identify various parameters of interest.

We give numerous examples throughout the paper. We also illustrate the usefulness of our framework by applying it to data from the Student Achievement and Retention (STAR) Project (Angrist, Lang, and Oreopoulos, 2009) and to Kline and Walters's (2016) analysis of the Head Start Impact Study. We find that the large intent-to-treat (ITT) effect of the STAR for female college students results from the aggregation of two very different treatment effects; this highlights the value of unbundling the heterogeneous compliers. We also confirm the importance of taking into consideration alternative preschools when evaluating Head Start; unlike Kline and Walters (2016), we do not rely on parametric selection models.

The remainder of the paper is organized as follows. Section 1 defines our framework and introduces filtered and unfiltered treatments. In Section 2, we study identification in the unfiltered treatment model. We define the concepts of targeting, one-to-one targeting, and strict targeting and their implications for the identification of the probabilities and the treatment effects of various complier groups. Section 3 turns to filtered models. We derive identification results in several leading classes of applications.
Finally, we present estimation results for the two aforementioned empirical studies in Section 4. The Appendices contain the proofs of all propositions and lemmata, along with some additional material.

We focus throughout on a treatment that takes discrete values, which we label $d \in \mathcal{D}$. For simplicity, we will call $D = d$ "treatment $d$". These values are unordered: e.g. $d = 2$, when available, is not "more treatment" than $d = 1$. In most of our examples, there is a well-defined control group, which is denoted by $d = 0$. We assume that discrete-valued instruments $Z_i \in \mathcal{Z}$ are available. We condition on all other exogenous covariates $X_i$ throughout, and we omit them from the notation. We will use the standard counterfactual notation: $D_i(z)$ and $Y_i(d, z)$ denote respectively potential treatments and outcomes.

The validity of the instruments requires the usual exclusion restrictions:

Assumption 1 (Valid Instruments).
(i) $Y_i(d, z) = Y_i(d)$ for all $(d, z)$ in $\mathcal{D} \times \mathcal{Z}$.
(ii) $Y_i(d)$ and $D_i(z)$ are independent of $Z_i$ for all $(d, z)$ in $\mathcal{D} \times \mathcal{Z}$.

Under Assumption 1, we define $D_i := D_i(Z_i)$ and $Y_i := Y_i(D_i)$. Throughout the paper, we assume that we observe $(Y_i, D_i, Z_i)$ for each $i$. In addition, the instruments must be relevant. In the usual binary instrument/binary treatment case (hereafter "binary/binary"), this translates into a requirement that the propensity score vary with the instruments. In our more general setting, we impose:

Assumption 2 (Relevant Instruments). Let $\mathbf{Z}_i$ denote a column vector whose elements are 1 and the variables $\mathbb{1}(Z_i = z)$ for $z \in \mathcal{Z}$, and $\mathbf{D}_i$ denote a column vector whose elements are 1 and the variables $\mathbb{1}(D_i = d)$ for $d \in \mathcal{D}$. Then $E[\mathbf{Z}_i \mathbf{D}_i^\top]$ has full rank.

We now move beyond these standard assumptions. First, we need an assumption that restricts the heterogeneity in the counterfactual mappings $D_i$. In the binary/binary model, this is most often done by imposing LATE-monotonicity.

Assumption 3 (LATE-monotonicity in the binary/binary model). (i) or (ii) must hold:
(i) for each observation $i$, $D_i(1) \geq D_i(0)$;
(ii) for each observation $i$, $D_i(0) \geq D_i(1)$.

With more than two treatment values and/or more than two instrument values, there are many ways to restrict the heterogeneity in treatment assignment. Since treatments are not ordered in any meaningful way, we cannot apply the results in Angrist and Imbens (1995) for instance.
Mogstad, Torgovitsky, and Walters (2019, 2020) state several versions of monotonicity for a binary treatment model with $|\mathcal{Z}| > 2$. They propose an assumption PM (partial monotonicity) which applies binary LATE-monotonicity component by component. This requires that the instruments be interpretable as vectors, which is not necessarily the case here.

Heckman and Pinto (2018) took another path; they defined an unordered monotonicity property that is motivated by an analogy to revealed preference theory. This can be stated as follows:

Assumption 4 (Unordered Monotonicity at $(z, z')$). For any treatment value $d \in \mathcal{D}$, (i) or (ii) must hold:
(i) if $D_i(z) = d$, then $D_i(z') = d$;
(ii) if $D_i(z') = d$, then $D_i(z) = d$.

The easiest way to understand Assumption 4 is to think of treatment assignment as generated by a discrete choice problem. If observation $i$ "chose" treatment value $d$ under $z$, then a change in instrument value that increases the mean utility of treatment $d$ at least as much as the mean utilities of other treatment values should lead $i$ to still choose $d$. This is more than illustrative: Heckman and Pinto (2018) show that the treatment assignment models that satisfy unordered monotonicity for each pair of instrument values in a set $\mathcal{Z}$ can be represented by a discrete choice problem with additively separable errors, that is

$$D_i(z) = \arg\max_{d \in \mathcal{D}} \left( U_z(d) + u_{id} \right)$$

for random vectors $(u_{id})_{d \in \mathcal{D}}$ that are distributed independently of $Z_i$. Let AS-DCM denote this class of models. Clearly, Assumption 4 is more restrictive if the set of instrument values $\mathcal{Z}$ is richer. In Example 1 below, we would only want to invoke Assumption 4 on some policy changes.

Footnote (on unordered monotonicity): As a special case, unordered monotonicity includes ordered treatments in which $D_i(z) = \arg\max_{d \in \mathcal{D}} (U_z(d) + \sigma(d) u_i)$ for some increasing positive function $\sigma$.

Example 1. Unemployed individuals can be assigned either to a control group or to three different training programs, with treatment values $d = 1, 2, 3$. Consider three alternative policy changes ($z \to z'$), all of which make more individuals eligible for treatment 1. Policy change A at the same time restricts the eligibility criteria for both treatments 2 and 3 in unspecified ways. Policy change B leaves eligibility criteria unchanged for treatments 2 and 3, and policy change C restricts them for treatment 2 only. Assumption 4 would require that all observations that have treatment 1 under $z$ must also have treatment 1 under $z'$, which seems natural in this context. It would also prevent "two-way moves" between treatment 2 and treatment 3, which seems restrictive. Assumption 4 would be more credible with policy changes B and C.

Heckman and Pinto (2018) showed how unordered monotonicity could be applied to identify some treatment effects (or weighted averages of treatment effects). In Lee and Salanié (2018), we considered a more general family of models of treatment assignment. We allowed for […] $u_{ij} \leq Q_j(z)$. All AS-DCM models clearly belong to this class, as $D_i(z)$ is characterized by

$$u_{id} - u_{i, D_i(z)} \leq U_z(D_i(z)) - U_z(d) \quad \text{for all } d \in \mathcal{D}.$$

Our results showed that this class of models can be generated by
1. taking an AS-DCM model of assignment to treatment values $T \in \mathcal{T}$,
2. generating the observed treatment $D \in \mathcal{D}$ from a partition of the set $\mathcal{T}$.

This defines the class of models of assignment to treatment that we analyze in this paper. We call such models "filtered treatment", and we will refer to the (imperfectly observed) model of treatment in 1 above as the "unfiltered treatment". To pursue the discrete choice analogy: in the unfiltered model, each observation chooses a treatment within $\mathcal{D}$ and the analyst observes this choice. In a filtered model, choices are aggregated into groups; the analyst only observes which group the treatment chosen belongs to.
The aggregation occurs via a filtering map from $\mathcal{T}$ to $\mathcal{D}$.

Definition 1 (Filtered Treatment). The treatment assignment model is determined by:
1. a finite set $\mathcal{T}$;
2. a partition of $\mathcal{T}$ which we call $\mathcal{D}$; or equivalently, a surjective filtering map $M : \mathcal{T} \to \mathcal{D}$;
3. a finite set of instrument values $\mathcal{Z}$; and
4. an AS-DCM model of unfiltered treatment:
$$T_i(z) = \arg\max_{t \in \mathcal{T}} \left( U_z(t) + u_{it} \right),$$
where the vector $(u_{it})_{t \in \mathcal{T}}$ is distributed independently of $Z_i$ and has full support on $\mathbb{R}^{|\mathcal{T}|}$.

Our paper focuses on such models.

Assumption 5 (Filtered Treatment). Treatment is assigned according to Definition 1. We call the model that generates $T_i$ the underlying unfiltered treatment model.
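Definition 1 can be made concrete with a small simulation. The sketch below is only an illustration under assumptions that are not in the paper: the errors are drawn as independent standard normals, the mean utilities in the usage lines are made-up numbers, and the filtering map (the analyst only sees whether some program is taken) is hypothetical. The function name `simulate_filtered` is likewise our own.

```python
import random


def simulate_filtered(mean_utility, filter_map, n_obs=1000, seed=0):
    """Draw potential unfiltered and filtered treatments for n_obs observations.

    mean_utility[z][t] plays the role of U_z(t); filter_map[t] plays the role of M(t).
    Errors are i.i.d. standard normal here, which is an illustrative assumption.
    """
    rng = random.Random(seed)
    treatments = sorted(filter_map)
    sample = []
    for _ in range(n_obs):
        # One error vector per observation, drawn independently of the instrument.
        u = {t: rng.gauss(0.0, 1.0) for t in treatments}
        # Unfiltered potential treatment T_i(z): argmax of U_z(t) + u_it.
        unfiltered = {
            z: max(treatments, key=lambda t: util[t] + u[t])
            for z, util in mean_utility.items()
        }
        # Filtered potential treatment D_i(z) = M(T_i(z)).
        filtered = {z: filter_map[t] for z, t in unfiltered.items()}
        sample.append((unfiltered, filtered))
    return sample


# Hypothetical inputs: regime z = 1 raises the mean utility of t = 1 only.
U = {0: {0: 0.0, 1: 0.0, 2: 0.0}, 1: {0: 0.0, 1: 1.0, 2: 0.0}}
M = {0: 0, 1: 1, 2: 1}  # the analyst only observes "some program" versus "none"
draws = simulate_filtered(U, M, n_obs=500, seed=3)
share_switching_d = sum(d[0] != d[1] for _, d in draws) / len(draws)
```

In this toy configuration, every observation with unfiltered treatment 1 under $z = 0$ keeps it under $z = 1$, as unordered monotonicity requires, while the filtered treatment hides moves between $t = 1$ and $t = 2$.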

We will assume throughout, implicitly, that the set of instrument values $\mathcal{Z}$ has been restricted to a subset where unordered monotonicity is a reasonable assumption. We will also take some liberties with language by speaking of individuals "choosing" their "preferred" unfiltered treatments and of "mean utilities" $U_z(t)$. These are only meant to simplify the exposition and do not imply that the individual actually chooses her treatment.

As an example, consider the following double hurdle model; it has $|\mathcal{D}| = 2$, and an underlying unfiltered treatment model with $|\mathcal{T}| = 3$.

Example 2 (Double Hurdle Treatment). The unfiltered treatment has $\mathcal{T} = \{0, 1, 2\}$ and
$$T_i(z) = \arg\max_{t = 0, 1, 2} \left( U_z(t) + u_{it} \right),$$
where the vector $(u_{i0}, u_{i1}, u_{i2})$ is distributed independently of $Z_i$ and has full support on $\mathbb{R}^3$. Suppose that the filtered treatment is generated by $D = \mathbb{1}(T = t^\circ)$ for one value $t^\circ \in \mathcal{T}$, which corresponds to the filtering map $M(t^\circ) = 1$ and $M(t) = 0$ for the two other values, which we write $t'$ and $t''$; that is,
$$(1.1) \quad D_i(z) = 1 \iff \max\left( U_z(t') + u_{it'},\; U_z(t'') + u_{it''} \right) < U_z(t^\circ) + u_{it^\circ} \iff u_{it^\circ} - u_{it'} > U_z(t') - U_z(t^\circ) \ \text{and}\ u_{it^\circ} - u_{it''} > U_z(t'') - U_z(t^\circ).$$

Lee and Salanié (2018) gave a set of assumptions under which the marginal treatment effect can be identified in a filtered treatment model, provided that enough continuous instruments are available. In Example 2, we would need two continuous instruments, and some additional restrictions. The current paper explores identification with discrete-valued instruments. In these settings, the combination of Assumptions 1, 2, and 5 is far from sufficient to identify interesting treatment effects in filtered and unfiltered treatment models in general. In order to better understand what is needed, we now resort to the notion of response-groups of observations, whose members share the same mapping from instruments $z$ to unfiltered treatments $t$. We first state a general definition.

Footnote (on the general definition): This is analogous to the definitions in Heckman and Pinto (2018).

Definition 2 (Response-vectors and -groups). Let $\tilde{t}$ be an element of $\mathcal{T}^{\mathcal{Z}}$ and $\tilde{t}(z) \in \mathcal{T}$ denote its component for instrument value $z \in \mathcal{Z}$.
• Observation $i$ has (elemental) response-vector $R_{\tilde{t}}$ if and only if for all $z \in \mathcal{Z}$, $T_i(z) = \tilde{t}(z)$. The set $C_{\tilde{t}}$ denotes the set of observations with response-vector $R_{\tilde{t}}$ and we call it a response-group.
• We extend the definition in the natural way to incompletely specified mappings, where $\tilde{t}$ is a correspondence from $\mathcal{Z}$ to $\mathcal{T}$. We call the corresponding response-vectors and response-groups composite.

We start by introducing additional assumptions on the underlying unfiltered treatment model. We will illustrate these assumptions in simple graphs; our leading example is the "ternary/ternary" case when $|\mathcal{T}| = |\mathcal{Z}| = 3$.

Example 3 (Ternary/ternary unfiltered model). Assume that $\mathcal{Z} = \{0, 1, 2\}$ and $\mathcal{T} = \{0, 1, 2\}$.
In the $(u_{i1} - u_{i0}, u_{i2} - u_{i0})$ plane, the points $P_z$ with coordinates $(U_z(0) - U_z(1), U_z(0) - U_z(2))$ for $z = 0, 1, 2$ determine treatment assignment. For a given $z$,
• $T_i(z) = 0$ to the south-west of $P_z$;
• $T_i(z) = 1$ to the right of $P_z$ and below the diagonal that goes through it;
• $T_i(z) = 2$ above $P_z$ and above the diagonal that goes through it.
This is shown in Figure 1 for a given $z$, where the origin is in $P_z$.

[Figure 1: Unfiltered treatment assignment in the ternary/ternary model for given $z$. Horizontal axis: $u_{i1} - u_{i0}$; vertical axis: $u_{i2} - u_{i0}$; origin at $P_z$; regions labelled $T_i(z) = 0$, $T_i(z) = 1$, $T_i(z) = 2$.]

2.1 Targeted Treatments

"Targeting" will be the common thread in our analysis. Just as in general economic discussions a policy measure may target a particular outcome, we will speak of instruments (in the econometric sense) targeting the assignment to a particular treatment.

Under unordered monotonicity (Assumption 4), assignment to treatment is governed by the differences in mean utilities $(U_z(t) - U_z(\tau))$ and by the differences in unobservables $u_{it} - u_{i\tau}$. Only the former depend on the instrument. Intuitively, an instrument $z$ targets a treatment $t$ if it makes the difference $(U_z(t) - U_z(\tau))$ as large as possible for given $\tau$. Instead of requiring this for any $\tau$, we will choose a reference treatment $t_0 \in \mathcal{T}$ and require that $z$ maximize $(U_z(t) - U_z(t_0))$ for this particular $t_0$. In many applications, the control group is a natural choice for a reference treatment. Since the control group is usually denoted $t = 0$, we take $t_0 = 0$.

Definition 3 (Targeted Treatments and Targeting Instruments). Let $t_0 = 0$. For any $z \in \mathcal{Z}$ and $t \in \mathcal{T}$, we denote
$$\Delta_z(t) \equiv U_z(t) - U_z(0)$$
the relative mean utility of treatment $t$ given instrument $z$. Let $\bar{\Delta}_t$ be the maximum value of $\Delta_z(t)$ over $z \in \mathcal{Z}$, and $\bar{Z}(t)$ the set of maximizers $z \in \mathcal{Z}$. If $\bar{Z}(t)$ is not all of $\mathcal{Z}$, then for any $z \in \bar{Z}(t)$ we will say that instrument value $z$ targets treatment value $t$; and we write $t \in \bar{T}(z)$.
We denote $\mathcal{T}^*$ the set of targeted treatments and $\mathcal{Z}^* = \bigcup_{t \in \mathcal{T}^*} \bar{Z}(t)$ the set of targeting instruments.

Definition 3 calls for several remarks. First, by construction $\Delta_z(0) \equiv 0$, so that $\bar{Z}(0) = \mathcal{Z}$. Therefore $t = 0$ is not in $\mathcal{T}^*$. In many of our examples, $\mathcal{T}^* = \mathcal{T} \setminus \{0\}$; the set $\mathcal{T}^*$ may exclude other treatment values, however.

If a treatment value $t$ is not targeted, by definition the function $z \to \Delta_z(t)$ is constant over $z \in \mathcal{Z}$, with value $\bar{\Delta}_t$. While treatment values in $\mathcal{T} \setminus \mathcal{T}^*$ have mean utilities that do not respond to changes in the instruments, these mean utilities may and in general will differ across treatments. The probability that an individual observation takes a treatment $t \in \mathcal{T} \setminus \mathcal{T}^*$ also generally depends on the value of the instrument.

More importantly, the utilities $U_z(t)$ and therefore the targeting maps $\bar{Z}$ and $\bar{T}$ are not observable; any assumption on targeting instruments and targeted treatments must be a priori and will be context-dependent. As we will see, these prior assumptions sometimes have consequences that can be tested.

Let us return to the illustration that we used in the introduction. A policy regime $z$ consists of a set of (possibly zero or negative) subsidies $S_z(t)$ for treatments $t \in \mathcal{T}$. If there is a no-subsidy regime $z = 0$ with $S_0(t) = 0$ for all $t$, it seems natural to write the mean utility as $U_z(t) = U_0(t) + S_z(t)$. Then relative mean utilities are $\Delta_z(t) = \Delta_0(t) + S_z(t)$ and for any treatment $t$, the set $\bar{Z}(t)$ consists of the instrument values $z$ that subsidize $t$ most heavily. As this illustration suggests, the sets $\bar{Z}(t)$ may not be singletons, and they may well intersect. We will show this on several examples.

Example 4 is an instance of factorial design in which each non-zero treatment value is targeted by two instrument values, and one instrument value targets several treatments.

Example 4 ($2 \times 2$ factorial design). Let $\mathcal{Z} = \{0 \times 0, 0 \times 1, 1 \times 0, 1 \times 1\}$, where the two digits indicate the values of two binary instruments $z_1$ and $z_2$. Suppose that $\mathcal{T} = \{0, 1, 2\}$, where $z_1 = 1$ targets $t = 1$ and $z_2 = 1$ targets $t = 2$:
$$\Delta_{1 \times 0}(1) = \bar{\Delta}_1 > \max\left( \Delta_{0 \times 0}(1), \Delta_{0 \times 1}(1) \right),$$
$$\Delta_{0 \times 1}(2) = \bar{\Delta}_2 > \max\left( \Delta_{0 \times 0}(2), \Delta_{1 \times 0}(2) \right).$$
Depending on the context, it may be reasonable to assume that $\Delta_{1 \times 1}(1) = \Delta_{1 \times 0}(1)$ and $\Delta_{1 \times 1}(2) = \Delta_{0 \times 1}(2)$: turning on the two instruments increases the appeal of $t = 1$ (resp. $t = 2$) just as much as if only $z_1$ (resp. $z_2$) had been turned on. […] Then $\bar{Z}(1) = \{1 \times 0, 1 \times 1\}$ and $\bar{Z}(2) = \{0 \times 1, 1 \times 1\}$; instrument $z = 1 \times 1$ targets both $t = 1$ and $t = 2$, so that $\bar{T}(1 \times 1) = \{1, 2\}$.

Example 5 (Two Instruments Target the Same Treatment). Let us now modify Example 4 slightly: the instrument can only take values $0 \times 0$, $1 \times 0$, and $1 \times 1$. Then $z = 1 \times 0$ and $z = 1 \times 1$ both target $t = 1$: $\bar{Z}(1) = \{1 \times 0, 1 \times 1\}$.

Example 6 (An Instrument Targets Two Treatments). In this example, $\mathcal{Z} = \{0, 1\}$ and $\mathcal{T} = \{0, 1, 2\}$. A fraction of individuals in the sample receives a subsidy $z = 1$ that applies to both $t = 1$ and $t = 2$; under $z = 0$, no treatment is subsidized. We would expect that $\Delta_1(1) > \Delta_0(1)$ and $\Delta_1(2) > \Delta_0(2)$, so that $\bar{Z}(1) = \bar{Z}(2) = \{1\}$; then we have $\mathcal{T}^* = \{1, 2\}$ and $\mathcal{Z}^* = \{1\}$.

2.1.2 One-to-one Targeting

Sometimes we will impose the much stronger Assumption 6, or only one of its two parts. The first part says that a targeted treatment can only have one targeting instrument; the second part stipulates that a targeting instrument may target only one treatment. Example 4 violates both parts of Assumption 6. Example 5 violates its first part only, and Example 6 only violates its second part.
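The targeting maps of Definition 3 and the two parts of one-to-one targeting can be computed mechanically once a table of relative mean utilities is given. The sketch below is illustrative only: the function names are ours, and the numerical values of $\Delta_z(t)$ are made up to mimic the patterns of Examples 4 and 6 (in practice these utilities are not observable, as noted above).

```python
def targeting_maps(delta):
    """Compute Z_bar(t), T_bar(z), T* and Z* from delta[z][t] = Delta_z(t)."""
    instruments = sorted(delta)
    treatments = sorted(next(iter(delta.values())))
    z_bar = {}
    for t in treatments:
        best = max(delta[z][t] for z in instruments)
        # Set of maximizers of Delta_z(t) over z.
        z_bar[t] = {z for z in instruments if delta[z][t] == best}
    # A treatment is targeted when its set of maximizers is not all of Z.
    t_star = {t for t in treatments if z_bar[t] != set(instruments)}
    t_bar = {z: {t for t in t_star if z in z_bar[t]} for z in instruments}
    z_star = {z for z in instruments if t_bar[z]}
    return z_bar, t_bar, t_star, z_star


def one_to_one(z_bar, t_bar, t_star, z_star):
    """Return (part_i, part_ii): whether each part of one-to-one targeting holds."""
    part_i = all(len(z_bar[t]) == 1 for t in t_star)
    part_ii = all(len(t_bar[z]) == 1 for z in z_star)
    return part_i, part_ii


# Pattern of Example 6: a single subsidy regime z = 1 raises both t = 1 and t = 2.
example_6 = {0: {0: 0.0, 1: 0.0, 2: 0.0}, 1: {0: 0.0, 1: 1.0, 2: 1.0}}

# Pattern of Example 4: the first digit raises t = 1, the second digit raises t = 2.
example_4 = {
    "0x0": {0: 0.0, 1: 0.0, 2: 0.0},
    "0x1": {0: 0.0, 1: 0.0, 2: 1.0},
    "1x0": {0: 0.0, 1: 1.0, 2: 0.0},
    "1x1": {0: 0.0, 1: 1.0, 2: 1.0},
}

checks = {
    "example_6": one_to_one(*targeting_maps(example_6)),
    "example_4": one_to_one(*targeting_maps(example_4)),
}
```

With these made-up tables, the Example 6 pattern satisfies the first part and fails the second, and the Example 4 pattern fails both, in line with the discussion above. Exact equality is used to find maximizers, which is adequate for hand-written tables but would need a tolerance for estimated values.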

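Assumption 6 (One-to-one Targeting).
(i) For any $t \in \mathcal{T}^*$, the set $\bar{Z}(t)$ is a singleton $\{\bar{z}(t)\}$.
(ii) For any $z \in \mathcal{Z}^*$, $\bar{T}(z)$ is a singleton $\{\bar{t}(z)\}$.

Note that if both parts of Assumption 6 hold, there is a one-to-one correspondence between $\mathcal{Z}^*$ and $\mathcal{T}^*$, so that we can identify each instrument $z \in \mathcal{Z}^*$ with the treatment it targets.

Definition 4 (Labeling Instruments). Let both parts of Assumption 6 hold. To any $t \in \mathcal{T}^*$ we associate the instrument value $z = \bar{z}(t)$ that targets $t$, and we denote it by $z = t$. This allows us to define the partition $\mathcal{Z} = (\mathcal{Z} \setminus \mathcal{T}^*) \cup \mathcal{T}^*$, which is illustrated in Figure 2.

[Figure 2: One-to-one Targeting. The set $\mathcal{Z}$ is split into $\mathcal{T}^*$ and $\mathcal{Z} \setminus \mathcal{T}^*$, the set $\mathcal{T}$ into $\mathcal{T}^*$ and $\mathcal{T} \setminus \mathcal{T}^*$; each $z \in \mathcal{T}^*$ is matched with the treatment $t = z$.]

Example 7 (Treatment Subsidies). Let $\mathcal{T} = \{0\} \cup \mathcal{T}^*$ with $\mathcal{T}^* = \{1, \ldots, |\mathcal{T}| - 1\}$, and $\mathcal{Z} = \mathcal{T}$. Each $z > 0$ subsidizes treatment $t = z$ in the sense that for each $t > 0$, $S_t(t) > S_z(t)$ for all $z \neq t$.

Example 8 (Binary Instrument). Let $\mathcal{T} = \{0\} \cup \mathcal{T}^*$ with $\mathcal{T}^* = \{1, \ldots, |\mathcal{T}| - 1\}$, and $\mathcal{Z} = \{0, 1\}$. An observation with $z = 1$ receives a subsidy for $t = 1$, so that $S_1(1) > 0$. Other treatment values are not subsidized: $S_0(t) = S_1(t) = 0$ for $t \neq 1$. Then $\Delta_0(t) = \Delta_1(t)$ for all $t \neq 1$, so that $\mathcal{Z}^* = \{1\}$.

Example 9 (No Control). Let $\mathcal{Z} = \mathcal{T}^*$, so that there is one fewer instrument value than treatment values. The simplest example in this class is the ternary/binary model, with $\mathcal{T} = \{0, 1, 2\}$ and $\mathcal{Z} = \mathcal{T}^* = \{1, 2\}$. There are only two classes of observations: those with $z = 1$, which targets $t = 1$, and those with $z = 2$, which targets $t = 2$.

2.2 Strict Targeting

Assumption 4, conjoined with Assumption 6, imposes some useful restrictions on response groups.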

Proposition 1 (Unfiltered response groups (1)). Under Assumptions 4 and 6, for any $t \in \mathcal{T}^*$:
• if $T_i(t) = 0$, then $T_i(z) \neq t$ for all $z \in \mathcal{Z}$;
• as a consequence, all response-groups $C_{\tilde{t}}$ with $\tilde{t}(t) = 0$ and $\tilde{t}(z) = t$ for some $z \neq t$ are empty.

Example 3 (continued). Return to the ternary/ternary model and assume that the targeted set of treatments is $\mathcal{T}^* = \{1, 2\}$ and that Assumptions 4 and 6 hold. This imposes
$$\Delta_1(1) > \max\left( \Delta_0(1), \Delta_2(1) \right) \quad \text{and} \quad \Delta_2(2) > \max\left( \Delta_0(2), \Delta_1(2) \right).$$
A possible interpretation is that policy regime $z = 1$ (resp. $z = 2$) subsidizes treatment $t = 1$ (resp. $t = 2$) more than policy regimes $z = 0$ and $z = 2$ (resp. $z = 0$ and $z = 1$) do. Since $P_z$ has coordinates $(-\Delta_z(1), -\Delta_z(2))$,
• $P_1$ must lie to the left of $P_0$ and $P_2$,
• $P_2$ must lie below $P_0$ and $P_1$.

This is easily rephrased in terms of the response-vectors of Definition 2. First note that in the ternary/ternary case, there are $3^3 = 27$ response-vectors, $R_{000}$ to $R_{222}$, with corresponding response-groups $C_{000}$ to $C_{222}$. Groups $C_{ddd}$ are "always-takers" of treatment value $d$. All other groups are "compliers" of some kind, in that their treatment changes under some changes in the instrument. We will also pay special attention to some non-elemental groups. For instance, $R_{0*0}$ will denote the group who is assigned treatment 0 under $z = 0$ and $z = 2$, and any treatment under $z = 1$. That is, $C_{0*0} = C_{000} \cup C_{010} \cup C_{020}$.

Footnote (on always-takers): Observations in group $C_{000}$ are usually called the "never-takers". We prefer not to break the symmetry in our notation. We hope this will not cause confusion.

Assumption 4 asserts the emptiness of four composite groups out of the 27 possible: $C_{10*}$, $C_{*01}$, $C_{2*0}$, and $C_{*20}$ by Proposition 1. They correspond to 10 elemental groups.

Footnote (on the 10 elemental groups): Specifically, they are: $C_{001}$, $C_{020}$, $C_{100}$, $C_{101}$, $C_{102}$, $C_{120}$, $C_{200}$, $C_{201}$, $C_{210}$, and $C_{220}$.

Figure 3 shows one configuration in which the positions of $P_0$, $P_1$ and $P_2$ are consistent with Assumptions 4 and 6.

[Figure 3: Unordered monotonic ternary/ternary models: an example. Horizontal axis: $u_{i1} - u_{i0}$; vertical axis: $u_{i2} - u_{i0}$; points $P_0$, $P_1$, $P_2$; regions labelled by always-taker and complier response-groups.]

The number of distinct response-groups (ten) and the contorted shape of some of the complier groups in Figure 3 point to the difficulties we face in identifying response-groups without further assumptions. Moreover, this is only one possible configuration: other cases exist, which would bring up other response-groups.

Figure 3 also suggests that if we could make sure that $P_1$ is directly to the left of $P_0$, the shape of one of these groups would become nicer, and another group would be empty. Bringing $P_2$ directly under $P_0$ would have a similar effect. But these are assumptions on the dependence of the $U_z(d)$ on instruments. The first one imposes $\Delta_1(2) = \Delta_0(2)$ and the second one imposes $\Delta_2(1) = \Delta_0(1)$. To put it differently, we are now requiring that instrument $z = t$, which maximizes $\Delta_z(t) = U_z(t) - U_z(0)$, should not shift assignment between the other values of the treatment. This can be interpreted as policy regime $z = 1$ (resp. $z = 2$) subsidizing treatment $t = 1$ (resp. $t = 2$) only.

The following assumption is a direct extension of the discussion above to our general discrete model.
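Before stating it, the count of empty groups in Example 3 can be checked by brute force. The sketch below enumerates all response-vectors of the ternary/ternary model and flags those ruled out by Proposition 1; it assumes the labeling of Definition 4 (instrument $z = t$ targets treatment $t$) and orders the components of a response-vector as $(T_i(0), T_i(1), T_i(2))$. The function name is ours.

```python
from itertools import product

TREATMENTS = (0, 1, 2)
INSTRUMENTS = (0, 1, 2)
TARGETED = (1, 2)  # T* in Example 3 (continued); instrument z = t targets t


def excluded_by_proposition_1(response):
    """response[z] is T_i(z). Proposition 1: T_i(t) = 0 implies T_i(z) != t for all z."""
    return any(
        response[t] == 0 and any(response[z] == t for z in INSTRUMENTS if z != t)
        for t in TARGETED
    )


all_vectors = list(product(TREATMENTS, repeat=len(INSTRUMENTS)))
excluded = [r for r in all_vectors if excluded_by_proposition_1(r)]
remaining = [r for r in all_vectors if not excluded_by_proposition_1(r)]
```

The enumeration gives 27 response-vectors in total, of which 10 are excluded and 17 remain, which is why further restrictions are needed to pin down the response-groups.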

Assumption 7 (Strict Targeting). Take any targeted treatment value $t \in \mathcal{T}^*$. Then the function $z \in \mathcal{Z} \to \Delta_z(t)$ takes the same value for all $z \notin \bar{Z}(t)$. We denote this common value by $\Delta_t$, and we will say of the instrument values $z \in \bar{Z}(t)$ that they strictly target $t$.

Under Assumption 7, an instrument value $z \in \bar{Z}(t)$ promotes treatment $t$ without affecting the relative mean utilities of other treatment values. This explains our use of the term "strict targeting". To return to the analogy with a discrete choice model, an instrument in $\bar{Z}(t)$ plays the role of a price discount on good $t$ in a model of demand for goods whose mean utilities only depend on their own prices. In the language of program subsidies, all $z \in \bar{Z}(t)$ subsidize $t$ at the same high rate, and all other instrument values offer the same, lower subsidy (which could be zero or negative).

Note that while we only state the assumption for $t \in \mathcal{T}^*$, it holds by definition for all $t \in \mathcal{T} \setminus \mathcal{T}^*$. Since $\bar{Z}(t) = \mathcal{Z}$ for these treatment values, $\Delta_t = \bar{\Delta}_t$ is the common value of $\Delta_z(t)$ over all of $\mathcal{Z}$.

Moreover, Assumption 7 only bites for a given $t \in \mathcal{T}^*$ if $\mathcal{Z} \setminus \bar{Z}(t)$ has at least two values. Since $\bar{Z}(t)$ is never empty, this shows that Assumption 7 automatically holds if $|\mathcal{Z}| = 2$. In Example 4, strict targeting requires $\Delta_{0 \times 0}(1) = \Delta_{0 \times 1}(1)$ and $\Delta_{0 \times 0}(2) = \Delta_{1 \times 0}(2)$ […].

Example 10 (Tuition Subsidies). To shed light on Assumption 7, consider two types of policies aimed at making education more affordable. Our first policy consists of field-specific tuition subsidies. Each individual $i$ is randomly offered a choice of $m_i \geq 0$ vouchers for a menu $Z_i$ of fields. If $m_i \geq 1$, the individual may choose to use a voucher to study in a field in $Z_i$, to study in another field, or not pursue education. Let $T_i$ denote this choice, with $T_i = 0$ denoting no further education. For $t \neq 0$, the value of $\Delta_z(t)$ is highest when $t \in z$ as a voucher can be used. Therefore $\bar{Z}(t)$ is the set of menus of vouchers that include field $t$; and $\mathcal{T}^*$ is the set of fields for which a voucher is sometimes, but not always offered. Whether $\bar{Z}(t)$ is a single menu or not, all other menus of vouchers yield the same $\Delta_z(t)$: the field $t$ is strictly targeted.

Footnote (on menus of vouchers): Note that if $m_i \leq$ […]; a set of menus $z = \{F, K\}$ and $z' = \{G, L\}$ fails the second part of Assumption 6; a set $z = \{F, K\}$ and $z' = \{G, K\}$ fails both parts.

Another possible policy consists in subsidizing tuition for every year of study in the hope of increasing the number of years of education. Now $z$ is a subsidy rate, and $t$ the number

of years of education. Since a higher subsidy rate reduces the cost of education, for any $t$ the function $\Delta_z(t)$ achieves its maximum $\bar{\Delta}_t$ for the highest subsidy $\bar{z}$ on offer: for each $t$, $\bar{Z}(t) = \{\bar{z}\}$ and Assumption 7 fails. More importantly, if $|\mathcal{Z}| > 2$, then for any $t > 0$ the value of $\Delta_z(t)$ increases with $z \neq \bar{z}$. Strict targeting would clearly not be an appropriate assumption in this setting.

Extending our geometric illustration of Example 3, let $P_z$ be the point in $\mathbb{R}^{|\mathcal{T}^*|}$ with coordinates $(-\Delta_z(t))_{t \in \mathcal{T}^*}$. Under Assumption 7, the point $P_z$ has its $t$ coordinate equal to $-\bar{\Delta}_t$ on any axis $t$ which it targets ($t \in \bar{T}(z)$), and $-\Delta_t$ on any other axis. Since $-\Delta_t > -\bar{\Delta}_t$, two points $P_z$ and $P_{z'}$ have the same coordinate on any axis $t \notin \bar{T}(z) \cup \bar{T}(z')$; and $P_z$ is below $P_{z'}$ on axis $t$ if $t \in \bar{T}(z) \setminus \bar{T}(z')$.

Now suppose that in addition to $\mathcal{Z}^*$, the set of instruments contains at least two values $z$ and $z'$. Since neither targets any treatment, under Assumption 7 $\Delta_z(t) = \Delta_{z'}(t) = \Delta_t$ for all $t \in \mathcal{T}^*$. Moreover, $\Delta_z(t)$ equals $\Delta_t$ for all $z \in \mathcal{Z}$ if $t \notin \mathcal{T}^*$. This implies that the counterfactual treatments $T_i(z)$ and $T_i(z')$ must be equal for any observation $i$. In that sense, $z'$ is superfluous and we can aggregate it with $z$ in a category that we will call $z = 0$. For any $z \neq 0$, the point $P_0$ is above the point $P_z$ on any axis $t \in \bar{T}(z)$. We summarize this in Lemma 1.

Lemma 1 (Some consequences of strict targeting). Under Assumption 4 and Assumption 7,
(i) The coordinates of two points $P_z$ and $P_{z'}$ in $\mathbb{R}^{|\mathcal{T}^*|}$ coincide on any axis $t$ that is not in the symmetric difference $\bar{T}(z) \triangle \bar{T}(z')$.
(ii) If $z \in \bar{Z}(t)$ and $z' \notin \bar{Z}(t)$, the point $P_{z'}$ is above the point $P_z$ on the axis $t$.
(iii) The set of instrument values $\mathcal{Z}$ is either $\mathcal{Z}^*$, or the union of $\mathcal{Z}^*$ and of a single instrument value that we denote $z_0 \notin \mathcal{Z}^*$. In the latter case, for any $z \neq z_0$ the point $P_z$ is below the point $P_{z_0}$ on any axis $t \in \bar{T}(z)$, and it has the same coordinates on all other axes. For simplicity, if such a $z_0$ exists we denote $z_0 = 0$.

Just as we chose to denote our reference treatment as $t = 0$, our choice of $z_0 = 0$ […].

Definition 5 (Preferred targeted and alternative treatments). Take any observation $i$ in the population.
(i) For $z \in \mathcal{Z}^*$, let
$$V^*_i(z) = \max_{t \in \bar{T}(z)} \left( \bar{\Delta}_t + u_{it} \right)$$
and $T^*_i(z) \subset \bar{T}(z)$ denote the set of maximizers. We call the elements of $T^*_i(z)$ the preferred targeted treatments.
(ii) Also define
$$\Delta^*_i = \max_{t \in \mathcal{T}} \left( \Delta_t + u_{it} \right)$$
and let $\tau^*_i \subset \mathcal{T}$ denote the set of maximizers. We call the elements of $\tau^*_i$ the preferred alternative treatments.

Under strict targeting, an observation $i$ can react to being assigned an instrument $z$ in two ways. If $z$ is in $\mathcal{Z}^*$, then $i$ can choose among the treatments that $z$ targets. Alternatively, it may choose as if no treatment was targeted (as it must if $z$ is not in $\mathcal{Z}^*$). We now make this more rigorous by proving that observations can only opt for one of their preferred targeted treatments, if any, or for one of their preferred alternative treatments.

By Lemma 1, $\mathcal{Z}$ is either $\mathcal{Z}^*$ or $\mathcal{Z}^* \cup \{0\}$. We now state our main result on response-groups.

Proposition 2 (Unfiltered response groups under strict targeting). Let Assumptions 4 and 7 hold. Then for every observation $i$,
(i) if $z \in \mathcal{Z}^*$, then $T_i(z)$ can only be in $T^*_i(z)$ or in $\tau^*_i$.
(ii) if $\mathcal{Z} \neq \mathcal{Z}^*$, then $T_i(0) \in \tau^*_i$.

For simplicity, we work from now on under the assumption that the distribution of the error terms in the AS-DCM has no mass points. Then the sets $\tau^*_i$ and $T^*_i(z)$ are singletons with probability 1; with a minor abuse of notation, we let $\tau^*_i$ and $T^*_i(z)$ denote their elements.

Assumption 8 (Absolutely continuous errors). The distribution of the random vector $(u_{it})_{t \in \mathcal{T}}$ is absolutely continuous.

Proposition 3 (Unfiltered classes under strict targeting). Under Assumptions 4, 7, and 8, the population contains at most two subpopulations.
(i) The first subpopulation can only exist if $\mathcal{Z} = \mathcal{Z}^*$. If $i$ belongs to it, then $T_i(z) = T^*_i(z)$ for all $z \in \mathcal{Z}$.
Note that this does not extend to the sets ¯ Z p t q and ¯ T p z q , which can still have several elements. ii) Subpopulation P consists of classes denoted by c p A, τ q , where A is a possibly emptysubset of Z ˚ and τ is a treatment value. If observation i is in c p A, τ q , then the followingholds. • T i p z q “ T ˚ i p z q for all z P A . • If A ‰ Z , then τ ˚ i “ τ ; and for all z P Z z A , T i p z q “ τ . • If A ‰ Z and τ P T ˚ , then ¯ Z p τ q Ă A .(iii) If Z “ Z ˚ , then there is no class in P with A “ Z ˚ . Proposition 3 has a straightforward corollary under one-to-one targeting (Assumption 6).Recall that under one-to-one targeting, the sets ¯ Z p t q and ¯ T p z q are singletons and we canidentify each targeting instrument with the treatment it targets. As a consequence, T ˚ i p z q “ z for each z in Z , and if τ P T ˚ then ¯ Z p τ q “ t τ u . This simpliﬁes the statement of ourcharacterization result. Corollary 1 (Unﬁltered classes under strict, one-to-one targeting) . Under Assumptions 4,6, 7, and 8, the population contains at most two subpopulations denoted by P and P .(i) Subpopulation P can only exist if Z “ Z ˚ . If i P P , then T i p z q “ z for all z P Z .(ii) Subpopulation P consists of classes denoted by c p A, τ q , where A is a possibly emptysubset of Z ˚ and τ is a treatment value. If observation i is in c p A, τ q , then the followingholds. • T i p z q “ z for all z P A . • If A ‰ Z , then τ ˚ i “ τ ; and for all z P Z z A , T i p z q “ τ . • If A ‰ Z and τ P T ˚ , then τ P A .(iii) If Z “ Z ˚ , then there is no class in P with A “ Z ˚ . The subpopulation P , when it exists, regroups “super-compliers”: they always take thetreatment that is targeted by the instrument value they were assigned. E.g. if Z “ Z ˚ “t , , u , under strict one-to-one targeting this subpopulation would be the response group C . 
It is easy to see from the proof that an observation i belongs to P if and only if forall z P Z “ Z ˚ , V ˚ i p z q ą ∆ ˚ i .Given any (possibly empty) subset A of T ˚ and a treatment value τ , an observation i belongs to c p A, τ q if and only if 18 for all z P A , V ˚ i p z q ą ∆ ˚ i ; • for all z P Z ˚ z A , V ˚ i p z q ă ∆ ˚ i ; • ∆ ˚ i “ ∆ τ ` u iτ .First consider the case when A is empty. Whatever the value of the instrument z is, anobservation i in c pH , τ q will take up the treatment τ that maximizes u it over T . Suchobservations are always-takers of τ . In the polar case A “ Z ˚ , when it is assigned a targetinginstrument value ( z P Z ˚ ), the observation complies by picking one of the treatments ittargets ( T i p z q “ T ˚ i p z q , which is z under one-to-one targeting). When both A and Z z A arenon-empty, the observation complies when the instrument z is in A , and it does not respondto changes in the value of z when it is in Z z A .Figure 4: An unﬁltered class c p A, τ q under strict one-to-one targeting zA Z ˚ z A Z z A T τ ˚ i τ ˚ i T ˚ z A Figure 4 represents the mapping of instruments to treatments for an observation i inpopulation P under strict one-to-one targeting. We illustrate a case for which Z ˚ “ Z z t u , Z ˚ z A is not empty, and τ P A . The white area shows that treatment values in T ˚ z A arenot assigned. To illustrate Corollary 1, we return to the ternary/ternary modelof Example 3, where Z ˚ “ T ˚ “ t , u and Z “ T “ t , , u . • P does not exist. • A can be H , t u , t u , or t , u , with corresponding values of τ in t u , t , u , t , u or t , , u respectively. The class c pH , q corresponds to the always-takers of 0, A “ C . For A “ t u we get C and A , and for A “ t u we get C and A . 
Finally,with A “ t , u we obtain the composite response group C ˚ “ C Ť C Ť C .19igure 5: Unﬁltered, strictly one-to-one targeted treatment: ternary/ternary model u i ´ u i u i ´ u i P C C C A A A C C P P P The eight elemental response groups are illustrated in Figure 5, again with the ori-gin in P . Comparing Figure 5 with Figure 3 shows the identifying power of Assump-tion 7. Kirkeboen, Leuven, and Mogstad (2016) used a ternary-ternary model in their in-vestigation of ﬁeld of study and earnings. We show in Appendix C.1 that our combinationof Assumption 4 and Assumption 7 yields exactly the same identifying restrictions as inKirkeboen, Leuven, and Mogstad (2016), by a quite diﬀerent path.Figure 6: Unﬁltered, strict one-to-one targeting: ternary/binary model with no control u i ´ u i u i ´ u i P P C C A A A C Our next example has Z “ Z ˚ : all individuals are assigned a targeting instrument. Example 11 (Ternary/binary model with no control) . Let us return to Example 9, consider T “ t , , u , and Z “ Z ˚ “ t , u : z “ t “ z “ t “

2. 20ow the subpopulation P exists; it corresponds to the response group C of super-compliers. A can be H , with τ “

0; it can be t u , with τ P t , u ; or it can be t u , with τ Pt , u . This generates response groups A ; C and A ; and C and A . These six elementalresponse-groups are represented in Figure 6, where we put the origin at u i “ u i “ u i sincethere is no P point any more.Sometimes one can obtain the characterization in Corollary 1 with a weaker assumptionthan Assumption 6. To see this, consider the following variant of Example 8. Example 12 (Only one type of subsidy) . Assume that T “ t , , u and Z “ t , u . Weinterpret z “ t “

1, and z “ t “ p q ą ∆ p q and ∆ p q “ ∆ p q ; we have¯ Z p q “ t u , ¯ Z p q “ t , u “ Z , and T ˚ “ Z ˚ “ t u . Since we only have a binary instrument,strict targeting holds in this example.The subpopulation P cannot exist here since z “ Z ˚ . In subpopulation P , wecan have classes A “ H with τ P t , u , and A “ t u with τ P T . The former generates thealways-takers groups A “ C and A “ C , and the latter has the two groups of compliers C and C and the always-taker group A “ C . These ﬁve elemental response-groups areillustrated in Figure 7.Figure 7: Unﬁltered, targeted treatment: ternary/binary model with only one type of subsidy u i ´ u i u i ´ u i P C A A A C P P If we had not imposed ∆ p q “ ∆ p q , Assumption 7 would still hold but t “ T ˚ . If for instance t “ t “ t “ p q ą ∆ p q and ¯ Z p q “ t u , so that T ˚ “ t , u . We wouldnot have one-to-one targeting anymore since z “ t “ t “ C , with A “ t u and τ “ t “ t “ t “ p U p q ´ U p qq ´ p U p q ´ U p qq “ p ∆ p q ´ ∆ p qq ´ p ∆ p q ´ ∆ p qq ą . This is enough to rule out the possibility of the response group C . To see this, assume that T i p q “

1. This implies U p q ` u i ą U p q ` u i , so that U p q ` u i ą U p q ` p U p q ´ U p qq ` u i ą U p q ` u i and T i p q cannot be 2. Now that we have characterized response-groups, we seek to identify the probabilities of thecorresponding response-groups in the unﬁltered treatment model.

Definition 6 (Generalized propensity scores). We write $P(t|z)$ for the generalized propensity score $\Pr(T_i = t \mid Z_i = z)$. Under Assumptions 6 and 7, the response-groups are easily enumerated.
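As an illustration of Definition 6, the generalized propensity scores can be estimated by cell frequencies. The sketch below is a minimal plug-in estimator, assuming a sample of observed treatments and instruments held in two equal-length lists; the function name and the toy sample are hypothetical and do not come from the paper.

```python
from collections import Counter


def generalized_propensity_scores(treatments, instruments):
    """Return {(t, z): share of observations with T_i = t among those with Z_i = z}."""
    if len(treatments) != len(instruments):
        raise ValueError("treatments and instruments must have the same length")
    cell_counts = Counter(zip(treatments, instruments))  # counts of (t, z) cells
    z_counts = Counter(instruments)                      # counts of each z
    t_values = sorted(set(treatments))
    return {
        (t, z): cell_counts[(t, z)] / z_counts[z]
        for z in sorted(z_counts)
        for t in t_values
    }


# Hypothetical toy sample: eight observations, binary instrument, three treatments.
T_obs = [0, 1, 2, 2, 1, 1, 0, 1]
Z_obs = [0, 0, 0, 0, 1, 1, 1, 1]
P = generalized_propensity_scores(T_obs, Z_obs)
print(P[(1, 0)], P[(1, 1)])  # 0.25 0.75
```

For each value of the instrument, the estimated scores sum to one across treatments, which is the adding-up constraint used when counting independent data points.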

Proposition 4 (Counting response-groups under strict one-to-one targeting). Under Assumptions 4, 6, 7, and 8, the number of response-groups is
$$N = \left(2|\mathcal{T}| - |\mathcal{Z}^*|\right) \times 2^{|\mathcal{Z}^*| - 1} - \left(|\mathcal{T}| - 1\right) \mathbb{1}(\mathcal{Z} = \mathcal{Z}^*).$$

The data give us the generalized propensity scores $P(t|z) = \Pr(T_i = t \mid Z_i = z)$ for $(t, z) \in \mathcal{T} \times \mathcal{Z}$. The adding-up constraints $\sum_{t \in \mathcal{T}} P(t|z) = 1$ for each $z \in \mathcal{Z}$ reduce the count of independent data points to $|\mathcal{T}^*| \times |\mathcal{Z}|$. As the probabilities of the response-groups must sum to one, we have $(N - 1)$ unknowns.

Table 1 shows some values of the number of equations $|\mathcal{T}^*| \times |\mathcal{Z}|$ and the number of unknowns $(N - 1)$ for a number of examples. The first row, with $|\mathcal{T}| = |\mathcal{Z}| = 2$, is the standard LATE case, with never-takers ($A_0$), compliers ($C$), and

Table 1: Number of required identifying restrictions: unfiltered treatment under strict, one-to-one targeting

Row

T Z Z ˚ N ´ |Z| |T ˚ | Required Example(1) { } { } { } LATE (2) { } { } { } { |T | ´ } { } { } p |T | ´ q p |T | ´ q { } { } { } Example 11 (5) { } { } { } Example 3 (6) { } { } { }

11 9 2(7) { } { } { } { } { } { }

16 9 7(9) { } { } { }

always-takers ($A_1$). Row (2) and its extension (3) show another case of exact identification. In other rows, as $|\mathcal{T}|$ gets larger, the degree of underidentification tends to increase.

It is not difficult to write down the equations that link observed propensity scores and group probabilities.

Proposition 5 (Identifying equations for response-groups: unfiltered treatment under strict one-to-one targeting). For any subset $A$ of $\mathcal{Z}^*$, let $A^+$ denote the set $A \cup (\mathcal{T} \setminus \mathcal{T}^*)$. Under Assumptions 1, 2, 4, 6, 7, and 8, the empirical content of the generalized propensity scores of the unfiltered treatment model is the following system of equations:

(i) If $\mathcal{Z} \neq \mathcal{Z}^*$:

• for $z \in \mathcal{Z}^*$ and $t \in \mathcal{T}$:
$$P(t|z) = \sum_{A \subset \mathcal{Z}^* \setminus \{z\}} \mathbb{1}(t \in A^+) \Pr(c(A, t)) + \mathbb{1}(t \in \mathcal{Z}^*, t = z) \sum_{\substack{A \subset \mathcal{Z}^* \\ z \in A}} \sum_{\tau \in A^+} \Pr(c(A, \tau)). \tag{2.1}$$

• for $z \notin \mathcal{Z}^*$ and $t \in \mathcal{T}$:
$$P(t|z) = \sum_{A \subset \mathcal{Z}^*} \mathbb{1}(t \in A^+) \Pr(c(A, t)). \tag{2.2}$$

(ii) If $\mathcal{Z} = \mathcal{Z}^*$, for $t \in \mathcal{T}$:
$$P(t|z) = \sum_{A \subset \mathcal{Z} \setminus \{z\}} \mathbb{1}(t \in A^+) \Pr(c(A, t)) + \mathbb{1}(t = z) \left( \Pr(\mathcal{P}_1) + \sum_{\substack{A \subset \mathcal{Z},\, A \neq \mathcal{Z} \\ z \in A}} \sum_{\tau \in A^+} \Pr(c(A, \tau)) \right). \tag{2.3}$$

Proposition 5 can be applied directly to some of the rows of Table 1. According to the table, our Example 8 is just identified under strict, one-to-one targeting. Proposition 6 confirms it and gives explicit formulæ, along with simple testable predictions. To avoid repetitions, in the remainder of Section 2, we assume that Assumptions 1, 2, 4, 6, 7, and 8 hold with $\mathcal{D} = \mathcal{T}$.

Proposition 6 (Response-group probabilities in Example 8). The following probabilities are identified:
$$\Pr(A_1) = P(1|0), \qquad \Pr(A_t) = P(t|1) \text{ for } t \neq 1, \qquad \Pr(C_t) = P(t|0) - P(t|1) \text{ for } t \neq 1. \tag{2.4}$$
The model has $(|\mathcal{T}| - 1)$ testable predictions: $P(t|0) \geq P(t|1)$ for $t \neq 1$.

Row (5) of Table 1 is the ternary/ternary model of Example 3, in which eight elemental groups are non-empty. One restriction is missing to point-identify the probabilities of all eight response-groups.
The following proposition shows that the probabilities of four of the eight elemental groups are point-identified: two groups of always-takers, and two groups of compliers. In addition, the probabilities of two composite groups of compliers are point-identified. The other four probabilities are constrained by three adding-up constraints.
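The count of eight elemental groups can be cross-checked numerically. The sketch below compares a closed-form count with a brute-force enumeration of the classes $c(A, \tau)$ of Corollary 1. The closed form is a reading of the counting formula in Proposition 4 and should be treated as an assumption; the function names and the labelling of targeted treatments as $1, \ldots, |\mathcal{Z}^*|$ are illustrative choices rather than the authors' notation.

```python
from itertools import combinations


def count_response_groups(n_treatments, n_targeting, has_control):
    """Closed-form count (assumed reading of Proposition 4); needs n_targeting >= 1."""
    n = (2 * n_treatments - n_targeting) * 2 ** (n_targeting - 1)
    if not has_control:  # Z = Z*: every instrument value targets a treatment
        n -= n_treatments - 1
    return n


def enumerate_response_groups(n_treatments, n_targeting, has_control):
    """Brute-force list of classes c(A, tau) under strict one-to-one targeting.

    Targeted treatments are labelled 1..n_targeting and identified with the
    instrument value that targets them; tau must be in A or be untargeted.
    """
    targeted = list(range(1, n_targeting + 1))
    untargeted = [t for t in range(n_treatments) if t not in targeted]
    groups = []
    for size in range(n_targeting + 1):
        for subset in combinations(targeted, size):
            if not has_control and size == n_targeting:
                continue  # A = Z* is excluded when there is no control value
            groups.extend((subset, tau) for tau in list(subset) + untargeted)
    if not has_control:
        groups.append(("super-compliers", None))  # the single extra group
    return groups


cases = {"LATE": (2, 1, True), "Example 3": (3, 2, True), "Example 11": (3, 2, False)}
for label, args in cases.items():
    assert count_response_groups(*args) == len(enumerate_response_groups(*args))
    print(label, count_response_groups(*args))  # LATE 3, Example 3 8, Example 11 6
```

With three treatments, two targeting instrument values and a control value, both routes give eight groups, in line with the eight elemental groups discussed above.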

Proposition 7 (Response-group probabilities in the ternary/ternary model of Example 3) . he following probabilities are identiﬁed: Pr p A q “ P p | q , Pr p A q “ P p | q , Pr p C q “ P p | q ´ P p | q , Pr p C q “ P p | q ´ P p | q , Pr p C Ť C q “ P p | q ´ P p | q , Pr p C Ť C q “ P p | q ´ P p | q , Pr p C Ť C Ť C Ť A q “ P p | q . (2.5) The model has the following testable implications: P p | q ě P p | q (2.6) P p | q ě P p | q (2.7) P p | q ě max p P p | q , P p | qq . (2.8)The model of Example 11 is equally easy to analyze. The probabilities of two groups ofalways-takers are point-identiﬁed, and two equations link the probabilities of the other threeelemental groups. Proposition 8 (Response-group probabilities in the ternary/binary model of Example 11) . The following probabilities are identiﬁed: Pr p A q “ P p | q , Pr p A q “ P p | q , Pr p C Ť C q “ P p | q ´ P p | q , Pr p C Ť A q “ P p | q , Pr p C Ť A q “ P p | q . (2.9) The model has the following testable implication: (2.10) P p | q ě P p | q . We now establish identiﬁcation of treatment eﬀects for the complier groups whose probabil-ities are identiﬁed. To simplify the exposition, we introduce one more element of notation.25 eﬁnition 7 (Conditional average group outcomes) . For any z P Z , t P T , and for anyresponse group C with nonzero probability, we deﬁne E z p t | C q “ E p Y i p T i “ t q| Z i “ z, i P C q and we call it the conditional average group outcome . We deﬁne the conditional averageoutcome by ¯ E z p t q “ E p Y i p T i “ t q| Z i “ z q . To give a trivial example, the LATE formula (row (1) of Table 1) is E p Y i p q| i P C q “ ¯ E p q ´ ¯ E p q P p | q ´ P p | q and E p Y i p q| i P C q “ ¯ E p q ´ ¯ E p q P p | q ´ P p | q , yielding the familiar form: E p Y i p q ´ Y i p q| i P C q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p T i “ | Z i “ q ´ Pr p T i “ | Z i “ q . While the ¯ E z p t q are directly identiﬁed from the data, the conditional average groupoutcomes of course are not. 
We do know that some of them are zero; and that they combine with the group probabilities to form the observed conditional average outcomes. We will use the following identity repeatedly:

Lemma 2 (Decomposing conditional average outcomes). Let $z \in \mathcal{Z}$ and $t \in \mathcal{T}$. Then
$$\bar{E}_z(t) = \sum_{C:\, C(z) = t} E(Y_i(t) \mid i \in C) \Pr(i \in C),$$
where $C(z) = t$ means that response group $C$ has treatment $t$ when assigned instrument $z$. In addition,
$$E(Y_i \mid Z_i = z) = \sum_{t \in \mathcal{T}} \bar{E}_z(t).$$

First consider Example 8, where the probabilities of all $(2|\mathcal{T}| - 1)$ response groups are identified (Proposition 6).

Proposition 9 (Identification in the ternary/binary model under strict one-to-one targeting). The following quantities are point-identified:
$$E[Y_i(1) \mid i \in A_1] = \frac{\bar{E}_0(1)}{P(1|0)}, \qquad E[Y_i(t) \mid i \in A_t] = \frac{\bar{E}_1(t)}{P(t|1)} \text{ for } t \neq 1, \qquad E[Y_i(t) \mid i \in C_t] = \frac{\bar{E}_0(t) - \bar{E}_1(t)}{P(t|0) - P(t|1)} \text{ for } t \neq 1.$$
However, the standard Wald estimator only partially identifies the average treatment effects on the complier groups $C_t$:
$$\frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{\Pr(D_i = 1 \mid Z_i = 1) - \Pr(D_i = 1 \mid Z_i = 0)} = \frac{(\bar{E}_1(1) - \bar{E}_0(1)) - \sum_{t \neq 1} (\bar{E}_0(t) - \bar{E}_1(t))}{P(1|1) - P(1|0)} = \sum_{t \neq 1} \alpha_t E[Y_i(1) - Y_i(t) \mid i \in C_t], \tag{2.11}$$
where the weights $\alpha_t = \Pr(i \in C_t \mid i \in \bigcup_{\tau \neq 1} C_\tau) = (P(t|0) - P(t|1)) / (P(1|1) - P(1|0))$ are positive and sum to 1.

Proposition 9 shows that we only identify a convex combination (with point-identified weights) of the ATEs on the $|\mathcal{T}^*|$ complier groups. It is possible to bound the average treatment effects in a straightforward manner if we assume that the support of $Y_i$ is known and finite. Alternatively, we may add conditions to achieve point identification of average treatment effects for the compliers. Assuming that the ATEs are all equal is one obvious solution. Another one is to assume the homogeneity of the average outcomes under treatment.

Corollary 2 (Treatment effects in the one-subsidy model).
Suppose that the average counterfactual outcomes under treatment are identical for all complier groups:
$$E[Y_i(1) \mid i \in C_t] \text{ does not depend on } t \neq 1. \tag{2.12}$$
Then the average treatment effects for all complier groups $C_t$ are point-identified:
$$E[Y_i(1) - Y_i(t) \mid i \in C_t] = \frac{\bar{E}_1(1) - \bar{E}_0(1)}{P(1|1) - P(1|0)} - \frac{\bar{E}_0(t) - \bar{E}_1(t)}{P(t|0) - P(t|1)}.$$

To interpret the homogeneity condition in (2.12), suppose that we are concerned with the effect of one subsidized program ($t = 1$) when other, unsubsidized programs ($t > 1$) are also available. Then (2.12) imposes that outcomes for compliers (who switch to the subsidized program when offered a subsidy) are on average the same regardless of where the compliers switched from.

We now move on to the ternary/ternary model in Example 3. As we mentioned earlier, in this example our assumptions allow us to use the results of Kirkeboen, Leuven, and Mogstad (2016). Their Proposition 2 tells us that
$$\beta_1 = E[Y_i(1) - Y_i(0) \mid i \in C_{010} \cup C_{012}], \qquad \beta_2 = E[Y_i(2) - Y_i(0) \mid i \in C_{002} \cup C_{012}],$$
where $\beta_1$ and $\beta_2$ are the probability limits of the instrumental variable estimators in
$$Y_i = \beta_0 + \beta_1 \mathbb{1}(T_i = 1) + \beta_2 \mathbb{1}(T_i = 2) + \varepsilon_i. \tag{2.13}$$
We now show that we can also identify the average treatment effects for the response groups $C_{212}$ and $C_{112}$, whose probabilities are point-identified.

Proposition 10 (Identification of treatment effects for Example 3). The average treatment effects of $C_{212}$ and $C_{112}$ are identified:
$$E[Y_i(1) - Y_i(2) \mid i \in C_{212}] = \frac{(E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]) - \beta_1 (P(0|0) - P(0|1))}{P(2|0) - P(2|1)}$$
and
$$E[Y_i(2) - Y_i(1) \mid i \in C_{112}] = \frac{(E[Y_i \mid Z_i = 2] - E[Y_i \mid Z_i = 0]) - \beta_2 (P(0|0) - P(0|2))}{P(1|0) - P(1|2)}.$$

The average treatment effect $E[Y_i(1) - Y_i(2) \mid i \in C_{212}]$ brings interesting information of a different nature than $\beta_1 = E[Y_i(1) - Y_i(0) \mid i \in C_{010} \cup C_{012}]$, which Kirkeboen, Leuven, and Mogstad (2016) focus on. We can illustrate this on the choice of college education, using a special case of Example 10. Let $z = 1$ be a subsidy for STEM majors and $z = 2$ a subsidy for other majors; let $t = 0$ stand for not attending college, $t = 1$ for a STEM major, and $t = 2$ for another major; and let $Y$ be later earnings. The response groups $C_{010}$, $C_{012}$, and $C_{212}$ are all comprised of individuals who will study STEM if and only if they receive a STEM subsidy. On the other hand, individuals in $C_{010} \cup C_{012}$ will not go to college unless they receive a subsidy, while those in $C_{212}$ are "college always-takers". These are quite different populations and there is no reason to expect that the effect of a STEM major on their future earnings should be the same, even on average.
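The identification argument for the ternary/ternary model can be checked on a made-up population. The sketch below assumes that response groups are labelled by the triple $(T_i(0), T_i(1), T_i(2))$ and that the two effects are recovered as (change in mean outcome minus $\beta$ times the mass of the composite complier group) divided by the mass of the group of interest. All shares and outcome means are hypothetical, and $\beta_1$, $\beta_2$ are computed from the known population instead of being estimated from the regression (2.13).

```python
# Hypothetical shares of the eight response groups, keyed by (T(0), T(1), T(2)).
GROUPS = {
    (0, 0, 0): 0.20, (1, 1, 1): 0.10, (2, 2, 2): 0.10,
    (0, 1, 0): 0.15, (0, 0, 2): 0.15, (0, 1, 2): 0.10,
    (1, 1, 2): 0.10, (2, 1, 2): 0.10,
}
# MEAN_Y[g][t] = E[Y_i(t) | i in g]; arbitrary numbers, different across groups.
MEAN_Y = {g: [1.0 + 0.1 * k, 2.0 + 0.3 * k, 1.5 + 0.2 * k]
          for k, g in enumerate(GROUPS)}


def propensity(t, z):
    """P(t|z): total share of groups that take treatment t under instrument z."""
    return sum(p for g, p in GROUPS.items() if g[z] == t)


def mean_outcome(z):
    """E[Y | Z = z], with the instrument assigned independently of the groups."""
    return sum(p * MEAN_Y[g][g[z]] for g, p in GROUPS.items())


def group_ate(groups, t_new, t_old):
    """True average of Y(t_new) - Y(t_old) over a union of response groups."""
    mass = sum(GROUPS[g] for g in groups)
    return sum(GROUPS[g] * (MEAN_Y[g][t_new] - MEAN_Y[g][t_old]) for g in groups) / mass


beta1 = group_ate([(0, 1, 0), (0, 1, 2)], 1, 0)  # stand-in for the IV limit
beta2 = group_ate([(0, 0, 2), (0, 1, 2)], 2, 0)

ate_212 = ((mean_outcome(1) - mean_outcome(0))
           - beta1 * (propensity(0, 0) - propensity(0, 1))) / (propensity(2, 0) - propensity(2, 1))
ate_112 = ((mean_outcome(2) - mean_outcome(0))
           - beta2 * (propensity(0, 0) - propensity(0, 2))) / (propensity(1, 0) - propensity(1, 2))

# Both formulas recover the true group-level effects in this toy population.
assert abs(ate_212 - group_ate([(2, 1, 2)], 1, 2)) < 1e-9
assert abs(ate_112 - group_ate([(1, 1, 2)], 2, 1)) < 1e-9
```

The check works because only three groups change treatment between $z = 0$ and $z = 1$: the two groups entering $\beta_1$ and $C_{212}$. The same logic applies to $z = 2$ with $\beta_2$ and $C_{112}$.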
We now turn to filtered versions of the treatment model we analyzed in the previous section. That is, we consider a model with a treatment variable $D_i \in \mathcal{D}$, where the set of filtered treatment values $\mathcal{D}$ is a non-trivial partition of the set of unfiltered treatment values $\mathcal{T} = \{0, \ldots, |\mathcal{T}| - 1\}$. By definition, $2 \leq |\mathcal{D}| < |\mathcal{T}|$. We impose unordered monotonicity (Assumption 4) on the unfiltered treatment model.

Let $M : \mathcal{T} \to \mathcal{D}$ denote the "filtering map": for any $d \in \mathcal{D}$, the set of unfiltered $t$'s that generate the observation $D = d$ is $M^{-1}(d)$. The statistics that can be identified from the data are obtained by summing their unfiltered equivalent over $t \in M^{-1}(d)$.

To make this more precise, we add superscripts $T$ or $D$ to response groups, conditional probabilities and expectations to indicate whether they pertain to the unfiltered treatment model or to the filtered treatment model. For instance, $C^T$ refers to a response group in the unfiltered treatment model (a "$T$-response group"). The filtering map transforms $C^T$ into a "$D$-response group" $C^D$ straightforwardly: if $C^T(z) = t$, then $C^D(z) = M(t)$. Define $\bar{M}$ to be the component-by-component extension of $M$, so that $\bar{M}(C^T) \equiv (M(t_1), \ldots, M(t_{|\mathcal{Z}|}))$ for $(t_1, \ldots, t_{|\mathcal{Z}|}) \in C^T$. Then the $D$-response groups are
$$C^D = \bigcup_{C^T \mid \bar{M}(C^T) = C^D} C^T,$$
with probabilities
$$\Pr(i \in C^D) = \sum_{C^T \mid \bar{M}(C^T) = C^D} \Pr(i \in C^T).$$

We let $P^T(t|z)$ denote the generalized propensity scores, and $E^T_z(t \mid C^T)$ and $\bar{E}^T_z(t)$ the conditional average group outcomes and conditional average outcomes of Definition 7. Their equivalents in the filtered treatment model are
$$P^D(d|z) \equiv \Pr(D_i = d \mid Z = z) = \sum_{t \in M^{-1}(d)} P^T(t|z), \qquad E^D_z(d \mid C) \equiv E(Y_i \mathbb{1}(D_i = d) \mid Z_i = z, i \in C) = \sum_{t \in M^{-1}(d)} E^T_z(t \mid C). \tag{3.1}$$
Finally,
$$\bar{E}^D_z(d) \equiv E(Y_i \mathbb{1}(D_i = d) \mid Z_i = z) = \sum_{t \in M^{-1}(d)} \bar{E}^T_z(t). \tag{3.2}$$
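The aggregation performed by the filtering map can be sketched in a few lines. The example below assumes a filter represented as a dictionary from unfiltered to filtered treatment values, and propensity scores stored as a dictionary keyed by (treatment, instrument); the function names, the filter and the numbers are all hypothetical.

```python
def filter_propensity_scores(p_unfiltered, filter_map):
    """Aggregate P^T(t|z) into P^D(d|z) by summing over all t with M(t) = d."""
    p_filtered = {}
    for (t, z), prob in p_unfiltered.items():
        key = (filter_map[t], z)
        p_filtered[key] = p_filtered.get(key, 0.0) + prob
    return p_filtered


def filter_response_group(group, filter_map):
    """Apply M component by component to a T-response group.

    A group is a tuple holding one treatment per instrument value.
    """
    return tuple(filter_map[t] for t in group)


# Hypothetical filter: the analyst only observes whether t > 0.
M = {0: 0, 1: 1, 2: 1}
P_T = {(0, 0): 0.6, (1, 0): 0.3, (2, 0): 0.1,
       (0, 1): 0.4, (1, 1): 0.5, (2, 1): 0.1}
P_D = filter_propensity_scores(P_T, M)

# Distinct T-response groups can collapse into the same D-response group.
assert filter_response_group((0, 1, 2), M) == (0, 1, 1)
assert filter_response_group((0, 1, 1), M) == (0, 1, 1)
```

Because several $T$-response groups can share the same image under the filter, the probability of a $D$-response group is the sum of the probabilities of the unfiltered groups that map into it.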
Since we do not observe $T_i$, only the left-hand sides in Equation (3.1) and Equation (3.2) are identified from the data. Finally, we let $T_i(z)$ and $D_i(z)$ denote the counterfactual treatments, and $Y^T_i(t)$ and $Y^D_i(d)$ the counterfactual outcomes.

It would be easy, but perhaps not that useful, to translate the general results of Section 2.3 and Section 2.4 to the filtered treatment model. We choose to focus here on two useful classes of examples in which the unfiltered treatment model satisfies strict, one-to-one targeting.

Let us ﬁrst return to the binary instrument/multiple unﬁltered treatment model (Example 8).Since z “ t “

1, it seems natural to start with a binary ﬁlteredtreatment: D i “ p T i “ q . This corresponds to a ﬁltering map M deﬁned by • M p q “ • M p t q “ t ‰ i took the targeted treatment;if not, then i could be in any other treatment cell.The mapping of T -response groups to D -response groups is straightforward. The groupsof always takers of treatment t “ d “ A D “ A T . The other always-takers mapinto the single group A D “ Ť t ‰ A Tt ; and the compliers C Tt combine into C D “ Ť t ‰ C Tt . Under M , we have P D p | z q “ P T p | z q for z “ ,

1. That is the sum of our information on groupprobabilities. Moving to treatment eﬀects, we observe ¯ E Dz p q “ ¯ E Tz p q and(3.3) ¯ E Dz p q “ ÿ t ‰ ¯ E Tz p t q z “ , D -response group and a weighted LATE,with unknown weights this time. Proposition 11 (Identiﬁcation in the ﬁltered binary instrument model (1)) . (i) The prob-abilities of the three D -response groups are point-identiﬁed: Pr p A D q “ Pr p A T q “ P D p | q Pr p A D q “ ÿ t ‰ Pr p A Tt q “ ´ P D p | q Pr p C D q “ ÿ t ‰ Pr p C Tt q “ P D p | q ´ P D p | q . with the testable implication P D p | q ě P D p | q .(ii) The following counterfactual expectations are identiﬁed: E p Y Di p q| i P A D q “ ¯ E D p q ´ P D p | q , E p Y Di p q| i P A D q “ ¯ E D p q P D p | q . (iii) The standard Wald estimator identiﬁes the following combination of LATEs: E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q “ p ¯ E D p q ´ ¯ E D p qq ´ p ¯ E D p q ´ ¯ E D p qq P D p | q ´ P D p | q“ E p Y Di p q| i P C D q ´ ÿ t ‰ α Tt E p Y Ti p t q| i P C Tt q , (3.4) where the numbers α Tt “ Pr p i P C Tt | i P C D q are unidentiﬁed positive weights that sumto one. The LHS of Equation (3.4) is a particular form of weighted LATE: the substitution of E p Y Di p q| i P C D q by the weighted average in its second term reﬂects the lack of informationof the analyst on the respective sizes of the groups C Tt within C D , and on the dispersion ofthe average counterfactual outcomes when z “ Corollary 3 (Identiﬁcation in the ﬁltered binary instrument model (2)) . Assume that p Y Ti p t q| i P C Tt q is the same for all t ‰ . Then E p Y Di p q| i P C D q “ ÿ t ‰ α Tt E p Y Ti p t q| i P C Tt q and the standard Wald estimator identiﬁes the LATE on D -compliers: E p Y Di p q ´ Y Di p q| i P C D q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q . If we interpret t “ t “

1) as alternativetreatments, then the analyst may only know whether observation i received some kind oftreatment. The corresponding ﬁltering map would be • M p q “ • M p t q “ t ą M . Let M be the join of M and M : • M p q “ • M p q “ • M p t q “ t ą D -response groups consist of the always-takers A D “ A T , A D “ A T , A D “ Ť t ą A Tt ; and the complier groups C D “ C T and C D “ Ť t ą C Tt . Proposition 12 (Identiﬁcation in the ﬁltered binary instrument model (3)) . (i) The prob-abilities of the ﬁve D -response groups are point-identiﬁed: Pr p i P A D q “ P D p | q , Pr p i P A D q “ P D p | q , Pr p i P A D q “ P D p | q , Pr p i P C D q “ P D p | q ´ P D p | q , Pr p i P C D q “ P D p | q ´ P D p | q with the testable implications P D p | q ě P D p | q and P D p | q ě P D p | q . ii) The standard Wald estimator identiﬁes the following combination of LATEs: E p Y Di p q| i P C D q ´ α D E p Y Ti p q| i P C T q ´ p ´ α D q ÿ t ą β Tt E p Y Ti p t q| i P C Tt q (3.5) “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q , where • α D “ Pr p i P C D | i P C D Ť C D q is a positive weight, smaller than 1, identiﬁed as p P D p | q ´ P D p | qq{p P D p | q ´ P D p | qq ; • the numbers β Tt “ Pr p i P C Tt | i P C D q are unidentiﬁed positive weights that sumto one. The extension to more general ﬁlters is trivial: any ﬁner partition will identify more α Dd parameters and allow the analyst to gain more information on the sizes of D -complier groupsand to reﬁne the interpretation of the Wald estimator.Let us now turn to the ternary/ternary unﬁltered treatment model of Example 3. Re-member that z “ t “ z “ t “

2. Suppose now that theanalyst only observes whether an individual took one of the subsidized treatments ( d “ t ą

0) or not ( d “ t “ M ´ p q “ M ´ p q “ t , u . The ternary/ternaryunﬁltered treatment model becomes a ternary/binary ﬁltered treatment model. The eight T -response groups of Proposition 7 combine into ﬁve D -response groups: A D “ A T ,A D “ A T Ť A T Ť C T Ť C T ,C D “ C T ,C D “ C T ,C D “ C T . We observe the conditional probabilities P D p | z q and the average outcomes ¯ E Dz p q and ¯ E Dz p q for z “ , , Proposition 13 (Identiﬁcation in the ternary/binary ﬁltered model (1)) . (i) The proba-bility of the always-taker group A D is point-identiﬁed as P D p | q . The other four -response groups probabilities are connected by three equations: Pr p C D ˚ q “ Pr p C D q ` Pr p C D q “ P D p | q ´ P D p | q , Pr p C D ˚ q “ Pr p C D q ` Pr p C D q “ P D p | q ´ P D p | q , Pr p C D ˚ q “ Pr p C D q ` Pr p A D q “ P D p | q . with the testable implications P D p | q ě P D p | q and P D p | q ě P D p | q .The four partially-identiﬁed probabilities can be parameterized as Pr p C D q “ p, Pr p C D q “ P D p | q ´ P D p | q ´ p, Pr p C D q “ P D p | q ´ P D p | q ´ p, Pr p A D q “ P D p | q ` P D p | q ´ P D p | q ` p, where max p , P D p | q ´ P D p | q ´ P D p | qq ď p ď P D p | q ´ max p P D p | q , P D p | qq . (ii) The following average conditional counterfactual outcomes are point-identiﬁed: E p Y Di p q| i P C D ˚ q “ ¯ E D p q P D p | q , E p Y Di p q| i P C D ˚ q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q , E p Y Di p q| i P C D ˚ q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q , E p Y Di p q| i P A D q “ ¯ E D p q P D p | q . (iii) The standard Wald estimators identify the LATE on C D ˚ and on C D ˚ : E p Y Di p q ´ Y i p q| i P C D ˚ q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q , (3.6) E p Y Di p q ´ Y i p q| i P C D ˚ q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q . 
(3.7)Note that the width of the interval on the unknown p cannot be larger than min p P D p | q , P D p | qq :if either instrument z “ , D -response groups will be almost point-identiﬁed. Since the averagecounterfactual outcomes on elemental D -response groups are connected by equations like E p Y i p d q| i P C D ˚ q “ q E p Y i p d q| i P C D q ` p ´ q q E p Y i p d q| i P C D q with q “ p {p P D p | q ´ P D p | qq , one could go further and impose homogeneity assumptionsto improve the identiﬁcation of elemental LATEs. We now return to Example 4, which featured a factorial experimental design. Recall thatwe had z “ ˆ , ˆ , ˆ , ˆ

1, and T “ t , , u . Each instrument combines twobinary instruments: the ﬁrst one is meant to promote treatment t “ t “

2. We focus here on the case when there is no complentarity (positive ornegative) between the two binary instruments : ∆ T ˆ p q “ ∆ T ˆ p q and ∆ T ˆ p q “ ∆ T ˆ p q .This would hold for instance if each binary instrument is a price subsidy and prices entermean utilities additively—a common asssumption in discrete choice models. As we sawin Section 2.1, we have ¯ Z p q “ t ˆ , ˆ u and ¯ Z p q “ t ˆ , ˆ u , so that thistreatment model does not statisfy one-to-one targeting. On the other hand, we also sawthat strict targeting holds if each binary instrument only has an eﬀect on the treatmentvalue that it targets. We will impose the corresponding assumptions ∆ T ˆ p q “ ∆ T ˆ p q and∆ T ˆ p q “ ∆ T ˆ p q . Let us now introduce a ﬁlter, so that the analyst only observes D i “ p T i ą q . Thisyields a ternary/binary ﬁltered treatment model, much as in the previous subsection. Thereare two important diﬀerences—the instrument takes four values rather than three, and weimposed several constraints on the mean utilities:∆ T ˆ p q “ ∆ T ˆ p q ∆ T ˆ p q “ ∆ T ˆ p q ∆ T ˆ p q “ ∆ T ˆ p q ∆ T ˆ p q “ ∆ T ˆ p q . (3.8) We use a superscript T to remind the reader that the argument in parentheses is an unﬁltered treatmentvalue in T . Appendix C.2 provides a variant of ﬁltered factorial design.

35n spite of the ﬁltering, they will allow us to point-identify the relevant LATEs. To seethis, ﬁrst note that for any given observation i , D i p z q “ u i ą max p ∆ Tz p q ` u i , ∆ Tz p q ` u i q , (3.9)so that the ﬁltered treatment model has the structure of a double hurdle model (Example 2).Figure 8: Filtered Factorial Design u i ´ u i u i ´ u i C D C D C D A D A D P ˆ P ˆ P ˆ P ˆ First note that under our assumptions, the right hand side is largest when z “ ˆ D i p ˆ q “

0, observation i always takes d “

0. If on the other hand D i p ˆ q “

1, then i is in A D since the right-hand side can only be larger for the other instrument values.Denote indices in response-groups in the order 0 ˆ , ˆ , ˆ , ˆ

1. The preceding argumentsleave only the D -response groups C D ˚˚ . The group C D cannot exist since in the absenceof complementarity between the binary instruments,max p ∆ T ˆ p q ` u i , ∆ T ˆ p q ` u i q “ max p ∆ T ˆ p q ` u i , ∆ T ˆ p q ` u i q . The three other groups are : • the eager compliers C D : any instrument except 0 ˆ d “ • the reluctant compliers C D and C D : they only adopt d “ D -response groups are shown in Figure 8. Table 2 shows which groupstake D i “ d when Z i “ z . We borrow here the terminology of Mogstad, Torgovitsky, and Walters (2019), which they apply to arather diﬀerent model. D -response Groups D i p z q “ D i p z q “ z “ C D ˚˚ “ A D Ť C D Ť C D Ť C D A D z “ C D ˚ “ A D Ť C D C D ˚ ˚ “ A D Ť C D Ť C D z “ C D ˚ “ A D Ť C D C D ˚˚ “ A D Ť C D Ť C D Proposition 14 (Identifying the Filtered Factorial Design Model) . (i) the probabilities ofthe D -response groups are point-identiﬁed by Pr p A D q “ P D p | ˆ q Pr p A D q “ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q , and the model has three testable implications: P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ` P D p | ˆ q ě P D p | ˆ q ` P D p | ˆ q . (ii) The LATEs on the three groups of compliers are point-identiﬁed by E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ` E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q . 
Proposition 14 states that (i) the average treatment eﬀects for reluctant compliers are37dentiﬁed by suitable Wald statistics and that (ii) the average treatment eﬀect for eager com-pliers is identiﬁed by a ratio between diﬀerence-in-diﬀerences (DiD) population quantities.The latter estimand can be viewed as a two-dimensional version of Wald statistics. E p Y Di p q ´ Y Di p q| i P C D q with covariates Most estimands in the paper are expressed in terms of simple Wald estimators, which can beeasily estimated with covariates (e.g. Fr¨olich, 2007). Two exceptional cases are E p Y Di p q ´ Y Di p q| i P C D q in Proposition 14 and E p Y Di p q ´ Y Di p q| i P C D q in Proposition 15 inAppendix C.2.We here discuss how to estimate E p Y Di p q ´ Y Di p q| i P C D q with covariates. Estimationof E p Y Di p q ´ Y Di p q| i P C D q is similar. Introduce covariates X i explicitly and deﬁne: E r Y i p q ´ Y i p q| i P C D , X i “ x s “ DiD Y p x q DiD D p x q , where DiD Y p x q : “ E r Y i | Z i “ ˆ , X i “ x s ` E r Y i | Z i “ ˆ , X i “ x s´ E r Y i | Z i “ ˆ , X i “ x s ´ E r Y i | Z i “ ˆ , X i “ x s , DiD D p x q : “ Pr r T i “ | Z i “ ˆ , X i “ x s ` Pr r T i “ | Z i “ ˆ , X i “ x s´ Pr r T i “ | Z i “ ˆ , X i “ x s ´ Pr r T i “ | Z i “ ˆ , X i “ x s . Then, E p Y Di p q ´ Y Di p q| i P C D q “ E „ DiD Y p X q DiD D p X q ˇˇˇˇ i P C D  . Lemma 3.

Assume that DiD D p X q ‰ almost surely. Then, E p Y Di p q ´ Y Di p q| i P C D q “ E r DiD Y p X qs E r DiD D p X qs . Lemma 3 suggests the following two-step estimation strategy: ﬁrst, estimate E r Y i | Z i “ k, X i “ x s and Pr r T i “ | Z i “ k, X i “ x s for each k P t ˆ , ˆ , ˆ , ˆ u and x P t X , . . . , X n u ; second, evaluate DiD Y p X i q and DiD D p X i q , construct their averages andtake the ratio.For example, the ﬁrst step can be implemented using sieve estimators. In view ofAckerberg, Chen, and Hahn (2012); Ackerberg, Chen, Hahn, and Liao (2014), the resultingtwo-step sieve estimator is semiparametrically eﬃcient, and furthermore, conventional nor-38al inference, pretending that we have a two-step parametric model, is valid for semipara-metric inference. For brevity of the paper, we omit details. In this section, we revisit Angrist, Lang, and Oreopoulos (2009), who analyzed the StudentAchievement and Retention Project. STAR was a randomized evaluation of academic servicesand incentives for college freshmen at a Canadian university. It was a factorial design, withtwo binary instruments. The Student Fellowship Program (SFP) oﬀered students the chanceto win merit scholarships for good grades in the ﬁrst year; the Student Support Program(SSP) oﬀered students access to both a peer-advising service and a supplemental instructionservice. Entering ﬁrst-year undergraduates were randomly assigned to one of four groups: acontrol group ( z “ ˆ z “ ˆ z “ ˆ z “ ˆ T i “ p A i , S i q , where A i “ i signed up and S i “ S i “ A i “

0. Hence T i can only take three values: p , q , p , q , and p , q .With a slight change in notation, we model the choice as T i p z q “ arg max ` u i p , q , ∆ Tz p , q ` u i p , q , ∆ Tz p , q ` u i p , q ˘ . While there are four instrument values and three treatment values, this is in fact a ternary/ternarymodel, with some speciﬁc features. First note that T i “ T ˆ p , q and ∆ T ˆ p , q at minus inﬁnity. In addition, S i canonly be zero if z “ ˆ

1, so that we can set ∆ T ˆ p , q and ∆ T ˆ p , q at minus inﬁnitytoo. As a consequence, we do not lose any information by redeﬁning the control group to be0 ” t ˆ , ˆ u .In addition, students should be more likely to sign up under z “ ˆ z “ ˆ

0, as the former adds the lure of a fellowship. We will also assume that it makes39hem more likely to use the services—an assumption that we will test below. Then bothtreatment values p , q and p , q are targeted by 1 ˆ

1, but they cannot be strictly targeted.Take for instance ¯ Z p , q “ t ˆ u ; strict targeting would require ∆ T ˆ p , q “ ∆ T p , q ,which is minus inﬁnity.Rather than to pursue with the unﬁltered treatment model, let us move on to ﬁlteredmodels. In our terminology, Angrist, Lang, and Oreopoulos (2009) chose to use a particu-lar ﬁlter M p A, S q “ A , which is close to intent-to-treat as they point out. Here we take M p A, S q “ S instead: we deﬁne(4.1) D i p z q “ S i p z q “ ` ∆ Tz p , q ` u i p , q ą max p u i p , q , ∆ Tz p , q ` u i p , qq ˘ . Since the SFP incentives applied to the ﬁrst year grades only, we take the grades in thesecond year as our outcome variable Y i .Equation (4.1) has a similar structure to the double hurdle model of Equation (3.9). Themodels are quite diﬀerent, however. This new ﬁltered model has D i p q “ C D ,d,d for d, d “ ,

1, this assumptioneliminates one: if D i p ˆ q “ D i p ˆ q “

0. This leaves three groups:the never-takers A D , and two groups of compliers C D and C D . The group C D consistsof reluctant compliers, who only use SSP if it is oﬀered along with SFP. Those in C D areeager compliers: they use SSP whenever it is oﬀered to them with or without a fellowship.Remember that P D p | z q : “ Pr p D i “ | Z i “ z q for z “ , ˆ , ˆ

1. Then P D p | q “ p A D q “ ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q . Note that given Equation (4.1), P D p | ˆ q “ Pr ` u i p , q ´ u i p , q ą ´ ∆ T ˆ p , q and u i p , q ´ u i p , q ą ∆ T ˆ p , q ´ ∆ T ˆ p , q ˘ P D p | ˆ q “ Pr ` u i p , q ´ u i p , q ą ´ ∆ T ˆ p , q and u i p , q ´ u i p , q ą ∆ T ˆ p , q ´ ∆ T ˆ p , q ˘ . T ˆ p , q ą ∆ T ˆ p , q and ∆ T ˆ p , q ´ ∆ T ˆ p , q ă ∆ T ˆ p , q ´ ∆ T ˆ p , q . Figure 9 illustrates a conﬁguration in which these inequalities hold, where P ˆ “ p´ ∆ T ˆ p , q , ∆ T ˆ p , q ´ ∆ T ˆ p , qq and P ˆ “ p´ ∆ T ˆ p , q , ∆ T ˆ p , q ´ ∆ T ˆ p , qq . Figure 9: STAR example u i p , q ´ u i p , q u i p , q ´ u i p , q P ˆ P ˆ A D C C Under our assumptions, it is straightforward to show that E r Y Di | Z i “ ˆ s ´ E r Y Di | Z i “ s “ E r Y Di p q ´ Y Di p q| i P C D s Pr p i P C D q , E r Y Di | Z i “ ˆ s ´ E r Y Di | Z i “ ˆ s “ E r Y Di p q ´ Y Di p q| i P C D s Pr p i P C D q . Therefore, we have E r Y Di p q ´ Y Di p q| i P C D s “ E r Y Di | Z i “ ˆ s ´ E r Y Di | Z i “ s P D p | ˆ q , E r Y Di p q ´ Y Di p q| i P C D s “ E r Y i | Z i “ ˆ s ´ E r Y Di | Z i “ ˆ s P D p | ˆ q ´ P D p | ˆ q . Since Pr p D i “ | Z i “ q “

0, the ﬁrst estimand is the IV formula of Bloom (1984); thesecond estimand is the LATE formula of Imbens and Angrist (1994).Table 3 reports estimation results. We only focus on the subsample of women since theSTAR program had no eﬀect on men. Panel A of Table 3 shows the estimated proportions41able 3: Empirical Results from STARPanel A. Proportion of CompliersPr p i P C D q p i P C D q E r Y i | Z i “ ˆ s ´ E r Y i | Z i “ s E r Y i p q ´ Y i p q| i P C D s E r Y i | Z i “ ˆ s ´ E r Y i | Z i “ ˆ s E r Y i p q ´ Y i p q| i P C D s n “ C D and 0.245 for C D . The majority group is thenever-takers whose share is 0.467. This is because the usage of SSP was low. Panel Breveals remarkable heterogeneity between the two complier groups. We do not ﬁnd anysigniﬁcant treatment eﬀect for C D , whereas we do ﬁnd sizeable and signiﬁcant impacton probation/withdrawal and good standing for C D . As can be seen in Figure 9, C D iscloser to the group of never-takers: they have higher unobserved disutilities of using academicsupport services than those in C D . However, those in C D reaped greater beneﬁts of usingthe SSP by avoiding probation or withdrawal in the second year.The main parameter of interest in Angrist, Lang, and Oreopoulos (2009) was the intent-to-treat (ITT) eﬀect of the SFSP program: E r Y i | Z i “ ˆ s ´ E r Y i | Z i “ ˆ s in ournotation. Our analysis suggests that the ITT eﬀect of the SFSP program is a mix of twovery diﬀerent treatment eﬀects. This highlights the importance of unbundling heterogeneouscomplier groups. Let us now reexamine Kline and Walters’s (2016) analysis of the Head Start Impact Study(HSIS) using our framework. The structure of HSIS is identical to that of Example 12.The treatments consist of no preschool ( n ), Head Start ( h ), and other preschool centers ( c ): T “ t n, h, c u . We will take t “ n as our reference treatment. The instrument is binary,with a control group ( z “

0) and a group that is oﬀered admission to Head Start ( z “ A n “ C nn , A c “ C cc , A h “ C hh , C nh , and C ch . The ﬁrstthree groups are always-takers and the last two groups are compliers. Their proportions in the sample are given by (2.4) in Proposition 6; they are shown inPanel A of Table 4. As expected, they coincide with those in Kline and Walters (2016).Panel B of Table 4 shows the counterfactual means of test scores as per Proposition 9.Among those that are point-identiﬁed, the average test scores are the highest for the groupswho always choose other preschool centers (about 0.3 standard deviation). There is anoticeable diﬀerence between the two complier groups: E r Y i p n q| i P C nh s is negative, but E r Y i p c q| i P C ch s is above 0.1 standard deviation. This indicates that among compliers, thechildren who used other centers had higher scores than those who stayed at home. Head Start The point estimates for probation/withdrawal and good standing are very large in absolute value; how-ever, the standard errors are large as well, resulting in wide conﬁdence intervals. This is partially becausethe sample size is relatively small and partially because the estimand is the ratio of two population quantitieswith the small denominator. E r Y i p n q| i P C nh s and E r Y i p c q| i P C ch s is new.Table 4: Proportions, Counterfactual Means and Treatment Eﬀects by Response Groups3-year-olds 4-year-olds PooledPanel A. Proportions of Response Groups via Proposition 6Always – no preschool ( A n ) 0.092 0.099 0.095Always – Head Start ( A h ) 0.147 0.122 0.136Always – other centers ( A c ) 0.058 0.114 0.083Compliers from n to h ( C nh ) 0.505 0.393 0.454Compliers from c to h ( C ch ) 0.198 0.272 0.232Panel B. 
Counterfactual Means of Test Scores via Proposition 9 E r Y i p n q| P A n s -0.050 -0.017 -0.035 E r Y i p h q| P A h s E r Y i p c q| P A c s E r Y i p n q| i P C nh s -0.027 -0.116 -0.062 E r Y i p c q| i P C ch s E r Y i p h q| i P C nh s “ E r Y i p h q| i P C ch s E r Y i p h q ´ Y i p n q| i P C nh s for compliers from ‘n’ to ‘h’ 0.279 0.285 0.278(0.063) (0.076) (0.050) E r Y i p h q ´ Y i p c q| i P C ch s for compliers from ‘c’ to ‘h’ 0.140 0.025 0.087(0.089) (0.097) (0.063) E r Y i p h q ´ Y i p n q| i P C nh s ´ E r Y i p h q ´ Y i p c q| i P C ch s h ), other centers ( c ), no preschool ( n ). Standard errors inparentheses are clustered at the Head Start center level.44 .2.2 Treatment Eﬀects To fully measure the substitution eﬀect, one needs to identify E r Y i p h q| i P C nh s and E r Y i p h q| i P C ch s .However, under Proposition 9, they are only partially identiﬁed by E r Y i p h q| i P C nh s t Pr p T i “ n | Z i “ q ´ Pr p T i “ n | Z i “ qu` E r Y i p h q| i P C ch s t Pr p T i “ c | Z i “ q ´ Pr p T i “ c | Z i “ qu“ E r Y i p T i “ h q| Z i “ s ´ E r Y i p T h “ q| Z i “ s . This is exactly the formula on Kline and Walters (2016, pp.1811), where they point out thatthe LATE for Head Start is a weighted average of “subLATEs” with weights S c and p ´ S c q with S c : “ Pr p C ch q Pr p C nh q ` Pr p C ch q “ Pr p T i “ c | Z i “ q ´ Pr p T i “ c | Z i “ q Pr p T i ‰ h | Z i “ q ´ Pr p T i ‰ h | Z i “ q . Kline and Walters (2016) ﬁrst tried to estimate E r Y i p h q´ Y i p c q| i P C ch s and E r Y i p h q´ Y i p n q| i P C nh s separately using two-stage least squares (2SLS), using interaction of the instrumentwith covariates or experimental sites in an attempt to generate enough variation. Theyacknowledged the limitations of this interacted 2SLS approach and developed a parametricselection model `a la Heckman (1979). 
Using a parametric selection model and pooled cohorts, Kline and Walters (2016, Table VIII, column (4), full model) obtain estimates of the treatment effect of 0 . ( . ) for $C_{nh}$ and $-$ . ( . ) for $C_{ch}$, respectively (standard errors in parentheses).

Our Corollary 2 provides an alternative approach to separating the two treatment effects. If we assume that $E[Y_i(h) \mid i \in C_{nh}] = E[Y_i(h) \mid i \in C_{ch}]$, we can point-identify the average treatment effects for both groups of compliers. The resulting estimates are shown in Panels C and D of Table 4. The average impact on test scores of participating in Head Start is around 0.28 for $C_{nh}$, whereas it is smaller and insignificant for $C_{ch}$. The difference between the two effects is statistically significant when the two cohorts are pooled together.

We obtained these estimates of the treatment effects by a completely different route than Kline and Walters (2016). While the two sets of estimates are similar, our estimate of the difference between the treatment effects on the two groups of compliers is about half as large. Our homogeneity assumption $E[Y_i(h) \mid i \in C_{nh}] = E[Y_i(h) \mid i \in C_{ch}]$ may be too strong. It might be more plausible to assume that $E[Y_i(h) \mid i \in C_{nh}] \le E[Y_i(h) \mid i \in C_{ch}]$, as children who would not attend preschool in the absence of an offer to Head Start are likely to be less well-prepared than children who would attend other preschools. Then our estimated difference between the two complier groups will be a lower bound on the true difference.
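The identification arithmetic used in this section can be sketched numerically. The snippet below is an illustration under stated assumptions, not the code behind Table 4: it takes population cell probabilities $\Pr(T_i = t \mid Z_i = z)$ and partial means $E[Y_i 1(T_i = t) \mid Z_i = z]$, recovers the five response-group shares, and computes the two complier effects under the homogeneity assumption $E[Y_i(h) \mid i \in C_{nh}] = E[Y_i(h) \mid i \in C_{ch}]$. The group shares are the pooled values from Panel A of Table 4; the group means of the outcome are hypothetical round numbers chosen only to check the algebra, and all function names are illustrative.

```python
from math import isclose


def response_group_shares(p):
    """Shares of the five response groups.

    p[z][t] = Pr(T_i = t | Z_i = z), z in {0, 1}, t in {"n", "h", "c"}.
    """
    return {
        "A_n": p[1]["n"],               # still at home when offered Head Start
        "A_c": p[1]["c"],               # still in another center when offered
        "A_h": p[0]["h"],               # in Head Start even without the offer
        "C_nh": p[0]["n"] - p[1]["n"],  # switch from n to h
        "C_ch": p[0]["c"] - p[1]["c"],  # switch from c to h
    }


def complier_effects(p, ebar):
    """Complier effects, assuming E[Y(h)] is the same in C_nh and C_ch.

    ebar[z][t] = E[Y_i * 1(T_i = t) | Z_i = z].
    """
    s = response_group_shares(p)
    y_n = (ebar[0]["n"] - ebar[1]["n"]) / s["C_nh"]  # E[Y(n) | C_nh]
    y_c = (ebar[0]["c"] - ebar[1]["c"]) / s["C_ch"]  # E[Y(c) | C_ch]
    # Homogeneity assumption: one common mean of Y(h) for both complier groups.
    y_h = (ebar[1]["h"] - ebar[0]["h"]) / (p[1]["h"] - p[0]["h"])
    s_c = s["C_ch"] / (s["C_nh"] + s["C_ch"])        # weight on the c-to-h effect
    return {"late_nh": y_h - y_n, "late_ch": y_h - y_c, "S_c": s_c}


def population_moments(shares, means):
    """Build p and ebar from group shares and hypothetical group means."""
    # take[group][z] is the treatment chosen by the group under instrument z.
    take = {"A_n": "nn", "A_c": "cc", "A_h": "hh", "C_nh": "nh", "C_ch": "ch"}
    p = {z: {t: 0.0 for t in "nhc"} for z in (0, 1)}
    ebar = {z: {t: 0.0 for t in "nhc"} for z in (0, 1)}
    for group, share in shares.items():
        for z in (0, 1):
            t = take[group][z]
            p[z][t] += share
            ebar[z][t] += share * means[group][t]
    return p, ebar


# Pooled shares from Panel A of Table 4.
shares = {"A_n": 0.095, "A_h": 0.136, "A_c": 0.083, "C_nh": 0.454, "C_ch": 0.232}
# Hypothetical group means of potential outcomes (NOT estimates from the paper).
means = {
    "A_n": {"n": 0.0},
    "A_c": {"c": 0.3},
    "A_h": {"h": 0.2},
    "C_nh": {"n": -0.1, "h": 0.25},
    "C_ch": {"c": 0.1, "h": 0.25},
}
p, ebar = population_moments(shares, means)
out = complier_effects(p, ebar)
print(out)
```

By construction the hypothetical population has effects of 0.35 for $C_{nh}$ and 0.15 for $C_{ch}$, which the sketch recovers up to floating-point error. The usual Wald ratio for Head Start equals the average of these two effects with weights $1 - S_c$ and $S_c$.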

Concluding Remarks

We have shown that our targeting and filtering concepts are a useful way to analyze models with multivalued treatments and multivalued instruments. While our characterization is sharpest under strict, one-to-one targeting (Corollary 1), our framework remains useful even without strict targeting. In addition to the examples we discussed in the text and to the two applications we revisited, we give an example in Appendix C.2, with a ternary/ternary model where the analyst only observes the least-preferred treatment in a factorial design.

Our paper only analyzed discrete-valued instruments and treatments. Some of the notions we used would extend naturally to continuous instruments and treatments: the definitions of targeting, one-to-one targeting, and filtering would translate directly. Strict targeting, on the other hand, is less appealing in a context in which continuous values may denote intensities. Our earlier paper (Lee and Salanié, 2018) can be seen as analyzing continuous-instruments/discrete-treatments filtered models; so can Mountjoy's (2019) study of 2-year colleges. Extending our analysis to models with continuous treatments is an interesting topic for further research.

Appendices

A Proofs for Section 2

Proof of Proposition 1.

Let T i p t q “ t P T ˚ . Then u i ą ¯∆ t ` u it . However,¯∆ t ą ∆ z p t q if z R ¯ Z p t q . Therefore u i ą ∆ z p t q ` u it , and T i p z q cannot be t . Proof of Lemma 1.

The lemma is proved in the main text.

Proof of Proposition 2.

Take any observation $i$ and an instrument value $z \in \mathcal{Z}$. The treatment $T_i(z)$ must maximize $U_z(t) + u_{it}$ over $t \in \mathcal{T}$. Under Assumption 7, for any $t$ we have
• $U_z(t) = U_z(0) + \bar{\Delta}_t$ if $t \in \bar{T}(z)$
• $U_z(t) = U_z(0) + \Delta_t$ otherwise.
Therefore, eliminating $U_z(0)$,
(A.1) $T_i(z) \in \arg\max \left( \max_{t \notin \bar{T}(z)} (\Delta_t + u_{it}), \; \max_{t \in \bar{T}(z)} (\bar{\Delta}_t + u_{it}) \right)$.
Since $\bar{\Delta}_t \ge \Delta_t$ for all $t \in \mathcal{T}$, a fortiori $\bar{\Delta}_t + u_{it} \ge \Delta_t + u_{it}$ when $t \in \bar{T}(z)$. As a consequence, we can rewrite Equation (A.1) as
$T_i(z) \in \arg\max \left( \Delta^*_i, V^*_i(z) \right)$.
(i) If $z \in \mathcal{Z}^*$, then $\bar{T}(z)$ is not empty and the maximizer can be either in $\tau^*_i$ or in $T^*_i(z)$.
(ii) If $z \in \mathcal{Z} \setminus \mathcal{Z}^*$, then $z$ can only be 0. $\bar{T}(0) = \emptyset$ and $T_i(0)$ can only be in $\tau^*_i$.

Proof of Proposition 3.

Take an observation $i$ and define $A_i = \{ z \in \mathcal{Z}^* \mid T_i(z) = T^*_i(z) \}$.
(i) By definition, $A_i \subset \mathcal{Z}^*$; therefore $A_i = \mathcal{Z}$ (which defines the subpopulation $P$) requires $\mathcal{Z} = \mathcal{Z}^*$.
(ii) Now suppose that $A_i \neq \mathcal{Z}$. If $z \in \mathcal{Z}^* \setminus A_i$, then by construction $T_i(z) \neq T^*_i(z)$. By Proposition 2(i), $T_i(z)$ can only be $\tau^*_i$. If $z \notin \mathcal{Z}^*$, then $z = 0$ and $T_i(0) = \tau^*_i$.
(iii) Assume that $\tau^*_i = \tau \in \mathcal{T}^*$. Then $\bar{Z}(\tau) \neq \emptyset$. For any $z$ in $\bar{Z}(\tau)$,
$V^*_i(z) \ge \bar{\Delta}_\tau + u_{i\tau} > \Delta_\tau + u_{i\tau} = \Delta^*_i$;
therefore $z \in A_i$. This proves that $\bar{Z}(\tau) \subset A_i$.

Proof of Corollary 1.

It follows directly from Proposition 3.

Proof of Proposition 4.

First assume that Z ˚ ‰ Z , so that only the classes in P exist. Theset A of Corollary 1 must be a subset of Z ˚ . For each such subset, τ can take any value in T z T ˚ ; and if τ P T ˚ then τ must be in A . Each subset A of Z ˚ with a elements thereforeallows for p a ` |T | ´ |T ˚ | q values of τ . This gives a total of |Z ˚ | ÿ a “ ˆ |Z ˚ | a ˙ p a ` |T | ´ |T ˚ | q P . Moreover, we know that |T ˚ | “ |Z ˚ | under one-to-onetargeting. Using the identities b ÿ a “ ˆ ba ˙ “ p ` q b “ bb ÿ a “ a ˆ ba ˙ “ b ˆ b ´ ÿ a “ ˆ b ´ a ˙ “ b ˆ b ´ , we obtain a total of p |T | ´ |Z ˚ | q ˆ |Z ˚ | ´ types.If Z “ Z ˚ , we must add the one type in P . On the other hand, we must subtract the |T | classes c p Z ˚ , τ q that are ruled out by Corollary 1(iii). Proof of Proposition 5. (i) First assume that Z ˚ ‰ Z , so that the subpopulation P doesnot exist. There are two ways to obtain T i p z q “ t . • The ﬁrst one is for i to belong to in any c p A, t q element, with A Ă Z ˚ and t constrained to be in A ` . This requires that z R A . If z is in Z ˚ , this implies A Ă Z ˚ zt z u . If not, then A can be any subset of Z ˚ . This gives the ﬁrst termin (2.1), and (2.2). • The second way to get T i p z q “ t is if t “ z , which can only happen if z P Z ˚ .Then if i P c p A, τ q for any A that contains z and any τ P A ` , we have T i p z q “ z .This gives the second term in (2.1).(ii) If Z ˚ “ Z , we only need to add in the subpopulation P if z “ t , and to delete fromthe summations the case A “ Z ˚ “ Z . Introducing these changes in (2.1) gives (2.3).Since Z ˚ “ Z there is obviously no subcase z R Z ˚ . Proof of Proposition 6.

Since $\mathcal{Z}^* = \{1\} \neq \mathcal{Z}$ in Example 8, we apply equations (2.1) and (2.2). With $\mathcal{Z}^* = \{1\}$, we can only have $A = \emptyset$, with $A^+ = \mathcal{T} \setminus \{1\}$, or $A = \{1\}$, with $A^+ = \mathcal{T}$. Equation (2.1) gives
$P(t \mid 1) = \mathbb{1}(t \neq 1) \Pr(c(\emptyset, t)) + \mathbb{1}(t = 1) \sum_{\tau \in \mathcal{T}} \Pr(c(\{1\}, \tau))$;
and equation (2.2) gives
$P(t \mid 0) = \mathbb{1}(t \neq 1) \Pr(c(\emptyset, t)) + \Pr(c(\{1\}, t))$.

We already know that $c(\emptyset, t)$ is $A_t$ and $c(\{1\}, \tau)$ is $A_1$ if $\tau = 1$, $C_\tau$ otherwise. Therefore for $t = 1$,
$P(1 \mid 1) = \Pr(A_1) + \sum_{\tau \neq 1} \Pr(C_\tau)$ and $P(1 \mid 0) = \Pr(A_1)$;
while for $t \neq 1$,
$P(t \mid 1) = \Pr(A_t)$ and $P(t \mid 0) = \Pr(C_t) + \Pr(A_t)$.

Proof of Proposition 8.

It is straightforward from Figure 6.

Proof of Proposition 7.

It is straightforward from Figure 5.

Proof of Lemma 2.

We start from the sum over all response groups:
$\bar{E}_z(t) = \sum_C E_z(t \mid C) \Pr(i \in C)$.
First note that if group $C$ does not have treatment $t$ under instrument $z$, it should not figure in the sum. Now if $C(z) = t$, we have
$E_z(t \mid C) = E(Y_i \mathbb{1}(T_i = t) \mid Z_i = z, i \in C) = E(Y_i(t) \mid Z_i = z, i \in C) = E(Y_i(t) \mid i \in C)$.
The second part of the Lemma is just adding up.
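To see the lemma at work, consider a design with a binary instrument and three treatments $n$, $h$, $c$, in which the offer $z = 1$ can only move individuals into $h$ (the Head Start structure of Section 4). The worked instance below is offered as an illustration rather than as part of the original proof, and it assumes the notation $\bar{E}_z(t) = E[Y_i \mathbb{1}(T_i = t) \mid Z_i = z]$:

```latex
\begin{align*}
\bar{E}_0(n) &= E[Y_i(n) \mid i \in A_n]\,\Pr(i \in A_n)
              + E[Y_i(n) \mid i \in C_{nh}]\,\Pr(i \in C_{nh}), \\
\bar{E}_1(n) &= E[Y_i(n) \mid i \in A_n]\,\Pr(i \in A_n), \\
\bar{E}_0(n) - \bar{E}_1(n) &= E[Y_i(n) \mid i \in C_{nh}]\,\Pr(i \in C_{nh}).
\end{align*}
```

Only $A_n$ and $C_{nh}$ take $n$ when $z = 0$, and only $A_n$ does when $z = 1$, so the difference isolates the compliers $C_{nh}$.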

Proof of Proposition 9.

By Lemma 2, we get
$\bar{E}_0(1) = E[Y_i(1) \mid i \in A_1] \Pr(i \in A_1)$,
$\bar{E}_0(t) = E[Y_i(t) \mid i \in A_t] \Pr(i \in A_t) + E[Y_i(t) \mid i \in C_t] \Pr(i \in C_t)$ for $t \neq 1$,
$\bar{E}_1(1) = E[Y_i(1) \mid i \in A_1] \Pr(i \in A_1) + \sum_{t \neq 1} E[Y_i(1) \mid i \in C_t] \Pr(i \in C_t)$,
$\bar{E}_1(t) = E[Y_i(t) \mid i \in A_t] \Pr(i \in A_t)$ for $t \neq 1$.

Since Proposition 6 identifies all type probabilities, the first and fourth equations give directly $E(Y_i(t) \mid i \in A_t)$ for all $t$. Then the second equation identifies $E(Y_i(t) \mid i \in C_t)$ for $t \neq 1$. As for $E(Y_i(1) \mid i \in C_t)$ for $t \neq 1$, the first and third equations give
$\bar{E}_1(1) - \bar{E}_0(1) = \sum_{t \neq 1} E[Y_i(1) \mid i \in C_t] \Pr(i \in C_t)$.
By subtraction, we obtain
$(\bar{E}_1(1) - \bar{E}_0(1)) - \sum_{t \neq 1} (\bar{E}_0(t) - \bar{E}_1(t)) = \sum_{t \neq 1} E[Y_i(1) - Y_i(t) \mid i \in C_t] \Pr(i \in C_t)$.
Combining these results with Proposition 6 and Lemma 2 yields the formula in the Proposition. The denominator
$\sum_{t \neq 1} (P(t \mid 0) - P(t \mid 1)) = P(1 \mid 1) - P(1 \mid 0)$
is positive, since all terms in the sum are positive. It follows that all $\alpha_t$ weights are positive and sum to 1.

Proof of Corollary 2.

The corollary follows directly from the proof of Proposition 9, as
$\sum_{t \neq 1} \Pr(i \in C_t) = \sum_{t \neq 1} (P(t \mid 0) - P(t \mid 1)) = P(1 \mid 1) - P(1 \mid 0)$
gives
$E(Y_i(1) \mid i \in C_t) = (\bar{E}_1(1) - \bar{E}_0(1)) / (P(1 \mid 1) - P(1 \mid 0))$.

Proof of Proposition 10.

B Proofs for Section 3

Proof of Proposition 11. (i) It follows directly from Proposition 6 and from the mappingof types.(ii) From Proposition 9, we have E p Y Di p q| i P A D q “ E p Y Ti p q| i P A T q “ ¯ E T p q P T p | q “ ¯ E D p q P D p | q . Moreover, E p Y Di p q| i P A D q “ E p Y Di p q| i P Ť t ‰ A Tt q“ ÿ t ‰ E p Y Ti p t q| i P A Tt q P T p t | q ´ P T p | q“ ÿ t ‰ ¯ E T p t q ´ P T p | q“ ¯ E D p q ´ P D p | q . (iii) Now consider the weighted LATE ÿ t ‰ α Tt E p Y Ti p q ´ Y Ti p t q| i P C Tt q , which is identiﬁedin the unﬁltered treatment model (equation 2.11). The weights α Tt “ p P T p t | q ´ P T p t | qq{p P T p | q ´ P T p | qq are not identiﬁed any more. Note however that for anyvariable W i , ÿ t ‰ α Tt E p W i | i P C t q “ E p W i | i P C D q ;therefore ÿ t ‰ α Tt E p Y Di p q| i P C Tt q “ E p Y Di p q| i P C D q . The LHS of Equation (2.11)51ecomes E p Y Di p q| i P C D q ´ ÿ t ‰ α Tt E p Y Ti p t q| i P C t q . On the RHS we had p ¯ E T p q ´ ¯ E T p qq ´ ř t ‰ p ¯ E T p t q ´ ¯ E T p t qq P T p | q ´ P T p | q . The denominator is still identiﬁed as P D p | q ´ P D p | q , as is the ﬁrst term of thenumerator, which equals ¯ E D p q ´ ¯ E D p q . From equation 3.3, ÿ t ‰ p ¯ E T p t q ´ ¯ E T p t qq “ ¯ E D p q . Therefore we identify E p Y Di p q| i P C D q ´ ÿ t ‰ α Tt E p Y Ti p t q| i P C Tt q “ p ¯ E D p q ´ ¯ E D p qq ´ p ¯ E D p q ´ ¯ E D p qq P D p | q ´ P D p | q , which is the standard Wald estimator. Proof of Corollary 3.

It is obvious by direct substitution into Equation (3.4).

Proof of Proposition 12. (i) It follows directly from the mapping of groups.(ii) Part (i) identiﬁes the weight α T “ p P D p | q ´ P D p | qq{p P D p | q ´ P D p | qq , whichwe denote α D in the Proposition. The other terms obtain by simple factorization, with1 ´ α D “ Pr p i P C Tt | t ą q . Proof of Proposition 13.

Recall that Table 2 shows which groups take D i “ d when Z i “ z .(i) We have P D p | z q “ P T p | z q for z “ ,

Recall that Table 5 shows how response groups map instrument values into filtered treatment values. The proof follows directly.

Proof of Proposition 15.

The proof is omitted since it is similar to those of Propositions 13 and 14.

Proof of Lemma 3.

Using the fact that Pr p i P C D | X i “ x q “ DiD D p x q , we have that θ “ ż E r Y i p q ´ Y i p q| i P C D , X i “ x s f p x | i P C D q dx “ ż E r Y i p q ´ Y i p q| i P C D , X i “ x s Pr p i P C D | X i “ x q f p x q Pr p i P C D q dx “ ż DiD Y p x q DiD D p x q Pr p i P C D | X i “ x q f p x q Pr p i P C D q dx “ ş DiD Y p x q f p x q dx Pr p i P C D q“ ş DiD Y p x q f p x q dx ş DiD D p x q f p x q dx , which proves the lemma. 54 Additional Material

C.1 Strict Targeting in the Ternary/ternary Model

Just like ours, Kirkeboen, Leuven, and Mogstad (2016)’s approach to identiﬁcation relies ona monotonicity assumption and a restriction on the mapping from instruments to treatments.We translate them here in our notation to show that in this model, their assumptions areequivalent to ours.Kirkeboen, Leuven, and Mogstad (2016) impose the following in their Assumption 4: • if T i p q “ T i p q “ • if T i p q “ T i p q “ C ˚ , C ˚ , C ˚ , and C ˚ .Their Proposition 2 proves point-identiﬁcation of response-groups when one of three alter-native assumptions is added to their Assumption 4. We focus here on their assumption (iii),which is the weakest of the three and the one their application relies on. In our notation, itstates that: • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q “ • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q “ T i p q and T i p q are not 1, then they can only be 0 or 2. Therefore we are requiring T i p q “ T i p q . Applyingthe same argument to the second part, Assumption (iii) becomes: • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q .It therefore excludes the response-groups C ˚ , C ˚ , C ˚ , and C ˚ . The response-group C appears twice in this list; and four other response-groups were already ruled out byAssumption 4. The reader can easily check that the 3 ´ ´ p ´ q “ .2 A Variant of Filtered Factorial Design Let us return to the factorial design of Example 4, with a twist: the unﬁltered treatmentconsists of the full ranking of the three alternatives. The instrument values are still p ˆ , ˆ , ˆ , ˆ q ; now T i is a pair that consists of the most-preferred alternative T i p z q “ arg max t “ , , p U z p t q ` u it q and of the least-preferred alternative T i p z q “ arg min t “ , , p U z p t q ` u it q . In Section 3.2, we considered the case when T i is only T i ; and we added a ﬁlter D i “ p T i ą q . 
Let us now take T i “ p T i , T i q , with the ﬁlter D i “ p T i “ q .The model of Section 3.2, where we only observed whether the most-preferred alternativewas 0, led to a double hurdle model. In this variant, we only observe whether the least- preferred alternative is 0, which leads to a diﬀerent ﬁltered treatment model:(C.1) D i p z q “ p u i ă min p ∆ Tz p q ` u i , ∆ Tz p q ` u i qq . We keep the same constraints on the mean utilities as in (3.8). Under Equation (C.1), wehave ﬁve response groups, as shown in Figure 10. Table 5 shows how response groups mapinstrument values into ﬁltered treatment values.Table 5: D -response Groups for the Alternative Factorial Design Model D i p z q “ D i p z q “ z “ ˆ A D Ť C D Ť C D Ť C D A D z “ ˆ A D Ť C D A D Ť C D Ť C D z “ ˆ A D Ť C D A D Ť C D Ť C D z “ ˆ A D A D Ť C D Ť C D Ť C D Proposition 15 (Identifying the Model with Equation (C.1)) . (i) The probabilities of the u i ´ u i u i ´ u i C D A D C D C D A D P ˆ P ˆ P ˆ P ˆ D -response groups are point-identiﬁed by Pr p A D q “ P D p | ˆ q Pr p A D q “ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q , and the model has three testable implications: P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ` P D p | ˆ q ě P D p | ˆ q ´ P D p | ˆ q . (ii) The LATEs on the three groups of compliers are point-identiﬁed by E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ` E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q .
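The third estimand in part (ii) is a ratio of two double differences of cell means. The sketch below is a hedged illustration, not the authors' implementation: the cell keys "00", "01", "10", "11" (standing for 0×0, 0×1, 1×0, 1×1), the function names, and the toy populations are all assumptions made for the example. It assumes that, under Equation (C.1), the group isolated by the double difference takes $d = 1$ only when both instruments are switched on, so that the cells 1×1 and 0×0 enter with a plus sign; under the double hurdle model of Proposition 14 the signs are reversed.

```python
from math import isclose

CELLS = ("00", "01", "10", "11")  # hypothetical keys for 0x0, 0x1, 1x0, 1x1


def cell_means(groups):
    """Population cell means of Y and D.

    groups: list of (share, cells_with_d1, mean_y0, mean_y1), where
    cells_with_d1 is the set of instrument cells in which the group has D = 1.
    """
    y_mean = {z: 0.0 for z in CELLS}
    d_mean = {z: 0.0 for z in CELLS}
    for share, cells_with_d1, mean_y0, mean_y1 in groups:
        for z in CELLS:
            if z in cells_with_d1:
                d_mean[z] += share
                y_mean[z] += share * mean_y1
            else:
                y_mean[z] += share * mean_y0
    return y_mean, d_mean


def did_wald(y_mean, d_mean, plus, minus):
    """Return (effect, share) from double differences of cell means."""
    did_y = sum(y_mean[z] for z in plus) - sum(y_mean[z] for z in minus)
    did_d = sum(d_mean[z] for z in plus) - sum(d_mean[z] for z in minus)
    if did_d <= 0:
        raise ValueError("double difference of D is not positive; check the signs")
    return did_y / did_d, did_d


ALL = set(CELLS)

# Toy population in which the instruments act as complements (Equation (C.1)).
complements = [
    (0.3, set(), 1.0, 1.0),         # never-takers
    (0.2, ALL, 0.0, 2.0),           # always-takers
    (0.2, {"01", "11"}, 1.0, 1.5),  # respond to the second instrument
    (0.2, {"10", "11"}, 1.0, 3.0),  # respond to the first instrument
    (0.1, {"11"}, 1.0, 5.0),        # need both instruments; effect = 4
]
# Toy population in which the instruments act as substitutes (double hurdle).
substitutes = [
    (0.3, set(), 1.0, 1.0),
    (0.2, ALL, 0.0, 2.0),
    (0.1, {"01", "10", "11"}, 1.0, 5.0),  # eager compliers; effect = 4
    (0.2, {"10", "11"}, 1.0, 3.0),        # reluctant compliers
    (0.2, {"01", "11"}, 1.0, 1.5),        # reluctant compliers
]

effect_c, share_c = did_wald(*cell_means(complements),
                             plus=("11", "00"), minus=("01", "10"))
effect_s, share_s = did_wald(*cell_means(substitutes),
                             plus=("01", "10"), minus=("00", "11"))
print(effect_c, share_c, effect_s, share_s)
```

In both toy populations the other complier groups cancel out of the double difference, so the ratio returns the effect built in for the remaining group. Applying the wrong sign convention makes the denominator negative, which the function reports as an error.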

57t is worth comparing Proposition 14 with Proposition 15. One interesting observationis that the share of C D in Proposition 14 is identical up to sign to that of C D in Propo-sition 15. Namely, Pr p C D q “ DiD p T q and Pr p C D q “ p C D q “ p C D q “ ´ DiD p T q in Proposition 15, where DiD p T q is the diﬀerence-in-diﬀerencesof the propensity score deﬁned by:DiD p T q “ t Pr r D i “ | Z i “ ˆ s ´ Pr r D i “ | Z i “ ˆ su´ t Pr r D i “ | Z i “ ˆ s ´ Pr r D i “ | Z i “ ˆ su . In terms of economic interpretation, one may think of the selection mechanism in Proposi-tion 14 as the scenario when instruments 11 p Z i “ ˆ q and 11 p Z i “ ˆ q are substitutesto encourage agents to take treatment. On the contrary, the selection mechanism in Propo-sition 15 corresponds to the case that instruments 11 p Z i “ ˆ q and 11 p Z i “ ˆ q arecomplements. The same estimands identify the average treatment eﬀects for conceptuallydistinct subpopulations, depending on the details of the selection mechanism. This suggeststhat it is important to learn about the nature of selection into treatment before interpretingthe causal parameters of diﬀerent compliers. To do this, one can estimate the diﬀerence-in-diﬀerences of the propensity score DiD p T q and use its sign to determine whether equa-tion (3.9) or equation (C.1) is more plausible in any particular application. References

Ackerberg, D., X. Chen, and J. Hahn (2012): “A practical asymptotic variance estimator for two-step semiparametric estimators,” Review of Economics and Statistics, 94(2), 481–498.

Ackerberg, D., X. Chen, J. Hahn, and Z. Liao (2014): “Asymptotic efficiency of semiparametric two-step GMM,” Review of Economic Studies, 81(3), 919–943.

Angrist, J., D. Lang, and P. Oreopoulos (2009): “Incentives and Services for College Achievement: Evidence from a Randomized Trial,” American Economic Journal: Applied Economics, 1(1), 136–63.

Angrist, J. D., and G. W. Imbens (1995): “Two-stage least squares estimation of average causal effects in models with variable treatment intensity,” Journal of the American Statistical Association, 90(430), 431–442.

Ao, W., S. Calonico, and Y.-Y. Lee (2019): “Multivalued Treatments and Decomposition Analysis: An Application to the WIA Program,” Journal of Business & Economic Statistics, in press.

Bloom, H. S. (1984): “Accounting for no-shows in experimental evaluation designs,” Evaluation Review, 8(2), 225–246.

Caetano, C., and J. C. Escanciano (2020): “Identifying Multiple Marginal Effects with a Single Instrument,” Econometric Theory, forthcoming.

Cattaneo, M. D. (2010): “Efficient semiparametric estimation of multi-valued treatment effects under ignorability,” Journal of Econometrics, 155(2), 138–154.

D’Haultfoeuille, X., and P. Février (2015): “Identification of Nonseparable Triangular Models With Discrete Instruments,” Econometrica, 83(3), 1199–1210.

Feng, J. (2020): “Matching Points: Supplementing Instruments with Covariates in Triangular Models,” Job market paper, available at https://econ.columbia.edu/e/jfeng/.

Frölich, M. (2007): “Nonparametric IV estimation of local average treatment effects with covariates,” Journal of Econometrics, 139(1), 35–75.

Goff, L. (2020): “A Vector Monotonicity Assumption for Multiple Instruments,” available at .

Heckman, J., and R. Pinto (2018): “Unordered Monotonicity,” Econometrica, 86(1), 1–35.

Heckman, J. J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47(1), 153–161.

Heckman, J. J., S. Urzua, and E. Vytlacil (2006): “Understanding instrumental variables in models with essential heterogeneity,” Review of Economics and Statistics, 88(3), 389–432.

Heckman, J. J., S. Urzua, and E. Vytlacil (2008): “Instrumental variables in models with multiple outcomes: The general unordered case,” Annales d’économie et de statistique, 91/92, 151–174.

Huang, L., U. Khalil, and N. Yildiz (2019): “Identification and estimation of a triangular model with multiple endogenous variables and insufficiently many instrumental variables,” Journal of Econometrics, 208(2), 346–366.

Imbens, G. W. (2000): “The role of the propensity score in estimating dose-response functions,” Biometrika, 87(3), 706–710.

Imbens, G. W., and J. D. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62(2), 467–475.

Kamat, V. (2019): “Identification with Latent Choice Sets,” arXiv:1711.02048, https://arxiv.org/abs/1711.02048.

Kirkeboen, L. J., E. Leuven, and M. Mogstad (2016): “Field of study, earnings, and self-selection,” Quarterly Journal of Economics, 131(3), 1057–1111.

Kline, P., and C. R. Walters (2016): “Evaluating public programs with close substitutes: The case of Head Start,” Quarterly Journal of Economics, 131(4), 1795–1848.

Lee, S., and B. Salanié (2018): “Identifying effects of multivalued treatments,” Econometrica, 86(6), 1939–1963.

Mogstad, M., A. Torgovitsky, and C. R. Walters (2019): “Identification of Causal Effects with Multiple Instruments: Problems and Some Solutions,” Working Paper 25691, National Bureau of Economic Research.

Mogstad, M., A. Torgovitsky, and C. R. Walters (2020): “Policy Evaluation With Multiple Instrumental Variables,” Working Paper 27546, National Bureau of Economic Research.

Mountjoy, J. (2019): “Community Colleges and Upward Mobility,” Chicago Booth mimeo.

Muralidharan, K., M. Romero, and K. Wüthrich (2019): “Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments,” Working Paper 26562, National Bureau of Economic Research.

Torgovitsky, A. (2015): “Identification of Nonseparable Models Using Instruments With Small Support,” Econometrica, 83(3), 1185–1197.

Vytlacil, E. (2002): “Independence, monotonicity, and latent index models: An equivalence result,”