Filtered and Unfiltered Treatment Effects with Targeting Instruments
arXiv [econ.EM]
Sokbae Lee † Bernard Salanié ‡
July 22, 2020
Abstract
Multivalued treatments are commonplace in applications. We explore the use of discrete-valued instruments to control for selection bias in this setting. We establish conditions under which counterfactual averages and treatment effects are identified for heterogeneous complier groups. These conditions require a combination of assumptions that restrict both the unobserved heterogeneity in treatment assignment and how the instruments target the treatments. We introduce the concept of filtered treatment, which takes into account limitations in the analyst's information. Finally, we illustrate the usefulness of our framework by applying it to data from the Student Achievement and Retention Project and the Head Start Impact Study.
Keywords: Identification, selection, multivalued treatments, discrete instruments, unordered monotonicity, factorial design.

∗ This work is in part supported by the European Research Council (ERC-2014-CoG-646917-ROMIA) and by the UK Economic and Social Research Council through research grant ES/P008909/1 to the CeMMAP.
† Department of Economics, Columbia University and Centre for Microdata Methods and Practice, Institute for Fiscal Studies, [email protected].
‡ Department of Economics, Columbia University, [email protected].

Introduction

Much of the literature on the evaluation of treatment effects has concentrated on the paradigmatic "binary/binary" example, in which both treatment and instrument only take two values. Multivalued treatments are common in actual policy implementations, however; and multivalued instruments are just as frequent. Many different programs aim to help train job seekers for instance, and each of them has its own eligibility rules. Tax and benefit regimes distinguish many categories of taxpayers and eligible recipients. The choice of a college and major has many dimensions too, and responds to a variety of financial help programs and other incentives. Randomized experiments in economics resort more and more to factorial designs; they have a long tradition in applied statistics, starting with Fisher in the 1920s. As the training, education choice, and tax-benefit examples illustrate, multivalued treatments are often also subject to selection on unobservables. We explore in this paper the use of discrete-valued instruments in order to control for selection bias when evaluating discrete-valued treatments. We establish conditions under which counterfactual averages and treatment effects are identified for various (sometimes composite) complier groups.
These conditions require a combination of assumptions that restrict both the unobserved heterogeneity in treatment assignment and the configuration of the instruments themselves.

Existing work on multivalued treatments under selection on observables includes Imbens (2000), Cattaneo (2010), and Ao, Calonico, and Lee (2019), among others. The literature that uses discrete-valued instruments to evaluate treatment effects under selection on unobservables is sparser. On the theoretical side, Angrist and Imbens (1995) analyzed two-stage least-squares (TSLS) estimation when the treatment takes a finite number of ordered values. Heckman, Urzua, and Vytlacil (2006, 2008) showed how treatment effects can be identified in discrete choice models for the ordered and unordered cases, respectively. More recently, Heckman and Pinto (2018) focused on unordered treatments and introduced the notion of "unordered monotonicity", under which treatment assignment is formally analogous to an additively separable discrete choice model. Several recent papers have studied the case of binary treatments with multiple instruments, as well as binary instruments with multivalued or continuous treatments. For the former, Mogstad, Torgovitsky, and Walters (2019, 2020) and Goff (2020) analyzed the identifying power of different monotonicity assumptions. For the latter, Torgovitsky (2015), D'Haultfoeuille and Février (2015), Huang, Khalil, and Yildiz (2019), Caetano and Escanciano (2020), and Feng (2020) developed identification results for different models. On the applied side, Kirkeboen, Leuven, and Mogstad (2016) used discrete instruments to obtain TSLS estimates of returns to different fields of study. Kline and Walters [...]

Footnote: Muralidharan, Romero, and Wüthrich (2019) review recent applications of factorial designs.

[...] filtered treatment $D$; underlying it is an unfiltered treatment $T$. Treatment effects are of course harder to identify in the filtered model.
The concept of filtering is linked to our earlier work (Lee and Salanié, 2018), which allowed for limited violations of unordered monotonicity and used continuous instruments to identify marginal treatment effects.

Moreover, both filtering and the multiplicity of treatments and instruments may give rise to a bewildering number of cases. In the binary/binary model, the analyst can usually take for granted that switching on the binary instrument makes treatment (weakly) more likely for any observation. With multiple instrument values and multiple treatments, the correspondence is less clear. We start by imposing the unordered monotonicity property of Heckman and Pinto (2018) on the unfiltered treatment model. Under unordered monotonicity, it is natural to speak of an instrument targeting an unfiltered treatment by increasing its relative "mean utility". Most of our paper relies on the assumption of strict targeting, which obtains when each instrument only promotes the treatments it targets.

Footnote: The binary/binary property above is satisfied under the LATE-monotonicity assumption (e.g., Imbens and Angrist, 1994; Vytlacil, 2002).

To illustrate, consider the effect of various programs $T$ on some outcomes $Y$. Let each instrument value $z$ stand for a policy regime, under which the access to some programs is made easier or harder than in a control group. Under unordered monotonicity, this translates into a profile of relative mean utilities of any treatment $t$ under the policy regimes $z \in \mathcal{Z}$. We say that an instrument value $z$ targets a treatment $t$ when it maximizes its relative mean utility. Suppose that each policy regime consists of values of subsidies for a subset of the programs, and that these subsidies enter mean utilities additively. Then a policy regime $z$ targets a treatment $t$ if it has the highest subsidy for this program among all policy regimes. Strict targeting requires that all policy regimes $z$ that do not target $t$ have the same (lower) subsidy for $t$.
It is easy to translate this property to the other examples cited at the beginning of the introduction.

With complete treatment data (the unfiltered treatment), combining unordered monotonicity and strict targeting allows us to point-identify the size of some complier groups and the corresponding treatment effects, and to partially identify others. When the data on treatments is filtered, unordered monotonicity may not carry over to the filtered treatment $D$ (an observation already in Lee and Salanié (2018)); and strict targeting generally does not. Nevertheless, they confer enough underlying structure to the mapping from instruments to filtered treatments that we can still identify various parameters of interest.

We give numerous examples throughout the paper. We also illustrate the usefulness of our framework by applying it to data from the Student Achievement and Retention (STAR) Project (Angrist, Lang, and Oreopoulos, 2009) and to Kline and Walters's (2016) analysis of the Head Start Impact Study. We find that the large intent-to-treat (ITT) effect of the STAR for female college students results from the aggregation of two very different treatment effects; this highlights the value of unbundling the heterogeneous compliers. We also confirm the importance of taking into consideration alternative preschools when evaluating Head Start; unlike Kline and Walters (2016), we do not rely on parametric selection models.

The remainder of the paper is organized as follows. Section 1 defines our framework and introduces filtered and unfiltered treatments. In Section 2, we study identification in the unfiltered treatment model. We define the concepts of targeting, one-to-one targeting, and strict targeting and their implications for the identification of the probabilities and the treatment effects of various complier groups. Section 3 turns to filtered models. We derive identification results in several leading classes of applications.
Finally, we present estimation results for the two aforementioned empirical studies in Section 4. The Appendices contain the proofs of all propositions and lemmata, along with some additional material.

We focus throughout on a treatment that takes discrete values, which we label $d \in \mathcal{D}$. For simplicity, we will call $D = d$ "treatment $d$". These values are unordered: e.g., $d = 2$, when available, is not "more treatment" than $d = 1$. In most of our examples, there is a well-defined control group, which is denoted by $d = 0$. We assume that discrete-valued instruments $Z_i \in \mathcal{Z}$ are available. We condition on all other exogenous covariates $X_i$ throughout, and we omit them from the notation. We will use the standard counterfactual notation: $D_i(z)$ and $Y_i(d, z)$ denote respectively potential treatments and outcomes.

The validity of the instruments requires the usual exclusion restrictions:

Assumption 1 (Valid Instruments).
(i) $Y_i(d, z) = Y_i(d)$ for all $(d, z)$ in $\mathcal{D} \times \mathcal{Z}$.
(ii) $Y_i(d)$ and $D_i(z)$ are independent of $Z_i$ for all $(d, z)$ in $\mathcal{D} \times \mathcal{Z}$.

Under Assumption 1, we define $D_i := D_i(Z_i)$ and $Y_i := Y_i(D_i)$. Throughout the paper, we assume that we observe $(Y_i, D_i, Z_i)$ for each $i$. In addition, the instruments must be relevant. In the usual binary instrument/binary treatment case (hereafter "binary/binary"), this translates into a requirement that the propensity score vary with the instruments. In our more general setting, we impose:

Assumption 2 (Relevant Instruments). Let $\mathbf{Z}_i$ denote a column vector whose elements are 1 and the variables $\mathbb{1}(Z_i = z)$ for $z \in \mathcal{Z}$, and $\mathbf{D}_i$ denote a column vector whose elements are 1 and the variables $\mathbb{1}(D_i = d)$ for $d \in \mathcal{D}$. Then $E[\mathbf{Z}_i \mathbf{D}_i^\top]$ has full rank.

We now move beyond these standard assumptions. First, we need an assumption that restricts the heterogeneity in the counterfactual mappings $D_i$. In the binary/binary model, this is most often done by imposing LATE-monotonicity.

Assumption 3 (LATE-monotonicity in the binary/binary model). (i) or (ii) must hold:
(i) for each observation $i$, $D_i(1) \geq D_i(0)$;
(ii) for each observation $i$, $D_i(0) \geq D_i(1)$.

With more than two treatment values and/or more than two instrument values, there are many ways to restrict the heterogeneity in treatment assignment. Since treatments are not ordered in any meaningful way, we cannot apply the results in Angrist and Imbens (1995) for instance.
Mogstad, Torgovitsky, and Walters (2019, 2020) state several versions of monotonicity for a binary treatment model with $|\mathcal{Z}| > 2$. They propose an assumption PM (partial monotonicity) which applies binary LATE-monotonicity component by component. This requires that the instruments be interpretable as vectors, which is not necessarily the case here.

Heckman and Pinto (2018) took another path; they defined an unordered monotonicity property that is motivated by an analogy to revealed preference theory. This can be stated as follows:
Assumption 4 (Unordered Monotonicity at $(z, z')$). For any treatment value $d \in \mathcal{D}$, (i) or (ii) must hold for all observations $i$:
(i) if $D_i(z) = d$, then $D_i(z') = d$;
(ii) if $D_i(z') = d$, then $D_i(z) = d$.

The easiest way to understand Assumption 4 is to think of treatment assignment as generated by a discrete choice problem. If observation $i$ "chose" treatment value $d$ under $z$, then a change in instrument value that increases the mean utility of treatment $d$ at least as much as the mean utilities of other treatment values should lead $i$ to still choose $d$. This is more than illustrative: Heckman and Pinto (2018) show that the treatment assignment models that satisfy unordered monotonicity for each pair of instrument values in a set $\mathcal{Z}$ can be represented by a discrete choice problem with additively separable errors, that is

$$D_i(z) = \arg\max_{d \in \mathcal{D}} \left( U_z(d) + u_{id} \right)$$

for random vectors $(u_{id})_{d \in \mathcal{D}}$ that are distributed independently of $Z_i$. Let AS-DCM denote this class of models. Clearly, Assumption 4 is more restrictive if the set of instrument values $\mathcal{Z}$ is richer. In Example 1 below, we would only want to invoke Assumption 4 on some policy changes.

Footnote: As a special case, unordered monotonicity includes ordered treatments in which $D_i(z) = \arg\max_{d \in \mathcal{D}} \left( U_z(d) + \sigma(d) u_i \right)$ for some increasing positive function $\sigma$.

Example 1. Unemployed individuals can be assigned either to a control group or to three different training programs, with treatment values $d = 1, 2, 3$. Consider three alternative policy changes ($z \to z'$), all of which make more individuals eligible for treatment 1. Policy change A at the same time restricts the eligibility criteria for both treatments 2 and 3 in unspecified ways. Policy change B leaves eligibility criteria unchanged for treatments 2 and 3, and policy change C restricts them for treatment 2 only. Assumption 4 would require that all observations that have treatment 1 under $z$ must also have treatment 1 under $z'$, which seems natural in this context. It would also prevent "two-way moves" between treatment 2 and treatment 3, which seems restrictive. Assumption 4 would be more credible with policy changes B and C.

Heckman and Pinto (2018) showed how unordered monotonicity could be applied to identify some treatment effects (or weighted averages of treatment effects). In Lee and Salanié (2018), we considered a more general family of models of treatment assignment. We allowed for [...] $u_{ij} \leq Q_j(z)$. All AS-DCM models clearly belong to this class, as $D_i(z)$ is characterized by

$$u_{id} - u_{i, D_i(z)} \leq U_z(D_i(z)) - U_z(d) \quad \text{for all } d \in \mathcal{D}.$$

Our results showed that this class of models can be generated by
1. taking an AS-DCM model of assignment to treatment values $T \in \mathcal{T}$,
2. generating the observed treatment $D \in \mathcal{D}$ from a partition of the set $\mathcal{T}$.

This defines the class of models of assignment to treatment that we analyze in this paper. We call such models "filtered treatment", and we will refer to the (imperfectly observed) model of treatment in 1 above as the "unfiltered treatment". To pursue the discrete choice analogy: in the unfiltered model, each observation chooses a treatment within $\mathcal{D}$ and the analyst observes this choice.
In a filtered model, choices are aggregated into groups; the analyst only observes which group the treatment chosen belongs to. The aggregation occurs via a filtering map from $\mathcal{T}$ to $\mathcal{D}$.

Definition 1 (Filtered Treatment). The treatment assignment model is determined by:
1. a finite set $\mathcal{T}$;
2. a partition of $\mathcal{T}$ which we call $\mathcal{D}$; or equivalently, a surjective filtering map $M : \mathcal{T} \to \mathcal{D}$;
3. a finite set of instrument values $\mathcal{Z}$; and
4. an AS-DCM model of unfiltered treatment:
$$T_i(z) = \arg\max_{t \in \mathcal{T}} \left( U_z(t) + u_{it} \right),$$
where the vector $(u_{it})_{t \in \mathcal{T}}$ is distributed independently of $Z_i$ and has full support on $\mathbb{R}^{|\mathcal{T}|}$.

Our paper focuses on such models.

Assumption 5 (Filtered Treatment). Treatment is assigned according to Definition 1. We call the model that generates $T_i$ the underlying unfiltered treatment model.
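Definition 1 can be illustrated with a small simulation. The sketch below is only an illustration under assumptions that are not in the text: the mean utilities `U`, the filtering map `M`, and the standard normal errors are hypothetical choices. It draws unfiltered treatments from an AS-DCM and then reports what the analyst would see after filtering.

```python
import random

def unfiltered_treatment(mean_utility, errors):
    """AS-DCM choice: the treatment t that maximizes U_z(t) + u_it."""
    return max(mean_utility, key=lambda t: mean_utility[t] + errors[t])

# Hypothetical mean utilities U[z][t] for Z = {0, 1} and T = {0, 1, 2};
# instrument value z = 1 raises the mean utility of t = 1 only.
U = {0: {0: 0.0, 1: 0.0, 2: 0.0},
     1: {0: 0.0, 1: 1.0, 2: 0.0}}

# Hypothetical filtering map M: T -> D; the analyst cannot tell t = 1 from t = 2.
M = {0: 0, 1: 1, 2: 1}

def simulate(n, seed=0):
    """Return a list of (T_i(0), T_i(1), D_i(0), D_i(1)) for n observations."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Errors are drawn once per observation, independently of the instrument.
        u = {t: rng.gauss(0.0, 1.0) for t in M}
        t0 = unfiltered_treatment(U[0], u)
        t1 = unfiltered_treatment(U[1], u)
        rows.append((t0, t1, M[t0], M[t1]))
    return rows

rows = simulate(2000)
# Observations whose unfiltered (resp. filtered) treatment reacts to the instrument.
switchers_T = sum(t0 != t1 for t0, t1, _, _ in rows)
switchers_D = sum(d0 != d1 for _, _, d0, d1 in rows)
print(switchers_T, switchers_D)
```

In this hypothetical design, moves from $t = 2$ to $t = 1$ are invisible to the analyst, so the filtered treatment shows no more switchers than the unfiltered one.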
We will assume throughout, implicitly, that the set of instrument values $\mathcal{Z}$ has been restricted to a subset where unordered monotonicity is a reasonable assumption. We will also take some liberties with language by speaking of individuals "choosing" their "preferred" unfiltered treatments and of "mean utilities" $U_z(t)$. These are only meant to simplify the exposition and do not imply that the individual actually chooses her treatment.

As an example, consider the following double hurdle model; it has $|\mathcal{D}| = 2$, and an underlying unfiltered treatment model with $|\mathcal{T}| = 3$.

Example 2 (Double Hurdle Treatment). The unfiltered treatment has $\mathcal{T} = \{0, 1, 2\}$ and

$$T_i(z) = \arg\max_{t = 0, 1, 2} \left( U_z(t) + u_{it} \right),$$

where the vector $(u_{i0}, u_{i1}, u_{i2})$ is distributed independently of $Z_i$ and has full support on $\mathbb{R}^3$. Suppose that the filtered treatment is generated by $D = \mathbb{1}(T = 0)$, which corresponds to the filtering map $M(0) = 1$, $M(1) = M(2) = 0$; that is,

$$
\begin{cases}
D_i(z) = 1 \text{ iff } \max\left( U_z(1) + u_{i1},\ U_z(2) + u_{i2} \right) < U_z(0) + u_{i0}, \\
D_i(z) = 1 \text{ iff } u_{i0} - u_{i1} > U_z(1) - U_z(0) \text{ and } u_{i0} - u_{i2} > U_z(2) - U_z(0).
\end{cases}
\tag{1.1}
$$

Lee and Salanié (2018) gave a set of assumptions under which the marginal treatment effect can be identified in a filtered treatment model, provided that enough continuous instruments are available. In Example 2, we would need two continuous instruments, and some additional restrictions. The current paper explores identification with discrete-valued instruments. In these settings, the combination of Assumptions 1, 2, and 5 is far from sufficient to identify interesting treatment effects in filtered and unfiltered treatment models in general. In order to better understand what is needed, we now resort to the notion of response-groups of observations, whose members share the same mapping from instruments $z$ to unfiltered treatments $t$. We first state a general definition.

Footnote: This is analogous to the definitions in Heckman and Pinto (2018).

Definition 2 (Response-vectors and -groups). Let $\tilde{t}$ be an element of $\mathcal{T}^{\mathcal{Z}}$ and $\tilde{t}(z) \in \mathcal{T}$ denote its component for instrument value $z \in \mathcal{Z}$.
• Observation $i$ has (elemental) response-vector $R_{\tilde{t}}$ if and only if for all $z \in \mathcal{Z}$, $T_i(z) = \tilde{t}(z)$. The set $C_{\tilde{t}}$ denotes the set of observations with response-vector $R_{\tilde{t}}$ and we call it a response-group.
• We extend the definition in the natural way to incompletely specified mappings, where $\tilde{t}$ is a correspondence from $\mathcal{Z}$ to $\mathcal{T}$. We call the corresponding response-vectors and response-groups composite.

We start by introducing additional assumptions on the underlying unfiltered treatment model. We will illustrate these assumptions in simple graphs; our leading example is the "ternary/ternary" case when $|\mathcal{T}| = |\mathcal{Z}| = 3$.

Example 3 (Ternary/ternary unfiltered model). Assume that $\mathcal{Z} = \{0, 1, 2\}$ and $\mathcal{T} = \{0, 1, 2\}$.
In the $(u_{i1} - u_{i0},\ u_{i2} - u_{i0})$ plane, the points of coordinates $P_z = (U_z(0) - U_z(1),\ U_z(0) - U_z(2))$ for $z = 0, 1, 2$ determine treatment assignment. For a given $z$,
• $T_i(z) = 0$ to the south-west of $P_z$;
• $T_i(z) = 1$ to the right of $P_z$ and below the diagonal that goes through it;
• $T_i(z) = 2$ above $P_z$ and above the diagonal that goes through it.

This is shown in Figure 1 for a given $z$, where the origin is in $P_z$.

[Figure 1: Unfiltered treatment assignment in the ternary/ternary model for given $z$. Horizontal axis: $u_{i1} - u_{i0}$; vertical axis: $u_{i2} - u_{i0}$; regions $T_i(z) = 0$, $T_i(z) = 1$, and $T_i(z) = 2$ around $P_z$.]

2.1 Targeted Treatments

"Targeting" will be the common thread in our analysis. Just as in general economic discussions a policy measure may target a particular outcome, we will speak of instruments (in the econometric sense) targeting the assignment to a particular treatment.

Under unordered monotonicity (Assumption 4), assignment to treatment is governed by the differences in mean utilities $(U_z(t) - U_z(\tau))$ and by the differences in unobservables $u_{it} - u_{i\tau}$. Only the former depend on the instrument. Intuitively, an instrument $z$ targets a treatment $t$ if it makes the difference $(U_z(t) - U_z(\tau))$ as large as possible for given $\tau$. Instead of requiring this for any $\tau$, we will choose a reference treatment $t_0 \in \mathcal{T}$ and require that $z$ maximize $(U_z(t) - U_z(t_0))$ for this particular $t_0$. In many applications, the control group is a natural choice for a reference treatment. Since the control group is usually denoted $t = 0$, we take $t_0 = 0$.

Definition 3 (Targeted Treatments and Targeting Instruments). Let $t_0 = 0$. For any $z \in \mathcal{Z}$ and $t \in \mathcal{T}$, we denote

$$\Delta_z(t) \equiv U_z(t) - U_z(0)$$

the relative mean utility of treatment $t$ given instrument $z$.
Let $\bar{\Delta}_t$ be the maximum value of $\Delta_z(t)$ over $z \in \mathcal{Z}$, and $\bar{Z}(t)$ the set of maximizers $z \in \mathcal{Z}$. If $\bar{Z}(t)$ is not all of $\mathcal{Z}$, then for any $z \in \bar{Z}(t)$ we will say that instrument value $z$ targets treatment value $t$; and we write $t \in \bar{T}(z)$.
We denote $\mathcal{T}^*$ the set of targeted treatments and $\mathcal{Z}^* = \bigcup_{t \in \mathcal{T}^*} \bar{Z}(t)$ the set of targeting instruments.

Definition 3 calls for several remarks. First, by construction $\Delta_z(0) \equiv 0$ and $\bar{Z}(0) = \mathcal{Z}$. Therefore $t = 0$ is not in $\mathcal{T}^*$. In many of our examples, $\mathcal{T}^* = \mathcal{T} \setminus \{0\}$; the set $\mathcal{T}^*$ may exclude other treatment values, however.

If a treatment value $t$ is not targeted, by definition the function $z \to \Delta_z(t)$ is constant over $z \in \mathcal{Z}$, with value $\bar{\Delta}_t$. While treatment values in $\mathcal{T} \setminus \mathcal{T}^*$ have mean utilities that do not respond to changes in the instruments, these mean utilities may and in general will differ across treatments. The probability that an individual observation takes a treatment $t \in \mathcal{T} \setminus \mathcal{T}^*$ also generally depends on the value of the instrument.

More importantly, the utilities $U_z(t)$ and therefore the targeting maps $\bar{Z}$ and $\bar{T}$ are not observable; any assumption on targeting instruments and targeted treatments must be a priori and will be context-dependent. As we will see, these prior assumptions sometimes have consequences that can be tested.

Let us return to the illustration that we used in the introduction. A policy regime $z$ consists of a set of (possibly zero or negative) subsidies $S_z(t)$ for treatments $t \in \mathcal{T}$. If there is a no-subsidy regime $z = 0$ with $S_0(t) = 0$ for all $t$, it seems natural to write the mean utility as $U_z(t) = U_0(t) + S_z(t)$. Then relative mean utilities are $\Delta_z(t) = \Delta_0(t) + S_z(t)$ and for any treatment $t$, the set $\bar{Z}(t)$ consists of the instrument values $z$ that subsidize $t$ most heavily. As this illustration suggests, the sets $\bar{Z}(t)$ may not be singletons, and they may well intersect. We will show this on several examples.

Example 4 is an instance of factorial design in which each non-zero treatment value is targeted by two instrument values, and one instrument value targets several treatments.
Example 4 ($2 \times 2$ Factorial Design). Let $\mathcal{Z} = \{0 \times 0,\ 0 \times 1,\ 1 \times 0,\ 1 \times 1\}$, where the two digits indicate the values of two binary instruments $z_1$ and $z_2$. Suppose that $\mathcal{T} = \{0, 1, 2\}$, where $z_1 = 1$ [...] $z_2 = 1$ [...]

$$\Delta_{1 \times 1}(1) = \bar{\Delta}_1 > \max\left( \Delta_{0 \times 0}(1),\ \Delta_{0 \times 1}(1) \right)$$
$$\Delta_{1 \times 1}(2) = \bar{\Delta}_2 > \max\left( \Delta_{0 \times 0}(2),\ \Delta_{1 \times 0}(2) \right).$$

Depending on the context, it may be reasonable to assume that $\Delta_{1 \times 0}(1) = \Delta_{1 \times 1}(1)$ and $\Delta_{0 \times 1}(2) = \Delta_{1 \times 1}(2)$: turning on the two instruments increases the appeal of $t = 1$ (resp. $t = 2$) just as much as if only $z_1$ (resp. $z_2$) had been turned on. This would be quite natural if $z_1 = 1$ [...] $z_2 = 1$ [...] $\bar{Z}(1) = \{1 \times 0,\ 1 \times 1\}$ and $\bar{Z}(2) = \{0 \times 1,\ 1 \times 1\}$; instrument $z = 1 \times 1$ targets both $t = 1$ and $t = 2$, so that $\bar{T}(1 \times 1) = \{1, 2\}$.

Example 5 (Two Instruments Target the Same Treatment). Let us now modify Example 4 slightly: the instrument can only take values $0 \times 0$, $1 \times 0$, and $1 \times 1$. Then $z = 1 \times 0$ and $z = 1 \times 1$ both target $t = 1$: $\bar{Z}(1) = \{1 \times 0,\ 1 \times 1\}$.

Example 6 (An Instrument Targets Two Treatments). In this example, $\mathcal{Z} = \{0, 1\}$ and $\mathcal{T} = \{0, 1, 2\}$. A fraction of individuals in the sample receives a subsidy ($z = 1$) that applies to both $t = 1$ and $t = 2$; under $z = 0$, no treatment is subsidized. We would expect that $\Delta_1(1) > \Delta_0(1)$ and $\Delta_1(2) > \Delta_0(2)$, so that $\bar{Z}(1) = \bar{Z}(2) = \{1\}$; then we have $\mathcal{T}^* = \{1, 2\}$ and $\mathcal{Z}^* = \{1\}$.

2.1.2 One-to-one Targeting

Sometimes we will impose the much stronger Assumption 6, or only one of its two parts. The first part says that a targeted treatment can only have one targeting instrument; the second part stipulates that a targeting instrument may target only one treatment. Example 4 violates both parts of Assumption 6. Example 5 violates its first part only, and Example 6 only violates its second part.
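The targeting sets of Definition 3 can be read off a table of relative mean utilities. The sketch below does this for numerical versions of Examples 4 and 6 and checks the two parts of one-to-one targeting as described in the preceding paragraph. The numbers in the tables, the string labels for instrument values, and the function names are hypothetical; only the ordering of the values matters, and Example 4 is coded under the additional equalities discussed there.

```python
def targeting_sets(delta):
    """delta[z][t] = Delta_z(t). Returns (z_bar, t_bar) for targeted treatments only."""
    instruments = list(delta)
    treatments = list(next(iter(delta.values())))
    z_bar = {}
    for t in treatments:
        top = max(delta[z][t] for z in instruments)
        # Exact equality is fine here because the tables hold integers.
        maximizers = {z for z in instruments if delta[z][t] == top}
        # t is targeted only if its maximizers are not all of Z.
        if len(maximizers) < len(instruments):
            z_bar[t] = maximizers
    t_bar = {}
    for t, zs in z_bar.items():
        for z in zs:
            t_bar.setdefault(z, set()).add(t)
    return z_bar, t_bar

def one_to_one(z_bar, t_bar):
    """Return (part (i) holds, part (ii) holds) of one-to-one targeting."""
    part_i = all(len(zs) == 1 for zs in z_bar.values())
    part_ii = all(len(ts) == 1 for ts in t_bar.values())
    return part_i, part_ii

# Example 4 (hypothetical values): z1 = 1 raises Delta(1), z2 = 1 raises Delta(2).
example4 = {"0x0": {0: 0, 1: 0, 2: 0}, "0x1": {0: 0, 1: 0, 2: 1},
            "1x0": {0: 0, 1: 1, 2: 0}, "1x1": {0: 0, 1: 1, 2: 1}}

# Example 6 (hypothetical values): z = 1 subsidizes both t = 1 and t = 2.
example6 = {0: {0: 0, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 1}}

for name, table in (("Example 4", example4), ("Example 6", example6)):
    z_bar, t_bar = targeting_sets(table)
    print(name, z_bar, t_bar, one_to_one(z_bar, t_bar))
```

With these tables, both parts fail in Example 4, while only the second part fails in Example 6, in line with the discussion above.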
Assumption 6 (One-to-one Targeting).
(i) For any $t \in \mathcal{T}^*$, the set $\bar{Z}(t)$ is a singleton $\{\bar{z}(t)\}$.
(ii) For any $z \in \mathcal{Z}^*$, $\bar{T}(z)$ is a singleton $\{\bar{t}(z)\}$.

Note that if both parts of Assumption 6 hold, we can identify $\mathcal{Z}^*$ with $\mathcal{T}^*$, and each instrument $z \in \mathcal{Z}^*$ with the treatment it targets.

Definition 4 (Labeling Instruments). Let both parts of Assumption 6 hold. To any $t \in \mathcal{T}^*$ we associate the instrument value $z = \bar{z}(t)$ that targets $t$, and we denote it by $z = t$. This allows us to define the partition $\mathcal{Z} = (\mathcal{Z} \setminus \mathcal{T}^*) \cup \mathcal{T}^*$, which is illustrated in Figure 2.

[Figure 2: One-to-one Targeting. The instrument set $\mathcal{Z}$ is split into $\mathcal{T}^*$ and $\mathcal{Z} \setminus \mathcal{T}^*$, and the treatment set $\mathcal{T}$ into $\mathcal{T}^*$ and $\mathcal{T} \setminus \mathcal{T}^*$; labels $z$ and $t$.]

Example 7 (Treatment Subsidies). Let $\mathcal{T} = \{0\} \cup \mathcal{T}^*$ with $\mathcal{T}^* = \{1, \ldots, |\mathcal{T}| - 1\}$, and $\mathcal{Z} = \mathcal{T}$. Each $z > 0$ subsidizes treatment $t = z$ in the sense that for each $t > 0$,

$$S_t(t) > S_z(t) \quad \text{for all } z \neq t.$$

Example 8 (Binary Instrument). Let $\mathcal{T} = \{0\} \cup \mathcal{T}^*$ with $\mathcal{T}^* = \{1, \ldots, |\mathcal{T}| - 1\}$, and $\mathcal{Z} = \{0, 1\}$. An observation with $z = 1$ receives a subsidy for $t = 1$, so that $S_1(1) > 0$. Other treatment values are not subsidized: $S_0(t) = S_1(t) = 0$ for $t \neq 1$. Then $\Delta_0(t) = \Delta_1(t)$ for all $t \neq 1$, so that $\mathcal{Z}^* = \{1\}$.

Example 9 (No Control). Let $\mathcal{Z} = \mathcal{T}^*$, so that there is one fewer instrument value than treatment values. The simplest example in this class is the ternary/binary model, with $\mathcal{T} = \{0, 1, 2\}$ and $\mathcal{Z} = \mathcal{T}^* = \{1, 2\}$. There are only two classes of observations: those with $z = 1$, which targets $t = 1$, and those with $z = 2$, which targets $t = 2$.

2.2 Strict Targeting

Assumption 4, conjoined with Assumption 6, imposes some useful restrictions on response groups.
Proposition 1 (Unfiltered response groups (1)). Under Assumptions 4 and 6, for any $t \in \mathcal{T}^*$:
• if $T_i(t) = 0$, then $T_i(z) \neq t$ for all $z \in \mathcal{Z}$;
• as a consequence, all response-groups $C_{\tilde{t}}$ with $\tilde{t}(t) = 0$ and $\tilde{t}(z) = t$ for some $z \neq t$ are empty.

Example 3 (continued)
Return to the ternary/ternary model and assume that the targeted set of treatments is $\mathcal{T}^* = \{1, 2\}$ and that Assumptions 4 and 6 hold. This imposes

$$\Delta_1(1) > \max\left( \Delta_0(1),\ \Delta_2(1) \right) \quad \text{and} \quad \Delta_2(2) > \max\left( \Delta_0(2),\ \Delta_1(2) \right).$$

A possible interpretation is that policy regime $z = 1$ (resp. $z = 2$) subsidizes treatment $t = 1$ (resp. $t = 2$) more than policy regimes $z = 0$ and $z = 2$ (resp. $z = 1$) do.

Since $P_z$ has coordinates $(-\Delta_z(1),\ -\Delta_z(2))$,
• $P_1$ must lie to the left of $P_0$ and $P_2$,
• $P_2$ must lie below $P_0$ and $P_1$.

This is easily rephrased in terms of the response-vectors of Definition 2. First note that in the ternary/ternary case, there are $3^3 = 27$ response-vectors, $R_{000}$ to $R_{222}$, with corresponding response-groups $C_{000}$ to $C_{222}$. Groups $C_{ddd}$ are "always-takers" of treatment value $d$. All other groups are "compliers" of some kind, in that their treatment changes under some changes in the instrument. We will also pay special attention to some non-elemental groups. For instance, $R_{0*0}$ will denote the group who is assigned treatment 0 under $z = 0$ and $z = 2$, and any treatment under $z = 1$. That is,

$$C_{0*0} = C_{000} \cup C_{010} \cup C_{020}.$$

Assumption 4 asserts the emptiness of four composite groups out of the 27 possible: $C_{10*}$, $C_{*01}$, $C_{2*0}$, and $C_{*20}$ by Proposition 1. They correspond to 10 elemental groups.

Footnote: Observations in group $C_{000}$ are usually called the "never-takers". We prefer not to break the symmetry in our notation. We hope this will not cause confusion.

Footnote: Specifically, they are: $C_{001}$, $C_{020}$, $C_{100}$, $C_{101}$, $C_{102}$, $C_{120}$, $C_{200}$, $C_{201}$, $C_{210}$, and $C_{220}$.

[...] $P_0$, $P_1$ and $P_2$ are consistent with Assumptions 4 and 6.

[Figure 3: Unordered monotonic ternary/ternary models: an example. Horizontal axis: $u_{i1} - u_{i0}$; vertical axis: $u_{i2} - u_{i0}$; points $P_0$, $P_1$, $P_2$; regions labeled by always-taker and complier response-groups.]

The number of distinct response-groups (ten) and the contorted shape of the $C_{[\ldots]}$ and $C_{[\ldots]}$ groups in Figure 3 point to the difficulties we face in identifying response-groups without further assumptions. Moreover, this is only one possible configuration: other cases exist, which would bring up other response-groups.

Figure 3 also suggests that if we could make sure that $P_1$ is directly to the left of $P_0$, the shape of $C_{[\ldots]}$ would become nicer, and group $C_{[\ldots]}$ would be empty. Bringing $P_2$ directly under $P_0$ would have a similar effect. But these are assumptions on the dependence of the $U_z(d)$ on instruments. The first one imposes $\Delta_0(2) = \Delta_1(2)$ and the second one imposes $\Delta_0(1) = \Delta_2(1)$. To put it differently, we are now requiring that instrument $z = t$, which maximizes $\Delta_z(t) = U_z(t) - U_z(0)$, should not shift assignment between the other values of the treatment. This can be interpreted as policy regime $z = 1$ (resp. $z = 2$) subsidizing treatment $t = 1$ (resp. $t = 2$) only.

The following assumption is a direct extension of the discussion above to our general discrete model.
Assumption 7 (Strict Targeting) . Take any targeted treatment value t P T ˚ . Then thefunction z P Z Ñ ∆ z p t q takes the same value for all z R ¯ Z p t q . We denote this common valueby ∆ t , and we will say of the instrument values z P ¯ Z p t q that they strictly target t . z P ¯ Z p t q promotes treatment t withoutaffecting the relative mean utilities of other treatment values. This explains our use of theterm “strict targeting”. To return to the analogy with a discrete choice model, an instrumentin ¯ Z p t q plays the role of a price discount on good t in a model of demand for goods whosemean utilities only depend on their own prices. In the language of program subsidies, all z P ¯ Z p t q subsidize t at the same high rate, and all other instrument values offer the same,lower subsidy (which could be zero or negative).Note that while we only state the assumption for t P T ˚ , it holds by definition for all t P T z T ˚ . Since ¯ Z p t q “ Z for these treatment values, ∆ t “ ¯∆ t is the common value of ∆ z p t q over all of Z .Moreover, Assumption 7 only bites for a given t P T ˚ if Z z ¯ Z p t q has at least two values.Since ¯ Z p t q is never empty, this shows that Assumption 7 automatically holds if |Z| “ ˆ p q “ ∆ ˆ p q and∆ ˆ p q “ ∆ ˆ p q (so that z “ ˆ t “ z “ ˆ t “ z “ z “ t “
1, and z “ Z p q “ t u yet ∆ p q ă ∆ p q . Example 10 (Tuition Subsidies) . To shed light on Assumption 7, consider two types ofpolicies aimed at making education more affordable. Our first policy consists of field-specifictuition subsidies. Each individual i is offered randomly a choice of m i ě Z i of fields. If m i ě
1, the individual may choose to use a voucher to study in a fieldin Z i , to study in another field, or not pursue education. Let T i denote this choice, with T i “ t ‰
0, the value of ∆ z p t q is highest when t P z as a vouchercan be used. Therefore ¯ Z p t q is the set of menus of vouchers that include field t ; and T ˚ isthe set of fields for which a voucher is sometimes, but not always offered. Whether ¯ Z p t q isa single menu or not, all other menus of vouchers yield the same ∆ z p t q : the field t is strictlytargeted .Another possible policy consists in subsidizing tuition for every year of study in the hopeof increasing the number of years of education. Now z is a subsidy rate, and t the number Note that iff m i ď z “ t F, K u and z “ t G, L u fails the second part of Assumption 6; a set z “ t F, K u and z “ t G, K u fails both parts.
15f years of education. Since a higher subsidy rate reduces the cost of education, for any t the function ∆ z p t q achieves its maximum ¯∆ t for the highest subsidy ¯ z on offer: for each t ,¯ Z p t q “ t ¯ z u and Assumption 7 fails. More importantly, if |Z| ą t ą
0, thevalue of ∆ z p t q increases with z ‰ ¯ z . Strict targeting would clearly not be an appropriateassumption in this setting.Extending our geometric illustration of Example 3, let P z be the point in IR |T ˚ | withcoordinates p´ ∆ z p t qq t P T ˚ . Under Assumption 7, the point P z has its t coordinate equal to ´ ∆ t on any axis t which it targets ( t P ¯ T p z q ), and ´ ∆ t on any other axis. Since ´ ∆ t ą ´ ∆ t ,two points P z and P z have the same coordinate on any axis t R ¯ T p z q Ť ¯ T p z q ; and P z isbelow P z on axis t if t P ¯ T p z qz ¯ T p z q .Now suppose that in addition to Z ˚ , the set of instruments contains at least two values z and z . Since neither targets any treatment, under Assumption 7 ∆ z p t q “ ∆ z p t q “ t P T ˚ . Moreover, ∆ z p t q equals ∆ t for all z P Z if t R T ˚ . This implies that thecounterfactual treatments T i p z q and T i p z q must be equal for any observation i . In thatsense, z is superfluous and we can aggregate it with z in a category that we will call z “ z ‰ P is above the point P z on any axis t P ¯ T p z q .We summarize this in Lemma 1. Lemma 1 (Some consequences of strict targeting) . Under Assumption 4 and Assumption 7,(i) The coordinates of two points P z and P z in IR |T ˚ | coincide on any axis t that is notin the symmetric difference ¯ T p z q △ ¯ T p z q .(ii) If z P ¯ Z p t q and z P ¯ Z p t q , the point P z is above the point P z on the axis t .(iii) The set of instrument values Z is either Z ˚ , or the union of Z ˚ and of a singleinstrument value that we denote z R Z ˚ . In the latter case, for any z ‰ z the point P z is below the point P z on any axis t P ¯ T p z q , and it has the same coordinates on allother axes. For simplicity, if such a z exists we denote z “ . Just as we chose to denote our reference treatment as t “
0, our choice of z “

Definition 5 (Preferred targeted and alternative treatments). Take any observation $i$ in the population.
(i) For $z \in \mathcal{Z}^*$, let
$$V_i^*(z) = \max_{t \in \bar{\mathcal{T}}(z)} (\bar{\Delta}_t + u_{it})$$
and let $T_i^*(z) \subset \bar{\mathcal{T}}(z)$ denote the set of maximizers. We call the elements of $T_i^*(z)$ the preferred targeted treatments.
(ii) Also define
$$\Delta_i^* = \max_{t \in \mathcal{T}} (\Delta_t + u_{it})$$
and let $\tau_i^* \subset \mathcal{T}$ denote the set of maximizers. We call the elements of $\tau_i^*$ the preferred alternative treatments.

Under strict targeting, an observation $i$ can react to being assigned an instrument $z$ in two ways. If $z$ is in $\mathcal{Z}^*$, then $i$ can choose among the treatments that $z$ targets. Alternatively, it may choose as if no treatment was targeted (as it must if $z$ is not in $\mathcal{Z}^*$). We now make this more rigorous by proving that observations can only opt for one of their preferred targeted treatments, if any, or for one of their preferred alternative treatments.

By Lemma 1, $\mathcal{Z}$ is either $\mathcal{Z}^*$ or $\mathcal{Z}^* \cup \{0\}$. We now state our main result on response-groups.

Proposition 2 (Unfiltered response groups under strict targeting). Let Assumptions 4 and 7 hold. Then for every observation $i$,
(i) if $z \in \mathcal{Z}^*$, then $T_i(z)$ can only be in $T_i^*(z)$ or in $\tau_i^*$;
(ii) if $\mathcal{Z} \neq \mathcal{Z}^*$, then $T_i(0) \in \tau_i^*$.

For simplicity, we work from now on under the assumption that the distribution of the error terms in the AS-DCM has no mass points. Then the sets $\tau_i^*$ and $T_i^*(z)$ are singletons with probability 1; with a minor abuse of notation, we let $\tau_i^*$ and $T_i^*(z)$ denote their elements. (Note that this does not extend to the sets $\bar{\mathcal{Z}}(t)$ and $\bar{\mathcal{T}}(z)$, which can still have several elements.)

Assumption 8 (Absolutely continuous errors). The distribution of the random vector $(u_{it})_{t \in \mathcal{T}}$ is absolutely continuous.

Proposition 3 (Unfiltered classes under strict targeting). Under Assumptions 4, 7, and 8, the population contains at most two subpopulations denoted by $\mathcal{P}_1$ and $\mathcal{P}_2$.
(i) Subpopulation $\mathcal{P}_1$ can only exist if $\mathcal{Z} = \mathcal{Z}^*$. If $i \in \mathcal{P}_1$, then $T_i(z) = T_i^*(z)$ for all $z \in \mathcal{Z}$.
(ii) Subpopulation $\mathcal{P}_2$ consists of classes denoted by $c(A, \tau)$, where $A$ is a possibly empty subset of $\mathcal{Z}^*$ and $\tau$ is a treatment value. If observation $i$ is in $c(A, \tau)$, then the following holds.
• $T_i(z) = T_i^*(z)$ for all $z \in A$.
• If $A \neq \mathcal{Z}$, then $\tau_i^* = \tau$; and for all $z \in \mathcal{Z} \setminus A$, $T_i(z) = \tau$.
• If $A \neq \mathcal{Z}$ and $\tau \in \mathcal{T}^*$, then $\bar{\mathcal{Z}}(\tau) \subset A$.
(iii) If $\mathcal{Z} = \mathcal{Z}^*$, then there is no class in $\mathcal{P}_2$ with $A = \mathcal{Z}^*$.

Proposition 3 has a straightforward corollary under one-to-one targeting (Assumption 6). Recall that under one-to-one targeting, the sets $\bar{\mathcal{Z}}(t)$ and $\bar{\mathcal{T}}(z)$ are singletons and we can identify each targeting instrument with the treatment it targets. As a consequence, $T_i^*(z) = z$ for each $z$ in $\mathcal{Z}$, and if $\tau \in \mathcal{T}^*$ then $\bar{\mathcal{Z}}(\tau) = \{\tau\}$. This simplifies the statement of our characterization result.

Corollary 1 (Unfiltered classes under strict, one-to-one targeting). Under Assumptions 4, 6, 7, and 8, the population contains at most two subpopulations denoted by $\mathcal{P}_1$ and $\mathcal{P}_2$.
(i) Subpopulation $\mathcal{P}_1$ can only exist if $\mathcal{Z} = \mathcal{Z}^*$. If $i \in \mathcal{P}_1$, then $T_i(z) = z$ for all $z \in \mathcal{Z}$.
(ii) Subpopulation $\mathcal{P}_2$ consists of classes denoted by $c(A, \tau)$, where $A$ is a possibly empty subset of $\mathcal{Z}^*$ and $\tau$ is a treatment value. If observation $i$ is in $c(A, \tau)$, then the following holds.
• $T_i(z) = z$ for all $z \in A$.
• If $A \neq \mathcal{Z}$, then $\tau_i^* = \tau$; and for all $z \in \mathcal{Z} \setminus A$, $T_i(z) = \tau$.
• If $A \neq \mathcal{Z}$ and $\tau \in \mathcal{T}^*$, then $\tau \in A$.
(iii) If $\mathcal{Z} = \mathcal{Z}^*$, then there is no class in $\mathcal{P}_2$ with $A = \mathcal{Z}^*$.

The subpopulation $\mathcal{P}_1$, when it exists, regroups "super-compliers": they always take the treatment that is targeted by the instrument value they were assigned. For example, if $\mathcal{Z} = \mathcal{Z}^* = \{1, 2, 3\}$, under strict one-to-one targeting this subpopulation would be the response group $C_{123}$.
It is easy to see from the proof that an observation $i$ belongs to $\mathcal{P}_1$ if and only if for all $z \in \mathcal{Z} = \mathcal{Z}^*$, $V_i^*(z) > \Delta_i^*$.

Given any (possibly empty) subset $A$ of $\mathcal{T}^*$ and a treatment value $\tau$, an observation $i$ belongs to $c(A, \tau)$ if and only if
• for all $z \in A$, $V_i^*(z) > \Delta_i^*$;
• for all $z \in \mathcal{Z}^* \setminus A$, $V_i^*(z) < \Delta_i^*$;
• $\Delta_i^* = \Delta_\tau + u_{i\tau}$.

First consider the case when $A$ is empty. Whatever the value of the instrument $z$ is, an observation $i$ in $c(\emptyset, \tau)$ will take up the treatment $\tau$ that maximizes $u_{it}$ over $\mathcal{T}$. Such observations are always-takers of $\tau$. In the polar case $A = \mathcal{Z}^*$, when it is assigned a targeting instrument value ($z \in \mathcal{Z}^*$), the observation complies by picking one of the treatments it targets ($T_i(z) = T_i^*(z)$, which is $z$ under one-to-one targeting). When both $A$ and $\mathcal{Z} \setminus A$ are non-empty, the observation complies when the instrument $z$ is in $A$, and it does not respond to changes in the value of $z$ when it is in $\mathcal{Z} \setminus A$.

[Figure 4: An unfiltered class $c(A, \tau)$ under strict one-to-one targeting]

Figure 4 represents the mapping of instruments to treatments for an observation $i$ in population $\mathcal{P}_2$ under strict one-to-one targeting. We illustrate a case for which $\mathcal{Z}^* = \mathcal{Z} \setminus \{0\}$, $\mathcal{Z}^* \setminus A$ is not empty, and $\tau \in A$. The white area shows that treatment values in $\mathcal{T}^* \setminus A$ are not assigned.

To illustrate Corollary 1, we return to the ternary/ternary model of Example 3, where $\mathcal{Z}^* = \mathcal{T}^* = \{1, 2\}$ and $\mathcal{Z} = \mathcal{T} = \{0, 1, 2\}$.
• $\mathcal{P}_1$ does not exist.
• $A$ can be $\emptyset$, $\{1\}$, $\{2\}$, or $\{1, 2\}$, with corresponding values of $\tau$ in $\{0\}$, $\{0, 1\}$, $\{0, 2\}$, or $\{0, 1, 2\}$ respectively. The class $c(\emptyset, 0)$ corresponds to the always-takers of 0, $A_0 = C_{000}$. For $A = \{1\}$ we get $C_{010}$ and $A_1$, and for $A = \{2\}$ we get $C_{002}$ and $A_2$.
Finally,with A “ t , u we obtain the composite response group C ˚ “ C Ť C Ť C .19igure 5: Unfiltered, strictly one-to-one targeted treatment: ternary/ternary model u i ´ u i u i ´ u i P C C C A A A C C P P P The eight elemental response groups are illustrated in Figure 5, again with the ori-gin in P . Comparing Figure 5 with Figure 3 shows the identifying power of Assump-tion 7. Kirkeboen, Leuven, and Mogstad (2016) used a ternary-ternary model in their in-vestigation of field of study and earnings. We show in Appendix C.1 that our combinationof Assumption 4 and Assumption 7 yields exactly the same identifying restrictions as inKirkeboen, Leuven, and Mogstad (2016), by a quite different path.Figure 6: Unfiltered, strict one-to-one targeting: ternary/binary model with no control u i ´ u i u i ´ u i P P C C A A A C Our next example has Z “ Z ˚ : all individuals are assigned a targeting instrument. Example 11 (Ternary/binary model with no control) . Let us return to Example 9, consider T “ t , , u , and Z “ Z ˚ “ t , u : z “ t “ z “ t “
2. 20ow the subpopulation P exists; it corresponds to the response group C of super-compliers. A can be H , with τ “
0; it can be t u , with τ P t , u ; or it can be t u , with τ Pt , u . This generates response groups A ; C and A ; and C and A . These six elementalresponse-groups are represented in Figure 6, where we put the origin at u i “ u i “ u i sincethere is no P point any more.Sometimes one can obtain the characterization in Corollary 1 with a weaker assumptionthan Assumption 6. To see this, consider the following variant of Example 8. Example 12 (Only one type of subsidy) . Assume that T “ t , , u and Z “ t , u . Weinterpret z “ t “
1, and z “ t “ p q ą ∆ p q and ∆ p q “ ∆ p q ; we have¯ Z p q “ t u , ¯ Z p q “ t , u “ Z , and T ˚ “ Z ˚ “ t u . Since we only have a binary instrument,strict targeting holds in this example.The subpopulation P cannot exist here since z “ Z ˚ . In subpopulation P , wecan have classes A “ H with τ P t , u , and A “ t u with τ P T . The former generates thealways-takers groups A “ C and A “ C , and the latter has the two groups of compliers C and C and the always-taker group A “ C . These five elemental response-groups areillustrated in Figure 7.Figure 7: Unfiltered, targeted treatment: ternary/binary model with only one type of subsidy u i ´ u i u i ´ u i P C A A A C P P If we had not imposed ∆ p q “ ∆ p q , Assumption 7 would still hold but t “ T ˚ . If for instance t “ t “ t “ p q ą ∆ p q and ¯ Z p q “ t u , so that T ˚ “ t , u . We wouldnot have one-to-one targeting anymore since z “ t “ t “ C , with A “ t u and τ “ t “ t “ t “ p U p q ´ U p qq ´ p U p q ´ U p qq “ p ∆ p q ´ ∆ p qq ´ p ∆ p q ´ ∆ p qq ą . This is enough to rule out the possibility of the response group C . To see this, assume that T i p q “
1. This implies U p q ` u i ą U p q ` u i , so that U p q ` u i ą U p q ` p U p q ´ U p qq ` u i ą U p q ` u i and T i p q cannot be 2. Now that we have characterized response-groups, we seek to identify the probabilities of thecorresponding response-groups in the unfiltered treatment model.
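The characterization in Corollary 1 can be checked mechanically on small models. The sketch below is illustrative only: it assumes that each targeting instrument value carries the same label as the treatment it targets, writes a response group as the tuple of counterfactual treatments $(T_i(z))_{z \in \mathcal{Z}}$, and uses a hypothetical function name, `enumerate_response_groups`, that does not come from the paper.

```python
from itertools import combinations

def enumerate_response_groups(treatments, instruments, targeted):
    """List the response vectors (T_i(z) for z in instruments) allowed by Corollary 1.

    Assumption of this sketch: each value in `targeted` (the set Z*) has the
    same label as the treatment it targets (strict, one-to-one targeting).
    """
    untargeted = [t for t in treatments if t not in targeted]   # T \ T*
    has_control = any(z not in targeted for z in instruments)   # True iff Z != Z*
    groups = set()
    for size in range(len(targeted) + 1):
        for A in combinations(targeted, size):
            if not has_control and size == len(targeted):
                continue  # when Z = Z*, no class c(A, tau) has A = Z*
            for tau in list(A) + untargeted:  # tau ranges over A U (T \ T*)
                # comply on A, take the alternative tau everywhere else
                groups.add(tuple(z if z in A else tau for z in instruments))
    if not has_control:
        groups.add(tuple(instruments))  # super-compliers: T_i(z) = z for all z
    return sorted(groups)

# Ternary/ternary model: T = Z = {0, 1, 2}, Z* = {1, 2}
ternary_ternary = enumerate_response_groups([0, 1, 2], [0, 1, 2], [1, 2])
# Ternary/binary model with no control: T = {0, 1, 2}, Z = Z* = {1, 2}
no_control = enumerate_response_groups([0, 1, 2], [1, 2], [1, 2])
```

In the first call the tuple `(0, 1, 2)` is the group that complies with both targeting instruments; in the second, `(1, 2)` is the super-complier group.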
Definition 6 (Generalized propensity scores). We write $P(t|z)$ for the generalized propensity score $\Pr(T_i = t \mid Z_i = z)$.

Under Assumptions 6 and 7, the response-groups are easily enumerated.
Proposition 4 (Counting response-groups under strict one-to-one targeting). Under Assumptions 4, 6, 7, and 8, the number of response-groups is
$$N = (2|\mathcal{T}| - |\mathcal{Z}^*|) \times 2^{|\mathcal{Z}^*| - 1} - (|\mathcal{T}| - 1)\,\mathbb{1}(\mathcal{Z} = \mathcal{Z}^*).$$

The data gives us the generalized propensity scores $P(t|z) = \Pr(T_i = t \mid Z_i = z)$ for $(t, z) \in \mathcal{T} \times \mathcal{Z}$. The adding-up constraints $\sum_{t \in \mathcal{T}} P(t|z) = 1$ for each $z \in \mathcal{Z}$ reduce the count of independent data points to $|\mathcal{T}^*| \times |\mathcal{Z}|$. As the probabilities of the response-groups must sum to one, we have $(N - 1)$ unknowns.

Table 1 shows some values of the number of equations $|\mathcal{T}^*| \times |\mathcal{Z}|$ and the number of unknowns $(N - 1)$ for a number of examples. The first row, with $|\mathcal{T}| = |\mathcal{Z}| = 2$, is the standard LATE case with never-takers ($A_0$), compliers ($C_{01}$), and always-takers ($A_1$). Rows (2) and its extension (3) show another case of exact identification. In other rows, as $|\mathcal{T}|$ gets larger, the degree of underidentification tends to increase.

Table 1: Number of required identifying restrictions: unfiltered treatment under strict, one-to-one targeting

| Row | $\mathcal{T}$ | $\mathcal{Z}$ | $\mathcal{Z}^*$ | $N - 1$ | $\lvert\mathcal{Z}\rvert \times \lvert\mathcal{T}^*\rvert$ | Required | Example |
|---|---|---|---|---|---|---|---|
| (1) | $\{0,1\}$ | $\{0,1\}$ | $\{1\}$ | 2 | 2 | 0 | LATE |
| (2) | $\{0,1,2\}$ | $\{0,1\}$ | $\{1\}$ | 4 | 4 | 0 | |
| (3) | $\{0,\ldots,\lvert\mathcal{T}\rvert-1\}$ | $\{0,1\}$ | $\{1\}$ | $2(\lvert\mathcal{T}\rvert-1)$ | $2(\lvert\mathcal{T}\rvert-1)$ | 0 | Example 8 |
| (4) | $\{0,1,2\}$ | $\{1,2\}$ | $\{1,2\}$ | 5 | 4 | 1 | Example 11 |
| (5) | $\{0,1,2\}$ | $\{0,1,2\}$ | $\{1,2\}$ | 7 | 6 | 1 | Example 3 |
| (6) | $\{0,1,2,3\}$ | $\{0,1,2\}$ | $\{1,2\}$ | 11 | 9 | 2 | |
| (7) | | | | | | | |
| (8) | $\{0,1,2,3\}$ | $\{1,2,3\}$ | $\{1,2,3\}$ | 16 | 9 | 7 | |
| (9) | $\{0,1,2,3\}$ | $\{0,1,2,3\}$ | $\{1,2,3\}$ | 19 | 12 | 7 | |

It is not difficult to write down the equations that link observed propensity scores and group probabilities.

Proposition 5 (Identifying equations for response-groups: unfiltered treatment under strict one-to-one targeting). For any subset $A$ of $\mathcal{Z}^*$, let $A^+$ denote the set $A \cup (\mathcal{T} \setminus \mathcal{T}^*)$. Under Assumptions 1, 2, 4, 6, 7, and 8, the empirical content of the generalized propensity scores of the unfiltered treatment model is the following system of equations:
(i) If $\mathcal{Z} \neq \mathcal{Z}^*$:
• for $z \in \mathcal{Z}^*$ and $t \in \mathcal{T}$:
$$P(t|z) = \sum_{A \subset \mathcal{Z}^* \setminus \{z\}} \mathbb{1}(t \in A^+) \Pr(c(A, t)) + \mathbb{1}(t \in \mathcal{Z}^*, t = z) \sum_{\substack{A \subset \mathcal{Z}^* \\ z \in A}} \sum_{\tau \in A^+} \Pr(c(A, \tau)). \tag{2.1}$$
• for $z \notin \mathcal{Z}^*$ and $t \in \mathcal{T}$:
$$P(t|z) = \sum_{A \subset \mathcal{Z}^*} \mathbb{1}(t \in A^+) \Pr(c(A, t)). \tag{2.2}$$
(ii) If $\mathcal{Z} = \mathcal{Z}^*$, for $t \in \mathcal{T}$:
$$P(t|z) = \sum_{A \subset \mathcal{Z} \setminus \{z\}} \mathbb{1}(t \in A^+) \Pr(c(A, t)) + \mathbb{1}(t = z) \Bigg( \Pr(\mathcal{P}_1) + \sum_{\substack{A \subset \mathcal{Z},\, A \neq \mathcal{Z} \\ z \in A}} \sum_{\tau \in A^+} \Pr(c(A, \tau)) \Bigg). \tag{2.3}$$

Proposition 5 can be applied directly to some of the rows of Table 1. According to the table, our Example 8 is just identified under strict, one-to-one targeting. Proposition 6 confirms it and gives explicit formulæ, along with simple testable predictions. To avoid repetitions, in the remainder of Section 2, we assume that Assumptions 1, 2, 4, 6, 7, and 8 hold with $\mathcal{D} = \mathcal{T}$.

Proposition 6 (Response-group probabilities in Example 8). The following probabilities are identified:
$$\Pr(A_1) = P(1|0), \qquad \Pr(A_t) = P(t|1) \text{ for } t \neq 1, \qquad \Pr(C_t) = P(t|0) - P(t|1) \text{ for } t \neq 1. \tag{2.4}$$
The model has $(|\mathcal{T}| - 1)$ testable predictions: $P(t|0) \geq P(t|1)$ for $t \neq 1$.

Row (5) of Table 1 is the ternary/ternary model of Example 3, in which eight elemental groups are non-empty. One restriction is missing to point-identify the probabilities of all eight response-groups.
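The order condition behind Table 1 is easy to tabulate. The following sketch is not taken from the paper: it reads the count in Proposition 4 as $N = (2|\mathcal{T}| - |\mathcal{Z}^*|) \times 2^{|\mathcal{Z}^*| - 1} - (|\mathcal{T}| - 1)\mathbb{1}(\mathcal{Z} = \mathcal{Z}^*)$, and it assumes that the number of independent propensity scores is $|\mathcal{Z}| \times (|\mathcal{T}| - 1)$, which agrees with the numeric entries of Table 1 that can be read directly. The function names are hypothetical.

```python
def count_response_groups(n_treatments, n_targeted, has_control):
    """Number N of response groups under strict, one-to-one targeting.

    n_treatments = |T|, n_targeted = |Z*|, has_control is True iff Z != Z*.
    """
    # (2|T| - |Z*|) * 2^(|Z*| - 1), written with integer arithmetic only
    n = (2 * n_treatments - n_targeted) * 2 ** n_targeted // 2
    if not has_control:
        n -= n_treatments - 1  # correction term when Z = Z*
    return n

def identification_gap(n_treatments, n_targeted, has_control):
    """Return (unknowns, equations, missing restrictions)."""
    n_instruments = n_targeted + (1 if has_control else 0)
    unknowns = count_response_groups(n_treatments, n_targeted, has_control) - 1
    # assumption: |Z| * (|T| - 1) independent propensity scores after adding-up
    equations = n_instruments * (n_treatments - 1)
    return unknowns, equations, unknowns - equations

late = identification_gap(2, 1, True)             # binary treatment, binary instrument
ternary_ternary = identification_gap(3, 2, True)  # T = Z = {0, 1, 2}
```

A gap of zero corresponds to exact identification of the group probabilities; a positive gap counts the restrictions that are still missing.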
The following proposition shows that the probabilities of four of the eight elemental groups are point-identified: two groups of always-takers, and two groups of compliers. In addition, the probabilities of two composite groups of compliers are point-identified. The other four probabilities are constrained by three adding-up constraints.
Proposition 7 (Response-group probabilities in the ternary/ternary model of Example 3) . he following probabilities are identified: Pr p A q “ P p | q , Pr p A q “ P p | q , Pr p C q “ P p | q ´ P p | q , Pr p C q “ P p | q ´ P p | q , Pr p C Ť C q “ P p | q ´ P p | q , Pr p C Ť C q “ P p | q ´ P p | q , Pr p C Ť C Ť C Ť A q “ P p | q . (2.5) The model has the following testable implications: P p | q ě P p | q (2.6) P p | q ě P p | q (2.7) P p | q ě max p P p | q , P p | qq . (2.8)The model of Example 11 is equally easy to analyze. The probabilities of two groups ofalways-takers are point-identified, and two equations link the probabilities of the other threeelemental groups. Proposition 8 (Response-group probabilities in the ternary/binary model of Example 11) . The following probabilities are identified: Pr p A q “ P p | q , Pr p A q “ P p | q , Pr p C Ť C q “ P p | q ´ P p | q , Pr p C Ť A q “ P p | q , Pr p C Ť A q “ P p | q . (2.9) The model has the following testable implication: (2.10) P p | q ě P p | q . We now establish identification of treatment effects for the complier groups whose probabil-ities are identified. To simplify the exposition, we introduce one more element of notation.25 efinition 7 (Conditional average group outcomes) . For any z P Z , t P T , and for anyresponse group C with nonzero probability, we define E z p t | C q “ E p Y i p T i “ t q| Z i “ z, i P C q and we call it the conditional average group outcome . We define the conditional averageoutcome by ¯ E z p t q “ E p Y i p T i “ t q| Z i “ z q . To give a trivial example, the LATE formula (row (1) of Table 1) is E p Y i p q| i P C q “ ¯ E p q ´ ¯ E p q P p | q ´ P p | q and E p Y i p q| i P C q “ ¯ E p q ´ ¯ E p q P p | q ´ P p | q , yielding the familiar form: E p Y i p q ´ Y i p q| i P C q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p T i “ | Z i “ q ´ Pr p T i “ | Z i “ q . 
While the $\bar{E}_z(t)$ are directly identified from the data, the conditional average group outcomes of course are not. We do know that some of them are zero; and that they combine with the group probabilities to form the observed conditional average outcomes. We will use the following identity repeatedly:

Lemma 2 (Decomposing conditional average outcomes). Let $z \in \mathcal{Z}$ and $t \in \mathcal{T}$. Then
$$\bar{E}_z(t) = \sum_{C:\, C(z) = t} E(Y_i(t) \mid i \in C) \Pr(i \in C),$$
where $C(z) = t$ means that response group $C$ has treatment $t$ when assigned instrument $z$. In addition,
$$E(Y_i \mid Z_i = z) = \sum_{t \in \mathcal{T}} \bar{E}_z(t).$$

First consider Example 8, where the probabilities of all $(2|\mathcal{T}| - 1)$ response groups are identified (Proposition 6).

Proposition 9 (Identification in the ternary/binary model under strict one-to-one targeting). The following quantities are point-identified:
$$E[Y_i(1) \mid i \in A_1] = \frac{\bar{E}_0(1)}{P(1|0)},$$
$$E[Y_i(t) \mid i \in A_t] = \frac{\bar{E}_1(t)}{P(t|1)} \text{ for } t \neq 1,$$
$$E[Y_i(t) \mid i \in C_t] = \frac{\bar{E}_0(t) - \bar{E}_1(t)}{P(t|0) - P(t|1)} \text{ for } t \neq 1.$$
However, the standard Wald estimator only partially identifies the average treatment effects on the complier groups $C_t$:
$$\frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{\Pr(D_i = 1 \mid Z_i = 1) - \Pr(D_i = 1 \mid Z_i = 0)} = \frac{(\bar{E}_1(1) - \bar{E}_0(1)) - \sum_{t \neq 1} (\bar{E}_0(t) - \bar{E}_1(t))}{P(1|1) - P(1|0)} = \sum_{t \neq 1} \alpha_t E[Y_i(1) - Y_i(t) \mid i \in C_t], \tag{2.11}$$
where the weights $\alpha_t = \Pr(i \in C_t \mid i \in \bigcup_{\tau \neq 1} C_\tau) = (P(t|0) - P(t|1)) / (P(1|1) - P(1|0))$ are positive and sum to 1.

Proposition 9 shows that we only identify a convex combination (with point-identified weights) of the ATEs on the $|\mathcal{T}^*|$ complier groups. It is possible to bound the average treatment effects in a straightforward manner if we assume that the support of $Y_i$ is known and finite. Alternatively, we may add conditions to achieve point identification of average treatment effects for the compliers.
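The decomposition of the Wald ratio in Proposition 9 can be illustrated numerically. The sketch below assumes a binary instrument with $z = 1$ targeting $t = 1$, takes $\bar{E}_z(t)$ to be $E(Y_i \mathbb{1}(T_i = t) \mid Z_i = z)$, and uses hypothetical population values; `binary_instrument_summary` is an illustrative name, not the paper's.

```python
from fractions import Fraction as F

def binary_instrument_summary(P, Ebar):
    """P[z][t] = Pr(T = t | Z = z); Ebar[z][t] = E(Y * 1(T = t) | Z = z); z in {0, 1}."""
    others = [t for t in range(len(P[0])) if t != 1]
    first_stage = P[1][1] - P[0][1]
    # E(Y | Z = z) is the sum of Ebar[z][t] over t (Lemma 2)
    wald = (sum(Ebar[1]) - sum(Ebar[0])) / first_stage
    # weight of complier group C_t in the Wald ratio
    weights = {t: (P[0][t] - P[1][t]) / first_stage for t in others}
    # E[Y(t) | C_t]: mean outcome of C_t in the treatment it leaves
    own_mean = {t: (Ebar[0][t] - Ebar[1][t]) / (P[0][t] - P[1][t]) for t in others}
    return wald, weights, own_mean

# Hypothetical population with T = {0, 1, 2}: shares A_0 = 2/10, A_1 = 1/10,
# A_2 = 1/10, C_0 = 4/10 (ATE 2), C_2 = 2/10 (ATE 4).
P = [[F(6, 10), F(1, 10), F(3, 10)],
     [F(2, 10), F(7, 10), F(1, 10)]]
Ebar = [[F(6, 10), F(5, 10), F(6, 10)],
        [F(2, 10), F(29, 10), F(2, 10)]]
wald, weights, own_mean = binary_instrument_summary(P, Ebar)
```

In this example the Wald ratio equals the weighted average $\tfrac{2}{3} \times 2 + \tfrac{1}{3} \times 4$ of the two complier-group effects, which is neither of the two effects themselves.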
Assuming that the ATEs are all equal is one obvious solution. Another one is to assume the homogeneity of the average outcomes under treatment.

Corollary 2 (Treatment effects in the one-subsidy model). Suppose that the average counterfactual outcomes under treatment are identical for all complier groups:
$$E[Y_i(1) \mid i \in C_t] \text{ does not depend on } t \neq 1. \tag{2.12}$$
Then the average treatment effects for all complier groups $C_t$ are point-identified:
$$E[Y_i(1) - Y_i(t) \mid i \in C_t] = \frac{\bar{E}_1(1) - \bar{E}_0(1)}{P(1|1) - P(1|0)} - \frac{\bar{E}_0(t) - \bar{E}_1(t)}{P(t|0) - P(t|1)}.$$

To interpret the homogeneity condition in (2.12), suppose that we are concerned with the effect of one subsidized program ($t = 1$) when other, unsubsidized programs ($t > 1$) are also available. Then (2.12) imposes that outcomes for compliers (who switch to the subsidized program when offered a subsidy) are on average the same regardless of where the compliers switched from.

We now move on to the ternary/ternary model in Example 3. As we mentioned earlier, in this example our assumptions allow us to use the results of Kirkeboen, Leuven, and Mogstad (2016). Their Proposition 2 tells us that
$$\beta_1 = E[Y_i(1) - Y_i(0) \mid i \in C_{010} \cup C_{012}],$$
$$\beta_2 = E[Y_i(2) - Y_i(0) \mid i \in C_{002} \cup C_{012}],$$
where $\beta_1$ and $\beta_2$ are the probability limits of the instrumental variable estimators in
$$Y_i = \beta_0 + \beta_1 \mathbb{1}(T_i = 1) + \beta_2 \mathbb{1}(T_i = 2) + \varepsilon_i. \tag{2.13}$$
We now show that we can also identify the average treatment effects for the response groups $C_{212}$ and $C_{112}$, whose probabilities are point-identified.

Proposition 10 (Identification of treatment effects for Example 3). The average treatment effects of $C_{212}$ and $C_{112}$ are identified:
$$E[Y_i(1) - Y_i(2) \mid i \in C_{212}] = \frac{(E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]) - \beta_1 (P(0|0) - P(0|1))}{P(2|0) - P(2|1)}$$
and
$$E[Y_i(2) - Y_i(1) \mid i \in C_{112}] = \frac{(E[Y_i \mid Z_i = 2] - E[Y_i \mid Z_i = 0]) - \beta_2 (P(0|0) - P(0|2))}{P(1|0) - P(1|2)}.$$

The average treatment effect $E[Y_i(1) - Y_i(2) \mid i \in C_{212}]$ brings interesting information of a different nature than $\beta_1 = E[Y_i(1) - Y_i(0) \mid i \in C_{010} \cup C_{012}]$, which Kirkeboen, Leuven, and Mogstad (2016) focus on. We can illustrate this on the choice of college education, using a special case of Example 10. Let $z = 1$ be a STEM subsidy and $z = 2$ a subsidy for other college studies; let $t = 0$ denote no college, $t = 1$ a STEM major, and $t = 2$ a non-STEM major; and let $Y$ be later earnings. The response groups $C_{010}$, $C_{012}$, and $C_{212}$ all consist of individuals who will study STEM if and only if they receive a STEM subsidy. On the other hand, individuals in $C_{010} \cup C_{012}$ will not go to college unless they receive a subsidy, while those in $C_{212}$ are "college always-takers". These are quite different populations and there is no reason to expect that the effect of a STEM major on their future earnings should be the same, even on average.
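The identification argument for the two point-identified complier groups can be verified on a small constructed population. This is a sketch under explicit assumptions: response groups are labelled by $(T_i(0), T_i(1), T_i(2))$, the shares and mean counterfactual outcomes are hypothetical, and $\beta_1$ and $\beta_2$ are supplied as known inputs (set here to their population values) rather than estimated.

```python
from fractions import Fraction as F

# Hypothetical population: response vector -> (share, {t: mean of Y(t)})
population = {
    (0, 0, 0): (F(2, 20), {0: 1}),
    (0, 1, 0): (F(3, 20), {0: 1, 1: 3}),
    (0, 0, 2): (F(1, 20), {0: 1, 2: 2}),
    (0, 1, 2): (F(4, 20), {0: 2, 1: 5, 2: 4}),
    (1, 1, 1): (F(2, 20), {1: 4}),
    (1, 1, 2): (F(2, 20), {1: 3, 2: 6}),
    (2, 2, 2): (F(4, 20), {2: 3}),
    (2, 1, 2): (F(2, 20), {1: 7, 2: 2}),
}

def observables(population):
    """Return P[z][t] = Pr(T = t | Z = z) and EY[z] = E(Y | Z = z)."""
    P = [[F(0)] * 3 for _ in range(3)]
    EY = [F(0)] * 3
    for response, (share, mean_y) in population.items():
        for z, t in enumerate(response):
            P[z][t] += share
            EY[z] += share * mean_y[t]
    return P, EY

def ate_c212(P, EY, beta1):
    """E[Y(1) - Y(2) | C_212], with beta1 = E[Y(1) - Y(0) | C_010 or C_012]."""
    return (EY[1] - EY[0] - beta1 * (P[0][0] - P[1][0])) / (P[0][2] - P[1][2])

def ate_c112(P, EY, beta2):
    """E[Y(2) - Y(1) | C_112], with beta2 = E[Y(2) - Y(0) | C_002 or C_012]."""
    return (EY[2] - EY[0] - beta2 * (P[0][0] - P[2][0])) / (P[0][1] - P[2][1])

P, EY = observables(population)
beta1 = F(18, 7)  # (3 * (3 - 1) + 4 * (5 - 2)) / 7 in this population
beta2 = F(9, 5)   # (1 * (2 - 1) + 4 * (4 - 2)) / 5 in this population
```

By construction the group `(2, 1, 2)` has a mean effect of $7 - 2 = 5$ and the group `(1, 1, 2)` a mean effect of $6 - 3 = 3$, which the two functions recover from observables and the supplied $\beta$ values.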
We now turn to filtered versions of the treatment model we analyzed in the previous section. That is, we consider a model with a treatment variable $D_i \in \mathcal{D}$, where the set of filtered treatment values $\mathcal{D}$ is a non-trivial partition of the set of unfiltered treatment values $\mathcal{T} = \{0, \ldots, |\mathcal{T}| - 1\}$. By definition, $2 \leq |\mathcal{D}| < |\mathcal{T}|$. We impose unordered monotonicity (Assumption 4) on the unfiltered treatment model.

Let $M : \mathcal{T} \to \mathcal{D}$ denote the "filtering map": for any $d \in \mathcal{D}$, the set of unfiltered $t$'s that generate the observation $D = d$ is $M^{-1}(d)$. The statistics that can be identified from the data are obtained by summing their unfiltered equivalent over $t \in M^{-1}(d)$.

To make this more precise, we add superscripts $T$ or $D$ to response groups, conditional probabilities and expectations to indicate whether they pertain to the unfiltered treatment model or to the filtered treatment model. For instance, $C^T$ refers to a response group in the unfiltered treatment model (a "$T$-response group"). The filtering map transforms $C^T$ into a "$D$-response group" $C^D$ straightforwardly: if $C^T(z) = t$, then $C^D(z) = M(t)$. Define $\bar{M}$ to be the component-by-component extension of $M$, so that $\bar{M}(C^T) \equiv (M(t_1), \ldots, M(t_{|\mathcal{Z}|}))$ for $(t_1, \ldots, t_{|\mathcal{Z}|}) \in C^T$. Then the $D$-response groups are
$$C^D = \bigcup_{C^T \mid \bar{M}(C^T) = C^D} C^T,$$
with probabilities
$$\Pr(i \in C^D) = \sum_{C^T \mid \bar{M}(C^T) = C^D} \Pr(i \in C^T).$$

We let $P^T(t|z)$ denote the generalized propensity scores, and $E^T_z(t|C^T)$ and $\bar{E}^T_z(t)$ the conditional average group outcomes and conditional average outcomes of Definition 7. Their equivalents in the filtered treatment model are
$$P^D(d|z) \equiv \Pr(D_i = d \mid Z = z) = \sum_{t \in M^{-1}(d)} P^T(t|z) \tag{3.1}$$
and, for a response group $C$,
$$E^D_z(d|C) \equiv E(Y_i \mathbb{1}(D_i = d) \mid Z_i = z, i \in C) = \sum_{t \in M^{-1}(d)} E^T_z(t|C).$$
Finally,
$$\bar{E}^D_z(d) \equiv E(Y_i \mathbb{1}(D_i = d) \mid Z_i = z) = \sum_{t \in M^{-1}(d)} \bar{E}^T_z(t). \tag{3.2}$$
Since we do not observe $T_i$, only the left-hand sides in Equation (3.1) and Equation (3.2) are identified from the data. Finally, we let $T_i(z)$ and $D_i(z)$ denote the counterfactual treatments, and $Y^T_i(t)$ and $Y^D_i(d)$ the counterfactual outcomes.

It would be easy, but perhaps not that useful, to translate the general results of Section 2.3 and Section 2.4 to the filtered treatment model. We choose to focus here on two useful classes of examples in which the unfiltered treatment model satisfies strict, one-to-one targeting.
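The action of a filtering map on propensity scores and on response groups can be sketched in a few lines. The example is hypothetical: it encodes $M$ as a list with `M[t]` the filtered value of unfiltered treatment `t`, assumes filtered values are labelled $0, \ldots, |\mathcal{D}| - 1$, and applies the filter $D_i = \mathbb{1}(T_i > 0)$ to made-up group shares in a ternary/ternary model with groups labelled by $(T_i(0), T_i(1), T_i(2))$.

```python
from fractions import Fraction as F

def filter_propensity_scores(P_T, M):
    """P_D[z][d] = sum of P_T[z][t] over the t with M[t] == d."""
    n_filtered = max(M) + 1  # assumes filtered values are 0, ..., |D| - 1
    P_D = []
    for row in P_T:
        filtered_row = [F(0)] * n_filtered
        for t, p in enumerate(row):
            filtered_row[M[t]] += p
        P_D.append(filtered_row)
    return P_D

def filter_response_groups(groups_T, M):
    """Apply M component by component and add up the probabilities."""
    groups_D = {}
    for response, prob in groups_T.items():
        key = tuple(M[t] for t in response)
        groups_D[key] = groups_D.get(key, F(0)) + prob
    return groups_D

M = [0, 1, 1]  # D = 1(T > 0)
groups_T = {   # hypothetical shares of the eight T-response groups
    (0, 0, 0): F(2, 20), (0, 1, 0): F(3, 20), (0, 0, 2): F(1, 20),
    (0, 1, 2): F(4, 20), (1, 1, 1): F(2, 20), (1, 1, 2): F(2, 20),
    (2, 2, 2): F(4, 20), (2, 1, 2): F(2, 20),
}
P_T = [[F(10, 20), F(4, 20), F(6, 20)],
       [F(3, 20), F(13, 20), F(4, 20)],
       [F(5, 20), F(2, 20), F(13, 20)]]
P_D = filter_propensity_scores(P_T, M)
groups_D = filter_response_groups(groups_T, M)
```

Here four distinct $T$-response groups collapse into the single $D$-response group `(1, 1, 1)`, which shows how filtering loses information about group composition.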
Let us first return to the binary instrument/multiple unfiltered treatment model (Example 8).Since z “ t “
1, it seems natural to start with a binary filteredtreatment: D i “ p T i “ q . This corresponds to a filtering map M defined by • M p q “ • M p t q “ t ‰ i took the targeted treatment;if not, then i could be in any other treatment cell.The mapping of T -response groups to D -response groups is straightforward. The groupsof always takers of treatment t “ d “ A D “ A T . The other always-takers mapinto the single group A D “ Ť t ‰ A Tt ; and the compliers C Tt combine into C D “ Ť t ‰ C Tt . Under M , we have P D p | z q “ P T p | z q for z “ ,
1. That is the sum of our information on groupprobabilities. Moving to treatment effects, we observe ¯ E Dz p q “ ¯ E Tz p q and(3.3) ¯ E Dz p q “ ÿ t ‰ ¯ E Tz p t q z “ , D -response group and a weighted LATE,with unknown weights this time. Proposition 11 (Identification in the filtered binary instrument model (1)) . (i) The prob-abilities of the three D -response groups are point-identified: Pr p A D q “ Pr p A T q “ P D p | q Pr p A D q “ ÿ t ‰ Pr p A Tt q “ ´ P D p | q Pr p C D q “ ÿ t ‰ Pr p C Tt q “ P D p | q ´ P D p | q . with the testable implication P D p | q ě P D p | q .(ii) The following counterfactual expectations are identified: E p Y Di p q| i P A D q “ ¯ E D p q ´ P D p | q , E p Y Di p q| i P A D q “ ¯ E D p q P D p | q . (iii) The standard Wald estimator identifies the following combination of LATEs: E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q “ p ¯ E D p q ´ ¯ E D p qq ´ p ¯ E D p q ´ ¯ E D p qq P D p | q ´ P D p | q“ E p Y Di p q| i P C D q ´ ÿ t ‰ α Tt E p Y Ti p t q| i P C Tt q , (3.4) where the numbers α Tt “ Pr p i P C Tt | i P C D q are unidentified positive weights that sumto one. The LHS of Equation (3.4) is a particular form of weighted LATE: the substitution of E p Y Di p q| i P C D q by the weighted average in its second term reflects the lack of informationof the analyst on the respective sizes of the groups C Tt within C D , and on the dispersion ofthe average counterfactual outcomes when z “ Corollary 3 (Identification in the filtered binary instrument model (2)) . Assume that p Y Ti p t q| i P C Tt q is the same for all t ‰ . Then E p Y Di p q| i P C D q “ ÿ t ‰ α Tt E p Y Ti p t q| i P C Tt q and the standard Wald estimator identifies the LATE on D -compliers: E p Y Di p q ´ Y Di p q| i P C D q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q . If we interpret t “ t “
1) as alternativetreatments, then the analyst may only know whether observation i received some kind oftreatment. The corresponding filtering map would be • M p q “ • M p t q “ t ą M . Let M be the join of M and M : • M p q “ • M p q “ • M p t q “ t ą D -response groups consist of the always-takers A D “ A T , A D “ A T , A D “ Ť t ą A Tt ; and the complier groups C D “ C T and C D “ Ť t ą C Tt . Proposition 12 (Identification in the filtered binary instrument model (3)) . (i) The prob-abilities of the five D -response groups are point-identified: Pr p i P A D q “ P D p | q , Pr p i P A D q “ P D p | q , Pr p i P A D q “ P D p | q , Pr p i P C D q “ P D p | q ´ P D p | q , Pr p i P C D q “ P D p | q ´ P D p | q with the testable implications P D p | q ě P D p | q and P D p | q ě P D p | q . ii) The standard Wald estimator identifies the following combination of LATEs: E p Y Di p q| i P C D q ´ α D E p Y Ti p q| i P C T q ´ p ´ α D q ÿ t ą β Tt E p Y Ti p t q| i P C Tt q (3.5) “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q , where • α D “ Pr p i P C D | i P C D Ť C D q is a positive weight, smaller than 1, identified as p P D p | q ´ P D p | qq{p P D p | q ´ P D p | qq ; • the numbers β Tt “ Pr p i P C Tt | i P C D q are unidentified positive weights that sumto one. The extension to more general filters is trivial: any finer partition will identify more α Dd parameters and allow the analyst to gain more information on the sizes of D -complier groupsand to refine the interpretation of the Wald estimator.Let us now turn to the ternary/ternary unfiltered treatment model of Example 3. Re-member that z “ t “ z “ t “
2. Suppose now that theanalyst only observes whether an individual took one of the subsidized treatments ( d “ t ą
0) or not ( d “ t “ M ´ p q “ M ´ p q “ t , u . The ternary/ternaryunfiltered treatment model becomes a ternary/binary filtered treatment model. The eight T -response groups of Proposition 7 combine into five D -response groups: A D “ A T ,A D “ A T Ť A T Ť C T Ť C T ,C D “ C T ,C D “ C T ,C D “ C T . We observe the conditional probabilities P D p | z q and the average outcomes ¯ E Dz p q and ¯ E Dz p q for z “ , , Proposition 13 (Identification in the ternary/binary filtered model (1)) . (i) The proba-bility of the always-taker group A D is point-identified as P D p | q . The other four -response groups probabilities are connected by three equations: Pr p C D ˚ q “ Pr p C D q ` Pr p C D q “ P D p | q ´ P D p | q , Pr p C D ˚ q “ Pr p C D q ` Pr p C D q “ P D p | q ´ P D p | q , Pr p C D ˚ q “ Pr p C D q ` Pr p A D q “ P D p | q . with the testable implications P D p | q ě P D p | q and P D p | q ě P D p | q .The four partially-identified probabilities can be parameterized as Pr p C D q “ p, Pr p C D q “ P D p | q ´ P D p | q ´ p, Pr p C D q “ P D p | q ´ P D p | q ´ p, Pr p A D q “ P D p | q ` P D p | q ´ P D p | q ` p, where max p , P D p | q ´ P D p | q ´ P D p | qq ď p ď P D p | q ´ max p P D p | q , P D p | qq . (ii) The following average conditional counterfactual outcomes are point-identified: E p Y Di p q| i P C D ˚ q “ ¯ E D p q P D p | q , E p Y Di p q| i P C D ˚ q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q , E p Y Di p q| i P C D ˚ q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q , E p Y Di p q| i P A D q “ ¯ E D p q P D p | q . (iii) The standard Wald estimators identify the LATE on C D ˚ and on C D ˚ : E p Y Di p q ´ Y i p q| i P C D ˚ q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q , (3.6) E p Y Di p q ´ Y i p q| i P C D ˚ q “ E p Y i | Z i “ q ´ E p Y i | Z i “ q Pr p D i “ | Z i “ q ´ Pr p D i “ | Z i “ q . 
(3.7)Note that the width of the interval on the unknown p cannot be larger than min p P D p | q , P D p | qq :if either instrument z “ , D -response groups will be almost point-identified. Since the averagecounterfactual outcomes on elemental D -response groups are connected by equations like E p Y i p d q| i P C D ˚ q “ q E p Y i p d q| i P C D q ` p ´ q q E p Y i p d q| i P C D q with q “ p {p P D p | q ´ P D p | qq , one could go further and impose homogeneity assumptionsto improve the identification of elemental LATEs. We now return to Example 4, which featured a factorial experimental design. Recall thatwe had z “ ˆ , ˆ , ˆ , ˆ
1, and T “ t , , u . Each instrument combines twobinary instruments: the first one is meant to promote treatment t “ t “
2. We focus here on the case when there is no complementarity (positive or negative) between the two binary instruments: $\Delta^T_{1 \times 1}(1) = \Delta^T_{1 \times 0}(1)$ and $\Delta^T_{1 \times 1}(2) = \Delta^T_{0 \times 1}(2)$. This would hold for instance if each binary instrument is a price subsidy and prices enter mean utilities additively, a common assumption in discrete choice models. As we saw in Section 2.1, we have $\bar{\mathcal{Z}}(1) = \{1 \times 0, 1 \times 1\}$ and $\bar{\mathcal{Z}}(2) = \{0 \times 1, 1 \times 1\}$, so that this treatment model does not satisfy one-to-one targeting. On the other hand, we also saw that strict targeting holds if each binary instrument only has an effect on the treatment value that it targets. We will impose the corresponding assumptions $\Delta^T_{0 \times 1}(1) = \Delta^T_{0 \times 0}(1)$ and $\Delta^T_{1 \times 0}(2) = \Delta^T_{0 \times 0}(2)$.

Let us now introduce a filter, so that the analyst only observes $D_i = \mathbb{1}(T_i > 0)$. This yields a ternary/binary filtered treatment model, much as in the previous subsection. There are two important differences: the instrument takes four values rather than three, and we imposed several constraints on the mean utilities:
$$\Delta^T_{1 \times 1}(1) = \Delta^T_{1 \times 0}(1), \quad \Delta^T_{1 \times 1}(2) = \Delta^T_{0 \times 1}(2), \quad \Delta^T_{0 \times 1}(1) = \Delta^T_{0 \times 0}(1), \quad \Delta^T_{1 \times 0}(2) = \Delta^T_{0 \times 0}(2). \tag{3.8}$$

Footnote: We use a superscript $T$ to remind the reader that the argument in parentheses is an unfiltered treatment value in $\mathcal{T}$.
Footnote: Appendix C.2 provides a variant of filtered factorial design.
35n spite of the filtering, they will allow us to point-identify the relevant LATEs. To seethis, first note that for any given observation i , D i p z q “ u i ą max p ∆ Tz p q ` u i , ∆ Tz p q ` u i q , (3.9)so that the filtered treatment model has the structure of a double hurdle model (Example 2).Figure 8: Filtered Factorial Design u i ´ u i u i ´ u i C D C D C D A D A D P ˆ P ˆ P ˆ P ˆ First note that under our assumptions, the right hand side is largest when z “ ˆ D i p ˆ q “
0, observation i always takes d “
0. If on the other hand D i p ˆ q “
1, then i is in A D since the right-hand side can only be larger for the other instrument values.Denote indices in response-groups in the order 0 ˆ , ˆ , ˆ , ˆ
1. The preceding argumentsleave only the D -response groups C D ˚˚ . The group C D cannot exist since in the absenceof complementarity between the binary instruments,max p ∆ T ˆ p q ` u i , ∆ T ˆ p q ` u i q “ max p ∆ T ˆ p q ` u i , ∆ T ˆ p q ` u i q . The three other groups are : • the eager compliers C D : any instrument except 0 ˆ d “ • the reluctant compliers C D and C D : they only adopt d “ D -response groups are shown in Figure 8. Table 2 shows which groupstake D i “ d when Z i “ z . We borrow here the terminology of Mogstad, Torgovitsky, and Walters (2019), which they apply to arather different model. D -response Groups D i p z q “ D i p z q “ z “ C D ˚˚ “ A D Ť C D Ť C D Ť C D A D z “ C D ˚ “ A D Ť C D C D ˚ ˚ “ A D Ť C D Ť C D z “ C D ˚ “ A D Ť C D C D ˚˚ “ A D Ť C D Ť C D Proposition 14 (Identifying the Filtered Factorial Design Model) . (i) the probabilities ofthe D -response groups are point-identified by Pr p A D q “ P D p | ˆ q Pr p A D q “ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q , and the model has three testable implications: P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ` P D p | ˆ q ě P D p | ˆ q ` P D p | ˆ q . (ii) The LATEs on the three groups of compliers are point-identified by E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ` E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q . 
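The identification result for the filtered factorial design can be illustrated numerically. The sketch below labels instrument values by strings such as `"1x0"` (first binary instrument on, second off) and names the complier groups by their behaviour rather than by index, so it does not depend on the ordering convention for group subscripts. All numbers are hypothetical, and the function name is illustrative.

```python
from fractions import Fraction as F

def factorial_design(p1, ey):
    """p1[z] = Pr(D = 1 | Z = z) and ey[z] = E(Y | Z = z) for the four instrument values."""
    probs = {
        "always_0": 1 - p1["1x1"],
        "always_1": p1["0x0"],
        # reluctant compliers who need the first binary instrument
        "reluctant_first": p1["1x1"] - p1["0x1"],
        # reluctant compliers who need the second binary instrument
        "reluctant_second": p1["1x1"] - p1["1x0"],
        # eager compliers respond to either binary instrument
        "eager": p1["1x0"] + p1["0x1"] - p1["0x0"] - p1["1x1"],
    }
    lates = {
        "reluctant_first": (ey["1x1"] - ey["0x1"]) / probs["reluctant_first"],
        "reluctant_second": (ey["1x1"] - ey["1x0"]) / probs["reluctant_second"],
        # ratio of difference-in-differences quantities
        "eager": (ey["1x0"] + ey["0x1"] - ey["0x0"] - ey["1x1"]) / probs["eager"],
    }
    return probs, lates

# Hypothetical population: shares 2/10 (always 0), 1/10 (always 1), 3/10 (eager,
# effect 3), 2/10 (reluctant, first, effect 1), 2/10 (reluctant, second, effect 5).
p1 = {"0x0": F(1, 10), "1x0": F(6, 10), "0x1": F(6, 10), "1x1": F(8, 10)}
ey = {"0x0": F(11, 10), "1x0": F(22, 10), "0x1": F(30, 10), "1x1": F(32, 10)}
probs, lates = factorial_design(p1, ey)
```

A negative value for any entry of `probs` would signal a violation of one of the testable implications of the model.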
Proposition 14 states that (i) the average treatment effects for reluctant compliers are37dentified by suitable Wald statistics and that (ii) the average treatment effect for eager com-pliers is identified by a ratio between difference-in-differences (DiD) population quantities.The latter estimand can be viewed as a two-dimensional version of Wald statistics. E p Y Di p q ´ Y Di p q| i P C D q with covariates Most estimands in the paper are expressed in terms of simple Wald estimators, which can beeasily estimated with covariates (e.g. Fr¨olich, 2007). Two exceptional cases are E p Y Di p q ´ Y Di p q| i P C D q in Proposition 14 and E p Y Di p q ´ Y Di p q| i P C D q in Proposition 15 inAppendix C.2.We here discuss how to estimate E p Y Di p q ´ Y Di p q| i P C D q with covariates. Estimationof E p Y Di p q ´ Y Di p q| i P C D q is similar. Introduce covariates X i explicitly and define: E r Y i p q ´ Y i p q| i P C D , X i “ x s “ DiD Y p x q DiD D p x q , where DiD Y p x q : “ E r Y i | Z i “ ˆ , X i “ x s ` E r Y i | Z i “ ˆ , X i “ x s´ E r Y i | Z i “ ˆ , X i “ x s ´ E r Y i | Z i “ ˆ , X i “ x s , DiD D p x q : “ Pr r T i “ | Z i “ ˆ , X i “ x s ` Pr r T i “ | Z i “ ˆ , X i “ x s´ Pr r T i “ | Z i “ ˆ , X i “ x s ´ Pr r T i “ | Z i “ ˆ , X i “ x s . Then, E p Y Di p q ´ Y Di p q| i P C D q “ E „ DiD Y p X q DiD D p X q ˇˇˇˇ i P C D . Lemma 3.
Assume that DiD D p X q ‰ almost surely. Then, E p Y Di p q ´ Y Di p q| i P C D q “ E r DiD Y p X qs E r DiD D p X qs . Lemma 3 suggests the following two-step estimation strategy: first, estimate E r Y i | Z i “ k, X i “ x s and Pr r T i “ | Z i “ k, X i “ x s for each k P t ˆ , ˆ , ˆ , ˆ u and x P t X , . . . , X n u ; second, evaluate DiD Y p X i q and DiD D p X i q , construct their averages andtake the ratio.For example, the first step can be implemented using sieve estimators. In view ofAckerberg, Chen, and Hahn (2012); Ackerberg, Chen, Hahn, and Liao (2014), the resultingtwo-step sieve estimator is semiparametrically efficient, and furthermore, conventional nor-38al inference, pretending that we have a two-step parametric model, is valid for semipara-metric inference. For brevity of the paper, we omit details. In this section, we revisit Angrist, Lang, and Oreopoulos (2009), who analyzed the StudentAchievement and Retention Project. STAR was a randomized evaluation of academic servicesand incentives for college freshmen at a Canadian university. It was a factorial design, withtwo binary instruments. The Student Fellowship Program (SFP) offered students the chanceto win merit scholarships for good grades in the first year; the Student Support Program(SSP) offered students access to both a peer-advising service and a supplemental instructionservice. Entering first-year undergraduates were randomly assigned to one of four groups: acontrol group ( z “ ˆ z “ ˆ z “ ˆ z “ ˆ T i “ p A i , S i q , where A i “ i signed up and S i “ S i “ A i “
0. Hence T i can only take three values: p , q , p , q , and p , q .With a slight change in notation, we model the choice as T i p z q “ arg max ` u i p , q , ∆ Tz p , q ` u i p , q , ∆ Tz p , q ` u i p , q ˘ . While there are four instrument values and three treatment values, this is in fact a ternary/ternarymodel, with some specific features. First note that T i “ T ˆ p , q and ∆ T ˆ p , q at minus infinity. In addition, S i canonly be zero if z “ ˆ
1, so that we can set ∆ T ˆ p , q and ∆ T ˆ p , q at minus infinitytoo. As a consequence, we do not lose any information by redefining the control group to be0 ” t ˆ , ˆ u .In addition, students should be more likely to sign up under z “ ˆ z “ ˆ
0, as the former adds the lure of a fellowship. We will also assume that it makes39hem more likely to use the services—an assumption that we will test below. Then bothtreatment values p , q and p , q are targeted by 1 ˆ
1, but they cannot be strictly targeted.Take for instance ¯ Z p , q “ t ˆ u ; strict targeting would require ∆ T ˆ p , q “ ∆ T p , q ,which is minus infinity.Rather than to pursue with the unfiltered treatment model, let us move on to filteredmodels. In our terminology, Angrist, Lang, and Oreopoulos (2009) chose to use a particu-lar filter M p A, S q “ A , which is close to intent-to-treat as they point out. Here we take M p A, S q “ S instead: we define(4.1) D i p z q “ S i p z q “ ` ∆ Tz p , q ` u i p , q ą max p u i p , q , ∆ Tz p , q ` u i p , qq ˘ . Since the SFP incentives applied to the first year grades only, we take the grades in thesecond year as our outcome variable Y i .Equation (4.1) has a similar structure to the double hurdle model of Equation (3.9). Themodels are quite different, however. This new filtered model has D i p q “ C D ,d,d for d, d “ ,
1, this assumptioneliminates one: if D i p ˆ q “ D i p ˆ q “
0. This leaves three groups:the never-takers A D , and two groups of compliers C D and C D . The group C D consistsof reluctant compliers, who only use SSP if it is offered along with SFP. Those in C D areeager compliers: they use SSP whenever it is offered to them with or without a fellowship.Remember that P D p | z q : “ Pr p D i “ | Z i “ z q for z “ , ˆ , ˆ
1. Then P D p | q “ p A D q “ ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q . Note that given Equation (4.1), P D p | ˆ q “ Pr ` u i p , q ´ u i p , q ą ´ ∆ T ˆ p , q and u i p , q ´ u i p , q ą ∆ T ˆ p , q ´ ∆ T ˆ p , q ˘ P D p | ˆ q “ Pr ` u i p , q ´ u i p , q ą ´ ∆ T ˆ p , q and u i p , q ´ u i p , q ą ∆ T ˆ p , q ´ ∆ T ˆ p , q ˘ . T ˆ p , q ą ∆ T ˆ p , q and ∆ T ˆ p , q ´ ∆ T ˆ p , q ă ∆ T ˆ p , q ´ ∆ T ˆ p , q . Figure 9 illustrates a configuration in which these inequalities hold, where P ˆ “ p´ ∆ T ˆ p , q , ∆ T ˆ p , q ´ ∆ T ˆ p , qq and P ˆ “ p´ ∆ T ˆ p , q , ∆ T ˆ p , q ´ ∆ T ˆ p , qq . Figure 9: STAR example u i p , q ´ u i p , q u i p , q ´ u i p , q P ˆ P ˆ A D C C Under our assumptions, it is straightforward to show that E r Y Di | Z i “ ˆ s ´ E r Y Di | Z i “ s “ E r Y Di p q ´ Y Di p q| i P C D s Pr p i P C D q , E r Y Di | Z i “ ˆ s ´ E r Y Di | Z i “ ˆ s “ E r Y Di p q ´ Y Di p q| i P C D s Pr p i P C D q . Therefore, we have E r Y Di p q ´ Y Di p q| i P C D s “ E r Y Di | Z i “ ˆ s ´ E r Y Di | Z i “ s P D p | ˆ q , E r Y Di p q ´ Y Di p q| i P C D s “ E r Y i | Z i “ ˆ s ´ E r Y Di | Z i “ ˆ s P D p | ˆ q ´ P D p | ˆ q . Since Pr p D i “ | Z i “ q “
0, the first estimand is the IV formula of Bloom (1984); thesecond estimand is the LATE formula of Imbens and Angrist (1994).Table 3 reports estimation results. We only focus on the subsample of women since theSTAR program had no effect on men. Panel A of Table 3 shows the estimated proportions41able 3: Empirical Results from STARPanel A. Proportion of CompliersPr p i P C D q p i P C D q E r Y i | Z i “ ˆ s ´ E r Y i | Z i “ s E r Y i p q ´ Y i p q| i P C D s E r Y i | Z i “ ˆ s ´ E r Y i | Z i “ ˆ s E r Y i p q ´ Y i p q| i P C D s n “ C D and 0.245 for C D . The majority group is thenever-takers whose share is 0.467. This is because the usage of SSP was low. Panel Breveals remarkable heterogeneity between the two complier groups. We do not find anysignificant treatment effect for C D , whereas we do find sizeable and significant impacton probation/withdrawal and good standing for C D . As can be seen in Figure 9, C D iscloser to the group of never-takers: they have higher unobserved disutilities of using academicsupport services than those in C D . However, those in C D reaped greater benefits of usingthe SSP by avoiding probation or withdrawal in the second year.The main parameter of interest in Angrist, Lang, and Oreopoulos (2009) was the intent-to-treat (ITT) effect of the SFSP program: E r Y i | Z i “ ˆ s ´ E r Y i | Z i “ ˆ s in ournotation. Our analysis suggests that the ITT effect of the SFSP program is a mix of twovery different treatment effects. This highlights the importance of unbundling heterogeneouscomplier groups. Let us now reexamine Kline and Walters’s (2016) analysis of the Head Start Impact Study(HSIS) using our framework. The structure of HSIS is identical to that of Example 12.The treatments consist of no preschool ( n ), Head Start ( h ), and other preschool centers ( c ): T “ t n, h, c u . We will take t “ n as our reference treatment. The instrument is binary,with a control group ( z “
0) and a group that is offered admission to Head Start ( z “ A n “ C nn , A c “ C cc , A h “ C hh , C nh , and C ch . The firstthree groups are always-takers and the last two groups are compliers. Their proportions in the sample are given by (2.4) in Proposition 6; they are shown inPanel A of Table 4. As expected, they coincide with those in Kline and Walters (2016).Panel B of Table 4 shows the counterfactual means of test scores as per Proposition 9.Among those that are point-identified, the average test scores are the highest for the groupswho always choose other preschool centers (about 0.3 standard deviation). There is anoticeable difference between the two complier groups: E r Y i p n q| i P C nh s is negative, but E r Y i p c q| i P C ch s is above 0.1 standard deviation. This indicates that among compliers, thechildren who used other centers had higher scores than those who stayed at home. Head Start The point estimates for probation/withdrawal and good standing are very large in absolute value; how-ever, the standard errors are large as well, resulting in wide confidence intervals. This is partially becausethe sample size is relatively small and partially because the estimand is the ratio of two population quantitieswith the small denominator. E r Y i p n q| i P C nh s and E r Y i p c q| i P C ch s is new.Table 4: Proportions, Counterfactual Means and Treatment Effects by Response Groups3-year-olds 4-year-olds PooledPanel A. Proportions of Response Groups via Proposition 6Always – no preschool ( A n ) 0.092 0.099 0.095Always – Head Start ( A h ) 0.147 0.122 0.136Always – other centers ( A c ) 0.058 0.114 0.083Compliers from n to h ( C nh ) 0.505 0.393 0.454Compliers from c to h ( C ch ) 0.198 0.272 0.232Panel B. 
Counterfactual Means of Test Scores via Proposition 9 E r Y i p n q| P A n s -0.050 -0.017 -0.035 E r Y i p h q| P A h s E r Y i p c q| P A c s E r Y i p n q| i P C nh s -0.027 -0.116 -0.062 E r Y i p c q| i P C ch s E r Y i p h q| i P C nh s “ E r Y i p h q| i P C ch s E r Y i p h q ´ Y i p n q| i P C nh s for compliers from ‘n’ to ‘h’ 0.279 0.285 0.278(0.063) (0.076) (0.050) E r Y i p h q ´ Y i p c q| i P C ch s for compliers from ‘c’ to ‘h’ 0.140 0.025 0.087(0.089) (0.097) (0.063) E r Y i p h q ´ Y i p n q| i P C nh s ´ E r Y i p h q ´ Y i p c q| i P C ch s h ), other centers ( c ), no preschool ( n ). Standard errors inparentheses are clustered at the Head Start center level.44 .2.2 Treatment Effects To fully measure the substitution effect, one needs to identify E r Y i p h q| i P C nh s and E r Y i p h q| i P C ch s .However, under Proposition 9, they are only partially identified by E r Y i p h q| i P C nh s t Pr p T i “ n | Z i “ q ´ Pr p T i “ n | Z i “ qu` E r Y i p h q| i P C ch s t Pr p T i “ c | Z i “ q ´ Pr p T i “ c | Z i “ qu“ E r Y i p T i “ h q| Z i “ s ´ E r Y i p T h “ q| Z i “ s . This is exactly the formula on Kline and Walters (2016, pp.1811), where they point out thatthe LATE for Head Start is a weighted average of “subLATEs” with weights S c and p ´ S c q with S c : “ Pr p C ch q Pr p C nh q ` Pr p C ch q “ Pr p T i “ c | Z i “ q ´ Pr p T i “ c | Z i “ q Pr p T i ‰ h | Z i “ q ´ Pr p T i ‰ h | Z i “ q . Kline and Walters (2016) first tried to estimate E r Y i p h q´ Y i p c q| i P C ch s and E r Y i p h q´ Y i p n q| i P C nh s separately using two-stage least squares (2SLS), using interaction of the instrumentwith covariates or experimental sites in an attempt to generate enough variation. Theyacknowledged the limitations of this interacted 2SLS approach and developed a parametricselection model `a la Heckman (1979). 
Using a parametric selection model and pooled co-horts, Kline and Walters (2016, Table VIII, column (4) full model) obtain estimates of thetreatment effect of 0 . p . q for C nh and ´ . p . q for C ch respectively (standarderrors in parentheses).Our Corollary 2 provides an alternative approach to separating the two treatment effects.If we assume that E r Y i p h q| i P C nh s “ E r Y i p h q| i P C ch s , we can point-identify the averagetreatment effects for both groups of compliers. The resulting estimates are shown in PanelsC and D of Table 4. The average impact on test scores of participating in Head Startis around 0 .
28 for $C_{nh}$, whereas it is smaller and statistically insignificant for $C_{ch}$. The difference between the two effects is significantly different from zero when the two cohorts are pooled.

We obtained these estimates of the treatment effects by a completely different route than Kline and Walters (2016). While the two sets of estimates are similar, our estimate of the difference between the treatment effects on the two groups of compliers is about half as large as theirs. Our homogeneity assumption $E[Y_i(h) \mid i \in C_{nh}] = E[Y_i(h) \mid i \in C_{ch}]$ may be too strong. It might be more plausible to assume that
$$E[Y_i(h) \mid i \in C_{nh}] \le E[Y_i(h) \mid i \in C_{ch}],$$
as children who would not attend preschool in the absence of an offer to Head Start are likely to be less well-prepared than children who would attend other preschools. Then our estimated difference between the two complier groups will be a lower bound of the true difference.
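The way the two complier groups are weighted can be illustrated with the pooled entries of Table 4. The following is a minimal sketch that only does arithmetic on rounded table values: the shares 0.454 and 0.232 and the effects 0.278 and 0.087 are copied from the table, while the function and variable names are ours and purely illustrative.

```python
def sub_late_weights(p_nh: float, p_ch: float) -> tuple[float, float]:
    """Return (1 - S_c, S_c), the complier shares that weight the two sub-LATEs."""
    total = p_nh + p_ch
    if total <= 0:
        raise ValueError("complier shares must sum to a positive number")
    s_c = p_ch / total
    return 1.0 - s_c, s_c

# Pooled values copied from Table 4 (rounded to three decimals).
P_NH, P_CH = 0.454, 0.232        # Pr(C_nh), Pr(C_ch)
LATE_NH, LATE_CH = 0.278, 0.087  # estimated effects for C_nh and C_ch

w_nh, s_c = sub_late_weights(P_NH, P_CH)
implied_late = w_nh * LATE_NH + s_c * LATE_CH  # weighted average of the sub-LATEs

print(f"S_c = {s_c:.3f}")
print(f"implied weighted average = {implied_late:.3f}")
```

With these rounded inputs, roughly one third of the compliers are drawn from other preschool centers. Because the inputs are rounded, the implied weighted average is only indicative.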
Concluding Remarks
We have shown that our targeting and filtering concepts are a useful way to analyze models with multivalued treatments and multivalued instruments. While our characterization is sharpest under strict, one-to-one targeting (Corollary 1), our framework remains useful even without strict targeting. In addition to the examples we discussed in the text and to the two applications we revisited, we give an example in Appendix C.2, with a ternary/ternary model where the analyst only observes the least-preferred treatment in a factorial design.

Our paper only analyzed discrete-valued instruments and treatments. Some of the notions we used would extend naturally to continuous instruments and treatments: the definitions of targeting, one-to-one targeting, and filtering would translate directly. Strict targeting, on the other hand, is less appealing in a context in which continuous values may denote intensities. Our earlier paper (Lee and Salanié, 2018) can be seen as analyzing continuous-instruments/discrete-treatments filtered models; so can Mountjoy's (2019) study of 2-year colleges. Extending our analysis to models with continuous treatments is an interesting topic for further research.
Appendices
A Proofs for Section 2
Proof of Proposition 1.
Let $T_i(t) = 0$ for some $t \in \mathcal{T}^*$. Then $u_{i0} > \bar{\Delta}_t + u_{it}$. However, $\bar{\Delta}_t > \Delta_z(t)$ if $z \notin \bar{Z}(t)$. Therefore $u_{i0} > \Delta_z(t) + u_{it}$, and $T_i(z)$ cannot be $t$.

Proof of Lemma 1.
The lemma is proved in the main text.
Proof of Proposition 2.
Take any observation $i$ and an instrument value $z \in \mathcal{Z}$. The treatment $T_i(z)$ must maximize $U_z(t) + u_{it}$ over $t \in \mathcal{T}$. Under Assumption 7, for any $t$ we have

• $U_z(t) = U_z(0) + \bar{\Delta}_t$ if $t \in \bar{T}(z)$;
• $U_z(t) = U_z(0) + \Delta_t$ otherwise.

Therefore, eliminating $U_z(0)$,
$$T_i(z) \in \arg\max\Big( \max_{t \notin \bar{T}(z)} (\Delta_t + u_{it}),\ \max_{t \in \bar{T}(z)} (\bar{\Delta}_t + u_{it}) \Big). \tag{A.1}$$
Since $\bar{\Delta}_t \ge \Delta_t$ for all $t \in \mathcal{T}$, a fortiori $\bar{\Delta}_t + u_{it} \ge \Delta_t + u_{it}$ when $t \in \bar{T}(z)$. As a consequence, we can rewrite Equation (A.1) as
$$T_i(z) \in \arg\max\big(\Delta_i^*, V_i^*(z)\big).$$

(i) If $z \in \mathcal{Z}^*$, then $\bar{T}(z)$ is not empty and the maximizer can be either in $\tau_i^*$ or in $T_i^*(z)$.

(ii) If $z \in \mathcal{Z} \setminus \mathcal{Z}^*$, then $z$ can only be 0. $\bar{T}(0) = \emptyset$ and $T_i(0)$ can only be in $\tau_i^*$.

Proof of Proposition 3.
Take an observation $i$ and define $A_i = \{ z \in \mathcal{Z}^* \mid T_i(z) = T_i^*(z) \}$.

(i) By definition, $A_i \subset \mathcal{Z}^*$; therefore $A_i = \mathcal{Z}$ (which defines the subpopulation $P$) requires $\mathcal{Z} = \mathcal{Z}^*$.

(ii) Now suppose that $A_i \neq \mathcal{Z}$. If $z \in \mathcal{Z}^* \setminus A_i$, then by construction $T_i(z) \neq T_i^*(z)$. By Proposition 2(i), $T_i(z)$ can only be $\tau_i^*$. If $z \notin \mathcal{Z}^*$, then $z = 0$ and $T_i(0) = \tau_i^*$.

(iii) Assume that $\tau_i^* = \tau \in \mathcal{T}^*$. Then $\bar{Z}(\tau) \neq \emptyset$. For any $z$ in $\bar{Z}(\tau)$,
$$V_i^*(z) \ge \bar{\Delta}_\tau + u_{i\tau} > \Delta_\tau + u_{i\tau} = \Delta_i^*;$$
therefore $z \in A_i$. This proves that $\bar{Z}(\tau) \subset A_i$.

Proof of Corollary 1.
It follows directly from Proposition 3.
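The counting argument in the proof of Proposition 4, which comes next, can be sanity-checked by brute force. The sketch below is illustrative only. It assumes strict one-to-one targeting, labels the targeted treatments 1, ..., |Z*| (treatment 0 is never targeted), and enumerates every pair (A, τ) with A a subset of Z* and τ either untargeted or an element of A. It does not implement the adjustment for the case Z = Z*. The function names are ours.

```python
from itertools import combinations

def enumerate_classes(n_treatments: int, n_targeting: int):
    """List all pairs (A, tau) allowed when Z* has n_targeting elements."""
    targeted = list(range(1, n_targeting + 1))   # treatments targeted by some z in Z*
    treatments = list(range(n_treatments))       # treatment 0 is the untargeted reference
    classes = []
    for size in range(len(targeted) + 1):
        for subset in combinations(targeted, size):
            for tau in treatments:
                # tau is admissible if it is untargeted, or targeted and contained in A
                if tau not in targeted or tau in subset:
                    classes.append((subset, tau))
    return classes

def closed_form(n_treatments: int, n_targeting: int) -> int:
    """(2|T| - |Z*|) * 2^(|Z*| - 1), valid for |Z*| >= 1."""
    return (2 * n_treatments - n_targeting) * 2 ** (n_targeting - 1)

# Compare the enumeration with the closed form on a small grid.
for n_t in range(2, 7):
    for n_z in range(1, n_t):
        assert len(enumerate_classes(n_t, n_z)) == closed_form(n_t, n_z)

print(closed_form(3, 1))  # three treatments, one targeting instrument value
```

With three treatments and a single targeting instrument value, the count is five, which matches the three always-taker groups and two complier groups of the Head Start application.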
Proof of Proposition 4.
First assume that $\mathcal{Z}^* \neq \mathcal{Z}$, so that only the classes in $P$ exist. The set $A$ of Corollary 1 must be a subset of $\mathcal{Z}^*$. For each such subset, $\tau$ can take any value in $\mathcal{T} \setminus \mathcal{T}^*$; and if $\tau \in \mathcal{T}^*$ then $\tau$ must be in $A$. Each subset $A$ of $\mathcal{Z}^*$ with $a$ elements therefore allows for $(a + |\mathcal{T}| - |\mathcal{T}^*|)$ values of $\tau$. This gives a total of
$$\sum_{a=0}^{|\mathcal{Z}^*|} \binom{|\mathcal{Z}^*|}{a} \big(a + |\mathcal{T}| - |\mathcal{T}^*|\big)$$
classes in $P$. Moreover, we know that $|\mathcal{T}^*| = |\mathcal{Z}^*|$ under one-to-one targeting. Using the identities
$$\sum_{a=0}^{b} \binom{b}{a} = (1+1)^b = 2^b, \qquad \sum_{a=0}^{b} a \binom{b}{a} = b \times \sum_{a=0}^{b-1} \binom{b-1}{a} = b \times 2^{b-1},$$
we obtain a total of $(2|\mathcal{T}| - |\mathcal{Z}^*|) \times 2^{|\mathcal{Z}^*| - 1}$ types.

If $\mathcal{Z} = \mathcal{Z}^*$, we must add the one type in $P$. On the other hand, we must subtract the $|\mathcal{T}|$ classes $c(\mathcal{Z}^*, \tau)$ that are ruled out by Corollary 1(iii).

Proof of Proposition 5.

(i) First assume that $\mathcal{Z}^* \neq \mathcal{Z}$, so that the subpopulation $P$ does not exist. There are two ways to obtain $T_i(z) = t$.

• The first one is for $i$ to belong to any class $c(A, t)$, with $A \subset \mathcal{Z}^*$ and $t$ constrained to be in $A^{+}$. This requires that $z \notin A$. If $z$ is in $\mathcal{Z}^*$, this implies $A \subset \mathcal{Z}^* \setminus \{z\}$. If not, then $A$ can be any subset of $\mathcal{Z}^*$. This gives the first term in (2.1), and (2.2).
• The second way to get $T_i(z) = t$ is if $t = z$, which can only happen if $z \in \mathcal{Z}^*$. Then if $i \in c(A, \tau)$ for any $A$ that contains $z$ and any $\tau \in A^{+}$, we have $T_i(z) = z$. This gives the second term in (2.1).

(ii) If $\mathcal{Z}^* = \mathcal{Z}$, we only need to add in the subpopulation $P$ if $z = t$, and to delete from the summations the case $A = \mathcal{Z}^* = \mathcal{Z}$. Introducing these changes in (2.1) gives (2.3). Since $\mathcal{Z}^* = \mathcal{Z}$ there is obviously no subcase $z \notin \mathcal{Z}^*$.

Proof of Proposition 6.
Since Z ˚ “ t u ‰ Z in Example 8, we apply equations (2.1) and (2.2).With Z ˚ “ t u , we can only have A “ H , with A ` “ T zt u , or A “ t u , with A ` “ T .Equation (2.1) gives P p t | q “ p t ‰ q Pr p c pH , t qq ` p t “ q ÿ τ P T Pr p c pt u , τ qq ;and equation (2.2) gives P p t | q “ p t ‰ q Pr p c pH , t qq ` Pr p c pt u , qq .
48e already know that c pH , t q is A t and c pt u , τ q is A if τ “ C τ otherwise. Therefore for t “ P p | q “ Pr p A q ` ÿ τ ‰ Pr p C τ q and P p | q “ Pr p A q ; while for t ‰ P p t | q “ Pr p A t q and P p t | q “ Pr p C t q` Pr p A t q . Proof of Proposition 8.
It is straightforward from Figure 6.
Proof of Proposition 7.
It is straightforward from Figure 5.
Proof of Lemma 2.
We start from the sum over all response groups:
$$\bar{E}_z(t) = \sum_{C} E_z(t \mid C) \Pr(i \in C).$$
First note that if group $C$ does not have treatment $t$ under instrument $z$, it should not figure in the sum. Now if $C(z) = t$, we have
$$E_z(t \mid C) = E\big(Y_i \mathbb{1}(T_i = t) \mid Z_i = z, i \in C\big) = E\big(Y_i(t) \mid Z_i = z, i \in C\big) = E\big(Y_i(t) \mid i \in C\big),$$
where the last equality holds because the instrument is independent of potential outcomes and of response-group membership. The second part of the Lemma is just adding up.
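To make the decomposition concrete, the sketch below builds the quantities Ē_z(t) = E[Y_i 1(T_i = t) | Z_i = z] from response-group shares and group-specific mean outcomes, then recovers complier means from them. It assumes a binary instrument and three treatments, with instrument value 1 targeting treatment 1, so that the groups are always-takers A_t and compliers C_t. All shares and means are made-up numbers, and the helper names are ours.

```python
# Hypothetical shares of the five response groups (they sum to one).
share = {"A0": 0.10, "A1": 0.15, "A2": 0.10, "C0": 0.40, "C2": 0.25}

# Treatment chosen by each group under z = 0 and under z = 1.
choice = {
    "A0": (0, 0), "A1": (1, 1), "A2": (2, 2),
    "C0": (0, 1),   # compliers who move from treatment 0 to treatment 1
    "C2": (2, 1),   # compliers who move from treatment 2 to treatment 1
}

# Hypothetical means E[Y(t) | group], listed only for treatments the group ever takes.
mean_y = {
    ("A0", 0): 0.0, ("A1", 1): 0.3, ("A2", 2): 0.2,
    ("C0", 0): -0.1, ("C0", 1): 0.2,
    ("C2", 2): 0.1, ("C2", 1): 0.25,
}

def e_bar(z: int, t: int) -> float:
    """E[Y 1(T = t) | Z = z]: sum over the groups that take t under z."""
    return sum(mean_y[(g, t)] * share[g] for g in share if choice[g][z] == t)

def prob(z: int, t: int) -> float:
    """P(t | z): total share of the groups that take t under z."""
    return sum(share[g] for g in share if choice[g][z] == t)

# Complier means of the untargeted treatments are recovered by Wald-type ratios.
for t, group in [(0, "C0"), (2, "C2")]:
    recovered = (e_bar(0, t) - e_bar(1, t)) / (prob(0, t) - prob(1, t))
    assert abs(recovered - mean_y[(group, t)]) < 1e-12

# The mean of Y(1) is only recovered for the two complier groups pooled together.
pooled = (e_bar(1, 1) - e_bar(0, 1)) / (prob(1, 1) - prob(0, 1))
expected = (share["C0"] * mean_y[("C0", 1)] + share["C2"] * mean_y[("C2", 1)]) / (
    share["C0"] + share["C2"]
)
assert abs(pooled - expected) < 1e-12
```

The last check shows why the targeted-treatment mean is identified only as a share-weighted average across complier groups unless a further assumption is added.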
Proof of Proposition 9.
By Lemma 2, we get
$$\begin{aligned}
\bar{E}_0(1) &= E[Y_i(1) \mid i \in A_1] \Pr(i \in A_1), \\
\bar{E}_0(t) &= E[Y_i(t) \mid i \in A_t] \Pr(i \in A_t) + E[Y_i(t) \mid i \in C_t] \Pr(i \in C_t) \quad \text{for } t \neq 1, \\
\bar{E}_1(1) &= E[Y_i(1) \mid i \in A_1] \Pr(i \in A_1) + \sum_{t \neq 1} E[Y_i(1) \mid i \in C_t] \Pr(i \in C_t), \\
\bar{E}_1(t) &= E[Y_i(t) \mid i \in A_t] \Pr(i \in A_t) \quad \text{for } t \neq 1.
\end{aligned}$$
Since Proposition 6 identifies all type probabilities, the first and fourth equations give directly $E(Y_i(t) \mid i \in A_t)$ for all $t$. Then the second equation identifies $E(Y_i(t) \mid i \in C_t)$ for $t \neq 1$. The means $E(Y_i(1) \mid i \in C_t)$ for $t \neq 1$ enter only through the difference between the third and first equations:
$$\bar{E}_1(1) - \bar{E}_0(1) = \sum_{t \neq 1} E[Y_i(1) \mid i \in C_t] \Pr(i \in C_t).$$
By subtraction, we obtain
$$\big(\bar{E}_1(1) - \bar{E}_0(1)\big) - \sum_{t \neq 1} \big(\bar{E}_0(t) - \bar{E}_1(t)\big) = \sum_{t \neq 1} E[Y_i(1) - Y_i(t) \mid i \in C_t] \Pr(i \in C_t).$$
Combining these results with Proposition 6 and Lemma 2 yields the formula in the Proposition. The denominator
$$\sum_{t \neq 1} \big(P(t \mid 0) - P(t \mid 1)\big) = P(1 \mid 1) - P(1 \mid 0)$$
is positive, since all terms in the sum are positive. It follows that all $\alpha_t$ weights are positive and sum to 1.

Proof of Corollary 2.

The corollary follows directly from the proof of Proposition 9, as
$$\sum_{t \neq 1} \Pr(i \in C_t) = \sum_{t \neq 1} \big(P(t \mid 0) - P(t \mid 1)\big) = P(1 \mid 1) - P(1 \mid 0)$$
gives $E(Y_i(1) \mid i \in C_t) = \big(\bar{E}_1(1) - \bar{E}_0(1)\big) / \big(P(1 \mid 1) - P(1 \mid 0)\big)$.

Proof of Proposition 10.
For z “ , , E r Y i | Z i “ z s“ ÿ t “ E r Y i | Z i “ z, i P A t s Pr p i P A t q ` E r Y i | Z i “ z, i P C s Pr p i P C q` E r Y i | Z i “ z, i P C s Pr p i P C q ` E r Y i | Z i “ z, i P C s Pr p i P C q` E r Y i | Z i “ z, i P C s Pr p i P C q ` E r Y i | Z i “ z, i P C s Pr p i P C q . Note that the first term is also ÿ t “ E r Y i p t q| i P A t s Pr p i P A t q , z . It follows that E r Y i | Z i “ s ´ E r Y i | Z i “ s“ E r Y i p q ´ Y i p q| i P C s Pr p i P C q ` E r Y i p q ´ Y i p q| i P C ˚ s Pr p i P C ˚ q , E r Y i | Z i “ s ´ E r Y i | Z i “ s“ E r Y i p q ´ Y i p q| i P C s Pr p i P C q ` E r Y i p q ´ Y i p q| i P C ˚ s Pr p i P C ˚ q . Combining these formulæ with Proposition 7 yields the result.
B Proofs for Section 3
Proof of Proposition 11. (i) It follows directly from Proposition 6 and from the mappingof types.(ii) From Proposition 9, we have E p Y Di p q| i P A D q “ E p Y Ti p q| i P A T q “ ¯ E T p q P T p | q “ ¯ E D p q P D p | q . Moreover, E p Y Di p q| i P A D q “ E p Y Di p q| i P Ť t ‰ A Tt q“ ÿ t ‰ E p Y Ti p t q| i P A Tt q P T p t | q ´ P T p | q“ ÿ t ‰ ¯ E T p t q ´ P T p | q“ ¯ E D p q ´ P D p | q . (iii) Now consider the weighted LATE ÿ t ‰ α Tt E p Y Ti p q ´ Y Ti p t q| i P C Tt q , which is identifiedin the unfiltered treatment model (equation 2.11). The weights α Tt “ p P T p t | q ´ P T p t | qq{p P T p | q ´ P T p | qq are not identified any more. Note however that for anyvariable W i , ÿ t ‰ α Tt E p W i | i P C t q “ E p W i | i P C D q ;therefore ÿ t ‰ α Tt E p Y Di p q| i P C Tt q “ E p Y Di p q| i P C D q . The LHS of Equation (2.11)51ecomes E p Y Di p q| i P C D q ´ ÿ t ‰ α Tt E p Y Ti p t q| i P C t q . On the RHS we had p ¯ E T p q ´ ¯ E T p qq ´ ř t ‰ p ¯ E T p t q ´ ¯ E T p t qq P T p | q ´ P T p | q . The denominator is still identified as P D p | q ´ P D p | q , as is the first term of thenumerator, which equals ¯ E D p q ´ ¯ E D p q . From equation 3.3, ÿ t ‰ p ¯ E T p t q ´ ¯ E T p t qq “ ¯ E D p q . Therefore we identify E p Y Di p q| i P C D q ´ ÿ t ‰ α Tt E p Y Ti p t q| i P C Tt q “ p ¯ E D p q ´ ¯ E D p qq ´ p ¯ E D p q ´ ¯ E D p qq P D p | q ´ P D p | q , which is the standard Wald estimator. Proof of Corollary 3.
It is obvious by direct substitution into Equation (3.4).
Proof of Proposition 12. (i) It follows directly from the mapping of groups.(ii) Part (i) identifies the weight α T “ p P D p | q ´ P D p | qq{p P D p | q ´ P D p | qq , whichwe denote α D in the Proposition. The other terms obtain by simple factorization, with1 ´ α D “ Pr p i P C Tt | t ą q . Proof of Proposition 13.
Recall that Table 2 shows which groups take D i “ d when Z i “ z .(i) We have P D p | z q “ P T p | z q for z “ ,
1. Given Proposition 7(i), this gives usPr p C T q` Pr p C T q “ P D p | q´ P D p | q and Pr p C T q` Pr p C T q “ P D p | q´ P D p | q ,which map into Pr p C D q ` Pr p C D q “ P D p | q ´ P D p | q Pr p C D q ` Pr p C D q “ P D p | q ´ P D p | q ;52nd the last equation in Proposition 7(i) maps intoPr p C D q ` Pr p C D q ` Pr p C D q ` Pr p A D q “ P D p | q . Finally, Pr p A D q “ P D p | q from the table. Defining p “ Pr p C D q gives the equations inthe proposition, along with the constraints on p . Note also that Pr p C D ˚ q “ Pr p A D q ` Pr p C D q “ P D p | q .(ii) First note that ¯ E D p q “ E p Y i p i P A D qq “ E p Y Di p q| i P A D q Pr p A D q . The otherequations can be read from the table:¯ E D p q “ E p Y D p q p C D ˚ qq ¯ E D p q “ E p Y D p q p C D ˚ qq ¯ E D p q “ E p Y D p q p C D ˚˚ qq ¯ E D p q “ E p Y D p q p C D ˚ ˚ qq ¯ E D p q “ E p Y D p q p C D ˚˚ qq . Part (i) showed that we point-identify Pr p A D q , Pr p C D ˚ q , Pr p C D ˚ q , and Pr p C D ˚ q . Thisallows us to rewrite the last three lines as¯ E D p q “ P D p | q E p Y D p q| C D ˚ q ` p P D p | q ´ P D p | qq E p Y D p q| C D ˚ q“ P D p | q E p Y D p q| C D ˚ q ` p P D p | q ´ P D p | qq E p Y D p q| C D ˚ q ¯ E D p q “ p P D p | q ´ P D p | qq E p Y D p q| C D ˚ q ` P D p | q E p Y D p q| A D q ¯ E D p q “ p P D p | q ´ P D p | qq E p Y D p q| C D ˚ q ` P D p | q E p Y D p q| A D q , where we used the fact that C D ˚ “ C D ˚ “ A D .Simple calculations give E p Y D p q| C D ˚ q “ ¯ E D p q Pr p C D ˚ q “ ¯ E D p q P D p | q E p Y D p q| C D ˚ q “ ¯ E D p q Pr p C D ˚ q “ ¯ E D p q P D p | q E p Y D p q| C D ˚ q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q E p Y D p q| C D ˚ q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q . 
E p Y D p q| C D ˚ q “ ¯ E D p q ´ E p Y D p q p C D ˚ qq P D p | q ´ P D p | q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q E p Y D p q| C D ˚ q “ ¯ E D p q ´ E p Y D p q p C D ˚ qq P D p | q ´ P D p | q “ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q (iii) From (ii) we obtain directly, using Lemma 2, E p Y D p q ´ Y D p q| C D ˚ q “ ¯ E D p q ` ¯ E D p q ´ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q “ E p Y | Z “ q ´ E p Y | Z “ q P D p | q ´ P D p | q E p Y D p q ´ Y D p q| C D ˚ q “ ¯ E D p q ` ¯ E D p q ´ ¯ E D p q ´ ¯ E D p q P D p | q ´ P D p | q “ E p Y | Z “ q ´ E p Y | Z “ q P D p | q ´ P D p | q . Proof of Proposition 14.
Recall that Table 5 shows how response groups map instrumentvalues into filtered treatment values. The proof follows directly.
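The identification claims can also be checked numerically. The sketch below is not the authors' code. It assumes that the five D-response groups are never-takers, always-takers, eager compliers who take D = 1 under every instrument value except 0×0, and two groups of reluctant compliers who take D = 1 only when one particular binary instrument is switched on. The group labels, shares, and mean outcomes are hypothetical.

```python
# D_i(z) for each assumed response group, with z = (z1, z2).
takes = {
    "never":       lambda z: 0,
    "always":      lambda z: 1,
    "eager":       lambda z: int(z != (0, 0)),
    "reluctant_1": lambda z: z[0],   # needs the first instrument
    "reluctant_2": lambda z: z[1],   # needs the second instrument
}
share = {"never": 0.30, "always": 0.10, "eager": 0.25,
         "reluctant_1": 0.20, "reluctant_2": 0.15}
y0 = {"never": 0.0, "always": 0.1, "eager": 0.2,
      "reluctant_1": 0.3, "reluctant_2": 0.4}          # E[Y(0) | group]
effect = {"never": 0.5, "always": 0.4, "eager": 0.3,
          "reluctant_1": 0.2, "reluctant_2": 0.1}      # E[Y(1) - Y(0) | group]

def p1(z):
    """Pr(D = 1 | Z = z)."""
    return sum(share[g] * takes[g](z) for g in share)

def ey(z):
    """E[Y | Z = z]."""
    return sum(share[g] * (y0[g] + effect[g] * takes[g](z)) for g in share)

def wald(z_hi, z_lo):
    """Wald ratio comparing two instrument values."""
    return (ey(z_hi) - ey(z_lo)) / (p1(z_hi) - p1(z_lo))

# Reluctant compliers: plain Wald ratios.
assert abs(wald((1, 1), (0, 1)) - effect["reluctant_1"]) < 1e-9
assert abs(wald((1, 1), (1, 0)) - effect["reluctant_2"]) < 1e-9

# Eager compliers: ratio of two difference-in-differences.
did_y = ey((1, 0)) + ey((0, 1)) - ey((1, 1)) - ey((0, 0))
did_d = p1((1, 0)) + p1((0, 1)) - p1((1, 1)) - p1((0, 0))
assert abs(did_d - share["eager"]) < 1e-9
assert abs(did_y / did_d - effect["eager"]) < 1e-9
```

Under these assumed group definitions, the difference-in-differences of the propensity score equals the share of eager compliers, so its sign is informative about whether the instruments act as substitutes.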
Proof of Proposition 15.
The proof is omitted since it is similar to those of Propositions 13and 14.
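Before turning to the proof of Lemma 3, it may help to see the two-step estimator it motivates in the simplest possible case. The sketch below is a minimal illustration, not the sieve procedure mentioned in the main text. It assumes a discrete covariate, so that the first step reduces to cell means. It also assumes that the difference-in-differences adds the cells 1×0 and 0×1 and subtracts the cells 1×1 and 0×0; that sign pattern is passed as an argument so it can be changed. All names and the toy data are ours.

```python
from collections import defaultdict

# Assumed sign pattern of the difference-in-differences over z = (z1, z2).
SIGNS = {(1, 0): 1.0, (0, 1): 1.0, (1, 1): -1.0, (0, 0): -1.0}

def two_step_did_ratio(rows, signs=SIGNS):
    """Plug-in estimate of E[DiD_Y(X)] / E[DiD_D(X)].

    rows: iterable of (y, d, z1, z2, x) tuples with a discrete covariate x.
    """
    cells = defaultdict(lambda: [0.0, 0.0, 0])  # (x, z1, z2) -> [sum of y, sum of d, count]
    n_by_x = defaultdict(int)
    for y, d, z1, z2, x in rows:
        cell = cells[(x, z1, z2)]
        cell[0] += y
        cell[1] += d
        cell[2] += 1
        n_by_x[x] += 1
    n = sum(n_by_x.values())
    num = den = 0.0
    for x, n_x in n_by_x.items():
        weight = n_x / n                          # empirical Pr(X = x)
        for (a, b), sign in signs.items():
            sum_y, sum_d, count = cells[(x, a, b)]
            if count == 0:
                raise ValueError(f"no observations with x={x}, z={a}x{b}")
            num += weight * sign * sum_y / count  # step 1: cell mean of Y; step 2: average over X
            den += weight * sign * sum_d / count  # same two steps for D
    return num / den

# Deterministic toy data: every (x, z1, z2) cell holds the same mix of response groups.
MIX = {"never": 6, "always": 2, "eager": 5, "reluctant_1": 4, "reluctant_2": 3}
rows = []
for x in (0, 1):
    for z1 in (0, 1):
        for z2 in (0, 1):
            for group, count in MIX.items():
                d = {"never": 0, "always": 1, "eager": int(z1 or z2),
                     "reluctant_1": z1, "reluctant_2": z2}[group]
                effect = 0.3 + 0.4 * x if group == "eager" else 1.0
                rows += [(effect * d, d, z1, z2, x)] * count

estimate = two_step_did_ratio(rows)
print(round(estimate, 6))
```

In this toy design the eager compliers have an effect of 0.3 when x = 0 and 0.7 when x = 1, and x is balanced, so the ratio of averaged difference-in-differences returns their average effect of 0.5 rather than the effects of the other groups.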
Proof of Lemma 3.
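Let $C^D$ denote the complier group in the statement of Lemma 3. Using the fact that $\Pr(i \in C^D \mid X_i = x) = \mathrm{DiD}_D(x)$, we have that
$$\begin{aligned}
\theta &= \int E[Y_i(1) - Y_i(0) \mid i \in C^D, X_i = x]\, f(x \mid i \in C^D)\, dx \\
&= \int E[Y_i(1) - Y_i(0) \mid i \in C^D, X_i = x]\, \frac{\Pr(i \in C^D \mid X_i = x)\, f(x)}{\Pr(i \in C^D)}\, dx \\
&= \int \frac{\mathrm{DiD}_Y(x)}{\mathrm{DiD}_D(x)}\, \frac{\Pr(i \in C^D \mid X_i = x)\, f(x)}{\Pr(i \in C^D)}\, dx \\
&= \frac{\int \mathrm{DiD}_Y(x)\, f(x)\, dx}{\Pr(i \in C^D)} \\
&= \frac{\int \mathrm{DiD}_Y(x)\, f(x)\, dx}{\int \mathrm{DiD}_D(x)\, f(x)\, dx},
\end{aligned}$$
which proves the lemma.

C Additional Material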
C.1 Strict Targeting in the Ternary/ternary Model
Just like ours, Kirkeboen, Leuven, and Mogstad (2016)’s approach to identification relies ona monotonicity assumption and a restriction on the mapping from instruments to treatments.We translate them here in our notation to show that in this model, their assumptions areequivalent to ours.Kirkeboen, Leuven, and Mogstad (2016) impose the following in their Assumption 4: • if T i p q “ T i p q “ • if T i p q “ T i p q “ C ˚ , C ˚ , C ˚ , and C ˚ .Their Proposition 2 proves point-identification of response-groups when one of three alter-native assumptions is added to their Assumption 4. We focus here on their assumption (iii),which is the weakest of the three and the one their application relies on. In our notation, itstates that: • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q “ • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q “ T i p q and T i p q are not 1, then they can only be 0 or 2. Therefore we are requiring T i p q “ T i p q . Applyingthe same argument to the second part, Assumption (iii) becomes: • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q • if ( T i p q ‰ T i p q ‰ T i p q “ T i p q .It therefore excludes the response-groups C ˚ , C ˚ , C ˚ , and C ˚ . The response-group C appears twice in this list; and four other response-groups were already ruled out byAssumption 4. The reader can easily check that the 3 ´ ´ p ´ q “ .2 A Variant of Filtered Factorial Design Let us return to the factorial design of Example 4, with a twist: the unfiltered treatmentconsists of the full ranking of the three alternatives. The instrument values are still p ˆ , ˆ , ˆ , ˆ q ; now T i is a pair that consists of the most-preferred alternative T i p z q “ arg max t “ , , p U z p t q ` u it q and of the least-preferred alternative T i p z q “ arg min t “ , , p U z p t q ` u it q . In Section 3.2, we considered the case when T i is only T i ; and we added a filter D i “ p T i ą q . 
Let us now take T i “ p T i , T i q , with the filter D i “ p T i “ q .The model of Section 3.2, where we only observed whether the most-preferred alternativewas 0, led to a double hurdle model. In this variant, we only observe whether the least- preferred alternative is 0, which leads to a different filtered treatment model:(C.1) D i p z q “ p u i ă min p ∆ Tz p q ` u i , ∆ Tz p q ` u i qq . We keep the same constraints on the mean utilities as in (3.8). Under Equation (C.1), wehave five response groups, as shown in Figure 10. Table 5 shows how response groups mapinstrument values into filtered treatment values.Table 5: D -response Groups for the Alternative Factorial Design Model D i p z q “ D i p z q “ z “ ˆ A D Ť C D Ť C D Ť C D A D z “ ˆ A D Ť C D A D Ť C D Ť C D z “ ˆ A D Ť C D A D Ť C D Ť C D z “ ˆ A D A D Ť C D Ť C D Ť C D Proposition 15 (Identifying the Model with Equation (C.1)) . (i) The probabilities of the u i ´ u i u i ´ u i C D A D C D C D A D P ˆ P ˆ P ˆ P ˆ D -response groups are point-identified by Pr p A D q “ P D p | ˆ q Pr p A D q “ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ´ P D p | ˆ q Pr p C D q “ P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q , and the model has three testable implications: P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ě P D p | ˆ q ,P D p | ˆ q ` P D p | ˆ q ě P D p | ˆ q ´ P D p | ˆ q . (ii) The LATEs on the three groups of compliers are point-identified by E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ´ P D p | ˆ q E p Y Di p q ´ Y Di p q| i P C D q “ E p Y | Z “ ˆ q ` E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q ´ E p Y | Z “ ˆ q P D p | ˆ q ` P D p | ˆ q ´ P D p | ˆ q ´ P D p | ˆ q .
57t is worth comparing Proposition 14 with Proposition 15. One interesting observationis that the share of C D in Proposition 14 is identical up to sign to that of C D in Propo-sition 15. Namely, Pr p C D q “ DiD p T q and Pr p C D q “ p C D q “ p C D q “ ´ DiD p T q in Proposition 15, where DiD p T q is the difference-in-differencesof the propensity score defined by:DiD p T q “ t Pr r D i “ | Z i “ ˆ s ´ Pr r D i “ | Z i “ ˆ su´ t Pr r D i “ | Z i “ ˆ s ´ Pr r D i “ | Z i “ ˆ su . In terms of economic interpretation, one may think of the selection mechanism in Proposi-tion 14 as the scenario when instruments 11 p Z i “ ˆ q and 11 p Z i “ ˆ q are substitutesto encourage agents to take treatment. On the contrary, the selection mechanism in Propo-sition 15 corresponds to the case that instruments 11 p Z i “ ˆ q and 11 p Z i “ ˆ q arecomplements. The same estimands identify the average treatment effects for conceptuallydistinct subpopulations, depending on the details of the selection mechanism. This suggeststhat it is important to learn about the nature of selection into treatment before interpretingthe causal parameters of different compliers. To do this, one can estimate the difference-in-differences of the propensity score DiD p T q and use its sign to determine whether equa-tion (3.9) or equation (C.1) is more plausible in any particular application. References
Ackerberg, D., X. Chen, and J. Hahn (2012): “A practical asymptotic variance estimator for two-step semiparametric estimators,” Review of Economics and Statistics, 94(2), 481–498.

Ackerberg, D., X. Chen, J. Hahn, and Z. Liao (2014): “Asymptotic efficiency of semiparametric two-step GMM,” Review of Economic Studies, 81(3), 919–943.

Angrist, J., D. Lang, and P. Oreopoulos (2009): “Incentives and Services for College Achievement: Evidence from a Randomized Trial,” American Economic Journal: Applied Economics, 1(1), 136–63.

Angrist, J. D., and G. W. Imbens (1995): “Two-stage least squares estimation of average causal effects in models with variable treatment intensity,” Journal of the American Statistical Association, 90(430), 431–442.

Ao, W., S. Calonico, and Y.-Y. Lee (2019): “Multivalued Treatments and Decomposition Analysis: An Application to the WIA Program,” Journal of Business & Economic Statistics, in press.

Bloom, H. S. (1984): “Accounting for no-shows in experimental evaluation designs,” Evaluation Review, 8(2), 225–246.

Caetano, C., and J. C. Escanciano (2020): “Identifying Multiple Marginal Effects with a Single Instrument,” Econometric Theory, forthcoming.

Cattaneo, M. D. (2010): “Efficient semiparametric estimation of multi-valued treatment effects under ignorability,” Journal of Econometrics, 155(2), 138–154.

D’Haultfoeuille, X., and P. Février (2015): “Identification of Nonseparable Triangular Models With Discrete Instruments,” Econometrica, 83(3), 1199–1210.

Feng, J. (2020): “Matching Points: Supplementing Instruments with Covariates in Triangular Models,” Job market paper, available at https://econ.columbia.edu/e/jfeng/.

Frölich, M. (2007): “Nonparametric IV estimation of local average treatment effects with covariates,” Journal of Econometrics, 139(1), 35–75.

Goff, L. (2020): “A Vector Monotonicity Assumption for Multiple Instruments,” available at .

Heckman, J., and R. Pinto (2018): “Unordered Monotonicity,” Econometrica, 86(1), 1–35.

Heckman, J. J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47(1), 153–161.

Heckman, J. J., S. Urzua, and E. Vytlacil (2006): “Understanding instrumental variables in models with essential heterogeneity,” Review of Economics and Statistics, 88(3), 389–432.

Heckman, J. J., S. Urzua, and E. Vytlacil (2008): “Instrumental variables in models with multiple outcomes: The general unordered case,” Annales d’économie et de statistique, 91/92, 151–174.

Huang, L., U. Khalil, and N. Yildiz (2019): “Identification and estimation of a triangular model with multiple endogenous variables and insufficiently many instrumental variables,” Journal of Econometrics, 208(2), 346–366.

Imbens, G. W. (2000): “The role of the propensity score in estimating dose-response functions,” Biometrika, 87(3), 706–710.

Imbens, G. W., and J. D. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62(2), 467–475.

Kamat, V. (2019): “Identification with Latent Choice Sets,” arXiv:1711.02048, https://arxiv.org/abs/1711.02048.

Kirkeboen, L. J., E. Leuven, and M. Mogstad (2016): “Field of study, earnings, and self-selection,” Quarterly Journal of Economics, 131(3), 1057–1111.

Kline, P., and C. R. Walters (2016): “Evaluating public programs with close substitutes: The case of Head Start,” Quarterly Journal of Economics, 131(4), 1795–1848.

Lee, S., and B. Salanié (2018): “Identifying effects of multivalued treatments,” Econometrica, 86(6), 1939–1963.

Mogstad, M., A. Torgovitsky, and C. R. Walters (2019): “Identification of Causal Effects with Multiple Instruments: Problems and Some Solutions,” Working Paper 25691, National Bureau of Economic Research.

Mogstad, M., A. Torgovitsky, and C. R. Walters (2020): “Policy Evaluation With Multiple Instrumental Variables,” Working Paper 27546, National Bureau of Economic Research.

Mountjoy, J. (2019): “Community Colleges and Upward Mobility,” Chicago Booth mimeo.

Muralidharan, K., M. Romero, and K. Wüthrich (2019): “Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments,” Working Paper 26562, National Bureau of Economic Research.
Torgovitsky, A. (2015): “Identification of Nonseparable Models Using Instruments WithSmall Support,”
Econometrica , 83(3), 1185–1197.
Vytlacil, E. (2002): “Independence, monotonicity, and latent index models: An equiva-lence result,”