Direct and Indirect Effects based on Changes-in-Changes
arXiv [econ.EM], Oct
Martin Huber†, Mark Schelker†, Anthony Strittmatter‡
† University of Fribourg, Dept. of Economics
‡ University of St. Gallen, Swiss Institute for Empirical Economic Research
Abstract:
We propose a novel approach for causal mediation analysis based on changes-in-changes assumptions restricting unobserved heterogeneity over time. This allows disentangling the causal effect of a binary treatment on a continuous outcome into an indirect effect operating through a binary intermediate variable (called mediator) and a direct effect running via other causal mechanisms. We identify average and quantile direct and indirect effects for various subgroups under the condition that the outcome is monotonic in the unobserved heterogeneity and that the distribution of the latter does not change over time conditional on the treatment and the mediator. We also provide a simulation study and an empirical application to the Jobs II programme.
Keywords:
Direct effects, indirect effects, mediation analysis, changes-in-changes, causal mechanisms, treatment effects.
JEL classification: C21.
∗ We have benefited from comments by Giuseppe Germinario as well as conference/seminar participants at the Universities of Neuchâtel, Melbourne, Sydney, Hamburg, and Lisbon, the Luxembourg Institute of Socio-Economic Research, the 2019 meeting of the Austro-Swiss Region of the International Biometric Society in Lausanne, and the 2019 meeting of the International Association for Applied Econometrics in Nicosia. Addresses for correspondence: Martin Huber, Chair of Applied Econometrics - Evaluation of Public Policies, University of Fribourg, Bd. de Pérolles 90, 1700 Fribourg, Switzerland, [email protected]. Mark Schelker, Chair of Public Economics, University of Fribourg, Bd. de Pérolles 90, 1700 Fribourg, Switzerland, [email protected]. Anthony Strittmatter, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen, Varnbüelstr. 14, 9000 St. Gallen, Switzerland, [email protected].
Introduction
Causal mediation analysis aims at disentangling a total treatment effect into an indirect effect operating through an intermediate variable, commonly referred to as mediator, as well as the direct effect. The latter includes any causal mechanisms not operating through the mediator of interest. Even when the treatment is random, direct and indirect effects are generally not identified by simply controlling for the mediator without accounting for its potential endogeneity, as this likely introduces selection bias, see Robins and Greenland (1992).

This paper suggests a novel identification strategy for causal mediation analysis based on changes-in-changes (CiC) as suggested by Athey and Imbens (2006) for evaluating (total) average and quantile treatment effects. We adapt the approach to the identification of the direct effect and the indirect effect running through a binary mediator. The outcome variable must be continuous and is assumed to be observed both prior to and after treatment and mediator assignment, as is the case in repeated cross sections or panel data. The key identifying assumptions imply that the continuous outcome is strictly monotonic in unobserved heterogeneity and that the distribution of unobserved heterogeneity does not change over time conditional on the treatment and the mediator (the latter assumption is also known as stationarity).
Given appropriate common support conditions, this permits identifying direct effects on subpopulations conditional on the treatment and the mediator states, even if both treatment and mediator assignment are endogenous.

Augmenting the assumptions by random treatment assignment and weak monotonicity of the mediator in the treatment allows for causal mediation analysis in subpopulations defined upon whether and how the mediator reacts to the treatment. Specifically, we show the identification of direct effects among those whose mediator is always one (always-takers in the denomination of Angrist, Imbens, and Rubin, 1996) and never one (never-takers) irrespective of treatment assignment, respectively. Furthermore, we identify the total, direct, and indirect treatment effects on those whose mediator value complies with treatment assignment (compliers). For any set of assumptions, we discuss the identification of both average and quantile direct and indirect effects. We note that if appropriately weighted, the respective average effects among compliers, always-takers, and never-takers add up to the average direct and indirect effects in the population.

Identification in the earlier mediation literature typically relied on linear models for the mediator and outcome equations and often neglected endogeneity issues, see for instance Cochran (1957), Judd and Kenny (1981), and Baron and Kenny (1986). More recent contributions use more general identification approaches based on the potential outcome framework and take endogeneity issues explicitly into consideration. Examples include Robins and Greenland (1992), Pearl (2001), Robins (2003), Petersen, Sinisi, and van der Laan (2006), VanderWeele (2009), Imai, Keele, and Yamamoto (2010), Hong (2010), Albert and Nelson (2011), Imai and Yamamoto (2013), Tchetgen Tchetgen and Shpitser (2012), Vansteelandt, Bekaert, and Lange (2012), and Huber (2014).
The vast majority of the literature assumes that the covariates observed in the data are sufficiently rich to control for treatment and mediator endogeneity. In empirical economics, too, there has been an increase in the application of such selection-on-observables approaches, see for instance Simonsen and Skipper (2006), Flores and Flores-Lagunes (2009), Heckman, Pinto, and Savelyev (2013), Huber (2015), Keele, Tingley, and Yamamoto (2015), Conti, Heckman, and Pinto (2016), Huber, Lechner, and Mellace (2017), Bijwaard and Jones (2019), Bellani and Bia (2018), Huber, Lechner, and Strittmatter (2018), and Doerr and Strittmatter (2019). Comparatively few studies in economics develop or apply instrumental variable approaches for disentangling direct and indirect effects, see for instance Frölich and Huber (2017), Powdthavee, Lekfuangfu, and Wooden (2013), Brunello, Fort, Schneeweis, and Winter-Ebmer (2016), and Chen, Chen, and Liu (2017). Our paper provides another, CiC-based identification strategy that rests neither on selection-on-observables assumptions nor on instrumental variables for the treatment or the mediator.

While most studies aim at evaluating direct and indirect effects in the total population, a smaller strand of the literature uses the principal stratification framework of Frangakis and Rubin (2002) to investigate effects in subpopulations (or principal strata) defined upon whether and how the mediator reacts to the treatment, see Rubin (2004). This approach has been criticized for typically focussing on direct effects on populations whose mediator is constant (i.e. always- and never-takers) rather than decomposing direct and indirect effects on compliers, and for considering subpopulations rather than the total population, see VanderWeele (2008) and VanderWeele (2012). Deuchert, Huber, and Schelker (2017) suggest a difference-in-differences (DiD) strategy that alleviates such criticisms.
Identification relies on a randomized treatment, monotonicity of the (binary) mediator in the treatment, and particular common trend assumptions on mean potential outcomes across principal strata. The latter imply that mean potential outcomes under specific treatment and mediator states change by the same amount over time across specific subpopulations. Depending on the strength of common trend and effect homogeneity assumptions across principal strata, direct and indirect effects are identified for different subpopulations and, under the strongest set of assumptions, even for the total population.

Our paper contributes to this literature on principal strata effects, but relies on different identifying assumptions than Deuchert, Huber, and Schelker (2017). While differential time trends across subpopulations are permitted, our approach restricts the conditional distribution of unobserved heterogeneity over time. The two sets of assumptions are not nested and their appropriateness is to be judged in the empirical context at hand. However, both approaches could be used simultaneously for testing the joint validity of the identifying assumptions of either method, in which case both CiC and DiD converge to the same, true average direct and indirect effects. As a further distinction to Deuchert, Huber, and Schelker (2017), our method also permits assessing quantile treatment effects (QTEs) rather than average effects only.

In independent work, Sawada (2019) proposes a CiC strategy to tackle non-compliance in randomized experiments when the exclusion restriction of random assignment is violated. While there is an overlap in some identification results of his study and ours (e.g. concerning the direct effect on never-takers), there are also important differences. First, Sawada (2019) predominantly focusses on the average treatment effect on the treated under one-sided non-compliance (ruling out always-takers), which then corresponds to the total effect on compliers.
Our paper in addition disentangles the total complier effect into direct and indirect components. Second, under two-sided non-compliance (i.e. the existence of both never- and always-takers), Sawada (2019) identifies the total complier effect by assuming homogeneity of the direct effect, while we extend the CiC assumptions to the always-takers for identifying (direct, indirect, and total) complier effects as well as the direct effect among always-takers. Third and in contrast to Sawada (2019), we also provide identification results in the absence of randomization and monotonicity of the mediator in the treatment. On the other hand, Sawada (2019), in contrast to our study, demonstrates that the CiC strategy does not necessarily require pre-treatment outcomes, but may exploit any pre-treatment variable that has similar rank orders (as a function of unobserved heterogeneity) as the outcome of interest.

We provide a simulation study in which we compare the CiC to the DiD approach to illustrate our identification results. We also consider an empirical application to the Jobs II programme previously analysed by Vinokur, Price, and Schul (1995), a randomized job training intervention designed to analyse the impact of job training on labour market and mental health outcomes. We investigate the direct effect of the randomized offer of treatment on a depression index, as well as its indirect effect through actual participation in the programme as mediator.
The reason for investigating the direct effect is that treatment assignment could have a motivation or discouragement effect on those randomly offered or not offered the training. We, however, find the direct effect estimates to be close to zero and statistically insignificant and therefore no indication for the violation of the exclusion restriction when using treatment assignment as instrumental variable for actual participation. In contrast, the moderately negative total and indirect effects on those induced to participate by assignment are statistically significant at least at the 10% level in all but one case and very much in line with the estimate obtained by instrumental variable regression.

The remainder of this study is organized as follows. Section 2 introduces the notation and defines the direct and indirect effects of interest. Section 3 presents the assumptions underlying our CiC approach as well as the identification results. Section 4 provides a simulation study. Section 5 provides an application to Jobs II. Section 6 concludes.
Let $D$ denote a binary treatment (e.g., receiving the offer to participate in a training programme) and $M$ a binary intermediate variable or mediator that may be a function of $D$ (e.g., the actual participation in a training programme). Furthermore, let $T$ indicate a particular time period: $T = 0$ denotes the baseline period prior to the realisation of $D$ and $M$, and $T = 1$ the follow-up period after measuring $D$ and $M$, in which the effect on the outcome is evaluated. Finally, let $Y_t$ denote the outcome of interest (e.g., health measures) in period $T = t$. Indexing the outcome by the time period $t \in \{0, 1\}$ implies that it is measured both in the baseline period and after the realisation of $D$ and $M$. To define the parameters of interest, we make use of the potential outcome notation, see for instance Rubin (1974), and denote by $Y_t(d, m)$ the potential outcome for treatment state $D = d$ and mediator state $M = m$ in time $T = t$, with $d, m, t \in \{0, 1\}$. Furthermore, let $M(d)$ denote the potential mediator as a function of the treatment state $d \in \{0, 1\}$. For notational ease, we will not use any time index for $D$ and $M$, because either is assumed to be measured at a single point in time between $T = 0$ and $T = 1$, albeit not necessarily at the same point, as $D$ causally precedes $M$. Therefore, $D$ and $M$ correspond to the actual treatment and mediator status in $T = 1$, while it is assumed that no treatment or mediation takes place in $T = 0$.

Using this notation, the average treatment effect (ATE) in the ex-post period is defined as $\Delta_1 = E[Y_1(1, M(1)) - Y_1(0, M(0))]$. That is, the ATE corresponds to the effect of $D$ on the outcome that either affects the latter directly (net of any effect on the mediator) or indirectly through an effect on $M$.
Indeed, the total ATE can be disentangled into the direct and indirect effects, denoted by $\theta_1(d) = E[Y_1(1, M(d)) - Y_1(0, M(d))]$ and $\delta_1(d) = E[Y_1(d, M(1)) - Y_1(d, M(0))]$, by adding and subtracting $Y_1(1, M(0))$ or $Y_1(0, M(1))$, respectively:
$$
\begin{aligned}
\Delta_1 &= E[Y_1(1, M(1)) - Y_1(0, M(0))] \\
&= \underbrace{E[Y_1(1, M(1)) - Y_1(1, M(0))]}_{=\delta_1(1)} + \underbrace{E[Y_1(1, M(0)) - Y_1(0, M(0))]}_{=\theta_1(0)} \\
&= \underbrace{E[Y_1(1, M(1)) - Y_1(0, M(1))]}_{=\theta_1(1)} + \underbrace{E[Y_1(0, M(1)) - Y_1(0, M(0))]}_{=\delta_1(0)}.
\end{aligned}
$$
Distinguishing between $\theta_1(1)$ and $\theta_1(0)$ or $\delta_1(1)$ and $\delta_1(0)$, respectively, implies the possibility of interaction effects between $D$ and $M$ such that the direct and indirect effects could be heterogeneous across values $d = 1$ and $d = 0$.

In our approach, we consider the concepts of direct and indirect effects within specific subpopulations. The latter are either defined conditional on the treatment and mediator values or conditional on potential mediator values under either treatment state, which matches the so-called principal stratum framework of Frangakis and Rubin (2002). As outlined in Angrist, Imbens, and Rubin (1996) in the context of instrumental variable-based identification, any individual $i$ in the population belongs to one of four strata, henceforth denoted by $\tau$, according to their potential mediator status under either treatment state: always-takers ($a$: $M(1) = M(0) = 1$) whose mediator is always one, compliers ($c$: $M(1) = 1, M(0) = 0$) whose mediator corresponds to the treatment value, defiers ($de$: $M(1) = 0, M(0) = 1$) whose mediator opposes the treatment value, and never-takers ($n$: $M(1) = M(0) = 0$) whose mediator is never one.
Note that $\tau$ cannot be pinned down for any individual, because either $M(1)$ or $M(0)$ is observed, but never both.

Let $\Delta_1^{\tau} = E[Y_1(1, M(1)) - Y_1(0, M(0)) \mid \tau]$ denote the ATE conditional on $\tau \in \{a, c, de, n\}$; $\theta_1^{\tau}(d)$ and $\delta_1^{\tau}(d)$ denote the corresponding direct and indirect effects. Because $M(1) = M(0) = 0$ for any never-taker, the indirect effect for this group is by definition zero ($\delta_1^{n}(d) = E[Y_1(d, 0) - Y_1(d, 0) \mid \tau = n] = 0$) and $\Delta_1^{n} = E[Y_1(1, 0) - Y_1(0, 0) \mid \tau = n] = \theta_1^{n}(1) = \theta_1^{n}(0) = \theta_1^{n}$ equals the direct effect for never-takers. Correspondingly, because $M(1) = M(0) = 1$ for any always-taker, the indirect effect for this group is by definition zero ($\delta_1^{a}(d) = E[Y_1(d, 1) - Y_1(d, 1) \mid \tau = a] = 0$) and $\Delta_1^{a} = E[Y_1(1, 1) - Y_1(0, 1) \mid \tau = a] = \theta_1^{a}(1) = \theta_1^{a}(0) = \theta_1^{a}$ equals the direct effect for always-takers. For the compliers, both direct and indirect effects may exist. Note that $M(d) = d$ due to the definition of compliers. Accordingly, $\theta_1^{c}(d) = E[Y_1(1, d) - Y_1(0, d) \mid \tau = c]$ equals the direct effect for compliers, $\delta_1^{c}(d) = E[Y_1(d, 1) - Y_1(d, 0) \mid \tau = c]$ equals the indirect effect for compliers, and $\Delta_1^{c} = E[Y_1(1, 1) - Y_1(0, 0) \mid \tau = c]$ equals the total effect for compliers. In the absence of any direct effect, the indirect effects on the compliers are homogeneous, $\delta_1^{c}(1) = \delta_1^{c}(0) = \delta_1^{c}$, and correspond to the local average treatment effect (LATE, e.g., Angrist, Imbens, and Rubin, 1996). Analogous results hold for the defiers.

As already mentioned, we will also consider direct effects conditional on specific values $D = d$ and mediator states $M = M(d) = m$, which are denoted by $\theta_1^{d,m}(d) = E[Y_1(1, m) - Y_1(0, m) \mid D = d, M(d) = m]$. These parameters are identified under weaker assumptions than strata-specific effects, but are also less straightforward to interpret, as they refer to mixtures of two strata.
For instance, $\theta_1^{1,0}(1) = E[Y_1(1, 0) - Y_1(0, 0) \mid D = 1, M(1) = 0]$ is the effect on a mixture of never-takers and defiers, as these two groups satisfy $M(1) = 0$. Likewise, $\theta_1^{0,0}(0)$ refers to never-takers and compliers satisfying $M(0) = 0$, $\theta_1^{0,1}(0)$ to always-takers and defiers satisfying $M(0) = 1$, and $\theta_1^{1,1}(1)$ to always-takers and compliers satisfying $M(1) = 1$.

We denote by $F_{Y_t(d,m)}(y) = \Pr(Y_t(d, m) \le y)$ the cumulative distribution function of $Y_t(d, m)$ at outcome level $y$. Its inverse, $F^{-1}_{Y_t(d,m)}(q) = \inf\{y : F_{Y_t(d,m)}(y) \ge q\}$, is the quantile function of $Y_t(d, m)$ at rank $q$. The total QTE is denoted by $\Delta_1(q) = F^{-1}_{Y_1(1,M(1))}(q) - F^{-1}_{Y_1(0,M(0))}(q)$. The QTE can be disentangled into the direct quantile effects, denoted by $\theta_1(q, d) = F^{-1}_{Y_1(1,M(d))}(q) - F^{-1}_{Y_1(0,M(d))}(q)$, and the indirect quantile effects, denoted by $\delta_1(q, d) = F^{-1}_{Y_1(d,M(1))}(q) - F^{-1}_{Y_1(d,M(0))}(q)$.

The conditional distribution function in stratum $\tau$ is $F_{Y_t(d,m) \mid \tau}(y) = \Pr(Y_t(d, m) \le y \mid \tau)$ and the corresponding conditional quantile function is $F^{-1}_{Y_t(d,m) \mid \tau}(q) = \inf\{y : F_{Y_t(d,m) \mid \tau}(y) \ge q\}$ for $\tau \in \{a, c, de, n\}$. Using the previously described stratification framework, we define the QTE conditional on $\tau \in \{a, c, de, n\}$: $\Delta_1^{\tau}(q) = F^{-1}_{Y_1(1,M(1)) \mid \tau}(q) - F^{-1}_{Y_1(0,M(0)) \mid \tau}(q)$. The direct quantile treatment effect among never-takers equals $\Delta_1^{n}(q) = F^{-1}_{Y_1(1,0) \mid n}(q) - F^{-1}_{Y_1(0,0) \mid n}(q) = \theta_1^{n}(q)$. The direct quantile effect among always-takers equals $\Delta_1^{a}(q) = F^{-1}_{Y_1(1,1) \mid a}(q) - F^{-1}_{Y_1(0,1) \mid a}(q) = \theta_1^{a}(q)$. The total QTE among compliers equals $\Delta_1^{c}(q) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, the direct quantile effect among compliers equals $\theta_1^{c}(q, d) = F^{-1}_{Y_1(1,d) \mid c}(q) - F^{-1}_{Y_1(0,d) \mid c}(q)$, and the indirect quantile effect among compliers equals $\delta_1^{c}(q, d) = F^{-1}_{Y_1(d,1) \mid c}(q) - F^{-1}_{Y_1(d,0) \mid c}(q)$.
Finally, we define the direct quantile treatment effects conditional on specific values $D = d$ and mediator states $M = M(d) = m$,
$$\theta_1^{d,m}(q, 1) = F^{-1}_{Y_1(1,m) \mid D=d, M(1)=m}(q) - F^{-1}_{Y_1(0,m) \mid D=d, M(1)=m}(q)$$
and
$$\theta_1^{d,m}(q, 0) = F^{-1}_{Y_1(1,m) \mid D=d, M(0)=m}(q) - F^{-1}_{Y_1(0,m) \mid D=d, M(0)=m}(q),$$
with the quantile function $F^{-1}_{Y_t(d,m) \mid D=d, M(d)=m}(q) = \inf\{y : F_{Y_t(d,m) \mid D=d, M(d)=m}(y) \ge q\}$ and the distribution function $F_{Y_t(d,m) \mid D=d, M(d)=m}(y) = \Pr(Y_t(d, m) \le y \mid D = d, M(d) = m)$.

We subsequently define various functions of the observed data required for the identification results. The conditional distribution function of the observed outcome $Y_t$ given treatment value $d$ and mediator state $m$ is $F_{Y_t \mid D=d, M=m}(y) = \Pr(Y_t \le y \mid D = d, M = m)$ for $d, m \in \{0, 1\}$. The corresponding conditional quantile function is $F^{-1}_{Y_t \mid D=d, M=m}(q) = \inf\{y : F_{Y_t \mid D=d, M=m}(y) \ge q\}$. Furthermore,
$$Q_{dm}(y) := F^{-1}_{Y_1 \mid D=d, M=m} \circ F_{Y_0 \mid D=d, M=m}(y) = F^{-1}_{Y_1 \mid D=d, M=m}\big(F_{Y_0 \mid D=d, M=m}(y)\big)$$
is the quantile-quantile transform of the conditional outcome from period 0 to 1 given treatment $d$ and mediator status $m$. This transform maps $y$ at rank $q$ in period 0 ($q = F_{Y_0 \mid D=d, M=m}(y)$) into the corresponding $y'$ at rank $q$ in period 1 ($y' = F^{-1}_{Y_1 \mid D=d, M=m}(q)$).

This section discusses the identifying assumptions along with the identification results for the various direct and indirect effects. We note that our assumptions could be adjusted to only hold conditional on a vector of observed covariates. In this case, the identification results would hold within cells defined upon covariate values. In our main discussion, however, covariates are not considered for the sake of ease of notation. For notational convenience, we maintain throughout that
$\Pr(T = t, D = d, M = m) > 0$ for $t, d, m \in \{0, 1\}$, implying that all possible treatment-mediator combinations exist in the population in both time periods. Our first assumption implies that potential outcomes are characterized by a continuous nonparametric function, denoted by $h$, that is strictly monotonic in a scalar $U$ that reflects unobserved heterogeneity.
Assumption 1:
Strict monotonicity of continuous potential outcomes in unobserved heterogeneity.
The potential outcomes satisfy the following model: $Y_t(d, m) = h(d, m, t, U)$, with the general function $h$ being continuous and strictly increasing in the scalar unobservable $U \in \mathbb{R}$ for all $d, m, t \in \{0, 1\}$.

Assumption 1 requires the potential outcomes to be continuous, implying that there is a one-to-one correspondence between a potential outcome's distribution and quantile functions, which is a condition for point identification. For discrete potential outcomes, only bounds on the effects could be identified, in analogy to the discussion in Athey and Imbens (2006) for total (rather than direct and indirect) effects. Assumption 1 also implies that individuals with identical unobserved characteristics $U$ have the same potential outcomes $Y_t(d, m)$, while higher values of $U$ correspond to strictly higher potential outcomes $Y_t(d, m)$. Strict monotonicity is automatically satisfied in additively separable models, but Assumption 1 also allows for more flexible non-additive structures that arise in nonparametric models.

The next assumption rules out anticipation effects of the treatment or the mediator on the outcome in the baseline period. This assumption is plausible if assignment to the treatment or the mediator cannot be foreseen in the baseline period, such that behavioral changes affecting the pre-treatment outcome are ruled out.
Assumption 2:
No anticipation effect of $M$ and $D$ in the baseline period.
$Y_0(d, m) - Y_0(d', m') = 0$ for $d, d', m, m' \in \{0, 1\}$.

Similarly, Athey and Imbens (2006) and Chaisemartin and D’Haultfeuille (2018) assume that assignment to the treatment group does not affect the potential outcomes as long as the treatment is not yet realized.

Furthermore, we assume conditional independence between unobserved heterogeneity and time periods given the treatment and no mediation.
Assumption 3:
Conditional independence of $U$ and $T$ given $D = 1, M = 0$ or $D = 0, M = 0$.
(a) $U \perp\!\!\!\perp T \mid D = 1, M = 0$,
(b) $U \perp\!\!\!\perp T \mid D = 0, M = 0$.

Under Assumption 3a, the distribution of $U$ is allowed to vary across groups defined upon treatment and mediator state, but not over time within the group with $D = 1, M = 0$. Assumption 3b imposes the same restriction conditional on $D = 0, M = 0$. Assumption 3 thus imposes stationarity of $U$ within groups defined on $D$ and $M$. This assumption is weaker than (and thus implied by) requiring that $U$ is constant across $T$ for each individual $i$. For example, Assumption 3 is satisfied in the fixed effect model $U = \eta + v_t$, with $\eta$ being a time-invariant individual-specific unobservable (fixed effect) and $v_t$ an idiosyncratic time-varying unobservable with the same distribution in both time periods.

Athey and Imbens (2006) and Chaisemartin and D’Haultfeuille (2018) impose time invariance conditional on the treatment status, $U \perp\!\!\!\perp T \mid D = d$, to identify the average treatment effect on the treated, $\varphi = E[Y_1(1, M(1)) - Y_1(0, M(0)) \mid D = 1]$, or the local average treatment effect, $\varphi = E[Y_1(1, M(1)) - Y_1(0, M(0)) \mid \tau = c]$, respectively. We additionally condition on the mediator status to identify direct and indirect effects.

For our next assumption, we introduce some further notation. Let $F_{U \mid d,m}(u) = \Pr(U \le u \mid D = d, M = m)$ be the conditional distribution of $U$ with support $\mathcal{U}_{dm}$.
Assumption 4:
Common support given $M = 0$.
(a) $\mathcal{U}_{10} \subseteq \mathcal{U}_{00}$,
(b) $\mathcal{U}_{00} \subseteq \mathcal{U}_{10}$.

Assumption 4a is a common support assumption, implying that any possible value of $U$ in the population with $D = 1, M = 0$ is also contained in the population with $D = 0, M = 0$. Assumption 4b imposes that any value of $U$ conditional on $D = 0, M = 0$ also exists conditional on $D = 1, M = 0$. Both assumptions together imply that the support of $U$ is the same in both populations, although the distributions may generally differ.

Assumptions 1 to 4 permit identifying direct effects on mixed populations of never-takers and defiers as well as never-takers and compliers, respectively, as formally stated in Theorem 1.
Theorem 1:
Under Assumptions 1–3,
(a) and Assumption 4a, the average and quantile direct effects under $d = 1$ conditional on $D = 1$ and $M(1) = 0$ are identified:
$$\theta_1^{1,0}(1) = E[Y_1 - Q_{00}(Y_0) \mid D = 1, M = 0],$$
$$\theta_1^{1,0}(q, 1) = F^{-1}_{Y_1 \mid D=1, M=0}(q) - F^{-1}_{Q_{00}(Y_0) \mid D=1, M=0}(q).$$
(b) and Assumption 4b, the average and quantile direct effects under $d = 0$ conditional on $D = 0$ and $M(0) = 0$ are identified:
$$\theta_1^{0,0}(0) = E[Q_{10}(Y_0) - Y_1 \mid D = 0, M = 0],$$
$$\theta_1^{0,0}(q, 0) = F^{-1}_{Q_{10}(Y_0) \mid D=0, M=0}(q) - F^{-1}_{Y_1 \mid D=0, M=0}(q).$$
Proof.
See Appendix A.

To identify direct effects on further populations, we invoke a conditional independence assumption that is in the spirit of Assumption 3, but refers to different combinations of the treatment and the mediator.
Assumption 5:
Conditional independence of $U$ and $T$ given $D = 0, M = 1$ or $D = 1, M = 1$.
(a) $U \perp\!\!\!\perp T \mid D = 0, M = 1$,
(b) $U \perp\!\!\!\perp T \mid D = 1, M = 1$.

Under Assumption 5a, the distribution of $U$ is allowed to vary by treatment and mediator group, but not over time conditional on $D = 0, M = 1$. Assumption 5b imposes the same restriction conditional on $D = 1, M = 1$.

Assumption 6 is similar to Assumption 4, but imposes common support conditional on $M = 1$ rather than $M = 0$.
Assumption 6:
Common support given $M = 1$.
(a) $\mathcal{U}_{01} \subseteq \mathcal{U}_{11}$,
(b) $\mathcal{U}_{11} \subseteq \mathcal{U}_{01}$.

Assumption 6a implies that any possible value of $U$ in the population with $D = 0, M = 1$ is also contained in the population with $D = 1, M = 1$. Assumption 6b states that any value of $U$ conditional on $D = 1, M = 1$ exists conditional on $D = 0, M = 1$.

Theorem 2 shows the identification of the direct effects on mixed populations of always-takers and defiers as well as always-takers and compliers.
Theorem 2:
Under Assumptions 1–2 and 5,
(a) and Assumption 6a, the average and quantile direct effects under $d = 0$ conditional on $D = 0$ and $M(0) = 1$ are identified:
$$\theta_1^{0,1}(0) = E[Q_{11}(Y_0) - Y_1 \mid D = 0, M = 1],$$
$$\theta_1^{0,1}(q, 0) = F^{-1}_{Q_{11}(Y_0) \mid D=0, M=1}(q) - F^{-1}_{Y_1 \mid D=0, M=1}(q).$$
(b) and Assumption 6b, the average and quantile direct effects under $d = 1$ conditional on $D = 1$ and $M(1) = 1$ are identified:
$$\theta_1^{1,1}(1) = E[Y_1 - Q_{01}(Y_0) \mid D = 1, M = 1],$$
$$\theta_1^{1,1}(q, 1) = F^{-1}_{Y_1 \mid D=1, M=1}(q) - F^{-1}_{Q_{01}(Y_0) \mid D=1, M=1}(q).$$
Proof.
See Appendix B.

In the instrumental variable framework, any direct effects of the instrument are typically ruled out by imposing the exclusion restriction, in order to identify the causal effect of an endogenous regressor on the outcome, see for instance Imbens and Angrist (1994). By considering $D$ as instrument and $M$ as endogenous regressor, $\theta_1^{1,0}(1) = \theta_1^{0,0}(0) = \theta_1^{0,1}(0) = \theta_1^{1,1}(1) = 0$ yield testable implications of the exclusion restriction under Assumptions 1–6.

So far, we did not impose exogeneity of the treatment or mediator. In the following, we assume treatment exogeneity by invoking independence between the treatment and the potential post-treatment variables.
Assumption 7:
Independence of the treatment and potential mediators/outcomes.
$\{Y_t(d, m), M(d)\} \perp\!\!\!\perp D$ for all $d, m, t \in \{0, 1\}$.

Under this assumption, $\Delta_1 = E[Y_1 \mid D = 1] - E[Y_1 \mid D = 0]$.

Furthermore, we assume the mediator to be weakly monotonic in the treatment.
Assumption 8:
Weak monotonicity of the mediator in the treatment.
$\Pr(M(1) \ge M(0)) = 1$.

Assumption 8 is standard in the instrumental variable literature on local average treatment effects when denoting by $D$ the instrument and by $M$ the endogenous regressor, see Imbens and Angrist (1994) and Angrist, Imbens, and Rubin (1996). It rules out the existence of defiers.

As discussed in Appendix C, the total ATE $\Delta_1 = E[Y_1 \mid D = 1] - E[Y_1 \mid D = 0]$ and QTE $\Delta_1(q) = F^{-1}_{Y_1 \mid D=1}(q) - F^{-1}_{Y_1 \mid D=0}(q)$ for the entire population are identified under Assumption 7. Furthermore, Assumptions 7 and 8 yield the strata proportions, denoted by $p_{\tau} = \Pr(\tau)$, as functions of the conditional mediator probabilities given the treatment, which we denote by $p_{m \mid d} = \Pr(M = m \mid D = d)$ for $d, m \in \{0, 1\}$ (see Appendix C):
$$p_a = p_{1|0}, \qquad p_c = p_{1|1} - p_{1|0} = p_{0|0} - p_{0|1}, \qquad p_n = p_{0|1}. \tag{1}$$
Furthermore, Assumptions 2, 7, and 8 imply that (see Appendix C)
$$\Delta_0^{c} = E[Y_0(1, 1) - Y_0(0, 0) \mid c] = \frac{E[Y_0 \mid D = 1] - E[Y_0 \mid D = 0]}{p_{1|1} - p_{1|0}} = 0. \tag{2}$$
Therefore, a rejection of the testable implication $E[Y_0 \mid D = 1] - E[Y_0 \mid D = 0] = 0$ in the data would point to a violation of these assumptions.

Assumptions 7 and 8 permit identifying additional parameters, namely the total, direct, and indirect effects on compliers, and the direct effects on never- and always-takers, as shown in Theorems 3 to 5. This follows from the fact that defiers are ruled out and that the proportions and potential outcome distributions of the various principal strata are not selective w.r.t. the treatment.
Theorem 3:
Under Assumptions 1–3 and 7–8,
a) and Assumption 4a, the average and quantile direct effects on never-takers are identified: $\theta_1^{n} = \theta_1^{1,0}(1)$ and $\theta_1^{n}(q) = \theta_1^{1,0}(q, 1)$.
b) and Assumption 4, the average direct effect under $d = 0$ on compliers is identified:
$$\theta_1^{c}(0) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\, \theta_1^{0,0}(0) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\, \theta_1^{1,0}(1).$$
Furthermore, the potential outcome distributions under $d = 0$ on compliers are identified:
$$F_{Y_1(1,0) \mid \tau = c}(y) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\, F_{Q_{10}(Y_0) \mid D=0, M=0}(y) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\, F_{Y_1 \mid D=1, M=0}(y), \tag{3}$$
$$F_{Y_1(0,0) \mid \tau = c}(y) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\, F_{Y_1 \mid D=0, M=0}(y) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\, F_{Q_{00}(Y_0) \mid D=1, M=0}(y). \tag{4}$$
Therefore, the direct quantile effect under $d = 0$ on compliers, $\theta_1^{c}(q, 0) = F^{-1}_{Y_1(1,0) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, is identified.
Proof.
See Appendix D.
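The weights in Theorem 3(b) can be traced back to a simple mixture argument. The lines below are only a sketch of that step, not a substitute for the formal proof in Appendix D. They use the strata proportions in (1) together with the fact that, under Assumptions 7 and 8, the group with $D = 0, M = 0$ consists of compliers and never-takers, while the group with $D = 1, M = 0$ consists of never-takers only.

```latex
\begin{align*}
\theta_1^{0,0}(0)
  &= \frac{p_c}{p_c + p_n}\,\theta_1^{c}(0) + \frac{p_n}{p_c + p_n}\,\theta_1^{n},
  \qquad p_c + p_n = p_{0|0}, \quad p_n = p_{0|1}, \quad \theta_1^{n} = \theta_1^{1,0}(1), \\
\Longrightarrow \quad
\theta_1^{c}(0)
  &= \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\,\theta_1^{0,0}(0)
   - \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\,\theta_1^{1,0}(1).
\end{align*}
```

The same reasoning applied to distribution functions rather than means gives the weights in (3) and (4).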
Theorem 4:
Under Assumptions 1–2, 5, and 7–8,
a) and Assumption 6a, the average and quantile direct effects on always-takers are identified: $\theta_1^{a} = \theta_1^{0,1}(0)$ and $\theta_1^{a}(q) = \theta_1^{0,1}(q, 0)$.
b) and Assumption 6, the average direct effect under $d = 1$ on compliers is identified:
$$\theta_1^{c}(1) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}}\, \theta_1^{1,1}(1) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}}\, \theta_1^{0,1}(0).$$
Furthermore, the potential outcome distributions under $d = 1$ for compliers are identified:
$$F_{Y_1(1,1) \mid \tau = c}(y) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}}\, F_{Y_1 \mid D=1, M=1}(y) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}}\, F_{Q_{11}(Y_0) \mid D=0, M=1}(y), \tag{5}$$
$$F_{Y_1(0,1) \mid \tau = c}(y) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}}\, F_{Q_{01}(Y_0) \mid D=1, M=1}(y) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}}\, F_{Y_1 \mid D=0, M=1}(y). \tag{6}$$
Therefore, the direct quantile effect under $d = 1$ on compliers, $\theta_1^{c}(q, 1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,1) \mid c}(q)$, is identified.
Proof.
See Appendix E.
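As an illustration of how the strata proportions in (1) and the average complier direct effects in Theorems 3(b) and 4(b) combine, the following minimal Python sketch takes the two mediator probabilities and the four mixture effects as given inputs. The function and argument names are hypothetical and chosen only for this example; in practice the inputs would themselves be estimates.

```python
def strata_shares(p11, p10):
    """Strata proportions from equation (1).

    p11 = Pr(M=1 | D=1), p10 = Pr(M=1 | D=0); defiers are ruled out.
    """
    return {"a": p10, "c": p11 - p10, "n": 1.0 - p11}


def complier_direct_effects(p11, p10, th00, th10, th11, th01):
    """Average direct effects on compliers (Theorems 3(b) and 4(b)).

    th00 = theta^{0,0}(0), th10 = theta^{1,0}(1),
    th11 = theta^{1,1}(1), th01 = theta^{0,1}(0).
    Returns (theta_c(0), theta_c(1)).
    """
    p_c = p11 - p10                      # complier share, equals p_{0|0} - p_{0|1}
    if p_c <= 0:
        raise ValueError("complier share must be positive")
    p00, p01 = 1.0 - p10, 1.0 - p11      # Pr(M=0 | D=0), Pr(M=0 | D=1)
    theta_c0 = (p00 * th00 - p01 * th10) / p_c
    theta_c1 = (p11 * th11 - p10 * th01) / p_c
    return theta_c0, theta_c1
```

For example, with $p_{1|1} = 0.7$ and $p_{1|0} = 0.2$ the shares are 0.2 (always-takers), 0.5 (compliers), and 0.3 (never-takers), and the function simply undoes the mixing of strata within the observed treatment-mediator cells.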
Theorem 5:
Under Assumptions 1–3, 5, and 7–8,
a) and Assumptions 4a, 6a, the total average treatment effect on compliers is identified:
$$
\begin{aligned}
\Delta_1^{c} ={}& \frac{p_{1|1}}{p_{1|1} - p_{1|0}}\, E[Y_1 \mid D = 1, M = 1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}}\, E[Q_{11}(Y_0) \mid D = 0, M = 1] \\
& - \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\, E[Y_1 \mid D = 0, M = 0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\, E[Q_{00}(Y_0) \mid D = 1, M = 0].
\end{aligned}
$$
Furthermore, the total quantile treatment effect on compliers, $\Delta_1^{c}(q) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, is identified using the inverse of (5) and (4).
b) and Assumptions 4a, 6b, the average indirect effect under $d = 0$ on compliers is identified:
$$
\begin{aligned}
\delta_1^{c}(0) ={}& \frac{p_{1|1}}{p_{1|1} - p_{1|0}}\, E[Q_{01}(Y_0) \mid D = 1, M = 1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}}\, E[Y_1 \mid D = 0, M = 1] \\
& - \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\, E[Y_1 \mid D = 0, M = 0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\, E[Q_{00}(Y_0) \mid D = 1, M = 0].
\end{aligned}
$$
Furthermore, the quantile indirect effect under $d = 0$ on compliers, $\delta_1^{c}(q, 0) = F^{-1}_{Y_1(0,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, is identified using the inverse of (6) and (4).
c) and Assumptions 4b, 6a, the average indirect effect under $d = 1$ on compliers is identified:
$$
\begin{aligned}
\delta_1^{c}(1) ={}& \frac{p_{1|1}}{p_{1|1} - p_{1|0}}\, E[Y_1 \mid D = 1, M = 1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}}\, E[Q_{11}(Y_0) \mid D = 0, M = 1] \\
& - \frac{p_{0|0}}{p_{0|0} - p_{0|1}}\, E[Q_{10}(Y_0) \mid D = 0, M = 0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}}\, E[Y_1 \mid D = 1, M = 0].
\end{aligned}
$$
Furthermore, the quantile indirect effect under $d = 1$ on compliers, $\delta_1^{c}(q, 1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(1,0) \mid c}(q)$, is identified using the inverse of (5) and (3).
Proof.
See Appendix F.
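As a consistency check on the three parts of Theorem 5, it may help to write out how the total effect on compliers splits into direct and indirect components. The sketch below only uses the definition of compliers (mediator state 1 under treatment and 0 without treatment) and adds and subtracts one counterfactual mean; the notation is an assumption insofar as it follows the symbols of the theorems above.

```latex
\begin{align*}
\Delta^c_1 &= E[Y_1(1,1) - Y_1(0,0) \mid \tau = c] \\
           &= \underbrace{E[Y_1(1,1) - Y_1(0,1) \mid \tau = c]}_{\theta^c_1(1)}
            + \underbrace{E[Y_1(0,1) - Y_1(0,0) \mid \tau = c]}_{\delta^c_1(0)} \\
           &= \underbrace{E[Y_1(1,0) - Y_1(0,0) \mid \tau = c]}_{\theta^c_1(0)}
            + \underbrace{E[Y_1(1,1) - Y_1(1,0) \mid \tau = c]}_{\delta^c_1(1)}.
\end{align*}
```

The true effects reported for the linear simulation design in Table 1 satisfy both decompositions: 3 = 2 + 1 and 3 = 1 + 2.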
As in Assumption 5.1 of Athey and Imbens (2006), we assume standard regularity conditions, namely that conditional on $T = t$, $D = d$, and $M = m$, $Y$ is a random draw from the subpopulation defined in terms of $t, d, m \in \{0, 1\}$. Furthermore, the outcome in the subpopulations required for the identification results of interest must have compact support and a density that is bounded from above and below as well as continuously differentiable. Denote by $N$ the total sample size across both periods and all treatment-mediator combinations and by $i \in \{1, ..., N\}$ an index for the sampled subject, such that $(Y_i, D_i, M_i, T_i)$ correspond to sample realizations of the random variables $(Y, D, M, T)$.

The total, direct, and indirect effects may be estimated using the sample analogy principle, which replaces population moments with sample moments (e.g. Manski, 1988). For instance, any conditional mediator probability given the treatment, $\Pr(M = m|D = d)$, is to be replaced by an estimate thereof in the sample,
$$\frac{\sum_{i=1}^{N} I\{M_i = m, D_i = d\}}{\sum_{i=1}^{N} I\{D_i = d\}}.$$
A crucial step is the estimation of the quantile-quantile transforms. The application of such quantile transformations dates at least back to Juhn, Murphy, and Pierce (1991), see also Chaisemartin and D’Haultfeuille (2018), Wüthrich (2019), and Strittmatter (2019) for recent applications. First, it requires estimating the conditional outcome distribution, $F_{Y_t|D=d,M=m}(y)$, by the conditional empirical distribution
$$\hat{F}_{Y_t|D=d,M=m}(y) = \frac{1}{\sum_{i=1}^{N} I\{D_i = d, M_i = m, T_i = t\}} \sum_{i: D_i=d, M_i=m, T_i=t} I\{Y_i \leq y\}.$$
Second, inverting the latter yields the empirical quantile function $\hat{F}^{-1}_{Y_t|D=d,M=m}(q)$. The empirical quantile-quantile transform is then obtained by
$$\hat{Q}_{dm}(y) = \hat{F}^{-1}_{Y_1|D=d,M=m}(\hat{F}_{Y_0|D=d,M=m}(y)).$$
This permits estimating the average and quantile effects of interest. Average effects are estimated by replacing any (conditional) expectations with the corresponding sample averages in which the estimated quantile-quantile transforms enter as plug-in estimates. Taking $\theta^{1,0}_1(1)$ (see Theorem 1) as an example, an estimate thereof is
$$\hat{\theta}^{1,0}_1(1) = \frac{1}{\sum_{i=1}^{N} I\{D_i = 1, M_i = 0, T_i = 1\}} \sum_{i: D_i=1, M_i=0, T_i=1} Y_i - \frac{1}{\sum_{i=1}^{N} I\{D_i = 1, M_i = 0, T_i = 0\}} \sum_{i: D_i=1, M_i=0, T_i=0} \hat{Q}_{00}(Y_i).$$
Likewise, quantile effects are estimated based on the empirical quantiles.

For the estimation of total ATE and QTE, Athey and Imbens (2006) show that the resulting estimators are $\sqrt{N}$-consistent and asymptotically normal, see their Theorems 5.1 and 5.3. These properties also apply to our context when splitting the sample into subgroups based on the values of a binary treatment and mediator (rather than the treatment only).
For instance, the implications of Theorem 1 in Athey and Imbens (2006) when considering subsamples with $D = 1$ and $D = 0$ carry over to considering subsamples with $D = 1, M = 0$ and $D = 0, M = 0$ for estimating the average direct effect on never-takers. In contrast to Athey and Imbens (2006), however, some of our identification results include the conditional mediator probabilities $\Pr(M = m|D = d)$. As the latter are estimated with $\sqrt{N}$-consistency, too, it follows that the resulting effect estimators are again $\sqrt{N}$-consistent and asymptotically normal. We use a non-parametric bootstrap approach to calculate the standard errors. Chaisemartin and D’Haultfeuille (2018) show the validity of the bootstrap approach for this kind of estimator, which follows from their asymptotic normality.

If the identifying assumptions hold only conditional on observed covariates, denoted by $X$, estimation must be adapted to allow for control variables. Following a suggestion by Athey and Imbens (2006) in their Section 5.1, basing estimation on outcome residuals in which the association of $X$ and $Y$ has been purged by means of a regression is consistent under the additional assumption that the effects of $D$ and $M$ are homogeneous across covariates. As an alternative, Melly and Santangelo (2015) propose a flexible semiparametric estimator that does not impose such a homogeneity-in-covariates assumption and show $\sqrt{N}$-consistency and asymptotic normality.

To shape the intuition for our identification results, this section presents a brief simulation based on the following data generating process (DGP):
$$T \sim Binom(0.5), \quad D \sim Binom(0.5), \quad U \sim Unif(-1, 1), \quad V \sim N(0, 1)$$
independent of each other, and
$$M = I\{D + U + V > 0\}, \quad Y_T = \Lambda((1 + D + M + D \cdot M) \cdot T + U).$$
Treatment $D$ as well as the observed time period $T$ are randomized, while the mediator-outcome association is confounded due to the unobserved time constant heterogeneity $U$.
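The DGP and the plug-in estimation of the quantile-quantile transform can be sketched together in a few lines. The code below is a minimal illustration rather than the implementation used in the paper (which relies on the qte R package): the function names are hypothetical, and the parameter values (success probability 0.5 for $T$ and $D$, $U$ uniform on [-1, 1], standard normal $V$, and threshold 0 in the mediator equation) are assumptions of this sketch. It estimates the direct effect in the group with $D = 1, M = 0$ under the identity link, for which the true effect equals 1.

```python
import bisect
import math
import random


def simulate(n, link=lambda x: x, seed=0):
    """Draw n observations (y, d, m, t) from the assumed DGP."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = int(rng.random() < 0.5)   # period, randomized
        d = int(rng.random() < 0.5)   # treatment, randomized
        u = rng.uniform(-1.0, 1.0)    # time constant heterogeneity
        v = rng.gauss(0.0, 1.0)       # mediator noise
        m = int(d + u + v > 0)        # mediator
        y = link((1 + d + m + d * m) * t + u)
        rows.append((y, d, m, t))
    return rows


def cell(rows, d, m, t):
    """Outcomes of the subsample with D=d, M=m, T=t."""
    return [y for (y, di, mi, ti) in rows if (di, mi, ti) == (d, m, t)]


def qq_transform(y, pre_sorted, post_sorted):
    """Empirical Q(y): post-period quantile at the pre-period rank of y."""
    rank = bisect.bisect_right(pre_sorted, y) / len(pre_sorted)
    idx = math.ceil(rank * len(post_sorted)) - 1
    idx = min(max(idx, 0), len(post_sorted) - 1)
    return post_sorted[idx]


def direct_effect_d1_m0(rows):
    """Mean of Y_1 minus mean of Q_00(Y_0), both in the group D=1, M=0."""
    pre00 = sorted(cell(rows, 0, 0, 0))
    post00 = sorted(cell(rows, 0, 0, 1))
    treated_pre = cell(rows, 1, 0, 0)
    treated_post = cell(rows, 1, 0, 1)
    counterfactual = [qq_transform(y, pre00, post00) for y in treated_pre]
    return (sum(treated_post) / len(treated_post)
            - sum(counterfactual) / len(counterfactual))


rows = simulate(40_000)
estimate = direct_effect_d1_m0(rows)
print(round(estimate, 3))  # should be close to the true effect of 1
```

Passing `link=math.exp` to `simulate` gives the nonlinear design, in which the time trend interacts with $U$.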
The potential outcome in period 1 is given by $Y_1(d, M(d')) = \Lambda((1 + d + M(d') + d \cdot M(d')) + U)$, where $\Lambda$ denotes a link function. If the latter corresponds to the identity function, our model is linear and implies a homogeneous time trend $T$ equal to 1. If $\Lambda$ is nonlinear, the time trend is heterogeneous, which invalidates the common trend assumption of difference-in-differences models. $M$ is not only a function of $D$ and $U$, but also of the unobserved random term $V$, which guarantees common support w.r.t. $U$, see Assumptions 4 and 6. Compliers, always-takers, and never-takers satisfy, respectively: $c = I\{U + V \leq 0, 1 + U + V > 0\}$, $a = I\{U + V > 0\}$, and $n = I\{1 + U + V \leq 0\}$.

In the simulations with 1,000 replications, we consider two sample sizes ($N$ = 1,000 and 4,000) and investigate the behaviour of our changes-in-changes methods as well as the difference-in-differences approach of Deuchert, Huber, and Schelker (2017) in both a linear ($\Lambda$ equal to the identity function) and a nonlinear outcome model where $\Lambda$ equals the exponential function. To implement the changes-in-changes estimators in the simulations as well as the application in Section 5, we make use of the ‘cic’ command in the qte R-package by Callaway (2016) with its default values.

Table 1 reports the bias, standard deviation (‘sd’), root mean squared error (‘rmse’), true effect (‘true’), and the relative root mean squared error in percent of the true effect (‘relr’) of the respective estimators of $\theta^n_1$, $\theta^a_1$, $\Delta^c_1$, $\theta^c_1(1)$, $\theta^c_1(0)$, $\delta^c_1(1)$, and $\delta^c_1(0)$ for the linear model. In this case, the identifying assumptions underlying both the changes-in-changes (Panel A.) and difference-in-differences (Panel B.) estimators are satisfied. Specifically, the homogeneous time trend on the individual level satisfies any of the common trend assumptions in Deuchert, Huber, and Schelker (2017), while the monotonicity of $Y$ in $U$ and the independence of $T$ and $U$ satisfy the key assumptions of this paper.
For this reason, all of the estimates in Table 1 are close to being unbiased and appear to converge to the true effect at the parametric rate when comparing the results for the two different sample sizes.

Table 1: Linear model with random treatment

              θ̂^n_1     θ̂^a_1     Δ̂^c_1  θ̂^c_1(1)  θ̂^c_1(0)  δ̂^c_1(1)  δ̂^c_1(0)
A. Changes-in-Changes
N = 1,000
  bias         0.00     -0.00     -0.01     -0.01     -0.01     -0.00     -0.01
  sd           0.11      0.08      0.23      0.10      0.13      0.27      0.27
  rmse         0.11      0.08      0.23      0.10      0.13      0.27      0.27
  true         1.00      2.00      3.00      2.00      1.00      2.00      1.00
  relr         0.11      0.04      0.08      0.05      0.13      0.14      0.27
N = 4,000
  bias        -0.00     -0.00      0.00     -0.00     -0.01      0.01      0.01
  sd           0.06      0.04      0.12      0.05      0.07      0.14      0.14
  rmse         0.06      0.04      0.12      0.05      0.07      0.14      0.14
  true         1.00      2.00      3.00      2.00      1.00      2.00      1.00
  relr         0.06      0.02      0.04      0.02      0.07      0.07      0.14
B. Difference-in-Differences
N = 1,000
  bias         0.01     -0.00     -0.01     -0.01      0.00     -0.02      0.00
  sd           0.11      0.09      0.14      0.14      0.12      0.19      0.10
  rmse         0.11      0.09      0.14      0.14      0.12      0.19      0.10
  true         1.00      2.00      3.00      2.00      1.00      2.00      1.00
  relr         0.11      0.04      0.05      0.07      0.12      0.10      0.10
N = 4,000
  bias        -0.00     -0.00      0.00     -0.00     -0.00      0.00      0.00
  sd           0.06      0.04      0.07      0.07      0.06      0.10      0.05
  rmse         0.06      0.04      0.07      0.07      0.06      0.10      0.05
  true         1.00      2.00      3.00      2.00      1.00      2.00      1.00
  relr         0.06      0.02      0.02      0.04      0.06      0.05      0.05

Note: ‘bias’, ‘sd’, and ‘rmse’ provide the bias, standard deviation, and root mean squared error of the respective estimator. ‘true’ and ‘relr’ are the respective true effect as well as the root mean squared error relative to the true effect.

Table 2 provides the results for the exponential outcome model, in which the time trend is heterogeneous and interacts with $U$ through the nonlinear link function. While the changes-in-changes assumptions hold (Panel A.), average time trends are heterogeneous across complier types such that the difference-in-differences approach (Panel B.) of Deuchert, Huber, and Schelker (2017) is inconsistent.
Accordingly, the biases of the changes-in-changes estimates generally approach zero as the sample size increases, while this is not the case for the difference-in-differences estimates. Changes-in-changes yields a lower root mean squared error than the respective difference-in-differences estimator in all but one case (namely $\hat{\delta}^c_1(0)$ with $N$ = 1,000) and its relative attractiveness increases in the sample size due to its lower bias.

Table 2: Nonlinear model with random treatment

              θ̂^n_1     θ̂^a_1     Δ̂^c_1  θ̂^c_1(1)  θ̂^c_1(0)  δ̂^c_1(1)  δ̂^c_1(0)
A. Changes-in-Changes
N = 1,000
  bias         0.01     -0.14     -0.48     -0.35     -0.11     -0.37     -0.13
  sd           0.48      5.08      8.47      6.20      1.16      8.64      4.23
  rmse         0.48      5.08      8.48      6.21      1.17      8.65      4.23
  true         3.49     68.09     52.42     47.70      4.72     47.70      4.72
  relr         0.14      0.07      0.16      0.13      0.25      0.18      0.90
N = 4,000
  bias        -0.01      0.01     -0.00     -0.11     -0.07      0.07      0.11
  sd           0.25      2.63      4.37      3.20      0.66      4.44      2.04
  rmse         0.25      2.63      4.37      3.20      0.66      4.44      2.04
  true         3.49     68.09     52.45     47.73      4.72     47.73      4.72
  relr         0.07      0.04      0.08      0.07      0.14      0.09      0.43
B. Difference-in-Differences
N = 1,000
  bias        -0.27     -8.91     14.42     11.46     -1.49     15.91      2.96
  sd           0.46      2.62      2.58      2.62      0.47      2.61      0.47
  rmse         0.53      9.29     14.65     11.76      1.56     16.12      2.99
  true         3.49     68.09     52.42     47.70      4.72     47.70      4.72
  relr         0.15      0.14      0.28      0.25      0.33      0.34      0.63
N = 4,000
  bias        -0.28     -8.79     14.51     11.57     -1.51     16.02      2.94
  sd           0.24      1.28      1.26      1.28      0.25      1.27      0.23
  rmse         0.37      8.88     14.57     11.64      1.53     16.07      2.95
  true         3.49     68.09     52.45     47.73      4.72     47.73      4.72
  relr         0.11      0.13      0.28      0.24      0.32      0.34      0.62

Note: ‘bias’, ‘sd’, and ‘rmse’ provide the bias, standard deviation, and root mean squared error of the respective estimator. ‘true’ and ‘relr’ are the respective true effect as well as the root mean squared error relative to the true effect.
In our final simulation design, we maintain the exponential outcome model but assume $D$ to be selective w.r.t. $U$ rather than random. To this end, the treatment model in the DGP above is replaced by $D = I\{U + Q > 0\}$, with the independent variable $Q \sim N(0, 1)$ being an unobserved term. Under this violation of Assumption 7, complier shares and effects are no longer identified, which is confirmed by the simulation results presented in Table 3. The biases in the changes-in-changes based total, direct, and indirect effects on compliers do not vanish as the sample size increases.

Table 3: Nonlinear model with non-random treatment

          θ̂^{1,0}_1(1) θ̂^{0,1}_1(0)  Δ̂^c_1  θ̂^c_1(1)  θ̂^c_1(0)  δ̂^c_1(1)  δ̂^c_1(0)
A. Changes-in-Changes
N = 1,000
  bias         0.02      0.13     47.21     40.19     -1.44     48.64      7.02
  sd           0.71      4.56      5.45      4.11      0.75      5.53      2.92
  rmse         0.71      4.56     47.52     40.40      1.62     48.96      7.60
  true         4.41     54.19     52.42     47.70      4.72     47.70      4.72
  relr         0.16      0.08      0.91      0.85      0.34      1.03      1.61
N = 4,000
  bias        -0.00      0.06     47.38     40.13     -1.53     48.91      7.25
  sd           0.38      2.35      2.84      2.04      0.38      2.86      1.51
  rmse         0.38      2.35     47.47     40.18      1.57     48.99      7.40
  true         4.40     54.18     52.45     47.73      4.72     47.73      4.72
  relr         0.09      0.04      0.90      0.84      0.33      1.03      1.57
B. Difference-in-Differences
N = 1,000
  bias         0.35     19.98     29.00     27.65      0.04     28.96      1.35
  sd           0.67      2.48      2.46      2.48      0.67      2.51      0.45
  rmse         0.75     20.14     29.11     27.76      0.67     29.07      1.43
  true         4.41     54.19     52.42     47.70      4.72     47.70      4.72
  relr         0.17      0.37      0.56      0.58      0.14      0.61      0.30
N = 4,000
  bias         0.34     20.02     28.98     27.65      0.02     28.96      1.33
  sd           0.35      1.22      1.19      1.22      0.35      1.24      0.23
  rmse         0.49     20.06     29.01     27.68      0.35     28.99      1.35
  true         4.40     54.18     52.45     47.73      4.72     47.73      4.72
  relr         0.11      0.37      0.55      0.58      0.07      0.61      0.29

Note: ‘bias’, ‘sd’, and ‘rmse’ provide the bias, standard deviation, and root mean squared error of the respective estimator. ‘true’ and ‘relr’ are the respective true effect as well as the root mean squared error relative to the true effect.

Furthermore, under non-random assignment of $D$ (while maintaining monotonicity of $M$ in $D$), the never-takers’ and always-takers’ respective distributions of $U$ differ across treatment.
Therefore, average direct effects among the total of never-takers or always-takers, respectively, are not identified. Yet, $\theta^{1,0}_1(1)$, which is still identified by the same estimator as before, yields the direct effect among never-takers with $D = 1$ (as defiers do not exist). Likewise, $\theta^{0,1}_1(0)$ corresponds to the direct effect on always-takers with $D = 0$. Indeed, the results in Table 3 suggest that both parameters are consistently estimated with the changes-in-changes model (Panel A.).

5 Application
Our empirical application is based on the JOBS II data by Vinokur and Price (1999). JOBS II was a randomized job training intervention in the US, designed to analyse the impact of job training on labour market and mental health outcomes, see Vinokur, Price, and Schul (1995). It was a modified version of the earlier JOBS programme, which had been found to improve labour market outcomes such as job satisfaction, motivation, earnings, and job stability, see Caplan, Vinokur, Price, and van Ryn (1989) and Vinokur, van Ryn, Gramlich, and Price (1991), as well as mental health, see Vinokur, Price, and Caplan (1991). According to the results of Vinokur, Price, and Schul (1995), the JOBS II programme increased reemployment rates and improved mental health outcomes, especially for participants having an elevated risk of depression. The JOBS interventions had an important impact on the academic literature (see e.g. Wanberg, 2012, Liu, Huang, and Wang, 2014) and the methodology was implemented in field experiments in Finland (Vuori, Silvonen, Vinokur, and Price, 2002, Vuori and Silvonen, 2005) and the Netherlands (Brenninkmeijer and Blonk, 2011), suggesting positive effects on labour market integration in either case.

The JOBS II intervention was conducted in south-eastern Michigan, where 2,464 job seekers were eligible to participate in a randomized field experiment, see Vinokur and Price (1999). In a baseline period prior to programme assignment, individuals responded to a screening questionnaire that collected pre-treatment information on mental health. Based on the latter, individuals were classified as having either a high or low depression risk and those with a high risk were oversampled before the training was randomly assigned. The job training consisted of five 4-hour seminars conducted in morning sessions during one week between March 1 and August 7, 1991.
Members of the treatment group who participated in at least four of the [...] The control group received a booklet with information on job search methods (Vinokur, Price, and Schul, 1995, pp. 44–49).

Footnote: In the JOBS II intervention, randomization was followed by yet another questionnaire sent out two weeks before the actual job training, see Vinokur, Price, and Schul (1995), which also provided information on whether an individual had been assigned the training. Consequently, the data collected in that questionnaire must be considered post-treatment as they could be affected by learning the assignment. Therefore, we rely on the earlier screening data as the relevant pre-treatment period prior to random programme assignment.

Footnote: When compared to the earlier JOBS programme (Caplan, Vinokur, Price, and van Ryn, 1989), the job training sessions of JOBS II focused more strongly on building a sense of mastery, personal control and self-efficacy in job search. Previous research had suggested that an increase in this sense of mastery, control and self-efficacy improved observed effort in job search behaviour (Eden and Aviram, 1993). Results in Marshall and Lang (1990) suggest that mastery is a strong predictor of depression symptoms among women. For a detailed discussion of the literature, the exact sampling process, the training programme, and further aspects, see Vinokur, Price, and Schul (1995).

We analyse the impact of job training on mental health, namely symptoms of depression 6 months after training participation. The health outcome ($Y$) is based on an 11-item index of depression symptoms of the Hopkins Symptom Checklist. For example, respondents were asked how much they were bothered by symptoms such as crying easily, feeling lonely, feeling blue, feeling hopeless, having thoughts of ending their lives, or experiencing a loss of sexual interest. The questions were coded on a 5-point scale, going from ‘not at all’ (1) to ‘extremely’ (5), and summarized in a depression variable that consists of the average across all questions.

One-sided non-compliance with the random assignment is a major issue in JOBS II. While the study design rules out always-takers because members of the control group did not have access to the job training programme, 45% of those assigned to training in our data did not participate and are therefore never-takers; the remaining 55% are compliers. In order to avoid selection bias w.r.t. actual participation, the original JOBS II study by Vinokur, Price, and Schul (1995) analysed the total effect of the policy (i.e. the intention-to-treat effect), including those who, despite receiving an offer to participate, did not take part in the job training. In contrast, we use our methodology to separate the direct effect of mere training assignment, which is our treatment $D$, from the indirect effect operating through actual training participation, which is our mediator $M$, among compliers. We also consider the direct effect on never-takers, which likely differs from that on the compliers. While being offered (or not offered) the job training might affect compliers’ mental health by inducing motivation/enthusiasm (or discouragement), it may not have the same effect among never-takers, who do not attend such seminars whatsoever.

Footnote: Imai, Keele, and Tingley (2010) analyse Jobs II in a mediation context as well, but consider a different mediator, namely job search self-efficacy, and a different identification strategy based on selection on observables.

More precisely, we base identification on Theorem 3a with Assumption 4a for the average direct effect on never-takers, $\theta^n_1$, on Theorem 3b with Assumption 4 for the direct effect on compliers under $d = 0$, $\theta^c_1(0)$, and on Theorem 5 with Assumptions 4b and 6a for the indirect effect on compliers under $d = 1$, $\delta^c_1(1)$. None of these approaches requires the presence of always-takers in the sample. We also note that if random assignment operated through mechanisms other than actual participation in any of the subpopulations, as may appear reasonable in the context of mental health outcomes, this would violate the exclusion restriction when using assignment as instrumental variable for actual participation in a two stage least squares regression. Given that our identifying assumptions hold, our approach can therefore be used to statistically test the exclusion restriction.

Table 4

             pre-treatment (T = 0)       post-mediator (T = 1)
             sample size   mean          sample size   mean
overall      1,796         1.86 (0.58)   1,564         1.73 (0.67)
D = 0          551         1.87 (0.59)     486         1.78 (0.70)
D = 1        ...

Note: Standard deviations are in parentheses. ‘mean diff’, ‘pval’, and ‘SD’ are the mean difference, its p-value, and the standardized difference, respectively.

Our evaluation sample consists of a total of 3,360 observations in the pre-treatment and post-mediator periods with non-missing information for $D$, $M$, and $Y$. It is an unbalanced panel due to attrition of roughly 13% of the initial respondents between the two periods. Table 4 provides summary statistics for the outcome in the total sample as well as by treatment group over time. We verify whether randomization was successful by comparing the outcome means of the treatment and control groups in the pre-treatment period ($T = 0$) just prior to the randomization of $D$.
The small difference of 0.01 is not statistically significant according to a two sample t-test. Furthermore, the standardized difference test suggested by Rosenbaum and Rubin (1985) yields a value of just 1.68 and is thus far below 20, a threshold frequently chosen for indicating problematic imbalances across treatment groups. To test for potential attrition bias we also consider these statistics in the pre-treatment period exclusively among the panel cases that remain in the sample in the post-mediator period (not reported in Table 4). The p-value of the t-test amounts to 0.52 and the standardized difference of 3.5 is low such that attrition bias does not appear to be a concern. We therefore do not find statistical evidence for a violation of the random assignment of $D$ in our sample. Table 4 also reports the mean difference in outcomes in the post-mediator period ($T = 1$) 6 months after participation, which is an estimate for the total (or intention-to-treat) effect of $D$. The difference of 0.08 is statistically significant at the 5% level.

Table 5: Empirical results for Jobs II

        Changes-in-Changes                      Difference-in-Differences               Type shares
        θ̂^n_1   Δ̂^c_1   θ̂^c_1(0)  δ̂^c_1(1)    θ̂^n_1   Δ̂^c_1   θ̂^c_1(0)  δ̂^c_1(1)    p̂(n)    p̂(c)
est     -0.04   -0.11    0.06     -0.17       -0.03   -0.12   -0.06     -0.06       0.45    0.55
se       0.05    0.06    0.05      0.08        0.05    0.06    0.05      0.07       0.01    0.01
pval     0.40    0.06    0.26      0.04        0.52    0.03    0.21      0.43       0.00    0.00

Note: ‘est’, ‘se’, and ‘pval’ provide the effect estimate, standard error, and p-value of the respective estimator. $\hat{p}(n)$ and $\hat{p}(c)$ are the estimated never-taker and complier shares. Standard errors are based on cluster bootstrapping the effects 1999 times where clustering is on the respondent level.

Table 5 presents the estimation results based on our CiC approach and the DiD strategy of Deuchert, Huber, and Schelker (2017) when (linearly) controlling for the gender of respondents in either case.
Standard errors rely on cluster bootstrapping the direct and indirect effects 1999 times, where clustering is on the respondent level. The CiC and DiD estimates of the direct effects on never-takers, $\hat{\theta}^n_1$, as well as on compliers, $\hat{\theta}^c_1(0)$, are not statistically significant at conventional levels. Hence, we do not find statistical evidence for a direct effect of the mere assignment into the training programme on the depression outcome, which would point to a violation of the exclusion restriction when using assignment as instrument for participation. In contrast, we find for both CiC and DiD negative total effects among compliers, $\hat{\Delta}^c_1$, that are statistically significant at least at the 10% level. In the case of CiC, the negative indirect effect among compliers, $\hat{\delta}^c_1(1)$, is also significant at the 5% level, while this is not the case for DiD. By and large, our results point to a moderately negative treatment effect on depressive symptoms through actual programme participation, rather than through other (i.e. direct) mechanisms. The CiC estimates $\hat{\delta}^c_1(1)$ and $\hat{\Delta}^c_1$ are in fact rather similar to the result of a two stage least squares regression relying on the exclusion restriction by using $D$ as instrument for $M$. The latter approach yields a local average treatment effect on compliers in the post-mediator period of -0.14 with a heteroskedasticity-robust standard error of 0.07 (significant at the 5% level).

We proposed a novel identification strategy for causal mediation analysis with repeated cross sections or panel data based on changes-in-changes (CiC) assumptions that are related to, yet different from, those of Athey and Imbens (2006), who consider total treatment effects. Strict monotonicity of outcomes in unobserved heterogeneity and distributional time invariance of the latter within groups defined on treatment and mediator states are key assumptions for identifying direct effects within these groups.
Additionally assuming random treatment assignment and weak monotonicity of the mediator in the treatment permits identifying direct effects on never-takers and always-takers as well as total, direct, and indirect effects on compliers. We also provided a brief simulation study and an empirical application to the Jobs II programme.

References
Albert, J. M., and S. Nelson (2011): “Generalized causal mediation analysis,” Biometrics, 67, 1028–1038.
Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996): “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444–472.
Athey, S., and G. W. Imbens (2006): “Identification and Inference in Nonlinear Difference-in-Differences Models,” Econometrica, 74, 431–497.
Baron, R. M., and D. A. Kenny (1986): “The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations,” Journal of Personality and Social Psychology, 51, 1173–1182.
Bellani, L., and M. Bia (2018): “The long-run effect of childhood poverty and the mediating role of education,” Journal of the Royal Statistical Society: Series A, 182, 37–68.
Bijwaard, G. E., and A. M. Jones (2019): “An IPW estimator for mediation effects in hazard models: with an application to schooling, cognitive ability and mortality,” Empirical Economics, 57, 129–175.
Brenninkmeijer, V., and R. W. Blonk (2011): “The effectiveness of the JOBS program among the long-term unemployed: a randomized experiment in the Netherlands,” Health Promotion International, 27, 220–229.
Brunello, G., M. Fort, N. Schneeweis, and R. Winter-Ebmer (2016): “The Causal Effect of Education on Health: What is the Role of Health Behaviors?,” Health Economics, 25, 314–336.
Callaway, B. (2016): “Quantile Treatment Effects in R: The qte Package,” working paper, Temple University, Philadelphia.
Caplan, R. D., A. D. Vinokur, R. H. Price, and M. van Ryn (1989): “Job seeking, reemployment, and mental health: A randomized field experiment in coping with job loss,” Journal of Applied Psychology, 74, 759–769.
Chaisemartin, C., and X. D’Haultfeuille (2018): “Fuzzy Differences-in-Differences,” Review of Economic Studies, 85, 999–1028.
Chen, S. H., Y. C. Chen, and J. T. Liu (2017): “The impact of family composition on educational achievement,” Journal of Human Resources, 0915-7401R1.
Cochran, W. G. (1957): “Analysis of Covariance: Its Nature and Uses,” Biometrics, 13, 261–281.
Conti, G., J. J. Heckman, and R. Pinto (2016): “The Effects of Two Influential Early Childhood Interventions on Health and Healthy Behaviour,” The Economic Journal, 126, F28–F65.
Deuchert, E., M. Huber, and M. Schelker (2017): “Direct and indirect effects based on difference-in-differences with an application to political preferences following the Vietnam draft lottery,” forthcoming in the Journal of Business & Economic Statistics.
Doerr, A., and A. Strittmatter (2019): “Identifying causal channels of policy reforms with multiple treatments and different types of selection,” working paper, University of St. Gallen.
Eden, D., and A. Aviram (1993): “Self-efficacy training to speed reemployment: Helping people to help themselves,” Journal of Applied Psychology, 78, 352–360.
Flores, C. A., and A. Flores-Lagunes (2009): “Identification and Estimation of Causal Mechanisms and Net Effects of a Treatment under Unconfoundedness,” IZA Discussion Paper No. 4237.
Frangakis, C., and D. Rubin (2002): “Principal Stratification in Causal Inference,” Biometrics, 58, 21–29.
Frölich, M., and M. Huber (2017): “Direct and Indirect Treatment Effects – Causal Chains and Mediation Analysis with Instrumental Variables,” Journal of the Royal Statistical Society: Series B, 79, 1645–1666.
Heckman, J., R. Pinto, and P. Savelyev (2013): “Understanding the Mechanisms Through Which an Influential Early Childhood Program Boosted Adult Outcomes,” American Economic Review, 103, 2052–2086.
Hong, G. (2010): “Ratio of mediator probability weighting for estimating natural direct and indirect effects,” in Proceedings of the American Statistical Association, Biometrics Section, p. 2401–2415. Alexandria, VA: American Statistical Association.
Huber, M. (2014): “Identifying causal mechanisms (primarily) based on inverse probability weighting,” Journal of Applied Econometrics, 29, 920–943.
Huber, M. (2015): “Causal pitfalls in the decomposition of wage gaps,” Journal of Business and Economic Statistics, 33, 179–191.
Huber, M., M. Lechner, and G. Mellace (2017): “Why Do Tougher Caseworkers Increase Employment? The Role of Program Assignment as a Causal Mechanism,” The Review of Economics and Statistics, 99, 180–183.
Huber, M., M. Lechner, and A. Strittmatter (2018): “Direct and indirect effects of training vouchers for the unemployed,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 181, 441–463.
Imai, K., L. Keele, and D. Tingley (2010): “A General Approach to Causal Mediation Analysis,” Psychological Methods, 15, 309–334.
Imai, K., L. Keele, and T. Yamamoto (2010): “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects,” Statistical Science, 25, 51–71.
Imai, K., and T. Yamamoto (2013): “Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments,” Political Analysis, 21, 141–171.
Imbens, G. W., and J. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–475.
Judd, C. M., and D. A. Kenny (1981): “Process Analysis: Estimating Mediation in Treatment Evaluations,” Evaluation Review, 5, 602–619.
Juhn, C., K. M. Murphy, and B. Pierce (1991): “Accounting for the Slowdown in Black-White Wage Convergence,” in Workers and their Wages: Changing Patterns in the United States, ed. by M. H. Kosters, pp. 107–143. American Enterprise Institute, Washington.
Keele, L., D. Tingley, and T. Yamamoto (2015): “Identifying mechanisms behind policy interventions via causal mediation analysis,” Journal of Policy Analysis and Management, 34, 937–963.
Liu, S., J. L. Huang, and M. Wang (2014): “Effectiveness of Job Search Interventions: A Meta-Analytic Review,” Psychological Bulletin, 140, 1009–1041.
Manski, C. F. (1988): Analog Estimation Methods in Econometrics. Chapman & Hall, New York.
Marshall, G. N., and E. L. Lang (1990): “Optimism, self-mastery, and symptoms of depression in women professionals,” Journal of Personality and Social Psychology, 59, 132–139.
Melly, B., and G. Santangelo (2015): “The Changes-in-Changes Model with Covariates,” Working Paper.
Pearl, J. (2001): “Direct and indirect effects,” in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 411–420, San Francisco. Morgan Kaufmann.
Petersen, M. L., S. E. Sinisi, and M. J. van der Laan (2006): “Estimation of Direct Causal Effects,” Epidemiology, 17, 276–284.
Powdthavee, N., W. N. Lekfuangfu, and M. Wooden (2013): “The Marginal Income Effect of Education on Happiness: Estimating the Direct and Indirect Effects of Compulsory Schooling on Well-Being in Australia,” IZA Discussion Paper No. 7365.
Robins, J. M. (2003): “Semantics of causal DAG models and the identification of direct and indirect effects,” in Highly Structured Stochastic Systems, ed. by P. Green, N. Hjort, and S. Richardson, pp. 70–81, Oxford. Oxford University Press.
Robins, J. M., and S. Greenland (1992): “Identifiability and Exchangeability for Direct and Indirect Effects,” Epidemiology, 3, 143–155.
Rosenbaum, P. R., and D. B. Rubin (1985): “Constructing a control group using multivariate matched sampling methods that incorporate the propensity score,” The American Statistician, 39, 33–38.
Rubin, D. B. (1974): “Estimating the Causal Effect of Treatments in Randomized and Non-Randomized Studies,” Journal of Educational Psychology, 66, 688–701.
Rubin, D. B. (2004): “Direct and Indirect Causal Effects via Potential Outcomes,” Scandinavian Journal of Statistics, 31, 161–170.
Sawada, M. (2019): “Non-Compliance in Randomized Control Trials without Exclusion Restrictions,” Working Paper.
Simonsen, M., and L. Skipper (2006): “The Costs of Motherhood: An Analysis Using Matching Estimators,” Journal of Applied Econometrics, 21, 919–934.
Strittmatter, A. (2019): “Heterogeneous Earnings Effects of the Job Corps by Gender Earnings: A Translated Quantile Approach,” forthcoming in Labour Economics.
Tchetgen Tchetgen, E. J., and I. Shpitser (2012): “Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness, and sensitivity analysis,” The Annals of Statistics, 40, 1816–1845.
VanderWeele, T. J. (2008): “Simple relations between principal stratification and direct and indirect effects,” Statistics & Probability Letters, 78, 2957–2962.
VanderWeele, T. J. (2009): “Marginal Structural Models for the Estimation of Direct and Indirect Effects,” Epidemiology, 20, 18–26.
VanderWeele, T. J. (2012): “Comments: Should Principal Stratification Be Used to Study Mediational Processes?,” Journal of Research on Educational Effectiveness, 5, 245–249.
Vansteelandt, S., M. Bekaert, and T. Lange (2012): “Imputation Strategies for the Estimation of Natural Direct and Indirect Effects,” Epidemiologic Methods, 1, 129–158.
Vinokur, A. D., and R. H. Price (1999): “Jobs II Preventive Intervention for Unemployed Job Seekers, 1991-1993,” Inter-university Consortium for Political and Social Research.
Vinokur, A. D., R. H. Price, and R. D. Caplan (1991): “From field experiments to program implementation: Assessing the potential outcomes of an experimental intervention program for unemployed persons,” American Journal of Community Psychology, 19, 543–562.
Vinokur, A. D., R. H. Price, and Y. Schul (1995): “Impact of the JOBS Intervention on Unemployed Workers Varying in Risk for Depression,” American Journal of Community Psychology, 23, 39–74.
Vinokur, A. D., M. van Ryn, E. M. Gramlich, and R. H. Price (1991): “From field experiments to program implementation: Assessing the potential outcomes of an experimental intervention program for unemployed persons,” Journal of Applied Psychology, 76, 213–219.
Vuori, J., and J. Silvonen (2005): “The benefits of a preventive job search program on re-employment and mental health at 2-year follow-up,” Journal of Occupational and Organizational Psychology, 78, 43–52.
Vuori, J., J. Silvonen, A. D. Vinokur, and R. H. Price (2002): “The Työhön Job Search Program in Finland: Benefits for the Unemployed With Risk of Depression or Discouragement,” Journal of Occupational Health Psychology, 7, 5–19.
Wanberg, C. R. (2012): “The Individual Experience of Unemployment,” Annual Review of Psychology, 63, 369–396.
Wüthrich, K. (2019): “A comparison of two quantile models with endogeneity,” forthcoming in Journal of Business and Economic Statistics.

Appendices
A Proof of Theorem 1
A.1 Average direct effect under $d = 1$ conditional on $D = 1$ and $M(1) = 0$

In the following, we prove that $\theta_1^{1,0}(1) = E[Y_1(1,0) - Y_1(0,0) \mid D=1, M(1)=0] = E[Y_1 - Q_{00}(Y_0) \mid D=1, M=0]$. Using the observational rule, we obtain $E[Y_1(1,0) \mid D=1, M(1)=0] = E[Y_1 \mid D=1, M=0]$. Accordingly, we have to show that $E[Y_1(0,0) \mid D=1, M(1)=0] = E[Q_{00}(Y_0) \mid D=1, M=0]$ to finish the proof.

Denote the inverse of $h(d,m,t,u)$ by $h^{-1}(d,m,t;y)$, which exists because of the strict monotonicity required in Assumption 1. Under Assumptions 1 and 3a, the conditional potential outcome distribution function equals

$$
\begin{aligned}
F_{Y_t(d,0) \mid D=1,M=0}(y) &\overset{A1}{=} \Pr(h(d,0,t,U) \le y \mid D=1, M=0, T=t) \\
&= \Pr(U \le h^{-1}(d,0,t;y) \mid D=1, M=0, T=t) \\
&\overset{A3a}{=} \Pr(U \le h^{-1}(d,0,t;y) \mid D=1, M=0) \\
&= F_{U \mid D=1,M=0}(h^{-1}(d,0,t;y)),
\end{aligned} \tag{A.1}
$$

for $d, t \in \{0,1\}$. We use these quantities in the following.

First, evaluating $F_{Y_1(0,0) \mid D=1,M=0}(y)$ at $h(0,0,1,u)$ gives

$$F_{Y_1(0,0) \mid D=1,M=0}(h(0,0,1,u)) = F_{U \mid D=1,M=0}(h^{-1}(0,0,1;h(0,0,1,u))) = F_{U \mid D=1,M=0}(u).$$

Applying $F^{-1}_{Y_1(0,0) \mid D=1,M=0}(q)$ to both sides, we have

$$h(0,0,1,u) = F^{-1}_{Y_1(0,0) \mid D=1,M=0}(F_{U \mid D=1,M=0}(u)). \tag{A.2}$$

Second, for $F_{Y_0(0,0) \mid D=1,M=0}(y)$ we have

$$F^{-1}_{U \mid D=1,M=0}(F_{Y_0(0,0) \mid D=1,M=0}(y)) = h^{-1}(0,0,0;y). \tag{A.3}$$

Combining (A.2) and (A.3) yields

$$h(0,0,1,h^{-1}(0,0,0;y)) = F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(y). \tag{A.4}$$

Note that $h(0,0,1,h^{-1}(0,0,0;y))$ maps the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under non-treatment without the mediator. Accordingly, $E[F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(Y_0) \mid D=1, M=0] = E[Y_1(0,0) \mid D=1, M=0]$. We can identify $F_{Y_0(0,0) \mid D=1,M=0}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(0,0) \mid D=1,M=0}(y)$.
However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(y)$ under the additional Assumption 3b.

Under Assumptions 1 and 3b, the conditional potential outcome distribution function equals

$$
\begin{aligned}
F_{Y_t(d,0) \mid D=0,M=0}(y) &\overset{A1}{=} \Pr(h(d,0,t,U) \le y \mid D=0, M=0, T=t) \\
&= \Pr(U \le h^{-1}(d,0,t;y) \mid D=0, M=0, T=t) \\
&\overset{A3b}{=} \Pr(U \le h^{-1}(d,0,t;y) \mid D=0, M=0) \\
&= F_{U \mid D=0,M=0}(h^{-1}(d,0,t;y)),
\end{aligned} \tag{A.5}
$$

for $d, t \in \{0,1\}$. We repeat similar steps as above. First, evaluating $F_{Y_1(0,0) \mid D=0,M=0}(y)$ at $h(0,0,1,u)$ gives

$$F_{Y_1(0,0) \mid D=0,M=0}(h(0,0,1,u)) = F_{U \mid D=0,M=0}(h^{-1}(0,0,1;h(0,0,1,u))) = F_{U \mid D=0,M=0}(u).$$

Applying $F^{-1}_{Y_1(0,0) \mid D=0,M=0}(q)$ to both sides, we have

$$h(0,0,1,u) = F^{-1}_{Y_1(0,0) \mid D=0,M=0}(F_{U \mid D=0,M=0}(u)). \tag{A.6}$$

Second, for $F_{Y_0(0,0) \mid D=0,M=0}(y)$ we have

$$F^{-1}_{U \mid D=0,M=0}(F_{Y_0(0,0) \mid D=0,M=0}(y)) = h^{-1}(0,0,0;y). \tag{A.7}$$

Combining (A.6) and (A.7) yields

$$h(0,0,1,h^{-1}(0,0,0;y)) = F^{-1}_{Y_1(0,0) \mid D=0,M=0} \circ F_{Y_0(0,0) \mid D=0,M=0}(y). \tag{A.8}$$

The left sides of (A.4) and (A.8) are equal. In contrast to (A.4), (A.8) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(0,0) \mid D=0,M=0}(y) = \Pr(Y_t(0,0) \le y \mid D=0, M=0) = \Pr(Y_t \le y \mid D=0, M=0)$.
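As a rough illustration of how the quantile-quantile transform built from the observable distributions in (A.8) could be estimated, the following sketch replaces both distribution functions with empirical counterparts. The function names, the use of NumPy, and the toy numbers are assumptions made for this example only; they are not taken from the paper's estimation section.

```python
import numpy as np

def qq_transform(y0_ref, y1_ref, y):
    """Plug-in estimate of F^{-1}_{Y1|ref} o F_{Y0|ref}(y) in a reference group.

    y0_ref, y1_ref: period-0 and period-1 outcomes of the reference group
    (hypothetical inputs, e.g. the observations with D = 0 and M = 0).
    """
    y0 = np.sort(np.asarray(y0_ref, dtype=float))
    y1 = np.sort(np.asarray(y1_ref, dtype=float))
    # k = number of reference period-0 outcomes <= y, so the ECDF is k / n0.
    k = np.searchsorted(y0, np.asarray(y, dtype=float), side="right")
    # Generalized inverse inf{v : ECDF_1(v) >= k / n0}, in integer arithmetic
    # to avoid floating-point issues: index = ceil(k * n1 / n0) - 1.
    idx = (k * y1.size + y0.size - 1) // y0.size - 1
    return y1[np.clip(idx, 0, y1.size - 1)]

def direct_effect_treated(y0_trt, y1_trt, y0_ref, y1_ref):
    """Sample analogue of E[Y1 - Q(Y0) | treated group], with Q from the reference group."""
    return np.mean(y1_trt) - np.mean(qq_transform(y0_ref, y1_ref, y0_trt))

# Toy data: the reference group shifts by +1 over time, the treated group by +3.
effect = direct_effect_treated([2.0, 3.0], [5.0, 6.0],
                               [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In the toy data the transform maps the treated period-0 outcomes 2 and 3 to 3 and 4, so the sketch attributes the remaining difference of 2 to the direct effect. Points outside the support of the reference group are mapped to the nearest order statistic, which is only sensible under the common support conditions.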
Accordingly, we can identify $F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(y)$ by $Q_{00}(y) \equiv F^{-1}_{Y_1 \mid D=0,M=0} \circ F_{Y_0 \mid D=0,M=0}(y)$.

Parsing $Y_0$ through $Q_{00}(\cdot)$ in the treated group without mediator gives

$$
\begin{aligned}
E[Q_{00}(Y_0) \mid D=1, M=0] &= E[F^{-1}_{Y_1 \mid D=0,M=0} \circ F_{Y_0 \mid D=0,M=0}(Y_0) \mid D=1, M=0] \\
&= E[F^{-1}_{Y_1(0,0) \mid D=0,M=0} \circ F_{Y_0(0,0) \mid D=0,M=0}(Y_0(1,0)) \mid D=1, M=0] \\
&\overset{A1,A3b}{=} E[h(0,0,1,h^{-1}(0,0,0;Y_0(1,0))) \mid D=1, M=0] \\
&\overset{A2}{=} E[h(0,0,1,h^{-1}(0,0,0;Y_0(0,0))) \mid D=1, M=0] \\
&\overset{A1,A3a}{=} E[F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(Y_0(0,0)) \mid D=1, M=0] \\
&= E[Y_1(0,0) \mid D=1, M=0] = E[Y_1(0,0) \mid D=1, M(1)=0],
\end{aligned} \tag{A.9}
$$

which has data support because of Assumption 4a.

A.2 Quantile direct effect under $d = 1$ conditional on $D = 1$ and $M(1) = 0$

In the following, we prove that

$$
\begin{aligned}
\theta_1^{1,0}(q,1) &= F^{-1}_{Y_1(1,0) \mid D=1,M(1)=0}(q) - F^{-1}_{Y_1(0,0) \mid D=1,M(1)=0}(q) \\
&= F^{-1}_{Y_1 \mid D=1,M=0}(q) - F^{-1}_{Q_{00}(Y_0) \mid D=1,M=0}(q).
\end{aligned}
$$

For this purpose, we have to show that

$$F_{Y_1(1,0) \mid D=1,M(1)=0}(y) = F_{Y_1 \mid D=1,M=0}(y) \quad \text{and} \tag{A.10}$$

$$F_{Y_1(0,0) \mid D=1,M(1)=0}(y) = F_{Q_{00}(Y_0) \mid D=1,M=0}(y), \tag{A.11}$$

which is sufficient to show that the quantiles are also identified. We can show (A.10) using the observational rule $F_{Y_1(1,0) \mid D=1,M(1)=0}(y) = F_{Y_1 \mid D=1,M=0}(y) = E[1\{Y_1 \le y\} \mid D=1, M=0]$, with $1\{\cdot\}$ being the indicator function. Using (A.9), we obtain

$$
\begin{aligned}
F_{Q_{00}(Y_0) \mid D=1,M=0}(y) &= E[1\{Q_{00}(Y_0) \le y\} \mid D=1, M=0] \\
&= E[1\{F^{-1}_{Y_1 \mid D=0,M=0} \circ F_{Y_0 \mid D=0,M=0}(Y_0) \le y\} \mid D=1, M=0] \\
&= E[1\{Y_1(0,0) \le y\} \mid D=1, M=0] \\
&= F_{Y_1(0,0) \mid D=1,M(1)=0}(y),
\end{aligned} \tag{A.12}
$$

which proves (A.11).

A.3 Average direct effect under $d = 0$ conditional on $D = 0$ and $M(0) = 0$

In the following, we show that $\theta_1^{0,0}(0) = E[Y_1(1,0) - Y_1(0,0) \mid D=0, M(0)=0] = E[Q_{10}(Y_0) - Y_1 \mid D=0, M=0]$. Using the observational rule, we obtain $E[Y_1(0,0) \mid D=0, M(0)=0] = E[Y_1 \mid D=0, M=0]$. Accordingly, we have to show that $E[Y_1(1,0) \mid D=0, M(0)=0] = E[Q_{10}(Y_0) \mid D=0, M=0]$ to finish the proof.

First, we use (A.5) to evaluate $F_{Y_1(1,0) \mid D=0,M=0}(y)$ at $h(1,0,1,u)$:

$$F_{Y_1(1,0) \mid D=0,M=0}(h(1,0,1,u)) = F_{U \mid D=0,M=0}(h^{-1}(1,0,1;h(1,0,1,u))) = F_{U \mid D=0,M=0}(u).$$

Applying $F^{-1}_{Y_1(1,0) \mid D=0,M=0}(q)$ to both sides, we have

$$h(1,0,1,u) = F^{-1}_{Y_1(1,0) \mid D=0,M=0}(F_{U \mid D=0,M=0}(u)). \tag{A.13}$$

Second, for $F_{Y_0(1,0) \mid D=0,M=0}(y)$ we have

$$F^{-1}_{U \mid D=0,M=0}(F_{Y_0(1,0) \mid D=0,M=0}(y)) = h^{-1}(1,0,0;y), \tag{A.14}$$

using (A.5). Combining (A.13) and (A.14) yields

$$h(1,0,1,h^{-1}(1,0,0;y)) = F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(y). \tag{A.15}$$

Note that $h(1,0,1,h^{-1}(1,0,0;y))$ maps the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under treatment without the mediator.
Accordingly, $E[F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(Y_0) \mid D=0, M=0] = E[Y_1(1,0) \mid D=0, M=0]$. We can identify $F_{Y_0(1,0) \mid D=0,M=0}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(1,0) \mid D=0,M=0}(y)$. However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(y)$ under the additional Assumption 3a.

First, we use (A.1) to evaluate $F_{Y_1(1,0) \mid D=1,M=0}(y)$ at $h(1,0,1,u)$:

$$F_{Y_1(1,0) \mid D=1,M=0}(h(1,0,1,u)) = F_{U \mid D=1,M=0}(h^{-1}(1,0,1;h(1,0,1,u))) = F_{U \mid D=1,M=0}(u).$$

Applying $F^{-1}_{Y_1(1,0) \mid D=1,M=0}(q)$ to both sides, we have

$$h(1,0,1,u) = F^{-1}_{Y_1(1,0) \mid D=1,M=0}(F_{U \mid D=1,M=0}(u)). \tag{A.16}$$

Second, for $F_{Y_0(1,0) \mid D=1,M=0}(y)$ we have

$$F^{-1}_{U \mid D=1,M=0}(F_{Y_0(1,0) \mid D=1,M=0}(y)) = h^{-1}(1,0,0;y), \tag{A.17}$$

using (A.1). Combining (A.16) and (A.17) yields

$$h(1,0,1,h^{-1}(1,0,0;y)) = F^{-1}_{Y_1(1,0) \mid D=1,M=0} \circ F_{Y_0(1,0) \mid D=1,M=0}(y). \tag{A.18}$$

The left sides of (A.15) and (A.18) are equal. In contrast to (A.15), (A.18) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(1,0) \mid D=1,M=0}(y) = \Pr(Y_t(1,0) \le y \mid D=1, M=0) = \Pr(Y_t \le y \mid D=1, M=0)$.
Accordingly, we can identify $F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(y)$ by $Q_{10}(y) \equiv F^{-1}_{Y_1 \mid D=1,M=0} \circ F_{Y_0 \mid D=1,M=0}(y)$.

Parsing $Y_0$ through $Q_{10}(\cdot)$ in the non-treated group without mediator gives

$$
\begin{aligned}
E[Q_{10}(Y_0) \mid D=0, M=0] &= E[F^{-1}_{Y_1 \mid D=1,M=0} \circ F_{Y_0 \mid D=1,M=0}(Y_0) \mid D=0, M=0] \\
&= E[F^{-1}_{Y_1(1,0) \mid D=1,M=0} \circ F_{Y_0(1,0) \mid D=1,M=0}(Y_0(0,0)) \mid D=0, M=0] \\
&\overset{A1,A3a}{=} E[h(1,0,1,h^{-1}(1,0,0;Y_0(0,0))) \mid D=0, M=0] \\
&\overset{A2}{=} E[h(1,0,1,h^{-1}(1,0,0;Y_0(1,0))) \mid D=0, M=0] \\
&\overset{A1,A3b}{=} E[F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(Y_0(1,0)) \mid D=0, M=0] \\
&= E[Y_1(1,0) \mid D=0, M=0] = E[Y_1(1,0) \mid D=0, M(0)=0],
\end{aligned} \tag{A.19}
$$

which has data support because of Assumption 4b.

A.4 Quantile direct effect under $d = 0$ conditional on $D = 0$ and $M(0) = 0$

In the following, we prove that

$$
\begin{aligned}
\theta_1^{0,0}(q,0) &= F^{-1}_{Y_1(1,0) \mid D=0,M(0)=0}(q) - F^{-1}_{Y_1(0,0) \mid D=0,M(0)=0}(q) \\
&= F^{-1}_{Q_{10}(Y_0) \mid D=0,M=0}(q) - F^{-1}_{Y_1 \mid D=0,M=0}(q).
\end{aligned}
$$

For this purpose, we have to show that

$$F_{Y_1(1,0) \mid D=0,M(0)=0}(y) = F_{Q_{10}(Y_0) \mid D=0,M=0}(y) \quad \text{and} \tag{A.20}$$

$$F_{Y_1(0,0) \mid D=0,M(0)=0}(y) = F_{Y_1 \mid D=0,M=0}(y), \tag{A.21}$$

which is sufficient to show that the quantiles are also identified. We can show (A.21) using the observational rule $F_{Y_1(0,0) \mid D=0,M(0)=0}(y) = F_{Y_1 \mid D=0,M=0}(y) = E[1\{Y_1 \le y\} \mid D=0, M=0]$. Using (A.19), we obtain

$$
\begin{aligned}
F_{Q_{10}(Y_0) \mid D=0,M=0}(y) &= E[1\{Q_{10}(Y_0) \le y\} \mid D=0, M=0] \\
&= E[1\{F^{-1}_{Y_1 \mid D=1,M=0} \circ F_{Y_0 \mid D=1,M=0}(Y_0) \le y\} \mid D=0, M=0] \\
&= E[1\{Y_1(1,0) \le y\} \mid D=0, M=0] \\
&= F_{Y_1(1,0) \mid D=0,M(0)=0}(y),
\end{aligned}
$$

which proves (A.20).

B Proof of Theorem 2
B.1 Average direct effect under $d = 0$ conditional on $D = 0$ and $M(0) = 1$

In the following, we show that $\theta_1^{0,1}(0) = E[Y_1(1,1) - Y_1(0,1) \mid D=0, M(0)=1] = E[Q_{11}(Y_0) - Y_1 \mid D=0, M=1]$. Using the observational rule, we obtain $E[Y_1(0,1) \mid D=0, M(0)=1] = E[Y_1 \mid D=0, M=1]$. Accordingly, we have to show that $E[Y_1(1,1) \mid D=0, M(0)=1] = E[Q_{11}(Y_0) \mid D=0, M=1]$ to finish the proof.

Under Assumptions 1 and 5a, the conditional potential outcome distribution function equals

$$
\begin{aligned}
F_{Y_t(d,1) \mid D=0,M=1}(y) &\overset{A1}{=} \Pr(h(d,1,t,U) \le y \mid D=0, M=1, T=t) \\
&= \Pr(U \le h^{-1}(d,1,t;y) \mid D=0, M=1, T=t) \\
&\overset{A5a}{=} \Pr(U \le h^{-1}(d,1,t;y) \mid D=0, M=1) \\
&= F_{U \mid D=0,M=1}(h^{-1}(d,1,t;y)),
\end{aligned} \tag{B.1}
$$

for $d, t \in \{0,1\}$. We use these quantities in the following.

First, evaluating $F_{Y_1(1,1) \mid D=0,M=1}(y)$ at $h(1,1,1,u)$ gives

$$F_{Y_1(1,1) \mid D=0,M=1}(h(1,1,1,u)) = F_{U \mid D=0,M=1}(h^{-1}(1,1,1;h(1,1,1,u))) = F_{U \mid D=0,M=1}(u).$$

Applying $F^{-1}_{Y_1(1,1) \mid D=0,M=1}(q)$ to both sides, we have

$$h(1,1,1,u) = F^{-1}_{Y_1(1,1) \mid D=0,M=1}(F_{U \mid D=0,M=1}(u)). \tag{B.2}$$

Second, for $F_{Y_0(1,1) \mid D=0,M=1}(y)$ we have

$$F^{-1}_{U \mid D=0,M=1}(F_{Y_0(1,1) \mid D=0,M=1}(y)) = h^{-1}(1,1,0;y). \tag{B.3}$$

Combining (B.2) and (B.3) yields

$$h(1,1,1,h^{-1}(1,1,0;y)) = F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(y). \tag{B.4}$$

Note that $h(1,1,1,h^{-1}(1,1,0;y))$ maps the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under treatment with the mediator. Accordingly, $E[F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(Y_0) \mid D=0, M=1] = E[Y_1(1,1) \mid D=0, M=1]$. We can identify $F_{Y_0(1,1) \mid D=0,M=1}(y) = F_{Y_0 \mid D=0,M=1}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(1,1) \mid D=0,M=1}(y)$.
However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(y)$ under the additional Assumption 5b.

Under Assumptions 1 and 5b, the conditional potential outcome distribution function equals

$$
\begin{aligned}
F_{Y_t(d,1) \mid D=1,M=1}(y) &\overset{A1}{=} \Pr(h(d,1,t,U) \le y \mid D=1, M=1, T=t) \\
&= \Pr(U \le h^{-1}(d,1,t;y) \mid D=1, M=1, T=t) \\
&\overset{A5b}{=} \Pr(U \le h^{-1}(d,1,t;y) \mid D=1, M=1) \\
&= F_{U \mid D=1,M=1}(h^{-1}(d,1,t;y)),
\end{aligned} \tag{B.5}
$$

for $d, t \in \{0,1\}$. We repeat similar steps as above. First, evaluating $F_{Y_1(1,1) \mid D=1,M=1}(y)$ at $h(1,1,1,u)$ gives

$$F_{Y_1(1,1) \mid D=1,M=1}(h(1,1,1,u)) = F_{U \mid D=1,M=1}(h^{-1}(1,1,1;h(1,1,1,u))) = F_{U \mid D=1,M=1}(u).$$

Applying $F^{-1}_{Y_1(1,1) \mid D=1,M=1}(q)$ to both sides, we have

$$h(1,1,1,u) = F^{-1}_{Y_1(1,1) \mid D=1,M=1}(F_{U \mid D=1,M=1}(u)). \tag{B.6}$$

Second, for $F_{Y_0(1,1) \mid D=1,M=1}(y)$ we have

$$F^{-1}_{U \mid D=1,M=1}(F_{Y_0(1,1) \mid D=1,M=1}(y)) = h^{-1}(1,1,0;y). \tag{B.7}$$

Combining (B.6) and (B.7) yields

$$h(1,1,1,h^{-1}(1,1,0;y)) = F^{-1}_{Y_1(1,1) \mid D=1,M=1} \circ F_{Y_0(1,1) \mid D=1,M=1}(y). \tag{B.8}$$

The left sides of (B.4) and (B.8) are equal. In contrast to (B.4), (B.8) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(1,1) \mid D=1,M=1}(y) = \Pr(Y_t(1,1) \le y \mid D=1, M=1) = \Pr(Y_t \le y \mid D=1, M=1)$.
Accordingly, we can identify $F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(y)$ by $Q_{11}(y) \equiv F^{-1}_{Y_1 \mid D=1,M=1} \circ F_{Y_0 \mid D=1,M=1}(y)$.

Parsing $Y_0$ through $Q_{11}(\cdot)$ in the non-treated group with mediator gives

$$
\begin{aligned}
E[Q_{11}(Y_0) \mid D=0, M=1] &= E[F^{-1}_{Y_1 \mid D=1,M=1} \circ F_{Y_0 \mid D=1,M=1}(Y_0) \mid D=0, M=1] \\
&= E[F^{-1}_{Y_1(1,1) \mid D=1,M=1} \circ F_{Y_0(1,1) \mid D=1,M=1}(Y_0(0,1)) \mid D=0, M=1] \\
&\overset{A1,A5b}{=} E[h(1,1,1,h^{-1}(1,1,0;Y_0(0,1))) \mid D=0, M=1] \\
&\overset{A2}{=} E[h(1,1,1,h^{-1}(1,1,0;Y_0(1,1))) \mid D=0, M=1] \\
&\overset{A1,A5a}{=} E[F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(Y_0(1,1)) \mid D=0, M=1] \\
&= E[Y_1(1,1) \mid D=0, M=1] = E[Y_1(1,1) \mid D=0, M(0)=1],
\end{aligned} \tag{B.9}
$$

which has data support because of Assumption 6a.

B.2 Quantile direct effect under $d = 0$ conditional on $D = 0$ and $M(0) = 1$

In the following, we show that

$$
\begin{aligned}
\theta_1^{0,1}(q,0) &= F^{-1}_{Y_1(1,1) \mid D=0,M(0)=1}(q) - F^{-1}_{Y_1(0,1) \mid D=0,M(0)=1}(q) \\
&= F^{-1}_{Q_{11}(Y_0) \mid D=0,M=1}(q) - F^{-1}_{Y_1 \mid D=0,M=1}(q).
\end{aligned}
$$

For this purpose, we have to show that

$$F_{Y_1(1,1) \mid D=0,M(0)=1}(y) = F_{Q_{11}(Y_0) \mid D=0,M=1}(y) \quad \text{and} \tag{B.10}$$

$$F_{Y_1(0,1) \mid D=0,M(0)=1}(y) = F_{Y_1 \mid D=0,M=1}(y), \tag{B.11}$$

which is sufficient to show that the quantiles are also identified. We can show (B.11) using the observational rule $F_{Y_1(0,1) \mid D=0,M(0)=1}(y) = F_{Y_1 \mid D=0,M=1}(y) = E[1\{Y_1 \le y\} \mid D=0, M=1]$. Using (B.9), we obtain

$$
\begin{aligned}
F_{Q_{11}(Y_0) \mid D=0,M=1}(y) &= E[1\{Q_{11}(Y_0) \le y\} \mid D=0, M=1] \\
&= E[1\{F^{-1}_{Y_1 \mid D=1,M=1} \circ F_{Y_0 \mid D=1,M=1}(Y_0) \le y\} \mid D=0, M=1] \\
&= E[1\{Y_1(1,1) \le y\} \mid D=0, M=1] \\
&= F_{Y_1(1,1) \mid D=0,M(0)=1}(y),
\end{aligned} \tag{B.12}
$$

which proves (B.10).

B.3 Average direct effect under $d = 1$ conditional on $D = 1$ and $M(1) = 1$

In the following, we show that $\theta_1^{1,1}(1) = E[Y_1(1,1) - Y_1(0,1) \mid D=1, M(1)=1] = E[Y_1 - Q_{01}(Y_0) \mid D=1, M=1]$. Using the observational rule, we obtain $E[Y_1(1,1) \mid D=1, M(1)=1] = E[Y_1 \mid D=1, M=1]$. Accordingly, we have to show that $E[Y_1(0,1) \mid D=1, M(1)=1] = E[Q_{01}(Y_0) \mid D=1, M=1]$ to finish the proof.

First, using (B.5) to evaluate $F_{Y_1(0,1) \mid D=1,M=1}(y)$ at $h(0,1,1,u)$ gives

$$F_{Y_1(0,1) \mid D=1,M=1}(h(0,1,1,u)) = F_{U \mid D=1,M=1}(h^{-1}(0,1,1;h(0,1,1,u))) = F_{U \mid D=1,M=1}(u).$$

Applying $F^{-1}_{Y_1(0,1) \mid D=1,M=1}(q)$ to both sides, we have

$$h(0,1,1,u) = F^{-1}_{Y_1(0,1) \mid D=1,M=1}(F_{U \mid D=1,M=1}(u)). \tag{B.13}$$

Second, for $F_{Y_0(0,1) \mid D=1,M=1}(y)$ we obtain

$$F^{-1}_{U \mid D=1,M=1}(F_{Y_0(0,1) \mid D=1,M=1}(y)) = h^{-1}(0,1,0;y), \tag{B.14}$$

using (B.5). Combining (B.13) and (B.14) yields

$$h(0,1,1,h^{-1}(0,1,0;y)) = F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(y). \tag{B.15}$$

Note that $h(0,1,1,h^{-1}(0,1,0;y))$ maps the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under non-treatment with the mediator. Accordingly, $E[F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(Y_0) \mid D=1, M=1] = E[Y_1(0,1) \mid D=1, M=1]$.
We can identify $F_{Y_0(0,1) \mid D=1,M=1}(y) = F_{Y_0 \mid D=1,M=1}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(0,1) \mid D=1,M=1}(y)$. However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(y)$ under the additional Assumption 5a.

First, using (B.1) to evaluate $F_{Y_1(0,1) \mid D=0,M=1}(y)$ at $h(0,1,1,u)$ gives

$$F_{Y_1(0,1) \mid D=0,M=1}(h(0,1,1,u)) = F_{U \mid D=0,M=1}(h^{-1}(0,1,1;h(0,1,1,u))) = F_{U \mid D=0,M=1}(u).$$

Applying $F^{-1}_{Y_1(0,1) \mid D=0,M=1}(q)$ to both sides, we have

$$h(0,1,1,u) = F^{-1}_{Y_1(0,1) \mid D=0,M=1}(F_{U \mid D=0,M=1}(u)). \tag{B.16}$$

Second, for $F_{Y_0(0,1) \mid D=0,M=1}(y)$ we obtain

$$F^{-1}_{U \mid D=0,M=1}(F_{Y_0(0,1) \mid D=0,M=1}(y)) = h^{-1}(0,1,0;y), \tag{B.17}$$

using (B.1). Combining (B.16) and (B.17) yields

$$h(0,1,1,h^{-1}(0,1,0;y)) = F^{-1}_{Y_1(0,1) \mid D=0,M=1} \circ F_{Y_0(0,1) \mid D=0,M=1}(y). \tag{B.18}$$

The left sides of (B.15) and (B.18) are equal. In contrast to (B.15), (B.18) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(0,1) \mid D=0,M=1}(y) = \Pr(Y_t(0,1) \le y \mid D=0, M=1) = \Pr(Y_t \le y \mid D=0, M=1)$. Accordingly, we can identify $F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(y)$ by $Q_{01}(y) \equiv F^{-1}_{Y_1 \mid D=0,M=1} \circ F_{Y_0 \mid D=0,M=1}(y)$.

Parsing $Y_0$ through $Q_{01}(\cdot)$ in the treated group with mediator gives

$$
\begin{aligned}
E[Q_{01}(Y_0) \mid D=1, M=1] &= E[F^{-1}_{Y_1 \mid D=0,M=1} \circ F_{Y_0 \mid D=0,M=1}(Y_0) \mid D=1, M=1] \\
&= E[F^{-1}_{Y_1(0,1) \mid D=0,M=1} \circ F_{Y_0(0,1) \mid D=0,M=1}(Y_0(1,1)) \mid D=1, M=1] \\
&\overset{A1,A5a}{=} E[h(0,1,1,h^{-1}(0,1,0;Y_0(1,1))) \mid D=1, M=1] \\
&\overset{A2}{=} E[h(0,1,1,h^{-1}(0,1,0;Y_0(0,1))) \mid D=1, M=1] \\
&\overset{A1,A5b}{=} E[F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(Y_0(0,1)) \mid D=1, M=1] \\
&= E[Y_1(0,1) \mid D=1, M=1] = E[Y_1(0,1) \mid D=1, M(1)=1],
\end{aligned} \tag{B.19}
$$

which has data support under Assumption 6b.

B.4 Quantile direct effect under $d = 1$ conditional on $D = 1$ and $M(1) = 1$

In the following, we show that

$$
\begin{aligned}
\theta_1^{1,1}(q,1) &= F^{-1}_{Y_1(1,1) \mid D=1,M(1)=1}(q) - F^{-1}_{Y_1(0,1) \mid D=1,M(1)=1}(q) \\
&= F^{-1}_{Y_1 \mid D=1,M=1}(q) - F^{-1}_{Q_{01}(Y_0) \mid D=1,M=1}(q).
\end{aligned}
$$

For this purpose, we have to show that

$$F_{Y_1(1,1) \mid D=1,M(1)=1}(y) = F_{Y_1 \mid D=1,M=1}(y) \quad \text{and} \tag{B.20}$$

$$F_{Y_1(0,1) \mid D=1,M(1)=1}(y) = F_{Q_{01}(Y_0) \mid D=1,M=1}(y), \tag{B.21}$$

which is sufficient to show that the quantiles are also identified. We can show (B.20) using the observational rule $F_{Y_1(1,1) \mid D=1,M(1)=1}(y) = F_{Y_1 \mid D=1,M=1}(y) = E[1\{Y_1 \le y\} \mid D=1, M=1]$. Using (B.19), we obtain

$$
\begin{aligned}
F_{Q_{01}(Y_0) \mid D=1,M=1}(y) &= E[1\{Q_{01}(Y_0) \le y\} \mid D=1, M=1] \\
&= E[1\{F^{-1}_{Y_1 \mid D=0,M=1} \circ F_{Y_0 \mid D=0,M=1}(Y_0) \le y\} \mid D=1, M=1] \\
&= E[1\{Y_1(0,1) \le y\} \mid D=1, M=1] \\
&= F_{Y_1(0,1) \mid D=1,M(1)=1}(y),
\end{aligned}
$$

which proves (B.21).

C Proof of equations (1) and (2)

$\Delta_1 = E[Y_1 \mid D=1] - E[Y_1 \mid D=0]$ and quantile treatment effect $\Delta_1(q) = F^{-1}_{Y_1 \mid D=1}(q) - F^{-1}_{Y_1 \mid D=0}(q)$

The average total effect for the entire population is identified by

$$
\begin{aligned}
\Delta_1 &= E[Y_1(1,M(1))] - E[Y_1(0,M(0))] \\
&\overset{A7}{=} E[Y_1(1,M(1)) \mid D=1] - E[Y_1(0,M(0)) \mid D=0] \\
&= E[Y_1 \mid D=1] - E[Y_1 \mid D=0],
\end{aligned}
$$

where the first equality is the definition of $\Delta_1$, the second equality holds by Assumption 7, and the last equality holds by the observational rule.

We define the conditional distribution $F_{Y_1 \mid D=d}(y) = \Pr(Y_1 \le y \mid D=d)$ and $F^{-1}_{Y_1 \mid D=d}(q) = \inf\{y : F_{Y_1 \mid D=d}(y) \ge q\}$. We can show the identification of the total QTE for the entire population $\Delta_1(q) = F^{-1}_{Y_1 \mid D=1}(q) - F^{-1}_{Y_1 \mid D=0}(q)$ when we show that $F_{Y_1(1,M(1))}(y) = F_{Y_1 \mid D=1}(y)$ and $F_{Y_1(0,M(0))}(y) = F_{Y_1 \mid D=0}(y)$.
Using Assumption 7 and the observational rule gives

$$
\begin{aligned}
F_{Y_1(1,M(1))}(y) &= \Pr(Y_1(1,M(1)) \le y) \overset{A7}{=} \Pr(Y_1(1,M(1)) \le y \mid D=1) \\
&= \Pr(Y_1 \le y \mid D=1) = F_{Y_1 \mid D=1}(y), \quad \text{and} \\
F_{Y_1(0,M(0))}(y) &= \Pr(Y_1(0,M(0)) \le y) \overset{A7}{=} \Pr(Y_1(0,M(0)) \le y \mid D=0) \\
&= \Pr(Y_1 \le y \mid D=0) = F_{Y_1 \mid D=0}(y),
\end{aligned}
$$

which finishes the proof.

By Assumption 7, the share of a type $\tau$ conditional on $D$ corresponds to $p_\tau$ (in the population), as $D$ is randomly assigned. This implies that $p_{1|1} = p_a + p_c$, $p_{1|0} = p_a + p_{de}$, $p_{0|1} = p_n + p_{de}$, and $p_{0|0} = p_n + p_c$. Under Assumption 8, $p_{de} = 0$, which finishes the proof of (1).

Furthermore, $E[Y_t(d,m) \mid \tau, D=1] = E[Y_t(d,m) \mid \tau, D=0] = E[Y_t(d,m) \mid \tau]$ due to the independence of $D$ and the potential outcomes as well as the types $\tau$ (which are a deterministic function of $M(d)$) under Assumption 7. It follows that conditioning on $D$ is not required on the right hand side of the following equation, which expresses the mean outcome conditional on $D = 0$ and $M = 0$ as a weighted average of the mean potential outcomes of compliers and never-takers:

$$E[Y_t \mid D=0, M=0] = \frac{p_n}{p_n + p_c} E[Y_t(0,0) \mid \tau = n] + \frac{p_c}{p_n + p_c} E[Y_t(0,0) \mid \tau = c]. \tag{C.1}$$

Only compliers and never-takers satisfy $M(0) = 0$ and thus make up the group with $D = 0$ and $M = 0$. After some rearrangements we obtain

$$E[Y_t(0,0) \mid \tau = n] - E[Y_t(0,0) \mid \tau = c] = \frac{p_n + p_c}{p_c} \left\{ E[Y_t(0,0) \mid \tau = n] - E[Y_t \mid D=0, M=0] \right\}. \tag{C.2}$$

Next, we consider observations with $D = 1$ and $M = 0$, which might consist of both never-takers and defiers, as $M(1) = 0$ for both types. However, by Assumption 8, defiers are ruled out, such that the mean outcome given $D = 1$ and $M = 0$ is determined by never-takers only:

$$E[Y_t \mid D=1, M=0] \overset{A7,A8}{=} E[Y_t(1,0) \mid \tau = n]. \tag{C.3}$$

Furthermore, by Assumption 2, $E[Y_0(0,0) \mid \tau = n] \overset{A2}{=} E[Y_0(1,0) \mid \tau = n] \overset{A7,A8}{=} E[Y_0 \mid D=1, M=0]$.
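The type shares in this proof follow directly from the observed mediator frequencies once defiers are ruled out. The sketch below is an illustrative plug-in computation under Assumptions 7 and 8; the function name `type_shares` and the toy data are hypothetical and not part of the paper.

```python
import numpy as np

def type_shares(d, m):
    """Plug-in type shares from binary treatment d and binary mediator m.

    With random assignment and no defiers:
    p_a = p_{1|0}, p_n = p_{0|1}, p_c = p_{1|1} - p_{1|0}.
    """
    d = np.asarray(d)
    m = np.asarray(m)
    p1_given_1 = m[d == 1].mean()  # share with M = 1 among D = 1
    p1_given_0 = m[d == 0].mean()  # share with M = 1 among D = 0
    return {
        "always_taker": p1_given_0,
        "never_taker": 1.0 - p1_given_1,
        "complier": p1_given_1 - p1_given_0,
    }

# Toy sample: 4 control and 4 treated observations.
shares = type_shares(d=[0, 0, 0, 0, 1, 1, 1, 1], m=[1, 0, 0, 0, 1, 1, 1, 0])
```

A negative estimated complier share in real data would point against the monotonicity in Assumption 8, or simply reflect sampling noise.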
Similarly to (C.1) for the never-takers and compliers, consider the mean outcome given $D = 1$ and $M = 1$, which is made up by always-takers and compliers (the types with $M(1) = 1$):

$$E[Y_t \mid D=1, M=1] = \frac{p_a}{p_a + p_c} E[Y_t(1,1) \mid \tau = a] + \frac{p_c}{p_a + p_c} E[Y_t(1,1) \mid \tau = c]. \tag{C.4}$$

After some rearrangements we obtain

$$E[Y_t(1,1) \mid \tau = a] - E[Y_t(1,1) \mid \tau = c] = \frac{p_a + p_c}{p_c} \left\{ E[Y_t(1,1) \mid \tau = a] - E[Y_t \mid D=1, M=1] \right\}. \tag{C.5}$$

By Assumptions 7 and 8,

$$E[Y_t \mid D=0, M=1] = E[Y_t(0,1) \mid \tau = a]. \tag{C.6}$$

Now consider (C.5) for period $T = 0$, and note that by Assumption 2, $E[Y_0(1,1) \mid \tau = a] = E[Y_0(0,1) \mid \tau = a] = E[Y_0(0,0) \mid \tau = a]$ and $E[Y_0(1,1) \mid \tau = c] = E[Y_0(0,0) \mid \tau = c]$. Combining (C.4), (C.6), and the law of iterative expectations (LIE) gives

$$
\begin{aligned}
E[Y_0 \mid D=1] &\overset{LIE}{=} E[Y_0 \mid D=1, M=1] \cdot p_{1|1} + E[Y_0 \mid D=1, M=0] \cdot p_{0|1} \\
&= E[Y_0(1,1) \mid \tau = c] \cdot p_c + E[Y_0(1,1) \mid \tau = a] \cdot p_a + E[Y_0(1,0) \mid \tau = n] \cdot p_n \\
&\overset{A2}{=} E[Y_0(1,1) \mid \tau = c] \cdot p_c + E[Y_0(1,1) \mid \tau = a] \cdot p_a + E[Y_0(0,0) \mid \tau = n] \cdot p_n.
\end{aligned}
$$

Likewise, combining (C.1) and (C.3) gives

$$
\begin{aligned}
E[Y_0 \mid D=0] &\overset{LIE}{=} E[Y_0 \mid D=0, M=1] \cdot p_{1|0} + E[Y_0 \mid D=0, M=0] \cdot p_{0|0} \\
&= E[Y_0(0,1) \mid \tau = a] \cdot p_a + E[Y_0(0,0) \mid \tau = c] \cdot p_c + E[Y_0(0,0) \mid \tau = n] \cdot p_n \\
&\overset{A2}{=} E[Y_0(1,1) \mid \tau = a] \cdot p_a + E[Y_0(0,0) \mid \tau = c] \cdot p_c + E[Y_0(0,0) \mid \tau = n] \cdot p_n.
\end{aligned}
$$

Accordingly,

$$\frac{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}{p_{1|1} - p_{1|0}} = E[Y_0(1,1) \mid \tau = c] - E[Y_0(0,0) \mid \tau = c] \overset{A2}{=} 0,$$

which proves (2). Accordingly, $E[Y_0 \mid D=1] - E[Y_0 \mid D=0] = 0$ is a testable implication of Assumptions 2, 7, and 8.

D Proof of Theorem 3
D.1 Average direct effect on the never-takers
In the following, we show that $\theta_1^n = E[Y_1(1,0) - Y_1(0,0) \mid \tau = n] = E[Y_1 - Q_{00}(Y_0) \mid D=1, M=0]$. From (C.3), we obtain the first ingredient $E[Y_1(1,0) \mid \tau = n] = E[Y_1 \mid D=1, M=0]$. Furthermore, from (A.9) we have $E[Q_{00}(Y_0) \mid D=1, M=0] = E[Y_1(0,0) \mid D=1, M(1)=0]$. Under Assumptions 7 and 8,

$$E[Y_1(0,0) \mid D=1, M(1)=0] \overset{A8}{=} E[Y_1(0,0) \mid D=1, \tau = n] \overset{A7}{=} E[Y_1(0,0) \mid \tau = n]. \tag{D.1}$$

D.2 Quantile direct effect on the never-takers
We prove that

$$
\begin{aligned}
\theta_1^n(q) &= F^{-1}_{Y_1(1,0) \mid \tau = n}(q) - F^{-1}_{Y_1(0,0) \mid \tau = n}(q) \\
&= F^{-1}_{Y_1 \mid D=1,M=0}(q) - F^{-1}_{Q_{00}(Y_0) \mid D=1,M=0}(q).
\end{aligned}
$$

This requires showing that

$$F_{Y_1(1,0) \mid \tau = n}(y) = F_{Y_1 \mid D=1,M=0}(y) \quad \text{and} \tag{D.2}$$

$$F_{Y_1(0,0) \mid \tau = n}(y) = F_{Q_{00}(Y_0) \mid D=1,M=0}(y). \tag{D.3}$$

Under Assumptions 7 and 8,

$$F_{Y_t \mid D=1,M=0}(y) = E[1\{Y_t \le y\} \mid D=1, M=0] \overset{A7,A8}{=} E[1\{Y_t(1,0) \le y\} \mid \tau = n] = F_{Y_t(1,0) \mid \tau = n}(y), \tag{D.4}$$

which proves (D.2). From (A.12), we have $F_{Q_{00}(Y_0) \mid D=1,M=0}(y) = F_{Y_1(0,0) \mid D=1,M(1)=0}(y) = E[1\{Y_1(0,0) \le y\} \mid D=1, M(1)=0]$. Under Assumptions 7 and 8,

$$E[1\{Y_1(0,0) \le y\} \mid D=1, M(1)=0] \overset{A7,A8}{=} E[1\{Y_1(0,0) \le y\} \mid \tau = n] = F_{Y_1(0,0) \mid \tau = n}(y), \tag{D.5}$$

which proves (D.3).

D.3 Average direct effect under $d = 0$ on compliers

In the following, we show that

$$
\begin{aligned}
\theta_1^c(0) &= E[Y_1(1,0) - Y_1(0,0) \mid \tau = c] \\
&= \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Q_{10}(Y_0) - Y_1 \mid D=0, M=0] - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Y_1 - Q_{00}(Y_0) \mid D=1, M=0].
\end{aligned}
$$

Plugging (D.1) in (C.1) under $T = 1$, we obtain

$$E[Y_1 \mid D=0, M=0] = \frac{p_n}{p_n + p_c} E[Q_{00}(Y_0) \mid D=1, M=0] + \frac{p_c}{p_n + p_c} E[Y_1(0,0) \mid \tau = c].$$

This allows identifying

$$E[Y_1(0,0) \mid \tau = c] = \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=0, M=0] - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Q_{00}(Y_0) \mid D=1, M=0]. \tag{D.6}$$

Accordingly, we have to show the identification of $E[Y_1(1,0) \mid \tau = c]$ to finish the proof. From (A.19) we have $E[Y_1(1,0) \mid D=0, M=0] = E[Q_{10}(Y_0) \mid D=0, M=0]$. Applying the law of iterative expectations gives

$$
\begin{aligned}
E[Y_1(1,0) \mid D=0, M=0] &= \frac{p_n}{p_n + p_c} E[Y_1(1,0) \mid D=0, M=0, \tau = n] + \frac{p_c}{p_n + p_c} E[Y_1(1,0) \mid D=0, M=0, \tau = c] \\
&\overset{A7}{=} \frac{p_n}{p_n + p_c} E[Y_1(1,0) \mid \tau = n] + \frac{p_c}{p_n + p_c} E[Y_1(1,0) \mid \tau = c].
\end{aligned}
$$

After some rearrangements and using (C.3), we obtain

$$E[Y_1(1,0) \mid \tau = c] = \frac{p_n + p_c}{p_c} E[Q_{10}(Y_0) \mid D=0, M=0] - \frac{p_n}{p_c} E[Y_1 \mid D=1, M=0].$$

This gives

$$E[Y_1(1,0) \mid \tau = c] = \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Q_{10}(Y_0) \mid D=0, M=0] - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=1, M=0], \tag{D.7}$$

using $p_n = p_{0|1}$ and $p_c + p_n = p_{0|0}$.
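The two complier means in (D.6) and (D.7) are reweighted differences of observable group means, so the direct effect on compliers under $d = 0$ can be sketched as below. The function and argument names are hypothetical, and the four group means are assumed to have been estimated beforehand (for instance with an empirical quantile-quantile transform).

```python
def complier_direct_effect_d0(p00, p01,
                              mean_q10_y0_d0m0, mean_y1_d0m0,
                              mean_y1_d1m0, mean_q00_y0_d1m0):
    """Plug-in version of theta^c(0) based on (D.6) and (D.7).

    p00 = Pr(M=0 | D=0) = p_n + p_c and p01 = Pr(M=0 | D=1) = p_n.
    The remaining arguments are sample means of Q10(Y0) and Y1 in the
    D=0, M=0 group and of Y1 and Q00(Y0) in the D=1, M=0 group.
    """
    p_c = p00 - p01  # complier share under Assumptions 7 and 8
    if p_c <= 0:
        raise ValueError("complier share must be positive")
    ey1_10_c = (p00 * mean_q10_y0_d0m0 - p01 * mean_y1_d1m0) / p_c  # (D.7)
    ey1_00_c = (p00 * mean_y1_d0m0 - p01 * mean_q00_y0_d1m0) / p_c  # (D.6)
    return ey1_10_c - ey1_00_c

# Toy numbers: never-taker effect 1, mixed effect 3, shares p_n = 0.25, p_c = 0.5.
theta_c0 = complier_direct_effect_d0(0.75, 0.25, 5.0, 2.0, 3.0, 2.0)
```

With these toy inputs the mixture effect of 3 among the $D = 0$, $M = 0$ group decomposes into a never-taker effect of 1 and a complier effect of 4, which is what the sketch returns. Small complier shares inflate both weights, so estimates can be noisy when $p_{0|0} - p_{0|1}$ is close to zero.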
D.4 Quantile direct effect under $d = 0$ on compliers

We show that

$$F_{Y_1(1,0) \mid \tau = c}(y) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}} F_{Q_{10}(Y_0) \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} F_{Y_1 \mid D=1,M=0}(y) \quad \text{and}$$

$$F_{Y_1(0,0) \mid \tau = c}(y) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}} F_{Y_1 \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} F_{Q_{00}(Y_0) \mid D=1,M=0}(y),$$

which proves that $\theta_1^c(q,0) = F^{-1}_{Y_1(1,0) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$ is identified.

From (A.20), we have $F_{Y_1(1,0) \mid D=0,M(0)=0}(y) = F_{Q_{10}(Y_0) \mid D=0,M=0}(y)$. Applying the law of iterative expectations gives

$$
\begin{aligned}
F_{Y_1(1,0) \mid D=0,M(0)=0}(y) &= \frac{p_n}{p_n + p_c} F_{Y_1(1,0) \mid D=0,M(0)=0,\tau = n}(y) + \frac{p_c}{p_n + p_c} F_{Y_1(1,0) \mid D=0,M(0)=0,\tau = c}(y) \\
&\overset{A7}{=} \frac{p_n}{p_n + p_c} F_{Y_1(1,0) \mid \tau = n}(y) + \frac{p_c}{p_n + p_c} F_{Y_1(1,0) \mid \tau = c}(y).
\end{aligned}
$$

Using (D.2) and rearranging the equation gives

$$F_{Y_1(1,0) \mid \tau = c}(y) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}} F_{Q_{10}(Y_0) \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} F_{Y_1 \mid D=1,M=0}(y). \tag{D.8}$$

In analogy to (C.1), the outcome distribution under $D = 0$ and $M = 0$ equals

$$F_{Y_1 \mid D=0,M=0}(y) = \frac{p_n}{p_n + p_c} F_{Y_1(0,0) \mid \tau = n}(y) + \frac{p_c}{p_n + p_c} F_{Y_1(0,0) \mid \tau = c}(y).$$

Using (D.3) and rearranging the equation gives

$$F_{Y_1(0,0) \mid \tau = c}(y) = \frac{p_{0|0}}{p_{0|0} - p_{0|1}} F_{Y_1 \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0} - p_{0|1}} F_{Q_{00}(Y_0) \mid D=1,M=0}(y). \tag{D.9}$$

E Proof of Theorem 4
E.1 Average direct effect on the always-takers
In the following, we show that $\theta_1^a = E[Y_1(1,1) - Y_1(0,1) \mid \tau = a] = E[Q_{11}(Y_0) - Y_1 \mid D=0, M=1]$. From (C.6), we obtain the first ingredient $E[Y_1(0,1) \mid \tau = a] = E[Y_1 \mid D=0, M=1]$. Furthermore, from (B.9) we have $E[Q_{11}(Y_0) \mid D=0, M=1] = E[Y_1(1,1) \mid D=0, M(0)=1]$. Under Assumptions 7 and 8,

$$E[Y_1(1,1) \mid D=0, M(0)=1] \overset{A8}{=} E[Y_1(1,1) \mid D=0, \tau = a] \overset{A7}{=} E[Y_1(1,1) \mid \tau = a]. \tag{E.1}$$

E.2 Quantile direct effect on the always-takers
We prove that

$$
\begin{aligned}
\theta_1^a(q) &= F^{-1}_{Y_1(1,1) \mid \tau = a}(q) - F^{-1}_{Y_1(0,1) \mid \tau = a}(q) \\
&= F^{-1}_{Q_{11}(Y_0) \mid D=0,M=1}(q) - F^{-1}_{Y_1 \mid D=0,M=1}(q).
\end{aligned}
$$

This requires showing that

$$F_{Y_1(1,1) \mid \tau = a}(y) = F_{Q_{11}(Y_0) \mid D=0,M=1}(y) \quad \text{and} \tag{E.2}$$

$$F_{Y_1(0,1) \mid \tau = a}(y) = F_{Y_1 \mid D=0,M=1}(y). \tag{E.3}$$

Under Assumptions 7 and 8,

$$F_{Y_t \mid D=0,M=1}(y) = E[1\{Y_t \le y\} \mid D=0, M=1] \overset{A7,A8}{=} E[1\{Y_t(0,1) \le y\} \mid \tau = a] = F_{Y_t(0,1) \mid \tau = a}(y), \tag{E.4}$$

which proves (E.3). From (B.12), we have $F_{Q_{11}(Y_0) \mid D=0,M=1}(y) = F_{Y_1(1,1) \mid D=0,M(0)=1}(y) = E[1\{Y_1(1,1) \le y\} \mid D=0, M(0)=1]$. Under Assumptions 7 and 8,

$$E[1\{Y_1(1,1) \le y\} \mid D=0, M(0)=1] \overset{A7,A8}{=} E[1\{Y_1(1,1) \le y\} \mid \tau = a] = F_{Y_1(1,1) \mid \tau = a}(y), \tag{E.5}$$

which proves (E.2).

E.3 Average direct effect under $d = 1$ on compliers

In the following, we show that

$$
\begin{aligned}
\theta_1^c(1) &= E[Y_1(1,1) - Y_1(0,1) \mid \tau = c] \\
&= \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Y_1 - Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Q_{11}(Y_0) - Y_1 \mid D=0, M=1].
\end{aligned}
$$

Plugging (E.1) in (C.4), we obtain

$$E[Y_1 \mid D=1, M=1] = \frac{p_a}{p_a + p_c} E[Q_{11}(Y_0) \mid D=0, M=1] + \frac{p_c}{p_a + p_c} E[Y_1(1,1) \mid \tau = c].$$

This gives

$$E[Y_1(1,1) \mid \tau = c] = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Q_{11}(Y_0) \mid D=0, M=1]. \tag{E.6}$$

From (B.19) we have $E[Y_1(0,1) \mid D=1, M=1] = E[Q_{01}(Y_0) \mid D=1, M=1]$. Applying the law of iterative expectations gives

$$
\begin{aligned}
E[Y_1(0,1) \mid D=1, M=1] &= \frac{p_a}{p_a + p_c} E[Y_1(0,1) \mid D=1, M=1, \tau = a] + \frac{p_c}{p_a + p_c} E[Y_1(0,1) \mid D=1, M=1, \tau = c] \\
&\overset{A7}{=} \frac{p_a}{p_a + p_c} E[Y_1(0,1) \mid \tau = a] + \frac{p_c}{p_a + p_c} E[Y_1(0,1) \mid \tau = c].
\end{aligned}
$$

After some rearrangements and using (C.6), we obtain

$$E[Y_1(0,1) \mid \tau = c] = \frac{p_a + p_c}{p_c} E[Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_a}{p_c} E[Y_1 \mid D=0, M=1].$$

This gives

$$E[Y_1(0,1) \mid \tau = c] = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=0, M=1], \tag{E.7}$$

with $p_a = p_{1|0}$ and $p_c + p_a = p_{1|1}$.
E.4 Quantile direct effect under $d = 1$ on compliers

We show that

$$F_{Y_1(1,1) \mid \tau = c}(y) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} F_{Y_1 \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} F_{Q_{11}(Y_0) \mid D=0,M=1}(y) \quad \text{and}$$

$$F_{Y_1(0,1) \mid \tau = c}(y) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} F_{Q_{01}(Y_0) \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} F_{Y_1 \mid D=0,M=1}(y),$$

which proves that $\theta_1^c(q,1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,1) \mid c}(q)$ is identified.

In analogy to (C.4), the outcome distribution under $D = 1$ and $M = 1$ equals

$$F_{Y_1 \mid D=1,M=1}(y) = \frac{p_a}{p_a + p_c} F_{Y_1(1,1) \mid \tau = a}(y) + \frac{p_c}{p_a + p_c} F_{Y_1(1,1) \mid \tau = c}(y).$$

Using (E.2) and rearranging the equation gives

$$F_{Y_1(1,1) \mid \tau = c}(y) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} F_{Y_1 \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} F_{Q_{11}(Y_0) \mid D=0,M=1}(y). \tag{E.8}$$

From (B.21), we have $F_{Y_1(0,1) \mid D=1,M(1)=1}(y) = F_{Q_{01}(Y_0) \mid D=1,M=1}(y)$. Applying the law of iterative expectations gives

$$
\begin{aligned}
F_{Y_1(0,1) \mid D=1,M(1)=1}(y) &= \frac{p_a}{p_a + p_c} F_{Y_1(0,1) \mid D=1,M(1)=1,\tau = a}(y) + \frac{p_c}{p_a + p_c} F_{Y_1(0,1) \mid D=1,M(1)=1,\tau = c}(y) \\
&\overset{A7}{=} \frac{p_a}{p_a + p_c} F_{Y_1(0,1) \mid \tau = a}(y) + \frac{p_c}{p_a + p_c} F_{Y_1(0,1) \mid \tau = c}(y).
\end{aligned}
$$

Using (E.3) and rearranging the equation gives

$$F_{Y_1(0,1) \mid \tau = c}(y) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} F_{Q_{01}(Y_0) \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} F_{Y_1 \mid D=0,M=1}(y). \tag{E.9}$$

F Proof of Theorem 5
F.1 Average treatment effect on the compliers
In (E.6) and (D.6), we show that

$$
\begin{aligned}
\Delta_1^c &= E[Y_1(1,1) - Y_1(0,0) \mid \tau = c] \\
&= \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Q_{11}(Y_0) \mid D=0, M=1] \\
&\quad - \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=0, M=0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Q_{00}(Y_0) \mid D=1, M=0].
\end{aligned}
$$

F.2 Quantile treatment effect on the compliers

In (E.8) and (D.9), we show that $F_{Y_1(1,1) \mid c}(y)$ and $F_{Y_1(0,0) \mid c}(y)$ are identified. Accordingly, $\Delta_1^c(q) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$ is identified.

F.3 Average indirect effect under $d = 0$ on compliers

In (E.7) and (D.6), we show that

$$
\begin{aligned}
\delta_1^c(0) &= E[Y_1(0,1) - Y_1(0,0) \mid \tau = c] \\
&= \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=0, M=1] \\
&\quad - \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=0, M=0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Q_{00}(Y_0) \mid D=1, M=0].
\end{aligned}
$$

F.4 Quantile indirect effect under $d = 0$ on compliers

In (E.9) and (D.9), we show that $F_{Y_1(0,1) \mid c}(y)$ and $F_{Y_1(0,0) \mid c}(y)$ are identified. Accordingly, $\delta_1^c(q,0) = F^{-1}_{Y_1(0,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$ is identified.

F.5 Average indirect effect under $d = 1$ on compliers

In (E.6) and (D.7), we show that

$$
\begin{aligned}
\delta_1^c(1) &= E[Y_1(1,1) - Y_1(1,0) \mid \tau = c] \\
&= \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Q_{11}(Y_0) \mid D=0, M=1] \\
&\quad - \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Q_{10}(Y_0) \mid D=0, M=0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=1, M=0].
\end{aligned}
$$

F.6 Quantile indirect effect under $d = 1$ on compliers

In (E.8) and (D.8), we show that $F_{Y_1(1,1) \mid c}(y)$ and $F_{Y_1(1,0) \mid c}(y)$ are identified. Accordingly, $\delta_1^c(q,1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(1,0) \mid c}(q)$ is identified.
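To make the quantile results for compliers more concrete, the following sketch recovers a complier distribution function from a mixture as in (D.8) or (E.8), evaluated on a grid, and then inverts it with the generalized inverse $\inf\{y : F(y) \ge q\}$. All function names are hypothetical. Enforcing monotonicity and the unit interval on the estimated distribution function is a pragmatic choice made for this example, since finite-sample differences of distribution functions need not be proper; it is not a step taken from the paper.

```python
import numpy as np

def complier_cdf(cdf_mix, cdf_other, p_mix, p_other):
    """Complier CDF on a grid: (p_mix * F_mix - p_other * F_other) / (p_mix - p_other).

    For (E.8): p_mix = p_{1|1}, p_other = p_{1|0}.
    For (D.8): p_mix = p_{0|0}, p_other = p_{0|1}.
    """
    raw = (p_mix * np.asarray(cdf_mix, dtype=float)
           - p_other * np.asarray(cdf_other, dtype=float)) / (p_mix - p_other)
    # Assumption of this sketch: repair non-monotone or out-of-range estimates.
    return np.clip(np.maximum.accumulate(raw), 0.0, 1.0)

def quantile_from_cdf(grid, cdf, q):
    """Smallest grid point whose CDF value is at least q."""
    idx = np.searchsorted(cdf, q, side="left")
    return np.asarray(grid)[np.minimum(idx, len(grid) - 1)]

def quantile_indirect_effect_d1(grid, cdf_y11_c, cdf_y10_c, q):
    """Difference of complier quantiles of Y1(1,1) and Y1(1,0) at rank q."""
    return quantile_from_cdf(grid, cdf_y11_c, q) - quantile_from_cdf(grid, cdf_y10_c, q)

grid = np.array([0.0, 1.0, 2.0, 3.0])
# Toy mixture with shares p_a = 0.25 and p_c = 0.5 (so p_{1|1} = 0.75, p_{1|0} = 0.25).
cdf_always = np.array([0.5, 1.0, 1.0, 1.0])
cdf_mixture = np.array([1 / 6, 2 / 3, 1.0, 1.0])
cdf_complier = complier_cdf(cdf_mixture, cdf_always, p_mix=0.75, p_other=0.25)
median_effect = quantile_indirect_effect_d1(
    grid, np.array([0.0, 0.0, 0.5, 1.0]), np.array([0.0, 0.5, 1.0, 1.0]), 0.5)
```

The same two helper functions cover all complier quantile effects in this appendix, because each of them is a difference of two inverted complier distribution functions.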