Direct and Indirect Effects based on Changes-in-Changes

Abstract

We propose a novel approach for causal mediation analysis based on changes-in-changes assumptions restricting unobserved heterogeneity over time. This allows disentangling the causal effect of a binary treatment on a continuous outcome into an indirect effect operating through a binary intermediate variable (called mediator) and a direct effect running via other causal mechanisms. We identify average and quantile direct and indirect effects for various subgroups under the condition that the outcome is monotonic in the unobserved heterogeneity and that the distribution of the latter does not change over time conditional on the treatment and the mediator. We also provide a simulation study and an empirical application to the Jobs II programme.


arXiv [econ.EM]

Martin Huber†, Mark Schelker†, Anthony Strittmatter‡
† University of Fribourg, Dept. of Economics
‡ University of St. Gallen, Swiss Institute for Empirical Economic Research


Keywords:

Direct effects, indirect effects, mediation analysis, changes-in-changes, causal mechanisms, treatment effects.

JEL classification: C21.

∗ We have benefited from comments by Giuseppe Germinario as well as conference/seminar participants at the Universities of Neuchâtel, Melbourne, Sydney, Hamburg, and Lisbon, the Luxembourg Institute of Socio-Economic Research, the 2019 meeting of the Austro-Swiss Region of the International Biometric Society in Lausanne, and the 2019 meeting of the International Association for Applied Econometrics in Nicosia. Addresses for correspondence: Martin Huber, Chair of Applied Econometrics - Evaluation of Public Policies, University of Fribourg, Bd. de Pérolles 90, 1700 Fribourg, Switzerland, [email protected]. Mark Schelker, Chair of Public Economics, University of Fribourg, Bd. de Pérolles 90, 1700 Fribourg, Switzerland, [email protected]. Anthony Strittmatter, Swiss Institute for Empirical Economic Research (SEW), University of St. Gallen, Varnbüelstr. 14, 9000 St. Gallen, Switzerland, [email protected].

1 Introduction

Causal mediation analysis aims at disentangling a total treatment effect into an indirect effect operating through an intermediate variable, commonly referred to as mediator, as well as the direct effect. The latter includes any causal mechanisms not operating through the mediator of interest. Even when the treatment is random, direct and indirect effects are generally not identified by simply controlling for the mediator without accounting for its potential endogeneity, as this likely introduces selection bias, see Robins and Greenland (1992).

This paper suggests a novel identification strategy for causal mediation analysis based on changes-in-changes (CiC) as suggested by Athey and Imbens (2006) for evaluating (total) average and quantile treatment effects. We adapt the approach to the identification of the direct effect and the indirect effect running through a binary mediator. The outcome variable must be continuous and is assumed to be observed both prior to and after treatment and mediator assignment, as is the case in repeated cross sections or panel data. The key identifying assumptions imply that the continuous outcome is strictly monotonic in unobserved heterogeneity and that the distribution of unobserved heterogeneity does not change over time conditional on the treatment and the mediator (the latter assumption is also known as stationarity).
Given appropriate common support conditions, this permits identifying direct effects on subpopulations conditional on the treatment and the mediator states, even if both treatment and mediator assignment are endogenous.

Augmenting the assumptions by random treatment assignment and weak monotonicity of the mediator in the treatment allows for causal mediation analysis in subpopulations defined upon whether and how the mediator reacts to the treatment. Specifically, we show the identification of direct effects among those whose mediator is always one (always-takers in the denomination of Angrist, Imbens, and Rubin, 1996) and never one (never-takers) irrespective of treatment assignment, respectively. Furthermore, we identify the total, direct, and indirect treatment effects on those whose mediator value complies with treatment assignment (compliers). For any set of assumptions, we discuss the identification of both average and quantile direct and indirect effects. We note that if appropriately weighted, the respective average effects among compliers, always-takers, and never-takers add up to the average direct and indirect effects in the population.

Identification in the earlier mediation literature typically relied on linear models for the mediator and outcome equations and often neglected endogeneity issues, see for instance Cochran (1957), Judd and Kenny (1981), and Baron and Kenny (1986). More recent contributions use more general identification approaches based on the potential outcome framework and take endogeneity issues explicitly into consideration. Examples include Robins and Greenland (1992), Pearl (2001), Robins (2003), Petersen, Sinisi, and van der Laan (2006), VanderWeele (2009), Imai, Keele, and Yamamoto (2010), Hong (2010), Albert and Nelson (2011), Imai and Yamamoto (2013), Tchetgen Tchetgen and Shpitser (2012), Vansteelandt, Bekaert, and Lange (2012), and Huber (2014).
The vast majority of the literature assumes that the covariates observed in the data are sufficiently rich to control for treatment and mediator endogeneity. Also in empirical economics, there has been an increase in the application of such selection on observables approaches, see for instance Simonsen and Skipper (2006), Flores and Flores-Lagunes (2009), Heckman, Pinto, and Savelyev (2013), Huber (2015), Keele, Tingley, and Yamamoto (2015), Conti, Heckman, and Pinto (2016), Huber, Lechner, and Mellace (2017), Bijwaard and Jones (2019), Bellani and Bia (2018), Huber, Lechner, and Strittmatter (2018), and Doerr and Strittmatter (2019). Comparably few studies in economics develop or apply instrumental variable approaches for disentangling direct and indirect effects, see for instance Frölich and Huber (2017), Powdthavee, Lekfuangfu, and Wooden (2013), Brunello, Fort, Schneeweis, and Winter-Ebmer (2016) and Chen, Chen, and Liu (2017). Our paper provides another, CiC-based identification strategy that neither rests on selection on observables assumptions nor on instrumental variables for the treatment or the mediator.

While most studies aim at evaluating direct and indirect effects in the total population, a smaller strand of the literature uses the principal stratification framework of Frangakis and Rubin (2002) to investigate effects in subpopulations (or principal strata) defined upon whether and how the mediator reacts to the treatment, see Rubin (2004). This approach has been criticized for typically focussing on direct effects on populations whose mediator is constant (i.e. always- and never-takers) rather than decomposing direct and indirect effects on compliers and for considering subpopulations rather than the total population, see VanderWeele (2008) and VanderWeele (2012). Deuchert, Huber, and Schelker (2017) suggest a difference-in-differences (DiD) strategy that alleviates such criticisms.
Identification relies on a randomized treatment, monotonicity of the (binary) mediator in the treatment, and particular common trend assumptions on mean potential outcomes across principal strata. The latter imply that mean potential outcomes under specific treatment and mediator states change by the same amount over time across specific subpopulations. Depending on the strength of common trend and effect homogeneity assumptions across principal strata, direct and indirect effects are identified for different subpopulations and under the strongest set of assumptions even for the total population.

Our paper contributes to this literature on principal strata effects, but relies on different identifying assumptions than Deuchert, Huber, and Schelker (2017). While differential time trends across subpopulations are permitted, our approach restricts the conditional distribution of unobserved heterogeneity over time. The two sets of assumptions are not nested and their appropriateness is to be judged in the empirical context at hand. However, both approaches could be used simultaneously for testing the joint validity of the identifying assumptions of either method, in which case both CiC and DiD converge to the same, true average direct and indirect effects. As a further distinction to Deuchert, Huber, and Schelker (2017), our method also permits assessing quantile treatment effects (QTEs) rather than average effects only.

In independent work, Sawada (2019) proposes a CiC strategy to tackle non-compliance in randomized experiments when the exclusion restriction of random assignment is violated. While there is an overlap in some identification results of his study and ours (e.g. concerning the direct effect on never-takers), there are also important differences. First, Sawada (2019) predominantly focusses on the average treatment effect on the treated under one-sided non-compliance (ruling out always-takers), which then corresponds to the total effect on compliers.
Our paper in addition disentangles the total complier effect into direct and indirect components. Second, under two-sided non-compliance (i.e. the existence of both never- and always-takers), Sawada (2019) identifies the total complier effect by assuming homogeneity of the direct effect, while we extend the CiC assumptions to the always-takers for identifying (direct, indirect, and total) complier effects as well as the direct effect among always-takers. Third and in contrast to Sawada (2019), we also provide identification results in the absence of randomization and monotonicity of the mediator in the treatment. On the other hand, Sawada (2019) in contrast to our study demonstrates that the CiC strategy does not necessarily require pre-treatment outcomes, but may exploit any pre-treatment variable that has similar rank orders (as a function of unobserved heterogeneity) like the outcome of interest.

We provide a simulation study in which we compare the CiC to the DiD approach to illustrate our identification results. We also consider an empirical application to the Jobs II programme previously analysed by Vinokur, Price, and Schul (1995), a randomized job training intervention designed to analyse the impact of job training on labour market and mental health outcomes. We investigate the direct effect of the randomized offer of treatment on a depression index, as well as its indirect effect through actual participation in the programme as mediator.
The reason for investigating the direct effect is that treatment assignment could have a motivation or discouragement effect on those randomly offered or not offered the training. We, however, find the direct effect estimates to be close to zero and statistically insignificant, and therefore find no indication of a violation of the exclusion restriction when using treatment assignment as instrumental variable for actual participation. In contrast, the moderately negative total and indirect effects on those induced to participate by assignment are statistically significant at least at the 10% level in all but one case and very much in line with the estimate obtained by instrumental variable regression.

The remainder of this study is organized as follows. Section 2 introduces the notation and defines the direct and indirect effects of interest. Section 3 presents the assumptions underlying our CiC approach as well as the identification results. Section 4 provides a simulation study. Section 5 provides an application to Jobs II. Section 6 concludes.

Let $D$ denote a binary treatment (e.g., receiving the offer to participate in a training programme) and $M$ a binary intermediate variable or mediator that may be a function of $D$ (e.g., the actual participation in a training programme). Furthermore, let $T$ indicate a particular time period: $T = 0$ denotes the baseline period prior to the realisation of $D$ and $M$, and $T = 1$ the follow-up period after measuring $D$ and $M$, in which the effect on the outcome is evaluated. Finally, let $Y_t$ denote the outcome of interest (e.g., health measures) in period $T = t$. Indexing the outcome by the time period $t \in \{0, 1\}$ implies that it is measured both in the baseline period and after the realisation of $D$ and $M$. To define the parameters of interest, we make use of the potential outcome notation, see for instance Rubin (1974), and denote by $Y_t(d, m)$ the potential outcome for treatment state $D = d$ and mediator state $M = m$ in time $T = t$, with $d, m, t \in \{0, 1\}$. Furthermore, let $M(d)$ denote the potential mediator as a function of the treatment state $d \in \{0, 1\}$. For notational ease, we will not use any time index for $D$ and $M$, because either is assumed to be measured at a single point in time between $T = 0$ and $T = 1$, albeit not necessarily at the same point, as $D$ causally precedes $M$. Therefore, $D$ and $M$ correspond to the actual treatment and mediator status in $T = 1$, while it is assumed that no treatment or mediation takes place in $T = 0$.

Using this notation, the average treatment effect (ATE) in the ex-post period is defined as $\Delta_1 = E[Y_1(1, M(1)) - Y_1(0, M(0))]$. That is, the ATE corresponds to the effect of $D$ on the outcome that either affects the latter directly (net of any effect on the mediator) or indirectly through an effect on $M$.
Indeed, the total ATE can be disentangled into the direct and indirect effects, denoted by $\theta_1(d) = E[Y_1(1, M(d)) - Y_1(0, M(d))]$ and $\delta_1(d) = E[Y_1(d, M(1)) - Y_1(d, M(0))]$, by adding and subtracting $Y_1(1, M(0))$ or $Y_1(0, M(1))$, respectively:

$$
\begin{aligned}
\Delta_1 &= E[Y_1(1, M(1)) - Y_1(0, M(0))] \\
&= \underbrace{E[Y_1(1, M(1)) - Y_1(1, M(0))]}_{=\delta_1(1)} + \underbrace{E[Y_1(1, M(0)) - Y_1(0, M(0))]}_{=\theta_1(0)} \\
&= \underbrace{E[Y_1(1, M(1)) - Y_1(0, M(1))]}_{=\theta_1(1)} + \underbrace{E[Y_1(0, M(1)) - Y_1(0, M(0))]}_{=\delta_1(0)}.
\end{aligned}
$$

Distinguishing between $\theta_1(1)$ and $\theta_1(0)$ or $\delta_1(1)$ and $\delta_1(0)$, respectively, implies the possibility of interaction effects between $D$ and $M$ such that the direct and indirect effects could be heterogeneous across values $d = 1$ and $d = 0$.

In our approach, we consider the concepts of direct and indirect effects within specific subpopulations. The latter are either defined conditional on the treatment and mediator values or conditional on potential mediator values under either treatment state, which matches the so-called principal stratum framework of Frangakis and Rubin (2002). As outlined in Angrist, Imbens, and Rubin (1996) in the context of instrumental variable-based identification, any individual $i$ in the population belongs to one of four strata, henceforth denoted by $\tau$, according to their potential mediator status under either treatment state: always-takers ($a$: $M(1) = M(0) = 1$) whose mediator is always one, compliers ($c$: $M(1) = 1, M(0) = 0$) whose mediator corresponds to the treatment value, defiers ($de$: $M(1) = 0, M(0) = 1$) whose mediator opposes the treatment value, and never-takers ($n$: $M(1) = M(0) = 0$) whose mediator is never one.
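The two decompositions of the total effect can be checked numerically. The following is a minimal sketch under a purely hypothetical data-generating process: the outcome function `y1`, the thresholds defining `m0` and `m1`, and all coefficients are assumptions made for illustration and are not taken from the paper. Both potential mediators are visible only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
u = rng.normal(size=n)  # unobserved heterogeneity (hypothetical)

# Hypothetical potential mediators with M(1) >= M(0), i.e. no defiers here.
m0 = (u > 1.0).astype(int)    # M(0)
m1 = (u > -0.5).astype(int)   # M(1)

def y1(d, m):
    """Hypothetical ex-post potential outcome Y_1(d, m) with a D-M interaction."""
    return u + 1.0 * d + 2.0 * m + 0.5 * d * m

# Total effect: E[Y_1(1, M(1)) - Y_1(0, M(0))]
total = np.mean(y1(1, m1) - y1(0, m0))

# Direct effects theta_1(d) = E[Y_1(1, M(d)) - Y_1(0, M(d))]
theta = {d: np.mean(y1(1, md) - y1(0, md)) for d, md in ((0, m0), (1, m1))}

# Indirect effects delta_1(d) = E[Y_1(d, M(1)) - Y_1(d, M(0))]
delta = {d: np.mean(y1(d, m1) - y1(d, m0)) for d in (0, 1)}

print("total:", total)
print("delta(1) + theta(0):", delta[1] + theta[0])
print("theta(1) + delta(0):", theta[1] + delta[0])
```

Because of the interaction term in `y1`, the direct effects `theta[0]` and `theta[1]` differ, yet both decompositions recover the same total effect.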
Note that $\tau$ cannot be pinned down for any individual, because either $M(1)$ or $M(0)$ is observed, but never both.

Let $\Delta_1^{\tau} = E[Y_1(1, M(1)) - Y_1(0, M(0)) \mid \tau]$ denote the ATE conditional on $\tau \in \{a, c, de, n\}$; $\theta_1^{\tau}(d)$ and $\delta_1^{\tau}(d)$ denote the corresponding direct and indirect effects. Because $M(1) = M(0) = 0$ for any never-taker, the indirect effect for this group is by definition zero ($\delta_1^{n}(d) = E[Y_1(d, 0) - Y_1(d, 0) \mid \tau = n] = 0$) and $\Delta_1^{n} = E[Y_1(1, 0) - Y_1(0, 0) \mid \tau = n] = \theta_1^{n}(1) = \theta_1^{n}(0) = \theta_1^{n}$ equals the direct effect for never-takers. Correspondingly, because $M(1) = M(0) = 1$ for any always-taker, the indirect effect for this group is by definition zero ($\delta_1^{a}(d) = E[Y_1(d, 1) - Y_1(d, 1) \mid \tau = a] = 0$) and $\Delta_1^{a} = E[Y_1(1, 1) - Y_1(0, 1) \mid \tau = a] = \theta_1^{a}(1) = \theta_1^{a}(0) = \theta_1^{a}$ equals the direct effect for always-takers. For the compliers, both direct and indirect effects may exist. Note that $M(d) = d$ due to the definition of compliers. Accordingly, $\theta_1^{c}(d) = E[Y_1(1, d) - Y_1(0, d) \mid \tau = c]$ equals the direct effect for compliers, $\delta_1^{c}(d) = E[Y_1(d, 1) - Y_1(d, 0) \mid \tau = c]$ equals the indirect effect for compliers, and $\Delta_1^{c} = E[Y_1(1, 1) - Y_1(0, 0) \mid \tau = c]$ equals the total effect for compliers. In the absence of any direct effect, the indirect effects on the compliers are homogeneous, $\delta_1^{c}(1) = \delta_1^{c}(0) = \delta_1^{c}$, and correspond to the local average treatment effect (LATE, e.g., Angrist, Imbens, and Rubin, 1996). Analogous results hold for the defiers.

As already mentioned, we will also consider direct effects conditional on specific values $D = d$ and mediator states $M = M(d) = m$, which are denoted by $\theta_1^{d,m}(d) = E[Y_1(1, m) - Y_1(0, m) \mid D = d, M(d) = m]$. These parameters are identified under weaker assumptions than strata-specific effects, but are also less straightforward to interpret, as they refer to mixtures of two strata.
For instance, $\theta_1^{1,0}(1) = E[Y_1(1, 0) - Y_1(0, 0) \mid D = 1, M(1) = 0]$ is the effect on a mixture of never-takers and defiers, as these two groups satisfy $M(1) = 0$. Likewise, $\theta_1^{0,0}(0)$ refers to never-takers and compliers satisfying $M(0) = 0$, $\theta_1^{0,1}(0)$ to always-takers and defiers satisfying $M(0) = 1$, and $\theta_1^{1,1}(1)$ to always-takers and compliers satisfying $M(1) = 1$.

We denote by $F_{Y_t(d,m)}(y) = \Pr(Y_t(d, m) \le y)$ the cumulative distribution function of $Y_t(d, m)$ at outcome level $y$. Its inverse, $F^{-1}_{Y_t(d,m)}(q) = \inf\{y : F_{Y_t(d,m)}(y) \ge q\}$, is the quantile function of $Y_t(d, m)$ at rank $q$. The total QTE are denoted by $\Delta_1(q) = F^{-1}_{Y_1(1,M(1))}(q) - F^{-1}_{Y_1(0,M(0))}(q)$. The QTE can be disentangled into the direct quantile effects, denoted by $\theta_1(q, d) = F^{-1}_{Y_1(1,M(d))}(q) - F^{-1}_{Y_1(0,M(d))}(q)$, and the indirect quantile effects, denoted by $\delta_1(q, d) = F^{-1}_{Y_1(d,M(1))}(q) - F^{-1}_{Y_1(d,M(0))}(q)$.

The conditional distribution function in stratum $\tau$ is $F_{Y_t(d,m) \mid \tau}(y) = \Pr(Y_t(d, m) \le y \mid \tau)$ and the corresponding conditional quantile function is $F^{-1}_{Y_t(d,m) \mid \tau}(q) = \inf\{y : F_{Y_t(d,m) \mid \tau}(y) \ge q\}$ for $\tau \in \{a, c, de, n\}$. Using the previously described stratification framework, we define the QTE conditional on $\tau \in \{a, c, de, n\}$: $\Delta_1^{\tau}(q) = F^{-1}_{Y_1(1,M(1)) \mid \tau}(q) - F^{-1}_{Y_1(0,M(0)) \mid \tau}(q)$. The direct quantile treatment effect among never-takers equals $\Delta_1^{n}(q) = F^{-1}_{Y_1(1,0) \mid n}(q) - F^{-1}_{Y_1(0,0) \mid n}(q) = \theta_1^{n}(q)$. The direct quantile effect among always-takers equals $\Delta_1^{a}(q) = F^{-1}_{Y_1(1,1) \mid a}(q) - F^{-1}_{Y_1(0,1) \mid a}(q) = \theta_1^{a}(q)$. The total QTE among compliers equals $\Delta_1^{c}(q) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, the direct quantile effect among compliers equals $\theta_1^{c}(q, d) = F^{-1}_{Y_1(1,d) \mid c}(q) - F^{-1}_{Y_1(0,d) \mid c}(q)$, and the indirect quantile effect among compliers equals $\delta_1^{c}(q, d) = F^{-1}_{Y_1(d,1) \mid c}(q) - F^{-1}_{Y_1(d,0) \mid c}(q)$.
Finally, we define the direct quantile treatment effects conditional on specific values $D = d$ and mediator states $M = M(d) = m$,

$$
\theta_1^{d,m}(q, 1) = F^{-1}_{Y_1(1,m) \mid D=d, M(1)=m}(q) - F^{-1}_{Y_1(0,m) \mid D=d, M(1)=m}(q)
$$

and

$$
\theta_1^{d,m}(q, 0) = F^{-1}_{Y_1(1,m) \mid D=d, M(0)=m}(q) - F^{-1}_{Y_1(0,m) \mid D=d, M(0)=m}(q),
$$

with the quantile function $F^{-1}_{Y_t(d,m) \mid D=d, M(d)=m}(q) = \inf\{y : F_{Y_t(d,m) \mid D=d, M(d)=m}(y) \ge q\}$ and the distribution function $F_{Y_t(d,m) \mid D=d, M(d)=m}(y) = \Pr(Y_t(d, m) \le y \mid D = d, M(d) = m)$.

We subsequently define various functions of the observed data required for the identification results. The conditional distribution function of the observed outcome $Y_t$ given treatment value $d$ and mediator state $m$ is $F_{Y_t \mid D=d, M=m}(y) = \Pr(Y_t \le y \mid D = d, M = m)$ for $d, m \in \{0, 1\}$. The corresponding conditional quantile function is $F^{-1}_{Y_t \mid D=d, M=m}(q) = \inf\{y : F_{Y_t \mid D=d, M=m}(y) \ge q\}$. Furthermore,

$$
Q_{dm}(y) := F^{-1}_{Y_1 \mid D=d, M=m} \circ F_{Y_0 \mid D=d, M=m}(y) = F^{-1}_{Y_1 \mid D=d, M=m}\big(F_{Y_0 \mid D=d, M=m}(y)\big)
$$

is the quantile-quantile transform of the conditional outcome from period 0 to 1 given treatment $d$ and mediator status $m$. This transform maps $y$ at rank $q$ in period 0 ($q = F_{Y_0 \mid D=d, M=m}(y)$) into the corresponding $y'$ at rank $q$ in period 1 ($y' = F^{-1}_{Y_1 \mid D=d, M=m}(q)$).

This section discusses the identifying assumptions along with the identification results for the various direct and indirect effects. We note that our assumptions could be adjusted to only hold conditional on a vector of observed covariates. In this case, the identification results would hold within cells defined upon covariate values. In our main discussion, however, covariates are not considered for the sake of ease of notation. For notational convenience, we maintain throughout that $\Pr(T = t, D = d, M = m) > 0$ for $t, d, m \in \{0, 1\}$, implying that all possible treatment-mediator combinations exist in the population in both time periods. Our first assumption implies that potential outcomes are characterized by a continuous nonparametric function, denoted by $h$, that is strictly monotonic in a scalar $U$ that reflects unobserved heterogeneity.

Assumption 1:

Strict monotonicity of continuous potential outcomes in unobserved heterogeneity.
The potential outcomes satisfy the following model: $Y_t(d, m) = h(d, m, t, U)$, with the general function $h$ being continuous and strictly increasing in the scalar unobservable $U \in \mathbb{R}$ for all $d, m, t \in \{0, 1\}$.

Assumption 1 requires the potential outcomes to be continuous, implying that there is a one-to-one correspondence between a potential outcome's distribution and quantile functions, which is a condition for point identification. For discrete potential outcomes, only bounds on the effects could be identified, in analogy to the discussion in Athey and Imbens (2006) for total (rather than direct and indirect) effects. Assumption 1 also implies that individuals with identical unobserved characteristics $U$ have the same potential outcomes $Y_t(d, m)$, while higher values of $U$ correspond to strictly higher potential outcomes $Y_t(d, m)$. Strict monotonicity is automatically satisfied in additively separable models, but Assumption 1 also allows for more flexible non-additive structures that arise in nonparametric models.

The next assumption rules out anticipation effects of the treatment or the mediator on the outcome in the baseline period. This assumption is plausible if assignment to the treatment or the mediator cannot be foreseen in the baseline period, such that behavioral changes affecting the pre-treatment outcome are ruled out.

Assumption 2:

No anticipation effect of $M$ and $D$ in the baseline period.
$Y_0(d, m) - Y_0(d', m') = 0$, for $d, d', m, m' \in \{0, 1\}$.

Similarly, Athey and Imbens (2006) and Chaisemartin and D'Haultfeuille (2018) assume that the assignment to the treatment group does not affect the potential outcomes as long as the treatment is not yet realized.

Furthermore, we assume conditional independence between unobserved heterogeneity and time periods given the treatment and no mediation.

Assumption 3:

Conditional independence of $U$ and $T$ given $D = 1, M = 0$ or $D = 0, M = 0$.
(a) $U \perp\!\!\!\perp T \mid D = 1, M = 0$,
(b) $U \perp\!\!\!\perp T \mid D = 0, M = 0$.

Under Assumption 3a, the distribution of $U$ is allowed to vary across groups defined upon treatment and mediator state, but not over time within the group with $D = 1, M = 0$. Assumption 3b imposes the same restriction conditional on $D = 0, M = 0$. Assumption 3 thus imposes stationarity of $U$ within groups defined on $D$ and $M$. This assumption is weaker than (and thus implied by) requiring that $U$ is constant across $T$ for each individual $i$. For example, Assumption 3 is satisfied in the fixed effect model $U = \eta + v_t$, with $\eta$ being a time-invariant individual-specific unobservable (fixed effect) and $v_t$ an idiosyncratic time-varying unobservable with the same distribution in both time periods.

Athey and Imbens (2006) and Chaisemartin and D'Haultfeuille (2018) impose time invariance conditional on the treatment status, $U \perp\!\!\!\perp T \mid D = d$, to identify the average treatment effect on the treated, $\phi = E[Y_1(1, M(1)) - Y_1(0, M(0)) \mid D = 1]$, or the local average treatment effect, $\phi = E[Y_1(1, M(1)) - Y_1(0, M(0)) \mid \tau = c]$, respectively. We additionally condition on the mediator status to identify direct and indirect effects.

For our next assumption, we introduce some further notation. Let $F_{U \mid d,m}(u) = \Pr(U \le u \mid D = d, M = m)$ be the conditional distribution of $U$ with support $\mathbb{U}_{dm}$.

Assumption 4:

Common support given $M = 0$.
(a) $\mathbb{U}_{10} \subseteq \mathbb{U}_{00}$,
(b) $\mathbb{U}_{00} \subseteq \mathbb{U}_{10}$.

Assumption 4a is a common support assumption, implying that any possible value of $U$ in the population with $D = 1, M = 0$ is also contained in the population with $D = 0, M = 0$. Assumption 4b imposes that any value of $U$ conditional on $D = 0, M = 0$ also exists conditional on $D = 1, M = 0$. Both assumptions together imply that the support of $U$ is the same in both populations, albeit the distributions may generally differ.

Assumptions 1 to 3 permit identifying direct effects on mixed populations of never-takers and defiers as well as never-takers and compliers, respectively, as formally stated in Theorem 1.

Theorem 1:

Under Assumptions 1–3,
(a) and Assumption 4a, the average and quantile direct effects under $d = 1$ conditional on $D = 1$ and $M(1) = 0$ are identified:

$$
\theta_1^{1,0}(1) = E[Y_1 - Q_{00}(Y_0) \mid D = 1, M = 0],
$$
$$
\theta_1^{1,0}(q, 1) = F^{-1}_{Y_1 \mid D=1, M=0}(q) - F^{-1}_{Q_{00}(Y_0) \mid D=1, M=0}(q).
$$

(b) and Assumption 4b, the average and quantile direct effects under $d = 0$ conditional on $D = 0$ and $M(0) = 0$ are identified:

$$
\theta_1^{0,0}(0) = E[Q_{10}(Y_0) - Y_1 \mid D = 0, M = 0],
$$
$$
\theta_1^{0,0}(q, 0) = F^{-1}_{Q_{10}(Y_0) \mid D=0, M=0}(q) - F^{-1}_{Y_1 \mid D=0, M=0}(q).
$$

Proof. See Appendix A.

To identify direct effects on further populations, we invoke a conditional independence assumption that is in the spirit of Assumption 3, but refers to different combinations of the treatment and the mediator.
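Before turning to further populations, the result in Theorem 1(a) can be illustrated with a plug-in estimate that replaces distribution and quantile functions by their empirical counterparts. The following is a minimal sketch, assuming a hypothetical panel data-generating process with a randomized treatment and no defiers; the helper names (`ecdf`, `quantile`, `qq_transform`), the outcome function `h1`, and all parameter values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def ecdf(sample, y):
    """Empirical CDF of `sample`, evaluated at the points `y`."""
    s = np.sort(sample)
    return np.searchsorted(s, y, side="right") / len(s)

def quantile(sample, q):
    """Empirical quantile function inf{y : F(y) >= q}."""
    s = np.sort(sample)
    # The small offset guards against floating-point overshoot in q * n.
    idx = np.ceil(np.asarray(q) * len(s) - 1e-9).astype(int) - 1
    return s[np.clip(idx, 0, len(s) - 1)]

def qq_transform(y, y0_ref, y1_ref):
    """Estimate Q_dm(y) = F^{-1}_{Y_1|d,m}(F_{Y_0|d,m}(y)) from a reference cell."""
    return quantile(y1_ref, ecdf(y0_ref, y))

# Hypothetical data-generating process (an assumption of this sketch).
rng = np.random.default_rng(0)
n = 100_000
stratum = rng.choice(3, size=n, p=[0.3, 0.4, 0.3])  # 0 = always-taker, 1 = complier, 2 = never-taker
d = rng.integers(0, 2, size=n)                      # randomized treatment
m = np.where(stratum == 0, 1, np.where(stratum == 1, d, 0))
eta = rng.normal(size=n) + np.array([1.0, 0.0, -1.0])[stratum]
u0 = eta + rng.normal(scale=0.5, size=n)            # U in period 0
u1 = eta + rng.normal(scale=0.5, size=n)            # U in period 1, same distribution

def h1(dd, mm):
    """Period-1 potential outcome Y_1(d, m), strictly increasing in U."""
    return 0.5 + dd + 2.0 * mm + (1.0 + 0.5 * dd) * u1 + 0.05 * u1**3

y0 = u0          # no treatment or mediation in period 0
y1 = h1(d, m)    # observed period-1 outcome

g10 = (d == 1) & (m == 0)   # cell of interest
g00 = (d == 0) & (m == 0)   # reference cell supplying Q_00

# Plug-in version of theta^{1,0}_1(1) = E[Y_1 - Q_00(Y_0) | D = 1, M = 0]
theta_hat = np.mean(y1[g10] - qq_transform(y0[g10], y0[g00], y1[g00]))
# Infeasible benchmark using the simulated potential outcomes
theta_true = np.mean((h1(1, 0) - h1(0, 0))[g10])
print(theta_hat, theta_true)
```

In this simulated design the unobservable follows the fixed effect structure $U = \eta + v_t$, so stationarity holds within each treatment-mediator cell, and the time effect is nonlinear in $U$, which is the kind of setting where a quantile-quantile transform is informative.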

Assumption 5:

Conditional independence of $U$ and $T$ given $D = 0, M = 1$ or $D = 1, M = 1$.
(a) $U \perp\!\!\!\perp T \mid D = 0, M = 1$,
(b) $U \perp\!\!\!\perp T \mid D = 1, M = 1$.

Under Assumption 5a, the distribution of $U$ is allowed to vary by treatment and mediator group, but not over time conditional on $D = 0, M = 1$. Assumption 5b imposes the same restriction conditional on $D = 1, M = 1$.

Assumption 6 is similar to Assumption 4, but imposes common support conditional on $M = 1$ rather than $M = 0$.

Assumption 6:

Common support given $M = 1$.
(a) $\mathbb{U}_{01} \subseteq \mathbb{U}_{11}$,
(b) $\mathbb{U}_{11} \subseteq \mathbb{U}_{01}$.

Assumption 6a implies that any possible value of $U$ in the population with $D = 0, M = 1$ is also contained in the population with $D = 1, M = 1$. Assumption 6b states that any value of $U$ conditional on $D = 1, M = 1$ exists conditional on $D = 0, M = 1$.

Theorem 2 shows the identification of the direct effects on mixed populations of always-takers and defiers as well as always-takers and compliers.

Theorem 2:

Under Assumptions 1–2, 5,
(a) and Assumption 6a, the average and quantile direct effects under $d = 0$ conditional on $D = 0$ and $M(0) = 1$ are identified:

$$
\theta_1^{0,1}(0) = E[Q_{11}(Y_0) - Y_1 \mid D = 0, M = 1],
$$
$$
\theta_1^{0,1}(q, 0) = F^{-1}_{Q_{11}(Y_0) \mid D=0, M=1}(q) - F^{-1}_{Y_1 \mid D=0, M=1}(q).
$$

(b) and Assumption 6b, the average and quantile direct effects under $d = 1$ conditional on $D = 1$ and $M(1) = 1$ are identified:

$$
\theta_1^{1,1}(1) = E[Y_1 - Q_{01}(Y_0) \mid D = 1, M = 1],
$$
$$
\theta_1^{1,1}(q, 1) = F^{-1}_{Y_1 \mid D=1, M=1}(q) - F^{-1}_{Q_{01}(Y_0) \mid D=1, M=1}(q).
$$

Proof. See Appendix B.

In the instrumental variable framework, any direct effects of the instrument are typically ruled out by imposing the exclusion restriction, in order to identify the causal effect of an endogenous regressor on the outcome, see for instance Imbens and Angrist (1994). By considering $D$ as instrument and $M$ as endogenous regressor, $\theta_1^{1,0}(1) = \theta_1^{0,0}(0) = \theta_1^{0,1}(0) = \theta_1^{1,1}(1) = 0$ yield testable implications of the exclusion restriction under Assumptions 1–6.

So far, we did not impose exogeneity of the treatment or mediator. In the following, we assume treatment exogeneity by invoking independence between the treatment and the potential post-treatment variables.

Assumption 7:

Independence of the treatment and potential mediators/outcomes.
$\{Y_t(d, m), M(d)\} \perp\!\!\!\perp D$, for all $d, m, t \in \{0, 1\}$.

Under Assumption 7, the total ATE is given by $\Delta_1 = E[Y_1 \mid D = 1] - E[Y_1 \mid D = 0]$.

Furthermore, we assume the mediator to be weakly monotonic in the treatment.

Assumption 8:

Weak monotonicity of the mediator in the treatment.

$\Pr(M(1) \ge M(0)) = 1$.

Assumption 8 is standard in the instrumental variable literature on local average treatment effects when denoting by $D$ the instrument and by $M$ the endogenous regressor, see Imbens and Angrist (1994) and Angrist, Imbens, and Rubin (1996). It rules out the existence of defiers.

As discussed in Appendix C, the total ATE $\Delta_1 = E[Y_1 \mid D = 1] - E[Y_1 \mid D = 0]$ and QTE $\Delta_1(q) = F^{-1}_{Y_1 \mid D=1}(q) - F^{-1}_{Y_1 \mid D=0}(q)$ for the entire population are identified under Assumption 7. Furthermore, Assumptions 7 and 8 yield the strata proportions, denoted by $p_{\tau} = \Pr(\tau)$, as functions of the conditional mediator probabilities given the treatment, which we denote by $p_{m \mid d} = \Pr(M = m \mid D = d)$ for $d, m \in \{0, 1\}$ (see Appendix C):

$$
p_a = p_{1 \mid 0}, \quad p_c = p_{1 \mid 1} - p_{1 \mid 0} = p_{0 \mid 0} - p_{0 \mid 1}, \quad p_n = p_{0 \mid 1}. \tag{1}
$$

Furthermore, Assumptions 2, 7, and 8 imply that (see Appendix C)

$$
\Delta_{0,c} = E[Y_0(1, 1) - Y_0(0, 0) \mid c] = \frac{E[Y_0 \mid D = 1] - E[Y_0 \mid D = 0]}{p_{1 \mid 1} - p_{1 \mid 0}} = 0. \tag{2}
$$

Therefore, a rejection of the testable implication $E[Y_0 \mid D = 1] - E[Y_0 \mid D = 0] = 0$ in the data would point to a violation of these assumptions.

Assumptions 7 and 8 permit identifying additional parameters, namely the total, direct, and indirect effects on compliers, and the direct effects on never- and always-takers, as shown in Theorems 3 to 5. This follows from the fact that defiers are ruled out and that the proportions and potential outcome distributions of the various principal strata are not selective w.r.t. the treatment.

Theorem 3:

Under Assumptions 1–3, 7–8,
a) and Assumption 4a, the average and quantile direct effects on never-takers are identified: $\theta_1^{n} = \theta_1^{1,0}(1)$ and $\theta_1^{n}(q) = \theta_1^{1,0}(q, 1)$.
b) and Assumption 4, the average direct effect under $d = 0$ on compliers is identified:

$$
\theta_1^{c}(0) = \frac{p_{0 \mid 0}}{p_{0 \mid 0} - p_{0 \mid 1}} \theta_1^{0,0}(0) - \frac{p_{0 \mid 1}}{p_{0 \mid 0} - p_{0 \mid 1}} \theta_1^{1,0}(1).
$$

Furthermore, the potential outcome distributions under $d = 0$ on compliers are identified:

$$
F_{Y_1(1,0) \mid \tau = c}(y) = \frac{p_{0 \mid 0}}{p_{0 \mid 0} - p_{0 \mid 1}} F_{Q_{10}(Y_0) \mid D=0, M=0}(y) - \frac{p_{0 \mid 1}}{p_{0 \mid 0} - p_{0 \mid 1}} F_{Y_1 \mid D=1, M=0}(y), \tag{3}
$$
$$
F_{Y_1(0,0) \mid \tau = c}(y) = \frac{p_{0 \mid 0}}{p_{0 \mid 0} - p_{0 \mid 1}} F_{Y_1 \mid D=0, M=0}(y) - \frac{p_{0 \mid 1}}{p_{0 \mid 0} - p_{0 \mid 1}} F_{Q_{00}(Y_0) \mid D=1, M=0}(y). \tag{4}
$$

Therefore, the direct quantile effect under $d = 0$ on compliers, $\theta_1^{c}(q, 0) = F^{-1}_{Y_1(1,0) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, is identified.

Proof. See Appendix D.
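The strata proportions in (1) and the weighting in Theorem 3(b) amount to simple arithmetic on conditional mediator probabilities. The following is a minimal sketch with made-up numbers; the function names `strata_shares` and `complier_direct_effect_d0` are hypothetical and only serve this illustration.

```python
import numpy as np

def strata_shares(d, m):
    """Strata proportions from equation (1), assuming random D and no defiers."""
    p1_given_1 = m[d == 1].mean()   # Pr(M = 1 | D = 1)
    p1_given_0 = m[d == 0].mean()   # Pr(M = 1 | D = 0)
    return {
        "a": p1_given_0,                # always-takers
        "c": p1_given_1 - p1_given_0,   # compliers
        "n": 1.0 - p1_given_1,          # never-takers, equals Pr(M = 0 | D = 1)
    }

def complier_direct_effect_d0(theta_00, theta_10, p0_given_0, p0_given_1):
    """Average direct effect under d = 0 on compliers, following Theorem 3(b).

    theta_00: effect in the cell D = 0, M = 0 (mixture of compliers and never-takers)
    theta_10: effect in the cell D = 1, M = 0 (never-takers)
    """
    w = p0_given_0 - p0_given_1
    return (p0_given_0 * theta_00 - p0_given_1 * theta_10) / w

# Toy data: 4 treated and 4 non-treated units (made-up values).
d = np.array([1, 1, 1, 1, 0, 0, 0, 0])
m = np.array([1, 1, 1, 0, 1, 0, 0, 0])
shares = strata_shares(d, m)

# Made-up effects: never-takers 1.0, compliers 3.0. The cell D = 0, M = 0 then
# mixes both groups with weights p_c and p_n.
theta_n, theta_c = 1.0, 3.0
theta_00 = (shares["c"] * theta_c + shares["n"] * theta_n) / (shares["c"] + shares["n"])
recovered = complier_direct_effect_d0(theta_00, theta_n, 1 - shares["a"], shares["n"])
print(shares, recovered)
```

The last lines build the mixture effect for the cell with $D = 0, M = 0$ from known strata effects and confirm that the weighting formula undoes the mixing.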

Theorem 4:

Under Assumptions 1–2, 5, 7–8,
a) and Assumption 6a, the average and quantile direct effects on always-takers are identified: $\theta_1^{a} = \theta_1^{0,1}(0)$ and $\theta_1^{a}(q) = \theta_1^{0,1}(q, 0)$.
b) and Assumption 6, the average direct effect under $d = 1$ on compliers is identified:

$$
\theta_1^{c}(1) = \frac{p_{1 \mid 1}}{p_{1 \mid 1} - p_{1 \mid 0}} \theta_1^{1,1}(1) - \frac{p_{1 \mid 0}}{p_{1 \mid 1} - p_{1 \mid 0}} \theta_1^{0,1}(0).
$$

Furthermore, the potential outcome distributions under $d = 1$ for compliers are identified:

$$
F_{Y_1(1,1) \mid \tau = c}(y) = \frac{p_{1 \mid 1}}{p_{1 \mid 1} - p_{1 \mid 0}} F_{Y_1 \mid D=1, M=1}(y) - \frac{p_{1 \mid 0}}{p_{1 \mid 1} - p_{1 \mid 0}} F_{Q_{11}(Y_0) \mid D=0, M=1}(y), \tag{5}
$$
$$
F_{Y_1(0,1) \mid \tau = c}(y) = \frac{p_{1 \mid 1}}{p_{1 \mid 1} - p_{1 \mid 0}} F_{Q_{01}(Y_0) \mid D=1, M=1}(y) - \frac{p_{1 \mid 0}}{p_{1 \mid 1} - p_{1 \mid 0}} F_{Y_1 \mid D=0, M=1}(y). \tag{6}
$$

Therefore, the direct quantile effect under $d = 1$ on compliers, $\theta_1^{c}(q, 1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,1) \mid c}(q)$, is identified.

Proof. See Appendix E.

Theorem 5:

Under Assumptions 1–3, 5, and 7–8,

a) and Assumptions 4a, 6a, the total average treatment effect on compliers is identified:

\[ \Delta^c_1 = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Q_{11}(Y_0) \mid D=0, M=1] - \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=0, M=0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Q_{00}(Y_0) \mid D=1, M=0]. \]

Furthermore, the total quantile treatment effect on compliers, $\Delta^c_1(q) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, is identified using the inverse of (5) and (4).

b) and Assumptions 4a, 6b, the average indirect effect under $d = 0$ on compliers is identified:

\[ \delta^c_1(0) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=0, M=1] - \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=0, M=0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Q_{00}(Y_0) \mid D=1, M=0]. \]

Furthermore, the quantile indirect effect under $d = 0$ on compliers, $\delta^c_1(q,0) = F^{-1}_{Y_1(0,1) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$, is identified using the inverse of (6) and (4).

c) and Assumptions 4b, 6a, the average indirect effect under $d = 1$ on compliers is identified:

\[ \delta^c_1(1) = \frac{p_{1|1}}{p_{1|1} - p_{1|0}} E[Y_1 \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1} - p_{1|0}} E[Q_{11}(Y_0) \mid D=0, M=1] - \frac{p_{0|0}}{p_{0|0} - p_{0|1}} E[Q_{10}(Y_0) \mid D=0, M=0] + \frac{p_{0|1}}{p_{0|0} - p_{0|1}} E[Y_1 \mid D=1, M=0]. \]

Furthermore, the quantile indirect effect under $d = 1$ on compliers, $\delta^c_1(q,1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(1,0) \mid c}(q)$, is identified using the inverse of (5) and (3).

Proof.

See Appendix F.
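The quantile effects on compliers in Theorems 3 to 5 all invert a complier distribution function that is obtained as a weighted difference of two observable distributions, as in equations (3) to (6). The following Python sketch illustrates this step under simplifying assumptions: the helper names are hypothetical, the inputs are raw samples rather than quantile-quantile transformed outcomes, and the inverse is taken over a user-supplied grid. In finite samples the weighted difference need not be monotone, so a practical implementation would likely need some form of rearrangement, which is not shown here.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda y: bisect_right(ordered, y) / n


def complier_cdf(mixed_sample, pure_sample, p_mixed, p_pure):
    """Complier CDF: (p_mixed * F_mixed - p_pure * F_pure) / (p_mixed - p_pure).

    mixed_sample: outcomes of a group containing compliers and one other stratum
    pure_sample:  outcomes of a group containing only that other stratum
    """
    f_mixed, f_pure = ecdf(mixed_sample), ecdf(pure_sample)
    denom = p_mixed - p_pure
    return lambda y: (p_mixed * f_mixed(y) - p_pure * f_pure(y)) / denom


def quantile_from_cdf(cdf, grid, q, tol=1e-12):
    """Generalized inverse: smallest grid point y with cdf(y) >= q (up to tol)."""
    for y in sorted(grid):
        if cdf(y) >= q - tol:
            return y
    return max(grid)


# Hypothetical example in the spirit of equation (4): compliers contribute
# outcomes 1..4, never-takers contribute 10 and 20.
mixed = [1, 2, 3, 4, 10, 20]   # group with D=0, M=0, share p_{0|0} = 0.9
pure = [10, 20]                # group with D=1, M=0, share p_{0|1} = 0.3
cdf_c = complier_cdf(mixed, pure, p_mixed=0.9, p_pure=0.3)
median_c = quantile_from_cdf(cdf_c, mixed, 0.5)
```

A quantile effect is then the difference between two such inverses evaluated at the same rank $q$, for example the inverses of (5) and (4) for the total quantile effect on compliers.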

As in Assumption 5.1 of Athey and Imbens (2006), we assume standard regularity conditions, namely that conditional on $T = t$, $D = d$, and $M = m$, $Y$ is a random draw from that subpopulation defined in terms of $t, d, m \in \{0,1\}$. Furthermore, the outcome in the subpopulations required for the identification results of interest must have compact support and a density that is bounded from above and below as well as continuously differentiable. Denote by $N$ the total sample size across both periods and all treatment-mediator combinations and by $i \in \{1, ..., N\}$ an index for the sampled subject, such that $(Y_i, D_i, M_i, T_i)$ correspond to sample realizations of the random variables $(Y, D, M, T)$.

The total, direct, and indirect effects may be estimated using the sample analogy principle, which replaces population moments with sample moments (e.g. Manski, 1988). For instance, any conditional mediator probability given the treatment,

$\Pr(M = m \mid D = d)$, is to be replaced by an estimate thereof in the sample,

\[ \frac{\sum_{i=1}^{N} I\{M_i = m, D_i = d\}}{\sum_{i=1}^{N} I\{D_i = d\}}. \]

A crucial step is the estimation of the quantile-quantile transforms. The application of such quantile transformations dates at least back to Juhn, Murphy, and Pierce (1991), see also Chaisemartin and D'Haultfeuille (2018), Wüthrich (2019), and Strittmatter (2019) for recent applications. First, it requires estimating the conditional outcome distribution, $F_{Y_t \mid D=d, M=m}(y)$, by the conditional empirical distribution

\[ \hat{F}_{Y_t \mid D=d, M=m}(y) = \frac{1}{\sum_{i=1}^{N} I\{D_i = d, M_i = m, T_i = t\}} \sum_{i: D_i = d, M_i = m, T_i = t} I\{Y_i \leq y\}. \]

Second, inverting the latter yields the empirical quantile function $\hat{F}^{-1}_{Y_t \mid D=d, M=m}(q)$. The empirical quantile-quantile transform is then obtained by

\[ \hat{Q}_{dm}(y) = \hat{F}^{-1}_{Y_1 \mid D=d, M=m}\big(\hat{F}_{Y_0 \mid D=d, M=m}(y)\big). \]

This permits estimating the average and quantile effects of interest. Average effects are estimated by replacing any (conditional) expectations with the corresponding sample averages in which the estimated quantile-quantile transforms enter as plug-in estimates. Taking $\theta^{1,0}_1(1)$ (see Theorem 1) as an example, an estimate thereof is

\[ \hat{\theta}^{1,0}_1(1) = \frac{1}{\sum_{i=1}^{N} I\{D_i = 1, M_i = 0, T_i = 1\}} \sum_{i: D_i = 1, M_i = 0, T_i = 1} Y_i - \frac{1}{\sum_{i=1}^{N} I\{D_i = 1, M_i = 0, T_i = 0\}} \sum_{i: D_i = 1, M_i = 0, T_i = 0} \hat{Q}_{00}(Y_i). \]

Likewise, quantile effects are estimated based on the empirical quantiles.

For the estimation of total ATE and QTE, Athey and Imbens (2006) show that the resulting estimators are $\sqrt{N}$-consistent and asymptotically normal, see their Theorems 5.1 and 5.3. These properties also apply to our context when splitting the sample into subgroups based on the values of a binary treatment and mediator (rather than the treatment only).
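The plug-in estimation of the quantile-quantile transform and of the average direct effect in the group with $D = 1, M = 0$ can be sketched in plain Python. This is an illustrative simplification and not the `cic` routine of the qte R-package: the function names are hypothetical, the data are assumed to be tuples `(y, d, m, t)`, ties and support violations are ignored, and the empirical quantile is taken as the smallest order statistic whose empirical distribution value reaches the requested rank.

```python
from bisect import bisect_right
from math import ceil
from statistics import mean


def qq_transform(y, pre_sorted, post_sorted):
    """Map a period-0 outcome y to period 1 via Q(y) = F_post^{-1}(F_pre(y))."""
    rank = bisect_right(pre_sorted, y) / len(pre_sorted)  # empirical F_pre(y)
    # Smallest order statistic of the period-1 sample whose ECDF reaches `rank`;
    # the small tolerance guards against floating-point noise in rank * n.
    k = max(ceil(rank * len(post_sorted) - 1e-12), 1)
    return post_sorted[k - 1]


def direct_effect_d1_m0(rows):
    """Average direct effect in the group D=1, M=0 from (y, d, m, t) tuples."""
    def pick(d, m, t):
        return sorted(y for (y, dd, mm, tt) in rows if (dd, mm, tt) == (d, m, t))

    ref_pre, ref_post = pick(0, 0, 0), pick(0, 0, 1)  # group defining Q_00
    own_pre, own_post = pick(1, 0, 0), pick(1, 0, 1)  # group of interest
    counterfactual = [qq_transform(y, ref_pre, ref_post) for y in own_pre]
    return mean(own_post) - mean(counterfactual)


# Hypothetical toy data: outcomes in the reference group shift by +1 over time,
# outcomes in the group of interest shift by +3 (time trend plus an effect of 2).
rows = (
    [(y, 0, 0, 0) for y in (0.0, 1.0, 2.0, 3.0)]
    + [(y, 0, 0, 1) for y in (1.0, 2.0, 3.0, 4.0)]
    + [(y, 1, 0, 0) for y in (1.0, 2.0)]
    + [(y, 1, 0, 1) for y in (4.0, 5.0)]
)
estimate = direct_effect_d1_m0(rows)
```

In this toy example the transform maps the period-0 outcomes 1 and 2 to 2 and 3, so the estimate equals 4.5 minus 2.5. Bootstrap standard errors would be obtained by resampling the rows and recomputing the estimate, which is omitted here.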
For instance, the implications of Theorem 1 in Athey and Imbens (2006) when considering subsamples with $D = 1$ and $D = 0$ carry over to considering subsamples with $D = 1, M = 0$ and $D = 0, M = 0$ for estimating the average direct effect on never-takers. In contrast to Athey and Imbens (2006), however, some of our identification results include the conditional mediator probabilities $\Pr(M = m \mid D = d)$. As the latter are estimated with $\sqrt{N}$-consistency, too, it follows that the resulting effect estimators are again $\sqrt{N}$-consistent and asymptotically normal. We use a non-parametric bootstrap approach to calculate the standard errors. Chaisemartin and D'Haultfeuille (2018) show the validity of the bootstrap approach for this kind of estimator, which follows from their asymptotic normality.

If the identifying assumptions only hold conditional on observed covariates, denoted by $X$, estimation must be adapted to allow for control variables. Following a suggestion by Athey and Imbens (2006) in their Section 5.1, basing estimation on outcome residuals in which the association of $X$ and $Y$ has been purged by means of a regression is consistent under the additional assumption that the effects of $D$ and $M$ are homogeneous across covariates. As an alternative, Melly and Santangelo (2015) propose a flexible semiparametric estimator that does not impose such a homogeneity-in-covariates assumption and show $\sqrt{N}$-consistency and asymptotic normality.

To shape the intuition for our identification results, this section presents a brief simulation based on the following data generating process (DGP):

\[ T \sim Binom(0.5), \quad D \sim Binom(0.5), \quad U \sim Unif(-1, 1), \quad V \sim N(0, 1) \quad \text{independent of each other, and} \]

\[ M = I\{D + U + V > 0\}, \quad Y_T = \Lambda((1 + D + M + D \cdot M) \cdot T + U). \]

Treatment $D$ as well as the observed time period $T$ are randomized, while the mediator-outcome association is confounded due to the unobserved time constant heterogeneity $U$.
The potential outcome in period 1 is given by $Y_1(d, M(d')) = \Lambda((1 + d + M(d') + d \cdot M(d')) + U)$, where $\Lambda$ denotes a link function. If the latter corresponds to the identity function, our model is linear and implies a homogeneous time trend in $T$ equal to 1. If $\Lambda$ is nonlinear, the time trend is heterogeneous, which invalidates the common trend assumption of difference-in-differences models. $M$ is not only a function of $D$ and $U$, but also of the unobserved random term $V$, which guarantees common support w.r.t. $U$, see Assumptions 4 and 6. Compliers, always-takers, and never-takers satisfy, respectively: $c = I\{U + V \leq 0, 1 + U + V > 0\}$, $a = I\{U + V > 0\}$, and $n = I\{1 + U + V \leq 0\}$.

In the simulations with 1,000 replications, we consider two sample sizes ($N$ = 1,000, 4,000) and investigate the behaviour of our changes-in-changes methods as well as the difference-in-differences approach of Deuchert, Huber, and Schelker (2017) in both a linear ($\Lambda$ equal to the identity function) and a nonlinear outcome model where $\Lambda$ equals the exponential function. To implement the changes-in-changes estimators in the simulations as well as the application in Section 5, we make use of the 'cic' command in the qte R-package by Callaway (2016) with its default values.

Table 1 reports the bias, standard deviation ('sd'), root mean squared error ('rmse'), true effect ('true'), and the root mean squared error relative to the true effect ('relr') of the respective estimators of $\theta^n_1$, $\theta^a_1$, $\Delta^c_1$, $\theta^c_1(1)$, $\theta^c_1(0)$, $\delta^c_1(1)$, and $\delta^c_1(0)$ for the linear model. In this case, the identifying assumptions underlying both the changes-in-changes (Panel A.) and difference-in-differences (Panel B.) estimators are satisfied. Specifically, the homogeneous time trend on the individual level satisfies any of the common trend assumptions in Deuchert, Huber, and Schelker (2017), while the monotonicity of $Y$ in $U$ and the independence of $T$ and $U$ satisfy the key assumptions of this paper.
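The data generating process of the simulation can be sketched in Python as follows. This is an illustration rather than the authors' simulation code: the function names are hypothetical, success probabilities of 0.5 for $T$ and $D$ as well as a threshold of zero in the mediator equation are assumed, and the true effects are computed by averaging potential outcome contrasts within each stratum using $U$, which would be unobserved in practice.

```python
import math
import random


def potential_outcome(d, m, u, link):
    """Period-1 potential outcome Y_1(d, m) for unobserved heterogeneity u."""
    return link(1 + d + m + d * m + u)


def simulate(n, link=lambda x: x, seed=0):
    """Draw n observations; each row also stores u and the principal stratum."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = int(rng.random() < 0.5)   # assumed Binom(0.5) time period
        d = int(rng.random() < 0.5)   # assumed Binom(0.5) randomized treatment
        u = rng.uniform(-1.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        m0, m1 = int(u + v > 0), int(1 + u + v > 0)  # potential mediators
        m = m1 if d == 1 else m0
        stratum = "a" if m0 == 1 else ("c" if m1 == 1 else "n")
        y = link((1 + d + m + d * m) * t + u)
        rows.append({"t": t, "d": d, "m": m, "y": y, "u": u, "stratum": stratum})
    return rows


def true_effects(rows, link=lambda x: x):
    """Average period-1 effects by stratum, computed from the stored u."""
    def effect(stratum, hi, lo):
        us = [r["u"] for r in rows if r["stratum"] == stratum]
        return sum(potential_outcome(*hi, u, link) - potential_outcome(*lo, u, link)
                   for u in us) / len(us)

    return {
        "theta_n": effect("n", (1, 0), (0, 0)),
        "theta_a": effect("a", (1, 1), (0, 1)),
        "Delta_c": effect("c", (1, 1), (0, 0)),
        "theta_c(1)": effect("c", (1, 1), (0, 1)),
        "theta_c(0)": effect("c", (1, 0), (0, 0)),
        "delta_c(1)": effect("c", (1, 1), (1, 0)),
        "delta_c(0)": effect("c", (0, 1), (0, 0)),
    }


linear = true_effects(simulate(2000))
exponential = true_effects(simulate(20000, link=math.exp), link=math.exp)
```

In the linear design the contrasts do not depend on $U$, so the true effects equal 1, 2, 3, 2, 1, 2, and 1 in the order listed above, which matches the 'true' rows of Table 1. In the exponential design they vary with $U$ and therefore differ across strata.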
Because the assumptions of both approaches hold in the linear design, all estimates in Table 1 are close to being unbiased and appear to converge to the true effect at the parametric rate when comparing the results for the two different sample sizes.

Table 1: Linear model with random treatment

Estimator of   θ^n_1   θ^a_1   Δ^c_1   θ^c_1(1)  θ^c_1(0)  δ^c_1(1)  δ^c_1(0)

A. Changes-in-Changes
N = 1,000
  bias          0.00   -0.00   -0.01    -0.01     -0.01     -0.00     -0.01
  sd            0.11    0.08    0.23     0.10      0.13      0.27      0.27
  rmse          0.11    0.08    0.23     0.10      0.13      0.27      0.27
  true          1.00    2.00    3.00     2.00      1.00      2.00      1.00
  relr          0.11    0.04    0.08     0.05      0.13      0.14      0.27
N = 4,000
  bias         -0.00   -0.00    0.00    -0.00     -0.01      0.01      0.01
  sd            0.06    0.04    0.12     0.05      0.07      0.14      0.14
  rmse          0.06    0.04    0.12     0.05      0.07      0.14      0.14
  true          1.00    2.00    3.00     2.00      1.00      2.00      1.00
  relr          0.06    0.02    0.04     0.02      0.07      0.07      0.14

B. Difference-in-Differences
N = 1,000
  bias          0.01   -0.00   -0.01    -0.01      0.00     -0.02      0.00
  sd            0.11    0.09    0.14     0.14      0.12      0.19      0.10
  rmse          0.11    0.09    0.14     0.14      0.12      0.19      0.10
  true          1.00    2.00    3.00     2.00      1.00      2.00      1.00
  relr          0.11    0.04    0.05     0.07      0.12      0.10      0.10
N = 4,000
  bias         -0.00   -0.00    0.00    -0.00     -0.00      0.00      0.00
  sd            0.06    0.04    0.07     0.07      0.06      0.10      0.05
  rmse          0.06    0.04    0.07     0.07      0.06      0.10      0.05
  true          1.00    2.00    3.00     2.00      1.00      2.00      1.00
  relr          0.06    0.02    0.02     0.04      0.06      0.05      0.05

Note: 'bias', 'sd', and 'rmse' provide the bias, standard deviation, and root mean squared error of the respective estimator. 'true' and 'relr' are the respective true effect as well as the root mean squared error relative to the true effect.

Table 2 provides the results for the exponential outcome model, in which the time trend is heterogeneous and interacts with $U$ through the nonlinear link function. While the changes-in-changes assumptions hold (Panel A.), average time trends are heterogeneous across complier types such that the difference-in-differences approach (Panel B.) of Deuchert, Huber, and Schelker (2017) is inconsistent. Accordingly, the biases of the changes-in-changes estimates generally approach zero as the sample size increases, while this is not the case for the difference-in-differences estimates. Changes-in-changes yields a lower root mean squared error than the respective difference-in-differences estimator in all but one case (namely $\hat{\delta}^c_1(0)$ with $N$ = 1,000) and its relative attractiveness increases in the sample size due to its lower bias.

Table 2: Nonlinear model with random treatment

Estimator of   θ^n_1   θ^a_1   Δ^c_1   θ^c_1(1)  θ^c_1(0)  δ^c_1(1)  δ^c_1(0)

A. Changes-in-Changes
N = 1,000
  bias          0.01   -0.14   -0.48    -0.35     -0.11     -0.37     -0.13
  sd            0.48    5.08    8.47     6.20      1.16      8.64      4.23
  rmse          0.48    5.08    8.48     6.21      1.17      8.65      4.23
  true          3.49   68.09   52.42    47.70      4.72     47.70      4.72
  relr          0.14    0.07    0.16     0.13      0.25      0.18      0.90
N = 4,000
  bias         -0.01    0.01   -0.00    -0.11     -0.07      0.07      0.11
  sd            0.25    2.63    4.37     3.20      0.66      4.44      2.04
  rmse          0.25    2.63    4.37     3.20      0.66      4.44      2.04
  true          3.49   68.09   52.45    47.73      4.72     47.73      4.72
  relr          0.07    0.04    0.08     0.07      0.14      0.09      0.43

B. Difference-in-Differences
N = 1,000
  bias         -0.27   -8.91   14.42    11.46     -1.49     15.91      2.96
  sd            0.46    2.62    2.58     2.62      0.47      2.61      0.47
  rmse          0.53    9.29   14.65    11.76      1.56     16.12      2.99
  true          3.49   68.09   52.42    47.70      4.72     47.70      4.72
  relr          0.15    0.14    0.28     0.25      0.33      0.34      0.63
N = 4,000
  bias         -0.28   -8.79   14.51    11.57     -1.51     16.02      2.94
  sd            0.24    1.28    1.26     1.28      0.25      1.27      0.23
  rmse          0.37    8.88   14.57    11.64      1.53     16.07      2.95
  true          3.49   68.09   52.45    47.73      4.72     47.73      4.72
  relr          0.11    0.13    0.28     0.24      0.32      0.34      0.62

Note: 'bias', 'sd', and 'rmse' provide the bias, standard deviation, and root mean squared error of the respective estimator. 'true' and 'relr' are the respective true effect as well as the root mean squared error relative to the true effect.

In our final simulation design, we maintain the exponential outcome model but assume $D$ to be selective w.r.t. $U$ rather than random. To this end, the treatment model in (4) is replaced by $D = I\{U + Q > 0\}$, with the independent variable $Q \sim N(0, 1)$ being an unobserved term. Under this violation of Assumption 7, complier shares and effects are no longer identified, which is confirmed by the simulation results presented in Table 3. The biases in the changes-in-changes based total, direct, and indirect effects on compliers do not vanish as the sample size increases. Furthermore, under non-random assignment of $D$ (while maintaining monotonicity of $M$ in $D$), the never-takers' and always-takers' respective distributions of $U$ differ across treatment. Therefore, average direct effects among the total of never- or always-takers, respectively, are not identified. Yet, $\theta^{1,0}_1$, which is still identified by the same estimator as before, yields the direct effect among never-takers with $D = 1$ (as defiers do not exist). Likewise, $\theta^{0,1}_1$ corresponds to the direct effect on always-takers with $D = 0$. Indeed, the results in Table 3 suggest that both parameters are consistently estimated with the changes-in-changes model (Panel A.).

Table 3: Nonlinear model with non-random treatment

Estimator of   θ^{1,0}_1  θ^{0,1}_1  Δ^c_1   θ^c_1(1)  θ^c_1(0)  δ^c_1(1)  δ^c_1(0)

A. Changes-in-Changes
N = 1,000
  bias          0.02       0.13      47.21    40.19     -1.44     48.64      7.02
  sd            0.71       4.56       5.45     4.11      0.75      5.53      2.92
  rmse          0.71       4.56      47.52    40.40      1.62     48.96      7.60
  true          4.41      54.19      52.42    47.70      4.72     47.70      4.72
  relr          0.16       0.08       0.91     0.85      0.34      1.03      1.61
N = 4,000
  bias         -0.00       0.06      47.38    40.13     -1.53     48.91      7.25
  sd            0.38       2.35       2.84     2.04      0.38      2.86      1.51
  rmse          0.38       2.35      47.47    40.18      1.57     48.99      7.40
  true          4.40      54.18      52.45    47.73      4.72     47.73      4.72
  relr          0.09       0.04       0.90     0.84      0.33      1.03      1.57

B. Difference-in-Differences
N = 1,000
  bias          0.35      19.98      29.00    27.65      0.04     28.96      1.35
  sd            0.67       2.48       2.46     2.48      0.67      2.51      0.45
  rmse          0.75      20.14      29.11    27.76      0.67     29.07      1.43
  true          4.41      54.19      52.42    47.70      4.72     47.70      4.72
  relr          0.17       0.37       0.56     0.58      0.14      0.61      0.30
N = 4,000
  bias          0.34      20.02      28.98    27.65      0.02     28.96      1.33
  sd            0.35       1.22       1.19     1.22      0.35      1.24      0.23
  rmse          0.49      20.06      29.01    27.68      0.35     28.99      1.35
  true          4.40      54.18      52.45    47.73      4.72     47.73      4.72
  relr          0.11       0.37       0.55     0.58      0.07      0.61      0.29

Note: 'bias', 'sd', and 'rmse' provide the bias, standard deviation, and root mean squared error of the respective estimator. 'true' and 'relr' are the respective true effect as well as the root mean squared error relative to the true effect.

5 Application

Our empirical application is based on the JOBS II data by Vinokur and Price (1999). JOBS II was a randomized job training intervention in the US, designed to analyse the impact of job training on labour market and mental health outcomes, see Vinokur, Price, and Schul (1995). It was a modified version of the earlier JOBS programme, which had been found to improve labour market outcomes such as job satisfaction, motivation, earnings, and job stability, see Caplan, Vinokur, Price, and van Ryn (1989) and Vinokur, van Ryn, Gramlich, and Price (1991), as well as mental health, see Vinokur, Price, and Caplan (1991). According to the results of Vinokur, Price, and Schul (1995), the JOBS II programme increased reemployment rates and improved mental health outcomes, especially for participants having an elevated risk of depression. The JOBS interventions had an important impact in the academic literature (see e.g. Wanberg, 2012, Liu, Huang, and Wang, 2014) and the methodology was implemented in field experiments in Finland (Vuori, Silvonen, Vinokur, and Price, 2002, Vuori and Silvonen, 2005) and the Netherlands (Brenninkmeijer and Blonk, 2011), suggesting positive effects on labour market integration in either case.

The JOBS II intervention was conducted in south-eastern Michigan, where 2,464 job seekers were eligible to participate in a randomized field experiment, see Vinokur and Price (1999). In a baseline period prior to programme assignment, individuals responded to a screening questionnaire that collected pre-treatment information on mental health. Based on the latter, individuals were classified as having either a high or low depression risk and those with a high risk were oversampled before the training was randomly assigned. The job training consisted of five 4-hour seminars conducted in morning sessions during one week between March 1 and August 7, 1991.
Members of the treatment group who participated in at least four of the [...] The control group received a booklet with information on job search methods (Vinokur, Price, and Schul, 1995, p. 44-49).

We analyse the impact of job training on mental health, namely symptoms of depression 6 months after training participation. The health outcome ($Y$) is based on an 11-item index of depression symptoms of the Hopkins Symptom Checklist. For example, respondents were asked how much they were bothered by symptoms such as crying easily, feeling lonely, feeling blue, feeling hopeless, having thoughts of ending their lives, or experiencing a loss of sexual interest. The questions were coded on a 5-point scale, going from 'not at all' (1) to 'extremely' (5), and summarized in a depression variable that consists of the average across all questions.

One-sided non-compliance with the random assignment is a major issue in JOBS II. While the study design rules out always-takers because members of the control group did not have access to the job training programme, 45% of those assigned to training in our data did not participate and are therefore never-takers, while the remaining 55% are compliers. In order to avoid selection bias w.r.t. actual participation, the original JOBS II study by Vinokur, Price, and Schul (1995) analysed the total effect of the policy (i.e. the intention-to-treat effect), including those who, despite receiving an offer to participate, did not take part in the job training. In contrast, we use our methodology to separate the direct effect of mere training assignment, which is our treatment $D$, from the indirect effect operating through actual training participation, which is our mediator $M$, among compliers. We also consider the direct effect on never-takers, which likely differs from that on the compliers. While being offered (or not offered) the job training might affect compliers' mental health by inducing motivation/enthusiasm (or discouragement), it may not have the same effect among never-takers, who do not attend such seminars whatsoever.

Footnote: In the JOBS II intervention, randomization was followed by yet another questionnaire sent out two weeks before the actual job training, see Vinokur, Price, and Schul (1995), which also provided information on whether an individual had been assigned the training. Consequently, the data collected in that questionnaire must be considered post-treatment as they could be affected by learning the assignment. Therefore, we rely on the earlier screening data as the relevant pre-treatment period prior to random programme assignment.

Footnote: When compared to the earlier JOBS programme (Caplan, Vinokur, Price, and van Ryn, 1989), the job training sessions of JOBS II focused more strongly on building a sense of mastery, personal control and self-efficacy in job search. Previous research had suggested that an increase in this sense of mastery, control and self-efficacy improved observed effort in job search behaviour (Eden and Aviram, 1993). Results in Marshall and Lang (1990) suggest that mastery is a strong predictor of depression symptoms among women. For a detailed discussion of the literature, the exact sampling process, the training programme, and further aspects, see Vinokur, Price, and Schul (1995).

Footnote: Imai, Keele, and Tingley (2010) analyse Jobs II in a mediation context as well, but consider a different mediator, namely job search self-efficacy, and a different identification strategy based on selection on observables.

Table 4

           pre-treatment (T = 0)        post-mediator (T = 1)
           sample size   mean           sample size   mean
overall    1,796         1.86 (0.58)    1,564         1.73 (0.67)
D = 0      551           1.87 (0.59)    486           1.78 (0.70)
D = 1

Note: Standard deviations are in parentheses. 'mean diff', 'pval', and 'SD' are the mean difference, its p-value, and the standardized difference, respectively.

More specifically, we base identification on Theorem 3a with Assumption 4a for the average direct effect on never-takers, $\theta^n_1$, on Theorem 3b with Assumption 4 for the direct effect on compliers under $d = 0$, $\theta^c_1(0)$, and Theorem 5 with Assumptions 4b and 6a for the indirect effect on compliers under $d = 1$, $\delta^c_1(1)$. None of these approaches requires the presence of always-takers in the sample. We also note that if random assignment operated through other mechanisms than actual participation in any of the subpopulations, as may appear reasonable in the context of mental health outcomes, this would violate the exclusion restriction when using assignment as instrumental variable for actual participation in a two stage least squares regression. Given that our identifying assumptions hold, our approach can therefore be used to statistically test the exclusion restriction.

Our evaluation sample consists of a total of 3,360 observations in the pre-treatment and post-mediator periods with non-missing information for $D$, $M$, and $Y$. It is an unbalanced panel due to attrition of roughly 13% of the initial respondents between the two periods. Table 4 provides summary statistics for the outcome in the total sample as well as by treatment group over time. We verify whether randomization was successful by comparing the outcome means of the treatment and control groups in the pre-treatment period ($T = 0$) just prior to the randomization of $D$.
The small difference of 0.01 is not statistically significant according to a two sample t-test. Furthermore, the standardized difference test suggested by Rosenbaum and Rubin (1985) yields a value of just 1.68 and is thus far below 20, a threshold frequently chosen for indicating problematic imbalances across treatment groups. To test for potential attrition bias, we also consider these statistics in the pre-treatment period exclusively among the panel cases that remain in the sample in the post-mediator period (not reported in Table 4). The p-value of the t-test amounts to 0.52 and the standardized difference of 3.5 is low, such that attrition bias does not appear to be a concern. We therefore do not find statistical evidence for a violation of the random assignment of $D$ in our sample. Table 4 also reports the mean difference in outcomes in the post-mediator period ($T = 1$) 6 months after participation, which is an estimate for the total (or intention-to-treat) effect of $D$. The difference of 0.08 is statistically significant at the 5% level.

Table 5: Empirical results for Jobs II

        Changes-in-Changes                    Difference-in-Differences             Type shares
        θ^n_1   Δ^c_1   θ^c_1(0)  δ^c_1(1)    θ^n_1   Δ^c_1   θ^c_1(0)  δ^c_1(1)    p(n)    p(c)
est     -0.04   -0.11    0.06     -0.17       -0.03   -0.12   -0.06     -0.06       0.45    0.55
se       0.05    0.06    0.05      0.08        0.05    0.06    0.05      0.07       0.01    0.01
pval     0.40    0.06    0.26      0.04        0.52    0.03    0.21      0.43       0.00    0.00

Note: 'est', 'se', and 'pval' provide the effect estimate, standard error, and p-value of the respective estimator. p(n) and p(c) are the estimated never-taker and complier shares. Standard errors are based on cluster bootstrapping the effects 1999 times where clustering is on the respondent level.

Table 5 presents the estimation results based on our CiC approach and the DiD strategy of Deuchert, Huber, and Schelker (2017) when (linearly) controlling for the gender of respondents in either case. Standard errors rely on cluster bootstrapping the direct and indirect effects 1999 times, where clustering is on the respondent level. The CiC and DiD estimates of the direct effects on never-takers, $\hat{\theta}^n_1$, as well as on compliers, $\hat{\theta}^c_1(0)$, are not statistically significant at conventional levels. Hence, we do not find statistical evidence for a direct effect of the mere assignment into the training programme on the depression outcome. Such an effect would point to a violation of the exclusion restriction when using assignment as instrument for participation. In contrast, we find for both CiC and DiD negative total effects among compliers, $\hat{\Delta}^c_1$, that are statistically significant at least at the 10% level. In the case of CiC, the negative indirect effect among compliers, $\hat{\delta}^c_1(1)$, is also significant at the 5% level, while this is not the case for DiD. By and large, our results point to a moderately negative treatment effect on depressive symptoms through actual programme participation, rather than through other (i.e. direct) mechanisms. The CiC estimates $\hat{\delta}^c_1(1)$ and $\hat{\Delta}^c_1$ are in fact rather similar to the result of a two stage least squares regression relying on the exclusion restriction by using $D$ as instrument for $M$. The latter approach yields a local average treatment effect on compliers in the post-mediator period of -0.14 with a heteroskedasticity-robust standard error of 0.07 (significant at the 5% level).

We proposed a novel identification strategy for causal mediation analysis with repeated cross sections or panel data based on changes-in-changes (CiC) assumptions that are related to, yet different from, those of Athey and Imbens (2006), who consider total treatment effects. Strict monotonicity of outcomes in unobserved heterogeneity and distributional time invariance of the latter within groups defined on treatment and mediator states are key assumptions for identifying direct effects within these groups. Additionally assuming random treatment assignment and weak monotonicity of the mediator in the treatment permits identifying direct effects on never-takers and always-takers as well as total, direct, and indirect effects on compliers. We also provided a brief simulation study and an empirical application to the Jobs II programme.

References

Albert, J. M., and S. Nelson (2011): “Generalized causal mediation analysis,” Biometrics, 67, 1028–1038.

Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996): “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444–472.

Athey, S., and G. W. Imbens (2006): “Identification and Inference in Nonlinear Difference-In-Difference Models,” Econometrica, 74, 431–497.

Baron, R. M., and D. A. Kenny (1986): “The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations,” Journal of Personality and Social Psychology, 51, 1173–1182.

Bellani, L., and M. Bia (2018): “The long-run effect of childhood poverty and the mediating role of education,” Journal of the Royal Statistical Society: Series A, 182, 37–68.

Bijwaard, G. E., and A. M. Jones (2019): “An IPW estimator for mediation effects in hazard models: with an application to schooling, cognitive ability and mortality,” Empirical Economics, 57, 129–175.

Brenninkmeijer, V., and R. W. Blonk (2011): “The effectiveness of the JOBS program among the long-term unemployed: a randomized experiment in the Netherlands,” Health Promotion International, 27, 220–229.

Brunello, G., M. Fort, N. Schneeweis, and R. Winter-Ebmer (2016): “The Causal Effect of Education on Health: What is the Role of Health Behaviors?,” Health Economics, 25, 314–336.

Callaway, B. (2016): “Quantile Treatment Effects in R: The qte Package,” working paper, Temple University, Philadelphia.

Caplan, R. D., A. D. Vinokur, R. H. Price, and M. van Ryn (1989): “Job seeking, reemployment, and mental health: A randomized field experiment in coping with job loss,” Journal of Applied Psychology, 74, 759–769.

Chaisemartin, C., and X. D’Haultfeuille (2018): “Fuzzy Differences-in-Differences,” Review of Economic Studies, 85, 999–1028.

Chen, S. H., Y. C. Chen, and J. T. Liu (2017): “The impact of family composition on educational achievement,” Journal of Human Resources, 0915-7401R1.

Cochran, W. G. (1957): “Analysis of Covariance: Its Nature and Uses,” Biometrics, 13, 261–281.

Conti, G., J. J. Heckman, and R. Pinto (2016): “The Effects of Two Influential Early Childhood Interventions on Health and Healthy Behaviour,” The Economic Journal, 126, F28–F65.

Deuchert, E., M. Huber, and M. Schelker (2017): “Direct and indirect effects based on difference-in-differences with an application to political preferences following the Vietnam draft lottery,” forthcoming in the Journal of Business & Economic Statistics.

Doerr, A., and A. Strittmatter (2019): “Identifying causal channels of policy reforms with multiple treatments and different types of selection,” working paper, University of St. Gallen.

Eden, D., and A. Aviram (1993): “Self-efficacy training to speed reemployment: Helping people to help themselves,” Journal of Applied Psychology, 78, 352–360.

Flores, C. A., and A. Flores-Lagunes (2009): “Identification and Estimation of Causal Mechanisms and Net Effects of a Treatment under Unconfoundedness,” IZA Discussion Paper No. 4237.

Frangakis, C., and D. Rubin (2002): “Principal Stratification in Causal Inference,” Biometrics, 58, 21–29.

Frölich, M., and M. Huber (2017): “Direct and Indirect Treatment Effects – Causal Chains and Mediation Analysis with Instrumental Variables,” Journal of the Royal Statistical Society: Series B, 79, 1645–1666.

Heckman, J., R. Pinto, and P. Savelyev (2013): “Understanding the Mechanisms Through Which an Influential Early Childhood Program Boosted Adult Outcomes,” American Economic Review, 103, 2052–2086.

Hong, G. (2010): “Ratio of mediator probability weighting for estimating natural direct and indirect effects,” in Proceedings of the American Statistical Association, Biometrics Section, p. 2401–2415. Alexandria, VA: American Statistical Association.

Huber, M. (2014): “Identifying causal mechanisms (primarily) based on inverse probability weighting,” Journal of Applied Econometrics, 29, 920–943.

Huber, M. (2015): “Causal pitfalls in the decomposition of wage gaps,” Journal of Business and Economic Statistics, 33, 179–191.

Huber, M., M. Lechner, and G. Mellace (2017): “Why Do Tougher Caseworkers Increase Employment? The Role of Program Assignment as a Causal Mechanism,” The Review of Economics and Statistics, 99, 180–183.

Huber, M., M. Lechner, and A. Strittmatter (2018): “Direct and indirect effects of training vouchers for the unemployed,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 181, 441–463.

Imai, K., L. Keele, and D. Tingley (2010): “A General Approach to Causal Mediation Analysis,” Psychological Methods, 15, 309–334.

Imai, K., L. Keele, and T. Yamamoto (2010): “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects,” Statistical Science, 25, 51–71.

Imai, K., and T. Yamamoto (2013): “Identification and Sensitivity Analysis for Multiple Causal Mechanisms: Revisiting Evidence from Framing Experiments,” Political Analysis, 21, 141–171.

Imbens, G. W., and J. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 62, 467–475.

Judd, C. M., and D. A. Kenny (1981): “Process Analysis: Estimating Mediation in Treatment Evaluations,” Evaluation Review, 5, 602–619.

Juhn, C., K. M. Murphy, and B. Pierce (1991): “Accounting for the Slowdown in Black-White Wage Convergence,” in Workers and their Wages: Changing Patterns in the United States, ed. by M. H. Kosters, pp. 107–143. American Enterprise Institute, Washington.

Keele, L., D. Tingley, and T. Yamamoto (2015): “Identifying mechanisms behind policy interventions via causal mediation analysis,” Journal of Policy Analysis and Management, 34, 937–963.

Liu, S., J. L. Huang, and M. Wang (2014): “Effectiveness of Job Search Interventions: A Meta-Analytic Review,” Psychological Bulletin, 140, 1009–1041.

Manski, C. F. (1988): Analog Estimation Methods in Econometrics. Chapman & Hall, New York.

Marshall, G. N., and E. L. Lang (1990): “Optimism, self-mastery, and symptoms of depression in women professionals,” Journal of Personality and Social Psychology, 59, 132–139.

Melly, B., and G. Santangelo (2015): “The Changes-in-Changes Model with Covariates,” Working Paper.

Pearl, J. (2001): “Direct and indirect effects,” in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 411–420, San Francisco. Morgan Kaufmann.

Petersen, M. L., S. E. Sinisi, and M. J. van der Laan (2006): “Estimation of Direct Causal Effects,” Epidemiology, 17, 276–284.

Powdthavee, N., W. N. Lekfuangfu, and M. Wooden (2013): “The Marginal Income Effect of Education on Happiness: Estimating the Direct and Indirect Effects of Compulsory Schooling on Well-Being in Australia,” IZA Discussion Paper No. 7365.

Robins, J. M. (2003): “Semantics of causal DAG models and the identification of direct and indirect effects,” in Highly Structured Stochastic Systems, ed. by P. Green, N. Hjort, and S. Richardson, pp. 70–81, Oxford. Oxford University Press.

Robins, J. M., and S. Greenland (1992): “Identifiability and Exchangeability for Direct and Indirect Effects,” Epidemiology, 3, 143–155.

Rosenbaum, P. R., and D. B. Rubin (1985): “Constructing a control group using multivariate matched sampling methods that incorporate the propensity score,” The American Statistician, 39, 33–38.

Rubin, D. B. (1974): “Estimating the Causal Effect of Treatments in Randomized and Non-Randomized Studies,” Journal of Educational Psychology, 66, 688–701.

Rubin, D. B. (2004): “Direct and Indirect Causal Effects via Potential Outcomes,” Scandinavian Journal of Statistics, 31, 161–170.

Sawada, M. (2019): “Non-Compliance in Randomized Control Trials without Exclusion Restrictions,” Working Paper.

Simonsen, M., and L. Skipper (2006): “The Costs of Motherhood: An Analysis Using Matching Estimators,” Journal of Applied Econometrics, 21, 919–934.

Strittmatter, A. (2019): “Heterogeneous Earnings Effects of the Job Corps by Gender Earnings: A Translated Quantile Approach,” forthcoming in Labour Economics.

Tchetgen Tchetgen, E. J., and I. Shpitser (2012): “Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness, and sensitivity analysis,” The Annals of Statistics, 40, 1816–1845.

VanderWeele, T. J. (2008): “Simple relations between principal stratification and direct and indirect effects,” Statistics & Probability Letters, 78, 2957–2962.

VanderWeele, T. J. (2009): “Marginal Structural Models for the Estimation of Direct and Indirect Effects,” Epidemiology, 20, 18–26.

VanderWeele, T. J. (2012): “Comments: Should Principal Stratification Be Used to Study Mediational Processes?,” Journal of Research on Educational Effectiveness, 5, 245–249.

Vansteelandt, S., M. Bekaert, and

T. Lange (2012): “Imputation Strategiesfor the Estimation of Natural Direct and Indirect Eﬀects,”

Epidemiologic Methods ,1, 129–158.

Vinokur, A. D., and

R. H. Price (1999): “Jobs II Preventive Intervention forUnemployed Job Seekers, 1991-1993,”

Inter-university Consortium for Politicaland Social Research . Vinokur, A. D., R. H. Price, and

R. D. Caplan (1991): “From ﬁeld ex-periments to program implementation: Assessing the potential outcomes of anexperimental intervention program for unemployed persons,”

American Journalof Community Psychology , 19, 543–562.

Vinokur, A. D., R. H. Price, and

Y. Schul (1995): “Impact of the JOBSIntervention on Unemployed Workers Varying in Risk for Depression,”

AmericanJournal of Community Psychology , 23, 39–74.

Vinokur, A. D., M. van Ryn, E. M. Gramlich, and

R. H. Price (1991):“From ﬁeld experiments to program implementation: Assessing the potential out-comes of an experimental intervention program for unemployed persons,”

Journalof Applied Psychology , 76, 213–219. 34 uori, J., and

J. Silvonen (2005): “The beneﬁts of a preventive job searchprogram on re-employment and mental health at 2-year follow-up,”

Journal ofOccupational and Organizational Psychology , 78, 43–52.

Vuori, J., J. Silvonen, A. D. Vinokur, and

R. H. Price (2002): “The TyöhönJob Search Program in Finland: Beneﬁts for the Unemployed With Risk of De-pression or Discouragement,”

Journal of Occupational Health Psychology , 7, 5–19.

Wanberg, C. R. (2012): “The Individual Experience of Unemployment,”

AnnualReview of Psychology , 63, 369–396.

Wüthrich, K. (2019): “A comparison of two quantile models with endogeneity,” forthcoming in Journal of Business and Economic Statistics .35 ppendices

A Proof of Theorem 1

A.1 Average direct effect under d = 1 conditional on D = 1 and M(1) = 0

In the following, we prove that $\theta_1^{1,0}(1) = E[Y_1(1,0) - Y_1(0,0) \mid D=1, M(1)=0] = E[Y_1 - Q_{00}(Y_0) \mid D=1, M=0]$. Using the observational rule, we obtain $E[Y_1(1,0) \mid D=1, M(1)=0] = E[Y_1 \mid D=1, M=0]$. Accordingly, we have to show that $E[Y_1(0,0) \mid D=1, M(1)=0] = E[Q_{00}(Y_0) \mid D=1, M=0]$ to finish the proof.

Denote the inverse of $h(d,m,t,u)$ by $h^{-1}(d,m,t;y)$, which exists because of the strict monotonicity required in Assumption 1. Under Assumptions 1 and 3a, the conditional potential outcome distribution function equals

$$\begin{aligned}
F_{Y_t(d,0) \mid D=1,M=0}(y) &\overset{A1}{=} \Pr(h(d,0,t,U) \le y \mid D=1, M=0, T=t) \\
&= \Pr(U \le h^{-1}(d,0,t;y) \mid D=1, M=0, T=t) \\
&\overset{A3a}{=} \Pr(U \le h^{-1}(d,0,t;y) \mid D=1, M=0) \\
&= F_{U \mid 10}(h^{-1}(d,0,t;y)),
\end{aligned} \tag{A.1}$$

for $d, t \in \{0,1\}$, where $F_{U \mid 10}$ denotes the distribution function of $U$ given $D=1$ and $M=0$. We use these quantities in the following.

First, evaluating $F_{Y_1(0,0) \mid D=1,M=0}(y)$ at $h(0,0,1,u)$ gives

$$F_{Y_1(0,0) \mid D=1,M=0}(h(0,0,1,u)) = F_{U \mid 10}(h^{-1}(0,0,1;h(0,0,1,u))) = F_{U \mid 10}(u).$$

Applying $F^{-1}_{Y_1(0,0) \mid D=1,M=0}(q)$ to both sides, we have

$$h(0,0,1,u) = F^{-1}_{Y_1(0,0) \mid D=1,M=0}(F_{U \mid 10}(u)). \tag{A.2}$$

Second, for $F_{Y_0(0,0) \mid D=1,M=0}(y)$ we have

$$F^{-1}_{U \mid 10}(F_{Y_0(0,0) \mid D=1,M=0}(y)) = h^{-1}(0,0,0;y). \tag{A.3}$$

Combining (A.2) and (A.3) yields

$$h(0,0,1,h^{-1}(0,0,0;y)) = F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(y). \tag{A.4}$$

Note that $h(0,0,1,h^{-1}(0,0,0;y))$ gives the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under non-treatment without the mediator. Accordingly, $E[F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(Y_0) \mid D=1, M=0] = E[Y_1(0,0) \mid D=1, M=0]$. We can identify $F_{Y_0(0,0) \mid D=1,M=0}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(0,0) \mid D=1,M=0}(y)$. However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(y)$ under the additional Assumption 3b.

Under Assumptions 1 and 3b, the conditional potential outcome distribution function equals

$$\begin{aligned}
F_{Y_t(d,0) \mid D=0,M=0}(y) &\overset{A1}{=} \Pr(h(d,0,t,U) \le y \mid D=0, M=0, T=t) \\
&= \Pr(U \le h^{-1}(d,0,t;y) \mid D=0, M=0, T=t) \\
&\overset{A3b}{=} \Pr(U \le h^{-1}(d,0,t;y) \mid D=0, M=0) \\
&= F_{U \mid 00}(h^{-1}(d,0,t;y)),
\end{aligned} \tag{A.5}$$

for $d, t \in \{0,1\}$. We repeat similar steps as above. First, evaluating $F_{Y_1(0,0) \mid D=0,M=0}(y)$ at $h(0,0,1,u)$ gives

$$F_{Y_1(0,0) \mid D=0,M=0}(h(0,0,1,u)) = F_{U \mid 00}(h^{-1}(0,0,1;h(0,0,1,u))) = F_{U \mid 00}(u).$$

Applying $F^{-1}_{Y_1(0,0) \mid D=0,M=0}(q)$ to both sides, we have

$$h(0,0,1,u) = F^{-1}_{Y_1(0,0) \mid D=0,M=0}(F_{U \mid 00}(u)). \tag{A.6}$$

Second, for $F_{Y_0(0,0) \mid D=0,M=0}(y)$ we have

$$F^{-1}_{U \mid 00}(F_{Y_0(0,0) \mid D=0,M=0}(y)) = h^{-1}(0,0,0;y). \tag{A.7}$$

Combining (A.6) and (A.7) yields

$$h(0,0,1,h^{-1}(0,0,0;y)) = F^{-1}_{Y_1(0,0) \mid D=0,M=0} \circ F_{Y_0(0,0) \mid D=0,M=0}(y). \tag{A.8}$$

The left sides of (A.4) and (A.8) are equal. In contrast to (A.4), (A.8) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(0,0) \mid D=0,M=0}(y) = \Pr(Y_t(0,0) \le y \mid D=0, M=0) = \Pr(Y_t \le y \mid D=0, M=0)$. Accordingly, we can identify $F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(y)$ by $Q_{00}(y) \equiv F^{-1}_{Y_1 \mid D=0,M=0} \circ F_{Y_0 \mid D=0,M=0}(y)$.

Passing $Y_0$ through $Q_{00}(\cdot)$ in the treated group without mediator gives

$$\begin{aligned}
E[Q_{00}(Y_0) \mid D=1, M=0] &= E[F^{-1}_{Y_1 \mid D=0,M=0} \circ F_{Y_0 \mid D=0,M=0}(Y_0) \mid D=1, M=0] \\
&= E[F^{-1}_{Y_1(0,0) \mid D=0,M=0} \circ F_{Y_0(0,0) \mid D=0,M=0}(Y_0(1,0)) \mid D=1, M=0] \\
&\overset{A1,A3b}{=} E[h(0,0,1,h^{-1}(0,0,0;Y_0(1,0))) \mid D=1, M=0] \\
&\overset{A2}{=} E[h(0,0,1,h^{-1}(0,0,0;Y_0(0,0))) \mid D=1, M=0] \\
&\overset{A1,A3a}{=} E[F^{-1}_{Y_1(0,0) \mid D=1,M=0} \circ F_{Y_0(0,0) \mid D=1,M=0}(Y_0(0,0)) \mid D=1, M=0] \\
&= E[Y_1(0,0) \mid D=1, M=0] = E[Y_1(0,0) \mid D=1, M(1)=0],
\end{aligned} \tag{A.9}$$

which has data support because of Assumption 4a.

A.2 Quantile direct effect under d = 1 conditional on D = 1 and M(1) = 0

In the following, we prove that

$$\begin{aligned}
\theta_1^{1,0}(q,1) &= F^{-1}_{Y_1(1,0) \mid D=1,M(1)=0}(q) - F^{-1}_{Y_1(0,0) \mid D=1,M(1)=0}(q) \\
&= F^{-1}_{Y_1 \mid D=1,M=0}(q) - F^{-1}_{Q_{00}(Y_0) \mid D=1,M=0}(q).
\end{aligned}$$

For this purpose, we have to show that

$$F_{Y_1(1,0) \mid D=1,M(1)=0}(y) = F_{Y_1 \mid D=1,M=0}(y) \quad \text{and} \tag{A.10}$$

$$F_{Y_1(0,0) \mid D=1,M(1)=0}(y) = F_{Q_{00}(Y_0) \mid D=1,M=0}(y), \tag{A.11}$$

which is sufficient to show that the quantiles are also identified. We can show (A.10) using the observational rule: $F_{Y_1(1,0) \mid D=1,M(1)=0}(y) = F_{Y_1 \mid D=1,M=0}(y) = E[\mathbf{1}\{Y_1 \le y\} \mid D=1, M=0]$, with $\mathbf{1}\{\cdot\}$ being the indicator function. Using (A.9), we obtain

$$\begin{aligned}
F_{Q_{00}(Y_0) \mid D=1,M=0}(y) &= E[\mathbf{1}\{Q_{00}(Y_0) \le y\} \mid D=1, M=0] \\
&= E[\mathbf{1}\{F^{-1}_{Y_1 \mid D=0,M=0} \circ F_{Y_0 \mid D=0,M=0}(Y_0) \le y\} \mid D=1, M=0] \\
&= E[\mathbf{1}\{Y_1(0,0) \le y\} \mid D=1, M=0] \\
&= F_{Y_1(0,0) \mid D=1,M(1)=0}(y),
\end{aligned} \tag{A.12}$$

which proves (A.11).

A.3 Average direct effect under d = 0 conditional on D = 0 and M(0) = 0

In the following, we show that $\theta_1^{0,0}(0) = E[Y_1(1,0) - Y_1(0,0) \mid D=0, M(0)=0] = E[Q_{10}(Y_0) - Y_1 \mid D=0, M=0]$. Using the observational rule, we obtain $E[Y_1(0,0) \mid D=0, M(0)=0] = E[Y_1 \mid D=0, M=0]$. Accordingly, we have to show that $E[Y_1(1,0) \mid D=0, M(0)=0] = E[Q_{10}(Y_0) \mid D=0, M=0]$ to finish the proof.

First, we use (A.5) to evaluate $F_{Y_1(1,0) \mid D=0,M=0}(y)$ at $h(1,0,1,u)$:

$$F_{Y_1(1,0) \mid D=0,M=0}(h(1,0,1,u)) = F_{U \mid 00}(h^{-1}(1,0,1;h(1,0,1,u))) = F_{U \mid 00}(u).$$

Applying $F^{-1}_{Y_1(1,0) \mid D=0,M=0}(q)$ to both sides, we have

$$h(1,0,1,u) = F^{-1}_{Y_1(1,0) \mid D=0,M=0}(F_{U \mid 00}(u)). \tag{A.13}$$

Second, for $F_{Y_0(1,0) \mid D=0,M=0}(y)$ we have

$$F^{-1}_{U \mid 00}(F_{Y_0(1,0) \mid D=0,M=0}(y)) = h^{-1}(1,0,0;y), \tag{A.14}$$

using (A.5). Combining (A.13) and (A.14) yields

$$h(1,0,1,h^{-1}(1,0,0;y)) = F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(y). \tag{A.15}$$

Note that $h(1,0,1,h^{-1}(1,0,0;y))$ gives the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under treatment without the mediator.
Accordingly, $E[F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(Y_0) \mid D=0, M=0] = E[Y_1(1,0) \mid D=0, M=0]$. We can identify $F_{Y_0(1,0) \mid D=0,M=0}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(1,0) \mid D=0,M=0}(y)$. However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(y)$ under the additional Assumption 3a.

First, we use (A.1) to evaluate $F_{Y_1(1,0) \mid D=1,M=0}(y)$ at $h(1,0,1,u)$:

$$F_{Y_1(1,0) \mid D=1,M=0}(h(1,0,1,u)) = F_{U \mid 10}(h^{-1}(1,0,1;h(1,0,1,u))) = F_{U \mid 10}(u).$$

Applying $F^{-1}_{Y_1(1,0) \mid D=1,M=0}(q)$ to both sides, we have

$$h(1,0,1,u) = F^{-1}_{Y_1(1,0) \mid D=1,M=0}(F_{U \mid 10}(u)). \tag{A.16}$$

Second, for $F_{Y_0(1,0) \mid D=1,M=0}(y)$ we have

$$F^{-1}_{U \mid 10}(F_{Y_0(1,0) \mid D=1,M=0}(y)) = h^{-1}(1,0,0;y), \tag{A.17}$$

using (A.1). Combining (A.16) and (A.17) yields

$$h(1,0,1,h^{-1}(1,0,0;y)) = F^{-1}_{Y_1(1,0) \mid D=1,M=0} \circ F_{Y_0(1,0) \mid D=1,M=0}(y). \tag{A.18}$$

The left sides of (A.15) and (A.18) are equal. In contrast to (A.15), (A.18) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(1,0) \mid D=1,M=0}(y) = \Pr(Y_t(1,0) \le y \mid D=1, M=0) = \Pr(Y_t \le y \mid D=1, M=0)$. Accordingly, we can identify $F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(y)$ by $Q_{10}(y) \equiv F^{-1}_{Y_1 \mid D=1,M=0} \circ F_{Y_0 \mid D=1,M=0}(y)$.

Passing $Y_0$ through $Q_{10}(\cdot)$ in the non-treated group without mediator gives

$$\begin{aligned}
E[Q_{10}(Y_0) \mid D=0, M=0] &= E[F^{-1}_{Y_1 \mid D=1,M=0} \circ F_{Y_0 \mid D=1,M=0}(Y_0) \mid D=0, M=0] \\
&= E[F^{-1}_{Y_1(1,0) \mid D=1,M=0} \circ F_{Y_0(1,0) \mid D=1,M=0}(Y_0(0,0)) \mid D=0, M=0] \\
&\overset{A1,A3a}{=} E[h(1,0,1,h^{-1}(1,0,0;Y_0(0,0))) \mid D=0, M=0] \\
&\overset{A2}{=} E[h(1,0,1,h^{-1}(1,0,0;Y_0(1,0))) \mid D=0, M=0] \\
&\overset{A1,A3b}{=} E[F^{-1}_{Y_1(1,0) \mid D=0,M=0} \circ F_{Y_0(1,0) \mid D=0,M=0}(Y_0(1,0)) \mid D=0, M=0] \\
&= E[Y_1(1,0) \mid D=0, M=0] = E[Y_1(1,0) \mid D=0, M(0)=0],
\end{aligned} \tag{A.19}$$

which has data support because of Assumption 4b.

A.4 Quantile direct effect under d = 0 conditional on D = 0 and M(0) = 0

In the following, we prove that

$$\begin{aligned}
\theta_1^{0,0}(q,0) &= F^{-1}_{Y_1(1,0) \mid D=0,M(0)=0}(q) - F^{-1}_{Y_1(0,0) \mid D=0,M(0)=0}(q) \\
&= F^{-1}_{Q_{10}(Y_0) \mid D=0,M=0}(q) - F^{-1}_{Y_1 \mid D=0,M=0}(q).
\end{aligned}$$

For this purpose, we have to show that

$$F_{Y_1(1,0) \mid D=0,M(0)=0}(y) = F_{Q_{10}(Y_0) \mid D=0,M=0}(y) \quad \text{and} \tag{A.20}$$

$$F_{Y_1(0,0) \mid D=0,M(0)=0}(y) = F_{Y_1 \mid D=0,M=0}(y), \tag{A.21}$$

which is sufficient to show that the quantiles are also identified. We can show (A.21) using the observational rule: $F_{Y_1(0,0) \mid D=0,M(0)=0}(y) = F_{Y_1 \mid D=0,M=0}(y) = E[\mathbf{1}\{Y_1 \le y\} \mid D=0, M=0]$. Using (A.19), we obtain

$$\begin{aligned}
F_{Q_{10}(Y_0) \mid D=0,M=0}(y) &= E[\mathbf{1}\{Q_{10}(Y_0) \le y\} \mid D=0, M=0] \\
&= E[\mathbf{1}\{F^{-1}_{Y_1 \mid D=1,M=0} \circ F_{Y_0 \mid D=1,M=0}(Y_0) \le y\} \mid D=0, M=0] \\
&= E[\mathbf{1}\{Y_1(1,0) \le y\} \mid D=0, M=0] \\
&= F_{Y_1(1,0) \mid D=0,M(0)=0}(y),
\end{aligned}$$

which proves (A.20).

B Proof of Theorem 2

B.1 Average direct effect under d = 0 conditional on D = 0 and M(0) = 1

In the following, we show that $\theta_1^{0,1}(0) = E[Y_1(1,1) - Y_1(0,1) \mid D=0, M(0)=1] = E[Q_{11}(Y_0) - Y_1 \mid D=0, M=1]$. Using the observational rule, we obtain $E[Y_1(0,1) \mid D=0, M(0)=1] = E[Y_1 \mid D=0, M=1]$. Accordingly, we have to show that $E[Y_1(1,1) \mid D=0, M(0)=1] = E[Q_{11}(Y_0) \mid D=0, M=1]$ to finish the proof.

Under Assumptions 1 and 5a, the conditional potential outcome distribution function equals

$$\begin{aligned}
F_{Y_t(d,1) \mid D=0,M=1}(y) &\overset{A1}{=} \Pr(h(d,1,t,U) \le y \mid D=0, M=1, T=t) \\
&= \Pr(U \le h^{-1}(d,1,t;y) \mid D=0, M=1, T=t) \\
&\overset{A5a}{=} \Pr(U \le h^{-1}(d,1,t;y) \mid D=0, M=1) \\
&= F_{U \mid 01}(h^{-1}(d,1,t;y)),
\end{aligned} \tag{B.1}$$

for $d, t \in \{0,1\}$. We use these quantities in the following.

First, evaluating $F_{Y_1(1,1) \mid D=0,M=1}(y)$ at $h(1,1,1,u)$ gives

$$F_{Y_1(1,1) \mid D=0,M=1}(h(1,1,1,u)) = F_{U \mid 01}(h^{-1}(1,1,1;h(1,1,1,u))) = F_{U \mid 01}(u).$$

Applying $F^{-1}_{Y_1(1,1) \mid D=0,M=1}(q)$ to both sides, we have

$$h(1,1,1,u) = F^{-1}_{Y_1(1,1) \mid D=0,M=1}(F_{U \mid 01}(u)). \tag{B.2}$$

Second, for $F_{Y_0(1,1) \mid D=0,M=1}(y)$ we have

$$F^{-1}_{U \mid 01}(F_{Y_0(1,1) \mid D=0,M=1}(y)) = h^{-1}(1,1,0;y). \tag{B.3}$$

Combining (B.2) and (B.3) yields

$$h(1,1,1,h^{-1}(1,1,0;y)) = F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(y). \tag{B.4}$$

Note that $h(1,1,1,h^{-1}(1,1,0;y))$ gives the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under treatment with the mediator. Accordingly, $E[F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(Y_0) \mid D=0, M=1] = E[Y_1(1,1) \mid D=0, M=1]$. We can identify $F_{Y_0(1,1) \mid D=0,M=1}(y) = F_{Y_0 \mid D=0,M=1}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(1,1) \mid D=0,M=1}(y)$. However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(y)$ under the additional Assumption 5b.

Under Assumptions 1 and 5b, the conditional potential outcome distribution function equals

$$\begin{aligned}
F_{Y_t(d,1) \mid D=1,M=1}(y) &\overset{A1}{=} \Pr(h(d,1,t,U) \le y \mid D=1, M=1, T=t) \\
&= \Pr(U \le h^{-1}(d,1,t;y) \mid D=1, M=1, T=t) \\
&\overset{A5b}{=} \Pr(U \le h^{-1}(d,1,t;y) \mid D=1, M=1) \\
&= F_{U \mid 11}(h^{-1}(d,1,t;y)),
\end{aligned} \tag{B.5}$$

for $d, t \in \{0,1\}$. We repeat similar steps as above. First, evaluating $F_{Y_1(1,1) \mid D=1,M=1}(y)$ at $h(1,1,1,u)$ gives

$$F_{Y_1(1,1) \mid D=1,M=1}(h(1,1,1,u)) = F_{U \mid 11}(h^{-1}(1,1,1;h(1,1,1,u))) = F_{U \mid 11}(u).$$

Applying $F^{-1}_{Y_1(1,1) \mid D=1,M=1}(q)$ to both sides, we have

$$h(1,1,1,u) = F^{-1}_{Y_1(1,1) \mid D=1,M=1}(F_{U \mid 11}(u)). \tag{B.6}$$

Second, for $F_{Y_0(1,1) \mid D=1,M=1}(y)$ we have

$$F^{-1}_{U \mid 11}(F_{Y_0(1,1) \mid D=1,M=1}(y)) = h^{-1}(1,1,0;y). \tag{B.7}$$

Combining (B.6) and (B.7) yields

$$h(1,1,1,h^{-1}(1,1,0;y)) = F^{-1}_{Y_1(1,1) \mid D=1,M=1} \circ F_{Y_0(1,1) \mid D=1,M=1}(y). \tag{B.8}$$

The left sides of (B.4) and (B.8) are equal. In contrast to (B.4), (B.8) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(1,1) \mid D=1,M=1}(y) = \Pr(Y_t(1,1) \le y \mid D=1, M=1) = \Pr(Y_t \le y \mid D=1, M=1)$. Accordingly, we can identify $F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(y)$ by $Q_{11}(y) \equiv F^{-1}_{Y_1 \mid D=1,M=1} \circ F_{Y_0 \mid D=1,M=1}(y)$.

Passing $Y_0$ through $Q_{11}(\cdot)$ in the non-treated group with mediator gives

$$\begin{aligned}
E[Q_{11}(Y_0) \mid D=0, M=1] &= E[F^{-1}_{Y_1 \mid D=1,M=1} \circ F_{Y_0 \mid D=1,M=1}(Y_0) \mid D=0, M=1] \\
&= E[F^{-1}_{Y_1(1,1) \mid D=1,M=1} \circ F_{Y_0(1,1) \mid D=1,M=1}(Y_0(0,1)) \mid D=0, M=1] \\
&\overset{A1,A5b}{=} E[h(1,1,1,h^{-1}(1,1,0;Y_0(0,1))) \mid D=0, M=1] \\
&\overset{A2}{=} E[h(1,1,1,h^{-1}(1,1,0;Y_0(1,1))) \mid D=0, M=1] \\
&\overset{A1,A5a}{=} E[F^{-1}_{Y_1(1,1) \mid D=0,M=1} \circ F_{Y_0(1,1) \mid D=0,M=1}(Y_0(1,1)) \mid D=0, M=1] \\
&= E[Y_1(1,1) \mid D=0, M=1] = E[Y_1(1,1) \mid D=0, M(0)=1],
\end{aligned} \tag{B.9}$$

which has data support because of Assumption 6a.

B.2 Quantile direct effect under d = 0 conditional on D = 0 and M(0) = 1

In the following, we show that

$$\begin{aligned}
\theta_1^{0,1}(q,0) &= F^{-1}_{Y_1(1,1) \mid D=0,M(0)=1}(q) - F^{-1}_{Y_1(0,1) \mid D=0,M(0)=1}(q) \\
&= F^{-1}_{Q_{11}(Y_0) \mid D=0,M=1}(q) - F^{-1}_{Y_1 \mid D=0,M=1}(q).
\end{aligned}$$

For this purpose, we have to show that

$$F_{Y_1(1,1) \mid D=0,M(0)=1}(y) = F_{Q_{11}(Y_0) \mid D=0,M=1}(y) \quad \text{and} \tag{B.10}$$

$$F_{Y_1(0,1) \mid D=0,M(0)=1}(y) = F_{Y_1 \mid D=0,M=1}(y), \tag{B.11}$$

which is sufficient to show that the quantiles are also identified. We can show (B.11) using the observational rule: $F_{Y_1(0,1) \mid D=0,M(0)=1}(y) = F_{Y_1 \mid D=0,M=1}(y) = E[\mathbf{1}\{Y_1 \le y\} \mid D=0, M=1]$. Using (B.9), we obtain

$$\begin{aligned}
F_{Q_{11}(Y_0) \mid D=0,M=1}(y) &= E[\mathbf{1}\{Q_{11}(Y_0) \le y\} \mid D=0, M=1] \\
&= E[\mathbf{1}\{F^{-1}_{Y_1 \mid D=1,M=1} \circ F_{Y_0 \mid D=1,M=1}(Y_0) \le y\} \mid D=0, M=1] \\
&= E[\mathbf{1}\{Y_1(1,1) \le y\} \mid D=0, M=1] \\
&= F_{Y_1(1,1) \mid D=0,M(0)=1}(y),
\end{aligned} \tag{B.12}$$

which proves (B.10).

B.3 Average direct effect under d = 1 conditional on D = 1 and M(1) = 1

In the following, we show that $\theta_1^{1,1}(1) = E[Y_1(1,1) - Y_1(0,1) \mid D=1, M(1)=1] = E[Y_1 - Q_{01}(Y_0) \mid D=1, M=1]$. Using the observational rule, we obtain $E[Y_1(1,1) \mid D=1, M(1)=1] = E[Y_1 \mid D=1, M=1]$. Accordingly, we have to show that $E[Y_1(0,1) \mid D=1, M(1)=1] = E[Q_{01}(Y_0) \mid D=1, M=1]$ to finish the proof.

First, using (B.5) to evaluate $F_{Y_1(0,1) \mid D=1,M=1}(y)$ at $h(0,1,1,u)$ gives

$$F_{Y_1(0,1) \mid D=1,M=1}(h(0,1,1,u)) = F_{U \mid 11}(h^{-1}(0,1,1;h(0,1,1,u))) = F_{U \mid 11}(u).$$

Applying $F^{-1}_{Y_1(0,1) \mid D=1,M=1}(q)$ to both sides, we have

$$h(0,1,1,u) = F^{-1}_{Y_1(0,1) \mid D=1,M=1}(F_{U \mid 11}(u)). \tag{B.13}$$

Second, for $F_{Y_0(0,1) \mid D=1,M=1}(y)$ we obtain

$$F^{-1}_{U \mid 11}(F_{Y_0(0,1) \mid D=1,M=1}(y)) = h^{-1}(0,1,0;y), \tag{B.14}$$

using (B.5). Combining (B.13) and (B.14) yields

$$h(0,1,1,h^{-1}(0,1,0;y)) = F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(y). \tag{B.15}$$

Note that $h(0,1,1,h^{-1}(0,1,0;y))$ gives the period 1 (potential) outcome of an individual with the outcome $y$ in period 0 under non-treatment with the mediator. Accordingly, $E[F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(Y_0) \mid D=1, M=1] = E[Y_1(0,1) \mid D=1, M=1]$.
We can identify $F_{Y_0(0,1) \mid D=1,M=1}(y) = F_{Y_0 \mid D=1,M=1}(y)$ under Assumption 2, but we cannot identify $F_{Y_1(0,1) \mid D=1,M=1}(y)$. However, we show in the following that we can identify the overall quantile-quantile transform $F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(y)$ under the additional Assumption 5a.

First, using (B.1) to evaluate $F_{Y_1(0,1) \mid D=0,M=1}(y)$ at $h(0,1,1,u)$ gives

$$F_{Y_1(0,1) \mid D=0,M=1}(h(0,1,1,u)) = F_{U \mid 01}(h^{-1}(0,1,1;h(0,1,1,u))) = F_{U \mid 01}(u).$$

Applying $F^{-1}_{Y_1(0,1) \mid D=0,M=1}(q)$ to both sides, we have

$$h(0,1,1,u) = F^{-1}_{Y_1(0,1) \mid D=0,M=1}(F_{U \mid 01}(u)). \tag{B.16}$$

Second, for $F_{Y_0(0,1) \mid D=0,M=1}(y)$ we obtain

$$F^{-1}_{U \mid 01}(F_{Y_0(0,1) \mid D=0,M=1}(y)) = h^{-1}(0,1,0;y), \tag{B.17}$$

using (B.1). Combining (B.16) and (B.17) yields

$$h(0,1,1,h^{-1}(0,1,0;y)) = F^{-1}_{Y_1(0,1) \mid D=0,M=1} \circ F_{Y_0(0,1) \mid D=0,M=1}(y). \tag{B.18}$$

The left sides of (B.15) and (B.18) are equal. In contrast to (B.15), (B.18) contains only distributions that can be identified from observable data. In particular, $F_{Y_t(0,1) \mid D=0,M=1}(y) = \Pr(Y_t(0,1) \le y \mid D=0, M=1) = \Pr(Y_t \le y \mid D=0, M=1)$. Accordingly, we can identify $F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(y)$ by $Q_{01}(y) \equiv F^{-1}_{Y_1 \mid D=0,M=1} \circ F_{Y_0 \mid D=0,M=1}(y)$.

Passing $Y_0$ through $Q_{01}(\cdot)$ in the treated group with mediator gives

$$\begin{aligned}
E[Q_{01}(Y_0) \mid D=1, M=1] &= E[F^{-1}_{Y_1 \mid D=0,M=1} \circ F_{Y_0 \mid D=0,M=1}(Y_0) \mid D=1, M=1] \\
&= E[F^{-1}_{Y_1(0,1) \mid D=0,M=1} \circ F_{Y_0(0,1) \mid D=0,M=1}(Y_0(1,1)) \mid D=1, M=1] \\
&\overset{A1,A5a}{=} E[h(0,1,1,h^{-1}(0,1,0;Y_0(1,1))) \mid D=1, M=1] \\
&\overset{A2}{=} E[h(0,1,1,h^{-1}(0,1,0;Y_0(0,1))) \mid D=1, M=1] \\
&\overset{A1,A5b}{=} E[F^{-1}_{Y_1(0,1) \mid D=1,M=1} \circ F_{Y_0(0,1) \mid D=1,M=1}(Y_0(0,1)) \mid D=1, M=1] \\
&= E[Y_1(0,1) \mid D=1, M=1] = E[Y_1(0,1) \mid D=1, M(1)=1],
\end{aligned} \tag{B.19}$$

which has data support under Assumption 6b.

B.4 Quantile direct effect under d = 1 conditional on D = 1 and M(1) = 1

In the following, we show that

$$\begin{aligned}
\theta_1^{1,1}(q,1) &= F^{-1}_{Y_1(1,1) \mid D=1,M(1)=1}(q) - F^{-1}_{Y_1(0,1) \mid D=1,M(1)=1}(q) \\
&= F^{-1}_{Y_1 \mid D=1,M=1}(q) - F^{-1}_{Q_{01}(Y_0) \mid D=1,M=1}(q).
\end{aligned}$$

For this purpose, we have to show that

$$F_{Y_1(1,1) \mid D=1,M(1)=1}(y) = F_{Y_1 \mid D=1,M=1}(y) \quad \text{and} \tag{B.20}$$

$$F_{Y_1(0,1) \mid D=1,M(1)=1}(y) = F_{Q_{01}(Y_0) \mid D=1,M=1}(y), \tag{B.21}$$

which is sufficient to show that the quantiles are also identified. We can show (B.20) using the observational rule: $F_{Y_1(1,1) \mid D=1,M(1)=1}(y) = F_{Y_1 \mid D=1,M=1}(y) = E[\mathbf{1}\{Y_1 \le y\} \mid D=1, M=1]$. Using (B.19), we obtain

$$\begin{aligned}
F_{Q_{01}(Y_0) \mid D=1,M=1}(y) &= E[\mathbf{1}\{Q_{01}(Y_0) \le y\} \mid D=1, M=1] \\
&= E[\mathbf{1}\{F^{-1}_{Y_1 \mid D=0,M=1} \circ F_{Y_0 \mid D=0,M=1}(Y_0) \le y\} \mid D=1, M=1] \\
&= E[\mathbf{1}\{Y_1(0,1) \le y\} \mid D=1, M=1] \\
&= F_{Y_1(0,1) \mid D=1,M(1)=1}(y),
\end{aligned}$$

which proves (B.21).

C Proof of equations (1) and (2)

Average treatment effect $\Delta_1 = E[Y_1 \mid D=1] - E[Y_1 \mid D=0]$ and quantile treatment effect $\Delta_1(q) = F^{-1}_{Y_1 \mid D=1}(q) - F^{-1}_{Y_1 \mid D=0}(q)$

The average total effect for the entire population is identified by

$$\begin{aligned}
\Delta_1 &= E[Y_1(1,M(1))] - E[Y_1(0,M(0))] \\
&\overset{A7}{=} E[Y_1(1,M(1)) \mid D=1] - E[Y_1(0,M(0)) \mid D=0] \\
&= E[Y_1 \mid D=1] - E[Y_1 \mid D=0],
\end{aligned}$$

where the first equality is the definition of $\Delta_1$, the second equality holds by Assumption 7, and the last equality holds by the observational rule.

We define the conditional distribution $F_{Y_1 \mid D=d}(y) = \Pr(Y_1 \le y \mid D=d)$ and $F^{-1}_{Y_1 \mid D=d}(q) = \inf\{y : F_{Y_1 \mid D=d}(y) \ge q\}$. The total QTE for the entire population, $\Delta_1(q) = F^{-1}_{Y_1 \mid D=1}(q) - F^{-1}_{Y_1 \mid D=0}(q)$, is identified once we show that $F_{Y_1(1,M(1))}(y) = F_{Y_1 \mid D=1}(y)$ and $F_{Y_1(0,M(0))}(y) = F_{Y_1 \mid D=0}(y)$.
Using Assumption 7 and the observational rule gives

$$\begin{aligned}
F_{Y_1(1,M(1))}(y) &= \Pr(Y_1(1,M(1)) \le y) \overset{A7}{=} \Pr(Y_1(1,M(1)) \le y \mid D=1) \\
&= \Pr(Y_1 \le y \mid D=1) = F_{Y_1 \mid D=1}(y), \quad \text{and} \\
F_{Y_1(0,M(0))}(y) &= \Pr(Y_1(0,M(0)) \le y) \overset{A7}{=} \Pr(Y_1(0,M(0)) \le y \mid D=0) \\
&= \Pr(Y_1 \le y \mid D=0) = F_{Y_1 \mid D=0}(y),
\end{aligned}$$

which finishes the proof.

By Assumption 7, the share of a type $\tau$ conditional on $D$ corresponds to $p_\tau$ (in the population), as $D$ is randomly assigned. This implies that $p_{1|1} = p_a + p_c$, $p_{1|0} = p_a + p_{de}$, $p_{0|1} = p_n + p_{de}$, and $p_{0|0} = p_n + p_c$. Under Assumption 8, $p_{de} = 0$, which finishes the proof of (1).

Furthermore, $E[Y_t(d,m) \mid \tau, D=1] = E[Y_t(d,m) \mid \tau, D=0] = E[Y_t(d,m) \mid \tau]$ due to the independence of $D$ and the potential outcomes as well as the types $\tau$ (which are a deterministic function of $M(d)$) under Assumption 7. It follows that conditioning on $D$ is not required on the right hand side of the following equation, which expresses the mean outcome conditional on $D=0$ and $M=0$ as a weighted average of the mean potential outcomes of compliers and never-takers:

$$E[Y_t \mid D=0, M=0] = \frac{p_n}{p_n+p_c} E[Y_t(0,0) \mid \tau=n] + \frac{p_c}{p_n+p_c} E[Y_t(0,0) \mid \tau=c]. \tag{C.1}$$

Only compliers and never-takers satisfy $M(0)=0$ and thus make up the group with $D=0$ and $M=0$. After some rearrangements we obtain

$$E[Y_t(0,0) \mid \tau=n] - E[Y_t(0,0) \mid \tau=c] = \frac{p_n+p_c}{p_c} \left\{ E[Y_t(0,0) \mid \tau=n] - E[Y_t \mid D=0, M=0] \right\}. \tag{C.2}$$

Next, we consider observations with $D=1$ and $M=0$, which might consist of both never-takers and defiers, as $M(1)=0$ for both types. However, by Assumption 8, defiers are ruled out, such that the mean outcome given $D=1$ and $M=0$ is determined by never-takers only:

$$E[Y_t \mid D=1, M=0] \overset{A7,A8}{=} E[Y_t(1,0) \mid \tau=n]. \tag{C.3}$$

Furthermore, by Assumption 2, $E[Y_0(0,0) \mid \tau=n] \overset{A2}{=} E[Y_0(1,0) \mid \tau=n] \overset{A7,A8}{=} E[Y_0 \mid D=1, M=0]$.

Similarly to (C.1) for the never-takers and compliers, consider the mean outcome given $D=1$ and $M=1$, which is made up by always-takers and compliers (the types with $M(1)=1$):

$$E[Y_t \mid D=1, M=1] = \frac{p_a}{p_a+p_c} E[Y_t(1,1) \mid \tau=a] + \frac{p_c}{p_a+p_c} E[Y_t(1,1) \mid \tau=c]. \tag{C.4}$$

After some rearrangements we obtain

$$E[Y_t(1,1) \mid \tau=a] - E[Y_t(1,1) \mid \tau=c] = \frac{p_a+p_c}{p_c} \left\{ E[Y_t(1,1) \mid \tau=a] - E[Y_t \mid D=1, M=1] \right\}. \tag{C.5}$$

By Assumptions 7 and 8,

$$E[Y_t \mid D=0, M=1] = E[Y_t(0,1) \mid \tau=a]. \tag{C.6}$$

Now consider (C.5) for period $T=0$, and note that by Assumption 2, $E[Y_0(1,1) \mid \tau=a] = E[Y_0(0,1) \mid \tau=a] = E[Y_0(0,0) \mid \tau=a]$ and $E[Y_0(1,1) \mid \tau=c] = E[Y_0(0,0) \mid \tau=c]$. Combining (C.4), (C.6), and the law of iterated expectations (LIE) gives

$$\begin{aligned}
E[Y_0 \mid D=1] &\overset{LIE}{=} E[Y_0 \mid D=1, M=1] \cdot p_{1|1} + E[Y_0 \mid D=1, M=0] \cdot p_{0|1} \\
&= E[Y_0(1,1) \mid \tau=c] \cdot p_c + E[Y_0(1,1) \mid \tau=a] \cdot p_a + E[Y_0(1,0) \mid \tau=n] \cdot p_n \\
&\overset{A2}{=} E[Y_0(1,1) \mid \tau=c] \cdot p_c + E[Y_0(1,1) \mid \tau=a] \cdot p_a + E[Y_0(0,0) \mid \tau=n] \cdot p_n.
\end{aligned}$$

Likewise, combining (C.1) and (C.3) gives

$$\begin{aligned}
E[Y_0 \mid D=0] &\overset{LIE}{=} E[Y_0 \mid D=0, M=1] \cdot p_{1|0} + E[Y_0 \mid D=0, M=0] \cdot p_{0|0} \\
&= E[Y_0(0,1) \mid \tau=a] \cdot p_a + E[Y_0(0,0) \mid \tau=c] \cdot p_c + E[Y_0(0,0) \mid \tau=n] \cdot p_n \\
&\overset{A2}{=} E[Y_0(1,1) \mid \tau=a] \cdot p_a + E[Y_0(0,0) \mid \tau=c] \cdot p_c + E[Y_0(0,0) \mid \tau=n] \cdot p_n.
\end{aligned}$$

Accordingly,

$$\frac{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}{p_{1|1} - p_{1|0}} = E[Y_0(1,1) \mid \tau=c] - E[Y_0(0,0) \mid \tau=c] \overset{A2}{=} 0,$$

which proves (2). Accordingly, $E[Y_0 \mid D=1] - E[Y_0 \mid D=0] = 0$ is a testable implication of Assumptions 2, 7, and 8.

D Proof of Theorem 3

D.1 Average direct effect on the never-takers

In the following, we show that $\theta_1^n = E[Y_1(1,0) - Y_1(0,0) \mid \tau=n] = E[Y_1 - Q_{00}(Y_0) \mid D=1, M=0]$. From (C.3), we obtain the first ingredient $E[Y_1(1,0) \mid \tau=n] = E[Y_1 \mid D=1, M=0]$. Furthermore, from (A.9) we have $E[Q_{00}(Y_0) \mid D=1, M=0] = E[Y_1(0,0) \mid D=1, M(1)=0]$. Under Assumptions 7 and 8,

$$E[Y_1(0,0) \mid D=1, M(1)=0] \overset{A8}{=} E[Y_1(0,0) \mid D=1, \tau=n] \overset{A7}{=} E[Y_1(0,0) \mid \tau=n]. \tag{D.1}$$

D.2 Quantile direct effect on the never-takers

We prove that

$$\begin{aligned}
\theta_1^n(q) &= F^{-1}_{Y_1(1,0) \mid \tau=n}(q) - F^{-1}_{Y_1(0,0) \mid \tau=n}(q) \\
&= F^{-1}_{Y_1 \mid D=1,M=0}(q) - F^{-1}_{Q_{00}(Y_0) \mid D=1,M=0}(q).
\end{aligned}$$

This requires showing that

$$F_{Y_1(1,0) \mid \tau=n}(y) = F_{Y_1 \mid D=1,M=0}(y) \quad \text{and} \tag{D.2}$$

$$F_{Y_1(0,0) \mid \tau=n}(y) = F_{Q_{00}(Y_0) \mid D=1,M=0}(y). \tag{D.3}$$

Under Assumptions 7 and 8,

$$F_{Y_t \mid D=1,M=0}(y) = E[\mathbf{1}\{Y_t \le y\} \mid D=1, M=0] \overset{A7,A8}{=} E[\mathbf{1}\{Y_t(1,0) \le y\} \mid \tau=n] = F_{Y_t(1,0) \mid \tau=n}(y), \tag{D.4}$$

which proves (D.2). From (A.12), we have $F_{Q_{00}(Y_0) \mid D=1,M=0}(y) = F_{Y_1(0,0) \mid D=1,M(1)=0}(y) = E[\mathbf{1}\{Y_1(0,0) \le y\} \mid D=1, M(1)=0]$. Under Assumptions 7 and 8,

$$E[\mathbf{1}\{Y_1(0,0) \le y\} \mid D=1, M(1)=0] \overset{A7,A8}{=} E[\mathbf{1}\{Y_1(0,0) \le y\} \mid \tau=n] = F_{Y_1(0,0) \mid \tau=n}(y), \tag{D.5}$$

which proves (D.3).

D.3 Average direct effect under d = 0 on compliers

In the following, we show that

$$\begin{aligned}
\theta_1^c(0) &= E[Y_1(1,0) - Y_1(0,0) \mid \tau=c] \\
&= \frac{p_{0|0}}{p_{0|0}-p_{0|1}} E[Q_{10}(Y_0) - Y_1 \mid D=0, M=0] - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} E[Y_1 - Q_{00}(Y_0) \mid D=1, M=0].
\end{aligned}$$

Plugging (D.1) in (C.1) under $T=1$, we obtain

$$E[Y_1 \mid D=0, M=0] = \frac{p_n}{p_n+p_c} E[Q_{00}(Y_0) \mid D=1, M=0] + \frac{p_c}{p_n+p_c} E[Y_1(0,0) \mid \tau=c].$$

This allows identifying

$$E[Y_1(0,0) \mid \tau=c] = \frac{p_{0|0}}{p_{0|0}-p_{0|1}} E[Y_1 \mid D=0, M=0] - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} E[Q_{00}(Y_0) \mid D=1, M=0]. \tag{D.6}$$

Accordingly, we have to show the identification of $E[Y_1(1,0) \mid \tau=c]$ to finish the proof. From (A.19) we have $E[Y_1(1,0) \mid D=0, M=0] = E[Q_{10}(Y_0) \mid D=0, M=0]$. Applying the law of iterated expectations gives

$$\begin{aligned}
E[Y_1(1,0) \mid D=0, M=0] &= \frac{p_n}{p_n+p_c} E[Y_1(1,0) \mid D=0, M=0, \tau=n] + \frac{p_c}{p_n+p_c} E[Y_1(1,0) \mid D=0, M=0, \tau=c] \\
&\overset{A7}{=} \frac{p_n}{p_n+p_c} E[Y_1(1,0) \mid \tau=n] + \frac{p_c}{p_n+p_c} E[Y_1(1,0) \mid \tau=c].
\end{aligned}$$

After some rearrangements and using (C.3), we obtain

$$E[Y_1(1,0) \mid \tau=c] = \frac{p_n+p_c}{p_c} E[Q_{10}(Y_0) \mid D=0, M=0] - \frac{p_n}{p_c} E[Y_1 \mid D=1, M=0].$$

This gives

$$E[Y_1(1,0) \mid \tau=c] = \frac{p_{0|0}}{p_{0|0}-p_{0|1}} E[Q_{10}(Y_0) \mid D=0, M=0] - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} E[Y_1 \mid D=1, M=0], \tag{D.7}$$

using $p_n = p_{0|1}$ and $p_c + p_n = p_{0|0}$.
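The identification results in D.1 and D.3 can be turned into simple plug-in estimates. The following is a minimal sketch, not the authors' estimator: it replaces each distribution function by its empirical counterpart and computes the never-taker effect and the complier effect under d = 0. All function names (`ecdf`, `quantile`, `cic_transform`, `never_taker_direct_effect`, `complier_direct_effect_d0`, `simulate_toy_data`) are hypothetical, the subscript convention $Q_{dm}$ (transform built from the group with D = d, M = m) is an assumption made for readability, and the toy data-generating process is invented purely for illustration.

```python
import numpy as np


def ecdf(sample, y):
    """Empirical CDF of `sample`, evaluated at the points `y`."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, y, side="right") / s.size


def quantile(sample, q):
    """Empirical quantile inf{y : F(y) >= q}; q <= 0 maps to the sample minimum."""
    s = np.sort(np.asarray(sample, dtype=float))
    # round() guards against floating-point noise in q * n before taking ceil
    idx = np.ceil(np.round(np.asarray(q) * s.size, 8)).astype(int) - 1
    return s[np.clip(idx, 0, s.size - 1)]


def cic_transform(y0_ref, y1_ref, y):
    """Q(y) = F^{-1}_{Y1|ref} o F_{Y0|ref}(y), both CDFs from the reference group."""
    return quantile(y1_ref, ecdf(y0_ref, y))


def never_taker_direct_effect(y0, y1, d, m):
    """Plug-in version of E[Y1 - Q_00(Y0) | D=1, M=0]."""
    g00, g10 = (d == 0) & (m == 0), (d == 1) & (m == 0)
    return np.mean(y1[g10] - cic_transform(y0[g00], y1[g00], y0[g10]))


def complier_direct_effect_d0(y0, y1, d, m):
    """Plug-in version of the weighted difference defining theta^c(0)."""
    g00, g10 = (d == 0) & (m == 0), (d == 1) & (m == 0)
    p00 = np.mean(m[d == 0] == 0)  # Pr(M=0 | D=0) = p_n + p_c
    p01 = np.mean(m[d == 1] == 0)  # Pr(M=0 | D=1) = p_n
    # E[Q_10(Y0) - Y1 | D=0, M=0]
    term0 = np.mean(cic_transform(y0[g10], y1[g10], y0[g00]) - y1[g00])
    term1 = never_taker_direct_effect(y0, y1, d, m)
    return (p00 * term0 - p01 * term1) / (p00 - p01)


def simulate_toy_data(n=40_000, seed=0):
    """Illustrative DGP (an assumption): outcomes strictly increasing in U,
    no period-0 effects, randomized D, and no defiers."""
    rng = np.random.default_rng(seed)
    tau = rng.choice(np.array(["a", "n", "c"]), size=n, p=[0.2, 0.3, 0.5])
    d = rng.integers(0, 2, size=n)
    m = np.where(tau == "a", 1, np.where(tau == "c", d, 0))
    u = rng.normal(size=n) + 0.5 * (tau == "c")  # compliers have higher U
    y0 = u
    y1 = u + 1.0 + d * (2.0 + u) + 3.0 * m
    return y0, y1, d, m
```

In this toy design the individual direct effect without the mediator is 2 + U, so the population values are 2 for never-takers (mean U of 0) and 2.5 for compliers (mean U of 0.5); the plug-in estimates should land close to these values in large samples. Ties and points outside the reference group's support are handled crudely here (clipping to the extreme order statistics), which a careful implementation would treat more explicitly.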
D.4 Quantile direct effect under d = 0 on compliers

We show that

$$F_{Y_1(1,0) \mid \tau=c}(y) = \frac{p_{0|0}}{p_{0|0}-p_{0|1}} F_{Q_{10}(Y_0) \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} F_{Y_1 \mid D=1,M=0}(y) \quad \text{and}$$

$$F_{Y_1(0,0) \mid \tau=c}(y) = \frac{p_{0|0}}{p_{0|0}-p_{0|1}} F_{Y_1 \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} F_{Q_{00}(Y_0) \mid D=1,M=0}(y),$$

which proves that $\theta_1^c(q,0) = F^{-1}_{Y_1(1,0) \mid c}(q) - F^{-1}_{Y_1(0,0) \mid c}(q)$ is identified.

From (A.20), we have $F_{Y_1(1,0) \mid D=0,M(0)=0}(y) = F_{Q_{10}(Y_0) \mid D=0,M=0}(y)$. Applying the law of iterated expectations gives

$$\begin{aligned}
F_{Y_1(1,0) \mid D=0,M(0)=0}(y) &= \frac{p_n}{p_n+p_c} F_{Y_1(1,0) \mid D=0,M(0)=0,\tau=n}(y) + \frac{p_c}{p_n+p_c} F_{Y_1(1,0) \mid D=0,M(0)=0,\tau=c}(y) \\
&\overset{A7}{=} \frac{p_n}{p_n+p_c} F_{Y_1(1,0) \mid \tau=n}(y) + \frac{p_c}{p_n+p_c} F_{Y_1(1,0) \mid \tau=c}(y).
\end{aligned}$$

Using (D.2) and rearranging the equation gives

$$F_{Y_1(1,0) \mid \tau=c}(y) = \frac{p_{0|0}}{p_{0|0}-p_{0|1}} F_{Q_{10}(Y_0) \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} F_{Y_1 \mid D=1,M=0}(y). \tag{D.8}$$

In analogy to (C.1), the outcome distribution under $D=0$ and $M=0$ equals

$$F_{Y_1 \mid D=0,M=0}(y) = \frac{p_n}{p_n+p_c} F_{Y_1(0,0) \mid \tau=n}(y) + \frac{p_c}{p_n+p_c} F_{Y_1(0,0) \mid \tau=c}(y).$$

Using (D.3) and rearranging the equation gives

$$F_{Y_1(0,0) \mid \tau=c}(y) = \frac{p_{0|0}}{p_{0|0}-p_{0|1}} F_{Y_1 \mid D=0,M=0}(y) - \frac{p_{0|1}}{p_{0|0}-p_{0|1}} F_{Q_{00}(Y_0) \mid D=1,M=0}(y). \tag{D.9}$$

E Proof of Theorem 4

E.1 Average direct effect on the always-takers

In the following, we show that $\theta_1^a = E[Y_1(1,1) - Y_1(0,1) \mid \tau=a] = E[Q_{11}(Y_0) - Y_1 \mid D=0, M=1]$. From (C.6), we obtain the first ingredient $E[Y_1(0,1) \mid \tau=a] = E[Y_1 \mid D=0, M=1]$. Furthermore, from (B.9) we have $E[Q_{11}(Y_0) \mid D=0, M=1] = E[Y_1(1,1) \mid D=0, M(0)=1]$. Under Assumptions 7 and 8,

$$E[Y_1(1,1) \mid D=0, M(0)=1] \overset{A8}{=} E[Y_1(1,1) \mid D=0, \tau=a] \overset{A7}{=} E[Y_1(1,1) \mid \tau=a]. \tag{E.1}$$

E.2 Quantile direct effect on the always-takers

We prove that

$$\begin{aligned}
\theta_1^a(q) &= F^{-1}_{Y_1(1,1) \mid \tau=a}(q) - F^{-1}_{Y_1(0,1) \mid \tau=a}(q) \\
&= F^{-1}_{Q_{11}(Y_0) \mid D=0,M=1}(q) - F^{-1}_{Y_1 \mid D=0,M=1}(q).
\end{aligned}$$

This requires showing that

$$F_{Y_1(1,1) \mid \tau=a}(y) = F_{Q_{11}(Y_0) \mid D=0,M=1}(y) \quad \text{and} \tag{E.2}$$

$$F_{Y_1(0,1) \mid \tau=a}(y) = F_{Y_1 \mid D=0,M=1}(y). \tag{E.3}$$

Under Assumptions 7 and 8,

$$F_{Y_t \mid D=0,M=1}(y) = E[\mathbf{1}\{Y_t \le y\} \mid D=0, M=1] \overset{A7,A8}{=} E[\mathbf{1}\{Y_t(0,1) \le y\} \mid \tau=a] = F_{Y_t(0,1) \mid \tau=a}(y), \tag{E.4}$$

which proves (E.3). From (B.12), we have $F_{Q_{11}(Y_0) \mid D=0,M=1}(y) = F_{Y_1(1,1) \mid D=0,M(0)=1}(y) = E[\mathbf{1}\{Y_1(1,1) \le y\} \mid D=0, M(0)=1]$. Under Assumptions 7 and 8,

$$E[\mathbf{1}\{Y_1(1,1) \le y\} \mid D=0, M(0)=1] \overset{A7,A8}{=} E[\mathbf{1}\{Y_1(1,1) \le y\} \mid \tau=a] = F_{Y_1(1,1) \mid \tau=a}(y), \tag{E.5}$$

which proves (E.2).

E.3 Average direct effect under d = 1 on compliers

In the following, we show that

$$\begin{aligned}
\theta_1^c(1) &= E[Y_1(1,1) - Y_1(0,1) \mid \tau=c] \\
&= \frac{p_{1|1}}{p_{1|1}-p_{1|0}} E[Y_1 - Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} E[Q_{11}(Y_0) - Y_1 \mid D=0, M=1].
\end{aligned}$$

Plugging (E.1) in (C.4), we obtain

$$E[Y_1 \mid D=1, M=1] = \frac{p_a}{p_a+p_c} E[Q_{11}(Y_0) \mid D=0, M=1] + \frac{p_c}{p_a+p_c} E[Y_1(1,1) \mid \tau=c].$$

This allows identifying

$$E[Y_1(1,1) \mid \tau=c] = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} E[Y_1 \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} E[Q_{11}(Y_0) \mid D=0, M=1]. \tag{E.6}$$

From (B.19) we have $E[Y_1(0,1) \mid D=1, M=1] = E[Q_{01}(Y_0) \mid D=1, M=1]$. Applying the law of iterated expectations gives

$$\begin{aligned}
E[Y_1(0,1) \mid D=1, M=1] &= \frac{p_a}{p_a+p_c} E[Y_1(0,1) \mid D=1, M=1, \tau=a] + \frac{p_c}{p_a+p_c} E[Y_1(0,1) \mid D=1, M=1, \tau=c] \\
&\overset{A7}{=} \frac{p_a}{p_a+p_c} E[Y_1(0,1) \mid \tau=a] + \frac{p_c}{p_a+p_c} E[Y_1(0,1) \mid \tau=c].
\end{aligned}$$

After some rearrangements and using (C.6), we obtain

$$E[Y_1(0,1) \mid \tau=c] = \frac{p_a+p_c}{p_c} E[Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_a}{p_c} E[Y_1 \mid D=0, M=1].$$

This gives

$$E[Y_1(0,1) \mid \tau=c] = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} E[Q_{01}(Y_0) \mid D=1, M=1] - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} E[Y_1 \mid D=0, M=1], \tag{E.7}$$

with $p_a = p_{1|0}$ and $p_c + p_a = p_{1|1}$.
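The "rearrangements" leading to (E.7) amount to inverting a two-component mixture. The worked step below spells this out as a sketch; the subscripts on $Q$ and on the conditional mediator shares $p_{m|d} = \Pr(M=m \mid D=d)$ follow the definitions used in Appendices B and C and are reconstructed here as an assumption.

```latex
\begin{align*}
E[Q_{01}(Y_0)\mid D=1,M=1]
  &= \frac{p_a}{p_a+p_c}\,E[Y_1(0,1)\mid\tau=a]
   + \frac{p_c}{p_a+p_c}\,E[Y_1(0,1)\mid\tau=c] \\
\Longrightarrow\quad
E[Y_1(0,1)\mid\tau=c]
  &= \frac{p_a+p_c}{p_c}\,E[Q_{01}(Y_0)\mid D=1,M=1]
   - \frac{p_a}{p_c}\,E[Y_1(0,1)\mid\tau=a] \\
  &= \frac{p_{1|1}}{p_{1|1}-p_{1|0}}\,E[Q_{01}(Y_0)\mid D=1,M=1]
   - \frac{p_{1|0}}{p_{1|1}-p_{1|0}}\,E[Y_1\mid D=0,M=1],
\end{align*}
```

where the last line substitutes $E[Y_1(0,1) \mid \tau=a] = E[Y_1 \mid D=0, M=1]$ from (C.6), $p_a = p_{1|0}$, and $p_c = p_{1|1} - p_{1|0}$. As a purely illustrative numerical check, with $p_{1|1} = 0.7$ and $p_{1|0} = 0.2$ the two weights are $0.7/0.5 = 1.4$ and $0.2/0.5 = 0.4$; their difference is one, as it must be for the complier mean to be recovered from the mixture.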
E.4 Quantile direct effect under d = 1 on compliers

We show that

$$F_{Y_1(1,1) \mid \tau=c}(y) = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} F_{Y_1 \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} F_{Q_{11}(Y_0) \mid D=0,M=1}(y) \quad \text{and}$$

$$F_{Y_1(0,1) \mid \tau=c}(y) = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} F_{Q_{01}(Y_0) \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} F_{Y_1 \mid D=0,M=1}(y),$$

which proves that $\theta_1^c(q,1) = F^{-1}_{Y_1(1,1) \mid c}(q) - F^{-1}_{Y_1(0,1) \mid c}(q)$ is identified.

In analogy to (C.4), the outcome distribution under $D=1$ and $M=1$ equals

$$F_{Y_1 \mid D=1,M=1}(y) = \frac{p_a}{p_a+p_c} F_{Y_1(1,1) \mid \tau=a}(y) + \frac{p_c}{p_a+p_c} F_{Y_1(1,1) \mid \tau=c}(y).$$

Using (E.2) and rearranging the equation gives

$$F_{Y_1(1,1) \mid \tau=c}(y) = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} F_{Y_1 \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} F_{Q_{11}(Y_0) \mid D=0,M=1}(y). \tag{E.8}$$

From (B.21), we have $F_{Y_1(0,1) \mid D=1,M(1)=1}(y) = F_{Q_{01}(Y_0) \mid D=1,M=1}(y)$. Applying the law of iterated expectations gives

$$\begin{aligned}
F_{Y_1(0,1) \mid D=1,M(1)=1}(y) &= \frac{p_a}{p_a+p_c} F_{Y_1(0,1) \mid D=1,M(1)=1,\tau=a}(y) + \frac{p_c}{p_a+p_c} F_{Y_1(0,1) \mid D=1,M(1)=1,\tau=c}(y) \\
&\overset{A7}{=} \frac{p_a}{p_a+p_c} F_{Y_1(0,1) \mid \tau=a}(y) + \frac{p_c}{p_a+p_c} F_{Y_1(0,1) \mid \tau=c}(y).
\end{aligned}$$

Using (E.3) and rearranging the equation gives

$$F_{Y_1(0,1) \mid \tau=c}(y) = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} F_{Q_{01}(Y_0) \mid D=1,M=1}(y) - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} F_{Y_1 \mid D=0,M=1}(y). \tag{E.9}$$

F Proof of Theorem 5

F.1 Average treatment effect on the compliers

In (E.6) and (D.6), we show that
$$\Delta_1^{c} = E[Y_1(1,1) - Y_1(0,0)|\tau=c] = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} E[Y_1|D=1,M=1] - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} E[Q_{11}(Y_0)|D=0,M=1] - \frac{p_{0|0}}{p_{0|0}-p_{0|1}} E[Y_1|D=0,M=0] + \frac{p_{0|1}}{p_{0|0}-p_{0|1}} E[Q_{00}(Y_0)|D=1,M=0].$$

F.2 Quantile treatment effect on the compliers

In (E.8) and (D.9), we show that $F_{Y_1(1,1)|c}(y)$ and $F_{Y_1(0,0)|c}(y)$ are identified. Accordingly, $\Delta_1^{c}(q) = F^{-1}_{Y_1(1,1)|c}(q) - F^{-1}_{Y_1(0,0)|c}(q)$ is identified.

F.3 Average indirect effect under d = 0 on compliers

In (E.7) and (D.6), we show that
$$\delta_1^{c}(0) = E[Y_1(0,1) - Y_1(0,0)|\tau=c] = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} E[Q_{01}(Y_0)|D=1,M=1] - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} E[Y_1|D=0,M=1] - \frac{p_{0|0}}{p_{0|0}-p_{0|1}} E[Y_1|D=0,M=0] + \frac{p_{0|1}}{p_{0|0}-p_{0|1}} E[Q_{00}(Y_0)|D=1,M=0].$$

F.4 Quantile indirect effect under d = 0 on compliers

In (E.9) and (D.9), we show that $F_{Y_1(0,1)|c}(y)$ and $F_{Y_1(0,0)|c}(y)$ are identified. Accordingly, $\delta_1^{c}(q,0) = F^{-1}_{Y_1(0,1)|c}(q) - F^{-1}_{Y_1(0,0)|c}(q)$ is identified.

F.5 Average indirect effect under d = 1 on compliers

In (E.6) and (D.7), we show that
$$\delta_1^{c}(1) = E[Y_1(1,1) - Y_1(1,0)|\tau=c] = \frac{p_{1|1}}{p_{1|1}-p_{1|0}} E[Y_1|D=1,M=1] - \frac{p_{1|0}}{p_{1|1}-p_{1|0}} E[Q_{11}(Y_0)|D=0,M=1] - \frac{p_{0|0}}{p_{0|0}-p_{0|1}} E[Q_{10}(Y_0)|D=0,M=0] + \frac{p_{0|1}}{p_{0|0}-p_{0|1}} E[Y_1|D=1,M=0].$$

F.6 Quantile indirect effect under d = 1 on compliers

In (E.8) and (D.8), we show that $F_{Y_1(1,1)|c}(y)$ and $F_{Y_1(1,0)|c}(y)$ are identified. Accordingly, $\delta_1^{c}(q,1) = F^{-1}_{Y_1(1,1)|c}(q) - F^{-1}_{Y_1(1,0)|c}(q)$ is identified.
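The quantile effects on compliers all rest on inverting a complier distribution written as a weighted difference of two identified distributions, as in (E.8) and (E.9). The following Python sketch illustrates one way this inversion could be carried out with empirical CDFs. The function names `ecdf` and `complier_quantile` are hypothetical. The running maximum and the clipping to [0, 1] are added assumptions to obtain a proper distribution function in finite samples and are not part of the identification result.

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    s = np.sort(sample)
    return np.searchsorted(s, grid, side="right") / s.size

def complier_quantile(sample_mix, sample_always, p_mix, p_always, q):
    """q-quantile of F_c = w * F_mix - (w - 1) * F_always, w = p_mix / (p_mix - p_always).

    For (E.8): sample_mix = Y_1 in cell D=1, M=1; sample_always = Q_11(Y_0) in cell
    D=0, M=1; p_mix = p_{1|1}; p_always = p_{1|0}.
    """
    grid = np.sort(np.concatenate([sample_mix, sample_always]))
    w = p_mix / (p_mix - p_always)
    f_c = w * ecdf(sample_mix, grid) - (w - 1.0) * ecdf(sample_always, grid)
    # Assumption: enforce monotonicity and the unit range in finite samples
    f_c = np.clip(np.maximum.accumulate(f_c), 0.0, 1.0)
    # smallest grid point y with F_c(y) >= q
    idx = np.searchsorted(f_c, q, side="left")
    return grid[min(idx, grid.size - 1)]
```

A quantile effect such as $\theta_1^{c}(q,1)$ would then be estimated as the difference between two calls of `complier_quantile`, one for each of the two complier distributions involved.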