[PDF] Nonparametric causal mediation analysis for stochastic interventional (in)direct effects

Abstract

Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions jointly applied to the treatment and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether a treatment is categorical or continuous and remain well-defined even in the presence of intermediate confounders affected by treatment. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing for substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of nonparametric efficiency theory relevant for the construction of flexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictions induced by assuming parametric models of nuisance parameter functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and discuss open source software implementing the proposed methodology.

Full PDF

NN ONPARAMETRIC CAUSAL MEDIATION ANALYSIS FORSTOCHASTIC INTERVENTIONAL ( IN ) DIRECT EFFECTS

Nima S. Hejazi

Graduate Group in Biostatistics, andCenter for Computational Biology,University of California, Berkeley [email protected]

Kara E. Rudolph

Department of Epidemiology,Mailman School of Public Heatlh,Columbia University [email protected]

Mark J. van der Laan

Division of Biostatistics,School of Public Health, andDepartment of Statistics,University of California, Berkeley [email protected]

Iv´an D´ıaz

Division of Biostatistics,Department of Population Health Sciences,Weill Cornell Medicine [email protected]

September 23, 2020 A BSTRACT

Causal mediation analysis has historically been limited in two important ways: (i) a focus has tradi-tionally been placed on binary treatments and static interventions, and (ii) direct and indirect effectdecompositions have been pursued that are only identiﬁable in the absence of intermediate con-founders affected by treatment. We present a theoretical study of an (in)direct effect decompositionof the population intervention effect, deﬁned by stochastic interventions jointly applied to the treat-ment and mediators. In contrast to existing proposals, our causal effects can be evaluated regardlessof whether a treatment is categorical or continuous and remain well-deﬁned even in the presenceof intermediate confounders affected by treatment. Our (in)direct effects are identiﬁable without arestrictive assumption on cross-world counterfactual independencies, allowing for substantive con-clusions drawn from them to be validated in randomized controlled trials. Beyond the novel effectsintroduced, we provide a careful study of nonparametric efﬁciency theory relevant for the construc-tion of ﬂexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictionsinduced by assuming parametric models of nuisance parameter functionals. To complement ournonparametric estimation strategy, we introduce inferential techniques for constructing conﬁdenceintervals and hypothesis tests, and discuss open source software, the medshift R package, imple-menting the proposed methodology. Application of our (in)direct effects and their nonparametricestimators is illustrated using data from a comparative effectiveness trial examining the direct andindirect effects of pharmacological therapeutics on relapse to opioid use disorder.

In myriad applications, one is often interested in the effect of a treatment on an outcome only through a particularpathway between the two. Indeed, efforts in deﬁning and identifying such path-speciﬁc effects have come to constitutea rich history in not only philosophy but also in the sciences of statistics, causal inference, epidemiology, economics,and psychology. In each of these disciplines, as well as in many others among the biomedical and social sciences,developing a mechanistic understanding of the complexities that admit representations as path-speciﬁc effects remainsa central goal; examples include elucidating the biological mechanism by which a vaccine reduces infection risk (e.g.,Corey et al., 2015; Hejazi et al., 2020), assessing the effect on preterm birth of maternal exposure to environmental a r X i v : . [ s t a t . M E ] S e p EPTEMBER

23, 2020 toxins (e.g., Ferguson et al., 2017), and ascertaining the effect of novel pharmacological therapies on substance abusedisorder relapse.The latter serves as our motivating example as we consider how exposure to a buprenorphine dose schedule character-ized by successive increases toward a maximum dose early in treatment (versus static dose) affects the risk of relapseto opioid use disorder, both directly and indirectly through mediating factors such as depression and pain. Developinga detailed mechanistic understanding of the process by which such therapeutics modulate intermediary states is neces-sarily a causal question — one central to designing and successively improving upon available therapies in a mannertargeted towards the mitigation of the risk of substance abuse relapse. In comparative effectiveness trials of promisingopioid use disorder therapeutics, detailed dissections of the complex neurological and psychiatric pathways involvedin the development of addiction disorders is of clinical interest (Lee et al., 2018; Rudolph et al., 2020a). The abilityto deﬁne and evaluate causal effects along paths involving or avoiding mediating neuropsychiatric sequela would fa-cilitate drug efﬁcacy assessments; moreover, the ability to further examine and reﬁne scientiﬁc conclusions based onstatistical evidence through randomized controlled trials remains key to furthering clinical progress.To carefully study complex mediation relationships, a wealth of techniques rooted in statistical causal inference havebeen formulated. Path analysis (Wright, 1921, 1934), perhaps the earliest example of such methodology, directlyinspired the development of subsequent techniques that leveraged parametric structural equation models (e.g., Gold-berger, 1972; Baron & Kenny, 1986) for mediation analysis. More recently, the advent of modern frameworks andformalisms for causal inference, including nonparametric structural equation models, directed acyclic graphs, and theirunderlying do-calculus (Pearl, 1995, 2000), provided the necessary foundational tools to express causal mechanismswithout reliance on the more restrictive approach of parametric modeling.In tandem with the developments of Pearl (2000), similar approaches spearheaded by Robins (1986), Spirtes et al.(2000), Dawid (2000), and Richardson & Robins (2013) both allowed nonparametric formulations of mediation anal-ysis and uncovered signiﬁcant limitations of the earlier efforts focused on parametric structural equation models (Pearl,1998; Imai et al., 2010). Recent applications of modern causal models have illustrated the failings of popular para-metric modeling strategies (i.e., Baron & Kenny, 1986), in the presence of intermediate confounders of the mediator-outcome relationship (Cole & Hern´an, 2002). Consequently, the usually implausible assumptions that underlie suchrestrictive structural equation models make these approaches of limited applicability for the examination of complexphenomena in the biomedical, health, and social sciences.Modern approaches to causal inference have allowed for signiﬁcant advances over the methodology of traditional me-diation analysis, overcoming the signiﬁcant restrictions imposed by the use of parametric structural equation modeling.For example, both Robins & Greenland (1992), using the potential outcomes framework, and Pearl (2006), using non-parametric structural equation models, provided equivalent nonparametric decompositions of the average treatmenteffect (for binary treatments) into natural direct and indirect effects, which quantify all effects of the treatment on theoutcome through paths avoiding the mediator and all paths involving the mediator, respectively. Such advances werenot without their own limitations, however. A key assumption of the nonparametric decomposition of the average treat-ment effect is the requirement of cross-world counterfactual independencies (i.e., the condition that counterfactualsindexed by distinct intervention assignments be independent). Unfortunately, such an assumption limits the scientiﬁcrelevance of the natural (in)direct effects by making them unidentiﬁable in randomized trials, directly implying thatcorresponding scientiﬁc claims cannot be falsiﬁed through experimentation (Popper, 1934; Dawid, 2000; Robins &Richardson, 2010). Importantly, such cross-world independencies are also unsatisﬁed in the presence of intermediateconfounders affected by treatment (Avin et al., 2005; Tchetgen Tchetgen & VanderWeele, 2014). As such confoundersare often present in practice, the natural (in)direct effects are fundamentally of limited applicability.To circumvent the limitations imposed by cross-world counterfactual independencies, a rich literature has recentlydeveloped — with alternative deﬁnitions of the (in)direct effects (Petersen et al., 2006; van der Laan & Petersen, 2008;Vansteelandt & VanderWeele, 2012; VanderWeele et al., 2014). Such developments have led directly to the formulationof the new and promising family of interventional (in)direct effects (Didelez et al., 2006; VanderWeele et al., 2014;Lok, 2016; Vansteelandt & Daniel, 2017; Zheng & van der Laan, 2017; Rudolph et al., 2017; Lok, 2019; Nguyenet al., 2019), for which nonparametric effect decompositions and corresponding efﬁciency theory have only recentlybecome available (D´ıaz et al., 2020; Benkeser, 2020). While resolving the issues arising from requiring cross-worldcounterfactual independencies, these interventional (in)direct effects are limited by their lack of applicability beyondbinary treatments.A related thread of the literature has considered stochastic interventions. The framework of stochastic interventionsgeneralizes many intervention classes — for example, within this framework, static interventions result in post-intervention exposures that have degenerate distributions. Stock (1989) ﬁrst considered the estimation of the totaleffects of stochastic interventions, while many others (e.g., Robins et al., 2004; Didelez et al., 2006; Tian, 2008; Pearl,2009; Stitelman et al., 2010; Haneuse & Rotnitzky, 2013; D´ıaz & van der Laan, 2013; Dud´ık et al., 2014; Young et al., EPTEMBER

23, 2020 medshift (Hejazi & D´ıaz, 2020) package, for the R language and environment for statistical computing (R Core Team, 2020). Let A denote a continuous or categorical treatment variable, let Y denote a continuous or binary outcome, let Z denote a mediator, let W denote a vector of observed pre-treatment covariates, and let L denote an intermediate(mediator-outcome) confounder affected by treatment. We formalize the causal inference problem using the followingnonparametric structural equation model (NPSEM). Assume W = f W ( U W ); A = f A ( W, U A ); L = f L ( A, W, U L ); (1) Z = f Z ( L, A, W, U Z ); Y = f Y ( Z, L, A, W, U Y ) . In the NPSEM (1), U = ( U W , U A , U L , U Z , U Y ) is a vector of exogenous factors, and the functions f are assumeddeterministic but unknown. This mechanistic model is assumed to generate the observed data O ; it encodes severalfundamental assumptions. First, an implicit temporal ordering W → A → L → Z → Y is assumed. Second,each variable (i.e., { W, A, L, Z, Y } ) is assumed to be generated from the corresponding deterministic function of theobserved variables that precede it temporally, plus an exogenous variable denoted by U . Each exogenous variable isassumed to contain all unobserved causes of the corresponding observed variable. Unlike the mediation effects forstochastic interventions presented in D´ıaz & Hejazi (2020), we consider an intermediate confounder L that is affectedby treatment; our results generalize the methodology of D´ıaz & Hejazi (2020), extending their (in)direct effects tosettings with intermediate confounding. For a random variable X , let X a denote the counterfactual outcome observedin a hypothetical world in which P ( A = a ) = 1 . For example, we have L a = f L ( a, W, U L ) , Z a = f Z ( L a , a, W, U Z ) ,and Y a = f Y ( Z a , L a , a, W, U Y ) . Likewise, we let Y a,z = f Y ( z, L a , a, W, U Y ) denote the value of the outcome in ahypothetical world where P ( A = a, Z = z ) = 1 . Figure 1 represents model (1) in terms of a directed acyclic graph. EPTEMBER

23, 2020

LW ZA Y

Figure 1: Directed Acyclic Graph of NPSEM (1).Letting O = ( W, A, Z, L, Y ) represent a random variable with distribution P , we denote by O , . . . , O n a sample of n i.i.d. observations of O . We let P f = R f ( o )d P ( o ) for a given function f ( o ) . We use P c to denote the joint distributionof ( O, U ) , and let E and E c denote corresponding expectation operators. We use P n to denote the empirical distributionof O , . . . , O n , and assume P ∈ M , where M is the nonparametric statistical model deﬁned as all continuous densitieson O with respect to a dominating measure ν . Let p denote the corresponding probability density function. We use g ( a | w ) and e ( a | z, w ) to denote the probability density function or the probability mass function of A conditional on W = w and ( Z, W ) , respectively; m ( z, l, a, w ) to denote the outcome regression function E ( Y | Z = z, L = l, A = a, W = w ) . Let g ( · | w ) and e ( · | z, w ) be dominated by a measure κ ( a ) (e.g., the counting measure for binary A andthe Lebesgue measure for continuous A ). We will use the parameterizations p ( z | w ) p ( z | a, w ) = g ( a | w ) e ( a | z, w ) ; p ( z | a, w ) p ( z | l, a, w ) = p ( l | a, w ) p ( l | z, a, w ) (2)in constructing our estimators, as such parameterizations allow for estimation and integration with respect to multi-variate conditional densities on the mediator Z to be avoided. We use W , A , Z , L , and Y to denote the support of thecorresponding random variables.Causal effects are deﬁned in terms of hypothetical interventions on the NPSEM (1). In particular, consider an in-tervention in which the structural equation corresponding to A is removed, with the treatment drawn instead from auser-speciﬁed distribution g δ ( a | w ) , which may itself depend on the natural exposure distribution and a user-speciﬁedparameter δ . Going forward, we let A δ denote a draw from g δ ( a | w ) .Alternatively, such modiﬁcations can occasionally be described in terms of an intervention in which the structuralequation corresponding to A is removed and the treatment is set equal to a hypothetical regime d ( A, W ) . Regime d depends on the treatment level A that would be assigned in the absence of the regime as well as on W . Thelatter intervention has been referred to as depending on the natural value of treatment , or as a modiﬁed treatmentpolicy (Haneuse & Rotnitzky, 2013). For such interventions, Haneuse & Rotnitzky (2013) introduced the assumptionof piecewise smooth invertibility , which ensures that the change of variable formula can be used when computingintegrals over A : A1 (Piecewise smooth invertibility) . For each w ∈ W , assume that the interval I ( w ) = ( l ( w, ) , u ( w )) may bepartitioned into subintervals I δ,j ( w ) : j = 1 , . . . , J ( w ) such that d ( a, w ) is equal to some d j ( a, w ) in I δ,j ( w ) and d j ( · , w ) has inverse function h j ( · , w ) with derivative h j ( · , w ) .Assumption A1 can be used to show that the intervention may also be represented as a change in which the structuralequation f A is removed from the NPSEM, with A being drawn from the post-intervention distribution g δ ( a | w ) .Young et al. (2014) provide a discussion comparing and contrasting the interpretation and identiﬁcation of these twointerventions. Such stochastic interventions can be used to deﬁne the population intervention effect (PIE) of A on Y .To illustrate, consider continuous-valued A and assume the distribution of A conditional on W = w is supported inthe interval ( l ( w ) , u ( w )) . Then, one may deﬁne d ( a, w ) = ( a − δ if a > l ( w ) + δa if a ≤ l ( w ) + δ, (3)where < δ < u ( w ) is an arbitrary prespeciﬁed value. We can alternatively deﬁne a tilted intervention distribution as g δ ( a | w ) = exp( δa ) g ( a | w ) R exp( δa ) g ( a | w )d κ ( a ) , (4)for δ ∈ R . Kennedy (2019) proposed a form of exponential tilting (4) under the parameterization δ = exp( δ ) ,appropriate for incremental interventions on the propensity score for binary A . D´ıaz & Hejazi (2020) provide a careful EPTEMBER

23, 2020 study of the interventions 3 and 4 in the context of mediation, introducing novel (in)direct effects and correspondingefﬁciency theory; however, their contributions assume the absence of intermediate confounding.

D´ıaz & Hejazi (2020) deﬁned the (in)direct effect of A on Y in terms of a decomposition of the total effect of astochastic intervention. In particular, the total effect E ( Y − Y A δ ) may be decomposed as the sum of the populationintervention direct effect (PIDE) and the population intervention indirect effect (PIIE):PIDE = E c { f Y ( Z, L, A, W, U Y ) − f Y ( Z, L A δ , A δ , W, U Y ) } PIIE = E c { f Y ( Z, L A δ , A δ , W, U Y ) − f Y ( Z A δ , L A δ , A δ , W, U Y ) } . Upon inspection, the deﬁnitions above reveal that the direct effect measures the effect through paths not involving themediator (i.e., A → Y and A → L → Y ), whereas the indirect effect measures the effect through paths involving themediator (i.e., A → Z → Y and A → L → Z → Y ).Unfortunately, the population intervention direct and indirect effects are not generally identiﬁed in the presence of anintermediate confounder affected by treatment such as in the DAG in Figure (1) (D´ıaz & Hejazi, 2020). This is dueto the dual role of L as a confounder of the relation between Z and Y , which requires adjustment, and a variable onthe path from A to Y , which precludes adjustment. Next, we present a solution to this complication using a jointstochastic intervention on the treatment A and mediator Z . We also show that the effects deﬁned in this paper are ageneralization of the effects of D´ıaz & Hejazi (2020) in the sense that the former reduce to the latter in the absence ofintermediate confounding. To introduce (in)direct effects robust to the presence of intermediate confounders, we draw upon ideas ﬁrst outlinedby Didelez et al. (2006), Petersen et al. (2006), and van der Laan & Petersen (2008), all subsequently formalizedby VanderWeele et al. (2014) and Vansteelandt & Daniel (2017). Owing to their deﬁnition in terms of stochasticinterventions on the mediator, these (in)direct effects have been collectively termed interventional effects . We leveragetwo types of stochastic interventions: one on the treatment A , which deﬁnes the intervention of interest, and one onthe mediator Z , which is used to achieve identiﬁability of the effects. Following the convention of the literature, weterm stochastic interventions on the mediator Z interventional , while reserving the label of stochastic to refer only tointerventions on the treatment A . To proceed, let G δ denote a random draw from the distribution of Z A δ conditionalon ( A δ , W ) , and let G denote a random draw from the distribution of Z conditional on ( A, W ) . We consider theeffect deﬁned by ψ δ = E c { Y A,G − Y A δ ,G δ } . Note that the effect ψ δ is distinct from the effect considered by D´ıaz& Hejazi (2020), which may be expressed E c { Y A,Z − Y A δ ,Z δ } . The effect ψ δ arises from ﬁxing the mediator to arandom value chosen from its distribution among all those with a particular treatment level, rather than ﬁxing it to whatit would have been under a particular (ﬁxed) treatment. Deﬁning the effect in this way aids in achieving an identiﬁabledecomposition into direct and indirect effects. In particular, we may decompose this effect in terms of interventionalstochastic direct effects (DE) and indirect effects (IE) : ψ ( δ ) = DE z }| { E { Y A,G − Y A δ ,G } + IE z }| { E { Y A δ ,G − Y A δ ,G δ } . (5)Decomposition as the sum of direct and indirect effects affords an interpretation analogous to the correspondingstandard decomposition of the average treatment effect into the natural direct and indirect effects (Pearl, 2006). Inparticular, the direct effect arises from drawing a counterfactual value of A from a post-intervention distributionwhile keeping the distribution of Z ﬁxed. The indirect effect arises from replacing the distribution of Z with acandidate post-intervention distribution while holding A ﬁxed. Our proposed stochastic interventional effects have aninterpretation similar to the interventional effects of VanderWeele et al. (2014); moreover, while both effect deﬁnitionsaccount for the presence of an intermediate confounder, our direct and indirect effects accommodate ﬂexible, stochasticinterventions on the exposure while those of VanderWeele et al. (2014) are limited to static interventions. In order to construct estimators of our proposed causal direct and indirect effects, we turn to examining assumptionsneeded to estimate components of the post-intervention quantities matching the counterfactual variables of interest.Towards this end, we introduce the following identiﬁcation assumptions: A2 (Common support) . Assume supp { g δ ( · | w ) } ⊆ supp { g ( · | w ) } for all w ∈ W . EPTEMBER

23, 2020 A3 (No unmeasured exposure-outcome confounder) . Assume Y a,z ⊥⊥ A | W . A4 (No unmeasured mediator-outcome confounder) . Assume Y a,z ⊥⊥ Z | ( L, A, W ) . A5 (No unmeasured exposure-mediator confounder) . Assume Z a ⊥⊥ A | W .Under these assumptions, we have the following identiﬁcation results. A proof is available in the SupplementaryMaterials. Theorem 1 (Identiﬁcation) . Deﬁne θ ,δ = Z m ( z, l, a, w ) p ( l | a, w ) p ( z | a, w ) g δ ( a | w ) p ( w )d ν ( a, z, l, w ) ,θ ,δ = Z m ( z, l, a, w ) p ( l | a, w ) p ( z | w ) g δ ( a | w ) p ( w )d ν ( a, z, l, w ) . Under A2–A5, the direct effect ψ D ,δ and indirect effect ψ I ,δ (5) are identiﬁed and given, respectively, by ψ D ,δ = θ , − θ ,δ ψ I ,δ = θ ,δ − θ ,δ . (6)Assumption A3 states that, conditional on W , there is no unmeasured confounding of the relation between A and Y ;assumption A5 states that conditional on W there is no unmeasured confounding of the relation between A and Z ; andassumption A4 states that conditional on ( W, A, L ) there is no unmeasured confounding of the relation between Z and Y . These assumptions are standard in causal mediation analysis. In addition to these assumptions, standard mediationanalyses (e.g., VanderWeele et al., 2014) require positivity assumptions on the treatment and mediation mechanisms.The stochastic intervention framework we adopt does not require such assumptions, as positivity can be arranged bydeﬁnition of g δ . For example, the interventions in expressions (3) and (4) satisfy assumption A2 by deﬁnition. Theinterested reader is encouraged to consult Kennedy (2019) and D´ıaz & Hejazi (2020) for a discussion on this topic.Another consequence of this identiﬁcation result is that the deﬁnitions (6) reduce to the stochastic (in)direct effectsof D´ıaz & Hejazi (2020) in the absence of intermediate confounders L . Importantly, this implies that our estimatorscan be safely used in the absence of intermediate confounders; furthermore, it implies that the corresponding estimatesmay be interpreted in terms of a decomposition of the population intervention effect E c { Y − Y A δ } , which is arguablyof more scientiﬁc interest than the interventional effect ψ δ = E c { Y A,G − Y A δ ,G δ } .As is clear from the deﬁnition (6), evaluation of ψ D ,δ and ψ I ,δ requires access to θ , , the population mean in theabsence of any intervention on the treatment mechanism, as well as both of θ ,δ and θ ,δ , which are based on thepost-intervention treatment mechanism g δ ( a | w ) . Consequently, we next turn our attention to developing efﬁciencytheory for estimation of the statistical parameter θ j,δ : j = 1 , , which depends on the observed data distribution P . Thus far, we have discussed the decomposition of the effect of a stochastic intervention into direct and indirect effectsand have provided identiﬁcation results under under standard identiﬁability assumptions. We consider the developmentof efﬁciency theory for the estimation of θ ,δ and θ ,δ in the nonparametric model M . To do so, we introduce the efﬁcient inﬂuence function (EIF), which characterizes the asymptotic behavior of all regular and asymptotically linearestimators (Bickel et al., 1993; van der Vaart, 2002). Three common approaches exist for constructing local efﬁcientestimators from the EIF: (i) using the EIF as an estimating equation (e.g., van der Laan & Robins, 2003), (ii) using theEIF in a one-step bias correction (e.g., Pfanzagl & Wefelmeyer, 1985; Bickel et al., 1993), and targeted minimum lossestimation (van der Laan & Rubin, 2006; van der Laan & Rose, 2011, 2018).As a consequence of its representation in terms of orthogonal score equations, the EIF estimating equation makes itpossible to construct consistent estimators of the target parameter even when certain components of its distributionare inconsistently estimated. Thirdly, second-order bias terms may be derived from asymptotic analysis of estimatorsconstructed based on the EIF — often, these estimators require slow convergence rates (e.g., n − / ) for the nuisanceparameters involved. This latter property enables the use of ﬂexible, data adaptive regression techniques in estimatingthese quantities.For simplicity, we focus on the case of a binary intermediate confounder L , though our general approach requiresonly that either L or Z be low-dimensional. In Theorem 2, we present the EIF for a general stochastic intervention.Although the components of the EIF associated with ( Y, Z, L, W ) are the same, the component associated with themodel for the distribution of A must be computed on a case-by-case basis, that is, for each intervention of interest. EPTEMBER

23, 2020

Lemmas 1 and 2 present such components for modiﬁed treatment policies satisfying assumption A1 and for exponen-tial tilting, respectively. In theorem 2 below, we present a representation of the EIF that avoids the computation ofmultivariate integrals over Z . To introduce the EIF, we deﬁne the following auxiliary nuisance parameters: u ( z, a, w ) = Z m ( z, l, a, w )d P ( l | a, w ); v ( l, a, w ) = Z m ( z, l, a, w )d P ( z | a, w ); s ( l, a, w ) = Z m ( z, l, a, w )d P ( z | w ); ¯ u ( a, w ) = Z u ( z, a, w )d P ( z | a, w )¯ v ( a, w ) = Z v ( l, a, w )d P ( l | a, w )¯ s ( a, w ) = Z s ( l, a, w )d P ( l | a, w ) (7)Proofs for the following results are detailed in the Supplementary Materials. Theorem 2 (Efﬁcient inﬂuence functions) . H P ,δ ( a, z, l, w ) = g δ ( a | w ) g ( a | w ) p ( z | a, w ) p ( z | a, l, w ) ; H P ,δ ( a, z, l, w ) = g δ ( a | w ) g ( a | w ) p ( z | w ) p ( z | a, l, w ) (8) The efﬁcient inﬂuence functions for θ j,δ : j = 1 , in the nonparametric model are equal to D j P ,δ ( o ) − θ j,δ , where D j P ,δ ( o ) = S j P ,δ ( o ) + S j,A P ,δ ( o ) and S P ,δ ( o ) = H P ,δ ( a, z, l, w ) { y − m ( z, l, a, w ) } (9) + g δ ( a | w ) g ( a | w ) (cid:2) v ( l, a, w ) − ¯ v ( a, w ) + u ( z, a, w ) − ¯ u ( a, w ) (cid:3) (10) + Z ¯ u ( a, w ) g δ ( a | w )d κ ( a ) S P ,δ ( o ) = H P ,δ ( a, z, l, w ) { y − m ( z, l, a, w ) } (11) + g δ ( a | w ) g ( a | w ) { s ( l, a, w ) − ¯ s ( a, w ) } (12) + Z u ( z, a, w ) g δ ( a | w )d κ ( a ) , and S ,A P ,δ ( o ) , S ,A P ,δ ( o ) are the respective efﬁcient score functions corresponding to the model for g ( a | w ) . An immediate consequence of Theorem 2 is that, in a randomized trial, S j,A P ,δ ( o ) = 0 for j = 1 , ; however, evenin such trials, covariate adjustment can improve the efﬁciency of the resultant estimator (van der Laan & Robins,2003). We now present the efﬁcient scores S j,A P ,δ ( o ) for modiﬁed treatment policies and exponentially tilted stochasticinterventions. To do so, we deﬁne the parameter q ( a, w ) = R u ( z, a, w )d P ( z | w ) . Lemma 1 (Modiﬁed treatment policies) . If the modiﬁed treatment policy d ( A, W ) satisﬁes assumption A1, then S ,A P ,δ ( o ) = ¯ u ( d ( a, w ) , w ) − Z ¯ u ( d ( a, w ) , w ) g ( a | w )d κ ( a ) S ,A P ,δ ( o ) = q ( d ( a, w ) , w ) − Z q ( d ( a, w ) , w ) g ( a | w )d κ ( a ) . Lemma 2 (Exponential tilt) . If the stochastic intervention is the exponential tilt (4), then S ,A P ,δ ( o ) = g δ ( a | w ) g ( a | w ) (cid:26) ¯ u ( a, w ) − Z ¯ u ( a, w ) g δ ( a | w )d κ ( a ) (cid:27) (13) S ,A P ,δ ( o ) = g δ ( a | w ) g ( a | w ) (cid:26) q ( a, w ) − Z q ( a, w ) g δ ( a | w )d κ ( a ) (cid:27) (14)For binary treatments, the EIF corresponding to the incremental propensity score intervention may be simpliﬁed asper the following corollary. Corollary 1 (Efﬁcient inﬂuence function for incremental propensity score interventions) . Let A take values on { , } ,and let the exponentially tilted intervention g δ, (1 | W ) be based on (4) under the parameterization δ = exp( δ ) .Then, the EIF of Lemma 2 may be simpliﬁed as follows. Deﬁne the nuisance parameters q ( w ) = ¯ u (1 , w ) − ¯ u (0 , w ) , q ( w ) = E (cid:8) u ( Z, , W ) − u ( Z, , W ) | W = w (cid:9) , (15) EPTEMBER

23, 2020

Then S j,Aη,δ ( o ) = δ q j ( w ) { a − g (1 | w ) }{ δ g (1 | w ) + 1 − g (1 | w ) } . In contrast to the efﬁcient inﬂuence function for the interventional (in)direct effects (D´ıaz et al., 2020), the contributionof the treatment process to the EIF for the stochastic interventional effects deﬁned here is non-zero. This is a directconsequence of the fact that the parameter of interest depends on g ; moreover, this implies that the efﬁciency boundin observational studies differs from the efﬁciency bound in randomized studies. This further implies that it is notgenerally possible to obtain estimating equations that are robust to inconsistent estimation of g . Such robustness willonly be possible if the stochastic intervention is also a modiﬁed treatment policy satisfying assumption A1.The form of Theorem 2 makes it clear that estimation of multivariate or continuous conditional density functionson the mediators Z or intermediate confounders L , as well as integrals with respect to these density functions, isgenerally necessary for computation of the EIF. This poses a signiﬁcant challenge from the perspective of estimation,due to both the curse of dimensionality and the practical computational complexity inherent in solving multivariatenumerical integrals. A simpliﬁcation is possible when either either of Z or L is low-dimensional; this is achieved byreparameterizing the densities as conditional expectations (or low-dimensional conditional densities) that take othernuisance parameters as pseudo-outcomes is possible. To demonstrate, we assume L is univariate (e.g., binary as in ourillustrative application), though similar parameterizations may be achieved if Z is low-dimensional. In cases where L or Z is low-dimensional, our proposed re-parameterizations allow for the conditional density to be estimated viaappropriate semiparametric estimators (e.g., D´ıaz & van der Laan, 2011). Lemma 3 (Low-dimensional L and Z ) . If L is low-dimensional (e.g., binary and univariate) and Z is multivariate, wecan choose a representation of v , s , and ¯ u in terms of conditional expectations in order to facilitate their estimation.Denote b ( l | a, w ) and d ( l | z, a, w ) the probability that L = l ∈ { , } conditional on ( A, W ) and ( Z, A, W ) ,respectively. Then, using (2), we have v ( l, a, w ) = E " m ( z, l, a, w ) b ( L | A, W ) d ( L | Z, A, W ) (cid:12)(cid:12)(cid:12)(cid:12) L = l, A = a, W = w , s ( l, a, w ) = E " m ( z, l, a, w ) b ( L | A, W ) d ( L | Z, A, W ) g ( A | W ) e ( A | Z, W ) (cid:12)(cid:12)(cid:12)(cid:12) L = l, A = a, W = w , (16) ¯ u ( a, w ) = E " u ( Z, A, W ) (cid:12)(cid:12)(cid:12)(cid:12) A = a, W = w . Likewise, H P ,δ ( a, z, l, w ) = g δ ( a | w ) g ( a | w ) b ( l | a, w ) d ( l | z, a, w ) ; H P ,δ ( a, z, l, w ) = g δ ( a | w ) e ( a | z, w ) b ( l | a, w ) d ( l | z, a, w ) , and q ( a, w ) = E (cid:26) g ( A | W ) e ( A | Z, W ) u ( Z, A, W ) | A = a, W = w (cid:27) . Analogous representations may be constructed for ¯ v , ¯ s , and u based on the parameterizations (2) if L is multivariateand Z is of low dimension. We note, however, that at least one of Z or L must be of small dimensionality so that itsdensity may be estimated and integrals over its range may be computed with ease. In what follows, we assume L is univariate, denote η = ( m , g , b , ¯ u , v , d , e , s , q ) and let D j P ,δ ( o ) = D jη,δ ( o ) . The choiceof parameterization in Lemma 3 has important consequences for the purpose of estimation, as it helps to bypassestimation of the (possibly high-dimensional) conditional density of the mediators, instead allowing for regressionmethods, far more readily available throughout the statistics literature and software, to be used for estimation of therelevant quantities. Similar ideas have been used by Zheng & van der Laan (2017), D´ıaz & Hejazi (2020), and D´ıazet al. (2020). In addition to the expression for the efﬁcient inﬂuence function in Lemma 3, it is important to understandthe behavior of the difference P D η − θ , which is expected to yield a second order term in differences η − η , so thatconsistent estimation of θ is possible under consistent estimation of certain conﬁgurations of the parameters in η . Aswe will see in Theorems 3 and 4, this second-order term is fundamental in the construction of asymptotically linearestimators. Lemmas 5 and 6, found in the Supplementary Materials, delinate these second-order terms. The followinglemma is a direct consequence. EPTEMBER

23, 2020

Lemma 4 (Multiple robustness for modiﬁed treatment policies) . Let the modiﬁed treatment policy satisfy A1, and let η be such that one of the following conditions hold: m = m g = g b = b ¯ u = ¯ u v = v d = d e = e s = s q = q Cond. 1 × × ×

Cond. 2 × × × ×

Cond. 3 × × × ×

Cond. 4 × × × × ×

Cond. 5 × × × ×

Cond. 6 × × × × ×

Table 1: Different conﬁgurations of consistency for nuisance parameters

Then P D η ,δ = θ ,δ and P D η ,δ = θ ,δ , with D η,δ and D η,δ as deﬁned in Theorem 2 and Lemma 1. The above lemma implies that it is possible to construct consistent estimators for for the (in)direct effects under con-sistent estimation of subsets of the nuisance parameters in η , in the conﬁgurations described in the lemma. Lemma 4follows directly from Lemma 5, found in the Supplementary Materials. Some readers may ﬁnd it surprising thatestimation of θ j,δ may be robust to inconsistent estimation of g , even when the parameter deﬁnitions are explicitlydependent on g . We offer some intuition into this result by noting that assumption A1 allows use of the change ofvariable formula to obtain θ ,δ = E (cid:26)Z m ( z, l, d ( A, W ) , W ) p ( l | d ( A, W ) , W ) p ( z | W )d ν ( z, l ) (cid:27) . Estimation of this parameter without relying on g may be carried out by consistently estimating m ( z, l, a, w ) , p ( l | a, w ) , and p ( z | w ) and using the empirical distribution as an estimator of the outer expectation. This behavior hasbeen previously observed for related stochastic intervention effects under assumption A1 (D´ıaz & van der Laan, 2012;Haneuse & Rotnitzky, 2013; D´ıaz & Hejazi, 2020).The robustness result for the case an exponentially tilted intervention (1), which does not satisfy assumption A1, ispresented in the following lemma Lemma 5 (Multiple robustness for exponential tilting) . Let g δ be deﬁned as in (4). Let η be such that at least one ofCond. 1-4 in Table 1 holds. Then P D η ,δ = θ ,δ and P D η ,δ = θ ,δ , with D η,δ and D η,δ as deﬁned in Theorem 2 andLemma 2 Lemma 5 is a direct consequence of Lemma 6 in the Supplementary Materials. The corresponding proof reveals thatthe EIF for the binary distribution is not robust to inconsistent estimation of g — that is, the intervention fails to satisfyassumption A1 and integrals over the range of A cannot be computed using the change of variable formula. Thisbehavior has been previously observed for other interventions that do not satisfy assumption A1 (e.g., D´ıaz & vander Laan, 2013). Even though this lemma implies that consistent estimation of g is required, the bias terms remainsecond-order; thus, an estimator of g converging at rate n / or faster is sufﬁcient. We discuss two efﬁcient estimators that rely on the efﬁcient inﬂuence function D η,δ , in order to build an estimator thatis both efﬁcient and robust to model misspeciﬁcation. We discuss an asymptotic linearity result for the doubly robustestimator that allows computation of asymptotically correct conﬁdence intervals and hypothesis tests. In the sequel,we assume that preliminary estimators of the components of η are available. These estimators may be obtained fromﬂexible regression techniques such as support vector machines, regression trees, boosting, neural networks, splines, orensembles thereof (Wolpert, 1992; Breiman, 1996; van der Laan et al., 2007). As previously discussed, the consistencyof these estimators will determine the consistency of our estimators of θ j,δ .Both of our proposed efﬁcient estimators make use of the EIF D η,δ to revise an initial substitution estimator through abias correction step. As such, estimation proceeds by ﬁrst constructing initial estimators of the nuisance parameters in η ; then, each of the efﬁcient estimators is constructed by application of distinct bias-correction steps. In constructingthe these efﬁcient estimators, we advocate for the use of cross-ﬁtting (Klaassen, 1987; Zheng & van der Laan, 2011;Chernozhukov et al., 2018) in order to avoid imposing entropy conditions on the initial estimators of the nuisanceparameters in η . Let V , . . . , V J denote a random partition of the index set { , . . . , n } into J prediction sets of EPTEMBER

23, 2020 approximately the same size. That is, V j ⊂ { , . . . , n } ; S Jj =1 V j = { , . . . , n } ; and V j ∩ V j = ∅ . In addition, foreach j , the associated training sample is given by T j = { , . . . , n } \ V j . Denote by ˆ η j the estimator of η , obtained bytraining the corresponding prediction algorithm using only data in the sample T j . Further, let j ( i ) denote the index ofthe validation set which contains observation i . To construct a robust and efﬁcient estimator using the efﬁcient inﬂuence function D η,δ , the one-step bias correc-tion (Pfanzagl & Wefelmeyer, 1985; Bickel et al., 1993) adds the empirical mean of the estimated EIF D ˆ η,δ to aninitial substitution estimator. The estimators are thus deﬁned ˆ ψ osD ,δ = 1 n n X i =1 { D η j ( i ) , ( O i ) − D η j ( i ) ,δ ( O i ) } ˆ ψ osI ,δ = 1 n n X i =1 { D η j ( i ) ,δ ( O i ) − D η j ( i ) ,δ ( O i ) } (17)Asymptotic linearity and efﬁciency of the estimators for modiﬁed treatment policies is detailed below. Theorem 3 (Weak convergence of one-step estimators) . Let k·k denote the L ( P ) -norm deﬁned as k f k = R f d P .Deﬁne the following assumptions.(i) P {| D jη,δ ( O ) | ≤ C } = P {| D j ˆ η,δ ( O ) | ≤ C } = 1 for j = 1 , and for some C < ∞ .(ii) The following second-order terms converge at the speciﬁed rate • k ˆ m − m k {k ˆ g − g k + k ˆ e − e k + (cid:13)(cid:13)(cid:13) ˆ d − d (cid:13)(cid:13)(cid:13) } = o P ( n − / ) • k ˆ g − g k { (cid:13)(cid:13)(cid:13) ˆ¯ u − ¯ u (cid:13)(cid:13)(cid:13) + k ˆ q − q k} = o P ( n − / ) • (cid:13)(cid:13)(cid:13) ˆ b − b (cid:13)(cid:13)(cid:13) {k ˆ v − v k + k ˆ s − s k} = o P ( n − / ) , and(iii) The effect is deﬁned in terms of modiﬁed treatment policy d ( a, w ) , which is piecewise smooth invertible (A1).(iv) The intervention g δ is an exponential tilting intervention and P (cid:8)R (ˆ g − g )d κ (cid:9) = o P ( n − / ) .If assumptions (i) and (ii) hold, and one of assumptions (iii) and (iv) holds, then: √ n { ˆ ψ osD ,δ − ψ D ,δ } N (0 , σ D ,δ ) , and √ n { ˆ ψ osI ,δ − ψ I ,δ } N (0 , σ I ,δ ) , where σ D ,δ = Var { D η, ( O ) − D η,δ ( O ) } and σ I ,δ = Var { D η,δ ( O ) − D η,δ ( O ) } are the respective efﬁciency bounds. Theorem 3 establishes the weak convergence of ˆ ψ osD ,δ and ˆ ψ osI ,δ pointwise in δ . This convergence is useful to deriveconﬁdence intervals in situations where the modiﬁed treatment policy has a suitable scientiﬁc interpretation for agiven realization of δ . Under the assumptions of the theorem, an estimator ˆ σ D ,δ of σ D ,δ may be obtained as theempirical variance of D η j ( i ) , ( O i ) − D η j ( i ) ,δ ( O i ) , and a Wald-type conﬁdence interval may be constructed as ˆ ψ osD ,δ ± z − α/ ˆ σ D ( δ ) / √ n . Similar considerations apply to ˆ ψ osI ,δ .Although the one-step estimator has optimal asymptotic performance, its ﬁnite-sample behavior may be affected bythe inverse probability weighting involved in the computation of the efﬁcient inﬂuence functions D j ˆ η ( O i ) : j = 1 , . Inparticular, it is not guaranteed that ˆ ψ osD ,δ and ˆ ψ osI ,δ will remain within the bounds of the parameter space. This issue maybe attenuated by performing weight stabilization. The estimated EIF D η j ( i ) ( O i ) can be weight-stabilized by dividing(9) and (11) by the empirical mean of H η j ( i ) ,δ ( A i , Z i , L i , W i ) and H η j ( i ) ,δ ( A i , Z i , L i , W i ) , respectively; as well asdividing (10), (12), (13), and (14) by the empirical mean of ˆ g j ( i ) ,δ ( A i | W i ) / ˆ g j ( i ) ( A i | W i ) . EPTEMBER

23, 2020

Although corrections may be applied to the one-step estimator, a more principled way to obtain estimators that remainin the parameter space may be derived from the targeted minimum loss (TML) estimation framework. The TMLestimator is constructed by tilting an initial data adaptive estimator ˆ η towards a solution ˜ η of the estimating equations P n { D η, − D η,δ } = ψ D ,δ (˜ η ) P n { D η,δ − D η,δ } = ψ I ,δ (˜ η ) , (18)where ψ D ,δ (˜ η ) and ψ I ,δ (˜ η ) are the substitution estimators in formula (19) obtained by plugging in the estimates ˜ η inthe parameter deﬁnition (6). Thus, a TML estimator is guaranteed to remain in the parameter space by virtue of itsbeing a substitution estimator. The fact that the nuisance estimators solve the relevant estimating equation is used toobtain a weak convergence result analogous to Theorem 3. Thus, while the TML estimator is expected to attain thesame optimal asymptotic behavior as the one-step estimator, its ﬁnite-sample behavior may be better. An algorithmto compute a TML estimator ˜ η is presented in the Supplementary Materials. Roughly, the algorithm proceeds byprojecting the EIF into score functions for the model of each nuisance parameter, and ﬁtting appropriate parametricsubmodels (van der Laan & Rose, 2011, 2018). For example, the following model is ﬁtted for m : logit m β ( a, z, l, w ) = logit ˆ m ( z, l, a, w ) + β I H I ( o ) + β D H D ( o ) , where H D ( o ) = ˆ b ( l | a, w )ˆ d ( l | z, a, w ) (cid:26) − ˆ g δ ( a | w )ˆ e ( a | z, w ) (cid:27) H I ( o ) = ˆ b ( l | a, w )ˆ d ( l | z, a, w ) (cid:26) ˆ g δ ( a | w )ˆ e ( a | z, w ) − ˆ g δ ( a | w )ˆ g ( a | w ) (cid:27) , and logit( p ) = log { p (1 − p ) − } . Here, the initial estimator logit ˆ m ( z, l, a, w ) is considered a ﬁxed offset variable (i.e.,a variable with known parameter value equal to one). The score of these tilting models is equal to the correspondingcomponent of the efﬁcient inﬂuence function. The parameter β = ( β I , β D ) may be estimated by running standardlogistic regression of Y on ( H D ( O ) , H I ( O )) with no intercept and an offset term equal to logit ˆ m ( z, l, a, w ) . Let ˆ β denote the MLE, and let ˜ m = m ˆ β denote the updated estimates. Fitting this model ensures that ˜ m solves the relevantscore equations. Models like this are estimated iteratively for all parameters in a way that guarantees that the estimatingequations (18) are solved up to a term that converges to zero in probability at rate faster than n − / . After the iterationends, the TML estimators are deﬁned as ˆ ψ tmleD ,δ = 1 n Z n X i =1 (cid:8) ˜¯ u ( a, W i )˜ g ( a | W i ) − ˜ u ( Z i , a, W i )˜ g δ ( a | W i ) (cid:9) d κ ( a )ˆ ψ tmleI ,δ = 1 n Z n X i =1 (cid:8) ˜ u ( Z i , a, W i ) − ˜¯ u ( a, W i ) (cid:9) ˜ g δ ( a | W i )d κ ( a ) . (19)The fact that the TML estimator solves estimating equations (18) is a fundamental step in proving the followingtheorem. Theorem 4 (Weak convergence of TML estimator) . Assume (i) and (ii) hold, and one of (iii), (iv) deﬁned in Theorem 3holds, then: √ n { ˆ ψ tmleD ,δ − ψ D ,δ } N (0 , σ D ,δ ) , and √ n { ˆ ψ tmleI ,δ − ψ I ,δ } N (0 , σ I ,δ ) , where σ D ,δ = Var { D η, ( O ) − D η,δ ( O ) } and σ I ,δ = Var { D η,δ ( O ) − D η,δ ( O ) } . Using Theorem 4, asymptotically valid variance estimators, p-values, and conﬁdence intervals for the (in)direct effectsmay be obtained in a manner analogous to those for the one-step estimator. The proof of the theorem proceeds usingsimilar arguments as the proof of Theorem 3 for the one-step estimator, using empirical process theory and leveragingcross-ﬁtting to avoid entropy conditions on the initial estimators of η . Since the estimators now depend on the fullsample through the estimates of the parameters β of the logistic tilting models, the empirical process treatment isslightly different from that of Theorem 3. The proof is detailed in the Supplementary Materials. EPTEMBER

23, 2020

We used simulation experiments to assess our two proposed efﬁcient estimators of the (in)direct effects. On accountof computational considerations, we focus on binary exposures and intermediate confounders in this example; how-ever, as noted in the prior, our proposed methodology is general enough to be readily applicable in the presence ofcontinuous-valued covariates, treatment, mediators, intermediate confounders, and outcome. We made use of the fol-lowing data-generating mechanism for the joint distribution of O to generate synthetic data for evaluation of the twoestimators: W ∼ Bernoulli ( p = 0 . W ∼ Bernoulli ( p = 0 . W | ( W , W ) ∼ Bernoulli ( p = 0 . / · ( W + W )); A | W ∼ Bernoulli ( p = expit(2 + (5 / ( W + W + W )))); L | ( A, W ) ∼ Bernoulli ( p = expit(1 / W + W + W ) − A − log(2) + 0 . Z | ( L, A, W ) ∼ Bernoulli ( p = expit(log(3) · ( W + W ) + A − L )); Y | ( Z, L, A, W ) ∼ Bernoulli p = expit (cid:18) − · (3 − L − A + Z )2 + ( W + W + W ) (cid:19)! , where expit( x ) := { x ) } − . For each of the sample sizes n ∈ { , , , , , , , , } , datasets were generated. For every dataset, six variations of each of the two efﬁcient estima-tors was applied — ﬁve variants were based on misspeciﬁcation of a single nuisance parameter among { e , m , d , g , b } while the sixth variant was constructed based on consistent estimation of all ﬁve nuisance parameters. An intercept-only logistic regression model provided inconsistent estimation of each of the nuisance parameters { e , m , d , g , b } ,while a Super Learner ensemble (van der Laan et al., 2007) was used to achieve consistent estimation. The SuperLearner ensemble was constructed with a library of algorithms composed of intercept-only logistic regression; main-terms logistic regression; and several variants of the highly adaptive lasso (Benkeser & van der Laan, 2016; van derLaan, 2017; Coyle et al., 2020), a nonparametric regression approach capable of ﬂexibly estimating arbitrary func-tional forms at a fast convergence rate under only a global smoothness assumption (van der Laan & Bibaut, 2017;Bibaut & van der Laan, 2019). Note that we do not consider cases of misspeciﬁed estimation of { v , s , q , ¯ u } , for theirconsistent estimation depends on consistent estimation of a subset of the nuisance parameters { e , m , d , g , b } . Gener-ally, based on Lemmas 4 and 5, robustness of the direct and indrect effect estimators to misspeciﬁcation of { e , m , d } is to be expected, but the same is not true under misspeciﬁcation of { g , b } .Figure 2 summarizes the results of our investigations of the relative performance of the estimator variants enumeratedabove. Speciﬁcally, we assess the relative performance of our proposed estimators in terms of absolute bias, scaled(by n / ) bias, standard error and scaled (by n ) mean squared error relative to the efﬁciency bound for the data-generating model, the empirical coverage of 95% conﬁdence intervals, and relative efﬁciency. In terms of both raw(unscaled) bias and scaled bias, the estimator variants appear to conform to the predictions of Lemmas 4 and 5 —speciﬁcally, raw bias vanishes and scaled bias stabilizes to a small value (providing evidence for rate-consistency)under misspeciﬁcation of any of { e , m , d } as well as in the case of no nuisance parameter misspeciﬁcation. In the samevein, when either of { g , b } are estimated inconsistently, some of the estimator variants display diverging asymptotic(scaled) bias, in agreement with expectations based upon theory. The consistency of other estimator variants (e.g.,the one-step estimator under misspeciﬁcation of b ) is likely an artifact of this data-generating mechanism, not to betaken as a general indication of robust performance. In terms of their relative mean squared error, the estimators ofthe (in)direct effects exhibit convergence to the efﬁciency bound under misspeciﬁcation of { e , m , d } and under nomisspeciﬁcation; this also appears to hold for a subset of the estimator variants under misspeciﬁcation of { g , b } . Westress that aspects of this are likely to be a particularity of the given data-generating mechanism or on account of theirregularity of misspeciﬁed estimator variants, for the regularity and asymptotically linearity of the estimators is only tobe expected under consistent estimation of all nuisance parameters. Finally, the empirical coverage of 95% conﬁdenceintervals is as expected: under a lack of nuisance parameter misspeciﬁcation, both the one-step and TML estimators ofthe direct and indirect effect achieve 95% coverage in larger sample sizes. We note that misspeciﬁcation of e leads toover-coverage for all estimator variants, implying an overly inﬂated variance estimate, while the conﬁdence intervalsfail to attain the nominal rate in most other instances. Notably, several of the estimator variants generate conﬁdenceintervals that are liable to converge to 0% coverage in larger samples under misspeciﬁcation of { g , b } , very much inline with theoretical expectations. EPTEMBER

23, 2020 l l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l l l l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l ll l l l l l l l l none e m d g b | B i a s | n | B i a s | r e l s d ( q ^ )( n · r e l M SE ) C o v ( . ) s d ^ ( q ^ ) s d ( q ^ ) n n n Estimator ll OSTMLE

Parameter l DEIE

Figure 2: Comparison of efﬁcient estimators across different nuisance parameter conﬁgurations.Importantly, the TML estimator appears to generally outperform the one-step estimator throughout several scenarios.This comes in several forms, including lower bias, relative standard deviation, or relative mean squared error undermisspeciﬁcation of { e , m , d } or under no misspeciﬁcation; however, under inconsistent estimation of { g , b } , the irreg-ularity of the estimators complicates this comparison. Interestingly, under misspeciﬁcation of g , the TML estimatorsof the direct and indirect effects appear unbiased and efﬁcient, a result unpredictable from theory given the irregularityof the estimators under this conﬁguration. Altogether, results of our numerical experiments indicate that our proposedestimators exhibit properties that align with the theoretical results of Lemmas 4 and 5. We now consider the application of our proposed stochastic interventional direct and indirect effects to decomposethe causal effect of a strategy where buprenorphine dose is successively increased early in treatment (regardless ofopioid use) on relapse among those with opioid use disorder (OUD). Data for our illustrative analysis come fromthe X:BOT trial, a 24-week, multi-site randomized controlled trial designed to examine the comparative effectivenessof extended-release naltrexone (XR-NTX) and sublingual buprenorphine-naloxone (BUP-NX) on relapse (Lee et al.,2018, 2016; Nunes et al., 2016). The X:BOT trial enrolled 570 participants, all of whom were 18 years or older, hadOUD (as per the Diagnostic and Statistical Manual of Mental Disorders-5; American Psychiatric Association, 2013),and had used non-prescribed opioids in the 30 days preceding enrollment. Participants were randomized to receiveeither XR-NTX or BUP-NX using a stratiﬁed permuted block design; 287 of the 570 were randomized to receive EPTEMBER

23, 2020

BUP-NX. Prior analytic efforts have established a protective effect of BUP-NX administration (versus placebo) onOUD relapse (Mattick et al., 2014). For each participant assigned to receive BUP-NX, the prescribed dose was basedon both clinical indication (Lee et al., 2018) and clinician judgment. Some clinicians tended to hold dose constant overtime (i.e., a static dose regimen), while others increased dose — either based on clinical assessment or based on thehypothesis that higher doses would result in better outcomes (Nutt, 2015; Greenwald et al., 2003; Comer et al., 2005;Heikman et al., 2017). In this analysis, we estimated hypothetical stochastic interventional direct and indirect effectsto assess the mechanism by which universally ramping up BUP-NX dose early in treatment (deﬁned as three or moredose increases in the ﬁrst four weeks of treatment) could mitigate the risk of OUD relapse.Baseline covariates ( W ) available in the data included site; gender; age; race/ethnicity; homeless status; educationalattainment; employment status; marital status; current intravenous drug use; alcohol use disorder; cocaine use disorder;age at start of heroin use; severity of current opioid use; indicator of prior OUD treatment; past withdrawal discomfortlevel; histories of amphetamine use, sedative use, and cannabis use; weekly cost of primary drug; whether or not livingwith an individual currently using drugs or with alcohol use disorder; histories of psychiatric illnesses; randomizationtiming; baseline pain level; baseline depression symptoms. The exposure ( A ) was taken to be successive increases indose of BUP-NX versus static dose, measured during the ﬁrst four weeks of treatment. Mediating factors ( Z ) includeddepression and pain, measured from week 6 until relapse or week 24 (end of follow-up). Abstinence from illicitopioid use early in the treatment schedule, measured between weeks 4 and 6, acted as an intermediate confounderaffected by exposure ( L ). OUD relapse status at the X:BOT trial’s end of follow-up was the outcome of interest ( Y ).To examine the effect of exposure to successive increases in BUP-NX dose, we consider an incremental propensityscore intervention, which, for binary A , replaces the propensity score g (1 | w ) with a shifted variant constructed frommultiplying the odds of treatment by a user-speciﬁed parameter δ (Kennedy, 2019). Speciﬁcally, we considered agrid log( δ ) ∈ {− . , − . , − . , . . . , . , . } of the exposure odds observed in the X:BOT trial. Across all suchestimates in the odds δ of exposure, the stochastic interventional direct and indirect effects that we estimated may beinterpreted in terms of the overall effect of increasingly encouraging ramping up BUP-NX dose early in treatment onthe counterfactual risk of OUD relapse; thus, the results of our analysis may be informative of the mechanisms bywhich increasing BUP-NX dose can alter the risk of OUD relapse. The direct and indirect effect estimates, across thegrid in δ , are presented in Figure 3. D i r e c t E ff e c tI nd i r e c t E ff e c t . − . − . − . − . − . − . − . − . − . − . + . + . + . + . + . + . + . + . + . + −0.2−0.10.00.1−0.04−0.020.000.020.04 Odds d of exposure to adaptive versus static prescription strategies E s t i m a t ed c hange i n popu l a t i on i n t e r v en t i on r i sk o f r e l ap s e One step estimate TML estimate based on estimated stochastic interventional (in)direct effects

Adaptive prescription strategies directly lower risk of OUD relapse

Figure 3: Stochastic interventional direct (upper panel) and indirect (lower panel) effect estimates of a hypotheticalintervention increasing the odds of exposure to a BUP-NX dose schedule in which dose is successively increased earlyin treatment on OUD relapse across a grid of shifts δ in the odds. EPTEMBER

23, 2020

We applied both of our cross-ﬁtted, efﬁcient one-step and TML estimators to examine the stochastic interventionaldirect and indirect effects of increasing the odds of ramping up BUP-NX dose. Both estimation strategies producedresults that were generally in very close agreement as to the magnitude of the direct and indirect effects. For eachpoint estimate, standard error estimates and 95% Wald-style conﬁdence intervals were constructed based on the con-clusions of Theorem 4. In order to ensure the ﬂexibility of our estimators, each component of the vector of nuisanceparameters η = ( e , m , d , g , b ) was estimated via ensemble machine learning, using the Super Learner algorithm (vander Laan et al., 2007). The library of machine learning algorithms from which the Super Learner ensemble wasconstructed included intercept-only logistic regression, logistic regression with Bayesian priors on parameters, multi-variate adaptive regression splines (Friedman, 1991), generalized additive models (Hastie & Tibshirani, 1990), randomforests (Breiman, 2001), gradient boosted machines (Friedman, 2001), and the highly adaptive lasso (Benkeser & vander Laan, 2016; Coyle et al., 2020).From examination of the point estimates and conﬁdence intervals of the direct and indirect effects in Figure 3, twoconclusions may be drawn. Firstly, there appears to be little to no indirect effect of successively increasing BUP-NX dose on risk of OUD relapse, revealing that any effect of BUP-NX dose does not appear to operate throughmediating factors such as depression or pain. Secondly, the direct effect of successively increasing BUP-NX dosevaries considerably across changes in the odds of the introduction of such a dose schedule. Importantly, it appearsthat decreasing the odds of increasing dose could lead to as much as a 5% increase in the OUD relapse risk, with aplateau emerging at odds lower than ≈ ≈

10% with increasingodds of successively increasing BUP-NX dose, with the risk plateauing at odds higher than 33%. This decrease in thecounterfactual risk of OUD relapse suggests a protective effect of BUP-NX dose schedules where dose is successivelyincreased early in treatment relative to static dose.The conclusions that may be drawn from our re-analysis using the stochastic interventional direct and indirect effectscomplement those previously reported in the investigations of Lee et al. (2018), who evaluated the total effect ofBUP-NX (versus XR-NTX) treatment on OUD relapse, and Rudolph et al. (2020a), who used the interventionalmediation analysis approach of D´ıaz et al. (2020) (limited to static interventions on A ) to examine differences inrelapse risk between homeless and non-homeless participants. Importantly, our substantive conclusion — that doseincreases directly lower the risk of relapse — agree generally with those of Rudolph et al. (2020b), who found that doseincreases directly lowered risk of OUD relapse when such dose increases followed opioid use. Notably, our proposed(in)direct effects and estimation approach differ from previous efforts in three important ways: (i) our causal effectdeﬁnitions remain unaltered in the presence intermediate confounders affected by exposure and may be re-evaluated inrandomized trials, (ii) the ﬂexible estimators we introduce eschew restrictive modeling assumptions by incorporatingstate-of-the-art machine learning in the estimation of nuisance parameters, and (iii) our strategy provides an analogto a dose-response analysis by allowing for the risk of OUD relapse to be traced out across changes in the odds ofexposure to a schedule in which BUP-NX dose is increased repeatedly early in treatment. We have proposed a class of novel direct and indirect effects for causal mediation analysis, as well as two efﬁcientestimators of these effects in the nonparametric statistical model. Importantly, our proposed estimation frameworkallows for data adaptive estimation of nuisance parameters, while still preserving the beneﬁts associated with similarclassical techniques — that is, our estimators are regular and asymptotically linear, provide unbiased point estimates,are multiply robust, allow the construction of asymptotically valid conﬁdence intervals, and are capable of attainingthe nonparametric efﬁciency bound. Our (in)direct effects have interpretations that echo those of the classical natural(in)direct effects; however, our effects remain well-deﬁned even in the presence of intermediate confounders affectedby exposure. Further, any scientiﬁc conclusions drawn based upon our proposed (in)direct effects may be readilyinterrogated in trials that randomize both the exposure and mediators. Such ﬂexible effect deﬁnitions and estimatorsseem necessary both to cope with the design complexity exhibited by modern epidemiological and biomedical studiesand to take appropriate advantage of the ever-growing number of data adaptive regression techniques.The challenge of leveraging data adaptive regression methodology to construct robust estimators that accommodatevalid statistical inference is not a new one. It has been considered in great detail as early as the work of Pfanzagl& Wefelmeyer (1985) as well in numerous recent advances, most notably by van der Laan & Rose (2011, 2018)and Chernozhukov et al. (2018); related work by these authors presents a wealth of extensions and applications. Inthe present work, we derive multiply robust, efﬁcient estimators based on both the one-step and targeted minimumloss estimation frameworks. Following Klaassen (1987) and Zheng & van der Laan (2011), our estimators leveragecross-validation to avoid imposing possibly restrictive assumptions on nuisance function estimators. We demonstratedthe properties of our estimators in simulation experiments that illustrated their ability to yield unbiased point estimates, EPTEMBER

23, 2020 attain the nonparametric efﬁciency bound, and build conﬁdence interval covering at the nominal rate across severalnuisance parameter conﬁgurations, all within the context of a problem in which classical mediation effects are ill-deﬁned. Finally, we demonstrated the application of our novel (in)direct effects in dissecting the mechanism by whichincreasing the odds of adopting a dose schedule corresponding to universal successive increases in buprenorphine doseearly in treatment affects relapse to OUD (Lee et al., 2018; Rudolph et al., 2020a).Several signiﬁcant extensions and reﬁnements are left for future consideration. Firstly, our proposed estimation strat-egy for the direct and indirect effects leverages re-parameterizations of factors of the likelihood in order to simplifythe estimation of nuisance parameters. This approach works particularly well when either mediators or intermediateconfounders are of low dimension; however, improving this approach to accommodate moderate dimensionality ofboth mediators and intermediate confounders would surely widen the range of scenarios to which the methodologymay be applied. Secondly, when deﬁning effects based upon stochastic interventions indexed by the user-speciﬁedparameter δ , an important consideration is choosing a priori a particular value of δ . One solution is to evaluate a setof causal effects indexed by a grid in δ . In such cases, aggregate effects (across δ ) may be summarized via workingmarginal structural models (e.g., Hejazi et al., 2020) or the construction of uniform tests of the null hypothesis of nodirect effect (e.g., D´ıaz & Hejazi, 2020). Developments of these distinct summarization strategies would enrich therange of scientiﬁc applications on which these robust and ﬂexible (in)direct effects could be brought to bear. Acknowledgements

The authors thank John Rotrosen, Edward Nunes, and Marc Fishman for raising the research question in theApplication and for helpful feedback. KER’s time was supported by R00DA042127. The X:BOT trial was supportedby the National Institute on Drug Abuse, Clinical Trials Network (U10DA013046, UG1/U10DA013035,UG1/U10DA013034, U10DA013045, UG1/U10DA013720, UG1/U10DA013732, UG1/U10DA013714,UG1/U10DA015831, U10DA015833, HHSN271201200017C, and HHSN271201500065C).

References A MERICAN P SYCHIATRIC A SSOCIATION (2013).

Diagnostic and statistical manual of mental disorders (DSM-5 R (cid:13) ) .American Psychiatric Pub.A VIN , C., S

HPITSER , I. & P

EARL , J. (2005). Identiﬁability of path-speciﬁc effects. In

IJCAI International JointConference on Artiﬁcial Intelligence .B ARON , R. M. & K

ENNY , D. A. (1986). The moderator–mediator variable distinction in social psychological re-search: Conceptual, strategic, and statistical considerations.

Journal of Personality and Social Psychology ,1173.B ENKESER , D. (2020). Nonparametric inference for interventional effects with multiple mediators .B

ENKESER , D. &

VAN DER L AAN , M. J. (2016). The highly adaptive lasso estimator. In . IEEE.B

IBAUT , A. F. &

VAN DER L AAN , M. J. (2019). Fast rates for empirical risk minimization over c`adl`ag functions withbounded sectional variation norm. arXiv preprint arXiv:1907.09244 .B ICKEL , P. J., K

LAASSEN , C. A., R

ITOV , Y. & W

ELLNER , J. A. (1993).

Efﬁcient and adaptive estimation forsemiparametric models . Johns Hopkins University Press Baltimore.B

REIMAN , L. (1996). Stacked regressions.

Machine Learning , 49–64.B REIMAN , L. (2001). Random forests.

Machine learning , 5–32.C HERNOZHUKOV , V., C

HETVERIKOV , D., D

EMIRER , M., D

UFLO , E., H

ANSEN , C., N

EWEY , W. & R

OBINS , J. M.(2018). Double/debiased machine learning for treatment and structural parameters.

The Econometrics Journal ,C1–C68.C OLE , S. R. & H

ERN ´ AN , M. A. (2002). Fallibility in estimating direct effects. International Journal of Epidemiology , 163–165. EPTEMBER

23, 2020 C OMER , S. D., W

ALKER , E. A. & C

OLLINS , E. D. (2005). Buprenorphine/naloxone reduces the reinforcing andsubjective effects of heroin in heroin-dependent volunteers.

Psychopharmacology , 664–675.C

OREY , L., G

ILBERT , P. B., T

OMARAS , G. D., H

AYNES , B. F., P

ANTALEO , G. & F

AUCI , A. S. (2015). Immunecorrelates of vaccine protection against HIV-1 acquisition.

Science Translational Medicine , 310rv7–310rv7.C OYLE , J. R., H

EJAZI , N. S. &

VAN DER L AAN , M. J. (2020). hal9001: The scalable highly adaptive lasso . Rpackage version 0.2.6.D

AWID , A. P. (2000). Causal inference without counterfactuals.

Journal of the American Statistical Association ,407–424.D´ IAZ , I. & H

EJAZI , N. S. (2020). Causal mediation analysis for stochastic interventions.

Journal of the RoyalStatistical Society: Series B (Statistical Methodology) , 661–683.D´ IAZ , I., H

EJAZI , N. S., R

UDOLPH , K. E. &

VAN DER L AAN , M. J. (2020). Non-parametric efﬁcient causalmediation with intermediate confounders .D´

IAZ , I. &

VAN DER L AAN , M. J. (2011). Super learner based conditional density estimation with application tomarginal structural models.

The International Journal of Biostatistics , 1–20.D´ IAZ , I. &

VAN DER L AAN , M. J. (2012). Population intervention causal effects based on stochastic interventions.

Biometrics , 541–549.D´ IAZ , I. &

VAN DER L AAN , M. J. (2013). Assessing the causal effect of policies: an example using stochasticinterventions.

The International Journal of Biostatistics , 161–174.D IDELEZ , V., D

AWID , P. & G

ENELETTI , S. (2006). Direct and indirect effects of sequential treatments. In

Proceed-ings of the 22nd Annual Conference on Uncertainty in Artiﬁcial Intelligence .D UD ´ IK , M., E RHAN , D., L

ANGFORD , J. & L I , L. (2014). Doubly robust policy evaluation and optimization. Statis-tical Science , 485–511.F ERGUSON , K. K., C

HEN , Y.-H., V

ANDER W EELE , T. J., M C E LRATH , T. F., M

EEKER , J. D. & M

UKHERJEE , B.(2017). Mediation of the relationship between maternal phthalate exposure and preterm birth by oxidative stresswith repeated measurements across pregnancy.

Environmental Health Perspectives , 488–494.F

RIEDMAN , J. H. (1991). Multivariate adaptive regression splines.

Annals of Statistics , 1–67.F

RIEDMAN , J. H. (2001). Greedy function approximation: a gradient boosting machine.

Annals of Statistics , 1189–1232.G

OLDBERGER , A. S. (1972). Structural equation methods in the social sciences.

Econometrica: Journal of theEconometric Society , 979–1001.G

REENWALD , M. K., J

OHANSON , C.-E., M

OODY , D. E., W

OODS , J. H., K

ILBOURN , M. R., K

OEPPE , R. A.,S

CHUSTER , C. R. & Z

UBIETA , J.-K. (2003). Effects of buprenorphine maintenance dose on µ -opioid receptoravailability, plasma concentrations, and antagonist blockade in heroin-dependent volunteers. Neuropsychopharma-cology , 2000–2009.H ANEUSE , S. & R

OTNITZKY , A. (2013). Estimation of the effect of interventions that modify the received treatment.

Statistics in Medicine , 5260–5277.H ASTIE , T. J. & T

IBSHIRANI , R. J. (1990).

Generalized additive models , vol. 43. CRC press.H

EIKMAN , P. K., M

UHONEN , L. H. & O

JANPER ¨ A , I. A. (2017). Polydrug abuse among opioid maintenance treat-ment patients is related to inadequate dose of maintenance treatment medicine. BMC psychiatry , 1–11.H EJAZI , N. S. & D´

IAZ , I. (2020). medshift: Causal mediation analysis for stochastic interventions . R packageversion 0.1.4. EPTEMBER

23, 2020 H EJAZI , N. S.,

VAN DER L AAN , M. J., J

ANES , H. E., G

ILBERT , P. B. & B

ENKESER , D. C. (2020). Efﬁcientnonparametric inference on the effects of stochastic interventions under two-phase sampling, with applications tovaccine efﬁcacy trials .I

MAI , K., K

EELE , L. & T

INGLEY , D. (2010). A general approach to causal mediation analysis.

PsychologicalMethods , 309.K ENNEDY , E. H. (2019). Nonparametric causal effects based on incremental propensity score interventions.

Journalof the American Statistical Association , 645–656.K

LAASSEN , C. A. (1987). Consistent estimation of the inﬂuence function of locally asymptotically linear estimators.

Annals of Statistics , 1548–1562.L EE , J. D., N UNES , E. V., B

AILEY , G. L., B

RIGHAM , G. S., C

OHEN , A. J., F

ISHMAN , M., L

ING , W., L

INDBLAD ,R., S

HMUELI -B LUMBERG , D., S

TABLEIN , D. et al. (2016). Nida clinical trials network ctn-0051, extended-release naltrexone vs. buprenorphine for opioid treatment (x: Bot): study design and rationale.

Contemporaryclinical trials , 253–264.L EE , J. D., N UNES J R , E. V., N OVO , P., B

ACHRACH , K., B

AILEY , G. L., B

HATT , S., F

ARKAS , S., F

ISHMAN , M.,G

AUTHIER , P., H

ODGKINS , C. C., K

ING , J., L

INDBLAD , R., L IU , D., M ATTHEWS , A. G., M AY , J., P EAVY ,K. M., R

OSS , S., S

ALAZAR , D., S

CHKOLNIK , P., S

HMUELI -B LUMBERG , D., S

TABLEIN , D., S

UBRAMANIAM ,G. & R

OTROSEN , J. (2018). Comparative effectiveness of extended-release naltrexone versus buprenorphine-naloxone for opioid relapse prevention (X:BOT): a multicentre, open-label, randomised controlled trial.

The Lancet , 309–318.L OK , J. J. (2016). Deﬁning and estimating causal direct and indirect effects when setting the mediator to speciﬁcvalues is not feasible. Statistics in Medicine , 4008–4020.L OK , J. J. (2019). Causal organic direct and indirect effects: closer to baron and kenny. arXiv preprintarXiv:1903.04697 .M ATTICK , R. P., B

REEN , C., K

IMBER , J. & D

AVOLI , M. (2014). Buprenorphine maintenance versus placebo ormethadone maintenance for opioid dependence.

Cochrane database of systematic reviews .N GUYEN , T. Q., S

CHMID , I. & S

TUART , E. A. (2019). Clarifying causal mediation analysis for the applied re-searcher: Deﬁning effects based on what we want to learn. arXiv preprint arXiv:1904.08515 .N UNES , E. V., L EE , J. D., S ISTI , D., S

EGAL , A., C

APLAN , A., F

ISHMAN , M., B

AILEY , G., B

RIGHAM , G., N

OVO ,P., F

ARKAS , S. et al. (2016). Ethical and clinical safety considerations in the design of an effectiveness trial: Acomparison of buprenorphine versus naltrexone treatment for opioid dependence.

Contemporary clinical trials ,34–43.N UTT , D. J. (2015). Considerations on the role of buprenorphine in recovery from heroin addiction from a ukperspective.

Journal of Psychopharmacology , 43–49.P EARL , J. (1995). Causal diagrams for empirical research.

Biometrika , 669–688.P EARL , J. (1998). Graphs, causality, and structural equation models.

Sociological Methods & Research , 226–284.P EARL , J. (2000).

Causality . Cambridge university press.P

EARL , J. (2006). Direct and indirect effects. In

Proceedings of the 17th Annual Conference on Uncertainty inArtiﬁcial Intelligence .P EARL , J. (2009). Myth, Confusion, and Science in Causal Analysis. Tech. Rep. R-348, Cognitive Systems Labora-tory, Computer Science Department, University of California, Los Angeles, Los Angeles, CA.P

ETERSEN , M. L., S

INISI , S. E. &

VAN DER L AAN , M. J. (2006). Estimation of direct causal effects.

Epidemiology , 276–284. EPTEMBER

23, 2020 P FANZAGL , J. & W

EFELMEYER , W. (1985). Contributions to a general asymptotic statistical theory.

Statistics &Risk Modeling , 379–388.P OPPER , K. (1934).

The logic of scientiﬁc discovery . Routledge.R C

ORE T EAM (2020).

R: A Language and Environment for Statistical Computing . R Foundation for StatisticalComputing, Vienna, Austria.R

ICHARDSON , T. S. & R

OBINS , J. M. (2013). Single world intervention graphs (swigs): A uniﬁcation of thecounterfactual and graphical approaches to causality.

Center for the Statistics and the Social Sciences, Universityof Washington Series. Working Paper , 2013.R

OBINS , J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods —application to control of the healthy worker survivor effect.

Mathematical Modelling , 1393–1512.R OBINS , J. M. & G

REENLAND , S. (1992). Identiﬁability and exchangeability for direct and indirect effects.

Epi-demiology , 143–155.R

OBINS , J. M., H

ERN ´ AN , M. A. & S IEBERT , U. (2004). Effects of multiple interventions. In

Comparative Quantiﬁ-cation of Health Risks , vol. 1. Citeseer, pp. 2191–2230.R

OBINS , J. M. & R

ICHARDSON , T. S. (2010). Alternative graphical causal models and the identiﬁcation of directeffects.

Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures , 103–158.R

UDOLPH , K. E., D´

IAZ , I., H

EJAZI , N. S.,

VAN DER L AAN , M. J., L UO , S. X., S HULMAN , M., C

AMPBELL , A.,R

OTROSEN , J. & N

UNES , E. V. (2020a). Explaining differential effects on opioid use disorder treatment using anovel causal approach incorporating mediating and intermediate variables .R

UDOLPH , K. E., S

HULMAN , M., F

ISHMAN , M., D´

IAZ , I., R

OTROSEN , J. & N

UNES , E. V. (2020b). Associationbetween dynamic dose adjustment of buprenorphine for treatment of opioid use disorder and risk of relapse .R

UDOLPH , K. E., S

OFRYGIN , O., Z

HENG , W. &

VAN DER L AAN , M. J. (2017). Robust and ﬂexible estimation ofstochastic mediation effects: a proposed method and example in a randomized trial setting.

Epidemiologic Methods .S PIRTES , P., G

LYMOUR , C. N., S

CHEINES , R., H

ECKERMAN , D., M

EEK , C., C

OOPER , G. & R

ICHARDSON , T.(2000).

Causation, prediction, and search . MIT press.S

TITELMAN , O. M., H

UBBARD , A. E. & J

EWELL , N. P. (2010). The impact of coarsening the explanatory variableof interest in making causal inferences: Implicit assumptions behind dichotomizing variables .S

TOCK , J. H. (1989). Nonparametric policy analysis.

Journal of the American Statistical Association , 567–575.T CHETGEN T CHETGEN , E. J. & V

ANDER W EELE , T. J. (2014). On identiﬁcation of natural direct effects when aconfounder of the mediator is directly affected by exposure.

Epidemiology , 282.T IAN , J. (2008). Identifying dynamic sequential plans. In

Proceedings of the Twenty-Fourth Conference on Uncer-tainty in Artiﬁcial Intelligence . AUAI Press.

VAN DER L AAN , M. J. (2017). A generally efﬁcient targeted minimum loss based estimator based on the highlyadaptive lasso.

The International Journal of Biostatistics . VAN DER L AAN , M. J. & B

IBAUT , A. F. (2017). Uniform consistency of the highly adaptive lasso estimator ofinﬁnite-dimensional parameters. arXiv preprint arXiv:1709.06256 . VAN DER L AAN , M. J. & P

ETERSEN , M. L. (2008). Direct effect models.

The International Journal of Biostatistics . VAN DER L AAN , M. J., P

OLLEY , E. C. & H

UBBARD , A. E. (2007). Super learner.

Statistical Applications inGenetics and Molecular Biology . EPTEMBER

23, 2020

VAN DER L AAN , M. J. & R

OBINS , J. M. (2003).

Uniﬁed Methods for Censored Longitudinal Data and Causality .Springer Science & Business Media.

VAN DER L AAN , M. J. & R

OSE , S. (2011).

Targeted Learning: Causal Inference for Observational and ExperimentalData . Springer Science & Business Media.

VAN DER L AAN , M. J. & R

OSE , S. (2018).

Targeted Learning in Data Science: Causal Inference for ComplexLongitudinal Studies . Springer Science & Business Media.

VAN DER L AAN , M. J. & R

UBIN , D. (2006). Targeted maximum likelihood learning.

The International Journal ofBiostatistics . VAN DER V AART , A. W. (2002). Semiparametric statistics.

Lectures on Probability Theory and Statistics .V ANDER W EELE , T. J., V

ANSTEELANDT , S. & R

OBINS , J. M. (2014). Effect decomposition in the presence of anexposure-induced mediator-outcome confounder.

Epidemiology , 300.V ANSTEELANDT , S., B

EKAERT , M. & L

ANGE , T. (2012). Imputation strategies for the estimation of natural directand indirect effects.

Epidemiologic Methods , 131–158.V ANSTEELANDT , S. & D

ANIEL , R. M. (2017). Interventional effects for mediation analysis with multiple mediators.

Epidemiology , 258.V ANSTEELANDT , S. & V

ANDER W EELE , T. J. (2012). Natural direct and indirect effects on the exposed: effectdecomposition under weaker assumptions.

Biometrics , 1019–1027.W OLPERT , D. H. (1992). Stacked generalization.

Neural Networks , 241–259.W RIGHT , S. (1921). Correlation and causation.

Journal of Agricultural Research , 557–585.W RIGHT , S. (1934). The method of path coefﬁcients.

The Annals of Mathematical Statistics , 161–215.Y OUNG , J. G., H

ERN ´ AN , M. A. & R OBINS , J. M. (2014). Identiﬁcation, estimation and approximation of risk underinterventions that depend on the natural value of treatment using observational data.

Epidemiologic Methods ,1–19.Z HENG , W. &

VAN DER L AAN , M. J. (2011). Cross-validated targeted minimum-loss-based estimation. In

TargetedLearning: Causal Inference for Observational and Experimental Data . Springer, pp. 459–474.Z

HENG , W. &

VAN DER L AAN , M. J. (2017). Longitudinal mediation analysis with time-varying mediators andexposures, with application to survival outcomes.

Journal of Causal Inference . upplementary Materials for N ONPARAMETRIC CAUSAL MEDIATION ANALYSIS FORSTOCHASTIC INTERVENTIONAL ( IN ) DIRECT EFFECTS

Nima S. Hejazi

Graduate Group in Biostatistics, andCenter for Computational Biology,University of California, Berkeley [email protected]

Kara E. Rudolph

Department of Epidemiology,Mailman School of Public Heatlh,Columbia University [email protected]

Mark J. van der Laan

Division of Biostatistics,School of Public Health, andDepartment of Statistics,University of California, Berkeley [email protected]

Iv´an D´ıaz

Division of Biostatistics,Department of Population Health Sciences,Weill Cornell Medicine [email protected]

September 23, 2020

S1 Theorem 1

Proof

First, we have E { Y A δ ,G δ } = Z E (cid:8) Y a,z | A δ = a, G δ = z, W = w (cid:9) g δ ( a | w ) P ( G δ = z | A δ = a, W = w ) p ( w )d ν ( a, z, w )= Z E (cid:8) Y a,z | W = w (cid:9) g δ ( a | w ) P ( Z ( a ) = z | A δ = a, W = w ) p ( w )d ν ( a, z, w ) (S1) = Z E (cid:8) Y a,z | A = a, W = w (cid:9) g δ ( a | w ) P ( Z ( a ) = z | W = w ) p ( w )d ν ( a, z, w ) (S2) = Z E (cid:8) Y a,z | A = a, W = w (cid:9) g δ ( a | w ) P ( Z ( a ) = z | A = a, W = w ) p ( w )d ν ( a, z, w ) (S3) = Z E (cid:8) Y a,z | A = a, W = w, L = l (cid:9) b ( l | a, w ) g δ ( a | w ) p ( z | a, w ) p ( w )d ν ( a, z, l, w )= Z m ( a, z, l, w ) b ( l | a, w ) g δ ( a | w ) p ( z | a, w ) p ( w )d ν ( a, z, l, w ) , (S4)where (S1) follows by deﬁnition of ( A δ , G δ ) , (S2) follows by A3 and deﬁnition of A δ , (S3) follows by A5, and (S4)follows by A4. Similar arguments yield E { Y A,G } = Z m ( a, z, l, w ) b ( l | a, w ) g ( z | w ) p ( z | a, w ) p ( w )d ν ( a, z, l, w ) . a r X i v : . [ s t a t . M E ] S e p EPTEMBER

23, 2020

We also have E { Y A δ ,G } = Z E (cid:8) Y a,z | A δ = a, G = z, W = w (cid:9) g δ ( a | w ) P ( G = z | A δ = a, W = w ) p ( w )d ν ( a, z, w )= Z E (cid:8) Y a,z | W = w (cid:9) g δ ( a | w ) P ( G = z | W = w ) p ( w )d ν ( a, z, w )= Z E (cid:8) Y a,z | A = a, W = w (cid:9) g δ ( a | w ) p ( z | w ) p ( w )d ν ( a, z, w )= Z E (cid:8) Y a,z | A = a, W = w (cid:9) g δ ( a | w ) p ( z | w ) p ( w )d ν ( a, z, w )= Z E (cid:8) Y a,z | A = a, W = w, L = l (cid:9) b ( l | a, w ) g δ ( a | w ) p ( z | w ) p ( w )d ν ( a, z, l, w )= Z m ( a, z, l, w ) b ( l | a, w ) g δ ( a | w ) p ( z | w ) p ( w )d ν ( a, z, l, w ) . Subtracting gives the expressions for the PIIE and PIDE in the theorem.

S2 Efﬁcient inﬂuence functions (Theorem 2)

Proof

In this proof we will use Θ j ( P ) : j = 1 , to denote a parameter as a functional that maps the distribution P in the model to a real number. We will assume that the measure ν is discrete so that integrals can be written assums, and will omit the dependence on δ . It can be checked algebraically that the resulting inﬂuence function will alsocorrespond to the inﬂuence function of a general measure ν . The true parameter value for θ is thus given by θ = Θ ( P ) = X y,a,z,m,w y p ( y | a, z, l, w ) p ( l | a, w ) p ( z | a, w ) g δ ( a | w ) p ( w ) . The non-parametric MLE of θ in the model of g δ known is given by Θ( P n ) = X y,a,z,m,w y P n f y,a,z,l,w P n f a,z,l,w P n f l,a,w P n f a,w P n f z,a,w P n f a,w g δ ( a | w ) P n f w , (S5)where we remind the reader of the notation P f = R f d P . Here f y,a,z,l,w = ( Y = y, A = a, Z = z, M = m, W = w ) , and ( · ) denotes the indicator function. The other functions f are deﬁned analogously.We will use the fact that the efﬁcient inﬂuence function in a non-parametric model corresponds with the inﬂuencecurve of the NPMLE. This is true because the inﬂuence curve of any regular estimator is also a gradient, and a non-parametric model has only one gradient. The Delta method (see, e.g., Appendix 18 of van der Laan & Rose, 2011)shows that if ˆΘ ( P n ) is a substitution estimator such that θ = ˆΘ ( P ) , and ˆΘ ( P n ) can be written as ˆΘ ∗ ( P n f : f ∈ F ) for some class of functions F and some mapping Θ ∗ , the inﬂuence function of ˆΘ ( P n ) is equal to IF P ( O ) = X f ∈F d ˆΘ ∗ ( P )d P f { f ( O ) − P f } . Applying this result to (S5) with F = { f y,a,z,l,w , f a,z,l,w , f z,a,w , f a ,w , f l,a,w , f a,w , f w : y, a, z, l, w } and rearrangingterms gives the result of the theorem. The algebraic derivations involved here are lengthy and not particularlyilluminating, and are therefore omitted from the proof. Similar analyses may be performed for the model where only g δ is unknown, as well as θ . S3 Targeted minimum loss estimation algorithm

To simplify notation, in the remaining of this section we will denote ˜ η j ( i ) ( O i ) with ˜ η ( O i ) . If L is binary, the efﬁcientinﬂuence functions in Theorem 2 may be simpliﬁed using the following identity: v ( l, a, w ) − ¯ v ( a, w ) = { v (1 , a, w ) − v (0 , a, w ) }{ l − b (1 | a, w ) } , which also holds for v replaced by s and ¯ v by ¯ s . EPTEMBER

23, 2020

Step 1. Initialize ˜ η = ˆ η . Compute ˜ v , ˜ s , and ˜ q j by plugging in ˜ m , ˜ g , ˜ e , ˜ d into equations (7), (16) and (15) if Z ismultivariate, and ﬁtting data-adaptive regression algorithms as appropriate.Step 2. For each subject, compute the auxiliary covariates H D ,i = ˜ b ( L i | A i , W i )˜ d ( L i | Z i , A i , W i ) (cid:26) − ˜ g δ ( A i | W i )˜ e ( A i | Z i , W i ) (cid:27) H I ,i = ˜ b ( L i | A i , W i )˜ d ( L i | Z i , A i , W i ) (cid:26) ˜ g δ ( A i | W i )˜ e ( A i | Z i , W i ) − ˜ g δ ( A i | W i )˜ g ( A i | W i ) (cid:27) K D ,i = ˜ v (1 , A i , W i ) − ˜ v (0 , A i , W i ) − ˜ g δ ( A i | W i )˜ g ( A i | W i ) { ˜ s (1 , A i , W i ) − ˜ s (0 , A i , W i ) } K I ,i = ˜ g δ ( A i | W i )˜ g ( A i | W i ) { ˜ s (1 , A i , W i ) − ˜ s (0 , A i , W i ) − ˜ v (1 , A i , W i ) + ˜ v (0 , A i , W i ) } M D ,i = − ˜ g δ (1 | w )(1 − ˜ g δ (1 | w ))˜ g (1 | w )(1 − ˜ g (1 | w )) ˜ q ( w ) M I ,i = ˜ g δ (1 | w )(1 − ˜ g δ (1 | w ))˜ g (1 | w )(1 − ˜ g (1 | w )) { ˜ q ( w ) − ˜ q ( w ) } Step 3. Fit the logistic tilting models logit m β ( A i , Z i , L i , W i ) = logit ˜ m ( A i , Z i , L i , W i ) + β I H I ,i + β D H D ,i logit b α (1 | A i , W i ) = logit ˜ b (1 | A i , W i ) + α I K I ,i + α D K D ,i logit g γ (1 | W i ) = logit ˜ g (1 | W i ) + γ I M I ,i + γ D M D ,i where logit( p ) = log { p (1 − p ) − } . Here, logit ˜ m ( a, z, l, w ) is an offset variable (i.e., a variable with knownparameter value equal to one). The parameter β = ( β I , β D ) may be estimated by running standard logisticregression of Y i on ( H D ,i , H I ,i ) with no intercept and an offset term equal to logit ˜ m ( A i , Z i , L i , W i ) . Let ˆ β denote the estimate, and let ˜ m = m ˆ β denote the updated estimates. Perform analogous computations for b and g .Step 4. Compute ˜ u according to equation (7) by plugging in ˜ m and ˜ b . Compute the covariate J i = ˜ g δ ( A i | W i )˜ g ( A i | W i ) , and ﬁt the model logit ¯ u κ ( A i , W i ) = logit ˜¯ u ( A i , W i ) + κ D + κ I J i by running a logistic regression of ˜ u ( Z i , A i , W i ) on J i with an intercept and offset logit ˜¯ u ( A i , W i ) . Let ˆ κ denote the MLE, and update ˜¯ u = ¯ u ˆ κ .Step 5. The TMLE of the direct and indirect effects are deﬁned as: ˆ ψ tmleD ,δ = 1 n Z n X i =1 (cid:8) ˜¯ u ( a, W i )˜ g ( a | W i ) − ˜ u ( Z i , a, W i )˜ g δ ( a | W i ) (cid:9) d κ ( a )ˆ ψ tmleI ,δ = 1 n Z n X i =1 (cid:8) ˜ u ( Z i , a, W i ) − ˜¯ u ( a, W i ) (cid:9) ˜ g δ ( a | W i )d κ ( a ) S4 Proof of Theorem 3

Proof

Let P n,j denote the empirical distribution of the prediction set V j , and let G n,j denote the associated empiricalprocess p n/J ( P n,j − P ) . For simplicity we denote a general parameter ψ with inﬂuence function D η , the proofapplies equally to the direct and indirect effect parameters. Note that ˆ ψ os δ = 1 J J X j =1 P n,j D ˆ η j ,δ , ψ δ = P D η . EPTEMBER

23, 2020

Thus, √ n { ˆ ψ os δ − ψ δ } = G n { D η,δ − ψ δ } + R n, ( δ ) + R n, ( δ ) , where R n, ( δ ) = 1 √ J J X j =1 G n,j ( D ˆ η j ,δ − D η,δ ) , R n, ( δ ) = √ nJ J X j =1 P { D ˆ η j ,δ − ψ δ } . It remains to show that R n, ( δ ) and R n, ( δ ) are o P (1) . Lemmas 4 and 5 together with the Cauchy-Schwartz inequalityand assumption (ii) of the theorem shows that || R n, || ∆ = o P (1) . For || R n, || ∆ we use empirical process theory toargue conditional on the training sample T j . In particular, Lemma 19.33 of van der Vaart (2000) applied to the classof functions F = { D ˆ η j ,δ − D η,δ } (which consists of one element) yields E ((cid:12)(cid:12) G n,j ( D ˆ η j ,δ − D η,δ ) (cid:12)(cid:12) (cid:12)(cid:12)(cid:12)(cid:12) T j ) . C log 2 n / + || D ˆ η j ,δ − D η,δ || (log 2) / By assumption (ii), the left hand side is o P (1) . Lemma 6.1 of Chernozhukov et al. (2018) may now be used to arguethat conditional convergence implies unconditional convergence, concluding the proof. S5 Theorem 4

Proof

Let P n,j denote the empirical distribution of the prediction set V j , and let G n,j denote the associated em-pirical process p n/J ( P n,j − P ) . For simplicity we denote a general parameter ψ with inﬂuence function D η , theproof applies equally to the direct and indirect effect parameters. By deﬁnition, the sum of the scores of the sub-models { m β , b α , g γ , ¯ u κ : ( β, α, γ, κ ) } at the last iteration of the TMLE procedure is equal to n − P ni =1 D ˜ η ( O i ) = o P ( n − / ) . Thus, we have ˆ ψ tmle δ = 1 J J X j =1 P n,j D ˜ η j + o P ( n − / ) . Thus, √ n ( ˆ ψ tmle δ − θ ) = G n ( D η − θ ) + R n, + R n, + o P ( n − / ) , where R n, = 1 √ J J X j =1 G n,j ( D ˜ η j − D η ) , R n, = √ nJ J X j =1 P ( D ˜ η j − θ ) . As in the proof of Theorem 3, Lemmas 4 and 5 together with the Cauchy-Schwartz inequality and the assumptions ofthe theorem shows that R n, = o P (1) .Since D ˜ η j depends on the full sample through the estimates of the parameters β of the logistic tilting models, theempirical process treatment of R n, needs to be slightly from that in the proof of Theorem 3. To make this dependenceexplicit, we introduce the notation D ˆ η j ,β = D ˜ η j and R n, ( β ) . Let F jn = { D ˆ η j ,β − D η : β ∈ B } . Because thefunction ˆ η j is ﬁxed given the training data, we can apply Theorem 2.14.2 of van der Vaart & Wellner (1996) to obtain E  sup f ∈F jn | G n,j f | (cid:12)(cid:12)(cid:12)(cid:12) T j  . || F jn || Z q N [ ] ( (cid:15) || F jn || , F jn , L ( P ))d (cid:15), where N [ ] ( (cid:15) || F jn || , F jn , L ( P )) is the bracketing number and we take F jn = sup β ∈ B | D ˆ η j ,β − D η | as an envelope forthe class F jn . Theorem 2.7.2 of van der Vaart & Wellner (1996) shows log N [ ] ( (cid:15) || F jn || , F jn , L ( P )) . (cid:15) || F jn || . This shows || F jn || Z q N [ ] ( (cid:15) || F jn || , F jn , L ( P ))d (cid:15) . Z s || F jn || + || F jn || (cid:15) d (cid:15) ≤ || F jn || + || F jn || / Z (cid:15) / d (cid:15) ≤ || F jn || + 2 || F jn || / . EPTEMBER

23, 2020

Since || F jn || = o P (1) , this shows sup f ∈F jn G n,j f = o P (1) for each j , conditional on T j . Thus sup β ∈ B R n, ( β ) = o P (1) . Lemmas 4 and 5 together with the Cauchy-Schwartz inequality and the assump-tions of the theorem show that R n, = o P (1) , concluding the proof of the theorem. S6 Additional results

Lemma 5 (Second order terms for modiﬁed treatment policies) . Let d ξ ( o ) denote d ν ( a, l, z )d P ( w ) , and let r ( z | , a, w ) denote p ( z | a, w ) , and let h ( z | , w ) denote p ( z | w ) . Let d ( a, w ) denote a modiﬁed treatment policy satisfying A1. Wehave P D η ,δ − ψ ( δ ) = Z (cid:18) gg dd − (cid:19) ( m − m ) b rg δ, d ξ (S6) − Z (cid:18) gg − (cid:19) (¯ u − ¯ u ) g δ, d ξ (S7) + Z (cid:18) gg − (cid:19) ( m − m ) b rg δ, d ξ (S8) − Z gg ( b − b )( v − v ) g δ, d ξ (S9) − Z (¯ u − ¯ u )( g δ, − g δ )d ξ (S10) and P D η ,δ − ψ ( δ ) = Z (cid:18) ee dd − (cid:19) ( m − m ) b hg δ, d ξ + Z gg ( b − b )( s − s ) g δ, d ξ − Z ( q − q )( g δ, − g δ )d ξ. EPTEMBER

23, 2020

Proof

Note that P S η ,δ − ψ ( δ ) = Z (cid:18) gg dd − (cid:19) ( m − m ) b rg δ, d ξ + Z ( m − m ) b rg δ, d ξ − Z ¯ u ( g δ − g δ, )d ξ − Z ¯ ug δ, d ξ + Z gg ( b − b ) v g δ, d ξ + Z gg ( u r − ¯ u ) g δ, d ξ + Z ¯ u g δ, d ξ = ( S

6) + Z (¯ u − ¯ u ) g δ, d ξ + Z ( m − m ) b rg δ, d ξ + Z gg ( b − b ) v g δ, d ξ + Z gg u rg δ, d ξ − Z gg ¯ u g δ, d ξ − Z ¯ u ( g δ − g δ, )d ξ = ( S − Z (cid:18) gg − (cid:19) (¯ u − ¯ u ) g δ, d ξ + Z gg m b rg δ, d ξ − Z gg mbrg δ, d ξ + Z ( m − m ) b rg δ, d ξ + Z gg ( b − b ) v g δ, d ξ − Z ¯ u ( g δ − g δ, )d ξ = ( S − ( S Z ( m − m ) b rg δ, d ξ + Z gg ( b − b ) v g δ, d ξ + Z gg ( m b + mb − mb − mb ) rg δ, d ξ − Z ¯ u ( g δ − g δ, )d ξ = ( S − ( S Z ( m − m ) b rg δ, d ξ + Z gg ( b − b ) v g δ, d ξ + Z gg ( m − m ) b rg δ, d ξ + Z gg ( b − b ) mrg δ, d ξ − Z ¯ u ( g δ − g δ, )d ξ = ( S − ( S

7) + ( S − ( S − Z ¯ u ( g δ − g δ, )d ξ. (S11)Using A1 we can change variables to obtain P S A, η ,δ = Z ¯ u ( g δ − g δ, )d ξ. The proof for ψ is analogous. This completes the proof of the theorem. EPTEMBER

23, 2020

Lemma 6 (Second order terms for exponential tilting.) . Deﬁne c ( w ) = { R a exp( δa ) g ( a | w ) } − , and let c ( w ) bedeﬁned analogously. Let b ( a ) = exp( δa ) . Using the same notation as in Lemma 4, we have P D η ,δ − ψ ( δ ) = Z (cid:18) gg dd − (cid:19) ( m − m ) b rg δ, d ξ − Z (cid:18) gg − (cid:19) (¯ u − ¯ u ) g δ, d ξ + Z (cid:18) gg − (cid:19) ( m − m ) b rg δ, d ξ − Z gg ( b − b )( v − v ) g δ, d ξ + Z (¯ u − ¯ u )( g δ, − g δ )d ξ − Z (cid:26) ( c − c ) Z b g ¯ u d κ Z b g d κ (cid:27) d ξ + Z (cid:26) ( c − c ) Z b ¯ u ( g − g )d κ (cid:27) d ξ, and P D η ,δ − ψ ( δ ) = Z (cid:18) ee dd − (cid:19) ( m − m ) b hg δ, d ξ + Z gg ( b − b )( s − s ) g δ, d ξ − Z ( q − q )( g δ, − g δ )d ξ − Z (cid:26) ( c − c ) Z b g ¯ q d κ Z b g d κ (cid:27) d ξ + Z (cid:26) ( c − c ) Z b ¯ q ( g − g )d κ (cid:27) d ξ. Proof

In this proof, (S11) is also valid. We have P S ,Aη ,δ − Z ¯ u ( g δ − g δ, )d ξ = P S ,Aη ,δ − Z ¯ u ( g δ − g δ, )d ξ + Z (¯ u − ¯ u )( g δ, − g δ )d ξ It thus remains to prove that P S ,Aη ,δ − Z ¯ u ( g δ − g δ, )d ξ = − Z (cid:26) ( c − c ) Z b g ¯ u d κ Z b g d κ (cid:27) d ξ + Z (cid:26) ( c − c ) Z b ¯ u ( g − g )d κ (cid:27) d ξ. EPTEMBER

23, 2020

We have P S ,Aη − Z ¯ u ( g δ − g δ, )d ξ = Z (cid:26)Z g ,δ g ¯ u g d κ − Z g ,δ g g d κ Z ¯ u g ,δ d κ + Z ( g ,δ − g δ )¯ u d κ (cid:27) d ξ = Z ( g ,δ g g ¯ u d κ − Z g δ ¯ u d κ + Z g ,δ ¯ u d κ (cid:20) − Z g ,δ g g d κ (cid:21)) d ξ = Z (cid:26) c Z b ¯ u g d κ − c Z b ¯ u g d κ + c Z b g ¯ u d κ Z ( c − c ) b g d κ (cid:27) d ξ = Z ( c − c ) (cid:26)Z b ¯ u g d κ − c Z b g ¯ u d κ Z b g d κ (cid:27) d ξ = Z ( c − c ) (cid:26)Z b ¯ u g d κ − c Z b g ¯ u d κ Z b g d κ − ( c − c ) Z b g ¯ u d κ Z b g d κ (cid:27) d ξ = Z ( − ( c − c ) Z b g ¯ u d κ Z b g d κ + ( c − c ) (cid:20)Z b ¯ u g d κ − Z b g ¯ u d κ (cid:21)) d ξ (S12) = Z (cid:26) − ( c − c ) Z b g ¯ u d κ Z b g d κ + ( c − c ) Z b ¯ u ( g − g )d κ (cid:27) d ξ, where (S12) follows from c R b g d κ = 1 . The proof for ψ is analogous. References C HERNOZHUKOV , V., C

HETVERIKOV , D., D

EMIRER , M., D

UFLO , E., H

ANSEN , C., N

EWEY , W. & R

OBINS , J. M.(2018). Double/debiased machine learning for treatment and structural parameters.

The Econometrics Journal ,C1–C68. VAN DER L AAN , M. J. & R

OSE , S. (2011).

Targeted Learning: Causal Inference for Observational and ExperimentalData . Springer Science & Business Media.

VAN DER V AART , A. W. (2000).

Asymptotic statistics , vol. 3. Cambridge university press.

VAN DER V AART , A. W. & W

ELLNER , J. A. (1996).

Weak convergence and empirical processes . Springer.. Springer.