[PDF] Estimation of separable direct and indirect effects in continuous time

Abstract

Many research questions involve time-to-event outcomes that can be prevented from occurring due to competing events. In these settings, we must be careful about the causal interpretation of classical statistical estimands. In particular, estimands on the hazard scale, such as ratios of cause specific or subdistribution hazards, are fundamentally hard to be interpret causally. Estimands on the risk scale, such as contrasts of cumulative incidence functions, do have a causal interpretation, but they only capture the total effect of the treatment on the event of interest; that is, effects both through and outside of the competing event. To disentangle causal treatment effects on the event of interest and competing events, the separable direct and indirect effects were recently introduced. Here we provide new results on the estimation of direct and indirect separable effects in continuous time. In particular, we derive the nonparametric influence function in continuous time and use it to construct an estimator that has certain robustness properties. We also propose a simple estimator based on semiparametric models for the two cause specific hazard functions. We describe the asymptotic properties of these estimators, and present results from simulation studies, suggesting that the estimators behave satisfactorily in finite samples. Finally, we re-analyze the prostate cancer trial from Stensrud et al (2020).

Full PDF

EE STIMATION OF SEPARABLE DIRECT AND INDIRECT EFFECTS INCONTINUOUS TIME

A P

REPRINT

Torben Martinussen

Department of Biostatistics, University of [email protected]

Mats Julius Stensrud

Department of Mathematics, Ecole Polytechnique Fédérale de [email protected] 1, 2020 A BSTRACT

Many research questions involve time-to-event outcomes that can be prevented from occurring due tocompeting events. In these settings, we must be careful about the causal interpretation of classicalstatistical estimands. In particular, estimands on the hazard scale, such as ratios of cause speciﬁc orsubdistribution hazards, are fundamentally hard to be interpret causally. Estimands on the risk scale,such as contrasts of cumulative incidence functions, do have a causal interpretation, but they onlycapture the total effect of the treatment on the event of interest; that is, effects both through and outsideof the competing event. To disentangle causal treatment effects on the event of interest and competingevents, the separable direct and indirect effects were recently introduced. Here we provide new resultson the estimation of direct and indirect separable effects in continuous time. In particular, we derivethe nonparametric inﬂuence function in continuous time and use it to construct an estimator that hascertain robustness properties. We also propose a simple estimator based on semiparametric modelsfor the two cause speciﬁc hazard functions. We describe the asymptotic properties of these estimators,and present results from simulation studies, suggesting that the estimators behave satisfactorily inﬁnite samples. Finally, we re-analyze the prostate cancer trial from Stensrud et al (2020). K eywords Separable effects · Competing events · Survival analysis · Hazard functions · Inﬂuence function

In survival analysis, the event of interest can be prevented from occurring due to a competing event. The presenceof competing events requires us to be careful about the interpretation of classical statistical estimands (Robins, 1986;Young et al., 2020). In particular, it is well-established that estimands on the hazard scale, such as cause speciﬁc hazardor subdistribution hazard ratios, do not have a causal interpretation unless we impose strong assumptions that usually areunreasonable (Robins, 1986; Hernán, 2010; Martinussen et al., 2018; Young et al., 2020). Yet the cumulative incidencefunction, which is deﬁned on the risk scale (Andersen et al., 2012), has a causal interpretation as the total effect on theevent of interest (Young et al., 2020).However, the total effect does not inform us about the mechanisms by which the treatment exerts effects on the event ofinterest. To illustrate this, suppose that we perfectly executed a randomized experiment in which 1000 patients receiveda cancer drug and 1000 patients received control. After 5 years, 250 patients died in the treatment arm and 500 died inthe control arm, and therefore the drug was successfully shown to reduce mortality after 5 years. However, the drug wasexcreted in the kidneys, and to assess a potential side effect, the investigators did an additional analysis in which kidneyfailure was the primary outcome. They found 250 kidney failures in the treatment arm and 100 kidney failures in theplacebo arm after 5 years. Two scientists debated the treatment effect on kidney events. As the drug was known to be a r X i v : . [ s t a t . M E ] A ug PREPRINT - S

EPTEMBER

1, 2020excreted in the kidneys, the ﬁrst scientist suspected that the increase in kidney events was a biological side effect. Thesecond scientist doubted this explanation, and claimed that the increase in kidney events occurred because the drugreduced mortality, and hence more subjects were at risk of developing kidney events in the treatment arm. Thus, eventhough the scientists could identify the cumulative incidences of mortality and kidney events, they were unable to agreeon the causal mechanism by which treatment causes kidney events.It has sometimes been suggested that the marginal distribution function, also called the net risk, can be used to assessthe causal effect of treatment on the event of interest outside of its effect on the competing event. However, interpretingthis estimand requires us to consider a hypothetical intervention to prevent the competing event from occurring, that is,to consider a controlled direct effect (Robins and Greenland, 1992; Young et al., 2020). This is problematic because thehypothetical intervention to prevent the competing event, which is death in our conceptual example above, is usuallyinfeasible in practice, and therefore the causal estimand is ill-deﬁned and is not scientiﬁcally interesting. In particular,the controlled direct effect cannot solve the problem raised by the two scientists in the example above.To describe the mechanism by which the treatment exerts effects on the outcome of interest, Stensrud et al. (2020b)recently introduced the separable direct and indirect effects, motivated by a treatment decomposition idea introducedby Robins and Richardson (2010) in a mediation setting (see also Didelez (2018)). The separable direct effect is thetreatment effect on the event of interest outside of its effect on the competing event, and the separable indirect effect isthe effect on the outcome of interest only through the competing event (Stensrud et al., 2020b). These effects add up tothe total effect, that is, the conventional cumulative incidence function, and thereby the separable effects describe themechanism by which the total effect arise. In particular, the discussion between the scientists above could be resolvedby considering the separable effects (Stensrud et al., 2020b).So far the theory on the separable effects has been restricted to settings in discrete time (Stensrud et al., 2020b) andfocused on identiﬁcation and interpretation. Here we deﬁne separable effects in continuous time and consider variousestimators. Importantly, we derive the nonparametric inﬂuence function for the separable effects that leads to a robustestimator. We describe the asymptotic properties of the estimator, assess its properties in a simulation study, andre-analyze the example on prostate cancer from Stensrud et al. (2020b).The manuscript is organized as follows. In Section 2, we describe the observed data structure, establish notation thatwill be used in subsequent sections and deﬁne the separable effects in continuous time. In Section 3, we derive thenonparametric inﬂuence function of the separable direct effect in continuous time, propose an estimator based on theinﬂuence function, often referred to as the one-step estimator, and describe its robustness to model mis-speciﬁcation.In Section 4, we study the asymptotic properties of the estimator when Cox proportional hazard models are used forestimation. We also describe the asymptotic properties of the one-step estimator. In Section 5, we give the efﬁcientinﬂuence function of the separable indirect effect. In Section 6, we assess the performance of the estimators in asimulation study and re-analyse the prostate cancer data from Stensrud et al. (2020b). In section 7, we provide adiscussion. Detailed calculations are given in the Appendix.

Suppose that we observe data from a randomized experiment in which i = 1 , . . . , n individuals are assigned adichotomous treatment A i . For each individual i , we measure data on a vector of covariates, W i , before treatmentassignment, and each i has an event time T i of type (cid:15) i ∈ { , } , where (cid:15) i = 1 denotes the event of interest ( Y ) and (cid:15) i = 2 the competing event D . We will hereby suppress the individual i subscript, because the random vector foreach individual is assumed to be drawn independently from a distribution common to all subjects. Because of lossto follow-up, we only observe ∆ = I ( T ≤ C ) and ˜ T = min ( T, C ) , where C denotes the right censoring time. Weassume that T and C are conditionally independent given ( A, W ) . To focus on the main ideas of this work and simplifythe derivations, we initially develop the theory without censoring. Thus, we ﬁrst assume that data for a given individualconsist of the vector X = ( T, (cid:15), A, W ) . However, the results immediately extend to settings with censoring, where thecensoring may depend on A and W , as we describe in more detail in Section 3. To deﬁne our estimands of interest – the separable effects – we consider a decomposition of A into two dichotomouscomponents, A Y and A D , analogous to Stensrud et al. (2020b): the A Y component exerts all its effects on the event ofinterest ( Y ) outside of the competing event ( D ), and A D exerts all its effect on Y through D . In the observed data,the treatment components are deterministically related in each individual, A = A Y = A D , but we will conceive ahypothetical experiment in which these components are assigned different values. This decomposition assumptionmotivates the deﬁnition of our separable direct and indirect effects. However, the separable effects can still be2 PREPRINT - S

EPTEMBER

1, 2020meaningful, even if a physical decomposition of the treatment is possible, e.g. if we can conceive a hypotheticaltreatment that operate in the same way as the A Y component of A , but does not exert effects on A D (Stensrud et al.,2020b,a). We use superscripts to denote counterfactuals, such that T a , is the event time when, possibly contrary to fact, A is set to a , and (cid:15) a indicates whether an event of interest Y ( (cid:15) a = 1 ) or a competing event D ( (cid:15) a = 2 ) occurred at T a . Similarly, T a Y ,a D and (cid:15) a Y ,a D denote counterfactual values under an intervention that sets A Y to a Y and A D to a D .Let P ( t, a Y , a D ) ≡ P ( T a Y ,a D ≤ t, (cid:15) a Y ,a D = 1) . Analogous to the results in Stensrud et al. (2020b) in discrete time, we can now deﬁne the separable direct effect oftreatment A at time t as P ( t, , a D ) vs. P ( t, , a D ) for a D ∈ { , } , and the separable indirect effect is deﬁned as P ( t, a Y , vs. P ( t, a Y , for a Y ∈ { , } . Note that pairs of separable direct and indirect effects sum to the total effect, which is equal to the classical cumulativeincidence function, that is, (cid:8) P ( t, , a ) − P ( t, , a ) (cid:9) + (cid:8) P ( t, − a, − P ( t, − a, (cid:9) = P ( t, , − P ( t, , for a ∈ { , } . Consider the additive separable direct effect, δ ( t, a D ) = P ( t, , a D ) − P ( t, , a D ) for a D ∈ { , } . To identify δ ( t, a D ) from the observed data, where A and the components A Y and A D are deterministically related,we impose the following conditions, which are continuous time analogues to the conditions in Stensrud et al. (2020b),see also Robins and Richardson (2010).We assume conditional exchangeability, that is, ( T a , (cid:15) a ) ⊥⊥ A | W for a ∈ { , } , which is a classical exchangeability condition that is expected to hold when treatment A is randomly assigned.Second, we assume consistency, such that if an individual has observed treatment A = a , then ( T a , (cid:15) a ) = ( T, (cid:15) ) , for a ∈ { , } . The consistency assumption ensures that the observed outcome is equal to the counterfactual outcomefor any individual that has observed data history consistent with a counterfactual scenario.Third, positivity such that f ( W = w ) > ⇒ Pr( A = a | W = w ) > for a ∈ { , } , (1) f ( T > t, W = w ) > ⇒ Pr( ˜

T > t, A = a | W = w ) > for a ∈ { , } and for all t < t ∗ , (2)where t ∗ denotes the end of follow-up and f generically denotes a density function. Note that (1) is the usual positivitycondition under interventions on A and (2) ensures that among those event-free through each follow-up time, there existindividuals with A = 1 and individuals with A = 0 that are uncensored.Finally, we impose dismissible component conditions. To introduce these conditions, let λ j ( t | A = a, W = w ) denotethe conditional cause speciﬁc hazard function with j = 1 , , denoting the j th cause, and let Λ j ( s | A = a, W = w ) = PREPRINT - S

EPTEMBER

1, 2020 (cid:82) t λ j ( s | A = a, W = w ) ds be the corresponding cumulative conditional cause speciﬁc hazard function. Similarly,let λ aj ( t | W = w ) and λ a Y ,a D j ( t | W = w ) be the counterfactual conditional cause speciﬁc hazard functions underinterventions on A and joint interventions on A Y and A D , respectively. Then, the dismissible components conditionsare ∆1 : λ a Y ,a D =11 ( t | W = w ) = λ a Y ,a D =01 ( t | W = w ) , a Y ∈ { , } , at all t , which states that a counterfactual hazards of the event of interest ( j = 1 ) are equal under all values of A D , and ∆2 : λ a Y =1 ,a D ( t | W = w ) = λ a Y =0 ,a D ( t | W = w ) , a D ∈ { , } , at all t , which states that a counterfactual hazard functions of the competing event ( j = 2 ) are equal under all values of A Y . Let P ( t, a Y , a D , w ) = (cid:90) t e − Λ ( s | A = a Y ,w ) − Λ ( s | A = a D ,w ) d Λ ( s | A = a Y , w ) ,δ ( t, a D , w ) = P ( t, , a D , w ) − P ( t, , a D , w ) . Note that under the treatment decomposition assumption (Robins and Richardson, 2010; Stensrud et al., 2020b,a) andthe identiﬁcation conditions in Section 2.3, the cumulative incidence function for Y under treatment a conditionalon W = w is given by P ( t, a, a, w ) , which is also denoted by F ( t | a, w ) . Then a continuous-time equivalent to theG-formula (Robins, 1986) in Stensrud et al. (2020b) is P ( T a Y ,a D ≤ t, (cid:15) a Y ,a D = 1) = (cid:90) P ( t, a Y , a D , w ) f ( W = w ) dw, which allows us to identify our parameter of interest, δ ( t, a D ) = E { δ ( t, a D , W ) } , from the observed data. Suppose we were willing to postulate (semi)parametric models for the cause speciﬁc hazard functions, such as Coxproportional hazards models. Then it would be straight forward to estimate δ ( t, a ) using ˆ δ ( t, a ) = ˆ P ( t, , a ) − ˆ P ( t, , a ) , where ˆ P ( t, , a ) = n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( s | A =1 ,W i ) − ˆΛ ( s | A = a,W i ) d ˆΛ ( s | A = 1 , W i ) (cid:27) , ˆ P ( t, , a ) = n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( s | A =0 ,W i ) − ˆΛ ( s | A = a,W i ) d ˆΛ ( s | A = 0 , W i ) (cid:27) , because the terms in P ( t, , a ) and P ( t, , a ) can easily be estimated using Cox-models for the two cause speciﬁchazard functions. That is, if λ j ( t | a, w ) = λ j ( t ) e β Tj l , with l = ( a, w T ) T , then ˆΛ j ( s | l ) is obtained from ˆΛ j ( t ) e ˆ β Tj l ,which can be estimated from a Cox regression analysis. Asymptotic properties can also be derived using the approachof Chen et al. (2010), which we return to in Section 4.1.However, using such semiparametric regression models for the cause speciﬁc hazard functions may lead to biasedresults if these models are misspeciﬁed. Therefore we will provide more general results, based on semiparametrictheory (van der Vaart, 2000; van der Laan and Robins, 2003), which leads us to estimators with desirable propertiessuch as semiparametric efﬁciency and certain robustness (Bang and Robins, 2005). The primary tool to ﬁnding theseestimators is to derive the so-called efﬁcient inﬂuence function, see van der Vaart (2000), which we do in Section 3.4 PREPRINT - S

EPTEMBER

1, 2020

In this section we give the efﬁcient inﬂuence function for the target parameter parameter ψ t ( P ) = δ ( t,

1) = E { P ( t, , , W ) } − E { P ( t, , , W ) } , where we use P to denote the probability measure from which we observe Z = ( T, (cid:15), A, W ) . Note that it is possible torecode the treatment variable A , that is, interchanging the two levels 0 and 1, and thus we can restrict our attention to δ ( t, without loss of generality. The corresponding separable indirect effect is P ( t, , − P ( t, , , which wedecribe in more detail in Section 5.We impose no structure on P and show in the Appendix that the efﬁcient inﬂuence function is ˜ ψ ( t, Z )= (cid:8) N ( t ) − P ( t, , , W ) (cid:9) I ( A = 1) P ( A = 1 | W ) − (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) dN ( s ) − P ( t, , , W ) (cid:9) I ( A = 0) P ( A = 0 | W ) − (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM T ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) − dM T ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) (cid:21) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) , where N j ( t ) = I ( T ≤ t, (cid:15) = j ) is the j th speciﬁc counting process and M Tj ( t | a, w ) is the corresponding countingprocess martingale given A = a, W = w , i.e., M Tj ( t | a, w ) = N j ( t ) − (cid:82) t I ( s ≤ T ) d Λ j ( s | a, w ) . We may furtherrewrite the efﬁcient inﬂuence function in terms of the counting process martingales, see the Appendix for further details, ˜ ψ ( t, Z ) = (cid:90) h ( s, t, A, W ) dM T ( s | A, W ) + (cid:90) h ( s, t, A, W ) dM T ( s | A, W )+ δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) (3)where h ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:26) − { F ( t | A, W ) − F ( s, | A, W ) (cid:9) P ( T > s | A, W ) (cid:27) ,h ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) P ( T > s | A, W ) (cid:26) P ( t, , , W ) − P ( s, , , W ) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:8) F ( t | A, W ) − F ( s, | A, W ) (cid:9)(cid:27) ,g ( A, W ) = AP ( A = 1 | W ) − − AP ( A = 0 | W ) . We remind the reader that C denotes the potential censoring time, ˜ T = T ∧ C and ∆ = I ( T ≤ C ) . We can immediatelygeneralize (3) to allow for censoring, as we have imposed no structure on P (Tsiatis, 2006, formula 10.76): the efﬁcientinﬂuence function based on the observed data D = ( ˜ T , ∆ , ∆ (cid:15), A, W ) is given by ψ ( t, D ) = ˜ ψ ( t, Z )∆ K C ( T | A, W ) + (cid:90) ∞ L ( s, A, W ) K C ( s | A, W ) dM C ( s | A, W ) , (4)where we let Λ C ( s | A, W ) = (cid:82) s λ C ( u | A, W ) du denote the cumulative censoring hazard function, K C ( s | A, W ) = e − Λ C ( s | A,W ) is the corresponding survival function, and M C ( t | A, W ) = N C ( t ) − (cid:90) t I ( s ≤ ˜ T ) d Λ C ( s | A, W ) is the martingale associated with the censoring counting process N C ( t ) = I ( ˜ T ≤ t, ∆ = 0) using the ﬁltration wherewe include A and W . In (4), L ( s, A, W ) = E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W (cid:9) . (5)Following Lemma A.2 of Lu and Tsiatis (2008), this is easily generalized to the competing risk setting considered here,we can express (4) in terms of the counting process martingales based on the observed data. We get that the efﬁcientinﬂuence function based on the observed data can be written as ψ ( t, D ) = φ ( t, D ) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) , (6)5 PREPRINT - S

EPTEMBER

1, 2020where φ ( t, D ) = (cid:90) h ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) + (cid:90) h ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) and where M j ( t | a, w ) , j = 1 , , are the observed counting process martingales given A = a, W = w , i.e., for j = 1 , , M j ( t | a, w ) = ∆ I ( ˜ T ≤ t, (cid:15) = j ) − (cid:90) t I ( s ≤ ˜ T ) d Λ j ( s | a, w ) . The one-step estimator based on the efﬁcient inﬂuence function is thus given by ˆ δ e ( t,

1) = n − n (cid:88) i =1 (cid:26) ˆ δ ( t, , W i ) + ˆ φ ( t, D i ) (cid:27) , (7)where ˆ δ ( t, , W i ) is deﬁned analogously to δ ( t, , W i ) , except that the unknown quantities are replaced with estimatedcounterparts, and similarly for ˆ φ . This part requires working models, and in Section 3.1 we describe how robust theresulting estimator is to mis-speciﬁcation of these working models. Note also that the one-step estimator is equal to thesimple estimator ˆ δ ( t, plus an augmentation term. We argue now that the estimator ˆ δ e ( t, given in (7) has certain robustness properties unlike the initial estimator ˆ δ ( t, based on Cox-models for the cause speciﬁc hazard functions. Consider ﬁrst the setting where there is no censoring,i.e., we are then basing estimation on the efﬁcient inﬂuence function ˜ ψ ( t, Z ) given in (3). Let H denote the unknownparameters that goes into the efﬁcient inﬂuence function ˜ ψ ( t, Z ) , i.e. H = { Λ ( ·| A, W ) , Λ ( ·| A, W ) , P ( A = 1 | W ) } .Our estimator in this case is then the solution to n − (cid:80) i ˜ ψ ( t, Z i , H n ) , where H n is an estimator of H . We showin the Appendix that the resulting estimator is consistent if two out of the three possible working models are correctlyspeciﬁed. Now consider the setting where we allow for censoring. Let G denote the unknown parameters that goes intothe efﬁcient inﬂuence function, i.e. G = { Λ ( ·| A, W ) , Λ ( ·| A, W ) , P ( A = 1 | W ) , Λ C ( ·| A, W ) } so there are now fourmodels, which we denote (i) to (iv) in the order indicated in the deﬁnition of G . We show in the Appendix that ˆ δ e ( t, is consistent if the following models are correctly speciﬁed: (i) and (ii), or (i), (iii) and (iv), or (ii), (iii) and (iv). Hence,in a randomized study with the censoring being independent of W then we obtain consistency if one of the two causespeciﬁc hazard models is correctly speciﬁed, but not necessarily both. In this subsection we assume that the cause speciﬁc hazard functions are on Cox proportional hazards form, i.e. thatthese models are correctly speciﬁed. Speciﬁcally, let Λ j ( t | a, w ) = Λ j ( t ) e β jA a + β jW w , j = 1 , , and ˆ P ( t, a Y , a D ) = n − (cid:88) i ˆ P ( t, a Y , a D , W i ) , where ˆ P ( t, a Y , a D , W i ) is calculated using the estimates from ﬁtting separate Cox regression models with the event ofinterest and the competing event as dependent variable. We show in the Appendix that n / (cid:110) ˆ P ( t, a Y , a D ) − P ( t, a Y , a D ) (cid:111) = (cid:88) i (cid:15) P i ( t, a Y , a D ) , where (cid:15) P i ( t, a Y , a D ) are zero-mean iid terms (i.e. the inﬂuence function) so that ˆ P ( t, a Y , a D ) is a RAL estimator(Tsiatis, 2006) as long as the speciﬁed Cox proportional hazards models for the cause speciﬁc hazard functions arecorrectly speciﬁed. In the Appendix, we also give further details on how to estimate the inﬂuence function enhancingestimation of the variance of the estimator. 6 PREPRINT - S

EPTEMBER

1, 2020

Let ˜ φ ( t, D, G ) = δ ( t, , W, G ) + φ ( t, D, G ) . so that efﬁcient inﬂuence function ψ ( t, D, G ) is re-expressed as ˜ φ ( t, D, G ) − δ ( t, . If G is known then n − (cid:88) i (cid:8) ˜ φ ( t, D i , G ) − ˆ δ e ( t, (cid:9) = n − (cid:88) i (cid:8) ˜ φ ( t, D i , G ) − δ ( t, (cid:9) − { ˆ δ e ( t, − δ ( t, } from which we see that ˆ δ e ( t, has inﬂuence function ψ ( t, D, G ) , ie, the efﬁcient inﬂuence function. Thus, in thiscase, ˆ δ e ( t, is a semiparametrically efﬁcient RAL estimator (Tsiatis, 2006). In reality G is not known and needs tobe estimated. Let G n be such an estimator of G . The proposed estimator ˆ δ e ( t, solves n − (cid:88) i (cid:8) ˜ φ ( t, D i , G n ) − ˆ δ e ( t, (cid:9) and, therefore, n / { ˆ δ e ( t, − δ ( t, } = n − / (cid:88) i (cid:8) ˜ φ ( t, D i , G ) − δ ( t, (cid:9) + n − / (cid:88) i (cid:8) ˜ φ ( t, D i , G n ) − ˜ φ ( t, D i , G ) (cid:9) = n − / (cid:88) i ψ ( t, D i , G ) + n − / (cid:88) i (cid:8) ˜ φ ( t, D i , G n ) − ˜ φ ( t, D i , G ) (cid:9) = n − / (cid:88) i ψ ( t, D i , G ) + En / (cid:8) ˜ φ ( t, D, G n ) − ˜ φ ( t, D, G ) (cid:9) + o p (1) , following Chen et al. (2010). The expectation on the right hand side of the latter display is taken w.r.t. to D . Based oncorrectly speciﬁed models G n one may then derive the true analytical form of the inﬂuence function of ˆ δ e ( t ) in whichcase ˆ δ e ( t, is still a semiparametrically RAL estimator. The expression of the inﬂuence function depends on thespeciﬁc chosen working models G n similar to the development in Section 4.1, however. Instead we recommend usingthe non-parametric bootstrap procedure to estimate the variance of ˆ δ e ( t ) similar to what has been advocated in forinstance van der Laan and Rubin (2006) and van der Laan and Rose (2011). Later, we discuss an alternative approachfor estimation of the variance of ˆ δ e ( t ) . Analogous to the results in Section 3, we get the efﬁcient inﬂuence function for the separable indirect effect, P ( t, , − P ( t, , , which can be written as a function of the observed data, ψ I ( t, D ) = φ I ( t, D ) + P ( t, , , W ) − P ( t, , , W ) − (cid:8) P ( t, , − P ( t, , (cid:9) , (8)where φ I ( t, D ) = (cid:90) h I ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) + (cid:90) h I ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) , with h I ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) (cid:26) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:27) (cid:26) − { F ( t | A, W ) − F ( s, | A, W ) (cid:9) P ( T > s | A, W ) (cid:27) ,h I ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) P ( T > s | A, W ) (cid:20) P ( t, , , W ) − P ( s, , , W ) − (cid:26) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:27) (cid:8) F ( t | A, w ) − F ( s, | A, W ) (cid:9)(cid:21) , and M j ( t | a, w ) , j = 1 , , are the observed counting process martingales given A = a, W = w .7 PREPRINT - S

EPTEMBER

1, 2020 ˆ δ ( t, We ﬁrst consider the performance of the estimator ˆ δ ( t, that is based on using Cox-proportional hazards models forthe two cause speciﬁc hazard functions. Clearly, this estimator is only consistent if the proportional cause speciﬁchazards models are correctly speciﬁed. To generate data we used the cause speciﬁc hazard functions λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w with λ ( t ) = 0 . , β A = − log (2) , β W = 0 . , and with λ ( t ) = 0 . , β A = − , β W = 0 . .Treatment indicator A was generated with P ( A = 1) = 0 . , and the covariate W was uniform on (0 , . Censoringwas generated according to the minimum of 7 and an exponentially distribution with mean 12. We then appliedthe estimators ˆ P ( t, , , ˆ P ( t, , and ˆ δ ( t,

1) = ˆ P ( t, , − ˆ P ( t, , and their corresponding standard errorestimators, all calculated at time points 2, 4 and 6. Results are summarized in Table 1. Each entry in the table is basedon 1000 replicates. Table 1 about hereBoth the estimator and its corresponding standard error estimator behave satisfactorily (Table 1). At the early timepoints the coverage is slightly less than nominal for P ( t, , and P ( t, , when n = 400 but improves for n = 800 . ˆ δ e ( t, We now assess the performance of the estimator ˆ δ e ( t, given in (7), which is derived from the efﬁcient inﬂuencefunction. We investigate the robustness properties of this estimator. We also report results for the simple estimator ˆ δ ( t, The exposure A is binary and the covariate W is uniform on (0 , . To be able to compute ˆ δ e ( t, we needworking models for the two cause speciﬁc hazard models, the propensity score and the censoring hazard function.We used Cox proportional hazard models for the two cause speciﬁc hazard models with main effects of A and W , alogistic regression model with main effects of W for the propensity score, and a Cox proportional hazards model for thecensoring hazard function with effect of A only. Let L = I ( W > / , λ ( t ) = 0 . , λ ( t ) = 0 . , β A = − log (5) , β A = 0 , β W = log (2) β W = 0 . . Censoring times were generated as C = min ( ˜ C, where ˜ C wasgenerated using the hazard function λ ˜ C ( t | A, W ) speciﬁed below. We used a sample size of n = 400 and simulateddata from the following different scanarios.A1 All models are correctly speciﬁed. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 . A2 All models are correctly speciﬁed except the censoring model. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 e . W . B1 The cause speciﬁc hazard models and the censoring model are correctly speciﬁed, but the propensity scoremodel is not. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = 0 . L + 0 . − L ) , λ ˜ C ( t | A, W ) = 12 . B2 The cause speciﬁc hazard models are correctly speciﬁed, but the propensity score model and the censoringmodel are not. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = 0 . L + 0 . − L ) , λ ˜ C ( t | A, W ) = 12 e . W . PREPRINT - S

EPTEMBER

1, 2020C1 λ ( t | A = a, W = w ) is a proportional hazard, but λ ( t | A = a, W = w ) is not. The propensity scoremodel and the censoring model are correctly speciﬁed: λ ( t | A = a, W = w ) = (1 − a ) λ ( t ) e β A a + β W w + aλ ( t ) e β A L − β A (1 − L )+ β W w λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 . C2 λ ( t | A = a, W = w ) is a proportional hazard, but λ ( t | A = a, W = w ) is not. The propensity scoremodel is correctly speciﬁed but the censoring model is not: λ ( t | A = a, W = w ) = (1 − a ) λ ( t ) e β A a + β W w + aλ ( t ) e β A L − β A (1 − L )+ β W w λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 e . W . We used 250 bootstrap replicates to calculate the bootstrap estimate of the variability of ˆ δ e ( t, . We also calculated anestimator of the variability based on the squared efﬁcient inﬂuence function.Table 2 about hereThe results are summarized in Table 2, where each entry in the table based on 1000 replicates. We see that bothestimators are consistent under scenario A1 and A2, and that the simple estimator is slightly more efﬁcient, as expectedsince both cause speciﬁc hazard functions are proportional. In Scenario B1 and B2, where the two cause speciﬁc hazardsmodels are correctly speciﬁed but the propensity score model is mis-speciﬁed, both estimators are consistent. Underscenario C1, where only λ ( t | A, W ) is a proportional hazard, the simple estimator is biased whereas the one basedon the efﬁcient inﬂuence function is still consistent, as both the censoring and propensity score models are correctlyspeciﬁed. Under scenario C2, where the censoring model is mis-speciﬁed, the estimator based on the efﬁcient inﬂuencefunction is now slightly biased, and the the simple estimator suffer from more severe bias. To illustrate the new estimators, we used data from a randomized trial on prostate cancer therapy (Byarand Green, 1980), which were also analyzed in Stensrud et al. (2020b) and are publicly available to anyone(http://biostat.mc.vanderbilt.edu/DataSets). We restricted our analysis to the patients who received placebo (127patients) and high-dose DES (125 patients). We included baseline measurements of daily activity function (binary), age(centered around its mean), hemoglobin level (centered around its mean) and previous cardiovascular disease (binary)in our analysis. We considered death due to prostate cancer as the event of interest and death due to other causes(consisting primarily of cardiovascular deaths) as the competing event.The events were recorded in monthly intervals from randomization. We used Cox proportional hazards models to obtainthe following hazard ratio estimates of the two cause speciﬁc hazards (comparing treatment to placebo): 0.74 (95%CI: 0.45, 1.21) for the primary cause and 1.17 (95% CI: 0.82, 1.66) for the competing cause. However, these hazardratio estimates cannot be interpreted causally (Young et al., 2020; Stensrud et al., 2020b). Yet point estimates of thecumulative incidence curves suggests that the treatment reduces the risk of death death to prostate cancer (Figure 1, leftdisplay), but increases the risk of death due to other causes (Figure 1, right display).Figure 1 about hereTo disentangle the causal treatment effect on the risk of dying from prostate cancer and from competing events,we therefore estimated the separable direct δ ( t, using the proposed ˆ δ e ( t, (Figure 2, left display). The pointestimates suggest a beneﬁcial separable direct of the prostate cancer therapy (although the conﬁdence bands cover0). For example, the separable direct effect after 40 months is estimated to be approximately -0.09, that is, ˆ δ e ( t =40 ,

0) = − .

09 (95%

CI: − . , − . , suggesting that a component of treatment reduces the risk of death dueto prostate cancer. As discussed in Stensrud et al. (2020b) this is supported by the biological argument that DESprevents the male testicles from producing testosterone, which, in turn, may prevent prostate cancer cells fromreplicating. On the other hand, the separable indirect effect (Figure 2, right display) is estimated to be approximately − . CI: − . , − . at t = 40 , suggesting that there is a component of DES that may increase the risk of9 PREPRINT - S

EPTEMBER

1, 2020death due to other causes, potentially because DES includes estrogen that may increase cardiovascular risk (Stensrudet al., 2020b). Thus, given that our identiﬁability conditions hold, the point estimates suggests that there is potentialfor improving the prostate cancer treatment by providing a new (modiﬁed) treatment that does not exert effects ondeath due to other causes. Indeed, such treatments already exist: Luteinising Hormone Releasing Hormone (LHRH)antagonists or orchidectomy (castration) are frequently used to suppress testosterone production in prostate cancerpatients today, and these treatments do not contain estrogen.Figure 2 about here

The separable direct and indirect effects clarify the causal interpretation of treatment effects in competing event settings(Stensrud et al., 2020b), which are ubiquitous in medicine and epidemiology. Our new results enable researchers toestimate these effects using classical models statistical models for survival analysis, such as Cox proportional hazardsmodels. Moreover, by deriving the nonparametric efﬁcient inﬂuence function, we have obtained a one-step estimator ofthe separable effects. Alternatively one may estimate the parameter using Targeted Maximum Likelihood Estimation(TMLE), see van der Laan and Rubin (2006) and van der Laan and Rose (2011). In the Appendix, we give calculationsto carry out TMLE for the speciﬁc parameter considered in this paper.The estimator derived from the efﬁcient inﬂuence function has certain desirable robustness properties, allowing someof the working models to be misspeciﬁed. It is, however, not straightforward to estimate the variance of the resultingestimator, because it depends on the working models that may contribute to the variability of the estimator. This is awell known problem, see for instance Moore and van der Laan (2009) and van der Laan and Rose (2011). In this article,we use the non-parametric bootstrap to estimate the variance. Alternatively, one may follow the more complicated routeoutlined in Benkeser et al. (2017) that gives a detailed description of the asymptotic linearity of a similar estimator,although in a simpler setting than the one considered in this paper. Extending the method of Benkeser et al. (2017) tothe setting considered in the present paper is a topic for future research.10

PREPRINT - S

EPTEMBER

1, 2020

References

Andersen, P. K., Geskus, R. B., de Witte, T., and Putter, H. (2012). Competing risks in epidemiology: possibilities andpitfalls.

International journal of epidemiology , 41(3):861–870.Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models.

Biometrics ,61(4):962–973.Benkeser, D., Carone, M., Laan, M. J. V. D., and Gilbert, P. B. (2017). Doubly robust nonparametric inference on theaverage treatment effect.

Biometrika , 104(4):863–880.Byar, D. and Green, S. (1980). The choice of treatment for cancer patients based on covariate information.

Bulletin ducancer , 67(4):477–490.Chen, L., Lin, D. Y., and Zeng, D. (2010). Attributable fraction functions for censored event times.

Biometrika ,97(3):713–726.Didelez, V. (2018). Deﬁning causal meditation with a longitudinal mediator and a survival outcome.

Lifetime dataanalysis , pages 1–18.Hernán, M. A. (2010). The hazards of hazard ratios.

Epidemiology (Cambridge, Mass.) , 21(1):13.Lu, X. and Tsiatis, A. A. (2008). Improving the efﬁciency of the log-rank test using auxiliary covariates.

Biometrika ,95(3):679–694.Martinussen, T., Vansteelandt, S., and Andersen, P. K. (2018). Subtleties in the interpretation of hazard ratios. arXivpreprint arXiv:1810.09192 , page 0.Moore, K. L. and van der Laan, M. J. (2009). Covariate adjustment in randomized trials with binary outcomes: Targetedmaximum likelihood estimation.

Statistics in Medicine , 28(1):39–64.Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—applicationto control of the healthy worker survivor effect.

Mathematical modelling , 7(9-12):1393–1512.Robins, J. M. and Greenland, S. (1992). Identiﬁability and exchangeability for direct and indirect effects.

Epidemiology ,pages 143–155.Robins, J. M. and Richardson, T. S. (2010). Alternative graphical causal models and the identiﬁcation of direct effects.

Causality and psychopathology: Finding the determinants of disorders and their cures , pages 103–158.Stensrud, M. J., Hernán, M. A., Tchetgen, E. J. T., Robins, J. M., Didelez, V., and Young, J. G. (2020a). Generalizedinterpretation and identiﬁcation of separable effects in competing event settings. arXiv preprint arXiv:2004.14824 .Stensrud, M. J., Young, J. G., Didelez, V., Robins, J. M., and Hernán, M. A. (2020b). Separable effects for causalinference in the presence of competing events.

Journal of the American Statistical Association , (just-accepted):1–23.Tsiatis, A. (2006).

Semiparametric Theory and Missing Data . Springer New York.van der Laan, M. and Robins, J. M. (2003).

Uniﬁed methods for censored longitudinal data and causality . SpringerScience & Business Media.van der Laan, M. and Rubin, D. (2006). Targeted maximum likelihood learning.

The International Journal ofBiostatistics .van der Laan, M. J. and Rose, S. (2011). Targeted learning.

Springer Series in Statistics .van der Vaart, A. W. (2000).

Asymptotic statistics , volume 3. Cambridge university press.Young, J. G., Stensrud, M. J., Tchetgen Tchetgen, E. J., and Hernán, M. A. (2020). A causal framework for classicalstatistical estimands in failure-time settings with competing events.

Statistics in Medicine .11

PREPRINT - S

EPTEMBER

1, 2020

We consider the case where we observe the full data Z = ( T, (cid:15), A, W ) , that is, no individual is censored due to loss tofollow-up. Once we have the efﬁcient inﬂuence function for the full data case then it is easy to get it for the observeddata case as described in Section 2. We calculate the efﬁcient inﬂuence function through the following steps.(i) Calculate the efﬁcient inﬂuence for ψ ( P ) = Λ ( t | a, w ) .(ii) Calculate the efﬁcient inﬂuence for ψ ( P ) = Z ( t, w ) = exp {− Λ ( t | a, w ) } exp {− Λ ( t | a ∗ , w ) } . (iii) Calculate the efﬁcient inﬂuence for ψ ( P ) = F ( t | a ∗ , w ) . (iv) Calculate the efﬁcient inﬂuence for ψ ( P ) = (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) . (v) Calculate the efﬁcient inﬂuence for ψ ( P ) = P ( t, a ∗ , a ) = (cid:90) (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) f ( w ) dw. (vi) Calculate the efﬁcient inﬂuence for ψ ( P ) = δ ( t,

1) = P ( t, a ∗ = 1 , a = 1) − P ( t, a ∗ = 0 , a = 1) , which is the direct effect we are interested in.Since we have posed no structure on the distribution of ( T, (cid:15), A, W ) we may use the parametric submodel P v given by f v ( x ) = f ( x ) { v · g ( x ) } , when doing the efﬁcient inﬂuence function calculations. In the latter display, g denotes a zero-mean function with ﬁnitesecond moment. We need to ﬁnd ˙ ψ ( g ) = (cid:18) ∂ψ { P v } ∂v (cid:19) | v =0 , which we write as ˙ ψ ( g ) = (cid:90) ˜ ψgdP, and then our efﬁcient inﬂuence function is ˜ ψ . (i) We need to write Λ ( t | a, w ) ﬁrst as a function of the underlying probability measure. This is done by noting that P ( (cid:15) = 2 | T = s, A = a, w ) = λ ( s | A = a, w ) λ ( s | A = a, w ) + λ ( s | A = a, w ) and therefore Λ ( t | a, w ) = (cid:90) t f ( s, (cid:15) = 2 , a, w ) P ( T > s, a, w ) ds (9)and, also that P ( T > s, a, w ) = (cid:88) (cid:15) =1 (cid:90) ∞ s f ( u, (cid:15), a, w ) du so in this way we have expressed Λ ( t | a, w ) in terms of the underlying density function f . We parametrise the densityfunction f v ( x ) = f ( x ) { vg ( x ) } , PREPRINT - S

EPTEMBER

1, 2020and calculate ∂∂v (Λ v ( t | a, w )) | v =0 where we in (9) replace f with f v . This leads to ∂∂v (Λ v ( t | a, w )) | v =0 = (cid:90) I ( s < t, (cid:15) = 2) I ( A = a, w ) P ( A = a, w ) 1 P ( T > s | A, w ) (cid:40) g ( s, (cid:15), A, w ) − (cid:80) (cid:15) =1 (cid:82) ∞ s g ( u, ˜ (cid:15), A, w ) f ( u, ˜ (cid:15), A, w ) duP ( T > s, A, w ) (cid:41) f ( s, (cid:15), A, w ) dx, (10)with x = ( s, (cid:15), A, w ) . So we can directly read off that the ﬁrst term in the latter display contributes with I ( T < t, (cid:15) = 2) I ( A = a, w ) P ( A = a, w ) exp {− Λ( T | a, w ) } = I ( A = a, w ) P ( A = a, w ) (cid:90) t P ( T > s | a, w ) dN ( s ) to the efﬁcient inﬂuence function. By changing order of integration, it may be seen that second term in (10) contributes I ( A = a, w ) P ( A = a, w ) (cid:90) t P ( T > s | a, w ) I ( s ≤ T ) λ ( s | a, w ) ds so all in all, we have that ∂∂v (Λ v ( t | a, w )) | v =0 = (cid:90) ˜ ψgdP (11)where ˜ ψ ( T, (cid:15), A, w ) = I ( A = a, w ) P ( A = a | w ) P ( w ) (cid:90) t P ( T > s | a, w ) dM ( s | a, w ) (12)with M ( t | a, w ) = N ( t ) − (cid:90) t I ( s ≤ T ) λ ( s | a, w ) ds being the martingale associated with the cause 2 counting process (where we condition on A = a and W = w ). (ii) Taking h ( x, y ) = e − x + y and by simple differentiation, we get ˜ ψ ( T, (cid:15), A, w ) = Z ( t, w ) (cid:20) I ( A = a ∗ ) P ( A = a ∗ | w ) (cid:90) t P ( T > s | a ∗ , w ) dM ( s | a ∗ , w ) − I ( A = a ) P ( A = a | w ) (cid:90) t P ( T > s | a, w ) dM ( s | a, w ) (cid:21) I ( W = w ) P ( w ) (13) (iii) We use L = ( A, W T ) T . F ( t | l ) = (cid:90) t P ( T > s | l ) λ ( s | l ) ds We seek to write this parameter as a function of the underlying probability measure, ψ ( P ) . The density function is f ( t, (cid:15), l ) = P ( T > t | l ) { λ ( s | l ) } I ( (cid:15) =1) { λ ( s | l ) } I ( (cid:15) =2) f ( l ) and we can therefore write F ( t | l ) as F ( t | l ) = (cid:90) I ( s < t, (cid:15) = 1) I ( L = l ) P ( L = l ) f ( s, (cid:15), l ) d ( s, (cid:15), l ) = ψ ( P ) We now parametrise the density function f v ( x ) = f ( x ) { vg ( x ) } , where x = ( s, (cid:15), l ) . So we ﬁrst need to ﬁnd (cid:16) ∂ψ ( P v ) ∂v (cid:17) | v =0 , which essentially amounts to ﬁnd ∂∂v (cid:18) f v ( s, (cid:15), l ) P v ( L = l ) (cid:19) | v =0 PREPRINT - S

EPTEMBER

1, 2020It is easily seen that ∂∂v ( P v ( L = l )) | v =0 = E { g ( X ) | L = l } P ( L = l ) ∂∂v ( f v ( s, (cid:15), l )) | v =0 = g ( x ) f ( x ) giving that (cid:18) ∂ψ ( P v ) ∂v (cid:19) | v =0 = (cid:90) I ( s < t, (cid:15) = 1) I ( L = l ) P ( L = l ) { g ( x ) − E ( g ( x ) | L = l ) } f ( x ) dx. Further, (cid:90) I ( s < t, (cid:15) = 1) I ( L = l ) P ( L = l ) { E ( g ( x ) | L = l ) } f ( x ) dx = E { I ( T < t, (cid:15) = 1) I ( L = l ) P ( L = l ) E ( g ( x ) | L ) } = EE ( | L )= E (cid:18) g ( X ) I ( L = l ) P ( L = l ) E { I ( T < t, (cid:15) = 1) | L } (cid:19) giving us that (cid:18) ∂ψ ( P v ) ∂v (cid:19) | v =0 = (cid:90) g ( x ) I ( L = l ) P ( L = l ) [ I ( s < t, (cid:15) = 1) − E { I ( T < t, (cid:15) = 1) | L = l } ] f ( x ) dx = (cid:90) ˜ ψgdP, where ˜ ψ ( T, (cid:15), A = a ∗ , w ) = I ( A = a ∗ , w ) P ( A = a ∗ | w ) P ( w ) { N ( t ) − F ( t | a ∗ , w ) } , (14)is the efﬁcient inﬂuence function. (iv) By looking at the transform (cid:90) x ( s ) dy ( s ) we obtain that the efﬁcient inﬂuence function of ψ ( P ) = (cid:82) t Z ( s, w ) dF ( s | a ∗ , w ) consists of the sum of the terms inthe following two displays (cid:20) I ( A = a ∗ ) P ( A = a ∗ | w ) (cid:90) t (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) 1 P ( T > u | a ∗ , w ) dM ( u | a ∗ , w ) − I ( A = a ) P ( A = a | w ) (cid:90) t (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) 1 P ( T > u | a, w ) dM ( u | a, w ) (cid:21) I ( W = w ) P ( w ) (15)and I ( A = a ∗ , w ) P ( A = a ∗ | w ) P ( w ) (cid:90) t Z ( s, w ) { dN ( s ) − dF ( s | a ∗ , w ) } . (16) (v) We obtain directly that the efﬁcient inﬂuence function of ψ ( P ) = (cid:90) (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) f ( w ) dw is (cid:90) t (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) (cid:20) dM ( u | a ∗ , w ) P ( T > u | a ∗ , w ) I ( A = a ∗ ) P ( A = a ∗ | w ) − dM ( u | a, w ) P ( T > u | a, w ) I ( A = a ) P ( A = a | w ) (cid:21) − I ( A = a ∗ ) P ( A = a ∗ | w ) (cid:90) t Z ( s, w ) { dN ( s ) − dF ( s | a ∗ , w ) } + (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) − E (cid:26)(cid:90) t Z ( s, W ) dF ( s | a ∗ , W ) (cid:27) (17)Note that (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) = P ( t, a ∗ , a, w ) − P ( u, a ∗ , a, w ) . PREPRINT - S

EPTEMBER

1, 2020 (vi)

We can now use the result in (v) to get the efﬁcient inﬂuence function of ψ ( P ) = δ ( t,

1) = P ( t, a ∗ = 1 , a = 1) − P ( t, a ∗ = 0 , a = 1) since both terms on the right hand side of the latter display are special cases of (v). This gives the desired efﬁcientinﬂuence function ˜ ψ ( t, T, (cid:15), A, W ) (cid:8) N ( t ) − P ( t, , , W ) (cid:9) I ( A = 1) P ( A = 1 | W ) − (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) dN ( s ) − P ( t, , , W ) (cid:9) I ( A = 0) P ( A = 0 | W ) − (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) − dM ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) (cid:21) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) (18)where, as earlier deﬁned, δ ( t, , W ) = P ( t, , , W ) − P ( t, , , W ) . Note also that P ( t, , , W ) = F ( t | A = 1 , W ) , the cumulative incidence function for cause 1 given A = 1 and W , and that δ ( t,

1) = E (cid:8) δ ( t, , W ) (cid:9) is the parameter of interest.The efﬁcient inﬂuence function ˜ ψ ( t, T, (cid:15), A, W ) needs to be an element of the tangent space T , which is given by T = T ⊕ T ⊕ T where T = (cid:8)(cid:90) h ( u, A, W ) dM ( u, A, W ) for all h ( u, a, w ) (cid:9) , T = (cid:8)(cid:90) h ( u, A, W ) dM ( u, A, W ) for all h ( u, a, w ) (cid:9) , T = (cid:8) h ( A, W ) ∈ H : E { h ( X, W ) } = 0 (cid:9) . This is not obvious at ﬁrst sight when looking at (18). However, it turns out be true as we may show that N ( t ) − F ( t | a, w ) = (cid:90) t dM ( s | a, w ) − (cid:90) t { F ( t | a, w ) − F ( s, | a, w ) (cid:9) P ( T > s | a, w ) dM ( s | a, w ) (19)where M ( t | a, w ) = M ( t | a, w ) + M ( t | a, w ) . To see that (19) holds just check that it holds in the two cases T ≤ t and t < T . Using (19) we can write the efﬁcient inﬂuence function ˜ ψ ( t, T, (cid:15), A, W ) as (cid:8) M ( t | , W ) − (cid:90) t { F ( t | , w ) − F ( s, | , w ) (cid:9) P ( T > s | , w ) dM ( s | , w ) (cid:9) I ( A = 1) P ( A = 1 | W ) − (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) (cid:8) dM ( s | , w ) − { F ( t | , w ) − F ( s, | , w ) (cid:9) P ( T > s | , w ) dM ( s | , w ) (cid:9) I ( A = 0) P ( A = 0 | W )+ (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) − dM ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) (cid:21) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) (20)and from this representation we note that ˜ ψ ( t, T, (cid:15), A, W ) indeed belongs to the tangent space T . The expression in thelatter display can further be re-written giving (3). 15 PREPRINT - S

EPTEMBER

1, 2020

We argue now that the estimator ˆ δ e ( t, given in (7) has certain robustness properties unlike the initial estimator ˆ δ ( t, based on Cox-models for the cause speciﬁc hazard functions. We ﬁrst consider the case where there is nocensoring. The estimator ˆ δ e ( t, consist of a difference between two terms, and we consider these terms separately.We have that P ( t, ,

1) = E { F ( t | A = 1 , W ) } , and deﬁne Y = N ( t ) I ( A = 1) P ( A = 1 | W ) so E ( Y ) = E { F ( t | A = 1 , W ) } if P ( A = 1 | W ) is correctly speciﬁed. The corresponding terms of the estimator are (cid:20) N ( t ) − F ( t | A = 1 , W ) (cid:21) I ( A = 1) P ( A = 1 | W ) + F ( t | A = 1 , W )= Y + F ( t | A = 1 , W ) (cid:26) P ( A = 1 | W ) − I ( A = 1) P ( A = 1 | W ) (cid:27) (21)from which we see that the mean of the left hand side of (21), if P ( A = 1 | W ) is correctly speciﬁed, is E ( Y ) + 0 = F ( t, so it has the correct mean in that case. On the other hand, if F ( t | A = 1 , W ) is correctly speciﬁed (which is the case if λ j ( t | A = 1 , W ) , j = 1 , , are correctly speciﬁed) but P ( A = 1 | W ) may not be so, then the mean of the left hand sideof (21) is E (cid:26) F ( t | A = 1 , W ) P ( A = 1 | W ) P ∗ ( A = 1 | W ) (cid:27) + E (cid:20) F ( t | A = 1 , W ) (cid:26) − P ( A = 1 | W ) P ∗ ( A = 1 | W ) (cid:27)(cid:21) = F ( t, , where we use P ∗ ( A = 1 | W ) to denote the limit (in probability) of a potentially mis-speciﬁed estimator ˆ P ( A = 1 | W ) ,and P to denote the truth. So the estimator ˆ P ( t, ,

EPTEMBER

1, 2020Assume ﬁrst that λ ( t | j, W ) and λ ( t | j, W ) , j = 0 , , are correctly speciﬁed but P ( A = j | W ) may not be so. It isthen clear from (3) that the estimator based on the efﬁcient inﬂuence function is unbiased as the efﬁcient inﬂuencefunction will still have zero mean.We now assume that P ( A = j | W ) and λ ( t | j, W ) are correctly speciﬁed but λ ( t | j, W ) may be mis-speciﬁed. Itfollows directly that the mean of the ﬁrst term of display (23) is E ( Y ) + 0 = P ( t, , . and it is also clear that the the mean of the sum of the last two terms of display (23) is zero as long as λ ( t | j, W ) iscorrectly speciﬁed.We now assume that P ( A = j | W ) and λ ( t | j, W ) are correctly speciﬁed but λ ( t | j, W ) may be mis-speciﬁed. Itfollows again that the mean of the ﬁrst term of display (23) is E ( Y ) + 0 = P ( t, , . Keep in mind that we have E (cid:26) dN ( s ) I ( A = 0) P ( A = 0 | W ) | W (cid:27) = dF ( s | , W ) (24)and that E (cid:26) dN ( s ) I ( A = j ) P ( A = j | W ) | W (cid:27) = dF ( s | j, W ) for j = 1 , . We let Λ ∗ k ( t | j, w ) denote the limit of a given estimator ˆΛ k ( t | j, w ) for k = 1 , and j = 0 , , and similarly,let Z ∗ ( t, w ) denote the limit of the estimator ˆ Z ( t, w ) , and P ∗ ( t > u | j, w ) being e − Λ ∗ ( u | j,w ) − Λ ∗ ( u | j,w ) (all supposed toexist such as would be the case if we use Cox-models for the two cause speciﬁc hazard functions). We note that E (cid:8) dN ( u ) − Y ( u ) d Λ ∗ ( u | j, W ) P ∗ ( T > u | j, W ) I ( A = j ) P ( A = j | W ) | W (cid:9) = dF ( u | j, W )) P ∗ ( T > u | j, W ) − P ( T > u | j, W ) d Λ ∗ ( u | j, W ) P ∗ ( T > u | j, W )= − P ( T > u | j, W ) P ∗ ( T > u | j, W ) (cid:8) d Λ ∗ ( u | j, W ) − Λ ( u | j, W ) (cid:9) = − e { Λ ∗ ( u | j,W ) − Λ ( u | j,W ) } (cid:8) d Λ ∗ ( u | j, W ) − Λ ( u | j, W ) (cid:9) (25)since λ is correctly speciﬁed. Integrating the right hand side of (25) from 0 to s gives − e { Λ ∗ ( u | j,W ) − Λ ( u | j,W ) } + 1 Using that dF ( s | , W ) = e −{ Λ ∗ ( u | ,W ) − Λ ( u | ,W ) } dF ( s | , W ) we see that the last term of (23) converges to − (cid:90) t { Z ∗ ( s, w ) − Z ( s, w ) } dF ( s | , W ) that cancels with the limit of the second term of (23). Hence, ˆ δ e ( t, is consistent if two of the three models (i): λ ( t | j, w ) ; (ii): λ ( t | j, w ) and (iii): P ( A = j | W ) , are correctly speciﬁed. We now consider the case with censoringand let (iv): λ C ( s | A, W ) be the censoring model. Using (4), we can write ψ ( t, D ) = ˜ ψ ( t, Z ) + (cid:90) ∞ (cid:2) ˜ ψ ( t, Z ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W (cid:9)(cid:3) K C ( s | A, W ) dM C ( s | A, W ) (26)that depends on G as noted in Section 3.1. We let G ∗ denote the limit of the estimator G n of G , and have, as notedabove, that E { ˜ ψ ( t, Z, G ∗ ) } = 0 if two out of the three models (i)-(iii) are correctly speciﬁed. Keep in mind that dM C ( s | A, W ) = dN C ( s ) − I ( s ≤ ˜ T ) d Λ C ( s | A, W ) . Since T and C are conditional independent given A and W , wehave E { dN C ( s ) | T, A, W ) } = I ( s ≤ T ) P ( T > s | A, W ) K C ( s | A, W ) λ C ( s | A, W ) E { I ( s ≤ ˜ T ) | T, A, W ) } = I ( s ≤ T ) P ( T > s | A, W ) K C ( s | A, W ) . PREPRINT - S

EPTEMBER

1, 2020Therefore, E (cid:26) (cid:2) ˜ ψ ( t, Z, G ∗ ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W, G ∗ (cid:9)(cid:3) K ∗ C ( s | A, W ) dM ∗ C ( s | A, W ) (cid:27) = E (cid:26) I ( s ≤ T ) (cid:2) ˜ ψ ( t, Z, G ∗ ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W, G ∗ (cid:9)(cid:3) K C ( s | A, W ) K ∗ C ( s | A, W ) { λ C ( s | A, W ) − λ ∗ C ( s | A, W ) } ds (cid:27) , and E (cid:8) I ( s ≤ T ) (cid:2) ˜ ψ ( t, Z, G ∗ ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W, G ∗ (cid:9)(cid:3) | A, W (cid:9) = (cid:88) (cid:15) (cid:90) ∞ s ˜ ψ ( t, u, (cid:15), A, W ) (cid:2) f ( u, (cid:15) | A, W ) − f ∗ ( u, (cid:15) | A, W ) P ( T > u | A, W ) P ∗ ( T > u | A, W ) (cid:3) du We hence see that the mean of the latter term in (26) is zero if either models (i) and (ii) are correctly speciﬁed or ifmodel (iv) is correctly speciﬁed. The proposed estimator ˆ δ e ( t, is thus consistent if the following models are correctlyspeciﬁed: (i) and (ii), or (i), (iii) and (iv), or (ii), (iii) and (iv). It follows that under appropriate conditions, as in Chen et al. (2010), that n / (cid:110) ˆ P ( t, a Y , a D ) − P ( t, a Y , a D ) (cid:111) = n − / (cid:88) i [ P ( t, a Y , a D , W i ) − E { P ( t, a Y , a D , W i ) } ]+ E (cid:104) n / (cid:110) ˆ P ( t, a Y , a D , W ) − P ( t, a Y , a D , W ) (cid:111)(cid:105) + o p (1) , (27)where the expectation in the second term is taken wrt W . The ﬁrst term on the right hand side of (27) is already oniid-form, so we only need to deal with the second term. Deﬁne θ t ( a Y , a D , w ) = e − Λ ( t | A = a Y ,w ) − Λ ( t | A = a D ,w ) . It is then easily seen that n / (cid:110) d ˆ P ( t, a Y , a D , W ) − dP ( t, a Y , a D , W ) (cid:111) = n / (cid:110) ˆ θ t ( a Y , a D , W ) − θ t ( a Y , a D , W ) (cid:111) e β A a Y + β W W d Λ ( t )+ θ t ( a Y , a D , W ) g ( β , a Y , W ) d n / (cid:110) ˆΛ ( t ) − Λ ( t ) (cid:111) (28) + θ t ( a Y , a D , W ) (cid:8) D β g ( β , a Y , W ) (cid:9) n / (cid:110) ˆ β − β (cid:111) d Λ ( t ) and we then need to ﬁnd the inﬂuence functions of the three terms on the right hand side of (28), and take expectationwrt W . In the latter display, g ( β , a Y , W ) = e β A a Y + β W W with β j = ( β jA , β jW ) , j = 1 , . The second and third terms can be dealt with directly. For instance, with (cid:15) β i being theinﬂuence function corresponding to ˆ β , E W (cid:2) θ t ( a Y , a D , W ) (cid:8) D β g ( β , a Y , W ) (cid:9) n / (cid:110) ˆ β − β (cid:111) d Λ ( t ) (cid:3) = E L (cid:2) θ t ( a Y , a D , L ) (cid:8) D β g ( β , a Y , L ) (cid:9)(cid:3) (cid:40) n − / (cid:88) i (cid:15) β i (cid:41) d Λ ( t ) in which we may estimate E W (cid:2) θ t ( a Y , a D , W ) (cid:8) D β g ( β , a Y , W ) (cid:9)(cid:3) by n − (cid:88) i θ t ( a Y , a D , W i ) (cid:8) D β g ( β , a Y , W i ) (cid:9) . and then replacing parameters with their estimates. Similarly with the second term on the right hand side of (28). Theﬁrst term on the right hand side of (28) is handled using the mean value theorem (Taylor expansion) with respect to theparameters { Λ j ( t ) , β j } , j = 1 , . The expectation wrt W of the part corresponding to λ ( t ) of this term is − E W (cid:2) θ t ( a Y , a D , W ) e β A a Y + β W W ) (cid:3) (cid:40) n − / (cid:88) i (cid:15) λ i ( t ) (cid:41) d Λ ( t ) , PREPRINT - S

EPTEMBER

1, 2020where E W (cid:2) θ t ( a Y , a D , W ) e β A a Y + β W W ) (cid:3) is estimated by n − (cid:88) i θ t ( a Y , a D , W i ) e β A a Y + ˆ β W W i ) and replacing parameters with their estimates. Similarly with the other terms. By combining all these terms, we havethat n / (cid:110) ˆ P ( t, a Y , a D ) − P ( t, a Y , a D ) (cid:111) = (cid:88) i (cid:15) P i ( t, a Y , a D ) , where (cid:15) P i ( t, a Y , a D ) are zero-mean iid terms (i.e. we have found the inﬂuence function), and we have also a way ofestimating the inﬂuence function. So, basically, what we need is the ﬁrst term in (27), and then all the inﬂuence functionscorresponding to the parameters { Λ j ( t ) , β j } using the usual Cox-regression estimators (these can be extracted fromthe coxaalen -function in R-package timereg ), and then also some ordinary empirical means such as n − (cid:88) i θ t ( a Y , a D , W i ) e β A a Y + ˆ β W W i ) . We here brieﬂy describe how to do Targeted Maximum Likelihood Estimation (TMLE), see van der Laan and Rubin(2006) and van der Laan and Rose (2011). We do so for the setting with no censoring, based on expression (3). Thegeneralization to the censored case follows immediately. The tangent space adhering to f ( A | W ) consist of all functions r ( A, W ) so that E { r ( A, W ) | W } = 0 , and the efﬁcient inﬂuence function ˜ ψ ( t, T, (cid:15), A, W ) is therefore orthogonal tothat space. To see why, E (cid:8) r ( A, W ) (cid:90) g ( s ; A, W ) dM j ( s | A, W ) (cid:9) = E (cid:2) r ( A, W ) E { (cid:90) g ( s ; A, W ) dM j ( s | A, W ) | A, W } (cid:3) = 0 and E (cid:2) r ( A, W ) { δ ( t, , W ) − δ ( t, } (cid:3) = E (cid:2) { δ ( t, , W ) − δ ( t, } E { r ( A, W ) | W } (cid:3) = 0 . Hence we need not to ﬂuctuate the initial estimator of P ( A = a | W ) , and can concentrate on the part of the tangentspace adhering to f ( T, (cid:15) | A, W ) . The needed parametric submodels are obtained by parametrizing the cause speciﬁchazard function as λ j,γ ( s | A, W ) = λ j ( s | A, W ) e γH j ( s ; A,W ) , j = 1 , . The score (in γ ) corresponding to f ( T, (cid:15) | A, W ) of the likelihood is (cid:88) j =1 (cid:90) H j ( s ; A, W ) dM Tj ( s | A, W ) so we see from (3) that we need to pick the functions H j ( s ; A, W ) as H ( s ; A, W ) = I ( s ≤ t ) g ( A, W ) e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:26) − { F ( t | A, w ) − F ( s, | A, w ) (cid:9) P ( T > s | A, w ) (cid:27) H ( s ; A, W ) = I ( s ≤ t ) g ( A, W ) P ( T > s | A, W ) , (cid:26) P ( t, , , W ) − P ( s, , , W ) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:8) F ( t | A, w ) − F ( s, | A, W ) (cid:9)(cid:27) . We ﬁnd ˆ γ as the solution to U ( γ ) = n (cid:88) i = 2 (cid:88) j =1 (cid:90) H j ( s ; A i , W i ) (cid:8) dN ij ( s ) − Y i ( s ) e γH j ( s ; A i ,W i ) d ˆΛ j ( s | A i , W i ) (cid:9) , (29)where ˆΛ j ( s | A i , W i ) is an initial estimate of Λ j ( s | A i , W i ) based for instance on semiparametric models such as theCox-model. We then calculate ˆΛ (1) j ( s | A, W ) = (cid:90) s e ˆ γH j ( u ; A,W ) d ˆΛ j ( s | A, W ) . PREPRINT - S

EPTEMBER

1, 2020Next step is to solve equation (29) with ˆΛ j replaced by ˆΛ (1) j to get an updated ˆ γ (1) to calculate ˆΛ (2) j ( s | A, W ) = (cid:90) s e ˆ γ (1) H j ( u ; A,W ) d ˆΛ j ( s | A, W ) and iterate until convergence (ie until ˆ γ does no longer change), say it happens at iteration step k . The TMLE estimateof parameter of interest is then simply the plug-in estimate ˆ δ ( t,

1) = n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( k )1 ( s | ,W i ) − ˆΛ ( k )2 ( s | ,W i ) d ˆΛ ( k )1 ( s | , W i ) (cid:27) − n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( k )1 ( s | ,W i ) − ˆΛ ( k )2 ( s | ,W i ) d ˆΛ ( k )2 ( s | , W i ) (cid:27) . Table 1: Simulation results concerning the estimators ˆ P ( t, , , ˆ P ( t, , and ˆ δ ( t,

1) = ˆ P ( t, , − ˆ P ( t, , .Each entry in the table is based on 1000 replicates. n=400 n=800 t = 2 t = 4 t = 6 t = 2 t = 4 t = 6 True P ( t, , ˆ P ( t, , ˆ P ( t, , ) 0.013 0.020 0.025 0.009 0.014 0.018see ( ˆ P ( t, , ) 0.013 0.020 0.025 0.009 0.014 0.01895% CP( ˆ P ( t, , ) 0.924 0.934 0.945 0.939 0.938 0.942True P ( t, , ˆ P ( t, , ˆ P ( t, , ) 0.020 0.027 0.032 0.0143 0.019 0.023see ( ˆ P ( t, , ) 0.019 0.028 0.033 0.014 0.019 0.02395% CP( P ( t, , ) 0.929 0.952 0.952 0.948 0.954 0.956True δ ( t, -0.049 -0.080 -0.101 -0.049 -0.080 -0.101Mean ˆ δ ( t, -0.048 -0.079 -0.099 -0.049 -0.081 -0.101sd ( ˆ δ ( t, ) 0.019 0.031 0.038 0.014 0.022 0.027see ( ˆ δ ( t, ) 0.019 0.031 0.039 0.014 0.022 0.02895% CP( ˆ δ ( t, ) 0.946 0.951 0.952 0.956 0.957 0.956 PREPRINT - S

EPTEMBER

1, 2020Table 2: Simulation results concerning the simple estimator ˆ δ ( t, and the one based on the efﬁcient inﬂuence function, ˆ δ e ( t, . We let see ei denote the standard error estimate based on the squared efﬁcient inﬂuence function and see b denotes the standard error estimate based 250 bootstrap replicates. Each entry in the table is based on 1000 replicates. t = 1 t = 3 t = 5 t = 7 t = 9 A1 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.128 -0.177 -0.209 -0.229Mean ˆ δ e ( t, -0.052 -0.128 -0.178 -0.209 -0.229sd ( ˆ δ ( t, ) 0.014 0.024 0.029 0.032 0.034sd ( ˆ δ e ( t, ) 0.020 0.029 0.033 0.035 0.036see ei ( ˆ δ e ( t, ) 0.020 0.030 0.035 0.037 0.038see b ( ˆ δ e ( t, ) 0.020 0.029 0.033 0.035 0.036A2 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.127 -0.176 -0.208 -0.228Mean ˆ δ e ( t, -0.052 -0.127 -0.176 -0.207 -0.227sd ( ˆ δ ( t, ) 0.015 0.028 0.034 0.039 0.042sd ( ˆ δ e ( t, ) 0.020 0.033 0.038 0.042 0.044see ei ( ˆ δ e ( t, ) 0.020 0.032 0.038 0.042 0.046see b ( ˆ δ e ( t, ) 0.020 0.031 0.038 0.042 0.044B1 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.127 -0.176 -0.208 -0.228Mean ˆ δ e ( t, -0.052 -0.126 -0.176 -0.207 -0.228sd ( ˆ δ ( t, ) 0.014 0.025 0.031 0.034 0.037sd ( ˆ δ e ( t, ) 0.024 0.036 0.041 0.043 0.046see ei ( ˆ δ e ( t, ) 0.024 0.037 0.043 0.046 0.048see b ( ˆ δ e ( t, ) 0.023 0.035 0.040 0.043 0.044B2 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.127 -0.175 -0.206 -0.227Mean ˆ δ e ( t, -0.052 -0.126 -0.175 -0.206 -0.227sd ( ˆ δ ( t, ) 0.016 0.028 0.037 0.042 0.045sd ( ˆ δ e ( t, ) 0.024 0.038 0.045 0.049 0.054see ei ( ˆ δ e ( t, ) 0.024 0.039 0.048 0.052 0.056see b ( ˆ δ e ( t, ) 0.024 0.038 0.045 0.050 0.054C1 True δ ( t, ˆ δ ( t, ˆ δ e ( t, ˆ δ e ( t, ) 0.030 0.040 0.042 0.041 0.041see ei ( ˆ δ e ( t, ) 0.030 0.041 0.043 0.044 0.043see b ( ˆ δ e ( t, ) 0.030 0.040 0.042 0.042 0.042C2 True δ ( t, ˆ δ ( t, ˆ δ e ( t, ˆ δ e ( t, ) 0.032 0.043 0.047 0.049 0.050see ei ( ˆ δ e ( t, ) 0.031 0.044 0.048 0.051 0.052see b ( ˆ δ e ( t, ) 0.030 0.043 0.048 0.050 0.051 PREPRINT - S

EPTEMBER

1, 2020

Time (months) A b s o l u t e r i sk . . . . . Time (months) A b s o l u t e r i sk . . . . . Figure 1: Results from the prostate cancer data example. The curves describe the cumulative incidence of prostatecancer death (left panel) and death due to other causes (right panel). The black curves denote the placebo arm while theorange curves denote the treatment arm, and the shaded areas are 95% conﬁdence bands.22

PREPRINT - S

EPTEMBER

1, 2020 - . - . - . - . . . Time (months) - . - . - . - . . . Time (months)

Figure 2: Separable effects in the prostate cancer data application. Left panel shows ˆ δ e ( t,0)