Estimation of separable direct and indirect effects in continuous time
EE STIMATION OF SEPARABLE DIRECT AND INDIRECT EFFECTS INCONTINUOUS TIME
A P
REPRINT
Torben Martinussen
Department of Biostatistics, University of [email protected]
Mats Julius Stensrud
Department of Mathematics, Ecole Polytechnique Fédérale de [email protected] 1, 2020 A BSTRACT
Many research questions involve time-to-event outcomes that can be prevented from occurring due tocompeting events. In these settings, we must be careful about the causal interpretation of classicalstatistical estimands. In particular, estimands on the hazard scale, such as ratios of cause specific orsubdistribution hazards, are fundamentally hard to be interpret causally. Estimands on the risk scale,such as contrasts of cumulative incidence functions, do have a causal interpretation, but they onlycapture the total effect of the treatment on the event of interest; that is, effects both through and outsideof the competing event. To disentangle causal treatment effects on the event of interest and competingevents, the separable direct and indirect effects were recently introduced. Here we provide new resultson the estimation of direct and indirect separable effects in continuous time. In particular, we derivethe nonparametric influence function in continuous time and use it to construct an estimator that hascertain robustness properties. We also propose a simple estimator based on semiparametric modelsfor the two cause specific hazard functions. We describe the asymptotic properties of these estimators,and present results from simulation studies, suggesting that the estimators behave satisfactorily infinite samples. Finally, we re-analyze the prostate cancer trial from Stensrud et al (2020). K eywords Separable effects · Competing events · Survival analysis · Hazard functions · Influence function
In survival analysis, the event of interest can be prevented from occurring due to a competing event. The presenceof competing events requires us to be careful about the interpretation of classical statistical estimands (Robins, 1986;Young et al., 2020). In particular, it is well-established that estimands on the hazard scale, such as cause specific hazardor subdistribution hazard ratios, do not have a causal interpretation unless we impose strong assumptions that usually areunreasonable (Robins, 1986; Hernán, 2010; Martinussen et al., 2018; Young et al., 2020). Yet the cumulative incidencefunction, which is defined on the risk scale (Andersen et al., 2012), has a causal interpretation as the total effect on theevent of interest (Young et al., 2020).However, the total effect does not inform us about the mechanisms by which the treatment exerts effects on the event ofinterest. To illustrate this, suppose that we perfectly executed a randomized experiment in which 1000 patients receiveda cancer drug and 1000 patients received control. After 5 years, 250 patients died in the treatment arm and 500 died inthe control arm, and therefore the drug was successfully shown to reduce mortality after 5 years. However, the drug wasexcreted in the kidneys, and to assess a potential side effect, the investigators did an additional analysis in which kidneyfailure was the primary outcome. They found 250 kidney failures in the treatment arm and 100 kidney failures in theplacebo arm after 5 years. Two scientists debated the treatment effect on kidney events. As the drug was known to be a r X i v : . [ s t a t . M E ] A ug PREPRINT - S
EPTEMBER
1, 2020excreted in the kidneys, the first scientist suspected that the increase in kidney events was a biological side effect. Thesecond scientist doubted this explanation, and claimed that the increase in kidney events occurred because the drugreduced mortality, and hence more subjects were at risk of developing kidney events in the treatment arm. Thus, eventhough the scientists could identify the cumulative incidences of mortality and kidney events, they were unable to agreeon the causal mechanism by which treatment causes kidney events.It has sometimes been suggested that the marginal distribution function, also called the net risk, can be used to assessthe causal effect of treatment on the event of interest outside of its effect on the competing event. However, interpretingthis estimand requires us to consider a hypothetical intervention to prevent the competing event from occurring, that is,to consider a controlled direct effect (Robins and Greenland, 1992; Young et al., 2020). This is problematic because thehypothetical intervention to prevent the competing event, which is death in our conceptual example above, is usuallyinfeasible in practice, and therefore the causal estimand is ill-defined and is not scientifically interesting. In particular,the controlled direct effect cannot solve the problem raised by the two scientists in the example above.To describe the mechanism by which the treatment exerts effects on the outcome of interest, Stensrud et al. (2020b)recently introduced the separable direct and indirect effects, motivated by a treatment decomposition idea introducedby Robins and Richardson (2010) in a mediation setting (see also Didelez (2018)). The separable direct effect is thetreatment effect on the event of interest outside of its effect on the competing event, and the separable indirect effect isthe effect on the outcome of interest only through the competing event (Stensrud et al., 2020b). These effects add up tothe total effect, that is, the conventional cumulative incidence function, and thereby the separable effects describe themechanism by which the total effect arise. In particular, the discussion between the scientists above could be resolvedby considering the separable effects (Stensrud et al., 2020b).So far the theory on the separable effects has been restricted to settings in discrete time (Stensrud et al., 2020b) andfocused on identification and interpretation. Here we define separable effects in continuous time and consider variousestimators. Importantly, we derive the nonparametric influence function for the separable effects that leads to a robustestimator. We describe the asymptotic properties of the estimator, assess its properties in a simulation study, andre-analyze the example on prostate cancer from Stensrud et al. (2020b).The manuscript is organized as follows. In Section 2, we describe the observed data structure, establish notation thatwill be used in subsequent sections and define the separable effects in continuous time. In Section 3, we derive thenonparametric influence function of the separable direct effect in continuous time, propose an estimator based on theinfluence function, often referred to as the one-step estimator, and describe its robustness to model mis-specification.In Section 4, we study the asymptotic properties of the estimator when Cox proportional hazard models are used forestimation. We also describe the asymptotic properties of the one-step estimator. In Section 5, we give the efficientinfluence function of the separable indirect effect. In Section 6, we assess the performance of the estimators in asimulation study and re-analyse the prostate cancer data from Stensrud et al. (2020b). In section 7, we provide adiscussion. Detailed calculations are given in the Appendix.
Suppose that we observe data from a randomized experiment in which i = 1 , . . . , n individuals are assigned adichotomous treatment A i . For each individual i , we measure data on a vector of covariates, W i , before treatmentassignment, and each i has an event time T i of type (cid:15) i ∈ { , } , where (cid:15) i = 1 denotes the event of interest ( Y ) and (cid:15) i = 2 the competing event D . We will hereby suppress the individual i subscript, because the random vector foreach individual is assumed to be drawn independently from a distribution common to all subjects. Because of lossto follow-up, we only observe ∆ = I ( T ≤ C ) and ˜ T = min ( T, C ) , where C denotes the right censoring time. Weassume that T and C are conditionally independent given ( A, W ) . To focus on the main ideas of this work and simplifythe derivations, we initially develop the theory without censoring. Thus, we first assume that data for a given individualconsist of the vector X = ( T, (cid:15), A, W ) . However, the results immediately extend to settings with censoring, where thecensoring may depend on A and W , as we describe in more detail in Section 3. To define our estimands of interest – the separable effects – we consider a decomposition of A into two dichotomouscomponents, A Y and A D , analogous to Stensrud et al. (2020b): the A Y component exerts all its effects on the event ofinterest ( Y ) outside of the competing event ( D ), and A D exerts all its effect on Y through D . In the observed data,the treatment components are deterministically related in each individual, A = A Y = A D , but we will conceive ahypothetical experiment in which these components are assigned different values. This decomposition assumptionmotivates the definition of our separable direct and indirect effects. However, the separable effects can still be2 PREPRINT - S
EPTEMBER
1, 2020meaningful, even if a physical decomposition of the treatment is possible, e.g. if we can conceive a hypotheticaltreatment that operate in the same way as the A Y component of A , but does not exert effects on A D (Stensrud et al.,2020b,a). We use superscripts to denote counterfactuals, such that T a , is the event time when, possibly contrary to fact, A is set to a , and (cid:15) a indicates whether an event of interest Y ( (cid:15) a = 1 ) or a competing event D ( (cid:15) a = 2 ) occurred at T a . Similarly, T a Y ,a D and (cid:15) a Y ,a D denote counterfactual values under an intervention that sets A Y to a Y and A D to a D .Let P ( t, a Y , a D ) ≡ P ( T a Y ,a D ≤ t, (cid:15) a Y ,a D = 1) . Analogous to the results in Stensrud et al. (2020b) in discrete time, we can now define the separable direct effect oftreatment A at time t as P ( t, , a D ) vs. P ( t, , a D ) for a D ∈ { , } , and the separable indirect effect is defined as P ( t, a Y , vs. P ( t, a Y , for a Y ∈ { , } . Note that pairs of separable direct and indirect effects sum to the total effect, which is equal to the classical cumulativeincidence function, that is, (cid:8) P ( t, , a ) − P ( t, , a ) (cid:9) + (cid:8) P ( t, − a, − P ( t, − a, (cid:9) = P ( t, , − P ( t, , for a ∈ { , } . Consider the additive separable direct effect, δ ( t, a D ) = P ( t, , a D ) − P ( t, , a D ) for a D ∈ { , } . To identify δ ( t, a D ) from the observed data, where A and the components A Y and A D are deterministically related,we impose the following conditions, which are continuous time analogues to the conditions in Stensrud et al. (2020b),see also Robins and Richardson (2010).We assume conditional exchangeability, that is, ( T a , (cid:15) a ) ⊥⊥ A | W for a ∈ { , } , which is a classical exchangeability condition that is expected to hold when treatment A is randomly assigned.Second, we assume consistency, such that if an individual has observed treatment A = a , then ( T a , (cid:15) a ) = ( T, (cid:15) ) , for a ∈ { , } . The consistency assumption ensures that the observed outcome is equal to the counterfactual outcomefor any individual that has observed data history consistent with a counterfactual scenario.Third, positivity such that f ( W = w ) > ⇒ Pr( A = a | W = w ) > for a ∈ { , } , (1) f ( T > t, W = w ) > ⇒ Pr( ˜
T > t, A = a | W = w ) > for a ∈ { , } and for all t < t ∗ , (2)where t ∗ denotes the end of follow-up and f generically denotes a density function. Note that (1) is the usual positivitycondition under interventions on A and (2) ensures that among those event-free through each follow-up time, there existindividuals with A = 1 and individuals with A = 0 that are uncensored.Finally, we impose dismissible component conditions. To introduce these conditions, let λ j ( t | A = a, W = w ) denotethe conditional cause specific hazard function with j = 1 , , denoting the j th cause, and let Λ j ( s | A = a, W = w ) = PREPRINT - S
EPTEMBER
1, 2020 (cid:82) t λ j ( s | A = a, W = w ) ds be the corresponding cumulative conditional cause specific hazard function. Similarly,let λ aj ( t | W = w ) and λ a Y ,a D j ( t | W = w ) be the counterfactual conditional cause specific hazard functions underinterventions on A and joint interventions on A Y and A D , respectively. Then, the dismissible components conditionsare ∆1 : λ a Y ,a D =11 ( t | W = w ) = λ a Y ,a D =01 ( t | W = w ) , a Y ∈ { , } , at all t , which states that a counterfactual hazards of the event of interest ( j = 1 ) are equal under all values of A D , and ∆2 : λ a Y =1 ,a D ( t | W = w ) = λ a Y =0 ,a D ( t | W = w ) , a D ∈ { , } , at all t , which states that a counterfactual hazard functions of the competing event ( j = 2 ) are equal under all values of A Y . Let P ( t, a Y , a D , w ) = (cid:90) t e − Λ ( s | A = a Y ,w ) − Λ ( s | A = a D ,w ) d Λ ( s | A = a Y , w ) ,δ ( t, a D , w ) = P ( t, , a D , w ) − P ( t, , a D , w ) . Note that under the treatment decomposition assumption (Robins and Richardson, 2010; Stensrud et al., 2020b,a) andthe identification conditions in Section 2.3, the cumulative incidence function for Y under treatment a conditionalon W = w is given by P ( t, a, a, w ) , which is also denoted by F ( t | a, w ) . Then a continuous-time equivalent to theG-formula (Robins, 1986) in Stensrud et al. (2020b) is P ( T a Y ,a D ≤ t, (cid:15) a Y ,a D = 1) = (cid:90) P ( t, a Y , a D , w ) f ( W = w ) dw, which allows us to identify our parameter of interest, δ ( t, a D ) = E { δ ( t, a D , W ) } , from the observed data. Suppose we were willing to postulate (semi)parametric models for the cause specific hazard functions, such as Coxproportional hazards models. Then it would be straight forward to estimate δ ( t, a ) using ˆ δ ( t, a ) = ˆ P ( t, , a ) − ˆ P ( t, , a ) , where ˆ P ( t, , a ) = n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( s | A =1 ,W i ) − ˆΛ ( s | A = a,W i ) d ˆΛ ( s | A = 1 , W i ) (cid:27) , ˆ P ( t, , a ) = n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( s | A =0 ,W i ) − ˆΛ ( s | A = a,W i ) d ˆΛ ( s | A = 0 , W i ) (cid:27) , because the terms in P ( t, , a ) and P ( t, , a ) can easily be estimated using Cox-models for the two cause specifichazard functions. That is, if λ j ( t | a, w ) = λ j ( t ) e β Tj l , with l = ( a, w T ) T , then ˆΛ j ( s | l ) is obtained from ˆΛ j ( t ) e ˆ β Tj l ,which can be estimated from a Cox regression analysis. Asymptotic properties can also be derived using the approachof Chen et al. (2010), which we return to in Section 4.1.However, using such semiparametric regression models for the cause specific hazard functions may lead to biasedresults if these models are misspecified. Therefore we will provide more general results, based on semiparametrictheory (van der Vaart, 2000; van der Laan and Robins, 2003), which leads us to estimators with desirable propertiessuch as semiparametric efficiency and certain robustness (Bang and Robins, 2005). The primary tool to finding theseestimators is to derive the so-called efficient influence function, see van der Vaart (2000), which we do in Section 3.4 PREPRINT - S
EPTEMBER
1, 2020
In this section we give the efficient influence function for the target parameter parameter ψ t ( P ) = δ ( t,
1) = E { P ( t, , , W ) } − E { P ( t, , , W ) } , where we use P to denote the probability measure from which we observe Z = ( T, (cid:15), A, W ) . Note that it is possible torecode the treatment variable A , that is, interchanging the two levels 0 and 1, and thus we can restrict our attention to δ ( t, without loss of generality. The corresponding separable indirect effect is P ( t, , − P ( t, , , which wedecribe in more detail in Section 5.We impose no structure on P and show in the Appendix that the efficient influence function is ˜ ψ ( t, Z )= (cid:8) N ( t ) − P ( t, , , W ) (cid:9) I ( A = 1) P ( A = 1 | W ) − (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) dN ( s ) − P ( t, , , W ) (cid:9) I ( A = 0) P ( A = 0 | W ) − (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM T ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) − dM T ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) (cid:21) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) , where N j ( t ) = I ( T ≤ t, (cid:15) = j ) is the j th specific counting process and M Tj ( t | a, w ) is the corresponding countingprocess martingale given A = a, W = w , i.e., M Tj ( t | a, w ) = N j ( t ) − (cid:82) t I ( s ≤ T ) d Λ j ( s | a, w ) . We may furtherrewrite the efficient influence function in terms of the counting process martingales, see the Appendix for further details, ˜ ψ ( t, Z ) = (cid:90) h ( s, t, A, W ) dM T ( s | A, W ) + (cid:90) h ( s, t, A, W ) dM T ( s | A, W )+ δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) (3)where h ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:26) − { F ( t | A, W ) − F ( s, | A, W ) (cid:9) P ( T > s | A, W ) (cid:27) ,h ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) P ( T > s | A, W ) (cid:26) P ( t, , , W ) − P ( s, , , W ) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:8) F ( t | A, W ) − F ( s, | A, W ) (cid:9)(cid:27) ,g ( A, W ) = AP ( A = 1 | W ) − − AP ( A = 0 | W ) . We remind the reader that C denotes the potential censoring time, ˜ T = T ∧ C and ∆ = I ( T ≤ C ) . We can immediatelygeneralize (3) to allow for censoring, as we have imposed no structure on P (Tsiatis, 2006, formula 10.76): the efficientinfluence function based on the observed data D = ( ˜ T , ∆ , ∆ (cid:15), A, W ) is given by ψ ( t, D ) = ˜ ψ ( t, Z )∆ K C ( T | A, W ) + (cid:90) ∞ L ( s, A, W ) K C ( s | A, W ) dM C ( s | A, W ) , (4)where we let Λ C ( s | A, W ) = (cid:82) s λ C ( u | A, W ) du denote the cumulative censoring hazard function, K C ( s | A, W ) = e − Λ C ( s | A,W ) is the corresponding survival function, and M C ( t | A, W ) = N C ( t ) − (cid:90) t I ( s ≤ ˜ T ) d Λ C ( s | A, W ) is the martingale associated with the censoring counting process N C ( t ) = I ( ˜ T ≤ t, ∆ = 0) using the filtration wherewe include A and W . In (4), L ( s, A, W ) = E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W (cid:9) . (5)Following Lemma A.2 of Lu and Tsiatis (2008), this is easily generalized to the competing risk setting considered here,we can express (4) in terms of the counting process martingales based on the observed data. We get that the efficientinfluence function based on the observed data can be written as ψ ( t, D ) = φ ( t, D ) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) , (6)5 PREPRINT - S
EPTEMBER
1, 2020where φ ( t, D ) = (cid:90) h ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) + (cid:90) h ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) and where M j ( t | a, w ) , j = 1 , , are the observed counting process martingales given A = a, W = w , i.e., for j = 1 , , M j ( t | a, w ) = ∆ I ( ˜ T ≤ t, (cid:15) = j ) − (cid:90) t I ( s ≤ ˜ T ) d Λ j ( s | a, w ) . The one-step estimator based on the efficient influence function is thus given by ˆ δ e ( t,
1) = n − n (cid:88) i =1 (cid:26) ˆ δ ( t, , W i ) + ˆ φ ( t, D i ) (cid:27) , (7)where ˆ δ ( t, , W i ) is defined analogously to δ ( t, , W i ) , except that the unknown quantities are replaced with estimatedcounterparts, and similarly for ˆ φ . This part requires working models, and in Section 3.1 we describe how robust theresulting estimator is to mis-specification of these working models. Note also that the one-step estimator is equal to thesimple estimator ˆ δ ( t, plus an augmentation term. We argue now that the estimator ˆ δ e ( t, given in (7) has certain robustness properties unlike the initial estimator ˆ δ ( t, based on Cox-models for the cause specific hazard functions. Consider first the setting where there is no censoring,i.e., we are then basing estimation on the efficient influence function ˜ ψ ( t, Z ) given in (3). Let H denote the unknownparameters that goes into the efficient influence function ˜ ψ ( t, Z ) , i.e. H = { Λ ( ·| A, W ) , Λ ( ·| A, W ) , P ( A = 1 | W ) } .Our estimator in this case is then the solution to n − (cid:80) i ˜ ψ ( t, Z i , H n ) , where H n is an estimator of H . We showin the Appendix that the resulting estimator is consistent if two out of the three possible working models are correctlyspecified. Now consider the setting where we allow for censoring. Let G denote the unknown parameters that goes intothe efficient influence function, i.e. G = { Λ ( ·| A, W ) , Λ ( ·| A, W ) , P ( A = 1 | W ) , Λ C ( ·| A, W ) } so there are now fourmodels, which we denote (i) to (iv) in the order indicated in the definition of G . We show in the Appendix that ˆ δ e ( t, is consistent if the following models are correctly specified: (i) and (ii), or (i), (iii) and (iv), or (ii), (iii) and (iv). Hence,in a randomized study with the censoring being independent of W then we obtain consistency if one of the two causespecific hazard models is correctly specified, but not necessarily both. In this subsection we assume that the cause specific hazard functions are on Cox proportional hazards form, i.e. thatthese models are correctly specified. Specifically, let Λ j ( t | a, w ) = Λ j ( t ) e β jA a + β jW w , j = 1 , , and ˆ P ( t, a Y , a D ) = n − (cid:88) i ˆ P ( t, a Y , a D , W i ) , where ˆ P ( t, a Y , a D , W i ) is calculated using the estimates from fitting separate Cox regression models with the event ofinterest and the competing event as dependent variable. We show in the Appendix that n / (cid:110) ˆ P ( t, a Y , a D ) − P ( t, a Y , a D ) (cid:111) = (cid:88) i (cid:15) P i ( t, a Y , a D ) , where (cid:15) P i ( t, a Y , a D ) are zero-mean iid terms (i.e. the influence function) so that ˆ P ( t, a Y , a D ) is a RAL estimator(Tsiatis, 2006) as long as the specified Cox proportional hazards models for the cause specific hazard functions arecorrectly specified. In the Appendix, we also give further details on how to estimate the influence function enhancingestimation of the variance of the estimator. 6 PREPRINT - S
EPTEMBER
1, 2020
Let ˜ φ ( t, D, G ) = δ ( t, , W, G ) + φ ( t, D, G ) . so that efficient influence function ψ ( t, D, G ) is re-expressed as ˜ φ ( t, D, G ) − δ ( t, . If G is known then n − (cid:88) i (cid:8) ˜ φ ( t, D i , G ) − ˆ δ e ( t, (cid:9) = n − (cid:88) i (cid:8) ˜ φ ( t, D i , G ) − δ ( t, (cid:9) − { ˆ δ e ( t, − δ ( t, } from which we see that ˆ δ e ( t, has influence function ψ ( t, D, G ) , ie, the efficient influence function. Thus, in thiscase, ˆ δ e ( t, is a semiparametrically efficient RAL estimator (Tsiatis, 2006). In reality G is not known and needs tobe estimated. Let G n be such an estimator of G . The proposed estimator ˆ δ e ( t, solves n − (cid:88) i (cid:8) ˜ φ ( t, D i , G n ) − ˆ δ e ( t, (cid:9) and, therefore, n / { ˆ δ e ( t, − δ ( t, } = n − / (cid:88) i (cid:8) ˜ φ ( t, D i , G ) − δ ( t, (cid:9) + n − / (cid:88) i (cid:8) ˜ φ ( t, D i , G n ) − ˜ φ ( t, D i , G ) (cid:9) = n − / (cid:88) i ψ ( t, D i , G ) + n − / (cid:88) i (cid:8) ˜ φ ( t, D i , G n ) − ˜ φ ( t, D i , G ) (cid:9) = n − / (cid:88) i ψ ( t, D i , G ) + En / (cid:8) ˜ φ ( t, D, G n ) − ˜ φ ( t, D, G ) (cid:9) + o p (1) , following Chen et al. (2010). The expectation on the right hand side of the latter display is taken w.r.t. to D . Based oncorrectly specified models G n one may then derive the true analytical form of the influence function of ˆ δ e ( t ) in whichcase ˆ δ e ( t, is still a semiparametrically RAL estimator. The expression of the influence function depends on thespecific chosen working models G n similar to the development in Section 4.1, however. Instead we recommend usingthe non-parametric bootstrap procedure to estimate the variance of ˆ δ e ( t ) similar to what has been advocated in forinstance van der Laan and Rubin (2006) and van der Laan and Rose (2011). Later, we discuss an alternative approachfor estimation of the variance of ˆ δ e ( t ) . Analogous to the results in Section 3, we get the efficient influence function for the separable indirect effect, P ( t, , − P ( t, , , which can be written as a function of the observed data, ψ I ( t, D ) = φ I ( t, D ) + P ( t, , , W ) − P ( t, , , W ) − (cid:8) P ( t, , − P ( t, , (cid:9) , (8)where φ I ( t, D ) = (cid:90) h I ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) + (cid:90) h I ( s, t, A, W ) K C ( s | A, W ) dM ( s | A, W ) , with h I ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) (cid:26) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:27) (cid:26) − { F ( t | A, W ) − F ( s, | A, W ) (cid:9) P ( T > s | A, W ) (cid:27) ,h I ( s, t, A, W ) = I ( s ≤ t ) g ( A, W ) P ( T > s | A, W ) (cid:20) P ( t, , , W ) − P ( s, , , W ) − (cid:26) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:27) (cid:8) F ( t | A, w ) − F ( s, | A, W ) (cid:9)(cid:21) , and M j ( t | a, w ) , j = 1 , , are the observed counting process martingales given A = a, W = w .7 PREPRINT - S
EPTEMBER
1, 2020 ˆ δ ( t, We first consider the performance of the estimator ˆ δ ( t, that is based on using Cox-proportional hazards models forthe two cause specific hazard functions. Clearly, this estimator is only consistent if the proportional cause specifichazards models are correctly specified. To generate data we used the cause specific hazard functions λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w with λ ( t ) = 0 . , β A = − log (2) , β W = 0 . , and with λ ( t ) = 0 . , β A = − , β W = 0 . .Treatment indicator A was generated with P ( A = 1) = 0 . , and the covariate W was uniform on (0 , . Censoringwas generated according to the minimum of 7 and an exponentially distribution with mean 12. We then appliedthe estimators ˆ P ( t, , , ˆ P ( t, , and ˆ δ ( t,
1) = ˆ P ( t, , − ˆ P ( t, , and their corresponding standard errorestimators, all calculated at time points 2, 4 and 6. Results are summarized in Table 1. Each entry in the table is basedon 1000 replicates. Table 1 about hereBoth the estimator and its corresponding standard error estimator behave satisfactorily (Table 1). At the early timepoints the coverage is slightly less than nominal for P ( t, , and P ( t, , when n = 400 but improves for n = 800 . ˆ δ e ( t, We now assess the performance of the estimator ˆ δ e ( t, given in (7), which is derived from the efficient influencefunction. We investigate the robustness properties of this estimator. We also report results for the simple estimator ˆ δ ( t, The exposure A is binary and the covariate W is uniform on (0 , . To be able to compute ˆ δ e ( t, we needworking models for the two cause specific hazard models, the propensity score and the censoring hazard function.We used Cox proportional hazard models for the two cause specific hazard models with main effects of A and W , alogistic regression model with main effects of W for the propensity score, and a Cox proportional hazards model for thecensoring hazard function with effect of A only. Let L = I ( W > / , λ ( t ) = 0 . , λ ( t ) = 0 . , β A = − log (5) , β A = 0 , β W = log (2) β W = 0 . . Censoring times were generated as C = min ( ˜ C, where ˜ C wasgenerated using the hazard function λ ˜ C ( t | A, W ) specified below. We used a sample size of n = 400 and simulateddata from the following different scanarios.A1 All models are correctly specified. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 . A2 All models are correctly specified except the censoring model. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 e . W . B1 The cause specific hazard models and the censoring model are correctly specified, but the propensity scoremodel is not. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = 0 . L + 0 . − L ) , λ ˜ C ( t | A, W ) = 12 . B2 The cause specific hazard models are correctly specified, but the propensity score model and the censoringmodel are not. λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w , λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = 0 . L + 0 . − L ) , λ ˜ C ( t | A, W ) = 12 e . W . PREPRINT - S
EPTEMBER
1, 2020C1 λ ( t | A = a, W = w ) is a proportional hazard, but λ ( t | A = a, W = w ) is not. The propensity scoremodel and the censoring model are correctly specified: λ ( t | A = a, W = w ) = (1 − a ) λ ( t ) e β A a + β W w + aλ ( t ) e β A L − β A (1 − L )+ β W w λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 . C2 λ ( t | A = a, W = w ) is a proportional hazard, but λ ( t | A = a, W = w ) is not. The propensity scoremodel is correctly specified but the censoring model is not: λ ( t | A = a, W = w ) = (1 − a ) λ ( t ) e β A a + β W w + aλ ( t ) e β A L − β A (1 − L )+ β W w λ ( t | A = a, W = w ) = λ ( t ) e β A a + β W w P ( A = 1 | W ) = expit (0 + log (2)( W − . , λ ˜ C ( t | A, W ) = 12 e . W . We used 250 bootstrap replicates to calculate the bootstrap estimate of the variability of ˆ δ e ( t, . We also calculated anestimator of the variability based on the squared efficient influence function.Table 2 about hereThe results are summarized in Table 2, where each entry in the table based on 1000 replicates. We see that bothestimators are consistent under scenario A1 and A2, and that the simple estimator is slightly more efficient, as expectedsince both cause specific hazard functions are proportional. In Scenario B1 and B2, where the two cause specific hazardsmodels are correctly specified but the propensity score model is mis-specified, both estimators are consistent. Underscenario C1, where only λ ( t | A, W ) is a proportional hazard, the simple estimator is biased whereas the one basedon the efficient influence function is still consistent, as both the censoring and propensity score models are correctlyspecified. Under scenario C2, where the censoring model is mis-specified, the estimator based on the efficient influencefunction is now slightly biased, and the the simple estimator suffer from more severe bias. To illustrate the new estimators, we used data from a randomized trial on prostate cancer therapy (Byarand Green, 1980), which were also analyzed in Stensrud et al. (2020b) and are publicly available to anyone(http://biostat.mc.vanderbilt.edu/DataSets). We restricted our analysis to the patients who received placebo (127patients) and high-dose DES (125 patients). We included baseline measurements of daily activity function (binary), age(centered around its mean), hemoglobin level (centered around its mean) and previous cardiovascular disease (binary)in our analysis. We considered death due to prostate cancer as the event of interest and death due to other causes(consisting primarily of cardiovascular deaths) as the competing event.The events were recorded in monthly intervals from randomization. We used Cox proportional hazards models to obtainthe following hazard ratio estimates of the two cause specific hazards (comparing treatment to placebo): 0.74 (95%CI: 0.45, 1.21) for the primary cause and 1.17 (95% CI: 0.82, 1.66) for the competing cause. However, these hazardratio estimates cannot be interpreted causally (Young et al., 2020; Stensrud et al., 2020b). Yet point estimates of thecumulative incidence curves suggests that the treatment reduces the risk of death death to prostate cancer (Figure 1, leftdisplay), but increases the risk of death due to other causes (Figure 1, right display).Figure 1 about hereTo disentangle the causal treatment effect on the risk of dying from prostate cancer and from competing events,we therefore estimated the separable direct δ ( t, using the proposed ˆ δ e ( t, (Figure 2, left display). The pointestimates suggest a beneficial separable direct of the prostate cancer therapy (although the confidence bands cover0). For example, the separable direct effect after 40 months is estimated to be approximately -0.09, that is, ˆ δ e ( t =40 ,
0) = − .
09 (95%
CI: − . , − . , suggesting that a component of treatment reduces the risk of death dueto prostate cancer. As discussed in Stensrud et al. (2020b) this is supported by the biological argument that DESprevents the male testicles from producing testosterone, which, in turn, may prevent prostate cancer cells fromreplicating. On the other hand, the separable indirect effect (Figure 2, right display) is estimated to be approximately − . CI: − . , − . at t = 40 , suggesting that there is a component of DES that may increase the risk of9 PREPRINT - S
EPTEMBER
1, 2020death due to other causes, potentially because DES includes estrogen that may increase cardiovascular risk (Stensrudet al., 2020b). Thus, given that our identifiability conditions hold, the point estimates suggests that there is potentialfor improving the prostate cancer treatment by providing a new (modified) treatment that does not exert effects ondeath due to other causes. Indeed, such treatments already exist: Luteinising Hormone Releasing Hormone (LHRH)antagonists or orchidectomy (castration) are frequently used to suppress testosterone production in prostate cancerpatients today, and these treatments do not contain estrogen.Figure 2 about here
The separable direct and indirect effects clarify the causal interpretation of treatment effects in competing event settings(Stensrud et al., 2020b), which are ubiquitous in medicine and epidemiology. Our new results enable researchers toestimate these effects using classical models statistical models for survival analysis, such as Cox proportional hazardsmodels. Moreover, by deriving the nonparametric efficient influence function, we have obtained a one-step estimator ofthe separable effects. Alternatively one may estimate the parameter using Targeted Maximum Likelihood Estimation(TMLE), see van der Laan and Rubin (2006) and van der Laan and Rose (2011). In the Appendix, we give calculationsto carry out TMLE for the specific parameter considered in this paper.The estimator derived from the efficient influence function has certain desirable robustness properties, allowing someof the working models to be misspecified. It is, however, not straightforward to estimate the variance of the resultingestimator, because it depends on the working models that may contribute to the variability of the estimator. This is awell known problem, see for instance Moore and van der Laan (2009) and van der Laan and Rose (2011). In this article,we use the non-parametric bootstrap to estimate the variance. Alternatively, one may follow the more complicated routeoutlined in Benkeser et al. (2017) that gives a detailed description of the asymptotic linearity of a similar estimator,although in a simpler setting than the one considered in this paper. Extending the method of Benkeser et al. (2017) tothe setting considered in the present paper is a topic for future research.10
PREPRINT - S
EPTEMBER
1, 2020
References
Andersen, P. K., Geskus, R. B., de Witte, T., and Putter, H. (2012). Competing risks in epidemiology: possibilities andpitfalls.
International journal of epidemiology , 41(3):861–870.Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models.
Biometrics ,61(4):962–973.Benkeser, D., Carone, M., Laan, M. J. V. D., and Gilbert, P. B. (2017). Doubly robust nonparametric inference on theaverage treatment effect.
Biometrika , 104(4):863–880.Byar, D. and Green, S. (1980). The choice of treatment for cancer patients based on covariate information.
Bulletin ducancer , 67(4):477–490.Chen, L., Lin, D. Y., and Zeng, D. (2010). Attributable fraction functions for censored event times.
Biometrika ,97(3):713–726.Didelez, V. (2018). Defining causal meditation with a longitudinal mediator and a survival outcome.
Lifetime dataanalysis , pages 1–18.Hernán, M. A. (2010). The hazards of hazard ratios.
Epidemiology (Cambridge, Mass.) , 21(1):13.Lu, X. and Tsiatis, A. A. (2008). Improving the efficiency of the log-rank test using auxiliary covariates.
Biometrika ,95(3):679–694.Martinussen, T., Vansteelandt, S., and Andersen, P. K. (2018). Subtleties in the interpretation of hazard ratios. arXivpreprint arXiv:1810.09192 , page 0.Moore, K. L. and van der Laan, M. J. (2009). Covariate adjustment in randomized trials with binary outcomes: Targetedmaximum likelihood estimation.
Statistics in Medicine , 28(1):39–64.Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—applicationto control of the healthy worker survivor effect.
Mathematical modelling , 7(9-12):1393–1512.Robins, J. M. and Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects.
Epidemiology ,pages 143–155.Robins, J. M. and Richardson, T. S. (2010). Alternative graphical causal models and the identification of direct effects.
Causality and psychopathology: Finding the determinants of disorders and their cures , pages 103–158.Stensrud, M. J., Hernán, M. A., Tchetgen, E. J. T., Robins, J. M., Didelez, V., and Young, J. G. (2020a). Generalizedinterpretation and identification of separable effects in competing event settings. arXiv preprint arXiv:2004.14824 .Stensrud, M. J., Young, J. G., Didelez, V., Robins, J. M., and Hernán, M. A. (2020b). Separable effects for causalinference in the presence of competing events.
Journal of the American Statistical Association , (just-accepted):1–23.Tsiatis, A. (2006).
Semiparametric Theory and Missing Data . Springer New York.van der Laan, M. and Robins, J. M. (2003).
Unified methods for censored longitudinal data and causality . SpringerScience & Business Media.van der Laan, M. and Rubin, D. (2006). Targeted maximum likelihood learning.
The International Journal ofBiostatistics .van der Laan, M. J. and Rose, S. (2011). Targeted learning.
Springer Series in Statistics .van der Vaart, A. W. (2000).
Asymptotic statistics , volume 3. Cambridge university press.Young, J. G., Stensrud, M. J., Tchetgen Tchetgen, E. J., and Hernán, M. A. (2020). A causal framework for classicalstatistical estimands in failure-time settings with competing events.
Statistics in Medicine .11
PREPRINT - S
EPTEMBER
1, 2020
We consider the case where we observe the full data Z = ( T, (cid:15), A, W ) , that is, no individual is censored due to loss tofollow-up. Once we have the efficient influence function for the full data case then it is easy to get it for the observeddata case as described in Section 2. We calculate the efficient influence function through the following steps.(i) Calculate the efficient influence for ψ ( P ) = Λ ( t | a, w ) .(ii) Calculate the efficient influence for ψ ( P ) = Z ( t, w ) = exp {− Λ ( t | a, w ) } exp {− Λ ( t | a ∗ , w ) } . (iii) Calculate the efficient influence for ψ ( P ) = F ( t | a ∗ , w ) . (iv) Calculate the efficient influence for ψ ( P ) = (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) . (v) Calculate the efficient influence for ψ ( P ) = P ( t, a ∗ , a ) = (cid:90) (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) f ( w ) dw. (vi) Calculate the efficient influence for ψ ( P ) = δ ( t,
1) = P ( t, a ∗ = 1 , a = 1) − P ( t, a ∗ = 0 , a = 1) , which is the direct effect we are interested in.Since we have posed no structure on the distribution of ( T, (cid:15), A, W ) we may use the parametric submodel P v given by f v ( x ) = f ( x ) { v · g ( x ) } , when doing the efficient influence function calculations. In the latter display, g denotes a zero-mean function with finitesecond moment. We need to find ˙ ψ ( g ) = (cid:18) ∂ψ { P v } ∂v (cid:19) | v =0 , which we write as ˙ ψ ( g ) = (cid:90) ˜ ψgdP, and then our efficient influence function is ˜ ψ . (i) We need to write Λ ( t | a, w ) first as a function of the underlying probability measure. This is done by noting that P ( (cid:15) = 2 | T = s, A = a, w ) = λ ( s | A = a, w ) λ ( s | A = a, w ) + λ ( s | A = a, w ) and therefore Λ ( t | a, w ) = (cid:90) t f ( s, (cid:15) = 2 , a, w ) P ( T > s, a, w ) ds (9)and, also that P ( T > s, a, w ) = (cid:88) (cid:15) =1 (cid:90) ∞ s f ( u, (cid:15), a, w ) du so in this way we have expressed Λ ( t | a, w ) in terms of the underlying density function f . We parametrise the densityfunction f v ( x ) = f ( x ) { vg ( x ) } , PREPRINT - S
EPTEMBER
1, 2020and calculate ∂∂v (Λ v ( t | a, w )) | v =0 where we in (9) replace f with f v . This leads to ∂∂v (Λ v ( t | a, w )) | v =0 = (cid:90) I ( s < t, (cid:15) = 2) I ( A = a, w ) P ( A = a, w ) 1 P ( T > s | A, w ) (cid:40) g ( s, (cid:15), A, w ) − (cid:80) (cid:15) =1 (cid:82) ∞ s g ( u, ˜ (cid:15), A, w ) f ( u, ˜ (cid:15), A, w ) duP ( T > s, A, w ) (cid:41) f ( s, (cid:15), A, w ) dx, (10)with x = ( s, (cid:15), A, w ) . So we can directly read off that the first term in the latter display contributes with I ( T < t, (cid:15) = 2) I ( A = a, w ) P ( A = a, w ) exp {− Λ( T | a, w ) } = I ( A = a, w ) P ( A = a, w ) (cid:90) t P ( T > s | a, w ) dN ( s ) to the efficient influence function. By changing order of integration, it may be seen that second term in (10) contributes I ( A = a, w ) P ( A = a, w ) (cid:90) t P ( T > s | a, w ) I ( s ≤ T ) λ ( s | a, w ) ds so all in all, we have that ∂∂v (Λ v ( t | a, w )) | v =0 = (cid:90) ˜ ψgdP (11)where ˜ ψ ( T, (cid:15), A, w ) = I ( A = a, w ) P ( A = a | w ) P ( w ) (cid:90) t P ( T > s | a, w ) dM ( s | a, w ) (12)with M ( t | a, w ) = N ( t ) − (cid:90) t I ( s ≤ T ) λ ( s | a, w ) ds being the martingale associated with the cause 2 counting process (where we condition on A = a and W = w ). (ii) Taking h ( x, y ) = e − x + y and by simple differentiation, we get ˜ ψ ( T, (cid:15), A, w ) = Z ( t, w ) (cid:20) I ( A = a ∗ ) P ( A = a ∗ | w ) (cid:90) t P ( T > s | a ∗ , w ) dM ( s | a ∗ , w ) − I ( A = a ) P ( A = a | w ) (cid:90) t P ( T > s | a, w ) dM ( s | a, w ) (cid:21) I ( W = w ) P ( w ) (13) (iii) We use L = ( A, W T ) T . F ( t | l ) = (cid:90) t P ( T > s | l ) λ ( s | l ) ds We seek to write this parameter as a function of the underlying probability measure, ψ ( P ) . The density function is f ( t, (cid:15), l ) = P ( T > t | l ) { λ ( s | l ) } I ( (cid:15) =1) { λ ( s | l ) } I ( (cid:15) =2) f ( l ) and we can therefore write F ( t | l ) as F ( t | l ) = (cid:90) I ( s < t, (cid:15) = 1) I ( L = l ) P ( L = l ) f ( s, (cid:15), l ) d ( s, (cid:15), l ) = ψ ( P ) We now parametrise the density function f v ( x ) = f ( x ) { vg ( x ) } , where x = ( s, (cid:15), l ) . So we first need to find (cid:16) ∂ψ ( P v ) ∂v (cid:17) | v =0 , which essentially amounts to find ∂∂v (cid:18) f v ( s, (cid:15), l ) P v ( L = l ) (cid:19) | v =0 PREPRINT - S
EPTEMBER
1, 2020It is easily seen that ∂∂v ( P v ( L = l )) | v =0 = E { g ( X ) | L = l } P ( L = l ) ∂∂v ( f v ( s, (cid:15), l )) | v =0 = g ( x ) f ( x ) giving that (cid:18) ∂ψ ( P v ) ∂v (cid:19) | v =0 = (cid:90) I ( s < t, (cid:15) = 1) I ( L = l ) P ( L = l ) { g ( x ) − E ( g ( x ) | L = l ) } f ( x ) dx. Further, (cid:90) I ( s < t, (cid:15) = 1) I ( L = l ) P ( L = l ) { E ( g ( x ) | L = l ) } f ( x ) dx = E { I ( T < t, (cid:15) = 1) I ( L = l ) P ( L = l ) E ( g ( x ) | L ) } = EE ( | L )= E (cid:18) g ( X ) I ( L = l ) P ( L = l ) E { I ( T < t, (cid:15) = 1) | L } (cid:19) giving us that (cid:18) ∂ψ ( P v ) ∂v (cid:19) | v =0 = (cid:90) g ( x ) I ( L = l ) P ( L = l ) [ I ( s < t, (cid:15) = 1) − E { I ( T < t, (cid:15) = 1) | L = l } ] f ( x ) dx = (cid:90) ˜ ψgdP, where ˜ ψ ( T, (cid:15), A = a ∗ , w ) = I ( A = a ∗ , w ) P ( A = a ∗ | w ) P ( w ) { N ( t ) − F ( t | a ∗ , w ) } , (14)is the efficient influence function. (iv) By looking at the transform (cid:90) x ( s ) dy ( s ) we obtain that the efficient influence function of ψ ( P ) = (cid:82) t Z ( s, w ) dF ( s | a ∗ , w ) consists of the sum of the terms inthe following two displays (cid:20) I ( A = a ∗ ) P ( A = a ∗ | w ) (cid:90) t (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) 1 P ( T > u | a ∗ , w ) dM ( u | a ∗ , w ) − I ( A = a ) P ( A = a | w ) (cid:90) t (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) 1 P ( T > u | a, w ) dM ( u | a, w ) (cid:21) I ( W = w ) P ( w ) (15)and I ( A = a ∗ , w ) P ( A = a ∗ | w ) P ( w ) (cid:90) t Z ( s, w ) { dN ( s ) − dF ( s | a ∗ , w ) } . (16) (v) We obtain directly that the efficient influence function of ψ ( P ) = (cid:90) (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) f ( w ) dw is (cid:90) t (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) (cid:20) dM ( u | a ∗ , w ) P ( T > u | a ∗ , w ) I ( A = a ∗ ) P ( A = a ∗ | w ) − dM ( u | a, w ) P ( T > u | a, w ) I ( A = a ) P ( A = a | w ) (cid:21) − I ( A = a ∗ ) P ( A = a ∗ | w ) (cid:90) t Z ( s, w ) { dN ( s ) − dF ( s | a ∗ , w ) } + (cid:90) t Z ( s, w ) dF ( s | a ∗ , w ) − E (cid:26)(cid:90) t Z ( s, W ) dF ( s | a ∗ , W ) (cid:27) (17)Note that (cid:90) tu Z ( s, w ) dF ( s | a ∗ , w ) = P ( t, a ∗ , a, w ) − P ( u, a ∗ , a, w ) . PREPRINT - S
EPTEMBER
1, 2020 (vi)
We can now use the result in (v) to get the efficient influence function of ψ ( P ) = δ ( t,
1) = P ( t, a ∗ = 1 , a = 1) − P ( t, a ∗ = 0 , a = 1) since both terms on the right hand side of the latter display are special cases of (v). This gives the desired efficientinfluence function ˜ ψ ( t, T, (cid:15), A, W ) (cid:8) N ( t ) − P ( t, , , W ) (cid:9) I ( A = 1) P ( A = 1 | W ) − (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) dN ( s ) − P ( t, , , W ) (cid:9) I ( A = 0) P ( A = 0 | W ) − (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) − dM ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) (cid:21) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) (18)where, as earlier defined, δ ( t, , W ) = P ( t, , , W ) − P ( t, , , W ) . Note also that P ( t, , , W ) = F ( t | A = 1 , W ) , the cumulative incidence function for cause 1 given A = 1 and W , and that δ ( t,
1) = E (cid:8) δ ( t, , W ) (cid:9) is the parameter of interest.The efficient influence function ˜ ψ ( t, T, (cid:15), A, W ) needs to be an element of the tangent space T , which is given by T = T ⊕ T ⊕ T where T = (cid:8)(cid:90) h ( u, A, W ) dM ( u, A, W ) for all h ( u, a, w ) (cid:9) , T = (cid:8)(cid:90) h ( u, A, W ) dM ( u, A, W ) for all h ( u, a, w ) (cid:9) , T = (cid:8) h ( A, W ) ∈ H : E { h ( X, W ) } = 0 (cid:9) . This is not obvious at first sight when looking at (18). However, it turns out be true as we may show that N ( t ) − F ( t | a, w ) = (cid:90) t dM ( s | a, w ) − (cid:90) t { F ( t | a, w ) − F ( s, | a, w ) (cid:9) P ( T > s | a, w ) dM ( s | a, w ) (19)where M ( t | a, w ) = M ( t | a, w ) + M ( t | a, w ) . To see that (19) holds just check that it holds in the two cases T ≤ t and t < T . Using (19) we can write the efficient influence function ˜ ψ ( t, T, (cid:15), A, W ) as (cid:8) M ( t | , W ) − (cid:90) t { F ( t | , w ) − F ( s, | , w ) (cid:9) P ( T > s | , w ) dM ( s | , w ) (cid:9) I ( A = 1) P ( A = 1 | W ) − (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) (cid:8) dM ( s | , w ) − { F ( t | , w ) − F ( s, | , w ) (cid:9) P ( T > s | , w ) dM ( s | , w ) (cid:9) I ( A = 0) P ( A = 0 | W )+ (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) − dM ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) (cid:21) + δ ( t, , W ) − E (cid:8) δ ( t, , W ) (cid:9) (20)and from this representation we note that ˜ ψ ( t, T, (cid:15), A, W ) indeed belongs to the tangent space T . The expression in thelatter display can further be re-written giving (3). 15 PREPRINT - S
EPTEMBER
1, 2020
We argue now that the estimator ˆ δ e ( t, given in (7) has certain robustness properties unlike the initial estimator ˆ δ ( t, based on Cox-models for the cause specific hazard functions. We first consider the case where there is nocensoring. The estimator ˆ δ e ( t, consist of a difference between two terms, and we consider these terms separately.We have that P ( t, ,
1) = E { F ( t | A = 1 , W ) } , and define Y = N ( t ) I ( A = 1) P ( A = 1 | W ) so E ( Y ) = E { F ( t | A = 1 , W ) } if P ( A = 1 | W ) is correctly specified. The corresponding terms of the estimator are (cid:20) N ( t ) − F ( t | A = 1 , W ) (cid:21) I ( A = 1) P ( A = 1 | W ) + F ( t | A = 1 , W )= Y + F ( t | A = 1 , W ) (cid:26) P ( A = 1 | W ) − I ( A = 1) P ( A = 1 | W ) (cid:27) (21)from which we see that the mean of the left hand side of (21), if P ( A = 1 | W ) is correctly specified, is E ( Y ) + 0 = F ( t, so it has the correct mean in that case. On the other hand, if F ( t | A = 1 , W ) is correctly specified (which is the case if λ j ( t | A = 1 , W ) , j = 1 , , are correctly specified) but P ( A = 1 | W ) may not be so, then the mean of the left hand sideof (21) is E (cid:26) F ( t | A = 1 , W ) P ( A = 1 | W ) P ∗ ( A = 1 | W ) (cid:27) + E (cid:20) F ( t | A = 1 , W ) (cid:26) − P ( A = 1 | W ) P ∗ ( A = 1 | W ) (cid:27)(cid:21) = F ( t, , where we use P ∗ ( A = 1 | W ) to denote the limit (in probability) of a potentially mis-specified estimator ˆ P ( A = 1 | W ) ,and P to denote the truth. So the estimator ˆ P ( t, ,
1) = n − (cid:88) i (cid:20) ˆ P ( t, , , W i ) + (cid:8) N i ( t ) − ˆ P ( t, , , W i ) (cid:9) I ( A i = 1)ˆ P ( A = 1 | W i ) (cid:21) is double robust in this sense. We now turn to the more difficult part, ˆ P ( t, , . We need to look at (cid:8)(cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) dN ( s ) − P ( t, , , W ) (cid:9) I ( A = 0) P ( A = 0 | W ) + P ( t, , , W )+ (cid:90) t (cid:8) P ( t, , , W ) − P ( u, , , W ) (cid:9)(cid:20) dM ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) − dM ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) (cid:21) . (22)Now let Y = (cid:90) t e − Λ ( s | ,W ) e − Λ ( s | ,W ) dN ( s ) I ( A = 0) P ( A = 0 | W ) where Λ denotes the true value of the parameter. If P ( A = 1 | W ) is correctly specified then E ( Y ) = P ( t, , . Display (22) can be rewritten as Y + P ( t, , , W ) (cid:26) P ( A = 0 | W ) − I ( A = 0) P ( A = 0 | W ) (cid:27) + (cid:90) t (cid:8) e − Λ ( s | ,W ) e − Λ ( s | ,W ) − e − Λ ( s | ,W ) e − Λ ( s | ,W ) (cid:9) dN ( s ) I ( A = 0) P ( A = 0 | W )+ (cid:90) t Z ( s, W ) (cid:20)(cid:90) s dM ( u | , W ) P ( T > u | , W ) I ( A = 0) P ( A = 0 | W ) − (cid:90) s dM ( u | , W ) P ( T > u | , W ) I ( A = 1) P ( A = 1 | W ) (cid:21) dF ( s | , W ) . (23)16 PREPRINT - S
EPTEMBER
1, 2020Assume first that λ ( t | j, W ) and λ ( t | j, W ) , j = 0 , , are correctly specified but P ( A = j | W ) may not be so. It isthen clear from (3) that the estimator based on the efficient influence function is unbiased as the efficient influencefunction will still have zero mean.We now assume that P ( A = j | W ) and λ ( t | j, W ) are correctly specified but λ ( t | j, W ) may be mis-specified. Itfollows directly that the mean of the first term of display (23) is E ( Y ) + 0 = P ( t, , . and it is also clear that the the mean of the sum of the last two terms of display (23) is zero as long as λ ( t | j, W ) iscorrectly specified.We now assume that P ( A = j | W ) and λ ( t | j, W ) are correctly specified but λ ( t | j, W ) may be mis-specified. Itfollows again that the mean of the first term of display (23) is E ( Y ) + 0 = P ( t, , . Keep in mind that we have E (cid:26) dN ( s ) I ( A = 0) P ( A = 0 | W ) | W (cid:27) = dF ( s | , W ) (24)and that E (cid:26) dN ( s ) I ( A = j ) P ( A = j | W ) | W (cid:27) = dF ( s | j, W ) for j = 1 , . We let Λ ∗ k ( t | j, w ) denote the limit of a given estimator ˆΛ k ( t | j, w ) for k = 1 , and j = 0 , , and similarly,let Z ∗ ( t, w ) denote the limit of the estimator ˆ Z ( t, w ) , and P ∗ ( t > u | j, w ) being e − Λ ∗ ( u | j,w ) − Λ ∗ ( u | j,w ) (all supposed toexist such as would be the case if we use Cox-models for the two cause specific hazard functions). We note that E (cid:8) dN ( u ) − Y ( u ) d Λ ∗ ( u | j, W ) P ∗ ( T > u | j, W ) I ( A = j ) P ( A = j | W ) | W (cid:9) = dF ( u | j, W )) P ∗ ( T > u | j, W ) − P ( T > u | j, W ) d Λ ∗ ( u | j, W ) P ∗ ( T > u | j, W )= − P ( T > u | j, W ) P ∗ ( T > u | j, W ) (cid:8) d Λ ∗ ( u | j, W ) − Λ ( u | j, W ) (cid:9) = − e { Λ ∗ ( u | j,W ) − Λ ( u | j,W ) } (cid:8) d Λ ∗ ( u | j, W ) − Λ ( u | j, W ) (cid:9) (25)since λ is correctly specified. Integrating the right hand side of (25) from 0 to s gives − e { Λ ∗ ( u | j,W ) − Λ ( u | j,W ) } + 1 Using that dF ( s | , W ) = e −{ Λ ∗ ( u | ,W ) − Λ ( u | ,W ) } dF ( s | , W ) we see that the last term of (23) converges to − (cid:90) t { Z ∗ ( s, w ) − Z ( s, w ) } dF ( s | , W ) that cancels with the limit of the second term of (23). Hence, ˆ δ e ( t, is consistent if two of the three models (i): λ ( t | j, w ) ; (ii): λ ( t | j, w ) and (iii): P ( A = j | W ) , are correctly specified. We now consider the case with censoringand let (iv): λ C ( s | A, W ) be the censoring model. Using (4), we can write ψ ( t, D ) = ˜ ψ ( t, Z ) + (cid:90) ∞ (cid:2) ˜ ψ ( t, Z ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W (cid:9)(cid:3) K C ( s | A, W ) dM C ( s | A, W ) (26)that depends on G as noted in Section 3.1. We let G ∗ denote the limit of the estimator G n of G , and have, as notedabove, that E { ˜ ψ ( t, Z, G ∗ ) } = 0 if two out of the three models (i)-(iii) are correctly specified. Keep in mind that dM C ( s | A, W ) = dN C ( s ) − I ( s ≤ ˜ T ) d Λ C ( s | A, W ) . Since T and C are conditional independent given A and W , wehave E { dN C ( s ) | T, A, W ) } = I ( s ≤ T ) P ( T > s | A, W ) K C ( s | A, W ) λ C ( s | A, W ) E { I ( s ≤ ˜ T ) | T, A, W ) } = I ( s ≤ T ) P ( T > s | A, W ) K C ( s | A, W ) . PREPRINT - S
EPTEMBER
1, 2020Therefore, E (cid:26) (cid:2) ˜ ψ ( t, Z, G ∗ ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W, G ∗ (cid:9)(cid:3) K ∗ C ( s | A, W ) dM ∗ C ( s | A, W ) (cid:27) = E (cid:26) I ( s ≤ T ) (cid:2) ˜ ψ ( t, Z, G ∗ ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W, G ∗ (cid:9)(cid:3) K C ( s | A, W ) K ∗ C ( s | A, W ) { λ C ( s | A, W ) − λ ∗ C ( s | A, W ) } ds (cid:27) , and E (cid:8) I ( s ≤ T ) (cid:2) ˜ ψ ( t, Z, G ∗ ) − E (cid:8) ˜ ψ ( t, Z ) | T > s, A, W, G ∗ (cid:9)(cid:3) | A, W (cid:9) = (cid:88) (cid:15) (cid:90) ∞ s ˜ ψ ( t, u, (cid:15), A, W ) (cid:2) f ( u, (cid:15) | A, W ) − f ∗ ( u, (cid:15) | A, W ) P ( T > u | A, W ) P ∗ ( T > u | A, W ) (cid:3) du We hence see that the mean of the latter term in (26) is zero if either models (i) and (ii) are correctly specified or ifmodel (iv) is correctly specified. The proposed estimator ˆ δ e ( t, is thus consistent if the following models are correctlyspecified: (i) and (ii), or (i), (iii) and (iv), or (ii), (iii) and (iv). It follows that under appropriate conditions, as in Chen et al. (2010), that n / (cid:110) ˆ P ( t, a Y , a D ) − P ( t, a Y , a D ) (cid:111) = n − / (cid:88) i [ P ( t, a Y , a D , W i ) − E { P ( t, a Y , a D , W i ) } ]+ E (cid:104) n / (cid:110) ˆ P ( t, a Y , a D , W ) − P ( t, a Y , a D , W ) (cid:111)(cid:105) + o p (1) , (27)where the expectation in the second term is taken wrt W . The first term on the right hand side of (27) is already oniid-form, so we only need to deal with the second term. Define θ t ( a Y , a D , w ) = e − Λ ( t | A = a Y ,w ) − Λ ( t | A = a D ,w ) . It is then easily seen that n / (cid:110) d ˆ P ( t, a Y , a D , W ) − dP ( t, a Y , a D , W ) (cid:111) = n / (cid:110) ˆ θ t ( a Y , a D , W ) − θ t ( a Y , a D , W ) (cid:111) e β A a Y + β W W d Λ ( t )+ θ t ( a Y , a D , W ) g ( β , a Y , W ) d n / (cid:110) ˆΛ ( t ) − Λ ( t ) (cid:111) (28) + θ t ( a Y , a D , W ) (cid:8) D β g ( β , a Y , W ) (cid:9) n / (cid:110) ˆ β − β (cid:111) d Λ ( t ) and we then need to find the influence functions of the three terms on the right hand side of (28), and take expectationwrt W . In the latter display, g ( β , a Y , W ) = e β A a Y + β W W with β j = ( β jA , β jW ) , j = 1 , . The second and third terms can be dealt with directly. For instance, with (cid:15) β i being theinfluence function corresponding to ˆ β , E W (cid:2) θ t ( a Y , a D , W ) (cid:8) D β g ( β , a Y , W ) (cid:9) n / (cid:110) ˆ β − β (cid:111) d Λ ( t ) (cid:3) = E L (cid:2) θ t ( a Y , a D , L ) (cid:8) D β g ( β , a Y , L ) (cid:9)(cid:3) (cid:40) n − / (cid:88) i (cid:15) β i (cid:41) d Λ ( t ) in which we may estimate E W (cid:2) θ t ( a Y , a D , W ) (cid:8) D β g ( β , a Y , W ) (cid:9)(cid:3) by n − (cid:88) i θ t ( a Y , a D , W i ) (cid:8) D β g ( β , a Y , W i ) (cid:9) . and then replacing parameters with their estimates. Similarly with the second term on the right hand side of (28). Thefirst term on the right hand side of (28) is handled using the mean value theorem (Taylor expansion) with respect to theparameters { Λ j ( t ) , β j } , j = 1 , . The expectation wrt W of the part corresponding to λ ( t ) of this term is − E W (cid:2) θ t ( a Y , a D , W ) e β A a Y + β W W ) (cid:3) (cid:40) n − / (cid:88) i (cid:15) λ i ( t ) (cid:41) d Λ ( t ) , PREPRINT - S
EPTEMBER
1, 2020where E W (cid:2) θ t ( a Y , a D , W ) e β A a Y + β W W ) (cid:3) is estimated by n − (cid:88) i θ t ( a Y , a D , W i ) e β A a Y + ˆ β W W i ) and replacing parameters with their estimates. Similarly with the other terms. By combining all these terms, we havethat n / (cid:110) ˆ P ( t, a Y , a D ) − P ( t, a Y , a D ) (cid:111) = (cid:88) i (cid:15) P i ( t, a Y , a D ) , where (cid:15) P i ( t, a Y , a D ) are zero-mean iid terms (i.e. we have found the influence function), and we have also a way ofestimating the influence function. So, basically, what we need is the first term in (27), and then all the influence functionscorresponding to the parameters { Λ j ( t ) , β j } using the usual Cox-regression estimators (these can be extracted fromthe coxaalen -function in R-package timereg ), and then also some ordinary empirical means such as n − (cid:88) i θ t ( a Y , a D , W i ) e β A a Y + ˆ β W W i ) . We here briefly describe how to do Targeted Maximum Likelihood Estimation (TMLE), see van der Laan and Rubin(2006) and van der Laan and Rose (2011). We do so for the setting with no censoring, based on expression (3). Thegeneralization to the censored case follows immediately. The tangent space adhering to f ( A | W ) consist of all functions r ( A, W ) so that E { r ( A, W ) | W } = 0 , and the efficient influence function ˜ ψ ( t, T, (cid:15), A, W ) is therefore orthogonal tothat space. To see why, E (cid:8) r ( A, W ) (cid:90) g ( s ; A, W ) dM j ( s | A, W ) (cid:9) = E (cid:2) r ( A, W ) E { (cid:90) g ( s ; A, W ) dM j ( s | A, W ) | A, W } (cid:3) = 0 and E (cid:2) r ( A, W ) { δ ( t, , W ) − δ ( t, } (cid:3) = E (cid:2) { δ ( t, , W ) − δ ( t, } E { r ( A, W ) | W } (cid:3) = 0 . Hence we need not to fluctuate the initial estimator of P ( A = a | W ) , and can concentrate on the part of the tangentspace adhering to f ( T, (cid:15) | A, W ) . The needed parametric submodels are obtained by parametrizing the cause specifichazard function as λ j,γ ( s | A, W ) = λ j ( s | A, W ) e γH j ( s ; A,W ) , j = 1 , . The score (in γ ) corresponding to f ( T, (cid:15) | A, W ) of the likelihood is (cid:88) j =1 (cid:90) H j ( s ; A, W ) dM Tj ( s | A, W ) so we see from (3) that we need to pick the functions H j ( s ; A, W ) as H ( s ; A, W ) = I ( s ≤ t ) g ( A, W ) e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:26) − { F ( t | A, w ) − F ( s, | A, w ) (cid:9) P ( T > s | A, w ) (cid:27) H ( s ; A, W ) = I ( s ≤ t ) g ( A, W ) P ( T > s | A, W ) , (cid:26) P ( t, , , W ) − P ( s, , , W ) − e − Λ ( s | ,W ) e − Λ ( s | A,W ) (cid:8) F ( t | A, w ) − F ( s, | A, W ) (cid:9)(cid:27) . We find ˆ γ as the solution to U ( γ ) = n (cid:88) i = 2 (cid:88) j =1 (cid:90) H j ( s ; A i , W i ) (cid:8) dN ij ( s ) − Y i ( s ) e γH j ( s ; A i ,W i ) d ˆΛ j ( s | A i , W i ) (cid:9) , (29)where ˆΛ j ( s | A i , W i ) is an initial estimate of Λ j ( s | A i , W i ) based for instance on semiparametric models such as theCox-model. We then calculate ˆΛ (1) j ( s | A, W ) = (cid:90) s e ˆ γH j ( u ; A,W ) d ˆΛ j ( s | A, W ) . PREPRINT - S
EPTEMBER
1, 2020Next step is to solve equation (29) with ˆΛ j replaced by ˆΛ (1) j to get an updated ˆ γ (1) to calculate ˆΛ (2) j ( s | A, W ) = (cid:90) s e ˆ γ (1) H j ( u ; A,W ) d ˆΛ j ( s | A, W ) and iterate until convergence (ie until ˆ γ does no longer change), say it happens at iteration step k . The TMLE estimateof parameter of interest is then simply the plug-in estimate ˆ δ ( t,
1) = n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( k )1 ( s | ,W i ) − ˆΛ ( k )2 ( s | ,W i ) d ˆΛ ( k )1 ( s | , W i ) (cid:27) − n − (cid:88) i (cid:26)(cid:90) t e − ˆΛ ( k )1 ( s | ,W i ) − ˆΛ ( k )2 ( s | ,W i ) d ˆΛ ( k )2 ( s | , W i ) (cid:27) . Table 1: Simulation results concerning the estimators ˆ P ( t, , , ˆ P ( t, , and ˆ δ ( t,
1) = ˆ P ( t, , − ˆ P ( t, , .Each entry in the table is based on 1000 replicates. n=400 n=800 t = 2 t = 4 t = 6 t = 2 t = 4 t = 6 True P ( t, , ˆ P ( t, , ˆ P ( t, , ) 0.013 0.020 0.025 0.009 0.014 0.018see ( ˆ P ( t, , ) 0.013 0.020 0.025 0.009 0.014 0.01895% CP( ˆ P ( t, , ) 0.924 0.934 0.945 0.939 0.938 0.942True P ( t, , ˆ P ( t, , ˆ P ( t, , ) 0.020 0.027 0.032 0.0143 0.019 0.023see ( ˆ P ( t, , ) 0.019 0.028 0.033 0.014 0.019 0.02395% CP( P ( t, , ) 0.929 0.952 0.952 0.948 0.954 0.956True δ ( t, -0.049 -0.080 -0.101 -0.049 -0.080 -0.101Mean ˆ δ ( t, -0.048 -0.079 -0.099 -0.049 -0.081 -0.101sd ( ˆ δ ( t, ) 0.019 0.031 0.038 0.014 0.022 0.027see ( ˆ δ ( t, ) 0.019 0.031 0.039 0.014 0.022 0.02895% CP( ˆ δ ( t, ) 0.946 0.951 0.952 0.956 0.957 0.956 PREPRINT - S
EPTEMBER
1, 2020Table 2: Simulation results concerning the simple estimator ˆ δ ( t, and the one based on the efficient influence function, ˆ δ e ( t, . We let see ei denote the standard error estimate based on the squared efficient influence function and see b denotes the standard error estimate based 250 bootstrap replicates. Each entry in the table is based on 1000 replicates. t = 1 t = 3 t = 5 t = 7 t = 9 A1 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.128 -0.177 -0.209 -0.229Mean ˆ δ e ( t, -0.052 -0.128 -0.178 -0.209 -0.229sd ( ˆ δ ( t, ) 0.014 0.024 0.029 0.032 0.034sd ( ˆ δ e ( t, ) 0.020 0.029 0.033 0.035 0.036see ei ( ˆ δ e ( t, ) 0.020 0.030 0.035 0.037 0.038see b ( ˆ δ e ( t, ) 0.020 0.029 0.033 0.035 0.036A2 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.127 -0.176 -0.208 -0.228Mean ˆ δ e ( t, -0.052 -0.127 -0.176 -0.207 -0.227sd ( ˆ δ ( t, ) 0.015 0.028 0.034 0.039 0.042sd ( ˆ δ e ( t, ) 0.020 0.033 0.038 0.042 0.044see ei ( ˆ δ e ( t, ) 0.020 0.032 0.038 0.042 0.046see b ( ˆ δ e ( t, ) 0.020 0.031 0.038 0.042 0.044B1 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.127 -0.176 -0.208 -0.228Mean ˆ δ e ( t, -0.052 -0.126 -0.176 -0.207 -0.228sd ( ˆ δ ( t, ) 0.014 0.025 0.031 0.034 0.037sd ( ˆ δ e ( t, ) 0.024 0.036 0.041 0.043 0.046see ei ( ˆ δ e ( t, ) 0.024 0.037 0.043 0.046 0.048see b ( ˆ δ e ( t, ) 0.023 0.035 0.040 0.043 0.044B2 True δ ( t, -0.052 -0.128 -0.178 -0.210 -0.231Mean ˆ δ ( t, -0.052 -0.127 -0.175 -0.206 -0.227Mean ˆ δ e ( t, -0.052 -0.126 -0.175 -0.206 -0.227sd ( ˆ δ ( t, ) 0.016 0.028 0.037 0.042 0.045sd ( ˆ δ e ( t, ) 0.024 0.038 0.045 0.049 0.054see ei ( ˆ δ e ( t, ) 0.024 0.039 0.048 0.052 0.056see b ( ˆ δ e ( t, ) 0.024 0.038 0.045 0.050 0.054C1 True δ ( t, ˆ δ ( t, ˆ δ e ( t, ˆ δ e ( t, ) 0.030 0.040 0.042 0.041 0.041see ei ( ˆ δ e ( t, ) 0.030 0.041 0.043 0.044 0.043see b ( ˆ δ e ( t, ) 0.030 0.040 0.042 0.042 0.042C2 True δ ( t, ˆ δ ( t, ˆ δ e ( t, ˆ δ e ( t, ) 0.032 0.043 0.047 0.049 0.050see ei ( ˆ δ e ( t, ) 0.031 0.044 0.048 0.051 0.052see b ( ˆ δ e ( t, ) 0.030 0.043 0.048 0.050 0.051 PREPRINT - S
EPTEMBER
1, 2020
Time (months) A b s o l u t e r i sk . . . . . Time (months) A b s o l u t e r i sk . . . . . Figure 1: Results from the prostate cancer data example. The curves describe the cumulative incidence of prostatecancer death (left panel) and death due to other causes (right panel). The black curves denote the placebo arm while theorange curves denote the treatment arm, and the shaded areas are 95% confidence bands.22
PREPRINT - S
EPTEMBER
1, 2020 - . - . - . - . . . Time (months) - . - . - . - . . . Time (months)
Figure 2: Separable effects in the prostate cancer data application. Left panel shows ˆ δ e ( t,0)