Longitudinal mediation analysis of time-to-event endpoints in the presence of competing risks
Tat-Thang Vo, Hilary Davies-Kershaw, Ruth Hackett, Stijn Vansteelandt
RReceived: Added at production Revised: Added at production Accepted: Added at productionDOI: xxx/xxxx
RESEARCH ARTICLE
Longitudinal mediation analysis of time–to–event endpoints inthe presence of competing risks
Tat-Thang Vo* | Hilary Davies-Kershaw | Ruth Hackett | Stijn Vansteelandt Department of Applied Mathematics,Computer Science and Statistics, GhentUniversity, Ghent, Belgium Université de Paris, CRESS, INSERM,INRA, Paris, France Department of Population Health, LondonSchool of Hygiene and Tropical Medicine,London, UK Health Psychology Section, Department ofPsychology, King’s College London,London, UK Department of Medical Statistics, LondonSchool of Hygiene and Tropical Medicine,London, UK
Correspondence *Tat-Thang Vo, Department of AppliedMathematics, Computer Science andStatistics, Ghent University, Krijgslaan 281,S9, Ghent 9000, Belgium. Email:[email protected]
Summary
This proposal is motivated by an analysis of the English Longitudinal Study of Age-ing (ELSA), which aims to investigate the role of loneliness in explaining the negativeimpact of hearing loss on dementia. The methodological challenges that complicatethis mediation analysis include the use of a time-to-event endpoint subject to com-peting risks, as well as the presence of feedback relationships between the mediatorand confounders that are both repeatedly measured over time. To account for thesechallenges, we introduce natural effect proportional (cause-specific) hazard mod-els. These extend marginal structural proportional (cause-specific) hazard models toenable effect decomposition. We show that under certain causal assumptions, thepath-specific direct and indirect effects indexing this model are identifiable from theobserved data. We next propose an inverse probability weighting approach to esti-mate these effects. On the ELSA data, this approach reveals little evidence that thetotal efect of hearing loss on dementia is mediated through the feeling of loneliness,with a non-statistically significant indirect effect equal to 1.012 (hazard ratio (HR)scale; 95% confidence interval (CI) 0.986 to 1.053).
KEYWORDS:
Longitudinal mediation analysis, Natural effect model, Inverse weighting, Survival outcome
This article is motivated by an analysis of the English Longitudinal Study of Ageing, a longitudinal cohort study of individualsaged 50 and older living in the community in England , which follows participants biennially since 2002/03. In previous studies,it was shown that both self-reported hearing loss and loneliness are significantly associated with a higher risk of dementia .The question of interest, which we shall address here, is whether loneliness mediates the impact of hearing loss on incidentphysician-diagnosed dementia.The focus on time-to-event endpoints (dementia) complicates the planned mediation analysis. It renders the popular difference-and product-of-coefficient methods inappropriate , necessitating the use of more complex causal mediation analysis methods.While such methods have been developed for the analysis of time-to-event endpoints, most ignore that the mediator is notassessed at baseline and that subjects may therefore experience the event prior to the mediator being assessed . Furthercomplications arise from the mediator, loneliness, being repeatedly measured. While useful to better capture mediation via theentire longitudinal mediator process , this also gives rise to complex time-varying confounding patterns whereby mediators Abbreviations:
ANA, anti-nuclear antibodies; APC, antigen-presenting cells; IRF, interferon regulatory factor a r X i v : . [ s t a t . M E ] J u l Vo ET AL (e.g. loneliness) and confounders (e.g. comorbidities) mutually influence each other over time. These complications have beenaddressed in a number of recent works .A further complication that we must consider, and that we have not previously found being addressed in the literature, isthe presence of competing risks by death. With observed - as opposed to counterfactual - event times, the modelling of cause-specific hazards is well known to simplify the handling of competing risks. However, the modelling of (cause-specific) hazardsis not readily possible in the previous mediation analysis works , which instead focus on the analysis of survival chances.In view of this, in this paper, we will here introduce so-called natural effect proportional (cause-specific) hazard models, whichdirectly parameterise the direct and indirect effects of a given exposure on the cause-specific hazard of the considered event.We will extend the weighting-based approach proposed by Mittinty and Vansteelandt for longitudinal natural effect models toenable the planned mediation analysis of an exposure via repeatedly measured mediators on a time-to-event endpoint subject tocompeting risks.We proceed as follows. In the next section, we first describe the setting of interest. We then extend the natural effect propor-tional hazard model to take into account the longitudinal nature of the mediator. In the same section, we discuss the assumptionsunder which the path-specific direct and indirect effects derived from such model are identifiable from data, and provide a step-by-step procedure for estimating these effects. In section 3, we apply the proposed approach to analyze data from the EnglishLongitudinal Study of Ageing . We end with some final remarks and a discussion. Consider an observational study in which independent individuals 𝑖 = 1 , … , 𝑛 are exposed to a categorical factor 𝐴 𝑖 coded as , , … , 𝑃 − 1 for 𝑃 different categories. Longitudinal measurements of the mediator 𝑀 𝑖 , 𝑀 𝑖 … , 𝑀 𝑖𝐾 and of the covariates 𝐿 𝑖 , 𝐿 𝑖 … , 𝐿 𝑖𝐾 are subsequently recorded at baseline (subscript 0) and at visits , … , 𝐾 , along with (i) a time-to-event endpoint 𝑇 𝑖 and (ii) an index 𝐷 𝑖 specifying whether the main event ( 𝐷 𝑖 = 1 ) or the competing one ( 𝐷 𝑖 = 2 ) happens. Denote 𝑡 𝑘 , 𝑘 = 1 … 𝐾 the fixed time point after the onset of the exposure at which the measurements of 𝑀 𝑘 and 𝐿 𝑘 are pre-planned for all patients.Assume that these measurements are only recorded until the last visit 𝐾 or until event 𝐷 𝑖 = 1 or 𝐷 𝑖 = 2 happens, whichevercomes first. The time-to-event endpoint may be censored administratively or due to loss to follow-up, in which case 𝐷 𝑖 = 0 .The causal diagram in figure 1 depicts the relationships between the variables over time. It also represents a non-parametricstructural equation model with independent errors. In the diagram, 𝐿 𝑘 includes the indicator 𝐼 ( 𝑇 ≥ 𝑡 𝑘 ) of having survived visit 𝑘 . Throughout, we will denote the history of measurements up to visit 𝑘 using a bar, i.e. 𝑀 𝑘 = ( 𝑀 , 𝑀 , … 𝑀 𝑘 ) .To define the direct and indirect effects of interest, we will make use of so-called path-specific effects, expressed as a (cause-specific) hazard ratio. In particular, we define the counterfactual variables 𝑇 𝑎,𝑎 ∗ and 𝐷 𝑎,𝑎 ∗ as the time to the main or competingevent (whichever comes first) and the corresponding event index that would be observed if the exposure 𝐴 were set to 𝑎 and themediator levels changed to the levels that we would have seen if the exposure were set to 𝑎 ∗ and the levels of the time-varyingconfounders were as observed under this joint intervention on 𝐴 and 𝑀 𝐾 , respectively. The total causal effect (TE) on the cause- 𝑗 -specific hazard when the exposure changes from 𝑎 to 𝑎 ∗ is then expressed as 𝐻𝑅 𝑗𝑇 𝐸 = 𝜆 𝑗𝑎 ∗ ,𝑎 ∗ ( 𝑡 ) 𝜆 𝑗𝑎,𝑎 ( 𝑡 ) , which can be decomposedinto the direct effect (DE) 𝐻𝑅 𝑗𝐷𝐸 = 𝜆 𝑗𝑎 ∗ ,𝑎 ( 𝑡 ) 𝜆 𝑗𝑎,𝑎 ( 𝑡 ) and the indirect effect (IE) 𝐻𝑅 𝑗𝐼𝐸 = 𝜆 𝑗𝑎 ∗ ,𝑎 ∗ ( 𝑡 ) 𝜆 𝑗𝑎 ∗ ,𝑎 ( 𝑡 ) , where 𝐻𝑅 𝑗𝑇 𝐸 = 𝐻𝑅 𝑗𝐼𝐸 × 𝐻𝑅 𝑗𝐷𝐸 .The indirect effect 𝐻𝑅 𝑗𝐼𝐸 hence reflects the part of the treatment effect (on the cause- 𝑗 -specific hazard) that is mediated via thepathways 𝐴 → 𝑀 𝑘 → … → 𝑇 , where 𝑘 = 1 , , … . These pathways start from the treatment 𝐴 and go directly to one of themediator levels before getting to the event time 𝑇 by any intermediate path. In contrast, the direct effect 𝐻𝑅 𝑗𝐷𝐸 reflects the partof the treatment effect (on the cause- 𝑗 -specific hazard) that does not go through any of the above pathways.We are now ready to define the cause– 𝑗 –specific natural effect proportional hazard model, accounting for longitudinalmediators, as follows: 𝜆 𝑗𝑎,𝑎 ∗ ( 𝑡 ) = 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ (1)for all 𝑎, 𝑎 ∗ , where 𝜆 𝑗 ( 𝑡 ) , 𝛼 𝑗 and 𝛼 𝑗 are unknown. Under model (1) , the total, direct and indirect effect can be expressed on thehazard ratio scale as 𝐻𝑅 𝑗𝑇 𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 )( 𝑎 ∗ − 𝑎 ) ; 𝐻𝑅 𝑗𝐼𝐸 = 𝑒 𝛼 𝑗 ( 𝑎 ∗ − 𝑎 ) and 𝐻𝑅 𝑗𝐷𝐸 = 𝑒 𝛼 𝑗 ( 𝑎 ∗ − 𝑎 ) , respectively. To assess the possibility o ET AL 𝐴 𝑀 𝑀 ... 𝑀 ⌊ 𝑡 ⌋ 𝑇𝐿 𝐿 𝐿 ... 𝐿 ⌊ 𝑡 ⌋ 𝑈 𝑙 𝑈 𝑚 FIGURE 1
Causal diagram. 𝑈 𝑙 : unmeasured confounders affecting 𝐿 and ( 𝑇 , 𝐷 ) . 𝑈 𝑚 : unmeasured confounders affectingdifferent measurements of 𝑀 over timeof mediator-exposure interaction, one can alternatively consider model: 𝜆 𝑗𝑎,𝑎 ∗ ( 𝑡 ) = 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ + 𝛼 𝑗 𝑎.𝑎 ∗ (2)Under model (2), the total causal effect on the hazard ratio scale is expressed as 𝐻𝑅 𝑗𝑇 𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 + 𝛼 𝑗 ( 𝑎 ∗ + 𝑎 )) ) ( 𝑎 ∗ − 𝑎 ) and is decom-posed into the indirect effect 𝐻𝑅 𝑗𝐼𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 𝑎 )( 𝑎 ∗ − 𝑎 ) and the direct effect 𝐻𝑅 𝑗𝐷𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 𝑎 ∗ )( 𝑎 ∗ − 𝑎 ) . Finally, note that othermodels for survival outcomes, such as the Aalen model, can also be extended to the current context. To estimate the parameters in model (1) , we will assume that the set of baseline covariates 𝐿 is sufficient to control for con-founding of the relationship between 𝐴 and ( 𝑇 , 𝐷 ) , as well as between 𝐴 and 𝑀 𝑡 at any time. Besides, we assume there are nounmeasured confounders of the relationship between the time-to-event outcome and the mediator at any time. In figure 1, thelatter assumption is satisfied since conditioning on the exposure 𝐴 and the history of the time-varying confounder 𝐿 up to time 𝑡 is sufficient to adjust for confounding of the relationship between the time-to-event outcome 𝑇 and the mediator level at time 𝑡 .Our development allows for the presence of unmeasured common causes of the mediators over time (i.e. denoted 𝑈 𝑚 in figure 1)and separate, independent unmeasured common causes of the baseline/time-varying confounders and the survival time ( 𝑇 , 𝐷 ) (i.e. denoted 𝑈 𝑙 ).In what follows, we generalize the standard estimation procedure for the popular marginal strutural models to fit models (1)and (2). For this, we make the assumption that censoring is non-informative, in the sense that at any time, the instantaneous riskof the event among patients who then drop out of the study is not different (at all future times) from that of patients who remain,conditional on the exposure level. Under the aforementioned assumptions, Appendix 1 shows that consistent estimators of theparameters indexing model (1) can be obtained by solving the estimating equation: Vo ET AL ∞ ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎧⎪⎨⎪⎩( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ̂𝐸 [( 𝑎𝑎 ∗ ) 𝑅 𝑗𝑖 ( 𝑡 ) 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ]∑ 𝑎,𝑎 ∗ ̂𝐸 [ 𝑅 𝑗𝑖 ( 𝑡 ) 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ] ⎫⎪⎬⎪⎭ ⋅⋅ 𝑅 𝑗𝑖 ( 𝑡 ) 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) ( 𝑑𝑁 𝑗𝑖 ( 𝑡 ) − 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ) = 0 where 𝑅 𝑗𝑖 ( 𝑡 ) = 𝐼 ( 𝑇 𝑖 ≥ 𝑡, 𝐷 𝑖 = 𝑗 ) ; 𝑑𝑁 𝑗𝑖 ( 𝑡 ) = 𝐼 ( 𝑇 𝑖 = 𝑡, 𝐷 𝑖 = 𝑗 ) and 𝐼 ( . ) denotes the indicator function. Besides, 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎 ∗ , 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎, 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) × 𝐼 ( 𝐴 𝑖 = 𝑎 ) Pr ( 𝐴 𝑖 = 𝑎 | 𝐿 ,𝑖 ) denotes the weight of individual 𝑖 at time 𝑡 and at exposure levels 𝑎 and 𝑎 ∗ . The second component of the weight ensures thatthe exposure-outcome association is adjusted for confounding by 𝐿 . It creates a pseudo-population in which the exposure isno longer associated with 𝐿 and hence removes confounding by 𝐿 . The first component of the weight then distinguishesbetween the direct and indirect paths by correcting for the fact that the observed mediator value at each time point before 𝑡 maydiffer from the counterfactual value that is of interest at that time. Note that the notation ⌊ 𝑡 ⌋ here is slightly different from itsstandard definition, to take into account the fact that if a patient experiences an event or leaves the study at time 𝑡 = 𝑡 𝑘 of visit 𝑘 , no measurement of 𝑀 𝑘 and 𝐿 𝑘 is possible at that time. More precisely, ⌊ 𝑡 ⌋ = { 𝑡 𝑘 −1 if 𝑡 = 𝑡 𝑘 𝑡 𝑘 if 𝑡 𝑘 < 𝑡 < 𝑡 𝑘 +1 where 𝑘 = 1 , … , 𝐾 . From this, the fitting procedure is described as follows: Step 1–
Postulate and fit a suitable model for the exposure 𝐴 conditional on the baseline confounders ( 𝐿 ) based on theoriginal data set. For instance, a multinomial logistic model can be used for a categorical exposure with 𝑃 possible values: log Pr ( 𝐴 = 𝑎 | 𝐿 = 𝑙 ) Pr ( 𝐴 = 0 | 𝐿 = 𝑙 ) = 𝛽 ,𝑎 + 𝛽 ,𝑎 𝑙 , (3)where 𝑎 = 1 , … , 𝑃 − 1 and Pr ( 𝐴 = 0 | 𝐿 = 𝑙 ) = 1∕(1 + ∑ 𝑃 −1 𝑎 =1 𝑒 𝛽 ,𝑎 + 𝛽 ,𝑎 𝑙 ) . Step 2–
Convert the original dataset to a long or counting-process format, in which the observation period [0 , 𝑡 𝑖 ] of subject 𝑖 is broken into ⌊ 𝑡 𝑖 ⌋ + 1 intervals if 𝑡 𝑖 > ⌊ 𝑡 𝑖 ⌋ > 𝑡 , into 𝑡 𝑖 intervals if 𝑡 𝑖 = ⌊ 𝑡 𝑖 ⌋ > 𝑡 and is kept unchanged if ⌊ 𝑡 𝑖 ⌋ < 𝑡 . Eachinterval 𝑘 will have the following information encoded:(a) The beginning of the interval, which equals the time 𝑡 𝑘 −1 of visit 𝑘 − 1 , with 𝑡 = 0 .(b) The end of the interval, which equals the time 𝑡 𝑘 of visit 𝑘 or the event/censoring time 𝑡 𝑖 for the last interval.(c) The event status at the end of the interval.(d) The exposure 𝐴 𝑖 and the baseline covariates 𝐿 𝑖 , whose values remain unchanged across all intervals.(e) The history of the mediator and the longitudinal confounders recorded up to the end of the interval. Note that the historyof 𝐿 and 𝑀 for subject 𝑖 up to the time 𝑡 𝑖 is similar to their history up to the last visit prior to time 𝑡 𝑖 .Table 1 provides a toy example in which three patients receive a binary treatment and are followed up for a total of three years,with two visits pre-planned at the end of year 1 and 2. Patient 3 is free of event till the end of the study and hence has the mediatorlevel fully recorded at the two intermediate visits. In contrast, patient 1 and 2 experience an event after 1.5 and 0.9 years, due towhich they have no (i.e. patient 2) or only one (i.e. patient 1) mediator level recorded. Table 2 illustrate how the information ofthese three hypothetical individuals is encoded in a counting-process format. Step 3–
Postulate and fit a suitable model for the mediator at a time point 𝑡 , conditional on the exposure, the longitudinalconfounder 𝐿 𝑡 and previous measurements of the mediator (i.e. 𝑀 𝑡 −1 ), by using the long data set. For instance, one may assumemultinomial logistic models for a categorical mediator 𝑀 𝑘 with possible values , … , 𝑄 . The model for 𝑀 𝑘 is thus: log Pr ( 𝑀 𝑘 = 𝑞 | 𝐴 = 𝑎, 𝑀 𝑘 −1 = 𝑚 𝑘 −1 , 𝐿 𝑘 = ̄𝑙 𝑘 , 𝑇 ≥ 𝑡 𝑘 ) Pr ( 𝑀 𝑘 = 0 | 𝐴 = 𝑎, 𝑀 𝑘 −1 = 𝑚 𝑘 −1 , 𝐿 𝑘 = 𝑙 𝑘 , 𝑇 ≥ 𝑡 𝑘 ) = 𝛾 𝑞 + 𝛾 𝑞 𝑎 + 𝛾 ′2 𝑞 𝑚 𝑘 −1 + 𝛾 ′3 𝑞 𝑙 𝑡 (4) o ET AL TABLE 1
Illustrating example: A toy dataset in standard short format. Here, 𝑚 𝑖𝑗 and 𝑙 𝑖𝑗 denote the mediator and longitudinalconfounder level of individual 𝑗 recorded at visit 𝑡 𝑖 , respectively. The treatment 𝐴 is binary (0 vs. 1) and there are two competingevents, coded as and .Individual Following-up time Status 𝐴 𝑀 𝑀 𝐿 𝐿 𝐿 (years)1 1.5 1 1 𝑚 − 𝑙 𝑙 − − − 𝑙 − − 𝑚 𝑚 𝑙 𝑙 𝑙 TABLE 2
The counting-process format of the dataset in table 1Individual Start Stop Status
𝐴 𝑀 𝑡 𝑀 𝑡 −1 𝐿 𝑡 𝐿 𝑡 −1 𝐿 𝑙 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑚 𝑚 𝑙 𝑙 𝑙 TABLE 3
Extended data set for the example in Table 1Individual Start Stop Status
𝐴 𝐴 ∗ 𝑀 𝑡 𝑀 𝑡 −1 𝐿 𝑡 𝐿 𝑡 −1 𝐿 𝑙 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑚 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑚 𝑚 𝑙 𝑙 𝑙 where 𝑞 = 1 , … , 𝑄 . Step 4–
A new data set is then constructed by copying the original data set (in long format) 𝑃 times and including an additionalvariable 𝐴 ∗ to capture the 𝑃 possible values of the exposure relative to the indirect path. 𝐴 ∗ is set to the actual value of theexposure 𝐴 for the first replication, to the other potential values of 𝐴 for the remaining replications. For the example discussedin table 1 and 2, the corresponding extended data set is provided in table 3. Step 5–
Compute weights by applying the fitted models from steps 1 and 3 to the new data set. At visit 𝑘 , the weight for the 𝑖 𝑡ℎ individual is 𝑤 𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) = 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎 ) ⋅ 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) , where: 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎 ) = 𝑃 −1 ∑ 𝑎 =0 𝐼 ( 𝐴 𝑖 = 𝑎 ) Pr ( 𝐴 𝑖 = 𝑎 | 𝐿 𝑖 = 𝑙 𝑖 ) and 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ 𝑡 𝑘 Pr ( 𝑀 𝑠,𝑖 = 𝑚 𝑠,𝑖 | 𝐴 𝑖 = 𝑎 ∗ , 𝑀 𝑠 −1 ,𝑖 = 𝑚 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 = ̄𝑙 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) Pr ( 𝑀 𝑠,𝑖 = 𝑚 𝑠,𝑖 | 𝐴 𝑖 = 𝑎, 𝑀 𝑠 −1 ,𝑖 = 𝑚 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 = 𝑙 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) Vo ET AL
𝐴 𝑀 𝑡 𝑇𝐿 𝐿 𝑡 𝑅 𝐶 ( 𝑡 ) 𝑈 𝑙 𝑈 𝑚 (a) 𝐴 𝑀 𝑡 −1 𝑀 𝑡 𝑇𝐿 𝐿 𝑡 −1 𝐿 𝑡 𝑅 𝐶 ( 𝑡 ) 𝑈 𝑙 𝑈 𝑚 (b) FIGURE 2 (Simplified) causal diagram when censoring presents – (a) Censoring is non-informative conditional on the exposureand (b) Censoring is non-informative at time 𝑡 conditional on the exposure and the history up to that timewhere the subscript 𝑡𝑡𝑚 and 𝑚𝑒𝑑 denotes treatment and mediator, respectively. At the end of the follow-up time, the weight fora patient having 𝑇 𝑖 = 𝑡 𝑖 is 𝑤 𝑖 ( 𝑡 𝑖 , 𝑎, 𝑎 ∗ ) = 𝑤 𝑖 ( ⌊ 𝑡 𝑖 ⌋ , 𝑎, 𝑎 ∗ ) . Step 6–
Fit the natural effect cause-specific proportional hazard model (1) and (2) by proportiona hazard regression of thecause-specific event time on 𝐴 and 𝐴 ∗ on the basis of the expanded data set, using the weights computed in the previous step. Step 7–
Derive confidence intervals for the parameters in model (1) and (2) by using the non-parametric bootstrap. For this,one first generates 𝑆 bootstrap samples with replacement from the original dataset, then repeats all the above steps for eachbootstrap sample. The 95% confidence interval for each parameter in model (1) and (2) is computed by using the 2.5% and 97.5%quantiles of the bootstrap distribution of the corresponding estimator. Denote 𝐶 the time-to-censoring. As stated above, when the censoring is non-informative conditional on the exposure (figure 2a),the provided estimating procedure remains valid without further adjustment. When censoring is dependent upon the baselinecovariate vector 𝐿 and the exposure 𝐴 , one could adjust for censoring by alternatively focusing on the so-called conditionalcause-specific natural effect proportional hazard model, that is, 𝜆 𝑗𝑎,𝑎 ∗ ( 𝑡 | 𝐿 = 𝑙 ) = 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ + 𝛼 ′2 𝑗 𝑙 (5)Note that an interaction between 𝑎 ∗ and 𝑙 could also be permitted in such model to assess the possibility of mediator-baselinecovariate interaction. The procedure discussed in section 2.2 can then be applied to estimate the parameters in this model, witha slight adjustment in step 6 where apart from 𝐴 and 𝐴 ∗ , the covariates 𝐿 (and the product of 𝐴 ∗ and 𝐿 if mediator- baselinecovariate interaction is assessed) are also included into the proportional hazard regression model. In practice, it might howeverbe the case that censoring is dependent upon post-baseline factors such as the longitudinal mediator and confounder levels thatare measured prior to censoring (figure 2b). In Appendix A2, we show that if at any time 𝑡 , the risk of future events for patientswho drop out of the study is not different from that of patients who have the same exposure, mediator and covariate history upto time 𝑡 but remain in the study, the so-called inverse probability of censoring weighting approach can be used to account for o ET AL censoring. More precisely, the parameters indexing models (1) and (2) are then estimated by solving the following equation: ∞ ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎧⎪⎪⎨⎪⎪⎩( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ̂𝐸 [( 𝑎𝑎 ∗ ) ⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ]∑ 𝑎,𝑎 ∗ ̂𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ] ⎫⎪⎪⎬⎪⎪⎭ ⋅⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ( 𝑑𝑁 𝑖 ( 𝑡 ) − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ 𝑑𝑡 ) = 0 where 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ ̂ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎 ∗ , 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 , 𝐶 𝑖 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ ̂ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎, 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 , 𝐶 𝑖 ≥ 𝑡 𝑠 ) ⋅ 𝐼 ( 𝐴 𝑖 = 𝑎 ) ̂ Pr ( 𝐴 𝑖 = 𝑎 | 𝐿 ,𝑖 ) ⋅⋅ ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ ̂𝜆 𝐶 ( 𝑠 | 𝑇 𝑖 > 𝑠, 𝑀 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ,𝑖 , 𝐴 𝑖 = 𝑎 ) ] and 𝜆 𝐶 ( . ) denoting the cause-specific hazard function of the time-to-censoring. Here, ∏ 𝑠 𝑥 𝑠 is defined as a product limit. Withthe above weight 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) , the additional component that accounts for the informative censoring can make the overallweight become unstable (e.g. when the censoring hazard is close to 1 in some strata). To overcome this, one can then usestabilized (censoring) weights which incorporate a numerator defined in the same way as the denominator but adjusting only forthe exposure, that is: ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ ̂𝜆 𝐶 ( 𝑠 | 𝐴 𝑖 = 𝑎 ) ] ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ ̂𝜆 𝐶 ( 𝑠 | 𝑇 𝑖 > 𝑠, 𝑀 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ,𝑖 , 𝐴 𝑖 = 𝑎 ) ] One then needs to postulate two models for the censoring hazard at time 𝑡 , with one conditioning on exposure and the otherconditioning on exposure, baseline covariates and the history of the longitudinal mediator and confounders up to time 𝑡 , whereonly the latter model needs to be correct. For instance, 𝜆 𝐶 ( 𝑡 | 𝐴 𝑖 = 𝑎 𝑖 ) = 𝜆 ′0 𝐶 ( 𝑡 ) 𝑒 𝜂𝑎 𝑖 and 𝜆 𝐶 ( 𝑡 | 𝑇 𝑖 > 𝑡, 𝑀 ⌊ 𝑠 ⌋ ,𝑖 = 𝑚 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ⌊ 𝑠 ⌋ ,𝑖 = 𝑙 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ,𝑖 = 𝑙 ,𝑖 , 𝐴 𝑖 = 𝑎 𝑖 ) = 𝜆 𝐶 ( 𝑡 ) 𝑒 𝜃 𝑎 𝑖 + 𝜃 𝑙 ,𝑖 + 𝜃 ′2 𝑚 ⌊ 𝑠 ⌋ ,𝑖 + 𝜃 ′3 𝑙 ⌊ 𝑠 ⌋ ,𝑖 (6)As a result, in step 5 of the estimation procedure, apart from computing the mediator and treatment weights, one needs toadditionally derive the censoring weight. More precisely, the weight for the 𝑖 𝑡ℎ individual at visit 𝑘 is now 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎, 𝑎 ∗ ) ⋅ 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎 ) ⋅ 𝑤 𝑐𝑒𝑛𝑖 ( 𝑘, 𝑎 ) , where the subscript cen denotes censoring, i.e., 𝑤 𝑐𝑒𝑛𝑖 ( 𝑘, 𝑎 ) = 1 ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 𝑘 [ ̂𝜆 𝐶 ( 𝑠 | 𝑇 𝑖 > 𝑠, 𝑚 𝑠,𝑐,𝑖 , 𝑙 𝑠,𝑐,𝑖 , 𝑙 ,𝑖 , 𝑎 ) ] while 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎, 𝑎 ∗ ) and 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) are computed as above. If a proportional hazard censoring model as (6) is fitted, the baselinecensoring hazard 𝜆 𝐶 ( 𝑡 ) and 𝜆 ′0 𝐶 ( 𝑡 ) at time 𝑡 can be estimated by the standard Breslow estimator. Once the individual weightsare computed, the natural effect cause-specific proprotional hazard model (1) and (2) can be fitted using these weights and theconfidence intervals for the estimates can be derived via the nonparametric bootstrap, as described above. We illustrate the proposed approach on the ELSA data. In this ongoing study, the first contact with the participants was in2002/03 (wave 1). These participants were then followed up biennially, with measures collected via computer-assisted face-to-face interview and self-completion questionnaires. For this illustration, the data are available until one year after the last wave in2016/17. As stated above, the question of interest here is whether the feeling of loneliness mediates the impact of hearing losson dementia, accounting for mortality as a competing event.For this analysis, we used the hearing measurement recorded at wave 2 (e.g. 2004/05) and dichotomized subjects into twogroups, namely normal ( 𝐴 = 0 ) and limited ( 𝐴 = 1 ) hearing ability. The longitudinal mediator "loneliness" was recordedfrom wave 3 (2006/07) to wave 7 (2014/15) and had two potential values, namely frequent ( 𝑀 = 1 ) vs. infrequent feeling ofloneliness ( 𝑀 = 0 ). Alongside the mediator, four longitudinal confounders were recorded over time (i.e. wave 3 to wave 7),namely depression status (yes vs. no), mobility score (continuous), smoking status (non-smoker vs. current smoker) and alcohol Vo ET AL status (non-drinker vs. current drinker). The baseline covariates consisted of age, gender, ethnicity (white vs. non-white), wealth(1=low, 5=high), education level (1 = no formal qualification, 2 = intermediate and 3 = higher education), marital status (yesvs. no), the use of hearing aids (yes vs. no), the presence of other comorbidities (i.e. hypertension, diabetes, stroke and cancer –yes vs. no) and the baseline values of the aforementioned time-varying confounders. A detailed description of these covariateswas provided elsewhere .We assumed that the relationship between the variables obeys the causal structure depicted in figure 1. Here, the mediatorsand confounders measured at time 𝑡 𝑘 −1 are time-varying confounders of the relationship between the mediator measured at time 𝑡 𝑘 and the outcome. To derive the treatment weights (step 1), we first considered a logistic (treatment) model adjusting for themain effects of all baseline covariates. To assess the potential of covariate-covariate interactions, we used a LASSO variableselection process (R package glmnet) to select the most important interaction terms from the set of all possible two-by-twocovariate interactions. The chosen interactions were then added into the treatment model. The tuning parameter in the LASSOwas selected by leave-one-out cross-validation.To derive the mediation weights (step 3), we first considered a logistic (mediator) model adjusting for 𝑡 𝑘 , 𝑀 𝑘 −1 , 𝐿 𝑘 and themain effects of all baseline covariates. To assess whether the conditional distribution of 𝑀 𝑘 had a residual dependence uponthe history of 𝑀 and 𝐿 that preceded the time 𝑡 𝑘 −1 (for 𝑀 ) and 𝑡 𝑘 (for 𝐿 ), we used the LASSO to determine the first post-baseline measurement of 𝑀 and 𝐿 that were predictive for 𝑀 𝑘 , conditional on the later measurements. This measurement andall measurements following this one were included into the mediator model. Next, we assessed whether there were important(i) treatment-baseline/longitudinal covariate interactions, (ii) time-baseline covariate interactions and (iii) baseline covariate-covariate interactions that should be adjusted for. For each step, an independent LASSO variable selection process was performedto select the most important interaction terms from all possible interactions of the same type. The interactions that were chosenin the previous step were always included in the model of the subsequent steps (which implies no shrinkage on these terms inthe subsequent steps). The tuning parameter in each LASSO procedure was selected by leave-one-out cross-validation. The finalmodel was refitted before calculating the mediation weights.To derive the censoring weights, we first considered a cause-specific proportional hazard model adjusting for 𝑀 𝑘 , 𝐿 𝑘 , theexposure 𝐴 and the main effects of all baseline covariates 𝐿 . To assess whether the censoring hazard at time 𝑡 had a residualdependence upon the history of 𝑀 and 𝐿 that preceded the time 𝑡 𝑘 for 𝑀 and for 𝐿 , we implemented a backward eliminationprocess, using the Akaike information criterion to determine the first post-baseline measurement of 𝑀 and 𝐿 that were predictivefor the censoring hazard at time 𝑡 , conditional on the later measurements. This measurement and all measurements followingthis one were included into the censoring proportional hazard model. Next, as for the mediator model, we assessed whether therewere important (i) treatment-baseline/longitudinal covariate interactions and (ii) baseline covariate-covariate interactions thatshould be adjusted for. For each step, an independent backward elimination process was performed to select the most importantinteraction terms from all possible interactions of the same type. The interactions that were chosen in the previous step werealways included in the model of the subsequent steps (which implies no exclusion of these terms in the subsequent steps).Note that we used backward elimination for the construction of the censoring models (as opposed to LASSO) due to the lackof prepackaged software that can apply LASSO or other penalized variable selection methods on a counting format survivaldataset. Results of the variable selection processes for the treatment, mediator and censoring models are reported in OnlineSupplementary Material file.The two natural effect proportional hazard models (i.e. model (1) and (2)) specific for dementia and for death were thenfitted using the calculated weights. The confidence intervals of the total, direct and indirect hazard ratios were derived by thenon-parametric bootstrap method, with 5000 samples taken from the original data set by sampling with replacement. We thenestablished the cumulative incidence curves of dementia and of death under different sets of 𝑎 and 𝑎 ∗ . These curves reflect thecumulative failure rates over time for a particular cause (e.g. dementia), acounting for the presence of other competing events(e.g. death). To estimate the curves, we considered dementia and death as two terminal states of a multi-state model wherethe transition from dementia to death was treated as an absorbing state, i.e. the one that subjects never exist . For pedagogicpurposes, we only use the results of the natural effect model (1) (i.e. without interaction between 𝑎 and 𝑎 ∗ ) to establish thesecurves.As can be seen from table 5, the total effect of hearing loss on the time to dementia diagnosis was statistically significant(Model 1, HR = 1.843; 95%CI 1.100 to 2.710). Model (1) further suggested that this total effect was weakly mediated throughthe feeling of loneliness, with a non-statistically significant indirect effect equal to 1.012 (HR scale; 95%CI 0.986 to 1.053).This expresses that the hazard of dementia would become 1.012 times higher if all patients were to have limited hearing abilitybut the loneliness levels were switched from the values that would have been observed if they had normal hearing ability to the o ET AL . . . . . . Years after hearing measurement at wave 2 C u m u l a t i v e r i sk o f de m en t i a d i agno s i s a = 0, a* = 0a = 1, a* = 0a = 1, a* = 1 0 2 4 6 8 10 . . . . . Years after hearing measurement at wave 2 C u m u l a t i v e r i sk o f dea t h a = 0, a* = 0a = 1, a* = 0a = 1, a* = 1 FIGURE 3
Estimated cumulative incidence curve of dementia diagnosis
TABLE 4
Data analysis: estimation of the natural effect modelsModel Coefficient Estimate 95%CI p-value(1)
Dementia ( 𝑗 = 1 ) 𝛼 𝑗 𝛼 𝑗 Death ( 𝑗 = 2 ) 𝛼 𝑗 -0.126 (-0.483; 0.143) 0.462 𝛼 𝑗 Dementia ( 𝑗 = 1 ) 𝛼 𝑗 𝛼 𝑗 𝛼 𝑗 Death ( 𝑗 = 2 ) 𝛼 -0.121 (-0.480; 0.148) 0.485 𝛼 𝛼 -0.009 (-0.049; 0.022) 0.632value observed under limited hearing. In contrast, the total effect of hearing loss on mortality was not statistically significant(Model 1, HR = 0.882; 95%CI 0.625 to 1.148). There was no statistical evidence of an indirect effect through the feeling ofloneliness (HR = 1.001, 95%CI 0.987 to 1.011). These findings did not change when considering model (2) with interaction(table 4 - p-value of the interaction coefficient equals 0.520 for dementia and 0.632 for death).Figure 3 provides the estimated cumulative incidence curve of dementia for different 𝑎 and 𝑎 ∗ , which visualizes the weakindirect effect of hearing loss on dementia through the suggested longitudinal mediator. At some time points, there are largejumps in these curves due to the fairly high rate of interval censoring in the dataset (i.e. if the date for dementia diagnosis (or Vo ET AL
TABLE 5
Data analysis: the effect of limited vs. normal hearing ability on the time-to-event outcomes, mediated through thefeeling of loneliness. The mediated proportion is calculated on log scaleModel Effect Hazard ratio 95%CI Mediated proportion(1)
Dementia
Total effect ( 𝐴 = 1 vs. 𝐴 = 0 ) 1.843 (1.100; 2.711)Direct effect 1.821 (1.095; 2.721)Indirect effect 1.012 (0.986; 1.053) 2.0% Death
Total effect ( 𝐴 = 1 vs. 𝐴 = 0 ) 0.882 (0.625; 1.148)Direct effect 0.882 (0.617; 1.154)Indirect effect 1.001 (0.987; 1.011) -0.8%(2) Dementia
Total effect ( 𝐴 = 1 vs. 𝐴 = 0 ) 1.839 (1.100; 2.713)Direct effect 1.836 (1.091; 2.716)Indirect effect 1.002 (0.988; 1.017) 0.3%Total effect ( 𝐴 = 0 vs. 𝐴 = 1 ) 0.544 (0.369; 0.912)Direct effect 0.554 (0.367; 0.912)Indirect effect 0.981 (0.925; 1.022) 3.2% Death
Total effect ( 𝐴 = 1 vs. 𝐴 = 0 ) 0.882 (0.622; 1.148)Direct effect 0.878 (0.614; 1.151)Indirect effect 1.005 (0.989; 1.023) -4.0%Total effect ( 𝐴 = 0 vs. 𝐴 = 1 ) 1.134 (0.871; 1.605)Direct effect 1.129 (0.862; 1.617)Indirect effect 1.004 (0.981; 1.036) 3.2%for death) was not known but person had a new diagnosis (or passed away) from one visit to the next, then we considered themidpoint between two visits as the event date). Finally, the above results should be interpreted with caution as there might beimportant time-varying confounders of the mediator-outcome association that were not taken into account. The findings couldalso be biased if the involved models were incorrectly specified or censoring was informative (e.g. elderly patients who livealone might not come to the control visit due to dementia-related problems). In this paper, we have generalized the weighting-based strategy proposed for natural effect models in single mediation analysisto the setting where the mediator of interest is repeatedly measured over time (hence subject to longitudinal confounders) andthe primary outcome is a time-to-event endpoint, subject to competing risks. The proposed approach yields consistent estimatesfor the natural direct and indirect effects if the causal assumptions hold and the natural effect model and the conditional dis-tribution of the exposure, mediator and censoring are correctly specified. As noted by Steen et al , the mediator model needscareful consideration, especially when the exposure (and the baseline covariates) are highly predictive of the mediator, for theneven minor misspecification can have a major impact on the weights and lead to biased results. Apart from (mediator) modelmisspecification, the estimated weights may become unstable or even extreme when at each time point, there is an inadequateoverlap between the conditional distributions of the mediator under different treatment/exposure conditions, as may be the casewhen the exposure has a strong effect on the mediator. While the presence of extreme weights might appear as a limitationat first, it may also diagnose severe model extrapolation that often goes unnoticed when using a repeated regression approachproposed for the same setting . Simple weighting-based approaches also tend to yield larger standard errors (compared toimputation or regression-based approaches) due to lack in efficiency. This can be especially problematic when the mediator iscontinuous. In that case, the weight-based approaches tend to be unstable even under proper model specification and adequate o ET AL overlap of the mediator distributions across treatment groups, which may result in considerable finite sample bias in the naturaleffect estimates. The repeated regression approach might be more appropriate when dealing with continuous mediators .Several proposals can be made to improve the suggested approach. Future research might focus on the development of doublyor multiply robust estimators to improve the robustness and efficiency of the current weight-based approach. The proposedstrategy can also be easily extended to take into account multiple mediators 𝑀 (1) , … , 𝑀 ( 𝑉 ) that are repeatedly measured overtime. When these mediators are causally ordered then as suggested by Vanderweele and Vansteelandt (2014), one can first eval-uate the effect mediated through 𝑀 (1) , then examine how much this changes when 𝑀 (1) and 𝑀 (2) are jointly considered asmediators. This then reveals the additional contribution of 𝑀 (2) beyond 𝑀 (1) alone. The process is then carried on by sequentiallyadding one mediator at a time until all 𝑉 mediators are included . By accounting for multiple, repeatedly measured mediators,results of the analysis may allow one to get closer to evaluating the entire mediation process that underlies the treatment mecha-nism in practice. Finally, future research should also extend the proposed approach to account for continuous exposures, whichare quite common in epidemiology and social science. ACKNOWLEDGMENTS
The first author was supported by the funding from the European UnionâĂŹs Horizon 2020 research and innovation program,under the Marie Sklodowska-Curie grant agreement (grant no.: 676207).
Conflict of interest
The authors declare no potential conflict of interests.
SUPPORTING INFORMATION
The following supporting information is available as part of the online article:
Data analysis 1.
Results of the construction of the propensity score model, the mediator model at each time point and thecensoring model
Data analysis 2
R codes for the analysis of the ELSA data.
References
1. Steptoe A, Breeze E, Banks J, Nazroo J. Cohort profile: the English longitudinal study of ageing.
International journal ofepidemiology
Journal of the American Geriatrics Society
Journal of the American Geriatrics Society
The Journals of Gerontology: Series B
Annual Review of Public Health
Epidemiology Vo ET AL
7. Pearl J. Direct and indirect effects. In: Morgan Kaufmann Publishers Inc. ; 2001: 411–420.8. Lange T, Rasmussen M, Thygesen LC. Assessing natural direct and indirect effects through multiple pathways.
Americanjournal of epidemiology
Epidemiology (Cambridge,Mass.)
Statistical methods in medical research
Statistics in medicine
Journal of causal inference
Statistics in medicine arXiv preprintarXiv:1912.01200
Americanjournal of epidemiology
Journal of the American Geriatrics Society
Statistics in medicine
Journal of Statistical Software
American journal ofepidemiology
Epidemiologic Methods
Biometrics
Epidemiologic methods
Journal of clinical epidemiology
CognitiveScience
Biometrics
Epidemiology (Cam-bridge, Mass.) o ET AL
27. Nguyen QC, Osypuk TL, Schmidt NM, Glymour MM, Tchetgen Tchetgen EJ. Practical guidance for conducting mediationanalysis with multiple mediators using inverse odds ratio weighting.
American journal of epidemiology
Diabetologia
Journal of Applied Econo-metrics
Statisticalscience
Annals of statistics
Epidemiology (Cambridge, Mass.)
How to cite this article:
Vo T., Davies-Kershaw H., Hackett R., and Vansteelandt S. (2020), Longitudinal mediation analysisof time–to–event endpoints in the presence of competing risks,
Statistics in Medicine. , xxxx;xx:x–x . APPENDIXA – FORMAL PROOF OF THE PROPOSAL
We here describe the derivations of the estimating procedure discussed in section 2. Assume that the causal diagram in figure1 represents a non-parametric structural equation model with independent errors and that the natural effect Cox model (1) iscorrectly specified. For the sake of simplicity, we first assume that no censoring presents. The added complexity due to censoringwill be subsequently addressed. Let 𝐹 be the score for ( 𝛼 , 𝛼 ) for all individuals in the sample. To simplify the proof, we willdenote 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) as 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) . One then has: 𝐸 ( 𝐹 ) = ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ]∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ] ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 𝑖 ( 𝑡 )] We now consider the expression 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )] , that is: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴, 𝑀 𝑡 , 𝐿 𝑡 , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) } = 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴, 𝑀 𝑡 , 𝐿 𝑡 , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝐴 = 𝑎 ) where 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎 ∗ , 𝑀 𝑠 −1 , 𝐿 𝑠 , 𝑇 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎, 𝑀 𝑠 −1 , 𝐿 𝑠 , 𝑇 ≥ 𝑡 𝑠 ) × 1 Pr ( 𝐴 = 𝑎 | 𝐿 ) . Denote 𝑡 − = 𝑡 − 𝛿𝑡 where 𝛿𝑡 > is a small positive quantity such that ⌊ 𝑡 ⌋ < 𝑡 − < 𝑡 . One then has: Vo ET AL 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 [ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝑇 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅⋅ Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) where 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) is a short-hand notation for 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) and so on. Notethat Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] = 1 − 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 . This implies that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )] = 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ [ 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) Applying the above arrangements in a backward fashion till when the time point ⌊ 𝑡 ⌋ is reached, one then obtains: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ ⌊ 𝑡 ⌋ ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ ∏ 𝑠 ∶ ⌊ 𝑡 ⌋ ≤ 𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 −1 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ ⌊ 𝑡 ⌋ ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 − 1 ⌋ ) ⋅⋅ Pr ( 𝑚 𝑡 | 𝑎 ∗ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ) ⋅ Pr ( 𝑙 𝑡 | 𝑎, 𝐿 ⌊ 𝑡 −1 ⌋ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ) ⋅⋅ ∏ 𝑠 ∶ ⌊ 𝑡 ⌋ ≤ 𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 −1 ⌋ , 𝐿 ) ] 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) where ∏ 𝑠 𝑎 𝑠 is defined as a product limit. The above procedure continues to be repeated in a backward fashion till when thestarting time point 𝑡 = 0 is reached, which then gives: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ ∏ 𝑠 ∶0 <𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑚 ⌊ 𝑠 ⌋ , 𝑙 ⌊ 𝑠 ⌋ , 𝐿 ) ] ⋅⋅ ∏ 𝑠 ∶0 <𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑚 𝑠 | 𝑎 ∗ , 𝑚 𝑠 −1 , 𝑙 𝑠 , 𝐿 , 𝑇 ≥ 𝑡 𝑠 ) ⋅⋅ Pr ( 𝑙 𝑠 | 𝑙 𝑠 −1 , 𝑚 𝑠 −1 , 𝐿 , 𝑎, 𝑇 ≥ 𝑡 𝑠 ) 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) Note that ∏ 𝑠 ∶0 <𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑚 ⌊ 𝑠 ⌋ , 𝑙 ⌊ 𝑠 ⌋ , 𝐿 ) ] = Pr ( 𝑇 ≥ 𝑡 | 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) . This implies that: o ET AL 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅⋅ ∏ 𝑠 ∶0 <𝑠 ≤ 𝑡 Pr ( 𝑚 𝑠 | 𝑎 ∗ , 𝑚 𝑠 −1 , 𝑙 𝑠 , 𝐿 , 𝑇 ≥ 𝑡 𝑠 ) ⋅⋅ Pr ( 𝑙 𝑠 | 𝑙 𝑠 −1 , 𝑚 𝑠 −1 , 𝐿 , 𝑎, 𝑇 ≥ 𝑡 𝑠 ) , 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑎, 𝑚 𝑡 , 𝑙 𝑡 , 𝑙 ) ⋅ ∏ 𝑠 ∶0 <𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑚 𝑠 | 𝑎 ∗ , 𝑚 𝑠 −1 , 𝑙 𝑠 , 𝑙 , 𝑇 ≥ 𝑡 𝑠 ) ⋅⋅ Pr ( 𝑙 𝑠 | 𝑙 𝑠 −1 , 𝑚 𝑠 −1 , 𝑙 , 𝑎, 𝑇 ≥ 𝑡 𝑠 ) ⋅ Pr ( 𝑙 | 𝑎 ) × Pr ( 𝑎 ) Pr ( 𝑎 | 𝑙 ) 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ 𝑑𝑙 = 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) ] = 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] where the prior-to-last equaling follows by the derivation of Vansteelandt et al . In a similar way, one can show that 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 )] = 𝐸 ( 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ) . These imply that: 𝐸 ( 𝐹 ) = ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ]∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ] ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] = ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ]∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ] ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ { 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ } = 0 The final equaling follows by the fact that the natural proportional hazard model is correctly specified, for then 𝐸 [ 𝑑𝑁 𝑖 ( 𝑡 ) 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑇 𝑎,𝑎 ∗ 𝑖 ≥ 𝑡 ] = 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ B – ADDRESSING INFORMATIVE CENSORING
When censoring presents but is non-informative given the exposure group, the score 𝐹 for ( 𝛼 , 𝛼 ) for all individuals in thesample can be written as: 𝐹 = ∞ ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎧⎪⎨⎪⎩( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ 𝐸 [( 𝑎𝑎 ∗ ) ⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ]∑ 𝑎,𝑎 ∗ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ] ⎫⎪⎬⎪⎭ ⋅⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ( 𝑑𝑁 𝑖 ( 𝑡 ) − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ 𝑑𝑡 ) Note that: Vo ET AL 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 [ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) |||| 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ]} = 𝐸 { 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ Pr ( 𝐶 ≥ 𝑡 | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ) ⋅⋅ 𝐸 [ 𝑑𝑁 ( 𝑡 ) | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ] ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) } = 𝐸 { 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ Pr ( 𝐶 ≥ 𝑡 | 𝐴 ) ⋅ 𝐸 [ 𝑑𝑁 ( 𝑡 ) | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ] ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) } = Pr ( 𝐶 ≥ 𝑡 | 𝐴 = 𝑎 ) ⋅ 𝐸 { 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ 𝐸 [ 𝑑𝑁 ( 𝑡 ) | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ] ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) The prior-to-last equaling results from the assumption that the censoring is non-informative given the treatment group (e.g.figure 3). Following the same reasoning as above, one can show that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )] = Pr ( 𝐶 ≥ 𝑡 | 𝐴 = 𝑎 ) ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] and similarly, 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑊 ( 𝑡 )] = Pr ( 𝐶 ≥ 𝑡 | 𝐴 = 𝑎 ) ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] The score will thus remain the same as in the case of no censoring, with an added proportional constant, i.e. Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 ) .As a result, 𝐸 ( 𝐹 ) = ∫ ∑ 𝑖,𝑎,𝑎 ∗ 𝐸 ⎧⎪⎨⎪⎩⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 ( 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ) ⋅ Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 ) ∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 ( 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ) ⋅ Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 ) ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ] ⋅ [ 𝐸 [ 𝑑𝑁 𝑖 ( 𝑡 ) 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑇 𝑎,𝑎 ∗ 𝑖 ≥ 𝑡 ] − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ]} ⋅ Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 )= 0 which results from the fact that the natural proportional hazard model is correctly specified.When censoring is only non-informative at time 𝑡 given the exposure group and the history of mediator and covariates up tothat time, the weight of each patient at time 𝑡 needed to be adjusted as described in section 2.3. To prove this, we consider onceagain the expression 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )] . Note that the weight 𝑊 ( ⌊ 𝑡 ⌋ ) now has an additional censoring-weightcomponent (see section 2.3), for then: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝐶 ≥ 𝑡, 𝑇 ≥ 𝑡, 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) } = 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝐴 = 𝑎 ) where 𝑊 ∗ ( 𝑡 ) = ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎 ∗ , 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑠 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎, 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑠 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ⋅ Pr ( 𝐴 = 𝑎 | 𝐿 ) ⋅⋅ ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ 𝜆 𝐶 ( 𝑠 | 𝑇 > 𝑠, 𝑚 𝑠 , 𝑙 𝑠 , 𝑙 , 𝑎 ) ] o ET AL Note that the second equaling results from the fact that censoring is non-informative at any time 𝑡 , given the history up to thattime. Now denote 𝑡 − = 𝑡 − 𝛿𝑡 where 𝛿𝑡 > is a small positive quantity such that ⌊ 𝑡 ⌋ < 𝑡 − < 𝑡 , one then has: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 [ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅⋅ E [ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) |||| 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ Pr [ 𝐶 ≥ 𝑡 | 𝑇 ≥ 𝑡, 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] ⋅⋅ Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ [1 − 𝜆 𝐶 ( 𝑡 | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] ⋅⋅ Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ [1 − 𝜆 𝐶 ( 𝑡 | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] ⋅ [1 − 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) The last two equalings result from the fact that Pr [ 𝐶 ≥ 𝑡 | 𝑇 ≥ 𝑡, 𝐶 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] = 1 − 𝜆 𝐶 ( 𝑡 | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 and that Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] = Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] due to theassumption of conditionally non-informative censoring. This implies that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅⋅ ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎 ∗ , 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑡 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎, 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑠 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ⋅ Pr ( 𝐴 = 𝑎 | 𝐿 ) ⋅⋅ 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 − [ 𝜆 𝐶 ( 𝑠 | 𝑇 > 𝑠, 𝑚 𝑠 , 𝑙 𝑠 , 𝑙 , 𝑎 ) ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) Applying the above arrangements in a backward fashion as is done in the case of no censoring, one will obtain the result that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )] = 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ]]