[PDF] Longitudinal mediation analysis of time-to-event endpoints in the presence of competing risks

Abstract

This proposal is motivated by an analysis of the English Longitudinal Study of Ageing (ELSA), which aims to investigate the role of loneliness in explaining the negative impact of hearing loss on dementia. The methodological challenges that complicate this mediation analysis include the use of a time-to-event endpoint subject to competing risks, as well as the presence of feedback relationships between the mediator and confounders that are both repeatedly measured over time. To account for these challenges, we introduce natural effect proportional (cause-specific) hazard models. These extend marginal structural proportional (cause-specific) hazard models to enable effect decomposition. We show that under certain causal assumptions, the path-specific direct and indirect effects indexing this model are identifiable from the observed data. We next propose an inverse probability weighting approach to estimate these effects. On the ELSA data, this approach reveals little evidence that the total efect of hearing loss on dementia is mediated through the feeling of loneliness, with a non-statistically significant indirect effect equal to 1.012 (hazard ratio (HR) scale; 95% confidence interval (CI) 0.986 to 1.053).

Full PDF

RReceived: Added at production Revised: Added at production Accepted: Added at productionDOI: xxx/xxxx

RESEARCH ARTICLE

Longitudinal mediation analysis of time–to–event endpoints inthe presence of competing risks

Tat-Thang Vo* | Hilary Davies-Kershaw | Ruth Hackett | Stijn Vansteelandt Department of Applied Mathematics,Computer Science and Statistics, GhentUniversity, Ghent, Belgium Université de Paris, CRESS, INSERM,INRA, Paris, France Department of Population Health, LondonSchool of Hygiene and Tropical Medicine,London, UK Health Psychology Section, Department ofPsychology, King’s College London,London, UK Department of Medical Statistics, LondonSchool of Hygiene and Tropical Medicine,London, UK

Correspondence *Tat-Thang Vo, Department of AppliedMathematics, Computer Science andStatistics, Ghent University, Krijgslaan 281,S9, Ghent 9000, Belgium. Email:[email protected]

Summary

This proposal is motivated by an analysis of the English Longitudinal Study of Age-ing (ELSA), which aims to investigate the role of loneliness in explaining the negativeimpact of hearing loss on dementia. The methodological challenges that complicatethis mediation analysis include the use of a time-to-event endpoint subject to com-peting risks, as well as the presence of feedback relationships between the mediatorand confounders that are both repeatedly measured over time. To account for thesechallenges, we introduce natural eﬀect proportional (cause-speciﬁc) hazard mod-els. These extend marginal structural proportional (cause-speciﬁc) hazard models toenable eﬀect decomposition. We show that under certain causal assumptions, thepath-speciﬁc direct and indirect eﬀects indexing this model are identiﬁable from theobserved data. We next propose an inverse probability weighting approach to esti-mate these eﬀects. On the ELSA data, this approach reveals little evidence that thetotal efect of hearing loss on dementia is mediated through the feeling of loneliness,with a non-statistically signiﬁcant indirect eﬀect equal to 1.012 (hazard ratio (HR)scale; 95% conﬁdence interval (CI) 0.986 to 1.053).

KEYWORDS:

Longitudinal mediation analysis, Natural eﬀect model, Inverse weighting, Survival outcome

This article is motivated by an analysis of the English Longitudinal Study of Ageing, a longitudinal cohort study of individualsaged 50 and older living in the community in England , which follows participants biennially since 2002/03. In previous studies,it was shown that both self-reported hearing loss and loneliness are signiﬁcantly associated with a higher risk of dementia .The question of interest, which we shall address here, is whether loneliness mediates the impact of hearing loss on incidentphysician-diagnosed dementia.The focus on time-to-event endpoints (dementia) complicates the planned mediation analysis. It renders the popular diﬀerence-and product-of-coeﬃcient methods inappropriate , necessitating the use of more complex causal mediation analysis methods.While such methods have been developed for the analysis of time-to-event endpoints, most ignore that the mediator is notassessed at baseline and that subjects may therefore experience the event prior to the mediator being assessed . Furthercomplications arise from the mediator, loneliness, being repeatedly measured. While useful to better capture mediation via theentire longitudinal mediator process , this also gives rise to complex time-varying confounding patterns whereby mediators Abbreviations:

ANA, anti-nuclear antibodies; APC, antigen-presenting cells; IRF, interferon regulatory factor a r X i v : . [ s t a t . M E ] J u l Vo ET AL (e.g. loneliness) and confounders (e.g. comorbidities) mutually inﬂuence each other over time. These complications have beenaddressed in a number of recent works .A further complication that we must consider, and that we have not previously found being addressed in the literature, isthe presence of competing risks by death. With observed - as opposed to counterfactual - event times, the modelling of cause-speciﬁc hazards is well known to simplify the handling of competing risks. However, the modelling of (cause-speciﬁc) hazardsis not readily possible in the previous mediation analysis works , which instead focus on the analysis of survival chances.In view of this, in this paper, we will here introduce so-called natural eﬀect proportional (cause-speciﬁc) hazard models, whichdirectly parameterise the direct and indirect eﬀects of a given exposure on the cause-speciﬁc hazard of the considered event.We will extend the weighting-based approach proposed by Mittinty and Vansteelandt for longitudinal natural eﬀect models toenable the planned mediation analysis of an exposure via repeatedly measured mediators on a time-to-event endpoint subject tocompeting risks.We proceed as follows. In the next section, we ﬁrst describe the setting of interest. We then extend the natural eﬀect propor-tional hazard model to take into account the longitudinal nature of the mediator. In the same section, we discuss the assumptionsunder which the path-speciﬁc direct and indirect eﬀects derived from such model are identiﬁable from data, and provide a step-by-step procedure for estimating these eﬀects. In section 3, we apply the proposed approach to analyze data from the EnglishLongitudinal Study of Ageing . We end with some ﬁnal remarks and a discussion. Consider an observational study in which independent individuals 𝑖 = 1 , … , 𝑛 are exposed to a categorical factor 𝐴 𝑖 coded as , , … , 𝑃 − 1 for 𝑃 diﬀerent categories. Longitudinal measurements of the mediator 𝑀 𝑖 , 𝑀 𝑖 … , 𝑀 𝑖𝐾 and of the covariates 𝐿 𝑖 , 𝐿 𝑖 … , 𝐿 𝑖𝐾 are subsequently recorded at baseline (subscript 0) and at visits , … , 𝐾 , along with (i) a time-to-event endpoint 𝑇 𝑖 and (ii) an index 𝐷 𝑖 specifying whether the main event ( 𝐷 𝑖 = 1 ) or the competing one ( 𝐷 𝑖 = 2 ) happens. Denote 𝑡 𝑘 , 𝑘 = 1 … 𝐾 the ﬁxed time point after the onset of the exposure at which the measurements of 𝑀 𝑘 and 𝐿 𝑘 are pre-planned for all patients.Assume that these measurements are only recorded until the last visit 𝐾 or until event 𝐷 𝑖 = 1 or 𝐷 𝑖 = 2 happens, whichevercomes ﬁrst. The time-to-event endpoint may be censored administratively or due to loss to follow-up, in which case 𝐷 𝑖 = 0 .The causal diagram in ﬁgure 1 depicts the relationships between the variables over time. It also represents a non-parametricstructural equation model with independent errors. In the diagram, 𝐿 𝑘 includes the indicator 𝐼 ( 𝑇 ≥ 𝑡 𝑘 ) of having survived visit 𝑘 . Throughout, we will denote the history of measurements up to visit 𝑘 using a bar, i.e. 𝑀 𝑘 = ( 𝑀 , 𝑀 , … 𝑀 𝑘 ) .To deﬁne the direct and indirect eﬀects of interest, we will make use of so-called path-speciﬁc eﬀects, expressed as a (cause-speciﬁc) hazard ratio. In particular, we deﬁne the counterfactual variables 𝑇 𝑎,𝑎 ∗ and 𝐷 𝑎,𝑎 ∗ as the time to the main or competingevent (whichever comes ﬁrst) and the corresponding event index that would be observed if the exposure 𝐴 were set to 𝑎 and themediator levels changed to the levels that we would have seen if the exposure were set to 𝑎 ∗ and the levels of the time-varyingconfounders were as observed under this joint intervention on 𝐴 and 𝑀 𝐾 , respectively. The total causal eﬀect (TE) on the cause- 𝑗 -speciﬁc hazard when the exposure changes from 𝑎 to 𝑎 ∗ is then expressed as 𝐻𝑅 𝑗𝑇 𝐸 = 𝜆 𝑗𝑎 ∗ ,𝑎 ∗ ( 𝑡 ) 𝜆 𝑗𝑎,𝑎 ( 𝑡 ) , which can be decomposedinto the direct eﬀect (DE) 𝐻𝑅 𝑗𝐷𝐸 = 𝜆 𝑗𝑎 ∗ ,𝑎 ( 𝑡 ) 𝜆 𝑗𝑎,𝑎 ( 𝑡 ) and the indirect eﬀect (IE) 𝐻𝑅 𝑗𝐼𝐸 = 𝜆 𝑗𝑎 ∗ ,𝑎 ∗ ( 𝑡 ) 𝜆 𝑗𝑎 ∗ ,𝑎 ( 𝑡 ) , where 𝐻𝑅 𝑗𝑇 𝐸 = 𝐻𝑅 𝑗𝐼𝐸 × 𝐻𝑅 𝑗𝐷𝐸 .The indirect eﬀect 𝐻𝑅 𝑗𝐼𝐸 hence reﬂects the part of the treatment eﬀect (on the cause- 𝑗 -speciﬁc hazard) that is mediated via thepathways 𝐴 → 𝑀 𝑘 → … → 𝑇 , where 𝑘 = 1 , , … . These pathways start from the treatment 𝐴 and go directly to one of themediator levels before getting to the event time 𝑇 by any intermediate path. In contrast, the direct eﬀect 𝐻𝑅 𝑗𝐷𝐸 reﬂects the partof the treatment eﬀect (on the cause- 𝑗 -speciﬁc hazard) that does not go through any of the above pathways.We are now ready to deﬁne the cause– 𝑗 –speciﬁc natural eﬀect proportional hazard model, accounting for longitudinalmediators, as follows: 𝜆 𝑗𝑎,𝑎 ∗ ( 𝑡 ) = 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ (1)for all 𝑎, 𝑎 ∗ , where 𝜆 𝑗 ( 𝑡 ) , 𝛼 𝑗 and 𝛼 𝑗 are unknown. Under model (1) , the total, direct and indirect eﬀect can be expressed on thehazard ratio scale as 𝐻𝑅 𝑗𝑇 𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 )( 𝑎 ∗ − 𝑎 ) ; 𝐻𝑅 𝑗𝐼𝐸 = 𝑒 𝛼 𝑗 ( 𝑎 ∗ − 𝑎 ) and 𝐻𝑅 𝑗𝐷𝐸 = 𝑒 𝛼 𝑗 ( 𝑎 ∗ − 𝑎 ) , respectively. To assess the possibility o ET AL 𝐴 𝑀 𝑀 ... 𝑀 ⌊ 𝑡 ⌋ 𝑇𝐿 𝐿 𝐿 ... 𝐿 ⌊ 𝑡 ⌋ 𝑈 𝑙 𝑈 𝑚 FIGURE 1

Causal diagram. 𝑈 𝑙 : unmeasured confounders aﬀecting 𝐿 and ( 𝑇 , 𝐷 ) . 𝑈 𝑚 : unmeasured confounders aﬀectingdiﬀerent measurements of 𝑀 over timeof mediator-exposure interaction, one can alternatively consider model: 𝜆 𝑗𝑎,𝑎 ∗ ( 𝑡 ) = 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ + 𝛼 𝑗 𝑎.𝑎 ∗ (2)Under model (2), the total causal eﬀect on the hazard ratio scale is expressed as 𝐻𝑅 𝑗𝑇 𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 + 𝛼 𝑗 ( 𝑎 ∗ + 𝑎 )) ) ( 𝑎 ∗ − 𝑎 ) and is decom-posed into the indirect eﬀect 𝐻𝑅 𝑗𝐼𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 𝑎 )( 𝑎 ∗ − 𝑎 ) and the direct eﬀect 𝐻𝑅 𝑗𝐷𝐸 = 𝑒 ( 𝛼 𝑗 + 𝛼 𝑗 𝑎 ∗ )( 𝑎 ∗ − 𝑎 ) . Finally, note that othermodels for survival outcomes, such as the Aalen model, can also be extended to the current context. To estimate the parameters in model (1) , we will assume that the set of baseline covariates 𝐿 is suﬃcient to control for con-founding of the relationship between 𝐴 and ( 𝑇 , 𝐷 ) , as well as between 𝐴 and 𝑀 𝑡 at any time. Besides, we assume there are nounmeasured confounders of the relationship between the time-to-event outcome and the mediator at any time. In ﬁgure 1, thelatter assumption is satisﬁed since conditioning on the exposure 𝐴 and the history of the time-varying confounder 𝐿 up to time 𝑡 is suﬃcient to adjust for confounding of the relationship between the time-to-event outcome 𝑇 and the mediator level at time 𝑡 .Our development allows for the presence of unmeasured common causes of the mediators over time (i.e. denoted 𝑈 𝑚 in ﬁgure 1)and separate, independent unmeasured common causes of the baseline/time-varying confounders and the survival time ( 𝑇 , 𝐷 ) (i.e. denoted 𝑈 𝑙 ).In what follows, we generalize the standard estimation procedure for the popular marginal strutural models to ﬁt models (1)and (2). For this, we make the assumption that censoring is non-informative, in the sense that at any time, the instantaneous riskof the event among patients who then drop out of the study is not diﬀerent (at all future times) from that of patients who remain,conditional on the exposure level. Under the aforementioned assumptions, Appendix 1 shows that consistent estimators of theparameters indexing model (1) can be obtained by solving the estimating equation: Vo ET AL ∞ ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎧⎪⎨⎪⎩( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ̂𝐸 [( 𝑎𝑎 ∗ ) 𝑅 𝑗𝑖 ( 𝑡 ) 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ]∑ 𝑎,𝑎 ∗ ̂𝐸 [ 𝑅 𝑗𝑖 ( 𝑡 ) 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ] ⎫⎪⎬⎪⎭ ⋅⋅ 𝑅 𝑗𝑖 ( 𝑡 ) 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) ( 𝑑𝑁 𝑗𝑖 ( 𝑡 ) − 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ) = 0 where 𝑅 𝑗𝑖 ( 𝑡 ) = 𝐼 ( 𝑇 𝑖 ≥ 𝑡, 𝐷 𝑖 = 𝑗 ) ; 𝑑𝑁 𝑗𝑖 ( 𝑡 ) = 𝐼 ( 𝑇 𝑖 = 𝑡, 𝐷 𝑖 = 𝑗 ) and 𝐼 ( . ) denotes the indicator function. Besides, 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎 ∗ , 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎, 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) × 𝐼 ( 𝐴 𝑖 = 𝑎 ) Pr ( 𝐴 𝑖 = 𝑎 | 𝐿 ,𝑖 ) denotes the weight of individual 𝑖 at time 𝑡 and at exposure levels 𝑎 and 𝑎 ∗ . The second component of the weight ensures thatthe exposure-outcome association is adjusted for confounding by 𝐿 . It creates a pseudo-population in which the exposure isno longer associated with 𝐿 and hence removes confounding by 𝐿 . The ﬁrst component of the weight then distinguishesbetween the direct and indirect paths by correcting for the fact that the observed mediator value at each time point before 𝑡 maydiﬀer from the counterfactual value that is of interest at that time. Note that the notation ⌊ 𝑡 ⌋ here is slightly diﬀerent from itsstandard deﬁnition, to take into account the fact that if a patient experiences an event or leaves the study at time 𝑡 = 𝑡 𝑘 of visit 𝑘 , no measurement of 𝑀 𝑘 and 𝐿 𝑘 is possible at that time. More precisely, ⌊ 𝑡 ⌋ = { 𝑡 𝑘 −1 if 𝑡 = 𝑡 𝑘 𝑡 𝑘 if 𝑡 𝑘 < 𝑡 < 𝑡 𝑘 +1 where 𝑘 = 1 , … , 𝐾 . From this, the ﬁtting procedure is described as follows: Step 1–

Postulate and ﬁt a suitable model for the exposure 𝐴 conditional on the baseline confounders ( 𝐿 ) based on theoriginal data set. For instance, a multinomial logistic model can be used for a categorical exposure with 𝑃 possible values: log Pr ( 𝐴 = 𝑎 | 𝐿 = 𝑙 ) Pr ( 𝐴 = 0 | 𝐿 = 𝑙 ) = 𝛽 ,𝑎 + 𝛽 ,𝑎 𝑙 , (3)where 𝑎 = 1 , … , 𝑃 − 1 and Pr ( 𝐴 = 0 | 𝐿 = 𝑙 ) = 1∕(1 + ∑ 𝑃 −1 𝑎 =1 𝑒 𝛽 ,𝑎 + 𝛽 ,𝑎 𝑙 ) . Step 2–

Convert the original dataset to a long or counting-process format, in which the observation period [0 , 𝑡 𝑖 ] of subject 𝑖 is broken into ⌊ 𝑡 𝑖 ⌋ + 1 intervals if 𝑡 𝑖 > ⌊ 𝑡 𝑖 ⌋ > 𝑡 , into 𝑡 𝑖 intervals if 𝑡 𝑖 = ⌊ 𝑡 𝑖 ⌋ > 𝑡 and is kept unchanged if ⌊ 𝑡 𝑖 ⌋ < 𝑡 . Eachinterval 𝑘 will have the following information encoded:(a) The beginning of the interval, which equals the time 𝑡 𝑘 −1 of visit 𝑘 − 1 , with 𝑡 = 0 .(b) The end of the interval, which equals the time 𝑡 𝑘 of visit 𝑘 or the event/censoring time 𝑡 𝑖 for the last interval.(c) The event status at the end of the interval.(d) The exposure 𝐴 𝑖 and the baseline covariates 𝐿 𝑖 , whose values remain unchanged across all intervals.(e) The history of the mediator and the longitudinal confounders recorded up to the end of the interval. Note that the historyof 𝐿 and 𝑀 for subject 𝑖 up to the time 𝑡 𝑖 is similar to their history up to the last visit prior to time 𝑡 𝑖 .Table 1 provides a toy example in which three patients receive a binary treatment and are followed up for a total of three years,with two visits pre-planned at the end of year 1 and 2. Patient 3 is free of event till the end of the study and hence has the mediatorlevel fully recorded at the two intermediate visits. In contrast, patient 1 and 2 experience an event after 1.5 and 0.9 years, due towhich they have no (i.e. patient 2) or only one (i.e. patient 1) mediator level recorded. Table 2 illustrate how the information ofthese three hypothetical individuals is encoded in a counting-process format. Step 3–

Postulate and ﬁt a suitable model for the mediator at a time point 𝑡 , conditional on the exposure, the longitudinalconfounder 𝐿 𝑡 and previous measurements of the mediator (i.e. 𝑀 𝑡 −1 ), by using the long data set. For instance, one may assumemultinomial logistic models for a categorical mediator 𝑀 𝑘 with possible values , … , 𝑄 . The model for 𝑀 𝑘 is thus: log Pr ( 𝑀 𝑘 = 𝑞 | 𝐴 = 𝑎, 𝑀 𝑘 −1 = 𝑚 𝑘 −1 , 𝐿 𝑘 = ̄𝑙 𝑘 , 𝑇 ≥ 𝑡 𝑘 ) Pr ( 𝑀 𝑘 = 0 | 𝐴 = 𝑎, 𝑀 𝑘 −1 = 𝑚 𝑘 −1 , 𝐿 𝑘 = 𝑙 𝑘 , 𝑇 ≥ 𝑡 𝑘 ) = 𝛾 𝑞 + 𝛾 𝑞 𝑎 + 𝛾 ′2 𝑞 𝑚 𝑘 −1 + 𝛾 ′3 𝑞 𝑙 𝑡 (4) o ET AL TABLE 1

Illustrating example: A toy dataset in standard short format. Here, 𝑚 𝑖𝑗 and 𝑙 𝑖𝑗 denote the mediator and longitudinalconfounder level of individual 𝑗 recorded at visit 𝑡 𝑖 , respectively. The treatment 𝐴 is binary (0 vs. 1) and there are two competingevents, coded as and .Individual Following-up time Status 𝐴 𝑀 𝑀 𝐿 𝐿 𝐿 (years)1 1.5 1 1 𝑚 − 𝑙 𝑙 − − − 𝑙 − − 𝑚 𝑚 𝑙 𝑙 𝑙 TABLE 2

The counting-process format of the dataset in table 1Individual Start Stop Status

𝐴 𝑀 𝑡 𝑀 𝑡 −1 𝐿 𝑡 𝐿 𝑡 −1 𝐿 𝑙 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑚 𝑚 𝑙 𝑙 𝑙 TABLE 3

Extended data set for the example in Table 1Individual Start Stop Status

𝐴 𝐴 ∗ 𝑀 𝑡 𝑀 𝑡 −1 𝐿 𝑡 𝐿 𝑡 −1 𝐿 𝑙 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑚 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑙 𝑙 𝑚 𝑙 𝑙 𝑚 𝑚 𝑙 𝑙 𝑙 where 𝑞 = 1 , … , 𝑄 . Step 4–

A new data set is then constructed by copying the original data set (in long format) 𝑃 times and including an additionalvariable 𝐴 ∗ to capture the 𝑃 possible values of the exposure relative to the indirect path. 𝐴 ∗ is set to the actual value of theexposure 𝐴 for the ﬁrst replication, to the other potential values of 𝐴 for the remaining replications. For the example discussedin table 1 and 2, the corresponding extended data set is provided in table 3. Step 5–

Compute weights by applying the ﬁtted models from steps 1 and 3 to the new data set. At visit 𝑘 , the weight for the 𝑖 𝑡ℎ individual is 𝑤 𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) = 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎 ) ⋅ 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) , where: 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎 ) = 𝑃 −1 ∑ 𝑎 =0 𝐼 ( 𝐴 𝑖 = 𝑎 ) Pr ( 𝐴 𝑖 = 𝑎 | 𝐿 𝑖 = 𝑙 𝑖 ) and 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ 𝑡 𝑘 Pr ( 𝑀 𝑠,𝑖 = 𝑚 𝑠,𝑖 | 𝐴 𝑖 = 𝑎 ∗ , 𝑀 𝑠 −1 ,𝑖 = 𝑚 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 = ̄𝑙 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) Pr ( 𝑀 𝑠,𝑖 = 𝑚 𝑠,𝑖 | 𝐴 𝑖 = 𝑎, 𝑀 𝑠 −1 ,𝑖 = 𝑚 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 = 𝑙 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 ) Vo ET AL

𝐴 𝑀 𝑡 𝑇𝐿 𝐿 𝑡 𝑅 𝐶 ( 𝑡 ) 𝑈 𝑙 𝑈 𝑚 (a) 𝐴 𝑀 𝑡 −1 𝑀 𝑡 𝑇𝐿 𝐿 𝑡 −1 𝐿 𝑡 𝑅 𝐶 ( 𝑡 ) 𝑈 𝑙 𝑈 𝑚 (b) FIGURE 2 (Simpliﬁed) causal diagram when censoring presents – (a) Censoring is non-informative conditional on the exposureand (b) Censoring is non-informative at time 𝑡 conditional on the exposure and the history up to that timewhere the subscript 𝑡𝑡𝑚 and 𝑚𝑒𝑑 denotes treatment and mediator, respectively. At the end of the follow-up time, the weight fora patient having 𝑇 𝑖 = 𝑡 𝑖 is 𝑤 𝑖 ( 𝑡 𝑖 , 𝑎, 𝑎 ∗ ) = 𝑤 𝑖 ( ⌊ 𝑡 𝑖 ⌋ , 𝑎, 𝑎 ∗ ) . Step 6–

Fit the natural eﬀect cause-speciﬁc proportional hazard model (1) and (2) by proportiona hazard regression of thecause-speciﬁc event time on 𝐴 and 𝐴 ∗ on the basis of the expanded data set, using the weights computed in the previous step. Step 7–

Derive conﬁdence intervals for the parameters in model (1) and (2) by using the non-parametric bootstrap. For this,one ﬁrst generates 𝑆 bootstrap samples with replacement from the original dataset, then repeats all the above steps for eachbootstrap sample. The 95% conﬁdence interval for each parameter in model (1) and (2) is computed by using the 2.5% and 97.5%quantiles of the bootstrap distribution of the corresponding estimator. Denote 𝐶 the time-to-censoring. As stated above, when the censoring is non-informative conditional on the exposure (ﬁgure 2a),the provided estimating procedure remains valid without further adjustment. When censoring is dependent upon the baselinecovariate vector 𝐿 and the exposure 𝐴 , one could adjust for censoring by alternatively focusing on the so-called conditionalcause-speciﬁc natural eﬀect proportional hazard model, that is, 𝜆 𝑗𝑎,𝑎 ∗ ( 𝑡 | 𝐿 = 𝑙 ) = 𝜆 𝑗 ( 𝑡 ) 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ + 𝛼 ′2 𝑗 𝑙 (5)Note that an interaction between 𝑎 ∗ and 𝑙 could also be permitted in such model to assess the possibility of mediator-baselinecovariate interaction. The procedure discussed in section 2.2 can then be applied to estimate the parameters in this model, witha slight adjustment in step 6 where apart from 𝐴 and 𝐴 ∗ , the covariates 𝐿 (and the product of 𝐴 ∗ and 𝐿 if mediator- baselinecovariate interaction is assessed) are also included into the proportional hazard regression model. In practice, it might howeverbe the case that censoring is dependent upon post-baseline factors such as the longitudinal mediator and confounder levels thatare measured prior to censoring (ﬁgure 2b). In Appendix A2, we show that if at any time 𝑡 , the risk of future events for patientswho drop out of the study is not diﬀerent from that of patients who have the same exposure, mediator and covariate history upto time 𝑡 but remain in the study, the so-called inverse probability of censoring weighting approach can be used to account for o ET AL censoring. More precisely, the parameters indexing models (1) and (2) are then estimated by solving the following equation: ∞ ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎧⎪⎪⎨⎪⎪⎩( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ̂𝐸 [( 𝑎𝑎 ∗ ) ⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ]∑ 𝑎,𝑎 ∗ ̂𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ ] ⎫⎪⎪⎬⎪⎪⎭ ⋅⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ( 𝑑𝑁 𝑖 ( 𝑡 ) − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑗 𝑎 + 𝛼 𝑗 𝑎 ∗ 𝑑𝑡 ) = 0 where 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ ̂ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎 ∗ , 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 , 𝐶 𝑖 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ ̂ Pr ( 𝑀 𝑠,𝑖 | 𝐴 𝑖 = 𝑎, 𝑀 𝑠 −1 ,𝑖 , 𝐿 𝑠,𝑖 , 𝑇 𝑖 ≥ 𝑡 𝑠 , 𝐶 𝑖 ≥ 𝑡 𝑠 ) ⋅ 𝐼 ( 𝐴 𝑖 = 𝑎 ) ̂ Pr ( 𝐴 𝑖 = 𝑎 | 𝐿 ,𝑖 ) ⋅⋅ ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ ̂𝜆 𝐶 ( 𝑠 | 𝑇 𝑖 > 𝑠, 𝑀 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ,𝑖 , 𝐴 𝑖 = 𝑎 ) ] and 𝜆 𝐶 ( . ) denoting the cause-speciﬁc hazard function of the time-to-censoring. Here, ∏ 𝑠 𝑥 𝑠 is deﬁned as a product limit. Withthe above weight 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) , the additional component that accounts for the informative censoring can make the overallweight become unstable (e.g. when the censoring hazard is close to 1 in some strata). To overcome this, one can then usestabilized (censoring) weights which incorporate a numerator deﬁned in the same way as the denominator but adjusting only forthe exposure, that is: ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ ̂𝜆 𝐶 ( 𝑠 | 𝐴 𝑖 = 𝑎 ) ] ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ ̂𝜆 𝐶 ( 𝑠 | 𝑇 𝑖 > 𝑠, 𝑀 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ,𝑖 , 𝐴 𝑖 = 𝑎 ) ] One then needs to postulate two models for the censoring hazard at time 𝑡 , with one conditioning on exposure and the otherconditioning on exposure, baseline covariates and the history of the longitudinal mediator and confounders up to time 𝑡 , whereonly the latter model needs to be correct. For instance, 𝜆 𝐶 ( 𝑡 | 𝐴 𝑖 = 𝑎 𝑖 ) = 𝜆 ′0 𝐶 ( 𝑡 ) 𝑒 𝜂𝑎 𝑖 and 𝜆 𝐶 ( 𝑡 | 𝑇 𝑖 > 𝑡, 𝑀 ⌊ 𝑠 ⌋ ,𝑖 = 𝑚 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ⌊ 𝑠 ⌋ ,𝑖 = 𝑙 ⌊ 𝑠 ⌋ ,𝑖 , 𝐿 ,𝑖 = 𝑙 ,𝑖 , 𝐴 𝑖 = 𝑎 𝑖 ) = 𝜆 𝐶 ( 𝑡 ) 𝑒 𝜃 𝑎 𝑖 + 𝜃 𝑙 ,𝑖 + 𝜃 ′2 𝑚 ⌊ 𝑠 ⌋ ,𝑖 + 𝜃 ′3 𝑙 ⌊ 𝑠 ⌋ ,𝑖 (6)As a result, in step 5 of the estimation procedure, apart from computing the mediator and treatment weights, one needs toadditionally derive the censoring weight. More precisely, the weight for the 𝑖 𝑡ℎ individual at visit 𝑘 is now 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎, 𝑎 ∗ ) ⋅ 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎 ) ⋅ 𝑤 𝑐𝑒𝑛𝑖 ( 𝑘, 𝑎 ) , where the subscript cen denotes censoring, i.e., 𝑤 𝑐𝑒𝑛𝑖 ( 𝑘, 𝑎 ) = 1 ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 𝑘 [ ̂𝜆 𝐶 ( 𝑠 | 𝑇 𝑖 > 𝑠, 𝑚 𝑠,𝑐,𝑖 , 𝑙 𝑠,𝑐,𝑖 , 𝑙 ,𝑖 , 𝑎 ) ] while 𝑤 𝑡𝑡𝑚𝑖 ( 𝑎, 𝑎 ∗ ) and 𝑤 𝑚𝑒𝑑𝑖 ( 𝑘, 𝑎, 𝑎 ∗ ) are computed as above. If a proportional hazard censoring model as (6) is ﬁtted, the baselinecensoring hazard 𝜆 𝐶 ( 𝑡 ) and 𝜆 ′0 𝐶 ( 𝑡 ) at time 𝑡 can be estimated by the standard Breslow estimator. Once the individual weightsare computed, the natural eﬀect cause-speciﬁc proprotional hazard model (1) and (2) can be ﬁtted using these weights and theconﬁdence intervals for the estimates can be derived via the nonparametric bootstrap, as described above. We illustrate the proposed approach on the ELSA data. In this ongoing study, the ﬁrst contact with the participants was in2002/03 (wave 1). These participants were then followed up biennially, with measures collected via computer-assisted face-to-face interview and self-completion questionnaires. For this illustration, the data are available until one year after the last wave in2016/17. As stated above, the question of interest here is whether the feeling of loneliness mediates the impact of hearing losson dementia, accounting for mortality as a competing event.For this analysis, we used the hearing measurement recorded at wave 2 (e.g. 2004/05) and dichotomized subjects into twogroups, namely normal ( 𝐴 = 0 ) and limited ( 𝐴 = 1 ) hearing ability. The longitudinal mediator "loneliness" was recordedfrom wave 3 (2006/07) to wave 7 (2014/15) and had two potential values, namely frequent ( 𝑀 = 1 ) vs. infrequent feeling ofloneliness ( 𝑀 = 0 ). Alongside the mediator, four longitudinal confounders were recorded over time (i.e. wave 3 to wave 7),namely depression status (yes vs. no), mobility score (continuous), smoking status (non-smoker vs. current smoker) and alcohol Vo ET AL status (non-drinker vs. current drinker). The baseline covariates consisted of age, gender, ethnicity (white vs. non-white), wealth(1=low, 5=high), education level (1 = no formal qualiﬁcation, 2 = intermediate and 3 = higher education), marital status (yesvs. no), the use of hearing aids (yes vs. no), the presence of other comorbidities (i.e. hypertension, diabetes, stroke and cancer –yes vs. no) and the baseline values of the aforementioned time-varying confounders. A detailed description of these covariateswas provided elsewhere .We assumed that the relationship between the variables obeys the causal structure depicted in ﬁgure 1. Here, the mediatorsand confounders measured at time 𝑡 𝑘 −1 are time-varying confounders of the relationship between the mediator measured at time 𝑡 𝑘 and the outcome. To derive the treatment weights (step 1), we ﬁrst considered a logistic (treatment) model adjusting for themain eﬀects of all baseline covariates. To assess the potential of covariate-covariate interactions, we used a LASSO variableselection process (R package glmnet) to select the most important interaction terms from the set of all possible two-by-twocovariate interactions. The chosen interactions were then added into the treatment model. The tuning parameter in the LASSOwas selected by leave-one-out cross-validation.To derive the mediation weights (step 3), we ﬁrst considered a logistic (mediator) model adjusting for 𝑡 𝑘 , 𝑀 𝑘 −1 , 𝐿 𝑘 and themain eﬀects of all baseline covariates. To assess whether the conditional distribution of 𝑀 𝑘 had a residual dependence uponthe history of 𝑀 and 𝐿 that preceded the time 𝑡 𝑘 −1 (for 𝑀 ) and 𝑡 𝑘 (for 𝐿 ), we used the LASSO to determine the ﬁrst post-baseline measurement of 𝑀 and 𝐿 that were predictive for 𝑀 𝑘 , conditional on the later measurements. This measurement andall measurements following this one were included into the mediator model. Next, we assessed whether there were important(i) treatment-baseline/longitudinal covariate interactions, (ii) time-baseline covariate interactions and (iii) baseline covariate-covariate interactions that should be adjusted for. For each step, an independent LASSO variable selection process was performedto select the most important interaction terms from all possible interactions of the same type. The interactions that were chosenin the previous step were always included in the model of the subsequent steps (which implies no shrinkage on these terms inthe subsequent steps). The tuning parameter in each LASSO procedure was selected by leave-one-out cross-validation. The ﬁnalmodel was reﬁtted before calculating the mediation weights.To derive the censoring weights, we ﬁrst considered a cause-speciﬁc proportional hazard model adjusting for 𝑀 𝑘 , 𝐿 𝑘 , theexposure 𝐴 and the main eﬀects of all baseline covariates 𝐿 . To assess whether the censoring hazard at time 𝑡 had a residualdependence upon the history of 𝑀 and 𝐿 that preceded the time 𝑡 𝑘 for 𝑀 and for 𝐿 , we implemented a backward eliminationprocess, using the Akaike information criterion to determine the ﬁrst post-baseline measurement of 𝑀 and 𝐿 that were predictivefor the censoring hazard at time 𝑡 , conditional on the later measurements. This measurement and all measurements followingthis one were included into the censoring proportional hazard model. Next, as for the mediator model, we assessed whether therewere important (i) treatment-baseline/longitudinal covariate interactions and (ii) baseline covariate-covariate interactions thatshould be adjusted for. For each step, an independent backward elimination process was performed to select the most importantinteraction terms from all possible interactions of the same type. The interactions that were chosen in the previous step werealways included in the model of the subsequent steps (which implies no exclusion of these terms in the subsequent steps).Note that we used backward elimination for the construction of the censoring models (as opposed to LASSO) due to the lackof prepackaged software that can apply LASSO or other penalized variable selection methods on a counting format survivaldataset. Results of the variable selection processes for the treatment, mediator and censoring models are reported in OnlineSupplementary Material ﬁle.The two natural eﬀect proportional hazard models (i.e. model (1) and (2)) speciﬁc for dementia and for death were thenﬁtted using the calculated weights. The conﬁdence intervals of the total, direct and indirect hazard ratios were derived by thenon-parametric bootstrap method, with 5000 samples taken from the original data set by sampling with replacement. We thenestablished the cumulative incidence curves of dementia and of death under diﬀerent sets of 𝑎 and 𝑎 ∗ . These curves reﬂect thecumulative failure rates over time for a particular cause (e.g. dementia), acounting for the presence of other competing events(e.g. death). To estimate the curves, we considered dementia and death as two terminal states of a multi-state model wherethe transition from dementia to death was treated as an absorbing state, i.e. the one that subjects never exist . For pedagogicpurposes, we only use the results of the natural eﬀect model (1) (i.e. without interaction between 𝑎 and 𝑎 ∗ ) to establish thesecurves.As can be seen from table 5, the total eﬀect of hearing loss on the time to dementia diagnosis was statistically signiﬁcant(Model 1, HR = 1.843; 95%CI 1.100 to 2.710). Model (1) further suggested that this total eﬀect was weakly mediated throughthe feeling of loneliness, with a non-statistically signiﬁcant indirect eﬀect equal to 1.012 (HR scale; 95%CI 0.986 to 1.053).This expresses that the hazard of dementia would become 1.012 times higher if all patients were to have limited hearing abilitybut the loneliness levels were switched from the values that would have been observed if they had normal hearing ability to the o ET AL . . . . . . Years after hearing measurement at wave 2 C u m u l a t i v e r i sk o f de m en t i a d i agno s i s a = 0, a* = 0a = 1, a* = 0a = 1, a* = 1 0 2 4 6 8 10 . . . . . Years after hearing measurement at wave 2 C u m u l a t i v e r i sk o f dea t h a = 0, a* = 0a = 1, a* = 0a = 1, a* = 1 FIGURE 3

Estimated cumulative incidence curve of dementia diagnosis

TABLE 4

Data analysis: estimation of the natural eﬀect modelsModel Coeﬃcient Estimate 95%CI p-value(1)

Dementia ( 𝑗 = 1 ) 𝛼 𝑗 𝛼 𝑗 Death ( 𝑗 = 2 ) 𝛼 𝑗 -0.126 (-0.483; 0.143) 0.462 𝛼 𝑗 Dementia ( 𝑗 = 1 ) 𝛼 𝑗 𝛼 𝑗 𝛼 𝑗 Death ( 𝑗 = 2 ) 𝛼 -0.121 (-0.480; 0.148) 0.485 𝛼 𝛼 -0.009 (-0.049; 0.022) 0.632value observed under limited hearing. In contrast, the total eﬀect of hearing loss on mortality was not statistically signiﬁcant(Model 1, HR = 0.882; 95%CI 0.625 to 1.148). There was no statistical evidence of an indirect eﬀect through the feeling ofloneliness (HR = 1.001, 95%CI 0.987 to 1.011). These ﬁndings did not change when considering model (2) with interaction(table 4 - p-value of the interaction coeﬃcient equals 0.520 for dementia and 0.632 for death).Figure 3 provides the estimated cumulative incidence curve of dementia for diﬀerent 𝑎 and 𝑎 ∗ , which visualizes the weakindirect eﬀect of hearing loss on dementia through the suggested longitudinal mediator. At some time points, there are largejumps in these curves due to the fairly high rate of interval censoring in the dataset (i.e. if the date for dementia diagnosis (or Vo ET AL

TABLE 5

Data analysis: the eﬀect of limited vs. normal hearing ability on the time-to-event outcomes, mediated through thefeeling of loneliness. The mediated proportion is calculated on log scaleModel Eﬀect Hazard ratio 95%CI Mediated proportion(1)

Dementia

Total eﬀect ( 𝐴 = 1 vs. 𝐴 = 0 ) 1.843 (1.100; 2.711)Direct eﬀect 1.821 (1.095; 2.721)Indirect eﬀect 1.012 (0.986; 1.053) 2.0% Death

Total eﬀect ( 𝐴 = 1 vs. 𝐴 = 0 ) 0.882 (0.625; 1.148)Direct eﬀect 0.882 (0.617; 1.154)Indirect eﬀect 1.001 (0.987; 1.011) -0.8%(2) Dementia

Total eﬀect ( 𝐴 = 1 vs. 𝐴 = 0 ) 1.839 (1.100; 2.713)Direct eﬀect 1.836 (1.091; 2.716)Indirect eﬀect 1.002 (0.988; 1.017) 0.3%Total eﬀect ( 𝐴 = 0 vs. 𝐴 = 1 ) 0.544 (0.369; 0.912)Direct eﬀect 0.554 (0.367; 0.912)Indirect eﬀect 0.981 (0.925; 1.022) 3.2% Death

Total eﬀect ( 𝐴 = 1 vs. 𝐴 = 0 ) 0.882 (0.622; 1.148)Direct eﬀect 0.878 (0.614; 1.151)Indirect eﬀect 1.005 (0.989; 1.023) -4.0%Total eﬀect ( 𝐴 = 0 vs. 𝐴 = 1 ) 1.134 (0.871; 1.605)Direct eﬀect 1.129 (0.862; 1.617)Indirect eﬀect 1.004 (0.981; 1.036) 3.2%for death) was not known but person had a new diagnosis (or passed away) from one visit to the next, then we considered themidpoint between two visits as the event date). Finally, the above results should be interpreted with caution as there might beimportant time-varying confounders of the mediator-outcome association that were not taken into account. The ﬁndings couldalso be biased if the involved models were incorrectly speciﬁed or censoring was informative (e.g. elderly patients who livealone might not come to the control visit due to dementia-related problems). In this paper, we have generalized the weighting-based strategy proposed for natural eﬀect models in single mediation analysisto the setting where the mediator of interest is repeatedly measured over time (hence subject to longitudinal confounders) andthe primary outcome is a time-to-event endpoint, subject to competing risks. The proposed approach yields consistent estimatesfor the natural direct and indirect eﬀects if the causal assumptions hold and the natural eﬀect model and the conditional dis-tribution of the exposure, mediator and censoring are correctly speciﬁed. As noted by Steen et al , the mediator model needscareful consideration, especially when the exposure (and the baseline covariates) are highly predictive of the mediator, for theneven minor misspeciﬁcation can have a major impact on the weights and lead to biased results. Apart from (mediator) modelmisspeciﬁcation, the estimated weights may become unstable or even extreme when at each time point, there is an inadequateoverlap between the conditional distributions of the mediator under diﬀerent treatment/exposure conditions, as may be the casewhen the exposure has a strong eﬀect on the mediator. While the presence of extreme weights might appear as a limitationat ﬁrst, it may also diagnose severe model extrapolation that often goes unnoticed when using a repeated regression approachproposed for the same setting . Simple weighting-based approaches also tend to yield larger standard errors (compared toimputation or regression-based approaches) due to lack in eﬃciency. This can be especially problematic when the mediator iscontinuous. In that case, the weight-based approaches tend to be unstable even under proper model speciﬁcation and adequate o ET AL overlap of the mediator distributions across treatment groups, which may result in considerable ﬁnite sample bias in the naturaleﬀect estimates. The repeated regression approach might be more appropriate when dealing with continuous mediators .Several proposals can be made to improve the suggested approach. Future research might focus on the development of doublyor multiply robust estimators to improve the robustness and eﬃciency of the current weight-based approach. The proposedstrategy can also be easily extended to take into account multiple mediators 𝑀 (1) , … , 𝑀 ( 𝑉 ) that are repeatedly measured overtime. When these mediators are causally ordered then as suggested by Vanderweele and Vansteelandt (2014), one can ﬁrst eval-uate the eﬀect mediated through 𝑀 (1) , then examine how much this changes when 𝑀 (1) and 𝑀 (2) are jointly considered asmediators. This then reveals the additional contribution of 𝑀 (2) beyond 𝑀 (1) alone. The process is then carried on by sequentiallyadding one mediator at a time until all 𝑉 mediators are included . By accounting for multiple, repeatedly measured mediators,results of the analysis may allow one to get closer to evaluating the entire mediation process that underlies the treatment mecha-nism in practice. Finally, future research should also extend the proposed approach to account for continuous exposures, whichare quite common in epidemiology and social science. ACKNOWLEDGMENTS

The ﬁrst author was supported by the funding from the European UnionâĂŹs Horizon 2020 research and innovation program,under the Marie Sklodowska-Curie grant agreement (grant no.: 676207).

Conﬂict of interest

The authors declare no potential conﬂict of interests.

SUPPORTING INFORMATION

The following supporting information is available as part of the online article:

Data analysis 1.

Results of the construction of the propensity score model, the mediator model at each time point and thecensoring model

Data analysis 2

R codes for the analysis of the ELSA data.

References

1. Steptoe A, Breeze E, Banks J, Nazroo J. Cohort proﬁle: the English longitudinal study of ageing.

International journal ofepidemiology

Journal of the American Geriatrics Society

The Journals of Gerontology: Series B

Annual Review of Public Health

Epidemiology Vo ET AL

7. Pearl J. Direct and indirect eﬀects. In: Morgan Kaufmann Publishers Inc. ; 2001: 411–420.8. Lange T, Rasmussen M, Thygesen LC. Assessing natural direct and indirect eﬀects through multiple pathways.

Americanjournal of epidemiology

Epidemiology (Cambridge,Mass.)

Statistical methods in medical research

Statistics in medicine

Journal of causal inference

Statistics in medicine arXiv preprintarXiv:1912.01200

Americanjournal of epidemiology

Journal of the American Geriatrics Society

Statistics in medicine

Journal of Statistical Software

American journal ofepidemiology

Epidemiologic Methods

Biometrics

Epidemiologic methods

Journal of clinical epidemiology

CognitiveScience

Biometrics

Epidemiology (Cam-bridge, Mass.) o ET AL

27. Nguyen QC, Osypuk TL, Schmidt NM, Glymour MM, Tchetgen Tchetgen EJ. Practical guidance for conducting mediationanalysis with multiple mediators using inverse odds ratio weighting.

American journal of epidemiology

Diabetologia

Journal of Applied Econo-metrics

Statisticalscience

Annals of statistics

Epidemiology (Cambridge, Mass.)

How to cite this article:

Vo T., Davies-Kershaw H., Hackett R., and Vansteelandt S. (2020), Longitudinal mediation analysisof time–to–event endpoints in the presence of competing risks,

Statistics in Medicine. , xxxx;xx:x–x . APPENDIXA – FORMAL PROOF OF THE PROPOSAL

We here describe the derivations of the estimating procedure discussed in section 2. Assume that the causal diagram in ﬁgure1 represents a non-parametric structural equation model with independent errors and that the natural eﬀect Cox model (1) iscorrectly speciﬁed. For the sake of simplicity, we ﬁrst assume that no censoring presents. The added complexity due to censoringwill be subsequently addressed. Let 𝐹 be the score for ( 𝛼 , 𝛼 ) for all individuals in the sample. To simplify the proof, we willdenote 𝑊 𝑖 ( ⌊ 𝑡 ⌋ , 𝑎, 𝑎 ∗ ) as 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) . One then has: 𝐸 ( 𝐹 ) = ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ]∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ] ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 𝑖 ( 𝑡 )] We now consider the expression 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )] , that is: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴, 𝑀 𝑡 , 𝐿 𝑡 , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) } = 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴, 𝑀 𝑡 , 𝐿 𝑡 , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝐴 = 𝑎 ) where 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) = ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎 ∗ , 𝑀 𝑠 −1 , 𝐿 𝑠 , 𝑇 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶ 𝑡 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎, 𝑀 𝑠 −1 , 𝐿 𝑠 , 𝑇 ≥ 𝑡 𝑠 ) × 1 Pr ( 𝐴 = 𝑎 | 𝐿 ) . Denote 𝑡 − = 𝑡 − 𝛿𝑡 where 𝛿𝑡 > is a small positive quantity such that ⌊ 𝑡 ⌋ < 𝑡 − < 𝑡 . One then has: Vo ET AL 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 [ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝑇 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅⋅ Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) where 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) is a short-hand notation for 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) and so on. Notethat Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] = 1 − 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 . This implies that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )] = 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ [ 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) Applying the above arrangements in a backward fashion till when the time point ⌊ 𝑡 ⌋ is reached, one then obtains: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ ⌊ 𝑡 ⌋ ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ ∏ 𝑠 ∶ ⌊ 𝑡 ⌋ ≤ 𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 −1 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ ⌊ 𝑡 ⌋ ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 − 1 ⌋ ) ⋅⋅ Pr ( 𝑚 𝑡 | 𝑎 ∗ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ) ⋅ Pr ( 𝑙 𝑡 | 𝑎, 𝐿 ⌊ 𝑡 −1 ⌋ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ) ⋅⋅ ∏ 𝑠 ∶ ⌊ 𝑡 ⌋ ≤ 𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑀 ⌊ 𝑡 −1 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 −1 ⌋ , 𝐿 ) ] 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) where ∏ 𝑠 𝑎 𝑠 is deﬁned as a product limit. The above procedure continues to be repeated in a backward fashion till when thestarting time point 𝑡 = 0 is reached, which then gives: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ ∏ 𝑠 ∶0 <𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑚 ⌊ 𝑠 ⌋ , 𝑙 ⌊ 𝑠 ⌋ , 𝐿 ) ] ⋅⋅ ∏ 𝑠 ∶0 <𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑚 𝑠 | 𝑎 ∗ , 𝑚 𝑠 −1 , 𝑙 𝑠 , 𝐿 , 𝑇 ≥ 𝑡 𝑠 ) ⋅⋅ Pr ( 𝑙 𝑠 | 𝑙 𝑠 −1 , 𝑚 𝑠 −1 , 𝐿 , 𝑎, 𝑇 ≥ 𝑡 𝑠 ) 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) Note that ∏ 𝑠 ∶0 <𝑠 ≤ 𝑡 [ 𝜆 ( 𝑠 | 𝑎, 𝑚 ⌊ 𝑠 ⌋ , 𝑙 ⌊ 𝑠 ⌋ , 𝐿 ) ] = Pr ( 𝑇 ≥ 𝑡 | 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) . This implies that: o ET AL 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝑎, 𝑚 ⌊ 𝑡 ⌋ , 𝑙 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅⋅ ∏ 𝑠 ∶0 <𝑠 ≤ 𝑡 Pr ( 𝑚 𝑠 | 𝑎 ∗ , 𝑚 𝑠 −1 , 𝑙 𝑠 , 𝐿 , 𝑇 ≥ 𝑡 𝑠 ) ⋅⋅ Pr ( 𝑙 𝑠 | 𝑙 𝑠 −1 , 𝑚 𝑠 −1 , 𝐿 , 𝑎, 𝑇 ≥ 𝑡 𝑠 ) , 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= ∫ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑎, 𝑚 𝑡 , 𝑙 𝑡 , 𝑙 ) ⋅ ∏ 𝑠 ∶0 <𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑚 𝑠 | 𝑎 ∗ , 𝑚 𝑠 −1 , 𝑙 𝑠 , 𝑙 , 𝑇 ≥ 𝑡 𝑠 ) ⋅⋅ Pr ( 𝑙 𝑠 | 𝑙 𝑠 −1 , 𝑚 𝑠 −1 , 𝑙 , 𝑎, 𝑇 ≥ 𝑡 𝑠 ) ⋅ Pr ( 𝑙 | 𝑎 ) × Pr ( 𝑎 ) Pr ( 𝑎 | 𝑙 ) 𝑑𝑚 ⌊ 𝑡 ⌋ 𝑑𝑙 ⌊ 𝑡 ⌋ 𝑑𝑙 = 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) ] = 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] where the prior-to-last equaling follows by the derivation of Vansteelandt et al . In a similar way, one can show that 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝑊 ( 𝑡 )] = 𝐸 ( 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ) . These imply that: 𝐸 ( 𝐹 ) = ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ]∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ] ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] = ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ]∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ] ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ { 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ } = 0 The ﬁnal equaling follows by the fact that the natural proportional hazard model is correctly speciﬁed, for then 𝐸 [ 𝑑𝑁 𝑖 ( 𝑡 ) 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑇 𝑎,𝑎 ∗ 𝑖 ≥ 𝑡 ] = 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ B – ADDRESSING INFORMATIVE CENSORING

When censoring presents but is non-informative given the exposure group, the score 𝐹 for ( 𝛼 , 𝛼 ) for all individuals in thesample can be written as: 𝐹 = ∞ ∫ ∑ 𝑖,𝑎,𝑎 ∗ ⎧⎪⎨⎪⎩( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ 𝐸 [( 𝑎𝑎 ∗ ) ⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ]∑ 𝑎,𝑎 ∗ 𝐸 [ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ] ⎫⎪⎬⎪⎭ ⋅⋅ 𝑅 𝑖 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 𝑖 > 𝑡 ) ⋅ 𝑊 𝑖 ( ⌊ 𝑡 ⌋ ) ( 𝑑𝑁 𝑖 ( 𝑡 ) − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ 𝑑𝑡 ) Note that: Vo ET AL 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 [ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) |||| 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ]} = 𝐸 { 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ Pr ( 𝐶 ≥ 𝑡 | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ) ⋅⋅ 𝐸 [ 𝑑𝑁 ( 𝑡 ) | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ] ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) } = 𝐸 { 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ Pr ( 𝐶 ≥ 𝑡 | 𝐴 ) ⋅ 𝐸 [ 𝑑𝑁 ( 𝑡 ) | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ] ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) } = Pr ( 𝐶 ≥ 𝑡 | 𝐴 = 𝑎 ) ⋅ 𝐸 { 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ 𝐸 [ 𝑑𝑁 ( 𝑡 ) | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 , 𝑇 ≥ 𝑡 ] ⋅ Pr ( 𝑇 ≥ 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) The prior-to-last equaling results from the assumption that the censoring is non-informative given the treatment group (e.g.ﬁgure 3). Following the same reasoning as above, one can show that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑊 ( 𝑡 ) ⋅ 𝑑𝑁 ( 𝑡 )] = Pr ( 𝐶 ≥ 𝑡 | 𝐴 = 𝑎 ) ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ] and similarly, 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝑊 ( 𝑡 )] = Pr ( 𝐶 ≥ 𝑡 | 𝐴 = 𝑎 ) ⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] The score will thus remain the same as in the case of no censoring, with an added proportional constant, i.e. Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 ) .As a result, 𝐸 ( 𝐹 ) = ∫ ∑ 𝑖,𝑎,𝑎 ∗ 𝐸 ⎧⎪⎨⎪⎩⎡⎢⎢⎢⎣( 𝑎𝑎 ∗ ) − ∑ 𝑎,𝑎 ∗ ( 𝑎𝑎 ∗ ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 ( 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ) ⋅ Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 ) ∑ 𝑎,𝑎 ∗ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ⋅ 𝐸 ( 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ) ⋅ Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 ) ⎤⎥⎥⎥⎦ ⋅⋅ 𝐸 [ 𝑅 𝑎,𝑎 ∗ 𝑖 ( 𝑡 ) ] ⋅ [ 𝐸 [ 𝑑𝑁 𝑖 ( 𝑡 ) 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑇 𝑎,𝑎 ∗ 𝑖 ≥ 𝑡 ] − 𝜆 ( 𝑡 ) ⋅ 𝑒 𝛼 𝑎 + 𝛼 𝑎 ∗ ]} ⋅ Pr ( 𝐶 > 𝑡 | 𝐴 = 𝑎 )= 0 which results from the fact that the natural proportional hazard model is correctly speciﬁed.When censoring is only non-informative at time 𝑡 given the exposure group and the history of mediator and covariates up tothat time, the weight of each patient at time 𝑡 needed to be adjusted as described in section 2.3. To prove this, we consider onceagain the expression 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )] . Note that the weight 𝑊 ( ⌊ 𝑡 ⌋ ) now has an additional censoring-weightcomponent (see section 2.3), for then: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝐶 ≥ 𝑡, 𝑇 ≥ 𝑡, 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) } = 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝐴, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝐴 = 𝑎 ) where 𝑊 ∗ ( 𝑡 ) = ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎 ∗ , 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑠 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎, 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑠 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ⋅ Pr ( 𝐴 = 𝑎 | 𝐿 ) ⋅⋅ ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 [ 𝜆 𝐶 ( 𝑠 | 𝑇 > 𝑠, 𝑚 𝑠 , 𝑙 𝑠 , 𝑙 , 𝑎 ) ] o ET AL Note that the second equaling results from the fact that censoring is non-informative at any time 𝑡 , given the history up to thattime. Now denote 𝑡 − = 𝑡 − 𝛿𝑡 where 𝛿𝑡 > is a small positive quantity such that ⌊ 𝑡 ⌋ < 𝑡 − < 𝑡 , one then has: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 [ 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) ⋅⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) |||| 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅⋅ E [ 𝐼 ( 𝑇 ≥ 𝑡 ) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 ) |||| 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ Pr [ 𝐶 ≥ 𝑡 | 𝑇 ≥ 𝑡, 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] ⋅⋅ Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ [1 − 𝜆 𝐶 ( 𝑡 | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] ⋅⋅ Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 )= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅ 𝑊 ∗ ( ⌊ 𝑡 ⌋ ) ⋅ [1 − 𝜆 𝐶 ( 𝑡 | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] ⋅ [1 − 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) The last two equalings result from the fact that Pr [ 𝐶 ≥ 𝑡 | 𝑇 ≥ 𝑡, 𝐶 ≥ 𝑡 − , 𝐴 = 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] = 1 − 𝜆 𝐶 ( 𝑡 | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) 𝛿𝑡 and that Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] = Pr [ 𝑇 ≥ 𝑡 | 𝑇 ≥ 𝑡 − , 𝐶 ≥ 𝑡 − , 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ] due to theassumption of conditionally non-informative censoring. This implies that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )]= 𝐸 { 𝐸 ( 𝑑𝑁 ( 𝑡 ) | 𝑇 ≥ 𝑡, 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ⋅ 𝐼 ( 𝑇 ≥ 𝑡 −) ⋅ 𝐼 ( 𝐶 ≥ 𝑡 −) ⋅⋅ ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎 ∗ , 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑡 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ∏ 𝑠 ∶0 ≤ 𝑠 ≤ ⌊ 𝑡 ⌋ Pr ( 𝑀 𝑠 | 𝐴 = 𝑎, 𝑀 ⌊ 𝑠 ⌋ −1 , 𝐿 ⌊ 𝑠 ⌋ , 𝑇 ≥ 𝑡 𝑠 , 𝐶 ≥ 𝑡 𝑠 ) ⋅ Pr ( 𝐴 = 𝑎 | 𝐿 ) ⋅⋅ 𝜆 ( 𝑡 | 𝑎, 𝑀 ⌊ 𝑡 ⌋ , 𝐿 ⌊ 𝑡 ⌋ , 𝐿 ) ∏ 𝑠 ∶0 ≤ 𝑠 ≤ 𝑡 − [ 𝜆 𝐶 ( 𝑠 | 𝑇 > 𝑠, 𝑚 𝑠 , 𝑙 𝑠 , 𝑙 , 𝑎 ) ] |||| 𝐴 = 𝑎 } Pr ( 𝑎 ) Applying the above arrangements in a backward fashion as is done in the case of no censoring, one will obtain the result that: 𝐸 [ 𝑅 ( 𝑡 ) ⋅ 𝐼 ( 𝐶 > 𝑡 ) ⋅ 𝑊 ( ⌊ 𝑡 ⌋ ) ⋅ 𝑑𝑁 ( 𝑡 )] = 𝐸 [ 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) ] ⋅ 𝐸 [ 𝑑𝑁 𝑎,𝑎 ∗ ( 𝑡 ) | 𝑅 𝑎,𝑎 ∗ ( 𝑡 ) = 1 ]]