[PDF] Joint modelling of longitudinal and multi-state processes: application to clinical progressions in prostate cancer

Abstract

Joint modelling of longitudinal and survival data is increasingly used in clinical trials on cancer. In prostate cancer for example, these models permit to account for the link between longitudinal measures of prostate-specific antigen (PSA) and the time of clinical recurrence when studying the risk of relapse. In practice, multiple types of relapse may occur successively. Distinguishing these transitions between health states would allow to evaluate, for example, how PSA trajectory and classical covariates impact the risk of dying after a distant recurrence post-radiotherapy, or to predict the risk of one specific type of clinical recurrence post-radiotherapy, from the PSA history. In this context, we present a joint model for a longitudinal process and a multi-state process which is divided into two sub-models: a linear mixed sub-model for longitudinal data, and a multi-state sub-model with proportional hazards for transition times, both linked by shared random effects. Parameters of this joint multi-state model are estimated within the maximum likelihood framework using an EM algorithm coupled to a quasi-Newton algorithm in case of slow convergence. It is implemented under R, by combining and extending the mstate and JM packages. The estimation program is validated by simulations and applied on pooled data from two cohorts of men with localized prostate cancer and treated by radiotherapy. Thanks to the classical covariates available at baseline and the PSA measurements collected repeatedly during the follow-up, we are able to assess the biomarker's trajectory, define the risks of transitions between health states, and quantify the impact of the PSA dynamics on each transition intensity.

Full PDF

JJoint modelling of longitudinal and multi-state processes:application to clinical progressions in prostate cancer

Lo¨ıc Ferrer , ∗ , Virginie Rondeau , James J. Dignam , Tom Pickles ,H´el`ene Jacqmin-Gadda and C´ecile Proust-Lima June 2015 INSERM, U897, ISPED, Universit´e de Bordeaux, F-33000 Bordeaux, France Department of Public Health Sciences, University of Chicago, and NRG Oncology, U.S.A. Department of Radiation Oncology, University of British Columbia, Canada ∗ E-mail: [email protected]

Abstract:

Joint modelling of longitudinal and survival data is increasingly used in clinical trials oncancer. In prostate cancer for example, these models permit to account for the link between longitudinalmeasures of prostate-speciﬁc antigen (PSA) and the time of clinical recurrence when studying the risk ofrelapse. In practice, multiple types of relapse may occur successively. Distinguishing these transitionsbetween health states would allow to evaluate, for example, how PSA trajectory and classical covariatesimpact the risk of dying after a distant recurrence post-radiotherapy, or to predict the risk of one spe-ciﬁc type of clinical recurrence post-radiotherapy, from the PSA history. In this context, we present ajoint model for a longitudinal process and a multi-state process which is divided into two sub-models: alinear mixed sub-model for longitudinal data, and a multi-state sub-model with proportional hazards fortransition times, both linked by shared random e ﬀ ects. Parameters of this joint multi-state model are es-timated within the maximum likelihood framework using an EM algorithm coupled to a quasi-Newtonalgorithm in case of slow convergence. It is implemented under R, by combining and extending the mstate and JM packages. The estimation program is validated by simulations and applied on pooleddata from two cohorts of men with localized prostate cancer and treated by radiotherapy. Thanks tothe classical covariates available at baseline and the PSA measurements collected repeatedly during thefollow-up, we are able to assess the biomarker’s trajectory, deﬁne the risks of transitions between healthstates, and quantify the impact of the PSA dynamics on each transition intensity. Keywords:

Joint modelling; Longitudinal process; Multi-state process; Prostate cancer; R; Shared ran-dom e ﬀ ects. 1 a r X i v : . [ s t a t . A P ] J un In longitudinal health studies, marker data are usually collected at repeated measurement times until theoccurrence of an event such as disease relapse or death, with the perspective to study the link betweenthese two correlated processes or use the information brought by the marker’s dynamics to explain orpredict the time to event. In such analyses, the repeated measurements of the marker should not beconsidered as a standard time-dependent covariate in a survival model (Andersen and Gill, 1982; Fisherand Lin, 1999) because the marker is an internal outcome measured with error and at discrete timeswhereas the Cox model assumes to know the exact values of the explanatory variables for all individualsat risk at each event time. To counteract these weaknesses, the two processes can be modelled jointly(Faucett and Thomas, 1996; Wulfsohn and Tsiatis, 1997). The principle is to deﬁne two sub-models(one mixed sub-model for the longitudinal data and one survival sub-model for the time-to-event data)and link them using a common latent structure. The shared random e ﬀ ect models, notably developed byTsiatis and Davidian (2004), are the most popular joint models. They usually assume that a function ofthe random e ﬀ ects from the linear mixed model is included as covariates in the survival model. Thus,longitudinal and survival processes share the same random e ﬀ ects, and the e ﬀ ect of the dynamics of themarker on the risk of event can be deduced.In prostate cancer, the joint modelling method is very useful. The prostate-speciﬁc antigen (PSA),which is a protein secreted by the prostate, is found to be over-expressed in the presence of prostate can-cer. This blood-based longitudinal tumour marker is commonly used by clinicians to monitor patientswith localized prostate cancer following treatment (radiation therapy or surgery) in order to detect sub-clinical presence of disease. Proust-Lima et al. (2008) , Taylor et al. (2005) and Yu et al. (2008) showed,through various types of joint models, that the dynamics of this biomarker, along with the pre-treatmentPSA level and other factors measuring the aggressiveness of cancer cells and the extent of the tumour,were risk factors for progression and permitted one to dynamically predict (i.e. using PSA to adaptprediction over time) the risk of clinical relapse.In practice, a patient may experience a succession of clinical progression events with for examplea local recurrence, followed by a distant metastatic recurrence and then death. So, instead of the oc-currence of a single clinical event, the progression of prostate cancer should be deﬁned as a multi-stateprocess with a focus on the transitions between clinical states and the impact of the biomarker dynamicson it. This is essential to understand and predict accurately the course of the disease, and it is of partic-ular relevance for the clinicians that need to distinguish the di ﬀ erent types of events in order to properlyadapt the treatment.Some authors already extended the classical framework of joint modelling to multiple time-to-eventdata. Chi and Ibrahim (2006) proposed a joint model for multivariate longitudinal data and multivariatesurvival data. Liu and Huang (2009) and Kim et al. (2012) looked into the simultaneous study of threecorrelated outcomes: longitudinal data, times of recurrent events and time of terminal event. Elasho ﬀ et al. (2008) and Rizopoulos (2012) extended the joint model to competing risks survival data, whichallows to characterize the cause of survival event. Dantan et al. (2011) presented a joint model withlatent state for longitudinal data and time-to-illness and time-to-death data. Recently, Andrinopoulouet al. (2014) studied simultaneously two longitudinal markers and competing events. However, the jointstudy of longitudinal and multi-state data has never been proposed and implemented. Thus, we introducea joint model with shared random e ﬀ ects for repeated measurements of a longitudinal marker and timesof transitions between multiple states. It consists in a linear mixed model and a multi-state model withtransition-speciﬁc proportional intensities, both linked by shared random e ﬀ ects.The computational aspect is the main obstacle in the development of joint models with shared ran-dom e ﬀ ects. As explained by Gould et al. (2014), the R package JM , developed by Rizopoulos (2010),has enabled many advances in the use of joint modelling, particularly through e ﬃ cient numerical inte-grations. On the other hand, the R package mstate , developed by De Wreede et al. (2010), providesestimation of multi-state models. In the present work, we combine and adapt these two packages inorder to estimate joint multi-state models. Thus, the implementation is easy and e ﬀ ective. Through theadaptation of the jointModel() function of the JM package, our approach uses the maximum likelihoodapproach, which is performed using the EM algorithm coupled to a quasi-Newton algorithm in case ofslow convergence. The software advantage is that it keeps the features, syntax and outputs of JM .The paper is organized as follows. Section 2 presents the joint model for longitudinal and multi-state processes. The estimation and implementation procedures are detailed in Section 3. The jointmulti-state model has been validated by simulations, as detailed in Section 4. We apply our model on adetailed example of two cohorts of men with prostate cancer in Section 5. A brief discussion is given inSection 6. For each individual i , a longitudinal process and a multi-state process are observed. Let { E i ( t ) , t ≥ } bethe multi-state process where E i ( t ) denotes the occupied state by subject i at time t and takes values in theﬁnite state space S = { , , . . . , M } . It is assumed that the multi-state process is continuous and observedbetween the left truncation time T i and the right censoring time C i , so that the observed process is E i = { E i ( t ) , T i ≤ t ≤ C i } . We further consider that E i is a non-homogeneous Markov process. TheMarkov property ensures that the future of the process depends only on the present state and not thepast state, i.e. Pr ( E i ( t + u ) = k | E i ( t ) = h , { E i ( s ) , s < t } ) = Pr ( E i ( t + u ) = k | E i ( t ) = h ) , ∀ h , k ∈ S , ∀ u ≥ T i impactsthe future evolution of the process. Let us consider T i = (cid:0) T i , T i , . . . , T im i (cid:1) (cid:62) the vector of the m i ≥ i , with T ir < T i ( r + , ∀ r ∈ { , . . . , m i − } , and where (cid:62) denotes thetranspose vector. If the last observed state for subject i ( E i ( T im i )) is absorbing, that is it is impossibleto leave it once entered (typically death), we observe m i direct transitions. Otherwise, T im i equals C i the right censoring time and we observe m i − δ i = (cid:0) δ i , . . . , δ im i (cid:1) (cid:62) thevector of observed transition indicators, with δ i ( r + equals 1 if a transition is observed at time T i ( r + (i.e. E i ( T ir ) (cid:44) E i ( T i ( r + )) and 0 otherwise, ∀ r ∈ { , . . . , m i − } . For each patient i , we also observe Y i = (cid:0) Y i , . . . , Y in i (cid:1) (cid:62) the vector of n i measures of the marker collected at times t i , . . . , t in i , with t in i ≤ T im i . The joint multi-state model is decomposed into two sub-models : a linear mixed sub-model for thelongitudinal data (repeated measurements of the biomarker) and a multi-state model with proportionalhazards for the event history data (transition and censoring times), both linked by shared random e ﬀ ects. To model the trajectory of the longitudinal marker, we use a linear mixed model. Under Gaussian as-sumptions, we assume that Y i j the observed measure of the marker at time point t i j is a noisy measure ofthe true measure Y ∗ i ( t i j ). This non-observed measure Y ∗ i ( t i j ) is explained according to time and covari-ates, at the population level, with ﬁxed e ﬀ ects β and at the individual level, with random e ﬀ ects b i thattake into account the correlation between repeated measures of the same individual: Y i j = Y ∗ i ( t i j ) + (cid:15) i j = X Li ( t i j ) (cid:62) β + Z i ( t i j ) (cid:62) b i + (cid:15) i j , (1)with X Li ( t i j ) and Z i ( t i j ) the vectors of possibly time-dependent covariates associated respectively withthe p -vector of ﬁxed e ﬀ ects β and the q -vector of random e ﬀ ects b i , b i ∼ N (0 , D ). Note that (cid:15) i = (cid:0) (cid:15) i , . . . , (cid:15) in i (cid:1) (cid:62) ∼ N (0 , σ I n i ) where I is the identity matrix; (cid:15) i and b i are independent. To model the transition times, we use a Markov multi-state model with proportional hazards that takesinto account the marker’s dynamics through the shared random e ﬀ ects b i . Thus, for the transition fromstate h ∈ S to state k ∈ S , the transition intensity at time t takes the form: λ ihk ( t | b i ) = lim d t → Pr( E i ( t + d t ) = k | E i ( t ) = h ; b i )d t = λ hk , ( t ) exp( X S (cid:62) hk , i γ hk + W hk , i ( b i , t ) (cid:62) η hk ) , (2)with λ hk , ( . ) the parametric baseline intensity (Weibull, piecewise constant or B-splines for example) and X Shk , i the vector of prognostic factors associated with the r -vector of coe ﬃ cients γ hk . The multivariatefunction W hk , i ( b i , t ) deﬁnes the dependence structure between the longitudinal and multi-state processes.We can choose W hk , i ( b i , t ) = Y ∗ i ( t ) (the true current level of the marker), or W hk , i ( b i , t ) = ∂ Y ∗ i ( t ) /∂ t (thetrue current slope), W hk , i ( b i , t ) = (cid:16) Y ∗ i ( t ) , ∂ Y ∗ i ( t ) /∂ t (cid:17) (cid:62) (both), or any other function of the random e ﬀ ectsin the context under study. Thus, the s -vector of coe ﬃ cients η hk quantiﬁes the impact of the longitudinalmarker’s dynamics on the transition intensity between states h and k . The parameters of this joint model are estimated in the maximum likelihood framework. Since thelongitudinal and multi-state processes are independent conditionally on the random e ﬀ ects, the completeobserved likelihood is obtained through the product of the individual contributions to the likelihood forthe N individuals as: L ( θ ) = N (cid:89) i = (cid:90) R q f Y ( Y i | b i ; θ ) f E ( E i | b i ; θ ) f b ( b i ; θ ) d b i , (3)where θ is the vector of all the parameters contained in (1) and (2), and f ( . ) is a probability densityfunction.In the longitudinal sub-part, described by the linear mixed model (1), the conditional longitudinaloutcomes are such that: f Y ( Y i | b i ; θ ) = (cid:0) πσ (cid:1) n i / exp  − (cid:107) Y i − X L (cid:62) i β − Z (cid:62) i b i (cid:107) σ  , (4)where || x || denotes the Euclidean norm of vector x , X Li is the matrix of covariates with row vectors X Li ( t i j ) T , j = , . . . , n i , and likewise Z i = { Z i ( t i j ) } .For the multi-state side, let P ihk ( s , t ) be the transition probability from state h to state k between thetimes s and t for individual i , i.e. P ihk ( s , t ) = Pr( E i ( t ) = k | E i ( s ) = h ). For each r ∈ { , . . . , m i − } ,the continuity and Markov assumptions imply that the individual i remains in state E i ( T ir ) between thetimes T ir and T i ( r + with probability P iE i ( T ir ) , E i ( T ir ) ( T ir , T i ( r + | b i ), and if T i ( r + is an observed transitiontime, he transits to state E i ( T i ( r + ) with intensity λ iE i ( T ir ) , E i ( T i ( r + ) ( T i ( r + | b i ). By conditioning on E i ( T i ),this translates in the individual contribution to the likelihood: f E ( E i | b i ; θ ) = m i − (cid:89) r = (cid:20) P iE i ( T ir ) , E i ( T ir ) ( T ir , T i ( r + | b i ) λ iE i ( T ir ) , E i ( T i ( r + ) ( T i ( r + | b i ) δ i ( r + (cid:21) = m i − (cid:89) r = (cid:20) exp (cid:32)(cid:90) T i ( r + T ir λ iE i ( T ir ) , E i ( T ir ) ( u | b i ) d u (cid:33) λ iE i ( T ir ) , E i ( T i ( r + ) ( T i ( r + | b i ) δ i ( r + (cid:21) (5)with λ ihh ( t ) = − (cid:80) k , k (cid:44) h λ ihk ( t ). The possible delayed entry is here ignored by conditioning on E i ( T i ).Linking the longitudinal and multi-state sub-parts, the random e ﬀ ects b i follow a multivariate Gaus-sian distribution such that: f b ( b i ; θ ) = π ) q / det( D ) / exp  − b (cid:62) i D − b i  . (6) The joint multi-state model has been implemented under R, via the combination of two well-knownpackages: mstate for the multi-state models and JM for the joint models with shared random e ﬀ ects.To ﬁt semi-parametric Markov multi-state models, mstate consists ﬁrst of preparing the database formulti-state analysis, more speciﬁcally by deﬁning each patient’s history as a series of rows, one for eachtransition at risk for each individual (in contrast with only one data record (row) per individual in aclassical survival analysis). By stratifying on the transition type, the standard coxph() function of theR package survival can be used to ﬁt transition-speciﬁc Cox models. With standard longitudinal andtime-to-event data, the JM package initialises the values of parameters from the function lme() ( nlme package) for the longitudinal sub-part and coxph() ( survival package) for the survival sub-part. Then,the function jointModel() carries out the estimation procedure.So by replacing the standard call to coxph() by the call to coxph() on prepared data by mstate , anextended jointModel() function, called JMstateModel() , carries out the estimation procedure. Theimplementation procedure is thus distinguished in four steps: • lme() function to initialise the parameters in the longitudinal sub-model; • mstate package to adapt the data to the multi-state framework; • coxph() function, on the prepared data, to initialise the parameters in the multi-state sub-model; • JMstateModel() function to estimate all the parameters of the joint multi-state model.A detailed example is given in Appendix C, and full detailed examples are available on https://github.com/LoicFerrer/JMstateModel/ . The

JMstateModel() function computes and maximises the likelihood adapted to the multi-state datausing integration and optimisation algorithms available in the JM package. Thus, the procedure combinesan EM algorithm coupled to a quasi-Newton algorithm if the convergence is not achieved. Furthermore,the integral with respect to time in (5) and the integral with respect to the random e ﬀ ects in (3) do nothave an analytical solution. These integrals are approached by numerical integration. The integralsover time are approximated using Gauss-Kronrod quadratures, and the integrals over the random e ﬀ ectsusing pseudo-adaptative Gauss-Hermite quadratures. Inference is provided by asymptotic properties formaximum likelihood estimators; the variance-covariance matrix of the parameter estimates is based onthe inverse of the Hessian matrix. More details on the optimisation procedure, the EM algorithm and thenumerical integrations are available in Rizopoulos (2012). The simulation study aimed to validate the estimation program described previously.

For each subject i = , . . . ,  Y i j = Y ∗ i ( t i j ) + (cid:15) i j = ( β + β , X X i + b i ) + ( β + β , X X i + b i ) × t i j + (cid:15) i j ,λ ihk ( t | b i ) = λ hk , ( t ) exp (cid:16) γ hk X i + η hk , level Y ∗ i ( t ) + η hk , slope ∂ Y ∗ i ( t ) /∂ t (cid:17) , (7)where the multi-state process that included three states ( h , k ∈ { , , } ) and three transitions is describedin Figure 1. [Figure 1 about here.]First, X i and b i were generated according to normal distributions with respectively mean 2 .

04 andvariance 0 .

5, and mean vector   and variance-covariance matrix  . − . − .

04 0 .  . The times of mea-surements were generated such as t i j = , . , . , . . . , .

33, and (cid:15) i j was generated from a stan-dard normal distribution. The log baseline intensities were linear combinations of cubic B-splineswith the same knot vector (0 . , . , . , . , . (cid:62) for the three transitions, and the vec-tors of spline coe ﬃ cients ( − . , − . , − . − . , − . , − . , − . T for the transition0 →

1, ( − . , − . , − . , − . − . , − . , − . (cid:62) for the transition 0 →

2, and( − . , − . , − . , − . − . , − . , − . (cid:62) for the transition 1 →

2. Parameters valuesand knot locations were chosen according to the application data described in Section 5.The procedure to generate the vector of observed times T i = (cid:0) T i , . . . , T im i (cid:1) (cid:62) was inspired byBeyersmann et al. (2011) and Crowther and Lambert (2013). For each individual i , the censoringtime C i was generated from an uniform distribution on [1 , T ∗ i = (cid:16) T ∗ i , , T ∗ i , , T ∗ i , (cid:17) (cid:62) was generated according to the following procedure: (1) three random numbers u i , , u i , and u i , were generated from three independent standard uniform distributions; (2) T ∗ i , and T ∗ i , were generated by solving (cid:82) T ∗ i , k λ i k ( ν k | b i ) d ν k + log( u i , k ) = , for k = , , through the Brent’sunivariate root-ﬁnding method (Brent, 1973); (3) then, the true transition time T ∗ i , was generated via (cid:82) T ∗ i , T ∗ i , λ i ( ν | b i ) d ν + log( u i , ) =

0. Finally, by comparing T ∗ i and C i , the vector T i , which characterizesthe multi-state process, was deduced.The longitudinal measurements, generated from the linear mixed sub-model, were truncated at T i the ﬁrst observed time of the multi-state process. The estimation model was the model deﬁned in (7) with b i ∼ N   ,  D D D D  and (cid:15) i j ∼ N (0 , σ ).The log baseline intensities were approximated by a linear combination of cubic-splines with threeinternal knots placed at the quantiles of the observed times.0 The simulations results were obtained through 500 replicates of 1500 individuals. Each joint multi-state model was estimated using 3, 9 and 15 pseudo-adaptative Gauss-Hermite quadrature points. Thesimulation results are presented in Table 4.These results were very satisfying with unbiased estimates and correct 95% coverage rates. Theyshowed however the need to use a certain number of Gauss-Hermite quadrature points to approximate theintegral over the random e ﬀ ects. Indeed, the use of 3 pseudo-adaptative Gauss-Hermite quadrature pointsinduced a bias in the estimation of the parameters associated with time e ﬀ ect in the longitudinal sub-part.This error was fully corrected with 9 points. Overall, these results conﬁrmed the good performances ofthe implemented procedure. Complementary simulation results based on 500 and 1000 subjects areprovided in Appendix A. We analysed data from patients with localized prostate cancer and treated by external beam radiotherapy.The analysis aimed to explore the link between PSA dynamics and transition intensities between clinicalstates, as well as to describe PSA repeated measurements and times of transitions between health states.

Our study focuses on 1474 men with a clinically localized prostate cancer and treated by external beamradiotherapy (EBRT): 629 patients come from the multi-center clinical trial RTOG 9406 (RadiationTherapy Oncology Group, USA) in which data collection has been conducted from 1994 to 2013, and845 patients come from the cohort of the British Columbia Cancer Agency (BCCA) in Vancouver,Canada, and have been examined between 1994 and 2012 (Table 2). During his follow-up, a patient canpossibly go through several states deﬁned as local recurrence, distant recurrence, initiation of hormonaltherapy (HT) and death, due or not to prostate cancer. The initiation of salvage hormonal therapy,which is an additional treatment prompted by physician observed signs in PSA or clinical signs, isdesigned to prevent growth of potentially present subclinical cancer. This intervention is not plannedat diagnosis or initiated by any precise rule, but is rather based on a mutual agreement between the1clinician and his patient. Thus, it is treated as a disease state transition representing failure of the initialtreatment to satisfactorily control the disease. Furthermore, as recommended in the article by Proust-Lima et al. (2008), we only considered the local relapses which took place three years or later afterradiation, or within three years of EBRT when the last PSA value was > / ml. PSA data werecollected at regular visits, for a median number of 10 PSA measurements per patient. Note that thisnumber includes only PSA data recorded between the end of EBRT and before the ﬁrst event (ﬁrstclinical recurrence, hormonal therapy, death or censorship). Subjects with only one PSA measure wereexcluded, and subjects who had an event in the ﬁrst year after EBRT were excluded to prevent inclusionof patients who had substantial residual initial tumors. As shown in Table 2, three baseline factorswere considered: the pre-therapy level of PSA in the log scale (iPSA), the T-stage category whichcharacterizes the tumour size (3 categories were considered: 2, 3–4 versus 1 in reference), and theGleason score category which measures the aggressiveness of cancer cells (3 categories: 7, 8–10 versus2–6 in reference). In the models, a cohort covariate was also considered coded as 1 for RTOG 9406 and − ﬀ erentintensities (see “Hormonal Therapy” and “Censorship” for example).[Figure 2 about here.]The multi-state data are depicted through the transitions between the 5 states and the correspondingamount of observed direct transitions in Figure 3.[Figure 3 about here.]From the end of EBRT (state 0), a patient can experience either a transition to a localized recurrence(state 1), an hormonal therapy (state 2), a distant recurrence (state 3) or death (absorbing state 4). After2a localized recurrence (state 1), a patient may experience either a distant recurrence (state 3) or die (state4) or initiate a HT (state 2). After initiation of HT, a patient may only experience a distant recurrenceor die, and ﬁnally, after a distant recurrence, a patient may only die. In total, 144 subjects had a localrecurrence; 317 men initiated an hormonal therapy including 90 after a local recurrence; 90 men hada distant recurrence including 10 directly after a local recurrence and 33 after a HT initiation. In total,802 patients died including 523 who did not have another recorded progression of the cancer before.Among the 672 men who were censored during the follow-up, 533 were censored before experiencingany clinical progression. The joint multi-state model being a complex model, a step-by-step procedure was carried out to specifythe joint model. The speciﬁcations of the longitudinal and multi-state sub-models were based on twoseparate analyses, that is assuming independence between the two processes. Covariate selection wasmade using uni- or multivariate Wald tests.

The biphasic shape of log-PSA was described in a linear mixed model with two functions of time accord-ing to previous works (Proust-Lima et al., 2008): f ( t ) = (1 + t ) α − f ( t ) = ( t ) + ν / (1 + t ) ν , where α and ν were estimated by proﬁle likelihood ( α = − . , ν = Y i j = log (cid:16) PSA i ( t i j ) + . (cid:17) the log-measure of PSA for the individual i at time t i j –the natural logarithm transformation is performed to obtain a Gaussian shape for the longitudinalresponse– the linear mixed sub-model took the form: Y i j = Y ∗ i ( t i j ) + (cid:15) i j = (cid:16) β + X L (cid:62) i β , cov + b i (cid:17) + (cid:16) β + X L (cid:62) i β , cov + b i (cid:17) × f ( t i j ) + (cid:16) β + X L (cid:62) i β , cov + b i (cid:17) × f ( t i j ) + (cid:15) i j , b i = ( b i , b i , b i ) (cid:62) ∼ N (0 , D ), D unstructured, and (cid:15) i = (cid:0) (cid:15) i , . . . , (cid:15) in i (cid:1) (cid:62) ∼ N (0 , σ I n i ). The covari-ates X L i , X L i and X L i were sub-vectors of the baseline prognostic factors obtained using a backwardstepwise procedure. For the sake of brevity, we will speak about PSA dynamics and biomarker’s cur-rent level / slope when referring actually to respectively the dynamics of log(PSA + .

1) and the currentlevel / slope of Y ∗ i ( t ). In the multi-state sub-part, the determination of prognostic factors and proportionality between baselineintensities was also made by considering no link between the two processes ( η =

0) and unspeciﬁed base-line intensities (i.e. using a standard semi-parametric multi-state model). The full sub-model consideredtransition-speciﬁc baseline intensities and transition-speciﬁc e ﬀ ects of baseline prognostic factors. Toreduce the excessive number of parameters to estimate, a ﬁrst step was to assume proportional baselineintensities for some transitions. Clinically, it made sense to consider proportional baseline intensities fortransitions leading to local recurrence or hormonal therapy: λ , ( t ) = exp( ζ ) λ , ( t ) = exp( ζ ) λ , ( t );and for the transitions leading to distant recurrence: λ , ( t ) = exp( ζ ) λ , ( t ) = exp( ζ ) λ , ( t ). Theseassumptions were conﬁrmed by the data. We could not make the same assumption for all transitionsleading to death because the proportional hazards assumption was not veriﬁed. Instead, we chose λ , ( t ) = exp( ζ ) λ , ( t ) and λ , ( t ) was stratiﬁed on the cohort. This procedure reduced the num-ber of baseline intensities to six. A second step consisted in selecting the prognostic factors. Factorswith an associated p-value > . ﬀ ects on several transitionswere considered using multivariate Wald tests. For example, the baseline T-stage category had the samee ﬀ ect on transition intensities 0 →

1, 0 → →

3. Finally, covariates with p-value < . In the joint model, log baseline intensities approximated by linear combinations of cubic B-splines withthree internal knots replaced the unspeciﬁed ones. The dependence function W hk , i ( b i , t ) was the samefor all the transitions h → k and was determined using Wald tests. It resulted that the combination ofthe true current level and the true current slope of the biomarker ﬁtted at best the relationship between4PSA dynamics and the instantaneous risk to transit between health states. The ﬁnal step was to select theprognostic factors and the log-coe ﬃ cients of proportionality between baseline intensities with p-value < . λ ihk ( t | b i ) = λ hk , ( t ) exp  X S (cid:62) hk , i γ hk +  Y ∗ i ( t ) ∂ Y ∗ i ( t ) /∂ t  (cid:62)  η hk , level η hk , slope  , where the relations between λ hk , ( t ) and the ﬁnal X Shk , i , for h , k ∈ { , . . . , } are indicated in Section 5.2.2and in Table 3. Note that the covariates that were removed of the joint model speciﬁcation are not inTable 3. The parameter estimates of the joint multi-state model are presented in Table 3. These parameterswere those selected according to the procedure described previously. The parameters of the baselineintensities are not shown here for clarity. [Table 2 about here.]The estimated regression parameters in the longitudinal sub-part conﬁrmed that pre-treatment PSA levelwas associated with the initial PSA level and the biphasic PSA trajectory, T-stage value was associatedboth with the short term and the long term dynamics while Gleason score was only associated with thelong term trajectory. Higher values of these covariates measured at baseline corresponded to higher longterm PSA levels. The cohort e ﬀ ect indicated a signiﬁcant di ﬀ erence between the two cohorts only forthe long term PSA evolution, with a greater long term increase of PSA in Vancouver.For the multi-state process, the prognostic factors showed that an advanced initial stage was notalways associated with the intensities of transitions between health states after adjustment for the PSAdynamics. In particular, the Gleason score had signiﬁcant e ﬀ ects on only two transition intensities.Moreover, we found that having a high PSA value at baseline was signiﬁcantly associated with a higherinstantaneous risk to directly experience hormonal therapy initiation or death after EBRT, but reduced theintensities of transitions leading to distant recurrence or death after a previous event. A poor (i.e. higher)T-stage category at baseline had globally a deleterious e ﬀ ect on the clinical endpoints. For the transitions5leading to distant recurrence after EBRT or after hormonal therapy, a patient with a Gleason score of 7at baseline had a 2 . = exp(0 . = <

7. The cohort was signiﬁcantly associated with the intensities of transitions leading todeath after clinical recurrence or hormonal therapy –and the direct transition leading to local recurrenceafter EBRT. The instantaneous risk to experience these transitions was higher in BCCA. For the directtransitions leading to distant recurrence after local recurrence or hormonal therapy, the cohort e ﬀ ect wasalso signiﬁcant, with higher intensities in RTOG 9406.Regarding the association parameters between PSA dynamics (current level and current slope) andclinical progressions, there were highly signiﬁcant deleterious e ﬀ ects of the PSA dynamics from theinitial state on the intensities of transitions leading to all the types of progression (local recurrence, hor-monal therapy or distant recurrence). For example, after adjustments for covariates and for the true slopeof the biomarker, an increase of one unit of the true biomarker’s level (log PSA without error measure-ment) induced a 1 . = exp(0 . = / and a strong increase of PSA leads to higher hazard to experience a clinicalrecurrence or an additional therapy. In contrast, for the direct transition leading to death after radiother-apy, we found a deleterious e ﬀ ect of the current slope and a protective e ﬀ ect of the current level of thebiomarker: at a given moment in the initial state, for two patients with the same baseline characteristicsand the same slope of log PSA, the one with higher PSA value will be less likely to directly die. Inthis studied population, an important cause of death is induced by comorbidities, most of death fromprostate cancer experienced a documented disease progression before. From the local recurrence, therewas large deleterious e ﬀ ect of the current slope of the biomarker for the intensities of transitions leadingto the hormonal therapy or the distant recurrence, and there was a borderline signiﬁcant protective e ﬀ ectof the current level for the intensity of transition leading to the distant recurrence. From the hormonaltherapy or the distant recurrence, there was no signiﬁcant e ﬀ ect of the PSA dynamics on the hazard tochange state. This was also clinically sensible, as it reﬂects that progression in these advanced stages isno more linked to PSA increase. In practice, criteria other than PSA are considered speciﬁcally in thisphase of the disease, such as the PCWG2 criteria (Scher et al., 2008). Moreover, deaths in patients withhormonal therapy might be explained by cardiac toxicity due to HT.6 The parameter estimates of the joint multi-state model were validated by several graphical tools pre-sented in Figure 4. For the longitudinal sub-model, the plotted standardized conditional residuals versusﬁtted values of the biomarker conﬁrmed the homoscedasticity of the conditional errors. Subject-speciﬁcpredictions were also compared to observations by plotting the average values by time intervals basedon the deciles of the observation times. 95% conﬁdence intervals of the observed values were added andconﬁrmed the very good ﬁt of the model concerning the longitudinal data. For the multi-state sub-part,we focused on P (0 , t ) = { P hk (0 , t ) } the matrix of transition probabilities between the times 0 and t . Wecompared our parametric estimator (obtained through the average of the predicted individual transitionprobabilities from the joint multi-state model) to the Aalen-Johansen estimator (non-parametric estima-tor of the transition probabilities), both using product integrals. This comparison is fully discussed anddetailed in Appendix B. These comparisons showed the overall good performances of the joint multi-state model in terms of ﬁt of transition probabilities, with the exception for the transition 1 → The joint model for the longitudinal biomarker PSA and multi-state clinical progression data providesa full model of prostate cancer progression which takes into account both classical prognostic factorsand the dynamics of the PSA, in order to study factors that inﬂuence the transition intensities betweenclinical health states. The implementation is easy and relies on the mstate and JM packages. The multi-state data are prepared through the mstate package, and a slightly modiﬁed jointModel() functioncarries out the estimation procedure. The estimation program has been validated by simulations, withvery good performances. It underlined however some bias in the estimates when too few quadraturepoints (3) were used. In the absence of speciﬁc validation, we thus recommend at least 9 Gauss-Hermitequadrature points in the pseudo-adaptative numerical integration to approximate at best the integral overthe random e ﬀ ects. Goodness-of-ﬁt of the model can be assessed using diagnostic graphical tools whosemethodology was detailed in Appendix B.7The application conﬁrmed that the PSA dynamics strongly impacted the instantaneous risk to ex-perience a clinical recurrence or hormonal therapy after the end of radiotherapy. Moreover, the currentslope of the biomarker had highly signiﬁcant deleterious e ﬀ ect on the hazard to transit from local recur-rence to hormonal therapy or distant recurrence. Conversely, extrapolating the biomarker’s dynamics didnot impact anymore the transition intensities from the hormonal therapy or the distant recurrence states.This expresses that in the advanced cancers, the PSA –and especially the collected measures prior theﬁrst event– is no more of importance. In these situations, other criteria have to be monitored.Previous works in prostate cancer had found a strong association between slope of log-PSA and anyclinical recurrence (see S`ene et al. (2014); Taylor et al. (2013)), by considering all the recurrences in acomposite event and the hormonal therapy as a time-dependent covariate. The limit of these approacheswas that in practice the type of progression is of major importance as the care greatly depends on thetype of risk the patient has. The joint multi-state model formalizes this need. In the same way as itwas done with a single event (see Proust-Lima and Taylor (2009); Rizopoulos (2011)), individualizeddynamic predictions of each type of progression could be derived from this model in order to preciselyquantify the risk of each type of progression according to the PSA history. For example, the cumulativeprobability for subject i to reach the state k between the times s and t , s ≤ t , given he was in state h at time s , could be expressed as: F ihk ( s , t ) = (cid:82) ts Pr( E i ( u ) = k | E i ( s ) = h , Y ( s ) i , X L ( s ) i , X Si ) d u , with Y ( s ) i thehistory (i.e. collected measures) of the marker up to time s , X L ( s ) i the history of the longitudinal sub-partcovariates until time s , and X Si = { X Shk , i } the matrix containing the prognostic factors for all the statetransitions.In this article, we assumed a continuous and Markov multi-state process. Depending on the topic, asemi-Markov process, which considers the spent time in the current state, could be deﬁned as well. Indementia for example, the multi-state process might consider three states: healthy, demented and dead,and it would be necessary to consider the time spent in the demented state before death (Commengeset al., 2007). The joint multi-state model and its associated implemented function handle for semi-Markov.In summary, we introduce here a ﬁrst joint model for longitudinal and multi-state clinical progressiondata. We showed that this model can easily be implemented under R and can be applied in practicethrough an example, the prostate cancer progression, which is one of many biomedical areas in which8such data are collected. This model that captures the complete information about the progression opensto much more precise knowledge of diseases and speciﬁc dynamic predictions. Appendix A: Simulation results

In Section 4 in the article, the simulations were performed on 1500 subjects and showed that 9 Gauss-Hermite quadrature points should be used in this case. A complementary simulation study was basedon the ﬁrst 500 and 1000 individuals of each of the 500 replicates and used 9 Gauss-Hermite quadraturepoints. With 500 and 1000 subjects, we respectively observed 167 and 333 transitions 0 →

1, 103 and205 transitions 0 →

2, 96 and 191 transitions 1 →

2. The results are presented in Table 4.[Table 3 about here.]Overall, these results conﬁrmed the good performances of the implemented function. The coveragerates were close to 95% and the relative bias were low, except for γ , X and η , slope , certainly due tothe required accuracy. Appendix B: Parametric versus non-parametric transition probabilities

To assess the results obtained in our application (Section 5 in the article), we compared the parametricestimator of the transition probabilities to a non-parametric estimator. The following results are basedon the book of ? . Preliminaries

Consider the multi-state process E = { E ( t ) , t ≥ } with values in the ﬁnite space S = { , , . . . , M } andwhere E ( t ) denotes the state occupied by an individual at time t . We assume that E is a non-homogeneousMarkov process, with left truncation and right censoring. In the following, we will consider that all theintroduced multi-state processes guarantee the above properties. The intensity of transition from state h ∈ S to state k ∈ S at time t is deﬁned as λ hk ( t ) = lim d t → Pr( E ( t + d t ) = k | E ( t ) = h )d t and we write λ ( s , t ) = { λ hk ( s , t ) } the ( M + × ( M +

1) matrix of transition intensities. The matrix of cumulative9transition intensities is noted Λ ( t ) , composed of non-diagonal elements Λ hk ( t ) = (cid:82) t λ hk ( u ) d u , ∀ h (cid:44) k , and diagonal elements Λ hh ( t ) = − (cid:80) k (cid:44) h Λ hk ( t ). Let us consider the transition probability P hk ( s , t ) = Pr( E ( t ) = k | E ( s ) = h ), with s ≤ t , which is the probability that a subject in state h at time s occupies thestate k at a later time t . We call P ( s , t ) = { P hk ( s , t ) } the matrix of transition probabilities, which satisﬁesthe Chapman-Kolmogorov equation: P ( s , t ) = P ( s , u ) P ( u , t ) , with 0 ≤ s ≤ u ≤ t . (8)Thus is deduced that P ( s , t ) is the unique solution of the Kolmogorov forward di ﬀ erential equations: P ( s , s ) = I ,∂∂ t P ( s , t ) = P ( s , t ) λ ( t ) . (9) Non-parametric estimator

Let N hk ( t ) be the number of direct observed transitions from state h to state k up to time t , and Y h ( t )the number of individuals in state h just before time t . The non-parametric estimator of the cumulativeintensities, called Λ ∗ ( t ) has elements Λ ∗ hk ( t ) estimated through the Nelson-Aalen estimator: Λ ∗ hk ( t ) = (cid:90) t d N hk ( u ) Y h ( u ) , h (cid:44) k , (10)and Λ ∗ hh ( t ) = − (cid:80) k (cid:44) h Λ ∗ hk ( t ). The solution to the Kolmogorov equations (9) permits to express the non-parametric estimate of the transition probabilities (cid:98) P ∗ ( s , t ) though the product-integral: (cid:98) P ∗ ( s , t ) = (cid:82) ( s , t ] (cid:16) I + d (cid:98) Λ ∗ ( u ) (cid:17) . (11)where (cid:98) Λ ∗ ( u ) is the non-parametric estimate of the cumulative transition intensities at time u , withd (cid:98) Λ ∗ hh ( u ) ≥ − u , and I is the ( M + × ( M +

1) identity matrix. This estimated matrix oftransition probabilities is called Aalen-Johansen estimator.Let s < T ∗ < . . . < T ∗ m ∗ ≤ t be the ordained times of observed direct transitions between s and t for0all individuals. From (11), it can be deduced: (cid:98) P ∗ ( s , t ) = m ∗ (cid:89) l = (cid:16) I + ∆ (cid:98) Λ ∗ ( T ∗ l ) (cid:17) . (12)where ∆ (cid:98) Λ ∗ ( T ∗ l ) = (cid:98) Λ ∗ ( T ∗ l ) − (cid:98) Λ ∗ ( T ∗ l − ) and ∆ (cid:98) Λ ∗ hh ( T ∗ l ) ≥ − T ∗ l .In our application, we applied: I + ∆ (cid:98) Λ ∗ ( T ∗ l ) =  − ∆ N . ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l )0 1 − ∆ N . ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l )0 0 1 − ∆ N . ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l )0 0 0 1 − ∆ N . ( T ∗ l ) Y ( T ∗ l ) ∆ N ( T ∗ l ) Y ( T ∗ l )0 0 0 0 1  , where N h . ( T ∗ l ) = (cid:80) k (cid:44) h N hk ( T ∗ l ) and ∆ N hk ( T ∗ l ) = N hk ( T ∗ l ) − N hk ( T ∗ l − ). Parametric estimator

For each patient i ∈ { , . . . , N } , the observed multi-state process is { E i ( t ) , t ≥ } , where E i ( t ) denotes thestate occupied by subject i at time t and takes values in the ﬁnite space S = { , , . . . , M } . In our jointmulti-state model, we were interested in the transitions intensities: λ ihk ( t | b i ) = lim d t → Pr( E i ( t + d t ) = k | E i ( t ) = h ; b i )d t , (13)which share the random e ﬀ ects b i with the longitudinal sub-model. Based on the Preliminaries para-graph, let Λ i ( t | b i ) = { Λ ihk ( t | b i ) } be the parametric matrix of cumulative intensities for subject i , and P ihk ( s , t | b i ) = { P ihk ( s , t | b i ) } be the parametric matrix of transition probabilities. From the individual co-variates measured at baseline X Shk , i and the parameters estimated by maximum likelihood ˆ θ , we computedfor each subject i the individual predictions of the transition intensities (cid:98) λ ihk ( t | ˆ θ ) and deduced the individ-ual predictions of the cumulative intensities (cid:98) Λ ihk ( t | ˆ θ ). To obtain the parametric estimator of the individual1transition probabilities (cid:98) P i ( s , t | ˆ θ ), we could use the relations: (cid:98) P ihh ( s , t | ˆ θ ) = exp (cid:16)(cid:98) Λ ihh ( t | ˆ θ ) − (cid:98) Λ ihh ( s | ˆ θ ) (cid:17) , (cid:98) P ihk ( s , t | ˆ θ ) = (cid:90) ts (cid:98) P ihh ( s , u | ˆ θ ) (cid:98) λ ihk ( u | ˆ θ ) (cid:98) P ikk ( u , t | ˆ θ )d u , h (cid:44) k . (14)In practice, when the state space S is large, these integrals become too complex numerically. In theapplication we therefore preferred to calculate (cid:98) P i ( s , t | ˆ θ ) through the product-integral: (cid:98) P i ( s , t | ˆ θ ) = (cid:82) ( s , t ] (cid:16) I + d (cid:98) Λ i ( u ) (cid:17) , (15)with d (cid:98) Λ ihh ( u ) ≥ − u . The matrix of transition probabilities (cid:98) P ( s , t ) was then deduced by averagingover the N individual predictions: (cid:98) P ( s , t | ˆ θ ) = N N (cid:88) i = (cid:98) P i ( s , t | ˆ θ ) . (16) Covariance estimation of the non-parametric estimator

The covariance matrix of the Aalen-Johansen estimator of transition probabilities can be estimated bythe Greenwood-type estimator: (cid:100) cov( (cid:98) P ∗ ( s , t )) = (cid:90) ts (cid:98) P ∗ ( u , t ) (cid:62) ⊗ (cid:98) P ∗ ( s , u − ) (cid:100) cov(d (cid:98) Λ ∗ ( u )) (cid:98) P ∗ ( u , t )) ⊗ (cid:98) P ∗ ( s , u − ) (cid:62) , (17)where ⊗ denotes the Kronecker product and (cid:62) denotes the vector transpose. ? simpliﬁed the computations in (17) using the recursion formula: (cid:100) cov( (cid:98) P ∗ ( s , t )) = { ( I + ∆ (cid:98) Λ ∗ ( t )) (cid:62) ⊗ I } (cid:100) cov( (cid:98) P ∗ ( s , t − )) { ( I + ∆ (cid:98) Λ ∗ ( t )) ⊗ I } + { I ⊗ (cid:98) P ∗ ( s , t − ) } (cid:100) cov( ∆ (cid:98) Λ ∗ ( t )) { I ⊗ (cid:98) P ∗ ( s , t − ) } , (18)2where (cid:100) cov( ∆ (cid:98) Λ ∗ hk ( t ) , ∆ (cid:98) Λ ∗ h (cid:48) k (cid:48) ( t )) =  ( Y h ( t ) − ∆ N h . ( t )) ∆ N h . ( t ) Y h ( t ) , for h = k = h (cid:48) = k (cid:48) , − ( Y h ( t ) − ∆ N h . ( t )) ∆ N hk (cid:48) ( t ) Y h ( t ) , for h = k = h (cid:48) (cid:44) k (cid:48) , − ( δ kk (cid:48) Y h ( t ) − ∆ N hk ( t )) ∆ N hk (cid:48) ( t ) Y h ( t ) , for h = h (cid:48) , h (cid:44) k , h (cid:44) k (cid:48) , , for h (cid:44) h (cid:48) , with δ kk (cid:48) the Kronecker delta. Note that (cid:100) cov( (cid:98) P ∗ ( s , t )) and (cid:100) cov( ∆ (cid:98) Λ ∗ ( t )) are two ( K + × ( K + covariance matrices.These results may be used to construct the 95% pointwise conﬁdence intervals for the Aalen-Johansen estimator: exp  log (cid:98) P ∗ hk ( s , t ) ± . (cid:113) (cid:99) var( (cid:98) P ∗ hk ( s , t )) (cid:98) P ∗ hk ( s , t ))  . Appendix C: Example of R Code

This example is based on the simulation study (see Section 4 in the article), with a non-homogeneousMarkov multi-state model which included three states and three transitions. The same covariate, called X in the above code, impacted the longitudinal and survival processes, the longitudinal sub-model hada random intercept and a random slope, the log baseline intensities were approximated using B-splines,and the dependence between the two processes was explained through the true current level and the truecurrent slope of the biomarker.The JMstateModel() function and several detailed examples are available on https://github.com/LoicFerrer/JMstateModel/ . X.1 + X.2 + X.3 + strata(trans),data = data_mstate,method = "breslow",x = TRUE,model = TRUE) Acknowledgements

The authors thank Paul Sargos and Pierre Richaud from the Institut Bergoni´e (Bordeaux, France) fortheir availability and their expertise in clinical interpretations. Computer time for this study was pro-vided by the computing facilities MCIA (M´esocentre de Calcul Intensif Aquitain) of the Universit´e deBordeaux and of the Universit´e de Pau et des Pays de l’Adour. This work was supported by a joint grantfrom INSERM and R´egion Aquitaine, and a grant from the Institut de Recherche en Sant´e Publique[grant AAP12CanBio16]. The RTOG trial and J. Dignam’s e ﬀ orts were supported by Public HealthService grants U10 CA21661 and U10 CA180822 from the National Cancer Institute, NIH, U.S. De-partment of Health and Human Services. Supplementary Materials

The function

JMstateModel() , which is an extension of the standard jointModel() function to themulti-state framework, is available with several examples on https://github.com/LoicFerrer/JMstateModel/ . References

Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: a large samplestudy.

The annals of statistics ﬀ re, E. (2014). Joint modeling of twolongitudinal outcomes and competing risk data. Statistics in medicine .Beyersmann, J., Allignol, A., and Schumacher, M. (2011).

Competing risks and multistate models withR . Springer.Brent, R. P. (1973).

Algorithms for minimization without derivatives . Courier Dover Publications.Chi, Y.-Y. and Ibrahim, J. G. (2006). Joint models for multivariate longitudinal and multivariate survivaldata.

Biometrics

ScandinavianJournal of Statistics

Statistics in medicine

Biostatistics

Computer methods and programsin biomedicine ﬀ , R. M., Li, G., and Li, N. (2008). A joint model for longitudinal measurements and survivaldata in the presence of multiple failure types. Biometrics

Statistics in medicine

Annual review of public health

Statistics in medicine .Kim, S., Zeng, D., Chambless, L., and Li, Y. (2012). Joint models of longitudinal data and recurrentevents with informative terminal event.

Statistics in biosciences Journal of the Royal Statistical Society: Series C (Applied Statistics)

International Journal of Radiation Oncology* Biology* Physics

International Journal of Radiation Oncology*Biology* Physics

Biostatistics

International Journal of Radiation Oncology* Biology* Physics

Journal of Statistical Software

Biometrics

Joint models for longitudinal and time-to-event data: With applications in R .CRC Press.7Scher, H. I., Halabi, S., Tannock, I., Morris, M., Sternberg, C. N., Carducci, M. A., Eisenberger, M. A.,Higano, C., Bubley, G. J., Dreicer, R., et al. (2008). Design and end points of clinical trials for patientswith progressive prostate cancer and castrate levels of testosterone: recommendations of the prostatecancer clinical trials working group.

Journal of Clinical Oncology ﬀ ect models for the joint analysisof longitudinal and time-to-event data: application to the prediction of prostate cancer recurrence. Journal de la Soci´et´e Fran¸caise de Statistique

Biometrics

Journal of clinical oncology

Statistica Sinica

Biometrics

Journal of the American Statistical Association T a b l e : S i m u l a ti on r e s u lt s acc o r d i ng t o3 , a nd15p s e udo - a d a p t a ti v e G a u ss - H e r m it e qu a d r a t u r e po i n t s . F o r eac h s ce n a r i o , t h e s t a ti s ti c s a r e (fr o m l e f tt o r i gh t ) : m ea n , m ea n s t a nd a r d e rr o r , s t a nd a r dd e v i a ti on , r e l a ti v e b i a s ( i np e r ce n t a g e ) a nd c ov e r a g e r a t e ( i np e r ce n t a g e ) . G a u ss - H e r m it e qu a d r a t u r e po i n t s G a u ss - H e r m it e qu a d r a t u r e po i n t s G a u ss - H e r m it e qu a d r a t u r e po i n t s T r u e M ea n S t d E rr S t d D e v R e l . C ov . M ea n S t d E rr S t d D e v R e l . C ov . M ea n S t d E rr S t d D e v R e l . C ov . v a l u e b i a s r a t e b i a s r a t e b i a s r a t e L ong it ud i na l p r o ce ss β − . − . . . . . − . . . . . − . . . . . β , X . . . . . . . . . . . . . . . . β − . − . . . − . . − . . . − . . − . . . − . . β , X . . . . − . . . . . − . . . . . − . . l og ( σ ) − . − . . . . . − . . . . . − . . . . . M u lti - s t a t e p r o ce ss γ , X . . . . . . . . . . . . . . . . γ , X . . . . − . . . . . − . . . . . . . γ , X − . − . . . . . − . . . . . − . . . . . η , level . . . . − . . . . . − . . . . . − . . η , level . . . . . . . . . . . . . . . . η , level . . . . . . . . . . . . . . − . . η , slope . . . . . . . . . . . . . . . . η , slope − . − . . . . . − . . . − . . − . . . − . . η , slope . − . . . − . . . . . − . . . . . . . R ando m e ﬀ ec t s D . . . . − . . . . . − . . . . . − . . D − . − . . . . . − . . . . . − . . . . . D . . . . − . . . . . − . . . . . . . IGURES State 0 State 1 State 2 λ ( t ) λ ( t ) λ ( t ) Υ sim = 

692 500 3080 213 2870 0 595 

Figure 1: Simulated multi-state process. Arrows indicate the directions of the possible transitions. λ hk ( t )characterizes the intensity of transition between states h and k at time t . The matrix Υ sim has size (3 , Υ sim , ( h + k + , where Υ sim , ( h + k + is the average number of observeddirect transitions h → k over the 500 replicates. The diagonal elements Υ sim , ( h + h + denote the averagenumber of patients who were censored in state h . Note that the sum of elements of a row ( h +

1) of Υ sim corresponds to the average number of patients who experienced the state h . IGURES Local Recurrence ( y =144) Hormonal Therapy ( y =227) Distant Recurrence ( y =47)Death ( y =523) Censorship ( y =533)−20246−20246 0 5 10 15 0 5 10 15 Years since the end of EBRT l og ( PSA + . ) Figure 2: Individual trajectories of log (PSA + .

1) after the end of EBRT and according to the kind ofﬁrst relapse in the two cohorts ( N = IGURES EndEBRT LocalRecurrence HormonalTherapy DistantRecurrence Death λ ( t ) λ ( t ) λ ( t ) λ ( t ) λ ( t ) λ ( t ) λ ( t ) λ ( t ) λ ( t ) λ ( t ) Υ = 

533 144 227 47 5230 20 90 10 240 0 106 33 1780 0 0 13 770 0 0 0 802 

Figure 3: Multi-state representation of the clinical progressions in prostate cancer. Arrows indicate thedirections of the possible transitions ( N = λ hk ( t ) characterizes the intensity of transition betweenstates h and k at time t . The matrix Υ has size (5 ,

5) and is composed of elements Υ ( h + k + , where Υ ( h + k + is the number of observed direct transitions h → k . The diagonal elements Υ ( h + h + denotethe number of patients who were censored in state h . Note that the sum of elements of a row ( h +

1) of Υ corresponds to the number of patients who experienced the state h . IGURES (a) Conditional standardized residuals versus the ﬁtted val-ues. The black curve indicates the mean of conditionalresiduals according to the ﬁtted values, using a locallyweighted polynomial regression. (b) Observed and predicted values of the biomarker. 95%conﬁdence intervals of the observed values are connectedin grey. From state 0 From state 1From state 2 From state 30.000.250.500.751.000.000.250.500.751.00 0 5 10 15 0 5 10 15

Years since the end of EBRT T r an s i t i on P r obab ili t i e s To state

Estimators

Par.Non−par.95% CI (c) Predicted transition probabilities from the joint multi-state model and non-parametric probability transitions.

Figure 4: Goodness-of-ﬁt plots for the longitudinal process (a,b) and the multi-state process (c).

IGURES ∗ † ‡ ∗ Pre-therapy PSA value (ng / ml) in the log( . + .

1) scale. † Minimum between the time of ﬁrst transition and the time of censoring. ‡ Minimum between the time of death and the time of censorship.

IGURES p -values in the joint multi-state model on the pooleddata ( N = p -value Value StdErr p -value β − < . γ , iPSA < . β , iPSA < . γ , iPSA < . β , cohort − . γ (13 , , , , , iPSA − . β < . γ (01 , , , tstage2 < . β , iPSA < . γ (01 , , , tstage3-4 . β , tstage2 < . γ (12 , , , tstage2 − . β , tstage3-4 < . γ (12 , , , tstage3-4 . β , cohort − . γ (03 , gleason7 < . β − < . γ (03 , gleason8-10 . β , iPSA < . γ (01 , , , , cohort − < . β , tstage2 < . γ (13 , , cohort < . β , tstage3-4 < . ζ (12 , < . β , gleason7 < . ζ < . β , gleason8-10 < . η , level < . β , cohort − < . η , level < . σ ) − η , level < . η , level − . D η , level − . D η , level − . D η , level . D η , level − . D η , level . D η , level . η , slope < . η , slope < . η , slope < . η , slope . η , slope . η , slope < . η , slope − . η , slope . η , slope . η , slope − . D i j denotes the i j -element of the covariance matrix for the random e ﬀ ects. γ ( hk , h (cid:48) k (cid:48) ) , X denotes the e ﬀ ectof the covariate X on the intensities of transitions h → k and h (cid:48) → k (cid:48) , i.e. γ ( hk , h (cid:48) k (cid:48) ) , X = γ hk , X = γ h (cid:48) k (cid:48) , X . Inthe same idea, it is used ζ (12 , = ζ = ζ . IGURES

500 subjects 1000 subjectsTrue Mean StdErr StdDev Rel. Cov. Mean StdErr StdDev Rel. Cov.value bias rate bias rate

Longitudinal process β − − − β , X β − − − β , X − − σ ) − − − − Multi-state process γ , X γ , X − γ , X − − − − − η , level − η , level − η , level η , slope η , slope − − − − η , slope Random e ﬀ ectsD − − D − − − D − −−