[PDF] A new latent cure rate marker model for survival data

Abstract

To address an important risk classification issue that arises in clinical practice, we propose a new mixture model via latent cure rate markers for survival data with a cure fraction. In the proposed model, the latent cure rate markers are modeled via a multinomial logistic regression and patients who share the same cure rate are classified into the same risk group. Compared to available cure rate models, the proposed model fits better to data from a prostate cancer clinical trial. In addition, the proposed model can be used to determine the number of risk groups and to develop a predictive classification algorithm.

Full PDF

aa r X i v : . [ s t a t . A P ] O c t The Annals of Applied Statistics (cid:13)

Institute of Mathematical Statistics, 2009

A NEW LATENT CURE RATE MARKERMODEL FOR SURVIVAL DATA

By Sungduk Kim, Yingmei Xi and Ming-Hui Chen National Institute of Child Health and Human Development,Biogen Idec Inc. and University of Connecticut

To address an important risk classiﬁcation issue that arises inclinical practice, we propose a new mixture model via latent cure ratemarkers for survival data with a cure fraction. In the proposed model,the latent cure rate markers are modeled via a multinomial logisticregression and patients who share the same cure rate are classiﬁedinto the same risk group. Compared to available cure rate models,the proposed model ﬁts better to data from a prostate cancer clini-cal trial. In addition, the proposed model can be used to determinethe number of risk groups and to develop a predictive classiﬁcationalgorithm.

1. Introduction.

In cure rate modeling of event-time data, a fraction ofthe population is considered to have zero hazard. The model is often suit-able for survival data from cancer clinical trials owing to advances in medicaltreatment and health care. For example, treatment of prostate cancer rou-tinely cures the patient in the sense of completely eradicating the disease.Existing cure rate models are able to accommodate a fraction of the popula-tion being cured [Berkson and Gage (1952) and Maller and Zhou (1996)] andcharacteristics of tumor growth [Chen, Ibrahim and Sinha (1999), Tsodikov,Ibrahim and Yakovlev (2003) and Cooner et al. (2007)]. However, risk-groupinformation is not easily incorporated. Prostate cancer patients can be classi-ﬁed into low, intermediate and high-risk groups on the basis of pre-treatmentcharacteristics, such as the level of prostate-speciﬁc antigen (PSA), biopsyGleason scores or clinical tumor categories [D’Amico et al. (1998, 2002)]. Afailure to incorporate risk stratiﬁcation into the cure rate model can lead to

Received September 2007; revised February 2009. Dr. Chen’s research was supported in part by NIH Grants

Key words and phrases.

Deviance Information Criterion (DIC), Markov chain MonteCarlo, logarithm of pseudomarginal likelihood (LPML), PSA recurrence.

This is an electronic reprint of the original article published by theInstitute of Mathematical Statistics in

The Annals of Applied Statistics ,2009, Vol. 3, No. 3, 1124–1146. This reprint diﬀers from the original in paginationand typographic detail. 1

S. KIM, Y. XI AND M.-H. CHEN poorly ﬁtting statistical models and poorly estimated cure rates and predic-tive probabilities of risk groups. We address this problem via a latent classanalysis of the cure rate model.We consider data from a retrospective cohort study of n = 1235 mentreated with radical prostatectomy (RP) at Brigham and Women’s Hospi-tal between 1988–2001, which is a subset of the data published in D’Amicoet al. (2002). The primary endpoint is the time to prostate-speciﬁc anti-gen (PSA) recurrence or to the last follow-up, whichever came ﬁrst. Therewere 261 patients who had PSA recurrence after the radical prostatectomy.We consider four prognostic factors: natural logarithm of prostate speciﬁcantigen (LogPSA) prior to RP, biopsy Gleason score, the 1992 AmericanJoint Commission on Cancer (AJCC) clinical tumor category and the yearof radical prostatectomy (Year). D’Amico et al. (2002) considered three riskgroups based on PSA, biopsy Gleason score and clinical tumor category andreported the estimates of 8-year PSA recurrence free survival for the threerisk groups based on the Kaplan–Mier (KM) method [Kaplan and Meier(1958)]. For the subset of the data considered here, the KM estimates of8-year PSA recurrence free survival are 88.7%, 57.4% and 23.4% for low-riskpatients (T1c, T2a, a PSA level ≤

10 ng / mL), intermediate-risk patients(T2b or Gleason score 7 or a PSA level >

10 and ≤

20 ng / mL) and high-riskpatients (T2c or PSA level >

20 ng / mL or Gleason score ≥ ≤

6, clinical stage T1 andsurgery in 2001, the predictive probabilities for three risk groups are 0.745,0.241 and 0.014 and, thus, this patient will be classiﬁed into the “low risk”group.Overdiagnosis of clinically insigniﬁcant prostate cancer was considered amajor issue of prostate-speciﬁc antigen (PSA) screening since the U.S. Foodand Drug Administration approved PSA testing in 1986 as a way to monitor

ATENT CURE RATE MARKER MODEL prostate cancer progression [Wang and Arnold (2002)]. Etzioni et al. (2002)estimated rates of prostate cancer overdiagnosis due to PSA testing amongmen who were 60 to 84 years old in 1988. Overdiagnosis may occur whenolder men or men with comorbid illness who have very low risk disease aretreated. However, overdiagnosis is usually not the case for men treated withsurgery because they are healthy but they can have very low risk disease.Since the data we analyze in this paper were from those men who wentto surgery, it may be appropriate to ﬁt a cure rate model to this particularprostate cancer data set. We include the year of RP in the analysis, as it mayhave a signiﬁcant eﬀect on the “cure rate.” There are two reasons for this.With time people are diagnosed after several PSA tests and serial screenedmen are diagnosed earlier with more favorable disease [e.g., Martin et al.(2008)] and with increased medical experience. Over time, the techniques oftreatment also improve, which can improve outcome due to a learning curve,especially when new surgery (e.g., robotic RP) or radiation therapy (e.g.,seed therapy) techniques are used. We ﬁt both the Cox proportional hazardsregression model [Cox (1972)] and the proposed latent cure rate markermodel with a piecewise exponential baseline hazard function to this prostatecancer data. We then computed the logarithm of pseudomarginal likelihood(LPML) [Ibrahim, Chen and Sinha (2001), Chapter 6] and the DevianceInformation Criterion (DIC) proposed by Spiegelhalter et al. (2002) for eachmodel. From Table 2 in Section 5, we see that the best LPML and DICvalues are − . − .

2. The models.

Preliminary.

Let y i denote the observed survival time and let ν i be the censoring indicator that equals 1 if y i is a failure time and 0 if itis right censored for the i th subject. Also, let N i denote the number ofmetastatic-competent tumor cells and assume that the N i ’s are independentPoisson random variables with mean θ i . Suppose further that W ij denotesthe random time for the j th carcinogenic cell to produce a detectable can-cer mass (incubation time for the j th carcinogenic cell) for the i th subject.We assume that the variables W ij , i = 1 , , . . . , are independent and dis-tributed with a common distribution function F ( y ), and are independent S. KIM, Y. XI AND M.-H. CHEN of N i . The time to relapse of cancer can be deﬁned by the random vari-able Y i = min { W ij , ≤ j ≤ N i } , where P ( W i = ∞ ) = 1. Then, the survivalfunction for the cure rate model for the i th subject is given by S i ( y ) = P ( Y i ≥ y ) = exp {− θ i F ( y ) } . (1)Using (1), the cure rate is given by S i ( ∞ ) = exp( − θ i ), which is also equalto P ( N i = 0). Other properties of the cure rate model (1) can be found inYakovlev et al. (1993), Yakovlev and Tsodikov (1996) and Chen, Ibrahim andSinha (1999). To build a regression model, Chen, Ibrahim and Sinha (1999)introduced covariates through θ i via θ i ≡ θ ( x ′ i β ) = exp( x ′ i β ), where x i =( x i , x i , . . . , x ip ) ′ denotes the p × i th subjectand β = ( β , β , . . . , β p ) ′ is the corresponding vector of regression coeﬃcients, i = 1 , , . . . , n . Let S ( y ) = 1 − F ( y ) and f ( y ) = ddy F ( y ). Then the resultingsurvival function is given by S i ( y | x i , β ) = exp {− exp( x ′ i β ) F ( y ) } . (2)We refer to (2) as the CIS model.A natural extension of the CIS model is the cure rate double regressionmodel. Let β = ( β , β , . . . , β p ) ′ and β = ( β , β , . . . , β p ) ′ . We assume θ i = exp( x ′ i β ). A proportional hazards model is assumed for the distributionfunction F ( y ) of the incubation time for noncured subjects. Speciﬁcally,let the cumulative hazard function H ( y ) = exp( x ′ i β ) H ( y ), where H ( y ) isthe baseline cumulative hazard function. Then, F ( y ) = 1 − exp {− H ( y ) } =1 − exp {− exp( x ′ i β ) H ( y ) } . Under this assumption for F ( y ), we have S i ( y | x i , β ) = exp( − exp( x ′ i β )[1 − exp {− exp( x ′ i β ) H ( y ) } ]) , (3)where β = ( β ′ , β ′ ) ′ . Yakovlev and Tsodikov (1996) used parametric accel-erated failure time eﬀects on the cumulative hazards with a similar idea.Model (3) in its semiparametric form has appeared in Broet et al. (2001),where they tended to use a generalized Gompertz name for the model.We see, from (3), that the model in (3) incorporates the covariates intoboth the cure rate and the hazard function with double proportional haz-ards structures. Thus, we refer to this model as the PHPH model. Thename “PHPH” was also introduced by Tsodikov (2002) for the semipara-metric version of the model. Recently, Liu, Lu and Shao (2006) developedthe PHPH version of the standard cure rate model of Berkson and Gage(1952).Another extension of the CIS model is the latent activation cure rate(LACR) model proposed by Cooner et al. (2007). Given N i ≥

1, let W i (1) ≤ W i (2) ≤ · · · ≤ W i ( N i ) denote the ordered values of the W ij ’s. The time torelapse of cancer is deﬁned by Y i = W i ( R i ) for 1 ≤ R i ≤ N i and W i if N i =0, where R i is an integer valued variable. Cooner et al. (2007) speciﬁed ATENT CURE RATE MARKER MODEL a conditional distribution for R i given N i , denoted by [ R i | N i ]. When N i follows a Poisson distribution with mean θ i = exp( x i β ) and [ R i | N i ] is adiscrete uniform on { , , . . . , N i } with probability N i , the survival functionunder the LACR model is given by S i ( y | x i , β ) = exp {− exp( x i β ) } + [1 − exp {− exp( x i β ) } ] S ( y ) . (4)Other distributions for N i and [ R i | N i ] are also considered in Cooner et al.(2007).2.2. A new latent cure rate marker model.

The latent cure rate marker(LCRM) model assumes that the N i ’s are independent Poisson random vari-ables with mean θ g i , where g i is a (unknown) group indicator, and exp( − θ g i )is the cure rate marker. Let G denote the number of distinct values of θ g i .Further, g i (1 ≤ g i ≤ G ) indicates the group membership. Without loss ofgenerality, we assume θ < θ < · · · < θ G . Under these constraints, the groupmembership g i is uniquely deﬁned. Similar to the PHPH model, we assumethe proportional hazards model for cumulative hazard function H ( y ), thatis, H ( y ) = exp( x ′ i β ) H ( y ), where H ( y ) is the baseline cumulative hazardfunction. Then, under the LCRM model, the conditional survival functionof y i given θ g i is of the form S i ( y | x i , β , θ g i ) = exp( − θ g i [1 − exp {− exp( x ′ i β ) H ( y ) } ]) . (5)We assume a multinomial logistic regression model for the latent groupmembership g i . To this end, let z ′ i = ( z i , z i , z i , . . . , z iq ) denote a ( q + 1) × i th subject, which includes an intercept (i.e., z i = 1) for i = 1 , , . . . , n . Also let φ j = ( φ j , φ j , φ j , . . . , φ jq ) ′ denote thecorresponding vectors of regression coeﬃcients for j = 1 , , . . . , G and φ ′ =( φ ′ , φ ′ , . . . , φ ′ G − ). Then the density of the group membership g i is given by f ( g i | z i , φ ) = exp( z ′ i φ g i ) P Gl =1 exp( z ′ i φ l ) . (6)For notational convenience, we let φ G = (0 , , . . . , ′ . Write θ ′ = ( θ , θ , . . . , θ G ).Using (5), the unconditional survival function of y i is given by S i ( y | x i , z i , β , θ , φ )(7) = G X k =1 exp( − θ k [1 − exp {− exp( x ′ i β ) H ( y ) } ]) exp( z ′ i φ k ) P Gl =1 exp( z ′ i φ l ) . Unlike the CIS model, the LCRM model does not directly link the co-variates to the cure fractions and instead it assumes that the populationis characterized by an unobserved cure rate marker, namely, exp( − θ g i ), S. KIM, Y. XI AND M.-H. CHEN where the latent group membership g i is described according to covari-ates via a multinomial logistic regression model. We note that the mono-tonic constraints on the cure rates θ k ’s not only deﬁne the group mem-bership g i but also ensure identiﬁability of the multinomial logistic regres-sion model. We also see, from (7), that the LCRM model is indeed a ﬁ-nite mixture of cure rate models. If θ k → θ for k = 1 , , . . . , G , (7) reducesto S i ( y | x i , β , θ ) = exp( − θ [1 − exp {− exp( x ′ i β ) H ( y ) } ]) , which is a special case of the PHPH model.We assume the piecewise exponential model for the baseline hazard func-tion h ( y ), which is constructed as follows. We ﬁrst partition the time axisinto J intervals: ( s , s ], ( s , s ] , . . . , ( s J − , s J ], where s = 0 < s < s < · · · < s J = ∞ . We then assume a constant hazard λ j over the j th inter-val I j = ( s j − , s j ]. That is, h ( y ) = λ j if y ∈ I j for j = 1 , , . . . , J . Then thecorresponding cumulative distribution function (c.d.f.), F ( y | λ ), is given by F ( y | λ ) = 1 − exp ( − λ j ( y − s j − ) − j − X g =1 λ g ( s g − s g − ) ) (8)for s j − ≤ y < s j , where λ = ( λ , . . . , λ J ) ′ . We note that when J = 1, F ( y | λ )reduces to the parametric exponential model.Let D = ( n, y , X, Z, ν , N , g ) denote the complete data, where y = ( y , . . . , y n ) ′ , ν = ( ν , ν , . . . , ν n ) ′ , X is the n × p matrix of covariates with i th row x ′ i , Z ,which may share common components with X , is a q -vector of covariateswith i th row z ′ i , N ′ = ( N , N , . . . , N n ), and g ′ = ( g , g , . . . , g n ). Then, thecomplete data likelihood under the LCRM model is given by L ( β , θ , φ , λ | D )= n Y i =1 " J Y j =1 ( N i λ j ) ν i δ ij × exp ( ν i δ ij x ′ i β − exp( x ′ i β ) N i δ ij (9) × λ j ( y i − s j − ) + j − X k =1 λ k ( s k − s k − ) !) × exp " n X i =1 ( N i log θ g i − log( N i !) − θ g i + z ′ i φ g i − log " G X l =1 exp( z ′ i φ l ) , where δ ij = 1 if the i th subject failed or was censored in the j th interval I j , and 0 otherwise. Since N and g are not observed, by summing (9) over ATENT CURE RATE MARKER MODEL N and g , we obtain the likelihood function based on the observed data D obs = ( n, y , X, Z, ν ) given by L ( β , θ , φ , λ | D obs )= n Y i =1 G X k =1 exp " ν i ( log θ k + x ′ i β + J X j =1 δ ij log λ j − exp( x ′ i β ) H ∗ ( y i ) ) (10) − θ k (1 − exp {− exp( x ′ i β ) H ∗ ( y i ) } ) × exp ( z ′ i φ k − log G X l =1 exp( z ′ i φ l ) !)! , where H ∗ ( y i ) = P Jj =1 δ ij [ λ j ( y i − s j − ) + P j − l =1 λ l ( s l − s l − )].

3. The prior and posterior distributions under the LCRM model.

Weconsider a joint prior distribution for ( β , θ , φ , λ ). Suppose that J and s j , j = 1 , . . . , J , are ﬁxed. First we consider a ﬁxed G . We assume that β , θ , φ and λ are independent a priori. Thus, the joint prior for ( β , θ , φ , λ ) is ofthe form π ( β , θ , φ , λ ) = π ( β ) π ( θ ) π ( φ ) π ( λ ). We further assume that β ∼ N p (0 , c I p ) , φ k ∼ N q (0 , c I q ) , k = 1 , , . . . , G − , (11) π ( λ ) ∝ J Y j =1 λ a − j exp( − b λ j ) , (12)and π ( θ ) ∝ G Y k =1 θ a k − k exp( − b k θ k ) , < θ < θ < · · · < θ G , (13)where c , c , a , b , a k and b k , k = 1 , , . . . , G , are the prespeciﬁed hy-perparameters. Due to the monotonic constraints, θ < θ < · · · < θ G , elic-iting the hyper-parameters a k and b k becomes more crucial than other hy-perparameters. To this end, we ﬁrst specify θ = ( θ , θ , . . . , θ G ) ′ suchthat θ < θ < · · · < θ G . Equivalently, we specify a set of prior cure ratesexp( − θ k ), k = 1 , , . . . , G . Then, we set a k = c and b k = c θ k , where c is aknown constant. This essentially implies that we specify the prior mean andthe prior standard deviation of θ k to be θ k and c θ k . Thus, c quantiﬁesthe prior uncertainty in θ k . A large value of c reﬂects a vague prior beliefin θ k and a small value of c yields a strong prior belief in θ k . In Section5 we use the LPML and DIC measures to guide the choice of c and θ andwe then conduct a sensitivity analysis on various choices of c and θ . S. KIM, Y. XI AND M.-H. CHEN

Based on the prior distributions speciﬁed above, the joint posterior dis-tribution of β , θ , φ , λ , N and g based on the complete data D is thus givenby π ( β , θ , φ , λ , N , g | D obs ) ∝ L ( β , θ , φ , λ | D ) π ( β ) π ( θ ) π ( φ ) π ( λ ) , (14)where L ( β , θ , φ , λ | D ) is deﬁned in (9). We note that when the priors π ( β ), π ( θ ), π ( φ ) and π ( λ ) are proper, the resulting posterior is proper. However,even when we take an improper prior for θ , an improper uniform prior for β and an improper Jeﬀreys-type prior for λ , that is, c → ∞ and a = b = 0in (12), the posterior is still proper under some mild conditions. We formallystate this result in the following theorem. Theorem 1.

Suppose that π ( β ) ∝ and π ( λ ) ∝ Q Jj =1 λ − j . Let X j bean n × ( p + 1) matrix with its i th row equal to ν i δ ij (1 , x ′ i ) , where p is thedimension of β . Assume that (i) when ν i = 1 , y i > , (ii) d j ≡ P ni =1 ν i δ ij ≥ for j = 1 , . . . , J , (iii) there exists a j ∗ such that X j ∗ is of full rank, and (iv) c > , a k > and b k ≥ for k = 1 , , . . . , G − , d + P G − k =1 a k + a G > , where d = P Jj =1 d j , and b G > . Then, the resulting posterior in (14) is proper. The proof of the theorem is given in the Appendix. The conditions (i)–(iii) are indeed quite mild and essentially require that all event times arestrictly positive, at least one event occurs in each chosen interval ( s j − , s j ],and the covariate matrix is of full rank for at least one interval. These con-ditions are easily satisﬁed in most applications and are quite easy-to-check.We note that the condition (iv) does not require a G >

0. Thus, π ( θ ) canbe improper. We also note that the latent structure of the LCRM modelleads to the development of a Markov chain Monte Carlo (MCMC) al-gorithm for sampling from posterior distribution in (14). When G is notspeciﬁed, we assume a truncated Poisson distribution with mean µ G on { , , . . . , G max } for G , where µ G and G max are prespeciﬁed. Then, we de-velop a reversible jump algorithm for carrying out posterior computation.The description of the MCMC algorithm for a ﬁxed G and the detailed de-velopment of the reversible jump MCMC based on Lopes and West (2004)and Green (1995) are given in online supplementary material [Kim, Xi andChen (2009)].

4. Posterior predictive classiﬁcation under the LCRM model.

In thissection we consider classiﬁcation via the posterior predictive probability.The latent cure rate markers under the LCRM model can be naturally usedfor the predictive classiﬁcation. Let x new and z new denote the future valuesof the vectors of baseline covariates. Also let g new denote the future groupindicator. Suppose that g new takes a value between 1 and G , where G is ATENT CURE RATE MARKER MODEL ﬁxed. Then, the conditional posterior probability for g new given φ and z new is given by P ( g new = k | φ , z new , G ) = exp( z ′ new φ k ) P Gl =1 exp( z ′ new φ l ) , k = 1 , , . . . , G. (15)The posterior estimate of this predictive probability for g new is the posteriorexpectation of P ( g new = k | φ , z new ) given byˆ p ( k | z new , G ) = E [ P ( g new = k | φ , z new , G ) | D obs ](16)for k = 1 , , . . . , G , where the expectation is taken with respect to the pos-terior distribution of φ based on the observed data D obs . The clinical inter-pretation of (16) is that, given the patient’s characteristic z new , ˆ p ( k | z new ) isthe probability that the patient is in the k th risk group.Next, we consider the conditional predictive probability for g new = k giventhe survival time Y ≥ t , x new and z new . This conditional predictive proba-bility can be calculated as follows: P ( g new = k | β , θ , φ , λ , t, x new , z new , G )(17) = exp( z ′ new φ k − θ k [1 − exp {− exp( x ′ new β ) H ∗ ( t ) } ]) P Gl =1 exp( z ′ new φ l − θ l [1 − exp {− exp( x ′ new β ) H ∗ ( t ) } ]) , where H ∗ ( t ) is given in (10). The posterior estimate of (17) for g new isˆ p ( k | t, x new , z new , G ) = E [ P ( g new = k | β , θ , φ , λ , t, x new , z new ) | D obs ] . (18)From (16) and (18), it is easy to see thatˆ p ( k | z new , G ) = ˆ p ( k | t = 0 , x new , z new , G )for k = 1 , , . . . , G . Since lim t →∞ H ∗ ( t ) = ∞ , we also havelim t →∞ ˆ p ( k | t, x new , z new , G ) = E (cid:20) exp( z ′ new φ k − θ k ) P Gl =1 exp( z ′ new φ l − θ l ) (cid:21) , (19) k = 1 , , . . . , G. Using the posterior predictive probability in (18), we classify a new patientwith characteristic ( x new , z new ) into risk group k ∗ if k ∗ = arg max ≤ k ≤ G ˆ p ( k | t, x new , z new , G ) . (20)An attractive property of the posterior predictive probability in (18) ispresented in the following theorem. Theorem 2.

The posterior predictive probability for the lowest risk group( k = 1 ), ˆ p (1 | t, x new , z new , G ) , increases in t , while for the highest risk group( k = G ), ˆ p ( G | t, x new , z new , G ) decreases in t . S. KIM, Y. XI AND M.-H. CHEN

The proof of Theorem 2 is given in the Appendix. Based on (19) andTheorem 2, we have that, for t ≥ P ( g new = 1 | t, x new , z new , G ) ≤ E (cid:20) exp( z ′ new φ − θ ) P Gk =1 exp( z ′ new φ k − θ k ) (cid:21) , (21)which is the largest probability that ˆ P ( g new = 1 | t, x new , z new , G ) may achievefor t > x new , z new ). Similarly, we haveˆ P ( g new = G | t, x new , z new , G ) ≥ E (cid:20) exp( z ′ new φ G − θ G ) P Gk =1 exp( z ′ new φ k − θ k ) (cid:21) , (22)which is the smallest probability that ˆ P ( g new = G | t, x new , z new , G ) can getfor t >

0. The quantity ˆ P ( g new = k | t, x new , z new , G ) is clinically importantas this gives the patient an idea how well he can do prospectively given hisbaseline characteristic.Finally, we note that when G is not speciﬁed, a similar posterior predic-tive classiﬁcation algorithm can be established. For example, instead of (15),we compute P ( g new = k | φ , z new ) = G max X G =1 π ( G ) exp( z ′ new φ k ) P Gl =1 exp( z ′ new φ l ) , k = 1 , , . . . , G max , where π ( G ) denotes the prior distribution for G and G max is the largestvalue of G .

5. Analysis of the prostate cancer data.

We revisit the prostate cancerdata discussed in Section 1. The response variable y is the time to prostate-speciﬁc antigen (PSA) recurrence. Covariates x , x , x , x and x corre-spond to LogPSA, biopsy Gleason score, clinical tumor category and theyear of radical prostatectomy (Year). A summary of covariates is given inTable 1. The covariates LogPSA ( x ) and Year ( x ) are continuous, while x , x and x are binary. The mean and the standard deviation for LogPSAwere 1.95 and 0.72. We also set z j = x j for j = 1 , . . . ,

5. In all computationswe standardized all covariates by subtracting their respective sample meansand then being divided by their respective sample standard deviations.The hyper-parameters of the prior distribution in Section 4 are speciﬁed asfollows. In (11), (12) and (13), we take c = 1000, c = 3, a = 1, b = 0 . c = 2 .

5. We choose c to be much larger than c as the posterioris proper even when π ( β ) ∝ a = 1 and b = 0 .

01 are speciﬁed so that the prior for λ is relatively noninformative.We further specify θ = − log(0 .

5) for G = 1; θ = − log(0 .

9) and θ = − log(0 .

3) for G = 2; θ = − log(0 . θ = − log(0 .

5) and θ = − log(0 . G = 3; θ = − log(0 . θ = − log(0 . θ = − log(0 .

3) and θ = ATENT CURE RATE MARKER MODEL − log(0 .

1) for G = 4; and θ = − log(0 . θ = − log(0 . θ = − log(0 . θ = − log(0 .

3) and θ = − log(0 .

1) for G = 5. We note that (0 . , . , . G = 3 were determined by the KM estimates of cure rates based on thethree risk groups deﬁned in D’Amico et al. (1998, 2002).Table 2 shows the values of LPML and DIC for the Cox, CIS, PHPH,LACR and LCRM models for various J ’s and G ’s. We note that under theCox model, the survival function is given by S i ( y | x i , β , λ ) =exp {− exp( x ′ i β ) H ( y | λ ) } , where H ( y | λ ) is the cumulative baseline hazardfunction corresponding to F ( y | λ ) given in (8). From Table 2, we observethat there is a concave pattern in the LPMLs and there is a convex patternin the DICs as functions of J for the CIS, PHPH and LCRM with ﬁxed G .We note that for J = 15, the values of LPML and DIC are − . − . J still holdsfor the Cox and LCRM models. Similar patterns are also observed in theLPMLs and DICs as functions of G for ﬁxed J under the LCRM. Amongthe three J ’s shown in Table 2, J = 5 consistently ﬁts the data better forthe CIS, PHPH and LCRM models and J = 10 ﬁts the data better for theCox and LACR models. The LCRM model, with J = 5 and G = 3 ﬁts thedata best among all models considered. In particular, LPML = − . − . . G = 1, the LCRM model with G ≥ J = 5 and G = 3 are given in Table 3. We see from this table thatLogPSA is signiﬁcant in the proportional hazards model (5) for the survivalfunction for a “noncured” subject and LogPSA, G8H and Year of RP aresigniﬁcant in the multinomial model (6) for the latent group membershipat a signiﬁcance level of 0.05. In addition, Year of RP is nearly signiﬁcantin both models. Although the prior cure rates for the three risk groups are0.9, 0.5 and 0.1, respectively, the resulting posterior estimates of these cure Table 1

Summary of covariates for prostate cancer data

Covariate Coded variable Value Deﬁnition Frequency x LogPSA ( −∞ , ∞ ) Logarithm of PSA prior to RP –( x , x ) (G7, G8H) (0 ,

0) Gleason score 6 or less 866(1 ,

0) Gleason score 7 303(0 ,

1) Gleason score 8–10 66 x Cstage 0 (T1) Clinical tumor category T1c or T2a 10551 (T2) Clinical tumor category T2b or T2c 180 x Year > S. KIM, Y. XI AND M.-H. CHEN

Table 2

LPMLs and DICs of Cox, CIS, PHPH, LACR and LCRM models J = 1 J = 5 J = 10 Model G LPML DIC LPML DIC LPML DIC

Cox − − − − − − − − − − − − − − − − − − − − − − − − − − − Table 3

Posterior estimates based on the best LCRM model

Posterior Posterior

HPDVariable mean SD interval β (LogPSA) 0.349 0.107 (0.136, 0.554) β (G7) 0.117 0.135 ( − β (G8H) 0.090 0.085 ( − β (Cstage) 0.042 0.095 ( − β (Year) − − θ θ θ − θ ) 0.939 0.095 (0.740, 1.000)exp( − θ ) 0.347 0.156 (0.059, 0.625)exp( − θ ) 0.092 0.052 (0.000, 0.181) φ (Intercept) 0.842 0.822 ( − φ (LogPSA) − − − φ (G7) − − φ (G8H) − − − φ (Cstage) − − φ (Year) 0.840 0.373 (0.106, 1.597) φ (Intercept) − − φ (LogPSA) − − − φ (G7) 1.118 1.058 ( − φ (G8H) − − φ (Cstage) − − φ (Year) 1.144 0.851 ( − rates are 0.939, 0.347 and 0.092. Under the same model setting for Table3, the posterior predictive probabilities, ˆ p ( k | t, x new , z new , G ) given in (19)with z new = x new , are computed for three sets of baseline covariates x new ’sfor various t ’s and the results are given in Table 4. Based on the proposedclassiﬁcation criterion given in (20), these probabilities clearly indicate thata patient with a PSA level of 5, Gleason 6 or less, and tumor stage T1belongs to risk group 1 (low risk group) and a patient with a PSA level of30, Gleason 8 to 10, and tumor stage T2 falls into risk group 3 (high riskgroup) no matter whether he had surgery in 1988 or 2001. However, a patientwith a PSA level of 5, Gleason 7 and tumor stage T2 may be classiﬁed intorisk group 3 (high risk group) if he had surgery in 1988 while a patient withthe same PSA level, Gleason score and tumor stage will be classiﬁed intorisk group 2 (intermediate risk group) if he had surgery in 2001. From Table4, we also see that for each set of baseline covariates, the risk classiﬁcationdoes not change no matter how long the patient will live if he had surgeryin 2001 and this is not the case when he had surgery in 1988. In addition,the overall cure rates, S ( ∞| x new , z new , β , θ , φ ) = G X k =1 exp( − θ k ) exp( z ′ new φ k ) P Gl =1 exp( z ′ new φ l ) , are presented in Table 4. It is interesting to see that when (PSA, Gleason,Cstage) = (5, ≤

6, T1), the overall cure rate is much smaller than that given g new = 1, when (PSA, Gleason, Cstage) = (5, 7, T2), the overall cure rate isgreater than that given g new = 2 if he had surgery in 2001, while the over-all cure rate is very similar to the risk group speciﬁc cure rate ( g new = 3)when (PSA, Gleason, Cstage) = (30, 8–10, T2). Figure 1 shows the esti-mated risk group speciﬁc PSA recurrence free probabilities correspondingto these three sets of covariates and the estimated overall PSA recurrencefree probability when the year of RP was 2001. From plots (a), (b) and (c),we see that three risk group speciﬁc probability curves are well separatedfrom each other. These plots also show that a wrong classiﬁcation may leadto either over-estimate or under-estimate of the PSA recurrence free prob-ability. Thus, the posterior predictive classiﬁcation is quite important, as acorrect classiﬁcation leads to more accurate estimates of the cure rate aswell as the PSA recurrence free probability.We further conducted a sensitivity analysis on the choice of c and θ j ’s.Table 5 shows the LPML and DIC values of the LCRM model with G = 3for various c and the prior cure rates exp( θ ) = (0 . , . , . c . Among all values of c and exp( θ ), c = 2 . θ ) =(0 . , . , .

1) yield the largest LPML and the smallest DIC among all choicesconsidered. Although not reported in Table 5, the posterior estimates of S. KIM, Y. XI AND M.-H. CHEN

Table 4

Posterior predictive probability based on the best LCRM model

Year PSA Gleason Stage t ˆ p ( k = 1 | t ) ˆ p ( k = 2 | t ) ˆ p ( k = 3 | t ) Overall cure rate ≤ ∞ ∞ ∞ ≤ ∞ ∞ ∞ the cure rates were also calculated under those choices of c and the priorcure rates. For example, when c = 2 .

5, the posterior estimates of the curerates and the corresponding posterior standard deviations are (0.939, 0.347,0.092) and (0.095, 0.156, 0.052) for exp( θ ) = (0 . , . , . θ ) = (0 . , . , . θ ) = (0 . , . , . c . These results demonstrate thatthe proposed LCRM model is quite robust to the speciﬁcation of c andprior cure rates.When G is not speciﬁed, we used the RJMCMC algorithm given in Kim,Xi and Chen (2009). In the RJMCMC algorithm, we took a = 3 for θ l and d l = 0 . φ l , l = 1 , , . . . , G −

1. We speciﬁed the transition matrix asfollows: TR =  . . . . . . . . . . . . . . . . . . . . . . . . .  . The dimension of model, G , is assumed to follow a Poisson distributionwith mean µ G = 3 and truncated between 1 and 5. Also J is ﬁxed to ATENT CURE RATE MARKER MODEL Fig. 1.

Plots of the estimated risk group speciﬁc PSA recurrence free probabilities cor-responding to (PSA, Gleason, Cstage) = (5, ≤ , T1) (a) , (5, 7, T2) (b) , and (30, 8–10,T2) (c) , and the estimated overall PSA recurrence free probability (d) for year of RP = be 5. Under the above setting, the posterior probabilities of G are com-puted and these are P ( G = 1 | D obs ) = 0 . P ( G = 2 | D obs ) = 0 . P ( G =3 | D obs ) = 0 . P ( G = 4 | D obs ) = 0 . P ( G = 5 | D obs ) = 0 .

0, respec-tively. Therefore, the model with G = 3 has the highest posterior modelprobability. This result is consistent with the best model identiﬁed by theLPML and DIC measures shown in Table 2. We also conducted a sensi-tivity analysis on the speciﬁcation of µ G in the prior distribution for G .Speciﬁcally, we obtained that P ( G = 1 | D obs ) = 0 . P ( G = 2 | D obs ) = 0 . S. KIM, Y. XI AND M.-H. CHEN

Table 5

LPMLs and DICs of the LCRM model for various c and prior cure rates Prior cure rates(0.9, 0.5, 0.1) (0.8, 0.5, 0.2) (0.7, 0.5, 0.3) c LPML DIC LPML DIC LPML DIC − − − − − − − − − − − − − − − − − − − − − − − − − − − P ( G = 3 | D obs ) = 0 . P ( G = 4 | D obs ) = 0 . P ( G = 5 | D obs ) = 0 . µ G = 2, and P ( G = 1 | D obs ) = 0 . P ( G = 2 | D obs ) = 0 . P ( G = 3 | D obs ) =0 . P ( G = 4 | D obs ) = 0 . P ( G = 5 | D obs ) = 0 . µ G = 4. Thus,the model with G = 3 consistently has the highest posterior model proba-bility for all three choices of µ G .In all the computations, we ﬁrst generated 100,000 Gibbs samples with aburn-in of 4000 iterations, and we then used 20,000 iterations obtained fromevery 5th iteration for computing all posterior estimates, including posteriormean, posterior standard deviation, 95% highest posterior density (HPD)intervals and LPML. The computer codes were written in FORTRAN 95using IMSL subroutines with double precision accuracy. The convergence ofthe MCMC sampling algorithm was checked using several diagnostic proce-dures as recommended by Cowles and Carlin (1996).

6. Discussions.

In Section 5 we used LPML and DIC measures to assessthe goodness of ﬁt of the models for diﬀerent choices of G and J . LPML isa well-established Bayesian model comparison criterion based on the con-ditional predictive ordinate (CPO) statistics, which is particularly suitablefor the cure rate models. Let CPO i denote the CPO statistic for the i thsubject. LPML is deﬁned asLPML = n X i =1 log(CPO i ) . The larger the LPML, the better the ﬁt of a given model. Letting γ denotethe vector of all model parameters and L ( γ | D obs ) the likelihood based onthe observed data D obs , the DIC is deﬁned asDIC = Dev(¯ γ ) + 2 p D , ATENT CURE RATE MARKER MODEL where Dev( γ ) = − L ( γ | D obs ) is a deviance function, ¯ γ is the poste-rior mean of γ , p m = Dev( γ ) − Dev(¯ γ ), and Dev( γ ) is the posterior mean ofDev( γ ). For the LCRM model, γ = ( β , θ , φ , λ ) and L ( γ | D obs ) = L ( β , θ , φ , λ | D obs ), which is given by (10). The DIC is a Bayesian measureof predictive model performance, which is decomposed into a measure of ﬁtand a measure of model complexity ( p D ). The smaller the value of DIC, thebetter the model will predict new observations generated in the same way asthe data. As discussed in Spiegelhalter et al. (2002), DIC is the Bayesian ver-sion of the Akaike Information Criterion (AIC) [Akaike (1973)]. Unlike AIC,the dimensional penalty in DIC is automatically calculated without actu-ally counting the number of parameters. Although the dimensional penaltyis not explicitly shown in LPML, LPML has a dimensional penalty similarto AIC as derived by Gelfand and Dey (1994) based on the asymptotic ap-proximation. Moreover, as discussed in Ibrahim, Chen and Sinha (2001), theLPML measure is particularly suitable for comparing cure rate models, asthe moments do not exist under these models.As discussed in Sections 1 and 2, there are several cure rate models forsurvival data with a cure fraction recently developed in the literature. Thereis a distinct diﬀerence between the proposed model and the existing ones.Speciﬁcally, the new model is to no longer explain the cure fractions directlyaccording to covariates but to divide the population into latent classes char-acterized by speciﬁc cure rates and being described according to covariates.This nice feature of the proposed model allows us to develop the predictiveclassiﬁcation algorithm for classifying patients into diﬀerent risk groups. Theproposed mixture model falls within the latent class modeling framework.The latent class models are commonly used for analyzing complex samplesurvey data. For survey data, a latent class model is often used to explainunobservable categorical relationships or latent structures that characterizediscrete multivariate data [Dayton (1999), Agresti (2002) and Patterson,Dayton and Graubard (2002)]. Recently, latent class models have been de-veloped for survival data. Lin et al. (2002) proposed latent class modelsfor joint longitudinal and survival data. They assumed a Cox proportionalhazards model with time-varying covariates for the survival endpoint andeach latent class represents certain pattern of longitudinal and event-timeresponses. Larsen (2004) extended the Cox model to encompass a latent classvariable (an indicator of the unobserved status of health or functioning) aspredictor of time-to-event. However, the literature on the latent class modelfor survival data with a cure fraction is still sparse. Based on the subset ofthe data published in D’Amico et al. (2002), we showed in Section 5 thatthe proposed model with three latent cure rate markers ﬁts the data bestbased on LPML, DIC and the reversible jump of Green (1995). This ﬁndingis consistent with the prostate cancer literature, as the three risk groups areroutinely used in the prostate cancer clinical practice. S. KIM, Y. XI AND M.-H. CHEN

Although the proportional hazards (PH) assumption is assumed for thecumulative hazard function H ( y ) for noncured subjects in (5), the result-ing survival function is not PH due to the nature of the mixture model.To examine the PH assumption, we ﬁrst considered the generalized odds-rate hazards (GORH) model discussed in Banerjee et al. (2007). We thencompared various GORH models for H ( y ) based on the LPML and DICmeasures to see whether a PH model for H ( y ) is appropriate. The results,which are available in Kim, Xi and Chen (2009), empirically conﬁrm thatthe PH assumption for H ( y ) may be appropriate for the prostate cancerdata discussed in Section 1.In Section 5, the covariates considered include only PSA, biopsy Gleasonscore, clinical tumor category and year of RP due to the limitation of theprostate cancer data we had. However, it will not add much additional com-putational diﬃculty to incorporate more covariates into the proposed model.Unlike D’Amico et al. (1998, 2002), the proposed model does not requireany prespeciﬁed cutoﬀ values of the covariates in classifying patients intodiﬀerent risk groups. The proposed method is potentially useful in clinicalapplications as it allows doctors to include as many important covariates aspossible, some of which may be discovered later on due to medical advances,for obtaining a more accurate risk classiﬁcation.In the LCRM model, we assume that there are G unknown latent θ g i ’s.Instead of the latent class model, we may assume a mixture of the DirichletProcess (MDP) model discussed in Ibrahim, Chen and Sinha (2001) for thecure rate parameters. Speciﬁcally, we assign an unknown θ i to each subjectand then assume a Dirichlet Process prior for θ i . In Section 2.2, we assumea piecewise exponential model for the baseline hazard function h ( y ). Onepossible extension to this is to assume a gamma process prior for h ( y ),which leads to a semiparametric LCRM model. These two extensions of theLCRM model are currently under investigation.APPENDIX: PROOFS OF THEOREMS Proof of Theorem 1.

After summing out N and g , we have π ( β , θ , φ , λ | D obs ) ∝ π ∗ ( β , θ , φ , λ | D obs )= L ( β , θ , φ , λ | D obs ) × " J Y j =1 λ − j G Y k =1 θ a k − k exp( − b k θ k ) π ( φ ) , where L ( β , θ , φ , λ | D obs ) is given by (10). It suﬃces to show that Z π ∗ ( β , θ , φ , λ | D obs ) d β d θ d φ d λ < ∞ . (A.1) ATENT CURE RATE MARKER MODEL It is easy to show that L ( β , θ , φ , λ | D obs ) ≤ Y { i : ν i =1 } θ G exp ( x ′ i β + J X j =1 δ ij log λ j − exp( x ′ i β ) H ∗ ( y i ) ) . Using condition (iv), we can show Z θ dG " G Y k =1 θ a k − k exp( − b k θ k ) π ( φ ) d θ d φ < ∞ due to the constraints, 0 < θ < θ < · · · < θ G , and the condition, a k > b k ≥

0, for k = 1 , , . . . , G −

1. Let π ∗ ( β , λ | D obs ) = " Y { i : ν i =1 } exp ( x ′ i β + J X j =1 δ ij log λ j − exp( x ′ i β ) H ∗ ( y i ) ) × " J Y j =1 λ − j . In order to establish (A.1), we only need to prove Z π ∗ ( β , λ | D obs ) d β d λ < ∞ . (A.2)Consider the transformation u j = log( λ j ), and let u = ( u , . . . , u J ) ′ . Then, dλ j = λ j du j , j = 1 , , . . . , J , and π ∗ ( β , u | D obs ) = π ∗ ( β , λ | D obs ) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ( λ , λ , . . . , λ J ) ∂ ( u , u , . . . , u J ) (cid:12)(cid:12)(cid:12)(cid:12) = J Y j =1 Y { i : ν i =1 } { exp( u j + x ′ i β ) } δ ij × exp ( − δ ij exp( x ′ i β ) × " exp( u j )( y i − s j − )+ j − X l =1 exp( u l )( s l − s l − ) . Letting δ ij i = 1 and δ ij = 0 for j = j i , we have π ∗ ( β , u | D obs ) ≤ π ∗∗ ( β , u | D obs ) S. KIM, Y. XI AND M.-H. CHEN = Y { i : ν i =1 } exp( u j i + x ′ i β ) exp {− ( y i − s j i − ) exp( u j i + x ′ i β ) } , and it suﬃces to show that R π ∗∗ ( β , u | D obs ) d β d u < ∞ . We rewrite π ∗∗ ( β , u | D obs ) as π ∗∗ ( β , u | D obs )= J Y j =1 Y { i : ν i =1 ,j i = j } exp( u j + x ′ i β ) exp {− ( y i − s j − ) exp( u j + x ′ i β ) } = Y j = j ∗ Y { i : ν i =1 ,j i = j } exp( u j + x ′ i β ) exp {− ( y i − s j − ) exp( u j + x ′ i β ) }× Y { i : ν i =1 ,j i = g ∗ } exp( u j + x ′ i β ) exp {− ( y i − s j − ) exp( u j + x ′ i β ) } . Since d j ≥

1, there exists s j − < y i j ≤ s j for j = j ∗ . Thus, Y j = j ∗ Y { i : ν i =1 ,j i = j } exp( u j + x ′ i β ) exp {− ( y i − s j − ) exp( u j + x ′ i β ) }≤ K Y j = j ∗ exp( u j + x ′ i j β ) exp {− ( y i j − s j − ) exp( u j + x ′ i j β ) } , where K > Z Y j = j ∗ Y { i : ν i =1 ,j i = j } exp( u j + x ′ i β ) exp {− ( y i − s j − ) exp( u j + x ′ i β ) }× Y j = j ∗ du j ! ≤ K Y j = j ∗ Z ∞−∞ exp( u j + x ′ i j β ) exp {− ( y i j − s j − ) exp( u j + x ′ i j β ) } du j = K Y j = j ∗ ( y i j − s j − ) − , where K > j = j ∗ , without loss of generality, we as-sume y i ∗ , . . . , y i ∗ p +1 ∈ ( s j ∗ − , s j ∗ ], and X ∗ j ∗ , which has the l th row (1 , x ′ i ∗ l ), ℓ = 1 , . . . , p + 1, is of full rank. Therefore, Y { i : ν i =1 ,j i = j ∗ } exp( u j ∗ + x ′ i β ) exp {− ( y i − s j ∗ − ) exp( u j ∗ + x ′ i β ) }≤ K p +1 Y l =1 exp { u ∗ j + x ′ i ∗ l β } exp {− ( y i ∗ l − s j ∗ − ) exp( u ∗ j + x ′ i ∗ l β ) } , ATENT CURE RATE MARKER MODEL where K > w = ( w , . . . ,w p +1 ) ′ ≡ X ∗ g ∗ (cid:0) u j ∗ β (cid:1) , which is a one-to-one transformation. We have Z R p +1 p +1 Y l =1 exp( u j ∗ + x ′ i l ∗ β ) exp {− ( y i ∗ l − s j ∗ − ) exp( u j ∗ + x ′ i l ∗ β ) } du j ∗ d β ∝ Z R p +1 p +1 Y l =1 exp( w l ) exp {− ( y i ∗ l − s j ∗ − ) exp( w l ) } dw l = p +1 Y l =1 ( y i ∗ l − s j ∗ − ) − . Therefore, Z R p + J π ∗∗ ( β , u | D obs ) d β d u ≤ K (cid:18) Y j = g ∗ ( y i j − s j − ) − (cid:19) p +1 Y l =1 ( y i ∗ l − s j ∗ − ) − ! < ∞ , where K > (cid:3)

Proof of Theorem 2.

It is suﬃcient to show that P ( g new = k | β , θ , φ , λ , t, x new , z new , G ) for k = 1 ( k = G ) increases (decreases) in t . We canrewrite the conditional predictive probability in (18) as P ( g new = k | β , θ , φ , λ , t, x new , z new , G )= exp( z ′ new φ k ) P Gl =1 exp( z ′ new φ l ) exp( − ( θ l − θ k )[1 − exp {− exp( x ′ new β ) H ∗ ( t ) } ]) . Since H ∗ ( t ) is an increasing function of t , θ l − θ > l >

1, and θ l − θ G < l < G . Thus, ˆ p ( k | t, x new , z new , G ) increases in t for k = 1 and decreasesin t for k = G . (cid:3) Acknowledgments.

The authors wish to thank the Editor, the Asso-ciate Editor and three referees for helpful comments and suggestions whichhave improved the paper. The authors also wish to thank Dr. Anthony V.D’Amico of Brigham and Women’s Hospital and Dana Farber Cancer Insti-tute for his useful discussions and providing the prostate cancer data andthe references on risk groups.SUPPLEMENTARY MATERIAL

Checking the proportional hazards assumption and computational de-velopment (DOI: 10.1214/08-AOAS238SUPP; .pdf). In online supplemen-tary material we provide the empirical results for checking the proportional S. KIM, Y. XI AND M.-H. CHEN hazards assumption and the description of the Markov chain Monte Carlo(MCMC) sampling algorithm for a ﬁxed G and the detailed development ofthe reversible jump MCMC. REFERENCES Agresti, A. (2002).

Categorical Data Analysis , 2nd ed. Wiley, New York. MR1914507

Akaike, H. (1973). Information theory and an extension of the maximum likelihoodprinciple. In

International Symposium on Information Theory (B. N. Petrov and F.Csaki, eds.) 267–281. Akademia Kiado, Budapest. MR0483125

Banerjee, T., Chen, M.-H., Dey, D. K. and

Kim, S. (2007). Bayesian analysis ofgeneralized odds-rate hazards models for survival data.

Lifetime Data Anal. Berkson, J. and

Gage, R. P. (1952). Survival curve for cancer patients following treat-ment.

J. Amer. Statist. Assoc. Broet, P., Rycke, Y. D., Tubert-Bitter, P., Lellouch, J., Asselain, B. and

Moreau, T. (2001). A semiparametric approach for the two-sample comparison ofsurvival times with long-term survivors.

Biometrics Chen, M.-H., Ibrahim, J. G. and

Sinha, D. (1999). A new Bayesian model for survivaldata with a surviving fraction.

J. Amer. Statist. Assoc. Cooner, F., Banerjee, S., Carlin, B. P. and

Sinha, D. (2007). Flexible cure ratemodelling under latent activation schemes.

J. Amer. Statist. Assoc.

Cox, D. R. (1972). Regression models and life tables.

J. Roy. Statist. Soc. Ser. B Cowles, M. K. and

Carlin, B. P. (1996). Markov chain Monte Carlo convergencediagnostics: A comparative review.

J. Amer. Statist. Assoc. D’Amico, A. V., Whittington, R., Malkowicz, S. B., Cote, K., Loffredo, M.,Schultz, D., Chen, M.-H., Tomaszewski, J. E., Renshaw, A. A., Wein, A. and

Richie, J. P. (2002). Biochemical outcome following radical prostatectomy or externalbeam radiation therapy for clinically localized prostate cancer in the PSA era.

Cancer D’Amico, A. V., Whittington, R., Malkowicz, S. B., Schultz, D., Tomaszewski,J. E., Kaplan, I., Beard, C. and

Wein, A. (1998). Biochemical outcome after radicalprostatectomy, external beam radiation therapy, or interstitial radiation therapy forclinically localized prostate cancer.

The Journal of the American Medical Association

Dayton, C. M. (1999).

Latent Class Scaling Analysis . Sage, Thousand Oaks, CA.

Etzioni, R., Penson, D. F., Legler, J. M., di Tommaso, D., Boer, R., Gann, P.H. and

Feuer, E. J. (2002). Overdiagnosis due to prostate-speciﬁc antigen screening:Lessons from U.S. prostate cancer incidence trends.

Journal of the National CancerInstitute Gelfand, A. E. and

Dey, D. K. (1994). Bayesian model choice: Asymptotics and exactcalculations.

J. Roy. Statist. Soc. Ser. B Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation andBayesian model determination.

Biometrika Ibrahim, J. G., Chen, M.-H. and

Sinha, D. (2001).

Bayesian Survival Analysis .Springer, New York. MR1876598

Kaplan, E. L. and

Meier, P. (1958). Non-parametric estimation from incomplete ob-servations.

J. Amer. Statist. Assoc. Kim, S., Xi, Y. and

Chen, M.-H. (2009). Supplement to “A new latent cure rate markerfor survival data.” DOI: 10.1214/08-AOAS238SUPP.

Larsen K. (2004). Joint analysis of time-to-event and multiple binary indicators of latentclasses.

Biometrics Lin, H., Turnbull, B. W., McCulloch, C. E. and

Slate, E. H. (2002). Latent classmodels for joint analysis of longitudinal biomarker and event process data: Applicationto longitudinal prostate-speciﬁc antigen readings and prostate cancer.

J. Amer. Statist.Assoc. Liu, M., Lu, W. and

Shao, Y. (2006). Interval mapping of quantitative trait loci fortime-to-event data with the proportional hazards mixture cure model.

Biometrics Lopes, H. F. and

West, M. (2004). Bayesian model assessment in factor analysis.

Statist.Sinica Maller, R. A. and

Zhou, S. (1996).

Survival Analysis With Long Term Survivors . Wiley,New York. MR1453117

Martin, N. E., Chen, M.-H., Catalona, W. J., Loeb, S., Roehl, K. A. and

D’Amico,A. V. (2008). The inﬂuence of serial prostate-speciﬁc antigen (PSA) screening on thePSA velocity at diagnosis.

Cancer

Patterson, B. H., Dayton, C. M. and

Graubard, B. I. (2002). Latent class analysisof complex sample survey data: Application to dietary data (with discussion).

J. Amer.Statist. Assoc. Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002).Bayesian measures of model complexity and ﬁt (with discussion).

J. Roy. Statist. Soc.Ser. B Tsodikov, A. D. (2002). Semiparametric models of long- and short-term survival: Anapplication to the analysis of breast cancer survival in Utah by age and stage.

Statisticsin Medicine Tsodikov, A. D., Ibrahim, J. G. and

Yakovlev, A. Y. (2003). Estimating cure ratesfrom survival data: An alternative to two-component mixture models.

J. Amer. Statist.Assoc. Wang, L. and

Arnold, K. (2002). Press release: Prostate cancer incidence trends revealextent of screening-related overdiagnosis.

Journal of the National Cancer Institute Yakovlev, A. Y. and

Tsodikov, A. D. (1996).

Stochastic Models of Tumor Latency andTheir Biostatistical Applications . World Scientiﬁc, River Edge, NJ.

Yakovlev, A. Y., Asselain, B., Bardou, V. J., Fourquet, A., Hoang, T.,Rochefediere, A. and

Tsodikov, A. D. (1993). A simple stochastic model of tumorrecurrence and its applications to data on premenopausal breast cancer. In

Biometrieet Analyse de Dormees Spatio-Temporelles (B. Asselain, M. Boniface, C. Duby, C.Lopez, J. P. Masson and J. Tranchefort, eds.) 66–82. Soci´et´e Francaise de Biom´etrie,ENSA Renned, France. S. KimDivision of EpidemiologyStatistics and Prevention ResearchNational Institute of Child Healthand Human Development, NIHRockville, Maryland 20852USAE-mail: [email protected]

Y. XiBiogen Idec Inc.14 Cambridge CenterCambridge, Massachusetts 02142USAE-mail: [email protected] S. KIM, Y. XI AND M.-H. CHEN