Bayesian Survival Analysis Using Gamma Processes with Adaptive Time Partition
arXiv preprint [stat.ME]
Yi Li
Department of Mathematics, Northeastern University
Sumi Seo
Department of Mathematics, Northeastern University
Kyu Ha Lee
Department of Nutrition, Department of Epidemiology, Department of Biostatistics, Harvard T.H. Chan School of Public Health
August 6, 2020
Abstract
In Bayesian semi-parametric analyses of time-to-event data, non-parametric process priors are adopted for the baseline hazard function or the cumulative baseline hazard function for a given finite partition of the time axis. However, it would be controversial to suggest a general guideline for constructing an optimal time partition. While a great deal of research has been done to relax the assumption of fixed split times for other non-parametric processes, to our knowledge, no methods have been developed for a gamma process prior, which is one of the most widely used in Bayesian survival analysis. In this paper, we propose a new Bayesian framework for proportional hazards models where the cumulative baseline hazard function is modeled a priori by a gamma process. A key feature of the proposed framework is that the number and position of interval cutpoints are treated as random and estimated based on their posterior distributions.
Keywords:
Bayesian survival analysis; gamma process; proportional hazards model; reversible jump Markov chain Monte Carlo
1 Introduction
Non-parametric and semi-parametric Bayesian analyses of time-to-event data are well accepted in practice because they enable more flexible modeling strategies with fewer assumptions. While the semi-parametric Cox proportional hazards model (Cox, 1972) avoids modeling the baseline hazard function by maximizing the partial likelihood, the Bayesian paradigm requires explicit parametrization of the baseline hazard function. To this end, various types of non-parametric prior processes have been developed, including the gamma process (Kalbfleisch, 1978), beta process (Hjort et al., 1990), and Dirichlet process (Gelfand and Mallick, 1995), which can be used to model the cumulative baseline hazard or distribution functions. A broad overview can be found in Ibrahim et al. (2005).

The gamma process is one of the most widely used non-parametric process priors for the cumulative baseline hazard in Bayesian proportional hazards models (Kalbfleisch, 1978; Clayton, 1991; Sinha et al., 1999, 2003, 2015). It has been adopted in many applications, including multivariate methods (Dunson and Chen, 2004; Sen et al., 2010; Cai, 2010; Sreedevi and Sankaran, 2012; Ouyang et al., 2013), variable selection methods (Lee et al., 2011; Savitsky et al., 2011; Gu et al., 2013; Zhao et al., 2014; Lee et al., 2015a), and other survival models (Gelfand and Kottas, 2003; Cho et al., 2009; Zhao et al., 2015). In the absence of prior information, in survival models where a gamma process prior is used for the cumulative baseline hazard function (Burridge, 1981; Lee et al., 2011; Savitsky et al., 2011; Gu et al., 2013; Zhao et al., 2014; Lee et al., 2015a), the time partition is specified either based on uniquely ordered failure times or on equi-length intervals conditional on a prespecified number of interval cutpoints. The use of equally spaced intervals may lead to a time partition in which some of the intervals contain few failures, and this possibility increases as the number of splits is set to a large value.
On the other hand, setting the partition such that each interval contains the same number of failures may result in oversimplified estimation of the baseline hazard over a long period of time. Thus, fixing the partition of the baseline hazard timescale may impose unreasonable structure on the model, and consequently standard error estimates may not reflect the additional uncertainty associated with the number and position of the split times (Haneuse et al., 2008). Therefore, it is very important to specify the optimal time partition in the Bayesian semi-parametric analysis of time-to-event data.

While there have been studies on methods to relax the assumption of a fixed time partition for models with other non-parametric processes (Arjas and Gasbarra, 1994; Haneuse et al., 2008; Lee et al., 2015b), to our knowledge, no such methods have been developed for the gamma process prior on the cumulative baseline hazard function. In this paper, we propose a Bayesian framework that allows the number and position of interval splits to be determined by the data. We present an efficient computational scheme to fit the proposed models based on a reversible jump Markov chain Monte Carlo (MCMC) algorithm (Green, 1995).

The remainder of the paper is organized as follows. Section 2 describes the proposed Bayesian analysis framework, including the specification of prior distributions. Section 3 provides a detailed computational algorithm for obtaining samples from the joint posterior and two metrics for comparing goodness of fit across model specifications. Section 4 presents simulation studies under four different scenarios to assess the performance of the proposed methods. In Section 5, we apply the proposed framework to four different real-life survival datasets. Section 6 concludes with discussion.
2 The Proposed Bayesian Framework

Let T denote the survival time of an individual in the population. Under the Cox proportional hazards model (Cox, 1972), the hazard at time t for an individual with covariate vector x is given by

    h(t | x) = h_0(t) exp(x^T β),    (1)

where h_0(t) is the baseline hazard function and β is a vector of regression coefficients. In the Bayesian paradigm, one is required to specify the baseline hazard function. We consider a non-parametric specification by taking the baseline hazard function to be a flexible mixture of piecewise constant functions (McKeague and Tighiouart, 2000). To this end, we construct a finite partition of the time axis, s = {0 = s_0 < s_1 < ... < s_{J+1}}, with s_{J+1} ≡ s_max > t_i for all i = 1, ..., n, so that the survival time t_i of the i-th subject falls into one of the J + 1 disjoint intervals. Given the partition (J, s), we assume

    h_0(t) = Σ_{j=1}^{J+1} h_j / (s_j − s_{j−1}) 1(s_{j−1} < t ≤ s_j),

where h_j denotes the increment in the cumulative baseline hazard over the interval (s_{j−1}, s_j]. A priori, the cumulative baseline hazard is assigned a gamma process prior with mean function H*(t) = η t^κ and confidence parameter c, so that the increments h_j are independent Gamma(c η s_j^κ − c η s_{j−1}^κ, c) random variables.

3 Posterior Computation

To perform posterior estimation and inference, we use a random-scan Gibbs sampling algorithm to generate samples from the joint posterior distribution. Since updating the time partition (J, s) requires a change in the dimension of the parameter space, we develop our computational scheme based on a Metropolis-Hastings-Green (MHG) step (Green, 1995).
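As a concrete illustration of the piecewise specification above, the baseline and cumulative baseline hazards can be evaluated as follows. This is a minimal sketch, not the authors' implementation; the function names and the list-based representation (s holding the full partition s_0, ..., s_{J+1}, and h holding the increments h_1, ..., h_{J+1}) are ours.

```python
import bisect

def baseline_hazard(t, s, h):
    """Piecewise-constant baseline hazard h_0(t): on (s_{j-1}, s_j] the
    hazard equals the increment h_j divided by the interval length."""
    j = bisect.bisect_left(s, t)          # index j with s[j-1] < t <= s[j]
    return h[j - 1] / (s[j] - s[j - 1])

def cumulative_hazard(t, s, h):
    """H_0(t): full increments of completed intervals plus a linear
    share of the increment of the interval containing t."""
    j = bisect.bisect_left(s, t)
    return sum(h[:j - 1]) + h[j - 1] * (t - s[j - 1]) / (s[j] - s[j - 1])
```

With this representation, cumulative_hazard(s_max, s, h) simply returns the sum of all the increments h_j.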
A detailed description of the complete algorithm, together with all necessary full conditional posterior distributions, is provided below. In the first step of the random-scan Gibbs sampling scheme, we select a move from the following four possible choices:

move RP: updating the regression parameters β
move BH: updating the parameters h for the cumulative baseline hazard
move BI: creating a new split time
move DI: removing a split time

If J is the number of time splits at the current iteration, the probabilities for moves BI and DI are

    π_BI^J = ρ min{1, Poisson(J + 1 | α) / Poisson(J | α)} = ρ min{1, α / (J + 1)},
    π_DI^J = ρ min{1, Poisson(J − 1 | α) / Poisson(J | α)} = ρ min{1, J / α},

where ρ is determined so that π_BI^J + π_DI^J < C for a constant 0 < C < 1 and all J = 1, ..., J_max. Here J_max is a preassigned upper limit on the number of time splits, and we set π_BI^{J_max} = 0. For the remaining moves, π_RP = π_BH = (1 − π_BI^J − π_DI^J)/2.

Move RP. Let D_j and R_j denote the sets of subjects who fail in and are at risk during the interval (s_{j−1}, s_j], respectively, and let Δ_j(t) = max{0, min(t, s_j) − s_{j−1}} be the time spent in the j-th interval up to t. The full conditional posterior distribution for β_m, m = 1, ..., p, is given by

    π(β_m | β_{(−m)}, h) ∝ Π_{j=1}^{J+1} Π_{k ∈ D_j} exp(x_{k,m} β_m) exp{ − Σ_{l ∈ R_j} Δ_j(y_l) h_j / (s_j − s_{j−1}) exp(x_l^T β) },

where β_{(−m)} is β without the element β_m. Since the full conditional does not have a standard form, we update each regression coefficient using a random walk Metropolis-Hastings (MH) algorithm.

Move BH. The full conditional posterior distribution for h_j, j = 1, ..., J + 1, is given by

    π(h_j | h_{(−j)}, β) ∝ h_j^{d_j + c η s_j^κ − c η s_{j−1}^κ − 1} exp[ − h_j { Σ_{l ∈ R_j} Δ_j(y_l) exp(x_l^T β) / (s_j − s_{j−1}) + c } ],

where d_j is the number of failures in the j-th interval. Since the full conditional does not have a standard form, we update the h_j's using a random walk MH algorithm.

Move BI. For a given partition (J, s), the cumulative baseline hazard function is H_0(t) = Σ_{j=1}^{J+1} Δ_j(t) h_j / (s_j − s_{j−1}). Updating this specification requires generating a proposal split and then deciding whether or not to accept the proposal.
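The move-selection probabilities above can be sketched as follows. This is an illustrative sketch, not the authors' code; the default values for ρ and J_max are placeholders, and with ρ ≤ 0.4 the bound π_BI + π_DI ≤ 2ρ = C < 1 holds for every J.

```python
def move_probabilities(J, alpha, rho=0.4, J_max=50):
    """Probabilities of the four moves at a state with J split times.

    rho is chosen so that pi_BI + pi_DI stays below a constant C < 1;
    pi_BI is forced to zero at the preassigned maximum J_max."""
    pi_BI = 0.0 if J >= J_max else rho * min(1.0, alpha / (J + 1))
    pi_DI = rho * min(1.0, J / alpha)      # zero automatically when J = 0
    # the remaining probability is split between the fixed-dimension moves
    pi_RP = pi_BH = (1.0 - pi_BI - pi_DI) / 2.0
    return {"RP": pi_RP, "BH": pi_BH, "BI": pi_BI, "DI": pi_DI}
```

For example, with J = 3, α = 10, and ρ = 0.4, the birth move is proposed with probability 0.4 and the death move with probability 0.12, leaving 0.24 each for the RP and BH updates.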
First, we proceed by selecting a proposal split time s* uniformly from among the observed event times that are not included in the current partition. Suppose s* falls in the interval (s_{j−1}, s_j] of the current partition. The induced proposal partition can then be written as

    (0 = s_0, ..., s_{j−1}, s*, s_j, ..., s_{J+1} = s_max) ≡ (0 = s*_0, ..., s*_{j−1}, s*_j, s*_{j+1}, ..., s*_{J+2} = s_max).

Second, we calculate the increments h*_j and h*_{j+1} of the two new intervals created by the split at time s*. To ensure that the old height h_j / (s_j − s_{j−1}) is a compromise between the two new heights h*_j / (s* − s_{j−1}) and h*_{j+1} / (s_j − s*), the former is taken to be the weighted mean of the latter on the log scale:

    (s* − s_{j−1}) log{ h*_j / (s* − s_{j−1}) } + (s_j − s*) log{ h*_{j+1} / (s_j − s*) } = (s_j − s_{j−1}) log{ h_j / (s_j − s_{j−1}) }.

We define the multiplicative perturbation

    { h*_j (s_j − s*) } / { h*_{j+1} (s* − s_{j−1}) } = (1 − U) / U,   where U ~ Uniform(0, 1).

The new increments h*_j and h*_{j+1} are then given by

    h*_j = h_j { (s* − s_{j−1}) / (s_j − s_{j−1}) } { (1 − U) / U }^{(s_j − s*) / (s_j − s_{j−1})},
    h*_{j+1} = h_j { (s_j − s*) / (s_j − s_{j−1}) } { U / (1 − U) }^{(s* − s_{j−1}) / (s_j − s_{j−1})}.

The Jacobian of the transformation (h_j, U) → (h*_j, h*_{j+1}) is

    | ∂(h*_j, h*_{j+1}) / ∂(h_j, U) | = h_j (s* − s_{j−1})(s_j − s*) / [ (s_j − s_{j−1})^2 { (1 − U)^{(s* − s_{j−1})} U^{(s_j − s*)} }^{2/(s_j − s_{j−1})} ].
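The two displays above determine (h*_j, h*_{j+1}) from (h_j, U). A small sketch (names are ours) that also verifies the weighted log-mean constraint and the perturbation ratio numerically for one draw:

```python
import math

def birth_heights(h_j, s_prev, s_j, s_star, U):
    """Increments of the two intervals created by splitting (s_prev, s_j]
    at s_star, given the old increment h_j and U ~ Uniform(0, 1)."""
    a, b, w = s_star - s_prev, s_j - s_star, s_j - s_prev
    r = (1.0 - U) / U                       # multiplicative perturbation
    h_star_j = h_j * (a / w) * r ** (b / w)
    h_star_j1 = h_j * (b / w) * r ** (-a / w)
    return h_star_j, h_star_j1

# check the two defining constraints for an arbitrary configuration
h_j, s_prev, s_j, s_star, U = 1.2, 0.0, 2.0, 0.5, 0.3
hj, hj1 = birth_heights(h_j, s_prev, s_j, s_star, U)
a, b, w = s_star - s_prev, s_j - s_star, s_j - s_prev
assert math.isclose((hj / a) / (hj1 / b), (1 - U) / U)        # height ratio
assert math.isclose(a * math.log(hj / a) + b * math.log(hj1 / b),
                    w * math.log(h_j / w))                    # weighted log-mean
```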
Finally, the acceptance probability in the MHG step is computed as the minimum of one and the product of the likelihood ratio, prior ratio, proposal ratio, and Jacobian:

i) Likelihood ratio: L(β, h*) / L(β, h).

ii) Prior ratio:

    { Poisson(J + 1 | α) / Poisson(J | α) } × { Π_{g=j}^{j+1} Gamma(h*_g | c η (s*_g)^κ − c η (s*_{g−1})^κ, c) / Gamma(h_j | c η s_j^κ − c η s_{j−1}^κ, c) } × { (2J + 3)(2J + 2)(s* − s_{j−1})(s_j − s*) / [ s_max^2 (s_j − s_{j−1}) ] }.

iii) Proposal ratio:

    π_DI^{J+1} ♯{y_i : δ_i = 1} / { π_BI^J (J + 1) Uniform(U | 0, 1) }.

iv) Jacobian:

    h_j (s* − s_{j−1})(s_j − s*) / [ (s_j − s_{j−1})^2 { (1 − U)^{(s* − s_{j−1})} U^{(s_j − s*)} }^{2/(s_j − s_{j−1})} ].

Move DI. The acceptance probability for the corresponding reverse move has the same form, with the appropriate relabelling of the partitions and variables and with the ratio terms inverted. Suppose we remove a randomly selected split time s_j. The proposal partition of the time axis then consists of J − 1 time splits:

    (0 = s_0, ..., s_{j−1}, s_{j+1}, ..., s_{J+1} = s_max) ≡ (0 = s*_0, ..., s*_{j−1}, s*_j, s*_{j+1}, ..., s*_J = s_max).

As for move BI,

    (s_j − s_{j−1}) log{ h_j / (s_j − s_{j−1}) } + (s_{j+1} − s_j) log{ h_{j+1} / (s_{j+1} − s_j) } = (s*_j − s*_{j−1}) log{ h*_j / (s*_j − s*_{j−1}) },
    h_j (s_{j+1} − s_j) / { h_{j+1} (s_j − s_{j−1}) } = (1 − U*) / U*.

The four components of the acceptance probability can then be written as follows:

i) Likelihood ratio: L(β, h*) / L(β, h).

ii) Prior ratio:

    { Poisson(J − 1 | α) / Poisson(J | α) } × { Gamma(h*_j | c η (s*_j)^κ − c η (s*_{j−1})^κ, c) / Π_{g=j}^{j+1} Gamma(h_g | c η s_g^κ − c η s_{g−1}^κ, c) } × { s_max^2 (s_{j+1} − s_{j−1}) / [ (2J + 1)(2J)(s_j − s_{j−1})(s_{j+1} − s_j) ] }.

iii) Proposal ratio:

    π_BI^{J−1} J Uniform(U* | 0, 1) / { π_DI^J ♯{y_i : δ_i = 1} }.

iv) Jacobian:

    (s_{j+1} − s_{j−1})^2 { (1 − U*)^{(s_j − s_{j−1})} (U*)^{(s_{j+1} − s_j)} }^{2/(s_{j+1} − s_{j−1})} / [ h*_j (s_j − s_{j−1})(s_{j+1} − s_j) ].

In practice, analysts must balance model complexity with the limitations of the information available in the data.
To this end, we develop two model comparison metrics based on the deviance information criterion (DIC) (Spiegelhalter et al., 2002) and the log-pseudo marginal likelihood (LPML) statistic (Geisser and Eddy, 1979). Various DIC constructions and extensions have been compared and tested in the case of mixtures of distributions and random effects models (Celeux et al., 2006). Since our proposed framework involves the PEM structure, we consider the DIC measure suggested in Celeux et al. (2006) and estimate the quantity by the following Monte Carlo approximation:

    DIC_GP = − (4/R) Σ_{r=1}^R log{ Π_{i=1}^n L(D_i | Φ^(r)) } + 2 log{ Π_{i=1}^n (1/R) Σ_{r=1}^R L(D_i | Φ^(r)) },

where Φ^(r) denotes the value of Φ at the r-th MCMC iteration, r = 1, ..., R. Note, a model with smaller DIC_GP indicates a better fit of the model to the data. A general rule of thumb for model comparison is to consider differences of less than 2 to be negligible, differences between 2 and 6 to be indicative of positive support for the model with the lower value, and differences greater than 6 to be strong support in favor of the model with the lower value (Spiegelhalter et al., 2002).

The LPML criterion is computed as Σ_{i=1}^n log CPO_i, where the subject-specific conditional predictive ordinate (CPO) is given by CPO_i = L(D_i | D_{−(i)}). Following Chen et al. (2012), CPO can be approximated via a Monte Carlo estimator:

    CPO_i ≈ { (1/R) Σ_{r=1}^R L(D_i | Φ^(r))^{−1} }^{−1}.

Since CPO_i is the marginal posterior predictive density of the outcome for the i-th subject given a dataset excluding that subject, a larger value of LPML indicates a better fit to the data. For practical interpretation, one can compute the so-called pseudo-Bayes factor (PBF) for two models by exponentiating the difference in their LPML values (Hanson, 2006).

Table 1: A summary of the four simulation scenarios explored in Section 4.
Scenario   n     Censoring rate   Distribution of the baseline hazard function
1          300   30%              Weibull
2          300   30%              Piecewise linear
3          300   50%              Weibull
4          100   30%              Weibull

Figure 1: Induced baseline survival function S(t) based on the hyperparameters considered in Sections 4 and 5.

4 Simulation Studies

We refer to the proposed Bayesian framework as a PEM-gamma process model with a reversible jump MCMC scheme (GP-RJ). The overarching goals of the simulation studies are to investigate the small-sample operating characteristics of the proposed GP-RJ framework and to compare its performance with a PEM-gamma process model with a fixed time partition with equally spaced intervals (GP-EQ). The GP-EQ model can be implemented by running GP-RJ with π_BI = π_DI set to 0.

4.1 Setting

We consider four data scenarios that vary in terms of sample size, censoring rate, and the true underlying baseline hazard distribution. The simulation setting is similar to that in Lee et al. (2016), and a summary of the scenarios is provided in Table 1. In Scenarios 1, 3, and 4, the baseline hazard function is set to correspond to the hazard of a Weibull distribution; in Scenario 2, the baseline hazard is a piecewise linear function of t with parameters b and k. Scenario 4 uses a smaller sample size (n = 100).

For each of the four scenarios we generated 500 simulated datasets under the model outlined in Table 1. We specified that the hazard function depended on three covariates: x_{i,1}, x_{i,2} ~ Normal(0, 1) and x_{i,3} ~ Bernoulli(0.5), with the regression coefficients (β_1, β_2, β_3) fixed across scenarios. We used the simSurv function in the SemiCompRisks R package (Alvares et al., 2019) to simulate the survival data.

For each of the 500 datasets generated under each of the four scenarios, we fit the GP-RJ and GP-EQ models. For the GP-RJ model, we set the prior Poisson rate on the number of split times to be α = 10.
We specified that the time partition for the GP-EQ model was constructed with 11 equi-length intervals (J = 10). For both models, we set the hyperparameters (η, κ, c); the induced prior for the baseline survival function S(t) = exp(−H_0(t)) based on this choice of hyperparameters is given in Figure 1. We ran two independent chains for a total of 1 million scans each, with the first half taken as burn-in. We computed the Gelman-Rubin potential scale reduction factors (PSRF) (Gelman et al., 2013) to assess convergence, specifically requiring the statistics to be less than 1.05 for all model parameters.

Figure 2: Simulation studies: (a)-(d) estimated baseline hazard function, h_0(t), for the two analyses described in Section 4 under Scenarios 1-4; (e) the posterior distribution of J from the analysis under the GP-RJ model.

Table 2: Estimated percent bias (PB), coverage probability (CP), and average relative width of 95% credible intervals (RW) for β for the two analyses described in Section 4, across the four simulation scenarios given in Table 1. For RW, GP-RJ was taken as the referent; throughout, values are based on results from 500 simulated datasets.

                          PB              CP              RW
Scenario        True   GP-RJ  GP-EQ   GP-RJ  GP-EQ   GP-RJ  GP-EQ
1        β_3    -0.5    2.2    2.7    0.95   0.96    1.00   1.00
2        β_3    -0.5   -0.4   -0.6    0.95   0.96    1.00   1.00
3        β_3    -0.5    2.6    3.3    0.95   0.95    1.00   1.00
4        β_3    -0.5    0.9    0.8    0.95   0.95    1.00   1.00

4.2 Results

In Figure 2 (a)-(d), we present the mean estimated baseline hazard function under the four scenarios using the GP-RJ and GP-EQ models. Both models performed well in terms of estimation of the baseline hazard function.
In Scenarios 2 and 3, the baseline hazard from the analyses tended to be underestimated for t > 60. This was due to relatively fewer events being generated for the later time period under these two simulation scenarios. Note that the proposed GP-RJ model successfully smoothed out the process, while GP-EQ resulted in a piecewise constant baseline hazard, as in the conventional PEM model. In Figure 2 (e), we present the distribution of the estimated number of time splits (J) for GP-RJ. While providing a smooth estimated baseline hazard function close to the truth, GP-RJ with α = 10 utilized a smaller number of split times compared to GP-EQ with J = 10 across the four scenarios. In addition, the more complicated the underlying hazard function (Scenario 2) and the less information available from the data (Scenarios 3 and 4), the more split times GP-RJ tended to create.

Table 2 indicates that the proposed GP-RJ also performed well in terms of estimation and inference for β. Percent bias was small across the board, and the estimated coverage probabilities were all close to the nominal 0.95. While GP-EQ yielded point estimates of β that were slightly more biased compared to GP-RJ, the difference between the two models was not significant because of the nature of the proportional hazards assumption.

5 Applications

In this section, we demonstrate the practical utility of our proposed models using four different time-to-event datasets: Dutch male laryngeal cancer data (DLC) (Kardaun, 1983), German breast cancer data (GBC) (Schumacher et al., 1994), Netherlands Cancer Institute breast cancer data (NBC) (Van De Vijver et al., 2002), and American Association for Cancer Research breast cancer data (ABC) (Shapiro et al., 1993). We provide a summary of the four datasets in Table 3.

Table 3: A summary of the datasets analyzed in Section 5

Data ID       Description                         n     p   Censoring rate
Data 1. DLC   Dutch male laryngeal cancer data    90    3   44%
Data 2. GBC   German breast cancer data           686   8   56%
Data 3. NBC   NCI breast cancer data              144   5   67%
Data 4. ABC   AACR breast cancer data             255   2   60%

We fit the proposed GP-RJ models to the four real-life survival datasets. In the absence of further information, we set the Poisson rate parameter α to 5, 10, and 20, which reflected an a priori expectation of 6, 11, and 21 intervals on the hazard timescale, respectively. For illustration, we also present analyses based on GP-EQ models where the time partition was prespecified with three different values of J = 5, 10, 20. Additional analyses were conducted with a fixed time partition constructed based on uniquely ordered event times, which we refer to as GP-UQ. For all of GP-RJ, GP-EQ, and GP-UQ, we set the hyperparameters (η, κ, c) as in Section 4.

Table 4: DIC and LPML for the seven models fit to the four datasets. The smallest DIC and the largest LPML values are highlighted in bold.

                   DLC             GBC             NBC             ABC
               DIC    LPML     DIC    LPML     DIC    LPML     DIC    LPML
GP-EQ, J=5     38.5   -19.9    510.9  -256.6   187.3  -93.3    207.2  -103.6
GP-EQ, J=10    38.4   -19.8    508.0  -255.2   179.4  -90.1    206.5  -103.3
GP-EQ, J=20    39.2   -20.3    509.6  -255.7   187.5  -92.9    207.0  -103.5
GP-UQ†
GP-RJ, α=5     38.1   -19.6    509.7  -255.9   180.9  -89.4    205.9  -103.0
GP-RJ, α=10    38.1   -19.6    506.8  -254.4
GP-RJ, α=20

† s is set to the uniquely ordered event times, giving J = 34 and J = 103 for the DLC and ABC data, respectively.

Table 4 presents DIC and LPML for the models we compared. Focusing on the analyses where J for GP-EQ and α for GP-RJ were set to the same value, the GP-RJ model had equal or better fits to the four datasets than the corresponding GP-EQ model.
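The DIC_GP, LPML, and PBF quantities reported in Table 4 are computed from per-subject likelihood evaluations across MCMC draws. A minimal sketch of the Monte Carlo estimators from Section 3 (function names are ours; lik[r][i] stands for L(D_i | Φ^(r)) on the likelihood, not log, scale):

```python
import math

def lpml(lik):
    """LPML = sum_i log CPO_i, with CPO_i estimated by the harmonic mean
    of the per-draw likelihood contributions."""
    R, n = len(lik), len(lik[0])
    return sum(-math.log(sum(1.0 / row[i] for row in lik) / R)
               for i in range(n))

def dic_gp(lik):
    """DIC_GP = -(4/R) sum_r log prod_i L(D_i|Phi^(r))
                + 2 log prod_i mean_r L(D_i|Phi^(r))."""
    R, n = len(lik), len(lik[0])
    t1 = -4.0 / R * sum(math.log(row[i]) for row in lik for i in range(n))
    t2 = 2.0 * sum(math.log(sum(row[i] for row in lik) / R) for i in range(n))
    return t1 + t2

def pseudo_bayes_factor(lpml_a, lpml_b):
    """PBF of model A versus model B: exponentiated LPML difference."""
    return math.exp(lpml_a - lpml_b)
```

For instance, an LPML difference of 0.8 in favor of one model corresponds to a PBF of exp(0.8) ≈ 2.2 for that model.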
Figure 3: Estimated baseline hazard function, h_0(t), for the two models fit to the four real-life survival datasets outlined in Table 3.

Figure 4: Analyses of the four real-life survival datasets outlined in Table 3: (a)-(d) the posterior distribution of J; (e)-(h) the posterior probabilities associated with the positions of the split times s.

This was not surprising because the posterior medians of J from GP-RJ were 12, 17, and 8 for the DLC, GBC, and ABC data (see Figure 4 (a), (b), and (d)), which were smaller than the prior Poisson rates α set for the analyses. In Figure 4 (e)-(h), we present the posterior probabilities associated with the positions of the split times s. Note that the shape of the distribution of posterior probabilities of s was generally consistent with the smooth or abrupt changes in the pattern of the baseline hazard (Figure 3).

6 Discussion

We have described a new Bayesian framework for proportional hazards models in which the cumulative baseline hazard function is modeled a priori by a gamma process. The proposed approach helps avoid the difficult task of specifying the number of time splits and their positions by employing a random split-times model and a reversible jump MCMC scheme. When compared to the model with the fixed time partition (GP-EQ), the proposed GP-RJ characterized the baseline hazard function as a notably smoother mixture of piecewise constant functions (Figures 2 and 3).
Although requiring extra steps to estimate (J, s), GP-RJ generally performed well with a smaller number of splits J (Figures 2 (e) and 4 (a)-(d)), exhibited smaller bias for the regression coefficients (Table 2), and yielded a better model fit (Table 4) than the corresponding GP-EQ.

To our best knowledge, this work is the first attempt to relax the assumption of a fixed time partition for the proportional hazards model where a gamma process prior is used for the cumulative baseline hazard function. The proposed framework provides researchers with valid statistical approaches to overcome a major barrier to the practical use of gamma process models in the analysis of survival data.

References

Alvares, D., Haneuse, S., Lee, C., and Lee, K. H. (2019). SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data. The R Journal, 11(1):376.

Arjas, E. and Gasbarra, D. (1994). Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler. Statistica Sinica, pages 505–524.

Burridge, J. (1981). Empirical Bayes analysis of survival time data. Journal of the Royal Statistical Society: Series B, pages 65–75.

Cai, B. (2010). Bayesian semiparametric frailty selection in multivariate event time data. Biometrical Journal, 52(2):171–185.

Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M., et al. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4):651–673.

Chen, M.-H., Shao, Q.-M., and Ibrahim, J. G. (2012). Monte Carlo Methods in Bayesian Computation. Springer Science & Business Media.

Cho, H., Ibrahim, J. G., Sinha, D., and Zhu, H. (2009). Bayesian case influence diagnostics for survival models. Biometrics, 65(1):116–124.

Clayton, D. G. (1991). A Monte Carlo method for Bayesian inference in frailty models. Biometrics, pages 467–485.

Cox, D. (1972). Regression models and life-tables.
Journal of the Royal Statistical Society: Series B, pages 187–220.

Dunson, D. B. and Chen, Z. (2004). Selecting factors predictive of heterogeneity in multivariate event time data. Biometrics, 60(2):352–358.

Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74(365):153–160.

Gelfand, A. E. and Kottas, A. (2003). Bayesian semiparametric regression for median residual life. Scandinavian Journal of Statistics, 30(4):651–665.

Gelfand, A. E. and Mallick, B. K. (1995). Bayesian analysis of proportional hazards models built from monotone functions. Biometrics, pages 843–852.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. CRC Press.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732.

Gu, X., Yin, G., and Lee, J. J. (2013). Bayesian two-step lasso strategy for biomarker selection in personalized medicine development for time-to-event endpoints. Contemporary Clinical Trials, 36(2):642–650.

Haneuse, S., Rudser, K. D., and Gillen, D. L. (2008). The separation of timescales in Bayesian survival modeling of the time-varying effect of a time-dependent exposure. Biostatistics, 9(3):400–410.

Hanson, T. E. (2006). Inference for mixtures of finite Polya tree models. Journal of the American Statistical Association, 101(476):1548–1565.

Hjort, N. L. et al. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, 18(3):1259–1294.

Ibrahim, J., Chen, M., and Sinha, D. (2005). Bayesian Survival Analysis. Wiley.

Kalbfleisch, J. (1978). Non-parametric Bayesian analysis of survival time data. Journal of the Royal Statistical Society: Series B (Methodological), pages 214–221.

Kardaun, O. (1983). Statistical survival analysis of male larynx-cancer patients: a case study.
Statistica Neerlandica, 37(3):103–125.

Lee, K. H., Chakraborty, S., and Sun, J. (2011). Bayesian variable selection in semiparametric proportional hazards model for high dimensional survival data. The International Journal of Biostatistics, 7(1):21.

Lee, K. H., Chakraborty, S., and Sun, J. (2015a). Survival prediction and variable selection with simultaneous shrinkage and grouping priors. Statistical Analysis and Data Mining, 8(2):114–127.

Lee, K. H., Dominici, F., Schrag, D., and Haneuse, S. (2016). Hierarchical models for semicompeting risks data with application to quality of end-of-life care for pancreatic cancer. Journal of the American Statistical Association, 111(515):1075–1095.

Lee, K. H., Haneuse, S., Schrag, D., and Dominici, F. (2015b). Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis. Journal of the Royal Statistical Society, Series C, 64(2):253–273.

McKeague, I. W. and Tighiouart, M. (2000). Bayesian estimators for conditional hazard functions. Biometrics, 56(4):1007–1015.

Ouyang, B., Sinha, D., Slate, E. H., and Van Bakel, A. B. (2013). Bayesian analysis of recurrent event with dependent termination: an application to a heart transplant study. Statistics in Medicine, 32(15):2629–2642.

Savitsky, T., Vannucci, M., and Sha, N. (2011). Variable selection for nonparametric Gaussian process priors: models and computational strategies. Statistical Science, 26(1):130.

Schumacher, M., Bastert, G., Bojar, H., Huebner, K., Olschewski, M., Sauerbrei, W., Schmoor, C., Beyerle, C., Neumann, R., and Rauschecker, H. (1994). Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. Journal of Clinical Oncology, 12(10):2086–2093.

Sen, A., Banerjee, M., Li, Y., and Noone, A.-M. (2010).
A Bayesian approach to competing risks analysis with masked cause of death. Statistics in Medicine, 29(16):1681–1695.

Shapiro, C. L., Gelman, R. S., Hayes, D. F., Osteen, R., Obando, A., Canellos, G. P., Frei III, E., and Henderson, I. C. (1993). Comparison of adjuvant chemotherapy with methotrexate and fluorouracil with and without cyclophosphamide in breast cancer patients with one to three positive axillary lymph nodes. JNCI: Journal of the National Cancer Institute, 85(10):812–817.

Sinha, A., Chi, Z., and Chen, M.-H. (2015). Bayesian inference of hidden gamma wear process model for survival data with ties. Statistica Sinica, 25(4):1613.

Sinha, D., Chen, M.-H., and Ghosh, S. K. (1999). Bayesian analysis and model selection for interval-censored survival data. Biometrics, 55(2):585–590.

Sinha, D., Ibrahim, J. G., and Chen, M.-H. (2003). A Bayesian justification of Cox's partial likelihood. Biometrika, 90(3):629–641.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64(4):583–639.

Sreedevi, E. and Sankaran, P. (2012). A semiparametric Bayesian approach for the analysis of competing risks data. Communications in Statistics - Theory and Methods, 41(15):2803–2818.

Van De Vijver, M. J., He, Y. D., Van't Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., et al. (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine, 347(25):1999–2009.

Zhao, L., Feng, D., Bellile, E. L., and Taylor, J. M. (2014). Bayesian random threshold estimation in a Cox proportional hazards cure model. Statistics in Medicine, 33(4):650–661.

Zhao, L., Shi, J., Shearon, T. H., and Li, Y. (2015). A Dirichlet process mixture model for survival outcome data: assessing nationwide kidney transplant centers.