Kullback-Leibler-Based Discrete Relative Risk Models for Integration of Published Prediction Models with New Dataset
Di Wang, Wen Ye, Kevin He
Department of Biostatistics, University of Michigan, Ann Arbor, MI
Abstract
Existing literature for prediction of time-to-event data has primarily focused on risk factors from an individual dataset. However, these analyses may suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios. To improve prediction stability and better understand risk factors associated with outcomes of interest, we propose a Kullback-Leibler-based discrete relative risk modeling procedure. Simulations and real data analysis are conducted to show the advantage of the proposed methods compared with those based solely on the local dataset or prior models.
Introduction
Prior research for predicting survival outcomes has primarily focused on risk factors from an individual dataset. These analyses may suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios. To address these limitations and improve prediction stability, we propose a Kullback-Leibler-based discrete relative risk modeling procedure to aggregate the new data with previously published prediction models.

Our endeavor is motivated by the prediction of kidney post-transplantation mortality. With a limited supply of donor kidneys and an increasing need for kidney transplantation, methods to accurately identify optimal organ allocation are urgently needed. Recently, the Kidney Donor Profile Index (KDPI) has been introduced to combine a variety of donor factors into a single number that summarizes the likelihood of graft failure after a deceased donor kidney transplant. The KDPI has been used to measure how long a deceased donor kidney is expected to function relative to all of the kidneys recovered in the U.S. Lower KDPI scores are believed to be associated with longer estimated function, while higher KDPI scores are associated with shorter estimated function. However, the predictive power of the KDPI is only moderate (C-index = 0.60). In addition, the KDPI does not include all donor factors potentially associated with kidney graft outcomes. For example, biopsy results are not included in the KDPI. Since the KDPI is a donor-level measure, not specific to either kidney, it also does not contain any information about anatomical damage, trauma, or abnormalities that may be associated with one of a donor's kidneys. Further, the KDPI provides no assessment of the likelihood of disease or malignancy transmission from a deceased donor. Consequently, the KDPI is not a precise enough tool to differentiate the quality of kidney donors.

In addition to the KDPI score, each candidate on the kidney waitlist receives an individual Estimated Post-Transplant Survival (EPTS) score, ranging from 0 to 100 percent. Candidates with EPTS scores of 20 percent or less will receive offers for kidneys from donors with similar KDPI scores before other candidates at the local, regional, and national levels of distribution. Similar to the KDPI, the predictive power of the EPTS is limited. For example, it has been shown that patients with comorbidities have a greater than 3-fold increase in event failure risk compared with patients without comorbidities. The EPTS model, however, does not make this distinction, since comorbidities (except for diabetes mellitus) are not included in the model.

To optimize organ allocation and improve our understanding of risk factors associated with post-transplant mortality, a desirable strategy is to aggregate these published survival models with newly available datasets. One example of such datasets is the Michigan Genomics Initiative (MGI), an institutional repository of DNA and genetic data that is linked to medical phenotype and electronic health record (EHR) information.

In the context of survival analysis, prediction models have primarily focused on individual datasets. Tibshirani (1997) proposed a Lasso procedure for the Cox proportional hazards model, and a coordinate descent algorithm for this procedure was developed in the R package glmnet (Simon et al. 2011). To deal with the problem of collinearity, Simon et al. (2011) developed the elastic net for the Cox proportional hazards model. Alternatively, the Random Survival Forests (RSF) model (Ishwaran et al. 2008) has been applied to the prediction of coronary artery disease mortality (Steele et al. 2018). While successful, these analyses may suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios. Thus, of special interest in this report is to aggregate the new data with previously published prediction models, such as the KDPI and EPTS scores for kidney transplantation. To improve prediction stability and better understand risk factors associated with outcomes of interest, we propose a Kullback-Leibler-based discrete relative risk modeling procedure.

Methods

Discrete Relative Risk Models
An important consideration for analyses of post-transplant mortality is that the time is typically recorded on a daily basis. For example, in studies of 30-day mortality, there are 30 discrete event times and the number of tied events is numerous. When the number of unique event times decreases, the bias of parameter estimation increases quickly, preventing the use of the standard partial likelihood approach. These concerns motivate us to propose a prediction approach based on discrete relative risk models.

Let $T_i$ denote the event time of interest for subject $i$, $i = 1, \ldots, n$, where $n$ is the total number of subjects. Let $k = 1, \ldots, \tau$ index the unique event times (e.g. $\tau = 30$ for 30-day mortality). Let $D_k$ denote the set of labels associated with individuals failing at time $k$, and let $C_k$ denote the set of labels associated with individuals censored at time $k$. Let $R_k$ denote the at-risk set at time $k$. Let $X_i$ be an external and possibly time-dependent covariate vector for the $i$-th subject, and let $\lambda_{ik} = P(T_i = k \mid T_i \geq k, X_i)$ be the hazard at time $k$ for the $i$-th patient with covariate $X_i$. The likelihood function is given by

$$L = \prod_{k=1}^{\tau} \left\{ \prod_{i \in R_k - D_k} (1 - \lambda_{ik}) \prod_{i \in D_k} \lambda_{ik} \right\}. \qquad (1)$$

Consider a general formulation of the hazard, $h(\lambda_{ik}) = \eta_k + X_{ik}^\top \beta$, where $h$ denotes a monotonically increasing and twice differentiable link function, $\eta_k$ is the baseline hazard of mortality at time $k$, and $\beta$ denotes a coefficient vector associated with $X_{ik}$. Define $g = h^{-1}$. The log-likelihood is given by

$$\ell(\eta, \beta) = \sum_{i=1}^{n} \sum_{k=1}^{\tau} Y_{ik} \left[ \delta_{ik} \log \left\{ \frac{g(\eta_k + X_{ik}^\top \beta)}{1 - g(\eta_k + X_{ik}^\top \beta)} \right\} + \log \left\{ 1 - g(\eta_k + X_{ik}^\top \beta) \right\} \right], \qquad (2)$$

where $Y_{ik}$ is the at-risk indicator and $\delta_{ik}$ is the death indicator at time $k$. Common choices for the link function $h$ include the complementary log-log (grouped relative risk model), log (discrete relative risk model), and logit (discrete logistic model).
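As a concrete illustration, the following minimal sketch (ours, not the authors' code) evaluates the log-likelihood (2) under the logit link, assuming the data have been expanded to person-time format with one row per at-risk subject-time pair; all function and variable names are illustrative.

```python
# A minimal sketch of the log-likelihood (2) under the logit link, with the
# data expanded to person-time format: one row per pair (i, k) with Y_ik = 1.
import numpy as np

def discrete_log_likelihood(eta, beta, X, time_idx, d):
    """eta: (tau,) baseline parameters; beta: (p,) coefficients;
    X: (m, p) covariate rows; time_idx: (m,) 0-based discrete time of each
    row; d: (m,) death indicator delta_ik."""
    lin = eta[time_idx] + X @ beta
    # For the logit link, g is the inverse logit, so log{g/(1-g)} = lin
    # and log(1 - g) = -log(1 + exp(lin)).
    return np.sum(d * lin - np.logaddexp(0.0, lin))
```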
Kullback-Leibler-Based Integration

To integrate the published models, we utilize the fact that the term $\delta_{ik} \log[g(\eta_k + X_{ik}^\top \beta) / \{1 - g(\eta_k + X_{ik}^\top \beta)\}] + \log\{1 - g(\eta_k + X_{ik}^\top \beta)\}$ in the log-likelihood (2) is proportional to the Kullback-Leibler (KL) distance between the distribution of the discrete survival model and the corresponding saturated model. The resulting weighted log-likelihood function linking the prior information and the new data is given by

$$\ell_\lambda(\eta, \beta) = \sum_{i=1}^{n} \sum_{k=1}^{\tau} Y_{ik} \left[ \frac{\delta_{ik} + \lambda \hat{\delta}_{ik}}{1 + \lambda} \log \left\{ \frac{g(\eta_k + X_{ik}^\top \beta)}{1 - g(\eta_k + X_{ik}^\top \beta)} \right\} + \log \left\{ 1 - g(\eta_k + X_{ik}^\top \beta) \right\} \right], \qquad (3)$$

where $\hat{\delta}_{ik} = g(\hat{\eta}_k + X_{ik}^\top \hat{\beta})$ is the predicted outcome for subject $i$ at time $k$ based on the risk factors in the new data, with the parameters $\hat{\eta}_k$ and $\hat{\beta}$ obtained from the prior model. If the prior model contains only a subset of the relevant variables in the new data, the remaining entries of $\hat{\beta}$ are set to 0. Note that $\lambda$ is a tuning parameter that weighs the prior information against the new data and is determined using cross-validation. In the extreme case of $\lambda = 0$, the penalized log-likelihood reduces to the log-likelihood based on the new data alone. In contrast, when $\lambda = \infty$, the model is equivalent to the prior information.
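Under the same person-time representation, a sketch of the weighted log-likelihood, as reconstructed in (3), requires only a one-line change: the observed indicator is shrunk toward the prior-model prediction. Names are again illustrative rather than from the paper.

```python
# Sketch of the weighted log-likelihood (3) under the logit link; d_hat is
# the prior-model prediction delta_hat_ik for each at-risk row.
import numpy as np

def kl_weighted_log_likelihood(eta, beta, X, time_idx, d, d_hat, lam):
    """d_hat: prior-model hazard prediction for each row, computed with
    coefficients of variables absent from the prior model set to 0."""
    lin = eta[time_idx] + X @ beta
    d_tilde = (d + lam * d_hat) / (1.0 + lam)  # lam = 0: local data only;
                                               # lam -> infinity: prior only
    return np.sum(d_tilde * lin - np.logaddexp(0.0, lin))
```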
Simulation

To assess the performance of the proposed procedure, we compared it with both the prior model and the model fitted only to the local data. Generally, the proposed KL modeling procedure works for the log-log, log, and logit link functions. Here, we used the logit link as an example to conduct the simulation studies. Suppose $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^\top$ is the vector of coefficients from a prior model published in previous research, and $\beta_l$ is the vector of coefficients from which the local data are generated. We assume that the prior model and the local data share the same baseline hazard of mortality $\eta_k$ at each discrete time point $T_k$. The local training data are generated by $X \sim MVN(0, \Sigma)$, where $\Sigma$ is a first-order autoregressive (AR1) correlation matrix with auto-correlation parameter 0.5. For each time point $T_k$, the event indicator $Y_{ik}$ for each subject $i$ in the at-risk set $R_k$ is generated from Bernoulli$(\mathrm{logit}^{-1}(X_i^\top \beta_l + \eta_k))$. Subject $i$ is removed from the at-risk sets of all future time points if $Y_{ik} = 1$ at $T_k$; otherwise, it is retained. Latent censoring times are generated from a discrete uniform(1, 30) distribution and are then truncated by administrative censoring at time point 10.

We consider six different models, clustered into two scenarios, in the simulation studies.

Scenario 1: the local data and the prior model share the same set of predictors:
Model (a): $\beta_l = \beta$;
Model (b): $\beta_l = c\beta$ for a positive scaling constant $c < 1$;
Model (c): $\beta_l = \mathrm{reverse}(\beta) = (\beta_p, \beta_{p-1}, \ldots, \beta_1)^\top$.

Scenario 2: the local data contain all predictors in the prior model and additional new predictors:
Model (d): $\beta_l = (\beta, c_1 \beta)$;
Model (e): $\beta_l = (\beta, c_2 \beta)$, with scaling constants $c_1 < c_2 < 1$;
Model (f): $\beta_l = (\beta, \beta)$.

For Scenario 1, model (a) mimics the situation where the local data come from exactly the same model as the prior model; model (b) changes the magnitude of the coefficients from the prior model but keeps the same trend; the local data in model (c) are generated from a completely different model. For Scenario 2, models (d), (e), and (f) mimic situations where the additional new predictors in the local data are of varying importance relative to the prior model, which is managed by adjusting the magnitudes of the new predictors. Moreover, we set the local sample size $n_l = 300$, the number of prior predictors $p = 10$, and the number of additional new predictors $p_n = 10$; the tuning parameter $\lambda \geq 0$ is selected by 5-fold cross-validation on the local data with the empirical log-likelihood as the metric of model performance. After determining $\lambda$, we compare the proposed KL modeling method with the prior model and the local model. In order to make a fair comparison, we evaluate the models on a held-out external validation dataset, which is simulated from the same model setting as the local data. The best model achieves the maximal log-likelihood on the external validation dataset. The simulation is replicated 100 times.

Figure 1: Simulation results of KL-modeling (green) compared with prior (red) and local (purple) models. (a)-(f) presents results for model settings (a)-(f), respectively.

Figure 1 shows that under all model settings the KL-based modeling procedure achieves comparable or better performance than both the prior model and the model fitted only to the local data. More specifically, the KL-based modeling procedure favors the cases where the prior model is similar to the local data. However, even under extreme situations
when the prior model is completely different from the local data (Fig. 1(c)) or is missing important predictors (Fig. 1(f)), the KL-based modeling procedure does not result in misleading predictions. We also present the best tuning parameter $\lambda$ determined by the KL-based modeling procedure in the simulation studies. The KL-based modeling procedure tends to select a larger $\lambda$ when the prior model is more similar to the local data. In addition, it avoids incorporating unhelpful prior information that is not relevant to the local data by selecting an extremely small $\lambda$ or setting it to 0 (Figure 2).

Figure 2: Selected tuning parameter $\lambda$ for the best fit of the KL modeling procedure. (a)-(f) represents model settings (a)-(f), respectively.
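The data-generating mechanism described above is straightforward to reproduce. The sketch below simulates one local dataset under the logit link, with AR(1)-correlated covariates, removal from the risk set at the first event, and uniform censoring truncated at time 10; the seed, function names, and return format are our own illustrative choices.

```python
# Sketch of the simulation design: hazard logit^{-1}(X_i' beta_l + eta_k)
# at each discrete time, with censoring ~ discrete uniform(1, 30) truncated
# by administrative censoring at time 10.
import numpy as np

rng = np.random.default_rng(2024)

def simulate_local_data(beta_l, eta, n=300, admin_cens=10):
    p = len(beta_l)
    # First-order autoregressive correlation with parameter 0.5
    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    lin = X @ beta_l
    event_time = np.full(n, np.inf)
    for k in range(len(eta)):                      # discrete times 1, ..., tau
        at_risk = np.isinf(event_time)
        hazard = 1.0 / (1.0 + np.exp(-(lin[at_risk] + eta[k])))
        dies = rng.random(at_risk.sum()) < hazard  # Bernoulli event indicator
        event_time[np.flatnonzero(at_risk)[dies]] = k + 1
    cens = np.minimum(rng.integers(1, 31, size=n), admin_cens)
    obs_time = np.minimum(event_time, cens)
    delta = (event_time <= cens).astype(int)       # 1 = event observed
    return X, obs_time, delta

# Example: Scenario 1, model (a), where beta_l equals the prior coefficients
beta = rng.normal(size=10)
eta = np.full(30, -2.0)                            # shared baseline log-odds
X, obs_time, delta = simulate_local_data(beta, eta)
```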
Data Analysis

We use the EPTS model and a local kidney transplant dataset as an example to illustrate how to use the KL modeling procedure to integrate previously published prediction models with a new dataset. The raw EPTS score is derived from a Cox proportional hazards model using Scientific Registry of Transplant Recipients (SRTR) kidney transplant data. For simplicity, the raw EPTS score includes only 4 predictors: the candidate's age in years, duration on dialysis in years, current diagnosis of diabetes, and whether the candidate has a prior organ transplant. Since the EPTS model does not report baseline survival information, we estimated the baseline survival information using kidney transplant data obtained from the U.S. Organ Procurement and Transplantation Network (OPTN) (https://optn.transplant.hrsa.gov/data/). A total of 80,019 patients, comprising all patients older than 18 who received a deceased donor transplant between January 2005 and January 2013, were used in the estimation. Specifically, we fit a discrete relative risk model including the same set of predictors as the EPTS model and obtained the parameter estimates for each week within the first year after transplant. Thus, our prior model is the combination of the EPTS model and the estimated baseline survival information by week.

Table 1: The log likelihood of different models.

Scenario 1        KL         Prior      Local
log likelihood    -358.478   -398.657   -395.531

Scenario 2        KL         Prior      Local
log likelihood    -358.467   -398.657   -409.814

The local kidney transplant data we used is the University of Michigan Medical Center (MIUM) kidney transplant dataset. We consider two different scenarios regarding the predictors of the local data: Scenario 1, which includes the same set of predictors as the EPTS model, and Scenario 2, which includes two comorbidity predictors (whether the candidate has a previous malignancy, and the presence of pre-transplant peripheral vascular disease) in addition to those in the EPTS model. In this real data analysis, we only evaluate first-year survival after transplant. As shown in Table 1, the KL-based modeling procedure has the best performance under both scenarios. Specifically, using the same set of predictors, the model fitted only to the local data performs slightly better than the prior model, which indicates that the prior model lacks accuracy when applied to this specific local dataset. However, the log-likelihood of the local model decreases substantially when the additional predictors are included, which shows that the model fitted only to the local dataset is unstable. In summary, the KL-based modeling procedure provides a more stable and accurate prediction than either the prior model or the model fitted only to the local data.
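To mirror this construction, a hypothetical sketch of forming the prior predictions $\hat{\delta}_{ik}$ from published coefficients and locally estimated weekly baselines is given below; the function names and arguments are placeholders, not the actual EPTS parameters.

```python
# Hypothetical sketch of assembling the prior model: published EPTS-style
# coefficients for (age, dialysis years, diabetes, prior transplant) are
# combined with weekly baselines eta_hat estimated from OPTN data.
import numpy as np

def prior_hazard(eta_hat, beta_prior, X, week_idx, n_extra=0):
    """delta_hat for each at-risk patient-week row; covariates absent from
    the prior model (e.g. the two comorbidities in Scenario 2) receive a
    coefficient of 0, as in equation (3)."""
    beta_full = np.concatenate([beta_prior, np.zeros(n_extra)])
    lin = eta_hat[week_idx] + X @ beta_full
    return 1.0 / (1.0 + np.exp(-lin))  # logit link, as in the sketches above
```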
Summary

Existing literature for prediction of time-to-event data has primarily focused on risk factors from an individual dataset, which may suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios. To improve prediction stability and better understand risk factors associated with outcomes of interest, we propose a Kullback-Leibler-based discrete relative risk modeling procedure. Simulations are conducted to show the advantage of the proposed methods compared with those based solely on the local dataset or prior models. The proposed Kullback-Leibler-based discrete relative risk modeling procedure is sufficiently flexible to accommodate situations where the number of risk factors in the MGI dataset is greater than the number available in previous models reported in the literature.
References
Ishwaran, H., Kogalur, U.B., Blackstone, E.H., and Lauer, M.S. 2008. Random survival forests. The Annals of Applied Statistics 2(3), 841-860.

Kalbfleisch, J.D., and Prentice, R.L. 1973. Marginal likelihoods based on Cox's regression and life model. Biometrika 60(2), 267-278.

Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. 2011. Regularization paths for Cox's proportional hazards model via coordinate descent. Journal of Statistical Software 39(5), 1-13.