Cox-nnet v2.0: improved neural-network based survival prediction extended to large-scale EMR dataset
Di Wang, Kevin He, Lana X Garmire*
Department of Biostatistics, University of Michigan, Ann Arbor, MI
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
* Corresponding author: [email protected]
Abstract
Summary:
Cox-nnet is a neural-network based prognosis prediction method, originally applied to genomics data. Here we propose version 2 of Cox-nnet, with significant improvements in efficiency and interpretability that make it suitable for predicting prognosis from large-scale electronic medical record (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. Applied to an EMR dataset of OPTN kidney transplantation, Cox-nnet v2.0 reduces the training time of Cox-nnet by up to 32-fold (n=10,000) and achieves better prediction accuracy than Cox-PH (p<0.05).
Availability and implementation:
Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0
Introduction
Large-scale electronic medical records (EMR) are informative and easily accessible data sources frequently used for patient survival prediction. Prediction models built on EMR data tend to perform better than those using administrative data (Mahmoudi et al., 2020). Machine learning based models have also been found to outperform conventional models, such as the Cox Proportional Hazards (Cox-PH) model (Cox, 1972), the Random Survival Forests (RSF) model (Ishwaran et al., 2008) and elastic net regression (Fan et al., 2010), in predicting coronary artery disease mortality from EMR data (Steele et al., 2018). Although it is challenging to develop prediction models driven by EMR data, the large sample size and rich clinical features in EMR data provide valuable information for survival prediction (Goldstein et al., 2017). We previously proposed Cox-nnet (Ching et al., 2018), a deep learning based neural network prognosis prediction model, which achieved comparable or better performance than Cox-PH on high-throughput omics data. We recently applied Cox-nnet to histopathology imaging data with pre-extracted features, and demonstrated its advantage in combining gene expression data and image data for survival prediction (Zhan et al.). However, it remains to be tested whether Cox-nnet is suitable for predicting survival in large-scale EMR data, where the number of patients is usually orders of magnitude larger than in genomics data. Toward this goal, we propose Cox-nnet v2.0, which significantly improves computational speed with enhanced interpretability. Additionally, Cox-nnet v2.0 also achieves better prediction accuracy than Cox-PH.
Methods
Cox-nnet method improvement: The original Cox-nnet is a neural-network based extension of the Cox-PH method, using the log partial likelihood as its loss function. In Cox-nnet v2.0, we have made the following improvements:

(1) Speed-up in calculating the log partial likelihood loss function. The log partial likelihood is calculated by

$$pl(\beta) = \sum_{i} C_i \left( \theta_i - \log \sum_{t_j \geq t_i} \exp(\theta_j) \right)$$

where $\theta_i$ is the linear predictor of patient $i$ and $C_i = I(\text{patient } i \text{ is not censored})$. To avoid nested summation in Theano, the previous version of Cox-nnet calculates the log partial likelihood by matrix multiplication:

$$pl(\beta) = \{\theta - \log(R \cdot \exp(\theta))\}^{T} C$$

where $C$ and $\theta$ are the vectors of $C_i$ and $\theta_i$, respectively, and $R$ is an $n \times n$ at-risk-set indicator matrix with entries

$$R_{ij} = I(t_j \geq t_i)$$

where $n$ is the sample size of the input data, and $t_i$ and $t_j$ are the event times of patients $i$ and $j$, respectively. This implementation is memory intensive and time consuming for large sample sizes. In the new version, instead of pairwise comparison we sort the observations by event time. By the definition of the at-risk set, $R$ then becomes an upper triangular matrix filled with 1s, so $R \cdot \exp(\theta)$ can be computed as a cumulative summation that no longer requires storing the $R$ matrix or nested summation (double loops).

(2) Adding permutation-based feature importance scores. Previously, the variable importance score of Cox-nnet was calculated by pseudo drop-out, which replaces a variable with its mean; the drawback is that this is hard to interpret for categorical variables. Here we introduce a more general feature evaluation method using the permutation importance score (Breiman, 2001).
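The cumulative-summation speed-up can be illustrated with a minimal NumPy sketch (the actual Cox-nnet v2.0 implementation is in Theano; the function name here is illustrative only):

```python
import numpy as np

def log_partial_likelihood(theta, time, event):
    """Log partial likelihood via the cumulative-sum trick.

    theta: linear predictors; time: event/censoring times;
    event: censoring indicators (1 = event observed, 0 = censored).
    """
    order = np.argsort(time)               # sort by event time (ascending)
    theta, event = theta[order], event[order]
    # After sorting, the risk set of patient i is {j : j >= i}, so the sum
    # of exp(theta_j) over the risk set is a reversed cumulative sum; a
    # running max keeps the exponentials numerically stable.
    m = theta.max()
    risk_sum = np.cumsum(np.exp(theta - m)[::-1])[::-1]
    log_risk = np.log(risk_sum) + m
    return np.sum(event * (theta - log_risk))
```

This computes the same quantity as the O(n^2) matrix form but needs only O(n) memory, which is what makes the method practical on EMR-scale cohorts.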
The main idea is to measure the increase in model error after shuffling a feature's values, since the permutation breaks the relationship between the feature and the outcome. We implement the algorithm proposed in Fisher et al. (2019).

(3) Adding the directionality of the feature coefficients. Analogous to estimating the sign of $\beta$ in Cox-PH, we develop a framework that approximates the direction of feature coefficients in Cox-nnet. The linear predictor in Cox-nnet is

$$\theta_i = G(W x_i + b)\beta$$

where $G$ is the activation function, $W$ is the coefficient weight matrix between the input and hidden layers, $b$ is the bias term, and $\beta$ is the output coefficient vector. We define a reference value for each feature $j$ by

$$x_j^{ref} = \bar{x}_j \, I(x_j \text{ is a continuous variable}) + \text{mode}(x_j) \, I(x_j \text{ is a categorical variable})$$

Similar to the interpretation of $\beta$ in Cox-PH, the direction of each feature coefficient in Cox-nnet is approximated by the sign of

$$\frac{1}{n}\sum_{i} \Delta\theta_{ij} = \frac{1}{n}\sum_{i} \left(\theta_i - \theta_i^{ref,j}\right) = \frac{1}{n}\sum_{i} \left\{ G(W x_i + b)\beta - G(W x_i^{ref,j} + b)\beta \right\}$$

where $x_i^{ref,j} = (x_j^{ref}, x_i^{(-j)})$ replaces the $j$-th feature of patient $i$ with its reference value while keeping the remaining features $x_i^{(-j)}$ unchanged. Intuitively, the risk increases with the feature if the sign of $\frac{1}{n}\sum_{i} \Delta\theta_{ij}$ is positive.

(4) Adding additional optimization algorithms and activation functions for parameter tuning. We add the Adam optimizer (Kingma and Ba, 2014) as an alternative optimization strategy, which further accelerates the training process. We also add the Scaled Exponential Linear Unit (SELU) activation function (Klambauer et al., 2017) to Cox-nnet v2.0.
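The permutation importance idea is model-agnostic and can be sketched generically; note that `model_error` below is a hypothetical placeholder for any fitted model's error function (e.g. 1 - C-index for a survival model), not the Cox-nnet API:

```python
import numpy as np

def permutation_importance(model_error, X, y, n_repeats=5, seed=0):
    """Permutation feature importance (Breiman, 2001; Fisher et al., 2019).

    The importance of feature j is the mean increase in error after
    shuffling column j, which breaks the feature-outcome relationship
    while leaving the feature's marginal distribution intact.
    """
    rng = np.random.default_rng(seed)
    baseline = model_error(X, y)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])      # permute feature j only
            importance[j] += model_error(X_perm, y) - baseline
    return importance / n_repeats
```

Unlike pseudo drop-out, this treats continuous and categorical features identically, since shuffling never produces values outside a categorical feature's observed levels.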
Evaluation Metrics: As in Cox-nnet v1.0, we evaluate prediction accuracy by C-IPCW (Uno et al., 2011), the C-index weighted by the inverse censoring probability.

Dataset: The EMR data used for this study are kidney transplant data obtained from the U.S. Organ Procurement and Transplantation Network (OPTN) (https://optn.transplant.hrsa.gov/data/). A total of 80,019 patients, comprising all patients older than 18 who received a transplant from a deceased donor between January 2005 and January 2013, were used in the analysis, along with 117 clinical variables describing characteristics up to transplantation.
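For intuition, a simplified sketch of the IPCW-weighted C-index follows; it ignores tied times and the truncation time tau of the full Uno et al. (2011) estimator, so it is an illustration rather than the evaluation code used in the paper:

```python
import numpy as np

def ipcw_cindex(theta, time, event):
    """Toy IPCW-weighted C-index (simplified from Uno et al., 2011).

    theta: risk scores (higher = higher risk); time: observed times;
    event: 1 if the event was observed, 0 if censored.
    """
    n = len(time)
    # Kaplan-Meier estimate of the censoring survival function G(t),
    # treating censored observations (event == 0) as the "events".
    order = np.argsort(time)
    at_risk = n - np.arange(n)
    step = np.where(event[order] == 0, 1.0 - 1.0 / at_risk, 1.0)
    G = np.empty(n)
    G[order] = np.cumprod(step)
    num = den = 0.0
    for i in range(n):
        if event[i] == 0 or G[i] <= 0:
            continue                        # only uncensored i can anchor a pair
        w = 1.0 / G[i] ** 2                 # inverse censoring-probability weight
        for j in range(n):
            if time[i] < time[j]:           # comparable pair: i fails before j
                den += w
                num += w * (theta[i] > theta[j])
    return num / den
```

The weights 1/G(t_i)^2 correct for the over-representation of early events under censoring, which is what makes C-IPCW less biased than the plain C-index.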
Results
The structure of Cox-nnet v2.0 is shown in Fig. 1A, with the newly updated functionalities highlighted. We randomly split the kidney transplant EMR data into training (80%) and testing (20%) sets, and used C-IPCW to evaluate performance on the hold-out testing set. We repeated this process 10 times to assess the overall prediction performance. Cox-nnet v2.0 is not sensitive to the sample size and dramatically reduces the training time, compared to Cox-nnet v1.0 where the computing time increases polynomially with the sample size (Fig. 1B). Cox-nnet v2.0 also achieves significantly better C-IPCW than Cox-PH (Fig. 1C), without any drop in C-IPCW compared to Cox-nnet v1.0. We performed feature evaluation by calculating the feature importance scores with the new permutation method, whose values are close to those from the previous pseudo drop-out method. With the directionality (+/- signs) of the feature coefficients, our feature evaluation results are more interpretable: a positive (+) sign indicates increased risk of graft failure, whereas a negative (-) sign means decreased risk of graft failure. As additional confirmation, the pattern of importance scores matches well with that of the coefficients obtained from Cox-PH (Fig. 1D). In summary, Cox-nnet v2.0 significantly accelerates the training process of Cox-nnet without loss in prediction accuracy, and it also enables better interpretation of all features in the model. Cox-nnet v2.0 is thus the new version suitable for prognosis prediction in large-scale EMR datasets.
Author's Contribution
LXG conceived the project, DW conducted model improvement and data analysis, KH provided the dataset and helped with the analysis. DW and LXG wrote the manuscript with the help of KH. All authors read, revised and approved the manuscript.
Declaration of Conflict of Interest
The authors declare no conflict of interest.
Acknowledgement
References
Breiman, L. (2001) Random forests. Mach. Learn., 5–32.
Ching, T. et al. (2018) Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol., e1006076.
Cox, D.R. (1972) Regression models and life-tables. J. R. Stat. Soc. Series B Stat. Methodol., 187–202.
Fan, J. et al. (2010) High-dimensional variable selection for Cox's proportional hazards model. In: Borrowing Strength: Theory Powering Applications - A Festschrift for Lawrence D. Brown. Institute of Mathematical Statistics, pp. 70–86.
Fisher, A. et al. (2019) All models are wrong, but many are useful: learning a variable's importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res., 1–81.
Goldstein, B.A. et al. (2017) Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J. Am. Med. Inform. Assoc., 198–208.
Ishwaran, H. et al. (2008) Random survival forests. Ann. Appl. Stat., 841–860.
Kingma, D.P. and Ba, J. (2014) Adam: a method for stochastic optimization. arXiv [cs.LG].
Klambauer, G. et al. (2017) Self-normalizing neural networks. In: Guyon, I. et al. (eds), Advances in Neural Information Processing Systems 30. Curran Associates, Inc., pp. 971–980.
Mahmoudi, E. et al. (2020) Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review. BMJ, m958.
Steele, A.J. et al. (2018) Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS One, e0202344.
Uno, H. et al. (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med., 1105–1117.
Zhan, Z. et al. Two-stage biologically interpretable neural-network models for liver cancer prognosis prediction using histopathology and transcriptomic data.