mlr3proba: An R Package for Machine Learning in Survival Analysis
Raphael Sonabend, Franz J. Király, Andreas Bender, Bernd Bischl, Michel Lang
arXiv [stat.CO], August
mlr3proba: Machine Learning Survival Analysis in R
Department of Statistical Science, University College London, London, WC1E 6BT, UK
Department of Statistics, LMU Munich, Munich, 80539, Germany
Editor: TBD
Abstract
As machine learning has become increasingly popular over the last few decades, so too has the number of machine learning interfaces for implementing these models. However, no consistent interface for evaluation and modelling of survival analysis has emerged, despite its vital importance in many fields, including medicine, economics, and engineering. mlr3proba is part of the mlr3 ecosystem of machine learning packages for R. It makes mlr3's general model tuning and benchmarking available for survival analysis by providing a multitude of performance measures and learners, together with a clean and systematic infrastructure for their evaluation. mlr3proba provides a comprehensive machine learning interface for survival analysis, which brings survival modelling up to the state of the art.
Keywords: machine learning, survival analysis, R, benchmarking, AutoML
1. Introduction
mlr3proba introduces survival modelling to the mlr3 (Lang et al., 2019a) ecosystem of machine learning packages. Survival modelling is the field of statistics concerned with making predictions about the time until an event (often death) takes place. By utilising a probabilistic supervised learning (Gressmann et al., 2018) framework, mlr3proba allows for multiple survival analysis predictions: the time to an event, the probability of an event over time, and the relative risk of an event. mlr3proba includes an extensive collection of classical and machine learning models, and many specialised survival measures.
R (R Core Team, 2020) is a favoured programming language for both survival analysis and machine learning, as it has an extensive library of statistical functions and models implemented both in its core functionality and in open-source packages from CRAN and Bioconductor. mlr3proba leverages these packages by connecting a multitude of machine learning models and measures for survival analysis. By interfacing these packages, mlr3proba currently supports simulation of survival data, classical survival models, prediction of survival distributions by machine learning, and high-dimensional data (for example genomics and imaging). Interfacing other packages in the mlr3 family enables optimisation and benchmarking for all of these.
2. Implemented Functionality
A standard pipeline for survival analysis consists of: i) defining a survival task as a set of features and a survival outcome (time until the event and a censoring indicator); ii) training a model on survival data, optionally optimising it by tuning hyper-parameters; iii) making predictions from the trained model on new data; iv) evaluating the quality of predictions with survival-specific measures, possibly including visualisation. mlr3proba streamlines this process by: i) standardising survival tasks, building on the Surv object from the survival (Therneau, 2015) package, into a single object capable of handling left-, interval-, and right-censoring (TaskSurv); ii) unifying all survival learners (LearnerSurv*); iii) returning prediction objects that clearly distinguish model prediction types (PredictionSurv); iv) unifying survival measures for different survival prediction types (MeasureSurv*).
More than 20 survival learners are currently implemented, ranging from classical statistical models like Cox regression (Cox, 1972) to machine learning methods including Random Survival Forests (Ishwaran et al., 2008), gradient boosting machines, and many others. When a model returns a survival probability prediction, it is cast into a standardised distribution object using the distr6 package (Sonabend and Kiraly, 2019), which allows clean post-processing, such as evaluating the survival and hazard functions.
For comparison of different models, 19 survival measures are implemented in mlr3proba, including discrimination measures (variants of concordance indices) and scoring rules, for example the (time-dependent) Graf score (Graf et al., 1999). Several of these are implemented directly in mlr3proba in C++ via Rcpp (Eddelbuettel and Francois, 2011) for fast and reliable performance.
Careful design and documentation of models and measures make explicit the precise return type produced by each model and evaluated by each measure, something that has historically been problematic in survival analysis (see Section 3). Model tuning and optimisation are available via mlr3tuning (Lang et al., 2019b), and preprocessing, feature selection, and more general pipelines via mlr3pipelines (Binder et al., 2020).
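To make the scoring-rule idea concrete, the following is a minimal base-R sketch of the Graf score at a single time point τ. It uses the inverse-probability-of-censoring-weighted form of Graf et al. (1999):

$BS(\tau) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{S_i(\tau)^2\, I(T_i \le \tau, \delta_i = 1)}{\hat{G}(T_i^-)} + \frac{(1 - S_i(\tau))^2\, I(T_i > \tau)}{\hat{G}(\tau)}\right]$

Here $S_i$ is the predicted survival function, $\delta_i$ the event indicator, and $\hat{G}$ the Kaplan-Meier estimate of the censoring distribution. This is not mlr3proba's Rcpp implementation. The helper names km_fit, step_eval, and graf_score are hypothetical, $\hat{G}$ is estimated on the same data that is scored, and evaluating $\hat{G}$ just before each event time is an assumed convention.

```r
# Minimal base-R sketch of the Graf (IPCW Brier) score at one time point tau.
# Helper names are hypothetical; this is not mlr3proba's implementation.

# Kaplan-Meier estimate: distinct event times and the survival probability just after each.
km_fit <- function(time, status) {
  ut <- sort(unique(time[status == 1]))
  n_risk  <- vapply(ut, function(t) sum(time >= t), numeric(1))
  n_event <- vapply(ut, function(t) sum(time == t & status == 1), numeric(1))
  list(time = ut, surv = cumprod(1 - n_event / n_risk))
}

# Evaluate a KM step function at times t; left = TRUE returns the left limit S(t-).
step_eval <- function(fit, t, left = FALSE) {
  if (length(fit$time) == 0) return(rep(1, length(t)))  # no drops: curve is 1
  idx <- findInterval(t, fit$time, left.open = left)
  c(1, fit$surv)[idx + 1]
}

# surv_tau: predicted S_i(tau) per observation; time/status: observed outcomes.
graf_score <- function(surv_tau, time, status, tau) {
  cens  <- km_fit(time, 1 - status)             # KM of the censoring distribution G
  g_ti  <- step_eval(cens, time, left = TRUE)   # G(T_i-)
  g_tau <- step_eval(cens, tau)                 # G(tau)
  event_before  <- time <= tau & status == 1    # event by tau: ideal prediction S = 0
  at_risk_after <- time > tau                   # event-free at tau: ideal prediction S = 1
  contrib <- numeric(length(time))              # censored before tau: contributes 0
  contrib[event_before]  <- surv_tau[event_before]^2 / g_ti[event_before]
  contrib[at_risk_after] <- (1 - surv_tau[at_risk_after])^2 / g_tau
  mean(contrib)
}

time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)
print(graf_score(c(0.1, 0.5, 0.4, 0.7, 0.9), time, status, tau = 3))
```

Without censoring, $\hat{G} \equiv 1$ and the score reduces to the ordinary Brier score of $S_i(\tau)$ against the event status at τ. The sketch also assumes $\hat{G}(\tau) > 0$; otherwise the weights are undefined.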
3. Return Types
A key advantage of mlr3proba is a clear distinction between model prediction types. Survival models can produce a variety of prediction types, but implementations have historically not reflected this. In other supervised learning fields this is not a problem: regression always predicts a continuous value for the outcome, and classification predicts either a category or a probability (the two are immediately seen to be different). In survival analysis, however, several different predictions are possible and, without clear documentation, these can look very similar. For example, if a user wanted to compare predictions from Cox PH and survival tree models without mlr3proba, this would require the following steps: i) train and predict from the models using two separate packages; ii) identify that the model predictions are not directly comparable; iii) use a third package to transform the predictions into a compatible form (for example combining a relative risk prediction with a baseline hazard estimate); iv) identify which measures can be used to evaluate this form; and v) find the package that includes these measures and potentially write functions to interface with it.
In mlr3proba, this distinction is made clear by defining four distinct prediction types: i) response, the predicted survival time (expected time until the event occurs); ii) lp, the predicted linear predictor of a linear model; iii) crank, a continuous ranking for comparing the relative risk between observations in the test sample; iv) distr, a predicted survival distribution implemented via distr6, which includes functionality for evaluating the survival and hazard functions.
The problem of return types matters because different types are not comparable; historically it has not been managed successfully, leading to impractical benchmark experiments. Because these prediction types are defined as distinct objects in mlr3proba, internal validation checks prevent incompatible benchmark experiments. Finally, by making use of mlr3pipelines (Binder et al., 2020), predictions can be transformed between types, with clear documentation and parameter settings that keep the interface simple for the user. The example in Listing 1 demonstrates a GBM learner using the distrcompositor to transform a linear predictor prediction into a distribution by assuming a proportional hazards (PH) form and estimating the baseline survival function with the Kaplan-Meier estimator (Kaplan and Meier, 1958). Mathematically, this transformation is $S_{PH}(\tau) = \hat{S}(\tau)^{\exp(\hat{\eta})}$, where $\hat{S}$ is the baseline survival function estimated by the Kaplan-Meier estimator and $\hat{\eta}$ is the predicted linear predictor.
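The PH composition described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not mlr3proba's distrcompositor. The helper names (km_fit, step_eval, compose_ph) and the toy data are hypothetical, and the Kaplan-Meier baseline $\hat{S}$ is fitted on the training outcomes only. Each predicted linear predictor $\hat{\eta}_i$ turns the baseline into an individual curve $S_i(\tau) = \hat{S}(\tau)^{\exp(\hat{\eta}_i)}$.

```r
# Minimal base-R sketch: compose PH survival curves from a linear predictor
# and a Kaplan-Meier baseline. Helper names and data are hypothetical.

# Kaplan-Meier estimate: distinct event times and the survival probability just after each.
km_fit <- function(time, status) {
  ut <- sort(unique(time[status == 1]))
  n_risk  <- vapply(ut, function(t) sum(time >= t), numeric(1))
  n_event <- vapply(ut, function(t) sum(time == t & status == 1), numeric(1))
  list(time = ut, surv = cumprod(1 - n_event / n_risk))
}

# Evaluate the right-continuous KM step function at times t.
step_eval <- function(fit, t) {
  if (length(fit$time) == 0) return(rep(1, length(t)))
  c(1, fit$surv)[findInterval(t, fit$time) + 1]
}

# S_i(tau) = S_hat(tau)^exp(lp_i); rows = observations, columns = time points.
compose_ph <- function(baseline, lp, times) {
  s0 <- step_eval(baseline, times)
  outer(exp(lp), s0, function(e, s) s^e)
}

# Hypothetical training outcomes used only to estimate the baseline.
train_time   <- c(2, 3, 5, 7, 8, 10)
train_status <- c(1, 1, 0, 1, 0, 1)
baseline <- km_fit(train_time, train_status)

lp   <- c(-0.5, 0, 0.8)   # hypothetical predicted linear predictors for three test subjects
surv <- compose_ph(baseline, lp, times = c(3, 7, 10))
print(round(surv, 3))
```

Because $x \mapsto \hat{S}(\tau)^{\exp(x)}$ is decreasing in $x$ whenever $0 < \hat{S}(\tau) < 1$, the composed distributions preserve the risk ordering given by the linear predictor. The implied cumulative hazards are also proportional: $-\log S_i(\tau) = \exp(\hat{\eta}_i)\,(-\log \hat{S}(\tau))$.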
4. Related Work
There is an increasing number of machine learning packages across programming languages, including caret (Kuhn, 2008), mlr (Bischl et al., 2016), tidymodels (Kuhn and Wickham, 2020), and scikit-learn (Pedregosa et al., 2011). However, functionality for survival analysis has mostly been limited to ‘classical’ statistical models, with relatively few packages supporting a machine learning framework.
R ships with the package survival (Therneau, 2015), which supports left-, interval-, and right-censoring, competing risks, time-dependent models, stratification, and model evaluation. However, the package is limited to classical statistical models, with no support for machine learning and limited support for formal comparison or non-linear models.
pec (Mogensen et al., 2014) implements no models itself but instead interfaces with many different survival packages to create survival probability predictions. The package's main focus is on model evaluation via prediction error curves (‘pec’s), with little support for model building/training and predicting.
skpro (Gressmann et al., 2018) is a probabilistic supervised learning interface in Python. skpro extends the scikit-learn (Pedregosa et al., 2011) interface to probabilistic models and appears to be the only package (in any language) dedicated to domain-agnostic probabilistic supervised learning. The interface provides an infrastructure suited to machine learning based survival analysis, and its design choices influenced mlr3proba, but skpro does not currently support survival models.
pysurvival (Fotso et al., 2019) is another Python package, which implements classical and machine learning survival analysis models. The package has the advantage of natively leveraging neural network survival models, which are almost exclusively implemented in Python. Whilst it does not directly follow the scikit-learn interface, the package introduces unified functions for model fitting, predicting, and evaluation.
scikit-survival (Pölsterl, 2019) builds directly on scikit-learn to implement a few survival models and measures in a machine learning framework. Unlike pysurvival, it includes no neural networks, so the two packages complement each other well.
5. Example
Listing 1 gives an example of how to benchmark four survival learners, set hyper-parameter values, and make use of the distribution compositor. Line 1: The required packages are loaded; mlr3proba always requires mlr3. Lines 2-3: Kaplan-Meier and Cox PH learners are constructed with default parameters. Line 4: A decision tree is initialised with user-specified values for the hyper-parameters maxdepth and minsplit. Lines 5-6: The GBM learner is wrapped in the distrcompositor to transform its ranking prediction into a probabilistic prediction. Line 7: The learners are combined into a list for use in the benchmark function. Line 8: The pre-specified task based on the rats dataset from survival is loaded for benchmarking. Line 9: Three-fold cross-validation is specified. Line 10: The design of the experiment is determined automatically from the supplied task(s), learners, and resampling method. Line 11: Learners are resampled according to the chosen scheme and benchmarked. Line 12: Predictions are aggregated over all folds and scored with Uno's concordance index (Uno et al., 2011) to provide a final comparison.
library(mlr3); library(mlr3proba); library(mlr3learners.gbm)
kaplan = lrn("surv.kaplan") # Kaplan-Meier estimator
cox = lrn("surv.coxph") # Cox proportional hazards model
rpart = lrn("surv.rpart", maxdepth = 10, minsplit = 10) # survival tree with custom hyper-parameters
gbm = distrcompositor(lrn("surv.gbm", n.trees = 50), # GBM with 50 trees,
  estimator = "kaplan", form = "ph") # composed into a PH distribution with a KM baseline
learns = list(cox, kaplan, rpart, gbm)
task = tsk("rats")
resample = rsmp("cv", folds = 3) # three-fold cross-validation
design = benchmark_grid(task, learns, resample)
bm = benchmark(design)
bm$aggregate(msr("surv.cindex", weight_meth = "G2")) # Uno's concordance index
Listing 1: Example code for constructing, benchmarking, and evaluating survival models.
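Line 12 scores the predictions with Uno's concordance index. As a rough illustration of what that measure computes, the following base-R sketch evaluates Uno's C for a vector of risk scores, where a higher score means higher risk, as for a crank prediction.

A pair is comparable when subject i has an observed event at $T_i < \tau$ and subject j is still under observation after $T_i$. Over these pairs, the sketch counts how often i has the higher risk, weighting each pair by $\hat{G}(T_i)^{-2}$, where $\hat{G}$ is the Kaplan-Meier estimate of the censoring distribution. Tied risks count as one half.

This is a sketch, not mlr3proba's implementation of msr("surv.cindex", weight_meth = "G2"). The helper names are hypothetical, and evaluating $\hat{G}$ just before $T_i$ is an assumed convention.

```r
# Minimal base-R sketch of Uno's concordance index (IPCW-weighted C).
# Helper names are hypothetical; this is not mlr3proba's implementation.

# Kaplan-Meier estimate: distinct event times and the survival probability just after each.
km_fit <- function(time, status) {
  ut <- sort(unique(time[status == 1]))
  n_risk  <- vapply(ut, function(t) sum(time >= t), numeric(1))
  n_event <- vapply(ut, function(t) sum(time == t & status == 1), numeric(1))
  list(time = ut, surv = cumprod(1 - n_event / n_risk))
}

# Evaluate a KM step function at times t; left = TRUE returns the left limit S(t-).
step_eval <- function(fit, t, left = FALSE) {
  if (length(fit$time) == 0) return(rep(1, length(t)))
  idx <- findInterval(t, fit$time, left.open = left)
  c(1, fit$surv)[idx + 1]
}

# risk: higher value = higher predicted risk; tau: cutoff for the earlier event time.
uno_cindex <- function(risk, time, status, tau = max(time)) {
  cens <- km_fit(time, 1 - status)                 # KM of the censoring distribution G
  w <- step_eval(cens, time, left = TRUE)^(-2)     # pair weight G(T_i-)^(-2)
  num <- 0
  den <- 0
  for (i in which(status == 1 & time < tau)) {     # i must have an observed event before tau
    later <- time > time[i]                        # subjects still observed after T_i
    concordant <- sum(risk[i] > risk[later]) + 0.5 * sum(risk[i] == risk[later])
    num <- num + w[i] * concordant
    den <- den + w[i] * sum(later)
  }
  num / den
}

time   <- c(1, 2, 3, 4, 5)
status <- c(1, 0, 1, 1, 0)
risk   <- c(2.1, 0.3, 1.5, 0.9, 1.2)   # hypothetical risk scores
print(uno_cindex(risk, time, status))
```

Without censoring all weights equal one, and the estimate reduces to Harrell's C restricted to events before τ. A value of 0.5 corresponds to random ranking and 1 to perfect ranking.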
6. Availability, Documentation, and Code Quality Control
Packages in the mlr3 ecosystem are released under the GNU Lesser General Public License version 3 (LGPL-3) on GitHub (https://github.com/mlr-org) and CRAN. Documentation is available at https://mlr3proba.mlr-org.com/. Survival learners are available from the mlr3learners GitHub organisation (https://github.com/mlr3learners). Further use-case-centric information is available in the (work-in-progress) mlr3 book (https://mlr3book.mlr-org.com), and examples are available in the mlr3 gallery (https://mlr3gallery.mlr-org.com). An extensive suite of unit tests is run on different continuous integration systems. Contributor guidelines detail expectations and requirements for external contributions.
Acknowledgments
RS receives a PhD stipend from EPSRC (EP/R513143/1). ML is funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A and by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”.
References
Martin Binder, Florian Pfisterer, Bernd Bischl, Michel Lang, and Susanne Dandl. mlr3pipelines: Preprocessing Operators and Pipelines for 'mlr3', 2020. URL https://mlr3pipelines.mlr-org.com.
Bernd Bischl, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. mlr: Machine Learning in R. Journal of Machine Learning Research, 17(170):1–5, 2016. URL http://jmlr.org/papers/v17/15-066.html.
D. R. Cox. Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220, 1972. ISSN 00359246. doi: 10.1111/j.2517-6161.1972.tb00899.x.
Dirk Eddelbuettel and Romain Francois. Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8):1–18, 2011. doi: 10.18637/jss.v040.i08.
Stephane Fotso et al. PySurvival: Open source package for survival analysis modeling, 2019.
Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, 18(17-18):2529–2545, 1999. ISSN 0277-6715. doi: 10.1002/(SICI)1097-0258(19990915/30)18:17/18<2529::AID-SIM274>3.0.CO;2-5.
Frithjof Gressmann, Franz J. Király, Bilal Mateen, and Harald Oberhauser. Probabilistic supervised learning, 2018. URL http://arxiv.org/abs/1801.00753.
Hemant Ishwaran, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841–860, September 2008. ISSN 1932-6157, 1941-7330. doi: 10.1214/08-AOAS169. URL http://projecteuclid.org/euclid.aoas/1223908043.
E. L. Kaplan and Paul Meier. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282):457–481, 1958. ISSN 01621459. doi: 10.2307/2281868.
Max Kuhn. Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 28(5), November 2008. doi: 10.18637/jss.v028.i05.
Max Kuhn and Hadley Wickham. tidymodels: Easily Install and Load the 'Tidymodels' Packages, 2020. URL https://cran.r-project.org/package=tidymodels.
Michel Lang, Martin Binder, Jakob Richter, Patrick Schratz, Florian Pfisterer, Stefan Coors, Quay Au, Giuseppe Casalicchio, Lars Kotthoff, and Bernd Bischl. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 4(44):1903, 2019a. doi: 10.21105/joss.01903. URL https://joss.theoj.org/papers/10.21105/joss.01903.
Michel Lang, Jakob Richter, Bernd Bischl, and Daniel Schalk. mlr3tuning: Tuning for 'mlr3', 2019b. URL https://cran.r-project.org/package=mlr3tuning.
Ulla B. Mogensen, Hemant Ishwaran, and Thomas A. Gerds. Evaluating Random Forests for Survival Analysis using Prediction Error Curves, 2014.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011. URL http://jmlr.org/papers/v12/pedregosa11a.html.
Sebastian Pölsterl. scikit-survival, July 2019. URL https://doi.org/10.5281/zenodo.3352343.
R Core Team. R: A Language and Environment for Statistical Computing, 2020.
Raphael Sonabend and Franz Kiraly. distr6: The Complete R6 Probability Distributions Interface, July 2019. URL https://cran.r-project.org/package=distr6.
Terry M. Therneau. A Package for Survival Analysis in S, 2015. URL https://cran.r-project.org/package=survival.
Hajime Uno, Tianxi Cai, Michael J. Pencina, Ralph B. D'Agostino, and L. J. Wei. On the C-statistics for Evaluating Overall Adequacy of Risk Prediction Procedures with Censored Survival Data. Statistics in Medicine, 30(10):1105–1117, 2011. ISSN 0277-6715. doi: 10.1002/sim.4154.