A Fast and Scalable Implementation Method for Competing Risks Data with the R Package fastcmprsk
Journal of Statistical Software
Eric S. Kawaguchi
University of California, Los Angeles
Jenny I. Shen
Harbor-UCLA Medical Center
Gang Li
University of California, Los Angeles
Marc A. Suchard
University of California, Los Angeles
Abstract
Advancements in medical informatics tools and high-throughput biological experimentation make large-scale biomedical data routinely accessible to researchers. Competing risks data are typical in biomedical studies where individuals are at risk of more than one cause (type of event) which can preclude the others from happening. The Fine-Gray model is a popular and well-appreciated model for competing risks data and is currently implemented in a number of statistical software packages. However, current implementations are not computationally scalable for large-scale competing risks data. We have developed an R package, fastcmprsk, that uses a novel forward-backward scan algorithm to significantly reduce the computational complexity of parameter estimation by exploiting the structure of the subject-specific risk sets. Numerical studies compare the speed and scalability of our implementation to current methods for unpenalized and penalized Fine-Gray regression and show impressive gains in computational efficiency.

Keywords: Fine-Gray model, inverse-censoring probability, large-scale data, scalable computing, semi-parametric modeling, survival analysis, time-to-event data.
1. Introduction
Competing risks time-to-event data arise frequently in biomedical research when subjects are at risk for more than one type of possibly correlated event or cause and the occurrence of one event precludes the others from happening. For example, one may wish to study time until first kidney transplant for kidney dialysis patients with end-stage renal disease. Then terminating events such as death, renal function recovery, or discontinuation of dialysis are competing risks, as their occurrence will prevent subjects from receiving a transplant. When modeling competing risks data, the cumulative incidence function (CIF), the probability of observing a certain cause while taking the other causes (known as the competing risks) into account, is oftentimes a quantity of interest.

The most commonly-used model to draw inference about the covariate effect on the CIF and to predict the CIF given a set of covariates is the Fine-Gray proportional subdistribution hazards model (Fine and Gray 1999). Various statistical packages for estimating the parameters of the Fine-Gray model are popular within the R (R Development Core Team 2019) programming language. One package, among others, is the cmprsk (Gray 2014) package. The riskRegression (Gerds, Blanche, Mogensen, and Ozenne 2019) package, initially implemented for predicting absolute risks (Gerds, Scheike, and Andersen 2012), uses a wrapper that calls the cmprsk package to perform Fine-Gray regression. Scheike and Zhang (2011) provide timereg (Scheike 2019), which allows for general modeling of the cumulative incidence function and includes the Fine-Gray model as a special case. The survival package also performs Fine-Gray regression, but does so using a weighted Cox (Cox 1972) model. Over the past decade, there have been several extensions to the Fine-Gray method that also result in useful packages. The crrSC (Zhou and Latouche 2013) package allows for the modeling of both stratified (Zhou, Latouche, Rocha, and Fine 2011) and clustered (Zhou, Fine, Latouche, and Labopin 2012) competing risks data. Kuk and Varadhan (2013) propose a stepwise Fine-Gray selection procedure and develop the crrstep (Varadhan and Kuk 2015) package for implementation. Fu, Parikh, and Zhou (2017) then introduce penalized Fine-Gray regression with the corresponding crrp (Fu 2016) package.

A contributing factor to the computational complexity of general Fine-Gray regression implementations is parameter estimation. Generally, one needs to compute the log-pseudo likelihood and its first and second derivatives with respect to the regression parameters for optimization. Calculating these quantities is typically of order $O(n^2)$, where $n$ is the number of observations in the dataset, due to the repeated calculation of the subject-specific risk sets. With current technological advancements making large-scale data from electronic health record (EHR) systems routinely accessible to researchers, these implementations quickly become inoperable or grind to a halt in this domain. For time-to-event data with no competing risks, Mittal, Madigan, Burd, and Suchard (2014), among others, have made significant progress in reducing the computational complexity of the Cox proportional hazards model from $O(n^2)$ to $O(n)$ by taking advantage of the cumulative structure of the risk set.
However, the counterfactual construction of the risk set for the Fine-Gray model does not retain the same structure and presents a barrier to reducing the complexity of the risk set calculation. To the best of our knowledge, no further advancements in reducing the computational complexity required for calculating the subject-specific risk sets exist.

The contribution of this work is the development of an R package, fastcmprsk, which implements a novel forward-backward scan algorithm (Kawaguchi, Shen, Suchard, and Li 2019) for the Fine-Gray model. By taking advantage of the ordering of the data and the structure of the risk set, we can calculate the log-pseudo likelihood and its derivatives, which are necessary for parameter estimation, in $O(n)$ time rather than $O(n^2)$. As a consequence, our approach is scalable to large competing risks datasets and outperforms competing algorithms for both penalized and unpenalized parameter estimation.

The remainder of this paper is organized as follows. Section 2 reviews the Fine-Gray model and its estimation. Section 3 describes the forward-backward scan algorithm, and Section 4 introduces the fastcmprsk package that we developed for R, which utilizes the aforementioned algorithm and includes unpenalized and penalized parameter estimation and CIF estimation. We perform simulation studies in Section 5 to compare the performance of our proposed method to that of some popular competitors. Section 6 provides an illustration on real data using a subset of the United States Renal Data System. The fastcmprsk package is readily available at https://github.com/erickawaguchi/fastcmprsk.
2. Preliminaries
We first establish some notation and the formal definition of the data generating process for competing risks. For subject $i = 1, \ldots, n$, let $T_i$, $C_i$, and $\epsilon_i$ be the event time, possible right-censoring time, and cause (event type), respectively. Without loss of generality, assume there are two event types $\epsilon \in \{1, 2\}$, where $\epsilon = 1$ is the event of interest (or primary event) and $\epsilon = 2$ is the competing risk. With the presence of right-censoring, we generally observe $X_i = T_i \wedge C_i$ and $\delta_i = I(T_i \leq C_i)$, where $a \wedge b = \min(a, b)$ and $I(\cdot)$ is the indicator function. Letting $z_i$ be a $p$-dimensional vector of time-independent subject-specific covariates, competing risks data consist of the following independent and identically distributed quadruplets $\{(X_i, \delta_i, \delta_i\epsilon_i, z_i)\}_{i=1}^n$. Assume that there also exists a $\tau$ such that 1) for some arbitrary time $t$, $t \in [0, \tau]$; 2) $\Pr(T_i > \tau) > 0$ and $\Pr(C_i > \tau) > 0$ for all $i = 1, \ldots, n$; and that, for simplicity, no ties are observed.

The CIF for the primary event conditional on the covariates $z = (z_1, \ldots, z_p)$ is $F_1(t; z) = \Pr(T \leq t, \epsilon = 1 \mid z)$. To model the covariate effects on $F_1(t; z)$, Fine and Gray (1999) introduced the now well-appreciated proportional subdistribution hazards (PSH) model:
$$h_1(t \mid z) = h_{10}(t) \exp(z^\top \beta), \quad (1)$$
where
$$h_1(t \mid z) = \lim_{\Delta t \to 0} \frac{\Pr\{t \leq T \leq t + \Delta t, \epsilon = 1 \mid T \geq t \cup (T \leq t \cap \epsilon \neq 1), z\}}{\Delta t} = -\frac{d}{dt} \log\{1 - F_1(t; z)\}$$
is a subdistribution hazard (Gray 1988), $h_{10}(t)$ is a completely unspecified baseline subdistribution hazard, and $\beta$ is a $p \times 1$ vector of regression coefficients. The risk set associated with $h_1(t \mid z)$ is somewhat counterfactual as it includes both subjects who are still at risk ($T \geq t$) and those who have already observed the competing risk prior to time $t$ ($T \leq t \cap \epsilon \neq 1$). However, this construction is useful for direct modeling of the CIF.

Parameter estimation and large-sample inference of the PSH model follows from the log-pseudo likelihood:
$$l(\beta) = \sum_{i=1}^n \int_0^\infty \left[ \beta^\top z_i - \ln\left\{ \sum_k \hat{w}_k(u) Y_k(u) \exp(z_k^\top \beta) \right\} \right] \hat{w}_i(u) \, dN_i(u), \quad (2)$$
where $N_i(t) = I(X_i \leq t, \epsilon_i = 1)$, $Y_i(t) = 1 - N_i(t-)$, and $\hat{w}_i(t)$ is a time-dependent weight based on the inverse probability of censoring weighting (IPCW) technique (Robins and Rotnitzky 1992). To parallel Fine and Gray (1999), we define the IPCW for subject $i$ at time $t$ as $\hat{w}_i(t) = I(C_i \geq T_i \wedge t)\, \hat{G}(t) / \hat{G}(X_i \wedge t)$, where $G(t) = \Pr(C \geq t)$ is the survival function of the censoring variable $C$ and $\hat{G}(t)$ is its Kaplan-Meier estimate. However, we can generalize the IPCW to allow for dependence between $C$ and $z$.

Let $\hat{\beta}_{\mathrm{mple}} = \arg\min_\beta \{-l(\beta)\}$ be the maximum pseudo likelihood estimator of $\beta$. Fine and Gray (1999) investigate the large-sample properties of $\hat{\beta}_{\mathrm{mple}}$ and prove that, under certain regularity conditions,
$$\sqrt{n}(\hat{\beta}_{\mathrm{mple}} - \beta_0) \to N(0, \Omega^{-1} \Sigma \Omega^{-1}), \quad (3)$$
where $\beta_0$ is the true value of $\beta$, $\Omega$ is the limit of the negative of the partial-derivative matrix of the score function evaluated at $\beta_0$, and $\Sigma$ is the variance-covariance matrix of the limiting distribution of the score function. The package cmprsk implements this estimation procedure.
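To make the weights concrete, the censoring-survival curve $\hat{G}$ can be obtained by reversing the roles of event and censoring in a Kaplan-Meier fit. The following is a minimal sketch under the definitions above, not fastcmprsk internals; it glosses over left-/right-continuity conventions at the jump points, and `X` and `delta` are assumed to hold the observed-data vectors.

# Minimal illustration (not package internals) of the IPCW ingredients above:
# G(t) = Pr(C >= t) estimated by Kaplan-Meier with the roles of event and
# censoring reversed; `X` and `delta` are the observed-data vectors.
library(survival)
kmG  <- survfit(Surv(X, 1 - delta) ~ 1)      # censoring treated as the "event"
Ghat <- stepfun(kmG$time, c(1, kmG$surv))    # step-function version of G-hat
# w_i(t) = I(C_i >= T_i ^ t) G(t) / G(X_i ^ t); the indicator equals 1 exactly
# when delta_i = 1 or X_i >= t, both of which are observable.
w <- function(i, t) ((delta[i] == 1) || (X[i] >= t)) * Ghat(t) / Ghat(min(X[i], t))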
An alternative interpretation of the coefficients from the Fine-Gray model is through their effect on the CIF. Using a Breslow-type estimator (Breslow 1974), we can obtain a consistent estimate for $H_{10}(t) = \int_0^t h_{10}(s)\, ds$ through
$$\hat{H}_{10}(t) = \frac{1}{n} \sum_{i=1}^n \int_0^t \frac{1}{\hat{S}^{(0)}(\hat{\beta}, u)}\, \hat{w}_i(u)\, dN_i(u),$$
where $\hat{S}^{(0)}(\hat{\beta}, u) = n^{-1} \sum_{i=1}^n \hat{w}_i(u) Y_i(u) \exp(z_i^\top \hat{\beta})$. The predicted CIF, conditional on $z = z_0$, is then
$$\hat{F}_1(t; z_0) = 1 - \exp\left\{ -\int_0^t \exp(z_0^\top \hat{\beta})\, d\hat{H}_{10}(u) \right\}.$$
We refer the readers to Appendix B of Fine and Gray (1999) for the large-sample properties of $\hat{F}_1(t; z_0)$. The quantities needed to estimate $\int_0^t d\hat{H}_{10}(u)$ are already precomputed when estimating $\hat{\beta}$. Fine and Gray (1999) proposed a resampling approach to calculate confidence intervals and confidence bands for $\hat{F}_1(t; z_0)$.

Oftentimes, researchers are interested in identifying which covariates have an effect on the CIF. Penalization methods (Tibshirani 1996; Fan and Li 2001; Zou 2006; Zhang, Li, and Tsai 2010) offer a popular way to perform variable selection and parameter estimation simultaneously through minimizing the objective function
$$Q(\beta) = -l(\beta) + \sum_{j=1}^p p_\lambda(|\beta_j|), \quad (4)$$
where $l(\beta)$ is defined in (2) and $p_\lambda(|\beta_j|)$ is a penalty function whose sparsity is controlled by the non-negative tuning parameter $\lambda$. Fu et al. (2017) recently extended several popular variable selection procedures, LASSO (Tibshirani 1996), SCAD (Fan and Li 2001), adaptive LASSO (Zou 2006), and MCP (Zhang 2010), to the Fine-Gray model, explored their asymptotic properties under fixed model dimension, and developed the R package crrp (Fu 2016) for implementation. Parameter estimation in the crrp package employs a cyclic coordinate descent algorithm.

The sparsity of the model depends heavily on the choice of the tuning parameter. Practically, finding a suitable (or optimal) tuning parameter involves applying a penalization method over a sequence of possible candidate values of $\lambda$ and finding the $\lambda$ that minimizes some metric such as the Bayesian information criterion (Schwarz 1978) or the generalized cross validation measure (Craven and Wahba 1978). A more thorough discussion on tuning parameter selection can be found in Wang, Li, and Tsai (2007); Zhang et al. (2010); Wang and Zhu (2011); Fan and Tang (2013); Fu et al. (2017); Ni and Cai (2018).
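As a concrete illustration of this tuning step, the following sketch selects $\lambda$ by BIC over an already-fitted path; `ll`, `df`, and `lambda` are hypothetical vectors holding, for each candidate value, the maximized log-pseudo likelihood, the number of nonzero coefficients, and the candidate itself, with `n` the sample size.

# Minimal sketch of BIC-based tuning parameter selection over a candidate
# grid; `ll`, `df`, `lambda`, and `n` are assumed inputs (illustrative names).
bic <- -2 * ll + log(n) * df          # BIC score for each candidate lambda
lambda.opt <- lambda[which.min(bic)]  # candidate minimizing the BIC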
3. Forward-backward scan for parameter estimation
This section discusses a novel forward-backward scan algorithm, proposed in Kawaguchi et al. (2019), that reduces the computational complexity associated with parameter estimation from $O(n^2)$ to $O(n)$. Commonly-used optimization routines generally require the calculation of the log-pseudo likelihood (2), the score function
$$\dot{l}_j(\beta) = \sum_{i=1}^n I(\delta_i \epsilon_i = 1)\, z_{ij} - \sum_{i=1}^n I(\delta_i \epsilon_i = 1)\, \frac{\sum_{k \in R_i} z_{kj} \tilde{w}_{ik} \exp(\eta_k)}{\sum_{k \in R_i} \tilde{w}_{ik} \exp(\eta_k)}, \quad (5)$$
and, in some cases, the Hessian diagonals
$$\ddot{l}_{jj}(\beta) = \sum_{i=1}^n I(\delta_i \epsilon_i = 1) \left[ \frac{\sum_{k \in R_i} z_{kj}^2 \tilde{w}_{ik} \exp(\eta_k)}{\sum_{k \in R_i} \tilde{w}_{ik} \exp(\eta_k)} - \left\{ \frac{\sum_{k \in R_i} z_{kj} \tilde{w}_{ik} \exp(\eta_k)}{\sum_{k \in R_i} \tilde{w}_{ik} \exp(\eta_k)} \right\}^2 \right], \quad (6)$$
where $\tilde{w}_{ik} = \hat{w}_k(X_i) = \hat{G}(X_i)/\hat{G}(X_i \wedge X_k)$ for $k \in R_i$, $R_i = \{y : (X_y \geq X_i) \cup (X_y \leq X_i \cap \epsilon_y = 2)\}$, and $\eta_k = z_k^\top \beta$, for use within cyclic coordinate descent. Direct calculations using the above formulas require $O(n^2)$ operations due to the double summations, which becomes computationally taxing for large $n$. Below we show how to calculate the double summations linearly, allowing us to compute (2), (5), and (6) in $O(n)$ time.

Before proceeding with the algorithm, we first define what we mean by a forward and a backward scan. A forward (prefix) scan maps $\{a_1, a_2, \ldots, a_n\} \mapsto \{a_1, a_1 + a_2, \ldots, \sum_{i=1}^n a_i\}$, whereas a backward (prefix) scan maps $\{a_1, a_2, \ldots, a_n\} \mapsto \{\sum_{i=1}^n a_i, \sum_{i=2}^n a_i, \ldots, a_n\}$. First, note that $R_i$ partitions into two disjoint subsets: $R_i(1) = \{y : X_y \geq X_i\}$ and $R_i(2) = \{y : X_y \leq X_i \cap \epsilon_y = 2\}$. Here $R_i(1)$ is the set of observations with an observed event time at or after $X_i$, and $R_i(2)$ is the set of observations that observed the competing event before time $X_i$. Further, $\tilde{w}_{ik} = 1$ if $k \in R_i(1)$ and $\tilde{w}_{ik} = \hat{G}(X_i)/\hat{G}(X_k)$ if $k \in R_i(2)$. Since $R_i(1)$ and $R_i(2)$ are disjoint, we can write the double summation of, for example, the score function (5) as
$$\sum_{i=1}^n I(\delta_i \epsilon_i = 1)\, \frac{\sum_{k \in R_i(1)} z_{kj} \exp(\eta_k) + \hat{G}(X_i) \sum_{k \in R_i(2)} z_{kj} \exp(\eta_k)/\hat{G}(X_k)}{\sum_{k \in R_i(1)} \exp(\eta_k) + \hat{G}(X_i) \sum_{k \in R_i(2)} \exp(\eta_k)/\hat{G}(X_k)}. \quad (7)$$

We first tackle the denominator term $\sum_{k \in R_i(1)} \exp(\eta_k) + \hat{G}(X_i) \sum_{k \in R_i(2)} \exp(\eta_k)/\hat{G}(X_k)$. If we arrange the observed event times in decreasing order, we see that $\sum_{k \in R_i(1)} \exp(\eta_k)$ is a series of cumulative sums. For example, given two times $X_i > X_{i'}$, the set $R_{i'}(1)$ consists of the observations in $R_i(1)$ together with the observations $\{y : X_y \in [X_{i'}, X_i)\}$; therefore
$$\sum_{k \in R_{i'}(1)} \exp(\eta_k) = \sum_{k \in R_i(1)} \exp(\eta_k) + \sum_{k \in \{y : X_y \in [X_{i'}, X_i)\}} \exp(\eta_k),$$
and thus calculating $\sum_{k \in R_i(1)} \exp(\eta_k)$ for all $i = 1, \ldots, n$ requires $O(n)$ calculations in total. However, $\hat{G}(X_i) \sum_{k \in R_i(2)} \exp(\eta_k)/\hat{G}(X_k)$ does not monotonically increase as the event times decrease.
Instead, we observe that $\sum_{k \in R_i(2)} \exp(\eta_k)/\hat{G}(X_k)$ is a series of cumulative sums as the event times increase. Thus calculating the denominator term requires two scans: one forward scan, going from the largest observed event time to the smallest, to calculate $\sum_{k \in R_i(1)} \exp(\eta_k)$, and one backward scan, from the smallest observed event time to the largest, to calculate $\hat{G}(X_i) \sum_{k \in R_i(2)} \exp(\eta_k)/\hat{G}(X_k)$. Likewise, we calculate both $\sum_{k \in R_i} z_{kj} \tilde{w}_{ik} \exp(\eta_k)$ and $\sum_{k \in R_i} z_{kj}^2 \tilde{w}_{ik} \exp(\eta_k)$ in linear time, since the terms $z_{kj}$ and $z_{kj}^2$ are multiplicative constants that do not affect the cumulative structure of the summations. As a consequence, the ratio in the double summation is available in $O(n)$ time.

Furthermore, the outer summation over subjects who observe the event of interest is also a cumulative sum since, with subjects indexed in decreasing order of observed time and provided that $X_i > X_{i'}$ and both $\delta_i \epsilon_i = 1$ and $\delta_{i'} \epsilon_{i'} = 1$,
$$\sum_{l=1}^{i'} I(\delta_l \epsilon_l = 1)\, \frac{\sum_{k \in R_l} z_{kj} \tilde{w}_{lk} \exp(\eta_k)}{\sum_{k \in R_l} \tilde{w}_{lk} \exp(\eta_k)} = \sum_{l=1}^{i} I(\delta_l \epsilon_l = 1)\, \frac{\sum_{k \in R_l} z_{kj} \tilde{w}_{lk} \exp(\eta_k)}{\sum_{k \in R_l} \tilde{w}_{lk} \exp(\eta_k)} \quad (8)$$
$$\qquad + I(\delta_{i'} \epsilon_{i'} = 1)\, \frac{\sum_{k \in R_{i'}} z_{kj} \tilde{w}_{i'k} \exp(\eta_k)}{\sum_{k \in R_{i'}} \tilde{w}_{i'k} \exp(\eta_k)}, \quad (9)$$
which also requires only $O(n)$ calculations since the ratios are precomputed in $O(n)$ calculations, and thus the score function (5) can be calculated in linear time. Similarly, both the log-pseudo likelihood (2) and the diagonal elements of the Hessian (6) are also calculated linearly. We refer readers to Kawaguchi et al. (2019) for a thorough discussion.
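To make the scan concrete, the following is a minimal R sketch of the denominator calculation in (7); it is an illustration of the algorithm, not the package's internal implementation, and assumes the data are sorted in decreasing order of observed time with no ties.

# Minimal sketch of the forward-backward scan for the denominator in (7).
# Assumes observations sorted in decreasing order of observed time; `cr[k]`
# indicates the competing event (epsilon_k = 2), `eta` holds z_k' beta, and
# `G[k]` is the Kaplan-Meier censoring survival at the k-th observed time.
scanDenominator <- function(cr, eta, G) {
  expEta <- exp(eta)
  # Forward scan (largest time to smallest): sums over R_i(1) = {k : X_k >= X_i}.
  fwd <- cumsum(expEta)
  # Backward scan (smallest time to largest): suffix sums of exp(eta_k) / G(X_k)
  # over subjects with the competing event.
  tailSum <- rev(cumsum(rev(cr * expEta / G)))
  # R_i(2) holds only times strictly before X_i, so drop the i-th term and
  # scale by G(X_i) to recover the weight G(X_i) / G(X_k).
  bwd <- G * c(tailSum[-1], 0)
  fwd + bwd  # denominator of (7) for every i, in O(n) total time
}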
4. The fastcmprsk package
We utilize this forward-backward scan algorithm for both penalized and unpenalized parameter estimation for the Fine-Gray model in linear time. Furthermore, we also develop scalable methods to estimate the predicted CIF and its corresponding confidence interval/band. For convenience to researchers and readers, we further include a function to simulate two-cause competing risks data. Table 1 provides a quick summary of the currently available functions in fastcmprsk. We briefly detail the use of these functions below.
Researchers can simulate two-cause competing risks data using the simulateTwoCauseFineGrayModel function in fastcmprsk. The data generation scheme follows a similar design to that of Fine and Gray (1999) and Fu et al. (2017).
Table 1: Currently available functions in fastcmprsk.

  Modeling functions:
    fastCrr                          Fits unpenalized Fine-Gray regression
    fastCrrp                         Fits penalized Fine-Gray regression
  Utilities:
    summary                          Returns ANOVA table from fastCrr output
    predict                          Estimates CIF given a vector of covariates
    plot                             Plots output (object dependent)
    varianceControl                  Options for bootstrap variance
    simulateTwoCauseFineGrayModel    Simulates two-cause competing risks data

Given a design matrix $Z = (z_1, \ldots, z_n)^\top$, $\beta_1$, and $\beta_2$, let the cumulative incidence function for cause 1 (the event of interest) be defined as
$$F_1(t; z_i) = \Pr(T_i \leq t, \epsilon_i = 1 \mid z_i) = 1 - \left[1 - \pi\{1 - \exp(-t)\}\right]^{\exp(z_i^\top \beta_1)},$$
which is a unit exponential mixture with mass $1 - \pi$ at $\infty$ when $z_i = 0$, and where $\pi$ controls the cause 1 event rate. The cumulative incidence function for cause 2 is obtained by setting $\Pr(\epsilon_i = 2 \mid z_i) = 1 - \Pr(\epsilon_i = 1 \mid z_i)$ and then using an exponential distribution with rate $\exp(z_i^\top \beta_2)$ for the conditional cumulative incidence function $\Pr(T_i \leq t \mid \epsilon_i = 2, z_i)$. Censoring times are independently generated from a uniform distribution $U(u_{\min}, u_{\max})$, where $u_{\min}$ and $u_{\max}$ control the censoring percentage. Appendix A provides more details on the data generation process. Below is a toy example of simulating competing risks data where $n = 500$, $\beta_1 = (0.40, -0.40, 0, -0.50, 0, 0.60, 0.75, 0, 0, -0.80)$, $\beta_2 = -\beta_1$, $u_{\min} = 0$, $u_{\max} = 1$, and $\pi = 0.5$. $Z$ is simulated from a multivariate standard normal distribution with unit variance. This simulated dataset will be used to illustrate the use of the different modeling functions within fastcmprsk.

R> library(fastcmprsk)
R> set.seed(2019)
R> nobs <- 500
R> beta1 <- c(0.40, -0.40, 0, -0.50, 0, 0.60, 0.75, 0, 0, -0.80)
R> beta2 <- -beta1
R> Z <- matrix(rnorm(nobs * length(beta1)), nrow = nobs)
R> dat <- simulateTwoCauseFineGrayModel(nobs, beta1, beta2,
+   Z, u.min = 0, u.max = 1, p = 0.5)
R> table(dat$fstatus)

R> head(dat$ftime)

[1] 0.098345608 0.008722629 0.208321175 0.017656904 0.495185038 0.222799124
We first illustrate coefficient estimation for model (1) using the Fine-Gray log-pseudo likelihood. The fastCrr function estimates these parameters using our forward-backward scan algorithm and is functionally similar to the crr function from the cmprsk package. We compare the resulting estimates on the simulated data:

R> fit1 <- cmprsk::crr(dat$ftime, dat$fstatus, Z, failcode = 1, cencode = 0)
R> fit2 <- fastcmprsk::fastCrr(dat$ftime, dat$fstatus, Z, failcode = 1,
+   cencode = 0, variance = FALSE)
R> max(abs(fit1$coef - fit2$coef))

[1] 8.534242e-08
As expected, the fastCrr function calculates nearly identical parameter estimates to the crr function. We compare the runtime performance between these two methods in Section 5.1.

We now show how to obtain the variance-covariance matrix for the parameter estimates. The variance-covariance matrix of $\hat{\beta}$ cannot be directly estimated using the fastCrr function. First, the asymptotic expression (3) requires estimating both $\Omega$ and $\Sigma$, which cannot be trivially calculated in linear time. Second, for large-scale data where both $n$ and $p$ can be large, matrix calculation, storage, and inversion can be computationally prohibitive. Instead, we propose to estimate the variance-covariance matrix using the bootstrap (Efron 1979). Let $\tilde{\beta}^{(1)}, \ldots, \tilde{\beta}^{(B)}$ be bootstrapped parameter estimates obtained by resampling subjects with replacement from the original data $B$ times. Unless otherwise noted, the size of each resample is the same as the original data. For $j = 1, \ldots, p$ and $k = 1, \ldots, p$, we can estimate the covariance between $\hat{\beta}_j$ and $\hat{\beta}_k$ by
$$\widehat{\mathrm{Cov}}(\hat{\beta}_j, \hat{\beta}_k) = \frac{1}{B - 1} \sum_{b=1}^B (\tilde{\beta}_j^{(b)} - \bar{\beta}_j)(\tilde{\beta}_k^{(b)} - \bar{\beta}_k), \quad (10)$$
where $\bar{\beta}_j = \frac{1}{B} \sum_{b=1}^B \tilde{\beta}_j^{(b)}$. Therefore, with $\hat{\sigma}_j^2 = \widehat{\mathrm{Cov}}(\hat{\beta}_j, \hat{\beta}_j)$, a $(1 - \alpha) \times 100\%$ confidence interval for $\beta_j$ is given by
$$\hat{\beta}_j \pm z_{1 - \alpha/2}\, \hat{\sigma}_j, \quad (11)$$
where $z_{1 - \alpha/2}$ is the $(1 - \alpha/2) \times 100$th percentile of the standard normal distribution. Since parameter estimation for the Fine-Gray model can be done in linear time using our forward-backward scan algorithm, the collection of parameter estimates obtained by bootstrapping can also be obtained linearly. The varianceControl function controls the parameters used for bootstrapping, which one then passes into the var.control argument of fastCrr.

R> vc <- varianceControl(B = 100, seed = 2019)
R> fit3 <- fastcmprsk::fastCrr(dat$ftime, dat$fstatus, Z,
+   failcode = 1, cencode = 0, variance = TRUE,
+   var.control = vc, returnDataFrame = TRUE)

The estimated bootstrap standard errors for the ten coefficients are:

[1] 0.099 0.096 0.099 0.104 0.099 0.113 0.103 0.097 0.104 0.135

R> summary(fit3, conf.int = FALSE, digits = 2)
Fine-Gray Regression via fastcmprsk package.

Call:
fastcmprsk::fastCrr(dat$ftime, dat$fstatus, Z, failcode = 1,
    cencode = 0, variance = TRUE, var.control = vc, returnDataFrame = TRUE)

fastCrr converged in 24 iterations.

          coef exp(coef) se(coef)       z p-value
 [1,]  0.19228     1.212   0.0993  1.9358 5.3e-02
 [2,] -0.38640     0.679   0.0963 -4.0142 6.0e-05
 [3,]  0.01816     1.018   0.0988  0.1838 8.5e-01
 [4,] -0.39769     0.672   0.1042 -3.8169 1.4e-04
 [5,]  0.10571     1.111   0.0986  1.0724 2.8e-01
 [6,]  0.57494     1.777   0.1130  5.0895 3.6e-07
 [7,]  0.77884     2.179   0.1032  7.5478 4.4e-14
 [8,] -0.00611     0.994   0.0972 -0.0628 9.5e-01
 [9,] -0.06571     0.936   0.1040 -0.6315 5.3e-01
[10,] -0.99687     0.369   0.1346 -7.4054 1.3e-13

Pseudo Log-likelihood = -590
Null Pseudo Log-likelihood = -675
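For illustration, the interval construction in (10) and (11) amounts to a few lines of R once the bootstrap estimates are collected; `bootEst` (a $B \times p$ matrix of bootstrap estimates) and `est` (the original estimate) are illustrative names, not part of the package's API.

# Minimal sketch of (10)-(11): bootstrap standard errors and normal-based
# confidence intervals from a B x p matrix `bootEst` of bootstrap estimates;
# `est` is the original estimate. Names are illustrative.
bootCI <- function(bootEst, est, alpha = 0.05) {
  se <- apply(bootEst, 2, sd)    # sd() carries the 1 / (B - 1) factor of (10)
  z  <- qnorm(1 - alpha / 2)
  cbind(estimate = est, lower = est - z * se, upper = est + z * se)
}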
The CIF is also available in linear time in the fastcmprsk package. Fine and Gray (1999) propose a Monte Carlo simulation method for interval and band estimation. We implement a slightly different approach using bootstrapping for interval and band estimation in our package. Let $\tilde{F}_1^{(1)}(t; z_0), \ldots, \tilde{F}_1^{(B)}(t; z_0)$ be the bootstrapped predicted CIFs obtained by resampling subjects with replacement from the original data $B$ times, and let $m(\cdot)$ be a known, monotone, and continuous transformation. In our current implementation we let $m(x) = \log\{-\log(x)\}$; however, we plan on incorporating other transformations in a future implementation. We first estimate the variance function $\sigma^2(t; z_0)$ of the transformed CIF through
$$\hat{\sigma}^2(t; z_0) = \frac{1}{B} \sum_{b=1}^B \left[ m\{\tilde{F}_1^{(b)}(t; z_0)\} - \bar{m}\{\tilde{F}_1(t; z_0)\} \right]^2, \quad (12)$$
where $\bar{m}\{\tilde{F}_1(t; z_0)\} = \frac{1}{B} \sum_{b=1}^B m\{\tilde{F}_1^{(b)}(t; z_0)\}$. Using the functional delta method, we can now construct $(1 - \alpha) \times 100\%$ confidence intervals for $F_1(t; z_0)$ by
$$m^{-1}\left[ m\{\hat{F}_1(t; z_0)\} \pm z_{1 - \alpha/2}\, \hat{\sigma}(t; z_0) \right]. \quad (13)$$

Next we propose a symmetric global confidence band for the estimated CIF $\hat{F}_1(t; z_0)$, $t \in [t_L, t_U]$, via the bootstrap. We first determine a critical region $C_{1-\alpha}(z_0)$ such that
$$\Pr\left\{ \sup_{t \in [t_L, t_U]} \frac{\left| m\{\hat{F}_1(t; z_0)\} - m\{F_1(t; z_0)\} \right|}{\sqrt{\widehat{\mathrm{Var}}\left[ m\{\hat{F}_1(t; z_0)\} \right]}} \leq C_{1-\alpha}(z_0) \right\} = 1 - \alpha. \quad (14)$$
While Equation (12) estimates $\widehat{\mathrm{Var}}[m\{\hat{F}_1(t; z_0)\}]$, we still need to find $C_{1-\alpha}(z_0)$ via the bootstrap $(1 - \alpha)$th percentile of the distribution of the supremum in the equation above. The algorithm is as follows:

1. Resample subjects with replacement from the original data $B$ times and estimate $\tilde{F}_1^{(b)}(t; z_0)$ for $b = 1, \ldots, B$ and $\hat{\sigma}^2(t; z_0)$ using (12).

2. For the $b$th bootstrap sample, $b \in \{1, \ldots, B\}$, calculate
$$C^{(b)} = \sup_{t \in [t_L, t_U]} \frac{\left| m\{\tilde{F}_1^{(b)}(t; z_0)\} - m\{\hat{F}_1(t; z_0)\} \right|}{\hat{\sigma}(t; z_0)}.$$
3. Estimate $C_{1-\alpha}(z_0)$ by the sample $(1 - \alpha)$th percentile of the $B$ values of $C^{(b)}$, denoted by $\hat{C}_{1-\alpha}(z_0)$.

Finally, the $(1 - \alpha) \times 100\%$ confidence band for $F_1(t; z_0)$, $t \in [t_L, t_U]$, is given by
$$m^{-1}\left[ m\{\hat{F}_1(t; z_0)\} \pm \hat{C}_{1-\alpha}(z_0)\, \hat{\sigma}(t; z_0) \right]. \quad (15)$$

One can perform CIF estimation and interval/band estimation using the predict function.

R> set.seed(2019)
R> z0 <- rnorm(10)

Figure 1: CIF estimate and corresponding 95% confidence intervals between $t_L$ and $t_U$.
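For illustration, the band construction in (12)-(15) can be sketched directly from a matrix of bootstrapped CIFs evaluated on a grid in $[t_L, t_U]$; `Fboot` and `Fhat` are illustrative names, and the sample standard deviation (with its $1/(B-1)$ factor) stands in for (12).

# Minimal sketch of the band (14)-(15) from a B x m matrix `Fboot` of
# bootstrapped CIFs on a time grid inside [tL, tU], with `Fhat` the estimated
# CIF on the same grid; names are illustrative.
mtr <- function(x) log(-log(x))                 # transformation m(x)
sig <- apply(mtr(Fboot), 2, sd)                 # sigma-hat(t; z0), cf. (12)
dev <- abs(sweep(mtr(Fboot), 2, mtr(Fhat))) / rep(sig, each = nrow(Fboot))
Cb  <- apply(dev, 1, max)                       # sup over the grid, per replicate
Chat <- quantile(Cb, 0.95)                      # step 3 with alpha = 0.05
band <- cbind(lower = exp(-exp(mtr(Fhat) + Chat * sig)),  # m-inverse of (15)
              upper = exp(-exp(mtr(Fhat) - Chat * sig)))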
We extend our forward-backward scan approach to penalized Fine-Gray regression as described in Section 2. The fastCrrp function performs LASSO, SCAD, MCP, and ridge (Hoerl and Kennard 1970) penalization. The advantage of implementing this algorithm for penalized Fine-Gray regression is twofold. First, since the cyclic coordinate descent algorithm used in the crrp function calculates the gradient and Hessian diagonals in $O(pn^2)$ time, as opposed to $O(pn)$ using our approach, we expect to see drastic differences in runtime for large sample sizes. Second, as mentioned earlier, researchers generally tune the strength of regularization through multiple model fits over a grid of candidate tuning parameter values, so the difference in runtime between the two methods grows as the number of candidate values increases. Below is an example of performing LASSO-penalized Fine-Gray regression using 25 candidate values for $\lambda$. The syntax for fastCrrp is nearly identical to the syntax for crrp.

R> library(crrp)
R> lam.path <- 10^seq(log10(0.1), log10(0.001), length = 25)
R> fit.crrp <- crrp(dat$ftime, dat$fstatus, Z, penalty = "LASSO",
+   lambda = lam.path)
R> fit.fcrrp <- fastCrrp(dat$ftime, dat$fstatus, Z, penalty = "LASSO",
+   lambda = lam.path)
R> max(abs(fit.fcrrp$coef - fit.crrp$beta))

[1] 1.110223e-15
R> plot(fit.fcrrp)

Figure 2: Path plot for LASSO-penalized Fine-Gray regression in our toy example.
5. Simulation studies
This section provides a more comprehensive illustration of the computational performance of the fastcmprsk package over two popular competing packages, cmprsk and crrp. We simulate datasets under various sample sizes and fix the number of covariates at $p = 100$. We generate the design matrix $Z$ from a $p$-dimensional standard normal distribution with mean zero, unit variance, and pairwise correlation $\mathrm{corr}(z_i, z_j) = \rho^{|i - j|}$, where $\rho = 0.5$. The vector of regression parameters for cause 1 is $\beta_1 = (\beta^*, 0_{p-10})$, where $\beta^* = (0.40, -0.40, 0, -0.50, 0, 0.60, 0.75, 0, 0, -0.80)$. We let $\beta_2 = -\beta_1$. We set $\pi = 0.5$, which corresponds to a cause 1 event rate of approximately 41%.
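The covariate generation under this design can be sketched with, for example, the mvtnorm package (one convenient option; this is an illustration, not the code used for the reported results):

# Minimal sketch of the simulated design: p-dimensional normal covariates
# with mean zero, unit variance, and corr(z_i, z_j) = rho^|i - j|.
library(mvtnorm)
p <- 100; rho <- 0.5
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
Z <- rmvnorm(1000, sigma = Sigma)  # e.g., n = 1000 rows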
The average censoring percentage for our simulations varies between 30% and 35%. We use simulateTwoCauseFineGrayModel to simulate these data and average results over 100 Monte Carlo replicates. We report timing on a system with an Intel Core i5 2.9 GHz processor and 16GB of memory.

5.1. Comparison to the crr package

In this section, we compare the runtime and estimation performance of the fastCrr function to crr. We vary $n$ from 1000 to 4000 and run fastCrr and crr both with and without variance estimation. We take 100 bootstrap samples to obtain the bootstrap standard errors with fastCrr.

Figure 3 illustrates the runtime performance (in seconds) of both fastCrr (dashed lines) and crr (solid lines) as $n$ increases. It is clear that the runtime of the crr methods increases quadratically while that of the fastCrr methods remains approximately linear. This leads to a substantial improvement in computational performance for large sample sizes. Second, the forward-backward scan allows us to efficiently compute variance estimates through bootstrapping. We see that bootstrapping for smaller sample sizes may not result in computational gains; however, notable differences are observed for larger sample sizes. To assess the performance of the bootstrap procedure for variance estimation, Table 2 compares crr and fastCrr at $n$ = 1000, 2000, 3000, and 4000 for two representative coefficients.
Figure 3: Runtime comparison between fastCrr and crr with and without variance estimation.

Table 2: Comparison of variance estimation between crr and fastCrr for $n$ = 1000, 2000, 3000, and 4000.

5.2. Comparison to the crrp package

As mentioned in Section 2, Fu et al. (2017) provide an R package crrp for performing penalized Fine-Gray regression using the LASSO, SCAD, and MCP penalties. We compare the runtime of fastCrrp with the implementation in the crrp package. To level comparisons, we modify the source code in crrp so that the function only calculates the coefficient estimates and BIC score. We vary $n = 1000, 2000, \ldots, 4000$, fix $p = 100$, and employ a 25-value grid search for the tuning parameter. Figure 4 illustrates the computational advantage the fastCrrp function has over crrp.

The computational performance of crrp (solid lines) increases quadratically while that of fastCrrp (dashed lines) increases linearly, resulting in a 200 to 300-fold speedup in runtime when $n = 4000$. This, along with the previous section, strongly suggests that for large-scale competing risks datasets, analyses that may take several hours or days to perform using currently implemented methods can be completed within seconds or minutes using our forward-backward scan algorithm. We illustrate this in our real data analysis in the following section.
Figure 4: Runtime comparison between the crrp and fastcmprsk implementations of LASSO, SCAD, and MCP penalization. Solid and dashed lines represent the crrp and fastcmprsk implementations, respectively. Square, circle, and triangle symbols denote the MCP, SCAD, and LASSO penalties, respectively.
6. End-stage renal disease
We analyze data collected from the United States Renal Data System (USRDS), a national data system funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) that collects information about end-stage renal disease in the United States. Patients with end-stage renal disease are known to have a shorter life expectancy compared to their disease-free peers (USRDS Annual Report 2017), and kidney transplantation provides better health outcomes for patients with end-stage renal disease (Wolfe, Ashby, Milford, Ojo, Ettenger, Agodoa, Held, and Port 1999; Purnell, Luo, Kucirka, Cooper, Crews, Massie, Boulware, and Segev 2016). However, patients may observe competing events such as death or renal function recovery, or may wish to discontinue dialysis for quality-of-life purposes before transplant. We extract a subset of the USRDS spanning a 10-year study period between January 2005 and June 2015 and subsample to 125,000 subjects. We consider 63 demographic and clinical covariates. The event of interest is the first kidney transplant for patients who were currently on dialysis. Death, renal function recovery, and discontinuation of dialysis are competing risks. Subjects who are lost to follow-up or had no event by the end of the study period are considered right-censored.
Table 3: Timing comparison (in seconds) using a subset of the USRDS dataset. The first two rows correspond to unpenalized Fine-Gray regression with and without variance estimation using crr and fastCrr. The last three rows correspond to penalized Fine-Gray regression using crrp and fastCrrp.

  Unpenalized          crr    fastCrr
    w/o variance     4,544          4
    w/ variance     96,120        246
  Penalized           crrp   fastCrrp
    LASSO           86,304         32
    SCAD            92,591         35
    MCP            102,585         33

Table 3 shows the runtime results for both the crr and fastCrr implementations of unpenalized Fine-Gray regression. Using the crr function, parameter estimation without variance estimation took 1.3 hours to finish, and with variance estimation it took 26.7 hours to complete. The fastCrr function performed the same tasks within seconds, resulting in an over 1000-fold speedup for parameter estimation and an over 390-fold speedup for parameter and variance estimation. With respect to estimation, both approaches return nearly identical parameter estimates (a maximum absolute difference on the order of $3 \times 10^{-8}$).

To compare variance estimation, Figure 5 plots the 95% confidence intervals for the first six covariates: age at dialysis, sex, and presence of diabetes, hypertension, atherosclerotic heart disease, and cardiac failure. Both procedures return similar confidence intervals for all six covariates, and we observe similar results for the covariates not included in the figure.

Finally, we apply the LASSO, SCAD, and MCP variable selection routines to the dataset. Following Section 5.2, we use a grid of 25 candidate tuning parameters. The final model for each penalization method is chosen by selecting the tuning parameter that minimizes the BIC score. The runtime results can be found in the last three rows of Table 3, which show that the current implementations for variable selection are drastically slower than our package (an over 2000-fold difference in runtime). To assess the performance of each method, we consider a test set of 100,000 additional subjects and assess prediction performance through the concordance index (Wolbers, Koller, Witteman, and Steyerberg 2009). The predictive performance of all three methods is comparable, with similar concordance index values
(approximately 0.85), which we attribute to the massive sample size of both the training and test sets. As expected, both MCP and SCAD produce similar-sized models (48 variables for MCP and 49 variables for SCAD) due, in part, to their oracle behavior, while LASSO selects 62 variables that form a superset of the variables selected by both MCP and SCAD. The variables selected by MCP are also all contained in the SCAD model.

In conclusion, our forward-backward scan algorithm results in a significant reduction in runtime for unpenalized and penalized Fine-Gray regression for large-scale competing risks data. Analyses using current packages may take hours or even over a day to finish, whereas the fastcmprsk package completes the same tasks within seconds or minutes.
Figure 5: Point estimates and 95% confidence intervals reported by fastCrr (using 100 bootstrap samples) and crr for age at dialysis (AGE_DIAL), sex (SEX), diabetes (DIABETES), hypertension (HYPER), atherosclerotic heart disease (COMO_ASHD), and cardiac failure (CARFAIL).
7. Discussion
The fastcmprsk package provides a set of scalable tools for the analysis of large-scale competing risks data, built on an approach that linearizes the computational complexity required to estimate the parameters of the Fine-Gray proportional subdistribution hazards model. The package implements both penalized and unpenalized Fine-Gray regression. We can conveniently extend our forward-backward algorithm to other applications such as stratified and clustered Fine-Gray regression. Calculating standard errors for both the parameter estimates and the CIF involves bootstrapping. We may further speed up standard error estimation through parallelization using, for example, the doParallel (Calaway, Weston, and Tenenbaum 2018) package.

Lastly, our current implementation assumes that covariates are densely observed across subjects. This is problematic in the sparse high-dimensional massive sample size (sHDMSS) domain (Mittal et al. 2014; Schuemie, Ryan, Hripcsak, Madigan, and Suchard 2018), where both the number of subjects and the number of sparsely-represented covariates can be very large; extending our implementation to exploit covariate sparsity is a natural direction for future work.
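As noted above, the bootstrap replicates are embarrassingly parallel; a minimal sketch with doParallel follows, where `oneBoot()` is a hypothetical user-defined function that resamples subjects with replacement and refits the model, returning a coefficient vector.

# Minimal sketch: bootstrap replicates run in parallel via foreach.
# `oneBoot(b)` is a hypothetical user function that resamples subjects with
# replacement and refits fastCrr, returning the coefficient vector.
library(doParallel)
registerDoParallel(cores = 4)
bootEst <- foreach(b = 1:100, .combine = rbind) %dopar% oneBoot(b)
stopImplicitCluster()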
8. Acknowledgements
The manuscript was reviewed and approved for publication by an officer of the National Institute of Diabetes and Digestive and Kidney Diseases. Data reported herein were supplied by the USRDS. Interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as official policy or interpretation of the US government. Marc A. Suchard's work is partially supported through the National Institute of Health grant U19 AI 135995. The research of Gang Li was partly supported by National Institute of Health grants P30 CA-16042, UL1TR000124-02, and P50 CA211015.
References
Breslow N (1974). "Covariance analysis of censored survival data." Biometrics, 30(1), 89-99. doi:10.2307/2529620.

Calaway R, Weston S, Tenenbaum D (2018). doParallel: Foreach Parallel Adaptor for the 'parallel' Package. R package version 1.0.14, URL https://CRAN.R-project.org/package=doParallel.

Cox DR (1972). "Regression Models and Life-Tables." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 34(2), 187-220. doi:10.1007/978-1-4612-4380-9_37.

Craven P, Wahba G (1978). "Smoothing noisy data with spline functions." Numerische Mathematik, 31(4), 377-403. doi:10.1007/BF01404567.

Efron B (1979). "Bootstrap Methods: Another Look at the Jackknife." The Annals of Statistics, 7(1), 1-26. doi:10.1214/aos/1176344552.

Fan J, Li R (2001). "Variable selection via nonconcave penalized likelihood and its oracle properties." Journal of the American Statistical Association, 96(456), 1348-1360. doi:10.1198/016214501753382273.

Fan Y, Tang CY (2013). "Tuning parameter selection in high dimensional penalized likelihood." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(3), 531-552. doi:10.1111/rssb.12001.

Fine JP, Gray RJ (1999). "A proportional hazards model for the subdistribution of a competing risk." Journal of the American Statistical Association, 94(446), 496-509. doi:10.1080/01621459.1999.10474144.

Fu Z (2016). crrp: Penalized Variable Selection in Competing Risks Regression. R package version 1.0, URL https://CRAN.R-project.org/package=crrp.

Fu Z, Parikh CR, Zhou B (2017). "Penalized variable selection in competing risks regression." Lifetime Data Analysis, 23(3), 353-376. doi:10.1007/s10985-016-9362-3.

Gerds TA, Blanche P, Mogensen UB, Ozenne B (2019). riskRegression: Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks. R package version 2019.01.29, URL https://CRAN.R-project.org/package=riskRegression.

Gerds TA, Scheike TH, Andersen PK (2012). "Absolute risk regression for competing risks: interpretation, link functions, and prediction." Statistics in Medicine, 31(29), 3921-3930. doi:10.1002/sim.5459.

Gray B (2014). cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.2-7, URL https://CRAN.R-project.org/package=cmprsk.

Gray RJ (1988). "A class of K-sample tests for comparing the cumulative incidence of a competing risk." The Annals of Statistics, 16(3), 1141-1154. doi:10.1214/aos/1176350951.

Hoerl AE, Kennard RW (1970). "Ridge regression: Biased estimation for nonorthogonal problems." Technometrics, 12(1), 55-67. doi:10.1080/00401706.1970.10488634.

Kawaguchi ES, Shen JI, Suchard MA, Li G (2019). "A Scalable l0-Based Sparse Regression Method for Large-Scale Competing Risks Data." Manuscript in preparation.

Kuk D, Varadhan R (2013). "Model selection in competing risks regression." Statistics in Medicine, 32(18), 3077-3088. doi:10.1002/sim.5762.

Mittal S, Madigan D, Burd RS, Suchard MA (2014). "High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis." Biostatistics, 15(2), 207-221. doi:10.1093/biostatistics/kxt043.

Ni A, Cai J (2018). "Tuning Parameter Selection in Cox Proportional Hazards Model with a Diverging Number of Parameters." Scandinavian Journal of Statistics, 45(3), 557-570. doi:10.1111/sjos.12313.

Purnell TS, Luo X, Kucirka LM, Cooper LA, Crews DC, Massie AB, Boulware LE, Segev DL (2016). "Reduced racial disparity in kidney transplant outcomes in the United States from 1990 to 2012." Journal of the American Society of Nephrology, 27(8), 2511-2518. doi:10.1681/ASN.2015030293.

R Development Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Robins JM, Rotnitzky A (1992). "Recovery of information and adjustment for dependent censoring using surrogate markers." In AIDS Epidemiology, pp. 297-331. Springer. doi:10.1007/978-1-4757-1229-2_14.

Scheike T (2019). timereg: Flexible Regression Models for Survival Data. R package version 1.9.3, URL https://CRAN.R-project.org/package=timereg.

Scheike TH, Zhang MJ (2011). "Analyzing competing risk data using the R timereg package." Journal of Statistical Software, 38(2). doi:10.18637/jss.v038.i02.

Schuemie MJ, Ryan PB, Hripcsak G, Madigan D, Suchard MA (2018). "Improving reproducibility by using high-throughput observational studies with empirical calibration." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2128), 20170356. doi:10.1098/rsta.2017.0356.

Schwarz G (1978). "Estimating the dimension of a model." The Annals of Statistics, 6(2), 461-464. doi:10.1214/aos/1176344136.

Tibshirani R (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58(1), 267-288.

Varadhan R, Kuk D (2015). crrstep: Stepwise Covariate Selection for the Fine & Gray Competing Risks Regression Model. R package version 2015-2.1, URL https://CRAN.R-project.org/package=crrstep.

Wang H, Li R, Tsai CL (2007). "Tuning parameter selectors for the smoothly clipped absolute deviation method." Biometrika, 94(3), 553-568. doi:10.1093/biomet/asm053.

Wang T, Zhu L (2011). "Consistent tuning parameter selection in high dimensional sparse linear regression." Journal of Multivariate Analysis, 102(7), 1141-1151. doi:10.1016/j.jmva.2011.03.007.

Wolbers M, Koller MT, Witteman JC, Steyerberg EW (2009). "Prognostic models with competing risks: methods and application to coronary risk prediction." Epidemiology, 20(4), 555-561. doi:10.1097/EDE.0b013e3181a39056.

Wolfe RA, Ashby VB, Milford EL, Ojo AO, Ettenger RE, Agodoa LY, Held PJ, Port FK (1999). "Comparison of mortality in all patients on dialysis, patients on dialysis awaiting transplantation, and recipients of a first cadaveric transplant." New England Journal of Medicine, 341(23), 1725-1730. doi:10.1056/NEJM199912023412303.

Zhang CH (2010). "Nearly unbiased variable selection under minimax concave penalty." The Annals of Statistics, 38(2), 894-942. doi:10.1214/09-aos729.

Zhang Y, Li R, Tsai CL (2010). "Regularization parameter selections via generalized information criterion." Journal of the American Statistical Association, 105(489), 312-323. doi:10.1198/jasa.2009.tm08013.

Zhou B, Fine J, Latouche A, Labopin M (2012). "Competing risks regression for clustered data." Biostatistics, 13(3), 371-383. doi:10.1093/biostatistics/kxr032.

Zhou B, Latouche A (2013). crrSC: Competing Risks Regression for Stratified and Clustered Data. R package version 1.1, URL https://CRAN.R-project.org/package=crrSC.

Zhou B, Latouche A, Rocha V, Fine J (2011). "Competing risks regression for stratified data." Biometrics, 67(2), 661-670. doi:10.1111/j.1541-0420.2010.01493.x.

Zou H (2006). "The adaptive lasso and its oracle properties." Journal of the American Statistical Association, 101(476), 1418-1429. doi:10.1198/016214506000000735.

A. Data generation scheme
We describe the data generation process for the simulateTwoCauseFineGrayModel function. Let $n$, $p$, $Z_{n \times p}$, $\beta_1$, $\beta_2$, $u_{\min}$, $u_{\max}$, and $\pi$ be specified. We first generate independent Bernoulli random variables to simulate the cause indicator $\epsilon_i$ for each subject, such that $\Pr(\epsilon_i = 1 \mid z_i) = 1 - (1 - \pi)^{\exp(z_i^\top \beta_1)}$ for $i = 1, \ldots, n$. Then, conditional on the cause, event times are simulated from
$$\Pr(T_i \leq t \mid \epsilon_i = 1, z_i) = \frac{1 - \left[1 - \pi\{1 - \exp(-t)\}\right]^{\exp(z_i^\top \beta_1)}}{1 - (1 - \pi)^{\exp(z_i^\top \beta_1)}},$$
$$\Pr(T_i \leq t \mid \epsilon_i = 2, z_i) = 1 - \exp\{-t \exp(z_i^\top \beta_2)\},$$
and $C_i \sim U(u_{\min}, u_{\max})$. Therefore, for $i = 1, \ldots, n$, we obtain the quadruplets $\{(X_i, \delta_i, \delta_i \epsilon_i, z_i)\}$, where $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \leq C_i)$.
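A minimal sketch of this scheme (not the actual package source) follows, reusing `Z`, `beta1`, and `beta2` from the toy example in Section 4; the cause 1 times come from inverting the conditional CIF above by the probability integral transform.

# Minimal sketch (not the package source) of the generation scheme above,
# reusing Z, beta1, and beta2 from the toy example in Section 4.
u.min <- 0; u.max <- 1; p <- 0.5
eta1 <- drop(Z %*% beta1); eta2 <- drop(Z %*% beta2)
n <- length(eta1)
eps <- 1 + rbinom(n, 1, (1 - p)^exp(eta1))  # Pr(eps = 1 | z) = 1 - (1 - p)^exp(eta1)
u <- runif(n)                               # invert the conditional CIF for cause 1
t1 <- -log(1 - (1 - (1 - u * (1 - (1 - p)^exp(eta1)))^(1 / exp(eta1))) / p)
t2 <- rexp(n, rate = exp(eta2))             # cause 2: exponential, rate exp(z'beta2)
ftime <- ifelse(eps == 1, t1, t2)
cens <- runif(n, u.min, u.max)              # censoring times C_i
fstatus <- ifelse(ftime <= cens, eps, 0)    # 0 denotes right-censoring
ftime <- pmin(ftime, cens)                  # observed times X_i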
Affiliation:

Eric S. Kawaguchi
Department of Biostatistics
University of California, Los Angeles
650 Charles E. Young Drive South
E-mail: [email protected]
URL: https://erickawaguchi.github.io/