Robust Bias Estimation for Kaplan-Meier Survival Estimator with Jackknifing

Abstract

For studying or reducing the bias of functionals of the Kaplan-Meier survival estimator, the jackknifing approach of Stute and Wang (1994) is natural. We have studied the behavior of the jackknife estimate of bias under different configurations of the censoring level, sample size, and the censoring and survival time distributions. The empirical research reveals some new findings about robust calculation of the bias, particularly for higher censoring levels. We have extended their jackknifing approach to cover the case where the largest observation is censored, using the imputation methods for the largest observations proposed in Khan and Shaw (2013b). This modification to the existing formula reduces the number of conditions for creating jackknife bias estimates to one from the original two, and also avoids the problem that the Kaplan-Meier estimator can be badly underestimated by the existing jackknife formula.


Md Hasinur Rahaman Khan (a), J. Ewart H. Shaw (b)
(a) Institute of Statistical Research and Training, University of Dhaka, Bangladesh
(b) Department of Statistics, University of Warwick, UK


Keywords: Bias, Censoring, Jackknifing, Kaplan–Meier Estimator

Preprint submitted to Elsevier, October 1, 2018

1. Introduction

Suppose that there is a random sample of $n$ individuals. Let $T_i$ and $C_i$ be the random variables that represent the lifetime and censoring time for the $i$th individual. We also assume that $T_i$ has unknown distribution function $F$. The Kaplan–Meier (K–M) estimator $\hat F^{KM}$ (Kaplan and Meier, 1958) is then defined by

$$1 - \hat F^{KM}(t) = \prod_{Y_{(i)} \le t} \left( \frac{n-i}{n-i+1} \right)^{\delta_{(i)}}, \tag{1}$$

where $Y_{(1)} \le \cdots \le Y_{(n)}$ are the ordered observations (censored and uncensored lifetimes), $\delta_{(i)} = 1$ if $Y_{(i)}$ is observed and $\delta_{(i)} = 0$ if $Y_{(i)}$ is censored, ties between lifetimes and censoring times are treated as if the former precede the latter, and other ties are ordered arbitrarily. Suppose that $S$ is a given statistical function so that $S(F)$ is the parameter of interest. It follows from Stute (1994) that if $S$ is nonlinear then the K–M based estimator $S(\hat F^{KM})$ is biased. Stute (1994) also discussed the situation where bias arises even for linear $S$ when the data of interest are only partially observable. Now for any $F$-integrable function $\varphi$, the corresponding estimator of the parameter of interest, $S(\hat F^{KM})$, is defined by the K–M integral $\int \varphi(Y_{(i)})\, d\hat F^{KM}$.

The K–M estimator is well known to be unbiased if there is no random censorship, but it becomes biased under censorship. Gill (1980) was the first to bound the bias of $\hat F^{KM}$:

$$-F H^{n} \le E(\hat F^{KM}) - F \le 0,$$

where $H$ is the distribution function of $Y$. Mauro (1985) extended this result to arbitrary K–M integrals with non-negative integrands. Zhou (1988) proved that the bias of a K–M functional decreases at an exponential rate and that the functional always underestimates the true value. He established the lower bound

$$-\int \varphi\, H^{n}\, F(dt) \le \mathrm{bias}\Big( \int \varphi\, d\hat F^{KM} \Big) \le 0.$$

Stute (1994) derived the exact formula for the bias of $\int \varphi\, d\hat F^{KM}$ for a general Borel-measurable function $\varphi$. He also discussed the effect of light, medium or heavy censoring on the bias of $\int \varphi\, d\hat F^{KM}$. Stute and Wang (1994) derived an explicit formula for the jackknife estimate of the bias of $\int \varphi(Y_{(i)})\, d\hat F^{KM}$. They also showed that jackknifing can lead to a considerable reduction of the bias. Four years later, Shen (1998) proposed another explicit formula for the jackknife estimate of the bias of $\int \varphi(T^{*}_{(i)})\, d\hat F^{KM}$. He used delete-2 jackknifing, in which two observations are deleted at a time. It follows from Shen (1998) that the delete-2 formula shows no further improvement over the delete-1 formula. Stute (1996) also proposed a jackknife estimate of the variance of $\int \varphi(Y_{(i)})\, d\hat F^{KM}$.

As mentioned in Stute and Wang (1994), under random censorship the estimator $S(\hat F^{KM})$ becomes the K–M integral

$$S(\hat F^{KM}) = \sum_{i=1}^{n} w_i\, \varphi(Y_{(i)}) \equiv \hat S^{KM}_{\varphi}, \tag{2}$$

where the K–M weights $w_i$ are the sizes of the jumps by which the K–M estimator of $F$ changes at the uncensored points $Y_{(i)}$, given by

$$w_1 = \frac{\delta_{(1)}}{n}, \qquad w_i = \frac{\delta_{(i)}}{n-i+1} \prod_{j=1}^{i-1} \left( \frac{n-j}{n-j+1} \right)^{\delta_{(j)}}, \quad i = 2, \cdots, n. \tag{3}$$

A detailed study of the $w_i$'s in connection with the strong law of large numbers under censoring has been carried out in Stute and Wang (1993).

The jackknife estimate of bias for the K–M integral (Eq. 2) is given by

$$\mathrm{Bias}(\hat S^{KM}_{\varphi}) = -\frac{n-1}{n}\, \varphi(Y_{(n)})\, \delta_{(n)} (1 - \delta_{(n-1)}) \prod_{j=1}^{n-2} \left( \frac{n-1-j}{n-j} \right)^{\delta_{(j)}}. \tag{4}$$

The associated bias corrected jackknife estimator is therefore given by

$$\tilde S^{KM}_{\varphi} = \hat S^{KM}_{\varphi} - \mathrm{Bias}(\hat S^{KM}_{\varphi}). \tag{5}$$
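The weights in Eq. (3), the jackknife bias in Eq. (4) and the corrected estimator in Eq. (5) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names (`sort_censored`, `km_weights`, `km_integral`, `jackknife_bias`) are hypothetical, and the default choice $\varphi(y) = y$ (the mean lifetime) is an assumption made for the example.

```python
import numpy as np

def sort_censored(y, delta):
    """Sort by observed time; at ties, uncensored points (delta=1) come first."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    order = np.lexsort((-delta, y))  # primary key: y, secondary key: -delta
    return y[order], delta[order]

def km_weights(delta):
    """K-M weights w_i of Eq. (3); delta must already be in sorted order."""
    delta = np.asarray(delta, dtype=float)
    n = delta.size
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** delta
    # product over j < i; the empty product for i = 1 equals 1
    before = np.concatenate(([1.0], np.cumprod(factors[:-1])))
    return delta / (n - i + 1.0) * before

def km_integral(y, delta, phi=lambda t: t):
    """K-M integral of Eq. (2); phi(y) = y gives the K-M mean lifetime."""
    y, delta = sort_censored(y, delta)
    return float(np.sum(km_weights(delta) * phi(y)))

def jackknife_bias(y, delta, phi=lambda t: t):
    """Closed-form jackknife bias estimate of Eq. (4)."""
    y, delta = sort_censored(y, delta)
    n = y.size
    if n < 2:
        raise ValueError("need at least two observations")
    j = np.arange(1, n - 1)  # j = 1, ..., n-2
    prod = np.prod(((n - 1.0 - j) / (n - j)) ** delta[: n - 2])
    return float(-(n - 1) / n * phi(y[-1]) * delta[-1] * (1 - delta[-2]) * prod)

# Toy data (made up for illustration): largest point uncensored,
# second largest censored, so Eq. (4) is non-zero.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 1, 0, 1]
est = km_integral(y, delta)
bias = jackknife_bias(y, delta)
corrected = est - bias  # Eq. (5)
print(est, bias, corrected)
```

For these four made-up observations the sketch gives an estimate of about 2.75, a bias estimate of about -1 and a corrected value of about 3.75; recomputing the delete-1 jackknife by brute force (leaving out each observation in turn) reproduces the same bias value.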

2. Modified Jackknife Bias for K–M Lifetime Estimator

When no censoring is present, $\hat F^{KM}$ reduces to the usual sample distribution estimator $\hat F$ that assigns weight $1/n$ to each observation. With censoring, the weighting method (3) gives zero weight to the censored observations $Y^{+}_{(\cdot)}$, causing particular problems if the largest datum is censored (i.e. $\delta_{(n)} = 0$). As a first step one may apply Efron's (1967) tail correction approach: reclassify $\delta_{(n)} = 0$ as $\delta_{(n)} = 1$. In order to reduce estimation bias and inefficiency, Khan and Shaw (2013b) proposed five alternatives to Efron's approach that can lead to more efficient and less biased estimates. The approaches are summarised in Table 1. The first four approaches are based on the underlying regression assumption relating lifetimes and covariates (e.g., the AFT model), and the fifth approach, $W_{\nu}$, is based only on the random censorship assumption.

Table 1: The imputation approaches from Khan and Shaw (2013b).

  $W_{\tau_m}$        Adding the Conditional Mean
  $W_{\tau_{md}}$     Adding the Conditional Median
  $W_{\tau^{*}_m}$    Adding the Resampling-based Conditional Mean
  $W_{\tau^{*}_{md}}$ Adding the Resampling-based Conditional Median
  $W_{\nu}$           Adding the Predicted Difference Quantity

The jackknife bias in Eq. (4) is non-zero if and only if the largest datum is uncensored, $\delta_{(n)} = 1$, and the second largest datum is censored, $\delta_{(n-1)} = 0$. Stute and Wang (1994) state that if $\delta_{(n)} = 0$, then the corresponding observation does not contain enough information about $F$ to make a change of $\hat S^{KM}_{\varphi}$ desirable. This inability to estimate bias if $\delta_{(n)} = 0$ is a major limitation of the jackknife bias formula.

If $(\delta_{(n-1)} = 0, \delta_{(n)} = 0)$, then we can obtain a modified jackknife estimate of bias by imputing the largest datum, for example using any of the approaches given in Table 1. From Eq. (2) this gives the modified estimator

$$\hat S^{*KM}_{\varphi} \equiv \sum_{i=1}^{n-1} w_i\, \varphi(Y_{(i)}) + \acute{w}_n\, \varphi(\tilde Y_{(n)}), \tag{6}$$

where $\tilde Y_{(n)}$ is the imputed largest observation, and $\acute{w}_n$ is the corresponding adjusted K–M weight

$$\acute{w}_n = w_n + \frac{n-1}{n} \prod_{j=1}^{n-2} \left( \frac{n-1-j}{n-j} \right)^{\delta_{(j)}},$$

as suggested in Stute and Wang (1994) for the pair $(\delta_{(n-1)} = 0, \delta_{(n)} = 1)$. The modified estimator (6) is also obtained when imputing in the situation $(\delta_{(n-1)} = 1, \delta_{(n)} = 0)$. In this case the K–M weight for $\tilde Y_{(n)}$ is not adjusted and we arrive at the estimator

$$\hat S^{*KM}_{\varphi} \equiv \sum_{i=1}^{n-1} w_i\, \varphi(Y_{(i)}) + w_n\, \varphi(\tilde Y_{(n)}).$$

So, unlike the actual jackknife formula, the modified approach does not impose any condition on the censoring status of $Y_{(n)}$. The modified estimate of bias is given by

$$\mathrm{Bias}(\hat S^{*KM}_{\varphi}) = -\frac{n-1}{n}\, \varphi(\tilde Y_{(n)})\, \delta^{*}_{(n)} (1 - \delta_{(n-1)}) \prod_{j=1}^{n-2} \left( \frac{n-1-j}{n-j} \right)^{\delta_{(j)}}, \tag{7}$$

where $\delta^{*}_{(n)}$ is the modified censoring indicator for $\tilde Y_{(n)}$. With the above approach, $\delta^{*}_{(n)}$ is always 1. Eq. (7) yields a larger bias quantity than Eq. (4) because $\tilde Y_{(n)} > Y_{(n)}$. The modified bias corrected jackknife estimator is then defined by

$$\tilde S^{*KM}_{\varphi} = \hat S^{*KM}_{\varphi} - \mathrm{Bias}(\hat S^{*KM}_{\varphi}). \tag{8}$$

The K–M estimates under both approaches for the four pairs are summarized in Table 2.

Table 2: K–M lifetime estimates by censoring indicators for the last two observations.

  $\delta_{(n-1)}$  $\delta_{(n)}$  K–M estimate $S(\hat F^{KM})$
  0                 0               $\hat S^{*KM}_{\varphi} + \frac{n-1}{n}\, \varphi(\tilde Y_{(n)})\, \delta^{*}_{(n)} (1 - \delta_{(n-1)}) \prod_{j=1}^{n-2} \left( \frac{n-1-j}{n-j} \right)^{\delta_{(j)}}$
  1                 0               $\hat S^{*KM}_{\varphi}$
  1                 1               $\hat S^{KM}_{\varphi}$
  0                 1               $\hat S^{KM}_{\varphi} + \frac{n-1}{n}\, \varphi(\tilde Y_{(n)})\, \delta_{(n)} (1 - \delta_{(n-1)}) \prod_{j=1}^{n-2} \left( \frac{n-1-j}{n-j} \right)^{\delta_{(j)}}$

The entries give $S(\hat F^{KM})$ based on both the actual and the modified jackknife bias formula.

For computational simplicity we look only at the K–M mean lifetime estimator, obtained by replacing $\varphi(y)$ by $y$ in Eq. (2). Note that researchers in reliability are very often interested in estimating the mean lifetime of a component, and that the K–M mean lifetime estimate also has an important role in Health Economics, for example, in a "QTWIST" analysis (Glasziou et al. 1990). The behaviour of the K–M mean lifetime estimator depends on the nature of the distribution being estimated and the degree of censoring, although the true distribution of censored data is generally unknown. We therefore conducted simulation studies to demonstrate the behavior of the K–M mean lifetime estimator in the presence of right censoring. We assume that the lifetimes and censoring times have independent distributions.

Note that the mean survival time can be defined as the area under the survival curve, $S(t)$ (Kaplan and Meier, 1958). A nonparametric estimate of the mean survival time can also be obtained by substituting the K–M estimator for the unknown survival function, $\hat\mu = \int_{0}^{\infty} \hat S(t)\, dt$. Stute (1994) proposed a bias corrected jackknife estimator for the K–M mean lifetime. When the observations are subject to right censoring, the usual estimator of the mean lifetime is not appropriate (Datta, 2005). The reason is that the censoring leads to an inconsistent estimator that underestimates the true mean, and the bias worsens as the censoring increases.
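The modified quantities of Eqs. (6) to (8) can be illustrated with a small sketch for the case where the largest observation is censored. Several things here are assumptions rather than the authors' method: the function names are hypothetical, the increment `nu` is a user-supplied stand-in for the predicted difference quantity of Khan and Shaw (2013b) (whose computation is not reproduced here), the inputs are assumed to be already sorted, and $\varphi(y) = y$ is used by default. The sketch computes the plain K–M integral on the imputed data, the bias of Eq. (7), and, separately, the integral using the adjusted weight $\acute{w}_n$, so the relation between the two routes can be checked numerically.

```python
import numpy as np

def km_weights(delta):
    """K-M weights of Eq. (3) for indicators in sorted order."""
    n = delta.size
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** delta
    before = np.concatenate(([1.0], np.cumprod(factors[:-1])))
    return delta / (n - i + 1.0) * before

def tail_product(delta):
    """prod_{j=1}^{n-2} ((n-1-j)/(n-j))**delta_(j), as in Eqs. (4), (6), (7)."""
    n = delta.size
    j = np.arange(1, n - 1)
    return float(np.prod(((n - 1.0 - j) / (n - j)) ** delta[: n - 2]))

def modified_estimates(y, delta, nu, phi=lambda t: t):
    """Sketch of Eqs. (6)-(8); y, delta sorted ascending, delta[-1] == 0.

    nu is a hypothetical positive increment: Y~_(n) = Y_(n) + nu.
    """
    y = np.array(y, dtype=float)
    delta = np.array(delta, dtype=float)
    if y.size < 2 or delta[-1] != 0:
        raise ValueError("expects n >= 2 and a censored largest observation")
    n = y.size
    y[-1] += nu          # imputed largest observation
    delta[-1] = 1.0      # modified indicator delta*_(n) = 1
    w = km_weights(delta)
    plain = float(np.sum(w * phi(y)))
    # factor is zero when delta_(n-1) = 1, i.e. the weight is left unadjusted
    factor = (n - 1) / n * (1 - delta[-2]) * tail_product(delta)
    bias = float(-factor * phi(y[-1]))            # Eq. (7)
    corrected = plain - bias
    via_adjusted_weight = float(
        np.sum(w[:-1] * phi(y[:-1])) + (w[-1] + factor) * phi(y[-1])
    )
    return plain, bias, corrected, via_adjusted_weight

# Toy data (made up): both of the two largest observations are censored.
plain, bias, corrected, via_w = modified_estimates(
    [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 0], nu=1.0
)
print(plain, bias, corrected, via_w)
```

With the formulas as written above, the term added to $w_n$ in $\acute{w}_n$ is exactly the factor that multiplies $\varphi(\tilde Y_{(n)})$ in Eq. (7), so the adjusted-weight integral coincides with the plain integral minus the bias. How the two steps are meant to be combined in practice is best checked against Stute and Wang (1994) and the jackknifeKME package.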

3. Simulation Study

This section reports on three simulation-based examples. The first example extends the Koziol–Green model simulations of Stute and Wang (1994). The second example considers various skewed distributions for survival times and corresponding distributions for the associated censoring times. The third example uses a log-normal AFT model where the event times are assumed to be associated with several covariates.

We extend the simulations of the Koziol–Green proportional hazards model from Stute and Wang (1994). Under this model both $T$ and $C$ are exponentially distributed: $T \sim Exp(1)$ and $C \sim Exp(\lambda)$, with varying $\lambda$. Four different sample sizes, $n = 30, 50, 100, 150$, are used. For each sample size, 100,000 simulation runs are drawn and the bias and variance of both the mean lifetime estimators $\hat S^{KM}_{mean}$ and $\tilde S^{KM}_{mean}$ are computed. The bias and its variance are shown in the first sub-tables of Tables 3 and 4 respectively.

Table 3: Simulation results based on the Koziol–Green model for the bias of the four K–M mean lifetime estimators $\hat S^{KM}_{mean}$, $\tilde S^{KM}_{mean}$, $\hat S^{*KM}_{mean}$ and $\tilde S^{*KM}_{mean}$.

        Bias of $\hat S^{KM}_{mean}$           Bias of $\tilde S^{KM}_{mean}$
P%      n=30    n=50    n=100   n=150     n=30    n=50    n=100   n=150
10     -0.155  -0.114  -0.073  -0.055    -0.154  -0.114  -0.073  -0.056
20     -0.197  -0.157  -0.107  -0.085    -0.191  -0.155  -0.107  -0.086
30     -0.250  -0.205  -0.151  -0.126    -0.233  -0.195  -0.146  -0.123
40     -0.304  -0.265  -0.209  -0.178    -0.267  -0.239  -0.193  -0.164
50     -0.364  -0.327  -0.278  -0.248    -0.295  -0.268  -0.237  -0.215
60     -0.409  -0.389  -0.349  -0.328    -0.287  -0.281  -0.263  -0.255
70     -0.430  -0.426  -0.413  -0.396    -0.224  -0.234  -0.246  -0.245
80     -0.402  -0.417  -0.428  -0.428    -0.082  -0.097  -0.127  -0.141
90     -0.280  -0.304  -0.335  -0.346     0.161   0.178   0.171   0.164

        Bias of $\hat S^{*KM}_{mean}$          Bias of $\tilde S^{*KM}_{mean}$
P%      n=30    n=50    n=100   n=150     n=30    n=50    n=100   n=150
10     -0.208  -0.147  -0.090  -0.067    -0.207  -0.147  -0.090  -0.068
20     -0.259  -0.202  -0.132  -0.104    -0.252  -0.200  -0.132  -0.104
30     -0.326  -0.261  -0.186  -0.155    -0.309  -0.251  -0.181  -0.152
40     -0.391  -0.335  -0.260  -0.218    -0.354  -0.310  -0.243  -0.205
50     -0.465  -0.407  -0.343  -0.304    -0.396  -0.349  -0.303  -0.271
60     -0.511  -0.481  -0.426  -0.400    -0.389  -0.372  -0.341  -0.327
70     -0.518  -0.512  -0.495  -0.475    -0.312  -0.320  -0.328  -0.325
80     -0.463  -0.481  -0.496  -0.498    -0.162  -0.162  -0.195  -0.210
90     -0.304  -0.331  -0.367  -0.380     0.151   0.151   0.139   0.129

The results show that, for both estimators, the bias increases in absolute value as censoring increases until a particular censoring level, then declines. That particular censoring level falls in the range 60 to 80. Above that censoring level the bias decreases as censoring increases, and decreases much more rapidly for the corrected estimator than for the K–M estimator.
In addition, the bias for the corrected estimator at $P_\% = 90$ censoring is positive for all sample sizes. This behaviour at high censoring levels does not appear in Stute and Wang (1994), who investigated the bias only up to $P_\% = 66.7$, but it is easily seen from Table 2 that if censoring is 100%, then $\delta_{(n)} = 0$, so the bias is 0. A similar trend is observed for the variance of the bias of the two estimators.

Table 4: Simulation results based on the Koziol–Green model for the variance of the bias of the four K–M mean lifetime estimators $\hat S^{KM}_{mean}$, $\tilde S^{KM}_{mean}$, $\hat S^{*KM}_{mean}$ and $\tilde S^{*KM}_{mean}$.

        Variance of bias of $\hat S^{KM}_{mean}$    Variance of bias of $\tilde S^{KM}_{mean}$
P%      n=30    n=50    n=100   n=150     n=30    n=50    n=100   n=150
10      0.004   0.002   0.001   0.000     0.010   0.005   0.002   0.001
20      0.008   0.006   0.003   0.002     0.019   0.013   0.006   0.004
30      0.016   0.012   0.006   0.004     0.037   0.027   0.014   0.010
40      0.024   0.019   0.012   0.009     0.056   0.045   0.028   0.021
50      0.034   0.028   0.021   0.016     0.082   0.064   0.049   0.037
60      0.041   0.037   0.029   0.025     0.096   0.088   0.067   0.058
70      0.040   0.038   0.034   0.032     0.092   0.090   0.081   0.074
80      0.030   0.030   0.029   0.029     0.071   0.074   0.069   0.071
90      0.011   0.011   0.013   0.013     0.034   0.032   0.034   0.035

        Variance of bias of $\hat S^{*KM}_{mean}$   Variance of bias of $\tilde S^{*KM}_{mean}$
P%      n=30    n=50    n=100   n=150     n=30    n=50    n=100   n=150
10      0.021   0.008   0.003   0.001     0.034   0.014   0.004   0.002
20      0.031   0.019   0.007   0.004     0.053   0.032   0.012   0.008
30      0.056   0.035   0.015   0.011     0.095   0.061   0.027   0.020
40      0.078   0.056   0.031   0.022     0.135   0.099   0.057   0.039
50      0.116   0.077   0.054   0.039     0.201   0.136   0.098   0.070
60      0.117   0.100   0.073   0.059     0.209   0.181   0.132   0.108
70      0.101   0.092   0.082   0.073     0.183   0.171   0.151   0.135
80      0.063   0.064   0.061   0.063     0.121   0.128   0.118   0.123
90      0.018   0.019   0.022   0.023     0.045   0.045   0.049   0.051

We have also computed the bias of the jackknife estimate and its variance based on both the modified estimators $\hat S^{*KM}_{mean}$ and $\tilde S^{*KM}_{mean}$. The modification is based on the predicted difference quantity approach, where $\tilde Y_{(n)}$ is replaced by $Y_{(n)} + \nu$ ($W_{\nu}$ in Table 1), as discussed in Khan and Shaw (2013b). The bias and its variance are shown in the second sub-tables of Tables 3 and 4 respectively. The results demonstrate that under the modified approach, slightly larger bias and variance estimates are obtained. Their overall trends are similar to those of the original estimators.

In the second simulation, survival times are generated from four skewed distributions, and censoring times independently from other specified distributions, as listed in Table 5. Datasets are generated randomly subject to the restriction $\delta_{(n-1)} = 0$, and, for the original jackknife formula, with the additional restriction $\delta_{(n)} = 1$.

Table 5: The failure time distributions with their corresponding censoring distributions.

  Failure time distributions                                                                  Censoring distributions
  Log-normal (1.1, 1): $\frac{1}{t\sqrt{2\pi}} \exp\!\big(-(\log t - 1.1)^2/2\big)$          Uniform: $U(a, \ldots a)$
  Exponential (0.2): $0.2 \exp(-0.2\,t)$                                                      Exponential: $Exp(\lambda)$
  Gamma (4, 1): $\frac{1}{6}\, t^{3} \exp(-t)$                                                Uniform: $U(a, \ldots a)$
  Weibull (3.39, 3): …                                                                        Uniform: $U(a, \ldots a)$

In the case when $T \sim Exp(0.2)$ and $C \sim Exp(\lambda)$, for a chosen censoring percentage $P_\%$ it follows that $Y$ and $\delta$ are independent with $P_\%/100 = \mathrm{pr}(\delta = 0) = \lambda/(0.2 + \lambda)$. For the other cases the censoring time follows the Uniform distribution over the range $[a, \ldots a]$. We use four sample sizes, $n = 30, 50, 100, 150$. The bias and the variance of the bias from 10,000 simulated datasets are shown in Fig. 1 and 2 (both shown in supplementary document) respectively.
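The relation $P_\%/100 = \lambda/(0.2 + \lambda)$ can be inverted to pick the censoring rate for a target censoring level, $\lambda = 0.2\,p/(1-p)$ with $p = P_\%/100$. The sketch below does this and draws one censored sample for the exponential case. The function names, the seed and the sample size are illustrative assumptions, and the restrictions on $\delta_{(n-1)}$ and $\delta_{(n)}$ described above are not imposed here.

```python
import numpy as np

def censoring_rate(p, event_rate=0.2):
    """Solve p = lam / (event_rate + lam) for lam, with 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return event_rate * p / (1.0 - p)

def simulate_exponential(n, p, event_rate=0.2, rng=None):
    """Draw (Y, delta) with T ~ Exp(event_rate) and C ~ Exp(lam), independent."""
    rng = np.random.default_rng(rng)
    lam = censoring_rate(p, event_rate)
    # numpy parameterises the exponential by its scale = 1 / rate
    t = rng.exponential(1.0 / event_rate, size=n)
    c = rng.exponential(1.0 / lam, size=n) if lam > 0 else np.full(n, np.inf)
    y = np.minimum(t, c)
    delta = (t <= c).astype(int)  # 1 = lifetime observed, 0 = censored
    return y, delta

# Example: target 50% censoring; the observed censored fraction
# should be close to 0.5 for a large sample.
y, delta = simulate_exponential(n=200_000, p=0.5, rng=12345)
print(censoring_rate(0.5), 1.0 - delta.mean())
```

The same function covers the Koziol–Green setting of the first example by passing `event_rate=1.0`.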

Figure 1: The bias of the K–M mean lifetime estimators $\hat S^{KM}_{mean}$, $\tilde S^{KM}_{mean}$, $\hat S^{*KM}_{mean}$ and $\tilde S^{*KM}_{mean}$ in 10000 simulation runs. Each row has four panels (Estimator, Corrected estimator, Modified estimator, Modified corr. estimator) plotting bias against censoring percentage for N = 30, 50, 100, 150. (a) For $T \sim LN(1.1, 1)$ & $C \sim U(a, \ldots a)$. (b) For $T \sim EX(0.2)$ & $C \sim EX(\lambda)$. (c) For $T \sim G(4, 1)$ & $C \sim U(a, \ldots a)$. (d) For $T \sim WB(3.39, 3)$ & $C \sim U(a, \ldots a)$.

Figure 2: The variance of the bias of the K–M mean lifetime estimators $\hat S^{KM}_{mean}$, $\tilde S^{KM}_{mean}$, $\hat S^{*KM}_{mean}$ and $\tilde S^{*KM}_{mean}$ in 10000 simulation runs. Each row has four panels (Estimator, Corrected estimator, Modified estimator, Modified corr. estimator) plotting variance of bias against censoring percentage for N = 30, 50, 100, 150. (a) For $T \sim LN(1.1, 1)$ & $C \sim U(a, \ldots a)$. (b) For $T \sim EX(0.2)$ & $C \sim EX(\lambda)$. (c) For $T \sim G(4, 1)$ & $C \sim U(a, \ldots a)$. (d) For $T \sim WB(3.39, 3)$ & $C \sim U(a, \ldots a)$.

The modified estimators are based on the predicted difference quantity approach $W_{\nu}$ of Table 1, described fully in Khan and Shaw (2013b).

Fig. 1(a), 1(d) and 2(a), 2(d) reveal similar results to our large simulation based Koziol–Green model example. For example, given the modification, the bias estimate is bound to be higher. This seems to be true also for the variance estimate. In addition, we find that for both actual and modified estimators the trend in bias differs for different censoring levels, but they behave similarly under different lifetime distributions (see Fig. 1). The relationship between bias and censoring level varies substantially between the distributions and the sample sizes. For a log-normal distribution, the bias for the estimators, except for the corrected estimators, tends to increase as $P_\%$ increases until 50. The maximum bias for the other distributions investigated occurs between 60% and 80% censoring. Under the Exponential lifetime distribution the bias behaves very similarly to that of the Koziol–Green proportional hazards model. Whether original or modified, the corrected estimators tend to overestimate at the higher censoring points (i.e., the bias becomes positive at higher censoring).

The variance of the bias (Fig. 2) also differs according to sample size and censoring level. The variance generally reaches a maximum at some censoring level between 50% and 70%, then declines. However, for the corrected estimators under a log-normal distribution the variance decreases consistently as censoring increases (see Fig. 2(a)).
The third simulation study is conducted to investigate how the modified estimators behave relative to the original estimators when lifetimes are modeled as an AFT model that has the form

$$Z_i = \alpha + X_i^{T}\beta + \sigma\varepsilon_i, \qquad \varepsilon_i \sim N(0, 1), \quad i = 1, \cdots, n, \tag{9}$$

where $Z_i = \log(T_i)$, $X$ is the covariate vector, $\alpha$ is the intercept term, and $\beta$ is the unknown $p \times 1$ coefficient vector. The censoring times follow $U(a, \ldots a)$, where $a$ is chosen analytically in the same way as in the previous example. We consider five covariates $X = (X_1, X_2, X_3, X_4, X_5)$, each of which is generated using $U(0, \ldots)$, … $P_\%$ points, and three sample sizes $n = 30$, 50 and 100. The coefficients of the covariates are chosen as $\beta_j = j + 1$, where $j = 1, \cdots, 5$, and $\sigma = 1$. Of the five proposed imputation approaches of Table 1 and Khan and Shaw (2013b), the resampling-based conditional mean approach ($W_{\tau^{*}_m}$) is found to have the least bias, and the results for $W_{\tau^{*}_m}$ from 10,000 simulation runs are shown in Fig. 3.
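Generating lifetimes from the log-normal AFT model of Eq. (9) can be sketched as below. Several inputs are assumptions because they are not recoverable from the text: the covariates are drawn from $U(0, 1)$ (the upper bound is assumed), the intercept is set to $\alpha = 0$, and censoring is left out entirely. The coefficients $\beta_j = j + 1$ and $\sigma = 1$ follow the description above, and the function name `simulate_aft` is hypothetical.

```python
import numpy as np

def simulate_aft(n, alpha=0.0, sigma=1.0, p=5, rng=None):
    """Draw covariates and lifetimes from Z = alpha + X'beta + sigma * eps.

    Assumptions (not taken from the paper): X_j ~ U(0, 1), alpha = 0.
    Returns the covariate matrix X, log-lifetimes Z and lifetimes T = exp(Z).
    """
    rng = np.random.default_rng(rng)
    beta = np.arange(1, p + 1) + 1.0          # beta_j = j + 1, j = 1..p
    x = rng.uniform(0.0, 1.0, size=(n, p))    # assumed covariate distribution
    eps = rng.standard_normal(n)              # eps_i ~ N(0, 1)
    z = alpha + x @ beta + sigma * eps
    return x, z, np.exp(z)

x, z, t = simulate_aft(n=100_000, rng=2024)
# Under these assumptions E[Z] = alpha + 0.5 * sum(beta) = 10.
print(x.shape, z.mean())
```

Censoring times would then be drawn separately and combined with `t` through `min(T, C)` once the censoring distribution and its parameter $a$ are fixed.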

4. Discussion

The behavior of the bias of the K–M lifetime estimators is influenced by many factors in practice: for example, the nature of the distributions used for lifetimes, the censoring rate, the sample size, and whether the lifetimes are modeled with covariates. To explore the behaviour of the jackknife bias for K–M estimators under various conditions (in particular, censoring levels) a large simulation is required. Our simulation studies go beyond the small simulation study in Stute and Wang (1994) and show clear differences from many of their results. In particular, the bias (Eq. (4) and (7)) will be 0 at 0% censoring and increases as the censoring level increases. However, the bias will also tend to 0 as the censoring level tends to 100% (because the jackknife bias in Eq. (4) is 0 whenever $\delta_{(n)} = 0$). Therefore, as shown in the figures, the bias increases up to a particular censoring level (typically 50% or higher) and then declines. The modification allows the case ($\delta_{(n)} = 0$, $\delta_{(n-1)} = 0$) to contribute to the bias calculation. So our modifications reduce the original conditions needed for jackknife estimation of bias ($\delta_{(n-1)} = 0$, $\delta_{(n)} = 1$) to the single condition $\delta_{(n-1)} = 0$. The modified jackknife estimate also prevents the K–M estimator from being badly underestimated by the jackknife estimate when the largest observation is censored. For calculating bias and its variance with the proposed and existing jackknifing procedures we have provided a publicly available package, jackknifeKME (Khan and Shaw, 2013a), implemented in the R programming system.

Figure 3: Simulation results for the third simulated example for all four K–M mean lifetime estimators $\hat S^{KM}_{mean}$, $\tilde S^{KM}_{mean}$, $\hat S^{*KM}_{mean}$ and $\tilde S^{*KM}_{mean}$ under the log-normal AFT model at different censoring points. Lowess smooths are superimposed. (a) Bias and (b) Variance of bias, each with four panels (Estimator, Corrected estimator, Modified estimator, Modified corr. estimator) plotted against censoring percentage for N = 30, 50, 100.

5. Acknowledgements

The first author is grateful to the Centre for Research in Statistical Methodology (CRiSM), Department of Statistics, University of Warwick, UK for offering research funding for his PhD study.