Bootstrapping the Kaplan-Meier Estimator on the Whole Line

Abstract

This article is concerned with proving the consistency of Efron's (1981) bootstrap for the Kaplan-Meier estimator on the whole support of a survival function. While other works address the asymptotic Gaussianity of the estimator itself without restricting time (e.g. Gill, 1983, and Ying, 1989), we enable the construction of bootstrap-based time-simultaneous confidence bands for the whole survival function. Other practical applications include bootstrap-based confidence bands for the mean residual life-time function or the Lorenz curve as well as confidence intervals for the Gini index.

Dennis Dobler*

November 7, 2018

arXiv preprint [math.ST]

Keywords: Counting process; Right-censoring; Resampling; Efron's bootstrap; Mean-residual lifetime; Lorenz curve; Gini index.

* University of Ulm, Institute of Statistics, Germany

1 Introduction

This article reconsiders Efron's (1981) bootstrap of Kaplan-Meier estimators. It is well known that drawing with replacement directly from the original observations, each consisting of a pair (event time, censoring indicator), reproduces the correct covariance structure; see e.g. Akritas (1986), Lo and Singh (1986), Horvath and Yandell (1987), or van der Vaart and Wellner (1996) for an application in empirical processes. Let $T : \Omega \to (0, \tau)$ be a continuously distributed random survival time with survival function $S(t) = 1 - F(t) = P(T > t)$. For conceptual convenience we mainly refer to $T$ as a random survival time, although other interpretations are also reasonable; see the examples below. In the previously mentioned articles the typical assumption $S(\tau) > 0$ is made for mathematical convenience in proving weak convergence of estimators for $S$ on the Skorohod space $D[0, \tau]$, and because most studies involve a rather strict censoring mechanism: after a pre-specified end-of-study time, each individual without an observed event is considered right-censored. Thus, it is often not possible to draw inference on functionals of the whole survival function.

Some functionals, however, require the possibility of observing arbitrarily large survival times. For instance, consider the mean residual life-time function

$$ t \longmapsto g(t) = E[T - t \mid T > t] = \frac{1}{S(t)} \int_t^\tau S(u) \, du; \tag{1.1} $$

see e.g. Meilijson (1972), Gill (1983), Remark 3.3, and Stute and Wang (1993). This function describes the expected remaining life-time given survival until a point of time $t > 0$. Another, econometric example of a functional of whole survival curves is the Lorenz curve

$$ p \longmapsto L(p) = \mu^{-1} \int_0^p F^{-1}(t) \, dt = \mu^{-1} \int_0^{F^{-1}(p)} s \, dF(s), \tag{1.2} $$

where $F^{-1}(t) = \inf\{u \ge 0 : F(u) \ge t\}$ is the left-continuous generalized inverse of $F$ and $\mu = \int_0^\tau t \, dF(t)$ is its mean.

With the interpretation of $T$ as the income of a random individual in a population, $L(p)$ represents the share of total income earned by the lowest $p$th fraction of all incomes. A closely related quantity is the Gini index

$$ G = \frac{\int_0^1 (u - L(u)) \, du}{\int_0^1 u \, du} \in [0, 1] \tag{1.3} $$

as a measure of uniformity of all incomes within a population; see e.g. Tse (2006). The value $G = 0$ represents perfect equality of all incomes, whereas $G = 1$ describes the other extreme: one person gains everything and the rest nothing.

All quantities (1.1), (1.2) and (1.3) are statistical functionals of the whole survival function $S$. Analyzing $S$ only on a subset of its support first inevitably alters the above functionals in a second step, and this affects the interpretation of such quantities. In order to circumvent such problems, estimating the whole survival function is the obvious solution. Henceforth, denote by $\tau = \inf\{t \ge 0 : S(t) = 0\} \in (0, \infty]$ the support's right end point. Wang (1987) and Stute and Wang (1993) showed the uniform consistency of the Kaplan-Meier or product-limit estimator $(\hat S(t))_{t \in [0,\tau]}$ for $(S(t))_{t \in [0,\tau]}$, and Gill (1983) and Ying (1989) proved its weak convergence on the Skorohod space $D[0, \tau]$. For robust statistical inference procedures concerning the above functionals of $S$ it is thus necessary to extend well-known bootstrap results for the Kaplan-Meier estimator to the whole Skorohod space $D[0, \tau]$. After presenting this primary result we deduce inference procedures for the quantities (1.1) to (1.3).

This article is organized as follows. Section 2 introduces all required estimators, recapitulates previous weak convergence results on $D[0, t]$, $t < \tau$, and provides handy results for checking all main assumptions. The main theorems on weak convergence of the bootstrap Kaplan-Meier estimator are presented in Section 3, including a consistency theorem for a bootstrap variance function estimator. Section 4 deduces inference procedures for (1.1) to (1.3) and the final Section 5 gives a discussion of future research possibilities. All proofs are given in the Appendix. Most of this article's results originate from the University of Ulm PhD thesis of Dennis Dobler; cf. Chapters 6 and 7 of Dobler (2016).

2 Preliminary Results

Let $T_1, \dots, T_n : \Omega \to (0, \infty)$, $n \in \mathbb{N}$, be independent survival times with continuous survival function $S(t) = 1 - F(t) = P(T_1 > t)$ and cumulative hazard function $A(t) = \int_0^t \alpha(u) \, du = -\int_0^t (dS)/S_- = -\log S(t)$. Independent thereof, let $C_1, \dots, C_n : \Omega \to (0, \infty)$ be i.i.d. (censoring) random variables with (possibly discontinuous) survival function $G(t) = P(C_1 > t)$ such that the observable data consist of all $1 \le i \le n$ pairs $(X_i, \delta_i) := (T_i \wedge C_i, \mathbf{1}\{X_i = T_i\})$. Here, $\mathbf{1}\{\cdot\}$ is the indicator function. Thus, the survival function of $X_1$ is $H = S \cdot G$. The Kaplan-Meier estimator is defined by

$$ \hat S_n(t) = \prod_{i : X_{i:n} \le t} \Big( 1 - \frac{\delta_{[i:n]}}{n - i + 1} \Big), $$

where $(X_{1:n}, \dots, X_{n:n})$ is the order statistic of $(X_1, \dots, X_n)$ and $(\delta_{[1:n]}, \dots, \delta_{[n:n]})$ are their concomitant censoring indicators. Throughout, we assume that

$$ -\int_0^\tau \frac{dS}{G_-} < \infty, \tag{2.1} $$

which restricts the magnitude of censoring to a reasonable level. For instance, Gill (1983), Ying (1989) and Akritas and Brunner (1997) require this condition for an analysis of the large sample properties of Kaplan-Meier estimators on the whole support $[0, \tau]$. Of these, Gill (1983) requires Condition (2.1) for a vanishing upper bound in Lenglart's inequality. Obviously, the above condition implies that $[0, \tau]$ is contained in the support of $G$; see also Allignol et al. (2014) for a similar condition in a non-Markov illness-death model, reduced to a competing risks problem.

Denote by $\hat T_n := \max_{i \le n} X_i$ the largest observed event or censoring time and, for a function $t \mapsto f(t)$, let $f^{\hat T_n}$ be its stopped version, i.e., $f^{\hat T_n}(t) = f(t \wedge \hat T_n)$. The monotone function $t \mapsto \sigma^2(t) = \int_0^t (dA)/H_-$ is the asymptotic variance function of the related Nelson-Aalen estimator for $A$ and reappears in the asymptotic covariance function of $\hat S_n$. Throughout, all convergences (in distribution, in probability, or almost surely) are understood to hold as $n \to \infty$, and convergence in distribution and in probability are denoted by $\xrightarrow{d}$ and $\xrightarrow{p}$, respectively. The present theory relies on the following weak convergence results for the Kaplan-Meier process $\hat S_n$ of $S$.

Lemma 2.1. Let $B$ denote a Brownian motion on $[0, \tau]$ and suppose (2.1) holds.

(a) Theorem 1.2(i) of Gill (1983): On $D[0, \tau]$ we have $\sqrt{n} (\hat S_n - S)^{\hat T_n} \xrightarrow{d} W := S \cdot (B \circ \sigma^2)$.

(b) Part of Theorem 2 in Ying (1989): On $D[0, \tau]$ we have $\sqrt{n} (\hat S_n - S) \xrightarrow{d} W = S \cdot (B \circ \sigma^2)$.

Denote by $\hat A_n(t) = \sum_{i : X_{i:n} \le t} \frac{\delta_{[i:n]}}{n - i + 1}$ the Nelson-Aalen estimator for the cumulative hazard function $A(t)$ and by $\hat G_n$ the Kaplan-Meier estimator for the censoring survival function $G$. Note that $\hat H_n = \hat G_n \hat S_n$ holds for the empirical survival function $\hat H_n$ of $H$ since, almost surely (a.s.), no survival time equals a censoring time: $T_i \ne C_j$ a.s. for all $i, j$. The asymptotic covariance function $\Gamma$ of $W$ in Lemma 2.1 and a natural estimator $\hat \Gamma_n$ are given by

$$ \Gamma(u, v) = S(u) \Big( \int_0^{u \wedge v} \frac{dA}{H_-} \Big) S(v) \quad \text{and} \quad \hat \Gamma_n(u, v) = \hat S_n(u) \Big( \int_0^{u \wedge v} \frac{d\hat A_n}{\hat H_{n-}} \Big) \hat S_n(v). $$

The following lemma is helpful for an assessment of Condition (2.1) and for studentizations.

Lemma 2.2. (a) For all $t \in [0, \tau]$ it holds that

$$ -\int_t^\tau \frac{d\hat S_n}{\hat G_{n-}} \xrightarrow{p} -\int_t^\tau \frac{dS}{G_-} \le \infty. $$

(b) In case of (2.1) we have $\sup_{(u,v) \in [0,\tau]^2} | \hat \Gamma_n(u, v) - \Gamma(u, v) | \xrightarrow{p} 0$.

3 Main Results

The limit distribution of the Kaplan-Meier process in Lemma 2.1 shall be assessed via bootstrapping. To this end, we independently draw $n$ times with replacement from $(X_1, \delta_1), \dots, (X_n, \delta_n)$ and denote the thus obtained bootstrap sample by $(X_1^*, \delta_1^*), \dots, (X_n^*, \delta_n^*)$. Throughout, denote by $\Gamma_n^*$, $S_n^*$ etc. the obvious estimators but based on the bootstrap sample. Note that this requires a discontinuous extension of the above quantities. The following theorem is the basis of all later inference methods.

Theorem 1. Let $B$ denote a Brownian motion on $[0, \tau]$ and suppose that (2.1) holds. Then we have, conditionally on $X_1, X_2, \dots$,

$$ \sqrt{n} (S_n^* - \hat S_n) \xrightarrow{d} W = S \cdot (B \circ \sigma^2) \quad \text{on } D[0, \tau] \text{ in probability.} $$

Many statistical applications involve a consistent variance estimator, e.g. Hall-Wellner or equal precision confidence bands for $S$; cf. Andersen et al. (1993), p. 266. In order to asymptotically reproduce the same limit on the bootstrap side, the uniform consistency of a bootstrapped variance estimator (defined on the whole support $[0, \tau]^2$ of the covariance function) needs to be verified. To this end, introduce the bootstrap version of $\hat \Gamma_n$, that is,

$$ \Gamma_n^*(u, v) = S_n^*(u) \Big( \int_0^{u \wedge v} \frac{dA_n^*}{H_{n-}^*} \Big) S_n^*(v). $$

For all $\varepsilon > 0$, its uniform consistency (here and below always meaning conditional convergence in probability given $X_1, X_2, \dots$ in probability) over all points $(u, v) \in [0, \tau]^2 \setminus [\tau - \varepsilon, \tau]^2$ is an immediate consequence of Theorem 1 in combination with the continuous mapping theorem: Write the absolute value of the integral part minus its estimated counterpart as

$$ \Big| \int_0^{u \wedge v} \frac{(\hat H_{n-} - H_{n-}^*) \, dA_n^* - H_{n-}^* \, d(\hat A_n - A_n^*)}{H_{n-}^* \hat H_{n-}} \Big| \le \frac{\sup_{(0, u \wedge v)} | \hat H_n - H_n^* |}{H_n^*((u \wedge v)-) \, \hat H_n((u \wedge v)-)} A_n^*(u \wedge v) + \Big| \int_0^{u \wedge v} \frac{d(\hat A_n - A_n^*)}{\hat H_{n-}} \Big|. $$

The first term is asymptotically negligible due to Pólya's theorem and the second term becomes small due to the continuous mapping theorem applied to the integral functional and the logarithm functional. Here the restriction to $[0, \tau]^2 \setminus [\tau - \varepsilon, \tau]^2$ simplified the calculations since all denominators are asymptotically bounded away from zero.

For uniform consistency on the whole rectangle $[0, \tau]^2$, however, arguments similar to those for the bootstrapped Kaplan-Meier process on $[0, \tau]$ are required. Compared to (2.1), we postulate a slightly more restrictive censoring condition.

Lemma 3.1. Suppose that

$$ -\int_0^\tau \frac{dS}{G_-^3} < \infty. \tag{3.1} $$

Then we have the following conditional uniform consistency given $X_1, X_2, \dots$ in probability:

$$ \sup_{(u,v) \in [0,\tau]^2} | \Gamma_n^*(u, v) - \hat \Gamma_n(u, v) | \xrightarrow{p} 0 \quad \text{in probability as } n \to \infty. \tag{3.2} $$

Remark 1.

The proof of Lemma 3.1 shows that Condition (3.1) can be diminished to ´ ż τ d SG ´ ´ ż τ S ´ δ d SG ` δ ´ ă 8 for some δ P p , q . This is due to the inequality p n p H n ´ q ´ ď p n δ p H ` δn ´ q ´ . Applications
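Before specific functionals are considered, the resampling scheme behind Theorem 1 can be sketched in code. The following Python sketch is only an illustration under stated assumptions, not the author's implementation: it draws pairs $(X_i, \delta_i)$ with replacement, recomputes the Kaplan-Meier estimator with ties allowed, and uses the conditional quantile of $\sqrt{n} \sup_t |S_n^*(t) - \hat S_n(t)|$ to form a plain, untransformed sup-norm band. The function names, the number of bootstrap replications, and the simulated exponential data (for which Condition (2.1) holds) are hypothetical choices made for this example.

```python
import numpy as np


def kaplan_meier(x, delta, grid):
    """Kaplan-Meier survival estimate evaluated at the points in `grid`.

    Ties are allowed (bootstrap samples repeat observations): the factor at
    an event time s is 1 - d(s) / Y(s), with d(s) the number of events at s
    and Y(s) the number of observations >= s.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(x[delta == 1])                      # distinct event times
    at_risk = np.array([(x >= s).sum() for s in times])
    events = np.array([((x == s) & (delta == 1)).sum() for s in times])
    surv = np.concatenate(([1.0], np.cumprod(1.0 - events / at_risk)))
    # the number of event times <= t selects the current step of the estimate
    return surv[np.searchsorted(times, grid, side="right")]


def efron_sup_band(x, delta, alpha=0.05, n_boot=500, seed=0):
    """Sup-norm band S_hat -/+ q / sqrt(n) with a bootstrap quantile q.

    Illustrative assumption: no transformation or weighting is applied.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    rng = np.random.default_rng(seed)
    n = len(x)
    # Both step functions jump only at observed times, so the supremum over
    # the whole line is attained on this grid.
    grid = np.unique(x)
    s_hat = kaplan_meier(x, delta, grid)
    sup_stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # draw n pairs with replacement
        s_star = kaplan_meier(x[idx], delta[idx], grid)
        sup_stats[b] = np.sqrt(n) * np.max(np.abs(s_star - s_hat))
    q = np.quantile(sup_stats, 1.0 - alpha)
    lower = np.clip(s_hat - q / np.sqrt(n), 0.0, 1.0)
    upper = np.clip(s_hat + q / np.sqrt(n), 0.0, 1.0)
    return grid, s_hat, lower, upper


def simulate_censored(n, seed=0):
    """Hypothetical test data: T ~ Exp(1), C ~ Exp(mean 3), so (2.1) holds."""
    rng = np.random.default_rng(seed)
    t = rng.exponential(1.0, size=n)
    c = rng.exponential(3.0, size=n)
    return np.minimum(t, c), (t <= c).astype(int)
```

A call such as `efron_sup_band(*simulate_censored(200))` returns the evaluation grid, the estimate and the two band limits. In practice one would typically prefer a transformed or studentized band; the linear version is kept here only because it is the shortest to state.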

Applications of Theorem 1 concern confidence intervals for the mean residual life-time $g(t) = E[T - t \mid T > t]$ on compact sub-intervals $[t_1, t_2] \subset [0, \tau)$ in case of $\tau < \infty$ as well as confidence regions for the Lorenz curve $L$ and the Gini index $G$. To this end, we apply the functional delta-method (e.g. Andersen et al., 1993, Theorem II.8.1), which in turn requires the Hadamard-differentiability of all involved statistical functionals.

4.1 Confidence Bands for the Mean-Residual Lifetime Function

Let $0 \le t_1 \le t_2$ and introduce the space $C[t_1, \tau]$ of continuous functions on $[t_1, \tau]$ equipped with the supremum norm as well as the subset

$$ \tilde C[t_1, t_2] = \{ f \in C[t_1, \tau] : \inf_{s \in [t_1, t_2]} |f(s)| > 0 \} \subset C[t_1, \tau] $$

containing all continuous functions having a positive distance to the constant zero function on the interval $[t_1, t_2]$. Similarly, let

$$ \tilde D[t_1, t_2] = \{ f \in D[t_1, \tau] : \inf_{s \in [t_1, t_2]} |f(s)| > 0, \ \sup_{s \in [t_1, \tau]} |f(s)| < \infty \} \subset D[t_1, \tau] $$

be the extension of $\tilde C[t_1, t_2]$ to possibly discontinuous, bounded càdlàg functions. For the notion of Hadamard-differentiability tangentially to subsets of $D[t_1, \tau]$, see Definition II.8.2, Theorem II.8.2 and Lemma II.8.3 in Andersen et al. (1993), p. 111f. The following lemma makes the functional delta-method available for applications to the mean residual life-time function.

Lemma 4.1. Let $\tau < \infty$ and $[t_1, t_2] \subset [0, \tau)$ be a compact interval. Then

$$ \psi : \tilde D[t_1, t_2] \to D[t_1, t_2], \quad \theta(\cdot) \mapsto \frac{1}{\theta(\cdot)} \int_\cdot^\tau \theta(s) \, ds $$

is Hadamard-differentiable at each $\theta \in \tilde C[t_1, t_2]$ tangentially to $C[t_1, \tau]$ with continuous linear derivative $d\psi(\theta) \cdot h \in D[t_1, t_2]$ given by

$$ (d\psi(\theta) \cdot h)(s) := \frac{1}{\theta(s)} \int_s^\tau h(u) \, du - h(s) \int_s^\tau \frac{\theta(u)}{\theta^2(s)} \, du. $$

As pointed out in Gill (1989) or Andersen et al. (1993), p. 110, the functional delta-method is established on the functional space $D[t_1, \tau]$ (or subsets thereof) equipped with the supremum norm. However, in case of limiting processes with continuous sample paths, "weak convergence in the sense of the [Skorohod] metric and in the sense of the supremum norm are exactly equivalent" (Andersen et al., 1993). See also Problem 7 in Pollard (1984), p. 137. The convergence result of Theorem 1 combined with the functional $\psi$ of Lemma 4.1 yields the following weak convergence.

Lemma 4.2. Suppose that (2.1) holds. On the Skorohod space $D[t_1, t_2]$ we then have

$$ \sqrt{n} \Big( \int_\cdot^\tau \frac{\hat S_n(u)}{\hat S_n(\cdot)} \, du - \int_\cdot^\tau \frac{S(u)}{S(\cdot)} \, du \Big) \xrightarrow{d} U $$

and, given $X_1, X_2, \dots$,

$$ \sqrt{n} \Big( \int_\cdot^\tau \frac{S_n^*(u)}{S_n^*(\cdot)} \, du - \int_\cdot^\tau \frac{\hat S_n(u)}{\hat S_n(\cdot)} \, du \Big) \xrightarrow{d} U $$

in outer probability. The Gaussian process $U$ has a.s. continuous sample paths, mean zero and covariance function

$$ (r, s) \mapsto \int_{r \vee s}^\tau \int_{r \vee s}^\tau \frac{\Gamma(u, v)}{S(r) S(s)} \, du \, dv - \sigma^2(r \vee s) \, g(r) \, g(s), $$

where $g(t) = E[T - t \mid T > t] = \int_t^\tau \frac{S(u)}{S(t)} \, du$ is again the mean residual life-time function.

The previous lemma in combination with the continuous mapping theorem almost immediately gives rise to the construction of asymptotically valid confidence regions for the mean residual life-time function. According to the functional delta-method we may first apply, e.g., an arcsin- or log-transformation to ensure that only positive values are included in the confidence regions; cf. Section IV.1.3 in Andersen et al. (1993), p. 208ff. For ease of presentation, only the linear regions are stated below.
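The plug-in estimator of $g$ and its bootstrap counterpart from the preceding weak convergence result can be sketched numerically. The Python code below is a hedged illustration, not the author's implementation: it integrates the Kaplan-Meier step function exactly up to a known finite $\tau$, and forms a linear band from the bootstrap quantile of $\sqrt{n} \sup_t |g_n^*(t) - \hat g_n(t)|$. All function names and the uniform test data are hypothetical; the supremum is approximated by a maximum over a finite grid, which is an assumption of this sketch because the difference of the two estimates is not a step function.

```python
import numpy as np


def km_steps(x, delta):
    """Distinct event times and Kaplan-Meier values (surv[0] = 1 before the first event)."""
    times = np.unique(x[delta == 1])
    at_risk = np.array([(x >= s).sum() for s in times])
    events = np.array([((x == s) & (delta == 1)).sum() for s in times])
    return times, np.concatenate(([1.0], np.cumprod(1.0 - events / at_risk)))


def mean_residual_life(x, delta, t_grid, tau):
    """Plug-in estimate of g(t) = int_t^tau S(u) du / S(t) for 0 <= t < tau."""
    t_grid = np.asarray(t_grid, dtype=float)
    if np.any(t_grid < 0) or np.any(t_grid >= tau):
        raise ValueError("t_grid must lie in [0, tau)")
    times, surv = km_steps(x, delta)
    keep = times < tau
    knots = np.concatenate(([0.0], times[keep], [tau]))
    vals = surv[: keep.sum() + 1]            # estimate on [knots[k], knots[k+1])
    cum = np.concatenate(([0.0], np.cumsum(vals * np.diff(knots))))  # integral from 0 to each knot
    k = np.searchsorted(knots, t_grid, side="right") - 1
    if np.any(vals[k] <= 0):
        # The estimator is undefined once the survival estimate reaches zero.
        raise ValueError("survival estimate is zero on t_grid; choose a smaller t_2")
    integral_to_t = cum[k] + vals[k] * (t_grid - knots[k])
    return (cum[-1] - integral_to_t) / vals[k]


def mrl_band(x, delta, t_grid, tau, alpha=0.05, n_boot=500, seed=0):
    """Linear band g_hat -/+ q / sqrt(n), q a bootstrap quantile of the sup-statistic."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    rng = np.random.default_rng(seed)
    n = len(x)
    g_hat = mean_residual_life(x, delta, t_grid, tau)
    sup_stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample pairs with replacement
        g_star = mean_residual_life(x[idx], delta[idx], t_grid, tau)
        sup_stats[b] = np.sqrt(n) * np.max(np.abs(g_star - g_hat))
    q = np.quantile(sup_stats, 1.0 - alpha)
    return g_hat, g_hat - q / np.sqrt(n), g_hat + q / np.sqrt(n)


def simulate_uniform(n, seed=0):
    """Hypothetical test data: T ~ U(0, 1) (tau = 1), C ~ U(0, 2)."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, size=n)
    c = rng.uniform(0.0, 2.0, size=n)
    return np.minimum(t, c), (t <= c).astype(int)
```

For the simulated data the true function is $g(t) = (1 - t)/2$, which gives a simple sanity check of the estimate. The explicit error for a vanishing survival estimate reflects that the upper end point of the interval has to be chosen with the data in mind.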

Theorem 2. Let $0 \le t_1 \le t_2 < \tau$. Choose any $\alpha \in (0, 1)$ and suppose that (2.1) holds. An asymptotic two-sided $(1 - \alpha)$-confidence band for the mean residual life-time function $(E[T - t \mid T > t])_{t \in [t_1, t_2]}$ is given by

$$ \Big[ \int_t^\tau \frac{\hat S_n(u)}{\hat S_n(t)} \, du - \frac{q_n^{\mathrm{MRLT}}}{\sqrt{n}}, \ \int_t^\tau \frac{\hat S_n(u)}{\hat S_n(t)} \, du + \frac{q_n^{\mathrm{MRLT}}}{\sqrt{n}} \Big]_{t \in [t_1, t_2]}, $$

where $q_n^{\mathrm{MRLT}}$ is the $(1 - \alpha)$-quantile of the conditional law given $(X_1, \delta_1), \dots, (X_n, \delta_n)$ of

$$ \sqrt{n} \sup_{t \in [t_1, t_2]} \Big| \int_t^\tau \frac{S_n^*(u)}{S_n^*(t)} \, du - \int_t^\tau \frac{\hat S_n(u)}{\hat S_n(t)} \, du \Big|. $$

Remark 2. (a) Instead of using a transformation as indicated above Theorem 2, one could also employ a studentization using $\hat \Gamma_n$ and $\Gamma_n^*$. Plugging these and consistent estimators for the other unknown quantities into the asymptotic variance representation yields consistent variance estimators for the statistic of interest. This yields a Gaussian process with asymptotic variance 1 at all points of time for the mean residual life-time estimates.

(b) In practice, the construction of confidence bands for the mean residual life-time function requires choosing $t_2$ depending on the data: otherwise, too large a choice of $t_2$ might result in $\hat S_n(t_2) = 0$, in which case the above estimator would not be well-defined.

4.2 Confidence Regions for the Lorenz Curve and the Gini Index

Suppose $S$ has compact support $[0, \tau]$, i.e. let this again be the smallest interval satisfying $S(0) = 1$ and $S(\tau) = 0$. As estimators for the Lorenz curve and the Gini index we consider the plug-in estimates

$$ \hat L_n(p) = \frac{1}{\hat \mu_n} \int_0^p (1 - \hat S_n)^{-1}(t) \, dt \quad \text{and} \quad \hat G_n = \frac{\int_0^1 (u - \hat L_n(u)) \, du}{\int_0^1 u \, du}, $$

where $\hat \mu_n = -\int_0^\tau s \, d\hat S_n(s)$. The restricted and unscaled Lorenz curve estimator under independent right-censoring has been bootstrapped by Horvath and Yandell (1987). Tse (2006) discussed the large sample properties of the above Lorenz curve estimator (even under left-truncation) and also of the normalized estimated Gini index

$$ \sqrt{n} (\hat G_n - G) = \sqrt{n} \Big( \frac{\int_0^1 (u - \hat L_n(u)) \, du}{\int_0^1 u \, du} - \frac{\int_0^1 (u - L(u)) \, du}{\int_0^1 u \, du} \Big) = 2 \sqrt{n} \int_0^1 (L(u) - \hat L_n(u)) \, du. $$

Again equip all subsequent function spaces with the supremum norm. Let $D^\uparrow[0, \tau] \subset D[0, \tau]$ be the set of all distribution functions on $[0, \tau]$ with no atom in 0, and let $D_-[0, \tau]$ be the set of all càglàd functions on $[0, \tau]$. First, we consider the normalized estimated Lorenz curve, i.e. the process $W_n$ on $[0, 1]$ given by

$$ W_n(p) = \sqrt{n} \Big( \frac{1}{\hat \mu_n} \int_0^p (1 - \hat S_n)^{-1}(s) \, ds - \frac{1}{\mu} \int_0^p (1 - S)^{-1}(s) \, ds \Big) = \sqrt{n} \big( \hat \mu_n^{-1} \cdot (\Phi \circ \Psi \circ (1 - \hat S_n))(p) - \mu^{-1} \cdot (\Phi \circ \Psi \circ (1 - S))(p) \big). $$

Here $\Phi$ and $\Psi$ are

$$ \Phi : D_-[0, 1] \to C[0, 1], \quad h \mapsto \Big( p \mapsto \int_0^p h(s) \, ds \Big), $$

and

$$ \Psi : D^\uparrow[0, \tau] \to D_-[0, 1], \quad k \mapsto k^{-1} \quad \text{(the left-continuous generalized inverse).} $$

Suppose that $S$ is continuously differentiable on its support with strictly positive derivative $f$, bounded away from zero. The Hadamard-differentiability of $\Psi$ at $(1 - S)$ tangentially to $C[0, \tau]$ then holds according to Lemma 3.9.23 in van der Vaart and Wellner (1996), p. 386. Its derivative map is given by $\alpha \mapsto -\frac{\alpha}{f} \circ (1 - S)^{-1}$. The other functional $\Phi$ is obviously Hadamard-differentiable at $S^{-1} \in C[0, 1]$ tangentially to $C[0, 1]$ since $\Phi$ itself is linear and the domain of integration is bounded. Next,

$$ \sqrt{n} \Big( \frac{1}{\hat \mu_n} - \frac{1}{\mu} \Big) = \sqrt{n} \big( \Upsilon(\hat g_n(0)) - \Upsilon(g(0)) \big), $$

where $\Upsilon : (0, \infty) \to (0, \infty)$, $r \mapsto 1/r$, $g(0) = E[T - 0 \mid T > 0] = E[T]$ is the mean residual life-time function at 0 and $\hat g_n(0)$ its estimated counterpart. Clearly, $\Upsilon$ is (Hadamard-)differentiable and the required Hadamard-differentiability of $(1 - S) \mapsto g(0)$ follows immediately from Lemma 4.1. Finally, the multiplication functional is also Hadamard-differentiable. All in all, we conclude that $W_n = \sqrt{n} (\Xi(\hat S_n) - \Xi(S))$ for a functional $\Xi : D[0, \tau] \to C[0, 1]$ which is Hadamard-differentiable at $S$ tangentially to $C[0, \tau]$. Theorem 1 in combination with the functional delta-method (for the bootstrap) immediately implies that $W_n$ and $W_n^*$ both converge in (conditional) distribution to the same continuous Gaussian process (in outer probability given $X$). Time-simultaneous inference procedures for the Lorenz curve, such as tests for equality and confidence bands, are constructed straightforwardly.

Finally, the normalized estimated Gini index allows the representation

$$ \sqrt{n} (\hat G_n - G) = -2 \sqrt{n} \big( \{\Phi \circ \Xi\}(1) \circ \hat S_n - \{\Phi \circ \Xi\}(1) \circ S \big), $$

of which $\{\Phi \circ \Xi\}(1)$ is again Hadamard-differentiable at $S$ tangentially to $C[0, \tau]$. Hence, confidence intervals for $G$ with bootstrap-based quantiles are constructed in the same way as before.

5 Discussion

In this article we established consistency of the bootstrap for Kaplan-Meier estimators on the whole support of the estimated survival function. By means of the functional delta-method this conditional weak convergence is transferred to Hadamard-differentiable functionals such as the mean-residual lifetime, the Lorenz curve or the Gini index. Further applications include the expected length of stay in the transient state (e.g. Grand and Putter 2015) or the probability of concordance (e.g. Pocock et al. 2012, Dobler and Pauly 2016).

This bootstrap consistency on the whole support may also be extended to more general inhomogeneous Markovian multistate models. Based on the martingale representation of Aalen-Johansen estimators for transition probability matrices (e.g. Andersen et al., 1993, p. 289), one could try to generalize the results of Gill (1983) to this setting. Here the notion of the 'largest event times' requires special attention as these may differ for different types of transitions. A reasonable first step towards such a generalization would be an analysis in competing risks set-ups where the support of each cumulative incidence function provides a natural domain on which to investigate weak convergence. Once weak convergence of the estimators on the whole support is verified, martingale arguments similar to those of Akritas (1986) and Gill (1983) may be employed in order to obtain such (now conditional) weak convergences for the resampled Aalen-Johansen estimator using a variant of Efron's bootstrap. In more general Markovian multi-state models we could independently draw with replacement from the sample that contains all individual trajectories, rather than single observed transitions, in order not to corrupt the dependencies within each individual; see for example Tattar and Vaman (2012) for a similar suggestion. Applications of this theory could include inference on more refined variants of the probability of concordance or the expected length of stay. Considering a progressive disease in a two-sample situation, for instance, we would like to compare the probability that an individual of group one remains longer in a less severe disease state than an individual of group two. Accurate inference procedures for the mean residual life-time in a state of disability given any state at present time offer another kind of application.
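As a numerical illustration of the plug-in estimators of the Lorenz curve and the Gini index from Section 4, the following Python sketch treats the Kaplan-Meier estimate as a discrete distribution with atoms at the event times. It is a minimal sketch under stated assumptions rather than the author's implementation: the function names are hypothetical, and any mass left over when the largest observation is censored is placed at the known end point $\tau$, which is a convention chosen only for this example. For a discrete distribution the Lorenz curve is piecewise linear, so the Gini index follows exactly from the trapezoidal rule.

```python
import numpy as np


def km_atoms(x, delta, tau):
    """Atoms and jump sizes of the Kaplan-Meier distribution 1 - S_hat."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(x[delta == 1])
    at_risk = np.array([(x >= s).sum() for s in times])
    events = np.array([((x == s) & (delta == 1)).sum() for s in times])
    surv = np.concatenate(([1.0], np.cumprod(1.0 - events / at_risk)))
    weights = -np.diff(surv)                 # jump sizes at the event times
    if surv[-1] > 0:
        # Assumption of this sketch: leftover mass is placed at tau.
        times = np.append(times, tau)
        weights = np.append(weights, surv[-1])
    return times, weights


def lorenz_and_gini(x, delta, tau):
    """Plug-in Lorenz curve (as a callable on [0, 1]) and Gini index."""
    s, w = km_atoms(x, delta, tau)
    mu = np.sum(w * s)                                       # estimated mean
    F = np.concatenate(([0.0], np.cumsum(w)))                # knots in p
    L = np.concatenate(([0.0], np.cumsum(w * s) / mu))       # Lorenz values at the knots
    # G = 1 - 2 * int_0^1 L(u) du, and L is piecewise linear between the knots
    gini = 1.0 - np.sum(w * (L[:-1] + L[1:]))
    return (lambda p: np.interp(p, F, L)), gini


def gini_bootstrap_ci(x, delta, tau, alpha=0.05, n_boot=500, seed=0):
    """Interval G_hat -/+ q / sqrt(n), q a bootstrap quantile of sqrt(n) |G* - G_hat|."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    rng = np.random.default_rng(seed)
    n = len(x)
    _, g_hat = lorenz_and_gini(x, delta, tau)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample pairs with replacement
        _, g_star = lorenz_and_gini(x[idx], delta[idx], tau)
        stats[b] = np.sqrt(n) * abs(g_star - g_hat)
    q = np.quantile(stats, 1.0 - alpha)
    return g_hat, g_hat - q / np.sqrt(n), g_hat + q / np.sqrt(n)
```

Two uncensored incomes 1 and 3, for example, give a Gini index of 0.25 and $\hat L_n(0.5) = 0.25$, which agrees with the usual mean-difference formula for the Gini index.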

Acknowledgements

The author would like to thank Markus Pauly (University of Ulm) for helpful discussions.

Appendix

Some of the following proofs (Appendix A) rely on the ideas of Gill (1983). In order to also apply (variants of) his lemmata in our bootstrap context, Appendix B below contains all required results. 'Tightness' at the support's right boundary $\tau$ for the bootstrapped Kaplan-Meier estimator is essentially shown via a bootstrap version of the approximation theorem for truncated estimators as in Theorem 3.2 in Billingsley (1999); cf. Appendix C. Define by $Y(u) = n \hat H_{n-}(u)$ the process counting the number of individuals at risk of dying, and by $Y^*(u)$ its bootstrap version.

A Proofs

Proof of Lemma 2.2.

Proof of (a): Let t ă τ and suppose (2.1) holds. By the continuous mapping theorem andthe boundedness away from zero of G on r , t s , it clearly follows that ´ ş t p S n p G n ´ p ÝÑ ´ ş t SG ´ as n Ñ 8 . Letting t Ò τ , the right-hand side converges towards ´ ş τ SG ´ ă 8 . It remains to apply Theorem 3.2 of Billingsley (1999)in order to verify the assertion for t “ and hence for all t ď τ by the continuous mapping theorem. Thus, weshow that for all ε ą , lim t Ò τ lim sup n Ñ8 P ´ ´ ż τt d p S n p G n ´ ą ε ¯ “ . Let p T n again be the largest observation among X , . . . , X n and deﬁne, for any β ą , B β : “ t p S n p s q ď β ´ S p t q and p H n p s ´q ě βH p s ´q for all s P r , p T n su . By Lemmata B.1 and B.2, the probability p β : “ ´ P p B β q ď β ` eβ exp p´ { β q is arbitrary small for sufﬁcientlysmall β ą . Hence, by Theorem 1.1 of Stute and Wang (1993) (applied for the concluding convergence), P ´ ´ ż τt d p S n p G n ´ ą ε ¯ “ P ´ ´ ż τt p S n ´ d p S n p H n ´ ą ε ¯ ď P ´ ´ β ´ ż τt S ´ d p S n H ´ ą ε ¯ ` p β “ P ´ ´ β ´ ż τt d p S n G ´ ą ε ¯ ` p β Ñ P ´ ´ β ´ ż τt d SG ´ ą ε ¯ ` p β . For large t ă τ and by the continuity of S , the far right-hand side of the previous display equals p β .Proof of (b): First note that the uniform convergences in probability in Theorems IV.3.1 and IV.3.2 of Andersenet al. (1993), p. 261ff., yield, for any ε ą , sup p u,v qPr ,τ ´ ε s | p Γ n p u, v q ´ Γ p u, v q| p ÝÑ as n Ñ 8 . S ´ d A “ ´ d S show that Γ p u, v q “ ´ ż τ t w ď u ^ v u S p u q S p v q S p w ´q S p w ´q d S p w q G p w ´q Ñ ´ ż τ SG ´ “ as u, v Ñ τ . Hence, it remains to verify the remaining condition (3.8) of Theorem 3.2 in Billingsley (1999) inorder to conclude this proof. That is, for each positive δ we show lim u,v Ñ τ lim sup n Ñ8 P ´ sup p u,v qPr ,τ s | p Γ n p u, v q ´ p Γ n p τ, τ q| ě δ ¯ “ . 
To this end, rewrite p Γ n p u, v q ´ p Γ n p τ, τ q as ż τu ^ v p S n p τ q p S n p τ q p S n p w ´q p S n p w ´q d p S n p w q p G n p w ´q ` ż u ^ v d p S n p S n ´ p G n ´ p p S n p τ q ´ p S n p u q p S n p v qq . The left-hand integral is bounded in absolute value by ´ ş τu ^ v d p S n p G n ´ which goes to ´ ş τu ^ v d SG ´ in probability as n Ñ 8 by (a). For large u, v this is arbitrarily small.The remaining integral is bounded in absolute value by ´ ż u ^ v p S n p u q p S n p v q p S n ´ d p S n p G n ´ “ ´ n ż u ^ v p S n p u q p S n p v q Y p G n ´ d p S n . By Lemmata B.1 and B.2 this integral is bounded from above by ´ ż u ^ v S p p T n ^ u q S p p T n ^ v q H ´ G ´ d p S n “ ´ ż u ^ v S p p T n ^ u q S p p T n ^ v q S ´ G ´ d p S n on a set with arbitrarily high probability. For sufﬁciently large n we also have p T n ą u ^ v with arbitrarily highprobability. Next, Theorem 1.1 in Stute and Wang (1993) yields ´ ż u ^ v S p u q S p v q S ´ G ´ d p S n p ÝÑ ´ ż u ^ v S p u q S p v q S ´ G ´ d S as n Ñ 8 . As above the dominated convergence theorem shows the negligibility of this integral as u, v Ñ τ . l Proof of Theorem 1.

For the proof of weak convergence of the bootstrapped Kaplan-Meier estimator on each Sko-rohod space D r , t s , t ă τ , see e.g. Akritas (1986), Lo and Singh (1986) or Horvath and Yandell (1987). Bydeﬁning these processes as constant functions after t , the convergences equivalently hold on D r , τ s . This takescare of Condition (a) in Lemma C.1, while (c) is obviously fulﬁlled by the continuity of the limit Gaussian process.To close the indicated gap for the bootstrapped Kaplan-Meier process on the whole support r , τ s , it remains toanalyze Condition (b). This is ﬁrst veriﬁed for the truncated process by following the strategy of Gill (1983) whileapplying the martingale theory of Akritas (1986) for the bootstrapped counting processes. Thus, the truncationtechnique of Lemma C.1 shows the convergence in distribution of the truncated process. Finally, the negligibilityof the remainder term is shown similarly as in Ying (1989).We will make use of the fact that our martingales, stopped at arbitrary stopping times, retain the martingaleproperty; cf. Andersen et al. (1993), p. 70, for sufﬁcient conditions on this matter. Similarly to the largest eventor censoring time p T n , introduce the largest bootstrap time T ˚ n “ max i “ ,...,n X ˚ i , being an integrable stoppingtime with respect to the ﬁltration of Akritas (1986) who used Theorem 3.1.1 of Gill (1980): Hence, we choose theﬁltration given by F t : “ t X i , δ i , δ ˚ i t X ˚ i ď t u , X ˚ i t X ˚ i ď t u : i “ , . . . , n u , ď t ď τ ; t X ˚ i ď t u into the ﬁltration since their values are already determined by all the X ˚ i t X ˚ i ď t u : According to our assumptions, X ˚ i ą a.s. for all i “ , . . . , n .We would ﬁrst like to verify condition (b) in Lemma C.1 for the stopped bootstrap Kaplan-Meier process. Thatis, for each ε ą and an arbitrary subsequence p n q Ă p n q there is another subsequence p n q Ă p n q such that lim t Ò τ lim sup n Ñ8 P p sup t ď s ă T ˚ n ? 
n |p S ˚ n ´ p S n qp s q ´ p S ˚ n ´ p S n qp t q| ą ε | X qď lim t Ò τ lim sup n Ñ8 P p sup t ď s ă p T n ? n |p S ˚ n ´ p S n qp s ^ T ˚ n q ´ p S ˚ n ´ p S n qp t ^ T ˚ n q| ą ε | X q“ a.s. (A.1)for all ε ą . Here σ p X q “ F summarizes the collected data. Due to the boundedness away from zero, i.e. inf t ď s ă p T n p S n p s q ą , we may rewrite the bootstrap process ? n p S ˚ n ´ p S n qp s q “ ? n ´ S ˚ n p s q p S n p s q ´ ¯ p S n p s q for each s P r t, p T n q of which the bracket term is a square integrable martingale; see Akritas (1986) again. Hence, the term ? n p S ˚ n ´ p S n qp s q in (A.1) equals M ˚ n p s q p S n p s ^ T ˚ n q : “ ? n ´ S ˚ n p s ^ T ˚ n q p S n p s ^ T ˚ n q ´ ¯ p S n p s ^ T ˚ n q , (A.2)whereof p M ˚ n p s qq s Pr , p T n q is again a square integrable martingale. Indeed, its predictable variation process evalu-ated at the stopping time s “ T ˚ n is ﬁnite (having the sufﬁcient condition of Andersen et al. (1993), p. 70, for astopped martingale to be a square integrable martingale in mind): The predictable variation is given by s ÞÑ x M ˚ n yp s q “ ż s ^ T ˚ n ´ S ˚ n ´ p S n ¯ p ´ ∆ p A n q d p A n H ˚ n ´ , where H ˚ n is the empirical survival function of X ˚ , . . . , X ˚ n and ∆ f denotes the increment process s ÞÑ f p s `q ´ f p s ´q of a monotone function f . The supremum in (A.1) is bounded by sup t ď s ă p T n | M ˚ n p s q ´ M ˚ n p t q| p S n p s ^ T ˚ n q ` sup t ď s ă p T n | M ˚ n p t q|| p S n p s ^ T ˚ n q ´ p S n p t ^ T ˚ n q| of which the right-hand term is not greater than | M ˚ n p t q| p S n p t ^ T ˚ n q . By the convergence in distribution of thebootstrapped Kaplan-Meier estimator on each D r , r τ s , r τ ă τ , we have convergence in conditional distribution of M ˚ n p t q p S n p t ^ T ˚ n q given X towards N p , S p t q Γ p t, t qq in probability. 
Hence, lim n Ñ8 P p| M ˚ n p t q p S n p t ^ T ˚ n q| ą ε { | X q Ñ ´ N p , Γ p t, t qqp´ ε { , ε { q almost surely along subsequences p n q of arbitrary subsequences p n q Ă p n q . Since the variance of the normaldistribution in the previous display goes to zero as t Ò τ , cf. (2.4) in Gill (1983), the above probability vanishes as t Ò τ .By Lemma B.3, the remainder sup t ď s ă p T n | M ˚ n p s q ´ M ˚ n p t q| p S n p s ^ T ˚ n q is not greater than t ď s ă p T n ˇˇˇ ż st p S n p u q d M ˚ n p u q ˇˇˇ . (A.3)Since, given X , p S n is a bounded and predictable process, this integral is a square integrable martingale on r t, p T n q .We proceed as in Gill (1983) by applying Lenglart’s inequality, cf. Section II.5.2 in Andersen et al. (1993): For10ach η ą we have P ´ sup t ď s ă T ˚ n ^ τ ˇˇˇ ż st p S n d M ˚ n ˇˇˇ ą ε ˇˇˇ X ¯ ď ηε ` P ´ˇˇˇ ż τ ^ T ˚ n t S ˚ n ´ p ´ ∆ p A n q d p A n H ˚ n ´ ˇˇˇ ą η ˇˇˇ X ¯ . (A.4)We intersect the event on the right-hand side of (A.4) with B ˚ H,n,β : “ t H ˚ n p s ´q ě β p H n p s ´q for all s P r t, T ˚ n su and also with B ˚ S,n,β : “ t S ˚ n p s q ď β ´ p S n p s q for all s P r t, T ˚ n su . According to Lemmata B.1 and B.2, theconditional probabilities of these events are at least ´ exp p ´ { β q{ β and ´ β , respectively, for any β P p , q .Thus, (A.4) is less than or equal to ηε ` β ` exp p ´ { β q β ` ! β ´ ˇˇˇ ż τ ^ p T n t p S n ´ p ´ ∆ p A n q d p A n p H n ´ ˇˇˇ ą η ) . (A.5)In order to show the almost sure negligibility of the indicator function as n Ñ 8 and then t Ò τ , we analyze thecorresponding convergence of the integral. Since ´ d p S n “ p S n ´ d p A n , the integral is less than or equal to ´ ż τt p S n ´ d p S n p H n ´ “ ´ ż τt d p S n ´ p G n ´ . Lemma 2.2 implies that for each subsequence p n q Ă p n q there is another subsequence p n q Ă p n q such that ´ ş τt d p S n ´ p G n ´ Ñ ´ ş τt d SG ´ a.s. for all t P r , τ s X Q along p n q . Due to P p Z P Q q “ , the same convergence holdsfor all t ď τ . 
Letting now t Ò τ shows that the indicator function in (A.5) vanishes almost surely in limit superioralong p n q if ﬁnally t Ò τ . The remaining terms are arbitrarily small for sufﬁciently small η, β ą . Hence, allconditions of Lemma C.1 are met and the assertion follows for the stopped process p t s ă T ˚ n u? n p S ˚ n p s q ´ p S n p s qq ` t s ě T ˚ n u? n p S ˚ n p T ˚ n ´q ´ p S n p T ˚ n ´qqq s Pr ,τ s . Finally, we show the asymptotic negligibility of sup T ˚ n ď s ď τ ? n | S ˚ n p s q ´ p S n p s q| ď sup T ˚ n ď s ď τ ? n p S ˚ n p s q ` p S n p s qq“ ? nS ˚ n p T ˚ n q ` ? n p S n p T ˚ n q ; cf. Ying (1989) for similar considerations. Again by Lemma B.1, we have for any ε ą , β P p , q that P p? nS ˚ n p T ˚ n q ` ? n p S n p T ˚ n q ą ε | X qď P p? nS ˚ n p T ˚ n q ą ε { | X q ` P p? n p S n p T ˚ n q ą ε { | X qď P p? n p S n p T ˚ n q ą βε { | X q ` P p? n p S n p T ˚ n q ą ε { | X q ` β. Deﬁne the generalized inverse p S ´ n p u q : “ inf t s ď τ : p S n p s q ě u u . The independence of the bootstrap drawingsas well as arguments of quantile transformations yield P p? n p S n p T ˚ n q ą ε | X q “ P p X ˚ ă p S ´ n p ε {? n q | X q n “ ” ´ n |t i : X i ě p S ´ n p ε {? n qu| ı n . C ą , P p|t i : X i ě p S ´ n p ε {? n qu| ě C q“ P p|t i : p S n p X i q ě ε {? n u| ě C qě P p|t i : p H n p X i q ě ε {? n u| ě C q“ P ´ˇˇˇ! i : i ´ n ě ε ? n )ˇˇˇ ě C ¯ “ !ˇˇˇ! i : i ´ n ě ε ? n )ˇˇˇ ě C ) “ t|t r ε ? n s ` , . . . , n u| ě C u . Clearly, this indicator function goes to 1 as n Ñ 8 . l Proof of Lemma 3.1.

For the most part, we follow the lines of the above proof of Lemma 2.2 by verifying condition (3.8) of Theorem 3.2 in Billingsley (1999). To point out the major difference to the previous proof, we consider

$$-\int_{u \wedge v}^{\tau} \frac{dS_n^*}{G_{n-}^*} = \int_{u \wedge v}^{\tau} \frac{S_{n-}^*}{G_{n-}^*}\, dA_n^* = \int_{u \wedge v}^{\tau} \frac{S_{n-}^*}{G_{n-}^*} J^*\, d(A_n^* - \hat A_n) + \int_{u \wedge v}^{\tau} \frac{S_{n-}^*}{G_{n-}^*} J^*\, d\hat A_n,$$

where $J^*(u) = 1\{ Y^*(u) > 0 \}$. The arguments of Akritas (1986) show that $\int_{u \wedge v}^{\cdot} \frac{S_{n-}^*}{G_{n-}^*} J^*\, d(A_n^* - \hat A_n)$ is a square-integrable martingale with predictable variation process given by

$$t \longmapsto \int_{u \wedge v}^{t} \frac{S_{n-}^{*2}}{G_{n-}^{*2}} \frac{J^*}{Y^*} (1 - \Delta \hat A_n)\, d\hat A_n.$$

After writing $S_n^* G_n^* = H_n^*$, a two-fold application of Lemmata B.1 and B.2 (at first to the bootstrap quantities $S_n^*$ and $H_n^*$, then to the Kaplan-Meier estimators $\hat S_n$ and $\hat H_n$) shows that the predictable variation in the previous display is bounded from above by

$$-\frac{\beta^{-13}}{n} \int_{u \wedge v}^{t} \frac{S_-^3}{H_-^3}\, d\hat S_n = -\frac{\beta^{-13}}{n} \int_{u \wedge v}^{t} \frac{d\hat S_n}{G_-^3}$$

on a set with arbitrarily large probability depending on $\beta \in (0,1)$. Here we also used that $\hat S_{n-}\, d\hat A_n = -d\hat S_n$. Due to (3.1), Theorem 1.1 of Stute and Wang (1993) yields

$$-\int_{u \wedge v}^{t} \frac{d\hat S_n}{G_-^3} \xrightarrow{\;p\;} -\int_{u \wedge v}^{t} \frac{dS}{G_-^3} < \infty$$

and hence the asymptotic negligibility of the predictable variation process in probability. By Rebolledo's theorem (Theorem II.5.1 in Andersen et al., 1993, p. 83), $\int_{u \wedge v}^{\tau} \frac{S_{n-}^*}{G_{n-}^*} J^*\, d(A_n^* - \hat A_n)$ hence goes to zero in conditional probability. The remaining integral $\int_{u \wedge v}^{\tau} \frac{S_{n-}^*}{G_{n-}^*} J^*\, d\hat A_n$ is treated similarly with Lemmata B.1 and B.2 and Theorem 1.1 of Stute and Wang (1993), yielding a bound in terms of $\int_{u \wedge v}^{\tau} dS / G_-$. This is arbitrarily small for sufficiently large $u, v < \tau$. □

Proof of Lemma 4.1.

Proof of (b): Throughout, the functional spaces $D[t_1, \tau]$ and $\tilde D[t_1, t_2]$ are equipped with the supremum norm. For some sequences $t_n \downarrow 0$ and $h_n \to h$ in $D[t_1, \tau]$ such that $\theta + t_n h_n \in \tilde D[t_1, t_2]$, consider the supremum distance

$$\sup_{s \in [t_1, t_2]} \Big| \frac{1}{t_n} \big[ \psi(\theta + t_n h_n)(s) - \psi(\theta)(s) \big] - (d\psi(\theta) \cdot h)(s) \Big|. \tag{A.6}$$

The proof is concluded if (A.6) goes to zero. For easier access, the expression in the previous display is first analyzed for each fixed $s \in [t_1, t_2]$:

$$\begin{aligned} &\frac{1}{t_n} \big[ \psi(\theta + t_n h_n)(s) - \psi(\theta)(s) \big] - (d\psi(\theta) \cdot h)(s) \\ &= \frac{1}{t_n (\theta(s) + t_n h_n(s))\, \theta(s)} \Big[ \theta(s) \int_s^\tau (\theta(u) + t_n h_n(u))\, du - (\theta(s) + t_n h_n(s)) \int_s^\tau \theta(u)\, du \Big] \\ &\quad - \frac{1}{\theta(s)} \int_s^\tau h(u)\, du + h(s) \int_s^\tau \frac{\theta(u)}{\theta^2(s)}\, du \\ &= \frac{1}{\theta(s) + t_n h_n(s)} \int_s^\tau h_n(u)\, du - \frac{h_n(s)}{(\theta(s) + t_n h_n(s))\, \theta(s)} \int_s^\tau \theta(u)\, du - \frac{1}{\theta(s)} \int_s^\tau h_n(u)\, du + h_n(s) \int_s^\tau \frac{\theta(u)}{\theta^2(s)}\, du \\ &\quad - \frac{1}{\theta(s)} \int_s^\tau (h(u) - h_n(u))\, du + (h(s) - h_n(s)) \int_s^\tau \frac{\theta(u)}{\theta^2(s)}\, du \\ &= -\int_s^\tau h_n(u)\, du\, \frac{t_n h_n(s)}{[\theta(s) + t_n h_n(s)]\, \theta(s)} + h_n(s) \int_s^\tau \frac{\theta(u)}{\theta^2(s)}\, du\, \frac{t_n h_n(s)}{\theta(s) + t_n h_n(s)} \\ &\quad - \frac{1}{\theta(s)} \int_s^\tau (h(u) - h_n(u))\, du + (h(s) - h_n(s)) \int_s^\tau \frac{\theta(u)}{\theta^2(s)}\, du. \end{aligned} \tag{A.7}$$

For large $n$, each denominator is bounded away from zero: To see this, denote $\varepsilon := \inf_{s \in [t_1, t_2]} |\theta(s)|$ and $C := \sup_{s \in [t_1, t_2]} |h(s)|$. Thus,

$$\sup_{s \in [t_1, t_2]} |h_n(s)| \le \sup_{s \in [t_1, t_2]} |h_n(s) - h(s)| + \sup_{s \in [t_1, t_2]} |h(s)| \le \varepsilon + C$$

for each $n$ large enough. It follows that, for each such $n$ additionally satisfying $t_n \le \varepsilon (2\varepsilon + 2C)^{-1}$, the denominators are bounded away from zero; in particular, $\inf_{s \in [t_1, t_2]} |\theta(s) + t_n h_n(s)| \ge \varepsilon/2$. Thus, taking the suprema over $s \in [t_1, t_2]$, the first two terms in (A.7) become arbitrarily small by letting $t_n$ be sufficiently small. The remaining two terms converge to zero since $\sup_{s \in [t_1, t_2]} |h_n(s) - h(s)| \to 0$ and $\sup_{u \in [t_1, \tau]} |\theta(u)| < \infty$. Note here that

$$\int_{t_1}^{\tau} |h(u) - h_n(u)|\, du \le \sup_{u \in [t_1, \tau]} |h(u) - h_n(u)|\, (\tau - t_1) \to 0$$

due to $\tau < \infty$. □

Proof of Lemma 4.2.

The convergences are immediate consequences of the functional delta-method, Theorem 1 and the bootstrap version of the delta-method; cf. Section 3.9 in van der Vaart and Wellner (1996). Simply note that all considered survival functions are elements of $D_{<\infty} \cap \tilde D[t_1, t_2]$ (on increasing sets with probability tending to one) and that the survival function of the life-times is assumed continuous and bounded away from zero on compact subsets of $[0, \tau)$. Further, there is a version of the limit Gaussian processes with almost surely continuous sample paths.

For the representation of the variance of the limit distribution in part (a) we refer to van der Vaart and Wellner (1996), p. 383 and 397. The asymptotic covariance structure in part (b) is easily calculated using Fubini's theorem; for its applicability note that the variances $\Gamma(r, r)$ of the limit process $W$ of the Kaplan-Meier estimator exist at all points of time $r \in [0, \tau]$. Thus, since $W$ is a zero-mean process, we have for any $0 \le r \le s < \tau$,

$$\begin{aligned} &\mathrm{cov}\Big( \int_r^\tau \frac{W(u)}{S(r)}\, du - \int_r^\tau \frac{W(r) S(u)}{S^2(r)}\, du,\ \int_s^\tau \frac{W(v)}{S(s)}\, dv - \int_s^\tau \frac{W(s) S(v)}{S^2(s)}\, dv \Big) \\ &= \int_r^\tau \int_s^\tau \Big[ \Gamma(u, v) - \frac{S(u)}{S(r)} \Gamma(r, v) - \frac{S(v)}{S(s)} \Gamma(s, u) + \frac{S(u) S(v)}{S(r) S(s)} \Gamma(r, s) \Big] \frac{du\, dv}{S(r) S(s)}. \end{aligned}$$

Inserting the definition $\Gamma(r, s) = S(r) S(s) \sigma^2(r \wedge s)$ and splitting the first integral into $\int_r^\tau = \int_r^s + \int_s^\tau$ yields that the last display equals

$$\begin{aligned} &\int_r^\tau \int_s^\tau \frac{S(u) S(v)}{S(r) S(s)} \big[ \sigma^2(u \wedge v) - \sigma^2(r \wedge v) - \sigma^2(s \wedge u) + \sigma^2(r \wedge s) \big]\, du\, dv \\ &= \int_s^\tau \int_s^\tau \frac{S(u) S(v)}{S(r) S(s)} \big[ \sigma^2(u \wedge v) - \sigma^2(r) - \sigma^2(s) + \sigma^2(r) \big]\, du\, dv \\ &\quad + \int_r^s \int_s^\tau \frac{S(u) S(v)}{S(r) S(s)} \big[ \sigma^2(u) - \sigma^2(r) - \sigma^2(u) + \sigma^2(r) \big]\, du\, dv \\ &= \int_s^\tau \int_s^\tau \frac{\Gamma(u, v)}{S(r) S(s)}\, du\, dv - \sigma^2(r \vee s)\, g(r)\, g(s). \end{aligned}$$

□

Proof of Theorem 2.

The theorem follows from Lemma 4.2 combined with the continuous mapping theorem applied to the supremum functional $D[t_1, t_2] \to \mathbb{R}$, $f \mapsto \sup_{t \in [t_1, t_2]} |f(t)|$, which is continuous on $C[t_1, t_2]$. For the connection between the consistency of a bootstrap distribution of a real statistic and the consistency of the corresponding tests (and the equivalent formulation in terms of confidence regions), see Lemma 1 in Janssen and Pauls (2003). □

B Adaptations of Gill's (1983) Lemmata

Abbreviate again the sigma algebra containing all the information of the original sample as $\mathcal{X} := \sigma(X_i, \delta_i : i = 1, \dots, n)$. The proofs in Appendix A rely on bootstrap versions of Lemmata 2.6, 2.7 and 2.9 in Gill (1983). Since those are stated under the assumption of a continuous distribution function $S$, but ties in the bootstrap sample are inevitable, these lemmata need a slight extension. For completeness, parts (a) of the following two Lemmata correspond to the original Lemmata 2.6 and 2.7 in Gill (1983).

Lemma B.1 (Extension of Lemma 2.6 in Gill, 1983).

For any $\beta \in (0,1)$,

(a) $P(\hat S_n(t) \le \beta^{-1} S(t) \text{ for all } t \le \hat T_n) \ge 1 - \beta$,

(b) $P(S_n^*(t) \le \beta^{-1} \hat S_n(t) \text{ for all } t \le T_n^* \mid \mathcal{X}) \ge 1 - \beta$ almost surely.

Proof of (b). All equalities and inequalities concerning conditional expectations are understood to hold almost surely. As in the proof of Theorem 1, $(S_n^*(t \wedge T_n^*) / \hat S_n(t \wedge T_n^*))_{t \in [0, \hat T_n)}$ defines a right-continuous martingale for each fixed $n$ and for almost every given sample $\mathcal{X}$. Hence, Doob's $L^1$-inequality (e.g. Revuz and Yor, 1999, Theorem 1.7 in Chapter II) yields for each $\beta \in (0,1)$

$$\begin{aligned} P\Big( \sup_{t \in [0, \hat T_n)} S_n^*(t \wedge T_n^*) / \hat S_n(t \wedge T_n^*) \ge \beta^{-1} \,\Big|\, \mathcal{X} \Big) &\le \beta \sup_{t \in [0, \hat T_n)} E\big( S_n^*(t \wedge T_n^*) / \hat S_n(t \wedge T_n^*) \mid \mathcal{X} \big) \\ &= \beta\, E\big( S_n^*(0) / \hat S_n(0) \mid \mathcal{X} \big) = \beta. \end{aligned}$$

This implies $P(S_n^* \le \beta^{-1} \hat S_n \text{ on } [0, T_n^*) \mid \mathcal{X}) \ge 1 - \beta$. It remains to extend this result to the interval's endpoint. If the observation corresponding to $T_n^*$ is uncensored, we have $0 = S_n^*(T_n^*) \le \beta^{-1} \hat S_n(T_n^*)$. Else, the event of interest $\{ S_n^* \le \beta^{-1} \hat S_n \text{ on } [0, T_n^*) \}$ (given $\mathcal{X}$) implies that

$$S_n^*(T_n^*) = S_n^*(T_n^*-) \le \beta^{-1} \hat S_n(T_n^*-) = \beta^{-1} \hat S_n(T_n^*).$$

Thus, for given $\mathcal{X}$, $\{ S_n^* \le \beta^{-1} \hat S_n \text{ on } [0, T_n^*) \} \subset \{ S_n^*(T_n^*) \le \beta^{-1} \hat S_n(T_n^*) \}$. □

Lemma B.2 (Extension of Lemma 2.7 in Gill, 1983).

For any $\beta \in (0,1)$,

(a) $P(\hat H_n(t-) \ge \beta H(t-) \text{ for all } t \le \hat T_n) \ge 1 - \frac{e}{\beta} \exp(-1/\beta)$,

(b) $P(H_n^*(t-) \ge \beta \hat H_n(t-) \text{ for all } t \le T_n^* \mid \mathcal{X}) \ge 1 - \frac{e}{\beta} \exp(-1/\beta)$ almost surely.

Proof of (a). As pointed out by Gill (1983), the assertion follows from the inequality for the uniform distribution in Remark 1(ii) of Wellner (1978). By using quantile transformations, his inequality can be shown to hold for random variables having an arbitrary, even discontinuous distribution function.
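The bound in part (a) of Lemma B.2 can be explored by simulation. The following Python sketch is an illustration only and not part of the original argument: it assumes i.i.d. Uniform(0,1) observations, so that $H(t-) = P(X \ge t) = 1 - t$, and the function names `wellner_bound`, `event_holds` and `estimate_probability` are hypothetical. It estimates the probability of the event $\{\hat H_n(t-) \ge \beta H(t-) \text{ for all } t \le \hat T_n\}$ by Monte Carlo and compares it with the lower bound $1 - (e/\beta)\exp(-1/\beta)$.

```python
import math
import random


def wellner_bound(beta):
    """Lower bound 1 - (e / beta) * exp(-1 / beta) from Lemma B.2(a)."""
    return 1.0 - (math.e / beta) * math.exp(-1.0 / beta)


def event_holds(sample, beta):
    """Return True if H_n(t-) >= beta * H(t-) for all t <= max(sample).

    Assumption: Uniform(0,1) data, so H(t-) = P(X >= t) = 1 - t, and
    H_n(t-) = #{i : X_i >= t} / n. On each interval (X_(i-1), X_(i)] the
    empirical value is constant while H(t-) decreases in t, so it suffices
    to compare against the limit at the left endpoint X_(i-1), X_(0) = 0.
    """
    xs = sorted(sample)
    n = len(xs)
    left = 0.0
    for i, x in enumerate(xs):
        h_emp = (n - i) / n  # fraction of observations >= t for t in (left, x]
        if h_emp < beta * (1.0 - left):
            return False
        left = x
    return True


def estimate_probability(n, beta, reps, seed=0):
    """Monte Carlo frequency of the event over `reps` uniform samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        hits += event_holds(sample, beta)
    return hits / reps


if __name__ == "__main__":
    for beta in (0.3, 0.5, 0.8):
        freq = estimate_probability(n=50, beta=beta, reps=2000)
        print(f"beta={beta}: simulated {freq:.3f}, bound {wellner_bound(beta):.3f}")
```

For $\beta$ close to one the bound is nearly vacuous, while it approaches one as $\beta \downarrow 0$; up to Monte Carlo error, the simulated frequencies should lie above the bound for every sample size.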

Proof of (b). Fix $X_i(\omega), \delta_i(\omega)$, $i = 1, \dots, n$. Since $H$ in part (a) is allowed to have discontinuities, (b) follows from (a) for each $\omega$. □

Let $a, b \in D[0, \tau]$ be two (stochastic) jump processes, i.e. processes being constant between two discontinuities. If $b$ has bounded variation, we define the integral of $a$ with respect to $b$ via

$$\int_0^s a\, db = \sum a(t)\, \Delta b(t), \quad s \in (0, \tau],$$

where the sum is over all discontinuities of $b$ inside the interval $(0, s]$. If $a$ has bounded variation, we define the above integral via integration by parts: $\int_0^s a\, db = a(s) b(s) - a(0) b(0) - \int_0^s b_-\, da$.

Lemma B.3 (Adaptation of Lemma 2.9 in Gill, 1983).

Let $h \in D[0, \tau]$ be a non-negative and non-increasing jump process such that $h(0) = 1$ and let $Z \in D[0, \tau]$ be a jump process which is zero at time zero. Then for all $t \le \tau$,

$$\sup_{s \in [0, t]} h(s) |Z(s)| \le 2 \sup_{s \in [0, t]} \Big| \int_0^s h(u)\, dZ(u) \Big|.$$

Proof.

The original proof of Lemma 2.9 in Gill (1983) still applies for the most part with the assumptions of this lemma. For the sake of completeness, we present the whole proof.

Let $U(t) = \int_0^t h(s)\, dZ(s)$ with a $t \le \tau$ such that $h(t) > 0$. Then

$$Z(t) = \int_0^t \frac{dU(s)}{h(s)} = \frac{U(t)}{h(t)} - \int_0^t U(s-)\, d\Big( \frac{1}{h(s)} \Big) = \int_0^t (U(t) - U(s-))\, d\Big( \frac{1}{h(s)} \Big) + \frac{U(t)}{h(0)}.$$

Hence,

$$\begin{aligned} |h(t) Z(t)| &\le \Big| \int_0^t (U(t) - U(s-))\, d\Big( \frac{h(t)}{h(s)} \Big) \Big| + |U(t)|\, h(t) \\ &\le 2 \sup_{0 < s \le t} |U(s)| \Big( 1 - \frac{h(t)}{h(0)} \Big) + \sup_{0 < s \le t} |U(s)|\, h(t) \le 2 \sup_{0 < s \le t} |U(s)|. \end{aligned}$$

□

C Bootstrap Version of the Truncation Technique for Weak Convergence

The following lemma is a conditional variant of Theorem 3.2 in Billingsley (1999). Let $\rho$ be the modified Skorohod metric $J_1$ on $D[0, \tau]$ as in Billingsley (1999), i.e.

$$\rho(f, g) = \inf_{\lambda \in \Lambda} \Big( \|\lambda\|^{o} \vee \sup_{t \in [0, \tau]} |f(t) - g(\lambda(t))| \Big),$$

where $\Lambda$ is the collection of non-decreasing functions onto $[0, \tau]$ and $\|\lambda\|^{o} = \sup_{s \ne t} \big| \log \frac{\lambda(s) - \lambda(t)}{s - t} \big|$. For an application in the proof of Theorem 1, note that $\rho(f, g) \le \sup_{t \in [0, \tau]} |f(t) - g(t)|$.

Lemma C.1.

Let $X : (\Omega, \mathcal{A}, P) \to (D[0, \tau], \rho)$ be a stochastic process and let the sequences of stochastic processes $X_{un}$ and $X_n$ satisfy the following convergences given a $\sigma$-algebra $\mathcal{C}$:

(a) $X_{un} \xrightarrow{d} Z_u$ given $\mathcal{C}$ in probability as $n \to \infty$ for every fixed $u$,

(b) $Z_u \xrightarrow{d} X$ given $\mathcal{C}$ in probability as $u \to \infty$,

(c) for all $\varepsilon > 0$ and for each subsequence $(n') \subset (n)$ there exists another subsequence $(n'') \subset (n')$ such that $\lim_{u \to \infty} \limsup_{n'' \to \infty} P(\rho(X_{un''}, X_{n''}) > \varepsilon \mid \mathcal{C}) = 0$ almost surely.

Then, $X_n \xrightarrow{d} X$ given $\mathcal{C}$ in probability as $n \to \infty$.

Proof.

Choose a sequence $\varepsilon_m \downarrow 0$. Let $(n') \subset (n)$ be an arbitrary subsequence and choose subsequences $(n''(\varepsilon_m)) \subset (n')$ and $(u') \subset (u)$ such that (a) and (b) hold almost surely and also such that (c) holds along these subsequences. Replace $(n''(\varepsilon_m))$ by their diagonal sequence $(n'')$ ensuring (c) simultaneously for all $\varepsilon_m$. Let $F \subset D[0, \tau]$ be a closed subset and let $F_{\varepsilon_m} = \{ f \in D[0, \tau] : \rho(f, F) \le \varepsilon_m \}$ be its closed $\varepsilon_m$-enlargement. We proceed as in the proof of Theorem 3.2 in Billingsley (1999), where all inequalities now hold almost surely:

$$P(X_{n''} \in F \mid \mathcal{C}) \le P(X_{u'n''} \in F_{\varepsilon_m} \mid \mathcal{C}) + P(\rho(X_{u'n''}, X_{n''}) > \varepsilon_m \mid \mathcal{C}).$$

The Portmanteau theorem in combination with (a) yields

$$\limsup_{n'' \to \infty} P(X_{n''} \in F \mid \mathcal{C}) \le P(Z_{u'} \in F_{\varepsilon_m} \mid \mathcal{C}) + \limsup_{n'' \to \infty} P(\rho(X_{u'n''}, X_{n''}) > \varepsilon_m \mid \mathcal{C}).$$

Condition (b) and another application of the Portmanteau theorem imply that $\limsup_{n'' \to \infty} P(X_{n''} \in F \mid \mathcal{C}) \le P(X \in F_{\varepsilon_m} \mid \mathcal{C})$. Let $m \to \infty$ to deduce $\limsup_{n'' \to \infty} P(X_{n''} \in F \mid \mathcal{C}) \le P(X \in F \mid \mathcal{C})$ almost surely. Thus, a final application of the Portmanteau theorem as well as the subsequence principle lead to the conclusion that $X_n \xrightarrow{d} X$ given $\mathcal{C}$ in probability. □

References

M. G. Akritas. Bootstrapping the Kaplan-Meier Estimator.

Journal of the American Statistical Association, 81(396):1032–1038, 1986.

M. G. Akritas and E. Brunner. Nonparametric Methods for Factorial Designs with Censored Data. Journal of the American Statistical Association, 92(438):568–576, 1997.

A. Allignol, J. Beyersmann, T. Gerds, and A. Latouche. A competing risks approach for nonparametric estimation of transition probabilities in a non-Markov illness-death model. Lifetime Data Analysis, 20(4):495–513, 2014.

P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer, New York, 1993.

P. Billingsley. Convergence of probability measures. Wiley, New York, second edition, 1999.

D. Dobler. Nonparametric inference procedures for multi-state Markovian models with applications to incomplete life science data. PhD thesis, Universität Ulm, Deutschland, 2016.

D. Dobler and M. Pauly. Resampling-based Confidence Intervals and Tests for the Concordance Index and the Win Ratio. Preprint arXiv:1605.04729, 2016.

B. Efron. Censored Data and the Bootstrap. Journal of the American Statistical Association, 76(374):312–319, 1981.

R. D. Gill. Censoring and Stochastic Integrals. Mathematical Centre Tracts 124, Amsterdam: Mathematisch Centrum, 1980.

R. D. Gill. Large Sample Behaviour of the Product-Limit Estimator on the Whole Line. The Annals of Statistics, pages 49–58, 1983.

R. D. Gill. Non- and Semi-Parametric Maximum Likelihood Estimators and the von Mises Method (Part 1) [with Discussion and Reply]. Scandinavian Journal of Statistics, 16(2):97–128, 1989.

M. K. Grand and H. Putter. Regression models for expected length of stay. Statistics in Medicine, 2015.

L. Horvath and B. Yandell. Convergence Rates for the Bootstrapped Product-Limit Process. The Annals of Statistics, pages 1155–1173, 1987.

A. Janssen and T. Pauls. How do bootstrap and permutation tests work? The Annals of Statistics, 31(3):768–806, 2003.

S.-H. Lo and K. Singh. The Product-Limit Estimator and the Bootstrap: Some Asymptotic Representations. Probability Theory and Related Fields, 71(3):455–465, 1986.

I. Meilijson. Limiting Properties of the Mean Residual Lifetime Function. The Annals of Mathematical Statistics, 43(1):354–357, 1972.

S. J. Pocock, C. A. Ariti, T. J. Collier, and D. Wang. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2):176–182, 2012.

D. Pollard. Convergence of Stochastic Processes. Springer, New York, 1984.

D. Revuz and M. Yor. Continuous Martingales and Brownian Motion. Springer, third edition, 1999.

W. Stute and J.-L. Wang. The Strong Law under Random Censorship. The Annals of Statistics, 21(3):1591–1607, 1993.

P. Tattar and H. Vaman. Extension of the Harrington-Fleming tests to multistate models. Sankhya B, 74(1):1–14, 2012.

S.-M. Tse. Lorenz curve for truncated and censored data. Annals of the Institute of Statistical Mathematics, 58(4):675–686, 2006.

A. W. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes. Springer, New York, 1996.

J.-G. Wang. A Note on the Uniform Consistency of the Kaplan-Meier Estimator. The Annals of Statistics, pages 1313–1316, 1987.

J. A. Wellner. Limit theorems for the ratio of the empirical distribution function to the true distribution function. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 45(1):73–88, 1978.

Z. Ying. A note on the asymptotic properties of the product-limit estimator on the whole line.