Modified Cox regression with current status data
Laurent Bordes, Maria Carmen Pardo, Christian Paroissin, Valentin Patilea
Laurent Bordes ∗, María Carmen Pardo †, Christian Paroissin ‡, Valentin Patilea §

Abstract
In survival analysis, the lifetime under study is not always observed. In certain applications, for some individuals, the value of the lifetime is only known to be smaller or larger than some random duration. This framework represents an extension of standard situations where the lifetime is only left or only right randomly censored. We consider the case where the independent observation units also include some covariates, and we propose two semiparametric regression models. The new models extend the standard Cox proportional hazard model to the situation of a more complex censoring mechanism. However, as in Cox's model, in both models the nonparametric baseline hazard function can still be expressed as an explicit functional of the distribution of the observations. This allows one to define the estimator of the finite-dimensional parameters as the maximum of a likelihood-type criterion which is an explicit function of the data. Given an estimate of the finite-dimensional parameter, the estimation of the baseline cumulative hazard function is straightforward.
Keywords: asymptotic normality, consistency, hazard function, likelihood

MSC2010: Primary 62N01, 62N02; secondary 62F12
∗ University of Pau, France. † Complutense University of Madrid, Spain. ‡ University of Pau, France. § CREST-Ensai, France. V. Patilea acknowledges support from the research program New Challenges for New Data of Fondation du Risque and LCL.

Driven by applications, there is a constant interest in time-to-event analysis to extend the predictive models to situations where the lifetimes of interest suffer from complex censoring mechanisms. Here we consider the case where, instead of the lifetime of interest $T$, one observes independent copies of a finite nonnegative duration $X$ and of a discrete variable $A \in \{0, 1, 2\}$ such that
\[
\begin{cases}
X = T & \text{if } A = 0,\\
X < T & \text{if } A = 1,\\
X \ge T & \text{if } A = 2.
\end{cases}
\tag{1.1}
\]
Depending on the application, the inequality signs in (1.1) could be strict or not. Let us point out that the limit case where the event $\{A = 2\}$ (resp. $\{A = 1\}$) has zero probability corresponds to the usual random right-censoring (resp. left-censoring) setup, while the case where the probability of the event $\{A = 0\}$ is null corresponds to the current status framework.

Let us assume that $T \in [0, \infty]$ and let $Z \in \mathbb{R}^q$ be a vector of random covariates. All the random variables we consider are defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Although $X$ takes values only on the real line, we allow a positive probability for the event $\{T = \infty\}$, that is, we allow for cured individuals (see, for instance, Fang et al. (2005) and Zheng et al. (2006) and the references therein for applications where infinite lifetimes could occur). Symmetrically, we also allow the zero lifetime to have positive probability, that is, a zero-inflated law for $T$ could be taken into account (see Braekers & Grouwels (2015) for some motivations).

Let $\mathcal{Z}$ be the support of $Z$. The conditional probability distribution of $(X, A)$ given $Z$ is characterized by the sub-distribution functions
\[
H_k([0, t] \mid z) = \mathbb{P}(X \le t, A = k \mid Z = z), \qquad t \ge 0,\ k \in \{0, 1, 2\},\ z \in \mathcal{Z}.
\]
Let $H_k(dt \mid z)$ denote the associated measures. Moreover, let $H_k([0, t]) = \mathbb{P}(X \le t, A = k)$ be the unconditional versions of these sub-distributions. Clearly,
\[
H_k([0, t]) = E\big(H_k([0, t] \mid Z)\big), \qquad t \ge 0,\ k \in \{0, 1, 2\}.
\]
The conditional distribution function of $X$ given $Z = z$ is then
\[
H([0, t] \mid z) = \mathbb{P}(X \le t \mid Z = z) = H_0([0, t] \mid z) + H_1([0, t] \mid z) + H_2([0, t] \mid z).
\]

It is important to understand that, based on the data, one can only identify the conditional sub-distributions $H_k(\cdot \mid z)$. To identify and consistently estimate the conditional law of the lifetime of interest $T$, one has to introduce some assumptions on the censoring mechanism. In other words, one has to consider a latent model. Several censoring mechanisms have been proposed in the case without covariates. Turnbull (1974) considered two censoring times $L \le U$ such that the case $\{A = 0\}$ (resp. $\{A = 1\}$) (resp. $\{A = 2\}$) corresponds to the event $\{L \le T \le U\}$ and $X = T$ (resp. $\{U \le T\}$ and $X = U$) (resp. $\{T \le L\}$ and $X = L$). Patilea & Rolin (2006b) relaxed the condition $L \le U$ and proposed two models that could be easily illustrated using simple electric circuits with three components connected in series and/or parallel. Patilea & Rolin (2006a) extended the standard right-censoring (resp. left-censoring) model by allowing uncensored lifetimes $T$ for which one only knows that they are smaller (resp. larger) than the observation $X$. This corresponds, for instance, to the case of a medical study where a disease is detected for a patient, but the onset time cannot be determined from medical records, personal information, etc., while for other patients with the disease detected the onset time is available.
The model of Turnbull does not allow one to express the law of the lifetime of interest as an explicit function of the sub-distributions $H_k$, as is the case for the models proposed by Patilea & Rolin (2006a, 2006b). Thus a numerical algorithm is necessary to compute Turnbull's estimator. It is important to keep in mind that any of these latent models could be correct and useful for a specific application. The data do not allow one to check the validity of the model. Turnbull's model, perhaps the most popular model for data structures like the one we consider here, is not necessarily justified in applications where there is no natural interpretation of the variables $L$ and $U$.

The aim of this paper is to extend the modeling of data as in equation (1.1) to the case where some covariates $Z$ are available. Kim et al. (2010) extended Turnbull's model to the case with covariates using a proportional hazard approach. Here we consider the extension of the approaches proposed by Patilea & Rolin (2006a), imposing the same proportional hazard assumption. More precisely, we propose two novel latent models for observed lifetimes as in (1.1) in the presence of covariates. Both models are well suited for data as in (1.1), and hence could be used in applications. The decision to use one of them, or the one proposed by Kim et al. (2010), can be made only on the basis of additional information on the application. Current status data corresponds to $A \in \{1, 2\}$. Right (resp. left) censored data corresponds to $A \in \{0, 1\}$ (resp. $A \in \{0, 2\}$). This explains the terminology we propose for our models: modified Cox regressions with current status lifetimes. For each of the new models, we introduce a semiparametric estimator for the finite-dimensional parameters, together with the corresponding estimators of the baseline cumulative hazard functions. Our estimators are easy to implement.

The paper is organized as follows. Our semiparametric models are introduced in section 2.
They extend the standard right, respectively left, random censoring proportional hazard models. In section 3 we introduce the semiparametric estimators of the covariate coefficients, and the estimators of the cumulative hazard and survival functions. In particular, we provide an estimator for the cure rate and the zero-lifetime probability. The theoretical results are presented in section 4.

In our models we follow the idea of Cox's semiparametric proportional hazard model. In both models we are able to express the baseline cumulative hazard function as a functional of the distribution of the observations, characterized by the conditional sub-distributions $H_k(\cdot \mid z)$ and the law of $Z$, and of the coefficients of the covariates. As a consequence, the coefficients of the covariates can be estimated by maximizing a likelihood-type criterion that is built as an explicit function of the observations. Thus the numerical aspects are very much simplified, compared to the model considered by Kim et al. (2010). With an estimate of the finite-dimensional parameters at hand, we can easily build the estimator of the baseline cumulative hazard function. In particular, using the estimate of the total mass of the baseline cumulative hazard, we provide a simple estimate of the conditional cure rate $\mathbb{P}(T = \infty \mid Z = z)$. Similarly, we can provide an estimator for the conditional zero-lifetime probability $\mathbb{P}(T = 0 \mid Z = z)$. The extension to the case of mixture models, such as those considered by Fang et al. (2005), where the cure rate or the zero-lifetime probability could depend on possibly different sets of covariates, is left for future work.

Let $C \in [0, \infty)$ be a random censoring time and $\Delta$ be a Bernoulli random variable with success probability $p \in (0, 1]$. Let $F_T(t \mid z)$ and $S_T(t \mid z)$, $t \in [0, \infty]$, be the conditional distribution function and survivor function of $T$ given $Z = z$.
Similarly, $F_C(t \mid z)$ and $S_C(t \mid z)$, $t \in [0, \infty)$, denote the conditional distribution function and the survivor function of $C$. Following Patilea & Rolin (2006a), the latent model for $(X, A, Z)$ is defined by:
\[
(X, A, Z) =
\begin{cases}
(T, 0, Z) & \text{if } 0 \le T \le C \text{ and } \Delta = 1,\\
(C, 1, Z) & \text{if } 0 \le C < T,\\
(C, 2, Z) & \text{if } 0 \le T \le C \text{ and } \Delta = 0.
\end{cases}
\]
Let us notice that $p = 1$ is the classical right-censoring limit case, while $p = 0$ would correspond to the pure current status setup. The latter limit case is not included in what follows since we assume $p > 0$. In the case $C < T$, the observed outcome is not influenced by the value of $\Delta$. For identification purposes we consider the following assumption.
A1: Assume that: a) conditionally on $Z$, the latent variables $T$ and $C$ are independent; b) $\Delta$ and $(T, C, Z)$ are independent.

The independence assumptions allow us to write
\[
\begin{aligned}
H_0(dt \mid z) &= p\, S_C(t- \mid z)\, F_T(dt \mid z),\\
H_1(dt \mid z) &= F_C(dt \mid z)\, S_T(t \mid z),\\
H_2(dt \mid z) &= (1 - p)\, F_C(dt \mid z)\, F_T(t \mid z).
\end{aligned}
\tag{2.2}
\]
The system can be solved for the quantities $p$ and $F_T(dt \mid z)$. First, let us write
\[
H_0([t, \infty) \mid z) + p H_1([t, \infty) \mid z) = p\, S_T(t- \mid z)\, S_C(t- \mid z).
\]
Since $S_T(t- \mid z) S_C(t- \mid z) = H([t, \infty) \mid z)$, we deduce
\[
H_0([t, \infty) \mid z) = p \{ H_0([t, \infty) \mid z) + H_2([t, \infty) \mid z) \}, \qquad t \ge 0.
\]
Taking $t = 0$ and integrating with respect to the law of $Z$, we derive the simple representation
\[
p = \frac{H_0([0, \infty))}{H_0([0, \infty)) + H_2([0, \infty))} = \frac{\mathbb{P}(\Delta = 1, T \le C)}{\mathbb{P}(T \le C)}.
\tag{2.3}
\]
Let us point out that one could replace the condition A1b) by the weaker condition that $\Delta$ and $(T, C)$ are independent given $Z$ and still write the equations (2.2) with $p$ replaced by some function of the covariates $p(Z)$. In this case one would derive the conditional version of the representation (2.3), but then the estimation of $p(Z)$ would require the estimation of the conditional versions of $H_0$ and $H_2$. For the sake of a simpler setup we suppose that $p$ does not depend on the covariates.

Next, we solve (2.2) for the conditional distribution of $T$. For this purpose we follow a proportional hazards model approach and we suppose that the risk function of $T$ given $Z = z$ can be written as
\[
\lambda(t \mid z) = \lambda(t) \exp(\beta^\top z), \qquad \forall t > 0,\ \forall z \in \mathcal{Z},
\tag{2.4}
\]
where $\lambda(\cdot)$ is some unknown baseline hazard function and $\beta$ is a vector of unknown regression parameters. (Herein the vectors are column matrices and $\beta^\top$ denotes the transpose of $\beta$.) With this assumption, for each $z \in \mathcal{Z}$ and $t \ge 0$, we can write
\[
\begin{aligned}
H_0(dt \mid z) &= p\, F_T(dt \mid z)\, S_C(t- \mid z)\\
&= \frac{F_T(dt \mid z)}{S_T(t- \mid z)}\; p\, S_T(t- \mid z)\, S_C(t- \mid z)\\
&= \lambda(t) \exp(\beta^\top z)\; p\, S_T(t- \mid z)\, S_C(t- \mid z)\, dt\\
&= \lambda(t) \exp(\beta^\top z) \{ H_0([t, \infty) \mid z) + p H_1([t, \infty) \mid z) \}\, dt.
\end{aligned}
\]
Hence,
\[
H_0(dt) = E\{ H_0(dt \mid Z) \} = E\big\{ \exp(\beta^\top Z) \big( H_0([t, \infty) \mid Z) + p H_1([t, \infty) \mid Z) \big) \big\}\, \lambda(t)\, dt.
\]
Moreover,
\[
E\big\{ \exp(\beta^\top Z) H_k([t, \infty) \mid Z) \big\} = E\big\{ \exp(\beta^\top Z)\, \mathbf{1}\{X \ge t, A = k\} \big\}, \qquad \forall t \ge 0,\ k = 0, 1.
\]
As a consequence, for any $t$ such that $E\{ \exp(\beta^\top Z) [ H_0([t, \infty) \mid Z) + H_1([t, \infty) \mid Z) ] \} > 0$, we have
\[
\lambda(t)\, dt = \frac{H_0(dt)}{E\{ \exp(\beta^\top Z) [ \mathbf{1}\{X \ge t, A = 0\} + p\, \mathbf{1}\{X \ge t, A = 1\} ] \}}.
\tag{2.5}
\]
Thus, the baseline cumulative hazard function $\Lambda(t) = \int_{[0, t]} \lambda(s)\, ds$ can be expressed as a functional of the observed variables and the finite-dimensional parameters of the model:
\[
\Lambda(t) = \Lambda(t; p, \beta) = \int_{[0, t]} \frac{H_0(ds)}{E\{ \exp(\beta^\top Z) [ \mathbf{1}\{X \ge s, A = 0\} + p\, \mathbf{1}\{X \ge s, A = 1\} ] \}}.
\tag{2.6}
\]
The conditional survival function of the lifetime of interest can be expressed as
\[
S_T(t \mid z) = \prod_{s \in (0, t]} \big( 1 - \exp(\beta^\top z) \Lambda(ds) \big).
\]
Herein, the notation $\prod_{s \in I}$ means the product-integral over the interval $I$, as formally defined in Gill & Johansen (1990). In particular, the conditional cure probability can be expressed as
\[
S_T(\infty \mid z) = \prod_{s \in (0, \infty)} \big( 1 - \exp(\beta^\top z) \Lambda(ds) \big).
\]

Let $C \in (0, \infty)$ be a random censoring time and $\Delta$ be a Bernoulli random variable with success probability $p \in (0, 1]$. The latent model for $(X, A, Z)$ is defined by:
\[
(X, A, Z) =
\begin{cases}
(T, 0, Z) & \text{if } 0 < C \le T \text{ and } \Delta = 1,\\
(C, 1, Z) & \text{if } 0 < C \le T \text{ and } \Delta = 0,\\
(C, 2, Z) & \text{if } 0 \le T < C.
\end{cases}
\]
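The two latent mechanisms defined so far (the right-censoring variant and the left-censoring variant just given) can be illustrated by a small simulation. The following is a minimal sketch, not part of the original paper: the function name `observe`, the exponential laws for $T$ and $C$, the scalar uniform covariate and the parameter values are all assumptions chosen for illustration only. The function only encodes how the latent triple $(T, C, \Delta)$ is mapped to the observed pair $(X, A)$.

```python
import numpy as np

def observe(t, c, delta, model="right"):
    """Map latent (T, C, Delta) to observed (X, A).

    model="right": A=0 if T<=C and Delta=1, A=1 if C<T, A=2 if T<=C and Delta=0.
    model="left":  A=0 if C<=T and Delta=1, A=1 if C<=T and Delta=0, A=2 if T<C.
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    if model == "right":
        a = np.where(c < t, 1, np.where(delta, 0, 2))
    elif model == "left":
        a = np.where(t < c, 2, np.where(delta, 0, 1))
    else:
        raise ValueError("model must be 'right' or 'left'")
    # X equals the lifetime only when A = 0; otherwise the censoring time is seen.
    x = np.where(a == 0, t, c)
    return x, a

# Illustrative usage (all distributions below are assumptions, not from the paper).
rng = np.random.default_rng(0)
n, p, beta = 1000, 0.6, 0.5
z = rng.uniform(-1.0, 1.0, size=n)            # scalar covariate
t = rng.exponential(1.0 / np.exp(beta * z))   # hazard exp(beta * z): PH with unit baseline
c = rng.exponential(1.0, size=n)              # censoring time, independent of T given Z
delta = rng.random(n) < p                     # Bernoulli(p), independent of (T, C, Z)
x, a = observe(t, c, delta, model="right")
p_hat = np.mean(a == 0) / np.mean(a != 1)     # empirical version of (2.3)
```

In the right-censoring variant, the ratio of the number of observations with $A = 0$ to the number with $A \ne 1$ should be close to $p$ for large $n$, in line with representation (2.3).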
The case $p = 1$ corresponds to the classical left-censored data situation. Consider the assumptions A1a) and A1b). Then we can write
\[
\begin{aligned}
H_0(dt \mid z) &= p\, F_C(t \mid z)\, F_T(dt \mid z),\\
H_1(dt \mid z) &= (1 - p)\, F_C(dt \mid z)\, S_T(t- \mid z),\\
H_2(dt \mid z) &= F_C(dt \mid z)\, F_T(t- \mid z).
\end{aligned}
\tag{2.7}
\]
This system can also be solved for the quantities $p$ and $F_T(dt \mid z)$. First, combining the first and the third equations, we deduce
\[
H_0([0, t] \mid z) + p H_2([0, t] \mid z) = p\, F_T(t \mid z)\, F_C(t \mid z),
\]
so that
\[
p = \frac{H_0([0, \infty))}{H_0([0, \infty)) + H_1([0, \infty))}.
\]
Moreover, for each $z \in \mathcal{Z}$ and $t \ge 0$, we can write
\[
\begin{aligned}
H_0(dt \mid z) &= p\, F_T(dt \mid z)\, F_C(t \mid z)\\
&= \frac{F_T(dt \mid z)}{F_T(t \mid z)}\; p\, F_T(t \mid z)\, F_C(t \mid z)\\
&= R(dt \mid z) \{ H_0([0, t] \mid z) + p H_2([0, t] \mid z) \},
\end{aligned}
\]
where
\[
R(dt \mid z) = \frac{F_T(dt \mid z)}{F_T(t \mid z)}
\]
is the conditional reverse hazard measure. The quantity $R(dt \mid z)$ can be interpreted as the conditional probability that the event occurs in the interval $[t - dt, t]$, given that the event occurs no later than $t$. This measure has the property
\[
F_T(t \mid z) = \prod_{s \in (t, \infty)} \big( 1 - R(ds \mid z) \big), \qquad \forall t \ge 0.
\]
In particular,
\[
F_T(0 \mid z) = \prod_{s \in (0, \infty)} \big( 1 - R(ds \mid z) \big).
\]

Inspired by the proportional hazards approach, let us consider that the conditional reverse hazard function of $T$ given $Z = z$ can be written as
\[
r(t \mid z) = r(t) \exp(\beta^\top z), \qquad \forall t > 0,\ \forall z \in \mathcal{Z},
\tag{2.8}
\]
where $r(\cdot)$ is some unknown baseline reverse hazard function and $\beta$ is a vector of unknown regression parameters. Similarly to the right-censoring case, one can deduce
\[
r(t)\, dt = \frac{H_0(dt)}{E\{ \exp(\beta^\top Z) [ \mathbf{1}\{X \le t, A = 0\} + p\, \mathbf{1}\{X \le t, A = 2\} ] \}},
\tag{2.9}
\]
and the baseline cumulative reverse hazard is obtained as $R(t) = \int_{(t, \infty)} r(s)\, ds$.

Let $(X_i, A_i, Z_i)$, $1 \le i \le n$, denote the observations, which are independent copies of $(X, A, Z) \in [0, \infty) \times \{0, 1, 2\} \times \mathcal{Z}$. In the following, we consider $\mathcal{Z} = \mathbb{R}^q$ with $q$ some positive integer. With observations of the covariates and of lifetimes as in (1.1), a natural likelihood-type criterion is the one considered by Kim et al. (2010):
\[
\begin{aligned}
L_n(\beta, \Lambda) = \prod_{i=1}^n\ & \Big\{ \exp(\beta^\top Z_i)\, \lambda(X_i) \exp\Big( - \exp(\beta^\top Z_i) \int \mathbf{1}\{X_i > t\}\, \Lambda(dt) \Big) \Big\}^{\mathbf{1}\{A_i = 0\}}\\
\times\ & \Big\{ \exp\Big( - \exp(\beta^\top Z_i) \int \mathbf{1}\{X_i \ge t\}\, \Lambda(dt) \Big) \Big\}^{\mathbf{1}\{A_i = 1\}}\\
\times\ & \Big\{ 1 - \exp\Big( - \exp(\beta^\top Z_i) \int \mathbf{1}\{X_i \ge t\}\, \Lambda(dt) \Big) \Big\}^{\mathbf{1}\{A_i = 2\}}.
\end{aligned}
\tag{3.1}
\]
In this criterion, the factors involving the distribution of $(C, Z^\top)^\top$ are dropped, as they are supposed to be uninformative.

To write the likelihood-type criterion $L_n(\beta, \Lambda)$, we only used a hazard rate as in (2.4), without specifying any censoring mechanism or latent model. Alternatively, one could write the likelihood in terms of the cumulative reverse hazard $R(\cdot)$ we defined in section 2.2, using only the assumption (2.8). The two criteria are equivalent and would be valid for the type of data we consider. Next, one could follow the profiling idea. In the case where $\mathbb{P}(A = 2) = 0$ this leads to Cox's partial likelihood with right-censored data. See Murphy & van der Vaart (2000). A similar situation, Cox's partial likelihood with left-censored data, occurs when $\mathbb{P}(A = 1) = 0$. Unfortunately, given a value $\beta$, the maximization with respect to $\Lambda(\cdot)$ (or $R(\cdot)$) of $L_n(\beta, \Lambda)$ does not have a nondegenerate, explicit solution when both $\mathbb{P}(A = 1)$ and $\mathbb{P}(A = 2)$ are positive. See Kim et al. (2010), the Remark on page 1341. A possible solution, proposed by Kim et al., would be to consider a numerical approximation. Here we propose an alternative, more convenient and sound route. To estimate the parameters of interest, one has to consider a model for the censoring mechanism. In the model considered by Kim et al. (2010), there is no way to connect the infinite-dimensional parameter $\Lambda$ (or $R(\cdot)$) to the quantities that can be easily estimated from the data, such as $H_k(\cdot)$. This makes the profiling approach complicated. The profiling approach is very appealing in the standard right-censoring (resp. left-censoring) case because there $\Lambda$ can be easily expressed in terms of $H_0(\cdot)$, $H_0(\cdot \mid z)$ and $H_1(\cdot \mid z)$ (resp. $H_2(\cdot \mid z)$).

In the two models we propose, the relationship between quantities that can be estimated by sample means from the data and the infinite-dimensional parameter $\Lambda$ (or $R(\cdot)$) is explicit, and this allows us to build a user-friendly approximated likelihood. These models do not merely simplify the optimization of the likelihood-type criteria; first of all, they induce censoring mechanisms that make sense in some applications. See Patilea & Rolin (2006a) for a discussion.

The parameters of our first model are $\theta = (p, \beta^\top)^\top \in (0, 1] \times B \subset \mathbb{R}^{q+1}$ and the cumulative hazard function $\Lambda(\cdot)$. Let $\theta_0 = (p_0, \beta_0^\top)^\top$ and $\Lambda_0(\cdot)$ denote the true values of the parameters. Using the notation from equation (2.6) we can also write $\Lambda_0(t) = \Lambda(t; \theta_0)$.

In view of equation (2.3) let us consider
\[
\widehat{p} = \frac{\sum_{i=1}^n \mathbf{1}\{A_i = 0\}}{\sum_{i=1}^n \mathbf{1}\{A_i \ne 1\}}
\]
as an estimator of $p_0$. For estimating $\beta_0$ we shall use a partial likelihood approach. With an estimate of $\beta_0$ at hand, we will use an empirical version of equation (2.6) and build an estimate of $\Lambda_0(\cdot)$. For these purposes let us define the empirical quantities
\[
N_{ki}(t) = \mathbf{1}\{X_i \le t, A_i = k\}, \qquad 1 \le i \le n,\ k \in \{0, 2\},
\]
\[
N_{n,0}(t) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{X_i \le t, A_i = 0\} = \frac{1}{n} \sum_{i=1}^n N_{0i}(t).
\]
For a column vector $c$, let $c^{\otimes 0} = 1$, $c^{\otimes 1} = c$ and $c^{\otimes 2} = c c^\top$. Let
\[
S^{(l)}_{n,k}(t; \beta) = \frac{1}{n} \sum_{i=1}^n \exp(\beta^\top Z_i)\, Z_i^{\otimes l}\, \mathbf{1}\{X_i \ge t, A_i = k\}, \qquad l = 0, 1,\ k \in \{0, 1, 2\},
\]
and
\[
E^{(l)}_n(t; \theta) = E^{(l)}_n(t; p, \beta) = S^{(l)}_{n,0}(t; \beta) + p\, S^{(l)}_{n,1}(t; \beta).
\tag{3.2}
\]
Consider
\[
\Lambda_n(t; \theta) = \Lambda_n(t; p, \beta) = \int_{[0, t]} \frac{N_{n,0}(ds)}{S^{(0)}_{n,0}(s; \beta) + p\, S^{(0)}_{n,1}(s; \beta)} = \int_{[0, t]} \frac{N_{n,0}(ds)}{E^{(0)}_n(s; \theta)}
\]
as the empirical version of the cumulative hazard function $\Lambda(t)$, as defined in (2.6).

Using these empirical quantities, and recalling that $\mathbb{P}(T = C) = 0$, we can write the following approximation of the criterion defined in (3.1):
\[
\prod_{i=1}^n \prod_{t \in [0, \tau]} \big[ \exp(\beta^\top Z_i) \Lambda_n(dt; \theta) \big]^{N_{0i}(dt)} \Big[ 1 - \exp\Big( - \int_{[0, X_i]} \exp(\beta^\top Z_i) \Lambda_n(ds; \theta) \Big) \Big]^{N_{2i}(dt)} \times \exp\Big( - \int_{[0, \tau]} \{ S^{(0)}_{n,0}(t; \beta) + S^{(0)}_{n,1}(t; \beta) \} \Lambda_n(dt; \theta) \Big),
\]
where $\tau \in (0, \infty)$ is some threshold that prevents division by zero; it will be specified below. Hence, let us define the approximate log-likelihood function
\[
\begin{aligned}
\ell_n(p, \beta; \tau) ={}& \frac{1}{n} \sum_{i=1}^n D^\tau_{0i} \Big( \beta^\top Z_i - \log\big( E^{(0)}_n(X_i; p, \beta) \big) \Big)\\
&+ \frac{1}{n} \sum_{i=1}^n D^\tau_{2i} \log\Big( 1 - \exp\Big( - \int_{[0, X_i]} \frac{\exp(\beta^\top Z_i)}{E^{(0)}_n(s; p, \beta)} N_{n,0}(ds) \Big) \Big)\\
&- \int_{[0, \tau]} \frac{E^{(0)}_n(t; 1, \beta)}{E^{(0)}_n(t; p, \beta)} N_{n,0}(dt),
\end{aligned}
\]
where $D^\tau_{ki} = \mathbf{1}\{X_i \le \tau, A_i = k\}$, $k \in \{0, 2\}$. The regression parameter $\beta_0$ is then estimated by
\[
\widehat{\beta} = \arg\max_{\beta \in B} \ell_n(\widehat{p}, \beta; \tau),
\]
where $B \subset \mathbb{R}^q$ is a set of parameters and $\tau$ is fixed by the statistician. For theoretical results, one needs conditions that control small values of $H_0([\tau, \infty)) + p H_1([\tau, \infty))$. This is a technical condition that is usually ignored in practice, where one would simply take $\tau$ equal to the largest uncensored observation. Next, the cumulative hazard function is estimated by
\[
\widehat{\Lambda}(t) = \Lambda_n(t; \widehat{p}, \widehat{\beta})
\]
and the conditional survival function of the lifetime of interest is estimated by
\[
\widehat{S}_T(t \mid z) = \prod_{s \in (0, t]} \Big( 1 - \exp(\widehat{\beta}^\top z)\, \widehat{\Lambda}(ds) \Big), \qquad t < \tau.
\]
The conditional cure probability $\mathbb{P}(T = \infty \mid Z = z)$ is then estimated by
\[
\widehat{S}_T(\infty \mid z) = \widehat{S}_T(\tau \mid z) = \prod_{s \in (0, \tau]} \Big( 1 - \exp(\widehat{\beta}^\top z)\, \widehat{\Lambda}(ds) \Big).
\]
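The estimation steps for the right-censored and current status model (the estimator of $p$, the empirical quantity $E^{(0)}_n$, the step function $\Lambda_n$ and the approximate log-likelihood $\ell_n$) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `p_hat`, `E0`, `Lambda_n` and `loglik` are hypothetical, the covariates are assumed to be stored as an `(n, q)` NumPy array, and no special handling is provided for observations with $A_i = 2$ that precede every uncensored observation (for those the second term of $\ell_n$ is $-\infty$).

```python
import numpy as np

def p_hat(a):
    """Estimate p by #{A_i = 0} / #{A_i != 1}, the empirical version of (2.3)."""
    a = np.asarray(a)
    return np.sum(a == 0) / np.sum(a != 1)

def E0(t, x, a, z, p, beta):
    """E_n^(0)(t; p, beta) = S_{n,0}^(0)(t; beta) + p * S_{n,1}^(0)(t; beta)."""
    w = np.exp(z @ beta) * ((a == 0) + p * (a == 1))      # per-unit weight
    at_risk = (x[None, :] >= np.atleast_1d(t)[:, None]).astype(float)
    return at_risk @ w / len(x)

def Lambda_n(t, x, a, z, p, beta):
    """Lambda_n(t; p, beta): jumps (1/n) / E_n^(0)(X_j) at uncensored X_j <= t."""
    unc = x[a == 0]
    jumps = 1.0 / (len(x) * E0(unc, x, a, z, p, beta))
    return (unc[None, :] <= np.atleast_1d(t)[:, None]).astype(float) @ jumps

def loglik(p, beta, x, a, z, tau):
    """Approximate log-likelihood ell_n(p, beta; tau) of the first model."""
    n = len(x)
    lin = z @ beta
    d0 = (a == 0) & (x <= tau)                            # D^tau_{0i}
    d2 = (a == 2) & (x <= tau)                            # D^tau_{2i}
    e_p = E0(x[d0], x, a, z, p, beta)
    term1 = np.sum(lin[d0] - np.log(e_p)) / n
    cum = np.exp(lin[d2]) * Lambda_n(x[d2], x, a, z, p, beta)
    term2 = np.sum(np.log1p(-np.exp(-cum))) / n
    term3 = np.sum(E0(x[d0], x, a, z, 1.0, beta) / e_p) / n
    return term1 + term2 - term3

# Toy data (assumed values, for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0])
a = np.array([0, 1, 0, 2])
z = np.zeros((4, 1))
beta = np.zeros(1)
value = loglik(p_hat(a), beta, x, a, z, tau=x[a == 0].max())
```

Under these assumptions, $\widehat{\beta}$ could be obtained by passing the negative of `loglik(p_hat(a), beta, ...)`, viewed as a function of `beta`, to a generic numerical optimizer; the choice of optimizer is left open here.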
In the case of the model for left-censored and current status data the estimate of $p$ is
\[
\widehat{p} = \frac{\sum_{i=1}^n \mathbf{1}\{A_i = 0\}}{\sum_{i=1}^n \mathbf{1}\{A_i \ne 2\}}.
\]
Next, using the same notation as above, let us define
\[
F^{(l)}_{n,k}(t; \beta) = \frac{1}{n} \sum_{i=1}^n \exp(\beta^\top Z_i)\, Z_i^{\otimes l}\, \mathbf{1}\{X_i \le t, A_i = k\}, \qquad l = 0, 1,\ k \in \{0, 1, 2\}.
\]
Let us denote
\[
L^{(l)}_n(t; p, \beta) = F^{(l)}_{n,0}(t; \beta) + p\, F^{(l)}_{n,2}(t; \beta)
\]
and, for any $t$ such that $L^{(0)}_n(t; p, \beta) > 0$, consider
\[
R_n(dt; p, \beta) = \frac{N_{n,0}(dt)}{L^{(0)}_n(t; p, \beta)}.
\]
Let us fix some (small) value $\varrho$ such that $H_0([0, \varrho]) + H_2([0, \varrho]) > 0$. The approximate log-likelihood is
\[
\begin{aligned}
\ell_n(p, \beta; \varrho) ={}& \frac{1}{n} \sum_{i=1}^n D^\varrho_{0i} \Big( \beta^\top Z_i - \log\big( L^{(0)}_n(X_i; p, \beta) \big) \Big)\\
&+ \frac{1}{n} \sum_{i=1}^n D^\varrho_{1i} \log\Big( 1 - \exp\Big( - \int_{[X_i, \infty)} \frac{\exp(\beta^\top Z_i)}{L^{(0)}_n(s; p, \beta)} N_{n,0}(ds) \Big) \Big)\\
&- \int_{[\varrho, \infty)} \frac{L^{(0)}_n(t; 1, \beta)}{L^{(0)}_n(t; p, \beta)} N_{n,0}(dt),
\end{aligned}
\]
where $D^\varrho_{ki} = \mathbf{1}\{X_i \ge \varrho, A_i = k\}$, $k \in \{0, 1\}$. The regression parameter $\beta$ is then estimated by
\[
\widehat{\beta} = \arg\max_{\beta \in B} \ell_n(\widehat{p}, \beta; \varrho),
\]
where $B \subset \mathbb{R}^q$ is a set of parameters and $\varrho$ is fixed by the statistician. As in the previous model, imposing a bound $\varrho$, here a lower one, is a technical condition usually ignored in applications. Next, the conditional distribution function of the lifetime of interest is estimated by
\[
\widehat{F}_T(t \mid z) = \prod_{s \in (t, \infty)} \Big( 1 - \exp(\widehat{\beta}^\top z)\, R_n(ds; \widehat{p}, \widehat{\beta}) \Big), \qquad t \ge \varrho.
\]
The zero-lifetime conditional probability $\mathbb{P}(T = 0 \mid Z = z)$ is then estimated by
\[
\widehat{F}_T(0 \mid z) = \widehat{F}_T(\varrho \mid z)
\]
and the baseline cumulative reverse hazard is estimated by $\widehat{R}(t) = \int_{(t, \infty)} R_n(ds; \widehat{p}, \widehat{\beta})$.

For the asymptotic results we only investigate the right-censored and current status data case. For the left-censored and current status data case the results are similar and could be obtained after obvious modifications.

Let $P$ be the probability distribution of $(X, A, Z)$ and for any integrable function $f$ let $Pf = E[f(X, A, Z)]$. Let
\[
\mathbb{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{(X_i, A_i, Z_i)}
\]
be the empirical measure and $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$. Let us introduce the following additional assumptions.
A2: The vector of covariates $Z$ lies in $\mathbb{R}^q$, with $q \ge 1$, and $\|Z\| \le c$ a.s. for some constant $c$. Moreover, $\beta_0$ is an interior point of the parameter set $B$, which is a compact subset of $\mathbb{R}^q$, and $p_0 \in [\epsilon, 1 - \epsilon] \subset (0, 1)$.

A3: The value $\tau > 0$ is such that $H_0([\tau, \infty)) + H_1([\tau, \infty)) > 0$.

For simplicity we rule out the case $p_0 = 1$ because in this case $\mathbb{P}(A = 2) = 0$ and $\widehat{p} = 1$ a.s., that is, we are exactly in the classical PH model under right-censoring. Since $p_0$ is strictly positive, Assumption A3) is equivalent to $H_0([\tau, \infty)) + p_0 H_1([\tau, \infty)) > 0$. Also for simplicity, in the sequel we assume that the lifetime of interest $T$ and the censoring time $C$ are almost surely different. Let us notice that the construction we propose in sections 2.1 and 2.2 adapts to the case where $q$ depends on the sample size, or to the case where $\mathcal{Z}$ is an infinite-dimensional space. The study of the properties of the estimators defined in such cases is left for future work.

Theorem 4.1 (Consistency). Let $\widehat{\theta} = (\widehat{p}, \widehat{\beta}^\top)^\top$. Assume $\mathbb{P}(T = C) = 0$ and that Assumptions A1–A3 hold true. Then:
1. $\widehat{\theta} \to \theta_0$ in probability;
2. $\sup_{t \in [0, \tau]} \big| \widehat{\Lambda}(t) - \Lambda_0(t) \big| \to 0$ in probability.

Theorem 4.2 (I.i.d. representation). Under the assumptions of Theorem 4.1 we have:
\[
\sqrt{n} \begin{pmatrix} \widehat{p} - p_0 \\ \widehat{\beta} - \beta_0 \\ \widehat{\Lambda}(t) - \Lambda_0(t) \end{pmatrix} = \mathbb{G}_n \tilde{\ell}_{t; p_0, \beta_0, \Lambda_0} + R_n(t), \qquad t \in [0, \tau],
\]
where $\tilde{\ell}_{s; p_0, \beta_0, \Lambda_0}$ is some square integrable function and $R_n(t)$ is a remainder term that is uniformly negligible, that is, $\sup_{t \in [0, \tau]} | R_n(t) | = o_P(1)$.

Corollary 4.3 (CLT). Under the assumptions of Theorem 4.2,
\[
\sqrt{n} \begin{pmatrix} \widehat{p} - p_0 \\ \widehat{\beta} - \beta_0 \\ \widehat{\Lambda}(\cdot) - \Lambda_0(\cdot) \end{pmatrix} \rightsquigarrow \mathbb{G} \quad \text{in } \mathbb{R}^{q+1} \times \ell^\infty([0, \tau]),
\]
where $\mathbb{G}$ is a tight, zero-mean Gaussian process with covariance function
\[
\rho_{\mathbb{G}}(s, t) = P\, \tilde{\ell}_{s; p_0, \beta_0, \Lambda_0}\, \tilde{\ell}^\top_{t; p_0, \beta_0, \Lambda_0}, \qquad 0 \le s, t \le \tau.
\]

We can also derive the asymptotic law of the estimator of the survivor function $S_T(t \mid z)$ for an arbitrary value $z$ in the support of the covariates. The following result is a straightforward extension of classical results for the Cox PH model, see Link (1984).

Corollary 4.4 (CLT for the conditional survivor). Under the assumptions of Theorem 4.2 and for any fixed $z \in \mathcal{Z}$,
\[
\sqrt{n} \Big( \widehat{S}_T(\cdot \mid z) - S_T(\cdot \mid z) \Big) \rightsquigarrow \mathbb{S}_z \quad \text{in } \ell^\infty([0, \tau]),
\]
where $\mathbb{S}_z$ is a tight, zero-mean Gaussian process.

Let us now investigate the estimator of the cure rate. Suppose that $H_0(\cdot)$ has a bounded support and let $\tau_H$ be its right endpoint. Assume that $H_1([\tau_H, \infty)) > 0$. Then in our model we necessarily have $\Lambda([0, \tau_H]) < \infty$ and $\inf_{\|z\| \le c} S_T(\tau_H \mid z) > 0$. Since one cannot identify the law of $T$ beyond the last uncensored observation, by a usual convention, $S_T(\infty \mid z) = S_T(\tau_H \mid z)$. These quantities can be estimated by $\widehat{S}_T(\tau \mid z)$. The following corollary is a direct consequence of Corollary 4.4.

Corollary 4.5 (CLT for the conditional cure rate). Suppose that the assumptions A1, A2 hold true. Moreover, suppose that $H_0(\cdot)$ has a bounded support with right endpoint $\tau_H < \infty$. Assume that $H_1([\tau_H, \infty)) > 0$. Then
\[
\sqrt{n} \Big( \widehat{S}_T(X_{(n)} \mid z) - S_T(\infty \mid z) \Big) \rightsquigarrow N(0, V(z)),
\]
where $X_{(n)}$ is the largest uncensored observation and $V(z) = E(\mathbb{S}_z^2(\tau_H))$ with $\mathbb{S}_z$ from Corollary 4.4.

The estimation of the covariance functions of the processes $\mathbb{G}$ and $\mathbb{S}_z$, and of the variance $V(z)$, is quite difficult. Therefore we propose an alternative route, based on the weighted bootstrap, for estimating the asymptotic law of our estimators. Let us consider $\tilde{\ell}_{\cdot; \widehat{p}, \widehat{\beta}, \widehat{\Lambda}}$, which is a uniformly consistent estimator of $\tilde{\ell}_{\cdot; p_0, \beta_0, \Lambda_0}$. Next, let us define
\[
\mathbb{G}'_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n (\xi_i - \bar{\xi})\, \delta_{(X_i, A_i, Z_i)},
\]
where $\xi_1, \ldots, \xi_n$ are i.i.d. random variables with zero mean and unit variance, for instance Gaussian, independent of the data.

Theorem 4.6 (Asymptotic law approximation). Under the assumptions of Theorem 4.2:
\[
\Big( \mathbb{G}_n \tilde{\ell}_{\cdot; p_0, \beta_0, \Lambda_0},\ \mathbb{G}'_n \tilde{\ell}_{\cdot; \widehat{p}, \widehat{\beta}, \widehat{\Lambda}} \Big) \rightsquigarrow (\mathbb{G}, \mathbb{G}') \quad \text{in } \big( \mathbb{R}^{q+1} \times \ell^\infty([0, \tau]) \big)^2,
\]
where $\mathbb{G}$ and $\mathbb{G}'$ are independent and identically distributed.
As a direct consequence of Theorem 4.6 one could obtain the validity of the bootstrap approximation of the asymptotic laws stated in Corollaries 4.3 to 4.5. The details are omitted.
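The multiplier process $\mathbb{G}'_n$ used in the weighted bootstrap can be sketched as follows. This is only an illustration under strong assumptions: the text above does not give the explicit form of the estimated influence function, so its values at the observations are taken here as a precomputed `(n, d)` array `psi` (a hypothetical input, filled with placeholder numbers in the usage lines), and the function name `multiplier_draws` is likewise made up for this sketch. Each draw computes $n^{-1/2} \sum_i (\xi_i - \bar{\xi})\, \psi_i$ with standard Gaussian multipliers.

```python
import numpy as np

def multiplier_draws(psi, n_boot=1000, seed=0):
    """Draws of G'_n psi = n^{-1/2} * sum_i (xi_i - xi_bar) * psi_i.

    psi : (n, d) array, row i holding the estimated influence function
          evaluated at observation i (assumed to be supplied by the user).
    Returns an (n_boot, d) array, one row per set of multipliers.
    """
    psi = np.asarray(psi, dtype=float)
    n = psi.shape[0]
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_boot, n))            # zero mean, unit variance
    xi_centered = xi - xi.mean(axis=1, keepdims=True)
    return xi_centered @ psi / np.sqrt(n)

# Placeholder influence values (random numbers, NOT derived from the model).
rng = np.random.default_rng(1)
psi_demo = rng.standard_normal((200, 3))
draws = multiplier_draws(psi_demo, n_boot=500)
# Empirical 95% quantile of the maximal absolute coordinate, the kind of
# critical value one would use for simultaneous confidence statements.
crit = np.quantile(np.abs(draws).max(axis=1), 0.95)
```

Centering the multipliers by their mean means that a constant column in `psi` contributes nothing to the draws, which mirrors the centering of $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$.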
References

[1] Braekers, R. & Grouwels, Y. (2015). A semi-parametric Cox's regression model for zero-inflated left-censored time to event data. Communications in Statistics – Theory and Methods, 1969–1988.
[2] Cox, D.R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. Ser. B, 187–220.
[3] Cox, D.R. (1975). Partial likelihood. Biometrika, 269–276.
[4] Fang, H.B., Li, G. & Sun, J. (2005). Maximum likelihood estimation in a semiparametric logistic/proportional-hazards mixture model. Scand. J. Statist., 59–75.
[5] Gill, R.D. (1994). Lectures on survival analysis. Lectures on probability theory: École d'été de probabilités de Saint-Flour XXII. Lecture Notes in Mathematics 1581. Springer.
[6] Gill, R.D. & Johansen, S. (1990). A survey of product-integration with a view toward application in survival analysis. Ann. Statist., 1501–1555.
[7] Huang, J. (1999). Asymptotic properties of nonparametric estimation based on partly interval-censored data. Statistica Sinica, 501–519.
[8] Kim, J.S. (2003). Maximum likelihood estimation for the proportional hazards models with partly interval-censored data. J. Royal Stat. Soc. B, 489–502.
[9] Kim, Y., Kim, B. & Jang, W. (2010). Asymptotic properties of the maximum likelihood estimator for the proportional hazards model with doubly censored data. J. Multivar. Anal., 1339–1351.
[10] Kosorok, M.D. (2008). Introduction to empirical processes and semiparametric inference. Springer Series in Statistics, Springer: New York.
[11] Link, C.L. (1984). Confidence intervals for the survival function using Cox's proportional-hazard model with covariates. Biometrics, 601–609.
[12] Murphy, S.A. & van der Vaart, A.W. (2000). On profile likelihood. J. Amer. Statist. Assoc., 449–465.
[13] Patilea, V. & Rolin, J.-M. (2006a). Product-limit estimators of the survival function for two modified forms of current-status data. Bernoulli, 801–819.
[14] Patilea, V. & Rolin, J.-M. (2006b). Product-limit estimators of the survival function with twice censored data. Ann. Statist., 925–938.
[15] Turnbull, B.W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Assoc., 169–173.
[16] van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
[17] van der Vaart, A.W. & Wellner, J.A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York.
[18] van der Vaart, A.W. & Wellner, J.A. (2007). Empirical processes indexed by estimated functions. In Asymptotics: Particles, Processes and Inverse Problems, IMS Lecture Notes Monograph Series, Vol. 55, 234–252.
[19] Zheng, D., Yin, G. & Ibrahim, J.G. (2006). Semiparametric transformation models for survival data with a cure fraction. J. Amer. Statist. Assoc., 670–684.
Appendix
For any matrix A , we denote by k A k = p T race ( A ⊤ A ) . Let us recall that vectors areconsidered as column matrices. The spaces of functions we consider are endowed withthe uniform (supremum) norm that is denoted by k · k ∞ . Let ∂ p and ∂ β denote the partialderivation operators with respect to p and β, respectively.Let s ( l ) k ( t ; β ) = E n S ( l ) n,k ( t ; β ) o = E (cid:8) exp( β ⊤ Z ) Z ⊗ l ( X ≥ t, A = k ) (cid:9) , and e ( l ) ( t ; θ ) = e ( l ) ( t ; p, β ) = E (cid:8) E ( l ) n ( t ; θ ) (cid:9) = s ( l )0 ( t ; β ) + ps ( l )1 ( t ; β ) , l = 0 , , k ∈ { , , } . Let ℓ ( p, β ; τ ) = E [ β ⊤ Z 1 ( X ≤ τ, A = 0)] − Z [0 ,τ ] log (cid:0) e (0) ( t ; p, β ) (cid:1) H ( dt )+ E (cid:2) log (cid:0) − exp (cid:0) − exp( β ⊤ Z )Λ( X ; p, β ) (cid:1)(cid:1) ( X ≤ τ, A = 2) (cid:3) − Z [0 ,τ ] e (0) ( t ; 1 , β ) e (0) ( t ; p, β ) H ( dt ) . The criterion ℓ ( p, β ; τ ) is expected to be the limit of the approximated log-likelihoodfunction ℓ n ( p, β ; τ ). Let us recall that P denotes the probability distribution of ( X, A, Z )and for any integrable function f let P f = E [ f ( X, A, Z )] . Moreover, P n = 1 n n X i =1 δ ( X i ,A i , Z i ) is the empirical measure, and G n = √ n ( P n − P ) . Finally, define δ k ( a ) = ( a = k ) , k ∈ { , , } . To prove consistency for b β , it suffices, for instance, to use the results from section 5.2 ofvan der Vaart (1998). This means to check that ℓ ( p , β ; τ ) > ℓ ( p, β ; τ ) , ∀ ( p, β ⊤ ) ⊤ ∈ [ ǫ, − ǫ ] × B, ( p, β ⊤ ) ⊤ = ( p , β ⊤ ) ⊤ , (5.1)sup p ∈ [ ǫ, − ǫ ] sup β ∈ B | ℓ n ( p, β ; τ ) − ℓ ( p, β ; τ ) | = o P (1) , (5.2)and the map ( p, β ⊤ ) ⊤ ℓ ( p, β ; τ ) is continuous. The continuity condition is a directconsequence of the Lebesgue’s Dominated Convergence Theorem. Conditions (5.1) and(5.2) will be consequence of the two following lemmas.15 emma 5.1. 
Under the conditions of Theorem 4.1, the true value of the parameter θ =( p, β ⊤ ) ⊤ is identifiable, that is the condition (5.1) holds.Proof of Lemma 5.1. Consider the conditional log-likelihood of the multinomial variable A ∈ { , , } given Z = z log p ( t, A, z ; p, β ) = δ ( A ) exp( β ⊤ z ) { H ([ t, ∞ ) | z ) + pH ([ t, ∞ ) | z ) } Λ( dt )+ δ ( A ) exp (cid:0) − exp( β ⊤ z )Λ([0 , t ]) (cid:1) F C ( dt | z )+ δ ( A ) (cid:8) − exp (cid:0) − exp( β ⊤ z )Λ([0 , t ]) (cid:1)(cid:9) F C ( dt | z ) . Following the notation of Gill (1994), here we treat dt not just as the length of a smallinterval [ t, t + dt ) but also as the name of the interval itself. Note thatlog p ( t, A, z ; p , β ) = δ ( A ) H ( dt | z ) + δ ( A ) H ( dt | z ) + δ ( A ) H ( dt | z ) . By the standard log-likelihood ratio inequality, for any t and z , we have E (cid:20) log p ( t, A, z ; p, β ) p ( t, A, z ; p , β ) (cid:12)(cid:12)(cid:12)(cid:12) X ∈ dt, Z = z (cid:21) ≤ . Integrating with respect to X ∈ [0 , τ ] and Z , we obtain E (cid:20) log p ( X, A, Z ; p, β ) p ( X, A, Z ; p , β ) ( X ∈ [0 , τ ]) (cid:21) ≤ . If the last inequality becomes equality, then necessarily p ( t, , z ; p, β ) = p ( t, , z ; p , β ) foralmost all t ∈ [0 , τ ] in the support of X and z in Z . With our assumptions, this cannothappen when ( p, β ⊤ ) ⊤ = ( p , β ⊤ ) ⊤ . Lemma 5.2.
Under the conditions of Theorem 4.1, condition (5.2) holds. Moreover,
\[
\sup_{p \in [\epsilon, 1-\epsilon]}\ \sup_{\beta \in B}\ \sup_{t \in [0,\tau]} \left| \Lambda_n(t;p,\beta) - \Lambda(t;p,\beta) \right| = o_P(1).
\]
Proof of Lemma 5.2.
First, note that
\[
\sup_{p \in [\epsilon, 1-\epsilon]}\ \sup_{\beta \in B}\ \sup_{t \in [0,\tau]} \left\| E^{(l)}(t;p,\beta) - e^{(l)}(t;p,\beta) \right\| = o_P(1), \qquad l = 0, 1. \tag{5.3}
\]
This is a consequence of the uniform law of large numbers for the classes of functions
\[
\left\{ (x,a,z) \mapsto \exp(\beta^\top z)\, z^{\otimes l}\, \mathbf{1}(x \ge t)\, \delta_k(a) : \beta \in B,\ t \in [0,\tau],\ k \in \{0,1,2\} \right\}, \qquad l = 0, 1.
\]
These two classes of functions are bounded and have polynomial complexity, that is, they are VC classes. In particular, they are Glivenko-Cantelli classes. Next, by our assumptions, for any $p$, $\beta$ and $l$, the map $t \mapsto e^{(l)}(t;p,\beta)$ is decreasing. Moreover,
\[
\inf_{l \in \{0,1\}}\ \inf_{p \in [\epsilon, 1-\epsilon]}\ \inf_{\beta \in B} e^{(l)}(\tau;p,\beta) > 0. \tag{5.4}
\]
Recall that
\[
\Lambda(t;p,\beta) = \int_{[0,t]} \frac{H_0(ds)}{e^{(0)}(s;p,\beta)}.
\]
Finally, we can write
\[
\begin{aligned}
\Lambda_n(t;p,\beta) - \Lambda(t;p,\beta) &= \int_{[0,t]} \frac{\delta_0(a)}{E^{(0)}(s;p,\beta)}\, d\mathbb{P}_n(s,a,z) - \int_{[0,t]} \frac{\delta_0(a)}{e^{(0)}(s;p,\beta)}\, dP(s,a,z) \\
&= \int_{[0,t]} \left[ \frac{\delta_0(a)}{E^{(0)}(s;p,\beta)} - \frac{\delta_0(a)}{e^{(0)}(s;p,\beta)} \right] d\mathbb{P}_n(s,a,z) + \int_{[0,t]} \frac{\delta_0(a)}{e^{(0)}(s;p,\beta)}\, d(\mathbb{P}_n - P)(s,a,z),
\end{aligned}
\]
and the result follows from (5.3), (5.4) and again the uniform law of large numbers.

To justify Theorem 4.1, it suffices to notice that Lemma 5.2 and the uniform law of large numbers guarantee condition (5.2), and to use Theorem 5.7 from van der Vaart (1998).

In this section we sketch the arguments that allow us to prove Theorem 4.2 and Corollaries 4.3 to 4.5.

Note that
\[
\partial_\beta \Lambda(t;\theta_0) = - \int_{[0,t]} \frac{e^{(1)}(s;\theta_0)}{\left[ e^{(0)}(s;\theta_0) \right]^2}\, H_0(ds) = - \int_{[0,t]} \frac{e^{(1)}(s;\theta_0)}{e^{(0)}(s;\theta_0)}\, \Lambda(ds;\theta_0), \qquad t \in [0,\tau],
\]
with $\Lambda(\cdot\,;\theta_0) = \Lambda(\cdot\,;p_0,\beta_0)$ defined in (2.6). Next, define
\[
\wp_n(t;\theta) = \frac{\partial \Lambda_n(t;\theta)}{\partial p} = \frac{\partial \Lambda_n(t;p,\beta)}{\partial p} \qquad \text{and} \qquad \wp(t) = \frac{\partial \Lambda(t,\theta_0)}{\partial p} = \frac{\partial \Lambda(t;p_0,\beta_0)}{\partial p}.
\]
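To make the plug-in quantity $\Lambda_n(t;p,\beta)$ concrete, the following is a minimal Python sketch of the integral representation used in the proof of Lemma 5.2. It rests on assumptions that are not stated in this section: $E^{(0)}(s;p,\beta)$ is taken to be the empirical mean of $\exp(\beta^\top Z_j)\,\mathbf{1}(X_j \ge s)\,\{\delta_0(A_j) + p\,\delta_1(A_j)\}$, the covariate is scalar ($q = 1$), and the function name `cumulative_hazard_estimate` is hypothetical.

```python
import math


def cumulative_hazard_estimate(t, X, A, Z, p, beta):
    """Sketch of Lambda_n(t; p, beta) for a scalar covariate.

    Assumption (not taken from the paper's code): E^(0)(s; p, beta) is the
    empirical mean of exp(beta * Z_j) * 1(X_j >= s) * (delta_0 + p * delta_1)(A_j).
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # Only exactly observed lifetimes (A_i = 0) with X_i <= t contribute.
        if A[i] == 0 and X[i] <= t:
            e0 = sum(
                math.exp(beta * Z[j]) * ((A[j] == 0) + p * (A[j] == 1))
                for j in range(n)
                if X[j] >= X[i]
            ) / n
            # e0 > 0 because observation i itself belongs to its own risk set.
            total += 1.0 / e0
    return total / n
```

With $\beta = 0$ and $p = 1$ the weights reduce to the usual at-risk counts, so the sketch returns the Nelson-Aalen estimator computed from the observations with $A \in \{0, 1\}$; observations with $A = 2$ never enter the sum.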
Consider the score function
\[
\begin{aligned}
U_n(\theta;\tau) = U_n(p,\beta;\tau) ={}& \partial_\beta \ell_n(p,\beta;\tau) \\
={}& \frac{1}{n} \sum_{i=1}^n \mathbf{1}(X_i \in [0,\tau])\, \delta_0(A_i) \left( Z_i - \frac{E^{(1)}(X_i;p,\beta)}{E^{(0)}(X_i;p,\beta)} \right) \\
&- \frac{1}{n} \sum_{i=1}^n \mathbf{1}(X_i \in [0,\tau])\, \delta_0(A_i) \times \frac{E^{(1)}(X_i;1,\beta)\, E^{(0)}(X_i;p,\beta) - E^{(1)}(X_i;p,\beta)\, E^{(0)}(X_i;1,\beta)}{\left[ E^{(0)}(X_i;p,\beta) \right]^2} \\
&+ \frac{1}{n} \sum_{i=1}^n \mathbf{1}(X_i \in [0,\tau])\, \delta_2(A_i)\, \frac{\exp(-V_i(\theta))}{1 - \exp(-V_i(\theta))}\, W_i(\theta),
\end{aligned}
\]
where
\[
V_i(\theta) = \exp(\beta^\top Z_i) \sum_{j=1}^n \frac{\mathbf{1}(X_j \le X_i)}{E^{(0)}(X_j;\beta,p)}\, \delta_0(A_j)
\]
and
\[
W_i(\theta) = \exp(\beta^\top Z_i) \sum_{j=1}^n \mathbf{1}(X_j \le X_i)\, \delta_0(A_j)\, \frac{Z_i\, E^{(0)}(X_j;\beta,p) - E^{(1)}(X_j;\beta,p)}{\left[ E^{(0)}(X_j;\beta,p) \right]^2}.
\]
Since we imposed $P(T = C) = 0$, we can equivalently define $V_i(\theta)$ with $\mathbf{1}(X_j \le X_i)$ instead of $\mathbf{1}(X_j < X_i)$, the latter being what the definition of the approximate log-likelihood $\ell_n$ would require. Let us also consider
\[
U(\theta;\tau) = \partial_\beta \ell(p,\beta;\tau),
\]
the limit of this score function. The following lemma is a simple consequence of the uniform law of large numbers and the convergence in probability of $U$-statistics, and hence the proof is omitted.

Lemma 5.3.
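Under Assumptions A1–A3, and if $\theta_n = (p_n, \beta_n^\top)^\top \to \theta_0 = (p_0, \beta_0^\top)^\top$ in probability, then

1. $\| \partial_\beta U_n(\theta_n,\tau) - \partial_\beta U(\theta_0,\tau) \| = o_P(1)$; $\| \partial_p U_n(\theta_n,\tau) - \partial_p U(\theta_0,\tau) \| = o_P(1)$.

2. $\sup_{t \in [0,\tau]} \| \partial_\beta \Lambda_n(t;\theta_n) - \partial_\beta \Lambda(t;\theta_0) \| = o_P(1)$; $\sup_{t \in [0,\tau]} | \wp_n(t;\theta_n) - \wp(t) | = o_P(1)$.

Let us sketch the arguments of the proof of Theorem 4.2. By the definition of $\hat\beta$ and the first-order Taylor expansion of $U_n(\theta;\tau)$ in a neighborhood of $\theta_0$,
\[
\sqrt{n}\, U_n(\hat\theta,\tau) = 0 = \sqrt{n}\, U_n(\theta_0,\tau) + \partial_\beta U_n(\theta_n^*,\tau)\, \sqrt{n}(\hat\beta - \beta_0) + \partial_p U_n(\theta_n^*,\tau)\, \sqrt{n}(\hat p - p_0),
\]
where $\theta_n^*$ is a point between $\hat\theta$ and $\theta_0$. By Lemma 5.3, if the $q \times q$ matrix $\partial_\beta U(\theta_0,\tau)$ is invertible,
\[
\sqrt{n}(\hat\beta - \beta_0) = - \partial_\beta U(\theta_0,\tau)^{-1} \sqrt{n}\, U_n(\theta_0,\tau) - \partial_\beta U(\theta_0,\tau)^{-1}\, \partial_p U(\theta_0,\tau)\, \sqrt{n}(\hat p - p_0) + o_P(1).
\]
Hence, the asymptotic normality of $\sqrt{n}(\hat\beta - \beta_0)$ will follow from the joint asymptotic normality of $\sqrt{n}\, U_n(\theta_0,\tau)$ and $\sqrt{n}(\hat p - p_0)$. On the other hand, by a Taylor expansion and Lemma 5.3, for some $\theta_n^\dagger$ between $\hat\theta$ and $\theta_0$, we can write
\[
\begin{aligned}
\sqrt{n}\left( \Lambda_n(t;\hat\theta) - \Lambda(t;\theta_0) \right) ={}& \sqrt{n}\left( \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0) \right) \\
&+ \partial_\beta \Lambda_n(t;\theta_n^\dagger)^\top \sqrt{n}(\hat\beta - \beta_0) + \wp_n(t,\theta_n^\dagger)\, \sqrt{n}(\hat p - p_0) \\
={}& \sqrt{n}\left( \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0) \right) \\
&+ \partial_\beta \Lambda(t;\theta_0)^\top \sqrt{n}(\hat\beta - \beta_0) + \wp(t,\theta_0)\, \sqrt{n}(\hat p - p_0) + o_P(1).
\end{aligned}
\]
Hence, the asymptotic normality of $\sqrt{n}\left( \hat\Lambda(t) - \Lambda_0(t) \right)$ will follow from the joint asymptotic normality of $\sqrt{n}\left( \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0) \right)$, $\sqrt{n}\, U_n(\theta_0,\tau)$ and $\sqrt{n}(\hat p - p_0)$. Gathering facts, we have
\[
\sqrt{n} \begin{pmatrix} \hat p - p_0 \\ \hat\beta - \beta_0 \\ \hat\Lambda(t) - \Lambda_0(t) \end{pmatrix} = \Sigma(t)\, \sqrt{n} \begin{pmatrix} U_n(\theta_0,\tau) \\ \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0) \\ \hat p - p_0 \end{pmatrix} + o_P(1),
\]
where
\[
\Sigma(t) = \begin{pmatrix}
0 & 0 & 1 \\
- \partial_\beta U(\theta_0,\tau)^{-1} & 0 & - \partial_\beta U(\theta_0,\tau)^{-1}\, \partial_p U(\theta_0,\tau) \\
- \partial_\beta \Lambda(t;\theta_0)^\top \partial_\beta U(\theta_0,\tau)^{-1} & 1 & \wp(t,\theta_0) - \partial_\beta \Lambda(t;\theta_0)^\top \partial_\beta U(\theta_0,\tau)^{-1}\, \partial_p U(\theta_0,\tau)
\end{pmatrix}.
\]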
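The third row of $\Sigma(t)$ can be checked by substituting the expansion of $\sqrt{n}(\hat\beta - \beta_0)$ into the expansion of the cumulative hazard estimator. The worked step below is only a sketch built from the two expansions of this proof; the shorthands $D = \partial_\beta U(\theta_0,\tau)$, $d_p = \partial_p U(\theta_0,\tau)$ and $\lambda_\beta(t) = \partial_\beta \Lambda(t;\theta_0)$ are introduced here for readability and are not notation from the paper, and $\hat\Lambda(t) = \Lambda_n(t;\hat\theta)$, $\Lambda_0(t) = \Lambda(t;\theta_0)$ are assumed.

```latex
\begin{align*}
\sqrt{n}\bigl(\hat\Lambda(t) - \Lambda_0(t)\bigr)
  &= \sqrt{n}\bigl(\Lambda_n(t;\theta_0) - \Lambda(t;\theta_0)\bigr)
     + \lambda_\beta(t)^\top \sqrt{n}(\hat\beta - \beta_0)
     + \wp(t,\theta_0)\,\sqrt{n}(\hat p - p_0) + o_P(1) \\
  &= \sqrt{n}\bigl(\Lambda_n(t;\theta_0) - \Lambda(t;\theta_0)\bigr)
     + \lambda_\beta(t)^\top \bigl\{ -D^{-1}\sqrt{n}\,U_n(\theta_0,\tau)
     - D^{-1} d_p \,\sqrt{n}(\hat p - p_0) \bigr\}
     + \wp(t,\theta_0)\,\sqrt{n}(\hat p - p_0) + o_P(1) \\
  &= -\lambda_\beta(t)^\top D^{-1}\,\sqrt{n}\,U_n(\theta_0,\tau)
     + 1 \cdot \sqrt{n}\bigl(\Lambda_n(t;\theta_0) - \Lambda(t;\theta_0)\bigr)
     + \bigl\{ \wp(t,\theta_0) - \lambda_\beta(t)^\top D^{-1} d_p \bigr\}
       \sqrt{n}(\hat p - p_0) + o_P(1).
\end{align*}
```

Reading off the three coefficients gives the entries $-\lambda_\beta(t)^\top D^{-1}$, $1$ and $\wp(t,\theta_0) - \lambda_\beta(t)^\top D^{-1} d_p$; the first row $(0, 0, 1)$ is the trivial identity for $\hat p - p_0$ and the second row restates the expansion of $\hat\beta - \beta_0$.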
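Hence, it suffices to study the asymptotic behavior of the $(q+2)$-dimensional vector
\[
\sqrt{n}\left( U_n(\theta_0,\tau)^\top,\ \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0),\ \hat p - p_0 \right)^\top.
\]

$\hat p$

It is clear that the class of 0/1-valued functions $\delta_k(\cdot)$ defined on $\{0,1,2\}$ and indexed by $k \in \{0,1,2\}$ is $P$-Donsker. We have
\[
\hat p = \frac{\mathbb{P}_n \delta_0}{\mathbb{P}_n(\delta_0 + \delta_2)}.
\]
Using the first-order Taylor expansion for $f(x_1,x_2) = x_1/(x_1 + x_2)$ with $x_1, x_2 \ge c$ for $c$ some small positive constant, we easily derive the representation
\[
\begin{aligned}
\sqrt{n}(\hat p - p_0) &= \sqrt{n}\left[ f(\mathbb{P}_n \delta_0, \mathbb{P}_n \delta_2) - f(P\delta_0, P\delta_2) \right] \\
&= \frac{\partial f}{\partial x_1}(P\delta_0, P\delta_2)\, \mathbb{G}_n \delta_0 + \frac{\partial f}{\partial x_2}(P\delta_0, P\delta_2)\, \mathbb{G}_n \delta_2 + o_P(1) \\
&= \frac{P\delta_2}{(P\delta_0 + P\delta_2)^2}\, \mathbb{G}_n \delta_0 - \frac{P\delta_0}{(P\delta_0 + P\delta_2)^2}\, \mathbb{G}_n \delta_2 + o_P(1).
\end{aligned}
\]

$\Lambda_n(t;\theta_0)$

For any $t \ge 0$, $k, l \in \{0,1,2\}$, let us define
\[
(x,a,z) \mapsto f^{(k,l)}_t(x,a,z) = \exp(\beta_0^\top z)\, z^{\otimes l}\, \mathbf{1}(x \ge t)\, \delta_k(a), \qquad x \ge 0,\ a \in \{0,1,2\},\ z \in \mathcal{Z} \subset \mathbb{R}^q.
\]
Thus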
$P f^{(k,l)}_t = s^{(l)}_k(t;\beta_0)$. For any $k, l \in \{0,1,2\}$ consider the family of such functions
\[
\mathcal{F}^{(k,l)} = \left\{ f^{(k,l)}_t(\cdot,\cdot,\cdot) : t \in [0,\tau] \right\}.
\]
Each such family is clearly uniformly bounded and $P$-Donsker. Next, let
\[
e^{(0)}_t(x,a,z) = f^{(0,0)}_t(x,a,z) + p_0\, f^{(1,0)}_t(x,a,z).
\]
Then $P e^{(0)}_t = e^{(0)}(t;\theta_0)$ and we can rewrite
\[
\Lambda_n(t;\theta_0) = \mathbb{P}_n\left[ \frac{\delta_0(a)\, \mathbf{1}(x \le t)}{\mathbb{P}_n e^{(0)}_x} \right] \qquad \text{and} \qquad \Lambda(t;\theta_0) = P\left[ \frac{\delta_0(a)\, \mathbf{1}(x \le t)}{P e^{(0)}_x} \right].
\]
Hence, we can write
\[
\sqrt{n}\left( \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0) \right) = \sqrt{n}\left[ \int_{[0,t]} \frac{1}{A_n}\, dB_n - \int_{[0,t]} \frac{1}{A}\, dB \right]
\]
for the càdlàg functions
\[
A(t) = P e^{(0)}_t, \qquad B(t) = P\left[ \mathbf{1}(x \le t)\, \delta_0(a) \right] = p_0 \int_{[0,t]} A(s)\, \Lambda(ds;\theta_0), \qquad t \in [0,\tau],
\]
and $A_n$, $B_n$ their empirical versions obtained by replacing $P$ by $\mathbb{P}_n$.

Let $D[a,b]$ be the space of càdlàg functions on $[a,b]$ and let $BV_M[a,b]$ be the set of all functions $B \in D[a,b]$ with total variation $|B(0)| + \int_{(a,b]} |B(ds)| \le M$. Let
\[
\mathbb{D}_M = \{ A \in D[0,\tau] : A \ge \epsilon \} \times BV_M[0,\tau]
\]
for some positive constants $\epsilon$, $M$. For sufficiently small $\epsilon$ and sufficiently large $M$ (depending on $c$ from assumption A2 and on $H_0([\tau,\infty)) + H_1([\tau,\infty)) > 0$ from assumption A3), $A$, $B$ and, with probability tending to 1, $A_n$, $B_n$ defined above belong to $\mathbb{D}_M$. The $D[0,\tau]$-valued map $(A,B) \mapsto \int_{[0,\cdot]} (1/A)\, dB$ is Hadamard differentiable on the set $\mathbb{D}_M$ and the derivative map is given by
\[
(\alpha,\beta) \mapsto \int_{[0,\cdot]} (1/A)\, d\beta - \int_{[0,\cdot]} (\alpha/A^2)\, dB;
\]
see, for instance, Kosorok (2008), Section 12.2. The integral $\int_{[0,\cdot]} (1/A)\, d\beta$ is defined via integration by parts if $\beta$ is not of bounded variation. To derive the i.i.d. representation, let us use the Hadamard derivative with
\[
\alpha = \mathbb{G}_n e^{(0)}_{\cdot}, \qquad \beta = \mathbb{G}_n\left[ \delta_0(a)\, \mathbf{1}(x \le \cdot) \right].
\]
Recall that $e^{(0)}(s;\theta_0) = P e^{(0)}_s$.
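We deduce that for any $t \in [0,\tau]$,
\[
\begin{aligned}
\sqrt{n}\left( \Lambda_n(t;\theta_0) - \Lambda(t;\theta_0) \right) ={}& \mathbb{G}_n f^{(2)}_t - p_0 \int_{[0,t]} \left\{ \mathbb{G}_n f^{(0,0)}_s \right\} \frac{\Lambda(ds;\theta_0)}{e^{(0)}(s;\theta_0)} - p_0 \int_{[0,t]} \left\{ \mathbb{G}_n f^{(1,0)}_s \right\} \frac{\Lambda(ds;\theta_0)}{e^{(0)}(s;\theta_0)} \\
&+ r_n(t), \qquad \sup_{t \in [0,\tau]} | r_n(t) | = o_P(1),
\end{aligned}
\]
where
\[
f^{(2)}_t \in \mathcal{F}^{(2)} = \left\{ (x,a,z) \mapsto f^{(2)}_t(x,a,z) = \delta_0(a)\, \mathbf{1}(x \le t) \left[ p_0\, e^{(0)}(t;\theta_0) \right]^{-1} : t \in [0,\tau] \right\}.
\]
Clearly, $\mathcal{F}^{(2)}$ is a $P$-Donsker class of real-valued functions defined on $[0,\tau] \times \{0,1,2\} \times \mathcal{Z}$.

$\sqrt{n}\, U_n(\theta_0,\tau)$

We only present the guidelines that could be followed to deduce the asymptotic normality of $\sqrt{n}\, U_n(\theta_0,\tau)$. Consider the function $\phi : \mathbb{R} \times \mathbb{R} \times \mathbb{R}^q \times \mathbb{R}^q \to \mathbb{R}^q$ given by
\[
\phi(y_1,y_2,y_3,y_4) = - \frac{y_3 + p_0 y_4}{y_1 + p_0 y_2} - \frac{y_3 + y_4}{y_1 + p_0 y_2} + \frac{(y_1 + y_2)(y_3 + p_0 y_4)}{(y_1 + p_0 y_2)^2}.
\]
Let $\Xi$ be a set of $(3 + 3q)$-dimensional vector-valued functions of the observed variables
\[
\eta = (\eta_1, \eta_2, \eta_3^\top, \eta_4^\top, \eta_5, \eta_6^\top) : [0,\tau] \to \mathbb{R} \times \mathbb{R} \times \mathbb{R}^q \times \mathbb{R}^q \times \mathbb{R} \times \mathbb{R}^q,
\]
such that each component of $\eta$ is a monotone càdlàg function bounded in absolute value by some sufficiently large constant $M$. Moreover, we assume that the function
\[
\eta_0(x) = \left( P f^{(0,0)}_x,\ P f^{(1,0)}_x,\ (P f^{(0,1)}_x)^\top,\ (P f^{(1,1)}_x)^\top,\ \Lambda(x;\theta_0),\ (\partial_\beta \Lambda(x;\theta_0))^\top \right), \qquad x \in [0,\tau],
\]
belongs to $\Xi$, and, with probability tending to 1 as $n \to \infty$, the empirical version
\[
\eta_n(x) = \left( \mathbb{P}_n f^{(0,0)}_x,\ \mathbb{P}_n f^{(1,0)}_x,\ (\mathbb{P}_n f^{(0,1)}_x)^\top,\ (\mathbb{P}_n f^{(1,1)}_x)^\top,\ \Lambda_n(x;\theta_0),\ (\partial_\beta \Lambda_n(x;\theta_0))^\top \right), \qquad x \in [0,\tau],
\]
is also contained in $\Xi$. Let us define the family of functions
\[
\mathcal{H} = \{ (x,a,z) \mapsto h_\eta(x,a,z) : \eta \in \Xi \},
\]
where
\[
h_\eta(x,a,z) = \left[ \delta_0(a) \left\{ z + \phi(\eta_1(x), \eta_2(x), \eta_3(x), \eta_4(x)) \right\} + \delta_2(a)\, \frac{\exp(-\exp(\beta_0^\top z)\, \eta_5(x))}{1 - \exp(-\exp(\beta_0^\top z)\, \eta_5(x))} \left\{ z\, \eta_5(x) + \eta_6(x) \right\} \right] \mathbf{1}(x \le \tau).
\]
Next, the idea is to decompose
\[
\sqrt{n}\, U_n(\theta_0,\tau) = \mathbb{G}_n h_{\eta_0} + \sqrt{n}\, P h_{\eta_n} + \mathbb{G}_n( h_{\eta_n} - h_{\eta_0} ).
\]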
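The role of $\phi$ is to collect, as a function of $(s^{(0)}_0, s^{(0)}_1, s^{(1)}_0, s^{(1)}_1)$, the two $\delta_0$-weighted terms of the score $U_n$ other than $Z_i$ itself. The short Python check below illustrates that identity for a scalar covariate ($q = 1$); it is only a sketch, the function names `phi` and `score_terms` are hypothetical, and the inputs are arbitrary positive numbers rather than quantities estimated from data.

```python
def phi(y1, y2, y3, y4, p0):
    """phi(y1, y2, y3, y4) as displayed above, scalar case (q = 1)."""
    d = y1 + p0 * y2
    return -(y3 + p0 * y4) / d - (y3 + y4) / d + (y1 + y2) * (y3 + p0 * y4) / d ** 2


def score_terms(y1, y2, y3, y4, p0):
    """The same quantity written as in the score function U_n.

    e0_p, e1_p stand for e^(0)(.; p0, beta), e^(1)(.; p0, beta) and
    e0_1, e1_1 for e^(0)(.; 1, beta), e^(1)(.; 1, beta), using
    e^(l)(.; p, beta) = s_0^(l) + p * s_1^(l).
    """
    e0_p, e1_p = y1 + p0 * y2, y3 + p0 * y4
    e0_1, e1_1 = y1 + y2, y3 + y4
    first = -e1_p / e0_p
    second = -(e1_1 * e0_p - e1_p * e0_1) / e0_p ** 2
    return first + second


# Arbitrary illustrative inputs (not data from the paper).
gap = abs(phi(2.0, 1.0, 0.7, 0.1, 0.4) - score_terms(2.0, 1.0, 0.7, 0.1, 0.4))
```

Here `gap` should be zero up to floating-point rounding, which is the algebraic identity behind writing the $\delta_0$ part of $h_\eta$ as $z + \phi(\eta_1, \eta_2, \eta_3, \eta_4)$.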
By the continuity of the paths of the empirical process, see for instance Theorem 2.1 of van der Vaart & Wellner (2007),
\[
\mathbb{G}_n( h_{\eta_n} - h_{\eta_0} ) = o_P(1).
\]
The term $\mathbb{G}_n h_{\eta_0}$