Asymptotic theory for the semiparametric accelerated failure time model with missing data

Abstract

We consider a class of doubly weighted rank-based estimating methods for the transformation (or accelerated failure time) model with missing data as arise, for example, in case-cohort studies. The weights considered may not be predictable as required in a martingale stochastic process formulation. We treat the general problem as a semiparametric estimating equation problem and provide proofs of asymptotic properties for the weighted estimators, with either true weights or estimated weights, by using empirical process theory where martingale theory may fail. Simulations show that the outcome-dependent weighted method works well for finite samples in case-cohort studies and improves efficiency compared to methods based on predictable weights. Further, it is seen that the method is even more efficient when estimated weights are used, as is commonly the case in the missing data literature. The Gehan censored data Wilcoxon weights are found to be surprisingly efficient in a wide class of problems.


The Annals of Statistics

© Institute of Mathematical Statistics, 2009

ASYMPTOTIC THEORY FOR THE SEMIPARAMETRIC ACCELERATED FAILURE TIME MODEL WITH MISSING DATA

By Bin Nan, John D. Kalbfleisch and Menggang Yu

University of Michigan, University of Michigan and Indiana University


Received January 2008; revised July 2008. Supported in part by NSF Grant DMS-07-06700.

AMS 2000 subject classifications. Primary 62E20, 62N01; secondary 62D05.

Key words and phrases. Accelerated failure time model, case-cohort study, censored linear regression, Donsker class, empirical processes, Glivenko–Cantelli class, pseudo Z-estimator, nonpredictable weights, rank estimating equation, semiparametric method.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2009, Vol. 37, No. 5A, 2351–2376. This reprint differs from the original in pagination and typographic detail.

1. Introduction.

Instead of modeling the hazard function for censored survival data, as in the Cox model [6], modeling the (transformed) failure time directly is sometimes appealing to practitioners since it postulates a simple relationship between the response variable and covariates with easily interpretable parameters. Let $T$ denote the failure time transformed by a known monotone function $h$, $C$ be the corresponding transformed censoring time, $\Delta = 1(T \le C)$ and $Y = \min(T, C)$, where $1(\cdot)$ denotes an indicator function. The model of interest is

$$T_i = \theta' Z_i + e_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

where the $e_i$'s are independent and identically distributed (i.i.d.) with unknown distribution $F$, and $e_i$ is independent of $(Z_i, C_i)$ for all $i$. When $h = \log$, the model is called the accelerated failure time model (see, e.g., [12]).

For a cohort of $n$ i.i.d. observations of $X_i = (Y_i, \Delta_i, Z_i)$, $i = 1, \ldots, n$, [4] proposed an imputation type of least squares method, where the censored survival time is replaced by an estimate of the mean residual life conditional on the covariates, which is obtained from the Kaplan–Meier estimator on the residual scale. Stute [24, 25] proposed a weighted least squares method with weights obtained from the Kaplan–Meier estimator for the transformed survival time. [21, 26] and [30], among others, studied the rank-based estimating method and proved the asymptotic properties using martingale theory for counting processes.

In this article, we consider a general rank-based estimating method for model (1.1) in the presence of missing data as arise, for example, in case-cohort studies (e.g., [19, 23]) where data are missing by design. Specifically, let $Z_i = (Z_{1i}', Z_{2i}')'$ and assume that $Z_{1i}$ is missing at random (see [14]), while $Z_{2i}$, $Y_i$ and $\Delta_i$ are always observed for all $i$. The situations where $Z_i = Z_{1i}$ for all $i$, or where $Z_{2i}$ is not included in model (1.1), are special cases. In the latter of these special cases, $Z_{2i}$ is usually called an auxiliary variable in the missing data literature. The approach in this article extends the work of [16] for case-cohort studies, where weights are predictable and the counting process approach of [26] applies. It can be applied to general two-phase outcome-dependent sampling designs for censored survival data and allows the use of nonpredictable weights that can yield more efficient parameter estimates. The proof of efficiency gains from using estimated weights, even though the true weights are given, similarly follows the approach of [18].

This article is organized as follows. In Section 2, we introduce the doubly weighted rank-based estimating method with arbitrary weights (i.e., either predictable or nonpredictable), and link the proposed estimating function to a semiparametric framework that is more suitable for applying empirical process theory. Methods based on both known weights and estimated weights are considered. We describe asymptotic properties of the proposed estimators in Section 3, with detailed proofs given in Section 6. In Section 4, we discuss the asymptotic efficiency and some simulation results that compare methods using predictable weights with methods using nonpredictable weights, and methods using known weights with methods using estimated weights. We make a few concluding remarks in Section 5.

2. Doubly weighted semiparametric estimating function.

For the $i$th subject, $Z_{2i}$, $Y_i$ and $\Delta_i$ are always observed. Let $R_i$ be the missing data indicator that takes value 1 if $Z_{1i}$ is also observed and 0 otherwise. Suppose that $Z_{1i}$ is missing at random, so that

$$\pi_i = \Pr(R_i = 1 \mid Z_i, Y_i, \Delta_i) = \Pr(R_i = 1 \mid Z_{2i}, Y_i, \Delta_i)$$

for each $i$. This holds, for example, when independent Bernoulli sampling is implemented in a two-phase sampling design that includes the case-cohort study as a special case.

To estimate $\theta$ in model (1.1), we follow [15] and define the following random map

$$\Psi_n(\theta, \eta, \rho) = \frac{1}{n}\sum_{i=1}^n \psi(X_i; \theta, \eta, \rho) = \frac{1}{n}\sum_{i=1}^n \Omega_i\, \rho(Y_i - \theta' Z_i, \theta)\,\{Z_i - \eta(Y_i - \theta' Z_i, \theta)\}\,\Delta_i, \tag{2.1}$$

where $\theta \in \Theta \subset \mathbb{R}^d$ is the $d$-dimensional Euclidean parameter of interest with unknown true value $\theta_0$, and $\eta$ and $\rho$ are real valued (vectors of) functions that can be viewed as infinite dimensional nuisance parameters.

When $\eta(t, \theta)$ is replaced by an estimator of the true function (see [21])

$$\eta_0(t, \theta) = \frac{E\{1(Y - \theta' Z \ge t)\, Z\}}{E\{1(Y - \theta' Z \ge t)\}},$$

with $\eta_0(t, \theta_0) = E(Z \mid Y - \theta_0' Z \ge t)$, random map (2.1) becomes a weighted estimating function for $\theta$, where $\Omega_i$ are subject specific weights and $\rho(t, \theta)$ is a weight function. Clearly such an estimating function is semiparametric. To be more general, we assume that the true functional forms of $\eta$ and $\rho$ are unknown and need to be estimated, and study the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ with

$$\hat\eta_n(t, \theta) = \frac{\sum_{j=1}^n W_j\, 1(Y_j - \theta' Z_j \ge t)\, Z_j}{\sum_{j=1}^n W_j\, 1(Y_j - \theta' Z_j \ge t)}, \tag{2.2}$$

where $W_j$ are subject specific weights that may or may not equal $\Omega_j$. This is the source of the term "double weights" (see [31]); the purpose of introducing two possibly different subject specific weights will soon become clear. A particularly interesting weight function is $\rho_0(t, \theta) = \Pr(Y - \theta' Z \ge t)$, which can be estimated by

$$\hat\rho_n(t, \theta) = \frac{\sum_{j=1}^n W_j\, 1(Y_j - \theta' Z_j \ge t)}{\sum_{j=1}^n W_j}, \tag{2.3}$$

a weighted Gehan-type weight. This type of weight provides a very desirable property: the corresponding estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ is monotone in $\theta$. See [31] for the detailed derivation.
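The doubly weighted estimating function built from (2.1), (2.2) and (2.3) can be sketched in a few lines of code. The following is only an illustrative sketch for a scalar covariate, not the authors' implementation; the function name `weighted_gehan_score` and the list-based data layout are assumptions made for this example.

```python
from typing import Sequence


def weighted_gehan_score(
    theta: float,
    y: Sequence[float],
    delta: Sequence[int],
    z: Sequence[float],
    omega: Sequence[float],
    w: Sequence[float],
) -> float:
    """Evaluate Psi_n(theta, eta_hat_n, rho_hat_n) of (2.1)-(2.3) for scalar Z.

    Hypothetical helper for illustration. For a subject with
    omega[i] == 0 and w[i] == 0 (covariate not observed) the entry z[i]
    never affects the result, so any placeholder such as 0.0 can be used.
    Assumes sum(w) > 0.
    """
    n = len(y)
    resid = [y[i] - theta * z[i] for i in range(n)]  # Y_i - theta * Z_i
    w_total = sum(w)
    total = 0.0
    for i in range(n):
        if delta[i] == 0 or omega[i] == 0.0:
            continue  # censored or zero-weight subjects contribute nothing
        t = resid[i]
        at_risk = [j for j in range(n) if resid[j] >= t]
        s0 = sum(w[j] for j in at_risk)         # sum_j W_j 1(e_j >= t)
        s1 = sum(w[j] * z[j] for j in at_risk)  # sum_j W_j 1(e_j >= t) Z_j
        if s0 == 0.0:
            continue  # eta_hat_n is undefined on an empty weighted risk set
        rho_hat = s0 / w_total                  # weight function (2.3)
        eta_hat = s1 / s0                       # weighted risk-set mean (2.2)
        total += omega[i] * rho_hat * (z[i] - eta_hat)
    return total / n
```

For the toy data `y = [1, 2]`, `delta = [1, 1]`, `z = [0, 1]` with all weights equal to 1, the function moves from −0.25 at θ = 0 through 0 at θ = 1 to 0.25 at θ = 2, in line with the monotonicity property stated above.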

In this article, we focus on the estimator of $\theta$ obtained from the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, where $\hat\eta_n$ is given in (2.2). The estimator $\hat\rho_n$ can be more flexible, but we will be particularly interested in the one given by (2.3). Using two possibly different sets of subject specific weights $\Omega_i$ and $W_i$ in $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ yields great flexibility that covers a broad range of problems. The following are a few examples:

(i) When $\rho = 1$ and $\Omega_i = W_i = 1$ for all $i$, (2.2) becomes

$$\hat\eta_n(t, \theta) = \frac{\sum_{j=1}^n 1(Y_j - \theta' Z_j \ge t)\, Z_j}{\sum_{j=1}^n 1(Y_j - \theta' Z_j \ge t)},$$

and the estimating function $\Psi_n(\theta, \hat\eta_n, 1)$ becomes the rank-based estimating function studied by [26] and [29], among others. [26] and [30] proved asymptotic linearity of $\Psi_n(\theta, \hat\eta_n, 1)$ and thus normality of the estimator obtained from $\Psi_n(\theta, \hat\eta_n, 1) = 0$ using a stochastic integral formulation and martingale theory for counting processes.

(ii) When $\hat\rho_n$ takes the form in (2.3) and $\Omega_i = W_i = 1$ for all $i$, $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ becomes the estimating function of [26] with Gehan weights. The monotonicity of such an estimating function was studied by [7].

(iii) When $\hat\rho_n$ takes the form in (2.3) and $\Omega_i = 1$, $W_i = 1(i \in \mathrm{SC})/\Pr(i \in \mathrm{SC})$ for all $i$, where $\mathrm{SC}$ denotes the set of labels of the subcohort in a case-cohort study, $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ becomes the estimating function of [16] with generalized Gehan-type weights.

(iv) When $\hat\rho_n$ takes the form in (2.3) and $\Omega_i = 1$, $W_i = R_i/\pi_i$ for all $i$, where $\pi_i$ depends on $\Delta_i$, $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ becomes an extension of the estimating function of [31] (where the authors focused on numerical aspects and did not provide asymptotic properties). The weights $\Omega_i = 1$ and $W_i = R_i/\pi_i$ have been applied to case-cohort studies to potentially improve efficiency in the Estimator II of [2] as well as in [5, 13] for the Cox model.

(v) When $\Omega_i = W_i = R_i/\pi_i$, the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ can be applied to a general missing data problem with covariate $Z_{1i}$ missing at random. This arises, for example, in a two-phase sampling design and yields an estimator that is similar to that proposed in [20] and further studied by [3] for the Cox model.

In examples (i), (ii) and (iii), the estimating functions can be formulated as martingales, and the related theory applies. In the last two situations, however, weights $\Omega_i$ and/or $W_i$ depend on $\Delta_i$, particularly in case-cohort studies, and, thus, are not predictable. There is no martingale representation of these weighted estimating functions. Further complications are: (1) the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ is a nonsmooth function of $\theta$, so that the methods developed for smooth estimating functions based on Taylor expansions do not apply; and (2) the nuisance parameters $\eta$ and $\rho$ are explicit functions of $\theta$, whereas usual semiparametric models assume that nuisance parameters do not vary with the parameter of interest.

Our simulation study shows a substantial efficiency gain when such outcome-dependent weights are used, and a further gain when the known weights are replaced by estimates from the observed data. This latter result has often been noted (see, e.g., [3, 11, 18, 22], among many others). For these reasons, it is desirable to rigorously investigate the theoretical properties of the estimators obtained from the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ with both known and estimated flexible weights.
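To make the distinction between predictable and nonpredictable subject specific weights in a case-cohort study concrete, the sketch below builds both kinds of $W_i$ from the failure indicator, the subcohort membership and the subcohort selection probability. The function names and the argument `p` (the probability that a subject is drawn into the subcohort) are assumptions introduced for this illustration only.

```python
from typing import List, Sequence


def predictable_weights(
    in_subcohort: Sequence[bool], p: Sequence[float]
) -> List[float]:
    """W_i = 1(i in SC) / Pr(i in SC): does not look at the outcome Delta_i."""
    return [1.0 / p_i if sc else 0.0 for sc, p_i in zip(in_subcohort, p)]


def nonpredictable_weights(
    delta: Sequence[int], in_subcohort: Sequence[bool], p: Sequence[float]
) -> List[float]:
    """W_i = R_i / pi_i with pi_i = 1 for failures, so W_i depends on Delta_i.

    In a case-cohort study every failure is fully observed, which gives
    W_i = Delta_i + (1 - Delta_i) * 1(i in SC) / p_i.
    """
    weights = []
    for d, sc, p_i in zip(delta, in_subcohort, p):
        if d == 1:
            weights.append(1.0)  # failures always enter with weight one
        else:
            weights.append(1.0 / p_i if sc else 0.0)
    return weights
```

With one failure outside the subcohort and two censored subjects of whom one is sampled with probability 0.5, the predictable weights are `[0.0, 2.0, 0.0]` while the nonpredictable weights are `[1.0, 2.0, 0.0]`: the second scheme uses the covariate information of the failure, at the cost of depending on $\Delta_i$.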

3. Asymptotic properties.

Assume that the observed data are i.i.d. In addition to Conditions 1–3 in [30] (also assumed in [26]), we assume Conditions (A) and (B) below and derive asymptotic properties of the estimator obtained from the weighted estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$. In particular, these results apply when $\hat\eta_n$ is given by (2.2) and $\hat\rho_n$ takes the form (2.3), which estimates $\rho_0 = \Pr(Y - \theta' Z \ge t)$ with either true weights $W_i$ or their estimates $\hat W_i$. Our method does not depend on stochastic integrals and, hence, does not require predictability of the weights. So, it applies to a much broader range of estimating functions. Note that $\hat\eta_n(t, \theta)$ in (2.2) and $\hat\rho_n(t, \theta)$ in (2.3) are not differentiable in $\theta$.

Condition (A). There exist constants $\tau < \infty$ and $\xi$ such that $\Pr(Y - \theta' Z \ge \tau) \ge \xi > 0$ for all $Z$ and $\theta \in \Theta$.

Condition (B). The selection probability $\pi = \Pr(R = 1 \mid Z_2, Y, \Delta) \ge \zeta > 0$ for all $Z_2$, $Y$ and $\Delta$ for some constant $\zeta$.

Condition (A) follows an assumption in equation (3.1) of [26]. Condition (B) is a common assumption in the missing data literature and guarantees that the inverse selection probability weights are bounded. Using empirical process theory, we follow the idea of [26] and [30] to show the asymptotic linearity of $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ in $\theta$ in a neighborhood of the true value $\theta_0$. We adopt the empirical process notation of [27]. In particular, for a function $f$ of a random variable $U$ that follows distribution $P$, we define

$$P f = \int f(u)\, dP(u), \qquad \mathbb{P}_n f = n^{-1}\sum_{i=1}^n f(U_i), \qquad \mathbb{G}_n f = n^{1/2}(\mathbb{P}_n - P) f,$$

and refer to [27] for details. Throughout the article, we assume that $\Omega_i$ and $W_i$ are bounded and satisfy $E(\Omega_i \mid X_i) = E(W_i \mid X_i) = 1$ for all $i$, and set $\varepsilon_\theta = Y - \theta' Z$ and $\varepsilon_0 = Y - \theta_0' Z$.

3.1. Using true weights.

Consistency and rate of convergence of the proposed estimator $\hat\theta_n$ for general $\eta$ and $\rho$ are given in Theorems 3.1 and 3.2, respectively. Asymptotic normality of $\hat\theta_n$ obtained from the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, with $\hat\eta_n$ and $\hat\rho_n$ taking the forms in (2.2) and (2.3), is given in Theorem 3.3. Proofs are deferred to Section 6.

Theorem 3.1.

Denote $\Psi(\theta, \eta, \rho) = P[\rho(\varepsilon_\theta, \theta)\{Z - \eta(\varepsilon_\theta, \theta)\}\Delta]$. Let $\Theta$, the parameter space of $\theta$, be compact, assume that $\theta_0 \in \Theta$ is the unique solution of $\Psi(\theta, \eta_0, \rho_0) = 0$ and let $\|\cdot\|$ be the supremum norm. If $\|\eta - \eta_0\| \le \delta_n$ and $\|\rho - \rho_0\| \le \delta_n$ with $\delta_n \downarrow 0$, where $\eta$, $\eta_0$, $\rho$ and $\rho_0$ belong to Glivenko–Cantelli classes and are bounded, then:

(i) In outer probability,

$$\|\Psi_n(\theta, \eta, \rho) - \Psi(\theta, \eta_0, \rho_0)\| \to 0; \tag{3.1}$$

(ii) An approximate root $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \eta(\cdot, \hat\theta_n), \rho(\cdot, \hat\theta_n)) = o_{p^*}(1)$ is consistent;

(iii) When $\hat\eta_n$ and $\hat\rho_n$ are given respectively by (2.2) and (2.3), an approximate root $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot, \hat\theta_n), \hat\rho_n(\cdot, \hat\theta_n)) = o_{p^*}(1)$ is consistent.

Theorem 3.2.

Let $\Theta_0 \subset \Theta$ be a neighborhood of $\theta_0$, $\|\cdot\|$ be the supremum norm in $\Theta_0$ and $\hat\eta_n$ be as in (2.2). Assume that $\|\hat\rho_n - \rho_0\| = O_{p^*}(n^{-1/2})$, and assume that both $\hat\rho_n$ and $\rho_0$ are bounded and belong to a Donsker class. Let $\hat\theta_n$ be an approximate root satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot, \hat\theta_n), \hat\rho_n(\cdot, \hat\theta_n)) = o_{p^*}(n^{-1/2})$. Suppose $\Psi(\theta, \eta_0(\cdot, \theta), \rho_0(\cdot, \theta))$ is differentiable with bounded continuous derivative $\dot\Psi_\theta(\theta, \eta_0(\cdot, \theta), \rho_0(\cdot, \theta))$ in $\Theta_0$, and $\dot\Psi_\theta(\theta_0, \eta_0(\cdot, \theta_0), \rho_0(\cdot, \theta_0))$ is nonsingular. Then, $\|\hat\eta_n - \eta_0\| = O_{p^*}(n^{-1/2})$ and $|\hat\theta_n - \theta_0| = O_{p^*}(n^{-1/2})$. Finally, if $\hat\rho_n$ takes the form in (2.3) and $\rho_0(t, \theta) = \Pr(\varepsilon_\theta \ge t)$, then the above conditions for $\hat\rho_n$ and $\rho_0$ are satisfied.

In the proofs of the above theorems, given in Section 6, we apply the permanence of the Donsker property under closures and convex hulls (see [27]) to show that (2.2) and (2.3) and their limits are Donsker. A variety of sufficient conditions for Donsker classes of functions are provided in [27].

When $\hat\eta_n$ takes the form in (2.2), the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$ is discontinuous in $\theta$. In the case of full cohort data with $\Omega_i = W_i = 1$ for all $i$, [21, 26, 30] showed, with considerable effort, the asymptotic linearity of $\Psi_n(\theta, \hat\eta_n, 1)$ around $\theta_0$, in order to prove asymptotic normality. [16] had equally complicated arguments for asymptotic linearity in case-cohort studies where the weights $W_i$ do not depend on $\Delta_i$. We avoid the stochastic integral formulation and apply empirical process theory to show the asymptotic linearity of $\Psi_n(\theta, \hat\eta_n(\cdot, \theta), \hat\rho_n(\cdot, \theta))$ around $\theta_0$ for the class of missing data problems considered here. In Theorem 3.3, we focus on the situation where $\hat\eta_n$ and $\hat\rho_n$ are, respectively, given by (2.2) and (2.3). For other types of bounded weight functions $\hat\rho_n$ and $\rho_0$, proofs of asymptotic normality follow the same steps, and the same asymptotic representation should hold if $\{\hat\rho_n\}$ and $\{\rho_0\}$ are Donsker and $\hat\rho_n$ is an asymptotically linear estimator. This approach takes care of both predictable and nonpredictable weights.

Theorem 3.3.

Let $\hat\eta_n$ and $\hat\rho_n$ be as in (2.2) and (2.3). Let $\hat\theta_n$ be an approximate root satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot, \hat\theta_n), \hat\rho_n(\cdot, \hat\theta_n)) = o_{p^*}(n^{-1/2})$. Let $\mathcal{Y}$ and $\mathcal{Z}$ denote the sample spaces of random variables $Y$ and $Z$, respectively. Suppose that $\rho_0(\varepsilon_\theta, \theta)$ and $\eta_0(\varepsilon_\theta, \theta)$ are differentiable in $\theta$ with derivatives $\dot\rho_\theta$ and $\dot\eta_\theta$, which are uniformly bounded and continuous in $\Theta_0 \times \mathcal{Y} \times \mathcal{Z}$. Note that this implies that $\Psi(\theta, \eta_0(\cdot, \theta), \rho_0(\cdot, \theta))$ is differentiable in $\theta$ with bounded continuous derivative $\dot\Psi_\theta(\theta, \eta_0(\cdot, \theta), \rho_0(\cdot, \theta))$ in $\Theta_0$. Then, we have the following:

(i) The asymptotic linearity

$$\begin{aligned} n^{1/2}\Psi_n(\hat\theta_n, \hat\eta_n(\cdot, \hat\theta_n), \hat\rho_n(\cdot, \hat\theta_n)) ={}& n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot, \theta_0), \hat\rho_n(\cdot, \theta_0)) \\ &+ n^{1/2}(\hat\theta_n - \theta_0)\,\dot\Psi_\theta(\theta_0, \eta_0(\cdot, \theta_0), \rho_0(\cdot, \theta_0)) + o_{p^*}(1) \end{aligned} \tag{3.2}$$

holds;

(ii) If $\dot\Psi_\theta(\theta_0, \eta_0(\cdot, \theta_0), \rho_0(\cdot, \theta_0))$ is nonsingular, then $n^{1/2}(\hat\theta_n - \theta_0)$ is asymptotically normal with the asymptotic representation

$$\begin{aligned} n^{1/2}(\hat\theta_n - \theta_0) ={}& \{-\dot\Psi_\theta(\theta_0, \eta_0(\cdot, \theta_0), \rho_0(\cdot, \theta_0))\}^{-1} \\ &\cdot \mathbb{G}_n\Big[\Omega\,\rho_0(\varepsilon_0, \theta_0)\{Z - \eta_0(\varepsilon_0, \theta_0)\}\Delta - \int W \rho_0(t, \theta_0)\{Z - \eta_0(t, \theta_0)\}\, 1(\varepsilon_0 \ge t)\, d\Lambda_0(t)\Big] + o_{p^*}(1). \end{aligned} \tag{3.3}$$

Remark.

As becomes clear in the proof of Theorem 3.3, the asymptotic representation (3.3) is the same if the weight function $\rho_0(t, \theta)$ is known, and, in fact, such a property does not depend on what $\rho_0(t, \theta)$ is. This finding is consistent with the claim in Section 4 of [26]. Equation (3.3) reduces to the result of [16] for predictable $W$ when $\Omega = 1$ and $\rho_0(t, \theta) = 1$. The variance estimator for $\hat\theta_n$ can be obtained following the method described in [16] based on the asymptotic representation (3.3) and the original idea of [9]. Alternative variance estimation methods can be found in [10, 17]. Later, in Section 4.1, we show that letting $\Omega = W$ yields more efficient estimation for the example of a case-cohort study.

3.2. Using estimated weights.

In Theorems 3.1, 3.2 and 3.3, the subject-specific weights $W_i$ and $\Omega_i$ are assumed to be known. This is a reasonable assumption for many types of sampling designs when weights are the inverse of sampling probabilities, because sampling probabilities are usually prespecified by investigators. In the missing data literature, many authors (e.g., [22] and [3]) have pointed out that using the estimated weights improves the asymptotic efficiency, even though the true weights are known. Suppose true weights $W_i$ are parameterized by $\alpha$ with true value $\alpha_0$; that is,

$$W_i \equiv W(X_i; \alpha_0), \qquad i = 1, \ldots, n.$$

Let $\hat\alpha_n$ be an estimator of $\alpha_0$. Then, we can estimate $W_i$ by

$$\hat W_i = W(X_i; \hat\alpha_n), \qquad i = 1, \ldots, n.$$

In this subsection, we take $\Omega_i = W_i$, $i = 1, \ldots, n$, for simplicity, and we consider the asymptotic properties of the estimator $\hat\theta_n^*$, which is obtained from the following semiparametric estimating function with estimated weights:

$$\Psi_n^*(\theta, \hat\eta_n^*, \hat\rho_n^*) = \frac{1}{n}\sum_{i=1}^n \hat W_i\, \hat\rho_n^*(Y_i - \theta' Z_i, \theta)\,\{Z_i - \hat\eta_n^*(Y_i - \theta' Z_i, \theta)\}\,\Delta_i, \tag{3.4}$$

where

$$\hat\eta_n^*(t, \theta) = \frac{\sum_{j=1}^n \hat W_j\, 1(Y_j - \theta' Z_j \ge t)\, Z_j}{\sum_{j=1}^n \hat W_j\, 1(Y_j - \theta' Z_j \ge t)} \tag{3.5}$$

and

$$\hat\rho_n^*(t, \theta) = \frac{\sum_{j=1}^n \hat W_j\, 1(Y_j - \theta' Z_j \ge t)}{\sum_{j=1}^n \hat W_j}. \tag{3.6}$$

The case $\Omega_i = W_i$ handles the case-cohort study naturally when inverse sampling probability weights are used, for which $\Omega_i = W_i = 1$ whenever $\Delta_i = 1$. Note that the estimating function (3.4) is obtained by replacing known weights $W_i$ with their estimates $\hat W_i$ in $\Psi_n(\theta, \hat\eta_n, \hat\rho_n)$, $\hat\eta_n$ and $\hat\rho_n$; see (2.2) and (2.3). As in Theorem 3.3, the following result holds for other types of bounded weight function $\rho_0$ and estimator $\hat\rho_n^*$, provided that $\{\hat\rho_n^*\}$ and $\{\rho_0\}$ are Donsker, and that $\hat\rho_n^*$, as a function of $\alpha$, is an asymptotically linear estimator that is twice continuously differentiable in $\alpha$ with the first-order derivative converging to an integrable limit at $\alpha_0$. The latter remark becomes clear in the proof of the next theorem.

We now consider consistency and asymptotic normality of $\hat\theta_n^*$ in Theorem 3.4 with a reasonable assumption about $\hat\alpha_n$ and a classical smoothness condition for $W(X; \alpha)$ in $\alpha$. The efficiency gain from using estimated weights becomes evident.

Theorem 3.4.

Suppose that $W(X; \alpha)$ is twice differentiable, with respect to $\alpha$, in $\mathcal{A}_0 \times \mathcal{X}$ with continuous and bounded derivatives, where $\mathcal{A}_0$ is a neighborhood of $\alpha_0$ and $\mathcal{X}$ is the bounded sample space of the random variable $X$. Suppose that $\hat\alpha_n$ is an asymptotically efficient estimator of $\alpha_0$ with bounded influence function at $\alpha_0$. Let $\hat\eta_n^*$ and $\hat\rho_n^*$ be defined by (3.5) and (3.6), and let $\hat\theta_n^*$ be an approximate root satisfying the equation $\Psi_n^*(\hat\theta_n^*, \hat\eta_n^*(\cdot, \hat\theta_n^*), \hat\rho_n^*(\cdot, \hat\theta_n^*)) = o_{p^*}(n^{-1/2})$. Suppose that all the assumptions in Theorem 3.3 hold. Then, $\hat\theta_n^*$ is consistent, and $n^{1/2}(\hat\theta_n^* - \theta_0)$ is asymptotically normal with zero mean and the asymptotic variance

$$\Sigma - \{\dot\Psi_\theta(\theta_0, \eta_0, \rho_0)\}^{-1} B V B' \{\dot\Psi_\theta(\theta_0, \eta_0, \rho_0)\}^{-1}, \tag{3.7}$$

where $\Sigma$ is the asymptotic variance of $n^{1/2}(\hat\theta_n - \theta_0)$ determined by (3.3), $V$ is the asymptotic variance of $n^{1/2}(\hat\alpha_n - \alpha_0)$, and

$$B = P[\rho_0(\varepsilon_0, \theta_0)\, A(\varepsilon_0, \theta_0)\, \Delta] - P[\rho_0(\varepsilon_0, \theta_0)\{Z - \eta_0(\varepsilon_0, \theta_0)\}(\dot W_\alpha(X; \alpha_0))'\Delta],$$

with $\dot W_\alpha(X; \alpha_0)$ denoting the $\alpha$-derivative of $W(X; \alpha)$ at $\alpha_0$ and

$$A(t, \theta_0) = \frac{1}{\rho_0(t, \theta_0)}\big[P\{1(\varepsilon_0 \ge t)\, Z\, (\dot W_\alpha(X; \alpha_0))'\} - \eta_0(t, \theta_0)\, P\{(\dot W_\alpha(X; \alpha_0))'\, 1(\varepsilon_0 \ge t)\}\big].$$

Note that, if $\hat\rho_n^* = \hat\rho_n = 1$, then $\rho_0(t, \theta_0)$, in the above expression for $A$, should be replaced by $P\{1(\varepsilon_0 \ge t)\}$. The asymptotic efficiency of $\hat\alpha_n$ is one of three sufficient conditions for applying the result of [18] to obtain the above asymptotic normality of $\hat\theta_n^*$. When data are missing at random and inverse sampling probability weights are considered, the parameter $\alpha$ is adaptive to other parameters (see [1]) and its efficient estimator can be easily obtained, for example, by the maximum likelihood method. In sampling designs, a stratified approach is commonly used to improve efficiency. If the number of strata is finite, then the (independent Bernoulli) sampling probabilities within strata constitute the parameter $\alpha$, and the sampling fractions are the maximum likelihood estimates of $\alpha$.

The other two conditions of [18] are: (i) $n^{1/2}(\hat\theta_n - \theta_0)$ and $n^{1/2}(\hat\alpha_n - \alpha_0)$ are asymptotically jointly normal; and (ii) $n^{1/2}(\hat\theta_n^* - \theta_0)$ is asymptotically equivalent to $n^{1/2}(\hat\theta_n - \theta_0) + B n^{1/2}(\hat\alpha_n - \alpha_0)$. The former is determined by (3.3) in Theorem 3.3 and the fact that $\hat\alpha_n$ is an asymptotically linear estimator. The latter is established with a detailed proof in Section 6.

Consider a stratified case-cohort study. Suppose that all the censored subjects in a study cohort are divided into $S$ strata by the variable $Z_2 \in \{\zeta_1, \ldots, \zeta_S\}$. In a stratified case-cohort study, all of the failures are completely observed. For censored subjects, we denote the true sampling probabilities by $\alpha_{0s}$, $1 \le s \le S$. Suppose that there are $n_s$ subjects in stratum $s$, out of whom $n_s^*$ are selected into the subcohort by independent Bernoulli sampling. We assume that, when $n \to \infty$, $n_s/n \to \gamma_s > 0$, $1 \le s \le S$. Instead of using the true sampling probabilities $\alpha_0 = (\alpha_{01}, \ldots, \alpha_{0S})'$ in the weight function $W$, we now replace each $\alpha_{0s}$ with the sampling fraction $\hat\alpha_{n,s} = n_s^*/n_s$, $1 \le s \le S$. We can then denote the sampling probability and its estimator for the $i$th subject as

$$\pi_i = \sum_{s=1}^S 1(Z_{2i} = \zeta_s)\,\alpha_{0s} \qquad\text{and}\qquad \hat\pi_i = \sum_{s=1}^S 1(Z_{2i} = \zeta_s)\,\hat\alpha_{n,s}.$$

We consider the inverse sampling probability weights

$$W(X_i; \hat\alpha_n) = \Delta_i + (1 - \Delta_i)\,\frac{1(i \in \mathrm{SC})}{\hat\pi_i}.$$

The second term in the expression for matrix $B$ in Theorem 3.4 becomes zero, since $\dot W_\alpha$ contains the factor $(1 - \Delta)$. The asymptotic variance of $\hat\alpha_n$ is $V = \mathrm{diag}\{\alpha_{01}(1 - \alpha_{01})/\gamma_1, \ldots, \alpha_{0S}(1 - \alpha_{0S})/\gamma_S\}$, which can be easily estimated from observed data.
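The stratified example can be illustrated with a short sketch that computes the sampling fractions $n_s^*/n_s$ among censored subjects, the resulting estimated weights $\Delta_i + (1 - \Delta_i) 1(i \in \mathrm{SC})/\hat\pi_i$, and a plug-in version of the diagonal of $V$. The plug-in step (replacing each true sampling probability by its sampling fraction and $\gamma_s$ by $n_s/n$) and the function name are assumptions made here for illustration; the text above only states that $V$ can be estimated from observed data.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_case_cohort_weights(
    delta: Sequence[int],
    stratum: Sequence[Hashable],
    in_subcohort: Sequence[bool],
) -> Tuple[Dict[Hashable, float], List[float], Dict[Hashable, float]]:
    """Return (alpha_hat, estimated weights, plug-in diagonal of V).

    Hypothetical helper: strata are formed among censored subjects only,
    and every censored stratum is assumed to be non-empty.
    """
    n = len(delta)
    n_s: Dict[Hashable, int] = {}     # censored subjects per stratum
    n_star: Dict[Hashable, int] = {}  # of those, sampled into the subcohort
    for d, s, sc in zip(delta, stratum, in_subcohort):
        if d == 0:
            n_s[s] = n_s.get(s, 0) + 1
            n_star[s] = n_star.get(s, 0) + (1 if sc else 0)

    # Sampling fractions n*_s / n_s, one per stratum.
    alpha_hat = {s: n_star[s] / n_s[s] for s in n_s}

    weights: List[float] = []
    for d, s, sc in zip(delta, stratum, in_subcohort):
        if d == 1:
            weights.append(1.0)                  # failures: weight one
        elif sc:
            weights.append(1.0 / alpha_hat[s])   # sampled censored subject
        else:
            weights.append(0.0)                  # unsampled censored subject

    # Plug-in for alpha_s (1 - alpha_s) / gamma_s with gamma_s ~ n_s / n.
    v_diag = {s: alpha_hat[s] * (1.0 - alpha_hat[s]) / (n_s[s] / n) for s in n_s}
    return alpha_hat, weights, v_diag
```

A sampled censored subject always belongs to a stratum with $n_s^* \ge 1$, so the division by the sampling fraction in the weight is well defined.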

4. Numerical results.

4.1. Asymptotic efficiency comparison.

Considering the standard normal, standard logistic and standard extreme value error distributions in model (1.1), we evaluate asymptotic efficiency under a case-cohort setting to illustrate different extents of efficiency gain by using different weights. The one-dimensional covariate $Z$ is taken to follow a Bernoulli distribution with success probability 0.3 and $\theta_0 = 0$. Censoring time has a uniform distribution on $[a, b]$, where $a$ and $b$ are chosen to obtain 80% censoring proportion. Let $Z^*$ be a binary correlate of $Z$ with $\Pr(Z^* = 1 \mid Z = 1) = 0.$… and $\Pr(Z^* = 0 \mid Z = 0) = 0.8$. The subcohort is a stratified subsample selected by independent Bernoulli sampling with selection probability $\pi(Z^*)$, chosen so that the two strata determined by $Z^*$ have the same expected number of subjects.

For each error distribution, we consider a $2^3$ factorial design with the following factors:

• logrank weights ($\hat\rho_n = 1$) and Gehan weights [see (2.3)];
• subject specific weight: predictable with $W_i = 1(i \in \mathrm{SC})/\pi_i$ and nonpredictable with $W_i = \Delta_i + (1 - \Delta_i)\,1(i \in \mathrm{SC})/\pi_i$;
• subject specific weights: true $W_i = W(X_i; \alpha_0)$ and estimated $\hat W_i = W(X_i; \hat\alpha_n)$.

The asymptotic variance of the logrank weighted method for the full cohort is used as the benchmark, and we report the relative efficiency for each of the 8 scenarios with subcohort size fraction ranging from 1% to 100%. Results are given in Figures 1–3, where: (1) dark curves represent logrank weights, and gray curves represent Gehan weights; (2) solid curves represent predictable known weights, and dotted curves represent predictable estimated weights; and (3) dashed curves represent nonpredictable known weights, and dotted/dashed curves represent nonpredictable estimated weights.

We can see that using estimated weights $W(X_i; \hat\alpha_n)$ does not improve efficiency very much compared to using true weights $W(X_i; \alpha_0)$ for the settings considered. The efficiency gain from using the nonpredictable weights is substantial, especially for small to moderate sampling rates. An interesting feature is that when the subcohort size is relatively small, the Gehan weighted method performs much better than the logrank weighted method for all three error distributions, even though the result is the opposite when the subcohort size is close to the full cohort for both logistic and extreme value error distributions. We do not have an analytical explanation for this phenomenon, which seems to persist in other simulations as well. It seems safe, however, to recommend the Gehan weights for problems with missing data; it is fortuitous that the Gehan weights also yield a monotone estimating function, which is a numerically advantageous property. Another interesting phenomenon is that, for the logistic error, the Gehan weights may be somewhat less efficient than the logrank weights for censored data, even though they are the most efficient for uncensored data (see [12]).

Fig. 1. Asymptotic efficiency under normal error distribution.

Fig. 2. Asymptotic efficiency under logistic error distribution.

Fig. 3. Asymptotic efficiency under extreme value error distribution.

4.2. Simulations.

We conduct simulations under the same settings as those in the previous subsection. Since the simulation results tell essentially the same story for the different error distributions, we only report the results for the logistic error. We consider case-cohort designs with cohort size of 2000 and subcohort sizes of 15%, 20% and 25% of the entire cohort on average, which lead to on average 640, 720 and 800 completely observed subjects, respectively. Bias of the point estimator, average of the variance estimator, empirical variance and 95% coverage probability, based on the variance estimator, are reported for five different analyses using the logrank and Gehan weights: full data analysis, predictable subject-specific weighted analysis using true weights, predictable subject-specific weighted analysis using estimated weights, nonpredictable subject-specific weighted analysis using true weights and nonpredictable subject-specific weighted analysis using estimated weights. The asymptotic variance for each scenario is also reported. From Table 1, we see that all of the methods work well for finite samples and reflect the patterns observed from the efficiency results in the previous subsection.
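The average numbers of completely observed subjects quoted for the simulations (640, 720 and 800) follow from simple arithmetic: all failures are observed, and a censored subject is observed only if it falls in the subcohort. The sketch below reproduces that arithmetic under the assumption that subcohort selection is unrelated to failure status on average, which is plausible here because the true regression parameter is zero; the function name and the use of integer percentages are choices made for this illustration.

```python
def expected_complete_cases(
    cohort_size: int, censoring_pct: int, subcohort_pct: int
) -> float:
    """Expected number of subjects with fully observed covariates.

    Assumes every failure is observed and that, on average, a share
    subcohort_pct of the censored subjects is sampled into the subcohort.
    """
    failures = cohort_size * (100 - censoring_pct) / 100
    censored = cohort_size * censoring_pct / 100
    return failures + censored * subcohort_pct / 100


for pct in (15, 20, 25):
    print(pct, expected_complete_cases(2000, 80, pct))
```

With a cohort of 2000 and 80% censoring there are 400 failures and 1600 censored subjects on average, so a 15% subcohort adds 240 censored subjects to the 400 failures.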

5. Discussion.

We consider only the case where weights $\Omega_i$ and $W_i$ are i.i.d. for all $i = 1, \ldots, n$, which makes the proofs of the asymptotic properties more straightforward. For the case where the weights are determined by (stratified) simple random sampling, the method of [3] may be applicable, and this is an interesting topic worthy of further investigation.

6. Proofs.

Proof of Theorem 3.1.

As in [26], for notational simplicity, we assume one-dimensional $\theta$ in the proofs of the theorems in Section 3.

Since $\eta$, $\eta_0$, $\rho$ and $\rho_0$ belong to Glivenko–Cantelli classes, it follows, from Theorem 3 of [28], that the set of bounded functions $\{\Omega\rho(\varepsilon_\theta, \theta)\{Z - \eta(\varepsilon_\theta, \theta)\}\Delta\}$ is a Glivenko–Cantelli class. By adding and subtracting the same term, and by the triangle inequality, we then have that

$$\begin{aligned} \|\Psi_n(\theta, \eta, \rho) - \Psi(\theta, \eta_0, \rho_0)\| &= \|\mathbb{P}_n[\Omega\rho(\varepsilon_\theta, \theta)\{Z - \eta(\varepsilon_\theta, \theta)\}\Delta] - P[\Omega\rho_0(\varepsilon_\theta, \theta)\{Z - \eta_0(\varepsilon_\theta, \theta)\}\Delta]\| \\ &\le \|(\mathbb{P}_n - P)[\Omega\rho(\varepsilon_\theta, \theta)\{Z - \eta(\varepsilon_\theta, \theta)\}\Delta]\| + \|P\{\Omega(\rho - \rho_0)Z\Delta\}\| + \|P\{\Omega(\rho\eta - \rho_0\eta_0)\Delta\}\|. \end{aligned}$$

The first term on the right-hand side of the above inequality converges to zero in outer probability by the Glivenko–Cantelli property. Obviously,

$$\|P\{\Omega(\rho - \rho_0)Z\Delta\}\| \le \|\rho - \rho_0\|\, P|\Omega Z\Delta| \to 0$$

and

$$\|P\{\Omega(\rho\eta - \rho_0\eta_0)\Delta\}\| \le \|\rho\eta - \rho_0\eta_0\|\, P|\Omega\Delta| \to 0,$$

where

$$\begin{aligned} \|\rho\eta - \rho_0\eta_0\| &= \tfrac{1}{2}\|(\rho - \rho_0)(\eta + \eta_0) + (\rho + \rho_0)(\eta - \eta_0)\| \\ &\le \tfrac{1}{2}\|\rho - \rho_0\| \cdot \|\eta + \eta_0\| + \tfrac{1}{2}\|\rho + \rho_0\| \cdot \|\eta - \eta_0\| \to 0. \end{aligned}$$

This establishes (3.1), which, in turn, can be shown to imply $|\hat\theta_n - \theta_0| \to 0$ in outer probability. Since $\theta_0$ is the unique solution to $\Psi(\theta, \eta_0(\cdot, \theta), \rho_0(\cdot, \theta)) = 0$, for any fixed $\varepsilon > 0$, there exists a $\delta > 0$ such that

$$\Pr[|\hat\theta_n - \theta_0| > \varepsilon] \le \Pr[|\Psi(\hat\theta_n, \eta_0(\cdot, \hat\theta_n), \rho_0(\cdot, \hat\theta_n))| > \delta].$$

We show that $|\Psi(\hat\theta_n, \eta_0(\cdot, \hat\theta_n), \rho_0(\cdot, \hat\theta_n))| \to 0$ in outer probability, and then the consistency of $\hat\theta_n$ follows immediately. Note that there exists a sequence $\{\delta_n\} \downarrow 0$ such that $\|\eta - \eta_0\| \le \delta_n$ and $\|\rho - \rho_0\| \le \delta_n$ with probability tending to one. Hence, from (3.1), we have the inequalities

$$\begin{aligned} |\Psi(\hat\theta_n, \eta_0(\cdot, \hat\theta_n), \rho_0(\cdot, \hat\theta_n))| &\le |\Psi_n(\hat\theta_n, \eta(\cdot, \hat\theta_n), \rho(\cdot, \hat\theta_n))| + |\Psi(\hat\theta_n, \eta_0(\cdot, \hat\theta_n), \rho_0(\cdot, \hat\theta_n)) - \Psi_n(\hat\theta_n, \eta(\cdot, \hat\theta_n), \rho(\cdot, \hat\theta_n))| \\ &\le |\Psi_n(\hat\theta_n, \eta(\cdot, \hat\theta_n), \rho(\cdot, \hat\theta_n))| + o_{p^*}(1) \\ &= o_{p^*}(1). \end{aligned}$$

Hence, $\hat\theta_n$ is consistent.

We now show that (3.1) holds when $\eta$ and $\rho$ are replaced by $\hat\eta_n$ and $\hat\rho_n$ given in (2.2) and (2.3), respectively, and $\rho_0(t, \theta) = \Pr(\varepsilon_\theta \ge t)$. We define

$$D_n^{(0)}(t, \theta) = \mathbb{P}_n\{W 1(\varepsilon_\theta \ge t)\}, \qquad d^{(0)}(t, \theta) = P\{W 1(\varepsilon_\theta \ge t)\};$$

$$D_n^{(1)}(t, \theta) = \mathbb{P}_n\{W 1(\varepsilon_\theta \ge t) Z\}, \qquad d^{(1)}(t, \theta) = P\{W 1(\varepsilon_\theta \ge t) Z\}.$$

Thus, $\hat\eta_n(t, \theta) = D_n^{(1)}(t, \theta)/D_n^{(0)}(t, \theta)$ and $\eta_0(t, \theta) = d^{(1)}(t, \theta)/d^{(0)}(t, \theta)$. The latter equality holds because, since $E(W \mid X) = 1$,

$$P\{W 1(\varepsilon_\theta \ge t)\} = P\{1(\varepsilon_\theta \ge t)\} \qquad\text{and}\qquad P\{W 1(\varepsilon_\theta \ge t) Z\} = P\{1(\varepsilon_\theta \ge t) Z\}.$$

Since the class of functions $\{1(\varepsilon_\theta \ge t)\}$ is a VC-class (see, e.g., Exercise 9 on page 151 and Exercise 14 on page 152 in [27]) and, thus, a Donsker class, we know that the sets of functions $\mathcal{F}_0 = \{W 1(\varepsilon_\theta \ge t)\}$ and $\mathcal{F}_1 = \{W 1(\varepsilon_\theta \ge t) Z\}$ are Donsker classes (see, e.g., [27], Section 2.10). Since Donsker classes are Glivenko–Cantelli classes, it follows that $\|D_n^{(k)}(t, \theta) - d^{(k)}(t, \theta)\| \to 0$ in outer probability for $k = 0, 1$. Let $\tau$ correspond to $T^*$ in [26] and represent the longest follow-up time. Since both $D_n^{(0)}$ (with probability 1) and $d^{(0)}$ are bounded away from zero when $t \le \tau$, we have

$$\|\hat\eta_n(t, \theta) - \eta_0(t, \theta)\| \to 0$$

in outer probability. Similarly, we have

$$\|\hat\rho_n(t, \theta) - \rho_0(t, \theta)\| \to 0$$

in outer probability. Let $\bar{\mathcal{F}}_k$ be the closure of $\mathcal{F}_k$, $k = 0, 1$, respectively, in which the convergence is both pointwise and in $L_2(P)$. Then, $D_n^{(k)}(t, \theta)$ and $d^{(k)}(t, \theta)$ are in the convex hull of $\bar{\mathcal{F}}_k$, $k = 0, 1$, and, thus, belong to Donsker classes (see, e.g., [27], Theorems 2.10.2 and 2.10.3). Hence, both $\{\hat\eta_n(t, \theta)\}$ and $\{\eta_0(t, \theta)\}$ are Donsker (by [27], Example 2.10.9) and, thus, Glivenko–Cantelli. Similarly, we can argue that both $\{\hat\rho_n(t, \theta)\}$ and $\{\rho_0(t, \theta)\}$ are Donsker and, hence, Glivenko–Cantelli. Then, by the first half of the proof we obtain

$$\|\Psi_n(\theta, \hat\eta_n, \hat\rho_n) - \Psi(\theta, \eta_0, \rho_0)\| \to 0$$

in outer probability.

Table 1. Summary statistics of simulations, where $\alpha$ = subcohort size fraction; Method 1 = full data analysis, 2 = predictable subject-specific weighted analysis using true weights, 3 = predictable subject-specific weighted analysis using estimated weights, 4 = nonpredictable subject-specific weighted analysis using true weights, 5 = nonpredictable subject-specific weighted analysis using estimated weights; Emp. Var = empirical variance estimator; Ave. Var = average of variance estimator; CP = coverage probability; Asym. Var = asymptotic variance.

Columns: $\alpha$, Weight, Method, $\hat\theta_n$, Emp. Var, Ave. Var, 95% CP, Asym. Var.

Proof of Theorem 3.2.

From the proof of Theorem 3.1 we see that $n^{1/2}\{D_n^{(k)}(t,\theta) - d^{(k)}(t,\theta)\}$, $k = 0, 1$, converge to zero mean Gaussian processes for all $\theta \in \Theta$, and $\|n^{1/2}\{D_n^{(k)}(t,\theta) - d^{(k)}(t,\theta)\}\| = O_{p^*}(1)$, $k = 0, 1$. Then
\begin{align*}
n^{1/2}\{\hat\eta_n(t,\theta) - \eta_0(t,\theta)\}
&= n^{1/2}\Bigl[\frac{1}{d^{(0)}(t,\theta)}\{D_n^{(1)}(t,\theta) - d^{(1)}(t,\theta)\} - \frac{D_n^{(1)}(t,\theta)}{D_n^{(0)}(t,\theta)\,d^{(0)}(t,\theta)}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\}\Bigr] \\
&= n^{1/2}\Bigl[\frac{1}{d^{(0)}(t,\theta)}\{D_n^{(1)}(t,\theta) - d^{(1)}(t,\theta)\} - \frac{d^{(1)}(t,\theta)}{d^{(0)}(t,\theta)^2}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\}\Bigr] + o_{p^*}(1) \\
&= d^{(0)}(t,\theta)^{-1} n^{1/2}\bigl[\{D_n^{(1)}(t,\theta) - D_n^{(0)}(t,\theta)\eta_0(t,\theta)\} - \{d^{(1)}(t,\theta) - d^{(0)}(t,\theta)\eta_0(t,\theta)\}\bigr] + o_{p^*}(1) \\
&= d^{(0)}(t,\theta)^{-1}\,\mathbb{G}_n[W 1(\varepsilon_\theta \ge t)\{Z - \eta_0(t,\theta)\}] + o_{p^*}(1).
\end{align*}
Since the classes of functions $\{W\}$, $\{1(\varepsilon_\theta \ge t)\}$, $\{Z\}$ and $\{\eta_0\}$ are Donsker, we know that $\{W 1(\varepsilon_\theta \ge t)\{Z - \eta_0(t,\theta)\}\}$ is Donsker (e.g., [27], Section 2.10). Thus, $n^{1/2}\|\hat\eta_n - \eta_0\| = O_{p^*}(1)$, since $d^{(0)}(t,\theta)^{-1}$ is bounded.

We now show $n^{1/2}|\hat\theta_n - \theta_0| = O_{p^*}(1)$. First, we have
\[
\|n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\| = O_{p^*}(1) \tag{6.3}
\]
by applying the triangle inequality, and that $\{\hat\eta_n\}$ and $\{\hat\rho_n\}$ are Donsker, as well as $n^{1/2}\|\hat\rho_n - \rho_0\| = O_{p^*}(1)$ and $n^{1/2}\|\hat\eta_n - \eta_0\| = O_{p^*}(1)$ in the following calculation:
\begin{align*}
&\|n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\| \\
&\quad = \bigl\|n^{1/2}(\mathbb{P}_n - P)[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{Z - \hat\eta_n(\varepsilon_\theta,\theta)\}\Delta] + n^{1/2}P[\Omega\{\hat\rho_n(\varepsilon_\theta,\theta) - \rho_0(\varepsilon_\theta,\theta)\}Z\Delta] \\
&\qquad - n^{1/2}P[\Omega\{\hat\rho_n(\varepsilon_\theta,\theta)\hat\eta_n(\varepsilon_\theta,\theta) - \rho_0(\varepsilon_\theta,\theta)\eta_0(\varepsilon_\theta,\theta)\}\Delta]\bigr\| \\
&\quad \le \|\mathbb{G}_n[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{Z - \hat\eta_n(\varepsilon_\theta,\theta)\}\Delta]\| + n^{1/2}\|\hat\rho_n - \rho_0\|\cdot P(\Omega Z\Delta) \\
&\qquad + \tfrac12\bigl(n^{1/2}\|\hat\rho_n - \rho_0\|\cdot\|\hat\eta_n + \eta_0\| + \|\hat\rho_n + \rho_0\|\cdot n^{1/2}\|\hat\eta_n - \eta_0\|\bigr)P(\Omega\Delta) \\
&\quad = O_{p^*}(1).
\end{align*}
Because $\Psi(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) = 0$ and $|\hat\theta_n - \theta_0| = o_{p^*}(1)$ by Theorem 3.1, we then have
\begin{align*}
O_{p^*}(1) &= -n^{1/2}\{\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) - \Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n))\} \\
&= o_{p^*}(1) + n^{1/2}\Psi(\hat\theta_n, \eta_0(\cdot,\hat\theta_n), \rho_0(\cdot,\hat\theta_n)) - n^{1/2}\Psi(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) \tag{6.4} \\
&= o_{p^*}(1) + n^{1/2}(\hat\theta_n - \theta_0)\dot\Psi_\theta(\theta^*, \eta_0(\cdot,\theta^*), \rho_0(\cdot,\theta^*)) \\
&= o_{p^*}(1) + n^{1/2}(\hat\theta_n - \theta_0)\{\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1)\},
\end{align*}
where $\theta^*$ is a point between $\theta_0$ and $\hat\theta_n$. Thus, $n^{1/2}(\hat\theta_n - \theta_0) = O_{p^*}(1)$.

Let $C_n = n^{-1}\sum_{i=1}^n W_i$. By the central limit theorem, $n^{1/2}(C_n - 1) = O_p(1)$. Thus, when $\hat\rho_n$ takes the form in (2.3) and $\rho_0(t,\theta) = \Pr(\varepsilon_\theta \ge t)$, they are clearly bounded, and we can show $n^{1/2}\|\hat\rho_n - \rho_0\| = O_{p^*}(1)$ by the following calculation:
\begin{align*}
n^{1/2}\{\hat\rho_n(t,\theta) - \rho_0(t,\theta)\}
&= n^{1/2}\Bigl[\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\} - \frac{D_n^{(0)}(t,\theta)}{C_n}\{C_n - 1\}\Bigr] \\
&= n^{1/2}\bigl[\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\} - d^{(0)}(t,\theta)\{C_n - 1\}\bigr] + o_{p^*}(1) \\
&= n^{1/2}\bigl[\{D_n^{(0)}(t,\theta) - C_n d^{(0)}(t,\theta)\}\bigr] + o_{p^*}(1) \\
&= \mathbb{G}_n[W\{1(\varepsilon_\theta \ge t) - d^{(0)}(t,\theta)\}] + o_{p^*}(1).
\end{align*}
We have already shown in the proof of Theorem 3.1 that $\hat\rho_n$ and $\rho_0$ so chosen belong to a Donsker class.
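To make the objects in the proofs of Theorems 3.1 and 3.2 concrete, the following is a minimal numerical sketch (not the authors' code) of $D_n^{(0)}$, $D_n^{(1)}$, $\hat\eta_n$ in (2.2), $\hat\rho_n$ in (2.3) and the estimating function $\Psi_n(\theta, \hat\eta_n, \hat\rho_n) = \mathbb{P}_n[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{Z - \hat\eta_n(\varepsilon_\theta,\theta)\}\Delta]$. It assumes a scalar covariate, residuals $\varepsilon_\theta = Y - \theta Z$ and NumPy arrays as inputs; the function names and the toy data are hypothetical.

```python
import numpy as np

def weighted_risk_averages(t, eps, z, w):
    """Return (eta_hat, rho_hat) at time t from weighted at-risk averages."""
    at_risk = (eps >= t).astype(float)      # 1(eps_theta >= t)
    d0 = np.mean(w * at_risk)               # D_n^(0)(t, theta)
    d1 = np.mean(w * at_risk * z)           # D_n^(1)(t, theta)
    eta_hat = d1 / d0 if d0 > 0 else 0.0    # (2.2); set to 0 when nobody is at risk
    rho_hat = d0 / np.mean(w)               # (2.3): D_n^(0) / C_n
    return eta_hat, rho_hat

def psi_n(theta, y, z, delta, w, omega):
    """Psi_n(theta, eta_hat, rho_hat) for a scalar covariate (illustrative only)."""
    eps = y - theta * z                     # eps_theta = Y - theta * Z
    total = 0.0
    for i in np.flatnonzero(delta):         # only uncensored subjects contribute
        eta_hat, rho_hat = weighted_risk_averages(eps[i], eps, z, w)
        total += omega[i] * rho_hat * (z[i] - eta_hat)
    return total / len(y)

# Toy data with all weights equal to one (full-data analysis).
y = np.array([1.0, 2.0, 3.0])
z = np.array([0.0, 1.0, 0.0])
delta = np.array([1, 1, 0])
ones = np.ones(3)
value = psi_n(0.0, y, z, delta, ones, ones)
```

Because $\Psi_n$ is a step function of $\theta$, an exact root need not exist; the proofs only require that $\hat\theta_n$ make $\Psi_n$ equal to $o_{p^*}(n^{-1/2})$, so in practice one would search for an approximate zero of `psi_n` in `theta`.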

6.3. Proof of Theorem 3.3.

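The differentiability of both $\rho_0(\varepsilon_\theta,\theta)$ and $\eta_0(\varepsilon_\theta,\theta)$ in $\theta$ and its implication of the differentiability of $\Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))$ in $\theta$, as well as the continuity and boundedness of the derivatives, can be shown by interchanging integration and differentiation, which is warranted by the dominated convergence theorem under the given regularity conditions. From Theorem 3.2, we know that $|\hat\theta_n - \theta_0| = O_{p^*}(n^{-1/2})$. Let $|\theta - \theta_0| \le Kn^{-1/2}$ with $K < \infty$. Then, we have
\begin{align*}
&n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0))\} \\
&\quad = n^{1/2}[\mathbb{P}_n\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{Z - \hat\eta_n(\varepsilon_\theta,\theta)\}\Delta - \mathbb{P}_n\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta] \tag{6.5} \\
&\qquad + n^{1/2}[\mathbb{P}_n\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta - \mathbb{P}_n\Omega\hat\rho_n(\varepsilon_0,\theta_0)\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta]. \tag{6.6}
\end{align*}
We first look at term (6.5), which can be rewritten as
\begin{align*}
&n^{1/2}[-\mathbb{P}_n\Omega\hat\rho_n(\varepsilon_\theta,\theta)\hat\eta_n(\varepsilon_\theta,\theta)\Delta + \mathbb{P}_n\Omega\hat\rho_n(\varepsilon_\theta,\theta)\hat\eta_n(\varepsilon_0,\theta_0)\Delta] \\
&\quad = -\mathbb{G}_n[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{\hat\eta_n(\varepsilon_\theta,\theta) - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta] \tag{6.7} \\
&\qquad - n^{1/2}P[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{\hat\eta_n(\varepsilon_\theta,\theta) - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta]. \tag{6.8}
\end{align*}
Term (6.7) converges to zero in outer probability, because $\Omega\hat\rho_n\hat\eta_n\Delta$ belongs to a Donsker class by arguments similar to those in the proof of Theorem 3.1, and $\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{\hat\eta_n(\varepsilon_\theta,\theta) - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta$ converges to zero in quadratic mean. Let $t' = t - (\theta - \theta_0)z$.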
Direct calculation yields
\begin{align*}
&n^{1/2}P[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{\hat\eta_n(\varepsilon_\theta,\theta) - \eta_0(\varepsilon_\theta,\theta)\}\Delta] \\
&\quad = n^{1/2}P\Bigl[\hat\rho_n(\varepsilon_\theta,\theta)\Bigl\{\frac{D_n^{(1)}(\varepsilon_\theta,\theta)}{D_n^{(0)}(\varepsilon_\theta,\theta)} - \frac{d^{(1)}(\varepsilon_\theta,\theta)}{d^{(0)}(\varepsilon_\theta,\theta)}\Bigr\}\Delta\Bigr] \\
&\quad = n^{1/2}\int\hat\rho_n(t',\theta)\Bigl[\frac{1}{d^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} \\
&\qquad\qquad - \frac{D_n^{(1)}(t',\theta)}{D_n^{(0)}(t',\theta)\,d^{(0)}(t',\theta)}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\Bigr]\delta\,dP_{\varepsilon_0,\Delta,Z}(t,\delta,z) \tag{6.9} \\
&\quad = n^{1/2}\int\hat\rho_n(t',\theta)\Bigl[\frac{1}{d^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} \\
&\qquad\qquad - \frac{d^{(1)}(t',\theta)}{d^{(0)}(t',\theta)^2}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\Bigr]\delta\,dP_{\varepsilon_0,\Delta,Z}(t,\delta,z) + o_{p^*}(1) \\
&\quad = \int\mathbb{G}_n\hat\rho_n(t',\theta)\,d^{(0)}(t',\theta)^{-1}W 1(\varepsilon_\theta \ge t')\{Z - \eta_0(t',\theta)\}\,dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1) \\
&\quad = \int\mathbb{G}_n\hat\rho_n(t',\theta)\,\ell(t',\theta,W,Z,\varepsilon_\theta)\,dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1),
\end{align*}
where $\ell(t',\theta,W,Z,\varepsilon_\theta) = d^{(0)}(t',\theta)^{-1}W 1(\varepsilon_\theta \ge t')\{Z - \eta_0(t',\theta)\}$ and $P_{\varepsilon_0,\Delta,Z}$ denotes the joint probability law of $(\varepsilon_0, \Delta, Z)$. Clearly, the class of functions $\{\hat\rho_n(t,\theta)\ell(t,\theta,W,Z,\varepsilon_\theta)\}$ is Donsker.
The above middle equality holds because
\begin{align*}
&\Bigl|\,n^{1/2}\int\hat\rho_n(t',\theta)\Bigl[\frac{1}{d^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} - \frac{D_n^{(1)}(t',\theta)}{D_n^{(0)}(t',\theta)\,d^{(0)}(t',\theta)}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\Bigr]\delta\,dP_{\varepsilon_0,\Delta,Z}(t,\delta,z) \\
&\quad - n^{1/2}\int\hat\rho_n(t',\theta)\Bigl[\frac{1}{d^{(0)}(t',\theta)}\{D_n^{(1)}(t',\theta) - d^{(1)}(t',\theta)\} - \frac{d^{(1)}(t',\theta)}{d^{(0)}(t',\theta)^2}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\Bigr]\delta\,dP_{\varepsilon_0,\Delta,Z}(t,\delta,z)\,\Bigr| \\
&\quad = \Bigl|\int\hat\rho_n(t',\theta)\Bigl\{\frac{d^{(1)}(t',\theta)}{d^{(0)}(t',\theta)^2} - \frac{D_n^{(1)}(t',\theta)}{D_n^{(0)}(t',\theta)\,d^{(0)}(t',\theta)}\Bigr\}\,n^{1/2}\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\}\,\delta\,dP_{\varepsilon_0,\Delta,Z}(t,\delta,z)\Bigr| \\
&\quad \le 1\cdot\Bigl\|\frac{d^{(1)}(t,\theta)}{d^{(0)}(t,\theta)^2} - \frac{D_n^{(1)}(t,\theta)}{D_n^{(0)}(t,\theta)\,d^{(0)}(t,\theta)}\Bigr\|\cdot\|n^{1/2}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\}\|\cdot 1 \\
&\quad = 1\cdot o_{p^*}(1)\cdot O_{p^*}(1)\cdot 1 = o_{p^*}(1)
\end{align*}
by the tail bounds for the supremum of empirical processes in [27], Section 2.14. Similarly, we have
\[
n^{1/2}P[\Omega\hat\rho_n(\varepsilon_\theta,\theta)\{\hat\eta_n(\varepsilon_0,\theta_0) - \eta_0(\varepsilon_0,\theta_0)\}\Delta] = \int\mathbb{G}_n\hat\rho_n(t',\theta)\,\ell(t,\theta_0,W,Z,\varepsilon_0)\,dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\]
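The linearization in (6.9) rests on an exact algebraic identity for a difference of ratios, which may be worth writing out as a side check. Suppressing the arguments $(t',\theta)$ and assuming $D_n^{(0)} > 0$ and $d^{(0)} > 0$ (as holds for $t \le \tau$):

```latex
\begin{align*}
\frac{D_n^{(1)}}{D_n^{(0)}} - \frac{d^{(1)}}{d^{(0)}}
  &= \frac{D_n^{(1)} d^{(0)} - d^{(1)} D_n^{(0)}}{D_n^{(0)} d^{(0)}}
   = \frac{D_n^{(0)}\{D_n^{(1)} - d^{(1)}\} - D_n^{(1)}\{D_n^{(0)} - d^{(0)}\}}{D_n^{(0)} d^{(0)}} \\
  &= \frac{1}{d^{(0)}}\{D_n^{(1)} - d^{(1)}\}
   - \frac{D_n^{(1)}}{D_n^{(0)} d^{(0)}}\{D_n^{(0)} - d^{(0)}\}.
\end{align*}
```

Replacing the random coefficient $D_n^{(1)}/(D_n^{(0)} d^{(0)})$ by its limit $d^{(1)}/(d^{(0)})^2$ changes the expression by the product of a term that is $o_{p^*}(1)$ and the term $\{D_n^{(0)} - d^{(0)}\}$, which is $O_{p^*}(n^{-1/2})$; after scaling by $n^{1/2}$ the difference is therefore $o_{p^*}(1)$.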

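Thus, (6.8) becomes
\begin{align*}
&-n^{1/2}P[\hat\rho_n(\varepsilon_\theta,\theta)\{\eta_0(\varepsilon_\theta,\theta) - \eta_0(\varepsilon_0,\theta_0)\}\Delta] \\
&\quad - \int\mathbb{G}_n\hat\rho_n(t',\theta)\{\ell(t',\theta,W,Z,\varepsilon_\theta) - \ell(t,\theta_0,W,Z,\varepsilon_0)\}\,dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1). \tag{6.10}
\end{align*}
Note that $n^{1/2}\{\eta_0(\varepsilon_\theta,\theta) - \eta_0(\varepsilon_0,\theta_0)\} = n^{1/2}(\theta - \theta_0)\{\dot\eta_\theta(\varepsilon_{\theta^*},\theta^*)\}$ is bounded (by assumptions of bounded density functions for failure and censoring times in [30]), where $\dot\eta_\theta$ denotes the derivative of $\eta_0$ with respect to $\theta$, and $\theta^*$ is a point between $\theta$ and $\theta_0$. Thus, by repeatedly using the dominated convergence theorem, we know that the first term in (6.10) equals
\[
-n^{1/2}(\theta - \theta_0)P\{\rho_0(\varepsilon_\theta,\theta)\dot\eta_\theta(\varepsilon_0,\theta_0)\Delta\} + o_{p^*}(1),
\]
which in turn equals
\[
-n^{1/2}(\theta - \theta_0)P\{\rho_0(\varepsilon_0,\theta_0)\dot\eta_\theta(\varepsilon_0,\theta_0)\Delta\} + o_{p^*}(1).
\]
It can be verified that $\hat\rho_n(t',\theta)\{\ell(t',\theta,W,Z,\varepsilon_\theta) - \ell(t,\theta_0,W,Z,\varepsilon_0)\}$ converges to zero in quadratic mean; thus,
\[
\|\mathbb{G}_n\hat\rho_n(t',\theta)\{\ell(t',\theta,W,Z,\varepsilon_\theta) - \ell(t,\theta_0,W,Z,\varepsilon_0)\}\| = o_{p^*}(1),
\]
and the second term in (6.10) converges to zero in outer probability. So we have shown that term (6.5) is asymptotically equivalent to $-n^{1/2}(\theta - \theta_0)P\{\rho_0(\varepsilon_0,\theta_0)\dot\eta_\theta(\varepsilon_0,\theta_0)\Delta\}$.

We now consider term (6.6), which can be rewritten as
\begin{align*}
&n^{1/2}\mathbb{P}_n[\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_\theta,\theta) - \hat\rho_n(\varepsilon_0,\theta_0)\}] \\
&\quad = \mathbb{G}_n[\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_\theta,\theta) - \hat\rho_n(\varepsilon_0,\theta_0)\}] \tag{6.11} \\
&\qquad + n^{1/2}P[\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_\theta,\theta) - \hat\rho_n(\varepsilon_0,\theta_0)\}]. \tag{6.12}
\end{align*}
Because $\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_\theta,\theta) - \hat\rho_n(\varepsilon_0,\theta_0)\}$ belongs to a Donsker class and converges to zero in quadratic mean, we know that term (6.11) converges to zero in outer probability.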
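Similar to the calculation in (6.9), for (6.12), we have
\begin{align*}
&n^{1/2}P[\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_\theta,\theta) - \rho_0(\varepsilon_\theta,\theta)\}] \\
&\quad = n^{1/2}\int\{z - \hat\eta_n(t,\theta_0)\}\Bigl[\{D_n^{(0)}(t',\theta) - d^{(0)}(t',\theta)\} - \frac{D_n^{(0)}(t',\theta)}{C_n}\{C_n - 1\}\Bigr]dP_{\varepsilon_0,\Delta,Z}(t,1,z) \\
&\quad = n^{1/2}\int\{z - \hat\eta_n(t,\theta_0)\}\bigl[\{D_n^{(0)}(t',\theta) - C_n d^{(0)}(t',\theta)\}\bigr]dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1) \tag{6.13} \\
&\quad = \int\mathbb{G}_n[\{z - \hat\eta_n(t,\theta_0)\}W\{1(\varepsilon_\theta \ge t') - d^{(0)}(t',\theta)\}]\,dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}
Similarly, we have
\begin{align*}
&n^{1/2}P[\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_0,\theta_0) - \rho_0(\varepsilon_0,\theta_0)\}] \\
&\quad = \int\mathbb{G}_n[\{z - \hat\eta_n(t,\theta_0)\}W\{1(\varepsilon_0 \ge t) - d^{(0)}(t,\theta_0)\}]\,dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}
Then, term (6.12) becomes
\begin{align*}
&n^{1/2}P[\Omega\{Z - \hat\eta_n(\varepsilon_0,\theta_0)\}\Delta\{\rho_0(\varepsilon_\theta,\theta) - \rho_0(\varepsilon_0,\theta_0)\}] \\
&\quad + \int\mathbb{G}_n\{z - \hat\eta_n(t,\theta_0)\}W\bigl[\{1(\varepsilon_\theta \ge t') - d^{(0)}(t',\theta)\} - \{1(\varepsilon_0 \ge t) - d^{(0)}(t,\theta_0)\}\bigr]dP_{\varepsilon_0,\Delta,Z}(t,1,z) + o_{p^*}(1).
\end{align*}
Similar to the arguments following (6.10), we know that the first term above is asymptotically equivalent to $n^{1/2}(\theta - \theta_0)P[\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta\dot\rho_\theta(\varepsilon_0,\theta_0)]$, and the second term above is $o_{p^*}(1)$. So, term (6.6) can be replaced by $n^{1/2}(\theta - \theta_0)P[\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta\dot\rho_\theta(\varepsilon_0,\theta_0)] + o_{p^*}(1)$.

Then, from the above calculation for terms (6.5) and (6.6), we obtain
\begin{align*}
&n^{1/2}\{\Psi_n(\theta, \hat\eta_n(\cdot,\theta), \hat\rho_n(\cdot,\theta)) - \Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0))\} \\
&\quad = -n^{1/2}(\theta - \theta_0)P\{\rho_0(\varepsilon_0,\theta_0)\dot\eta_\theta(\varepsilon_0,\theta_0)\Delta\} \tag{6.14} \\
&\qquad + n^{1/2}(\theta - \theta_0)P[\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta\dot\rho_\theta(\varepsilon_0,\theta_0)] + o_{p^*}(1) \\
&\quad = n^{1/2}(\theta - \theta_0)\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1),
\end{align*}
which yields the asymptotic linearity (3.2) when $\theta$ is replaced by $\hat\theta_n$. In fact, in the above expression, we have $P[\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta\dot\rho_\theta(\varepsilon_0,\theta_0)] = 0$, given the equality $\eta_0(\varepsilon_0,\theta_0) = E(Z \mid \varepsilon_0, \Delta = 1)$, which can be verified directly (see, also, [21]).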
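We keep it in the above calculation so as to clearly show the relationship of $\dot\Psi_\theta$ and $(\dot\eta_\theta, \dot\rho_\theta)$.

Since $\hat\theta_n$ satisfies $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) = o_{p^*}(n^{-1/2})$, showing asymptotic normality for $n^{1/2}(\hat\theta_n - \theta_0)$ is equivalent to showing asymptotic normality for $n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0))$. The following shows the calculation. By adding, subtracting and rearranging terms, we have
\begin{align*}
&n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) \\
&\quad = \mathbb{G}_n[\Omega\rho_0(\varepsilon_0,\theta_0)\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta] \\
&\qquad - \mathbb{G}_n[\Omega\hat\rho_n(\varepsilon_0,\theta_0)\{\hat\eta_n(\varepsilon_0,\theta_0) - \eta_0(\varepsilon_0,\theta_0)\}\Delta] \tag{6.15} \\
&\qquad + \mathbb{G}_n[\Omega\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_0,\theta_0) - \rho_0(\varepsilon_0,\theta_0)\}] \tag{6.16} \\
&\qquad - n^{1/2}P[\Omega\hat\rho_n(\varepsilon_0,\theta_0)\{\hat\eta_n(\varepsilon_0,\theta_0) - \eta_0(\varepsilon_0,\theta_0)\}\Delta] \tag{6.17} \\
&\qquad + n^{1/2}P[\Omega\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta\{\hat\rho_n(\varepsilon_0,\theta_0) - \rho_0(\varepsilon_0,\theta_0)\}]. \tag{6.18}
\end{align*}
Repeatedly using similar arguments, we can show that terms (6.15) and (6.16) are $o_{p^*}(1)$. Term (6.17) can be calculated similarly, as in (6.9), but with $t = t'$, so that the lower case variable $z$ is not involved in the integrand, and $\hat\rho_n$ can be further replaced by $\rho_0$. Term (6.18) can be calculated similarly, as in (6.13). We then have
\begin{align*}
&n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) \\
&\quad = \mathbb{G}_n\Bigl[\Omega\rho_0(\varepsilon_0,\theta_0)\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta \\
&\qquad\quad - \int\rho_0(t,\theta_0)\,d^{(0)}(t,\theta_0)^{-1}W 1(\varepsilon_0 \ge t)\{Z - \eta_0(t,\theta_0)\}\,dP_{\varepsilon_0,\Delta}(t,1) \\
&\qquad\quad + \int\{z - \eta_0(t,\theta_0)\}W\{1(\varepsilon_0 \ge t) - d^{(0)}(t,\theta_0)\}\,dP_{\varepsilon_0,\Delta,Z}(t,1,z)\Bigr] + o_{p^*}(1) \tag{6.19} \\
&\quad = \mathbb{G}_n\Bigl[\Omega\rho_0(\varepsilon_0,\theta_0)\{Z - \eta_0(\varepsilon_0,\theta_0)\}\Delta \\
&\qquad\quad - \int\rho_0(t,\theta_0)W 1(\varepsilon_0 \ge t)\{Z - \eta_0(t,\theta_0)\}\,d\Lambda_0(t)\Bigr] + o_{p^*}(1), \tag{6.20}
\end{align*}
which converges in distribution to a normal random variable by the central limit theorem, because the influence function in the above expression is bounded. Here, $\Lambda_0$ is the cumulative hazard function of $e = T - \theta_0'Z$. So, from equation (3.2), we know that $n^{1/2}(\hat\theta_n - \theta_0)$ is asymptotically normal with asymptotic representation (3.3) if $\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0))$ is nonsingular.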
That the term (6.19), yielded by estimating the weight function $\rho_0(t,\theta)$, is equal to zero can be verified directly, again, by using the equality $\eta_0(\varepsilon_0,\theta_0) = E(Z \mid \varepsilon_0, \Delta = 1)$. Term (6.20) is obtained from the following calculation:
\begin{align*}
d^{(0)}(t,\theta_0) &= P\{W 1(Y - \theta_0'Z \ge t)\} \\
&= P\{1(Y - \theta_0'Z \ge t)\} \\
&= E[E\{1(Y - \theta_0'Z \ge t) \mid Z\}] \\
&= E[\Pr(T - \theta_0'Z \ge t \mid Z)\Pr(C - \theta_0'Z \ge t \mid Z)] \\
&= \int\exp\{-\Lambda_0(t)\}\{1 - G(t \mid z)\}\,dH(z),
\end{align*}
where $G(\cdot \mid z)$ is the conditional distribution function of the centered censoring time $C - \theta_0'Z$ given $Z = z$, and $H$ is the marginal distribution function of covariate $Z$. On the other hand, from the joint distribution of $(\varepsilon_0, \Delta, Z)$, we obtain
\[
dP_{\varepsilon_0,\Delta}(t,1) = \Bigl[\int\exp\{-\Lambda_0(t)\}\{1 - G(t \mid z)\}\,dH(z)\Bigr]d\Lambda_0(t) = d^{(0)}(t,\theta_0)\,d\Lambda_0(t).
\]
That term (6.19) is zero becomes even more straightforward from term (6.18) if the weight function $\rho_0$ is given and, thus, need not be estimated (e.g., $\hat\rho_n = \rho_0 = 1$).

6.4. Proof of Theorem 3.4.

We will sequentially show consistency, root-$n$ rate convergence and the asymptotic normality of $\hat\theta^*_n$. It is easy to see that $\{W(x;\alpha) : \alpha \in \mathcal{A}\}$ is Lipschitz in $\alpha$ and, hence, Donsker (see Example 3.2.12 of [27]), so we have that $\{\hat\eta^*_n\}$ and $\{\hat\rho^*_n\}$ are Donsker (see Section 2.10 of [27]). Based on the smoothness of $W(X;\alpha)$ in $\alpha$ and the structures of $\hat\eta_n$, $\hat\rho_n$, $\hat\eta^*_n$ and $\hat\rho^*_n$ given in (2.2), (2.3), (3.5) and (3.6), we have
\[
\|W(X;\hat\alpha_n) - W(X;\alpha_0)\| \to 0, \qquad \|\hat\eta^*_n - \hat\eta_n\| \to 0 \qquad\text{and}\qquad \|\hat\rho^*_n - \hat\rho_n\| \to 0
\]
as $\hat\alpha_n \to \alpha_0$. The above three quantities are actually $O_{p^*}(n^{-1/2})$ by the root-$n$ consistency of $\hat\alpha_n$ and the smoothness assumption of $W(X;\alpha)$. Thus, with $\Omega_i$ replaced by $W_i$ in $\Psi_n$, we have
\begin{align*}
&\|\Psi^*_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta)) - \Psi_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta))\| \\
&\quad \le \|W(X;\hat\alpha_n) - W(X;\alpha_0)\|\,\|\hat\rho^*_n(\varepsilon_\theta,\theta)\{Z - \hat\eta^*_n(\varepsilon_\theta,\theta)\}\Delta\| \tag{6.21} \\
&\quad = o_{p^*}(1)
\end{align*}
by the boundedness of $\hat\rho^*_n(\varepsilon_\theta,\theta)\{Z - \hat\eta^*_n(\varepsilon_\theta,\theta)\}\Delta$. By (6.1), (6.2) and the triangle inequality, we have $\|\hat\eta^*_n - \eta_0\| \to 0$ and $\|\hat\rho^*_n - \rho_0\| \to 0$ in outer probability, and thus
\[
\|\Psi_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta)) - \Psi_n(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\| = o_{p^*}(1),
\]
since Donsker implies Glivenko–Cantelli. Hence, by the triangle inequality we have
\[
\|\Psi^*_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\| = o_{p^*}(1),
\]
which yields the consistency of $\hat\theta^*_n$ by the same argument as in the proof of Theorem 3.1.

From (6.21), we know that
\[
\|n^{1/2}\{\Psi^*_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta)) - \Psi_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta))\}\| = O_{p^*}(1).
\]
Replacing $(\hat\eta_n, \hat\rho_n)$ with $(\hat\eta^*_n, \hat\rho^*_n)$ in (6.3), we obtain
\[
\|n^{1/2}\{\Psi_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\| = O_{p^*}(1).
\]
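Hence, by applying the triangle inequality, we have
\[
\|n^{1/2}\{\Psi^*_n(\theta, \hat\eta^*_n(\cdot,\theta), \hat\rho^*_n(\cdot,\theta)) - \Psi(\theta, \eta_0(\cdot,\theta), \rho_0(\cdot,\theta))\}\| = O_{p^*}(1),
\]
and the same calculation as in (6.4), with $\Psi_n$ replaced by $\Psi^*_n$ and $\hat\theta_n$ replaced by $\hat\theta^*_n$, shows that $n^{1/2}(\hat\theta^*_n - \theta_0) = O_{p^*}(1)$.

We now prove the asymptotic normality of $n^{1/2}(\hat\theta^*_n - \theta_0)$. Consider the following decomposition:
\begin{align*}
&n^{1/2}\Psi^*_n(\hat\theta^*_n, \hat\eta^*_n(\cdot,\hat\theta^*_n), \hat\rho^*_n(\cdot,\hat\theta^*_n)) \\
&\quad = n^{1/2}\Psi^*_n(\hat\theta^*_n, \hat\eta^*_n(\cdot,\hat\theta^*_n), \hat\rho^*_n(\cdot,\hat\theta^*_n)) - n^{1/2}\Psi_n(\hat\theta^*_n, \hat\eta_n(\cdot,\hat\theta^*_n), \hat\rho_n(\cdot,\hat\theta^*_n)) \tag{6.22} \\
&\qquad + n^{1/2}\Psi_n(\hat\theta^*_n, \hat\eta_n(\cdot,\hat\theta^*_n), \hat\rho_n(\cdot,\hat\theta^*_n)) - n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) \tag{6.23} \\
&\qquad + n^{1/2}\Psi_n(\theta_0, \hat\eta_n(\cdot,\theta_0), \hat\rho_n(\cdot,\theta_0)) - n^{1/2}\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)) \tag{6.24} \\
&\qquad + n^{1/2}\Psi_n(\hat\theta_n, \hat\eta_n(\cdot,\hat\theta_n), \hat\rho_n(\cdot,\hat\theta_n)). \tag{6.25}
\end{align*}
Then, applying (6.14) to (6.23) and (6.24), respectively, we can replace (6.23) with
\[
n^{1/2}(\hat\theta^*_n - \theta_0)\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1) \tag{6.26}
\]
and replace (6.24) with
\[
-n^{1/2}(\hat\theta_n - \theta_0)\dot\Psi_\theta(\theta_0, \eta_0(\cdot,\theta_0), \rho_0(\cdot,\theta_0)) + o_{p^*}(1). \tag{6.27}
\]
Term (6.25), clearly, is $o_{p^*}(1)$. We then calculate term (6.22). Let
\begin{align*}
\hat\eta_{n,\alpha}(t,\theta) &= \mathbb{P}_n\{W(X;\alpha)1(\varepsilon_\theta \ge t)Z\}/\mathbb{P}_n\{W(X;\alpha)1(\varepsilon_\theta \ge t)\}, \\
\hat\rho_{n,\alpha}(t,\theta) &= \mathbb{P}_n\{W(X;\alpha)1(\varepsilon_\theta \ge t)\}/\mathbb{P}_n\{W(X;\alpha)\}.
\end{align*}
Then, we have $\hat\eta_n \equiv \hat\eta_{n,\alpha_0}$, $\hat\rho_n \equiv \hat\rho_{n,\alpha_0}$, $\hat\eta^*_n \equiv \hat\eta_{n,\hat\alpha_n}$ and $\hat\rho^*_n \equiv \hat\rho_{n,\hat\alpha_n}$. Let
\[
\Phi_n(\alpha,\theta) = \mathbb{P}_n[W(X;\alpha)\hat\rho_{n,\alpha}(\varepsilon_\theta,\theta)\{Z - \hat\eta_{n,\alpha}(\varepsilon_\theta,\theta)\}\Delta].
\]
It can be seen by direct calculation that the second derivative of $\Phi_n(\alpha,\theta)$ with respect to $\alpha$ is bounded with outer probability 1.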
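So, by the Taylor expansion, we have
\[
\Psi^*_n(\theta, \hat\eta^*_n, \hat\rho^*_n) - \Psi_n(\theta, \hat\eta_n, \hat\rho_n) = \Phi_n(\hat\alpha_n,\theta) - \Phi_n(\alpha_0,\theta) = \dot\Phi_{n,\alpha}(\alpha_0,\theta)(\hat\alpha_n - \alpha_0) + o_{p^*}(n^{-1/2}),
\]
where
\begin{align*}
\dot\Phi_{n,\alpha}(\alpha_0,\theta) = \mathbb{P}_n\Bigl[&\hat\rho_{n,\alpha_0}(\varepsilon_\theta,\theta)\{Z - \hat\eta_{n,\alpha_0}(\varepsilon_\theta,\theta)\}\frac{\partial W(X;\alpha)}{\partial\alpha'}\Bigr|_{\alpha=\alpha_0}\Delta \\
&+ W(X;\alpha_0)\{Z - \hat\eta_{n,\alpha_0}(\varepsilon_\theta,\theta)\}\frac{\partial\hat\rho_{n,\alpha}(\varepsilon_\theta,\theta)}{\partial\alpha'}\Bigr|_{\alpha=\alpha_0}\Delta \\
&+ W(X;\alpha_0)\hat\rho_{n,\alpha_0}(\varepsilon_\theta,\theta)\Bigl\{-\frac{\partial\hat\eta_{n,\alpha}(\varepsilon_\theta,\theta)}{\partial\alpha'}\Bigr\}_{\alpha=\alpha_0}\Delta\Bigr].
\end{align*}
It is also easy to see, by direct calculation, that $\{\partial\hat\eta_{n,\alpha}/\partial\alpha|_{\alpha=\alpha_0} : \theta \in \Theta\}$ and $\{\partial\hat\rho_{n,\alpha}/\partial\alpha|_{\alpha=\alpha_0} : \theta \in \Theta\}$ are (componentwise) Glivenko–Cantelli, so, with outer probability 1, we have
\begin{align*}
\dot\Phi_{n,\alpha}(\alpha_0,\hat\theta^*_n) &\to P[\rho_0(\varepsilon_0,\theta_0)\{Z - \eta_0(\varepsilon_0,\theta_0)\}(\dot W_\alpha(X;\alpha_0))'\Delta] \\
&\quad + P[W(X;\alpha_0)\{Z - \eta_0(\varepsilon_0,\theta_0)\}A_1(\varepsilon_0,\theta_0)\Delta] \tag{6.28} \\
&\quad - P[W(X;\alpha_0)\rho_0(\varepsilon_0,\theta_0)A_2(\varepsilon_0,\theta_0)\Delta] \\
&= P[\rho_0(\varepsilon_0,\theta_0)\{Z - \eta_0(\varepsilon_0,\theta_0)\}(\dot W_\alpha(X;\alpha_0))'\Delta] - P[\rho_0(\varepsilon_0,\theta_0)A_2(\varepsilon_0,\theta_0)\Delta] \\
&\equiv -B,
\end{align*}
where $\dot W_\alpha(X;\alpha_0) = \partial W(X;\alpha)/\partial\alpha|_{\alpha=\alpha_0}$, $A_1$ is the limit of $\partial\hat\rho_{n,\alpha}/\partial\alpha'|_{\alpha=\alpha_0,\theta=\hat\theta^*_n}$ and $A_2$ is the limit of $\partial\hat\eta_{n,\alpha}/\partial\alpha'|_{\alpha=\alpha_0,\theta=\hat\theta^*_n}$. The term (6.28) is zero since $E(Z \mid \varepsilon_0, \Delta = 1) = \eta_0(\varepsilon_0,\theta_0)$. Note that $E(W \mid X) = 1$ is also used in the above calculation. It can be directly verified that
\[
A_2(t,\theta_0) = \frac{1}{P\{1(\varepsilon_0 \ge t)\}}\bigl[P\{1(\varepsilon_0 \ge t)Z(\dot W_\alpha(X;\alpha_0))'\} - \eta_0(t,\theta_0)P\{(\dot W_\alpha(X;\alpha_0))'1(\varepsilon_0 \ge t)\}\bigr].
\]
Hence, we have
\[
\Psi^*_n(\theta, \hat\eta^*_n, \hat\rho^*_n) - \Psi_n(\theta, \hat\eta_n, \hat\rho_n) = -B(\hat\alpha_n - \alpha_0) + o_{p^*}(n^{-1/2}). \tag{6.29}
\]
Replacing (6.22), (6.23) and (6.24) by (6.29), (6.26) and (6.27), respectively, we obtain
\[
n^{1/2}(\hat\theta^*_n - \theta_0) = n^{1/2}(\hat\theta_n - \theta_0) + \{\dot\Psi_\theta(\theta_0, \eta_0, \rho_0)\}^{-1}Bn^{1/2}(\hat\alpha_n - \alpha_0) + o_{p^*}(1).
\]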
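The role of $\Phi_n(\alpha,\theta)$ in this expansion can be illustrated numerically. The sketch below is not taken from the paper: it assumes a scalar covariate, a scalar $\alpha$, and one simple weight model for Bernoulli case-cohort sampling, $W(X;\alpha) = \Delta + (1-\Delta)\xi/\alpha$, where $\xi$ is a hypothetical subcohort indicator. It evaluates $\Phi_n$ and approximates $\dot\Phi_{n,\alpha}$ by a central finite difference; all function names are illustrative.

```python
import numpy as np

def case_cohort_weight(delta, xi, alpha):
    """Assumed weight model: cases get weight 1, sampled non-cases get 1/alpha."""
    return delta + (1.0 - delta) * xi / alpha

def phi_n(alpha, theta, y, z, delta, xi):
    """Phi_n(alpha, theta) = P_n[W rho_hat {Z - eta_hat} Delta], scalar covariate."""
    w = case_cohort_weight(delta, xi, alpha)
    eps = y - theta * z
    # at_risk[i, j] = 1(eps_j >= eps_i): subject j is at risk at residual time eps_i
    at_risk = (eps[None, :] >= eps[:, None]).astype(float)
    d0 = at_risk @ w / len(y)                 # D_n^(0) at each eps_i
    d1 = at_risk @ (w * z) / len(y)           # D_n^(1) at each eps_i
    eta_hat = np.divide(d1, d0, out=np.zeros_like(d1), where=d0 > 0)
    rho_hat = d0 / np.mean(w)
    return np.mean(w * rho_hat * (z - eta_hat) * delta)

def dphi_dalpha(alpha, theta, y, z, delta, xi, h=1e-6):
    """Central finite-difference approximation to the derivative of Phi_n in alpha."""
    up = phi_n(alpha + h, theta, y, z, delta, xi)
    down = phi_n(alpha - h, theta, y, z, delta, xi)
    return (up - down) / (2.0 * h)

# Toy data (hypothetical): two cases, one sampled and one unsampled non-case.
y = np.array([1.0, 2.0, 3.0, 4.0])
z = np.array([1.0, 0.0, 1.0, 0.0])
delta = np.array([1.0, 0.0, 1.0, 0.0])
xi = np.array([1.0, 1.0, 0.0, 0.0])
phi_value = phi_n(0.5, 0.0, y, z, delta, xi)
slope = dphi_dalpha(0.5, 0.0, y, z, delta, xi)
```

Under these assumptions, the first-order expansion in the text says that $\Phi_n(\hat\alpha_n,\theta) - \Phi_n(\alpha_0,\theta)$ is approximately `slope` times $(\hat\alpha_n - \alpha_0)$ when $\hat\alpha_n$ is close to $\alpha_0$. When every subject is a case, the assumed weights do not depend on $\alpha$ and the derivative vanishes.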

By (3.3) we know that $\hat\theta_n$ is an asymptotically linear estimator. Given that $\hat\alpha_n$ is also an asymptotically linear estimator, we know that $n^{1/2}(\hat\theta_n - \theta_0)$ and $n^{1/2}(\hat\alpha_n - \alpha_0)$ are asymptotically jointly normal by the multivariate central limit theorem. Hence, by [18], we know that $n^{1/2}(\hat\theta^*_n - \theta_0)$ is asymptotically normal with variance given in (3.7).

REFERENCES

[1] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore. MR1245941
[2] Borgan, O., Langholz, B., Samuelsen, S. O., Goldstein, L. and Pogoda, J. (2000). Exposure stratified case-cohort designs. Lifetime Data Anal.
[3] Breslow, N. E. and Wellner, J. A. (2007). Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand. J. Statist.
[4] Buckley, J. and James, I. R. (1979). Linear regression with censored data. Biometrika.
[5] Chen, K. and Lo, S.-H. (1999). Case-cohort and case-control analysis with Cox’s model. Biometrika.
[6] Cox, D. R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. Ser. B.
[7] Fygenson, M. and Ritov, Y. (1994). Monotone estimating equations for censored data. Ann. Statist.
[8] Hu, H. (1998). Large sample theory for pseudo-maximum likelihood estimates in semiparametric models. Ph.D. dissertation, Dept. Statistics, Univ. Washington.
[9] Huang, Y. (2002). Calibration regression of censored lifetime medical cost. J. Amer. Statist. Assoc.
[10] Jin, Z., Ying, Z. and Wei, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika.
[11] Kalbfleisch, J. D. and Lawless, J. F. (1988). Likelihood analysis of multi-state models for disease incidence and mortality. Stat. Med.
[12] Kalbfleisch, J. D. and Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data, 2nd ed. Wiley, New York. MR1924807
[13] Kulich, M. and Lin, D. Y. (2004). Improving the efficiency of relative-risk estimation in case-cohort studies. J. Amer. Statist. Assoc.
[14] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley, Hoboken, NJ. MR1925014
[15] Nan, B. and Wellner, J. A. (2006). Semiparametric pseudo Z-estimation with applications. Technical report, Dept. Biostatistics, Univ. Michigan.
[16] Nan, B., Yu, M. and Kalbfleisch, J. D. (2006). Censored linear regression for case-cohort studies. Biometrika.
[17] Parzen, M. I., Wei, L. J. and Ying, Z. (1994). A resampling method based on pivotal estimating functions. Biometrika.
[18] Pierce, D. A. (1982). The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann. Statist.
[19] Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika.
[20] Pugh, M., Robins, J., Lipsitz, S. and Harrington, D. (1994). Inference in the Cox proportional hazards model with missing covariates. Technical Report 758Z, Harvard School of Public Health, Boston, MA.
[21] Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist.
[22] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc.
[23] Self, S. G. and Prentice, R. L. (1988). Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Statist.
[24] Stute, W. (1993). Consistent estimation under random censorship when covariables are available. J. Multivariate Anal.
[25] Stute, W. (1996). Distributional convergence under random censorship when covariables are present. Scand. J. Statist.
[26] Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. Ann. Statist.
[27] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York. MR1385671
[28] van der Vaart, A. W. and Wellner, J. A. (2000). Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In High Dimensional Probability II (E. Giné, D. Mason and J. A. Wellner, eds.) 115–134. Birkhäuser, Boston. MR1857319
[29] Wei, L. J., Ying, Z. L. and Lin, D. Y. (1990). Linear regression analysis for censored survival data based on rank tests. Biometrika.
[30] Ying, Z. (1993). A large sample study of rank estimation for censored regression data. Ann. Statist.
[31] Yu, M. and Nan, B. (2006). A hybrid Newton-type method for censored survival data using double weights in linear models. Lifetime Data Anal.

B. Nan
J. D. Kalbfleisch
Department of Biostatistics
University of Michigan
1420 Washington Heights
Ann Arbor, Michigan 48109-2029
USA
E-mail: [email protected]fl@umich.edu