The Propensity Score Estimation in the Presence of Length-biased Sampling: A Nonparametric Adjustment Approach
TThe Propensity Score Estimation in the Presenceof Length-biased Sampling: A NonparametricAdjustment Approach
By Ashkan Ertefaie
Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA [email protected] Masoud Asgharian and David Stephens
Department of Mathematics and Statistics, McGill University, Montreal, Quebec,Canada. [email protected] [email protected] Summary
The pervasive use of prevalent cohort studies on disease duration, increasingly calls forappropriate methodologies to account for the biases that invariably accompany samplesformed by such data. It is well-known, for example, that subjects with shorter lifetimeare less likely to be present in such studies. Moreover, certain covariate values could be preferentially selected into the sample, being linked to the long-term survivors. The exist-ing methodology for estimation of the propensity score using data collected on prevalentcases requires the correct conditional survival/hazard function given the treatment andcovariates. This requirement can be alleviated if the disease under study has stationaryincidence, the so-called stationarity assumption. We propose a nonparametric adjust- ment technique based on a weighted estimating equation for estimating the propensityscore which does not require modeling the conditional survival/hazard function when thestationarity assumption holds. Large sample properties of the estimator is establishedand its small sample behavior is studied via simulation. Some key words : Propensity score; Length-biased sampling; Causal inference; Length-biased sampling. . Introduction Survival or failure time data typically comprise an initiating event, say onset of a dis-ease, and a terminating event, say death. In an ideal situation, recruited subjects havenot experienced the initiating event, the so-called incident cases. These cases are thenfollowed to a terminating event or censoring, say loss to follow-up. In many practical situ- ations, however, recruiting incident cases is infeasible due to logistic or other constraints.In such circumstances, subjects who have experienced the initiating event prior to thestart of the study, so-called prevalent cases, are recruited. It is well known that thesecases tend to have a longer survival time, and hence form a biased sample from the targetpopulation. This bias is termed length bias when the initiating events are generated by a stationary Poisson process (Cox & Lewis (1966), Zelen & Feinlein (1969)), the so-colled stationarity assumption. a r X i v : . [ m a t h . S T ] N ov A. Ertefaie, M. Asgharian and D. Stephenes
In observational studies, treatment is assigned to the experimental units without ran-domization. Thus, in each treatment group, the covariate distributions may be imbal-anced which may lead to bias in estimating the treatment effect if the covariate imbalance is not properly taken into account (Cochran & Rubin (1973), Rubin (1973)). Propensityscore is a tool that is widely used in causal inference to adjust for this source of bias(see Robins et al. (2000), Hern´an et al. (2000)). Rosenbaum & Rubin (1983) define thepropensity score for a binary treatment D as p ( D = 1 | X ) where X is a vector of measuredcovariates. They show that under some assumptions, treatment is independent of the co- variates inside each propensity score stratum (the balancing property of the propensityscore).In cases where the sample is not representative of the population, naive propensityscore estimation may not have the balancing property. Cheng & Wang (2012) develop amethod that consistently estimates the parameters of the propensity score from prevalent survival data. They also, present a method that can be used in a special case of length-biased sampling. Their method requires correct specification of the conditional hazardmodel given the treatment and covariates. We refer to their estimator as CW in thesequel.Our goal is to develop a method that estimates the propensity score using a weighted logistic regression where weights are estimated nonparametrically. Our estimating equa-tion is designed specifically for the length-biased data, i.e., disease with stationary in-cidence. Unlike the method proposed by CW, our method does not require any modelspecification for the conditional failure time given the exposure and the covariates.Studies on length-biased sampling can be traced as far back as Wicksell (1925) and his corpuscle problem. The phenomenon was later noticed by Fisher (1934) in his article onmethods of ascertainment. Neyman (1955) discussed length-biased sampling further andcoined the term incidence-prevalence bias. Cox (1969) studied length-biased sampling inindustrial applications, while Zelen & Feinlein (1969) observed the same bias in screeningtests for disease prevalence (Asgharian et al. (2002), Asgharian & Wolfson (2005)). More recently, Shen et al. (2009), Qin & Shen (2010) and Ning et al. (2010) have studied theanalysis of covariates under biased sampling.2 . Length-biased sampling In this section, we introduce concepts and notations necessary to formulate problemsinvolving length-biased sampling. We adopt the common modeling framework for preva- lent cohort studies. We assume that affected individuals in the study population developthe condition of interest ( onset ) according to some stochastic mechanism at randompoints in calendar time, and undergo a terminal event ( failure ) at some subsequent timepoint that is also determined by a stochastic mechanism. Individuals enter into the studyat some census time, and are followed up until the terminal or censoring event. · . Notations Let T pop be the time measured from the onset to failure time in the target populationwith an absolutely continuous distribution F and density f . Also, let D pop and X pop be the binary treatment variable and the vector of covariates, respectively. Let T bethe same measured time for observed subjects with distribution F LB . The variables with superscript pop represent the population variables; variables without pop denotethe observed truncated variables. It is well known that if the onset times are generated iscellanea F LB ( t ) = (cid:90) t s dF ( s ) (cid:90) ∞ s dF ( s ) = 1 µ (cid:90) t s dF ( s ) and f LB ( t ) = tf ( t ) µ (1)where f LB is the density function of F LB and µ is the mean survival time under F . Theobserved event time, T , can be written A + R , when A is the time from the onset of the disease to the recruitment time, and R , the residual life time, is the time from recruitmentto the event, also called backward and forward recurrence times, respectively. Whenthe individuals are also subject to right-censoring, the observed survival time is Y = A + min( R, C ), where C is the censoring time measured from the recruitment to the lossto follow-up; for all subjects, both A and min( R, C ) are observed. The censoring indicator is denoted by δ ( δ = 1 indicating failure). The sample consists of ( y i , a i , δ i , d i , x i ) for n independent subjects. The following diagram illustrates the different random quantitiesintroduced in this Section. (cid:31) (cid:111) (cid:111) T = A + R (cid:31) (cid:47) (cid:47) (cid:31) (cid:47) (cid:47) A Onset Recruitment (cid:31) (cid:111) (cid:111) (cid:31) (cid:111) (cid:111) C ◦ (cid:31) (cid:47) (cid:47) (cid:31) (cid:47) (cid:47) R × (cid:31) (cid:111) (cid:111) Throughout the paper, we assume that the following assumptions hold:
A1.
The variable ( T pop , D pop , X pop ) is independent of the calendar time of the onset of the disease. A2.
The disease has stationary incidence, i.e., the disease incidence occurs at a constantrate.
A3.
The censoring time C is independent of ( A, R ) given the covariates ( D, X ). A4.
The censoring time C is independent of the covariates ( D, X ). Let f ( t | d, x ) be the unbiased conditional density of survival times given the covariatesand treatment. Then, under assumptions A A
2, the joint density of (
A, T ) given( D, X ) is f ( t | d, x ) / (cid:82) ∞ uf ( u | d, x ) duI ( t > a >
0) as shown in Asgharian et al. (2006). Theassumptions A A p ( Y ∈ ( t, t + dt ) , A ∈ ( a, a + da ) , δ = 1 | d, x ) = f ( t | d, x ) S c ( t − a ) dtdaµ d ( x , θ ) , where µ d ( x , θ ) = (cid:82) ∞ p ( T pop ≥ a | D = d, X = x , θ ) and S c ( . ) are the conditional counter- factual mean failure time if treated at D = d and the survival function for the residualcensoring variable C , respectively. By integrating the above equation over 0 < a < t , wehave p ( Y ∈ ( t, t + dt ) , δ = 1 | d, x ) = f ( t | d, x ) w ( t ) dtµ d ( x , θ ) , (2)where w ( t ) = (cid:82) t S C ( s ) ds (see Shen et al. (2009) and Qin & Shen (2010)). A. Ertefaie, M. Asgharian and D. Stephenes . Propensity score estimation under length-biased sampling Assuming a logit model for the propensity score in the target population, we have π ( x , α ) = p ( D pop = 1 | X pop = x ) = exp( α x )1 + exp( α x ) . (3)where α is a p × X may include a columnof 1s. It can be shown that under assumption A
2, we have p LB ( D = 1 | X = x ) = µ ( x , θ ) p ( D pop = 1 | X pop = x ) µ ( x , α, θ ) , (4)where µ ( x , α, θ ) = π ( x , α ) µ ( x , θ ) + (1 − π ( x , α )) µ ( x , θ ) (Bergeron et al. (2008)).Assuming the proportional hazard model, i.e., λ T pop ( u | D pop = d, X pop = x ) = λ ( u ) e γd + β x , Cheng & Wang (2012) show that the parameter of the propensity score canbe consistently estimated using the logistic regression but adjusted for the ‘offset’ term log( ˆ α ( x ; ˆΛ , ˆ γ, ˆ β )) as the intercept where ˆ α ( x ; ˆΛ , ˆ γ, ˆ β ) = (cid:80) ni =1 exp[ − ˆΛ( a i ) exp(ˆ γ + ˆ β x ) (cid:80) ni =1 exp[ − ˆΛ( a i ) exp( ˆ β x )] . The cu-mulative hazard function Λ is estimated using the Breslow estimator, and the parameters( γ, β ) can be estimated using the estimating equations developed by Qin & Shen (2010).The consistency of the parameters of the propensity score in the CW method relies on thecorrect specification of the conditional hazard model given the treatment and covariates. When the initiating event of the duration variable has stationary incidence, it is pos-sible to devise a robust methodology for estimating the propensity score that does notrequire knowledge of the conditional hazard model. See among others Wolfson et al.(2001) and De U˜na-´alvarez (2004) for examples of such duration variables in medicaland labor force studies, respectively.
We construct an unbiased estimating equation for estimating the parameters of thepropensity score using the weighted logistic regression where weights are estimated non-parametrically. Let F ( d | x ) be the unbiased conditional distribution of the treatmentgiven the covariates. Then E (cid:20) δ ( D − π ( x , α )) w ( Y ) (cid:12)(cid:12)(cid:12)(cid:12) X = x (cid:21) = E (cid:20) E (cid:26) δ ( d − π ( x , α )) w ( Y ) (cid:12)(cid:12)(cid:12)(cid:12) D = d, X = x (cid:27)(cid:21) = (cid:90) ( d − π ( x , α )) (cid:90) f ( y | x , d ) w ( y ) w ( y ) µ d ( x , θ ) dy × µ d ( x , θ ) dF ( d | x ) µ ( x , α, θ )= 1 µ ( x , α, θ ) (cid:90) ( d − π ( x , α )) dF ( d | x ) = 0 . The second equality follows from equations (2), and (4). The last equality holds since f ( y | x , d ) is a proper density and (3). An unbiased estimating equation for α is therefore U ( α ) = n (cid:88) i =1 δ i x (cid:62) i ( d i − π ( x i , α )) (cid:98) w ( y i ) = 0 , (5)where ˆ w ( y ) = (cid:82) y ˆ S C ( s ) ds and ˆ S C is the Kaplan-Meier estimator of the survivor functionof the residual censoring variable C . iscellanea by (5) in the presence of length-biased sampling when w ( . ) is replaced by its estimatedvalue. Theorem . Let (cid:98) α be an estimator obtained by ( ). Then under conditions C − C and assumptions A − A , (cid:98) α → α in probability as n → ∞ . Moreover, √ n ( (cid:98) α − α ) d −→ N (0 , η ( α )) , where η ( α ) is given in the Appendix. Proof
See the Appendix.A consistent plug-in estimator of η ( α ) is presented in the Appendix.4 . Simulation Studies In this Section, we describe a simulation study to examine the performance of theproposed propensity score estimator. Our simulation consists of 500 datasets of sizes
500 and 5000. The censoring variable C is generated from a uniform distribution in theinterval (0 , τ ) where the parameter τ is set such that it results in a desired censoringproportion. To create length-biased samples, we generate a variable A from a uniformdistribution (0 , ρ ) and ignore those whose generated unbiased failure time is less than A .We generated the unbiased failure times from the following hazard model h ( t | d, x ) = 0 . { d − . x + 0 . x + 0 . dx − . dx } , where D ∼ Bernoulli (cid:16) exp {− . x − x } {− . x − x } (cid:17) with X and X distributed according to N (0 , σ = 0 . We estimate the parameters of the propensity score using four different estimators:ˆ α is the estimator obtained by the proposed method; ˆ α w and ˆ α mw are the estimatorobtained by the CW method when the hazard model is correctly and incorrectly specified,respectively, and ˆ α Un is obtained by a naive method that does not adjust for the length-bias sampling. In ˆ α mw , we assume that the interaction between the treatment D and the covariate X has been ignored in the fitted hazard model.Table 1 summarizes the estimated propensity score parameters and their standard er-rors. Our simulation results confirm that the proposed estimating equation (5) adjusts thelength-biased sampling. The standard errors, however, are larger than the one obtainedby the CW method, which is the price we pay for relaxing the modeling assumption of the hazard model. As we expected, CW estimator is highly sensitive to the model mis-specification even when just one of the interaction terms is ignored. Specifically, when theinteraction term between the treatment and variable X is omitted in the fitted hazardmodel, the estimated coefficient corresponding to X in the propensity score model isbiased. In general, if variables in the study are correlated, then missing one variable in the hazard model may cause bias in the estimation of other variables in the propensityscore model as well. Acknowledgements
This work was supported in part by NIDA grant P50 DA010075.
A. Ertefaie, M. Asgharian and D. Stephenes
Table 1: Simulation: Propensity score parameter estimation. ˆ α : Estimatedparameters using the proposed method. ˆ α w : Estimated parameters using theCW method. ˆ α mw : Estimated parameters using the CW method when the haz-ard model is misspecified. ˆ α Un : Estimated parameters when unadjusted forthe length-biased sampling. α =(-0.1,1,-1).Method Bias S.D. Bias S.D.10 % Cens. n = 500 n = 5000ˆ α (0.01,0.04,0.03) (0.21,0.43,0.45) (0.00,0.01,0.01) (0.09,0.18,0.19)ˆ α w (0.08,0.02,0.01) (0.17,0.28,0.28) (0.01,0.00,0.01) (0.06,0.10,0.10)ˆ α mw (0.03,0.02,0.49) (0.17,0.29,0.23) (0.08,0.03,0.48) (0.06,0.09,0.08)ˆ α Un (0.10,0.50,0.50) (0.11,0.21,0.22) (0.10,0.51,0.50) (0.04,0.07,0.08)20 % Cens. n = 500 n = 5000ˆ α (0.01,0.05,0.05) (0.22,0.42,0.43) (0.02,0.03,0.03) (0.09,0.18,0.20)ˆ α w (0.02,0.01,0.01) (0.17,0.29,0.27) (0.02,0.01,0.01) (0.06,0.10,0.10)ˆ α mw (0.01,0.02,0.47) (0.16,0.29,0.21) (0.02,0.05,0.46) (0.06,0.10,0.08)ˆ α Un (0.11,0.49,0.50) (0.10,0.22,0.20) (0.10,0.51,0.50) (0.04,0.07,0.08)30 % Cens. n = 500 n = 5000ˆ α (0.04,0.08,0.09) (0.22,0.44,0.44) (0.03,0.06,0.07) (0.10,0.19,0.20)ˆ α w (0.07,0.00,0.02) (0.17,0.29,0.29) (0.08,0.03,0.02) (0.06,0.10,0.11)ˆ α mw (0.07,0.02,0.45) (0.17,0.29,0.21) (0.03,0.06,0.45) (0.06,0.10,0.08)ˆ α Un (0.11,0.49,0.50) (0.10,0.22,0.20) (0.10,0.51,0.50) (0.04,0.07,0.08) References
Asgharian, M. , M’Lan, C. E. & Wolfson, D. B. (2002). Length-biased sampling with right censoring.
Journal of the American Statistical Association
97 201–209.
Asgharian, M. & Wolfson, D. B. (2005). Asymptotic behavior of the unconditional NPMLE of thelength-biased survivor function from right censored prevalent cohort data.
The Annals of Statistics
33 pp. 2109–2131.
Asgharian, M. , Wolfson, D. B. & Zhang, X. (2006). Checking stationarity of the incidence rateusing prevalent cohort survival data.
Statistics in medicine
25 1751–1767.
Bergeron, P. J. , Asgharian, M. & Wolfson, D. B. (2008). Covariate bias induced by length-biasedsampling of failure times.
Journal of the American Statistical Association
103 737–742.
Cheng, Y. & Wang, M. (2012). Estimating propensity scores and causal survival functions using prevalent survival data.
Biometrics
68 707–716.
Cochran, W. & Rubin, D. (1973). Controlling bias in observational studies: A review.
Sankhy¯a: TheIndian Journal of Statistics, Series A
Cox, D. R. (1969).
Some Sampling Problems in Technology . in New Developments in Survey Sampling:New York: Wiley Interscience, 506–527.
Cox, D. R. & Lewis, P. (1966).
The statistical analysis of series of events . John Wiley and Sons.
De U˜na-´alvarez, J. (2004). Nonparametric estimation under length-biased sampling and type i cen-soring: a moment based approach.
Annals of the Institute of Statistical Mathematics
56 667–681.
Fisher, R. A. (1934). The effect of methods of ascertainment upon the estimation of frequencies.
Annalsof Human Genetics
Hern´an, M. , Brumback, B. & Robins, J. (2000). Marginal structural models to estimate the causaleffect of zidovudine on the survival of hiv-positive men.
Epidemiology
11 561–570.
Neyman, J. (1955). Statistics–servant of all science.
Science
122 401–406.
Ning, J. , Qin, J. & Shen, Y. (2010). Non-parametric tests for right-censored data with biased sampling.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
72 609–630.
Pepe, M. S. & Fleming, T. R. (1991). Weighted Kaplan-Meier statistics: Large sample and optimalityconsiderations.
Journal of the Royal Statistical Society. Series B (Methodological)
53 341–352. iscellanea Qin, J. & Shen, Y. (2010). Statistical methods for analyzing right-censored length-biased data undercox model.
Biometrics
66 382–392.
Robins, J. , Hern´an, M. & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology.
Epidemiology
11 550–560.
Rosenbaum, P. R. & Rubin, D. B. (1983). The central role of the propensity score in observationalstudies for causal effects.
Biometrika
70 41–55.
Rubin, D. (1973). The use of matched sampling and regression adjustment to remove bias in observationalstudies.
Biometrics
Shen, Y. , Ning, J. & Qin, J. (2009). Analyzing length-biased data with semiparametric transformationand accelerated failure time models.
Journal of the American Statistical Association
104 1192–1202.
Wang, M. C. (1991). Nonparametric estimation from cross-sectional survival data.
Journal of theAmerican Statistical Association
86 130–143.
Wicksell, S. D. (1925). The corpuscle problem: a mathematical study of a biometric problem.
Biometrika
17 84–99.
Wolfson, C. , Wolfson, D. B. , Asgharian, M. , M’Lan, C. E. , Østbye, T. , Rockwood, K. & Hogan, D. B. (2001). A reevaluation of the duration of survival after the onset of dementia.
NewEngland Journal of Medicine
344 1111–1116.
Zelen, M. & Feinlein, M. (1969). On the theory of screening for chronic diseases.
Biometrika Appendix
In this section, we present the assumptions and proofs of the main result. The following con-ditions are required for establishing Theorem 1:C.1 X is a p vector of bounded covariates, not contained in a ( p −
1) dimensional hyperplane.
C.2 sup[ t : p ( R > t ) > ≥ sup[ t : p ( C > t ) > s and p ( δ = 1) > (cid:82) s [( (cid:82) st S C ( v ) dv ) / ( S C ( t ) S V ( t ))] dS C ( t ) < ∞ .C.4 det E (cid:20)(cid:110) δ X (cid:62) D − π ( X ,α ) w ( Y ) (cid:111) ⊗ (cid:21) < ∞ .C.5 Λ = E (cid:104) δ X (cid:62) ∂π ( X ,α ) ∂α (cid:105) is nonsingular.C.6 det [ (cid:82) s κ ( t ) / ( S C ( t ) S R ( t )) dS C ( t )] < ∞ where κ ( t ) = E (cid:104) δ ( Y >t ) X (cid:62) [ D − π ( X ,α ))] (cid:82) Yt S C ( v ) dvw ( Y ) (cid:105) , C.2 is an identifiability condition (Wang (1991)), C.3-C.6 are required to obtain an estimatorwith a finite variance.
Proof of Theorem
1. Using the strong consistency of (cid:98) w ( y ) to w ( y ) (Pepe & Fleming (1991)), wehave 1 (cid:98) w ( Y ) = 1 w ( Y ) (cid:20) w ( Y ) − (cid:98) w ( Y ) w ( Y ) (cid:21) + o p (1) . Thus˜ U ( α ) = 1 √ n n (cid:88) i =1 U i ( α )= 1 √ n n (cid:88) i =1 δ i x (cid:62) i d i − π ( x i , α )ˆ w ( y i ) = 1 √ n n (cid:88) i =1 δ i x (cid:62) i d i − π ( x i , α ) w ( y i ) (cid:20) w ( y i ) − (cid:98) w ( y i ) w ( y i ) (cid:21) + o p (1) , (A1)In section 3, we have shown that the estimating equation ˜ U ( α ) given by (5) is unbiased. Followingthe martingale integral representation √ n ( (cid:98) w ( Y ) − w ( Y )) (Shen et al. (2009) and Qin & Shen A. Ertefaie, M. Asgharian and D. Stephenes (2010)), w ( y i ) − (cid:98) w ( y i ) w ( y i ) = E (cid:34)(cid:90) s ( y i > t ) (cid:82) y i t S C ( v ) dvS C ( t ) S R ( t ) w ( y i ) dM C ( t ) (cid:35) , (A2)where M C ( s ) = ( Y − A < s, δ = 0) − (cid:82) s (min( Y − A, C ) > u ) d Λ C ( u ), with Λ C ( . ) be the cu-mulative hazard function of the censoring variable. The stochastic process M C ( s ) has mean zero, E [ M C ( s )] = E [ ( C < Y − A < s )] − (cid:90) s E [ ( Y − A > u ) . ( C > u )] d Λ C ( u ) = (cid:90) s S C ( u ) λ C ( u ) S R ( u ) du − (cid:90) s S C ( u ) S R ( u ) d Λ C ( u ) = 0 . By inserting ( A2) into ( A1) and using the standard Taylor expansion, we can derive theasymptotic variance of the estimator as follows η ( α ) = Λ (cid:48) Σ − Λ , where Σ = E (cid:104) ˜ U ( α ) ˜ U ( α ) (cid:105) = E (cid:34)(cid:26) δ X (cid:62) D − π ( X , α ) w ( Y ) (cid:27) ⊗ (cid:26) w ( Y ) − (cid:98) w ( Y ) w ( Y ) (cid:27) (cid:35) = E (cid:34)(cid:26) δ X (cid:62) D − π ( X , α ) w ( Y ) + (cid:90) s κ ( t ) dM C ( t ) S C ( t ) S R ( t ) (cid:27) ⊗ (cid:35) Λ = E (cid:20) ∂U i ( α ) ∂α (cid:21) = E (cid:20) δw ( Y ) X (cid:62) ∂π ( X , α ) ∂α (cid:21) where κ ( t ) = E (cid:104) δ ( Y >t ) X (cid:62) [ D − π ( X ,α ))] (cid:82) Yt S C ( v ) dvw ( Y ) (cid:105) . Let P n be the empirical average. The compo-nents of the variance-covariance matrix η ( α ) can be consistently estimated byˆΣ = P n (cid:40) δ X (cid:62) D − π ( X , α )ˆ w ( Y ) + (cid:90) s ˆ κ ( t ) d ˆ M C ( t )ˆ S C ( t ) ˆ S R ( t ) (cid:41) ⊗ , ˆΛ = P n (cid:20) ∂U i ( α ) ∂α (cid:21) = P n (cid:20) δ ˆ w ( Y ) X (cid:62) ∂π ( X , α ) ∂α (cid:21) , Also, the stochastic process M C ( s ) can be estimated by replacing the Λ C ( . ) by its estimate, (cid:98) Λ C ( ..