Raking and Regression Calibration: Methods to Address Bias from Correlated Covariate and Time-to-Event Error

Abstract

Medical studies that depend on electronic health records (EHR) data are often subject to measurement error, as the data are not collected to support research questions under study. These data errors, if not accounted for in study analyses, can obscure or cause spurious associations between patient exposures and disease risk. Methodology to address covariate measurement error has been well developed; however, time-to-event error has also been shown to cause significant bias but methods to address it are relatively underdeveloped. More generally, it is possible to observe errors in both the covariate and the time-to-event outcome that are correlated. We propose regression calibration (RC) estimators to simultaneously address correlated error in the covariates and the censored event time. Although RC can perform well in many settings with covariate measurement error, it is biased for nonlinear regression models, such as the Cox model. Thus, we additionally propose raking estimators which are consistent estimators of the parameter defined by the population estimating equation. Raking can improve upon RC in certain settings with failure-time data, require no explicit modeling of the error structure, and can be utilized under outcome-dependent sampling designs. We discuss features of the underlying estimation problem that affect the degree of improvement the raking estimator has over the RC approach. Detailed simulation studies are presented to examine the performance of the proposed estimators under varying levels of signal, error, and censoring. The methodology is illustrated on observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.


Eric J. Oh∗, Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw

University of Pennsylvania, Perelman School of Medicine, Department of Biostatistics, Epidemiology, and Informatics
Vanderbilt University School of Medicine, Department of Biostatistics
University of Auckland, Department of Statistics

∗ Corresponding author: [email protected]

arXiv [stat.ME]
Biomedical research relies increasingly on electronic health records (EHR) data, either as the sole or supplemental source of data, due to the vast amount of data these resources contain and their relatively low cost compared to prospectively collected data. However, EHR data and other large cohort databases have been observed to be error-prone. These errors, if not accounted for in the data analysis, can bias associations of patient exposures and disease risk. There exists a large body of literature describing the impact of and methods to correct for covariate measurement error (Carroll et al., 2006); however, much less attention has been given to errors in the outcome. For linear models, independent random (classical) errors in the outcome variable do not bias regression estimates; however, errors correlated with either predictors in the model or errors in those predictors could bias associations. For non-linear models, even classical outcome errors can bias estimated associations of interest (Carroll et al., 2006).

There are many examples in clinical research where the outcome of interest relies on an imprecisely measured event time. Researchers studying the epidemiology of chronic conditions may enroll subjects some time after an initial diagnosis, and so research questions focused on the timing of events post diagnosis may need to rely on patient recall or chart review of electronic medical records for the date of diagnosis, both of which are subject to error. Errors in the time origin can be systematic, as subject characteristics can influence the amount of error in recall. Methods to handle a misclassified outcome have been developed for binary outcomes (Magder and Hughes, 1997; Edwards et al., 2013; Wang et al., 2016) and discrete failure time data (Meier et al., 2003; Magaret, 2008; Hunsberger et al., 2010), where estimates of sensitivity and specificity can be incorporated into the bias correction. However, methods to handle errors in a continuous failure time have largely been ignored.

Additionally, as more and more observational studies utilize data primarily collected for non-research purposes (e.g. administrative databases or electronic health records), it is increasingly common to have errors in both the outcome and exposures that are correlated. For example, in some observational studies of HIV/AIDS, the date of antiretroviral therapy (ART) initiation has been observed to have substantial errors (Shepherd and Yu, 2011; Duda et al., 2012). These errors can lead to errors in event times, defined as time since ART initiation, and errors in exposures of interest, such as CD4 count at ART initiation. Furthermore, certain types of records are often more likely to have errors (e.g. records from a particular study site), records with errors often tend to have errors across multiple variables, and the magnitude of these errors cannot be assumed uncorrelated. Ignoring correlated outcome and exposure errors could lead to positive or negative bias in estimates of regression parameters.

In some settings, data errors can be corrected by retrospectively reviewing and validating medical records; however, this is expensive and time-consuming to do for a large number of records. Instead, we can perform data validation on a subset of selected records and use this information to correct estimates based on the larger, unvalidated dataset. In this manuscript, we propose regression calibration and raking estimators as two methods to correct the bias induced from such correlated errors by incorporating information learned in a validation subset to the large unvalidated dataset.

Regression calibration (RC), introduced by Prentice (1982), is a method to address covariate measurement error that is widely used due to ease of implementation and good numerical performance in a broad range of settings. Although most RC methods assume measurement error in covariates only, Shaw et al. (2018) examined a way to apply RC to correlated errors in a covariate and a continuous outcome; to date these methods have not addressed correlated errors between failure time outcomes and exposures.

Raking is a method in survey sampling that makes use of auxiliary information available on the population to improve upon the Horvitz-Thompson (HT) estimator for regression parameters in two-phase designs. The HT estimator is known to be inefficient (Robins et al., 1994) but raking improves statistical efficiency, without changing the target of inference, by adjusting the standard HT weights by tuning them to auxiliary variables. Raking also takes advantage of the known sampling probabilities with validation studies such as those considered in this manuscript. These survey sampling ideas, while not new, have not been carefully studied in the measurement error setting. Breslow et al. (2009) considered raking estimators for modeling case-cohort data with missing covariates. Lumley et al. (2011) considered a raking estimator using simulated data in a covariate measurement error context with a validation subset. In this manuscript, we consider raking estimators for more general settings allowing for errors in the covariate and a time-to-event outcome, including misclassification, and discuss various possibilities for the auxiliary variables, how different choices affect the degree of improvement over the HT estimator, and ways to implement these methods using standard statistical software.

Our contributions in this manuscript are twofold. First, we develop regression calibration estimators to address both censored event time error alone and correlated covariate and censored event time errors together. To our knowledge, no RC estimators have been developed for these settings. Second, we develop raking estimators that are consistent and, in some settings, improve upon the RC estimators. These methods are important given the increased use of error-prone data in biomedical research and the paucity of methods that simultaneously handle errors in covariates and times-to-event. The rest of the paper proceeds as follows. We present our survival time model and the considered measurement error frameworks in Section 2. Sections 3 and 4 present the proposed regression calibration and raking methods, respectively. Section 5 compares the relative performance of the proposed estimators with simulation studies for various parameter settings and error distributions. In Section 6, we apply our methods to an HIV cohort and ascertain their robustness to misclassification. We conclude with a discussion in Section 7.

We consider the Cox proportional hazards model. Let $T_i$ and $C_i$ be the failure time and right censoring time, respectively, for subjects $i = 1, \ldots, n$ on a finite follow-up time interval, $[0, \tau]$. Define $U_i = \min(T_i, C_i)$ and the corresponding failure indicator $\Delta_i = I(T_i \le C_i)$. Let $Y_i(t) = I(U_i \ge t)$ and $N_i(t) = I(U_i \le t, \Delta_i = 1)$ denote the at-risk indicator and counting process for observed events, respectively. Let $X_i$ be a $p$-dimensional vector of continuous covariates that are measured with error and $Z_i$ a $q$-dimensional vector of precisely measured discrete and/or continuous covariates that may be correlated with $X_i$. We assume $C_i$ is independent of $T_i$ given $(X_i, Z_i)$ and that the data are i.i.d. Let the hazard rate for subject $i$ at time $t$ be given by $\lambda_i(t) = \lambda_0(t) \exp(\beta_X' X_i + \beta_Z' Z_i)$, where $\lambda_0(t)$ is an unspecified baseline hazard function. We consider $\beta_X$ to be the parameter(s) of interest, which is estimated by solving the partial likelihood score for $\beta = (\beta_X, \beta_Z)$:

$$\sum_{i=1}^{n} \int_0^{\tau} \left\{ \{X_i, Z_i\}' - \frac{n^{-1} \sum_{j=1}^{n} Y_j(t) \{X_j, Z_j\}' \exp(\beta_X' X_j + \beta_Z' Z_j)}{n^{-1} \sum_{j=1}^{n} Y_j(t) \exp(\beta_X' X_j + \beta_Z' Z_j)} \right\} dN_i(t) = 0. \tag{1}$$

Oftentimes, errors seen in electronic health records data or other datasets used for observational studies will not be simple random error and will depend on other variables in the dataset. For example, when the time-to-event error is due to a mismeasured time origin, this timing error can cause correlated errors in the baseline observations for exposures that are associated with the true survival outcome. In addition, errors induced in the exposures and censored time-to-event outcome can vary systematically with subject characteristics that could make a subject's record more error-prone. Thus, we consider the error setting involving additive systematic and random error in both the covariates and time-to-event. Instead of observing $(X, Z, U, \Delta)$, we observe $(X^\star, Z, U^\star, \Delta)$, where

$$X^\star = \alpha_0 + \alpha_1' X + \alpha_2' Z + \epsilon, \tag{2}$$

$$U^\star = U + \gamma_0 + \gamma_1' X + \gamma_2' Z + \nu = U + \omega. \tag{3}$$

Note that $X$ and $Z$ in the above formulation do not necessarily represent the full vector of covariates (e.g. some elements in the vectors $\alpha_1$, $\alpha_2$, $\gamma_1$, and $\gamma_2$ may be 0). We assume that $\epsilon$ and $\nu$ are mean 0 random variables with variance $\Sigma_{\epsilon\epsilon}$ and $\Sigma_{\nu\nu}$, respectively, and are independent of all other variables with the exception that we allow their covariance, $\Sigma_{\epsilon\nu}$, to be non-zero. We refer to this setting as the additive error structure. In this setting the error in the observed censored failure time $U^\star$ is a mistiming error but there are no errors in the event indicator $\Delta$.

We will see in the sections to follow that raking estimators, contrary to regression calibration estimators, do not require modeling the measurement error structure explicitly. Thus, we will also consider a more general error model that also involves a misspecified event. Whereas the additive error structure in Section 2.1 might be expected in scenarios involving only an error-prone baseline time (e.g. self-reported baseline time), the general error model relaxes this assumption to allow the timing of the failure, and thus the failure indicator, to be error-prone as well. Instead of observing $(X, Z, U, \Delta)$, one observes $(X^\star, Z, U^\star, \Delta^\star)$, where errors in the event may be coming from both a mistiming error and also from misclassification of the event indicator. Note that with this error structure we also make no assumptions regarding the additivity of errors or their correlation with other variables.
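To make the additive error structure in (2) and (3) concrete, the following Python sketch simulates one error-prone dataset with scalar $X$ and $Z$. It is only an illustration: the function name, the default baseline hazard, the uniform censoring window, the independence of $X$ and $Z$, and all numeric values are assumptions chosen for the example and are not taken from the authors' simulation code.

```python
import numpy as np

def simulate_additive_error(n, beta_x, beta_z, alpha, gamma,
                            sigma_ee, sigma_nn, sigma_en,
                            lam0=0.1, cens_max=5.0, seed=0):
    """Simulate (X*, Z, U*, Delta) under the additive error structure (2)-(3).

    alpha = (alpha_0, alpha_1, alpha_2), gamma = (gamma_0, gamma_1, gamma_2).
    lam0, cens_max and the covariate distributions are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)   # true covariate (X and Z drawn independently here)
    z = rng.normal(2.0, 1.0, n)   # precisely measured covariate

    # Exponential failure times with Cox-type hazard lam0 * exp(beta'covariates)
    rate = lam0 * np.exp(beta_x * x + beta_z * z)
    t = rng.exponential(1.0 / rate)
    c = rng.uniform(0.0, cens_max, n)          # independent right censoring
    u = np.minimum(t, c)
    delta = (t <= c).astype(int)

    # Mean-zero errors (epsilon, nu) with possibly non-zero covariance
    cov = np.array([[sigma_ee, sigma_en], [sigma_en, sigma_nn]])
    eps, nu = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    x_star = alpha[0] + alpha[1] * x + alpha[2] * z + eps   # equation (2)
    omega = gamma[0] + gamma[1] * x + gamma[2] * z + nu
    u_star = u + omega                                      # equation (3)
    return {"x": x, "z": z, "u": u, "delta": delta,
            "x_star": x_star, "u_star": u_star, "omega": omega}
```

Note that the event indicator is left untouched, which matches the additive setting; the general error model would additionally perturb `delta`.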
We consider the two-phase design in which the true, error-free variables are measured retrospectively for a subsample of subjects at the second phase. Let $R_i$ be an indicator for whether subject $i = 1, \ldots, n$ is selected to be in the second phase and let $0 < \pi_i \le 1$ be their known sampling probability. In general, the sampling probabilities are known in validation studies based on observational data utilizing EHR, which are becoming increasingly common. This sampling scheme also accommodates scenarios where the subsample size is fixed (e.g. simple random sampling) and where the subsample size is random (e.g. Bernoulli sampling), as well as stratified designs (e.g. case-cohort). We assume that at phase one, the random variables $(X_i^\star, Z_i, U_i^\star, \Delta_i)$ [or $(X_i^\star, Z_i, U_i^\star, \Delta_i^\star)$ in a setting with misclassification] are observed for $n$ subjects as a random sample from the population. At phase two, $m < n$ subjects are selected from the phase one population according to the aforementioned sampling probability and the random variables $(X_i, U_i)$ [or $(X_i, U_i, \Delta_i)$] are additionally observed for those subjects. From this point on, we refer to the phase two subjects as the validation subset.

In this section, we give a brief introduction to the original RC and risk set regression calibration (RSRC) methods for classical covariate measurement error and then develop their extensions for our considered error settings that include error in the censored outcome alone and correlated errors in the censored outcome and covariates. Under regularity conditions similar to those in Andersen and Gill (1982), the RC and RSRC estimators developed in this section for error in the censored outcome and potentially correlated errors in the censored outcome and covariates are asymptotically normal, although not necessarily consistent for $\beta$. The proof is similar to that in the covariate error only setting, which was shown in Wang et al. (1997). For more detail see Appendix A of the Supplementary Materials.

Prentice (1982) introduced the regression calibration method for the setting of Cox regression and classical measurement error in the covariate. Shaw and Prentice (2012) applied regression calibration for the covariate error structure assumed in Section 2.1. The idea of regression calibration is to estimate the unobserved true variable with its expectation given the data. Prentice (1982) showed that under the independent censoring assumption, the induced hazard function based on the error-prone data is given by $\lambda(t; X^\star, Z) = \lambda_0(t) \exp(\beta_Z' Z)\, \mathrm{E}(\exp\{\beta_X' X\} \mid X^\star, Z, U \ge t)$. He then showed that for rare events and moderate $\beta_X$, $\mathrm{E}(\exp\{\beta_X' X\} \mid X^\star, Z, U \ge t) \approx \exp(\beta_X' \mathrm{E}(X \mid X^\star, Z))$. $\mathrm{E}(X \mid X^\star, Z)$ can be estimated using the following first order approximation

$$\mathrm{E}(X \mid X^\star, Z) = \mu_X + \begin{bmatrix} \Sigma_{XX^\star} & \Sigma_{XZ} \end{bmatrix} \begin{bmatrix} \Sigma_{X^\star X^\star} & \Sigma_{X^\star Z} \\ \Sigma_{ZX^\star} & \Sigma_{ZZ} \end{bmatrix}^{-1} \begin{bmatrix} X^\star - \mu_{X^\star} \\ Z - \mu_Z \end{bmatrix}, \tag{4}$$

where the validation subset is used to calculate the moments involving $X$ (see Shaw and Prentice (2012)). Define $\hat{X} = \mathrm{E}(X \mid X^\star, Z; \hat{\zeta}_x)$, where $\hat{\zeta}_x$ is the vector of nuisance parameters in (4) estimated from the data. $\hat{X}$ is then imputed for $X$ in the partial likelihood score (1) instead of the observed $X^\star$ to solve for $\beta$, which yields the corrected estimates (Shaw and Prentice, 2012). Note, for simplicity we generally suppress the notation of the dependence of terms such as $\mathrm{E}(X \mid X^\star, Z)$ on the nuisance parameter $\zeta_x$, unless it is important for clarity, such as to refer to its estimator $\mathrm{E}(X \mid X^\star, Z; \hat{\zeta}_x)$.

Assume the time-to-event error structure in Section 2.1, i.e., we observe $(X, Z, U^\star, \Delta)$.
Given the additivity of the outcome errors in (3), we can take the expectation of the censored event time, $U^\star$, given the observed covariates and rearrange to obtain $\mathrm{E}(U \mid X, Z) = \mathrm{E}(U^\star \mid X, Z) - \mathrm{E}(\omega \mid X, Z)$. We use $\mathrm{E}(\omega \mid X, Z)$ to correct $U^\star$ and then impute the corrected value as our estimate of the true censored event time. Since the true $\mathrm{E}(\omega \mid X, Z)$ is unknown, we can estimate it using the following first order approximation

$$\mathrm{E}(\omega \mid X, Z; \zeta_\omega) = \mu_\omega + \begin{bmatrix} \Sigma_{\omega X} & \Sigma_{\omega Z} \end{bmatrix} \begin{bmatrix} \Sigma_{XX} & \Sigma_{XZ} \\ \Sigma_{ZX} & \Sigma_{ZZ} \end{bmatrix}^{-1} \begin{bmatrix} X - \mu_X \\ Z - \mu_Z \end{bmatrix}, \tag{5}$$

where the validation subset is used to calculate the moments involving $\omega$ and $\zeta_\omega$ is the vector of nuisance parameters in (5). Adjusting $U^\star$ to have the correct expectation gives us $\hat{U} = U^\star - \mathrm{E}(\omega \mid X, Z; \hat{\zeta}_\omega)$, which we use instead of $U^\star$ to solve the partial likelihood score (1) for the corrected $\beta$ estimates.

Assume the additive error structure for both $X^\star$ and $U^\star$ in Section 2.1, i.e., we observe $(X^\star, Z, U^\star, \Delta)$. Given the additivity of the outcome errors in (3), we can take the expectation of the censored event time, $U^\star$, given the observed covariates and rearrange to obtain $\mathrm{E}(U \mid X^\star, Z) = \mathrm{E}(U^\star \mid X^\star, Z) - \mathrm{E}(\omega \mid X^\star, Z)$. We use $\mathrm{E}(\omega \mid X^\star, Z)$ to correct $U^\star$ and then impute the corrected value as our estimate of the true censored event time. Due to the error-prone $X^\star$, we impute $\mathrm{E}(X \mid X^\star, Z)$ for $X^\star$ as well, similar to Prentice (1982). Given that the true $\mathrm{E}(X \mid X^\star, Z; \zeta_x)$ is unknown, we estimate it using the same first order approximation described in Section 3.1. In addition, we propose to estimate $\mathrm{E}(\omega \mid X^\star, Z; \zeta_\omega)$ using the same first order approximation described in Section 3.2 except using $X^\star$ instead of $X$, giving us $\hat{U} = U^\star - \mathrm{E}(\omega \mid X^\star, Z; \hat{\zeta}_\omega)$ as the estimate of the true censored time-to-event. Thus, we impute $\hat{U}$ and $\hat{X} = \mathrm{E}(X \mid X^\star, Z; \hat{\zeta}_x)$ in the partial likelihood score (1) instead of the observed $U^\star$ and $X^\star$ and solve for $\beta$ to obtain our corrected estimates.
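The two imputations used for correlated covariate and time-to-event error can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, $X$ and $Z$ are scalar, and every moment in the first order approximations is estimated from the validation subset alone, in which case the moment formulas reduce to an ordinary least squares fit of the target on $(1, X^\star, Z)$.

```python
import numpy as np

def linear_calibration(w_all, target_val, val_idx):
    """Best linear predictor of a target given w = (X*, Z).

    Coefficients are fit by least squares on the validation rows and the
    prediction is returned for every phase-one subject.
    """
    design_all = np.column_stack([np.ones(len(w_all)), w_all])
    coef, *_ = np.linalg.lstsq(design_all[val_idx], target_val, rcond=None)
    return design_all @ coef

def rc_impute(x_star, z, u_star, val_idx, x_val, u_val):
    """Return (X_hat, U_hat) for all subjects.

    x_val and u_val are the true X and U observed on the validation subset,
    ordered as in val_idx (hypothetical argument names).
    """
    w = np.column_stack([x_star, z])
    x_hat = linear_calibration(w, x_val, val_idx)          # estimate of E(X | X*, Z)
    omega_val = u_star[val_idx] - u_val                    # observed timing error
    omega_hat = linear_calibration(w, omega_val, val_idx)  # estimate of E(omega | X*, Z)
    u_hat = u_star - omega_hat                             # corrected censored time
    return x_hat, u_hat
```

The returned `x_hat` and `u_hat`, together with the event indicator, would then be passed to any standard Cox regression routine in place of the error-prone values.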
We also considered improving our regression calibration estimators by applying the idea of recalibrating the mismeasured covariate within each risk set developed by Xie et al. (2001) for classical measurement error and extended to the covariate error model in Section 2.1 by Shaw and Prentice (2012). Since the risk set membership likely depends on subject specific covariates whose distribution is changing over time, we may be able to obtain better RC estimates by performing the calibration at every risk set as events occur. In particular, this method was shown to decrease the bias significantly for the setting of covariate measurement error when the hazard ratio is quite large, a case in which ordinary RC has been observed to perform poorly. Specifically for covariate measurement error, the risk set regression calibration estimator solves the partial likelihood score (1) using $\hat{X}(t)$ instead of $X$, where $\hat{X}(t)$ is recalculated using RC at each event time using data from only those individuals still in the risk set at that event time.

In the presence of time-to-event error, however, the necessary moments needed to estimate the conditional expectations in Sections 3.2 and 3.3 at the $i$th individual's censored event time will be incorrect due to the fact that the risk sets defined by $U^\star$ will not be the same as those defined by $U$, leading to biased estimates. Thus, to extend the RSRC idea to the settings of error in the censored outcome and correlated error in the covariate and censored outcome, we propose a two-stage RSRC estimator where the first stage involves obtaining the estimate $\hat{U}$ using ordinary RC. The second stage then assumes $\hat{U}$ is the observed event time instead of $U^\star$ and recalibrates $\hat{U}$ and $X^\star$ at risk sets defined by $\hat{U}$ using the methods described in Section 3.2 and Section 3.3.
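The covariate part of the risk set recalibration can be sketched as follows. This is a hedged illustration only: the function name and arguments are hypothetical, $X$ and $Z$ are scalar, the calibration is a least squares fit within each risk set, and the recalibration of the event time itself is omitted. In the two-stage proposal, the `time` argument would hold the first-stage estimate $\hat{U}$ rather than $U^\star$.

```python
import numpy as np

def risk_set_xhat(x_star, z, time, x_true, validated, cutpoints):
    """Recalibrate X_hat(t) within the risk set at each cutpoint t.

    x_true holds the true X for validated subjects (any value, e.g. NaN,
    elsewhere); validated is a boolean mask. Returns an array of shape
    (len(cutpoints), n) with NaN for subjects no longer at risk.
    """
    n = len(x_star)
    design = np.column_stack([np.ones(n), x_star, z])
    x_hat = np.full((len(cutpoints), n), np.nan)
    for k, t in enumerate(cutpoints):
        at_risk = time >= t
        fit_rows = at_risk & validated
        if fit_rows.sum() <= design.shape[1]:
            continue  # too few validated subjects remain to refit the calibration
        coef, *_ = np.linalg.lstsq(design[fit_rows], x_true[fit_rows], rcond=None)
        x_hat[k, at_risk] = design[at_risk] @ coef
    return x_hat
```

Using a coarse grid of cutpoints instead of every event time is a design choice that trades some fidelity for computation; the grid itself is left to the caller.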
In this section, we develop design-based estimators by applying generalized raking (raking for short) (Deville and Särndal, 1992; Deville et al., 1993), which leverages the error-prone data available on the entire sample to improve the efficiency of consistent estimators calculated using the error-free validation subset. We give a brief overview of the general raking method and then propose our estimators for the correlated measurement error settings under consideration. Under suitable regularity conditions, the proposed raking estimators have been shown to be $\sqrt{n}$-consistent, asymptotically normal estimators of $\beta$ for all two-phase designs described in Section 2.3. For the proof, see Saegusa and Wellner (2013).

Let $P_i(\beta)$ denote the population score equations for the true underlying Cox model with corresponding target parameter $\beta$, the log hazard ratio we would estimate if we had error-free data on the full cohort. Then the HT estimator of $\beta$ is given by the solution to $\sum_{i=1}^{n} \frac{R_i}{\pi_i} P_i(\beta) = 0$, which is known to be a consistent estimator of $\beta$. Consider $A_i$, a set of auxiliary variables that are available for everyone at phase one and are correlated with the phase two subsample variables. Raking estimators modify the design weights $w_{i,des} = 1/\pi_i$ to new weights $w_{i,cal} = g_i/\pi_i$ such that they are as close as possible to $w_{i,des}$ while $\sum_{i=1}^{n} A_i$ is exactly estimated by the validation subset. Thus, given a distance measure $d(\cdot, \cdot)$, the objective is

$$\text{minimize } \sum_{i=1}^{n} R_i\, d\!\left(\frac{g_i}{\pi_i}, \frac{1}{\pi_i}\right) \quad \text{subject to } \sum_{i=1}^{n} A_i = \sum_{i=1}^{n} R_i \frac{g_i}{\pi_i} A_i. \tag{6}$$

Note that the constraints above are known as the calibration equations. Deville et al. (1993) give several options for choosing the distance function, and the resulting constrained minimization problem can be solved to yield a solution for $g_i$. The generalized raking estimator is then defined as the solution to

$$\sum_{i=1}^{n} R_i \frac{g_i}{\pi_i} P_i(\beta) = 0. \tag{7}$$

For our setting of the Cox model, we use the distance function $d(a, b) = a \log\left(\frac{a}{b}\right) + (b - a)$ in the objective function of (6) to ensure positive weights. Solving the constrained minimization problem for $g_i$ then yields $g_i = \exp(-\hat{\lambda}' A_i)$. After plugging in $g_i$ to the calibration equations, Deville and Särndal (1992) show that the solution for $\lambda$ satisfies

$$\hat{\lambda} = \hat{B}^{-1} \left( \sum_{i=1}^{n} \frac{R_i}{\pi_i} A_i - \sum_{i=1}^{n} A_i \right) + O_p(n^{-1}),$$

where $\hat{B} = \sum_{i=1}^{n} \frac{R_i}{\pi_i} A_i' A_i$.

Finally, we construct auxiliary variables, $A_i$, that yield efficient estimators. Breslow et al. (2009) derived the asymptotic expansion for the solution to (7) and showed that the optimal auxiliary variable is given by $A_i^{opt} = \mathrm{E}(\tilde{\ell}(X_i, Z_i, U_i, \Delta_i) \mid V)$, where $\tilde{\ell}(X_i, Z_i, U_i, \Delta_i)$ denotes the efficient influence function contributions from the population model had the true outcome and covariates been observed for everyone in phase one and $V = (X^\star, Z, U^\star, \Delta)$ [or $(X^\star, Z, U^\star, \Delta^\star)$ in a setting with misclassification]. However, calculating $A_i^{opt}$ involves a conditional distribution of unobserved variables and thus is generally not practically obtainable. Kulich and Lin (2004) proposed a "plug in" method that approximates this conditional expectation by using the influence functions from a model fit using phase one data. Specifically, they proposed to use the phase two data to fit models that impute the missing information from the phase one data only and then to obtain the influence functions from the desired model that uses imputed values in place of the missing data. They further proposed using a dfbeta type residual, which is readily available in statistical software, to estimate the influence function from the approximate model.
We will propose two different imputations for the missing data, which will lead to two different choices of $A_i$ that approximate $A_i^{opt}$.

The first proposed approximation of $A_i^{opt}$ is given by $A_{N,i} = \tilde{\ell}(X_i^\star, Z_i, U_i^\star, \Delta_i)$, the influence function for the naive estimator that used the error-prone data instead of the unobserved true values. One can estimate $A_{N,i}$ empirically using

$$\tilde{\ell}(X_i^\star, Z_i, U_i^\star, \Delta_i) \approx \Delta_i \left\{ \{X_i^\star, Z_i\}' - \frac{S^{(1)\star}(\beta, t)}{S^{(0)\star}(\beta, t)} \right\} - \sum_{i=1}^{n} \int_0^{\tau} \frac{\exp(\beta_X' X_i^\star + \beta_Z' Z_i)}{S^{(0)\star}(\beta, t)} \left\{ \{X_i^\star, Z_i\}' - \frac{S^{(1)\star}(\beta, t)}{S^{(0)\star}(\beta, t)} \right\} dN_i^\star(t),$$

where $S^{(r)\star}(\beta, t) = n^{-1} \sum_{j=1}^{n} Y_j^\star(t) \{X_j^\star, Z_j\}'^{\otimes r} \exp(\beta_X' X_j^\star + \beta_Z' Z_j)$ ($a^{\otimes 1}$ is the vector $a$ and $a^{\otimes 0}$ is the scalar 1). For measurement error settings including an error-prone failure indicator, we approximate $A_i^{opt}$ with $A_{N,i} = \tilde{\ell}(X_i^\star, Z_i, U_i^\star, \Delta_i^\star)$.

The second proposed approximation of $A_i^{opt}$ is given by $A_{RC,i} = \tilde{\ell}(\hat{X}_i(\hat{\zeta}_x), Z_i, \hat{U}_i(\hat{\zeta}_\omega), \Delta_i)$, i.e., the influence function for the target estimator that uses the calibrated estimates $(\hat{X}_i(\hat{\zeta}_x), \hat{U}_i(\hat{\zeta}_\omega))$ in place of the unobserved true data $(X_i, U_i)$. One can again use the empirical approximation

$$\tilde{\ell}(\hat{X}_i(\hat{\zeta}_x), Z_i, \hat{U}_i(\hat{\zeta}_\omega), \Delta_i) \approx \Delta_i \left\{ \{\hat{X}_i(\hat{\zeta}_x), Z_i\}' - \frac{\hat{S}^{(1)}(\beta, \hat{\zeta}, t)}{\hat{S}^{(0)}(\beta, \hat{\zeta}, t)} \right\} - \sum_{i=1}^{n} \int_0^{\tau} \frac{\exp(\beta_X' \hat{X}_i(\hat{\zeta}_x) + \beta_Z' Z_i)}{\hat{S}^{(0)}(\beta, \hat{\zeta}, t)} \left\{ \{\hat{X}_i(\hat{\zeta}_x), Z_i\}' - \frac{\hat{S}^{(1)}(\beta, \hat{\zeta}, t)}{\hat{S}^{(0)}(\beta, \hat{\zeta}, t)} \right\} d\hat{N}_i(t; \hat{\zeta}_\omega),$$

where $\hat{S}^{(r)}(\beta, \hat{\zeta}, t) = n^{-1} \sum_{j=1}^{n} \hat{Y}_j(t; \hat{\zeta}_\omega) \{\hat{X}_j(\hat{\zeta}_x), Z_j\}'^{\otimes r} \exp(\beta_X' \hat{X}_j(\hat{\zeta}_x) + \beta_Z' Z_j)$ ($a^{\otimes 1}$ is the vector $a$ and $a^{\otimes 0}$ is the scalar 1). For measurement error settings including an error-prone failure indicator, we approximate $A_i^{opt}$ with $A_{RC,i} = \tilde{\ell}(\hat{X}_i(\hat{\zeta}_x), Z_i, \hat{U}_i(\hat{\zeta}_\omega), \Delta_i^\star)$. Thus, the two proposed raking estimators are:

1. Generalized raking naive (GRN): solution to (7) using $A_{N,i}$
2. Generalized raking regression calibration (GRRC): solution to (7) using $A_{RC,i}$

where both estimators utilize $g_i = \exp(-\hat{\lambda}' A_i)$.

The efficiency gain from the raking estimator over the HT estimator depends on the correlation between the auxiliary variables and the target variables. Breslow and Wellner (2007) showed that the variance of HT parameter estimates is the sum of the model-based variance due to sampling from an infinite population with no missing data and the design-based variance resulting from estimation of the unknown full cohort total of efficient influence function contributions. Thus, we consider $\tilde{\ell}(X_i, Z_i, U_i, \Delta_i)$ to be our target variables. We expect the regression calibration estimators to be less biased than the naive estimators and therefore conjecture that $A_{RC}$ would be more highly correlated with $A^{opt}$ than $A_N$. Note that in general, when the parameter of interest is a regression parameter, choosing the auxiliary variables to be the observed, error-prone variables will not improve efficiency. For more details, see Chapter 8 of Lumley (2011).

Instead of explicitly calculating $A_{N,i}$ and $A_{RC,i}$ with the influence function formulas given above, we propose to utilize standard software to calculate the $A_i$ so that practitioners may easily implement these methods. In R, the influence functions can be approximated with negligible error as a dfbeta type residual. Thus, the raking estimates can be computed as follows:

1. Fit a candidate Cox model using all phase one subjects.
2. Construct the auxiliary variables $A_i$ as imputed dfbetas from the model fit in Step 1.
3. Estimate regression parameters $\beta$ using weights raked to $A_i$ by solving (7).

For step one, we consider the naive Cox model using the error-prone data (GRN) and the regression calibration approach described in Section 3 (GRRC). For step three, we utilize the survey package by Lumley (2016) in R, which provides standard software for obtaining raking estimates.
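The weight adjustment behind step three can be illustrated with a short Python sketch that solves the calibration equations in (6) for $g_i = \exp(-\hat{\lambda}' A_i)$ by Newton's method. This is not the survey package's algorithm and not the authors' code: the function name, the Newton iteration, the tolerances, and the toy auxiliary matrix are assumptions made for illustration. In the procedure above, the rows of the auxiliary matrix would instead be the dfbeta-type residuals from the phase-one Cox fit.

```python
import numpy as np

def rake_weights(aux_all, val_idx, pi_val, tol=1e-10, max_iter=50):
    """Raked weights g_i / pi_i for the validation subset.

    aux_all : (n, k) auxiliary variables A_i for all phase-one subjects
    val_idx : indices of the validation (phase-two) subjects
    pi_val  : their known sampling probabilities
    Solves sum_i A_i = sum_i R_i (g_i / pi_i) A_i with g_i = exp(-lam' A_i).
    """
    aux_all = np.asarray(aux_all, dtype=float)
    aux_val = aux_all[val_idx]
    design_w = 1.0 / np.asarray(pi_val, dtype=float)   # Horvitz-Thompson weights
    target = aux_all.sum(axis=0)                       # phase-one totals of A
    lam = np.zeros(aux_all.shape[1])
    for _ in range(max_iter):
        w = design_w * np.exp(-aux_val @ lam)
        resid = aux_val.T @ w - target                 # calibration equation residual
        if np.max(np.abs(resid)) < tol * max(1.0, np.max(np.abs(target))):
            break
        jac = -(aux_val * w[:, None]).T @ aux_val      # derivative of resid w.r.t. lam
        lam = lam - np.linalg.solve(jac, resid)        # Newton step
    return design_w * np.exp(-aux_val @ lam), lam
```

The returned weights are positive by construction and would be supplied to a weighted Cox fit on the validation subset to solve (7). Including a column of ones in the auxiliary matrix additionally calibrates the weights to the phase-one sample size.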
We examined the ﬁnite sample performance of our proposed RC, RSRC, GRRC, and GRN estimatorsthrough simulation for the error framework described in Section 2. These four estimators were comparedto those from the true model, a Cox proportional hazards regression model ﬁt with the true covariatesand event times, a naive Cox model ﬁt with the error-prone covariates and/or error-prone censored eventtimes, and the complete-case estimator using only the true covariates and event times in the validationsubset. We note that all validation subsets were selected as simple random samples with known samplingprobability, meaning the complete-case estimator is equivalent to the HT estimator. Following Section2.1, we considered the additive error structure with correlated covariate and time-to-event error. Inaddition to this case, we also considered the censored outcome error only setting. We further consideredcorrelated covariate and censored outcome error under the special case where the covariates are onlysubject to random error, namely classical measurement error (cid:16) ( α , α ) = (cid:126) α = (cid:126) (cid:17) . In addition, weconsidered the general error structure described in Section 2.2, where there exists errors in the time-to-event that result from mistiming as well as misclassiﬁcation in addition to additive covariate error. Wepresent % biases, average bootstrap standard errors (ASE) for the 4 proposed estimators or average modelstandard errors (ASE) for the naive and complete case estimators, empirical standard errors (ESE), meansquare errors (MSE), and 95 % coverage probabilities (CP) for varying values of the log hazard ratio β X , % censoring, and error variances and covariances. We additionally present type 1 error results for β X = 0 and α = 0 . . All simulations were run 2000 times using R version 3.4.2. The error-prone covariate X was generatedas a standard normal distribution and the error-free covariate as Z ∼ N (2 , , with ρ X,Z = 0 . . 
We set the true log hazard ratios to be $\beta_X \in \{\log(1.5), \log(3)\}$, which we refer to as moderate and large, respectively, and $\beta_Z = \log(2)$. The true survival time T was generated from an exponential distribution with rate equal to $\lambda \exp(\beta_X X + \beta_Z Z)$, where λ = 0. . We then simulated 25% and 75% censoring, which we refer to as common and rare event settings, respectively, by generating separate random right censoring times for each $\beta_X$ to yield the desired % censored event times. Censoring times were generated as Uniform distributions with length and . for each % censored time, respectively, to mimic studies of different lengths. For the error terms $\epsilon$ and $\nu$, we considered normal distributions with means 0, variances $(\Sigma_{\epsilon\epsilon} = \sigma^2_\epsilon, \Sigma_{\nu\nu} = \sigma^2_\nu) \in \{0.5, 1\}$, and $(\Sigma_{\epsilon\nu} = \sigma_{\epsilon\nu}) \in \{0.15, 0.30\}$, resulting in correlations ranging from . to . . The error-prone covariate and censored event time were generated with parameters (α , α , α ) = (0, . , − . ) and (γ , γ , γ ) = (σ_ν × , . , − . ). The choice of γ is such that the error-prone time is a valid event time (i.e., greater than zero) with high probability. The few censored event times that were less than 0 were reflected across 0 to generate valid outcomes.

For the error terms $\epsilon$ and $\nu$, we also considered a mixture of a point mass at zero and a shifted gamma distribution with the same means and covariances as the normal distributions to determine the robustness of our methods to non-normality of errors. Note that while the RC and RSRC estimators are expected to be challenged by such departures from normality, the raking estimators are not affected by the structure of the measurement error other than by the strength of the correlation between the auxiliary variables and the target variables. The mixture probability was set to be . for both covariate and outcome error. For the misclassification example, we set β_X = log(1. ), σ_ε = σ_ν = 0. , σ_εν = 0.
, with normally distributed error terms and censoring. In addition, the sensitivity and specificity for $\Delta$ were set to by adding Bernoulli error (p = 0. ). For all simulations, we set the number of subjects to be and selected the validation subsets as simple random samples of size 200, or π_i = π = 0. . The data example in Section 6 considers selecting the validation subsets using unequal sampling probabilities via outcome-dependent sampling.

Standard errors for the RC, GRRC, and GRN estimates were obtained using the bootstrap method with bootstrap sampling stratified on the validation subset membership and using bootstrap samples. Note that while the raking estimators have known sandwich variance estimators for the asymptotic variance, we used the bootstrap to calculate standard errors and coverage probabilities (see Appendix of the Supplementary Materials for an empirical comparison). The RSRC standard errors were also calculated similarly using the bootstrap; however, only bootstrap samples were utilized due to its computational burden. In addition, the RSRC estimators were recalibrated at deciles of the observed event times.

For all discussed tables, we observed that the naive estimates had very large bias with coverage hovering around . In contrast, the complete case estimates were nearly unbiased for all settings discussed, but suffered from large standard errors, particularly for rare event settings when there were only a few subjects who had events in the validation subset. The coverage of the complete case estimates was near for all settings. In the discussion of simulation results to follow, we focus on the 4 proposed estimators and how their relative performance differed across settings.

Table 1 presents the relative performance for estimating $\beta_X$ in the presence of the time-to-event error described in Section 2.1 and no covariate error, with $\nu \sim N(0, \sigma^2_\nu)$. The RC estimates had moderate to large bias (− to −) and coverage ranging from .
to , depending on whether $\beta_X$ was moderate or large. We observed around a decrease in bias for the RSRC estimates compared to RC for moderate $\beta_X$ and common events and a range of − bias reduction for other settings, with coverage around − and for moderate and large $\beta_X$, respectively. The reduction in bias for the RSRC estimates resulted in a lower MSE for all settings except under moderate $\beta_X$ and rare events, a setting in which RC is known to perform well. Both raking estimates were nearly unbiased across all parameter settings, had uniformly lower standard errors than the complete case estimates, and had coverage near . Interestingly, the performances of the GRRC and GRN estimators were virtually indistinguishable, with similar bias, standard errors, MSE, and coverage. Overall, RSRC had the lowest MSE for all moderate $\beta_X$ settings whereas the raking estimates had the lowest MSE for all large $\beta_X$ settings.

Tables 2 and 3 consider the relative performance for estimating a moderate log hazard ratio in the setting of correlated additive errors in the outcome and covariate as described in Section 2.1 for normally distributed error terms and common and rare events, respectively. The RC estimates had relatively moderate bias (− to −) and coverage ranging from . to . . For common events, the RSRC estimates had around less bias than the RC estimates, whereas for rare events, they yielded only a small decrease in bias. Even in these more complex error settings, both raking estimates remained nearly unbiased, had lower standard errors than the complete case estimates, and maintained coverage around across varying error variances and covariances. We noticed that for all parameter settings, the GRRC and GRN estimators were again nearly indistinguishable. Overall for the common event settings, the RSRC estimates had the lowest MSE when the error variances were both . ; otherwise, the raking estimates had the lowest MSE for all other settings.
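The summary measures reported in the simulation tables (% bias, ASE, ESE, MSE, and CP) can be computed from the per-replicate estimates and standard errors. The Python sketch below shows one way to do so. It is an illustration only: the function name `summarize` is hypothetical, and the coverage calculation assumes Wald-type intervals (estimate ± 1.96 × standard error), which may differ from how the intervals were constructed in the study.

```python
import statistics


def summarize(estimates, std_errors, beta_true, z=1.96):
    """Summarize simulation replicates for one estimator.

    estimates  : point estimates of the log hazard ratio, one per replicate
    std_errors : matching standard errors (bootstrap or model-based)
    beta_true  : true log hazard ratio used to generate the data
    """
    mean_est = statistics.mean(estimates)
    # Count replicates whose Wald interval (an assumption) covers the truth.
    covered = sum(
        1
        for b, se in zip(estimates, std_errors)
        if b - z * se <= beta_true <= b + z * se
    )
    return {
        # Percent bias is undefined when the true value is zero.
        "pct_bias": (
            100.0 * (mean_est - beta_true) / beta_true if beta_true != 0 else None
        ),
        "ase": statistics.mean(std_errors),    # average standard error
        "ese": statistics.stdev(estimates),    # empirical standard error
        "mse": statistics.mean([(b - beta_true) ** 2 for b in estimates]),
        "cp": covered / len(estimates),        # coverage probability
    }


# Toy replicates (assumed values, not simulation output from the paper).
result = summarize([0.4, 0.6, 0.5, 0.5], [0.04, 0.04, 0.04, 0.04], beta_true=0.5)
```

Because MSE is approximately the squared bias plus the squared ESE, an estimator with small bias but a large ESE (such as the complete case estimator) can have a larger MSE than a moderately biased but precise one.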
For the rare event settings, the RC estimates had the lowest MSE across all variance and covariance settings.

We present the relative performance for estimating a larger log hazard ratio, keeping other parameters the same as in Tables 2 and 3, in Table 4 and Supplementary Materials Table 1 in Appendix C. Both the RC and RSRC estimates had large bias, ranging from − to − and − to −, respectively, as well as coverage or below. Again, both raking estimates remained nearly unbiased, had lower standard errors than the complete case estimates, and maintained coverage around across varying error variances and covariances, with the GRRC and GRN estimates indistinguishable. Across all error settings, the raking estimates had the lowest MSE.

Table 5 presents the type 1 error, ASE, ESE, and MSE when $\beta_X = 0$ in the presence of correlated, additive measurement error in the outcome and covariate X with normally distributed errors. For both levels of censoring, the type 1 error of the RC and RSRC estimates ranged from . to . and the raking estimates were around . and . for common and rare events, respectively. It is of note that the type 1 error for the naive estimator is for both levels of censoring, meaning the null hypothesis was falsely rejected in every simulation run.

Results for $\beta_Z$, for the settings presented in Tables 1-4, are presented in Tables 2-5 of Appendix C in the Supplementary Materials. The conclusions for this parameter were similar to those of $\beta_X$; however, the raking estimates had the lowest MSE across more settings. Tables 6-8 in Appendix D of the Supplementary Materials present simulation results for $\beta_X$ in a setting where the covariates are only subject to classical measurement error, keeping all other settings the same as Tables 2-4.
Results are similar to those presented above.

We consider the relative performance for when the error distributions were generated as a mixture of a point mass at 0 and shifted gamma distribution, with settings otherwise the same as those in Tables 1-4, in Tables 9-12 of Appendix E in the Supplementary Materials. The RC and RSRC estimators were challenged by such departures from normality, with generally more bias and higher MSE, while the raking estimators remained unbiased with lower MSE.

Table 13 in Appendix F of the Supplementary Materials considers the relative performance of the estimators in the presence of misclassification errors in addition to the correlated additive errors in the time-to-event and covariate X, as described in Section 2.2. The RC and RSRC estimates had very large bias and coverage between and as these methods were not developed to directly handle misclassification. As expected, the GRRC and GRN estimates were nearly unbiased because the raking estimators do not depend on the structure of the measurement error. Overall, the raking estimators had the lowest MSE in this more complex error setting.

We applied the four proposed methods to electronic health records data from a large HIV clinic, the Vanderbilt Comprehensive Care Clinic (VCCC). The VCCC is an outpatient clinic that provides care to HIV patients and collects clinical data over time that is electronically recorded by nurses and physicians (Lemly et al., 2009). The VCCC fully validated all key variables for all records, resulting in an unvalidated, error-prone dataset and a fully validated dataset that we consider to be correct. Thus, this observational cohort is ideal for directly assessing the relative performance of the proposed regression calibration and raking estimators compared to the naive and HT estimators. Note that the naive estimator was calculated using only the unvalidated dataset as if the validated dataset did not exist.
In addition, the HT estimator was calculated using a subsample of the fully validated dataset. Throughout this example, we considered the estimates from the fully validated dataset to be the “truth” and defined these as the parameters of interest. In addition, all considerations of bias were relative to these target parameters. We considered two different failure time outcomes of interest: time from the start of antiretroviral therapy (ART) to the time of virologic failure and to the time of first AIDS defining event (ADE). For the former analysis, virologic failure was defined as an HIV-RNA count greater than or equal to 400 copies/mL and patients were censored at the last available test date after ART initiation. The HIV-RNA assay, and hence time at virologic failure, was largely free of errors, whereas the time at ART start was error-prone, corresponding to errors in U. The ADE outcome was defined as the first opportunistic infection (OI) and patients were censored at age of death if it occurred or last available test date after ART initiation. For this failure time, both time of ART initiation and time at first ADE were error-prone, corresponding to errors in U and $\Delta$. We studied the association between the outcomes of interest and the CD4 count and age at ART initiation. Since date of ART initiation was error-prone, CD4 and age at ART initiation may also have errors. Appendix G of the Supplementary Materials provides detail on the eligibility criteria and statistics for the covariate and time-to-event error for both analyses.

The analysis of the virologic failure outcome included 1863 patients with moderate censoring rates of . and . in the unvalidated and validated dataset, respectively. We observed highly (slightly) skewed error in CD4 count at ART start (observed event times) and very small amounts of misclassification.
The validation subset was selected as a simple random sample of , resulting in 373 patients. For this sampling design, the HT estimator is equivalent to the complete case estimator. The hazard ratios and their corresponding confidence intervals comparing the estimators are displayed graphically in the first row of Figure 1 and shown in Table 14 in Appendix H of the Supplementary Materials. We note that the standard errors for all estimators (including the true, naive, and HT) were calculated using the bootstrap with 300 replicates, which were somewhat larger than the model SEs, likely due to a lack of fit of the Cox model. The RSRC estimators were recalibrated at vigintiles of the observed event times. For this analysis, there was little bias in the naive estimators of a 100 cells/mm³ increase in CD4 count at ART initiation and 10 year increase of age at ART initiation ( . and . , respectively). For both covariates, RC and RSRC provided very minimal improvements in bias, albeit with slightly wider confidence intervals. Small bias notwithstanding, we noticed that both the GRRC and GRN estimators had smaller bias compared to the naive estimator and had narrower confidence intervals than the HT estimator. The GRRC and GRN estimators had very little differentiating them, similar to what was observed in the simulations.

The analysis of the ADE outcome included 1595 patients with very high censoring rates of . and . in the unvalidated and validated dataset, respectively. We observed highly (slightly) skewed error in CD4 count at ART start (observed event times) and a misclassification rate of that was largely due to false positives (positive predictive value = 35%). While the RC and RSRC methods developed in this paper do not explicitly handle misclassification, we were nevertheless interested in seeing how they would perform in this real data scenario in comparison to the raking methods that can handle misclassification.
Due to ADE being a rare event, we utilized a case-cohort sampling scheme to select the validation subset. Specifically, we selected a simple random sample of , or 112 patients, from the full error-prone data and then added the remaining 227 subjects classified as cases by the error-prone ADE indicator to the validation subset. Note that due to the biased sampling scheme of the case-cohort design, the estimates of the conditional expectations involved in the RC and RSRC estimators cannot be calculated in the same manner as under simple random sampling. Thus, we used IPW least squares to estimate the conditional expectations for RC, RSRC, and GRRC (step one of calculating raking estimates as detailed in Section 4.3). The hazard ratios and their corresponding confidence intervals comparing the estimators are displayed graphically in the second row of Figure 1 and shown in Table 14 in Appendix H of the Supplementary Materials. The standard errors for all estimators were again calculated using the bootstrap with 300 replicates. We noticed significantly more bias in the naive estimators of a 100 cells/mm³ increase in CD4 count at ART initiation and 10 year increase of age at CD4 count measurement ( . and . , respectively). In fact, the naive point estimate for age was in the wrong direction compared to the true estimate, yielding anticonservative bias. The RC and RSRC estimators provided little to no bias improvement for both covariates. However, the GRRC and GRN estimates were both nearly unbiased with narrower confidence intervals than those of the HT estimator. Again, we noticed that the GRRC and GRN estimators gave similar estimates, with GRRC (GRN) having narrower confidence intervals for the CD4 (age) hazard ratios.
In this analysis, we noticed huge improvements in bias from the GRRC and GRN estimators compared to the naive estimators and decreased standard errors compared to the HT estimator even in the presence of appreciable misclassification, which the RC and RSRC estimators could not handle.

The R package RRCME at https://github.com/ericoh17/RRCME implements our methods on a simulated data set that mimics the structure of the VCCC data. Additionally, Appendix I of the Supplementary Materials contains code that implements the RC and GRN estimators for this simulated data to demonstrate ease of application of these estimators.
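The case-cohort validation design used for the ADE analysis implies unequal sampling probabilities, which is why the conditional expectations were estimated with IPW least squares. The Python sketch below illustrates the idea under stated assumptions: subjects flagged as cases by the error-prone indicator are taken to be validated with probability 1, and all other subjects with probability 112/1595 (the subcohort size over the cohort size given above). The function names and the toy data are hypothetical, and the single-predictor regression stands in for whatever calibration model is actually used.

```python
def case_cohort_weights(is_case, n_cohort, n_subcohort):
    """Inverse probability weights for validated subjects in a case-cohort design.

    Assumes error-prone cases are sampled with probability 1 and non-cases
    with probability n_subcohort / n_cohort.
    """
    subcohort_prob = n_subcohort / n_cohort
    return [1.0 if case else 1.0 / subcohort_prob for case in is_case]


def ipw_least_squares(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Toy validated subjects (assumed values): error-prone covariate x_star and
# validated covariate x_true, related here by an exact line for illustration.
is_case = [True, False, False, True, False]
x_star = [1.0, 2.0, 3.0, 4.0, 5.0]
x_true = [2.0 + 3.0 * v for v in x_star]

weights = case_cohort_weights(is_case, n_cohort=1595, n_subcohort=112)
intercept, slope = ipw_least_squares(x_star, x_true, weights)
```

The fitted line can then be used to impute the validated covariate for unvalidated subjects; weighting by the inverse sampling probability keeps the over-represented cases from dominating the fit.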

Discussion

Data collected primarily for non-research purposes, such as those from administrative databases or EHR, can have errors in both the outcome and exposures of interest, which can be correlated. Using EHR data from the VCCC HIV cohort, we observed that Cox regression models using the unvalidated dataset as compared to the fully validated dataset resulted in a 3-fold underestimation of the CD4 hazard ratio for ADE and overestimation of the age hazard ratio in the wrong direction such that the null hypothesis of a unit hazard ratio was nearly rejected. Spurious associations driven by such unvalidated outcomes and exposures can misdirect clinical researchers and can be harmful to patients down the line. Even when variables are reviewed and validated for a subset of the records, the additional information gained from these validation procedures is not often utilized in estimation.

The existing literature does not adequately address such complex error across multiple variables; in particular, the timing error in the censored failure time outcome. In this article, we developed four different estimators that incorporate an internal validation subset in the analysis to try to obtain unbiased and efficient estimates. The RC and RSRC estimators approximate the true model by estimating the true outcome and/or exposure given the unvalidated data and information on the error structure from the validation subset. This approximation lacks consistency in most cases for nonlinear models and the RC and RSRC estimators can have appreciable bias for some error settings. However, in settings with a modest hazard ratio and rare events, RC outperformed the other estimators with respect to having the lowest MSE. RSRC had the lowest MSE for settings with a modest hazard ratio and common events under only censored outcome error and for settings with a modest hazard ratio, common events, and small error variance under correlated outcome and covariate error.
The proposed regression calibration methods were considered for the proportional hazards model; however, we expect they would work quite well more generally in accelerated failure time models where an additive error structure is assumed. In fact, some forms of error in the outcome will bias the proportional hazards parameter but not the acceleration parameter (Oh et al., 2018).

The generalized raking estimators are consistent whenever the design-weighted complete case estimating equations (e.g., the HT estimator) yield consistent estimators; they use influence functions based on the unvalidated data as auxiliary variables to improve efficiency over the complete case estimator and can be used under outcome-dependent sampling. The raking estimators are not sensitive to the measurement error structure, which is in contrast to the RC and RSRC estimators that can perform poorly when the error structure is not correctly specified. In particular, we noticed in our data example and simulations that in the presence of misclassification as well as timing errors, GRRC and GRN yield nearly unbiased estimates while RC and RSRC are substantially biased. Generally, the raking estimators performed well, with little small sample bias and, in most cases, the smallest MSE. The raking estimators had large efficiency gains in settings with a large hazard ratio as well as those with a modest hazard ratio, common events, and large error variances. For all settings considered, GRRC and GRN performed similarly. GRN has the added advantage that it can be applied with standard statistical software, e.g., the survey package in R (Lumley, 2016).

As noted above, the performance of the GRRC and GRN estimators was virtually identical, contrary to our hypothesis that the GRRC estimates would be more efficient than those of GRN.
This result was unknown for previous applications of raking (Breslow et al., 2009, Lumley et al., 2011) and in fact goes against their recommendation to build imputation models for the partially missing variables. For the setting of only classical covariate measurement error and no time-to-event error, we derived (not shown) that the influence functions for Cox regression using $X^\star$ versus $\hat{X}$ are scalar multiples of each other. Thus, the solutions to (7) under both auxiliary variables are equivalent. For the more complex error settings considered in this paper (Sections 2.1, 2.2), an explicit characterization of the relationship between the two auxiliary variables is more difficult, but we hypothesize that an approximation of a similar type holds for the settings studied.

The motivating example for this paper was to develop methods where there were only errors in the failure time outcome but not in the failure indicator. We additionally considered methods, namely GRRC and GRN, that are able to address more general error structures. We believe future research investigating RC methods to directly correct for misclassification resulting from time-to-event error would be worthwhile. In addition, while theory demonstrates that generalized raking estimators are consistent, we noticed that the small sample bias (and efficiency) can depend on the specific validation subsample. Developing optimal subsampling schemes to maximize efficiency would not only improve the complete case analysis, but also increase the efficiency gains of the raking estimators and is an area of future work.

Acknowledgements

We would like to thank Timothy Sterling, MD and the co-investigators of the Vanderbilt Comprehensive Care Clinic (VCCC) for use of their data. This work was supported by a Patient Centered Outcomes Research Institute (PCORI) Award [R-1609-36207] and the U.S. National Institutes of Health (NIH) [R01-AI131771, P30-AI110527, R01-AI093234, U01-AI069923, and U01-AI069918].
The statements in this manuscript are solely the responsibility of the authors and do not necessarily represent the views of PCORI or NIH.

References

P. K. Andersen and R. D. Gill. Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, pages 1100–1120, 1982.

N. E. Breslow and J. A. Wellner. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scandinavian Journal of Statistics, 34(1):86–102, 2007.

N. E. Breslow, T. Lumley, C. M. Ballantyne, L. E. Chambless, and M. Kulich. Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Statistics in Biosciences, 1(1):32–49, 2009.

R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press, 2006.

J. C. Deville and C. E. Särndal. Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418):376–382, 1992.

J. C. Deville, C. E. Särndal, and O. Sautory. Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88(423), 1993.

S. N. Duda, B. E. Shepherd, C. S. Gadd, D. R. Masys, and C. C. McGowan. Measuring the quality of observational study data in an international HIV research network. PLoS One, 7(4):e33908, 2012.

J. K. Edwards, S. R. Cole, M. A. Troester, and D. B. Richardson. Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data. American Journal of Epidemiology, 177(9):904–912, 2013.

S. Hunsberger, P. S. Albert, and L. Dodd. Analysis of progression-free survival data using a discrete time survival model that incorporates measurements with and without diagnostic error. Clinical Trials, 7(6):634–642, 2010.

M. Kulich and D. Y. Lin. Improving the efficiency of relative-risk estimation in case-cohort studies. Journal of the American Statistical Association, 99(467):832–844, 2004.

D. C. Lemly, B. E. Shepherd, T. Hulgan, P. Rebeiro, S. Stinnette, R. B. Blackwell, S. Bebawy, A. Kheshti, T. R. Sterling, and S. P. Raffanti. Race and sex differences in antiretroviral therapy use and mortality among HIV-infected persons in care. Journal of Infectious Diseases, 199(7):991–998, 2009.

T. Lumley. Complex Surveys: A Guide to Analysis Using R, volume 565. John Wiley & Sons, 2011.

T. Lumley. Survey: Analysis of complex survey samples, 2016. R package version 3.32.

T. Lumley, P. A. Shaw, and J. Y. Dai. Connections between survey calibration estimators and semiparametric models for incomplete data. International Statistical Review, 79(2):200–220, 2011.

A. S. Magaret. Incorporating validation subsets into discrete proportional hazards models for mismeasured outcomes. Statistics in Medicine, 27(26):5456–5470, 2008.

L. S. Magder and J. P. Hughes. Logistic regression when the outcome is measured with uncertainty. American Journal of Epidemiology, 146(2):195–203, 1997.

A. S. Meier, B. A. Richardson, and J. P. Hughes. Discrete proportional hazards models for mismeasured outcomes. Biometrics, 59(4):947–954, 2003.

E. J. Oh, B. E. Shepherd, T. Lumley, and P. A. Shaw. Considerations for analysis of time-to-event outcomes measured with error: bias and correction with SIMEX. Statistics in Medicine, 37(8):1276–1289, 2018.

R. L. Prentice. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69(2):331–342, 1982.

J. M. Robins, A. Rotnitzky, and L. P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.

T. Saegusa and J. A. Wellner. Weighted likelihood estimation under two-phase sampling. Annals of Statistics, 41(1):269–295, 2013.

P. A. Shaw and R. L. Prentice. Hazard ratio estimation for biomarker-calibrated dietary exposures. Biometrics, 68(2):397–407, 2012.

P. A. Shaw, J. He, and B. E. Shepherd. Regression calibration to correct correlated errors in outcome and exposure. arXiv preprint arXiv:1811.10147, November 2018.

B. E. Shepherd and C. Yu. Accounting for data errors discovered from an audit in multiple linear regression. Biometrics, 67(3):1083–1091, 2011.

A. W. Van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 1998.

C. Y. Wang, L. Hsu, Z. D. Feng, and R. L. Prentice. Regression calibration in failure time regression. Biometrics, pages 131–145, 1997.

L. Wang, P. A. Shaw, H. M. Mathelier, S. E. Kimmel, and B. French. Evaluating risk-prediction models using data from electronic health records. The Annals of Applied Statistics, 10(1):286, 2016.

S. X. Xie, C. Y. Wang, and R. L. Prentice. A risk set calibration method for failure time regression by using a covariate reliability sample.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(4):855–870, 2001.

Figure 1: The hazard ratios and their corresponding confidence intervals (CI) for a 100 cells/mm³ increase in CD4 count at ART initiation and 10 year increase in age at CD4 count measurement. Estimates and their CIs are calculated using the bootstrap for the Regression Calibration (RC), Risk Set Regression Calibration (RSRC), Generalized Raking Regression Calibration (GRRC), and Generalized Raking Naive (GRN) estimators.

Table 1: Simulation results for β_X under additive measurement error only in the outcome with normally distributed error and 25% and 75% censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95% coverage probabilities (CP) are presented.

% Censoring  β_X       σ²_ν  Method    % Bias    ASE    ESE    MSE    CP
25           log(1.5)  -     True      -0.025    0.030  0.031  0.001  0.947
25           log(1.5)  0.5   RC        -12.677   0.042  0.043  0.004  0.752
25           log(1.5)  0.5   RSRC      -5.056    0.048  0.050  0.003  0.928
25           log(1.5)  0.5   GRRC      0.074     0.059  0.058  0.003  0.957
25           log(1.5)  0.5   GRN       0.271     0.060  0.059  0.003  0.958
25           log(1.5)  0.5   Naive     -37.562   0.030  0.031  0.024  0.002
25           log(1.5)  0.5   Complete  0.321     0.098  0.098  0.010  0.952
25           log(1.5)  1     RC        -18.522   0.046  0.047  0.008  0.624
25           log(1.5)  1     RSRC      -7.991    0.055  0.056  0.004  0.910
25           log(1.5)  1     GRRC      -0.025    0.066  0.065  0.004  0.956
25           log(1.5)  1     GRN       0.074     0.066  0.065  0.004  0.958
25           log(1.5)  1     Naive     -40.891   0.030  0.030  0.028  0.000
25           log(1.5)  1     Complete  0.321     0.098  0.098  0.010  0.954
25           log(3)    -     True      0.046     0.037  0.036  0.001  0.951
25           log(3)    0.5   RC        -26.879   0.054  0.056  0.090  0.001
25           log(3)    0.5   RSRC      -19.188   0.060  0.063  0.048  0.070
25           log(3)    0.5   GRRC      -0.983    0.103  0.102  0.010  0.938
25           log(3)    0.5   GRN       -1.010    0.104  0.104  0.011  0.939
25           log(3)    0.5   Naive     -37.347   0.031  0.040  0.170  0.000
25           log(3)    0.5   Complete  0.819     0.118  0.118  0.014  0.954
25           log(3)    1     RC        -33.042   0.056  0.058  0.135  0.000
25           log(3)    1     RSRC      -23.466   0.065  0.067  0.071  0.027
25           log(3)    1     GRRC      -0.883    0.108  0.105  0.011  0.940
25           log(3)    1     GRN       -0.847    0.108  0.106  0.011  0.942
25           log(3)    1     Naive     -41.88    0.030  0.039  0.213  0.000
25           log(3)    1     Complete  0.819     0.118  0.118  0.014  0.955
75           log(1.5)  -     True      0.074     0.054  0.054  0.003  0.948
75           log(1.5)  0.5   RC        -15.340   0.079  0.080  0.010  0.872
75           log(1.5)  0.5   RSRC      -12.874   0.087  0.089  0.011  0.898
75           log(1.5)  0.5   GRRC      -0.099    0.113  0.112  0.012  0.957
75           log(1.5)  0.5   GRN       0.543     0.116  0.117  0.014  0.955
75           log(1.5)  0.5   Naive     -69.204   0.054  0.055  0.082  0.000
75           log(1.5)  0.5   Complete  0.444     0.176  0.182  0.033  0.950
75           log(1.5)  1     RC        -17.338   0.081  0.084  0.012  0.845
75           log(1.5)  1     RSRC      -15.488   0.089  0.092  0.012  0.873
75           log(1.5)  1     GRRC      -0.444    0.118  0.118  0.014  0.952
75           log(1.5)  1     GRN       0.247     0.120  0.121  0.015  0.953
75           log(1.5)  1     Naive     -57.638   0.054  0.056  0.058  0.016
75           log(1.5)  1     Complete  -0.099    0.177  0.182  0.033  0.946
75           log(3)    -     True      0.118     0.058  0.059  0.003  0.950
75           log(3)    0.5   RC        -31.030   0.085  0.088  0.124  0.024
75           log(3)    0.5   RSRC      -28.827   0.094  0.097  0.110  0.087
75           log(3)    0.5   GRRC      -0.901    0.166  0.163  0.027  0.951
75           log(3)    0.5   GRN       -0.446    0.168  0.175  0.031  0.950
75           log(3)    0.5   Naive     -52.357   0.053  0.062  0.335  0.000
75           log(3)    0.5   Complete  1.912     0.191  0.197  0.039  0.946
75           log(3)    1     RC        -33.060   0.087  0.091  0.140  0.024
75           log(3)    1     RSRC      -31.567   0.095  0.099  0.130  0.055
75           log(3)    1     GRRC      -0.774    0.171  0.170  0.029  0.940
75           log(3)    1     GRN       -0.501    0.171  0.172  0.030  0.942
75           log(3)    1     Naive     -48.680   0.053  0.061  0.290  0.000
75           log(3)    1     Complete  1.930     0.193  0.202  0.041  0.946

Table 2: Simulation results for β_X = log 1.5 under correlated, additive measurement error in the outcome and covariate X with normally distributed error and 25% censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95% coverage probabilities (CP) are presented.
β_X       σ²_ν  σ²_ε  σ_ν,ε   Method    % Bias    ASE    ESE    MSE    CP
log(1.5)  -     -     -       True      -0.025    0.030  0.031  0.001  0.947
log(1.5)  0.5   0.5   0.15    RC        -13.762   0.059  0.059  0.007  0.804
log(1.5)  0.5   0.5   0.15    RSRC      -6.338    0.070  0.068  0.005  0.922
log(1.5)  0.5   0.5   0.15    GRRC      0.173     0.083  0.084  0.007  0.947
log(1.5)  0.5   0.5   0.15    GRN       0.345     0.083  0.084  0.007  0.946
log(1.5)  0.5   0.5   0.15    Naive     -79.760   0.024  0.025  0.105  0.000
log(1.5)  0.5   0.5   0.15    Complete  0.321     0.098  0.098  0.010  0.952
log(1.5)  0.5   0.5   0.30    RC        -13.491   0.060  0.060  0.007  0.813
log(1.5)  0.5   0.5   0.30    RSRC      -6.116    0.071  0.069  0.005  0.928
log(1.5)  0.5   0.5   0.30    GRRC      0.296     0.083  0.084  0.007  0.947
log(1.5)  0.5   0.5   0.30    GRN       0.567     0.083  0.084  0.007  0.945
log(1.5)  0.5   0.5   0.30    Naive     -97.024   0.024  0.025  0.155  0.000
log(1.5)  0.5   0.5   0.30    Complete  0.173     0.098  0.099  0.010  0.954
log(1.5)  0.5   1     0.15    RC        -13.836   0.072  0.071  0.008  0.843
log(1.5)  0.5   1     0.15    RSRC      -7.054    0.084  0.083  0.008  0.922
log(1.5)  0.5   1     0.15    GRRC      0.049     0.089  0.090  0.008  0.948
log(1.5)  0.5   1     0.15    GRN       0.148     0.089  0.090  0.008  0.952
log(1.5)  0.5   1     0.15    Naive     -86.099   0.020  0.020  0.122  0.000
log(1.5)  0.5   1     0.15    Complete  0.271     0.098  0.098  0.010  0.952
log(1.5)  0.5   1     0.30    RC        -13.639   0.073  0.072  0.008  0.845
log(1.5)  0.5   1     0.30    RSRC      -6.955    0.086  0.084  0.008  0.914
log(1.5)  0.5   1     0.30    GRRC      0.074     0.089  0.090  0.008  0.947
log(1.5)  0.5   1     0.30    GRN       0.271     0.089  0.089  0.008  0.945
log(1.5)  0.5   1     0.30    Naive     -97.912   0.020  0.020  0.158  0.000
log(1.5)  0.5   1     0.30    Complete  0.222     0.098  0.098  0.010  0.957
log(1.5)  1     0.5   0.15    RC        -19.237   0.065  0.065  0.010  0.746
log(1.5)  1     0.5   0.15    RSRC      -9.520    0.078  0.076  0.007  0.902
log(1.5)  1     0.5   0.15    GRRC      0.123     0.085  0.086  0.007  0.944
log(1.5)  1     0.5   0.15    GRN       0.247     0.085  0.086  0.007  0.944
log(1.5)  1     0.5   0.15    Naive     -79.686   0.024  0.025  0.105  0.000
log(1.5)  1     0.5   0.15    Complete  0.321     0.098  0.098  0.010  0.954
log(1.5)  1     0.5   0.30    RC        -19.311   0.066  0.066  0.010  0.743
log(1.5)  1     0.5   0.30    RSRC      -9.693    0.079  0.077  0.008  0.903
log(1.5)  1     0.5   0.30    GRRC      0.148     0.085  0.086  0.007  0.945
log(1.5)  1     0.5   0.30    GRN       0.345     0.085  0.085  0.007  0.946
log(1.5)  1     0.5   0.30    Naive     -95.027   0.024  0.025  0.149  0.000
log(1.5)  1     0.5   0.30    Complete  0.173     0.098  0.098  0.010  0.955
log(1.5)  1     1     0.15    RC        -19.213   0.079  0.079  0.012  0.801
log(1.5)  1     1     0.15    RSRC      -10.235   0.095  0.092  0.010  0.908
log(1.5)  1     1     0.15    GRRC      -0.025    0.090  0.092  0.008  0.945
log(1.5)  1     1     0.15    GRN       0.074     0.090  0.091  0.008  0.946
log(1.5)  1     1     0.15    Naive     -86.049   0.020  0.020  0.122  0.000
log(1.5)  1     1     0.15    Complete  0.148     0.098  0.099  0.010  0.952
log(1.5)  1     1     0.30    RC        -19.213   0.080  0.080  0.012  0.798
log(1.5)  1     1     0.30    RSRC      -10.580   0.096  0.093  0.010  0.902
log(1.5)  1     1     0.30    GRRC      0.123     0.090  0.091  0.008  0.947
log(1.5)  1     1     0.30    GRN       0.247     0.090  0.091  0.008  0.948
log(1.5)  1     1     0.30    Naive     -96.556   0.020  0.020  0.154  0.000
log(1.5)  1     1     0.30    Complete  0.321     0.098  0.098  0.010  0.953

Table 3: Simulation results for β_X = log 1.5 under correlated, additive measurement error in the outcome and covariate X with normally distributed error and 75% censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95% coverage probabilities (CP) are presented.

β_X       σ²_ν  σ²_ε  σ_ν,ε   Method    % Bias    ASE    ESE    MSE    CP
log(1.5)  -     -     -       True      0.074     0.054  0.054  0.003  0.948
log(1.5)  0.5   0.5   0.15    RC        -15.143   0.109  0.108  0.015  0.906
log(1.5)  0.5   0.5   0.15    RSRC      -12.677   0.120  0.120  0.017  0.925
log(1.5)  0.5   0.5   0.15    GRRC      0.222     0.154  0.153  0.023  0.955
log(1.5)  0.5   0.5   0.15    GRN       0.987     0.156  0.156  0.024  0.956
log(1.5)  0.5   0.5   0.15    Naive     -120.208  0.046  0.046  0.240  0.000
log(1.5)  0.5   0.5   0.15    Complete  0.444     0.176  0.182  0.033  0.950
log(1.5)  0.5   0.5   0.30    RC        -14.477   0.109  0.108  0.015  0.900
log(1.5)  0.5   0.5   0.30    RSRC      -11.715   0.121  0.119  0.016  0.922
log(1.5)  0.5   0.5   0.30    GRRC      0.099     0.154  0.152  0.023  0.954
log(1.5)  0.5   0.5   0.30    GRN       1.406     0.154  0.154  0.024  0.954
log(1.5)  0.5   0.5   0.30    Naive     -167.043  0.048  0.049  0.461  0.000
log(1.5)  0.5   0.5   0.30    Complete  0.444     0.177  0.183  0.034  0.948
log(1.5)  0.5   1     0.15    RC        -14.896   0.134  0.131  0.021  0.920
log(1.5)  0.5   1     0.15    RSRC      -13.047   0.146  0.146  0.024  0.931
log(1.5)  0.5   1     0.15    GRRC      -0.099    0.166  0.164  0.027  0.962
log(1.5)  0.5   1     0.15    GRN       0.271     0.168  0.166  0.028  0.958
log(1.5)  0.5   1     0.15    Naive     -113.623  0.038  0.038  0.214  0.000
log(1.5)  0.5   1     0.15    Complete  0.271     0.177  0.183  0.034  0.952
log(1.5)  0.5   1     0.30    RC        -14.650   0.133  0.131  0.021  0.922
log(1.5)  0.5   1     0.30    RSRC      -12.381   0.146  0.145  0.024  0.936
log(1.5)  0.5   1     0.30    GRRC      0.839     0.166  0.164  0.027  0.958
log(1.5)  0.5   1     0.30    GRN       1.430     0.168  0.167  0.028  0.956
log(1.5)  0.5   1     0.30    Naive     -143.465  0.039  0.039  0.340  0.000
log(1.5)  0.5   1     0.30    Complete  1.208     0.177  0.182  0.033  0.948
log(1.5)  1     0.5   0.15    RC        -16.993   0.113  0.114  0.018  0.890
log(1.5)  1     0.5   0.15    RSRC      -15.316   0.123  0.123  0.019  0.907
log(1.5)  1     0.5   0.15    GRRC      -0.370    0.156  0.155  0.024  0.954
log(1.5)  1     0.5   0.15    GRN       0.444     0.158  0.157  0.024  0.952
log(1.5)  1     0.5   0.15    Naive     -102.228  0.045  0.046  0.174  0.000
log(1.5)  1     0.5   0.15    Complete  -0.099    0.177  0.182  0.033  0.946
log(1.5)  1     0.5   0.30    RC        -17.264   0.113  0.112  0.017  0.892
log(1.5)  1     0.5   0.30    RSRC      -15.464   0.124  0.124  0.019  0.904
log(1.5)  1     0.5   0.30    GRRC      -0.222    0.155  0.154  0.024  0.956
log(1.5)  1     0.5   0.30    GRN       0.814     0.156  0.155  0.024  0.958
log(1.5)  1     0.5   0.30    Naive     -132.613  0.046  0.046  0.291  0.000
log(1.5)  1     0.5   0.30    Complete  0.296     0.176  0.182  0.033  0.950
log(1.5)  1     1     0.15    RC        -17.091   0.138  0.136  0.023  0.918
log(1.5)  1     1     0.15    RSRC      -15.562   0.150  0.152  0.027  0.916
log(1.5)  1     1     0.15    GRRC      -0.222    0.166  0.165  0.027  0.957
log(1.5)  1     1     0.15    GRN       0.123     0.168  0.167  0.028  0.955
log(1.5)  1     1     0.15    Naive     -101.587  0.037  0.038  0.171  0.000
log(1.5)  1     1     0.15    Complete  -0.074    0.176  0.182  0.033  0.948
log(1.5)  1     1     0.30    RC        -17.042   0.138  0.135  0.023  0.916
log(1.5)  1     1     0.30    RSRC      -15.291   0.151  0.151  0.027  0.916
log(1.5)  1     1     0.30    GRRC      0.123     0.167  0.165  0.027  0.954
log(1.5)  1     1     0.30    GRN       0.814     0.169  0.167  0.028  0.952
log(1.5)  1     1     0.30    Naive     -121.86   0.038  0.038  0.246  0.000
log(1.5)  1     1     0.30    Complete  0.617     0.177  0.180  0.032  0.954

Table 4: Simulation results for β_X = log 3 under correlated, additive measurement error in the outcome and covariate X with normally distributed error and 25% censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95% coverage probabilities (CP) are presented.

β_X       σ²_ν  σ²_ε  σ_ν,ε   Method    % Bias    ASE    ESE    MSE    CP
log(3)    -     -     -       True      0.055     0.037  0.036  0.001  0.952
log(3)    0.5   0.5   0.15    RC        -31.239   0.077  0.077  0.124  0.026
log(3)    0.5   0.5   0.15    RSRC      -23.038   0.092  0.092  0.072  0.239
log(3)    0.5   0.5   0.15    GRRC      0.337     0.113  0.112  0.012  0.950
log(3)    0.5   0.5   0.15    GRN       0.346     0.112  0.111  0.012  0.950
log(3)    0.5   0.5   0.15    Naive     -70.243   0.025  0.027  0.596  0.000
log(3)    0.5   0.5   0.15    Complete  0.819     0.118  0.118  0.014  0.954
log(3)    0.5   0.5   0.30    RC        -31.904   0.079  0.080  0.129  0.030
log(3)    0.5   0.5   0.30    RSRC      -23.102   0.097  0.096  0.074  0.274
log(3)    0.5   0.5   0.30    GRRC      0.410     0.113  0.111  0.012  0.952
log(3)    0.5   0.5   0.30    GRN       0.473     0.112  0.111  0.012  0.954
log(3)    0.5   0.5   0.30    Naive     -76.842   0.024  0.026  0.713  0.000
log(3)    0.5   0.5   0.30    Complete  0.810     0.118  0.118  0.014  0.955
log(3)    0.5   1     0.15    RC        -31.895   0.094  0.093  0.132  0.086
log(3)    0.5   1     0.15    RSRC      -24.394   0.111  0.110  0.084  0.329
log(3)    0.5   1     0.15    GRRC      0.373     0.116  0.115  0.013  0.954
log(3)    0.5   1     0.15    GRN       0.410     0.116  0.114  0.013  0.952
log(3)    0.5   1     0.15    Naive     -79.473   0.020  0.022  0.763  0.000
log(3)    0.5   1     0.15    Complete  0.719     0.118  0.118  0.014  0.956
log(3)    0.5   1     0.30    RC        -32.359   0.096  0.095  0.135  0.092
log(3)    0.5   1     0.30    RSRC      -24.540   0.115  0.113  0.086  0.351
log(3)    0.5   1     0.30    GRRC      0.391     0.116  0.114  0.013  0.957
log(3)    0.5   1     0.30    GRN       0.455     0.115  0.114  0.013  0.954
log(3)    0.5   1     0.30    Naive     -83.888   0.020  0.021  0.850  0.000
log(3)    0.5   1     0.30    Complete  0.737     0.118  0.118  0.014  0.956
log(3)    1     0.5   0.15    RC        -35.900   0.079  0.079  0.162  0.014
log(3)    1     0.5   0.15    RSRC      -26.916   0.095  0.094  0.096  0.163
log(3)    1     0.5   0.15    GRRC      0.328     0.114  0.112  0.013  0.950
log(3)    1     0.5   0.15    GRN       0.337     0.114  0.112  0.013  0.951
log(3)    1     0.5   0.15    Naive     -71.372   0.025  0.027  0.616  0.000
log(3)    1     0.5   0.15    Complete  0.819     0.118  0.118  0.014  0.955
log(3)    1     0.5   0.30    RC        -36.528   0.080  0.081  0.168  0.014
log(3)    1     0.5   0.30    RSRC      -27.334   0.098
0.097 0.100 0.181GRRC 0.337 0.114 0.112 0.013 0.949GRN 0.364 0.114 0.112 0.012 0.954Naive -76.997 0.024 0.026 0.716 0.000Complete 0.728 0.118 0.118 0.014 0.9561 0.15 RC -36.246 0.096 0.096 0.168 0.052RSRC -28.409 0.114 0.113 0.110 0.253GRRC 0.391 0.117 0.115 0.013 0.950GRN 0.401 0.116 0.115 0.013 0.950Naive -80.256 0.020 0.022 0.778 0.000Complete 0.755 0.118 0.118 0.014 0.9520.30 RC -36.674 0.098 0.097 0.172 0.056RSRC -28.754 0.116 0.115 0.113 0.264GRRC 0.428 0.117 0.114 0.013 0.952GRN 0.446 0.116 0.114 0.013 0.954Naive -84.015 0.020 0.021 0.852 0.000Complete 0.746 0.118 0.118 0.014 0.954 β X = 0 under correlated, additive measurement error in the outcomeand covariates with normally distributed error and 25 and 75 % censoring for the true event time. For simulated data sets, the type 1 error, average bootstrap standard error (ASE) for the 4 proposedestimators, average model standard error (ASE) for naive and complete case, empirical standard error(ESE), and mean squared error (MSE) are presented. % Censoring σ ν σ (cid:15) σ ν,(cid:15) Method Type 1 Error ASE ESE MSE25 0.5 0.5 0.15 RC 0.044 0.054 0.054 0.003RSRC 0.050 0.063 0.062 0.004GRRC 0.043 0.077 0.075 0.006GRN 0.042 0.078 0.075 0.006Naive 1.000 0.025 0.026 0.019Complete 0.049 0.097 0.097 0.01075 0.5 0.5 0.15 RC 0.050 0.102 0.102 0.010RSRC 0.059 0.112 0.116 0.014GRRC 0.046 0.141 0.141 0.020GRN 0.046 0.143 0.143 0.021Naive 1.000 0.045 0.047 0.080Complete 0.056 0.170 0.178 0.032 upplementary Materials for “Raking and Regression Calibration:Methods to Address Bias from Correlated Covariate andTime-to-Event Error” Eric J. Oh (cid:63) , Bryan E. Shepherd , Thomas Lumley , Pamela A. Shaw University of Pennsylvania, Perelman School of MedicineDepartment of Biostatistics, Epidemiology, and Informatics Vanderbilt University School of MedicineDepartment of Biostatistics University of AucklandDepartment of Statistics

Appendix A: Asymptotic theory for RC and RSRC estimators

First, we consider the RC extension for covariate and time-to-event error in Section 3.3. The RC estimator in this setting, $\hat{\beta}_{RC}$, is found by solving the score equation

$$S_{RC}(\beta, \hat{\zeta}) = \sum_{i=1}^{n} \int_0^{\tau} \left\{ \left\{ \hat{X}_i(\hat{\zeta}_x), Z_i \right\}' - \frac{S^{(1)}(\beta, \hat{\zeta}, t)}{S^{(0)}(\beta, \hat{\zeta}, t)} \right\} d\hat{N}_i(t; \hat{\zeta}_\omega) = 0,$$

where $S^{(r)}(\beta, \hat{\zeta}, t) = n^{-1} \sum_{j=1}^{n} \hat{Y}_j(t; \hat{\zeta}_\omega) \left\{ \hat{X}_j(\hat{\zeta}_x), Z_j \right\}'^{\otimes r} \exp\left( \beta_X' \hat{X}_j(\hat{\zeta}_x) + \beta_Z' Z_j \right)$ for $r = 0, 1$ ($a^{\otimes 1}$ is the vector $a$ and $a^{\otimes 0}$ is the scalar 1), and $\left\{ \hat{U}(\hat{\zeta}_\omega), \hat{X}(\hat{\zeta}_x) \right\}$ are as given in Section 3.3.

Throughout this section, we assume that (1) $(N_i, Y_i, X_i, Z_i)$ are i.i.d.; (2) there exists a finite constant $\tau > 0$ such that $P(U \geq \tau) > 0$; (3) $\int_0^{\tau} \lambda_0(u)\,du < \infty$; and (4) $m/n \to p \in (0, 1)$. Define $\beta^{\star}$ as the solution to $E\{S_{RC}(\beta, \zeta)\} = 0$, which is generally not the same as $\beta$. First, we consider consistency for $\beta^{\star}$ and asymptotic normality for the solution to $S_{RC}(\beta, \zeta)$, where $\zeta = (\zeta_x, \zeta_\omega)$ is the true nuisance parameter vector. Because $S_{RC}(\beta, \zeta)$ is based on the standard Cox partial score equation, whose underlying log partial likelihood is concave, it has a unique solution that is consistent for $\beta^{\star}$ under mild regularity conditions (see Andersen and Gill, 1982).

To establish asymptotic normality, we additionally define $\theta^{\star} = (\beta^{\star}, \zeta)$ and assume that (5) $\frac{\partial}{\partial\theta} S_{RC}(\theta)$ exists for $\theta \in \mathcal{N}(\theta^{\star})$, a compact neighborhood of $\theta^{\star}$; (6) $\frac{\partial}{\partial\theta} S_{RC}(\theta)$ converges to its limit $E\left\{ \frac{\partial}{\partial\theta} S_{RC}(\theta) \right\}$ uniformly in $\mathcal{N}(\theta^{\star})$; (7) $E\left\{ \frac{\partial}{\partial\theta} S_{RC}(\theta) \right\}$ is nonsingular at $\theta^{\star}$; and (8) $E\left[ \sup_{\theta \in \mathcal{N}(\theta^{\star})} \left\{ \left\{ \hat{X}_j(\hat{\zeta}_x), Z_j \right\} \exp\left( \beta_X' \hat{X}_j(\hat{\zeta}_x) + \beta_Z' Z_j \right) \right\} \right] < \infty$.
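To make the structure of the score equation concrete, the following is a minimal sketch of a Cox partial-likelihood score evaluated at a given β. It is not the authors' implementation. The function name `cox_score` and its arguments are hypothetical, there are no time-varying covariates, and tied event times are handled in the simplest (Breslow-type) way. For the RC estimator, `covariates` would hold the calibrated values (X̂, Z), and `times` and `events` would hold the calibrated event time and its indicator.

```python
import numpy as np

def cox_score(beta, times, events, covariates):
    """Cox partial score: sum over observed events of W_i - S1(t_i) / S0(t_i).

    Hypothetical helper for illustration only. The n^{-1} factors in
    S^(0) and S^(1) cancel in the ratio, so they are omitted here.
    """
    beta = np.asarray(beta, dtype=float)
    W = np.asarray(covariates, dtype=float)      # n x p covariate matrix
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.exp(W @ beta)                      # exp(beta' W_j) for each subject
    score = np.zeros_like(beta)
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]              # at-risk indicator Y_j(t_i)
        s0 = np.sum(risk[at_risk])               # proportional to S^(0)
        s1 = risk[at_risk] @ W[at_risk]          # proportional to S^(1)
        score += W[i] - s1 / s0
    return score

# Toy data: three subjects, one binary covariate, third subject censored.
demo_score = cox_score([0.0], [1.0, 2.0, 3.0], [1, 1, 0], [[1.0], [0.0], [0.0]])
# At beta = 0 the first event contributes 1 - 1/3 and the second contributes 0.
```

Solving `cox_score(beta, ...) = 0` in `beta` (for example with a Newton iteration) gives the estimate for whichever covariate and event-time inputs are supplied.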
The techniques of Andersen and Gill (1982) can then be used to establish asymptotic normality of the solution to $S_{RC}(\beta, \zeta)$. Next, the solution to $S_{RC}(\beta, \hat{\zeta})$, where $\hat{\zeta}$ is our plug-in moment estimator for $\zeta$, can be shown to be consistent and asymptotically normal using Theorem 5.31 in Van der Vaart (1998). The theorem additionally requires that $S_{RC}(\beta, \hat{\zeta})$ be Donsker in $\mathcal{N}(\theta^{\star})$. It is well known that the usual Cox score equation is Donsker and, given that $\hat{\zeta}$ is a finite-dimensional moment estimator, the estimating equations we solve to estimate the nuisance parameters are Donsker as well. $\hat{X}$ and $\hat{U}$ are Lipschitz transformations of $X$ and $U$ involving estimators from a Donsker class of functions, so it follows from Example 19.20 in Van der Vaart (1998) that $S_{RC}(\beta, \hat{\zeta})$ is Donsker.

The arguments above also establish consistency and asymptotic normality of $\hat{\beta}_{RC}$ from Section 3.2 for time-to-event error only, by using the true $X$ in place of $\hat{X}$. Similarly, the asymptotic properties of the RSRC estimators from Section 3.4 follow because we recalibrate a fixed, finite number of times. This results in a finite number of Lipschitz transformations and thus a Donsker class of estimating equations.

Appendix B: Empirical comparison of sandwich and bootstrap variances for raking estimators

We used the bootstrap to calculate standard errors for the raking estimators because we noticed coverage probabilities below the nominal level in some settings when using the sandwich variance estimators. For example, in an independent simulation with settings β_X = log(3), σ_ν = 0. , σ_ε = 1, σ_ν,ε = 0. , and censoring, the coverage of GRRC was . using the sandwich estimator and . using the bootstrap. Note that Monte Carlo error cannot explain this undercoverage as the number of simulation runs was , resulting in a confidence interval of . ± . √(0. . ), or (0. , . ), which does not include . . The coverage of GRN under the same settings was extremely similar.

Appendix C: Additive error tables

Table 1: Simulation results for β_X under correlated, additive measurement error in the outcome and covariate X with β_X = log 3, normally distributed error, and 75 % censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95 % coverage probabilities (CP) are presented.
β_X       σ_ν   σ_ε   σ_ν,ε   Method       % Bias     ASE     ESE     MSE      CP
log(3)                        True          0.146   0.058   0.058   0.003   0.949
          0.5   0.5   0.15    RC          -31.540   0.121   0.121   0.135   0.198
                              RSRC        -28.836   0.135   0.134   0.118   0.339
                              GRRC          0.819   0.187   0.182   0.033   0.960
                              GRN           1.129   0.188   0.183   0.034   0.958
                              Naive       -86.163   0.044   0.047   0.898   0.000
                              Complete      1.912   0.191   0.197   0.039   0.946
                      0.30    RC          -31.567   0.122   0.121   0.135   0.214
                              RSRC        -28.627   0.136   0.133   0.117   0.356
                              GRRC          0.792   0.188   0.186   0.035   0.955
                              GRN           1.329   0.187   0.187   0.035   0.953
                              Naive      -102.639   0.046   0.047   1.274   0.000
                              Complete      1.766   0.192   0.202   0.041   0.940
                1     0.15    RC          -31.294   0.149   0.148   0.140   0.357
                              RSRC        -28.827   0.166   0.164   0.127   0.506
                              GRRC          1.283   0.194   0.191   0.037   0.952
                              GRN           1.420   0.196   0.192   0.037   0.954
                              Naive       -90.669   0.036   0.038   0.994   0.000
                              Complete      1.957   0.192   0.200   0.041   0.941
                      0.30    RC          -31.431   0.150   0.148   0.141   0.354
                              RSRC        -28.754   0.167   0.166   0.127   0.492
                              GRRC          1.238   0.194   0.193   0.037   0.958
                              GRN           1.611   0.196   0.192   0.037   0.958
                              Naive      -101.719   0.038   0.039   1.250   0.000
                              Complete      1.839   0.192   0.202   0.041   0.942
          1     0.5   0.15    RC          -33.415   0.123   0.124   0.150   0.178
                              RSRC        -31.695   0.137   0.135   0.139   0.288
                              GRRC          0.847   0.190   0.187   0.035   0.954
                              GRN           1.174   0.190   0.188   0.036   0.950
                              Naive       -79.646   0.044   0.046   0.768   0.000
                              Complete      1.930   0.193   0.202   0.041   0.946
                      0.30    RC          -33.652   0.124   0.123   0.152   0.178
                              RSRC        -31.494   0.138   0.138   0.139   0.303
                              GRRC          0.874   0.188   0.186   0.034   0.958
                              GRN           1.302   0.188   0.186   0.035   0.956
                              Naive       -90.541   0.045   0.046   0.992   0.000
                              Complete      1.866   0.192   0.201   0.041   0.948
                1     0.15    RC          -33.378   0.152   0.151   0.157   0.328
                              RSRC        -31.804  11.643   0.166   0.149   0.438
                              GRRC          1.129   0.195   0.193   0.037   0.954
                              GRN           1.311   0.196   0.193   0.038   0.952
                              Naive       -86.191   0.036   0.038   0.898   0.000
                              Complete      1.866   0.192   0.201   0.041   0.946
                      0.30    RC          -33.533   0.153   0.151   0.159   0.328
                              RSRC         -32.04   3.224   0.164   0.151   0.439
                              GRRC          1.202   0.194   0.191   0.037   0.951
                              GRN           1.538   0.195   0.192   0.037   0.952
                              Naive       -93.700   0.036   0.038   1.061   0.000
                              Complete      1.893   0.192   0.200   0.040   0.944

Table 2: Simulation results for β_Z under additive measurement error only in the outcome with normally distributed error and 25 and 75 % censoring for the true event time.
For simulated data sets, thebias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error(ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95 % coverage probabilities (CP) are presented. % Censoring β X σ ν Method % Bias ASE ESE MSE CP25 log(1.5) True 0.072 0.032 0.033 0.001 0.9490.5 RC -12.523 0.044 0.043 0.009 0.493RSRC -4.891 0.051 0.052 0.004 0.884GRRC 0.115 0.066 0.065 0.004 0.956GRN -0.014 0.066 0.065 0.004 0.958Naive 12.003 0.033 0.034 0.008 0.294Complete 1.428 0.104 0.105 0.011 0.9561 RC -18.495 0.048 0.048 0.019 0.247RSRC -7.617 0.058 0.059 0.006 0.847GRRC 0.087 0.074 0.073 0.005 0.957GRN -0.029 0.074 0.072 0.005 0.957Naive 2.741 0.032 0.033 0.002 0.902Complete 1.385 0.104 0.105 0.011 0.954log(3) True 0.043 0.033 0.033 0.001 0.9490.5 RC -26.719 0.048 0.048 0.037 0.030RSRC -18.712 0.055 0.057 0.020 0.343GRRC -0.851 0.086 0.087 0.008 0.944GRN -1.010 0.084 0.082 0.007 0.948Naive 0.144 0.032 0.037 0.001 0.913Complete 1.284 0.106 0.108 0.012 0.9521 RC -32.951 0.051 0.051 0.055 0.006RSRC -22.881 0.060 0.062 0.029 0.264GRRC -0.793 0.090 0.088 0.008 0.946GRN -0.866 0.089 0.088 0.008 0.947Naive -10.777 0.032 0.036 0.007 0.362Complete 1.298 0.106 0.108 0.012 0.95575 log(1.5) True 0.130 0.056 0.056 0.003 0.9540.5 RC -14.874 0.079 0.079 0.017 0.72RSRC -12.248 0.087 0.090 0.015 0.816GRRC -0.101 0.121 0.119 0.014 0.954GRN -0.707 0.129 0.128 0.016 0.952Naive 32.244 0.057 0.059 0.053 0.020Complete 1.962 0.182 0.190 0.036 0.9441 RC -17.226 0.082 0.082 0.021 0.681RSRC -14.946 0.090 0.094 0.020 0.782GRRC -0.390 0.127 0.124 0.015 0.954GRN -1.010 0.131 0.13 0.017 0.946Naive 17.760 0.056 0.058 0.019 0.400Complete 1.818 0.182 0.190 0.036 0.944log(3) True 0.188 0.054 0.055 0.003 0.9480.5 RC -30.268 0.083 0.084 0.051 0.288RSRC -27.685 0.092 0.096 0.046 0.443GRRC -1.068 0.148 0.145 0.021 0.944GRN -1.746 0.152 0.149 0.022 0.944Naive 20.111 0.055 0.062 0.023 0.297Complete 2.265 0.178 0.186 0.035 
0.9481 RC -32.691 0.085 0.087 0.059 0.237RSRC -30.628 0.094 0.099 0.055 0.383GRRC -1.096 0.152 0.150 0.022 0.950GRN -1.890 0.154 0.153 0.024 0.943Naive 3.982 0.054 0.061 0.004 0.880Complete 2.121 0.180 0.188 0.036 0.944 β Z under additive, general measurement error in the outcome and co-variate X with β X = log 1 . , normally distributed error, and 25 % censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators,average model standard error (ASE) for naive and complete case, empirical standard error (ESE), meansquared error (MSE), and 95 % coverage probabilities (CP) are presented. β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(1.5) True 0.072 0.032 0.033 0.001 0.9490.5 0.5 0.15 RC -13.936 0.050 0.049 0.012 0.510RSRC -5.554 0.058 0.058 0.005 0.901GRRC 0.245 0.071 0.069 0.005 0.964GRN 0.115 0.070 0.068 0.005 0.963Naive 25.189 0.031 0.032 0.032 0.000Complete 1.428 0.104 0.105 0.011 0.9560.30 RC -13.893 0.050 0.050 0.012 0.526RSRC -5.324 0.058 0.059 0.005 0.903GRRC 0.245 0.069 0.067 0.004 0.964GRN 0.058 0.069 0.066 0.004 0.968Naive 27.656 0.031 0.033 0.038 0.000Complete 1.486 0.104 0.106 0.011 0.9511 0.15 RC -14.153 0.054 0.053 0.012 0.568RSRC -5.713 0.062 0.062 0.006 0.912GRRC 0.346 0.073 0.071 0.005 0.964GRN 0.245 0.072 0.070 0.005 0.965Naive 26.113 0.031 0.032 0.034 0.000Complete 1.457 0.104 0.106 0.011 0.9530.30 RC -14.138 0.055 0.053 0.012 0.578RSRC -5.526 0.062 0.062 0.005 0.917GRRC 0.332 0.072 0.069 0.005 0.966GRN 0.216 0.071 0.068 0.005 0.966Naive 27.786 0.031 0.032 0.038 0.000Complete 1.428 0.104 0.106 0.011 0.9541 0.5 0.15 RC -19.563 0.054 0.053 0.021 0.288RSRC -8.094 0.065 0.066 0.008 0.850GRRC 0.231 0.078 0.076 0.006 0.962GRN 0.115 0.077 0.075 0.006 0.960Naive 15.581 0.030 0.031 0.013 0.047Complete 1.385 0.104 0.105 0.011 0.9540.30 RC -19.606 0.055 0.054 0.021 0.300RSRC -7.906 0.065 0.066 0.007 0.867GRRC 0.216 0.077 0.075 0.006 0.958GRN 0.058 0.077 0.074 0.006 
0.964Naive 17.731 0.030 0.031 0.016 0.020Complete 1.443 0.104 0.106 0.011 0.9541 0.15 RC -19.707 0.059 0.058 0.022 0.360RSRC -8.094 0.070 0.070 0.008 0.881GRRC 0.317 0.080 0.078 0.006 0.960GRN 0.202 0.079 0.077 0.006 0.960Naive 16.504 0.030 0.031 0.014 0.030Complete 1.472 0.104 0.106 0.011 0.9550.30 RC -19.736 0.060 0.058 0.022 0.359RSRC -7.920 0.070 0.070 0.008 0.886GRRC 0.260 0.079 0.077 0.006 0.961GRN 0.159 0.078 0.076 0.006 0.962Naive 17.99 0.030 0.031 0.016 0.014Complete 1.371 0.104 0.106 0.011 0.956 β Z under correlated, additive measurement error in the outcome andcovariate X with β X = log 1 . , normally distributed error, and 75 % censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators,average model standard error (ASE) for naive and complete case, empirical standard error (ESE), meansquared error (MSE), and 95 % coverage probabilities (CP) are presented. β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(1.5) True 0.173 0.056 0.056 0.003 0.9540.5 0.5 0.15 RC -15.105 0.088 0.087 0.019 0.785RSRC -12.220 0.098 0.099 0.017 0.853GRRC -0.014 0.131 0.130 0.017 0.954GRN -0.692 0.139 0.139 0.019 0.950Naive 45.272 0.051 0.053 0.101 0.000Complete 1.962 0.182 0.190 0.036 0.9440.30 RC -14.917 0.088 0.087 0.018 0.802RSRC -11.902 0.098 0.100 0.017 0.858GRRC 0.043 0.129 0.128 0.016 0.956GRN -0.808 0.137 0.137 0.019 0.946Naive 55.457 0.052 0.055 0.151 0.000Complete 1.861 0.182 0.191 0.037 0.9401 0.15 RC -15.163 0.097 0.095 0.020 0.818RSRC -12.119 0.107 0.108 0.019 0.887GRRC -0.014 0.136 0.135 0.018 0.954GRN -0.548 0.143 0.143 0.020 0.950Naive 44.103 0.051 0.053 0.096 0.000Complete 2.265 0.182 0.191 0.037 0.9480.30 RC -14.975 0.097 0.094 0.020 0.827RSRC -12.047 0.107 0.108 0.018 0.883GRRC -0.144 0.135 0.132 0.018 0.956GRN -0.779 0.142 0.141 0.020 0.950Naive 50.595 0.051 0.053 0.126 0.000Complete 2.049 0.182 0.187 0.035 0.9501 0.5 0.15 RC -17.543 0.092 0.091 0.023 0.742RSRC 
-15.018 0.101 0.104 0.022 0.813GRRC -0.274 0.137 0.134 0.018 0.955GRN -0.923 0.141 0.140 0.020 0.950Naive 30.701 0.050 0.052 0.048 0.010Complete 1.818 0.182 0.190 0.036 0.9440.30 RC -17.586 0.092 0.090 0.023 0.748RSRC -14.903 0.101 0.104 0.021 0.830GRRC -0.115 0.134 0.133 0.018 0.954GRN -0.822 0.140 0.138 0.019 0.948Naive 35.894 0.050 0.052 0.065 0.000Complete 1.847 0.181 0.185 0.034 0.9441 0.15 RC -17.644 0.100 0.098 0.025 0.780RSRC -14.816 0.111 0.113 0.023 0.846GRRC -0.188 0.139 0.140 0.020 0.944GRN -0.649 0.144 0.145 0.021 0.944Naive 30.441 0.049 0.052 0.047 0.010Complete 1.760 0.181 0.190 0.036 0.9410.30 RC -17.500 0.101 0.098 0.024 0.789RSRC -14.946 0.110 0.113 0.024 0.849GRRC -0.144 0.140 0.138 0.019 0.955GRN -0.750 0.144 0.145 0.021 0.946Naive 34.192 0.050 0.052 0.059 0.001Complete 1.746 0.182 0.190 0.036 0.946 β Z under correlated, additive measurement error in the outcome andcovariate X with β X = log 3 , normally distributed error, and 25 % censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators,average model standard error (ASE) for naive and complete case, empirical standard error (ESE), meansquared error (MSE), and 95 % coverage probabilities (CP) are presented. 
β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(3) True 0.043 0.033 0.033 0.001 0.9450.5 0.5 0.15 RC -31.667 0.060 0.060 0.052 0.036RSRC -20.789 0.070 0.072 0.026 0.466GRRC -0.433 0.088 0.086 0.007 0.952GRN -0.548 0.087 0.084 0.007 0.952Naive 26.892 0.031 0.034 0.036 0.000Complete 1.284 0.106 0.108 0.012 0.9520.30 RC -32.432 0.063 0.062 0.054 0.036RSRC -20.602 0.072 0.074 0.026 0.497GRRC -0.303 0.087 0.084 0.007 0.956GRN -0.491 0.087 0.084 0.007 0.957Naive 26.661 0.031 0.034 0.035 0.000Complete 1.284 0.106 0.107 0.012 0.9501 0.15 RC -32.720 0.067 0.066 0.056 0.050RSRC -20.977 0.076 0.077 0.027 0.522GRRC -0.303 0.088 0.087 0.008 0.955GRN -0.404 0.087 0.085 0.007 0.959Naive 29.316 0.031 0.034 0.042 0.000Complete 1.298 0.106 0.108 0.012 0.9530.30 RC -33.225 0.069 0.068 0.058 0.050RSRC -20.789 0.077 0.078 0.027 0.550GRRC -0.188 0.088 0.086 0.007 0.956GRN -0.317 0.087 0.085 0.007 0.954Naive 29.128 0.031 0.034 0.042 0.000Complete 1.313 0.106 0.108 0.012 0.9501 0.5 0.15 RC -36.341 0.062 0.062 0.067 0.010RSRC -23.920 0.073 0.074 0.033 0.371GRRC -0.375 0.092 0.090 0.008 0.954GRN -0.519 0.091 0.089 0.008 0.955Naive 16.605 0.030 0.034 0.014 0.040Complete 1.298 0.106 0.108 0.012 0.9550.30 RC -37.048 0.064 0.063 0.070 0.008RSRC -23.761 0.074 0.076 0.033 0.398GRRC -0.361 0.091 0.089 0.008 0.956GRN -0.447 0.091 0.089 0.008 0.953Naive 16.764 0.030 0.033 0.015 0.038Complete 1.356 0.106 0.108 0.012 0.9481 0.15 RC -37.063 0.069 0.068 0.071 0.018RSRC -23.660 0.079 0.080 0.033 0.454GRRC -0.274 0.092 0.090 0.008 0.955GRN -0.361 0.091 0.089 0.008 0.952Naive 19.274 0.030 0.033 0.019 0.012Complete 1.284 0.106 0.108 0.012 0.9500.30 RC -37.524 0.070 0.069 0.072 0.017RSRC -23.574 0.080 0.081 0.033 0.467GRRC -0.202 0.092 0.090 0.008 0.956GRN -0.289 0.091 0.089 0.008 0.956Naive 19.361 0.030 0.033 0.019 0.012Complete 1.327 0.106 0.108 0.012 0.950 ppendix D: Classical measurement error tables Table 6: Simulation results for β X = log 1 . 
under correlated additive measurement error in the outcomeand classical measurement error in the covariate X with normally distributed error and 25 % censoringfor the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) forthe 4 proposed estimators, average model standard error (ASE) for naive and complete case, empiricalstandard error (ESE), mean squared error (MSE), and 95 % coverage probabilities (CP) are presented. β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(1.5) True -0.049 0.030 0.031 0.001 0.9460.5 0.5 0.15 RC -13.762 0.056 0.057 0.006 0.800RSRC -6.141 0.066 0.065 0.005 0.920GRRC 0.123 0.081 0.082 0.007 0.949GRN 0.296 0.081 0.082 0.007 0.947Naive -78.428 0.023 0.023 0.102 0.000Complete 0.123 0.098 0.099 0.010 0.9520.30 RC -13.589 0.057 0.057 0.006 0.800RSRC -5.944 0.068 0.066 0.005 0.929GRRC 0.222 0.081 0.082 0.007 0.945GRN 0.518 0.081 0.082 0.007 0.942Naive -93.621 0.023 0.024 0.145 0.000Complete 0.148 0.098 0.099 0.010 0.9541 0.15 RC -13.836 0.067 0.067 0.008 0.832RSRC -6.758 0.079 0.078 0.007 0.918GRRC 0.000 0.087 0.088 0.008 0.946GRN 0.148 0.087 0.088 0.008 0.946Naive -84.594 0.019 0.020 0.118 0.000Complete 0.173 0.098 0.099 0.010 0.9520.30 RC -13.688 0.068 0.068 0.008 0.836RSRC -6.708 0.080 0.079 0.007 0.912GRRC 0.247 0.087 0.088 0.008 0.948GRN 0.469 0.087 0.088 0.008 0.944Naive -95.471 0.019 0.020 0.150 0.000Complete 0.271 0.098 0.098 0.010 0.9521 0.5 0.15 RC -19.286 0.062 0.062 0.010 0.734RSRC -9.224 0.074 0.073 0.007 0.907GRRC 0.148 0.083 0.084 0.007 0.942GRN 0.271 0.083 0.084 0.007 0.943Naive -78.552 0.023 0.023 0.102 0.000Complete 0.247 0.098 0.098 0.010 0.9540.30 RC -19.286 0.062 0.063 0.010 0.732RSRC -9.372 0.076 0.073 0.007 0.904GRRC 0.197 0.083 0.084 0.007 0.944GRN 0.370 0.083 0.084 0.007 0.946Naive -91.993 0.023 0.023 0.140 0.000Complete 0.173 0.098 0.099 0.010 0.9481 0.15 RC -19.139 0.074 0.074 0.012 0.791RSRC -10.013 0.088 0.087 0.009 0.907GRRC 0.025 0.089 0.090 0.008 0.942GRN 0.123 
0.089 0.090 0.008 0.944Naive -84.619 0.019 0.020 0.118 0.000Complete 0.271 0.098 0.099 0.010 0.9540.30 RC -19.262 0.074 0.074 0.012 0.779RSRC -10.137 0.089 0.088 0.009 0.902GRRC 0.099 0.089 0.090 0.008 0.944GRN 0.247 0.089 0.090 0.008 0.943Naive -94.336 0.019 0.020 0.147 0.000Complete 0.247 0.098 0.098 0.010 0.953 β X = log 1 . under correlated additive measurement error in the outcomeand classical measurement error in the covariate X with normally distributed error and 75 % censoringfor the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) forthe 4 proposed estimators, average model standard error (ASE) for naive and complete case, empiricalstandard error (ESE), mean squared error (MSE), and 95 % coverage probabilities (CP) are presented. β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(1.5) True 0.123 0.054 0.054 0.003 0.9490.5 0.5 0.15 RC -14.946 0.104 0.103 0.014 0.901RSRC -12.406 0.114 0.113 0.015 0.918GRRC 0.148 0.151 0.149 0.022 0.956GRN 0.789 0.153 0.152 0.023 0.952Naive -115.916 0.043 0.043 0.223 0.000Complete 0.543 0.177 0.183 0.033 0.9510.30 RC -14.675 0.104 0.103 0.014 0.896RSRC -12.011 0.115 0.112 0.015 0.925GRRC -0.296 0.150 0.147 0.022 0.956GRN 1.233 0.149 0.147 0.022 0.953Naive -156.462 0.045 0.045 0.404 0.000Complete 0.123 0.176 0.181 0.033 0.9521 0.15 RC -14.970 0.124 0.123 0.019 0.918RSRC -12.677 0.137 0.138 0.022 0.926GRRC -0.370 0.162 0.160 0.026 0.954GRN 0.074 0.164 0.163 0.026 0.956Naive -111.082 0.036 0.036 0.204 0.000Complete -0.074 0.176 0.180 0.032 0.9470.30 RC -14.477 0.124 0.123 0.019 0.919RSRC -12.529 0.137 0.137 0.021 0.929GRRC -0.074 0.162 0.160 0.026 0.958GRN 0.715 0.164 0.162 0.026 0.955Naive -138.212 0.037 0.038 0.315 0.000Complete 0.247 0.176 0.181 0.033 0.9481 0.5 0.15 RC -17.091 0.108 0.107 0.016 0.896RSRC -15.587 0.117 0.118 0.018 0.901GRRC -0.074 0.153 0.151 0.023 0.960GRN 0.666 0.154 0.152 0.023 0.956Naive -99.367 0.042 0.042 0.164 0.000Complete 0.617 0.177 0.181 0.033 
0.9520.30 RC -17.042 0.108 0.108 0.016 0.890RSRC -15.538 0.118 0.119 0.018 0.901GRRC -0.173 0.153 0.151 0.023 0.956GRN 0.987 0.154 0.152 0.023 0.950Naive -126.003 0.044 0.044 0.263 0.000Complete 0.592 0.178 0.183 0.034 0.9501 0.15 RC -16.993 0.129 0.127 0.021 0.910RSRC -15.784 0.140 0.142 0.024 0.910GRRC 0.247 0.165 0.164 0.027 0.959GRN 0.765 0.166 0.166 0.028 0.956Naive -99.614 0.036 0.036 0.164 0.000Complete 0.518 0.178 0.183 0.034 0.9470.30 RC -17.067 0.129 0.127 0.021 0.908RSRC -15.316 0.141 0.141 0.024 0.914GRRC -0.222 0.164 0.162 0.026 0.962GRN 0.567 0.165 0.164 0.027 0.963Naive -118.136 0.036 0.037 0.231 0.000Complete 0.222 0.177 0.182 0.033 0.947 β X = log 3 under correlated additive measurement error in the outcomeand classical measurement error in the covariate X with normally distributed error and 25 % censoringfor the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) forthe 4 proposed estimators, average model standard error (ASE) for naive and complete case, empiricalstandard error (ESE), mean squared error (MSE), and 95 % coverage probabilities (CP) are presented. 
β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(3) True 0.046 0.037 0.036 0.001 0.9550.5 0.5 0.15 RC -30.921 0.073 0.074 0.121 0.018RSRC -22.665 0.088 0.087 0.070 0.216GRRC 0.200 0.112 0.110 0.012 0.952GRN 0.291 0.112 0.111 0.012 0.954Naive -70.016 0.023 0.026 0.592 0.000Complete 0.792 0.118 0.118 0.014 0.9570.30 RC -31.658 0.076 0.076 0.127 0.024RSRC -22.774 0.092 0.091 0.071 0.240GRRC 0.300 0.112 0.110 0.012 0.957GRN 0.319 0.112 0.110 0.012 0.954Naive -76.032 0.023 0.025 0.698 0.000Complete 0.737 0.118 0.117 0.014 0.9551 0.15 RC -31.604 0.087 0.088 0.128 0.062RSRC -23.903 0.104 0.103 0.080 0.304GRRC 0.401 0.115 0.113 0.013 0.953GRN 0.428 0.115 0.113 0.013 0.952Naive -78.572 0.020 0.021 0.746 0.000Complete 0.792 0.118 0.118 0.014 0.9600.30 RC -32.159 0.090 0.089 0.133 0.065RSRC -24.058 0.108 0.107 0.081 0.328GRRC 0.437 0.115 0.113 0.013 0.956GRN 0.519 0.115 0.113 0.013 0.952Naive -82.713 0.019 0.021 0.826 0.000Complete 0.801 0.118 0.118 0.014 0.9581 0.5 0.15 RC -35.681 0.075 0.076 0.159 0.008RSRC -26.488 0.090 0.090 0.093 0.150GRRC 0.191 0.113 0.112 0.012 0.950GRN 0.218 0.113 0.112 0.012 0.952Naive -71.244 0.023 0.025 0.613 0.000Complete 0.746 0.118 0.118 0.014 0.9560.30 RC -36.382 0.077 0.077 0.166 0.009RSRC -26.961 0.093 0.092 0.096 0.156GRRC 0.300 0.113 0.111 0.012 0.954GRN 0.300 0.113 0.111 0.012 0.956Naive -76.360 0.023 0.025 0.704 0.000Complete 0.737 0.118 0.118 0.014 0.9561 0.15 RC -36.055 0.090 0.090 0.165 0.034RSRC -27.835 0.107 0.106 0.105 0.222GRRC 0.382 0.116 0.114 0.013 0.948GRN 0.428 0.115 0.114 0.013 0.950Naive -79.437 0.020 0.021 0.762 0.000Complete 0.801 0.118 0.118 0.014 0.9570.30 RC -36.564 0.091 0.091 0.170 0.039RSRC -28.190 0.110 0.108 0.108 0.231GRRC 0.382 0.116 0.114 0.013 0.952GRN 0.437 0.115 0.114 0.013 0.954Naive -82.977 0.019 0.021 0.831 0.000Complete 0.765 0.118 0.118 0.014 0.957 ppendix E: Gamma distributed error tables Table 9: Simulation results for β X under additive measurement error only in the outcome with 
gammadistributed error and 25 and 75 % censoring for the true event time. For simulated data sets, thebias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error(ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95 % coverage probabilities (CP) are presented. % Censoring β X σ ν Method % Bias ASE ESE MSE CP25 log(1.5) True 0.099 0.030 0.032 0.001 0.9420.5 RC -19.558 0.045 0.045 0.008 0.574RSRC -4.563 0.060 0.059 0.004 0.935GRRC -0.567 0.067 0.067 0.004 0.949GRN -0.567 0.066 0.067 0.004 0.947Naive -31.371 0.030 0.032 0.017 0.018Complete 0.543 0.098 0.100 0.010 0.9521 RC -28.905 0.052 0.052 0.016 0.380RSRC -8.879 0.071 0.071 0.006 0.918GRRC -0.617 0.075 0.076 0.006 0.950GRN -0.592 0.075 0.076 0.006 0.945Naive -38.869 0.029 0.032 0.026 0.001Complete 0.617 0.098 0.100 0.010 0.949log(3) True 0.155 0.037 0.037 0.001 0.9410.5 RC -33.733 0.055 0.056 0.140 0.000RSRC -23.156 0.067 0.069 0.069 0.041GRRC -1.211 0.113 0.116 0.014 0.923GRN -1.211 0.113 0.119 0.014 0.920Naive -38.166 0.030 0.043 0.178 0.000Complete 0.819 0.119 0.121 0.015 0.9481 RC -41.334 0.058 0.059 0.210 0.000RSRC -28.254 0.074 0.076 0.102 0.019GRRC -0.892 0.115 0.116 0.014 0.936GRN -0.856 0.115 0.122 0.015 0.928Naive -44.948 0.030 0.041 0.246 0.000Complete 0.874 0.119 0.122 0.015 0.94675 log(1.5) True 0.395 0.054 0.056 0.003 0.9360.5 RC -19.829 0.080 0.080 0.013 0.834RSRC -9.989 0.100 0.103 0.012 0.921GRRC 0.518 0.118 0.118 0.014 0.954GRN 0.543 0.116 0.116 0.014 0.956Naive -40.719 0.054 0.057 0.031 0.156Complete 2.318 0.177 0.180 0.032 0.9501 RC -19.903 0.089 0.091 0.015 0.854RSRC -13.762 0.112 0.119 0.017 0.906GRRC 0.641 0.121 0.120 0.014 0.958GRN 0.641 0.119 0.118 0.014 0.952Naive -36.279 0.054 0.058 0.025 0.242Complete 2.738 0.178 0.181 0.033 0.948log(3) True 0.300 0.058 0.059 0.003 0.9480.5 RC -33.187 0.086 0.087 0.140 0.010RSRC -28.527 0.107 0.110 0.110 0.168GRRC -0.692 0.174 0.176 0.031 0.937GRN -0.546 0.173 0.180 
0.032 0.940Naive -40.469 0.053 0.068 0.202 0.000Complete 2.458 0.193 0.200 0.041 0.9461 RC -33.824 0.097 0.100 0.148 0.022RSRC -30.957 0.121 0.128 0.132 0.201GRRC -0.628 0.176 0.183 0.034 0.938GRN -0.528 0.174 0.183 0.034 0.934Naive -39.186 0.053 0.068 0.190 0.000Complete 2.485 0.193 0.204 0.042 0.944 β X = log 1 . under correlated, additive measurement error in the out-come and covariate X with gamma distributed error and 25 % censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators,average model standard error (ASE) for naive and complete case, empirical standard error (ESE), meansquared error (MSE), and 95 % coverage probabilities (CP) are presented. β X σ ν σ (cid:15) σ ν,(cid:15) Method % Bias ASE ESE MSE CPlog(1.5) True 0.099 0.030 0.032 0.001 0.9420.5 0.5 0.15 RC -23.060 0.057 0.057 0.012 0.601RSRC -5.944 0.075 0.076 0.006 0.928GRRC -0.888 0.081 0.082 0.007 0.945GRN -0.814 0.081 0.082 0.007 0.943Naive -56.972 0.025 0.028 0.054 0.000Complete 0.543 0.098 0.100 0.010 0.9520.30 RC -25.206 0.058 0.058 0.014 0.547RSRC -4.760 0.077 0.079 0.007 0.925GRRC -1.282 0.082 0.084 0.007 0.941GRN -1.110 0.082 0.083 0.007 0.943Naive -62.718 0.025 0.028 0.066 0.000Complete 0.543 0.098 0.099 0.010 0.9521 0.15 RC -25.403 0.068 0.067 0.015 0.607RSRC -8.903 0.086 0.087 0.009 0.906GRRC -1.726 0.087 0.089 0.008 0.938GRN -1.578 0.086 0.088 0.008 0.942Naive -66.689 0.022 0.025 0.074 0.000Complete 0.469 0.098 0.100 0.010 0.9520.30 RC -27.499 0.068 0.068 0.017 0.562RSRC -7.941 0.088 0.091 0.009 0.901GRRC -1.899 0.088 0.090 0.008 0.934GRN -1.603 0.087 0.089 0.008 0.938Naive -71.030 0.022 0.026 0.084 0.000Complete 0.641 0.098 0.100 0.010 0.9461 0.5 0.15 RC -31.988 0.064 0.063 0.021 0.468RSRC -9.323 0.087 0.090 0.009 0.912GRRC -0.863 0.085 0.086 0.007 0.950GRN -0.789 0.085 0.086 0.007 0.949Naive -61.189 0.025 0.028 0.062 0.000Complete 0.617 0.098 0.10 0.010 0.9490.30 RC -33.961 0.064 0.064 0.023 0.417RSRC 
-7.769 0.088 0.092 0.009 0.910GRRC -1.233 0.086 0.087 0.008 0.944GRN -1.061 0.086 0.086 0.008 0.948Naive -66.023 0.025 0.028 0.072 0.000Complete 0.543 0.098 0.100 0.010 0.9501 0.15 RC -33.862 0.074 0.073 0.024 0.506RSRC -11.666 0.099 0.102 0.013 0.899GRRC -1.430 0.090 0.091 0.008 0.942GRN -1.307 0.090 0.090 0.008 0.944Naive -69.870 0.022 0.025 0.081 0.000Complete 0.617 0.098 0.100 0.010 0.9540.30 RC -35.737 0.075 0.074 0.026 0.462RSRC -10.432 0.101 0.104 0.013 0.905GRRC -1.554 0.090 0.092 0.009 0.948GRN -1.455 0.090 0.091 0.008 0.948Naive -73.447 0.022 0.025 0.089 0.000Complete 0.567 0.098 0.100 0.010 0.952 β X = log 1 . under correlated, additive measurement error in the out-come and covariate X with gamma distributed error and 75 % censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators,average model standard error (ASE) for naive and complete case, empirical standard error (ESE), meansquared error (MSE), and 95 % coverage probabilities (CP) are presented. 
β_X       σ²_ν  σ²_ε  σ_ν,ε  Method       % Bias    ASE    ESE    MSE     CP
log(1.5)                     True          0.395  0.054  0.056  0.003  0.936
          0.5   0.5   0.15   RC          -25.946  0.097  0.095  0.020  0.807
                             RSRC         -7.966  0.124  0.129  0.018  0.919
                             GRRC          1.110  0.148  0.146  0.021  0.953
                             GRN           1.159  0.148  0.144  0.021  0.952
                             Naive       -68.835  0.046  0.049  0.080  0.000
                             Complete      2.318  0.177  0.180  0.032  0.950
                      0.30   RC          -30.582  0.097  0.096  0.025  0.734
                             RSRC         -5.105  0.125  0.132  0.018  0.920
                             GRRC          1.061  0.149  0.146  0.021  0.950
                             GRN           1.554  0.148  0.144  0.021  0.952
                             Naive       -79.292  0.046  0.050  0.106  0.000
                             Complete      2.417  0.177  0.182  0.033  0.947
                1     0.15   RC          -27.154  0.111  0.110  0.024  0.806
                             RSRC         -9.939  0.140  0.150  0.024  0.912
                             GRRC          0.937  0.158  0.153  0.023  0.958
                             GRN           1.061  0.157  0.152  0.023  0.954
                             Naive       -75.666  0.040  0.043  0.096  0.000
                             Complete      2.220  0.177  0.181  0.033  0.947
                      0.30   RC          -31.470  0.110  0.109  0.028  0.748
                             RSRC         -7.670  0.143  0.155  0.025  0.908
                             GRRC          0.913  0.158  0.153  0.024  0.954
                             GRN           1.529  0.157  0.153  0.023  0.953
                             Naive       -83.287  0.040  0.043  0.116  0.000
                             Complete      2.664  0.177  0.179  0.032  0.952
          1     0.5   0.15   RC          -25.107  0.107  0.108  0.022  0.842
                             RSRC        -12.110  0.138  0.149  0.025  0.906
                             GRRC          1.554  0.150  0.145  0.021  0.954
                             GRN           1.603  0.149  0.144  0.021  0.954
                             Naive       -63.088  0.046  0.050  0.068  0.001
                             Complete      2.738  0.178  0.181  0.033  0.948
                      0.30   RC          -27.820  0.106  0.105  0.024  0.810
                             RSRC         -8.484  0.138  0.150  0.024  0.917
                             GRRC          1.159  0.150  0.149  0.022  0.952
                             GRN           1.332  0.149  0.147  0.022  0.949
                             Naive       -70.413  0.046  0.049  0.084  0.000
                             Complete      2.713  0.177  0.182  0.033  0.949
                1     0.15   RC          -26.439  0.122  0.122  0.026  0.836
                             RSRC        -14.675  0.155  0.171  0.033  0.895
                             GRRC          0.715  0.158  0.152  0.023  0.954
                             GRN           1.061  0.157  0.152  0.023  0.952
                             Naive       -71.128  0.040  0.042  0.085  0.000
                             Complete      2.220  0.177  0.178  0.032  0.954
                      0.30   RC          -29.448  0.121  0.121  0.029  0.810
                             RSRC        -11.444  0.156  0.174  0.032  0.899
                             GRRC          1.208  0.160  0.154  0.024  0.956
                             GRN           1.455  0.158  0.152  0.023  0.954
                             Naive       -76.801  0.040  0.043  0.099  0.000
                             Complete      3.132  0.178  0.178  0.032  0.955

Simulation results for β_X = log 3 under correlated, additive measurement error in the outcome and covariate X with gamma distributed error and 25% censoring for the true event time. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95% coverage probabilities (CP) are presented.

β_X       σ²_ν  σ²_ε  σ_ν,ε  Method       % Bias    ASE    ESE    MSE     CP
log(3)                       True          0.146  0.037  0.038  0.001  0.944
          0.5   0.5   0.15   RC          -39.113  0.071  0.072  0.190  0.002
                             RSRC        -28.864  0.089  0.094  0.109  0.106
                             GRRC         -0.965  0.118  0.119  0.014  0.937
                             GRN          -0.901  0.118  0.119  0.014  0.937
                             Naive       -60.376  0.025  0.036  0.441  0.000
                             Complete      0.819  0.119  0.121  0.015  0.948
                      0.30   RC          -40.879  0.072  0.074  0.207  0.002
                             RSRC        -29.710  0.093  0.099  0.116  0.122
                             GRRC         -1.047  0.122  0.122  0.015  0.936
                             GRN          -0.947  0.120  0.121  0.015  0.934
                             Naive       -62.998  0.025  0.038  0.480  0.000
                             Complete      0.892  0.119  0.122  0.015  0.948
                1     0.15   RC          -44.438  0.090  0.093  0.247  0.018
                             RSRC        -34.726  0.106  0.114  0.159  0.128
                             GRRC         -1.265  0.129  0.132  0.018  0.932
                             GRN          -1.192  0.127  0.128  0.016  0.931
                             Naive       -71.254  0.020  0.035  0.614  0.000
                             Complete      0.856  0.119  0.121  0.015  0.948
                      0.30   RC          -45.912  0.090  0.094  0.263  0.015
                             RSRC        -35.208  0.110  0.119  0.164  0.145
                             GRRC         -1.338  0.131  0.131  0.017  0.930
                             GRN          -1.320  0.128  0.129  0.017  0.930
                             Naive       -73.056  0.020  0.036  0.646  0.000
                             Complete      0.819  0.119  0.121  0.015  0.947
          1     0.5   0.15   RC          -45.066  0.074  0.074  0.251  0.000
                             RSRC        -32.204  0.095  0.100  0.135  0.079
                             GRRC         -0.664  0.119  0.120  0.014  0.941
                             GRN          -0.674  0.118  0.119  0.014  0.937
                             Naive       -63.871  0.025  0.034  0.494  0.000
                             Complete      0.874  0.119  0.122  0.015  0.946
                      0.30   RC          -46.322  0.074  0.074  0.264  0.000
                             RSRC        -32.668  0.097  0.103  0.139  0.095
                             GRRC         -0.819  0.121  0.120  0.014  0.938
                             GRN          -0.755  0.119  0.120  0.014  0.937
                             Naive       -65.883  0.025  0.035  0.525  0.000
                             Complete      0.847  0.119  0.121  0.015  0.950
                1     0.15   RC          -49.171  0.091  0.093  0.300  0.008
                             RSRC        -36.755  0.112  0.118  0.177  0.124
                             GRRC         -0.992  0.126  0.127  0.016  0.938
                             GRN          -0.956  0.125  0.126  0.016  0.937
                             Naive       -73.393  0.020  0.033  0.651  0.000
                             Complete      0.828  0.119  0.122  0.015  0.949
                      0.30   RC          -50.254  0.091  0.093  0.314  0.006
                             RSRC        -37.029  0.114  0.122  0.180  0.130
                             GRRC         -1.001  0.128  0.128  0.016  0.936
                             GRN          -0.956  0.126  0.128  0.016  0.935
                             Naive       -74.831  0.020  0.034  0.677  0.000
                             Complete      0.856  0.119  0.121  0.015  0.949

Appendix F: Misclassification table

Table 13: Simulation results for β_X = log 1.5 under misspecification and correlated, additive measurement error in the outcome and covariate X with normally distributed error, 75% censoring for the true event time, sensitivity, and specificity. For simulated data sets, the bias, average bootstrap standard error (ASE) for the 4 proposed estimators, average model standard error (ASE) for naive and complete case, empirical standard error (ESE), mean squared error (MSE), and 95% coverage probabilities (CP) are presented.

β_X       σ²_ν  σ²_ε  σ_ν,ε  Method       % Bias    ASE    ESE    MSE     CP
log(1.5)                     True         -0.099  0.055  0.054  0.003  0.953
          0.5   0.5   0.15   RC          -43.111  0.106  0.101  0.041  0.611
                             RSRC        -40.842  0.118  0.117  0.041  0.681
                             GRRC         -0.049  0.170  0.163  0.027  0.952
                             GRN           0.641  0.172  0.164  0.027  0.954
                             Naive      -141.097  0.042  0.042  0.329  0.000
                             Complete     -0.025  0.177  0.178  0.032  0.953
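
The summary columns in these simulation tables can be computed from per-replicate output along the following lines. This is a minimal Python sketch rather than the authors' code: the function name `summarize_simulation`, its arguments, and the use of Wald-type intervals (estimate ± 1.96 × standard error) for coverage are assumptions made for illustration.

```python
import statistics


def summarize_simulation(estimates, std_errors, beta_true, z=1.96):
    """Summarize simulation replicates for one estimator of one coefficient.

    estimates  : per-replicate point estimates of the log hazard ratio
    std_errors : per-replicate standard errors (bootstrap or model based)
    beta_true  : the true coefficient used to generate the data
    z          : normal quantile for the interval (assumed Wald-type)
    """
    estimates = list(estimates)
    std_errors = list(std_errors)
    n = len(estimates)

    mean_est = statistics.fmean(estimates)
    # Percent bias relative to the true coefficient.
    pct_bias = 100.0 * (mean_est - beta_true) / beta_true
    # Average of the estimated standard errors (ASE).
    ase = statistics.fmean(std_errors)
    # Empirical standard error: sample SD of the estimates (ESE).
    ese = statistics.stdev(estimates)
    # Mean squared error around the true value (MSE).
    mse = statistics.fmean([(b - beta_true) ** 2 for b in estimates])
    # Proportion of replicates whose interval covers the truth (CP).
    covered = sum(
        abs(b - beta_true) <= z * se for b, se in zip(estimates, std_errors)
    )
    return {
        "pct_bias": pct_bias,
        "ase": ase,
        "ese": ese,
        "mse": mse,
        "cp": covered / n,
    }


if __name__ == "__main__":
    # Illustrative made-up replicates, not values from the paper.
    demo = summarize_simulation(
        estimates=[0.38, 0.42, 0.45, 0.36],
        std_errors=[0.05, 0.05, 0.06, 0.05],
        beta_true=0.405,
    )
    for name, value in demo.items():
        print(f"{name}: {value:.3f}")
```

These definitions appear consistent with the reported values. For example, in the first Naive row of the β_X = log 1.5 gamma-error table (% bias -68.835, ESE 0.049), the squared bias (0.68835 × log 1.5)² ≈ 0.078 plus ESE² ≈ 0.002 gives about 0.080, which matches the reported MSE.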

Appendix G: VCCC eligibility criteria

We analyzed data on 4797 HIV-positive patients who established care at the VCCC between 1998 and 2013. For the virologic failure outcome, patients were excluded if they had an indeterminate ART start date, started ART prior to enrollment, had no CD4 count measurement between 180 days before and 30 days after starting ART, or had no follow-up after starting ART. Using the unvalidated data, 2143 patients met the criteria for inclusion, of whom 1863 met the criteria using the validated data. These 1863 patients were used in all further analyses to ensure that any differences between estimators are not due to differences in the included patients. For the ADE outcome, the exclusion criteria were similar to those of the former analysis, except that we additionally excluded patients who had an ADE before ART initiation and those with indeterminate ADE dates. Using the unvalidated data, 1995 patients met the ADE analysis criteria, of whom 1595 met the criteria using the validated data. Again, these 1595 were used in all further ADE analyses. Note that for both analyses, failures within 6 months of ART start were not considered true failures because of the time required for the regimen to be efficacious. In addition, we made some further simplifying assumptions for ease of exposition in this data example. Specifically, we removed subjects who were not in both the unvalidated and validated datasets for ease of interpretation, and we selected validation subsets as if we had not validated all subjects. This was done to highlight the application of our methods and to be able to effectively compare their relative performance.

Of the 1863 patients in the analysis of the virologic failure outcome, 20 were incorrectly classified as having failed, resulting in a misclassification rate. There were 386 incorrectly recorded event times, with the error having mean and standard deviation of − . and . years, respectively.
CD4 count at ART start was incorrect for 125 patients, with the error having mean and standard deviation of and cell/mm³, respectively. The correlation between the error in the failure times and CD4 count at ART initiation for subjects with both types of error was − . .

Of the 1595 patients in the analysis of the ADE outcome, 161 were incorrectly classified as having had an ADE and 12 were incorrectly classified as having been censored, resulting in an appreciable misclassification rate of . There were 551 incorrectly recorded event times, with the error having mean and standard deviation of − . and . years, respectively. CD4 count at ART start was incorrect for 107 patients, with the error having mean and standard deviation of and cell/mm³, respectively. The correlation between the error in the failure times and CD4 count at ART initiation for subjects with both types of error was − . .

Appendix H: VCCC table

Table 14: The hazard ratios (HR) and their corresponding confidence intervals (CI) for a 100 cell/mm³ increase in CD4 count at ART initiation and a 10 year increase in age at CD4 count measurement. The CIs are calculated using the bootstrap for the RC, RSRC, GRRC, and GRN estimators.

Outcome                    Method  100 × CD4: HR (CI)      10 × Age: HR (CI)
Time to virologic failure  True    0.902 (0.869, 0.935)    0.860 (0.806, 0.916)
                           RC      0.920 (0.888, 0.953)    0.880 (0.825, 0.939)
                           RSRC    0.918 (0.885, 0.953)    0.879 (0.821, 0.942)
                           GRRC    0.918 (0.883, 0.954)    0.869 (0.811, 0.932)
                           GRN     0.918 (0.882, 0.956)    0.869 (0.802, 0.942)
                           Naive   0.918 (0.885, 0.953)    0.878 (0.824, 0.936)
                           HT      0.929 (0.852, 1.012)    0.790 (0.679, 0.919)
Time to ADE                True    0.693 (0.593, 0.809)    0.829 (0.671, 1.023)
                           RC      0.899 (0.832, 0.971)    1.071 (0.940, 1.221)
                           RSRC    0.895 (0.827, 0.969)    1.073 (0.938, 1.226)
                           GRRC    0.694 (0.565, 0.852)    0.883 (0.632, 1.234)
                           GRN     0.693 (0.564, 0.853)    0.883 (0.622, 1.253)
                           Naive   0.910 (0.841, 0.986)    1.087 (0.957, 1.235)
                           HT      0.748 (0.597, 0.939)    1.114 (0.757, 1.640)
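
Hazard ratios for a 100 cell/mm³ or 10 year increase, as reported in Table 14, are a rescaling of the per-unit log hazard ratio. The Python sketch below shows one way to do that rescaling and to form an interval from a bootstrap standard error. It is illustrative only: the function names, the Wald-type interval on the log scale, and the 1.96 multiplier are assumptions, since the text states only that the CIs were calculated using the bootstrap.

```python
import math
import statistics


def bootstrap_se(boot_betas):
    """Standard error taken as the sample SD of bootstrap coefficient estimates."""
    return statistics.stdev(boot_betas)


def hazard_ratio_ci(beta, se, scale=1.0, z=1.96):
    """Hazard ratio and interval for a `scale`-unit increase in a covariate.

    beta  : estimated log hazard ratio per one unit of the covariate
    se    : standard error of beta (for example, from the bootstrap)
    scale : size of the increase, e.g. 100 for CD4 in cell/mm^3
    z     : normal quantile (assumed Wald-type interval on the log scale)
    """
    log_hr = scale * beta
    half_width = z * abs(scale) * se
    return (
        math.exp(log_hr),
        math.exp(log_hr - half_width),
        math.exp(log_hr + half_width),
    )


if __name__ == "__main__":
    # Made-up per-cell coefficient and bootstrap draws, not values from the paper.
    boot = [-0.0011, -0.0009, -0.0010, -0.0012, -0.0008]
    hr, lower, upper = hazard_ratio_ci(-0.0010, bootstrap_se(boot), scale=100)
    print(f"HR per 100 cells: {hr:.3f} ({lower:.3f}, {upper:.3f})")
```

The intervals in Table 14 are close to symmetric on the log scale (for GRRC and CD4 in the ADE analysis, log(0.852/0.694) ≈ 0.205 and log(0.694/0.565) ≈ 0.206), which is consistent with, though not proof of, an interval of this form.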