A simulation-extrapolation approach for the mixture cure model with mismeasured covariates

Abstract

We consider survival data from a population with cured subjects in the presence of mismeasured covariates. We use the mixture cure model to account for the individuals that will never experience the event and at the same time distinguish between the effect of the covariates on the cure probabilities and on survival times. In particular, for practical applications, it seems of interest to assume a logistic form of the incidence and a Cox proportional hazards model for the latency. To correct the estimators for the bias introduced by the measurement error, we use the simex algorithm, which is a very general simulation based method. It essentially estimates this bias by introducing additional error to the data and then recovers bias corrected estimators through an extrapolation approach. The estimators are shown to be consistent and asymptotically normally distributed when the true extrapolation function is known. We investigate their finite sample performance through a simulation study and apply the proposed method to analyse the effect of the prostate specific antigen (PSA) on patients with prostate cancer.


By ENI MUSTA

ORSTAT, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium

INGRID VAN KEILEGOM

ORSTAT, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium

Some key words: cure models; logistic model; measurement error; simex algorithm; survival analysis.

1. INTRODUCTION

Classical survival analysis methods are designed to deal with time-to-event data in the presence of censoring and covariates. However, they often fail to address various challenges presented by real-life problems. In recent times, significant advances have been made in adapting and extending traditional methods for handling data with more complex features. In this article we account simultaneously for a cure fraction of the population, referring to those subjects that are immune to the event of interest, and covariates measured with error. Such situations arise frequently in practice. For instance, in cancer studies it is known that some of the patients will never experience recurrence or cancer related death and certain biomarker expressions such as the hemoglobin level or tumor size cannot be measured precisely. The systolic blood pressure is also known to be an error-prone predictor for the development of coronary heart disease. Examples of variables that cannot be measured precisely and events that are not experienced by the whole population can also be found in economic and social studies. Ignoring both these characteristics in the statistical procedures would most probably lead to incorrect inferences.

Cure rate models were first introduced by Boag (1949) and Berkson & Gage (1952), but only quite recently have they attracted attention in the statistical literature and applications. The proposed models can be divided into two main categories: mixture cure models and promotion time models (see Amico & Van Keilegom (2018) for a detailed review). The former assume that the population consists of two subpopulations, the cured and the susceptible ones, and model separately the incidence (the probability of being noncured) and the latency (the survival of the noncured subjects) using parametric or nonparametric models. The latter have a proportional hazards structure and extend the classical Cox regression model to allow the survival function to flatten at a level greater than zero. There is no clear indication of which approach is more appropriate but, in general, mixture cure models are preferred when one wants to distinguish between variables that affect the cure probability and variables that affect the survival of the uncured subjects.

On the other hand, there is a vast literature about bias correction methods, mainly in regression models with covariates contaminated by measurement error (see Carroll et al. (2006)). The classical additive error model is generally accepted and the most common methods to deal with it are the so-called functional ones, which do not make any assumption on the distribution of the unobserved true covariates. They can be divided into three large classes: regression calibration, score functional methods and simulation-extrapolation (simex). The latter is particularly appealing because it is a simulation-based method and it can be easily adapted to any kind of model.
It only requires an estimation method in the absence of measurement error and can be easily implemented (though it is computationally more intensive). In survival analysis it has been applied to the semiparametric Cox model (Carroll et al. (2006)), the marginal hazards model for multivariate failure time data (Greene & Cai (2004)), the frailty model for clustered survival data (Li & Lin (2003)), etc.

However, there are only limited studies on cure rate models with measurement error. This problem was first addressed by Mizoi et al. (2007) and Ma & Yin (2008), who propose a corrected score approach for the parametric and semiparametric promotion time models respectively. Afterwards, the simex procedure was introduced as an alternative estimation method in a more general version of promotion time models by Bertrand et al. (2017a) and an extensive simulation study was done by Bertrand et al. (2017b) to compare it with the corrected score approach and get a better understanding of the robustness of the method. In the context of mixture cure models, the simex algorithm has only been proposed for left-truncated right-censored data when a transformation model is assumed for the latency (Chen (2019)). However, Chen (2019) considers only the case in which the mismeasured covariate affects only the latency, and theory is developed for one specific estimation method based on martingale integral representations. In particular, the most commonly used logistic/Cox mixture cure model for right-censored data and the maximum likelihood estimation method (based on the EM algorithm) have not been investigated in the presence of measurement error. The popularity of this model motivates us to search for solutions to correct estimates for the biases induced by the measurement error.

Here we propose a simex approach for a general mixture cure model with a parametric form of the incidence and a semiparametric model for the latency.
Any estimation method in the absence of measurement error can be used within the simex algorithm. We focus mainly on the logistic/Cox setting, given its practical relevance, but the proposed procedure and the asymptotic theory hold for other mixture cure models as well, provided that the considered estimation method in the absence of measurement error satisfies certain conditions. In particular, these conditions are satisfied for the maximum likelihood estimator introduced in Sy & Taylor (2000) and the presmoothing approach proposed by Musta et al. (2020). We use both these estimators in the simex procedure and compare them through a simulation study. In contrast to the previously considered promotion time models, here we find that if the mismeasured covariate affects only one of the two components (incidence or latency), the estimation of the other component remains undisturbed even if the variables are correlated. However, using the simex algorithm to correct for the bias does not always lead to better results in terms of mean squared error. The decision on whether to choose simex over the naive approach (ignoring the bias) depends on a number of factors. In particular, a large sample size, a strong effect of the covariate, a relatively large measurement error and low censoring favour the use of the simex approach.

The article is organized as follows. We start by describing a general parametric/semiparametric mixture cure model with measurement error in Section 2 and then explain the simex estimation procedure in Section 3. Asymptotic properties of the estimators are presented in Section 4, while their practical performance for the logistic/Cox mixture cure model is demonstrated through simulation studies in Section 5. Finally, in Section 6, we apply the proposed method to a prostate cancer dataset to account for measurement error in the values of the prostate specific antigen.

2. MIXTURE CURE MODEL WITH MEASUREMENT ERROR

Suppose we are interested in the time $T$ until a certain event happens. In contrast to classical survival analysis, in cure models it is possible to have $T = \infty$ (the event never happens), reflecting the presence of a cure fraction. On the other hand, a finite survival time corresponds to susceptible subjects that will experience the event at some time point. If we indicate by $B$ the uncured status, i.e. $B = \mathbb{1}_{\{T < \infty\}}$, then we can write

\[ T = B T_0 + (1 - B)\,\infty, \]

with $T_0$ representing the survival time for an uncured individual. The challenge of dealing with this type of model arises from the fact that, because of finite censoring times, it is impossible to completely separate the two groups. To be precise, if $C$ denotes the censoring time, then we only observe the follow-up time $Y = \min(T, C)$ and the censoring indicator $\Delta = \mathbb{1}_{\{T \le C\}}$. Hence, for the observations with $\Delta = 0$, we do not know whether they are cured or susceptible.

In addition to the cure fraction and censoring, it is desirable to also account for the impact of certain covariates on the time-to-event variable. Let $(X^T, Z^T)^T$ be a $(p+q)$-dimensional vector of covariates, where $x^T$ denotes the transpose of the vector $x$. The advantage of mixture cure models with respect to promotion time models is that they can distinguish between the covariates $X$, which affect the cure rate, and $Z$, which affect the survival of the uncured subjects, i.e.

\[ P(T = \infty \mid X, Z) = P(T = \infty \mid X) \quad \text{and} \quad P(T < \infty \mid X, Z) = P(T < \infty \mid Z). \]

However, it is possible for $X$ and $Z$ to be the same or share some of the components. As commonly done in studies of cure models, we assume that the censoring time and the survival time are independent given the covariates,

\[ T \perp C \mid (X, Z), \tag{1} \]

which is equivalent to requiring $T_0 \perp (C, X) \mid Z$ and $B \perp (C, T_0, Z) \mid X$ (see Lemma 1 in Appendix A of Musta et al. (2020)).

In this paper we deal with situations in which some of the continuous covariates included in $X$ and/or $Z$ are subject to measurement error.
For ease of notation and interpretation we define the vector of unique covariates $(E^{(1)T}, E^{(2)T}, E^{(3)T})^T \in \mathbb{R}^{p+q_1}$, where $E^{(1)}$ denotes the covariates in $X$ that are not present in $Z$, $E^{(2)}$ denotes the common components of $X$ and $Z$, $E^{(3)}$ denotes the covariates in $Z$ that are not present in $X$ and $q_1$ is the number of covariates in $E^{(3)}$. In other words, we are removing the repeated covariates from the vector $(X^T, Z^T)^T$ without losing any information. In the presence of measurement error, instead of $(E^{(1)T}, E^{(2)T}, E^{(3)T})^T$, we observe $W = (W^{(1)T}, W^{(2)T}, W^{(3)T})^T$ such that

\[ W = \left(E^{(1)T}, E^{(2)T}, E^{(3)T}\right)^T + U, \tag{2} \]

where $U \in \mathbb{R}^{p+q_1}$ is the vector of measurement errors. We assume that $U$ is independent of $(X, Z, T, C)$ and that it follows a continuous distribution with mean zero and known covariance matrix $V$. The elements of $V$ corresponding to covariates with no measurement error (including non-continuous covariates) are set to zero. However, no parametric assumption is made on the distribution of the errors. In particular, the measurement error is not required to be normally distributed.

We consider a general mixture cure model with a parametric form of the incidence and a semiparametric model for the latency. To be precise, the cure probability of a subject with covariate $x$ is

\[ \pi(x) = 1 - \phi(\gamma_0, x) \]

for some known function $\phi : \mathbb{R}^p \times \mathbb{R}^p \to [0, 1]$ and $\gamma_0 \in \mathbb{R}^p$, while the conditional survival function of the noncured subjects $S_u(\cdot \mid z)$ depends on a parametric component $\beta_0$ and a nonparametric non-decreasing function $\Lambda_0$ (for example the cumulative baseline hazard). As a result, the conditional survival function corresponding to $T$ is

\[ S(t \mid x, z) = P(T > t \mid X = x, Z = z) = 1 - \phi(\gamma_0, x) + \phi(\gamma_0, x)\, S_u(t \mid z). \]

The logistic model, where

\[ \phi(\gamma, x) = \frac{e^{\gamma^T x}}{1 + e^{\gamma^T x}}, \tag{3} \]

is perhaps the most common one when a parametric form of the cure probability is adequate. On the other hand, the Cox proportional hazards model (Cox (1972))

\[ S_u(t \mid z) = \exp\left\{ -\Lambda_0(t) \exp(\beta_0^T z) \right\}, \tag{4} \]

and the accelerated failure time model

\[ S_u(t \mid z) = \exp\left\{ -\Lambda_0\left( \exp(\beta_0^T z)\, t \right) \right\}, \]

where $\Lambda_0$ is the baseline cumulative hazard function, are both widely used semiparametric modelling approaches for the latency.
However, our methodology applies more generally to parametric/semiparametric mixture cure models, provided that an estimation method for the case without measurement error is available. The goal is to estimate the true parameters $\gamma_0$, $\beta_0$ and $\Lambda_0$ on the basis of $n$ i.i.d. observations $(Y_1, \Delta_1, W_1), \dots, (Y_n, \Delta_n, W_n)$, knowing the covariance matrix $V$ of the measurement error. In the next section we propose a simulation-extrapolation approach designed to reduce the bias due to the measurement error.
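To make the observation scheme of this section concrete, the following is a minimal sketch of how data $(Y, \Delta, W)$ could be simulated from a logistic/Cox mixture cure model with a single mismeasured covariate. It is an illustration only, not the simulation code used in the paper: the function name `simulate_mixture_cure`, the Weibull baseline $\Lambda_0(t) = \mu t^\rho$, the exponential censoring and all default parameter values are assumptions made for the example.

```python
import numpy as np

def simulate_mixture_cure(n, gamma, beta, v, mu=1.0, rho=1.0, lam_c=0.1, seed=0):
    """Simulate (Y, Delta, W) with one covariate X ~ N(0, 1) observed as W = X + U.

    gamma = (intercept, slope) of the logistic incidence; beta is the Cox
    coefficient; v is the measurement error variance. All defaults are
    hypothetical illustration values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    # Incidence: P(uncured | x) follows the logistic form in (3).
    p_uncured = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * x)))
    uncured = rng.random(n) < p_uncured
    # Latency: S_u(t | x) = exp(-mu * t**rho * exp(beta * x)), a Cox model as in (4).
    # Inverting S_u(T0) = exp(-E) with E ~ Exp(1) gives T0 below.
    e = rng.exponential(size=n)
    t0 = (e / (mu * np.exp(beta * x))) ** (1.0 / rho)
    t = np.where(uncured, t0, np.inf)        # cured subjects have T = infinity
    c = rng.exponential(scale=1.0 / lam_c, size=n)
    y = np.minimum(t, c)                     # follow-up time Y = min(T, C)
    delta = (t <= c).astype(int)             # censoring indicator
    w = x + np.sqrt(v) * rng.standard_normal(n)   # additive error as in (2)
    return y, delta, w
```

Since cured subjects have $T = \infty$, they always appear as censored observations, which is exactly why the cured and the susceptible censored subjects cannot be told apart from $(Y, \Delta)$ alone.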

3. METHODOLOGY

The basic idea behind the simex algorithm is that we can gain insight into how the measurement error affects the estimators by creating artificial data with increasing levels of measurement error and estimating the parameters as if there was no error. The obtained information is then used in the second step to recover the bias corrected estimators through an extrapolation approach. Next we describe the details of this procedure.

Step 1. (Simulation)

We choose $K$ levels of added noise $\lambda_1, \dots, \lambda_K \ge 0$ and for each of them we generate a large number $B$ of artificially contaminated samples. To be precise, for each $\lambda \in \{\lambda_1, \dots, \lambda_K\}$ and $b \in \{1, \dots, B\}$, we simulate independent identically distributed variables $\{\tilde U_{b,i}\}_{i=1}^n$, independently of the observed data and with distribution $N_D(0, I_D)$, where $D = p + q_1$ is the dimension of the vector $W$. Afterwards, we construct new covariates

\[ W_{i,\lambda,b} = W_i + (\lambda V)^{1/2}\, \tilde U_{b,i}, \]

where $V$ is the covariance matrix of the error in (2). Distributions different from Gaussian can be used too but here we focus on normal errors. The mixture model satisfied by the new covariates $W_{i,\lambda,b}$,

\[ S\left(t \,\middle|\, \left(W^{(1)}_{i,\lambda,b}, W^{(2)}_{i,\lambda,b}\right), \left(W^{(2)}_{i,\lambda,b}, W^{(3)}_{i,\lambda,b}\right)\right) = 1 - \phi\left(\gamma_\lambda, \left(W^{(1)}_{i,\lambda,b}, W^{(2)}_{i,\lambda,b}\right)\right) + \phi\left(\gamma_\lambda, \left(W^{(1)}_{i,\lambda,b}, W^{(2)}_{i,\lambda,b}\right)\right) S_{u,\lambda}\left(t \,\middle|\, \left(W^{(2)}_{i,\lambda,b}, W^{(3)}_{i,\lambda,b}\right)\right), \]

is characterized by the parameters $\gamma_\lambda$, $\beta_\lambda$ and $\Lambda_\lambda$. Using $\{Y_i, \Delta_i, W_{i,\lambda,b}\}_{i=1}^n$ we estimate $\gamma_\lambda$, $\beta_\lambda$ and $\Lambda_\lambda$, as if there was no measurement error, obtaining $\hat\gamma_{\lambda,b}$, $\hat\beta_{\lambda,b}$ and $\hat\Lambda_{\lambda,b}$. The latter is an estimator of $\Lambda_\lambda$ over some compact interval $[0, \tau]$. Any available estimation method can be used. For example, in the logistic/Cox mixture cure model, the maximum likelihood estimation (Sy & Taylor (2000); Cai et al. (2012)) or the presmoothing approach proposed by Musta et al. (2020) can be considered.

At the end, for each level of contamination, the average values of all the $B$ estimates are calculated:

\[ \hat\gamma_\lambda = \frac{1}{B} \sum_{b=1}^{B} \hat\gamma_{\lambda,b}, \qquad \hat\beta_\lambda = \frac{1}{B} \sum_{b=1}^{B} \hat\beta_{\lambda,b} \qquad \text{and} \qquad \hat\Lambda_\lambda(t) = \frac{1}{B} \sum_{b=1}^{B} \hat\Lambda_{\lambda,b}(t). \tag{5} \]

Note that, if the estimators $\hat\Lambda_{\lambda,b}$ are piecewise constant with jumps at the observed event times, then $\hat\Lambda_\lambda$ is also piecewise constant with jumps at the observed event times. The parameters to be chosen in this step are $K$, the $\lambda$'s and $B$. Common values are $K = 5$, $\lambda \in \{0, 0.5, 1, 1.5, 2\}$ and $B = 50$ (Carroll et al. (1996); Cook & Stefanski (1994)).

Step 2. (Extrapolation)

Note that, by independence, the covariance matrix of the simulated covariates $W_{i,\lambda,b}$ is

\[ \mathrm{var}(W_{i,\lambda,b} \mid X_i) = \mathrm{var}(W_i \mid X_i) + \lambda V = (1 + \lambda) V. \]

This means that the variance has been inflated by a factor $1 + \lambda$ and that the ideal case of no measurement error corresponds to $\lambda = -1$ (adding 'negative' variance). Hence, the idea is to model the relationship between $\lambda$ and the estimators $\hat\gamma_\lambda$, $\hat\beta_\lambda$, $\hat\Lambda_\lambda$ by fitting a regression function and then extrapolate to $\lambda = -1$. First, an extrapolant function needs to be chosen (e.g. linear, quadratic or fractional) for each component of $\hat\gamma_\lambda$, $\hat\beta_\lambda$, $\hat\Lambda_\lambda$ as a function of $\lambda$. For example, for the quadratic case and $\lambda \in \{\lambda_1, \dots, \lambda_K\}$, we have

\[ \hat\gamma_{\lambda,j} = g_{\gamma,j}(a^*_{\gamma_j}, \lambda) + \epsilon_{\gamma,\lambda,j} = a^*_{\gamma_j,1} + a^*_{\gamma_j,2}\lambda + a^*_{\gamma_j,3}\lambda^2 + \epsilon_{\gamma,\lambda,j}, \qquad j = 1, \dots, p, \]

\[ \hat\beta_{\lambda,j} = g_{\beta,j}(a^*_{\beta_j}, \lambda) + \epsilon_{\beta,\lambda,j} = a^*_{\beta_j,1} + a^*_{\beta_j,2}\lambda + a^*_{\beta_j,3}\lambda^2 + \epsilon_{\beta,\lambda,j}, \qquad j = 1, \dots, q, \]

\[ \hat\Lambda_\lambda(t) = g_{\Lambda,t}(a^*_t, \lambda) + \epsilon_{\Lambda,\lambda,t} = a^*_{t,1} + a^*_{t,2}\lambda + a^*_{t,3}\lambda^2 + \epsilon_{\Lambda,\lambda,t}, \qquad t \in [0, \tau], \]

where $\epsilon_{\beta,\lambda,j}$, $\epsilon_{\gamma,\lambda,j}$ and $\epsilon_{\Lambda,\lambda,t}$ are the error terms in the extrapolant model, assumed to have mean zero and to be independent. Estimators $\hat a_{\gamma_j} = (\hat a_{\gamma_j,1}, \hat a_{\gamma_j,2}, \hat a_{\gamma_j,3})$, $\hat a_{\beta_j} = (\hat a_{\beta_j,1}, \hat a_{\beta_j,2}, \hat a_{\beta_j,3})$ and $\hat a_t = (\hat a_{t,1}, \hat a_{t,2}, \hat a_{t,3})$ of the unknown parameters of the extrapolant function are obtained by fitting the previous regression models using the method of least squares. Finally, the simex estimators are defined by

\[ \hat\gamma_{j,\mathrm{simex}} = \lim_{\lambda \to -1} g_{\gamma,j}(\hat a_{\gamma_j}, \lambda), \qquad j = 1, \dots, p, \]

\[ \hat\beta_{j,\mathrm{simex}} = \lim_{\lambda \to -1} g_{\beta,j}(\hat a_{\beta_j}, \lambda), \qquad j = 1, \dots, q, \]

\[ \hat\Lambda_{\mathrm{simex}}(t) = \lim_{\lambda \to -1} g_{\Lambda,t}(\hat a_t, \lambda), \qquad t \in [0, \tau]. \]

If the initial estimators $\hat\Lambda_{\lambda,b}$ are piecewise constant with jumps at the observed event times, then the extrapolation procedure needs to be applied only for the observed event times $t \in \{T_{(1)}, \dots, T_{(m)}\}$. Equivalently, the procedure can be applied to the jump sizes for different coefficients $a^*$ and a possibly different extrapolation function (if it is not polynomial). Even though this does not guarantee that the resulting estimator $\hat\Lambda_{\mathrm{simex}}$ is non-decreasing, in practice this is often the case. If one is interested in estimation of $\Lambda$ on the whole support and $\hat\Lambda_{\mathrm{simex}}$ is not monotone, an isotonized version of it, using for example the pool-adjacent-violators algorithm (Robertson et al. (1988)), would be a more reasonable estimate. However, here we focus on estimation of the parameters $\gamma$, $\beta$ and do not further exploit this aspect.
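The isotonization step mentioned above can be sketched as follows. This is a generic equal-weight pool-adjacent-violators routine applied to the values of an extrapolated cumulative hazard at the ordered event times; the function name `pava_nondecreasing` is hypothetical and the sketch is not taken from the paper, which does not pursue this step.

```python
import numpy as np

def pava_nondecreasing(values):
    """Least-squares non-decreasing fit of a sequence (equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Pool adjacent blocks while the earlier block mean exceeds the later one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    if not blocks:
        return np.array([])
    # Every pooled block is replaced by its mean.
    return np.concatenate([np.full(c, s / c) for s, c in blocks])
```

For instance, a sequence such as `[1, 3, 2, 4]` has a single violation (3 followed by 2), which the routine resolves by replacing both values with their average while leaving the rest unchanged.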
Note also that different extrapolation functions lead to different results. Hence it is important to have a good approximation of the true extrapolation function.
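The two steps of this section can be sketched in a few lines of code. The sketch below is an assumption-laden illustration rather than the authors' implementation: the function `simex` and its `estimate` callback interface are hypothetical, it handles a single scalar mismeasured covariate with known error variance `v`, and, to stay self-contained, the naive estimator plugged in is ordinary least squares in a linear model instead of a mixture cure estimator.

```python
import numpy as np

def simex(estimate, w, v, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), B=50, seed=0):
    """Simex with a quadratic extrapolant for one mismeasured covariate.

    estimate : callable mapping a covariate vector to a 1-d parameter array
               (hypothetical interface; any naive estimator can be plugged in).
    w        : observed covariate, assumed to satisfy W = X + U with var(U) = v.
    """
    rng = np.random.default_rng(seed)
    lambdas = np.asarray(lambdas, dtype=float)
    averaged = []
    for lam in lambdas:
        # Step 1 (simulation): add noise of variance lam * v, re-estimate as if
        # there was no error, and average the B estimates as in (5).
        reps = [np.atleast_1d(estimate(w + np.sqrt(lam * v) * rng.standard_normal(w.shape)))
                for _ in range(B)]
        averaged.append(np.mean(reps, axis=0))
    averaged = np.asarray(averaged)                   # shape (K, number of parameters)
    # Step 2 (extrapolation): least-squares quadratic in lambda for each
    # parameter, evaluated at lambda = -1.
    design = np.vander(lambdas, 3, increasing=True)   # columns 1, lambda, lambda^2
    coef, *_ = np.linalg.lstsq(design, averaged, rcond=None)
    return np.array([1.0, -1.0, 1.0]) @ coef

# Toy illustration with a stand-in estimator (linear regression, not a cure model).
rng = np.random.default_rng(1)
n, v = 2000, 0.25
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
w = x + np.sqrt(v) * rng.standard_normal(n)
ols = lambda cov: np.polyfit(cov, y, 1)[::-1]         # returns (intercept, slope)
naive = ols(w)                                        # slope attenuated towards zero
corrected = simex(ols, w, v)                          # slope moved back towards 2
```

In this toy setting the quadratic extrapolant is only an approximation of the true extrapolation function, so the corrected slope is expected to be closer to the true value than the naive one without matching it exactly, which echoes the remark above about the choice of extrapolant.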

4. ASYMPTOTIC PROPERTIES

4.1. General results

In this section we establish some theoretical results regarding the large-sample properties of the proposed estimators. A drawback of the simex approach is that consistency and asymptotic normality of the estimators hold only if the true extrapolation function is known, which is usually not the case in practice. When the true extrapolant function is not known, but an approximation of it is used, the results of Theorems 1 and 2 hold with $\gamma_0$, $\beta_0$, $\Lambda_0(t)$ replaced by $\lim_{\lambda \to -1} g_\gamma(a_\gamma, \lambda)$, $\lim_{\lambda \to -1} g_\beta(a_\beta, \lambda)$ and $\lim_{\lambda \to -1} g_\Lambda(a_t, \lambda)$ respectively. Here $g_\gamma(a_\gamma, \lambda)$ denotes the vector $(g_{\gamma,1}(a_{\gamma_1}, \lambda), \dots, g_{\gamma,p}(a_{\gamma_p}, \lambda))^T$ and $g_\beta(a_\beta, \lambda)$, $g_\Lambda(a_t, \lambda)$ are defined similarly.

We first establish the asymptotic results in a general mixture cure model as described in Section 2, assuming that the estimation method used for obtaining $\hat\gamma_{\lambda,b}$, $\hat\beta_{\lambda,b}$, $\hat\Lambda_{\lambda,b}$ (ignoring the measurement error) satisfies certain conditions. Afterwards, we will focus on two estimation methods for the logistic/Cox mixture cure model and show that the required conditions are met. All the proofs can be found in the Supplementary Material.

For a fixed $\lambda > 0$ consider observations $(Y, \Delta, W_\lambda)$, where $W_\lambda = W + (\lambda V)^{1/2}\, \tilde U$, and the mixture cure model with conditional survival

\[ S(t \mid W_\lambda) = 1 - \phi\left(\gamma_\lambda, \left(W^{(1)}_\lambda, W^{(2)}_\lambda\right)\right) + \phi\left(\gamma_\lambda, \left(W^{(1)}_\lambda, W^{(2)}_\lambda\right)\right) S_{u,\lambda}\left(t \,\middle|\, \left(W^{(2)}_\lambda, W^{(3)}_\lambda\right)\right), \]

where, as mentioned in Section 2, the decomposition of $W_\lambda$ into three components corresponds to the covariates that influence only the cure probability, those that are common to the incidence and the latency, and the ones that affect only the latency. The survival of the uncured subjects $S_{u,\lambda}$ depends on the regression parameters $\beta_\lambda$ and the nonparametric function $\Lambda_\lambda$.
Suppose we have an estimation method that provides estimates $\hat\gamma_\lambda$, $\hat\beta_\lambda$ and $\hat\Lambda_\lambda$, the latter being a non-decreasing function. The following conditions will be needed in order to establish the asymptotic results.

(A1) With probability one and for some $\tau > 0$ we have

\[ \|\hat\gamma_\lambda - \gamma_\lambda\| \to 0, \qquad \|\hat\beta_\lambda - \beta_\lambda\| \to 0 \qquad \text{and} \qquad \sup_{t \in [0, \tau]} |\hat\Lambda_\lambda(t) - \Lambda_\lambda(t)| \to 0 \]

as $n \to \infty$, i.e. the estimators are strongly consistent. By $\|\cdot\|$ we denote the Euclidean norm.

(A2) For $m < \infty$, define

\[ H_m = \left\{ h = (h_1, h_2, h_3) \in BV[0, \tau] \times \mathbb{R}^p \times \mathbb{R}^q : \|h\|_H = \|h_1\|_v + \|h_2\| + \|h_3\| \le m \right\}, \]

where $BV[0, \tau]$ denotes the space of functions of bounded variation on $[0, \tau]$, $\|h_1\|_v = |h_1(0)| + V_0^\tau(h_1)$ and $V_0^\tau(h_1)$ denotes the total variation of $h_1$ over $[0, \tau]$. Uniformly over $h \in H_m$ we have

\[ h_2^T(\hat\gamma_\lambda - \gamma_\lambda) + h_3^T(\hat\beta_\lambda - \beta_\lambda) + \int_0^\tau h_1(s)\, \mathrm{d}(\hat\Lambda_\lambda - \Lambda_\lambda)(s) = \frac{1}{n} \sum_{i=1}^{n} \Psi_\lambda(Y_i, \Delta_i, W_{i,\lambda}, h_1, h_2, h_3) + o_P(n^{-1/2}) \]

for some function $\Psi_\lambda$ such that $E[\Psi_\lambda(Y, \Delta, W_\lambda, h_1, h_2, h_3)] = 0$ and, for fixed $\lambda$, the class $\{(y, \delta, w) \mapsto \Psi_\lambda(y, \delta, w, h_1, h_2, h_3) : (h_1, h_2, h_3) \in H_m\}$ is uniformly bounded and Donsker.

Moreover, in what follows, we assume that the extrapolant functions $g(a, \lambda)$ are such that the matrix $\dot g(a, \lambda)$ of partial derivatives with respect to the elements of $a$ is bounded and continuous at the true parameters $a^*$ and has full rank, i.e. $\dot g(a^*, \lambda)^T \dot g(a^*, \lambda)$ is invertible.

THEOREM 1. Suppose that condition (A1) is satisfied and that $\Lambda_0$ is continuous. If the measurement error variance and the true extrapolant functions are known then, with probability one,

\[ \|\hat\gamma_{\mathrm{simex}} - \gamma_0\| \to 0, \qquad \|\hat\beta_{\mathrm{simex}} - \beta_0\| \to 0 \qquad \text{and} \qquad \sup_{t \in [0, \tau]} |\hat\Lambda_{\mathrm{simex}}(t) - \Lambda_0(t)| \to 0. \]

THEOREM 2. Suppose that conditions (A1)-(A2) are satisfied and that $\Lambda_0$ is continuous. If the measurement error variance and the true extrapolant functions are known, then $n^{1/2}(\hat\gamma_{\mathrm{simex}} - \gamma_0)$ converges in distribution to $N(0, \Sigma_\gamma)$ and $n^{1/2}(\hat\beta_{\mathrm{simex}} - \beta_0)$ converges in distribution to $N(0, \Sigma_\beta)$, with $\Sigma_\gamma$ and $\Sigma_\beta$ as in (S2) and (S3) in the Supplementary Material. Moreover, $n^{1/2}(\hat\Lambda_{\mathrm{simex}} - \Lambda_0)$ converges weakly in $\ell^\infty([0, \tau])$ to a mean zero Gaussian process $G$ defined in (S4).

The proofs of Theorems 1 and 2 follow the usual arguments for simex estimators. In particular, consistency relies mainly on the consistency of the estimators for each $\lambda$ and consistency of the estimated extrapolant functions. Moreover, the i.i.d. representation in condition (A2) and the expressions in (5) allow us to obtain convergence to a Gaussian process for any $\lambda$. Finally, the asymptotic normality of the simex estimators follows by the delta method. Details of the proofs can be found in the Supplementary Material.

4.2. Example: logistic/Cox mixture cure model

The logistic/Cox mixture cure model is perhaps the most commonly used one for studying survival data in the presence of a cure fraction. It assumes that the function $\phi(\gamma, x)$ is as in (3), where the first component of $x$ is equal to one and $\gamma_1$ corresponds to the intercept. On the other hand, the distribution of the uncured subjects follows a Cox proportional hazards model as in (4), where $\Lambda_0$ is the baseline cumulative hazard, $\beta^T Z$ does not contain an intercept and the matrix $\mathrm{var}(Z)$ is assumed to have full rank for the Cox model to be identifiable. The classical estimator in this setting is the maximum likelihood estimator proposed by Sy & Taylor (2000) and implemented in the R package smcure. The estimator is computed through the expectation-maximization algorithm because of the unobserved latent variable $B$, and its asymptotic properties are investigated in Lu (2008). Recently, an alternative estimation procedure relying on presmoothing was proposed by Musta et al. (2020). It uses a preliminary nonparametric estimator for the cure probabilities and ignores the Cox model when estimating $\gamma$. It is shown through simulations that, if the interest is focused on estimation of the parameters of the incidence, this method usually performs better than the maximum likelihood estimator. However, both methods lead to very similar results when estimating the latency.
Next we show that these two estimators satisfy our conditions (A1)-(A2) and, as a result, both procedures can be used in the simex algorithm, leading to consistent and square-root convergent estimators.

THEOREM 3. Consider the maximum likelihood estimation method proposed by Sy & Taylor (2000). Assume that conditions 1-4 in Lu (2008) are satisfied. Then our conditions (A1)-(A2) above hold with $\Psi_\lambda(y, \delta, w, h_1, h_2, h_3)$ as in (S9) in the Supplementary Material.

THEOREM 4. Consider the estimation method proposed by Musta et al. (2020) and assume that their assumptions (C1)-(C4), (AC2), (AC5)-(AC7) are satisfied. Then our conditions (A1)-(A2) above hold with $\Psi_\lambda(y, \delta, w, h_1, h_2, h_3)$ as in (S11) in the Supplementary Material.

In order for the mixture cure model to be identifiable, $T_0$ should have compact support $[0, \tau_0]$ such that $\inf_{x,z} P(C > \tau_0 \mid X = x, Z = z) > 0$. Hence, $\tau$ in our conditions (A1)-(A2) is equal to $\tau_0$. In practice cure rate models are used when there is a long follow-up beyond the largest observed event time $T_{(m)}$ and the zero-tail constraint is applied, i.e. the censored subjects with follow-up time larger than $T_{(m)}$ are considered cured. To be able to develop the asymptotic theory, in Lu (2008) it is assumed that $\inf_z P(T_0 = \tau_0 \mid Z = z) > 0$, while Musta et al. (2020) argue that this assumption can be avoided thanks to the presmoothing step.
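As a small illustration of the zero-tail constraint described above, the sketch below flags the censored subjects whose follow-up time exceeds the largest observed event time. The function name `zero_tail_cured` is hypothetical, and the snippet only mirrors the verbal definition given in the text; it is not part of any of the cited estimation procedures.

```python
import numpy as np

def zero_tail_cured(y, delta):
    """Boolean mask of subjects treated as cured under the zero-tail constraint.

    y     : follow-up times Y_i
    delta : censoring indicators (1 = event observed, 0 = censored)
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta).astype(bool)
    if not delta.any():
        raise ValueError("no observed events, largest event time is undefined")
    t_max = y[delta].max()            # largest observed event time T_(m)
    # Censored subjects followed beyond T_(m) are considered cured.
    return (~delta) & (y > t_max)
```

With follow-up times `[1, 2, 3, 4, 5]` and indicators `[1, 0, 1, 0, 0]`, the largest event time is 3, so only the last two (censored) subjects are flagged; the censored subject at time 2 remains of unknown status.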

5. NUMERICAL STUDY

5.1. Setup

In this section we investigate the finite-sample behaviour of the simex method in the logistic/Cox mixture cure model. The two estimation approaches considered in Section 4.2 are used within the simex algorithm and compared with each other in the context of mismeasured covariates. Results for a variety of models and scenarios are presented in the next subsections. We try to cover a wide range of situations and capture the effect of the cure rate, censoring rate, sample size and measurement error variance. Unless stated otherwise, the error distribution is Gaussian and the extrapolation function used is quadratic, which seems to be a good compromise in terms of bias and variance (Cook & Stefanski (1994); Carroll et al. (2006); Li & Lin (2003); Bertrand et al. (2017a)). Finally, we also briefly investigate the robustness of the method with respect to the extrapolation function and to misspecification of the error distribution and variance. In all the simulation studies, for the simex method, we choose $B = 50$, $K = 5$, $\lambda \in \{0, 0.5, 1, 1.5, 2\}$ (as these seem to be quite common choices in the literature) and for each setting simulated datasets were used to compute the bias, variance and mean squared error (MSE) of the estimators. We also compare the bias corrected estimators with the naive estimators, which do not take the measurement error into account. The bandwidth for the estimator based on presmoothing is chosen as in Musta et al. (2020), i.e. the cross-validation optimal bandwidth for estimation of the conditional distribution $H(t \mid x)$ for $t \le Y_{(m)}$ truncated from above at , where $Y_{(m)}$ is the largest uncensored observation and $x$ is the continuous covariate affecting the incidence (standardized). To reduce computational time, we compute this bandwidth only once for the initial dataset and use the same for the data with added noise. We observed that not updating the bandwidth for each $b \in \{1, \dots, B\}$ and $\lambda \in \{0.5, 1, 1.5, 2\}$ does not have a significant impact on the final results. Moreover, we assume that the standard deviation of the error is known, which is usually not the case in practice. In such situations, a preliminary step of variance estimation is required before applying the simex procedure (see for example Bertrand et al. (2019)).

5.2. One mismeasured covariate

We start by considering a simplified model in which there is only one covariate of interest, measured with error, affecting both the cure probability and the survival of the uncured subjects.

Model 1.

Both incidence and latency depend on one covariate X , which is a standard normalrandom variable. We generate the cure status B as a Bernoulli random variable with successprobability φ ( γ, x ) = 1 / (1 + exp( − γ − γ x )) . The survival times for the uncured observationsare generated according to a Weibull proportional hazards model S u ( t | x ) = exp (cid:0) − µt ρ exp( β T x ) (cid:1) , and are truncated at τ = 7 for ρ = 1 . , µ = 1 . and β = 1 . The censoring times are indepen-dent from X and T . They are generated from the exponential distribution with parameter λ C and are truncated at τ = 9 . Various choices of the parameters γ and λ C with the correspondingcure and censoring rates can be found in Table 1. Here and in what follows, the truncation ofthe survival times and censoring times is done in such a way that τ < τ and it is unlikely toobserve an event time at τ . This mimics real-life situations in which cure models are adequate. X is measured with error, i.e. instead of X we observe W = X + U , where U ∼ N (0 , v ) .Results for sample size n = 200 ( n = 400 ) and measurement error variance v = 0 . aregiven in Table 2 (Table S1 in the Supplementary Material). This corresponds to a large errorsituation since the ratio between the standard deviation of the error and the standard deviation ofthe covariate is . . Below we will consider also settings with smaller measurement error.First of all, we observe that in the presence of measurement error there is usually no advantageof using the presmoothing approach instead of maximum likelihood estimation. In particular,when the bias induced by the measurement error is large, it seems that the estimator based onpresmoothing is more affected for both the naive and the simex method. Moreover, most of thetime the bias is observed only for the coefﬁcients that correspond to the variables measured witherror. As expected, in all cases, the simex algorithm reduces this bias at the price of a largervariance. 
In terms of mean squared error, it is better to use the naive approach for coefficients

Setting γ Scenario Cure rate γ Cens. rate λ C Cens. level Plateau1 .

20% 1 . .

09 25% 14%2 0 .

50% 0 1 0 .

13 55% 32%2 0 . .

20% 1 . .

07 25% 16%2 0 .

26 35% 9%

50% 0 1 0 .

15 55% 31%2 0 .

20% 2 . . .

33 35% 9%

50% 0 1 0 . .

Table 1: Parameter values and characteristics of each scenario for Model 1.

that are small in absolute value (the case of γ in setting 1), while the simex method is preferred when the absolute value of the coefficient is large (i.e. the covariate has a greater effect on the cure/survival). In this setting, for n = 200, γ = 0 . seems to be a borderline case, meaning that the simex method performs better when the censoring rate is low, while the naive method has smaller MSE when the censoring rate is high. In addition, results show that when the coefficient of a mismeasured covariate is large, there might be induced bias even for the intercept, which is also corrected by the simex algorithm. As the sample size increases, the bias created by the measurement error increases but the variance decreases for both naive and simex estimators. Furthermore, the advantage of using simex instead of ignoring the bias becomes more significant. At the same time, the threshold absolute value of a coefficient for which bias correction leads to better MSE decreases (simex is preferred for γ = 0 . in setting 2, which was a borderline case for n = 200).

5.3. More realistic scenarios

Through the following four models we try to cover more realistic situations and investigate the effect of the measurement error on the naive and bias corrected estimators.
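The models in this section share one data-generating recipe: a logistic incidence, a Weibull proportional hazards latency truncated at a cure threshold, exponential censoring, and additive normal measurement error. The following is a minimal sketch of that recipe for a single standard normal covariate, not the authors' simulation code. The function name `generate_cure_data`, the argument names `tau0` and `tau`, and all parameter values in the usage lines are illustrative assumptions. The sketch also assumes that B = 1 marks an uncured subject, that `v` is the standard deviation of the error, that λ_C is a rate, and that truncation means capping the time at the threshold.

```python
import numpy as np

def generate_cure_data(n, gamma, beta, rho, mu, lam_c, tau0, tau, v, rng):
    """Simulate (Y, Delta, W, B) from a logistic/Weibull-PH mixture cure model.

    Hypothetical helper: one covariate X ~ N(0, 1) drives both incidence and latency.
    """
    x = rng.standard_normal(n)
    # Incidence: probability of being uncured (assumption: B = 1 means uncured).
    p_uncured = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * x)))
    b = rng.random(n) < p_uncured
    # Latency by inverse transform of S_u(t|x) = exp(-mu * t^rho * exp(beta * x)).
    e = rng.exponential(size=n)
    t = (e / (mu * np.exp(beta * x))) ** (1.0 / rho)
    t = np.minimum(t, tau0)            # cap event times at the cure threshold
    t[~b] = np.inf                     # cured subjects never experience the event
    # Censoring: exponential with rate lam_c, capped at tau.
    c = np.minimum(rng.exponential(scale=1.0 / lam_c, size=n), tau)
    y = np.minimum(t, c)               # observed follow-up time
    delta = (t <= c).astype(int)       # 1 = event observed, 0 = censored
    w = x + rng.normal(scale=v, size=n)  # covariate observed with error
    return y, delta, w, b

# Illustrative values only; they are not the settings of any table in the paper.
rng = np.random.default_rng(0)
y, delta, w, b = generate_cure_data(
    n=2000, gamma=(0.0, 1.0), beta=1.0, rho=1.5, mu=1.5,
    lam_c=0.1, tau0=7.0, tau=9.0, v=0.7, rng=rng)
print(f"cure rate: {1 - b.mean():.2f}, censoring rate: {1 - delta.mean():.2f}")
```

Because every cured subject is censored, the censoring rate is always at least as large as the cure rate, which is the pattern visible in the scenario tables.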

Model 2.

Both incidence and latency depend on two independent covariates: X has a uniform distribution on the interval [ − , and X is a Bernoulli random variable with success probability . . We generate the cure status B as a Bernoulli random variable with success probability φ(γ, x) = 1/(1 + exp(−γ − γ x − γ x)). The survival times for the uncured observations are generated according to a Weibull proportional hazards model

$$S_u(t \mid x) = \exp\{-\mu t^{\rho} \exp(\beta x + \beta x)\},$$

and are truncated at τ for ρ = 1 . and µ = 1 . . The censoring times are independent of (X, T). They are generated from the exponential distribution with parameter λ_C and are truncated at τ. Instead of X we observe W = X + U, where U ∼ N(0, v). We consider v ∈ { . , . } corresponding to small and large error settings respectively.

Model 3.

For the incidence we consider two independent covariates: X has a uniform distribution on the interval [ − , and X is a Bernoulli random variable with success probability . . The latency also depends on two covariates: Z = X and Z is independent of the previous ones and normally distributed with mean zero and standard deviation . . We generate the cure status B as a Bernoulli random variable with success probability φ(γ, x) = 1/(1 + exp(−γ − γ x − γ x)). The survival times for the uncured observations are generated according to a Weibull proportional hazards model

$$S_u(t \mid z) = \exp\{-\mu t^{\rho} \exp(\beta z + \beta z)\},$$

and are truncated at τ for ρ = 1 . and µ = 1 . . The censoring times are independent of (T, X, Z). They are generated from the exponential distribution with parameter λ_C and are truncated at τ. The mismeasured covariate is Z, i.e. we only observe W = Z + U, where U ∼ N(0, v) and v ∈ { . , . } corresponding to small and large error settings respectively.

Columns of Table 2: Par., followed by Bias, Var. and MSE for each of naive - 1, naive - 2, simex - 1 and simex - 2.

Table 2: Bias, variance and MSE of γ̂ and β̂ for the naive and simex method based on the maximum likelihood (1) or the presmoothing (2) approach for Model 1 (n = 200). The first column gives the setting/scenario/cens. level. All numbers were multiplied by .

Model 4.

The incidence depends on one covariate X which is a standard normal random variable. The latency depends on two covariates: Z = X and Z is independent of X and uniformly distributed on [ − , . We generate the cure status B as a Bernoulli random variable with success probability φ(γ, x) = 1/(1 + exp(−γ − γ x)). The survival times for the uncured observations are generated according to a Weibull proportional hazards model

$$S_u(t \mid z) = \exp\{-\mu t^{\rho} \exp(\beta z + \beta z)\}, \qquad (6)$$

and are truncated at τ for ρ = 1 . and µ = 1 . . The censoring times are independent of the vector (X, Z, T). They are generated from the exponential distribution with parameter λ_C and are truncated at τ. Instead of X and Z we observe W = X + U and W = Z + U, where the error terms U ∼ N(0, v) and U ∼ N(0, v) are independent. We consider (v , v) = (0 . , . and (v , v) = (0 . , . corresponding to small and large error settings respectively.

Model 5.

The incidence depends on one covariate X which is a standard normal random variable. The latency depends on two correlated covariates: Z = X and Z = − X + N, where N is a normal random variable with mean zero and standard deviation . independent of X. We generate the cure status B as a Bernoulli random variable with success probability φ(γ, x) = 1/(1 + exp(−γ − γ x)). The survival times for the uncured observations are generated according to the Weibull proportional hazards model in (6) and are truncated at τ for ρ = 1 . and µ = 1 . . The censoring times are independent of the vector (X, Z, T). They are generated from the exponential distribution with parameter λ_C and are truncated at τ. The covariate Z is measured with error, i.e. instead of Z we observe W = Z + U, where U ∼ N(0, v) is independent of the previous variables. We consider v = 0 . and v = 0 . corresponding to small and large error settings respectively.

Model Scenario γ β λ_C (τ , τ) Cure rate Cens. rate Plateau
. , , .

4) (0 . , .

3) 0 .

33 (4 ,

6) 20% 35% 9%2 2 (1 . , . , − .

3) (2 , − .

8) 0 .

08 (10 ,

12) 30% 35% 19%3 ( − . , . ,

1) (0 . , .

3) 0 . ,

6) 50% 60% 22%1 (1 . , , .

4) (1 . , .

5) 0 . ,

8) 20% 35% 7%3 2 (1 . , . , − .

3) (1 , −

1) 0 . ,

8) 30% 35% 22%3 ( − . , . ,

1) (0 . , .

5) 0 . ,

8) 50% 60% 24%1 (1 . , .

5) (0 . , .

1) 0 . ,

7) 20% 35% 9%4 2 (1 . ,

2) (0 . , .

5) 0 .

12 (5 ,

7) 30% 35% 22%3 (0 . −

2) ( − . , .

5) 0 . ,

7) 50% 60% 14%1 (1 . , .

5) (0 . , .

1) 0 . ,

6) 20% 35% 10%5 2 (1 . ,

2) (0 . , − .

5) 0 .

13 (4 ,

6) 30% 35% 21%3 (0 ,

2) (1 , −

1) 0 . ,

8) 50% 60% 12%

Table 3: Parameter values and model characteristics for each scenario in Models 2-5.

For the four models, various choices of the parameters γ, β, λ_C and (τ , τ) are considered, in such a way that we obtain three scenarios for the cure rate (20%, 30% and 50%) and different levels of censoring (see Table 3). The sample size is fixed at n = 200, while the variance of the measurement error is chosen as described in each model, corresponding to a ratio between the standard deviation of the error and the standard deviation of the covariate equal to . and . . Some of the results are given in Table 4 and the rest can be found in Tables S2-S3 in the Supplementary Material.

Columns of Table 4: Mod/Scen./v, Par., followed by Bias, Var. and MSE for each of naive - 1, naive - 2, simex - 1 and simex - 2.

Table 4: Bias, variance and MSE of γ̂ and β̂ for the naive and simex method based on the maximum likelihood (1) or the presmoothing (2) approach in Models 2-5 (n = 200). The first column gives the model, scenario and the standard deviation of the measurement error. All numbers were multiplied by .

Once more we observe that the maximum likelihood estimator and the estimator based on presmoothing give comparable results for both the naive and the simex method. As expected, the measurement error mainly affects the estimators of the coefficients corresponding to the mismeasured covariates. However, the measurement error also induces bias on variables correlated with the mismeasured covariate within the same component. For example in Model 5, the measurement error of Z leads to biased estimators for β and β, but does not affect the estimation of γ even though Z = X. In all settings, the simex method corrects for the bias due to the measurement error. Nevertheless, in terms of mean squared error, the naive approach is still preferred when the measurement error is small and the absolute value of the coefficient corresponding to the standardized covariate is small (the covariate has a weak effect on cure or survival). On the contrary, a strong effect (large coefficient) and a large measurement error favour the use of the simex method.

5.4. Robustness of the method

Here we investigate the robustness of the simex approach with respect to the choice of the extrapolation function, misspecification of the error distribution and of the error standard deviation. We focus on Model 2, where the mismeasured covariate is X = Z affecting both the cure probability and the survival. The sample size is n = 200 and the error standard deviation is v = 0 . or v = 0 . .

In addition to the quadratic extrapolant used in Table 4, we also consider a linear and a cubic extrapolant. Results in Table 5 show that, as the order of the extrapolation function increases, the difference between the maximum likelihood estimators and the estimators based on presmoothing becomes more significant. In particular, it favours the first method over the latter one mainly due to a smaller variance. As expected, the choice of the extrapolation function has a stronger effect on the coefficients corresponding to the mismeasured covariates and when the error is large. For v = 0 . , the bias decreases as the extrapolation order increases while there is no clear conclusion when v is small.
In terms of mean squared error, linear extrapolation is preferred when the measurement error variance is low or, more generally, in situations where the naive method would do better than the simex approach. In cases where simex outperforms the naive estimators, the quadratic extrapolant seems to be the best choice.

Columns of Table 5: Par., followed by Bias, Var. and MSE for simex - 1 and simex - 2 under each of the two values of v; row blocks: linear, cubic.

Table 5: Bias, variance and MSE of γ̂ and β̂ for the simex method based on the maximum likelihood (1) or the presmoothing (2) approach with three different extrapolation functions. All numbers were multiplied by .

To understand what happens if the error distribution is misspecified we generate the measurement error from three other distributions: a uniform distribution U ∼ Unif(−a, a), a Student-t distribution with k degrees of freedom, $a^{-1}U \sim t_k$, and a chi-squared distribution with k degrees of freedom, $a^{-1}U + k \sim \chi^2_k$. The constant a is chosen in such a way that the standard deviation of U is v = 0 . or v = 0 . . In all three cases we still use the Gaussian distribution in the simex procedure. Results are given in Table 6. We observe that, when the true distribution is uniform or Student-t, the method still behaves quite well and there is little impact on the estimators. However, when the true distribution is not symmetric ($\chi^2$) there is a significant increase in mean squared error, in particular for large v.

Columns of Table 6: Par., followed by Bias, Var. and MSE for simex - 1 and simex - 2 under each of the two values of v; row blocks: t-distr., Unif., χ².

Table 6: Bias, variance and MSE of γ̂ and β̂ for the simex method based on the maximum likelihood (1) or the presmoothing (2) approach when the error distribution is misspecified. All numbers were multiplied by .

Finally we investigate the effect of error variance misspecification. We simulate the error from a normal distribution with standard deviation v = 0 . and v = 0 . but in the estimation process the variance is misspecified, v_E ∈ { v − . , v + 0 . }. Results reported in Table 7 show that the misspecification affects estimation of all the parameters but the difference is larger for those that correspond to the mismeasured covariates. As expected, increasing the specified variance v_E leads to an increased variance of the simex estimators. For small v, the lowest bias is obtained when v is correctly specified while for large v, the bias decreases as the specified variance increases. In terms of mean squared error, in situations where simex performs worse than the naive approach, underspecifying the variance works better. On the other hand, when simex outperforms the naive estimators, overspecifying the error variance is preferred over underspecification.
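The roles of the extrapolant order and of the specified error standard deviation can be made concrete with a toy version of the simex algorithm. The sketch below applies it to a simple linear regression slope rather than to the mixture cure model, so it only illustrates the mechanics: add extra error of variance λ times the specified error variance, average the naive estimator over B pseudo datasets for each λ, fit a polynomial in λ and evaluate it at λ = −1. The function names `slope` and `simex`, the λ grid, B and the data-generating values are illustrative assumptions.

```python
import numpy as np

def slope(w, y):
    """Least squares slope of y on w; stands in for the naive estimator."""
    return np.polyfit(w, y, 1)[0]

def simex(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, degree=2, rng=None):
    """Toy simex for a regression slope (hypothetical helper).

    sigma_u is the error standard deviation specified by the analyst;
    degree=1, 2, 3 gives a linear, quadratic or cubic extrapolant.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam_grid = np.concatenate(([0.0], lambdas))
    est = [slope(w, y)]                        # lambda = 0 is the naive estimate
    for lam in lambdas:
        sims = [slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
                for _ in range(B)]
        est.append(np.mean(sims))              # average over the B pseudo datasets
    coef = np.polyfit(lam_grid, est, degree)   # least squares fit of the extrapolant
    return np.polyval(coef, -1.0), est[0]      # extrapolate to lambda = -1

# Illustrative data: true slope 2, covariate observed with error of sd 0.5.
rng = np.random.default_rng(1)
n, sigma_u = 5000, 0.5
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(n)
w = x + sigma_u * rng.standard_normal(n)
corrected, naive = simex(w, y, sigma_u, rng=rng)
print(f"naive slope: {naive:.3f}, simex slope: {corrected:.3f}")
```

In this toy setting the naive slope is attenuated towards zero and the quadratic extrapolation recovers most, but not all, of the bias, because the quadratic is only an approximation of the true extrapolation function. Passing a value of `sigma_u` different from the one used to generate `w` mimics the variance misspecification studied above, and changing `degree` mimics the choice of extrapolant.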

Columns of Table 7: v_E, Par., followed by Bias, Var. and MSE for simex - 1 and simex - 2 under each of the two values of v; row blocks: v − . and v + 0 . .

Table 7: Bias, variance and MSE of γ̂ and β̂ for the simex method based on the maximum likelihood (1) or the presmoothing (2) approach when the error variance is misspecified. All numbers were multiplied by .

6. APPLICATION: PROSTATE CANCER STUDY

In this section we illustrate the practical use of the proposed simex procedure for a medical dataset concerning patients with prostate cancer. According to the American Cancer Society, prostate cancer is the second most common cancer among American men (after skin cancer) and it is estimated that about 1 man in 9 is diagnosed with prostate cancer during his lifetime. Even though most men diagnosed with prostate cancer do not die from it, it can sometimes be a serious disease. The 5-year survival rate based on the stage of the cancer at diagnosis is almost for localized or regional stage and drops to for distant stage. Among other factors, the prostate-specific antigen (PSA) blood level is a good indicator of the presence of the cancer and is used as a tool to both diagnose and monitor the development of the disease. In most cases, elevated PSA levels indicate a poor prostate cancer prognosis. Even though most studies do not take it into consideration, the PSA measurements are not error-free because of the inaccuracy of the measuring technique and the natural fluctuations of the PSA levels. Here we try to analyse the effect of PSA on cure probability and survival while accounting for measurement error.

We obtain the data from the Surveillance, Epidemiology and End Results (SEER) database, which is a collection of cancer incidence data from population-based cancer registries in the US. We select the database 'Incidence - SEER 18 Regs Research Data' and extract the prostate cancer data for the county of San Bernardino in California during the period − . We restrict to only white patients, aged − years old, with stage at diagnosis: localized, regional or distant and follow-up time greater than zero. Since a PSA level smaller than ng/ml of blood is considered as normal and a PSA value between and ng/ml is considered as a borderline range, we focus only on patients with PSA level greater than ng/ml.
The event time is death because of prostate cancer. This cohort consists of observations out of which do not experience cancer related death (i.e. around are censored). The follow-up time ranges from to months. For most of the patients the cancer has been diagnosed at early stage (localized), while for of them the stage at diagnosis is 'regional' and only for it is 'distant'. The PSA level varies from . to ng/ml, with median value . ng/ml, mean value . ng/ml and standard deviation ng/ml. We use a logistic/Cox mixture cure model to analyse this dataset and the covariates of interest are the PSA level (continuous variable centered at the mean and measured with error) and stage at diagnosis. The latter one is classified using two dummy Bernoulli variables S and S, indicating distant and regional stage respectively. The use of the cure models is justified by the presence of a long plateau containing around of the observations visible in the Kaplan-Meier curve (Kaplan & Meier (1958)) in Figure 1. Moreover, the Kaplan-Meier curves depending on stage at diagnosis in Figure 1 confirm that being in the distant stage significantly affects the probability of being cured. We first estimate the model ignoring the measurement error ('naive') and then we apply the simex procedure with quadratic extrapolation function for two levels of measurement error, namely with standard deviation v = 4 . and v = 8, corresponding to a ratio between the standard deviation of the error and the standard deviation of the covariate equal to . and . (we considered slightly smaller error than in the simulation setting in order to be closer to real life scenarios). In all three cases we use both the maximum likelihood estimation method and the presmoothing based method. The standard deviations of the estimates are computed through bootstrap samples. We consider such a large number of bootstrap samples because we noted that the estimated standard deviation for γ (distant stage) is not very stable due to the small sample size of that category. The results are reported in Table 8.

Fig. 1: Left panel: Kaplan-Meier survival curve for the prostate cancer data. Right panel: Kaplan-Meier survival curves based on cancer stage at diagnosis, localized (solid), regional (dotted) and distant (dashed). Both panels plot survival probability against time (months).

Columns of Table 8: incidence (Intercept, PSA, S, S) and latency (PSA, S, S); rows: estimates, est. SD and p-value for naive - 1, naive - 2, and for simex - 1 and simex - 2 with v = 4 . and v = 8.

Table 8: Coefficient estimates, estimated standard deviations and p-values for the prostate cancer data using the naive and the simex method based on the maximum likelihood (1) and the presmoothing (2) approach.

First of all we observe that, independently of the estimation method that we use, the PSA level and being in the distant stage are significant for the cure probability, while only the latter one is significant for survival of uncured patients (at level ). The positive sign of the coefficients confirms that high PSA level and distant stage are related to low cure probability and poor survival. Note that the estimated coefficient for the PSA value seems very small but it corresponds to a coefficient around . for the standardized variable. Given that the sample size is also large, we expect that, if the measurement error is relatively large, the use of the simex procedure would give more accurate results. Moreover, since there is some correlation between the PSA level and the stage of cancer, the measurement error might induce bias also in the other coefficients. For the maximum likelihood estimator, the estimated effect of the PSA level on the cure probability is slightly stronger when taking into account the measurement error, while the effect of the distant stage is slightly weakened. The opposite happens with the estimation based on presmoothing.

To understand what these differences in the estimates mean in practical terms we compute the cure probability for patients with distant or localized stage and three different PSA levels: 10 ng/ml, 22 ng/ml (mean value) and 34 ng/ml (see Table 9). Contrary to our expectation, we see that, in this example, there is not much difference between the naive and the simex approach. We observed in the simulation study that, when the bias induced by the measurement error is large, it is significantly reduced by the simex procedure and otherwise simex has little effect (see for example estimation of γ in Model 1 and Model 2, Scenario 1 with v = 0 . or estimation of β in Model 5, Scenario 2 with v = 0 . ). Hence, we can conclude that in this example, the bias induced by the mismeasured PSA value is small. This is probably due to the fact that the effect of the PSA value on survival is weak (the absolute value of its coefficient is small compared to the intercept and the coefficient of S). The very high cure and censoring rate might also play a role. On the other hand, correlation between PSA and the stage of cancer would lead to induced

'Localized' 'Distant'
PSA (ng/ml)

10 22 34 10 22 34 naive - 1 .

0% 90 .

3% 86 .

7% 33 .

1% 25 .

6% 19 . simex - 1 ( v = 4 . ) .

1% 90 .

3% 86 .

6% 33 .

3% 25 .

7% 19 . simex - 1 ( v = 8 ) .

2% 90 .

3% 86 .

4% 33 .

8% 25 .

9% 19 . naive - 2 .

4% 90 .

9% 87 .

6% 35 .

9% 28 .

2% 21 . simex - 2 ( v = 4 . ) .

2% 90 .

7% 87 .

4% 34 .

6% 27 .

4% 21 . simex - 2 ( v = 8 ) .

2% 90 .

7% 87 .

3% 34 .

9% 27 .

6% 21 .

Table 9: Estimated cure probability for given PSA level and stage. The naive and simex estimators are computed using the maximum likelihood (1) or the presmoothing (2) approach.

bias even for the coefficients corresponding to S and S. From the simulation study (see Model 5) we expect this bias to be of the same order as for the mismeasured covariate. Thus, since here the bias for the coefficient of the PSA value is small, even for the coefficients of S and S we do not observe much difference between the naive and simex method. Finally, we find that the estimated cure probabilities are larger when using the estimators based on presmoothing. Based again on the simulation study (cases with small bias), it is more likely that presmoothing behaves better than the maximum likelihood approach.

ACKNOWLEDGEMENTS

The authors acknowledge financial support from the European Research Council (2016–2021, Horizon 2020, grant agreement 694409). For the simulations we used the infrastructure of the Flemish Supercomputer Center (VSC).

SUPPLEMENTARY MATERIAL

Supporting information may be found in the online appendix. This document contains the proofs of the theorems in Section 4 and additional simulation results.

REFERENCES

Amico, M. & Van Keilegom, I. (2018). Cure models in survival analysis. Annual Review of Statistics and Its Application, 311–342.
Berkson, J. & Gage, R. P. (1952). Survival curve for cancer patients following treatment. Journal of the American Statistical Association, 501–515.
Bertrand, A., Legrand, C., Carroll, R. J., De Meester, C. & Van Keilegom, I. (2017a). Inference in a survival cure model with mismeasured covariates using a simulation-extrapolation approach. Biometrika, 31–50.
Bertrand, A., Legrand, C., Léonard, D. & Van Keilegom, I. (2017b). Robustness of estimation methods in a survival cure model with mismeasured covariates. Computational Statistics & Data Analysis, 3–18.
Bertrand, A., Van Keilegom, I. & Legrand, C. (2019). Flexible parametric approach to classical measurement error variance estimation without auxiliary data. Biometrics, 297–307.
Boag, J. W. (1949). Maximum likelihood estimates of the proportion of patients cured by cancer therapy. Journal of the Royal Statistical Society. Series B (Methodological), 15–53.
Cai, C., Zou, Y., Peng, Y. & Zhang, J. (2012). smcure: An R-package for estimating semiparametric mixture cure models. Computer Methods and Programs in Biomedicine, 1255–1260.
Carroll, R. J., Küchenhoff, H., Lombard, F. & Stefanski, L. A. (1996). Asymptotics for the simex estimator in nonlinear measurement error models. Journal of the American Statistical Association, 242–250.
Carroll, R. J., Ruppert, D., Stefanski, L. A. & Crainiceanu, C. M. (2006). Measurement error in nonlinear models: a modern perspective. CRC Press.
Chen, L.-P. (2019). Semiparametric estimation for cure survival model with left-truncated and right-censored data and covariate measurement error. Statistics & Probability Letters.
Cook, J. R. & Stefanski, L. A. (1994). Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association, 1314–1328.
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 187–220.
Greene, W. F. & Cai, J. (2004). Measurement error in covariates in the marginal hazards model for multivariate failure time data. Biometrics, 987–996.
Kaplan, E. L. & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 457–481.
Li, Y. & Lin, X. (2003). Functional inference in frailty measurement error models for clustered survival data using the simex approach. Journal of the American Statistical Association, 191–203.
Lu, W. (2008). Maximum likelihood estimation in the proportional hazards cure model. Annals of the Institute of Statistical Mathematics, 545–574.
Ma, Y. & Yin, G. (2008). Cure rate model with mismeasured covariates under transformation. Journal of the American Statistical Association, 743–756.
Mizoi, M. F., Bolfarine, H. & Pedroso-De-Lima, A. C. (2007). Cure rate model with measurement error. Communications in Statistics - Simulation and Computation, 185–196.
Musta, E., Patilea, V. & Van Keilegom, I. (2020). A presmoothing approach for estimation in mixture cure models. arXiv:2008.05338.
Robertson, T., Wright, F. T. & Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons, Chichester.
Sy, J. P. & Taylor, J. M. (2000). Estimation in a Cox proportional hazards cure model. Biometrics, 227–236.
van der Vaart, A. W. & Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics.

A simulation-extrapolation approach for the mixture cure model with mismeasured covariates

Supplementary Material

Eni Musta and Ingrid Van Keilegom

ORSTAT, KU Leuven

This supplement is organized as follows. Appendix A contains proofs of Theorems 1-4. Appendix B collects additional simulation results that were omitted from the main paper due to page limits.

A. PROOFS

Proof of Theorem 1. For fixed $\lambda$ and $b$, from condition (A1) we have
$$\|\hat\gamma_{\lambda,b} - \gamma_\lambda\| \to 0, \qquad \|\hat\beta_{\lambda,b} - \beta_\lambda\| \to 0 \qquad\text{and}\qquad \sup_{t\in[0,\tau]} |\hat\Lambda_{\lambda,b}(t) - \Lambda_\lambda(t)| \to 0$$
with probability one. By definition of $\hat\gamma_\lambda$, $\hat\beta_\lambda$ and $\hat\Lambda_\lambda$ as averages over the corresponding values for $b = 1, \dots, B$ (see (5)) and Slutsky's theorem, it follows that for each $\lambda \in \{\lambda_1, \dots, \lambda_K\}$
$$\|\hat\gamma_{\lambda} - \gamma_\lambda\| \to 0, \qquad \|\hat\beta_{\lambda} - \beta_\lambda\| \to 0 \qquad\text{and}\qquad \sup_{t\in[0,\tau]} |\hat\Lambda_{\lambda}(t) - \Lambda_\lambda(t)| \to 0$$
almost surely.

We first focus on consistency of $\hat\gamma_{simex}$. Since we are assuming that $g_\gamma(a^*_\gamma, \lambda) = (g_{\gamma,1}(a^*_{\gamma_1}, \lambda), \dots, g_{\gamma,p}(a^*_{\gamma_p}, \lambda))^T$ is the true extrapolation function, we have $\gamma_\lambda = g_\gamma(a^*_\gamma, \lambda)$ and $\gamma = g_\gamma(a^*_\gamma, -1)$. On the other hand, $\hat\gamma_{simex} = g_\gamma(\hat a_\gamma, -1)$, where $\hat a_\gamma$ is the least squares estimator of $a^*_\gamma$, i.e. it solves
$$\Psi_n(a_\gamma) = \dot g_\gamma(a_\gamma, \boldsymbol\lambda)^T \{ g_\gamma(a_\gamma, \boldsymbol\lambda) - \hat\gamma_{\boldsymbol\lambda} \} = 0,$$
where $\hat\gamma_{\boldsymbol\lambda} = (\hat\gamma_{\lambda_1}^T, \dots, \hat\gamma_{\lambda_K}^T)^T$, $g_\gamma(a_\gamma, \boldsymbol\lambda) = (g_\gamma(a_\gamma, \lambda_1)^T, \dots, g_\gamma(a_\gamma, \lambda_K)^T)^T$ and $\dot g_\gamma(a_\gamma, \boldsymbol\lambda)$ is the $pK \times p \dim(a_\gamma)$ matrix of partial derivatives of the elements of $g_\gamma(a_\gamma, \boldsymbol\lambda)$ with respect to the elements of $a_\gamma$. Moreover, the true parameters $a^*_\gamma$ are the solution of
$$\Psi(a_\gamma) = \dot g_\gamma(a_\gamma, \boldsymbol\lambda)^T \{ g_\gamma(a_\gamma, \boldsymbol\lambda) - \gamma_{\boldsymbol\lambda} \} = 0$$
and
$$\sup_{a_\gamma} \|\Psi_n(a_\gamma) - \Psi(a_\gamma)\| \le \sup_{a_\gamma} \|\dot g_\gamma(a_\gamma, \boldsymbol\lambda)\| \, \|\hat\gamma_{\boldsymbol\lambda} - \gamma_{\boldsymbol\lambda}\| \to 0 \quad\text{a.s.}$$
Hence, if $a^*_\gamma$ is the unique solution of $\Psi(a_\gamma) = 0$, it follows that $\hat a_\gamma \to a^*_\gamma$ with probability one. From the continuous mapping theorem it follows that
$$\|\hat\gamma_{simex} - \gamma\| \to 0 \quad\text{a.s.}$$
Consistency of $\hat\beta_{simex}$ can be proven in the same way. For the function $\hat\Lambda_{simex}$ we suppose that for every $t \in [0, \tau]$, $\Lambda_\lambda(t)$ can be specified by a function $g_{\Lambda,t}(a_t, \lambda)$ depending on a parametric vector $a_t$ and $\Lambda(t) = g_{\Lambda,t}(a_t, -1)$. Hence, arguing as above, for any fixed $t \in [0, \tau]$ we can show that
$$|\hat\Lambda_{simex}(t) - \Lambda(t)| \to 0 \quad\text{a.s.}$$
Uniform consistency on the compact $[0, \tau]$ follows from the fact that $\Lambda$ is continuous and $\hat\Lambda_{simex}$ is non-decreasing. □

Proof of Theorem 2. For fixed $\lambda$ and $b$, from condition (C2) we have
\[
h_2^T(\hat\gamma_{\lambda,b}-\gamma_\lambda)+h_3^T(\hat\beta_{\lambda,b}-\beta_\lambda)+\int_0^\tau h_1(s)\,\mathrm{d}(\hat\Lambda_{\lambda,b}-\Lambda_\lambda)(s)
=\frac1n\sum_{i=1}^n\Psi(Y_i,\Delta_i,W_{i,\lambda,b},h_1,h_2,h_3)+o_P(n^{-1/2})
\]
uniformly over $(h_1,h_2,h_3)\in\mathcal H_m$. As a result,
\[
\begin{aligned}
&h_2^T(\hat\gamma_{\lambda}-\gamma_\lambda)+h_3^T(\hat\beta_{\lambda}-\beta_\lambda)+\int_0^\tau h_1(s)\,\mathrm{d}(\hat\Lambda_{\lambda}-\Lambda_\lambda)(s)\\
&\quad=h_2^T\Big(\frac1B\sum_{b=1}^B\hat\gamma_{\lambda,b}-\gamma_\lambda\Big)+h_3^T\Big(\frac1B\sum_{b=1}^B\hat\beta_{\lambda,b}-\beta_\lambda\Big)+\int_0^\tau h_1(s)\,\mathrm{d}\Big(\frac1B\sum_{b=1}^B\hat\Lambda_{\lambda,b}-\Lambda_\lambda\Big)(s)\\
&\quad=\frac1n\sum_{i=1}^n\Big\{\frac1B\sum_{b=1}^B\Psi(Y_i,\Delta_i,W_{i,\lambda,b},h_1,h_2,h_3)\Big\}+o_P(n^{-1/2}).
\end{aligned}
\]
Since a sum of Donsker classes is Donsker (see Lemma 2.10.6 in van der Vaart & Wellner (1996)), it follows that the process
\[
n^{1/2}\Big\{h_2^T(\hat\gamma_{\lambda}-\gamma_\lambda)+h_3^T(\hat\beta_{\lambda}-\beta_\lambda)+\int_0^\tau h_1(s)\,\mathrm{d}(\hat\Lambda_{\lambda}-\Lambda_\lambda)(s)\Big\}
\]
converges weakly to a zero-mean Gaussian process $G_\lambda$ indexed by $h=(h_1,h_2,h_3)\in\mathcal H_m$, with covariance function
\[
\mathrm{Cov}\big(G_\lambda(h_1,h_2,h_3),G_\lambda(h_1^*,h_2^*,h_3^*)\big)
=E\Big[\Big\{\frac1B\sum_{b=1}^B\Psi(Y,\Delta,W_{\lambda,b},h_1,h_2,h_3)\Big\}\Big\{\frac1B\sum_{b=1}^B\Psi(Y,\Delta,W_{\lambda,b},h_1^*,h_2^*,h_3^*)\Big\}\Big].
\]
Moreover, the $K$-dimensional vector
\[
n^{1/2}\begin{pmatrix}
h_2^T(\hat\gamma_{\lambda_1}-\gamma_{\lambda_1})+h_3^T(\hat\beta_{\lambda_1}-\beta_{\lambda_1})+\int_0^\tau h_1(s)\,\mathrm{d}(\hat\Lambda_{\lambda_1}-\Lambda_{\lambda_1})(s)\\
\vdots\\
h_2^T(\hat\gamma_{\lambda_K}-\gamma_{\lambda_K})+h_3^T(\hat\beta_{\lambda_K}-\beta_{\lambda_K})+\int_0^\tau h_1(s)\,\mathrm{d}(\hat\Lambda_{\lambda_K}-\Lambda_{\lambda_K})(s)
\end{pmatrix}
\]
converges to a $K$-dimensional Gaussian process $G_{\boldsymbol\lambda}$ with mean zero and covariance function between the $i$th and the $j$th component
\[
E\Big[\Big\{\frac1B\sum_{b=1}^B\Psi(Y,\Delta,W_{\lambda_i,b},h_1,h_2,h_3)\Big\}\Big\{\frac1B\sum_{b=1}^B\Psi(Y,\Delta,W_{\lambda_j,b},h_1^*,h_2^*,h_3^*)\Big\}\Big].
\]
In particular, if we take $h_1\equiv 0$, $h_3=0$ and $h_2=(0,\dots,0,1,0,\dots,0)$, with $h_2$ containing 1 at the $j$th position ($j=1,\dots,p$) and 0 elsewhere, we obtain the weak convergence of $n^{1/2}(\hat\gamma_{\boldsymbol\lambda}-\gamma_{\boldsymbol\lambda})$ to a multivariate normal random variable with mean zero and covariance matrix $\Sigma_{\gamma,\boldsymbol\lambda}$. With the same reasoning we also obtain
\[
n^{1/2}(\hat\beta_{\boldsymbol\lambda}-\beta_{\boldsymbol\lambda})\xrightarrow{d}N(0,\Sigma_{\beta,\boldsymbol\lambda}).
\]
For $\hat\Lambda_{\boldsymbol\lambda}$ we consider the class
\[
\big\{(h_1,h_2,h_3)\in\mathcal H_m:\ h_2=h_3=0\ \text{and}\ h_1(s)=\mathbb 1\{s\le t\},\ t\in[0,\tau]\big\}
\]
and obtain the weak convergence of $n^{1/2}\{\hat\Lambda_{\boldsymbol\lambda}(t)-\Lambda_{\boldsymbol\lambda}(t)\}$ to a Gaussian process $G_{\boldsymbol\lambda}$ indexed by $t\in[0,\tau]$.

Next we prove the asymptotic normality of $\hat\gamma_{\mathrm{simex}}$. Since we are assuming that $g_\gamma(a^*_\gamma,\lambda)=\big(g_{\gamma,1}(a^*_{\gamma_1},\lambda),\dots,g_{\gamma,p}(a^*_{\gamma_p},\lambda)\big)^T$ is the true extrapolation function, we have $\gamma_\lambda=g_\gamma(a^*_\gamma,\lambda)$ and $\gamma_0=g_\gamma(a^*_\gamma,-1)$. On the other hand, $\hat\gamma_{\mathrm{simex}}=g_\gamma(\hat a_\gamma,-1)$, where $\hat a_\gamma$ is the least squares estimator of $a^*_\gamma$, i.e. it solves
\[
\Psi_n(a_\gamma)=\dot g_\gamma(a_\gamma,\boldsymbol\lambda)^T\{g_\gamma(a_\gamma,\boldsymbol\lambda)-\hat\gamma_{\boldsymbol\lambda}\}=0,\tag{S1}
\]
where $\hat\gamma_{\boldsymbol\lambda}=(\hat\gamma_{\lambda_1}^T,\dots,\hat\gamma_{\lambda_K}^T)^T$, $g_\gamma(a_\gamma,\boldsymbol\lambda)=\big(g_\gamma(a_\gamma,\lambda_1)^T,\dots,g_\gamma(a_\gamma,\lambda_K)^T\big)^T$ and $\dot g_\gamma(a_\gamma,\boldsymbol\lambda)$ is the $pK\times p\dim(a_\gamma)$ matrix of partial derivatives of the elements of $g_\gamma(a_\gamma,\boldsymbol\lambda)$ with respect to the elements of $a_\gamma$. Since $\hat a_\gamma$ solves equation (S1) and $\hat a_\gamma\to a^*_\gamma$ with probability one (see proof of Theorem 1), if $\dot g_\gamma(a_\gamma,\boldsymbol\lambda)$ is bounded and continuous w.r.t. $a_\gamma$ and $\dot g_\gamma(a_\gamma,\boldsymbol\lambda)^T\dot g_\gamma(a_\gamma,\boldsymbol\lambda)$ is invertible, we have
\[
n^{1/2}(\hat a_\gamma-a^*_\gamma)=\big\{\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)\big\}^{-1}\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\,n^{1/2}(\hat\gamma_{\boldsymbol\lambda}-\gamma_{\boldsymbol\lambda})+o_P(1).
\]
As a result,
\[
n^{1/2}(\hat a_\gamma-a^*_\gamma)\xrightarrow{d}\big\{\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)\big\}^{-1}\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\,N(0,\Sigma_{\gamma,\boldsymbol\lambda}).
\]
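As a numerical aside, the linear map in the last display turns a covariance matrix for the level-$\lambda$ estimates into one for $\hat a_\gamma$. The sketch below is an illustration only, under assumptions not taken from the paper: a scalar parameter, a quadratic extrapolant (so that $\dot g$ is a fixed design matrix), a hypothetical $\lambda$ grid and an identity matrix standing in for $\Sigma_{\gamma,\boldsymbol\lambda}$. The last step pre- and post-multiplies by the gradient of the extrapolant at $\lambda=-1$, which is the form that appears in (S2).

```python
import numpy as np


def sandwich_covariance(G, Sigma_lam, d):
    """Covariances implied by the least squares linearisation.

    G         : (K, m) Jacobian of the extrapolant on the lambda grid
    Sigma_lam : (K, K) covariance of the level-lambda estimates
    d         : (1, m) gradient of the extrapolant in a, evaluated at lambda = -1
    """
    A = np.linalg.solve(G.T @ G, G.T)  # (G^T G)^{-1} G^T
    cov_a = A @ Sigma_lam @ A.T        # covariance of the fitted coefficients
    cov_simex = d @ cov_a @ d.T        # covariance after extrapolating to -1
    return cov_a, cov_simex


# Hypothetical inputs for a quadratic extrapolant a0 + a1*lam + a2*lam**2.
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
G = np.column_stack([np.ones_like(lams), lams, lams**2])
d = np.array([[1.0, -1.0, 1.0]])  # (1, lam, lam**2) at lam = -1
Sigma_lam = np.eye(len(lams))     # placeholder, not an estimate from data
cov_a, cov_simex = sandwich_covariance(G, Sigma_lam, d)
print(cov_a)
print(cov_simex)
```

With the identity placeholder, `cov_a` reduces to $(G^TG)^{-1}$, which gives a quick sanity check of the algebra. In practice $\Sigma_{\gamma,\boldsymbol\lambda}$ would have to be estimated, which this sketch does not attempt.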
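Finally, using the delta method, we obtain
\[
n^{1/2}(\hat\gamma_{\mathrm{simex}}-\gamma_0)\xrightarrow{d}\dot g_\gamma(a^*_\gamma,-1)\big\{\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)\big\}^{-1}\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\,N(0,\Sigma_{\gamma,\boldsymbol\lambda}),
\]
meaning that $n^{1/2}(\hat\gamma_{\mathrm{simex}}-\gamma_0)$ converges weakly to a multivariate normal random variable with mean zero and covariance matrix
\[
\Sigma_\gamma=\dot g_\gamma(a^*_\gamma,-1)\big\{\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)\big\}^{-1}\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T
\times\Sigma_{\gamma,\boldsymbol\lambda}\,\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)\big\{\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)^T\dot g_\gamma(a^*_\gamma,\boldsymbol\lambda)\big\}^{-1}\dot g_\gamma(a^*_\gamma,-1)^T.\tag{S2}
\]
In the same way it can be shown that $n^{1/2}(\hat\beta_{\mathrm{simex}}-\beta_0)$ converges weakly to a multivariate normal random variable with mean zero and covariance matrix
\[
\Sigma_\beta=\dot g_\beta(a^*_\beta,-1)\big\{\dot g_\beta(a^*_\beta,\boldsymbol\lambda)^T\dot g_\beta(a^*_\beta,\boldsymbol\lambda)\big\}^{-1}\dot g_\beta(a^*_\beta,\boldsymbol\lambda)^T
\times\Sigma_{\beta,\boldsymbol\lambda}\,\dot g_\beta(a^*_\beta,\boldsymbol\lambda)\big\{\dot g_\beta(a^*_\beta,\boldsymbol\lambda)^T\dot g_\beta(a^*_\beta,\boldsymbol\lambda)\big\}^{-1}\dot g_\beta(a^*_\beta,-1)^T.\tag{S3}
\]
Similarly, for the nonparametric component we have
\[
n^{1/2}(\hat a_t-a^*_t)=\big\{\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)^T\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)\big\}^{-1}\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)^T\,n^{1/2}\big(\hat\Lambda_{\boldsymbol\lambda}(t)-\Lambda_{\boldsymbol\lambda}(t)\big)+o_P(1)
\]
for all $t\in[0,\tau]$. From the weak convergence of the process $n^{1/2}(\hat\Lambda_{\boldsymbol\lambda}-\Lambda_{\boldsymbol\lambda})$, it follows that $n^{1/2}(\hat a_t-a^*_t)$ converges in distribution to the Gaussian process
\[
\big\{\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)^T\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)\big\}^{-1}\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)^T\,G_{\boldsymbol\lambda}.
\]
Once more, the delta method yields that $n^{1/2}(\hat\Lambda_{\mathrm{simex}}-\Lambda_0)$ converges weakly to the Gaussian process
\[
G=\dot g_{\Lambda,t}(a^*_t,-1)\big\{\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)^T\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)\big\}^{-1}\dot g_{\Lambda,t}(a^*_t,\boldsymbol\lambda)^T\,G_{\boldsymbol\lambda}.\tag{S4}
\]
□

Proof of Theorem 3. Let $\Upsilon_0=(\gamma_0,\beta_0,\Lambda_0)$, $\theta_0=(\gamma_0,\beta_0)$, $\hat\theta_n=(\hat\gamma_n,\hat\beta_n)$ and $\mathcal H_m$ as in (A2).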
Define the continuous linear operator $\sigma=(\sigma_1,\sigma_2)$ from $\mathcal H_m$ to $\mathcal H_m$ of the form
\[
\sigma_1(h)(t)=E\Big[\mathbb 1\{Y\ge t\}V(t,\Upsilon_0)(h)\,g(t,\Upsilon_0)\,e^{\beta_0^TZ}\Big]
-E\Big[\int_t^\tau\mathbb 1\{Y\ge s\}V(t,\Upsilon_0)(h)\,g(s,\Upsilon_0)\{1-g(s,\Upsilon_0)\}\,e^{\beta_0^TZ}\,\mathrm{d}\Lambda_0(s)\Big]\tag{S5}
\]
and
\[
\sigma_2(h)(t)=E\Big[\int_0^\tau\mathbb 1\{Y\ge t\}W(t,\Upsilon_0)V(t,\Upsilon_0)(h)\,g(t,\Upsilon_0)\,e^{\beta_0^TZ}\,\mathrm{d}\Lambda_0(t)\Big],\tag{S6}
\]
where
\[
g(t,\Lambda,\beta,\gamma)=\frac{\phi(\gamma,X)\exp\{-\Lambda(t)\exp(\beta^TZ)\}}{1-\phi(\gamma,X)+\phi(\gamma,X)\exp\{-\Lambda(t)\exp(\beta^TZ)\}},\tag{S7}
\]
\[
V(t,\Upsilon_0)(h)=h_1(t)-\{1-g(t,\Upsilon_0)\}\,e^{\beta_0^TZ}\int_0^th_1(s)\,\mathrm{d}\Lambda_0(s)+(h_2^T,h_3^T)\,W(t,\Upsilon_0)
\]
and
\[
W(t,\Upsilon_0)=\Big(\{1-g(t,\Upsilon_0)\}X^T,\ \big[1-\{1-g(t,\Upsilon_0)\}\,e^{\beta_0^TZ}\Lambda_0(t)\big]Z^T\Big)^T.
\]
Note that in our case $X=(W^{(1)}_\lambda,W^{(2)}_\lambda)$ and $Z=(W^{(1)}_\lambda,W^{(2)}_\lambda)$. In the proof of Theorem 2 in Lu (2008) (page 572) it is shown that
\[
\int_0^\tau\sigma_1(h)(t)\,\mathrm{d}\sqrt n(\Lambda_n-\Lambda_0)(t)+\sqrt n(\hat\theta_n-\theta_0)^T\sigma_2(h)=\sqrt n\{S_n(\Upsilon_0)-S(\Upsilon_0)\}(h)+o_P(1),\tag{S8}
\]
where
\[
\sqrt n\{S_n(\Upsilon_0)-S(\Upsilon_0)\}(h_1,h_2,h_3)=\int f_h(y,\delta,x,z)\,\mathrm{d}\sqrt n(P_n-P)(y,\delta,x,z)
\]
and $\{f_h(y,\delta,x,z),\ h\in\mathcal H_m\}$ is a uniformly bounded Donsker class such that $E[f_h(Y,\Delta,X,Z)]=S(\Upsilon_0)=0$. In Lu (2008) it is also shown that $\sigma$ is invertible with inverse $\sigma^{-1}=(\sigma_1^{-1},\sigma_2^{-1})$. Hence, for all $h\in\mathcal H_m$, if in (S8) we replace $h$ by $\sigma^{-1}(h)$, we obtain
\[
\int_0^\tau h_1(t)\,\mathrm{d}\big(\Lambda_n(t)-\Lambda_0(t)\big)+h_2^T(\hat\gamma_n-\gamma_0)+h_3^T(\hat\beta_n-\beta_0)
=\int f_{\sigma^{-1}(h)}(y,\delta,x,z)\,\mathrm{d}(P_n-P)(y,\delta,x,z)+o_P(n^{-1/2})
\]
and (A2) holds with
\[
\Psi_\lambda(y,\delta,w,h_1,h_2,h_3)=f_{\sigma^{-1}(h)}\big(y,\delta,(w^{(1)},w^{(2)}),(w^{(2)},w^{(3)})\big).\tag{S9}
\]
□

Proof of Theorem 4. In Musta et al. (2020) it is shown that
\[
\hat\gamma_n-\gamma_0=-(\Gamma^T\Gamma)^{-1}\Gamma^T\int\psi(y,\delta,x)\,\mathrm{d}(P_n-P)(y,\delta,x,z)+o_P(n^{-1/2})\tag{S10}
\]
(see their equation (A.33)), where
\[
\Gamma=-E\Big[\Big\{\frac{1}{\phi(\gamma_0,X)}+\frac{1}{1-\phi(\gamma_0,X)}\Big\}\nabla_\gamma\phi(\gamma_0,X)\,\nabla_\gamma\phi(\gamma_0,X)^T\Big],
\]
\[
\psi(y,\delta,x)=-\Big\{\frac{\Delta\,\mathbb 1\{y\le\tau\}}{1-H(y\mid x)}-\int_0^{y\wedge\tau}\frac{H_1(\mathrm{d}s\mid x)}{(1-H(s\mid x))^2}\Big\}\,\phi(\gamma_0,x)\,\nabla_\gamma\phi(\gamma_0,x),
\]
with $H_k(t\mid x)=P(Y\le t,\Delta=k\mid X=x)$ for $k=0,1$ and $H(t\mid x)=H_0(t\mid x)+H_1(t\mid x)$.
Moreover we have $E[\psi(Y,\Delta,X)]=0$. Let $\Upsilon_0=(\gamma_0,\beta_0,\Lambda_0)$ and
\[
\tilde{\mathcal H}_m=\big\{\tilde h=(h_1,h_2)\in BV[0,\tau]\times\mathbb R^q:\ \|h_1\|_v+\|h_2\|_{L_1}\le m\big\}.
\]
Define the continuous linear operator $\sigma=(\sigma_1,\sigma_2)$ from $\tilde{\mathcal H}_m$ to $\tilde{\mathcal H}_m$ as in (S5), (S6) with
\[
V(t,\Upsilon_0)(h)=h_1(t)-\{1-g(t,\Upsilon_0)\}\,e^{\beta_0^TZ}\int_0^th_1(s)\,\mathrm{d}\Lambda_0(s)+h_2^TW(t,\Upsilon_0)
\]
and
\[
W(t,\Upsilon_0)=\big[1-\{1-g(t,\Upsilon_0)\}\,e^{\beta_0^TZ}\Lambda_0(t)\big]Z.
\]
From equations (A37)-(A38) in Musta et al. (2020) we have
\[
\int_0^\tau\sigma_1(\tilde h)(t)\,\mathrm{d}\sqrt n(\Lambda_n-\Lambda_0)(t)+\sqrt n(\hat\beta_n-\beta_0)^T\sigma_2(\tilde h)=\sqrt n\big\{\hat S_n(\Upsilon_0)-S(\Upsilon_0)\big\}(\tilde h)+o_P(1),
\]
where
\[
\sqrt n\big\{\hat S_n(\Upsilon_0)-S(\Upsilon_0)\big\}(h_1,h_2)=\int\tilde f_{\tilde h}(y,\delta,x,z)\,\mathrm{d}\sqrt n(P_n-P)(y,\delta,x,z)
\]
for some uniformly bounded Donsker class $\{\tilde f_{\tilde h}(y,\delta,x,z),\ \tilde h\in\tilde{\mathcal H}_m\}$ with $E[\tilde f_{\tilde h}(Y,\Delta,X,Z)]=0$. Hence, if we replace $\tilde h$ by $\sigma^{-1}(\tilde h)$, we obtain
\[
\int_0^\tau h_1(t)\,\mathrm{d}\big(\Lambda_n(t)-\Lambda_0(t)\big)+h_2^T(\hat\beta_n-\beta_0)=\int\tilde f_{\sigma^{-1}(\tilde h)}(y,\delta,x,z)\,\mathrm{d}(P_n-P)(y,\delta,x,z)+o_P(n^{-1/2}).
\]
Note that in our case $x=(w^{(1)}_\lambda,w^{(2)}_\lambda)$ and $z=(w^{(1)}_\lambda,w^{(2)}_\lambda)$. Moreover, if $h=(h_1,h_2,h_3)\in\mathcal H_m$, then $\tilde h=(h_1,h_3)\in\tilde{\mathcal H}_m$. It follows that (A2) holds with
\[
\Psi_\lambda(y,\delta,w,h_1,h_2,h_3)=-h_2^T(\Gamma^T\Gamma)^{-1}\Gamma^T\psi\big(y,\delta,(w^{(1)},w^{(2)})\big)+\tilde f_{\sigma^{-1}((h_1,h_3))}\big(y,\delta,(w^{(1)},w^{(2)}),(w^{(2)},w^{(3)})\big).\tag{S11}
\]
□

B. ADDITIONAL SIMULATION RESULTS

In this section we report the simulation results for sample size $n=400$ in Model 1, and results for Models 2-5 ($n=200$) that were omitted from the main paper.

[Table S1: the numerical entries are not recoverable from the extracted text. Columns: setting/scenario/cens. level; Par.; then Bias, Var. and MSE for each of naive - 1, naive - 2, simex - 1 and simex - 2.]

Table S1: Bias, variance and MSE of $\hat\gamma$ and $\hat\beta$ for the naive and simex method using the maximum likelihood (1) and the presmoothing (2) approach for Model 1 ($n=400$). The first column gives the setting/scenario/cens. level. All numbers were multiplied by 100.

[Table S2: the numerical entries are not recoverable from the extracted text. Columns: Mod./Scen./$v$; Par.; then Bias, Var. and MSE for each of naive - 1, naive - 2, simex - 1 and simex - 2.]

Table S2: Bias, variance and MSE of $\hat\gamma$ and $\hat\beta$ for the naive and simex method based on the maximum likelihood (1) or the presmoothing (2) approach in Models 2 and 3 ($n=200$). The first column gives the model, scenario and the standard deviation of the measurement error. All numbers were multiplied by 100.

[Table S3: the numerical entries are not recoverable from the extracted text. Columns: Mod./Scen./$v$; Par.; then Bias, Var. and MSE for each of naive - 1, naive - 2, simex - 1 and simex - 2.]

Table S3: Bias, variance and MSE of $\hat\gamma$ and $\hat\beta$ for the naive and simex method based on the maximum likelihood (1) or the presmoothing (2) approach in Models 4 and 5 ($n=200$). The first column gives the model, scenario and the standard deviation of the measurement error. All numbers were multiplied by 100.
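For readers who want to reproduce summaries of the kind reported in Tables S1-S3, the following sketch shows one way to compute bias, variance and MSE from Monte Carlo replicates of an estimator, scaled by 100 as in the captions. It is a minimal illustration, not the authors' code: the function name `mc_summary` and the toy inputs are hypothetical, and the variance uses the divisor equal to the number of replicates (an assumption) so that MSE equals squared bias plus variance exactly.

```python
import numpy as np


def mc_summary(estimates, truth, scale=100.0):
    """Bias, variance and MSE of Monte Carlo estimates, multiplied by `scale`."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    var = est.var()                    # ddof=0, so mse == bias**2 + var
    mse = np.mean((est - truth) ** 2)
    return scale * bias, scale * var, scale * mse


# Hypothetical replicates of one coefficient whose true value is 1.0.
bias, var, mse = mc_summary([0.9, 1.1, 1.0, 1.2], truth=1.0)
print(bias, var, mse)
```

Comparing the naive and simex columns then amounts to calling the same function on the two sets of replicates. A bias-corrected estimator typically shows a smaller bias and a larger variance, and the MSE summarises the net effect.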