Robust and Efficient Parameter Estimation based on Censored Data with Stochastic Covariates
arXiv [math.ST], Jul
Abhik Ghosh and Ayanendranath Basu
Indian Statistical Institute
Abstract
Analysis of randomly censored life-time data along with some related stochastic covariables is of great importance in many applied sciences like medical research, population studies and planning. The parametric estimation technique commonly used under this set-up is based on the efficient but non-robust likelihood approach. In this paper, we propose a robust parametric estimator for censored data with stochastic covariates based on the minimum density power divergence approach. The resulting estimator also has competitive efficiency with respect to the maximum likelihood estimator under pure data. The strong robustness property of the proposed estimator with respect to the presence of outliers is examined and illustrated through an appropriate simulation study in the context of censored regression with stochastic covariates. Further, the theoretical asymptotic properties of the proposed estimator are also derived in terms of a general class of M-estimators based on the estimating equation.
Keywords:
Censored Data; Robust Methods; Linear Regression; Density Power Divergence; M-Estimator; Exponential Regression Model; Accelerated Failure Time Model.
1 Introduction

It is often necessary to analyze life-time data in many applied sciences, including medical sciences, population studies and planning. In such survival analysis problems, researchers often cannot observe the full data because some respondents may leave the study midway or may still be alive at the end of the study period. Statistical modelling of such data involves the idea of censored distributions and random censoring variables. Mathematically, let $Y_1, \ldots, Y_n$ be $n$ independent and identically distributed (i.i.d.) observations from the population with unknown life-time distribution $G_Y$. We assume that the observations are censored by a censoring distribution $G_C$ independent of $G_Y$, and $C_1, \ldots, C_n$ denote $n$ i.i.d. sample observations from $G_C$. We only observe the portion of the $Y_i$s (right) censored by the $C_i$s, i.e., we observe
\[ Z_i = \min(Y_i, C_i) \quad \text{and} \quad \delta_i = I(Y_i \le C_i), \qquad i = 1, \ldots, n, \]
where $I(A)$ denotes the indicator function of the event $A$. Based on these data $(Z_i, \delta_i)$, our aim is to do inference about the lifetime distribution $G_Y$. Suppose $Z_{(i,n)}$ denotes the $i$-th order statistic in $\{Z_1, \ldots, Z_n\}$ and $\delta_{[i,n]}$ is the value of the corresponding $\delta$ (the $i$-th concomitant). The famous product-limit (non-parametric) estimator of $G_Y$ under this set-up was derived by Kaplan and Meier (1958) and is given by
\[ \widehat{G}_Y(y) = 1 - \prod_{i=1}^{n} \left[ 1 - \frac{\delta_{[i,n]}}{n - i + 1} \right]^{I(Z_{(i,n)} \le y)}. \]
It can be seen that, under suitable assumptions, the above product-limit estimator is in fact the maximum likelihood estimator of the distribution function in the presence of censoring and enjoys several optimum properties. Many researchers have proved such properties and have also extended it to different complicated inference problems with censored data; for example, see Petersen (1977), Chen et al. (1982), Campbell and Földes (1982), Wang et al. (1986), Tsi et al. (1987), Dabrowska (1988), Lo et al.
(1989), Zhou (1991), Cai (1998), Satten and Datta (2001) among many others.

In the present paper we further assume the availability of a set of uncensored covariables $X \in \mathbb{R}^p$ that are associated with our target response $Y$; i.e., for each respondent $i$ we have observed the values $X_i$ along with $(Z_i, \delta_i)$. These covariables are generally the demographic conditions of the subject or some measurable indicator of the response variable (e.g., medical diagnostic measures like blood pressure or hemoglobin content for clinical trial responses). Let us assume that the distribution function of these i.i.d. covariates is $G_X$ and their joint distribution function with $Y$ is $G$, so that
\[ G_Y(y) = \int G(x, y)\,dx = \int G_{Y|X=x}(y)\, G_X(x)\,dx, \]
where $G_{Y|X=x}$ is the conditional distribution function of $Y$ given $X = x$. Instead of inferring about the response $Y$ alone, here we are more interested in the association between the response and the covariables through the conditional distribution $G_{Y|X}$; the distribution $G_X$ of the covariates is often of interest, but sometimes it may act as a nuisance component too. Let us denote the $i$-th concomitant of $X$ associated with $Z_{(i,n)}$ by $X_{[i,n]}$. Under this set-up, Stute (1993) has extended the Kaplan-Meier product limit (KMPL) estimator $\widehat{G}_Y(y)$ to obtain a non-parametric estimate of the joint multivariate distribution function $G$ given by
\[ \widehat{G}(x, y) = \sum_{i=1}^{n} W_{in}\, I(X_{[i,n]} \le x,\; Z_{(i,n)} \le y), \]
where the weights are calculated as
\[ W_{in} = \frac{\delta_{[i,n]}}{n - i + 1} \prod_{j=1}^{i-1} \left[ \frac{n - j}{n - j + 1} \right]^{\delta_{[j,n]}}. \]
Note that when there is no censoring at all, i.e., $\delta_i = 1$ for all $i$, then $W_{in} = 1/n$ for each $i$, so that $\widehat{G}_Y(y)$ and $\widehat{G}(x, y)$ coincide with the respective empirical distribution functions. Further, the above estimator is also self-adjusted in the presence of any ties in the data. These give the framework for non-parametric inference based on censored data with covariates.
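The weight formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stute_weights` and the tie-breaking rule (uncensored observations are placed before censored ones at equal $Z$) are assumptions made here, not details taken from the text.

```python
def stute_weights(z, delta):
    """Return (order, weights): the sort order of Z and the weights W_in.

    z     : observed times Z_i = min(Y_i, C_i)
    delta : censoring indicators, 1 = uncensored, 0 = censored
    Tie-breaking (uncensored first at equal Z) is an assumption of this sketch.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    weights = []
    # Running product prod_{j < i} ((n - j) / (n - j + 1)) ** delta_[j,n].
    running = 1.0
    for rank, idx in enumerate(order, start=1):
        d = delta[idx]
        weights.append(running * d / (n - rank + 1))
        if d == 1:
            running *= (n - rank) / (n - rank + 1)
    return order, weights


# Without censoring every weight reduces to 1/n, as noted in the text.
_, w_full = stute_weights([3.0, 1.0, 2.0], [1, 1, 1])
# With the middle observation censored, its mass moves to the largest time.
_, w_cens = stute_weights([1.0, 2.0, 3.0], [1, 0, 1])
```

In the second call the censored observation receives weight zero and the last uncensored observation carries the redistributed mass, so the weights still add up to one in this example.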
Stute (1993, 1996) proved several asymptotic properties, like strong consistency and the asymptotic distribution, of $\widehat{G}(x, y)$ and related statistical functionals. Several non-parametric and semi-parametric inference procedures using $\widehat{G}(x, y)$ are widely used in real-life applications.

However, for many applications in medical sciences, one may know the parametric form of the distribution of the survival time (censored responses), possibly through previous experience in a similar context (similar drugs or similar diseases may have been studied in the past). In such cases, the use of a fully parametric model is much more appropriate than the semi-parametric or non-parametric models. Many advantages of a fully parametric model for regression with censored responses have been illustrated in Chapter 8 of Hosmer et al. (2008), which include: (a) greater efficiency due to the use of the full likelihood, (b) more meaningful estimates of clinical effects with simple interpretations, and (c) prediction of the response variable from the fitted model. The most common parametric models used for the analysis of survival data are the exponential, Weibull and log-logistic distributions. Many researchers have used such parametric models to analyze survival data more efficiently; see, for example, Cox and Oakes (1984), Crowder et al. (1991), Collett (2003), Lawless (2003) and Klein and Moeschberger (2003), among others.

The robustness issue with survival data, on the other hand, has received prominent attention only very recently. The size and availability of survival data have clearly been growing in recent times in biomedical and other industrial studies; such data may often contain a few erroneous observations or outliers, and it is very difficult to sort out those observations in the presence of complicated censoring schemes. Some recent attempts have been made to obtain robust parametric estimators based on survival data without any covariates.
For example, Wang (1999) has derived the properties of M-estimators for univariate life-time distributions and Basu et al. (2006) have developed a more efficient robust parametric estimator by minimizing the density power divergence measure (Basu et al., 1998). These estimators, along with the automatic control for the effect of outlying observations, provide a compromise between the most efficient classical parametric estimators, like maximum likelihood or the method of moments, and the inefficient non-parametric or semi-parametric approaches, provided there is no significant loss of efficiency under pure data. The present paper extends this idea to develop such robust estimators for the model parameters under censored response with covariates. It does not follow directly from the existing literature, as we need to suitably modify the law of large numbers and the central limit theorem for censored data in the presence of covariates.

It is to be noted that, in this paper, we consider a fully parametric model for survival data with covariates, which is not the same as the usual semi-parametric or non-parametric regression models like the Cox proportional hazard model (Cox, 1972) or the Buckley-James linear regression (Buckley and James, 1979; Ritov, 1990). The latter methods are generally more robust with respect to model misspecification but less efficient compared to fully parametric models. Further, many recent attempts have been made to develop inference under such semi-parametric models that is also robust with respect to outliers in the data; e.g., Zhou (1992), Bednarski (1993), Kosorok et al. (2004), Bednarski and Borowicz (2006), Salibian-Barrera and Yohai (2008), Farcomeni and Viviani (2011), etc. However, no such work has been done to develop robust inference under fully parametric regression models with survival response. The few works that closely relate to the proposal of our present paper are by Zhou (2010), Locatelli et al. (2011) and Wang et al.
(2015), who have proposed some robust solutions for a particular case of the accelerated failure time regression model without any parametric assumptions on the stochastic covariates; the first two papers propose M-estimators and S-estimators, respectively, for semi-parametric AFT models with parametric location-scale error, and the third one extends the M-estimators further to achieve robustness with respect to leverage points and simultaneous robust estimation of the error variance. In Zhou (2010) there is the possibility of dependence between the error and the stochastic explanatory variables, but independence is also allowed, in which case it considers the same model set-up as in Locatelli et al. (2011). However, none of these approaches considered fully parametric models with suitable assumptions on the marginal covariate distributions. The present paper fills this gap in the literature of survival analysis by proposing a simple yet general robust estimation criterion with more efficiency under any general parametric model for both the censored response and the stochastic covariates. Further, the proposal of the present paper is fully general with respect to model assumptions and can be easily extended to the semi-parametric models considered in the existing literature; we will show that the existing versions of the M-estimators of Zhou (2010) and Wang et al. (2015) can be considered as special cases of our proposal under the semi-parametric extension. In that sense, our proposed method gives a complete general framework for all possible modeling options of censored data with stochastic covariates, along with the possibility of more efficient inference through fully parametric covariate distributions.

The rest of the paper is organized as follows. We start with a brief review of background concepts and results about the non-parametric estimator $\widehat{G}(x, y)$ and the minimum density power divergence estimators in Section 2.
Next we consider a general parametric set-up for the censored lifetime data with covariates as described above and propose the modified minimum density power divergence estimator for the present set-up in Section 3; its applications in the context of simple linear and exponential regression models with censored response and for the general parametric accelerated failure time model are also described in this section. In Section 4, we derive theoretical asymptotic properties for a general class of estimators containing the proposed minimum density power divergence estimator under the present set-up; this general class of estimators is indeed a suitable extension of the M-estimators. The global nature of our proposal and its generality are discussed in Section 5, along with an illustration of this extension in the semi-parametric set-up. Section 6 contains the robustness properties of the proposed MDPDE and the general M-estimators, examined through influence function analysis for both fully parametric and semi-parametric set-ups. The performance of the proposed minimum density power divergence estimator in terms of both efficiency and robustness is illustrated through some appropriate simulation studies in Section 7. Some remarks on the choice of the tuning parameter in the proposed estimator are presented in Section 8, while the paper ends with a short concluding remark in Section 9.

One of the main barriers to deriving any asymptotic results based on survival data was the unavailability of limit theorems, like the law of the iterated logarithm, the law of large numbers, the central limit theorem, etc., under censorship. This problem has been solved in recent decades mainly through the works of Stute and Wang; see Stute and Wang (1993) and Stute (1995) for such limit theorems for censored data without covariates, and Stute (1993, 1996) for similar results in the presence of covariables.
In this section, we briefly describe some results from the latter works with covariates that will be needed in this paper.

Assume the set-up of a life-time variable $Y$ censored by an independent censoring variable $C$ as discussed in Section 1. Denote $Z = \min(Y, C)$; the distribution function of $Z$ is given by $G_Z = 1 - (1 - G_Y)(1 - G_C)$. In order to have the limiting results for this set-up, we need to make the following basic assumptions:

(A1) The life-time variable $Y$ and the censoring variable $C$ are independent and their respective distribution functions $G_Y$ and $G_C$ have no jump in common.

(A2) The random variable $\delta = I(Y \le C)$ and $X$ are conditionally independent given $Y$, i.e., whenever the actual life-time is known the covariates provide no further information on censoring. More precisely, $P(Y \le C \mid X, Y) = P(Y \le C \mid Y)$.

Now, consider a measurable function $\phi$ from $\mathbb{R}^{p+1}$ to $\mathbb{R}^k$ and define
\[ S_n = \sum_{i=1}^{n} W_{in}\, \phi(X_{[i,n]}, Z_{(i,n)}) = \int \phi(x, y)\, \widehat{G}(dx, dy). \tag{1} \]
This functional $S_n$ forms the basis of several estimators under this set-up. The results presented below describe its strong consistency and distributional convergence; see Stute (1993, 1996) for their proofs and details.

Proposition 2.1 [Strong Consistency (Stute, 1993)]
Suppose that $\phi(X, Y)$ is integrable and Assumptions (A1) and (A2) hold for the above mentioned set-up. Then we have, with probability one and in the mean,
\[ \lim_{n \to \infty} S_n = \int_{Y < \tau_{G_Z}} \phi(X, Y)\, dP + I(\tau_{G_Z} \in A) \int_{Y = \tau_{G_Z}} \phi(X, \tau_{G_Z})\, dP, \tag{2} \]
where $\tau_{G_Z}$ denotes the least upper bound for the support of $G_Z$, given by $\tau_{G_Z} = \inf\{z : G_Z(z) = 1\}$, and $A$ denotes the set of atoms (jumps) of $G_Z$.

The above convergence can be written in a simpler form by defining
\[ \widetilde{G}(x, y) = \begin{cases} G(x, y) & \text{if } y < \tau_{G_Z}, \\ G(x, \tau_{G_Z}-) + I(\tau_{G_Z} \in A)\, G(x, \{\tau_{G_Z}\}) & \text{if } y \ge \tau_{G_Z}. \end{cases} \]
Then, the convergence in (2) yields
\[ \lim_{n \to \infty} \int \phi(x, y)\, \widehat{G}(dx, dy) = \int \phi(x, y)\, \widetilde{G}(dx, dy) = \widetilde{S}, \text{ say}. \]
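The integral representation in (1) can be illustrated numerically: the mass that $\widehat{G}$ places on the $i$-th ordered observation equals the jump of the Kaplan-Meier survival curve there. The following is a minimal Python sketch under stated assumptions: `km_integral` is a hypothetical helper name, $\phi$ is taken scalar-valued for simplicity, and ties are broken by placing uncensored observations first.

```python
def km_integral(x, z, delta, phi):
    """Compute S_n = integral of phi(x, y) against the KMPL estimator.

    The mass at each ordered observation is obtained as the jump of the
    Kaplan-Meier survival curve, which coincides with the weight W_in.
    phi is assumed scalar-valued here; tie-breaking is an assumption.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: (z[i], -delta[i]))
    surv_prev = 1.0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        surv_curr = surv_prev * (1.0 - delta[idx] / (n - rank + 1))
        jump = surv_prev - surv_curr  # equals W_in for this observation
        total += jump * phi(x[idx], z[idx])
        surv_prev = surv_curr
    return total


covariate = [0.2, -1.0, 0.7]
# No censoring: S_n reduces to the ordinary sample mean of phi.
s_full = km_integral(covariate, [3.0, 1.0, 2.0], [1, 1, 1], lambda x, y: y)
# One censored observation: its mass is shifted to the later failure time.
s_cens = km_integral(covariate, [1.0, 2.0, 3.0], [1, 0, 1], lambda x, y: y)
```

With uncensored data the functional is just the sample average, which is the sense in which $S_n$ generalizes the usual empirical mean.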
Further, note that the independence of $Y$ and $C$ gives $\tau_{G_Z} = \min(\tau_{G_Y}, \tau_{G_C})$, where $\tau_{G_Y}$ and $\tau_{G_C}$ are the least upper bounds of the supports of $G_Y$ and $G_C$ respectively. So, whenever $\tau_{G_Y} < \tau_{G_C}$ or $\tau_{G_C} = \infty$, the modified distribution function $\widetilde{G}$ coincides with $G$ and the estimator $S_n$ becomes a strongly consistent estimator of its population counterpart $S = \int \phi(x, y)\, G(dx, dy)$. Further, it follows that the Glivenko-Cantelli type strong uniform convergence of $\widehat{G}$ to $G$ holds under assumptions (A1) and (A2); see Corollary 1.5 of Stute (1993).

Proposition 2.2 [Central Limit Theorem (Stute, 1996, Theorem 1.2)]
Consider the above mentioned set-up with assumption (A2) and suppose that the measurable function $\phi(X, Y)$ satisfies

(A3) $\int [\phi(X, Z)\, \gamma_0(Z)\, \delta]^2\, dP < \infty$,

(A4) $\int |\phi(x, w)|\, C^{1/2}(w)\, \widetilde{G}(dx, dw) < \infty$,

where
\[ \gamma_0(z) = \exp\left\{ \int_0^{z-} \frac{G_Z^0(dz')}{1 - G_Z(z')} \right\}, \quad \text{with } G_Z^0(z) = P(Z \le z, \delta = 0), \]
and
\[ C(w) = \int_0^{w-} \frac{G_C(dz')}{[1 - G_C(z')][1 - G_Z(z')]}. \]
Then we have, as $n \to \infty$,
\[ \sqrt{n}\,(S_n - \widetilde{S}) \xrightarrow{D} N(0, \Sigma_\phi), \tag{3} \]
where
\[ \Sigma_\phi = \mathrm{Cov}\left[ \phi(X, Z)\, \gamma_0(Z)\, \delta + \gamma_1(Z)(1 - \delta) - \gamma_2(Z) \right], \tag{4} \]
and $\gamma_1$ and $\gamma_2$ are vectors of the same length as $\phi$, defined as
\[ \gamma_1(z) = \frac{1}{1 - G_Z(z)} \int I(z < w)\, \phi(x, w)\, \gamma_0(w)\, \widetilde{G}^{11}(dx, dw), \]
and
\[ \gamma_2(z) = \int\!\!\int \frac{I(v < z,\; v < w)\, \phi(x, w)\, \gamma_0(w)}{[1 - G_Z(v)]^2}\, G_Z^0(dv)\, \widetilde{G}^{11}(dx, dw), \]
with $\widetilde{G}^{11}(x, z) = P(X \le x, Z \le z, \delta = 1)$.

Note that a consistent estimator of the above asymptotic variance can be obtained by using the corresponding sample covariance and by replacing the distribution functions in the definitions of $\gamma_0$, $\gamma_1$ and $\gamma_2$ by their respective empirical estimators.

In this context, we should note that Assumption (A2) is strictly stronger than the usual assumptions in regression analysis for censored life-time data (Begun et al., 1983).
This can be seen by writing Stute's estimate $\widehat{G}$ as a particular case of the inverse probability of censoring weighted (IPCW) statistic, where the censoring weights are estimated by the marginal Kaplan-Meier estimator for the censoring time. This may lead to some biased inference when the censoring distribution depends on the covariates, but in such cases we cannot have robust results without moving to semi-parametric models like Cox regression. Further, Robins and Rotnitzky (1992) also assumed this stronger condition (A2) to develop a more efficient IPCW statistic under the semi-parametric set-up (see also Van der Laan and Robins, 2003). Zhou (2010) and Wang et al. (2015) have also considered the same assumption (A2) for robust estimation under semi-parametric accelerated failure time models. So, while considering the fully parametric set-up throughout this paper, we continue with the assumption (A2) for deriving any asymptotic result; clearly it does not restrict the practical use of the proposed method in finite samples.

Density power divergence based statistical inference has become quite popular in recent times due to its strong robustness properties and high asymptotic efficiency without the use of any non-parametric smoothing. The density power divergence measure between any two densities $g$ and $f$ (with respect to some common dominating measure) is defined in terms of a tuning parameter $\alpha \ge 0$ as
\[ d_\alpha(g, f) = \begin{cases} \displaystyle \int \left[ f^{1+\alpha} - \left( 1 + \frac{1}{\alpha} \right) f^{\alpha} g + \frac{1}{\alpha}\, g^{1+\alpha} \right], & \text{for } \alpha > 0, \\[2ex] \displaystyle \int g \log(g/f), & \text{for } \alpha = 0. \end{cases} \tag{5} \]
When we have $n$ i.i.d. samples $Y_1, \ldots, Y_n$ from a population with true density function $g$, modeled by the parametric family of densities $\mathcal{F} = \{f_\theta : \theta \in \Theta \subset \mathbb{R}^p\}$, the minimum density power divergence estimator (MDPDE) of the parameter $\theta$ is obtained by minimizing the density power divergence between the data and the model family, or equivalently by minimizing
\[ \int f_\theta^{1+\alpha}(y)\, dy - \frac{1+\alpha}{\alpha} \int f_\theta^{\alpha}(y)\, dG_n(y) = \int f_\theta^{1+\alpha}(y)\, dy - \frac{1+\alpha}{\alpha}\, \frac{1}{n} \sum_{i=1}^{n} f_\theta^{\alpha}(Y_i), \tag{6} \]
with respect to $\theta$; here $G_n$ is the empirical distribution function based on the sample. See Basu et al. (1998, 2011) for more details and other properties of the MDPDEs. It is worthwhile to note that the MDPDE corresponding to $\alpha = 0$ coincides with the maximum likelihood estimator (MLE); the MDPDEs become more robust but less efficient as $\alpha$ increases, although the extent of the loss is not significant in most cases with small positive $\alpha$. Thus the parameter $\alpha$ gives a trade-off between robustness and efficiency. Hong and Kim (2001) and Warwick and Jones (2005) have presented some data-driven choices for the selection of the optimal tuning parameter $\alpha$.

The MDPDE has been applied to several statistical problems and has been extended suitably for different types of data. For example, Kim and Lee (2001) have extended it to the case of robust estimation of the extreme value index, Lee and Song (2006, 2013) have provided extensions in the context of GARCH models and diffusion processes respectively, and Ghosh and Basu (2013, 2014) have generalized it to the case of non-identically distributed data with applications to linear regression and the generalized linear model. In the context of survival analysis, Basu et al. (2006) have extended the concept of the MDPDE to censored data without any covariates to obtain a robust and efficient estimator. Based on $n$ i.i.d. right censored observations $Y_1, \ldots, Y_n$ as above, Basu et al.
(2006) have proposed to use the Kaplan-Meier product limit estimator $\widehat{G}_Y$ in place of the empirical distribution function $G_n$ in (6) and derived the properties of the corresponding MDPDEs. In the next section, we will further generalize this idea to obtain robust estimators for a joint parametric model based on censored data with covariates.

Let us consider the set-up of Section 1. We are interested in making some inference about the distribution of the lifetime variable $Y$ and its relation with the covariates (through the distribution $G_{Y|X}$) based on the survival data with covariates $(Z_i, \delta_i, X_i)$. Sometimes one may also be interested in the distribution $G_X$ of the covariates. As noted earlier, this paper focuses on the parametric approach to inference; so we assume two model families of distributions for $G_{Y|X}$ and $G_X$, given by $\mathcal{F}_X = \{F_\theta(y|X) : \theta \in \Theta \subseteq \mathbb{R}^q\}$ and $\mathcal{F}' = \{F_{X,\gamma}(x) : \gamma \in \Gamma \subseteq \mathbb{R}^r\}$ respectively. Then the target parameters of interest are $\theta$ and $\gamma$, which we will estimate jointly based on $(Z_i, \delta_i, X_i)$. The case of known $\gamma$ can be easily derived from this general case or from the work of Basu et al. (2006).

The most common and popular method of estimation is the maximum likelihood estimator (MLE), which is obtained by maximizing the probability of the observed data $(Z_i, \delta_i, X_i)$ with respect to the parameters $(\theta, \gamma)$. However, in spite of several optimal properties, the MLE has the well-known drawback of a lack of robustness. As noted in the previous section, the minimum density power divergence estimator can be used as a robust alternative to the MLE with no significant loss in efficiency under pure data for several common problems. Deriving the motivation from these, especially from the work of Basu et al. (2006), here we consider the minimum density power divergence estimator (MDPDE) of $(\theta, \gamma)$ obtained by minimizing the objective function (6) for the joint density of the variables $(Y, X)$ and a suitable estimator of this joint distribution.
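To make the i.i.d. objective (6) concrete before it is generalized, the following Python sketch minimizes it for an exponential model with mean $\tau$, where $\int f_\tau^{1+\alpha}(y)\,dy = \tau^{-\alpha}/(1+\alpha)$ in closed form. The function names, the toy data and the golden-section search (which assumes a single minimum inside the search bracket) are illustrative assumptions, not part of the original development.

```python
import math


def dpd_objective_exponential(tau, y, alpha):
    """Objective (6) for the exponential model with mean tau, alpha > 0."""
    n = len(y)
    # Closed form of the integral of f_tau^(1 + alpha).
    integral_term = tau ** (-alpha) / (1.0 + alpha)
    # Empirical average of f_tau(Y_i)^alpha.
    empirical_term = sum((math.exp(-yi / tau) / tau) ** alpha for yi in y) / n
    return integral_term - (1.0 + alpha) / alpha * empirical_term


def mdpde_exponential(y, alpha, lo=1e-2, hi=1e3, iters=200):
    """Golden-section search over log(tau); assumes one minimum in [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        fc = dpd_objective_exponential(math.exp(c), y, alpha)
        fd = dpd_objective_exponential(math.exp(d), y, alpha)
        if fc < fd:
            b = d
        else:
            a = c
    return math.exp(0.5 * (a + b))


# Hypothetical toy sample: five moderate values and one gross outlier.
sample = [0.5, 0.8, 1.0, 1.2, 1.5, 50.0]
mle = sum(sample) / len(sample)            # alpha = 0 estimate (sample mean)
robust = mdpde_exponential(sample, alpha=0.5)
```

On this toy sample the sample mean is pulled far above the bulk of the data by the single outlier, while the estimate at $\alpha = 0.5$ stays close to the moderate observations, which is the robustness and efficiency trade-off described above.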
Let us denote the density of $F_\theta(y|X)$ by $f_\theta(y|X)$ and so on. Then the joint model density of $(X, Y)$ is $f_\theta(y|x)\, f_{X,\gamma}(x)$. As an estimator of their true joint distribution $G$ we use the KMPL estimator $\widehat{G}(x, y)$, because of its optimality properties as described in Section 2.1. Thus, for any $\alpha > 0$, the objective function to be minimized with respect to $(\theta, \gamma)$ is given by
\begin{align*} H_{n,\alpha}(\theta, \gamma) &= \int\!\!\int f_\theta(y|x)^{1+\alpha} f_{X,\gamma}(x)^{1+\alpha}\, dx\, dy - \frac{1+\alpha}{\alpha} \int\!\!\int f_\theta(y|x)^{\alpha} f_{X,\gamma}(x)^{\alpha}\, d\widehat{G}(x, y) \\ &= \int\!\!\int f_\theta(y|x)^{1+\alpha} f_{X,\gamma}(x)^{1+\alpha}\, dx\, dy - \frac{1+\alpha}{\alpha} \sum_{i=1}^{n} W_{in}\, f_\theta(Z_{(i,n)} | X_{[i,n]})^{\alpha} f_{X,\gamma}(X_{[i,n]})^{\alpha}. \tag{7} \end{align*}
For the case $\alpha = 0$, the MDPDE of $(\theta, \gamma)$ is to be obtained by minimizing the objective function $\left[ \lim_{\alpha \downarrow 0} H_{n,\alpha}(\theta, \gamma) \right]$, or equivalently
\[ H_{n,0}(\theta, \gamma) = - \sum_{i=1}^{n} W_{in} \log\left[ f_\theta(Z_{(i,n)} | X_{[i,n]})\, f_{X,\gamma}(X_{[i,n]}) \right]. \tag{8} \]
The estimator obtained by minimizing (8) is nothing but the maximum likelihood estimator of $(\theta, \gamma)$ under the present set-up. Therefore, the proposed MDPDE is indeed a generalization of the MLE.

The estimating equations of the MDPDE of $(\theta, \gamma)$ are then given by
\[ \frac{\partial H_{n,\alpha}(\theta, \gamma)}{\partial \theta} = 0, \qquad \frac{\partial H_{n,\alpha}(\theta, \gamma)}{\partial \gamma} = 0, \qquad \alpha \ge 0. \]
For $\alpha > 0$, routine differentiation simplifies the estimating equations to yield
\[ \zeta_\theta - \sum_{i=1}^{n} W_{in}\, u_\theta(Z_{(i,n)}, X_{[i,n]})\, f_\theta(Z_{(i,n)} | X_{[i,n]})^{\alpha} f_{X,\gamma}(X_{[i,n]})^{\alpha} = 0, \tag{9} \]
\[ \zeta_\gamma - \sum_{i=1}^{n} W_{in}\, u_\gamma(X_{[i,n]})\, f_\theta(Z_{(i,n)} | X_{[i,n]})^{\alpha} f_{X,\gamma}(X_{[i,n]})^{\alpha} = 0, \tag{10} \]
where
\[ \zeta_\theta = \int\!\!\int u_\theta(y, x)\, f_\theta(y|x)^{1+\alpha} f_{X,\gamma}(x)^{1+\alpha}\, dx\, dy, \qquad \zeta_\gamma = \int\!\!\int u_\gamma(x)\, f_\theta(y|x)^{1+\alpha} f_{X,\gamma}(x)^{1+\alpha}\, dx\, dy, \]
with $u_\theta(y, x) = \frac{\partial \ln f_\theta(y|x)}{\partial \theta}$ and $u_\gamma(x) = \frac{\partial \ln f_{X,\gamma}(x)}{\partial \gamma}$ being the score functions corresponding to $\theta$ and $\gamma$ respectively. For $\alpha = 0$, the corresponding estimating equations, obtained by differentiating $H_{n,0}$, have the simpler form
\[ \sum_{i=1}^{n} W_{in}\, u_\theta(Z_{(i,n)}, X_{[i,n]}) = 0, \qquad \sum_{i=1}^{n} W_{in}\, u_\gamma(X_{[i,n]}) = 0, \]
which are the same as Equations (9) and (10) at $\alpha = 0$; note that, at $\alpha = 0$, $\zeta_\theta = 0$ and $\zeta_\gamma = 0$. Therefore, Equations (9) and (10) represent the estimating equations for all MDPDEs with $\alpha \ge 0$.

Definition 3.1
Consider the above mentioned set-up. The minimum density power divergence estimator of $(\theta, \gamma)$ based on the observed data $(Z_i, \delta_i, X_i)$, $i = 1, \ldots, n$, is defined as the simultaneous root of the equations (9) and (10). If there are multiple roots of these equations, the MDPDE is given by the root which minimizes the objective function (7) for $\alpha > 0$, or (8) for $\alpha = 0$.

Clearly, the MDPDE is Fisher consistent by its definition and the estimating equations (9) and (10) are unbiased at the model. Further, there is no problem of root selection in the case of multiple roots, which is a general issue in inference based on estimating equations; this is because we have a proper objective function in the case of the MDPDE.

Note that the MDPDE estimating equations (9) and (10) can also be written as
\[ \int\!\!\int \psi(y, x; \theta, \gamma)\, d\widehat{G}(x, y) = \sum_{i=1}^{n} W_{in}\, \psi(Z_{(i,n)}, X_{[i,n]}; \theta, \gamma) = 0, \tag{11} \]
where $\psi(y, x; \theta, \gamma) = (\psi_{1,\alpha}(y, x; \theta, \gamma), \psi_{2,\alpha}(y, x; \theta, \gamma))^T$ with
\[ \left. \begin{aligned} \psi_{1,\alpha}(Y, X; \theta, \gamma) &= \zeta_\theta - u_\theta(Y, X)\, f_\theta(Y|X)^{\alpha} f_{X,\gamma}(X)^{\alpha}, \\ \psi_{2,\alpha}(Y, X; \theta, \gamma) &= \zeta_\gamma - u_\gamma(X)\, f_\theta(Y|X)^{\alpha} f_{X,\gamma}(X)^{\alpha}. \end{aligned} \right\} \tag{12} \]
This particular estimating equation (11) is similar to that of the M-estimator for i.i.d. non-censored data with covariates (in fact they become the same if $\widehat{G}$ is the empirical distribution function). So, extending the concept of Wang (1999), we can define the general M-estimator of $\theta$ and $\gamma$ based on any general function
\[ \psi(y, x; \theta, \gamma) : \mathbb{R} \times \mathbb{R}^p \times \mathbb{R}^q \times \mathbb{R}^r \to \mathbb{R}^q \times \mathbb{R}^r \tag{13} \]
as the solution of the estimating equation (11). However, to make it an unbiased estimating equation, we consider only the $\psi$-functions for which
\[ \int\!\!\int \psi(y, x; \theta, \gamma)\, dG(x, y) = 0. \tag{14} \]

Definition 3.2
Consider the above mentioned parametric set-up for censored data with stochastic covariates. Also, consider a general $\psi$-function as in (13) satisfying the condition (14). An M-estimator of $(\theta, \gamma)$ corresponding to this general $\psi$-function based on the observed data $(Z_i, \delta_i, X_i)$, $i = 1, \ldots, n$, is defined as the root of the estimating equation (11).

Note that any general M-estimator is also Fisher consistent and is based on an unbiased estimating equation, by definition. But, in general, such estimators may suffer from the problem of multiple roots and need a proper numerical technique (like bootstrapping) to get a well-defined M-estimator. So, in this paper, we restrict our attention mainly to examining the performance of the robust MDPDE corresponding to the particular $\psi$-function defined in (12). However, we derive theoretical asymptotic properties of the general M-estimators in Section 4 and deduce the properties of the MDPDEs from those general results in Section 4.2.

3.2 Application (I): Fully Parametric Version of Linear Regression Model

We first consider the simplest problem of linear regression with censored responses and stochastic covariables. Precisely, we assume the linear regression model (LRM)
\[ Y_i = X_i^T \theta + \epsilon_i, \qquad i = 1, \ldots, n, \tag{15} \]
where $Y_i$ is the censored response (generally the lifetime), $X_i$ is a $p$-variate stochastic auxiliary variable associated with the response, $\theta$ is the vector of unknown regression coefficients and $\epsilon_i$ is the error in the specified linear model. We assume that the errors $\epsilon_i$ are independent and identically distributed with distribution function $F_e$, and that the $X_i$s are independent of the errors with distribution function $F_{X,\gamma}$. Generally we can assume symmetric error distributions like the normal as well as asymmetric error distributions like the exponential; however, the second group is used in most reliability applications.
Then, the conditional distribution of the response variable $Y_i$ given $X_i$ is $F_\theta(y|X_i) = F_e(y - X_i^T \theta)$. Now, we consider the incomplete censored observations $(Z_i, \delta_i)$ as defined in Section 1 and use them to estimate $(\theta, \gamma)$ robustly and efficiently.

This inference problem clearly belongs to the general set-up considered in the previous subsection and frequently arises in reliability studies and other applied research. We can obtain a robust solution to this problem through the proposed MDPDEs, obtained by just solving the estimating equations (9) and (10). Here we present the detailed working for one particular example of the model families $\mathcal{F}_X$ and $\mathcal{F}'$. The case of other model families can be tackled similarly.

Suppose the response variable is exponentially distributed with mean depending on the covariates as $E[Y|X] = X^T \theta$; then $\mathcal{F}_X = \{ Exp(X^T \theta) : \theta \in \mathbb{R}^p \}$, where $Exp(\tau)$ represents the exponential distribution with mean $\tau$. Also, for simplicity, let us assume that the auxiliary variables are independent of each other and normally distributed, so that $\mathcal{F}' = \{ N_p(\gamma, I_p) : \gamma \in \mathbb{R}^p \}$. In this case, the objective function $H_{n,\alpha}(\theta, \gamma)$ of the MDPDE has a simpler form given by
\[ H_{n,\alpha}(\theta, \gamma) = \frac{(1+\alpha)^{-3/2}}{(2\pi)^{\alpha/2}}\, \psi^{(0)}(\theta, \gamma) - \frac{1+\alpha}{(2\pi)^{\alpha/2}\, \alpha} \sum_{i=1}^{n} W_{in}\, \frac{e^{-\alpha \psi_i(\theta, \gamma)}}{(X_{[i,n]}^T \theta)^{\alpha}}, \qquad \alpha > 0, \tag{16} \]
and
\[ H_{n,0}(\theta, \gamma) = \sum_{i=1}^{n} W_{in} \left[ \psi_i(\theta, \gamma) + \log(X_{[i,n]}^T \theta) + \frac{1}{2} \log(2\pi) \right], \tag{17} \]
where $\psi^{(0)}(\theta, \gamma) = \int (x^T \theta)^{-\alpha}\, N_p\!\left(x, \gamma, \frac{1}{1+\alpha} I_p\right) dx$ and $\psi_i(\theta, \gamma) = \frac{Z_{(i,n)}}{X_{[i,n]}^T \theta} + \frac{1}{2} (X_{[i,n]} - \gamma)^T (X_{[i,n]} - \gamma)$. Note that the integral $\psi^{(0)}(\theta, \gamma)$ is just the expectation of a simple function of a multivariate normal random variable, so it can be computed quite easily using standard numerical integration techniques.
Therefore, we can simply minimize the above objective functions by any numerical algorithm to obtain the MDPDE at any $\alpha \ge 0$. For $\alpha > 0$, differentiating (16) gives the estimating equations
\[ \frac{\alpha}{(1+\alpha)^{5/2}} \int (x^T \theta)^{-(1+\alpha)}\, x\, N_p\!\left(x, \gamma, \tfrac{1}{1+\alpha} I_p\right) dx = \sum_{i=1}^{n} W_{in}\, e^{-\alpha \psi_i(\theta, \gamma)} \left( \frac{1}{(X_{[i,n]}^T \theta)^{(1+\alpha)}} - \frac{Z_{(i,n)}}{(X_{[i,n]}^T \theta)^{(2+\alpha)}} \right) X_{[i,n]}, \]
\[ \frac{1}{(1+\alpha)^{3/2}} \left[ \bar{\psi}^{(0)}(\theta, \gamma) - \gamma\, \psi^{(0)}(\theta, \gamma) \right] = \sum_{i=1}^{n} W_{in}\, e^{-\alpha \psi_i(\theta, \gamma)}\, \frac{(X_{[i,n]} - \gamma)}{(X_{[i,n]}^T \theta)^{\alpha}}, \]
where $\bar{\psi}^{(0)}(\theta, \gamma) = \int (x^T \theta)^{-\alpha}\, x\, N_p\!\left(x, \gamma, \frac{1}{1+\alpha} I_p\right) dx$. The case of $\alpha = 0$ (MLE) can be simplified further, where the estimator of $\gamma$ becomes independent of the parameter $\theta$. To see this, we simplify the above estimating equations at $\alpha = 0$ as
\[ \sum_{i=1}^{n} W_{in}\, \frac{\left( Z_{(i,n)} - X_{[i,n]}^T \theta \right)}{(X_{[i,n]}^T \theta)^2}\, X_{[i,n]} = 0, \qquad \sum_{i=1}^{n} W_{in} \left( X_{[i,n]} - \gamma \right) = 0. \]
Solving the second equation, we get
\[ \widehat{\gamma} = \frac{\sum_{i=1}^{n} W_{in}\, X_{[i,n]}}{\sum_{i=1}^{n} W_{in}}, \]
which clearly does not depend on $\theta$, as is expected from the theory of maximum likelihood inference.

The semi-parametric version of this model has been considered in Zhou (2010), where no assumption has been made about the distribution of the i.i.d. sequences $\{X_i\}$ and $\{\epsilon_i\}$. In that paper, an M-estimator of $\theta$ has been proposed by solving the estimating equation
\[ \sum_{i=1}^{n} W_{in}\, \psi\left( Z_{(i,n)} - X_{[i,n]}^T \theta \right) = 0, \tag{18} \]
for suitable choices of $\psi$. Although this M-estimator is different from the proposed MDPDE, it in fact belongs to our general class of M-estimators (Definition 3.2), as we will show in detail in Section 5. Further, we will show in Section 6 that our MDPDE with $\alpha > 0$ [...]

The simple linear regression model considered in the previous subsection is the most popular inference problem under the set-up considered in this paper. However, although it is simple and potentially applicable in several real-life problems, this simple linear model is rarely used in typical applications in medical science.
The reason is that in almost all medical applications the support of the distribution of the censoring times is shorter than the support of the lifetimes (we cannot follow the patients until they die). For this reason the linear model is usually not identifiable and, in order to become applicable, the set-up of the proposed method needs to introduce a truncation time, say $\tau$, such that the probability of being uncensored by time $\tau$ is strictly greater than zero for all $x$. We can suitably extend the proposed MDPDE to cover these assumptions through some more routine calculations, which we leave to the reader.

In this section we present an alternative multiplicative model with exponential error for applications in the medical sciences. This particular model, known as the exponential regression model, is widely used in the medical sciences and related applications. More precisely, let us assume the multiplicative regression model for the survival times (responses) $Y_i$, $i = 1, \ldots, n$, as
\[ Y_i = e^{X_i^T \theta} \times \epsilon_i, \qquad i = 1, \ldots, n, \tag{19} \]
where $X_i$ is a $p$-variate stochastic auxiliary variable associated with the response, $\theta$ is the vector of unknown regression coefficients and $\epsilon_i$ is the error in the specified model. Such a multiplicative model ensures the positivity of the response variables, which are generally life-times in most applications. In the exponential regression model, we assume that the error variable $\epsilon$ is exponentially distributed with mean 1. Then the conditional distribution of the response variable $Y$ given the covariate $X$ is also exponential with mean $E[Y|X] = e^{X^T \theta}$; so, in the notation of Section 1, $\mathcal{F}_X = \left\{ Exp(e^{X^T \theta}) : \theta \in \mathbb{R}^p \right\}$. Also, the $X_i$s are independent of the errors and have distribution function $F_{X,\gamma}$.
Our objective is to estimate $(\theta, \gamma)$ robustly and efficiently based on the incomplete (censored) observations $(Z_i, \delta_i, X_i)$ as defined in Section 1. Again this inference problem belongs to the general set-up of Section 3.1, so that the proposed MDPDEs provide a robust solution to it. In the case of independent and normally distributed covariates with $\mathcal{F}' = \{ N_p(\gamma, I_p) : \gamma \in \mathbb{R}^p \}$, we can simplify the objective function $H_{n,\alpha}(\theta, \gamma)$, to be minimized in order to obtain the MDPDE, as
$$H_{n,\alpha}(\theta, \gamma) = \frac{e^{\alpha(\gamma^T\theta) + \frac{\alpha^2}{2(1+\alpha)}(\theta^T\theta)}}{(1+\alpha)^{3/2}(2\pi)^{\alpha/2}} - \frac{(1+\alpha)}{(2\pi)^{\alpha/2}\,\alpha} \sum_{i=1}^n W_{in}\, e^{-\alpha \Gamma_i(\theta,\gamma)}, \quad \alpha > 0, \tag{20}$$
and
$$H_{n,0}(\theta, \gamma) = \sum_{i=1}^n W_{in} \left[ \Gamma_i(\theta, \gamma) + \frac{1}{2}\log(2\pi) \right], \tag{21}$$
where Γ i ( θ, γ ) = Z ( i,n ) e X T [ i,n ] θ + ( X T [ i,n ] θ ) + ( X [ i,n ] − γ ) T ( X [ i,n ] − γ ) . This objective function can be easily minimized using any standard numerical technique for any $\alpha \geq 0$. The corresponding estimating equations, for $\alpha > 0$, have the form n X i =1 W in e − α Γ i ( θ,γ ) ( Z ( i,n ) e ( X T [ i,n ] θ ) + 1 ) X [ i,n ] = α (1 + α ) / [ γ + αθ (1 + α ) ] e α ( γ T θ )+ α α ) ( θ T θ ) , n X i =1 W in e − α Γ i ( θ,γ ) ( X [ i,n ] − γ ) = α θ (1 + α ) / e α ( γ T θ )+ α α ) ( θ T θ ) . At $\alpha = 0$ (MLE), these estimating equations further simplify to n X i =1 W in ( Z ( i,n ) e ( X T [ i,n ] θ ) + 1 ) X [ i,n ] = 0 , $\sum_{i=1}^n W_{in} \left( X_{[i,n]} - \gamma \right) = 0$, which again produce the same estimator of $\gamma$ as in the case of the LRM, independent of the parameter $\theta$.

The exponential regression model considered in the previous section can be linearized by taking the natural logarithm of the response time:
$$\log(Y_i) = X_i^T \theta + \epsilon_i^*, \tag{22}$$
where $\epsilon_i^* = \log(\epsilon_i)$ follows the standard extreme value distribution. This model can be generalized by considering some alternative distribution for $\epsilon_i^*$, but with mean 0. When $\epsilon_i^*$ follows an extreme value distribution with mean 0 and scale parameter $\sigma$, then $Y_i$ follows a Weibull distribution and the resulting regression model is known as the Weibull regression model (WRM). Other common distributions for $\epsilon_i^*$ are the logistic (survival times are log-logistic), the normal (survival times are log-normal), etc. In such models, the covariate has a multiplicative effect on the response lifetime and hence they are generally known as accelerated failure time (AFT) models.

Consider a general location-scale model family $\{ f_{\mu,\sigma}(x) = f((x - \mu)/\sigma) \}$ for some known function $f$ and let $\epsilon_i^*$ have density $f_{0,\sigma}$. Then, given the covariate $X$, $Y_i$ has density
$$\frac{1}{y}\, f\left( \frac{\log(y) - (X^T \beta)}{\sigma} \right).$$
This is the general form of the parametric AFT regression model; taking $f$ as the standard extreme value distribution it simplifies to the Weibull regression model and so on.
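To make the Weibull case concrete, the following short derivation is a sketch under two assumptions that are not spelled out in the text: the error is written as $\epsilon_i^* = \sigma \log(\epsilon_i)$ with $\epsilon_i \sim Exp(1)$, and the centering constant needed to give $\epsilon_i^*$ mean 0 is taken to be absorbed into the intercept. It works with the survival function, so it does not depend on how the density is normalized.

```latex
\begin{align*}
P(Y_i > y \mid X_i = x)
  &= P\left( x^T\beta + \sigma \log \epsilon_i > \log y \right) \\
  &= P\left( \epsilon_i > \left( y\, e^{-x^T\beta} \right)^{1/\sigma} \right) \\
  &= \exp\left\{ - \left( y\, e^{-x^T\beta} \right)^{1/\sigma} \right\},
  \qquad y > 0 .
\end{align*}
```

Under these assumptions this is the survival function of a Weibull distribution with shape $1/\sigma$ and scale $e^{x^T\beta}$; setting $\sigma = 1$ gives back the exponential regression model (19) with $\theta = \beta$.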
Suppose the distribution of the covariates $X_i$ is modeled by the family $F_{X,\gamma}$ as in the earlier cases, so that our target becomes the estimation of $\theta = (\beta, \sigma)$ and $\gamma = \mu$ based on the censored observations $(Z_i, \delta_i, X_i)$ as defined in Section 1.

Now we can again minimize the objective function (7) with respect to the parameters to obtain their robust MDPDE. The exact form of this objective function can be obtained easily for any particular choice of $f$, and any standard numerical algorithm provides us with the solution of this optimization problem.

Further, as we will see through the numerical illustrations in Section 7, the proposed MDPDE provides highly robust solutions in the presence of outliers in the data, with little loss in efficiency at small positive $\alpha$. In the case of the accelerated failure time models, the robustness of the MDPDE is directly comparable with the alternative proposal of Locatelli et al. (2011). Note that, contrary to the Locatelli et al. (2011) approach, our proposed estimator can estimate the parameter ($\gamma$) in the distribution of covariates simultaneously with $\theta = (\beta, \sigma)$, although one might not see this as a big advantage since the parameter $\gamma$ can be estimated separately in many cases. However, the major advantage of our proposal is its generality and computational simplicity for any kind of parametric model with censored survival data and stochastic covariates. We will see in Section 5 that, even if we ignore the estimation of the parameters of the marginal distribution of covariates, our proposal contains the existing proposals of Locatelli et al. (2011), a similar proposal of Zhou (2010) under the same set-up and also their extension in Wang et al. (2015). This vast generality is the major strength of our proposal over the existing literature; see Section 5 for more detailed and general discussions.

Consider the models and set-up described in Section 3.1.
First, we derive the asymptotic properties of the general M-estimator $(\hat{\theta}_n, \hat{\gamma}_n)$ of $(\theta, \gamma)$ as defined in Definition 3.2 based on a (random) censored sample of size $n$. Let us assume that the true distributions belong to the corresponding model families with $(\theta_0, \gamma_0)$ being the true parameter value. Define
$$\lambda_n(\theta, \gamma) = \iint \psi(y, x; \theta, \gamma)\, d\widehat{G}(x, y) = \sum_{i=1}^n W_{in}\, \psi(Z_{(i,n)}, X_{[i,n]}; \theta, \gamma),$$
and
$$\lambda_G(\theta, \gamma) = \iint \psi(y, x; \theta, \gamma)\, dG(x, y).$$
Then $\lambda_n(\theta, \gamma)$ is the empirical version of $\lambda_G(\theta, \gamma)$; also, by definition, $\lambda_n(\hat{\theta}_n, \hat{\gamma}_n) = 0$ and $\lambda_G(\theta_0, \gamma_0) = 0$. In order to prove the asymptotic consistency and normality of the general M-estimator $(\hat{\theta}_n, \hat{\gamma}_n)$, we use the results of Section 2. So, we will assume that Assumptions (A1) to (A4) hold true with $\phi$ replaced by $\psi$. Further, let us assume

(A5) Either $\tau_{G_Y} < \tau_{G_C}$ or $\tau_{G_C} = \infty$, so that $\widetilde{G}$ and $G$ coincide.

(A6) The variance matrix $\Sigma_\psi$, as defined in (4) with $\phi$ replaced by $\psi$, exists finitely (with all entries finite).

Then Proposition 2.1 and Proposition 2.2 give the strong consistency and asymptotic normality of $\lambda_n(\theta, \gamma)$, which are summarized in the following lemma.

Lemma 4.1
Consider the above set-up with an integrable function $\psi$ and let $(\theta_0, \gamma_0)$ be the true parameter value. Then,

(i) Under Assumptions (A1), (A2) and (A5) we have, with probability one,
$$\lim_{n \to \infty} \lambda_n(\theta_0, \gamma_0) = \lambda_G(\theta_0, \gamma_0).$$

(ii) Under Assumptions (A2) to (A6), the asymptotic distribution of
$$\sqrt{n}\left[ \lambda_n(\theta, \gamma) - \lambda_G(\theta, \gamma) \right] = \sqrt{n} \iint \psi(y, x; \theta, \gamma)\, d[\widehat{G} - G](x, y)$$
is normal with mean 0 and variance matrix $\Sigma_\psi$.

Wang (1999) has proved the strong consistency and asymptotic normality results for the M-estimators based on only a censored variable with no covariables. This section extends the theory to the case where covariables are present along with the censored response. The extensions are along the lines of the corresponding results with no censoring (see Huber, 1981; Serfling, 1980). Further note that whenever $\gamma$ is known the asymptotic properties of $\hat{\theta}_n$ follow from a routine application of the results derived in Wang (1999); so here we assume $\gamma$ to be unknown and derive the joint distribution of the M-estimators of $\theta$ and $\gamma$. These results provide a general (asymptotic) theoretical framework to study the properties of a wide class of estimators of $(\theta, \gamma)$ depending on the estimating equations.

Denote the $j$-th component of the function $\psi$ by $\psi_j$ for $j = 1, \ldots, q + r$. Also, consider the following (stronger) conditions on the nature of the function $\psi$.

(A7) $\psi(y, x; \theta, \gamma)$ is continuous in $(\theta, \gamma)$ and also bounded.

(A8) The population estimating equation $\lambda_G(\theta, \gamma) = 0$ has a unique root given by $(\theta_0, \gamma_0)$.

(A9) There exists a compact set $C$ in $\mathbb{R}^q \times \mathbb{R}^r$ satisfying
$$\inf_{(\theta, \gamma) \notin C} \left| \iint \psi_j(y, x; \theta, \gamma)\, dG(x, y) \right| > 0, \quad j = 1, \ldots, q + r.$$

Now let us start with the strong consistency of the M-estimator $(\hat{\theta}_n, \hat{\gamma}_n)$ by an extension of Theorem 3 of Wang (1999, page 307) under the above conditions.
The proof follows along the same lines as that of Wang (1999), replacing the corresponding SLLN, given in Proposition 1 of Wang (1999), by Part (i) of Lemma 4.1 in the present context; hence it is omitted for simplicity of presentation.

Theorem 4.2 Consider the above set-up with Assumptions (A1), (A2), (A5), (A7) and (A8). Then we have the following results.

(i) There exists a sequence of M-estimators $\{(\hat{\theta}_n, \hat{\gamma}_n)\}$ satisfying the empirical estimating equation $\lambda_n(\theta, \gamma) = 0$ that converges with probability one to $(\theta_0, \gamma_0)$.

(ii) Further, if (A9) also holds true, then any sequence of M-estimators $\{(\hat{\theta}_n, \hat{\gamma}_n)\}$ satisfying $\lambda_n(\theta, \gamma) = 0$ converges with probability one to $(\theta_0, \gamma_0)$.

Note that the first part (i) of Theorem 4.2 is just a multivariate extension of Lemma B of Serfling (1980, page 249) from the complete data case to the present case of censored data with covariates. Further, the additional condition (A9) in part (ii) makes any sequence of M-estimators satisfying the estimating equation (11) eventually fall in a compact neighborhood of $(\theta_0, \gamma_0)$. This result, even with the stronger conditions, becomes really helpful when the empirical estimating equation $\lambda_n(\theta, \gamma) = 0$ has multiple roots and one could obtain different M-estimator sequences by applying different numerical equation solving techniques. Part (ii) of Theorem 4.2 ensures that all these sequences of M-estimators will be strongly consistent for the unique root $(\theta_0, \gamma_0)$ of the equation $\lambda_G(\theta, \gamma) = 0$.

Next we turn our attention to the asymptotic normality of M-estimators. In this regard, we will first present a useful lemma in terms of any real valued function $g(y, x, \theta, \gamma)$. This is again a suitable extension of Lemma 1 of Wang (1999, page 307) to the present set-up and the proof follows similarly by replacing the corresponding SLLN (Proposition 1 of Wang) by Part (i) of Lemma 4.1.
Assume the following condition about the function $g(y, x, \theta, \gamma)$.

(A10) For a real valued function $g(y, x, \theta, \gamma)$, at least one of the following holds:

(i) $g(y, x, \theta, \gamma)$ is continuous at $(\theta_0, \gamma_0)$ uniformly in $(y, x)$.

(ii) As $\delta \to 0$,
$$\iint \sup_{\{(\theta,\gamma) :\, \|(\theta,\gamma) - (\theta_0,\gamma_0)\| \leq \delta\}} \left| g(y, x, \theta, \gamma) - g(y, x, \theta_0, \gamma_0) \right| dG(x, y) = h_\delta \to 0.$$
(Here $\|\cdot\|$ denotes the Euclidean norm.)

(iii) $g$ is continuous in $(y, x)$ for any fixed $(\theta, \gamma)$ in a neighborhood of $(\theta_0, \gamma_0)$, and
$$\lim_{(\theta,\gamma) \to (\theta_0,\gamma_0)} \| g(y, x, \theta, \gamma) - g(y, x, \theta_0, \gamma_0) \|_v = 0.$$
(Here $\|\cdot\|_v$ denotes the total variation norm.)

(iv) $\iint g(y, x, \theta, \gamma)\, dG(x, y)$ is continuous at $(\theta, \gamma) = (\theta_0, \gamma_0)$, and $g$ is continuous in $(y, x)$ for $(\theta, \gamma)$ in a neighborhood of $(\theta_0, \gamma_0)$, and
$$\lim_{(\theta,\gamma) \to (\theta_0,\gamma_0)} \| g(y, x, \theta, \gamma) - g(y, x, \theta_0, \gamma_0) \|_v < \infty.$$

(v) $\iint g(y, x, \theta, \gamma)\, dG(x, y)$ is continuous at $(\theta, \gamma) = (\theta_0, \gamma_0)$, and
$$\iint g(y, x, \theta, \gamma)\, d\hat{G}(x, y) \xrightarrow{P} \iint g(y, x, \theta, \gamma)\, dG(x, y) < \infty,$$
uniformly for $(\theta, \gamma)$ in a neighborhood of $(\theta_0, \gamma_0)$.

Lemma 4.3 Suppose $g(y, x, \theta, \gamma)$ is a real valued function with $\iint g(y, x, \theta_0, \gamma_0)\, dG(x, y) < \infty$. Assume that the conditions (A1), (A2) and (A10) hold for $g$. Then, for any sequence $(\hat{\theta}_n, \hat{\gamma}_n) \xrightarrow{P} (\theta_0, \gamma_0)$, we have
$$\iint g(y, x, \hat{\theta}_n, \hat{\gamma}_n)\, d\hat{G}(x, y) \xrightarrow{P} \iint g(y, x, \theta_0, \gamma_0)\, dG(x, y).$$

Theorem 4.4
Consider the above set-up and assume that $\psi$ is differentiable with respect to $(\theta, \gamma)$ in a neighborhood of $(\theta_0, \gamma_0)$ and the matrix
$$\Lambda_G(\theta_0, \gamma_0) = \iint \left. \frac{\partial}{\partial(\theta, \gamma)} \psi(y, x, \theta, \gamma) \right|_{(\theta,\gamma) = (\theta_0,\gamma_0)} dG(x, y), \tag{23}$$
exists finitely and is non-singular. Further assume that the assumptions of Lemma 4.3 hold for $g(y, x, \theta, \gamma) = \Lambda_G^{ij}(\theta_0, \gamma_0)$, the $(i, j)$-th element of $\Lambda_G(\theta_0, \gamma_0)$, with $i, j = 1, \ldots, q + r$. Then, under Assumptions (A2) to (A6) we have, for any sequence of M-estimators $\{(\hat{\theta}_n, \hat{\gamma}_n)\}$ satisfying $\lambda_n(\theta, \gamma) = 0$ that converges in probability to $(\theta_0, \gamma_0)$,
$$\sqrt{n}\left[ (\hat{\theta}_n, \hat{\gamma}_n) - (\theta_0, \gamma_0) \right] \xrightarrow{D} N\left( 0,\ \Lambda_G(\theta_0, \gamma_0)^{-1}\, \Sigma_\psi(G)\, \Lambda_G(\theta_0, \gamma_0)^{-1} \right).$$

Proof:
Since $\psi$ is differentiable in $(\theta, \gamma)$, so is the function $\lambda_n(\theta, \gamma)$. So an application of the multivariate mean value theorem yields
$$\lambda_n(\hat{\theta}_n, \hat{\gamma}_n) - \lambda_n(\theta_0, \gamma_0) = \Lambda_{\hat{G}}(\zeta_{1n}, \zeta_{2n}) \left[ (\hat{\theta}_n, \hat{\gamma}_n) - (\theta_0, \gamma_0) \right],$$
with $\|(\zeta_{1n}, \zeta_{2n}) - (\theta_0, \gamma_0)\| < \|(\hat{\theta}_n, \hat{\gamma}_n) - (\theta_0, \gamma_0)\|$. Further, by definition, $\lambda_n(\hat{\theta}_n, \hat{\gamma}_n) = 0$ and $\lambda_G(\theta_0, \gamma_0) = 0$. Hence we get
$$(\hat{\theta}_n, \hat{\gamma}_n) - (\theta_0, \gamma_0) = -\left[ \Lambda_{\hat{G}}(\zeta_{1n}, \zeta_{2n}) \right]^{-1} \left[ \iint \psi(y, x; \theta_0, \gamma_0)\, d[\widehat{G} - G](x, y) \right].$$
However, it follows from Lemma 4.3 that each term of $\Lambda_{\hat{G}}(\zeta_{1n}, \zeta_{2n})$ converges in probability to the corresponding term of $\Lambda_G(\theta_0, \gamma_0)$. Then, an application of Slutsky's theorem and Part (ii) of Lemma 4.1 completes the proof of the theorem. $\square$

It is to be noted that the asymptotic normality of the M-estimators requires more conditions than those required for their strong consistency, in terms of differentiability properties of the $\psi$ function, but it avoids the strong assumptions (A7)–(A9) used in Theorem 4.2. In fact, to obtain the asymptotic distributional convergence of any sequence of M-estimators in this case, it is enough to ensure their convergence to the true parameter value in probability. All the related conditions used here are in the same spirit as those used in Wang (1999), where no covariables are present, and were discussed extensively in that paper.

Finally, note that the estimating equation of any general M-estimator can be solved through an appropriate numerical technique, but the complexity of the iterative procedure increases considerably for a complicated non-linear $\psi$-function.
However, one can show that, for the Newton-Raphson algorithm, if we start the iterations with some $\sqrt{n}$-consistent estimator of $(\theta, \gamma)$ then the estimator obtained by just one iteration, known as the one-step M-estimator, will have the same asymptotic distribution as the fully iterated M-estimator, even in the case of censored data with covariables as considered here. This is a well-known property of the M-estimator in the case of complete data. The following theorem presents this precisely for our case; the proof follows by an argument similar to that of Theorem 6 of Wang (1999), replacing Propositions 1 and 2 of that paper by Part (i) and Part (ii) of Lemma 4.1 respectively.

Theorem 4.5
Suppose the conditions of Theorem 4.4 hold true and let $(\tilde{\theta}_n, \tilde{\gamma}_n)$ be any $\sqrt{n}$-consistent estimate of the true parameter value $(\theta_0, \gamma_0)$. Then, the one-step M-estimator $(\theta_n^{(1)}, \gamma_n^{(1)})$, defined as
$$(\theta_n^{(1)}, \gamma_n^{(1)}) = (\tilde{\theta}_n, \tilde{\gamma}_n) - \left[ \Lambda_{\hat{G}}(\tilde{\theta}_n, \tilde{\gamma}_n) \right]^{-1} \lambda_n(\tilde{\theta}_n, \tilde{\gamma}_n), \tag{24}$$
has the same asymptotic distribution as that of the M-estimator $(\hat{\theta}_n, \hat{\gamma}_n)$ derived in Theorem 4.4.

Note that the MDPDE is a particular M-estimator with the $\psi$-function given by (12), and so all the results derived in the previous subsection for general M-estimators also hold true for the MDPDEs. In particular, MDPDEs are strongly consistent and asymptotically normal under the assumptions considered in Theorems 4.2 and 4.4. However, in this particular case of MDPDEs, we can closely investigate the required assumptions for the particular form of the $\psi$-function.

Note that assumptions (A1), (A2) and (A5) are related to the censoring scheme under consideration and the others are about the special structure of the $\psi$-function. Further, in this particular case of MDPDE, the $\psi$-function depends on the model density and its score function. So, conditions (A3), (A4) and (A6) can easily be shown to hold for most statistical models by using the existence of finite and continuous second order moments of the score functions with respect to the true distribution $G$. Similar differentiability conditions on the model and score functions further ensure the assumptions of Lemma 4.3. So, the asymptotic normality of the MDPDEs follows from Theorem 4.4 for most models provided we can prove their consistency.
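As an illustration of the one-step update (24), the following Python sketch treats the simplest case of a scalar parameter, so that the matrix inverse in (24) reduces to a division. The function names (`lambda_n`, `one_step`, `ls_psi`), the central-difference approximation to the derivative and the toy weights are assumptions made for this example only; they are not taken from the paper.

```python
def lambda_n(psi, weights, z, x, theta):
    """Weighted empirical estimating function: sum_i W_in * psi(Z_i, X_i; theta)."""
    return sum(w * psi(zi, xi, theta) for w, zi, xi in zip(weights, z, x))


def one_step(psi, weights, z, x, theta_init, h=1e-5):
    """One Newton-Raphson step from theta_init, for a scalar parameter theta.

    The derivative of lambda_n is approximated by a central difference;
    this is an illustrative shortcut, and an analytic derivative could be
    supplied instead.
    """
    lam = lambda_n(psi, weights, z, x, theta_init)
    slope = (lambda_n(psi, weights, z, x, theta_init + h)
             - lambda_n(psi, weights, z, x, theta_init - h)) / (2.0 * h)
    return theta_init - lam / slope


def ls_psi(z, x, theta):
    """Hypothetical psi, linear in theta: weighted least squares through the origin."""
    return (z - theta * x) * x


# Toy inputs: weights that sum to one, three responses, a constant covariate.
toy_weights = [0.25, 0.25, 0.5]
toy_z = [1.0, 2.0, 3.0]
toy_x = [1.0, 1.0, 1.0]
theta_1 = one_step(ls_psi, toy_weights, toy_z, toy_x, theta_init=0.0)
```

Because `ls_psi` is linear in the parameter, a single step already lands on the root of the weighted estimating equation (here the weighted mean of the responses). For a non-linear $\psi$ the step only carries the asymptotic equivalence stated in Theorem 4.5, and a vector parameter would need a linear solve in place of the division.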
However, assumptions (A6)–(A9), required to prove the strong consistency in Theorem 4.2, are rather difficult ones and may not always hold for the assumed model.

Noting that the asymptotic normality of MDPDEs, as obtained in Theorem 4.4, does not require strong consistency (convergence in probability is enough), we now present an alternative approach to prove the (weak) consistency for the particular case of MDPDEs under some simpler conditions. This approach is essentially due to Lehmann (1983), and has been used by Basu et al. (1998) to prove the asymptotic properties of the MDPDEs under i.i.d. complete data and extended by many researchers later in the context of different inference problems. Here, we extend their approach further for the present case of censored data with covariates. Let us also relax the assumption that the true distribution $G$ belongs to the model family in the sense of assumption (D1) below. Define
$$V(Y, X; \theta, \gamma) = \iint f_\theta(y|x)^{1+\alpha} f_{X,\gamma}(x)^{1+\alpha}\, dx\, dy - \frac{1+\alpha}{\alpha}\, f_\theta(Y|X)^{\alpha} f_{X,\gamma}(X)^{\alpha},$$
so that the MDPDE of $(\theta, \gamma)$ is to be obtained by minimizing
$$H_n(\theta, \gamma) = \iint V(y, x; \theta, \gamma)\, d\widehat{G}(x, y),$$
and the $\psi$-function for the MDPDEs as given by Equation (12) satisfies
$$\psi(Y, X; \theta, \gamma) = \left( \frac{\partial V(Y, X; \theta, \gamma)}{\partial \theta},\ \frac{\partial V(Y, X; \theta, \gamma)}{\partial \gamma} \right). \tag{25}$$
Now, let us assume the following conditions:

(D1) The supports of the distributions $F_\theta$ and $F_{X,\gamma}$ for any value of $X$ are independent of the parameters $\theta$ and $\gamma$ respectively.
The true distribution $G(x, y)$ is also supported on the set $A = \{(x, y) : f_\theta(y|x) f_{X,\gamma}(x) > 0\}$, on which the true density $g$ is positive.

(D2) There exists an open subset $\omega$ of the parameter space that contains the best fitting parameter $(\theta_0, \gamma_0)$ and for all $(\theta, \gamma) \in \omega$ and for almost all $(x, y) \in A$, the densities $f_\theta$ and $f_{X,\gamma}$ are thrice continuously differentiable with respect to $\theta$ and $\gamma$ respectively.

(D3) The integrals $\iint f_\theta(y|x)^{1+\alpha} f_{X,\gamma}(x)^{1+\alpha}\, dx\, dy$ and $\iint f_\theta(y|x)^{\alpha} f_{X,\gamma}(x)^{\alpha}\, dG(x, y)$ can be differentiated three times and the derivatives can be taken under the integral sign. Further, the $\psi$-function under consideration is finite.

(D4) The matrix
$$\Lambda_G(\theta_0, \gamma_0) = \iint \left. \frac{\partial}{\partial(\theta, \gamma)} \psi(y, x, \theta, \gamma) \right|_{(\theta,\gamma) = (\theta_0,\gamma_0)} dG(x, y) = \iint \left. \frac{\partial^2}{\partial(\theta, \gamma)^2} V(y, x, \theta, \gamma) \right|_{(\theta,\gamma) = (\theta_0,\gamma_0)} dG(x, y)$$
exists finitely and is non-singular.

(D5) For all $(\theta, \gamma) \in \omega$, each of the third derivatives of $V(y, x, \theta, \gamma)$ with respect to $(\theta, \gamma)$ is bounded by a function of $(x, y)$, independent of $(\theta, \gamma)$, that has finite expectation with respect to the true distribution $G$.

Theorem 4.6
Under Assumptions (A1), (A2), (A5) and (D1)–(D5), there exists a sequence of solutions $\{(\hat{\theta}_n, \hat{\gamma}_n)\}$ of the minimum density power divergence estimating equations (9) and (10), with probability tending to one, that is consistent for the best fitting parameter $(\theta_0, \gamma_0)$. (Then, the asymptotic normality of this sequence $\{(\hat{\theta}_n, \hat{\gamma}_n)\}$ follows from Theorem 4.4 under the assumptions of that theorem.)

Proof:
We follow an argument similar to that in the proof of Theorem 6.4.1(i) of Lehmann (1983). Consider the behavior of $H_n(\theta, \gamma)$, as a function of $(\theta, \gamma)$, on a sphere $Q_a$ having center $(\theta_0, \gamma_0)$ and radius $a$. Then, to prove the existence part, it is enough to show that, for sufficiently small $a$,
$$H_n(\theta, \gamma) > H_n(\theta_0, \gamma_0), \tag{26}$$
with probability tending to one, for any point $(\theta, \gamma)$ on the surface of $Q_a$. Hence, for any sufficiently small $a > 0$, $H_n(\theta, \gamma)$ has a local minimum in the interior of $Q_a$ and the estimating equations of the MDPDE have a solution $(\hat{\theta}_n(a), \hat{\gamma}_n(a))$ within $Q_a$, with probability tending to one.

Now a Taylor series expansion of $H_n(\theta, \gamma)$ around $(\theta_0, \gamma_0)$ yields
$$\begin{aligned}
H_n(\theta_0, \gamma_0) - H_n(\theta, \gamma) ={}& -\sum_{i=1}^{q+r} (\zeta_i - \zeta_i^0) \left. \frac{\partial H_n(\theta, \gamma)}{\partial \zeta_i} \right|_{(\theta,\gamma) = (\theta_0,\gamma_0)} \\
& - \frac{1}{2} \sum_{i,j=1}^{q+r} (\zeta_i - \zeta_i^0)(\zeta_j - \zeta_j^0) \left. \frac{\partial^2 H_n(\theta, \gamma)}{\partial \zeta_i \partial \zeta_j} \right|_{(\theta,\gamma) = (\theta_0,\gamma_0)} \\
& - \frac{1}{6} \sum_{i,j,k=1}^{q+r} (\zeta_i - \zeta_i^0)(\zeta_j - \zeta_j^0)(\zeta_k - \zeta_k^0) \left. \frac{\partial^3 H_n(\theta, \gamma)}{\partial \zeta_i \partial \zeta_j \partial \zeta_k} \right|_{(\theta,\gamma) = (\theta^*,\gamma^*)} \\
={}& S_1 + S_2 + S_3, \text{ (say)},
\end{aligned} \tag{27}$$
where $\zeta_i$ and $\zeta_i^0$ are the $i$-th components of the parameter vectors $(\theta, \gamma)$ and $(\theta_0, \gamma_0)$ respectively for all $i = 1, \ldots, q + r$, and $(\theta^*, \gamma^*)$ lies in between $(\theta, \gamma)$ and $(\theta_0, \gamma_0)$ with respect to the Euclidean norm. By a direct extension of the arguments presented in the proof of Theorem 3.1 of Basu et al. (2006), we get, with probability tending to one, on $Q_a$,
$$|S_1| < (q + r)a^3 \ \text{ for all } a > 0, \qquad S_2 < -ca^2 \ \text{ for all } a < a_0 \text{ with some } c, a_0 > 0, \qquad |S_3| < ba^3 \ \text{ for all } a > 0 \text{ with some } b > 0,$$
using the assumptions (D1)–(D5) and Lemma 4.1 whenever necessary.
Combining these, we get
$$\max(S_1 + S_2 + S_3) < -ca^2 + (b + q + r)a^3,$$
which is less than zero whenever $a < \frac{c}{b + q + r}$, proving that (26) holds.

Finally, to show that one can choose a root of the estimating equations of MDPDEs independent of the radius $a$, consider the sequence of roots closest to the best fitting parameter $(\theta_0, \gamma_0)$, which exists by continuity of $H_n(\theta, \gamma)$ as a function of $(\theta, \gamma)$. This sequence will also be consistent, completing the proof of the theorem. $\square$

Note that Assumptions (D1)–(D5) are easier to check compared to the (stronger) Assumptions (A6)–(A9) and are the routine extensions of the corresponding assumptions [(A1)–(A5)] of Basu et al. (2006).

$\psi$-Functions

Although our main focus in this paper is to study one particular M-estimator, namely the minimum density power divergence estimator (MDPDE), it opens the scope of many different M-estimators through the general results derived in Section 4.1. This general framework of parameter estimation based on some suitable estimating equation is well studied in the case of complete data, and several optimum robustness properties of these M-estimators have been proved for different classes of weight functions; for example, see Huber (1981) and Hampel et al. (1986). In fact, there exist different classes of $\psi$-functions generating optimum solutions for different problems. For example, in the case of estimating the location parameter in a symmetric distribution, the $\psi$-functions that are odd in the targeted parameter lead to such optimum M-estimation.

However, as pointed out in Wang (1999), an optimum $\psi$-function for the complete data might not enjoy similar optimality for the censored data, even if there is no covariable present. The main reason is that the distributions of lifetime variables are usually not symmetric and do not belong to a location-scale family; rather, they are usually asymmetric.
The case of censored data with covariates, as considered here, is much more complicated and we cannot directly pick a $\psi$-function from the theory of complete data. Wang (1999) presented some examples of $\psi$-functions in the context of censored data with no covariates that can be extended to the present case with several covariables. However, their usefulness and optimality, both in terms of efficiency and robustness, need to be verified for the censored data cases with or without covariates. A lot of research is needed in this area to suggest an optimum $\psi$-function under any suitable criteria of robustness or efficiency based on censored data.

However, we believe that the minimum density power divergence estimator proposed here is quite sufficient for most practical situations since it produces highly robust estimators with only a slight loss in efficiency compared to the maximum likelihood estimator (as described in Section 7). Further, the estimating equation of MDPDEs can be solved by any simple numerical technique quite comfortably and has a simple interpretation in terms of the density power divergence. Thus, although some future research work may provide suitable $\psi$-functions satisfying some optimality criteria with a complicated form or estimation procedure, the MDPDE will still have its importance in many practical scenarios due to its simplicity.

Although the main focus of our paper is the MDPDE under the fully parametric set-up, the M-estimator defined in Definition 3.2 and its asymptotic theory derived in the previous section are completely general in the sense that they can also be applied to any semi-parametric or even non-parametric set-up. To see this, just note that the general M-estimator is defined in terms of a $\psi$ function that only needs to satisfy Equation (14).
Therefore, one can also consider $\psi$ functions, $\psi(y, x; \theta)$, involving no parametric assumptions on the distribution of $x$ (and hence independent of the parameter $\gamma$) and define the M-estimator as before based on the corresponding estimating equation; that estimator will also follow the general asymptotic theory developed in this paper. Further, in this case, we might generalize our requirement (14) for such $\psi$ functions by considering the integral with respect to only the conditional distribution $G_{Y|X}$ of $Y$ given $X$ as follows (since $\psi$ does not include any distributional part of $X$):
$$\int \psi(y, x; \theta)\, dG_{Y|X}(y) = 0. \tag{28}$$
In this general sense, the existing estimators of Zhou (2010) and Wang et al. (2015) become particular members of our class of general M-estimators with some specific choice of $\psi$ function without distributional assumptions on $X$. In particular, the choice
$$\psi(y, x; \theta) = \psi\left( y - x^T \theta \right), \tag{29}$$
under the set-up considered in Section 3.2 (except the distributional assumption on $X$) generates the estimator proposed in Zhou (2010). Then the asymptotic results of Zhou (2010) follow directly from our general theory of Section 4; in particular, Theorem 3.3 of Zhou (2010) follows from our Theorem 4.4.

Similarly, the proposal of Wang et al. (2015) can also be thought of as a special case of our general M-estimators under the set-up of Section 3.4 (except the distributional assumption on $X$) with the $\psi$ function ψ ( y, x ; θ ) = ψ ( ω ( x )( y − x t β ) σ ) xω ( x ) χ ( ω ( x )( y − x t β ) σ ) , (30) where χ ( s ) = sψ ( s ) − ω ( x ) is some suitable weights and ψ is some suitable function as given in Wang et al. (2015). Once again, all the asymptotic results of their paper follow from our general theory presented in Section 4. For example, Theorems 3.2 and 3.3 of Wang et al. (2015) follow from our Theorems 4.2 and 4.4, respectively, under the above mentioned set-up. A numerical comparison of our MDPDE with the estimator of Wang et al.
(2015) has been provided later in Section 7.2 through an interesting real data example.

However, the proposed MDPDE, a special M-estimator with the $\psi$ function given by (12), involves the assumed density of the covariates $X$. So it cannot be applied directly to the semi-parametric settings where no distributional assumption has been made. But we can easily extend our definition of MDPDE to the semi-parametric cases by considering the density power divergence between the conditional densities $f_\theta(Y|X)$ of $Y$ given $X$ instead of considering the joint density of $Y$ and $X$. The $\psi$ function corresponding to this extended MDPDE under the semi-parametric set-up can be seen to have the form
$$\psi_\alpha(Y, X; \theta) = \tilde{\zeta}_\theta(X) - u_\theta(Y, X)\, f_\theta(Y|X)^{\alpha}, \tag{31}$$
where $\tilde{\zeta}_\theta(x) = \int u_\theta(y, x)\, f_\theta(y|x)^{1+\alpha}\, dy$. Clearly, this $\psi$ function, $\psi_\alpha(y, x; \theta)$, corresponding to the extended MDPDE satisfies the stronger condition (28) and hence also satisfies (14). Thus, all the properties derived in Section 4 continue to hold under suitable modification for the semi-parametric set-up. With this modification, the proposed MDPDE can now be applied to any semi-parametric set-up including the linear regression set-up of Zhou (2010), as considered in Section 3.2 with fully parametric assumptions.

The influence function (Hampel et al., 1986) of an estimator is a popular tool to measure its classical robustness properties. It measures the stability of the estimator under infinitesimal contamination, yielding a first order approximation of the bias due to that small contamination in the data.
More precisely, if $T_\psi(G) = (T_\psi^\theta(G), T_\psi^\gamma(G))$ denotes the statistical functional for the M-estimator corresponding to $\psi$ (which satisfies Equation (14)), then the influence function of this estimator is defined as
$$IF((y_0, x_0); T_\psi, G) = \left. \frac{\partial}{\partial \epsilon} T_\psi(G_\epsilon) \right|_{\epsilon = 0} = \lim_{\epsilon \downarrow 0} \frac{T_\psi(G_\epsilon) - T_\psi(G)}{\epsilon},$$
where $G_\epsilon = (1 - \epsilon) G + \epsilon \wedge_{(x_0, y_0)}$ is the contaminated distribution with $\epsilon$ being the contamination proportion and $\wedge_{(x_0, y_0)}$ being the degenerate distribution at the contamination point $(x_0, y_0)$. If the influence function is bounded in the contamination points $(x_0, y_0)$, the bias under infinitesimal contamination cannot become arbitrarily large even when the contamination is very far from the data center; hence the estimator will be robust with respect to the data contamination.

A straightforward albeit lengthy differentiation of the estimating equation (Equation (14) with $G$ replaced by $G_\epsilon$ and $(\theta, \gamma)$ replaced by $T_\psi(G_\epsilon)$) yields the form of the influence function of our general M-estimators, which is presented in the following theorem.

Theorem 6.1 Under the above mentioned set-up,
$$IF((y_0, x_0); T_\psi, G) = \Lambda_G(T_\psi^\theta(G), T_\psi^\gamma(G))^{-1}\, \psi(y_0, x_0; T_\psi^\theta(G), T_\psi^\gamma(G)). \tag{32}$$

Clearly, whenever we choose the $\psi$ function to be bounded with respect to $y$ and $x$, the influence function of the corresponding M-estimator will be bounded in both $y_0$ and $x_0$; hence the estimator will be robust with respect to both outliers $y_0$ in the response variable and leverage points $x_0$ in the explanatory variables.
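The role of boundedness in $y$ can be checked numerically on a small example. The sketch below evaluates the conditional $\psi$ function of the form (31) for an exponential regression model with a single covariate and conditional mean $e^{x\theta}$. The closed forms used for the score $u_\theta$ and for the integral $\tilde{\zeta}_\theta$ were worked out for this illustration under that assumed model, and the function name `psi_alpha` and the grid of responses are hypothetical.

```python
import math


def psi_alpha(y, x, theta, alpha):
    """psi of the form (31) for Y | X = x ~ Exp(mean exp(x * theta)), scalar x.

    score = d/dtheta log f_theta(y | x) = (y / mu - 1) * x
    zeta  = integral of score * f^(1 + alpha) dy
          = -alpha * x * mu^(-alpha) / (1 + alpha)^2
    """
    mu = math.exp(x * theta)              # conditional mean of Y given X = x
    t = y / mu
    score = (t - 1.0) * x
    dens = math.exp(-t) / mu              # conditional density f_theta(y | x)
    zeta = -alpha * x * mu ** (-alpha) / (1.0 + alpha) ** 2
    return zeta - score * dens ** alpha


# Largest absolute value over a grid of increasingly extreme responses,
# at a fixed covariate value x = 1 and theta = 0.
grid = [0.0, 1.0, 3.0, 10.0, 1e2, 1e4, 1e6]
sup_robust = max(abs(psi_alpha(y, 1.0, 0.0, 0.5)) for y in grid)  # alpha = 0.5
sup_mle = max(abs(psi_alpha(y, 1.0, 0.0, 0.0)) for y in grid)     # alpha = 0
```

On this grid the value for $\alpha = 0.5$ stays below one however extreme the response, whereas at $\alpha = 0$ the function reduces to the negative score and grows linearly in $y$. The check is made at a fixed covariate value, so it says nothing about boundedness in $x$.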
However, if we have a $\psi$ function bounded only in $y$ and not in $x$ (like the $\psi$ functions of the classical M-estimators under normal linear regression without censoring), the resulting estimator will be robust only with respect to outliers in the response but may not be robust with respect to leverage points.

In particular, the above theorem also provides the influence function of the proposed MDPDE under fully parametric models by just using the $\psi$ function given in (12). Note that, for most common parametric models, this particular $\psi$ function is bounded in both $y$ and $x$ whenever $\alpha > 0$, so that the MDPDEs with $\alpha > 0$ are robust with respect to both outliers in the response and leverage points. However, at $\alpha = 0$, the $\psi$ function of the corresponding MDPDE (which is the same as the MLE) is proportional to the score functions, which are generally unbounded for most parametric models and demonstrate its non-robust nature.

However, the MDPDE under the semi-parametric extension has a different $\psi$ function, given in (31), which is not bounded in $x$ for all $\alpha \geq 0$ but is bounded in $y$ for $\alpha > 0$. Hence the semi-parametric MDPDE with $\alpha > 0$ will be robust with respect to outliers in the response but may not be robust with respect to leverage points.
Consider the exponential regression model with randomly censored data and normal covariables as discussed in the previous subsection. For the simulation exercise, we consider only one covariable, so that X is a univariate normal random variable with mean γ (scalar) and variance 1. A covariate sample of size n is generated from the N(γ, 1) distribution and, given the value x of the covariate, we simulate the (lifetime) response variable from an exponential distribution with mean θx under a random censoring scheme; the true values of the parameters are taken to be θ = 1 and γ = 5. Here we consider the simple exponential censoring distribution, with the censoring rate chosen to keep the expected proportion of censoring at 10% or 20% under the true distribution. Under the exponential censoring distribution with mean τ, i.e., C ∼ Exp(τ), the expected proportion of censoring under the true distribution Exp(θx) can be seen to be

\[ P(Y > C) = \frac{\theta x}{\tau + \theta x}. \]

So, to make this proportion equal to 10% or 20%, we need to take τ = 9θx or τ = 4θx respectively (with θ = 1 for our simulation study).

Then we compute the MDPDE of (θ, γ) numerically and repeat the process 1000 times to obtain the empirical estimates of the total absolute bias (sum of the absolute biases of θ and γ) and the total MSE (sum of the MSEs of θ and γ) of the MDPDE with respect to the target value (1, 5). The MDPDEs for different α are compared with one another and with the maximum likelihood estimator (MLE), which corresponds to α = 0, through these empirical biases and MSEs.

At first we consider only the pure sample without any contamination and compare the efficiencies of the MDPDEs for different α with that of the MLE (at α = 0). The empirical estimates of efficiency are computed from the total MSEs and are reported in Table 1 along with the total absolute bias for different α and different censoring proportions. It is clear from the table that the efficiency of the MDPDE decreases as α increases, but the loss in efficiency is modest at small positive values of α. Further, for any fixed α, both the total absolute bias and the total MSE increase as the censoring proportion increases.

Table 1: Empirical summary measures for the MDPDEs under no contamination

                                                  α
Measure               Cens. Prop.   0.00     0.01     0.10     0.30     0.50     0.70     1.00
Total Abs. Bias       10%           0.3848   0.3695   0.4067   0.4659   0.5145   0.5474   0.5801
                      20%           0.4260   0.4192   0.4616   0.5472   0.6122   0.6518   0.6890
Total MSE             10%           0.1363   0.1490   0.1577   0.1873   0.2198   0.2458   0.2773
                      20%           0.1860   0.1954   0.2078   0.2656   0.3097   0.3424   0.3782
Relative Efficiency   10%           100%     91%      86%      73%      62%      55%      49%
                      20%           100%     95%      90%      70%      60%      54%      49%
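The data-generating step of the simulation design described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the code used for the reported results: the function names `censoring_mean` and `simulate_censored_sample` are hypothetical, the rare non-positive covariate draws are simply redrawn (an assumption made here so that the exponential mean θx stays positive), and the numerical computation of the MDPDE itself is not shown.

```python
import random


def censoring_mean(mean_lifetime, cens_prop):
    """Mean tau of the exponential censoring time that gives
    P(Y > C) = cens_prop when Y is exponential with the given mean."""
    # Solve mean_lifetime / (tau + mean_lifetime) = cens_prop for tau.
    return mean_lifetime * (1.0 - cens_prop) / cens_prop


def simulate_censored_sample(n, theta=1.0, gamma=5.0, cens_prop=0.10, seed=0):
    """Return a list of (x, z, delta) triples, where z = min(y, c) and
    delta = 1 if the lifetime is observed (y <= c), 0 if it is censored."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.gauss(gamma, 1.0)
        while x <= 0.0:
            # Assumption: redraw non-positive covariates so that the
            # exponential mean theta * x is well defined.
            x = rng.gauss(gamma, 1.0)
        mean_y = theta * x
        tau = censoring_mean(mean_y, cens_prop)  # 9*theta*x or 4*theta*x
        y = rng.expovariate(1.0 / mean_y)  # expovariate expects a rate
        c = rng.expovariate(1.0 / tau)
        sample.append((x, min(y, c), 1 if y <= c else 0))
    return sample


if __name__ == "__main__":
    data = simulate_censored_sample(20000, cens_prop=0.10)
    censored = sum(1 - delta for _, _, delta in data) / len(data)
    print(f"empirical censoring proportion: {censored:.3f}")
```

With a large sample, the empirical censoring proportion should be close to the targeted 10% or 20%, which gives a quick check on the choice τ = 9θx or τ = 4θx.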
Next, to examine the robustness of the proposed MDPDEs over the MLE, we repeat the above simulation study but with 5%, 10%, 15% or 20% contamination in the response variable and the covariates. For contamination in the response variable, we generate observations from an Exp(5x) distribution (θ = 5) under the same censoring scheme as before; for contamination in the covariates, we simulate observations from another normal distribution with mean 10 and variance 1. The empirical bias and MSE of the estimators are reported in Tables 2 and 3 respectively. Note that the total absolute bias as well as the total MSE generally increase for any fixed α as the contamination proportion increases. However, these changes are rather drastic at smaller values of α and stabilize as α increases. In other words, the MDPDEs with larger α are more stable under contamination.

Table 2: Empirical total absolute bias of the MDPDEs under contamination

                                            α
Cens. Prop.   Cont. Prop.   0.00    0.01    0.10    0.30    0.50    0.70    1.00
10%           5%            0.513   0.506   0.392   0.270   0.220   0.199   0.186
              10%           0.741   0.667   0.437   0.258   0.223   0.216   0.217
              15%           1.093   1.011   0.794   0.554   0.518   0.534   0.578
              20%           1.594   1.476   1.169   1.009   0.878   0.816   0.781
20%           5%            1.006   0.953   0.747   0.651   0.723   0.786   0.856
              10%           0.759   0.706   0.522   0.412   0.393   0.405   0.415
              15%           0.865   0.790   0.719   0.559   0.481   0.449   0.436
              20%           1.090   1.000   1.102   1.166   1.092   1.035   1.003

Table 3: Empirical total MSE of the MDPDEs under contamination

                                            α
Cens. Prop.   Cont. Prop.   0.00    0.01    0.10    0.30    0.50    0.70    1.00
10%           5%            0.265   0.257   0.139   0.081   0.070   0.071   0.083
              10%           0.453   0.425   0.220   0.101   0.079   0.077   0.081
              15%           0.908   0.848   0.497   0.259   0.229   0.244   0.298
              20%           2.282   2.167   1.585   0.913   0.714   0.629   0.602
20%           5%            0.794   0.743   0.425   0.305   0.353   0.410   0.482
              10%           0.441   0.415   0.248   0.166   0.144   0.143   0.147
              15%           0.689   0.680   0.499   0.309   0.249   0.235   0.242
              20%           1.208   1.171   1.153   1.008   0.898   0.838   0.833
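One possible reading of the contamination scheme used in this robustness study is sketched below in Python. It is an illustration under assumptions that the text does not spell out: each observation is contaminated independently with probability equal to the contamination proportion, a contaminated observation has both its covariate drawn from N(10, 1) and its response drawn from Exp(5x), and the censoring mean is still computed from the true θ = 1 ("the same censoring scheme as before"). The function name `simulate_contaminated_sample` and the returned outlier flag are hypothetical.

```python
import random


def simulate_contaminated_sample(n, cont_prop, cens_prop=0.10, seed=0):
    """Return a list of (x, z, delta, is_outlier) tuples from the
    contaminated exponential regression model with random censoring."""
    true_theta, true_gamma = 1.0, 5.0   # target model
    cont_theta, cont_gamma = 5.0, 10.0  # contaminating distributions
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # Assumption: contaminate each observation independently.
        is_outlier = rng.random() < cont_prop
        theta = cont_theta if is_outlier else true_theta
        gamma = cont_gamma if is_outlier else true_gamma
        x = rng.gauss(gamma, 1.0)
        while x <= 0.0:
            # Assumption: redraw non-positive covariates.
            x = rng.gauss(gamma, 1.0)
        # Assumption: the censoring mean uses the true theta, so
        # tau = 9x (10% censoring) or tau = 4x (20% censoring).
        tau = true_theta * x * (1.0 - cens_prop) / cens_prop
        y = rng.expovariate(1.0 / (theta * x))  # lifetime with mean theta * x
        c = rng.expovariate(1.0 / tau)          # censoring time with mean tau
        sample.append((x, min(y, c), 1 if y <= c else 0, is_outlier))
    return sample


if __name__ == "__main__":
    data = simulate_contaminated_sample(10000, cont_prop=0.10)
    share = sum(flag for *_, flag in data) / len(data)
    print(f"share of contaminated observations: {share:.3f}")
```

Under this reading, the contaminated responses are censored more often than the clean ones, since their mean 5x is larger relative to the censoring mean.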
We will now apply our proposed MDPDEs to a real data example under semi-parametric model assumptions, which illustrates the performance of the semi-parametric extension described in Section 5 along with its applicability in real-life scenarios. The data set is from the Stanford heart transplant program described in detail in Clark et al. (1971) and contains the following survival information on 158 patients (Crowley and Hu, 1977; Escobar and Meeker Jr, 1992): ID number of the patient ("ID"), survival or censoring time ("TIME"), censoring status (dead or alive), patient's age at first transplant in years ("AGE") and the T5 mismatch score ("T5-MS"). The dataset, available from the 'survival' library of R, has been analyzed by many authors including Brown et al. (1973), Turnbull et al. (1974), Mantel and Byar (1974), and Miller and Halpern (1982). Recently, it has also been used to illustrate the performance of robust estimates under semi-parametric AFT models by Salibian-Barrera and Yohai (2008), Locatelli et al. (2011) and Wang et al. (2015). The latest robust estimator of Wang et al. (2015), namely the KMW-GM estimator, has been seen to work best for this dataset under the model

\[ \log(\mathrm{TIME}) = \beta_0 + \beta_1 (\mathrm{AGE}) + \beta_2 (\text{T5-MS}) + \sigma \epsilon. \]

Here, we apply our proposed MDPDE with different tuning parameters α to the same parametric model, with the assumption that ε ∼ N(0, 1), and illustrate the superior performance of our proposal over the KMW-GM estimator of Wang et al. (2015).

As noted in Wang et al. (2015), there are three potential outliers in the dataset, corresponding to the IDs 2, 16 and 21, where the patients have unexpectedly short survival times. This finding is consistent with the results from previous analyses of the dataset, so we also treat these three data points as outliers and compute our MDPDEs twice: once with the full data set and once after removing these outliers. However, since the estimates of the parameters (β_0, β_1, β_2, σ) differ only slightly between the two cases, we report the relative variation in the estimates in order to check the extent of their robustness. Following Wang et al. (2015), we define the relative variation as

\[ \text{Relative Variation} = \frac{|\hat{\theta}_{\text{full}} - \hat{\theta}_{\text{cleaned}}|}{|\hat{\theta}_{\text{full}}|}, \]

where θ̂_full is the estimated parameter value based on the full data set and θ̂_cleaned is the parameter estimate based on the cleaned data after removing the three outliers. The relative variations obtained for each of the parameters are reported in Table 4 for our proposed MDPDEs with different tuning parameters α and also for the KMW-GM estimator of Wang et al. (2015). It can be seen from the table that, for α ≥ 0.4, the MDPDEs of most of the parameters are much more stable and have less relative variation than the KMW-GM estimator of Wang et al. (2015); this shows the greater robustness of our proposal compared to the existing robust method. Further, note that the relative variation is quite high at α = 0, which corresponds to the non-robust maximum likelihood estimator. As α increases, the relative variations of all the parameters decrease significantly, which again shows the gain in robustness of our proposal with increasing α.

Table 4: The relative variation of the MDPDEs at different α and the KMW-GM estimates of Wang et al. (2015) with and without outliers for the Heart Transplant Data

α    β_0    β_1    β_2    σ

Choice of the tuning parameter α in MDPDE

A crucial issue in applying the proposed MDPDE to any real-life problem is the choice of the tuning parameter α. Since the robustness and efficiency of the MDPDEs depend crucially on α, it needs to be chosen carefully in practice, where we have no idea regarding the contamination and censoring proportions. The simulation study presented in Section 7 gives some indication in this direction. We have seen that the robustness of the MDPDEs improves as α increases. On the other hand, the efficiency of the MDPDEs under pure data is seen to decrease as α increases, but there is no significant loss in efficiency at smaller positive values of α near 0.3. So, we recommend using a value of the tuning parameter α near 0.3 to get a fair compromise between efficiency and robustness whenever the amount of contamination is not known in practice. This is in line with the empirical suggestions given by Basu et al. (2006) in the context of the MDPDE based on censored data with no covariables. However, these empirical suggestions need further justification based on more elaborate simulation studies and theoretical considerations. In the case of complete data, some such justifications of a data-driven choice of α are given by Hong and Kim (2001) and Warwick and Jones (2005). Their work might be generalized to the case of censored data, although this is not very easy, in order to solve this issue of selecting α.
We hope to pursue this in our future research.

The present paper proposes the minimum density power divergence estimator under the parametric set-up for censored data with covariables to generate highly efficient and robust inference. The applicability of the proposed technique is illustrated through appropriate theoretical results and a simulation exercise in the context of censored regression with stochastic covariates. Further, the paper provides the asymptotic theory for a general class of estimators based on the estimating equation, which opens the scope for studying many such estimators in the context of censored data in the presence of stochastic covariates.
Acknowledgments: The authors gratefully acknowledge the comments of two anonymous referees, which led to an improved version of the manuscript.
References

[1] Basu, S., Basu, A., and Jones, M. C. (2006). Robust and efficient parametric estimation for censored survival data. Annals of the Institute of Statistical Mathematics, 341–355.
[2] Basu, A., Harris, I. R., Hjort, N. L., and Jones, M. C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika, 549–559.
[3] Bednarski, T. (1993). Robust estimation in the Cox regression model. Scand. J. Statist., 213–225.
[4] Bednarski, T. and Borowicz, F. (2006). coxrobust: Robust Estimation in Cox Model. R package version 1.0.
[5] Begun, J. M., Hall, W. J., Huang, W. M., and Wellner, J. A. (1983). Information and Asymptotic Efficiency in Parametric-Nonparametric Models. Annals of Statistics, 432–452.
[6] Brown, B. W., Jr., Hollander, M., and Korwar, R. M. (1973). Nonparametric Test of independence for censored data with application to Heart Transplant Studies. Florida State University Conference on Reliability and Biometry.
[7] Buckley, J., and James, I. (1979). Linear regression with censored data. Biometrika, 429–436.
[8] Cai, Z. (1998). Asymptotic properties of Kaplan-Meier estimator for censored dependent data. Statistics and probability letters, 381–389.
[9] Campbell, G., and Földes, A. (1982). Large sample properties of nonparametric bivariate estimators with censored data. Nonparametric statistical inference, 103–121.
[10] Chen, Y. Y., Hollander, M., and Langberg, N. A. (1982). Small-sample results for the Kaplan-Meier estimator. Journal of the American Statistical Association, 141–144.
[11] Clark, D. A., Stinson, E. B., Griepp, R. B., Schroeder, J. S., Shumway, N. E., and Harrison, D. C. (1971). Cardiac Transplantation in Man. VI. Prognosis of Patients Selected for Cardiac Transplantation. Annals of Internal Medicine, 15–21.
[12] Collett, D. (2003). Modelling Survival Data in Medical Research. Chapman Hall, London, U.K.
[13] Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of Royal Statistical Society, Series B, 187–220.
[14] Cox, D. R., and Oakes, D. (1984). Analysis of Survival Data. Chapman Hall, London, U.K.
[15] Crowder, M. J., Kimber, A. C., Smith, R. L., and Sweeting, T. J. (1991). Statistical Analysis of Reliability Data. Chapman Hall, London, U.K.
[16] Crowley, J. and Hu, M. (1977). Covariance analysis of heart transplant survival data. Journal of the American Statistical Association, 27–36.
[17] Dabrowska, D. M. (1988). Kaplan-Meier estimate on the plane. The Annals of Statistics, 1475–1489.
[18] Escobar, L. A. and Meeker Jr, W. Q. (1992). Assessing influence in regression analysis with censored data. Biometrics, 507–528.
[19] Farcomeni, A. and Viviani, S. (2011). Robust estimation for the Cox regression model based on trimming. Biometrical Journal, 956–973.
[20] Ghosh, A., and Basu, A. (2013). Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electronic Journal of statistics, 2420–2456.
[21] Ghosh, A., and Basu, A. (2014). Robust Estimation in Generalized Linear Models: The Density Power Divergence Approach. Test, doi:10.1007/s11749-015-0445-3.
[22] Hampel, F. R., Ronchetti, E., Rousseeuw, P. J., and Stahel, W. (1986). Robust Statistics: The Approach Based on Influence Functions. New York, USA: John Wiley & Sons.
[23] Hong, C. and Kim, Y. (2001). Automatic selection of the tuning parameter in the minimum density power divergence estimation. Journal of the Korean Statistical Society, 453–465.
[24] Hosmer, D. W., Lemeshow, S. and May, S. (2008). Applied Survival Analysis: Regression Modeling of Time-to-Event Data. John Wiley & Sons.
[25] Huber, P. J. (1981). Robust Statistics. John Wiley & Sons.
[26] Kaplan, E. L., and Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American statistical association, 53(282), 457–481.
[27] Kim, M., and Lee, S. (2008). Estimation of a tail index based on minimum density power divergence. Journal of Multivariate Analysis, 2453–2471.
[28] Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis Techniques for Censored and Truncated Data, Second Edition. Springer-Verlag, New York.
[29] Kosorok, M. R., Lee, B. L. and Fine, J. P. (2004). Robust inference for univariate proportional hazards frailty regression models. Annals of Statistics, 1448–1491.
[30] Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, Second Edition. John Wiley & Sons, Inc. New York.
[31] Lee, S., and Song, J. (2009). Minimum density power divergence estimator for GARCH models. Test, 316–341.
[32] Lee, S., and Song, J. (2013). Minimum density power divergence estimator for diffusion processes. Annals of the Institute of Statistical Mathematics, 213–236.
[33] Lehmann, E. L. (1983). Theory of Point Estimation. John Wiley & Sons.
[34] Lo, S. H., Mack, Y. P., and Wang, J. L. (1989). Density and hazard rate estimation for censored data via strong representation of the Kaplan-Meier estimator. Probability theory and related fields, 461–473.
[35] Locatelli, I., Marazzi, A., and Yohai, V. J. (2011). Robust accelerated failure time regression. Computational Statistics and Data Analysis, 874–887.
[36] Mantel, N. and Byar, D. P. (1974). Evaluation of Response-Time data involving transient states: An illustration using Heart-Transplant data. Journal of the American Statistical Association, 81–86.
[37] Miller, R. and Halpern, J. (1982). Regression with censored data. Biometrika, 521–531.
[38] Peterson Jr, A. V. (1977). Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. Journal of the American Statistical Association, 72(360a), 854–858.
[39] Ritov, Y. (1986). Estimation in a Linear Regression Model with Censored Data. The Annals of Statistics, 303–328.
[40] Robins, J. M., and Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology, 297–331. Birkhäuser Boston.
[41] Satten, G. A., and Datta, S. (2001). The Kaplan-Meier estimator as an inverse-probability-of-censoring weighted average. The American Statistician, 207–210.
[42] Salibian-Barrera, M., and Yohai, V. J. (2008). High breakdown point robust regression with censored data. The Annals of Statistics, 118–146.
[43] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York, USA: John Wiley & Sons.
[44] Stute, W. (1993). Consistent estimation under random censorship when covariables are present. Journal of Multivariate Analysis, 89–103.
[45] Stute, W. (1995). The central limit theorem under random censorship. The Annals of Statistics, 422–439.
[46] Stute, W. (1996). Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics, 461–471.
[47] Stute, W., and Wang, J. L. (1993). The strong law under random censorship. The Annals of Statistics, 1591–1607.
[48] Turnbull, B. W., Brown, B. W., Jr., and Hu, M. (1974). Survivorship analysis of Heart Transplant data. Journal of the American Statistical Association, 74–80.
[49] Tsai, W. Y., Jewell, N. P., and Wang, M. C. (1987). A note on the product-limit estimator under right censoring and left truncation. Biometrika, 883–886.
[50] Van der Laan, M. J., and Robins, J. M. (2003). Unified methods for censored longitudinal data and causality. Springer.
[51] Wang, J. L. (1999). Asymptotic Properties of M-Estimators Based on Estimating Equations and Censored Data. Scandinavian journal of statistics, 297–318.
[52] Wang, M. C., Jewell, N. P., and Tsai, W. Y. (1986). Asymptotic properties of the product limit estimate under random truncation. The Annals of Statistics, 1597–1605.
[53] Warwick, J., and Jones, M. C. (2005). Choosing a robustness tuning parameter. Journal of Statistical Computation and Simulation, 581–588.
[54] Zhou, M. (1991). Some properties of the Kaplan-Meier estimator for independent nonidentically distributed random variables. The Annals of Statistics, 19(4)