[PDF] Robust Hypothesis Testing and Model Selection for Parametric Proportional Hazard Regression Models

Abstract

The semi-parametric Cox proportional hazards regression model has been widely used for many years in several applied sciences. However, a fully parametric proportional hazards model, if appropriately assumed, can often lead to more efficient inference. To tackle the extreme non-robustness of the traditional maximum likelihood estimator in the presence of outliers in the data under such fully parametric proportional hazard models, a robust estimation procedure has recently been proposed extending the concept of the minimum density power divergence estimator (MDPDE) under this set-up. In this paper, we consider the problem of statistical inference under the parametric proportional hazards model and develop robust Wald-type hypothesis testing and model selection procedures using the MDPDEs. We have also derived the necessary asymptotic results which are used to construct the testing procedure for general composite hypothesis and study its asymptotic powers. The claimed robustness properties are studied theoretically via appropriate influence function analyses. We have studied the finite sample level and power of the proposed MDPDE based Wald type test through extensive simulations where comparisons are also made with the existing semi-parametric methods. The important issue of the selection of appropriate robustness tuning parameter is also discussed. The practical usefulness of the proposed robust testing and model selection procedures is finally illustrated through three interesting real data examples.

Full PDF

RRobust Hypothesis Testing and Model Selection forParametric Proportional Hazard Regression Models

Amarnath Nandy , Abhik Ghosh , Ayanendranath Basu , Leandro Pardo Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India Department of Statistics and O.R. I, Complutense University of Madrid, 28040 Madrid, Spain

Abstract

The semi-parametric Cox proportional hazards regression model has been widely used for manyyears in several applied sciences. However, a fully parametric proportional hazards model, if appro-priately assumed, can often lead to more eﬃcient inference. To tackle the extreme non-robustnessof the traditional maximum likelihood estimator in the presence of outliers in the data under suchfully parametric proportional hazard models, a robust estimation procedure has recently been pro-posed extending the concept of the minimum density power divergence estimator (MDPDE) underthis set-up. In this paper, we consider the problem of statistical inference under the parametricproportional hazards model and develop robust Wald-type hypothesis testing and model selectionprocedures using the MDPDEs. We have also derived the necessary asymptotic results which areused to construct the testing procedure for general composite hypothesis and study its asymptoticpowers. The claimed robustness properties are studied theoretically via appropriate inﬂuence func-tion analyses. We have studied the ﬁnite sample level and power of the proposed MDPDE basedWald-type test through extensive simulations where comparisons are also made with the existingsemi-parametric methods. The important issue of the selection of appropriate robustness tuningparameter is also discussed. The practical usefulness of the proposed robust testing and modelselection procedures is ﬁnally illustrated through three interesting real data examples.

Keywords : Cox Regression; Minimum Density Power Divergence Estimator (MDPDE); ParametricSurvival Model, Robust Wald-type Test, Divergence Information Criterion (DIC).

Survival analysis is a major branch of statistics dealing with the time-to-event data where observationsare mostly right censored. There are diﬀerent censoring mechanisms that can be assumed dependingon the study design, its operation and data collection strategies. Among them, the most commonand simplest one is random censoring where the censoring time ( C ) is assumed to be distributedindependently of the actual lifetime (time to the event under consideration) variable ( T ). Such acensoring mechanism is commonly assumed for medical studies where the reason for patient dropoutsis often unknown. Given n independent randomly censored observations x , . . . , x n from such a study,we have x i = min ( t i , c i ), for each i = 1 , . . . , n , where t , . . . , t n are the independent and identicallydistributed (IID) realizations of T , and c , . . . , c n are the respective IID realizations of C . If weassume that the distribution functions of T and C are G T and G C , respectively, then the observed1 a r X i v : . [ s t a t . M E ] S e p ifetime variable X = min( T, C ) has distribution G X = 1 − (1 − G T ) (1 − G C ). Additionally, it isgenerally known whether an observation is censored or not; we denote this censoring information on n observations, respectively, by δ i = I ( t i ≤ c i ) for each i = 1 , . . . , n , and the associated random variableby δ = I ( T ≤ C ) , where I S denotes the indicator of an event S . In most practical applications, weadditionally have several related covariates and our aim is to explain (and often predict) the lifetime T from given values of the auxiliary variables Z ∈ R p . Let z i denotes the observed value of Z on the i -th observation for i = 1 , . . . , n . Then the widely used semi-parametric Cox proportional hazard modelrelates the lifetimes with the observed covariate values via its hazard rate. For the i -th observation,this is given by λ i ( t ) = λ ( t | Z = z i ) = λ ( t ) e β (cid:48) z i , i = 1 , . . . , n, (1)where the (unknown) regression coeﬃcient β ∈ R p takes into account the eﬀect of covariates and λ is the (unknown) hazard rate of T in the absence of covariates, known as the baseline hazard.Cox [12, 13] proposed the partial likelihood method for estimating the regression coeﬃcient in model(1) based on the observed data { ( x i , δ i , z i ) : i = 1 , , , ..., n } ; but the resulting estimator and thesubsequent inference are seen to be highly unstable under data contamination. Later several others[8, 35] have proposed appropriate robust estimates of β under the semi-parametric Cox model (1).In this paper, we consider a more eﬃcient fully parametric version of (1) as given by λ i, θ ( t ) = λ θ ( t | Z = z i ) = λ ( t, γ ) e β (cid:48) z i i = 1 , . . . , n (2)where λ ( t ; γ ) is a known parametric hazard function, deﬁned in terms of the unknown q -dimensionalparameter γ , and hence the full parameter vector is now given by θ = ( γ (cid:48) , β (cid:48) ) (cid:48) ∈ R p + q . Such afully parametric version of the Cox model is often referred to as the parametric proportional hazardsmodel and has several advantages as noted in [11, 21, 25, 27]. In practice, the parametric form for thebaseline hazard in (2) may often be assumed based on empirical investigations of the given data orfrom previous experiences with similar studies. The classical inference procedures for the parametricmodel in (2), based on the randomly censored observations, are formed using the maximum likelihoodestimators (MLEs) which are extremely non-robust against data contamination. However, moderncomplex datasets are frequently prone to outliers, and a robust procedure for inference would be ofextreme practical value under the parametric model (2). However, such a robustness considerationfor ﬁtting a regression model with randomly censored responses was lacking in the literature for along time except for a few scattered attempts for the accelerated failure time (AFT) models; see,e.g., [1, 3, 41]. For the parametric proportional hazard model (2), Ghosh and Basu [21] have recentlydeveloped a robust parameter estimation procedure extending the minimum density power divergence(DPD) approach of [4, 5]. In this paper, we follow up their work to further develop robust inference forhypothesis testing and model selection for the parametric model in (2) based on randomly censoredobservations.The MLE based classical Wald test is also extremely non-robust in the presence of outliers in thedata which has previously been noted by several authors [e.g., 29, 31] particularly for the Cox regressionmodel. However, in the context of censored data with covariates, the robust testing procedures areavailable only for the traditional semi-parametric Cox model in (1) and not for its parametric versiongiven in (2). When no additional covariate is available, a robust Wald-type testing procedure based onthe randomly censored observations has been developed more recently in [22]. Our aim, in this paper, is2o present similar robust Wald-type testing procedures for testing general parametric hypothesis underthe fully parametric proportional hazard regression model in (2), along with detailed investigation oftheir asymptotic and robustness properties. To deﬁne such tests, we will utilize the robust minimumDPD estimator (MDPDE) of the parameter vector θ developed in [21] for model (2). Once the generaltheory is developed, we will also illustrate our proposals for some special cases, such as the test forsigniﬁcance of any covariate and the test for selecting a proper parametric baseline hazard.Another important aspect of statistical inference is model selection , where we choose an appropriateparametric model from a pool of available candidates that best ﬁts the observed data. In case of ourparametric Cox regression via models of the form (2), we often need to select the best ﬁtted modelboth in terms of the covariates to be included (from a larger set of possible covariates) and the formof the baseline hazard function to use. The most common likelihood-based model selection criteria,such as AIC and BIC, are also severely non-robust under data contamination. Recently, a few robustmodel selection criteria have been discussed in the literature [see, e.g., 9, 43] but they all consideredthe semi-parametric Cox model (1). So, in this paper we also discuss a robust model selection criterionfor the parametric Cox regressions (2) extending the idea of divergence information criterion (DIC)based on the robust MDPDE of the model parameters. The DIC was originally introduced in [34]for the simple IID observations with complete data and later extended for non-homogeneous (butstill complete) data in [33]. Here, we extend the concept of DIC further for the randomly censoredresponse to select the best parametric proportional hazards regression model from the available set ofall possible models.The rest of the paper is organized as follows. In Section 2, for the sake of completeness, we presentthe MDPDE of θ under the parametric proportional hazards regression model (2) following [21]. TheWald-type test statistics, based on these MDPDEs, are presented in Section 3, together with theirasymptotic properties. In Section 4, we have derived the theoretical robustness properties of theproposed Wald-type test statistics along with the stability of its level and power via the inﬂuencefunction analysis. The model selection criterion, namely the MDPDE based DIC, is described inSection 5. In Section 6 and 8, we have provided the empirical results to illustrate our proposedinference methodologies based on extensive simulation studies and interesting real data examples,respectively. In Section 7, we have discussed the important issue of selecting the robustness tuningparameter in our proposal. Finally, Section 9 concludes the paper. For brevity, we present the proofsof all results in the Appendix. In the parametric proportional hazards model (2), given the covariate value Z = z i and the censoringindicator δ = δ i , the density function for the i -th observed lifetime x i is given by f i, θ ( x ) = f θ ( x | δ i , z i ) = (cid:16) λ ( x, γ ) e β (cid:48) z i (cid:17) δ i exp (cid:16) − Λ γ ( x ) e β (cid:48) z i (cid:17) , (3)where Λ γ ( x ) = (cid:90) x λ ( s, γ ) ds . Clearly, given z i and δ i , the variables X i s are independent but non-homogeneous; our aim is to model their true densities, say h i ( x ) = h ( x | δ = δ i , Z = z i ), by theparametric density function of the form f i, θ ( x ) given in (3), respectively, for every i = 1 , . . . , n . So,3e follow the approach of [17] to deﬁne the MDPDE of θ in the present scenario as (cid:98) θ n,α = (cid:16)(cid:98) γ (cid:48) n,α , (cid:98) β (cid:48) n,α (cid:17) (cid:48) = arg min θ n n (cid:80) i =1 d α ( (cid:98) h i , f i, θ ) (4)where d α is the DPD measure [4, 5] and (cid:98) h i is an empirical estimate of h i for each i = 1 , .., n . Uponsimpliﬁcation, the MDPDE can be obtained minimizing the simpler objective function H n,α ( θ ) = 1 n n (cid:80) i =1 (cid:20)(cid:90) f i, θ ( x ) α dx − αα λ θ ( x i | z i ) αδ i S θ ( x i | z i ) α + 1 α (cid:21) (5)with λ θ ( t | z i ) = λ θ ( t | Z = z i ) being as given in (2) and S θ ( t | z ) = S θ ( t | Z = z ) = exp (cid:16) − Λ γ ( t ) e β (cid:48) z (cid:17) . Remark 1

It is known that the DPD measure at α = 0 coincides with the Kullback-Leibler divergence( d KL). On the other hand, if L n ( θ ) denotes the likelihood function associated to the random sample ( x i , δ i , z i ) , i = 1 , . . . , n , it is not diﬃcult to establish that n log L n ( θ ) = k − n n (cid:80) i =1 d KL ( (cid:98) h i , f i, θ ) , where k is a constant independent of θ . Therefore, the MDPDE at α = 0 coincides with the MLE. Inthis sense the MDPDE can be considered as a natural extension of the MLE, obtained replacing theKullback-Leibler divergence by the DPD measure with some α > . Ghosh and Basu [21] studied the detailed properties of the MDPDE, obtained by minimizing H n,α ( θ ), under the parametric proportional hazards model (2) both theoretically and empirically. Forcompleteness, we here present their main theoretical results that would be need in the subsequent partsof the paper. Firstly, note that the score functions corresponding to the density f i, θ ( x ) = f θ ( x | δ i , z i )in (3) are given by u (1) θ ( x | δ i , z i ) = ψ γ ( x ) δ i − Ψ γ ( x ) e β (cid:48) z i , and u (2) θ ( x | δ i , z i ) = z i (cid:104) δ i − Λ γ ( x ) e β (cid:48) z i (cid:105) , where ψ γ ( x ) = ∂ log λ ( x, γ ) ∂ γ , and Ψ γ ( x ) = (cid:90) x ∂λ ( s, γ ) ∂ γ ds = (cid:90) x ψ γ ( s ) λ ( s, γ ) ds. Then, under standard diﬀerentiability assumptions, the MDPDE of θ , under the model (2), can beobtained by solving the following system of equations [21]:1 n n (cid:80) i =1 u (1 ,α ) n ( θ | x i , δ i , z i ) = q , n n (cid:80) i =1 u (2 ,α ) n ( θ | x i , δ i , z i ) = p , (6)where u (1 ,α ) n ( θ | x, δ, z ) = (cid:110) ψ γ ( x ) λ θ ( x | z ) α S θ ( x | z ) α − ( λ θ ( x | z ) α −

1) Λ γ ( x ) S θ ( x | z ) α e β (cid:48) z (cid:111) z δ − z Λ γ ( x ) S θ ( x | z ) α e β (cid:48) z − ξ (2 ,α ) ( θ | δ, z ) , (8)with ξ ( j,α ) ( θ | δ, z ) = (cid:90) u ( j ) θ ( x | δ, z ) f θ ( x | δ, z ) α dx, for j = 1 , . Let us denote the MDPDE, obtained by solving (6), as (cid:98) θ n,α . Its asymptotic distribution has beenstudied in [21] under some appropriate assumptions; we will refer to their Assumptions (A)–(E) asthe Ghosh-Basu Conditions throughout the rest of the paper. Besides the technical conditions, itincludes a crucial assumption about the structure of the underlying true distribution; Assumption(A), in particular, states that the true hazard rate of the i -th observation, given the covariate value Z = z i , is of form λ i ( s ) = λ ( s ) h ( z i ) for some positive functions λ and h . Under these conditions,the limiting form of the MDPDE estimating equations in (6), as n → ∞ , is given by [21] u (1 ,α )0 ( θ ) = q , u (2 ,α )0 ( θ ) = p , (9)where we assume that the process is observed till the maximum time-point T max and deﬁne u (1 ,α )0 ( θ ) = (cid:90) T max (cid:104) ψ γ ( s ) λ ( s, γ ) α (cid:110) r (0) α,α ( s ) λ ( s ) − (cid:101) q (0) α,α ( s ) λ ( s, γ ) (cid:111) − Ψ γ ( s ) (cid:110) q (0)0 ,α ( s ) − (cid:101) q (0) α,α ( s ) (cid:111) − λ ( s, γ ) α Ψ γ ( s ) (cid:110) r (0) α +1 ,α ( s ) λ ( s ) − (cid:101) q (0) α +1 ,α ( s ) λ ( s, γ ) (cid:111)(cid:105) ds, and u (2 ,α )0 ( θ ) = (cid:90) T max (cid:104) λ ( s, γ ) α (cid:110) r (1) α,α ( s ) λ ( s ) − (cid:101) q (1) α,α ( s ) λ ( s, γ ) (cid:111) − Λ γ ( s ) (cid:110) q (1)0 ,α ( s ) − (cid:101) q (1) α,α ( s ) (cid:111) − λ ( s, γ ) α Λ γ ( s ) (cid:110) r (1) α +1 ,α ( s ) λ ( s ) − (cid:101) q (1) α +1 ,α ( s ) λ ( s, γ ) (cid:111)(cid:105) ds, with (cid:101) q ( j ) α ,α ( s ) = E (cid:104) ( Z ) j e ( α +1) β (cid:48) z S θ ( s | Z ) α +1 (cid:105) r ( j ) α ,α ( s ) = E (cid:104) ( Z ) j I ( X ≥ s ) e α β (cid:48) z S θ ( s | Z ) α h ( Z ) (cid:105) q ( j ) α ,α ( s ) = E (cid:104) ( Z ) j I ( X ≥ s ) e ( α +1) β (cid:48) z S θ ( s | Z ) α (cid:105) for each j = 0 , α , α ≥

0. Here ( Z ) denotes 1 and hence (cid:101) q (0) α ,α , r (0) α ,α and q (0) α ,α areall scalars; however, (cid:101) q (1) α ,α , r (1) α ,α and q (1) α ,α are p -dimensional vectors. Furthermore, when the trueunderlying distribution has the hazard rate as the assumed model in (2), we always have (cid:101) q ( j ) α ,α = r ( j ) α ,α = q ( j ) α ,α for j = 0 ,

1. Accordingly, we deﬁne the corresponding Fisher consistent statisticalfunctional as the solution of (9). More precisely, if the observations ( x i , δ i , z i ), i = 1 , . . . , n , areassumed to be IID realizations of the random variable ( X, δ, Z ) having joint distribution H , the5inimum DPD functional U α ( H ), corresponding to the MDPDE (cid:98) θ n,α , as a solution to the limitingMDPDE estimating equations given in (9).The following theorem, taken from [21], present the asymptotic distribution of the MDPDE (cid:98) θ n,α under the parametric proportional hazards regression model (2). Theorem 2.1 ([21])

Suppose that the observed data ( x i , δ i , z i ) , i = 1 , ..., n, are IID realizations of therandom triplet ( X, δ, Z ) having the true joint distribution H such that the minimum DPD functional U α ( H ) = θ exists and is unique. Then, under Ghosh-Basu Conditions, there exists a consistentsequence of MDPDEs (cid:98) θ n,α = (cid:16)(cid:98) γ (cid:48) n,α , (cid:98) β (cid:48) n,α (cid:17) (cid:48) , as a solution to the estimating equations in (6), whichsatisﬁes √ n (cid:16)(cid:98) θ n,α − θ (cid:17) D −→ n −→∞ N ( q + p , Σ α ( θ )) , where d denotes the d -vector of zeros and Σ α ( θ ) = J − α ( θ ) K α ( θ ) J − α ( θ ) with J α ( θ ) = −  ∂ u (1 ,α )0 ( θ ) ∂ γ (cid:48) ∂ u (1 ,α )0 ( θ ) ∂ β (cid:48) ∂ u (2 ,α )0 ( θ ) ∂ γ (cid:48) ∂ u (2 ,α )0 ( θ ) ∂ β (cid:48)  and K α ( θ ) = Var H (cid:34) u (1 ,α ) n ( θ | X, δ, Z ) u (2 ,α ) n ( θ | X, δ, Z ) (cid:35) . We will utilize the asymptotic distribution of the MDPDE, as stated in the above theorem, todeﬁne and study the robust testing and model selection procedures for the parametric proportionalhazards model (2). For this purpose, we would need a consistent estimator of the asymptotic variancematrix Σ α ( θ ) of the MDPDE (cid:98) θ n,α . Such an variance estimate can be constructed as Σ n,α = J − n,α (cid:16)(cid:98) θ n,α (cid:17) K n,α (cid:16)(cid:98) θ n,α (cid:17) J − n,α (cid:16)(cid:98) θ n,α (cid:17) , (10)where J n,α ( θ ) and K n,α ( θ ) are some consistent (and continuous) estimator of the matrices J α ( θ )and K α ( θ ), respectively. Following the discussion in [21], we will particularly use their empiricalestimators as given by J n,α ( θ ) = − n n (cid:80) i =1  ∂ u (1 ,α ) n ( θ | x i ,δ i , z i ) ∂ γ (cid:48) ∂ u (1 ,α ) n ( θ | x i ,δ i , z i ) ∂ β (cid:48) ∂ u (2 ,α ) n ( θ | x i ,δ i , z i ) ∂ γ (cid:48) ∂ u (2 ,α ) n ( θ | x i ,δ i , z i ) ∂ β (cid:48)  , and K n,α ( θ ) = 1 n n (cid:80) i =1 (cid:34) u (1 ,α ) n ( θ | x i , δ i , z i ) u (1 ,α ) n ( θ | x i , δ i , z i ) (cid:48) u (1 ,α ) n ( θ | x i , δ i , z i ) u (2 ,α ) n ( θ | x i , δ i , z i ) (cid:48) u (2 ,α ) n ( θ | x i , δ i , z i ) u (1 ,α ) n ( θ | x i , δ i , z i ) (cid:48) u (2 ,α ) n ( θ | x i , δ i , z i ) u (2 ,α ) n ( θ | x i , δ i , z i ) (cid:48) (cid:35) . We now consider the problem of testing the general composite hypothesis given by H : θ ∈ Θ against H : θ / ∈ Θ , (11)6here Θ is a given subset of the parameter space Θ ⊆ R p + q . This null parameter space Θ is oftendeﬁned by a set of r ( ≤ p + q ) restrictions of the form m ( θ ) = r , (12)where m : R p + q → R r is a known function; that is, Θ = { θ ∈ Θ : m ( θ ) = r } . We assume that the( p + q ) × r matrix M ( θ ) = ∂ m (cid:48) ( θ ) ∂ θ exists, is continuous in θ , and rank ( M ( θ )) = r .An important example is the test for signiﬁcance of the regression model with the null hypothesis H : β = p , indicating no covariate eﬀect. This is a special case of (11) with r = p and m ( θ ) = β ,so that Θ = { ( γ, β ) ∈ Θ : m ( γ, β ) = β = p } and M = [ I p ; O ] (cid:48) ; here I p denotes the p × p identitymatrix and O is a null matrix of appropriate order.In the following deﬁnition we present the Wald-type test statistics for testing (11), utilizing theMDPDEs and the consistent estimator of their asymptotic variance matrix. Deﬁnition 3.1

The Wald-type test statistics for testing (11) under the parametric proportional hazardmodel (2) is given by W n (cid:16)(cid:98) θ n,α (cid:17) = n m (cid:48) ( (cid:98) θ n,α ) (cid:16) M (cid:48) ( (cid:98) θ n,α ) Σ n,α M ( (cid:98) θ n,α ) (cid:17) − m ( (cid:98) θ n,α ) , (13) where (cid:98) θ n,α is the MDPDE of θ obtained based on a randomly censored sample of size n and tuningparameter α ≥ , and the matrix Σ n,α is as deﬁned in (10). The following theorem presents the asymptotic null distribution of our Wald-type test statisticswhich can be used to determine the required critical values for the testing procedures; the proof isgiven in the Appendix.

Theorem 3.2

Suppose that the assumptions of Theorem 2.1 hold true under the parametric pro-portional hazards model (2) and the matrix Σ α ( θ ) is continuous in θ ∈ Θ . Then, the asymptoticdistribution of the Wald-type test statistic W n ( (cid:98) θ n,α ) , under the composite null hypothesis in (11), is achi-square with r degrees of freedom. Using the above theorem, the null hypothesis in (11) will be rejected, based on the Wald-type teststatistics W n ( (cid:98) θ n,α ), if W n ( (cid:98) θ n,α ) > χ r,τ , (14)where χ r,τ is the 100(1 − τ ) percentile of a chi-square distribution with r degrees of freedom. Let us now study the power of the MDPDE based Wald-type tests for testing the hypothesis in (11)under the parametric proportional hazards regression model (2). Our ﬁrst theorem helps to obtainan approximation to the power function of the Wald-type tests of the form (14); see Appendix for itsproof. 7 heorem 3.3

Let θ ∗ ∈ Θ \ Θ be the true value of the parameter so that the MDPDE (cid:98) θ n,α P −→ n −→∞ θ ∗ . Denote, (cid:96) ∗ ( θ , θ ) = n m (cid:48) ( θ ) (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − m ( θ ) . Then √ n (cid:16) (cid:96) ∗ ( (cid:98) θ n,α , (cid:98) θ n,α ) − (cid:96) ∗ ( θ ∗ , θ ∗ ) (cid:17) L −→ n −→∞ N (cid:0) , σ W n ( θ ∗ ) (cid:1) where σ W n ( θ ∗ ) = (cid:18) ∂(cid:96) ∗ ( θ , θ ∗ ) ∂ θ (cid:19) (cid:48) θ = θ ∗ Σ α ( θ ∗ ) (cid:18) ∂(cid:96) ∗ ( θ , θ ∗ ) ∂ θ (cid:19) (cid:48) θ = θ ∗ . Remark 2

Based on Theorem 3.3, we approximate the power function, say β W n ( θ ∗ ) , of the Wald-typetest statistics in (13), at the parameter value θ ∗ ∈ Θ \ Θ as β W n ( θ ∗ ) ∼ = 1 − Φ (cid:32) √ nσ W n ( θ ∗ ) (cid:32) χ r,α n − (cid:96) ∗ ( θ ∗ , θ ∗ ) (cid:33)(cid:33) . (15) Here

Φ ( x ) is the distribution function of a normal random variable with mean zero and variance 1.It implies that lim n →∞ β W n ( θ ∗ ) = 1 , i.e. the Wald-type test statistics considered in (13) are consistent in the sense of Fraser [15]. Remark 3

Based on (15), we can further compute the required sample size in order to achieve atargeted power value. If we want a power of β W n ( θ ∗ ) = π, we need a sample of size n = [ n ∗ ] + 1 ,where [ x ] is the largest integer less than or equal to x , and n ∗ = A + B + (cid:112) A ( A + 2 B )2 (cid:96) ∗ ( θ ∗ , θ ∗ ) , with A = σ W n ( θ ∗ ) (cid:0) Φ − (1 − π ) (cid:1) and B = χ r,α (cid:96) ∗ ( θ ∗ , θ ∗ ) . Next, we derive the asymptotic power of the Wald-type test statistics considered in (13) at acontiguous sequence of alternative hypotheses close to the null hypothesis. Let θ n ∈ Θ \ Θ be agiven contiguous alternative, and let θ be the element in Θ closest to θ n in terms of the Euclideandistance. One possibility to introduce such contiguous alternative hypotheses, in this context, is toconsider a ﬁxed non-zero d ∈ R p + q and permit θ n to move towards θ ∈ Θ as n increases, throughthe relation H ,n : θ = θ n , where θ n = θ + n − / d . (16)A second approach is the relaxation of the condition m ( θ ) = r that deﬁnes Θ . Let δ ∈ R r andconsider the sequence of parameters { θ n } moving towards θ ∈ Θ according to the set up H ∗ ,n : θ = θ n , where m ( θ n ) = n − / δ . (17)Note that a Taylor series expansion of m ( θ n ) around θ yields m ( θ n ) = m ( θ ) + M (cid:48) ( θ ) ( θ n − θ ) + o ( (cid:107) θ n − θ (cid:107) ) . (18)8y substituting θ n = θ + n − / d in (18) and taking into account that m ( θ ) = r , we get m ( θ n ) = n − / M (cid:48) ( θ ) d + o ( (cid:107) θ n − θ (cid:107) ) . (19)So, the equivalence relationship between the hypotheses H ,n and H ∗ ,n is obtained as δ = M (cid:48) ( θ ) d , as n → ∞ . (20)In the following theorem we present the asymptotic distribution of the Wald-type test statistics underthese contiguous alternative hypotheses H ,n and H ∗ ,n , which can be used to compute the correspond-ing asymptotic power functions of the proposed Wald-type tests. Here, χ r ( b ) will denote a non-centralchi-square random variable with degrees of freedom r and non-centrality parameter b . Theorem 3.4

Assume that the conditions of Theorem 2.1 hold. Then, the asymptotic distribution of W n ( (cid:98) θ n,α ) is given by W n ( (cid:98) θ n,α ) L −→ n →∞ χ r (cid:16) d (cid:48) M ( θ ) (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − M (cid:48) ( θ ) d (cid:17) , (21) under H ,n given in (16), and by W n ( (cid:98) θ n,α ) L −→ n →∞ χ r (cid:16) δ (cid:48) (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − δ (cid:17) , (22) under H ∗ ,n given in (17). Remark 4

The result presented under the contiguous alternative hypotheses H ,n in (16) can also beused to give an approximation of the power function at any θ ∗ . For this purpose, we express θ ∗ as θ ∗ = θ + ( θ ∗ − θ ) = θ + 1 √ n √ n ( θ ∗ − θ ) . and then apply the results from Theorem 3.4 with d = √ n ( θ ∗ − θ ) to obtain the desired powerapproximation. The inﬂuence function (IF) is a classical tool for studying local robustness of any statistical functionalunder inﬁnitesimal contamination at a distant outlying point [23, 38, 39]. Here, we will use it to theo-retically justify the claimed robustness of our proposed Wald-type test statistics under the parametricproportional hazards model (2). For this purpose, we ﬁrst need to deﬁne the statistical functionalassociated with the Wald-type test statistics deﬁned in (13).Let us continue with the notation and assumptions of Section 2. In particular, assume that theobservations ( x i ; δ i ; z i ), i = 1 , . . . , n , are IID realizations of ( X, δ, Z ) having true joint distribution H , and U α ( H ) denotes the MDPDE functional at H having a tuning parameter α ≥

0. Using it, we9eﬁne the statistical functional associated with the Wald-type test statistics for testing the compositehypothesis (11), evaluated at H , as given by (ignoring the multiplier n ) W α ( H ) = m (cid:48) ( U α ( H )) (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − m ( U α ( H )) . (23)Next, we consider the contaminated distribution H (cid:15) = (1 − (cid:15) ) H + (cid:15) ∧ y t , where there is 100 (cid:15) % contam-ination by the degenerate distribution ∧ y t at a contamination point y t = ( x t , δ t , z t ). Then, the r -thorder IF of our proposed Wald-type test statistic is deﬁned as ( r = 1 , , . . . ) IF r ( y t , W α , H ) = ∂ r ∂(cid:15) r W α ( H (cid:15) ) | (cid:15) =0 . We can follow the general arguments presented in [19] to compute the above IF of the MDPDEbased Wald-type tests for the present case of parametric Cox regression model. In particular, for anynull parameter value θ ∈ Θ , let us denote the joint distribution of ( X, δ, Z ) to be H θ , under which x i , given δ i and z i , follows the model density f i, θ as in (3), for each i = 1 , . . . , n . Then, one can easilyshow that, at this null distribution H = H θ , the ﬁrst order IF of the Wald-type test functional W α is identically zero and the second order IF has the form IF ( y t , W α , H θ ) = 2 IF (cid:48) ( y t , U α , H θ ) N ( θ ) IF ( y t , U α , H θ ) , (24)where N ( θ ) = M (cid:0) θ (cid:1)(cid:2) M (cid:48) (cid:0) θ (cid:1) Σ α (cid:0) θ (cid:1) M (cid:0) θ (cid:1)(cid:3) − M (cid:48) (cid:0) θ (cid:1) and IF ( y t , U α , H θ ) denotes the IF ofthe MDPDE functional U α having the form [21] IF ( y t , U α , H θ ) = J − α ( θ ) (cid:32) u (1 ,α ) n ( θ | y t ) u (2 ,α ) n ( θ | y t ) (cid:33) , (25)with u (1 ,α ) n ( θ | x, δ, z ), u (2 ,α ) n ( θ | x, δ, z ) and J α ( θ ) being as deﬁned in Section 2.We can clearly observe that, due to the presence of the terms λ ( x t , γ ) α , e α β (cid:48) z t and S θ ( x t | z t ) α in u ( j,α ) n ( θ | x, δ, z ), j = 1 ,

2, the above IF of the MDPDE remains bounded at contamination points forany α >

0. Therefore the second order IF of the Wald-type test statistic also remains bounded in y t for all α >

0, which indicates the desired robustness properties for our proposed class of tests.

We now study the robustness of the level and power of the proposed Wald-type tests through thecorresponding inﬂuence functions [24]. These were studied for general MDPDE based Wald-type testswith IID complete data in [19] which we will extend here for the randomly censored data.To start with, we compute the asymptotic level and power against the contiguous alternativesof the form (16), along with the additional contamination, respectively, through the distributions H L(cid:15), y t = (cid:16) − (cid:15) √ n (cid:17) H θ + (cid:15) √ n ∧ y t and H P(cid:15), y t = (cid:16) − (cid:15) √ n (cid:17) H θ n + (cid:15) √ n ∧ y t , where θ ∈ Θ is the assumed true(null) parameter value and y t = ( x t , δ t , z t ) is the contamination point as in the previous subsection.Also, let us deﬁne the following quantities T W n ( (cid:15), y t ) = lim n →∞ P H L(cid:15), y t (cid:0) W n ( θ ) > χ r,τ (cid:1) , W n ( θ n , (cid:15), y t ) = lim n →∞ P H P(cid:15), y t (cid:0) W n ( θ ) > χ r,τ (cid:1) . Then, the level inﬂuence function (LIF) and the power inﬂuence function (PIF), for our proposedrobust Wald-type test statistic W n ( θ ) for testing the hypothesis in (11) at the level of signiﬁcance τ ,are deﬁned as LIF ( y t , W n , H θ ) = ∂∂(cid:15) T W n ( (cid:15), y t ) | (cid:15) =0 , and P IF ( y t , W n , H θ ) = ∂∂(cid:15) β W n ( θ n , (cid:15), y t ) | (cid:15) =0 . So, we ﬁrst need to ﬁnd the asymptotic distribution of the Wald-type test statistic W n under thecontiguous contaminated distribution H P(cid:15), y t which is presented in the following theorem; its proof ispresented in Appendix. Theorem 4.1

Under the assumptions of Theorem 2.1. the asymptotic distribution of W n ( (cid:98) θ n,α ) underthe distribution H P(cid:15), y t is a non-central χ with degrees of freedom r and non-centrality parameter d ∗ = d (cid:48) (cid:15), y t ,α ( θ ) N ( θ ) d (cid:15), y t ,α ( θ ) , (26) where d (cid:15), y t ,α ( θ ) = d + (cid:15)IF ( y t , U α , H θ ) . Now, using the inﬁnite series expansion of a non-central chi-square distribution function in termsof that of the central chi-square variables, we can deduce from Theorem 4.1 that β W n ( θ n , (cid:15), y t ) = lim n →∞ P H P(cid:15),yt (cid:0) W n ( θ ) > χ r,τ (cid:1) ∼ = P (cid:0) χ r ( d ∗ ) > χ r,τ (cid:1) = ∞ (cid:88) v =0 C v (cid:16) M (cid:48) ( θ ) d (cid:15), y t ,α ( θ ) , (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − (cid:17) P (cid:0) χ r +2 v > χ r,τ (cid:1) , (27)where C v ( t , A ) = ( t (cid:48) At ) v v !2 v e − t (cid:48) At . Remark 5

If we put (cid:15) = 0 in (27), we get the asymptotic power of the proposed Wald-type tests underthe contiguous alternatives in (16) as given by β α ( θ n ) = β W n ( θ n , , y t ) ∼ = ∞ (cid:88) v =0 C v (cid:16) M (cid:48) ( θ ) d , (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − (cid:17) P (cid:0) χ r +2 v > χ r,τ (cid:1) . This result can also be obtained using the result (21) stated in Theorem 3.4.

Next, substituting d = in Theorem 4.1, we get the following corollary in a straightforwardmanner. 11 orollary 4.2 Under the assumptions of Theorem 2.1, the asymptotic distribution of W n ( (cid:98) θ n,α ) underthe distribution H L(cid:15),y t is non-central chi-square with degrees of freedom r and non-centrality parameter (cid:15) IF ( y t , U α , H θ ) (cid:48) N ( θ ) IF ( y t , U α , H θ ) . Thus, the asymptotic level of our MDPDE-based Wald-type tests, under contiguous contamination, isgiven by T W n ( (cid:15), y t ) = β W n ( θ , (cid:15), y t ) ∼ = ∞ (cid:88) v =0 C v (cid:16) (cid:15) M (cid:48) ( θ ) IF ( y t , U α , H θ ) , (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − (cid:17) P (cid:0) χ r +2 v > χ r,τ (cid:1) . Now, we can easily obtain the LIF and PIF of the proposed Wald-type test statistics via standarddiﬀerentiation, using Theorem 4.1 and Corollary 4.2, which are presented in the following theorem.See the Appendix for a detailed proof.

Theorem 4.3

Assume that the conditions of Theorem 4.1 is satisﬁed and that the IF of the MDPDEused is bounded. Then, for the proposed Wald-type test, the LIF of any order becomes identically zeroand the PIF at the signiﬁcance level τ has the form P IF (cid:0) y t , W n , H θ (cid:1) = C ∗ r (cid:0) d (cid:48) N (cid:0) θ (cid:1) d (cid:1) d (cid:48) N (cid:0) θ (cid:1) IF ( y t , U α , H θ ) , (28) where C ∗ r (cid:0) s (cid:1) = e − s (cid:80) ∞ v =0 s v − − v (2 v − s ) P (cid:0) χ r +2 v > χ r,τ (cid:1) /v ! . Note that the PIF of our proposed MDPDE-based Wald-type test is a linear function of the IF ofthe corresponding MDPDE as given in (25). As a result, the PIF would be bounded for all α >

In order to better understand the implications of the general results derived above, let us now discuss,in detail, a particular example of the testing problem under the parametric Cox regression model. Forsimplicity, consider the exponential baseline hazard λ ( t, γ ) = γ , with γ >

0, and only one covariate,so that the full parameter vector is given by θ = ( β, γ ), with β ∈ R , γ ∈ R + . Suppose that we areinterested in testing H : β = β against H : β (cid:54) = β , (29)for a pre-ﬁxed real number β . Note that, here γ is an unknown nuisance parameter. If we consider β = 0, the corresponding testing problem becomes that of testing for the signiﬁcance of the covariate.Clearly, the simpler testing problem in (29) belongs to the general class of hypotheses in (11), with r = 1, Θ = (cid:8) θ = ( γ, β ) T : β = β , γ ∈ R + (cid:9) , m ( θ ) = β − β and M ( θ ) = (0 , (cid:48) .If we denote the MDPDE with tuning parameter α by (cid:98) θ n,α = (cid:0)(cid:98) γ n,α , (cid:98) β n,α (cid:1) , then our proposedWald-type test statistic (13) for testing the hypothesis in (29) simpliﬁes to W n = W n ( (cid:98) θ n,α ) = n (cid:16) (cid:98) β n,α − β (cid:17) Σ (22) n,α ( (cid:98) θ n,α ) , (30)12here Σ (22) n,α ( θ ) is the (2 , × Σ n,α ( θ ) of the MDPDE used.Then, under the null hypothesis in (29), W n asymptotically follows χ distribution and the test canbe performed by comparing W n with the corresponding critical value. Further, the approximateexpression of power function at the contiguous alternative hypotheses of the form H ,n : β = β + n − / d , with d ∈ R , is given by β α (cid:0) θ n (cid:1) = 1 − G χ ,δ (cid:0) χ ,τ (cid:1) , with δ = d Σ (22) n,α ( θ ) , θ ∈ Θ , where G χ ,δ is the cumulative distribution function of a non-central χ ( δ ) random variable. We havenumerically computed the asymptotic contiguous powers for testing the hypothesis in (29) at 5% levelof signiﬁcance, taking β = 1 and θ = (1 , T , for diﬀerent values of α ; the results are presented inTable 1. It is clear that the asymptotic contiguous power of our proposed Wald-type tests (under puredata) decreases as α increases but this loss is not signiﬁcant at smaller α > β = 0 at 5%level of signiﬁcance with θ = (1 , T and diﬀerent values of α and dαd IF (cid:0) y t , W n , H θ (cid:1) = 2 IF (cid:0) y t , U ( β ) α , H θ (cid:1) / Σ (22) n,α ( θ ) . (31) P IF ( y t , W n , H θ ) = C ∗ (cid:16) d / Σ (22) n,α ( θ ) (cid:17) IF ( y t , U ( β ) α , H θ ) d/ Σ (22) n,α ( θ ) , (32)where y t is the contamination point, U ( β ) α is the MDPDE functional corresponding to β and its IF IF ( y t , U ( β ) α , H θ ) is given by the second component of the 2-dimensional IF vector of the MDPDEin (25). For illustration, we have plotted these IF and PIF over the contamination points ( x t , z t )within y t for diﬀerent values of tuning parameter α and δ t = 1 in Figures 1 and 2, respectively; theresults for δ t = 0 have similar patterns and, hence, they are not presented for brevity. For plottingthese IFs, we have taken β = 1, θ = (1 , T , d = 0 . n = 50 observations are drawn from N (1 ,

1) distribution as covariate, along with the incorporation of 10% uniform censoring. It is clearlyevident that the classical MLE based Wald test statistic has unbounded second order IF and its PIFis also unbounded indicating its extreme non-robust nature even under inﬁnitesimal contamination.13 a) α = 0 (b) α = 0 . α = 0 . α = 0 . Figure 1: Second order inﬂuence function for the MDPDE-based Wald-type test statistics for testing β = 1, under the parametric Cox regression model with θ = (1 , T , when δ t = 1.But, the MDPDE based Wald-type test statistic (30) has bounded second order IF as well as boundedPIF for all α >

0, which justiﬁes their desired robustness properties. Further, the extent of robustnessincreases with the value of α , indicated by the re-descending natures of all these IFs. We now discuss a robust model selection procedure under the parametric proportional hazard model(2), based on our MDPDE discussed in Section 2. In particular, we extend the idea of the divergenceinformation criterion (DIC), which was originally discussed in [33, 34] for IID complete data andrecently extended for non-homogeneous (complete) data in [30].Recall that, in our present case, given the values of z i and δ i , the observed lifetime variable X i hastrue densities h i ( x ), for i = 1 , . . . , n , which we model respectively by the parametric density f i, θ ( x )given in (3). Now, suppose that there are l many candidate models available for a given set of observeddata where the s -th model is denoted as { M ( s )1 , M ( s )2 , ..., M ( s ) n } for s = 1 , . . . , l . Also, assume that, foreach s , the corresponding model of the i -th observation, namely M ( s ) i is parametrized by a densityfunction of the form f i, θ ( x ), as in (3), with possibly diﬀerent θ for diﬀerent s .Under such a set-up, the computation of the usual model selection criterion AIC uses the KL14 a) α = 0 (b) α = 0 . α = 0 . α = 0 . Figure 2: Power inﬂuence function for the MDPDE-based Wald-type tests for testing β = 1, underthe parametric Cox regression model with θ = (1 , T and d = 0 . δ t = 1.divergence to quantify the discrepancy between the true distribution h i and parametric family ofmodels f i, θ . Here, we will extend it by considering the DPD measure d α in place of the KL divergenceto deﬁne the DIC. Note that, as discussed in Section 2, minimizing the DPD measure between the dataand the postulated model amount to the minimization of the simpler objective function H n,α ( θ ) asdeﬁned in (5). Thus, our robust model selection criterion should then select the parametric model forwhich E X ,...,X n [ H n,α ( (cid:98) θ n,α )] will be minimized, where (cid:98) θ n,α denotes the corresponding MDPDE withtuning parameter α . Since exact computation is not generally possible, we ﬁrst ﬁnd an asymptoticunbiased estimator of E X ,...,X n [ H n,α ( (cid:98) θ n,α )] and then minimize this estimator to select the optimummodel robustly. This leads to the following deﬁnition of the DIC, the resulting information criterionfor MDPDE-based robust model selection. Deﬁnition 5.1

Let, { M ( s )1 , M ( s )2 , ..., M ( s ) n } , s = 1 , . . . , l , be the set of l candidate models for theobservations x i for i = 1 , . . . , n , where each of them are the parametric proportional hazards modelhaving densities as in (2). Then, we select models ( M ∗ , M ∗ , ..., M ∗ n ) that satisfy ( M ∗ , M ∗ , ..., M ∗ n ) = min s DIC n,α (cid:16) M ( s )1 , M ( s )2 , ..., M ( s ) n (cid:17) here the divergence information criterion (DIC) is deﬁned as DIC n,α (cid:16) M ( s )1 , M ( s )2 , ..., M ( s ) n (cid:17) = H n,α (cid:16)(cid:98) θ ( s ) n,α (cid:17) + α + 1 n trace (cid:16) K n,α (cid:16)(cid:98) θ ( s ) n,α (cid:17) J − n,α (cid:16)(cid:98) θ ( s ) n,α (cid:17)(cid:17) , (33) withe (cid:98) θ ( s ) n,α being the MDPDE of the common parameter θ under the s -th models (cid:16) M ( s )1 , M ( s )2 , ..., M ( s ) n (cid:17) ,and H n,α ( θ ) , K n,α ( θ ) and J n,α ( θ ) are as deﬁned in Section 2. That the DIC, as deﬁned in the above deﬁnition, is indeed an asymptotically unbiased estimatorof E X ,...,X n [ H n,α ( (cid:98) θ n,α )] can be easily shown by extending the arguments from [30, 34]; the followingtheorem summarizes the ﬁnal rigorous result. Theorem 5.2

Suppose that the assumptions of Theorem 2.1 hold for the randomly censored observa-tions ( x i , δ i , z i ) for i = 1 , . . . , n , and under the model ( M , M , · · · , M n ) , where M i corresponds to theparametric proportional hazards regression model having the conditional density as in (2), i = 1 , . . . , n .Denote the MDPDE of the parameter θ under this speciﬁed model by (cid:98) θ n,α Then, the DIC measure,

DIC n,α ( M , M , · · · , M n ) as deﬁned in (33), is asymptotically unbiased for E X ,...,X n [ H n,α ( (cid:98) θ n,α )] . Remark 6

When α → , then the MDPDE (cid:98) θ ( s ) n,α computed under the s -th model (cid:16) M ( s )1 , M ( s )2 , ..., M ( s ) n (cid:17) coincides with the corresponding MLE, say (cid:98) θ ( s ) . Thus, we have lim α → DIC n,α (cid:16) M ( s )1 , M ( s )2 , ..., M ( s ) n (cid:17) = − n log L n (cid:16)(cid:98) θ ( s ) (cid:17) + 1 n trace (cid:16) K n, (cid:16)(cid:98) θ ( s ) (cid:17) J − n, (cid:16)(cid:98) θ ( s ) (cid:17)(cid:17) , which is nothing but a constant multiple of the Takeuchi information criterion (TIC), a generalizedversion of the AIC [42]. Here we present some interesting ﬁndings from an extensive simulation study in order to examine theﬁnite-sample power and level of the proposed MDPDE based Wald-type tests in some speciﬁc casesunder the parametric Cox regression model. For the purpose of computation of the MDPDEs, in allour numerical illustrations, we have implemented a two stage optimization technique. At the ﬁrststage, for a ﬁxed baseline parameter value ( γ ), we have used the Newton-Raphson method to estimatethe regression coeﬃcients ( β ) and then, in the second stage, we used the inbuilt R function optim toestimate γ for the estimated (ﬁxed) value of β . These two steps are repeated iteratively until thevalues of both the estimates converges. Subsequent implementation of the testing and model selectionproposals are done in R; the unoptimized codes are available from the authors upon request.In all simulations, we have compared our proposed robust MDPDE-based Wald-type tests for dif-ferent α > α = 0). Additionally, we compare our proposal, when testing for the regression coeﬃcients, withthe Wald-type tests based on two semi-parametric estimates, namely the standard partial likelihoodestimate (PLE) and the robust estimator of Bednarski [8] (denoted as BRE), of the regression coeﬃ-cient. These semi-parametric estimates and their standard errors are computed using the R package coxrobust [10] which are then used to perform the appropriate tests in our simulations.16 .1 Testing for a regression coeﬃcient when baseline hazard is a constant For our ﬁrst illustrations, we consider the simple exponential baseline hazard, in the parametric Coxregression model (3), given by λ ( x, γ ) = γ ∈ [0 , ∞ ). So, our parameter of interest is the p + 1dimensional vector θ = (cid:0) γ, β (cid:48) (cid:1) (cid:48) . We have simulated samples of size n = 50 ,

100 and 200 from thismodel, with the covariates being drawn from the standard normal distribution, and with uniformcensoring of proportions 5% and 10%. We have considered three covariates ( p = 3) with the trueparameters γ = 1 and β = ( β , β , β ) (cid:48) = (1 , , (cid:48) . Then, we test for the hypothesis H : β = 1,using the proposed W n (cid:16)(cid:98) θ n,α (cid:17) , deﬁned in (13), as the test statistic. Based on 1000 replications, wehave computed the empirical levels of the tests as the proportion of test statistics (among the 1000replications) exceeding the chi-square critical value (proportion of rejection). Next we wish to computethe empirical power of the tests by repeating the above process but under the contiguous alternativehypothesis H ,n : β = β n , where β n = 1 + d √ n , for diﬀerent d >

0. In practice, however, we test thenull hypothesis β = β n against the alternative β = 1. Switching these hypothesis in this way hastwo advantages. Firstly, the power under pure data can be estimated with the same data generatedfrom β = 1. Secondly, the same type of contamination now helps us to study the inﬂation in thelevel and the drop in power as we will shortly see. In case of power, the results for d = 4 and 6are reported here. To illustrate the desired robustness, we recalculate the level and power of theTable 2: Empirical levels of the MDPDE based Wald-type tests, and its semi-parametric competitors,for testing H : β = 1 in a parametric Cox regression model with exponential baseline Cens. Parametric MDPDE with α SemiparametricProp. (cid:15) n = 505% 0 0.079 0.082 0.086 0.091 0.095 0.1 0.106 0.083 0.0880.05 0.745 0.617 0.503 0.394 0.236 0.149 0.093 0.209 0.1550.1 0.758 0.629 0.515 0.401 0.288 0.166 0.102 0.231 0.16910% 0 0.081 0.083 0.087 0.093 0.097 0.102 0.109 0.09 0.0940.05 0.774 0.631 0.515 0.407 0.269 0.174 0.104 0.243 0.1620.1 0.78 0.642 0.528 0.412 0.309 0.187 0.108 0.269 0.175 n = 1005% 0 0.067 0.068 0.07 0.071 0.074 0.079 0.084 0.07 0.0730.05 0.768 0.634 0.513 0.403 0.225 0.127 0.088 0.20 0.1460.1 0.772 0.637 0.527 0.415 0.274 0.15 0.093 0.223 0.15210% 0 0.069 0.071 0.073 0.075 0.079 0.083 0.089 0.074 0.0770.05 0.782 0.652 0.524 0.421 0.241 0.134 0.092 0.209 0.150.1 0.785 0.646 0.548 0.426 0.293 0.169 0.098 0.248 0.16 n = 2005% 0 0.056 0.057 0.059 0.062 0.066 0.071 0.076 0.065 0.0690.05 0.781 0.649 0.536 0.417 0.219 0.112 0.081 0.174 0.1390.1 0.783 0.644 0.531 0.42 0.269 0.143 0.086 0.186 0.14510% 0 0.058 0.06 0.062 0.065 0.069 0.074 0.079 0.068 0.0730.05 0.792 0.662 0.55 0.428 0.243 0.121 0.086 0.187 0.1480.1 0.796 0.673 0.544 0.433 0.287 0.162 0.094 0.211 0.154 (cid:15) % contamination in each sample considered in the studywith (cid:15) = 0 . , .

1. The contaminating observations are generated from an exponential distributionwith mean 31. The same simulations are also repeated for the semi-parametric competitions, namelythe Wald-type tests based on PLE and BRE. The resulting values of the empirical levels and powersobtained from the diﬀerent simulation scenarios are reported in Tables 2 and 3.Table 3: Empirical powers of the MDPDE based Wald-type tests, and its semi-parametric competitors,for testing the null hypothesis β = 1 + d √ n in a parametric Cox regression model with exponentialbaseline, calculated at the alternative β = 1, which is contiguous to the null. Cens. Parametric MDPDE with α Semi-parametric d Prop. (cid:15) n = 504 5% 0 0.914 0.906 0.877 0.838 0.826 0.779 0.726 0.869 0.8280.05 0.282 0.367 0.478 0.564 0.643 0.748 0.814 0.742 0.7790.1 0.289 0.375 0.484 0.573 0.655 0.751 0.824 0.728 0.76310% 0 0.901 0.899 0.87 0.815 0.798 0.759 0.72 0.845 0.8090.05 0.264 0.338 0.459 0.534 0.627 0.732 0.807 0.739 0.7640.1 0.277 0.364 0.467 0.551 0.643 0.739 0.817 0.727 0.7546 5% 0 0.956 0.94 0.935 0.915 0.903 0.857 0.814 0.918 0.8890.05 0.391 0.472 0.584 0.694 0.783 0.842 0.896 0.843 0.8710.1 0.402 0.493 0.591 0.706 0.79 0.856 0.9 0.856 0.87810% 0 0.945 0.929 0.921 0.911 0.887 0.829 0.792 0.896 0.8770.05 0.386 0.461 0.576 0.682 0.757 0.827 0.887 0.819 0.8520.1 0.395 0.484 0.582 0.688 0.771 0.834 0.892 0.831 0.857 n = 1004 5% 0 0.962 0.957 0.924 0.897 0.884 0.835 0.775 0.898 0.8740.05 0.353 0.428 0.541 0.626 0.714 0.806 0.869 0.779 0.8020.1 0.364 0.437 0.548 0.633 0.724 0.811 0.872 0.79 0.80710% 0 0.96 0.946 0.919 0.889 0.865 0.827 0.771 0.885 0.8640.05 0.346 0.413 0.529 0.612 0.701 0.794 0.858 0.764 0.7920.1 0.351 0.42 0.535 0.622 0.71 0.803 0.865 0.778 0.7966 5% 0 0.985 0.98 0.977 0.971 0.961 0.915 0.881 0.941 0.9240.05 0.454 0.538 0.649 0.731 0.823 0.902 0.948 0.893 0.9180.1 0.468 0.547 0.661 0.742 0.838 0.91 0.957 0.906 0.92710% 0 0.979 0.976 0.969 0.961 0.949 0.907 0.853 0.932 0.9180.05 0.447 0.521 0.636 0.717 0.809 0.891 0.942 0.884 0.9070.1 0.453 0.529 0.644 0.732 0.821 0.898 0.943 0.897 0.913 n = 2004 5% 0 0.981 0.976 0.937 0.922 0.915 0.871 0.826 0.934 0.9070.05 0.416 0.497 0.587 0.694 0.768 0.837 0.885 0.804 0.8310.1 0.419 0.504 0.592 0.706 0.782 0.846 0.889 0.785 0.82810% 0 0.979 0.975 0.925 0.913 0.901 0.851 0.814 0.925 0.9020.05 0.409 0.491 0.569 0.682 0.752 0.828 0.867 0.795 0.8240.1 0.414 0.499 0.584 0.687 0.761 0.824 0.871 0.78 0.8176 5% 0 0.999 0.998 0.997 0.992 0.978 0.959 0.928 0.97 0.9490.05 0.519 0.617 0.728 0.812 0.893 0.947 0.989 0.932 0.9510.1 0.527 0.623 0.734 0.822 0.902 0.951 0.992 0.944 0.95810% 0 0.992 0.99 0.987 0.982 0.964 0.948 0.916 0.961 0.9430.05 0.492 0.596 0.705 0.798 0.877 0.932 0.975 0.912 0.9370.1 0.508 0.615 0.725 0.813 0.884 0.937 0.987 0.928 0.949 .

05. The levels of theproposed tests exhibit an increasing relationship with α . With increasing sample size, all the observedlevels tend towards the nominal level. In case of power (Table 3), the empirical power decreases withincrease in α , but increases with increase in the sample size. Also the powers are higher for d = 6 thanthose for d = 4, as would be expected. Under pure data the power climbs up to be practically equalto 1 at d = 6 and n = 200. Under contamination, the level of the classical Wald test is signiﬁcantlyinﬂated, but they climb down to more acceptable levels with increasing α . On the other hand, thecontamination leads to a substantial drop in power for the classical Wald test, but the powers increasewith increasing α and, for moderate values of α , there is little or no loss in power.The performance of the semi-parametric tests, based on the PLE and BRE, fall somewhere in themiddle of the range of our proposed tests. Between them the PLE generates slightly more eﬃcienttests while the BRE leads to more robust outcomes. On the whole it appears that the level and thepower of the these semi-parametric tests (using PLE and BRE) are dominated by the MDPDE basedtests for low values of α under pure data, and relatively larger values of α for contaminated data (Here,in the context of power, to dominate is to have a higher value of the power, while in the context oflevel it indicates that the observed level is closer to the nominal value). Thus, with a suitable tuningparameter selection strategy which lets the user choose the optimal tuning parameter α depending onthe amount of anomaly in the data, our proposal can beat these competitors in each situation. Next we have considered the Weibull baseline hazard in the parametric Cox regression model (3),given by λ ( x, γ ) = γ γ γ x γ − , where γ = ( γ , γ ) (cid:48) ∈ [0, ∞ ) . Then, our parameter of interest is the p + 2 dimensional vector θ = (cid:0) γ, β (cid:48) (cid:1) (cid:48) . We again simulate random samples of size n = 50 , ,

200 and300 from this model with the covariates being drawn from the standard normal distribution, alongwith uniform censoring proportions of 5% and 10%, as in the exponential baseline case. We havetaken p = 3, where the true parameters are γ = 1 , γ = 2 and β = (0 . , . , . (cid:48) = ( β , β , β ) (cid:48) andconsidered the problem of testing the hypothesis H : β = 0 . H ,n : β = β n , where β n = 0 . d √ n with d = 2. We have used W n (cid:16)(cid:98) θ n,α (cid:17) deﬁnedin (13) as the test statistic. Again using 1000 replications, we have computed the empirical levelsand powers of the tests as described in Section 6.1. Finally, to illustrate the claimed robustness,we recalculate the level and power of the Wald-type tests after introducing 100 (cid:15) % contamination ineach sample of the previous simulation exercise with (cid:15) = 0 . , .

1. The contaminating observationsare generated from exponential distribution with mean 8 .

5, for the level and power calculations. Wereport all the resulting empirical levels and powers obtained from diﬀerent simulation scenarios, alongwith the result obtained from the competitive semi-parametric tests, in Tables 4 and 5.In general, the ﬁndings are similar to the exponential baseline case. The levels of the proposedtests (as well as those of PLE and BRE) are all higher than the nominal level under pure data, buttend towards the nominal value with increasing sample size. Also the inﬂation in the level increaseswith α . On the other hand, the stability of the level is better maintained by the tests with largervalues of α . For pure data, power drops with increasing α , but for larger α power is more stable undercontamination. Tests with lower values of α provide better performance compared to PLE and BREunder pure data, and tests with larger values of α dominate the PLE and BRE under contamination.Thus, a suitable tuning parameter selection strategy can provide a suitable candidate depending on19able 4: Empirical levels of the MDPDE based Wald-type tests, and its semi-parametric competitors,for testing H : β = 0 . Cens. Parametric MDPDE with α Semi-parametricProp. (cid:15) n = 505% 0 0.088 0.089 0.092 0.096 0.101 0.105 0.112 0.094 0.1030.05 0.759 0.618 0.529 0.394 0.298 0.191 0.116 0.208 0.1670.1 0.778 0.631 0.543 0.416 0.313 0.205 0.125 0.224 0.18410% 0 0.09 0.092 0.095 0.099 0.104 0.109 0.117 0.098 0.1080.05 0.771 0.634 0.548 0.409 0.31 0.201 0.122 0.227 0.1780.1 0.787 0.651 0.562 0.434 0.328 0.219 0.134 0.241 0.195 n = 1005% 0 0.076 0.078 0.08 0.083 0.086 0.091 0.097 0.079 0.0850.05 0.776 0.649 0.537 0.402 0.287 0.172 0.105 0.189 0.1540.1 0.797 0.687 0.569 0.438 0.301 0.192 0.116 0.204 0.16210% 0 0.078 0.08 0.083 0.086 0.09 0.094 0.101 0.085 0.0890.05 0.792 0.679 0.554 0.423 0.299 0.186 0.113 0.209 0.1660.1 0.809 0.704 0.581 0.453 0.315 0.206 0.124 0.233 0.179 n = 2005% 0 0.061 0.062 0.064 0.067 0.071 0.076 0.084 0.067 0.0720.05 0.794 0.668 0.544 0.417 0.269 0.154 0.089 0.168 0.1360.1 0.813 0.697 0.584 0.456 0.293 0.176 0.101 0.187 0.14810% 0 0.063 0.065 0.067 0.071 0.075 0.08 0.088 0.071 0.0770.05 0.806 0.682 0.569 0.429 0.284 0.168 0.096 0.191 0.1450.1 0.829 0.711 0.596 0.47 0.308 0.191 0.109 0.212 0.159 n = 3005% 0 0.054 0.056 0.058 0.06 0.063 0.067 0.075 0.062 0.0660.05 0.811 0.694 0.572 0.434 0.261 0.14 0.077 0.142 0.1170.1 0.834 0.719 0.602 0.469 0.282 0.162 0.085 0.163 0.13110% 0 0.057 0.059 0.062 0.065 0.069 0.074 0.081 0.068 0.0730.05 0.828 0.706 0.586 0.451 0.278 0.153 0.086 0.154 0.1290.1 0.85 0.732 0.618 0.483 0.297 0.175 0.094 0.168 0.144 the situation. We now present another type of important testing example, where we test for “

Exponentiality againstmonotone hazardness ” for the baseline hazard formulation. We know that when γ = 1, Weibulldistribution becomes exponential and it has constant hazard; for γ (cid:54) = 1, it has monotone hazard. So,equivalently our objective can be formulated as the problem of testing the hypothesis H : γ = 1with a Weibull family of alternative distributions (although it restricts the class of alternatives tothe Weibull family only). Note that, the semi-parametric tests (based on PLE or BRE) can not beperformed in this case, but we can still apply our proposed MDPDE-based Wald-type tests.We again perform a simulation study, as in Section 6.2, with p = 3, γ = γ = 1 and β =(0 . , . , . (cid:48) = ( β , β , β ) (cid:48) . We generate samples of size n = 50 , ,

200 and 300 from this model20able 5: Empirical powers of the MDPDE based Wald-type tests, and its semi-parametric competitors,for testing the null hypothesis β = β n = 0 . √ n in a parametric Cox regression model with Weibullbaseline, calculated at the alternative β = 0 .

3, which is contiguous to the null.

Cens. Parametric MDPDE with α Semi-parametricProp. (cid:15) n = 505% 0 0.917 0.914 0.906 0.891 0.849 0.785 0.752 0.807 0.7730.05 0.396 0.458 0.521 0.597 0.654 0.701 0.743 0.729 0.7390.1 0.409 0.469 0.538 0.608 0.677 0.713 0.756 0.743 0.75210% 0 0.907 0.897 0.886 0.875 0.836 0.762 0.738 0.786 0.7560.05 0.374 0.421 0.496 0.563 0.621 0.676 0.73 0.678 0.7130.1 0.389 0.437 0.514 0.582 0.653 0.698 0.741 0.695 0.733 n = 1005% 0 0.969 0.963 0.958 0.947 0.905 0.873 0.798 0.840 0.8150.05 0.431 0.489 0.542 0.624 0.683 0.728 0.789 0.745 0.7820.1 0.447 0.506 0.557 0.643 0.698 0.744 0.803 0.769 0.79410% 0 0.953 0.944 0.932 0.92 0.881 0.854 0.779 0.824 0.7930.05 0.409 0.462 0.523 0.603 0.658 0.706 0.773 0.714 0.760.1 0.423 0.485 0.532 0.619 0.675 0.722 0.782 0.736 0.775 n = 2005% 0 0.989 0.987 0.983 0.972 0.945 0.918 0.867 0.905 0.8790.05 0.48 0.539 0.584 0.653 0.717 0.793 0.857 0.821 0.8430.1 0.498 0.551 0.602 0.668 0.73 0.808 0.869 0.839 0.85810% 0 0.974 0.961 0.952 0.942 0.921 0.904 0.851 0.886 0.8620.05 0.459 0.508 0.562 0.629 0.695 0.766 0.838 0.809 0.8270.1 0.473 0.527 0.581 0.649 0.709 0.784 0.852 0.824 0.839 n = 3005% 0 0.995 0.989 0.989 0.982 0.963 0.944 0.903 0.936 0.9120.05 0.546 0.602 0.654 0.713 0.786 0.841 0.892 0.868 0.8850.1 0.563 0.615 0.671 0.735 0.797 0.854 0.904 0.882 0.89610% 0 0.982 0.974 0.965 0.958 0.943 0.931 0.891 0.917 0.8970.05 0.524 0.581 0.637 0.696 0.759 0.817 0.88 0.857 0.8710.1 0.539 0.594 0.653 0.718 0.772 0.836 0.893 0.87 0.882 with 5% and 10% censored observations as in the previous case. Based on the 1000 replications,we calculate the empirical level in the same manner, as discussed in Section 6.1, but now for testing H : γ = 1. For calculating power, we have replicated 1000 samples from the model where true value ofthe parameter in question is γ n = 1+ . √ n , and we calculate the proportion of times H is rejected, whichgives us the empirical power. Then, we recalculate the level and power of the Wald-type tests afterintroducing 100 (cid:15) % contamination in each sample with (cid:15) = 0 . , .

1. The contaminating observationsare generated from Weibull distributions with parameters ( γ = 1 , γ = 0 .

8) and ( γ = 1 , γ = 0 . n for all α , but have higher observed values and slower convergencefor larger values of α for pure data. Similarly, under pure data, power decreases with increasing α Cens. Parametric MDPDE with α Prop. (cid:15) n = 505% 0 0.081 0.083 0.086 0.09 0.093 0.098 0.1070.05 0.728 0.609 0.514 0.396 0.287 0.195 0.1190.1 0.757 0.635 0.548 0.421 0.309 0.218 0.13110% 0 0.085 0.087 0.091 0.094 0.098 0.104 0.1140.05 0.749 0.638 0.536 0.421 0.306 0.207 0.130.1 0.757 0.635 0.548 0.421 0.309 0.218 0.131 n = 1005% 0 0.073 0.075 0.077 0.08 0.084 0.088 0.0960.05 0.743 0.633 0.526 0.408 0.275 0.178 0.1030.1 0.771 0.665 0.567 0.435 0.296 0.198 0.11710% 0 0.078 0.081 0.084 0.088 0.091 0.095 0.1040.05 0.766 0.662 0.553 0.429 0.294 0.191 0.1150.1 0.798 0.692 0.589 0.457 0.314 0.215 0.128 n = 2005% 0 0.062 0.064 0.066 0.07 0.073 0.077 0.0860.05 0.759 0.654 0.544 0.423 0.264 0.167 0.0930.1 0.786 0.687 0.591 0.462 0.289 0.191 0.10610% 0 0.067 0.069 0.072 0.075 0.079 0.084 0.0940.05 0.785 0.678 0.565 0.442 0.281 0.184 0.1030.1 0.809 0.721 0.614 0.48 0.316 0.209 0.118 n = 3005% 0 0.053 0.055 0.057 0.059 0.062 0.066 0.0740.05 0.778 0.675 0.563 0.436 0.249 0.155 0.0840.1 0.812 0.711 0.592 0.476 0.276 0.178 0.09710% 0 0.059 0.061 0.063 0.066 0.069 0.073 0.0830.05 0.803 0.697 0.584 0.455 0.268 0.173 0.0960.1 0.836 0.739 0.621 0.491 0.298 0.195 0.108 and increases with increasing sample size. Under contamination, lower values of α produce drasticallyinﬂated levels, but the degree of inﬂation drops with increasing α . Similarly, the contamination leadsto a huge drop in power for lower values of α , most of which is slowly recovered with increasing α .On the whole we feel that the MDPDE based robust Wald-type tests have the potential to becomereally useful tools for the applied scientist for the situations described here. α Our proposed MDPDE-based Wald-type tests and the model selection criterion, DIC, crucially dependson the choice of α . In simulation studies, we have seen that, an appropriately chosen tuning parameter α leads to better inference than both the MLE based inference of the semi-parametric approach inmost situations. In fact, the tuning parameter α controls the trade-oﬀ between asymptotic eﬃciencyand robustness of the MDPDE under the parametric proportional hazards model (2) as well. At α = 0, the MDPDE coincides with MLE, which is the most eﬃcient (asymptotically) under pure22able 7: Empirical powers of the MDPDE based Wald-type tests for testing the exponentiality of thebaseline hazard in a parametric Cox regression model, calculated at γ n = 1 + . √ n Cens. Parametric MDPDE with α Prop. (cid:15) n = 505% 0 0.952 0.949 0.947 0.943 0.939 0.933 0.9210.05 0.419 0.492 0.557 0.634 0.738 0.827 0.910.1 0.407 0.479 0.538 0.612 0.724 0.813 0.89610% 0 0.937 0.935 0.932 0.927 0.922 0.917 0.9060.05 0.394 0.469 0.534 0.619 0.712 0.809 0.8970.1 0.381 0.453 0.514 0.589 0.696 0.789 0.881 n = 1005% 0 0.971 0.968 0.966 0.962 0.958 0.953 0.9430.05 0.474 0.541 0.603 0.682 0.781 0.859 0.930.1 0.457 0.523 0.581 0.663 0.768 0.842 0.91910% 0 0.956 0.952 0.949 0.945 0.94 0.934 0.9240.05 0.433 0.509 0.564 0.641 0.746 0.831 0.9150.1 0.419 0.487 0.549 0.624 0.722 0.819 0.904 n = 2005% 0 0.987 0.984 0.977 0.974 0.971 0.966 0.9570.05 0.529 0.593 0.668 0.737 0.828 0.897 0.9460.1 0.512 0.579 0.647 0.724 0.813 0.881 0.93510% 0 0.973 0.97 0.966 0.963 0.959 0.953 0.9410.05 0.492 0.57 0.642 0.711 0.807 0.873 0.9280.1 0.473 0.551 0.619 0.698 0.787 0.858 0.917 n = 3005% 0 0.999 0.998 0.997 0.997 0.994 0.99 0.9760.05 0.573 0.628 0.702 0.773 0.858 0.919 0.9670.1 0.556 0.613 0.684 0.751 0.839 0.903 0.95410% 0 0.987 0.983 0.98 0.977 0.973 0.968 0.9550.05 0.547 0.603 0.678 0.749 0.823 0.889 0.9490.1 0.533 0.587 0.659 0.728 0.805 0.878 0.937 data but has no robustness property. As α increases, the robustness of the corresponding MDPDEincreases under contaminated data but, on the other hand, its eﬃciency gradually decreases underthe pure data. Generally, for smaller positive α , this loss in eﬃciency is not so much signiﬁcant. Therobustness properties of the proposed Wald-type tests depend directly on that of the MDPDE usedto construct the tests-statistics, as we have shown theoretically in Section 4. Additionally, in oursimulations also, the MDPDE-based Wald-type tests with lower values of α are seen to provide betterperformance compared to the tests based PLE and BRE under pure data, whereas they are seen todominate the PLE and BRE based tests under contaminated data for larger values of α . The same canbe shown, both theoretically and empirically, for our model selection criterion (DIC) as well. Sincethe amount of contamination is often unknown, it is extremely important to choose an appropriatetuning parameter α based on the given data to apply our proposed inference methodologies to anyreal dataset.Based on the above discussions, we have a direct one-to-one relationship between the inferenceprocedures and the underlying MDPDE in terms of the eﬀect of α on them. So, the optimum value23f α for a given dataset can equivalently be chosen based on either of the statistical procedures; inparticular, it may be chosen based on controlling the trade-oﬀs between the eﬃciency and robustnessof the MDPDEs. Since this phenomenon is not only restricted to the current proportional hazardmodels, there have been a few attempts in the literature to choose an appropriate data-driven valueof α for the MDPDEs under general parametric set-ups with complete data. The most popular one isthe work of Warwick and Jones [44], who proposed to select this tuning parameter α in the MDPDEs,under the IID set up, by minimizing an estimate of their asymptotic mean square error (MSE).Another method had been proposed in [26] in a similar line but minimizing the estimate of asymptoticvariance rather than the MSE. It has latter been seen that the use of bias in considering the MSE, asdone in [44], often lead to better performance, and this method has subsequently used for MDPDEsunder more complex parametric models. In particular, Ghosh and Basu [17, 18] extended this method,based on minimizing the MSE, for non-homogeneous setups along with a detailed investigation of theirusefulness in analyzing real datasets. The same process was also applied for the MDPDEs under theparametric proportional hazards model (2) by [21]. So, here, to ﬁnd an optimum α for our proposedMDPDE-based Wald-type tests and DIC, we propose to use the same procedure of minimizing theestimated (asymptotic) MSE as described in [21]. For the sake of completeness, let us brieﬂy describethe full procedure.Suppose that the observed dataset is obtained from a true distribution, where the conditionaldensity of x i given δ i and z i , is contaminated as h i = (1 − (cid:15) ) f i, θ ∗ + (cid:15) ∆ i , for each i and for somecontaminating densities ∆ i . Then, θ ∗ is the target parameter value. Let the MDPDE of θ withtuning parameter α is denoted as (cid:98) θ n,α , which is a consistent estimator of the corresponding functionalvalue, say θ , at the true distribution. Note that, θ ∗ and θ may not be the same. Then, theasymptotic MSE (AMSE) of the MDPDE (cid:98) θ n,α with respect to the target θ ∗ is given by AM SE (cid:16)(cid:98) θ n,α (cid:17) = ( θ − θ ∗ ) (cid:48) ( θ − θ ∗ ) + 1 n trace (cid:2) J − α ( θ ) K α ( θ ) J − α ( θ ) (cid:3) . (34)Now, we have provided a consistent estimator of the asymptotic variance matrix J − α ( θ ) K α ( θ ) J − α ( θ )as Σ n,α deﬁned in (10). We also know that (cid:98) θ n,α is consistent for θ by Theorem 2.1. However, sincewe do not have any straightforward estimate of θ ∗ , we need to use an appropriate pilot estimator, say (cid:98) θ P , for this purpose. Thus, an estimate of the AMSE at tuning parameter α is given by (cid:92) AM SE (cid:16)(cid:98) θ n,α (cid:17) = (cid:16)(cid:98) θ n,α − (cid:98) θ P (cid:17) (cid:48) (cid:16)(cid:98) θ n,α − (cid:98) θ P (cid:17) + 1 n trace [ Σ n,α ] . (35)Note that, for a given dataset, the estimated MSE, (cid:92) AM SE (cid:16)(cid:98) θ n,α (cid:17) , is a function of the tuning param-eter α only; so, we can minimize it over α ∈ [0 ,

1] to get the optimal tuning parameter for the givendataset. This can be veriﬁed by standard grid-search over [0 ,

1] to obtain the global minimum.However, it is important to note that the ﬁnal choice of optimal α often depend on the pilotestimator (cid:98) θ P . This issue has been studied via detailed simulations in previous works. Under theIID set-up, its choice has been recommended in [44] as the MDPDE at α = 1, whereas the MDPDEat α = 0 . α from the previous step until convergence.Since the present case of parametric Cox regression model (2) is considered as a non-homogeneous24et-up in Section 2, we also suggest to simply use the MDPDE at α = 0 . α for a given dataset. The iterative approach of [40] can also beconsidered in the present case, but it is often not required in practice. Let us now apply the proposed parametric proportional hazards regression model to analyze threeinteresting survival data examples. In each example, we have ﬁtted the model using both Weibulland exponential baselines. For each baseline model, we have calculated our robust model selectioncriterion, DIC, for diﬀerent α and used these values to compare two baseline models. In all threeexamples, we have observed that for α less than 0 .

2, the exponential model gives a better result thanthe Weibull model, but for higher α , the Weibull model outperforms exponential model in terms ofthe DIC. Further, we have used the DIC for covariate selection as well; for each baseline, we haveconsidered all possible subsets of the full model (except the intercept model) and compared the DICvalues to ﬁnd the best subset of covariates. Due to the dependence on α , based on the discussionsof Section 7, the optimum model for each dataset is selected through the procedure described in thefollowing.Given a particular dataset, for each possible sub-model (both in terms of available covariates andthe two baseline hazards), we ﬁrst compute the ‘optimum’ value of α that provides the best trade-oﬀ between the eﬃciency and robustness of the MDPDEs of the model parameters, following theprocedure described in Section 7 with the pilot choice of α = 0 .

5. The DIC for each sub-model is thencalculated at the respective optimum α -values and compared across all possible sub-models; we choosethe model having minimum DIC as the best robust model for the given dataset which is then usedfor subsequent inference. In this ﬁnal step, we have applied the proposed MDPDE-based Wald-typetests, at the corresponding optimum α , to test for the signiﬁcance of each of the covariates presentin the selected model. For brevity, only the results for the best selected model are presented in themain text for each of the three datasets; the detailed results for additional competitive models, thatare used in our computations, are provided at the end in Appendix B.The classical MLE based Wald tests were done to perform the same inference, and, due to thepresence of the outliers in the data, the results were seen to be signiﬁcantly diﬀerent from those of ourproposed tests. For the purpose of comparison, we also analyzed these data with the correspondingsemi-parametric models, where the test of signiﬁcance of the model parameters are performed basedon the PLE and the BRE. These semi-parametric results are also presented along with our proposedMDPDE based results (with optimum α ) under the fully parametric model for each of the threedatasets. Our ﬁrst example is a lung cancer study dataset available in the R package emplik . The data contain121 observations on patients with small cell lung cancer (SCLC) from a clinical study [32] which wasdesigned to evaluate two regimens: Arm A - cisplatin (C) followed by etoposide (E); and Arm B -etoposide (E) followed by cisplatin (C). Actually, for patients with small cell lung cancer (SCLC), thestandard therapy is to use a combination of etoposide (E) and cisplatin (C). However, the optimalsequencing and administration schedule have not been established. In this study, 121 patients with25imited-stage SCLC were randomly assigned to these two arms (62 patients to A and 59 patients toB). At the time of the analysis, there was no loss to follow-up. Each death time was either observed oradministratively censored. Therefore, the censoring variable does not depend on the two covariates -treatment indicator ( arms ) and patient’s entry age ( entry-age ). Previously, these data were analyzedin [45], where these data were ﬁtted with a semi-parametric median regression model by regressingthe logarithm of the survival time over the covariates arms and entry-age .Here, we have ﬁtted the parametric Cox regression model, with the two covariates arms and entry-age as covariates, using our proposed method. The ﬁnal results corresponding to the bestmodel selected via the DIC at the optimum α -values are reported in Table 8; refer to Table 11 inAppendix B for the model selection details. Here, we have reported the estimate, the standard error(SE), and the p-value for testing the signiﬁcance of each covariate included in the model, i.e., testing H : β j = 0 as described in Section 6. Note that, our proposal selects a model with exponential baselinehazard and only one covariate arm ; the other covariate ( entry-age ) is left outside as insigniﬁcant.However, the variable entry-age is seen to be signiﬁcant when the fully parametric model with thesame baseline hazard is analyzed by the MLE-based Wald test or the corresponding semi-parametricmodel is analyzed by the PLE based approach. That this is an erroneous results due to the presenceof outliers can partially be observed by the semi-parametric BRE-based results (Table 8), where thecovariate entry-age is not marginally signiﬁcant even at 10% level of signiﬁcance. The regressioncoeﬃcient for arm obtained by MDPDE in our best selected model is also very close to that obtainedby BRE, which further justiﬁes the correctness of our results along with the advantages of the fullyparametric modeling. Thus our proposed method is found very useful in choosing proper model interms of robustness against any outliers present in the data.Table 8: The MDPDE-based results for the best selected model at an optimum (cid:98) α = 0 . γ . arm entry-age γ Best model selected by our proposal (DIC( (cid:98) α ) = 87 . α . We have summarized theMDPDE at diﬀerent α , obtained under the full model, along with their standard errors in Table 12of Appendix B. It is clear that the MDPDE with α ≈ . As our second example, which illustrates the eﬀectiveness of the fully parametric approach over thesemi-parametric one, we consider the data on 137 advanced lung cancer patients as collected by theVeterans Administration Lung Cancer Study Group. Patients were randomized according to one oftwo chemotherapeutic agents (1-standard; 2-test). Tumors are classiﬁed into one of four broad groups(squamous, small cell, adeno, and large). The covariates recorded are performance status, time fromdiagnosis to starting on study (months), age, and previous therapy (0 for no; 10 for yes). These dataare now available in R package survival and were previously used by R. J. Prentice [36] to determinewhich covariates have an important relation with survival and to compare eﬃcacy of treatments withrespect to longevity by ﬁtting a Cox model of survival time over the covariates.After ﬁtting the standard semi-parametric model using the coxrobust package, we have found thatamong several covariates, karno (Karnofsky performance score) and three cell types - adeno, largeand small-cell - are signiﬁcant. Using these covariates, we have next ﬁtted the fully parametric modelusing the proposed MDPE-based approach, and tabulated the MDPDEs at diﬀerent α in Table 13 inAppendix B. We can see that, these MDPDEs are quite close to the MLE indicating that there is nosigniﬁcant outlier-eﬀect present in these data with respect to the fully parametric model.Table 9: The MDPDE-based results for the best selected model at an optimum (cid:98) α = 0 .

62 for Veterandata, along with the results obtained by the MLE-based approach and the semi-parametric PLE orBRE based approaches. The baseline hazard for the selected model is exponential with parameter γ . karno adeno large smallcell γ Best model selected by our proposal (DIC( (cid:98) α ) = 86 . large is insigniﬁcant while using PLE or BRE; the similarity inthe PLE an BRE also indicates no outlier-eﬀect. However, if we look at the results obtained by theMLE under the fully parametric model all covariates except karno becomes insigniﬁcant, indicatingthe signiﬁcant change in our inference while using the proposed fully parametric modelling. The modelselected by our proposed procedure also provide the same inference with only the covariate karno beingsigniﬁcant for the patient’s lifetime; it illustrates that the proposed MDPDE based procedure also givethe results that are consistent with the most eﬃcient MLE based inference in most cases when there isno signiﬁcant outlier-eﬀect. Additionally, our proposed model selection strategy automatically leavesout the two unimportant covariates providing us a better model with only one insigniﬁcant covariate,along with the signiﬁcant one. Our ﬁnal example is from a completely diﬀerent applied domain where our procedure is shown toprovide new insights signiﬁcantly diﬀerent from those obtained by the existing method. These datawere obtained from [37], and were also used in [2, 14]. The data relate to 432 convicts who werereleased from the state institutions of Georgia and Texas in the 1970s and who were followed upfor one year after release. Some of the prisoners were oﬀered eligibility for unemployment insurancepayments for periods of up to 6 months or until they managed to locate employment. These ex-prisoners were carefully selected for conducting an experiment run by the Department of Labor incollaboration with the two states. Other prisoners who were not oﬀered unemployment beneﬁts alsoparticipated in the experiment to serve as controls. The ex-prisoners who were oﬀered unemploymentinsurance beneﬁts were compared with others released around the same time who were not madethe same oﬀer. The purpose of the experiment was to test a new way of helping persons who hadcompleted their sentences or were released on parole to bring themselves into civilian life. In broadestterms, it is the social problem of crime that is the center of concern of that study. Speciﬁcally, theresearchers focused on recidivism, the unfortunate tendency of persons convicted of crime at one pointin time to be arrested and convicted again, sometimes to repeat this sequence over and over. Notethat here censoring happens if a prisoner does not commit a crime within one year after release.Based on an initial analysis using semi-parametric Cox model via coxrobust , we observed thatthe signiﬁcant covariates are “ wexp ” (full-time work experience before incarceration; 1 for yes), “ ﬁn ”(ﬁnancial aid provided or not; 1 for yes) and “ prio ” (number of convictions prior to current incar-ceration). So, we next ﬁt a fully parametric Cox model using these covariates and summarize theﬁnal results in Table 10 (see Tables 15 and 16 in Appendix B for detailed analyses). Here, we notethat, the semi-parametric results based on PLE or BRE, respectively, indicates the variable wexp or ﬁn to be insigniﬁcant; the other two remain signiﬁcant in each. The diﬀerence between the PLE andBRE based inference indicates the presence of outliers in the data, and hence the MLE based resultsobtained under the fully parametric model is also unreliable. So, under this fully parametric approach,we can use the model obtained via the proposed MDPDE-based model selection procedure to be theoptimum one. Interestingly, the p-values obtained based on the MDPDE based Wald-type tests at28able 10: The MDPDE-based results for the best selected model at an optimum (cid:98) α = 0 .

57 for Criminalrecidivism data, along with the results obtained by the MLE-based approach and the semi-parametricPLE or BRE based approaches. The baseline hazard for the selected model is Weibull( γ , γ ). ﬁn wexp prio γ γ Best model selected by our proposal (DIC( (cid:98) α ) = 132 . ﬁn and wexp are insigniﬁcant and the thirdvariable prio is only marginally signiﬁcant at 90% level of signiﬁcance. These completely new resultsprovide a new aspect to the underlying research problem with the proposed robust and eﬃcient fullyparametric inference procedures. In this paper we have considered a parametric proportional hazards model for randomly censoredresponse as an alternative to the traditional semi-parametric Cox regression model. Due to the non-robust nature of the traditional likelihood based inference under the presence of outliers, we have,in order to provide a viable alternative, developed a robust generalized Wald-type tests based on theMDPDE of [21]. Our work provides a thorough theoretical evaluation of the proposed robust Wald-typetests for testing the general composite hypothesis in the proposed parametric Cox model establishingthe claimed robustness and its advantages over the classical Wald-test. We have provided simulationstudies along with three real data examples to demonstrate how these theoretical advantages help inpractice in real situations and how this method sometimes performs better than testing procedurebased on traditional semi-parametric model. Besides that, we have provided the inﬂuence function(IF) of robust Wald-type test statistic and its asymptotic power. From the ﬁgures we also observethat the IF is unbounded for α = 0 but becomes bounded for positive α . A robust model selectioncriterion (DIC) is proposed using the MDPDE and is used in the real data applications to chose theoptimum parametric model successfully resisting the eﬀects of outliers present in the data.29t may possibly be argued that the method for selecting the best model in terms of the combinationof covariates, baseline functions and tuning parameter for the estimation of the model parameter asillustrated in Table 11 is a bit ad-hoc. A more satisfactory approach would involve a simultaneousoptimization over the diﬀerent factors, rather than the sequential optimization over the tuning pa-rameter followed by the optimization over the combinations of covariates and baseline functions. Thisissue possibly needs more attention and we plan to develop a more comprehensive strategy for ﬁndingthe most appropriate model in our future research.On the whole, even based on the demonstrations provided in the present paper, we can observe theproposed MDPDE-based robust tests and model selection criterion to become extremely useful toolsfor robust inference in practical applications with survival data. So, it would be important futureworks to extend this MDPDE and the associated testing procedure for Cox regression with frailtycomponents, with time varying covariates, and even with multivariate censored responses. We hopeto take up some of these extensions in the future. Acknowledgment:

This research is supported by Ministerio de Ciencia, Innovacin y Universidades (Spain) under theGrant PGC2018-095194-B-I00. Additionally, the research of AN and AG is partially supported by aninternal grant from Indian Statistical Institute, Kolkata, and that of AG is also supported partiallyby the INSPIRE Faculty Research Grant from Department of Science and Technology, Governmentof India.

Appendices

A Proof of the results

A.1 Proof of Theorem 3.2

Take any θ ∈ Θ ; then we have m ( θ ) = 0. Also, by Theorem 2.1, we have and √ n (cid:16)(cid:98) θ n,α − θ (cid:17) L −→ n −→∞ N ( q + p , Σ α ( θ )). Then using the delta method, we get √ n m (cid:16)(cid:98) θ n,α (cid:17) D −→ n −→∞ N (cid:0) q + p , M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) . But, by assumptions M ( θ ) and Σ α ( θ ) are continuous around θ = θ . Also, by Theorem 2.1, (cid:98) θ n,α is consistent for θ . Hence, M (cid:48) ( (cid:98) θ n,α ) Σ α ( (cid:98) θ n,α ) M ( (cid:98) θ n,α ) is consistent for M (cid:48) ( θ ) Σ α ( θ ) M ( θ ).Therefore, by applying Slutsky’s theorem, we get that the asymptotic distribution of W n ( (cid:98) θ n,α ) as χ r . A.2 Proof of Theorem 3.3

Here, (cid:98) θ n,α P −→ n −→∞ θ ∗ . So, asymptotic distribution of l ∗ ( (cid:98) θ n,α , (cid:98) θ n,α ) and l ∗ ( (cid:98) θ n,α , θ ∗ ) will be the same.Now, a ﬁrst order Taylor series expansion of l ∗ ( (cid:98) θ n,α , θ ∗ ) at (cid:98) θ n,α around θ ∗ gives l ∗ ( (cid:98) θ n,α , θ ∗ ) − l ∗ ( θ ∗ , θ ∗ ) = ∂l ∗ ( θ , θ ∗ ) ∂ θ (cid:48) | θ = θ ∗ (cid:16)(cid:98) θ n,α − θ ∗ (cid:17) + o p (cid:16) || (cid:98) θ n,α − θ ∗ || (cid:17) . √ n (cid:16) (cid:96) ∗ ( (cid:98) θ n,α , (cid:98) θ n,α ) − (cid:96) ∗ ( θ ∗ , θ ∗ ) (cid:17) D −→ n −→∞ N (cid:0) , σ W n ( θ ∗ ) (cid:1) , where σ W n ( θ ∗ ) = (cid:18) ∂(cid:96) ∗ ( θ , θ ∗ ) ∂ θ (cid:19) (cid:48) θ = θ ∗ Σ α ( θ ∗ ) (cid:18) ∂(cid:96) ∗ ( θ , θ ∗ ) ∂ θ (cid:19) (cid:48) θ = θ ∗ . A.3 Proof of Theorem 3.4

From a Taylor series expansion of m ( (cid:98) θ n,α ) around θ n , we get m ( (cid:98) θ n,α ) = m ( θ n ) + M (cid:48) ( θ n )( (cid:98) θ n,α − θ n ) + o ( || (cid:98) θ n,α − θ n || ) . But, from (19), we get m ( (cid:98) θ n,α ) = m ( θ n ) + M (cid:48) ( θ n )( (cid:98) θ n,α − θ n ) + n − / M (cid:48) ( θ ) d + o ( (cid:107) θ n − θ (cid:107) ) + o ( || (cid:98) θ n,α − θ n || ) . Now we can write (cid:98) θ n,α − θ = (cid:98) θ n,α − θ n + θ n − θ = ( (cid:98) θ n,α − θ n ) + n − / d . Next, using the theory of contiguity (Le Cam’s third lemma) and Theorem 2.1 we get that, under H ,n given in (16), √ n ( (cid:98) θ n,α − θ ) D −→ n −→∞ N ( d , Σ α ( θ )) . Thus, under H ,n , we get √ n (cid:16)(cid:98) θ n,α − θ n (cid:17) D −→ n −→∞ N ( , Σ α ( θ )) , and also √ n ( o ( (cid:107) θ n − θ (cid:107) ) + o ( || (cid:98) θ n,α − θ n || )) = o p ( ) . Hence, combining, we have √ n m ( (cid:98) θ n,α ) D −→ n −→∞ N r ( M (cid:48) ( θ ) d , M (cid:48) ( θ ) Σ α ( θ ) M ( θ )) . Finally note that, if X ∼ N k ( µ , Σ ) with rank ( Σ ) = k , then X (cid:48) Σ − X follows a χ distributionwith degrees of freedom k and non-centrality parameter µ (cid:48) Σ − µ . Thus, W n ( (cid:98) θ n,α ), deﬁned in (21),must follow a χ distribution with degrees of freedom r and non-centrality parameter d (cid:48) M ( θ ) (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − M (cid:48) ( θ ) d . The second part of the theorem follows from the equivalence of the two contiguous hypotheses viathe relationship δ = M (cid:48) ( θ ) d . 31 .4 Proof of Theorem 4.1 Let us denote θ n ∗ = U α ( H P(cid:15),y t ). Then we get W n ( (cid:98) θ n,α ) = n m (cid:48) ( (cid:98) θ n,α ) (cid:16) M (cid:48) ( (cid:98) θ n,α ) Σ n,α ( (cid:98) θ n,α ) M ( (cid:98) θ n,α ) (cid:17) − m ( (cid:98) θ n,α )= n m (cid:48) ( θ n ∗ ) (cid:16) M (cid:48) ( (cid:98) θ n,α ) Σ n,α ( (cid:98) θ n,α ) M ( (cid:98) θ n,α ) (cid:17) − m ( θ n ∗ )+ n ( m ( (cid:98) θ n,α ) − m ( θ n ∗ )) (cid:48) (cid:16) M (cid:48) ( (cid:98) θ n,α ) Σ n,α ( (cid:98) θ n,α ) M ( (cid:98) θ n,α ) (cid:17) − ( m ( (cid:98) θ n,α ) − m ( θ n ∗ ))+ 2 n ( m ( (cid:98) θ n,α ) − m ( θ n ∗ )) (cid:48) (cid:16) M (cid:48) ( (cid:98) θ n,α ) Σ n,α ( (cid:98) θ n,α ) M ( (cid:98) θ n,α ) (cid:17) − m ( θ n ∗ )= T ,n + T ,n + T ,n , say (36)Now, let us consider θ n ∗ as a function of (cid:15) n = (cid:15) √ n i.e. f ( (cid:15) n ). Then a Taylor series expansion of f ( (cid:15) n ) at (cid:15) n = 0 gives f ( (cid:15) n ) = ∞ (cid:88) i =0 i ! (cid:15) i n i ∂ i f ( (cid:15) n ) ∂(cid:15) in | (cid:15) n =0 = θ n + (cid:15) √ n IF ( y t , U α , H θ ) + o p ( / √ n ) . From this, we get √ n ( θ n ∗ − θ n ) = √ n ( θ n ∗ − θ − n − / d )= (cid:15)IF ( y t , U α , H θ ) + o p ( ) , and thus √ n ( θ n ∗ − θ ) = d + (cid:15)IF ( y t , U α , H θ ) + o p ( )= d (cid:15), y t ,α ( θ ) + o p ( ) . (37)Additionally, using a Taylor series expansion of m ( θ n ∗ ) around θ , we can get √ n m (cid:48) ( θ n ∗ ) = M (cid:48) ( θ ) d (cid:15), y t ,α ( θ ) + o p ( ) . (38)Now, under H P(cid:15), y t , the asymptotic distribution of MDPDE yields √ n (cid:16)(cid:98) θ n,α − θ n ∗ (cid:17) D −→ n −→∞ N ( , Σ α ( θ )) . (39)Thus we get T ,n D −→ n −→∞ χ r . Now, combining (36), (37) and (38), we get W n ( (cid:98) θ n,α ) = V n (cid:48) (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − V n + o p ( ) , where V n = √ n ( m ( (cid:98) θ n,α ) − m ( θ n ∗ )) + M (cid:48) ( θ ) d (cid:15), y t ,α ( θ ) . V n D −→ n −→∞ N (cid:0) M (cid:48) ( θ ) d (cid:15), y t ,α ( θ ) , (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1)(cid:1) , and hence we ﬁnally get that W n ( (cid:98) θ n,α ) D −→ n −→∞ χ r ( d ∗ ) , where d ∗ is as deﬁned in the statement of the theorem. A.5 Proof of Theorem 4.3

We start with the expression of β W n ( θ n , (cid:15), y t ) from Theorem 4.1. Clearly, by deﬁnition of PIF andusing the chain rule of derivatives, we get P IF ( y t , W n , H θ ) = ∂∂(cid:15) β W n ( θ n , (cid:15), y t ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:15) =0 ∼ = ∞ (cid:88) v =0 ∂∂(cid:15) C v (cid:16) M (cid:48) ( θ ) d (cid:15), y t ,α ( θ ) , (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:15) =0 P (cid:0) χ r +2 v > χ r,τ (cid:1) ∼ = ∞ (cid:88) v =0 ∂∂ t (cid:48) C v (cid:16) M (cid:48) ( θ ) t , (cid:0) M (cid:48) ( θ ) Σ α ( θ ) M ( θ ) (cid:1) − (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) t = d , y t,α ( θ ) × ∂∂(cid:15) d (cid:15), y t ,α ( θ ) (cid:12)(cid:12)(cid:12)(cid:12) (cid:15) =0 P (cid:0) χ r +2 v > χ r,τ (cid:1) . But, d , y t ,α ( θ ) = d and the standard derivatives give ∂∂(cid:15) d (cid:15), y t ,α ( θ ) = IF ( y t , U α , H θ ) , and ∂∂ t C v ( t , A ) = ( t (cid:48) At ) v − v !2 v (2 v − t (cid:48) At ) At e − t (cid:48) At . Combining the above results and simplifying, we get the required expression of the PIF as given inthe theorem.Next, to calculate the LIF, we may start from the expression of T W n ( (cid:15), y t ), as given in Remark4.2, and proceed as in the calculation of the PIF. Alternatively, we may also obtain the LIF just bysubstituting d = in the expression of PIF. Through either way, we get that LIF ( y t , W n , H θ ) = ∂∂(cid:15) T W n ( (cid:15), y t ) | (cid:15) =0 = 0 . Further, the derivative of T W n ( (cid:15), y t ) of any order with respect to (cid:15) will also be zero at (cid:15) = 0, whichimplies that the LIF of any order will be identically zero.33 Additional numerical results for real data applications

Table 11: The MDPDE-based results for all possible sub-model in Small cell lung cancer data, alongwith the results obtained by the competitors under the full model. optimal α DIC arm entry-age γ optimal α DIC arm entry-age γ γ Exponential baseline Weibull baseline

Fully parametric MDPDE-based resultsEstimate 0.7 α = 0)Estimate 734.235 0.336 0.036 0.019 824.514 0.542 0.032 0.165 0.007SE (0.597) (0.005) (0.002) (0.202) (0.021) (0.002) (0.0008)p-value (0.073) (9.56e-12) (0.007) (0.127)Semi-parametric PLE-based resultsEstimate 0.513 0.028 -SE (0.204) (0.013) -p-value (0.012) (0.029) -Semi-parametric BRE-based resultsEstimate 0.757 0.026 -SE (0.231) (0.016) -p-value (0.001) (0.101) - Table 12: Estimates and SE (in parenthesis) of diﬀerent parameters for Small cell ling cancer data

Variable Parametric MDPDE with α Semi-parametric0(MLE) 0.05 0.1 0.2 0.3 0.4 0.5 PLE BRE

Exponential baseline arm entry-age -0.036 0.012 0.021 0.027 0.030 0.037 0.038 0.028 0.026(0.005) (0.019) (0.024) (0.034) (0.044) (0.054) (0.062) (0.013) (0.016) γ Weibull baseline arm entry-age γ γ Variable Parametric MDPDE with α Semiparametric0(MLE) 0.05 0.1 0.2 0.3 0.4 0.5 PLE BRE

Exponential baseline karno -0.030 -0.028 -0.024 -0.030 -0.044 -0.044 -0.045 -0.031 -0.042(2.6e-03) (2.62e-03) (2.86e-03) (3.14e-03) (3.53e-03) (3.98e-03) (0.0046) (0.0052) (0.0065) celltype adeno -0.093 -0.225 -0.748 -0.987 -1.236 -1.293 -1.328 -1.15 -1.237(2.77e-01) (2.92e-01) (3.19e-01) (3.83e-01) (5.46e-01) (5.91e-03) (0.6232) (0.293) (0.528) celltype large -0.311 -0.510 -0.621 -0.827 -0.952 -1.077 -1.310 -0.831 -1.027(7.16e-01) (3.17e-01) (3.52e-01) (5.50e-01) (8.55e-01) (8.84e-01) (0.9422) (0.2935) (0.329) celltype smallcell -0.101 -0.167 -0.120 -0.193 -0.246 -0.238 -0.183 -0.438 -0.164(2.16e-01) (2.34e-01) (2.68e-01) (4.61e-01) (5.27e-01) (5.51e-01) (0.608) (0.256) (0.285) γ Weibull baseline karno -0.025 -0.027 -0.029 -0.026 -0.032 -0.036 -0.039(4.21e-03) (5.52e-03) (5.59e-03) (6.48e-03) (7.12e-03) (8.82e-03) (9.28e-03) celltype adeno -1.101 -1.12 -1.159 -1.144 -1.159 -1.188 -1.214(6.88e-01) (7.23e-01) (8.19e-01) (8.90e-01) (9.50e-01) (1.00) (1.12) celltype large -0.742 -0.723 -0.772 -0.842 -0.892 -0.954 -1.086(6.24e-01) (6.59e-01) (7.05e-01) (7.42e-01) (8.11e-01) (9.48e-01) (1.04) celltype smallcell -0.459 -0.423 -0.325 -0.244 -0.215 -0.190 -0.1764.55e-01 5.29e-01 6.18e-01 6.99e-01 7.58e-01 8.11e-01 9.07e-01 γ γ optimal α DIC karno adeno large smallcell γ optimal α DIC karno adeno large smallcell γ γ Exponential baseline Weibull baseline

Fully parametric MDPDE-based resultsEstimate 0.56 92.368 -0.167 - - - 0.051 0.52 87.348 -0.089 - - - 0.021 0.469SE (0.019) - - - (0.008) (0.047) - - - (0.006) (0.287)p-value (0.007) - - - (0.167) - - -Estimate 0.48 114.975 - 0.260 - - 0.047 0.47 111.837 - 0.179 - - 0.107 0.384SE - (0.248) - - (0.0096) - (0.143) - - (0.038) (0.485)p-value - (0.216) - - - (0.196) - -Estimate 0.5 105.945 - - -0.767 - 0.069 0.48 104.982 - - -0.015 - 0.028 0.237SE - - (0.686) - (0.016) - - (0.003) - (0.019) (0.315)p-value - - (0.238) - - - (0.018) -Estimate 0.55 111.837 - - - 0.656 0.074 0.49 103.784 - - - 0.439 0.076 0.317SE - - - (0.449) (0.021) - - - (0.382) (1.2e-3) (0.093)p-value - - - (0.319) - - - (0.537)Estimate 0.49 100.19 -0.206 0.714 - - 0.106 0.45 94.997 -0.158 0.596 - - 0.085 0.523SE (0.056) (0.564) - - (0.009) (0.008) (0.431) - - (1.5e-4) (0.005)p-value (0.004) (0.427) - - (4.2e-4) (0.397) - -Estimate 0.54 90.462 -0.108 - -0.549 - 0.048 0.56 90.089 -0.184 - -0.462 - 0.131 0.452SE (0.008) - (0.468) - (0.007) (0.027) - (0.233) - (0.019) (0.284)p-value (1.15e-4) - (0.349) - (0.002) - (0.259) -Estimate 0.62 -0.096 - - 0.469 0.037 0.64 90.882 -0.074 - - 0.325 0.045 0.829SE (0.028) - - (0.295) (0.019) (0.017) - - (0.313) (1.8e-4) (1.76e-3)p-value (0.019) - - (0.418) (0.036) - - (0.529)Estimate 0.48 112.382 - -0.196 -0.506 - 0.089 0.51 102.498 - -0.086 -0.395 - 0.14 0.681SE - (0.088) (0.382) - (0.018) - (0.022) (0.278) - (0.089) (0.191)p-value - (0.119) (0.361) - - (0.194) (0.483) -Estimate 0.65 100.018 - 0.78 - 0.857 0.051 0.62 92.858 - 0.623 - 0.778 0.097 0.421SE - (0.367) - (0.601) (0.021) - (0.247) - (0.491) (2.1e-4) (1.54e-3)p-value - (0.103) - (0.439) - (0.219) - (0.426)Estimate 0.46 107.892 - - -0.486 0.431 0.039 0.46 102.38 - - -0.344 0.288 0.231 0.529SE - - (0.478) (0.371) (0.008) - - (0.173) (0.221) (3.48e-4) (2.3e-3)p-value - - (0.621) (0.594) - - (0.412) (0.515)Estimate 0.52 94.589 -0.158 0.626 -0.593 - 0.037 0.55 91.995 -0.134 0.473 -0.689 - 0.062 0.498SE (0.068) (0.464) (0.285) - (0.004) (0.118) (0.307) (0.229) - (0.077) (0.043)p-value (0.098) (0.484) (0.037) - (0.256) (0.124) (0.003) -Estimate 0.47 97.212 -0.162 1.142 - 0.945 0.096 0.5 91.991 -0.068 1.481 - 0.822 0.1 0.372SE (0.494) (0.562) - (0.604) (0.011) (0.366) (0.723) - (0.529) (9.2e-4) (0.008)p-value (0.743) (0.042) - (0.118) (0.853) (0.041) - (0.120)Estimate 0.52 90.008 -0.113 - -0.625 0.387 0.106 0.53 87.682 -0.074 - -0.748 0.295 0.164 0.639SE (0.126) - (0.486) (0.342) (0.008) (0.039) - (0.394) (0.221) (1.76e-3) (0.027)p-value (0.37) - (0.198) (0.258) (0.058) - (0.057) (0.182)Estimate 0.51 103.427 - 0.959 -0.102 0.849 0.058 0.49 103.181 - 0.848 -0.217 0.917 0.139 0.613SE - (0.614) (0.089) (0.701) (0.096) - (0.574) (0.114) (0.647) (2.44e-3) (0.072)p-value - (0.118) (0.252) (0.226) - (0.14) (0.057) (0.156)Estimate 0.58 90.182 -0.062 -1.385 -1.119 -0.164 0.049 0.56 91.465 -0.041 -1.273 -1.196 -0.171 0.132 0.011SE (0.007) (1.564) (1.698) (0.716) (1.41e-4) (9.94e-3) (1.231) (1.185) (0.998) (9.93e-3) (9.76e-5)p-value (1.93e-5) (0.376) (0.51) (0.448) (3.71e-5) (0.301) (0.313) (0.864)Fully parametric MLE-based results( α = 0)Estimate 743.944 -0.030 -0.093 -0.311 -0.101 0.033 878.482 -0.025 -1.101 -0.742 -0.459 0.112 0.058SE (2.6e-3) (2.77e-1) (7.16e-1) (2.16e-1) (0.033) (4.21e-3) (6.88e-1) (6.24e-1) (4.55e-1) (4.75e-3) (2.75e-5)p-value (8.44e-31) (0.737) (0.664) (0.640) (2.88e-9) (0.110) (0.234) (0.313)Semi-parametric PLE-based resultsEstimate -0.031 -1.15 -0.831 -0.438 -SE (0.0052) (0.293) (0.2935) (256) -p-value (2.05e-9) (8.68e-5) (2.49e-1) (4.83e-3) -Semi-parametric BRE-based resultsEstimate -0.042 -1.237 -1.027 -0.164 -SE (0.0065) (0.528) (0.329) (0.285) -p-value (9.39e-11) (1.92e-2) (6.55e-1) (1.83e-2) - α Semiparametric0(MLE) 0.05 0.1 0.2 0.3 0.4 0.5 PLE BRE

Exponential baseline ﬁn yes -0.337 -0.122 -0.117 0.083 0.167 0.182 0.161 -0.346 -0.352(0.153) (0.155) (0.189) (0.215) (0.247) (0.321) (0.427) (0.190) (0.247) wexp yes -0.217 -0.093 -0.025 -0.011 0.005 0.037 0.060 -0.223 -0.512(0.189) (0.266) (0.317) (0.379) (0.491) (0.621) (0.794) (0.209) (0.242) prio γ Weibull baseline ﬁn yes -0.425 -0.342 -0.218 -0.109 -0.053 0.068 0.123(0.228) (0.245) (0.278) (0.301) (0.356) (0.435) (0.542) wexp yes -0.472 -0.382 -0.314 -0.235 -0.172 -0.108 0.003(0.241) (0.314) (0.365) (0.423) (0.481) (0.596) (0.742) prio γ γ optimal α DIC ﬁn wexp prio γ optimal α DIC ﬁn wexp prio γ γ Exponential baseline Weibull baseline

Fully parametric MDPDE-based resultsEstimate 0.47 189.85 -0.285 - - 0.048 0.48 181.216 -0.316 - - 0.009 0.128SE (0.197) - - (0.056) (0.372) - - (0.001) (0.215)p-value (0.148) - - (0.396) - -Estimate 0.53 154.496 - -0.82 - 0.057 0.49 167.842 - -0.057 - 0.012 0.212SE - (0.464) - (0.081) - (0.024) - (0.008) (0.271)p-value - (0.077) - - (0.018) -Estimate 0.57 180.809 - - 0.096 0.082 0.54 162.71 - - -0.026 0.024 0.085SE - - (0.056) (0.098) - - (0.089) (0.001) (0.094)p-value - - (0.086) - - (0.734)Estimate 0.55 145.834 -0.214 -0.056 - 0.049 0.61 148.423 -0.256 -0.047 - 0.008 0.137SE (0.394) (0.034) - (0.062) (0.392) (0.069) - (0.002) (0.169)p-value (0.066) (0.099) - (0.514) (0.496) -Estimate 0.48 174.984 -0.286 - 0.094 0.062 0.47 166.828 -0.228 - -0.016 0.017 0.187SE (0.162) - (0.082) (0.071) (0.198) - (0.005) (0.008) (0.189)p-value (0.077) - (0.252) (0.25) - (0.001)Estimate 0.52 163.429 - -0.659 0.917 0.037 0.51 169.398 - -0.416 0.48 0.027 0.296SE - (0.614) (0.351) (0.059) - (0.011) (0.51) (0.004) (0.142)p-value - (0.283) (0.009) - (5.71e-3) (0.347)Estimate 0.58 134.917 0.154 0.697 -0.102 0.038 0.57 α = 0)Estimate 694.124 -0.337 -0.217 0.083 0.022 978.543 -0.425 -0.472 0.109 0.004 0.472SE (0.153) (0.189) (0.026) (0.001) (0.228) (0.241) (0.009) (0.0001) (0.543)p-value (0.023) (0.251) (0.002) (0.026) (0.24) (9.61e-29)Semi-parametric PLE-based resultsEstimate -0.346 -0.223 0.092 -SE (0.190) (0.209) (0.028) -p-value (0.069) (0.285) (0.001) -Semi-parametric BRE-based resultsEstimate -0.352 -0.512 0.077 -SE (0.247) (0.242) (0.034) -p-value (0.154) (0.034) (0.022) - eferences [1] Agostinelli, C., Locatelli, I., Marazzi, A. and Yohai, V. J. (2017). Robust estimators of acceler-ated failure time regression with generalized log-gamma errors. Computational Statistics and DataAnalysis , 107, 92–106.[2] Allison, P. D. (1995).

Survival analysis using the SAS system: a practical guide.

Cary, NC: SASInstitute.[3] Asadzadeh, S. and Baghaei, A. (2018). Robust AFT-based monitoring procedures for reliabilitydata.

Quality Technology and Quantitative Management , 17(1), 1–16.[4] Basu, A., Harris, I. R., Hjort, N. L. and Jones, M. C. (1998). Robust and eﬃcient estimation byminimising a density power divergence.

Biometrika , 85, 549–559.[5] Basu, A., Shioya, H. and Park, C. (2011).

Statistical Inference: The Minimum Distance Approach .Boca Raton, FL: Chapman and Hall.[6] Basu, A., Ghosh, A., Mandal, A., Martin, N. and Pardo, L. (2017). A Wald-type test statistic fortesting linear hypothesis in logistic regression models based on minimum density power divergenceestimator.

Electronic Journal of Statistics , 11, 2741–2772.[7] Basu, A., Ghosh, A., Martin, N. and Pardo, L. (2018). Robust Wald-type tests for non-homogeneous observations based on the minimum density power divergence estimator.

Metrika ,81, 493–522.[8] Bednarski, T. (1993). Robust estimation in the Cox regression model.

Scandinavian Journal ofStatistics , 20, 213–225.[9] Bednarski, T. (2006). On robust model selection within the Cox model.

Econometrics Journal , 9,279–290.[10] Bednarski, T. and Borowicz, F. (2006). coxrobust: Robust Estimation in Cox Model.

R packageversion 1.0 .[11] Borgan, O. (1984). Maximum likelihood estimation in parametric counting process models, withapplications to censored failure time data.

Scandinavian Journal of Statistics , 11, 1–16; Correction11, 275.[12] Cox, D. R. (1972). Regression models and life tables (with discussion).

Journal of the RoyalStatistical Society, Series B , 34, 187–220.[13] Cox, D. R. (1975). Partial likelihood.

Biometrika , 62(2), 269–276.[14] Fox, J. and Carvalho, M. S. (2012). The RcmdrPlugin. survival package: extending the R com-mander interface to survival analysis.

Journal of Statistical Software , 49(7), 1–32.[15] Fraser, D. A. S. (1957). Most powerful rank-type tests.

Annals of Mathematical Statistics , 28,1040–1043. 3916] Ghosh, A., Basu, A. and Pardo, L. (2015). On the robustness of a divergence based test of simplestatistical hypotheses.

Journal of Statistical Planning and Inference , 116, 91–108.[17] Ghosh, A. and Basu, A. (2013). Robust estimation for independent non-homogeneous observa-tions using density power divergence with applications to linear regression.

Electronic Journal ofstatistics , 7, 2420–2456.[18] Ghosh, A. and Basu, A. (2015). Robust estimation for non-homogeneous data and the selectionof the optimal tuning parameter: the density power divergence approach.

Journal of AppliedStatistics , 42(9), 2056–2072.[19] Ghosh , A., Mandal, A., Martin, N. and Pardo, L. (2016). Inﬂuence analysis of robust Wald-typetests.

Journal of Multivariate Analysis , 147, 102–126.[20] Ghosh, A. and Basu, A. (2018). Robust bounded inﬂuence tests for independent but non-homogeneous observations.

Statistica Sinica , 28(3), 1133–1155.[21] Ghosh, A. and Basu, A. (2019). Robust and eﬃcient estimation in the parametric Cox regressionmodel under random censoring.

Statistics in Medicine , 38(27), 1–17.[22] Ghosh, A., Basu, A. and Pardo, L. (2019). Robust Wald-type tests under random censoring withapplications to clinical trial analyses. arXiv:1708.09695v2 [stat.ME] .[23] Hampel, F. R., Ronchetti, E., Rousseeuw, P. J., Stahel, W. (1986).

Robust statistics: the approachbased on inﬂuence functions . Wiley, New York.[24] Heritier, S. and Ronchetti, E. (1994). Robust bounded-inﬂuence tests in general parametric mod-els.

Journal of the American Statistical Association , 89, 897–904.[25] Hjort, N. L. (1992). On inference in parametric survival data models.

International StatisticalReview , 60(3), 355–387.[26] Hong, C. and Kim, Y. (2001). Automatic selection of the tuning parameter in the minimumdensity power divergence estimation.

Journal of the Korean Statistical Society , 30, 453–465.[27] Hosmer, D. W., Lemeshow, S. and May, S. (2008).

Applied survival analysis: regression modelingof time-to-event data . Hoboken, NJ: John Wiley & Sons.[28] Huber, P. J. (1983). Minimax aspects of bounded-inﬂuence regression (with discussion).

Journalof the American Statistical Association , 69, 383–393.[29] Jones, M. P. (1991). Robust tests for survival data involving a single continuous covariate.

Scan-dinavian Journal of Statistics , 18(1), 323–332.[30] Kurata, S., and Hamada, E. (2019). On the consistency and the robustness in model selectioncriteria.

Communications in Statistics-Theory and Methods , doi:10.1080/03610926.2019.1615093.[31] Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional hazards model.

Journal of the American Statistical Association , 84(408), 1074–1078.4032] Maksymiuk, A. W., Jett, J. R., Earle, J. D., Su, J. Q., Diegert, F. A., Mailliard, J. A., Kardinal,C. G., Krook, J. E., Veeder, M. H., Wiesenfeld, M., Tschetter, L. K., and Levitt, R. (1994).Sequencing and Schedule Eﬀects of Cisplatin Plus Etoposide in Small Cell Lung Cancer Results ofa North Central Cancer Treatment Group Randomized Clinical Trial.

Journal of Clinical Oncology ,12, 70–76.[33] Mantalos, P., Mattheou, K. and Karagrigoriou, A. (2010). An improved Divergence InformationCriterion for the determination of the order of an AR process.

Communication in Statistics -Simulation and Computation , 39, 865–879.[34] Mattheou, K., Lee, S. and Karagrigoriou, A. (2009). A model selection criterion based on theBHHJ measure of divergence.

Journal of Statistical Planning and Inference , 139, 228–235.[35] Minder, C. E. and Bednarski, T. (1996). A robust method for proportional hazards regression.

Statistics in Medicine , 15, 1033–1047.[36] Prentice, R. L. (1973). Exponential survivals with censoring and explanatory variables.

Biometrika , 60(2), 279–288.[37] Rossi, P. H., Berk, R. A. and Lenihan, K. J. (1980).

Money, work, and crime: some experimentalresults . New York: Academic Press.[38] Rousseeuw, P. J., Ronchetti, E. (1979). The Inﬂuence Curve for Tests. Research Report. 21,

Fachgruppe fr Statistik, ETH, Zurich .[39] Rousseeuw, P. J., Ronchetti, E. (1981). Inﬂuence curves for general statistics.

Journal of Com-putational and Applied Mathematics , 7, 161–166.[40] Basak, S., Basu, A., and Jones, M. C. (2020). On the ‘optimal’ density power divergence tuningparameter.

Journal of Applied Statistics , doi: 10.1080/02664763.2020.1736524.[41] Shen, H., Chai, H., Li, M., Zhou, Z., Liang, Y., Yang, Z., Huang, H., Liu, X. and Zhang, B.(2018). Robust sparse accelerated failure time model for survival analysis.

Technology and HealthCare , 26, S55–S63.[42] Takeuchi, K. (1976). Distribution of information statistics and criteria for adequacy of models.

Mathematical Sciences , 153 , 12–18 (in Japanese).[43] Wang, C-Y. and Song, X. (2016). Robust best linear estimator for Cox regression with instru-mental variables in whole cohort and surrogates with additive measurement error in calibrationsample.

Biometrical Journal , 58(6), 1465–1484.[44] Warwick, J. and Jones, M. C. (2005). Choosing a robustness tuning parameter.

Journal of Sta-tistical Computation and Simulation , 75(7), 581–588.[45] Ying, Z., Jung, S. H. and Wei, L. J. (1995). Median regression analysis with censored data.