Robust estimators for generalized linear models with a dispersion parameter
Michael Amiguet (1), Alfio Marazzi (1), Marina Valdora (2) and Victor Yohai (2)
(1) Université de Lausanne, (2) Universidad de Buenos Aires
March 29, 2017
Abstract
Highly robust and efficient estimators for the generalized linear model with a dispersion parameter are proposed. The estimators are based on three steps. In the first step, the maximum rank correlation estimator is used to consistently estimate the slopes up to a scale factor. In the second step, the scale factor, the intercept, and the dispersion parameter are consistently estimated using an MT-estimator of a simple regression model. The combined estimator is highly robust but inefficient. Then, randomized quantile residuals based on the initial estimators are used to detect outliers to be rejected and to define a set S of observations to be retained. Finally, a conditional maximum likelihood (CML) estimator given the observations in S is computed. We show that, under the model, S tends to the complete sample for increasing sample size. Therefore, the CML estimator tends to the unconditional maximum likelihood estimator. It is therefore highly efficient, while maintaining the high degree of robustness of the initial estimator. The case of the negative binomial regression model is studied in detail.

1 Introduction
In recent years, several extensions of the generalized linear models (GLM; Nelder and Wedderburn, 1972) have been proposed to increase flexibility in modelling complex data structures. We consider the case where the response distribution does not necessarily belong to the exponential family and where a dispersion parameter is present. For this case, we propose highly efficient and highly robust estimators. We focus on the Negative Binomial (NB) regression model, but we also consider the Beta regression model as an example with continuous response. NB regression (see Hilbe, 2008) extends Poisson regression for modeling count data in the presence of overdispersion. Beta regression (Ferrari and Cribari-Neto, 2004) is a tool for modelling continuous responses which are restricted to the interval (0, 1).

2 The model

Let F_{µ,α}(y) denote a general family of discrete or continuous distribution functions, where µ is the mean and α is a dispersion parameter, and let f_{µ,α}(y) denote the corresponding probability (density) function. We will focus on two specific examples of families, one discrete and one continuous:

- the NB family:

  f_{µ,α}(y) = [Γ(y + 1/α) / (Γ(1/α) Γ(y + 1))] (αµ + 1)^{−1/α} (αµ / (αµ + 1))^y,  y = 0, 1, 2, …,  α ≥ 0, µ ≥ 0;  (1)

- the Beta family:

  f_{µ,α}(y) = [Γ(1/α) / (Γ(µ/α) Γ((1 − µ)/α))] y^{µ/α − 1} (1 − y)^{(1−µ)/α − 1},  0 < y < 1,  α ≥ 0, 0 < µ < 1.  (2)

In both cases, the parametrization has been chosen so that the expected value is µ. In the NB case, the variance is µ + αµ²; in the Beta case, the variance is µ(1 − µ)/(1 + 1/α). In both cases, fixing µ, the variance increases with α.

We will need the following assumption on F_{µ,α}(y), which is satisfied in our examples:

Assumption A: For any α, if Y₁ ∼ F_{µ₁,α}, Y₂ ∼ F_{µ₂,α}, and µ₁ > µ₂, then Y₁ ≻ Y₂, where "≻" means "stochastically larger".

Suppose that a response Y and a vector X = (X₁, …, X_p)ᵀ of covariates are observed. We consider the following class of regression models

  Y | X = x ∼ F_{h(xᵀβ₀), α₀},  (3)

where h is a strictly increasing known link function and β₀ = (β₀₁, …, β₀p)ᵀ is a vector of coefficients. We assume that X₁ is constantly equal to one, that is, β₀₁ is an intercept. We will use the notations xᵀ = (1, x*ᵀ), β₀ᵀ = (β₀₁, β₀*ᵀ), γ₀ = β₀*/‖β₀*‖, and θ = (β, α).

We assume that a random sample (x₁, y₁), …, (x_n, y_n) is available. The ML estimator of θ₀ = (β₀, α₀) maximizes the log-likelihood of the sample, given by

  L(θ) = Σ_{i=1}^n ln f_{h(x_iᵀβ), α}(y_i).

The ML estimator is very efficient but not robust. We want to obtain highly robust and efficient estimators of β₀ and α₀.

3 Estimation procedure
The proposed procedure starts with the computation of a very robust but not necessarily efficient initial estimator, which provides the tool for outlier identification. Then a conditional ML approach is used, where the outliers are removed, which provides a fully efficient estimator.

Most familiar highly robust estimators of regression, such as LMS, LTS, and S estimators (see, e.g., Maronna et al., 2006), are based on the minimization of a robust measure of the residual scale, such as an M scale (Huber, 1980). These estimators have been used as initial estimators of well-known highly robust and efficient procedures, such as the MM (Yohai, 1987) and TML (Marazzi and Yohai, 2004) estimators. However, for the regression models we are considering here, a different approach has to be used, because the residual distribution may depend on the covariates and residual measures of scale are not available in this case. We therefore propose an approach based on the maximum rank correlation (MRC) estimator introduced by Han (1987a, 1987b). However, the MRC estimator identifies the scaled slopes γ₀ = β₀*/‖β₀*‖, but it does not identify the intercept β₀₁, the dispersion parameter α₀, or the scale factor η₀ = ‖β₀*‖. So we need to estimate these three parameters separately. The complete proposal can then be summarized as follows:

Step 1
Compute the MRC estimator γ̃ of γ₀. In addition, compute robust and consistent estimators β̃₁, α̃, and η̃ of β₀₁, α₀, and η₀. Then, initial estimators of β₀ and α₀ are given by β̃ = (β̃₁, η̃γ̃ᵀ)ᵀ and α̃, respectively.

Step 2
Compute randomized quantile residuals z_i (Dunn and Smyth, 1996) based on the initial model and use them to define cutoff values ã and b̃, so that influential outliers are defined as the observations with z_i ∉ [ã, b̃].

Step 3
Compute a conditional ML estimator of θ₀ given z_i ∈ [ã, b̃].

In the following subsections, we provide a detailed description of each single step.

3.1 The initial estimator

For a given coefficient vector γ = (γ₁, …, γ_p)ᵀ, Kendall's τ correlation coefficient between the responses y_i and the linear combinations γᵀx_i* is given by

  τ(γ) = [1 / (n(n − 1))] Σ_{i ≠ j} I[(γᵀx_j* − γᵀx_i*)(y_j − y_i) ≥ 0]

and the maximum rank correlation (MRC) estimator of γ₀ is defined by

  γ̃ = arg max_{‖γ‖ = 1} τ(γ).  (4)

The robustness of Kendall's τ correlation coefficient has been studied by Alfons et al. (2016). Under Assumption A, the MRC estimator converges strongly to γ₀ for any strictly increasing h (Han, 1987a); it is also root-n consistent and asymptotically normal (Sherman, 1993).

To compute the MRC estimator one can utilize a subsampling procedure. Note that a simple evaluation of the objective function requires O(n²) calculations, but an algorithm using O(n log n) calculations has been proposed by Abrevaya (1999). However, in the Monte Carlo experiments described in Section 4, we used the very fast function maxCorGrid of the R package ccaPP (Alfons, 2015), based on an alternate grid algorithm described in Alfons et al. (2016).

We now turn to the estimation of β₀₁, α₀, and η₀, necessary to complete the initial estimator. We observe that h(xᵀβ₀) = h(β₀₁ + η₀γ₀ᵀx*). Since γ̃ is close to γ₀, we approximate γ₀ᵀx_i* by v_i = γ̃ᵀx_i* and consider the simple regression model with just one covariate:

  Y | v ∼ F_{h(β₀₁ + η₀v), α₀}.  (5)

For this model and a given value α of the unknown α₀, we have many highly robust estimators β̃₁*(α), η̃*(α) of β₀₁ and η₀. Examples are: the conditionally unbiased bounded influence estimator of Künsch et al. (1989), the RQL estimator of Cantoni and Ronchetti (2001), and the weighted MT estimators of Valdora and Yohai (2014).
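To make the rank correlation objective (4) concrete, here is a small sketch in Python (the paper's computations use the R function maxCorGrid from ccaPP; the brute-force angular grid below for p = 2 slopes, and all function names, are ours and purely illustrative):

```python
import numpy as np

def tau_objective(gamma, X, y):
    """Rank correlation objective of the MRC estimator: the fraction of
    ordered pairs (i, j), i != j, on which gamma^T x and y vary concordantly."""
    u = X @ gamma
    du = u[None, :] - u[:, None]          # u_j - u_i over all pairs
    dy = y[None, :] - y[:, None]          # y_j - y_i over all pairs
    off = ~np.eye(len(y), dtype=bool)     # exclude i == j
    return np.mean((du * dy >= 0)[off])

def mrc_naive(X, y, n_angles=360):
    """Maximize tau over unit vectors in R^2 by a naive angular grid.
    Real implementations (Abrevaya, 1999; Alfons et al., 2016) are far faster."""
    best, best_val = None, -np.inf
    for ang in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        g = np.array([np.cos(ang), np.sin(ang)])
        val = tau_objective(g, X, y)
        if val > best_val:
            best, best_val = g, val
    return best
```

With noiseless data y that is strictly increasing in γ₀ᵀx*, the recovered direction is close to γ₀; the sign is identified because reversing γ reverses the concordances.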
Finally, to estimate α₀, we consider a bounded function ψ(y, µ, α) such that, for all µ,

  E_{µ,α}[ψ(y, µ, α)] = 0.  (6)

Then, for any fixed µ, if y₁, …, y_n is a random sample of NB(µ, α), the M estimator of α satisfying the equation

  Σ_{i=1}^n ψ(y_i, µ, α) = 0

is Fisher-consistent for α. Then, an initial consistent estimator α̃ of α₀ is obtained by solving

  Σ_i ψ(y_i, h(β̃₁*(α) + η̃*(α)v_i), α) = 0.  (7)

The Fisher consistency of this estimator is immediate. In fact, asymptotically, h(β̃₁*(α) + η̃*(α)v_i) = E(y_i), and then by (6), E(ψ(y, µ_i, α)) = 0.

Once α̃ is computed, we define the initial estimators of β₀₁ and η₀ by β̃₁ = β̃₁*(α̃), η̃ = η̃*(α̃). In this way we obtain the initial estimators β̃ = (β̃₁, η̃γ̃ᵀ)ᵀ of β₀ and α̃ of α₀. We will assume that:

Assumption B: n^{1/2}(β̃ − β₀) = O_p(1) and n^{1/2}(α̃ − α₀) = O_p(1).

In the simulations of Section 4 and the examples in Section 5, we use a weighted MT estimator for β̃₁*(α), η̃*(α) (see Appendix 8) and the score function ψ of the optimal bounded influence estimator according to Hampel (1972) described in Marazzi and Yohai (2010). It can be proved that, under general conditions, the resulting initial estimators β̃ and α̃ satisfy Assumption B.

3.2 Adaptive cutoff values and outlier detection

We now assume that some preliminary estimator θ̃ = (β̃, α̃) of θ₀ is available, for example the estimator defined in the previous section. Since the residual distribution depends on the covariates, residuals cannot be used in the usual way for the purpose of highlighting outliers. Instead, we use the randomized quantile residuals (RQR) that were proposed in Dunn and Smyth (1996) for exploratory purposes.
Let µ̃_x = h(xᵀβ̃). Then the RQRs are defined by

  z_i = F_{µ̃_{x_i}, α̃}(y_i)

in the continuous case and by

  z_i = F_{µ̃_{x_i}, α̃}(y_i) − u_i f_{µ̃_{x_i}, α̃}(y_i)

in the discrete case, where {u₁, …, u_n} is a sample from a uniform distribution U[0, 1], independent of (x₁, y₁), …, (x_n, y_n).

If θ̃ = θ₀, then {z₁, …, z_n} is a sample from U[0, 1]. Therefore, a fixed lower cutoff value a and a fixed upper cutoff value b for the RQRs are simply given by a low, respectively a large, quantile of U[0, 1] – e.g., a = 0.05 and b = 0.95 – and observations such that z_i ∉ [a, b] may be identified as outliers. However, we propose the use of "adaptive" cutoff values ã and b̃ that, under the assumed model, tend to 0 and 1 respectively, when β̃ and α̃ are consistent estimators. Therefore, under the model, i.e., in the absence of outliers, the fraction of observations that are erroneously identified as outliers tends to 0 when the sample size n → ∞.

To define the adaptive cutoff values, we follow a procedure similar to the ones described in Marazzi and Yohai (2004, Section 3.2) and in Gervini and Yohai (2002). Let F_n denote the empirical cdf of z₁, …, z_n, and let F^R_{n,t} and F^L_{n,t} be the right- and left-truncated versions of F_n at a given t, i.e.,

  F^R_{n,t}(z) = F_n(z)/F_n(t)  if z ≤ t,
  F^L_{n,t}(z) = (F_n(z) − F_n(t))/(1 − F_n(t))  if z ≥ t.

We compare these truncated versions of F_n with the U[0, 1] cdf and look for the largest t such that F^R_{n,t}(z) ≥ z for all z ≥ ζ, where ζ is a value close to one. More precisely, we define an upper cutoff value as

  b̃ = sup{ t : inf_{z ≥ ζ} (F^R_{n,t}(z) − z) ≥ 0 }.

In a similar way, we define a lower cutoff value as

  ã = inf{ t : sup_{z ≤ ζ} (F^L_{n,t}(z) − z) ≤ 0 },

where ζ is close to zero.

We assume that:

Assumption C: The density f_{µ,α}(y) has bounded derivatives with respect to µ and α.

Then, we have the following theorem, proved in Appendix 9.

Theorem 1
Assume B and C. Then n^{1/2} ã = O_p(1) and n^{1/2}(b̃ − 1) = O_p(1).

Usually, a quite high value of ζ is chosen. Our usual choice is ζ = 0.95; however, in the presence of a large proportion of high outliers, it may be convenient to use a lower value, e.g., ζ = 0.90. Similar considerations apply to the choice of the lower cutoff ã: we usually set ζ = 0.05, but ζ = 0.10 would allow removing a larger fraction of small observations, such as "excess zeros" in the NB case. (In fact, a very small ζ could fail to identify many "excess zeros", because each one of them corresponds to several distinct z_i's and may not emerge as an extremely small value.)

3.3 Final estimator

In the final step, we improve the efficiency of the initial estimator using a conditional ML approach. Suppose first that fixed cutoff values a and b are given and the RQRs are computed. Let p_{β,α}(y | x, Z ∈ [a, b]) denote the conditional density of Y given X = x and Z ∈ [a, b], where Z ∼ U[0,
1] represents the RQR. Then, the conditional density is of the form

  p_{β,α}(y | x, Z ∈ [a, b]) = f_{h(xᵀβ), α}(y) W_{a,b}(x, β, α).  (8)

In the continuous case we have

  W_{a,b}(x, β, α) = I(F^{−1}_{µ_x,α}(a) ≤ y ≤ F^{−1}_{µ_x,α}(b)) / (b − a).

In the discrete case, the following expression (9) for W_{a,b}(x, β, α) is derived in Appendix 7. Let, for any c,

  y*_x(c) = max{ y : F_{µ_x,α}(y) ≤ c }  and  t_{c,x} = (F_{µ_x,α}(y*_x(c) + 1) − c) / f_{µ_x,α}(y*_x(c) + 1).

Put T_{a,x} = y*_x(a) + 2, T_{b,x} = y*_x(b), A_x = { y : T_{a,x} ≤ y ≤ T_{b,x} }, and

  Q(x, β, α) = F_{µ_x,α}(T_{b,x}) − F_{µ_x,α}(T_{a,x} − 1) + f_{µ_x,α}(T_{a,x} − 1) t_{a,x} + f_{µ_x,α}(T_{b,x} + 1)(1 − t_{b,x}).

Then

  W_{a,b}(x, β, α) = 1/Q(x, β, α) if y ∈ A_x,
                   = t_{a,x}/Q(x, β, α) if y = T_{a,x} − 1,
                   = (1 − t_{b,x})/Q(x, β, α) if y = T_{b,x} + 1.  (9)

Now suppose that ã and b̃ are the adaptive cutoff values defined above, and consider the adaptive conditional likelihood function

  L_CML(θ) = Σ_{i=1}^n I(ã ≤ z_i ≤ b̃) ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃]).

The conditional maximum likelihood (CML) estimator θ̂_CML = (β̂_CML, α̂_CML) is defined by

  θ̂_CML = arg max_θ L_CML(θ).

In the discrete case, a slight modification of this definition is convenient. We note that (see Appendix 7)

  {a ≤ z_i ≤ b} = {T_{a,x_i} ≤ y_i ≤ T_{b,x_i}} ∪ {y_i = T*_{l,x_i}, u_i ≤ t_{a,x_i}} ∪ {y_i = T*_{u,x_i}, u_i ≥ t_{b,x_i}},

where T*_{l,x} = T_{a,x} − 1 and T*_{u,x} = T_{b,x} + 1. Then,

  L_CML(θ) = Σ_{T_{ã,x_i} ≤ y_i ≤ T_{b̃,x_i}} ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃])
           + Σ_{y_i = T*_{l,x_i}} I(u_i ≤ t_{ã,x_i}) ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃])
           + Σ_{y_i = T*_{u,x_i}} I(u_i ≥ t_{b̃,x_i}) ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃]).

Since the u_i's are non-informative, we replace I(u_i ≤ t_{ã,x_i}) and I(u_i ≥ t_{b̃,x_i}) by their expected values, and define

  L_MCML(θ) = Σ_{T_{ã,x_i} ≤ y_i ≤ T_{b̃,x_i}} ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃])
            + Σ_{y_i = T*_{l,x_i}} t_{ã,x_i} ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃])
            + Σ_{y_i = T*_{u,x_i}} (1 − t_{b̃,x_i}) ln p_{β,α}(y_i | x_i, z_i ∈ [ã, b̃]).

We then define the modified CML (MCML) estimator θ̂_MCML = (β̂_MCML, α̂_MCML) by

  θ̂_MCML = arg max_θ L_MCML(θ).

From (9) and Theorem 1, it is easy to show that

  n^{1/2}(W_{ã,b̃}(x, β, α) − 1) = O_p(1)  (10)

and therefore

  n^{1/2}(p_{β,α}(y | x, Z ∈ [ã, b̃]) − f_{h(xᵀβ), α}(y)) = O_p(1).  (11)

Then, according to (11), both L_CML(θ) and L_MCML(θ) tend, under the model, to the unconditional likelihood function at rate n^{−1/2}. For this reason, we conjecture that both the CML and the MCML estimators have the same asymptotic distribution as the unconditional ML estimator, that is,

  n^{1/2}(θ̂_CML − θ₀) →_D N_p(0, I^{−1}(θ₀))  and  n^{1/2}(θ̂_MCML − θ₀) →_D N_p(0, I^{−1}(θ₀)),

where →_D denotes convergence in distribution, N_p(µ, Σ) the p-variate normal distribution with mean µ and covariance matrix Σ, and I(θ) the information matrix. This implies that θ̂_CML and θ̂_MCML are both fully efficient.
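The two ingredients of Steps 2 and 3 – discrete RQRs and the adaptive upper cutoff – can be sketched in Python for the NB parametrization (1) (scipy's nbinom with n = 1/α and p = 1/(1 + αµ) has mean µ; the function names and the downward scan over candidate values of t are our own implementation choices, not the paper's code):

```python
import numpy as np
from scipy.stats import nbinom

def rqr_nb(y, mu, alpha, rng):
    """Randomized quantile residuals z_i = F(y_i) - u_i * f(y_i), u_i ~ U[0,1],
    for the NB(mu, alpha) parametrization with mean mu and variance mu + alpha*mu^2."""
    n_par, p_par = 1.0 / alpha, 1.0 / (1.0 + alpha * mu)
    F = nbinom.cdf(y, n_par, p_par)
    f = nbinom.pmf(y, n_par, p_par)
    return F - rng.uniform(size=len(y)) * f

def upper_cutoff(z, zeta=0.95):
    """Adaptive upper cutoff: the largest t for which the right-truncated
    empirical cdf F_n(z)/F_n(t) dominates the U[0,1] cdf on [zeta, t]."""
    z = np.sort(z)
    n = len(z)
    for t in np.concatenate(([1.0], z[::-1])):   # candidates, scanned downward
        if t < zeta:
            break
        Ft = np.searchsorted(z, t, side="right") / n
        zs = z[(z >= zeta) & (z <= t)]
        Fz = np.searchsorted(z, zs, side="right") / n
        if Ft > 0 and np.all(Fz / Ft >= zs):
            return t
    return zeta                                   # reject everything above zeta
```

Under the model the z_i are approximately U[0, 1] and the cutoff stays near 1; a cluster of large outliers pushes F_n below the diagonal near 1 and forces the cutoff down.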
Remark 1. Empirical results show that, in order to optimize the finite sample efficiency, with no loss of robustness, it is convenient to iterate the conditional ML estimator as follows. Given a current value of θ̂_CML (or θ̂_MCML), we compute new RQRs. Then, we compute new values of ã and b̃ and use them to update θ̂_CML. Often, the process converges after a few iterations, but it can also move away from the initial value. In the experiments reported in Section 4, we found that two steps are enough: the efficiency did not improve using more iterations. Moreover, in the discrete case, the final estimator slightly depends on the sample {u₁, …, u_n} used to compute ã and b̃. To remove this dependency, we propose to average the final step (MCML) over a few replications of this sample.

Remark 2. In certain circumstances, we may use a very simple alternative procedure to compute robust and consistent estimators of β₀₁, α₀, and η₀ in (5). We first identify a simple model, which is free of the dispersion parameter, and that can be taken as an approximation of (5). For example, the Poisson regression model with mean h(β₀₁ + η₀v) may be taken as an approximation of the NB model. We then use an available robust procedure to estimate β₀₁ and η₀. In the NB case, the conditionally unbiased bounded influence estimator of Künsch et al. (1989), implemented in the R package "robeth" (Marazzi, 1992), is a natural choice. In the Beta regression case, we note that Atkinson (1985) transforms the response so that the transformed dependent variable (e.g., log(y/(1 − y))) assumes values on the real line, and then uses it in a linear regression analysis. Clearly, we may also use a robust regression estimator in this case, e.g., the MM estimator implemented in the R package "robustbase". Finally, we estimate α₀ using (7). Since the approximate model is not the correct one, the estimators do not converge to β₀₁, α₀, and η₀.
Usual robust estimators converge, however, to their population values and can be used to define fixed cutoff values a and b for Z, which also converge to their asymptotic values. The CML (MCML) estimator of (β₀₁, α₀, η₀) given Z ∈ [a, b] is then consistent under (5).

4 Simulations

We present simulation results only for the NB regression model (3)-(1). We compared the initial estimators β̃ and α̃ and the final modified CML estimators β̂ and α̂. In the following, these estimators will be referred to as INI and CML, respectively. All cutoff values were adaptive, with ζ = 0.05 for the lower and ζ = 0.95 for the upper cutoff. In order to compute the MRC estimators we used the function maxCorGrid of the R package ccaPP (Alfons et al., 2015). The INI estimator was completed with the help of the weighted MT estimator described in Appendix 8. In order to estimate α, we used the function ψ defined by the equation for α of the optimal M estimator M80 described in Marazzi and Yohai (2010, p. 174) and available in the "robustGLM" package. To compute the CML estimator, we used the standard R optimizer "optim", reparametrizing α with σ = √α in order to satisfy the constraint α > 0. (For a very small number of contaminated cases, the optimization process diverged; the initial solution was recorded in such cases.) Only two iterations of the CML procedure were computed. For comparison, we also computed the GM estimators of Aeberhard et al. (2014), referred to as ACH in the following. To compute the ACH estimator, we used the R function glmrob.nb (available on the internet) with the parameters bounding.func='T/T', c.tukey.beta=4, c.tukey.sig=4, as suggested by the authors, and the option x-weight=hard, which provides hard rejection weights for the covariate observations. We used the following model:

  y ∼ F_{exp(xᵀβ₀), α₀},  x = (1, x*ᵀ)ᵀ,  x* ∼ N(0, I),  (12)

with β₀ = (1.…, .…, .…, …, …, …)ᵀ and α₀ = 0.… .

We first performed four experiments with samples of size n = 100, 400, 1000, 2000 from (12) without addition of outliers. For each experiment, the number of replications was N = 1000. To measure the quality of an estimator (β̂, α̂) we used the mean absolute estimation error (MAEE) and the mean absolute prediction error (MAPE). The MAEE of β̂ is defined by

  MAEE(β̂) = (1/N) Σ_{i=1}^N ‖β̂_i − β₀‖₁,

where β̂_i is the estimate of β₀ based on the i-th replication and ‖·‖₁ denotes the l₁ norm. The MAEE of α̂ is defined in a similar way by

  MAEE(α̂) = (1/N) Σ_{i=1}^N |α̂_i − α₀|.

The MAPE of the prediction estimator µ̂_x = exp(xᵀβ̂) of µ_x = exp(xᵀβ₀) is defined as

  MAPE(µ̂) = (1/N) Σ_{i=1}^N |µ̂_i − µ_i|,

where µ_i = exp(x_iᵀβ₀), µ̂_i = exp(x_iᵀβ̂_i), and x_i is the i-th replication of x. Table 1 reports the empirical relative efficiencies, measured as the ratios of the MAEE and MAPE of the robust estimators with respect to the corresponding MAEE and MAPE of the ML estimators.

Table 1. Empirical relative efficiencies of coefficients, dispersion, and prediction estimators.
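The two error measures above can be written compactly; a Python sketch (we take ‖·‖ to be the l1 norm, and the helper names are ours):

```python
import numpy as np

def maee(estimates, truth):
    """Mean absolute estimation error: average l1 distance to the true value
    over the replications."""
    return np.mean([np.abs(np.asarray(e) - truth).sum() for e in estimates])

def mape(beta_hats, beta0, xs):
    """Mean absolute prediction error of mu_hat = exp(x^T beta_hat) against
    mu = exp(x^T beta0), with one covariate draw x per replication."""
    return np.mean([abs(np.exp(x @ b) - np.exp(x @ beta0))
                    for b, x in zip(beta_hats, xs)])
```

For a scalar parameter such as α, `maee` reduces to the mean absolute deviation used for MAEE(α̂).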
We observe that the relative efficiencies of the initial estimators were low but could be improved with the help of the final MCML procedure. With the exception of the dispersion estimator for n = 100, our final estimator is much more efficient than the ACH competitor. (The tuning constants of the ACH estimator were apparently chosen by the authors in order to obtain a satisfactory degree of robustness.)

In another simulation, the model (12) was contaminated with 10% of point-wise contamination. Preliminary experiments showed that the estimators were quite sensitive to outlying values of y when x* = (3, …, …, …, …)ᵀ. This value of x is moderately outlying with respect to the majority of the covariate observations. Therefore, we used point contaminations of the form (x*_out, y_out) with x*_out = (3, …, …, …, …)ᵀ and a response y_out varying over a grid of values. For each value of y_out, we generated 1000 samples of size n = 400 according to (12) and then replaced 10% of the observations with identical outliers of the form (x*_out, y_out). Table 2 reports the MAEE and MAPE of the estimators for the different values of y_out. (Outliers were excluded in the computation of the MAPE.) The results are also displayed in Figure 1. Both the MAPE and the MAEE of the proposed estimators were smaller than those of ACH for most values of y_out.

Table 2. MAEE and MAPE of coefficient, dispersion, and prediction estimators for varying y_out.
Figure 1. Mean absolute prediction and estimation errors for varying y_out.

5 Examples

In modern hospital management, stays are classified into "diagnosis related groups" (DRGs; Fetter et al., 1980), which are designed to be as homogeneous as possible with respect to diagnosis, treatment, and resource consumption. The mean cost of stay of each DRG is periodically estimated with the help of administrative data on a national basis and used to determine "standard prices" for hospital funding and reimbursement.
Typical stays are reimbursed according to the standard prices, whereas the reimbursement of exceptional stays (outliers) is subject to special negotiations among the partners. Since it is difficult to measure cost, length of stay (LOS) is often used as a proxy. Outliers are usually defined as observations with a LOS larger than some arbitrary cutoff value. In designing and refining the groups, the relationship between LOS and other variables which are usually available on administrative files has to be assessed and taken into account.

We first reconsider the example described in Marazzi and Yohai (2010). In this example there are no covariables, that is, only the parameters of a NB distribution are estimated. Table 3 shows the LOS of 32 stays classified into DRG "disorders of the nervous system", and we immediately identify three extreme values: 115, 198, 374 days. The arithmetic means with and without these observations are 25.… and …, respectively. We computed the ML estimator (MLE), the estimator M80, and the CML estimator with a = 0.05 and b = 0.95, based on two iterations starting from M80 and averaged over 100 replications of {u₁, …, u_n}. We also computed the three estimators (MLE*, M80*, CML*) after removal of the three outliers. The numerical results are shown in Table 4. They show that M80 and CML provided results which were similar to MLE* and unaffected by the outliers. The average values of ã and b̃ were ā = 0.… and b̄ = 0.953, from which we derived T_ā = 1, T_b̄ = 7, t_ā = 0.61, and t_b̄ = 0.43. This means that, on average, the CML estimator completely rejected LOS−1 values outside [0, 8] and gave weights 0.61 and 0.57 to the extremes of this interval.
Table 3. Length of stay of 32 hospital patients.
Table 4. Estimates of LOS-1 mean and LOS-1 dispersion for disorders of the nervous system.
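The truncation points and boundary weights used in the example above (T_ā = 1, T_b̄ = 7, t_ā, t_b̄) follow from the construction of Appendix 7. A Python sketch for the NB case without covariates (function names are ours; this is an illustration, not the paper's R code):

```python
import numpy as np
from scipy.stats import nbinom

def truncation_points(a, b, mu, alpha):
    """Truncation points T_a, T_b and boundary weights t_a, t_b for
    conditioning the NB(mu, alpha) likelihood on a <= z <= b."""
    n_par, p_par = 1.0 / alpha, 1.0 / (1.0 + alpha * mu)
    F = lambda k: nbinom.cdf(k, n_par, p_par)
    f = lambda k: nbinom.pmf(k, n_par, p_par)

    def y_star(c):                      # max{y : F(y) <= c} (-1 if F(0) > c)
        k = 0
        while F(k) <= c:
            k += 1
        return k - 1

    ya, yb = y_star(a), y_star(b)
    t_a = (F(ya + 1) - a) / f(ya + 1)
    t_b = (F(yb + 1) - b) / f(yb + 1)
    return ya + 2, yb, t_a, t_b         # T_a, T_b, t_a, t_b
```

Observations with y in [T_a, T_b] are retained with full weight; y = T_a − 1 is retained with weight t_a and y = T_b + 1 with weight 1 − t_b.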
In a second example, we considered a sample of 649 hospital stays (256 male and 393 female patients) for the "major diagnostic category" (MDC) "Diseases and Disorders of the Endocrine, Nutritional And Metabolic System". An MDC is simply a group of DRGs associated with a particular medical specialty. The data are shown in Figure 2 (two outliers with LOS = 84 and LOS = 122 fall beyond the upper limit of the figure).

We studied the relationship between LOS − 1 and the Age (x₁, in years) and Sex (x₂ = 0 for males and x₂ = 1 for females) of the patient. We considered a NB model with exponential link and linear predictor β₁ + β₂x₁ + β₃x₂ + β₄x₁x₂. We compared the ML, the ACH, and the complete estimator (called CML in the following) proposed in Section 3. The ACH estimator was computed with the help of the R function glmrob.nb with the tuning parameters suggested by the authors. The CML step – with a = 0.05 and b = 0.95 and two iterations – was replicated 30 times with different vectors {u₁, …, u_n}. The average values of ã and b̃ were ā = 0.004 and b̄ = 0.…, from which we computed T_{ā,x_i}, T_{b̄,x_i}, t_{ā,x_i}, and t_{b̄,x_i} (i = 1, …, n). We found that 65 observations were totally rejected, 62 fell on the lower limits T_{ā,x_i} (receiving an average weight 0.96) and 9 on the upper limits T_{b̄,x_i} (with an average weight 0.…).

Figure 2. Data: LOS-1 and Age of patients and fitted models according to CML and ML.
Table 5. Coefficient (standard errors) and dispersion estimates for disorders of the endocrine system.
We observe that the CML and the ML* coefficient estimates are very close and quite similar to the ACH estimates. (However, the standard errors provided by glmrob.nb are surprisingly large.) We also note that the dispersion parameter is heavily inflated by the contamination. For CML and ML*, the Sex effect (β₃) is significant at the 5% level and the interaction (β₄) is not significant. Instead, for ML, the interaction is significant at the 5% level, but not the effect of Sex. Thus, the classical and the robust inferences are different.

Figure 3 shows three uniform qq-plots of randomized tail probabilities z₁, …, z_n based on different estimates of α and β. In panel (a) the ML estimator has been used, and the sigmoidal shape suggests that the estimated model is incorrect. In panel (b), the z-values were based on the modified CML estimator; the plot is more linear, but it gradually departs from the diagonal for increasing quantiles. This suggests that the robustly fitted model is adequate for a large proportion of the data but not for those corresponding to very large values of z. Panel (c) is based on ML*, and the z-values corresponding to the full outliers according to CML have been removed from the plot; this plot follows the diagonal line very well. Finally, the boxplots in panel (d) compare the distributions of the absolute residuals (without outliers) based on ML, ACH, CML, and ML*; the two latter ones are globally smaller than the former ones. We conclude that CML (and ML*) provide an adequate model for about 90% of the population.

Figure 3. qq-plots of randomized tail probabilities based on ML, CML, and ML* with removal of the extreme z-values from the plot, and boxplots of the absolute residuals of ML, ACH, CML, and ML*.

6 Discussion
In many areas of applied statistics, the data may be affected by a high level of contamination. An example is the analysis of hospital length of stay, where contamination levels as high as 10% are not uncommon. For this reason, different ad hoc rules of trimming had long been used by practitioners to remove outliers from their data (e.g., Marazzi et al., 1998). In these applications, well-founded highly robust procedures are needed.

Maronna et al. (1979) showed that classical M and GM estimators of regression (see, e.g., Huber, 1980; Hampel et al., 1986) were unable to combine a high level of robustness and a high level of efficiency: M and GM estimators can be very efficient, but are very sensitive to outliers in the factor space. This work stimulated the research on high breakdown-point estimation that provided LMS, LTS, and S estimators (see, e.g., Maronna et al., 2006), just to mention three among many other procedures. Then, for the usual linear regression problem, the MM estimators of Yohai (1987) combined high breakdown point and high efficiency with the help of a two-step approach: in the first step, a very robust initial fit (an S estimator) provided the tool for outlier identification; the second step was based on an efficient estimator (an M estimator), where the outliers were downweighted. Since then, similar two-step procedures have been proposed for different models (Marazzi and Yohai, 2004; Locatelli et al., 2010; Agostinelli et al., 2014).

However, the familiar high breakdown-point regression estimators used in the first step are based on minimization of a robust measure of the residual scale and, unfortunately, cannot be used for GLMs with a dispersion parameter, such as NB and Beta regression. The reason is that the residual distribution depends on the covariates, and robust residual measures of scale are not available in this case. In this paper, we propose a more general approach that bypasses residual scales.

Our proposal is an original assembly of well-known procedures. In the initial step, we use the MRC estimator (Han, 1987a) to estimate the slopes up to a scale factor. A very fast algorithm to compute this estimator has recently been proposed in Alfons et al. (2016). We complete the MRC with the help of a weighted MT estimator (Valdora and Yohai, 2014) of a simple negative binomial regression. We then use randomized quantile residuals (Dunn and Smyth, 1996) to determine adaptive cutoff values ã and b̃, using a procedure similar to the one proposed in Marazzi and Yohai (2004). Influential outliers are identified as the observations whose residuals do not belong to [ã, b̃]. Finally, we compute a conditional ML estimator given that the residuals belong to [ã, b̃]. Since, in the absence of outliers, ã → 0 and
1, the CML estimator tends to the ML estimatorfor n → ∞ . It is therefore fully efficient.Monte Carlo simulations confirm that our proposal is very efficient under themodel and very robust under point contamination, both in the response and thecovariate distributions. This kind of contamination is unrealistic; however, it is gen-erally the least favorable one and allows evaluation of the maximal bias an estimatorcan incur. The CML estimator for NB regression also resists to a moderate fractionof excess zeroes in the response. A more vigorous treatment of this peculiarity ofcount data should however be approached with the help of specific models, such ashurdle models (see, e.g. Min and Agresti, 2002, and Cantoni and Zedini, 2009).We have shown that the proposed method is a useful tool for modelling hospitallength of stay as a function of available covariates, while identifying influential outliersaccording to a model based rule. A set of R functions to compute the proposedestimators is made available as an R package.22 ppendices To simplify notations, we just consider the case without covariates; the extension tothe regression case is straightforward. We suppose that ϑ = ( µ , α ) is given and let z = F ϑ ( y ) − uf ϑ ( y ), where u ∼ U [0 , a and b are given cutoff valuesfor z and define, for any c , y ∗ ( c ) = max { y : F ϑ ( y ) ≤ c } . Note that F ϑ ( y ∗ ( a ) + 1) − uf ϑ ( y ∗ ( a ) + 1) ≥ a is equivalent to u ≤ F ϑ ( y ∗ ( a ) + 1) − af ϑ ( y ∗ ( a ) + 1) = t a . Similarly F ϑ ( y ∗ ( b ) + 1) − uf ϑ ( y ∗ ( b ) + 1) ≤ b is equivalent to u ≥ F ϑ ( y ∗ ( b ) + 1) − bf ϑ ( y ∗ ( b ) + 1) = t b . Put T a = y ∗ ( a ) + 2, T b = y ∗ ( b ), and A = { y : T a ≤ y ≤ T b } . We have { a ≤ z ≤ b } = A ∪ { y = T a − , u ≤ t a } ∪ { y = T b + 1 , u ≥ t b } , and then P ϑ ( a ≤ z ≤ b | u ) = P ϑ ( A ) + f ϑ ( T a − I ( u ≤ t a ) + f ϑ ( T b + 1) I ( u ≥ t b ) , P ϑ ( A ) = F ϑ ( T b ) − F ϑ ( T a − v = I ( a ≤ z ≤ b ). 
Since $E[I(u \le t_a)] = P(u \le t_a) = t_a$, the distribution of $y \mid v = 1$ is given by
$$p_\vartheta(y \mid v = 1) = \begin{cases} f_\vartheta(y)/Q(\vartheta) & \text{if } y \in A, \\ f_\vartheta(y)\, t_a / Q(\vartheta) & \text{if } y = T_a - 1, \\ f_\vartheta(y)\,(1 - t_b)/Q(\vartheta) & \text{if } y = T_b + 1, \end{cases}$$
where
$$Q(\vartheta) = E[P_\vartheta(a \le z \le b \mid u)] = P_\vartheta(A) + f_\vartheta(T_a - 1)\, t_a + f_\vartheta(T_b + 1)\,(1 - t_b).$$

We now describe the use of the weighted MT estimator to compute $\tilde\beta_0^*(\alpha)$ and $\tilde\eta^*(\alpha)$ introduced in Subsection 3.1. We consider the simple regression model $Y \mid v \sim F_{h(\beta_0 + \eta_0 v), \alpha}$. Assuming that $\alpha$ is known, the weighted MT estimator of $(\beta_0, \eta_0)$ is defined as follows:
$$(\tilde\beta^*(\alpha), \tilde\eta^*(\alpha)) = \arg\min_{\beta_0, \beta_1} \frac{1}{n} \sum_{i=1}^n w(x_i, \hat M, \hat S)\, \rho\big(t(y_i, \alpha) - m(h(\beta_0 + \beta_1 x_i), \alpha)\big), \quad (13)$$
where $\rho$ is a continuous and bounded function with a unique local minimum at 0, $m$ is the function defined by
$$m(\mu, \alpha) = \arg\min_\gamma E_{\mu,\alpha}\big(\rho(t(y, \alpha) - \gamma)\big), \quad (14)$$
$t(y, \alpha)$ is a variance stabilizing transformation, and $w(x, \hat M, \hat S)$ is a nonnegative nonincreasing function of $|x - \hat M| / \hat S$, where $\hat M$ and $\hat S$ are robust estimators of location and scale of the covariate $x$. Usually, $\rho$ is taken in Tukey's biweight family, given by
$$\rho_c^T(u) = 1 - \max\Big(\big(1 - (u/c)^2\big)^3,\ 0\Big).$$
In our simulations in Section 4 and the examples in Section 5 with the NB distribution, we used the transformation
$$t(y, \alpha) = \begin{cases} s(y, \alpha) & \text{if } 0 < \alpha \le 1.3, \\ s(y, 1.3) & \text{if } \alpha > 1.3, \end{cases} \qquad s(y, \alpha) = \sqrt{1/\alpha - 1/2}\; \operatorname{asinh}\!\left(\sqrt{\frac{y + 3/8}{1/\alpha - 3/4}}\right).$$
This is a modification of the transformation proposed by Yu (2009) to allow values of $\alpha$ larger than $4/3$. We take $w(x, \hat M, \hat S) = I\big(|x - \operatorname{median}(x_i)| / \operatorname{mad}(x_i) < 2\big)$ and $h(z) = \exp(z)$. Since the variance of $t(y, \alpha)$ is almost constant, it is not necessary to divide the argument of $\rho_c$ by a scale estimator. While the efficiency of the estimator increases with $c$, its degree of robustness decreases. Since the weighted MT estimator is used to define an initial estimator whose efficiency will be improved in further steps, the value of $c$ is chosen in order to obtain a satisfactory degree of robustness. By trial and error we obtained a rule taking $c$ proportional to $\sigma(\alpha)$, where, for each $\alpha$, $\sigma(\alpha)$ is a constant approximating the standard deviation of $t(y, \alpha)$. The value of $\sigma(\alpha)$ is obtained by interpolating the values in Table A1:

$\alpha$   .10  .20  .30  .40  .50  .60  .70  .80  .90  1.0  1.1  1.2  1.3
$\sigma$   .41  .40  .39  .37  .36  .35  .33  .32  .30  .29  .27  .26  .24

Table A1. Approximate standard deviations of $t(y, \alpha)$ for the NB distribution.

For the Beta distribution, we have $\operatorname{Var}_{\mu,\alpha}(y) = \mu(1-\mu)/(1+\alpha)$, and a suitable variance stabilizing transformation (Bartlett, 1947) is given by
$$t(y, \alpha) = \int^y \operatorname{Var}_{\mu,\alpha}(y)^{-1/2}\, d\mu = 2\sqrt{1+\alpha}\, \arcsin(\sqrt{y}).$$
In our experiments we used this transformation for $\alpha \in [5, 50]$ and the link function $h(u) = \exp(u)/(1 + \exp(u))$. We follow the same approach as in the NB case. The approximate standard deviations (on a grid of values of $\alpha$ in $[5, 50]$) are given in Table A2:

$\sigma$   .42  .43  .43  .44  .45  .45  .47  .48  .48  .49  .49  .49  .49  .49

Table A2. Approximate standard deviations of $t(y, \alpha)$ for the Beta distribution.

When $\alpha$ is unknown, the estimator $(\tilde\beta^*(\tilde\alpha), \tilde\eta^*(\tilde\alpha))$ simultaneously satisfies equations (13) and (7). To compute an approximate solution we consider a grid of possible values of $\alpha$, namely the values in the tables above. For each $\alpha$ in the grid, we first compute $(\tilde\beta^*(\alpha), \tilde\eta^*(\alpha))$ and then the solution $\tilde\alpha^*$ of (7). The desired approximation is then defined as the vector $(\tilde\beta^*(\tilde\alpha^*), \tilde\eta^*(\tilde\alpha^*))$ for which the difference between $\alpha$ and $\tilde\alpha^*$ is minimal.

We consider the discrete case, where the RQRs are defined by
$$z_i = F_{h(x_i^T \tilde\beta), \tilde\alpha}(y_i) - u_i f_{h(x_i^T \tilde\beta), \tilde\alpha}(y_i), \quad 1 \le i \le n.$$
By Assumption B, there exist $A_0$ and $B_0$ such that, if
$$D_n = \big\{\, n^{1/2} |\tilde\alpha - \alpha| \le A_0,\ \| n^{1/2} (\tilde\beta - \beta) \| \le B_0 \,\big\},$$
we have $P(D_n) \ge 1 - \zeta/2$. Let
$$v_i = F_{h(x_i^T \beta), \alpha}(y_i) - u_i f_{h(x_i^T \beta), \alpha}(y_i), \quad 1 \le i \le n.$$
Then the $v_i$'s are i.i.d. with distribution $U[0,1]$. By Assumption C, there exist $K_1$ and $K_2$ such that
$$z_i \ge v_i - \big(K_1 \|\tilde\beta - \beta\| + K_2 |\tilde\alpha - \alpha|\big),$$
i.e., $z_i \ge v_i - n^{-1/2} B_n$, where $B_n = O_p(1)$. Let $e_0$ be such that, if $M_n = \{B_n \le e_0\}$, then $P(M_n) \ge 1 - \zeta/2$. Let $F_n^z$ and $F_n^v$ be the empirical distribution functions of the $z_i$'s and the $v_i$'s, respectively. Then we have
$$F_n^z(v) \le F_n^v(v + n^{-1/2} B_n). \quad (15)$$
Since $E_n = \sup_v n^{1/2} |F_n^v(v) - v| = O_P(1)$, we get $F_n^v(v) \le v + n^{-1/2} E_n$. Then, putting $G_n = B_n + E_n$, by (15) we obtain
$$F_n^z(v) \le v + n^{-1/2} G_n. \quad (16)$$
In a similar way we get
$$F_n^z(v) \ge v - n^{-1/2} G_n^*, \quad (17)$$
where $G_n^* = O_p(1)$. Let
$$H_{na}(v) = \frac{F_n^z(v) - F_n^z(a)}{1 - F_n^z(a)}$$
and $\mathcal{A} = \{ a : \sup_{v \le \zeta} (H_{na}(v) - v) \le 0 \}$. Then $\tilde a = \inf \mathcal{A}$. Note that $a \in \mathcal{A}$ is equivalent to $F_n^z(v) \le v (1 - F_n^z(a)) + F_n^z(a)$ for all $v \le \zeta$, and this is equivalent to $F_n^z(a)(1 - v) \ge F_n^z(v) - v$ for all $v \le \zeta$. By (16) and (17), a sufficient condition for $a \in \mathcal{A}$ is that
$$(a - n^{-1/2} G_n^*)(1 - \zeta) \ge n^{-1/2} G_n,$$
or equivalently that
$$a \ge n^{-1/2} \left( \frac{G_n}{1 - \zeta} + G_n^* \right).$$
This implies that
$$\tilde a \le n^{-1/2} \left( \frac{G_n}{1 - \zeta} + G_n^* \right),$$
proving that $n^{1/2}\, \tilde a$ is bounded in probability. The proof that $n^{1/2} (\tilde b - 1)$ is bounded in probability is similar.

References
Abrevaya J. (1999). Computation of the maximum rank correlation estimator. Economics Letters, 62, 279-285.

Aeberhard W.H., Cantoni E. and Heritier S. (2014). Robust inference in the negative binomial regression model with an application to falls data. Biometrics, 70, 920-931. DOI: 10.1111/biom.12212

Agostinelli C., Marazzi A. and Yohai V.J. (2014). Robust estimators of the generalized log-gamma distribution. Technometrics, 56(1), 92-101.

Alfons A., Croux C. and Filzmoser P. (2016). Robust maximum association estimators. Journal of the American Statistical Association. In press.

Alfons A. (2015). ccaPP: (Robust) canonical correlation analysis via projection pursuit. R package version 0.3.1, URL http://CRAN.R-project.org/package=ccaPP.

Amiguet M. (2011). Adaptively weighted maximum likelihood estimation of discrete distributions. Ph.D. thesis, Université de Lausanne, Switzerland.

Austin P.C., Rothwell D.M. and Tu J.V. (2002). A comparison of statistical modeling strategies for analyzing length of stay after CABG surgery. Health Services & Outcomes Research Methodology, 3, 107-133. DOI: 10.1023/A:1024260023851

Cadigan N.G. and Chen J. (2001). Properties of robust M-estimators for Poisson and negative binomial data. Journal of Statistical Computation and Simulation, 70, 273-288.

Cantoni E. and Ronchetti E. (2001). Robust inference for generalized linear models. Journal of the American Statistical Association, 96(455), 1022-1030.

Cantoni E. and Zedini A. (2009). A robust version of the hurdle model. Cahiers du département d'économétrie No 2009.07, Faculté des sciences économiques et sociales, Université de Genève.

Carter E.M. and Potts H.W.W. (2014). Predicting length of stay from an electronic patient record system: a primary total knee replacement example. BMC Medical Informatics & Decision Making, 14: 26. DOI: 10.1186/1472-6947-14-26.

Cribari-Neto F. and Zeileis A. (2010). Beta regression in R. Journal of Statistical Software, 34, 1-24.

Cuesta-Albertos J.A., Matrán C. and Mayo-Iscar A. (2008). Trimming and likelihood: robust location and dispersion estimation in the elliptical model. The Annals of Statistics, 36(5), 2284-2318.

Davison A.C. and Snell E.J. (1991). Residuals and diagnostics. In Statistical Theory and Modelling: In Honour of Sir David Cox, D.V. Hinkley, N. Reid and E.J. Snell (editors), 83-106. Chapman and Hall.

Dunn P.K. and Smyth G.K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5(3), 236-244.

Espinheira P.L., Ferrari S.L.P. and Cribari-Neto F. (2008). Influence diagnostics in beta regression. Computational Statistics & Data Analysis, 52(9), 4417-4431.

Ferrari S.L.P. and Cribari-Neto F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.

Fetter R.B., Shin Y., Freeman J.L., Averill R.F. and Thompson J.D. (1980). Case mix definition by diagnosis-related groups. Medical Care, 18(1), 1-53.

Gervini D. and Yohai V.J. (2002). A class of robust and fully efficient regression estimators. Annals of Statistics, 30(2), 583-616.

Hampel F.R., Ronchetti E.M., Rousseeuw P.J. and Stahel W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.

Han A.K. (1987a). Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator. Journal of Econometrics, 35(2-3), 303-316.

Han A.K. (1987b). A non-parametric analysis of transformations. Journal of Econometrics, 35(2-3), 191-209.

Hilbe J.M. (2008). Negative Binomial Regression. Cambridge University Press.

Huber P.J. (1980). Robust Statistics. Wiley, New York.

Hunger M., Baumert J. and Holle R. (2011). Analysis of SF-6D index data: is beta regression appropriate? Value in Health, 14, 759-767.

Künsch H.R., Stefanski L.A. and Carroll R.J. (1989). Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. Journal of the American Statistical Association, 84(406), 460-466.

Locatelli I., Marazzi A. and Yohai V.J. (2010). Robust accelerated failure time regression. Computational Statistics & Data Analysis, 55(1), 874-887.

Marazzi A. (1993). Algorithms, Routines, and S-Functions for Robust Statistics. Wadsworth, Inc., Belmont, California.

Marazzi A., Paccaud F., Ruffieux C. and Beguin C. (1998). Fitting the distributions of length of stay by parametric models. Medical Care, 36(6), 915-927.

Marazzi A. and Yohai V.J. (2004). Adaptively truncated maximum likelihood regression with asymmetric errors. Journal of Statistical Planning and Inference, 122(1-2), 271-291.

Marazzi A. and Yohai V.J. (2010). Optimal robust estimates based on the Hellinger distance. Advances in Data Analysis and Classification. Springer-Verlag.

Maronna R.A., Martin R.D. and Yohai V.J. (2006). Robust Statistics: Theory and Methods. Wiley & Sons, Ltd.

Maronna R., Bustos O. and Yohai V.J. (1979). Bias- and efficiency-robustness of general M-estimators for regression with random carriers. In Smoothing Techniques for Curve Estimation, 91-116.

Min Y. and Agresti A. (2002). Modeling nonnegative data with clumping at zero: A survey. Journal of the Iranian Statistical Society, 1(1-2), 7-33.

Nelder J.A. and Wedderburn R.W.M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135(3), 370-384.

Rocha A.V. and Simas A.B. (2010). Influence diagnostics in a general class of beta regression models. Test, 20(1), 95-119.

Seow W.J., Pesatori A.C., Dimont E., Farmer P.B., Albetti B., et al. (2012). Urinary benzene biomarkers and DNA methylation in Bulgarian petrochemical workers: Study findings and comparison of linear and beta regression models. PLOS ONE, 7: e50471.

Sherman R.P. (1993). The limiting distribution of the maximum rank correlation estimator. Econometrica, 61(1), 123-137.

Swearingen C.J., Tilley B.C., Adams R.J., Rumboldt Z., Nicholas J.S., Bandyopadhyay D. and Woolson R.F. (2011). Application of beta regression to analyze ischemic stroke volume in NINDS rt-PA clinical trials. Neuroepidemiology, 37(2), 73-82.

Valdora M. and Yohai V.J. (2014). Robust estimation in generalized linear models. Journal of Statistical Planning and Inference, 146, 31-48.

Venables W.N. and Ripley B.D. (1999). Modern Applied Statistics with S-PLUS. Third edition. Springer.

Yohai V.J. (1987). High breakdown-point and high efficiency robust estimates for regression. Annals of Statistics, 15(2), 642-656.

Yu G. (2009). Variance stabilizing transformations of Poisson, binomial and negative binomial distributions. Statistics and Probability Letters, 79, 1621-1629.
        coefficients (β)       dispersion (α)        prediction (µ)
  n     INI   CML   ACH        INI   CML   ACH       INI   CML   ACH
 100    0.55  0.74  0.71       0.78  0.71  0.76      0.50  0.70  0.76
 400    0.52  0.88  0.75       0.73  0.85  0.79      0.48  0.89  0.76
1000    0.51  0.93  0.78       0.73  0.93  0.83      0.48  0.93  0.78
2000    0.54  0.95  0.75       0.73  0.94  0.83      0.50  0.95  0.75
Table 1. Empirical relative efficiencies of coefficients, dispersion, and prediction estimates.

         (columns: increasing values of y_out)
β  CML   0.72 0.76 0.78 0.55 0.38 0.32 0.33 0.37 0.42 0.45 0.53 0.55 0.48
   ACH   1.27 1.19 1.09 0.67 0.46 0.41 0.45 0.50 0.55 0.59 0.70 0.75 0.88
   INI   0.45 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09
α  CML   0.45 0.25 0.13 0.18 0.23 0.23 0.23 0.21 0.20 0.19 0.15 0.12 0.09
   ACH   0.53 0.12 0.10 0.27 0.29 0.29 0.27 0.26 0.24 0.23 0.19 0.17 0.12
   INI   1.92 1.17 1.11 1.11 1.11 1.11 1.11 0 1.11 1.11 1.11 1.11 1.11
µ  CML   1.59 1.91 1.69 1.19 0.79 0.62 0.63 0.74 0.88 1.04 1.40 1.54 1.34
   ACH   2.68 2.39 2.18 1.40 0.97 0.78 0.78 0.89 1.04 1.19 1.64 1.91 2.63
Table 2. MAEE and MAPE of coefficient, dispersion, and prediction estimates for varying y_out.

LOS        1  2  3  4  5  6  7  8  9  16  115  198  374
frequency  2  6  5  5  4  2  2  1  1   1    1    1    1

Table 3. Length of stay of 32 hospital patients.

(Table 4: estimates of µ and α by MLE, M80, CML, MLE*, M80*, and CML*.)

      ————— coefficients (β) —————                                   α
ML    1.266 (0.134)   0.017 (0.002)    0.064 (0.178)  -0.009 (0.003)  1.067 (0.067)
ACH   1.656 (0.726)   0.004 (0.011)   -1.055 (0.735)   0.012 (0.011)  0.542 (—)
CML   0.899 (0.113)   0.017 (0.002)   -0.269 (0.154)  -0.002 (0.003)  0.593 (0.049)
ML*   0.846 (0.114)   0.016 (0.002)   -0.253 (0.156)  -0.002 (0.003)  0.503 (0.046)

Table 5. Coefficient (standard error) and dispersion estimates for disorders of the endocrine system.
(Figure: three panels, MAEE(b), MAEE(a), and MAEE(m), each with curves for INI, FIN, and ACH.)

Figure 1. Mean absolute prediction and estimation errors for varying y_out.
Figure 2. Data: LOS and Age of 649 patients. Black circles are men, gray circles are women. Full outliers are marked by cross signs (x); borderline observations by plus signs (+). Fitted models according to CML (solid lines) and ML (broken lines): black for men, gray for women.

(Figure: (a) empirical z-quantiles (ML) vs. uniform quantiles; (b) empirical z-quantiles (CML) vs. uniform quantiles; (c) cleaned empirical z-quantiles (ML*) vs. uniform quantiles; (d) boxplots of absolute residuals for ML, ACH, CML, and ML*.)