Estimation and testing on independent not identically distributed observations based on Rényi's pseudodistances
Elena Castilla , Maria Jaenada and Leandro Pardo Department of Statistics and O.R., Complutense University of Madrid, Spain
Abstract
In real life we often deal with independent but not identically distributed observations (i.n.i.d.o.), for which the most well-known statistical model is the multiple linear regression model (MLRM) without random covariates. The classical methods are based on the maximum likelihood estimator (MLE), whose lack of robustness under small deviations from the assumed conditions is well known. In this paper, based on the Rényi pseudodistance (RP), we introduce a new family of estimators for the case in which our information about the unknown parameter comes from i.n.i.d.o. This family of estimators, the minimum RP estimators (so called because they are obtained by minimizing the RP between the assumed distribution and the empirical distribution of the data), contains the MLE as a particular case and can be applied, among other settings, to the MLRM without random covariates. Based on these estimators, we introduce Wald-type tests for simple and composite null hypotheses, extending the classical MLE-based Wald test. Influence functions for the estimators and the Wald-type tests are also obtained and analysed. Finally, a simulation study assesses the performance of the proposed methods, and some real-life data are analysed for illustrative purposes.
Keywords: asymptotic normality; consistency; independent not identically distributed observations; influence function; minimum Rényi pseudodistance estimators; robustness; Wald-type tests.
Introduction

In parametric estimation the role of divergence measures is very intuitive: one minimizes a suitable divergence measure between the data and the assumed model in order to estimate the unknown parameters. The resulting estimators are called "minimum divergence estimators" (MDEs). There is a growing body of literature recognizing the importance of MDEs because of their robustness, without a significant loss of efficiency, relative to the maximum likelihood estimator (MLE). See, for instance, Beran [1], Tamura and Boos [2], Simpson [3, 4], Lindsay [5], Pardo [6] and Basu et al. [7]. In the case of continuous models it is convenient to consider families of divergence measures for which non-parametric estimators of the unknown density function are not needed. From this perspective, the density power divergence (DPD) family, leading to the minimum DPD estimators, is the most important family of divergence measures; for more details see [7]. However, there is another important family of divergence measures which does not need non-parametric estimators either: the Rényi pseudodistances (RP).

Let $X_1,\ldots,X_n$ be a random sample of size $n$ from a population having true and unknown density function $g$, modelled by a parametric family of densities $f_{\theta}$ with $\theta\in\Theta\subset\mathbb{R}^p$. The RP between the densities $f_{\theta}$ and $g$ is given, for $\alpha>0$, by
\[
R_\alpha(f_{\theta},g)=\frac{1}{\alpha+1}\log\left(\int f_{\theta}(x)^{\alpha+1}dx\right)+\frac{1}{\alpha(\alpha+1)}\log\left(\int g(x)^{\alpha+1}dx\right)-\frac{1}{\alpha}\log\left(\int f_{\theta}(x)^{\alpha}g(x)dx\right). \tag{1}
\]
The RP can be defined for $\alpha=0$ by taking continuous limits, yielding
\[
R_0(f_{\theta},g)=\lim_{\alpha\downarrow 0}R_\alpha(f_{\theta},g)=\int g(x)\log\frac{g(x)}{f_{\theta}(x)}dx,
\]
i.e., at $\alpha=0$ the RP coincides with the Kullback-Leibler divergence between $g$ and $f_{\theta}$, $D_{\mathrm{Kullback}}(g,f_{\theta})$ (see [6]). The RP was considered for the first time in Jones et al. [8]; later Broniatowski et al.
[9] established that the RP is non-negative for any two densities and for all values of the parameter $\alpha$: $R_\alpha(f_{\theta},g)\ge 0$, with equality if and only if $f_{\theta}=g$. This property motivates the definition of the minimum RP estimator as the minimizer of the RP between the assumed distribution and the empirical distribution of the data. Therefore, the minimum RP estimator based on the random sample $X_1,\ldots,X_n$ for the unknown parameter $\theta$ is given, for $\alpha>0$, by
\[
\widehat{\theta}^*_\alpha=\arg\sup_{\theta\in\Theta}\sum_{i=1}^{n}\frac{f_{\theta}(X_i)^{\alpha}}{C_\alpha(\theta)}, \tag{2}
\]
where $C_\alpha(\theta)=\left(\int f_{\theta}(x)^{\alpha+1}dx\right)^{\alpha/(\alpha+1)}$.

Note that the case $\alpha=0$ was defined through the Kullback-Leibler divergence and hence the minimum RP estimator coincides with the MLE at $\alpha=0$. Besides, [9] studied the asymptotic properties and robustness of the minimum RP estimators and presented an application to the multiple linear regression model (MLRM) with random covariates. In the same vein, [10] introduced Wald-type tests based on the minimum RP estimators for the MLRM, and [11] studied the minimum RP estimator for the linear regression model in the ultra-high dimensional set-up. Moreover, $\widehat{\theta}^*_\alpha$ is an M-estimator, so its asymptotic distribution and influence function (IF) can be obtained from the asymptotic theory of M-estimators.

However, far too little attention has been paid to the case of independent but not identically distributed observations (i.n.i.d.o.), for which the most well-known statistical model is the MLRM without random covariates. The most complete study for i.n.i.d.o. based on divergence measures, until now, is the paper of Ghosh and Basu [12], based on DPD measures; some extensions are given in [13] and [14]. The main aim of this paper is to introduce and study the minimum RP estimator for i.n.i.d.o. We study its asymptotic properties and obtain its influence function in order to study its robustness. In Section 2 we introduce the minimum RP estimator for i.n.i.d.o. Its consistency and asymptotic distribution are presented in Section 3. Section 4 is devoted to introducing and studying Wald-type tests based on minimum RP estimators. The robustness of these estimators and Wald-type tests is studied through their influence functions in Section 5. The special case of the MLRM is considered in Section 6. Finally, an extensive simulation study and two numerical examples for the MLRM are presented in Sections 7 and 8, respectively.

The minimum RP estimator for i.n.i.d.o.

Let $Y_1,\ldots,Y_n$ be i.n.i.d.o.
with $g_1,\ldots,g_n$ the corresponding density functions with respect to some common dominating measure. We are interested in modelling $g_i$ by the density function $f_i(y,\theta)$, $i=1,\ldots,n$, with $\theta$ common to all the densities $f_i(y,\theta)$. For each observation $i$, the Rényi pseudodistance between $f_i(\cdot,\theta)$ and $\widehat{g}_i$ can be defined for positive values of $\alpha$ as
\[
R_\alpha(f_i(\cdot,\theta),\widehat{g}_i)=\frac{1}{\alpha+1}\log\left(\int f_i(y,\theta)^{\alpha+1}dy\right)-\frac{1}{\alpha}\log\left(\int f_i(y,\theta)^{\alpha}\widehat{g}_i(y)dy\right)+k, \tag{3}
\]
where
\[
k=\frac{1}{\alpha(\alpha+1)}\log\left(\int \widehat{g}_i(y)^{\alpha+1}dy\right)
\]
does not depend on $\theta$. As we only have one observation of $Y_i$, the best way to estimate $g_i$ is to assume that the distribution is degenerate at $Y_i$. Therefore, (3) yields the loss
\[
R_\alpha(f_i(\cdot,\theta),\widehat{g}_i)=\frac{1}{\alpha+1}\log\left(\int f_i(y,\theta)^{\alpha+1}dy\right)-\frac{1}{\alpha}\log f_i(Y_i,\theta)^{\alpha}+k. \tag{4}
\]
At $\alpha=0$ the RP loss is given by
\[
R_0(f_i(\cdot,\theta),\widehat{g}_i)=\lim_{\alpha\downarrow 0}R_\alpha(f_i(\cdot,\theta),\widehat{g}_i)=-\log f_i(Y_i,\theta)+k. \tag{5}
\]
Now, expression (4) can be written as
\[
R_\alpha(f_i(\cdot,\theta),\widehat{g}_i)=-\frac{1}{\alpha}\log\frac{f_i(Y_i,\theta)^{\alpha}}{\left(\int f_i(y,\theta)^{\alpha+1}dy\right)^{\alpha/(\alpha+1)}}+k,
\]
and thus minimizing $R_\alpha(f_i(\cdot,\theta),\widehat{g}_i)$ in $\theta$, for $\alpha>0$, is equivalent to maximizing
\[
V_i(Y_i,\theta)=\frac{f_i(Y_i,\theta)^{\alpha}}{\left(\int f_i(y,\theta)^{\alpha+1}dy\right)^{\alpha/(\alpha+1)}}.
\]
In the following, we shall denote $L_{i\alpha}(\theta)=\left(\int f_i(y,\theta)^{\alpha+1}dy\right)^{\alpha/(\alpha+1)}$ and
\[
H_n^{\alpha}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\frac{f_i(Y_i,\theta)^{\alpha}}{\left(\int f_i(y,\theta)^{\alpha+1}dy\right)^{\alpha/(\alpha+1)}}=\frac{1}{n}\sum_{i=1}^{n}\frac{f_i(Y_i,\theta)^{\alpha}}{L_{i\alpha}(\theta)}=\frac{1}{n}\sum_{i=1}^{n}V_i(Y_i,\theta), \tag{6}
\]
and then the minimum RP estimator, $\widehat{\theta}_\alpha$, of the common parameter $\theta$ is given by
\[
\widehat{\theta}_\alpha=\arg\max_{\theta\in\Theta}H_n^{\alpha}(\theta), \tag{7}
\]
with $H_n^{\alpha}(\theta)$ defined in (6) for $\alpha>0$ and
\[
H_n^{0}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\log f_i(Y_i,\theta).
\]
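To fix ideas, here is a small numerical sketch of (6)-(7) in a toy i.n.i.d.o. setting of our own choosing (assuming NumPy/SciPy; this is an illustration, not code from the paper): $Y_i\sim N(\theta,\sigma_i^2)$ with known, observation-specific $\sigma_i$, for which $\int f_i(y,\theta)^{\alpha+1}dy=(2\pi)^{-\alpha/2}\sigma_i^{-\alpha}(\alpha+1)^{-1/2}$ is available in closed form.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def H_alpha(mu, y, sigmas, alpha):
    """Objective H_n^alpha of (6) for i.n.i.d.o. Y_i ~ N(mu, sigma_i^2)
    with known, observation-specific sigma_i (toy illustration)."""
    if alpha == 0.0:                       # RP loss reduces to the log-likelihood
        return np.mean(norm.logpdf(y, mu, sigmas))
    dens = norm.pdf(y, mu, sigmas)         # f_i(Y_i, theta)
    # closed form of int f_i^{alpha+1} dy for the normal density
    int_f = (2 * np.pi) ** (-alpha / 2) * sigmas ** (-alpha) * (alpha + 1) ** (-0.5)
    L_i = int_f ** (alpha / (alpha + 1))   # normalizer L_{i,alpha}
    return np.mean(dens ** alpha / L_i)

def min_rp_estimate(y, sigmas, alpha):
    """Minimum RP estimator (7): maximize H_n^alpha over the common mean.
    Coarse grid search followed by a local bounded refinement."""
    grid = np.linspace(y.min(), y.max(), 400)
    m0 = grid[np.argmax([H_alpha(m, y, sigmas, alpha) for m in grid])]
    res = minimize_scalar(lambda m: -H_alpha(m, y, sigmas, alpha),
                          bounds=(m0 - 1.0, m0 + 1.0), method="bounded")
    return res.x

rng = np.random.default_rng(0)
sigmas = rng.uniform(0.5, 2.0, size=100)
y = rng.normal(5.0, sigmas)                # true common parameter theta* = 5
y[:10] = 30.0                              # 10% gross outliers
mle = min_rp_estimate(y, sigmas, 0.0)      # alpha = 0: MLE, non-robust
rp = min_rp_estimate(y, sigmas, 0.5)       # alpha = 0.5: robust
print(mle, rp)
```

At $\alpha=0$ the criterion is the log-likelihood, so the estimate is pulled by the outliers; for $\alpha>0$ observations with small $f_i(Y_i,\theta)^\alpha$ contribute little to $H_n^\alpha$, which is the source of the robustness studied below.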
It is interesting to observe that when $Y_1,\ldots,Y_n$ are independent and identically distributed random variables, the estimator $\widehat{\theta}_\alpha$ coincides with the estimator $\widehat{\theta}^*_\alpha$ given in (2). In the following section we shall establish the consistency of $\widehat{\theta}_\alpha$, as well as its asymptotic distribution.

Asymptotic properties

We shall assume in the following that the true densities $g_i$, $i=1,\ldots,n$, belong to the assumed model, i.e., $g_i\equiv f_i(y,\theta)$, $i=1,\ldots,n$, for some common parameter $\theta$. We denote by $\theta^*$ the true value of the unknown parameter; expectations $E_{\theta^*}[\cdot]$ are taken with respect to the true densities $f_i(y,\theta^*)$. We introduce the matrices
\[
\Psi_n=\frac{1}{n}\sum_{i=1}^{n}J^{(i)}, \tag{8}
\]
with
\[
J^{(i)}=\left(-E_{\theta^*}\left[\frac{\partial^2 V_i(Y_i;\theta)}{\partial\theta_j\partial\theta_k}\right]\right)_{j,k=1,\ldots,p},
\]
and
\[
\Omega_n=\frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}_{\theta^*}\left[\left(\frac{\partial V_i(Y_i;\theta)}{\partial\theta_j}\right)_{j=1,\ldots,p}\right]. \tag{9}
\]
In order to obtain the asymptotic results we shall assume the following conditions:

C1.
The support, $\mathcal{X}$, of the density functions $f_i(y,\theta)$ is the same for all $i$ and does not depend on $\theta$.

C2.
There exists an open subset $\Theta^*$ of $\Theta$ containing the true value of the parameter $\theta^*$ such that, for almost all $y\in\mathcal{X}$, the density $f_i(y,\theta)$ admits all third-order derivatives with respect to $\theta$, for $i=1,\ldots,n$.

C3.
For $i=1,2,\ldots$ the integrals $\int f_i(y,\theta)^{\alpha+1}dy$ can be differentiated three times with respect to $\theta$, and we can interchange integration and differentiation.

C4. For $i=1,2,\ldots$ the matrices $J^{(i)}$ are positive definite. We denote by $\lambda_n$ the minimum eigenvalue of $\Omega_n$, and assume $\lambda_0=\inf_n\lambda_n>0$.

C5.
There exist functions $M^{(i)}_{jkl}$ such that
\[
\left|\frac{\partial^3 V_i(y;\theta)}{\partial\theta_j\partial\theta_k\partial\theta_l}\right|\le M^{(i)}_{jkl}(y),\quad\forall\theta\in\Theta^*,\ \forall j,k,l,
\]
and $E_{\theta^*}\left[M^{(i)}_{jkl}(Y_i)\right]=m_{jkl}<\infty$ for all $j,k,l$.

C6.
For all $j,k,l$, the sequences $\left\{\frac{\partial V_i(Y_i;\theta)}{\partial\theta_j}\right\}_{i}$, $\left\{\frac{\partial^2 V_i(Y_i;\theta)}{\partial\theta_j\partial\theta_k}\right\}_{i}$ and $\left\{\frac{\partial^3 V_i(Y_i;\theta)}{\partial\theta_j\partial\theta_k\partial\theta_l}\right\}_{i}$ are uniformly integrable in the Cesàro sense, i.e.,
\[
\lim_{N\to\infty}\left(\sup_{n>1}\frac{1}{n}\sum_{i=1}^{n}E_{\theta^*}\left[\left|\frac{\partial V_i(Y_i;\theta)}{\partial\theta_j}\right| I_{\left\{\left|\frac{\partial V_i(Y_i;\theta)}{\partial\theta_j}\right|>N\right\}}(Y_i)\right]\right)=0,
\]
and analogously for the second- and third-order partial derivatives.

C7.
For all $\varepsilon>0$,
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E_{\theta^*}\left[\left\|\Omega_n^{-1/2}\frac{\partial V_i(Y_i;\theta)}{\partial\theta}\right\|^2 I_{\left\{\left\|\Omega_n^{-1/2}\frac{\partial V_i(Y_i;\theta)}{\partial\theta}\right\|>\varepsilon\sqrt{n}\right\}}(Y_i)\right]=0.
\]
Note that C6. gives sufficient conditions for the weak law of large numbers with i.n.i.d.o. ([15]), while C7. is the assumption required for the multivariate central limit theorem with i.n.i.d.o. ([16]). The following theorem states the consistency of $\widehat{\theta}_\alpha$, and the second one establishes its asymptotic distribution. Proofs of both theorems are given in Appendix A.1 and Appendix A.2, respectively.

Theorem 3.1
Let $Y_1,\ldots,Y_n$ be i.n.i.d.o., each with density function $f_i(y,\theta)$, $\theta\in\Theta\subset\mathbb{R}^p$. If conditions C1.-C6. hold, there exists a consistent sequence $\widehat{\theta}_n$ of solutions of the system of equations
\[
\frac{\partial H_n^{\alpha}(\theta)}{\partial\theta}=\mathbf{0}_p. \tag{10}
\]

Theorem 3.2
Let $Y_1,\ldots,Y_n$ be i.n.i.d.o., each with density function $f_i(y,\theta)$, $\theta\in\Theta\subset\mathbb{R}^p$. If conditions C1.-C7. are satisfied, the asymptotic distribution of the minimum RP estimator is given by
\[
\sqrt{n}\,\Omega_n^{-1/2}\Psi_n\left(\widehat{\theta}_\alpha-\theta^*\right)\xrightarrow[n\to\infty]{L}N(\mathbf{0}_p,I_p), \tag{11}
\]
where $I_p$ is the $p$-dimensional identity matrix and the matrices $\Psi_n$ and $\Omega_n$ were defined in (8) and (9), respectively.

Wald-type tests for i.n.i.d.o.
We define a family of Wald-type test statistics based on minimum RP estimators for testing the hypothesis
\[
H_0:\theta=\theta_0\quad\text{against}\quad H_1:\theta\ne\theta_0, \tag{12}
\]
for a given $\theta_0\in\Theta$.

Definition 4.1
Let $\widehat{\theta}_\alpha$ be the minimum RP estimator of $\theta$. The family of proposed Wald-type test statistics for testing the null hypothesis in (12) is given by
\[
W_n(\theta_0)=n(\widehat{\theta}_\alpha-\theta_0)^T\Sigma_\alpha^{-1}(\theta_0)(\widehat{\theta}_\alpha-\theta_0), \tag{13}
\]
where $\Sigma_\alpha(\theta_0)=\lim_{n\to\infty}\Psi_n^{-1}(\theta_0)\Omega_n(\theta_0)\Psi_n^{-1}(\theta_0)$.

Theorem 4.2
The asymptotic distribution of the Wald-type test statistic $W_n(\theta_0)$, defined in (13), under the null hypothesis in (12), is a chi-square distribution with $p$ degrees of freedom.

Based on Theorem 4.2, we shall reject the null hypothesis in (12) if
\[
W_n(\theta_0)>\chi^2_{p,\nu}, \tag{14}
\]
where $\chi^2_{p,\nu}$ is the upper $\nu$-quantile of the $\chi^2_p$ distribution.

Theorem 4.3
Let $\theta^*$ be the true value of $\theta$, with $\theta^*\ne\theta_0$, and let us denote
\[
\ell(\theta)=(\theta-\theta_0)^T\Sigma_\alpha^{-1}(\theta_0)(\theta-\theta_0)
\]
and
\[
\sigma^2_{W_n(\theta_0)}(\theta^*)=4(\theta^*-\theta_0)^T\left[\Sigma_\alpha^{-1}(\theta_0)\Sigma_\alpha(\theta^*)\Sigma_\alpha^{-1}(\theta_0)\right](\theta^*-\theta_0).
\]
Then,
\[
\sqrt{n}\left(\ell(\widehat{\theta}_\alpha)-\ell(\theta^*)\right)\xrightarrow[n\to\infty]{L}N\left(0,\sigma^2_{W_n(\theta_0)}(\theta^*)\right).
\]

Corollary 4.4
Theorem 4.3 makes it possible to obtain an approximation of the power function of the test given in (14):
\[
\pi^{\alpha}_{W_n(\theta_0)}(\theta^*)=P(\text{Rejecting }H_0\,|\,\theta=\theta^*)=P\left(W_n(\theta_0)>\chi^2_{p,\nu}\,|\,\theta=\theta^*\right)=P\left(\ell(\widehat{\theta}_\alpha)-\ell(\theta^*)>\frac{\chi^2_{p,\nu}}{n}-\ell(\theta^*)\right)
\]
\[
=1-\Phi_n\left(\frac{\sqrt{n}}{\sigma_{W_n(\theta_0)}(\theta^*)}\left(\frac{\chi^2_{p,\nu}}{n}-\ell(\theta^*)\right)\right),
\]
where $\Phi_n(\cdot)$ is a sequence of distribution functions tending uniformly to the standard normal distribution function $\Phi(\cdot)$. We can observe that
\[
\lim_{n\to\infty}\pi^{\alpha}_{W_n(\theta_0)}(\theta^*)=1\quad\forall\alpha\ge 0,
\]
so the Wald-type tests are consistent in the sense of Fraser. Moreover, the sample size needed to attain a prefixed power, $\pi^{\alpha}_{W_n(\theta_0)}(\theta^*)\approx\pi^*$, is given by
\[
n=\left[\frac{A+B+\sqrt{A(A+2B)}}{2\,\ell^2(\theta^*)}\right],
\]
where $A=\sigma^2_{W_n(\theta_0)}(\theta^*)\left(\Phi^{-1}(1-\pi^*)\right)^2$ and $B=2\,\ell(\theta^*)\chi^2_{p,\nu}$.

We may also be interested in testing a set of $r<p$ non-redundant restrictions on the parameter vector $\theta$. In this context, we are interested in testing
\[
H_0:M^T\theta=m\quad\text{against}\quad H_1:M^T\theta\ne m, \tag{15}
\]
where $M$ is a $p\times r$ full rank matrix with $r<p$ and $m$ is an $r$-vector.

Definition 4.5
Let $\widehat{\theta}_\alpha$ be the minimum RP estimator of $\theta$. The family of proposed Wald-type test statistics for testing the null hypothesis in (15) is given by
\[
W_n(\widehat{\theta}_\alpha)=n(M^T\widehat{\theta}_\alpha-m)^T\left[M^T\Sigma_\alpha(\widehat{\theta}_\alpha)M\right]^{-1}(M^T\widehat{\theta}_\alpha-m). \tag{16}
\]

Theorem 4.6
The asymptotic distribution of the Wald-type test statistic $W_n(\widehat{\theta}_\alpha)$, defined in (16), under the null hypothesis in (15), is a chi-square distribution with $r$ degrees of freedom.

Based on Theorem 4.6, we shall reject the null hypothesis in (15) if
\[
W_n(\widehat{\theta}_\alpha)>\chi^2_{r,\nu}. \tag{17}
\]
We could generalize our results to a more general restricted space $\Theta_0\subset\Theta$ defined by a set of $r<p$ non-redundant restrictions of the form $h(\theta)=\mathbf{0}$, by substituting the matrix $M^T\Sigma_\alpha(\widehat{\theta}_\alpha)M$ by $H^T\Sigma_\alpha(\widehat{\theta}_\alpha)H$, with $H(\theta)=\partial h^T(\theta)/\partial\theta$, in (16). The asymptotic distribution stated in Theorem 4.6 still holds.

The previous results provide an asymptotic approximation to the power function of the proposed Wald-type tests. We now consider the particular set of contiguous alternative hypotheses of the form
\[
H_{1,n}:\theta_n=\theta_0+n^{-1/2}d, \tag{18}
\]
where $d$ is a fixed vector in $\mathbb{R}^p$ such that $\theta_n\in\Theta$ and $\theta_0$ is an element of $\Theta_0$. Note that the alternative hypothesis moves towards $\theta_0$ as the sample size $n$ increases.

Theorem 4.7
Under the contiguous alternative hypotheses given in (18), the asymptotic distribution of the Wald-type test statistic $W_n(\widehat{\theta}_\alpha)$, defined in (16), is a non-central chi-square distribution with $r$ degrees of freedom and non-centrality parameter
\[
\delta=d^TM\left[M^T\Sigma_\alpha(\theta_0)M\right]^{-1}M^Td.
\]

Influence function analysis
In this section we shall obtain the influence function (IF) of the minimum RP functional for the non-homogeneous case. We shall denote by $G_i$ the true distribution function associated with the observation $Y_i$, whose density function is $g_i$, and by $T_\alpha(G_1,\ldots,G_n)$ the minimum RP functional, defined as the minimizer of
\[
\frac{1}{n}\sum_{i=1}^{n}H_n^{\alpha,(i)}=\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{1}{1+\alpha}\log\left(\int f_i(y,\theta)^{\alpha+1}dy\right)-\frac{1}{\alpha}\log\left(\int f_i(y,\theta)^{\alpha}dG_i(y)\right)\right\} \tag{19}
\]
or, for a fixed value of $\alpha$ and under appropriate differentiability conditions, as the solution of the system of equations obtained by differentiating (19) and equating to zero,
\[
\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\int f_i(y,\theta)^{\alpha+1}u_i(y,\theta)dy}{\int f_i(y,\theta)^{\alpha+1}dy}-\frac{\int f_i(y,\theta)^{\alpha}u_i(y,\theta)g_i(y)dy}{\int f_i(y,\theta)^{\alpha}g_i(y)dy}\right\}=\mathbf{0}_p, \tag{20}
\]
where
\[
u_i(y,\theta)=\frac{\partial\log f_i(y,\theta)}{\partial\theta}
\]
and
\[
g_{i,\varepsilon}=(1-\varepsilon)g_i+\varepsilon\Delta_{t_i},\quad i=1,\ldots,n,
\]
denotes the contaminated density, with $\Delta_{t_i}$ the degenerate distribution at the point $t_i$. Let $\theta_0=T_\alpha(G_1,\ldots,G_n)$, and denote by
\[
\theta^{i_0}_\varepsilon=T_\alpha(G_1,\ldots,G_{i_0-1},G_{i_0,\varepsilon},G_{i_0+1},\ldots,G_n)
\]
the minimum RP functional with contamination only in the $i_0$-th direction, where $G_{i_0,\varepsilon}$ is the distribution function associated with the density $g_{i_0,\varepsilon}(y)=(1-\varepsilon)f_{i_0}(y,\theta_0)+\varepsilon\Delta_{t_{i_0}}(y)$, so that
\[
g_i(y)=\begin{cases}f_i(y,\theta_0)&\text{if }i\ne i_0,\\(1-\varepsilon)f_{i_0}(y,\theta_0)+\varepsilon\Delta_{t_{i_0}}(y)&\text{if }i=i_0.\end{cases}
\]
It is also possible to contaminate in all the directions, and in this case we shall denote by
\[
\theta_\varepsilon=T_\alpha(G_{1,\varepsilon},\ldots,G_{n,\varepsilon})
\]
the minimum RP functional with contamination in all directions. In the following theorem we present the expressions of the IF; see Appendix A.7 for details.

Theorem 5.1 The influence function in the $i_0$-th direction is given by
\[
IF(t_{i_0},T_\alpha,G_1,\ldots,G_n)=\left(M_{n,\alpha}(\theta_0)\right)^{-1}\frac{-\ell_{i_0,\alpha}(\theta_0)}{\left(\int f_{i_0}(y,\theta_0)^{\alpha}g_{i_0}(y)dy\right)^2},
\]
where
\[
\ell_{i_0,\alpha}(\theta_0)=f_{i_0}(t_{i_0},\theta_0)^{\alpha}\int f_{i_0}(y,\theta_0)^{\alpha+1}u_{i_0}(y,\theta_0)dy-f_{i_0}(t_{i_0},\theta_0)^{\alpha}u_{i_0}(t_{i_0},\theta_0)\int f_{i_0}(y,\theta_0)^{\alpha+1}dy,
\]
\[
M_{n,\alpha}(\theta_0)=\frac{1}{n}\sum_{i=1}^{n}\left[\frac{A_{i,\alpha}(\theta_0)}{\left(\int f_i(y,\theta_0)^{\alpha+1}dy\right)^2}-\frac{A^*_{i,\alpha}(\theta_0)}{\left(\int f_i(y,\theta_0)^{\alpha}g_i(y)dy\right)^2}\right], \tag{21}
\]
and
\[
A_{i,\alpha}(\theta_0)=\left[(1+\alpha)\int f_i(y,\theta_0)^{\alpha+1}u_i(y,\theta_0)u_i^T(y,\theta_0)dy+\int f_i(y,\theta_0)^{\alpha+1}\frac{\partial u_i(y,\theta_0)}{\partial\theta}dy\right]\int f_i(y,\theta_0)^{\alpha+1}dy
\]
\[
-(1+\alpha)\left(\int f_i(y,\theta_0)^{\alpha+1}u_i(y,\theta_0)dy\right)\left(\int f_i(y,\theta_0)^{\alpha+1}u_i(y,\theta_0)dy\right)^T,
\]
\[
A^*_{i,\alpha}(\theta_0)=\left[\alpha\int f_i(y,\theta_0)^{\alpha}g_i(y)u_i(y,\theta_0)u_i^T(y,\theta_0)dy+\int f_i(y,\theta_0)^{\alpha}g_i(y)\frac{\partial u_i(y,\theta_0)}{\partial\theta}dy\right]\int f_i(y,\theta_0)^{\alpha}g_i(y)dy
\]
\[
-\alpha\left(\int f_i(y,\theta_0)^{\alpha}g_i(y)u_i(y,\theta_0)dy\right)\left(\int f_i(y,\theta_0)^{\alpha}g_i(y)u_i(y,\theta_0)dy\right)^T.
\]
Similarly, the influence function in all directions is given by
\[
IF(t_1,\ldots,t_n,T_\alpha,G_1,\ldots,G_n)=\left(M_{n,\alpha}(\theta_0)\right)^{-1}\sum_{i=1}^{n}\frac{-\ell_{i,\alpha}(\theta_0)}{\left(\int f_i(y,\theta_0)^{\alpha}g_i(y)dy\right)^2}.
\]

Remark 5.2
When the true distribution $g_i$ belongs to the model, so that $g_i(y)=f_i(y,\theta_0)$ for $i=1,\ldots,n$, then $M_{n,\alpha}(\theta_0)$ given in (21) coincides with $\Psi_n(\theta_0)$ given in (8), and
\[
IF(t_{i_0},T_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=\Psi_n^{-1}(\theta_0)D_{i_0,\alpha}(\theta_0),
\]
\[
IF(t_1,\ldots,t_n,T_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=\Psi_n^{-1}(\theta_0)\sum_{i=1}^{n}D_{i,\alpha}(\theta_0),
\]
with
\[
D_{i,\alpha}(\theta_0)=\frac{-\ell_{i,\alpha}(\theta_0)}{\left(\int f_i(y,\theta_0)^{\alpha+1}dy\right)^2}.
\]

Remark 5.3
In particular, taking $t_i=t$, $G_i=G$, $f_i(y,\theta)=f(y,\theta)$ for $i=1,\ldots,n$ (this situation corresponds to the case of independent and identically distributed, i.i.d., random variables) and $g(y)=f(y,\theta_0)$, we have
\[
IF(t,T_\alpha,G)=\left(M_\alpha(\theta_0)\right)^{-1}\left[f(t,\theta_0)^{\alpha}u(t,\theta_0)-c_\alpha(\theta_0)f(t,\theta_0)^{\alpha}\right],
\]
where
\[
c_\alpha(\theta_0)=\frac{\int f(y,\theta_0)^{\alpha+1}u(y,\theta_0)dy}{\int f(y,\theta_0)^{\alpha+1}dy}
\]
and
\[
M_\alpha(\theta_0)=\frac{1}{\int f(y,\theta_0)^{\alpha+1}dy}\left[\int f(y,\theta_0)^{\alpha+1}dy\int f(y,\theta_0)^{\alpha+1}u(y,\theta_0)u^T(y,\theta_0)dy-\left(\int f(y,\theta_0)^{\alpha+1}u(y,\theta_0)dy\right)\left(\int f(y,\theta_0)^{\alpha+1}u(y,\theta_0)dy\right)^T\right],
\]
as in [9].

Influence function of the Wald-type test statistics

Once we have computed the IF of the minimum RP estimators, we can define and study the IF of the Wald-type test statistics. First, we define the associated statistical functional, evaluated at $G_1,\ldots,G_n$, as
\[
W_\alpha(G_1,\ldots,G_n)=\left(T_\alpha(G_1,\ldots,G_n)-\theta_0\right)^T\Sigma_\alpha^{-1}(\theta_0)\left(T_\alpha(G_1,\ldots,G_n)-\theta_0\right), \tag{22}
\]
corresponding to (13) for the simple null hypothesis (12). Let us consider first the contamination in only one direction, say the $i_0$-th direction. The corresponding IF is then defined as
\[
IF(t_{i_0},W_\alpha,G_1,\ldots,G_n)=\left.\frac{\partial}{\partial\varepsilon}W_\alpha(G_1,\ldots,G_{i_0-1},G_{i_0,\varepsilon},G_{i_0+1},\ldots,G_n)\right|_{\varepsilon=0} \tag{23}
\]
\[
=2\left(T_\alpha(G_1,\ldots,G_n)-\theta_0\right)^T\Sigma_\alpha^{-1}(\theta_0)\,IF(t_{i_0},T_\alpha,G_1,\ldots,G_n).
\]
However, if we evaluate (23) at the null distributions $G_i=F_{i,\theta_0}$, it becomes identically zero. Therefore, it becomes necessary to consider the second-order IF of the proposed Wald-type test functional,
\[
IF^{(2)}(t_{i_0},W_\alpha,G_1,\ldots,G_n)=\left.\frac{\partial^2}{\partial\varepsilon^2}W_\alpha(G_1,\ldots,G_{i_0-1},G_{i_0,\varepsilon},G_{i_0+1},\ldots,G_n)\right|_{\varepsilon=0}
\]
\[
=2\,IF(t_{i_0},T_\alpha,G_1,\ldots,G_n)^T\Sigma_\alpha^{-1}(\theta_0)\,IF(t_{i_0},T_\alpha,G_1,\ldots,G_n).
\]
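The quadratic forms appearing in (13) and (22) are immediate to evaluate once an estimate of $\Sigma_\alpha$ is available; a minimal numerical sketch (assuming NumPy/SciPy, with a made-up estimate and covariance purely for illustration):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(theta_hat, theta0, Sigma_alpha, n, level=0.05):
    """Wald-type statistic (13): W_n = n (th - th0)' Sigma^{-1} (th - th0),
    compared with the upper level-quantile of chi^2_p (rejection rule (14)).
    Sigma_alpha is (an estimate of) the asymptotic covariance of the estimator."""
    d = np.atleast_1d(theta_hat) - np.atleast_1d(theta0)
    W = n * d @ np.linalg.solve(np.atleast_2d(Sigma_alpha), d)
    p = d.size
    crit = chi2.ppf(1 - level, df=p)
    return W, chi2.sf(W, df=p), W > crit

# toy example: p = 2, hypothetical estimate and covariance matrix
W, pval, reject = wald_test([1.1, 0.9], [1.0, 1.0],
                            [[0.5, 0.0], [0.0, 0.5]], n=200)
print(W, pval, reject)
```

Here $W_n=200\cdot(0.1^2+0.1^2)/0.5=8$, above the 5% critical value $\chi^2_{2,0.05}\approx 5.99$, so the null is rejected.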
Similarly, we can consider contamination in all directions, obtaining that the second-order influence function of the proposed Wald-type test functional for testing the simple null hypothesis is given by
\[
IF^{(2)}(t_1,\ldots,t_n,W_\alpha,G_1,\ldots,G_n)=\left.\frac{\partial^2}{\partial\varepsilon^2}W_\alpha(G_{1,\varepsilon},\ldots,G_{n,\varepsilon})\right|_{\varepsilon=0}
\]
\[
=2\,IF(t_1,\ldots,t_n,T_\alpha,G_1,\ldots,G_n)^T\Sigma_\alpha^{-1}(\theta_0)\,IF(t_1,\ldots,t_n,T_\alpha,G_1,\ldots,G_n).
\]

Remark 5.4
When the true distribution belongs to the model, the second-order influence functions of the proposed Wald-type test functional for testing the simple null hypothesis in (12) are given by
\[
IF^{(2)}(t_{i_0},W_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=2\left[D_{i_0,\alpha}(\theta_0)\right]^T\left[\Psi_n^{-1}(\theta_0)\Sigma_\alpha^{-1}(\theta_0)\Psi_n^{-1}(\theta_0)\right]\left[D_{i_0,\alpha}(\theta_0)\right],
\]
\[
IF^{(2)}(t_1,\ldots,t_n,W_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=2\left[\sum_{i=1}^{n}D_{i,\alpha}(\theta_0)\right]^T\left[\Psi_n^{-1}(\theta_0)\Sigma_\alpha^{-1}(\theta_0)\Psi_n^{-1}(\theta_0)\right]\left[\sum_{i=1}^{n}D_{i,\alpha}(\theta_0)\right].
\]

Remark 5.5
In a similar manner, when the true distribution belongs to the model, the second-order influence functions of the proposed Wald-type test functionals for testing the composite null hypothesis in (15) are given by
\[
IF^{(2)}(t_{i_0},W_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=2\left[\Psi_n^{-1}(\theta_0)D_{i_0,\alpha}(\theta_0)\right]^TM\left[M^T\Sigma_\alpha(\theta_0)M\right]^{-1}M^T\left[\Psi_n^{-1}(\theta_0)D_{i_0,\alpha}(\theta_0)\right],
\]
\[
IF^{(2)}(t_1,\ldots,t_n,W_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=2\left[\Psi_n^{-1}(\theta_0)\sum_{i=1}^{n}D_{i,\alpha}(\theta_0)\right]^TM\left[M^T\Sigma_\alpha(\theta_0)M\right]^{-1}M^T\left[\Psi_n^{-1}(\theta_0)\sum_{i=1}^{n}D_{i,\alpha}(\theta_0)\right].
\]

The multiple linear regression model

Consider the MLRM
\[
Y_i=\boldsymbol{X}_i^T\boldsymbol{\beta}+\varepsilon_i,\quad i=1,\ldots,n, \tag{24}
\]
where the errors $\varepsilon_i$ are i.i.d. normal random variables with mean zero and variance $\sigma^2$, $\boldsymbol{X}_i^T=(X_{i1},\ldots,X_{ip})$ is the vector of independent variables corresponding to the $i$-th condition, and $\boldsymbol{\beta}=(\beta_1,\ldots,\beta_p)^T$ is the vector of regression coefficients to be estimated. We consider that, for each $i$, $\boldsymbol{X}_i$ is fixed, yielding independent but not identically distributed $Y_i$'s (i.n.i.d.o.), with $Y_i\sim N(\boldsymbol{X}_i^T\boldsymbol{\beta},\sigma^2)$. Under the previous notation, with $f_i\equiv N(\boldsymbol{X}_i^T\boldsymbol{\beta},\sigma^2)$, we have, for $\alpha>0$,
\[
V_i(Y_i;\theta,\boldsymbol{X}_i)=\frac{(2\pi)^{-\alpha/2}\sigma^{-\alpha}\exp\left(-\dfrac{\alpha(Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta})^2}{2\sigma^2}\right)}{\left((2\pi)^{-\alpha/2}\sigma^{-\alpha}(\alpha+1)^{-1/2}\right)^{\alpha/(\alpha+1)}}=\left(\frac{\alpha+1}{2\pi}\right)^{\frac{\alpha}{2(\alpha+1)}}\sigma^{-\frac{\alpha}{\alpha+1}}\exp\left(-\frac{\alpha}{2}\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)^2\right).
\]
Thus, our objective function to be maximized becomes
\[
\frac{1}{n}\sum_{i=1}^{n}V_i(Y_i;\theta,\boldsymbol{X}_i)=\left(\frac{\alpha+1}{2\pi}\right)^{\frac{\alpha}{2(\alpha+1)}}\frac{1}{n}\sum_{i=1}^{n}\sigma^{-\frac{\alpha}{\alpha+1}}\exp\left(-\frac{\alpha}{2}\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)^2\right).
\]
Taking into account that the term $\left(\frac{\alpha+1}{2\pi}\right)^{\frac{\alpha}{2(\alpha+1)}}$ does not depend on the model parameters, we have, for $\alpha>0$,
\[
\left(\widehat{\boldsymbol{\beta}}_\alpha,\widehat{\sigma}_\alpha\right)=\arg\max_{\boldsymbol{\beta},\sigma}\sum_{i=1}^{n}\sigma^{-\frac{\alpha}{\alpha+1}}\exp\left(-\frac{\alpha}{2}\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)^2\right). \tag{25}
\]
Differentiating with respect to $\boldsymbol{\beta}$ and $\sigma$, we see that the estimators $\widehat{\boldsymbol{\beta}}_\alpha$ and $\widehat{\sigma}_\alpha$ are solutions of the system
\[
\sum_{i=1}^{n}\exp\left(-\frac{\alpha}{2}\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)^2\right)\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)\boldsymbol{X}_i=\mathbf{0}_p,
\]
\[
\sum_{i=1}^{n}\exp\left(-\frac{\alpha}{2}\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)^2\right)\left\{\left(\frac{Y_i-\boldsymbol{X}_i^T\boldsymbol{\beta}}{\sigma}\right)^2-\frac{1}{\alpha+1}\right\}=0, \tag{26}
\]
which is exactly the system suggested by Castilla et al. (2020) for the case of homogeneous data. If $\alpha=0$, we have
\[
\left(\widehat{\boldsymbol{\beta}}_{\alpha=0},\widehat{\sigma}_{\alpha=0}\right)=\arg\max_{\boldsymbol{\beta},\sigma}\left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{\|\boldsymbol{Y}-\mathbb{X}\boldsymbol{\beta}\|^2}{2\sigma^2}\right), \tag{27}
\]
and we obtain the system leading to the MLE of $\boldsymbol{\beta}$ and $\sigma$, whose well-known solution is given by
\[
\widehat{\boldsymbol{\beta}}_0=(\mathbb{X}^T\mathbb{X})^{-1}\mathbb{X}^T\boldsymbol{Y}\qquad\text{and}\qquad\widehat{\sigma}_0^2=\frac{1}{n}\sum_{i=1}^{n}\left(Y_i-\boldsymbol{X}_i^T\widehat{\boldsymbol{\beta}}_0\right)^2,
\]
where $\mathbb{X}^T=(\boldsymbol{X}_1,\ldots,\boldsymbol{X}_n)_{p\times n}$ is the matrix of explanatory variables.

Lemma 6.1
Consider the set-up of the MLRM with i.n.i.d.o. defined in (24), and assume that the true data-generating density belongs to the model family. If the following mild conditions on the explanatory variables hold:
M1.
The values of the explanatory variables are such that, for all $j$, $k$ and $l$,
\[
\sup_{n>1}\max_{1\le i\le n}|X_{ij}|=O(1),\qquad\sup_{n>1}\max_{1\le i\le n}|X_{ij}X_{ik}|=O(1)
\]
and
\[
\frac{1}{n}\sum_{i=1}^{n}|X_{ij}X_{ik}X_{il}|=O(1).
\]

M2.
The matrix $\mathbb{X}^T\mathbb{X}$ satisfies
\[
\inf_n\left[\text{minimum eigenvalue of }\frac{1}{n}\mathbb{X}^T\mathbb{X}\right]>0\qquad\text{and}\qquad n\times\max_{1\le i\le n}\left[\boldsymbol{X}_i^T(\mathbb{X}^T\mathbb{X})^{-1}\boldsymbol{X}_i\right]=O(1),
\]
then C1.-C7. are satisfied.

On the other hand, after some heavy computations it follows that the matrices (8) and (9) are given by
\[
\Psi_n=\begin{bmatrix}\dfrac{1}{\sigma^2(\alpha+1)^{3/2}}\left(\dfrac{1}{n}\sum_{i=1}^{n}\boldsymbol{X}_i\boldsymbol{X}_i^T\right)&\mathbf{0}\\[2mm]\mathbf{0}&\dfrac{2}{\sigma^2(\alpha+1)^{5/2}}\end{bmatrix} \tag{28}
\]
and
\[
\Omega_n=\begin{bmatrix}\dfrac{1}{\sigma^2(2\alpha+1)^{3/2}}\left(\dfrac{1}{n}\sum_{i=1}^{n}\boldsymbol{X}_i\boldsymbol{X}_i^T\right)&\mathbf{0}\\[2mm]\mathbf{0}&\dfrac{3\alpha^2+4\alpha+2}{\sigma^2(\alpha+1)^2(2\alpha+1)^{5/2}}\end{bmatrix}. \tag{29}
\]

Theorem 6.2
Consider the set-up of the MLRM with i.n.i.d.o. defined in (24), and assume that the true data-generating density belongs to the model family and the observed explanatory variables satisfy conditions M1. and M2. Then:

1. There exists a consistent sequence $\widehat{\theta}_\alpha=(\widehat{\boldsymbol{\beta}}_\alpha,\widehat{\sigma}_\alpha)$ of solutions of the minimum RP estimating equations (26).

2. $\widehat{\boldsymbol{\beta}}_\alpha$ and $\widehat{\sigma}_\alpha$ are asymptotically independent, and their asymptotic distribution is given by
\[
\sqrt{n}\left(\widehat{\theta}_\alpha-\theta^*\right)\xrightarrow[n\to\infty]{L}N(\mathbf{0},\Sigma_\alpha),\qquad\Sigma_\alpha=\lim_{n\to\infty}\Sigma_n,
\]
\[
\Sigma_n=\begin{bmatrix}\sigma^2\dfrac{(\alpha+1)^3}{(2\alpha+1)^{3/2}}\left(\dfrac{1}{n}\sum_{i=1}^{n}\boldsymbol{X}_i\boldsymbol{X}_i^T\right)^{-1}&\mathbf{0}\\[2mm]\mathbf{0}&\sigma^2\dfrac{(\alpha+1)^3(3\alpha^2+4\alpha+2)}{4(2\alpha+1)^{5/2}}\end{bmatrix}.
\]

We can now apply the stated theory to test any simple or composite hypothesis on the linear regression parameters. The asymptotic distribution under the null hypothesis of the Wald-type tests defined in (16) follows from Theorem 6.2, and the asymptotic distribution of the Wald-type test statistics under contiguous alternative hypotheses is given in Theorem 4.7. The non-centrality parameter in Theorem 4.7 can be expressed as
\[
\delta=d^{*T}\left[M^T\Sigma_nM\right]^{-1}d^*,\quad\text{with }d^*=M^Td.
\]
If the composite null hypothesis (15) involves only the regression coefficients, then
\[
\delta=\frac{(2\alpha+1)^{3/2}}{\sigma^2(\alpha+1)^3}\,d^{*T}\left[M^T\left(\frac{1}{n}\sum_{i=1}^{n}\boldsymbol{X}_i\boldsymbol{X}_i^T\right)^{-1}M\right]^{-1}d^*.
\]
Now, based on Remark 5.2, we can obtain the IF of the functional associated with the minimum RP estimator of $\theta$. It is given by
\[
IF(t_{i_0},T_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=\Psi_n^{-1}(\theta_0)\left(D^T_{i_0,\alpha}(\boldsymbol{\beta}),D_{i_0,\alpha}(\sigma)\right)^T,
\]
\[
IF^{(2)}(t_{i_0},W_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=2\left(D^T_{i_0,\alpha}(\boldsymbol{\beta}),D_{i_0,\alpha}(\sigma)\right)\left[\Psi_n^{-1}(\theta_0)\Sigma_n^{-1}(\theta_0)\Psi_n^{-1}(\theta_0)\right]\left(D^T_{i_0,\alpha}(\boldsymbol{\beta}),D_{i_0,\alpha}(\sigma)\right)^T
\]
for the simple null hypothesis, and
\[
IF^{(2)}(t_{i_0},W_\alpha,F_{1,\theta_0},\ldots,F_{n,\theta_0})=2\left[\Psi_n^{-1}(\theta_0)\left(D^T_{i_0,\alpha}(\boldsymbol{\beta}),D_{i_0,\alpha}(\sigma)\right)^T\right]^TM\left[M^T\Sigma_n(\theta_0)M\right]^{-1}M^T\left[\Psi_n^{-1}(\theta_0)\left(D^T_{i_0,\alpha}(\boldsymbol{\beta}),D_{i_0,\alpha}(\sigma)\right)^T\right]
\]
for the composite null hypothesis, with
\[
D_{i_0,\alpha}(\boldsymbol{\beta})=\frac{1}{\sigma}\exp\left(-\frac{\alpha}{2}\left(\frac{t_{i_0}-\boldsymbol{x}_{i_0}^T\boldsymbol{\beta}}{\sigma}\right)^2\right)\left(\frac{t_{i_0}-\boldsymbol{x}_{i_0}^T\boldsymbol{\beta}}{\sigma}\right)\boldsymbol{x}_{i_0},
\]
\[
D_{i_0,\alpha}(\sigma)=\frac{1}{\sigma}\exp\left(-\frac{\alpha}{2}\left(\frac{t_{i_0}-\boldsymbol{x}_{i_0}^T\boldsymbol{\beta}}{\sigma}\right)^2\right)\left[\left(\frac{t_{i_0}-\boldsymbol{x}_{i_0}^T\boldsymbol{\beta}}{\sigma}\right)^2-\frac{1}{\alpha+1}\right].
\]
Note that, since $\Psi_n(\theta_0)$ and $\Sigma_n(\theta_0)$ are block-diagonal matrices, we can express separately the IFs of the functionals $T_\alpha(\boldsymbol{\beta})$ and $T_\alpha(\sigma)$ associated with the minimum RP estimators $\widehat{\boldsymbol{\beta}}_\alpha$ and $\widehat{\sigma}_\alpha$, respectively. Following [14], we consider two different fixed design matrices for the univariate linear regression model:

Design 1 Two-points design. We fix $\boldsymbol{x}_i=(1,\tilde{x}_i)^T$, with $\tilde{x}_i=a$ for $i=1,\ldots,n/2$ and $\tilde{x}_i=b$ for $i=n/2+1,\ldots,n$.

Design 2
Fixed-Normal design. We fix x i = (1 , x i ) T , where x i , i = 1 , ..., n are prefixedindependent and identically distributed observations from a N ( µ x = 0 , σ x = 1) . Figure 2 presents the (cid:96) -norm of the first order IF of the minimum RP estimator andsecond order IF of Wald type test estimators for testing (12) with θ = (1 , , T withboth fixed designs and contamination in one direction for different values of α . Clearly,the IF is bounded for positives values of the parameter α and is unbounded at the MLE,highlighting it lack of robustness. Moreover, the supremum of the (cid:96) -norm of the IFindicates the robustness of the estimator. Hence, we could study the optimal parameterof α trough the gross error sensitivity function. We define the gross error sensitivity ofthe functional T α considering contamination in the i direction as γ ∗ ( T α , F , θ , . . . , F n, θ ) = sup t i {|| IF i ( t i , T α , F , θ , . . . , F n, θ ) | |} . (30)Considering separately the influence function of the functionals T α ( β ) and T α ( σ ) , it iseasy to show that γ ∗ ( T α ( β ) , F , θ , . . . , F n, θ ) = σ ( α + 1) / α / exp (cid:18) − (cid:19) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:32) n n (cid:88) i =1 X i X Ti (cid:33) − x i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ,γ ∗ ( T α ( σ ) , F , θ , . . . , F n, θ ) = ( α + 1) / α exp (cid:18) − α + 22( α + 1) (cid:19) . (31)Figure 1 represents the gross error sensitivity functions depending on the parameter α, using Design 1 and fixing the true standard error σ = 1. The optimal value of α depends onthe functional, being α = 1 / α = (cid:112) / T α ( β ) and T α ( σ ) respectively. Therefore,a global optimal value of α, in terms of robustness, should varies between values α = 0 . α = 0 .
82 if the true standard error is σ = 1 . Finally, we study the Asymptotic Relative Efficiency (ARE) of the proposed minimumRP estimators with respect to the MLE, which is B.A.N. (Best Asymptotically Normal).The ARE of is computed as the ratio of their asymptotic variances. Note that this ratiodoes not depend on the regression parameters, but is only determined by α. ARE( (cid:98) β α ) = (2 α + 1) / ( α + 1) , ARE( (cid:98) σ α ) = 2(2 α + 1) / ( α + 1) (3 α + 4 α + 2) . (32)Table 1 represents the ARE of the minimum RP estimator, ( (cid:98) β α , (cid:98) σ α ). As shown, theincrement of α leads to an efficiency loss, which is heightened for the standard errorestimator. Therefore, to ensure sufficing efficiency, the parameter α should be chosen fromlow values. However, the efficiency reduction might worth in contrast with the robustnessadvantage. In view of the error sensitivity function study, values above α = 0 .
Figure 1: Gross error sensitivity functions $\gamma^*(T_\alpha(\beta))$ (left) and $\gamma^*(T_\alpha(\sigma))$ (right), as functions of $\alpha$.

Table 1: ARE of the minimum RP estimator with respect to the MLE for the multiple linear regression model, for different values of $\alpha$.

We empirically evaluate the performance of the proposed Wald-type test statistics based on minimum RP estimators for the MLRM through an extensive simulation study. We consider the univariate regression model with fixed design matrix,
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$
and the two different design matrices presented in Section 6. We generate the response variable from the linear regression model (24) with true regression parameters $\beta = (1, 1)^T$ and $\sigma = 1$. To introduce contamination into the data, we replace the true regression vector by a contaminated one with slope $2$ for $10\%$ of the sample size. We analyse the performance of the Wald-type tests for simple null hypotheses on both parameters, $\beta_1$ and $\sigma$, at different values of the tuning parameter $\alpha$. Note that, for the proposed design matrices, the matrix $\frac{1}{n} \sum_{i=1}^n X_i X_i^T$ is finitely defined and positive definite.

We consider the two different null hypotheses
$$H_0: \beta_1 = 1, \qquad (33)$$
$$H_0: \sigma = 1, \qquad (34)$$
corresponding to the composite null hypothesis $H_0: M^T \theta = m$ with $M_\beta = (0, 1, 0)^T$ and $m_\beta = 1$, and with $M_\sigma = (0, 0, 1)^T$ and $m_\sigma = 1$, respectively.

Figure 2: $\ell_2$-norm of the first-order IF of the minimum RP estimator (top) and second-order IF of the Wald-type test statistics for testing (12) with $\theta_0 = (1, 1, 1)^T$ (bottom), with fixed Design 1 (left) and Design 2 (right), and contamination in the direction $i_0 = 1$.

The Wald-type test statistics for testing (33)-(34), which we denote $W_n(\widehat{\beta}_\alpha)$ and $W_n(\widehat{\sigma}_\alpha)$, are given by (16) after substituting the corresponding matrices. So as to investigate the trade-off between efficiency and robustness depending on the tuning parameter $\alpha$, we compute the empirical levels of the proposed Wald-type tests, as well as their empirical powers when the true parameter values are $\beta_1 = 0.45$ and $\sigma = 0.8$, respectively.
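For the normal linear model, the RP objective reduces, up to additive constants, to $\frac{\alpha}{\alpha+1}\log\sigma - \log\big(\frac{1}{n}\sum_{i} \exp(-\alpha r_i^2/(2\sigma^2))\big)$ with residuals $r_i = y_i - \beta_0 - \beta_1 x_i$. The sketch below is an illustration assuming that reduced form (it is not the implementation used for the paper's study, and the contamination here, additive vertical outliers, differs from the slope swap described above); it fits the minimum RP estimator numerically and contrasts it with the MLE ($\alpha = 0$):

```python
import numpy as np
from scipy.optimize import minimize

def neg_rp_objective(params, x, y, alpha):
    # Reduced RP objective for y_i ~ N(b0 + b1 x_i, sigma^2): up to additive
    # constants, the minimum RP estimator minimizes
    #   alpha/(alpha+1) * log(sigma) - log( mean_i exp(-alpha r_i^2 / (2 sigma^2)) ).
    b0, b1, log_s = params
    s2 = np.exp(2.0 * log_s)
    r = y - b0 - b1 * x
    return alpha / (alpha + 1.0) * log_s - np.log(np.mean(np.exp(-alpha * r ** 2 / (2.0 * s2))))

def fit_min_rp(x, y, alpha):
    # Multi-start Nelder-Mead: one start at OLS with a robust (MAD) scale, one
    # at a flat line, keeping the run with the best objective value.
    b1_ols, b0_ols = np.polyfit(x, y, 1)
    resid = y - b0_ols - b1_ols * x
    s0 = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    starts = [[b0_ols, b1_ols, np.log(s0)], [np.median(y), 0.0, np.log(s0)]]
    best = None
    for x0 in starts:
        res = minimize(neg_rp_objective, x0=x0, args=(x, y, alpha),
                       method="Nelder-Mead",
                       options={"maxiter": 5000, "xatol": 1e-9, "fatol": 1e-12})
        if best is None or res.fun < best.fun:
            best = res
    b0, b1, log_s = best.x
    return b0, b1, np.exp(log_s)

rng = np.random.default_rng(12345)
n = 300
x = rng.normal(size=n)                   # fixed-normal design, as in Section 6
y = 1.0 + 1.0 * x + rng.normal(size=n)   # true (beta0, beta1, sigma) = (1, 1, 1)
# Illustrative gross outliers: shift 10% of the responses upwards.
out = np.argsort(x)[-n // 10:]
y[out] += 10.0

b1_mle = np.polyfit(x, y, 1)[0]          # alpha = 0: the non-robust MLE/OLS slope
b0_rp, b1_rp, s_rp = fit_min_rp(x, y, alpha=0.5)
print(b1_mle, b1_rp, s_rp)
```

The MLE slope is dragged away from the true value by the outliers, while the $\alpha = 0.5$ fit stays close to $(\beta_0, \beta_1, \sigma) = (1, 1, 1)$, in line with the simulation results below.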
These levels and powers are computed as the proportion of rejections of the null hypothesis over the $R = 1000$ simulated samples. Figures 3-6 contain the root mean square error (RMSE), empirical level and empirical power results for the null hypotheses (33) and (34), at a $5\%$ significance level. The results show a clear improvement in robustness as $\alpha$ increases, at the expense of efficiency. The MLE produces the best performance with pure data, showing its greater efficiency, and the behaviour of the minimum RP estimator improves as $\alpha$ decreases; i.e., estimators based on low values of the parameter enjoy greater efficiency. However, in the presence of data contamination, the RMSE and empirical level of the Wald-type test statistics rise for low values of $\alpha$, highlighting their lack of robustness. The most revealing setting is Design 1, in which the empirical level and power of the Wald-type tests based on the MLE reach their worst results, while the proposed Wald-type tests based on the RP continue to perform adequately for sufficiently high values of $\alpha$.

For the first hypothesis test (33), we can apply Theorem 4.7 to obtain the power under the contiguous alternative hypotheses (18). The distribution of the Wald-type test statistic $W_n(\widehat{\beta}_\alpha)$ is a chi-squared with 1 degree of freedom and non-centrality parameter
$$\delta = \frac{(2\alpha+1)^{3/2}}{\sigma^2 (\alpha+1)^3} \, d^{*T} \left( \frac{1}{n} \sum_{i=1}^n X_i X_i^T \right)^{-1} d^*,$$
depending on the standard deviation error $\sigma$, the tuning parameter $\alpha$ and the fixed value $d_x = d^{*T} \left( \frac{1}{n} \sum_{i=1}^n X_i X_i^T \right)^{-1} d^*$. The choice $d_x = 0$ corresponds to the level of the test. Table 2 summarizes the empirical power results over different values of $\alpha$ and $d_x$, with $\sigma = 1$.

Table 2: Empirical power values for the null hypothesis (33) under contiguous alternative hypotheses.
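The asymptotic power under contiguous alternatives can be evaluated directly from the non-central chi-square distribution; a minimal sketch (the function name is ours):

```python
from scipy.stats import chi2, ncx2

def contiguous_power(delta, df=1, level=0.05):
    # Asymptotic power of the Wald-type test under contiguous alternatives
    # (Theorem 4.7): W_n is asymptotically non-central chi-square with `df`
    # degrees of freedom and non-centrality `delta`, rejected when it exceeds
    # the central chi-square critical value.
    crit = chi2.ppf(1.0 - level, df)
    if delta == 0.0:                 # delta = 0 recovers the level of the test
        return float(chi2.sf(crit, df))
    return float(ncx2.sf(crit, df, delta))

powers = [contiguous_power(d) for d in (0.0, 1.0, 4.0, 9.0)]
print(powers)
```

As expected, the power equals the nominal level at $\delta = 0$ and increases monotonically with $\delta$.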
Figure 3: RMSE (top), empirical level (middle) and empirical power (bottom) against sample size for the null hypotheses (33) (left) and (34) (right), for the corresponding Wald-type tests with pure data and Design 1.

Figure 4: RMSE (top), empirical level (middle) and empirical power (bottom) against sample size for the null hypotheses (33) (left) and (34) (right), for the corresponding Wald-type tests with 10% of outliers and Design 1.

Figure 5: RMSE (top), empirical level (middle) and empirical power (bottom) against sample size for the null hypotheses (33) (left) and (34) (right), for the corresponding Wald-type tests with pure data and Design 2.

Figure 6: RMSE (top), empirical level (middle) and empirical power (bottom) against sample size for the null hypotheses (33) (left) and (34) (right), for the corresponding Wald-type tests with 10% of outliers and Design 2.

Note that greater values of $d_x$ produce greater power values, as expected, and the empirical power decreases with $\alpha$. However, the efficiency loss is not very significant in comparison with the robustness advantage.
These data, adapted from a larger data set in [18], were presented in Rousseeuw and Leroy ([19], p. 57) as an example of the non-robustness of the classical MLE in simple linear regression. In this sample, the body weight (in kilograms) and the brain weight (in grams) of $n = 28$ animals are compared, to investigate whether a larger brain is required to govern a heavier body. As suggested in [19], a transformation should be applied so that both the larger and the smaller measurements are clearly represented; in this case, we take the Napierian logarithm of both brain and body weights. Observations 6, 16 and 25, those corresponding to dinosaurs, possess an unusually small brain compared with a heavy body, which clearly affects the slope of the classical estimation method ($\alpha = 0$), as can be seen in Figure 7 and Table 3. The estimates of the regression coefficients and the error variance obtained from minimum RP estimation for various $\alpha$ are also presented there, showing that the estimation based on $\alpha > 0$ is far less affected by these outliers.

Figure 7: Plots of the data points and fitted regression lines for the Brain and Weight Data using several minimum RP estimators, before (left) and after (right) deleting the outliers.

Table 3: Parameter estimates of the linear regression model for the Brain and Weight Data using several minimum RP estimators, with and without outliers.

In order to study the performance of the Wald-type tests for different values of $\alpha$, we consider the following tests,
$$H_0: \beta_0 = 1.98, \qquad (35)$$
$$H_0: \beta_1 = 0.73, \qquad (36)$$
$$H_0: (\beta_0, \beta_1) = (1.98, 0.73), \qquad (37)$$
where the values 1.98 and 0.73 are, respectively, the mean values of the estimated coefficients $\beta_0$ and $\beta_1$ over the different values of $\alpha$, using the original data (with outliers) listed in Table 3.
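Given parameter estimates and a consistent estimate of the asymptotic covariance $\Sigma_\alpha$, the reported p-values all follow the same recipe: compute the Wald-type statistic and compare it with the appropriate chi-square distribution. A hedged sketch (the numerical estimates and the diagonal covariance below are purely illustrative, not the values from Table 3):

```python
import numpy as np
from scipy.stats import chi2

def wald_pvalue(theta_hat, Sigma_hat, M, m, n):
    # Wald-type statistic W_n = n (M^T th - m)^T (M^T Sigma M)^{-1} (M^T th - m),
    # compared with a chi-square with rank(M) degrees of freedom.
    diff = M.T @ theta_hat - m
    W = float(n * diff @ np.linalg.solve(M.T @ Sigma_hat @ M, diff))
    return W, float(chi2.sf(W, M.shape[1]))

# Illustrative numbers only: testing H0: beta0 = 1.98 for theta = (beta0, beta1, sigma)
theta_hat = np.array([2.05, 0.70, 0.85])   # hypothetical estimates
Sigma_hat = np.diag([0.40, 0.05, 0.10])    # hypothetical asymptotic covariance
M = np.array([[1.0], [0.0], [0.0]])
W, p = wald_pvalue(theta_hat, Sigma_hat, M, np.array([1.98]), n=28)
print(W, p)
```

By construction, the statistic vanishes and the p-value equals one when the estimate satisfies the null restriction exactly.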
Table 4 shows the p-values obtained using the corresponding Wald-type test statistics, $W_n(\widehat{\beta}_0)$, $W_n(\widehat{\beta}_1)$ and $W_n(\widehat{\beta})$.

Table 4: p-values obtained for the tests (35)-(37) using the corresponding Wald-type test statistics.

              With outliers                     Without outliers
$\alpha$   $W_n(\widehat{\beta}_0)$  $W_n(\widehat{\beta}_1)$  $W_n(\widehat{\beta})$   $W_n(\widehat{\beta}_0)$  $W_n(\widehat{\beta}_1)$  $W_n(\widehat{\beta})$
0          0.080   0.000   0.000     0.452   0.542   0.072
0.2        0.713   0.556   0.204     0.723   0.537   0.197
0.4        0.833   0.437   0.358     0.833   0.437   0.358
0.6        0.539   0.310   0.305     0.539   0.310   0.305
0.8        0.423   0.236   0.235     0.423   0.236   0.235
1          0.393   0.203   0.203     0.393   0.203   0.203

As shown, the stability of the tests increases with $\alpha$, showing the robustness improvement of the proposed Wald-type test statistics. Note that the largest difference between the p-values using clean data and data with outliers is obtained for $\alpha = 0$, corresponding to the classical MLE.

These data, originally presented in Mickey et al. [20], consist of $n = 21$ observations relating the age at which children speak their first word to their Gesell adaptive score, a measure of mental ability. By means of a sequential approach to detect outliers via stepwise regression, [20] concluded that observation 18 was an outlier. While the estimates of the regression coefficients obtained with the MLE do not change excessively when omitting this outlier (Figure 8), we do observe a greater change in the estimate of the error variance (Table 5). As expected, the minimum RP estimates for $\alpha > 0$ remain stable with and without the outlier.

Figure 8: Plots of the data points and fitted regression lines for the First Word Data using several minimum RP estimators, before (left) and after (right) deleting the outlier.

Table 5: Parameter estimates of the linear regression model for the First Word Data using several minimum RP estimators, with and without the outlier.

We consider the following tests,
$$H_0: \beta_0 = 112.56, \qquad (38)$$
$$H_0: \beta_1 = -0.73, \qquad (39)$$
$$H_0: (\beta_0, \beta_1) = (112.56, -0.73), \qquad (40)$$
where the values 112.56 and $-0.73$ are, respectively, the mean values of the estimated coefficients $\beta_0$ and $\beta_1$ over the different values of $\alpha$, using the original data (with the outlier) listed in Table 5.
Table 6 shows the p-values obtained using the corresponding Wald-type test statistics, $W_n(\widehat{\beta}_0)$, $W_n(\widehat{\beta}_1)$ and $W_n(\widehat{\beta})$.

Table 6: p-values obtained for the tests (38)-(40) using the corresponding Wald-type test statistics.

              With outlier                      Without outlier
$\alpha$   $W_n(\widehat{\beta}_0)$  $W_n(\widehat{\beta}_1)$  $W_n(\widehat{\beta})$   $W_n(\widehat{\beta}_0)$  $W_n(\widehat{\beta}_1)$  $W_n(\widehat{\beta})$
0          0.072   0.098   0.071     0.013   0.293   0.001
0.2        0.110   0.331   0.065     0.065   0.485   0.007
0.4        0.243   0.636   0.098     0.234   0.719   0.061
0.6        0.596   0.949   0.318     0.628   0.997   0.311
0.8        0.590   0.620   0.588     0.529   0.583   0.529
1          0.001   0.078   0.000     0.001   0.078   0.000

The results again highlight the gain in robustness.

In this paper we have presented the minimum RP estimators for the case of i.n.i.d.o., and Wald-type tests based on them have also been developed. The classical MLE and Wald test are obtained as particular cases of these new estimators and tests. In particular, we have studied the case of the MLRM. Through the study of the influence functions and the development of an extensive simulation study, we have demonstrated their robustness from a theoretical and a practical point of view, respectively. The application of this approach to other models is a problem of interest for further research.
Acknowledgements:
This research is supported by the Spanish Grants no. PGC2018-095194-B-100, no. FPU19/01824 and no. FPU16/03104.
A Proof of Results
A.1 Proof of Theorem 3.1
The proof follows similar steps to the proof presented in [12] for the minimum DPD estimators with i.n.i.d.o. and the proof presented in [17] for the MLE with i.n.i.d.o.

To prove the existence, with probability tending to one, of a consistent sequence of solutions of the system of equations (10), we study the behaviour of the objective function in (6), $H_n^\alpha(\theta)$, on a neighbourhood of the true parameter value. We consider the sphere $Q_a$ with center at the true value of the parameter $\theta^*$ and radius $a$. We will show that, for any sufficiently small $a$,
$$H_n^\alpha(\theta) < H_n^\alpha(\theta^*)$$
with probability tending to one at all points $\theta$ on the surface of $Q_a$. This inequality ensures that the objective function $H_n^\alpha(\theta)$ has a local maximum in the interior of $Q_a$. Since $H_n^\alpha(\theta)$ is differentiable, the system of equations (10) must be satisfied at a local maximum. Therefore, for any $a > 0$, the system of equations (10) has a solution $\widehat{\theta}_n(a)$ within $Q_a$ verifying
$$\lim_{n \to \infty} P_{\theta^*}\left( \left\| \widehat{\theta}_n(a) - \theta^* \right\| < a \right) = 1.$$
We consider a Taylor series expansion of $H_n^\alpha(\theta)$ around $\theta^* = (\theta_1^*, \ldots, \theta_p^*)$,
$$H_n^\alpha(\theta) - H_n^\alpha(\theta^*) = \sum_{j=1}^p \left( \frac{\partial H_n^\alpha(\theta)}{\partial \theta_j} \right)_{\theta = \theta^*} (\theta_j - \theta_j^*) + \frac{1}{2} \sum_{j=1}^p \sum_{k=1}^p \left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} (\theta_j - \theta_j^*)(\theta_k - \theta_k^*)$$
$$+ \frac{1}{6} \sum_{j=1}^p \sum_{k=1}^p \sum_{l=1}^p \left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} (\theta_j - \theta_j^*)(\theta_k - \theta_k^*)(\theta_l - \theta_l^*) = L_1 + L_2 + L_3, \qquad (41)$$
where each derivative of $H_n^\alpha(\theta) = \frac{1}{n} \sum_{i=1}^n V_i(Y_i; \theta)$ is the average $\frac{1}{n} \sum_{i=1}^n$ of the corresponding derivatives of the $V_i(Y_i; \theta)$, and $\widetilde{\theta}$ belongs to the interior of the ball with center $\theta^*$ and radius $a$. We study separately the right-hand terms $L_1$, $L_2$ and $L_3$ in (41).

Using assumption C6, we have that
$$A_j^{(n)} = \left( \frac{\partial H_n^\alpha(\theta)}{\partial \theta_j} \right)_{\theta = \theta^*} = \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial V_i(Y_i; \theta)}{\partial \theta_j} \right)_{\theta = \theta^*} \xrightarrow{P} \frac{1}{n} \sum_{i=1}^n E_{\theta^*}\left[ \left( \frac{\partial V_i(Y; \theta)}{\partial \theta_j} \right)_{\theta = \theta^*} \right] = 0.$$
We now establish the last equality. We have
$$\frac{\partial V_i(Y_i; \theta)}{\partial \theta_j} = \frac{1}{L_{i\alpha}^2(\theta)} \left( \alpha f_i(Y, \theta)^\alpha u_j(Y, \theta) L_{i\alpha}(\theta) - \frac{\partial L_{i\alpha}(\theta)}{\partial \theta_j} f_i(Y, \theta)^\alpha \right),$$
with $u_j(y, \theta) = \frac{\partial \log f_i(y, \theta)}{\partial \theta_j}$.
But
$$\frac{\partial L_{i\alpha}(\theta)}{\partial \theta_j} = \frac{\alpha}{\alpha+1} \left( \int f_i(y,\theta)^{\alpha+1} dy \right)^{\frac{\alpha}{\alpha+1} - 1} (\alpha+1) \int f_i(y,\theta)^{\alpha+1} u_j(y,\theta) \, dy = \alpha \left( \int f_i(y,\theta)^{\alpha+1} dy \right)^{-\frac{1}{\alpha+1}} \int f_i(y,\theta)^{\alpha+1} u_j(y,\theta) \, dy.$$
Therefore,
$$L_{i\alpha}^2(\theta) \frac{\partial V_i(Y; \theta)}{\partial \theta_j} = \alpha f_i(Y,\theta)^\alpha u_j(Y,\theta) L_{i\alpha}(\theta) - \alpha \left( \int f_i(y,\theta)^{\alpha+1} dy \right)^{-\frac{1}{\alpha+1}} \left( \int f_i(y,\theta)^{\alpha+1} u_j(y,\theta) \, dy \right) f_i(Y,\theta)^\alpha,$$
and, taking expectations under $f_i(\cdot, \theta^*)$ and using $L_{i\alpha}(\theta^*) = \left( \int f_i(y,\theta^*)^{\alpha+1} dy \right)^{\frac{\alpha}{\alpha+1}}$,
$$L_{i\alpha}^2(\theta^*) E_{\theta^*}\left[ \left( \frac{\partial V_i(Y_i; \theta)}{\partial \theta_j} \right)_{\theta = \theta^*} \right] = \alpha L_{i\alpha}(\theta^*) \int f_i(y,\theta^*)^{\alpha+1} u_j(y,\theta^*) \, dy - \alpha \left( \int f_i(y,\theta^*)^{\alpha+1} dy \right)^{\frac{\alpha}{\alpha+1}} \int f_i(y,\theta^*)^{\alpha+1} u_j(y,\theta^*) \, dy = 0.$$
On the other hand, we denote
$$B_{jk}^{(n)} = \left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*},$$
and, applying again condition C6, we obtain the convergence
$$B_{jk}^{(n)} = \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial^2 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} \xrightarrow{P} \frac{1}{n} \sum_{i=1}^n E_{\theta^*}\left[ \left( \frac{\partial^2 V_i(Y; \theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} \right] = (-\Psi_n)_{jk}.$$
Finally, applying condition C6 to the third derivatives, we have
$$\left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} = \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial^3 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \xrightarrow{P} \frac{1}{n} \sum_{i=1}^n E_{\theta^*}\left[ \left( \frac{\partial^3 V_i(Y; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \right].$$
Assumption C5 ensures the existence of functions $M_{jkl}^{(i)}$, $j, k, l = 1, \ldots, p$, such that
$$\left| \left( \frac{\partial^3 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \right| \leq M_{jkl}^{(i)}(Y_i),$$
and therefore there exist functions $\gamma_{jkl}^{(i)}(y)$ with $0 \leq |\gamma_{jkl}^{(i)}(y)| \leq 1$ verifying
$$\left( \frac{\partial^3 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} = M_{jkl}^{(i)}(Y_i) \gamma_{jkl}^{(i)}(Y_i)$$
and $E_{\theta^*}\left[ M_{jkl}^{(i)}(Y_i) \right] = m_{jkl}$, with
$$\left| \frac{1}{n} \sum_{i=1}^n E_{\theta^*}\left[ \left( \frac{\partial^3 V_i(Y; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \right] \right| < m_{jkl}.$$
The previous convergences provide that, for all $a$ and for all $\varepsilon$, there exists $n_0$ such that for all $n > n_0$ we have
$$P\left( \left| A_j^{(n)} \right| > a^2 \right) < \frac{\varepsilon}{p + p^2 + p^3},$$
$$P\left( \left| B_{jk}^{(n)} - (-\Psi_n)_{jk} \right| \geq a \right) < \frac{\varepsilon}{p + p^2 + p^3},$$
$$P\left( \left| \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial^3 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \right| \geq 2 m_{jkl} \right) \leq \frac{\varepsilon}{p + p^2 + p^3}.$$
We shall now denote by $S^*$ the event in which at least one of the $p + p^2 + p^3$ inequalities
$$\left| A_j^{(n)} \right| > a^2, \qquad \left| B_{jk}^{(n)} - (-\Psi_n)_{jk} \right| \geq a, \qquad \left| \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial^3 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \right| \geq 2 m_{jkl}$$
holds. It is clear that $P(S^*) < \varepsilon$ and $P((S^*)^C) \geq 1 - \varepsilon$. In the following we denote $S = (S^*)^C$.

We finally study the sign of $H_n^\alpha(\theta) - H_n^\alpha(\theta^*)$ under the event $S$ and for $\theta \in Q_a$. Since $\theta \in Q_a$, in $S$ it holds that
$$|L_1| = \left| \sum_{j=1}^p A_j^{(n)} \left( \theta_j - \theta_j^* \right) \right| \leq p a^2 a = p a^3 \qquad (43)$$
and
$$\left| \sum_{j=1}^p \sum_{k=1}^p \left( B_{jk}^{(n)} - (-\Psi_n)_{jk} \right) \left( \theta_j - \theta_j^* \right) \left( \theta_k - \theta_k^* \right) \right| \leq p^2 a \, a^2 = p^2 a^3.$$
We now consider the negative definite quadratic form
$$A = \sum_{j=1}^p \sum_{k=1}^p \left( \theta_j - \theta_j^* \right) \left( \theta_k - \theta_k^* \right) \frac{1}{n} \sum_{i=1}^n E_{\theta^*}\left[ \left( \frac{\partial^2 V_i(Y; \theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} \right] = \sum_{j=1}^p \sum_{k=1}^p (-\Psi_n)_{jk} \left( \theta_j - \theta_j^* \right) \left( \theta_k - \theta_k^* \right).$$
An orthogonal transformation reduces the quadratic form $A$ to its diagonal form $A = \sum_{i=1}^p \lambda_i \xi_i^2$, with $\sum_{i=1}^p \xi_i^2 = \sum_{i=1}^p \left( \theta_i - \theta_i^* \right)^2 = a^2$ and all the eigenvalues $\lambda_i$ negative. Bounding by the largest of them, we get $\sum_{i=1}^p \lambda_i \xi_i^2 \leq -\lambda_0 a^2 < 0$ for some $\lambda_0 > 0$. A study of the sign of the function $p^2 a^3 - \lambda_0 a^2$ proves that we can find $c > 0$ and $a_0 > 0$ such that, for $a < a_0$,
$$L_2 = \frac{1}{2} \sum_{j=1}^p \sum_{k=1}^p \left( B_{jk}^{(n)} - (-\Psi_n)_{jk} \right) \left( \theta_j - \theta_j^* \right) \left( \theta_k - \theta_k^* \right) + \frac{1}{2} \sum_{j=1}^p \sum_{k=1}^p (-\Psi_n)_{jk} \left( \theta_j - \theta_j^* \right) \left( \theta_k - \theta_k^* \right) < -c a^2.$$
Lastly,
$$|L_3| = \left| \frac{1}{6} \sum_{j=1}^p \sum_{k=1}^p \sum_{l=1}^p \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial^3 V_i(Y_i; \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \left( \theta_j - \theta_j^* \right) \left( \theta_k - \theta_k^* \right) \left( \theta_l - \theta_l^* \right) \right| < \frac{1}{6} \sum_{j=1}^p \sum_{k=1}^p \sum_{l=1}^p 2 m_{jkl} \, a^3 = a^3 b,$$
with $b = \frac{1}{3} \sum_{j=1}^p \sum_{k=1}^p \sum_{l=1}^p m_{jkl}$. Therefore,
$$H_n^\alpha(\theta) - H_n^\alpha(\theta^*) < p a^3 - c a^2 + b a^3,$$
and $p a^3 - c a^2 + b a^3 < 0$ whenever $a < \frac{c}{b + p}$. Therefore, assuming $a < \frac{c}{b + p}$, we get that, in the event $S$,
$$H_n^\alpha(\theta) - H_n^\alpha(\theta^*) < 0 \quad \forall \theta \in Q_a.$$
Thus the event $C$, in which $H_n^\alpha(\theta) - H_n^\alpha(\theta^*) < 0$ for all $\theta \in Q_a$, contains $S$, i.e.,
$$P(C) \geq P(S) > 1 - \varepsilon.$$
Choosing $a$ lower than $\min\left( a_0, \frac{c}{b + p} \right)$, we have
$$\lim_{n \to \infty} P\left( H_n^\alpha(\theta) - H_n^\alpha(\theta^*) < 0 \ \forall \theta \in Q_a \right) = 1.$$
Thus, there exists $\widehat{\theta}_n(a) \in Q_a$, i.e., $\left\| \widehat{\theta}_n(a) - \theta^* \right\| < a$, such that $H_n^\alpha(\theta)$ has a local maximum at $\widehat{\theta}_n(a)$. Hence, for all $a \leq \min\left( a_0, \frac{c}{b + p} \right)$, we obtain the required convergence
$$\lim_{n \to \infty} P_{\theta^*}\left( \left\| \widehat{\theta}_n(a) - \theta^* \right\| < a \right) = 1.$$

A.2 Proof of Theorem 3.2
We denote $H_{n,j}^\alpha(\theta) = \frac{\partial H_n^\alpha(\theta)}{\partial \theta_j}$, with $H_n^\alpha(\theta)$ defined in (6). A Taylor expansion of $H_{n,j}^\alpha(\theta)$ around $\theta^*$ gives
$$H_{n,j}^\alpha(\theta) = H_{n,j}^\alpha(\theta^*) + \sum_{k=1}^p \left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} \left( \theta_k - \theta_k^* \right) + \frac{1}{2} \sum_{k=1}^p \sum_{l=1}^p \left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \left( \theta_k - \theta_k^* \right) \left( \theta_l - \theta_l^* \right),$$
with $\widetilde{\theta}$ in the segment connecting $\theta$ and $\theta^*$. At the minimum RP estimator the function $H_{n,j}^\alpha$ vanishes, $H_{n,j}^\alpha(\widehat{\theta}_\alpha) = 0$. Therefore,
$$H_{n,j}^\alpha(\theta^*) = -\sum_{k=1}^p \left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} \left( \widehat{\theta}_{\alpha,k} - \theta_k^* \right) - \frac{1}{2} \sum_{k=1}^p \sum_{l=1}^p \left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \left( \widehat{\theta}_{\alpha,k} - \theta_k^* \right) \left( \widehat{\theta}_{\alpha,l} - \theta_l^* \right).$$
Using that $H_{n,j}^\alpha(\theta^*) = \frac{1}{n} \sum_{i=1}^n \left( \frac{\partial V_i(Y, \theta)}{\partial \theta_j} \right)_{\theta = \theta^*}$, it holds that
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n \left( \frac{\partial V_i(Y, \theta)}{\partial \theta_j} \right)_{\theta = \theta^*} = \sqrt{n} \sum_{k=1}^p \left( \widehat{\theta}_{\alpha,k} - \theta_k^* \right) \left\{ -\left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} - \frac{1}{2} \sum_{l=1}^p \left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \left( \widehat{\theta}_{\alpha,l} - \theta_l^* \right) \right\}.$$
If we denote
$$Z_{kn} = \sqrt{n} \left( \widehat{\theta}_{\alpha,k} - \theta_k^* \right),$$
$$A_{jkn} = -\left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} - \frac{1}{2} \sum_{l=1}^p \left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}} \left( \widehat{\theta}_{\alpha,l} - \theta_l^* \right),$$
$$T_{jn} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( \frac{\partial V_i(Y, \theta)}{\partial \theta_j} \right)_{\theta = \theta^*},$$
we can write $T_{jn} = \sum_{k=1}^p A_{jkn} Z_{kn}$. Finally, we define the vectors $Z_n = (Z_{1n}, \ldots, Z_{pn})^T$ and $T_n = (T_{1n}, \ldots, T_{pn})^T$, and the matrix $A_n = (A_{jkn})_{j,k = 1, \ldots, p}$. It is clear that
$$T_n = A_n Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left( \frac{\partial V_i(Y, \theta)}{\partial \theta} \right)_{\theta = \theta^*},$$
and it is a simple exercise to verify that the summands, $i = 1, \ldots, n$, are independent with
$$E_{\theta^*}\left[ \left( \frac{\partial V_i(Y, \theta)}{\partial \theta} \right)_{\theta = \theta^*} \right] = 0 \quad \text{and} \quad Var_{\theta^*}\left[ \left( \frac{\partial V_i(Y, \theta)}{\partial \theta} \right)_{\theta = \theta^*} \right] < \infty.$$
By assumption C7, and applying the multivariate extension of the Lindeberg-Lévy central limit theorem, we get
$$\Omega_n^{-1/2} T_n \xrightarrow[n \to \infty]{L} N(0_p, I_p),$$
or, equivalently, $\Omega_n^{-1/2} A_n Z_n \xrightarrow[n \to \infty]{L} N(0_p, I_p)$. By assumption C5, $\left( \frac{\partial^3 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k \partial \theta_l} \right)_{\theta = \widetilde{\theta}}$ is bounded with probability tending to one; therefore, based on the consistency of $\widehat{\theta}_\alpha$, the second term of $A_{jkn}$ converges to zero in probability. Moreover,
$$-\left( \frac{\partial^2 H_n^\alpha(\theta)}{\partial \theta_j \partial \theta_k} \right)_{\theta = \theta^*} \xrightarrow[n \to \infty]{P} (\Psi_n)_{jk}, \quad \text{so that} \quad \Omega_n^{-1/2} \left( A_n - \Psi_n \right) Z_n \xrightarrow[n \to \infty]{P} 0_p.$$
Therefore,
$$\Omega_n^{-1/2} \Psi_n Z_n \xrightarrow[n \to \infty]{L} N(0_p, I_p),$$
and finally
$$\sqrt{n} \, \Omega_n^{-1/2} \Psi_n \left( \widehat{\theta}_\alpha - \theta^* \right) \xrightarrow[n \to \infty]{L} N(0_p, I_p).$$

A.3 Proof of Theorem 4.2
We have by (11) that
$$\sqrt{n} \left( \widehat{\theta}_\alpha - \theta^* \right) \xrightarrow[n \to \infty]{L} N(0_p, \Sigma_\alpha(\theta^*)),$$
where $\Sigma_\alpha(\theta^*) = \lim_{n \to \infty} \Psi_n(\theta^*)^{-1} \Omega_n(\theta^*) \Psi_n(\theta^*)^{-1}$. Therefore,
$$\sqrt{n} \left( M^T \widehat{\theta}_\alpha - m \right) \xrightarrow[n \to \infty]{L} N(0, M^T \Sigma_\alpha(\theta^*) M).$$
As $\text{rank}(M) = p$, we have that
$$n \left( M^T \widehat{\theta}_\alpha - m \right)^T \left( M^T \Sigma_\alpha(\theta^*) M \right)^{-1} \left( M^T \widehat{\theta}_\alpha - m \right)$$
converges in law to a chi-square distribution with $p$ degrees of freedom. But under $H_0$, $\Sigma_\alpha(\theta_0) = \Sigma_\alpha(\theta^*)$, and thus $W_n(\theta_0)$ converges to a chi-square distribution with $p$ degrees of freedom.

A.4 Proof of Theorem 4.3
A first-order Taylor expansion of $\ell(\theta)$ around $\theta^*$, evaluated at $\widehat{\theta}_\alpha$, is given by
$$\ell(\widehat{\theta}_\alpha) - \ell(\theta^*) = \frac{\partial \ell(\theta)}{\partial \theta^T} \bigg|_{\theta = \theta^*} \left( \widehat{\theta}_\alpha - \theta^* \right) + o_p(n^{-1/2}).$$
Then the asymptotic distribution of $\sqrt{n} \left( \ell(\widehat{\theta}_\alpha) - \ell(\theta^*) \right)$ coincides with the asymptotic distribution of $\frac{\partial \ell(\theta)}{\partial \theta^T} \big|_{\theta = \theta^*} \sqrt{n} \left( \widehat{\theta}_\alpha - \theta^* \right)$, and the result follows.

A.5 Proof of Theorem 4.6
We have by (11) that
$$\sqrt{n} \left( \widehat{\theta}_\alpha - \theta^* \right) \xrightarrow[n \to \infty]{L} N(0_p, \Sigma_\alpha(\theta^*)),$$
where $\Sigma_\alpha(\theta^*) = \lim_{n \to \infty} \Psi_n(\theta^*)^{-1} \Omega_n(\theta^*) \Psi_n(\theta^*)^{-1}$. Therefore,
$$\sqrt{n} \left( M^T \widehat{\theta}_\alpha - m \right) \xrightarrow[n \to \infty]{L} N(0, M^T \Sigma_\alpha(\theta^*) M).$$
As $\text{rank}(M) = r$, we have that
$$n \left( M^T \widehat{\theta}_\alpha - m \right)^T \left( M^T \Sigma_\alpha(\theta^*) M \right)^{-1} \left( M^T \widehat{\theta}_\alpha - m \right)$$
converges in law to a chi-square distribution with $r$ degrees of freedom. But $\Sigma_\alpha(\widehat{\theta}_\alpha)$ is a consistent estimator of $\Sigma_\alpha(\theta^*)$, and thus $W_n(\widehat{\theta}_\alpha)$ converges to a chi-square distribution with $r$ degrees of freedom.

A.6 Proof of Theorem 4.7

A Taylor series expansion of $M^T \theta - m$ around $\theta_n$ yields
$$M^T \widehat{\theta}_\alpha - m = M^T \theta_n - m + M^T \left( \widehat{\theta}_\alpha - \theta_n \right) + o\left( \left\| \widehat{\theta}_\alpha - \theta_n \right\| \right) = n^{-1/2} M^T d + M^T \left( \widehat{\theta}_\alpha - \theta_n \right) + o\left( \left\| \widehat{\theta}_\alpha - \theta_n \right\| \right).$$
Using Theorem 6.2, $\sqrt{n} \left( \widehat{\theta}_\alpha - \theta_n \right) \xrightarrow[n \to \infty]{L} N(0, \Sigma_\alpha)$ and $\sqrt{n} \, o\left( \left\| \widehat{\theta}_\alpha - \theta_n \right\| \right) = o_p(1)$, we get the asymptotic convergence
$$\sqrt{n} \left( M^T \widehat{\theta}_\alpha - m \right) \xrightarrow[n \to \infty]{L} N(M^T d, M^T \Sigma_\alpha M).$$
We now consider the random variable $Z = \sqrt{n} \left( M^T \Sigma_\alpha M \right)^{-1/2} \left( M^T \widehat{\theta}_\alpha - m \right)$, satisfying
$$Z \xrightarrow[n \to \infty]{L} N\left( \left( M^T \Sigma_\alpha M \right)^{-1/2} M^T d, \, I_{r \times r} \right).$$
Hence, the asymptotic distribution of the quadratic form $W = Z^T Z$ is a non-central chi-square distribution with $r$ degrees of freedom and non-centrality parameter
$$\delta = d^T M \left[ M^T \Sigma_\alpha(\widehat{\theta}_\alpha) M \right]^{-1} M^T d.$$

A.7 Proof of Theorem 5.1
The IF of the functional $T_\alpha(G_1, \ldots, G_n)$ under contamination in the $i_0$-th direction is obtained by replacing $\theta_0$ and $g_{i_0}$ by $\theta_{i_0,\varepsilon}$ and $g_{i_0,\varepsilon}$, respectively, in the estimating equations (20), differentiating with respect to $\varepsilon$ and evaluating the resulting equality at $\varepsilon = 0$.

In (20) we replace $\theta_0$ by $\theta_{i_0,\varepsilon}$ and $g_{i_0}(y)$ by $g_{i_0,\varepsilon}(y) = (1 - \varepsilon) g_{i_0}(y) + \varepsilon \Delta_{t_{i_0}}(y)$, keeping the original $g_i(y)$ for $i \neq i_0$. We get
$$\frac{1}{n} \sum_{i=1}^n \frac{\int f_i(y, \theta_{i_0,\varepsilon})^{\alpha+1} u_i(y, \theta_{i_0,\varepsilon}) \, dy}{\int f_i(y, \theta_{i_0,\varepsilon})^{\alpha+1} \, dy} - \frac{1}{n} \sum_{i \neq i_0} \frac{\int f_i(y, \theta_{i_0,\varepsilon})^{\alpha} g_i(y) u_i(y, \theta_{i_0,\varepsilon}) \, dy}{\int f_i(y, \theta_{i_0,\varepsilon})^{\alpha} g_i(y) \, dy} - \frac{\int f_{i_0}(y, \theta_{i_0,\varepsilon})^{\alpha} g_{i_0,\varepsilon}(y) u_{i_0}(y, \theta_{i_0,\varepsilon}) \, dy}{\int f_{i_0}(y, \theta_{i_0,\varepsilon})^{\alpha} g_{i_0,\varepsilon}(y) \, dy} = 0_p. \qquad (44)$$
Now, we denote
$$\zeta_{i,\alpha}(\theta) = \frac{\int f_i(y, \theta)^{\alpha+1} u_i(y, \theta) \, dy}{\int f_i(y, \theta)^{\alpha+1} \, dy}, \qquad \zeta_{i,\alpha}^*(\theta) = \frac{\int f_i(y, \theta)^{\alpha} g_i(y) u_i(y, \theta) \, dy}{\int f_i(y, \theta)^{\alpha} g_i(y) \, dy}, \qquad \zeta_{i_0,\alpha}^{**}(\theta) = \frac{\int f_{i_0}(y, \theta)^{\alpha} g_{i_0,\varepsilon}(y) u_{i_0}(y, \theta) \, dy}{\int f_{i_0}(y, \theta)^{\alpha} g_{i_0,\varepsilon}(y) \, dy},$$
so that (44) reads
$$\frac{1}{n} \sum_{i=1}^n \zeta_{i,\alpha}(\theta_{i_0,\varepsilon}) - \frac{1}{n} \sum_{i \neq i_0} \zeta_{i,\alpha}^*(\theta_{i_0,\varepsilon}) - \zeta_{i_0,\alpha}^{**}(\theta_{i_0,\varepsilon}) = 0_p. \qquad (45)$$
Differentiating the first family of terms, we have
$$\frac{\partial \zeta_{i,\alpha}(\theta_{i_0,\varepsilon})}{\partial \varepsilon} \bigg|_{\varepsilon = 0} = \frac{A_{i,\alpha}(\theta_0)}{\left( \int f_i(y, \theta_0)^{\alpha+1} \, dy \right)^2} \, IF(t_{i_0}, T_\alpha, G_1, \ldots, G_n),$$
with
$$A_{i,\alpha}(\theta_0) = \left[ (1+\alpha) \int f_i(y, \theta_0)^{\alpha+1} u_i(y, \theta_0) u_i^T(y, \theta_0) \, dy + \int f_i(y, \theta_0)^{\alpha+1} \frac{\partial u_i(y, \theta_0)}{\partial \theta} \, dy \right] \int f_i(y, \theta_0)^{\alpha+1} \, dy$$
$$- (1+\alpha) \left( \int f_i(y, \theta_0)^{\alpha+1} u_i(y, \theta_0) \, dy \right) \left( \int f_i(y, \theta_0)^{\alpha+1} u_i(y, \theta_0) \, dy \right)^T.$$
Analogously, for the terms based on the uncontaminated $g_i$,
$$\frac{\partial \zeta_{i,\alpha}^*(\theta_{i_0,\varepsilon})}{\partial \varepsilon} \bigg|_{\varepsilon = 0} = \frac{A_{i,\alpha}^*(\theta_0)}{\left( \int f_i(y, \theta_0)^{\alpha} g_i(y) \, dy \right)^2} \, IF(t_{i_0}, T_\alpha, G_1, \ldots, G_n),$$
with
$$A_{i,\alpha}^*(\theta_0) = \left[ \alpha \int f_i(y, \theta_0)^{\alpha} g_i(y) u_i(y, \theta_0) u_i^T(y, \theta_0) \, dy + \int f_i(y, \theta_0)^{\alpha} g_i(y) \frac{\partial u_i(y, \theta_0)}{\partial \theta} \, dy \right] \int f_i(y, \theta_0)^{\alpha} g_i(y) \, dy$$
$$- \alpha \left( \int f_i(y, \theta_0)^{\alpha} g_i(y) u_i(y, \theta_0) \, dy \right) \left( \int f_i(y, \theta_0)^{\alpha} g_i(y) u_i(y, \theta_0) \, dy \right)^T.$$
In a similar manner, the term $\zeta_{i_0,\alpha}^{**}$ depends on $\varepsilon$ both through $\theta_{i_0,\varepsilon}$ and explicitly through $g_{i_0,\varepsilon}$, so that
$$\frac{\partial \zeta_{i_0,\alpha}^{**}(\theta_{i_0,\varepsilon})}{\partial \varepsilon} \bigg|_{\varepsilon = 0} = \frac{A_{i_0,\alpha}^*(\theta_0)}{\left( \int f_{i_0}(y, \theta_0)^{\alpha} g_{i_0}(y) \, dy \right)^2} \, IF(t_{i_0}, T_\alpha, G_1, \ldots, G_n) - \frac{\ell_{i_0,\alpha}(t_{i_0}, \theta_0)}{\left( \int f_{i_0}(y, \theta_0)^{\alpha} g_{i_0}(y) \, dy \right)^2},$$
with
$$\ell_{i_0,\alpha}(t_{i_0}, \theta_0) = f_{i_0}(t_{i_0}, \theta_0)^{\alpha} \int f_{i_0}(y, \theta_0)^{\alpha+1} u_{i_0}(y, \theta_0) \, dy - f_{i_0}(t_{i_0}, \theta_0)^{\alpha} u_{i_0}(t_{i_0}, \theta_0) \int f_{i_0}(y, \theta_0)^{\alpha+1} \, dy.$$
Therefore, differentiating (45) with respect to $\varepsilon$ and evaluating at $\varepsilon = 0$, the equality can be written as
$$IF(t_{i_0}, T_\alpha, G_1, \ldots, G_n) \left\{ \frac{1}{n} \sum_{i=1}^n \left[ \frac{A_{i,\alpha}(\theta_0)}{\left( \int f_i(y, \theta_0)^{\alpha+1} \, dy \right)^2} - \frac{A_{i,\alpha}^*(\theta_0)}{\left( \int f_i(y, \theta_0)^{\alpha} g_i(y) \, dy \right)^2} \right] \right\} + \frac{\ell_{i_0,\alpha}(t_{i_0}, \theta_0)}{\left( \int f_{i_0}(y, \theta_0)^{\alpha} g_{i_0}(y) \, dy \right)^2} = 0_p.$$
Finally,
$$IF(t_{i_0}, T_\alpha, G_1, \ldots, G_n) = M_{n,\alpha}(\theta_0)^{-1} \left( - \frac{\ell_{i_0,\alpha}(t_{i_0}, \theta_0)}{\left( \int f_{i_0}(y, \theta_0)^{\alpha} g_{i_0}(y) \, dy \right)^2} \right),$$
where
$$M_{n,\alpha}(\theta_0) = \frac{1}{n} \sum_{i=1}^n \left[ \frac{A_{i,\alpha}(\theta_0)}{\left( \int f_i(y, \theta_0)^{\alpha+1} \, dy \right)^2} - \frac{A_{i,\alpha}^*(\theta_0)}{\left( \int f_i(y, \theta_0)^{\alpha} g_i(y) \, dy \right)^2} \right].$$

A.8 Proof of Lemma 6.1

This proof is very similar to that of [12] (Lemma 6.1).
A.9 Proof of Theorem 6.2
The consistency follows directly from Lemma 6.1 and Theorem 3.1, while the asymptotic distribution is obtained by applying Lemma 6.1 and Theorem 3.2 to the MLRM.
References

[1] Beran, R. (1977). Minimum Hellinger distance estimates for parametric models. The Annals of Statistics, 5(3), 445–463.
[2] Tamura, R. N., & Boos, D. D. (1986). Minimum Hellinger distance estimation for multivariate location and covariance. Journal of the American Statistical Association, 81(393), 223–229.
[3] Simpson, D. G. (1987). Minimum Hellinger distance estimation for the analysis of count data. Journal of the American Statistical Association, 82(399), 802–807.
[4] Simpson, D. G. (1989). Hellinger deviance tests: efficiency, breakdown points, and examples. Journal of the American Statistical Association, 84(405), 107–113.
[5] Lindsay, B. G. (1994). Efficiency versus robustness: the case for minimum Hellinger distance and related methods. The Annals of Statistics, 22(2), 1081–1114.
[6] Pardo, L. (2006). Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton.
[7] Basu, A., Shioya, H., & Park, C. (2011). Statistical Inference: The Minimum Distance Approach. Chapman & Hall/CRC, Boca Raton.
[8] Jones, M. C., Hjort, N. L., Harris, I. R., & Basu, A. (2001). A comparison of related density-based minimum divergence estimators. Biometrika, 88, 865–873.
[9] Broniatowski, M., Toma, A., & Vajda, I. (2012). Decomposable pseudodistances and applications in statistical estimation. Journal of Statistical Planning and Inference, 142, 2574–2585.
[10] Castilla, E., Martín, N., Muñoz, S., & Pardo, L. (2020). Robust Wald-type tests based on minimum Rényi pseudodistance estimators for the multiple regression model. Journal of Statistical Computation and Simulation, 90(14), 2655–2680.
[11] Castilla, E., Ghosh, A., Jaenada, M., & Pardo, L. (2020). On regularization methods based on Rényi's pseudodistances for sparse high-dimensional linear regression models. https://arxiv.org/abs/2007.15929
[12] Ghosh, A., & Basu, A. (2013). Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression. Electronic Journal of Statistics, 7, 2420–2456.
[13] Ghosh, A., & Basu, A. (2018). Robust bounded influence tests for independent non-homogeneous observations. Statistica Sinica, 28(3), 1133–1155.
[14] Basu, A., Ghosh, A., Martin, N., & Pardo, L. (2018). Robust Wald-type tests for non-homogeneous observations based on the minimum density power divergence estimator. Metrika, 81(5), 493–522.
[15] Chandra, T. K. (1989). Uniform integrability in the Cesàro sense and the weak law of large numbers. Sankhyā, Series A, 51, 309–317.
[16] Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Volume II, 2nd edition. John Wiley & Sons.
[17] Leroy, F., Dauxois, J. Y., & Tubert-Bitter, P. (2016). On the parametric maximum likelihood estimator for independent but non-identically distributed observations with application to truncated data. Journal of Statistical Theory and Applications, 15(1), 96–107.
[18] Weisberg, S. (2005). Applied Linear Regression, 3rd edition. John Wiley & Sons.
[19] Rousseeuw, P. J., & Leroy, A. M. (2005). Robust Regression and Outlier Detection. John Wiley & Sons.
Robust regression and outlier detection .Vol. 589. John Wiley & Sons.[20] Mickey, M. R, Dunn, O. J. &Clark, V. (1967). Note on the use of stepwise egressionin detecting outliers.
Computers and Biomedical Research ,1