[PDF] On the estimating equations and objective functions for parameters of exponential power distribution: Application for disorder

Abstract

The efficient modeling for disorder in a phenomena depends on the chosen score and objective functions. The main parameters in modeling are location, scale and shape. The exponential power distribution known as generalized Gaussian is extensively used in modeling. In real world, the observations are member of different parametric models or disorder in a data set exists. In this study, estimating equations for the parameters of exponential power distribution are derived to have robust and also efficient M-estimators when the data set includes disorder or contamination. The robustness property of M-estimators for the parameters is examined. Fisher information matrices based on the derivative of score functions from \log, \log_q and distorted log-likelihoods are proposed by use of Tsallis q-entropy in order to have variances of M-estimators. It is shown that matrices derived by score functions are positive semidefinite if conditions are satisfied. Information criteria inspired by Akaike and Bayesian are arranged by taking the absolute value of score functions. Fitting performances of score functions from estimating equations and objective functions are tested by applying volume, information criteria and mean absolute error which are essential tools in modeling to assess the fitting competence of the proposed functions. Applications from simulation and real data sets are carried out to compare the performance of estimating equations and objective functions. It is generally observed that the distorted log-likelihood for the estimations of parameters of exponential power distribution has superior performance than other score and objective functions for the contaminated data sets.

Full PDF

aa r X i v : . [ m a t h . S T ] F e b On the estimating equations and objective functions forparameters of exponential power distribution: Application fordisorder

Mehmet Niyazi C¸ ankaya a ,a a Faculty of Applied Sciences, Department of International Trading and Finance,U¸sak University, U¸sak, Turkey and a Faculty of Art and Sciences, Department of Statistics, U¸sak University, U¸sak, Turkey

Abstract

The eﬃcient modeling for disorder in a phenomena depends on the chosen score and objectivefunctions. The main parameters in modeling are location, scale and shape. The exponential powerdistribution known as generalized Gaussian is extensively used in modeling. In real world, theobservations are member of diﬀerent parametric models or disorder in a data set exists. In thisstudy, estimating equations for the parameters of exponential power distribution are derived to haverobust and also eﬃcient M-estimators when the data set includes disorder or contamination. Therobustness property of M-estimators for the parameters is examined. Fisher information matricesbased on the derivative of score functions from log, log q and distorted log-likelihoods are proposedby use of Tsallis q -entropy in order to have variances of M-estimators. It is shown that matricesderived by score functions are positive semideﬁnite if conditions are satisﬁed. Information criteriainspired by Akaike and Bayesian are arranged by taking the absolute value of score functions.Fitting performances of score functions from estimating equations and objective functions aretested by applying volume, information criteria and mean absolute error which are essential tools inmodeling to assess the ﬁtting competence of the proposed functions. Applications from simulationand real data sets are carried out to compare the performance of estimating equations and objectivefunctions. It is generally observed that the distorted log-likelihood for the estimations of parametersof exponential power distribution has superior performance than other score and objective functionsfor the contaminated data sets. Mathematics Subject Classiﬁcation:

Keywords:

Estimating equations; Fisher information; q -calculus; Modeling; Tsallis q -entropy. . INTRODUCTION Eﬃcient modeling a data set is an important problem in the applied science and dependson the functions which are used to perform modeling. If a data set includes contamination ordisorder, it is diﬃcult to manage the eﬃciency. The relative entropies or divergences are usedto estimate robustly the parameters in a parametric model f ( x ; θ ) [1] when contaminationexists in the data set. Recently, Tsallis statistic based on q -deformed calculus gains the mostimportant concern to model the data set including contamination or disorder [2–4]. Thestatistics for location and scale are inevitable indicators to summarize a data set. We assumethat a data set is a member of f ( x ; θ ), which is strict assumptation to model and summarizethe tendeny (location) and dispersion (scale) of the data set. Getting location and scale isperformed by using two procedures which are f ( x ; θ ) and estimation method for parameters θ of f . The most prominent estimation method is maximum likelihood estimation (MLE)which is equivalent to minimization of relative divergences between data and function [1].MLE method produces estimators ˆ θ which have properties such as unbiasedness, eﬃciency,minimum variance [5]. In MLE method, we can take ordinary logarithm, i.e. log, to proceeda simple analytical expression. The chosen parametric function and taking logarithm inMLE is originally an objective function (M-function) ρ = − log( f ) which gives an advantageto manage the eﬃciency and so we have eﬃcient M-estimators from M-functions. In thiscase, log can be replaced by log q derived by Tsallis q -entropy. Thus, we have maximumlog q likelihood estimation method which has been extensively used and applied by [6–9]for the estimations of parameters. The parameter q makes a slow transition to log. Thediﬀerent values of q in log q zooms to tail or central parts of the model. It is possible toconsider other types of the generalized and deformed logarithms [10–14]. However, theconcavity, analytical simplicity for examining ﬁniteness of score functions and constrainedoptimization in computation for log q are important properties to apply for estimations ofparameters [7]. log q is simple and free from integral calculations when we compare withdivergences [15] which is originally from [16]. It can be convenient to propose a log-scorefunction from log by use of MLE instead of scanning the model f ( x ; θ ) completely by meansof log q . Thus, it is possible to model the disorder or diﬀerent contaminations into data sets.As an another M-function based on objective function, a convex combination of underlyingand contamination distributions are proposed by [7] for estimations of parameters in the2nderlying distribution.It is assumed that a data set comes from an underlying distribution f ( x ; θ ). In this case,observations x , x , · · · , x n are a member of f ( x ; θ ) and so they are identical. However, iden-tically distributed observations are not generally observed in an empirical application. Theunderlying distribution f ( x ; θ ) can contain some contamination from f ( x ; τ ). If there existsa contamination, using robust estimation methods for parameters of underlying distributionis necessary [17]. In order to make a robust and also an eﬃcient ﬁtting for non-identicalobservations, the estimating equations (EEs) from a generalized version of likelihood estima-tion methods can be used [18]. EEs are deﬁned as solutions of system of equations accordingto corresponding parameters θ : n X i =1 Ψ( x i ; θ ) | θ =ˆ θ = , (1)where Ψ( x i ; θ ) = ∂∂ θ Λ[ f ( x i ; θ )], Λ represents a generalization of log and Ψ is a vector for scorefunctions ψ , ψ , · · · ψ p . M-estimators ˆ θ are produced by EEs which include M-functions[7, 18]. If Λ is log, then MLE of parameters are obtained. One can ﬁnd diﬀerent examplesof M-functions from [17, 19, 20]. The parameters of underlying distribution are estimatedrobustly by use of these M-functions. However, eﬃciency, that is, the more precise ﬁttingperformed by M-functions, is another challenging task in the estimations of parameters.For this reason, it is necessary to propose M-functions which can accomplish to managethe eﬃciency. The M-functions from objective and score can be chosen by volume frominformation geometry, information criteria (IC) and mean absolute error. [7, 22–25].This paper aims to propose the score function S which can be generated from Ψ( x ; θ ) bymeans of EEs. We consider to derive new score functions S q and S D from log q and distortedlog likelihoods, respectively. Thus, a weighted log-score function wS from S q and S D isobtained. Since M-estimators from M-functions are competative each other, we will make acomparison among them to test their performance about getting the eﬃcient and the robustM-estimators. Fisher information and IC for score functions will be proposed to performinference based on score functions.The remainder of this paper is organized as follows. Section II introduces Tsallis q -entropy. Section III introduces maximum likelihood type estimation methods. In SectionIV, we brieﬂy recall deﬁnitions of M-estimation originally generated from MLE. EEs from3ikelihood estimation are proposed. Thus, we will propose score functions from EEs for dif-ferent values α j , j = 1 , , α at some intervals on the real line and so thelocation and scale parameters of underlying distribution can be estimated eﬃciently and ro-bustly. Fisher information are given by Section V. Section VI provides tools for informationtheory. Real data application is given by Section VII. Section VIII is divided into conclusionand future works. Appendices are given for random number generation, computation, sim-ulation, positive semideﬁnite of Fisher information matrices and the corresponding resultsof these matrices for EP distribution. II. q -DERIVATIVE AND TSALLIS q -ENTROPY Entropy is a tool to measure disorder in a phenomena [26]. Tsallis q -entropy is a gen-eralization of Shannon entropy [27] based on q -deformed calculus [28]. The idea whichgenerates a functional form of entropy is provided by [29]. There are two steps to produceTsallis q -entropy. Firstly, we have partition function given by g ( z ; θ ) = f ( x ; θ ) z , (2)where z ∈ R and vector θ ∈ R p represents the number of p parameters θ , θ and θ p , p ∈ Z + from a parametric model f ( x ; θ ) ∈ R . Secondly, we have Jackson q -derivative deﬁned as D qz g ( z ; θ ) = − g ( z ; θ ) − g ( qz ; θ )(1 − q ) z , < q < . (3)When the deﬁnition in Eq. (3) is applied to Eq. (2), the functional form of Tsallis q -entropyis produced by the following form: D qz f ( x ; θ ) z | z =1 = − f ( x ; θ ) − f ( x ; θ ) q − q . (4)[30, 31].If we consider the probability values from f ( x i ; θ ) for random observations x , x , · · · , x n ,then we have Tsallis q -entropy deﬁned as − n X i =1 f ( x i ; θ ) − f ( x i ; θ ) q − q = − − P ni =1 f ( x i ; θ ) q − q = − n X i =1 f ( x i ; θ ) q log q ( f ( x i ; θ )) . (5)log q ( f ( x i ; θ )) = f ( x i ; θ ) − q − − q is q -deformed form of ordinary logarithm. For q →

1, log q drops to log [2]. f ( x ; θ ) q corresponds to (unnormalized) escort distribution connected with4he q -calculus [28]. Escort distribution, or “zooming distribution”, was originally realized byrelation with dynamical chaotic systems [32, 33] and is also broadly applied into multifractals[34]. III. ESTIMATION METHODS BASED ON MAXIMUM LIKELIHOOD TYPE

The maximum distorted likelihood estimation (MDLE) is a tool to estimate the param-eters θ of model f : x × θ ∈ R p → R which is a probability density (p.d.) functioncorresponding to random variables X ,. . . , X n which are non-identical and so some of X i aredisorder or a member of another parametric model f ( x ; τ ). The likelihood function L D isgiven by L D ( f ( x ; θ )) = n Y i =1 [ β + f ( x i ; θ )] , β ≥ , (6) x i in a vector x = ( x , x , · · · , x n ) is the observed value of random variable X i . Eq. (6) ismaximized according to parameters θ in order to get estimators ˆ θ when random variables X i are independent and non-identically distributed. When β = 0, MDLE drops to MLE inwhich random variables are assumed to be identical. The tuning constant (TC) β makesa perturbation to f ( x i ; θ ). This perturbation can be a good tricker to perform an eﬃcientmodeling when observations x , x , · · · , x n in a data set are distributed non-identically [16].The maximization of likelihood function in Eq. (6) with β = 0 coincides to minimization ofentropy in Eq. (5) with q = 1 [1, 4, 35], because minimization between f ( x ; θ ) and f ( x ; ˆ θ )is performed. Further, Fisher information can be derived by use of entropy minimization[7, 36]. IV. GENERALIZED MAXIMUM LIKELIHOOD ESTIMATION: M-ESTIMATIONS FOR PARAMETERS OF EXPONENTIAL POWER DISTRI-BUTION

A generalized maximum likelihood estimation is deﬁned by taking Λ of likelihood L in Eq.(6), i.e. Λ-likelihood, which is for n case. If we consider a case in which n = 1, then we willhave a functional form of Λ( f ( x ; θ )). Thus, we have M-functions [7] which are obtained bythe generalized version of likelihood function with n = 1 representing the objective function5iven by [17] ρ ( x ; θ ) = Λ( f ( x ; θ )) , (7)where Λ : R → R is a concave (or convex) function which can be taken as log and log q [7].Then, objective functions based on log and log q are obtained. A vector Ψ for score functions ψ , ψ , · · · , ψ p is given by Ψ( x ; θ ) = ∂∂ θ ρ ( x ; θ ) , (8)where vector θ ∈ R p represents the number of p parameters θ , θ and θ p , p ∈ Z + from aparametric model f ( x ; θ ) ∈ R .M-estimations are deﬁned by an objective function given as ρ : x × θ ∈ R p → R for thenumber n of observations represented by x , x , · · · , x n , minimizing n X i =1 ρ ( x i ; θ ) | θ =ˆ θ , (9)over ˆ θ or by a vector of score functions given as Ψ : x × θ ∈ R p → R p as solution forˆ θ = (ˆ θ , ˆ θ , · · · , ˆ θ p ) from a system of estimating equations n X i =1 Ψ( x i ; θ ) | θ =ˆ θ = , (10)which are used to get M-estimators ˆ θ from M-functions based on ρ and ψ in Eqs. (7) and(8), respectively. If Λ is log and log q , then Ψ is log and log q score function from derivativesof ρ w.r.t θ , respectively [17, 18, 20]. A. Estimating equations based on log likelihood for parameters of exponentialpower distribution

Gamma distribution is solution of ordinary diﬀerential equation [21]. The continuous Y ∈ [0 , ∞ ) and discrete Z = { , − } are variables distributed as gamma and probability1 / µ , respectively. If we apply variable transformation X = Y /α Z togamma distribution with special values of parameters, we produce exponential power (EP)distribution which is a ﬂexible function to model shape of data sets. The properties such asexistence of moments and unibomodality are important indicators to use this function formodeling data sets (see [37]). The p.d. function of EP is given by f ( x ; µ, σ, α ) = α σ Γ(1 /α ) exp {− (cid:18) | x − µ | σ (cid:19) α } , x, µ ∈ R , σ > , α > . (11)6 and σ are location and scale parameters, respectively. The parameter α controls shapeand peakedness of function f and it is important to increase the modeling capability of f .MLE is used to produce the estimating equations (EEs). The parameters µ , σ and α inEP are estimated by use of EEs which generate score function S . In order to get EEs forthese parameters, log-likelihood function of EP distribution in Eq. (11) is necessary and itis given by log[ L ( x ; µ, σ, α )] = n log (cid:18) α σ Γ(1 /α ) (cid:19) − n X i =1 (cid:18) | x i − µ | σ (cid:19) α . (12)The derivatives of log( L ) in Eq. (12) with respect to (w.r.t) parameters µ , σ and α aretaken and setting them to zero in order to obtain EEs. After algebraic manipulations areperformed, we get the following EEs based on log-score function S for the parameters µ , σ and α : ˆ µ = P ni =1 m ( x i ; ˆ µ, ˆ σ, ˆ α ) x i P ni =1 m ( x i ; ˆ µ, ˆ σ, ˆ α ) , (13)ˆ σ = (cid:20) n n X i =1 m ( x i ; ˆ µ, ˆ σ, ˆ α )( x i − ˆ µ ) (cid:21) / , (14)ˆ α = " n X i =1 (cid:20) n (cid:18) ψ (1 / ˆ α )ˆ α (cid:19)(cid:21) − ( m ( x i ; ˆ µ, ˆ σ, ˆ α )ˆ α (cid:18) x i − ˆ µ ˆ σ (cid:19) log (cid:18) | x i − ˆ µ | ˆ σ (cid:19)) − , (15)The estimators ˆ µ , ˆ σ and ˆ α are MLE of parameters µ , σ and α . Here, m ( x i ; ˆ µ, ˆ σ, ˆ α ) = ˆ α (cid:18) | x i − ˆ µ | ˆ σ (cid:19) ˆ α − = ˆ α | y i | ˆ α − (16)is a common function for MLE of parameters µ , σ and α . It is also known as a weightfunction [38]. ψ in Eq. (15) is digamma function. The equations in (13)-(15) are EEs ofparameters. For α = 2 known as Gaussian distribution, EEs of µ and σ are given by [8]. Thefunctions depending only x , µ and σ in Eqs. (13)-(14) can be constructed for the estimationsof parameters µ and σ if we have a location-scale model for an arbitrary function. B. Structure of Huber’s M-functions from EP distribution

Let S be a log-score function of EEs from log( f ( x ; θ )). Then, S can be obtained if it iswritten as following form: S ( y ) = m ( y ) · | y | , (17)7here y = x − µσ . The common function m for two parameters µ and σ has also been deﬁnedby Eq. (16) as a closed form of these parameters. ρ ( y ) = − log( f ) is an objective functionto show a connection between the getting log-score function S produced by EEs and thederivative of ρ ( y ) w.r.t the variable y . Thus the correspondence between maximum log-likelihood estimation method used to derive objective function ρ and EEs which producethe score function for location and scale parameters are clariﬁed. The derivative of objectivefunction ρ w.r.t y will be a log-score from function m , because if ρ ( y ) = | y | α , then ddy ρ ( y ) = S ( y ) = α | y | α − sign( y ) from Eq. (17) for EP distribution (see p.52 in [39] and p. 5 in [40]).Thus we can have same mathematical expression with Eq. (17). The function S deﬁned asa log-score function is in fact a member of EEs and is used for estimation instead of using ρ [18, 41–43].The objective function of Huber is deﬁned as ρ ( y ) =  y , | y | ≤ r ;2 r | y | − r , | y | > r, (18)and score function ddy ρ ( y ) = 2 S H ( y ) is given by S H ( y ) =  y , | y | ≤ r ;sign( y ) r , | y | > r. (19)Huber’s score function in Eq. (19) is a combination of log-score functions from normal andLaplace for α = 2 and α = 1 in EP distribution respectively from Eq. (11) [17]. The variableparts of Eq. (19) are obtained by taking derivative w.r.t variable y of ρ ( y ), as given by Eq.(17) as base of Eq. (19) produced originally from spirit of EEs. Since S H in Eq. (19) isused with EEs, S H with EEs is known as M-estimation. C. M-estimators ˆ µ and ˆ σ in estimating equations based on combined log -scorefunctions for non-identically distributed case of EP distribution The combination of log-score function S with diﬀerent values of parameter α in EP willbe used to get M-estimators ˆ µ and ˆ σ which are eﬃcient for parameters µ and σ , becausewe want to propose two generalizations of Huber’s score function from EEs. Inspiring fromHuber’s M-functions [17] and using the spirit of EEs [18], new score functions S ∗ and S ∗ H will be proposed by use of EEs in Eqs. (13) and (14) for the parameters µ and σ , respectively.8fter S in Eq. (17) is partitioned, we have partial form of S . As a result, we have α , α and α for left, middle and right parts on the real line, as proposed by Huber’s S H in Eq.(19) which is mainly from EEs in [18]. n , n and n represent sample size for these threeparts. Each part j has its corresponding score functions S j . The estimators ˆ µ and ˆ σ fromEEs are given by the following forms, respectively.ˆ µ = X j =1 n j X i =1 S j ( x ji ; ˆ µ, ˆ σ, α j ) | y ji | − x ji / X j =1 n j X i =1 S j ( x ji ; ˆ µ, ˆ σ, α j ) | y ji | − , (20)ˆ σ = " P j =1 n j X j =1 n j X i =1 S j ( x ji ; ˆ µ, ˆ σ, α j ) | y ji | − ( x ji − ˆ µ ) / , (21)where y ji = x ji − ˆ µ ˆ σ , k, t > S ∗ ( y ) =  S = α | y | α − , ( −∞ , − k ); S = α | y | α − , [ − k, t ]; S = α | y | α − , ( t, ∞ ) . (22)Huber’s M-function in Eq. (19) is generalized by partial log-score function S ∗ in Eq. (22)which should not be continuous at the points − k and t for α , α and α , because mod-eling data, i.e. Riemann integration’s rule is applied and histograms on the real line areconstructed randomly, is equivalent to an integration of area under function. As it is well-known, integration works even though the function S ∗ used to ﬁt data set is not continuous[5, 37]. Thus, the discontinuity from S to S is not a problem for estimations of parameters. S ∗ is log-score function to get M-estimators ˆ µ and ˆ σ from EEs. S ∗ can be replaced with S ∗ H , or alternatively, one can use diﬀerent score functions in order to manage the eﬃciency.Thus, we have an eﬃcient ﬁtting of data.We propose a log-score function S ∗ H given by S ∗ H ( y ) =  S = − kα | y | α − , ( −∞ , − k ); S = α | y | α − , [ − k, t ]; S = tα | y | α − , ( t, ∞ ) , (23)which is proposed by use of Huber’s score function in Eq. (19). The log-score function S ∗ H in Eq. (23) is a generalized form of Eq. (19). Since the shape parameters α , α and α in Eq. (22) and (23) are added, we will have robust and eﬃcient M-estimators for µ and9 when we compared with Huber’s S H which is not capable to ﬁt shape of data sets. Wealso provide that Huber’s S H in Eq. (19) works on the base of EEs. Note that r of Huber’sM-functions from Eqs. (18) and (19) can be replaced by − k and t because of spirit of EEsand estimations of paramaters [5, 37] from Riemann integration. D. Estimating equations based on log q and distorted log likelihoods for parameters µ , σ and α of EP distribution As an alternative tool for robust estimation, the generalized and deformed en-tropies/logarithms [10, 11, 14, 31, 45–47] and the divergences for estimation [15, 48] canbe used if the observations x , x , · · · , x n in a data set are distributed non-identically. Wewill use q -deformed logarithm for the sake of its concavity, analytical simplicity and advan-tage for the constrained optimization in computation for parameter q [6–9]. In MLE, it ispossible to take the logarithm of likelihood function for a simple calculation in order to getestimators of parameters. In such a situation, log can be replaced with log q . Then it isknown as maximum log q likelihood estimation (MqLE) [6, 8, 16].After taking log for Eq. (6) with β = 0 and replacing log form of Eq. (6) with log q and β = 0, MqLE for parameters of EP distribution are obtained by optimizing the followingfunction according to parameters µ, σ and αl q ( L ( x ; µ, σ, α )) = n X i =1 log q [ f ( x i ; µ, σ, α )] , (24) x is a vector of random observations x , x , · · · , x n . The p.d. function f is in Eq. (11).The derivatives of l q ( L ) in Eq. (24) w.r.t parameters µ , σ and α are taken and setting themto zero in order to obtain EEs. After algebraic manipulations are performed, we get thefollowing EEs based on log q -score function S q for parameters µ , σ and α :ˆ µ = n X i =1 S q ( x i ; ˆ µ, ˆ σ, ˆ α ) | y i | − x i / n X i =1 S q ( x i ; ˆ µ, ˆ σ, ˆ α ) | y i | − , (25)ˆ σ = " P ni =1 w i n X i =1 S q ( x i ; ˆ µ, ˆ σ, ˆ α ) | y i | − ( x i − ˆ µ ) / , (26)ˆ α = ˆ α P ni =1 S q ( x i ; ˆ µ, ˆ σ, ˆ α ) | y i | log( | y i | ) P ni =1 w i − ψ (1 / ˆ α ) , (27)10here S q = w i S is a weighted ( w ) form of log-score function S , w i = f ( x i ; ˆ µ, ˆ σ, ˆ α ) − q , S ( x i ; ˆ µ, ˆ σ, ˆ α ) = ˆ α | y i | ˆ α − , y i = x i − ˆ µ ˆ σ . ψ is digamma function. S q fromEEs can be used to ﬁt a non-identically distributed data set. For q = 1, S q drops to S in Eq.(17). S can be derived by a function without normalizing factor, such as escort distribution[13] or S can be an arbitrary function in M-functions [18].For MDLE of the parameters of f ( x ; θ ), logarithmic form of Eq. (6), i.e. log( β + f ), isused. S q in Eqs. (25)-(27) can be replaced with the distorted log-score function S D = w i S and w i is f ( x i ;ˆ µ, ˆ σ, ˆ α ) β + f ( x i ;ˆ µ, ˆ σ, ˆ α ) ; because, one get same result, which shows us an interesting connectionbetween log q and distorted log-likelihoods for parameters of EP distribution. Note that S q and S D are comparable due to the weighted form of log-score function S , i.e. wS whichcan produce robust and also eﬃcient M-estimators ˆ θ . S q and S D are redescending M-functions which are used with EEs to get M-estimators [20], because S q and S D are zero iflim x →∞ f ( x ; µ, σ, α ) = 0. It can be necessary to have a partial form of S q or S D which isused to model a data set more eﬃciently for the estimations of µ and σ when many outliersor huge disorders exist in a data set. E. Robustness of M-estimators: Score functions of parameters µ , σ and α of EPdistribution for log q , log and distorted log functions If we observe the non-identically distributed random variables X , X , · · · , X n , we consultthe robust statistics which can represent the behavior of bulk of data in a data set when thereexist outlier(s) in the data set. In order to test the robustness of M-estimators producedby objective ρ and score Ψ functions used to model a data set with outlier(s), we need toexamine the values of score functions when y goes to inﬁnity. [17, 20].Let us examine whether or not the elements of a vector Ψ for parameters of EP dis-tribution in log, log q and distorted log functions are ﬁnite. For the sake of simplicity, theﬁniteness for positive side of score functions in the vector Ψ is examined, because the dis-tribution is symmetric [37]. If all elements of vector Ψ which shows robustness property ofM-estimators are ﬁnite for y → ∞ , then M-estimators will be robust [20].Let Ψ log q = ( µ ψ log q , σ ψ log q , α ψ log q ) be a vector of score functions from derivative ofΛ( f ( x ; θ )) w.r.t parameters θ = ( µ, σ, α ) of EP distribution. Here Λ can be log and log q .11et us examine the ﬁniteness of score functions. µ ψ log q ( x ; θ ) = ∂∂µ log q ( f ( x ; θ )) = − f ( x ; θ ) − q αy α − , (28) σ ψ log q ( x ; θ ) = ∂∂σ log q ( f ( x ; θ )) = f ( x ; θ ) − q (cid:20) − σ + ασ y α (cid:21) , (29) α ψ log q ( x ; θ ) = ∂∂α log q ( f ( x ; θ )) = f ( x ; θ ) − q (cid:20) α + ψ (1 /α ) α − y α log( y ) (cid:21) , (30)where y = x − µσ . TABLE I: Limit values of Ψ log q = ( lim x →∞ µ ψ log q , lim x →∞ σ ψ log q , lim x →∞ α ψ log q ). q = 1 0 < q < q > α = 1 ( − α, ∞ , −∞ ) (0 , ,

0) ( −∞ , ∞ , −∞ )0 < α < , ∞ , −∞ ) (0 , ,

0) ( ∞ , ∞ , −∞ ) α > −∞ , ∞ , −∞ ) (0 , ,

0) ( −∞ , ∞ , −∞ ) Let Ψ D log = ( µ ψ D log , σ ψ D log , α ψ D log ) be a vector of score functions from derivative of log( β + f ( x ; θ )) w.r.t parameters of EP distribution. Let us examine the ﬁniteness of score functions. µ ψ D log ( x ; θ ) = ∂∂µ log (cid:18) f ( x ; θ ) β + f ( x ; θ ) (cid:19) = − f ( x ; θ ) β + f ( x ; θ ) αy α − , (31) σ ψ D log ( x ; θ ) = ∂∂σ log (cid:18) f ( x ; θ ) β + f ( x ; θ ) (cid:19) = f ( x ; θ ) β + f ( x ; θ ) (cid:20) − σ + ασ y α (cid:21) , (32) α ψ D log ( x ; θ ) = ∂∂α log (cid:18) f ( x ; θ ) β + f ( x ; θ ) (cid:19) = f ( x ; θ ) β + f ( x ; θ ) (cid:20) α + ψ (1 /α ) α − y α log( y ) (cid:21) , (33)where y = x − µσ . TABLE II: Limit values of Ψ D log = ( lim x →∞ µ ψ D log , lim x →∞ σ ψ D log , lim x →∞ α ψ D log ). α = 1 0 < α < α > β > , ,

0) (0 , ,

0) (0 , , β = 0 ( − α, ∞ , −∞ ) (0 , ∞ , −∞ ) ( −∞ , ∞ , −∞ ) M-estimators ˆ µ , ˆ σ and ˆ α will be robust if 0 < q < β >

0, because we have ﬁnitevalues of limit for score functions (see Tables I and II). Since y is a variable which includeslocation and scale parameters with variable x , i.e. y = x − µσ , examining the ﬁniteness ofvariables y and x which go to inﬁnity in Ψ is equivalent to each other [5, 20].12 . FISHER INFORMATION MATRICES BASED ON DERIVATIVE OF SCOREFUNCTIONS AND INFORMATION THEORY Fisher information (FI) is used to get variance-covariance of MLE. FI matrix of objec-tive function ρ = − log( f ( x ; θ )), that is − E h ∂ ∂ θ ∂ θ T log[ f ( x ; θ )] i = E h ∂ ∂ θ ∂ θ T ρ [ f ( x ; θ )] i = E h(cid:8) ∂∂ θ ρ [ f ( x ; θ )] (cid:9) (cid:8) ∂∂ θ ρ [ f ( x ; θ )] (cid:9) T i as the known deﬁnition of FI, [26, 49] can be adoptedfor score functions S H , S ∗ , S ∗ H , S q and S D from EEs by use of the deﬁnition of Tsallis q -entropy as a generalization of Shannon entropy [7, 36, 50]. Thus, FI of score functionsbased on S will be proposed for M-estimators as a generalization of MLE (see B). Since FImatrices are based on derivative of score functions w.r.t parameters θ , they are comparable[7, 22–25]. Note that since M-functions can become S ( x ; θ ) (or S can be an arbitrary func-tion) which can be generated from Ψ( x ; θ ), it is necessary to use FI based on the derivativeof score functions w.r.t θ instead of using variance-covariance of M-estimators produced byM-functions from ρ and ψ [51, 52].FI matrix obtained from derivative of log-score function S ∗ in Eq. (22) or S ∗ H in Eq.(23) w.r.t parameters µ and σ is deﬁned as F log ( S j ; f ; θ , α j ) = n X j =1 Z b j a j ∂∂ θ S j ( x j ; θ , α j ) ∂∂ θ S j ( x j ; θ , α j ) T f ( x j ; θ , α )d x j , (34)where θ = ( µ, σ ), a j , b j ∈ R and S j = α j ( | x j − µ | σ ) α j − sign( x j − µ ).It is possible to propose FI for location, scale and shape parameters if we use distributionswhich can derive score functions via using MLE, MqLE and MDLE for these parameters.FI matrix obtained from log q -score function S q is deﬁned as F log q ( S q ; f ; θ ) = n Z ba [(1 − q ) S ( x ; θ ) ∂∂ θ S ( x ; θ ) T + ∂∂ θ S ( x ; θ ) ∂∂ θ S ( x ; θ ) T ] f ( x ; θ ) − q d x. (35)FI matrix obtained from distorted log-score function S D is deﬁned as F log ( S D ; f ; θ ) = n Z ba (cid:20)(cid:18) ββ + f ( x ; θ ) (cid:19) S ( x ; θ ) ∂∂ θ S ( x ; θ ) T + ∂∂ θ S ( x ; θ ) ∂∂ θ S ( x ; θ ) T (cid:21) f ( x ; θ ) β + f ( x ; θ ) d x, (36) where θ = ( µ, σ, α ), a, b ∈ R and S = α ( | x − µ | σ ) α − sign( x − µ ). The inverse of F log q and its special case F log are deﬁned as variance-covariance matrices of M-estimators ˆ θ . Thevariances-covariances of ˆ θ are denoted by Var S q (ˆ θ ) = F − q and Var S (ˆ θ ) = F − for q = 1.Eq. (35) with q = 1 and Eq. (36) with β = 0 are Eq. (34) with parameter α = α j . The13atrix in Eq. (34) is positive semideﬁnite, i.e. non-negative, and symmetric. The matricesin Eqs. (35)-(36) are positive semideﬁnite and asymmetric if conditions are satisﬁed (seeB 1 for details). VI. SELECTION OF M-FUNCTIONS VIA TOOLS FROM INFORMATION THE-ORYA. Selection of optimal M-function via volume

The ﬁtting performance of M-functions is tested by the volume of ellipsoid of M-functionwith underlying p.d. f ( x ; θ ). The volume based on score function S can be proposed byreplacing the known FI matrix based on objective function ρ with FI matrix based on S used in EEs. The main part of Eq. (37) is det( F ). Other parts are constants which do notaﬀect volume exactly when volume based on ρ in [7] is compared with that in Eq. (37).Note that score functions derived by EEs are comparable for volume.The volume is used to determine the values of r, k, t in S H , S ∗ and S ∗ H with EEs, q and β from S q and S D in MqLE and MDLE, respectively. The diﬀerent values of tuningconstants (TCs) with estimated values of ˆ θ and ﬁxed values of α are tried with grid searchuntil the smallest values of volume and MAE are obtained. Thus, the optimal M-functionis chosen among diﬀerent values of TCs and the corresponding estimates according to TCsin M-functions. The volume is deﬁned byVol q ( S q ; ˆ θ ) = (cid:18) πvn (cid:19) d/ d/ q det[ F log q ( S q ; f ; ˆ θ )] , (37)where n is the number of observations in a data set. v and d are the number of eigenvaluesand dimension of FI matrix, respectively. S q is replaced with S H , S ∗ , S ∗ H and S D in log caseto produce volumes based on S H , S ∗ , S ∗ H and S D , respectively. We have volume knownas a general geometric measure for d ≥

1. In special case, if ˆ θ is a vector for (ˆ µ, ˆ σ ) and(ˆ µ, ˆ σ, ˆ α ), then dimensions of FI matrix are d = 2 and d = 3, respectively [7, 22–26].14 . Selection of the best M-function via information criteria Information criteria (IC) are used to test whether or not the best M-function for a dataset can be chosen. Since M-estimators are based on score functions derived from log( f ),log q ( f ) and log( β + f ), it is appropriate to use Akaike IC (AIC), corrected AIC (cAIC)and Bayesian IC (BIC) which should be based on score functions in EEs. Note that theknown deﬁnitions of IC are comparable among diﬀerent p.d. functions used to ﬁt a data set,because p.d. functions are on the interval [0 , f ( x ; θ )) will be comparable when IC are used for the selection of the bestp.d. f ( x ; θ ) among p.d. functions. In our case, IC are proposed by use of score functions S instead of using objective functions log( L ) deﬁned as origin of IC. Since summation of a scorefunction evaluated by values of x , x , · · · , x n and ˆ θ is approximately zero and theoreticallyequals to zero, i.e. integration over all values of x and true values of parameters [26], theabsolute value of score function is taken in order to test the ﬁtting performance of scorefunction in its self [7, 26].IC of the combined log-score functions is deﬁned as IC S = 2 X j =1 n j X i =1 | S j ( x ji ; ˆ µ, ˆ σ, α j ) | + c ( p, n ) , (38)where c ( p, n ) is the given constant. Similarly, IC for S q is obtained by replacing log in Eq.(38) with log q . However, S q is not a partial form and so IC of log q -score function is deﬁnedas [7, 9] IC S q = 2 n X i =1 | S q ( x i ; ˆ µ, ˆ σ, ˆ α ) | + c ( p, n ) (39)for a determined value of q via volume as well. p is a number of the estimable parameters and n is sample size. S q is replaced with S D for the distorted log likelihood. The formulae in Eqs.(38) and (39) are tools for selection of M-function. r , − k , t , q , β and α in score functionsare chosen for their corresponding score functions with appropriate minimum values of ICin Eqs. (38) - (39) and the smallest value of MAE. The performance of IC depends on thepenalty term c ( p, n ). When c ( p, n ) is 2 p , pnn − p − and p log( n ), penalty terms of AIC, cAICand BIC are obtained, respectively [53]. Since penalty term of AIC is 2 p , the quality of AICis low to select the best M-function. For this reason, cAIC and BIC are given. Their penaltyterms are aﬀected by p and n [54, 55]. 15 II. APPLICATION FOR REAL DATA SETS

Temperature measurements are important to observe the ecological movement and theresults of temparature on earth such as sudden rains and ﬂoods should be examined precisely.Grytviken temperature from 1905 to 2019 years is analyzed to get statistics about location,scale and also shape parameters of empirical distribution (Example 1 and see Table III) [60].Cancer treatment in genomic and pharmacology includes measurements to examine aﬀectsof pharmacology in treatments of genes. Gene-drug correlation data should be analyzedprecisely to manage level of pharmacology (Example 2 and see Table IV) [61]. The real dataset which is cDNA microarray coded as ”SID W 486613, ESTs, Highly similar to OvarianGranulosa Cell 13.0 KD protein HGR74 [Homo sapiens] [5’:AA044350, 3’:AA044028]” [61]is used to get statistics as well.If a distortion (occuring due to measurement error in experiment, diﬀerence in measure-ments from laboratory expert, incorrectly recorded data or the nature of phenomena) in adata set exists, then the underlying distribution which represents the majority of a data setshould be modelled eﬃciently. The location, scale and also shape parameters are mainlyused to get the statistics about data analysis [5, 20]. Random number generation (see A 1) isalso an advantage to observe behaviours of real data sets. For these aims, we apply our EEsinto estimation of these parameters robustly and precisely. The parameters α , α and α are used to manage eﬃciency of score functions S ∗ and S ∗ H as well. The objective functionslog q ( f ) and log( β + f ) are also applied to model real data sets. Since the modeling capabilityof the proposed functions for these data set is observed to be high, these real data sets havebeen chosen. Many phenomena can be modelled by these functions in order to be able toget eﬃcient M-estimators from M-functions such as score and objective. Two outlier valueswhich are two times of maximal value of data set are added as positive and negative values.Thus we added two outliers into real data sets. The main aim is to observe the sensitivityof M-estimators from score functions with EEs and objective functions produced by MqLEand MDLE to two outliers.Even if S ∗ in Eq. (22) or S ∗ H in Eq. (23) from EEs is inﬁnite for α , α >

1, the bestscore functions with EEs are used to model real data sets, which shows that the underlyingand contamination should be eﬃciently modelled for observations from a ﬁnite sample size(see italic numbers in Tables V-VIII). 16

ABLE III: Example 1: M-estimates of parameters θ = ( µ, σ, α ) EEs TC ˆ µ ˆ σ α , α , α Var S (ˆ µ ) Var S (ˆ σ ) Var S (ˆ α ) AIC S cAIC S BIC S Vol( S ; ˆ µ, ˆ σ, ˆ α ) MAE S H k = . t = 4 . S ∗ H k = 0, t = 3 .

15 3.6135 2.3078 1.25,2.1,1.4 .0307 .0261 - 52.8691 52.9996 57.9769 .003497 0.5214outliers 3.7006 2.7360 1.25,2.1,1.4 .0357 .0313 - 55.2169 55.3445 60.3663 .004179 0.8245 S ∗ k = . t = 4 3.0674 1.4233 1.25,2.1,1.4 .0281 .0258 - 253.9906 254.1210 259.0983 .002632 0.1912outliers 3.1192 1.8391 1.25,2.1,1.4 .0462 .0452 - 225.5623 225.6899 230.7117 .004141 . EEs, ρ TC ˆ µ ˆ σ α, ˆ α Var S (ˆ µ ) Var S (ˆ σ ) Var S (ˆ α ) AIC S cAIC S BIC S Vol( S ; ˆ µ, ˆ σ, ˆ α ) MAE S D β = 10 − β = 10 − . outliers 3.1261 1.6374 1.9375 .0093 .0361 .0272 198.5424 198.8005 206.2666 .000142 0.3337EEs, ρ TC ˆ µ ˆ σ α, ˆ α Var S q (ˆ µ ) Var S q (ˆ σ ) Var S q (ˆ α ) AIC S q cAIC S q BIC S q Vol q ( S q ; ˆ µ, ˆ σ, ˆ α ) MAE S q q = . q = . TABLE IV: Example 2: M-estimates of parameters θ = ( µ, σ, α ) EEs TC ˆ µ ˆ σ α , α , α Var S (ˆ µ ) Var S (ˆ σ ) Var S (ˆ α ) AIC S cAIC S BIC S Vol( S ; ˆ µ, ˆ σ, ˆ α ) MAE S H k = − . t = 1 .

18 0.0307 0.0677 1,2,1 0 . · − . · − - 234.8350 234.9393 240.3764 1 . · − . · − . · − - 204.0155 204.1181 209.5905 2 . · − S ∗ H k = − . t = 0 .

67 -0.0304 0.1148 1.52,3.18,1.11 0 . · − . · − - 175.0141 175.1184 180.5555 8 . · − . · − S ∗ k = − . t = 0 .

61 -0.0255 0.1504 1.52,3.18,1.11 0 . · − . · − - 208.4176 208.5219 213.9589 9 . · − . · − ρ TC ˆ µ ˆ σ α, ˆ α Var S (ˆ µ ) Var S (ˆ σ ) Var S (ˆ α ) AIC S cAIC S BIC S Vol( S ; ˆ µ, ˆ σ, ˆ α ) MAE S D β = 6 · − . · − . · − - 247.6774 247.7818 253.2188 3 . · − . outliers 0.0070 0.2362 3.18 3 . · − . · − - 247.4937 247.5963 253.0687 3 . · − . MDLE β = 6 · − . · − . · − . · − . · − . · − . · − ρ TC ˆ µ ˆ σ α, ˆ α Var S q (ˆ µ ) Var S q (ˆ σ ) Var S q (ˆ α ) AIC S q cAIC S q BIC S q Vol q ( S q ; ˆ µ, ˆ σ, ˆ α ) MAE S q q = .

93 0.0095 0.2550 3.18 3 . · − . · − - 259.2036 259.3080 264.7450 3 . · − . · − . · − - 255.6675 255.7701 261.2425 3 . · − q = .

93 0.0072 0.2472 2.9071 4 . · − . · − . · − . · − . · − . · − Redescending M-functions derived by MqLE and MDLE can be alternative to each others17or estimation and so we can have similar values for the estimation of parameters and MAE(see Table IV for S D , S q , MDLE and MqLE). The value of MAE for S ∗ is better than thatof S ∗ H and S H for case in Table IV. For overall assessment according to MAE, S D has thesmallest value of MAE among others in Table IV. When we consider the results of MDLE insimulation in A 3 and real data sets (see for procedure in A 4), MDLE should be preferred. VIII. CONCLUSIONS AND DISCUSSIONS

This paper has focused on derivation of score functions from EEs based on log, log q and distorted log-likelihoods and inference for location, scale and shape parameters of EPdistribution. We have also proposed to estimate location and scale parameters for shapeparameter value determined by consulting FI from information geometry; IC which canbe used as tools for model selection among competing models; and MAE used to make acomparison among M-functions all together. We have considered to propose score functionsas an approach in modeling. Even if we use EP distribution to produce score functions, onecan use this approach for two aims. The ﬁrst one is that score functions can be producedby use of diﬀerent parametric models. The second one is that the score function in EEs isreplaced by arbitrary ones which can model data set eﬃciently.Applications from simulation and real data sets have been given to test the ﬁtting per-formance of score functions with EEs and objective functions. [ MSE(ˆ θ ) is used for thesimulation in order to determine the values of TCs in score and objective functions. Theresults of simulation show that MDLE of parameters σ and α of EP distribution are betterthan MqLE of these parameters. q in MqLE aﬀects the success of estimations of parame-ters σ and α . We have showed that FI matrices in Eqs. (34), (35) and (36) are positivesemideﬁnite if conditions are satisﬁed. Thus, the diagonal elements of inverse of FI matrixare variance of estimators, i.e. Var(ˆ µ ), Var(ˆ σ ) and Var( ˆ α ). Explicit expressions calculatedfor each element of FI for parameters µ , σ and α of EP distribution are given for S ∗ , S ∗ H and S q .M-estimators are robust for q ∈ (0 ,

1) and β >

0. Robustness is not enough to imply themodeling capability for empirical distributions occurred by the underlying and contamina-tion. The diﬀerent score functions with EEs should be used to model a data set even if thescore functions S ∗ and S ∗ H derived by EEs are inﬁnite for parameters µ and σ respectively18hen α >

1, as supported by the results of simulation for simultaneous estimation of param-eters µ and σ . In addition, diﬀerent objective functions should also be used to have the bestﬁtting on a data set. In the application of real data sets, the role of robustness has beenobserved especially for estimation of σ when outlier and non-outlier cases are compared.Redescending M-functions from S q and S D give eﬃcient results extensively and they areﬁnite for q ∈ (0 ,

1) and β > S ∗ H and S ∗ for estimations of µ and σ , as observed by simulation results. For determining the values ofTCs, volume, IC and MAE should be used together.EEs for diﬀerent p.d. functions will be derived. EEs from estimation methods which arediﬀerent from likelihood method, fractional objective functions, regression case and theirmultivariate forms with applications will be studied. FI matrix based on diﬀerent estima-tion methods will also be derived to provide variance of the estimators obtained by theircorresponding estimation methods. A user-friendly package in open access statistical soft-ware R will be prepared for practitioners in the applied science to manage precise modelingon the data sets. Appendix A: Random number generation, tools for computation and simulation1. Algorithm of random number generation

The algorithm for number generation is given in the following order [56]:1. Generate a random number y from gamma distribution with shape parameter 1 /α andscale parameter 1, i.e. Y ∼ Γ(1 /α, x j = µ j + σ j Y /α j , j = 1 , , x represents theartiﬁcial data set from the underlying distribution. x and x represent the artiﬁcialdata sets from the contamination. Thus, the artiﬁcial random numbers are x =( x , x , x ).

2. Tools used for computation

All of computations are performed by using MATLAB R2013a. The codes for geneticalgorithm (GA) which is a derivative-free method to perform an optimization are given by19he following order:1. Provide lower and upper bounds for search space of GA: µ lower = − ; µ upper = 10 ; σ lower = 0 . σ upper = 10 ; α lower = 0 . α upper = 10 ;lb=[ µ lower σ lower α lower ];ub=[ µ upper σ upper α upper ];2. Use ’ga’ to get the estimates of ˆ µ, ˆ σ, ˆ α by use of MqLE:(a) opts = gaoptimset(’CrossoverFcn’,@crossoversinglepoint,’display’,’oﬀ’);(b)

1. function [Estimates] = MqLE(p,x,q)2. mu=p(1);sigma=p(2);alpha=p(3);3. f=(alpha/(2*sigma*gamma(1/alpha))).*exp(-(abs(x-mu)./sigma).^alpha);4. Lq=sum((f.^(1-q)-1)./(1-q));5. Estimates=-Lq; (c) Give the estimates of ˆ µ, ˆ σ, ˆ α :pMqLE=ga(@(p)MqLE(p,x,q),3,[],[],[],[],lb,ub,[],opts);The parameter values of GA are default of GA module in MATLAB R2013a. Since theoptimization of objective functions in Eq. (6) with log, i.e. log( β + f ), and Eq. (24)according to parameters is equivalent to the numerical solving of equations for the estimatorsˆ µ , ˆ σ and ˆ α in Eqs. (25)-(27), the objective functions log q ( f ) and log( β + f ) are used to getthe estimates of ˆ µ , ˆ σ and ˆ α . Note that α is an important parameter [59]. For this reason,GA is used. MDLE is obtained by same codes and 4. line is replaced with sum(log( β + f )).Iterative reweighting algorithm for simultaneous estimations of µ and σ are given by [51].For computation of all EEs in Eqs. (20)-(21) and (25)-(26), initial values are provided byˆ µ = Median( x ), ˆ σ = Median( | x i − Median( x ) | ), x = { x , x , · · · , x n } .

3. The results of simulation for simultaneous estimation of µ, σ and simultaneousestimation of µ, σ, α

Simulation provides to observe the performance of diﬀerent functions from score andobjective. Thus, we make a comparison among their modeling capability for same typesof designs in A. It is important to drive a comprehensive simulation and its outputs aregiven by Tables V-VIII which show the results of simulation for M-estimates of ˆ µ , ˆ σ from20Es with score functions S H , S ∗ H , S ∗ , S q , S D and M-estimates of ˆ µ , ˆ σ , ˆ α from MqLEand MDLE, the simulated variance ( d Var(ˆ θ )), the simulated mean squared error ( [ MSE(ˆ θ ))and TC values k, t, q, β for ˆ µ , ˆ σ and q, β for ˆ µ, ˆ σ , ˆ α . TC , TC , TC show values of TCfor sample sizes n = 110 , , m P mj =1 (ˆ θ j − θ ) are obtained. If values of [ MSE(ˆ θ ) and d Var(ˆ θ ) are equal to each other,then biases of estimators ˆ θ are zero.We have four types of design. These designs will have outliers for α <

2, because EPdistribution becomes a heavy-tailed function if α ∈ (0 , α ≤ n and n of contamination are 5, respectively. The sample size n of underlying is100 , , m = 10 . The four designs aregiven by following forms for Tables V-VIII, respectively: • th design: x = D ( α = 1 . , µ = 5 , σ = 6 , n ), x = D ( α = 2 , µ = 0 , σ = 1 , n ), x = D ( α = 1 . , µ = 4 , σ = 2 , n ). • nd design: x = D ( α = 1 . , µ = 2 , σ = 3 , n ), x = D ( α = 3 , µ = 0 , σ = 1 , n ), x = D ( α = 1 . , µ = 3 , σ = 5 , n ). • rd design: x = D ( α = 1 . , µ = 3 , σ = 4 , n ), x = D ( α = 3 , µ = 0 , σ = 1 , n ), x = D ( α = 0 . , µ = 3 , σ = 4 , n ). • th design: x = D ( α = 0 . , µ = 4 , σ = 2 , n ), x = D ( α = 1 . , µ = 0 , σ =1 , n ), x = D ( α = 0 . , µ = 2 , σ = 3 , n ).If α , α >

1, score functions S ∗ H and S ∗ are inﬁnite when y = x − ˆ µ ˆ σ goes to inﬁnity forthe positive side or equivalently negative side of these score functions in Eqs. (22) or (23)on the real line. In four designs, there are situations in which α is greater than 1. Evenif α , α > [ MSE(ˆ θ ), ˆ θ = (ˆ µ, ˆ σ ) for ˆ µ and ˆ σ are small enough. In Tables VI-VII, valuesof [ MSE(ˆ σ ) for ˆ σ from S H as Huber’s M-function do not decrease even if sample size n isincreased from 210 to 410, which can be an expected result, because Huber’s M-function inEq. (19) depends α = 2 and α , α = 1. For 2 th and 3 rd designs in subsection A 1, we have α = 3 and α , α = 1 . , . , . ABLE V: 1 st design EEs θ TC ˆ θ = (ˆ µ, ˆ σ ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) − k, t n = 110 − k, t n = 210 − k, t n = 410 S H -1.38,1.38 0.0852 0.0025 0.0097 -2.47,2.47 0.0718 0.0014 0.0065 -5.01,5.01 0.0638 0.0008 0.0048 µ = 0 1.0058 0.0080 S ∗ H σ = 1 -0.79,0.78 0.0830 0.0019 -1.02,1.01 0.0438 0.0009 0.0028 -1.25,1.24 0.0236 0.0004 0.00101.0046 0.0084 0.0084 1.0041 0.0044 0.0044 1.0054 0.0020 0.0020 S ∗ -0.37,0.36 0.1016 0.0019 0.0122 -1.04,1.03 -1.47,1.46 0.0167 0.0005 S q q = 0 .

84 0.0190 0.0019 0.0023 0.875 0.0142 0.0009 0.0011 q = 0 .

905 0.0108 0.0004 0.00061.0107 0.0102 0.0103 1.0066 0.0040 0.0041 1.0075 0.0017 0.0018 S D β = 3 · − β = 10 − β = 0 . · − ρ θ TC ˆ θ = (ˆ µ, ˆ σ, ˆ α ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ )MqLE µ = 0 q = 0 .

625 0.0069 0.0016 q = 0 .

625 0.0030 0.0006 q = 0 . σ = 1 0.7589 0.0325 0.0906 0.7766 0.0142 0.0641 0.7630 0.0075 0.0637 α = 2 2.0080 0.4568 0.4568 2.0013 0.1913 0.1913 1.9871 0.1036 0.1037MDLE β = 2 . · − β = 1 . · − β = 1 . · − TABLE VI: 2 nd design EEs θ TC ˆ θ = (ˆ µ, ˆ σ ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) − k, t n = 110 − k, t n = 210 − k, t n = 410 S H -1.72,1.72 0.0803 0.0018 0.0083 -3.31,3.31 0.0723 0.0009 0.0061 -7.85,7.85 0.0662 0.0006 0.0050 µ = 0 1.0062 0.0123 S ∗ H σ = 1 -0.84,0.84 0.0701 0.0023 -1.05,1.05 0.0312 0.0012 -1.22,1.22 0.0142 0.0005 0.00070.9986 0.0155 0.0155 1.0033 0.0064 0.0064 1.0071 0.0022 0.0022 S ∗ -0.57,0.58 0.0868 0.0019 0.0094 -1.07,1.09 0.0330 0.0012 0.0023 -1.28,1.29 0.0121 0.0006 S q q = 0 .

86 0.0144 0.0022 0.0024 0.9 0.0083 0.0011 0.0011 q = 0 .

93 0.0047 0.0005 0.00061.0028 0.0042 0.0042 1.0009 0.0021 0.0021 1.0034 0.0011 0.0011 S D β = 7 · − β = 4 · − β = 1 . · − ρ θ TC ˆ θ = (ˆ µ, ˆ σ, ˆ α ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ )MqLE µ = 0 q = 0 .

55 0.0106 0.0019 q = 0 .

52 0.0053 0.0009 q = 0 . σ = 1 0.7703 0.0209 0.0736 0.7632 0.0102 0.0663 0.8111 0.0043 0.0400 α = 3 2.7923 0.9898 1.0329 2.7638 0.5645 0.6203 2.7337 0.2323 0.3033MDLE β = 9 · − β = 9 . · − β = 9 · − When we look at the simulation results for 1 th and 2 nd designs in Tables V-VI respectively,22 ABLE VII: 3 rd design EEs θ TC ˆ θ = (ˆ µ, ˆ σ ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) − k, t n = 110 − k, t n = 210 − k, t n = 410 S H -1.34,1.34 0.0773 0.0018 0.0078 -2.53,2.53 0.0676 0.0010 0.0056 -5.21,5.21 0.0588 0.0005 0.0040 µ = 0 1.0005 0.0219 0.0219 1.0022 0.0186 0.0186 0.9984 0.0200 0.0200 S ∗ H σ = 1 -1.13,1.12 -1.29,1.28 0.0020 0.0012 0.0012 -1.40,1.38 0.0004 0.0005 S ∗ -1.2,1.19 -1.37,1.35 -1.48,1.47 0.0018 0.0006 0.00061.0074 0.0091 S q q = 0 .

85 0.0184 0.0023 0.0026 q = 0 .

89 0.0122 0.0011 0.0012 q = 0 .

92 0.0084 0.0005 0.00061.0053 0.0051 0.0051 1.0046 0.0024 0.0024 1.0045 0.0011 0.0011 S D β = 4 · − β = 2 · − β = 10 − ρ θ TC ˆ θ = (ˆ µ, ˆ σ, ˆ α ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ )MqLE µ = 0 q = 0 .

65 0.0080 0.0015 q = 0 .

65 0.0042 0.0007 q = 0 .

65 0.0024 0.0004 σ = 1 0.8196 0.0162 0.0488 0.8284 0.0076 0.0370 0.8317 0.0042 0.0325 α = 3 2.7331 0.7868 0.8581 2.7371 0.3979 0.4670 2.7300 0.2200 0.2929MDLE β = 7 · − β = 7 · − β = 8 · − TABLE VIII: 4 th design EEs θ TC ˆ θ = (ˆ µ, ˆ σ ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) − k, t n = 110 − k, t n = 210 − k, t n = 410 S H -1.07,1.07 0.0931 0.0031 0.0118 -1.38,1.38 0.0532 0.0017 0.0045 -1.73,1.73 0.0308 0.0009 0.0019 µ = 0 1.0083 0.0120 0.0120 1.0029 0.0055 0.0056 1.0011 0.0027 0.0027 S ∗ H σ = 1 -1.21,1.22 0.1001 0.0016 0.0116 -1.47,1.48 0.0620 0.0005 0.0044 -1.74,1.76 0.0397 0.0002 0.00181.0004 0.0101 S ∗ -1.76,1.77 0.0685 0.0014 -2.34,2.36 0.0331 0.0005 -2.71,2.73 0.0175 0.0002 S q q = 0 .

815 0.0502 0.0011 0.0036 q = 0 .

875 0.0290 0.0004 0.0012 q = 0 .

918 0.0165 0.0001 0.00041.0046 0.0088 S D β = 9 · − β = 4 · − β = 2 · − ρ θ TC ˆ θ = (ˆ µ, ˆ σ, ˆ α ) d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ ) TC ˆ θ d Var(ˆ θ ) [ MSE(ˆ θ )MqLE µ = 0 q = 0 .

525 0.0180 0.0042 q = 0 .

565 0.0085 0.0011 q = 0 .

605 0.0045 0.0003 σ = 1 0.5755 0.0617 0.2419 0.6351 0.0320 0.1652 0.6824 0.0161 0.1170 α = 1 . β = 6 · − β = 3 · − β = 1 . · − these results show that main issue is about the modeling capability of score functions with23Es for the design of artiﬁcial data sets and the real data sets even if µ ψ log and σ ψ log for α > q = 1 are inﬁnite in Eqs. (28) and (29) respectively (see also Table I). Since Eqs.(28)-(30) and (31)-(33) are similar expression except w i , we can say same results for Eqs.(31)-(33) if β = 0 (see also Table II).The following results are observed generally in these designs: For the simultenaous esti-mations of parameters µ and σ , MSE values of ˆ µ and ˆ σ from score functions S q and S D withEEs have MSE values which are smaller than that of S H , S ∗ H and S ∗ with EEs. MSE valuesof ˆ µ and ˆ σ from S ∗ H and S ∗ with EEs have smaller than that of S H for sample sizes 210 and410, because S H depends on α = 2 and α , α = 1 for underlying and contamination, re-spectively. For simultenaous estimations of µ , σ and α , MSE values of ˆ σ and ˆ α from MDLEhave smaller than that of MqLE. If we want to have small values of [ MSE(ˆ σ ) and [ MSE( ˆ α ) forˆ σ and ˆ α , respectively, we need to change value of q for each estimations of two parameters.Since parameter q changes shape of function [12, 40, 58], it is reasonable to observe thisresult. β in MDLE can be changed minimally to get small values of [ MSE(ˆ σ ) and [ MSE( ˆ α )for ˆ σ and ˆ α , respectively. For simultaneous estimation of µ and σ , the chosen values of TCsfrom S H , S ∗ H and S ∗ can lead to have biased estimators for µ , i.e., d Var(ˆ µ ) < [ MSE(ˆ µ ) insome cases from Tables V-VIII. For Tables V-VIII, bold represents the smallest values of [ MSE(ˆ θ ) among all of functions and italic represents the smallest values of [ MSE(ˆ θ ) among S H , S ∗ H and S ∗ .The design of artiﬁcial data set aﬀects success of simultenaous estimations for parametersof EP. The diﬀerent designs of artiﬁcial data sets were tried and they are not given in heredue to the number of page restriction. In order to avoid the similarity of contaminationschema in four designs applied at here, we chose arbitrary values of µ , σ and µ and σ for contamination. We also tried same values of parameters for contamination. In this case,the values of TCs should be updated to have small values of [ MSE(ˆ θ ) for each tried cases ofcontamination. k , t , q and β should be chosen accurately for each sample sizes to have smallMSE values when n gets larger, because the designs in each sample sizes can be diﬀerentthan the other one. In other words, it is obvious that the values of TC change the structureof M-function. So the values of [ MSE(ˆ θ ) in diﬀerent sample sizes are aﬀected, as expected.24 . Application to real data sets: Procedure of computation for estimation of µ, σ, α The computation is performed according to the following order:1. Arbitrary appropriate values of TCs and shape parameter α are chosen according todesign of data set (outliers or contamination) by user to start computation.2. The values of k , t , q and β as TCs and shape parameters α , α and α are chosenby grid search until the smallest value of volume in Eq. (37), appropriate minimumvalues of IC in its self in Eqs. (38) - (39) and the smallest value of mean absolute error(MAE) in Eq. (A1) which is more precise evaluation for testing the ﬁtting performanceof objective and score functions are obtained.MAE is deﬁned as 1 n n X i =1 | real sort( x ) − artiﬁcial sort( x ) | . (A1)After data are sorted, sample sizes of contamination and underlying distributions are deter-mined. n = 7 , n = 109 and n = 2 for non-outlier case in real data sets. These samplesizes are determined by user according to outliers. The estimates of ˆ µ , ˆ σ , ˆ α for MqLE andMDLE, ˆ µ , ˆ σ , ﬁxed values of α , α , α for S H , S ∗ H , S ∗ , S D and S q are values used to gen-erate random numbers from EP distribution. Since redescending M-functions move slowlydecreasing for tails when it is compared by S H , S ∗ H , S ∗ [20], we use ˆ µ , ˆ σ and ﬁxed valuesof α , α , α for S q and S D . Random numbers x j are obtained by D j , as given by items fordesign of artiﬁcial data set in A 1. The replication number is m = 2 · for P j =1 n j .Since the main criteria for model selection among M-functions is MAE, we overcome thedisadvantages occured by special function Γ in FI matrices for volume and also tail part of S and weighted score, i.e. wS for IC. Tail part of S and wS can pull down IC hardly. Forthis reason, MAE should be used to adjust appropriate values of IC. Volume and IC aretools to decrease high elapsed time occuring due to using MAE in several replications forcomputation. In other words, using only MAE consumes much time. Consulting volumeand IC gives an advantage to decrease computation time while determining values of TCs.Note that the elapsed time is around 1.5 hours which change according to sample size n .25 ppendix B: Proof of FI matrices based on derivative of score functions S H , S ∗ , S ∗ H , S q and S D from EEs The approach based on objective function ρ [26, 35, 50] is adopted to score func-tions [7, 36]. We can add or remove minus because relative entropy or divergence isused for the estimation of parameters θ . By using the deﬁnition of Tsallis q -entropy H q = R ba f ( x ; θ ) q log q [ f ( x ; θ )]d x for continuous variable x and law of large numbers (LLN)[5], we can replace f q with S and log q with S q because g ( z ; θ ) = S ( x ; θ ) z in Eq. (2) is taken.Further, Eq. (4) is equivalently rewritten as f ( x ; θ ) f ( x ; θ ) − q − − q = f ( x ; θ ) log q ( f ( x ; θ )) ≅ S ( x ; θ ) S q ( x ; θ ) if Jackson q -derivative is g ( qz ; θ ) − g ( z ; θ )( q − z ≅ g ( z ; θ ) − g ( qz ; θ )(1 − q ) z . Similarly, we have S and S D for log-likelihood and distorted log-likelihood, respectively. Let us assume that scorefunctions S , S q and S D are diﬀerentiable. Thus, entropy function can be based on scorefunctions and FI matrix based on S q is proved by d F log q ( S q ; f ; θ ) = 1 n n X i =1 Z i/n ( i − /n ∆ S ( x i ; θ )∆ S q ( x i ; θ )d x i , (B1) LLN = n X i =1 Z i/n ( i − /n f ( x i ; θ )∆ S ( x i ; θ )∆ S q ( x i ; θ )d x i , = n X i =1 Z i/n ( i − /n f ( x i ; θ ) ∂∂ θ S q ( x i ; θ ) ∂∂ θ S ( x i ; θ ) T d x i , = n X i =1 Z i/n ( i − /n f ( x i ; θ ) − q [(1 − q ) S ( x i ; θ ) + ∂∂ θ S ( x i ; θ )] ∂∂ θ S ( x i ; θ ) T d x i . Eq. (B1) keeps spirit of Riemann integration or histograms as random bins on the real line.That is, let ∆( θ ) = θ − ˆ θ = θ − ( θ + h ) be a diﬀerence operator. For lim h → ∆( θ ) = 0. h ∆( S ) = S ( x ; θ ) − S ( x ; ˆ θ ) = S − ˆ S as a derivative of S w.r.t θ , i.e., ∂∂ θ S ( x ; θ ) = lim h → ∆( S ).FI based on the distorted log-score function S D is obtained similarly. For all intervals x i ofsampled version of the fourth line of Eq. (B1), we have result in Eq. (35) for continuouscase of x with interval ( a, b ).

1. Proof for positive semideﬁnite of Fisher information matrices in Eqs. (34) , (35) and (36) If we produce EEs of location µ and scale σ as a location-scale family in a p.d. function f ( x ; θ ), score function S and Ψ for µ and σ can have same mathematical expression, as26iven by Eqs. (17) and (19). For location-scale model, Ψ for µ and σ can drop to S as onlyone function for µ and σ due to the spirit of EEs, as introduced by subsection IV A. Since wecan have EE for shape parameter α of EP distribution, only one score function for µ , σ and α is obtained from EEs. Since ρ ( x ; θ ) = − log( f ( x ; θ )), S ( x ; θ ) from Ψ( x ; θ ) is mainly from f ′ ( x ; θ ) f ( x ; θ ) . f ′ ( x ; θ ) = ∂∂ θ f ( x ; θ ), f ′′ ( x ; θ ) = ∂ ∂ θ ∂ θ T f ( x ; θ ), θ ∈ ( µ, σ ) [51]. Let f ( x ; θ ), f ′ ( x ; θ )and f ′′ ( x ; θ ) be shown as f , f ′ and f ′′ , respectively. Since S is rooted from f ′ /f , diagonalelements of FI matrix in Eq. (34) without underlying f ≥ "(cid:18) f ′ f (cid:19) ′ = " f ′′ f − (cid:18) f ′ f (cid:19) . (B2)Since Eq. (B2) has a square form, we have positive values for the diagonal elements of Fishermatrix F log in Eq. (34). Let f ′ µ and f ′′ µ be ∂∂µ f ( x ; µ ) and ∂ ∂µ f ( x ; µ ), respectively. f µ givenby ∂∂µ log( f ( x ; µ, σ, α )) = f ′ µ f µ is p.d. function in log. Similarly, other twin parameters, i.e., σ and α , etc. can be chosen to have notations in Eq. (B3). The undiagonal elements of FImatrix in Eq. (34) without underlying f ≥ f ′ µ f µ ! ′ (cid:18) f ′ σ f σ (cid:19) ′ =  f ′′ µ f µ − f ′ µ f µ !  " f ′′ σ f σ − (cid:18) f ′ σ f σ (cid:19) . (B3)The following conditions should be satisﬁed in order to imply that F log in Eq. (34) ispositive semideﬁnite. Let us test F log by use of these conditions. • Determinants test: – E µµ = E (cid:16) ( ∂∂µ S ( x ; µ, σ )) (cid:17) ≥ E σσ = E (cid:0) ( ∂∂σ S ( x ; µ, σ )) (cid:1) ≥ – E µµ E σσ ≥ E µσ , i.e. E (cid:16) ( ∂∂µ S ( x ; µ, σ )) (cid:17) E (cid:0) ( ∂∂σ S ( x ; µ, σ )) (cid:1) ≥ (cid:16) E ∂∂µ S ( x ; µ, σ ) ∂∂σ S ( x ; µ, σ ) (cid:17) . • Pivot test: E µµ = E (cid:16) ( ∂∂µ S ( x ; µ, σ )) (cid:17) ≥ E − µµ ( E µµ E σσ − E µσ ) ≥ F log in Eq. (34) is a positive semideﬁnite matrix.27et us examine whether or not diagonal elements of FI matrices in Eqs. (35) and Eq.(36) without f − q ≥ f β + f ≥ wS is rooted from wf ′ /f , diagonalelements are rewritten as following form:( f ′ ) f q − f ′′ ( f ′ ) f ( q + 1) + ( f ′′ ) f . (B4)( f ′ ) f β + f − f ′′ ( f ′ ) f (cid:18) − ββ + f (cid:19) + ( f ′′ ) f , (B5) β ≥ q ∈ (0 , f is diﬀerentiable and concave according to parameters, then f ′′ isnegative. Thus, Eqs. (B4) and (B5) are positive because other expressions in Eqs. (B4) and(B5) are positive or if summation of ﬁrst and third terms in Eq. (B4) is bigger than secondterm in Eq. (B4), then Eq. (B4) is positive. The same rule is valid for Eq. (B5). Note thatsince f is p.d. function and β ≥

0, then β + f ≥ ≤ ββ + f ≤ f − q ≥ f β + f ≥

0, as given by following forms, respectively. f ′′ σ f σ − (cid:18) f ′ σ f σ (cid:19) !  (1 − q ) (cid:18) f ′ f (cid:19) + f ′′ µ f µ − f ′ µ f µ !  . (B6) f ′′ σ f σ − (cid:18) f ′ σ f σ (cid:19) ! (cid:18) ββ + f (cid:19) (cid:18) f ′ f (cid:19) + f ′′ µ f µ − f ′ µ f µ !  . (B7)Eqs. (B6) and (B7) can be positive or negative according to values of expressions. µ and σ can be replaced by σ and α , respectively. Other possible replacement for parameters can bedone to have undiagonal elements of Fisher matrix in Eq. (C1). Thus, undiagonal elementsof inverse of FI matrices in Eqs. (35) and (36) can be positive or negative. Note that0 ≤ f ≤

1, 0 ≤ f − q ≤ f β + f ≥ F log q and F log in Eqs. (35) and (36) will be positive semideﬁnite matrices if square of Eq.(B4) is greater and equal than multiplication of Eq. (B6) and Eq. (B6) with replacementof σ and µ according to matrix in Eq. (C1), i.e., determinants of upper submatrices arepositive or zero if there exists equality. The same procedure given for Eqs. (B4) and (B6)is valid for (B5) and (B7). F log q and F log are not symmetric, but F is a square matrix with d × d . If Eqs. (B4) and (B5) are positive, then diagonal elements of F will be positive.Thus, we have positive value for trace of F , i.e. summation of eigenvalues of F is positive28f the number of columns or rows is equal to rank of F . Determinant of F is product ofeigenvalues of F . When every eigenvalue of F is positive, F is positive deﬁnite due toeigenvalue decomposition if F has d linearly independent eigenvectors [62]. If determinantof F is zero for the positive semideﬁnite case, then the Moore-Penrose generalized inversefor singular matrix [64] can be used to get the variances of estimators, i.e. Var(ˆ µ ), Var(ˆ σ )and Var( ˆ α ), even if information loss occurring due to using the generalized inverse exists.To show whether or not F is positive semideﬁnite, we have used ∂∂ θ ρ ( x ; θ ) = ∂∂ θ Λ · f ( x ; θ ) ′ = ∂∂ θ Λ · f ( x ; θ ) ′ f ( x ; θ ) f ( x ; θ ) = w f ( x ; θ ) ′ f ( x ; θ ) . The equivalence between wS and ∂∂ θ ρ = Ψ isenjoyed in order to be able to say that we have positive semideﬁnite F tentatively for wS used in FI. Otherwise, as we already know that estimation process is based on divergencewhich is mainly an absolute value of distance between f ( x ; θ ) and f ( x ; ˆ θ ) for p.d. function[1, 35] or equivalently ρ ( x ; θ ) and ρ ( x ; ˆ θ ) for objective functions or equivalently S ( x ; θ ) and S ( x ; ˆ θ ) for score functions, elements of FI matrix can be multiplied by minus accordingly inorder to avoid negative values for diagonal elements of inverse of FI matrix in which we canhave for score functions used only in FI. Appendix C: Fisher information matrices for the parameters µ, σ and α of EPdistribution FI matrices based on score functions in Eq. (34) for µ and σ and also Eqs. (35)-(36) for µ , σ and α are represented by the following forms with elements: F = n  E µµ E µσ E µα E σµ E σσ E σα E αµ E ασ E αα  , (C1)where E represents an expectation taken over the underlying distribution for derivative ofscore functions w.r.t parameters µ , σ and α . n is sample size [17].29 . The elements of FI matrix in Eq. (34) based on derivative of combined log -score S ∗ from EEs for parameters of EP distribution if f is an underlying distribution F log = n  E µµ (cid:20)(cid:16) ∂S∂µ (cid:17) (cid:21) E µσ h ∂S∂µ ∂S∂σ i E σµ h ∂S∂σ ∂S∂µ i E σσ h(cid:0) ∂S∂σ (cid:1) i , (C2)where E µσ h ∂S∂µ ∂S∂σ i = E σµ h ∂S∂σ ∂S∂µ i . Eqs. (C3) and (C5) are positive. As shown by Eq.(B3),Eq. (C4) can be negative and positive. Thus, matrix in Eq. (C2) is positive semideﬁnitedue to tests of determinants and pivot.The reparametrization of Γ function is used to calculate the integrals in Eqs. (C3)-(C5)[37]. Note that k and t should be positive due to Γ [63]. E µµ "(cid:18) ∂S∂µ (cid:19) = 12 σ Γ(1 /α ) { ( α − α ) Γ( 2 α − α , ( k/σ ) α ) + ( α − α ) Γ( 2 α − α , ( t/σ ) α ) (C3)+ ( α − α ) [ γ (2 − /α , ( k/σ ) α ) + γ (2 − /α , ( t/σ ) α )] } ,E µσ (cid:20) ∂S∂µ ∂S∂σ (cid:21) = 12 σ Γ(1 /α ) {− ( α − α ) Γ( 2 α − α , ( k/σ ) α ) + ( α − α ) Γ( 2 α − α , ( t/σ ) α ) (C4)+ ( α − α ) [ − γ (2 − /α , ( k/σ ) α ) + γ (2 − /α , ( t/σ ) α )] } ,E σσ "(cid:18) ∂S∂σ (cid:19) = 12 σ Γ(1 /α ) { ( α − α ) Γ( 2 α − α , ( k/σ ) α ) + ( α − α ) Γ( 2 α − α , ( t/σ ) α ) (C5)+ ( α − α ) [ γ (2 − /α , ( k/σ ) α ) + γ (2 − /α , ( t/σ ) α )] } . Γ( z ) = γ ( z, a ) + Γ( z, a ) is gamma function with lower and upper incomplete gamma func-tions [63]. For S ∗ H , S and S in S ∗ are multiplied by − k and t , respectively. α , α , α > / . The elements of FI matrix in Eq. (35) based on derivative of score S q from EEsfor parameters of EP distribution if f is an underlying distribution F log q = n  E µµ h (1 − q ) S ∂S∂µ + ∂S∂µ ∂S∂µ i E µσ h (1 − q ) S ∂S∂σ + ∂S∂µ ∂S∂σ i E µα h (1 − q ) S ∂S∂α + ∂S∂µ ∂S∂α i E σµ h (1 − q ) S ∂S∂µ + ∂S∂σ ∂S∂µ i E σσ (cid:2) (1 − q ) S ∂S∂σ + ∂S∂σ ∂S∂σ (cid:3) E σα (cid:2) (1 − q ) S ∂S∂α + ∂S∂σ ∂S∂α (cid:3) E αµ h (1 − q ) S ∂S∂µ + ∂S∂α ∂S∂µ i E ασ (cid:2) (1 − q ) S ∂S∂σ + ∂S∂α ∂S∂σ (cid:3) E αα (cid:2) (1 − q ) S ∂S∂α + ∂S∂α ∂S∂α (cid:3) , (C6)The matrix F log q in Eq. (C6) and its form with parameters µ and σ as two-dimensionalmatrix are positive semideﬁnite if conditions are satisﬁed from Eqs. (B4) and (B6).Mathematica 11.3 is used to calculate the integrals in Eqs. (C7)-(C15). E µµ (cid:20) (1 − q ) S ∂S∂µ + ∂S∂µ ∂S∂µ (cid:21) = − q − ( α − α − q σ q − (2 − q ) /α − (C7)(2 + 3 σ ( q − − q + α (2 + 2 σ (1 − q ) + q )) Γ(2 − /α )Γ(1 /α ) q − E µσ (cid:20) (1 − q ) S ∂S∂σ + ∂S∂µ ∂S∂σ (cid:21) = 0 , (C8) E µα (cid:20) (1 − q ) S ∂S∂α + ∂S∂µ ∂S∂α (cid:21) = 0 , (C9) E σµ (cid:20) (1 − q ) S ∂S∂µ + ∂S∂σ ∂S∂µ (cid:21) = α − q (1 − q )(1 − α )Γ(3 − /α )2 − q (Γ(1 /α ) σ ) − q (2 − q ) − /α , (C10) E σσ (cid:20) (1 − q ) S ∂S∂σ + ∂S∂σ ∂S∂σ (cid:21) = 2 q − ( α − α − q σ q − (2 − q ) /α − (C11) (cid:0) α Γ(2 − /α ) − ( α − − /α ) (cid:1) Γ(1 /α ) q − E σα (cid:20) (1 − q ) S ∂S∂α + ∂S∂σ ∂S∂α (cid:21) = (2 − q ) /α − q − ( α − − /α ) (cid:18) ασ Γ(1 /α ) (cid:19) − q (C12) (cid:0) log(2 − q ) − ψ (0) (2 − /α ) − (cid:1) , E αµ (cid:20) (1 − q ) S ∂S∂µ + ∂S∂α ∂S∂µ (cid:21) = 2 q − ( α − α − q σ q − (2 − q ) /α − ( q − − /α )Γ(1 /α ) q − (C13) ασ (cid:20) (1 − q ) S ∂S∂σ + ∂S∂α ∂S∂σ (cid:21) = 2 q − ( α − σ/α ) q − (2 − q ) /α − Γ(2 − /α )Γ(1 /α ) q − (C14) (cid:16) log(2 − q ) − ψ (0) (2 − /α ) (cid:17) E αα (cid:20) (1 − q ) S ∂S∂α + ∂S∂α ∂S∂α (cid:21) = (2 σ/α ) q − (2 − q ) /α − Γ(2 − /α )Γ(1 /α ) q − (C15) (cid:16) (log(2 − q ) − − − q ) − ψ (0) (2 − /α ) + ψ (0) (2 − /α ) + ψ (1) (2 − /α ) (cid:17) For q = 1, FI based on S is obtained by Eqs. (C7)-(C15). ψ is digamma function and ψ ( h ) is h th derivative of the digamma function. α > / q ∈ (0 , S D is positive semideﬁnite if conditions (positive and negative) for Eqs. (B5) and (B7) aresatisﬁed. For parameters µ and σ , we have two-dimensional matrix of Eq. (C6). Acknowledgements

We would like to thank so much Editorial Board and anonymous referees to provide theinvaluable comments.

Disclosure statement

No potential conﬂict of interest was reported by the author(s). [1] L. Pardo, Statistical inference based on divergence measures, CRC Press, Taylor & FrancisGroup, 2005.[2] C. Tsallis, Introduction to Nonextensive Statistical Mechanics: Approaching a ComplexWorld, Springer, New York, 2009.[3] Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of statisticalphysics, 52(1-2), 479-487.[4] Abe, S., Okamoto, Y. (Eds.). (2001). Nonextensive statistical mechanics and its applications(Vol. 560). Springer Science and Business Media.

5] E.L. Lehmann, G. Casella, Theory of point estimation, Wadsworth & Brooks/Cole. PaciﬁcGrove, CA, 589, USA, 1998.[6] Ferrari, D., Yang, Y. (2010). Maximum Lq-likelihood estimation. The Annals of Statistics,38(2), 753-783.[7] C¸ ankaya, M. N., Korbel, J. (2018). Least informative distributions in maximum q-log-likelihood estimation. Physica A: Statistical Mechanics and its Applications, 509, 140-150.[8] Hasegawa, Y., Arita, M. (2009). Properties of the maximum q-likelihood estimator for in-dependent random variables. Physica A: Statistical Mechanics and its Applications, 388(17),3399-3412.[9] Giuzio, M., Ferrari, D., Paterlini, S. (2016). Sparse and robust normal and t-portfolios bypenalized Lq-likelihood minimization. European Journal of Operational Research, 250(1), 251-261.[10] Hanel, R., Thurner, S. (2011). A comprehensive classiﬁcation of complex statistical systemsand an axiomatic derivation of their entropy and distribution functions. EPL (EurophysicsLetters), 93(2), 20006.[11] Jizba, P., Arimitsu, T. (2004). Observability of R´enyi’s entropy. Physical Review E, 69(2),026128.[12] Bercher, J. F. (2012). A simple probabilistic construction yielding generalized entropies anddivergences, escort distributions and q-Gaussians. Physica A: Statistical Mechanics and itsApplications, 391(19), 4460-4469.[13] P. Jizba, Information theory and generalized statistics, in: H.-T. Elze, ed., Decoherence andEntropy in Complex Systems, Lecture Notes in Physics, vol. 633, Springer-Verlag, Berlin,2003, p.362.[14] Jizba, P., Korbel, J. (2016). On q-non-extensive statistics with non-Tsallisian entropy. PhysicaA: Statistical Mechanics and its Applications, 444, 808-827.[15] Basu, A., Harris, I. R., Hjort, N. L., Jones, M. C. (1998). Robust and eﬃcient estimation byminimising a density power divergence. Biometrika, 85(3), 549-559.[16] Vajda, I. (1986). Eﬃciency and robustness control via distorted maximum likelihood estima-tion. Kybernetika, 22(1), 47-67.[17] Huber, P. J. (1964). Robust estimation of a location parameter: Annals Mathematics Statis-tics, 35.

18] Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. TheAnnals of Mathematical Statistics, 31(4), 1208-1211.[19] ¨Ozt¨urk, ¨O. (1998). Theory and Methods: A Robust and Almost Fully Eﬃcient M-Estimator.Australian and New Zealand Journal of Statistics, 40(4), 415-424.[20] F.R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, W. A. Stahel, Robust statistics: The approachbased on inﬂuence functions, Wiley Series in Probability and Statistics, New York, 1986.[21] Smith, H. L. (2011). An introduction to delay diﬀerential equations with applications to thelife sciences (Vol. 57). New York: Springer.[22] J. Rissanen, Information and Complexity in Statistical Modeling, Springer, New York, 2007.[23] Myung, I. J., Balasubramanian, V., Pitt, M. A. (2000). Counting probability distributions:Diﬀerential geometry and model selection. Proceedings of the National Academy of Sciences,97(21), 11170-11175.[24] Gattone, S. A., De Sanctis, A., Russo, T., Pulcini, D. (2017). A shape distance based on theFisher-Rao metric and its application for shapes clustering. Physica A: Statistical Mechanicsand its Applications, 487, 93-102.[25] Maybank, S. J. (2006). Application of the Fisher-Rao metric to structure detection. Journalof Mathematical Imaging and Vision, 25(1), 49-62.[26] Amari, S., Information Geometry and Its Applications, in: Applied Mathematical Sciences,Springer, 2016.[27] Shannon, Claude Elwood. ”A mathematical theory of communication.” ACM SIGMOBILEmobile computing and communications review 5.1 (2001): 3-55.[28] Ernst, T. (2012). A comprehensive treatment of q-calculus. Springer Science & Business Media.[29] Abe, S. (1997). A note on the q-deformation-theoretic aspect of the generalized entropies innonextensive physics. Physics Letters A, 224(6), 326-330.[30] F. H. Jackson, On q-functions and a certain diﬀerence operator, Appl. Math. Sci, vol. 46, no.2, pp. 253–281, 1908, doi: 10.1017/S0080456800002751.[31] Wada, T., Suyari, H. (2007). A two-parameter generalization of Shannon–Khinchin axiomsand the uniqueness theorem. Physics Letters A, 368(3-4), 199-205.[32] C. Beck, F. Schl¨ogl, Thermodynamics of Chaotic Systems: An Introduction, Cambridge Un.Press, Cambridge, 1997.[33] Beck, C. (2004). Superstatistics, escort distributions, and applications. Physica A: Statistical echanics and its Applications, 342(1-2), 139-144.[34] D. Harte, Multifractals, Theory and Applications, Chapman & Hall/CRC, London, 2001.[35] S. Kullback, Information Theory and Statistics, Courier Corporation, USA, 1997.[36] Plastino, A., Plastino, A. R., Miller, H. G. (1997). Tsallis nonextensive thermostatistics andFisher’s information measure. Physica A: Statistical Mechanics and its Applications, 235(3-4),577-588.[37] C¸ ankaya, M. N. (2018). Asymmetric bimodal exponential power distribution on the real line.Entropy, 20(1), 23.[38] C¸ ankaya, M. N., Arslan, O. (2020). On the robustness properties for maximum likelihood esti-mators of parameters in exponential power and generalized T distributions. Communicationsin Statistics-Theory and Methods, 49(3), 607-630.[39] Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. The annalsof statistics, 51-67.[40] Bercher, J. F. (2012). On generalized Cram´er–Rao inequalities, generalized Fisher informationand characterizations of generalized q-Gaussian distributions. Journal of Physics A: Mathe-matical and Theoretical, 45(25), 255303.[41] C¸ ankaya, M. N. 2015. M-Estimators with asymmetric inﬂuence function: Properties and theirapplications. PhD diss., University of Ankara.[42] Godambe, V. P., Thompson, M. E. (1984). Robust estimation through estimating equations.Biometrika, 71(1), 115-125.[43] M. Thompson, e-mail communication.[44] Malik SC, Arora S, 1992. Mathematical analysis. New Age International.[45] Korbel, J. (2017). Rescaling the nonadditivity parameter in Tsallis thermostatistics. PhysicsLetters A, 381(32), 2588-2592.[46] Korbel, J., Hanel, R., Thurner, S. (2018). Classiﬁcation of complex systems by their sample-space scaling exponents. New Journal of Physics, 20(9), 093007.[47] Jizba, P., Korbel, J. (2019). Maximum entropy principle in statistical inference: Case fornon-Shannonian entropies. Physical review letters, 122(12), 120601.[48] Dar´oczy, Z. (1970). Generalized information functions. Information and control, 16(1), 36-51.[49] Fisher, R. A. (1925, July). Theory of statistical estimation. In Mathematical Proceedingsof the Cambridge Philosophical Society (Vol. 22, No. 5, pp. 700-725). Cambridge University ress.[50] C¸ ankaya, M. N., Korbel, J. (2017). On statistical properties of Jizba-Arimitsu hybrid entropy.Physica A: Statistical Mechanics and its Applications, 475, 1-10.[51] Huber, P.J. Robust statistics , 1981, John Wiley and Sons, New York, USA.[52] Godambe, V. P., Heyde, C. C. (2010). Quasi-likelihood and optimal estimation. In Selectedworks of cc heyde (pp. 386-399). Springer, New York, NY.[53] H. Akaike, Information theory and an extension of the maximum likelihood principle. In B.N. Petrov & B. F. Csaki (Eds.), Second International Symposium on Information Theory,Academiai Kiado: Budapest, 267-281, 1973.[54] Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The generaltheory and its analytical extensions. Psychometrika, 52(3), 345-370.[55] Ronchetti, E. (1997). Robustness aspects of model choice. Statistica Sinica, 327-338.[56] Elsalloukh, H. 2009. Further results on the epsilon-skew exponential power distribution family.Far East Journal of Theoretical Statistics 28:201–12.[57] Lucas, A. (1997). Robustness of the student t based M-estimator. Communications inStatistics-Theory and Methods, 26(5), 1165-1182.[58] Bercher, J. F. (2013). Some properties of generalized Fisher information in the context ofnonextensive thermostatistics. Physica A: Statistical Mechanics and its Applications, 392(15),3140-3154.[59] ¨Orkc¨u, H. H., ¨Ozsoy, V. S., Aksoy, E., Doˇgan, M. I. (2015). Estimating the parameters of3-p Weibull distribution using particle swarm optimization: A comprehensive experimentalcomparison. Applied Mathematics and Computation, 268, 201-226.[60] Available online: legacy.bas.ac.uk: https://legacy.bas.ac.uk/met/READER/surface/Grytviken.All.temperature.txt. [61] Available online: https://discover.nci.nih.gov/nature2000/data/selected_data/at_matrix.txt. [62] Verhaegen, M., Verdult, V. (2007). Filtering and system identiﬁcation: a least squares ap-proach. Cambridge university press.[63] I.S. Gradshteyn, I. M. Ryzhik, A. Jeﬀrey, D. Zwillinger, Table of Integrals, Series, and Prod-ucts, Sixth Edition, Academic Press, 1171, USA, 2007.[64] Li, Y. H., Yeh, P. C. (2012). An interpretation of the Moore-Penrose generalized inverse of singular Fisher Information Matrix. IEEE Transactions on Signal Processing, 60(10), 5532-5536.singular Fisher Information Matrix. IEEE Transactions on Signal Processing, 60(10), 5532-5536.