Box-Cox symmetric distributions and applications to nutritional data
aa r X i v : . [ s t a t . O T ] M a r Box-Cox symmetric distributions and applications tonutritional data
Silvia L.P. Ferrari ∗ Department of Statistics, University of S˜ao Paulo, S˜ao Paulo, Brazil
Giovana Fumes
Department of Exact Sciences, ESALQ, University of S˜ao Paulo, Piracicaba, Brazil
Abstract
We introduce and study the Box-Cox symmetric class of distributions, which is usefulfor modeling positively skewed, possibly heavy-tailed, data. The new class of distribu-tions includes the Box-Cox t, Box-Cox Cole-Green (or Box-Cox normal), Box-Cox powerexponential distributions, and the class of the log-symmetric distributions as specialcases. It provides easy parameter interpretation, which makes it convenient for regres-sion modeling purposes. Additionally, it provides enough flexibility to handle outliers.The usefulness of the Box-Cox symmetric models is illustrated in a series of applicationsto nutritional data.
Key words:
Box-Cox transformation; Symmetric distributions; Box-Cox power exponen-tial distribution; Box-Cox slash distribution; Box-Cox t distribution; Log-symmetric dis-tributions; Nutrients intake.
It is well-known that positive continuous data usually present positive skewness and outlierobservations. This is the typical situation with survival times, nutrients intake and familyincome data, among many other examples. Since Box and Cox (1964) seminal paper, theBox-Cox power transformation has been routinely employed for transforming to normality.Let Y be a positive random variable. The Box-Cox transformation is defined as ( Y λ − /λ, if λ ,
0, and log Y , if λ =
0. Despite its popularity and ease of implementation, this approach,however, has drawbacks, one of them being the fact that the model parameters cannot be ∗ Corresponding author. Email: [email protected] R + and is defined from the transformation Z ≡ h ( Y ; µ, σ, λ ) = σλ (cid:20)(cid:16) Y µ (cid:17) λ − (cid:21) , if λ , , σ log (cid:16) Y µ (cid:17) , if λ = , (1)where µ > σ > −∞ < λ < ∞ , assuming that Z has a standard normal distribu-tion truncated at a suitable interval of the real line; details are given in the next section.The Box-Cox symmetric (BCS) class of distributions defined in this paper replaces the nor-mal distribution by the class of the continuous standard symmetric distributions. Replac-ing the normal distribution by the Student-t and the power exponential distributions re-sults in the Box-Cox t (Rigby and Stasinopoulos; 2006) and the Box-Cox power exponential(Rigby and Stasinopoulos; 2004; Voudouris et al.; 2012) distributions. Additionally, it gener-alizes the class of the log symmetric distributions (Vanegas and Paula; 2016, 2015).The paper unfolds as follows. In Section 2, the Box-Cox symmetric class of distributions isdefined, some properties are stated and interpretation of the parameters in terms of quantilesis discussed. Tail heaviness of Box-Cox symmetric distributions is studied in Section 3. Itis shown that the Box-Cox symmetric class of distributions allows much more tail flexibilitythan the log-symmetric distributions. Likelihood-based inference in discussed in Section 4.It is suggested that the choice of the symmetric distribution may lead to robust estimationagainst outliers. In Section 5, applications to 33 nutrients intake data are presented, and acomparison of alternative approaches is provided. Finally, concluding remarks (Section 6)close the paper. Technical details are left for the Appendix. A continuous random variable W is said to have a symmetric distribution with locationparameter µ ∈ R , scale parameter σ > r , and we write W ∼ S ( µ, σ ; r ), if its probability density function (pdf) is given by f W ( w ) = σ r (cid:18) w − µσ (cid:19) ! , w ∈ R , (2)where r ( · ) satisfies r ( u ) >
0, for u ≥
0, and R ∞ u − / r ( u )d u =
1. The class of the symmetricdistributions has a number of well-known distributions as special cases depending on the2hoice of r . It includes the normal distribution as well as the Student-t, power exponential,type I logistic, type II logistic and slash distributions among others. Densities in this familyhave quite di ff erent tail behaviors, and some of them may have heavier or lighter tails thanthe normal distribution.The symmetric distributions have some interesting properties. Some of them follow: (i) If W ∼ S ( µ, σ ; r ), its characteristic function is ψ W ( t ) = e it µ ϕ ( t σ ), t ∈ R , for some function ϕ , with ϕ ( u ) ∈ R , for u >
0. Whenever they exist, E( W ) = µ and Var( W ) = ξσ , where ξ = − ϕ ′ (0) > ϕ ′ (0) = d ϕ ( u ) / d u | u = is a constant not depending on µ and σ (Fang et al.; 1990). If u − ( k + / r ( u ) is integrable, then the k th moment of W exist (Kelker; 1970). (ii) If W ∼ S ( µ, σ ; r ),then a + bW ∼ S ( a + b µ, | b | σ ; r ), where a , b ∈ R , with b ,
0. In particular, if W ∼ S ( µ, σ ; r ), then S = ( W − µ ) /σ ∼ S (0 , r ), and its pdf is f S ( s ) = r ( s ) , for s ∈ R .Let Y be a positive continuous random variable, and consider Z ≡ h ( Y ; µ, σ, λ ) as in (1).Assume that Z has a standard symmetric distribution truncated at R \ A ( σ, λ ), where A ( σ, λ ) = (cid:16) − σλ , ∞ (cid:17) , if λ > , (cid:16) −∞ , − σλ (cid:17) , if λ < , ( −∞ , ∞ ) , if λ = , (3)i.e. the support of the truncated distribution is A ( σ, λ ). We then say that Y has a Box-Coxsymmetric distribution with parameters µ > σ >
0, and λ ∈ R , and density generatingfunction r , and we write Y ∼ BCS ( µ, σ, λ ; r ). In other words, Y ∼ BCS ( µ, σ, λ ; r ) if thetransformed variable Z in (1) has the distribution of S ∼ S (0 , r ) truncated at R \ A ( σ, λ ).The Box-Cox symmetric class of distributions reduces to the log-symmetric class of dis-tributions (Vanegas and Paula; 2016) when λ is fixed at zero. Additionally, it leads to theBox-Cox Cole-Green (Stasinopoulos et al.; 2008), Box-Cox t (Rigby and Stasinopoulos; 2006)and Box-Cox power exponential (Rigby and Stasinopoulos; 2004; Voudouris et al.; 2012) dis-tributions by taking Z as a truncated standard normal, Student-t and power exponentialrandom variable, respectively. The density generating function, r ( u ), for u ≥
0, for variousdistributions in the BCS class follows:(i) normal: r ( u ) = (2 π ) − / exp {− u / } ;(ii) double exponential: r ( u ) = ( √ /
2) exp {− √ u / } ;(iii) power exponential: r ( u ) = [ τ/ ( p ( τ )2 + /τ Γ (1 /τ ))] exp {− u τ/ / (2 p ( τ ) τ ) } ], where τ > p ( τ ) = − /τ Γ (1 /τ )[ Γ (3 /τ )] − ; when τ = τ = r ( u ) coincides with the densitygenerating function of the double exponential and normal, respectively;(iv) Cauchy: r ( u ) = { π (1 + u ) } − ;(v) Student-t: r ( u ) = τ τ/ { B (1 / , τ/ } − ( τ + u ) − ( τ + / , τ >
0, where B ( · , · ) is the beta function;3vi) type I logistic: r ( u ) = c exp {− u } (1 + exp {− u } ) − , where c ≈ . R ∞ u − / r ( u )d u = r ( u ) = exp {− u / } (1 + exp {− u / } ) − ;(viii) canonical slash (Rogers and Tukey; 1972): r ( u ) = (1 / √ π u )(1 − exp {− u / } ), for u > r ( u ) = / (2 √ π ), for u = : r ( u ) = Ψ (( q + / , u / q q / − / ( √ π u ( q + / ), for u >
0, and r ( u ) = q / [( q + √ π ],for u = q >
0, where Ψ ( a , x ) = R x t a − e − t d t is the lower incomplete gamma function;when q = f Z ( · ) be the pdf of Z with Z given in (1). We have f Z ( z ) = f S ( z ) F S (cid:16) σ | λ | (cid:17) , z ∈ A ( σ, λ ) , (4)where f S ( · ) and F S ( · ) are the pdf and the cumulative distribution functions (cdf) of S ∼ S (0 , r ), respectively. Now, let z = h ( y ; µ, σ, λ ) (see (1)). Because the Jacobian of thetransformation from y to z is | ∂ z /∂ y | = y λ − /µ λ σ the pdf of Y is given by f Y ( y ) = y λ − µ λ σ f Z ( z ) , y > . (5)Since f S ( s ) = r ( s ), we have from (1) and (4) that (5) can be written as f Y ( y ) = y λ − µ λ σ r ( z ) R ( σ | λ | ) , if λ , , y σ r (cid:0) z (cid:1) , if λ = , (6)for y >
0, where R ( s ) = R s −∞ r ( u )d u , for s ∈ R . The cdf of Y is given by F Y ( y ) = R ( z ) R ( σ | λ | ) , if λ ≤ , R ( z ) − R ( − σ | λ | ) R ( σ | λ | ) , if λ > , for y > ff erent values of the parameters. The figures suggest, as expected, that the transformationparameter λ controls the skewness of the distribution, while the right tail / kurtosis behavior iscontrolled by the extra parameter (degrees of freedom parameter of the Box-Cox t distribution It is the distribution of Z / U / q , where q > Z and U are independent random variables with standardnormal and uniform distribution, respectively. If σ | λ | =
0, 1 /σ | λ | is interpreted as lim σλ → (1 /σ | λ | ) = ∞ and F (1 /σ | λ | ) is taken as 1. ff erent values of theparameters. Note that the extra parameter q controls for right tail heaviness; see Section 3for a detailed discussion of tail heaviness of the Box-Cox symmetric distributions. . . . . . . . . y f ( y ) BCCGBCTBCSlashBCPE
Figure 1: Probability density functions of BCCG, BCT ( τ = τ = .
5) and BCSlash( q =
4) for µ = σ = . λ = . Y ∼ BCS ( µ, σ, λ ; r ), for µ > σ > λ ∈ R , thefollowing properties hold:(i) dY ∼ BCS ( d µ, σ, λ ; r ), for all constant d >
0, and hence µ is a scale parameter;(ii) ( Y /µ ) d ∼ BCS (1 , d σ, λ/ d ; r ), for all constant d >
0. In particular, Y /µ ∼ BCS (1 , σ, λ ; r ),( Y /µ ) /σ ∼ BCS (1 , , σλ ; r ), and, for λ >
0, ( Y /µ ) λ ∼ BCS (1 , λσ, r );(iii) if λ = Y has a truncated symmetric distribution with parameters µ and µσ andsupport (0 , ∞ );(iv) from (6), we have that, if λ = Y has a log-symmetric distribution with parameters µ and σ and density generating function r (Vanegas and Paula; 2016); it is denoted hereby LS ( µ, σ ; r ).(v) for integer k , E( Y k ) = µ k E (cid:16) (1 + σλ S ) k λ I A ( σ,λ ) ( S ) (cid:17) , if λ , ,µ k E (cid:0) exp( k σ S ) (cid:1) , if λ = , where I A ( σ,λ ) ( · ) is the indicator function of the set A ( σ, λ ) given in (3), and S ∼ S (0 , r ).Hence, when λ = R \ A ( σ, λ ) has negligible probability under the5 . . . . . . . y f ( y ) m = m = m = m = (a) σ = λ = q = . . . . . . . . y f ( y ) s = s = s = s = (b) µ = λ = . q = . . . . . y f ( y ) l = - l = - l = l = l = (c) µ = σ = . q = . . . . y f ( y ) q = = = = (d) µ = σ = . λ = . Figure 2: Probability density functions of
BCSlash ( µ, σ, λ, q ). S (0 , r ) distribution) the moments of Y can be obtained from the characteristic functionof a standard symmetric distribution.The BCS class of distributions allows easy parameter interpretation from its quantiles.Let s α denote the α -quantile of the S ∼ S (0 , r ), and z α be the α -quantile of the truncated S ,i.e. of the standard symmetric distribution S (0 , r ) truncated at R \ A ( σ, λ ). We have z α = R − h α R (cid:16) σ | λ | (cid:17)i , if λ ≤ , R − h − (1 − α ) R (cid:16) σ | λ | (cid:17)i , if λ > . Recall that R ( · ) is the cdf of S ∼ S (0 , r ). If Y ∼ BCS ( µ, σ, λ ; r ) its α -quantile is given by y α = µ (1 + σλ z α ) λ , if λ , ,µ exp( σ z α ) , if λ = . We note that all the quantiles are proportional to µ . In particular, if λ = z / = µ is the median of Y . Moreover, if the truncation set R \ A ( σ, λ ) has6egligible probability under the S (0 , r ) distribution (this happens when σλ is small), µ isapproximately equal to the median of Y . In order to give an interpretation for σ , we considerthe centile-based coe ffi cient of variation (Rigby and Stasinopoulos; 2006) CV Y = y . − y . y . . When λ is not far from zero and ignoring the truncation region, CV Y ≈ . σ s . ), whichis an increasing function of σ . Here, sinh( · ) is the hyperbolic sine function. The approximationis exact when λ =
0. Therefore, σ can be seen as a relative dispersion parameter. Finally, λ is regarded as a skewness parameter since it determines the power transformation tosymmetry.At this point it is informative to compare our approach with an alternative, closely re-lated, approach that uses the original Box-Cox transformation. A usual strategy for dealingwith positive continuous asymmetric data is to employ a Box-Cox transformation in the dataand to assume that the transformed data follow a normal distribution. The normal distribu-tion can be replaced by the class of the continuous symmetric distributions in the Box-Coxtransformation approach; see Cordeiro and Andrade (2011). Formally, this approach doesnot correspond to assume a coherent distribution for the data because the support of thetransformed variable is not the entire real line, unless the transformation parameter is zero;this is known as the truncation problem. In order to take the correct support of the Box-Cox transformed data into account, a truncated normal (or symmetric) distribution may beassumed for the transformed data; see e.g. Poirier (1978) and Yang (1996). However, thetruncation point will depend on the three parameters, namely the location, dispersion andtransformation parameters. In our approach the truncation point depends on σ and λ only.Hence, it does not depend on the regressor values if a regression model is assumed for µ . Al-though the truncation problem is usually disregarded in the statistical literature, alternativetransformations have been proposed to overcome this problem; see, e.g., Yeo and Johnson(2000) and Yang (2006). Furthermore, the model parameters are interpreted as charac-teristics of the transformed data, not the original data. Our approach does not have thesetwo shortcomings: a genuine distribution is assumed for the data and the parameters areinterpretable in terms of characteristics of the original data, not the transformed data.In Section 5, we present a comparison of alternative approaches in real data applications. Heavy-tailed distributions have been frequently used to model phenomena in various fieldssuch as finance and environmental sciences; see, for instance, Resnick (2007). A usualcriterion for evaluating tail heaviness in the extreme value theory is the tail index of regularvariation functions. Informally, a regular variation function behaves asymptotically as a7ower function. Formally, a Lebesgue measurable function M : R + → R + is regularlyvarying at infinity with index of regular variation α ( M ∈ RV α ), if lim t →∞ M ( ty ) / M ( t ) = y α for y >
0. If α = M is said to be a slowly varying function. The function M varies rapidlyat infinity or is regularly varying at infinity with index −∞ ( ∞ ), or M ∈ RV −∞ ( M ∈ RV ∞ ), if,for y >
0, lim t →∞ M ( ty ) / M ( t ) : = y −∞ (lim t →∞ M ( ty ) / M ( t ) : = y ∞ ); see de Haan (1970, p. 4). A continuous distribution with cdf F is said to have a heavy right tail whenever F : = − F is a regularly varying at infinity function with negative index of regular variation α = − /ξ, ξ >
0, i.e., lim t →∞ F ( ty ) / F ( t ) = y − /ξ . The parameter ξ is called the tail index. Fromthe l’H ˆopital rule, this limit can be written as y lim t →∞ f ( ty ) / f ( t ), for y >
0, where f is thepdf corresponding to F . When the limit equals y −∞ we say that F has a light (non-heavy)right tail and that the tail index is zero. When the limit equals 1, i.e. F is a slowly varyingfunction, we will say that F has right heavy tail with tail index ∞ .From de Haan (1970, Corollary 1.2.1) it follows that the tail index is invariant to location-scale transformations. Hence, from (2) the tail index of a S ( µ, σ ; r ) distribution is independentof µ and σ and is obtained from L S ( w ; r ) = w lim t →∞ r ( t w ) r ( t ) . (7)It can be shown that the tail index of the BCS ( µ, σ, λ ; r ) distribution, for all µ > σ > L BCS ( y , µ, σ, λ ; r ) = L S ( y λ ; r ) , if λ > , L LS ( y , µ, σ ; r ) , if λ = , y λ , if λ < , where L LS ( y , µ, σ ; r ) corresponds to the limit that defines the tail index of the log-symmetricdistributions and is given by (7) with tw in the numerator replaced by σ − log( tw /µ ) and t inthe denominator replaced by σ − log( t /µ ).Table 1 gives the tail index of some Box-Cox symmetric distributions. When λ >
0, theBox-Cox t and Box-Cox slash distributions have heavy right tail with the extra parameter (thedegrees of freedom parameter τ and the shape parameter q in the case of the t and the slashdistributions, respectively) controlling the tail weight for fixed λ ; the Box-Cox Cole-Green(normal), Box-Cox power exponential, Box-Cox type I logistic and Box-Cox type II logisticdistributions are all right light-tailed distributions, i.e. the tail index for these distributionsare zero. The results for λ = τ > y −∞ = ∞ , if 0 < y < =
1, if y = =
0, if y > y ∞ =
0, if 0 < y < =
1, if y = = ∞ , if y > The tail indices were obtained using Maple 13; see http: // τ > τ ∈ Q , and for the slash distribution, for q ∈ N ∗ . σ . All the others have right heavy tail with tail index ∞ . It is noteworthy that theextra parameters have no influence on the tail index. This suggests that the class of thelog-symmetric distributions (Vanegas and Paula; 2016) is much more restrictive than theBox-Cox symmetric distributions defined in this paper with respect to tail flexibility. Finally,when λ <
0, all the Box-Cox symmetric distributions have heavy right tail with tail indexequal to | λ | − .Table 1: Tail index ( ξ ) of some symmetric and Box-Cox symmetric distributions.distribution symmetric BCS ( λ >
0) BCS ( λ =
0) BCS ( λ < / | λ | double exponential 0 0 σ/ √ / | λ | power exponential τ > / | λ | τ = σ/ √ / | λ | τ < ∞ / | λ | Cauchy 1 1 /λ ∞ / | λ | t 1 /τ / ( λτ ) ∞ / | λ | type I logistic 0 0 0 1 / | λ | type II logistic 0 0 σ / | λ | canonical slash 1 1 /λ ∞ / | λ | slash ( q ∈ N ∗ ) 1 / q / ( λ q ) ∞ / | λ | An alternative approach to compare tail heaviness of statistical distributions is consid-ered by Rigby et al. (2014, Chapter 12). Here, we focus on right tail heaviness only. If therandom variables Y and Y have continuous pdf’s f Y ( y ) and f Y ( y ) and lim y →∞ f Y ( y ) = lim y →∞ f Y ( y ) = Y has heavier right tail than Y if and only if lim y →∞ (log f Y ( y ) − log f Y ( y )) = ∞ . The authors classify the possible asymptotic (large y ) behavior of the log-arithm of a pdf in three major forms: − k (log y ) k , − k y k or − k exp( k y ), with positive k ’s.The three forms are decreasing in order of tail heaviness and, for the first form, decreasing k results in a heavier tail while decreasing k for fixed k results in heavier tail. Similarly, forthe two other forms.Table 2 gives the coe ffi cients of the right tail asymptotic form of the logarithm of pdf’sfor some symmetric and Box-Cox symmetric distributions. Following Rigby et al. (2014), theright tail heaviness of the distributions in Table 2 can be classified in the following four types(the corresponding tail index for each type is given in parentheses):- non-heavy tail: type II with k ≥ ξ = k > < k < ξ = k = k > ξ = / ( k − k = k = ξ = ∞ ).It should be noted that distributions with right tail index ξ =
0, which are classified ashaving light (non-heavy) right tail according to the regular variation theory, are split intotwo categories in Rigby’s criterion: non-heavy right tail and heavy right tail but lighter thanany ‘Paretian type’ tail.When λ >
0, the Box-Cox t and Box-Cox slash distributions have ‘Paretian type’ righttail with the extra parameter controlling the right tail heaviness for fixed λ ; the Box-Coxpower exponential distributions have non-heavy right tail ( τ ≥ /λ ) or heavy right tailbut lighter than any ‘Paretian type’ tail ( τ < /λ ) , with τ determining the tail heavinessfor fixed λ . Depending on the value of λ , the Box-Cox Cole-Green and Box-Cox type Ilogistic distributions may have a non-heavy right tail ( λ ≥ /
2) or a heavy right tail butlighter than any ‘Paretian type’ tail (0 < λ < / λ ≥ < λ <
1, respectively. From the coe ffi cients for λ = τ > λ <
0, all the Box-Coxsymmetric distributions in Table 2 have right ‘Paretian type’ tail with the tail heavinesscontrolled by λ only.Table 2: Coe ffi cients of the right tail asymptotic form of the log of the pdf for some symmetricand Box-Cox symmetric distributions. distribution symmetric BCS ( λ >
0) BCS ( λ =
0) BCS ( λ < k = , k = / (2 σ ) k = λ , k = / (2 µ λ σ λ ) k = k = / (2 σ ) k = k = | λ | + k = k = √ /σ k = λ , k = √ / ( µ λ σλ ) k = k = √ /σ + k = k = | λ | + τ > k = τ, k = / (2 p ( τ ) τ σ τ ) k = λτ, k = / (2 p ( τ ) τ µ λτ σ τ λ τ ) k = τ , k = / (2 p ( τ ) τ σ τ ) k = k = | λ | + τ = k = k = √ /σ k = λ , k = √ / ( µ λ σλ ) k = k = √ /σ + k = k = | λ | + τ < k = τ, k = / (2 p ( τ ) τ σ τ ) k = λτ, k = / (2 p ( τ ) τ µ λτ σ τ λ τ ) k = k = k = k = | λ | + k = , k = k = k = λ + k = k = k = k = | λ | + k = k = τ + k = k = λτ + k = k = k = k = | λ | + k = k = /σ k = λ , k = / ( µ λ σ λ ) k = k = /σ k = k = | λ | + k = k = /σ k = λ , k = / ( µ λ σλ ) k = k = /σ + k = k = | λ | + k = , k = k = k = λ + k = k = k = k = | λ | + k = k = q + k = k = λ q + k = k = k = k = | λ | + The study presented in this section shows that the class of the Box-Cox symmetric distri-butions is very flexible for modeling positive data displaying di ff erent right tail behaviors.It covers from right light-tailed distributions to heavier than any ‘Paretian type’ tailed dis-tribution. More importantly, some distributions in this class have an extra parameter thatcontrols the right tail heaviness. Slightly heavy-tailed data may be modeled using a Box-Cox10ower exponential distribution, which has a ‘slightly’ heavy right tail (heavy tail but lighterthan any ‘Paretian type’ tail) when λ > λτ < τ .Moderately or highly heavy-tailed data may require a Box-Cox t or Box-Cox slash distribu-tion, because both are right ‘Paretian type’ tailed distributions with a parameter that controlsthe right tail heaviness. The log-likelihood function for a single observation y taken from a BCS distribution is givenby ℓ ( µ, σ, λ ) = ( λ −
1) log y − λ log µ − log σ + log r ( z ) − log R (cid:18) σ | λ | (cid:19) , where z = h ( y ; µ, σ, λ ) with h ( y ; µ, σ, λ ) given in (1); the last term in ℓ is zero if λ =
0. Thescore vector and the Hessian matrix are obtained from the first and second derivatives of ℓ with respect to the parameters; see the Appendix.The maximum likelihood estimates of µ and σ , for fixed λ , from a sample of n independentobservations y , . . . , y n , are solution of the system of equations µ = n σλ ) /λ (cid:16)P ni = ̟ i y λ i z i (cid:17) /λ , if λ , , (cid:16)Q ni = y ̟ i i (cid:17) / P ni = ̟ i , if λ = ,σ = n λ (1 − δ ) P ni = ̟ i (cid:20)(cid:16) y i µ (cid:17) λ − (cid:21) , if λ , , n P ni = ̟ i (cid:16) log y i µ (cid:17) , if λ = , where δ = σ | λ | r (cid:16) σ | λ | (cid:17) R (cid:16) σ | λ | (cid:17) , z i = h ( y i ; µ, σ, λ ), and ̟ i = ̟ ( z i ) with ̟ ( z ) = − r ′ ( z ) / r ( z ) being a weighting function thatdepends on r . We note that the maximum likelihood estimates of µ and σ involve weightedarithmetic and geometric averages of the contributions of each observation y i with weight ̟ ( z i ). Table (3) gives ̟ ( z ) for several BCS distributions. Note that some distributions in theBCS class produce robust estimation against outliers. For instance, for the Box-Cox t andBox-Cox power exponential (with τ <
2) distributions the weighting function is decreasingin the observation of Y . Hence, outlier observations have smaller weights in the estimationof the parameters.The system of likelihood equations for ( µ, σ, λ ) does not have analytical solution. Fur-thermore, it may involve an addition equation relative to an extra parameter (for instance,the degrees of freedom parameter of the BCT distribution). Maximization of the likelihood11able 3: Weighting functions for some BCS distributions.BCS distribution ̟ ( z )normal 1double exponential √ / | z | power exponential τ ( z ) τ/ − / (2 p ( τ ) τ )Cauchy 2 / (1 + z )t ( τ + / ( τ + z )type I logistic − {− z } − / (exp {− z } + {−| z |} − / [ | z | (exp {−| z |} + / z − exp {− z / } / (1 − exp {− z / } )slash 2 Ψ (( q + / , z / / ( z Ψ (( q + / , z / gamlss in R for the BCT, BCCG and BCPE distri-butions through the CG and the RS algorithms (Rigby and Stasinopoulos; 2005; Rigby et al.;2014). It is noteworthy that gamlss allows the fit of regression models with monotonic linkfunctions relating the parameters ( µ , σ , λ and the possibly extra parameter) to explanatoryvariables through parametric or semi-parametric additive models.It is of particular interest to test the null hypothesis H : λ =
0; recall that the BCSdistributions reduce to the log-symmetric distributions when λ =
0. In order to evaluatethe performance of the likelihood ratio test of H against a two sided alternative, we nowpresent a small Monte Carlo simulation study. We set µ = σ = n = , , ,
500 from a Box-Cox t distribution with two di ff erent values for thedegrees of freedom parameter, namely τ = ,
10. The samples are generated under the nullhypothesis. We assume that τ is known, and we estimate the remaining parameters usingthe function optim in R with the analytical derivatives derived in the Appendix and withnumerical derivatives. The type I error probability estimated from the simulated samplesfor a nominal level α =
5% are presented in Table 4. The figures in the table reveal that thelikelihood ratio test performs well regardless of whether analytical or numerical derivativesare employed. As expected, the type I error probabilities are close to the nominal level of thetest for the sample sizes considered here.
In this section, we present applications of the Box-Cox distributions in the analysis of microand macronutrients intake. The data refer to observations of nutrients intake based on thefirst 24-hour dietary recall interview for n =
368 individuals. For each nutrient, we assumethat the data Y , . . . , Y n are independent. 12able 4: Type I error probability of the likelihood ratio test of H : λ = µ = σ = τ = n τ = τ = ff erent models to each of all the 33 nutrients. All the models consideredin the first analysis are constructed from the Student-t distribution and from its limitingcase when the degrees of freedom parameter goes to infinity, i.e. the normal distribution.The following models were considered: Box-Cox t (BCT); Box-Cox Cole-Green (BCCG),which corresponds to the BCT model with τ → ∞ ; skew-normal (SN) and skew-t (ST)(Azzalini; 2005); and transformed symmetric models with normal (TN) and t (TT) errors(Cordeiro and Andrade; 2011). The TN (TT) model assumes that the original Box and Cox(1964) transformed data follow a normal (Student-t) distribution. The unknown parameters(including the degrees of freedom parameter) were estimated by the maximum likelihoodmethod. For the BCT, BCCG, SN and ST distributions, we used the gamlss package imple-mented in R , while for the TN and TT models we used both the function optim in R and the PROC NLP in SAS . Goodness-of-fit was evaluated using the following criteria: Akaike infor-mation criterion (AIC) and Anderson-Darling statistics (AD, ADR, and AD2R); see Luce ˜no(2005, Tables 1, 2 and B.1). AD is a global measure of lack-of-fit, while both ADR and AD2Rare more sensitive to the lack of fit in the right tail of the distribution; AD2R puts moreweight in the right tail than ADR. For all the four criteria a lower value is preferred.Tables 5-8 present the goodness-of-fit statistics for all the fitted models to 22 and 11micro and macronutrients intakes data. Underlined numbers indicate the best fitting model.The blank cells in the tables indicate that the algorithm employed for maximum likelihoodestimation did not achieve convergence or produced unrealistic estimates. The tables conveyimportant information. First, the datasets cover a wide range of light-tailed to heavy-taileddata. This can be seen by the estimated values of the degrees of freedom parameter underthe BCT model; b τ BCT ranges from 2 . . b τ BCT > gamlss package implemented in R , while for the BCSlash model we used the function optim in R . Tables 9 and 10 give descriptive statistics of the data, and parameter estimatesand goodness-of-fit statistics for each fitted model. The descriptive statistics confirm thefindings in the boxplots. For both data sets, the Akaike information criteria are similarfor the BCT, BCPE, and BCSlash models, and both are smaller than that for the BCCGmodel. The Anderson-Darling statistics indicate that the BCT model gives the best fit. Notethat the BCCG model gives a poor right tail fit for both data sets. This lack of fit is mostpronounced for the energy intake data set, for which AD2R = .
25 for the BCCG modelwhile AD2R = .
78, AD2R = .
18, and AD2R = .
15 for the BCT, BCPE, and BCSlash models,respectively.Figure 4 presents qq plots for quantile residuals b r = Φ − ( b F Y ( y )), where Φ ( · ) is the cdf ofthe standard normal distribution and b F Y ( y ) is the fitted cdf of Y . If the model is correct,the quantile residuals are expected to behave approximately as standard normal quantiles(Dunn and Smyth; 1996). The lack of fit of the BCCG model in the right tail is clear fromthe plots for both data sets. For the animal protein data, the residual plots are similar forthe BCT and BCPE models and indicate reasonable fits. On the other hand, for the energyintake data set, which seems to require a right heavier tailed model, the BCT model providesa slightly better fit than the BCPE and BCSlash models.14able 5: AIC for the fitted BCT, ST, TT, BCCG, SN, and TN models; micronutrients andmacronutrients datasets. micronutrient b τ BCT
BCT ST TT BCCG SN TNvitamin A (mcg) 7.2 5807.4 5809.2 5807.4 5822.7vitamin D (mcg) 6.8 1688.7 1707.7 1690.5 1745.3 1698.9vitamin E (mg) 6.9 1812.8 1818.4 1812.8 1824.3 1902.4 1824.3vitamin K (mcg) 7.8 4354.3 4356.5 4354.3 4368.4vitamin C (mg) 2.2 4022.5 4029.9 4120.9vitamin B1 (mg) 6.8 709.7 711.6 709.7 720.9 778.4 720.9vitamin B2 (mg) 6.5 701.6 702.2 701.6 716.1 769.6 716.1vitamin B3 (mg) 6.8 2722.4 2726.3 2722.4 2730.8 2778.9 2730.8vitamin B6 (mg) 51.3 853.2 858.5 853.2 851.3 866.9 851.3vitamin B12 (mcg) 2.5 1782.6 1808.4 1782.8 1887.3pantothenic acid (mg) 8.3 1466.0 1465.5 1466.0 1474.1 1509.3 1474.1folate (mcg) 4.9 4795.4 4800.7 4795.4 4823.4 4823.4calcium (mg) 13.3 5311.3 5312.6 5311.3 5312.5 5342.8 5312.5phosphorus (mg) 14.7 5548.5 5548.4 5548.5 5549.0 5561.5 5549.0magnesium (mg) 8.6 4474.8 4476.8 4474.8 4481.5 4522.8 4481.5iron (mg) 5.9 2409.7 2409.3 2409.7 2431.9 2431.9zinc (mg) 14.6 2185.3 2188.7 2185.3 2185.4 2203.4 2185.4copper (mg) 5.5 566.1 566.1 593.6selenium (mcg) 5.2 3992.6 4000.6 3992.6 4020.8sodium (mg) 4.6 6525.4 6535.6 6525.4 6572.3potassium (mg) 9.3 6144.5 6147.5 6144.5 6151.9 6195.2 6151.9manganese (mg) 2.5 1616.5 1647.0 1623.7 1656.2macronutrient b τ BCT
BCT ST TT BCCG SN TNprotein (g) 10.2 3659.5 3660.9 3659.5 3662.5 3678.8 3662.5energy (kcal) 6.1 5861.9 5863.7 5861.9 5876.1 5912.8 5876.1fiber (g) 10.0 2652.2 2653.5 2652.2 2655.6 2669.2 2655.6carbohydrate (g) 10.5 4360.0 4359.9 4360.0 4366.5 4382.3 4366.5total fat (g) 13.9 3587.0 3593.7 3587.0 3587.5 3651.9 3587.5animal protein (g) 4.9 3514.4 3532.5 3516.1 3526.3 3553.9 3526.5vegetable protein (g) 6.6 2963.3 2964.4 2963.3 2972.6 2988.5 2972.6saturated fat (g) 196.0 2819.6 2822.9 2819.6 2817.7 2844.2 2817.7monounsaturated fat (g) 12.6 2857.1 2865.4 2857.1 2858.4 2920.1 2858.4polyunsaturated fat (g) 7.0 2596.9 2596.4 2596.9 2612.4 2771.9 2612.4cholesterol (mg) 5.8 4724.7 4753.7 4725.6 4782.9 4728.5 micronutrient statistic BCT ST TT BCCG SN TNvitamin A AD 1.12 0.94 1.12 1.47ADR 0.64 0.53 0.64 0.94AD2R 6.26 8.46 6.27 > > > >
100 61.43vitamin K AD 0.60 22.69 0.60 1.28ADR 0.35 2.22 0.35 0.86AD2R 55.60 >
100 55.44 > > > > > > > > >
100 17.25vitamin B6 AD 0.37 0.42 0.38 0.45 1.69 0.45ADR 0.18 0.21 0.18 0.21 1.04 0.21AD2R 4.13 5.00 4.10 4.44 27.4 4.45vitamin B12 AD 0.36 1.23 0.37 8.69ADR 0.26 0.77 0.27 5.08AD2R 8.88 22.56 9.67 > >
100 13.77folate AD 0.19 0.23 0.19 1.83 1.83ADR 0.12 0.13 0.12 0.94 0.95AD2R 4.61 14.68 4.60 > > >
100 7.50 cont. ). micronutrient statistic BCT ST TT BCCG SN TNphosphorus AD 0.25 0.21 0.25 0.36 1.50 0.36ADR 0.15 0.13 0.15 0.18 1.00 0.19AD2R 2.42 2.94 2.43 5.58 >
100 5.56magnesium AD 0.31 0.32 0.31 0.66 4.07 0.66ADR 0.18 0.20 0.18 0.36 2.77 0.36AD2R 2.76 5.08 2.76 25.34 >
100 25.36iron AD 0.25 0.12 0.25 1.21 1.21ADR 0.10 0.07 0.10 0.45 0.45AD2R 2.35 8.01 2.34 > > >
100 7.97copper AD 0.37 0.37 1.74ADR 0.21 0.21 0.99AD2R 16.10 16.25 > >
100 7.14 > > > > > >
100 74.60 28.29 56.04 macronutrient statistics BCT ST TT BCCG SN TNprotein AD 0.23 0.25 0.23 0.54 2.97 0.54ADR 0.14 0.15 0.14 0.30 2.02 0.30AD2R 3.60 4.62 3.58 10.93 67.43 10.93energy AD 0.19 0.20 0.19 1.13 4.13 1.10ADR 0.10 0.11 0.10 0.58 2.78 0.54AD2R 1.78 2.60 1.78 > > > >
100 11.26carbohydrate AD 0.26 0.20 0.25 0.39 1.36 0.39ADR 0.13 0.12 0.13 0.17 0.91 0.17AD2R 5.22 16.34 5.21 > > > >
100 14.46animal protein AD 0.35 0.42 0.35 1.46 2.84 1.43ADR 0.16 0.16 0.14 0.70 1.74 0.68AD2R 3.19 4.10 3.03 23.47 >
100 22.34vegetable protein AD 0.25 0.10 0.25 1.08 2.58 1.08ADR 0.10 0.07 0.10 0.43 1.62 0.43AD2R 1.98 1.81 1.98 39.96 >
100 39.88saturated fat AD 0.16 0.36 0.16 0.17 2.66 0.17ADR 0.08 0.14 0.08 0.08 1.72 0.08AD2R 3.07 3.76 3.06 3.27 >
100 3.27monounsaturated fat AD 0.33 0.63 0.33 0.62 5.17 0.62ADR 0.11 0.22 0.11 0.23 3.39 0.26AD2R 4.42 12.78 4.40 29.04 >
100 29.06polyunsaturated fat AD 0.57 0.43 0.57 1.02 21.93 1.02ADR 0.26 0.26 0.26 0.51 13.18 0.51AD2R 3.52 10.56 3.52 86.75 >
100 86.57cholesterol AD 0.32 0.35 0.28 4.22 1.16ADR 0.13 0.15 0.12 2.45 0.51AD2R 3.40 2.98 3.34 >
100 11.01
For the energy intake data, the estimate of the skewness parameter ( λ ) is indistinguishablefrom zero for the four models. Its estimates are close to zero, and the standard errors arerelatively large. It suggests that log-symmetric models may be appropriate. Table 11 presentsthe parameter estimates and the goodness-of-fit statistics for the log-t, log-normal, log-powerexponential, and log-slash model fits. Comparing the figures in Tables 10 and 11 one maynotice that AD and ADR, and the estimates for µ , σ and τ do not change much and the18ICs are slightly smaller for the log-symmetric models. On the other hand, AD2R droppedfrom 112.25 (BCCG model) to 48.17 (log-normal model) and from 4.18 (BCPE model) to 2.70(log-PE model). The change in AD2R is small when one moves from the BCT model to thelog-t model or from the BCSlash model to the log-slash model. Also, the quantile residualsplots (not shown) are similar to those presented in Figure 4. Taking parsimony into account,we may conclude that the best fitting model is the log-t model. i m a l p r o t e i n 02000400060008000ene r g y Figure 3: Adjusted boxplots; animal protein data (left) and energy intake (right).Table 9: Descriptive statistics, parameter estimates (standard error in parentheses) andgoodness-of-fit measures; animal protein intake data. statistics (mg) min .25-quantile median mean (s.d.) .75-quantile max0.02 30.51 44.26 52.27 67.59 207.10(34.02)distribution BCT BCCG BCPE BCSlash µ σ λ τ q statistics (kcal) min .25-quantile median mean (s.d.) .75-quantile max298.80 1356.00 1723.00 1868.00 2197.00 8370.00(838.35)distribution BCT BCCG BCPE BCSlash µ σ λ τ q −3 −2 −1 0 1 2 3 − − − Theoretical Quantiles S a m p l e Q uan t il e s (a) BCT −3 −2 −1 0 1 2 3 − − − Theoretical Quantiles S a m p l e Q uan t il e s (b) BCCG −3 −2 −1 0 1 2 3 − − − Theoretical Quantiles S a m p l e Q uan t il e s (c) BCPE −3 −2 −1 0 1 2 3 − − − Theoretical Quantiles S a m p l e Q uan t il e s (d) BCSlash −3 −2 −1 0 1 2 3 − − − Theoretical Quantiles S a m p l e Q uan t il e s (e) BCT −3 −2 −1 0 1 2 3 − − Theoretical Quantiles S a m p l e Q uan t il e s (f) BCCG −3 −2 −1 0 1 2 3 − − − Theoretical Quantiles S a m p l e Q uan t il e s (g) BCPE −3 −2 −1 0 1 2 − − − Theoretical Quantiles S a m p l e Q uan t il e s (h) BCSlash Figure 4: qq plots for quantile residuals for the BCT, BCCG, BCPE and BCSlash model fitsfor the animal protein intake data (first row) and energy intake data (second row).20able 11: Parameter estimates (standard errors in parentheses) and goodness-of-fit statisticsfor the log- t , log-normal, log-power exponential and log-slash models; energy intake data. distribution log-t log-normal log-PE log-slash µ σ τ q For the BCT model fitted to the animal protein intake data the estimates of µ , σ , λ , and τ reported in Table 9 are b µ P = . b σ P = . b λ P = .
42, and b τ P = .
90, respectively. Forthe log-t model fitted to the energy intake data the estimates of µ , σ , and τ are b µ E = . b σ E = .
34, and b τ E = .
09, respectively; see Table 11. Because b σ P b λ P is small, b µ P may be seen asan estimate of the population median of the animal protein intake. As expected, b µ P is closeto the sample median (44.26). Similarly, b µ E , which is an estimate of the population median ofthe energy intake, is close to the sample median (1723.00). Additionally, b σ P is considerablylarger than b σ E indicating that the relative dispersion of the population intake of animalprotein is larger than that of energy. The estimates of the degrees of freedom parameter τ ,both not large, reveal the need for right heavy tailed distributions for an adequate fit for thedata. This paper proposed a new class of distributions, the Box-Cox symmetric distributions. Itcontains some well known distributions as special cases and allows the definition of newdistributions, such as the Box-Cox slash distribution. It is particularly suitable for inferenceon positively skewed, possibly heavy-tailed, data. It permits easy parameter interpretation,a desirable feature for modeling.There is clear possibility for extension to regression models. Some or all the parameters ofthe BCS distributions may be modeled by a link function and a linear or nonlinear regressionmodel structure. The GAMLSS framework (Rigby and Stasinopoulos; 2005) is a naturaltool for implementing BCS regression models. It allows the regression structure to includeparametric and nonparametric terms and random e ff ects. Box-Cox t, Box-Cox Cole-Green,and Box-Cox power exponential models are already implemented in gamlss package in R .21ome BCS distributions include an extra parameter; e.g., the degrees of freedom pa-rameter of the BCT distribution. We have not faced convergence problems or unrealisticestimation when the additional parameter is estimated simultaneously with the others. Itshould be noticed that the sample sizes in our applications were relatively large ( n = ff ects to model nutrients intake data taken from repeated24-hour recalls.As a final remark, we recall that a comprehensive study on the right tail heaviness ofthe Box-Cox symmetric distributions was presented. For future possible investigation, itmight be interesting to search for skewness-kurtosis boundaries allowing the existence ofBCS distributions, as in Jondeau and Rockinger (2003). Acknowledgments
We thank Jos´e Eduardo Corrente for providing the data used in this study, and Eliane C.Pinheiro for helpful discussions. We acknowledge the financial support of FAPESP, CAPESand CNPq (Brazil). We are grateful to the associate editor and two anonymous referees forconstructive comments and suggestions.
Appendix
In this appendix we give the first and second derivatives of the log-likelihood functionwith respect to the parameters. Let z = h ( y ; µ, σ, λ ), where h ( y ; µ, σ, λ ) is given in (1), ̟ = − r ′ ( z ) / r ( z ), and ξ = r (( σλ ) − ) / R (( σ | λ | ) − ) . We have ∂ z ∂µ = − µσ y µ ! λ −−−→ λ → − µσ ,∂ z ∂λ = σλ + y µ ! λ " − + λ log y µ ! −−−→ λ → σ " log y µ ! ,∂ z ∂µ = ( λ + µ σ y µ ! λ −−−→ λ → µ σ ,∂ z ∂λ = σλ − + y µ ! λ − λ log y µ ! + λ log y µ !! −−−→ λ → σ " log y µ ! , z ∂µ∂λ = − µσ y µ ! λ log y µ ! −−−→ λ → − µσ log y µ ! . Let ℓ denote the log-likelihood for a single observation y . We have ℓ = ( λ −
1) log y − λ log µ − log σ + log r ( z ) − log R (cid:18) σ | λ | (cid:19) , if λ ,
0; the last term in ℓ is zero if λ =
0. The first derivatives of ℓ are given by ∂ℓ∂µ = − λµ − ̟ z ∂ z ∂µ ,∂ℓ∂σ = − σ + ̟ z σ + ξσ | λ | , if λ , , − σ + ̟ z σ , if λ = ,∂ℓ∂λ = log y µ ! − ̟ z ∂ z ∂λ + sign( λ ) ξσλ . The second derivatives of ℓ are given by ∂ ℓ∂µ = λµ − z d ̟ dz + ̟ ! ∂ z ∂µ ! − ̟ z ∂ z ∂µ ,∂ ℓ∂σ = σ − z σ d ̟ dz − ̟ z σ + σ | λ | ∂ξ∂σ − ξσ | λ | , if λ , , σ − z σ d ̟ dz − ̟ z σ , if λ = ,∂ ℓ∂λ = − z d ̟ dz + ̟ ! ∂ z ∂λ ! − ̟ z ∂ z ∂λ + sign( λ ) σλ ∂ξ∂λ − ξσλ ! ,∂ ℓ∂µ∂σ = z σ ∂ z ∂µ z d ̟ dz + ̟ ! ,∂ ℓ∂µ∂λ = − µ − z d ̟ dz + ̟ ! ∂ z ∂µ ∂ z ∂λ − ̟ z ∂ z ∂µ∂λ ,∂ ℓ∂σ∂λ = z σ ∂ z ∂λ z d ̟ dz + ̟ ! + σ | λ | ∂ξ∂λ − sign( λ ) ξσ λ . The first and second derivatives of ℓ are obtained after plugging the derivatives of z givenabove.Note that the first derivatives of ℓ depend on the weighting function ̟ ( ̟ is given inTable 3 for some distributions). Consequently, d ̟/ dz appears in all the second derivatives of ℓ . Note that ∂ℓ/∂σ and ∂ℓ/∂λ involve ξ , which in turn depends on the particular distributionin the BCS class and the truncation set. The first derivatives of ξ appear in ∂ ℓ/∂σ , ∂ ℓ/∂λ ∂ ℓ/∂σ∂λ . The stability of the terms that involve ξ and its first derivatives around λ = ff erent distributions. For instance, they may be unstable for theBox-Cox t distribution with small degrees of freedom parameter. Yet, a simulation study ofthe type I error probability of the likelihood ratio test of H : λ = ff erent values of the degrees of freedom parameter performed well; see Section 4. References
Azzalini, A. (2005). The skew-normal distribution and related multivariate families,
Scandi-navian Journal of Statistics (2): 159–188.Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations, Journal of the RoyalStatistical Society. Series B (2): 211–252.Cole, T. and Green, P. J. (1992). Smoothing reference centile curves: the LMS method andpenalized likelihood, Statistics in Medicine (10): 1305–1319.Cordeiro, G. M. and Andrade, M. G. (2011). Transformed symmetric models, StatisticalModelling (4): 371–388.de Haan, L. (1970). On Regular Variation and Its Application to the Weak Convergence of SampleExtremes , Mathematical Centre Tracts 32, Amsterdam: Mathematics Centre.Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals,
Journal of Computationaland Graphical Statistics (3): 236–244.Fang, K. T., Kotz, S. and NG, K. W. (1990). Symmetric Multivariate and Related Distributions ,Chapman and Hall, London.Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions,
Computational Statistics & Data Analysis (12): 5186–5201.Jondeau, E. and Rockinger, M. (2003). Conditional volatility, skewness, and kurtosis: exis-tence, persistence, and comovements, Journal of Economic Dynamics & Control : 1699–1737.Kelker, D. (1970). Distribution theory of spherical distributions and a location-scale param-eter generalization, Sankhya A (4): 419–430.Luce ˜no, A. (2005). Fitting the generalized Pareto distribution to data using maximumgoodness-of-fit estimators, Computational Statistics & Data Analysis (2): 904–917.Poirier, D. J. (1978). The use of box-cox transformation in limited dependent variable models, Journal of the American Statistical Association (362): 284–287.24esnick, S. I. (2007). Heavy-Tail Phenomena Probabilistic and Statistical Modeling , Springer.Rigby, R. A. and Stasinopoulos, D. M. (2004). Smooth centile curves for skew and kurtoticdata modelled using the BoxCox power exponential distribution,
Statistics in Medicine (19): 30533076.Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scaleand shape, Journal of the Royal Statistical Society: Series C (Applied Statistics) (3): 507–554.Rigby, R. A. and Stasinopoulos, D. M. (2006). Using the Box-Cox t distribution in GAMLSSto model skewness and kurtosis, Statistical Modelling (3): 209–229.Rigby, R. A., Stasinopoulos, D. M., Heller, G. and Voudouris, V. (2014). The Distribution Toolbox of GAMLSS , London. http: // / wp-content / uploads / / / distributions.pdf.Rogers, W. H. and Tukey, J. W. (1972). Understanding some long-tailed symmetrical distri-butions, Statistica Neerlandica (3): 211–226.Stasinopoulos, D. M., Rigby, R. A. and Akantziliotou, C. (2008). Instructions on how to use theGAMLSS package in R , London. http: // Test (1): 110–135.Vanegas, L. H. and Paula, G. A. (2016). Log-symmetric distributions: statistical propertiesand parameter estimation, Brazilian Journal of Probability and Statistics : 196–220.Voudouris, V., Gilchrist, R., Rigby, R. A., Sedgwick, J. and Stasinopoulos, D. M. (2012).Modelling skewness and kurtosis with the BCPE density in GAMLSS, Journal of AppliedStatistics (6): 1279–1293.Yang, Z. (2006). A modified family of power transformations, Economics Letters (1): 14–19.Yang, Z. L. (1996). Some asymptotic results on Box-Cox transformation methodology, Com-munications in Statistics, Theory and Methods (2): 403–415.Yeo, I. K. and Johnson, R. A. (2000). A new family of power transformation to improvenormality or symmetry, Biometrika87