Bayesian Robust Quantile Regression
arXiv preprint [stat.ME]
Mauro Bernardi a, Marco Bottone b, Lea Petrella c

a Department of Statistical Sciences, University of Padova, Padua, Italy. b Bank of Italy and Department of Methods and Models for Economics, Territory and Finance, Sapienza University of Rome, Rome, Italy. c Department of Methods and Models for Economics, Territory and Finance, Sapienza University of Rome, Rome, Italy.
Abstract
Traditional Bayesian quantile regression relies on the Asymmetric Laplace distribution (ALD), mainly because of its satisfactory empirical and theoretical performance. However, the ALD displays medium tails and is not suitable for data characterized by strong deviations from the Gaussian hypothesis. In this paper, we propose an extension of the ALD Bayesian quantile regression framework that accounts for fat tails using the Skew Exponential Power (SEP) distribution. Besides having the τ-level quantile as a parameter, the SEP distribution has an additional key parameter governing the decay of the tails, making it attractive for robust modeling of conditional quantiles at different confidence levels. Linear and Generalized Additive Models (GAM) with penalized splines are considered to show the flexibility of the SEP in the Bayesian quantile regression context. Lasso priors are considered in both cases to address the parameter shrinkage problem that arises when the parameter space becomes wide. To implement the Bayesian inference we propose a new adaptive Metropolis–Hastings algorithm for the linear model and an adaptive Metropolis within Gibbs algorithm for the GAM framework. Empirical evidence of the statistical properties of the proposed SEP Bayesian quantile regression method is provided through several examples based on simulated and real datasets.

Keywords:
Bayesian quantile regression, robust methods, Skew Exponential Power, GAM.
1. Introduction
Quantile regression has become a very popular approach to provide a wide description of the distribution of a response variable conditionally on a set of regressors. While linear regression analysis aims to estimate the conditional
URL: [email protected] (Mauro Bernardi), [email protected] (Marco Bottone), [email protected] (Lea Petrella)
Preprint submitted to October 1, 2018

mean of a variable of interest, in quantile regression we may estimate any conditional quantile of order τ, with τ ∈ (0,
1). Since the seminal works of Koenker and Basset (1978) and Koenker and Machado (1999), several papers have been proposed in the literature considering quantile regression analysis from both frequentist and Bayesian points of view. For the former, following Koenker (2005) and the references therein, the estimation strategy relies on the minimization of a given loss function. From the Bayesian point of view, Yu and Moyeed (2001) introduced the ALD as a likelihood tool to perform the inference. Since then, a wide Bayesian literature has grown on quantile regression and the ALD; see for example Dunson and Taylor (2005), Kottas and Gelfand (2001), Kottas and Krnjiajic (2009), Thomson et al. (2010), Salazar et al. (2012), Lum and Gelfand (2012), Sriram et al. (2013) and Bernardi et al. (2015). Although the ALD is widely used in the Bayesian framework, it has the main disadvantage of displaying medium tails, which may give misleading information for extreme quantiles, in particular when the data are characterized by the presence of outliers and heavy tails. In fact, the absence in the ALD of a parameter governing tail fatness may affect the final inference. Recently, Wichitaksorn et al. (2014) generalized the classical Bayesian quantile regression by using some skew distributions obtained through mixtures of scaled Normal distributions. This class of distributions allows for different degrees of asymmetry of the response variable, but its members all impose a given tail structure. To overcome this drawback we propose an extension of the Bayesian quantile regression based on the Skew Exponential Power (SEP) distribution proposed in Zhu and Zinde–Walsh (2009). The SEP distribution, like the ALD, has the property of having the τ-level quantile as its natural location parameter, but it also has an additional parameter (the shape parameter) governing the decay of the tails.
Using the proposed distribution in quantile regression we are able to robustify the inference, in particular when outliers or extreme values are observed. In linear regression analysis several works have extensively considered the non-skewed version of the SEP, i.e. the Exponential Power (EP) distribution, for the robustness properties given by its shape parameter. Box and Tiao (1973) first showed how to robustify the classical Gaussian linear regression model by introducing the EP as the distributional assumption for the error term. Choy and Smith (1997) explore the robustness properties of posterior moments based on the EP distribution, while Choy and Walker (2003) present a further extension of the work of Choy and Smith (1997), introducing the case in which the shape parameter assumes values greater than two. Finally, Naranjo et al. (2015) and Kobayashi (2016) consider the use of the SEP distribution in regression and stochastic volatility models. To the best of our knowledge, this is the first attempt to consider the SEP distribution in order to provide a robust framework for quantile regression analysis.
In this paper we propose the use of the SEP distribution to develop a Bayesian robust quantile regression framework. In particular, due to the specific characteristics of the SEP distribution, we will show how to estimate the quantile function, first considering the simple linear regression problem and then extending it to generalized additive models (GAM). For the latter case we will adopt the Penalized Spline (P–Spline) approach to carry out the statistical inference. The Bayesian paradigm is implemented by means of a new adaptive Metropolis MCMC sampling scheme with a full set of informative priors.
In particular, for the GAM framework, the proposed algorithm turns into an Adaptive Metropolis within Gibbs MCMC for an efficient estimate of the penalization parameter and the P–Spline coefficients.
When dealing with model building, the choice of appropriate predictors, and consequently the variable selection issue, plays an important role. In this paper we approach the problem in the Bayesian quantile regression framework by considering the Bayesian version of the Lasso penalization methodology introduced by Tibshirani (1996). In particular, for the linear quantile regression model we will assume, as prior distribution on each regressor, the generalized version of the univariate independent Laplace distribution proposed by Park and Casella (2008) and Hans (2009) and already considered in Alhamzawi et al. (2012). With this prior we shrink each parameter separately. As a second step, when dealing with GAM models, we generalize the Lang and Brezger (2004) second-order random walk prior for the spline coefficients, assuming a multivariate Laplace distribution accounting for a correlation structure among parameters. This prior corresponds to the group lasso penalty of Yuan and Lin (2006), Meier et al. (2008) and Li et al. (2010), which in the spline context has a natural interpretation in terms of the knots associated with each regressor.
To analyze the performance of the proposed models we consider simulation studies in which we control for the weight of the outliers, the number of parameters, the shape of the regressors and the presence of heteroscedasticity. Furthermore, we analyze three popular real datasets: the corrected version (see Li et al.
2010) of the Boston housing data first analyzed by Harrison and Rubinfeld (1978); the Munich rental dataset with geoadditive spatial effect considered in Rue and Held (2005) and Yue and Rue (2011), among others; and the Barro growth data first studied by Barro and Sala-i-Martin (1995) and then extended in the quantile regression framework by Koenker and Machado (1999). Compared with the existing literature, the models we propose introduce robustness, variable selection and non-linearity in the estimation process, providing a more flexible framework, new interesting interpretations of some regression coefficients and, on average, lower posterior standard deviations.
The remainder of the paper is organized as follows. In Section 2 we introduce the SEP distribution and discuss its properties relevant to modeling conditional quantiles as functions of exogenous covariates. In Section 3 we introduce the model specification and the proposed MCMC algorithms. In Section 4 we approach the non-linear extension of the linear quantile approach via GAM models. Section 5 explores the sampling performances of the proposed models through some simulation experiments. Section 6 discusses three well-known empirical applications, while Section 7 concludes.

Figure 1: Plot of the SEP distribution for different levels of the shape parameter (panel a) and of the skewness parameter (panel b), with σ = 1 and µ = 0.
2. The Skewed Exponential Power distribution
Zhu and Zinde–Walsh (2009) have recently proposed a parametrization of the SEP distribution introduced by Fernandez and Steel (1998) that is particularly convenient when quantiles are the main concern.
Definition 2.1.
A random variable Y ∈ R is said to be Skew Exponential Power distributed, i.e. Y ∼ SEP(µ, σ, α, τ), if its density has the following form:

f_SEP(y; µ, σ, α, τ) = (κ_EP(α)/σ) exp{ −(1/α) [(µ − y)/(2τσ)]^α },      if y ≤ µ,
f_SEP(y; µ, σ, α, τ) = (κ_EP(α)/σ) exp{ −(1/α) [(y − µ)/(2(1 − τ)σ)]^α },  if y > µ,   (1)

where µ ∈ ℜ is the location parameter, σ ∈ ℜ+ and α ∈ (0, ∞) are the scale and shape parameters respectively, while κ_EP(α) = [2 α^(1/α) Γ(1 + 1/α)]^(−1), with Γ(·) being the complete gamma function. Moreover, the parameter τ ∈ (0, 1) controls the skewness of the distribution. One of the nicest properties of (1), which induces us to propose it for quantile regression inference, is that the location parameter µ coincides with the τ-level quantile (we prove this theoretically in Appendix A). It can also be shown (see Zhu and Zinde–Walsh, 2009) that the kurtosis of the SEP is directly determined by its parameter α. In Figure 1 we present the pdf of the SEP distribution for different values of the shape (α) and skewness (τ) parameters, with fixed values of the location and scale parameters (µ, σ) = (0, 1). In subfigure 1(a), where τ = 0.5, the ALD and the Skew Normal distribution are recovered for α = 1 and α = 2, respectively. Moreover, the smaller α is, the fatter are the tails of the distribution; in particular, as α → ∞ the SEP becomes equal to the Uniform distribution. It is hence evident how important the parameter α is in capturing the behaviour of the tails, which may be fundamental when outliers or heavy-tailed data are modelled. Furthermore, subfigure 1(b) displays the behavior of the SEP for different combinations of α and τ. Also in this case, the ALD (α = 1) and the Skew Normal distribution (α = 2) can be obtained because of the role of the skewness parameter τ. In the same figure, the relation between τ and the location parameter µ should also be evident: for a fixed µ (µ = 0 in the graph), by varying τ the shape of the distribution changes in such a way that µ becomes its quantile of level τ.
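As a quick check of Definition 2.1, the density (1) can be sketched in a few lines; the function name and default arguments below are ours, not from the paper:

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def sep_pdf(y, mu=0.0, sigma=1.0, alpha=1.0, tau=0.5):
    """Density (1) of the Skew Exponential Power distribution of
    Zhu and Zinde-Walsh (2009): mu is the tau-level quantile."""
    y = np.asarray(y, dtype=float)
    # normalizing constant kappa_EP(alpha) = [2 alpha^(1/alpha) Gamma(1 + 1/alpha)]^(-1)
    kappa = 1.0 / (2.0 * alpha**(1.0 / alpha) * gamma_fn(1.0 + 1.0 / alpha))
    # left and right branches are rescaled by 2*tau and 2*(1 - tau), respectively
    z = np.where(y <= mu,
                 (mu - y) / (2.0 * tau * sigma),
                 (y - mu) / (2.0 * (1.0 - tau) * sigma))
    return (kappa / sigma) * np.exp(-z**alpha / alpha)
```

For α = 2 and τ = 0.5 the density collapses to the standard Gaussian, and for α = 1 to the ALD, in line with the special cases discussed above.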
3. Robust Bayesian linear quantile regression
In this section we propose the use of the SEP distribution to implement Bayesian inference for linear quantile regression, combined with the prior distributions specification. Since we are interested in the Lasso penalization problem in order to achieve sparsity within the quantile regression model, we propose as prior distribution for the regression parameters a generalized version of the univariate independent Laplace distribution proposed by Park and Casella (2008) and Hans (2009). In line with Alhamzawi et al. (2012), for each quantile regression parameter we assume a Laplace distribution with a different scale parameter, in order to shrink each regression parameter in a different way. To implement the Bayesian procedure we provide an adaptive MCMC sampling scheme obtained by running a block-move Independent Metropolis within Gibbs.
Let Y = (Y_1, Y_2, ..., Y_T)′ be a random sample of T observations and X_t = (1, X_{t,1}, ..., X_{t,p−1})′, t = 1, 2, ..., T, the associated set of p covariates. Consider the following linear quantile regression model

Y_t = X′_t β_τ + ε_t,  t = 1, 2, ..., T,   (2)

where β_τ = (β_{τ,0}, β_{τ,1}, ..., β_{τ,p−1})′ is the vector of p unknown regression parameters varying with the quantile level τ. Here the ε_t, for any t = 1, 2, ..., T, are independent random variables which are supposed to have zero τ-th quantile and constant variance. Assuming y = (y_1, y_2, ..., y_T)′ is a realization of Y and x_t a realization of X_t, the likelihood function for model (2) based on the SEP distribution (1) with fixed τ can be written as

L_τ(β_τ, σ, α | y, x) = ∏_{t=1}^T (κ_EP(α)/σ) [ exp{ −(1/α) [(x′_t β_τ − y_t)/(2τσ)]^α } 1(y_t ≤ x′_t β_τ) + exp{ −(1/α) [(y_t − x′_t β_τ)/(2(1 − τ)σ)]^α } 1(y_t > x′_t β_τ) ]
= (2σ)^(−T) α^(−T/α) Γ(1 + 1/α)^(−T) exp{ −(1/α) ∑_{t=1}^T [ ((x′_t β_τ − y_t)/(2τσ))^α 1(y_t ≤ x′_t β_τ) + ((y_t − x′_t β_τ)/(2(1 − τ)σ))^α 1(y_t > x′_t β_τ) ] },   (3)

where the parameter µ of equation (1) has been replaced by the regression function µ_t = x′_t β_τ. As discussed in the previous section, due to the properties of the SEP distribution, the regression function x′_t β_τ corresponds to the conditional τ-level quantile of Y_t, i.e. Q_τ(Y_t | X_t = x_t) = x′_t β_τ. In what follows, we omit the subscript τ for the sake of simplicity.
The Bayesian inferential procedure requires the specification of the prior distribution for the unknown vector of parameters Ξ = (β, γ, σ, α). As mentioned before, for the parameters of the regression function we generalize the prior proposed in Park and Casella (2008), assuming the hierarchical structure in (5) and (6), which allows us to efficiently shrink each parameter.
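The likelihood (3) is the quantity the sampling scheme below evaluates repeatedly; a minimal log-likelihood sketch (our own function and variable names, assuming numpy arrays y and X) could read:

```python
import numpy as np
from scipy.special import gammaln

def sep_loglik(beta, sigma, alpha, tau, y, X):
    """Log of likelihood (3): the linear quantile regression model with
    SEP errors, where x_t' beta is the tau-level conditional quantile."""
    r = y - X @ beta                       # residuals y_t - x_t' beta
    # asymmetric rescaling of negative and positive residuals
    z = np.where(r <= 0.0,
                 -r / (2.0 * tau * sigma),
                 r / (2.0 * (1.0 - tau) * sigma))
    T = y.shape[0]
    # log kappa_EP(alpha) minus log(sigma), per observation
    logkappa = -np.log(2.0) - np.log(alpha) / alpha - gammaln(1.0 + 1.0 / alpha)
    return T * (logkappa - np.log(sigma)) - np.sum(z**alpha) / alpha
```

Setting alpha = 1 and tau = 0.5 recovers, up to constants, the symmetric ALD check-loss of standard Bayesian quantile regression.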
The priors for the parameters are:

π(Ξ) = π(β | γ) π(γ) π(σ) π(α),   (4)

with

π(β | γ) ∝ ∏_{j=1}^p L(β_j | 0, γ_j)   (5)
π(γ) ∝ ∏_{j=1}^p G(γ_j | ψ, ϖ)   (6)
π(σ) ∝ IG(a, b)   (7)
π(α) ∝ B(c, d) 1_{(0,2)}(α),   (8)

where β ∈ R^p, (ψ, ϖ, a, b, c, d) are given positive hyperparameters and γ = (γ_1, γ_2, ..., γ_p) collects the parameters of the univariate Laplace distributions:

L(β_j | 0, γ_j) = (γ_j/2) exp{ −γ_j |β_j| } 1_{(−∞,+∞)}(β_j),   (9)

with zero location and scale parameter γ_j. In (6)–(8), G, IG and B denote the Gamma, Inverse Gamma and Beta distributions, respectively. As is known, due to its characteristics the Laplace distribution is the Bayesian counterpart of the Lasso penalization methodology introduced by Tibshirani (1996) to achieve sparsity within the classical regression framework. The original Bayesian Lasso, see also, e.g., Park and Casella (2008) and Hans (2009), introduces the same univariate independent Laplace prior distribution for each regression parameter. Here, as in Alhamzawi et al. (2012), we consider a more general case using the parameters γ_j, j = 1, 2, ..., p, allowing us to overcome the problem that may arise in the presence of regressors with different scales of measurement by shrinking each regression parameter in a different way.
As shown in Park and Casella (2008) and Kozumi and Kobayashi (2011), the Laplace distribution can be expressed as a location–scale mixture of Gaussians, which adapted to our case becomes

L(β_j | 0, γ_j) = ∫_0^∞ (2πω_j)^(−1/2) exp{ −β_j²/(2ω_j) } (γ_j²/2) exp{ −γ_j² ω_j/2 } dω_j,   (10)

for j = 1, 2, ..., p, where the mixing variable ω_j is exponentially distributed with mean 2/γ_j². Furthermore, to retain a parsimonious model parameterization, we introduce a second layer of hierarchical priors for the vector of parameters γ in equation (6).
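The mixture representation (10) is easy to verify by simulation: drawing the mixing variable from an exponential (with mean 2/γ_j² in this parameterization) and then β_j given ω_j from N(0, ω_j) should reproduce Laplace variance and tail probabilities. A Monte Carlo sketch, with our own variable names:

```python
import numpy as np

rng = np.random.default_rng(0)
gam = 1.5                  # Laplace rate parameter gamma_j (a hypothetical value)
n = 400_000

# omega_j ~ exponential with mean 2/gamma_j^2, then beta_j | omega_j ~ N(0, omega_j)
omega = rng.exponential(scale=2.0 / gam**2, size=n)
beta = rng.normal(0.0, np.sqrt(omega))

# the marginal law of beta_j is Laplace with rate gamma_j, for which
# Var(beta_j) = 2/gamma_j^2 and P(|beta_j| > t) = exp(-gamma_j t)
var_error = abs(beta.var() - 2.0 / gam**2)
tail_error = abs(np.mean(np.abs(beta) > 1.0) - np.exp(-gam))
```

Both discrepancies shrink at the usual Monte Carlo rate, confirming that the conditional Gaussian layer (11) with exponential mixing reproduces the Laplace prior (9).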
Using the location–scale representation of the Laplace distribution, the prior structure defined in equations (5)–(6) can be represented as follows:

β | ω ∼ N_p(β | 0_p, Ω)   (11)
ω_j | γ_j ∼ E(ω_j | 2/γ_j²)   (12)
γ_j ∼ G(γ_j | ψ, ϖ),   (13)

where 0_p is a column vector of zeros of dimension p, ω = (ω_1, ω_2, ..., ω_p)′, Ω = diag{ω_j, j = 1, 2, ..., p} and E denotes the exponential distribution. Concerning the specification of the values of the hyperparameters of the prior distributions, vague priors are typically imposed on the scale σ because it is regarded as a nuisance parameter; see e.g. Yu and Moyeed (2001) and Tokdar and Kadane (2012). Concerning the prior specification for the shape parameter α, we impose a Beta distribution with c = 2 and d = 2 in order to allow for a large prior variance without incurring the problem of a U-shaped Beta distribution, which gives large probability mass to extreme values. Moreover, we extend the Beta distribution to cover the support α ∈ (0,
2), where the special case α = 2 allows us to consider the so-called conditional "expectile" of Newey and Powell (1987), while the case α = 1 recovers the conditional quantiles based on the ALD introduced by Yu and Moyeed (2001). As mentioned in Section 2, the parameter α regulates the tail fatness of the SEP distribution, so that smaller values imply larger probabilities of extreme observations. Therefore, choosing α ∈ (0,
2) we encompass both the quantile and the expectile regression issues, addressing at the same time the robustness task by relying on a distribution with fatter tails than the Skew Normal.
In the following section, we introduce the Bayesian parameter estimation procedure, which aims to simulate from the posterior distribution using an Adaptive Independent Metropolis–Hastings MCMC algorithm.
The Bayesian inference is carried out using an adaptive MCMC sampling scheme based on the following posterior distribution:

π(Ξ | y, x) ∝ L_τ(β, σ, α | y, x) π(β | γ) π(γ) π(σ) π(α),   (14)

where L_τ(β, σ, α | y, x) indicates the likelihood function specified in equation (3). After choosing a set of initial values for the parameter vector Ξ^(0), simulations from the posterior distribution of Ξ^(i) at the i-th iteration, for i = 1, 2, ..., are obtained by iteratively running a block-move Independent Metropolis within Gibbs (IMG). The simulation algorithm first requires the specification of a proposal distribution for the parameters (β, σ, α). To propose a move for each block of parameters, we choose the following proposal distributions:

q(β) ∼ N_p(β | µ_β^(i), Σ_β^(i))   (15)
q(σ) ∼ N(σ̃ | µ_σ̃^(i), ψ_σ̃^(i)) |∂σ̃/∂σ|   (16)
q(α) ∼ N(µ_α^(i), ψ_α^(i)) 1_{(0,2)}(α),   (17)

where the scale parameter σ̃ = log(σ) is considered on a log-scale and subsequently transformed back to preserve positiveness. The Jacobian term in equation (16) is required to obtain the distribution induced by the transformation σ = exp(σ̃). At each iteration i = 1, 2, ..., the IMG algorithm proceeds by simulating a candidate draw from each parameter block, i.e. Υ* = (ξ_1*, ξ_2*, ξ_3*) = (β*, σ*, α*), which is subsequently accepted or rejected. The probability of accepting the proposed candidate ξ_j* is

λ(ξ_j^(i−1), ξ_j*) = min{ 1, [ L(ξ_j*, Ξ_{−j}^(i−1) | y, x) π(ξ_j*) q(ξ_j^(i−1)) ] / [ L(Ξ^(i−1) | y, x) π(ξ_j^(i−1)) q(ξ_j*) ] },

for j = 1, 2,
3, where λ(ξ_j^(i−1), ξ_j*) indicates the probability of moving from the old to the proposed state of the chain, π(·) is the relevant prior in equations (5)–(8), and Ξ_{−j}^(i−1) refers to the whole set of parameters at iteration i − 1 without the j-th element of Υ*. To complete the algorithm, we sample (ω_j, γ_j), for j = 1, 2, ..., p, with a Gibbs step, simulating directly from the respective full conditional distributions

ω_j | β_j^(i), γ_j^(i−1) ∼ GIG(1/2, β_j^(i)², γ_j^(i−1)²)
γ_j^(i) | ω_j^(i) ∼ G(ψ + 1, ϖ + ω_j^(i)/2),

where GIG denotes the Generalized Inverse Gaussian distribution. Since most of the statistical properties of the Markov chain, as well as the performance of the Monte Carlo estimators, crucially depend on the definition of the proposal distribution q(·) (see Andrieu and Moulines, 2006, and Andrieu and Thoms, 2008), we improve the basic IMG–MCMC algorithm with an additional step adapting the proposal parameters using the following equations:

µ_β^(i+1) = µ_β^(i) + ς^(i+1) (β − µ_β^(i)),   (18)
Σ_β^(i+1) = Σ_β^(i) + ς^(i+1) [ (β − µ_β^(i)) (β − µ_β^(i))′ − Σ_β^(i) ],   (19)
µ_σ̃^(i+1) = µ_σ̃^(i) + ς^(i+1) (σ̃ − µ_σ̃^(i)),   (20)
ψ_σ̃^(i+1) = ψ_σ̃^(i) + ς^(i+1) [ (σ̃ − µ_σ̃^(i))² − ψ_σ̃^(i) ],   (21)
µ_α^(i+1) = µ_α^(i) + ς^(i+1) (α − µ_α^(i)),   (22)
ψ_α^(i+1) = ψ_α^(i) + ς^(i+1) [ (α − µ_α^(i))² − ψ_α^(i) ],   (23)

where ς^(i+1) denotes a tuning parameter that should be carefully selected at each iteration to ensure the convergence and the ergodicity of the resulting chain (see Andrieu and Moulines, 2006). Roberts and Rosenthal (2007) provide two conditions for the convergence of the chain: the diminishing adaptation condition, which is satisfied if and only if ς^(i) →
0 as i → +∞, and the bounded convergence condition, which essentially guarantees that all transition kernels considered have bounded convergence time. Andrieu and Moulines (2006) show that both conditions are satisfied if and only if ς^(i) ∝ i^(−d) with d ∈ [0.5, 1]; in what follows we take ς^(i) = C i^(−d), where C is set to 10. As argued by Roberts and Rosenthal (2007), together these two conditions ensure asymptotic convergence and a weak law of large numbers for this algorithm.
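A compressed sketch of one adaptation step of the form (18)–(19), with our own function name and a vanishing factor ς^(i) = C i^(−d) capped at one; this illustrates the recursion, not the authors' code:

```python
import numpy as np

def adapt(mu, Sigma, draw, i, C=10.0, d=0.5):
    """One proposal-adaptation step: shift the proposal mean and covariance
    towards the latest draw with a vanishing factor varsigma = C * i**(-d),
    so that diminishing adaptation holds as i grows."""
    varsigma = min(1.0, C * (i + 1.0)**(-d))   # vanishing tuning parameter
    diff = draw - mu
    mu = mu + varsigma * diff                  # mean recursion, cf. (18)
    Sigma = Sigma + varsigma * (np.outer(diff, diff) - Sigma)  # cov recursion, cf. (19)
    return mu, Sigma
```

Run on a stream of draws, the recursion tracks their running mean and covariance while the shrinking step size preserves ergodicity of the adapted chain.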
4. Nonlinear extension
In this section, we propose an additive extension of the robust linear quantile regression model considered previously to the class of Generalized Additive Models (GAM) introduced by Hastie and Tibshirani (1986). We set up the GAM models using the SEP likelihood considered before. In order to define the quantile function we make use of P-Spline functions, ending up with a semi-parametric problem. The Bayesian analysis is carried out by generalizing the Lang and Brezger (2004) second-order random walk prior for the spline coefficients, assuming a multivariate Laplace distribution that accounts for a correlation structure among parameters and is able to address the variable selection issue.
Generalized Additive Models extend multiple linear regression by allowing the response variable to be modeled as a sum of unknown smooth functions of continuous covariates. The aim of this section is to set up a robust, non-linear and semi-parametric framework for quantile regression following a GAM approach using the SEP likelihood. In particular, we assume that the τ-level conditional quantile can be modeled as a parametric component jointly with a sum of smooth functions as follows:

Q_τ(Y_t | X_t = x_t, Z_t = z_t) = x′_t β + ∑_{j=1}^J f_j(z_{j,t}),   (24)

where x′_t β is the parametric component, each f_j(z_{j,t}) is a nonparametric continuous smooth function and z_t = (z_{1,t}, z_{2,t}, ..., z_{J,t})′ is an additional set of covariates. To implement the statistical analysis, we assume that each nonparametric component f_j(z_{j,t}) can be approximated using a polynomial spline of order d with k + 1 equally spaced knots between min(z_j) and max(z_j). Using the well-known representation of splines in terms of linear combinations of B-splines, we can rewrite equation (24) as

Q_τ(Y_t | X_t = x_t, Z_t = z_t) = x′_t β + ∑_{j=1}^J ∑_{ν=1}^{k+d} θ_{j,ν} B_{j,ν}(z_{j,t}),   (25)

where the B_{j,ν}(z_{j,t}) denote B-spline basis functions and the θ_{j,ν} are the unknown coefficients. In this framework, the values of the estimated coefficients and the shape of the fitted functions strongly depend upon the number and the position of the knots. With respect to the position, in the absence of any prior information we consider equidistant knots as a natural choice. Regarding the number of knots, to properly capture the smoothness of the data a careful trade-off needs to be made between too few and too many knots, which may cause underfitting or overfitting, respectively.
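For a single smooth term in (25), an equally spaced B-spline design matrix can be built with scipy; the helper below is a hypothetical illustration of the construction, not code from the paper:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(z, n_knots=10, degree=3):
    """Design matrix of B-spline basis functions for one smooth term:
    equally spaced knots between min(z) and max(z), clamped at the ends,
    giving k + d basis columns as in equation (25)."""
    lo, hi = z.min(), z.max()
    inner = np.linspace(lo, hi, n_knots)            # equally spaced knots
    t = np.r_[[lo] * degree, inner, [hi] * degree]  # clamped knot vector
    n_basis = len(t) - degree - 1
    B = np.empty((z.size, n_basis))
    for nu in range(n_basis):
        c = np.zeros(n_basis)
        c[nu] = 1.0                                 # isolate the nu-th basis
        B[:, nu] = BSpline(t, c, degree, extrapolate=False)(z)
    return np.nan_to_num(B)
```

Stacking such matrices column-wise for j = 1, ..., J, next to the parametric design matrix, gives the full regression representation of the conditional quantile (25).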
A possible solution to this problem is the Penalized Spline (P-Spline) approach proposed by O'Sullivan (1986, 1988) and generalized by Eilers and Marx (1996), which relies on the introduction of a penalty on the first or second differences of the B-spline coefficients. This setting has been embedded in the Bayesian framework by Lang and Brezger (2004), Brezger and Lang (2006) and Brezger and Steiner (2008) using a second-order random walk for the B-spline coefficients, i.e.:

θ_{j,ν} = 2θ_{j,ν−1} − θ_{j,ν−2} + u_{j,ν},  ∀ j = 1, 2, ..., J,  ∀ ν = 3, ..., k + d,   (26)

where the generic stochastic component u_{j,ν} ∼ N(0, h_j), and θ_{j,1} and θ_{j,2} are initialized with diffuse priors, i.e. π(θ_{j,ν}) ∝
1, for ν = 1 ,
2. In their work, Lang and Brezger (2004) assume that the stochastic components u_{j,ν} driving the random walk process are independent, i.e. u_{j,ν} ⊥ u_{k,ν} for all j, k = 1, 2, ..., J with j ≠ k. Since there is no reason to assume a priori that u_{j,ν} and u_{k,ν} are independent (∀ j, k), here we consider an extension of (26) and assume a multivariate Laplace distribution on the vector of spline coefficients, accounting for a correlation structure among them. Moreover, it can easily be proved that the original marginal shrinkage effect is preserved under the assumed prior structure, because each marginal prior reduces to a univariate Laplace; see, e.g., Kotz et al. (2001). Moreover, using the Laplace distribution as prior allows us to extend the Bayesian Lasso approach to estimation.
Let u_j = (u_{j,1}, u_{j,2}, ..., u_{j,k+d})′; we assume u_j ∼ AL_{k+d}(0, h_j I_{k+d}), where AL_{k+d} denotes the multivariate Laplace distribution and I_{k+d} is the identity matrix of dimension k + d. Furthermore, let D_δ be the difference matrix of dimension (k + d − δ) × (k + d), where δ = 2 is the order of the differential operator, such that D_δ θ_j = u_j. Then

π(θ_j | h_j) ∼ AL_{k+d}(0, h_j (D′_δ D_δ)^(−1)),  ∀ j = 1, 2, ..., J,   (27)

having density

π(θ_j | h_j) = C |D′_δ D_δ|^(1/2) h_j^((d+k)/2) exp{ −[h_j θ′_j (D′_δ D_δ) θ_j]^(1/2) },   (28)

where C is a normalizing constant involving √π and Γ((d + k + 1)/2). As shown in Kotz et al. (2001), the multivariate Laplace distribution can be expressed as a location–scale mixture of Gaussians, where the mixing variable follows a Gamma distribution:

π(θ_j | φ_j) ∼ N_{k+d}(0, φ_j (D′_δ D_δ)^(−1))   (29)
π(φ_j | h_j) ∼ G((k + d + 1)/2, h_j/2),   (30)

for j = 1, 2, ..., J. It is easy to show how to retrieve (28) from (29)–(30) by integrating out the augmented variable φ_j, i.e.
π(θ_j | h_j) = ∫_0^∞ |D′_δ D_δ|^(1/2) (2πφ_j)^(−(k+d)/2) exp{ −θ′_j (D′_δ D_δ) θ_j / (2φ_j) } × (h_j/2)^((d+k+1)/2) [Γ((d+k+1)/2)]^(−1) φ_j^((d+k−1)/2) exp{ −h_j φ_j/2 } dφ_j
= (h_j/2)^((d+k+1)/2) |D′_δ D_δ|^(1/2) (2π)^(−(k+d)/2) [Γ((d+k+1)/2)]^(−1) × ∫_0^∞ φ_j^(−1/2) exp{ −[ θ′_j (D′_δ D_δ) θ_j / (2φ_j) + h_j φ_j/2 ] } dφ_j,   (31)

where the integrand in equation (31) is proportional to a Generalized Inverse Gaussian distribution GIG(p, a, b) with parameters p = 1/2, a = h_j and b = θ′_j (D′_δ D_δ) θ_j, from which we have

∫_0^∞ φ_j^(−1/2) exp{ −[ θ′_j (D′_δ D_δ) θ_j / (2φ_j) + h_j φ_j/2 ] } dφ_j = 2 K_{1/2}( [h_j θ′_j (D′_δ D_δ) θ_j]^(1/2) ) ( θ′_j (D′_δ D_δ) θ_j / h_j )^(1/4),

with K_{1/2}(z) = (π/(2z))^(1/2) exp{−z}. Substituting this latter expression into equation (31) we obtain

π(θ_j | h_j) ∝ |D′_δ D_δ|^(1/2) h_j^((d+k)/2) [Γ((d+k+1)/2)]^(−1) exp{ −[h_j θ′_j (D′_δ D_δ) θ_j]^(1/2) },   (32)

which corresponds to the multivariate Laplace density defined in equations (27)–(28). The proposed prior distribution for θ_j corresponds to the group lasso penalty of Yuan and Lin (2006), Meier et al. (2008) and Li et al. (2010), which accounts for the group structure when performing variable selection. It is worth emphasizing that, in our context, the grouped variables have a natural interpretation because they correspond to the knots accounting for the smoothness level of the same regressor over different regions of the support. The overall smoothness of the fitted curve is controlled by the variance of the error term h_j, which corresponds to the inverse of the penalization parameter used by Eilers and Marx (1996) in the frequentist framework.
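The difference operator D_δ appearing in (27) can be obtained by applying np.diff to an identity matrix; for δ = 2, a linear coefficient vector incurs zero penalty, which is exactly the shrinkage-towards-linearity behaviour of the second-order random walk prior (a small sketch with our own function name):

```python
import numpy as np

def diff_matrix(m, delta=2):
    """Difference operator D_delta of dimension (m - delta) x m, so that
    D_2 @ theta returns the second differences of theta."""
    D = np.eye(m)
    for _ in range(delta):
        D = np.diff(D, axis=0)   # each pass takes first differences of the rows
    return D

# penalty theta' D'D theta = sum of squared second differences of theta
m = 6
D = diff_matrix(m, 2)
theta = np.arange(m, dtype=float)          # a linear sequence of coefficients
print(theta @ D.T @ D @ theta)             # → 0.0 (linear trends are unpenalized)
```

Any curvature in theta, by contrast, is penalized in proportion to its squared second differences.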
We choose a conjugate Inverse Gamma prior for h_j, that is h_j ∼ IG(a^(h), b^(h)), with a^(h) and b^(h) set to small values so that the prior on h_j is diffuse. Assuming the prior structure defined in equations (5)–(6) for the scale and shape parameters (σ, α), we then have the following hierarchical model:

y_t = x′_t β + ∑_{j=1}^J B_{z_j} θ_j + ǫ_t,  ǫ_t ∼ SEP(0, σ, α, τ)   (33)
β | ω ∼ N_p(β | 0_p, Ω)   (34)
ω_k | γ_k ∼ E(ω_k | 2/γ_k²)   (35)
γ_k ∼ G(γ_k | ψ, ϖ),  ∀ k = 1, 2, ..., p   (36)
θ_j | φ_j ∼ N_{k+d}(0, φ_j (D′_δ D_δ)^(−1)),  φ_j | h_j ∼ G((d + k + 1)/2, h_j/2)   (37)
h_j ∼ IG(a^(h), b^(h)),  ∀ j = 1, 2, ..., J,   (38)

where B_{z_j} = (B_{j,1}(z_{j,t}), ..., B_{j,k+d}(z_{j,t})). In order to perform the Bayesian inference, the adaptive MCMC algorithm proposed in Section 3.2 is slightly modified to deal with the simulation from the posterior distribution of the generalized quantile GAM parameters. The posterior distribution becomes

π(Ξ | y, x, z) ∝ L_τ(β, σ, α, ϑ | y, x, z) π(β | γ) π(γ) π(ϑ | φ) π(φ | h) π(h) π(σ) π(α),   (39)

where the vector Ξ now contains three additional sets of parameters, namely ϑ = (θ_1, θ_2, ..., θ_J), φ = (φ_1, φ_2, ..., φ_J) and h = (h_1, h_2, ..., h_J). The likelihood function L_τ(β, σ, α, ϑ | y, x, z), defined in equation (3) for the linear model, is adapted to account for the additional spline coefficients. Here, to perform the Bayesian analysis, it is necessary to add three more steps to the algorithm described in Section 3.2. Specifically, after having updated all the parameters of the linear part of the model, a candidate for θ_j, for j = 1, 2, ..., J, is drawn from a Gaussian proposal distribution, i.e. q(θ_j^(i−1), θ_j*) ∼ N_{k+d}(µ_{θ_j}^(i), Σ_{θ_j}^(i)), and accepted on the basis of the following acceptance probability:

λ(θ_j^(i−1), θ_j*) = min{ 1, [ L(β^(i−1), σ^(i−1), α^(i−1), θ_j*, ϑ_{−j}^(i−1) | y, x, z) π(θ_j*) q(θ_j^(i−1)) ] / [ L(β^(i−1), σ^(i−1), α^(i−1), ϑ^(i−1) | y, x, z) π(θ_j^(i−1)) q(θ_j*) ] },

where ϑ_{−j} denotes the whole set of B-spline coefficients without the j-th component. Furthermore, as specified in Section 3.2 for the regression parameters, an adaptive step for the mean and the variance of the proposal distribution of each θ_j is implemented using the following equations:

µ_{θ_j}^(i+1) = µ_{θ_j}^(i) + ς^(i+1) (θ_j − µ_{θ_j}^(i)),   (40)
Σ_{θ_j}^(i+1) = Σ_{θ_j}^(i) + ς^(i+1) [ (θ_j − µ_{θ_j}^(i)) (θ_j − µ_{θ_j}^(i))′ − Σ_{θ_j}^(i) ],   (41)

for j = 1, 2, ..., J, where ς is the vanishing factor fixed as discussed above. The hyperparameters (φ, h) are updated by single-move Gibbs sampling steps, simulating from the following full conditionals, which are proportional to the GIG distribution:

φ_j | θ_j^(i), h_j^(i−1) ∼ GIG( 1/2, h_j^(i−1), θ′_j (D′_δ D_δ) θ_j )
h_j | φ_j^(i) ∼ GIG( (k + d + 1)/2 − a^(h), φ_j^(i), 2 b^(h) ).
5. Simulation Studies
In this section, simulation studies are performed to highlight the improvements in robustness obtained by implementing SEP-based quantile regression compared with traditional Bayesian quantile regression based on the ALD distribution. More specifically, our purpose is to illustrate how the SEP assumption in the quantile regression framework generates posterior distributions of the regression parameters centered on the true values even under model misspecification. The first simulation experiment assesses the robustness properties of the proposed methodology for quantile estimation when the joint distribution of the couple (Y_i, X_i), for i = 1, 2, ..., T, is contaminated by the presence of outliers. The second study shows the effectiveness of the shrinkage obtained by imposing the proposed Lasso-type prior when a multiple quantile linear model is the main concern. The last experiment aims at highlighting the ability of the model to adapt to non-linear shapes when data come from heterogeneous fat-tailed distributions. The hyperparameters of the prior distributions are all chosen such that the priors are non-informative. In particular, for the nuisance parameter σ we choose a = b = 0.001, which corresponds to a proper Inverse Gamma distribution with infinite second moment. When the Lasso prior is assumed, the hyperparameters (ψ, ϖ) in the Gamma priors for γ_j are chosen to be close to zero.

For this experiment we consider a sample of T = 100 drawn from the following homoskedastic mixture of distributions

f((Y, X)'; η, μ_1, μ_2, ..., μ_L, Σ) = Σ_{l=1}^{L} η_l φ(μ_l, Σ),   (42)

where φ denotes the density function of a Gaussian distribution with mean μ_l and variance-covariance matrix Σ, and η = (η_1, ..., η_L) is the vector of mixture weights. We fix the number of components equal to L = 3; the mixture weights η, the component means μ_1, μ_2, μ_3 and the common variance-covariance matrix Σ are fixed so that the two external components play the role of outlier generators (equation (43)). The quantile regression model implemented is a simple model with only one exogenous variable, i.e. Y_t = X_t β + ε_t for t = 1, 2, ..., T. The aim of this toy example is to show the performance of the Bayesian quantile linear regression analysis, assuming both the ALD and the SEP likelihood, when the data are strongly contaminated by the presence of outliers. Since we have only one regressor, for this illustrative example we use a simplified version of the sampler proposed in Section 3.2, in which a simple Gaussian prior is considered for β. For τ = (0.1, 0.5, 0.9) we run the MCMC algorithm with N = 50,000 iterations and a burn-in of M = 10,000. The starting values of (β, σ) are randomly drawn from a N(0, 1) and an Inverse Gamma distribution respectively, while the starting value of α, required only for the SEP distribution case, is randomly drawn from a Beta distribution.

Figure 2 (panels (a)-(c) for τ = 0.1, 0.5, 0.9, with T = 100): Contaminated data example. Comparison between Bayesian quantile regression based on the ALD (blue) and the SEP distribution (red) for different values of the quantile confidence level. Shaded areas denote 95% HPD credible sets.

It can be easily observed that the ALD curve and the SEP one overlap for τ = 0.5: there the estimated α is very close to one, which implies that the SEP reduces to the ALD distribution.

Parameter    ALD                              SEP
             τ=0.1    τ=0.5    τ=0.9          τ=0.1    τ=0.5    τ=0.9
β_0          -0.391   1.149    2.688          0.186    1.144    2.011
             (0.176)  (0.093)  (0.237)        (0.089)  (0.086)  (0.100)
β_1          (0.151)  (0.106)  (0.144)        (0.094)  (0.093)  (0.074)
σ            (0.386)  (0.212)  (0.349)        (0.153)  (0.112)  (0.150)
α            -        -        -              0.596    0.832    0.504
                                              (0.094)  (0.142)  (0.068)

Table 1: Contaminated data example. Estimated parameters for different levels of the quantile confidence level τ = (0.1, 0.5, 0.9) and T = 100. Standard deviations in parentheses.

As far as we move away from the median, the differences in the estimated regression quantile parameters under the ALD and the SEP assumptions become notable. Looking at subplots 2(a) and 2(c), it is evident that the intercept and the slope of the regression line obtained using the ALD distribution are strongly influenced by the 7.
25% of the outliers coming from the two external components of the mixture. In those cases the estimated α of the SEP is considerably smaller than one. The estimation of the β parameters is therefore made under a distribution with fatter tails than the ALD, strongly penalizing the most extreme observations and providing, as a consequence, more robust results. Table 1 contains the estimated posterior means and standard deviations of the regression parameters under the ALD and the SEP assumptions. Under the data generating process the theoretical slope should always be equal to 0.

In this section, we carry out a Monte Carlo simulation study specifically tailored to evaluate the performance of the model when the Lasso prior (5) is considered for the regression parameters. The simulations are similar to the ones proposed in Li et al. (2010) and Alhamzawi et al. (2012). Specifically, we simulate T = 200 observations from the linear model Y_t = X_t'β + ε_t, where the true values of the regression coefficients are set as follows:

Simulation 1. β = (3, 1.5, 0, 0, 2, 0, 0, 0)',
Simulation 2. β = (0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85)',
Simulation 3. β = (5, 0, 0, 0, 0, 0, 0, 0)'.

The first simulation corresponds to a sparse regression case, the second to a dense case, while the third to a very sparse one. The covariates are independently generated from a N(0, Σ) with σ_{i,j} = 0.5^|i-j|. Two different choices for the distribution of the error terms of the generating process are considered for each simulation study. The first choice is a Gaussian distribution N(μ, σ²), with μ chosen so that the τ-th quantile is 0, while σ² is set to 9, as in Li et al. (2010). The second choice is a Generalized Student's t distribution GS(μ, σ², ν) with two degrees of freedom, i.e. ν = 2, σ² = 9 and μ chosen so that the τ-th quantile is 0. For three different quantile levels, τ = (0.1, 0.5, 0.9), we estimate the model for each choice of β and each choice of the error term. Table 2 reports the median of mean absolute deviations (MMAD), i.e.
median( (1/T) Σ_{t=1}^{T} |x_t' β̂ - x_t' β| ), and the medians of the parameter estimates β̂ over 50 estimates. To be concise, only results for Simulation 1 are reported, since results from the other two simulations are similar. It is immediate to see that the proposed Bayesian quantile regression method based on the SEP likelihood performs better in terms of MMAD for both distributions of the error term. This evidence confirms that the presence of the shape parameter α in the likelihood allows the behavior of the data to be captured better. The estimated shape parameter is indeed greater than 1 in the Gaussian case and lower than 1 in the Generalized Student case, giving a more reliable estimate of the vector β regardless of the tail weight of the distribution of the error term. These results are confirmed in Simulations 2 and 3 (not reported here), in which we push the density and the sparsity of the predictor structure further. Furthermore, looking at the estimated values of β, we can see that the proposed robust method reduces the bias of the estimates for all the quantile confidence levels. Concerning the shrinkage ability of the proposed estimator, we observe that where the true parameters are zero, the SEP distribution is able to identify them better than the ALD.

Table 2: Multiple regression simulated data example 1. MMADs and estimated parameters for Simulation 1 under the SEP and ALD assumption for the quantile error term.
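As a sketch of this study's design, the code below simulates one replication of Simulation 1 and evaluates the MMAD criterion. The β pattern and the AR-type covariance σ_{i,j} = 0.5^|i-j| follow Li et al. (2010); the helper names and the Gaussian error choice are ours.

```python
import numpy as np

def simulate_design(T=200, beta=None, rho=0.5, noise_sd=3.0, rng=None):
    """One replication of the lasso quantile-regression study:
    covariates from N(0, Sigma) with Sigma_ij = rho^|i-j|;
    beta defaults to the sparse pattern of Simulation 1."""
    rng = np.random.default_rng(rng)
    if beta is None:
        beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    p = beta.size
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=T)
    y = X @ beta + noise_sd * rng.standard_normal(T)
    return X, y, beta

def mmad(X, beta_true, beta_hats):
    """Median over replications of (1/T) * sum_t |x_t' beta_hat - x_t' beta|."""
    fit_true = X @ beta_true
    mads = [np.mean(np.abs(X @ b - fit_true)) for b in beta_hats]
    return float(np.median(mads))
```

A smaller MMAD therefore means the fitted quantile hyperplane is, in a typical replication, closer to the true one over the observed design points.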
In this simulation example we illustrate the performance of the model assumptions when a simple GAM is considered, with a single continuous smooth non-linear function and the parametric linear components set to zero. Following Chen and Yu (2009), we consider two data generating processes y_t = f_j(x_t) + ε_t, for t = 1, 2, ..., T and j = 1, 2, where f_1 represents the wave function and f_2 the doppler function, defined as follows:

f_1(x) = [ 4 (x - 0.5)² + 2 exp( -256 (x - 0.5)² ) ] 1_(0,1)(x)   (44)
f_2(x) = [ x (1 - x) ]^(1/2) sin( 2π (1 + γ) / (x + γ) ) 1_(0,1)(x),   (45)

with γ = 0.15. These functions are commonly used (see also Denison et al., 1998) to check the non-linear fitting ability of a proposed model. Starting from these two curves, we generate samples of T = 200 and T = 512 observations for the wave and the doppler functions, respectively, using four different sources of error:

Gaussian noise, ε_t ~ N(0, 1):   y_t = f_1(x_t) + σ_1 ε_t,   y_t = f_2(x_t) + σ_2 ε_t,   (46)
Student-t noise, ε_t ~ T_ν(0, 1):   y_t = f_1(x_t) + σ_1 ε_t,   y_t = f_2(x_t) + σ_2 ε_t,   (47)
Linear heterogeneity, ε_t ~ T_ν(0, 1):   y_t = f_1(x_t) + σ_1 (1 + x_t) ε_t,   y_t = f_2(x_t) + σ_2 (1 + x_t) ε_t,   (48)
Quadratic heterogeneity, ε_t ~ T_ν(0, 1):   y_t = f_1(x_t) + σ_1 x_t² ε_t,   y_t = f_2(x_t) + σ_2 x_t² ε_t,   (49)

where σ_1 and σ_2 are fixed scale constants and ν = 2. All the considered model specifications are estimated using penalized P-splines of order 4, imposing a relatively large number of equally spaced knots and a penalization parameter δ = 2, as suggested by Eilers et al. (1996). In particular, we use 20 knots for the wave function and 25 knots for the doppler one, because of the presence of many change points. The sampling process is performed using 10,000 iterations, with the first 5,000 as burn-in. Table 3 shows the averages and standard errors, over 50 repeats, of the mean squared errors (mse) at three different quantile levels for all the described curves. It can be noted that the SEP outperforms the ALD almost uniformly in terms of mse. The difference between the two curves is less evident in the presence of Gaussian errors, where the ALD also shows a smaller mse for the extreme quantiles of the wave function. The improvement in terms of estimation bias becomes greater for more heavy-tailed and heteroskedastic error distributions. Concerning the wave function, the SEP shows a mse that is equal to half of that obtained with the ALD at the extreme quantiles. The same conclusions can be drawn for the doppler function, which is generally better estimated than the wave.
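The two test curves and the four noise scenarios above can be sketched as follows. Note that the exact constants in (44)-(45) were partially lost in extraction, so the function forms below follow the standard wave and doppler test functions and should be read as an assumption; the scale σ is also a placeholder value.

```python
import numpy as np

def wave(x):
    # wave test function (reconstructed form; coefficients assumed)
    return 4.0 * (x - 0.5) ** 2 + 2.0 * np.exp(-256.0 * (x - 0.5) ** 2)

def doppler(x, gamma=0.15):
    # doppler test function with gamma = 0.15
    return np.sqrt(x * (1.0 - x)) * np.sin(2.0 * np.pi * (1.0 + gamma) / (x + gamma))

def add_noise(f_vals, x, kind, sigma=0.5, nu=2, rng=None):
    """Four noise scenarios of the simulation study: Gaussian,
    Student-t (nu = 2), linear and quadratic heteroskedastic t noise."""
    rng = np.random.default_rng(rng)
    if kind == "gaussian":
        eps = rng.standard_normal(x.size)
        scale = sigma
    else:
        eps = rng.standard_t(nu, size=x.size)
        scale = {"student": sigma,
                 "linear": sigma * (1.0 + x),
                 "quadratic": sigma * x ** 2}[kind]
    return f_vals + scale * eps
```

The heteroskedastic scenarios make the noise scale grow with x, which is what separates the SEP and ALD fits most clearly in Table 3.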
6. Empirical applications
Three empirical datasets are analyzed in this section: the Boston Housing, Munich Rent and Barro growth data. The first dataset is characterized by the presence of many regressors, which allows us to emphasize the usefulness of introducing a lasso prior for the regression parameters. The second one also has a large set of regressors, but some of them are characterized by a non-linear relation with the response variable. For this dataset we highlight that the assumption of a lasso prior within a robust quantile GAM framework leads to a more precise estimation process. Finally, we propose the use of our robust quantile lasso GAM model to study the Barro growth data, assuming a non-linear representation for some regressors. We find a new and interesting interpretation of the regression parameters, while the neoclassical convergence hypothesis is maintained.

Model   Noise        Wave                             Doppler
                     τ=0.1     τ=0.5     τ=0.9        τ=0.1     τ=0.5     τ=0.9
ALD     Gaussian     0.0054    0.0022    0.0039       0.0002    0.0000    0.0002
                     (0.0171)  (0.0070)  (0.0124)     (0.0007)  (0.0002)  (0.0005)
        Student-t    0.0504    0.0034    0.0177       0.0009    0.0001    0.0177
                     (0.1593)  (0.0108)  (0.0561)     (0.0027)  (0.0045)  (0.0015)
        Lin. Het.    0.1035    0.0054    0.0627       0.0059    0.0002    0.0039
                     (0.3273)  (0.0170)  (0.1979)     (0.0180)  (0.0010)  (0.0124)
        Quad. Het.   0.0505    0.0067    0.0752       0.0018    0.0001    0.0050
                     (0.1598)  (0.0210)  (0.2377)     (0.0058)  (0.0006)  (0.0160)
SEP     Gaussian     0.0071    0.0020    0.0078       0.0002    0.0000    0.0005
                     (0.0226)  (0.0063)  (0.0248)     (0.0006)  (0.0002)  (0.0016)
        Student-t    0.0251    0.0037    0.0132       0.0007    0.0001    0.0006
                     (0.0795)  (0.0117)  (0.0417)     (0.0022)  (0.0045)  (0.0019)
        Lin. Het.    0.0986    0.0046    0.0678       0.0008    0.0003    0.0006
                     (0.3118)  (0.0145)  (0.2144)     (0.0026)  (0.0010)  (0.0020)
        Quad. Het.   0.0234    0.0057    0.0305       0.0011    0.0002    0.0008
                     (0.0741)  (0.0180)  (0.0965)     (0.0035)  (0.0006)  (0.0026)

Table 3: Non-linear regression simulated data example. MSE of the fitted curves with four sources of noise evaluated over the 200 synthetic replications. Standard deviations in parentheses.
In this section we analyze the Boston Housing data, first considered by Harrison and Rubinfeld (1978) to study the influence of pollution on house prices. In particular, in this paper we consider the corrected data of Li et al. (2010). The model is based on the log-transformed corrected median values of owner-occupied housing (values in USD 1000) as dependent variable, while several exogenous variables are taken into account: the point longitudes and latitudes in decimal degrees (LON and LAT respectively), the per capita crime rate (CRIM), the proportions of residential land zoned and of non-retail business acres per town (ZN and INDUS respectively), a dummy equal to 1 if the tract borders the Charles River (CHAS), the nitric oxides concentration (NOX), the average number of rooms per dwelling (RM), the proportion of owner-occupied units built prior to 1940 (AGE), the weighted distances to five Boston employment centers (DIS), the index of accessibility to radial highways per town (RAD), the full-value property-tax rate per town (TAX), the pupil-teacher ratio per town (PTRATIO), the transformed Black population proportion (B) and the percentage of lower status population (LSTAT) as regressors. To provide a complete description of the conditional distribution of the response variable we consider five different choices of τ, i.e. 0.10, 0.25, 0.50, 0.75, 0.90. Moreover, in order to show the advantage of assuming a Lasso prior for the regression parameters and to assess its performance, we also consider a Gaussian prior distribution. Results are shown in Table 4, where it is evident that, independently of the choice of the prior

Variable   Gaussian Prior                                       Lasso Prior
           τ=0.10   τ=0.25   τ=0.50   τ=0.75   τ=0.90           τ=0.10   τ=0.25   τ=0.50   τ=0.75   τ=0.90
LON        (0.0364) (0.0416) (0.0450) (0.0555) (0.0434)         (0.0166) (0.0172) (0.0187) (0.0176) (0.0157)
LAT        -0.0250  0.0251   0.0386   0.0531   0.0816           0.0103   0.0221   0.0168   0.0121   0.0238
           (0.0582) (0.0678) (0.0732) (0.0937) (0.0729)         (0.0276) (0.0290) (0.0311) (0.0296) (0.0262)
CRIM -0.0230 -0.0177 -0.0093 -0.0063 -0.0058 -0.0241 -0.0178 -0.0093 -0.0059 -0.0032 (0.0032) (0.0027) (0.0015) (0.0016) (0.0021) (0.0028) (0.0027) (0.0014) (0.0017) (0.0014)
ZN 0.0000 0.0005 0.0008 0.0012 0.0012 -0.0000 0.0006 0.0009 0.0010 0.0009 (0.0003) (0.0003) (0.0004) (0.0004) (0.0003) (0.0003) (0.0003) (0.0004) (0.0003) (0.0002)
INDUS 0.0030 0.0023 0.0027 0.0014 -0.0013 0.0016 0.0017 0.0016 0.0010 -0.0027 (0.0016) (0.0014) (0.0016) (0.0017) (0.0013) (0.0015) (0.0014) (0.0016) (0.0017) (0.0011)
CHAS 0.0482 0.0464 0.0597 0.0737 0.1134 0.0183 0.0377 0.0411 0.0448 0.0834 (0.0264) (0.0197) (0.0264) (0.0308) (0.0302) (0.0190) (0.0205) (0.0236) (0.0275) (0.0312)
NOX -0.3667 -0.2642 -0.3672 -0.4465 -0.3204 -0.0397 -0.0604 -0.0480 -0.0416 -0.0222 (0.1324) (0.0946) (0.1119) (0.1424) (0.1191) (0.0463) (0.0575) (0.0551) (0.0524) (0.0395)
RM 0.2050 0.2259 0.2139 0.2007 0.2067 0.2318 0.2299 0.2129 0.2262 0.2347 (0.0258) (0.0141) (0.0175) (0.0255) (0.0154) (0.0157) (0.0131) (0.0173) (0.0201) (0.0142)
AGE -0.0013 -0.0016 -0.0009 -0.0000 0.0006 -0.0019 -0.0018 -0.0011 -0.0008 0.0002 (0.0005) (0.0003) (0.0004) (0.0006) (0.0003) (0.0003) (0.0003) (0.0004) (0.0005) (0.0003)
DIS -0.0337 -0.0342 -0.0330 -0.0337 -0.0313 -0.0279 -0.0303 -0.0268 -0.0267 -0.0257 (0.0063) (0.0054) (0.0062) (0.0063) (0.0043) (0.0050) (0.0047) (0.0061) (0.0054) (0.0033)
RAD 0.0113 0.0100 0.0074 0.0106 0.0118 0.0103 0.0089 0.0077 0.0095 0.0099 (0.0025) (0.0018) (0.0024) (0.0022) (0.0019) (0.0022) (0.0016) (0.0027) (0.0021) (0.0014)
TAX -0.0007 -0.0006 -0.0005 -0.0004 -0.0004 -0.0007 -0.0006 -0.0005 -0.0005 -0.0004 (0.0001) (0.0001) (0.0001) (0.0001) (0.0001) (0.0001) (0.0001) (0.0001) (0.0001) (0.0001)
PRATIO -0.0295 -0.0292 -0.0318 -0.0302 -0.0253 -0.0300 -0.0270 -0.0280 -0.0245 -0.0199 (0.0037) (0.0032) (0.0039) (0.0047) (0.0033) (0.0024) (0.0027) (0.0037) (0.0035) (0.0027)
B 0.0006 0.0006 0.0007 0.0007 0.0006 0.0006 0.0006 0.0007 0.0008 0.0009 (0.0001) (0.0001) (0.0001) (0.0001) (0.0002) (0.0001) (0.0001) (0.0001) (0.0001) (0.0001)
LSTAT      -0.0186  -0.0172  -0.0189  -0.0195  -0.0191          -0.0161  -0.0173  -0.0205  -0.0179  -0.0163
           (0.0027) (0.0019) (0.0023) (0.0028) (0.0015)         (0.0018) (0.0019) (0.0023) (0.0025) (0.0015)
σ          (0.0243) (0.0198) (0.0208) (0.0251) (0.0222)         (0.0208) (0.0197) (0.0208) (0.0237) (0.0202)
α          (0.0603) (0.0558) (0.0620) (0.0627) (0.0390)         (0.0470) (0.0541) (0.0602) (0.0569) (0.0335)

Table 4: Linear regression model results for the Boston dataset. Standard deviations in parentheses.

distribution, all the variables appear in the table with signs in line with previous studies on the same dataset. Nevertheless, the Lasso prior should be preferred for at least two reasons: first, it uniformly provides smaller posterior standard errors; second, the estimated coefficients appear to be more reliable at extreme quantile levels, i.e. τ = 0.10 and τ = 0.90.

In Section 5.3 we provided empirical evidence that the SEP distribution produces more reliable estimates of the conditional quantile in the presence of heteroskedasticity and heavy tails. To provide a real data example we analyze the well-known 2003 Munich rental dataset, notoriously characterized by the presence of heterogeneous variability. Furthermore, several analyses of this dataset (see for example Kneib et al., 2011, and Mayr et al., 2012) showed the presence of spatial effects, modeled by considering a parameter for each of the 380 districts of Munich. For this reason the parameter space handled is quite wide, highlighting the need for a variable selection approach. Here, therefore, we assume a Lasso prior distribution on the unknown parameters, in line with the one proposed above, and we compare its performance with a Gaussian prior assumption. The response variable is the rent in Euro per square meter for a flat in Munich. Two sets of covariates describe linear and non-linear relations between the rent and its determinants. The linear predictors are a set of 13 dummies for the goodness of location, the goodness of rooms and the number of rooms in the flat.
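The remaining covariates enter as penalized splines, whose Gaussian prior involves the difference-penalty matrix D_δ'D_δ of equation (37). A minimal construction, with δ = 2 as in the simulation studies and K denoting the number of spline coefficients:

```python
import numpy as np

def difference_penalty(K, delta=2):
    """P-spline penalty matrix D_delta' D_delta for K spline coefficients,
    where D_delta is the delta-th order difference operator."""
    D = np.diff(np.eye(K), n=delta, axis=0)  # shape (K - delta, K)
    return D.T @ D
```

The matrix has rank K - delta (it annihilates polynomials of degree below delta), so the implied prior is partially improper; in practice a small ridge term can be added to make the prior precision invertible, an implementation detail not specified here.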
The floor size and the year of construction have instead a non-linear impact on the response variable. Finally, the spatial location of the flat allows us to implement a geoadditive model of the kind introduced by Kammann and Wand (2003). To this aim we use a Bayesian semi-parametric quantile regression model with a spatial effect similar to the one considered in Rue and Held (2005) and Yue and Rue (2011), among others. A complete description of the dataset can be found in Rue and Held (2005). We estimate the τ-th conditional quantile of the rent r_t, i.e., Q_τ(r_t | x_t, z_t), from the following model:

r_t = q_{t,τ} + ε_t,
q_{t,τ} = x_t' β_τ + f_{s,τ}(z_{s,t}) + f_{y,τ}(z_{y,t}) + f_{l,τ}(z_{l,t}),   (50)

where t = 1, 2, ..., T, ε_t is the error term with zero τ-th quantile and constant variance, x_t is the whole set of dummies treated as linear parametric predictors, z_t = (z_{s,t}, z_{y,t}, z_{l,t}) are the predictor variables for the "size", "year" and "spatial" effects, while f_{s,τ}, f_{y,τ} and f_{l,τ} are their non-linear functions. The estimation for the three quantile confidence levels τ = (0.25, 0.50, 0.
75) has been performed using the adaptive MCMC procedure for GAM models described in Subsection 4.2. Figures 3 and 4 show the estimated non-linear effects of the year of construction and the floor size using the Gaussian and the Lasso prior, respectively. The points below each sub-figure represent the available observations for each value of the covariate, while dotted lines represent the 95% posterior credible intervals. We can observe that both priors provide similar estimated splines for the effect of the floor size on the house prices. It can be seen that small flats (less than 40 m²) have a very high rent per square meter, while for big flats the rent remains almost unchanged. Concerning the year of construction, the estimated splines are apparently quite different under the two prior specifications, but they actually contain similar information. In fact, looking at the level of the variable and the confidence intervals, the effect of this covariate can be approximately considered equal to zero until 1990, when a clear positive and increasing effect appears under both prior specifications. Figure 5 displays the estimated spatial effects of the 380 subquarters of Munich. Values are normalized to lie in the range (0,
1).

Figure 3: Estimated non-linear effects under the Gaussian prior; panels (a)-(c) year of construction and (d)-(f) floor size, for the three quantile levels.
Figure 4: Estimated non-linear effects under the Lasso prior; panels (a)-(c) year of construction and (d)-(f) floor size, for the three quantile levels.

The linear parametric effects are shown in Table 5. The signs of the variables are in line with previous works, but new interesting results emerge under the Lasso prior in the estimation of the effects of "No hot water", "No central heating" and "6 Rooms". In particular, the Lasso prior further discriminates the effect of these variables across quantiles. Indeed, we can see that the absence of hot water and the presence of 6 rooms have a statistically significant effect only on expensive houses, i.e. for τ = 0.
75, while the opposite occurs considering the absence of central heating. We think that these results highlight more consistently the variety of consumption choices due to different budget constraints. It is worth noting that the Lasso prior correctly shrinks the effect of "Special bathroom interior", which is not very significant when estimated using the Gaussian prior.

Figure 5: Estimated spatial effects; panels (a)-(f) for the different quantile levels and prior specifications.

As a final application, we analyze the dataset related to the international economic growth model first considered by Barro and Sala-i-Martin (1995) and extended to the quantile regression framework by Koenker and Machado (1999). Since standard OLS models do not provide a clear result about the convergence hypothesis of neoclassical growth models, several papers have analyzed growth equations using quantile regression techniques, with prominent results. In their paper, Barreto and Hughes (2004) show that the determinants of economic growth for countries in the left or right tails of the distribution are very different from those at the mean. Mello and Perrelli (2003) use quantile regression to find evidence in favor of the convergence hypothesis for countries in the upper quantiles of the conditional distribution of the response variable using the Barro growth model (Barro, 1991). Finally, Laurini (2007) uses spline functions in testing the convergence hypothesis with a dataset of Brazilian municipalities. To the best of our knowledge, this is the first attempt to propose a Bayesian quantile Lasso GAM model in order to study the impact of both linear and non-linear effects of the covariates on cross-country GDP growth using the Barro and Sala-i-Martin (1995) model. The dataset contains 161 world nations

Variable                    Gaussian Prior                      Lasso Prior
                            τ=0.25   τ=0.50   τ=0.75           τ=0.25   τ=0.50   τ=0.75
                            (0.0925) (0.0880) (0.0857)         (0.1230) (0.1124) (0.1039)
Excellent location          1.4213   1.6999   1.9381           1.4136   1.6305   1.8450
                            (0.2770) (0.2879) (0.2829)         (0.2376) (0.2454) (0.2527)
No hot water                -1.3361  -1.8499  -2.2199          -0.0353  -0.0335  -2.7652
                            (0.2410) (0.2664) (0.2738)         (0.0475) (0.0454) (0.3731)
No central heating          -1.5449  -1.4206  -1.0610          -1.9830  -2.0557  -0.1316
                            (0.1759) (0.1957) (0.1867)         (0.1872) (0.2076) (0.1193)
No tiles in bathroom        -0.4260  -0.5792  -0.5942          -0.1597  -0.3277  -0.2576
                            (0.1079) (0.1091) (0.1140)         (0.1143) (0.1453) (0.1575)
Special bathroom interior   0.3926   0.3803   0.4897           0.0598   0.0824   0.0598
                            (0.1462) (0.1580) (0.1489)         (0.0606) (0.0779) (0.0581)
Special kitchen interior    0.9145   1.1405   1.2480           1.0824   1.3077   1.3153
                            (0.1787) (0.1564) (0.1740)         (0.3355) (0.2239) (0.1989)
                            (0.1754) (0.1700) (0.1751)         (0.1926) (0.1729) (0.1652)
                            (0.1002) (0.1024) (0.0922)         (0.1154) (0.1126) (0.1084)
                            (0.0998) (0.0927) (0.0947)         (0.1159) (0.1101) (0.1076)
                            (0.1459) (0.1404) (0.1490)         (0.1557) (0.1644) (0.1601)
                            (0.2659) (0.3047) (0.2956)         (0.4154) (0.3538) (0.3514)
                            (0.4450) (0.4479) (0.4661)         (0.0658) (0.0752) (0.5960)
Table 5: Posterior means and standard errors (in parentheses) of the linear regressors for three different quantile levels.

observed for 13 covariates covering the two periods 1965-75 and 1975-85. With a quantile GAM model we are able to combine the theory of non-linear returns to education with that of economic convergence, using spline functions to model the variables "Male Secondary School" (MSS), "Female Secondary School" (FSS), "Male Higher Education" (MHE) and "Female Higher Education" (FHE), while we adopt a linear representation for the remaining variables. The parameter estimates of the linear covariates (Table 6) are in line with previous studies based on quantile regression methods. In particular, it is worth noting that the coefficient related to the initial per capita GDP is always negative, confirming the neoclassical theory of conditional convergence. Figure 6 displays the estimated spline functions along with their credible sets, for three quantile levels.

Figure 6: Estimated spline effects of MSS, FSS, MHE and FHE (panels (a)-(l)) at the three quantile levels.

The effect of the different types of education is not the same for countries in the lower and upper tails of the growth conditional distribution. This result is of particular interest, since it allows us to isolate the positive and negative contributions of each type of education to the rate of economic growth. There are two opposite paths characterizing the effect of secondary schooling and higher education on growth: the first is increasing in the quantile level τ, the second is decreasing. In particular, our estimates suggest that relatively low education levels help countries in the upper tail of the growth distribution, while higher education levels boost the rate of growth for countries in the lower tail.
Those results can be interpreted in view of the fact that high and low GDP growth levels are linked with emerging and developed nations, respectively. Basic schooling is a key factor for emerging nations, which base their economies on labor-intensive activities. For these countries, the costs resulting from higher levels of schooling outweigh their returns, while the opposite is true for the advanced countries. The latter, indeed, can take advantage of skilled labor forces to exploit the higher returns derived from the available technology.
7. Conclusion
Variable                  Quantile levels
                          τ=0.10   τ=0.25   τ=0.50   τ=0.75   τ=0.90
                          (0.0043) (0.0048) (0.0055) (0.0045) (0.0047)
Life Expectancy           0.0324   0.0127   0.1947   0.2021   0.1866
                          (0.0091) (0.0104) (0.0110) (0.0097) (0.0092)
Human Capital             -0.0024  -0.0037  -0.0010  -0.0023  -0.0018
                          (0.0011) (0.0017) (0.0020) (0.0019) (0.0017)
Education/GDP             -0.2707  -0.1127  -0.2956  -0.2156  -0.0993
                          (0.1246) (0.1579) (0.1690) (0.1721) (0.1732)
Investment/GDP            0.0999   0.0930   0.0359   0.0343   0.0482
                          (0.0268) (0.0291) (0.0337) (0.0277) (0.0264)
Public Consumption/GDP    -0.1628  -0.1723  0.0052   0.0251   -0.0206
                          (0.0387) (0.0443) (0.0463) (0.0384) (0.0280)
Black Market Premium      -0.0227  -0.0267  -0.0360  -0.0319  -0.0316
                          (0.0063) (0.0074) (0.0077) (0.0064) (0.0072)
Political Instability     -0.0264  -0.0302  -0.0153  -0.0053  -0.0042
                          (0.0083) (0.0088) (0.0103) (0.0098) (0.0073)
Growth Rate Terms Trade   0.1220   0.1250   0.2274   0.2478   0.2744
                          (0.0366) (0.0504) (0.0652) (0.0639) (0.0569)

Table 6: Posterior means and standard errors (in parentheses) of the linear regressors for the different quantile levels.

In this paper we show how the SEP distribution provides a flexible tool to model the conditional quantiles of a response variable as a function of exogenous covariates in a Bayesian quantile regression context. In particular, extreme observations are properly accounted for by the shape parameter governing the tail decay of the distribution, which efficiently handles data with outliers or fat tails. Moreover, we extend the linear quantile regression framework to the
GAM one, where quantile functions are approximated with splines. In both cases we provide new adaptive Metropolis-within-Gibbs algorithms to implement the statistical inference. Since it is common, when building models, that a large number of parameters must be estimated, in particular when spline tools are used, in this paper we accommodate the problems of variable selection and parameter shrinkage by using the Bayesian version of Lasso penalization methods. In particular, we suggest the use of generalized independent Laplace priors on the regression parameters in the linear case, allowing each parameter to be shrunk separately, and a multivariate Laplace distribution on the spline coefficients, generalizing the Lang and Brezger (2004) second-order random walk prior. Finally, we show the power of the considered models through simulations and real data applications, in which the flexibility of the proposed quantile methodology in terms of robustness and sparsity is evident.
Acknowledgments
This research is supported by the Italian Ministry of Research PRIN 2013-2015, "Multivariate Statistical Methods for Risk Assessment" (MISURA), and by the "Carlo Giannini Research Fellowship", the "Centro Interuniversitario di Econometria" (CIdE) and the "UniCredit Foundation".

Appendix A

Lemma 7.1.
Let Y ~ SEP(μ, σ, α, τ). Then the τ-level quantile of Y coincides with its natural location parameter, i.e. Q_τ(Y) = μ.

Proof. In order to show that P(Y ≤ μ) = τ we compute the cdf of the SEP at y = μ:

P(Y ≤ μ) = ∫_{-∞}^{μ} (1/σ) [ α^(1-1/α) / (2 Γ(1/α)) ] [ exp{ -(1/α) ((μ - y)/(2τσ))^α } 1(y ≤ μ) + exp{ -(1/α) ((y - μ)/(2(1-τ)σ))^α } 1(y > μ) ] dy
         = ∫_{-∞}^{μ} (1/σ) [ α^(1-1/α) / (2 Γ(1/α)) ] exp{ -(1/α) ((μ - y)/(2τσ))^α } dy.   (51)

Without loss of generality, let us consider the case μ = 0 and σ = 1. The integral reduces to

[ α^(1-1/α) / (2 Γ(1/α)) ] ∫_{-∞}^{0} exp{ -(1/α) (-y/(2τ))^α } dy.   (52)

By substituting (-y)^α = x, so that dy = -(1/α) x^(1/α - 1) dx, we have

[ α^(1-1/α) / (2 Γ(1/α)) ] ∫_{0}^{+∞} exp{ -x / (α (2τ)^α) } (1/α) x^(1/α - 1) dx.   (53)

Rearranging equation (53) and recognizing the kernel of a Gamma pdf with shape 1/α and scale α(2τ)^α, the integral becomes

[ α^(1-1/α) / (2 Γ(1/α)) ] (1/α) Γ(1/α) ( α (2τ)^α )^(1/α).

By using the property Γ(x + 1) = x Γ(x), all the terms simplify except for τ, concluding the proof.

References

Alhamzawi, R., Yu, K., and Benoit, D. F. (2012). Bayesian adaptive lasso quantile regression.
Statistical Modelling, 12(3):279-297. Andrieu, C. and Moulines, É. (2006). On the ergodicity properties of some adaptive MCMC algorithms.
The Annals of Applied Probability, 16(3):1462-1505. Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC.
Statistics andComputing , 18(4):343–373.Barreto, R. A. and Hughes, A. W. (2004). Under performers and over achievers:A quantile regression analysis of growth.
Economic Record, 80(248):17-35. Barro, R. J. and Sala-i-Martin, X. (1995).
Economic Growth. McGraw-Hill, New York. Barro, R. J. (1991). Economic growth in a cross section of countries.
TheQuarterly Journal of Economics , 106(2):407–443.Bernardi, M., Gayraud, G., and Petrella, L. (2015). Bayesian tail risk interde-pendence using quantile regression.
Bayesian Analysis , 10(3):pp. 553–603.Box, G. and Tiao, G. (1973).
Bayesian Inference in Statistical Analysis. Addison-Wesley Series in Behavioral Science: Quantitative Methods. Addison-Wesley Pub. Co.

Brezger, A. and Lang, S. (2006). Generalized structured additive regression based on Bayesian P-splines. Computational Statistics and Data Analysis, 50(4):967–991.

Brezger, A. and Steiner, W. J. (2008). Monotonic regression based on Bayesian P-splines: An application to estimating price response functions from store-level scanner data. Journal of Business and Economic Statistics, 26(1):90–104.

Chen, C. and Yu, K. (2009). Automatic Bayesian quantile regression curve fitting. Statistics and Computing, 19(3):271–281.

Choy, S. and Walker, S. G. (2003). The extended exponential power distribution and Bayesian robustness. Statistics & Probability Letters, 65(3):227–232.

Choy, S. T. B. and Smith, A. F. M. (1997). On robust analysis of a normal location parameter. Journal of the Royal Statistical Society. Series B (Methodological), 59(2):463–474.

Denison, D. G. T., Mallick, B. K., and Smith, A. F. M. (1998). Automatic Bayesian curve fitting. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2):333–350.

Dunson, D. B. and Taylor, J. A. (2005). Approximate Bayesian inference for quantiles.
Journal of Nonparametric Statistics, 17(3):385–400.

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11:89–121.

Fernandez, C. and Steel, M. F. J. (1998). On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association, 93(441):359–371.

Hans, C. (2009). Bayesian lasso regression. Biometrika, 96(4):835–845.

Hastie, T. and Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3):297–310.

Harrison, D., Jr. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1):81–102.

Kammann, E. E. and Wand, M. P. (2003). Geoadditive models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1):1–18.

Kobayashi, G. (2016). Skew exponential power stochastic volatility model for analysis of skewness, non-normal tails, quantiles and expectiles.
Computational Statistics, 31(1):49–88.

Koenker, R. (2005). Quantile Regression. Cambridge University Press, Cambridge.

Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46:33–50.

Koenker, R. and Machado, J. A. (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association, 94(448):1296–1310.

Kottas, A. and Gelfand, A. (2001). Bayesian semiparametric median regression modeling. Journal of the American Statistical Association, 96:1458–1468.

Kottas, A. and Krnjajic, M. (2009). Bayesian semiparametric modelling in quantile regression. Scandinavian Journal of Statistics, 36:297–319.

Kotz, S., Kozubowski, T. J., and Podgórski, K. (2001). Asymmetric multivariate Laplace distribution. In The Laplace Distribution and Generalizations, pages 239–272. Springer.

Kozumi, H. and Kobayashi, G. (2011). Gibbs sampling methods for Bayesian quantile regression. Journal of Statistical Computation and Simulation, 81:1565–1578.

Lang, S. and Brezger, A. (2004). Bayesian P-splines.
Journal of Computational and Graphical Statistics, 13(1):183–212.

Laurini, M. P. (2007). A note on the use of quantile regression in beta convergence analysis. Economics Bulletin, 3(52):1–8.

Li, Q., Xi, R., and Lin, N. (2010). Bayesian regularized quantile regression. Bayesian Analysis, 5(3):533–556.

Lum, K. and Gelfand, A. (2012). Spatial quantile multiple regression using the asymmetric Laplace process. Bayesian Analysis, 7:235–258.

Mayr, A., Fenske, N., Hofner, B., Kneib, T., and Schmid, M. (2012). Generalized additive models for location, scale and shape for high dimensional data: a flexible approach based on boosting. Journal of the Royal Statistical Society: Series C (Applied Statistics), 61(3):403–427.

Meier, L., Van De Geer, S., and Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):53–71.

Mello, M. and Perrelli, R. (2003). Growth equations: a quantile regression exploration. The Quarterly Review of Economics and Finance, 43(4):643–667. Capital Accumulation and Allocation in Economic Growth.

Naranjo, L., Pérez, C. J., and Martín, J. (2015). Bayesian analysis of some models that use the asymmetric exponential power distribution. Statistics and Computing, 25(3):497–514.

Newey, W. and Powell, J. (1987). Asymmetric least squares estimation and testing.
Econometrica, 55:819–847.

O'Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with discussion). Statistical Science, 1:505–527.

O'Sullivan, F. (1988). Fast computation of fully automated log-density and log-hazard estimators. SIAM Journal on Scientific and Statistical Computing, 9(2):363–379.

Park, T. and Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association, 103:681–686.

Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. Journal of Applied Probability, 44(2):458–475.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press.

Salazar, E., Ferreira, M., and Migon, H. (2012). Objective Bayesian analysis for exponential power regression models. Sankhya B, 74(1):107–125.

Sriram, K., Ramamoorthi, R., and Ghosh, P. (2013). Posterior consistency of Bayesian quantile regression based on the misspecified asymmetric Laplace density.
Bayesian Analysis, 8:479–504.

Kneib, T., Konrath, S., and Fahrmeir, L. (2011). High dimensional structured additive regression models: Bayesian regularization, smoothing and predictive performance. Journal of the Royal Statistical Society. Series C (Applied Statistics), 60(1):51–70.

Thompson, P., Cai, Y., Moyeed, R., Reeve, D., and Stander, J. (2010). Bayesian nonparametric quantile regression using splines. Computational Statistics and Data Analysis, 54(4):1138–1150.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288.

Tokdar, S. and Kadane, J. (2012). Using exponentially weighted quantile regression to estimate value at risk and expected shortfall. Journal of Financial Econometrics, 6:382–406.

Wichitaksorn, N., Choy, S. T. B., and Gerlach, R. (2014). A generalized class of skew distributions and associated robust quantile regression models. Canadian Journal of Statistics, 42(4):579–596.

Yu, K. and Moyeed, R. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54:437–447.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67.

Yue, Y. R. and Rue, H. (2011). Bayesian inference for additive mixed quantile regression models. Computational Statistics and Data Analysis, 55(1):84–96.

Zhu, D. and Zinde-Walsh, V. (2009). Properties and estimation of asymmetric exponential power distribution.