Horseshoe Prior Bayesian Quantile Regression∗

David Kohns† (Department of Economics, Heriot-Watt University) and Tibor Szendrei‡ (Department of Economics, Heriot-Watt University)

June 16, 2020
Abstract
This paper extends the horseshoe prior of Carvalho et al. (2010) to Bayesian quantile regression (HS-BQR) and provides a fast sampling algorithm that speeds up computation significantly in high dimensions. The performance of the HS-BQR is tested on large scale Monte Carlo simulations and an empirical application relevant to macroeconomics. The Monte Carlo design considers several sparsity structures (sparse, dense, block) and error structures (i.i.d. errors and heteroskedastic errors). A number of LASSO based estimators (frequentist and Bayesian) are pitted against the HS-BQR to better gauge the performance of the method on the different designs. The HS-BQR yields just as good, or better, performance than the other estimators considered when evaluated using coefficient bias and forecast error. We find that the HS-BQR is particularly potent in sparse designs and when estimating extreme quantiles. The simulations also highlight how high dimensional quantile estimators fail to correctly identify the quantile function of the variables when both location and scale effects are present. In the empirical application, in which we evaluate forecast densities of US inflation, the HS-BQR provides well calibrated forecast densities whose individual quantiles have the highest pseudo R², highlighting its potential for Value-at-Risk estimation. Keywords:
Global-Local Priors, Shrinkage, Machine Learning, Quantile Regression
JEL codes:
C01, C11, C21, C53

∗ The authors would like to thank Arnab Bhattacharjee, Mark Schaffer, Atanas Christev, Aubrey Poon, Gary Koop and all participants of the Scottish Graduate Program in Economics conference in Crieff for their invaluable feedback.
† Email: [email protected]
‡ Email: [email protected]

Introduction
Quantile regression has been an important tool in the econometrician's toolkit for estimating heterogeneous effects across the conditional response distribution since the seminal work of Koenker and Bassett (1978). In contrast to least squares methods, it directly estimates quantiles of the dependent variable's conditional distribution, which allows for richer inference than focusing solely on the conditional mean. Since this approach does not put any symmetry restrictions on the conditional distribution, it has proven particularly useful in the macroeconomics and finance literature, where quantile regression is used to compute probabilities of recessions (e.g. Adrian et al. (2019) and Giglio et al. (2016)), risk measures such as the VaR (Engle and Manganelli, 2004; Chen et al., 2012; Taylor, 2019), which financial institutions are obliged to report in adherence with the Basel Committee on Banking Supervision (see Basel II), and, more generally, forecast densities (e.g. Korobilis (2017) and Carriero et al. (2020)). The Bayesian approach to quantile regression has been used extensively for these purposes, as it has been shown to provide more robust tail inference than frequentist estimation (Chen et al., 2012), the reason being that numerical integration through MCMC simulation often provides more robust inference for extreme quantiles than numerical optimisation routines.

Especially for modern forecast density applications in macroeconomics, it has become customary to use some form of dimension reduction, such as factor based (e.g. Stock and Watson (2002) or Bai and Ng (2008)) or sparsity based reduction (e.g. Tibshirani (1996)), as the dimensionality of the feature space renders the traditional least squares or quantile regression approach imprecise or infeasible. While there are well established frequentist sparse high-dimensional applications and theory for quantile regression (see Belloni and Chernozhukov (2011) and Wang (2017) for an overview), high-dimensional Bayesian quantile regression is less established. The Bayesian quantile regression approach, as popularised by Yu and Moyeed (2001), is based on the asymmetric Laplace likelihood (ALL), which has a special connection to the frequentist quantile regression solution in that its maximum likelihood estimates are equivalent to quantile regression with a check-loss function (Koenker, 2005). A hurdle in the Bayesian literature has been that ALL based methods result in improper posteriors with any but non-informative or exponential Laplace priors, where the latter leads to the popular Bayesian lasso quantile regression (Li et al., 2010; Alhamzawi and Yu, 2013; Alhamzawi et al., 2012; Chen et al., 2013). The broader Bayesian sparsity literature has shown, however, that global-local (GL) shrinkage priors, such as the horseshoe, achieve a sharper separation of signal from noise than exponential-tailed priors (for an overview of GL priors see Polson and Scott (2010)). Since the p-th fitted quantile can be interpreted as the p-th VaR, the framework provided in this paper is directly relevant for risk measurement, and it has the additional advantage that the derived algorithms can be directly applied to other global-local priors that can be expressed as scale mixtures of normals.

In what follows, we first review the generic quantile regression framework. Then we present the main results in the Bayesian quantile as well as shrinkage literature which have motivated the form of the model. Following this, we develop the posteriors as well as the sampling algorithm.
Lastly, we provide evidence from Monte Carlo simulations and an empirical application of the favourable performance of the HS-BQR compared to alternative methods. We conclude with further generalisations of the algorithms provided and a discussion of our results.

Methodology
Quantile regression is a framework to extend econometric analysis beyond the conditional mean (Koenker and Bassett, 1978). The value added of quantile regression is that it does not exclusively focus on the mean (as is the case with OLS) but allows for inference about the conditional distribution of the variable of interest directly, through modeling of its conditional quantiles. Taking the linear model Y = Xβ + ε as our starting point, the conditional quantile function of Y can be defined as

    Q_p(Y | X) = Xβ(p),    (1)

where p ∈ (0, 1) is the quantile of interest, X is a T × K matrix of covariates, β(p) is a K × 1 vector of quantile specific coefficients, and ε is T × 1. The coefficients solve

    min_β ∑_{t=1}^{T} ρ_p(y_t − x′_t β),    (2)

where ρ_p(·) is a loss function with the following form:

    ρ_p(y) = [p − I(y < 0)] y = [(1 − p) I(y ≤ 0) + p I(y > 0)] |y|,    (3)

and I(·) is an indicator function taking on a value of 0 or 1 depending on whether the condition is satisfied. Equation (3) determines the weight each observation receives in the minimisation problem. It is often referred to as the check-loss function due to the weight profile it assigns depending on the quantile being estimated (Koenker, 2005). Note how (y_t − x′_t β) is the residual of a regression model. The interpretation of the coefficients is thus similar to the classical regression case: β_j(p) is the rate of change of the p-th quantile of the dependent variable's distribution to a unit change in the j-th regressor. Unlike in classical regression analysis, QR does not make any parametric assumption about ε (Koenker, 2005), which allows for rich, non-symmetric inference about the conditional distribution of Y.

Three key findings enable extending the quantile regression to the horseshoe prior. (i) Yu and Moyeed (2001) have shown that maximising an asymmetric Laplace likelihood is, under mild conditions, equivalent to minimising the standard frequentist quantile loss function as in (3). (ii) Kozumi and Kobayashi (2011) showed that while conjugate Normal-Inverse-Gamma (N-IG) priors result in intractable posteriors with the asymmetric Laplace likelihood, independent N-IG priors paired with a latent data representation of the AL likelihood result in tractable conditional posteriors which can be sampled from with a straightforward Gibbs sampler. (iii) Khare and Hobert (2012) show that the Markov chain of this sampler is geometrically ergodic and also valid in K >> T settings, which gives theoretical justification for applying this sampler to high dimensional problems.

In particular, we assume the quantile regression model (1) and a fixed design X. As shown by Yu and Moyeed (2001), β(p) can be obtained as the maximum likelihood estimator for β under the fully parametric model y_t = x′_t β + ε_t, where {ε_t}_{t=1}^{T} are assumed i.i.d. with common density

    g(ε; p) = p(1 − p)/σ [e^{(1−p)ε/σ} I_{R−}(ε) + e^{−pε/σ} I_{R+}(ε)],    (4)

where R+ := (0, ∞) and R− := (−∞, 0], a density whose p-th quantile is equal to zero. Assuming the linear model as above with error density (4), the joint likelihood f(Y | β, σ) becomes:

    f(Y | β, σ) = p^T (1 − p)^T σ^{−T} ∏_{t=1}^{T} [e^{(1−p)(y_t − x′_t β)/σ} I_{R−}(y_t − x′_t β) + e^{−p(y_t − x′_t β)/σ} I_{R+}(y_t − x′_t β)].    (5)

It is apparent that using any non-trivial prior for (β, σ) will result in an intractable posterior, which necessitates inefficient accept-reject sampling algorithms (Yu and Moyeed, 2001). However, Kozumi and Kobayashi (2011) showed, using the mixture representation of the asymmetric Laplace distribution provided by Kotz et al. (2012), that the likelihood in (5) can be obtained by formulating the error process as:

    ε = σθz + στ√z u,    (6)

where z ∼ Exp(1), u ∼ N(0, 1), and θ = (1 − 2p)/(p(1 − p)) and τ² = 2/(p(1 − p)) are deterministic quantile specific parameters. Let θ = θ(p) and τ = τ(p) be defined as above and let the tuples {(y_t, z_t)}_{t=1}^{T} be independent random pairs. Now, to simplify the Gibbs sampler, we instead assume z_t ∼ Exp(σ), such that, given z_t, y_t is normally distributed as y_t | z_t ∼ N(x′_t β + θz_t, στ²z_t).
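As a quick sanity check on the mixture representation in (6), the following sketch (Python, with σ = 1 and all names of our choosing) simulates ε = θz + τ√z u and verifies that the p-th quantile of the resulting draws is approximately zero, as required of the AL density in (4):

```python
import numpy as np

rng = np.random.default_rng(0)

def al_mixture_draws(p, n=200_000):
    # theta(p) and tau(p)^2 as defined in the text; sigma is set to 1
    theta = (1 - 2 * p) / (p * (1 - p))
    tau2 = 2 / (p * (1 - p))
    z = rng.exponential(1.0, size=n)          # z ~ Exp(1)
    u = rng.standard_normal(n)                # u ~ N(0, 1)
    return theta * z + np.sqrt(tau2 * z) * u  # eq. (6) with sigma = 1

for p in (0.1, 0.5, 0.9):
    eps = al_mixture_draws(p)
    print(p, round(float(np.quantile(eps, p)), 3))  # approximately 0 for every p
```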
The joint density of Y given Z is then:

    f(Y | β, σ, Z) ∝ (∏_{t=1}^{T} 1/√z_t) exp[ −∑_{t=1}^{T} (y_t − x′_t β − θz_t)² / (2στ²z_t) ].    (7)

In order to identify the posterior of a large dimensional coefficient vector in small samples, informative priors are needed. Ideally, they are able to separate noise variables from signals, such that the noise is shrunk towards zero while signals attain their unrestricted parameter values. In the frequentist setting, this is usually achieved through penalised likelihoods which force variables to threshold to 0 or not. In the Bayesian approach, it is important to note that the assumption about sparsity is different, in that proper prior distributions have non-zero probability over sparse and non-sparse regions of the posterior (Betancourt, 2018). In order, therefore, to achieve the desired separation between shrunk and unshrunk variables, the amount of shrinkage on a 0-1 scale should approach a bi-modal distribution where most of the mass is on 0 and 1, respectively. The horseshoe prior of Carvalho et al. (2010) achieves such a shrinkage profile, while double-exponential based lasso priors do not.

To illustrate, we use the linear regression model as above, but for simplicity assume a Gaussian error term. As we are leveraging a conditionally normal likelihood for the Bayesian quantile regression, the following discussion also holds for the Bayesian quantile regression. Global-local priors, as defined by Polson and Scott (2010), place a scale mixture of normals prior on the coefficient vector and otherwise unspecified priors on the variance parameters, which we write as:

    σ² ∼ π(σ²)dσ²,
    β_j | λ_j, ν², σ² ∼ N(0, λ_j²ν²σ²), j ∈ (1, …, K),
    λ_j ∼ π(λ_j)dλ_j, j ∈ (1, …, K),
    ν ∼ π(ν)dν.    (8)

The idea of this family of shrinkage priors is that the global scale ν controls the overall shrinkage applied to the regression, while the local scales λ_j allow individual regressors to escape shrinkage when they have large effects on the response. It can be shown that the conditional posterior is normal with the following mean and variance:

    p(β | Λ, ν, σ², X) = N(β | β̄, Σ),
    β̄ = ν²Λ(ν²Λ + σ²(X′X)^{−1})^{−1} β̂,
    Σ = (ν^{−2}Λ^{−1} + (1/σ²)X′X)^{−1},

where Λ = diag(λ₁², …, λ_K²) and β̂ = (X′X)^{−1}X′y is the maximum likelihood solution (given that the inverse exists). Assuming for simplicity that the covariates are uncorrelated with zero mean and variances Var(x_j) = s_j², then X′X ≈ n·diag(s₁², …, s_K²), such that the mean of the posterior coefficient vector can be approximated as

    β̄_j = (1 − κ_j)β̂_j,    (9)

where

    κ_j = 1 / (1 + nσ^{−2}ν²s_j²λ_j²).    (10)

κ_j is called the shrinkage factor and is bounded between 0 and 1, depending on the values of the scales ν and λ_j. This result is independent of the priors used for ν and λ. Their priors, however, determine the shape of the implicit prior on the shrinkage factor. The focal point of this paper, the horseshoe prior of Carvalho et al. (2010), employs half Cauchy distributions for λ_j and ν:

    λ_j ∼ C+(0, 1), ν ∼ C+(0, 1).    (11)

The benefit of using fat tailed distributions such as the Cauchy for the scale parameters is that each scale has most of its mass near 0, which entails shrinkage, yet has fat enough tails to allow certain features to escape the pull toward zero. It can be shown that these priors imply a Beta(0.5, 0.5) distribution on the shrinkage factors (Carvalho et al., 2010). When ν and λ are strongly identified, this prior results in complete or no shrinkage for each coefficient in the limit, as can be visually confirmed from figure (1), left panel.

The same cannot be said for the lasso prior. This prior originated from the fact that the L1 norm penalised least squares objective function of Tibshirani (1996) is equivalent to the maximum a posteriori (MAP) value of a normal linear regression model with independent double-exponential priors on the regression coefficients, and it is thus named the Bayesian lasso. Cast into the global-local form of equation (8), the prior takes the following form (note that an auxiliary variable u is introduced which, after integration, yields the desired double-exponential Laplace density on the coefficient vector; see Park and Casella (2008)):

    π(λ_j²) = (u²/2) e^{−u²λ_j²/2}, π(σ²) ∝ σ^{−2}.    (12)

Under this prior, the shrinkage coefficient can be shown to have the shape in the right panel of figure (1). The double exponential has the unfortunate trait that, while large signals can escape shrinkage, noise variables are not shrunk aggressively enough. This results in too little shrinkage in large dimensional problems with many noise variables.

Figure 1: κ_j, the shrinkage coefficient implied by (a) the horseshoe prior and (b) the lasso prior.
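The contrast shown in figure (1) is easy to reproduce by simulation. The sketch below uses the simplification nσ^{−2}ν²s_j² = 1 in (10), so that κ_j = 1/(1 + λ_j²), and sets the lasso rate to u = 1 purely for illustration; under the horseshoe, a large share of draws lands near both 0 (signals escape) and 1 (noise is fully shrunk), whereas the lasso puts almost no mass near 0:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Horseshoe: lambda_j ~ C+(0,1); kappa_j = 1/(1+lambda_j^2) is Beta(1/2, 1/2)
lam_hs = np.abs(rng.standard_cauchy(n))
kappa_hs = 1.0 / (1.0 + lam_hs**2)

# Lasso: lambda_j^2 ~ Exp(rate u^2/2) with u = 1, i.e. scale 2
lam2_lasso = rng.exponential(scale=2.0, size=n)
kappa_lasso = 1.0 / (1.0 + lam2_lasso)

for name, k in (("horseshoe", kappa_hs), ("lasso", kappa_lasso)):
    # share of mass near 0 (no shrinkage) and near 1 (full shrinkage)
    print(name, round((k < 0.05).mean(), 3), round((k > 0.95).mean(), 3))
```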
In order to generalise the horseshoe prior to the AL likelihood in (7), the HS prior needs to be formulated under the assumption of independence between the β and σ priors, so that the posterior takes a conditionally normal form (Kozumi and Kobayashi, 2011). While prior independence between regression coefficients and error variance might seem a strong assumption, Moran et al. (2018) have shown that in high-dimensional settings the independence assumption aids inference on the error variance. This is due to the fact that conjugate priors act mathematically as additional observations, which artificially biases the error variance downwards when K >> T.

The general independent GL prior takes the following hierarchical form:

    σ ∼ π(σ)dσ,
    β_j | λ_j, ν² ∼ N(0, λ_j²ν²), j ∈ (1, …, K),
    λ_j ∼ π(λ_j)dλ_j, j ∈ (1, …, K),
    ν ∼ π(ν)dν.    (13)

As recommended by Gelman et al. (2006), we select a weakly informative inverse Gamma distribution as the prior for the error variance σ and two independently distributed half Cauchy distributions on the positive support for the scale parameters of the β prior:

    σ ∼ IG(a, b),    (14)
    λ_j ∼ C+(0, 1),    (15)
    ν ∼ C+(0, 1).    (16)

Under the above priors, the posteriors take the following form:

    β | σ, λ, ν, θ, τ, X, Y, Z ∼ N(β̄, V),    (17)

where V = (X′UX + V₀^{−1})^{−1}, U = diag(1/(στ²z₁), …, 1/(στ²z_T)) and V₀ = ν²diag(λ₁², …, λ_K²). Using the usual Bayesian computations, and writing ỹ_t = y_t − θz_t for the latent-data adjusted response implied by (7), β̄ is defined as β̄ = V(X′Uỹ + V₀^{−1}β₀) with prior mean β₀ = 0_K. The conditional posterior of the scale parameter is given by

    σ | β, λ, ν, θ, τ, X, Y, Z ∼ IG(ā, b̄),    (18)

with ā = a + 3T/2 and b̄ = b + ∑_{t=1}^{T} (y_t − x′_t β − θz_t)²/(2τ²z_t) + ∑_{t=1}^{T} z_t.

As the conditional posteriors for the scales (λ, ν) follow Cauchy distributions (Carvalho et al., 2010), and therefore no moments exist which would enable direct sampling, the literature has proposed Gibbs samplers which rely either on slice sampling (Polson et al., 2014) or mixture representations (Makalic and Schmidt, 2015). However, both rely on conjugate formulations. We use the fact that the posterior distribution of λ_j conditional on ν remains, by assumption, independent of all other parameters to formulate a block slice sampling algorithm for λ = (λ₁, …, λ_K)′ akin to Polson et al. (2014). Slice sampling generates pseudo-random numbers from any distribution function f(y) by sampling uniformly from horizontal slices through the PDF. Advantages of the algorithm include its simplicity, that it involves no rejections, and that it requires no external parameters to be set. Define η_j = 1/λ_j² and µ_j = β_j/ν. The conditional posterior distribution of η_j, given all other parameters, is

    p(η_j | ν, σ, µ_j, θ, τ, X, Y, Z) ∝ exp{−µ_j²η_j/2} · 1/(1 + η_j).    (19)
Slice sampling can now be implemented to draw from (19):

1. Sample (u_j | η_j) uniformly on the interval (0, 1/(1 + η_j)).
2. Sample η_j | µ_j, u_j from an exponential distribution with mean 2/µ_j², truncated to have zero probability outside (0, (1 − u_j)/u_j).

Taking the inverse square root of the sample from step 2, one recovers the draw for λ_j. By replacing η with 1/ν² and µ² with ∑_{j=1}^{K}(β_j/λ_j)², ν can be sampled in a similar manner.

Finally, since the latent z_t are sampled independently, the conditional posterior of z_t follows the reciprocal of an inverse Gaussian:

    z_t^{−1} | β, σ, λ, ν, θ, τ, X, Y ∼ I-G(c_t, d_t),    (20)

where I-G stands for the inverse Gaussian distribution with location and rate parameters, respectively, c_t = √(θ² + 2τ²)/|y_t − x′_t β| and d_t = (θ² + 2τ²)/(στ²).
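Putting the two-step slice update above into code, a minimal sketch of one update of η_j = 1/λ_j² (Python; the function name and the small numerical guard are our own) is:

```python
import numpy as np

rng = np.random.default_rng(2)

def update_eta(eta, mu):
    """One slice-sampling update of eta_j = 1/lambda_j^2 given mu_j = beta_j/nu."""
    # Step 1: uniform auxiliary variable on (0, 1/(1 + eta))
    u = rng.uniform(0.0, 1.0 / (1.0 + eta))
    # Step 2: Exp(rate = mu^2/2) truncated to (0, (1 - u)/u), drawn by inverse CDF
    rate = max(0.5 * mu**2, 1e-12)   # guard against mu ~ 0
    upper = (1.0 - u) / u
    v = rng.uniform(0.0, 1.0 - np.exp(-rate * upper))
    return -np.log1p(-v) / rate

eta = 1.0
for _ in range(10):
    eta = update_eta(eta, mu=0.5)
print("lambda_j draw:", 1.0 / np.sqrt(eta))
```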
Gibbs Sampler

With these conditional posteriors at hand, we utilise a standard Gibbs sampler. The dynamics of the Markov chain {(β^m, σ^m, λ^m, ν^m, z^m)}_{m=0}^{∞} are implicitly defined through the following steps:

1. Draw Z ∼ π(· | β, σ, λ, ν, θ, τ, X, Y) from the reciprocal inverse Gaussian in (20) with parameters (c_t, d_t) for all t, and collect the T × 1 vector z^{n+1}.
2. Draw σ^{n+1} ∼ π(· | β, λ, ν, θ, τ, X, Y, z^{n+1}) from IG(ā, b̄).
3. Draw β^{n+1} ∼ π(· | σ^{n+1}, λ, ν, θ, τ, X, Y, z^{n+1}) from N(β̄, V).
4. Simulate λ^{n+1} and ν^{n+1} through slice sampling as described above.
5. Iterate (1-4) until convergence.

A computational bottleneck is, however, present in very high dimensions in evaluating the K × K dimensional inverse for the conditional posterior of β. Cholesky decomposition based methods are generally of order O(K³). Taking into consideration that in quantile settings one is usually interested in obtaining more than one expected quantile, this can result in prohibitively long computation times. We therefore provide a more efficient sampling algorithm for β which leverages data augmentation, similar to the algorithm developed by Bhattacharya et al. (2016), and is of order O(T²K), which is especially beneficial in high dimensional settings.

As derived above, using the scale mixture representation in (13), the conditional posterior of β given all other parameters can be written as:

    β | σ, λ, ν, τ, θ, X, Y, Z ∼ N(A^{−1}X′Uỹ, A^{−1}), A = (X′UX + Λ_*^{−1}), Λ_* = ν²diag(λ₁², …, λ_K²).    (21)

Suppose we want to sample from N_K(µ, Σ), where

    Σ = (Φ′Φ + D^{−1})^{−1}, µ = ΣΦ′α.    (22)

Assume D ∈ R^{K×K} is a positive definite matrix, Φ ∈ R^{T×K}, and α ∈ R^{T×1}. Then (21) is a special case of (22) when setting Φ = √U X, D = Λ_* and α = √U ỹ. An exact algorithm to sample from (21) is thus given by:
Algorithm 1 (Fast HS-BQR sampler).

1. Sample independently u ∼ N(0, D) and δ ∼ N(0, I_T).
2. Set ξ = Φu + δ.
3. Solve (ΦDΦ′ + I_T)w = (α − ξ) for w.
4. Set θ = u + DΦ′w.

Proposition 1. Suppose θ is obtained through Algorithm 1. Then θ ∼ N(µ, Σ).
Proof.
Using the Sherman-Morrison-Woodbury identity, µ = DΦ′(ΦDΦ′ + I_T)^{−1}α. Plugging step 2 into step 3, we obtain θ = u + DΦ′(ΦDΦ′ + I_T)^{−1}(α − ξ). Since by definition ξ ∼ N(0, ΦDΦ′ + I_T), θ follows a normal distribution with mean DΦ′(ΦDΦ′ + I_T)^{−1}α = µ. As cov(u, ξ) = DΦ′, it follows that cov(θ) = D − DΦ′(ΦDΦ′ + I_T)^{−1}ΦD, which by the Sherman-Morrison-Woodbury identity is equal to Σ. More details are provided in the supplementary materials.

The provided algorithm is not specific to the horseshoe prior and goes through for any prior of the form in (13). The computational advantage of Algorithm 1 compared to Cholesky based decompositions is that we can cheaply sample from (u, ξ)′, which via data augmentation yields samples from the desired distribution.
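For concreteness, a minimal numpy sketch of Algorithm 1 (function and variable names are our own; D is stored as its diagonal d):

```python
import numpy as np

rng = np.random.default_rng(3)

def fast_beta_draw(Phi, d, alpha):
    """Algorithm 1: draw from N(mu, Sigma) with Sigma = (Phi'Phi + D^{-1})^{-1},
    mu = Sigma Phi' alpha, D = diag(d); costs O(T^2 K) instead of O(K^3)."""
    T, K = Phi.shape
    u = np.sqrt(d) * rng.standard_normal(K)  # step 1: u ~ N(0, D)
    delta = rng.standard_normal(T)           #         delta ~ N(0, I_T)
    xi = Phi @ u + delta                     # step 2: xi = Phi u + delta
    M = (Phi * d) @ Phi.T + np.eye(T)        # Phi D Phi' + I_T  (T x T)
    w = np.linalg.solve(M, alpha - xi)       # step 3
    return u + d * (Phi.T @ w)               # step 4: theta = u + D Phi' w

# For the HS-BQR conditional (21): Phi = sqrt(U) X, d = nu^2 * lambda^2,
# alpha = sqrt(U) (y - theta*z), with U = diag(1/(sigma*tau^2*z_t)).
```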
Monte Carlo Simulations

Now we set out to compare exponential based shrinkage priors to the proposed HS-BQR in order to verify the theoretical advantages laid out above. We consider three variants of the original lasso prior which have been adapted to the Bayesian quantile regression, along with the frequentist lasso quantile regression of Belloni and Chernozhukov (2011):

1. Bayesian Lasso QR (LBQR): The lasso prior is derived by noticing that the L1 norm penalised check loss function

    min_β ∑_{t=1}^{T} ρ_p(y_t − x′_t β) + λ ∑_{j=1}^{K} |β_j|    (23)

can be obtained as the MAP estimate of the AL likelihood with a Laplace prior on the regression coefficients, π(β | σ, λ) = (σλ/2)^K exp{−σλ ∑_{j=1}^{K} |β_j|}. The posterior takes the following form:

    β | y, X, σ, λ ∝ exp( −σ ∑_{t=1}^{T} ρ_p(y_t − x′_t β) − σλ ∑_{j=1}^{K} |β_j| ).    (24)

To estimate (24), we utilise the Gibbs sampler of Li et al. (2010) with their recommended hyperpriors. Due to the shrinkage coefficient profile discussed above, we expect the LBQR to do well in sparse designs with well identified signal and noise.

2. Bayesian Elastic Net QR (BQRENET): The elastic net quantile regression differs from the lasso in that it adds an L2 norm of the regression coefficients to the minimisation problem. This is the ridge component, which shrinks coefficients in a less aggressive way than the L1 norm and makes the estimator useful when dealing with correlated or dense designs. Writing the elastic net estimator for the quantile regression as

    min_β ∑_{t=1}^{T} ρ_p(y_t − x′_t β) + λ₁ ∑_{j=1}^{K} |β_j| + λ₂ ∑_{j=1}^{K} β_j²,    (25)

the prior can, similarly to above, be derived with the AL likelihood and an exponential prior as π(β_j | λ₁, λ₂, σ) ∝ σλ₁ exp(−σλ₁|β_j| − σλ₂β_j²). The posterior is then:

    β | y, X, σ, λ₁, λ₂ ∝ exp( −σ ∑_{t=1}^{T} ρ_p(y_t − x′_t β) − σλ₁ ∑_{j=1}^{K} |β_j| − σλ₂ ∑_{j=1}^{K} β_j² ).    (26)

We use the same hyperpriors as recommended by Li et al. (2010).

3. Bayesian Adaptive Lasso QR (BALQR): The adaptive lasso as proposed by Alhamzawi et al. (2012) uses the same setup as the LBQR but allows the shrinkage coefficient to vary with each covariate. The prior can then be formulated as π(β | σ, λ_j) = ∏_{j=1}^{K} (σλ_j/2) exp{−σλ_j|β_j|}. Since this estimator allows for coefficient specific shrinkage, we expect it to outperform the LBQR.

4. Lasso QR (L1-QR): The seminal L1-QR of Belloni and Chernozhukov (2011) applies an L1 penalty term to the standard frequentist quantile regression with check-loss function as in (3). Quantile specific regression coefficients are obtained as:

    min_β ∑_{t=1}^{T} ρ_p(y_t − x′_t β(p)) + λ_p √(p(1 − p)T) ∑_{j=1}^{K} |β_j(p)|.    (27)

All simulation setups have more explanatory variables than observations. The aim of considering K >> T scenarios is to verify that the proposed method remains valid in high dimensional settings. In total, 100 Monte Carlo datasets, each with 200 observations, are generated. The first 100 observations are used to retrieve the β̂(p) vector to calculate bias, while the last 100 observations are used to construct forecast errors.

We consider 12 designs in total, which vary along two dimensions: the degree of sparsity and the error generating process. We test the following sparsity profiles:

• Sparse, where apart from the constant only a handful of leading coefficients are non-zero and the remainder of the coefficient vector is zero;
• Dense, where the constant is accompanied by small non-zero coefficients throughout the coefficient vector;
• Block structure, where blocks of non-zero coefficients alternate with blocks of zeros.

Consider a linear model as in (1). To retrieve the true quantile regression coefficients β(p), we make use of Koenker (2005)'s alternative representation of the quantile regression:

    y_t = x′_t β + (x′_t ϑ)u_t,    (28)

where u_t is assumed i.i.d. with some CDF F, and the dimensionality of ϑ is K × 1. Applying the quantile operator to equation (28) links β(p) to the model primitives:

    β(p) = β + ϑF^{−1}(p).    (29)

Hence, the true β profile of a quantile regression model has a random coefficient model interpretation, where the vector of coefficients can be decomposed into a fixed plus a random component. In particular, the random component depends on the inverse CDF of the error, F^{−1}(p). One can therefore think of ϑ as determining which variable is correlated with the error, where by default the first entry, ϑ₁, is set to 1. This entails that location effects will always be present. From a frequentist's perspective, equation (29) is our oracle estimator for β for a given quantile p, which, given that the AL likelihood approximation in equation (7) holds, can be compared to the mean of the posterior in equation (17) (Kozumi and Kobayashi, 2011). With this in mind, it is trivial to calculate the true β's for the error generating processes considered. While it is possible for ϑ to take on any value, for simplicity we assume that the elements of ϑ take on only the values {0, 1}.
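To make the oracle in (29) concrete, the following sketch computes β(p) for the i.i.d. normal DGP described below; the coefficient values here are placeholders for illustration, not the exact Monte Carlo design:

```python
import numpy as np
from scipy import stats

# True quantile coefficients beta(p) = beta + vartheta * F^{-1}(p), eq. (29)
K = 10
beta = np.zeros(K);  beta[0] = 1.0          # first entry is the constant
vartheta = np.zeros(K);  vartheta[0] = 1.0  # only the constant loads on the error
for p in (0.1, 0.5, 0.9):
    beta_p = beta + vartheta * stats.norm.ppf(p)
    print(p, beta_p[:3].round(3))           # only beta_0(p) varies with p
```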
DGP                         Error distributions             Quantile functions
y₁ = Xβ + ε                 ε ∼ N(0, 1)                     β₀(p) = β₀ + F⁻¹_{N(0,1)}(p)
y₂ = Xβ + ε                 ε ∼ t(3)                        β₀(p) = β₀ + F⁻¹_{t(3)}(p)
y₃ = Xβ + (1 + X₂)ε         ε ∼ N(0, 1)                     β₀(p) = β₀ + F⁻¹_{N(0,1)}(p), β₂(p) = β₂ + F⁻¹_{N(0,1)}(p)
y₄ = Xβ + ε₁ + X₂ε₂         ε₁ ∼ N(0, 1), ε₂ ∼ U(0, 1)      β₀(p) = β₀ + F⁻¹_{N(0,1)}(p), β₂(p) = β₂ + F⁻¹_{U(0,1)}(p)

Table 1: Summary of simulation setups

The second dimension along which the DGPs differ is their error process. The proposed DGPs can be grouped into two broad cases: (1) i.i.d. errors (y₁ and y₂); and (2) heteroskedastic errors (y₃ and y₄). In y₁ we assume that the error follows a standard normal distribution, and in y₂ the error is student-t distributed with 3 degrees of freedom. For the other cases, we assume simple heteroskedasticity caused by correlation between the second covariate and ε, which will influence β₂. In contrast to y₃, y₄ can be thought of as containing a mixture between a uniform and a standard normal error distribution. In all simulations, the design matrix is simulated from a multivariate normal distribution with mean 0 and a covariance matrix with (i, j)-th element ρ^{|i−j|} for a fixed ρ ∈ (0, 1).

Relating the assumed error processes to the random coefficient representation (29), it is clear that under i.i.d. errors only the constant has a non-constant quantile function, caused by F^{−1} (hereinafter called location shifters). Under the heteroskedastic designs, apart from the constant, β₂ has a non-constant quantile function as well. Hence, β₂ in y₃ is determined by F^{−1}_{N(0,1)} across p, and β₂ in y₄ follows F^{−1}_{U(0,1)}, i.e., increases linearly with p. The simulation designs (and the expected quantile functions) are summarised in table (1).

We evaluate the performance of the estimators in terms of bias in the coefficients and forecast error. Using the true quantile profile β(p) in (29), we calculate root mean coefficient bias (RMCB) and root mean squared forecast error (RMSFE) as:

1. RMCB = √( (1/iter) ∑_{i=1}^{iter} ‖β̂_i(p) − β(p)‖² )
2. RMSFE = √( (1/iter) ∑_{i=1}^{iter} ‖Xβ̂_i(p) − Xβ(p)‖² )

where iter refers to the number of Monte Carlo repetitions and β̂_i(p) is the estimate from the i-th repetition.
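A sketch of the two evaluation metrics (the exact normalisation is one consistent reading of the definitions above; names are ours):

```python
import numpy as np

def rmcb(beta_hat, beta_true):
    """Root mean coefficient bias; beta_hat is (iter, K), beta_true is (K,)."""
    return float(np.sqrt(np.mean(np.sum((beta_hat - beta_true) ** 2, axis=1))))

def rmsfe(X, beta_hat, beta_true):
    """Root mean squared forecast error against the oracle fit X beta(p)."""
    err = (beta_hat - beta_true) @ X.T  # (iter, T) forecast errors
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```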
For the Bayesian estimators, we define β̂(p) as the mean of the posterior for the p-th quantile model. Note that, the way the simulations were designed, iter is 100 for both RMSFE and RMCB for all estimators. (The only deviation from 100 concerns the RMCB and RMSFE of the BALQR, for which the variance covariance matrix of the posterior coefficients was not invertible in some cases; this is indicative that the BALQR prior did not shrink enough.)

The different error distributions test whether the quantile regression is truly robust to changes in their specification, as the literature suggests (see Koenker and Bassett (1982)). As described above, the conditional distributions are completely described by location shifters in the case of i.i.d. errors, where y₁'s β₀ has the profile of an inverse normal CDF shifted up by the respective constant coefficient, and y₂'s β₀ has the profile of an inverse student-t distribution with 3 degrees of freedom, shifted up by the respective constant coefficient. In both these cases, only the first element of ϑ equals 1. The results for the bias for the three designs (sparse, dense, block) across a selection of quantiles are presented in table (2), and the results of the forecast performance are presented in table (3). The way the DGPs were constructed, only a few variables have a non-constant quantile curve. While the tables present a good overview of overall performance, they do not convey whether an estimator captures the correct quantile curves. To alleviate this, boxplots were created for each quantile for the variables with non-constant quantile curves. The HS-BQR's boxplots are presented in figure (2); the same boxplots were constructed for the other estimators and are presented in an online appendix.

Table (2) shows that the HS-BQR performs admirably compared to the considered estimators in all i.i.d. designs, regardless of the sparsity structure. In particular, for the sparse case the HS-BQR provides the lowest coefficient bias for both y₁ and y₂ for all quantiles. The forecast results from table (3) corroborate these findings, with the HS-BQR providing the lowest root mean squared forecast errors among the estimators considered for the sparse design of y₁ and y₂.

The HS-BQR's performance is competitive for the dense and block cases, but other estimators yield lower coefficient bias for these designs, as can be seen in table (2). For the dense design, the HS-BQR rivals the performance of the BQRENET and even provides lower coefficient bias for some of the quantiles for both y₁ and y₂. The same cannot be said for the block design, where the BALQR yields superior performance, except for the most extreme quantiles considered, for which the HS-BQR gives the lowest coefficient bias. The forecast error results of table (3) are in line with the bias results: the HS-BQR offers comparable performance to the BQRENET for the dense designs, while the block design is dominated by the BALQR. This coheres with the theoretical properties of the priors. The ridge component in the BQRENET provides more stable inference for dense designs, while the BALQR benefits in block structures from adaptive shrinkage without having to identify a global shrinkage parameter.
Note how the coefficient bias and forecast errors are lowest for the sparse cases for all the estimators considered. This result is expected, as the priors favour sparse posterior solutions.

Both the normally distributed y₁ and the t-distributed y₂ showcase a situation where the extreme quantiles (0.1 and 0.9) have higher bias than the central quantile (0.5) for all the estimators considered. This is a common finding in quantile regression, on account of more extreme quantiles being "data sparse", as a few observations receive large weights. While a U-shape in the coefficient bias across quantiles is expected, the slope of this shape is not uniform across the estimators. In particular, table (2) shows that the HS-BQR's bias does not increase as much as that of the other estimators. Similarly, extreme quantiles generally tend to have higher forecast errors for all estimators, but the HS-BQR's extreme quantiles do not suffer as much as its competition, as shown in table (3). This property cannot be overstated, as quantile regression is often employed for extreme quantiles. (An exception are the L1-QR and HS-BQR in the block design, where the estimators have lower coefficient bias and forecast error for the extreme low quantiles than for the central quantiles.)

Table 2: Root mean coefficient bias. [Rows: HS-BQR, L1-QR, LBQR, BQRENET, BALQR for the sparse, dense and block designs; columns: quantiles p of y₁–y₄.]

Table 3: Root mean squared forecast error. [Rows and columns as in Table 2.]

Figure (2) underpins the findings of the tables: the HS-BQR captures the normal inverse CDF shape for y₁ and the inverse t-distribution for y₂ in the sparse design. In the dense design, the location shift effect is not as well captured as in the sparse case, but the HS-BQR is capable of identifying location shifts for the more extreme quantiles.
Figure (2) also highlights that the HS-BQR suffers the most in block designs, where the boxplots show that in many Monte Carlo runs the estimated β₀ coefficient is an outlier, several magnitudes above the mean β₀. This finding underpins that, in designs with unmodeled block structures and hence badly identified global shrinkage, quantile effects might be shrunk away. Implementation of group-level shrinkage along with prior information about the sparsity pattern in the data might be able to alleviate this problem, which we leave for future research.

As correlation between a covariate and the error induces a quantile profile, y₃ and y₄ have both location shifters and scale shifters. In particular, β₂ of y₃ follows an inverse normal CDF shifted up by the respective constant, and β₂ of y₄ is a straight line from 1 to 3 across the quantiles. It is worth noting that, to make sure there is no kink in the data, the variable multiplied by the error is restricted to take on positive values only; this was achieved by taking the exponential of that particular variable. (When X₂ is allowed to take on negative values as well, the relationship between Y and X₂ involves a kink at X₂ = 0, which causes non-linearity and therefore violates the assumption of linearity in parameters. The exponential of the 7th variable for the sparse and dense cases, and of the (T+2)-nd variable for the block case, is also taken; this does not influence the results.)

As with the homoskedastic DGPs, we see that for all estimators the error rate increases when moving away from the central quantiles, and that coefficient bias as well as forecast accuracy worsen for dense and block designs compared to the sparse design. Further, the bias and forecast results in tables (2) and (3) show that the HS-BQR provides better or similar performance to the alternative estimators, and it consistently outperforms all Bayesian estimators for y₃ in sparse designs. Similar to the previous discussion, the HS-BQR stands out in that it provides consistently more stable inference on extreme quantiles, independent of the sparsity structure.

Figure 3: β₀ and β₂ profiles for y₃ across quantiles for the different sparsity settings.
Figure 4: β₀ and β₂ profiles for y₄ across quantiles for the different sparsity settings.
Figure 5: β_i profiles for y₁, y₂, y₃ and y₄ across quantiles for the small design.

Confirming the homoskedastic simulation results, in dense designs the BQRENET, aided by the ridge component in its prior, provides lower coefficient bias and forecast error than the HS-BQR, whereas in block DGPs the BALQR marginally outperforms the HS-BQR for y₃. A different picture emerges for block DGPs of y₄: here, the HS-BQR provides a gain in the precision of coefficient estimation between 2-20%, and in excess of 100% in terms of forecast accuracy.

The boxplots in figure (4) also provide an explanation as to why the HS-BQR's forecast performance is much better for the block case of y₄: it captured the quantile function for β₂. The HS-BQR's lackluster performance in forecasting the dense design of y₃ is revealed in figure (3): the HS-BQR failed to capture the quantile function as precisely as it was able to do for the same design of y₄.
This highlights that, to obtain good performance, it is not sufficient for these estimators to shrink the unimportant variables to 0; it is also imperative to identify the correct quantile function for the variables that are heteroskedastic. Generally, as can be seen from figures (3) and (4), the HS-BQR has difficulty capturing location and scale effects when both are present in the DGP. It often isolates only one of the effects, while shrinking the other to a constant quantile function. To check whether this is a general problem for Bayesian quantile regression, we estimated the HS-BQR on a small DGP determined by only one explanatory variable (and a constant), following the same structure as outlined in table (1). As can be seen in figure (5), the quantile profiles are then well identified. All the estimators have difficulty simultaneously identifying the true regressors and partialling out the location (β₀(p)) and scale (β₂(p)) effects in high dimensional settings. This difficulty in capturing the quantile profiles for both location and scale effects could stem from estimating the quantiles independently: no monotonicity is imposed on the profile of β(p). As such, further improvements in high dimensional quantile regression can be achieved by finding ways in which the methods do not shrink one of the location or scale shifters to a constant quantile profile.

Empirical Application

We now compare the above estimators' ability to construct forecast densities of US quarterly inflation. Density forecasts are an important aspect of macroeconomics; they are extensively used in institutions such as central banks, which construct fan-chart estimates of key economic indicators, and they are getting increasing attention in academia as well (Adrian et al., 2019; Aastveit et al., 2018). Rather than simply forecasting the mean, density forecasts also provide information about tail risk. Quantile regression is particularly useful for characterising these tail risks, as it is capable of producing asymmetric forecast densities where upside and downside risk are estimated independently. Traditional conditional mean models, in contrast, assume a normal distribution around the forecasted values, which characterises the uncertainty in a symmetric way.

We break from the forecast density literature a bit by not exclusively focusing on testing the whole density, but also evaluating the performance of specific quantiles. This is done to show the HS-BQR's power to create Value-at-Risk estimates on top of intuitive density forecasts.

We use the data provided by Korobilis (2017), which comprises 16 macro variables as potential regressors. These include measures of real economic activity (e.g. unemployment, investment, housing starts), financial market information (e.g. default yield spreads), money supply variables (e.g. M1 money stock) and expectation indices (e.g. PMI) (see Korobilis (2017) for more information on the data set), as well as two lags of the dependent variable. The dependent variable is US quarterly CPI inflation, seasonally adjusted, spanning the period 1947Q1-2015Q3.

To obtain the forecasts, we use the general linear model:

    y_{t+h} = x′_t β + ε_{t+h},    (30)

for t = 1, …, T − h, where h refers to the forecast horizon. For brevity, we only consider one-step-ahead forecasts, such that h is set to 1. Using the quantile setup, forecasts from each quantile are denoted y^p_{T+1|T}. Note that these one-step-ahead quantile forecasts are equivalent to the one-step-ahead p-th Value-at-Risk.
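A sketch of the forecasting exercise described next (the window scheme and quantile grid are detailed below; hs_bqr_fit is a hypothetical stand-in for the HS-BQR sampler of the previous section, returning the posterior mean of β(p)):

```python
import numpy as np

quantile_grid = np.linspace(0.05, 0.95, 19)  # 19 equidistant quantiles

def density_forecasts(X, y, first_window=40):
    """One-step-ahead quantile forecasts from an expanding window.
    hs_bqr_fit is a hypothetical placeholder and is not defined here."""
    T = len(y)
    out = np.empty((T - first_window, len(quantile_grid)))
    for i, t in enumerate(range(first_window, T)):
        for j, p in enumerate(quantile_grid):
            beta_p = hs_bqr_fit(X[:t], y[:t], p)
            out[i, j] = X[t] @ beta_p
    # sort to avoid quantile crossing (Chernozhukov et al., 2010)
    return np.sort(out, axis=1)
```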
Forecasts are computed from an expanding window: each one-step-ahead forecast is constructed from the in-sample observations t = 1, …, T, with the window growing by one period at a time. The initial in-sample period uses the first 40 observations of the sample, which makes for 233 forecast windows. Just like in Korobilis (2017), we estimate a grid of 19 equidistant quantiles to construct the predictive density p(ŷ_{T+1|T}).

Forecast densities are evaluated with Kolmogorov-Smirnov (KS) statistics based on the Probability Integral Transform (PIT). The PIT is often used when evaluating density forecasts and hinges on the idea that draws from any given forecast distribution p(y_{t+h} | y_t) can be converted to random variables having a standard uniform distribution. In particular, the PIT is the CDF corresponding to the forecast density, evaluated at the actual observation of the out-of-sample period, y_{t+h}:

    g_{t+h} = ∫_{−∞}^{y_{t+h}} p(u | y_t) du = P(y_{t+h} | y_t).    (31)

The estimated predictive density is consistent with the true density when the g_{t+h} are i.i.d. uniform. As such, the CDF of the PITs should be a 45 degree line (Diebold et al., 1998). An additional benefit of this test is that the theoretical true PIT distribution is independent of the econometrician's loss function. With this in mind, any test can be employed that compares the distance between the empirical and theoretical PIT distributions (there is a plethora of tests based on QQ-plots of the PIT; the choice of the KS test was based solely on its simplicity to compute, and any other test would suffice for evaluation). We opt for the KS test to compare the PITs of the estimators with the uniform CDF.

As mentioned above, this paper does not exclusively focus on the forecast densities but also evaluates the specific quantile forecasts. The two most popular tests to verify the performance of a specific quantile are the Dynamic Quantile test of Engle and Manganelli (2004) and the VQR test of Gaglianone et al. (2011); the latter takes advantage of the VaR interpretation of quantile forecasts to construct an F-test that factors in which quantile was estimated (for more information consult Gaglianone et al. (2011)). These tests provide a principled way of testing the null hypothesis that the selected quantile is correct. However, they do not offer a comparative measure of which method provides a better fit for a specific quantile. For this reason, this paper instead computes a pseudo R² for each quantile and each estimator, following Koenker and Machado (1999). The pseudo R² of the following quantile regression is obtained:

    y_t(p | V_t(p)) = β₀ + β₁V_t(p),    (32)

where V_t(p) is the fitted value of the estimator for the p-th quantile. Running the regression in equation (32) for the p-th quantile gives an intuitive test of the ability of the estimated fitted value to capture the dynamics we are interested in. In particular, the pseudo R² shows how much information V_t(p) adds to the regression compared to a quantile regression with only a constant. Ideally, the coefficient β₀ should be 0 while the coefficient β₁ should be 1.
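With a grid of 19 predicted quantiles per window, the PIT in (31) can be approximated by interpolating the forecast CDF at each realisation; a minimal sketch (names are ours; the quantile columns are assumed sorted):

```python
import numpy as np
from scipy import stats

quantile_grid = np.linspace(0.05, 0.95, 19)

def pit_values(q_forecasts, y_out):
    """Empirical PIT: forecast CDF, built from the sorted quantile grid,
    evaluated at each realisation. q_forecasts: (n, 19); y_out: (n,)."""
    return np.array([np.interp(y, q, quantile_grid, left=0.0, right=1.0)
                     for q, y in zip(q_forecasts, y_out)])

# g = pit_values(q_forecasts, y_out)
# ks_stat, p_val = stats.kstest(g, "uniform")  # distance of PITs from U(0,1)
```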
This serves as a benchmark. The results of the KS test and the pseudo R²'s are presented in table (4), while the forecast densities are shown in figures (6) and (7), where the shaded regions show the fill between the 5th and the 95th quantile after sorting the quantiles as suggested by Chernozhukov et al. (2010).

Comparing the forecast distributions on a visual basis in figures (6) and (7) reveals certain properties of the estimators. First, the horseshoe prior improves the calibration of the density forecasts compared to the plain BQR. This can be seen from figure (6), where the blue shaded area of the BQR is often too narrow, with the central tendency not capturing sharp movements, unlike the HS-BQR. Second, the L1-QR and BALQR perform similarly to the HS-BQR, but offer wider uncertainty bands. The BALQR in particular consistently has larger 95th and 5th quantile estimates than the other estimators considered in figure (6), which from an economic standpoint are unreasonably large: the lowest band of the BALQR consistently provides estimates below 0% inflation, implying a constant 5% risk of deflation over the past 50 years. Third, the LBQR and BQRENET provide the worst calibrated forecast densities, which are too wide compared to the other estimators. As shown in figure (7), both estimators have considerable downward skew, with lower quantiles estimating in excess of 20%-30% deflation at different parts of the sample; upper quantiles for the BQRENET and LBQR after 2009 include the possibility of 30% inflation. These forecast densities are far too wide to be informative.

The visual inspection, however, is not confirmed by the KS statistics in table (4): the lasso based estimators have a lower KS statistic than the HS-BQR. Since the KS statistic tests the null hypothesis that the empirical CDF is no different from a uniform distribution, the higher KS statistic of the HS-BQR would, taken at face value, indicate a worse calibrated density. The pseudo R² is able to remedy this conundrum by comparing the explanatory power of each individual quantile, and it corroborates the visual inspection. The HS-BQR yields the best performance when looking at the pseudo R², beating the other estimators for all the quantiles considered. The L1-QR and BQR also provide good results, with pseudo R²'s just below those of the HS-BQR. The BALQR's pseudo R² for the lower quantiles is worse than that of the HS-BQR, BQR, and L1-QR, but its higher quantiles perform just as well, which is in line with figure (6). Finally, the LBQR and BQRENET perform the worst, especially for lower quantiles. The pseudo R² results highlight the potential of the HS-BQR not just for creating density forecasts, but also for providing Value-at-Risk estimates.

Conclusion

In this paper, we have extended the widely popular horseshoe prior of Carvalho et al. (2010) to the Bayesian quantile regression and provided a new algorithm to sample the shrinkage coefficients via slice sampling for the independent prior, together with a fast sampling algorithm that speeds up computation significantly in high dimensions.

In our simulations, we considered a variety of sparse, dense and block designs with different error distributions. We then tested a selection of quantile regression methods to see how the HS-BQR fares in comparison. The simulations revealed three points about the HS-BQR. First, the HS-BQR provides better or comparable performance in terms of both coefficient bias and forecast risk.
For the sparse design it beat all other estimators, while in the dense and block designs the HS-BQR came close to the best estimator, even beating it for some quantiles. Second, the HS-BQR was more stable when estimating the tails of the distribution, having the lowest bias and forecast error for the extreme quantiles (0.1 and 0.9). This shows that the shrinkage profile of the horseshoe prior is more efficient than that of the lasso based priors. Finally, the HS-BQR has difficulties simultaneously identifying the correct location and scale effects in high-dimensional settings, but this is an issue for the other estimators as well. When the HS-BQR was applied to a comparable "small data" design, it had no issues simultaneously identifying β₀'s and β₂'s quantile functions.

In the empirical application, we tested the HS-BQR's performance in creating density forecasts as well as Value-at-Risk estimates. The HS-BQR provides economically more intuitive forecast densities than the LBQR, BALQR, and BQRENET, all of which produce forecast bands that are too wide. The KS statistic reveals that the PIT CDF of the HS-BQR is not significantly different from the expected uniform distribution. The pseudo R² revealed that the HS-BQR's fitted quantiles provide the best goodness of fit of all the estimators; the only other estimator that comes close is the L1-QR. This shows that the HS-BQR is an adequate method for credible Value-at-Risk estimation.

The results show that the HS-BQR is a competitive estimator, with especially good behaviour to be expected in sparse designs with few observations. However, there are multiple fronts on which the proposed HS-BQR can be improved. For instance, the simulations highlighted that in dense and block designs the HS prior tends to shrink the constant too aggressively; extensions which allow for differing shrinkage terms for subsets of the regressors might alleviate this problem. The results also highlight that the estimators have difficulties simultaneously capturing location and scale shifts in high dimensions. Extensions to the HS-BQR should tackle this, because to attain oracle properties in quantile regression it is not enough to shrink unimportant variables to 0; the correct quantile functions must be identified as well.

References

Aastveit, K. A., J. Mitchell, F. Ravazzolo, and H. K. Van Dijk (2018). The evolution of forecast density combinations in economics. Technical report, Tinbergen Institute Discussion Paper.

Adrian, T., N. Boyarchenko, and D. Giannone (2019). Vulnerable growth. American Economic Review 109(4), 1263–89.
Alhamzawi, R. and K. Yu (2013). Conjugate priors and variable selection for Bayesian quantile regression. Computational Statistics & Data Analysis 64, 209–219.

Alhamzawi, R., K. Yu, and D. F. Benoit (2012). Bayesian adaptive lasso quantile regression. Statistical Modelling 12(3), 279–297.

Bai, J. and S. Ng (2008). Large Dimensional Factor Analysis. Now Publishers Inc.

Belloni, A. and V. Chernozhukov (2011). L1-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics 39(1), 82–130.

Betancourt, M. (2018). Bayes sparse regression.

Bhadra, A., J. Datta, N. G. Polson, and B. Willard (2019). Lasso meets horseshoe: A survey. Statistical Science 34(3), 405–427.

Bhattacharya, A., A. Chakraborty, and B. K. Mallick (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, asw042.

Buchinsky, M. (1998). Recent advances in quantile regression models: a practical guideline for empirical research. Journal of Human Resources, 88–126.

Carriero, A., T. E. Clark, and M. G. Marcellino (2020). Nowcasting tail risks to economic activity with many indicators.

Carvalho, C. M., N. G. Polson, and J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480.

Chen, C. W., D. B. Dunson, C. Reed, and K. Yu (2013). Bayesian variable selection in quantile regression. Statistics and its Interface 6(2), 261–274.

Chen, C. W., R. Gerlach, B. B. Hwang, and M. McAleer (2012). Forecasting value-at-risk using nonlinear regression quantiles and the intra-day range. International Journal of Forecasting 28(3), 557–574.

Chernozhukov, V., I. Fernández-Val, and A. Galichon (2010). Quantile and probability curves without crossing. Econometrica 78(3), 1093–1125.

Davino, C., M. Furno, and D. Vistocco (2013). Quantile Regression: Theory and Applications, Volume 988. John Wiley & Sons.

Diebold, F. X., T. A. Gunther, and A. S. Tay (1998). Evaluating density forecasts with applications to financial risk management. International Economic Review 39(4), 863–883.

Engle, R. F. and S. Manganelli (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics 22(4), 367–381.

Gaglianone, W. P., L. R. Lima, O. Linton, and D. R. Smith (2011). Evaluating value-at-risk models via quantile regression. Journal of Business & Economic Statistics 29(1), 150–160.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1(3), 515–534.

Giglio, S., B. Kelly, and S. Pruitt (2016). Systemic risk and the macroeconomy: An empirical evaluation. Journal of Financial Economics 119(3), 457–471.

Khare, K. and J. P. Hobert (2012). Geometric ergodicity of the Gibbs sampler for Bayesian quantile regression. Journal of Multivariate Analysis 112, 108–116.

Koenker, R. (2005). Quantile Regression. New York: Cambridge University Press.

Koenker, R. and G. Bassett (1978). Regression quantiles. Econometrica 46, 33–50.

Koenker, R. and G. Bassett (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica: Journal of the Econometric Society, 43–61.

Koenker, R. and J. A. Machado (1999). Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association 94(448), 1296–1310.

Korobilis, D. (2017). Quantile regression forecasts of inflation under model uncertainty. International Journal of Forecasting 33(1), 11–20.

Kotz, S., T. Kozubowski, and K. Podgorski (2012). The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Springer Science & Business Media.

Kozumi, H. and G. Kobayashi (2011). Gibbs sampling methods for Bayesian quantile regression. Journal of Statistical Computation and Simulation 81(11), 1565–1578.

Li, Q., R. Xi, and N. Lin (2010). Bayesian regularized quantile regression. Bayesian Analysis 5(3), 533–556.

Makalic, E. and D. F. Schmidt (2015). A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters 23(1), 179–182.

Moran, G. E., V. Ročková, and E. I. George (2018). Variance prior forms for high-dimensional Bayesian variable selection. Bayesian Analysis, 1091–1119.

Park, T. and G. Casella (2008). The Bayesian lasso. Journal of the American Statistical Association 103(482), 681–686.

Polson, N. G. and J. G. Scott (2010). Shrink globally, act locally: Sparse Bayesian regularization and prediction. Bayesian Statistics 9, 501–538.

Polson, N. G., J. G. Scott, and J. Windle (2014). The Bayesian bridge. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(4), 713–733.

Stock, J. H. and M. W. Watson (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97(460), 1167–1179.

Taylor, J. W. (2019). Forecasting value at risk and expected shortfall using a semiparametric approach based on the asymmetric Laplace distribution. Journal of Business & Economic Statistics 37(1), 121–133.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288.

Wang, L. (2017). Nonconvex penalized quantile regression: A review of methods, theory and algorithms. In Handbook of Quantile Regression, pp. 273–292. Chapman and Hall/CRC.

Yu, K. and R. A. Moyeed (2001). Bayesian quantile regression. Statistics & Probability Letters 54(4), 437–447.
Appendix
We now give further details on the derivation of Algorithm 1. The goal of the algorithm is to circumvent computing with large K × K matrices by defining auxiliary variables which, under a certain linear combination, result in draws from the desired distribution N(β̄, V). As above, by the Sherman-Morrison-Woodbury theorem (see e.g. Hager 1989), Σ and µ can be expanded as:

    Σ = (Φ′Φ + D^{−1})^{−1} = D − DΦ′(ΦDΦ′ + I_T)^{−1}ΦD,
    µ = DΦ′(ΦDΦ′ + I_T)^{−1}α.

This expansion per se does not help in sampling from N(0, Σ). Letting ξ and u be defined as in Algorithm 1, ω = (ξ′, u′)′ ∈ R^{T+K} follows a multivariate normal distribution centred at 0 with covariance

    Ω = [ P  S ; S′  R ],

where it is easily verified that P = ΦDΦ′ + I_T and R = D, and S can be derived as:

    S = Cov(ξ, u) = Cov(Φu + δ, u) = Φ Var(u) = ΦD.

Rewriting Ω via its LDU decomposition (see e.g. Hamilton, 1994) gives:

    [ P  S ; S′  R ] = [ I_T  0 ; S′P^{−1}  I_K ] [ P  0 ; 0  R − S′P^{−1}S ] [ I_T  P^{−1}S ; 0  I_K ] = L Γ L′,

where the lower K × K block of Γ, namely R − S′P^{−1}S = D − DΦ′(ΦDΦ′ + I_T)^{−1}ΦD, is equal to Σ. To retrieve the lower part, we isolate Γ, which is easily done because L is lower block triangular and thus its inverse is readily available as [ I_T  0 ; −S′P^{−1}  I_K ]. Since ω has already been sampled from N(0, Ω) in steps 1 and 2 of the algorithm, the transformation ω* = L^{−1}ω is distributed N(0, Γ). Collecting the lower block of ω* yields a sample from N(0, Σ), and adding µ gives the desired draw from N(µ, Σ).
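The Sherman-Morrison-Woodbury expansion at the heart of this derivation is easy to verify numerically; a short sketch (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
T, K = 5, 8
Phi = rng.standard_normal((T, K))
D = np.diag(rng.uniform(0.5, 2.0, size=K))

# (Phi'Phi + D^{-1})^{-1} = D - D Phi' (Phi D Phi' + I_T)^{-1} Phi D
direct = np.linalg.inv(Phi.T @ Phi + np.linalg.inv(D))
M = Phi @ D @ Phi.T + np.eye(T)
smw = D - D @ Phi.T @ np.linalg.solve(M, Phi @ D)
print(np.allclose(direct, smw))  # True
```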